Awesome
Zyte SmartProxy Plugin
A plugin for playwright-extra and puppeteer-extra to provide Smart Proxy Manager specific functionalities.
QuickStart for playwright-extra
- Install Zyte SmartProxy Plugin
npm install playwright playwright-extra zyte-smartproxy-plugin puppeteer-extra-plugin-stealth @cliqz/adblocker-playwright
- Create a file
sample.js
with following content and replace<SPM_APIKEY>
with your SPM Apikey
// playwright-extra is a drop-in replacement for playwright,
// it augments the installed playwright with plugin functionality
const { chromium } = require('playwright-extra')
// add zyte-smartproxy-plugin
const SmartProxyPlugin = require('zyte-smartproxy-plugin');
chromium.use(SmartProxyPlugin({
spm_apikey: '<SPM_APIKEY>',
static_bypass: false, // enable to save bandwidth (but may break some websites)
}));
// add stealth plugin and use defaults (all evasion techniques)
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
chromium.use(StealthPlugin());
// create adblocker to block all ads (saves bandwidth)
const { PlaywrightBlocker } = require('@cliqz/adblocker-playwright');
const fetch = require('cross-fetch');
// playwright usage as normal
(async () => {
const adBlocker = await PlaywrightBlocker.fromPrebuiltAdsAndTracking(fetch);
const browser = await chromium.launch({ headless: false });
const page = await browser.newPage({ignoreHTTPSErrors: true});
// uncomment to enable adBlocker (saves bandwidth but may break some websites)
// adBlocker.enableBlockingInPage(page);
await page.goto('https://toscrape.com', {timeout: 180000});
await page.screenshot({path: 'screenshot.png'})
await browser.close();
})();
Make sure that you're able to make https
requests using Smart Proxy Manager by following this guide Fetching HTTPS pages with Zyte Smart Proxy Manager
- Run
sample.js
using Node
node sample.js
QuickStart for puppeteer-extra
- Install Zyte SmartProxy Plugin
npm install puppeteer puppeteer-extra zyte-smartproxy-plugin puppeteer-extra-plugin-stealth puppeteer-extra-plugin-adblocker
- Create a file
sample.js
with following content and replace<SPM_APIKEY>
with your SPM Apikey
// puppeteer-extra is a drop-in replacement for puppeteer,
// it augments the installed puppeteer with plugin functionality
const puppeteer = require('puppeteer-extra')
// add zyte-smartproxy-plugin
const SmartProxyPlugin = require('zyte-smartproxy-plugin');
puppeteer.use(SmartProxyPlugin({
spm_apikey: '<SPM_APIKEY>',
static_bypass: false, // enable to save bandwidth (but may break some websites)
}));
// add stealth plugin and use defaults (all evasion techniques)
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
puppeteer.use(StealthPlugin());
// uncomment to enable adblocker plugin (saves bandwidth but may break some websites)
// const AdBlockerPlugin = require('puppeteer-extra-plugin-adblocker');
// puppeteer.use(AdBlockerPlugin({blockTrackers: true}));
// puppeteer usage as normal
(async () => {
const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage({ignoreHTTPSErrors: true});
await page.goto('https://toscrape.com', {timeout: 180000});
await page.screenshot({path: 'screenshot.png'})
await browser.close();
})();
Make sure that you're able to make https
requests using Smart Proxy Manager by following this guide Fetching HTTPS pages with Zyte Smart Proxy Manager
- Run
sample.js
using Node
node sample.js
Zyte SmartProxy Plugin arguments
Argument | Default Value | Description |
---|---|---|
spm_apikey | undefined | Zyte Smart Proxy Manager API key that can be found on your zyte.com account. |
spm_host | http://proxy.zyte.com:8011 | Zyte Smart Proxy Manager proxy host. |
static_bypass | true | When true Zyte SmartProxy Plugin will skip proxy use (saves proxy bandwidth) for static assets defined by static_bypass_regex or pass false to use proxy. |
static_bypass_regex | /.*?\.(?:txt|json|css|less|gif|ico|jpe?g|svg|png|webp|mkv|mp4|mpe?g|webm|eot|ttf|woff2?)$/ | Regex to use filtering URLs for static_bypass . |
headers | {'X-Crawlera-No-Bancheck': '1', 'X-Crawlera-Profile': 'pass', 'X-Crawlera-Cookies': 'disable'} | List of headers to be appended to requests |
spm_session_id | undefined | When specified Zyte SmartProxy Plugin will use an existing Zyte Smart Proxy Manager session, otherwise a new session will be created. |
Notes
-
Some websites may not work with AdBlocker or
static_bypass
enabled (default). Try to disable them if you encounter any issues. -
When using
headless: true
mode, values generated for some browser-specific headers are a bit different, which may be detected by websites. Try using 'X-Crawlera-Profile': 'desktop' in that case:
puppeteer.use(SmartProxyPlugin({spm_apikey: '<SPM_APIKEY>', headers: {'X-Crawlera-No-Bancheck': '1', 'X-Crawlera-Profile': 'desktop', 'X-Crawlera-Cookies': 'disable'}}));
- When connecting to a remote Chrome browser instance, it should be launched with these arguments:
--proxy-server=http://proxy.zyte.com:8011 --disable-site-isolation-trials