<h1 align="center"> <img style="width: 500px; margin:3rem 0 1.5rem;" src="https://github.com/microlinkhq/browserless/raw/master/static/logo-banner.png#gh-light-mode-only" alt="browserless"> <img style="width: 500px; margin:3rem 0 1.5rem;" src="https://github.com/microlinkhq/browserless/raw/master/static/logo-banner-light.png#gh-dark-mode-only" alt="browserless"> <br> </h1>The headless Chrome/Chromium driver on top of Puppeteer.
Highlights
- Compatible with Puppeteer API (text, screenshot, html, pdf).
- Built-in adblocker for canceling unnecessary requests.
- Shell interaction via Browserless CLI.
- Easy Google Lighthouse integration.
- Automatic retry & error handling.
- Sensible defaults.
Installation
You can install it via npm:
npm install browserless puppeteer --save
Browserless runs on top of Puppeteer, so you need that installed to get started.
You can choose between puppeteer, puppeteer-core, and puppeteer-firefox depending on your use case.
Usage
Here is a complete example showcasing some of Browserless's capabilities:
const createBrowser = require('browserless')
const termImg = require('term-img')
// First, create a browserless factory
// This is similar to opening a browser for the first time
const browser = createBrowser()
// Browser contexts are like browser tabs
// You can create as many as your resources can support
// Cookies/caches are limited to their respective browser contexts, just like browser tabs
const browserless = await browser.createContext()
// Perform your required browser actions.
// e.g., taking screenshots or fetching HTML markup
const buffer = await browserless.screenshot('http://example.com', {
device: 'iPhone 6'
})
console.log(termImg(buffer))
// After your task is done, destroy your browser context
await browserless.destroyContext()
// At the end, gracefully shutdown the browser process
await browser.close()
As you can see, Browserless is implemented using a single browser process which allows you to create and destroy several browser contexts all within that process.
If you're already using Puppeteer in your project, you can layer Browserless on top of that by simply installing it.
You can also pull in additional Browserless packages for your specific needs, all of which work well with Puppeteer.
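The create/use/destroy lifecycle described above can be wrapped so that a context is always cleaned up, even when a task throws. The following is a minimal sketch: `withContext` is a hypothetical helper name, not part of the Browserless API, and it only relies on the `createContext` and `destroyContext` methods shown in the example.

```javascript
// Hypothetical helper (not part of the Browserless API): runs `fn` inside a
// fresh browser context and always destroys that context afterwards.
async function withContext (browser, fn, contextOptions = {}) {
  const browserless = await browser.createContext(contextOptions)
  try {
    return await fn(browserless)
  } finally {
    // Runs on success and on failure, so contexts are not left behind
    await browserless.destroyContext()
  }
}

// Usage (assumes `browser` was created with `createBrowser()`):
// const html = await withContext(browser, b => b.html('https://example.com'))
```

Keeping the teardown in a `finally` block means the single browser process can keep serving other contexts regardless of how one task ends.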
CLI
Using the Browserless command-line tool, you can interact with Browserless through a terminal window, or use it as part of an automated process:
<div style="margin: auto;"> <video poster="/static/cli.png" loop="" controls="" src="https://github.com/microlinkhq/browserless/assets/2096101/5200b2c5-d930-40e7-b128-6d23a6974c28" style="width: 100%;border-radius: 4px;" autoplay=""></video> </div>
Start by installing @browserless/cli globally on your system using your favorite package manager:
npm install -g @browserless/cli
Then run browserless in your terminal to see the list of available commands.
Initializing a browser
Initializing Browserless creates a headless browser instance.
const createBrowser = require('browserless')
const browser = createBrowser({
timeout: 25000,
lossyDeviceName: true,
ignoreHTTPSErrors: true
})
This instance provides several high-level methods.
For example:
// Call `createContext` to create a browser tab
const browserless = await browser.createContext({ retry: 2 })
const buffer = await browserless.screenshot('https://example.com')
// Call `destroyContext` to close the browser tab.
await browserless.destroyContext()
The browser keeps running until you explicitly close it:
// At the end, gracefully shutdown the browser process
await browser.close()
.constructor(options)
The createBrowser method supports puppeteer.launch#options.
Browserless provides additional options you can use when creating a browser instance:
defaultDevice
This will set your browser viewport to that of the specified device:
type: string
</br>
default: 'Macbook Pro 13'
lossyDeviceName
type: boolean
</br>
default: false
This allows for a margin of error when setting the device name.
// Initialize browser instance
const browser = require('browserless')({ lossyDeviceName: true });
(async () => {
// Create context/tab
const tabInstance = await browser.createContext();
// The device property is consistently set to that of a MacBook Pro even when misspelt
console.log(tabInstance.getDevice({ device: 'MacBook Pro' }))
console.log(tabInstance.getDevice({ device: 'macbook pro 13' }))
console.log(tabInstance.getDevice({ device: 'MACBOOK PRO 13' }))
console.log(tabInstance.getDevice({ device: 'macbook pro' }))
console.log(tabInstance.getDevice({ device: 'macboo pro' }))
})()
The provided name will be resolved to the closest matching device.
This comes in handy in situations where the device name is set by a third-party.
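To make the idea of "closest matching device" concrete, here is a small sketch of fuzzy name resolution based on edit distance. It is illustrative only: the matching strategy Browserless actually uses may differ, and `levenshtein`, `resolveDeviceName`, and the `DEVICES` list are hypothetical names introduced for this example.

```javascript
// Illustrative sketch only: Browserless's real matching strategy may differ.
// Classic Levenshtein edit distance between two strings.
function levenshtein (a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j)
  for (let i = 1; i <= a.length; i++) {
    const curr = [i]
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
    }
    prev = curr
  }
  return prev[b.length]
}

// Hypothetical device list, used only for this example
const DEVICES = ['Macbook Pro 13', 'Macbook Pro 15', 'iPhone 6', 'iPad']

// Returns the known device name closest to `name`, ignoring case.
// On a tie, the first device in the list wins.
function resolveDeviceName (name, devices = DEVICES) {
  const query = name.toLowerCase()
  let best = null
  let bestDistance = Infinity
  for (const device of devices) {
    const distance = levenshtein(query, device.toLowerCase())
    if (distance < bestDistance) {
      best = device
      bestDistance = distance
    }
  }
  return best
}

console.log(resolveDeviceName('macboo pro')) // => 'Macbook Pro 13'
```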
mode
type: string
</br>
default: 'launch'
</br>
values: 'launch' | 'connect'
This specifies if the browser instance should be spawned using puppeteer.launch or puppeteer.connect.
timeout
type: number
</br>
default: 30000
This setting will change the default maximum navigation time.
puppeteer
type: Puppeteer
</br>
default: puppeteer | puppeteer-core | puppeteer-firefox
By default, it automatically detects which library is installed (either puppeteer, puppeteer-core or puppeteer-firefox) based on your installed dependencies.
.createContext(options)
After initializing the browser, you can create a browser context which is equivalent to opening a tab:
const browserless = await browser.createContext({
retry: 2
})
Each browser context is isolated, so cookies and cache stay within their corresponding browser context, just like with browser tabs. Each context can also be created with different options.
options
All of Puppeteer's browser.createBrowserContext#options are supported.
Browserless provides additional browser context options:
retry
type: number
</br>
default: 2
The number of retries that can be performed before considering a navigation as failed.
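As a rough illustration of what a retry count means, the sketch below retries an async operation a fixed number of times before giving up. Browserless handles retries internally, so this is not its implementation; `withRetry` is a hypothetical helper that only mirrors the semantics described above (`retry: 2` allows up to three attempts in total).

```javascript
// Illustrative only: Browserless performs its own retries internally.
// `retry` is the number of extra attempts allowed after the first failure.
async function withRetry (fn, { retry = 2 } = {}) {
  let lastError
  for (let attempt = 0; attempt <= retry; attempt++) {
    try {
      return await fn(attempt)
    } catch (error) {
      // Remember the failure and try again while attempts remain
      lastError = error
    }
  }
  // Every attempt failed: surface the last error
  throw lastError
}
```

Setting `retry: 0` in this sketch means the operation is attempted once and any failure is reported immediately.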
.browser()
It returns the internal Browser instance.
const headlessBrowser = await browser.browser()
console.log('My headless browser PID is', headlessBrowser.process().pid)
console.log('My headless browser version is', await headlessBrowser.version())
.respawn()
It will respawn the internal browser.
const getPID = async promise => (await promise).process().pid
console.log('Process PID:', await getPID(browser.browser()))
await browser.respawn()
console.log('Process PID:', await getPID(browser.browser()))
This method is an implementation detail; normally you don't need to call it.
.close()
Used to close the internal browser.
const { onExit } = require('signal-exit')
// automatically teardown resources after
// `process.exit` is called
onExit(browser.close)
Built-in
.html(url, options)
Used to serialize the content of a target url into HTML.
const html = await browserless.html('https://example.com')
console.log(html)
// => "<!DOCTYPE html><html><head>…"
options
Check out browserless.goto to see the full list of supported values and options.
.text(url, options)
Used to serialize the content from the target url into plain text.
const text = await browserless.text('https://example.com')
console.log(text)
// => "Example Domain\nThis domain is for use in illustrative…"
options
See browserless.goto to know all the options and values supported.
.pdf(url, options)
It generates the PDF version of a website behind a url.
const buffer = await browserless.pdf('https://example.com')
console.log(`PDF generated in ${buffer.byteLength} bytes`)
options
This method uses the following options by default:
{
margin: '0.35cm',
printBackground: true,
scale: 0.65
}
Check out browserless.goto to see the full list of supported values and options.
Also, all of Puppeteer's page.pdf options are supported.
Additionally, you can set up:
margin
type: string | string[]
</br>
default: '0.35cm'
Used to set the page margins. Supported units include:
- px for pixels.
- in for inches.
- cm for centimeters.
- mm for millimeters.
You can set the margin properties by passing them in as an object:
const buffer = await browserless.pdf(url.toString(), {
margin: {
top: '0.35cm',
bottom: '0.35cm',
left: '0.35cm',
right: '0.35cm'
}
})
In case a single margin value is provided, this will be used for all sides:
const buffer = await browserless.pdf(url.toString(), {
margin: '0.35cm'
})
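The two margin forms above are equivalent when every side uses the same value. The sketch below shows one way a single value could be expanded to all four sides; `normalizeMargin` is a hypothetical helper written for illustration, not a function exported by Browserless.

```javascript
// Hypothetical helper: expands a single margin value to all four sides,
// mirroring the behavior described above. Not part of the Browserless API.
const SIDES = ['top', 'right', 'bottom', 'left']

function normalizeMargin (margin = '0.35cm') {
  if (typeof margin === 'string') {
    // One value applies to every side
    return Object.fromEntries(SIDES.map(side => [side, margin]))
  }
  // Already an object: return a shallow copy untouched
  return { ...margin }
}

console.log(normalizeMargin('0.35cm'))
// => { top: '0.35cm', right: '0.35cm', bottom: '0.35cm', left: '0.35cm' }
```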
.screenshot(url, options)
Used to generate screenshots based on a specified url.
const buffer = await browserless.screenshot('https://example.com')
console.log(`Screenshot taken in ${buffer.byteLength} bytes`)
options
This method uses the following options by default:
{
device: 'macbook pro 13'
}
Check out browserless.goto to see the full list of supported values and options.
Also, all of Puppeteer's page.screenshot options are supported.
Additionally, Browserless provides the following options:
codeScheme
type: string
</br>
default: 'atom-dark'
Whenever the incoming response 'Content-Type' is set to 'json', the JSON payload will be presented as a formatted JSON string, beautified using the provided codeScheme theme or, by default, atom-dark.
The color schemes are based on the Prism library.
The Prism repository offers a wide range of themes to choose from as well as a CDN option.
element
type: string
</br>
Captures the first DOM element matching the given CSS selector. The operation waits until the element is displayed on screen or the specified maximum timeout is reached.
overlay
type: object
Once the screenshot has been taken, this option allows you to apply an overlay (backdrop).
You can configure the overlay by specifying the following:
- browser: Specifies the color of the browser stencil to use, either light or dark for light and dark mode respectively.
- background: Specifies the background to use. A number of value types are supported:
  - Hexadecimal/RGB/RGBA color codes, e.g. #c1c1c1.
  - CSS gradients, e.g. linear-gradient(225deg, #FF057C 0%, #8D0B93 50%, #321575 100%).
  - Image URLs, e.g. https://source.unsplash.com/random/1920x1080.
const buffer = await browserless.screenshot(url.toString(), {
styles: ['.crisp-client, #cookies-policy { display: none; }'],
overlay: {
browser: 'dark',
background:
'linear-gradient(45deg, rgba(255,18,223,1) 0%, rgba(69,59,128,1) 66%, rgba(69,59,128,1) 100%)'
}
})
.destroyContext(options)
Destroys the current browser context.
const browserless = await browser.createContext({ retry: 0 })
const content = await browserless.html('https://example.com')
await browserless.destroyContext()
options
force
type: string
</br>
default: 'force'
When force is set, it prevents the recreation of the context in case a browser action is being executed.
.getDevice(options)
Given a device descriptor, this method returns the properties associated with that device.
browserless.getDevice({ device: 'Macbook Pro 15' })
// => {
// userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.89 Safari/537.36',
// viewport: {
// width: 1440,
// height: 900,
// deviceScaleFactor: 2,
// isMobile: false,
// hasTouch: false,
// isLandscape: false
// }
// }
This method extends the Puppeteer.KnownDevices list by adding some missing devices.
options
device
type: string
</br>
The device descriptor name. It's used to fetch preset values associated with a device.
When lossyDeviceName is enabled, a fuzzy search rather than a strict search will be performed to maximize the chance of getting a result back.
viewport
type: object
</br>
Used to set extra viewport settings. These settings will be merged with the preset settings.
browserless.getDevice({
device: 'iPad',
viewport: {
isLandscape: true
}
})
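Merging extra viewport settings with a preset can be pictured as a shallow object merge in which the extra settings win. The sketch below assumes exactly that; `mergeViewport` and the preset values are hypothetical and only illustrate the idea, they are not taken from the Browserless source.

```javascript
// Hypothetical preset, for illustration only; real values come from getDevice
const examplePreset = {
  userAgent: 'ExampleAgent/1.0',
  viewport: { width: 768, height: 1024, isLandscape: false }
}

// Sketch of a shallow merge: extra viewport settings override preset ones,
// while untouched preset values are kept as they are.
function mergeViewport (preset, viewport = {}) {
  return {
    ...preset,
    viewport: { ...preset.viewport, ...viewport }
  }
}

console.log(mergeViewport(examplePreset, { isLandscape: true }).viewport)
// => { width: 768, height: 1024, isLandscape: true }
```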
headers
type: object
</br>
Extra headers that will be merged with the device presets.
browserless.getDevice({
device: 'iPad',
headers: {
'user-agent': 'googlebot'
}
})
.evaluate(fn, gotoOpts)
It exposes an interface for creating your own evaluate function, passing you the page and response.
The fn will receive page and response as arguments:
const ping = browserless.evaluate((page, response) => ({
statusCode: response.status(),
url: response.url(),
redirectUrls: response.request().redirectChain()
}))
await ping('https://example.com')
// {
// "statusCode": 200,
// "url": "https://example.com/",
// "redirectUrls": []
// }
You don't need to close the page; it will be closed automatically.
Internally, the method performs a browserless.goto, making it possible to pass extra arguments as a second parameter:
const serialize = browserless.evaluate(page => page.evaluate(() => document.body.innerText), {
waitUntil: 'domcontentloaded'
})
await serialize('https://example.com')
// => 'Example Domain…'
.goto(page, options)
It performs a page.goto with a lot of extra capabilities:
const page = await browserless.page()
const { response, device } = await browserless.goto(page, { url: 'http://example.com' })
options
Any option passed here will be passed through to page.goto.
Additionally, you can set up:
abortTypes
type: array
</br>
default: []
It sets the ability to abort requests based on the ResourceType.
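The filtering idea behind abortTypes can be sketched as a simple membership check on the request's resource type. This is an assumption about the general approach, not the library's code: `shouldAbort` is a hypothetical helper, and `request` is expected to expose `resourceType()` as Puppeteer's HTTPRequest does.

```javascript
// Sketch: decide whether a request should be aborted, given `abortTypes`.
// `request` only needs a `resourceType()` method, like Puppeteer's HTTPRequest.
function shouldAbort (request, abortTypes = []) {
  return abortTypes.includes(request.resourceType())
}

// Usage with Browserless (assumes `browserless` is an existing context):
// const buffer = await browserless.screenshot('https://example.com', {
//   abortTypes: ['image', 'media', 'font']
// })
```

With the default empty array, nothing is aborted by type, which matches the documented default.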
adblock
type: boolean
</br>
default: true
It enables the built-in adblocker by Cliqz that aborts unnecessary third-party requests associated with ad services.
animations
type: boolean
<br>
default: false
Disables CSS animations and transitions, and sets prefers-reduced-motion accordingly.
click
type: string | string[]
</br>
Click the DOM element matching the given CSS selector.
colorScheme
type: string
</br>
default: 'no-preference'
Sets the prefers-color-scheme CSS media feature, used to detect if the user has requested the system use a 'light' or 'dark' color theme.
device
type: string
</br>
default: 'macbook pro 13'
It specifies the device descriptor used to retrieve userAgent and viewport.
headers
type: object
An object containing additional HTTP headers to be sent with every request.
const browser = require('browserless')()
const browserless = await browser.createContext()
const page = await browserless.page()
await browserless.goto(page, {
url: 'http://example.com',
headers: {
'user-agent': 'googlebot',
cookie: 'foo=bar; hello=world'
}
})
hide
Hides DOM elements matching the given CSS selectors. This sets visibility: hidden on the matched elements.
html
type: string
</br>
When HTML markup is provided, it is loaded with page.setContent instead of fetching the content from the target URL.
javascript
type: boolean
<br>
default: true
When it's false, it disables JavaScript on the current page.
mediaType
type: string
</br>
default: 'screen'
Changes the CSS media type of the page using page.emulateMediaType.
modules
type: string | string[]
</br>
Injects <script type="module"> into the browser page.
It can accept:
- Absolute URLs (e.g., 'https://cdn.jsdelivr.net/npm/@microlink/mql@0.3.12/src/browser.js').
- Local file (e.g., 'local-file.js').
- Inline code (e.g., "document.body.style.backgroundColor = 'red'").
const buffer = await browserless.screenshot(url.toString(), {
modules: [
'https://cdn.jsdelivr.net/npm/@microlink/mql@0.3.12/src/browser.js',
'local-file.js',
"document.body.style.backgroundColor = 'red'"
]
})
onPageRequest
type: function
Associates a handler with every request made by the page.
scripts
type: string | string[]
</br>
Injects <script> into the browser page.
It can accept:
- Absolute URLs (e.g., 'https://cdn.jsdelivr.net/npm/@microlink/mql@0.3.12/src/browser.js').
- Local file (e.g., 'local-file.js').
- Inline code (e.g., "document.body.style.backgroundColor = 'red'").
const buffer = await browserless.screenshot(url.toString(), {
scripts: [
'https://cdn.jsdelivr.net/npm/jquery@3.4.1/dist/jquery.min.js',
'local-file.js',
"document.body.style.backgroundColor = 'red'"
]
})
Prefer to use modules whenever possible.
scroll
type: string
Scroll to the DOM element matching the given CSS selector.
styles
type: string | string[]
</br>
Injects <style> into the browser page.
It can accept:
- Absolute URLs (e.g., 'https://cdn.jsdelivr.net/npm/hack@0.8.1/dist/dark.css').
- Local file (e.g., 'local-file.css').
- Inline code (e.g., "body { background: red; }").
const buffer = await browserless.screenshot(url.toString(), {
styles: [
'https://cdn.jsdelivr.net/npm/hack@0.8.1/dist/dark.css',
'local-file.css',
'body { background: red; }'
]
})
timezone
type: string
It changes the timezone of the page.
url
type: string
The target URL.
viewport
Sets up a custom viewport using the page.setViewport method.
waitForSelector
type: string
Waits for an element matching the given selector using page.waitForSelector.
waitForTimeout
type: number
Waits for the given amount of time, in milliseconds.
waitUntil
type: string | string[]
</br>
default: 'auto'
</br>
values: 'auto' | 'load' | 'domcontentloaded' | 'networkidle0' | 'networkidle2'
When to consider navigation successful.
If you provide an array of event strings, navigation is considered to be successful after all events have been fired.
Events can be either:
- 'auto': A combination of 'load' and 'networkidle2' in a smart way to wait the minimum time necessary.
- 'load': Consider navigation to be finished when the load event is fired.
- 'domcontentloaded': Consider navigation to be finished when the DOMContentLoaded event is fired.
- 'networkidle0': Consider navigation to be finished when there are no more than 0 network connections for at least 500 ms.
- 'networkidle2': Consider navigation to be finished when there are no more than 2 network connections for at least 500 ms.
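The "all events must fire" rule for an array of waitUntil values can be pictured with plain Node.js event emitters. This sketch is only an analogy under that assumption: `waitForAll` is a hypothetical helper, and a generic `EventEmitter` stands in for the page lifecycle that Puppeteer tracks internally.

```javascript
const { EventEmitter, once } = require('events')

// Resolves only after every listed event has fired at least once,
// regardless of the order in which the events arrive.
function waitForAll (emitter, eventNames) {
  return Promise.all(eventNames.map(name => once(emitter, name)))
}

// Usage with a stand-in emitter (not a real Puppeteer page):
// const lifecycle = new EventEmitter()
// const ready = waitForAll(lifecycle, ['load', 'domcontentloaded'])
// lifecycle.emit('domcontentloaded') // `ready` is still pending
// lifecycle.emit('load')             // `ready` resolves now
```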
.context()
It returns the BrowserContext associated with your instance.
const browserContext = await browserless.context()
console.log(browserContext.id)
// => 'D2CD28FDECB1859772B9C5919E563C84'
.withPage(fn, [options])
It returns a higher-order function as a convenient way to interact with a page:
const getTitle = browserless.withPage((page, goto) => async opts => {
const result = await goto(page, opts)
return page.title()
})
The function will be invoked in the following way:
const title = await getTitle({ url: 'https://example.com' })
fn
type: function
The function to be executed. It receives page, goto as arguments.
options
timeout
type: number
</br>
default: browserless.timeout
This setting will change the default maximum navigation time.
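As a small sketch of how the timeout option might be combined with withPage, the factory below builds a title getter with a custom timeout. `createGetTitle` is a hypothetical name, and the 10000 ms default is an arbitrary value chosen for the example; only `withPage(fn, options)` and Puppeteer's `page.title()` are relied upon.

```javascript
// Builds a helper that navigates to a URL and returns the page title.
// `browserless` is a browser context created with `browser.createContext()`.
// The 10000 ms default is an arbitrary choice for this example.
function createGetTitle (browserless, { timeout = 10000 } = {}) {
  return browserless.withPage(
    (page, goto) => async opts => {
      await goto(page, opts)
      return page.title()
    },
    { timeout }
  )
}

// Usage (assumes an existing `browserless` context):
// const getTitle = createGetTitle(browserless, { timeout: 5000 })
// const title = await getTitle({ url: 'https://example.com' })
```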
.page()
It returns a standalone Page associated with the current browser context.
const page = await browserless.page()
await page.content()
// => '<html><head></head><body></body></html>'
Extended
function
The @browserless/function package provides an isolated VM scope to run arbitrary JavaScript code with runtime access to a browser page:
const createFunction = require('@browserless/function')
const code = async ({ page }) => page.evaluate('jQuery.fn.jquery')
const version = createFunction(code)
const { isFulfilled, isRejected, value } = await version('https://jquery.com')
// => {
// isFulfilled: true,
// isRejected: false,
// value: '1.13.1'
// }
options
Besides the following properties, any other argument provided will be available during the code execution.
vmOpts
The hosted code is also running inside a secure sandbox created via vm2.
gotoOpts
Any goto#options can be passed for tuning the internal URL resolution.
lighthouse
The @browserless/lighthouse package provides the setup for running Lighthouse reports backed by browserless.
const createLighthouse = require('@browserless/lighthouse')
const createBrowser = require('browserless')
const { writeFile } = require('fs/promises')
const { onExit } = require('signal-exit')
const browser = createBrowser()
onExit(browser.close)
const lighthouse = createLighthouse(async teardown => {
const browserless = await browser.createContext()
teardown(() => browserless.destroyContext())
return browserless
})
const report = await lighthouse('https://microlink.io')
await writeFile('report.json', JSON.stringify(report, null, 2))
The report will be generated for the provided URL. This extends the lighthouse:default settings. These settings are similar to the Google Chrome Audits reports on Developer Tools.
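Once a JSON report is available, a compact summary is often easier to read than the full file. The sketch below assumes the report follows the standard Lighthouse result shape, where `report.categories` maps category ids to objects with a `title` and a `score` between 0 and 1 (or `null`); `summarizeScores` is a hypothetical helper and that shape is an assumption about what the package returns.

```javascript
// Assumes `report` follows the Lighthouse result shape, where
// `report.categories` maps ids to `{ title, score }` and score is 0..1 or null.
function summarizeScores (report) {
  return Object.entries(report.categories || {}).map(([id, { title, score }]) => ({
    id,
    title,
    // Convert to the familiar 0..100 scale, keeping null when no score exists
    score: score === null ? null : Math.round(score * 100)
  }))
}

// Usage (assumes `report` was produced as in the example above):
// console.table(summarizeScores(report))
```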
options
The Lighthouse configuration that will extend 'lighthouse:default' settings:
const report = await lighthouse(url, {
onlyAudits: ['accessibility']
})
Also, you can extend from a different preset of settings:
const report = await lighthouse(url, {
preset: 'desktop',
onlyAudits: ['accessibility']
})
The Lighthouse execution runs as a worker thread, so any worker#options are supported.
Additionally, you can set up:
logLevel
type: string
</br>
default: 'error'
</br>
values: 'silent' | 'error' | 'info' | 'verbose'
</br>
The level of logging to enable.
output
type: string | string[]
</br>
default: 'json'
</br>
values: 'json' | 'csv' | 'html'
The type(s) of report output to be produced.
timeout
type: number
</br>
default: browserless.timeout
This setting will change the default maximum navigation time.
screencast
The @browserless/screencast package allows you to capture each frame of a browser navigation using puppeteer.
This API is similar to screenshots, but you have more granular control over the frame and the output:
const createScreencast = require('@browserless/screencast')
const createBrowser = require('browserless')
const browser = createBrowser()
const browserless = await browser.createContext()
const page = await browserless.page()
const screencast = createScreencast(page, {
maxWidth: 1280,
maxHeight: 800
})
const frames = []
screencast.onFrame(data => frames.push(data))
screencast.start()
await browserless.goto(page, { url, waitForTimeout: 300 })
await screencast.stop()
console.log(frames)
Check a full example generating a GIF as output.
page
type: object
The Page object.
options
See Page.startScreencast to know all the options and values supported.
Packages
browserless is internally divided into multiple packages, so you only use the code you need.
FAQ
Q: Why use browserless over puppeteer?
browserless does not replace puppeteer; it complements it. It's just a syntactic sugar layer over official Headless Chrome, oriented toward production scenarios.
Q: Why do you block ads scripts by default?
Headless navigation is expensive compared to just fetching the content from a website.
To speed up the process, we block ad scripts by default because most of them are resource-intensive.
Q: My output is different from the expected
Probably browserless was too smart and blocked a request that you need.
You can activate debug mode using the DEBUG=browserless environment variable to see what is happening behind the scenes.
Consider opening an issue with the debug trace.
Q: I want to use browserless with my AWS Lambda like project
Yes, check chrome-aws-lambda to set up AWS Lambda with a compatible binary.
License
browserless © Microlink, released under the MIT License.<br> Authored and maintained by Microlink with help from contributors.
The logo has been designed by xinh studio.
microlink.io · GitHub microlinkhq · X @microlinkhq