
!!!REPO DEPRECATED!!!

Use the headless-task-server backend and its PHP helper, headless-task-server-php, instead.

Why? The main reasons are:

- Hero (ex SecretAgent) provides a more stable, patched version of Chrome (not Chromium).

- Hero (ex SecretAgent) has many built-in techniques that make the headless browser undetectable to bot detectors.

PLAYWRIGHT task server

It's a Node.js server that holds a Playwright instance to process tasks (mainly crawling).

Concept:

Request format:

POST to "http://server_address:port/task"

Content-Type: application/x-www-form-urlencoded

If AUTH_KEY in config.json is not null, add the header

Authorization: HERE_AUTH_KEY

The form must contain a field named 'script'.

Example of request

!!!WARNING!!!

The script field must be a string.

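fetch("http://server_address:port/task", {
  method: "POST",
  headers: {
    "content-type": "application/x-www-form-urlencoded",
    "authorization": "HERE_AUTH_KEY"
  },
  // fetch does not serialize a plain object, so the form is encoded explicitly.
  // Assumption: the server accepts bracket-style nested keys (options[proxy][server]).
  body: new URLSearchParams({
    "options[proxy][server]": "PROTOCOL://ADDRESS:PORT",
    "options[proxy][bypass]": "",
    "options[proxy][username]": "USERNAME",
    "options[proxy][password]": "PASSWORD",
    "script": "HERE_IS_SCRIPT"
  })
});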
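The request rules above (POST to /task, form encoding, optional Authorization header, string-only script field) can be collected in a small helper. This is a minimal sketch, not part of the project: buildTaskRequest and its parameters are hypothetical names, and it only sends the script field.

```javascript
// Hypothetical helper (not part of the task server): builds fetch() arguments
// that follow the request rules described above.
function buildTaskRequest(baseUrl, script, authKey = null) {
  // The server expects the script as a string, so fail early otherwise.
  if (typeof script !== "string") {
    throw new TypeError("The script field must be a string");
  }

  const headers = { "content-type": "application/x-www-form-urlencoded" };

  // The Authorization header is only needed when AUTH_KEY is set on the server.
  if (authKey !== null) {
    headers["authorization"] = authKey;
  }

  // URLSearchParams produces a correctly escaped form body.
  const body = new URLSearchParams({ script }).toString();

  return { url: `${baseUrl}/task`, init: { method: "POST", headers, body } };
}

// Usage: fetch(request.url, request.init) would submit the task.
const request = buildTaskRequest(
  "http://server_address:port",
  "resolve({ ok: true });",
  "HERE_AUTH_KEY"
);
```

Encoding through URLSearchParams keeps characters such as `&`, `=`, and `+` inside the script from corrupting the form.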

Example of script (playwright docs)

// Create a page inside the context
const page = await context.newPage();

// Prepare the keys for data storage
let data = {
    hosts: [],
    res: [],
    ip: null
};

// Route handler that sees every request, logs its hostname,
// and blocks everything except HTML documents
await page.route('**', route => {

    // Uses modules.URL (the Node.js URL module)
    data.hosts.push(modules.URL.parse(route.request().url()).hostname);

    if (route.request().resourceType() !== 'document') {
        route.abort('aborted');
    } else {
        data.res.push(route.request().resourceType());
        route.continue();
    }
});

// Open the 2ip main page and wait for it to load
await page.goto('https://2ip.ru/');

// Extract the IP from the HTML (innerText() returns a promise, so it is awaited)
data.ip = await (await page.$('div.ip')).innerText();

// End script execution and return the data;
// reject can be called instead if the script fails
resolve(data);

The data variable is created locally and passed to resolve. Everything it contains is returned in the response. Any other variables or constants created inside the script are not included in the response.
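To illustrate why only the value passed to resolve reaches the response, here is a minimal sketch of how a script string could be executed with context, modules, resolve, and reject in scope. It is an assumption about the general technique, not the server's actual implementation; runTaskScript is a hypothetical name.

```javascript
// Hypothetical runner, NOT the server's real code: shows one way a script
// string could be executed with the names the example script relies on.
const AsyncFunction = Object.getPrototypeOf(async function () {}).constructor;

function runTaskScript(script, context, modules = {}) {
  return new Promise((resolve, reject) => {
    // The script text becomes the body of an async function, so `await` works
    // inside it and its local variables stay private to that function.
    const task = new AsyncFunction("context", "modules", "resolve", "reject", script);

    // An uncaught error inside the script rejects the task.
    task(context, modules, resolve, reject).catch(reject);
  });
}

// Usage with a stand-in context object instead of a real browser context:
// `hidden` never leaves the script, only the object passed to resolve does.
const pending = runTaskScript(
  "const hidden = 1; let data = { ip: '127.0.0.1' }; resolve(data);",
  {}
);
pending.then((result) => console.log(result));
```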

The task server also supports modules: a custom set of libraries that is available inside the context of the executed script.

config.json

Proxy

In the config, the proxy property can be null, an object, or "per-context" (default: "per-context"); see the docs. Example of a proxy object:

{
        "server": "hostname:port",
        "bypass": "",
        "username": "usernameForProxy",
        "password": "passwordForProxy"
}

Proxy per-context configuration docs

To set a GLOBAL proxy, use the ENV variables.

If the proxy does not require authorization, the username and password fields can be omitted or set to null.
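The three allowed proxy values could be normalized before they are handed to Playwright roughly as follows. This is a sketch under assumptions: toPlaywrightProxy is a hypothetical name, and how the server really maps the config is not documented here. The { server: "per-context" } form follows Playwright's convention for enabling proxies that are set per browser context.

```javascript
// Hypothetical normalizer, not taken from the server source.
function toPlaywrightProxy(proxy) {
  // null (or a missing value) means no proxy at all.
  if (proxy === null || proxy === undefined) {
    return undefined;
  }

  // "per-context": the real proxy is supplied later, for each context.
  if (proxy === "per-context") {
    return { server: "per-context" };
  }

  // Object form: keep the server, drop optional fields that are null or empty.
  const result = { server: proxy.server };
  for (const key of ["bypass", "username", "password"]) {
    if (proxy[key] !== null && proxy[key] !== undefined && proxy[key] !== "") {
      result[key] = proxy[key];
    }
  }
  return result;
}

// Usage: a proxy without authorization keeps only the server field.
const launchProxy = toPlaywrightProxy({
  server: "hostname:port",
  bypass: "",
  username: null,
  password: null
});
```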

Env

PW_TASK_KEY - Key for Authorization

PW_TASK_PORT - Running port

PW_TASK_PROXY - Proxy hostname:port

PW_TASK_USERNAME - Proxy username

PW_TASK_PASSWORD - Proxy password
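As a rough illustration, the variables above could be read into a settings object like this. The function name readEnvConfig and the shape of the returned object are assumptions; how the server actually merges ENV with config.json is not described in this document.

```javascript
// Hypothetical reader for the PW_TASK_* variables listed above.
function readEnvConfig(env = process.env) {
  // A global proxy exists only when PW_TASK_PROXY is set.
  const proxy = env.PW_TASK_PROXY
    ? {
        server: env.PW_TASK_PROXY,
        username: env.PW_TASK_USERNAME ?? null,
        password: env.PW_TASK_PASSWORD ?? null
      }
    : null;

  return {
    authKey: env.PW_TASK_KEY ?? null,
    port: env.PW_TASK_PORT ? Number(env.PW_TASK_PORT) : null,
    proxy
  };
}

// Usage with an explicit object instead of the real process environment.
const settings = readEnvConfig({
  PW_TASK_KEY: "HERE_AUTH_KEY",
  PW_TASK_PORT: "8080",
  PW_TASK_PROXY: "hostname:port"
});
```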

Additional

PHP library for generating simple task scripts (the library covers the minimum requirements).

todo