Home

Awesome

DOM Distiller

DOM Distiller provides a better reading experience for articles and article-like web pages by extracting the core text and stripping non-essential from the page.

Projects and features powered by DOM Distiller:

DOM Distiller is loosely based on "Boilerpipe" by Christian Kohlschütter, Peter Fankhauser and Wolfgang Nejdl.

Report a bug

Bugs and feature requests are tracked in Chromium's issue tracker, crbug. DOM Distiller bugs are filed under component:UI>Browser>ReaderMode.

Examples of bugs that should be reported:

How to use Chrome's Reader Mode

Android

Reader Mode has launched on Android and should be available on any up-to-date version of Chrome. Simply visit a non-mobile-friendly article and tap on the "Show simplified view" infobar when it appears at the bottom of the screen. You may need to first enable the feature via accessibility settings.

Desktop

Reader Mode for Chrome on desktop is still in development. As of M80, an experimental preview of the feature can be activated by following these steps:

  1. Open Chrome on your desktop computer.
  2. Navigate to chrome://flags and search for "enable-reader-mode" with the in-page search box. Alternatively, you may go directly to the setting by visiting chrome://flags#enable-reader-mode.
  3. Click the dropdown box and select "Enabled".
  4. Click "Relaunch Now" at the bottom of the page when prompted.
  5. Visit an article or article-like page, and a Reader Mode icon should appear in the omnibox. Click the icon to enter Reader Mode.

Continuous integration

Build Status

CI waterfall

Environment setup

You must install the build dependencies before building for the first time. The following are required on all platforms:

Get the code

  1. Change to the directory where you want the code.

    • Do not put it inside your main Chromium checkout, i.e. chromium/src.
  2. Clone this git repo:

    git clone https://chromium.googlesource.com/chromium/dom-distiller
    

The code will be located inside the newly created dom-distiller folder.

Developing on Ubuntu/Debian

Install the dependencies by entering the dom-distiller folder and running:

sudo ./install-build-deps.sh

Developing on Mac OS X

  1. Install JDK 7 with your organization's software management tool, or download it from Oracle.

  2. Install Homebrew.

  3. Install ant and python using Homebrew:

    brew install ant python
    
  4. Install the protocol buffer compiler with Python bindings:

    brew install protobuf --with-python
    
  5. Create a folder named buildtools inside your DOM Distiller checkout.

  6. Download ChromeDriver.

  7. Unzip the chromedriver_mac32.zip and ensure the binary ends up in your buildtools folder.

  8. Install the PyPI package management tool pip:

    sudo easy_install pip
    
  9. Install selenium using pip:

    pip install --user selenium
    

This guide sometimes references a tool called xvfb, specifically when running shell commands with xvfb-run. You can remove that part of the command when developing on Mac OS X. For example, xvfb-run echo becomes echo.

Developing with Vagrant

Development is supported only on the above operating systems. We recommend using Vagrant for development on other systems, such as Windows or Red Hat Linux.

  1. Install Vagrant on your system. Version 1.7.2 or higher is recommended.

  2. Launch the Vagrant VM instance

    vagrant up
    
  3. SSH to the VM

    vagrant ssh
    
  4. Follow the steps for developing on Ubuntu/Debian.

Tools for contributing

DOM Distiller uses Chromium's collaboration tools. Code reviews are hosted on Chromium Gerrit, and you must install depot_tools by following the guide at Chrome infrastructure documentation for depot_tools.

Formatting code

You can run git cl format to update your code to follow DOM Distiller's code formatting guidelines. You must add the following symbolic links to the buildtools folder in your checkout for the command to work correctly:

Building

Using ant

ant is the tool we use to build. All available targets can be listed using ant -p.

Some important targets that you are likely to use while working on the project:

Contributing

You can use most regular git commands during development and git cl for collaboration.

Preparing changes for review

Create a new local branch and commit the changes you want to make. When you are done, please run git cl format to standardize the code format before uploading.

Uploading changes to Gerrit

Checkout your local branch with the changes you want to have reviewed and run git cl upload to create a change list (CL) at Chromium Gerrit.

The first time you do this, you will have to provide a username and password.

Landing your changes

Once your reviewer approves your changes, you can click "Submit to CQ" to land your changes.

Run in Chrome for desktop

  1. Verify that the following environment variables are set:

    export CHROME_SRC=/path/to/chromium/src
    export DOM_DISTILLER_DIR=/path/to/dom-distiller
    
  2. Run ant package and copy the generated files into Chrome. You can use this bash function to automate the process:

    roll-distiller () {
      (
        (cd $DOM_DISTILLER_DIR && ant package) && \
        rm -rf $CHROME_SRC/third_party/dom_distiller_js/dist/* && \
        cp -rf $DOM_DISTILLER_DIR/out/package/* $CHROME_SRC/third_party/dom_distiller_js/dist/ && \
        touch $CHROME_SRC/components/resources/dom_distiller_resources.grdp
      )
    }
    
  3. From $CHROME_SRC run GN to setup ninja build files using

    gn args out/Debug
    

Running the Chrome browser with distiller support

Build Chrome with the chrome target and run it with DOM Distiller enabled:

autoninja -C out/Debug chrome && out/Debug/chrome --enable-dom-distiller

You can distill web pages in any of the following ways:

To have a unique user profile every time you run Chrome, you can add --user-data-dir=/tmp/$(mktemp -d) as a command line parameter. On Mac OS X, you can instead write --user-data-dir=$(mktemp -d 2>/dev/null || mktemp -d -t 'chromeprofile').

Running the automated tests in Chromium

  1. Build the components_browsertests target:

    autoninja -C out/Debug components_browsertests
    
  2. Run the components_browsertests binary to execute the tests:

    out/Debug/components_browsertests
    

Some additional tips for running tests:

Additional documentation about testing in Chromium can be found on Google Test's GitHub page.

Running the content extractor

To extract the content from a web page directly, you can run

xvfb-run out/Debug/components_browsertests \
  --gtest_filter='*MANUAL_ExtractUrl' \
  --run-manual \
  --test-tiny-timeout=600000 \
  --output-file=./extract.out \
  --url=http://www.example.com \
  > ./extract.log 2>&1

extract.out has the extracted HTML, extract.log has the console logging.

If you need more logging, you can add the following arguments to the command:

If this is something you often do, you can put the following function in a bash file you include (for example ~/.bashrc) and use it for iterative development:

distill() {
  (
    roll-distiller && \
    autoninja -C out/Debug components_browsertests &&
    xvfb-run out/Debug/components_browsertests \
      --gtest_filter='*MANUAL_ExtractUrl' \
      --run-manual \
      --test-tiny-timeout=600000 \
      --output-file=./extract.out \
      --url=$1 \
      > ./extract.log 2>&1
  )
}

Usage when running from $CHROME_SRC:

distill http://example.com/article.html

Debugging

Interactive debugging

You can use the Chrome Developer Tools to debug DOM Distiller:

The Sources panel contains both the extracted JavaScript and the Java source files, as long as you haven't disabled JavaScript source maps in Developer Tools. You can set breakpoints in the Java source files to stop the code execution and examine a variety of useful information, such as variable values.

When a test fails, you will see several stack traces. One of these contains clickable links to the corresponding Java source files for the stack frames.

Developer extension

ant package generates an unpacked Chrome extension under out/extension, which you can add to the browser with the following steps:

  1. Go to chrome://extensions
  2. Enable developer mode
  3. Select to load an unpacked extension and point to the out/extension folder.

The extension currently supports profiling the extraction code.

It also adds a panel to the Developer Tools which you can use to trigger extraction on the inspected page. This can be used to trigger and profile extraction on a mobile device which you are currently inspecting using chrome://inspect.

Logging

Use LogUtil.logToConsole() to log information for debugging. Where the log output is stored varies with how DOM Distiller is run:

For an example, see $DOM_DISTILLER_DIR/java/org/chromium/distiller/PagingLinksFinder.java.

Use ant package '-Dgwt.custom.args=-style PRETTY' for easier JavaScript debugging.

Mobile distillation from desktop

  1. In the tab with the interesting URL, bring up the Developer Tools emulation panel (the mobile device icon).
  2. Select the desired Device and reload the page. Verify that you get what you expect. For example a Nexus 4 might get a mobile site, whereas Nexus 7 might get the desktop site.
  3. Copy the User-Agent from the UA field to the clipboard. This field does require reload after changing device, but it is good practice to verify that you get what you expect.
  4. Re-start chrome with --user-agent="$USER_AGENT_FROM_CLIPBOARD". Remember to also add --enable-dom-distiller.
  5. Select Toggle distilled page contents from the menu to display the distilled page.

If you want you can copy some of these User-Agent aliases into normal bash aliases for easy access later. For example, Nexus 4 would be:

--user-agent="Mozilla/5.0 (Linux; Android 4.2.1; en-us; Nexus 4 Build/JOP40D) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.166 Mobile Safari/535.19"