<h1 align="center"> <img alt="Lambda Soup" src="https://raw.githubusercontent.com/aantron/lambdasoup/master/docs/logo.png" width="250"> </img> <br> Lambda Soup </h1>
Lambda Soup is a functional HTML scraping and manipulation library for OCaml aimed at being easy to use.
<br><br>
<p align="center"> <img alt="Lambda Soup usage example" src="https://raw.githubusercontent.com/aantron/lambdasoup/master/docs/sample.gif"> </img> </p><br><br>
Lambda Soup is simple. It provides a set of
elementary traversals for getting from node to node, familiar
functional combinators such as filter, map, and fold, and
support for all CSS selectors that still make sense when not running in a
browser (and a few obvious extensions on top of that).
Here is a trivial self-contained example:
(parse "<p class='Hello'>World!</p>") $ ".Hello" |> R.leaf_text;;
- : string = "World!"
And a mutation:
let soup = parse "<p class='Hello'>World!</p>" in
wrap (soup $ ".Hello" |> R.child) (create_element "strong");
soup |> to_string;;
- : string = "<p class=\"Hello\"><strong>World!</strong></p>"
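The combinators mentioned earlier compose with selection in the same way. The following is only a minimal sketch, not taken from the library's documentation: the sample markup and the names marked and total are made up for illustration, and it assumes the lambdasoup package is available and Soup is opened.

```ocaml
open Soup

(* Hypothetical sample markup, used only for this illustration. *)
let soup =
  parse "<ul><li class='a'>one</li><li>two</li><li class='a'>three</li></ul>"

(* Select every <li>, keep the ones carrying class "a", and read their text. *)
let marked =
  soup $$ "li"
  |> filter (fun li -> List.mem "a" (classes li))
  |> to_list
  |> List.map R.leaf_text

(* Count all <li> elements with fold. *)
let total = soup $$ "li" |> fold (fun count _ -> count + 1) 0
```

Here marked collects the text of the two items with class a, and total counts all three list items.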
For some more examples, see the Lambda Soup postprocessor that
runs on Lambda Soup's own documentation after it is generated by
ocamldoc.
The library is tested thoroughly.
Lambda Soup is based on Markup.ml. As a consequence, it resolves entity references, detects character encodings automatically, and converts everything to UTF-8. You can also use Lambda Soup on XML by parsing the XML with Markup.ml and feeding the resulting signals to Lambda Soup.
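The XML route can be sketched roughly as follows. This is a minimal sketch under assumptions: the XML string and the names xml, xml_soup, and titles are invented for the example, and it relies on Markup.string, Markup.parse_xml, Markup.signals, and Soup.from_signals.

```ocaml
(* Hypothetical XML input, used only for this illustration. *)
let xml =
  "<catalog><book id='b1'>OCaml</book><book id='b2'>HTML</book></catalog>"

(* Parse with Markup.ml's XML parser, then hand the signal stream to
   Lambda Soup. *)
let xml_soup =
  Markup.string xml
  |> Markup.parse_xml
  |> Markup.signals
  |> Soup.from_signals

(* From here on, the usual selectors and traversals apply. *)
let titles =
  Soup.(xml_soup $$ "book" |> to_list |> List.map R.leaf_text)
```

To read from a file instead, Markup.file returns the stream paired with a close function, so the stream would need to be taken out of the pair with fst before parsing.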
<br/>Installing
opam install lambdasoup
<br/>
Starting from scratch
To use Lambda Soup interactively as in the GIF at the top of this README, you need to have done something like this:
your-package-manager install ocaml opam
opam init
eval `opam config env` # Or restart your shell
opam install lambdasoup
and make sure your ~/.ocamlinit file looks something like this:
let () =
  try Topdirs.dir_directory (Sys.getenv "OCAML_TOPLEVEL_PATH")
  with Not_found -> ()
;;
#use "topfind";;
Then, run ocaml -short-paths to start the top-level, and scrape away!
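Once the top-level is running, Lambda Soup still has to be loaded before the examples in this README will work. A minimal sketch of such a session, assuming the package was installed through opam under the findlib name lambdasoup:

```ocaml
(* Load Lambda Soup and its dependencies through findlib. *)
#require "lambdasoup";;

(* Bring parse, $, R, and the rest of the interface into scope. *)
open Soup;;

(* Quick check that everything is wired up. *)
parse "<p class='Hello'>World!</p>" $ ".Hello" |> R.leaf_text;;
```

The first two lines can also be appended to ~/.ocamlinit so that they run every time the top-level starts.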
Depending
Lambda Soup uses semantic versioning, but is currently in 0.x.x. For now, the
minor version number will be incremented on breaking changes. So, to give
yourself a chance to review the changelog before your code breaks, put the
following constraint on Lambda Soup: lambdasoup {< "0.7.0"}.
Documentation
Lambda Soup's interface consists of one module, Soup, whose signature is
documented here.
Developing
See CONTRIBUTING. All feedback is welcome – open an issue on
GitHub, or send me an email at antonbachin@yahoo.com. If you find
yourself repeatedly writing the same helper on top of Lambda Soup's functions,
perhaps we should add it to Lambda Soup.
History
Lambda Soup was originally written to answer a Stack Overflow question in November 2015.