Archived Repository - Do not use this repository anymore!
SANSA got easier to use! All its code has been consolidated into a single repository at https://github.com/SANSA-Stack/SANSA-Stack
SANSA OWL
Description
SANSA OWL is a library to read OWL files into Spark or Flink. It allows files to reside in HDFS as well as in a local file system and distributes them across Spark RDDs/Datasets or Flink DataSets.
Package Structure
The package contains three modules:
sansa-owl-common, which contains platform-independent, mostly parsing-specific functionality
sansa-owl-spark, which contains Spark-specific code
sansa-owl-flink, which contains Flink-specific code
SANSA OWL Spark
SANSA OWL Spark mainly contains builder objects to read OWL files in different formats. Currently, we support Functional Syntax, Manchester Syntax and OWL/XML. In addition, we are working on building OWL axioms from RDF serializations such as Turtle or N-Triples.
We support distributed representations of OWL files based on RDDs or Spark Datasets. These can contain either string-based representations of single entities of the given format, e.g. single Functional Syntax axiom descriptions like
DisjointDataProperties(bar:dataProp1 bar:dataProp2)
or whole Manchester Syntax frames like
ObjectProperty: bar:prop
    Characteristics:
        Asymmetric
or parsed OWL API axiom objects. We call these intermediate string-based entities 'expressions' and the corresponding distributed data structures 'expressions RDDs' or 'expressions datasets'. The final data structures holding OWL API axiom objects are called 'axiom RDDs' and 'axiom datasets', respectively.
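To make the distinction between expressions and axioms concrete, here is a minimal, local sketch in plain Scala. It is not SANSA's implementation: a Vector stands in for an RDD or Dataset, and the object `FunctionalExpressions`, the case class `SimpleAxiom` and all helper names are hypothetical. `SimpleAxiom` is a toy stand-in for an OWL API axiom object. The splitter assumes well-formed Functional Syntax whose parentheses never occur inside literals or IRIs.

```scala
// Hypothetical, local sketch of the "expressions" -> "axioms" stages.
// A Vector stands in for an RDD/Dataset; SimpleAxiom is a toy stand-in
// for an OWL API axiom object. This is not SANSA's implementation.
object FunctionalExpressions {

  // Toy axiom model: axiom type plus its top-level arguments.
  final case class SimpleAxiom(kind: String, args: List[String])

  // Split a Functional Syntax body into top-level expressions by tracking
  // parenthesis depth. Assumes well-formed input whose parentheses are not
  // inside literals or IRIs; Prefix(...) declarations are dropped.
  def splitExpressions(body: String): Vector[String] = {
    val out = Vector.newBuilder[String]
    val buf = new StringBuilder
    var depth = 0
    for (c <- body) {
      c match {
        case '(' =>
          depth += 1
          buf.append(c)
        case ')' =>
          depth -= 1
          buf.append(c)
          if (depth == 0) { // one complete top-level expression
            out += buf.toString.trim
            buf.clear()
          }
        case _ if depth == 0 && c.isWhitespace =>
          () // skip whitespace between top-level expressions
        case _ =>
          buf.append(c)
      }
    }
    out.result().filterNot(_.startsWith("Prefix("))
  }

  // Split on whitespace that is not nested inside parentheses.
  private def splitTopLevel(s: String): List[String] = {
    val parts = List.newBuilder[String]
    val buf = new StringBuilder
    var depth = 0
    for (c <- s) {
      if (c == '(') depth += 1
      if (c == ')') depth -= 1
      if (c.isWhitespace && depth == 0) {
        if (buf.length > 0) { parts += buf.toString; buf.clear() }
      } else buf.append(c)
    }
    if (buf.length > 0) parts += buf.toString
    parts.result()
  }

  // Turn one expression such as "DisjointDataProperties(bar:p1 bar:p2)"
  // into a SimpleAxiom; nested class expressions are kept as raw strings.
  def toAxiom(expr: String): SimpleAxiom = {
    val open = expr.indexOf('(')
    val inner = expr.substring(open + 1, expr.lastIndexOf(')'))
    SimpleAxiom(expr.substring(0, open).trim, splitTopLevel(inner))
  }
}
```

Calling `splitExpressions(body).map(toAxiom)` mirrors mapping an expressions RDD to an axiom RDD. In SANSA the second stage yields OWL API axiom objects rather than this toy case class.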
SANSA OWL Flink
SANSA OWL Flink mainly contains builder objects to read OWL files in different formats. Currently, we support Functional Syntax and Manchester Syntax. Parsing support for OWL/XML is planned for future releases. In addition, we are working on building OWL axioms from RDF serializations such as Turtle or N-Triples.
Distributed representations can contain either string-based representations of single entities of the given format, e.g. single Functional Syntax axiom descriptions like
DisjointDataProperties(bar:dataProp1 bar:dataProp2)
or whole Manchester Syntax frames like
ObjectProperty: bar:prop
    Characteristics:
        Asymmetric
or parsed OWL API axiom objects. We call these intermediate string-based entities 'expressions' and the corresponding distributed data structure 'expressions dataset'. The final data structure holding OWL API axiom objects is called 'axiom dataset'.
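A Manchester Syntax "expression" is a whole frame rather than a single axiom. The following plain-Scala sketch shows one way such frames could be cut out of a document. It is an illustration under stated assumptions, not SANSA's implementation: the object `ManchesterFrames` and its methods are hypothetical. A new frame is assumed to start at any line beginning with an OWL 2 Manchester Syntax frame keyword, and header lines before the first frame (such as Prefix: or Ontology:) are skipped.

```scala
// Hypothetical sketch: group Manchester Syntax lines into frame "expressions".
// Not SANSA's implementation; a new frame starts at a line beginning with a
// frame keyword, and lines before the first frame (Prefix:, Ontology:, ...)
// are skipped.
object ManchesterFrames {

  // Entity-frame and misc-frame keywords of the OWL 2 Manchester Syntax.
  private val frameKeywords = Seq(
    "Class:", "ObjectProperty:", "DataProperty:", "AnnotationProperty:",
    "Individual:", "Datatype:", "EquivalentClasses:", "DisjointClasses:",
    "EquivalentProperties:", "DisjointProperties:", "SameIndividual:",
    "DifferentIndividuals:")

  private def startsFrame(line: String): Boolean =
    frameKeywords.exists(k => line.trim.startsWith(k))

  def splitFrames(doc: String): Vector[String] = {
    val frames = Vector.newBuilder[String]
    var current = List.empty[String] // lines of the open frame, reversed
    def flush(): Unit =
      if (current.nonEmpty) {
        frames += current.reverse.mkString("\n")
        current = Nil
      }
    for (line <- doc.split("\\r?\\n") if line.trim.nonEmpty) {
      if (startsFrame(line)) {
        flush()
        current = List(line)
      } else if (current.nonEmpty) {
        current = line :: current // continuation of the open frame
      } // else: header line before the first frame, ignored
    }
    flush()
    frames.result()
  }
}
```

Each returned string corresponds to one expression in the sense used above, with blank lines removed. Turning these frames into axiom objects would additionally require a Manchester Syntax parser, which this sketch does not attempt.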
Usage
The following Scala code shows how to read an OWL file in Functional Syntax (be it a local file or a file residing in HDFS) into a Spark RDD:
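import net.sansa_stack.owl.spark.owl._
import org.apache.spark.sql.SparkSession
// Spark entry point; appName and master are example values
val spark = SparkSession.builder().appName("SANSA OWL example").master("local[*]").getOrCreate()
val syntax = Syntax.FUNCTIONAL
val input = "path/to/functional/syntax/file.owl" // local path or HDFS URI
// Read the OWL file into a distributed Spark RDD
val rdd = spark.owl(syntax)(input)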
We also provide builder objects for the other OWL formats and data structures described above. The same holds for the Flink implementations. An overview is given in the FAQ section of the SANSA project page. Further documentation about the builder objects can be found on the ScalaDoc page.
How to Contribute
We always welcome new contributors to the project! Please see our contribution guide for more details on how to get started contributing to SANSA.