FlexRML - A Flexible RML Processor
FlexRML provides a robust RML processing solution tailored for different devices. Whether you're working with microcontrollers, single-board computers, consumer hardware, or cloud environments, FlexRML ensures seamless integration and efficient processing.
Description
RML (RDF Mapping Language) is central to data transformation and knowledge graph construction. FlexRML is a flexible RML processor optimized for a wide range of devices:
- Microcontrollers
- Single Board Computers
- Consumer Hardware
- Cloud Environments
Currently, FlexRML only supports data in CSV format. However, future versions will include support for additional data formats such as JSON and XML.
Installation
Using Prebuilt Binaries
Prebuilt binaries for various systems are available in the releases section.
Compiling from Source
Prerequisites
Before compilation, set up a build environment on your system. On Debian-based systems, this can be done using:
apt install build-essential cmake git curl zip unzip tar
Additionally, ensure that you have vcpkg installed, as it is used to manage dependencies.
Compilation Process:
- Clone or download the repository from GitHub and navigate to the project directory.
git clone git@github.com:wintechis/flex-rml.git
cd flex-rml
- Install vcpkg as the package manager. If you haven't installed vcpkg, clone it from GitHub and bootstrap it.
git clone https://github.com/microsoft/vcpkg.git
./vcpkg/bootstrap-vcpkg.sh
- Configure the project with CMake. Specify the vcpkg toolchain file and, if necessary, the paths to dependencies. Note: adjust the path to vcpkg to match your system.
cmake -B build -S . -DCMAKE_TOOLCHAIN_FILE=/path/to/vcpkg/scripts/buildsystems/vcpkg.cmake
- Compile the project
cmake --build build
- After compilation, the executable flexrml is available in the build directory. See Getting Started for how to run it.
Troubleshooting
- If you encounter errors during the CMake configuration, ensure the paths to serd and cityhash are correctly specified.
- Make sure your system has the correct C++ compiler installed (GCC or Clang).
- Clean the build directory if you face repeated configuration issues:
rm -rf build/*
Getting Started
Depending on the use case and the environment in which FlexRML runs, different configurations are useful.
Fastest Execution Speed
This mode prioritizes execution speed at the expense of increased memory usage. It uses a 128-bit hash function to identify duplicates and skips the result size estimation step. To use this mode, run the following command:
./flexrml -m [path] -d -t
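The idea of identifying duplicates by hash can be sketched as follows. This is a minimal illustration, not FlexRML's implementation: the function name dedupe is hypothetical, and BLAKE2b truncated to 128 bits stands in for whatever 128-bit hash FlexRML actually uses. Only the fixed-size digest of each statement is kept in memory, not the statement itself.

```python
import hashlib


def dedupe(statements):
    """Yield each statement once, remembering only a 128-bit digest per statement."""
    seen = set()  # holds 16-byte digests, not the full statement strings
    for statement in statements:
        # Stand-in hash (assumption): BLAKE2b with a 16-byte (128-bit) digest.
        digest = hashlib.blake2b(statement.encode("utf-8"), digest_size=16).digest()
        if digest in seen:
            continue  # same digest already emitted: treat as a duplicate
        seen.add(digest)
        yield statement


if __name__ == "__main__":
    sample = ["<a> <b> <c> .", "<a> <b> <d> .", "<a> <b> <c> ."]
    for line in dedupe(sample):
        print(line)
```

A wider hash costs more memory per entry but makes it less likely that two different statements collide and one is wrongly dropped.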
Lowest Memory Consumption
This mode is optimized for minimal memory usage by using a result size estimator to approximate the number of N-Quads generated. Although this process takes more time due to the additional computation, it conserves memory, in particular when the estimated number of N-Quads is less than 135,835,773. This approach is beneficial in memory-constrained environments. To enable this mode, use the following command:
./flexrml -m [path] -d -t -a
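This README does not say where the figure 135,835,773 comes from. One reading, offered here as an assumption rather than a statement about FlexRML's internals, is a birthday-bound calculation. With a 64-bit hash and an accepted collision probability of 0.05%, the largest number of items that can be hashed is about that value. The sketch below works through the arithmetic; the function name and the two parameters (64 bits, 0.0005) are chosen for illustration because they reproduce the number.

```python
import math


def max_items_for_collision_probability(hash_bits, probability):
    """Approximate item count at which the birthday-bound collision
    probability for a hash of `hash_bits` bits reaches `probability`."""
    space = 2 ** hash_bits
    # Birthday approximation: p = 1 - exp(-n^2 / (2 * space)), solved for n.
    return math.sqrt(2 * space * math.log(1 / (1 - probability)))


# Assumed parameters: 64-bit hash, 0.05% collision probability.
limit_64 = max_items_for_collision_probability(64, 0.0005)
print(int(limit_64))  # prints 135835773
```

Under this reading, an estimate below the threshold would allow a hash narrower than 128 bits, which is where the memory saving would come from.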
More information about the available flags can be found on the wiki.
Example
In the example folder, there is a mapping.ttl file that contains RML rules for mapping sensor data to RDF, and a sensor_values.csv file.
The sensor_values.csv contains:
id | name | value | unit |
---|---|---|---|
10 | Sensor1 | 24 | C |
20 | Sensor2 | 72.2 | F |
30 | Sensor3 | 34 | C |
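To illustrate what mapping such rows to RDF means, the sketch below turns each CSV row into N-Quads-style statements. It does not reproduce the rules in mapping.ttl or FlexRML's actual output. The http://example.org/ namespace, the choice of one statement per column, and the comma-separated layout of the file are all assumptions made for this example.

```python
import csv
import io

# Assumed file content: the table above in comma-separated form.
CSV_TEXT = """id,name,value,unit
10,Sensor1,24,C
20,Sensor2,72.2,F
30,Sensor3,34,C
"""

BASE = "http://example.org/"  # hypothetical namespace, not taken from mapping.ttl


def rows_to_statements(csv_text, base=BASE):
    """Build one statement per (row, column) pair, using the id column as subject."""
    statements = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{base}sensor/{row['id']}>"
        for column in ("name", "value", "unit"):
            predicate = f"<{base}{column}>"
            # Escape backslashes and quotes so the literal stays well-formed.
            literal = row[column].replace("\\", "\\\\").replace('"', '\\"')
            statements.append(f'{subject} {predicate} "{literal}" .')
    return statements


if __name__ == "__main__":
    for statement in rows_to_statements(CSV_TEXT):
        print(statement)
```

A real RML mapping expresses the subject template and the predicate-object pairs declaratively in Turtle rather than in code.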
From the example folder, run:
./flexrml -m ./mapping.ttl -o output_file.nq -d
The resulting RDF graph is written to output_file.nq.
Conformance
FlexRML is validated against applicable RML test cases to ensure conformance with the specification. Currently, only CSV-related test cases are applicable.
Specification | Coverage |
---|---|
RML-Core | 100% Coverage |
RML-IO | Work in Progress |
RML-CC | Work in Progress |
Planned Features for FlexRML
We are constantly working to improve FlexRML and expand its capabilities. Here's what we have planned for the future development of FlexRML:
- Add Support for Other Data Encodings: Enhance FlexRML to work with various data formats.
  - JSON
    - Add JSON reader and JSONPath parser
    - Adjust generation of index for hash join to JSON
    - Adjust result size estimation to JSON
  - XML
- Add Support for N-Triples RDF Serialization: Implement N-Triples format compatibility for broader RDF serialization options.
- Improve Performance of Join Algorithm: Optimize the current join algorithm for faster and more efficient data processing.
- Provide Library for Arduinos: Develop a specialized library to make FlexRML easier to use on Arduino devices, expanding its use in IoT applications.
- Support Latest RML Vocabulary: Modify the parsing of RML rules to allow the new RML vocabulary to be used.
We welcome community feedback and contributions! If you have suggestions or want to contribute to any of these features, please let us know through GitHub issues.
ESP32 Compatible Version
For those working with ESP32, we have a dedicated version of this project. It's tailored specifically for compatibility with ESP32 and the Arduino IDE. You can access it and find detailed instructions for setup and use at the following link: FlexRML ESP32 Repository
JavaScript Compatible Version
For those working with JavaScript, we have created a WebAssembly version of FlexRML. FlexRML-node is published on npm.
Citation
If you use this work in your research, please cite it as:
@article{Freund_FlexRML_A_Flexible_2024,
author = {Freund, Michael and Schmid, Sebastian and Dorsch, Rene and Harth, Andreas},
journal = {Extended Semantic Web Conference},
title = {{FlexRML: A Flexible and Memory Efficient Knowledge Graph Materializer}},
year = {2024}
}
Licenses
Project License
This project is licensed under the GNU Affero General Public License version 3 (AGPLv3). The full text of the license can be found in the LICENSE file in this repository.
External Libraries
This project uses external libraries:
- Serd is licensed under the ISC License.
- CityHash is licensed under the MIT License.
- ArduinoJson is licensed under the MIT License.