Welcome to the real-time, cloud-native data pipeline platform based on Apache Kafka® and Kubernetes® that enables data and developer teams to unlock the full value of their data faster.

DataCater is a simple yet powerful approach to building modern, real-time data pipelines. According to our users' reports, data and dev teams save 40% of the time they spend crafting data pipelines and go from zero to production in a matter of minutes.

Users can choose from an extensive repository of filter functions, apply transformations, or code their own transforms in Python® to build their streaming data pipelines.

You can find every component in this repository. See the File Structure section for orientation.

Please watch the following video if you are interested in a demo of our 2023.1 release:

Use Cases

DataCater excels at:

Making real-time ETL pipelines accessible to data and developer teams
Supporting Python-based transforms for ETL and streaming use cases
Applying cloud-native principles to data development
Supporting a declarative pipeline definition, which enables DataOps and Continuous Delivery
Enabling the interactive development of ETL pipelines with minimal time to production

DataCater is not built for:

EL or ELT pipelines with post-load transforms
Analytics use cases that make use of aggregations or multiple joins
Traditional batch processing

File Structure

├── .github            - Workflows for GitHub
├── filters            - Pre-defined filters
├── gradle             - Build configuration based on Gradle (https://spring.io/guides/gs/gradle/)
├── helm-charts        - Source code for public Helm Charts
│   ├── ct.yaml        - Chart Testing Configuration File (https://github.com/helm/chart-testing)
│   └── datacater      - The official DataCater Helm Chart
├── k8s-manifests      - Kubernetes (K8s) resources
├── licenses           - Overview of the licenses of our dependencies
├── pipeline           - Reference implementation of a pipeline
├── platform-api       - The main application for DataCater's API
├── python-runner      - Our runner for Python-based filters and transforms
├── serde              - Our (de)serializers
├── transforms         - Pre-defined transforms
├── ui                 - A ReactJS application built on top of DataCater's API
├── CONTRIBUTING.md    - Describes how you can contribute to the project
├── gradle.properties  - Build properties
├── gradlew            - Build Wrapper Script (https://docs.gradle.org/current/userguide/gradle_wrapper.html)
├── README.md          - The file you are reading
└── settings.gradle    - Build tool properties

Requirements

Make sure you have the following readily available before you proceed with installing DataCater:

To start using DataCater

For the time being, we support the following approach for installing DataCater in your infrastructure:

Via kubectl

WARNING: Installation uses the default namespace!

The installation via kubectl uses the default namespace. If you wish to use a custom namespace, we recommend installing DataCater via the Helm Chart or creating the namespace upfront, as described in the FAQ below.

kubectl apply -f k8s-manifests/minikube-with-postgres-ns-default.yaml

Wait until all services are running:

kubectl get all --all-namespaces
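If you want to script this waiting step, the following Python sketch polls `kubectl get pods -o json` in the default namespace until every pod is Running with its Ready condition set to True (pods that already completed successfully also count as done). It relies only on the standard Kubernetes pod status fields; the function names, the polling interval, and the five-minute timeout are our own assumptions and not part of DataCater.

```python
import json
import subprocess
import time
from typing import Any, Dict


def pod_is_ready(pod: Dict[str, Any]) -> bool:
    """Return True if a pod completed successfully or is Running and Ready."""
    status = pod.get("status", {})
    phase = status.get("phase")
    if phase == "Succeeded":
        # One-off pods (e.g. created by Jobs) that finished are considered done.
        return True
    if phase != "Running":
        return False
    # A running pod is only usable once its "Ready" condition is "True".
    return any(
        cond.get("type") == "Ready" and cond.get("status") == "True"
        for cond in status.get("conditions", [])
    )


def all_pods_ready(pod_list: Dict[str, Any]) -> bool:
    """Return True if the PodList is non-empty and every pod in it is ready."""
    pods = pod_list.get("items", [])
    return bool(pods) and all(pod_is_ready(pod) for pod in pods)


def wait_for_pods(namespace: str = "default", timeout_s: float = 300.0,
                  interval_s: float = 5.0) -> bool:
    """Poll kubectl until all pods are ready; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = subprocess.run(
            ["kubectl", "get", "pods", "--namespace", namespace, "-o", "json"],
            check=True, capture_output=True, text=True,
        )
        if all_pods_ready(json.loads(result.stdout)):
            return True
        time.sleep(interval_s)
    return False
```

Calling `wait_for_pods()` right after `kubectl apply` blocks until the installation is up, and returns False if the pods are not ready within the timeout. An empty pod list counts as "not ready" on purpose, so the check does not pass before the manifests have created any pods.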

Port-forward to the datacater-ui service:

kubectl port-forward svc/datacater-ui 8080:80

Open localhost:8080 in your browser. The default login credentials are admin:admin (username admin, password admin).

Uninstalling DataCater

If you ever want to remove DataCater or start over, e.g., during development, we recommend the following steps, depending on the installation routine you have chosen:

WARNING: We recommend backing up your data before proceeding!

Via kubectl

kubectl delete -f k8s-manifests/minikube-with-postgres-ns-default.yaml

FAQ

How do I install DataCater into a dedicated namespace?

Create the namespace:

kubectl create namespace datacater

Apply the manifests with the namespace option:

kubectl apply --namespace=datacater -f <url>

How can I integrate DataCater with external data systems, like MySQL?

The open-core version of DataCater supports only Apache Kafka topics as sources and sinks for pipelines. If you need to integrate your pipelines with external data systems, please consider our Enterprise version, which offers connectors based on Kafka Connect. We can offer you a trial.

How can I extend the list of transforms and filters?

You can introduce new transforms and filters by adding a folder to the transforms or filters directory, respectively. The new folder must contain a spec.yml file and either a transform.py or a filter.py file.

DataCater automatically loads all transforms and filters from these directories at startup time.
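As an illustration, a minimal transform.py could look like the following sketch. We assume that DataCater calls a function named `transform` with the field value, the complete record (`row`), and the transform's configuration (`config`); please verify the exact signature, and the fields the accompanying spec.yml needs, in the documentation. The folder name `lowercase` is hypothetical.

```python
# transforms/lowercase/transform.py  (hypothetical folder name)
# Assumption: DataCater invokes transform(value, row, config) for each record.
from typing import Any, Dict


def transform(value: Any, row: Dict[str, Any], config: Dict[str, Any]) -> Any:
    """Lowercase string values and pass all other values through unchanged.

    value:  the value of the field the transform is applied to
    row:    the complete record, in case the transform needs other fields
    config: user-supplied settings (unused in this example)
    """
    if isinstance(value, str):
        return value.lower()
    return value
```

Under the same assumption, a filter.py would define a function that returns True to keep a record and False to drop it. Passing non-string values through unchanged keeps the transform safe to apply to nullable or numeric fields.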

Please see our documentation for further information.

How can I contribute code changes?

Please have a look at our guide for contributors.

How can I submit feature requests?

Please open an issue in our GitHub repository. We will review it to see whether it fits our product roadmap.

Do you offer a trial for the enterprise version?

Yes. Please reach out to support@datacater.io to discuss options for a proof-of-concept (PoC) project.

Which features are available in the Open Core and Enterprise versions?

Feature                           Open Core   Enterprise
API                               ✅
Interactive pipeline designer     ✅
Pre-defined transforms            ✅
Custom Python transforms          ✅
Pre-defined filters               ✅
Custom Python filters             ✅
Declarative pipeline definitions  ✅
User authentication               ✅
CLI (coming soon)                 ✅
Collaboration and projects                    ✅
Plug & play connectors                        ✅
Data masking                                  ✅
SAML/SSO                                      ✅
RBAC                                          ✅
Audit log                                     ✅
Health notifications                          ✅

Support

We provide support and help in our Community Slack.

License

DataCater is source-available and licensed under the BSL 1.1, converting to the open-source Apache 2.0 license 4 years after the release.