Redpanda Connect
Redpanda Connect is a high performance and resilient stream processor, able to connect various sources and sinks in a range of brokering patterns and perform hydration, enrichments, transformations and filters on payloads.
It comes with a powerful mapping language, is easy to deploy and monitor, and ready to drop into your pipeline either as a static binary or docker image, making it cloud native as heck.
Redpanda Connect is declarative, with stream pipelines defined in as few as a single config file, allowing you to specify connectors and a list of processing stages:
input:
  gcp_pubsub:
    project: foo
    subscription: bar
pipeline:
  processors:
    - mapping: |
        root.message = this
        root.meta.link_count = this.links.length()
        root.user.age = this.user.age.number()
output:
  redis_streams:
    url: tcp://TODO:6379
    stream: baz
    max_in_flight: 20
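To make the mapping in that config concrete, here is a rough sketch in plain Go of what its three assignments do to a JSON document. It only illustrates the intended result and is not how Redpanda Connect evaluates mappings: the applyMapping helper and the sample input are made up for this example, and it assumes links is an array and user.age is a number or a numeric string.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// applyMapping is a hypothetical helper mirroring the mapping above:
//
//	root.message         = this
//	root.meta.link_count = this.links.length()
//	root.user.age        = this.user.age.number()
func applyMapping(input []byte) ([]byte, error) {
	var this map[string]any
	if err := json.Unmarshal(input, &this); err != nil {
		return nil, err
	}

	// this.links.length(): number of elements in the links array.
	links, ok := this["links"].([]any)
	if !ok {
		return nil, fmt.Errorf("links is not an array")
	}

	// this.user.age.number(): coerce a number or numeric string to a number.
	user, ok := this["user"].(map[string]any)
	if !ok {
		return nil, fmt.Errorf("user is not an object")
	}
	var age float64
	switch v := user["age"].(type) {
	case float64:
		age = v
	case string:
		parsed, err := strconv.ParseFloat(v, 64)
		if err != nil {
			return nil, fmt.Errorf("user.age is not numeric: %w", err)
		}
		age = parsed
	default:
		return nil, fmt.Errorf("user.age has unsupported type %T", v)
	}

	// root is the new document; the original is nested under "message".
	root := map[string]any{
		"message": this,
		"meta":    map[string]any{"link_count": len(links)},
		"user":    map[string]any{"age": age},
	}
	return json.Marshal(root)
}

func main() {
	in := []byte(`{"links":["a","b"],"user":{"age":"27"}}`)
	out, err := applyMapping(in)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

In a real pipeline none of this code is needed; the mapping processor does the work declaratively.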
Delivery Guarantees
Delivery guarantees can be a dodgy subject. Redpanda Connect processes and acknowledges messages using an in-process transaction model with no need for any disk persisted state, so when connecting to at-least-once sources and sinks it's able to guarantee at-least-once delivery even in the event of crashes, disk corruption, or other unexpected server faults.
This behaviour is the default and free of caveats, which also makes deploying and scaling Redpanda Connect much simpler.
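The core idea, acknowledging a message to its source only once the sink has accepted it, can be sketched in a few lines of Go. This is a deliberately simplified illustration and not Redpanda Connect's actual implementation: the message type, the ack callback, forward, and errSinkDown are all hypothetical names invented here.

```go
package main

import (
	"errors"
	"fmt"
)

// errSinkDown is a hypothetical error standing in for any failed write.
var errSinkDown = errors.New("sink unavailable")

// message pairs a payload with the acknowledgement callback of its source.
// Calling ack(nil) lets the source discard the message; calling it with an
// error asks the source to redeliver.
type message struct {
	payload string
	ack     func(err error)
}

// forward writes each message to the sink and only then acknowledges it,
// passing the write result straight back to the source.
func forward(msgs []message, write func(payload string) error) {
	for _, m := range msgs {
		m.ack(write(m.payload))
	}
}

func main() {
	// A fake source that records which payloads were acknowledged.
	acked := map[string]bool{}
	var msgs []message
	for _, p := range []string{"a", "b", "c"} {
		p := p
		msgs = append(msgs, message{
			payload: p,
			ack:     func(err error) { acked[p] = err == nil },
		})
	}

	// A fake sink that rejects "b".
	write := func(payload string) error {
		if payload == "b" {
			return errSinkDown
		}
		return nil
	}

	forward(msgs, write)
	fmt.Println(acked["a"], acked["b"], acked["c"]) // true false true
}
```

Because nothing is acknowledged before the write returns, a crash between read and write leaves the message unacknowledged at the source, which can then redeliver it. That is why no local disk state is needed in this model.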
Supported Sources & Sinks
AWS (DynamoDB, Kinesis, S3, SQS, SNS), Azure (Blob storage, Queue storage, Table storage), GCP (Pub/Sub, Cloud Storage, BigQuery), Kafka, NATS (JetStream, Streaming), NSQ, MQTT, AMQP 0.9.1 (RabbitMQ), AMQP 1, Redis (streams, list, pubsub, hashes), Cassandra, Elasticsearch, HDFS, HTTP (server and client, including websockets), MongoDB, SQL (MySQL, PostgreSQL, ClickHouse, MSSQL), and you know what, just check the documentation site to see them all; they don't fit in a README.
Documentation
If you want to dive fully into Redpanda Connect then don't waste your time in this dump, check out the documentation site.
For guidance on building your own custom plugins in Go check out the public APIs.
Install
Install on Linux:
curl -LO https://github.com/redpanda-data/redpanda/releases/latest/download/rpk-linux-amd64.zip
unzip rpk-linux-amd64.zip -d ~/.local/bin/
Or use Homebrew:
brew install redpanda-data/tap/redpanda
Or pull the docker image:
docker pull docker.redpanda.com/redpandadata/connect
For more information check out the getting started guide.
Run
rpk connect run ./config.yaml
Or, with docker:
# Using a config file
docker run --rm -v /path/to/your/config.yaml:/connect.yaml docker.redpanda.com/redpandadata/connect run
# Using a series of -s flags
docker run --rm -p 4195:4195 docker.redpanda.com/redpandadata/connect run \
  -s "input.type=http_server" \
  -s "output.type=kafka" \
  -s "output.kafka.addresses=kafka-server:9092" \
  -s "output.kafka.topic=redpanda_topic"
Monitoring
Health Checks
Redpanda Connect serves two HTTP endpoints for health checks:
- /ping can be used as a liveness probe as it always returns a 200.
- /ready can be used as a readiness probe as it serves a 200 only when both the input and output are connected, otherwise a 503 is returned.
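If you want to poll these endpoints from your own tooling, a minimal Go sketch might look like the following. It assumes the instance is reachable at http://localhost:4195 (the port published in the docker examples in this README); probe and isHealthy are hypothetical helper names, not part of any Redpanda Connect API.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// isHealthy treats only a 200 as success; /ready answers 503 while the
// input or output is not yet connected.
func isHealthy(status int) bool {
	return status == http.StatusOK
}

// probe sends a GET to baseURL+path and reports whether it answered 200.
func probe(client *http.Client, baseURL, path string) (bool, error) {
	resp, err := client.Get(baseURL + path)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	return isHealthy(resp.StatusCode), nil
}

func main() {
	// Assumption: a local instance with its HTTP server on port 4195.
	const baseURL = "http://localhost:4195"
	client := &http.Client{Timeout: 2 * time.Second}

	for _, path := range []string{"/ping", "/ready"} {
		ok, err := probe(client, baseURL, path)
		if err != nil {
			fmt.Printf("%s: request failed: %v\n", path, err)
			continue
		}
		fmt.Printf("%s: healthy=%v\n", path, ok)
	}
}
```

In Kubernetes and similar platforms you would normally point the built-in liveness and readiness probes at these paths instead of writing a client yourself.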
Metrics
Redpanda Connect exposes lots of metrics to StatsD, Prometheus, a JSON HTTP endpoint, and more.
Tracing
Redpanda Connect also emits OpenTelemetry tracing events, which can be used to visualise the processors within a pipeline.
Configuration
Redpanda Connect provides lots of tools for making configuration discovery, debugging and organisation easy. You can read about them in the configuration docs.
Build
Build with Go (any currently supported version):
git clone git@github.com:redpanda-data/connect
cd connect
make
Lint
Redpanda Connect uses golangci-lint for linting, which you can install with:
curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sh -s -- -b $(go env GOPATH)/bin
And then run it with make lint.
Plugins
It's pretty easy to write your own custom plugins for Redpanda Connect in Go. For information check out the API docs, and for inspiration there's an example repo demonstrating a variety of plugin implementations.
Extra Plugins
By default Redpanda Connect does not build with components that require linking to external libraries, such as the zmq4 input and output. If you wish to build Redpanda Connect locally with these dependencies then set the build tag x_benthos_extra:
# With go
go install -tags "x_benthos_extra" github.com/redpanda-data/connect/v4/cmd/redpanda-connect@latest
# Using make
make TAGS=x_benthos_extra
Note that this tag may change or be broken out into granular tags for individual components outside of major version releases. If you attempt a build and these dependencies are not present you'll see error messages such as ld: library not found for -lzmq.
Docker Builds
There's a multi-stage Dockerfile for creating a Redpanda Connect docker image which results in a minimal image from scratch. You can build it with:
make docker
Then use the image:
docker run --rm \
  -v /path/to/your/benthos.yaml:/config.yaml \
  -v /tmp/data:/data \
  -p 4195:4195 \
  docker.redpanda.com/redpandadata/connect run /config.yaml
Contributing
Contributions are welcome! To prevent CI errors, please always make sure a pull request has been:
- Unit tested with make test
- Linted with make lint
- Formatted with make fmt
Note: most integration tests need to spin up Docker containers, so they are skipped by make test. You can trigger them individually via:
go test -run "^Test.*Integration.*$" ./internal/impl/<connector directory>/...
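The -run flag takes a regular expression that is matched against test names, so it can help to see which names the expression from that go test command selects. Here's a small Go sketch; the test names in it are hypothetical and only there for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// integrationPattern is the expression passed to -run in the command above.
var integrationPattern = regexp.MustCompile(`^Test.*Integration.*$`)

// selected reports whether a top-level test name matches the pattern.
func selected(testName string) bool {
	return integrationPattern.MatchString(testName)
}

func main() {
	// Hypothetical test names, not taken from the repository.
	names := []string{
		"TestIntegrationKafka",
		"TestKafkaIntegrationBatching",
		"TestConfigParsing",
	}
	for _, name := range names {
		fmt.Printf("%-30s selected=%v\n", name, selected(name))
	}
}
```

Any test whose name starts with Test and contains Integration is picked up, which is why a plain unit test such as the last name in the list is left out.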