<p align="center"> <img alt="Addax Logo" src="https://github.com/wgzhao/Addax/blob/master/docs/images/logo.png?raw=true" width="205" /> </p> <p align="center">Addax is a versatile open-source ETL tool</p> <p align="center">The <a href="https://wgzhao.github.io/Addax">documentation</a> describes in detail how to install and use the plugins. It provides detailed instructions and sample configuration documentation for each plugin. </p> <p align="center"> <a href="https://github.com/wgzhao/Addax/releases"> <img src="https://img.shields.io/github/release/wgzhao/addax.svg" alt="release version"/> </a> <a href="https://github.com/wgzhao/Addax/workflows/Maven%20Package/badge.svg"> <img src="https://github.com/wgzhao/Addax/workflows/Maven%20Package/badge.svg" alt="Maven Package" /> </a> </p>

English | 简体中文

The project's initial code originated from Alibaba's DataX and has since been substantially improved. It also provides more reader and writer plugins. For details, see the difference document.

Supported Data Sources

Addax supports more than 20 SQL and NoSQL data sources. It can also be extended to support more.

<table> <tr> <td><img src="./docs/images/logos/cassandra.svg" height="50px" alt="Cassandra" style="border: 1px solid #ddd;"></td> <td><img src="./docs/images/logos/clickhouse.svg" height="50px" alt="Clickhouse" style="border: 1px solid #ddd;"></td> <td><img src="./docs/images/logos/databend.svg" height="50px" alt="DataBend" style="border: 1px solid #ddd;"></td> <td><img src="./docs/images/logos/db2.svg" height="50px" alt="IBM DB2" style="border: 1px solid #ddd;"></td> </tr> <tr> <td><img src="./docs/images/logos/dbase.svg" height="50px" alt="dBase" style="border: 1px solid #ddd;"></td> <td><img src="./docs/images/logos/doris.svg" height="50px" alt="Doris" style="border: 1px solid #ddd;"></td> <td><img src="./docs/images/logos/elasticsearch.svg" height="50px" alt="Elasticsearch" style="border: 1px solid #ddd;"></td> <td><img src="./docs/images/logos/excel.svg" height="50px" alt="Excel" style="border: 1px solid #ddd;"></td> </tr> <tr> <td><img src="./docs/images/logos/greenplum.svg" height="50px" alt="Greenplum" style="border: 1px solid #ddd;"></td> <td><img src="./docs/images/logos/hbase.svg" height="50px" alt="Apache HBase" style="border: 1px solid #ddd;"></td> <td><img src="./docs/images/logos/hive.svg" height="50px" alt="Hive" style="border: 1px solid #ddd;"></td> <td><img src="./docs/images/logos/influxdata.svg" height="50px" alt="InfluxDB" style="border: 1px solid #ddd;"></td> </tr> <tr> <td><img src="./docs/images/logos/kafka.svg" height="50px" alt="Kafka" style="border: 1px solid #ddd;"></td> <td><img src="./docs/images/logos/kudu.svg" height="50px" alt="Kudu" style="border: 1px solid #ddd;"></td> <td><img src="./docs/images/logos/minio.svg" height="50px" alt="MinIO" style="border: 1px solid #ddd;"></td> <td><img src="./docs/images/logos/mongodb.svg" height="50px" alt="MongoDB" style="border: 1px solid #ddd;"></td> </tr> <tr> <td><img src="./docs/images/logos/mysql.svg" height="50px" alt="MySQL" style="border: 1px solid #ddd;"></td> <td><img 
src="./docs/images/logos/oracle.svg" height="50px" alt="Oracle" style="border: 1px solid #ddd;"></td> <td><img src="./docs/images/logos/phoenix.svg" height="50px" alt="Phoenix" style="border: 1px solid #ddd;"></td> <td><img src="./docs/images/logos/postgresql.svg" height="50px" alt="PostgreSQL" style="border: 1px solid #ddd;"></td> </tr> <tr> <td><img src="./docs/images/logos/presto.svg" height="50px" alt="Presto" style="border: 1px solid #ddd;"></td> <td><img src="./docs/images/logos/redis.svg" height="50px" alt="Redis" style="border: 1px solid #ddd;"></td> <td><img src="./docs/images/logos/s3.svg" height="50px" alt="Amazon S3" style="border: 1px solid #ddd;"></td> <td><img src="./docs/images/logos/sqlite.svg" height="50px" alt="SQLite" style="border: 1px solid #ddd;"></td> </tr> <tr> <td><img src="./docs/images/logos/sqlserver.svg" height="50px" alt="SQLServer" style="border: 1px solid #ddd;"></td> <td><img src="./docs/images/logos/starrocks.svg" height="50px" alt="Starrocks" style="border: 1px solid #ddd;"></td> <td><img src="./docs/images/logos/sybase.svg" height="50px" alt="Sybase" style="border: 1px solid #ddd;"></td> <td><img src="./docs/images/logos/tdengine.svg" height="50px" alt="TDengine" style="border: 1px solid #ddd;"></td> </tr> <tr> <td><img src="./docs/images/logos/trino.svg" height="50px" alt="Trino" style="border: 1px solid #ddd;"></td> <td><img src="./docs/images/logos/access.svg" height="50px" alt="Access" style="border: 1px solid #add;"></td> <td><img src="./docs/images/logos/sap.svg" height="50px" alt="SAP HANA" style="border: 1px solid #add;"></td> </tr> </table>

Getting Started

Use the Docker image

docker pull quay.io/wgzhao/addax:latest
docker run -ti --rm --name addax quay.io/wgzhao/addax:latest /opt/addax/bin/addax.sh /opt/addax/job/job.json

If you only need the common reader and writer plugins, you can pull the image whose tag ends with -lite; it is much smaller.

docker pull quay.io/wgzhao/addax:latest-lite
docker run -ti --rm --name addax quay.io/wgzhao/addax:latest-lite /opt/addax/bin/addax.sh /opt/addax/job/job.json

See lite-vs-default.md for the differences between the default image and the lite image.
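The commands above run the sample job bundled in the image. To run a job file of your own, one option is to bind-mount it into the container. The sketch below only assembles the command line and does not execute it. The file name `my_job.json` and the helper `docker_run_command` are hypothetical, and placing the file under /opt/addax/job inside the container is an assumption based on the paths shown above.

```python
import shlex
from pathlib import Path


def docker_run_command(job_file: str,
                       image: str = "quay.io/wgzhao/addax:latest") -> list[str]:
    """Build a `docker run` argument list that mounts one job file read-only."""
    host_path = Path(job_file).resolve()
    # Assumption: any path inside the container works; we reuse the job
    # directory that the README's own commands refer to.
    container_path = f"/opt/addax/job/{host_path.name}"
    return [
        "docker", "run", "--rm",
        "-v", f"{host_path}:{container_path}:ro",
        image,
        "/opt/addax/bin/addax.sh", container_path,
    ]


if __name__ == "__main__":
    cmd = docker_run_command("my_job.json")
    # Print the command so it can be inspected or pasted into a shell.
    print(shlex.join(cmd))
    # To execute it instead: subprocess.run(cmd, check=True)
```

Passing `image="quay.io/wgzhao/addax:latest-lite"` builds the same command for the lite image.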

Use the install script

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/wgzhao/Addax/master/install.sh)"

This script installs Addax to its preferred prefix (/usr/local on Intel macOS, /opt/addax on Apple Silicon, and /opt/addax on Linux).

Compile and Package

git clone https://github.com/wgzhao/addax.git addax
cd addax
mvn clean package
mvn package assembly:single

After successful compilation and packaging, an addax-<version> folder is created in the target/datax directory of the project, where <version> is the version number.

Begin your first task

The job subdirectory contains many sample jobs. Among them, job.json can be used as a smoke test and is run as follows:

bin/addax.sh job/job.json

The output of the above command is roughly as follows.

<details> <summary>Click to expand</summary>
$ bin/addax.sh job/job.json
  ___      _     _
 / _ \    | |   | |
/ /_\ \ __| | __| | __ ___  __
|  _  |/ _` |/ _` |/ _` \ \/ /
| | | | (_| | (_| | (_| |>  <
\_| |_/\__,_|\__,_|\__,_/_/\_\

:: Addax version ::    (v4.0.13-SNAPSHOT)

2023-05-14 11:43:38.040 [        main] INFO  VMInfo               - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2023-05-14 11:43:38.062 [        main] INFO  Engine               -
{
	"setting":{
		"speed":{
			"byte":-1,
			"channel":1,
			"record":-1
		}
	},
	"content":{
		"reader":{
			"name":"streamreader",
			"parameter":{
				"sliceRecordCount":10,
				"column":[
					{
						"value":"addax",
						"type":"string"
					},
					{
						"value":19890604,
						"type":"long"
					},
					{
						"value":"1989-06-04 11:22:33 123456",
						"type":"date",
						"dateFormat":"yyyy-MM-dd HH:mm:ss SSSSSS"
					},
					{
						"value":true,
						"type":"bool"
					},
					{
						"value":"test",
						"type":"bytes"
					}
				]
			}
		},
		"writer":{
			"name":"streamwriter",
			"parameter":{
				"print":true,
				"encoding":"UTF-8"
			}
		}
	}
}

2023-05-14 11:43:38.092 [        main] INFO  JobContainer         - The jobContainer begins to process the job.
2023-05-14 11:43:38.107 [       job-0] INFO  JobContainer         - The Reader.Job [streamreader] perform prepare work .
2023-05-14 11:43:38.107 [       job-0] INFO  JobContainer         - The Writer.Job [streamwriter] perform prepare work .
2023-05-14 11:43:38.108 [       job-0] INFO  JobContainer         - Job set Channel-Number to 1 channel(s).
2023-05-14 11:43:38.108 [       job-0] INFO  JobContainer         - The Reader.Job [streamreader] is divided into [1] task(s).
2023-05-14 11:43:38.108 [       job-0] INFO  JobContainer         - The Writer.Job [streamwriter] is divided into [1] task(s).
2023-05-14 11:43:38.130 [       job-0] INFO  JobContainer         - The Scheduler launches [1] taskGroup(s).
2023-05-14 11:43:38.138 [ taskGroup-0] INFO  TaskGroupContainer   - The taskGroupId=[0] started [1] channels for [1] tasks.
2023-05-14 11:43:38.141 [ taskGroup-0] INFO  Channel              - The Channel set byte_speed_limit to -1, No bps activated.
2023-05-14 11:43:38.141 [ taskGroup-0] INFO  Channel              - The Channel set record_speed_limit to -1, No tps activated.
addax  19890604	1989-06-04 11:24:36	true	test
addax  19890604	1989-06-04 11:24:36	true	test
addax  19890604	1989-06-04 11:24:36	true	test
addax  19890604	1989-06-04 11:24:36	true	test
addax  19890604	1989-06-04 11:24:36	true	test
addax  19890604	1989-06-04 11:24:36	true	test
addax  19890604	1989-06-04 11:24:36	true	test
addax  19890604	1989-06-04 11:24:36	true	test
addax  19890604	1989-06-04 11:24:36	true	test
addax  19890604	1989-06-04 11:24:36	true	test
2023-05-14 11:43:41.157 [       job-0] INFO  AbstractScheduler    - The scheduler has completed all tasks.
2023-05-14 11:43:41.158 [       job-0] INFO  JobContainer         - The Writer.Job [streamwriter] perform post work.
2023-05-14 11:43:41.159 [       job-0] INFO  JobContainer         - The Reader.Job [streamreader] perform post work.
2023-05-14 11:43:41.162 [       job-0] INFO  StandAloneJobContainerCommunicator - Total 10 records, 260 bytes | Speed 86B/s, 3 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2023-05-14 11:43:41.596 [       job-0] INFO  JobContainer         -
Job start  at             : 2023-05-14 11:43:38
Job end    at             : 2023-05-14 11:43:41
Job took secs             :                  3ss
Average   bps             :               86B/s
Average   rps             :              3rec/s
Number of rec             :                  10
Failed record             :                   0
</details>
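The summary line near the end of the log above (`Total 10 records, 260 bytes | ...`) is handy for scripted checks. Here is a minimal sketch of extracting its numbers, assuming the line keeps the layout shown in this sample output; the format is not a documented interface and may differ between versions. The name `parse_summary` is hypothetical.

```python
import re

# Pattern derived only from the sample log line in this README.
SUMMARY_PATTERN = re.compile(
    r"Total (?P<records>\d+) records, (?P<bytes>\d+) bytes \| "
    r"Speed (?P<speed>\S+), (?P<rps>\d+) records/s \| "
    r"Error (?P<error_records>\d+) records, (?P<error_bytes>\d+) bytes"
)


def parse_summary(line: str):
    """Return the counters from a summary log line, or None if it does not match."""
    match = SUMMARY_PATTERN.search(line)
    if match is None:
        return None
    return {
        "records": int(match.group("records")),
        "bytes": int(match.group("bytes")),
        "speed": match.group("speed"),  # kept as text, e.g. "86B/s"
        "records_per_second": int(match.group("rps")),
        "error_records": int(match.group("error_records")),
        "error_bytes": int(match.group("error_bytes")),
    }


if __name__ == "__main__":
    sample = (
        "2023-05-14 11:43:41.162 [       job-0] INFO  "
        "StandAloneJobContainerCommunicator - Total 10 records, 260 bytes | "
        "Speed 86B/s, 3 records/s | Error 0 records, 0 bytes |  "
        "All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | "
        "Percentage 100.00%"
    )
    print(parse_summary(sample))
```

A wrapper script could, for example, fail a pipeline step when `error_records` is non-zero.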

Here and here provide a wide range of job configuration examples.
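The job definition echoed in the log above is plain JSON, so it can also be generated from a script. The following is a rough sketch that rebuilds a trimmed version of that streamreader/streamwriter job and writes it to a file. The helper names are hypothetical, and wrapping the definition in a top-level `job` key is an assumption about the file layout; compare with the bundled job/job.json before relying on it.

```python
import json
from pathlib import Path


def build_stream_job(record_count: int = 10) -> dict:
    """Return a job definition mirroring the one echoed in the sample log."""
    return {
        "setting": {"speed": {"byte": -1, "channel": 1, "record": -1}},
        "content": {
            "reader": {
                "name": "streamreader",
                "parameter": {
                    "sliceRecordCount": record_count,
                    "column": [
                        {"value": "addax", "type": "string"},
                        {"value": 19890604, "type": "long"},
                        {"value": True, "type": "bool"},
                    ],
                },
            },
            "writer": {
                "name": "streamwriter",
                "parameter": {"print": True, "encoding": "UTF-8"},
            },
        },
    }


def write_job_file(path: str, job: dict) -> Path:
    """Serialize the job definition to a JSON file and return its path."""
    # Assumption: the job file wraps the definition in a top-level "job" key.
    target = Path(path)
    target.write_text(json.dumps({"job": job}, indent=2), encoding="utf-8")
    return target


if __name__ == "__main__":
    written = write_job_file("my_job.json", build_stream_job(record_count=5))
    print(f"wrote {written}")
```

The resulting file could then be passed to bin/addax.sh in the same way as job/job.json.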

Runtime Requirements

Documentation

Compile

First, install the following Python 3 module:

python3 -m pip install mkdocs-material

You can then use the mkdocs command to build the documentation or preview it locally:

mkdocs build
mkdocs serve -a 0.0.0.0:8888

Use the following commands to publish the documentation for a release:

export version=4.1.5
git checkout $version
mike deploy $version
git checkout gh-pages
git push -u origin gh-pages

Code Style

We recommend you use IntelliJ as your IDE. The code style template for the project can be found in the codestyle repository along with our general programming and Java guidelines. In addition to those you should also adhere to the following:

License

This software is free to use under the Apache License.

Special Thanks

Special thanks to JetBrains for its support of this project.