Kik and I (@oryband) are no longer maintaining this repository. Thanks for all the contributions. You are welcome to fork and continue development.
BigQuery Streamer <img src="bigquery.png" alt="BigQuery" width="32">
Stream insert data into BigQuery fast and concurrently, using InsertAll().
Features
- Enqueue rows for multiple tables, datasets, and projects, and insert them in bulk. No need to manage data structures and sort rows by table: bqstreamer does it for you.
- Multiple background workers (i.e. goroutines) to enqueue and insert rows.
- Inserts can be performed in a blocking manner or in the background (asynchronously).
- Perform insert operations in predefined set sizes, according to BigQuery's quota policy.
- Handle and retry BigQuery server errors.
- Backoff interval between failed insert operations.
- Error reporting.
- Production ready, and thoroughly tested. We - at Rounds (now acquired by Kik) - are using it in our data gathering workflow.
- Thorough testing and documentation for great good!
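To illustrate the "predefined set sizes" and "backoff interval" features above, here is a minimal, standard-library-only sketch. It is not bqstreamer's code or API: `chunk`, `retryWithBackoff`, `insertFunc`, and `errSimulated` are hypothetical names, rows are plain strings, and the batch size and interval are arbitrary values chosen for the demo.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errSimulated stands in for a transient server error (hypothetical).
var errSimulated = errors.New("simulated server error")

// insertFunc stands in for a single insert request (hypothetical).
type insertFunc func() error

// chunk splits rows into consecutive batches of at most size rows each.
func chunk(rows []string, size int) [][]string {
	var batches [][]string
	for size > 0 && len(rows) > 0 {
		n := size
		if len(rows) < n {
			n = len(rows)
		}
		batches = append(batches, rows[:n])
		rows = rows[n:]
	}
	return batches
}

// retryWithBackoff calls insert up to maxAttempts times (at least once),
// sleeping for interval between failed attempts. It returns the number of
// attempts made and the last error, which is nil on success.
func retryWithBackoff(insert insertFunc, maxAttempts int, interval time.Duration) (int, error) {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = insert(); err == nil {
			return attempt, nil
		}
		if attempt < maxAttempts {
			time.Sleep(interval) // fixed backoff between failed attempts
		}
	}
	return maxAttempts, err
}

func main() {
	rows := []string{"a", "b", "c", "d", "e"}
	calls := 0
	for _, batch := range chunk(rows, 2) {
		attempts, err := retryWithBackoff(func() error {
			calls++
			if calls == 1 { // fail only the very first request
				return errSimulated
			}
			return nil
		}, 3, 10*time.Millisecond)
		fmt.Println(batch, attempts, err)
	}
}
```

A fixed interval keeps the sketch short; a real client might prefer an exponential or jittered backoff.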
Getting Started
- Install Go, version 1.5 or later.
- Clone this repository and download dependencies:
- Version v2:
go get gopkg.in/kikinteractive/go-bqstreamer.v2
- Version v1:
go get gopkg.in/kikinteractive/go-bqstreamer.v1
- Acquire Google OAuth2/JWT credentials, so you can authenticate with BigQuery.
How Does It Work?
There are two types of inserters you can use:
- SyncWorker, which is a single blocking (synchronous) worker.
  - It enqueues rows and performs insert operations in a blocking manner.
- AsyncWorkerGroup, which employs multiple background SyncWorkers.
  - The AsyncWorkerGroup enqueues rows, and its background workers pull and insert in a fan-out model.
  - An insert operation is executed according to row amount or time thresholds for each background worker.
  - Errors are reported to an error channel for processing by the user.
  - This provides a higher insert throughput for larger scale scenarios.
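The fan-out model described above can be sketched with plain channels and goroutines. This is only an illustration under assumptions, not the library's implementation or API: `row`, `flushFunc`, `worker`, `runGroup`, and `errInsert` are hypothetical names, and the flush function stands in for a real bulk insert request. The package's actual types are documented in the GoDoc.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// errInsert stands in for a failed insert request (hypothetical).
var errInsert = errors.New("simulated insert failure")

// row is a hypothetical stand-in for a row destined for some table.
type row struct {
	Table string
	Value string
}

// flushFunc is a hypothetical stand-in for one bulk insert request.
type flushFunc func(batch []row) error

// worker pulls rows from the shared channel and flushes its batch when it
// holds maxRows rows, on each tick of a maxDelay ticker, or when the rows
// channel is closed. Flush errors are sent to errs rather than returned.
// maxDelay must be positive.
func worker(rows <-chan row, errs chan<- error, flush flushFunc, maxRows int, maxDelay time.Duration, wg *sync.WaitGroup) {
	defer wg.Done()
	ticker := time.NewTicker(maxDelay)
	defer ticker.Stop()

	batch := make([]row, 0, maxRows)
	doFlush := func() {
		if len(batch) == 0 {
			return
		}
		if err := flush(batch); err != nil {
			errs <- err
		}
		batch = make([]row, 0, maxRows)
	}

	for {
		select {
		case r, ok := <-rows:
			if !ok { // channel closed: flush what is left and exit
				doFlush()
				return
			}
			batch = append(batch, r)
			if len(batch) >= maxRows { // row amount threshold
				doFlush()
			}
		case <-ticker.C: // time threshold
			doFlush()
		}
	}
}

// runGroup fans input out to n workers and returns every reported error
// once all workers have finished.
func runGroup(input []row, n, maxRows int, maxDelay time.Duration, flush flushFunc) []error {
	rows := make(chan row)
	errs := make(chan error)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go worker(rows, errs, flush, maxRows, maxDelay, &wg)
	}

	// Drain the error channel in the background, as a user would.
	var collected []error
	done := make(chan struct{})
	go func() {
		for err := range errs {
			collected = append(collected, err)
		}
		close(done)
	}()

	for _, r := range input {
		rows <- r
	}
	close(rows)
	wg.Wait()
	close(errs)
	<-done
	return collected
}

func main() {
	input := []row{{"t1", "a"}, {"t1", "b"}, {"", "c"}, {"t2", "d"}, {"t2", "e"}}
	errs := runGroup(input, 2, 2, time.Second, func(batch []row) error {
		for _, r := range batch {
			if r.Table == "" { // reject rows without a table
				return errInsert
			}
		}
		return nil
	})
	fmt.Println("errors reported:", len(errs))
}
```

Because every worker reads from the same channel, a slow insert in one worker does not stall the others, which is where the higher throughput comes from.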
Examples
Check the GoDoc examples section.
Contribute
- Please check the issues page.
- File new bugs and ask for improvements.
- Pull requests welcome!
Test
# Run unit tests and check coverage.
$ make test
# Run integration tests.
# This requires an active project, dataset and pem key.
$ export BQSTREAMER_PROJECT=my-project
$ export BQSTREAMER_DATASET=my-dataset
$ export BQSTREAMER_TABLE=my-table
$ export BQSTREAMER_KEY=my-key.json
$ make testintegration