# S3Sync

Really fast sync tool for S3.
## Features
- Multi-threaded file downloading/uploading
- Can sync in multiple directions:
  - S3 to local FS
  - Local FS to S3
  - S3 to S3
- Retries on errors
- Live statistics
- Rate limiting by objects per second
- Rate limiting by bandwidth
- Flexible filters by extension, Content-Type, ETag and object mtime
Key feature: very high speed. Average S3 listing speed is around 5k objects/sec; with 128 workers, average sync speed is around 2k objects/sec for small (1-20 KB) objects, limited by a 1 Gbit uplink.
## Limitations
- Each object is loaded into RAM, so you need roughly `<avg object size> * <workers count>` of memory. If you don't have enough RAM, you can use swap: a large (32-64 GB) swap on SSD does not noticeably affect the tool's performance. This is by design: the tool was built to synchronize billions of small files and is optimized for that workload.
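For example, with 128 workers and an average object size of 8 MB, expect roughly 128 × 8 MB = 1 GB of buffer memory; with 1 MB objects, roughly 128 MB.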
## Usage
```
>> s3sync --help
Really fast sync tool for S3
VersionId: dev, commit: none, built at: unknown
Usage: cli [--sn] [--sk SK] [--ss SS] [--st ST] [--sr SR] [--se SE] [--tn] [--tk TK] [--ts TS] [--tt TT] [--tr TR] [--te TE] [--s3-retry S3-RETRY] [--s3-retry-sleep S3-RETRY-SLEEP] [--s3-acl S3-ACL] [--s3-storage-class S3-STORAGE-CLASS] [--s3-keys-per-req S3-KEYS-PER-REQ] [--fs-file-perm FS-FILE-PERM] [--fs-dir-perm FS-DIR-PERM] [--fs-disable-xattr] [--fs-atomic-write] [--filter-ext FILTER-EXT] [--filter-not-ext FILTER-NOT-EXT] [--filter-ct FILTER-CT] [--filter-not-ct FILTER-NOT-CT] [--filter-after-mtime FILTER-AFTER-MTIME] [--filter-before-mtime FILTER-BEFORE-MTIME] [--filter-modified] [--filter-exist] [--filter-not-exist] [--filter-dirs] [--filter-not-dirs] [--workers WORKERS] [--debug] [--sync-log] [--sync-log-format SYNC-LOG-FORMAT] [--sync-progress] [--on-fail ON-FAIL] [--error-handling ERROR-HANDLING] [--disable-http2] [--list-buffer LIST-BUFFER] [--ratelimit-objects RATELIMIT-OBJECTS] [--ratelimit-bandwidth RATELIMIT-BANDWIDTH] SOURCE TARGET
Positional arguments:
  SOURCE
  TARGET

Options:
  --sn                   Don't sign requests to source AWS (anonymous access)
  --sk SK                Source AWS key
  --ss SS                Source AWS secret
  --st ST                Source AWS session token
  --sr SR                Source AWS Region
  --se SE                Source AWS Endpoint
  --tn                   Don't sign requests to target AWS (anonymous access)
  --tk TK                Target AWS key
  --ts TS                Target AWS secret
  --tt TT                Target AWS session token
  --tr TR                Target AWS Region
  --te TE                Target AWS Endpoint
  --s3-retry S3-RETRY    Max number of retries to sync a file
  --s3-retry-sleep S3-RETRY-SLEEP
                         Sleep interval (sec) between sync retries on error
  --s3-acl S3-ACL        S3 ACL for uploaded files. Possible values: private, public-read, public-read-write, aws-exec-read, authenticated-read, bucket-owner-read, bucket-owner-full-control
  --s3-storage-class S3-STORAGE-CLASS
                         S3 Storage Class for uploaded files
  --s3-keys-per-req S3-KEYS-PER-REQ
                         Max number of keys retrieved per List request [default: 1000]
  --fs-file-perm FS-FILE-PERM
                         File permissions [default: 0644]
  --fs-dir-perm FS-DIR-PERM
                         Dir permissions [default: 0755]
  --fs-disable-xattr     Disable FS xattr for storing metadata
  --fs-atomic-write      Enable FS atomic writes: new files are written to a temp file and then renamed
  --filter-ext FILTER-EXT
                         Sync only files with the given extensions
  --filter-not-ext FILTER-NOT-EXT
                         Skip files with the given extensions
  --filter-ct FILTER-CT
                         Sync only files with the given Content-Type
  --filter-not-ct FILTER-NOT-CT
                         Skip files with the given Content-Type
  --filter-after-mtime FILTER-AFTER-MTIME
                         Sync only files modified after the given Unix timestamp
  --filter-before-mtime FILTER-BEFORE-MTIME
                         Sync only files modified before the given Unix timestamp
  --filter-modified      Sync only modified files
  --filter-exist         Sync only files that exist in the target storage
  --filter-not-exist     Sync only files that don't exist in the target storage
  --filter-dirs          Sync only files that end with a slash (/)
  --filter-not-dirs      Skip files that end with a slash (/)
  --workers WORKERS, -w WORKERS
                         Workers count [default: 16]
  --debug, -d            Show debug logging
  --sync-log             Show sync log
  --sync-log-format SYNC-LOG-FORMAT
                         Format of sync log. Possible values: json
  --sync-progress, -p    Show sync progress
  --on-fail ON-FAIL, -f ON-FAIL
                         Action on failure. Possible values: fatal, skip, skipmissing (DEPRECATED, use --error-handling instead) [default: fatal]
  --error-handling ERROR-HANDLING
                         Controls error handling. Sum of the values: 1 to ignore NotFound errors, 2 to ignore PermissionDenied errors (so 3 ignores both), or 255 to ignore all errors
  --disable-http2        Disable HTTP/2 for the HTTP client
  --list-buffer LIST-BUFFER
                         Size of the list buffer [default: 1000]
  --ratelimit-objects RATELIMIT-OBJECTS
                         Rate limit, objects per second
  --ratelimit-bandwidth RATELIMIT-BANDWIDTH
                         Bandwidth rate limit, bytes/s. Allowed suffixes: K, M, G
  --help, -h             display this help and exit
  --version              display version and exit
```
Examples:
- Sync Amazon S3 bucket to FS:
  `s3sync --sk KEY --ss SECRET -w 128 s3://shared fs:///opt/backups/s3/`
- Sync S3 bucket with custom endpoint to FS:
  `s3sync --sk KEY --ss SECRET --se "http://127.0.0.1:7484" -w 128 s3://shared fs:///opt/backups/s3/`
- Sync directory (/test) from Amazon S3 bucket to FS:
  `s3sync --sk KEY --ss SECRET -w 128 s3://shared/test fs:///opt/backups/s3/test/`
- Sync directory from local FS to Amazon S3:
  `s3sync --tk KEY --ts SECRET -w 128 fs:///opt/backups/s3/ s3://shared`
- Sync directory from local FS to Amazon S3 bucket directory:
  `s3sync --tk KEY --ts SECRET -w 128 fs:///opt/backups/s3/test/ s3://shared/test_new/`
- Sync one Amazon S3 bucket to another:
  `s3sync --tk KEY2 --ts SECRET2 --sk KEY1 --ss SECRET1 -w 128 s3://shared s3://shared_new`
- Sync S3 bucket with custom endpoint to another bucket with custom endpoint:
  `s3sync --tk KEY2 --ts SECRET2 --sk KEY1 --ss SECRET1 --se "http://127.0.0.1:7484" --te "http://127.0.0.1:7484" -w 128 s3://shared s3://shared_new`
- Sync a directory from one Amazon S3 bucket to another:
  `s3sync --tk KEY2 --ts SECRET2 --sk KEY1 --ss SECRET1 -w 128 s3://shared/test/ s3://shared_new`
- Use streaming S3 transfers (reduces memory usage, may be slower for small files):
  `s3sync --sk KEY --ss SECRET --tk KEY --ts SECRET -w 128 s3s://shared/test/ s3s://shared_new`
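Two more sketches built from the flags documented above (credentials, bucket names and the `50M` limit are placeholders):

- Skip objects that return NotFound instead of aborting (per the `--error-handling` values above):
  `s3sync --sk KEY --ss SECRET --error-handling 1 -w 128 s3://shared fs:///opt/backups/s3/`
- Cap bandwidth at 50 MB/s (suffixes K, M, G per `--ratelimit-bandwidth`):
  `s3sync --sk KEY --ss SECRET --ratelimit-bandwidth 50M -w 128 s3://shared fs:///opt/backups/s3/`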
SOURCE and TARGET must be directories; syncing a single file is not supported. For example, this will not work: `s3sync --sk KEY --ss SECRET s3://shared/megafile.zip fs:///opt/backups/s3/`
You can use filters:

- Timestamp filter (`--filter-after-mtime`) syncs only files that have been changed after the specified timestamp. It's useful for incremental (diff) backups.
- File extension filter (`--filter-ext`) syncs only files with the specified extensions. Can be specified multiple times (like this: `--filter-ext .jpg --filter-ext .png --filter-ext .bmp`).
- Content-Type filter (`--filter-ct`) syncs only files with the specified Content-Type. Can be specified multiple times.
- ETag filter (`--filter-modified`) syncs only modified files. It has a few restrictions: if you are using FS storage, the files must have been created by s3sync, and the FS storage must support xattr.
- There are also inverted filters: `--filter-not-ext`, `--filter-not-ct` and `--filter-before-mtime`.
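For instance, here is a sketch of an incremental image backup combining these filters. The credentials, bucket, and the timestamp `1609459200` (2021-01-01 00:00:00 UTC) are placeholders:

```
# Sync only .jpg/.png objects changed after the given Unix timestamp
s3sync --sk KEY --ss SECRET -w 128 \
    --filter-ext .jpg --filter-ext .png \
    --filter-after-mtime 1609459200 \
    s3://shared fs:///opt/backups/s3/
```

If you want to check whether FS copies carry the metadata required by the ETag filter, `getfattr -d <file>` dumps a file's extended attributes (the exact attribute names are internal to s3sync).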
## Install
Download the binary from the Releases page.

Or use the docker image `larrabee/s3sync` like this:

`docker run --rm -ti larrabee/s3sync --tk KEY2 --ts SECRET2 --sk KEY1 --ss SECRET1 -w 128 s3://shared/test/ s3://shared_new`
## Building
Minimum Go version: 1.13

Build it with:

```
go mod vendor
go build -o s3sync ./cli
```
## Using as a module

You can easily use s3sync in your application; see the example in the `cli/` folder.
## License
GPLv3
## Notes
s3sync performs a non-destructive one-way sync: it never deletes files from the source or the target, even when they no longer exist on the other side.