🎁 cargo-fetcher
Alternative to cargo fetch for use in CI or other "clean" environments that you want to quickly bootstrap with the necessary crates to compile, test, etc. your project(s).
Why?
- You run many CI jobs in clean and/or containerized environments and you want to quickly fetch cargo registries and crates so that you can spend your compute resources on actually compiling and testing the code, rather than downloading dependencies.
Why not?
- Other than the fs storage backend, the only supported backends are the 3 major cloud storage backends, as it is generally beneficial to store crate and registry information in the same cloud as you are running your CI jobs to take advantage of locality and I/O throughput.
- cargo-fetcher should not be used in a typical user environment as it completely disregards various safety mechanisms that are built into cargo, such as file-based locking.
- cargo-fetcher assumes it is running in an environment with high network throughput and low latency.
Supported Storage Backends
gcs
The gcs feature enables the use of Google Cloud Storage as a backend.
- Must provide a url to the -u | --url parameter with the gsutil syntax gs://<bucket_name>(/<prefix>)?
- Must provide GCP service account credentials either with --credentials or via the GOOGLE_APPLICATION_CREDENTIALS environment variable
s3
The s3 feature enables the use of Amazon S3 as a backend.
- Must provide a url to the -u | --url parameter; it must be of the form http(s)?://<bucket>.s3(-<region>).<host>(/<prefix>)?
- Must provide AWS IAM user credentials via the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY described here, or run from an ec2 instance with an assumed role as described here.
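To make the S3 url form above concrete, here is a minimal sketch of how such a url could be split into its parts. The S3Location type and parse_s3_url function are hypothetical names used only for illustration; this is not cargo-fetcher's actual parser, and it assumes the bucket name itself does not contain ".s3".

```rust
/// Pieces of an S3 backend url (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
struct S3Location {
    bucket: String,
    region: Option<String>,
    host: String,
    prefix: String,
}

/// Splits a url of the form
/// http(s)?://<bucket>.s3(-<region>).<host>(/<prefix>)?
/// Returns None if the url does not match that shape.
fn parse_s3_url(url: &str) -> Option<S3Location> {
    // Both http:// and https:// are accepted by the documented form.
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))?;

    // Everything after the first '/' is the optional prefix.
    let (authority, prefix) = rest.split_once('/').unwrap_or((rest, ""));

    // <bucket> is everything before the first ".s3" (assumption: the
    // bucket name itself does not contain ".s3").
    let (bucket, after_bucket) = authority.split_once(".s3")?;

    // What follows is either "-<region>.<host>" or ".<host>".
    let (region, host) = if let Some(r) = after_bucket.strip_prefix('-') {
        let (region, host) = r.split_once('.')?;
        (Some(region.to_string()), host)
    } else {
        (None, after_bucket.strip_prefix('.')?)
    };

    if bucket.is_empty() || host.is_empty() {
        return None;
    }

    Some(S3Location {
        bucket: bucket.to_string(),
        region,
        host: host.to_string(),
        prefix: prefix.to_string(),
    })
}
```

Under these assumptions, https://my-bucket.s3-eu-west-1.amazonaws.com/ci/cache would yield the bucket my-bucket, the region eu-west-1, the host amazonaws.com and the prefix ci/cache.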
fs
The fs feature enables use of a folder on a local disk to store crates to and fetch crates from.
- Must provide a url to the -u | --url parameter with the file: scheme
blob
The blob feature enables the use of Azure Blob storage as a backend.
- Must provide a url to the -u | --url parameter; it must be of the form blob://<container_name>(/<prefix>)?
- Must provide Azure Storage Account credentials via the environment variables STORAGE_ACCOUNT and STORAGE_MASTER_KEY described here.
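The four url forms above differ in their scheme, so a backend can in principle be chosen from the url alone. The following is a rough sketch of that idea, not cargo-fetcher's real selection logic. The Backend enum and the backend_for and split_name_and_prefix functions are hypothetical, and treating every http(s) url as S3 is an assumption made for brevity.

```rust
/// Which storage backend a url refers to (hypothetical type, for illustration).
#[derive(Debug, PartialEq)]
enum Backend {
    Gcs { bucket: String, prefix: String },
    S3 { url: String },
    Fs { path: String },
    Blob { container: String, prefix: String },
}

/// Splits "<name>(/<prefix>)?" into its two parts; the name must be non-empty.
fn split_name_and_prefix(rest: &str) -> Option<(String, String)> {
    let (name, prefix) = rest.split_once('/').unwrap_or((rest, ""));
    if name.is_empty() {
        None
    } else {
        Some((name.to_string(), prefix.to_string()))
    }
}

/// Picks a backend from the scheme of the url passed to -u | --url.
fn backend_for(url: &str) -> Option<Backend> {
    if let Some(rest) = url.strip_prefix("gs://") {
        let (bucket, prefix) = split_name_and_prefix(rest)?;
        Some(Backend::Gcs { bucket, prefix })
    } else if let Some(rest) = url.strip_prefix("blob://") {
        let (container, prefix) = split_name_and_prefix(rest)?;
        Some(Backend::Blob { container, prefix })
    } else if url.starts_with("http://") || url.starts_with("https://") {
        // Assumption: any http(s) url is treated as S3 in this sketch.
        Some(Backend::S3 { url: url.to_string() })
    } else if let Some(path) = url.strip_prefix("file:") {
        // Drop the empty authority of file:///..., if present.
        let path = path.strip_prefix("//").unwrap_or(path);
        Some(Backend::Fs { path: path.to_string() })
    } else {
        None
    }
}
```

In this sketch gs://crates/ci maps to the Gcs variant with bucket crates and prefix ci, while file:///tmp/cache maps to the Fs variant with path /tmp/cache.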
Examples
This is an example from our CI for an internal project.
Dependencies
- 424 crates.io crates: cached - 38MB, unpacked - 214MB
- 13 crates sourced from 10 git repositories: db - 27MB, checked out - 38MB
Scenario
The following CI jobs are run in parallel, each in a Kubernetes Job running on GKE. The container base is roughly the same as the official rust:1.39.0-slim image.
- Build modules for WASM
- Build modules for native
- Build host client for native
~ wait for all jobs to finish ~
- Run the tests for both the WASM and native modules from the host client
Before
All 3 build jobs take around 1m2s each to do cargo fetch --target x86_64-unknown-linux-gnu.
After
All 3 build jobs take 3-4s each to do cargo fetcher --include-index sync.
Usage
cargo-fetcher has only 2 subcommands. Both of them share a set of options; the important inputs for each backend are described in Supported Storage Backends.
In addition to the backend specifics, the only required option is the path to the Cargo.lock lockfile that you are operating on. cargo-fetcher requires a lockfile, as otherwise the normal cargo work of generating a lockfile requires having a full registry index locally, which partially defeats the point of this tool.
-l, --lock-file <lock-file>
Path to the lockfile used for determining what crates to operate on [default: Cargo.lock]
mirror
The mirror subcommand does the work of downloading crates and registry indexes from their original locations and re-uploading them to your storage backend.
It has one additional option, which determines how often it should take snapshots of the registry index(es).
-m, --max-stale <max-stale>
The duration for which the index will not be replaced after its most recent update.
Times may be specified with no suffix (default days), or one of:
* (s)econds
* (m)inutes
* (h)ours
* (d)ays
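As a sketch of the suffix rules listed above, the following function turns a --max-stale style value into a Duration. The name parse_max_stale is hypothetical and the parsing is deliberately strict (digits followed by at most one suffix letter); the exact syntax accepted by cargo-fetcher itself may differ.

```rust
use std::time::Duration;

/// Parses values such as "30s", "90m", "2h", "7d" or "7" (days by default).
/// Returns None for anything else.
fn parse_max_stale(input: &str) -> Option<Duration> {
    let input = input.trim();

    // Split the leading digits from the optional unit suffix.
    let digits_end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    let (number, suffix) = input.split_at(digits_end);

    // An empty or overflowing number is rejected.
    let value: u64 = number.parse().ok()?;

    let seconds_per_unit: u64 = match suffix {
        "s" => 1,
        "m" => 60,
        "h" => 60 * 60,
        // No suffix means days, per the option's help text.
        "" | "d" => 24 * 60 * 60,
        _ => return None,
    };

    value.checked_mul(seconds_per_unit).map(Duration::from_secs)
}
```

With these rules, 90m maps to 5400 seconds and a bare 1 maps to one day.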
Custom registries
One wrinkle with mirroring is the presence of custom registries. To handle these, cargo-fetcher uses the same logic that cargo uses to locate .cargo/config<.toml> config files to detect custom registries. However, cargo's config files only contain the metadata needed to fetch and publish to the registry; the url template for where to download crates from is actually present in a config.json file in the root of the registry itself.
Rather than wait for a registry index to be downloaded each time before fetching any crates sourced from that registry, cargo-fetcher instead allows you to specify the download location yourself via an environment variable, so that it can fully parallelize the fetching of registry indices and crates.
Example
# .cargo/config.toml
[registries]
embark = { index = "<secret url>" }
The environment variable is of the form CARGO_FETCHER_<name>_DL where <name> is the upper-cased name of the registry in the configuration file.
CARGO_FETCHER_EMBARK_DL="https://secret/rust/cargo/{crate}-{version}.crate" cargo fetcher mirror
The format of the URL should be the same as the one in your registry's config.json file. If this environment variable is not specified for your registry, the default of /{crate}/{version}/download is just appended to the url of the registry.
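The naming and fallback rules for custom registries can be sketched as follows. Both functions, dl_env_var and download_url, are hypothetical illustrations rather than cargo-fetcher's implementation. Two details are assumptions: registry names are only upper-cased (no other character mapping is attempted), and a trailing slash on the registry url is dropped before the default path is appended.

```rust
/// Name of the environment variable holding the download template
/// for a registry, e.g. "embark" -> "CARGO_FETCHER_EMBARK_DL".
fn dl_env_var(registry_name: &str) -> String {
    format!("CARGO_FETCHER_{}_DL", registry_name.to_uppercase())
}

/// Builds the download url for one crate.
/// `template` is the value of the environment variable, if it was set;
/// otherwise the default path is appended to the registry url.
fn download_url(
    template: Option<&str>,
    registry_url: &str,
    crate_name: &str,
    version: &str,
) -> String {
    let template = match template {
        Some(t) => t.to_string(),
        // `{{` and `}}` are escapes that produce literal braces.
        None => format!(
            "{}/{{crate}}/{{version}}/download",
            registry_url.trim_end_matches('/')
        ),
    };

    // Substitute the two markers used in the template.
    template
        .replace("{crate}", crate_name)
        .replace("{version}", version)
}
```

For the embark example above, the template https://secret/rust/cargo/{crate}-{version}.crate applied to a crate named serde at version 1.0.0 gives https://secret/rust/cargo/serde-1.0.0.crate in this sketch.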
sync
The sync subcommand is the actual replacement for cargo fetch, except instead of downloading crates and registries from their normal location, it downloads them from your storage backend and splats them to disk in the same way that cargo does, so that cargo won't have to do any actual work before it can start building code.
Contributing
We welcome community contributions to this project.
Please read our Contributor Guide for more information on how to get started.
License
Licensed under either of
- Apache License, Version 2.0, (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.