<!-- This Source Code Form is subject to the terms of the Mozilla Public License, v. 2.0. If a copy of the MPL was not distributed with this file, You can obtain one at http://mozilla.org/MPL/2.0/. --> <!-- Copyright 2020 Joyent, Inc. Copyright 2022 MNX Cloud, Inc. -->

Manta: a scalable, distributed object store

Manta is an open-source, scalable, HTTP-based object store. All the pieces required to deploy and operate your own Manta are open source. This repo provides documentation for the overall Manta project and pointers to the other repositories that make up a complete Manta deployment.

Getting started

The fastest way to get started with Manta depends on what you want to do.

Community

Community discussion about Manta happens in two main places:

Dependencies

Manta is composed of a number of services that deploy on top of Joyent's Triton DataCenter platform (just "Triton" for short), which is also open-source. Triton provides services for operating physical servers (compute nodes), deploying services in containers, monitoring services, transmitting and visualizing real-time performance data, and a bunch more. Manta primarily uses Triton for initial deployment, service upgrade, and service monitoring.

Triton itself depends on SmartOS. Manta also directly depends on several SmartOS features, notably ZFS.

Building and Deploying Manta

Manta service images are built and packaged using the same mechanisms as building the services that are part of Triton. Once you have Triton set up, follow the instructions in the Manta Operator Guide to deploy Manta. The easiest way to play around with your own Manta installation is to first set up a Triton cloud-on-a-laptop (COAL) installation in VMware and then follow those instructions to deploy Manta on it.

If you want to deploy your own builds of Manta components, see "Deploying your own Manta Builds" below.

Repositories

This repository is just a wrapper containing documentation about Manta. Manta is made up of several components from many repositories. This section highlights some of the more important ones.

A full list of repositories relevant to Manta is maintained in a repo manifest file in this repo. To more conveniently list those repos, you can use the jr tool.

The front door services respond to requests from the internet at large:

The metadata tiers for the Directory and Buckets APIs store the entire object namespace (not object data) as well as backend storage system capacity:

The storage tier is responsible for actually storing bits on disk:

There are a number of services not part of the data path that are critical for Manta's operation. For example:

Most of the above components are services, of which there may be multiple instances in a single Manta deployment. Except for the last category of non-data-path services, these can all be deployed redundantly for availability and additional instances can be deployed to increase capacity.

For more details on the architecture, including how these pieces actually fit together, see the Architecture section of the Operator Guide.

Deploying your own Manta Builds

As described above, as part of the normal Manta deployment process, you start with the "manta-deployment" zone that's built into Triton. Inside that zone, you run "manta-init" to fetch the latest Joyent build of each Manta component. Then you run Manta deployment tools to actually deploy zones based on these builds.

The easiest way to use your own custom build is to first deploy Manta using the default Joyent build and then replace whatever components you want with your own builds. This will also ensure that you're starting from a known-working set of builds so that if something goes wrong, you know where to start looking. To do this:

  1. Complete the Manta deployment procedure from the operator guide.

  2. Build a zone image for whatever zone you want to replace. See the instructions for building Triton zone images. Manta zones work the same way. The output of this process will be a zone image, identified by uuid. The image consists of two files: an image manifest (a JSON file) and the image file itself (a binary blob).

  3. Import the image into the Triton DataCenter that you're using to deploy Manta. (If you've got a multi-datacenter Manta deployment, you'll need to import the image into each datacenter separately using this same procedure.)

    1. Copy the image and manifest files to the Triton headnode where the Manta deployment zone is deployed. For simplicity, assume that the manifest file is "/var/tmp/my_manifest.json" and the image file is "/var/tmp/my_image". You may want to use the image uuid in the filenames instead.

    2. Import the image using:

      sdc-imgadm import -m /var/tmp/my_manifest.json -f /var/tmp/my_image
      
  4. Now you can use the normal Manta zone update procedure (from the operator guide). This involves saving the current configuration to a JSON file using "manta-adm show -sj > config.json", updating the configuration file, and then applying the changes with "manta-adm update < config.json". When you modify the configuration file, you can use your image's uuid in place of whatever service you're trying to replace.
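
The "updating the configuration file" part of step 4 can be scripted. The following is a minimal sketch, not part of the Manta tooling. It assumes the JSON saved by "manta-adm show -sj" maps each server uuid to service names, and each service name to a map of image uuid to instance count. Check that layout against your own config.json before relying on it, since some services may nest the image map differently. The function name swap_image and the uuids in the usage note are hypothetical.

```python
import json


def swap_image(config, service, old_image, new_image):
    """Return a copy of `config` where `service` uses `new_image`.

    ASSUMED layout (verify against your own config.json):
        {server_uuid: {service_name: {image_uuid: instance_count}}}
    """
    # Deep-copy via a JSON round trip; the config is plain JSON data.
    updated = json.loads(json.dumps(config))
    for services in updated.values():
        images = services.get(service)
        if not images or old_image not in images:
            continue  # this server does not run the old image
        count = images.pop(old_image)
        # Keep the same number of instances, now on the new image.
        images[new_image] = images.get(new_image, 0) + count
    return updated
```

To use it, load config.json with json.load, call something like swap_image(config, "webapi", "<old image uuid>", "<your image uuid>"), write the result back with json.dump, and then apply it with "manta-adm update < config.json" as described in step 4.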

If for some reason you want to avoid deploying the Joyent builds at all, you'll have to follow a more manual procedure. One approach is to update the SAPI configuration for whatever service you want (using sdc-sapi -- see SAPI) immediately after running manta-init but before deploying anything. Note that each subsequent "manta-init" will clobber this change, though the SAPI configuration is normally only used for the initial deployment anyway. The other option is to apply the fully-manual install procedure from the Operator Guide (i.e., instead of using manta-deploy-coal or manta-deploy-lab) and use a custom "manta-adm" configuration file in the first place. If this is an important use case, file an issue and we can improve this procedure.

The above procedure works to update Manta zones, which are most of the components above. The other two kinds of components are the platform and agents. Both of these procedures are documented in the Operator Guide, and they work to deploy custom builds as well as the official Joyent builds.

Contributing to Manta

To report bugs or request features, you can submit issues to the Manta project on GitHub. If you're asking for help with Joyent's production Manta service, you should contact Joyent support instead.

See the Contribution Guidelines for information about contributing changes to the project.

Design principles

Manta assumes several constraints on the data storage problem:

  1. There should be one canonical copy of data. You shouldn't need to copy data in order to analyze it, transform it, or serve it publicly over the internet.
  2. The system must scale horizontally in every dimension. It should be possible to add new servers and deploy software instances to increase the system's capacity in terms of number of objects, total data stored, or compute capacity.
  3. The system should be general-purpose.
  4. The system should be strongly consistent and highly available. In terms of CAP, Manta sacrifices availability in the face of network partitions. (The reasoning here is that an AP cache can be built atop a CP system like Manta, but if Manta were AP, then it would be impossible for anyone to get CP semantics.)
  5. The system should be transparent about errors and performance. The public API only supports atomic operations, which makes error reporting and performance easy to reason about. (It's hard to say anything about the performance of compound operations, and it's hard to report failures in compound operations.) Relatedly, a single Manta deployment may span multiple datacenters within a region for higher availability, but Manta does not attempt to provide a global namespace across regions, since that would imply uniformity in performance or fault characteristics.

From these constraints, we define some design principles:

  1. Manta presents an HTTP interface (with REST-based PUT/GET/DELETE operations) as the primary way of reading and writing data. Because there's only one copy of data, and some data needs to be available publicly (e.g., on the internet over standard protocols), HTTP is a good choice.
  2. Manta is an object store, meaning that it only provides PUT/GET/DELETE for entire objects. You cannot write to the middle of an object or append to the end of one. This constraint makes it possible to guarantee strong consistency and high availability, since only the metadata tier (i.e., the namespace) needs to be strongly consistent, and objects themselves can be easily replicated for availability.
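
To make the whole-object constraint concrete, here is a toy in-memory model. It is only an illustration of the idea, not Manta's implementation, and the class and attribute names are hypothetical. The point it tries to show is that when objects are immutable and can only be replaced whole, the only state that changes on a write is a single namespace entry, so that entry is the only thing that needs strong consistency.

```python
class ToyObjectStore:
    """Toy model: whole-object PUT/GET/DELETE over immutable blobs."""

    def __init__(self):
        self._namespace = {}  # path -> blob id (stands in for the metadata tier)
        self._blobs = {}      # blob id -> bytes (stands in for the storage tier)
        self._next_id = 0

    def put(self, path, data):
        # Store the new contents as a fresh, immutable blob first...
        blob_id = self._next_id
        self._next_id += 1
        self._blobs[blob_id] = bytes(data)
        # ...then switch the name over in one step. Readers see either the
        # old object or the new one, never a partially written mix.
        self._namespace[path] = blob_id

    def get(self, path):
        # Raises KeyError if the path does not exist.
        return self._blobs[self._namespace[path]]

    def delete(self, path):
        # Only the namespace entry is removed; this sketch never
        # reclaims the space used by unreferenced blobs.
        del self._namespace[path]
```

There is deliberately no append or partial-write method: changing an object means calling put again with the complete new contents.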

It's easy to underestimate the problem of just reliably storing bits on disk. It's commonly assumed that the only components that fail are disks, that they fail independently, and that they fail cleanly (e.g., by reporting errors). In reality, there are far worse failure modes than disks failing cleanly, including:

Manta delegates to ZFS to solve the single-system data storage problem. To handle these cases,

Further reading

For background on the overall design approach, see "There's Just No Getting Around It: You're Building a Distributed System".

For information about how Manta is designed to survive component failures and maintain strong consistency, see Fault tolerance in Manta.

For information on the latest recommended production hardware, see Joyent Manufacturing Matrix and Joyent Manufacturing Bill of Materials.