AzFuse

AzFuse is a lightweight, blobfuse-like Python tool whose data transfer is implemented through AzCopy. With this tool, reading or writing a file in Azure Storage works like reading or writing a local file, following the same principle as blobfuse. However, the underlying data transfer relies on AzCopy, which provides much faster transfer speeds.

Installation

  1. Download AzCopy from here. Copy the azcopy binary to ~/code/azcopy/azcopy or into /usr/bin/, and make it executable. Make sure it is version 10 or higher.
  2. Install it with pip:
    pip install git+https://github.com/microsoft/azfuse.git
    
    or from source:
    git clone https://github.com/microsoft/azfuse.git
    cd azfuse
    python setup.py install
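Step 1 requires AzCopy version 10 or higher. The following is a minimal Python sketch for checking that, assuming `azcopy --version` prints text such as `azcopy version 10.x.y`. The helper names are hypothetical and not part of AzFuse; the candidate locations are the two install locations mentioned in step 1.

```python
import os
import re
import subprocess
from typing import Optional

# The two install locations suggested in step 1.
CANDIDATE_PATHS = [os.path.expanduser("~/code/azcopy/azcopy"), "/usr/bin/azcopy"]


def parse_azcopy_major(version_output: str) -> Optional[int]:
    """Return the major version from text like 'azcopy version 10.21.2', or None.

    Assumption: the version output contains 'version X.Y...'.
    """
    match = re.search(r"version\s+(\d+)\.", version_output)
    return int(match.group(1)) if match else None


def find_azcopy() -> Optional[str]:
    """Return the first candidate path that is an executable file, or None."""
    for path in CANDIDATE_PATHS:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None


def azcopy_is_usable(min_major: int = 10) -> bool:
    """True if azcopy is installed and reports a major version >= min_major."""
    path = find_azcopy()
    if path is None:
        return False
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    major = parse_azcopy_major(result.stdout)
    return major is not None and major >= min_major
```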
    

Preliminary

AzFuse works with three kinds of file paths.

  1. The local (or logical) path, which is the path used by the user script. For example, if the user script accesses the file data/abc.txt, then data/abc.txt is the local path.
  2. The remote path, which is the path in Azure Blob Storage. For example, if the Azure Storage URL is https://accountname.blob.core.windows.net/containername/path/data/abc.txt, the remote path is path/data/abc.txt. Note that the remote path does not include the container name.
  3. The cache path, which is the local file that AzCopy downloads to or uploads from, e.g. /tmp/data/abc.txt.
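To make the three kinds of path concrete, here is a minimal Python sketch that splits a blob URL into the account name, container name, and remote path described above. The helper names are hypothetical and not part of AzFuse; any query string, such as a SAS token, is ignored because `urlparse` keeps it separately in `.query`.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_blob_url(url: str) -> Tuple[str, str, str]:
    """Split a blob URL into (account_name, container_name, remote_path)."""
    parsed = urlparse(url)
    # 'accountname.blob.core.windows.net' -> 'accountname'
    account_name = parsed.netloc.split(".", 1)[0]
    # '/containername/path/data/abc.txt' -> ('containername', 'path/data/abc.txt')
    container_name, _, remote_path = parsed.path.lstrip("/").partition("/")
    return account_name, container_name, remote_path


def join_blob_url(account_name: str, container_name: str, remote_path: str) -> str:
    """Inverse of split_blob_url (without any query string)."""
    return f"https://{account_name}.blob.core.windows.net/{container_name}/{remote_path}"


# The three paths for the example file above:
local_path = "data/abc.txt"        # what the user script uses
cache_path = "/tmp/data/abc.txt"   # where azcopy downloads to / uploads from
account, container, remote_path = split_blob_url(
    "https://accountname.blob.core.windows.net/containername/path/data/abc.txt"
)
print(remote_path)  # path/data/abc.txt
```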

The pipeline is as follows:

  1. The user script opens data/abc.txt via azfuse.File.open() in a with statement.
  2. In read mode, the tool checks whether the cache path exists.
    • If it exists, the tool returns a handle to the cache file.
    • If it does not exist, the tool downloads the file from the remote path to the cache path and then returns a handle to the cache file.
  3. In write mode, the tool opens the cache path and returns a handle to it. When the with block exits, the tool uploads the cache file to the remote path.
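The read/write pipeline can be sketched as a Python context manager. This is a minimal illustration, not AzFuse's implementation: `open_cached` is a hypothetical name, and `download`/`upload` stand in for the AzCopy transfers, so any callable taking (source, destination) works, including `shutil.copyfile` for a local dry run.

```python
import contextlib
import os
from typing import IO, Callable, Iterator

# A transfer function copies src to dst; in AzFuse this role is played by AzCopy.
Transfer = Callable[[str, str], object]


@contextlib.contextmanager
def open_cached(cache_path: str, remote_path: str, mode: str,
                download: Transfer, upload: Transfer) -> Iterator[IO]:
    """Open cache_path, keeping it in sync with remote_path (sketch)."""
    cache_dir = os.path.dirname(cache_path)
    if cache_dir:
        os.makedirs(cache_dir, exist_ok=True)
    if "r" in mode and not os.path.exists(cache_path):
        # Read mode, cache miss: download remote -> cache before opening.
        download(remote_path, cache_path)
    with open(cache_path, mode) as fp:
        yield fp
    # Reached only if the with-block finished without raising; the cache
    # file is already closed here, so its contents are complete.
    if "w" in mode:
        # Write mode: upload cache -> remote when the with-block exits.
        upload(cache_path, remote_path)
```

For example, `with open_cached(cache, remote, "w", download=shutil.copyfile, upload=shutil.copyfile) as fp: fp.write("hello")` writes the cache file and then copies it to `remote`. Because the upload runs only after a clean exit, an exception inside the with-block leaves the remote copy untouched; that is a design choice of this sketch.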

Setup

  1. By default, the feature is disabled: file reads and writes go directly to the local file without accessing Azure Blob Storage. It is therefore recommended to adopt the tool in your code first without enabling it; in that case, no configuration is needed. To enable it, set AZFUSE_USE_FUSE=1 explicitly. The following describes how to configure the tool once it is enabled.

  2. Set the environment variable AZFUSE_CLOUD_FUSE_CONFIG_FILE to the path of the configuration file, e.g. AZFUSE_CLOUD_FUSE_CONFIG_FILE=./aux_data/configs/azfuse.yaml

  3. The configuration file is in YAML format and contains a list of dictionaries. Each dictionary has the keys local, remote, cache, and storage_account.

    - cache: /tmp/azfuse/data
      local: data
      remote: azfuse_data
      storage_account: storage_config_name
    - cache: /tmp/azfuse/models
      local: models
      remote: models
      storage_account: storage_config_name
    

    Each path in the YAML file is a prefix of the corresponding path. For example, with the configuration above, if the local path is data/abc.txt, the cache path will be /tmp/azfuse/data/abc.txt and the remote path will be azfuse_data/abc.txt. The tool tries the entries in order, from first to last, and uses the first one whose prefix matches. If no entry matches, the path is treated as a plain local file, which may also be a file on a blobfuse mount.

    The storage_account value is the base name of a storage-account configuration file. In this example, the file is ./aux_data/storage_account/storage_config_name.yaml. The folder can be changed by setting AZFUSE_STORAGE_ACCOUNT_CONFIG_FOLDER. The storage-account YAML file should have the following format:

    account_name: accountname
    account_key: accountkey
    sas_token: sastoken
    container_name: containername
    

    Either account_key or sas_token can be null. If given, sas_token should start with ?.
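The configuration rules above (the AZFUSE_USE_FUSE switch, first-match prefix resolution, the storage-account file location, and the ?-prefixed SAS token) can be summarised in a minimal Python sketch. It is an illustration under stated assumptions, not AzFuse's code: the function names are hypothetical, the YAML is shown already loaded as a list of dictionaries so only the standard library is needed, and matching prefixes on whole path components is an assumption.

```python
import os
import posixpath
from typing import Dict, List, Optional, Tuple

# The entries from the YAML example above, as a list of dicts (what a YAML
# loader such as PyYAML's yaml.safe_load would return for that file).
CONFIG: List[Dict[str, str]] = [
    {"cache": "/tmp/azfuse/data", "local": "data",
     "remote": "azfuse_data", "storage_account": "storage_config_name"},
    {"cache": "/tmp/azfuse/models", "local": "models",
     "remote": "models", "storage_account": "storage_config_name"},
]


def fuse_enabled() -> bool:
    """The feature stays off unless AZFUSE_USE_FUSE=1 is set explicitly."""
    return os.environ.get("AZFUSE_USE_FUSE") == "1"


def resolve(local_path: str,
            config: List[Dict[str, str]]) -> Optional[Tuple[str, str, str]]:
    """Return (cache_path, remote_path, storage_account) from the first entry
    whose local prefix matches, or None for a plain local file."""
    for entry in config:  # first match wins
        prefix = entry["local"].rstrip("/")
        # Assumption: prefixes match whole path components
        # ("data" does not match "database/x.txt").
        if local_path == prefix or local_path.startswith(prefix + "/"):
            suffix = local_path[len(prefix):].lstrip("/")

            def join(base: str) -> str:
                return posixpath.join(base, suffix) if suffix else base

            return join(entry["cache"]), join(entry["remote"]), entry["storage_account"]
    return None


def storage_account_file(name: str) -> str:
    """Path of the storage-account YAML file for a storage_account value."""
    folder = os.environ.get("AZFUSE_STORAGE_ACCOUNT_CONFIG_FOLDER",
                            "./aux_data/storage_account")
    return posixpath.join(folder, name + ".yaml")


def blob_url(account: Dict[str, Optional[str]], remote_path: str) -> str:
    """Blob URL for remote_path; sas_token already starts with '?', so it is
    appended unchanged (a null token adds nothing)."""
    base = (f"https://{account['account_name']}.blob.core.windows.net/"
            f"{account['container_name']}/{remote_path}")
    return base + (account.get("sas_token") or "")


print(resolve("data/abc.txt", CONFIG))
# ('/tmp/azfuse/data/abc.txt', 'azfuse_data/abc.txt', 'storage_config_name')
```

Returning None for unmatched paths mirrors the rule that such paths are treated as plain local files.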

Examples

Tips

Command line

A command-line tool is provided for some data-management tasks.

setup

Set the following alias to use azfuse from the command line:

alias azfuse='ipython --pdb -m azfuse --'

usage

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.