remotezip

This module provides access to individual members of a zip archive hosted on a remote web server without downloading the whole archive. For this library to work, the web server hosting the archive must support HTTP range requests (the Range header).
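To make the range requirement concrete, here is a minimal sketch of a range request built with the standard library. The helper names `build_range_request` and `server_honoured_range` are hypothetical and are not part of remotezip. A server that supports ranges answers such a request with status 206 (Partial Content) rather than 200.

```python
import urllib.request


def build_range_request(url, start, end):
    """Build a GET request for bytes start..end (inclusive) of url."""
    if start < 0 or end < start:
        raise ValueError("expected 0 <= start <= end")
    return urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})


def server_honoured_range(status_code):
    """A server that honours the Range header replies 206 Partial Content."""
    return status_code == 206


# Ask for the first 100 bytes only; nothing is sent until urlopen() is called.
request = build_range_request("http://example.com/archive.zip", 0, 99)
```

If the server ignores the header and replies 200 with the full body, partial access is not possible for that host.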

Installation

pip install remotezip

Usage

Initialization

RemoteZip(url, ...)

To download the content, this library relies on the requests module. The constructor accepts the same arguments as the function requests.get.

Class Interface

RemoteZip is a subclass of the Python standard library class zipfile.ZipFile, so it supports all of ZipFile's read methods.

Please look at the zipfile documentation for usage details.
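Because RemoteZip subclasses zipfile.ZipFile, code written against the ZipFile read interface should work with either. The sketch below uses a local in-memory archive as a stand-in so that it runs offline. The helpers `member_sizes` and `first_bytes` are hypothetical names introduced for illustration.

```python
import io
import zipfile


def member_sizes(archive):
    """Map each member name to its uncompressed size, using infolist()."""
    return {info.filename: info.file_size for info in archive.infolist()}


def first_bytes(archive, member, count=16):
    """Return up to `count` leading bytes of `member` from any ZipFile."""
    with archive.open(member) as fh:
        return fh.read(count)


# Local stand-in for a remote archive; a RemoteZip instance could be
# passed to the same helpers, assuming a reachable URL.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("somefile.txt", "hello remotezip")

with zipfile.ZipFile(buffer) as archive:
    sizes = member_sizes(archive)
    head = first_bytes(archive, "somefile.txt", 5)
```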

NOTE:

Examples

List members in archive

Print the names of all members of the archive:

from remotezip import RemoteZip

with RemoteZip('http://.../myfile.zip') as zip:
    for zip_info in zip.infolist():
        print(zip_info.filename)

Download a member

The following example will extract the file somefile.txt from the archive stored at the URL http://.../myfile.zip.

from remotezip import RemoteZip

with RemoteZip('http://.../myfile.zip') as zip:
    zip.extract('somefile.txt')

S3 example

If you need to download a member from a zip archive hosted on S3, you can authenticate with the aws-requests-auth library as follows:

from aws_requests_auth.boto_utils import BotoAWSRequestsAuth
from hashlib import sha256
from remotezip import RemoteZip

auth = BotoAWSRequestsAuth(
    aws_host='s3-eu-west-1.amazonaws.com',
    aws_region='eu-west-1',
    aws_service='s3'
)
# SHA-256 of an empty request body; hashlib requires bytes on Python 3
headers = {'x-amz-content-sha256': sha256(b'').hexdigest()}
url = "https://s3-eu-west-1.amazonaws.com/.../file.zip"

with RemoteZip(url, auth=auth, headers=headers) as z:
    z.extract('somefile.txt')

Command line tool

A simple command line tool is included in this distribution.

usage: remotezip [-h] [-l] [-d DIR] url [filename [filename ...]]

Unzip remote files

positional arguments:
  url                URL of the zip archive
  filename           File to extract

optional arguments:
  -h, --help         show this help message and exit
  -l, --list         List files in the archive
  -d DIR, --dir DIR  Extract directory, default current directory

Example

$ remotezip -l "http://thematicmapping.org/downloads/TM_WORLD_BORDERS-0.3.zip"
  Length  DateTime             Name
--------  -------------------  ------------------------
    2962  2008-07-30 13:58:46  Readme.txt
   24740  2008-07-30 12:16:46  TM_WORLD_BORDERS-0.3.dbf
     145  2008-03-12 13:11:54  TM_WORLD_BORDERS-0.3.prj
 6478464  2008-07-30 12:16:46  TM_WORLD_BORDERS-0.3.shp
    2068  2008-07-30 12:16:46  TM_WORLD_BORDERS-0.3.shx
    
$ remotezip "http://thematicmapping.org/downloads/TM_WORLD_BORDERS-0.3.zip" Readme.txt
Extracting Readme.txt...

How it works

This module uses the zipfile.ZipFile class under the hood to decode the zip file format. The ZipFile class is initialized with a file-like object that transparently performs the remote queries.
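The idea can be illustrated with a minimal sketch. This is not remotezip's actual implementation: `RangeReader` and the `fetch(start, end)` callable are hypothetical names, and the "remote" side is simulated by slicing in-memory bytes. In a real setting, `fetch` would issue an HTTP request with a Range header.

```python
import io
import zipfile


class RangeReader(io.RawIOBase):
    """Read-only, seekable file object backed by a fetch(start, end) callable."""

    def __init__(self, fetch, size):
        super().__init__()
        self._fetch = fetch    # hypothetical: returns bytes start..end inclusive
        self._size = size      # total archive size in bytes
        self._pos = 0
        self.requests = []     # ranges asked for, recorded for illustration

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        base = {io.SEEK_SET: 0, io.SEEK_CUR: self._pos, io.SEEK_END: self._size}[whence]
        self._pos = max(0, base + offset)
        return self._pos

    def read(self, size=-1):
        if size is None or size < 0:
            size = self._size - self._pos
        end = min(self._pos + size, self._size)
        if end <= self._pos:
            return b""
        self.requests.append((self._pos, end - 1))
        data = self._fetch(self._pos, end - 1)
        self._pos += len(data)
        return data


# Build a small archive in memory to play the role of the remote file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("a.txt", "first member")
    zf.writestr("b.txt", "second member")
raw = buffer.getvalue()

reader = RangeReader(lambda start, end: raw[start:end + 1], len(raw))
with zipfile.ZipFile(reader) as archive:
    content = archive.read("b.txt")
```

ZipFile only ever calls seek, tell and read on the object, so each read maps to one byte range, and `reader.requests` shows which ranges were needed to reach a single member.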

The zip format is composed of the content of each compressed member followed by the central directory.
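This layout can be inspected directly. The sketch below builds a small archive in memory and decodes the end-of-central-directory record, the fixed 22-byte structure at the tail of a zip file that records where the central directory starts. The helper `read_eocd` is a hypothetical name, and the sketch assumes a plain archive without zip64 extensions or a trailing comment containing the signature.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"
EOCD_FORMAT = "<4s4H2LH"  # fixed 22-byte part of the record, little-endian


def read_eocd(data):
    """Locate and decode the end-of-central-directory record in raw zip bytes."""
    pos = data.rfind(EOCD_SIGNATURE)
    if pos < 0:
        raise ValueError("end-of-central-directory record not found")
    fields = struct.unpack_from(EOCD_FORMAT, data, pos)
    # signature, disk numbers and per-disk count are not needed here
    _, _, _, _, total_entries, cd_size, cd_offset, _ = fields
    return pos, total_entries, cd_size, cd_offset


buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("a.txt", "first member")
    zf.writestr("b.txt", "second member")
raw = buffer.getvalue()

eocd_pos, total_entries, cd_size, cd_offset = read_eocd(raw)
```

Member data starts at offset 0 with a local file header (`PK\x03\x04`), the central directory starts at `cd_offset` with `PK\x01\x02`, and the record decoded above comes last. A reader that fetches only the tail of the file can therefore learn the location of every member.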

How many requests will this module perform to download a member?

Alternative module

There is a similar module available for Python: pyremotezip.