node-tar

Fast and full-featured Tar for Node.js

The API is designed to mimic the behavior of tar(1) on unix systems. If you are familiar with how tar works, most of this will hopefully be straightforward for you. If not, then hopefully this module can teach you useful unix skills that may come in handy someday :)

Background

A "tar file" or "tarball" is an archive of file system entries (directories, files, links, etc.). The name comes from "tape archive". If you run man tar on almost any Unix command line, you'll learn quite a bit about what it can do, and its history.

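Tar has 5 main top-level commands:

  1. c - Create an archive
  2. r - Replace entries within an archive
  3. u - Update entries within an archive (ie, replace if they're newer)
  4. t - List out the contents of an archive
  5. x - Extract an archive to disk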

The other flags and options modify how this top level function works.

High-Level API

These 5 functions are the high-level API. All of them have a single-character name (for unix nerds familiar with tar(1)) as well as a long name (for everyone else).

All the high-level functions take the following arguments, all three of which are optional and may be omitted.

  1. options - An optional object specifying various options
  2. paths - An array of paths to add or extract
  3. callback - Called when the command is completed, if async. (If sync or no file specified, providing a callback throws a TypeError.)

If the command is sync (ie, if options.sync=true), then the callback is not allowed, since the action will be completed immediately.

If a file argument is specified, and the command is async, then a Promise is returned. In this case, a callback may also be provided, which is called when the command is completed.

If a file option is not specified, then a stream is returned. For create, this is a readable stream of the generated archive. For list and extract this is a writable stream that an archive should be written into. If a file is not specified, then a callback is not allowed, because you're already getting a stream to work with.

replace and update only work on existing archives, and so require a file argument.

Sync commands without a file argument return a stream that acts on its input immediately in the same tick. For readable streams, this means that all of the data is immediately available by calling stream.read(). For writable streams, it will be acted upon as soon as it is provided, but this can be at any time.
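The return-value rules above can be condensed into a small decision function. This is only a sketch of the behavior described in this section, not code from the library; the names describeResult and callbackAllowed, and the strings they return, are made up for illustration.

```javascript
// Hypothetical helper: summarizes what a high-level command returns,
// based only on the rules described in this section.
const describeResult = ({ file, sync } = {}) => {
  if (!file) {
    // No file: you get a stream (a sync stream when sync is true).
    return sync ? 'sync stream' : 'stream'
  }
  // With a file: sync calls are done when they return,
  // async calls return a Promise.
  return sync ? 'undefined' : 'Promise'
}

// Hypothetical helper: a callback is only allowed for async commands
// that specify a file.
const callbackAllowed = ({ file, sync } = {}) => Boolean(file) && !sync

console.log(describeResult({ file: 'my-tarball.tgz' }))             // Promise
console.log(describeResult({ file: 'my-tarball.tgz', sync: true })) // undefined
console.log(describeResult({ sync: true }))                         // sync stream
console.log(callbackAllowed({ sync: true }))                        // false
```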

Warnings and Errors

Tar emits warnings and errors for recoverable and unrecoverable situations, respectively. In many cases, a warning only affects a single entry in an archive, or is simply informing you that it's modifying an entry to comply with the settings provided.

Unrecoverable warnings will always raise an error (ie, emit 'error' on streaming actions, throw for non-streaming sync actions, reject the returned Promise for non-streaming async operations, or call a provided callback with an Error as the first argument). Recoverable warnings will raise an error only if strict: true is set in the options.
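As a rough mental model of that rule, the outcome depends on just two inputs: whether the problem is recoverable, and whether strict is set. The function below is a hypothetical sketch of that decision, not the library's implementation.

```javascript
// Hypothetical model of the rule described above:
// unrecoverable problems always become errors, recoverable ones
// only become errors when strict is set.
const outcomeFor = ({ recoverable, strict = false }) =>
  !recoverable || strict ? 'error' : 'warn'

console.log(outcomeFor({ recoverable: true }))               // warn
console.log(outcomeFor({ recoverable: true, strict: true })) // error
console.log(outcomeFor({ recoverable: false }))              // error
```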

Respond to (recoverable) warnings by listening to the warn event. Handlers receive 3 arguments:

Error Codes

Errors that occur deeper in the system (ie, either the filesystem or zlib) will have their error codes left intact, and a tarCode matching one of the above will be added to the warning metadata or the raised error object.

Errors generated by tar will have one of the above codes set as the error.code field as well, but since errors originating in zlib or fs will have their original codes, it's better to read error.tarCode if you wish to see how tar is handling the issue.
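When logging, it can help to show both values: tarCode for how tar classified the problem, and code for the underlying cause when it differs. The helper below is a sketch with a made-up name (describeTarError), and the error objects are hand-built stand-ins; the specific codes on them are placeholders for whatever your errors actually carry.

```javascript
// Hypothetical logging helper: prefer tarCode, and mention the
// underlying fs/zlib code only when it differs.
const describeTarError = er => {
  const tarCode = er.tarCode ?? er.code ?? 'UNKNOWN'
  const cause = er.code && er.code !== tarCode ? ` (caused by ${er.code})` : ''
  return `${tarCode}${cause}: ${er.message}`
}

// Stand-in for an error that originated in fs and was tagged by tar.
const fsError = Object.assign(new Error('no such file'), {
  code: 'ENOENT',
  tarCode: 'TAR_ENTRY_ERROR',
})

// Stand-in for an error generated by tar itself (both fields match).
const tarError = Object.assign(new Error('bad archive'), {
  code: 'TAR_BAD_ARCHIVE',
  tarCode: 'TAR_BAD_ARCHIVE',
})

console.log(describeTarError(fsError))  // TAR_ENTRY_ERROR (caused by ENOENT): no such file
console.log(describeTarError(tarError)) // TAR_BAD_ARCHIVE: bad archive
```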

Examples

The API mimics the tar(1) command line functionality, with aliases for more human-readable option and function names. The goal is that if you know how to use tar(1) in Unix, then you know how to use import('tar') in JavaScript.

To replicate tar czf my-tarball.tgz files and folders, you'd do:

import { create } from 'tar'
create(
  {
    gzip: true, // or an object of gzip options
    file: 'my-tarball.tgz',
  },
  ['some', 'files', 'and', 'folders'],
).then(() => {
  // tarball has been created
})

To replicate tar cz files and folders > my-tarball.tgz, you'd do:

// if you're familiar with the tar(1) cli flags, this can be nice
import fs from 'node:fs'
import * as tar from 'tar'
tar.c(
  {
    // 'z' is alias for 'gzip' option
    z: true, // or an object of gzip options
  },
  ['some', 'files', 'and', 'folders'],
).pipe(fs.createWriteStream('my-tarball.tgz'))

To replicate tar xf my-tarball.tgz you'd do:

tar.x( // or `tar.extract`
  {
    // or `file:`
    f: 'my-tarball.tgz',
  },
).then(() => {
  // tarball has been dumped in cwd
})

To replicate cat my-tarball.tgz | tar x -C some-dir --strip=1:

fs.createReadStream('my-tarball.tgz').pipe(
  tar.x({
    strip: 1,
    C: 'some-dir', // alias for cwd:'some-dir', also ok
  }),
)

To replicate tar tf my-tarball.tgz, do this:

tar.t({
  file: 'my-tarball.tgz',
  onReadEntry: entry => {
    // do whatever with it
  },
})

For example, to just get the list of filenames from an archive:

const getEntryFilenames = async tarballFilename => {
  const filenames = []
  await tar.t({
    file: tarballFilename,
    onReadEntry: entry => filenames.push(entry.path),
  })
  return filenames
}

To replicate cat my-tarball.tgz | tar t do:

fs.createReadStream('my-tarball.tgz')
  .pipe(tar.t())
  .on('entry', entry => {
    // do whatever with it
  })

To do anything synchronous, add sync: true to the options. Note that sync functions don't take a callback and don't return a promise. When the function returns, it's already done. Sync methods without a file argument return a sync stream, which flushes immediately. But, of course, it still won't be done until you .end() it.

const getEntryFilenamesSync = tarballFilename => {
  const filenames = []
  tar.t({
    file: tarballFilename,
    onReadEntry: entry => filenames.push(entry.path),
    sync: true,
  })
  return filenames
}

To filter entries, add filter: <function> to the options. Tar-creating methods call the filter with filter(path, stat). Tar-reading methods (including extraction) call the filter with filter(path, entry). The filter is called in the this-context of the Pack or Unpack stream object.

The arguments list to tar t and tar x specify a list of filenames to extract or list, so they're equivalent to a filter that tests if the file is in the list.
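To make that equivalence concrete, here is a sketch of a filter built from a list of paths. It is an illustration only, not the library's implementation: the name makeListFilter is made up, and it assumes that listing a directory should also match everything beneath it.

```javascript
// Hypothetical helper: builds a filter(path) that accepts an entry when
// it, or one of its parent directories, appears in the list.
const makeListFilter = fileList => {
  // Ignore trailing slashes so 'dir/' and 'dir' match the same entries.
  const wanted = new Set(fileList.map(f => f.replace(/\/+$/, '')))
  return path => {
    let p = path.replace(/\/+$/, '')
    // Walk up from the entry towards the root, one directory at a time.
    while (p) {
      if (wanted.has(p)) return true
      const slash = p.lastIndexOf('/')
      p = slash === -1 ? '' : p.slice(0, slash)
    }
    return false
  }
}

const listFilter = makeListFilter(['docs', 'src/index.js'])
console.log(listFilter('docs/guide/intro.md')) // true
console.log(listFilter('src/index.js'))        // true
console.log(listFilter('src/other.js'))        // false
```

A function like this could be passed as the filter option; the second argument (stat or entry) is simply ignored here.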

For those who aren't fans of tar's single-character command names:

tar.c === tar.create
tar.r === tar.replace (appends to archive, file is required)
tar.u === tar.update (appends if newer, file is required)
tar.x === tar.extract
tar.t === tar.list

Keep reading for all the command descriptions and options, as well as the low-level API that they are built on.

tar.c(options, fileList, callback) [alias: tar.create]

Create a tarball archive.

The fileList is an array of paths to add to the tarball. Adding a directory also adds its children recursively.

An entry in fileList that starts with an @ symbol is a tar archive whose entries will be added. To add a file that starts with @, prepend it with ./.
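The @ convention can be pictured as a simple classification step applied to each fileList entry. The helper below is a hypothetical sketch of that rule as described here, not code from the library.

```javascript
// Hypothetical helper: classifies a fileList entry according to the
// '@' convention described above.
const classifyEntry = entry =>
  entry.startsWith('@')
    ? { kind: 'archive', path: entry.slice(1) } // entries come from this tar file
    : { kind: 'path', path: entry }             // a regular file system path

console.log(classifyEntry('@other.tar')) // { kind: 'archive', path: 'other.tar' }
console.log(classifyEntry('./@scoped'))  // { kind: 'path', path: './@scoped' }
```

Because './@scoped' no longer starts with @, it is treated as an ordinary path, which is why the ./ prefix works as an escape.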

The following options are supported:

The following options are mostly internal, but can be modified in some advanced use cases, such as re-using caches between runs.

Using onWriteEntry to alter entries

The onWriteEntry function, if provided, will get a reference to each entry object on its way into the archive.

If any fields on this entry are changed, then these changes will be reflected in the entry that is written to the archive.

The return value of the method is ignored. All that matters is the final state of the entry object. This can also be used to track the files added to an archive, for example.

import * as tar from 'tar'
const filesAdded = []
tar.c({
  sync: true,
  file: 'lowercase-executable.tar',
  onWriteEntry(entry) {
    // initially, it's uppercase and 0o644
    console.log('adding', entry.path, entry.stat.mode.toString(8))
    // make all the paths lowercase
    entry.path = entry.path.toLowerCase()
    // make the entry executable
    entry.stat.mode = 0o755
    // in the archive, it's lowercase and 0o755
    filesAdded.push([entry.path, entry.stat.mode.toString(8)])
  },
}, ['./bin'])
console.log('added', filesAdded)

Then, if the ./bin directory contained SOME-BIN, it would show up in the archive as:

$ node create-lowercase-executable.js
adding ./bin/SOME-BIN 644
added [[ './bin/some-bin', '755' ]]

$ tar tvf lowercase-executable.tar
-rwxr-xr-x  0 isaacs 20      47731 Aug 14 08:56 ./bin/some-bin

with a lowercase name and a mode of 0o755.

tar.x(options, fileList, callback) [alias: tar.extract]

Extract a tarball archive.

The fileList is an array of paths to extract from the tarball. If no paths are provided, then all the entries are extracted.

If the archive is gzipped, then tar will detect this and unzip it.

Note that all directories that are created will be forced to be writable, readable, and listable by their owner, to avoid cases where a directory prevents extraction of child entries by virtue of its mode.
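One way to picture the directory mode adjustment is as OR-ing the owner's read, write, and execute bits (0o700) into whatever mode the archive specifies. The helper name below is made up, and this is a sketch of the idea rather than the library's code.

```javascript
// Hypothetical helper: keep the permission bits from the archive, then
// force owner read/write/execute (list) access on top of them.
const ownerAccessibleMode = mode => (mode & 0o7777) | 0o700

console.log(ownerAccessibleMode(0o055).toString(8)) // 755
console.log(ownerAccessibleMode(0o500).toString(8)) // 700
console.log(ownerAccessibleMode(0o755).toString(8)) // 755
```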

Most extraction errors will cause a warn event to be emitted. If the cwd is missing, or not a directory, then the extraction will fail completely.

The following options are supported:

The following options are mostly internal, but can be modified in some advanced use cases, such as re-using caches between runs.

Note that using an asynchronous stream type with the transform option will cause undefined behavior in sync extractions. MiniPass-based streams are designed for this use case.

tar.t(options, fileList, callback) [alias: tar.list]

List the contents of a tarball archive.

The fileList is an array of paths to list from the tarball. If no paths are provided, then all the entries are listed.

If the archive is gzipped, then tar will detect this and unzip it.

If the file option is not provided, then this returns an event emitter that emits entry events with tar.ReadEntry objects. However, these entries don't emit 'data' or 'end' events. (If you want to get actual readable entries, use the tar.Parse class instead.)

If a file option is provided, then the return value will be a promise that resolves when the file has been fully traversed in async mode, or undefined if sync: true is set. Thus, you must specify an onReadEntry method in order to do anything useful with the data it parses.

The following options are supported:

tar.u(options, fileList, callback) [alias: tar.update]

Add files to an archive if they are newer than the entry already in the tarball archive.
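The "newer than" rule can be modelled as a comparison between a file's mtime and the mtime recorded for the same path in the archive. The sketch below is a simplified model with made-up names; in particular, treating equal timestamps as "not newer" is an assumption of this sketch, not a statement about the library.

```javascript
// Hypothetical model: `archived` maps entry path -> mtime already in
// the archive.
const shouldAdd = (path, fileMtime, archived) => {
  const existing = archived.get(path)
  // Not in the archive yet: always add.
  if (existing === undefined) return true
  // Assumption: only a strictly newer file counts as "newer".
  return fileMtime.getTime() > existing.getTime()
}

const archived = new Map([['a.txt', new Date('2024-01-01T00:00:00Z')]])
console.log(shouldAdd('a.txt', new Date('2024-06-01T00:00:00Z'), archived)) // true
console.log(shouldAdd('a.txt', new Date('2023-06-01T00:00:00Z'), archived)) // false
console.log(shouldAdd('b.txt', new Date('2023-06-01T00:00:00Z'), archived)) // true
```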

The fileList is an array of paths to add to the tarball. Adding a directory also adds its children recursively.

An entry in fileList that starts with an @ symbol is a tar archive whose entries will be added. To add a file that starts with @, prepend it with ./.

The following options are supported:

tar.r(options, fileList, callback) [alias: tar.replace]

Add files to an existing archive. Because later entries override earlier entries, this effectively replaces any existing entries.
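The "later entries override earlier entries" behavior can be simulated without touching the file system. In this sketch the entries array is a stand-in for the sequence of entries in an archive, and the Map plays the role of the directory being extracted into.

```javascript
// Stand-in for the entries of an archive, in the order they appear.
const entries = [
  { path: 'config.json', body: 'v1' },
  { path: 'readme.txt', body: 'hello' },
  { path: 'config.json', body: 'v2' }, // appended later
]

// Simulate extraction: each entry overwrites any earlier one at the
// same path, so the last one wins.
const extracted = new Map()
for (const { path, body } of entries) {
  extracted.set(path, body)
}

console.log(extracted.get('config.json')) // v2
console.log(extracted.size)               // 2
```

Note that the earlier entry is still present in the archive; it is only superseded when the archive is read in order.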

The fileList is an array of paths to add to the tarball. Adding a directory also adds its children recursively.

An entry in fileList that starts with an @ symbol is a tar archive whose entries will be added. To add a file that starts with @, prepend it with ./.

The following options are supported:

Low-Level API

class Pack

A readable tar stream.

Has all the standard readable stream interface stuff. 'data' and 'end' events, read() method, pause() and resume(), etc.

constructor(options)

The following options are supported:

add(path)

Adds an entry to the archive. Returns the Pack stream.

write(path)

Adds an entry to the archive. Returns true if flushed.

end()

Finishes the archive.

class PackSync

Synchronous version of Pack.

class Unpack

A writable stream that unpacks a tar archive onto the file system.

All the normal writable stream stuff is supported. write() and end() methods, 'drain' events, etc.

Note that all directories that are created will be forced to be writable, readable, and listable by their owner, to avoid cases where a directory prevents extraction of child entries by virtue of its mode.

'close' is emitted when it's done writing stuff to the file system.

Most unpack errors will cause a warn event to be emitted. If the cwd is missing, or not a directory, then an error will be emitted.

constructor(options)

class UnpackSync

Synchronous version of Unpack.

Note that using an asynchronous stream type with the transform option will cause undefined behavior in sync unpack streams. MiniPass-based streams are designed for this use case.

class tar.Parse

A writable stream that parses a tar archive stream. All the standard writable stream stuff is supported.

If the archive is gzipped, then tar will detect this and unzip it.

Emits 'entry' events with tar.ReadEntry objects, which are themselves readable streams that you can pipe wherever.

Each entry will not emit until the one before it is flushed through, so make sure to either consume the data (with on('data', ...) or .pipe(...)) or throw it away with .resume() to keep the stream flowing.

constructor(options)

Returns an event emitter that emits entry events with tar.ReadEntry objects.

The following options are supported:

abort(error)

Stop all parsing activities. This is called when there are zlib errors. It also emits an unrecoverable warning with the error provided.

class tar.ReadEntry extends MiniPass

A representation of an entry that is being read out of a tar archive.

It has the following fields:

constructor(header, extended, globalExtended)

Create a new ReadEntry object with the specified header, extended header, and global extended header values.

class tar.WriteEntry extends MiniPass

A representation of an entry that is being written from the file system into a tar archive.

Emits data for the Header, and for the Pax Extended Header if one is required, as well as any body data.

Creating a WriteEntry for a directory does not also create WriteEntry objects for all of the directory contents.

It has the following fields:

constructor(path, options)

path is the path of the entry as it is written in the archive.

The following options are supported:

warn(message, data)

If strict, emit an error with the provided message.

Otherwise, emit a 'warn' event with the provided message and data.

class tar.WriteEntry.Sync

Synchronous version of tar.WriteEntry

class tar.WriteEntry.Tar

A version of tar.WriteEntry that gets its data from a tar.ReadEntry instead of from the filesystem.

constructor(readEntry, options)

readEntry is the entry being read out of another archive.

The following options are supported:

class tar.Header

A class for reading and writing header blocks.

It has the following fields:

constructor(data, [offset=0])

data is optional. It is either a Buffer that should be interpreted as a tar Header starting at the specified offset and continuing for 512 bytes, or a data object of keys and values to set on the header object, and eventually encode as a tar Header.

decode(block, offset)

Decode the provided buffer starting at the specified offset.

The buffer must contain at least 512 bytes starting at the specified offset.
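To give a feel for what decoding a 512-byte header block involves, here is a minimal sketch that reads a few fields using the POSIX ustar layout (name at byte 0, mode at 100, size at 124, checksum at 148, type flag at 156). It is not tar.Header: the function names are made up, only a handful of fields are handled, and the block is built by hand for the demonstration.

```javascript
const BLOCK = 512

// Read a NUL-terminated string field.
const readString = (buf, start, length) => {
  const field = buf.subarray(start, start + length)
  const nul = field.indexOf(0)
  return field.subarray(0, nul === -1 ? length : nul).toString('utf8')
}

// Read an octal number field (NUL or space terminated).
const readOctal = (buf, start, length) =>
  parseInt(readString(buf, start, length).trim() || '0', 8)

// Hypothetical decoder for a few ustar fields.
const decodeHeader = (buf, offset = 0) => {
  if (buf.length < offset + BLOCK) {
    throw new Error('need 512 bytes for header')
  }
  const block = buf.subarray(offset, offset + BLOCK)
  // Checksum: sum of all bytes, counting the checksum field as 8 spaces.
  let sum = 8 * 0x20
  for (let i = 0; i < BLOCK; i++) {
    if (i < 148 || i >= 156) sum += block[i]
  }
  return {
    path: readString(block, 0, 100),
    mode: readOctal(block, 100, 8),
    size: readOctal(block, 124, 12),
    type: String.fromCharCode(block[156]),
    cksumValid: sum === readOctal(block, 148, 8),
  }
}

// Build a minimal header block by hand to exercise the decoder.
const raw = Buffer.alloc(BLOCK)
raw.write('hello.txt', 0)
raw.write('0000644\0', 100)
raw.write('00000000005\0', 124)
raw.write('0', 156)
raw.write('ustar\0' + '00', 257)
// The checksum field is still all zeros here, so add 8 spaces for it.
let total = 8 * 0x20
for (const byte of raw) total += byte
raw.write(total.toString(8).padStart(6, '0') + '\0 ', 148)

const header = decodeHeader(raw)
console.log(header.path, header.size, header.cksumValid) // hello.txt 5 true
```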

set(data)

Set the fields in the data object.

encode(buffer, offset)

Encode the header fields into the buffer at the specified offset.

Returns this.needPax to indicate whether a Pax Extended Header is required to properly encode the specified data.

class tar.Pax

An object representing a set of key-value pairs in a Pax extended header entry.

It has the following fields. Where the same name is used, they have the same semantics as the tar.Header field of the same name.

constructor(object, global)

Set the fields set in the object. global is a boolean that defaults to false.

encode()

Return a Buffer containing the header and body for the Pax extended header entry, or null if there is nothing to encode.

encodeBody()

Return a string representing the body of the pax extended header entry.

encodeField(fieldName)

Return a string representing the key/value encoding for the specified fieldName, or '' if the field is unset.
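For context, the pax format defined by POSIX stores each record as "LENGTH KEY=VALUE\n", where LENGTH is the decimal byte length of the whole record, including the length digits themselves. The sketch below shows one way to compute such a record; encodePaxRecord is a made-up name and this is not the tar.Pax implementation.

```javascript
// Hypothetical stand-in for encoding a single pax record.
const encodePaxRecord = (key, value) => {
  const rest = ` ${key}=${value}\n`
  const restLen = Buffer.byteLength(rest)
  // The length prefix counts its own digits, so iterate until the
  // total stops changing.
  let total = restLen
  for (;;) {
    const next = restLen + String(total).length
    if (next === total) break
    total = next
  }
  return `${total}${rest}`
}

console.log(JSON.stringify(encodePaxRecord('path', 'foo'))) // "12 path=foo\n"
```

The loop matters near digit boundaries: a body of 98 bytes needs a three-digit prefix, giving a total of 101 rather than 100.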

tar.Pax.parse(string, extended, global)

Return a new Pax object created by parsing the contents of the string provided.

If the extended object is set, then also add the fields from that object. (This is necessary because multiple metadata entries can occur in sequence.)

tar.types

A translation table for the type field in tar headers.

tar.types.name.get(code)

Get the human-readable name for a given alphanumeric code.

tar.types.code.get(name)

Get the alphanumeric code for a given human-readable name.
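A two-way table like this can be sketched with a pair of Maps. The example below covers only a few of the standard ustar type flags, and the object is a stand-in built for illustration, not tar.types itself; the exact names used by the library are assumed here.

```javascript
// A small subset of ustar type flags and human-readable names.
// Illustration only; the names are assumptions of this sketch.
const pairs = [
  ['0', 'File'],
  ['1', 'Link'],
  ['2', 'SymbolicLink'],
  ['5', 'Directory'],
]

// Hypothetical stand-in with the same lookup shape as described above.
const types = {
  name: new Map(pairs),
  code: new Map(pairs.map(([code, name]) => [name, code])),
}

console.log(types.name.get('5'))    // Directory
console.log(types.code.get('File')) // 0
```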