Daru-IO

A Ruby plugin gem for the daru gem that extends support for many import and export methods of Daru::DataFrame. It is intended to help Rubyists who work in data analysis or web development by serving as a general-purpose conversion library that takes input in one format (say, JSON) and converts it to another (say, Avro), while also making it easy to get started on analyzing data with daru.

While supporting various IO modules, daru-io also provides an easier way of adding more Importers / Exporters. If you're interested in creating new Importers / Exporters, have a look at the 'Creating your own IO modules' section.

Table of contents

Installation

(Go to Table of Contents)

Note: Each IO module has its own set of dependencies. Have a look at the Importers and Exporters sections for dependency-specific information.

Importers

The Daru::IO Importers are intended to return a Daru::DataFrame from the given arguments. Generally, all Importers can be called in two ways - from Daru::IO or Daru::DataFrame.

#! Partially requires Format Importer
require 'daru/io/importers/format'

#! Usage from Daru::IO
instance = Daru::IO::Importers::Format.from(connection)
# or,
instance = Daru::IO::Importers::Format.read(path)
df = instance.call(opts)

#! Usage from Daru::DataFrame
df1 = Daru::DataFrame.from_format(connection, opts)
df2 = Daru::DataFrame.read_format(path, opts)

Note: See the respective Importer doc links below for arguments and examples.
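The two call styles above can be pictured with a small stand-alone sketch. Nothing in it is daru-io code: `ToyFrame`, `ToyImporters::Pairs` and `read_pairs` are hypothetical names, and a plain Hash of columns stands in for a Daru::DataFrame. It only illustrates an importer object that captures its source first (`read`) and builds the result later (`call`), plus a convenience class method that does both in one step.

```ruby
require 'tempfile'

# Hypothetical stand-in for Daru::DataFrame: wraps a Hash of columns.
class ToyFrame
  attr_reader :columns

  def initialize(columns)
    @columns = columns
  end
end

module ToyImporters
  # Hypothetical importer for "key=value" lines; not part of daru-io.
  class Pairs
    # Step 1: load the source and return an importer instance.
    def self.read(path)
      new(File.readlines(path, chomp: true))
    end

    def initialize(lines)
      @lines = lines
    end

    # Step 2: build the frame; options are applied only at call time.
    def call(upcase_keys: false)
      pairs = @lines.reject(&:empty?).map { |line| line.split('=', 2) }
      keys = pairs.map(&:first)
      keys = keys.map(&:upcase) if upcase_keys
      ToyFrame.new('key' => keys, 'value' => pairs.map(&:last))
    end
  end
end

# Convenience entry point on the frame class, in the spirit of read_format.
class ToyFrame
  def self.read_pairs(path, **opts)
    ToyImporters::Pairs.read(path).call(**opts)
  end
end

# Writes a tiny sample file and imports it both ways.
def demo_pairs_import
  Tempfile.create(['pairs', '.txt']) do |file|
    file.write("a=1\nb=2\n")
    file.flush
    two_step = ToyImporters::Pairs.read(file.path).call(upcase_keys: true)
    one_step = ToyFrame.read_pairs(file.path)
    [two_step.columns, one_step.columns]
  end
end

p demo_pairs_import
```

Keeping `read` and `call` separate lets the same importer instance be reused with different options, while the class-method shortcut covers the common one-liner case.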

ActiveRecord Importer

(Go to Table of Contents)

Imports a Daru::DataFrame from an ActiveRecord connection.

Avro Importer

(Go to Table of Contents)

Imports a Daru::DataFrame from an .avro file.

CSV Importer

(Go to Table of Contents)

Imports a Daru::DataFrame from a .csv or .csv.gz file.
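To make the .csv.gz case concrete, here is a stdlib-only sketch of reading a gzip-compressed CSV into column arrays. It is not the CSV Importer's implementation: `read_gzipped_csv_columns` is a hypothetical helper, and the resulting Hash of columns merely stands in for a Daru::DataFrame.

```ruby
require 'csv'
require 'zlib'
require 'tempfile'

# Hypothetical helper: decompress a .csv.gz file and return { header => [values] }.
def read_gzipped_csv_columns(path)
  text = Zlib::GzipReader.open(path, &:read)
  table = CSV.parse(text, headers: true, converters: :numeric)
  table.headers.to_h { |name| [name, table[name]] }
end

# Writes a small compressed sample and reads it back.
def demo_gzipped_csv
  Tempfile.create(['sample', '.csv.gz']) do |file|
    file.close
    Zlib::GzipWriter.open(file.path) do |gz|
      gz.write("city,temp\nPune,31.5\nOslo,4\n")
    end
    read_gzipped_csv_columns(file.path)
  end
end

p demo_gzipped_csv
```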

Excel Importer

(Go to Table of Contents)

Imports a Daru::DataFrame from a .xls file.

Excelx Importer

(Go to Table of Contents)

Imports a Daru::DataFrame from a .xlsx file.

HTML Importer

(Go to Table of Contents)

Note: This module works only for static tables on an HTML page. It won't work when the table is loaded into the page by inline JavaScript. This is how the Nokogiri gem works, and the HTML Importer follows suit.

Imports an Array of Daru::DataFrames from a .html file or website.

JSON Importer

(Go to Table of Contents)

Imports a Daru::DataFrame from a .json file / response.

Mongo Importer

(Go to Table of Contents)

Note: The Mongo gem raises an 'Argument Error : expected Proc Argument' error due to the bug in MRI Ruby 2.4.0 mentioned here. This seems to have been fixed from Ruby 2.4.1 onwards, so please avoid using the Mongo Importer with Ruby 2.4.0.

Imports a Daru::DataFrame from a Mongo collection.
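If an application may run on several Ruby versions, the caveat in the note above can be turned into an explicit check before the Mongo Importer is loaded. This is only a sketch: `AFFECTED_RUBY_VERSIONS` and `mongo_importer_safe?` are hypothetical names, and the affected version is taken from the note rather than detected from the Mongo gem.

```ruby
# Hypothetical guard; the affected version comes from the note above.
AFFECTED_RUBY_VERSIONS = ['2.4.0'].freeze

# Returns false only for Ruby versions listed as affected.
def mongo_importer_safe?(ruby_version = RUBY_VERSION)
  !AFFECTED_RUBY_VERSIONS.include?(ruby_version)
end

if mongo_importer_safe?
  puts "Ruby #{RUBY_VERSION}: not listed as affected by the 2.4.0 bug."
else
  warn "Ruby #{RUBY_VERSION}: avoid the Mongo Importer; use 2.4.1 or later."
end
```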

Plaintext Importer

(Go to Table of Contents)

Imports a Daru::DataFrame from a .dat plaintext file (space separated table of simple strings and numbers). For a sample format of the plaintext file, have a look at the example bank2.dat file.
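As a rough picture of what a space-separated table of simple strings and numbers looks like once parsed, the sketch below splits each line on whitespace and coerces numeric-looking cells. `parse_plaintext_table` and `coerce_cell` are hypothetical helpers, the column names are supplied by the caller, and the sample data is made up; this is not the Plaintext Importer's code.

```ruby
# Hypothetical helper: turn numeric-looking strings into Integer or Float.
def coerce_cell(cell)
  case cell
  when /\A-?\d+\z/ then cell.to_i
  when /\A-?\d+\.\d+\z/ then cell.to_f
  else cell
  end
end

# Hypothetical helper: parse whitespace-separated rows into { name => [values] }.
def parse_plaintext_table(text, headers)
  rows = text.each_line.map(&:split).reject(&:empty?)
  headers.each_with_index.to_h do |name, index|
    [name, rows.map { |row| coerce_cell(row[index]) }]
  end
end

# Made-up sample: two rows, three columns.
SAMPLE_PLAINTEXT = "1.5 10 yes\n2.25 20 no\n"

p parse_plaintext_table(SAMPLE_PLAINTEXT, %w[rate count flag])
```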

RData Importer

(Go to Table of Contents)

Imports a Daru::DataFrame from a variable in an .rdata file.

RDS Importer

(Go to Table of Contents)

Imports a Daru::DataFrame from a .rds file.

Redis Importer

(Go to Table of Contents)

Imports a Daru::DataFrame from Redis key(s).

SQL Importer

(Go to Table of Contents)

Imports a Daru::DataFrame from a sqlite.db file / DBI connection.

Exporters

The Daru::IO Exporters are intended to 'migrate' a Daru::DataFrame into a file, or database. All Exporters can be called in two ways - from Daru::IO or Daru::DataFrame.

#! Partially requires Format Exporter
require 'daru/io/exporters/format'

#! Usage from Daru::IO
instance = Daru::IO::Exporters::Format.new(df, opts)
instance.to_s #=> Provides a file-writable string, which can be used in web applications for file download purposes
instance.to #=> Provides a Format instance
instance.write(path) #=> Writes to the given path

#! Usage from Daru::DataFrame
string = df.to_format_string(opts) #=> Provides a file-writable string, which can be written to a file later
instance = df.to_format(opts) #=> Provides a Format instance
df.write_format(path, opts) #=> Writes to the given path

Note: See the respective Exporter doc links below for arguments and examples.
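The exporter shape described above (construct with the data and options, then either ask for a string or write to a path) can be sketched without daru. `ToyExporters::Tsv` is a hypothetical class and the Hash of columns stands in for a Daru::DataFrame; it is not one of the exporters shipped with daru-io.

```ruby
require 'tempfile'

module ToyExporters
  # Hypothetical exporter for tab-separated text; not part of daru-io.
  class Tsv
    def initialize(columns, header: true)
      @columns = columns
      @header = header
    end

    # File-writable string, e.g. for a download response in a web app.
    def to_s
      rows = @columns.values.transpose
      rows.unshift(@columns.keys) if @header
      rows.map { |row| row.join("\t") }.join("\n") + "\n"
    end

    # Writes the same string to the given path.
    def write(path)
      File.write(path, to_s)
    end
  end
end

# Exports once as a string (no header) and once to a file (with header).
def demo_tsv_export
  columns = { 'name' => %w[ann bob], 'age' => [31, 27] }
  Tempfile.create(['people', '.tsv']) do |file|
    ToyExporters::Tsv.new(columns).write(file.path)
    [ToyExporters::Tsv.new(columns, header: false).to_s, File.read(file.path)]
  end
end

p demo_tsv_export
```

Having `write` delegate to `to_s` keeps the file output and the in-memory string identical, which is convenient when the same export backs both a download and a saved file.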

Avro Exporter

(Go to Table of Contents)

Exports a Daru::DataFrame into a .avro file.

CSV Exporter

(Go to Table of Contents)

Exports a Daru::DataFrame into a .csv or .csv.gz file.

Excel Exporter

(Go to Table of Contents)

Exports a Daru::DataFrame into a .xls file.

JSON Exporter

(Go to Table of Contents)

Exports a Daru::DataFrame into a .json file.

RData Exporter

(Go to Table of Contents)

Exports multiple Daru::DataFrames into an .rdata file.

RDS Exporter

(Go to Table of Contents)

Exports a Daru::DataFrame into a .rds file.

SQL Exporter

(Go to Table of Contents)

Exports a Daru::DataFrame into a database (SQL) table through a DBI connection.

Creating your own IO modules

Daru-IO currently supports various Import / Export methods, as can be seen from the list above. But the list is never complete: there may always be a format specific to your use-case that you badly need, but that does not fit the needs of the majority of the community. In such a case, you can always tweak (that is, monkey-patch) daru-io in your application. The architecture of daru-io provides a neater way of monkey-patching Daru::DataFrame to support your unique use-case.

Note: The new module can be made to inherit from another module (like Importers::JSON) rather than Importers::Base, depending on use-case (say, parse a complexly nested API response with JsonPaths).
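The idea of inheriting from an existing importer rather than the base class can be illustrated with a self-contained sketch. Every name here (`MiniFrame`, `MiniImporters::Base`, `register`, `Json`, `NestedJson`) is hypothetical and does not reproduce daru-io's actual classes or registration mechanism. It also uses `Hash#dig` with a list of keys instead of JsonPaths, purely to keep the example dependency-free.

```ruby
require 'json'

# Hypothetical stand-in for Daru::DataFrame.
class MiniFrame
  attr_reader :columns

  def initialize(columns)
    @columns = columns
  end
end

module MiniImporters
  # Hypothetical base class: subclasses implement #call and register a shortcut.
  class Base
    # Defines MiniFrame.<method_name>(source, **opts) for the calling subclass.
    def self.register(method_name)
      importer = self
      MiniFrame.define_singleton_method(method_name) do |source, **opts|
        importer.new(source).call(**opts)
      end
    end

    def initialize(source)
      @source = source
    end
  end

  # Turns a JSON array of objects into columns.
  class Json < Base
    register :from_json

    def call(**_opts)
      MiniFrame.new(to_columns(JSON.parse(@source)))
    end

    private

    def to_columns(records)
      keys = records.flat_map(&:keys).uniq
      keys.to_h { |key| [key, records.map { |record| record[key] }] }
    end
  end

  # Custom importer reusing Json: digs into a nested key of the response first.
  class NestedJson < Json
    register :from_nested_json

    def call(at:)
      MiniFrame.new(to_columns(JSON.parse(@source).dig(*at)))
    end
  end
end

# Made-up nested API response.
NESTED_RESPONSE = '{"data":{"items":[{"id":1,"tag":"a"},{"id":2,"tag":"b"}]}}'

p MiniFrame.from_nested_json(NESTED_RESPONSE, at: %w[data items]).columns
```

The subclass only overrides the step that differs (locating the records) and inherits the column-building logic, which is the main reason to extend a format-specific importer instead of starting from the base.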

Contributing

(Go to Table of Contents)

Contributions are always welcome, but please have a look at the contribution guidelines before contributing. :tada:

License

(Go to Table of Contents)

The MIT License (MIT) 2017 - Athitya Kumar and Ruby Science Foundation. Please have a look at the LICENSE.md for more details.