NexR RHive 2.0

RHive is an R extension that facilitates distributed computing via Hive queries. It makes it easy to use HQL (Hive SQL) from R, and to use R objects and R functions in Hive.

Before installing RHive, you must install Hadoop and Hive.

Install Hadoop

  1. Single Node
  2. Cluster Node
  3. Set HADOOP_HOME on the local machine on which R runs.

Install Hive

  1. Install Hive on the local machine and on the remote machine on which the NameNode or the Hive server runs.
  2. Installation Guide
  3. Set HIVE_HOME on the local machine on which R runs.
  4. Launch the Hive server on the remote machine with the following command. It should run as a background process.
    • <code>$HIVE_HOME/bin/hive --service hiveserver</code>
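
One common way to keep the Hive server running in the background after you log out is nohup. The following is a minimal sketch, not part of RHive or Hive: the function name start_hiveserver and the HIVE_LOG variable (with its /tmp default) are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: start the Hive server as a background process.
# start_hiveserver and HIVE_LOG are hypothetical names, not part of Hive or RHive.
start_hiveserver() {
    if [ -z "${HIVE_HOME:-}" ]; then
        echo "HIVE_HOME is not set" >&2
        return 1
    fi
    log="${HIVE_LOG:-/tmp/hiveserver.log}"
    # nohup keeps the server alive after the launching shell exits;
    # stdout and stderr both go to the log file.
    nohup "$HIVE_HOME/bin/hive" --service hiveserver >"$log" 2>&1 &
    # Print the PID of the background process so it can be stopped later.
    echo "$!"
}
```

Calling start_hiveserver on the remote machine prints the process ID; passing that ID to kill stops the server.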

Install R and Packages

  1. Install R
    • R must be installed on all tasktracker nodes.
  2. Install rJava
    • rJava only needs to be installed on the local machine.
  3. Install Rserve
    • Rserve must be installed on all tasktracker nodes.
    • Create the configuration file /etc/Rserv.conf on all tasktracker nodes, and edit it to add 'remote enable' so that remote connections are allowed.
    • Launch Rserve on all tasktracker nodes.
      • e.g. <code>R CMD Rserve</code>
  4. Configure the tasktracker nodes
    • Add the R_HOME path to $HADOOP_HOME/conf/hadoop-env.sh
      • e.g. <code>export R_HOME=/usr/lib/R</code>
  5. Install RUnit
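
Because the Rserve and hadoop-env.sh edits have to be repeated on every tasktracker node, it can help to make them idempotent. The sketch below assumes a hypothetical helper named append_once; it is not part of Rserve or Hadoop, and it needs write permission on the target file (typically root for /etc/Rserv.conf).

```shell
#!/bin/sh
# Sketch: add a configuration line only if it is not already present.
# append_once is a hypothetical helper name introduced for this example.

# append_once FILE LINE: append LINE to FILE unless an identical line exists.
append_once() {
    file=$1
    line=$2
    # Create the file if it does not exist yet.
    touch "$file" || return 1
    # -x matches whole lines only, -F treats LINE as a fixed string.
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >>"$file"
}

# Example calls, to be run on each tasktracker node:
#   append_once /etc/Rserv.conf 'remote enable'
#   append_once "$HADOOP_HOME/conf/hadoop-env.sh" 'export R_HOME=/usr/lib/R'
```

Running the helper twice leaves a single copy of each line, so it is safe to re-run when a node is reconfigured.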

Install RHive

  1. Requirements
    • ant (to build the Java files)
  2. Installing RHive
    1. Download source code: <code>git clone https://github.com/nexr/RHive.git</code>
    2. Change your working directory: <code>cd RHive</code>
    3. Set the environment variables HIVE_HOME and HADOOP_HOME: <code>export HIVE_HOME=/path/to/your/hive/directory</code> <code>export HADOOP_HOME=/path/to/your/hadoop/directory</code>
    4. Build the Java files using ant: <code>ant build</code>
    5. Build RHive: <code>R CMD build RHive</code>
    6. Install RHive: <code>R CMD INSTALL RHive_<VERSION>.tar.gz</code>
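
The installation steps above can be collected into one shell function. This is a sketch under assumptions rather than an official installer: the names run and build_rhive and the DRY_RUN switch are hypothetical, and the glob RHive_*.tar.gz stands in for the versioned tarball name.

```shell
#!/bin/sh
# Sketch: the build steps above as one function.
# run, build_rhive and DRY_RUN are hypothetical names introduced for this example.

# run CMD...: execute CMD, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

build_rhive() {
    # The build needs both variables, so stop early if either is missing.
    if [ -z "${HIVE_HOME:-}" ] || [ -z "${HADOOP_HOME:-}" ]; then
        echo "set HIVE_HOME and HADOOP_HOME first" >&2
        return 1
    fi
    export HIVE_HOME HADOOP_HOME
    run git clone https://github.com/nexr/RHive.git || return 1
    run cd RHive || return 1
    run ant build || return 1
    run R CMD build RHive || return 1
    # The tarball name contains the package version, so match it with a glob.
    run R CMD INSTALL RHive_*.tar.gz
}
```

Setting DRY_RUN=1 before calling build_rhive prints the commands without running them, which is a cheap way to review the sequence first.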

Loading RHive and connecting to Hive

  1. Set the environment variables HIVE_HOME, HADOOP_HOME and HADOOP_CONF_DIR:
    • Export them in the shell: <code>export HIVE_HOME=/path/to/your/hive/directory</code> <code>export HADOOP_HOME=/path/to/your/hadoop/directory</code> <code>export HADOOP_CONF_DIR=/path/to/your/hadoop/conf/directory</code>
    • Or add them to Renviron: <code>HIVE_HOME=/path/to/your/hive/directory</code> <code>HADOOP_HOME=/path/to/your/hadoop/directory</code> <code>HADOOP_CONF_DIR=/path/to/your/hadoop/conf/directory</code>
  2. Launch R, then load RHive and connect:
<pre><code>library(RHive)
rhive.connect(host, port, hiveServer2)</code></pre>
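
Before launching R, it may be worth confirming that the variables from step 1 are set and point at real directories. The sketch below assumes a hypothetical helper named check_env_dirs; it is not part of RHive.

```shell
#!/bin/sh
# Sketch: verify the variables RHive relies on before launching R.
# check_env_dirs is a hypothetical helper name introduced for this example.

# check_env_dirs NAME...: report every named variable that is unset
# or does not point at an existing directory; fail if any is reported.
check_env_dirs() {
    status=0
    for name in "$@"; do
        # Indirect lookup: read the value of the variable whose name is in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set" >&2
            status=1
        elif [ ! -d "$value" ]; then
            echo "$name points at a missing directory: $value" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, `check_env_dirs HIVE_HOME HADOOP_HOME HADOOP_CONF_DIR && R` starts R only when all three checks pass. The variables still have to be exported (or listed in Renviron) for R itself to see them.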

Tutorials

Requirements