NexR RHive 2.0
RHive is an R extension that facilitates distributed computing through Hive queries. It makes HQL (Hive SQL) easy to use from R, and makes R objects and R functions easy to use in Hive.
Before installing RHive, install Hadoop and Hive.
Install Hadoop
- Single Node
- Cluster Node
- Set HADOOP_HOME on the local machine where R runs.
Install Hive
- Install Hive on the local machine and on the remote machine where the NameNode or the Hive server runs.
- Installation Guide
- Set HIVE_HOME on the local machine where R runs.
- Launch the Hive server on the remote machine with the following command. It should run as a background process.
  - <code>$HIVE_HOME/bin/hive --service hiveserver</code>
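One way to keep the Hive server running in the background is to start it with nohup and send its output to a log file. The sketch below is only an illustration: the function name start_hiveserver and the HIVE_LOG variable (with its default path) are assumptions, not part of Hive or RHive.

```shell
# Sketch: start the Hive server as a background process on the remote machine.
# start_hiveserver and HIVE_LOG are hypothetical names used for illustration.
start_hiveserver() {
    # Abort with a message if HIVE_HOME is not set.
    : "${HIVE_HOME:?HIVE_HOME must be set}"
    hive_log="${HIVE_LOG:-/tmp/hiveserver.log}"

    # nohup keeps the server alive after the shell exits; output goes to the log.
    nohup "$HIVE_HOME/bin/hive" --service hiveserver >"$hive_log" 2>&1 &

    # Print the PID so the server can be stopped later with kill.
    echo "$!"
}

# Usage (not run here):
#   pid=$(start_hiveserver)
```

Printing the PID is a design choice that makes it straightforward to stop the server later with <code>kill "$pid"</code>.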
Install R and Packages
- Install R.
  - R must be installed on all tasktracker nodes.
- Install rJava.
  - rJava is only needed on the local machine.
- Install Rserve.
  - Rserve must be installed on all tasktracker nodes.
  - Edit the configuration file (/etc/Rserv.conf) on every tasktracker node and add 'remote enable' to allow remote connections.
  - Launch Rserve on every tasktracker node.
    - e.g. <code>R CMD Rserve</code>
- Configure the tasktracker nodes.
  - Add the R_HOME path to $HADOOP_HOME/conf/hadoop-env.sh.
    - e.g. <code>export R_HOME=/usr/lib/R</code>
- Install RUnit.
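The per-node configuration steps above can be sketched as two small helpers that are safe to run more than once. The function names enable_rserve_remote and set_hadoop_r_home are hypothetical; the file paths and the example R_HOME value are the ones given in the steps above.

```shell
# Sketch: per-node setup helpers for tasktracker nodes.
# Both function names are hypothetical; the file paths come from the steps above.

# Add "remote enable" to an Rserve config file unless it is already present.
enable_rserve_remote() {
    conf="${1:-/etc/Rserv.conf}"
    if ! grep -Eq '^[[:space:]]*remote[[:space:]]+enable[[:space:]]*$' "$conf" 2>/dev/null; then
        echo 'remote enable' >> "$conf"
    fi
}

# Add an R_HOME export to hadoop-env.sh unless one is already present.
set_hadoop_r_home() {
    env_file="$1"   # e.g. $HADOOP_HOME/conf/hadoop-env.sh
    r_home="$2"     # e.g. /usr/lib/R
    if ! grep -q '^export R_HOME=' "$env_file" 2>/dev/null; then
        echo "export R_HOME=$r_home" >> "$env_file"
    fi
}

# Usage on each node (not run here; editing /etc usually needs root):
#   enable_rserve_remote /etc/Rserv.conf
#   set_hadoop_r_home "$HADOOP_HOME/conf/hadoop-env.sh" /usr/lib/R
#   R CMD Rserve
```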
Install RHive
- Requirements
  - ant (to build the Java files)
- Installing RHive
  - Download the source code: <code>git clone https://github.com/nexr/RHive.git</code>
  - Change your working directory: <code>cd RHive</code>
  - Set the environment variables HIVE_HOME and HADOOP_HOME: <code>export HIVE_HOME=/path/to/your/hive/directory</code> <code>export HADOOP_HOME=/path/to/your/hadoop/directory</code>
  - Build the Java files using ant: <code>ant build</code>
  - Build RHive: <code>R CMD build RHive</code>
  - Install RHive: <code>R CMD INSTALL RHive_<VERSION>.tar.gz</code>
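The installation steps above can be chained into a single function so that a failure at any step stops the rest. This is a sketch under a few assumptions: build_rhive is a hypothetical name, git, ant, and R are expected on the PATH, and the tarball is matched with a glob because the version in its name is not known in advance.

```shell
# Sketch: the build and install steps as one function.
# build_rhive is a hypothetical name; it expects git, ant, and R on the PATH.
# The body runs in a subshell, so the caller's working directory is unchanged.
build_rhive() (
    # Abort early if either variable is missing, then export both for ant.
    : "${HIVE_HOME:?HIVE_HOME must be set}"
    : "${HADOOP_HOME:?HADOOP_HOME must be set}"
    export HIVE_HOME HADOOP_HOME

    # Each step runs only if the previous one succeeded.
    # The glob assumes exactly one RHive_*.tar.gz was produced by the build.
    git clone https://github.com/nexr/RHive.git &&
        cd RHive &&
        ant build &&
        R CMD build RHive &&
        R CMD INSTALL RHive_*.tar.gz
)

# Usage (not run here):
#   export HIVE_HOME=/path/to/your/hive/directory
#   export HADOOP_HOME=/path/to/your/hadoop/directory
#   build_rhive
```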
Loading RHive and connecting to Hive
- Set the environment variables HIVE_HOME, HADOOP_HOME, and HADOOP_CONF_DIR in one of two ways:
  - Export them in the shell: <code>export HIVE_HOME=/path/to/your/hive/directory</code> <code>export HADOOP_HOME=/path/to/your/hadoop/directory</code> <code>export HADOOP_CONF_DIR=/path/to/your/hadoop/conf/directory</code>
  - Or add them to Renviron: <code>HIVE_HOME=/path/to/your/hive/directory</code> <code>HADOOP_HOME=/path/to/your/hadoop/directory</code> <code>HADOOP_CONF_DIR=/path/to/your/hadoop/conf/directory</code>
- Launch R.
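For the Renviron option, the entries can be generated from variables already exported in the shell. The sketch below assumes the user-level file ~/.Renviron as the default target, which R reads at startup; the text above does not say which Renviron file is meant, and write_rhive_renviron is a hypothetical helper name.

```shell
# Sketch: append the three variables to an Renviron file from the current shell.
# write_rhive_renviron is a hypothetical helper; ~/.Renviron is an assumed target.
write_rhive_renviron() {
    renviron="${1:-$HOME/.Renviron}"
    for name in HIVE_HOME HADOOP_HOME HADOOP_CONF_DIR; do
        # Look up the variable's current value by name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set" >&2
            return 1
        fi
        # Skip variables the file already defines.
        grep -q "^$name=" "$renviron" 2>/dev/null || echo "$name=$value" >> "$renviron"
    done
}

# Usage (not run here):
#   write_rhive_renviron          # writes to ~/.Renviron
#   R                             # then launch R
```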
Tutorials
Requirements
- Java 1.6
- R 2.13.0
- Rserve 0.6-0
- rJava 0.9-0
- Hadoop 0.20.x (x >= 1)
- Hive 0.8.x (x >= 0)
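An installed version can be checked against this list with a small comparison helper. The sketch assumes the listed versions are minimums, which the list itself does not state, and it relies on <code>sort -V</code> as provided by GNU coreutils; version_ge is a hypothetical name.

```shell
# Sketch: compare dotted version strings, treating the listed versions as
# minimums (an assumption; the list does not say whether newer versions work).
# version_ge is a hypothetical helper and relies on `sort -V` (GNU coreutils).
version_ge() {
    # True when $1 >= $2 in version order: the smaller of the two must be $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage (not run here):
#   r_version=$(Rscript -e 'cat(as.character(getRversion()))')
#   version_ge "$r_version" 2.13.0 || echo "R is older than 2.13.0" >&2
```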