Awesome
README
ClickHouse (https://clickhouse.com/) is an open-source, high performance columnar OLAP (online analytical processing of queries) database management system for real-time analytics using SQL. This DBI backend relies on the ‘ClickHouse’ HTTP interface and support HTTPS protocol.
This package has been developed as an alternative to the excellent RClickhouse to provide secured connection through SSL using HTTPS (unfortunately SSL connection is not yet supported by RClickhouse).
The ClickHouseHTTP R package is licensed under GPL-3.
Installation
From CRAN
install.packages("ClickHouseHTTP")
Dependencies
The following R packages available on CRAN are required:
- methods: Formal Methods and Classes
- DBI: R Database Interface
- httr: Tools for Working with URLs and HTTP
- jsonlite: A Simple and Robust JSON Parser and Generator for R
- arrow: Integration to ‘Apache’ ‘Arrow’
- data.table: Extension
of
data.frame
And those are suggested:
- knitr: A General-Purpose Package for Dynamic Report Generation in R
- rmarkdown: Dynamic Documents for R
- dplyr: A Grammar of Data Manipulation
- stringi: Fast and Portable Character String Processing Facilities
From github
devtools::install_github("patzaw/ClickHouseHTTP")
Documentation
Usage
Connection
library(DBI)
## HTTP connection
con <- dbConnect(
ClickHouseHTTP::ClickHouseHTTP(), host="localhost",
port=8123
)
## HTTPS connection (without ssl peer verification)
con <- dbConnect(
ClickHouseHTTP::ClickHouseHTTP(), host="localhost",
port=8443, https=TRUE, ssl_verifypeer=FALSE
)
Write a table in the database
library(dplyr)
data("mtcars")
mtcars <- as_tibble(mtcars, rownames="car")
dbWriteTable(con, "mtcars", mtcars)
Query the database
carsFromDB <- dbReadTable(con, "mtcars")
dbGetQuery(con, "SELECT car, mpg, cyl, hp FROM mtcars WHERE hp>=110")
By default, ClickHouseHTTP relies on the Apache
Arrow
format provided by ClickHouse.
However, as described in the
documentation,
the following types are not supported in the current implementation of
this format: TIME32, FIXED_SIZE_BINARY, JSON, UUID, ENUM. The
format
argument of the dbGetQuery()
function can be used to rely on
the TabSeparatedWithNamesAndTypes format.
selCars <- dbGetQuery(
con, "SELECT car, mpg, cyl, hp FROM mtcars WHERE hp>=110",
format="TabSeparatedWithNamesAndTypes"
)
## Identifying the original ClickHouse data types
attr(selCars, "type")
Using alternative databases stored in ClickHouse
dbSendQuery(con, "CREATE DATABASE swiss")
dbSendQuery(con, "USE swiss")
The chosen database is used until the session expires. It can also be
chosen when connecting using the dbname
argument of the dbConnect()
function.
The example below shows that spaces in column names are supported. It
also shows the support of R list
using the Array ClickHouse type.
data("swiss")
swiss <- as_tibble(swiss, rownames="province")
swiss <- mutate(swiss, "pr letters"=strsplit(province, ""))
dbWriteTable(
con, "swiss", swiss,
engine="MergeTree() ORDER BY (Fertility, province)"
)
swissFromDB <- dbReadTable(con, "swiss")
A table from another database can also be accessed as following:
dbReadTable(con, SQL("default.mtcars"))
Setting up a ClickHouse database using docker
Configuration
CH_HOME=~/Documents/Projects/Test_CH
mkdir -p $CH_HOME
mkdir -p ${CH_HOME}/data
mkdir -p ${CH_HOME}/conf
mkdir -p ${CH_HOME}/log
Configuration files are shared in the supp/ClickHouse-Conf-Files
folder in this repository.
cp supp/ClickHouse-Conf-Files/users.xml ${CH_HOME}/conf/
cp supp/ClickHouse-Conf-Files/config.xml ${CH_HOME}/conf/
SSL certificate
openssl req -subj "/CN=localhost" -new -newkey rsa:2048 -days 365 -nodes -x509 -keyout ${CH_HOME}/conf/server.key -out ${CH_HOME}/conf/server.crt
openssl dhparam -out ${CH_HOME}/conf/dhparam.pem 4096
Container
The following ports are supported in the shared config.xml
file:
- 9000: Native TCP interface (not used by ClickHouseHTTP)
- 9440: Native TCP interface wrapped in TLS (not used by ClickHouseHTTP)
- 8123: HTTP interface
- 8443: HTTPS interface
chmod -R a+rwx ${CH_HOME}
docker run -d --name Test_CH \
--ulimit nofile=262144:262144 \
--volume ${CH_HOME}/data:/var/lib/clickhouse \
--publish=9000:9000 \
--publish=9440:9440 \
--publish=8123:8123 \
--publish=8443:8443 \
--volume ${CH_HOME}/conf/users.xml:/etc/clickhouse-server/users.xml \
--volume ${CH_HOME}/conf/config.xml:/etc/clickhouse-server/config.xml \
--volume ${CH_HOME}/conf/server.crt:/etc/clickhouse-server/server.crt \
--volume ${CH_HOME}/conf/server.key:/etc/clickhouse-server/server.key \
--volume ${CH_HOME}/conf/dhparam.pem:/etc/clickhouse-server/dhparam.pem \
--volume ${CH_HOME}/log:/var/log/clickhouse-server \
clickhouse/clickhouse-server:22.2.3.5
Alternatives
-
RClickhouse is another DBI backend for the ClickHouse database. It provides basic dplyr support by auto-generating SQL-commands using dbplyr and is based on this C++ Clickhouse Client.
-
clickhouse-r is another DBI backend for the ClickHouse database relying on HTTP protocol. It provides SSL support but without peer verification for the moment.
Acknowledgments
This work was entirely supported by UCB Pharma (Early Solutions department).