
neoloadcsvskelgen

Pronounced "niölödsiessvìsköldjen" (bless you).

The name stands for Neo4j LOAD CSV skeleton generator.

This project is a plugin for Neo4j, the graph database. It provides a procedure that, from very little input, generates Cypher code for importing a given CSV file (which must have a header row).

This is a developer-oriented tool: you run it once per file, then tweak the output into a working script.

Installation

The classic way: clone the project with git, then run

mvn clean package

After that, copy the generated JAR, original-csvskelgen-0.0.1.jar, from the target folder into the plugins subfolder of your Neo4j server, then restart the server.

You are almost done, but neo4j.conf needs a small edit: the server must be told to authorize the procedures in my package. Add these lines at the end of your neo4j.conf file:

dbms.security.procedures.unrestricted=wadael.*
dbms.security.procedures.whitelist=wadael.*

If you have also installed the APOC plugin, list both packages, separated by a comma:

dbms.security.procedures.unrestricted=apoc.*,wadael.*
dbms.security.procedures.whitelist=apoc.*,wadael.*

Usage

Example

 CALL wadael.csvskelgen("/home/jerome/OpenSource/neoloadcsvskelgen/src/test/resources/guardian_most_polluting_companies_list.csv","|",4,"Company:3")

"/home/jerome/OpenSource/neoloadcsvskelgen/src/test/resources/guardian_most_polluting_companies_list.csv" is the full path to a CSV file. Unlike LOAD CSV, this procedure needs the full path.
"|" is the field separator used in this file.
4 is the number of example values to show in a comment after each property.
"Company:3" is the hints parameter (it's more a set of directives, in fact). It means: use the label "Company" for the next three fields. Another possible value would be "Company:3,Country:2", meaning that the first three columns belong to Company and the following two belong to Country.

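To make the hints parameter concrete, here is a minimal Python sketch of how a string such as "Company:3,Country:2" can be mapped onto consecutive columns. It is only an illustration of the rule described above, not the plugin's actual code: the function names parse_hints and assign_labels are made up for this example, and the five-column header list in the second call is hypothetical.

```python
def parse_hints(hints):
    """Turn 'Company:3,Country:2' into [('Company', 3), ('Country', 2)]."""
    pairs = []
    for chunk in hints.split(","):
        # Each chunk looks like 'Label:count'
        label, _, count = chunk.strip().partition(":")
        pairs.append((label, int(count)))
    return pairs


def assign_labels(headers, hints):
    """Give each label the next `count` column names, from left to right."""
    assignment = []
    position = 0
    for label, count in parse_hints(hints):
        assignment.append((label, headers[position:position + count]))
        position += count
    return assignment


# The three headers of the example file used in this README
print(assign_labels(["rank", "company", "percentage"], "Company:3"))

# Hypothetical five-column file, to show a two-label hint
print(assign_labels(["rank", "company", "percentage", "country", "continent"],
                    "Company:3,Country:2"))
```

The first call puts all three columns under Company; the second gives the first three columns to Company and the last two to Country.
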
For the file guardian_most_polluting_companies_list.csv that starts with

rank|company|percentage
1|China (Coal)|14.3
2|Saudi Arabian Oil Company (Aramco)|4.5
3|Gazprom OAO|3.9
4|National Iranian Oil Co|2.3
5|ExxonMobil Corp|2.0

The corresponding output is

USING PERIODIC COMMIT 5000 
LOAD CSV WITH HEADERS FROM 'file:/home/jerome/OpenSource/neoloadcsvskelgen/target/test-classes/guardian_most_polluting_companies_list.csv' AS line FIELDTERMINATOR '|'
CREATE (node0:Company)
SET node0.rank= line.`rank`// 1,2,3,4,
SET node0.company= line.`company`// China (Coal),Saudi Arabian Oil Company (Aramco),Gazprom OAO,National Iranian Oil Co,
SET node0.percentage= line.`percentage`// 14.3,4.5,3.9,2.3,

If you do not want example values, pass 0. The example values are there to help you name your properties correctly.
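The way the example count shapes the output can be sketched in a few lines of Python. This is a rough approximation under stated assumptions, not the plugin's implementation: the function name skeleton, its parameters, and the URL file:///polluters.csv are hypothetical, it handles a single label only, and its comment formatting differs slightly from the real output shown above.

```python
import csv
import io


def skeleton(csv_text, separator, example_count, label, file_url):
    """Build a LOAD CSV skeleton with one SET line per header.

    Assumes header names are usable as property names as-is
    (no spaces or special characters).
    """
    reader = csv.reader(io.StringIO(csv_text), delimiter=separator)
    rows = [row for row in reader if row]  # skip blank lines
    headers, data = rows[0], rows[1:]
    lines = [
        "USING PERIODIC COMMIT 5000",
        f"LOAD CSV WITH HEADERS FROM '{file_url}' AS line FIELDTERMINATOR '{separator}'",
        f"CREATE (node0:{label})",
    ]
    for index, header in enumerate(headers):
        set_line = f"SET node0.{header} = line.`{header}`"
        # Take the first `example_count` values of this column as a hint
        examples = [row[index] for row in data[:example_count]]
        if examples:  # an example count of 0 leaves the line without a comment
            set_line += " // " + ", ".join(examples)
        lines.append(set_line)
    return "\n".join(lines)


sample = """rank|company|percentage
1|China (Coal)|14.3
2|Saudi Arabian Oil Company (Aramco)|4.5
3|Gazprom OAO|3.9
"""
print(skeleton(sample, "|", 2, "Company", "file:///polluters.csv"))
```

Calling it with an example count of 0 produces the same statements without the trailing comments.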

Of course, the generated code can be modified at will. In fact, it must be. The generated skeleton is there to help you and to avoid typos: it is a clean basis for your import script.

Copy and paste it into your favorite text editor and rename the nodes to something meaningful to you. Also, check whether your server allows CSV files to be loaded from anywhere or only from the $NEO_HOME/import folder (the default, as far as I know).

Let me kindly remind you to create the necessary constraints before running your import.
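As a reminder of what such a constraint looks like, the small Python helper below prints a uniqueness constraint in the Neo4j 3.x syntax, which is the version this README appears to target (newer Neo4j releases use a different form). The helper name unique_constraint is hypothetical, and choosing company as the identifying property of the Company label is only an assumption made for the example.

```python
def unique_constraint(label, property_name):
    """Return a Neo4j 3.x style uniqueness constraint statement."""
    variable = label[0].lower()  # e.g. 'c' for Company
    return (
        f"CREATE CONSTRAINT ON ({variable}:{label}) "
        f"ASSERT {variable}.{property_name} IS UNIQUE;"
    )


# Assumption: the company name identifies a Company node
print(unique_constraint("Company", "company"))
```

Run the printed statement in Neo4j before the import, so the constraint is already in place when the data is loaded.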

This project was motivated by a 100-column CSV file I use in my lectures. It tends to scare :scream: students, although they only have to pick a few fields for an exercise.

Todo list

Propose the Cypher code for constraint creation, as a comment
Do some syntax checking on the label names, such as enforcing camel case
Use something more meaningful than node0, node1, etc.
Add an advertisement for my book "Learning Neo4j 3.x". Check it out on PacktPub or on Amazon.com
Use the backtick syntax, like label.`field name`, only when needed
Find a way to automagically set the identifier for each label
Handle CSV files without headers

Cookbook

See cookbook.md

Misc

The given pronunciation is a joke. Have you bought my book? Please leave a comment on the website where you bought it.

Is this project useful to you? Send me a message!