Skip to content

Latest commit

 

History

History
1158 lines (921 loc) · 39.5 KB

load-csv.adoc

File metadata and controls

1158 lines (921 loc) · 39.5 KB

LOAD CSV

LOAD CSV is used to import data from CSV files into a Neo4j database.

LOAD CSV FROM 'https://data.neo4j.com/bands/artists.csv' // (1)
AS row  // (2)
MERGE (:Artist {name: row[1], year: toInteger(row[2])})  // (3)
  1. FROM takes a STRING containing the path where the CSV file is located.

  2. The clause parses one row at a time, temporarily storing the current row in the variable specified with AS.

  3. The MERGE clause accesses the row variable to insert data into the database.

LOAD CSV supports both local and remote URLs. Local paths are resolved relative to the Neo4j installation folder.

Note

Loading CSV files requires load privileges.

Import CSV data into Neo4j

Import local files

You can store CSV files on the database server and then access them by using a file:/// URL. By default, paths are resolved relative to the Neo4j import directory.

Example 1. Import artists name and year information from a local file
artists.csv
1,ABBA,1992
2,Roxette,1986
3,Europe,1979
4,The Cardigans,1992
Query
LOAD CSV FROM 'file:///artists.csv' AS row
MERGE (a:Artist {name: row[1], year: toInteger(row[2])})
RETURN a.name, a.year
Result
a.name a.year

'ABBA'

'1992'

'Roxette'

'1986'

'Europe'

'1979'

'The Cardigans'

'1992'

4 rows

Added 4 nodes, Set 8 properties, Added 4 labels

Note
For ways of importing data into an Aura instance, see Aura → Importing data.
Tip
When using file:/// URLs, spaces and other non-alphanumeric characters must be URL-encoded.

Configuration settings for file URLs

dbms.security.allow_csv_import_from_file_urls

This setting determines whether file:/// URLs are allowed.

server.directories.import

This setting sets the root directory relative to which file:/// URLs are parsed.

Import from a remote location

You can import data from a CSV file hosted on a remote path.

LOAD CSV supports accessing CSV files via HTTPS, HTTP, and FTP (with or without credentials). It also follows redirects, except those changing the protocol (for security reasons).

Important

It is strongly recommended to permit resource loading only over secure protocols such as HTTPS instead of insecure protocols like HTTP. This can be done by limiting the load privileges to only trusted sources that use secure protocols. If allowing an insecure protocol is absolutely unavoidable, Neo4j takes measures internally to enhance the security of these requests within their limitations. However, this means that insecure URLs on virtual hosts will not function unless you add the JVM argument -Dsun.net.http.allowRestrictedHeaders=true to the configuration setting server.jvm.additional.

Example 2. Import artists name and year information from a remote file via HTTPS
1,ABBA,1992
2,Roxette,1986
3,Europe,1979
4,The Cardigans,1992
Query
LOAD CSV FROM 'https://data.neo4j.com/bands/artists.csv' AS row
MERGE (a:Artist {name: row[1], year: toInteger(row[2])})
RETURN a.name, a.year
Result
a.name a.year

'ABBA'

'1992'

'Roxette'

'1986'

'Europe'

'1979'

'The Cardigans'

'1992'

4 rows

Added 4 nodes, Set 8 properties, Added 4 labels

Example 3. Import artists name and year information from a remote file via FTP using credentials
ftp://<username>:<password>@<domain>/bands/artists.csv
1,ABBA,1992
2,Roxette,1986
3,Europe,1979
4,The Cardigans,1992
Query
LOAD CSV FROM 'ftp://<username>:<password>@<domain>/bands/artists.csv' AS row
MERGE (a:Artist {name: row[1], year: toInteger(row[2])})
RETURN a.name, a.year
Result
a.name a.year

'ABBA'

'1992'

'Roxette'

'1986'

'Europe'

'1979'

'The Cardigans'

'1992'

4 rows

Added 4 nodes, Set 8 properties, Added 4 labels

Import from cloud URIs

You can import data from a number of different cloud storages:

See Operations Manual → Load a dump from a cloud storage on how to set up access to cloud storages.

Import from an Azure Cloud Storage URI

You can import data from a CSV file hosted in an Azure Cloud Storage URI.

Example 4. Import artists name and year information from an Azure Cloud Storage URI
azb://azb-account/azb-container/artists.csv
1,ABBA,1992
2,Roxette,1986
3,Europe,1979
4,The Cardigans,1992
Query
LOAD CSV FROM 'azb://azb-account/azb-container/artists.csv' AS row
MERGE (a:Artist {name: row[1], year: toInteger(row[2])})
RETURN a.name, a.year
Result
a.name a.year

'ABBA'

'1992'

'Roxette'

'1986'

'Europe'

'1979'

'The Cardigans'

'1992'

4 rows

Added 4 nodes, Set 8 properties, Added 4 labels

Import from a Google Cloud Storage URI

You can import data from a CSV file hosted in a Google Cloud Storage URI.

Example 5. Import artists name and year information from a Google Cloud Storage URI
gs://gs-bucket/artists.csv
1,ABBA,1992
2,Roxette,1986
3,Europe,1979
4,The Cardigans,1992
Query
LOAD CSV FROM 'gs://gs-bucket/artists.csv' AS row
MERGE (a:Artist {name: row[1], year: toInteger(row[2])})
RETURN a.name, a.year
Result
a.name a.year

'ABBA'

'1992'

'Roxette'

'1986'

'Europe'

'1979'

'The Cardigans'

'1992'

4 rows

Added 4 nodes, Set 8 properties, Added 4 labels

Import from an AWS S3 URI

You can import data from a CSV file hosted in an AWS S3 URI.

Example 6. Import artists name and year information from an AWS S3 URI
s3://aws-bucket/artists.csv
1,ABBA,1992
2,Roxette,1986
3,Europe,1979
4,The Cardigans,1992
Query
LOAD CSV FROM 's3://aws-bucket/artists.csv' AS row
MERGE (a:Artist {name: row[1], year: toInteger(row[2])})
RETURN a.name, a.year
Result
a.name a.year

'ABBA'

'1992'

'Roxette'

'1986'

'Europe'

'1979'

'The Cardigans'

'1992'

4 rows

Added 4 nodes, Set 8 properties, Added 4 labels

Import CSV files using dynamic columns

CSV columns can be referenced dynamically to map labels to nodes in the graph. This enables flexible data handling, allowing labels to be be populated from CSV column values without manually specifying each entry. It also mitigates the risk of Cypher injection. (For more information about Cypher injection, see Neo4j Knowledge Base → Protecting against Cypher injection).

bands-with-headers.csv
Id,Label,Name
1,Band,The Beatles
2,Band,The Rolling Stones
3,Band,Pink Floyd
4,Band,Led Zeppelin
Query
LOAD CSV WITH HEADERS FROM 'file:///bands-with-headers.csv' AS line
MERGE (n:$(line.Label) {name: line.Name})
RETURN n AS bandNodes
Result
bandNodes

(:Band {name: 'The Beatles'})

(:Band {name: 'The Rolling Stones'})

(:Band {name: 'Pink Floyd'})

(:Band {name: 'Led Zeppelin'})

Rows: 4
Added 4 nodes, Set 4 properties, Added 4 labels

Import compressed CSV files

LOAD CSV can read local CSV files compressed with ZIP or gzip. ZIP archives can have arbitrary directory structures but may only contain a single CSV file.

Import a CSV file from within a ZIP file
LOAD CSV FROM 'file:///artists.zip' AS row
MERGE (:Artist {name: row[1], year: toInteger(row[2])})
Note
You can’t load zipped CSV files from remote URLs.

Import data from relational databases

If the source data comes from a relational model, it’s worth evaluating how to gain the most from moving to a graph data model. Before running the import, think about how the data can be modeled as a graph, and adapt its structure accordingly when running the import (see Graph data modeling).

Data from relational databases may consist of one or multiple CSV files, depending on the source database structure. A performant approach is to run multiple passes of LOAD CSV to import nodes separately from relationships.

Example 7. Import from a single CSV file

The source file books.csv contains information about both authors and books. From a graph perspective, these are nodes with different labels, so it takes different queries to load them.

The example executes multiple passes of LOAD CSV on that one file, and each pass focuses on the creation of one entity type.

books.csv
id,title,author,publication_year,genre,rating,still_in_print,last_purchased
19515,The Heights,Anne Conrad,2012,Comedy,5,true,2023/4/12 8:17:00
39913,Starship Ghost,Michael Tyler,1985,Science Fiction|Horror,4.2,false,2022/01/16 17:15:56
60980,The Death Proxy,Tim Brown,2002,Horror,2.1,true,2023/11/26 8:34:26
18793,Chocolate Timeline,Mary R. Robb,1924,Romance,3.5,false,2022/9/17 14:23:45
67162,Stories of Three,Eleanor Link,2022,Romance|Comedy,2,true,2023/03/12 16:01:23
25987,Route Down Below,Tim Brown,2006,Horror,4.1,true,2023/09/24 15:34:18
Query
// Create `Book` nodes
LOAD CSV WITH HEADERS FROM 'https://data.neo4j.com/importing-cypher/books.csv' AS row
MERGE (b:Book {id: row.id, title: row.title})
MERGE (a:Author {name: row.author});

// Create `WROTE` relationships
LOAD CSV WITH HEADERS FROM 'https://data.neo4j.com/importing-cypher/books.csv' AS row
MATCH (a:Author{name: row.author})
MATCH (b:Book{id: row.id})
MERGE (a)-[:WROTE]->(b);
Result
Added 11 nodes, Set 17 properties, Added 11 labels
Created 6 relationships
Example 8. Import from multiple CSV files

The file acted_in.csv contains data about the relationship between actors and the movies they acted in (from persons.csv and movies.csv). Actors and movies are linked through their ID columns person_tmdbId and movieId.

The file also contains the role the actor played in the movie, and it is imported in Neo4j as a relationship property.

acted_in.csv
movieId,person_tmdbId,role
1,12899,Slinky Dog (voice)
1,12898,Buzz Lightyear (voice)
...

It takes three LOAD CSV clauses to import this dataset: the first two create Person nodes from persons.csv and Movie nodes from movies.csv, and the third adds the :ACTED_IN relationship from acted_in.csv.

Query
// Create person nodes
LOAD CSV WITH HEADERS FROM 'https://data.neo4j.com/importing-cypher/persons.csv' AS row
MERGE (p:Person {name: row.name, tmdbId: row.person_tmdbId});

// Create movie nodes
LOAD CSV WITH HEADERS FROM 'https://data.neo4j.com/importing-cypher/movies.csv' AS row
MERGE (m:Movie {movieId: row.movieId, title: row.title});

// Create relationships
LOAD CSV WITH HEADERS FROM 'https://data.neo4j.com/importing-cypher/acted_in.csv' AS row
MATCH (p:Person {tmdbId: row.person_tmdbId})
MATCH (m:Movie {movieId: row.movieId})
MERGE (p)-[r:ACTED_IN {role: row.role}]->(m);
Result
Added 444 nodes, Set 888 properties, Added 444 labels
Added 93 nodes, Set 186 properties, Added 93 labels
Created 372 relationships, Set 372 properties
Tip
For a guide on importing the Northwind dataset from Postgres into Neo4j, see Tutorial: Import data from a relational database into Neo4j in the Getting Started Guide.

Create additional node labels

In Neo4j a node can have multiple labels, while in a relational setting it’s not as straightforward to mix entities. For example, a node in Neo4j can be labeled both Dog and Actor, while in a relational model dogs and actors are separate entities.

After a relational dataset has been imported, there may be further labels that can be added, depending on the use case. Additional labels can speed up pinpointing a node if you use them in your queries.

Example 9. Add extra Actor label on Person nodes

The :ACTED_IN relationship from acted_in.csv implicitly defines actors as a subset of people. The following queries adds an additional Actor label to all people who have an outgoing :ACTED_IN relationship.

Query
MATCH (p:Person)-[:ACTED_IN]->()
WITH DISTINCT p
SET p:Actor
Result
Added 353 labels

Pre-process the data during import

Cast CSV columns to Neo4j data types

LOAD CSV inserts all imported CSV data as STRING properties. However, Neo4j supports a range of data types, and storing data with appropriate types allows both to query it more effectively and to process it with type-specific Cypher functions.

Example 10. Import numeric and temporal data

The column person_tmdbId and born in the file persons.csv contains INTEGER and DATE values respectively. The functions toInteger() and date() allow to cast those values to the appropriate types before importing them.

persons.csv
person_tmdbId,bio,born,bornIn,died,person_imdbId,name,person_poster,person_url
3,"Legendary Hollywood Icon Harrison Ford was born on July 13, 1942 in Chicago, Illinois.   His family history includes a strong lineage of actors, radio personalities, and models.   Harrison attended public high school in Park Ridge, Illinois where he was a member of the school Radio Station WMTH.  Harrison worked as the lead voice for sports reporting at WMTH for several years.   Acting wasn’t a major interest to Ford until his junior year at Ripon College when he first took an acting class...",1942-07-13,"Chicago, Illinois, USA",,148,Harrison Ford,https://image.tmdb.org/t/p/w440_and_h660_face/5M7oN3sznp99hWYQ9sX0xheswWX.jpg,https://themoviedb.org/person/3
...
Query
LOAD CSV WITH HEADERS FROM 'https://data.neo4j.com/importing-cypher/persons.csv' AS row
MERGE (p:Person {tmdbId: toInteger(row.person_tmdbId)})
SET p.name = row.name, p.born = date(row.born)
RETURN
  p.name AS name,
  p.tmdbId AS tmdbId,
  p.born AS born
LIMIT 5
Result
name tmdbId born

'Harrison Ford'

3

1942-07-13

'Tom Hanks'

31

1956-07-09

'Robin Wright'

32

1966-04-08

'Sally Field'

35

1946-11-06

'Sean Bean'

48

1959-04-17

5 rows

Added 444 nodes, Set 1332 properties, Added 444 labels

For a list of type casting functions, see Casting data values.

Handle null values

Neo4j does not store null values. null or empty fields in a CSV files can be skipped or replaced with default values in LOAD CSV.

Example 11. Processing a file with null values

In the file companies.csv, some rows do not specify values for some columns. The examples show several options of how to handle null values.

companies.csv
Id,Name,Location,Email,BusinessType
1,Neo4j,San Mateo,[email protected],P
2,AAA,,[email protected],
3,BBB,Chicago, ,G
,CCC,Michigan,[email protected],G
Skip null values
LOAD CSV WITH HEADERS FROM 'file:///companies.csv' AS row
WITH row
WHERE row.Id IS NOT NULL
MERGE (c:Company {id: row.Id})
Provide a default for null values
LOAD CSV WITH HEADERS FROM 'file:///companies.csv' AS row
WITH row
WHERE row.Id IS NOT NULL
MERGE (c:Company {id: row.Id, hqLocation: coalesce(row.Location, "Unknown")})
Change empty STRING values to null values (not stored)
LOAD CSV WITH HEADERS FROM 'file:///companies.csv' AS row
WITH row
WHERE row.Id IS NOT NULL
MERGE (c:Company {id: row.Id})
SET c.email = nullIf(trim(row.Email), "")
Tip
null values are not stored in the database. A strategy for selectively getting rid of some values is to map them into null values. The empty STRING values from the last query serve as an example.

Split list values

The function split() allows to convert a STRING of elements into a list.

Example 12. Parse movies languages and genres as lists

The file movies.csv contains a header line and a total of 94 lines.

The columns languages and genres contain list-like values. Both are separated by a pipe |, and split() allows to make them into Cypher lists ahead of inserting them into the database.

movies.csv
movieId,title,budget,countries,movie_imdbId,imdbRating,imdbVotes,languages,plot,movie_poster,released,revenue,runtime,movie_tmdbId,movie_url,year,genres
1,Toy Story,30000000.0,USA,114709,8.3,591836,English,A cowboy doll is profoundly threatened and jealous when a new spaceman figure supplants him as top toy in a boy's room.,https://image.tmdb.org/t/p/w440_and_h660_face/uXDfjJbdP4ijW5hWSBrPrlKpxab.jpg,1995-11-22,373554033.0,81,862,https://themoviedb.org/movie/862,1995,Adventure|Animation|Children|Comedy|Fantasy
2,Jumanji,65000000.0,USA,113497,6.9,198355,English|French,"When two kids find and play a magical board game, they release a man trapped for decades in it and a host of dangers that can only be stopped by finishing the game.",https://image.tmdb.org/t/p/w440_and_h660_face/vgpXmVaVyUL7GGiDeiK1mKEKzcX.jpg,1995-12-15,262797249.0,104,8844,https://themoviedb.org/movie/8844,1995,Adventure|Children|Fantasy
...
Query
LOAD CSV WITH HEADERS FROM 'https://data.neo4j.com/importing-cypher/movies.csv' AS row
MERGE (m:Movie {id: toInteger(row.movieId)})
SET
    m.title = row.title,
    m.imdbId = toInteger(row.movie_imdbId),
    m.languages = split(row.languages, '|'),
    m.genres = split(row.genres, '|')
RETURN
  m.title AS title,
  m.imdbId AS imdbId,
  m.languages AS languages,
  m.genres AS genres
LIMIT 5
Result
title imdbId languages genres

'Toy Story'

114709

['English']

['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy']

'Jumanji'

113497

['English', 'French']

['Adventure', 'Children', 'Fantasy']

'Grumpier Old Men'

113228

['English']

['Comedy', 'Romance"]

'Waiting to Exhale'

114885

['English']

['Comedy', 'Romance', 'Drama']

'Father of the Bride Part II'

113041

['English']

['Comedy']

5 rows

Added 93 nodes, Set 465 properties, Added 93 labels

For more STRING manipulation functions, see String functions.

Recommendations

Create property uniqueness constraints

Always create property uniqueness constraints prior to importing data, to avoid duplicates or colliding entities. If the source file contains duplicated data and the right constraints are in place, Cypher raises an error.

Example 13. Create a node property uniqueness constraints on person ID
persons.csv
person_tmdbId,bio,born,bornIn,died,person_imdbId,name,person_poster,person_url
3,"Legendary Hollywood Icon Harrison Ford was born on July 13, 1942 in Chicago, Illinois.   His family history includes a strong lineage of actors, radio personalities, and models.   Harrison attended public high school in Park Ridge, Illinois where he was a member of the school Radio Station WMTH.  Harrison worked as the lead voice for sports reporting at WMTH for several years.   Acting wasn’t a major interest to Ford until his junior year at Ripon College when he first took an acting class...",1942-07-13,"Chicago, Illinois, USA",,148,Harrison Ford,https://image.tmdb.org/t/p/w440_and_h660_face/5M7oN3sznp99hWYQ9sX0xheswWX.jpg,https://themoviedb.org/person/3
...
Create a node property uniqueness constraint on person ID
CREATE CONSTRAINT Person_tmdbId IF NOT EXISTS
FOR (p:Person) REQUIRE p.tmdbId IS UNIQUE
Result
Added 1 constraints

Handle large amounts of data

LOAD CSV may run into memory issues with files containing a significant number of rows (approaching hundreds of thousands or millions). For large files, it’s recommended to split the import process in several lighter transactions through the clause CALL {…​} IN TRANSACTIONS.

Example 14. Load a large CSV file in several transactions

The file persons.csv contains a header line and a total of 869 lines. The example loads the name and born columns in transactions of 200 rows.

persons.csv
person_tmdbId,bio,born,bornIn,died,person_imdbId,name,person_poster,person_url
3,"Legendary Hollywood Icon Harrison Ford was born on July 13, 1942 in Chicago, Illinois.   His family history includes a strong lineage of actors, radio personalities, and models.   Harrison attended public high school in Park Ridge, Illinois where he was a member of the school Radio Station WMTH.  Harrison worked as the lead voice for sports reporting at WMTH for several years.   Acting wasn’t a major interest to Ford until his junior year at Ripon College when he first took an acting class...",1942-07-13,"Chicago, Illinois, USA",,148,Harrison Ford,https://image.tmdb.org/t/p/w440_and_h660_face/5M7oN3sznp99hWYQ9sX0xheswWX.jpg,https://themoviedb.org/person/3
...
Note
The below query uses a variable scope clause (introduced in Neo4j 5.23) to import variables into the CALL subquery. If you are using an older version of Neo4j, use an importing WITH clause instead.
Query
LOAD CSV WITH HEADERS FROM 'https://data.neo4j.com/importing-cypher/persons.csv' AS row
CALL (row) {
  MERGE (p:Person {tmdbId: row.person_tmdbId})
  SET p.name = row.name, p.born = row.born
} IN TRANSACTIONS OF 200 ROWS
Result
Added 444 nodes, Set 1332 properties, Added 444 labels
Note
In case of errors, CALL {…​} IN TRANSACTIONS may only import a part of the CSV data as the transactions are committed. For example, if the first 200 rows are error free, they are committed. If the next 200 rows contain data that causes an error, the second transaction fails, but leaves the first transaction unaffected.

LOAD CSV and Neo4j functions

Access line numbers with linenumber()

The linenumber() function provides the line number which LOAD CSV is operating on, or null if called outside of a LOAD CSV context.

A common use case for this function is to generate sequential unique IDs for CSV data that doesn’t have a unique column already.

Example 15. linenumber()
artists.csv
1,ABBA,1992
2,Roxette,1986
3,Europe,1979
4,The Cardigans,1992
Query
LOAD CSV FROM 'file:///artists.csv' AS row
RETURN linenumber() AS number, row
Result
number row

1

["1","ABBA","1992"]

2

["2","Roxette","1986"]

3

["3","Europe","1979"]

4

["4","The Cardigans","1992"]

4 rows

Access the CSV file path with file()

The file() function provides the absolute path of the file that LOAD CSV is operating on, or null if called out of a LOAD CSV context.

Example 16. file()
artists.csv
1,ABBA,1992
2,Roxette,1986
3,Europe,1979
4,The Cardigans,1992
Query
LOAD CSV FROM 'file:///artists.csv' AS row
RETURN DISTINCT file() AS path
Result
path

'/artists.csv'

Tip
file() always returns a local path, even when loading remote CSV files. For remote resources, file() returns the temporary local path it was downloaded to.

CSV file format

The CSV file format and LOAD CSV interact as follows:

  • The file character encoding must be UTF-8.

  • The line terminator is system dependent (\n for Unix and \r\n for Windows).

  • The default field delimiter is ,. Change it with the option FIELDTERMINATOR.

  • CSV files may contain quoted STRING values, and the quotes are dropped when LOAD CSV reads the data.

  • If dbms.import.csv.legacy_quote_escaping is set to the default value of true, \ is used as an escape character.

  • A double quote must be in a quoted STRING and escaped, with either the escape character or a second double quote.

Headers

If the CSV file starts with a header row containing column names, each import row in the file acts as a map instead of an array.

You must indicate the presence of the header row by adding WITH HEADERS to the query. You can then access specific fields by their corresponding column name.

Example 17. Parsing a CSV as a list of maps
artists-with-headers.csv
Id,Name,Year
1,ABBA,1992
2,Roxette,1986
3,Europe,1979
4,The Cardigans,1992
Query
LOAD CSV WITH HEADERS FROM 'file:///artists-with-headers.csv' AS row
MERGE (a:Artist {name: row.Name, year: toInteger(row.Year)})
RETURN
  a.name AS name,
  a.year AS year
Result
name year

"ABBA"

1992

"Roxette"

1986

"Europe"

1979

"The Cardigans"

1992

4 rows

Added 4 nodes, Set 8 properties, Added 4 labels

Field delimiter

The default field delimiter is ,. Use the FIELDTERMINATOR option to specify a different field delimiter.

If you try to import a file that doesn’t use , as field delimiter and you also don’t specify a custom delimiter, LOAD CSV will interpret the CSV as having a single column.

Example 18. Import a CSV using ; as field delimiter
artists-fieldterminator.csv
1;ABBA;1992
2;Roxette;1986
3;Europe;1979
4;The Cardigans;1992
Query
LOAD CSV FROM 'file:///artists-fieldterminator.csv' AS row FIELDTERMINATOR ';'
MERGE (:Artist {name: row[1], year: toInteger(row[2])})
Result
Added 4 nodes, Set 8 properties, Added 4 labels
Note
You can use the hexadecimal representation of the unicode character for the field delimiter if you prepend \u. Write the encoding with four digits: for example, \u003B is equivalent to ; (semicolon).

Quotes escaping

Quoted STRING values are allowed in the CSV file and the quotes are dropped when LOAD CSV reads the data. If quoted STRING values must contain quote characters ", there are two ways to escape them:

  1. Double quotes — Use another quote " to escape a quote (for example, the CSV encoding of the STRING The "Symbol" is "The ""Symbol""").

  2. Prefix with backslash \ — If the configuration setting dbms.import.csv.legacy_quote_escaping is set to true (the default value), \ works as the escape character for quotes (for example, the CSV encoding of the STRING The "Symbol" is "The \"Symbol\"").

Example 19. Import a CSV with double-quotes escaping
artists-with-escaped-quotes.csv
"1","The ""Symbol""","1992"
"2","The \"Symbol\"","1992"
Query
LOAD CSV FROM 'file:///artists-with-escaped-quotes.csv' AS row
MERGE (a:Artist {id: toInteger(row[0]), name: row[1], year: toInteger(row[2])})
RETURN
  a.id AS id,
  a.name AS name,
  a.year AS year,
  size(a.name) AS size
Result
id name year size

1

'The "Symbol"'

1992

12

2

'The "Symbol"'

1992

12

Added 2 nodes, Set 6 properties, Added 2 labels

Note that name is a STRING, as it is wrapped in quotes in the output. The third column outputs the STRING length as size. The length only counts what is between the outer quotes, but not the quotes themselves.

Check source data quality

In case of a failed import, there are some elements to check to ensure the source file is not corrupted.

  • Inconsistent headers — The CSV header may be inconsistent with the data. It can be missing, have too many columns or have a different delimiter. Verify that the header matches the data in the file. Adjust the formatting, delimiters or columns.

  • Extra or missing quotes — Standalone double or single quotes in the middle of non-quoted text or non-escaped quotes in quoted text can cause issues reading the file. Either escape or remove stray quotes. See Quotes escaping.

  • Special or newline characters — When dealing with special characters in a file, ensure they are quoted or remove them.

  • Inconsistent line breaks — Ensure line breaks are consistent throughout your file.

  • Binary zeros, BOM byte order mark and other non-text characters — Unusual characters or tool-specific formatting are sometimes hidden in application tools, but become apparent in plain-text editors. If you come across these types of characters in your file, either remove them or use Cypher’s normalize function.

Inspect source files ahead of import

Before importing data into the database, you can use LOAD CSV to inspect a source file and get an idea of what form the imported data is going to have.

Example 20. Assert correct line count
// Assert correct line count
LOAD CSV FROM 'https://data.neo4j.com/importing-cypher/persons.csv' AS line
RETURN count(*);
Result
count(*)

445

1 row

Example 21. Check the first five lines with header sampling
// Check first 5 line-sample with header-mapping
LOAD CSV WITH HEADERS FROM 'https://data.neo4j.com/importing-cypher/persons.csv' AS line
RETURN line.person_tmdbId, line.name
LIMIT 5;
Result
line.person_tmdbId line.name

'3'

'Harrison Ford'

'31'

'Tom Hanks'

'32'

'Robin Wright'

'35'

'Sally Field'

'48'

'Sean Bean'

5 rows

Example

Erase current database and import the full movie dataset
// Clear data
MATCH (n) DETACH DELETE n;

// Create constraints
CREATE CONSTRAINT Person_tmdbId IF NOT EXISTS
FOR (p:Person) REQUIRE p.tmdbId IS UNIQUE;

CREATE CONSTRAINT Movie_movieId IF NOT EXISTS
FOR (m:Movie) REQUIRE m.movieId IS UNIQUE;

// Create person nodes
LOAD CSV WITH HEADERS FROM 'https://data.neo4j.com/importing-cypher/persons.csv' AS row
MERGE (p:Person {tmdbId: toInteger(row.person_tmdbId)})
SET p.name = row.name, p.born = date(row.born);

// Create movie nodes
LOAD CSV WITH HEADERS FROM 'https://data.neo4j.com/importing-cypher/movies.csv' AS row
MERGE (m:Movie {id: toInteger(row.movieId)})
SET
    m.title = row.title,
    m.imdbId = toInteger(row.movie_imdbId),
    m.languages = split(row.languages, '|'),
    m.genres = split(row.genres, '|');

// Create relationships
LOAD CSV WITH HEADERS FROM 'https://data.neo4j.com/importing-cypher/acted_in.csv' AS row
MATCH (p:Person {tmdbId: toInteger(row.person_tmdbId)})
MATCH (m:Movie {id: toInteger(row.movieId)})
MERGE (p)-[r:ACTED_IN]->(m)
SET r.role = row.role;

// Set additional node label
MATCH (p:Person)-[:ACTED_IN]->()
WITH DISTINCT p
SET p:Actor;
Result
Added 1 constraints
Added 1 constraints
Added 444 nodes, Set 1332 properties, Added 444 labels
Added 93 nodes, Set 465 properties, Added 93 labels
Created 372 relationships, Set 372 properties
Added 353 labels
Note
With increasing amounts of data, it is more efficient to create all nodes first, and then add relationships with a second pass.

Other ways of importing data

There are a few other tools to get CSV data into Neo4j.

  1. The neo4j-admin database import command is the most efficient way of importing large CSV files.

  2. Use a language library to parse CSV data and run creation Cypher queries against a Neo4j database. Created as an extension library to provide common procedures and functions to developers. This library is especially helpful for complex transformations and data manipulations. Useful procedures include apoc.load.jdbc, apoc.load.json, and others.

  3. The ETL Tool: allows to extract the schema from a relational database and turn it into a graph model. It then takes care of importing the data into Neo4j.

  4. The Kettle import tool maps and executes steps for the data process flow and works well for very large datasets, especially if you are already familiar with using this tool.