-
Notifications
You must be signed in to change notification settings - Fork 102
Columns and Lists
Columns and Lists are two primary data structures that we use to store node and rel properties as well as adjacency lists in the system.
A Column
is simply an array with a value (of the same data type) at each offset. Hence, given an offset, a Column
gives constant-time random access to its associated value.
Lists
can be imagined as the extension to Columns where multiple values can be associated per offset. Hence, given an offset, a Lists
gives constant-time random access to the associated list that contains an array of values (of the same data type). We use a 2-level CSR structure to organize data in Lists
. We logically differentiate lists into the following types:
- Small List: A list is defined as small if it can fit into one system page, that is in 4KB. For example, if elements of lists occupy 4Bytes each, then we say that a list is small is it contains less than or equal to 1024.
- Large List: If it is not a small list.
Lists
are also logically divided into chunks and all the small lists in a chunk are contiguously stored in the file. The size of the chunk is fixed in the system to be 512
. Large Lists of a chunk span multiple pages and are stored outside the chunk, though contiguously.
We further go on to group offsets in Lists into chunks of 512. All the lists of a single chunk are stored contiguously in the file With each lists, we use 2 auxiliary structures that contains information about where to find the data (which is a list) in the lists data structure.
- Headers: There is one header per offset in the lists data structure. A header contains the
- ListsMetadata
- Coding Style
- Casting Rules
- Frontend
- Processor
- Storage
- Test