Multiple small databases for Internet Computer

vporton/NacDB

NacDB

This is the NacDB distributed database: a database with seamless enumeration/scanning of items, made possible because it is split into multiple sub-DBs, each fitting in a single canister.

It is anticipated that NacDB will often be used together with CanDB.

During this work I developed a strategy for accomplishing reliable operations over unreliable actors in the actor model of ICP. It is serious computer-science research, and I should publish it in a peer-reviewed venue.

Streams that order items by voting results will be implemented using the NacDBReorder library (in development). It uses an advanced combination of data structures to add, reorder, and delete items.

The current stage of development is a public beta. The public API is frozen.

It is usually recommended to use NacDB together with CanDB, because NacDB is strong in one specific area: seamless enumeration of its sub-databases. For example, in a typical workflow NacDB could store CanDB keys rather than full values, while the full values are stored in CanDB. Both CanDB and NacDB are implemented in Motoko.
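The recommended split between the two databases can be pictured with a minimal in-memory sketch in TypeScript. It is not the NacDB or CanDB API: `canDB` and `nacSubDB` are hypothetical plain Maps standing in for the two databases, and the key formats are assumptions made for illustration.

```typescript
// Hypothetical stand-ins: CanDB holds full values, a NacDB sub-DB holds only CanDB keys.
type FullValue = { title: string; body: string };

const canDB = new Map<string, FullValue>(); // CanDB key -> full value
const nacSubDB = new Map<string, string>(); // NacDB sort key -> CanDB key

function publish(sortKey: string, canDBKey: string, value: FullValue): void {
  canDB.set(canDBKey, value);      // the full value goes to CanDB
  nacSubDB.set(sortKey, canDBKey); // NacDB stores only a reference to it
}

// Enumerate the NacDB sub-DB in key order, then resolve each reference in CanDB.
function listValues(): FullValue[] {
  const result: FullValue[] = [];
  Array.from(nacSubDB.keys())
    .sort()
    .forEach((sk) => {
      const value = canDB.get(nacSubDB.get(sk)!);
      if (value !== undefined) result.push(value);
    });
  return result;
}

publish("0002", "post-b", { title: "B", body: "second" });
publish("0001", "post-a", { title: "A", body: "first" });
console.log(listValues().map((v) => v.title).join(",")); // A,B
```

The enumeration order comes entirely from the NacDB side (the sort keys), while CanDB only needs point lookups, which matches the strengths described above.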

A TypeScript client is not provided, because one can be generated automatically from the Candid interface.

See also the related derived library NacDBReorder, which allows efficient reordering of items in NacDB.

Architecture: General

NacDB is a NoSQL multi-canister database. Each canister contains several sub-DBs.

Each sub-DB is seamlessly enumerable (unlike CanDB).

When the databases in a canister become too big or too numerous, a new canister is created and a sub-database is moved to it (or, if the last canister is not yet full, the sub-DB is simply moved there). This architecture was chosen because creating a new canister is expensive. Each sub-database has a so-called "outer ID" or "outer key" that does not change when the sub-DB is moved. You can also use an "inner key" for quicker operations. If a move happens, the inner key becomes invalid and the next operation returns an error or null. In this case, you need to obtain the updated inner key from the outer key to continue.
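The outer-key/inner-key protocol can be illustrated with a small in-memory TypeScript model. Everything here is hypothetical (`partitions`, `index`, `getByInner`, `getWithRefresh` are not NacDB functions): it only shows the idea that a cached inner key goes stale after a move and is refreshed from the unchanging outer key.

```typescript
// Hypothetical model (not the NacDB API): each partition maps inner keys to sub-DBs,
// and the index maps every stable outer key to the sub-DB's current location.
type SubDBLocation = { partition: number; innerKey: number };
type SubDB = Map<string, string>;
type Lookup = { kind: "stale" } | { kind: "ok"; value: string | undefined };

const partitions: Map<number, SubDB>[] = [new Map(), new Map()];
const index = new Map<number, SubDBLocation>(); // outer key -> current location
let nextInnerKey = 0;

function createSubDB(outerKey: number, partition: number): SubDBLocation {
  const loc: SubDBLocation = { partition, innerKey: nextInnerKey++ };
  partitions[partition].set(loc.innerKey, new Map());
  index.set(outerKey, loc);
  return loc;
}

// A move gives the sub-DB a new inner key in another partition; the outer key is unchanged.
function moveSubDB(outerKey: number, toPartition: number): void {
  const old = index.get(outerKey)!;
  const data = partitions[old.partition].get(old.innerKey)!;
  partitions[old.partition].delete(old.innerKey);
  const loc: SubDBLocation = { partition: toPartition, innerKey: nextInnerKey++ };
  partitions[toPartition].set(loc.innerKey, data);
  index.set(outerKey, loc);
}

// Fast path by inner key: reports "stale" if the sub-DB is no longer at that location.
function getByInner(loc: SubDBLocation, sk: string): Lookup {
  const sub = partitions[loc.partition].get(loc.innerKey);
  if (sub === undefined) return { kind: "stale" };
  return { kind: "ok", value: sub.get(sk) };
}

// Client side: try the cached inner location; on "stale", re-resolve it from the
// outer key and try once more.
function getWithRefresh(outerKey: number, cached: SubDBLocation, sk: string) {
  let loc = cached;
  let r = getByInner(loc, sk);
  if (r.kind === "stale") {
    loc = index.get(outerKey)!;
    r = getByInner(loc, sk);
  }
  return { value: r.kind === "ok" ? r.value : undefined, loc };
}

const cached = createSubDB(7, 0);
partitions[0].get(cached.innerKey)!.set("a", "apple");
moveSubDB(7, 1); // e.g. partition 0 went over its moveCap
const res = getWithRefresh(7, cached, "a");
console.log(res.value, res.loc.partition); // apple 1
```

In this model inner keys are never reused, so a stale inner key can only miss; the client should keep the refreshed location (`res.loc`) for subsequent calls.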

Whether to move a sub-DB is decided by the moveCap value of the following type, which limits the memory used by a canister (a move occurs when the actual memory usage exceeds the moveCap threshold):

type MoveCap = { #usedMemory: Nat };
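The threshold rule can be mirrored in TypeScript as a quick sketch. The object shape `{ usedMemory: ... }` is how Candid-generated TypeScript bindings represent a variant such as `#usedMemory`; there Nat becomes `bigint`, while plain numbers are used here for simplicity. The `shouldMove` function and the 500 MB value are hypothetical, not part of NacDB.

```typescript
// Mirrors the Motoko variant `{ #usedMemory: Nat }` (Nat would be bigint in generated bindings).
type MoveCap = { usedMemory: number };

// A move is triggered only when actual usage is strictly above the threshold.
function shouldMove(usedMemoryBytes: number, cap: MoveCap): boolean {
  return usedMemoryBytes > cap.usedMemory;
}

const cap: MoveCap = { usedMemory: 500000000 }; // arbitrary example threshold (bytes)
console.log(shouldMove(400000000, cap)); // false
console.log(shouldMove(600000000, cap)); // true
```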

Some functions take an argument guid of type GUID = Blob. For the database to work correctly, pass a GUID that you generate yourself into this argument (it is recommended to use a cryptographically secure random generator seeded with an entropy value). If an operation such as insert fails, you can retry it by calling it again with the same guid value. Separate operations must always have different GUIDs, so that one operation does not overwrite another.
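Why reusing the same GUID makes retries safe can be shown with a small TypeScript sketch. The server side here is a hypothetical in-memory stand-in (`insert`, `applied`, `items` are not NacDB's API); it only assumes, as described above, that an operation repeated with the same GUID is not applied twice. GUIDs come from Node's `randomBytes`; in a browser, `crypto.getRandomValues` would serve the same purpose.

```typescript
import { randomBytes } from "node:crypto";

// A fresh GUID per logical operation: 16 bytes from a CSPRNG.
function newGuid(): Uint8Array {
  return new Uint8Array(randomBytes(16));
}

function toHex(bytes: Uint8Array): string {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}

// Hypothetical server that remembers which GUIDs it has already applied.
const applied = new Map<string, number>(); // GUID (hex) -> id assigned to the item
const items = new Map<number, string>();
let nextId = 0;
let dropNextReply = true; // simulate one reply lost after a successful write

function insert(guid: Uint8Array, value: string): number {
  const key = toHex(guid);
  let id = applied.get(key);
  if (id === undefined) {
    id = nextId++; // the write is performed only once per GUID
    applied.set(key, id);
    items.set(id, value);
  }
  if (dropNextReply) {
    dropNextReply = false;
    throw new Error("reply lost");
  }
  return id;
}

// Client: retry one logical operation, always with the same GUID.
function insertWithRetry(value: string, attempts = 3): number {
  const guid = newGuid();
  for (let i = 0; i < attempts; i++) {
    try {
      return insert(guid, value);
    } catch {
      // The call may or may not have taken effect; retrying with the same GUID is safe.
    }
  }
  throw new Error("insert failed after retries");
}

const id = insertWithRetry("hello");
console.log(id, applied.size, items.size); // 0 1 1
```

Even though the first reply is lost after the write succeeded, the retry does not create a second item. Generating the GUID inside `insertWithRetry` also gives each separate operation its own GUID.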

Architecture: Details

It is recommended to copy (and possibly modify, e.g. to add access control) the code from example/src/index/ and example/src/partition/ to use this system. These folders contain the source for the "index" (controller) canister and for the "partition" canisters (parts of the actual DB). You create only the index canister (as shown in example/src/example_backend); the partition canisters are created by the index canister automatically.

After copying the code, don't forget to add authentication to it, to restrict which operations can be performed on your DB.

As you can see in example/src/partition/, each partition contains a stable variable of type SuperDB. A SuperDB contains several values of type SubDB (that is, several sub-databases).

As you can see in example/src/index/, the index canister contains a stable variable of type DBIndex (the common data for the entire multi-canister database).

As the examples in example/src/index/ and example/src/partition/ show, you define shared functions using the operations this library provides on variables of types DBIndex and SuperDB.

Keys in SuperDB (identifying sub-databases) are of type SubDBKey = Nat. Keys in sub-DBs are of type SK = Text. Values stored in the sub-DBs are of type AttributeValue, defined similarly to the same-named type in CanDB, but I chose to return an AttributeValue directly, not a map of values (as in CanDB).
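The nesting of keys and values can be summarized with hypothetical TypeScript type aliases. These are not the library's generated bindings: the value type is reduced to a simple union as a stand-in, and SubDBKey is a `number` here although a Candid Nat becomes `bigint` in generated TypeScript.

```typescript
// Hypothetical mirror of the structure described above (not generated bindings).
type SubDBKey = number;               // Nat in Motoko (bigint in Candid-generated TypeScript)
type SK = string;                     // Text
type Value = string | number | boolean; // simplified stand-in for the real value type
type SubDB = Map<SK, Value>;          // one sub-database
type SuperDB = Map<SubDBKey, SubDB>;  // the stable variable in each partition

const superDB: SuperDB = new Map();
superDB.set(0, new Map<SK, Value>([["alice", 42], ["bob", true]]));

// A lookup yields the stored value itself, not a map of attributes as in CanDB.
function get(db: SuperDB, subDBKey: SubDBKey, sk: SK): Value | undefined {
  return db.get(subDBKey)?.get(sk);
}

console.log(get(superDB, 0, "alice")); // 42
```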

Looser Items

Each sub-DB has an optional Nat value hardCap. If the number of items in the sub-DB reaches this number, the item with the smallest key (of type SK = Text) is removed. Among other things, this is useful to ensure that a sub-DB fits into a canister.
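The eviction rule can be sketched in TypeScript with a hypothetical `CappedSubDB` class. The exact moment the cap is checked is an assumption here: this model evicts after each insert while the size is above hardCap. Keys are ordered with JavaScript string comparison, which may differ from Motoko's Text ordering for some characters.

```typescript
// Hypothetical sub-DB with an optional hardCap (a non-negative Nat in NacDB).
// Assumption: the cap is checked after each insert, evicting while size > hardCap.
class CappedSubDB {
  private items = new Map<string, string>();
  private hardCap: number | null;

  constructor(hardCap: number | null) {
    this.hardCap = hardCap;
  }

  insert(sk: string, value: string): void {
    this.items.set(sk, value);
    if (this.hardCap === null) return;
    while (this.items.size > this.hardCap) {
      // Evict the item with the smallest key.
      const smallest = this.keys()[0];
      this.items.delete(smallest);
    }
  }

  // All keys in ascending order.
  keys(): string[] {
    return Array.from(this.items.keys()).sort();
  }
}

const feed = new CappedSubDB(3);
["d", "b", "a", "c"].forEach((k) => feed.insert(k, "item " + k));
console.log(feed.keys().join(",")); // b,c,d
```

Sorting all keys on each eviction is only for brevity; a real implementation would keep the keys in an ordered structure so that the smallest key is found cheaply.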

More on examples

example/src/example_backend shows how to use the system:

The example uses the insert and get functions (as defined in example/src/partition/) to store a value in a sub-DB and retrieve it again. Other available operations are has (checks for an element of a sub-DB), hasSubDB, delete (deletes an element of a sub-DB), deleteSubDB, subDBSize and createSubDB; for enumerating the elements of a sub-DB there are iter, entries and entriesRev (the reverse-order iterator), as well as scanLimit, which returns an array instead of an iterator.
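The semantics of the enumeration operations can be illustrated with a hypothetical in-memory TypeScript class. Only the method names follow the list above; parameters, the bound semantics of `scanLimit` (inclusive lower, exclusive upper) and the array return types are assumptions, so consult example/src/partition/ for the real signatures.

```typescript
// Hypothetical in-memory sub-DB; parameters and return types are assumptions.
type Entry = [string, string];

class SortedSubDB {
  private items = new Map<string, string>();

  insert(sk: string, v: string): void { this.items.set(sk, v); }
  has(sk: string): boolean { return this.items.has(sk); }
  delete(sk: string): void { this.items.delete(sk); }
  subDBSize(): number { return this.items.size; }

  // Ascending key order (an array here; the library returns an iterator).
  entries(): Entry[] {
    return Array.from(this.items.entries()).sort((a, b) =>
      a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0,
    );
  }

  // Descending key order.
  entriesRev(): Entry[] {
    return this.entries().reverse();
  }

  // At most `limit` entries with lower <= key < upper, returned as an array.
  scanLimit(lower: string, upper: string, limit: number): Entry[] {
    return this.entries()
      .filter(([k]) => k >= lower && k < upper)
      .slice(0, limit);
  }
}

const db = new SortedSubDB();
db.insert("b", "2");
db.insert("a", "1");
db.insert("c", "3");
console.log(db.entries().map(([k]) => k).join(","));              // a,b,c
console.log(db.entriesRev().map(([k]) => k).join(","));           // c,b,a
console.log(db.scanLimit("a", "c", 1).map(([k]) => k).join(",")); // a
```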

Locking

In src/partition/, locking is done with flags: the ?... variable moving and the Bool variable moving. While these flags are set, both read and write operations fail and must be retried.
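The fail-and-retry behaviour of locked partitions can be modelled in TypeScript. `Partition`, `Result` and `withRetry` are hypothetical names, and returning `{ err: "locked" }` is an assumed way of signalling the failure; a real client would also make asynchronous calls and wait between attempts rather than retrying in a tight loop.

```typescript
// Hypothetical model: while a move is in progress, every operation on the
// partition fails and the caller must retry later.
type Result<T> = { ok: T } | { err: "locked" };

class Partition {
  moving = false;
  private data = new Map<string, string>();

  get(sk: string): Result<string | undefined> {
    if (this.moving) return { err: "locked" };
    return { ok: this.data.get(sk) };
  }

  insert(sk: string, v: string): Result<null> {
    if (this.moving) return { err: "locked" };
    this.data.set(sk, v);
    return { ok: null };
  }
}

// Repeat the operation until it is no longer locked or the attempts run out.
function withRetry<T>(op: () => Result<T>, attempts: number): Result<T> {
  let r = op();
  for (let i = 1; i < attempts && "err" in r; i++) r = op();
  return r;
}

const p = new Partition();
p.moving = true; // a sub-DB move is in progress
let calls = 0;
const result = withRetry(() => {
  calls++;
  if (calls === 2) p.moving = false; // the move finishes before the second attempt
  return p.insert("k", "v");
}, 5);
console.log(JSON.stringify(result), calls); // {"ok":null} 2
```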

Testing the project locally

If you want to test your project locally, you can use the following commands inside the folder example/:

# Starts the replica, running in the background
dfx start --background

# Install dependencies
mops i

# Deploys your canisters to the replica and generates your candid interface
make deploy

Once the job completes, your application will be available at http://localhost:4943?canisterId={asset_canister_id}.

If you have made changes to your backend canister, you can generate a new candid interface with

npm run generate

at any time. This is recommended before starting the frontend development server, and will be run automatically any time you run dfx deploy.

If you are making frontend changes, you can start a development server with

npm start

This will start a server at http://localhost:8080, proxying API requests to the replica at port 4943.

The project also has a unit test, which has been defunct since switching to the stress test, because the moc interpreter does not support cycle operations.

The stress test, which runs several threads, each performing several hundred random operations, is started with the following commands:

cd stress-test
make

The stress test sometimes fails. This appears to be caused by a bug in the test itself.

Note on frontend environment variables

If you are hosting frontend code somewhere without using DFX, you may need to make one of the following adjustments to ensure your project does not fetch the root key in production:

  • set DFX_NETWORK to ic if you are using Webpack
  • use your own preferred method to replace process.env.DFX_NETWORK in the autogenerated declarations
    • Setting canisters -> {asset_canister_id} -> declarations -> env_override to a string in dfx.json will replace process.env.DFX_NETWORK with the string in the autogenerated declarations
  • write your own createActor constructor
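If you write your own createActor, the essential point is to fetch the root key only outside production. A minimal TypeScript sketch follows, assuming an agent object with a `fetchRootKey()` method returning a Promise (as `HttpAgent` from `@dfinity/agent` has); `AgentLike` and `prepareAgent` are hypothetical names, and the interface is used so the sketch does not depend on that package.

```typescript
// Minimal interface standing in for an HttpAgent from @dfinity/agent.
interface AgentLike {
  fetchRootKey(): Promise<unknown>;
}

// Fetch the root key only when not talking to mainnet ("ic"); in production the
// agent's built-in root key must be trusted instead of one fetched over the network.
async function prepareAgent(agent: AgentLike, network: string | undefined): Promise<AgentLike> {
  if (network !== "ic") {
    await agent.fetchRootKey();
  }
  return agent;
}

let fetches = 0;
const fakeAgent: AgentLike = {
  fetchRootKey: () => {
    fetches++;
    return Promise.resolve(null);
  },
};
void prepareAgent(fakeAgent, "ic");    // production: no fetch
void prepareAgent(fakeAgent, "local"); // local replica: fetches the root key
console.log(fetches); // 1
```

In real code, `network` would come from the DFX_NETWORK value discussed above, and the prepared agent would then be passed to `Actor.createActor` together with the generated `idlFactory` and the canister ID.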
