Commit
update tutorial
rishabh-ranjan committed Aug 12, 2024
1 parent bd8d4b3 commit 8037530
Showing 1 changed file with 40 additions and 0 deletions.
40 changes: 40 additions & 0 deletions tutorials/custom_dataset.ipynb
@@ -160,6 +160,46 @@
"Inside the `make_db` function we first download the raw files (or read them from the local filesystem) and then create a `relbench.base.Database` object out of those. Thus, the `make_db` function serves as documentation for your pre-processing steps, while also conveniently allowing you to develop and debug them within the RelBench framework."
]
},
{
"cell_type": "markdown",
"id": "a9552931-ef4a-4c2b-b63e-6ec56738db31",
"metadata": {
"execution": {
"iopub.execute_input": "2024-08-12T05:14:46.141436Z",
"iopub.status.busy": "2024-08-12T05:14:46.140960Z",
"iopub.status.idle": "2024-08-12T05:14:46.160386Z",
"shell.execute_reply": "2024-08-12T05:14:46.159281Z",
"shell.execute_reply.started": "2024-08-12T05:14:46.141396Z"
}
},
"source": [
"#### Pkey/Fkey Reindexing"
]
},
{
"cell_type": "markdown",
"id": "f9845d78-7943-47be-a1ce-da5b0b01a3d4",
"metadata": {},
"source": [
"The intended usage is not to call `make_db` directly but to call `get_db`, which internally calls `make_db` and adds a layer of other functionality such as caching."
]
},
{
"cell_type": "markdown",
"id": "c67714b3-f3f0-4841-b66f-3623242ff033",
"metadata": {},
"source": [
"Another important thing `get_db` does is call `db.reindex_pkeys_and_fkeys()` on the database `db` returned by `make_db`. This reindexes the primary-key and foreign-key columns so that each primary-key column holds consecutive integers starting from 0. This simplifies some downstream logic in RelBench, which can work under the unified assumption that pkeys and fkeys are integers and that pkeys are sequential."
]
},
{
"cell_type": "markdown",
"id": "dcc2dc99-128f-43dd-bb31-20752ee469d8",
"metadata": {},
"source": [
"If you want to preserve the original pkey values, either because you believe they can be used as features for predictive tasks or because you would like to cross-reference the prediction results with the original data source, simply add a duplicate column without marking it as `pkey_col`. The model designer is then free to decide whether to include this duplicate column as input to the model."
]
},
{
"cell_type": "markdown",
"id": "fec0b02e-ce3e-426b-8fe2-38b8fa8d20c8",
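The reindexing described in the added cells can be illustrated with a minimal plain-Python sketch. This is not RelBench's implementation of `reindex_pkeys_and_fkeys()`; the helper `reindex_keys`, the toy tables, and the column names (`user_id`, `orig_user_id`, `order_id`) are all hypothetical and exist only to show the idea: primary keys become consecutive integers starting from 0, foreign keys are rewritten to match, and a duplicate column that is not treated as a key keeps the original values.

```python
# Illustrative sketch only: a plain-Python stand-in for the reindexing idea,
# not RelBench's implementation of reindex_pkeys_and_fkeys().


def reindex_keys(parent_rows, child_rows, pkey_col, fkey_col):
    """Map parent pkeys to 0..n-1 and rewrite the child fkeys to match."""
    # Each original pkey value is assigned its row position as the new key.
    mapping = {row[pkey_col]: i for i, row in enumerate(parent_rows)}

    new_parent = [{**row, pkey_col: mapping[row[pkey_col]]} for row in parent_rows]
    # An fkey with no matching parent row becomes None
    # (a choice made for this sketch, not a statement about RelBench).
    new_child = [{**row, fkey_col: mapping.get(row[fkey_col])} for row in child_rows]
    return new_parent, new_child


# Hypothetical toy tables. "orig_user_id" duplicates the pkey column but is
# not treated as a key, so it survives reindexing unchanged.
users = [
    {"user_id": "u-17", "orig_user_id": "u-17", "name": "Ada"},
    {"user_id": "u-42", "orig_user_id": "u-42", "name": "Bob"},
]
orders = [
    {"order_id": 0, "user_id": "u-42"},
    {"order_id": 1, "user_id": "u-17"},
    {"order_id": 2, "user_id": "u-42"},
]

users_reindexed, orders_reindexed = reindex_keys(users, orders, "user_id", "user_id")
```

After the call, `users_reindexed` carries integer `user_id` values while `orig_user_id` still holds the source identifiers, which is what allows predictions to be cross-referenced with the original data.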
