Implement datasets data model #137

andresgutgon · 2024-09-05T12:36:26Z

What?

☝️ Table for storing data related to uploaded datasets

Next TODO

Preview dataset content modal (first 500 rows)
Table loading skeleton

andresgutgon · 2024-09-06T13:56:33Z

packages/core/src/services/datasets/destroy.ts

+  },
+  db = database,
+) {
+  const deleteResult = await disk.delete(dataset.fileKey)


This deletes the file on disk (filesystem, S3, or another cloud storage). Do you think we should do it another way, or is blocking the request fine?

Fine for now, although it would be more scalable to delete the file in a job afterwards.

Let me try it in production with S3 enabled, and then we can move it to a job.
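
A minimal sketch of the "delete the file in a job afterwards" idea from this thread, assuming a simple in-memory queue. `FileStorage`, `DeleteFileJobQueue`, and `destroyDataset` are hypothetical names for illustration, not the project's code or flydrive's API, and a real setup would likely use a persistent job queue.

```typescript
// Hypothetical storage interface; stands in for the disk wrapper used in the PR.
interface FileStorage {
  delete(key: string): Promise<void>
}

// Hypothetical job payload: only the file key is needed once the row is gone.
type DeleteFileJob = { fileKey: string }

class DeleteFileJobQueue {
  private pending: DeleteFileJob[] = []
  private storage: FileStorage

  constructor(storage: FileStorage) {
    this.storage = storage
  }

  // Called from the request path: cheap, and it never touches storage.
  enqueue(job: DeleteFileJob): void {
    this.pending.push(job)
  }

  get size(): number {
    return this.pending.length
  }

  // Called by a worker outside the request: performs the slow deletes.
  // Jobs that fail are kept so a later drain can retry them.
  async drain(): Promise<void> {
    const jobs = this.pending
    this.pending = []
    for (const job of jobs) {
      try {
        await this.storage.delete(job.fileKey)
      } catch {
        this.pending.push(job)
      }
    }
  }
}

// Request-path sketch: remove the record first, then schedule the file delete.
function destroyDataset(
  datasets: Map<number, { fileKey: string }>,
  id: number,
  queue: DeleteFileJobQueue,
): boolean {
  const dataset = datasets.get(id)
  if (!dataset) return false
  datasets.delete(id)
  queue.enqueue({ fileKey: dataset.fileKey })
  return true
}
```

With this shape the request returns as soon as the record is removed, and a slow or failing S3 call only delays the background job rather than the user.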

andresgutgon · 2024-09-06T14:06:53Z

apps/web/src/components/modals/DestroyModal/index.tsx

+  description,
+  submitStr,
+  model,
+}: Props<TServerAction>) {


Improve generic for destroy modal

geclos · 2024-09-06T15:00:15Z

apps/web/src/actions/datasets/create.ts

-  'application/vnd.ms-excel',
-  'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
-  'application/vnd.oasis.opendocument.spreadsheet',
-]
 const MAX_SIZE = 3


3 MB only? That's very small. I'd push it to 25 or so.

25 MB of text?

Well, the theory is that they want to evaluate on a large body of data, no? 3 MB doesn't push our infra hard at all, and it can be limiting for no reason.

25 then. But will the Node server handle it?

We'll have to do this https://nextjs.org/docs/app/api-reference/next-config-js/serverActions#bodysizelimit

Although the best approach would be to upload directly to S3 and then process the file when S3 responds. But we'd need to change how the upload is done.

I'll start with 15 MB and see if someone complains.
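
A rough sketch of how the 15 MB starting point could be kept in one place, assuming the upload check and the Next.js body limit are derived from the same constant. The names `MAX_UPLOAD_SIZE_MB`, `serverActionsConfig`, and `checkUploadSize` are illustrative and not taken from the PR. The config shape follows the linked `bodySizeLimit` docs, which at the time placed it under `experimental.serverActions`; check the docs for the Next.js version in use.

```typescript
// Assumed value from the thread: start at 15 MB. The name is illustrative.
const MAX_UPLOAD_SIZE_MB = 15
const MAX_UPLOAD_SIZE_BYTES = MAX_UPLOAD_SIZE_MB * 1024 * 1024

// Fragment that would be merged into next.config.js (assumption: the option
// lives under experimental.serverActions, as in the linked docs).
const serverActionsConfig = {
  experimental: {
    serverActions: { bodySizeLimit: `${MAX_UPLOAD_SIZE_MB}mb` },
  },
}

type SizeCheck = { ok: true } | { ok: false; error: string }

// Validates the uploaded file size before any further processing.
function checkUploadSize(sizeInBytes: number): SizeCheck {
  if (sizeInBytes > MAX_UPLOAD_SIZE_BYTES) {
    return {
      ok: false,
      error: `Your dataset must be less than ${MAX_UPLOAD_SIZE_MB}MB in size`,
    }
  }
  return { ok: true }
}
```

A multipart request body is slightly larger than the file it carries, so the server-side limit may need some headroom above the file limit.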

geclos · 2024-09-06T15:01:09Z

apps/web/src/stores/datasets.ts

+  return {
+    data,
+    mutate,
+    createFormAction,


You shouldn't use this, as it won't be compatible with the JSON-based server action.

I need this to submit a multipart form with an attachment. This is already working.

True. Edge case.

packages/core/src/lib/readCsv.ts

packages/core/src/repositories/datasetsRepository.ts

packages/core/src/schema/models/datasets.ts

packages/core/src/schema/relations.ts

geclos · 2024-09-06T15:06:01Z

packages/core/src/services/datasets/create.ts

@@ -22,15 +24,47 @@ export const createDataset = async (
    }
    disk: DiskWrapper


This is called disk, but it can be other file storage, no? Like S3 or others.

This is what it's called in flydrive.

geclos

minor comments 👌🏼

andresgutgon · 2024-09-07T16:22:52Z

packages/core/drizzle/0044_needy_the_hood.sql

+--> statement-breakpoint
+CREATE INDEX IF NOT EXISTS "datasets_workspace_idx" ON "latitude"."datasets" USING btree ("workspace_id");--> statement-breakpoint
+CREATE INDEX IF NOT EXISTS "datasets_author_idx" ON "latitude"."datasets" USING btree ("author_id");--> statement-breakpoint
+CREATE UNIQUE INDEX IF NOT EXISTS "datasets_workspace_id_name_index" ON "latitude"."datasets" USING btree ("workspace_id","name");


☝️ Added 3 indexes

Workspace

Author

Workspace + Name

I'll add UI validation for workspace + name uniqueness. This was a recommendation from @csansoon. He said it's better not to repeat dataset names, and I agree.
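
A small sketch of what the workspace + name check could look like before a dataset is created, assuming the existing datasets are already loaded. `DatasetRow` and `isDatasetNameTaken` are hypothetical names, not the PR's code. The comparison is exact, to mirror how the unique index on `("workspace_id","name")` compares values.

```typescript
// Hypothetical minimal shape of a dataset record for this check.
type DatasetRow = { workspaceId: number; name: string }

// Returns true when the workspace already has a dataset with this exact name.
// The same name in a different workspace is allowed, as with the unique index.
function isDatasetNameTaken(
  existing: DatasetRow[],
  workspaceId: number,
  name: string,
): boolean {
  return existing.some(
    (dataset) => dataset.workspaceId === workspaceId && dataset.name === name,
  )
}
```

A check like this only improves the error message. The unique index remains the final guard, since a concurrent insert can land between the check and the write.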

andresgutgon added the 🚧 wip Work in progress label Sep 5, 2024

andresgutgon force-pushed the feature/datasets-data-model branch 3 times, most recently from adb6771 to bc5aee9 September 6, 2024 10:54

andresgutgon removed the 🚧 wip Work in progress label Sep 6, 2024

andresgutgon force-pushed the feature/datasets-data-model branch from 8c0f56c to ee4887c September 6, 2024 14:05
