Implement datasets data model #137

Merged: 1 commit, Sep 9, 2024

Contributor

@andresgutgon andresgutgon commented Sep 5, 2024

What?

☝️ Table for storing data related to uploaded datasets

What?

  • Store file metadata
  • Store headers
  • Store number of rows
  • List of datasets
  • Destroy dataset modal
  • Add uploaded by column
  • Add Created at column
  • Fix drag & drop not detecting the uploaded file
  • Remove file when dataset is destroyed
  • Add index to workspace_id and author_id
  • Make unique datasets.name in combination with workspace_id
  • Add CSV delimiter [',', '\t', ' ', ';']. We need to know this in order to read the CSV. Nothing is easy 😂
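
To illustrate why the delimiter has to be stored alongside the headers and row count, here is a minimal sketch. The names `CsvDelimiter`, `CsvSummary`, and `summarizeCsv` are hypothetical and not taken from this PR, and the reader is deliberately simplified: it does not handle quoted fields, which a real CSV parser would.

```typescript
// Hypothetical type: the four delimiters listed in the checklist above.
type CsvDelimiter = ',' | '\t' | ' ' | ';'

// Hypothetical shape of the per-dataset metadata (headers + row count).
interface CsvSummary {
  headers: string[]
  rowCount: number
}

// Simplified reader: splits on line breaks and on the given delimiter.
// It does NOT handle quoted fields containing the delimiter or newlines.
function summarizeCsv(content: string, delimiter: CsvDelimiter): CsvSummary {
  const lines = content.split(/\r?\n/).filter((line) => line.trim() !== '')
  const [headerLine, ...rows] = lines
  if (headerLine === undefined) return { headers: [], rowCount: 0 }

  const headers = headerLine.split(delimiter).map((header) => header.trim())
  return { headers, rowCount: rows.length }
}

// With the right delimiter the headers are split correctly...
const ok = summarizeCsv('name;age\nAna;30\nBob;25\n', ';')
// ...with the wrong one the whole header line collapses into one column.
const wrong = summarizeCsv('name;age\nAna;30\nBob;25\n', ',')
console.log(ok.headers.length, wrong.headers.length)
```

Reading the same file with the wrong delimiter still "works" but yields a single bogus column, which is why guessing is not enough and the delimiter is kept with the dataset.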

Next TODO

  • Preview dataset content modal (500 first rows)
  • Table loading skeleton

@andresgutgon andresgutgon added the 🚧 wip Work in progress label Sep 5, 2024
@andresgutgon andresgutgon force-pushed the feature/datasets-data-model branch 3 times, most recently from adb6771 to bc5aee9 Compare September 6, 2024 10:54
},
db = database,
) {
const deleteResult = await disk.delete(dataset.fileKey)
Contributor Author

This deletes the file from the disk (filesystem, S3, or another cloud storage). Do you think we should do it another way, or is blocking the request fine?

Collaborator

@geclos geclos Sep 6, 2024

Fine for now, although it would be more scalable to delete the file in a job afterwards.

Contributor Author

Let me try it in production with S3 enabled, and then we can move it to a job.
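
As a rough sketch of the "delete the file in a job afterwards" idea from this thread: the request only removes the database row and enqueues the file deletion, and a worker deletes the file later. Everything here is hypothetical (`FileStorage`, `InMemoryQueue`, `destroyDataset`, `deleteRow`); only the `delete(fileKey)` call mirrors the line under review, and the in-memory queue stands in for whatever job system the project would actually use.

```typescript
// Hypothetical storage interface; only `delete(key)` mirrors the diff above.
interface FileStorage {
  delete(key: string): Promise<void>
}

type DeleteFileJob = { fileKey: string }

// Minimal in-memory queue standing in for a real background job system.
class InMemoryQueue {
  private jobs: DeleteFileJob[] = []

  enqueue(job: DeleteFileJob): void {
    this.jobs.push(job)
  }

  get size(): number {
    return this.jobs.length
  }

  // Runs every pending job; a job whose delete fails is re-queued for retry.
  async drain(storage: FileStorage): Promise<number> {
    const pending = this.jobs
    this.jobs = []
    let completed = 0
    for (const job of pending) {
      try {
        await storage.delete(job.fileKey)
        completed++
      } catch {
        this.jobs.push(job)
      }
    }
    return completed
  }
}

// The request path: remove the row, then defer the (possibly slow) file delete.
async function destroyDataset(
  dataset: { id: number; fileKey: string },
  deleteRow: (id: number) => Promise<void>, // hypothetical DB helper
  queue: InMemoryQueue,
): Promise<void> {
  await deleteRow(dataset.id)
  queue.enqueue({ fileKey: dataset.fileKey })
}
```

The trade-off is that a file can briefly outlive its row, so the worker needs retries; in exchange the request no longer waits on S3 or another remote storage.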

@andresgutgon andresgutgon removed the 🚧 wip Work in progress label Sep 6, 2024
@andresgutgon andresgutgon force-pushed the feature/datasets-data-model branch from 8c0f56c to ee4887c Compare September 6, 2024 14:05
description,
submitStr,
model,
}: Props<TServerAction>) {
Contributor Author

Improved the generic type for the destroy modal.

'application/vnd.ms-excel',
'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
'application/vnd.oasis.opendocument.spreadsheet',
]
const MAX_SIZE = 3
Collaborator

@geclos geclos Sep 6, 2024

3 MB only? That's very small, I'd push it to 25 or so.

Contributor Author

25 MB of text?

Collaborator

Well, the theory is that they want to evaluate on a large body of data, no? 3 MB doesn't push our infra hard at all, and it can be limiting for no reason.

Contributor Author

25 then. But will the Node server handle it?

We'll have to do this: https://nextjs.org/docs/app/api-reference/next-config-js/serverActions#bodysizelimit

That said, the best approach would be to upload directly to S3 and process the file once S3 responds, but that requires changing how uploads are done.

Contributor Author

I'll start with 15 MB and see if anyone complains.
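
For reference, a minimal sketch of how the 15 MB limit discussed here could be wired up. It assumes `MAX_SIZE` is expressed in megabytes, as the thread implies. `validateFileSize` and the one-megabyte headroom are hypothetical choices; the `experimental.serverActions.bodySizeLimit` key follows the Next.js page linked above, and the object would be exported from the project's Next.js config file.

```typescript
// Assumption: the limit is expressed in megabytes, as in the discussion above.
const MAX_SIZE_MB = 15
const MAX_UPLOAD_BYTES = MAX_SIZE_MB * 1024 * 1024

// Hypothetical validator: returns an error message, or null when the size is fine.
function validateFileSize(sizeInBytes: number): string | null {
  if (sizeInBytes > MAX_UPLOAD_BYTES) {
    return `File is too large (max ${MAX_SIZE_MB} MB)`
  }
  return null
}

// Shape of the Next.js option from the linked docs. The extra megabyte is an
// assumed headroom for multipart encoding overhead, not a documented value.
const nextConfig = {
  experimental: {
    serverActions: {
      bodySizeLimit: `${MAX_SIZE_MB + 1}mb`,
    },
  },
}
```

Keeping both values derived from one constant avoids the case where the form accepts a file that the server action then rejects.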

return {
data,
mutate,
createFormAction,
Collaborator

You shouldn't use this, as it won't be compatible with the JSON-based server action.

Contributor Author

@andresgutgon andresgutgon Sep 6, 2024

I need this to submit a multipart form with an attachment. This is already working.

Collaborator

True. Edge case.

@@ -22,15 +24,47 @@ export const createDataset = async (
}
disk: DiskWrapper
Collaborator

This is called disk, but it can be other file storage, no? Like S3 or others.

Contributor Author

This is what they call it in flydrive.
(Image attached.)

Collaborator

@geclos geclos left a comment

minor comments 👌🏼

@andresgutgon andresgutgon force-pushed the feature/datasets-data-model branch from ee4887c to c026196 Compare September 7, 2024 16:20
--> statement-breakpoint
CREATE INDEX IF NOT EXISTS "datasets_workspace_idx" ON "latitude"."datasets" USING btree ("workspace_id");--> statement-breakpoint
CREATE INDEX IF NOT EXISTS "datasets_author_idx" ON "latitude"."datasets" USING btree ("author_id");--> statement-breakpoint
CREATE UNIQUE INDEX IF NOT EXISTS "datasets_workspace_id_name_index" ON "latitude"."datasets" USING btree ("workspace_id","name");
Contributor Author

☝️ Added 3 indexes

  1. Workspace
  2. Author
  3. Workspace + Name

I'll add UI validation for workspace + name uniqueness. This was a recommendation from @csansoon. He said it's nice not to repeat dataset names, and I agree.
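
A minimal sketch of what the workspace + name validation could look like on top of the unique index. `DatasetRef`, `validateDatasetName`, and `isUniqueViolation` are hypothetical names. The comparison is exact, to mirror a plain btree unique index. `23505` is PostgreSQL's SQLSTATE for `unique_violation`; that the driver exposes it as an `error.code` property is an assumption here.

```typescript
// Hypothetical minimal view of a dataset row.
interface DatasetRef {
  workspaceId: number
  name: string
}

// Pre-check for the form: returns an error message, or null when the name is free.
// Names are compared exactly, like the (workspace_id, name) unique index does.
function validateDatasetName(
  existing: DatasetRef[],
  workspaceId: number,
  name: string,
): string | null {
  if (name.trim() === '') return 'Name is required'
  const taken = existing.some(
    (dataset) => dataset.workspaceId === workspaceId && dataset.name === name,
  )
  return taken ? 'A dataset with this name already exists in this workspace' : null
}

// The index remains the source of truth: two concurrent requests can both pass
// the pre-check, so the insert should still handle a unique violation.
// Assumption: the database driver exposes the SQLSTATE as `error.code`.
function isUniqueViolation(error: unknown): boolean {
  return (
    typeof error === 'object' &&
    error !== null &&
    (error as { code?: unknown }).code === '23505'
  )
}
```

The same name stays valid in a different workspace, which matches the composite index rather than a global unique constraint on `name`.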

@andresgutgon andresgutgon merged commit 35b1df6 into main Sep 9, 2024
3 checks passed
@andresgutgon andresgutgon deleted the feature/datasets-data-model branch September 9, 2024 08:54