[FEATURE] Separation of file meta data #345

Closed
lpoli opened this issue Sep 20, 2021 · 5 comments

@lpoli
Contributor

lpoli commented Sep 20, 2021

Currently we store the file metadata of all allocations in a single table, reference_objects. There can be many allocations, each allocation can hold a multitude of files and directories, and soft delete is implemented (i.e. a delete query does not remove the row; instead its deleted_at field is set to the current timestamp), so the reference_objects table grows very large.

Also, the primary id of a file's metadata in reference_objects won't be the same across blobbers for the same allocation. It is important that the file ref id be unique. It will have use cases in future applications, much like the inode number on a Linux system; currently it is required by at least 0fs.

One solution for the file ref id issue would be to add a new column, say unique_id, to the reference_objects table and make it unique within an allocation.

The other solution would be to add a new file metadata table for each allocation. This makes file metadata more granular. The primary id of file metadata would also be consistent across blobbers. When an allocation expires, the blobber can simply drop its table. Since queries for an allocation would only look at that allocation's own table, which is much smaller than the shared one, performance should improve.
A further benefit later on would be to let clients choose indexing according to their requirements.
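As a rough sketch of what the per-allocation table lifecycle could look like (not the blobber's actual code), the helpers below build the DDL for creating and dropping such a table. They assume PostgreSQL, where `CREATE TABLE ... (LIKE ... INCLUDING ALL)` copies the column definitions and indexes of the shared table. The function names and the suffix format are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// safeSuffix restricts table-name suffixes to lowercase letters, digits and
// underscores so they can be spliced into DDL without quoting.
// The suffix format itself is an assumption, not something the blobber defines.
var safeSuffix = regexp.MustCompile(`^[a-z0-9_]+$`)

// createRefTableSQL returns DDL that clones the structure of the shared
// reference_objects table into a table dedicated to one allocation.
// Assumes PostgreSQL (LIKE ... INCLUDING ALL is PostgreSQL syntax).
func createRefTableSQL(suffix string) (string, error) {
	if !safeSuffix.MatchString(suffix) {
		return "", fmt.Errorf("unsafe table suffix %q", suffix)
	}
	return fmt.Sprintf(
		"CREATE TABLE IF NOT EXISTS reference_objects_%s (LIKE reference_objects INCLUDING ALL)",
		suffix), nil
}

// dropRefTableSQL returns DDL that removes an allocation's table, e.g. once
// the allocation has expired.
func dropRefTableSQL(suffix string) (string, error) {
	if !safeSuffix.MatchString(suffix) {
		return "", fmt.Errorf("unsafe table suffix %q", suffix)
	}
	return fmt.Sprintf("DROP TABLE IF EXISTS reference_objects_%s", suffix), nil
}
```

Dropping a whole table on expiry also sidesteps the soft-delete growth described above, since no deleted_at rows are left behind for that allocation.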

@lpoli
Contributor Author

lpoli commented Sep 20, 2021

This is more of a discussion than an issue.

@cnlangzi
Contributor

I need more time to look into this and think it over. I will keep you updated.

@cnlangzi cnlangzi added this to the v1.0.2 milestone Oct 2, 2021
@cnlangzi
Contributor

cnlangzi commented Oct 2, 2021

Similar to #301.

@moldis moldis changed the title Separation of file meta data [FEATURE] Separation of file meta data Oct 25, 2021
@cnlangzi cnlangzi assigned lpoli and unassigned cnlangzi Feb 24, 2022
@sculptex

Regarding the creation of per-allocation instances of the reference_objects table:

As per more recent discussions, we should also include allocation_updates in this optimization.

The full allocation_id (varchar(64)) is too long to use as a table-name suffix, e.g. reference_objects_xxxxxxxx, so we should use a unique index generated by the allocations table itself.

I propose obj_idn as the standard name for such a field, referring to a unique hash of obj_id. In the case of allocations:
allocation_idn (int) gets added as the primary key and
allocation_id becomes an indexed field. References to allocation_id can then be replaced by the more compact allocation_idn, for example as the suffix in reference_objects_nnnn and allocation_updates_nnnn. It could also replace such obj_id key references in other tables, e.g. replacing
allocation_id varchar(64) with
allocation_idn (int)
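To illustrate the naming scheme, here is a minimal sketch of deriving table names from allocation_idn. One likely reason the full allocation_id is too long, assuming the database is PostgreSQL, is its default identifier limit of 63 bytes: "reference_objects_" plus a 64-character id comes to 82. The helper names and the constant are hypothetical.

```go
package main

import "fmt"

// maxIdentLen is PostgreSQL's default identifier limit (NAMEDATALEN - 1 bytes).
// Assumption: the blobber database is PostgreSQL with default settings.
const maxIdentLen = 63

// suffixedTable builds a per-allocation table name such as
// reference_objects_1234 from a base table name and an integer allocation_idn.
func suffixedTable(base string, allocationIDN int) string {
	return fmt.Sprintf("%s_%d", base, allocationIDN)
}

// fitsIdentifierLimit reports whether a table name fits within the limit.
func fitsIdentifierLimit(name string) bool {
	return len(name) <= maxIdentLen
}
```

With this scheme, suffixedTable("allocation_updates", 1234) gives allocation_updates_1234, which stays well inside the limit for any realistic int value.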

If wrapper functions can be implemented that initially return the existing table until the new model is in place, the vast bulk of the change can be prepared in advance without a breaking change.
E.g. references to the reference_objects table would be replaced by get_reference_objects_table(allocation_id). Initially this just returns the reference_objects table, but once the model change is implemented, the function can be switched to return the reference_objects_nnnn table (the name can be fetched from an in-memory map[]). The simultaneous dropping of the (now redundant) allocation_id field from the table and struct would also require handling.
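A minimal sketch of such a wrapper, under the assumption that the allocation_id to allocation_idn mapping is kept in an in-memory map guarded by a mutex. The type and method names are hypothetical and only mirror the get_reference_objects_table idea above.

```go
package main

import (
	"fmt"
	"sync"
)

// legacyRefTable is the single shared table used by the current model.
const legacyRefTable = "reference_objects"

// TableResolver (hypothetical) decides which table holds an allocation's
// file metadata. It returns the shared table until the per-allocation
// model is switched on.
type TableResolver struct {
	mu            sync.RWMutex
	perAllocation bool           // false until the new model is rolled out
	idnByID       map[string]int // allocation_id -> allocation_idn
}

// NewTableResolver returns a resolver that starts in legacy mode.
func NewTableResolver() *TableResolver {
	return &TableResolver{idnByID: make(map[string]int)}
}

// Register records the allocation_idn assigned to an allocation_id.
func (r *TableResolver) Register(allocationID string, idn int) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.idnByID[allocationID] = idn
}

// EnablePerAllocation switches the resolver to the new model.
func (r *TableResolver) EnablePerAllocation() {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.perAllocation = true
}

// ReferenceObjectsTable returns the table name to query for an allocation.
func (r *TableResolver) ReferenceObjectsTable(allocationID string) (string, error) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	if !r.perAllocation {
		return legacyRefTable, nil
	}
	idn, ok := r.idnByID[allocationID]
	if !ok {
		return "", fmt.Errorf("unknown allocation %q", allocationID)
	}
	return fmt.Sprintf("%s_%d", legacyRefTable, idn), nil
}
```

Callers can be migrated to ReferenceObjectsTable one at a time while it still returns reference_objects, so the switch itself becomes a single call to EnablePerAllocation.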

Note: the stats functions, as returned by _stats, seem to be the only place where queries need to reference multiple allocations. This method is inefficient and needs replacing with a more modular method anyway.

@lpoli
Contributor Author

lpoli commented Apr 15, 2022

This issue has been incorporated into #627.

@lpoli lpoli closed this as completed Apr 15, 2022