-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MIME type sniffing #30
Conversation
src/filesystem/nodes/metadata/mod.rs
Outdated
pub enum MetadataKey { | ||
MimeType, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was under the impression that being able to store arbitrary bytes with an arbitrary key was an intentional design choice to maximize flexibility, but that might be inaccurate. But obviously restricting the key type to an enum has value in terms of organization and consistency.
@sstelfox any thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought the same, but then realized Node is not exposed to the client (if I remember correctly)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Metadata entries are going to need to support arbitrary keys. We intend to allow users to set and access any key by string (case-sensitive, string limit of 255 characters) it just isn't fully exposed out to the public facing APIs yet (just on the Node type). There are going to be some "reserved" keywords for things like this.
I do like the idea of using enums for the reserved type, maybe we could just add an extra variant like "User" or "Custom" that contains a String if it doesn't match a reserved type.
At a high level, I don't know how I feel about adding the concept of mime types into the filesystem. It seems one or two logical layers above the layer of a "file system". And requires bringing in two crates, at least one of which has a questionable future. I wonder if this could be its own crate that the fuse crate or others could import that is built on top of banyanFS, maybe defining constants that are used for the metadata key. Or maybe less extreme it could probably be an optional feature that banyanFS can be built without? |
I believe the main use case of all this was for |
I'm in the process of reviewing this but yeah we largely want this as part of this implementation and not part of the protocol / spec itself. Our WASM front-end needs to have this mime guessed information using more than just the file extension. We don't want to have to retrieve the data from storage to perform the mime type guessing whenever we need it so it needs to be part of the metadata associated with entries. We could have done this in javascript itself, but previously discussed an asset pipeline living at a higher level inside BanyanFS for doing things like image preview generation (to store as associated data), likewise things like embeddings could easily be stored in associated data and would probably need the mime type for selecting the type of embedding models to use for that type. So yes I think Mime belong in here, but not the spec. It's probably worth feature gating whether this is present or not to allow for use in small or minimal environments but I don't think that is necessary for this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor tweaks but looks really solid!
src/filesystem/nodes/metadata/mod.rs
Outdated
pub enum MetadataKey { | ||
MimeType, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Metadata entries are going to need to support arbitrary keys. We intend to allow users to set and access any key by string (case-sensitive, string limit of 255 characters) it just isn't fully exposed out to the public facing APIs yet (just on the Node type). There are going to be some "reserved" keywords for things like this.
I do like the idea of using enums for the reserved type, maybe we could just add an extra variant like "User" or "Custom" that contains a String if it doesn't match a reserved type.
None | ||
} | ||
|
||
fn pattern_match(&self) -> Option<mime::MediaType> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dang I wasn't expecting to see an actual byte based mime guesser in here that's pretty neat. I've never thought of matching variable length byte arrays like this, that's pretty neat and probably pretty efficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
Does looks like some of the new tests are legitimately breaking, need a touch of clean up there but otherwise 👍🏼 |
No description provided.