File hashes: reading from/writing to extended attributes #1204
What are you talking about? Hash twice? Are you referring to AVDump? That isn't ours. We can't pass data to it. That's kind of the point of it.
Can you elaborate on which part of the workflow I described is confusing to you? Is it the naming? If so: a program that makes calls to an API (like Shoko's REST API) is often called a client, short for "API client". That's the term I have used and the reason I reference Shoko's API endpoints by their paths. I'm not sure why AVDump was brought into the picture. I'm happy to give more details, but I need more than "what are you talking about". Throw me a bone here.
I misread. I don't understand the point, though.
What issue are you particularly trying to solve that requires the double hashing you mention in the original issue? I have a few concerns about this approach, particularly that it is very Linux-specific and not especially portable across the OS platforms we support. The reason @da3dsoul might have brought AVDump into the question is that it's the only use case within Shoko's code & supported tooling that would cause a file to be hashed a second time, outside of requesting a new hashing operation on an unrecognised file. Alternatively, we are slowly building on a plugin API that exposes events such as
Hash information about a file was always included in the database (or, if you are old-school like me, in the file name of the given file). Storing it in extended attributes is a fine idea, but as mentioned before, not as portable.
Let me reply to both #1204 (comment) and #1204 (comment) in a single comment:
I need to identify a file whose metadata (basepath, name, owner, ctime/mtime/atime, etc.) may be different from what Shoko sees (e.g. Shoko sees
I proposed xattr since I'm familiar with it and I have a working solution. I believe Windows (NTFS) has both extended attributes and Alternate Data Streams; I remember reading that WSL1 implemented Linux FS features on NTFS using them. I don't know how complicated the Windows side is, but on Linux the overhead is one system call for reading and one for writing. Regarding interop, Samba has the
I see. To reiterate: (as you've observed) my request is for interop with code not in Shoko.
Without more info I don't think I can realistically pursue this. A plugin API (a binary one, as far as I can see) would require me to create and maintain code in a language I haven't learned, using APIs and libraries I don't know, set up and maintain CI/CD for it, and deal with artifact distribution. I'm not really seeing docs or stability guarantees either.
I agree that using the filename is the most straightforward option but I cannot change the original filename/directory structure/etc for archival reasons.
Certainly.
I already went down this rabbit hole. ;) When I was pursuing cloud filesystem support for Shoko, downloading the whole file for hashing was indeed costly... Deep in Shoko, probably from before the Command Refactor and while the WPF Server existed, there was a file checker that leveraged the fact that Shoko maintains the ED2K, MD5 and SHA1 of every file in its database; it was probably used only by me to check the health of all my files. ;)

You could use that as a base to create your own tool that connects to the Shoko database and writes the desired video.xattr or videofolder.xattr files, an NTFS alternate data stream with such data, or .sfv files, for example. In the future you could leverage what @Cazzar is talking about and create a plugin that listens for the FileHash event and writes such a file every time Shoko sees and hashes a new file. Shoko maintains a mapping between the name and the file in the db, so moving the file to another folder will not trigger hashing again.

I think @bigretromike used to have metadata files in his collections for other media centers, to leverage file-based metadata; maybe you should extend those kinds of file types. For example, I do find it attractive to maintain a standard file type in every folder (maybe .nfo) containing the hashes and minimal data such as the AniDB id. That way, in some dystopian future, if you move such folders or recreate Shoko from scratch, the import could be much faster. Also, interop with other media centers could be more straightforward, since reading such a file is probably easier than calling a custom API, more so if it is a pseudo-standard that is already supported. In the above nfo case, one could fill the uniqueid tag with SHA1, ED2K, MD5 and/or the AniDB id.
This has two issues:
This is useful to know, thanks. What happens if files have the same name, e.g.
This is approximately my use case: I wrote a tool to go from "arbitrary unstructured files" to something along the lines of
Filesize is also stored, and both are checked against to bypass a new hash calculation. I think the renamers also touch this table; in that case, during the import you rename the file, and both filenames are stored (but I'm not sure at the moment). The VideoLocal table has all the hashes and the CRC32; VideoLocal_Place has the paths where the file is stored (it might be in multiple places), in conjunction with the import folder prefix paths.
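The bypass check described above amounts to something like the following sketch (the real lookup is against the VideoLocal tables; the in-memory dict here just stands in for that query):

```python
import os

def needs_rehash(path: str, known_files: dict) -> bool:
    # Mirrors the described check: if the (filename, filesize) pair is
    # already known, skip the expensive hash calculation entirely.
    key = (os.path.basename(path), os.path.getsize(path))
    return key not in known_files
```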
Nice, it seems Kodi supports multi-episode nfo files in the tvshow folder; I don't know about others. I also think, or remember, that the APIs (probably not v3) have an endpoint, or a combination of them, where you provide the file and in the end get all the hashes and all the anime information. That can be leveraged if you don't want to connect to the DB. If you need a custom API endpoint, it will be possible, if you do the hard work ;) and @ElementalCrisis or other masters approve it. And of course you have a third option: join the team, help @Cazzar with the plugin framework, and create the first plugin that does everything automagically.
To give a final decision on this: we are not going to implement this in the core of Shoko, as it is out of scope for the project itself and, as previously discussed, it seems to serve a very specific use case. If you would like to implement this yourself, the Plugin API answer has been provided; looking at the swagger endpoints might provide you some information as well.
VERSION INFORMATION
Server Version: 5.0.0.60 (9808915)
LOG FILE
N/A
DESCRIPTION
When using Shoko as a data source, starting the query chain with a file hash (the /File/Hash/* endpoints) is the most reliable method. However, to do so the file currently needs to be hashed at least twice: once by Shoko and once by the API client. This is wasteful, especially over the network. As an alternative I propose for Shoko to read file hashes from, and write them to, extended attributes.

This allows for the following workflow:
- the client hashes the file and stores the result: `setfattr -n user.$HASHTYPE -v "$HASHVAL" $FILE`
- Shoko reads the attribute (`getfattr -n user.$HASHTYPE -e hex $FILE`) and finds a value
- the client calls `GET /File/Hash/$HASHTYPE` and gets a fileid to work with

Same idea vice-versa.
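The client side of this workflow can be sketched as follows. This is a sketch only: SHA1 is used for brevity (ED2K is what Shoko primarily keys on), the server address is assumed, and the query-string shape of the lookup URL is a guess rather than the documented endpoint signature, so check the swagger docs for the real one.

```python
import hashlib
import os

API_BASE = "http://localhost:8111/api/v3"  # assumed Shoko server address

def sha1_of(path: str, chunk: int = 1 << 20) -> str:
    # Hash the file exactly once, streaming in 1 MiB chunks.
    h = hashlib.sha1()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def tag_file(path: str) -> str:
    # Persist the digest in an extended attribute so neither side
    # has to hash the file again (Linux-only; see setxattr(2)).
    digest = sha1_of(path)
    os.setxattr(path, "user.sha1", digest.encode())
    return digest

def lookup_url(digest: str) -> str:
    # Endpoint path taken from the issue text; the query-string
    # shape here is an assumption for illustration.
    return f"{API_BASE}/File/Hash/SHA1?hash={digest}"
```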
FAQ
STEPS TO REPRODUCE
N/A