Count occurrences of the same media item #89

Open
4 of 5 tasks
Tracked by #53
dennyabrain opened this issue Mar 11, 2024 · 5 comments · Fixed by tattle-made/feluda#163
Assignees
Labels
level:ticket An issue that describes a ticket (initiative>feature>ticket) priority:medium

Comments

@dennyabrain
Contributor

dennyabrain commented Mar 11, 2024

For every media item we receive on the tipline, we need to show users how many occurrences of this exact file exist on the server. Given our infra, the scope of this task is to

@aatmanvaidya
Contributor

aatmanvaidya commented Mar 11, 2024

In the short term, hashing is the way to check whether two files are identical. It is also the fastest way to do so.

Are SHA-256 and SHA-512 collision resistant?

The probability of a collision is negligibly low.

An event is called "is-not-gonna-happen" if its probability is below 1/2^100.
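
To make the claim concrete, the standard birthday (union) bound gives an upper limit on the chance of any collision among n distinct files hashed with an ideal b-bit function. The corpus size n = 2^40 used here is an illustrative assumption, not a figure from this project.

```latex
P(\text{collision}) \le \binom{n}{2} \cdot 2^{-b} < \frac{n^{2}}{2^{\,b+1}}

n = 2^{40},\ b = 256:\quad P < \frac{2^{80}}{2^{257}} = 2^{-177}

n = 2^{40},\ b = 512:\quad P < \frac{2^{80}}{2^{513}} = 2^{-433}
```

Both values are far below the 1/2^100 threshold, so even a 256-bit digest would clear it under these assumptions.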

You can use any 512-bit cryptographic hash function, such as SHA-512, SHA3-512, or BLAKE2b, without fear of collision.
BLAKE2b is worth a look because it is quite fast compared to the alternatives, as is its parallelizable successor, BLAKE3.
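
A minimal sketch of how exact-duplicate counting could work with a 512-bit BLAKE2b digest from Python's standard `hashlib`. The names `file_digest` and `count_occurrences`, the 1 MiB chunk size, and the in-memory `Counter` are illustrative assumptions, not this project's implementation; a real deployment would presumably persist digests in a datastore instead.

```python
import hashlib
from collections import Counter


def file_digest(path, chunk_size=1 << 20):
    """Return the hex BLAKE2b digest of a file, read in fixed-size chunks."""
    hasher = hashlib.blake2b()  # default digest size is 64 bytes (512 bits)
    with open(path, "rb") as f:
        # Read in chunks so large media files are never fully loaded in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def count_occurrences(paths):
    """Map each digest to the number of files with byte-identical content."""
    return Counter(file_digest(p) for p in paths)
```

Two files receive the same digest only if their bytes are identical, so a re-encoded or recompressed copy of the same media would be counted as a different item.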

@dennyabrain
Contributor Author

In the short term, hashing is the way to check whether files are the same. It is also the fastest way to do so.

Great. Then let's move on to checking whether these hashes work for our use case. Do share the various ways you test them on media items received on WhatsApp, so we keep a log of what worked and what did not.

@aatmanvaidya
Contributor

In the short term, hashing is the way to check whether files are the same. It is also the fastest way to do so.

Great. Then let's move on to checking whether these hashes work for our use case. Do share the various ways you test them on media items received on WhatsApp, so we keep a log of what worked and what did not.

Yes, I have started working on this. Can you also look at the updated comment with the Stack Overflow link that discusses how SHA-256 and SHA-512 are collision resistant?

We should consider using BLAKE3 over SHA-512. It is much faster.

@aatmanvaidya
Contributor

aatmanvaidya commented Mar 12, 2024

Time taken by BLAKE2b to hash audio and video files of different lengths and sizes:

Audio

| Media type | Length | Time taken |
| --- | --- | --- |
| audio | 30s | 0.018s |
| audio | 60s | 0.027s |
| audio | 120s | 0.056s |
| audio | 300s | 0.122s |
| audio | 600s | 0.234s |
| audio | 1200s | 0.425s |
| audio | 1800s | 0.631s |

Video

| Media type | Length | Time taken |
| --- | --- | --- |
| video | 30s | 0.0081s |
| video | 60s | 0.013s |
| video | 300s | 0.022s |
| video | 600s | 0.074s |
| video | 1200s | 0.087s |
| video | 1800s | 0.148s |
| video | 3600s | 0.33s |
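
Timings like the ones above could be collected with a small helper along these lines. This is a sketch under assumptions: the name `time_blake2b`, the use of `time.perf_counter`, and the 1 MiB chunked read are illustrative and not necessarily how the reported figures were measured.

```python
import hashlib
import time


def time_blake2b(path, chunk_size=1 << 20):
    """Hash a file with BLAKE2b and return (hex_digest, elapsed_seconds)."""
    start = time.perf_counter()
    hasher = hashlib.blake2b()
    with open(path, "rb") as f:
        # The timed region includes disk reads as well as the hashing itself.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    digest = hasher.hexdigest()
    elapsed = time.perf_counter() - start
    return digest, elapsed
```

Hashing time depends on the number of bytes read rather than on playback length, so recording each file's size alongside its duration would make runs easier to compare.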

@aatmanvaidya aatmanvaidya added level:ticket An issue that describes a ticket (initiative>feature>ticket) priority:medium labels Mar 12, 2024
@aatmanvaidya aatmanvaidya linked a pull request Mar 12, 2024 that will close this issue
@tarunima tarunima added level:feature An issue that describes a feature (initiative>feature>ticket) and removed level:ticket An issue that describes a ticket (initiative>feature>ticket) labels Mar 13, 2024
@dennyabrain dennyabrain added level:ticket An issue that describes a ticket (initiative>feature>ticket) and removed level:feature An issue that describes a feature (initiative>feature>ticket) labels Apr 29, 2024
Projects
Status: In progress
Status: No status

3 participants