Enormous amount of Data sent by OSD #201

Open
pmithrandir opened this issue Jul 23, 2024 · 13 comments

@pmithrandir
Collaborator

On my phone, OSD sent about 281 GB of data online in the last month alone.

Looking back over the last 4-5 months, the usage is consistent (between 230 and 360 GB per month).
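
To put those figures in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: it assumes 1 GB = 10^9 bytes, a 30-day month, and traffic spread evenly over the whole period, none of which is stated in the report.

```python
# Back-of-the-envelope check; assumes 1 GB = 10**9 bytes and a 30-day month.
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(total_gb: float, days: float) -> tuple[float, float]:
    """Return (GB per day, kB per second) for a total transferred over `days`."""
    gb_per_day = total_gb / days
    kb_per_second = total_gb * 1e9 / (days * SECONDS_PER_DAY) / 1e3
    return gb_per_day, kb_per_second


gb_day, kb_s = average_rate(281, 30)
print(f"{gb_day:.1f} GB/day, about {kb_s:.0f} kB/s sustained")
# prints: 9.4 GB/day, about 108 kB/s sustained
```

A sustained rate of that size around the clock is far more than a handful of small event records should need, which fits the suspicion of repeated transfers.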

Does anyone else have the same issue?
I bet we have a very big problem here, such as a loop trying to send the same data over and over.

For what it's worth, the data sharing screen is always very slow in my app.

I noticed the issue when I used my 4G modem to share internet over wifi during my holidays. I used almost 80 GB in 10 days, nearly the whole allowance.

Do you have any idea what could be causing this? It might also explain the slowness experienced on the other side.

Regards,
Pierre

@jones139
Member

Thanks for reporting this - that data usage is not right. I had noticed that the server load was high too...
I'll try and have a look tonight.

@pmithrandir
Collaborator Author

Thank you!

Please tell me if I can help debug, test, or fix something in the code.

Pierre

@pmithrandir
Collaborator Author

My first guesses would be:
- we send all events to the server, not just one, each time we qualify an event;
- some sort of retry mechanism is enabled by default between the phone and the server (for example with a short timeout), so the same data gets sent multiple times.

@jones139
Member

As a quick workaround, I have purged the false alarms and unknown events from the online database - I think the API returns all of your data every time the app queries it, which will be quite a lot once you have used it for a while. I am hoping that will improve performance and reduce data usage in the short term.
In the medium term I need to make the API limit the number of events it returns unless you ask for everything (which you will not do through the app). Then I'll trace what the app is doing that might have resulted in so many queries.
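
A minimal sketch of that "limited by default, everything only on request" behaviour, written as plain Python rather than as the actual webApi code. The function name `select_events`, the `timestamp` field and the default of 50 are all hypothetical, chosen only for illustration.

```python
from typing import Optional

DEFAULT_LIMIT = 50  # hypothetical default, not a value taken from the webApi code


def select_events(events: list[dict], limit: Optional[int] = DEFAULT_LIMIT) -> list[dict]:
    """Return the newest `limit` events; pass limit=None to get everything."""
    # Sort newest first so that a truncated result keeps the most recent events.
    newest_first = sorted(events, key=lambda e: e["timestamp"], reverse=True)
    return newest_first if limit is None else newest_first[:limit]
```

With a default like this, an existing client that never passes a limit automatically receives a bounded response, while a bulk export can still opt in to the full history with `limit=None`.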

@jones139
Member

I am coming around to your idea of a few months ago to have a serious look at the API - maybe even using something more like MongoDB rather than a relational database, because the data is actually stored as JSON-style objects like MongoDB uses.

@pmithrandir
Collaborator Author

Do you have a Swagger file or other description of the API available?
I could query it with my account to see what is returned.

It would also help me estimate the amount of work needed on the API, and maybe give you better feedback on the data storage choice.

How much data are we talking about on the SQL server?

Pierre

@jones139
Member

I think the best description is here: https://github.com/OpenSeizureDetector/webApi/blob/master/api/README.md - you should be able to query it with your credentials - I think you use your login details to obtain a token, and pass that to the API to authenticate each query - but it is quite a while since I did it and can't really remember!
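
The token flow described above might look roughly like the sketch below. Everything specific in it is an assumption: the base URL is a placeholder, the `/events/` path is a guess, and the `Token <key>` header scheme is borrowed from the Django REST framework convention rather than confirmed for this API, so check the README before relying on any of it. The request is only built, not sent.

```python
from urllib.request import Request

BASE_URL = "https://example.invalid/api"  # placeholder, not the real server address


def build_events_request(token: str, base_url: str = BASE_URL) -> Request:
    """Build (but do not send) an authenticated GET request for the events list."""
    return Request(
        f"{base_url}/events/",  # hypothetical endpoint path
        headers={"Authorization": f"Token {token}"},  # assumed header scheme
        method="GET",
    )


req = build_events_request("my-token")
print(req.get_method(), req.full_url)
```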

It must have reached about 25 GB before I purged it. The anonymised and grouped version of the database (I group the data into 3 minute intervals, each of which I call a single event) is 730 MB in compressed JSON format.
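
As an illustration of the 3 minute grouping, here is a small sketch that buckets timestamps into fixed 3-minute windows. It is an assumption that the windows are fixed and aligned to the Unix epoch; the real grouping could just as well be based on gaps between datapoints, and the name `group_into_events` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime

INTERVAL_SECONDS = 3 * 60  # the 3 minute interval mentioned above


def group_into_events(timestamps: list[datetime]) -> list[list[datetime]]:
    """Bucket timestamps into fixed 3-minute windows, one list per window."""
    groups: dict[int, list[datetime]] = defaultdict(list)
    for ts in timestamps:
        # Integer division maps every timestamp in the same window to one key.
        bucket = int(ts.timestamp()) // INTERVAL_SECONDS
        groups[bucket].append(ts)
    # Return the windows in chronological order.
    return [groups[key] for key in sorted(groups)]
```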

@pmithrandir
Collaborator Author

maybe you can look here:

You get all events for the current user.

What about retrieving only the events from the last 24 hours (or 48)? I doubt events older than that are still useful.
But even 10 days would be much better than the current unlimited amount.

Even worse, it seems that checkevents also calls getEvents. (I bet this is the one killing the bandwidth.)

If you have a way, I would undelete the false alarms and fix these 2 points, because false alarms and unknown events are still useful data for machine learning to reduce false positives (we can run the algorithm against this data to see whether it would trigger an alarm or not).

@jones139
Member

Thanks - I think you have probably found the issue. I think if I make the default behaviour of the API to return only a limited number of events, I can do that without having to do an Android app update. I then just need to modify the code that I use to make the off-line, anonymised database request everything.

Don't worry - I have not thrown any data away - I maintain an off-line, anonymised version that has everything in it - it is just the online one, which is used by the app, that I have pruned.

(The reason I am thinking of doing the fix on the server side is that it took Google about 10 days to approve my last small change to the app - but I can change the server without having to get it approved)

@pmithrandir
Collaborator Author

pmithrandir commented Jul 25, 2024

Interesting suggestion.

I think you can easily change just one line to achieve that result.
https://github.com/OpenSeizureDetector/webApi/blob/master/api/events/views.py#L31

If you replace the default value None of the start variable with the current date minus 1 day, any query will return only the latest events, unless you explicitly pass an earlier date.
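
A rough sketch of that idea in plain Python, not the actual `views.py` code: the helper names `resolve_start` and `filter_events` and the `timestamp` field are hypothetical. One detail worth noting is that the fallback is computed when the function is called. Writing `start=datetime.now() - timedelta(days=1)` directly in the signature would freeze the cutoff at the moment the server process started.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

DEFAULT_WINDOW = timedelta(days=1)  # "current date minus 1 day"


def resolve_start(start: Optional[datetime] = None,
                  now: Optional[datetime] = None) -> datetime:
    """Return the caller's start date, or now minus one day if none was given."""
    now = now or datetime.now(timezone.utc)
    return start if start is not None else now - DEFAULT_WINDOW


def filter_events(events: list[dict],
                  start: Optional[datetime] = None,
                  now: Optional[datetime] = None) -> list[dict]:
    """Keep only events at or after the resolved start date."""
    cutoff = resolve_start(start, now)
    return [e for e in events if e["timestamp"] >= cutoff]
```

A caller that wants older data, such as an export of the full history, would pass an explicit earlier `start`.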

My concern is that, with so few tests and so little documentation available, I cannot guarantee much about it. And if we are wrong, we could bring down the entire data sharing service, maybe even the app itself.

I still recommend taking some time to better understand exactly what the API does, and migrating it to a more readable, testable and maintainable technology such as FastAPI (very popular, readable, and light on both the server and the code).
On my side, I'll see whether I can find some time over the summer to make it happen and open a PR. (BTW, I don't believe I'm a member of the webApi project.)

@pmithrandir
Collaborator Author

@jones139 any news on your side?

I had to shut down the entire service because of this bug while on vacation. So if my son has a seizure at night, I will only know if I hear it.

As I'm not a member of the project, I can't open a PR to suggest the change as I usually would.

@jones139
Member

No software progress yet, sorry - I am away from home this week so might not get much done. I did prune the database though which I think should have improved it a lot for now.

You can do a PR by cloning the repository if you want to, but I'll see if I can add you to the project on my phone....

Graham

@pmithrandir
Collaborator Author

Hi,

I had trouble connecting to GitHub this summer, so I made no progress on this.

I'll try to resume my work on it soon.

Pierre
