on the topic of anonymous data collection #18

Open
orenmazor opened this issue Jun 18, 2020 · 10 comments

Comments

@orenmazor

my test suite uses requests_mock to control outgoing requests and this one surprised me when I integrated this library.

>       raise exceptions.NoMockAddress(request)
E       requests_mock.exceptions.NoMockAddress: No mock address: POST https://stats.ponytech.net/new-event

looking in the code:

def _submit_stats(self, event_type):
    """
    this submits anonymous usage statistics to help us better understand how this library is used
    you can opt-out by initializing the client with submit_stats=False
    """
    payload = {
        'project': 'appstoreconnectapi',
        'version': version,
        'type': event_type,
        'parameters': {
            'python_version': platform.python_version(),
            'platform': platform.platform(),
            'issuer_id_hash': hashlib.sha1(self.issuer_id.encode()).hexdigest(),  # send anonymized hash
        }
    }
    if event_type == 'session_end':
        payload['parameters']['endpoints'] = self._call_stats
    requests.post('https://stats.ponytech.net/new-event', json.dumps(payload))

I see I can disable this but you should make this opt in, or at least extremely visible in your README.md. I'm using this library to pull financial reports and if I didn't opt into my usage being tracked, I'd be pretty upset down the line to discover this.

I get that it's anonymous, but I also didn't opt in to it.

@ppawlak
Contributor

ppawlak commented Jun 18, 2020

Hey @orenmazor thank you for raising this subject.

When I implemented this I was really anxious about how people would react to this "feature", as I know many are using this library to deal with sensitive data. I eventually decided to make it opt-out rather than opt-in, as I was afraid of getting no data at all.

I added a note about it in the changelog, but I guess you're right: this is not visible enough, and I'll update the example in the README to show how to turn it off.

I had three main goals for data collection:

  • whether the library is used by many people: to keep me motivated in the development
  • what it is used for: to improve the parts people use the most
  • which Python versions are used: so I don't have to bother supporting older versions of Python

Feel free (you or anyone else reading this) to share your thoughts or best practices on this topic.

@ppawlak
Contributor

ppawlak commented Jun 19, 2020

I have updated the README: 2c06c90

I am leaving the issue open for now to get more feedback.

@orenmazor
Author

@ppawlak I get what you're saying, and this is after all your codebase. I appreciate that you added the notice. We're a little more privacy focused, so I actually forked your library and removed that stats collection.

quick comment on your changes: why do you need the issuer ID? if you truly have to collect it, please hash it with something other than sha1, such as sha-256, as sha1 is no longer considered cryptographically secure.

@ppawlak
Contributor

ppawlak commented Jun 20, 2020

> quick comment on your changes: why do you need the issuer ID? if you truly have to collect it, please hash it with something other than sha1, such as sha-256, as sha1 is no longer considered cryptographically secure.

This is to know how many different organizations use the library.
Thanks for the tip, I'll switch to sha-256 in the next version.
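
For anyone following along, the switch discussed here could look roughly like the sketch below. It only swaps the hash function used in the `_submit_stats` snippet quoted above; the helper name `hash_issuer_id` and the example ID are made up for illustration and are not part of the library.

```python
import hashlib


def hash_issuer_id(issuer_id: str) -> str:
    """Return the SHA-256 hex digest of an issuer ID (hypothetical helper)."""
    return hashlib.sha256(issuer_id.encode("utf-8")).hexdigest()


# Made-up issuer ID, for illustration only.
example_id = "00000000-0000-0000-0000-000000000000"
digest = hash_issuer_id(example_id)

print(len(digest))  # 64 hex characters (256 bits)
print(digest == hash_issuer_id(example_id))  # True: same input, same digest
```

The same ID always produces the same digest, which is what makes counting distinct organizations possible without sending the raw ID.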

@jberkel

jberkel commented Oct 2, 2020

there is a worrying trend of more and more libraries collecting data and/or performing unrequested update checks.

to get an approximation of usage data, wouldn't it be enough to simply analyze PyPI package download stats?

@ppawlak
Contributor

ppawlak commented Oct 3, 2020

> to get an approximation of usage data, wouldn't it be enough to simply analyze PyPI package download stats?

This is indeed what I first looked at.
But I don't think it is very relevant: what if a single organization's CI makes a dozen downloads per day? It also doesn't tell me which Python versions are used or which functions are used the most.

@ironslob

ironslob commented Dec 4, 2020

This is an extremely worrying addition to any project, and certainly doesn't feel good. It certainly wasn't obvious to me that this would be included.

Would you consider making this opt-in? From a community and privacy perspective the "opt-in by default" approach leaves a sour taste in my mouth. Even the GDPR highlighted that this is a poor approach.

I can understand the desire for some analytics, but in this instance ask permission, not forgiveness.
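
To make the request concrete, here is a minimal sketch of what an opt-in default could look like. It is not the library's code: `OptInClient`, `STATS_URL`, and the injectable `send` callable are assumptions made so the example runs without network access. Only the `submit_stats` flag and the `session_end` event type come from the snippet quoted earlier in this thread.

```python
import hashlib
import json
import platform

# Placeholder endpoint for this sketch; not the project's real URL.
STATS_URL = "https://stats.example.invalid/new-event"


class OptInClient:
    """Hypothetical client, not the library's real class."""

    def __init__(self, issuer_id, submit_stats=False, send=None):
        self.issuer_id = issuer_id
        # Off by default: statistics are sent only when the caller asks for it.
        self.submit_stats = submit_stats
        # Injectable transport so the sketch runs without network access.
        self._send = send if send is not None else (lambda url, body: None)

    def _submit_stats(self, event_type):
        """Send one event if the caller opted in; return whether it was sent."""
        if not self.submit_stats:
            return False
        payload = {
            "type": event_type,
            "parameters": {
                "python_version": platform.python_version(),
                "issuer_id_hash": hashlib.sha256(self.issuer_id.encode()).hexdigest(),
            },
        }
        self._send(STATS_URL, json.dumps(payload))
        return True


sent = []

# Default construction: nothing leaves the process.
quiet = OptInClient("example-issuer")
quiet._submit_stats("session_end")

# Explicit opt-in: one event is handed to the transport.
loud = OptInClient("example-issuer", submit_stats=True,
                   send=lambda url, body: sent.append((url, body)))
loud._submit_stats("session_end")
```

With this shape, a test suite that blocks unexpected outgoing requests would see no traffic unless the caller had passed `submit_stats=True`.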

@ppawlak
Contributor

ppawlak commented Dec 4, 2020

@ironslob Thanks for your feedback and I totally understand your point.

Making this opt-in has been mentioned already, but my feeling is that this is quite different from a desktop app, where a popup asks for permission at first launch. I may be wrong, but I think we would just get no analytics at all here.

Anyway, I am now considering removing this "feature" completely; there are too many concerns. I am thinking of doing it for version 1.0, which I hope will happen sooner rather than later.

@MrChadMWood

MrChadMWood commented Oct 27, 2022

@ppawlak

I understand this concern, and your point that a library cannot prompt for permission the way a desktop app does is valid.
Have you considered making dont_collect_stats a required parameter? You could even hardcode a small print() statement on every run if collection is disabled, advising users that they are not supporting the project when they use it this way.
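
A rough sketch of that idea, assuming a keyword-only argument with no default so the caller has to make an explicit choice. The class name `ExplicitChoiceClient`, the `collect_stats` attribute, and the notice text are hypothetical; only the parameter name `dont_collect_stats` comes from the suggestion above.

```python
class ExplicitChoiceClient:
    """Hypothetical client, not the library's real class."""

    def __init__(self, issuer_id, *, dont_collect_stats):
        # No default value: omitting the argument raises TypeError,
        # so every caller decides about statistics explicitly.
        self.issuer_id = issuer_id
        self.collect_stats = not dont_collect_stats
        if dont_collect_stats:
            # Wording of the notice is made up for this sketch.
            print("Note: usage statistics are disabled for this client.")


try:
    ExplicitChoiceClient("example-issuer")
except TypeError as error:
    print("construction failed:", error)

client = ExplicitChoiceClient("example-issuer", dont_collect_stats=True)
```

The trade-off is that this breaks existing callers on upgrade, since every call site has to add the argument.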

@ppawlak
Contributor

ppawlak commented Nov 20, 2022

@MrChadMWood thanks for the suggestion, a required parameter makes sense.
I'll see whether I make this change or completely remove this "feature", as mentioned earlier. I have not been active on the development recently, but I hope to get back to it soon.


5 participants