1.5 Tracking API usage and other usage like disk space [Optional] #12

Open
rossjones opened this issue Dec 10, 2014 · 5 comments

@rossjones

This is useful to the Cloud Admin, since measuring usage across different instances may become important for performance management.

Track API usage and other usage such as disk space, and display it in the admin interface.
@rossjones
Author

Tracking of API usage would be important, but I think the monitoring of other aspects is already well handled by other tools built specifically for the purpose.

Perhaps extending CKAN in a way that some stats can be more easily captured by one of these systems would be a better general approach?

@jqnatividad
Contributor

Maybe the project can make https://github.com/ckan/ckanext-googleanalytics more robust. Again, this is consistent with the Unix toolchain approach of leveraging best-of-breed tools like Google Analytics.

Though GA is "free as in beer" and not "free as in speech" open source software, it has become the de facto standard.

With that said, the team should consider exposing the webserver logs of a CKAN instance as a dataset. In an SBIR study we did earlier this year about open data, we found that there is no simple way to measure the downstream usage of a dataset, which is a big signal that both data publishers and advocates need in order to prioritize data.

The existing reports are simply too coarse (only total aggregate views, downloads; no way to filter geotemporally). Of course, there should be some mechanism to control who has access to the webserver logs dataset. And better aggregated reports, much better than the existing ones, can be created and exposed to the general public.

From the webserver log dataset, you can even track if businesses, citizens, apps, other agencies are using the data. It can even be used to find downstream data users and automagically catalog them in CKAN's related items tab (e.g. visualizations/PDFs/sites using the https://github.com/BetaNYC/getDataButton, etc.)
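To make the finer-grained reporting idea concrete, here is a minimal Python sketch that counts resource downloads per dataset per day from access log lines. It assumes the Apache combined log format and CKAN-style download URLs of the form `/dataset/<name>/resource/<id>/download/...`. The name `downloads_per_day` is hypothetical and not part of CKAN.

```python
import re
from collections import Counter
from datetime import datetime

# Assumption: combined log format, e.g.
# 203.0.113.0 - - [16/Dec/2014:10:00:00 +0000] "GET /dataset/x/resource/y/download/f.csv HTTP/1.1" 200 512
LINE_RE = re.compile(
    r'\[(?P<ts>[^\]]+)\] '
    r'"GET /dataset/(?P<dataset>[^/ ]+)/resource/[^/ ]+/download\S* HTTP/[\d.]+" '
    r'(?P<status>\d{3})'
)

def downloads_per_day(lines):
    """Count successful (HTTP 200) resource downloads per (dataset, day)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if not m or m.group("status") != "200":
            continue  # skip non-download requests and failed downloads
        day = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z").date()
        counts[(m.group("dataset"), day.isoformat())] += 1
    return counts
```

The same grouping could be extended with a coarse location field if the log dataset keeps one, which is where the access-control concern above comes in.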

@waldoj
Member

waldoj commented Dec 16, 2014

> the team should consider exposing the webserver logs of a CKAN instance as a dataset

Huh. That's both clever and simple, my favorite combination of traits in an idea. :) Adding a new log config line to Apache could output a properly anonymized access log directly in the webroot. I like it!
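As a rough illustration of the anonymization step, here is a minimal Python sketch that post-processes log lines instead of relying on an Apache log directive. It assumes the Apache combined log format, where the first three fields are client address, identity, and user. The names `anonymize_ip` and `anonymize_line` are hypothetical, and truncating the address is only one possible anonymization choice.

```python
import ipaddress
from typing import Optional

def anonymize_ip(addr: str) -> str:
    """Zero the low bits of an address: keep a /24 for IPv4, a /48 for IPv6."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return "-"  # not an IP literal (e.g. a resolved hostname): drop it
    prefix = 24 if ip.version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)

def anonymize_line(line: str) -> Optional[str]:
    """Truncate the client address and blank the identity and user fields.

    Returns None for lines that do not have the expected leading fields,
    so that unparsed lines are never published as-is.
    """
    parts = line.split(" ", 3)
    if len(parts) < 4:
        return None
    host, _ident, _user, rest = parts
    return f"{anonymize_ip(host)} - - {rest}"
```

Other fields, such as the referrer or query strings, may also carry personal data and would need their own review before a log is published.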

@wardi

wardi commented Mar 21, 2015

ckan-multisite has all requests going through a single HTTP router, so the access logs for all the sites can be aggregated or reported on really easily. I've opened a ticket to revisit this when we have some code to show: datacats/ckan-multisite#4
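For the API usage part of this issue, aggregation over a shared router log could look roughly like the Python sketch below. It assumes each log line starts with the site's hostname followed by combined-format fields, which is an assumption about the router's log format and not something ckan-multisite is known to emit. It also assumes CKAN action API paths of the form `/api/3/action/<name>` or `/api/action/<name>`. The name `api_calls_per_site` is hypothetical.

```python
import re
from collections import Counter

# Assumption: "<site> <client> - - [<time>] "<METHOD> <path> HTTP/x" <status> <bytes>"
API_RE = re.compile(
    r'^(?P<site>\S+) .*?"(?:GET|POST) /api/(?:\d+/)?action/(?P<action>\w+)'
)

def api_calls_per_site(lines):
    """Count action API requests per (site, action) pair."""
    counts = Counter()
    for line in lines:
        m = API_RE.match(line)
        if m:
            counts[(m.group("site"), m.group("action"))] += 1
    return counts
```

Summing the counts by site alone would give the per-instance totals a Cloud Admin might want in an admin view.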

@jqnatividad
Contributor

Great! We may also want to look at 18F's http://analytics.usa.gov for inspiration. Since we have full access logs, it doesn't directly apply, but once some instances "graduate" to their own dedicated open data installations, it may still be a way to aggregate high-level analytics.
