-
Notifications
You must be signed in to change notification settings - Fork 6
Database Queries
A collection of useful or interesting queries for the clearinghouse database.
There is a script to get all kinds of current portal usage stats in the tools directory.
Sample output:
Active Non Tutorial Members
-----------------------------
3484
Active Non Tutorial GPO IdP Members
-------------------------------------
1080
Active Non Tutorial Non GPO IdP Members
-----------------------------------------
2401
Most common member institutions | members
---------------------------------+---------
gpolab.bbn.com | 1454
asu.edu | 192
...
(30 rows)
Most common TLDs | members
------------------+---------
edu | 2712
com | 531
in | 131
...
(25 rows)
Active users in last 4 months
-------------------------------
1293
New Portal users in last 4 months
-----------------------------------
831
Source of new members in last 4 months | members
----------------------------------------+---------
gpolab.bbn.com | 345
nyu.edu | 102
...
(20 rows)
Project Leads
---------------
277
Current Projects
------------------
314
New Projects in last 4 months
-------------------------------
66
Projects with Slices
----------------------
83
Current Slices
----------------
371
New Slices in last 4 months
-----------------------------
6527
Portal Resource Reservations in last 4 months
-----------------------------------------------
9456
Current Portal Slivers
------------------------
2507
Slices with Resources
-----------------------
190
Slices Reserving Resources in the last month
----------------------------------------------
129
Slices using Stitching now
----------------------------
25
Most Active Current Slices | slivers
---------------------------------------------------------------------------+---------
urn:publicid:IDN+ch.geni.net:GeniExperimentEngine+slice+Docker | 460
...
(20 rows)
Aggregates
------------
94
Aggregates with Reservations
------------------------------
54
Most Currently Used Aggregates | slivers
-------------------------------------------------------------+---------
urn:publicid:IDN+utahddc.geniracks.net+authority+cm | 275
...
(20 rows)
Most Active Projects in Last 4 months | Reservations
---------------------------------------+--------------
EL6383-NYU-S2015 | 2925
....
(30 rows)
Most Active Slices in Last 4 months | Reservations
-------------------------------------------------------------------------------------+--------------
urn:publicid:IDN+ch.geni.net:EL6383-NYU-S2015+slice+project-dsb402 | 181
...
(30 rows)
Most Active Members in Last 4 months | Reservations
--------------------------------------+--------------
[email protected] | 251
...
(30 rows)
select count(*) from ma_member;
Sample output:
count
-------
320
(1 row)
select count(*) from ma_member where member_id not in (select member_id from ma_member_attribute where name = 'member_enabled' and value = 'n');
select count(*) from ma_member_attribute
where member_id not in (select member_id from ma_member_attribute where name = 'email_address' and value like '%gpolab.bbn.com')
and name='username'
and member_id not in (select member_id from ma_member_attribute where name = 'member_enabled' and value = 'n');
select count(*) from ma_member_attribute where name = 'eppn' and value like '%gpolab%' and member_id not in (select member_id from ma_member_attribute where name = 'member_enabled' and value = 'n');
Sample output:
count
-------
112
(1 row)
select count(*) from ma_member_attribute where name = 'eppn' and value not like '%gpolab%' and member_id not in (select member_id from ma_member_attribute where name = 'member_enabled' and value = 'n');
Sample output:
count
-------
208
(1 row)
Count members by their eppn domain. The eppn domain is unique per InCommon identity provider. This query effectively counts users by institution.
select trim(leading '@' from substring(value from '@.*$')) as institution, count(*) from ma_member_attribute where name = 'eppn' group by institution order by count desc;
Sample output:
institution | count
----------------+-------
gpolab.bbn.com | 110
clemson.edu | 31
unc.edu | 18
asu.edu | 17
iu.edu | 12
[more elided]
select distinct substring(lower(value) from '%.#"_+#"' for '#') as domain, count(*) from ma_member_attribute where name = 'email_address' group by substring(lower(value) from '%.#"_+#"' for '#') order by count(*) desc;
domain | count
--------+-------
edu | 2459
com | 499
gr | 112
in | 63
br | 48
ca | 46
it | 35
net | 16
nl | 15
org | 12
de | 11
jp | 11
au | 10
kr | 10
...
select count(distinct member_id) from last_seen where ts > '2014-05-01';
select trim(leading '@' from substring(value from '@.*$')) as institution, count(*) from ma_member_attribute where name = 'eppn' and member_id in (select uuid(a.attribute_value) from logging_entry_attribute a, logging_entry l where l.id = a.event_id and l.message like 'Activated GENI user%' and l.event_time > '2015-02-01') group by institution order by count desc;
select count(*) from pa_project where expired != 't';
select count(*) from pa_project where expired != 't' and creation > '2015-01-01';
select count(*) from sa_slice where expired != 't';
select count(*) from sa_slice where creation > '2015-01-01';
select count(*) from logging_entry where message like 'Add resource request%' and event_time > '2015-01-01';
*See better query above'''
select count(distinct member_id) from ma_member_attribute where member_id not in (select distinct member_id from ma_member_attribute where name = 'last_name' and value like '%Tutorial%') and member_id not in (select member_id from ma_member_attribute where name = 'member_enabled' and value = 'n');
select member_id from ma_member_attribute where name = 'email_address' and value like '%ahelsing%';
select m2.value from ma_member_attribute m1, ma_member_attribute m2 where m1.name = 'email_address' and m1.value like '%ahelsing%' and m1.member_id = m2.member_id and m2.name = 'username';
select m2.name, m2.value from ma_member_attribute m1, ma_member_attribute m2 where m1.name = 'email_address' and m1.value like '%ahelsing%' and m1.member_id = m2.member_id;
select count(*) from logging_entry where message like 'Activated GENI user%' and event_time < '2014-03-01';
'_FIXME*: This may depend on pre and post chapi? Old query was:
select count(*) from logging_entry where message like '%authorizing client%' and event_time < '2013-07-20';
Here's a query to list all the projects a person or set of people are in. Just edit the line that says 'gec17u%' for the username you want. Or change 'username' to 'email_address' if that is what you have.
select cs_attribute.name as role, pa_project.project_name,
ma_member_attribute.value as username
from pa_project, pa_project_member, ma_member_attribute, cs_attribute
where pa_project.project_id = pa_project_member.project_id
and pa_project_member.member_id = ma_member_attribute.member_id
and ma_member_attribute.name = 'username'
and ma_member_attribute.value like 'gec17u%'
and cs_attribute.id = pa_project_member.role;
Here's a variation if what you want is a list of the people in a project with their roles:
select cs_attribute.name as role, pa_project.project_name,
ma_member_attribute.value as username
from pa_project, pa_project_member, ma_member_attribute, cs_attribute
where pa_project.project_id = pa_project_member.project_id
and pa_project_member.member_id = ma_member_attribute.member_id
and lower(pa_project.project_name) like 'someproject%'
and ma_member_attribute.name = 'username'
and cs_attribute.id = pa_project_member.role;
select distinct value from ma_member_attribute
where name = 'email_address'
and value not like '%gpolab.bbn.com'
and member_id not in
(select member_id from ma_member_attribute
where name = 'member_enabled'
and value = 'n');
Or to write this to a file do the following from a terminal:
echo "COPY (select distinct value from ma_member_attribute where name='email_address' and value not like '%gpolab.bbn.com' and member_id not in (select member_id from ma_member_attribute where name = 'member_enabled' and value = 'n')) TO STDOUT;" | psql -U portal -h localhost portal > ~/all_portal_email.csv
A more complex query that adds a name: first name if we have it, else display name, else the email address:
select distinct ma.value as email, coalesce(mb.value, coalesce(mc.value, ma.value)) as name from ma_member_attribute ma left outer join ma_member_attribute mb on ma.member_id=mb.member_id and mb.name = 'first_name' left outer join ma_member_attribute mc on ma.member_id=mc.member_id and mc.name = 'displayName' where ma.name='email_address' and ma.value not like '%gpolab.bbn.com' and ma.member_id not in (select member_id from ma_member_attribute where name = 'member_enabled' and value = 'n');
- Issue the following query on the ch database:
\copy (select event_time::timestamp::date date, count(*) from logging_entry where message like '%Activated%' group by date order by date) To '/tmp/users.csv' with CSV
- Your data is now in
/tmp/users.csv
- Import this data into excel
- Create a new column to add the current row total to the previous total
- Chart the data as a line chart
Names and emails of project leads:
sudo -u www-data /usr/local/bin/geni-ops-report -u portal -P /usr/sysadmin/etc/portal_password -A /dev/null
A database query that is close, but leaves a few people off:
echo "COPY (select distinct ma.value as email, coalesce(mc.value, coalesce(mb.value, ma.value)) as name from ma_member_privilege priv, ma_member_attribute ma left outer join ma_member_attribute mb on ma.member_id=mb.member_id and mb.name = 'first_name' left outer join ma_member_attribute mc on ma.member_id=mc.member_id and mc.name = 'displayName' where ma.name='email_address' and ma.value not like '%gpolab.bbn.com' and priv.member_id=ma.member_id and priv.privilege_id=1 order by name) TO STDOUT with CSV HEADER;" | psql -U portal -h localhost portal > project_lead.csv
select m.* from ma_member_attribute m where m.member_id in (select member_id from ma_member_attribute where name = 'email_address' and value like '%umsl.edu') and m.member_id in (select member_id from ma_member_attribute where name = 'PROJECT_LEAD');
All domain names in eppns for non-GPO identity providers:
select trim(leading '@' from substring(value from '@.*$')) as institution, count(*)
into institutions
from ma_member_attribute
where name = 'eppn'
and value not like '%gpolab.bbn.com'
group by institution
order by count desc;
All domain names in email addresses for the GPO lab identity provider:
select trim(leading '@' from substring(value from '@.*$')) as institution, count(*)
into institutions2
from ma_member_attribute
where name = 'email_address'
and member_id in (select member_id from ma_member_attribute where name = 'eppn' and value like '%gpolab.bbn.com')
and value not like '%gpolab.bbn.com'
group by institution
order by count desc;
Export to files (note no semicolon at the end):
\copy (select * from institutions) to '/tmp/institutions.csv' with csv
\copy (select * from institutions2) to '/tmp/institutions2.csv' with csv
select p.requestor as member_id, m.value as requestor, count(distinct p.context_id) as requests from pa_project_member_request p, ma_member_attribute m where m.member_id = p.requestor and m.name = 'username' and p.creation_timestamp > '2014-02-01' group by p.requestor, m.value having count(distinct p.context_id) > 2 order by count(distinct p.context_id) desc;
Here's a follow-on query to that one, showing the subset with queries either open or approved:
select p.requestor as member_id, m.value as requestor, count(distinct p.context_id) as requests from pa_project_member_request p, ma_member_attribute m where m.member_id = p.requestor and m.name = 'username' and p.creation_timestamp > '2014-02-01' and (p.status in (0,1) or p.resolution_timestamp is null) group by p.requestor, m.value having count(distinct p.context_id) > 2 order by count(distinct p.context_id) desc;
select p.requestor as member_id, m.value as requestor, count(*) as requests from pa_project_member_request p, ma_member_attribute m where m.member_id = p.requestor and m.name = 'username' group by p.requestor, m.value having count(*) > 5 order by count(*) desc;
To reject a bunch of requests, do something like this:
update pa_project_member_request set status=3, resolution_timestamp=now(), resolver='9ff4d1e4-8587-4a71-8e30-3af3b2136582' where requestor = 'd9e1ae60-be58-45ff-810e-9f385c9651be' and resolution_timestamp is null;
(Filling in your own member id as the resolver and the doorknob rattler's ID as the requestor.)
To find old outstanding requests (edit the date to be, say, 60 days ago):
select requestor, context_id, creation_timestamp, status, resolution_timestamp from pa_project_member_request where (resolution_timestamp is null or status = 0) and creation_timestamp < '2014-02-14' order by creation_timestamp asc;
To cancel such old requests:
update pa_project_member_request set status=2, resolution_timestamp=now(), resolver='9ff4d1e4-8587-4a71-8e30-3af3b2136582', resolution_description='Request too old' where (resolution_timestamp is null or status = 0) and creation_timestamp < '2014-02-14';
Projects that have not had an active slice or logged any events since February 1:
select project_name, project_id from pa_project where expired = 'f' and project_id not in (select project_id from sa_slice where expired='f' or expiration > '2014-02-01') and cast (project_id as char) not in (select a.attribute_value from logging_entry_attribute a, logging_entry l where a.event_id = l.id and l.event_time > '2014-02-01' and a.attribute_name = 'PROJECT' group by a.attribute_value order by attribute_value);
List the users whose outside certificates will expire before a given date:
select mma1.value as "Email address", mma2.value as "EPPN", mma3.value as "First name",
mma4.value as "Last name", mma5.value as "Full name",
ls.ts::timestamp::date
from ma_member_attribute mma1, ma_member_attribute mma2, last_seen ls
left join ma_member_attribute mma3
on ls.member_id = mma3.member_id and mma3.name = 'first_name'
left join ma_member_attribute mma4
on ls.member_id = mma4.member_id and mma4.name = 'last_name'
left join ma_member_attribute mma5
on ls.member_id = mma5.member_id and mma5.name = 'displayName'
where ls.member_id in (select member_id from ma_outside_cert where expiration < '2014-06-25')
and ls.member_id not in (select member_id from ma_member_attribute where name = 'email_address' and value like '%gpolab.bbn.com')
and ls.member_id not in (select member_id from ma_member_attribute where name = 'member_enabled' and value = 'n')
and mma1.member_id = ls.member_id and mma1.name = 'email_address'
and mma2.member_id = ls.member_id and mma2.name = 'eppn'
order by ls.ts desc;
The same, but exported to CSV:
create temporary view gec20 as
select mma1.value as "Email address", mma2.value as "EPPN", mma3.value as "First name",
mma4.value as "Last name", mma5.value as "Full name",
ls.ts::timestamp::date as "Last Seen"
from ma_member_attribute mma1, ma_member_attribute mma2, last_seen ls
left join ma_member_attribute mma3
on ls.member_id = mma3.member_id and mma3.name = 'first_name'
left join ma_member_attribute mma4
on ls.member_id = mma4.member_id and mma4.name = 'last_name'
left join ma_member_attribute mma5
on ls.member_id = mma5.member_id and mma5.name = 'displayName'
where ls.member_id in (select member_id from ma_outside_cert where expiration < '2014-06-25')
and ls.member_id not in (select member_id from ma_member_attribute where name = 'email_address' and value like '%gpolab.bbn.com')
and ls.member_id not in (select member_id from ma_member_attribute where name = 'member_enabled' and value = 'n')
and mma1.member_id = ls.member_id and mma1.name = 'email_address'
and mma2.member_id = ls.member_id and mma2.name = 'eppn'
order by ls.ts desc;
\copy (select * from gec20) to '/tmp/expiring-gec-20.csv' with csv header
A bit of SQL to get a pretty name for all members:
select case when m3.value is not null then m3.value else case when m2.value is not null and m3.value is not null then m2.value || ' ' || m3.value else m4.value end end \
from ma_member_attribute m4 \
left outer join ma_member_attribute m1 on (m4.member_id = m1.member_id and m1.name = 'first_name') \
left outer join ma_member_attribute m2 on (m4.member_id = m2.member_id and m2.name = 'last_name') \
left outer join ma_member_attribute m3 on (m4.member_id = m3.member_id and m3.name = 'displayName') \
where m4.name = 'email_address';
Get all projects with their purpose, expiration, and lead (pretty name and email):
select p.project_name, p.creation, p.expiration, p.project_purpose, \
case when m3.value is not null then m3.value \
else case when m2.value is not null and m3.value is not null \
then m2.value || ' ' || m3.value \
else m4.value end end as lead, \
m4.value as lead_email \
from pa_project p join ma_member_attribute m4 on p.lead_id = m4.member_id \
left outer join ma_member_attribute m1 on (m4.member_id = m1.member_id and m1.name = 'first_name') \
left outer join ma_member_attribute m2 on (m4.member_id = m2.member_id and m2.name = 'last_name') \
left outer join ma_member_attribute m3 on (m4.member_id = m3.member_id and m3.name = 'displayName') \
where m4.name = 'email_address' \
order by p.creation desc;
Get the Admin and Member or Auditor count for projects:
select p.project_name, count(distinct ppm1.member_id) as "# Admins", count(distinct ppm2.member_id) as "# Members or Auditors" \
from pa_project p \
left join pa_project_member ppm1 on (p.project_id = ppm1.project_id and ppm1.role = 2) \
left join pa_project_member ppm2 on (p.project_id = ppm2.project_id and ppm2.role in (3,4))
group by p.project_name, p.creation \
order by p.creation desc;
Or combining those two:
select p.project_name, p.creation, p.expiration, p.project_purpose,
case when m3.value is not null then m3.value else case when m2.value is not null and m3.value is not null then m2.value || ' ' || m3.value else m4.value end end as lead,
m4.value as lead_email, count(distinct ppm1.member_id) as "# Admins", count(distinct ppm2.member_id) as "# Members or Auditors"
from pa_project p join ma_member_attribute m4 on p.lead_id = m4.member_id
left outer join ma_member_attribute m1 on (m4.member_id = m1.member_id and m1.name = 'first_name')
left outer join ma_member_attribute m2 on (m4.member_id = m2.member_id and m2.name = 'last_name')
left outer join ma_member_attribute m3 on (m4.member_id = m3.member_id and m3.name = 'displayName')
left join pa_project_member ppm1 on (p.project_id = ppm1.project_id and ppm1.role = 2)
left join pa_project_member ppm2 on (p.project_id = ppm2.project_id and ppm2.role in (3,4))
where m4.name = 'email_address'
group by p.project_name, p.creation, p.expiration, p.project_purpose,
case when m3.value is not null then m3.value else case when m2.value is not null and m3.value is not null then m2.value || ' ' || m3.value else m4.value end end,
m4.value
order by p.creation desc;
Number of attempts to reserve resources with a multi-AM topology:
select count(*) from logging_entry where event_time > '2015-08-01' and event_time < '2015-09-01' and message like '%stitching R%';
Projects with no active slices showing most recent activity in the log table, oldest (least used) first:
select distinct p.project_name, m.value as lead_email, max(l.event_time) as last_used from pa_project p, ma_member_attribute m, logging_entry l, logging_entry_attribute l2 where p.project_id = l2.attribute_value::uuid and p.lead_id = m.member_id and m.name = 'email_address' and l2.attribute_name = 'PROJECT' and l.id = l2.event_id and p.expired = 'f' and p.project_id not in (select project_id from sa_slice where expired = 'f') group by p.project_name, m.value order by max(l.event_time) asc;
select substring(value from '@(.*)$'), count(*)
from ma_member_attribute
where name = 'email_address' and
member_id not in (select member_id
from ma_member_attribute
where name = 'member_enabled' and
value = 'n')
group by substring(value from '@(.*)$')
order by count(*) desc;