Skip to content

Database Queries

Tom Mitchell edited this page Mar 24, 2018 · 6 revisions

A collection of useful or interesting queries for the clearinghouse database.

Current Portal Usage

There is a script to get all kinds of current portal usage stats in the tools directory.

Sample output:

 Active Non Tutorial Members 
-----------------------------
                        3484

 Active Non Tutorial GPO IdP Members 
-------------------------------------
                                1080

 Active Non Tutorial Non GPO IdP Members 
-----------------------------------------
                                    2401

 Most common member institutions | members 
---------------------------------+---------
 gpolab.bbn.com                  |    1454
 asu.edu                         |     192
...
(30 rows)

 Most common TLDs | members 
------------------+---------
 edu              |    2712
 com              |     531
 in               |     131
...
(25 rows)

 Active users in last 4 months 
-------------------------------
                          1293

 New Portal users in last 4 months 
-----------------------------------
                               831

 Source of new members in last 4 months | members 
----------------------------------------+---------
 gpolab.bbn.com                         |     345
 nyu.edu                                |     102
...
(20 rows)

 Project Leads 
---------------
           277

 Current Projects 
------------------
              314

 New Projects in last 4 months 
-------------------------------
                            66

 Projects with Slices 
----------------------
                   83

 Current Slices 
----------------
            371

 New Slices in last 4 months 
-----------------------------
                        6527

 Portal Resource Reservations in last 4 months 
-----------------------------------------------
                                          9456

 Current Portal Slivers 
------------------------
                   2507

 Slices with Resources 
-----------------------
                   190

 Slices Reserving Resources in the last month 
----------------------------------------------
                                          129

 Slices using Stitching now 
----------------------------
                         25

                        Most Active Current Slices                         | slivers 
---------------------------------------------------------------------------+---------
 urn:publicid:IDN+ch.geni.net:GeniExperimentEngine+slice+Docker            |     460
...
(20 rows)

 Aggregates 
------------
         94

 Aggregates with Reservations 
------------------------------
                           54

               Most Currently Used Aggregates                | slivers 
-------------------------------------------------------------+---------
 urn:publicid:IDN+utahddc.geniracks.net+authority+cm         |     275
...
(20 rows)

 Most Active Projects in Last 4 months | Reservations 
---------------------------------------+--------------
 EL6383-NYU-S2015                      |         2925
....
(30 rows)

                         Most Active Slices in Last 4 months                         | Reservations 
-------------------------------------------------------------------------------------+--------------
 urn:publicid:IDN+ch.geni.net:EL6383-NYU-S2015+slice+project-dsb402                  |          181
...
(30 rows)

 Most Active Members in Last 4 months | Reservations 
--------------------------------------+--------------
 [email protected]                       |          251
...
(30 rows)

Count members

select count(*) from ma_member;

Sample output:

 count 
-------
   320
(1 row)

Count active members

select count(*) from ma_member where member_id not in (select member_id from ma_member_attribute where name = 'member_enabled' and value = 'n');

Active members without a GPO Lab email address (not tutorial)

select count(*) from ma_member_attribute
  where member_id not in (select member_id from ma_member_attribute where name = 'email_address' and value like '%gpolab.bbn.com')
  and name='username'
  and member_id not in (select member_id from ma_member_attribute where name = 'member_enabled' and value = 'n');

Count active members from GPO Lab IdP

select count(*) from ma_member_attribute where name = 'eppn' and value like '%gpolab%' and member_id not in (select member_id from ma_member_attribute where name = 'member_enabled' and value = 'n');

Sample output:

 count 
-------
   112
(1 row)

Count members ''not'' from GPO Lab IdP

select count(*) from ma_member_attribute where name = 'eppn' and value not like '%gpolab%' and member_id not in (select member_id from ma_member_attribute where name = 'member_enabled' and value = 'n');

Sample output:

 count 
-------
   208
(1 row)

Count members by eppn domain

Count members by their eppn domain. The eppn domain is unique per InCommon identity provider. This query effectively counts users by institution.

select trim(leading '@' from substring(value from '@.*$')) as institution, count(*) from ma_member_attribute where name = 'eppn' group by institution order by count desc;

Sample output:

  institution   | count 
----------------+-------
 gpolab.bbn.com |   110
 clemson.edu    |    31
 unc.edu        |    18
 asu.edu        |    17
 iu.edu         |    12

[more elided]

List Countries / top level domains

select distinct substring(lower(value) from '%.#"_+#"' for '#') as domain, count(*) from ma_member_attribute where name = 'email_address' group by substring(lower(value) from '%.#"_+#"' for '#') order by count(*) desc;
 domain | count
--------+-------
 edu    |  2459
 com    |   499
 gr     |   112
 in     |    63
 br     |    48
 ca     |    46
 it     |    35
 net    |    16
 nl     |    15
 org    |    12
 de     |    11
 jp     |    11
 au     |    10
 kr     |    10
...

Portal Users active since...

select count(distinct member_id) from last_seen where ts > '2014-05-01';

Source of new Portal Users

select trim(leading '@' from substring(value from '@.*$')) as institution, count(*) from ma_member_attribute where name = 'eppn' and member_id in (select uuid(a.attribute_value) from logging_entry_attribute a, logging_entry l where l.id = a.event_id and l.message like 'Activated GENI user%' and l.event_time > '2015-02-01') group by institution order by count desc;

Portal Usage Stats

select count(*) from pa_project where expired != 't';
select count(*) from pa_project where expired != 't' and creation > '2015-01-01';
select count(*) from sa_slice where expired != 't';
select count(*) from sa_slice where creation > '2015-01-01';
select count(*) from logging_entry where message like 'Add resource request%' and event_time > '2015-01-01';

Portal Users excluding most tutorial accounts and disabled users

*See better query above'''

select count(distinct member_id) from ma_member_attribute where member_id not in (select distinct member_id from ma_member_attribute where name = 'last_name' and value like '%Tutorial%') and member_id not in (select member_id from ma_member_attribute where name = 'member_enabled' and value = 'n');

Member ID of someone given their email address

select member_id from ma_member_attribute where name = 'email_address' and value like '%ahelsing%';

Username of someone given their email address

select m2.value from ma_member_attribute m1, ma_member_attribute m2 where m1.name = 'email_address' and m1.value like '%ahelsing%' and m1.member_id = m2.member_id and m2.name = 'username';

All attributes of someone given their email address

select m2.name, m2.value from ma_member_attribute m1, ma_member_attribute m2 where m1.name = 'email_address' and m1.value like '%ahelsing%' and m1.member_id = m2.member_id;

People who activated the portal for their account before a given date

select count(*) from logging_entry where message like 'Activated GENI user%' and event_time < '2014-03-01';

'_FIXME*: This may depend on pre and post chapi? Old query was:

select count(*) from logging_entry where message like '%authorizing client%' and event_time < '2013-07-20';

Getting information about people in projects

Here's a query to list all the projects a person or set of people are in. Just edit the line that says 'gec17u%' for the username you want. Or change 'username' to 'email_address' if that is what you have.

select cs_attribute.name as role, pa_project.project_name,
ma_member_attribute.value as username
from pa_project, pa_project_member, ma_member_attribute, cs_attribute
where pa_project.project_id = pa_project_member.project_id
and pa_project_member.member_id = ma_member_attribute.member_id
and ma_member_attribute.name = 'username'
and ma_member_attribute.value like 'gec17u%'
and cs_attribute.id = pa_project_member.role;

Here's a variation if what you want is a list of the people in a project with their roles:

select cs_attribute.name as role, pa_project.project_name,
ma_member_attribute.value as username
from pa_project, pa_project_member, ma_member_attribute, cs_attribute
where pa_project.project_id = pa_project_member.project_id
and pa_project_member.member_id = ma_member_attribute.member_id
and lower(pa_project.project_name) like 'someproject%'
and ma_member_attribute.name = 'username'
and cs_attribute.id = pa_project_member.role;

Email Portal Users

select distinct value from ma_member_attribute
  where name = 'email_address'
  and value not like '%gpolab.bbn.com'
  and member_id not in
    (select member_id from ma_member_attribute
      where name = 'member_enabled'
      and value = 'n');

Or to write this to a file do the following from a terminal:

echo "COPY (select distinct value from ma_member_attribute where name='email_address' and value not like '%gpolab.bbn.com' and member_id not in (select member_id from ma_member_attribute where name = 'member_enabled' and value = 'n')) TO STDOUT;" | psql -U portal -h localhost portal > ~/all_portal_email.csv

A more complex query that adds a name: first name if we have it, else display name, else the email address:

select distinct ma.value as email, coalesce(mb.value, coalesce(mc.value, ma.value)) as name from ma_member_attribute ma left outer join ma_member_attribute mb on ma.member_id=mb.member_id and mb.name = 'first_name' left outer join ma_member_attribute mc on ma.member_id=mc.member_id and mc.name = 'displayName' where ma.name='email_address' and ma.value not like '%gpolab.bbn.com' and ma.member_id not in (select member_id from ma_member_attribute where name = 'member_enabled' and value = 'n');

Graph portal user count by date

  1. Issue the following query on the ch database:
\copy (select event_time::timestamp::date date, count(*) from logging_entry where message like '%Activated%' group by date order by date) To '/tmp/users.csv' with CSV
  1. Your data is now in /tmp/users.csv
  2. Import this data into excel
  3. Create a new column to add the current row total to the previous total
  4. Chart the data as a line chart

Project Leads

Names and emails of project leads:

sudo -u www-data /usr/local/bin/geni-ops-report -u portal -P /usr/sysadmin/etc/portal_password -A /dev/null

A database query that is close, but leaves a few people off:

echo "COPY (select distinct ma.value as email, coalesce(mc.value, coalesce(mb.value, ma.value)) as name from ma_member_privilege priv,  ma_member_attribute ma left outer join ma_member_attribute mb on ma.member_id=mb.member_id and mb.name = 'first_name' left outer join ma_member_attribute mc on ma.member_id=mc.member_id and mc.name = 'displayName' where ma.name='email_address' and ma.value not like '%gpolab.bbn.com' and priv.member_id=ma.member_id and priv.privilege_id=1 order by name) TO STDOUT with CSV HEADER;" | psql -U portal -h localhost portal > project_lead.csv

Are there any leads from a given school (by email)

select m.* from ma_member_attribute m where m.member_id in (select member_id from ma_member_attribute where name = 'email_address' and value like '%umsl.edu') and m.member_id in (select member_id from ma_member_attribute where name = 'PROJECT_LEAD');

All institutions

All domain names in eppns for non-GPO identity providers:

select trim(leading '@' from substring(value from '@.*$')) as institution, count(*)
  into institutions
  from ma_member_attribute
  where name = 'eppn'
    and value not like '%gpolab.bbn.com'
  group by institution
  order by count desc;

All domain names in email addresses for the GPO lab identity provider:

select trim(leading '@' from substring(value from '@.*$')) as institution, count(*)
  into institutions2
  from ma_member_attribute
  where name = 'email_address'
    and member_id in (select member_id from ma_member_attribute where name = 'eppn' and value like '%gpolab.bbn.com')
    and value not like '%gpolab.bbn.com'
  group by institution
  order by count desc;

Export to files (note no semicolon at the end):

\copy (select * from institutions) to '/tmp/institutions.csv' with csv
\copy (select * from institutions2) to '/tmp/institutions2.csv' with csv

Find potential doorknob rattlers

People who requested to join more than 2 projects since a date

select p.requestor as member_id, m.value as requestor, count(distinct p.context_id) as requests from pa_project_member_request p, ma_member_attribute m where m.member_id = p.requestor and m.name = 'username' and p.creation_timestamp > '2014-02-01' group by p.requestor, m.value having count(distinct p.context_id) > 2 order by count(distinct p.context_id) desc;

Here's a follow-on query to that one, showing the subset with queries either open or approved:

select p.requestor as member_id, m.value as requestor, count(distinct p.context_id) as requests from pa_project_member_request p, ma_member_attribute m where m.member_id = p.requestor and m.name = 'username' and p.creation_timestamp > '2014-02-01' and (p.status in (0,1) or p.resolution_timestamp is null) group by p.requestor, m.value having count(distinct p.context_id) > 2 order by count(distinct p.context_id) desc;

People who made lots of requests

select p.requestor as member_id, m.value as requestor, count(*) as requests from pa_project_member_request p, ma_member_attribute m where m.member_id = p.requestor and m.name = 'username' group by p.requestor, m.value having count(*) > 5 order by count(*) desc;

To reject a bunch of requests, do something like this:

update pa_project_member_request set status=3, resolution_timestamp=now(), resolver='9ff4d1e4-8587-4a71-8e30-3af3b2136582' where requestor = 'd9e1ae60-be58-45ff-810e-9f385c9651be' and resolution_timestamp is null;

(Filling in your own member id as the resolver and the doorknob rattler's ID as the requestor.)

To find old outstanding requests (edit the date to be, say, 60 days ago):

select requestor, context_id, creation_timestamp, status, resolution_timestamp from pa_project_member_request where (resolution_timestamp is null or status = 0) and creation_timestamp < '2014-02-14' order by creation_timestamp asc;

To cancel such old requests:

update pa_project_member_request set status=2, resolution_timestamp=now(), resolver='9ff4d1e4-8587-4a71-8e30-3af3b2136582', resolution_description='Request too old' where (resolution_timestamp is null or status = 0) and creation_timestamp < '2014-02-14';

Inactive Projects

Projects that have not had an active slice or logged any events since February 1:

select project_name, project_id from pa_project where expired = 'f' and project_id not in (select project_id from sa_slice where expired='f' or expiration > '2014-02-01') and cast (project_id as char) not in (select a.attribute_value from logging_entry_attribute a, logging_entry l where a.event_id = l.id and l.event_time > '2014-02-01' and a.attribute_name = 'PROJECT' group by a.attribute_value order by attribute_value);

Expiring Outside Certificates

List the users whose outside certificates will expire before a given date:

select mma1.value as "Email address", mma2.value as "EPPN", mma3.value as "First name",
    mma4.value as "Last name", mma5.value as "Full name",
    ls.ts::timestamp::date
  from ma_member_attribute mma1, ma_member_attribute mma2, last_seen ls
  left join ma_member_attribute mma3
    on ls.member_id = mma3.member_id and mma3.name = 'first_name'
  left join ma_member_attribute mma4
    on ls.member_id = mma4.member_id and mma4.name = 'last_name'
  left join ma_member_attribute mma5
    on ls.member_id = mma5.member_id and mma5.name = 'displayName'
  where ls.member_id in (select member_id from ma_outside_cert where expiration < '2014-06-25')
    and ls.member_id not in (select member_id from ma_member_attribute where name = 'email_address' and value like '%gpolab.bbn.com')
    and ls.member_id not in (select member_id from ma_member_attribute where name = 'member_enabled' and value = 'n')
    and mma1.member_id = ls.member_id and mma1.name = 'email_address'
    and mma2.member_id = ls.member_id and mma2.name = 'eppn'
  order by ls.ts desc;

The same, but exported to CSV:

create temporary view gec20 as
 select mma1.value as "Email address", mma2.value as "EPPN", mma3.value as "First name",
    mma4.value as "Last name", mma5.value as "Full name",
    ls.ts::timestamp::date as "Last Seen"
  from ma_member_attribute mma1, ma_member_attribute mma2, last_seen ls
  left join ma_member_attribute mma3
    on ls.member_id = mma3.member_id and mma3.name = 'first_name'
  left join ma_member_attribute mma4
    on ls.member_id = mma4.member_id and mma4.name = 'last_name'
  left join ma_member_attribute mma5
    on ls.member_id = mma5.member_id and mma5.name = 'displayName'
  where ls.member_id in (select member_id from ma_outside_cert where expiration < '2014-06-25')
    and ls.member_id not in (select member_id from ma_member_attribute where name = 'email_address' and value like '%gpolab.bbn.com')
    and ls.member_id not in (select member_id from ma_member_attribute where name = 'member_enabled' and value = 'n')
    and mma1.member_id = ls.member_id and mma1.name = 'email_address'
    and mma2.member_id = ls.member_id and mma2.name = 'eppn'
  order by ls.ts desc;

\copy (select * from gec20) to '/tmp/expiring-gec-20.csv' with csv header

Pretty Names of Members

A bit of SQL to get a pretty name for all members:

select case when m3.value is not null then m3.value else case when m2.value is not null and m3.value is not null then m2.value || ' ' || m3.value else m4.value end end \
  from ma_member_attribute m4 \
       left outer join ma_member_attribute m1 on (m4.member_id = m1.member_id and m1.name = 'first_name') \
       left outer join ma_member_attribute m2 on (m4.member_id = m2.member_id and m2.name = 'last_name') \
       left outer join ma_member_attribute m3 on (m4.member_id = m3.member_id and m3.name = 'displayName') \
  where m4.name = 'email_address';

List Projects with Info

Get all projects with their purpose, expiration, and lead (pretty name and email):

select p.project_name, p.creation, p.expiration, p.project_purpose, \
       case when m3.value is not null then m3.value \
                 else case when m2.value is not null and m3.value is not null \
		           then m2.value || ' ' || m3.value \
			   else m4.value end end as lead, \
       m4.value as lead_email \
  from pa_project p join ma_member_attribute m4 on p.lead_id = m4.member_id \
       left outer join ma_member_attribute m1 on (m4.member_id = m1.member_id and m1.name = 'first_name') \
       left outer join ma_member_attribute m2 on (m4.member_id = m2.member_id and m2.name = 'last_name') \
       left outer join ma_member_attribute m3 on (m4.member_id = m3.member_id and m3.name = 'displayName') \
  where m4.name = 'email_address' \
  order by p.creation desc;

Get the Admin and Member or Auditor count for projects:

select p.project_name, count(distinct ppm1.member_id) as "# Admins", count(distinct ppm2.member_id) as "# Members or Auditors" \
  from pa_project p \
          left join pa_project_member ppm1 on (p.project_id = ppm1.project_id and ppm1.role = 2) \
          left join pa_project_member ppm2 on (p.project_id = ppm2.project_id and ppm2.role in (3,4)) 
  group by p.project_name, p.creation \
  order by p.creation desc;

Or combining those two:

select p.project_name, p.creation, p.expiration, p.project_purpose, 
          case when m3.value is not null then m3.value else case when m2.value is not null and m3.value is not null then m2.value || ' ' || m3.value else m4.value end end as lead, 
          m4.value as lead_email, count(distinct ppm1.member_id) as "# Admins", count(distinct ppm2.member_id) as "# Members or Auditors" 
    from pa_project p join ma_member_attribute m4 on p.lead_id = m4.member_id 
            left outer join ma_member_attribute m1 on (m4.member_id = m1.member_id and m1.name = 'first_name') 
            left outer join ma_member_attribute m2 on (m4.member_id = m2.member_id and m2.name = 'last_name') 
            left outer join ma_member_attribute m3 on (m4.member_id = m3.member_id and m3.name = 'displayName') 
            left join pa_project_member ppm1 on (p.project_id = ppm1.project_id and ppm1.role = 2) 
            left join pa_project_member ppm2 on (p.project_id = ppm2.project_id and ppm2.role in (3,4)) 
    where m4.name = 'email_address' 
    group by p.project_name, p.creation, p.expiration, p.project_purpose, 
                   case when m3.value is not null then m3.value else case when m2.value is not null and m3.value is not null then m2.value || ' ' || m3.value else m4.value end end, 
                   m4.value 
    order by p.creation desc;

Portal Stitching requests

Number of attempts to reserve resources with a multi-AM topology:

select count(*) from logging_entry where event_time > '2015-08-01' and event_time < '2015-09-01' and message like '%stitching R%';

Unused projects

Projects with no active slices showing most recent activity in the log table, oldest (least used) first:

select distinct p.project_name, m.value as lead_email, max(l.event_time) as last_used from pa_project p, ma_member_attribute m, logging_entry l, logging_entry_attribute l2 where p.project_id = l2.attribute_value::uuid and p.lead_id = m.member_id and m.name = 'email_address' and l2.attribute_name = 'PROJECT' and l.id = l2.event_id and p.expired = 'f' and p.project_id not in (select project_id from sa_slice where expired = 'f') group by p.project_name, m.value order by max(l.event_time) asc;

Email domains of active users

select substring(value from '@(.*)$'), count(*)
  from ma_member_attribute
  where name = 'email_address' and
        member_id not in (select member_id
                            from ma_member_attribute
                            where name = 'member_enabled' and
                                  value = 'n')
  group by substring(value from '@(.*)$')
  order by count(*) desc;
Clone this wiki locally