Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Java] Given a table name and run number, retrieve all variations for that table and run #39

Open
DraTeots opened this issue Dec 21, 2016 · 6 comments

Comments

@DraTeots
Copy link
Contributor

No description provided.

@DraTeots
Copy link
Contributor Author

DraTeots commented Dec 21, 2016

CCDB JAVA Api under the hood works in this order:

  1. Almost all loading logic is in JDBCProvider. MySQLProvider and SQLiteProvider hold queries specific for databases.

  2. First, when connect function is called. All directories and all variations are loaded from database. It is done to build directory tree and variations tree. After this all variations and directories are available by

private var variationsById = HashMap<Int, Variation>()
private var variationsByName = HashMap<String, Variation>()
private var directoriesByFullPath = HashMap<String,Directory>()
private var directoriesById = HashMap<Int, Directory>()
private var directories = Vector<Directory>()

So one can retrieve any variation or directory by Id or by name.


  1. What are Directory and Variation objects? All objects representing CCDB data structure (more or less the same as in DB) are located in model.kt. The classes above looks like this:
class Directory(
    val id:Int,                    /// DB id
    val parentId:Int,         /// DB id of parent directory. Id=0 - root directory
    val name:String,        /// Name of the directory
    var fullPath                                                     /// Full path (including self name) of the directory
    var parentDirectory:Directory?=null              /// null if there is no parent directory
    var subdirectories = Vector<Directory>()     /// List of subdirectories
class Variation(
val id:Int,                /// DB ID
val parentId:Int,     /// DB of parent ID 
val name:String     /// Variation name

var parentVariation:Variation? = null        /// Parent variation object
var children = Vector<Variation>()          /// Children variations list

  1. To implement this ticket one needs to get a list of variations IDs from database, then just get Variation objects from the above variations HashMap;

  1. Lets dissect original CCDB data selection query
SELECT `assignments`.`id` AS `asId`, 
`constantSets`.`vault` AS `blob`, 
`assignments`.`modified` as `asModified`
FROM  `assignments` 
USE INDEX (id_UNIQUE) 
INNER JOIN `runRanges` ON `assignments`.`runRangeId`= `runRanges`.`id` 
INNER JOIN `constantSets` ON `assignments`.`constantSetId` = `constantSets`.`id` 
INNER JOIN `typeTables` ON `constantSets`.`constantTypeId` = `typeTables`.`id` 
WHERE  `runRanges`.`runMin` <= ? 
AND `runRanges`.`runMax` >= ? 
AND `assignments`.`variationId`= ? 
AND `constantSets`.`constantTypeId` = ? 

The first part chooses what data to get

SELECT `assignments`.`id` AS `asId`, 
`constantSets`.`vault` AS `blob`, 
`assignments`.`modified` as `asModified`
FROM  `assignments` 

The middle part says how tables are connected

INNER JOIN `runRanges` ON `assignments`.`runRangeId`= `runRanges`.`id` 
INNER JOIN `constantSets` ON `assignments`.`constantSetId` = `constantSets`.`id` 
INNER JOIN `typeTables` ON `constantSets`.`constantTypeId` = `typeTables`.`id` 

The last part is about how data is selected

WHERE  `runRanges`.`runMin` <= ? 
AND `runRanges`.`runMax` >= ? 
AND `assignments`.`variationId`= ? 
AND `constantSets`.`constantTypeId` = ? 

There is also USE INDEX (id_UNIQUE) which improves performance and can be skipped.


  1. So lets create the right query:

first we need to select variations Ids. We use assignments.variationId and not variations.id... But this is equvivalent

SELECT `assignments`.`variationId` AS `varId`
FROM  `assignments` 

leaving middle section intact lets choose what we need to select the data. Here we need to put our run

WHERE  `runRanges`.`runMin` <= ? 
AND `runRanges`.`runMax` >= ? 

Example:

WHERE  `runRanges`.`runMin` <= 1000
AND `runRanges`.`runMax` >= 1000

We don't need this restriction:

AND `assignments`.`variationId`= ? 

Finally here we need Id of our directory
Like

AND `constantSets`.`constantTypeId` = 4

We get constantTypeId by loading the table by name:

val typeTable = JDBCProvider.getTypeTable("/name/of/the/table")
val constantTypeId = typeTable.id

Now we have everything to create the query:


  1. Our query should look like this:
SELECT DISTINCT  
`assignments`.`variationId` AS `varId`
FROM  `assignments` 
USE INDEX (id_UNIQUE) 
INNER JOIN `runRanges` ON `assignments`.`runRangeId`= `runRanges`.`id` 
INNER JOIN `constantSets` ON `assignments`.`constantSetId` = `constantSets`.`id` 
INNER JOIN `typeTables` ON `constantSets`.`constantTypeId` = `typeTables`.`id` 
WHERE  `runRanges`.`runMin` <= 1000 
AND `runRanges`.`runMax` >= 1000 
AND `constantSets`.`constantTypeId` = 1 

One can run it on existing CCDB instance. Here 1000 is a run number. And 1 is a type table ID. Both are from CCDB test records. Since we selecting only variation ID's we use DISTINCT to get only unique IDs


  1. TL;DR; Impelentation steps
  • get typeTableId by val table = getTypeTable("/name/of/the/table") and table.id
  • call mighty query from p.7 to get variations IDs
  • use variationsById HashMap to get Variation objects by the loaded variations IDs

@lendackya
Copy link
Contributor

Okay! Thank you very much. So basically everything can be done by manipulating and calling this mighty query?

Also, just out of curiosity what do the ON and AS keywords do? I have only really used simple queries before in MySQL.

@lendackya
Copy link
Contributor

Also, how would I sign into the CCDB using workbench?

@DraTeots
Copy link
Contributor Author

Yes, you can tweak this query a little bit to get all variations for the table (by removing runRanges filter) or selecting all runRanges (#38) by replacing assignments.variationId with assignments.runRangeId

AS renames entity. To a simpler name in our case. When you do query like

SELECT X,Y,Z ...

usually language API allows you to get this values by ID or by order number: mysql.get("X"), mysql.get("Y") mysql.get(0) mysql.get(1). Sometimes you have a name collision and sometimes you just want to have simpler name. Basically with query above kotlin part looks like this:

        val result = prs.executeQuery()

        //loop through results
        while (result.next()) {
            val variationId = result.getInt("varId")

That is how AS varId in SQL query reappears in kotlin

ON is a part of JOIN syntax. Here is the famous picture about JOIN types

We use INNER JOIN

INNER JOIN `runRanges` ON `assignments`.`runRangeId`= `runRanges`.`id` 

ON here means that we select such run ranges that have the same ID as in assignments.runRangeId.
I.e., ON shows how data in tables is connected together.

In our case we want to select assignments.variationId but we want to filter the results by runRanges.XXX and constantSets.constantTypeId. So we have to tell MySQL how assignments connected to runRanges and constantSets

@DraTeots
Copy link
Contributor Author

DraTeots commented Dec 21, 2016

Also, how would I sign into the CCDB using workbench?

I believe on the home page of MySQL workbench you have an option to connect to DB. You can test use HallD DB for test purpuses. It doesn't have password and it is read only.

url: hallddb.jlab.org
name: ccdb_user
db: ccdb
with no password

And just press all the buttons =)

@DraTeots
Copy link
Contributor Author

DraTeots commented Dec 21, 2016

Oops typo. Sorry. The right address is hallddb.jlab.org with 2 'd'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants