Convert Edk2DB to use an ORM instead of sqlite3 directly #485

Merged (8 commits) on Jan 19, 2024.
docs/user/features/edk2_db.md: 115 changes (80 additions, 35 deletions)
@@ -1,54 +1,51 @@
# Edk2 Database

Edk2DB enables EDKII repository developers or maintainers to query specific information about their workspace. `Edk2DB`
utilizes the sqlalchemy and sqlite3 python modules to create and manipulate a sqlite database. Multiple table
generators are provided with edk2-pytool-library that developers can register and use, and a [Table Generator](#table-generators)
interface is also provided to allow the creation of additional parsers that create tables and insert rows into them.

Edk2DB provides a unique key (a uuid) for each execution of `parse` to all table generators. This unique key can
optionally be used as a column in a table to distinguish common values between parses. For example, a single database
can contain parsed information about a platform as if it were built in DEBUG mode and as if it were built in RELEASE
mode, or parsed information for multiple platforms or packages.
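
A minimal sketch of how such a per-parse key keeps rows apart, using only Python's standard `sqlite3` and `uuid` modules. The `component` table, its columns, the module paths, and the `record_parse` helper are hypothetical and not part of Edk2DB; the only idea carried over from above is that each parse is tagged with a fresh uuid.

```python
import sqlite3
import uuid

# Hypothetical table: every row carries the key of the parse that produced it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE component (env TEXT, path TEXT, target TEXT)")


def record_parse(conn: sqlite3.Connection, target: str, paths: list[str]) -> str:
    """Insert one parse's rows tagged with a fresh uuid and return that uuid."""
    env_id = uuid.uuid4().hex  # 32 hex characters, unique per parse
    conn.executemany(
        "INSERT INTO component (env, path, target) VALUES (?, ?, ?)",
        [(env_id, path, target) for path in paths],
    )
    return env_id


# The same module shows up in both parses; the key keeps the two sets of rows apart.
debug_id = record_parse(conn, "DEBUG", ["MyPkg/ModuleA/ModuleA.inf", "MyPkg/ModuleB/ModuleB.inf"])
release_id = record_parse(conn, "RELEASE", ["MyPkg/ModuleA/ModuleA.inf"])

# Select only the rows produced by the DEBUG parse.
debug_rows = conn.execute(
    "SELECT path FROM component WHERE env = ? ORDER BY path", (debug_id,)
).fetchall()
```

Filtering on the key answers "what did this particular parse see" even when values repeat across parses.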

The generated database is an actual sqlite database, so any tool that works on a sqlite database will work on it.
VSCode provides multiple extensions for viewing and running queries on a standalone database, and other downloadable
tools are available as well.

Once parsing is complete, the easiest way to work with the data is the context manager
`with <db>.session() as session:`, which provides a sqlalchemy session for working with data in the database. Because
sqlalchemy acts as an ORM, users do not need to worry about the database itself and can work with python objects
representing rows in the database. This is discussed in [Working with Database Data](#working-with-database-data).
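
A session context manager of this kind usually commits when the block succeeds and rolls back when it raises. The sketch below builds that behavior with sqlalchemy's `sessionmaker`. It follows the general pattern from the sqlalchemy documentation and is not Edk2DB's source. The `session_scope` name, the `note` table, and the in-memory engine are assumptions for illustration.

```python
from contextlib import contextmanager
from typing import Iterator

from sqlalchemy import create_engine, text
from sqlalchemy.orm import Session, sessionmaker

# An in-memory engine stands in for the database file Edk2DB would open.
engine = create_engine("sqlite:///:memory:")
SessionFactory = sessionmaker(bind=engine)


@contextmanager
def session_scope() -> Iterator[Session]:
    """Yield a session; commit if the block succeeds, roll back if it raises."""
    session = SessionFactory()
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()


with session_scope() as session:
    session.execute(text("CREATE TABLE note (body TEXT)"))
    session.execute(text("INSERT INTO note (body) VALUES ('kept')"))

try:
    with session_scope() as session:
        session.execute(text("INSERT INTO note (body) VALUES ('discarded')"))
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

with session_scope() as session:
    bodies = session.execute(text("SELECT body FROM note")).scalars().all()
```

Only the first block's row survives. The second block raised, so its insert was rolled back.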

## General Flow

The expected usage of Edk2DB is fairly simple:

1. Instantiate the DB
2. Register and run the necessary table generators
3. (optional) Work with the data
4. Release the database
5. (optional) run queries on the database through external tools

### Instantiate Edk2DB

Instantiating a database is as simple as initializing `Edk2DB` with the database path and, optionally, an Edk2Path
object. The Edk2Path object is only necessary if running parsers; if you are opening an existing database to work with
its data, it is not needed. You can also create an in-memory database by passing `":memory:"` as the path.

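``` python
from edk2toollib.database import Edk2DB

# With an Edk2Path object, needed when running table generators
db = Edk2DB(db_path, pathobj=pathobj)

# Without an Edk2Path object, enough to open an existing database and work with its data
db = Edk2DB(db_path)

# In-memory database
db = Edk2DB(":memory:")
```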
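
The `":memory:"` behavior comes from sqlite itself. An in-memory database exists only on the connection that created it, while a file-backed database persists after it is closed. Below is a minimal sketch with Python's standard `sqlite3` module (not Edk2DB) showing the difference; the `demo` table is hypothetical.

```python
import os
import sqlite3
import tempfile


def table_count(conn: sqlite3.Connection) -> int:
    """Count the tables visible on a connection."""
    return conn.execute("SELECT COUNT(*) FROM sqlite_master WHERE type = 'table'").fetchone()[0]


# In-memory: the data lives only on that connection; a new connection starts empty.
mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE demo (x INTEGER)")
mem.close()
fresh_mem = sqlite3.connect(":memory:")
memory_tables_after_reopen = table_count(fresh_mem)
fresh_mem.close()

# File-backed: the table is still there after closing and reopening the file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.db")
    disk = sqlite3.connect(path)
    disk.execute("CREATE TABLE demo (x INTEGER)")
    disk.commit()
    disk.close()
    reopened = sqlite3.connect(path)
    file_tables_after_reopen = table_count(reopened)
    reopened.close()
```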

### Register and run table generators

A [Table Generator](#table-generators) is a type of parser that creates one or more tables in the database and fills
them with rows of data. Pre-made table generators exist at `edk2toollib.database.tables`, but a user can create their
own by subclassing the `TableGenerator` object, also found at `edk2toollib.database.tables`. Registering a table
generator is simple: call `register()` with one or more instantiated parsers:

``` python
@@ -72,11 +69,6 @@ db.clear_parsers()
Lastly, run all registered parsers. The `parse(env: dict)` method expects a dictionary of environment variables used
when building a platform. Depending on the parser, the dictionary can be empty.

```python
# Option 1: parse one at a time
db.register(Parser(key=value2))
@@ -90,14 +82,67 @@ db.register(Parser(key=value1), Parser(key=value2))
db.parse(env)
```
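
The following toy stand-in, written in plain Python, shows the register-then-parse flow without needing a workspace. `ToyDB`, `RecordingGenerator`, and `TableGeneratorLike` are hypothetical and only mimic the method names used above (`register`, `clear_parsers`, `parse`). The sketch assumes two things: parsers run in registration order, and every parser in one `parse` call receives the same uuid. `ToyDB.parse` returns that uuid only so the demo can inspect it.

```python
import uuid
from typing import Protocol


class TableGeneratorLike(Protocol):
    """Hypothetical interface: anything with a parse(env_id, env) method."""

    def parse(self, env_id: str, env: dict) -> None: ...


class ToyDB:
    """Hypothetical stand-in that mimics register / clear_parsers / parse."""

    def __init__(self) -> None:
        self._parsers: list[TableGeneratorLike] = []

    def register(self, *parsers: TableGeneratorLike) -> None:
        self._parsers.extend(parsers)

    def clear_parsers(self) -> None:
        self._parsers.clear()

    def parse(self, env: dict) -> str:
        env_id = uuid.uuid4().hex  # one key shared by every parser in this call
        for parser in self._parsers:  # run in registration order
            parser.parse(env_id, env)
        return env_id


class RecordingGenerator:
    """Hypothetical generator that records the calls it receives."""

    def __init__(self, name: str, log: list) -> None:
        self.name = name
        self.log = log

    def parse(self, env_id: str, env: dict) -> None:
        self.log.append((self.name, env_id, env.get("TARGET")))


log: list = []
db = ToyDB()

# Register one generator and parse
db.register(RecordingGenerator("first", log))
first_id = db.parse({"TARGET": "DEBUG"})
db.clear_parsers()

# Register several generators at once, then parse once
db.register(RecordingGenerator("a", log), RecordingGenerator("b", log))
second_id = db.parse({"TARGET": "RELEASE"})
```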

## Table Generators

Table generators are just that: classes that subclass the [TableGenerator](/api/database/edk2_db/#edk2toollib.database.edk2_db.TableGenerator),
parse some type of information (typically the workspace), and insert the data into one of the tables managed by Edk2DB.
Multiple table generators are provided by edk2toollib, and can be seen at [edk2toollib.database.tables](https://github.com/tianocore/edk2-pytool-library/tree/master/edk2toollib/database/tables).
Edk2DB can use any class that implements the `TableGenerator` interface.

When creating a custom table generator, you will also need to create an ORM mapping for your table(s). Reading
the [ORM Quick Start](https://docs.sqlalchemy.org/en/20/orm/quickstart.html) provided by sqlalchemy is the best way to
go, but here is a simple example so you know what to expect:

```python
from edk2toollib.database import Edk2DB
from sqlalchemy import String
from sqlalchemy.orm import Mapped, mapped_column

class ExampleTable(Edk2DB.Base):
    __tablename__ = "example"

    id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)
    uuid: Mapped[str] = mapped_column(String(32))
```

This example creates a table "example" with two columns. The first is an auto-incrementing primary key, while the
second is a string of at most 32 characters representing a uuid (a uuid written as hex digits is exactly 32 characters
long). Between the documentation linked above and the examples found at [edk2toollib.database](https://github.com/tianocore/edk2-pytool-library/blob/master/edk2toollib/database/__init__.py),
it should be relatively simple to create a mapping.

## Working with database data

As mentioned at the beginning, Edk2DB uses sqlalchemy's ORM (Object-Relational Mapping) functionality for working with
data in the database. This abstracts away the database schema and the complexities of working with databases
(particularly one that is unfamiliar or that can change depending on the use case, since adding tables is supported).
Instead, users can rely on this functionality to write simple queries and access database information as objects
without needing to worry about the database itself.

Users should follow the [ORM Querying Guide](https://docs.sqlalchemy.org/en/20/orm/queryguide/index.html) for detailed
documentation, but here is a simple query example using the mappings provided by Edk2DB at [edk2toollib.database](https://github.com/tianocore/edk2-pytool-library/blob/master/edk2toollib/database/__init__.py):

```python
from edk2toollib.database import Edk2DB, Fv, InstancedInf

with Edk2DB(DB_PATH).session() as session:
    # IA32 component instances from the DSC (cls is None, so library instances are excluded)
    dsc_components_query = (
        session
        .query(InstancedInf)
        .filter_by(cls=None, arch="IA32")
        .order_by(InstancedInf.path)
    )

    # IA32 component instances that an FV in the FDF places in the final binary
    fdf_components_query = (
        session
        .query(InstancedInf)
        .join(Fv.infs)
        .filter(InstancedInf.arch == "IA32")
    )

    dsc_components = {inf.path for inf in dsc_components_query.all()}
    fdf_components = {inf.path for inf in fdf_components_query.all()}

    unused_components = dsc_components - fdf_components
```

The above example is a simple way to determine which IA32 components were compiled per the DSC but not placed in the
final binary per the FDF.
edk2toollib/database/__init__.py: 143 changes (143 additions, 0 deletions)
@@ -7,4 +7,147 @@
# SPDX-License-Identifier: BSD-2-Clause-Patent
##
"""Core classes and methods used to interact with the database module inside edk2-pytool-library."""
import datetime
from typing import List, Optional

from sqlalchemy import Column, ForeignKey, Integer, String, Table, UniqueConstraint, func
from sqlalchemy.orm import Mapped, Session, mapped_column, relationship # noqa: F401

from .edk2_db import Edk2DB # noqa: F401

# Association tables. Should not be used directly. Only for relationships
_source_association = Table(
    'source_association', Edk2DB.Base.metadata,
    Column('left_id', Integer, ForeignKey('inf.id')),
    Column('right_id', Integer, ForeignKey('source.id')),
)

_instance_source_association = Table(
    'instance_source_association', Edk2DB.Base.metadata,
    Column('left_id', Integer, ForeignKey('instancedinf.id')),
    Column('right_id', Integer, ForeignKey('source.id')),
)

_fv_association = Table(
    'fv_association', Edk2DB.Base.metadata,
    Column('left_id', Integer, ForeignKey('fv.id')),
    Column('right_id', Integer, ForeignKey('instancedinf.id')),
)

_library_association = Table(
    'library_association', Edk2DB.Base.metadata,
    Column('left_id', Integer, ForeignKey('inf.id')),
    Column('right_id', Integer, ForeignKey('library.id')),
)

_inf_association = Table(
    'inf_association', Edk2DB.Base.metadata,
    Column('left_id', Integer, ForeignKey('instancedinf.id')),
    Column('right_id', Integer, ForeignKey('instancedinf.id')),
)


class Environment(Edk2DB.Base):
    """A class to represent an environment in the database."""
    __tablename__ = "environment"

    id: Mapped[str] = mapped_column(primary_key=True)
    date: Mapped[datetime.datetime] = mapped_column(insert_default=func.now())
    version: Mapped[str] = mapped_column(String(40))
    values: Mapped[List["Value"]] = relationship(back_populates="env", cascade="all, delete-orphan")


class Value(Edk2DB.Base):
    """A class to represent a key-value pair in the database."""
    __tablename__ = "value"

    env_id: Mapped[str] = mapped_column(ForeignKey("environment.id"), primary_key=True, index=True)
    key: Mapped[str] = mapped_column(primary_key=True)
    value: Mapped[str]
    env: Mapped["Environment"] = relationship(back_populates="values")


class Inf(Edk2DB.Base):
    """A class to represent an INF file in the database."""
    __tablename__ = "inf"

    id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)
    path: Mapped[str] = mapped_column(unique=True)
    guid: Mapped[str]
    library_class: Mapped[Optional[str]]
    package_name: Mapped[Optional[str]] = mapped_column(ForeignKey("package.name"))
    module_type: Mapped[Optional[str]]
    sources: Mapped[List["Source"]] = relationship(secondary=_source_association)
    libraries: Mapped[List["Library"]] = relationship(secondary=_library_association)


class InstancedInf(Edk2DB.Base):
    """A class to represent an instanced INF file in the database."""
    __tablename__ = "instancedinf"

    id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)
    env: Mapped[str] = mapped_column(ForeignKey("environment.id"), index=True)
    path: Mapped[str] = mapped_column(ForeignKey("inf.path"))
    arch: Mapped[str]
    name: Mapped[str]
    package: Mapped[Optional["Package"]] = relationship()
    package_id: Mapped[Optional[int]] = mapped_column(ForeignKey("package.id"))
    repository: Mapped["Repository"] = relationship()
    repository_id: Mapped[Optional[int]] = mapped_column(ForeignKey("repository.id"))
    dsc: Mapped[str]
    cls: Mapped[Optional[str]]
    component: Mapped[str] = mapped_column(ForeignKey("inf.path"))
    libraries: Mapped[List["InstancedInf"]] = relationship(
        secondary=_inf_association,
        primaryjoin=(id == _inf_association.c.left_id),
        secondaryjoin=(id == _inf_association.c.right_id),
    )
    sources: Mapped[List["Source"]] = relationship(
        secondary=_instance_source_association,
    )


class Source(Edk2DB.Base):
    """A class to represent a source file in the database."""
    __tablename__ = "source"

    id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)
    path: Mapped[str] = mapped_column(unique=True)
    license: Mapped[Optional[str]]
    total_lines: Mapped[Optional[int]]
    code_lines: Mapped[Optional[int]]
    comment_lines: Mapped[Optional[int]]
    blank_lines: Mapped[Optional[int]]


class Library(Edk2DB.Base):
    """A class to represent a library in the database."""
    __tablename__ = "library"

    id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)
    name: Mapped[str] = mapped_column(unique=True)


class Repository(Edk2DB.Base):
    """A class to represent a repository in the database."""
    __tablename__ = "repository"
    __table_args__ = (UniqueConstraint("name", "path"),)

    id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)
    name: Mapped[str]
    path: Mapped[Optional[str]]
    packages: Mapped[List["Package"]] = relationship("Package", back_populates="repository")


class Package(Edk2DB.Base):
    """A class to represent a package in the database."""
    __tablename__ = "package"
    __table_args__ = (UniqueConstraint("name", "path"),)

    id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)
    name: Mapped[str]
    path: Mapped[str]
    repository: Mapped["Repository"] = relationship("Repository", back_populates="packages")
    repository_id: Mapped[int] = mapped_column(ForeignKey("repository.id"))


class Fv(Edk2DB.Base):
    """A class to represent an FV in the database."""
    __tablename__ = "fv"

    id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)
    env: Mapped[str] = mapped_column(ForeignKey("environment.id"), index=True)
    name: Mapped[str]
    fdf: Mapped[str]
    infs: Mapped[List["InstancedInf"]] = relationship(secondary=_fv_association)