Initial change to use ORM rather than raw queries

Updates small update Update UPdate Clean up code .\.cspell.json Update update Spelling Update parsing .\.cspell.json Fix source and inf paths Fix source and inf paths update for logging Update parsing package table Separate Repository and Package reformat bugfix Update to log Update relative paths
tianocore · Jan 18, 2024 · dce2d59 · dce2d59
1 parent 1069591
commit dce2d59

Showing 21 changed files with 764 additions and 575 deletions.
diff --git a/docs/user/features/edk2_db.md b/docs/user/features/edk2_db.md
@@ -1,54 +1,51 @@
 # Edk2 Database
 
 Edk2DB enables EDKII repository developers or maintainers to query specific information about their workspace. `Edk2Db`
-utilizes the sqlite3 python module to create and manipulate a sqlite database. Multiple Table generators are provided
-with edk2-pytool-library that developers can register and use, however a [Table Generator](#table-generators) interface
-is also provided to allow the creation of additional parsers that create tables and insert rows into them.
+utilizes the sqlalchemy and sqlite3 python modules to create and manipulate a sqlite database. Multiple Table
+generators are provided with edk2-pytool-library that developers can register and use, however a [Table Generator](#table-generators)
+interface is also provided to allow the creation of additional parsers that create tables and insert rows into them.
 
-Edk2DB automatically registers an environment table which records the current environment at the time of parsing, and
-provides a unique key (a uuid) for that parse to all table generators. This unique key can optionally be used as a
-column in the table to distinguish common values between parsing (Such as having a database that contains parsed
-information about a platform as if it was built in DEBUG mode and as if it was built in RELEASE mode. Another example
-is database that contains parsed information for multiple platforms or packages.)
-
-Edk2DB automatically registers a junction table, `junction`, that acts as a lookup table between unique keys in two
-tables to link them together, primarily for a one-to-many relation. One example used in the codebase is to associate
-an INF file with the many source files it uses.
+Edk2DB provides a unique key (a uuid) for each execution of `parse` to all table generators. This unique key can
+optionally be used as a column in a table to distinguish values that come from different parses. For example, one
+database can hold information about a platform parsed as a DEBUG build and as a RELEASE build, or information
+parsed for multiple platforms or packages.
 
 The database generated is an actual sqlite database and any tools that work on a sqlite database will work on this
 database. VSCode provides multiple extensions for viewing and running queries on a standalone database, along with
 other downloadable tools.
 
+Once parsing is complete, the easiest way to work with the data is to use the context manager
+`with <db>.session() as session:` which provides access to a sqlalchemy session variable for working with data in the
+database. By using sqlalchemy as an ORM, users do not need to worry about the database itself, and will be able to
+work with python objects representing rows in a database. This will be discussed in [Working with Database Data](#working-with-database-data).
+
 ## General Flow
 
 The expected usage of Edk2DB is fairly simple:
 
 1. Instantiate the DB
 2. Register and run the necessary table generators
-3. (optional) run queries on the database through python's sqlite3 module
+3. (optional) Work with the data
 4. Release the database
 5. (optional) run queries on the database through external tools
 
 ### Instantiate Edk2DB
 
-Edk2DB supports normal instantiation and instantiation through a context manager. It is suggested to open the database
-through a context manager, but if using it through normal instantion, remember to do a a final `db.connection.commit()`
-and `db.connection.close()` to cleanly close the database.
+Instantiating a database is as simple as initializing `Edk2DB` with the database path and optionally an Edk2Path object.
+The Edk2Path object is only necessary if running parsers. If you are opening an existing database to work with the
+data, it is not needed. You can optionally create an in-memory database by passing ":memory:" as the path.
 
 ``` python
 db = Edk2DB(db_path, pathobj=pathobj)
-db.commit()
-db.close()
-
-with Edk2DB(db_path, pathobj=pathobj) as db:
-   ...
-
+db = Edk2DB(db_path)
+db = Edk2DB(":memory:")
 ```
 
 ### Register and run table generators
 
-A [Table Generator](#table-generators) is a type of parser that creates a table in the database and fills it with rows
-of data. A Table Generator should never expect specific data in a table to exist. It's simple to register a table
+A [Table Generator](#table-generators) is a type of parser that creates one or more tables in the database and fills
+them with rows of data. Pre-made table generators exist at `edk2toollib.database.tables`, but a user can create their
+own by subclassing the `TableGenerator` object also found at `edk2toollib.database.tables`. It's simple to register a table
 generator! simply call the `register()` with one or more of the instantiated parsers:
 
 ``` python
@@ -72,11 +69,6 @@ db.clear_parsers()
 Lastly is running all registered parsers. The `parse(env: dict)` method expects to be provided a dictionary of
 environment variables used when building a platform. Depending on the parser, the dictionary can be empty.
 
-The `parse(env: dict)` command will perform two loops across the parsers.The first loop will create all tables for all
-table parsers. This ensures that any dependencies on tables existing between parsers is handled. The second loop
-performs the parsing and row insertion. The order in which parsers execute is the same as the order that they are
-registered.
-
 ```python
 # Option 1: parse one at a time
 db.register(Parser(key=value2))
@@ -90,14 +82,67 @@ db.register(Parser(key=value1), Parser(key=value2))
 db.parse(env)
 ```
 
-### Release the Database
-
-If you are using a context manager, then this is handled automatically for you. Otherwise, you need to call
-`db.connection.commit()` and `db.connection.close()` on the database (or `__exit__()`)
-
 ## Table Generators
 
 Table generators are just that, classes that subclass the [TableGenerator](/api/database/edk2_db/#edk2toollib.database.edk2_db.TableGenerator)
 , parse some type of information (typically the workspace) and insert the data into one of the tables managed by Edk2DB.
-Multiple table generators are provided by edk2toollib, and can be seen at [edk2toollib/database/tables](https://github.com/tianocore/edk2-pytool-library/tree/master/edk2toollib/database/tables).
+Multiple table generators are provided by edk2toollib, and can be seen at [edk2toollib.database.tables](https://github.com/tianocore/edk2-pytool-library/tree/master/edk2toollib/database/tables).
 Edk2DB can use any class that implements the `TableGenerator` interface.
+
+When creating a custom table generator, you will also need to create an ORM mapping for your table(s). Reading
+the [ORM Quick Start](https://docs.sqlalchemy.org/en/20/orm/quickstart.html) provided by sqlalchemy is the best way to
+go, but here is a simple example so you know what to expect:
+
+```python
+from sqlalchemy import String
+from sqlalchemy.orm import Mapped, mapped_column
+from edk2toollib.database import Edk2DB
+
+class ExampleTable(Edk2DB.Base):
+   __tablename__ = "example"
+
+   id: Mapped[int] = mapped_column(primary_key = True, autoincrement=True)
+   uuid: Mapped[str] = mapped_column(String(32))
+```
+
+This example simply creates a table "example" with two columns. The first is an auto-incrementing primary key, while
+the second is a string column of length 32 that holds a uuid. Between the sqlalchemy documentation linked above
+and the examples found at [edk2toollib.database](https://github.com/tianocore/edk2-pytool-library/blob/master/edk2toollib/database/__init__.py),
+it should be relatively simple to create a mapping.
+
+## Working with database data
+
+As mentioned at the beginning, Edk2DB uses sqlalchemy's ORM (Object-Relational Mapping) functionality for working with
+data in the database. This abstracts the database schema and the complexities of working with databases (particularly
+one that is unfamiliar, or whose schema varies by use case because additional tables can be added). Instead, users can
+rely on this functionality to write simple queries and access database information as objects without needing to
+worry about the database itself.
+
+Users should follow the [ORM Querying Guide](https://docs.sqlalchemy.org/en/20/orm/queryguide/index.html) for detailed
+documentation, but here is a simple query example using the mappings provided by Edk2DB at [edk2toollib.database](https://github.com/tianocore/edk2-pytool-library/blob/master/edk2toollib/database/__init__.py):
+
+```python
+from edk2toollib.database import Edk2DB, InstancedInf, Fv
+from sqlalchemy.orm import aliased
+
+with Edk2DB(DB_PATH).session() as session:
+   dsc_components_query = (
+      session
+         .query(InstancedInf)
+         .filter_by(cls = None, arch = "IA32")
+         .order_by(InstancedInf.package_name, InstancedInf.path)
+   )
+
+   fdf_components_query = (
+      session
+         .query(InstancedInf)
+         .join(Fv.infs)
+         .filter(InstancedInf.arch == "IA32")
+   )
+
+   dsc_components = set([inf.path for inf in dsc_components_query.all()])
+   fdf_components = set([inf.path for inf in fdf_components_query.all()])
+
+   unused_components = dsc_components - fdf_components
+```
+
+The above example is a simple way to determine which IA32 components were compiled per the DSC but not placed in the
+final binary per the FDF.
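
Note on the query example above: the set difference it computes can be tried without edk2toollib installed. The following is only a rough sketch that uses the standard-library `sqlite3` module against a hand-built, in-memory database. The table and column names (`instancedinf`, `fv`, `fv_association`, `left_id`, `right_id`) follow the mappings added to `edk2toollib/database/__init__.py` in this commit, but the tables are trimmed to the columns the query needs, and the sample paths and the FV name `FVMAIN` are made up for illustration. It approximates what the ORM queries ask for; it is not the SQL that sqlalchemy actually emits.

```python
import sqlite3

# In-memory database with a trimmed-down, hand-written version of the schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE instancedinf (id INTEGER PRIMARY KEY, path TEXT, arch TEXT, cls TEXT);
    CREATE TABLE fv (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fv_association (
        left_id INTEGER REFERENCES fv(id),
        right_id INTEGER REFERENCES instancedinf(id)
    );
    -- Illustrative rows only: two IA32 components, one IA32 library, one X64 component.
    INSERT INTO instancedinf (id, path, arch, cls) VALUES
        (1, 'PkgA/Driver1.inf', 'IA32', NULL),
        (2, 'PkgA/Driver2.inf', 'IA32', NULL),
        (3, 'PkgA/Lib.inf', 'IA32', 'ExampleLib'),
        (4, 'PkgA/Driver1.inf', 'X64', NULL);
    INSERT INTO fv (id, name) VALUES (1, 'FVMAIN');
    -- Only Driver1 (IA32) is placed in a firmware volume.
    INSERT INTO fv_association (left_id, right_id) VALUES (1, 1);
""")

# Components (no library class) built for IA32 per the DSC.
dsc_components = {
    row[0]
    for row in conn.execute(
        "SELECT path FROM instancedinf WHERE cls IS NULL AND arch = 'IA32'"
    )
}

# IA32 INFs that are linked to some FV through the association table.
fdf_components = {
    row[0]
    for row in conn.execute(
        """
        SELECT i.path FROM instancedinf AS i
        JOIN fv_association AS a ON a.right_id = i.id
        JOIN fv AS f ON f.id = a.left_id
        WHERE i.arch = 'IA32'
        """
    )
}

unused_components = dsc_components - fdf_components
print(sorted(unused_components))  # ['PkgA/Driver2.inf']
conn.close()
```

With the ORM, the two `SELECT` statements are replaced by the `session.query(...)` calls shown in the documentation, and the association table is never referenced directly.
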
diff --git a/edk2toollib/database/__init__.py b/edk2toollib/database/__init__.py
@@ -7,4 +7,147 @@
 # SPDX-License-Identifier: BSD-2-Clause-Patent
 ##
 """Core classes and methods used to interact with the database module inside edk2-pytool-library."""
+import datetime
+from typing import List, Optional
+
+from sqlalchemy import Column, ForeignKey, Integer, String, Table, UniqueConstraint, func
+from sqlalchemy.orm import Mapped, Session, mapped_column, relationship  # noqa: F401
+
 from .edk2_db import Edk2DB  # noqa: F401
+
+# Association tables. Should not be used directly. Only for relationships
+_source_association = Table(
+    'source_association', Edk2DB.Base.metadata,
+    Column('left_id', Integer, ForeignKey('inf.id')),
+    Column('right_id', Integer, ForeignKey('source.id')),
+)
+
+_instance_source_association = Table(
+    'instance_source_association', Edk2DB.Base.metadata,
+    Column('left_id', Integer, ForeignKey('instancedinf.id')),
+    Column('right_id', Integer, ForeignKey('source.id')),
+)
+
+_fv_association = Table(
+    'fv_association', Edk2DB.Base.metadata,
+    Column('left_id', Integer, ForeignKey('fv.id')),
+    Column('right_id', Integer, ForeignKey('instancedinf.id')),
+)
+_library_association = Table(
+    'library_association', Edk2DB.Base.metadata,
+    Column('left_id', Integer, ForeignKey('inf.id')),
+    Column('right_id', Integer, ForeignKey('library.id')),
+)
+
+_inf_association = Table(
+    'inf_association', Edk2DB.Base.metadata,
+    Column('left_id', Integer, ForeignKey('instancedinf.id')),
+    Column('right_id', Integer, ForeignKey('instancedinf.id')),
+)
+
+class Environment(Edk2DB.Base):
+    """A class to represent an environment in the database."""
+    __tablename__ = "environment"
+
+    id: Mapped[str] = mapped_column(primary_key=True)
+    date: Mapped[datetime.datetime] = mapped_column(insert_default=func.now())
+    version: Mapped[str] = mapped_column(String(40))
+    values: Mapped[List["Value"]] = relationship(back_populates="env", cascade="all, delete-orphan")
+
+class Value(Edk2DB.Base):
+    """A class to represent a key-value pair in the database."""
+    __tablename__ = "value"
+
+    env_id: Mapped[str] = mapped_column(ForeignKey("environment.id"), primary_key=True, index=True)
+    key: Mapped[str] = mapped_column(primary_key=True)
+    value: Mapped[str]
+    env: Mapped["Environment"] = relationship(back_populates="values")
+
+class Inf(Edk2DB.Base):
+    """A class to represent an INF file in the database."""
+    __tablename__ = "inf"
+
+    id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)
+    path: Mapped[str] = mapped_column(unique=True)
+    guid: Mapped[str]
+    library_class: Mapped[Optional[str]]
+    package_name: Mapped[Optional[str]] = mapped_column(ForeignKey("package.name"))
+    module_type: Mapped[Optional[str]]
+    sources: Mapped[List["Source"]] = relationship(secondary=_source_association)
+    libraries:  Mapped[List["Library"]] = relationship(secondary=_library_association)
+
+class InstancedInf(Edk2DB.Base):
+    """A class to represent an instanced INF file in the database."""
+    __tablename__ = "instancedinf"
+
+    id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)
+    env: Mapped[str] = mapped_column(ForeignKey("environment.id"), index=True)
+    path: Mapped[str] = mapped_column(ForeignKey("inf.path"))
+    arch: Mapped[str]
+    name: Mapped[str]
+    package: Mapped[Optional["Package"]] = relationship()
+    package_id: Mapped[Optional[int]] = mapped_column(ForeignKey("package.id"))
+    repository: Mapped["Repository"] = relationship()
+    repository_id: Mapped[Optional[int]] = mapped_column(ForeignKey("repository.id"))
+    dsc: Mapped[str]
+    cls: Mapped[Optional[str]]
+    component: Mapped[str] = mapped_column(ForeignKey("inf.path"))
+    libraries: Mapped[List["InstancedInf"]] = relationship(
+        secondary=_inf_association,
+        primaryjoin=(id == _inf_association.c.left_id),
+        secondaryjoin=(id == _inf_association.c.right_id),
+    )
+    sources: Mapped[List["Source"]] = relationship(
+        secondary=_instance_source_association,
+    )
+
+class Source(Edk2DB.Base):
+    """A class to represent a source file in the database."""
+    __tablename__ = "source"
+
+    id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)
+    path: Mapped[str] = mapped_column(unique=True)
+    license: Mapped[Optional[str]]
+    total_lines: Mapped[Optional[int]]
+    code_lines: Mapped[Optional[int]]
+    comment_lines: Mapped[Optional[int]]
+    blank_lines: Mapped[Optional[int]]
+
+class Library(Edk2DB.Base):
+    """A class to represent a library in the database."""
+    __tablename__ = "library"
+
+    id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)
+    name: Mapped[str] = mapped_column(unique=True)
+
+class Repository(Edk2DB.Base):
+    """A class to represent a repository in the database."""
+    __tablename__ = "repository"
+    __table_args__ = (UniqueConstraint("name", "path"),)
+
+    id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)
+    name: Mapped[str]
+    path: Mapped[Optional[str]]
+    packages: Mapped[List["Package"]] = relationship("Package", back_populates="repository")
+
+
+class Package(Edk2DB.Base):
+    """A class to represent a package in the database."""
+    __tablename__ = "package"
+    __table_args__ = (UniqueConstraint("name", "path"),)
+
+    id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)
+    name: Mapped[str]
+    path: Mapped[str]
+    repository: Mapped["Repository"] = relationship("Repository", back_populates="packages")
+    repository_id: Mapped[int] = mapped_column(ForeignKey("repository.id"))
+
+class Fv(Edk2DB.Base):
+    """A class to represent an FV in the database."""
+    __tablename__ = "fv"
+
+    id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)
+    env: Mapped[str] = mapped_column(ForeignKey("environment.id"), index=True)
+    name: Mapped[str]
+    fdf: Mapped[str]
+    infs: Mapped[List["InstancedInf"]] = relationship(secondary=_fv_association)
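
Note on the `Environment` and `Value` mappings above: together they appear to record the environment of each parse under that parse's unique key, which is what lets rows from different parses share one database. The sketch below illustrates the idea with the standard-library `sqlite3` and `uuid` modules only. The helper `record_parse`, the version string `"example"`, and the environment keys `TARGET` and `TOOL_CHAIN_TAG` are assumptions chosen for illustration, and the hand-written `CREATE TABLE` statements only approximate the mapped schema; none of this is the code Edk2DB itself runs.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE environment (
        id TEXT PRIMARY KEY,
        date TEXT DEFAULT CURRENT_TIMESTAMP,
        version TEXT
    );
    CREATE TABLE value (
        env_id TEXT REFERENCES environment(id),
        key TEXT,
        value TEXT,
        PRIMARY KEY (env_id, key)
    );
""")


def record_parse(env: dict) -> str:
    """Hypothetical helper: store one parse's environment and return its unique key."""
    env_id = uuid.uuid4().hex  # 32 hexadecimal characters
    conn.execute(
        "INSERT INTO environment (id, version) VALUES (?, ?)", (env_id, "example")
    )
    conn.executemany(
        "INSERT INTO value (env_id, key, value) VALUES (?, ?, ?)",
        [(env_id, key, value) for key, value in env.items()],
    )
    return env_id


# Two parses of the same workspace with different build targets.
debug_id = record_parse({"TARGET": "DEBUG", "TOOL_CHAIN_TAG": "GCC5"})
release_id = record_parse({"TARGET": "RELEASE", "TOOL_CHAIN_TAG": "GCC5"})

# Look up which parse(s) were made with TARGET=DEBUG.
debug_parses = [
    row[0]
    for row in conn.execute(
        "SELECT env_id FROM value WHERE key = 'TARGET' AND value = 'DEBUG'"
    )
]
```

Tables such as `instancedinf` and `fv` carry an `env` column that references `environment.id`, so a key found this way can be used to filter their rows down to a single parse.
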