-
Notifications
You must be signed in to change notification settings - Fork 0
Deeper Dive Into Schema Annotations
In this tutorial, we will highlight a few problematic patterns in web applications that may require more involved schema changes to be compatible with K9db.
For more details, consult the annotated schemas we provide for 7 applications in experiments/schema-annot
.
Many web applications do not actually spell out foreign key constraints in their schema. While they still have columns that act as foreign keys logically (i.e. represent a relationship between two tables), they do not apply the FOREIGN KEY
or REFERENCES
keywords to them.
This makes migrating such schemas to K9db tedious. Usually, it is sufficient to reason about only FK columns when selecting the correct K9db annotations to apply. However, when no FK constraints are explicit, we will need to consider all columns when thinking about annotations, while also reasoning about whether these columns represents a relationship or not.
We recommend introducing the FK constraints into the schema first, and then thinking about annotating them, rather than doing both at the same time.
Some applications may have a de-normalized schema. Sometimes, this may have an affect on annotating the schema.
We encountered one such pattern in ownCloud. Files in ownCloud may be shared with users or groups. This is in encoded in an oc_share
table. In particular, this table contains a share_with
column, that contains an identifier that may correspond to either a user or a group, depending on the value of the adjacent share_type
column. K9db annotations can only be applied to a FK. However, share_with
is not a valid FK, as it may be point to two different tables and columns based on a dynamic value in the row.
We address such scenarios by normalizing the schema. For example, in ownCloud, we add a new share_with_group
column to the oc_share
table, and make this new column a FK to the groups table, while making the share_with
a FK to the users table only. We allow both columns to be nullable. Now, a row that represents a share with a group can set the share_with_group
column but not the share_with
, and vice-versa. We can now apply K9db annotations (specifically, ACCESSED_BY
) to both of these columns to get the desired functionality.
Note that doing this in practice also requires modifying the application code to use the new schema correctly.
Schemas that are normalized in the traditional SQL sense may still be de-normalized with respect to ownership.
For example, in the conference submission system hotcrp
, the PaperConflict
table encodes a many-to-many relationship between papers and users, represented in the Paper
and Contact
tables. The exact relationship depends on the conflictType
column, which can represent authorship, an institutional conflict, a recent collaborator, etc.
We would like to specify that users who are authors of a paper are owners of it. At the same time, users that have other types of conflicts with a paper have no rights to it (i.e., are neither owners nor accessors), but both relationships are represented via the same relationship via PaperConflict
.
We address this by normalizing the schema with respect to ownership. We create a new table PaperAuthors
that only contains records, and annotate the FK linking it to Paper
with OWNS
. We keep PaperConflict
as is, but only use it to record conflicts different than authorship. Thus, we can keep its FK to Paper
unannotated.
As in ownCloud, this also requires modifying the application code to use the new schema.
K9db requires annotating one or more tables with DATA_SUBJECT
. Each row in these tables must represent a unique human person with GDPR rights in the application. Some applications may not contain such an explicit table. For example, mouthful
stores comments made by users in its Comment
table, which contains an Author
text column, that host applications use to identify authors of comments.
mouthful
itself does not store any additional information about these authors. In particular, it has no users
nor authors
tables. The Comment
table is not suited for the annotation DATA_SUBJECT
, as it does not represent unique data subject, specifically its ID doesn't uniquely identify data subjects.
We introduce a new user
table that we annotate with data subject. This new table only contains a text id
column. We make the Author
column in Comment
a FK to to user(id)
. This has the desired behavior of identifying a single data subject, and also connecting that data subject with all comments they authored.