Create abstract interface #61

reece · 2019-05-29T02:22:12Z

seqrepo currently has distinct backends for aliases and sequences. Both are essentially key-value stores.

In order to open up other possibilities (redis, elasticache, federated repo proxy, local cache, etc.), implement abstract classes and a subclass for each backend type.
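As a rough illustration of the idea, here is a minimal sketch of an abstract key-value interface with one concrete subclass. The names `KeyValueBackend`, `InMemoryBackend`, `fetch`, and `store` are hypothetical and are not seqrepo's actual API; the dict-backed class only stands in for a real backend such as sqlite or redis.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class KeyValueBackend(ABC):
    """Hypothetical abstract interface; not seqrepo's actual API."""

    @abstractmethod
    def fetch(self, key: str) -> Optional[str]:
        """Return the value stored under key, or None if absent."""

    @abstractmethod
    def store(self, key: str, value: str) -> None:
        """Store value under key."""


class InMemoryBackend(KeyValueBackend):
    """Toy dict-backed implementation standing in for sqlite, redis, etc."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def fetch(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def store(self, key: str, value: str) -> None:
        self._data[key] = value
```

Callers would depend only on `KeyValueBackend`, so a different backend could be swapped in without touching them.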

teemuvesala · 2019-07-02T06:35:17Z

Does this mean that we'll be able to move data off SQLite3? If so, I can help speed up the development. That's one of our requirements, and I'm currently investigating whether it's possible.

teemuvesala · 2019-07-02T08:12:42Z

To me, the database-management code looks really straightforward and mostly uses Python's DB API. My notes and first review points on the non-standard DB usage:

Initialisation
A couple of places where integrity errors are ignored by catching the sqlite3-specific exception and passing. Why are they ignored? Can this cause data to go missing from the database?

Of course, if you want more than a "select any SQL database" kind of abstraction, then the code requires a lot more work. But in our case it would be enough to:

Get an SQL dump from the pulled SQLite3 database.
Export it to PostgreSQL or some other database.
Make the library database-engine agnostic by accepting the DB object as an initialisation parameter.
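The first and last steps above could be sketched roughly as follows. This assumes the standard library's `sqlite3.Connection.iterdump()` for the dump; the `AliasStore` class and the `aliases` table are hypothetical names used only for illustration. The dump is in SQLite's dialect, so it may need adjusting before another engine accepts it.

```python
import sqlite3


def dump_sql(conn: sqlite3.Connection) -> str:
    """Return the database contents as SQL text."""
    return "\n".join(conn.iterdump())


class AliasStore:
    """Hypothetical wrapper that receives its connection instead of opening one."""

    def __init__(self, db):
        self._db = db  # assumed to be any DB-API 2.0 connection object

    def count(self) -> int:
        cursor = self._db.cursor()
        cursor.execute("SELECT COUNT(*) FROM aliases")
        return cursor.fetchone()[0]


# Build a tiny throwaway database to demonstrate both pieces.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE aliases (alias TEXT, seq_id TEXT)")
cur.execute("INSERT INTO aliases VALUES ('NM_000551.3', 'abc123')")
conn.commit()

dump = dump_sql(conn)      # SQL text that could be loaded elsewhere
store = AliasStore(conn)   # the connection is injected, not created inside
```

Because `AliasStore` never imports a driver itself, the caller decides which database engine sits behind it.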

teemuvesala · 2019-07-02T10:38:56Z

After reading the code more closely, it seems that the sqlite3 module is used in a non-standard way. It provides a shortcut that avoids a separate (visible) cursor object, and that shortcut is used everywhere. DB-API 2.0 requires an explicit cursor object, so, for example, the following is not valid DB-API 2.0:

self._db.execute("<sql query>").fetchone()

This should be:

cursor = self._db.cursor()
cursor.execute("<sql query>")
cursor.fetchone()

reece · 2019-07-02T15:06:43Z

@teemuvesala...

Re: Yes, the intention for this issue is to refactor the current sqlite-specific backend into a service interface with multiple implementations (one of which is sqlite).

Can you please elaborate on your requirements? I would certainly appreciate code contributions if we're aligned on direction.

Are you imagining putting sequences in postgresql too? If so, do you have plans for optimizing sequence slicing (e.g., reading a few nucleotides from a chromosome)?

Two of your suggestions should be undertaken anyway: 1) db as init arg, and 2) deconstruct the execute().fetchone() construct. Let's save (1) for the refactor (since it'll come up then anyway). Want to submit a PR for (2)?

Can you elaborate on your comment about exceptions being ignored? Examples will help.

teemuvesala · 2019-07-02T17:08:35Z

Our current requirement (which is almost immediate) is to replace SQLite with PostgreSQL, but the long-term plan is to get DynamoDB support.

So from a roadmap perspective, a good plan would be to: 1) Remove SQLite-specific code (which is your step 2); I've started on it and will finish it. 2) Add support for any Python module that is DB-API 2.0 compatible. 3) Add a non-SQL way to do things.

The file biocommons.seqrepo/src/biocommons/seqrepo/seqaliasdb/seqaliasdb.py has the following snippet in the function store_alias:

try:
    c = self._db.execute(...)  # SQL statement removed for simplicity
    # success => new record
    return c.lastrowid
except sqlite3.IntegrityError:
    pass

The problem with this snippet is that the exception is SQLite-specific, even though it follows the DB-API 2.0 specification correctly. If _db were some other driver (e.g. imaginarydb), the exception would be imaginarydb.IntegrityError. The only common parent class is Exception. So to support non-SQLite3 drivers, we'd have to check whether the class name of the exception is IntegrityError; if it is, ignore the error, otherwise re-raise the caught exception.
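One possible alternative to matching on the class name: DB-API 2.0 asks each driver module to expose its own `IntegrityError`, so the driver module can be passed in alongside the connection and its exception looked up at runtime. The sketch below assumes that approach; `AliasWriter` and the `aliases` table are hypothetical, and the `?` placeholder is sqlite3's style, so it would differ for other drivers.

```python
import sqlite3


class AliasWriter:
    """Hypothetical class; `driver` is any DB-API 2.0 module (e.g. sqlite3)."""

    def __init__(self, driver, db):
        self._driver = driver  # module providing IntegrityError
        self._db = db          # connection created by that module

    def store_alias(self, alias: str):
        cursor = self._db.cursor()
        try:
            cursor.execute("INSERT INTO aliases (alias) VALUES (?)", (alias,))
            # success => new record
            return cursor.lastrowid
        except self._driver.IntegrityError:
            # duplicate alias: ignored here, as in the snippet quoted above
            return None


db = sqlite3.connect(":memory:")
setup = db.cursor()
setup.execute("CREATE TABLE aliases (alias TEXT PRIMARY KEY)")

writer = AliasWriter(sqlite3, db)
first = writer.store_alias("NM_000551.3")   # inserted
second = writer.store_alias("NM_000551.3")  # duplicate, swallowed
```

Any other exception type still propagates, because only the driver's own `IntegrityError` is caught.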

teemuvesala · 2019-07-05T09:25:13Z

My repository now has code that can also use PostgreSQL as the database. It needed quite a few changes; for example, query parametrisation had to change. With SQLite the style is SELECT * FROM table WHERE id=?, but with Psycopg2 it's SELECT * FROM table WHERE id=%s. It works well enough in read-only mode for our use.
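The placeholder difference comes from each driver's DB-API `paramstyle` attribute (`"qmark"` for sqlite3, `"pyformat"` for Psycopg2). A minimal sketch of adapting queries follows; `adapt_placeholders` is a hypothetical helper, not code from the repository mentioned above, and it is deliberately naive: it does not skip `?` inside string literals or escape literal `%` characters.

```python
import sqlite3


def adapt_placeholders(sql: str, paramstyle: str) -> str:
    """Rewrite '?' placeholders to suit the given DB-API paramstyle.

    Naive sketch: every '?' is treated as a placeholder.
    """
    if paramstyle == "qmark":
        return sql
    if paramstyle in ("format", "pyformat"):
        return sql.replace("?", "%s")
    raise ValueError(f"unsupported paramstyle: {paramstyle}")


query = "SELECT * FROM table WHERE id=?"
for_sqlite = adapt_placeholders(query, sqlite3.paramstyle)
for_psycopg2 = adapt_placeholders(query, "pyformat")  # Psycopg2's paramstyle
```

Writing queries once in one style and converting at the boundary keeps the SQL itself in a single place.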

reece · 2023-08-03T01:43:44Z

The intended class hierarchy is shown here:

github-actions · 2023-09-26T01:42:18Z

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions · 2023-10-03T01:42:50Z

This issue was closed because it has been stalled for 7 days with no activity.

reece added this to the 0.7 milestone Jun 9, 2020

korikuzma mentioned this issue Jul 24, 2023

Add new backends to seqrepo biocommons/hackathon-2023#12

Open

reece changed the title ~~Create abstract backend classes and adapt current backends~~ Create abstract interface Aug 19, 2023

github-actions bot added the stale Issue is stale and subject to automatic closing label Sep 26, 2023

github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Oct 3, 2023

reece added this to biocommons long-term roadmap Nov 27, 2023

reece removed the stale Issue is stale and subject to automatic closing label Nov 27, 2023

reece reopened this Nov 27, 2023

reece mentioned this issue Feb 5, 2024

Generalize the seqrepo interface and implement new backends #136

Open

reece mentioned this issue Feb 19, 2024

Support custom/federated data #132

Closed
