Problems with BioThings BindingDB #201

colleenXu · 2024-05-22T05:12:03Z

@newgene @andrewsu @everaldorodrigo @rjawesome

It looks like there are a few problems with the current BioThings BindingDB API, and it would be helpful to fix these and maybe update the data.

1. Andy Crouse (Translator UI) found that some relation.bindingdb_link urls no longer work. I wonder if some urls were updated; if so, using a recent data release may help.
   - His example (Translator Slack link) comes from this record in the API with this relation.bindingdb_link url. I think this relationship still exists; see the bottom row here.
   - Other urls still work: this record in the API has this working relation.bindingdb_link.
2. Problems with incorrect, outdated, or problematic object fields. Perhaps using a recent data release would help, PLUS adjusting the parser. I see that Rohan started some work on adjusting the parser...
3. Not broken, but a nice-to-have-if-possible: adjusting the parser to assign more specific relationships.

Notes:

- Previous issue that created the BindingDB API data source: BindingDB #70
- BindingDB may update fairly frequently, maybe monthly? See https://www.bindingdb.org/rwd/bind/chemsearch/marvin/Download.jsp


everaldorodrigo · 2024-05-31T20:19:46Z

@colleenXu, the latest data was released to the CI environment.

colleenXu · 2024-06-13T04:06:35Z

I'm looking at the current CI responses now...

I think there's a parsing issue with subject.uniprot.secondary_accession. In this document, the array holds a single string element that should have been split into separate values: "B4DYS6 D3DVV8 P19138 P20426 Q14013 Q5U065". Compare it to the same document in ncats.io.
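A minimal sketch of the intended split, assuming the parser receives the field either as one whitespace-joined string or as a list of such strings. The helper name split_accessions is hypothetical and is not taken from the BindingDB parser. As a design choice, it also flattens nested lists and drops duplicates while keeping first-seen order.

```python
from typing import Any, List


def split_accessions(value: Any) -> List[str]:
    """Flatten a field value into individual accession IDs.

    Hypothetical helper: accepts a string, a list of strings, or nested
    lists, splits every string on whitespace, and drops duplicates while
    preserving first-seen order.
    """
    out: List[str] = []
    seen = set()

    def walk(v: Any) -> None:
        if isinstance(v, str):
            for acc in v.split():
                if acc not in seen:
                    seen.add(acc)
                    out.append(acc)
        elif isinstance(v, (list, tuple)):
            for item in v:
                walk(item)
        # Any other type (e.g. None) contributes nothing.

    walk(value)
    return out


raw = ["B4DYS6 D3DVV8 P19138 P20426 Q14013 Q5U065"]
print(split_accessions(raw))
# ['B4DYS6', 'D3DVV8', 'P19138', 'P20426', 'Q14013', 'Q5U065']
```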

colleenXu · 2024-06-13T04:20:34Z

Regarding problem 1 (relation.bindingdb_link urls not reaching the actual webpages)...

This seems to be addressed in CI! It looks like enzyme names were updated, which meant the webpage urls also needed to be updated.

- In my opening post, I pointed out this record with this problematic bindingdb_link. The current CI's corresponding record has a different bindingdb_link that works! The only difference I see in the urls is the enzyme name.
- I found another example of a record with a problematic bindingdb_link. Again, the current CI's corresponding record has a different bindingdb_link that works!

colleenXu · 2024-06-13T05:12:40Z

Regarding problem 2 (object field values are incorrect/problematic/outdated)...

Some problems were addressed in CI!

- object.chembl: multiple IDs now seem to be correctly split. I checked all previous examples. Note that I still haven't checked how reliable/accurate these IDs are.
- object.name: multiple values now seem to be correctly split.

One idea is to double-check how reliable the chembl IDs are and, if they're good, to switch the BTE/x-bte annotation to use them rather than inchikey (current) or pubchem_cid (previous).

However, this would decrease our coverage of this resource to <50% (the old breakdown's proportions are still roughly correct).

Some problems still exist. We may have to dig deeper into the data/parser to figure these out...

- object.pubchem_cid: all "incorrect" values are still there (more details in another post).
- object.inchikey: all "incorrect" values are still there.

More investigation into the inchikey examples:

Example 1: CI has the object.inchikey YQCLAYRIYWYIKH-UHFFFAOYSA-N. But Translator's NodeNorm doesn't recognize this ID, and it maps the object's chembl IDs to slightly different inchikeys:

- CHEMBL341945 to YQCLAYRIYWYIKH-MKCFTUBBSA-N
- CHEMBL106813 to YQCLAYRIYWYIKH-WGPBWIAQSA-N

Example 2: CI has the object.inchikey ZUXABONWMNSFBN-UHFFFAOYSA-N for clozapine. But Translator's NodeNorm treats this inchikey as a different entity, 3-chloro-6-(4-methyl-1-piperazinyl)-5H-benzo[b][1,4]benzodiazepine. Instead, NodeNorm uses a different inchikey for clozapine: QZUDBNBUXVUHMW-UHFFFAOYSA-N.
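The two examples fail in different ways, and the layout of a standard InChIKey helps tell them apart: the first 14-character block hashes the connectivity, the second block hashes the stereo/isotope layers (plus a standard flag and a version letter), and the last character encodes protonation. In Example 1 only the second block differs, and UHFFFAOYSA is the block usually seen when no stereo layer is present, so the CI key looks like a stereo-less form of the same skeleton. In Example 2 the first blocks differ, so the keys describe different connectivity. A small triage sketch; the helper names are hypothetical and not part of NodeNorm or the parser:

```python
import re

# Standard InChIKey layout: 14-char connectivity hash, hyphen,
# 8-char stereo/isotope hash + standard flag + version letter, hyphen,
# 1-char protonation indicator.
INCHIKEY_RE = re.compile(r"([A-Z]{14})-([A-Z]{10})-([A-Z])")

# Second block commonly seen when a standard key has no stereo/isotope layer.
NO_STEREO_BLOCK = "UHFFFAOYSA"


def _parse(key: str) -> re.Match:
    m = INCHIKEY_RE.fullmatch(key)
    if m is None:
        raise ValueError(f"not a standard-format InChIKey: {key!r}")
    return m


def has_no_stereo_layer(key: str) -> bool:
    """True if the key's second block is the 'no stereo' block."""
    return _parse(key).group(2) == NO_STEREO_BLOCK


def compare_inchikeys(a: str, b: str) -> str:
    """Classify how two standard InChIKeys differ (hypothetical helper)."""
    ma, mb = _parse(a), _parse(b)
    if a == b:
        return "identical"
    if ma.group(1) != mb.group(1):
        return "different connectivity (different compound)"
    if ma.group(2) != mb.group(2):
        return "same connectivity, different stereo/isotope layer"
    return "same connectivity and stereo, different protonation"


ci_key = "YQCLAYRIYWYIKH-UHFFFAOYSA-N"
for nn_key in ["YQCLAYRIYWYIKH-MKCFTUBBSA-N", "YQCLAYRIYWYIKH-WGPBWIAQSA-N"]:
    print(nn_key, compare_inchikeys(ci_key, nn_key))
print(compare_inchikeys("ZUXABONWMNSFBN-UHFFFAOYSA-N", "QZUDBNBUXVUHMW-UHFFFAOYSA-N"))
```

The hash blocks cannot be decoded back into a structure, so this only flags the kind of mismatch; resolving which key is correct still needs the source record or a lookup service.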

And a note: problem 3 (optional, more specific relationships) hasn't been addressed yet.

everaldorodrigo · 2024-06-26T01:19:35Z

> I'm looking at the current CI responses now...
>
> I think there's a parsing issue with subject.uniprot.secondary_accession. In this document, it looks like the 1-string-element should have been split for each value "B4DYS6 D3DVV8 P19138 P20426 Q14013 Q5U065". Compare it to the same document in ncats.io.

Hi @colleenXu,

Now, the field subject.uniprot.secondary_accession has the values split for each value.

It's deployed to the CI environment. Let me know if it is as expected.

colleenXu · 2024-06-27T23:03:18Z

@everaldorodrigo

subject.uniprot.secondary_accession now looks wrong in a different way.

Sometimes the array's last value is itself an array (is a duplication happening somewhere?). Examples:

newgene · 2024-06-28T04:35:07Z

good catch @colleenXu !

Also want to mention that this kind of parsing issue can be caught early if we run the inspect step after the data upload. It should warn about any field whose values have mixed data types. @everaldorodrigo
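This is not the BioThings inspect implementation, only a minimal sketch of the kind of check described above, assuming documents are plain JSON-like dicts. It walks each document, records the Python type name of every value under its dotted field path (treating list elements as values of the same path), and reports paths that saw more than one type. An array nested inside an array, like the secondary_accession problem above, shows up as "list" alongside "str". The two documents are toy data, not real records.

```python
from collections import defaultdict
from typing import Any, DefaultDict, Dict, Iterable, Set

TypeMap = DefaultDict[str, Set[str]]


def collect_types(value: Any, path: str, acc: TypeMap) -> None:
    """Record the type name of every value under its dotted field path."""
    if isinstance(value, dict):
        if path:  # the document root itself has no path
            acc[path].add("dict")
        for key, child in value.items():
            collect_types(child, f"{path}.{key}" if path else key, acc)
    elif isinstance(value, list):
        for item in value:
            if isinstance(item, list):
                # An array nested inside an array is flagged as its own type.
                acc[path].add("list")
            else:
                collect_types(item, path, acc)
    else:
        acc[path].add(type(value).__name__)


def find_mixed_types(docs: Iterable[Dict[str, Any]]) -> Dict[str, Set[str]]:
    """Return field paths whose values have more than one type across docs."""
    acc: TypeMap = defaultdict(set)
    for doc in docs:
        collect_types(doc, "", acc)
    return {path: types for path, types in acc.items() if len(types) > 1}


docs = [
    {"subject": {"uniprot": {"secondary_accession": ["B4DYS6", "D3DVV8"]}}},
    {"subject": {"uniprot": {"secondary_accession": ["P19138", ["P20426", "Q14013"]]}}},
]
print(find_mixed_types(docs))
```

Note that an int/float mix in a numeric field would also be flagged, which may or may not matter for a given field.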

colleenXu added the "bug" (Something isn't working) and "enhancement" (New feature or request) labels on May 22, 2024.

newgene assigned everaldorodrigo on May 22, 2024.

everaldorodrigo linked a pull request on Jun 11, 2024 that will close this issue: "New data on May 2024" biothings/BindingDB#2 (Draft).
