Closes #716 #722
base: main
The abstract is built as follows: `{title} {label}: {abstract.label}`
Mismatched offsets in 7 examples, all others pass
Some concrete examples of strange abstract creation (Example 1, Example 2): in Example 1, the start and end offsets match up if you include the "Title", but in Example 2 they match up only if you exclude the word "Title".
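A minimal sketch of how the mismatches above could be checked: build the text both with and without the title and see which variant the annotation offsets line up with. The function names, the single-space separator, and the sample record are assumptions made up for illustration; they are not taken from the dataloader in this PR.

```python
from typing import Dict, List, Tuple


def build_text(title: str, sections: List[Tuple[str, str]], include_title: bool = True) -> str:
    """Join the title and labelled abstract sections into one string.

    Follows the `{title} {label}: {text}` pattern from the PR description;
    the single-space separator between parts is an assumption.
    """
    parts = [title] if include_title else []
    parts += [f"{label}: {text}" for label, text in sections]
    return " ".join(parts)


def find_mismatches(text: str, annotations: List[Dict]) -> List[Dict]:
    """Return annotations whose [start, end) slice differs from their surface text."""
    return [a for a in annotations if text[a["start"]:a["end"]] != a["text"]]


# Hypothetical record, invented for illustration only.
title = "Aspirin in stroke"
sections = [("BACKGROUND", "Aspirin reduces risk.")]
annotations = [{"start": 30, "end": 37, "text": "Aspirin"}]

with_title = build_text(title, sections, include_title=True)
without_title = build_text(title, sections, include_title=False)

# An empty list means every offset lines up for that variant.
mismatches_with_title = find_mismatches(with_title, annotations)
mismatches_without_title = find_mismatches(without_title, annotations)
```

Running both variants per document would show whether a given record expects the title to be part of the text, which is the inconsistency described in the two examples.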
@phlobo What do we want to do with this dataset? It contains only the annotations, not the abstracts / texts. The latter could be downloaded via API; however, there might be a lot of offset errors due to changed content etc.
Would it be an option to include the abstracts (e.g., as a zip file) as part of the repo? I guess there are other datasets (MedMentions comes to mind) that redistribute PubMed abstracts as part of a GitHub repo.
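If the abstracts were shipped as a zip file, reading them could look roughly like the sketch below. The archive layout (one `<pmid>.txt` file per abstract) and the `read_abstracts` helper are hypothetical; the demo builds a small in-memory archive with placeholder text rather than real PubMed content.

```python
import io
import zipfile
from typing import Dict


def read_abstracts(zip_source) -> Dict[str, str]:
    """Map each PMID to its abstract text.

    Assumes (hypothetically) one `<pmid>.txt` file per abstract in the archive.
    """
    abstracts = {}
    with zipfile.ZipFile(zip_source) as zf:
        for name in zf.namelist():
            if name.endswith(".txt"):
                # Strip any folder prefix and the ".txt" suffix to get the PMID.
                pmid = name.rsplit("/", 1)[-1][: -len(".txt")]
                abstracts[pmid] = zf.read(name).decode("utf-8")
    return abstracts


# Build a tiny in-memory archive so the sketch runs without any real data.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("abstracts/12345.txt", "Placeholder abstract text.")
buffer.seek(0)

abstracts = read_abstracts(buffer)
```

Freezing the texts this way would keep the annotation offsets stable even if the upstream abstracts change later.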
Note: This dataset has a few issues
Is there a standard way these abstracts are formed?
- Create the dataloader script `biodatasets/my_dataset/my_dataset.py` (please use only lowercase and underscore for dataset naming).
- Provide values for the `_CITATION`, `_DATASETNAME`, `_DESCRIPTION`, `_HOMEPAGE`, `_LICENSE`, `_URLs`, `_SUPPORTED_TASKS`, `_SOURCE_VERSION`, and `_BIGBIO_VERSION` variables.
- Implement `_info()`, `_split_generators()` and `_generate_examples()` in dataloader script.
- Make sure that the `BUILDER_CONFIGS` class attribute is a list with at least one `BigBioConfig` for the source schema and one for a bigbio schema.
- Confirm dataloader script works with `datasets.load_dataset` function.
- Confirm that your dataloader script passes the test suite run with `python -m tests.test_bigbio biodatasets/my_dataset/my_dataset.py`.
- Note