Site format inconsistency across articles #4057

Open
2 of 4 tasks
shyousefi opened this issue Nov 14, 2024 · 2 comments
Labels: correction, metadata

Comments

@shyousefi

Confirm that this is a metadata correction

  • I want to file corrections to make the metadata match the PDF file hosted on the ACL Anthology.

Anthology ID

2024.signlang-1.3

Type of Paper Metadata Correction

  • Paper Title
  • Paper Abstract
  • Author Name(s)

Correction to Paper Title

No response

Correction to Paper Abstract

The site format varies across articles on this page (and on others). For some articles, the abstract cannot be accessed from this page: when scraping it, the abstract line remains empty. To obtain the abstract, you must navigate to the individual article link. In such cases, extracting it from the PDF becomes necessary.

Correction to Author Name(s)

No response

shyousefi added the correction and metadata labels on Nov 14, 2024
@mbollmann
Member

Whether abstracts appear on the website or not depends on the metadata the workshop organizers supplied us; we don’t scrape PDFs, for example, to get the abstracts. The inconsistency between the volume page and the individual paper pages is something that ideally shouldn’t happen, though.

However, I would really not recommend scraping the web pages at all; you can extract all information directly from our XML files or access them through our Python library.
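As a rough illustration of the XML route, the sketch below reads abstracts with Python's standard library instead of scraping HTML. The embedded sample is a made-up, heavily abbreviated stand-in for an Anthology XML file: the element and attribute names (`collection`, `volume`, `paper`, `abstract`, `id`) and the way the full ID is assembled are assumptions about the file layout, not something confirmed in this thread, and the titles and abstract are placeholders. A paper with no `abstract` element yields `None`, which corresponds to the empty abstract line seen when scraping.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abbreviated stand-in for an Anthology XML file.
# Titles and abstract text are placeholders, not real metadata.
SAMPLE_XML = """\
<collection id="2024.signlang">
  <volume id="1">
    <paper id="1">
      <title>Placeholder title one</title>
      <abstract>Placeholder abstract with <i>inline</i> markup.</abstract>
    </paper>
    <paper id="2">
      <title>Placeholder title two</title>
    </paper>
  </volume>
</collection>
"""


def abstracts_by_id(xml_text):
    """Map full paper IDs to abstract text, or None when no abstract exists."""
    root = ET.fromstring(xml_text)
    collection_id = root.get("id")
    result = {}
    for volume in root.iter("volume"):
        for paper in volume.iter("paper"):
            # Assumed ID scheme: <collection id>-<volume id>.<paper id>
            paper_id = f"{collection_id}-{volume.get('id')}.{paper.get('id')}"
            abstract = paper.find("abstract")
            # itertext() flattens inline markup such as <i>...</i>
            result[paper_id] = (
                "".join(abstract.itertext()).strip()
                if abstract is not None
                else None
            )
    return result


abstracts = abstracts_by_id(SAMPLE_XML)
for paper_id, text in abstracts.items():
    print(paper_id, "->", text if text is not None else "(no abstract in metadata)")
```

For a real file, the string passed to `abstracts_by_id` would be the contents of the relevant XML file from the Anthology repository; the Python library mentioned above is likely the more robust choice if it is available.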

@shyousefi
Author

shyousefi commented Nov 15, 2024 via email
