support earth science records that can appear both as eprint and pub
Showing 10 changed files with 192 additions and 38 deletions.
4 files renamed without changes.
47 changes: 47 additions & 0 deletions
adsdocmatch/tests/unittests/stubdata/ArXiv/oai/eprints/2312/08579
@@ -0,0 +1,47 @@
<record>
<header>
<identifier>oai:arXiv.org:2312.08579</identifier>
<datestamp>2023-12-15</datestamp>
<setSpec>cs</setSpec>
<setSpec>physics:astro-ph</setSpec>
</header>
<metadata>
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:title>Identifying Planetary Names in Astronomy Papers: A Multi-Step Approach</dc:title>
<dc:creator>Shapurian, Golnaz</dc:creator>
<dc:creator>Kurtz, Michael J</dc:creator>
<dc:creator>Accomazzi, Alberto</dc:creator>
<dc:subject>Computer Science - Computation and Language</dc:subject>
<dc:subject>Astrophysics - Instrumentation and Methods for Astrophysics</dc:subject>
<dc:subject>Computer Science - Machine Learning</dc:subject>
<dc:description> The automatic identification of planetary feature names in astronomy
publications presents numerous challenges. These features include craters,
defined as roughly circular depressions resulting from impact or volcanic
activity; dorsas, which are elongate raised structures or wrinkle ridges; and
lacus, small irregular patches of dark, smooth material on the Moon, referred
to as "lake" (Planetary Names Working Group, n.d.). Many feature names overlap
with places or people's names that they are named after, for example, Syria,
Tempe, Einstein, and Sagan, to name a few (U.S. Geological Survey, n.d.). Some
feature names have been used in many contexts, for instance, Apollo, which can
refer to mission, program, sample, astronaut, seismic, seismometers, core, era,
data, collection, instrument, and station, in addition to the crater on the
Moon. Some feature names can appear in the text as adjectives, like the lunar
craters Black, Green, and White. Some feature names in other contexts serve as
directions, like craters West and South on the Moon. Additionally, some
features share identical names across different celestial bodies, requiring
disambiguation, such as the Adams crater, which exists on both the Moon and
Mars. We present a multi-step pipeline combining rule-based filtering,
statistical relevance analysis, part-of-speech (POS) tagging, named entity
recognition (NER) model, hybrid keyword harvesting, knowledge graph (KG)
matching, and inference with a locally installed large language model (LLM) to
reliably identify planetary names despite these challenges. When evaluated on a
dataset of astronomy papers from the Astrophysics Data System (ADS), this
methodology achieves an F1-score over 0.97 in disambiguating planetary feature
names.
</dc:description>
<dc:date>2023-12-13</dc:date>
<dc:type>text</dc:type>
<dc:identifier>http://arxiv.org/abs/2312.08579</dc:identifier>
</oai_dc:dc>
</metadata>
</record>
15 changes: 15 additions & 0 deletions
adsdocmatch/tests/unittests/stubdata/text/L48/L48-23288.abs
@@ -0,0 +1,15 @@
Title: Native H2 exploration in the western Pyrenean
foothills
Authors: Lefeuvre, Nicolas; Truche, Laurent;
Donze, Frederic Victor; Ducoux, Maxime;
Barré, Guillaume; Fakoury, Rose-Adeline;
Calassou, Sylvain; Gaucher, Eric
Journal: ESS Open Archive, id. essoar.10507102.1
Publication Date: 05/2021
Category: Earth Science
Origin: ESSOAR
DOI: 10.1002/essoar.10507102.1
Bibliographic Code: 2021esoar.10507102L

Abstract
Not Available
29 changes: 29 additions & 0 deletions
adsdocmatch/tests/unittests/stubdata/text/L52/L52-28159.abs
@@ -0,0 +1,29 @@
Title: Wildfire influence on recent US pollution trends
Authors: Burke, Marshall; Childs, Marissa;
de la Cuesta, Brandon; Qiu, Minghao; Li, Jessica;
Gould, Carlos; Heft-Neal, Sam; Wara, Michael
Journal: EarthArXiv Preprint, id. X58667
Publication Date: 12/2022
Category: Earth Science
Origin: EAARX
Keywords: Environmental Health and Protection
DOI: 10.31223/x58667
Bibliographic Code: 2022EaArX...X58667B

Abstract
Steady improvements in ambient air quality in the US over the past
several decades have led to large public health benefits. However,
recent trends in PM2.5 concentrations, a key pollutant, have stagnated
or begun to reverse throughout much of the US. We quantify the
contribution of wildfire smoke to these trends and find that since 2016,
wildfire smoke has significantly slowed or reversed previous
improvements in average annual PM2.5 concentrations in two-thirds of US
states, eroding 23% of previous gains on average in those states
(equivalent to 3.6 years of air quality progress) and over 50% in
multiple western states. Smoke influence on trends in extreme PM2.5
concentrations is detectable by 2010, but remains concentrated primarily
in western states. Wildfire-driven increases in ambient PM2.5
concentrations are unregulated under current air pollution law, and,
absent additional intervention, wildfire's contribution to regional and
national air quality trends is likely to grow as the climate continues
to warm.