Support arXiv citations #17

polyluxus · 2018-10-02T11:17:38Z

Coming from your recent comment (edited) in the chemistry chat:

> Do you have any way of deciphering the DOI from a given arXiv URL, like for example https://arxiv.org/abs/1803.00014? This one has a DOI of 10.1103/PhysRevD.97.123524, but I find it impossible to comprehend from the URL (without loading and scraping the webpage, of course). Am I right?

I said you are right, and that an arXiv article might never be peer-reviewed or published at all. That does not mean you shouldn't be able to cite it. Going through all the motions when an arXiv source is cited will be fairly complicated, but we can leave that for another day and, for now, focus on supporting the platform itself.

You can use the arXiv API (help) at export.arxiv.org to extract the metadata. Let's continue with the example from above. Here is a test set with different URL forms:

https://arxiv.org/abs/1803.00014
https://arxiv.org/format/1803.00014
https://arxiv.org/pdf/1803.00014.pdf?
https://arxiv.org/ps/1803.00014?fname=cm&font=TypeI
arXiv:1803.00014

The last one, arXiv:1803.00014, is the arXiv identifier you need.

This should all be straightforward to parse. Wikidata's regex for the identifier is `(\d{4}.\d{4,5}|[a-z\-]+(\.[A-Z]{2})?/\d{7})(v\d+)?`. You actually only need the digit part at the front, so `[Aa][Rr][Xx][Ii][Vv][^\d]+(\d{4}\.\d{4,5})` should already give you the necessary number (see here that it works).
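A minimal Python sketch of that extraction step, run against the test set above. It uses the simplified pattern (with the dot escaped so it only matches a literal `.`); the function name `extract_arxiv_id` is a hypothetical helper, not part of any existing code base.

```python
import re
from typing import Optional

# Simplified pattern from this issue, with the dot escaped so it only matches ".":
# "arxiv" in any letter case, then a run of non-digits (".org/abs/", ":", ...),
# then a new-style identifier of the form dddd.dddd or dddd.ddddd.
ARXIV_ID_RE = re.compile(r"[Aa][Rr][Xx][Ii][Vv][^\d]+(\d{4}\.\d{4,5})")


def extract_arxiv_id(text: str) -> Optional[str]:
    """Return the first new-style arXiv identifier found in text, or None."""
    match = ARXIV_ID_RE.search(text)
    return match.group(1) if match else None


# The test set from this issue: every entry should yield "1803.00014".
TEST_SET = [
    "https://arxiv.org/abs/1803.00014",
    "https://arxiv.org/format/1803.00014",
    "https://arxiv.org/pdf/1803.00014.pdf?",
    "https://arxiv.org/ps/1803.00014?fname=cm&font=TypeI",
    "arXiv:1803.00014",
]

for item in TEST_SET:
    print(item, "->", extract_arxiv_id(item))
```

This only covers new-style identifiers: old-style ones of the `archive/ddddddd` form (the second branch of the Wikidata pattern) return `None`, and a version suffix such as `v2` is dropped because only the first group is kept.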

Now you only have to plug it into the API:
http://export.arxiv.org/api/query?id_list=1803.00014
and you'll get back the metadata.
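A small sketch of building that query URL and downloading the response with only the standard library. The helper names are hypothetical, and `fetch_metadata` needs network access, so only the URL builder is exercised here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the example URL in this issue.
API_ENDPOINT = "http://export.arxiv.org/api/query"


def build_query_url(arxiv_id: str) -> str:
    """Build the export.arxiv.org query URL for a single identifier."""
    return f"{API_ENDPOINT}?{urlencode({'id_list': arxiv_id})}"


def fetch_metadata(arxiv_id: str, timeout: float = 10.0) -> str:
    """Download the Atom XML returned by the API for arxiv_id (requires network)."""
    with urlopen(build_query_url(arxiv_id), timeout=timeout) as response:
        return response.read().decode("utf-8")


print(build_query_url("1803.00014"))
# prints http://export.arxiv.org/api/query?id_list=1803.00014
```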
You can extract the title with `<title>(.*)<\/title>`, the author(s) with `<author>(.*)<\/author>` (each author's name sits in a nested `<name>` element), and the primary category with `<arxiv:primary_category.+?(?=term)term="([^"]+)"[^>]+>` (ref.: How to match "anything up until this sequence of characters" in a regular expression?). As an add-on, get the DOI: I believe you already have a regex for that in place, or in this case it is simply `<arxiv:doi[^>]+>(.*)<\/arxiv:doi>`.
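Here is a sketch applying those patterns in Python. The sample below is a hand-trimmed illustration of the Atom layout, not a verbatim API response (most elements are omitted and the author-name spelling is an assumption). The title, category and DOI patterns are used as given; the author pattern is adapted to capture the nested `<name>` across line breaks.

```python
import re
from typing import Dict, List, Optional, Union

# Hand-trimmed illustration of the Atom layout, NOT a verbatim API response.
SAMPLE_RESPONSE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title type="html">ArXiv Query: id_list=1803.00014</title>
  <entry>
    <title>Horndeski dark matter and beyond</title>
    <author>
      <name>A. Diez-Tejedor</name>
    </author>
    <author>
      <name>F. Flores</name>
    </author>
    <author>
      <name>G. Niz</name>
    </author>
    <arxiv:doi xmlns:arxiv="http://arxiv.org/schemas/atom">10.1103/PhysRevD.97.123524</arxiv:doi>
    <arxiv:primary_category xmlns:arxiv="http://arxiv.org/schemas/atom" term="gr-qc" scheme="http://arxiv.org/schemas/atom"/>
  </entry>
</feed>"""

# The feed-level <title type="html"> has an attribute, so this pattern skips it.
TITLE_RE = re.compile(r"<title>(.*)<\/title>")
# Adapted: the name sits on its own line inside <author>, so read the <name> child.
AUTHOR_RE = re.compile(r"<author>\s*<name>(.*?)</name>")
CATEGORY_RE = re.compile(r'<arxiv:primary_category.+?(?=term)term="([^"]+)"[^>]+>')
DOI_RE = re.compile(r"<arxiv:doi[^>]+>(.*)<\/arxiv:doi>")


def first_group(pattern: "re.Pattern[str]", xml: str) -> Optional[str]:
    """Return the stripped first capture group of the first match, or None."""
    match = pattern.search(xml)
    return match.group(1).strip() if match else None


def parse_entry(xml: str) -> Dict[str, Union[Optional[str], List[str]]]:
    """Pull title, authors, primary category and DOI out of one API response."""
    return {
        "title": first_group(TITLE_RE, xml),
        "authors": [name.strip() for name in AUTHOR_RE.findall(xml)],
        "category": first_group(CATEGORY_RE, xml),
        "doi": first_group(DOI_RE, xml),
    }


print(parse_entry(SAMPLE_RESPONSE))
```

Regexes over XML are brittle (a title wrapped over several lines would, for instance, need `re.DOTALL`); parsing with `xml.etree.ElementTree` and the Atom namespace would be a sturdier alternative.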

Then cite it as follows:

short: arXiv:dddd.dddd(d) **\[prim_categ\]**

long:

1. Author, A.; Author, B. TITLE(.) arXiv:dddd.dddd(d) **\[prim_categ\]**

Specifically in this case:

short: arXiv:1803.00014 **[gr-qc]**

long:

1. Diez-Tejedor, A.; Flores, F.; Niz, G. Horndeski dark matter and beyond. 
arXiv:1803.00014 **[gr-qc]**
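The short and long templates above can be sketched as plain string formatting. `to_last_first` is a hypothetical heuristic for turning "Given Family" names into the "Family, G." form used here; it assumes the family name is the last whitespace-separated token, which will not hold for every name.

```python
from typing import List


def to_last_first(name: str) -> str:
    """Heuristic 'Given Family' -> 'Family, G.' conversion (hypothetical helper).

    Assumes the family name is the last whitespace-separated token, so
    multi-word surnames are split incorrectly.
    """
    parts = name.rsplit(" ", 1)
    if len(parts) == 1:
        return name  # nothing to reorder
    given, family = parts
    initials = " ".join(part[0] + "." for part in given.split())
    return f"{family}, {initials}"


def format_short(arxiv_id: str, category: str) -> str:
    """Short form: arXiv:dddd.dddd(d) **[prim_categ]**"""
    return f"arXiv:{arxiv_id} **[{category}]**"


def format_long(authors: List[str], title: str, arxiv_id: str,
                category: str, number: int = 1) -> str:
    """Long form: N. Author, A.; Author, B. TITLE(.) arXiv:dddd.dddd(d) **[prim_categ]**"""
    author_list = "; ".join(authors)
    # TITLE(.): add the closing period only if the title doesn't already end with one.
    title_part = title if title.endswith(".") else title + "."
    return f"{number}. {author_list} {title_part} {format_short(arxiv_id, category)}"


authors = [to_last_first(n) for n in ["A. Diez-Tejedor", "F. Flores", "G. Niz"]]
print(format_short("1803.00014", "gr-qc"))
print(format_long(authors, "Horndeski dark matter and beyond", "1803.00014", "gr-qc"))
```

The long output is printed on one line, as in the template; the line break in the rendered example above is only presentation.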

Once you have the DOI, you can extend this to include the journal publication (though I think that only makes sense in the long format):

long:

1. Diez-Tejedor, A.; Flores, F.; Niz, G. Horndeski dark matter and beyond. 
arXiv:1803.00014 **[gr-qc]** <br /> 
Published as: Diez-Tejedor, A.; Flores, F.; Niz, G. Horndeski dark matter and beyond. 
*Phys. Rev. D* **2018,** *97* (12), 123524. 
[DOI: 10.1103/PhysRevD.97.123524](https://doi.org/10.1103/PhysRevD.97.123524).
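A sketch of the "Published as:" line in the same layout. The journal fields (abbreviated name, year, volume, issue, pages) are not among the fields extracted from the arXiv response above, so they are passed in as plain parameters; where they come from (for example a DOI metadata lookup) is left open, and `format_published` is a hypothetical helper.

```python
from typing import List


def format_published(authors: List[str], title: str, journal: str, year: int,
                     volume: str, issue: str, pages: str, doi: str) -> str:
    """'Published as:' line in the layout used in this issue (hypothetical helper)."""
    author_list = "; ".join(authors)
    return (
        f"Published as: {author_list} {title}. "
        f"*{journal}* **{year},** *{volume}* ({issue}), {pages}. "
        f"[DOI: {doi}](https://doi.org/{doi})."
    )


print(format_published(
    ["Diez-Tejedor, A.", "Flores, F.", "Niz, G."],
    "Horndeski dark matter and beyond",
    journal="Phys. Rev. D", year=2018, volume="97", issue="12",
    pages="123524", doi="10.1103/PhysRevD.97.123524",
))
```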


GaurangTandon self-assigned this Dec 3, 2018

GaurangTandon added the enhancement (New feature or request) label Dec 3, 2018
