Support arXiv citations #17

Open
polyluxus opened this issue Oct 2, 2018 · 0 comments

Coming from your recent comment (edited) in the chemistry chat:

> Do you have any way of deciphering the DOI from a given arXiv URL, like for example https://arxiv.org/abs/1803.00014? This one has a DOI of 10.1103/PhysRevD.97.123524 but I find it impossible to comprehend from the URL (without loading and scraping the webpage of course). Am I right?

I said you are right, and that an arXiv article might not be peer-reviewed or published at all. That does not mean you shouldn't be able to cite it. Going through all the motions whenever an arXiv source is cited will be fairly complicated, but we can leave that for another day and focus for now on supporting just the platform.

Basically, you can use the arXiv API (help) at export.arxiv.org to extract the metadata. Let's go ahead with the example from above. Here is a test set covering the different URL forms:

https://arxiv.org/abs/1803.00014
https://arxiv.org/format/1803.00014
https://arxiv.org/pdf/1803.00014.pdf?
https://arxiv.org/ps/1803.00014?fname=cm&font=TypeI
arXiv:1803.00014

The last one, arXiv:1803.00014, is the arXiv ID, which is what you need.

All of these should be straightforward to parse, taking the regex from Wikidata for the ID: `(\d{4}\.\d{4,5}|[a-z\-]+(\.[A-Z]{2})?/\d{7})(v\d+)?`. You actually only need the digit part at the front, so `[Aa][Rr][Xx][Ii][Vv][^\d]+(\d{4}\.\d{4,5})` should already give you the number you need; it works on the whole test set above.
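As a rough illustration of the parsing step, here is a minimal Python sketch that runs the simplified pattern over the test set. The function name `extract_arxiv_id` and the `TEST_SET` list are made up for this example, and the dot in the ID is escaped here so that it only matches a literal period. Old-style IDs (such as `gr-qc/0601001`) are deliberately not handled.

```python
import re

# Simplified pattern from the discussion: the word "arxiv" in any letter
# case, one or more non-digit characters, then a new-style ID (YYMM.NNNNN).
ARXIV_ID = re.compile(r"[Aa][Rr][Xx][Ii][Vv][^\d]+(\d{4}\.\d{4,5})")


def extract_arxiv_id(text):
    """Return the first new-style arXiv ID found in text, or None."""
    match = ARXIV_ID.search(text)
    return match.group(1) if match else None


# The URL forms listed in the issue.
TEST_SET = [
    "https://arxiv.org/abs/1803.00014",
    "https://arxiv.org/format/1803.00014",
    "https://arxiv.org/pdf/1803.00014.pdf?",
    "https://arxiv.org/ps/1803.00014?fname=cm&font=TypeI",
    "arXiv:1803.00014",
]

for candidate in TEST_SET:
    print(candidate, "->", extract_arxiv_id(candidate))
```

The version suffix (`v2` and so on) is dropped on purpose, since the capture group stops after the digits.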

Now you only have to plug it into the api:
http://export.arxiv.org/api/query?id_list=1803.00014
and you'll get back the metadata.
From the response you can extract the title with `<title>(.*)<\/title>`, the author(s) with `<author>(.*)<\/author>`, and the primary category with `<arxiv:primary_category.+?(?=term)term="([^"]+)"[^>]+>` [ref. How to match "anything up until this sequence of characters" in a regular expression?]. Add-on: get the DOI. I believe you already have a regex in place for that; in this case `<arxiv:doi[^>]+>(.*)<\/arxiv:doi>` easily does it.
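The query and extraction steps could look roughly like the following Python sketch. It makes no network request: `SAMPLE` is a hand-written, abbreviated stand-in for an API response, not real output, and its layout (a feed-level `<title>`, each author wrapped in `<name>`, namespace attributes on the `arxiv:` elements) is an assumption. Because of that assumed layout, the sketch first narrows the text to the `<entry>` block and adapts the author pattern to capture the `<name>` content. The names `build_query_url` and `parse_entry` are hypothetical.

```python
import re
from urllib.parse import urlencode

API_ENDPOINT = "http://export.arxiv.org/api/query"

# Hand-written stand-in for a response; the real feed layout is an assumption.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Hypothetical feed-level title</title>
  <entry>
    <title>Horndeski dark matter and beyond</title>
    <author><name>A. Diez-Tejedor</name></author>
    <author><name>F. Flores</name></author>
    <author><name>G. Niz</name></author>
    <arxiv:doi xmlns:arxiv="http://arxiv.org/schemas/atom">10.1103/PhysRevD.97.123524</arxiv:doi>
    <arxiv:primary_category xmlns:arxiv="http://arxiv.org/schemas/atom" term="gr-qc" scheme="http://arxiv.org/schemas/atom"/>
  </entry>
</feed>"""


def build_query_url(arxiv_id):
    """Build the metadata query URL for a single arXiv ID."""
    return API_ENDPOINT + "?" + urlencode({"id_list": arxiv_id})


def parse_entry(xml_text):
    """Pull title, authors, primary category and DOI out of the first entry."""
    entry = re.search(r"<entry>(.*?)</entry>", xml_text, re.DOTALL)
    if entry is None:
        return None
    body = entry.group(1)  # skip the feed-level <title> outside the entry
    title = re.search(r"<title>(.*)</title>", body)
    # Adapted author pattern: capture the text inside <name>, not the tags.
    authors = re.findall(r"<author>\s*<name>([^<]+)</name>\s*</author>", body)
    category = re.search(
        r'<arxiv:primary_category.+?(?=term)term="([^"]+)"[^>]+>', body
    )
    doi = re.search(r"<arxiv:doi[^>]+>(.*)</arxiv:doi>", body)
    return {
        "title": title.group(1).strip() if title else None,
        "authors": authors,
        "category": category.group(1) if category else None,
        "doi": doi.group(1).strip() if doi else None,  # unpublished: no DOI
    }


print(build_query_url("1803.00014"))
print(parse_entry(SAMPLE))
```

Regexes keep this close to the approach described in the issue; a proper XML parser would likely cope better with titles that wrap over several lines.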

Then cite it as follows:

  • short: arXiv:dddd.dddd(d) **[prim_categ]**
  • long:
    1. Author, A.; Author, B. TITLE(.) arXiv:dddd.dddd(d) **[prim_categ]**
    

Specifically in this case:

  • short: arXiv:1803.00014 **[gr-qc]**
  • long:
    1. Diez-Tejedor, A.; Flores, F.; Niz, G. Horndeski dark matter and beyond. 
    arXiv:1803.00014 **[gr-qc]**
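
The two formats can be sketched as small Python helpers. This is only an illustration: `format_short` and `format_long` are hypothetical names, the author strings are assumed to arrive already in "Surname, I." form, and the list numbering ("1.") is left to the caller.

```python
def format_short(arxiv_id, category):
    """Short form: arXiv ID followed by the primary category in bold."""
    return f"arXiv:{arxiv_id} **[{category}]**"


def format_long(authors, title, arxiv_id, category):
    """Long form: authors, title, then the short form."""
    author_part = "; ".join(authors)
    # TITLE(.): add a full stop only if the title has no end punctuation.
    title_part = title if title.endswith((".", "?", "!")) else title + "."
    return f"{author_part} {title_part} {format_short(arxiv_id, category)}"


print(format_short("1803.00014", "gr-qc"))
print(
    format_long(
        ["Diez-Tejedor, A.", "Flores, F.", "Niz, G."],
        "Horndeski dark matter and beyond",
        "1803.00014",
        "gr-qc",
    )
)
```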
    

Once you have the DOI, you can extend this to include the journal publication, like so (though I think that is only sensible in the long format):

  • long:
    1. Diez-Tejedor, A.; Flores, F.; Niz, G. Horndeski dark matter and beyond. 
    arXiv:1803.00014 **[gr-qc]** <br /> 
    Published as: Diez-Tejedor, A.; Flores, F.; Niz, G. Horndeski dark matter and beyond. 
    *Phys. Rev. D* **2018,** *97* (12), 123524. 
    [DOI: 10.1103/PhysRevD.97.123524](https://doi.org/10.1103/PhysRevD.97.123524).
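
A possible way to bolt the journal version onto an existing long citation, as a Python sketch. The helper name `with_published_version` is hypothetical, and it assumes the journal citation text has already been produced elsewhere (for example by an existing DOI lookup); it only adds the line break, the "Published as:" prefix, and the DOI link.

```python
def with_published_version(arxiv_citation, journal_citation, doi):
    """Append a 'Published as' line with a DOI link to a long citation.

    Returns the arXiv citation unchanged when there is no DOI, since the
    preprint may never have been published in a journal.
    """
    if not doi:
        return arxiv_citation
    doi_link = f"[DOI: {doi}](https://doi.org/{doi})"
    return f"{arxiv_citation} <br />\nPublished as: {journal_citation} {doi_link}."


preprint = (
    "Diez-Tejedor, A.; Flores, F.; Niz, G. Horndeski dark matter and beyond. "
    "arXiv:1803.00014 **[gr-qc]**"
)
journal = (
    "Diez-Tejedor, A.; Flores, F.; Niz, G. Horndeski dark matter and beyond. "
    "*Phys. Rev. D* **2018,** *97* (12), 123524."
)
print(with_published_version(preprint, journal, "10.1103/PhysRevD.97.123524"))
```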
    
@GaurangTandon GaurangTandon self-assigned this Dec 3, 2018
@GaurangTandon GaurangTandon added the enhancement New feature or request label Dec 3, 2018