The OpenSciMetrics (OSM) application is a command-line tool designed for the evaluation of bibliometric indicators related to transparency, data sharing, rigor, and open science in biomedical publications. The application processes a PDF of a scientific publication, extracts relevant data, and outputs a JSON file with an array of bibliometric indicators.
- PDF File: A PDF document of a biomedical publication.
- Unique Identifier: This can be a DOI (Digital Object Identifier), a PubMed ID, or an OpenAlex ID.
- PDF to XML Conversion: The application will utilize ScienceBeam Parser to convert the PDF document into an XML format.
- Indicator Extraction: Using the rtransparent tool, the application will analyze the XML to extract and generate a set of indicators and metrics regarding the publication's adherence to open science principles.
- JSON File: The output will be a JSON file containing:
  - An array of bibliometric indicators and metrics.
  - Additional metadata including:
    - Version of the OSM application.
    - Unique identifier for the Docker container.
    - MD5 hashes of the original PDF and the generated XML file.
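As a rough illustration of the output described above, the following Python sketch assembles the JSON payload and computes the MD5 hashes with the standard library. The key names (`indicators`, `metadata`, `osm_version`, `container_id`, `pdf_md5`, `xml_md5`), the `OSM_VERSION` constant, and the function names are assumptions made for this example; the specification does not fix a schema.

```python
import hashlib
import json
from pathlib import Path

# Hypothetical placeholder; the real version would come from package metadata.
OSM_VERSION = "0.1.0"


def md5_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_output(indicators: list, pdf_path: Path, xml_path: Path,
                 container_id: str) -> dict:
    """Combine indicators and provenance metadata into one dictionary.

    The key names used here are illustrative assumptions, not a fixed schema.
    """
    return {
        "indicators": indicators,
        "metadata": {
            "osm_version": OSM_VERSION,
            "container_id": container_id,
            "pdf_md5": md5_of_file(pdf_path),
            "xml_md5": md5_of_file(xml_path),
        },
    }


def write_output(output: dict, destination: Path) -> None:
    """Serialize the output dictionary to a JSON file."""
    destination.write_text(json.dumps(output, indent=2), encoding="utf-8")
```

Hashing both the PDF and the generated XML lets a consumer of the JSON file check which exact inputs produced a given set of indicators.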
The architecture will largely mimic MRIQC, an existing application our group has been involved in.
(The MRIQC code, documentation, and Mongo database Web API are publicly available.)
Much of the app's functionality has been previously implemented in a series of small scripts available in this GitHub repository: https://github.com/nimh-dsst/sharestats-leo-bsc
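One way the existing small scripts could be combined into a single application is to treat each processing stage as an interchangeable step. The Python sketch below is only an assumption about how that might look: `run_pipeline`, `PipelineResult`, and the two callable types are hypothetical names, and the callables stand in for whatever wrappers end up calling ScienceBeam Parser and rtransparent.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable

# Hypothetical step signatures: each stage is passed in as a plain callable,
# so the real ScienceBeam Parser and rtransparent wrappers can be swapped
# for fakes in tests.
PdfToXml = Callable[[Path], Path]
ExtractIndicators = Callable[[Path], list]


@dataclass
class PipelineResult:
    xml_path: Path
    indicators: list


def run_pipeline(pdf_path: Path, convert: PdfToXml,
                 extract: ExtractIndicators) -> PipelineResult:
    """Run the two processing stages in order: PDF to XML, then extraction."""
    if pdf_path.suffix.lower() != ".pdf":
        raise ValueError(f"expected a PDF file, got: {pdf_path}")
    xml_path = convert(pdf_path)
    indicators = extract(xml_path)
    return PipelineResult(xml_path=xml_path, indicators=indicators)
```

Passing the stages in as arguments keeps the orchestration code free of any dependency on R or on a running parser service, which should make unit testing simpler.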
- Docker: The application will be containerized using Docker, ensuring consistency across different computing environments and facilitating easy distribution and deployment.
- ScienceBeam Parser: Available at https://github.com/elifesciences/sciencebeam-parser
- rtransparent Tool: A tool that will be integrated into the application workflow for analyzing XML data. A Docker container for rtransparent is also available.
- oddpub: https://github.com/quest-bih/oddpub
- R
- GitHub Repository: The application's source code and documentation will be maintained in the GitHub repository at https://github.com/nimh-dsst/OpenSciMetrics.
- Although some of the tools used by OSM are written in R, OSM itself will be written entirely in Python.
- The app will be documented using Read the Docs, similar to MRIQC.
- Semantic Versioning: The application will adhere to semantic versioning to manage versions of the software effectively.
- Unit Tests: Will cover individual components and functions.
- Integration Tests: To ensure that the components work together as expected.
- Continuous Integration: Automated tests will run for every commit and pull request using GitHub Actions.
- Docker Hub: The Docker image will be available on Docker Hub for easy retrieval and deployment.
- Automated Build: Automated Docker builds will be set up to ensure that the latest version is always available for deployment.
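Since the application version is both managed with semantic versioning and recorded in the output metadata, a small helper for parsing and comparing versions may be useful. The sketch below handles only the core `MAJOR.MINOR.PATCH` form and deliberately rejects pre-release and build suffixes; the names `Version`, `parse_version`, and `is_breaking_change` are assumptions for this example.

```python
import re
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


# Core MAJOR.MINOR.PATCH only, with no leading zeros in any component.
_CORE = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def parse_version(text: str) -> Version:
    """Parse 'MAJOR.MINOR.PATCH' into a tuple that compares numerically."""
    match = _CORE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
    return Version(*(int(part) for part in match.groups()))


def is_breaking_change(old: str, new: str) -> bool:
    """Return True when the major version increased between old and new."""
    return parse_version(new).major > parse_version(old).major
```

Comparing parsed tuples rather than raw strings avoids ordering mistakes such as treating `1.10.0` as older than `1.9.3`.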
The application will be run from the command line within the Docker container, taking the following arguments:
```
docker run -v /path/to/pdf:/data osm-image <PDF file path> <Unique Identifier>
```
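Inside the container, the two positional arguments could be handled with `argparse`. The Python sketch below is a minimal illustration under stated assumptions: the program name `osm`, the function names, and the identifier patterns are not part of the specification, and the patterns are heuristics (a DOI starting with `10.`, a PubMed ID as plain digits, an OpenAlex work ID as `W` followed by digits) rather than authoritative validators.

```python
import argparse
import re
from pathlib import Path

# Heuristic patterns, assumed for illustration only.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")
PMID_PATTERN = re.compile(r"\d+")
OPENALEX_PATTERN = re.compile(r"W\d+")


def classify_identifier(value: str) -> str:
    """Guess which kind of unique identifier was supplied."""
    if DOI_PATTERN.fullmatch(value):
        return "doi"
    if PMID_PATTERN.fullmatch(value):
        return "pubmed"
    if OPENALEX_PATTERN.fullmatch(value):
        return "openalex"
    raise ValueError(f"unrecognized identifier: {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Define the two positional arguments named in the specification."""
    parser = argparse.ArgumentParser(
        prog="osm",  # hypothetical program name
        description="Compute open science indicators for one publication.",
    )
    parser.add_argument("pdf", type=Path, help="path to the publication PDF")
    parser.add_argument("identifier", help="DOI, PubMed ID, or OpenAlex ID")
    return parser


def parse_args(argv: list) -> argparse.Namespace:
    """Parse arguments and attach the detected identifier type."""
    parser = build_parser()
    args = parser.parse_args(argv)
    try:
        args.identifier_type = classify_identifier(args.identifier)
    except ValueError as exc:
        parser.error(str(exc))
    return args
```

Note that the PDF path is resolved inside the container, so with the volume mount shown above it would typically point under `/data`.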
This specification outlines the requirements and design for the OpenSciMetrics application, setting the groundwork for development, deployment, and usage in assessing open science practices in biomedical research.