Containerized application to convert PDF to Markdown
Due to the licensing of underlying models such as LayoutLMv3 and Nougat, this is only suitable for noncommercial usage (quoted from the [marker repo](https://github.com/VikParuchuri/marker)).
- LayoutLMv3: CC BY-NC-SA 4.0
- PyMuPDF: GPL

Other dependencies/datasets are openly licensed (DocLayNet, ByT5), or used in a way that is compatible with commercial usage (Ghostscript).
This work would not have been possible without [email protected] and amazing open-source models and datasets, including (but not limited to):
- Nougat from Meta
- LayoutLMv3 from Microsoft
- DocLayNet from IBM
- ByT5 from Google
Thank you to the authors of these models and datasets for making them available to the community!