ThHanke/PDF2MD

PDF2MD

Containerized Application to convert pdf to markdown

Commercial usage

Marker - Submodule

Due to the licensing of the underlying models, such as LayoutLMv3 and Nougat, this is only suitable for noncommercial usage (quoted from the [marker repo](https://github.com/VikParuchuri/marker)).

  • LayoutLMv3: CC BY-NC-SA 4.0. Source
  • PyMuPDF: GPL. Source

Other dependencies/datasets are openly licensed (DocLayNet, ByT5), or used in a way that is compatible with commercial usage (Ghostscript).

Acknowledgments

This work would not have been possible without [email protected] and amazing open-source models and datasets, including (but not limited to):

  • Nougat from Meta
  • LayoutLMv3 from Microsoft
  • DocLayNet from IBM
  • ByT5 from Google

Thank you to the authors of these models and datasets for making them available to the community!