Converting Chinese Pdf to HTML

pdfminer-20140328/ & pdftables-0.0.4/: Libraries' source code, could be downloaded from their official websites

test/: something I did when testing the basic functions of pdfminer, might give you an insight on what's going on by providing you with the simple version of the code. They are extended to more complex implementation latter.

mycode/: this is my implementation on converting pdf files to HTML files

mycode/pdfminer/: the necessary files from pdfminer library

mycode/data/: where all input (PDF) & output (HTML) files are located at

mycode/convert.py: this is the file served as the main entrance; you assign the name (and path) of the files you are to convert and then run it to convert the files. A function (get_PDF_fnames) is provided to collect all PDF files that are located under a certain directory.

mycode/simplePDF2html.py: here I implemented a class called simplePDF2HTML, which is capable of extracting text contents from a given PDF file, as well as tables and their contents (it works well for most of the cases), outlines of the file (and I embedded the reference links into the generated HTML files, making the outlines clickable; error-free for most cases).

Name		Name	Last commit message	Last commit date
Latest commit History 80 Commits
mycode		mycode
pdfminer-20140328		pdfminer-20140328
pdftables-0.0.4		pdftables-0.0.4
test		test
.gitattributes		.gitattributes
.gitignore		.gitignore
README.md		README.md
note.txt		note.txt
pdfminer-20140328.tar.gz		pdfminer-20140328.tar.gz
pdfminer-docs.pdf		pdfminer-docs.pdf
pdftables-0.0.4.tar.gz		pdftables-0.0.4.tar.gz
问题描述.docx		问题描述.docx

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Converting Chinese Pdf to HTML

About

Releases

Packages

Languages

PatriciaXiao/PDF_to_HTML

Folders and files

Latest commit

History

Repository files navigation

Converting Chinese Pdf to HTML

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages