xps/pdf/png/json conversion #18
Overall pipeline for processing xps, pdf, and images
for xpdf: this library overlaps with poppler-utils in functionality but is no longer maintained. It can mainly extract txt, extract html, inspect fonts, show basic document info, and extract embedded images.
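As a minimal sketch of those extraction steps (assuming the xpdf/poppler command-line tools are installed and on PATH; the file names are only placeholders):

import subprocess

pdf = "1.pdf"

# The usual xpdf / poppler-utils command-line tools: plain text, html,
# font listing, basic document info, and embedded image extraction.
subprocess.check_call(["pdftotext", pdf, "1.txt"])
subprocess.check_call(["pdftohtml", pdf, "1.html"])
subprocess.check_call(["pdffonts", pdf])
subprocess.check_call(["pdfinfo", pdf])
subprocess.check_call(["pdfimages", "-j", pdf, "img"])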
xpdfrc
The original file is:
After conversion we get:
pdfminer can process this one normally.
The result obtained is shown below. The same report as another pdf:
pdfminer's output for it is CID mojibake.
Surprisingly,
a very good result can be obtained.
The solution seems to be to configure fontconfig so that all of the fonts detected by pdffonts 1.pdf are replaced with standard fonts, i.e. the few we expect: https://lists.freedesktop.org/archives/poppler-bugs/2013-November/010909.html
1. For this particular pdf file, even the text copied out of Adobe Reader is mojibake; there is no usable text at all.
4. Following the suggestion there, use gs to rebuild the pdf (see the sketch below).
For now, running OCR over it works around the problem; it looks as if the fonts need to be converted.
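A rough sketch of the "rebuild with gs" suggestion (the input/output names are placeholders; whether this actually repairs the text layer depends on the fonts in the file):

import subprocess

# Re-distill the pdf through Ghostscript's pdfwrite device, which rewrites
# the whole file and can normalize some problematic font embeddings.
subprocess.check_call([
    "gs", "-o", "rebuilt.pdf",
    "-sDEVICE=pdfwrite",
    "-dPDFSETTINGS=/prepress",
    "1.pdf",
])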
for libgxps: libgxps2, libgxps-utils
The current problem is that for test file 3, no matter whether the pdf comes from the online converter or from libgxps, the layout analysis in the html produced by pdfminer is wrong.
from xps <--> pdf <--> html <--> json, for pdfminer, based on ALPINE LINUX
docker run -it --rm --name pdf-miner-demo -v /home/wanghs/dockerfiles-repo/docker-for-fun/docker-alpine/projects/pdf-parser:/tmp dc/alpine-python2 /bin/sh
Embed pdfminer in the image and expose it as an http service.
https://github.com/felipeochoa/minecart based on UBUNTU
Step 1:
Step 2: pdf2txt.py -o output.html -Y exact 3.xps.pdf
Step 3: import subprocess (see the sketch below)
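A minimal sketch of step 3, simply wrapping the pdf2txt.py call from step 2 (file names are the ones used in the example above):

import subprocess

# Run pdfminer's pdf2txt.py in "exact" layout mode and write html output.
subprocess.check_call([
    "pdf2txt.py",
    "-o", "output.html",
    "-Y", "exact",
    "3.xps.pdf",
])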
from xps <--> pdf <--> png/jpeg <--> html <--> json
from xps <--> png/jpeg
The number of distinct top values determines how many lines the page layout has. If, after a given top value, a smaller top appears and the original top value then shows up again, the two groups of tops have to be merged into one group, taking the smaller top as the reference. Likewise, if after a given top value a much larger top appears and the original top value then shows up again, that much larger top has to be moved to its proper position in the ordering.
Using (ruled) lines as separators, divide the whole page layout into blocks.
4. Sort all top values that occur inside a block. If the difference between two adjacent top values is smaller than the sum of the font-sizes belonging to those two tops, treat the two tops as the same line; if the difference is larger than that sum, treat them as different lines.
5. Within each block, normalize the font size (take the smaller value as the reference) and sort all left values that share the same top. If the difference between two consecutive left values is smaller than the font-size, add the font-size to the larger left value and to every left value after it (including that left value).
6. If the difference between two left values with the same top is larger than two font-sizes (or a configurable threshold), treat it as a meaningful separator between blocks of content; otherwise treat it as a separator inside a block, and remove the whitespace between values inside a block. (Alternatively, all left values could be normalized: keep the smallest one and add the font-size to it cumulatively in order; but if the element marked by a top is itself a space, the concatenated string would still need its spaces removed, so this approach was abandoned.)
7. Within the same top, i.e. the same line, to decide whether the next character belongs to the same segment as the previous one, add to the left value of the first character the font-sizes of all characters between the first character and the character under consideration; if that character's left value is smaller than this expected value, it belongs to the same segment as the previous character, and if it is larger, it belongs to a different segment (see the sketch below).
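A rough sketch of the line/segment grouping in steps 4 and 7 above, assuming the input is a list of (top, left, font_size, text) tuples already pulled out of pdfminer's html; the names and exact comparisons are illustrative, not the final implementation:

def group_lines(chars):
    # chars: (top, left, font_size, text) tuples belonging to one block.
    chars = sorted(chars, key=lambda c: (c[0], c[1]))
    lines = []
    for ch in chars:
        # Step 4: same line if the top difference is below the sum of the two font sizes.
        if lines and abs(ch[0] - lines[-1][-1][0]) < ch[2] + lines[-1][-1][2]:
            lines[-1].append(ch)
        else:
            lines.append([ch])
    return lines

def split_segments(line):
    # Step 7: compare each left with the expected left built up from font sizes.
    segments = [[line[0]]]
    expected = line[0][1]                # left of the segment's first character
    for ch in line[1:]:
        expected += segments[-1][-1][2]  # add the previous character's font size
        if ch[1] <= expected:
            segments[-1].append(ch)      # still within the running segment
        else:
            segments.append([ch])        # a meaningful gap: start a new segment
            expected = ch[1]
    return segments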
The json after this preliminary processing looks like this:
Other references: Fonts in PDF (quoted from the minecart project wiki). With the goal of getting at the fonts, the question then becomes: how do we extract the font family name and the embedded font program (if any) from the PDF document? I'm putting together this wiki page to keep track of my efforts towards that question.

How are fonts stored/referenced in PDF? When drawing text on a PDF page, the application keeps track of what's known as the text state. In the text state, there is a parameter Tf called text font. Whenever text is drawn on the page, it is drawn using the font stored in the Tf field of the text state. The text font is set and updated through the use of the Tf operator. When using minecart:

import minecart
import pdfminer.pdfpage
doc = minecart.Document(open("path/to/sample.pdf", 'rb'))
page = next(pdfminer.pdfpage.PDFPage.create_pages(doc.doc))
fonts = page.resources['Font']
print fonts
# {'F0': <PDFObjRef:7>}
font = fonts['F0'].resolve()
print font
# {'Encoding': /Identity-H,
# 'BaseFont': /HDIABS+AlbanyWTTC-Identity-H,
# 'DescendantFonts': [<PDFObjRef:26>],
# 'Subtype': /Type0,
# 'ToUnicode': <PDFObjRef:25>,
# 'Type': /Font}

At this point, the exercise becomes more of a choose-your-own-adventure, since it will largely depend on the fonts that are referenced in your document.

The different types of PDF fonts: PDF allows documents to use a variety of font formats, which can be embedded with the document, included in the viewer application, or found elsewhere in the system. Font types are identified by the Subtype entry of the font dictionary.

Type 1:
Multiple Master:
TrueType: TrueType fonts must also have a …
Type 3: Type 3 fonts have no font program to embed or reference. Instead, they specify PDF graphics procedures for rendering each character as a PDF shape. Rendering the text is thus a job for the shape engine and not for the text engine. I'd have to investigate how …
Type 0: Type 0 fonts are also called "composite fonts" in the spec. They have a "subfont" that's stored in the DescendantFonts entry.
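Continuing the snippet above, a rough sketch of descending into such a Type 0 font to pull out the embedded font program; FontDescriptor/FontFile* are standard PDF dictionary keys, but the resolve()/get_data() calls are pdfminer's, and the whole thing is an assumption-laden illustration rather than minecart's own API:

# Continues from the font dict printed above.
descendant = font['DescendantFonts'][0].resolve()     # the CIDFont "subfont"
descriptor = descendant['FontDescriptor'].resolve()   # metrics + font file references

# The embedded program sits under FontFile (Type 1), FontFile2 (TrueType)
# or FontFile3 (CFF/OpenType), depending on the font format.
for key in ('FontFile', 'FontFile2', 'FontFile3'):
    if key in descriptor:
        stream = descriptor[key].resolve()
        with open('embedded_font.bin', 'wb') as out:
            out.write(stream.get_data())               # raw font program bytes
        break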
for pdf2htmlEX (this approach was dropped)
https://hub.docker.com/r/bwits/pdf2htmlex-alpine/
based on Ubuntu
The number of leading spaces in front of each span differs.
https://github.com/fmalina/transcript helps in understanding the meaning of some of the fields in pdf2htmlEX output. pdf2htmlEX --external-hint-tool=ttfautohint --auto-hint 1 --zoom 2. The css and html produced by the conversion are separate files, which means that, unlike with pdfminer, processing the output directly requires turning the css styles into inline html styles with some tool; keyword: inline style attributes to style tags, e.g. https://www.npmjs.com/package/gulp-inline-css
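gulp-inline-css is a node tool; as a Python-side alternative sketch, the premailer package can push stylesheet rules into style attributes (premailer is a substitute suggested here, not something used elsewhere in these notes, and it assumes the css is embedded in, or reachable from, the html):

from premailer import transform

# Inline the css rules into style="..." attributes so that each span in the
# pdf2htmlEX output carries its own positioning/font information.
with open("out.html", encoding="utf-8") as f:
    html = f.read()

inlined = transform(html)

with open("out.inline.html", "w", encoding="utf-8") as f:
    f.write(inlined)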
Result:
html to json https://github.com/inikulin/parse5
Question 1: w/px = PDFUnit.toFixedFloat(maxWidth). The unit for all width, height, length, etc. values is the "Form Unit"; if you need pixel values, you can use the converter here: https://github.com/modesty/pdf2json/blob/3fe724db05659ad12c2c0f1b019530c906ad23de/lib/pdffont.js
3. How to reuse the algorithms of some other libraries built on top of pdf2json. pdf2json's output after URL decoding:
ocrmypdf
Test it on pdf input whose encoding is broken (see the sketch below).
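A minimal sketch of that test; ocrmypdf's --force-ocr rasterizes the pages and rebuilds the text layer from OCR even though a (garbled) text layer already exists. The file names and language packs are just examples:

import subprocess

# Discard the broken text layer and regenerate it with tesseract via ocrmypdf.
subprocess.check_call([
    "ocrmypdf",
    "--force-ocr",          # OCR even if the pdf already contains text
    "-l", "chi_sim+eng",    # tesseract language packs; adjust as needed
    "bad-encoding.pdf",
    "ocred.pdf",
])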
1. xps <--> pdf <--> html
docker build -t portia .
A set of tools for extracting tables from PDF files, helping to do data mining on (OCR-processed) scanned documents. https://datascience.blog.wzb.eu/2017/… Source code: https://github.com/WZBSocialScienceCenter/pdftabextract Tested 2017-07-05.
https://stackoverflow.com/questions/2926159/copypasting-text-from-pdf-results-in-garbage
Open the 'File' menu and choose 'Save as text...'; you'll then have all text from all pages in the file and need to locate the part you want. It also works with acroread on Linux (but you have to choose 'Save as text...' from the file menu).

Update: You can use the pdffonts command line utility to get a quick-shot analysis of the fonts used by a PDF. Here is an example output, which demonstrates where a problem for text extraction lies:

$ pdffonts textextract-bad2.pdf
name                  type      encoding  emb sub uni object ID
--------------------- --------- --------- --- --- --- ---------
BAAAAA+Helvetica      TrueType  WinAnsi   yes yes yes     12  0

How to interpret this table? The above PDF file uses two subsetted fonts (as indicated by the BAAAAA+ and CAAAAA+ prefixes to their names, as well as by the yes entries in the sub column), Helvetica and Helvetica-Bold. The /ToUnicode table is required to provide a reverse mapping from character identifiers/codes to characters; a missing /ToUnicode table for a specific font makes it almost certain that text using that font cannot be extracted correctly.
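A small sketch of automating that check around pdffonts; the output is parsed naively on whitespace and the position of the 'uni' column is counted from the right, which is only an approximation of the real table layout:

import subprocess

def fonts_missing_tounicode(pdf_path):
    # Return the names of fonts whose 'uni' column in pdffonts output is 'no'.
    out = subprocess.check_output(["pdffonts", pdf_path], text=True)
    missing = []
    for line in out.splitlines()[2:]:             # skip the two header lines
        cols = line.split()
        if len(cols) >= 6 and cols[-3] == "no":   # 'uni' is the 3rd column from the right
            missing.append(cols[0])
    return missing

print(fonts_missing_tounicode("textextract-bad2.pdf"))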
http://tm.durusau.net/?cat=1480
Herve Dejean and colleagues at XEROX; a lab at Peking University
With the advancements in information and communication technology, various forms of paper documents are being scanned in order to be interpreted and indexed. The bigger vision, however, is to treat paper as a legitimate form of media (like magnetic tapes and optical discs) which can be both machine and human readable. One challenge is that the variety of paper documents being scanned today is much more diverse than what it was several years ago. Many new scripts, more complex, non-Manhattan page layouts and various font styles are making this vision challenging. Furthermore, a much larger percentage of handwritten material is being acquired which does not adhere to traditional layout constraints. Character recognition as well as various established pre-processing modules such as noise removal, layout analysis and zone classification are affected by this increased complexity. The process of identifying structures of a document image can be based on the physical (process of dividing the document into physical homogeneous zones) or logical (process of assigning logical roles and relations to detected zones) layout. Page segmentation algorithms fall into the category of physical layout analysis. They perform segmentation of a document page into homogeneous zones, each consisting of only one physical layout structure such as text, graphics, equations, logos, stamps. Physical layout analysis can be pixel-based or texture-based segmentation, but here the goal is that the final result is a region segmentation. In texture-based segmentation, isolated points or small areas could be classified as zonal objects disregarding the connectivity aspect of an object. In contrast, the work is concerned with non-overlapping geometric zones where document components are separated by white space. Such connected-component-based approaches use macro level content information, and can be further classified into Manhattan and non-Manhattan layouts.
http://www.jacobfenton.com/
I’m a journalist and software developer based in Portland, Oregon. I've spent the last decade working as a reporter, editor, and programmer in newsrooms and nonprofits in the U.S.
During the 2015-16 academic year I was a John S. Knight Journalism Fellow at Stanford University researching ways to make complex document processing affordable to reporters. I’m especially interested in turning unstructured images into data, and building tools to mine actionable news tips from some of the dullest corners of the web. You can read more about that project here.
Previously I was editorial engineer at The Sunlight Foundation, where I worked extensively on campaign finance, TV ad disclosure, and House and Senate expenditure reporting. Prior to that I was Director of Computer-Assisted Reporting at the Investigative Reporting Workshop, a nonprofit at American University. I also reported for several newspapers in Pennsylvania.
Long ago I was an undergraduate physics major, and got my first real taste of programming hacking on C++ code to look at engineering runs at LIGO Hanford.
I can be reached at jsfenfen at gmail dot com.
dannyedel/dspdfviewer#163
This pdf viewer can correctly display pdf files whose Chinese text the pdftohtml bundled with poppler-utils cannot convert properly.
table detection
https://github.com/Booppey/table-detection
https://github.com/transpect/evolve-hub
pdf ocr
OCRmyPDF
pdfsandwich
3. xps<-->png/jpeg
My work-around is to save the PDF as a lossless or near lossless image such as .tiff format, then create a new PDF from the image and run OCR. Thus I lose no clarity/sharpness in the PDF image and get accurate OCR content that can be copied and pasted. And, yes, lots of folks do something similar with screenshots from protected PDFs to grab all the text (without the need to retype it). Simple non-expert scripts (such as Tornado's "Do It Again" freeware) and PDF generating software make it easy to process hundreds of pages quickly and accurately (at least as accurately as OCR from images can be from relatively high-res images - not screenshots of documents you are not zooming in on or otherwise capturing with tremendously low spatial resolution relative to the original document).
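A rough sketch of that work-around with free command-line tools (pdftoppm, img2pdf and ocrmypdf are assumed to be installed; 300 dpi and the file names are only illustrative):

import glob
import subprocess

# 1. Rasterize every page to lossless PNG at a reasonably high resolution.
subprocess.check_call(["pdftoppm", "-r", "300", "-png", "protected.pdf", "page"])

# 2. Rebuild a pdf from the page images.
pages = sorted(glob.glob("page-*.png"))
subprocess.check_call(["img2pdf", "-o", "images.pdf"] + pages)

# 3. OCR the rebuilt pdf to get an accurate, copyable text layer.
subprocess.check_call(["ocrmypdf", "images.pdf", "searchable.pdf"])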
https://github.com/wanghaisheng/pdfconvertme-public
4. pdf<-->png/jpeg
5. png/jpeg<-->json
1. Online service for xps<--->pdf
2. Library for xps<--->pdf
(1) gs/gxps
(2) xpdf
(2.1) BePDF: This is a PDF reader that is based on XPDF 3.04. It handles PDF files up to PDF version 1.7 (Adobe Reader 9+).
(2.2) poppler-utils: Poppler is a PDF rendering library based on the xpdf-3.0 code base
(2.3) pdf2htmlEX: based on poppler and FontForge
(3) libgxps
(4) Aspose.Pdf: not free (commercial SDK)
(5) mupdf
reference for xps<--->pdf
libgxps-utils
3. pdf<--->html<-->json
pdfminer
pdfminer pdf to html/txt demo
pdf2htmlEX
reference for pdf<--->html
https://github.com/Micka33/content-extractor
https://github.com/euske/pdfminer
https://github.com/galkahana/HummusJS
https://github.com/EbenZhang/PdfSharp.XPS
https://github.com/modesty/p2jsvc
https://github.com/modesty/pdf2json
https://github.com/coolwanglu/pdf2htmlEX
others
Handling of embedded fonts in pdf
http://stackoverflow.com/questions/11093051/handling-remapping-missing-problematic-cid-cjk-fonts-in-pdf-with-ghostscript?rq=1
https://github.com/pts/pdfsizeopt
http://stackoverflow.com/questions/2656329/linux-pdf-postscript-optimizing
http://www.aivosto.com/vbtips/pdf-optimize.html
http://stackoverflow.com/questions/21279548/facing-issues-on-extracting-text-from-pdf-file-using-java
http://stackoverflow.com/questions/29633504/embedded-fonts-in-pdf-copy-and-paste-problems?rq=1
http://stackoverflow.com/questions/18762625/get-information-whether-text-is-extractable-from-pdf?rq=1
http://stackoverflow.com/questions/30222424/copy-text-from-pdf-with-custom-font?rq=1
http://stackoverflow.com/questions/3488042/how-can-i-extract-embedded-fonts-from-a-pdf-as-valid-font-files/3489099#3489099
http://stackoverflow.com/questions/7140476/pdf-font-mapping-error?rq=1
http://stackoverflow.com/questions/25602262/ghostscript-re-encoding-embedded-font?rq=1
http://stackoverflow.com/questions/28797418/replace-all-font-glyphs-in-a-pdf-by-converting-them-to-outline-shapes?rq=1
http://stackoverflow.com/questions/15722099/issues-decoding-flate-from-pdf-embedded-font?rq=1
http://stackoverflow.com/questions/3647940/pdf-on-linux-combine-font-subsets-and-replace-type-3-with-type-1?rq=1
http://stackoverflow.com/questions/3036373/altering-an-embedded-truetype-font-so-it-will-be-usable-by-windows-gdi?rq=1