[Testing] Umi-OCR now supports math formula recognition #254

hiroi-sora · 2023-11-30T14:50:46Z

Preview screenshot:

Preview output:

gradients in at least two (significantly) different orientations are the easiest to localiz, as shown schematically in Figure 7.4a.
These intuitions can be formalized by looking at the simplest possible matching criterion for comparing two image patches, i., their (weighted) summed square difference,

$E_{\mathrm{W S S D}} ( {\bf u} )=\sum_{i} w ( {\bf x}_{i} ) [ I_{1} ( {\bf x}_{i}+{\bf u} )-I_{0} ( {\bf x}_{i} ) ]^{2},$ (7.1)

where $I_{0}$ and $I_{1}$ are the two images being compared, ${\mathbf u}=( u, v )$ is the displacement vector, $w ( {\bf x} )$ is a spatially varying weighting (or window) function, and the summation i is over all the pixels in the patch. Note that this is the same formulation we later use to estimate motion between complete images (Section 9.1).
When performing feature detection, we do not know which other image locations the feature will end up being matched against. Therefore, we can only compute how stable this

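Preface

Pix2Text is an open-source OCR project that can recognize images containing a mix of text and math formulas.

I have packaged it as a plugin that can be imported into any version of Umi-OCR v2. It supports Windows 7 x64 and later.

The Pix2Text plugin is used the same way as the Paddle and Rapid plugins and supports both screenshot OCR and batch OCR. You can import all of these plugins side by side, but only one can be enabled at a time; switch between them in the software.

The P2T plugin is currently in testing and may be unstable or buggy. If you run into any related problems, please report them in this thread.

Please note: the first time you run OCR after downloading the plugin, P2T needs a long time (10~60 s) to initialize and build its cache, so please be patient. Subsequent OCR runs return to normal speed.

P2T works offline; no network connection is required.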

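How to import the plugin

Go to https://github.com/hiroi-sora/Umi-OCR_plugins/releases
Download win7_x64_Pix2Text.v1.0.7z (check the version number; the latest version is recommended)
Extract it and place it in UmiOCR-data/plugins
Open Umi-OCR, go to Global Settings → Text Recognition → set the interface to Pix2Text → click Apply Changes

Be sure to click Apply Changes!

Then return to the Screenshot or Batch tab and use Umi as usual.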

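P2T highlights

Compared with Paddle and Rapid, the P2T plugin has these advantages:

Supports mixed recognition of Chinese + English + formulas.
No missing spaces in mixed Chinese-English text.
Relatively fast recognition of mixed Chinese-English text.

The P2T plugin also has these drawbacks:

Long initialization time. The first OCR task may take 10~60 s to load.
Large size. About 350 MB at maximum compression, and about 1.6 GB when deployed.
For images with complex layouts, text box detection may be slightly less accurate than with other OCR plugins.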


LanRenLan · 2023-12-03T20:00:06Z

Thank you very much. This accuracy is good enough to be usable.

oyyuyu · 2023-12-05T10:58:10Z

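Math formula support is badly needed. Unfortunately, P2T's formula recognition accuracy is a bit low; I wonder whether that depends on my PC's hardware. There is a tool called simpletex that handles mixed text and formulas particularly well; could it serve as a reference? Also, nougat has very high formula recognition accuracy and should be open source, though unfortunately it does not seem to support Chinese.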

hiroi-sora · 2023-12-05T12:08:14Z

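@oyyuyu Thanks for the suggestions.

P2T has closed-source paid models that are said to perform better.
https://www.breezedeus.com/article/p2t-mfd-20230702
https://www.breezedeus.com/pix2text_cn

I have looked into simpletex: it is closed source and online only. Its software is free for personal use, but unfortunately the API only comes with 1000 free calls.

nougat is interesting, but it seems to support only PDF, and I don't know whether it has an image interface. I'll take a look when I have time.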

whxzyf · 2023-12-07T01:17:13Z

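The recognition is very good, but since the 2.0 redesign the interface feels too complicated and takes up too much screen space; I hope the earlier minimalist layout can be restored. I am also really looking forward to math formula recognition and table recognition. For formulas, I suggest integrating Microsoft's recognition API. Many thanks to the author for the hard work!
I just tried the formula recognition; it is very good and sufficient for my needs. It would be even better with a feature to copy results straight into Word without going through MathType.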

oyyuyu · 2023-12-09T12:21:31Z

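nougat should have an image interface; I have seen examples online of nougat turned into a local screenshot tool. Also, there is a project on GitHub called [RapidLatexOCR] whose formula recognition seems decent (I tried the demo site). Could it serve as a reference?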

906051999 · 2023-12-10T04:22:29Z

Impressive, bookmarking this.

hiroi-sora · 2023-12-10T08:49:09Z

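@oyyuyu In fact, the models of both RapidLatexOCR and P2T come from the LaTeX-OCR project, so their recognition accuracy should in principle be the same. Rapid is better optimized for performance.

In short, for formula recognition alone, the gap between P2T and Rapid is small, while P2T can additionally recognize images that mix text and formulas. So for now Rapid does not offer enough extra advantages to motivate me to adapt it.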

oyyuyu · 2023-12-10T13:45:44Z

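I also installed RapidLatexOCR; in my tests the local results fall well short of the demo site. Thanks for the explanation, and I wish your software continued success.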

2054248312 · 2023-12-16T08:29:31Z

Why can't I select it after importing the files?

2054248312 · 2023-12-16T08:30:12Z

There is only one option.

hiroi-sora · 2023-12-17T02:06:26Z

@2054248312

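You probably did not follow the import steps correctly. Make sure the extracted files are laid out as follows:

Umi-OCR\UmiOCR-data\plugins\win7_x64_Pix2Text\ (plugin files, such as __init__.py)

You may have ended up with two nested win7_x64_Pix2Text folders after extracting (as shown below). That is incorrect; remove one level by moving the plugin files up into the parent folder.

Umi-OCR\UmiOCR-data\plugins\win7_x64_Pix2Text\win7_x64_Pix2Text\ (plugin files)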
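
The correct and the doubly nested layouts can also be told apart with a small script. The following is only a sketch: the function name `check_plugin_layout` and its return labels are hypothetical and are not part of Umi-OCR. The only facts taken from this thread are the folder name `win7_x64_Pix2Text` and that `__init__.py` is among the plugin files.

```python
from pathlib import Path

# Folder name taken from the paths in this thread.
PLUGIN_NAME = "win7_x64_Pix2Text"


def check_plugin_layout(plugins_dir: Path, name: str = PLUGIN_NAME) -> str:
    """Classify how a plugin folder sits under UmiOCR-data/plugins.

    Returns one of these hypothetical labels:
      "ok"      - <plugins>/<name>/__init__.py exists (expected layout)
      "nested"  - <plugins>/<name>/<name>/__init__.py exists (one level too deep)
      "missing" - <plugins>/<name> does not exist
      "unknown" - the folder exists but no __init__.py was found in either place
    """
    plugin_dir = plugins_dir / name
    if not plugin_dir.is_dir():
        return "missing"
    if (plugin_dir / "__init__.py").is_file():
        return "ok"
    if (plugin_dir / name / "__init__.py").is_file():
        return "nested"
    return "unknown"
```

For example, `check_plugin_layout(Path(r"Umi-OCR\UmiOCR-data\plugins"))` returning `"nested"` would indicate the extra folder level that needs to be removed.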

breezedeus · 2023-12-20T05:04:56Z

Thanks to the author for adapting Pix2Text, much appreciated 👍

hiroi-sora · 2023-12-21T03:13:03Z

@breezedeus Haha, I also hope you will open-source more high-accuracy libraries~

LingyvKong · 2023-12-21T08:20:29Z

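@oyyuyu @breezedeus You may want to follow Vary (results showcase and explanation of how it works). Its functionality is modeled on nougat, it supports Chinese and English, and a demo is available. I am one of the authors; feedback is welcome.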

oyyuyu · 2023-12-23T10:41:39Z

I tested the demo and it is really powerful. Thank you.

breezedeus · 2023-12-24T14:41:38Z

👍 This is probably the largest model in this direction. Could I get a download link? breezedeus AT gmail DOT com 🙏

RarityBrown · 2024-01-01T14:12:05Z

VikParuchuri/texify: OCR model for math that outputs LaTeX and markdown

Benchmarks provided by @VikParuchuri:

realDGD · 2024-02-29T05:38:06Z

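Thanks to the developer for building this plugin; it is really useful.
Pix2Text has been updated to V1.0, and I hope the plugin can be updated to match.
Much appreciated.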

hiroi-sora · 2024-02-29T08:42:03Z

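I looked through the Pix2Text V1.0 changelog, and the accuracy improvement is very tempting! I will start adapting the plugin to V1.0 as soon as possible.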

breezedeus · 2024-02-29T13:07:20Z

@hiroi-sora Thank you very much for the hard work 🙏

realDGD · 2024-03-01T02:03:44Z

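Thanks to the developer for the response and the enthusiastic adaptation work. I have one more request, if I may: I hope the plugin can offer a live preview feature similar to SimpleTex, because after each recognition there is currently no quick way to verify that the OCR result is correct. See: #323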

hiroi-sora · 2024-03-01T07:57:36Z

@realDGD I have posted some findings from my investigation of the live preview feature in #323. Take a look.

hiroi-sora · 2024-03-03T11:05:37Z

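The Umi-OCR Pix2Text plugin has been updated in sync with the upstream project's `v1.0` release!

Formula recognition accuracy has improved significantly.
Chinese and English recognition speed has improved significantly. In some scenarios it matches the performance of the Paddle mkldnn engine.
The plugin now supports more recognition mode options:
- Only "Enable math formula" (启用数学公式) checked: the whole image is recognized as a single formula. Accuracy is higher, but mixed text-and-formula layouts are not supported.
- Only "Enable text recognition" (启用文字识别) checked: only text is recognized and formulas are not processed. Chinese and English are supported.
- Both checked: images with mixed text + formula layouts can be recognized, but accuracy may be slightly lower than in the single-mode cases.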
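
The way the two checkboxes combine into three modes can be summarized in a few lines of Python. This is an illustrative sketch only, not the plugin's actual code: the names `Mode` and `select_mode` are hypothetical, and raising an error when both options are off is an assumption, since the thread does not say what the plugin does in that case.

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical labels for the three modes described in this post."""
    FORMULA_ONLY = "formula"  # whole image treated as a single formula
    TEXT_ONLY = "text"        # text only, formulas are not processed
    MIXED = "mixed"           # mixed text + formula layout


def select_mode(enable_formula: bool, enable_text: bool) -> Mode:
    """Map the two checkbox states to one recognition mode."""
    if enable_formula and enable_text:
        return Mode.MIXED
    if enable_formula:
        return Mode.FORMULA_ONLY
    if enable_text:
        return Mode.TEXT_ONLY
    # Assumption: with both options off there is nothing to recognize.
    raise ValueError("enable at least one of: math formula, text recognition")
```

For an image that contains nothing but one formula, `select_mode(True, False)` corresponds to the single-formula mode, which the post describes as the more accurate choice.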

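You can download the latest P2T plugin from the Umi-OCR plugin repository.

Future outlook

In the future, Umi-OCR is expected to get a dedicated formula recognition tab that offers features such as live LaTeX preview.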

breezedeus · 2024-03-03T11:23:27Z

Impressive efficiency 👍

ligb929 · 2024-05-06T08:17:39Z

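One image fails when recognized with the P2T plugin (with only text recognition enabled). I tried both JPG and PNG and got the same exception status code 203.
Exception status code: 203
Exception message: p2t recognize error: OpenCV(4.9.0) :-1: error: (-5:Bad argument) in function 'cvtColor'

Overload resolution failed:

src data type = <U56 is not supported

Expected Ptr<cv::UMat> for argument 'src'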

willre · 2024-09-29T06:37:39Z

After recognizing the same formula screenshot twice, I still cannot reliably get the correct result.

hiroi-sora · 2024-10-12T13:07:50Z

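In your image, the software treated the formula as ordinary text. I recommend the pure math formula mode: check only "Enable math formula" (启用数学公式) and leave "Enable text recognition" (启用文字识别) unchecked.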

hiroi-sora pinned this issue Nov 30, 2023

This was referenced Dec 1, 2023

[Preview] V2.0 release | Outlook | Feedback collection | Feature voting #146

Closed

Are there plans to add other engine versions later? #255

Closed

hiroi-sora mentioned this issue Jan 11, 2024

Hoping for math formula recognition support #314

Open

hiroi-sora mentioned this issue Mar 2, 2024

Some suggestions: task flow / return values / optimization breezedeus/Pix2Text#67

Closed
