diff --git a/index.html b/index.html index bed9ab9..4edbac7 100644 --- a/index.html +++ b/index.html @@ -1,23 +1,21 @@ + - + - "LineTR:Unified Text Line Segmentation for Challenging Palm Leaf Manuscripts" + "LineTR: Unified Text Line Segmentation for Challenging Palm Leaf Manuscripts" - + - + @@ -27,104 +25,105 @@ + - - -
-
-
-
-
-

LineTR

- + +
+
+
+
+
+

LineTR

+ -
- International Institute of Information Technology,Hyderabad -
-
- Center for Visual Information Technology (CVIT) -
+
+ International Institute of Information Technology, Hyderabad
+
+ Center for Visual Information Technology (CVIT) +
-
-
-
- +
+ - +
+
+
+ +

+ LineTR works on palm leaf manuscripts in a dataset-agnostic manner.

+
- -
--> + + - -
-
-
-
- Title Card. -
+ + +
+
+ +
+
+

Abstract

+
+

+ Historical manuscripts pose significant challenges for line segmentation due to their diverse sizes, + scripts, and appearances. + Traditional methods often rely on dataset-specific processing or training per-dataset models, limiting + scalability and maintainability.
+ To this end, we propose LineTR, a single model for all dataset collections. + LineTR is a two-stage approach. The first stage predicts text-strike-through lines, called + scribbles, and a novel text-energy map of the input document image. The second stage is a + seam-generation network that uses these outputs to generate precise polygons around the text-lines.
+ Text-line segmentation has mainly been approached as a dense-prediction task. This is ineffective because the + inductive prior of a line is not utilized, which leads to poor segmentation performance. Our key + insight is therefore to parametrize a text-line, preserving this inductive prior. To avoid resizing the + document, the input image is first broken down into context-adapted patches, and each patch is + processed independently by the stage-1 network. The patch-level outputs are combined using a + dataset-agnostic post-processing pipeline. Notably, we show that carefully choosing the patch size to + capture enough context is crucial for generalization, as document images come in arbitrary + resolutions. + LineTR has been evaluated extensively through experiments and qualitative comparisons. Additionally, our + method exhibits strong zero-shot generalization to unseen document collections.

+
+
+
+ + + + +
-
-
- - - - -
-
- -
-
-

Abstract

-
-

- We propose LineTR, a novel two-stage line segmentation approach which can process a diverse variety of challenging handwritten documents in a unified, - dataset-agnostic manner. -

-

- Historical manuscripts pose significant challenges for line segmentation due to their diverse sizes, scripts, and appearances. - Traditional methods often rely on dataset-specific processing or training per-dataset models, limiting scalability and maintainability. - In the first stage, LineTR processes context-adaptive image patches using a DETR-style network to generate parametric representations of text lines and a hybrid CNN-transformer network to create a text energy map. - A robust post-processing procedure converts these into document-level scribbles. - In the second stage, these scribbles and the text energy map are used to generate precise polygons enclosing the text lines. - Experimental results demonstrate that LineTR achieves superior line segmentation with a single model and performs well in zero-shot inference on the new datasets. -

+
+ + + + +
+
+ +
+
+

Why do previous methods fail?

+
+

+ Previous work treats text-line segmentation as a dense-prediction task. This causes adjacent + text-lines to merge, resulting in poor segmentation performance.
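To make the merging failure concrete, here is a toy sketch that is not taken from any of the compared methods. It assumes that a dense predictor outputs a binary foreground mask and that text-lines are then recovered as 4-connected components, which is a common post-processing choice. The helper `count_components` and both masks are hypothetical. A single pixel bridging two bands is enough to fuse two lines into one component.

```python
from collections import deque


def count_components(mask):
    """Count 4-connected foreground regions in a binary mask (list of lists of 0/1)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                # Start a new component and flood-fill it with BFS.
                count += 1
                queue = deque([(r, c)])
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


# Two horizontal text-line bands separated by one background row.
separated = [
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]
# The same bands after one ascender/descender pixel bridges the gap.
touching = [
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]

print(count_components(separated))  # 2
print(count_components(touching))   # 1: the two lines have merged
```

Because the mask carries no notion of what a line is, this kind of pipeline cannot tell a bridged pair of lines apart from one thick line. A parametric line representation makes that distinction explicit.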

+
+
+
+
+
+
+ method +
- +
+ + - -
-
-

Video

-
- + +
+
+ +
+
+

Proposed Approach: LineTR

+
+

+ Our method first breaks the input image into context-adapted patches (1). These image patches are + processed independently by a branched network (stage-1) to output line-parameters and a text-energy map. + Specifically, an image patch is passed through a ViT encoder to obtain image features. A DETR-style + network, called the Line-Parameter Generator (2a), decodes a set of randomly initialized + line-queries conditioned on the image features, and predicts the line parameters and their + probability scores. The second branch, the Text-Energy Map Generator, is a hybrid CNN-transformer + network that predicts the text-energy map, as shown. The patch-level outputs from both branches are + independently post-processed to obtain global outputs (3).
+ Stage-2 is a seam-generation network that uses the outputs of stage-1 to generate precise polygons + enclosing the text lines (4).

+
+
+
+
+
+
+ method +
+
LineTR
+
+
- -
- - - - - -
-
- -
-
-

Introduction

-
-

- Historical manuscripts pose significant challenges for line segmentation due to their diverse sizes, scripts, and appearances. - Traditional methods often rely on dataset-specific processing or training per-dataset models, limiting scalability and maintainability. - In the first stage, LineTR processes context-adaptive image patches using a DETR-style network to generate parametric representations of text lines and a hybrid CNN-transformer network to create a text energy map. - A robust post-processing procedure converts these into document-level scribbles. - In the second stage, these scribbles and the text energy map are used to generate precise polygons enclosing the text lines. - Experimental results demonstrate that LineTR achieves superior line segmentation with a single model and performs well in zero-shot inference on the new datasets. -

+
+ + + + +
+
+ +
+
+

Re-Imagining Text-Lines!

+
+

+ We use the point-slope form to parametrize a text-line, as shown. +
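Here is a minimal sketch of the point-slope parametrization. It assumes each predicted line is a point (x0, y0) in patch pixel coordinates plus a slope m, and that a scribble is obtained by sampling y = y0 + m(x - x0) across the patch and discarding points that fall outside it. `LineParams` and `to_scribble` are hypothetical names. The exact parameter set LineTR predicts (for example, whether it also encodes a line extent) is not specified in this section.

```python
from dataclasses import dataclass


@dataclass
class LineParams:
    """Hypothetical point-slope parameters of one text-line inside a patch."""
    x0: float  # x-coordinate of a point on the line (pixels)
    y0: float  # y-coordinate of the same point (pixels)
    m: float   # slope dy/dx

    def y_at(self, x: float) -> float:
        # Point-slope form: y - y0 = m * (x - x0)
        return self.y0 + self.m * (x - self.x0)


def to_scribble(line: LineParams, width: int, height: int, step: int = 1):
    """Sample the line at integer x (every `step` px), keeping points inside the patch."""
    points = []
    for x in range(0, width, step):
        y = line.y_at(x)
        if 0 <= y < height:
            points.append((x, round(y)))
    return points


line = LineParams(x0=0.0, y0=10.0, m=0.25)
print(to_scribble(line, width=101, height=64, step=20))
# [(0, 10), (20, 15), (40, 20), (60, 25), (80, 30), (100, 35)]
```

Three numbers per line are enough to describe a straight strike-through. This compactness is what lets a set-prediction decoder output lines directly rather than classifying pixels.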

+
+
+
+
+
+
+
+ method +
+
-
- - - - -
-
- -
-
-

Network Architecture

-
-

- Historical manuscripts pose significant challenges for line segmentation due to their diverse sizes, scripts, and appearances. - Traditional methods often rely on dataset-specific processing or training per-dataset models, limiting scalability and maintainability. - In the first stage, LineTR processes context-adaptive image patches using a DETR-style network to generate parametric representations of text lines and a hybrid CNN-transformer network to create a text energy map. - A robust post-processing procedure converts these into document-level scribbles. - In the second stage, these scribbles and the text energy map are used to generate precise polygons enclosing the text lines. - Experimental results demonstrate that LineTR achieves superior line segmentation with a single model and performs well in zero-shot inference on the new datasets. -

+
+ + + + +
+
+ +
+
+

Choose Your Patches Wisely!

+
+

+ Patching avoids resizing the document to a small size. However, fixed-size patches are ineffective and hinder out-of-domain generalization: because document images come in arbitrary resolutions, a fixed patch size may not capture adequate context.

+
+
+
+
+
+
+
+ method +
+
+
+
+
+
+
+

+ To this end, we propose an algorithm for context-aware patching. (1) We sample raw patches of varying sizes. (2) We run the Line-Parameter Generator on these raw patches to obtain noisy line predictions. (3) We use these noisy predictions to estimate the average interline gap in the document. (4) We use this interline gap to determine the context-adapted patch size. Finally, patches of this size are sampled from the document and fed to LineTR for inference.
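Steps (3) and (4) can be sketched as follows under explicit assumptions. Each noisy prediction is reduced to a single y-position in document pixels. Predictions closer than `min_sep` are treated as duplicates of the same line from overlapping raw patches. The median of consecutive gaps is used as a robust stand-in for the "average interline gap". The patch is sized to span a fixed number of lines. `lines_per_patch`, `min_size`, `max_size`, and both function names are hypothetical and are not values or APIs from LineTR.

```python
import statistics


def estimate_interline_gap(line_ys, min_sep=5.0):
    """Estimate the typical vertical gap between consecutive text-lines.

    line_ys: y-positions (document pixels) of noisy line predictions from the
    raw patches. Consecutive predictions closer than `min_sep` are assumed to
    be duplicates of the same line and their gap is ignored.
    """
    ys = sorted(line_ys)
    gaps = [b - a for a, b in zip(ys, ys[1:]) if b - a >= min_sep]
    if not gaps:
        raise ValueError("need at least two distinct line predictions")
    # Median as a robust "average": a few spurious predictions barely move it.
    return statistics.median(gaps)


def context_adapted_patch_size(line_ys, lines_per_patch=6, min_size=256, max_size=1024):
    """Hypothetical rule: size each patch to span about `lines_per_patch` lines."""
    gap = estimate_interline_gap(line_ys)
    size = int(round(lines_per_patch * gap))
    # Clamp so extreme documents still yield usable patches.
    return max(min_size, min(max_size, size))


noisy_ys = [100, 101, 160, 219, 221, 280, 340]  # roughly 60 px line spacing
print(estimate_interline_gap(noisy_ys))      # 59.0
print(context_adapted_patch_size(noisy_ys))  # 354
print(context_adapted_patch_size([0, 300, 600]))  # 1024 (clamped)
```

With predictions roughly 60 px apart, the estimated gap is 59 px and the patch becomes 354 px, so it covers a similar number of lines regardless of the scan resolution. A very sparse page is clamped to `max_size`.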

+
+
+
+
+
+
+
+ method +
+
Context-Adaptive Patching
+
+
+
-
- + + - -
-
+ +
+
@@ -273,8 +364,8 @@

Qualitative Results

-

Curved Text-lines

-

SeamFormer and Palmira - fail when the text-lines have a curvature spread across the document width. But LineTR is able to detect all the text-lines accurately. +

SeamFormer and Palmira fail when the text-lines curve across the full width of the document, whereas + LineTR detects all the text-lines accurately.

@@ -282,36 +373,39 @@

Curved Text-lines

- Predictions from SeamFormer for a manuscript with curved text. -
- SeamFormer. + Predictions from SeamFormer for a manuscript with curved text. +
+ SeamFormer.
- Predictions from Palmira for a manuscript with curved text. -
- Palmira. -
+ Predictions from Palmira for a manuscript with curved text. +
+ Palmira. +
- Predictions from LineTR (ours) for a manuscript with curved text. -
- LineTR (Ours). -
+ Predictions from LineTR (ours) for a manuscript with curved text. +
+ LineTR (Ours). +
+
-
- - + +
-

Dense Text-lines

-

SeamFormer and Palmira - fails on images where the density of text is very high. But LineTR succeeds in detecting all the text-lines accurately. +

SeamFormer and Palmira fail on images where the text density is very high, whereas LineTR succeeds + in detecting all the text-lines accurately.

@@ -319,34 +413,37 @@

Dense Text-lines

- Predictions from SeamFormer for a manuscript with dense text. -
- SeamFormer. + Predictions from SeamFormer for a manuscript with dense text. +
+ SeamFormer.
- Predictions from Palmira for a manuscript with dense text. -
- Palmira. -
+ Predictions from Palmira for a manuscript with dense text. +
+ Palmira. +
- Predictions from LineTR (ours) for a manuscript with dense text. -
- LineTR (Ours). -
+ Predictions from LineTR (ours) for a manuscript with dense text. +
+ LineTR (Ours). +
-
+
-

Zero-shot Predictions

+

Zero-shot Results

Zero-shot outputs of LineTR on the newly introduced datasets.

@@ -355,7 +452,7 @@

Zero-shot Predictions

- Zero-shot result from SM. + Zero-shot result from SM.
@@ -363,7 +460,7 @@

Zero-shot Predictions

- Zero-shot result from UB. + Zero-shot result from UB.
@@ -371,35 +468,36 @@

Zero-shot Predictions

- Zero-shot result from WM. + Zero-shot result from WM.
-
+
-

LineTR performs well on other Handwritten Datasets

-

Even though LineTR was trained only on palm leaf manuscripts, it is able to generalize to documents well outside it’s primary domain.

+

LineTR generalizes well!

+

Even though LineTR was trained only on palm leaf manuscripts, it generalizes to documents well + outside its training domain.

- ICDAR2017 dataset prediction 1 -
- ICDAR2017 HTR dataset + ICDAR2017 dataset prediction 1 +
+ ICDAR2017 HTR dataset
- ICDAR2017 dataset prediction 2 -
- ICDAR2017 HTR dataset -
+ ICDAR2017 dataset prediction 2 +
+ ICDAR2017 HTR dataset +
@@ -416,56 +514,58 @@

Quantitative Results

- Comparative evaluation of LineTR against baseline models using benchmark datasets. + Comparative evaluation of LineTR against baseline models using benchmark datasets.
-
-
+
+
- + - -
-
-

BibTeX

-
@article{vaibav2024linetr,
+  
+  
+
+

BibTeX

+
@article{vaibav2024linetr,
   author    = {Agrawal, Vaibhav and Vadlamudi, Niharika and Waseem, Muhammad and Joseph, Amal and Chitluri, Sreenya and Sarvadevabhatla, Ravi Kiran},
   title     = {LineTR: Unified Text Line Segmentation for Challenging Palm Leaf Manuscripts},
   journal   = {ICPR},
   year      = {2024},
 }
-
-
- -
-
-

Contact

-
-

- If you have any question, please contact Dr. Ravi Kiran Sarvadevabhatla at ravi.kiran@iiit.ac.in. -

-
-
- +
+
+ +
+
+

Contact

+
+

+ If you have any questions, please contact Dr. Ravi Kiran Sarvadevabhatla at ravi.kiran@iiit.ac.in.

+
+
+ - -
-
+ +
+
-
- This website templated is borrowed from nerfies. -
+
+ This website template is borrowed from nerfies.
-
-
- +
+
+ - + + \ No newline at end of file diff --git a/static/images/context_patching.png b/static/images/context_patching.png new file mode 100644 index 0000000..20aa0e7 Binary files /dev/null and b/static/images/context_patching.png differ diff --git a/static/images/good_bad_context.png b/static/images/good_bad_context.png new file mode 100644 index 0000000..26298c5 Binary files /dev/null and b/static/images/good_bad_context.png differ diff --git a/static/images/method.png b/static/images/method.png new file mode 100644 index 0000000..643e222 Binary files /dev/null and b/static/images/method.png differ diff --git a/static/images/re_imagine.png b/static/images/re_imagine.png new file mode 100644 index 0000000..600bbaa Binary files /dev/null and b/static/images/re_imagine.png differ diff --git a/static/images/seamformer_fail.jpg b/static/images/seamformer_fail.jpg new file mode 100644 index 0000000..9f381d8 Binary files /dev/null and b/static/images/seamformer_fail.jpg differ diff --git a/static/videos/1711_final_2mins.mp4 b/static/videos/1711_final_2mins.mp4 new file mode 100644 index 0000000..db6f857 Binary files /dev/null and b/static/videos/1711_final_2mins.mp4 differ