REV (paper) is a text analysis pipeline that detects the text elements in a chart, classifies their role (e.g., chart title, x-axis label, y-axis title), and recovers the text content using optical character recognition. It also uses a Convolutional Neural Network for mark type classification. Using the identified text elements and graphical mark type, it infers the encoding specification of an input chart image.
Our pipeline consists of the following steps:
- Text localization and recognition
- Text role classification
- Mark type classification
- Specification induction
In a perfect world, the following line
make && conda activate rev
should prepare the computer for our pipeline (allowing, for instance, the execution of the cells in the notebook Example.ipynb). Nevertheless, the world isn't perfect; so, we provide more details in the next paragraphs.
For instance, if you run into problems while using caffe, please check locate_caffe; it describes the usual issues with caffe and how we are addressing them.
Note that the models and the data are kept in OSF (an awesome project, by the way). You are free to use the files available on Google Drive instead, but they aren't actively maintained (OSF is up to date!).
Our API works with objects of the class Chart. A chart is composed of an image (visualization) and the text elements (texts, text boxes, and text roles).
In this example, we use the image examples/image.png and a CSV file that contains the information of the text elements, examples/image-texts.csv, with the following format:
id,x,y,width,height,text,type
1,30,5,19,17,"45",y-axis-label
...
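As a rough illustration of the CSV layout above, the following sketch reads such a file with Python's standard csv module. The TextBoxRow type and the read_text_rows function are hypothetical helpers written for this example; they are not part of the rev API. The sketch assumes integer pixel coordinates, as in the sample row.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class TextBoxRow:
    """One row of a '-texts.csv' file (hypothetical type, not part of rev)."""
    id: int
    x: int
    y: int
    width: int
    height: int
    text: str
    type: str


def read_text_rows(handle):
    """Parse rows that follow the header id,x,y,width,height,text,type."""
    rows = []
    for record in csv.DictReader(handle):
        rows.append(TextBoxRow(
            id=int(record["id"]),
            # Assumption: coordinates are integer pixels, as in the sample row.
            x=int(record["x"]),
            y=int(record["y"]),
            width=int(record["width"]),
            height=int(record["height"]),
            text=record["text"],
            type=record["type"],
        ))
    return rows


# The sample row from the format above, inlined so the sketch needs no files.
sample = 'id,x,y,width,height,text,type\n1,30,5,19,17,"45",y-axis-label\n'
rows = read_text_rows(io.StringIO(sample))
print(rows[0])
```

To read a real file, pass an open file handle instead, for example open('examples/image-texts.csv', newline='').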
from rev.chart import Chart
chart = Chart('examples/image.png', text_from=0)
The parameter 'text_from' means:
- 0: read ground truth data:
  - '{image_name}-texts.csv'
  - '{image_name}-mask.png'
  - '{image_name}-debug.png'
- 1: read text from 'pred1', i.e., ground truth boxes, output of text role classification, and output of OCR:
  - '{image_name}-pred1-texts.csv'
  - '{image_name}-pred1-mask.png'
  - '{image_name}-pred1-debug.png'
- 2: read text from 'pred2', i.e., output of text localization, output of text role classification, and output of OCR:
  - '{image_name}-pred2-texts.csv'
  - '{image_name}-pred2-mask.png'
  - '{image_name}-pred2-debug.png'
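The naming pattern listed above can be summarized in a few lines of Python. This is only a sketch of the convention, assuming that {image_name} is the image path without its extension (which matches examples/image.png and examples/image-texts.csv). The function text_file_names is a hypothetical helper and not the rev implementation.

```python
import os


def text_file_names(image_path, text_from=0):
    """Return the texts/mask/debug file names for a given text_from value.

    Hypothetical helper: it only mirrors the naming pattern described in
    this README and is not part of rev.
    """
    if text_from not in (0, 1, 2):
        raise ValueError("text_from must be 0, 1 or 2")
    # Assumption: {image_name} is the image path without its extension.
    base, _ext = os.path.splitext(image_path)
    infix = "" if text_from == 0 else f"-pred{text_from}"
    return {
        "texts": f"{base}{infix}-texts.csv",
        "mask": f"{base}{infix}-mask.png",
        "debug": f"{base}{infix}-debug.png",
    }


print(text_file_names("examples/image.png", text_from=2)["texts"])
# prints: examples/image-pred2-texts.csv
```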
In some cases, we do not have access to the information of the text elements, so we can infer it using our pipeline. We can also write the information files using the methods of the Chart class:
# Create a new chart
chart = Chart('examples/image.png', text_from=2)
# Infer the text boxes information
inferred_text_boxes = ... #(we will explain each step of the pipeline further)
# Set the inferred text boxes to the chart
chart.text_boxes = inferred_text_boxes
# Save the file with the information
chart.save_text_boxes()
In this example, the text_from=2 parameter indicates that, even though the examples/image-pred2-texts.csv file does not exist yet, all the information will be saved in a new file with that name.
For text localization and recognition, we must first create an object of the class TextLocalizer:
from rev.text.localizer import TextLocalizer
localizer = TextLocalizer(method='default')
When we instantiate an object of the TextLocalizer class, the method parameter lets us choose among three methods:
- default: uses the same technique proposed in the paper.
- pixel_link: uses the technique presented in 'PixelLink: Detecting Scene Text via Instance Segmentation'.
- craft: uses the technique presented in 'CRAFT: Character-Region Awareness For Text detection'.
For CRAFT, in particular, we need to load the pretrained model; it is available here. With the pth file in hand, use the craft_model argument when instantiating the TextLocalizer class. For instance,
localizer = TextLocalizer(method="craft",
                          craft_model="/path/to/model.pth")
[Image description: Metrics for each method, in each data set, for text localization: recall, dice, F1 score, jaccard and precision. The default method is, then, more appropriate in general.]
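For readers unfamiliar with the scores named in the figure, here is a minimal sketch of how precision, recall, F1, Dice, and Jaccard can be computed for two binary masks at the pixel level. It uses the textbook definitions; the exact definitions and granularity used to produce the figure are not stated here, so treat this as an illustration rather than the project's evaluation code. The function mask_metrics is a hypothetical helper. Note that, with these pixel-level definitions, Dice and F1 coincide numerically.

```python
def mask_metrics(truth, pred):
    """Pixel-level scores for two binary masks given as flat sequences of 0/1.

    Hypothetical helper using textbook definitions; not the rev evaluation code.
    """
    if len(truth) != len(pred):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for t, p in zip(truth, pred) if t and p)      # true positives
    fp = sum(1 for t, p in zip(truth, pred) if not t and p)  # false positives
    fn = sum(1 for t, p in zip(truth, pred) if t and not p)  # false negatives

    def ratio(num, den):
        # Return 0.0 instead of dividing by zero on empty masks.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "jaccard": ratio(tp, tp + fp + fn),
    }


# Toy masks: 2 pixels agree as text, 1 is a false alarm, 1 is missed.
truth = [1, 1, 1, 0, 0, 0]
pred = [1, 1, 0, 1, 0, 0]
scores = mask_metrics(truth, pred)
print(scores)
```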
Also, we can choose, at this moment, the method for the text recognition: Tesseract or Attn. For Attn, in particular, we need additional (hyper)parameters; specifically, the path to the trained model, which is available (currently) at this repository, and other idiosyncratic aspects of the model, which are described in the documentation. The next snippet, then, represents its usage.
localizer = TextLocalizer(method="craft",
                          craft_model="path/to/model",
                          ocr="attn",
                          attn_params={"saved_model": "path/to/model"})
Then we use the localize method, which receives a list of charts as input and returns the text boxes and text for each chart in the list.
all_text_boxes = localizer.localize([chart])
Since this example uses only one chart, we take the first element of the returned list, which contains the text boxes and texts of our chart.
chart_text_boxes = all_text_boxes[0]
for text_box in chart_text_boxes:
    print(text_box)
Finally, we create a copy of the original chart to which we assign the text boxes and save a new file with the calculated information (examples/image-pred2-texts.csv).
new_chart = chart.copy(text_from=2)
new_chart.text_boxes = chart_text_boxes
new_chart.save_text_boxes()
We also save an image where we can visualize the results at this stage of the pipeline (examples/image-pred2-debug.png).
new_chart.save_debug_image()
For the text role classification task, we need to instantiate an object of the TextClassifier class and use the classify method, which receives a list of charts as input and returns the labels with the text roles for each chart.
from rev.text import TextClassifier
text_clf = TextClassifier('default')
all_text_type_preds = text_clf.classify([chart])
text_type_preds = all_text_type_preds[0]
for text_box, type_rol in zip(chart.text_boxes, text_type_preds):
    print(text_box.text, ':', type_rol)
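Once each text has a predicted role, it is often convenient to look at the texts grouped by role. The sketch below shows one way to do that with plain Python lists. The function group_by_role is a hypothetical helper, and the sample values are illustrative: only the label y-axis-label appears in this README, while x-axis-title is an assumed label name.

```python
from collections import defaultdict


def group_by_role(texts, roles):
    """Map each predicted role to the list of texts that received it.

    Hypothetical helper for illustration; not part of rev.
    """
    if len(texts) != len(roles):
        raise ValueError("texts and roles must have the same length")
    grouped = defaultdict(list)
    for text, role in zip(texts, roles):
        grouped[role].append(text)
    return dict(grouped)


# Illustrative values only. In practice, the texts would come from
# chart.text_boxes and the roles from the classifier's predictions.
# 'x-axis-title' is an assumed label name.
texts = ["45", "50", "Year"]
roles = ["y-axis-label", "y-axis-label", "x-axis-title"]
grouped = group_by_role(texts, roles)
print(grouped)
# prints: {'y-axis-label': ['45', '50'], 'x-axis-title': ['Year']}
```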
We provide the feature_extractor.from_chart function for extracting features from a chart, so you can build your own feature file for training from new charts.
from rev.text import feature_extractor
text_features = feature_extractor.from_chart(chart)
text_features
It is possible to train our model to classify text roles. To achieve this, we need a CSV file containing the features for each text box in the image and the type label (role) that we will use for the training. Check the file data/features_all.csv for an example.
import pandas as pd
data = pd.read_csv('data/features_all.csv')
data.head()
- First, we choose the features from our dataset that we will use in training. In this case, we provide the list of the features used in the paper: rev.text.classifier.VALID_COLUMNS.
- Then we take the type column as the text role labels to be used in training.
import rev.text
features = data[rev.text.classifier.VALID_COLUMNS]
types = data['type']
- Finally, we create an instance of the TextClassifier class and use the train method, which receives as parameters the features and labels that will be used in training.
text_clf = TextClassifier()
text_clf.train(features, types)
The MarkClassifier class is used to classify the type of mark on the chart. Currently, our API has two different trained models:
- charts5cats: model trained with the following five categories:
  - area
  - bar
  - line
  - plotting_symbol
  - undefined
- revision: model trained with the following ten categories, using the data presented in the paper 'ReVision: Automated Classification, Analysis and Redesign of Chart Images':
  - AreaGraph
  - BarGraph
  - LineGraph
  - Map
  - ParetoChart
  - PieChart
  - RadarPlot
  - ScatterGraph
  - Table
  - VennDiagram
The classify method also receives a list of charts and returns a list with the predicted marks for each chart.
from rev.mark import MarkClassifier
mark_clf = MarkClassifier(model_name = 'charts5cats')
print(mark_clf.classify([chart]))
The last step in our pipeline is the generation of the specification. The class SpecGenerator performs this task. To generate the specification (visual encoding) of a chart, it is only necessary to use the generate method, which works with a list of charts and returns another list with the specifications for each chart.
from IPython.display import JSON
from rev.spec.generator import SpecGenerator
import json
chart = Chart('examples/vega1.png', text_from=0)
spec_gen = SpecGenerator()
spec = spec_gen.generate([chart])
JSON(spec[0], expanded=True)
Here is an example of how to use the API to generate the specification from a chart image from scratch and without any other information.
Default:
from IPython.display import JSON
from rev.spec.generator import SpecGenerator
import json
from rev.chart import Chart
from rev.text.localizer import TextLocalizer
from rev.text import TextClassifier
# Load a chart
chart = Chart('examples/image.png')
# Text localization and recognition:
localizer = TextLocalizer()
# set textbox information
text_boxes = localizer.localize([chart])
chart.text_boxes = text_boxes[0]
# Getting the roles for each textbox
text_clf = TextClassifier('default')
text_type_preds = text_clf.classify([chart])
# Set the role for each textbox on the chart
for (text_box, role) in zip(chart.text_boxes, text_type_preds[0]):
    text_box.type = role
# Generate the specification (internally, this method also obtains the mark type of the chart)
spec_gen = SpecGenerator()
spec = spec_gen.generate([chart])
JSON(spec[0], expanded=True)
Neural Network based text localization:
from rev.chart import Chart
from rev.text.localizer import TextLocalizer
from rev.text.classifier import TextClassifier
from rev.spec.generator import SpecGenerator
import json
# Hyperparameters
attn_parameters = {
    "saved_model": "../models/attn/TPS-ResNet-BiLSTM-Attn-case-sensitive.pth",
}
craft_params = {
    "text_threshold": .7,
    "link_threshold": .4,
    "low_text": .4,
    "poly": False,
    "canvas_size": 1280,
    "mag_ratio": 1.8,
    "cuda": False
}
chart = Chart("examples/chart.png")
text_classifier = {
    "default": "../models/text_role_classifier/text_type_classifier.pkl"
}
localizer = TextLocalizer("craft",
                          craft_model="../models/craft/craft_mlt_25k.pth",
                          craft_params=craft_params,
                          ocr="attn",
                          attn_params=attn_parameters)
chart.text_boxes = localizer.localize([chart],
                                      debug=True)[0]
# print(chart.text_boxes)
text_clf = TextClassifier(model_checkpoint = text_classifier["default"])
text_type_preds = text_clf.classify([chart])
# Set the role for each textbox on the chart
for (text_box, role) in zip(chart.text_boxes, text_type_preds[0]):
    text_box.type = role
# Generate specification and chart mark
spec_gen = SpecGenerator()
spec = spec_gen.generate([chart])
json.loads(spec[0])
Some useful scripts to reproduce the results from the paper:
# run text localization and recognition in multiple charts
python scripts/run_box_predictor.py multiple ./data/academic.txt
python scripts/run_box_predictor.py multiple ./data/quartz.txt
python scripts/run_box_predictor.py multiple ./data/vega.txt
# script to rate the text localization module
python scripts/rate_box_predictor.py ./data/academic.txt --mask --pad 3 --from_bbs 2
python scripts/rate_box_predictor.py ./data/quartz.txt --mask --pad 3 --from_bbs 2
python scripts/rate_box_predictor.py ./data/vega.txt --mask --pad 3 --from_bbs 2
# script to rate the text-role classifier
python scripts/rate_text_role_classifier.py features ./data/features_academic.csv
python scripts/rate_text_role_classifier.py features ./data/features_quartz.csv
python scripts/rate_text_role_classifier.py features ./data/features_vega.csv
# script to extract features
python scripts/run_feature_extraction.py multiple ./data/academic.txt out.csv
# train text-role classifier
python scripts/run_text_role_classifier.py train ./data/features_all.csv out.plk
# run text-role classifier in a chart to test
python scripts/run_text_role_classifier.py single ./examples/vega1.png
# run text-role classifier in multiple charts
python scripts/run_text_role_classifier.py multiple ./data/academic.txt
Also, we have a script for evaluating CRAFT; execute, for instance,
# run text localization using CRAFT
python scripts/rate_craft.py academic
python scripts/rate_craft.py quartz
python scripts/rate_craft.py vega
# script to rate the text localization module
python scripts/rate_box_predictor.py ./data/academic.txt --mask --pad 2 --from_bbs 2
python scripts/rate_box_predictor.py ./data/quartz.txt --mask --pad 2 --from_bbs 2
python scripts/rate_box_predictor.py ./data/vega.txt --mask --pad 2 --from_bbs 2
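Since the rating commands above differ only in the dataset name and the padding value, they can also be generated from Python. The following sketch only builds the argument lists and prints them; it does not run anything. The function rating_commands is a hypothetical helper, and the script path and flags are copied from the commands shown in this README.

```python
DATASETS = ["academic", "quartz", "vega"]


def rating_commands(datasets=DATASETS, pad=3, from_bbs=2):
    """Build (but do not run) one rate_box_predictor.py command per dataset.

    Hypothetical helper; the script path and flags mirror the commands
    listed in this README.
    """
    commands = []
    for name in datasets:
        commands.append([
            "python", "scripts/rate_box_predictor.py", f"./data/{name}.txt",
            "--mask", "--pad", str(pad), "--from_bbs", str(from_bbs),
        ])
    return commands


for argv in rating_commands(pad=2):
    print(" ".join(argv))
```

To actually execute the commands from the repository root, each argument list could be passed to subprocess.run(argv, check=True).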