- Input data must follow the folder structure data_tag/page: images must be placed in the data_tag folder and the corresponding PAGE-XML files in data_tag/page. For example:
    mkdir -p data/{train,val,test,prod}/page;
    tree data;
    data
    ├── prod
    │   ├── page
    │   │   ├── prod_0.xml
    │   │   └── prod_1.xml
    │   ├── prod_0.jpg
    │   └── prod_1.jpg
    ├── test
    │   ├── page
    │   │   ├── test_0.xml
    │   │   └── test_1.xml
    │   ├── test_0.jpg
    │   └── test_1.jpg
    ├── train
    │   ├── page
    │   │   ├── train_0.xml
    │   │   └── train_1.xml
    │   ├── train_0.jpg
    │   └── train_1.jpg
    └── val
        ├── page
        │   ├── val_0.xml
        │   └── val_1.xml
        ├── val_0.jpg
        └── val_1.jpg
- Run the tool:
    python P2PaLA.py --config config.txt --tr_data ./data/train --te_data ./data/test --log_comment "_foo"
- Use TensorBoard to monitor training progress:
    tensorboard --logdir ./work/runs
- The resulting PAGE-XML files are written to "./work/results/test/". We recommend Transkribus or nw-page-editor to visualize and edit PAGE-XML files.
- For details about the arguments and the config file, see the full help or run:
    python P2PaLA.py -h
- For more detailed examples, see egs:
  - cBAD complex competition dataset
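Before running the tool, it can help to confirm that every image has a matching PAGE-XML file in its page folder. The following is a minimal sketch, not part of P2PaLA: the helper names check_split and report are hypothetical, and it assumes that an image <name>.<ext> in data_tag pairs with data_tag/page/<name>.xml and that images use one of the extensions listed in IMAGE_EXTS.

```python
"""Hypothetical layout check for data_tag/page splits (not part of P2PaLA)."""
from pathlib import Path

# Assumed image extensions; adjust to match your data.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def check_split(split_dir):
    """Return (missing_xml, orphan_xml): sorted file stems for one split.

    missing_xml: images in split_dir with no page/<stem>.xml
    orphan_xml:  XML files in split_dir/page with no matching image
    """
    split = Path(split_dir)
    page = split / "page"
    if not page.is_dir():
        raise FileNotFoundError(f"missing folder: {page}")
    # Only regular files count; the page/ folder itself is skipped.
    images = {p.stem for p in split.iterdir()
              if p.is_file() and p.suffix.lower() in IMAGE_EXTS}
    xmls = {p.stem for p in page.iterdir()
            if p.is_file() and p.suffix.lower() == ".xml"}
    return sorted(images - xmls), sorted(xmls - images)


def report(split_dirs):
    """Print layout problems for each split; return True if all are clean."""
    ok = True
    for split_dir in split_dirs:
        missing, orphans = check_split(split_dir)
        for name in missing:
            print(f"{split_dir}: image {name} has no page/{name}.xml")
        for name in orphans:
            print(f"{split_dir}: page/{name}.xml has no matching image")
        ok = ok and not missing and not orphans
    return ok
```

For the example tree, report(["data/train", "data/val", "data/test", "data/prod"]) would print nothing and return True once the images and XML files are in place.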
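A quick way to inspect the results in "./work/results/test/" before opening them in an editor is to count elements in each PAGE-XML file. This is an illustrative sketch, not part of P2PaLA: the helper names are hypothetical, it assumes the outputs contain TextRegion and TextLine elements, and it ignores the XML namespace so it does not depend on a particular PAGE schema version.

```python
"""Hypothetical summary of PAGE-XML outputs (not part of P2PaLA)."""
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def count_elements(xml_path, names=("TextRegion", "TextLine")):
    """Count the selected element names anywhere in one PAGE-XML file."""
    root = ET.parse(xml_path).getroot()
    counts = Counter(local_name(el.tag) for el in root.iter())
    return {name: counts[name] for name in names}


def summarize(results_dir="./work/results/test"):
    """Map each *.xml file name in results_dir to its element counts."""
    return {p.name: count_elements(p)
            for p in sorted(Path(results_dir).glob("*.xml"))}
```

Calling summarize() after a run returns a dictionary keyed by file name, which makes pages with no detected regions or lines easy to spot.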