New high-level Python API #94

Open: wants to merge 38 commits into base: master
Changes from 35 commits
38 commits
390fb1a Fix custom eigen path in SConstruct (jbaiter, Aug 4, 2016)
b9b0769 Fix isnan (jbaiter, Aug 4, 2016)
bdd149c Fix test error reporting (jbaiter, Aug 4, 2016)
c85894e First running cythonized bindings (jbaiter, Aug 5, 2016)
22570e3 Implement bindings for recognition (jbaiter, Aug 6, 2016)
b8fe1be Add levenshtein binding (jbaiter, Aug 6, 2016)
ec56f6d Minor stuff (jbaiter, Aug 6, 2016)
fdb0fc8 Add example Python script for UW3 training (jbaiter, Aug 6, 2016)
2aed97c Remove SWIG bindings (jbaiter, Aug 7, 2016)
f1466fb Include compiled protobuf files (jbaiter, Aug 7, 2016)
56ca566 Fix bug in protobuf stale check (jbaiter, Aug 14, 2016)
4e2a7aa python: allow loading of model in constructor (jbaiter, Aug 14, 2016)
d17cea1 python: Add option to use numpy array as image data (jbaiter, Aug 14, 2016)
90b336a Merge branch 'master' of https://github.com/tmbdev/clstm into cython (jbaiter, Oct 6, 2016)
368e6af Combine create_bidi/set_learning_rate (jbaiter, Oct 6, 2016)
5e02b0f More docstrings (jbaiter, Oct 6, 2016)
1b66443 Embed function signatures into Python extension (jbaiter, Oct 6, 2016)
7b32e65 Add docs for Python extension (jbaiter, Oct 6, 2016)
9aecfe0 Update README (jbaiter, Oct 7, 2016)
3c9b5ad Rename to (jbaiter, Oct 14, 2016)
02f14b2 Don't track generated protobuf code (jbaiter, Sep 3, 2016)
4701ccb Fix typo in Cython code (jbaiter, Oct 14, 2016)
fa6b00e Adapt run_uw3_500.py script (jbaiter, Oct 14, 2016)
eed8b55 Update docs (jbaiter, Oct 14, 2016)
d96195a Fix typo in docstring (jbaiter, Oct 10, 2016)
002c9ce Python 3 compatibility (jbaiter, Oct 18, 2016)
e9be7cd Merge branch 'master' into cython (jbaiter, Oct 24, 2016)
92245d8 Merge remote-tracking branch 'upstream' into cython (jbaiter, Oct 24, 2016)
262b691 Add requirements.txt (jbaiter, Oct 24, 2016)
7050e57 Fix std=c++11 flag in setup.py (jbaiter, Oct 24, 2016)
e670308 run_uw3_500: sys.version{,_info}, die helpfully unless './book' (kba, Oct 25, 2016)
62dadea Allow all possible string types for fname in save/load (jbaiter, Oct 25, 2016)
d3ee308 Update required Cython version (jbaiter, Oct 25, 2016)
c690c99 Remove unused import (jbaiter, Oct 25, 2016)
1c89c0a Make image loading compatible with Pillow<2.9.0 (jbaiter, Oct 26, 2016)
47653f3 Fix length calculation (thanks @mittagessen) (jbaiter, Oct 26, 2016)
40cd277 Basic training CLI 'pyclstm-train' (kba, Oct 27, 2016)
dbd07e4 Merge pull request #1 from kba/cython-trainer-cli (jbaiter, Nov 1, 2016)
1 change: 1 addition & 0 deletions .gitignore
@@ -13,3 +13,4 @@ build/
 *.os
 *.a
 *.so
+pydoc/_build
12 changes: 5 additions & 7 deletions README.md
@@ -68,10 +68,7 @@ After building the executables, you can run two simple test runs as follows:

 To build the Python extension, run

-    python setup.py build
-    sudo python setup.py install
-
-(this is currently broken)
+    pip install .

 # Documentation / Examples

@@ -165,9 +162,10 @@ storage format.

 # Python API

-The `clstm.i` file implements a simple Python interface to clstm, plus
-a wrapper that makes an INetwork mostly a replacement for the lstm.py
-implementation from ocropy.
+The source code includes a Python interface to clstm (via Cython). Currently
+it only exposes the `CLSTMOCR` class for OCR training and prediction.
+To install it, just make sure you have the above dependencies and
+Cython (>=0.23) installed and run `pip install .`.

 # Comand Line Drivers

81 changes: 81 additions & 0 deletions _clstm.pxd
@@ -0,0 +1,81 @@
+from cpython.ref cimport PyObject
+from libc.stddef cimport wchar_t
+from libcpp.vector cimport vector
+from libcpp.string cimport string
+from libcpp.memory cimport shared_ptr
+
+
+cdef extern from "<string>" namespace "std":
+    cppclass wstring:
+        cppclass iterator:
+            iterator()
+            wchar_t* operator*()
+            iterator(iterator &)
+            iterator operator++()
+            iterator operator--()
+            iterator operator==(iterator)
+            iterator operator!=(iterator)
+        iterator begin()
+        iterator end()
+
+
+cdef extern from "pyextra_defs.h":
+    cdef Py_ssize_t Unicode_AsWideChar(PyObject* ustr, Py_ssize_t length,
+                                       wchar_t* wchars)
+
+
+cdef extern from "pstring.h":
+    wstring utf8_to_utf32(string s)
+
+
+cdef extern from "clstm.h":
+    cdef double levenshtein[A, B](A a, B b)
+
+
+cdef extern from "clstm.h" namespace "ocropus":
+    cdef cppclass Assoc:
+        Assoc()
+        Assoc(string &s)
+        bint contains(string &key, bint parent = true)
+        string get(string &key)
+        string get(string &key, string default)
+        void set(string &key, string value)
+
+    cdef cppclass INetwork:
+        Assoc attr
+
+    ctypedef shared_ptr[INetwork] Network
+
+
+cdef extern from "tensor.h" namespace "ocropus":
+    cppclass TensorMap2:
+        pass
+
+    cdef cppclass Tensor2:
+        int dims[2]
+        float *ptr
+        void resize(int i, int j)
+        void put(float val, int i, int j)
+        float get(int i, int j)
+        TensorMap2 map()
+
+cdef extern from "clstmhl.h" namespace "ocropus":
+    struct CharPrediction:
+        int i
+        int x
+        wchar_t c
+        float p
+
+    # NOTE: The content of `codec` should be the utf-32 characters that the
+    # network is supposed to learn, encoded as integers
+    cppclass CLSTMOCR:
+        int target_height
+        Network net
+        bint maybe_load(string &fname)
+        bint maybe_save(string &fname)
+        void createBidi(vector[int] codec, int nhidden)
+        void setLearningRate(float learning_rate, float momentum)
+        string train_utf8(TensorMap2 imgdata, string &target)
+        string predict_utf8(TensorMap2 imgdata)
+        void predict(vector[CharPrediction] &preds, TensorMap2 imgdata)
+        string aligned_utf8()
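
The NOTE above `CLSTMOCR` says the `codec` passed to `createBidi` should hold the characters the network is supposed to learn, encoded as UTF-32 integers. A minimal sketch of how such a list could be derived from ground-truth transcriptions, assuming Python 3 strings, where `ord` returns the code point. The helper name `build_codec` is hypothetical and not part of this PR, and this diff does not show whether the C++ side expects any additional reserved entry in the codec.

```python
def build_codec(transcriptions):
    """Return the sorted, de-duplicated code points found in the given texts.

    Hypothetical helper for illustration only; it is not part of the PR.
    """
    codepoints = set()
    for text in transcriptions:
        # In Python 3, ord() maps a one-character str to its Unicode code
        # point, which is the same number as its UTF-32 value.
        codepoints.update(ord(ch) for ch in text)
    return sorted(codepoints)


# Example: three short ground-truth lines, one with a non-ASCII character.
codec = build_codec(["abc", "cab", "\u00e9"])
```

A list built this way has the shape of the `vector[int] codec` argument declared for `createBidi`; how the Python wrapper actually accepts it is not visible in this part of the diff.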
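
The `levenshtein[A, B](A a, B b)` declaration in `_clstm.pxd` binds an edit-distance function from `clstm.h`. For orientation, here is a pure-Python sketch of the textbook dynamic-programming Levenshtein distance. It is not the C++ implementation, which this diff does not show, and it returns an `int` where the binding is declared to return a `double`.

```python
def levenshtein(a, b):
    """Textbook edit distance: insertions, deletions, substitutions cost 1."""
    # Before processing item i of `a`, previous[j] is the distance between
    # the first i-1 items of `a` and the first j items of `b`.
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, item_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (item_a != item_b)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


distance = levenshtein("kitten", "sitting")
```

Dividing such a distance by the length of the ground-truth string gives a character error rate, a common way to score OCR output against a reference transcription.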