New high-level Python API #94
base: master
Conversation
- Use PIL.PyAccess when filling Tensor2 from image
- Return unicode string from `ClstmOcr.aligned`
- Disable warnings during compilation
- … into a higher-level `prepare_training` method
Did you install Cython from the Jessie package? I just tried it with a … Edit: Can confirm, this is due to Jessie shipping with Cython 0.21.1; smart pointers like `shared_ptr` apparently aren't handled by that version.
I tried the version shipped in Jessie stable first, then `pip install`, but it seemed to fall back to the Jessie-bundled path at some point. As I said, I guess it's just a path issue. I'll try the fixed bytes/str commit later.
After removing …
Hm, that's weird :-) Can you make a core dump and check out the trace with gdb?

```
$ ulimit -c unlimited
$ python run_uw3_500.py
$ gdb $(which python) core
# Then enter `bt` to get the backtrace
```
The segfault was due to a compatibility problem with older versions of Pillow: Jessie ships 2.6.1, while I used 3.4.2 for development. 2.9.0 added the …
Awesome. Are you already working on interfacing the lower-level INetwork interface? If not, I'll put something together, as I'm currently working on a new training subcommand for kraken and the old swig bindings are not complete enough for that purpose.
Nope, I played around with it for a while, but gave up on it pretty quickly. My main aim was to make the high-level OCR stuff accessible from Python.
My main need is access to the output matrix, for running a slightly modified label extractor that produces bounding boxes: the label locations are just the point of maximum value in the softmax layer within a thresholded region. Explicit codec access is also rather useful.

I'd quite like to switch to a more widely used ML library, but I haven't found one yet that doesn't use incredibly annoying serialization (pickles, pickles everywhere, though that is somewhat easy to fix) and, more importantly, has reasonably performant model instantiation. With CLSTM I'm able to instantiate/deserialize models instantaneously, while tensorflow and theano always run compilation (and, by default, optimization) steps which take at least a minute even on a modern machine. As far as I know this is rather inherent in their design, so there's no way around it.
@mittagessen: … warp-ctc used with LSTM …
""" | ||
graphemes_str = u"".join(sorted(graphemes)) | ||
cdef vector[int] codec | ||
cdef Py_ssize_t length = len(graphemes_str.encode("UTF-16")) // 2 |
UTF-16 is a variable-length encoding, which may inflate the code point count and produce an incorrect codec. Just replace the length calculation with `len(graphemes_str)` and everything should be fine.
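To illustrate (a quick Python 3 snippet, not part of the patch): the UTF-16 byte count picks up the BOM and counts astral characters twice, so dividing by two overshoots the number of code points.

```python
s = u"a\U0001F600"                   # 'a' plus an emoji outside the BMP
print(len(s))                        # 2 code points on Python 3
print(len(s.encode("UTF-16")) // 2)  # 4: BOM (1 unit) + 'a' (1) + surrogate pair (2)
```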
I had a short look at mxnet as it seemed promising, and I prefer its interface to theano's; initialization still takes quite a bit of time, though, and warp-ctc is prone to crashes (so it's no drop-in replacement), although I'll probably work more with it for the layout analysis thingy once I get around to it.
Sorry for spamming, but there's one major reason for using the lower-level interface: by preloading the entire training set into memory and doing all the normalization, encoding, etc. once, I've just now decreased training time by ~2/3. While I'm fairly sure the main reason is just having everything in main memory, rerunning the codec and line normalization over and over again seems needlessly wasteful.
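(To make the caching idea concrete, a rough sketch of the pattern; `load_image`, `normalize_line`, `codec.encode` and `train_step` are illustrative placeholders, not actual clstm API.)

```python
# Do the expensive per-line work exactly once, up front...
preprocessed = []
for path, text in training_set:
    line = normalize_line(load_image(path))  # dewarping/normalization, once per line
    labels = codec.encode(text)              # transcript -> label sequence, once
    preprocessed.append((line, labels))

# ...then every epoch just iterates over the cached arrays.
for epoch in range(num_epochs):
    for line, labels in preprocessed:
        train_step(net, line, labels)
```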
That's a really good point. I'll see what I can do about exposing the lower-level interfaces :-)
How? Can I realize that through your kraken training API?
@wanghaisheng: You can't really, as the old swig interface is broken, so it isn't currently possible to instantiate a network. What is working (since yesterday night) is continuing to train a model with the separate_derivs branch and some minor bug fixes to the swig interface. Wait a few days until we've sorted out some of the parallel development.
@mittagessen I've started work on exposing the lower-level interfaces.
@jbaiter: The eigency code for eigen->numpy is just:

[…] for a bazillion combinations of orders and data types, and while I haven't looked at the memory layout of a tensor object, it should work for 2nd-order tensors without adaptation (ugly but workable for now). The other way around is in eigency_cpp.h and will probably work for 2nd-order tensors, too. For higher orders I'd have to take a look at how strides are implemented in both ndarray and Eigen tensors.
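(A toy illustration of the layout point, not the actual eigency code: a flat column-major buffer, which is Eigen's default storage order, can be exposed to numpy without copying by reshaping in Fortran order. The helper name is made up.)

```python
import numpy as np

def np_from_colmajor(buf, rows, cols, dtype=np.float32):
    # Interpret the flat buffer without copying; order="F" fills
    # columns first, matching Eigen's default column-major layout.
    return np.frombuffer(buf, dtype=dtype).reshape((rows, cols), order="F")

data = np.arange(6, dtype=np.float32)  # pretend this is a 2x3 tensor's storage
print(np_from_colmajor(data, 2, 3))    # [[0. 2. 4.]
                                       #  [1. 3. 5.]]
```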
Basic CLI for training OCR using python bindings
I've merged this with current master in the cython-2017 branch, so as not to interfere with any changes you may not have pushed.
Since the original Python bindings are not working anymore and are unlikely to be fixed or maintained in the future, I created new high-level bindings using Cython. The module is compatible with both Python 2 and 3 and can be installed by running

```
pip install .
```

in the root directory of the repository. For both training and prediction, images loaded via PIL/Pillow can be used, as well as numpy arrays.
Currently only the OCR functionality is exposed, but I plan on adding a wrapper around `ClstmText` in the future. The API documentation can be found at https://jbaiter.github.io/clstm.
An example of how the training and prediction API is used can be found in `run_uw3_500.py`. This script is very close to what the `run-uw3-500` application does, only through Python, so it can be used to compare performance. In my tests I found the performance of the Python and C++ versions to be pretty much indistinguishable.
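For a taste of the API without opening the repo, here is a rough sketch pieced together from names mentioned in this thread (`ClstmOcr`, `prepare_training`, `ClstmOcr.aligned`); the exact signatures and the `train`/`recognize` calls are guesses, so refer to the linked API docs and `run_uw3_500.py` for the real interface.

```python
from PIL import Image
from clstm import ClstmOcr  # class name as used in this thread

# Hypothetical usage sketch; method names and signatures are assumptions.
graphemes = set(u"abcdefghijklmnopqrstuvwxyz ")
ocr = ClstmOcr()
ocr.prepare_training(graphemes)  # build the codec from the training alphabet
for path, transcript in [("line_0001.png", u"ground truth")]:
    ocr.train(Image.open(path), transcript)   # PIL images or numpy arrays
print(ocr.recognize(Image.open("line.png")))  # guessed prediction call
```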