RFC: Tesseract 4.0.0 – open tasks #1423
Comments
SSE and AVX are also done on CPU :) |
Adding a compile option |
I'll do it. |
My suggestion would be to leave --list-langs as is, and add this as --list-langs-details or as --list-lang-details for one language file based on lang-code. |
--list-langs should also display the directory it is using. This is useful when tessdata files are installed in multiple directories, e.g. by a PPA or Linux distribution versus a locally built directory. |
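To illustrate the idea of reporting the directory together with the languages, here is a minimal sketch in Python. It is not how tesseract implements --list-langs; the function name `list_langs_by_dir` and the second candidate directory are assumptions made for the example, and a language code is simply taken to be the file name without the `.traineddata` suffix.

```python
from pathlib import Path


def list_langs_by_dir(tessdata_dirs):
    """Map each existing directory to the sorted language codes found in it.

    A language code is assumed to be the file name without the
    ".traineddata" suffix (e.g. "eng" for "eng.traineddata").
    """
    found = {}
    for directory in map(Path, tessdata_dirs):
        if not directory.is_dir():
            continue  # skip locations that do not exist on this system
        langs = sorted(p.name[: -len(".traineddata")]
                       for p in directory.glob("*.traineddata"))
        found[str(directory)] = langs
    return found


if __name__ == "__main__":
    # Hypothetical candidate locations; adjust to the local setup.
    candidates = ["/usr/share/tesseract-ocr/4.00/tessdata",
                  "/usr/local/share/tessdata"]
    for directory, langs in list_langs_by_dir(candidates).items():
        print(f"{directory}: {', '.join(langs) or '(none)'}")
```

Printing the directory next to each list makes it obvious which installation a given traineddata file comes from.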
Re: tessdata, e.g. when doing LSTM training, the lstm.train config file is not found if one uses tessdata_best as the continue_from dir. My workaround has been to copy these to both tessdata_fast and tessdata_best repos. |
Add/implement install-langs. |
A week with no API changes. |
Add a simple bash script for building tesseract. I use the following, it should probably also add commands to offer to download osd and eng traineddata files for first time users.
|
I would add this:
|
Mission impossible. Edit: That was a joke. |
There was an online tool that monitors API changes for tesseract, but I cannot find the link to it. Does somebody have it? Can somebody show the changes between 4.0.beta1 and the current code? |
Please see #793 The tracker is at https://abi-laboratory.pro/tracker/timeline/tesseract/ @zdenop Please tag another release for 3.05 branch since 3.05.01 had a couple of problems which have been fixed in later commits. |
Sorry, I was wrong: there is libtesseract-dev. |
@zdenop I suggest adding labels to issues with the following proposed list of keywords, so that it is easy to see related issues and see if there are any critical pending issues: 4.0.0 for the final release; LSTM training; Accuracy for reports of incorrect recognition; Build related to compile and build from source. This is a suggested list. |
IMO, our final 4.0.0 should not significantly diverge from the version that will be shipped in Ubuntu 18.04.
A new branch should be created for 4.0.0. We can decide that 4.1.0 will be released 2-3 months after 4.0.0 (still with legacy?). |
How do you define "significantly"? There are some changes with the latest Git master:
Would you suggest reverting these changes? They are major changes which require a step of the major version, so I think 4.0.0 is a good candidate to include those changes. Otherwise we would have to wait for 5.0.0. I would even go further and fix potential name space problems with the 58 include files which are part of the Tesseract programming API in 4.0.0-beta.1, although that is a significant change, too. |
basically, any bug fix is ok, must follow the 2 conditions I specified, no new features. |
What was shipped for Ubuntu 18.04 reports as tesseract 4.00.00alpha. C
I think our aim should be to get all significant changes included in final
4.0.0 and get it ready in time for Ubuntu 18.10. What are the deadlines
for that?
ShreeDevi
|
18.04 is much more significant because it's LTS - supported for 5 years. |
We tagged it as 4.0.0-beta.1. |
Another option is to skip final 4.0.0 and go straight to 5.0.0. |
As per Jeff, we can't make any changes to what is shipped for 18.04.
But we still have time to do another beta, rc-1 and final 4.0.0 release in
time for 18.10.
I do not really know much about Linux releases, but my hope would be that
users would be able to install/upgrade to the 4.0.0 final version shipped
with 18.10 on 18.04.
@AlexanderP please explain whether the above is possible.
…On Tue 27 Mar, 2018, 5:48 PM Amit D., ***@***.***> wrote:
18.04 is much more significant because it's LTS - supported for 5 years.
18.10 will be supported for only 9 months. We should not care about it.
|
@zdenop, your thoughts about these two options? |
What was shipped for Ubuntu 18.04 reports as tesseract 4.00.00alpha. C
>
> We tagged it as 4.0.0-beta.1.
>
Yes, that tag is within github.
Please see the post by Jeff, where he has shown what tesseract -v will
report for 18.04.
Here is the link:
#995 (comment)
… |
Jeff just said that the version in Ubuntu won't change in final 18.04. We are talking about what we want to do in Tesseract's official GitHub repo. |
>IMO, our final 4.0.0 should not significantly diverge from the version
that will be shipped in Ubuntu 18.04.
I am trying to understand how 4.0.0 final release on github relates to
Ubuntu 18.04, in light of the above.
I am missing your reasoning for why it should not significantly diverge.
…On Tue 27 Mar, 2018, 6:16 PM Amit D., ***@***.***> wrote:
Jeff just said that the version in *Ubuntu* won't change in final
18.04.
We are talking about what we want to do in Tesseract's official GitHub
repo.
We are the upstream, not Ubuntu!
|
@stweil Thank you for all your work in getting 4.0.0 ready for release. One of the things that will be useful, IMO, It would be useful when people report issues. However, this is only a nice to have feature, and could wait for 4.1.0. |
https://github.com/tesseract-ocr/tesseract/milestones/4.0.0 shows only one open topic. ;-) It would be great if the following issues were solved:
|
@zdenop, are you planning a rc4 before the final 4.0.0? Maybe rc4 today, 4.0.0 next weekend? I'm afraid that we won't be able to solve the issues in your list for 4.0.0. |
Don't hurry. Do as many betas and rcs as needed. |
rc4 released. |
That works automatically, also for the release candidates:
It's not necessary to omit and restore something. Just update |
What about replacing |
+1 You can add:
|
https://github.com/tesseract-ocr/tesseract/commits/4.0.0-rc4 shows the commit list for rc4, so users who don't have a git command line can look at https://github.com/tesseract-ocr/tesseract/commits/4.0.0 for the commits of 4.0.0. Such information can be added to the Wiki, so it would be sufficient to refer to the Wiki in the |
Congratulations on the release of 4.0.0 🎉 Thanks to everyone who contributed: developers, testers, documentation writers, bug reporters. |
Closing because 4.0.0 was released. |
Well, we broke API/ABI compatibility, so a bugfix release is not easy (we would have to remove some fixes/improvements to keep it). Maybe we should think about the next release (4.1.0), or not care about compatibility (release 4.0.1), which is IMO not right, but in line with tesseract history ;-) |
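We decided to use semantic versioning (which I think is good), so a new release which is based on Git master would have to be 4.1.0. @AlexanderP, is that a problem for the Debian tesseract-ocr packages? Maybe /usr/share/tesseract-ocr/4.00/tessdata would have to be renamed (I suggest to use /usr/share/tesseract-ocr/4/tessdata). |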
February 21st: FeatureFreeze <https://wiki.ubuntu.com/FeatureFreeze> and Debian Import Freeze for Ubuntu 19.04 (Disco Dingo)
…On Sun, Feb 10, 2019 at 2:12 PM Stefan Weil ***@***.***> wrote:
We decided to use semantic versioning (which I think is good), so a new
release which is based on Git master would have to be 4.1.0. @AlexanderP
<https://github.com/AlexanderP>, is that a problem for the Debian
tesseract-ocr packages? Maybe /usr/share/tesseract-ocr/4.00/tessdata
would have to be renamed (I suggest to use
/usr/share/tesseract-ocr/4/tessdata).
|
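The path suggestion above (keying the tessdata directory on the major version only) can be sketched as follows. This is only an illustration in Python, not Debian packaging code; the helper name `tessdata_dir` and its `prefix` parameter are hypothetical.

```python
def tessdata_dir(version, prefix="/usr/share/tesseract-ocr"):
    """Build a tessdata path that depends only on the major version.

    `version` is a semantic version string such as "4.1.0".
    """
    major = version.split(".")[0]
    return f"{prefix}/{major}/tessdata"


if __name__ == "__main__":
    for v in ("4.0.0", "4.1.0", "5.0.0"):
        print(v, "->", tessdata_dir(v))
```

With such a scheme 4.0.0 and 4.1.0 would share one directory, so a minor release would not force another rename.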
The function pointers and callbacks file_reader_, file_writer_, checkpointer_reader_ and checkpoint_writer_ are always set to the same values. Replacing them by direct function calls simplifies the code and allows removing more code from tesscallback.h. Signed-off-by: Stefan Weil <[email protected]>
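The refactoring described in this commit message can be illustrated with a small sketch. It is written in Python rather than Tesseract's C++, and every name in it (`load_data`, `save_data`, the two trainer classes) is made up for the example; it only shows the general pattern of dropping an indirection that always points at the same function.

```python
def load_data(store, name):
    """Return the data saved under `name` (stand-in for a file reader)."""
    return store[name]


def save_data(store, name, data):
    """Save `data` under `name` (stand-in for a file writer)."""
    store[name] = data


class TrainerWithCallbacks:
    """Before: each instance carries callables that never vary."""

    def __init__(self):
        self.file_reader = load_data  # always the same function
        self.file_writer = save_data  # always the same function

    def copy(self, store, src, dst):
        self.file_writer(store, dst, self.file_reader(store, src))


class TrainerDirect:
    """After: the indirection is removed and the functions are called directly."""

    def copy(self, store, src, dst):
        save_data(store, dst, load_data(store, src))
```

Both classes behave identically, but the second one needs no callback members and no code to set them up.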
Debian will start with a new stable release in a few days, and as far as I see that new release will include Tesseract 4.0 for the next few years. Should we backport important fixes to the 4.0 branch? What does that mean for Tesseract 4.1? Are there still interested parties who need it? Or should we focus on Tesseract 5 which may drop or replace old code? @AlexanderP, what upgrade path do you see for Debian? |
This project has limited resources, so I suggest to release 4.1 soon (1-6 weeks), and then concentrate on 5.0 and abandon 4.x. |
I planned to release 4.1 on the first of July. Unfortunately I found out there are problems with backward API compatibility... |
I think it is necessary to upload version 4.1, and to upgrade to version 5.0 closer to its release. |
@AlexanderP : Does it mean that if we make 4.1 backwards compatible, you can get it to Debian? |
@zdenop I think he can get into the Debian Backports. |
So Debian Buster will keep using Tesseract 4.0 for the next few years? Then a 4.0.1 with carefully selected bug fixes will be required. |
Yes, but it is necessary to ask @jbreiden |
In general, Debian only accepts security fixes for their stable releases. And that's fine. People who want fresher software will often do something else (such as run Debian Testing).
I'm not sure how many people use buster-backports, but if Alexander wants to make them, I'm happy to keep signing. (Someday I imagine he will get his own keys, in my opinion he has more than earned them!)
Reminder of versions in Debian:
https://packages.qa.debian.org/t/tesseract.html
|
I'd like to collect open tasks which should be addressed before tagging the official release 4.0.0.
These tasks are on my own list and to be discussed whether we consider them important for the new release or not:
- --version parameter for all command line commands.
- --list-langs to show additional information for scripts and languages like legacy / LSTM, version. This will make the command slower, because each file must be opened and parsed.