Tesseract OCR uses user-installed language packs #7157

Closed
2 tasks done
eulores opened this issue Jan 24, 2023 · 0 comments
eulores (Contributor) commented Jan 24, 2023

Is there an existing issue for this?

  • I have searched the existing issues and found no similar issue

Can the issue be reproduced with the default theme (daylight/midnight)?

  • I was able to reproduce the issue with the default theme

Describe the problem

I have tesseract installed with additional language packs, none of which is Chinese.
The log file shows tesseract.go:126: no chi_* tesseract lang found, yet tesseract OCR remains enabled.
As soon as any image is passed to the OCR engine, it fails with exit status 0xc0000005.

Expected result

I would like the OCR engine to process all my images correctly, using all of my installed language packs.
This should work regardless of which language packs I choose to install.

Screenshot or screen recording presentation

No response

Version environment

- Version: 2.7.0
- Operating System: Windows
- Browser (if used):

Log file

W 2023/01/24 10:00:34 tesseract.go:126: no chi_* tesseract lang found
W 2023/01/24 10:00:42 tesseract.go:86: tesseract [path=[...]\data\assets\fig3-20230108141236-obvkqyg.png, size=19410] failed: exit status 0xc0000005

More information

In the source file kernel/util/tesseract.go, the function initTesseract() populates the variable TesseractLangs only when both eng and chi* are found in the list of installed languages. Otherwise the variable stays empty, which produces a malformed command line (-l with no value) that crashes tesseract with error code 0xc0000005, "access violation".

I do not have any chi* language installed with my version of tesseract, and that causes this problem.

Suggestion: please make the list of languages to pass to tesseract configurable, with all languages enabled by default (not just eng and chi*).
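
To illustrate the suggestion, here is a minimal sketch in Go of one way the language list could be built from whatever is installed. It is not the code in kernel/util/tesseract.go: the function names parseLangs and langArgs are hypothetical, and the input is assumed to be the text printed by tesseract --list-langs. Skipping the osd entry (orientation and script detection data rather than a language) is also a choice made for this sketch. The key point is that -l is only emitted when at least one language is known, so an empty list can never produce a bare -l.

```go
package main

import "strings"

// parseLangs (hypothetical helper) extracts language codes from the text
// printed by `tesseract --list-langs`. The first line of that output is a
// header starting with "List of", so it is skipped along with blank lines.
// Assumption of this sketch: "osd" is skipped because it is orientation and
// script detection data, not a recognition language.
func parseLangs(listOutput string) []string {
	var langs []string
	for _, line := range strings.Split(listOutput, "\n") {
		name := strings.TrimSpace(line) // also removes a trailing \r on Windows
		if name == "" || strings.HasPrefix(name, "List of") || name == "osd" {
			continue
		}
		langs = append(langs, name)
	}
	return langs
}

// langArgs (hypothetical helper) returns the command-line arguments that
// select the given languages, joined with "+" as tesseract expects.
// With no languages it returns nil, so the caller never passes an empty -l.
func langArgs(langs []string) []string {
	if len(langs) == 0 {
		return nil
	}
	return []string{"-l", strings.Join(langs, "+")}
}
```

A caller would append the result of langArgs to the rest of the tesseract arguments. When the slice is empty, tesseract runs without -l and falls back to its own default language instead of receiving an invalid option. A user-configurable list could be applied as a filter over the result of parseLangs before calling langArgs.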

88250 changed the title from "tesseract crashes, unless English and Chinese language packs are installed" to "Tesseract OCR uses user-installed language packs" on Jan 24, 2023
88250 self-assigned this on Jan 24, 2023
88250 added this to the 2.7.1 milestone on Jan 24, 2023
88250 closed this as completed on Jan 24, 2023