Support setting Tesseract OCR language packs via an environment variable #7242

Closed
88250 opened this issue Feb 2, 2023 · 2 comments

Member

88250 commented Feb 2, 2023

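Do not install too many language packs. Too many packs make OCR slow, can cause it to time out and return an empty result, and consume excessive system resources.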

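The environment variable SIYUAN_TESSERACT_LANGS is read first, for example SIYUAN_TESSERACT_LANGS=chi_sim+chi_sim_vert+eng+fra+osd. If the environment variable is not set, the language packs are filtered automatically and only these are kept: eng, chi*, fra, spa, deu, rus, osd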
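
The selection rule described above can be sketched roughly as follows. This is only an illustration in Python, not the project's actual implementation: the function name `select_langs`, the `installed` list, and the choice to treat an empty variable as unset are all assumptions made for the example. Only the variable name SIYUAN_TESSERACT_LANGS, the `+`-joined format, and the default list come from the issue text.

```python
import fnmatch
import os

# Default patterns taken from the issue text; "chi*" matches chi_sim, chi_tra, etc.
DEFAULT_PATTERNS = ["eng", "chi*", "fra", "spa", "deu", "rus", "osd"]


def select_langs(installed, environ=None):
    """Return the Tesseract language codes to use (hypothetical helper).

    installed: language codes available on the system (assumed input).
    environ: mapping to read SIYUAN_TESSERACT_LANGS from; defaults to os.environ.
    """
    env = os.environ if environ is None else environ
    # Assumption: an empty or whitespace-only value is treated like "not set".
    override = env.get("SIYUAN_TESSERACT_LANGS", "").strip()
    if override:
        # The variable uses the "+"-joined form, e.g. "eng+fra+osd",
        # and is used directly without filtering.
        return [lang for lang in override.split("+") if lang]
    # No override: keep only installed packs that match a default pattern.
    return [
        lang
        for lang in installed
        if any(fnmatch.fnmatchcase(lang, pat) for pat in DEFAULT_PATTERNS)
    ]


if __name__ == "__main__":
    installed = ["chi_sim", "chi_tra", "eng", "jpn", "kor", "osd"]
    print("+".join(select_langs(installed, environ={})))
```

With no override, `jpn` and `kor` are dropped in this sketch, which is the case the follow-up comment below raises; setting the variable bypasses the default filter entirely.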

@88250 88250 added this to the 2.7.3 milestone Feb 2, 2023
@88250 88250 self-assigned this Feb 2, 2023
@88250 88250 changed the title from "Tesseract language pack filtering" to "Tesseract OCR language pack filtering" Feb 2, 2023
88250 added a commit that referenced this issue Feb 2, 2023
@88250 88250 closed this as completed Feb 2, 2023
Contributor

eulores commented Feb 3, 2023

Dear D,

Please consider the needs of users who run OCR in languages outside the suggested list.
Could a configuration variable be added instead, one that defaults to these values but can be modified when needed?

Thanks for your consideration!

Member Author

88250 commented Feb 3, 2023

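That makes sense. I think we can improve this further by making it extensible through an environment variable, with the configuration in the environment variable read first. For example, you can set SIYUAN_TESSERACT_LANGS=chi_sim+chi_sim_vert+eng+fra+osd, and if such an environment variable exists it is used directly.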

@88250 88250 reopened this Feb 3, 2023
@88250 88250 changed the title from "Tesseract OCR language pack filtering" to "Support setting Tesseract OCR language packs via an environment variable" Feb 3, 2023
@88250 88250 closed this as completed Feb 3, 2023