Support setting Tesseract OCR language packs via an environment variable #7242

Closed
@88250

Description

@88250
Member

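Do not install too many language packs; otherwise OCR becomes slow or even times out and returns an empty result, and it consumes excessive system resources.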

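The environment variable SIYUAN_TESSERACT_LANGS is read first, for example SIYUAN_TESSERACT_LANGS=chi_sim+chi_sim_vert+eng+fra+osd. If the variable is not set, the installed language packs are filtered automatically and only these are kept: eng, chi*, fra, spa, deu, rus, osd.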
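
The selection rule described above can be sketched roughly as follows. This is an illustrative Python sketch, not SiYuan's actual implementation: the function name `select_langs`, the `installed` list, and the choice to treat an empty variable the same as an unset one are all assumptions made for the example. Only the variable name and the default list come from the issue text.

```python
import fnmatch
import os

# Variable name and default list are taken from the issue text.
# "chi*" is treated as a glob, so it matches chi_sim, chi_tra_vert, etc.
ENV_NAME = "SIYUAN_TESSERACT_LANGS"
DEFAULT_PATTERNS = ["eng", "chi*", "fra", "spa", "deu", "rus", "osd"]


def select_langs(installed, environ=None):
    """Return the '+'-joined language string to hand to Tesseract.

    `installed` is a hypothetical list of installed language pack names.
    `environ` defaults to os.environ and exists only to ease testing.
    """
    environ = os.environ if environ is None else environ

    # Assumption: an empty value is treated like an unset variable.
    override = environ.get(ENV_NAME, "").strip()
    if override:
        # The variable is set: use it as-is, with no filtering.
        return override

    # No override: keep only the packs that match the default list.
    kept = [
        lang
        for lang in installed
        if any(fnmatch.fnmatchcase(lang, pattern) for pattern in DEFAULT_PATTERNS)
    ]
    return "+".join(kept)
```

With `installed = ["eng", "jpn", "chi_sim", "osd"]` and no variable set, the sketch drops `jpn` and keeps the rest. Setting `SIYUAN_TESSERACT_LANGS=jpn+eng` bypasses the filter entirely, which is how a user would enable a language outside the default list.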

Activity

added this to the 2.7.3 milestone on Feb 2, 2023
self-assigned this
on Feb 2, 2023
changed the title [-]Tesseract language pack filtering[/-] [+]Tesseract OCR language pack filtering[/+] on Feb 2, 2023
added a commit that references this issue on Feb 2, 2023
eulores commented on Feb 3, 2023

@eulores
Contributor

Dear D,

please consider the needs of users who do OCR in languages outside the suggested list.
Maybe a configuration variable could be added instead, with these default values, that can be modified if needed?

Thanks for your consideration!

88250 commented on Feb 3, 2023

@88250
Member, Author

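That makes sense. I think we can improve this further by making it extensible through an environment variable, with the configuration in the environment variable read first. For example, you can set SIYUAN_TESSERACT_LANGS=chi_sim+chi_sim_vert+eng+fra+osd, and if such a variable exists it is used directly.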

reopened this on Feb 3, 2023
changed the title [-]Tesseract OCR language pack filtering[/-] [+]Support setting Tesseract OCR language packs via an environment variable[/+] on Feb 3, 2023
Support setting Tesseract OCR language packs via an environment variable · Issue #7242 · siyuan-note/siyuan