OCR engine: substitute PaddleOCR for Tesseract-OCR #10232

Closed
@pureTrue

Description

In what scenarios do you need this feature?

After configuring SiYuan's OCR, I found the recognition rate to be low. I then switched to software that invokes the PaddleOCR API and found that it recognized both English and Chinese text better. I hope SiYuan can replace its current OCR engine.

Describe the optimal solution

PaddleOCR has better text recognition capabilities than Tesseract.

Quote:

PaddleOCR recently released its v3 version, and the problem with spaces in English text has improved significantly. I tried the English model and it works very well.

In document scenarios, PaddleOCR can achieve 95%+ accuracy, whereas Tesseract may be confused by some rhythmic characters.

In particular, PaddleOCR's performance in some non-Latin languages is beyond my imagination. For example, in Arabic the results are far better than with EasyOCR and Tesseract.

Highly recommend PaddleOCR!!!


PaddleOCR is a deep learning-based OCR system developed by the PaddlePaddle team at Baidu. It is built on the PaddlePaddle framework, which is known for fast and efficient deep learning. PaddleOCR supports numerous languages, including Chinese, English, Japanese, and Korean, and can accurately detect different text styles and fonts.
Advantages:
High accuracy: PaddleOCR has achieved state-of-the-art performance on various OCR benchmarks, including the ICDAR 2015 and ICDAR 2017 competitions.
Fast and efficient: PaddleOCR is optimized for speed and can process large volumes of images in real time, making it suitable for applications that require high throughput.
Easy to use: PaddleOCR has a user-friendly interface that allows users to quickly train and deploy OCR models.
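For anyone who wants to check a claim like "95%+ accuracy" on their own documents, character-level accuracy can be measured without committing to either engine. The following is a minimal sketch, assuming you already have each engine's output as a plain string plus a hand-typed ground truth. The function names `edit_distance` and `char_accuracy` are illustrative and are not part of PaddleOCR, Tesseract, or SiYuan.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions all cost 1."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def char_accuracy(ocr_text: str, truth: str) -> float:
    """Return 1 minus the character error rate, clamped to [0, 1]."""
    if not truth:
        return 1.0 if not ocr_text else 0.0
    cer = edit_distance(ocr_text, truth) / len(truth)
    return max(0.0, 1.0 - cer)
```

Running the same ground-truth pages through both engines and comparing the `char_accuracy` scores gives a rough, reproducible basis for the comparison. Note that the score is sensitive to whitespace and line-break differences, so outputs may need normalizing first.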

Reference:

Describe the candidate solution

pls

Other information

.

Activity

Aiviokoo commented on Jan 25, 2024

Piggybacking on this thread: I suggest that on macOS, SiYuan could call Apple's built-in OCR directly.

sayinmehmet47 commented on Mar 3, 2025

I created a nice example to use. I also switched from Tesseract and got better results with PaddleOCR. You can take a look at it:

https://github.com/sayinmehmet47/ocr

88250 (Member) commented on Apr 18, 2025

The community currently provides an online OCR plug-in. For the sake of cross-platform compatibility, we will not consider using other tools. Thanks for the suggestion.

OCR engine: substitute PaddleOCR for Tesseract-OCR · Issue #10232 · siyuan-note/siyuan