OCR engine: substitute PaddleOCR for Tesseract-OCR

### In what scenarios do you need this feature?

After configuring SiYuan's OCR, I found the recognition rate low. I then switched to software that invokes the PaddleOCR API and found that it recognized both English and Chinese more accurately. I hope SiYuan can replace its original OCR engine.

### Describe the optimal solution

PaddleOCR has better text recognition capabilities than Tesseract.


**Quote:**
> Recently PaddleOCR updated the v3 version, and the English space problem has been significantly improved. I tried the English model, it works very well.

In document scenarios, PaddleOCR can achieve 95%+ accuracy. But Tesseract may be confused on some rhythmic characters.

In particular, PaddleOCR's performance in some non-Latin languages is beyond my imagination. For Arabic, for example, the results are far better than those of EasyOCR and Tesseract.

Highly recommend PaddleOCR!!!

---
> Paddle OCR is a deep learning-based OCR system created by PaddlePaddle, a Chinese AI firm. Paddle OCR is built on the PaddlePaddle framework, which is well-known for its quick and efficient deep learning algorithms. Paddle OCR supports numerous languages, including Chinese, English, Japanese, and Korean, and can properly detect different text styles and fonts.
>
> **Advantages**:
> - High accuracy: Paddle OCR has achieved state-of-the-art performance on various OCR benchmarks, including the ICDAR 2015 and ICDAR 2017 competitions.
> - Fast and efficient: Paddle OCR is optimized for speed and can process large volumes of images in real-time, making it suitable for applications that require high throughput.
> - Easy to use: Paddle OCR has a user-friendly interface that allows users to quickly train and deploy OCR models.



**Reference:**
- [How does PaddleOCR performance compare to Tesseract? (Stack Overflow)](https://stackoverflow.com/questions/68005555/how-does-paddleocr-performance-compare-to-tesseract)
- [Comparison of Paddle OCR, EasyOCR, KerasOCR, and Tesseract OCR](https://www.plugger.ai/blog/comparison-of-paddle-ocr-easyocr-kerasocr-and-tesseract-ocr)
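
Accuracy figures like the "95%+" quoted above are usually character-level scores, so anyone who wants to check the difference on their own documents could score each engine's output against a hand-typed transcript. Below is a minimal sketch in plain Python, assuming accuracy is defined as 1 minus the character error rate (edit distance divided by reference length). The function names and the sample strings are placeholders I made up for illustration; they are not real OCR output and not part of SiYuan, PaddleOCR, or Tesseract.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum inserts, deletes, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_accuracy(reference: str, ocr_output: str) -> float:
    """Character accuracy = 1 - CER, clamped to the range [0, 1]."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    cer = edit_distance(reference, ocr_output) / len(reference)
    return max(0.0, 1.0 - cer)


if __name__ == "__main__":
    # Placeholder strings standing in for a transcript and two engines' output.
    reference = "Hello world"
    outputs = {"engine_a": "Hello world", "engine_b": "Helloworld"}
    for name, text in outputs.items():
        print(f"{name}: {char_accuracy(reference, text):.2%}")
```

Running the same set of pages through both engines and comparing the scores would give a like-for-like number instead of an impression. A dropped space counts as one error here, which is why the English spacing problem mentioned in the quote shows up directly in this metric.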

### Describe the candidate solution

pls

### Other information

.


OCR engine: substitute PaddleOCR for Tesseract-OCR #10232


Activity

Aiviokoo commented on Jan 25, 2024

Achuan-2 commented on Apr 25, 2024

sayinmehmet47 commented on Mar 3, 2025

88250 commented on Apr 18, 2025
