Description
In what scenarios do you need this feature?
After configuring SiYuan's OCR, I found the recognition rate to be low. I then switched to software that invokes the PaddleOCR API and found that recognition was better for both English and Chinese. I hope SiYuan can replace its current OCR engine.
Describe the optimal solution
PaddleOCR has better text recognition capabilities than Tesseract.
Quote:
Recently PaddleOCR updated the v3 version, and the English space problem has been significantly improved. I tried the English model, it works very well.
In document scenarios, PaddleOCR can achieve 95%+ accuracy. But Tesseract may be confused on some rhythmic characters.
In particular, PaddleOCR's performance in some non-Latin languages is beyond my imagination. For example Arabic, the effect is far better than EasyOCR and Tesseract
Highly recommend PaddleOCR!!!
Paddle OCR is a deep learning-based OCR system created by PaddlePaddle, a Chinese AI firm. Paddle OCR is built on the PaddlePaddle framework, which is well-known for its quick and efficient deep learning algorithms. Paddle OCR supports numerous languages, including Chinese, English, Japanese, and Korean, and can properly detect different text styles and fonts.
Advantages:
- High accuracy: Paddle OCR has achieved state-of-the-art performance on various OCR benchmarks, including the ICDAR 2015 and ICDAR 2017 competitions.
- Fast and efficient: Paddle OCR is optimized for speed and can process large volumes of images in real-time, making it suitable for applications that require high throughput.
- Easy to use: Paddle OCR has a user-friendly interface that allows users to quickly train and deploy OCR models.
Reference:
- How does PaddleOCR performance compare to Tesseract? (Stack Overflow)
- Comparison of Paddle OCR, EasyOCR, KerasOCR, and Tesseract OCR
Describe the candidate solution
pls
Other information
.
Activity
Aiviokoo commented on Jan 25, 2024
Piggybacking on this thread: I suggest that the Mac client could directly call Apple's built-in OCR.
Achuan-2 commented on Apr 25, 2024
+1
Not sure whether this could be used:
PS: For formula recognition, you can use https://github.com/breezedeus/Pix2Text
sayinmehmet47 commented on Mar 3, 2025
I created an example that you can use. I also switched from Tesseract and got better results with PaddleOCR. You can take a look at it here:
https://github.com/sayinmehmet47/ocr
88250 commented on Apr 18, 2025
The community currently provides an online OCR plug-in. For the sake of cross-platform compatibility, we will not consider using other tools. Thanks for the suggestion.