Improve content parsing performance for large PDF asset #9051

nekrondev · 2023-08-26T11:57:48Z

feat(assets): improve PDF asset parser performance (#9037)

Improve PDF parsing using a worker pool

This commit replaces the single-threaded PDF parser with a multi-threaded worker pool, which speeds up PDF parsing several times over.

By default, the worker pool is sized to the number of available CPU cores. The PDF pages are split up and fed into the pool, which returns the extracted text for each page; the per-page results are then sorted and joined.
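For readers unfamiliar with the pattern, here is a minimal sketch of such a worker pool. It is not the code merged in this PR: `parsePages`, `pageResult`, and the `extract` callback are hypothetical names used for illustration, and the actual PDF text extraction is abstracted behind `extract`.

```go
package main

import (
	"runtime"
	"sort"
	"strings"
	"sync"
)

// pageResult pairs the extracted text with the page it came from,
// so the original order can be restored after parallel processing.
type pageResult struct {
	pageNo int
	text   string
}

// parsePages (hypothetical helper) distributes page numbers 1..pageCount
// over a pool of goroutines, then sorts the results by page number and
// joins them. A workers value below 1 falls back to the CPU core count.
func parsePages(pageCount, workers int, extract func(pageNo int) string) string {
	if workers < 1 {
		workers = runtime.NumCPU()
	}

	jobs := make(chan int)
	results := make(chan pageResult)

	// Start the workers; each one pulls page numbers until jobs is closed.
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for pageNo := range jobs {
				results <- pageResult{pageNo: pageNo, text: extract(pageNo)}
			}
		}()
	}

	// Feed the pages into the pool.
	go func() {
		for p := 1; p <= pageCount; p++ {
			jobs <- p
		}
		close(jobs)
	}()

	// Close results once every worker has finished.
	go func() {
		wg.Wait()
		close(results)
	}()

	// Collect results in arrival order, which is not page order.
	collected := make([]pageResult, 0, pageCount)
	for r := range results {
		collected = append(collected, r)
	}

	// Restore page order and join the page texts.
	sort.Slice(collected, func(i, j int) bool {
		return collected[i].pageNo < collected[j].pageNo
	})
	texts := make([]string, len(collected))
	for i, r := range collected {
		texts[i] = r.text
	}
	return strings.Join(texts, " ")
}
```

A caller would pass the page count and a function that extracts the text of one page, for example `parsePages(n, 0, extractPage)` with a hypothetical `extractPage`. Because pages finish in arbitrary order, the sort step is what keeps the joined text in reading order.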

In my own tests on an old 4-core i5, this approach was 3x to 4x faster than the previous single-threaded version.

88250 · 2023-08-26T14:44:21Z

Thanks a lot for your contribution.

88250 assigned nekrondev Aug 26, 2023

88250 changed the title ~~Improve PDF parsing using a worker pool~~ Improve content parsing performance for large PDF asset Aug 26, 2023

88250 added the Enhancement label Aug 26, 2023

88250 added this to the 2.10.2 milestone Aug 26, 2023

88250 merged commit f4e840f into siyuan-note:dev Aug 26, 2023

This was referenced Aug 26, 2023

Improve content parsing performance for large PDF asset #9037

Closed

PDF files longer than 1024 pages are not included in asset file content searching #9053

Closed

88250 added a commit that referenced this pull request Aug 27, 2023

🎨 Limit memory usage of PDF parsing #9051

9612e41
