Improve content parsing performance for large PDF asset #9051

Merged: 1 commit, Aug 26, 2023

nekrondev (Contributor) commented Aug 26, 2023

feat(assets): improve PDF asset parser performance (#9037)

Improve PDF parsing using a worker pool

This commit changes the PDF parser from single-threaded processing to a multi-threaded worker pool, which speeds up PDF parsing several times over.

By default, the worker pool is sized to the number of available CPU cores. The PDF pages are split up and fed into the pool, which returns the text of each page; the results are then sorted and joined.

In my internal tests on an old i5 (4-core CPU), this approach was 3x to 4x faster than the previous single-threaded version.

88250 changed the title from "Improve PDF parsing using a worker pool" to "Improve content parsing performance for large PDF asset" on Aug 26, 2023
88250 added this to the 2.10.2 milestone on Aug 26, 2023
88250 merged commit f4e840f into siyuan-note:dev on Aug 26, 2023
88250 (Member) commented Aug 26, 2023

Thanks a lot for your contribution.

88250 added three commits that referenced this pull request on Aug 27, 2023