
Support searching PDF asset content #8985


Merged 5 commits on Aug 17, 2023


nekrondev
Contributor

@nekrondev nekrondev commented Aug 16, 2023

feat(asset): Add PDF parser based on go-pdfium WASM module

This change adds the WebAssembly-based pdfium module, which parses PDF files into text for use by SiYuan asset search.
A test case parses a multi-page Chinese PDF (the GPL in Chinese) and checks that the text content is returned.

Note: Frontend changes are not included with this PR.
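For readers who want a feel for the flow described above (extract text page by page, then match a search keyword against it), here is a minimal sketch in Go. It does not use go-pdfium and is not the code from this PR: `PageTextExtractor`, `fakePDF`, `ExtractText`, and `ContainsKeyword` are hypothetical names invented for illustration, and the in-memory pages are placeholder strings, not the contents of the PR's test file. A real backend would implement the interface on top of the PDF library.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// PageTextExtractor is a hypothetical abstraction over a PDF backend.
// It is an assumption made for this sketch, not an interface from go-pdfium or SiYuan.
type PageTextExtractor interface {
	PageCount() int
	PageText(index int) (string, error)
}

// fakePDF is an in-memory stand-in for a parsed PDF, one string per page.
type fakePDF struct{ pages []string }

func (f fakePDF) PageCount() int { return len(f.pages) }

func (f fakePDF) PageText(index int) (string, error) {
	if index < 0 || index >= len(f.pages) {
		return "", errors.New("page index out of range")
	}
	return f.pages[index], nil
}

// ExtractText joins the text of every page, separated by newlines.
func ExtractText(doc PageTextExtractor) (string, error) {
	var sb strings.Builder
	for i := 0; i < doc.PageCount(); i++ {
		text, err := doc.PageText(i)
		if err != nil {
			return "", fmt.Errorf("page %d: %w", i, err)
		}
		if i > 0 {
			sb.WriteByte('\n')
		}
		sb.WriteString(text)
	}
	return sb.String(), nil
}

// ContainsKeyword does a case-insensitive substring match on the extracted text.
func ContainsKeyword(content, keyword string) bool {
	return strings.Contains(strings.ToLower(content), strings.ToLower(keyword))
}

func main() {
	// Placeholder multi-page document mixing Chinese and Latin text.
	doc := fakePDF{pages: []string{"GNU 通用公共许可证", "第三版 2007 年 6 月 29 日"}}
	content, err := ExtractText(doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(ContainsKeyword(content, "通用公共许可证")) // true
	fmt.Println(ContainsKeyword(content, "gnu"))          // true
	fmt.Println(ContainsKeyword(content, "MIT"))          // false
}
```

Keeping extraction behind a small interface means the search side can be tested without a real PDF or a WASM runtime, which is roughly what the fake document does here.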

@nekrondev
Contributor Author

About the licensing issues discussed at #8874: go-pdfium is MIT-licensed and the pdfium C++ library is under the Apache License 2.0, so it's compatible.

@88250 88250 changed the title from "feat(asset): Add PDF parser based on go-pdfium WASM module" to "Support searching PDF asset content" Aug 16, 2023
@88250 88250 added this to the 2.10.1 milestone Aug 16, 2023
@88250 88250 merged commit 19a295e into siyuan-note:dev Aug 17, 2023
88250 added a commit that referenced this pull request Aug 17, 2023
@88250 88250 assigned 88250 and unassigned Vanessa219 Aug 17, 2023
88250 added a commit that referenced this pull request Aug 17, 2023
88250 added a commit that referenced this pull request Aug 18, 2023

3 participants