Support for searching asset content #8874

Closed
88250 opened this issue Aug 1, 2023 · 8 comments

Comments

@88250
Member

88250 commented Aug 1, 2023

Assets that can be parsed as text, such as:

  • .txt
  • .md
  • .docx
  • .xlsx
  • .pptx
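As an illustration of the list above, here is a minimal sketch of how an asset could be checked against these extensions before its content is indexed. The names `searchableExts` and `isSearchableAsset` are hypothetical and are not taken from SiYuan's code; only the extension list comes from this issue.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// searchableExts holds the asset extensions listed in this issue.
// The variable name is hypothetical; only the extensions come from the issue.
var searchableExts = map[string]bool{
	".txt":  true,
	".md":   true,
	".docx": true,
	".xlsx": true,
	".pptx": true,
}

// isSearchableAsset reports whether the file extension of path is in the
// list above, ignoring case. Hypothetical helper for illustration only.
func isSearchableAsset(path string) bool {
	ext := strings.ToLower(filepath.Ext(path))
	return searchableExts[ext]
}

func main() {
	// Example asset paths (made up for this sketch).
	for _, p := range []string{"assets/notes.md", "assets/Report.DOCX", "assets/paper.pdf"} {
		fmt.Printf("%s searchable: %v\n", p, isSearchableAsset(p))
	}
}
```

A map lookup keeps the check cheap and makes it easy to add a format later, for example `.pdf` if a suitable parser becomes available.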

Dependencies:


This is a PRO feature available with a one-time payment. For more details, see #8906.

88250 added the Feature label on Aug 1, 2023
88250 moved this to Short Term in SiYuan Roadmap on Aug 1, 2023
88250 added this to the backlog milestone on Aug 5, 2023
88250 pinned this issue on Aug 5, 2023
@DereckZhang

hope it will support PDF files

88250 modified the milestones: backlog, 2.10.0 on Aug 8, 2023
88250 moved this from Short Term to In Progress in SiYuan Roadmap on Aug 8, 2023
@88250
Member Author

88250 commented Aug 9, 2023

> hope it will support PDF files

We have not yet found a suitable open source library for PDF parsing, so this format cannot be supported for now, sorry.

@DereckZhang

> > hope it will support PDF files
>
> We have not yet found a suitable open source library for PDF parsing, so this format cannot be supported for now, sorry.

It seems that docconv supports PDF files (at least according to its README).

@88250
Member Author

88250 commented Aug 10, 2023 via email

@Zuoqiu-Yingyi
Contributor

Zuoqiu-Yingyi commented Aug 11, 2023

> We have not yet found a suitable open source library for PDF parsing, so this format cannot be supported for now, sorry.

Extract all text: ledongthuc/pdf (PDF reader)
Extract the text of each page: extract/pdf_extract_text.go in unidoc/unipdf-examples (master branch)

@88250
Member Author

88250 commented Aug 12, 2023

@Zuoqiu-Yingyi ledongthuc/pdf doesn't work. I tested it and it could not parse the text.

UniPDF might work, but it requires purchasing a license, which is a bit expensive.

[image]

Vanessa219 added 5 commits that referenced this issue on Aug 12, 2023
88250 closed this as completed on Aug 15, 2023
github-project-automation bot moved this from In Progress to Already Done in SiYuan Roadmap on Aug 15, 2023
Vanessa219 added 3 commits that referenced this issue on Aug 15, 2023
88250 unpinned this issue on Aug 15, 2023
@tcmtom

tcmtom commented Aug 16, 2023

[image]

The help documentation should be updated.

@88250
Member Author

88250 commented Aug 16, 2023 via email

Projects
Status: Already Done

5 participants