
PDF files larger than 128MB are not included in asset file content searching #9500


Closed
3 tasks done
GuangDai opened this issue Oct 24, 2023 · 9 comments

Comments

@GuangDai

GuangDai commented Oct 24, 2023

Memory usage becomes abnormally high when a large PDF file is added

Is there an existing issue for this?

  • I have searched the existing issues

Can the issue be reproduced with the default theme (daylight/midnight)?

  • I was able to reproduce the issue with the default theme

Could the issue be due to extensions?

  • I've ruled out the possibility that the extension is causing the problem.

Describe the problem

This happens when I copy a large PDF file into the content. I tried several times and got the same result every time.

(two screenshots attached)

Expected result

Normal memory usage.

Screenshot or screen recording presentation

No response

Version environment

- Version: 2.10.11
- Operating System: ArchLinux
- Browser (if used):

Log file

siyuan.log

More information

No response

@88250
Member

88250 commented Oct 25, 2023

This is probably caused by building the PDF content index. Could you send this PDF file to us for testing? 845765@qq.com

@88250 88250 self-assigned this Oct 25, 2023
@88250 88250 added this to the 2.10.13 milestone Oct 25, 2023
@88250 changed the title Memory usage becomes abnormally high when a large PDF file is added PDF files larger than 64MB are not included in asset file content searching Oct 25, 2023
@88250
Member

88250 commented Oct 25, 2023

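Thanks for providing the test data. In the next version we will limit PDF indexing for asset files to 64MB; PDFs larger than that will not be included in content search.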

@GuangDai
Author

GuangDai commented Oct 25, 2023

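64MB may be a bit small. I checked my assets folder: most files are under 100MB, and there are two of about 300MB; none of these caused problems. Only this one, close to 900MB, triggered the issue. I think a limit of around 128MB would be reasonable.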

@88250
Copy link
Member

88250 commented Oct 25, 2023

OK, let's try a 128MB limit. Later we may allow it to be adjusted in the settings, depending on demand.

@88250 88250 changed the title PDF files larger than 64MB are not included in asset file content searching PDF files larger than 128MB are not included in asset file content searching Oct 25, 2023
88250 added a commit that referenced this issue Oct 25, 2023

…searching #9500
@88250 88250 closed this as completed Oct 25, 2023
@88250
Member

88250 commented Oct 25, 2023

Let's wait and see whether there is demand. Apart from scanned PDFs, very few PDFs are this large.

@zxhd863943427
Contributor

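There is one more possibility: a scanned PDF with an embedded text layer.
In practice, that is actually the most common kind of text PDF.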

@UltramarineSky

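"Let's wait and see whether there is demand. Apart from scanned PDFs, very few PDFs are this large."

Few publishers in China provide original (born-digital) PDFs; most books are scans with an OCR text layer. A hard 128MB limit may cause problems for other people who rely on this feature, so adding a setting would be better.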

@88250
Member

88250 commented Oct 26, 2023

How about this: we add an environment variable to configure it, e.g. SIYUAN_PDF_ASSET_CONTENT_INDEX_MAX_SIZE=256000000 (in bytes).

88250 added a commit that referenced this issue Oct 26, 2023

…searching #9500
appdev pushed a commit to appdev/siyuan-unlock that referenced this issue Oct 31, 2023

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
appdev pushed a commit to appdev/siyuan-unlock that referenced this issue Oct 31, 2023
appdev pushed a commit to appdev/siyuan-unlock that referenced this issue Oct 31, 2023
appdev pushed a commit to appdev/siyuan-unlock that referenced this issue Oct 31, 2023
appdev pushed a commit to appdev/siyuan-unlock that referenced this issue Nov 13, 2023

5 participants
@88250 @UltramarineSky @GuangDai @zxhd863943427 and others