Description
rclone v1.39-211-g572ee5ecβ
- os/arch: linux/amd64
- go version: go1.10
I recently upgraded from rclone.v1.38-235-g2a01fa9fβ to rclone v1.39-211-g572ee5ecβ
I'm using the following mount:
export RCLONE_CONFIG="/home/robert/.rclone.conf"
export RCLONE_BUFFER_SIZE=0M
export RCLONE_RETRIES=5
export RCLONE_STATS=0
export RCLONE_TIMEOUT=10m
export RCLONE_LOG_LEVEL=INFO
export RCLONE_DRIVE_USE_TRASH=false
/usr/sbin/rclone -vv --log-file /data/log/rmount-gs.log \
  mount robgs-cryptp:Media $GS_RCLONE \
  --allow-other \
  --default-permissions \
  --gid $gid --uid $uid \
  --max-read-ahead 1024k \
  --buffer-size 50M \
  --dir-cache-time=72h \
  --umask $UMASK 2>&1 > /data/log/debug &
This worked flawlessly before. In the most recent version, it is using WAY too much memory causing OOM issues on my linux server. The above uses upwards of 7GIGS of resident memory before my OOM killer terminates the mount when activity is being read/written to the mount. I used to be able to crank up the buffer-size to 150M without issues. There is NOTHING else different between the setup. I can restore my old version and re-run the batch process without issues and then replace it with the new version and it consistently terminates due to OOM after consuming everything left.
If I restore the old version and rerun the exact same process, rclone consistently uses no more than 1.8G.
I can provide logs but there is nothing in them. They simply show transfers. Even in the fuse-debug there is nothing but normal activity. There seems to be either a leak or the use of memory has changed DRASTICALLY between the versions, making things unusable. I've rolled back till this is sorted.
Activity
remusb commented on Mar 20, 2018
Are you using cache? If yes and you're also using Plex, can you disable that feature and see if it does the same thing?
It will be enough to just remove the configs from the section and start again.
LE: Even easier will be to run simply with -v.
calisro commented on Mar 20, 2018
No, I'm not using cache, and it is running with -v normally. The older version didn't use cache as that was not implemented yet, I don't think. The new one is using the same remotes and setup as the old. The only change is the executable upgrade.
remusb commented on Mar 20, 2018
Hmm, ok. There was another report in the forum and that person uses cache. I was trying to establish a correlation but it seems to be coming from somewhere else.
ncw commented on Mar 20, 2018
I'd like to replicate this if possible. Can you describe the activity that causes the problem more?
Or maybe you've got a script I could run?
Which provider are you using? And are you using crypt too?
calisro commented on Mar 20, 2018
So the batch script is actually just a set of rclone commands to sync stuff up against the mount points.
SETUP:
I have a local mount point called /data/Media1
I have the rclone mounted on /data/Media2
This is my mount:
I'm using the following mount:
BATCH:
The batch script is simply copying new data from one to the other using this among other commands. But this is where most of the activity starts and where the OOM starts to happen:
I'm using google as a provider and it is using crypt as well. Media2 is simply synced with Media1 but I don't delete so it becomes a superset of what is copied to Media1. New files get put in Media1 and then replicated via this process.
danielloader commented on Mar 20, 2018
I am experiencing this too, only noticed as my IOWAIT shot up as I hit the swap and rclone pages were being pushed to the disk.
As to replication, can't seem to replicate it other than mount the remote with crypt/cache and try and play media on plex, after about 5 mins it thrashes the disk as memory shoots up from 200MB to the full 2GB on the VPS.
This wasn't the behaviour in the later betas of 1.39, anything changed in that respect?
ncw commented on Mar 20, 2018
I managed to replicate this...
First I made a directory with 1000 files in using
(this is one of the rclone tools)
I then mounted up
and ran this little script to copy the files in and out of the mount
v1.39 uses a pretty constant 4MB, v1.40 goes up and up!
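For anyone wanting to try something similar, here is a minimal sketch of a copy-in/copy-out loop. It is not the script used above, which is not shown in this thread; the function names, the 4 KiB file size, and the paths in the usage comment are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: NOT the original script from this comment.

# make_files DIR COUNT: create COUNT small files of random data in DIR.
make_files() {
    dir=$1; count=$2
    mkdir -p "$dir"
    i=1
    while [ "$i" -le "$count" ]; do
        # 4 KiB per file; the size is an arbitrary choice
        head -c 4096 /dev/urandom > "$dir/file$i.bin"
        i=$((i + 1))
    done
}

# churn SRC MOUNT SCRATCH ROUNDS: copy the files into the mount,
# copy them back out to a scratch directory, and repeat.
churn() {
    src=$1; mnt=$2; scratch=$3; rounds=$4
    r=1
    while [ "$r" -le "$rounds" ]; do
        cp "$src"/* "$mnt"/
        mkdir -p "$scratch"
        cp "$mnt"/* "$scratch"/
        rm -rf "$scratch"
        r=$((r + 1))
    done
}

# Example (hypothetical paths):
#   make_files /tmp/1000files 1000
#   churn /tmp/1000files /mnt/tmp /tmp/copyback 100
```

Watching the RSS of the rclone process while something like this runs should show whether memory stays flat or keeps climbing.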
danielloader commented on Mar 20, 2018
https://i.imgur.com/Lvm01QB.png
Graph of when I replicate your steps, looks like a memory leak alright. Once RAM is exhausted it's just 100% IOWAIT stalled as it's trying to use swap.
ncw commented on Mar 20, 2018
OK here is the pprof memory usage - rclone was at about 2GB RSS when I took this.
So looks like it might be a bug in the fuse library... Though I'm not 100% sure about that.
remusb commented on Mar 20, 2018
Was it updated just before the release? It's a bit weird no one noticed this in the last betas.
A simple list of the vendor shows that it wasn't.
calisro commented on Mar 20, 2018
I have this issue on 1.39 though.
calisro commented on Mar 20, 2018
rclone v1.39-211-g572ee5ecβ
ncw commented on Mar 20, 2018
I used the magic of git bisect to narrow it down to this commit fc32fee
Which is both good and bad.
Good because there is a workaround, --attr-timeout 1s, but bad because this is the second bug I've bisected to the same commit today.
I tried --attr-timeout 1ms and that still leaked a bit of memory, whereas --attr-timeout 1s seems OK.
I suspect this is a bug in the fuse library as rclone cmount (which is based off libfuse) doesn't show the same behaviour - it works fine with --attr-timeout 0.
I'll report a bug upstream in a moment...
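As a rough illustration of applying the workaround, the sketch below prints a mount command line that always passes --attr-timeout explicitly. The helper name mount_cmd, the remote name, and the mount point are placeholders, not anything from this thread; only the --attr-timeout flag and its 1s value come from the comment above.

```shell
#!/bin/sh
# Sketch only: remote name and mount point are placeholders.

# mount_cmd REMOTE MOUNTPOINT [ATTR_TIMEOUT]
# Print an rclone mount command line with --attr-timeout set explicitly.
# Defaults to 1s, the value reported above to avoid the leak.
mount_cmd() {
    remote=$1; mountpoint=$2; timeout=${3:-1s}
    printf 'rclone mount %s %s --attr-timeout %s\n' \
        "$remote" "$mountpoint" "$timeout"
}

# Print the command, check it, then run it by hand with your other flags:
#   mount_cmd remote:Media /mnt/media
```

Any other mount flags already in use (--allow-other, --buffer-size and so on) would be appended to the printed command unchanged.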
seuffert commented on Mar 22, 2018
--buffer-size is per open file on a mount
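To put numbers on that, a back-of-the-envelope estimate is simply buffer size times the number of files open at once. The helper below is only a sketch; the count of 20 open files is an assumed figure for illustration, and the result is a rough upper bound on buffer memory, not a measurement.

```shell
#!/bin/sh
# buffer_mem_mb BUFFER_MB OPEN_FILES
# Rough upper bound, in MB, on read-buffer memory for a mount.
buffer_mem_mb() {
    echo $(( $1 * $2 ))
}

buffer_mem_mb 32 20    # prints 640
buffer_mem_mb 50 20    # prints 1000
```

So a scan that touches many files in parallel can multiply the configured buffer well beyond what a single stream would use.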
neik1 commented on Mar 22, 2018
OK! But how is it possible that there were so many files open that led to that issue when there was only one streaming going? That's the point I do not understand yet.
danielloader commented on Mar 22, 2018
Well if there's a library scan if it's not doing it sequentially it might be doing mediainfo lookups on multiple files at once. You could try changing the buffer size down to 8MB and see if it retains enough performance? Alternatively as said, write the 32MB buffers to disk as a swap of sorts.
I know on plex you can opt out of chapter/thumbnail creation which would read the whole file on scans, emby can opt out too?
neik1 commented on Mar 22, 2018
With this adapted mount command it crashed again:
Log -> https://1drv.ms/t/s!AoPn9ceb766mgYsqBGX5KjSKxFVeiA
During the crash there was only one streaming ongoing and nothing else.
Unfortunately, the memusage script apparently stopped when the ssh connection disconnected. How can I start it so that it stays alive? I used ./memusage.sh > /tmp/mem.log &
My next step will be trying to avoid the mem cache at all with --cache-chunk-no-memory
€dit: Is this a rclone cache specific flag? I am not using cache!
@Daniel-Loader, I am not using chapter/thumbnail creation. The library scan for new files takes about 8min (scraping of new files included). So, it's actually pretty fast I suppose.
danielloader commented on Mar 22, 2018
Yeah that's incredibly fast for a non cached remote media scan, depending on library size!
hklcf commented on Mar 23, 2018
@neik1 Screen
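The memusage.sh mentioned earlier is not shown in this thread, so the following is only a sketch of what such a logger could look like, with hypothetical function names rss_kb and log_rss. It samples the resident set size of one process with ps and prints a timestamped line per sample.

```shell
#!/bin/sh
# Sketch only: not the memusage.sh referred to above.

# rss_kb PID: print the resident set size of PID in KiB
# (prints nothing if there is no such process).
rss_kb() {
    ps -o rss= -p "$1" | tr -d ' '
}

# log_rss PID INTERVAL: print "timestamp rss_kb" every INTERVAL seconds
# for as long as the process exists.
log_rss() {
    while ps -p "$1" > /dev/null 2>&1; do
        echo "$(date '+%Y-%m-%d %H:%M:%S') $(rss_kb "$1")"
        sleep "$2"
    done
}

# Example: log_rss "$(pgrep -o rclone)" 10
```

To keep a script like this alive after the ssh session drops, it can be started inside screen as suggested, for example `screen -dmS memusage sh -c './memusage.sh > /tmp/mem.log'`, or detached with `nohup ./memusage.sh > /tmp/mem.log 2>&1 &`.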
neik1 commented on Mar 23, 2018
Yeah, well... At the end it doesn't seem to be a rclone issue. In my case it seems to be an Emby problem with a specific client.
Gonna try to narrow it down with the developers over there and will come back to report. ;-)
€dit: I am saying this because Emby crashed again but this time rclone didn't (probably because of the --cache-no-mem flag) and was only using 80mb of memory.
mount, cmount: set --attr-timeout default to 1s - fixes #2157
ncw commented on Mar 24, 2018
I've committed a fix to change the default for --attr-timeout to 1s and made notes in the docs about what happens when you set it to 0s.
Hopefully upstream will come up with a fix eventually, but this will do for the moment.
ncw commented on Mar 24, 2018
This will be in
https://beta.rclone.org/v1.40-018-g98a92460/ (uploaded in 15-30 mins)
Added attr-timeout flag to rclone to workaround bug