This repository was archived by the owner on Jan 22, 2024. It is now read-only.
MPS Support #419
Closed
Description
Hi,
When I use the "CUDA Multi-Process Service" aka MPS in an nvidia-docker environment, I met a couple of issues. So I'm wondering: is MPS supported in nvidia-docker? Please help me, thanks in advance~
Here are the problems I have met:
- When I run `nvidia-cuda-mps-control -d` to start the MPS daemon in nvidia-docker, I can't see this process from `nvidia-smi`; however, I can see the process from the host machine. In comparison, when I run the same command, `nvidia-cuda-mps-control -d`, on the host machine (physical server), I can see it from `nvidia-smi`. (You need to run a GPU program first to start the MPS server.)
- I tried running Caffe training with MPS as an example: two training processes at the same time in the nvidia-docker environment. It showed:
F0703 13:39:15.539633 97 common.cpp:165] Check failed: error == cudaSuccess (46 vs. 0) all CUDA-capable devices are busy or unavailable
In comparison, this works OK on the host (physical machine).
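For reference, what I ran inside the container was roughly the following (a sketch; `train_caffe.sh` stands in for my actual Caffe training command):

```shell
# Inside the nvidia-docker container:
nvidia-cuda-mps-control -d   # daemon starts, but is not shown by nvidia-smi

./train_caffe.sh &           # first training process (placeholder script)
./train_caffe.sh &           # second concurrent process fails with:
# "Check failed: error == cudaSuccess (46 vs. 0)
#  all CUDA-capable devices are busy or unavailable"
```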
I'm trying this on a P100 GPU with Ubuntu 14:
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61
Docker version 17.04.0-ce, build 4845c56
I hope this is the right place to ask, thanks again.
Activity
3XX0 commented on Jul 6, 2017
Short answer: it is not supported for now. However, we are looking at it for the 2.0 timeframe, but there are a lot of corner cases that need to be investigated.
I'll update this issue with additional information once we are confident it could work properly.
Changed the title: "Can "cuda-mps-server" work on nvidia-docker?" → "MPS Support"
xpp1021 commented on Jan 20, 2018
Hi,
Does 2.0 support CUDA 9 Volta MPS now?
@3XX0, thanks.
andrewpsp commented on Jan 28, 2018
Missing MPS support seems like it would be a blocker for creating service deployments in orchestration. I'll be following the outcome, in anticipation of a pull request covering the Swarm or Kubernetes use case.
vivisidea commented on Feb 4, 2018
Any progress? Or is there any workaround so I can use the CUDA Multi-Process Service in the container?
ksze commented on Feb 9, 2018
Shouldn't it be the other way around? I.e., MPS should run on the host so it can allocate process time to multiple containers? Is that an already supported architecture?
3XX0 commented on Feb 9, 2018
With 2.0 it should work as long as you run the MPS server on the host and use `--ipc=host`. We're working toward a better integration though, so I'll keep this issue open.
bhagatindia commented on Feb 11, 2018
@3XX0,
Does it mean that we can set and limit CUDA_MPS_ACTIVE_THREAD_PERCENTAGE for each container? Any examples of usage would really help.
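For instance, is something like this the intended usage? (Just a sketch; the image name and command are placeholders, and I'm assuming the Volta MPS environment variable is simply passed into the container.)

```shell
# Hypothetical: cap this container at 50% of the SMs via Volta MPS,
# with the MPS server running on the host.
docker run --runtime=nvidia --ipc=host \
    -e CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=50 \
    my-cuda-image ./my_gpu_app   # placeholder image and command
```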
Could you please elaborate what you mean by "better integration"?
Thank you
WIZARD-CXY commented on Mar 1, 2018
mark
hehaifengcn commented on Apr 12, 2018
@3XX0 How much does `--ipc=host` compromise security? Somebody asked the question on SO but there's no answer yet: https://stackoverflow.com/questions/38907708/docker-ipc-host-and-security
hehaifengcn commented on Apr 20, 2018
@3XX0 Any update on when nvidia-docker will officially support MPS?
hehaifengcn commented on Apr 27, 2018
@3XX0 I did some tests and `--ipc=host` does appear to work. But is there anything else we should pay attention to when running current nvidia-docker 2 under MPS? Would you recommend using it in production? It would be super helpful if you could provide some guidance here.
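The setup I tested looked roughly like this (a sketch, not a definitive recipe; the image name and command are placeholders):

```shell
# Host side: put the GPU in exclusive compute mode, then start the
# MPS control daemon (pre-Volta MPS flow).
sudo nvidia-smi -i 0 -c EXCLUSIVE_PROCESS
nvidia-cuda-mps-control -d

# Container side: share the host IPC namespace so CUDA clients inside
# can reach the host MPS server. Depending on the setup, the MPS pipe
# directory (default /tmp/nvidia-mps) may also need to be mounted.
docker run --runtime=nvidia --ipc=host \
    -v /tmp/nvidia-mps:/tmp/nvidia-mps \
    my-cuda-image ./my_gpu_app   # placeholder image and command
```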
27 remaining items