This repository has been archived by the owner on Jan 22, 2024. It is now read-only.

MPS Support #419

Closed
waynesuzq opened this issue Jul 6, 2017 · 28 comments

Comments

@waynesuzq

Hi,

When I use the "CUDA Multi-Process Service" (aka MPS) in an nvidia-docker environment, I run into a couple of issues, so I'm wondering whether MPS is supported in nvidia-docker. Please help me, thanks in advance!

Here are the problems I have run into:

  1. When I run nvidia-cuda-mps-control -d to start the MPS daemon inside nvidia-docker, I can't see this process from nvidia-smi; however, I can see the process from the host machine.
    In comparison, when I run the same command, nvidia-cuda-mps-control -d, on the host machine (physical server), I do see it in nvidia-smi. (You need to run a GPU program first to start the MPS server.)
  2. I tried running Caffe training with MPS as an example: 2 training processes at the same time in the nvidia-docker env. It showed:
    F0703 13:39:15.539633 97 common.cpp:165] Check failed: error == cudaSuccess (46 vs. 0) all CUDA-capable devices are busy or unavailable
    In comparison, this works fine on the host (physical machine).

I'm trying this on a P100 GPU, Ubuntu 14.

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61

Docker version 17.04.0-ce, build 4845c56

I hope this is the right place to ask, thanks again.

@3XX0
Member

3XX0 commented Jul 6, 2017

Short answer: it is not supported for now. However, we are looking at it for the 2.0 timeframe, but there are a lot of corner cases that need to be investigated.

I'll update this issue with additional information once we are confident it could work properly.

@3XX0 3XX0 changed the title Can "cuda-mps-server" work on nvidia-docker? MPS Support Nov 14, 2017
@xpp1021

xpp1021 commented Jan 20, 2018

Hi,
Does 2.0 support CUDA 9 for Volta MPS now?
@3XX0, thanks.

@andrewpsp

andrewpsp commented Jan 28, 2018

This lack of MPS support seems like it would be a blocker for creating service deployments in orchestration. I'll be following the outcome, in anticipation of a pull request covering the Swarm or Kubernetes use case.


@vivisidea

Any progress? Or is there any workaround so I can use the CUDA Multi-Process Service in a container?

@ksze

ksze commented Feb 9, 2018

Shouldn't it be the other way around? I.e., MPS should run on the host so it can allocate process time to multiple containers? Is that architecture already supported?

@3XX0
Member

3XX0 commented Feb 9, 2018

With 2.0 it should work as long as you run the MPS server on the host and use --ipc=host. We're working toward a better integration though, so I'll keep this issue open.

# Start the MPS control daemon, pinned to the second GPU device
sudo CUDA_DEVICE_ORDER=PCI_BUS_ID CUDA_VISIBLE_DEVICES=1 nvidia-cuda-mps-control -d

# Launch two containers sharing that GPU through MPS
docker run -ti --rm -e NVIDIA_VISIBLE_DEVICES=1 --runtime=nvidia --ipc=host nvidia/cuda
docker run -ti --rm -e NVIDIA_VISIBLE_DEVICES=1 --runtime=nvidia --ipc=host nvidia/cuda

# Shut down the MPS daemon
echo quit | sudo nvidia-cuda-mps-control
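
A quick way to sanity-check that the containers are actually going through MPS (an illustrative step, assuming the setup above):

# After launching a CUDA app in one of the containers, the host's process
# table should show an nvidia-cuda-mps-server entry, with the containers'
# CUDA processes attached as its clients
nvidia-smi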

@bhagatindia

bhagatindia commented Feb 11, 2018

@3XX0,

Does it mean that we can set and limit CUDA_MPS_ACTIVE_THREAD_PERCENTAGE for each container? Any examples of usage would really help.

Could you please elaborate on what you mean by "better integration"?

Thank you

@WIZARD-CXY

mark

@hehaifengcn

@3XX0 How much does "--ipc=host" compromise security? Somebody asked this question on SO but there's no answer yet: https://stackoverflow.com/questions/38907708/docker-ipc-host-and-security

@hehaifengcn

@3XX0 Any update on when nvidia-docker will officially support MPS?

@hehaifengcn

@3XX0 I did some tests and --ipc=host does appear to work. But is there anything else we should pay attention to when running the current nvidia-docker 2 under MPS? Would you recommend using it in production? It would be super helpful if you could provide some guidance here.

@flx42
Member

flx42 commented Oct 16, 2018

I've added a wiki page on how to use MPS with Docker Compose:
https://github.com/NVIDIA/nvidia-docker/wiki/MPS-(EXPERIMENTAL)

You can look at the docker-compose.yml file for implementation details.

@azazhu

azazhu commented Oct 29, 2018

Hi @flx42, is it possible to provide a compose file in format version 2.1? Lots of companies still use Docker 1.12 in their clusters and cannot upgrade their Docker version to 17.06 in the short term.

@flx42
Member

flx42 commented Oct 29, 2018

@azazhu are you running RHEL/Atomic's fork of Docker? If so, you can just remove the runtime: lines and it should work fine. That's the docker package on RHEL/CentOS and probably other derivatives.

If that's not what you are running, you won't be able to make it work, since the runtime option requires compose file format 2.3:
https://docs.docker.com/compose/compose-file/compose-versioning/#version-23

@azazhu

azazhu commented Oct 30, 2018

Thanks @flx42. Could you check whether my understanding is correct:

  1. nvidia-docker can work with Volta MPS even if we don't use the Docker Compose file you provided, right? We just need to: a) use nvidia-docker2; b) (recommended) set EXCLUSIVE_PROCESS on the host machine; c) start the MPS daemon (nvidia-cuda-mps-control) on the host machine; d) set CUDA_MPS_PIPE_DIRECTORY on the host machine; e) make sure the container can read the CUDA_MPS_PIPE_DIRECTORY path by mounting it with -v; f) start the container with "--ipc=host" (sketched below). Are a) through f) right?
  2. Another question: CUDA_MPS_ACTIVE_THREAD_PERCENTAGE should be set in the container instead of on the host machine, right?
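
In other words, something like this (a rough sketch; the GPU index and the /tmp/nvidia-mps pipe directory are placeholders, not values from this thread):

# b) recommended: put the GPU into EXCLUSIVE_PROCESS compute mode (host side)
sudo nvidia-smi -i 0 -c EXCLUSIVE_PROCESS

# c) + d) start the MPS daemon on the host with a chosen pipe directory
export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps
sudo -E nvidia-cuda-mps-control -d

# e) + f) mount the pipe directory into the container and share the host IPC namespace
docker run -ti --rm --runtime=nvidia --ipc=host \
    -v /tmp/nvidia-mps:/tmp/nvidia-mps \
    -e CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps \
    nvidia/cuda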

@flx42
Member

flx42 commented Oct 30, 2018

Yes, that should work. But you can also containerize the MPS daemon, as in the Docker Compose example.
I need to document the steps with the docker CLI too.
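
Roughly, a containerized daemon would look something like this (a sketch only, not the wiki's exact commands; the image, flags, and pipe directory are illustrative):

# Run the MPS control daemon itself in a container (foreground mode), sharing
# the host IPC namespace and a pipe directory that client containers also mount
docker run -d --runtime=nvidia --ipc=host \
    -v /tmp/nvidia-mps:/tmp/nvidia-mps \
    -e CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps \
    nvidia/cuda nvidia-cuda-mps-control -f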

> Another question: CUDA_MPS_ACTIVE_THREAD_PERCENTAGE should be set in the container instead of on the host machine, right?

IIRC you can set this value for the MPS daemon, or for all CUDA client apps. I think both work fine.
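
For example (illustrative only; the value 30 is arbitrary), a per-container limit can be passed as an environment variable, so it applies to the CUDA clients inside that container:

# CUDA processes in this container should be limited to ~30% of the GPU's threads
docker run -ti --rm --runtime=nvidia --ipc=host \
    -e CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=30 \
    nvidia/cuda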

@azazhu

azazhu commented Oct 31, 2018

Thanks @flx42. What do you mean by "containerize the MPS daemon"? Launching the MPS daemon (nvidia-cuda-mps-control) on both the host machine and in the container?
In my experiment, I only launched nvidia-cuda-mps-control on the host machine (I didn't launch it in the container) and it seems to work fine.

@flx42
Member

flx42 commented Oct 31, 2018

Yes, you can launch it inside a container or on the host. Both ways will work.

@azazhu

azazhu commented Oct 31, 2018

hi @flx42 ,

  1. It would be great if you could document the steps with the docker CLI, as I failed to launch docker-compose. I got "ERROR: could not find an available, non-overlapping IPv4 address pool among the defaults to assign to the network" in my work env. I tried changing the "bip" setting to avoid the subnet conflict, but still hit the same error.
  2. I used the method I mentioned above and it works, but it looks different from https://github.com/NVIDIA/nvidia-docker/wiki/MPS-(EXPERIMENTAL).
    In docker-compose.yml, it looks like the container has sys admin permission, so the container can set the GPU mode (to EXCLUSIVE_PROCESS) and launch the MPS daemon by itself (please correct me if my understanding is wrong). With the method I used, the GPU mode is set by the host machine, the MPS daemon is launched by the host machine, and the container doesn't have sys admin permission. Both methods can work, right?

@GoodJoey

GoodJoey commented Nov 6, 2018

@flx42 Does MPS support Pascal GPUs in nvidia-docker containers?

@flx42
Member

flx42 commented Nov 6, 2018

@GoodJoey not with the approach documented above; you would need a Volta GPU.

@lxl910915

lxl910915 commented Jan 12, 2020

@flx42 In the MPS wiki, does "Volta" mean the Volta architecture or a Volta GPU in the sentence "Only Volta MPS is supported"?
What's more, does 7.0 mean Compute Capability 7.0 in the sentence "NVIDIA GPU with Architecture >= Volta (7.0)"? Looking forward to your reply, thanks!


@renedlog

renedlog commented Feb 21, 2020

Seems like MPS is not supported on the newest Docker version; in particular, it's no longer --runtime=nvidia but --gpus=all now.
Also, the missing docker-compose support is annoying.

This example shows that the containers have some kind of problem with CUDA:

sudo CUDA_DEVICE_ORDER=PCI_BUS_ID CUDA_VISIBLE_DEVICES=1 nvidia-cuda-mps-control -d  # start daemon
docker run -it --rm -e NVIDIA_VISIBLE_DEVICES=1 --gpus=all --ipc=host tensorflow/tensorflow:2.1.0-gpu-py3 python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
echo quit | sudo nvidia-cuda-mps-control  # shut down daemon

Would really love to see "usable" support of MPS with Docker.
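
One thing worth double-checking (my assumption, not verified): mixing NVIDIA_VISIBLE_DEVICES with --gpus=all might be part of the problem, since --gpus does its own device selection. The device can be picked through --gpus directly:

# Illustrative variant: select GPU 1 via --gpus instead of NVIDIA_VISIBLE_DEVICES
docker run -it --rm --gpus '"device=1"' --ipc=host \
    tensorflow/tensorflow:2.1.0-gpu-py3 \
    python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"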

@elepherai

Any update on this issue?

@elepherai

> Seems like MPS is not supported on the newest Docker version; it's no longer --runtime=nvidia but --gpus=all now. [...] Would really love to see "usable" support of MPS with Docker.

Hi, have you solved this problem?

@husterdjx

husterdjx commented Sep 13, 2022

> Does it mean that we can set and limit CUDA_MPS_ACTIVE_THREAD_PERCENTAGE for each container? Any examples of usage would really help.

Hi, have you solved this problem? I want to set a different CUDA_MPS_ACTIVE_THREAD_PERCENTAGE for each container, such as 3×30% and 1×10% on a specific GPU.

@domef

domef commented Aug 31, 2023

Any update?

@elezar
Member

elezar commented Oct 30, 2023

We are working on a DRA driver for NVIDIA GPUs (https://github.com/NVIDIA/k8s-dra-driver), which will include better MPS support.

If there are use cases not covered by this (e.g. outside of K8s), please create an issue describing the use case against https://github.com/NVIDIA/nvidia-container-toolkit.

@elezar elezar closed this as completed Oct 30, 2023