This repository was archived by the owner on Jan 22, 2024. It is now read-only.

MPS Support #419

Closed
@waynesuzq

Description

@waynesuzq

Hi,

When I use the CUDA Multi-Process Service (MPS) in an nvidia-docker environment, I run into a couple of issues, so I'm wondering whether MPS is supported in nvidia-docker. Please help me, thanks in advance!

Here are the problems I have run into:

  1. When I run nvidia-cuda-mps-control -d inside nvidia-docker to start the MPS daemon, I can't see the process in nvidia-smi, although I can see it from the host machine.
    In comparison, when I run the same command, nvidia-cuda-mps-control -d, on the host machine (physical server), I can see the MPS server in nvidia-smi (a GPU program has to run first so that the MPS server gets started).
  2. As an example, I tried running Caffe training with MPS: two training processes at the same time in the nvidia-docker environment. It failed with:
    F0703 13:39:15.539633 97 common.cpp:165] Check failed: error == cudaSuccess (46 vs. 0) all CUDA-capable devices are busy or unavailable
    In comparison, the same setup works on the host (physical machine).

I'm trying this on a P100 GPU with Ubuntu 14:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61

Docker version 17.04.0-ce, build 4845c56

I hope this is the right place to ask, thanks again.
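Both symptoms are easier to narrow down from the host. Error 46 is `cudaErrorDevicesUnavailable`; one common cause is a GPU in an exclusive compute mode that another process (for example an MPS server) already holds, though nothing above confirms that is what happened here. Likewise, nvidia-smi inside a container often lists no processes at all, because the PIDs the driver reports belong to the host's PID namespace, which may explain the first symptom. The sketch below prints each GPU's compute mode and any MPS processes the host can see; `mps_diag` is a hypothetical helper name, not part of any NVIDIA tool.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic helper; intended to be run on the host.
mps_diag() {
  # Compute mode of each GPU (e.g. Default or Exclusive_Process).
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,name,compute_mode --format=csv
  else
    echo "nvidia-smi not found"
  fi
  # MPS control daemon and server processes in the host PID namespace.
  # The [n] pattern keeps grep from matching its own command line.
  ps -eo pid,args | grep '[n]vidia-cuda-mps' || echo "no MPS processes found"
}
```

Run `mps_diag` on the host after starting the daemon and again while a GPU program is running; as noted above, the MPS server process only appears once a first client has connected.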

Activity

3XX0

3XX0 commented on Jul 6, 2017

@3XX0
Member

Short answer: it is not supported for now. We are looking at it for the 2.0 timeframe, but there are a lot of corner cases that need to be investigated.

I'll update this issue with additional information once we are confident it could work properly.

changed the title from 'Can "cuda-mps-server" work on nvidia-docker?' to 'MPS Support' on Nov 14, 2017
xpp1021

xpp1021 commented on Jan 20, 2018

@xpp1021

Hi,
Does 2.0 support CUDA 9 Volta MPS now?
@3XX0, thanks.

andrewpsp

andrewpsp commented on Jan 28, 2018

@andrewpsp

MPS support seems like it would be a blocker for creating service deployments under orchestration. I'll be following the outcome in anticipation of a pull request covering the Swarm or Kubernetes use case.


vivisidea

vivisidea commented on Feb 4, 2018

@vivisidea

Any progress? Or is there a workaround that lets me use CUDA Multi-Process Service in a container?

ksze

ksze commented on Feb 9, 2018

@ksze

Shouldn't it be the other way around, i.e. the MPS server runs on the host so it can share the GPU among processes from multiple containers? Is that architecture already supported?

3XX0

3XX0 commented on Feb 9, 2018

@3XX0
Member

With 2.0 it should work as long as you run the MPS server on the host and use --ipc=host. We're working toward a better integration though, so I'll keep this issue open.

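# Start the MPS control daemon on the host, restricted to the second GPU
# (PCI_BUS_ID ordering makes CUDA's index 1 match nvidia-smi's numbering)
sudo CUDA_DEVICE_ORDER=PCI_BUS_ID CUDA_VISIBLE_DEVICES=1 nvidia-cuda-mps-control -d

# Launch two containers on that same GPU, each in its own terminal;
# --ipc=host shares the host's IPC namespace with the container
docker run -ti --rm -e NVIDIA_VISIBLE_DEVICES=1 --runtime=nvidia --ipc=host nvidia/cuda
docker run -ti --rm -e NVIDIA_VISIBLE_DEVICES=1 --runtime=nvidia --ipc=host nvidia/cuda

# When finished, shut down the MPS control daemon
echo quit | sudo nvidia-cuda-mps-control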
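To check that the containers' CUDA processes actually go through the host MPS server, the control daemon can be asked for its servers (`get_server_list`) and each server's clients (`get_client_list`). This is a sketch: `mps_clients` and `MPS_CTL` are hypothetical names, routing the queries through `sudo` just mirrors the `quit` command in the example above, and it assumes the daemon prints one PID per line.

```shell
#!/usr/bin/env bash
# Command used to query the MPS control daemon. Left unquoted below on
# purpose so that "sudo nvidia-cuda-mps-control" splits into two words.
MPS_CTL=${MPS_CTL:-"sudo nvidia-cuda-mps-control"}

# Hypothetical helper: print each MPS server PID with its client PIDs.
mps_clients() {
  local server clients
  for server in $(echo get_server_list | $MPS_CTL); do
    # Skip anything that is not a PID, such as an error message.
    case "$server" in
      ''|*[!0-9]*) continue ;;
    esac
    clients=$(echo "get_client_list $server" | $MPS_CTL | paste -s -d ' ' -)
    echo "server $server clients: ${clients:-none}"
  done
}
```

Run `mps_clients` on the host while both containers are running a CUDA program; if the server lists no clients, the containers are most likely not reaching the host daemon.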
bhagatindia

bhagatindia commented on Feb 11, 2018

@bhagatindia

@3XX0,

Does it mean that we can set and limit CUDA_MPS_ACTIVE_THREAD_PERCENTAGE for each container? Any examples of usage would really help.

Could you please elaborate what you mean by "better integration"?

Thank you
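A possible answer, not confirmed anywhere in this thread: with Volta MPS (CUDA 9 or later on a Volta-class GPU), a client process reads CUDA_MPS_ACTIVE_THREAD_PERCENTAGE from its own environment when it starts, so passing the variable to each container with `docker run -e` should cap that container's clients independently. Execution resource provisioning arrived with Volta MPS, so it likely would not apply to the Pascal P100 from the original report. The sketch below only prints the commands; `mps_client_cmd` and the image name `my-cuda-app` are hypothetical, and the 1-100 range check is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: print a docker command for one MPS client container whose CUDA
# processes should use at most $2 percent of the GPU's threads (Volta MPS).
# The image name used below (my-cuda-app) is hypothetical.
mps_client_cmd() {
  local gpu="$1" pct="$2" image="$3"
  # Reject non-integers and values outside 1-100 (assumed valid range).
  case "$pct" in
    ''|*[!0-9]*) echo "percentage must be an integer" >&2; return 1 ;;
  esac
  if [ "$pct" -lt 1 ] || [ "$pct" -gt 100 ]; then
    echo "percentage must be between 1 and 100" >&2
    return 1
  fi
  echo "docker run -d --rm --runtime=nvidia --ipc=host" \
    "-e NVIDIA_VISIBLE_DEVICES=$gpu" \
    "-e CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=$pct $image"
}

# Two containers on GPU 1, capped at 70% and 30% of the threads.
mps_client_cmd 1 70 my-cuda-app
mps_client_cmd 1 30 my-cuda-app
```

Piping the printed lines to `sh` would start the containers, assuming the host MPS daemon from the earlier example is already running. Printing instead of executing keeps the sketch easy to inspect before anything touches the GPU.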

WIZARD-CXY

WIZARD-CXY commented on Mar 1, 2018

@WIZARD-CXY

mark

hehaifengcn

hehaifengcn commented on Apr 12, 2018

@hehaifengcn

@3XX0 How much does "--ipc=host" compromise security? Somebody asked this on Stack Overflow, but there's no answer yet: https://stackoverflow.com/questions/38907708/docker-ipc-host-and-security

hehaifengcn

hehaifengcn commented on Apr 20, 2018

@hehaifengcn

@3XX0 Any update on when nvidia-docker will officially support MPS?

hehaifengcn

hehaifengcn commented on Apr 27, 2018

@hehaifengcn

@3XX0 I did some tests and --ipc=host does appear to work. But is there anything else we should pay attention to when running the current nvidia-docker 2 with MPS? Would you recommend using it in production? It would be super helpful if you could provide some guidance here.

27 remaining items


          MPS Support · Issue #419 · NVIDIA/nvidia-docker