The MPS container has started running, but GPU resources cannot be accessed inside the container #805
Comments
My Deployment:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations: {}
  labels:
    k8s.kuboard.cn/layer: svc
    k8s.kuboard.cn/name: video
  name: video
  namespace: edge
  resourceVersion: '1597692'
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 1
  selector:
    matchLabels:
      k8s.kuboard.cn/layer: svc
      k8s.kuboard.cn/name: video
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        kubectl.kubernetes.io/restartedAt: '2024-07-08T17:37:11+08:00'
      creationTimestamp: null
      labels:
        k8s.kuboard.cn/layer: svc
        k8s.kuboard.cn/name: video
    spec:
      containers:
        - image: 'harbor.moolink.net/moolink/video-supervision:v1.2'
          imagePullPolicy: IfNotPresent
          name: video
          resources:
            limits:
              nvidia.com/gpu: '1'
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
            - mountPath: /dev/shm
              name: shm
      dnsPolicy: ClusterFirst
      nodeName: edgenode-test
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
        - hostPath:
            path: /dev/shm
            type: Directory
          name: shm
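
For MPS sharing to take effect, the device plugin itself has to be configured with an MPS sharing section; requesting nvidia.com/gpu from the pod alone is not enough. A minimal sketch of such a plugin config, assuming a Helm deployment and a 10-way split (the ConfigMap name and namespace below are placeholders, not something this issue confirms):

apiVersion: v1
kind: ConfigMap
metadata:
  name: nvidia-device-plugin-config   # placeholder name
  namespace: nvidia-device-plugin     # placeholder namespace
data:
  config.yaml: |
    version: v1
    sharing:
      mps:
        resources:
          - name: nvidia.com/gpu
            replicas: 10

With a sharing block like this, recent versions of the plugin's Helm chart deploy their own MPS control daemon on the node, so a manually started nvidia-cuda-mps-control -d on the host may not be the daemon the pods actually talk to.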
I met a similar issue and found a comment at #467 (comment)
Hi, any update on making the MPS shm size configurable in the latest releases? We cannot use it like this; each of our workloads needs a different shm size.
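
For reference, the standard Kubernetes pattern for a per-workload shm size is to mount a memory-backed emptyDir over /dev/shm. This is generic Kubernetes, not a plugin feature; whether it coexists with the shm that the plugin's MPS setup provides is exactly the open question here:

spec:
  containers:
    - name: video
      volumeMounts:
        - mountPath: /dev/shm
          name: shm
  volumes:
    - name: shm
      emptyDir:
        medium: Memory
        sizeLimit: 1Gi   # per-workload size; adjust as needed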
The template below is mostly useful for bug reports and support questions. Feel free to remove anything which doesn't apply to you and add more information where it makes sense.
Important Note: NVIDIA AI Enterprise customers can get support from NVIDIA Enterprise support. Please open a case here.
1. Quick Debug Information
2. Issue or feature description
After deploying the nvidia device plugin, I successfully made the GPU schedulable as 10 shared replicas. However, when running a YOLO workload in MPS mode inside the container, it is unable to access the GPU. With timeSlicing, the same workload works fine.
Do I need to enable anything else? I have already started “nvidia-cuda-mps-control -d”, and running “nvidia-smi” inside the container can also see the GPU resources.
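
For anyone debugging the same symptom, a quick sanity check is whether CUDA clients inside the pod can actually find the MPS control daemon. A sketch using the standard CUDA MPS environment variable and control command (these come from NVIDIA's MPS documentation, not from anything this plugin guarantees):

# Inside the pod: MPS clients locate the daemon through this variable;
# if it is empty, CUDA falls back to a normal (non-MPS) context
echo $CUDA_MPS_PIPE_DIRECTORY

# On the node running the control daemon: list the active MPS servers
echo get_server_list | nvidia-cuda-mps-control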
3. Information to attach (optional if deemed irrelevant)
/