TensorFlow in nvidia-docker: failed call to cuInit: CUDA_ERROR_UNKNOWN

I have been working on getting an application that depends on TensorFlow to run as a Docker container with nvidia-docker. I have built my application on top of the tensorflow/tensorflow:latest-gpu-py3 image. I run my container with the following command:

sudo nvidia-docker run -d -p 9090:9090 -v /src/weights:/weights myname/myrepo:mylabel
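The image itself is built on top of the TensorFlow GPU base image; a minimal Dockerfile along these lines (the file names and CMD here are placeholders, not my actual build):

```dockerfile
# Base image with TensorFlow GPU support (py3)
FROM tensorflow/tensorflow:latest-gpu-py3

# Copy the application into the image (paths are hypothetical)
COPY . /app
WORKDIR /app

# 9090 matches the -p 9090:9090 mapping in the run command above
EXPOSE 9090

CMD ["python3", "app.py"]
```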

When I view the logs through portainer, I see the following:

    2017-05-16 03:41:47.715682: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
    2017-05-16 03:41:47.715896: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
    2017-05-16 03:41:47.715948: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
    2017-05-16 03:41:47.715978: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
    2017-05-16 03:41:47.716002: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
    2017-05-16 03:41:47.718076: E tensorflow/stream_executor/cuda/cuda_driver.cc:405] failed call to cuInit: CUDA_ERROR_UNKNOWN
    2017-05-16 03:41:47.718177: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:158] retrieving CUDA diagnostic information for host: 1e22bdaf82f1
    2017-05-16 03:41:47.718216: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: 1e22bdaf82f1
    2017-05-16 03:41:47.718298: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: 367.57.0
    2017-05-16 03:41:47.718398: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:369] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module  367.57  Mon Oct  3 20:37:01 PDT 2016
    GCC version:  gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3)
    """
    2017-05-16 03:41:47.718455: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: 367.57.0
    2017-05-16 03:41:47.718484: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:300] kernel version seems to match DSO: 367.57.0

The container appears to start normally, and my application seems to run. When I send it a prediction request, the prediction is returned correctly, but at the slow speed I would expect from running inference on the CPU, so it seems clear that the GPU is not being used. I also tried running nvidia-smi inside the same container to make sure it can see my GPU, with this result:

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  GRID K1             Off  | 0000:00:07.0     Off |                  N/A |
    | N/A   28C    P8     7W / 31W  |     25MiB / 4036MiB  |      0%      Default |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID  Type  Process name                               Usage      |
    |=============================================================================|
    +-----------------------------------------------------------------------------+

I'm certainly no expert in this area, but it does look like the GPU is visible from inside the container. Any ideas how to get this working with TensorFlow?
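In case it helps with diagnosis: the failing call in the log is cuInit from the CUDA driver API, and it can be exercised without TensorFlow by loading libcuda directly. This is just a sketch I put together (the helper name and messages are my own, not from TensorFlow); a non-zero CUresult here would confirm the problem is at the driver level rather than in TensorFlow itself:

```python
import ctypes

def check_cuinit(lib_name="libcuda.so.1"):
    """Call cuInit(0) directly via the CUDA driver API and report the result.

    Returns a human-readable status string. If the driver library cannot be
    loaded at all, the driver files nvidia-docker is supposed to mount into
    the container are not visible.
    """
    try:
        libcuda = ctypes.CDLL(lib_name)
    except OSError:
        return "could not load {} (driver not visible in container)".format(lib_name)
    # Signature: CUresult cuInit(unsigned int Flags); 0 means CUDA_SUCCESS
    result = libcuda.cuInit(0)
    if result == 0:
        return "cuInit succeeded"
    return "cuInit failed with CUresult {}".format(result)

if __name__ == "__main__":
    print(check_cuinit())
```

Running this inside the container should either succeed or fail with the same error code TensorFlow sees, which narrows the problem down to the driver/container setup.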