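Preface:
An earlier article described in detail how to install and configure the environment manually, step by step.
It covered:
- the driver
- cuda
- cudnn
- nvidia-docker
Installation is now far simpler than it used to be. The first three items are handled by a single apt command. Without further ado, let's get started.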
Remove / uninstall anything previously installed for NVIDIA
    sudo apt purge 'nvidia*'
This removes the driver as well as the nvidia-docker command.
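Install the graphics driver
At the time of writing, the current major driver version is 390. The corresponding commands are:
    sudo add-apt-repository ppa:graphics-drivers
    sudo apt-get update
    sudo apt install nvidia-390
    # Untested alternative: apt install nvidia-current
The old approach of manually downloading the driver and CUDA and then installing them step by step is completely obsolete!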
Once the installation finishes, run nvidia-smi. The output should look like this:
    Fri Aug 10 10:54:49 2018
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 390.77                 Driver Version: 390.77                    |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  GeForce GTX 106...  Off  | 00000000:01:00.0 Off |                  N/A |
    |  0%   40C    P5    16W / 120W |      0MiB /  6077MiB |      2%      Default |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+
After installing, it is best to reboot once to make sure the driver takes effect.
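To confirm from a script that the expected driver is active, the version can be read out of the nvidia-smi header. The following is a minimal sketch, not part of the original setup: the helper names `parse_driver_version` and `major_version` are hypothetical, and the sample line is taken from the output shown in this article.

```python
import re

# Header line taken from the nvidia-smi output shown in this article.
SAMPLE = "| NVIDIA-SMI 390.77                 Driver Version: 390.77                    |"


def parse_driver_version(smi_output):
    """Return the dotted driver version found in nvidia-smi text, or None."""
    match = re.search(r"Driver Version:\s*([0-9]+(?:\.[0-9]+)*)", smi_output)
    return match.group(1) if match else None


def major_version(version):
    """Return the major part of a dotted version string as an int."""
    return int(version.split(".")[0])


version = parse_driver_version(SAMPLE)
print(version, major_version(version))
```

On a live machine the same function could be fed the text captured from running `nvidia-smi` (for example through Python's `subprocess.run`), assuming the header layout matches the one above.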
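Install the latest Docker CE
Because the nvidia-docker2 command-line tool is installed later, a recent docker-ce is required. The docker.io package that used to be installed directly through APT is incompatible and can no longer be used.
Note: uninstall the older versions first:
    sudo apt-get remove docker docker-engine docker.io
Then enter the commands from the official documentation one by one:
Official documentation: https://docs.docker.com/v17.12/install/linux/docker-ce/ubuntu/#install-docker-ce-1
They are not repeated here.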
Install nvidia-docker2
We install this because we need to use the GPU and the CUDA Toolkit inside Docker.
Official site: https://github.com/nvidia/nvidia-docker/wiki/Installation-(version-2.0)
- First, uninstall the old version
    docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
    sudo apt-get purge nvidia-docker
- Add the APT repository
    curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
      sudo apt-key add -
    distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
    curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
      sudo tee /etc/apt/sources.list.d/nvidia-docker.list
    sudo apt-get update
- Install
    sudo apt-get install nvidia-docker2
- After installing, check the version
    $ nvidia-docker version
    NVIDIA Docker: 2.0.3
    Client:
     Version: 18.06.0-ce
     API version: 1.38
     Go version: go1.10.3
     Git commit: 0ffa825
     Built: Wed Jul 18 19:11:02 2018
     OS/Arch: linux/amd64
     Experimental: false
    Server:
     Engine:
      Version: 18.06.0-ce
      API version: 1.38 (minimum version 1.12)
      Go version: go1.10.3
      Git commit: 0ffa825
      Built: Wed Jul 18 19:09:05 2018
      OS/Arch: linux/amd64
      Experimental: false
- After installing, check the environment
    sudo cat /etc/docker/daemon.json
    {
        "runtimes": {
            "nvidia": {
                "path": "nvidia-container-runtime",
                "runtimeArgs": []
            }
        }
    }
- Restart the Docker service to make sure the changes take effect
    sudo systemctl daemon-reload
    sudo systemctl restart docker
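Two of the steps above can be double-checked with a short script: the `$distribution` value used in the repository URL, and the `nvidia` runtime entry in daemon.json. This is a rough sketch under stated assumptions. The function names `distribution_id` and `has_nvidia_runtime` are hypothetical, the os-release sample is an invented Ubuntu 16.04 example (this article does not state the OS version), and the parser only handles simple KEY=value lines.

```python
import json

# Hypothetical os-release content, for illustration only.
OS_RELEASE_SAMPLE = 'NAME="Ubuntu"\nVERSION_ID="16.04"\nID=ubuntu\n'

# daemon.json content as shown in this article.
DAEMON_JSON_SAMPLE = """
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
"""


def distribution_id(os_release_text):
    """Approximate `. /etc/os-release; echo $ID$VERSION_ID` for KEY=value lines."""
    fields = {}
    for line in os_release_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in single or double quotes.
        fields[key] = value.strip().strip('"').strip("'")
    return fields.get("ID", "") + fields.get("VERSION_ID", "")


def has_nvidia_runtime(daemon_json_text):
    """Return True if the config registers nvidia-container-runtime as 'nvidia'."""
    config = json.loads(daemon_json_text)
    runtime = config.get("runtimes", {}).get("nvidia")
    return bool(runtime) and runtime.get("path") == "nvidia-container-runtime"


print(distribution_id(OS_RELEASE_SAMPLE))
print(has_nvidia_runtime(DAEMON_JSON_SAMPLE))
```

To check a real machine, the two functions would be given the contents of /etc/os-release and /etc/docker/daemon.json instead of the samples.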
Start a Docker image that uses the GPU
Start with a simple test:
    # Sample 1:
    nvidia-docker run --rm nvidia/cuda nvidia-smi

    # Sample 2:
    docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
Output:
    Fri Aug 10 04:50:35 2018
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 390.77                 Driver Version: 390.77                    |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  GeForce GTX 106...  Off  | 00000000:01:00.0 Off |                  N/A |
    |  0%   39C    P0    25W / 120W |      0MiB /  6077MiB |      2%      Default |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+
Use the TensorFlow GPU image
    # Pull the TensorFlow GPU image
    docker pull tensorflow/tensorflow:latest-gpu-py3

    # Start the container
    sudo nvidia-docker run --name wenjun-tf-gpu-py3 \
      -it -p 8888:8888 -p 6006:6006 \
      -v /data/notebooks:/notebooks \
      -v /data/jupyter-notebook-dataset:/dataset \
      tensorflow/tensorflow:latest-gpu-py3
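The launch command combines a container name, port mappings, and volume mounts. To make that structure explicit, here is a small sketch that assembles the equivalent argument list. The helper `build_run_command` is hypothetical, and it uses the `docker run --runtime=nvidia` form from Sample 2 rather than the `nvidia-docker` wrapper. The values are the ones from the command above.

```python
import shlex


def build_run_command(image, name, ports, volumes, runtime="nvidia"):
    """Assemble a `docker run` argument list from ports and volume mappings."""
    cmd = ["docker", "run", "--runtime=" + runtime, "--name", name, "-it"]
    for host_port, container_port in ports:
        # -p HOST:CONTAINER publishes a container port on the host.
        cmd += ["-p", "{}:{}".format(host_port, container_port)]
    for host_dir, container_dir in volumes:
        # -v HOST_DIR:CONTAINER_DIR mounts a host directory into the container.
        cmd += ["-v", "{}:{}".format(host_dir, container_dir)]
    cmd.append(image)
    return cmd


command = build_run_command(
    image="tensorflow/tensorflow:latest-gpu-py3",
    name="wenjun-tf-gpu-py3",
    ports=[(8888, 8888), (6006, 6006)],
    volumes=[
        ("/data/notebooks", "/notebooks"),
        ("/data/jupyter-notebook-dataset", "/dataset"),
    ],
)
# Print a shell-safe rendering of the command instead of running it.
print(" ".join(shlex.quote(arg) for arg in command))
```

The sketch only prints the command. Running it would still require Docker and nvidia-docker2 to be installed as described earlier.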
Confirm inside the Docker container that the GPU works
    import tensorflow as tf

    # Creates a graph.
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
    # Creates a session with log_device_placement set to True.
    sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
    # Runs the op.
    print(sess.run(c))
This is the output on my machine:
    MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
    2018-08-10 05:08:25.640901: I tensorflow/core/common_runtime/placer.cc:935] MatMul: (MatMul)/job:localhost/replica:0/task:0/device:GPU:0
    a: (Const): /job:localhost/replica:0/task:0/device:GPU:0
    2018-08-10 05:08:25.640934: I tensorflow/core/common_runtime/placer.cc:935] a: (Const)/job:localhost/replica:0/task:0/device:GPU:0
    b: (Const): /job:localhost/replica:0/task:0/device:GPU:0
    2018-08-10 05:08:25.640950: I tensorflow/core/common_runtime/placer.cc:935] b: (Const)/job:localhost/replica:0/task:0/device:GPU:0
    [[22. 28.]
     [49. 64.]]
As you can see, every op was placed on the GPU (device GPU:0).
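The placement log and the numeric result can both be checked without TensorFlow. The sketch below is only an illustration: `op_devices` and `matmul` are hypothetical helpers, the regular expression assumes the untimestamped log line format shown in this article, and the matrices are the ones from the test script.

```python
import re

# Untimestamped placement lines, as printed in the output shown in this article.
LOG = """\
MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
a: (Const): /job:localhost/replica:0/task:0/device:GPU:0
b: (Const): /job:localhost/replica:0/task:0/device:GPU:0
"""

# Matches "<op name>: (<op type>): <...>/device:<TYPE>:<index>".
PLACEMENT = re.compile(r"^(\w+): \((\w+)\): \S*/device:(\w+):(\d+)$")


def op_devices(log_text):
    """Map each op name to its (device_type, device_index) pair."""
    placements = {}
    for line in log_text.splitlines():
        match = PLACEMENT.match(line.strip())
        if match:
            placements[match.group(1)] = (match.group(3), int(match.group(4)))
    return placements


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


placements = op_devices(LOG)
print(all(device == ("GPU", 0) for device in placements.values()))

a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(matmul(a, b))
```

The product works out to 22 = 1*1 + 2*3 + 3*5 in the top-left entry and 64 = 4*2 + 5*4 + 6*6 in the bottom-right, which agrees with the matrix TensorFlow printed.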
This is an original article. Please credit the source when reposting:
https://www.flyml.net/2018/08/10/simple-deep-learning-gpu-env-setup/
