TensorFlow, Keras, PyTorch, Theano/Aesara, etc.
TensorFlow installation and configuration
These libraries (see title) enable deep learning, which usually means the application of (convolutional) neural networks.
- TensorFlow is a deep-learning toolkit originally from Google; it is very powerful but can be tricky to set up.
- PyTorch is originally from Facebook; it is a little easier to use and good for quick prototyping (Tesla uses it in production).
- Theano is a Python deep-learning library that is well suited to teaching, development, and gaining a deeper understanding. Development continues in a fork named Aesara.
- Keras is a Python library that serves as a high-level API on top of TensorFlow.
- CUDA is the low-level library that enables the use of NVIDIA GPUs for data-science applications.
A more extensive explanation of the tools mentioned above can be found on Wikipedia. Below we describe how to install TensorFlow and CUDA. We originally chose Pop!_OS as our operating system because it supports data-science applications and libraries so well.
NB: we will only install tensorman (the TensorFlow manager) and the CUDA libraries.
In essence we follow the instructions given by System76, the creator of Pop!_OS:
https://support.system76.com/articles/cuda/
https://support.system76.com/articles/tensorman/
apt install system76-cuda-latest
This command can potentially pull in a lot of packages (roughly 2 GB), so be patient. Subsequently install the following package (the latest version available):
apt install system76-cudnn-11.1
The latter package may pull in a similar amount, depending on how recent 'latest' is: if the CUDA and cuDNN versions match, the extra download is limited; if the cuDNN package lags behind, it can be substantial.
To switch between versions (if multiple are installed), use:
update-alternatives --config cuda
and verify the active version with:
nvcc -V
To get going with TensorFlow we install tensorman (the TensorFlow manager):
apt install tensorman
For NVIDIA CUDA support, the following package must also be installed:
apt install nvidia-container-runtime
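With tensorman and the container runtime installed, running code inside the TensorFlow container looks roughly like this. This is a sketch based on the System76 tensorman documentation; the script name is a placeholder:

```shell
# Pull the latest TensorFlow image managed by tensorman
tensorman pull latest

# Run a (hypothetical) script on the GPU inside the container
tensorman run --gpu python -- ./script.py

# Or open an interactive shell in the container
tensorman run --gpu bash
```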
Users who work with tensorman need to be in the docker group. Edit the /etc/group file and add their user names to the docker group entry:
docker:x:998:jan,denise,romario,bas,stan,gertjan
Don't forget to run grpconv afterwards to make the additions effective!
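Alternatively, instead of editing /etc/group by hand, the standard usermod tool achieves the same result per user; the user name below is just an example:

```shell
# Add user 'jan' to the docker group (run as root)
usermod -aG docker jan

# Verify the membership (the user may need to log out and back in first)
groups jan
```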
All of the above should satisfy what is needed at the system level. Depending on what you want to do and how you want to use it, you may need to install TensorFlow- or Keras-related conda packages.
For more detailed information check out the PDF file attached here.
(base) jan@liszt:~$ nvidia-smi
Fri May 28 21:16:05 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.73.01 Driver Version: 460.73.01 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce RTX 2080 Off | 00000000:01:00.0 Off | N/A |
| N/A 32C P8 6W / N/A | 752MiB / 7980MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 3223 G /usr/lib/xorg/Xorg 167MiB |
| 0 N/A N/A 3436 G /usr/bin/gnome-shell 12MiB |
| 0 N/A N/A 694607 C /usr/bin/python3 193MiB |
| 0 N/A N/A 694761 C /usr/bin/python3 375MiB |
+-----------------------------------------------------------------------------+
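The nvidia-smi output above only shows that the driver sees the GPU. To check whether TensorFlow itself can use it, a minimal sketch (run it inside the tensorman container, where TensorFlow and the CUDA libraries are available; outside the container the import is guarded):

```python
# Check whether TensorFlow can see a GPU. Guard the import so the
# snippet also runs in environments without TensorFlow installed.
try:
    import tensorflow as tf
    gpus = tf.config.list_physical_devices("GPU")
except ImportError:
    gpus = None  # TensorFlow not installed in this environment

print("GPUs visible to TensorFlow:", gpus)
```

An empty list means TensorFlow is running CPU-only, which is exactly the Jupyter pitfall described in the notes below.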
Notes
As the System76 tensorman web page already notes, the installation and configuration of TensorFlow can be a hairy issue. There are of course multiple instructables and YouTube clips available that help, but very few offer a sustainable solution. By a sustainable solution I mean one that will survive updates, is compatible with the environment, and can be safely updated/upgraded. The TensorFlow Docker container offers exactly this, and tensorman is the tool to manage it.

The supplied documentation is a bit scarce and often does not address specific but not uncommon use cases; searching the web you will find various posts and solutions. The user should be aware that loading TensorFlow in Jupyter does not mean your code is executing on the GPU! To have code execute on the GPU using TensorFlow, one can use tensorman to run it (by hand). The best solution, however, is to create a customised TensorFlow Docker container that also includes Jupyter (Notebook/Lab) and runs your code on the GPU by default.
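Such a customised container can be built with tensorman itself. The sketch below follows the workflow in the System76 tensorman documentation; the exact flags may differ per tensorman version, and the container and image names are made-up examples:

```shell
# Start a named container with GPU, Python 3 and Jupyter support,
# publishing Jupyter's port to the host
tensorman run -p 8888:8888 --gpu --python3 --jupyter --name custom-jupyter bash

# ... inside that container, install any extra packages you need, e.g.:
#   pip install pandas scikit-learn

# From a second terminal, save the running container as a new image
tensorman save custom-jupyter my-tf-jupyter

# From then on, start Jupyter from the customised image; code in the
# notebooks runs on the GPU by default
tensorman =my-tf-jupyter run -p 8888:8888 --gpu bash -c 'jupyter notebook --ip=0.0.0.0'
```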