TensorFlow, Keras, PyTorch, Theano/Aesara, etc.
TensorFlow installation and configuration
These libraries (see title) enable deep learning, which usually means the application of (convolutional) neural networks.
- TensorFlow is a deep-learning toolkit originally from Google; it is very powerful but can be tricky to set up.
- PyTorch is originally from Facebook; it is a little easier to use and good for quick prototyping (Tesla uses it in production).
- Theano is a Python deep-learning library that is well suited to teaching, development, and gaining a deeper understanding. Development continues in a fork named Aesara.
- Keras is a Python library that serves as a high-level API on top of TensorFlow.
- CUDA is the low-level library that enables the use of NVIDIA GPUs for data-science applications.
A more extensive explanation of the tools mentioned above can be found on Wikipedia. Below we describe how to install TensorFlow and CUDA. We originally chose Pop!_OS as our operating system because it supports data-science applications and libraries so well.
NB: we will only install tensorman (the TensorFlow manager) and the CUDA libraries.
In essence we follow the instructions given by System76, the creator of Pop!_OS:
https://support.system76.com/articles/cuda/
https://support.system76.com/articles/tensorman/
apt install system76-cuda-latest
This command can potentially pull in a lot of packages (roughly 2 GB), so be patient. Subsequently install the following package (the latest version available):
apt install system76-cudnn-11.1
The latter package may pull in a similar amount, depending on how recent 'latest' is: if the CUDA and cuDNN versions match, the extra download is limited; if the cuDNN package lags behind, it can be substantial.
To switch between versions (if multiple are installed), use:
update-alternatives --config cuda
and verify the active version with:
nvcc -V
To get going with TensorFlow we install tensorman (the TensorFlow manager):
apt install tensorman
For NVIDIA CUDA support, the following package must also be installed:
apt install nvidia-container-runtime
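With tensorman and the container runtime installed, running code inside the TensorFlow container looks roughly like this. This is a sketch based on the System76 tensorman documentation; the script name is a placeholder:

```shell
# Pull the latest TensorFlow image managed by tensorman
tensorman pull latest

# Run a (hypothetical) script on the GPU inside the container
tensorman run --gpu python -- ./script.py

# Or open an interactive shell in the container
tensorman run --gpu bash
```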
Users who work with tensorman need to be in the docker group. Edit the /etc/group file and add their user names to the docker group entry:
docker:x:998:jan,denise,romario,bas,stan,gertjan
Don't forget to run grpconv afterwards to make the additions effective!
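Alternatively, instead of editing /etc/group by hand, the standard usermod tool achieves the same result per user; the user name below is just an example:

```shell
# Add user 'jan' to the docker group (run as root)
usermod -aG docker jan

# Verify the membership (the user may need to log out and back in first)
groups jan
```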
All of the above should satisfy what is needed at the system level. Depending on what you want to do and how you want to use it, you may need to install TensorFlow- or Keras-related conda packages.
For more detailed information check out the PDF file attached here.
(base) jan@liszt:~$ nvidia-smi
Fri May 28 21:16:05 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.73.01 Driver Version: 460.73.01 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce RTX 2080 Off | 00000000:01:00.0 Off | N/A |
| N/A 32C P8 6W / N/A | 752MiB / 7980MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 3223 G /usr/lib/xorg/Xorg 167MiB |
| 0 N/A N/A 3436 G /usr/bin/gnome-shell 12MiB |
| 0 N/A N/A 694607 C /usr/bin/python3 193MiB |
| 0 N/A N/A 694761 C /usr/bin/python3 375MiB |
+-----------------------------------------------------------------------------+
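The nvidia-smi output above only shows that the driver sees the GPU. To check whether TensorFlow itself can use it, a minimal sketch (run it inside the tensorman container, where TensorFlow and the CUDA libraries are available; outside the container the import is guarded):

```python
# Check whether TensorFlow can see a GPU. Guard the import so the
# snippet also runs in environments without TensorFlow installed.
try:
    import tensorflow as tf
    gpus = tf.config.list_physical_devices("GPU")
except ImportError:
    gpus = None  # TensorFlow not installed in this environment

print("GPUs visible to TensorFlow:", gpus)
```

An empty list means TensorFlow is running CPU-only, which is exactly the Jupyter pitfall described in the notes below.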
Notes
As the System76 tensorman web page already notes, the installation and configuration of TensorFlow can be a hairy issue. There are of course multiple instructables and YouTube clips available that help, but very few offer a sustainable solution. By a sustainable solution I mean one that will survive updates, is compatible with the environment, and can be safely updated/upgraded. The TensorFlow Docker container offers exactly this, and tensorman is the tool to manage it.

The supplied documentation is a bit scarce and often does not address specific but not uncommon use cases; searching the web you will find various posts and solutions. The user should be aware that loading TensorFlow in Jupyter does not mean your code is executing on the GPU! To have code execute on the GPU using TensorFlow, one can use tensorman to run it (by hand). The best solution, however, is to create a customised TensorFlow Docker container that also includes Jupyter (Notebook/Lab) and runs your code on the GPU by default.
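Such a customised container can be built with tensorman itself. The sketch below follows the workflow in the System76 tensorman documentation; the exact flags may differ per tensorman version, and the container and image names are made-up examples:

```shell
# Start a named container with GPU, Python 3 and Jupyter support,
# publishing Jupyter's port to the host
tensorman run -p 8888:8888 --gpu --python3 --jupyter --name custom-jupyter bash

# ... inside that container, install any extra packages you need, e.g.:
#   pip install pandas scikit-learn

# From a second terminal, save the running container as a new image
tensorman save custom-jupyter my-tf-jupyter

# From then on, start Jupyter from the customised image; code in the
# notebooks runs on the GPU by default
tensorman =my-tf-jupyter run -p 8888:8888 --gpu bash -c 'jupyter notebook --ip=0.0.0.0'
```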