Conda installation
Creating a basic functional Jupyter data-science/machine learning environment
Anaconda, built around the conda package manager, offers a well-known and standardised environment for data science and other activities. Anaconda is the larger distribution with a graphical interface, whereas Miniconda is a much smaller and lighter command-line tool that achieves the same purpose.
Download the basic version from https://docs.conda.io/en/latest/miniconda.html and make it executable (chmod +x). We place the downloaded installer in /home/4all/Downloads so that it is available for others to install as well. During installation accept all the default options except one: when asked "Do you wish the installer to initialize Miniconda3", answer "y", because we do want this.
After installation you can add another channel to expand the available packages and versions; see https://conda-forge.org/
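The download step can be sketched as follows. This is only a minimal sketch: the installer file name (64-bit Linux) and the repo.anaconda.com URL are assumptions not stated on this page, and the function names are hypothetical. Check the Miniconda page for the installer that matches your platform.

```shell
# Sketch: fetch the Miniconda installer into the shared download directory.
# ASSUMPTION: installer name and URL are for 64-bit Linux; adjust as needed.
INSTALLER="Miniconda3-latest-Linux-x86_64.sh"
DOWNLOAD_DIR="${DOWNLOAD_DIR:-/home/4all/Downloads}"
INSTALLER_URL="https://repo.anaconda.com/miniconda/${INSTALLER}"

fetch_installer() {
    mkdir -p "$DOWNLOAD_DIR" || return 1
    # Only download when the installer is not already present
    if [ ! -f "$DOWNLOAD_DIR/$INSTALLER" ]; then
        wget -P "$DOWNLOAD_DIR" "$INSTALLER_URL" || return 1
    fi
    chmod +x "$DOWNLOAD_DIR/$INSTALLER"
}

run_installer() {
    # Interactive: accept the defaults, but agree to initialize Miniconda3
    "$DOWNLOAD_DIR/$INSTALLER"
}
```

Nothing is downloaded until the functions are called, for example with fetch_installer followed by run_installer.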
conda config --add channels conda-forge
conda config --set channel_priority strict
conda update -n base -c defaults conda
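To confirm that the channel settings were applied, the configuration can be printed back. The helper below is a minimal sketch; the function name show_channel_config is hypothetical and it assumes conda is already on the PATH of the current shell.

```shell
# Sketch: print the channel settings so the result of the commands above can be checked.
show_channel_config() {
    if ! command -v conda >/dev/null 2>&1; then
        echo "conda not found on PATH; open a new shell after installing" >&2
        return 1
    fi
    conda config --show channels          # conda-forge should be listed first
    conda config --show channel_priority  # should report: strict
}
```

Run show_channel_config in a fresh shell after the installation; --add puts the new channel at the top of the list, so conda-forge is expected above defaults.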
Attached to this page is a handy reference that contains some examples of regularly used commands. We leave the default base environment as it is. For each user we install a simple data-science environment with default tools that are considered essential:
- Jupyter lab
- Spyder (optional Python IDE)
- pandas
- numpy (will be installed as a pandas dependency)
Some are dependent on one another and may be installed automatically when you select a package.
conda create -n data-science
conda activate data-science
conda install jupyterlab
conda install pandas
conda install jupyterlab-git
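Once the commands above have finished, a quick import check can confirm that the packages are usable. This is a minimal sketch; the function name check_env_imports is hypothetical and it assumes the environment name data-science used above.

```shell
# Sketch: confirm that the core packages import inside the new environment.
# ASSUMPTION: the environment is named "data-science" as created above.
ENV_NAME="${ENV_NAME:-data-science}"

check_env_imports() {
    # conda run executes a command inside the named environment
    # without activating it in the current shell.
    conda run -n "$ENV_NAME" python -c \
        'import pandas, numpy, jupyterlab; print("pandas", pandas.__version__); print("numpy", numpy.__version__)'
}
```

Calling check_env_imports prints the installed pandas and numpy versions, or fails with an import error if a package is missing.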
After executing the above we have a basic environment suitable for data science.
For "The Littlest JupyterHub" (TLJH) we usually execute the following. When installing this in our own environment we can omit the sudo -E part.
sudo -E conda install pandas
sudo -E conda install numpy
sudo -E conda install scikit-learn
sudo -E conda install nltk
sudo -E conda install nodejs -c conda-forge --repodata-fn=repodata.json
#JBRv if applicable this one too
sudo -E jupyter labextension uninstall @jupyterlab/plotly-extension
#JBRv this one can take a while
sudo -E jupyter labextension install jupyterlab-plotly plotlywidget @jupyter-widgets/jupyterlab-manager
#JBRv next one causes a lot of conflicts and needs to be looked at
sudo -E conda install scikit-learn-intelex
sudo -E jupyter lab build
sudo -E conda install geopandas folium plotly matplotlib geopy
sudo -E conda install r-base r-essentials r-irkernel
For remote access later, you need to do a basic configuration and set a password.
jupyter server --generate-config
jupyter server password
In case we also want to be able to use the GPU we need to install either of the following as well (please also refer to this page for more info).
conda install keras-gpu
or this:
conda install tensorflow-gpu
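Whether the GPU is actually picked up can be checked from the command line. The sketch below assumes TensorFlow 2.1 or later (where tf.config.list_physical_devices is available) and that it is run with the environment containing TensorFlow active; the function name check_gpu is hypothetical.

```shell
# Sketch: ask TensorFlow which GPUs it can see.
# ASSUMPTION: TensorFlow >= 2.1 is installed in the active environment.
check_gpu() {
    # An empty list ("[]") means TensorFlow will run on the CPU only
    python -c 'import tensorflow as tf; print(tf.config.list_physical_devices("GPU"))'
}
```

If check_gpu prints an empty list, the driver and library versions are the first thing to inspect.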
Subsequent steps
Depending on the complete system setup, the setup described above may not work as you expect. This is particularly the case when you use a Docker environment for TensorFlow. Lining up the correct versions of all the tools and libraries can be tricky and prone to failure, especially after upgrades or updates. For this reason System76 has created a great tool called tensorman, which eases the management of TensorFlow Docker containers.
When using Docker containers through tensorman, the tools in the container and the conda environment don't know about each other. The tools (and libraries) are in the container, while the conda modules are in their own environment on the host. One solution to this problem is to not use conda at all but to use pip inside the TensorFlow Docker container. Basically you treat the container the same way as you would treat a Python environment in pip or conda. This is perfectly legitimate but can be cumbersome and hard for novices. Hence we choose the conda approach, partly because we also prefer it when we're not using TensorFlow Docker containers.
We're assuming you have a TensorFlow container running with JupyterLab installed as described here. Because you are running this in the /project subdirectory of your $HOME, it is not aware of any other environments you may have (whether from conda or pip). From a bash command prompt inside your TensorFlow container you can install conda in the /project/miniconda3 subdirectory. This creates the usual (base) environment, which we will use by default (NB: it also creates a .bashrc file!). If you start Jupyter at this point, it will not know about this environment because it uses the IPython kernel from the TensorFlow Docker container. This results in "unknown module" error messages if not remedied.
To remedy this situation execute the following steps:
conda install ipykernel
python -m ipykernel install --user --name conda-base --display-name "Conda (base)"
jupyter kernelspec list
You can now (re)start Jupyter from your tensorflow command shell (replace port number with yours from here):
jupyter lab --ip=0.0.0.0 --port=8889 --no-browser
(NB: Jupyter needs to be restarted once for this to take effect and have the newly defined kernel available)
You can now select the kernel named "Conda (base)". This makes your conda environment (i.e. its packages/modules) available for use.
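If the kernel does not show up, or modules are still reported as unknown, it can help to inspect the registered kernel spec. The sketch below assumes the default per-user kernel directory on Linux (it differs when JUPYTER_DATA_DIR or XDG_DATA_HOME is set); the function name check_kernel is hypothetical.

```shell
# Sketch: check that the "conda-base" kernel is registered and which Python it starts.
# ASSUMPTION: default per-user kernel directory on Linux.
KERNEL_NAME="conda-base"
KERNEL_JSON="${HOME}/.local/share/jupyter/kernels/${KERNEL_NAME}/kernel.json"

check_kernel() {
    if [ -f "$KERNEL_JSON" ]; then
        # The first entry of "argv" is the interpreter the kernel launches;
        # it should point into the conda installation (e.g. /project/miniconda3).
        cat "$KERNEL_JSON"
    else
        echo "kernel spec not found: $KERNEL_JSON" >&2
        return 1
    fi
}
```

When the interpreter path in kernel.json points at the container's own Python instead of the conda installation, rerun the ipykernel install step from a shell where the conda base environment is active.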