Conda installation

Last modified by Jan Rhebergen on 2022/01/24 15:56

Creating a basic functional Jupyter data-science/machine learning environment

Anaconda, or conda, offers a well-known and standardised environment for data science and other activities. Anaconda is the full distribution, which includes a graphical interface (Anaconda Navigator) and a large set of preinstalled packages, whereas Miniconda is a much smaller and lighter installer that provides the conda command-line tool and achieves the same purpose.

Download the basic version from https://docs.conda.io/en/latest/miniconda.html and make it executable (chmod +x). We place the downloaded installer in /home/4all/Downloads so that it is available for others to install as well. When installing, accept all the default options except for the question "Do you wish the installer to initialize Miniconda3", which we do want, so we answer "y".
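The chmod +x step can also be done from Python, which is handy when scripting the shared download location. This is a minimal sketch; the installer file name below is hypothetical, so use the name of the file you actually downloaded. Note that chmod +x honours the umask, whereas this sketch simply sets all three execute bits.

```python
import stat
from pathlib import Path


def make_executable(path):
    """Add the execute bits for user, group and others, similar to `chmod +x`.

    Unlike `chmod +x`, this ignores the umask and always sets all three bits.
    Returns the resulting permission bits.
    """
    p = Path(path)
    p.chmod(p.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return stat.S_IMODE(p.stat().st_mode)


if __name__ == "__main__":
    # Hypothetical file name: replace it with the installer you actually downloaded.
    installer = Path("/home/4all/Downloads/Miniconda3-latest-Linux-x86_64.sh")
    if installer.exists():
        print(oct(make_executable(installer)))
    else:
        print(f"{installer} not found")
```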

After installation you can add the conda-forge channel to expand the available packages and versions (see https://conda-forge.org/):

conda config --add channels conda-forge
conda config --set channel_priority strict
conda update -n base -c defaults conda
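To illustrate what channel_priority strict changes, here is a simplified toy model (our own illustration, not conda's real solver): with strict priority a package is only taken from the highest-priority channel that carries it, while flexible priority may fall back to a lower-priority channel. The channel contents and version numbers are made up.

```python
# Toy model of conda's channel_priority setting (an illustration, not conda's solver).
# Channel contents and version numbers below are made up.


def pick(package, channels, min_version=(0,), strict=True):
    """Return (channel, version) for `package`, or None if nothing fits.

    channels: list of (channel_name, {package: [version tuples]}) in priority order.
    strict=True:  only the highest-priority channel that carries the package is used,
                  even if none of its versions satisfy min_version.
    strict=False: fall back to lower-priority channels when needed (flexible).
    """
    for name, index in channels:
        candidates = [v for v in index.get(package, []) if v >= min_version]
        if candidates:
            return name, max(candidates)
        if strict and package in index:
            return None  # package exists here, so strict mode never looks further
    return None


CHANNELS = [
    ("conda-forge", {"example-pkg": [(1, 3, 5)]}),
    ("defaults", {"example-pkg": [(1, 4, 0)], "other-pkg": [(2, 0)]}),
]

if __name__ == "__main__":
    print(pick("example-pkg", CHANNELS))                        # ('conda-forge', (1, 3, 5))
    print(pick("example-pkg", CHANNELS, (1, 4)))                # None
    print(pick("example-pkg", CHANNELS, (1, 4), strict=False))  # ('defaults', (1, 4, 0))
```

With this made-up data, asking for example-pkg 1.4 or newer fails under strict priority but is satisfied from defaults under flexible priority.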

Attached to this page you can find the handy conda-cheat-sheet.pdf, which contains examples of regularly used commands. For each user we install a simple data-science environment with the following default tools, which we consider essential. We leave the default base environment as it is.

Jupyter lab
spyder (optional Python IDE)
pandas
numpy (installed automatically as a pandas dependency)

Some of these depend on one another and may be installed automatically when you select a package.

conda create -n data-science
conda activate data-science
conda install jupyterlab
conda install pandas
conda install jupyterlab-git

After executing the above we have a basic environment suitable for data science.
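A quick sanity check, run with python inside the activated data-science environment, confirms that the packages are importable. This is a sketch; the import name jupyterlab_git for the jupyterlab-git package is our assumption.

```python
import importlib.util
import sys

# Import names of the packages installed above
# (assumption: jupyterlab-git is importable as jupyterlab_git).
REQUIRED = ["jupyterlab", "pandas", "numpy", "jupyterlab_git"]


def check_packages(names=REQUIRED):
    """Map each import name to True if this interpreter can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    for name, ok in check_packages().items():
        print(f"{name:15} {'ok' if ok else 'MISSING'}")
```

If the interpreter path does not point into the data-science environment, the environment is probably not activated.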

For "The Littlest JupyterHub" (TLJH) we usually execute the following commands. When installing into our own environment we can omit the sudo -E prefix.

sudo -E conda install pandas
sudo -E conda install numpy
sudo -E conda install scikit-learn
sudo -E conda install nltk
sudo -E conda install nodejs -c conda-forge --repodata-fn=repodata.json
#JBRv if applicable this one too
sudo -E jupyter labextension uninstall @jupyterlab/plotly-extension
#JBRv this one can take a while
sudo -E jupyter labextension install jupyterlab-plotly plotlywidget @jupyter-widgets/jupyterlab-manager
#JBRv the next one causes a lot of conflicts and needs to be looked at
sudo -E conda install scikit-learn-intelex
sudo -E jupyter lab build

sudo -E conda install geopandas folium plotly matplotlib geopy
sudo -E conda install r-base r-essentials r-irkernel
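Conda's documentation recommends installing the packages an environment needs at the same time, since installing one program at a time can lead to dependency conflicts. The sketch below uses a hypothetical helper (not part of TLJH or conda) that builds a single conda install command from the package list above and only prints it unless dry_run=False; use_sudo=False drops the sudo -E prefix for your own environment.

```python
import shlex
import subprocess

# Packages from the TLJH list above, installed in one transaction.
PACKAGES = ["pandas", "numpy", "scikit-learn", "nltk",
            "geopandas", "folium", "plotly", "matplotlib", "geopy"]


def conda_install_cmd(packages, use_sudo=True, channel=None):
    """Build one `conda install` command; -y skips the confirmation prompt."""
    cmd = ["sudo", "-E"] if use_sudo else []
    cmd += ["conda", "install", "-y"]
    if channel:
        cmd += ["-c", channel]
    return cmd + list(packages)


def run(cmd, dry_run=True):
    """Print the command, and only execute it when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(conda_install_cmd(PACKAGES))  # dry run: prints the command only
```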

For remote access later, you need to do a basic configuration and set a password.

jupyter server --generate-config
jupyter server password
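Both commands write into Jupyter's per-user configuration directory, normally ~/.jupyter unless JUPYTER_CONFIG_DIR is set. A small sketch to check that both steps have been done; the file names jupyter_server_config.py (from --generate-config) and jupyter_server_config.json (where jupyter server password stores the hashed password) are our assumption based on current jupyter_server releases.

```python
import os
from pathlib import Path


def jupyter_config_dir():
    """Jupyter's per-user config directory: $JUPYTER_CONFIG_DIR or ~/.jupyter."""
    return Path(os.environ.get("JUPYTER_CONFIG_DIR", Path.home() / ".jupyter"))


def server_config_status(config_dir=None):
    """Report which of the (assumed) server config files exist."""
    d = Path(config_dir) if config_dir else jupyter_config_dir()
    return {
        # written by `jupyter server --generate-config`
        "generated_config": (d / "jupyter_server_config.py").is_file(),
        # assumed location of the hashed password from `jupyter server password`
        "password_config": (d / "jupyter_server_config.json").is_file(),
    }


if __name__ == "__main__":
    print(jupyter_config_dir(), server_config_status())
```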

If we also want to use the GPU, we need to install one of the following as well (please also refer to this page for more info).

conda install keras-gpu

or this:

conda install tensorflow-gpu
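To check whether the GPU is actually visible, you can run the following in the environment. It is a sketch that assumes TensorFlow 2.1 or later (where tf.config.list_physical_devices exists) and returns None when TensorFlow is not installed.

```python
def visible_gpus():
    """Return the GPU devices TensorFlow can see, or None without TensorFlow."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return tf.config.list_physical_devices("GPU")


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment")
    elif not gpus:
        print("TensorFlow is installed but sees no GPU")
    else:
        print(f"{len(gpus)} GPU(s):", [g.name for g in gpus])
```

An empty list usually means the GPU build or the matching driver libraries are missing.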

Subsequent steps

Depending on the overall system setup, the setup described above may not work as you expect. This is particularly the case when you are using a docker environment for tensorflow. System76 has created a great tool called tensorman which eases the management of tensorflow docker containers. The reason for using such containers is that lining up the correct versions of all the tools and libraries can be tricky and prone to failure, especially after upgrades/updates.

When tensorman is used to manage the docker containers, the tools and the conda environment don't know about each other: the tools (and libraries) are in the container, while the conda modules are in their own environment on the host. One solution to this problem is to not use conda at all but to use pip inside the tensorflow docker container. Basically you treat the container the same way as you would treat a python environment in pip or conda. This is perfectly legitimate but can be cumbersome and hard for novices. We choose the conda approach, partly because we also prefer it when we're not using tensorflow docker containers.

We assume you have a tensorflow container running with jupyter-lab installed as described here. As the container runs in the subdirectory /project of your $HOME, it is not aware of any other environment(s) you may have (whether of conda or pip origin). From a bash command prompt inside your tensorflow container you can install conda in the /project/miniconda3 subdirectory. It will create the usual (base) environment, which we will use by default (NB: it also creates a .bashrc file!). If you have already started Jupyter, it will not know about this environment because it uses the ipython kernel from the tensorflow docker container. Unless remedied, this results in "unknown module" error messages.
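To see which interpreter a notebook is really using, you can run the following in a notebook cell. If is_expected_conda is False, the kernel is the container's own Python and your conda packages will not be found. The prefix /project/miniconda3 matches the install location above; adjust it if you installed conda elsewhere.

```python
import os
import sys


def kernel_environment(expected_prefix="/project/miniconda3"):
    """Report which Python runs this kernel and whether it lives under expected_prefix."""
    prefix = os.path.realpath(sys.prefix)
    root = os.path.realpath(expected_prefix)
    return {
        "executable": sys.executable,
        "prefix": prefix,
        "is_expected_conda": os.path.commonpath([prefix, root]) == root,
    }


if __name__ == "__main__":
    for key, value in kernel_environment().items():
        print(f"{key}: {value}")
```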

To remedy this situation execute the following steps:

conda install ipykernel
python -m ipykernel install --user --name conda-base --display-name "Conda (base)"
jupyter kernelspec list
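The ipykernel install --user command writes a small kernel.json (on Linux under ~/.local/share/jupyter/kernels/<name>) that tells Jupyter which Python to launch. The sketch below writes a simplified version of that file into a directory of your choice and lists the specs found there, roughly like jupyter kernelspec list. The real command also copies logo files and may add extra metadata, so treat this as an illustration of the idea rather than a replacement.

```python
import json
import sys
from pathlib import Path


def write_kernelspec(kernels_dir, name, display_name, python=sys.executable):
    """Write a minimal kernel.json, roughly what `ipykernel install` produces."""
    spec = {
        "argv": [python, "-m", "ipykernel_launcher", "-f", "{connection_file}"],
        "display_name": display_name,
        "language": "python",
    }
    kernel_dir = Path(kernels_dir) / name
    kernel_dir.mkdir(parents=True, exist_ok=True)
    (kernel_dir / "kernel.json").write_text(json.dumps(spec, indent=1))
    return kernel_dir


def list_kernelspecs(kernels_dir):
    """Map kernel name to display name for every kernel.json in kernels_dir."""
    result = {}
    for path in sorted(Path(kernels_dir).glob("*/kernel.json")):
        result[path.parent.name] = json.loads(path.read_text())["display_name"]
    return result


if __name__ == "__main__":
    import tempfile
    with tempfile.TemporaryDirectory() as tmp:
        write_kernelspec(tmp, "conda-base", "Conda (base)")
        print(list_kernelspecs(tmp))  # {'conda-base': 'Conda (base)'}
```

The argv entry is what makes the difference: it points at the conda Python, so the kernel sees the conda packages.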

You can now (re)start Jupyter from your tensorflow command shell (replace the port number with yours from here):

jupyter lab --ip=0.0.0.0 --port=8889 --no-browser

(NB: Jupyter needs to be restarted once for this to take effect and have the newly defined kernel available)
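If Jupyter refuses to start because the port is already taken, a quick check helps. This is a minimal sketch using only the standard library; inside a docker container it checks the container's own ports, not the port mapping on the host.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False
        return True


if __name__ == "__main__":
    port = 8889  # the port from the command above; replace it with yours
    print(f"port {port} is", "free" if port_is_free(port) else "in use")
```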

You can now select the kernel named "Conda (base)". This makes your conda environment (i.e. its packages/modules) available for use.
