Anaconda's distribution of Python claims to be the most popular platform for data science and machine learning. It ships with a large collection of packages and provides tools for managing those packages and their environments. Some of its advantages include:
Ease of managing packages, dependencies, and environments using Conda
Support for machine learning and deep learning models with scikit-learn, TensorFlow, and Theano
Integration with Dask, NumPy, pandas, and Numba for scalability
Support for visualization and analysis tools such as Matplotlib, Bokeh, Datashader, and HoloViews
Below, we describe how to run Anaconda's version of Python on the CofC cluster using Jupyter Notebooks. You will start a Jupyter notebook server on a compute node from the command line and connect to your notebook using a browser on your local workstation/laptop. This documentation is based on an example at Harvard's HPC FAS-RC.
General steps
Below are the steps for running a Python notebook on our HPC cluster using Anaconda and Jupyter Notebooks.
First time set up
Load the Anaconda module. Several anaconda2 and anaconda3 versions are available. It is best to use the latest one as long as there are no compatibility issues with the packages/libraries you intend to install and run.
Create the kernel/environment you need. For example, you can install a Python 2, Python 3, Haskell, Julia, or R environment, depending on the code you intend to run.
Since your Anaconda environments are stored locally in your account (~/.conda), you can install or uninstall these environments and any packages within them as necessary. You do not need a system administrator's permission or input to manage them.
Running applications
Once you have the Anaconda environment you need, you can:
load the anaconda2/anaconda3 module
activate the environment of choice
start an interactive SLURM session to reserve a compute resource
set up SSH port forwarding between the compute resource and your local computer (laptop/desktop)
start a Jupyter Notebook
access the Jupyter Notebook via a web browser on your local computer
run your notebook on the HPC as if it were on your local computer
These steps may look complicated, but we have scripts that make everything easier. Everyone needs to go through the first-time setup below. After that, you can either run calculations the easy way using scripts, or follow the more involved way, which spells out the individual steps that the scripts automate.
First time set up
The first time you use Anaconda and its distribution of Python/R, you need to perform the following tasks on the master/head node.
If you plan to run Python3 notebooks, load up the anaconda/3 module.
$user@hpc[~]: module load anaconda/3/2020.02
Python/R versions
Initially, only the base version of Anaconda is installed in a central/shared location. If you type conda env list, you will only see the base installation.
Depending on your needs, you will need to install specific versions of Python and R. Anaconda uses the conda tool to install packages and manage your software environments. In this particular case, we'll install a Python 3.7 environment.
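The exact command used on the cluster is not reproduced on this page. As a rough sketch only, assuming the environment name myJupyter_3.7 that appears in the activation step, and assuming that the notebook and ipykernel packages are what you want inside it, the conda invocation could be assembled as shown here. The helper name conda_create_command is hypothetical, and the snippet prints the command instead of running it.

```python
import shlex


def conda_create_command(env_name, python_version, packages=("notebook", "ipykernel")):
    """Return the argv list for creating a conda environment (hypothetical helper)."""
    # `conda create --name <env> python=<version> <pkg>...` is standard conda usage.
    # The package selection is an assumption, not the cluster's official recipe.
    return ["conda", "create", "--name", env_name, f"python={python_version}", *packages]


command = conda_create_command("myJupyter_3.7", "3.7")
# Print the command so it can be reviewed and pasted at the shell prompt.
print(shlex.join(command))
```

Printing the command first makes it easy to check the environment name and Python version before conda starts downloading packages.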
To activate this environment, you can use the following command:
$user@hpc[~]: source activate myJupyter_3.7
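Once the environment is active, it can be worth confirming that Python is really running from it. The following is a minimal sketch, not part of the cluster's scripts; the function name describe_environment is hypothetical. It relies on the CONDA_DEFAULT_ENV variable, which conda sets when an environment is activated.

```python
import os
import sys


def describe_environment():
    """Summarize which Python interpreter and conda environment are in use."""
    return {
        "python_version": "{}.{}".format(*sys.version_info[:2]),
        "interpreter": sys.executable,
        # conda sets CONDA_DEFAULT_ENV on activation; it is absent otherwise.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV", "(none)"),
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

If the environment was set up as described above, you would expect the reported version to be 3.7 and the interpreter path to sit under ~/.conda/envs.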
Running your notebook
You only need to perform the tasks above once, the first time you use Anaconda's Python distribution. After that, you can run your notebook the easy way using scripts, or the more involved way if you want more control and are up to the challenge. The steps are as follows:
load the anaconda2/anaconda3 module
activate the environment of choice
start an interactive SLURM session to reserve a compute resource
set up SSH port forwarding between the compute resource and your local computer (laptop/desktop)
start a Jupyter Notebook
access the Jupyter Notebook via a web browser on your local computer
run your notebook on the HPC as if it were on your local computer
The easy way using scripts
There are two interactive shell scripts that will allow you to run your notebooks. The steps are:
go to the location where your notebook resides
reserve computer time to run your notebook (request-interactive.sh)
start your notebook, which loads the Anaconda and Python environments for you (run-jupyter-notebook.sh)
Reserve computer time
Running a simple interactive shell script (reserve-interactive.sh) allows you to reserve resources to run your Jupyter Notebook as shown in the following example.
A simple interactive shell script (run-jupyter-notebook.sh) guides you through the process of initiating your Jupyter Notebook run.
$user@hpc[~/test] run-jupyter-notebook.sh
Which version of Anaconda would you like to run. Enter selection [0-2]:
1. anaconda2
2. anaconda3
0. Quit
2
loading anaconda3 module
Currently Loaded Modules:
1) autotools 2) prun/1.2 3) gnu8/8.3.0 4) openmpi3/3.1.3 5) ohpc 6) use.own 7) anaconda/3/2020.02
You have these environments to pick from
myJupyter_3.7 /home/user/.conda/envs/myJupyter_3.7
base * /opt/ohpc/pub/apps/anaconda/3/2020.02
Which environment would you like to use? myJupyter_3.7
On your local computer (laptop/desktop), set up SSH port forwarding using the following command.
1. Open a second terminal
2. Copy and paste the command below into the terminal on your local laptop/desktop.
Please note that the terminal will 'hang' once the SSH tunnel is set up. So, you would not be able to interact with it.
Please do not close the terminal as that would close the SSH port forwarding between your local computer and the HPC
Copy and paste the following command on your local computer:
ssh -NL 19620:hpc.cofc.edu:19620 user@hpc.cofc.edu
A Jupyter notebook will start up shortly. You will be given instructions such as this
To access the notebook, open this file in a browser:
file:///home/user/.local/share/jupyter/runtime/nbserver-9192-open.html
Or copy and paste one of these URLs:
http://...
or http://...
When you are finished with your calculation
1. Use 'Control-C' to stop this server and shut down all kernels (twice to skip confirmation).
2. Log out of the compute node using by typing 'exit'
3. Close the SSH port forwarding on your local computer by entering 'Control-C'
[I 19:04:17.442 NotebookApp] Loading IPython parallel extension
[I 19:04:17.443 NotebookApp] Serving notebooks from local directory: /home/user/test
[I 19:04:17.443 NotebookApp] The Jupyter Notebook is running at:
[I 19:04:17.443 NotebookApp] http://hpc.cofc.edu:19620/?token=c4bb51f28e8ee836d8101b02cb9d344c98967ea0a4e9d
[I 19:04:17.443 NotebookApp] or http://127.0.0.1:19620/?token=c4bb518ee836d8101b02cb9d344c98967ea0a4e9d
[I 19:04:17.443 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 19:04:17.448 NotebookApp]
To access the notebook, open this file in a browser:
file:///home/user/.local/share/jupyter/runtime/nbserver-168451-open.html
Or copy and paste one of these URLs:
http://hpc.cofc.edu:19620/?token=c4bb51f20e68e8ee8cb9d344c98967ea0a4e9d
or http://127.0.0.1:19620/?token=c4bb51f20e68e8ee836d810d344c98967ea0a4e9d
[I 19:04:33.423 NotebookApp] 302 GET /?token=c4bb51f20e68e8ee836d8101b02cb9d344c98967ea0a4e9d (172.16.0.1) 0.40ms
[E 19:04:33.466 NotebookApp] Could not open static file ''
[W 19:04:33.623 NotebookApp] 404 GET /static/components/react/react-dom.production.min.js (172.16.0.1) 5.19ms referer=http://127.0.0.1:19620/tree?token=c4bb51f20e68e8ee836d8101b02cb9d344c98967ea0a4e9d
[W 19:04:33.657 NotebookApp] 404 GET /static/components/react/react-dom.production.min.js (172.16.0.1) 0.87ms referer=http://127.0.0.1:19620/tree?token=c4bb51f20e68e8ee836d8101b02cb9d344c98967ea0a4e9d
[W 19:04:36.359 NotebookApp] 404 GET /static/components/react/react-dom.production.min.js (172.16.0.1) 1.63ms referer=http://127.0.0.1:19620/notebooks/Data_Cleaning_using_Python_with_Pandas_Library.ipynb
[W 19:04:36.436 NotebookApp] 404 GET /static/components/react/react-dom.production.min.js (172.16.0.1) 0.84ms referer=http://127.0.0.1:19620/notebooks/Data_Cleaning_using_Python_with_Pandas_Library.ipynb
[I 19:04:37.769 NotebookApp] Kernel started: 092b9b5f-4b02-4a97-8f11-d31d0076aed5
[I 19:04:38.153 NotebookApp] Adapting from protocol version 5.1 (kernel 092b9b5f-4b02-4a97-8f11-d31d0076aed5) to 5.3 (client).
[I 19:04:49.097 NotebookApp] Starting buffering for 092b9b5f-4b02-4a97-8f11-d31d0076aed5:fc649ff7bb2840438168f502ab8b5c0b
[I 19:04:49.309 NotebookApp] Kernel restarted: 092b9b5f-4b02-4a97-8f11-d31d0076aed5
[I 19:04:49.683 NotebookApp] Adapting from protocol version 5.1 (kernel 092b9b5f-4b02-4a97-8f11-d31d0076aed5) to 5.3 (client).
[I 19:04:49.684 NotebookApp] Restoring connection for 092b9b5f-4b02-4a97-8f11-d31d0076aed5:fc649ff7bb2840438168f502ab8b5c0b
[I 19:04:49.684 NotebookApp] Replaying 10 buffered messages
^C[I 19:05:25.752 NotebookApp] interrupted
Serving notebooks from local directory: /home/user/test
1 active kernel
The Jupyter Notebook is running at:
http://hpc.cofc.edu:19620/?token=c4bb51f20e68e8ee01b02cb9d344c98967ea0a4e9d
or http://127.0.0.1:19620/?token=c4bb51f20e68e8eb02cb9d344c98967ea0a4e9d
Shutdown this notebook server (y/[n])? y
[C 19:05:28.171 NotebookApp] Shutdown confirmed
[I 19:05:28.173 NotebookApp] Shutting down 1 kernel
[I 19:05:28.474 NotebookApp] Kernel shutdown: 092b9b5f-4b02-4a97-8f11-d31d0076aed5
When you are finished with your calculation
1. Use 'Control-C' to stop this server and shut down all kernels (twice to skip confirmation).
2. Log out of the compute node using by typing 'exit'
3. Close the SSH port forwarding on your local computer by entering 'Control-C'
The hard way
Run test on the master node
You are probably on the master/login node at this point, and you can run quick, simple tests there. All your heavy calculations need to be submitted to a compute node using the SLURM batch scheduler. We'll get to that in the next stage. For now, let's do an interactive run on the master node.
Set up the calculation
On the server side
Load up and activate the environment on the server/node
You should see output that looks something like this:
$(jupyter_3.7) user@hpc[~]: jupyter-notebook --no-browser --port=$myport --ip='0.0.0.0'
[I 12 NotebookApp] Serving notebooks from local directory: /home/bt-local
[I 12 NotebookApp] The Jupyter Notebook is running at:
[I 12 NotebookApp] http://hpc.cofc.edu:10002/?token=7e2e36e849cb39150f32300ad7ac9253ed7f01
[I 12 NotebookApp] or http://127.0.0.1:10002/?token=7e2e36e849cb39150f32300ad7ac9253ed7f01
[I 12 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 12 NotebookApp]
To access the notebook, open this file in a browser:
file:///home/user/.local/share/jupyter/runtime/nbserver-274617-open.html
Or copy and paste one of these URLs:
http://hpc.cofc.edu:10002/?token=7e2e36e849cb39150f32300ad7ac9253ed7f01
or http://127.0.0.1:10002/?token=7e2e36e849cb39150f32300ad7ac9253ed7f01
Before you connect to the notebook using the URLs above, you need to start SSH port forwarding on your local desktop/laptop.
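The jupyter-notebook command above reads its port from $myport, which this page does not define. How the cluster's scripts choose a port is not documented here; one common approach, shown purely as a sketch, is to ask the operating system for an unused port. The function name find_free_port is hypothetical.

```python
import socket


def find_free_port():
    """Ask the operating system for a TCP port that is currently unused."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Binding to port 0 lets the kernel pick any free port.
        sock.bind(("", 0))
        return sock.getsockname()[1]


port = find_free_port()
# Print a line that can be pasted into the shell before starting Jupyter.
print(f"export myport={port}")
```

The port is released as soon as the socket closes, so another user could in principle claim it before Jupyter starts. On a shared login node that risk is small, but if Jupyter reports that the port is in use, simply pick another one.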
On the client side (your laptop/desktop)
In a new terminal on your laptop/desktop, start an SSH tunnel between the server (master node) and your local machine using the command from [1].
You will be prompted for a password unless you already have SSH keys set up. Please note that you will not see any output if the connection is successful. Keep the terminal open, and use your browser to access your notebook.
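The tunnel command has the same shape as the one printed by run-jupyter-notebook.sh in the easy way (ssh -NL 19620:hpc.cofc.edu:19620 user@hpc.cofc.edu). A small sketch that assembles it for a given port follows; the function name ssh_tunnel_command is hypothetical, and the default host assumes the notebook is running on the master node as in this test.

```python
def ssh_tunnel_command(port, user, host="hpc.cofc.edu"):
    """Build the local port-forwarding command for the notebook server."""
    # -N: do not run a remote command; -L: forward local <port> to <host>:<port>.
    return f"ssh -NL {port}:{host}:{port} {user}@{host}"


# Port 10002 matches the example server output shown earlier in this section.
print(ssh_tunnel_command(10002, "user"))
```

Replace user with your own cluster username and use the port number that your Jupyter server actually reports.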
Point your browser to the URL provided above in [2].
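With the tunnel in place, the URL to open locally is the one that starts with http://127.0.0.1 and carries the token. If the server output is long, a short sketch like the following can pull that URL out of the log text. The function name local_notebook_url is hypothetical, and the sample log lines are copied from the example output in this section.

```python
import re


def local_notebook_url(log_text):
    """Return the first http://127.0.0.1 URL (with token) found in the server log."""
    match = re.search(r"http://127\.0\.0\.1:\d+/\?token=[0-9a-f]+", log_text)
    return match.group(0) if match else None


sample_log = """\
[I 12 NotebookApp] http://hpc.cofc.edu:10002/?token=7e2e36e849cb39150f32300ad7ac9253ed7f01
[I 12 NotebookApp] or http://127.0.0.1:10002/?token=7e2e36e849cb39150f32300ad7ac9253ed7f01
"""
print(local_notebook_url(sample_log))
```

The function returns None when no matching URL is present, which usually means the server has not finished starting yet.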