JupyterHub
JupyterHub+JupyterLab provide a complete multi-user web interface to the HPC for interactive computing
Project Jupyter Overview
Project Jupyter provides tools for users to do interactive computing using different programming languages on a unified web interface. It has many components intended for single-user or multi-user environments running on personal computers or shared resources like our HPC cluster. Depending on one's needs, it is possible to deploy one or more of these components together.
Single-user Jupyter Notebooks
Jupyter Notebooks are single-user web-based interactive notebooks. They allow users to create and share documents that contain live code, equations, visualizations and narrative text. Using Anaconda, anyone can install and run Jupyter Notebooks on their local computer. However, to be able to run Jupyter Notebooks on a remote shared resource like our HPC, one would need to log into the HPC cluster, use the commandline to reserve computing resources and set up some cumbersome SSH tunneling as described here. A more convenient way to run Jupyter Notebooks on a shared resource is using JupyterHub.
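For reference, that manual approach looks roughly like this (the hostname and port number below are placeholders, not our actual configuration):

```shell
# On the cluster, after reserving resources: start a notebook server
# with no browser, listening on a port of your choosing
jupyter notebook --no-browser --port=8888

# On your local machine: forward local port 8888 to the remote port
ssh -N -L 8888:localhost:8888 username@hpc.example.edu

# Then point your local browser at http://localhost:8888
```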
Multi-user environments using JupyterHub
JupyterHub is a great way to serve single-user Jupyter Notebooks to a large number of users in a clean and secure way. It supports multiple authentication methods and integrates with our HPC's batch scheduler to request computing resources and spawn Jupyter Notebook servers (jupyterhub-singleuser) on those resources when they become available.
JupyterHub+JupyterLab for a complete interactive web interface
JupyterLab adds powerful plugins such as terminals, file browsers, a built-in Markdown editor, the ability to start and stop multiple kernels, and many other extensions to any JupyterHub installation. Together, these capabilities let users do all their interactive computation from the JupyterLab+JupyterHub interface.
See the video below to review some of these capabilities:
Typical Workflow
A typical workflow for a user on our JupyterHub installation would look like this:
Use a browser to connect to our JupyterHub installation (https://hpc.cofc.edu/jupyterhub)
Log in with your CofC HPC credentials
Request a resource
Local login node for tasks that are not computationally intensive
your notebook server will be shut down after a period of inactivity
Compute node for computationally intensive tasks
your notebook server will be shut down when your allocated time runs out or when you explicitly stop your notebook server
Open your notebook using the appropriate kernel
Shared kernels - these kernels include most of the libraries you would need, but you cannot install new packages into them if anything is missing
User kernels - these are kernels you install in your user space ($HOME/.conda/env) and have full control over
Run your notebook or perform any other tasks
Shut down all kernels and the notebook server when you finish
Accessing our JupyterHub Installation
Our JupyterHub installation can be found at https://hpc.cofc.edu/jupyterhub. To access it,
users need to have a CofC HPC account
users need to
be on our campus 'wired' or 'eduroam' wireless networks or
use our CofC VPN if they are off-campus
Please note that users will need to be added to the CofC HPC VPN group to access the HPC and the services it hosts, such as JupyterHub.
Requesting Resources
Once you log into our JupyterHub installation, you will see a Server Options page asking for the resources you need. Please note that there may be a 5-10 second delay while the server confirms your credentials and starts up your environment. From the Select a job profile dropdown menu, select the appropriate resource based on your needs.
Login-local (access to login node; no heavy computations allowed)
If you simply want to look at data, transfer files, or run some non-intensive analysis, this is the best option for you. It allows you to perform these simple tasks on the login node. You should, however, not run anything computationally intensive, because you are on a shared server with many other users. If you intend to run demanding computations, please request one of the other job profiles.
Do not run anything computationally intensive on the login node.
By the same token, if you are not doing anything computationally intensive, do not waste resources by requesting compute nodes.
In short, request the right resource for your needs every time.
Compute-8 cores, 32GB for 2 hrs -
this requests 8 computing cores, 32GB memory for 2 hours in one of the compute nodes.
If the resources you requested are available immediately, it will spawn a single-user Jupyter Notebook server for you on the compute node
If the resources you requested are not available immediately, the server will wait up to 120 seconds to see if they become available. If it does not find the resources you requested, it will ask you to try again later.
Compute-8 cores, 32GB for 4 hrs -
Compute-16 cores, 64GB for 2 hrs -
Compute-16 cores, 64GB for 4 hrs -
...
Compute-1 GPU, 24 cores, 180GB for 2 hrs -
This one further requests one of our NVIDIA Tesla V100 GPUs. We'll add the capability to request our NVIDIA Quadro P4000 GPU if there is interest.
Compute-1 GPU, 24 cores, 180GB for 4 hrs -
If the time limit or resources (#CPUs, memory, GPUs) in the above profiles are not enough for your computing needs, please email hpc@cofc.edu for help and we'll accommodate your request.
Once the requested resource is available, you will have a single-user Jupyter notebook server running on that resource.
Kernels
What makes Project Jupyter powerful is that it allows users to run notebooks written in many programming languages, even though Python (iPython) is the original language of choice. We provide a set of system-wide kernels that all users can access but not modify. Users can also add their own kernels and make them visible to the Jupyter environment.
If you open a terminal on the designated resource, load the anaconda/3
module, activate the jupyter-hub
environment, and enter jupyter kernelspec list
, you will see the system-wide and user kernels available to you.
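For example, the sequence of commands would look something like this (module and environment names as described above):

```shell
module load anaconda/3       # load the default Anaconda installation
source activate jupyter-hub  # activate the JupyterHub environment
jupyter kernelspec list      # lists every kernel name and its install path
```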
System-wide/Shared kernels
We currently provide system-wide kernels to run
Python3.7, Python3.6, Python2.7 - including most commonly used libraries such as numpy, scipy, matplotlib, plotly, pandas, tensorflow, scikit-learn, seaborn, imblearn, numba, dask, rdkit, pybel, openbabel
Tensorflow2.0 - including the most commonly used libraries listed above, with support for GPUs
iR - kernel to run R/3.5.2
iJulia - to run Julia code
Matlab - to run matlab/r2019b
Mathematica - to run mathematica/12.1 (currently, we only have a license to run it on the login node)
GNUplot - to run GNUplot 3.5.2
Psi4 - to run Psi4 notebooks for computational chemistry
ArcGIS - to run ArcGIS Python notebooks
These system-wide kernels are installed at /opt/ohpc/pub/apps/anaconda/3/2020.02/envs
where users do not have permission to modify them. Therefore, in cases where these kernels are insufficient, users would need to either
email hpc@cofc.edu to ask for modifications of these kernels or installation of additional ones, OR
install an environment in their own user space and make it available to the Jupyter server
The names of the system-wide kernels have a "_" prefix to distinguish them from kernels in your user space. Users are encouraged to give unique display names to their own kernels as well.
User kernels
To install a new user kernel, say Python 2.7, you would need to
open a terminal in the HPC cluster
load the appropriate anaconda or miniconda module and activate the base environment if it isn't already
create a new environment
install the packages (including ipykernel) you want inside that environment
make that kernel visible to Jupyter
These steps are explained below.
Log into the HPC
You can access the HPC in different ways, but the easiest is through JupyterHub itself. Log into our JupyterHub installation and open a terminal from the ensuing dashboard.
Load anaconda module
You are encouraged to use the latest anaconda installation (which is the default), but slightly older versions should work as well.
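A minimal sketch of this step (the module name matches the one used elsewhere on this page):

```shell
module load anaconda/3   # default (latest) Anaconda; older versions also work
source activate base     # activate the base environment if it isn't already
```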
Create a new environment
You would need to create a new environment for the new kernel in your user space. As you can see below, the default environment is base, containing base Python 3.7 and other useful tools like the pip and conda package managers.
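You can list the environments currently visible to you with:

```shell
conda env list   # the active environment is marked with an asterisk
```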
It is wise to append something (e.g., myPython37 instead of Python37) to distinguish your own kernels from the system-wide ones. If a kernel in your user space has the same name as a system-wide one, the one in your user space takes precedence. You can check the order in which Jupyter looks for kernels by entering jupyter --paths in the command line.
Now, we'll create a Python 2.7 environment
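A sketch of this step; the environment name myPython27 is just an example:

```shell
conda create -n myPython27 python=2.7
```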
And install the most commonly used packages for scientific and data analysis
Do not forget to install ipykernel, as you will need it to run your kernel in Jupyter Notebooks.
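For example (the package list here is illustrative; install whatever your work needs, but keep ipykernel):

```shell
source activate myPython27   # 'conda activate' on newer conda versions
conda install numpy scipy matplotlib pandas scikit-learn ipykernel
```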
Make that kernel visible to Jupyter
To be able to run your kernel in Jupyter Notebooks, you need to make it visible to our JupyterHub installation from the new environment.
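A minimal sketch, assuming the example environment myPython27 from above:

```shell
source activate myPython27
# register the environment as a Jupyter kernel in your user space
python -m ipykernel install --user --name myPython27 --display-name "myPython27"
```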
Verify that the kernel is installed and visible
You should deactivate the current environment, activate the jupyter-hub
environment, and check to see if the kernel is installed and visible
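For example:

```shell
source deactivate            # leave the new environment
source activate jupyter-hub  # return to the JupyterHub environment
jupyter kernelspec list      # the new kernel should now appear in the list
```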
You can further confirm the kernel's availability on the JupyterHub's web interface:
Working with the JupyterLab Interface
As you can see above, JupyterLab adds a lot of capabilities to our JupyterHub installation by providing terminals, text editors, data viewers, file upload/download, and a myriad of other extensions. In short, it removes many of the barriers non-expert users face when working in a terminal-based environment like our HPC.
A demonstration of a lot of its features is available here.
The 6 minute video below covers some of the basic features.
The Interface
The JupyterLab interface is fairly intuitive and you can learn more about it here among other places. We will highlight the most important parts below.
Menu bar - contains File, Edit, View, Kernel, Tabs, Settings and Help menus, with dropdown options under each.
Left Sidebar
allows file browsing,
checking running kernels,
shows other JupyterLab extensions
Main Work Area
displays notebooks, terminals, images, data files, etc.
allows tiling of windows in any configuration
The options in the left sidebar are labeled in the figure below.
JupyterLab extensions
We provide a set of standard extensions as well as a few others deemed useful to our user base. While individual users do not have permission to install extensions themselves, they can send requests to hpc@cofc.edu.
Exiting Cleanly
Since the HPC is a shared resource, every user has a responsibility to make sure it is used in a manner that benefits everyone optimally. That means
requesting the right amount of resources (CPUs, memory, GPUs) for the appropriate amount of time (2hrs, 4hrs, ...)
properly shutting down running kernels and your single-user JupyterHub server when you are finished
Information about requesting the right resource for the right task is described in the "Requesting Resources" section above.
Shut down running kernels
It is not unusual to have many terminals, notebooks, files, etc. running on JupyterHub. When you finish, please click the 'Running Terminals and Kernels' button on the left sidebar and shut down the running kernel sessions.
Shut down single-user server
Then, go to the File menu and select Hub Control Panel to see the single-user JupyterHub servers running under your account. Stop the server and log out.
F.A.Q.
How come I can't connect to the JupyterHub interface?
If your attempt to connect to our JupyterHub installation is timing out, it is likely because you are being blocked by our campus firewall.
If you are on campus, make sure you are connected to the wired or 'eduroam' wireless networks. You will need to use a VPN if you are on the campus guest wireless network.
If you are off campus, make sure you are using the CofC VPN.
Can one turn any Python script into a notebook?
Yes. There are many tools to convert regular Python scripts (*.py) to iPython notebooks (*.ipynb). These include
p2j - install using pip install p2j or conda install p2j
py2nb
...
Alternatively, you can open a blank Jupyter Notebook, copy/paste your Python script into a cell, and save it as an iPython notebook.
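A sketch of the p2j route (filenames are examples):

```shell
pip install p2j      # or: conda install p2j, if it is in your channels
p2j myscript.py      # writes myscript.ipynb next to the original script
```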
Can I run Python2 scripts/notebooks?
Yes, you can still use the 'Python2.7-shared' kernel to run Python2 scripts/notebooks. However, everyone is strongly encouraged to migrate to Python3, given the end of official support for Python2 in January 2020. You can convert Python2 scripts to Python3 using 2to3, the automated Python 2 to 3 code translator.
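For example, to convert a script in place while keeping a backup of the original (the filename is an example):

```shell
2to3 -w myscript.py   # rewrites myscript.py; the original is saved as myscript.py.bak
```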
What other kernels are available for Jupyter?
Scala, Matlab, Mathematica, Haskell, Spark, Javascript, ...
You can find a more complete list at https://github.com/jupyter/jupyter/wiki/Jupyter-kernels
If there is a particular kernel you or your students want to share, please send that request to hpc@cofc.edu
and we'll make it available.
If the time or resources I request is not enough, can I request an extension or more resources?
Yes. The time limit (2-4 hours) in the current profiles is intended to prevent users from hogging compute nodes for unnecessarily long times. If you have a calculation that requires more time, please email hpc@cofc.edu and request an extension of the time limit. If your calculations generally require more than 4 hours, please email hpc@cofc.edu and we will create a new profile with a longer time limit for you.
If the number of CPUs or RAM in the current profiles do not match your needs well, we can create new ones.
Can the JupyterHub installation be used for instruction in a classroom environment?
Yes. Faculty can request classroom accounts they can use on a recurring basis. Please note that off-campus access to the HPC as a whole and the JupyterHub installation in particular requires students to be added to the HPC VPN group. Please make those requests far in advance because they take time.
Are there example notebooks one can play around with?
Yes. For a trusted set of examples that have been tested on our JupyterHub installation, please check your $HOME/jupyter
directory.
There is a large "gallery of interesting Jupyter Notebooks" from all sorts of fields here: https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks
There is a constantly growing collection of notebooks at https://nbviewer.jupyter.org/
Kaggle also has lots of interesting notebooks to play around with.
Please note that some of these notebooks are old (dating back to 2016) and may not run on our installation; some libraries, calls, and attributes may have been deprecated or replaced. You are better off running more recent notebooks (2019 or later).
Please be cautious about downloading and running random notebooks from the web; you could compromise your account and the security of the cluster as a whole. Also, *do not* run these notebooks on the login node, as their resource requirements could be substantial.
Can Google Colaboratory notebooks run on our JupyterHub installation?
Yes. Google Colab runs Jupyter Notebooks on the Google Cloud Platform, so their notebooks should be compatible with our JupyterHub installation.