CADESUser DocumentationUser-Contributed Tutorial IndexPython Analytics Server

Setting up a Python Analytics Server

Suhas Somnath
Advanced Data and Workflows Group
National Center for Computational Sciences
Oak Ridge National Laboratory

10/9/2017

Table of contents:

Introduction:

  • R and python are two of the most popular languages for analyzing scientific data. However, it can be challenging for first-time users to set up these familiar languages on cloud computing resources for data analytics.
  • These self-service instructions will guide you through the process of creating a virtual machine (VM) on the CADES Cloud cloud (comparable to a powerful desktop computer and scalable) that you could use instead of your personal computer for data analysis via a Jupyter notebook server.
  • The entire setup process (besides step 0) should take about 20 minutes. Once set up, connecting to the notebook server should only take a few button clicks.

Support:

  • CADES provides the ability and support to create and use virtual machines. Users are free to use such VMs for a variety of purposes, such as running a Jupyter notebook server. Users are responsible for maintaining the software installed on their own VMs (e.g. - python packages, Jupyter server, etc.).
  • Please follow all steps in this guide to ensure a smooth setup of your analytics VM. For questions regarding the virtual machine itself (steps 0-2), please contact CADES support. If you have any questions regarding the setup of anaconda, Jupyter, etc. (steps 3-6) please feel free to contact me.

Best Practices and ethical use of the cloud:

A virtual machine is like a public-use desktop or a laptop. It costs money to run VMs and reserving resources for your VM, precludes others from utilizing resources. Here are a few guidelines for using and managing VMs:

  • If you are not using the VM for extended periods or are not using all the horsepower, consider resizing to a smaller flavor (fewer CPU cores and smaller RAM). Remember that you can always resize it back to something bigger whenever necessary.
  • Additionally, you can shut down your VM, just like you would shut down your personal computer.
  • Consider deleting VMs you will never use again.

Other notes:

  • The remote machine runs the Ubuntu (Linux) operating system. It is recommended to take this short tutorial if you are new to Linux and/or the command line interface.
  • CADES has several helpful guides on learning the basics of Linux as well.
  • This document is limited to the instructions necessary for setting up a virtual machine for data analytics using python and is not intended to serve as a comprehensive manual for maintaining and administering your virtual Linux machine or implementing other advanced analytics features such as JupyterHub. Please refer to other online resources for such topics.
  • Though you can set up a VM for analytics using R, this is NOT extendable to Matlab and similar proprietary / paid software packages.
  • This virtual machine will be only be accessible within the ORNL network. You would need to use the VPN on an ORNL laptop when off campus or access the machine via Citrix on your personal computer.
  • Many thanks to the Chris Layton, Pete Eby, Ketan Maheshwari from CADES and Ondrej Dyck from CNMS.

Configuration

Step 0: Getting a CADES Cloud account:

  1. You will need to request for access to the CADES Cloud from the following the instructions here. You should receive a mail within a few minutes to 1-2 hours regarding the approval of your request.
  2. OPTIONAL: By default, everyone has access to virtual machines that have up to 8 GB of memory and 16 CPU cores. If you need more, you can request to have your quotas increased by contacting CADES including details such as your three character UCAMS id, justification, and duration for the increase in quota in the email.
  3. OPTIONAL: Consider joining the #ornl_cloud channel on the CADES SLACK group to communicate with other users of the CADES cloud.

Step 1: Creating and Launching an instance:

You can follow the four steps in CADES’ documentation in the links below but pay attention to the following notes:

  1. Log in to Horizon, name your VM - follow the instructions on this page as is.
  2. Choose a flavor, image, and boot source - follow instructions here but pay attention to a few things:
    1. At the Source Tab:
      • Delete Volume on Instance Delete: Set to No if you want to drive to be kept alive even though the instance is deleted. This is generally a good idea - you can always delete the volume (after you delete the instance) if you don't need it.
      • Volume Size: This is the size of the storage drive that will contain the operating system, data, python packages etc. You are recommended to set this to 16 GB or larger. If you intend to use your CADES Cloud account exclusively for this analytics server, you can use up your entire quota (eg. 40 GB). Like any personal computer, you can always add volumes to your instance but starting off with a large enough volume can mitigate additional work. Please see this document if you already created an instance but need to add a new storage volume.
    2. At the Flavor Tab: This mainly determines the number of processor cores and memory. You can change the flavor after creating the instance so do not worry about this step very much. Pick the flavor that best suits your applications:
    3. Pick any flavor that begins with m1. if you do a lot of statistical analysis that requires a large RAM compared to the number of CPU cores
    4. Pick any flavor beginning with c1. if you tend to run a lot of small computations in parallel.
    5. For additional flavors request CADES to increase your quota. See Step 0.
    6. You can always run multiple machines in parallel. So you could distribute your memory / CPUs among two machines that fully utilize your quota.
  3. Set up a security group as it says in the document.
  4. Configure a key pair for accessing the VM as it says in the document.

Step 2: Accessing the Instance:

The instructions below are a simplification of the official CADES documentation:

1. Find the IP address of your machine

  1. While in the Horizon interface you used for creating the instance to your VM, Click on the Compute tab, then the Instances sub-tab
  2. Copy the IP address listed for your instance

2. Get the public SSH key

  1. Click on the Access and Security tab and then navigate to Key Pairs.
  2. Click on the key. In this case – CADESCloudKey
  3. Copy the contents of Public Key and paste into a text editor like TextEdit on MacOS or Notepad in Windows. Read the next step before saving:
  4. Before saving, make sure to change the format to plain text. This is especially true of TextEdit in Mac (in the Menu bar - Go to Format -> Make Plain Text) Wordpad (when saving, select Text Document (.txt) instead of the default Rich text in the pull down menu) in Windows for example.
  5. Save the file as id_rsa.pub

From here on follow instructions specific to your operating system:

ORNL Mac / Linux computer:

Before you begin: These instructions are for ORNL computers only. Instructions for personal computers will follow. If you are outside the ORNL network but working on an ORNL computer, you will need to connect to the ORNL VPN using your PIN and RSA token to get back into the network

1. Moving the keys:
  1. OPTIONAL but Recommended: If you are interested in accessing your instance from your personal computer, it is recommended to make a copy of your public and private keys and place the copies someplace on ORNLDATA (e.g. - My Documents).
  2. Open the Terminal application and navigate to the directory where you stored your private key.
  3. Rename your private key from the original name (for example - CADESCloudKey) by typing:

    $ mv CADESCloudKey id_rsa
    
  4. Move the private and public keys to ~/.ssh/. For example, if you stored both the private and public keys in Documents.

     $ cd Documents
     $ mv id_rsa ~/.ssh/id_rsa
     $ mv id_rsa.pub ~/.ssh/id_rsa.pub
    
2. OPTIONAL: Shortcuts!

Aliases:

You can set up aliases that make it easier to refer to your remote machine. Aliases can turn commands like: ssh cades@172.22.3.50 to something far simpler like: ssh jupyterVM.

Graphical interface for SSH:

The Mac Terminal application comes with utilities that simplify the ssh process with a graphical interface. If you are comfortable with the command line and do not mind typing ssh / sftp commands you can skip this step.

If you are interested in this quick setup, follow the instructions here. Please only follow instructions till step 6 (set up the entries and do not follow any steps including and following those that expect you to click on the Connect button. We will get to this in Step 4 below)

3. Connecting to the instance
  • Via the command line interface on the Terminal app:

    ssh cades@172.22.3.50
    
  • Via the graphical interface of the Mac Terminal app:

    1. Open the Terminal app.
    2. Go to ShellNew Remote Connection
    3. Ensure that Secure Shell (ssh) is selected on the left-hand column, then select the first entry you made (cades@172.22.3.50 in my case) in the right-hand column, and click on the Connect button in the bottom right.

ORNL Windows computer:

Before you begin: These instructions are for ORNL computers only. Instructions for personal computers will follow. If you are outside the ORNL network but working on an ORNL computer, you will need to connect to the ORNL VPN using your PIN and RSA token to get back into the network.

  1. Install PuTTY: PuTTY should be preinstalled on all ORNL Windows computers. However, if you don’t have PuTTY installed, install it via the following links:
  2. OPTIONAL but Recommended: If you are interested in accessing your instance from your personal computer, you are recommended to make a copy of your public and private keys and place the copies some place on ORNLDATA.
  3. Configure PuTTY to connect to your instance by following the instructions starting from the topic titled Connect to Your VM Instance Using PuTTY in CADES' instructions
  4. Configure the tunneling to connect to the Jupyter notebook server by following the instructions here

From your personal computer:

  1. Log in via the Citrix page
  2. PuTTY setup:
    • Select the ORNL General Desktop application
    • Follow steps 2-4 in the instructions laid out for ORNL Windows computers above.
  3. You can access your VM through at least two routes:
    • Recommended: In the Citrix menu, select the PuTTY application and use it as you would use an ORNL Windows computer.
    • In Citrix, select the ORNL General Desktop application and use the PuTTY application to access your VM. This may be slow (bandwidth wasted on transporting the bits of the Windows virtual machine) and tedious (you cannot forward the Jupyter notebook server to your personal computer - it would stay within the Windows virtual machine). This option is preferable in the event that you want to upload data / code from your ORNLDATA to your VM.

Step 3: Installing analytics packages on the instance:

  1. Download Anaconda 5.2 -> python 3.6. You can download a different version if you wish.

    $ mkdir temp
    $ curl https://repo.continuum.io/archive/Anaconda3-5.2.0-Linux-x86_64.sh > temp/Anaconda3-5.2.0-Linux-x86_64.sh
    
  2. Change privileges before installing Anaconda

    $ chmod +x temp/Anaconda3-5.2.0-Linux-x86_64.sh
    
  3. Install Anaconda:

    • Start the installer

         $ bash temp/Anaconda3-5.2.0-Linux-x86_64.sh
      
    • Follow the prompts to install Anaconda. Accept the license, say yes to installing to the default location, say yes to prepending anaconda to the path.

    • Delete temporary installation folder:
         $ rm -r temp
      
  4. Switch to anaconda environment:

     $ source ~/.bashrc
    
  5. Install missing packages for wholesome Jupyter functionality:

    • Enable ability to export to pdf in Jupyter: $ conda install -c anaconda-nb-extensions nbbrowserpdf
    • Enable javascript for interactive elements in Jupyter: $ jupyter nbextension enable --py --sys-prefix widgetsnbextension
  6. OPTIONAL: To simplify the command to start up the Jupyter notebook:

    1. First create the configuration file:

       $ jupyter notebook --generate-config
      
    2. Open up the notebook:

       $ nano ~/.jupyter/jupyter_notebook_config.py
      
    3. Use the key combination Ctrl+W to search for .open_browswer

    4. Uncomment the line
    5. Set the flag to False
    6. Search for NotebookApp.port = 8888 using Ctrl+W
    7. Uncomment the line
    8. Set the port number to 8889 (or any number > 1024 for that matter)
    9. Close the editor with Ctrl+X
    10. Save the file
  7. OPTIONAL - You can always install any python packages from this point on. You could install deep-learning frameworks like Keras or TensorFlow but you are recommended to use optimized Docker containers for this. Please refer to this separate tutorial for this.

Running

Step 1: Starting a Jupyter server:

  1. Ensure that you don’t leave room for accidental damage to the rest of the VM (such as the anaconda folder etc.) by starting the Jupyter notebook in a new / separate folder. Perhaps this folder contains data + notebooks, etc. For now, we will make an empty folder and start the notebook from there:

    $ mkdir workspace
    $ cd workspace
    
  2. OPTIONAL:Persistent Jupyter server: As it stands, if you close this ssh session, your command or operation (for example, a running jupyer server) will be aborted as well. In order to keep the jupyter server easily accessible, we will need to either use the screen or the tmux commands. We will be using screen here. Note that this approach does not keep your ssh connection to the Jupyter server (discussed in the next step) alive if your local computer goes to sleep or is shut down. IF you need your computation / analysis to run even after you shut down your local machine, you are recommend to run your analysis as a script on the remote machine instead of using Jupyter notebooks. If you decide to use screen, type the following command BEFORE you initiate the Jupyter server:

    $ screen
    
  3. Starting the Jupyter server:

    • If you modified the configuration file that was optional in the previous step:

      $ jupyter notebook
      
    • If you did NOT follow the optional instructions, specify the port and no-browser flags

      $ jupyter notebook --no-browser --port=8889
      
  4. OPTIONAL: If you ran the notebook with screen:

    1. You can now detach the screen using the key sequence: Ctrl+A, Ctrl+d.
    2. You can now close the ssh session to the remote machine. This will NOT close your Jupyter server.

      $ exit
      

Step 2: Accessing the Jupyter server:

Mac / Linux:

Connection in the Mac Terminal app:

  1. Open the Terminal.
  2. Depending on which method you prefer (and have set up):

    • Command line interface:

      $ ssh -N -L localhost:8889:localhost:8889 cades@172.22.3.50
      
    • Graphical Interface: see this document.

  3. Open a browser (Chrome is recommended for interactive widgets) and go to: http://localhost:8889/.

Windows:

  1. Close any open PuTTY connections to the VM.
  2. Open PutTTY, load the configurations for your machine and connect. You will be presented with a new SSH connection to the VM. You can close this if you do not need it.
  3. Open a browser (Chrome is recommended for interactive widgets) and go to: http://localhost:8889/.

Personal computer:

  1. Log in via the ORNL Citrix page.
  2. Select the PuTTY application.
  3. Follow the same instructions for Windows computers.

Step 3: Shutting down the Jupyter server:

Once you are done working on your Jupyter server, you will need to:

  • If you used screen and closed your SSH connection to your virtual machine where you initiated the Jupyter server, SSH into your virtual machine:
    1. Windows – use your saved PuTTY profile
    2. Mac / Linux: Use either the command line or graphical interface described in Step 2. For the command line interface - open the terminal and replace with your IP address:
      $ ssh cades@172.22.3.50
      

At this point, you should either have access to an existing SSH connection to the remote machine or you should have created a new connection in the preceding step.

  • If you used screen, re-attach the screen where your Jupyter notebook server is running by typing:

    $ screen –r
    

You should be seeing the print logs of the Jupyter server on the remote machine now.

  • Press Ctrl+C twice to shut down the Jupyter server as you normally would on your local machine.

results matching ""

    No results matching ""