# Running Jupyter notebooks

## Starting Sid

In all cases you need to start Sid first:

* In the browser go to <https://sid.hmdc.harvard.edu>
* Launch Sid
* Click `Run An Interactive Application`

## Running Jupyter

The basic steps for Python, R or Julia notebooks are the same:

* Select Jupyter
* Select Version
* Click Google Drive (log in if necessary)
* Select CPU/RAM
* Click `Launch Application`
* Wait until the URL is visible. The status will change from `Initializing` to `Running`.
* Click on the URL

You can now create a Jupyter notebook.

Click on the `New` button on the top right and select Julia, Python or R.

**Video tutorial of these steps:**

[![Starting a Jupyter Notebook](https://1259840169-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-Lp-J4FRkNjGVUHcJnwd%2F-M-p413tI8V1bmmIYF7j%2F-M-p41YHx2GxcCZeSKwf%2FStarting%20a%20Jupyter%20Notebook.png?generation=1581438479635781\&alt=media)](https://drive.google.com/open?id=1-jW7whc5wOtHN4GM159VyXG4_-NsP4ZD)

{% hint style="warning" %}
At the moment your notebooks are saved on ephemeral storage. This means that when the container is deleted, all the files will be lost. Store your files on Google Drive to save your work.
{% endhint %}

## Accessing Google Drive

If you selected Google Drive as your storage provider when running a Jupyter notebook on Sid, the `google-drive` folder will be available in the default location as a subfolder of Jupyter's home directory. You should see the `google-drive` folder immediately upon accessing your job via the job URL that Sid produces.
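To confirm from inside a notebook that the folder is there before you rely on it, a check along these lines can help. This is a minimal sketch: it assumes that "Jupyter's home directory" is the home directory of the notebook user (`Path.home()`), and the helper names `drive_folder` and `save_text` are hypothetical, not part of Sid.

```python
from pathlib import Path


def drive_folder(home=None, name="google-drive"):
    """Return the Google Drive folder path if it exists, else None.

    Assumption: the folder sits directly under the user's home directory.
    """
    base = Path(home) if home is not None else Path.home()
    candidate = base / name
    return candidate if candidate.is_dir() else None


def save_text(folder, filename, text):
    """Write text into the given folder and return the full path."""
    target = Path(folder) / filename
    target.write_text(text, encoding="utf-8")
    return target


drive = drive_folder()
if drive is None:
    print("google-drive folder not found; files saved here are ephemeral")
else:
    print("Google Drive folder found at", drive)
```

If the folder is found, something like `save_text(drive, "notes.txt", "my results")` writes a file that should survive the deletion of the container.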

## Installing Python libraries

Some libraries, such as `ggplot2` or `pandas`, are already installed in the Python notebooks of Jupyter. Other libraries need to be installed manually.

* Open a new terminal: `New > Terminal`
* Run this command to install a library:

```
pip install <library name>
```

For example, if `lxml` is not installed, you will get an error like this:

```
ModuleNotFoundError                      Traceback (most recent call last)
<ipython-input-2-72d693772b13> in <module>
----> 1 from lxml import html
      2 
      3 events_html = html.fromstring(events0.text)

ModuleNotFoundError: No module named 'lxml'
```

In this case you need to install `lxml` using:

```
pip install lxml
```
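If you would rather check for a missing module from a notebook cell than wait for the `ModuleNotFoundError`, the following sketch shows one way to do it using only the Python standard library. The helper names `is_installed` and `ensure_installed` are hypothetical, and the sketch assumes `pip` is available for the interpreter that runs the notebook.

```python
import importlib
import importlib.util
import subprocess
import sys


def is_installed(module_name):
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


def ensure_installed(module_name, package_name=None):
    """Install the package with pip only when the module is missing.

    Returns True if an installation was attempted, False otherwise.
    """
    if is_installed(module_name):
        return False  # nothing to do
    # Use the notebook's own interpreter so pip installs into the same environment.
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", package_name or module_name]
    )
    importlib.invalidate_caches()  # let the import system see the new package
    return True
```

Calling `ensure_installed("lxml")` in a cell before `from lxml import html` would then install the library only when it is missing. The optional `package_name` argument is there because the name used with `pip install` sometimes differs from the name used in `import`.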

**Video tutorial of these steps:**

[![Installing Python Modules in Jupyter](https://1259840169-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-Lp-J4FRkNjGVUHcJnwd%2F-M-p413tI8V1bmmIYF7j%2F-M-p41YLEG1iM53EFNVJ%2FInstalling%20python%20modules%20in%20Jupyter.png?generation=1581438479617902\&alt=media)](https://drive.google.com/open?id=1S6STWYTd3r5J8wjPZOgDR0VOHLB9JSAA)

## Creating a script for installing Python libraries

Just as files on ephemeral storage are lost when the container is deleted, installed libraries are lost too and must be reinstalled in every new container. To make this easier, you can write a file that lists the libraries you need and install them all with one command.

{% hint style="info" %}
This approach differs from the one shown in the video tutorial. Working with `requirements.txt` is a more standard way of doing this.
{% endhint %}

### Creating the script

* Choose `New>Text File`
* In the text file, write the names of the libraries you want to install, one library per line.

```
lxml
```

* Choose `File>Save` and save the file as `requirements.txt`.
* Make sure it is saved on Google Drive
* Close the tab

### Running the script

* Open a terminal from Jupyter: `New > Terminal`
* `cd` into the directory where you saved the file
* Run the following command to install all the libraries:

```
pip install -r requirements.txt
```
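To see which of the listed libraries are still missing in a fresh container, a notebook cell along these lines could read the same file. This is a sketch under stated assumptions: the file contains plain library names, one per line, without version specifiers or pip options, and the helper names `read_requirements` and `missing_libraries` are hypothetical.

```python
from importlib import metadata
from pathlib import Path


def read_requirements(path):
    """Return the library names listed in a requirements file, one per line."""
    names = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line:
            names.append(line)
    return names


def missing_libraries(names):
    """Return the names for which no installed distribution is found."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

For example, `missing_libraries(read_requirements("requirements.txt"))` returns the names that `pip install -r requirements.txt` would still need to install. The path is relative, so it assumes the notebook runs in the directory where the file was saved.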

**Video tutorial of these steps:**

[![Create an install Script for Python modules in Jupyter](https://1259840169-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-Lp-J4FRkNjGVUHcJnwd%2F-M-p413tI8V1bmmIYF7j%2F-M-p41YNkVsf6Reb1En4%2FCreate%20an%20install%20script.png?generation=1581438479693853\&alt=media)](https://drive.google.com/open?id=1RzqCegn7-QI9Swb3PEfmc0dWceQfAqOm)

## Using and installing R Libraries

Installing R libraries in Jupyter is easy. To use a library, add the `library` command to a notebook cell:

```r
library(tidyverse) 
library(gridExtra)
```

If a library is not available, you will get an error like this. In this case the `dslabs` library is missing:

```
Error in library(dslabs): there is no package called ‘dslabs’
Traceback:

1. library(dslabs)
2. stop(txt, domain = NA)
```

Install the package from within your notebook with:

```r
install.packages("dslabs")
```

