## Interactive Spark Cluster
The script `start_spark_cluster.sh` spins up a Spark cluster with the specified number of nodes.
### tl;dr
To start your Spark cluster on the compute nodes, simply run:
```bash
sbatch start_spark_cluster.sh
```
The only thing you have to pick is the number of nodes (see Preparation below).
### Preparation
1. Clone this repo
```bash
git clone https://gitlab.jsc.fz-juelich.de/AI_Recipe_Book/recipes/spark-examples.git
```
2. Prepare the virtual environment to install the required Python dependencies.
- `cd spark_env`
- Edit `requirements.txt`
- Create the virtual environment by calling `./setup.sh`
- Create a kernel for Jupyter-JSC by calling `./create_kernel.sh`
   - To recreate the virtual environment, simply delete the folder `./venv` and run `./setup.sh` again
3. Pick the number of nodes by adjusting the line `#SBATCH --nodes=2` in `start_spark_cluster.sh` (see the sketch after this list).
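For orientation, the Slurm header of `start_spark_cluster.sh` might look roughly like the following. This is a sketch: only the `--nodes` directive is quoted from this document; the job name and partition are guesses based on the truncated `squeue` output shown further below, and the time limit is a placeholder.
```bash
#!/bin/bash
#SBATCH --job-name=spark-cluster   # assumed name; appears truncated as "spark-cl" in squeue
#SBATCH --nodes=2                  # <-- adjust the number of nodes here
#SBATCH --partition=develbooster   # assumed partition; appears truncated as "develboos" in squeue
#SBATCH --time=01:00:00            # placeholder wall-clock limit
```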
### Execution
To start your Spark cluster, simply run:
```bash
sbatch start_spark_cluster.sh
```
This will return information similar to
```
Submitted batch job 6525353 Submitted batch job 6525353
```
In order to connect, you need to find out the hostname of your compute job.
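You can ask Slurm for the nodes of your running jobs (a minimal sketch; `squeue -u $USER` lists the current user's jobs):
```bash
squeue -u $USER
```
For the job above, the output looks like: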
```
  JOBID PARTITION     NAME     USER ST   TIME NODES NODELIST(REASON)
6525353 develboos spark-cl kesselhe  R  30:40     2 jwb[0129,0149]
```
In this case, the Spark cluster runs on the nodes jwb0129 and jwb0149. Note that to
access the nodes from outside the job, you must append the letter `i` to the node name;
a valid hostname is then `jwb0129i.juwels`. The scripts make this adjustment automatically.
The Spark master always runs on the first node.
Then you can run a Spark app with a command similar to
```bash
source ./spark_env/activate.sh
export MASTER_URL=spark://jwb0129i.juwels:4124
python pyspark_pi.py
```
Note that you must replace the hostname with that of your own job's first node, keeping the trailing `i`.
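For reference, a minimal Pi-estimation app in the spirit of `pyspark_pi.py` could look like the sketch below. This is an illustration, not the repository's actual script; it assumes the `pyspark` package from the virtual environment and the `MASTER_URL` variable exported above.
```python
import os
import random

from pyspark.sql import SparkSession

# Connect to the standalone cluster using the master URL exported in the shell.
spark = (
    SparkSession.builder
    .master(os.environ["MASTER_URL"])
    .appName("pi-estimate")
    .getOrCreate()
)

def in_quarter_circle(_):
    # Draw a random point in the unit square and test whether it
    # falls inside the quarter circle of radius 1.
    x, y = random.random(), random.random()
    return x * x + y * y < 1.0

n = 10_000_000
hits = spark.sparkContext.parallelize(range(n)).filter(in_quarter_circle).count()
print(f"Pi is roughly {4.0 * hits / n:.5f}")

spark.stop()
```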
### Monitoring
To connect to the master and workers with a browser, you need a command of the following form:
```bash
ssh -L 18080:localhost:18080 -L 8080:localhost:8080 kesselheim1@jwb0129i.juwels -J kesselheim1@juwels-booster.fz-juelich.de
```
Then you can navigate to http://localhost:8080 to see the output. Remember to replace the username and node name with your own.