Commit 58053a3a, authored 2 years ago by Stefan Kesselheim
doc updated
parent 27850ffc
Showing 1 changed file: README.md (+33, −4)

README.md
@@ -2,10 +2,34 @@
## Interactive Spark Cluster
The script `start_spark_cluster.sh` spins up a Spark cluster with the specified number of nodes.
### tldr;
To start your Spark cluster on the compute nodes, simply run:
```bash
sbatch start_spark_cluster.sh
```
You only need to pick the number of nodes.
### Preparation
1. Clone this repo:
   ```bash
   git clone https://gitlab.jsc.fz-juelich.de/AI_Recipe_Book/recipes/spark-examples.git
   ```
2. Prepare the virtual environment to install the required Python dependencies (a quick import check is sketched after this list):
   - `cd spark_env`
   - Edit `requirements.txt`
   - Create the virtual environment by calling `./setup.sh`
   - Create a kernel for Jupyter-JSC by calling `./create_kernel.sh`
   - To recreate the virtual environment, simply delete the folder `./venv`
3. Pick the number of nodes by adjusting the line `#SBATCH --nodes=2` in `start_spark_cluster.sh`.
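As a quick check after step 2, you can verify that the Spark bindings are importable from the new environment. This is only a sketch and assumes that `requirements.txt` pulls in `pyspark` (the file's contents are not part of this diff):
```python
# Quick sanity check for the virtual environment (a sketch; assumes that
# requirements.txt installs pyspark, which is not shown in this commit).
# Run it with the Python interpreter from ./venv after ./setup.sh has finished.
import pyspark

print("pyspark", pyspark.__version__, "is importable")
```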
### Execution
To start your Spark cluster, simply run:
```bash
sbatch start_spark_cluster.sh
```
This will return information similar to
```
Submitted batch job 6525353
```
...
@@ -16,18 +40,23 @@ In order to connect, you need to find out the hostname of your compute job.
```
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
6525353 develboos spark-cl kesselhe R 30:40 2 jwb[0129,0149]
```
In this case, the Spark cluster runs on the nodes jwb0129 and jwb0149. Note that to access the nodes from everywhere, you must append the letter `i`; a valid hostname is then `jwb0129i.juwels`. The corresponding adjustments are made automatically in the scripts. The Spark master always runs on the first node.
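For illustration only, the hostname adjustment described above amounts to something like the following sketch; the real logic lives in the repository's shell scripts, and the port 4124 is taken from the `MASTER_URL` example below:
```python
# Illustration of the hostname convention described above; the repository's
# shell scripts perform this adjustment automatically.
def master_url(first_node: str, port: int = 4124) -> str:
    # e.g. "jwb0129" -> "spark://jwb0129i.juwels:4124"
    return f"spark://{first_node}i.juwels:{port}"


print(master_url("jwb0129"))  # prints spark://jwb0129i.juwels:4124
```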
Then you can run a Spark app with a command similar to
```bash
module load Stages/2023 GCC OpenMPI Spark
source ./spark_env/activate.sh
export MASTER_URL=spark://jwb0129i.juwels:4124
python pyspark_pi.py
```
Note that you must replace the hostname with the one from your own job, keeping the appended `i`.
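For reference, an application in the spirit of `pyspark_pi.py` could look like the sketch below. This is only an assumption about its contents (the script itself is not shown in this diff); it estimates π with a simple Monte Carlo sampling scheme and reads the master address from the `MASTER_URL` variable exported above:
```python
# Hypothetical sketch of a pyspark_pi.py-style application; the actual script
# in the repository may differ. Expects MASTER_URL to be set as shown above.
import os
import random

from pyspark.sql import SparkSession


def inside(_):
    # Sample a point in the unit square and test whether it lies inside the
    # quarter circle of radius 1.
    x, y = random.random(), random.random()
    return x * x + y * y <= 1.0


if __name__ == "__main__":
    spark = (
        SparkSession.builder
        .master(os.environ["MASTER_URL"])  # e.g. spark://jwb0129i.juwels:4124
        .appName("pyspark-pi-sketch")
        .getOrCreate()
    )
    n = 1_000_000
    hits = spark.sparkContext.parallelize(range(n), 100).filter(inside).count()
    print(f"Pi is roughly {4.0 * hits / n}")
    spark.stop()
```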
### Monitoring
To connect to the master and workers with a browser, you need a command of the following form:
```bash
ssh -L 18080:localhost:18080 -L 8080:localhost:8080 kesselheim1@jwb0129i.juwels -J kesselheim1@juwels-booster.fz-juelich.de
```
Then you can navigate to http://localhost:8080 to see the output.
...