diff --git a/README.md b/README.md
index a66f4e2c5e48d2b49c611291d4258b2951d6f167..c12bb5310d906c802074d37f17807541384d0245 100644
--- a/README.md
+++ b/README.md
@@ -25,6 +25,11 @@
 python pyspark_pi.py
 ```
 Note the `i` that that has been added to the master hostname.
+To connect to the master and workers with a browser, forward the web UI ports through SSH with a command of the following form:
+```bash
+ssh -L 18080:localhost:18080 -L 8080:localhost:8080 kesselheim1@jwb0085i.juwels -J kesselheim1@juwels-booster.fz-juelich.de
+```
+Then you can navigate to http://localhost:8080 to see the output.

 Open Questions
 - In the Scala Example, is uses all worker instances as expected. The Python Example uses only 2. Why?
@@ -32,6 +37,11 @@ Open Questions
 ToDos:
 - Include a Python Virtual Environment
 - Create a Notebook that illustrates how to run the Pi example in Juypter
+- The history server does not work yet. It crashes with this error message:
+```
+Exception in thread "main" java.io.FileNotFoundException: Log directory specified does not exist: file:/tmp/spark-events Did you configure the correct one through spark.history.fs.logDirectory?
+```
+The log directory (`spark.history.fs.logDirectory`) is not configured correctly.

 ## References
 - Pi Estimate (Python + Scala): [](https://spark.apache.org/examples.html)
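
Regarding the history server item in the second hunk: the error message says that `file:/tmp/spark-events` does not exist, so one likely fix is to create the directory and point both the event log writer and the history server at it. The sketch below is an assumption about how this could be done, not something taken from this repository. The helper names `history_server_conf` and `to_spark_defaults` are hypothetical; the three property names are standard Spark settings. On a multi-node cluster a node-local `/tmp` is probably not enough, so a path on a shared filesystem may be needed instead.

```python
import tempfile
from pathlib import Path


def history_server_conf(log_dir):
    """Create log_dir if it is missing and return matching Spark properties.

    Hypothetical helper: applications write event logs to the directory
    and the history server reads them from the same location.
    """
    path = Path(log_dir).resolve()
    path.mkdir(parents=True, exist_ok=True)
    uri = path.as_uri()  # e.g. file:///tmp/spark-events
    return {
        "spark.eventLog.enabled": "true",
        "spark.eventLog.dir": uri,
        "spark.history.fs.logDirectory": uri,
    }


def to_spark_defaults(conf):
    """Render the properties as 'key value' lines for spark-defaults.conf."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())


if __name__ == "__main__":
    # Assumed location; replace with a shared directory on the cluster.
    demo_dir = Path(tempfile.gettempdir()) / "spark-events"
    print(to_spark_defaults(history_server_conf(demo_dir)))
```

The printed lines could be appended to `conf/spark-defaults.conf` before starting the history server. Only applications that run with `spark.eventLog.enabled` set to `true` will show up there.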