Spark on Kubernetes Python and R bindings
Version 2.4 of Spark introduces Python and R bindings for Kubernetes. Two new images come with this release:
- `spark-py`: the Spark image with Python bindings (including the Python 2 and 3 executables)
- `spark-r`: the Spark image with R bindings (including the R executable)
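Both images can be built from a Spark 2.4 distribution with the bundled `docker-image-tool.sh` script; the registry name and tag below are placeholders to adapt to your environment:

```bash
# Build the Spark images (base, spark-py and spark-r) and push them
./bin/docker-image-tool.sh -r <registry> -t 2.4.0 build
./bin/docker-image-tool.sh -r <registry> -t 2.4.0 push
```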
Databricks has published an article dedicated to the Spark 2.4
features for Kubernetes.
The principle is exactly the same as the one explained in my previous article, but this time we are using:
- A different image: `spark-py`
- Another example: `local:///opt/spark/examples/src/main/python/pi.py`, once again a Pi computation :-|
- A dedicated Spark namespace: `spark.kubernetes.namespace=spark`
The namespace must be created first:
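```bash
# Create the namespace referenced by spark.kubernetes.namespace
kubectl create namespace spark
```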
It isolates the Spark pods from the rest of the cluster and could later be used to cap the available resources.
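The submission looks along these lines; the API server address and the image repository are placeholders for your environment:

```bash
# Run the Python Pi example in the spark namespace;
# <k8s-apiserver> and <registry> are placeholders
./bin/spark-submit \
    --master k8s://https://<k8s-apiserver>:6443 \
    --deploy-mode cluster \
    --name spark-pi \
    --conf spark.executor.instances=2 \
    --conf spark.kubernetes.namespace=spark \
    --conf spark.kubernetes.container.image=<registry>/spark-py:2.4.0 \
    local:///opt/spark/examples/src/main/python/pi.py
```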
`spark.kubernetes.pyspark.pythonVersion` is an additional, optional property that selects the major Python version to use (`2` by default). It sets the major Python version of the Docker image used to run the driver and executor containers, and can be either `2` or `3`.
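For example, to run the same job with Python 3, append the flag below to the spark-submit command above:

```bash
# Select the Python 3 executable inside the spark-py image
--conf spark.kubernetes.pyspark.pythonVersion=3
```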
Labels
An interesting feature that has nothing to do with Python is that Spark defines labels that are applied to its pods. They make it easy to identify the role of each pod.
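Listing the pods with their labels shows `spark-role`, which distinguishes the driver from the executors, and `spark-app-selector`, which carries the application ID:

```bash
# Display the labels Spark applies to each pod
kubectl get pods --namespace spark --show-labels
```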
You can, for example, use the `spark-role` label to delete all the terminated driver pods:
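```bash
# Delete every completed driver pod in the spark namespace
kubectl delete pods --namespace spark \
    --selector spark-role=driver \
    --field-selector status.phase=Succeeded
```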