Find Spark
Find Spark (findspark) is a handy tool for switching between Spark versions in Jupyter notebooks without having to change the SPARK_HOME
environment variable.
It works by adding pyspark to sys.path at runtime.
Note: You need to restart the Kernel in order to change the Spark version.
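Under the hood this amounts to a couple of sys.path tweaks. The snippet below is a rough, minimal sketch of that idea, not findspark's actual code; the /opt/spark-2.3.1 path and the bundled py4j zip name are assumptions that depend on your own install.

# Rough sketch of what findspark.init() boils down to (not the library's code):
# point SPARK_HOME at the chosen install and put its Python bindings on sys.path.
import glob
import os
import sys

spark_home = '/opt/spark-2.3.1'        # hypothetical install directory
os.environ['SPARK_HOME'] = spark_home  # picked up by the processes pyspark spawns
sys.path.insert(0, os.path.join(spark_home, 'python'))
# py4j ships inside the Spark distribution; pick up whichever version is bundled
sys.path.insert(0, glob.glob(os.path.join(spark_home, 'python', 'lib', 'py4j-*.zip'))[0])

import pyspark  # now resolves against the chosen Spark install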
Install it.
$ pip install findspark
Use it.
# Make sure you call findspark.init() before importing pyspark
import findspark
# Without an argument it falls back to the SPARK_HOME environment variable
findspark.init('/Users/xxxx/spark/spark-2.3.1-bin-hadoop2.7')
# pyspark now resolves to the install passed above (2.3.1 in this case)
import pyspark
from pyspark.sql import SparkSession
spark = SparkSession.builder.master('local').appName('spark-local').getOrCreate()
f'Using Spark {spark.version} from {findspark.find()}'
# 'Using Spark 2.3.1 from /Users/xxxx/spark/spark-2.3.1-bin-hadoop2.7'
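To switch to another Spark version, restart the kernel and point findspark.init at a different install. A minimal sketch, assuming a second (hypothetical) 2.4.0 build unpacked next to the first:

import findspark
findspark.init('/Users/xxxx/spark/spark-2.4.0-bin-hadoop2.7')  # hypothetical path

import pyspark
pyspark.__version__
# expected: '2.4.0'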