Step 1: download Spark from https://spark.apache.org/downloads.html
Step 2: unpack the archive
$ tar zxvf ../spark-3.x.x.tar.gz
Step 3: set up bash by adding the following to ~/.bashrc (point SPARK_HOME at the directory where you extracted Spark):
export SPARK_HOME=/opt/spark
export PATH=$SPARK_HOME/bin:$PATH
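Optional check before continuing: once a new shell has picked up these variables, the short Python sketch below verifies that SPARK_HOME points at an extracted Spark distribution by looking for bin/spark-submit. The function name check_spark_home is made up for illustration, and the /opt/spark fallback simply mirrors the export line above; adjust both to your setup.

```python
import os


def check_spark_home(spark_home=None):
    """Return True if spark_home looks like an extracted Spark distribution."""
    # Fall back to the SPARK_HOME environment variable, then to the
    # /opt/spark path from the export line above (an assumption).
    spark_home = spark_home or os.environ.get("SPARK_HOME", "/opt/spark")
    # Spark binary distributions ship a bin/spark-submit launcher.
    return os.path.isfile(os.path.join(spark_home, "bin", "spark-submit"))


if __name__ == "__main__":
    if check_spark_home():
        print("SPARK_HOME looks fine")
    else:
        print("spark-submit not found; check SPARK_HOME")
```

If the check fails, the most likely cause is that the archive was extracted somewhere other than the directory SPARK_HOME names.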
Step 4: install Jupyter Notebook:
$ pip install jupyter
Step 5: start Spark (the script lives in $SPARK_HOME/sbin, which is not on the PATH set in Step 3): $ $SPARK_HOME/sbin/start-all.sh
Step 6: install findspark package: $ pip install findspark
Step 7: launch Jupyter Notebook: $ jupyter notebook
Step 8: create a new notebook and add the following code for testing:
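import findspark
findspark.init()  # locate Spark via SPARK_HOME and add pyspark to sys.path
import pyspark
from pyspark.sql import SparkSession
# Reuse the active session if there is one, otherwise start a new one
spark = SparkSession.builder.getOrCreate()
df = spark.sql("select 'spark' as hello")
df.show()  # prints a one-row table with a single column named hello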
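If the test cell fails with an import error, the notebook kernel may be running a different Python than the one pip installed into. A minimal sketch for checking this from inside the notebook, using only the standard library; missing_packages is a hypothetical helper name, not part of findspark or pyspark:

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the top-level package names this interpreter cannot import."""
    return [n for n in names if importlib.util.find_spec(n) is None]


# Show which Python the kernel is using, then list anything it cannot see.
print(sys.executable)
print(missing_packages(["findspark", "pyspark"]))
```

An empty list means both packages are importable. Seeing pyspark listed before findspark.init() has run is not necessarily a problem, since findspark's job is to add the Spark Python libraries under SPARK_HOME to sys.path.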