Wednesday, January 12, 2022

Spark + Jupyter Notebook on Ubuntu

 

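Step 1: download Spark from https://spark.apache.org/downloads.html

Step 2: extract the downloaded archive:

        $ tar zxvf ../spark-3.x.x.tar.gz

Step 3 assumes the extracted directory is available at /opt/spark; move it there, or adjust SPARK_HOME to match where you extracted it.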

Step 3: set up bash by adding the following to ~/.bashrc (then open a new terminal or run source ~/.bashrc so the change takes effect):

export SPARK_HOME=/opt/spark
export PATH=$SPARK_HOME/bin:$PATH
Step 4: install Jupyter Notebook:
        $ pip install jupyter
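Step 5: start Spark (start-all.sh lives in $SPARK_HOME/sbin, which is not on the PATH set in Step 3, so call it by its full path):
        $ $SPARK_HOME/sbin/start-all.sh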
Step 6: install findspark package: 
        $ pip install findspark
Step 7: launch Jupyter Notebook:
        $ jupyter notebook
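Step 8: create a new notebook and run the following code to test the setup:

import findspark

# Locate the Spark installation (via SPARK_HOME) and add pyspark to sys.path
findspark.init()

import pyspark
from pyspark.sql import SparkSession

# Create a Spark session, or reuse one that already exists
spark = SparkSession.builder.getOrCreate()

# Run a trivial query; it should print a one-row table with a "hello" column
df = spark.sql("select 'spark' as hello")
df.show()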
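
If the test cell fails at findspark.init(), a likely cause is that SPARK_HOME is unset or points at the wrong directory. The sketch below is one way to check that from a notebook cell using only the standard library. The function name check_spark_home and the EXPECTED_FILES list are hypothetical names made up for this illustration; the list assumes the usual layout of a Spark binary download, so adjust it if your installation differs.

```python
import os
from pathlib import Path

# Files a standard Spark binary distribution is expected to contain.
# This list is an assumption for illustration; adjust it to your layout.
EXPECTED_FILES = ("bin/pyspark", "bin/spark-submit", "sbin/start-all.sh")


def check_spark_home(env=None):
    """Return a list of problems with SPARK_HOME (an empty list means it looks fine)."""
    env = os.environ if env is None else env
    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        return ["SPARK_HOME is not set"]
    root = Path(spark_home)
    if not root.is_dir():
        return [f"SPARK_HOME points to a missing directory: {root}"]
    # Report every expected file that is absent under SPARK_HOME.
    return [
        f"missing file: {root / rel}"
        for rel in EXPECTED_FILES
        if not (root / rel).is_file()
    ]


if __name__ == "__main__":
    problems = check_spark_home()
    print("\n".join(problems) if problems else "SPARK_HOME looks OK")
```

Note that Jupyter only sees the variables from ~/.bashrc if it was launched from a shell that had already loaded them, so restart the notebook server after editing the file.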
