PySpark Project Creation
- Install Java 8
sudo add-apt-repository ppa:webupd8team/java
sudo apt update
sudo apt install oracle-java8-installer
- Create the project directory
- Copy the launch_spark_submit.sh script here (required if a Jupyter notebook is also running on the same Spark installation)
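#!/bin/bash
# Unset the Jupyter driver so spark-submit runs the job with plain Python
unset PYSPARK_DRIVER_PYTHON
# Forward all arguments unchanged, preserving quoting
spark-submit "$@"
# Restore the notebook setting (only affects the calling shell if this script is sourced)
export PYSPARK_DRIVER_PYTHON=jupyter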
- Now create the entry program entry.py with a 'main'
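A minimal sketch of what entry.py could look like, assuming the job only needs a SparkContext. The app name, the sample words, and the `count_lengths` helper are hypothetical, and the guarded import is an assumption added so the file can still be imported on a machine without pyspark.

```python
import sys

try:
    from pyspark import SparkContext
except ImportError:
    # Assumption: allows importing this file (e.g. from unit tests) where pyspark is absent
    SparkContext = None


def count_lengths(words):
    """Hypothetical helper: pair each word of a partition with its length."""
    return [(w, len(w)) for w in words]


def main(argv):
    if SparkContext is None:
        print("pyspark not found; launch this file through spark-submit")
        return 1
    # The master URL is expected to come from spark-submit --master
    sc = SparkContext(appName="entry")
    try:
        words = argv or ["spark", "egg", "submit"]
        rdd = sc.parallelize(words)
        # mapPartitions passes an iterator over each partition's elements
        print(rdd.mapPartitions(count_lengths).collect())
    finally:
        sc.stop()
    return 0


if __name__ == "__main__":
    main(sys.argv[1:])
```

Keeping the Spark-independent logic in a plain function makes it possible to test it without starting a cluster.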
- Create another directory, 'additionalCode'
- cd additionalCode
- Create setup.py
from setuptools import setup

setup(
    name='PySparkUtilities',
    version='0.1dev',
    packages=['utilities'],
    license='Creative Commons Attribution-Noncommercial-Share Alike license',
    long_description='An example of how to package code for PySpark',
)
- mkdir utilities
- Copy your modules inside it (add an empty __init__.py so that Python 2.7 can import utilities as a package)
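As an illustration, a module placed in utilities could look like the sketch below. The file name utilities/text_utils.py and the functions `normalize` and `tokenize` are hypothetical and not part of the original project.

```python
# utilities/text_utils.py (hypothetical file name)


def normalize(line):
    """Lower-case a line and collapse runs of whitespace into single spaces."""
    return " ".join(line.lower().split())


def tokenize(line):
    """Split a line into normalized words; blank lines give an empty list."""
    return normalize(line).split(" ") if line.strip() else []
```

Once the egg is passed with --py-files, entry.py could use it with `from utilities.text_utils import tokenize`.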
- In additionalCode, execute: python setup.py bdist_egg
- This will create a dist directory
- dist will contain the egg file
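To check that the modules really ended up in the egg, it can be opened as a zip archive. This is a small sketch using only the standard library; the function name `list_egg_modules` is hypothetical.

```python
import zipfile


def list_egg_modules(egg_path):
    """Return the sorted .py files packaged inside an egg (a zip archive)."""
    with zipfile.ZipFile(egg_path) as egg:
        return sorted(n for n in egg.namelist() if n.endswith(".py"))
```

Call it with the path of the egg found under dist; the modules copied into utilities should appear in the result.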
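- To run, from the project directory:
./launch_spark_submit.sh --master local[4] --py-files additionalCode/dist/PySparkUtilities-0.1.dev0-py2.7.egg entry.py
The egg's file name follows the version in setup.py and the Python version used for the build, so use the name actually present in dist.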
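The run step can also be scripted so the egg does not have to be typed by hand. The sketch below assumes the directory layout described above; `build_submit_command` and its parameters are hypothetical names.

```python
import glob
import os


def build_submit_command(project_dir, entry="entry.py", master="local[4]"):
    """Build the launch command, locating the egg instead of hard-coding its name."""
    pattern = os.path.join(project_dir, "additionalCode", "dist", "*.egg")
    eggs = glob.glob(pattern)
    if not eggs:
        raise IOError("no egg found; run 'python setup.py bdist_egg' first")
    # Pick the most recently built egg if several are present
    newest = max(eggs, key=os.path.getmtime)
    return [
        os.path.join(project_dir, "launch_spark_submit.sh"),
        "--master", master,
        "--py-files", newest,
        entry,
    ]
```

The returned list can be handed to `subprocess.call` from the project directory, which avoids shell quoting problems with arguments such as local[4].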