PySpark Project Creation
Awantik Das edited this page Nov 21, 2017
·
21 revisions
- Create Project directory
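- Copy the launch_spark_submit.sh script here (required if a Jupyter notebook also runs on the same Spark installation)

      #!/bin/bash
      # Unset the Jupyter driver so spark-submit runs a plain Python driver
      unset PYSPARK_DRIVER_PYTHON
      spark-submit "$@"
      # Restore the notebook setting (only has an effect if this script is sourced)
      export PYSPARK_DRIVER_PYTHON=jupyter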
- Create the entry program entry.py with 'main'
- Create another directory, 'additionalCode'
- cd additionalCode
- Create setup.py

      from setuptools import setup

      setup(
          name='PySparkUtilities',
          version='0.1dev',
          packages=['utilities'],
          license='Creative Commons Attribution-Noncommercial-Share Alike license',
          long_description='An example of how to package code for PySpark',
      )
- mkdir utilities
- Copy modules inside it, along with an __init__.py so that 'utilities' is importable as a package
- In additionalCode, run: python setup.py bdist_egg
- This creates a dist directory containing the egg file
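- To run, go back to the project directory and execute the following (use the actual egg file name found in dist; it encodes the version from setup.py and the Python version)

      ./launch_spark_submit.sh --master local[4] --py-files additionalCode/dist/PySparkUtilities-0.2.dev0-py2.7.egg entry.py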
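
The egg file name changes with the version in setup.py and with the Python version, so a hard-coded path in the run command can go stale. Below is a minimal sketch, not part of the original project, that looks up the egg in additionalCode/dist and assembles the same launch command. The helper names `find_egg` and `build_submit_command` are hypothetical, and the layout (launch_spark_submit.sh, entry.py and additionalCode/dist under one project directory) is assumed from the steps above.

```python
import glob
import os


def find_egg(dist_dir):
    """Return the path of the single .egg file inside dist_dir."""
    eggs = sorted(glob.glob(os.path.join(dist_dir, "*.egg")))
    if len(eggs) != 1:
        # Refuse to guess when dist is empty or holds several builds.
        raise RuntimeError(
            "expected exactly one .egg in %s, found %d" % (dist_dir, len(eggs)))
    return eggs[0]


def build_submit_command(project_dir=".", master="local[4]"):
    """Assemble the launch command as an argument list (hypothetical helper)."""
    egg = find_egg(os.path.join(project_dir, "additionalCode", "dist"))
    return [
        os.path.join(project_dir, "launch_spark_submit.sh"),
        "--master", master,
        "--py-files", egg,
        os.path.join(project_dir, "entry.py"),
    ]
```

From the project directory, the returned list can be passed to `subprocess.call(build_submit_command())`. Removing old eggs from dist before rebuilding keeps the lookup unambiguous.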