From 606a2c6292f7d01702c18f75b33d9d7f88bcb01b Mon Sep 17 00:00:00 2001
From: elvaliuliuliu <47404285+elvaliuliuliu@users.noreply.github.com>
Date: Wed, 20 Nov 2019 13:43:05 -0800
Subject: [PATCH 1/2] Init

---
 docs/take-to-prod.md | 108 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 108 insertions(+)
 create mode 100644 docs/take-to-prod.md

diff --git a/docs/take-to-prod.md b/docs/take-to-prod.md
new file mode 100644
index 000000000..669dae2b4
--- /dev/null
+++ b/docs/take-to-prod.md
@@ -0,0 +1,108 @@
Taking your Spark .Net Application to Production
===

# Table of Contents
This how-to provides general instructions on how to take your .NET for Apache Spark application to production.
In this documentation, we will summary the most commonly asked scenarios when running Spark .Net Application.
And you will also learn how to package your application and submit your application with [spark-submit](https://spark.apache.org/docs/latest/submitting-applications.html) and [Apache Livy](https://livy.incubator.apache.org/).
- [How to take your application to production when you have single dependency](#how-to-take-your-application-to-production-when-you-have-single-dependency)
  - [Scenarios - Scenario 1 and Scenario 2](#scenarios---single-dependency)
  - [Package your application](#package-your-application---single-dependency)
  - [Launch your application](#launch-your-application---single-dependency)
- [How to take your application to production when you have multiple dependencies](#how-to-take-your-application-to-production-when-you-have-multiple-dependencies)
  - [Scenarios - Scenario 3, Scenario 4, Scenario 5 and Scenario 6](#scenarios---multiple-dependencies)
  - [Package your application](#package-your-application---multiple-dependencies)
  - [Launch your application](#launch-your-application---multiple-dependencies)

## How to take your application to production when you have single dependency
### Scenarios - single dependency
#### Scenario 1. SparkSession code and business logic in the same Program.cs file
This would be the simple usecase when you have SparkSession code and business logic (UDFs) in the same Program.cs file and in the same project (e.g. mySparkApp.csproj).
#### Scenario 2. SparkSession code and business logic in the same project, but different .cs files
This would be the usecase when you have SparkSession code and business logic (UDFs) in different .cs files but in the same project (e.g. SparkSession in Program.cs, business logic in BusinessLogic.cs and both are in mySparkApp.csproj).

### Package your application - single dependency
Please follow [Get Started](https://github.com/dotnet/spark/#get-started) to build your application in Scenario 1 and Scenario 2.
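For Scenario 1 and Scenario 2 no packaging work is needed beyond a normal build, since the single assembly mySparkApp.dll contains both the `SparkSession` code and the UDFs. A minimal sketch, assuming a project named mySparkApp.csproj targeting netcoreapp3.0 as in Get Started:
```shell
# Build the app; this produces bin\Debug\netcoreapp3.0\mySparkApp.dll,
# the assembly referenced by the spark-submit and Livy examples below.
cd mySparkApp
dotnet build
```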
### Launch your application - single dependency
#### 1. Using spark-submit
Please see below as an example of running your app with `spark-submit` in Scenario 1 and Scenario 2.
```shell
%SPARK_HOME%\bin\spark-submit \
--class org.apache.spark.deploy.dotnet.DotnetRunner \
--master local \
--files bin\Debug\netcoreapp3.0\mySparkApp.dll \
bin\Debug\netcoreapp3.0\microsoft-spark-2.4.x-0.6.0.jar \
dotnet bin\Debug\netcoreapp3.0\mySparkApp.dll ...
```
#### 2. Using Apache Livy
Please see below as an example of running your app with Apache Livy in Scenario 1 and Scenario 2.
```shell
{
    "file": "adl://<cluster name>.azuredatalakestore.net/<path>/microsoft-spark-2.4.x-0.6.0.jar",
    "className": "org.apache.spark.deploy.dotnet.DotnetRunner",
    "files": ["adl://<cluster name>.azuredatalakestore.net/<path>/mySparkApp.dll"],
    "args": ["dotnet","adl://<cluster name>.azuredatalakestore.net/<path>/mySparkApp.dll","<app arg 1>","<app arg 2>","...","<app arg n>"]
}
```
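The JSON above is the body of a [Livy batch request](https://livy.incubator.apache.org/docs/latest/rest-api.html), which you submit by POSTing it to your cluster's Livy endpoint. A sketch using curl, assuming an HDInsight-style endpoint (on other clusters Livy listens on port 8998 by default) and hypothetical credentials:
```shell
# Save the request body shown above as input.json, then POST it to Livy.
# <user>, <password> and <cluster name> are placeholders for your cluster.
curl -k -X POST \
  -H "Content-Type: application/json" \
  -u "<user>:<password>" \
  --data @input.json \
  "https://<cluster name>.azurehdinsight.net/livy/batches"
```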
## How to take your application to production when you have multiple dependencies
### Scenarios - multiple dependencies
#### Scenario 3. SparkSession code in one project that references another project including the business logic
This would be the usecase when you have SparkSession code in one project (e.g. mySparkApp.csproj) and business logic (UDFs) in another project (e.g. businessLogic.csproj).
#### Scenario 4. SparkSession code references a function from a Nuget package that has been installed in the csproj
This would be the usecase when SparkSession code references a function from a Nuget package in the same project (e.g. mySparkApp.csproj).
#### Scenario 5. SparkSession code references a function from a DLL on the user machine
This would be the usecase when SparkSession code references business logic (UDFs) on the user machine (e.g. SparkSession code in the mySparkApp.csproj and businessLogic.dll on a different machine).
#### Scenario 6. SparkSession code references functions and business logic from multiple projects/solutions that themselves depend on multiple Nuget packages
This would be a more complex usecase when you have SparkSession code referencing business logic (UDFs) and functions from nuget packages in multiple projects and/or solutions.

### Package your application - multiple dependencies
- Please follow [Get Started](https://github.com/dotnet/spark/#get-started) to build your mySparkApp.csproj in Scenario 4 and Scenario 5 (and businessLogic.csproj for Scenario 3).
- Please see detailed steps [here](https://github.com/dotnet/spark/tree/master/deployment#preparing-your-spark-net-app) on how to build, publish and zip your application in Scenario 6, as sketched below. After packaging your .Net for Spark application, you will have a zip file (e.g. mySparkApp.zip) which has all the dependencies.
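As a rough sketch of those Scenario 6 packaging steps (the target framework, runtime identifier and paths below are examples; substitute the values that match your cluster):
```shell
# Publish the app together with all project and Nuget dependencies,
# then zip the publish folder into the mySparkApp.zip used below.
# netcoreapp3.0 and ubuntu.16.04-x64 are example values.
cd mySparkApp
dotnet publish -c Release -f netcoreapp3.0 -r ubuntu.16.04-x64
cd bin/Release/netcoreapp3.0/ubuntu.16.04-x64/publish
zip -r ../mySparkApp.zip .
```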
### Launch your application - multiple dependencies
#### 1. Using spark-submit
- Please see below as an example of running your app with `spark-submit` in Scenario 3 and Scenario 5.
And you should use `--files bin\Debug\netcoreapp3.0\nugetLibrary.dll` in Scenario 4.
```shell
%SPARK_HOME%\bin\spark-submit \
--class org.apache.spark.deploy.dotnet.DotnetRunner \
--master local \
--files bin\Debug\netcoreapp3.0\businessLogic.dll \
bin\Debug\netcoreapp3.0\microsoft-spark-2.4.x-0.6.0.jar \
dotnet bin\Debug\netcoreapp3.0\mySparkApp.dll ...
```
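For Scenario 5, where businessLogic.dll lives outside your application folder, one option (a sketch; the path is hypothetical) is to point the .NET worker at the DLL's directory via the `DOTNET_ASSEMBLY_SEARCH_PATHS` environment variable before calling `spark-submit`:
```shell
# DOTNET_ASSEMBLY_SEARCH_PATHS is a comma-separated list of directories
# the .NET worker probes when loading assemblies such as UDF DLLs.
# C:\path\to\udfs is a hypothetical location of businessLogic.dll.
set DOTNET_ASSEMBLY_SEARCH_PATHS=C:\path\to\udfs
```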
- Please see below as an example of running your app with `spark-submit` in Scenario 6.
```shell
spark-submit \
--class org.apache.spark.deploy.dotnet.DotnetRunner \
--master yarn \
--deploy-mode cluster \
--conf spark.yarn.appMasterEnv.DOTNET_ASSEMBLY_SEARCH_PATHS=./udfs,./myLibraries.zip \
--archives hdfs:///businessLogics.zip#udfs,hdfs:///myLibraries.zip \
hdfs:///microsoft-spark-2.4.x-0.6.0.jar \
hdfs:///mySparkApp.zip mySparkApp ...
```
Note that `hdfs:///businessLogics.zip#udfs` ships the archive to each node and extracts it under the alias `udfs`, which is why `DOTNET_ASSEMBLY_SEARCH_PATHS` includes `./udfs`.
#### 2. Using Apache Livy
- Please see below as an example of running your app with Apache Livy in Scenario 3 and Scenario 5.
And you should use `"files": ["adl://<cluster name>.azuredatalakestore.net/<path>/nugetLibrary.dll"]` in Scenario 4.
```shell
{
    "file": "adl://<cluster name>.azuredatalakestore.net/<path>/microsoft-spark-2.4.x-0.6.0.jar",
    "className": "org.apache.spark.deploy.dotnet.DotnetRunner",
    "files": ["adl://<cluster name>.azuredatalakestore.net/<path>/businessLogic.dll"],
    "args": ["dotnet","adl://<cluster name>.azuredatalakestore.net/<path>/mySparkApp.dll","<app arg 1>","<app arg 2>","...","<app arg n>"]
}
```
- Please see below as an example of running your app with Apache Livy in Scenario 6.
```shell
{
    "file": "adl://<cluster name>.azuredatalakestore.net/<path>/microsoft-spark-<spark version>-<spark dotnet version>.jar",
    "className": "org.apache.spark.deploy.dotnet.DotnetRunner",
    "conf": {"spark.yarn.appMasterEnv.DOTNET_ASSEMBLY_SEARCH_PATHS": "./udfs,./myLibraries.zip"},
    "archives": ["adl://<cluster name>.azuredatalakestore.net/<path>/businessLogics.zip#udfs","adl://<cluster name>.azuredatalakestore.net/<path>/myLibraries.zip"],
    "args": ["adl://<cluster name>.azuredatalakestore.net/<path>/mySparkApp.zip","mySparkApp","<app arg 1>","<app arg 2>","...","<app arg n>"]
}
```
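Once Livy accepts a batch, it returns the batch id in the response; you can then poll the batch state with a GET against the same endpoint (a sketch, using the same placeholder cluster and credentials as the curl example above):
```shell
# Check on the submitted batch; replace 0 with the id returned by the POST.
curl -k -u "<user>:<password>" "https://<cluster name>.azurehdinsight.net/livy/batches/0"
```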
From 7abab905a18a7335baa6b094064b1b211613d9ca Mon Sep 17 00:00:00 2001
From: elvaliuliuliu <47404285+elvaliuliuliu@users.noreply.github.com>
Date: Wed, 20 Nov 2019 16:04:52 -0800
Subject: [PATCH 2/2] resolve comments

---
 docs/take-to-prod.md | 48 ++++++++++++++++++++++----------------------
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/docs/take-to-prod.md b/docs/take-to-prod.md
index 669dae2b4..9f8e8807b 100644
--- a/docs/take-to-prod.md
+++ b/docs/take-to-prod.md
@@ -1,25 +1,25 @@
-Taking your Spark .Net Application to Production
+Taking your .NET for Apache Spark Application to Production
 ===
 
 # Table of Contents
 This how-to provides general instructions on how to take your .NET for Apache Spark application to production.
-In this documentation, we will summary the most commonly asked scenarios when running Spark .Net Application.
-And you will also learn how to package your application and submit your application with [spark-submit](https://spark.apache.org/docs/latest/submitting-applications.html) and [Apache Livy](https://livy.incubator.apache.org/).
+In this documentation, we will summarize the most commonly asked scenarios when running a .NET for Apache Spark Application.
+You will also learn how to package your application and submit your application with [spark-submit](https://spark.apache.org/docs/latest/submitting-applications.html) and [Apache Livy](https://livy.incubator.apache.org/).
-- [How to take your application to production when you have single dependency](#how-to-take-your-application-to-production-when-you-have-single-dependency)
-  - [Scenarios - Scenario 1 and Scenario 2](#scenarios---single-dependency)
+- [How to deploy your application when you have a single dependency](#how-to-deploy-your-application-when-you-have-a-single-dependency)
+  - [Scenarios](#scenarios---single-dependency)
   - [Package your application](#package-your-application---single-dependency)
   - [Launch your application](#launch-your-application---single-dependency)
-- [How to take your application to production when you have multiple dependencies](#how-to-take-your-application-to-production-when-you-have-multiple-dependencies)
-  - [Scenarios - Scenario 3, Scenario 4, Scenario 5 and Scenario 6](#scenarios---multiple-dependencies)
+- [How to deploy your application when you have multiple dependencies](#how-to-deploy-your-application-when-you-have-multiple-dependencies)
+  - [Scenarios](#scenarios---multiple-dependencies)
   - [Package your application](#package-your-application---multiple-dependencies)
   - [Launch your application](#launch-your-application---multiple-dependencies)
 
-## How to take your application to production when you have single dependency
+## How to deploy your application when you have a single dependency
 ### Scenarios - single dependency
 #### Scenario 1. SparkSession code and business logic in the same Program.cs file
-This would be the simple usecase when you have SparkSession code and business logic (UDFs) in the same Program.cs file and in the same project (e.g. mySparkApp.csproj).
+This would be the simple use case when you have `SparkSession` code and business logic (UDFs) in the same Program.cs file and in the same project (e.g. mySparkApp.csproj).
 #### Scenario 2. SparkSession code and business logic in the same project, but different .cs files
-This would be the usecase when you have SparkSession code and business logic (UDFs) in different .cs files but in the same project (e.g. SparkSession in Program.cs, business logic in BusinessLogic.cs and both are in mySparkApp.csproj).
+This would be the use case when you have `SparkSession` code and business logic (UDFs) in different .cs files but in the same project (e.g. SparkSession in Program.cs, business logic in BusinessLogic.cs and both are in mySparkApp.csproj).
 
 ### Package your application - single dependency
 Please follow [Get Started](https://github.com/dotnet/spark/#get-started) to build your application in Scenario 1 and Scenario 2.
@@ -32,45 +32,45 @@ Please see below as an example of running your app with `spark-submit` in Scenar
 --class org.apache.spark.deploy.dotnet.DotnetRunner \
 --master local \
 --files bin\Debug\netcoreapp3.0\mySparkApp.dll \
-bin\Debug\netcoreapp3.0\microsoft-spark-2.4.x-0.6.0.jar \
+bin\Debug\<dotnet version>\microsoft-spark-<spark version>-<spark dotnet version>.jar \
 dotnet bin\Debug\netcoreapp3.0\mySparkApp.dll ...
 ```
 #### 2. Using Apache Livy
 Please see below as an example of running your app with Apache Livy in Scenario 1 and Scenario 2.
 ```shell
 {
-    "file": "adl://<cluster name>.azuredatalakestore.net/<path>/microsoft-spark-2.4.x-0.6.0.jar",
+    "file": "adl://<cluster name>.azuredatalakestore.net/<path>/microsoft-spark-<spark version>-<spark dotnet version>.jar",
     "className": "org.apache.spark.deploy.dotnet.DotnetRunner",
     "files": ["adl://<cluster name>.azuredatalakestore.net/<path>/mySparkApp.dll"],
     "args": ["dotnet","adl://<cluster name>.azuredatalakestore.net/<path>/mySparkApp.dll","<app arg 1>","<app arg 2>","...","<app arg n>"]
 }
 ```
 
-## How to take your application to production when you have multiple dependencies
+## How to deploy your application when you have multiple dependencies
 ### Scenarios - multiple dependencies
 #### Scenario 3. SparkSession code in one project that references another project including the business logic
-This would be the usecase when you have SparkSession code in one project (e.g. mySparkApp.csproj) and business logic (UDFs) in another project (e.g. businessLogic.csproj).
+This would be the use case when you have `SparkSession` code in one project (e.g. mySparkApp.csproj) and business logic (UDFs) in another project (e.g. businessLogic.csproj).
 #### Scenario 4. SparkSession code references a function from a Nuget package that has been installed in the csproj
-This would be the usecase when SparkSession code references a function from a Nuget package in the same project (e.g. mySparkApp.csproj).
+This would be the use case when `SparkSession` code references a function from a Nuget package in the same project (e.g. mySparkApp.csproj).
-#### Scenario 5. SparkSession code references a function from a DLL on the user machine
-This would be the usecase when SparkSession code references business logic (UDFs) on the user machine (e.g. SparkSession code in the mySparkApp.csproj and businessLogic.dll on a different machine).
+#### Scenario 5. SparkSession code references a function from a DLL on the user's machine
+This would be the use case when `SparkSession` code references business logic (UDFs) on the user's machine (e.g. `SparkSession` code in the mySparkApp.csproj and businessLogic.dll on a different machine).
 #### Scenario 6. SparkSession code references functions and business logic from multiple projects/solutions that themselves depend on multiple Nuget packages
-This would be a more complex usecase when you have SparkSession code referencing business logic (UDFs) and functions from nuget packages in multiple projects and/or solutions.
+This would be a more complex use case when you have `SparkSession` code referencing business logic (UDFs) and functions from Nuget packages in multiple projects and/or solutions.
 
 ### Package your application - multiple dependencies
 - Please follow [Get Started](https://github.com/dotnet/spark/#get-started) to build your mySparkApp.csproj in Scenario 4 and Scenario 5 (and businessLogic.csproj for Scenario 3).
-- Please see detailed steps [here](https://github.com/dotnet/spark/tree/master/deployment#preparing-your-spark-net-app) on how to build, publish and zip your application in Scenario 6, as sketched below. After packaging your .Net for Spark application, you will have a zip file (e.g. mySparkApp.zip) which has all the dependencies.
+- Please see detailed steps [here](https://github.com/dotnet/spark/tree/master/deployment#preparing-your-spark-net-app) on how to build, publish and zip your application in Scenario 6, as sketched below. After packaging your .NET for Apache Spark application, you will have a zip file (e.g. mySparkApp.zip) which has all the dependencies.
 
 ### Launch your application - multiple dependencies
 #### 1. Using spark-submit
 - Please see below as an example of running your app with `spark-submit` in Scenario 3 and Scenario 5.
-And you should use `--files bin\Debug\netcoreapp3.0\nugetLibrary.dll` in Scenario 4.
+Additionally, you should use `--files bin\Debug\netcoreapp3.0\nugetLibrary.dll` in Scenario 4.
 ```shell
 %SPARK_HOME%\bin\spark-submit \
 --class org.apache.spark.deploy.dotnet.DotnetRunner \
 --master local \
 --files bin\Debug\netcoreapp3.0\businessLogic.dll \
-bin\Debug\netcoreapp3.0\microsoft-spark-2.4.x-0.6.0.jar \
+bin\Debug\<dotnet version>\microsoft-spark-<spark version>-<spark dotnet version>.jar \
 dotnet bin\Debug\netcoreapp3.0\mySparkApp.dll ...
 ```
 - Please see below as an example of running your app with `spark-submit` in Scenario 6.
 ```shell
@@ -82,15 +82,15 @@ spark-submit \
 --class org.apache.spark.deploy.dotnet.DotnetRunner \
 --master yarn \
 --deploy-mode cluster \
 --conf spark.yarn.appMasterEnv.DOTNET_ASSEMBLY_SEARCH_PATHS=./udfs,./myLibraries.zip \
 --archives hdfs:///businessLogics.zip#udfs,hdfs:///myLibraries.zip \
-hdfs:///microsoft-spark-2.4.x-0.6.0.jar \
+hdfs:///microsoft-spark-<spark version>-<spark dotnet version>.jar \
 hdfs:///mySparkApp.zip mySparkApp ...
 ```
 #### 2. Using Apache Livy
 - Please see below as an example of running your app with Apache Livy in Scenario 3 and Scenario 5.
-And you should use `"files": ["adl://<cluster name>.azuredatalakestore.net/<path>/nugetLibrary.dll"]` in Scenario 4.
+Additionally, you should use `"files": ["adl://<cluster name>.azuredatalakestore.net/<path>/nugetLibrary.dll"]` in Scenario 4.
 ```shell
 {
-    "file": "adl://<cluster name>.azuredatalakestore.net/<path>/microsoft-spark-2.4.x-0.6.0.jar",
+    "file": "adl://<cluster name>.azuredatalakestore.net/<path>/microsoft-spark-<spark version>-<spark dotnet version>.jar",
     "className": "org.apache.spark.deploy.dotnet.DotnetRunner",
     "files": ["adl://<cluster name>.azuredatalakestore.net/<path>/businessLogic.dll"],
     "args": ["dotnet","adl://<cluster name>.azuredatalakestore.net/<path>/mySparkApp.dll","<app arg 1>","<app arg 2>","...","<app arg n>"]