Commit 30421dd

[HWORKS-2190][APPEND] Updating job configuration to include file, pyfiles, archives and jars (#478)
* updating docs for jobs configs to include files, pyFiles, jars and archives
* updating based on review comments
* updating documentation for notebooks and python Jobs
1 parent 4acf8b6 commit 30421dd

File tree

4 files changed: +13 -2 lines changed

docs/user_guides/projects/jobs/notebook_job.md

Lines changed: 1 addition & 0 deletions
@@ -179,6 +179,7 @@ The following table describes the JSON payload returned by `jobs_api.get_configu
 | `resourceConfig.gpus` | number (int) | Number of GPUs to be allocated | `0` |
 | `logRedirection` | boolean | Whether logs are redirected | `true` |
 | `jobType` | string | Type of job | `"PYTHON"` |
+| `files` | string | HDFS path(s) to files to be provided to the Notebook Job. Multiple files can be included in a single string, separated by commas. <br>Example: `"hdfs:///Project/<project_name>/Resources/file1.py,hdfs:///Project/<project_name>/Resources/file2.txt"` | `null` |


 ## Accessing project data
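The `files` field added above takes a single comma-separated string of HDFS paths, not a list. A minimal sketch of assembling such a payload, assuming a config dict shaped like the documented table; the `with_files` helper and the `my_project` project name are illustrative, not part of the Hopsworks API:

```python
# Build a Python/Notebook job configuration payload with extra files attached.
# `with_files` is a hypothetical helper, not a Hopsworks API call.

def with_files(config: dict, paths: list) -> dict:
    """Return a copy of `config` with `files` set to a comma-separated string."""
    updated = dict(config)
    updated["files"] = ",".join(paths) if paths else None  # default is null/None
    return updated

base = {
    "jobType": "PYTHON",
    "logRedirection": True,
    "files": None,  # default per the table above
}

cfg = with_files(base, [
    "hdfs:///Project/my_project/Resources/file1.py",
    "hdfs:///Project/my_project/Resources/file2.txt",
])
print(cfg["files"])
# -> hdfs:///Project/my_project/Resources/file1.py,hdfs:///Project/my_project/Resources/file2.txt
```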

docs/user_guides/projects/jobs/pyspark_job.md

Lines changed: 5 additions & 1 deletion
@@ -217,7 +217,7 @@ The following table describes the JSON payload returned by `jobs_api.get_configu
 | Field | Type | Description | Default |
 | ------------------------------------------ | -------------- |-----------------------------------------------------| -------------------------- |
 | `type` | string | Type of the job configuration | `"sparkJobConfiguration"` |
-| `appPath` | string | Project path to script (e.g `Resources/foo.py`) | `null` |
+| `appPath` | string | Project path to script (e.g `Resources/foo.py`) | `null` |
 | `environmentName` | string | Name of the project spark environment | `"spark-feature-pipeline"` |
 | `spark.driver.cores` | number (float) | Number of CPU cores allocated for the driver | `1.0` |
 | `spark.driver.memory` | number (int) | Memory allocated for the driver (in MB) | `2048` |
@@ -229,6 +229,10 @@ The following table describes the JSON payload returned by `jobs_api.get_configu
 | `spark.dynamicAllocation.maxExecutors` | number (int) | Maximum number of executors with dynamic allocation | `2` |
 | `spark.dynamicAllocation.initialExecutors` | number (int) | Initial number of executors with dynamic allocation | `1` |
 | `spark.blacklist.enabled` | boolean | Whether executor/node blacklisting is enabled | `false` |
+| `files` | string | HDFS path(s) to files to be provided to the Spark application. Multiple files can be included in a single string, separated by commas. <br>Example: `"hdfs:///Project/<project_name>/Resources/file1.py,hdfs:///Project/<project_name>/Resources/file2.txt"` | `null` |
+| `pyFiles` | string | HDFS path(s) to Python files to be provided to the Spark application. These will be added to the `PYTHONPATH` so they can be imported as modules. Multiple files can be included in a single string, separated by commas. <br>Example: `"hdfs:///Project/<project_name>/Resources/module1.py,hdfs:///Project/<project_name>/Resources/module2.py"` | `null` |
+| `jars` | string | HDFS path(s) to JAR files to be provided to the Spark application. These will be added to the classpath. Multiple files can be included in a single string, separated by commas. <br>Example: `"hdfs:///Project/<project_name>/Resources/lib1.jar,hdfs:///Project/<project_name>/Resources/lib2.jar"` | `null` |
+| `archives` | string | HDFS path(s) to archive files to be provided to the Spark application. Multiple files can be included in a single string, separated by commas. <br>Example: `"hdfs:///Project/<project_name>/Resources/archive1.zip,hdfs:///Project/<project_name>/Resources/archive2.tar.gz"` | `null` |


 ## Accessing project data
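Taken together, the four new fields slot into the documented payload as plain comma-separated strings. A sketch of a `sparkJobConfiguration` dict built from the defaults in the table above; `my_project` and the `Resources/...` paths are placeholders:

```python
# Sketch of a sparkJobConfiguration payload carrying the four new fields.
# Values mirror the defaults documented above; paths are placeholders.

spark_config = {
    "type": "sparkJobConfiguration",
    "appPath": "Resources/foo.py",
    "environmentName": "spark-feature-pipeline",
    "spark.driver.cores": 1.0,
    "spark.driver.memory": 2048,
    "spark.blacklist.enabled": False,
    # Each of the four fields is one comma-separated string, not a list:
    "files": "hdfs:///Project/my_project/Resources/file1.py,"
             "hdfs:///Project/my_project/Resources/file2.txt",
    "pyFiles": "hdfs:///Project/my_project/Resources/module1.py",  # importable as modules
    "jars": "hdfs:///Project/my_project/Resources/lib1.jar",       # added to the classpath
    "archives": "hdfs:///Project/my_project/Resources/archive1.zip",
}

# Splitting on "," recovers the individual HDFS paths:
print(spark_config["files"].split(","))
```

In the Hopsworks Python client, a payload like this would typically be obtained from `jobs_api.get_configuration(...)` and adjusted before creating the job; check the field names against the tables above rather than this sketch.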

docs/user_guides/projects/jobs/python_job.md

Lines changed: 1 addition & 0 deletions
@@ -177,6 +177,7 @@ The following table describes the JSON payload returned by `jobs_api.get_configu
 | `resourceConfig.gpus` | number (int) | Number of GPUs to be allocated | `0` |
 | `logRedirection` | boolean | Whether logs are redirected | `true` |
 | `jobType` | string | Type of job | `"PYTHON"` |
+| `files` | string | HDFS path(s) to files to be provided to the Python Job. Multiple files can be included in a single string, separated by commas. <br>Example: `"hdfs:///Project/<project_name>/Resources/file1.py,hdfs:///Project/<project_name>/Resources/file2.txt"` | `null` |


 ## Accessing project data

docs/user_guides/projects/jobs/spark_job.md

Lines changed: 6 additions & 1 deletion
@@ -230,7 +230,12 @@ The following table describes the JSON payload returned by `jobs_api.get_configu
 | `spark.dynamicAllocation.minExecutors` | number (int) | Minimum number of executors with dynamic allocation | `1` |
 | `spark.dynamicAllocation.maxExecutors` | number (int) | Maximum number of executors with dynamic allocation | `2` |
 | `spark.dynamicAllocation.initialExecutors` | number (int) | Initial number of executors with dynamic allocation | `1` |
-| `spark.blacklist.enabled` | boolean | Whether executor/node blacklisting is enabled | `false` |
+| `spark.blacklist.enabled` | boolean | Whether executor/node blacklisting is enabled | `false` |
+| `files` | string | HDFS path(s) to files to be provided to the Spark application. Multiple files can be included in a single string, separated by commas. <br>Example: `"hdfs:///Project/<project_name>/Resources/file1.py,hdfs:///Project/<project_name>/Resources/file2.txt"` | `null` |
+| `pyFiles` | string | HDFS path(s) to Python files to be provided to the Spark application. These will be added to the `PYTHONPATH` so they can be imported as modules. Multiple files can be included in a single string, separated by commas. <br>Example: `"hdfs:///Project/<project_name>/Resources/module1.py,hdfs:///Project/<project_name>/Resources/module2.py"` | `null` |
+| `jars` | string | HDFS path(s) to JAR files to be provided to the Spark application. These will be added to the classpath. Multiple files can be included in a single string, separated by commas. <br>Example: `"hdfs:///Project/<project_name>/Resources/lib1.jar,hdfs:///Project/<project_name>/Resources/lib2.jar"` | `null` |
+| `archives` | string | HDFS path(s) to archive files to be provided to the Spark application. Multiple files can be included in a single string, separated by commas. <br>Example: `"hdfs:///Project/<project_name>/Resources/archive1.zip,hdfs:///Project/<project_name>/Resources/archive2.tar.gz"` | `null` |

 ## Accessing project data