Conversation

Collaborator

@vividroyjeong vividroyjeong commented Sep 19, 2025

Update VDYP batch to process the partitioned files in parallel

TAC

  • Partitioned files can be processed by VDYP batch in parallel (at least two partitions at a time)
  • Output is stored in a structured way next to the partitioned input files for later aggregation

Out of Scope

  • Aggregation of the output files is a separate step from this
  • Updating the DB with progress as processing completes / fails
  • This module does not need to be deployed to OpenShift

Developer Note:

  • To proceed, call the projection and generate the correct output files using an internal poly.csv and layer.csv.
  • Adjust the existing skeleton to accommodate the actual file formats required for the projection.

Collaborator

@pminter-vivid pminter-vivid left a comment


Besides my specific line comments, I am slightly concerned about the memory expense of parsing the CSV inputs only to recreate CSV input streams for projection. Do you see a version of this in which we do not parse the CSVs at all, beyond partitioning them by FEATURE_ID using just the first part of each line (up to the comma), and simply write the raw CSV lines out to partition folders as input? The projection service would then read the CSVs straight out of the partition input folder. Is there a reason this would not work?
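The suggestion above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual implementation; the class and method names (`FeatureIdPartitioner`, `featureId`, `partitionFor`, `partition`) are assumptions, and the real code would write to files in partition folders rather than arbitrary `Writer`s.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Writer;
import java.util.Map;

/**
 * Hypothetical sketch of the reviewer's suggestion: route each raw CSV line
 * to a partition using only the FEATURE_ID prefix (the text before the first
 * comma), never parsing the rest of the line.
 */
public class FeatureIdPartitioner {

    /** Reads the FEATURE_ID prefix without parsing the full CSV line. */
    static String featureId(String line) {
        int comma = line.indexOf(',');
        return comma >= 0 ? line.substring(0, comma) : line;
    }

    /** Maps a FEATURE_ID to one of n partitions. */
    static int partitionFor(String featureId, int n) {
        return Math.floorMod(featureId.hashCode(), n);
    }

    /** Streams input line by line, appending each line to its partition's writer. */
    static void partition(BufferedReader in, Map<Integer, Writer> writers, int n)
            throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            Writer w = writers.get(partitionFor(featureId(line), n));
            w.write(line);
            w.write(System.lineSeparator());
        }
    }
}
```

Because only the prefix of each line is ever inspected, memory stays bounded by one line at a time regardless of input size.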

layerReader.open(executionContext);

// Load all polygon data
loadAllPolygonData();
Collaborator


I am concerned that these two functions, while efficient, may use too much memory in production. Batch will be operating on very large data sets, and loading all of that into memory to partition may be expensive.

Collaborator Author


To address memory concerns, I've refactored the batch module to process files line-by-line with configurable chunk sizes (default 10), store raw CSV data without parsing into Java objects, and use lazy loading to keep only the current chunk in memory.
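The chunk-based lazy reading described above could look something like this. The class and method names are assumptions for illustration only; the real module presumably wires this into Spring Batch readers.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch (names assumed): lazily reads raw CSV lines in fixed-size chunks,
 * so only the current chunk is resident in memory at any time.
 */
public class ChunkedCsvReader {
    private final BufferedReader reader;
    private final int chunkSize;

    public ChunkedCsvReader(BufferedReader reader, int chunkSize) {
        this.reader = reader;
        this.chunkSize = chunkSize;
    }

    /** Returns the next chunk of raw lines, or an empty list at end of input. */
    public List<String> nextChunk() throws IOException {
        List<String> chunk = new ArrayList<>(chunkSize);
        String line;
        while (chunk.size() < chunkSize && (line = reader.readLine()) != null) {
            chunk.add(line); // raw CSV line, never parsed into domain objects
        }
        return chunk;
    }
}
```

With a default chunk size of 10, memory use is bounded by ten raw lines rather than the whole file.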

/**
* Loads parameters from the configured parameters file
*/
private Parameters loadParameters() throws IOException {
Collaborator


Is this reading the parameters file from a configuration location just for now, since the processing ticket skips actually reading data from the job? It seems like this should be referenced from the BatchRecord, or some parent of the BatchRecord, rather than an application.properties setting.

Collaborator Author


You're absolutely right. I've refactored the approach - Parameters are now passed from job level via the BatchJobRequest.
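The shape of that refactor might be roughly as follows. This is a sketch only: the inner `Parameters` stand-in and its `chunkSize` field are invented for illustration and do not reflect the real VDYP `Parameters` type, and the `BatchJobRequest` fields shown are assumptions.

```java
/**
 * Sketch (assumed shapes): projection Parameters travel with each job
 * request instead of being read from application.properties, so every
 * batch run can carry its own settings.
 */
public class BatchJobRequest {

    /** Hypothetical stand-in for the real VDYP Parameters type. */
    public static class Parameters {
        private final int chunkSize;

        public Parameters(int chunkSize) {
            this.chunkSize = chunkSize;
        }

        public int getChunkSize() {
            return chunkSize;
        }
    }

    private final Parameters parameters;

    public BatchJobRequest(Parameters parameters) {
        this.parameters = parameters;
    }

    public Parameters getParameters() {
        return parameters;
    }
}
```

The design point is that parameters become per-job data rather than global application configuration.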

@vividroyjeong
Collaborator Author

I implemented your suggestions: the batch module now extracts FEATURE_ID by reading only the first part of each line and writes raw CSV data to partition folders without full parsing. The refactor combines chunk-based batch processing with streaming CSV partitioning.

Collaborator

@pminter-vivid pminter-vivid left a comment


I think you could refactor that utility sanitize function before merging, but I also won't hold this up because of it.

Feel free to refactor that function and then merge or merge if you disagree with the refactor.

/**
* Sanitizes filename for safe logging.
* Removes control characters, line breaks, and limits length.
*/
private String sanitizeFilename(String filename) {
Collaborator


This is a duplicate of sanitizeFileName. As soon as you are making a duplicate of something that operates statically, you should make a static Utils class that does this processing for you.

Please refactor to Utils.SanitizeForLogging(String filename)
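A shared helper along those lines might look like this. It is a sketch of the idea, not the merged code; the 255-character cap is an assumed value, and the method is lowercased to `sanitizeForLogging` per Java naming convention (the comment above writes it as `SanitizeForLogging`).

```java
/**
 * Sketch of a shared static utility for log-safe filenames, replacing the
 * duplicated per-class sanitize methods.
 */
public final class Utils {
    private static final int MAX_LENGTH = 255; // assumed cap, for illustration

    private Utils() {
        // static utility class; no instances
    }

    /** Strips control characters (including line breaks), then truncates. */
    public static String sanitizeForLogging(String filename) {
        if (filename == null) {
            return "";
        }
        String cleaned = filename.replaceAll("\\p{Cntrl}", "");
        return cleaned.length() > MAX_LENGTH
                ? cleaned.substring(0, MAX_LENGTH)
                : cleaned;
    }
}
```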

Collaborator Author


@pminter-vivid refactored to use 'Utils.SanitizeForLogging.' Thanks for the review.


@vividroyjeong vividroyjeong merged commit d072606 into main Sep 25, 2025
12 checks passed
@vividroyjeong vividroyjeong deleted the feature/VDYP-714 branch October 7, 2025 19:19