6.1.5 #14670
DevinTDHa
announced in
Announcement
📢 Spark NLP 6.1.5: Smarter Readers and More Resilient Pipelines
Spark NLP 6.1.5 focuses on improving data ingestion reliability and pipeline flexibility. This release enhances reader components with better fault tolerance and broader input support, and introduces a new ReaderAssembler annotator for streamlined integration. Several key fixes also improve model loading and stability in distributed environments.
🔥 Highlights
ReaderAssembler Annotator: Unify multiple reader annotators into one configurable component for simpler and cleaner ingestion pipelines.
🚀 New Features & Enhancements
Reader Pipeline Enhancements
ReaderAssembler Annotator
A new meta-annotator that unifies Reader2X components (e.g., Reader2Doc, Reader2Image, Reader2Table) under a single interface. This simplifies pipeline construction and improves maintainability for multi-format ingestion workflows. (Link to notebook)
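The idea of placing several format-specific readers behind one interface can be sketched in plain Python. This is only a conceptual illustration, not Spark NLP code: `AssemblerSketch`, `read_doc`, and `read_table` are hypothetical names invented here, and the actual ReaderAssembler parameters should be taken from the linked notebook.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical stand-ins for per-format readers; these are NOT Spark NLP classes.
def read_doc(content: str) -> List[str]:
    """Return non-empty text lines, loosely mimicking a document reader."""
    return [line.strip() for line in content.splitlines() if line.strip()]


def read_table(content: str) -> List[List[str]]:
    """Return rows of comma-separated cells, loosely mimicking a table reader."""
    return [
        [cell.strip() for cell in line.split(",")]
        for line in content.splitlines()
        if "," in line
    ]


@dataclass
class AssemblerSketch:
    """Runs every registered reader over the same input and collects the outputs."""

    readers: Dict[str, Callable[[str], list]] = field(default_factory=dict)

    def register(self, output_name: str, reader: Callable[[str], list]) -> "AssemblerSketch":
        self.readers[output_name] = reader
        return self  # returning self allows chained registration

    def read(self, content: str) -> Dict[str, list]:
        # One call fans out to all readers, keyed by output name.
        return {name: reader(content) for name, reader in self.readers.items()}


assembler = (
    AssemblerSketch()
    .register("document", read_doc)
    .register("table", read_table)
)

result = assembler.read("Quarterly report\nregion, revenue\nEMEA, 10\n")
print(result["document"])
print(result["table"])
```

The caller configures one component and receives every output keyed by name, instead of wiring each reader into the pipeline separately.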
Support for String Input Columns in Readers (SPARKNLP-1291)
Previously, Spark NLP readers only supported input via file paths. If you already had a DataFrame with text content (say, from another pipeline or a preliminary load), you had to write it to disk just so the reader could ingest it. That added friction and overhead, especially in streaming or in-memory pipelines.
With this change, readers can also take their input from a string column of an existing DataFrame.
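The overhead described above can be illustrated with a small standard-library sketch that contrasts a path-only reader with one that accepts in-memory strings. The functions `read_html_path` and `read_html_string` are hypothetical helpers written for this illustration and do not reflect the Spark NLP reader API.

```python
import os
import tempfile
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the non-blank text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def read_html_string(html: str) -> list:
    """Hypothetical reader that works directly on in-memory content."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.chunks


def read_html_path(path: str) -> list:
    """Hypothetical path-only reader: content must exist on disk first."""
    with open(path, encoding="utf-8") as fh:
        return read_html_string(fh.read())


html = "<html><body><h1>Title</h1><p>Hello reader</p></body></html>"

# Path-only workflow: in-memory text takes a round trip through a temp file.
with tempfile.NamedTemporaryFile(
    "w", suffix=".html", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(html)
try:
    from_path = read_html_path(tmp.name)
finally:
    os.remove(tmp.name)

# String workflow: the same content is parsed without touching disk.
from_string = read_html_string(html)
print(from_path == from_string)
```

Both routes yield the same text chunks; the string route simply skips the write, read, and cleanup steps.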
Fault-Tolerant XML Reader
The XML reader now skips malformed XML fragments (e.g., mismatched tags, missing closures, invalid characters) instead of failing the job.
Enhanced error handling ensures more resilient ingestion of imperfect real-world data.
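The skip-instead-of-fail behavior can be sketched with Python's standard `xml.etree.ElementTree`. This is a minimal illustration of the general technique, assuming the input is already split into independent fragments; it is not the Spark NLP implementation, and `parse_fragments` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def parse_fragments(fragments):
    """Parse each XML fragment independently; record failures instead of raising."""
    parsed, skipped = [], []
    for index, fragment in enumerate(fragments):
        try:
            parsed.append(ET.fromstring(fragment))
        except ET.ParseError as err:
            # Keep the position and reason so bad input can be audited later.
            skipped.append((index, str(err)))
    return parsed, skipped


fragments = [
    "<note><to>Ana</to></note>",  # well-formed
    "<note><to>Ben</note>",       # mismatched tag
    "<note><to>Cy</to>",          # missing closing tag
    "<note>\x0b</note>",          # control character not allowed in XML 1.0
    "<note><to>Dee</to></note>",  # well-formed
]

parsed, skipped = parse_fragments(fragments)
names = [element.findtext("to") for element in parsed]
print(names)
print([index for index, _ in skipped])
```

Recording the skipped indices alongside the parser message keeps the job running while still leaving a trail for inspecting the rejected input.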
🐛 Bug Fixes
Fixed an issue in FeaturesFallbackReader that caused duplicate loading or missing model files when calling .pretrained() on GGUF-based annotators such as AutoGGUFModel and rerankers, especially in Databricks environments.
❤️ Community Support
💻 Installation
Python
Spark Packages
CPU
GPU
Apple Silicon
AArch64
Maven
spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x:
spark-nlp-gpu:
spark-nlp-silicon:
spark-nlp-aarch64:
FAT JARs
What's Changed
Full Changelog: 6.1.4...6.1.5
This discussion was created from the release 6.1.5.