TL;DR - call agentic / MCP tools while offloading everything we can from expensive agents / models.
Build a deterministic, code-first NLP pipeline that transforms freeform user text into executable actions, minimizing reliance on large LLMs. Our MVP will:
- Extract intents and parameters via rule-based & lightweight NLP (spaCy) components.
- Execute or serialize actions in a structured array for downstream orchestrators.
- Optionally offload ambiguous cases to micro-models (30–125M params) only when rules cannot fully resolve.
This approach ensures high accuracy, low latency, and predictable behavior, while keeping large LLM usage limited to dialogue and fallback scenarios.
- CLI Interface (`nlp_agent.py`) that:
  - Accepts a paragraph or multi-sentence string as input.
  - Splits it into semantic sentences (spaCy `senter`/`doc.sents`).
  - POS-tags tokens (coarse & fine) using `en_core_web_sm`.
  - Extracts an Actions Array of simple action objects:
    [
      {"action": "convert", "values": ["2000"], "objects": ["USD", "AUD"]},
      {"action": "set_reminder", "entities": ["doctor appointment", "Friday", "3pm"]}
    ]
- Rule-Based Extractor to map tokens → `{action, parameters}`:
  - Verbs → `action`
  - NUM → `values`
  - NOUN/PROPN → `objects`/`entities`
- Zero-LLM Path: If rule extractor covers 100% of sentences, no model calls beyond spaCy.
- JSON Output: Print final actions array to stdout.
Out of Scope for the MVP:
- Production REST APIs or GUIs.
- Dependency parsing, NER, or external integrations.
- Large-model (e.g., OpenAI GPT) calls.
- Distributed deployment, monitoring, or containerization.
[User Input Paragraph]
↓
[1. Sentence Segmentation] (spaCy senter/parser)
↓
[2. POS Tagging] (spaCy en_core_web_sm)
↓
[3. Rule-Based Extraction] (Python code ⇒ action objects)
↓
[4. Actions Array Output] (CLI JSON)
Action Object Schema:
{
"action": "string", // e.g. "convert", "set_reminder"
"values": ["string"], // numeric or temporal tokens
"objects": ["string"], // noun phrases
"entities": ["string"] // proper nouns or named entities
}

For edge cases or ambiguous sentences, introduce a Micro-Model layer:
- Intent Classifier (30–125M params)
  - DistilBERT/TinyBERT fine-tuned on a small intent set.
  - Triggered only if the rule extractor falls below a confidence threshold.
- Parameter Normalizer (small seq2seq)
  - Canonicalizes slang or colloquialisms (`bucks` → `USD`).
- Configuration: CLI flags `--use-intent-model`, `--use-normalizer`.
This hybrid model+code approach keeps most logic in deterministic code, invoking models sparingly.
# Basic MVP run (rules only)
$ python nlp_agent.py "Convert 2000 USD to AUD and set a reminder for Friday at 3pm."
[
{"action":"convert","values":["2000"],"objects":["USD","AUD"]},
{"action":"set_reminder","entities":["Friday","3pm"]}
]
# With micro-model fallback enabled
$ python nlp_agent.py --use-intent-model "Could you, like, swap 250 bucks and remind me later?"

| Phase | Deliverables | Timeline |
|---|---|---|
| Setup | Repo scaffolding, CLI arg parsing | Day 1 |
| MVP Core | Sentence split + POS tagging + rule extraction | Days 2–3 |
| Testing | Unit tests for 20+ sample inputs | Day 4 |
| Demo | Example runs, README, usage docs | Day 5 |
- Collect a small corpus to validate rule extractor coverage.
- Prototype micro-model intent classifier; measure fallback rate.
- Iterate on action schema (multiple actions per input).
- Plan for NER or dependency parsing if required by domain.
If the deterministic pipeline’s extracted actions do not reflect the user’s true intent, the system will:
- Feedback Loop to LLM: Present the actions array and original user input back to the LLM for review.
{ "input": "<original text>", "actions": [ ... ] }
- LLM Validation: The LLM examines the proposed actions and either confirms alignment or provides corrections/overrides:
- Confirm: Executes actions as-is.
- Override: Returns a revised actions array or calls tools directly via function/tool calling syntax.
- Re-Execution Path: If corrections are provided, re-run the pipeline or invoke specific tools per the LLM’s guidance.
This fallback ensures that any edge-case or mis-extraction can be caught and remedied, while preserving the primary goal of minimizing LLM calls to only validation and critical overrides.
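The review round-trip described above might be wired as follows. The LLM call is stubbed; the payload shape follows the document, while the verdict format (`confirm`/`override`) is an assumption.

```python
import json


def build_review_payload(text: str, actions: list[dict]) -> str:
    """Package the original input and proposed actions for LLM review."""
    return json.dumps({"input": text, "actions": actions})


def llm_review(payload: str) -> dict:
    """Stub: a real implementation would send the payload to an LLM."""
    return {"verdict": "confirm"}  # or {"verdict": "override", "actions": [...]}


def validate_and_execute(text: str, actions: list[dict]) -> list[dict]:
    """Confirm -> execute as-is; override -> use the LLM's revised actions."""
    verdict = llm_review(build_review_payload(text, actions))
    if verdict["verdict"] == "override":
        return verdict["actions"]  # re-execution path
    return actions                 # confirmed as-is
```

Because the LLM only sees a compact JSON payload and returns a verdict, validation stays cheap relative to having the LLM parse every input from scratch.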
End of Project Overview.
Copyright © 2025 Orchestrate LLC. License: Apache-2.0