
Add harmony tool call parser#116

Open
louis-jan wants to merge 1 commit into ml-explore:main from louis-jan:add-harmony-tool-call-parser

Conversation

@louis-jan
Contributor

As mentioned in issue #61, mlx-swift-lm includes a GPT-OSS model definition, but GPT-OSS requires Harmony formatting end-to-end. Since the thinking case is not yet standardized (see #79), that part can still be parsed at the application layer. Tool calls, however, are available for most supported models, so this PR adds a tool call parser for the GPT-OSS model.

Proposed changes

Adds:

  • HarmonyToolCallParser to parse tool calls for the GPT-OSS model
  • Moves includeStopToken into the ToolHandler implementation, since it is used only in RawTokenLoopHandler. generateLoopTask should not need to know about this setting, as it is ToolHandler-specific logic.
  • In Harmony, <|call|> is both the tool call end tag and an EOS token, so this PR adds a check of the current tool call parser state and detokenizes the final token, so the tool call parse can complete and emit.
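The core difficulty in the last bullet is that one token both closes the tool call and terminates generation. A minimal sketch of the state-machine idea (illustrative only — the type and tag names here are placeholders, not the PR's actual implementation):

```swift
// Hypothetical sketch: a tiny tool-call state machine where the end tag
// doubles as an EOS token (like Harmony's <|call|>), so the final chunk
// must complete the pending tool call instead of just ending the stream.
enum ParseState { case normal, collectingToolCall }

struct MiniToolCallParser {
    let startTag = "<|start|>"   // placeholder tags for illustration
    let endTag = "<|call|>"
    var state = ParseState.normal
    var buffer = ""
    var toolCalls: [String] = []

    // Feed one detokenized chunk; returns true if a tool call just completed.
    mutating func process(chunk: String) -> Bool {
        switch state {
        case .normal:
            if chunk == startTag { state = .collectingToolCall }
            return false
        case .collectingToolCall:
            if chunk == endTag {
                // The EOS chunk closes the collected tool call.
                toolCalls.append(buffer)
                buffer = ""
                state = .normal
                return true
            }
            buffer += chunk
            return false
        }
    }
}
```

The generation loop can then consult `state` when it sees the stop token and, if a tool call is mid-collection, run the chunk through the parser before terminating.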

Checklist

Put an x in the boxes that apply.

  • I have read the CONTRIBUTING document
  • I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the necessary documentation (if needed)

Comment on lines -1033 to +1039
Before:

    if includeStopToken {
        if !handler.onStopToken(token, emit: continuation.yield) {
            stopReason = .cancelled
            break
        }
    }

After:

    // Increase tokenCount only if handler emits end token
    if !handler.onStopToken(token, emit: { value in
        tokenCount += 1
        return continuation.yield(value)
    }) {
        stopReason = .cancelled
        break
    }

Contributor Author

@louis-jan louis-jan Feb 20, 2026


@ronaldmannak FYI, take a look at this update. I've refactored so that includeStopToken is encapsulated within RawTokenHandler only. Now, whenever a token needs to be emitted, the count increments automatically, without this function needing to be aware of the includeStopToken flag.
cc @davidkoski

Comment on lines +1315 to +1332
// Check whether toolCallProcessor is currently collecting a tool call.
// Only detokenize in that case (rather than always) for better performance,
// since the EOS token can also be the tool call end tag (e.g. GPT-OSS).
if toolCallProcessor.state == .collectingToolCall {
    detokenizer.append(token: token)
    if let chunk = detokenizer.next() {
        if chunk == toolCallProcessor.parser.endTag {
            // Process the chunk through the tool call processor.
            _ = toolCallProcessor.processChunk(chunk)

            // Emit the completed tool call if one was produced.
            if let toolCall = toolCallProcessor.toolCalls.popLast() {
                _ = emit(.toolCall(toolCall))
            }
        }
    }
}
return true
Contributor Author


@davidkoski, the gpt-oss model returns <|call|> as the tool call end tag, but it is also the EOS stop token. This change checks whether the tool call parse needs to be completed instead of just ending the stream.

Comment on lines +1362 to 1366
if self.includeStopToken {
    if case .terminated = emit(.token(token)) {
        return false
    }
}
Contributor Author


This way generateLoopTask does not need to care about emitting the EOS token, since that is logic specific to this raw token handler.

Comment on lines +31 to +32
var state = State.normal
let parser: any ToolCallParser
Contributor Author


I don't really like this change, but I'm exposing these so that TextToolTokenLoopHandler can check whether the tool parser is mid-parse, and can compare chunks against the endTag.
https://github.com/ml-explore/mlx-swift-lm/pull/116/changes#diff-d636823870a24e57e1e1d2c6b125edc68ba15e35c0122cb749fab0fd714d601fR1315

@ronaldmannak
Contributor

Hi @louis-jan — thanks for tagging me, I appreciate it.
There are a couple of Harmony-specific issues, which is why I didn’t add Harmony parsing in the Generate Token PR:

  • Parsing needs to happen at the token level, not on the tokenized string.
  • Even with token-level parsing, non–tool-calling Harmony output doesn’t map cleanly onto MLX-Swift-LM’s Generation model. That leaves you with two rough options:
    • Lose important nuance (which matters for things like Open Responses), or
    • Expose Harmony tags in the generated text.

Neither is an acceptable option IMO.
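To make the second option concrete: raw GPT-OSS output interleaves Harmony control tokens with the text. A simplified illustration (the exact role/channel headers follow the Harmony format docs and may differ in detail):

```
<|channel|>analysis<|message|>User asked for the weather; call the tool...<|end|>
<|start|>assistant<|channel|>commentary to=functions.get_weather<|message|>{"location": "SF"}<|call|>
```

Stripping these tags without a real parser loses the channel information; leaving them in exposes raw Harmony markup to the app.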
For now, the solution I am using is doing the harmony parsing in the app layer. OpenAI provides a Rust library for Harmony parsing. I built an FFI wrapper for it: Pico Harmony. The catch is that the FFI currently ships with a precompiled version of the Rust library, and I totally understand why folks would be hesitant to depend on that.
I’ve offered for OpenAI to take ownership of the library, but I haven’t heard back yet (hint hint @stephancasas). I also haven’t followed up :)
I’d be happy to transfer ownership to ml-explore as well, but integrating Harmony properly at the MLX-Swift-LM layer would require a fairly substantial refactor.
I guess it depends on where the MLX team wants to take the library.

@davidkoski
Collaborator

davidkoski commented Feb 20, 2026

Very interesting -- I wasn't familiar with harmony, but if I am reading it right, it is describing the structure of the output. Sort of the inverse of UserInputProcessor + chat template. The structure they define looks reasonable, but mlx-swift-lm would need to support outputs from many models. We could make a generic structure (like UserInput) that we could translate tokens -> harmony parser -> ModelOutput.

Aside from tool calling, most models produce bare unstructured output, which we could certainly wrap in this structure.

Although using the official library sounds nice, I don't think we want that in mlx-swift-lm as it complicates the build tremendously and we wouldn't want to require people to use a prebuilt library. Ideally we would allow somebody to build a third party library that did link the official harmony library and would interoperate with the mlx-swift-lm infrastructure just like any other model. I think that mostly requires:

  • definition of something like ModelOutput which could be streamed from generation
  • the right hooks to parse the token stream and turn it into structured output (a state machine)
  • tool parsing and bare text would fit into this for existing models
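A rough sketch of the kind of generic structure described above (the names ModelOutput, OutputParser, and PlainTextParser are illustrative, not existing mlx-swift-lm API):

```swift
// Hypothetical ModelOutput: a generic structured-output type that
// generation could stream, covering text, reasoning, and tool calls.
enum ModelOutput: Equatable {
    case text(String)                               // streamed visible text
    case reasoning(String)                          // e.g. think-tag content
    case toolCall(name: String, arguments: String)  // parsed tool invocation
}

// The hook a third-party Harmony parser could conform to, letting it plug
// into the token loop without mlx-swift-lm linking the Rust library itself.
protocol OutputParser {
    // Consume one detokenized chunk; return any outputs that became complete.
    mutating func consume(_ chunk: String) -> [ModelOutput]
}

// Bare unstructured output (most models) fits the same shape trivially.
struct PlainTextParser: OutputParser {
    mutating func consume(_ chunk: String) -> [ModelOutput] {
        return [.text(chunk)]
    }
}
```

A Harmony-backed conformer would hold the state machine internally, which keeps the streaming constraint visible: each `consume` call can still surface partial text as it arrives.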

The perhaps interesting part is how this interacts with streaming text -- I think the ability to see the text as it is generated is key. Not impossible but it is a constraint on the representation and how it is built.

If Generation had the bare token, I wonder if we could accomplish this with composition? Simply wrap the streamDetails() call in a structured decoder. Maybe not as it affects tool calling, which (in #114) is part of the inner loop.

I will talk with the other mlx folks and see where mlx-lm is headed and think about how we might make something that would support a wide variety of models.

See also:

@ronaldmannak
Contributor

ronaldmannak commented Feb 21, 2026

Those are great links, especially the comprehensive bug report.
I'm not sure I understand your concern about streaming. Harmony uses chunks, and that part doesn't change (apart from the extra metadata, channels, etc.).
If you're considering adding a raw token to Generation, maybe we should add a reasoning option to Generation as well for reasoning models, because that is getting a bit chaotic too (e.g. <think></think> vs [think][/think], and GLM 4.7 doesn't even emit a <think> prefix). It would make sense to have the generate loop take care of that automatically (perhaps at the token level, not sure).
However, just adding a raw token to Generation itself doesn't solve the issue, and it might even be unwanted to detokenize the tokens for gpt-oss before the Harmony parser sees them. So you would probably use generateTokenTask instead of generateTask anyway. At least, that's how I do it in Pico: two different paths for Harmony and non-Harmony.

I do think you want to use the official OpenAI Harmony parser, given the complexity and since the specs might change in the future. You don't want to reinvent the wheel and you also don't want to just support a subset of Harmony. Since you can't use the official Rust parser directly in Swift, you'll need to have an FFI like Pico Harmony. You can have the developer add their own parser as an adapter, but that might just be confusing.

How about this:

  • We add Harmony FFI to MLX-Swift-LM but keep it a separate and optional target so projects that don't support GPT-oss don't have to link the code. I'm happy to rename Pico Harmony to MLX-Harmony or something and create a PR
  • Maybe there's a better solution than bundling the compiled Rust library (even when compiled by the MLX team) and developers may be able to compile the library themselves easily (I am no expert in Swift or Rust packages, but since I know you're good at finding obscure things like JSON 5 support, you might find a better solution)
  • Instead of trying to add full Harmony support in Generation you could maybe instead create a separate generateHarmonyTask that returns streaming Harmony items and internally uses generateTokenTask and the same tool calling code as non-Harmony models.

The developer will have to use generation like so:

import MLXLMCommon
import MLXHarmony

if model.type == "gpt_oss" {
  generateHarmonyTask {...}
} else {
  generateTask {...}
}
