Conversation
```swift
if includeStopToken {
    // Increase tokenCount only if handler emits end token
    if !handler.onStopToken(token, emit: { value in
        tokenCount += 1
        return continuation.yield(value)
    }) {
        stopReason = .cancelled
        break
    }
}
```
@ronaldmannak FYI, take a look at this update. I've refactored so that includeStopToken is encapsulated within RawTokenHandler only. Now, whenever a token needs to be emitted, the count increments automatically without requiring this function to be aware of the includeStopToken flag.
cc @davidkoski
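The counting pattern above can be sketched in isolation. This is a simplified, hypothetical reduction: `TokenHandler`, the `Int` token type, and the `Void`-returning `emit` are illustrative stand-ins, not the actual mlx-swift-lm types (the real `emit` forwards to `continuation.yield`).

```swift
// Hypothetical simplified sketch of the emit-closure counting pattern.
// The handler decides whether to emit; the caller wraps emit so the
// count increments only on actual emission, without the loop ever
// inspecting includeStopToken.
struct TokenHandler {
    let includeStopToken: Bool

    // Returns false when the generation loop should stop.
    func onStopToken(_ token: Int, emit: (Int) -> Void) -> Bool {
        if includeStopToken {
            emit(token)  // emitted only when configured to include it
        }
        return false  // a stop token always ends the loop
    }
}

var tokenCount = 0
var emitted: [Int] = []
let handler = TokenHandler(includeStopToken: true)

// The loop wraps emit; counting happens here, not in the handler.
_ = handler.onStopToken(42, emit: { value in
    tokenCount += 1
    emitted.append(value)
})
print(tokenCount, emitted)  // 1 [42]
```

With `includeStopToken: false`, the closure is never invoked, so the count stays untouched, which is the encapsulation the refactor is after.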
```swift
// Check whether toolCallProcessor is currently collecting a tool call,
// and only then detokenize (for better performance, instead of always
// detokenizing), since the EOS token could also be the tool call endTag
// (e.g. GPT-OSS).
if toolCallProcessor.state == .collectingToolCall {
    detokenizer.append(token: token)
    if let chunk = detokenizer.next() {
        if chunk == toolCallProcessor.parser.endTag {
            // Process chunk through the tool call processor.
            let _ = toolCallProcessor.processChunk(chunk)

            // Emit completed tool call if needed
            if let toolCall = toolCallProcessor.toolCalls.popLast() {
                let _ = emit(.toolCall(toolCall))
            }
        }
    }
}
return true
```
@davidkoski, the gpt-oss model returns <|call|> as the tool call endTag, but that token is also the EOS stop token. This change checks whether the tool parsing process needs to be completed instead of just ending the stream.
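The endTag/EOS collision handling can be sketched as follows. `MiniToolParser` and all of its members are made-up illustrative names (real tool-call payload parsing is considerably more involved); the point is only that a chunk equal to `endTag` completes the pending call rather than being discarded as a bare stop token.

```swift
// Hypothetical sketch: when a stop token's text equals the parser's
// endTag (as with gpt-oss's <|call|>), finish tool parsing before
// ending the stream.
enum ParserState { case normal, collectingToolCall }

struct MiniToolParser {
    let endTag: String
    var state: ParserState = .normal
    var collected = ""
    var toolCalls: [String] = []

    mutating func processChunk(_ chunk: String) {
        if chunk == endTag {
            // endTag completes the pending tool call instead of
            // being swallowed by end-of-stream handling.
            toolCalls.append(collected)
            collected = ""
            state = .normal
        } else {
            collected += chunk
        }
    }
}

var parser = MiniToolParser(endTag: "<|call|>")
parser.state = .collectingToolCall
parser.processChunk("{\"name\":\"get_weather\"}")
// The EOS text happens to equal the endTag: complete parsing,
// don't just stop.
parser.processChunk("<|call|>")
print(parser.toolCalls)  // ["{\"name\":\"get_weather\"}"]
```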
```swift
if self.includeStopToken {
    if case .terminated = emit(.token(token)) {
        return false
    }
}
```
This way generateLoopTask does not need to care about emitting the EOS token (that logic is specific to this raw token handler).
```swift
var state = State.normal
let parser: any ToolCallParser
```
I don't really like this change, but I'm exposing these properties so TextToolTokenLoopHandler can check whether the tool parser is currently processing, and for the endTag check.
https://github.com/ml-explore/mlx-swift-lm/pull/116/changes#diff-d636823870a24e57e1e1d2c6b125edc68ba15e35c0122cb749fab0fd714d601fR1315
Hi @louis-jan — thanks for tagging me, I appreciate it.
Neither is an acceptable option IMO.
Very interesting -- I wasn't familiar with harmony, but if I am reading it right, it is describing the structure of the output, sort of the inverse of UserInputProcessor + chat template. The structure they define looks reasonable, but mlx-swift-lm would need to support outputs from many models. We could make a generic structure (like UserInput) that we could translate tokens -> harmony parser -> ModelOutput. Aside from tool calling, most models produce bare unstructured output, which we could certainly wrap in this structure.

Although using the official library sounds nice, I don't think we want that in mlx-swift-lm, as it complicates the build tremendously and we wouldn't want to require people to use a prebuilt library. Ideally we would allow somebody to build a third party library that did link the official harmony library and would interoperate with the mlx-swift-lm infrastructure just like any other model. I think that mostly requires:
The perhaps interesting part is how this interacts with streaming text -- I think the ability to see the text as it is generated is key. Not impossible, but it is a constraint on the representation and how it is built. I will talk with the other mlx folks and see where mlx-lm is headed, and think about how we might make something that would support a wide variety of models. See also:
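One possible shape for such a generic output representation, purely as a sketch: `OutputEvent`, `OutputParser`, and `PlainTextParser` are invented names for illustration, not proposed or existing mlx-swift-lm API.

```swift
// Illustrative sketch of a generic structured-output type that a
// model-specific parser (e.g. a third-party harmony adapter) could
// produce while still streaming text chunks as they are generated.
enum OutputEvent {
    case text(String)                              // streamed, unstructured text
    case reasoning(String)                         // thinking/analysis channels, if any
    case toolCall(name: String, arguments: String) // completed tool call
}

// A parser would conform to something like this and be registered per
// model type, keeping heavy dependencies out of the core library.
protocol OutputParser {
    mutating func consume(_ chunk: String) -> [OutputEvent]
}

// Trivial passthrough parser: most models produce bare text.
struct PlainTextParser: OutputParser {
    mutating func consume(_ chunk: String) -> [OutputEvent] {
        [.text(chunk)]
    }
}

var p = PlainTextParser()
let events = p.consume("Hello")
if case .text(let s) = events[0] { print(s) }  // Hello
```

Streaming falls out naturally here because `consume` returns events per chunk rather than waiting for the full completion.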
Those are great links, especially the comprehensive bug report. I do think you want to use the official OpenAI Harmony parser, given the complexity and since the specs might change in the future. You don't want to reinvent the wheel, and you also don't want to support just a subset of Harmony. Since you can't use the official Rust parser directly in Swift, you'll need an FFI like Pico Harmony. You could have the developer add their own parser as an adapter, but that might just be confusing. How about this:
The developer will have to use generation like so:

```swift
import MLXLMCommon
import MLXHarmony

if model.type == "gpt_oss" {
    generateHarmonyTask {...}
} else {
    generateTask {...}
}
```
As mentioned in issue #61, mlx-swift-lm includes a GPT-OSS model definition, but GPT-OSS requires Harmony formatting end-to-end. However, since the thinking case is not yet standardized (see #79), it can be parsed by applications. Tool calls, by contrast, are available for most supported models, so this PR adds a tool call parser for the gpt-oss model.
Proposed changes
Adds:
`includeStopToken` moved to the ToolHandler impl, as it's used only in `RawTokenLoopHandler`. generateLoopTask should not know this setting as it's ToolHandler impl-specific logic.

Checklist
Put an `x` in the boxes that apply.

- [x] I have run `pre-commit run --all-files` to format my code / installed pre-commit prior to committing changes