Off-by-one error on completions stream with logprobs when prompt is "Hi there!"

Edit: so far, this has only been observed for the prompt "Hi there!". Logprobs for all other prompts tested look good.

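When a streaming completions request (`stream: true`) sets `logprobs` (here `"logprobs": 2`) and at least one output detector is configured with sentence chunkers, the logprobs entry for the first token of the next chunk is emitted in the previous message (off-by-one error). In the response below, the first message's `logprobs` ends with an extra `ĠI` at `text_offset` 25, one past the end of its 25-character `text`, and the second message's `logprobs` starts at `'m` even though its `text` starts with ` I'm`.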

```sh
curl --request POST \
  --url http://localhost/api/v2/text/completions-detection \
  --header 'Content-Type: application/json' \
  --data '{
	"prompt": "Hi there!",
	"model": "my-model",
	"n": 1,
	"temperature": 0,
	"top_p": 1,
	"user": "user-1234",
	"detectors": {
		"output": {
			"detector-a": {
				"threshold": "0"
			},
			"detector-b": {}
		}
	},
	"stream": true,
	"logprobs": 2
}'
```

Response (I converted the streamed messages into a JSON array to make them easier to read):
```json
[
    {
        "id": "cmpl-d4658ffc982d476cbd5dc792a362d3d5",
        "object": "text_completion",
        "created": 1754423257,
        "model": "my-model",
        "choices": [
            {
                "index": 0,
                "text": " I'm so glad you're here.",
                "logprobs": {
                    "tokens": [
                        "ĠI",
                        "'m",
                        "Ġso",
                        "Ġglad",
                        "Ġyou",
                        "'re",
                        "Ġhere",
                        ".",
                        "ĠI"
                    ],
                    "token_logprobs": [
                        -0.6309449,
                        -0.3174411,
                        -1.8193831,
                        -0.24041994,
                        -0.0075156083,
                        -0.17254148,
                        -0.38406774,
                        -0.7424222,
                        -0.46643928
                    ],
                    "top_logprobs": [
                        {
                            "ĠWelcome": -1.880945,
                            "ĠI": -0.6309449
                        },
                        {
                            "Ġhope": -2.192441,
                            "'m": -0.3174411
                        },
                        {
                            "Ġso": -1.8193831,
                            "Ġa": -2.0068831
                        },
                        {
                            "Ġexcited": -1.74042,
                            "Ġglad": -0.24041994
                        },
                        {
                            "Ġyou": -0.0075156083,
                            "Ġto": -5.2575154
                        },
                        {
                            "'re": -0.17254148,
                            "Ġstopped": -2.7975414
                        },
                        {
                            "Ġhere": -0.38406774,
                            "Ġinterested": -1.2590678
                        },
                        {
                            ".": -0.7424222,
                            "!": -1.9924222
                        },
                        {
                            "ĠMy": -1.4664392,
                            "ĠI": -0.46643928
                        }
                    ],
                    "text_offset": [
                        0,
                        2,
                        4,
                        7,
                        12,
                        16,
                        19,
                        24,
                        25
                    ]
                },
                "finish_reason": null,
                "stop_reason": null
            }
        ],
        "usage": null,
        "detections": {
            "output": [
                {
                    "choice_index": 0,
                    "results": []
                }
            ]
        }
    },
    {
        "id": "cmpl-d4658ffc982d476cbd5dc792a362d3d5",
        "object": "text_completion",
        "created": 1754423257,
        "model": "my-model",
        "choices": [
            {
                "index": 0,
                "text": " I'm a writer, a mom,",
                "logprobs": {
                    "tokens": [
                        "'m",
                        "Ġa",
                        "Ġwriter",
                        ",",
                        "Ġa",
                        "Ġmom",
                        ","
                    ],
                    "token_logprobs": [
                        -0.07426807,
                        -0.63765574,
                        -1.886275,
                        -0.1680328,
                        -0.80962294,
                        -1.4887499,
                        -0.09533816
                    ],
                    "top_logprobs": [
                        {
                            "'m": -0.07426807,
                            "Ġhope": -4.199268
                        },
                        {
                            "ĠRachel": -2.6376557,
                            "Ġa": -0.63765574
                        },
                        {
                            "Ġfreelance": -2.386275,
                            "Ġwriter": -1.886275
                        },
                        {
                            "Ġand": -1.9180328,
                            ",": -0.1680328
                        },
                        {
                            "Ġa": -0.80962294,
                            "Ġeditor": -1.684623
                        },
                        {
                            "Ġmom": -1.4887499,
                            "Ġreader": -1.9887499
                        },
                        {
                            "Ġof": -2.470338,
                            ",": -0.09533816
                        }
                    ],
                    "text_offset": [
                        27,
                        29,
                        31,
                        38,
                        39,
                        41,
                        45
                    ]
                },
                "finish_reason": "length",
                "stop_reason": null
            }
        ],
        "usage": null,
        "detections": {
            "output": [
                {
                    "choice_index": 0,
                    "results": []
                }
            ]
        }
    },
    {
        "id": "cmpl-d4658ffc982d476cbd5dc792a362d3d5",
        "object": "",
        "created": 1754423257,
        "model": "my-model",
        "choices": [],
        "usage": null,
        "detections": {
            "output": [
                {
                    "choice_index": 0,
                    "results": [
                        {
                            "start": 0,
                            "end": 46,
                            "text": " I'm so glad you're here. I'm a writer, a mom,",
                            "detection": "No",
                            "detection_type": "risk",
                            "detector_id": "detector-b",
                            "score": 0.016914913430809975,
                            "metadata": {
                                "confidence": "High"
                            }
                        }
                    ]
                }
            ]
        },
        "warnings": [
            {
                "type": "UNSUITABLE_OUTPUT",
                "message": "Unsuitable output detected."
            }
        ]
    }
]
```
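
A quick way to confirm the mismatch is to check that each message's `logprobs.tokens` spell out exactly that message's `text`. The sketch below is only illustrative: `decode_tokens` and `check_chunk` are hypothetical helper names, and it assumes byte-level BPE tokens in which `Ġ` stands for a leading space (other marker characters are not handled).

```python
def decode_tokens(tokens):
    # Assumption: byte-level BPE tokens where "Ġ" marks a leading space.
    return "".join(t.replace("Ġ", " ") for t in tokens)


def check_chunk(text, tokens):
    """Return True when the chunk's logprobs tokens spell out exactly its text."""
    return decode_tokens(tokens) == text


# Values copied from the two streamed messages above.
first_text = " I'm so glad you're here."
first_tokens = ["ĠI", "'m", "Ġso", "Ġglad", "Ġyou", "'re", "Ġhere", ".", "ĠI"]

second_text = " I'm a writer, a mom,"
second_tokens = ["'m", "Ġa", "Ġwriter", ",", "Ġa", "Ġmom", ","]

# First message carries a trailing "ĠI" that belongs to the next chunk.
print(check_chunk(first_text, first_tokens))    # False
# Second message is missing that leading "ĠI".
print(check_chunk(second_text, second_tokens))  # False
# Taken together, the tokens are complete; only the split point is wrong.
print(check_chunk(first_text + second_text, first_tokens + second_tokens))  # True
```

Since the concatenated tokens match the concatenated text, nothing is lost or duplicated; the token is only attached to the wrong message.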

The batching logic needs to be updated so that each streamed message carries only the logprobs for the tokens in its own `text`.
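
One possible direction, sketched here under assumptions rather than taken from the actual implementation, is to assign each logprobs entry to a chunk by its `text_offset` instead of by position in the batch. The function name `regroup_by_offset` and the `(start, end)` chunk spans are hypothetical; the spans are derived from the lengths of the two `text` values in the response (25 and 21 characters).

```python
def regroup_by_offset(tokens, offsets, chunk_spans):
    """Assign each token to the chunk whose [start, end) span contains its offset."""
    groups = [[] for _ in chunk_spans]
    for token, offset in zip(tokens, offsets):
        for i, (start, end) in enumerate(chunk_spans):
            if start <= offset < end:
                groups[i].append(token)
                break
    return groups


# All tokens and text offsets from both messages, in stream order.
tokens = ["ĠI", "'m", "Ġso", "Ġglad", "Ġyou", "'re", "Ġhere", ".", "ĠI",
          "'m", "Ġa", "Ġwriter", ",", "Ġa", "Ġmom", ","]
offsets = [0, 2, 4, 7, 12, 16, 19, 24, 25,
           27, 29, 31, 38, 39, 41, 45]

# Hypothetical chunk boundaries: first sentence is 25 characters, full output is 46.
chunk_spans = [(0, 25), (25, 46)]

groups = regroup_by_offset(tokens, offsets, chunk_spans)
print(groups[0][-1])  # "."
print(groups[1][0])   # "ĠI"
```

With this grouping the token at offset 25 lands in the second chunk, which matches where its text appears.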

Off-by-one error on completions stream with logprobs when prompt is "Hi there!" #477