Checked other resources
- This is a bug, not a usage question. For questions, please use the LangChain Forum (https://forum.langchain.com/).
- I added a clear and descriptive title that summarizes this issue.
- I used the GitHub search to find a similar question and didn't find it.
- I am sure that this is a bug in LangChain rather than my code.
- The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
- I read what a minimal reproducible example is (https://stackoverflow.com/help/minimal-reproducible-example).
- I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.
Example Code
from langchain_qdrant import QdrantVectorStore, RetrievalMode, FastEmbedSparse
from langchain_ollama import OllamaEmbeddings
from langchain.retrievers import ContextualCompressionRetriever
from langchain_community.cross_encoders import HuggingFaceCrossEncoder
from langchain.retrievers.document_compressors import CrossEncoderReranker

embed_model = OllamaEmbeddings(model="Qwen3")
sparse_embeddings = FastEmbedSparse(model_name="Qdrant/bm25")

# Any dummy vector store will work
vector_store = QdrantVectorStore.from_existing_collection(
    collection_name='collection',
    embedding=embed_model,
    sparse_embedding=sparse_embeddings,
    path='qdrantDB',
    vector_name='dense',
    sparse_vector_name='bm25',
)

reranker = HuggingFaceCrossEncoder(model_name="Qwen/Qwen3-Reranker-0.6B")
compressor = CrossEncoderReranker(model=reranker, top_n=3)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=vector_store.as_retriever(k=10)
)

context = compression_retriever.invoke('any query text to invoke it')
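For what it's worth, the Qdrant store isn't actually needed to reproduce this; the failure happens entirely inside HuggingFaceCrossEncoder.score. A minimal sketch (assuming the same package versions listed under System Info) that should raise the identical ValueError:

from langchain_community.cross_encoders import HuggingFaceCrossEncoder

reranker = HuggingFaceCrossEncoder(model_name="Qwen/Qwen3-Reranker-0.6B")

# Any batch with more than one (query, document) pair reaches
# GenericForSequenceClassification.forward with batch_size > 1, which raises
# because the Qwen3 model config defines no pad_token_id.
reranker.score([
    ("any query text", "first dummy document"),
    ("any query text", "second dummy document"),
])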
Error Message and Stack Trace (if applicable)
ValueError Traceback (most recent call last)
Cell In[7], line 7
----> 7 context = compression_retriever.invoke('any query text to invoke it')
File virtualenvs\GenAI\Lib\site-packages\langchain_core\retrievers.py:261, in BaseRetriever.invoke(self, input, config, **kwargs)
259 kwargs_ = kwargs if self._expects_other_args else {}
260 if self._new_arg_supported:
--> 261 result = self.get_relevant_documents(
262 input, run_manager=run_manager, **kwargs
263 )
264 else:
265 result = self.get_relevant_documents(input, **kwargs)
File virtualenvs\GenAI\Lib\site-packages\langchain\retrievers\contextual_compression.py:46, in ContextualCompressionRetriever._get_relevant_documents(self, query, run_manager, **kwargs)
40 docs = self.base_retriever.invoke(
41 query,
42 config={"callbacks": run_manager.get_child()},
43 **kwargs,
44 )
45 if docs:
---> 46 compressed_docs = self.base_compressor.compress_documents(
47 docs,
48 query,
49 callbacks=run_manager.get_child(),
50 )
51 return list(compressed_docs)
52 return []
File virtualenvs\GenAI\Lib\site-packages\langchain\retrievers\document_compressors\cross_encoder_rerank.py:45, in CrossEncoderReranker.compress_documents(self, documents, query, callbacks)
28 def compress_documents(
29 self,
30 documents: Sequence[Document],
31 query: str,
32 callbacks: Optional[Callbacks] = None,
33 ) -> Sequence[Document]:
34 """
35 Rerank documents using CrossEncoder.
36
(...) 43 A sequence of compressed documents.
44 """
---> 45 scores = self.model.score([(query, doc.page_content) for doc in documents])
46 docs_with_scores = list(zip(documents, scores))
47 result = sorted(docs_with_scores, key=operator.itemgetter(1), reverse=True)
File virtualenvs\GenAI\Lib\site-packages\langchain_community\cross_encoders\huggingface.py:59, in HuggingFaceCrossEncoder.score(self, text_pairs)
50 def score(self, text_pairs: List[Tuple[str, str]]) -> List[float]:
51 """Compute similarity scores using a HuggingFace transformer model.
52
53 Args:
(...) 57 List of scores, one for each pair.
58 """
---> 59 scores = self.client.predict(text_pairs)
60 # Some models e.g bert-multilingual-passage-reranking-msmarco
61 # gives two score not_relevant and relevant as compare with the query.
62 if len(scores.shape) > 1: # we are going to get the relevant scores
File virtualenvs\GenAI\Lib\site-packages\torch\utils\_contextlib.py:120, in context_decorator.<locals>.decorate_context(*args, **kwargs)
117 @functools.wraps(func)
118 def decorate_context(*args, **kwargs):
119 with ctx_factory():
--> 120 return func(*args, **kwargs)
File virtualenvs\GenAI\Lib\site-packages\sentence_transformers\cross_encoder\util.py:68, in cross_encoder_predict_rank_args_decorator.<locals>.wrapper(self, *args, **kwargs)
63 kwargs.pop(deprecated_arg)
64 logger.warning(
65 f"The CrossEncoder.predict {deprecated_arg}
argument is deprecated and has no effect. It will be removed in a future version."
66 )
---> 68 return func(self, *args, **kwargs)
File virtualenvs\GenAI\Lib\site-packages\sentence_transformers\cross_encoder\CrossEncoder.py:651, in CrossEncoder.predict(self, sentences, batch_size, show_progress_bar, activation_fn, apply_softmax, convert_to_numpy, convert_to_tensor)
644 features = self.tokenizer(
645 batch,
646 padding=True,
647 truncation=True,
648 return_tensors="pt",
649 )
650 features.to(self.model.device)
--> 651 model_predictions = self.model(**features, return_dict=True)
652 logits = self.activation_fn(model_predictions.logits)
654 if apply_softmax and logits.ndim > 1:
File virtualenvs\GenAI\Lib\site-packages\torch\nn\modules\module.py:1773, in Module._wrapped_call_impl(self, *args, **kwargs)
1771 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1772 else:
-> 1773 return self._call_impl(*args, **kwargs)
File virtualenvs\GenAI\Lib\site-packages\torch\nn\modules\module.py:1784, in Module._call_impl(self, *args, **kwargs)
1779 # If we don't have any hooks, we want to skip the rest of the logic in
1780 # this function, and just call forward.
1781 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1782 or _global_backward_pre_hooks or _global_backward_hooks
1783 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1784 return forward_call(*args, **kwargs)
1786 result = None
1787 called_always_called_hooks = set()
File virtualenvs\GenAI\Lib\site-packages\transformers\utils\generic.py:959, in can_return_tuple.<locals>.wrapper(self, *args, **kwargs)
957 if return_dict_passed is not None:
958 return_dict = return_dict_passed
--> 959 output = func(self, *args, **kwargs)
960 if not return_dict and not isinstance(output, tuple):
961 output = output.to_tuple()
File virtualenvs\GenAI\Lib\site-packages\transformers\modeling_layers.py:142, in GenericForSequenceClassification.forward(self, input_ids, attention_mask, position_ids, past_key_values, inputs_embeds, labels, use_cache, **kwargs)
139 batch_size = inputs_embeds.shape[0]
141 if self.config.pad_token_id is None and batch_size != 1:
--> 142 raise ValueError("Cannot handle batch sizes > 1 if no padding token is defined.")
143 if self.config.pad_token_id is None:
144 last_non_pad_token = -1
ValueError: Cannot handle batch sizes > 1 if no padding token is defined.
Description
I'm trying to use Qwen/Qwen3-Reranker-0.6B to rerank the documents retrieved from the vector store, but invoking the compression retriever fails with the ValueError shown above: "Cannot handle batch sizes > 1 if no padding token is defined."
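From the traceback, the tokenizer call inside CrossEncoder.predict succeeds with padding=True, so a pad token is defined on the tokenizer; the check that raises lives in transformers' GenericForSequenceClassification.forward and looks only at the model config's pad_token_id, which Qwen3-Reranker apparently leaves unset. A possible workaround sketch, assuming reranker.client is the sentence_transformers CrossEncoder from the traceback and exposes .model and .tokenizer:

# Hypothetical workaround: copy the tokenizer's pad token id into the model
# config so the batch-size check in GenericForSequenceClassification.forward
# passes. `reranker` is the HuggingFaceCrossEncoder from the example above.
hf_model = reranker.client.model
hf_tokenizer = reranker.client.tokenizer
if hf_model.config.pad_token_id is None:
    hf_model.config.pad_token_id = hf_tokenizer.pad_token_id

context = compression_retriever.invoke('any query text to invoke it')

If that diagnosis is right, it would be nice for HuggingFaceCrossEncoder to handle models whose config omits pad_token_id, so decoder-style rerankers like Qwen3 work out of the box.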
System Info
System Information
OS: Windows
OS Version: 10.0.26100
Python Version: 3.12.11 (main, Jun 12 2025, 12:44:17) [MSC v.1943 64 bit (AMD64)]
Package Information
langchain_core: 0.3.74
langchain: 0.3.27
langchain_community: 0.3.27
langsmith: 0.4.17
langchain_chroma: 0.2.4
langchain_docling: 1.0.0
langchain_experimental: 0.3.4
langchain_huggingface: 0.3.0
langchain_ollama: 0.3.6
langchain_openai: 0.3.25
langchain_pymupdf4llm: 0.4.1
langchain_qdrant: 0.2.0
langchain_tavily: 0.2.11
langchain_text_splitters: 0.3.9
langchain_unstructured: 0.1.6
langgraph_sdk: 0.2.3
langgraph_supervisor: 0.0.29