This repository addresses the challenge of multilingual product search in e-commerce platforms. In today's global marketplace, users search for products using queries in multiple languages, often with mixed scripts, informal language, and domain-specific terminology. Traditional search systems struggle with:
- Multilingual queries: Users search in their native language while product information may be in a different language
- Code-mixing: Queries mixing multiple languages (e.g., "smartphone màu đỏ" - Vietnamese + English)
- Informal language: Colloquial terms, abbreviations, and typos common in search queries
- Relevance matching: Determining if a query matches relevant product categories or specific items
This system tackles two core problems:
1. Query-Category (QC) Classification: Determine whether a search query is relevant to a specific product category
   - Input: Search query + product category
   - Output: Relevance score (0-1)
   - Example: Query "smartphone" → Category "Electronics/Mobile Phones" → High relevance
2. Query-Item (QI) Classification: Determine whether a search query matches a specific product
   - Input: Search query + product title/description
   - Output: Relevance score (0-1)
   - Example: Query "red iPhone" → Product "Apple iPhone 14 Red 128GB" → High relevance
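To make the shared interface concrete, here is a minimal sketch of the I/O both tasks use (the names and types below are illustrative, not the repo's actual API):

```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class RelevancePair:
    """One (query, target) pair; illustrative, not the repo's API."""
    query: str                 # raw user query, in any language
    target: str                # category path (QC) or product title/description (QI)
    task: Literal["QC", "QI"]

def is_relevant(score: float, threshold: float = 0.5) -> bool:
    # Scores lie in [0, 1]; 0.5 is an illustrative cutoff, not a tuned value
    return score >= threshold
```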
- Multilingual Support: Handles queries in multiple languages simultaneously
- State-of-the-art LLMs: Fine-tuned Gemma3-12B and Qwen models
- Efficient Training: LoRA fine-tuning with DeepSpeed for memory optimization
- Production Ready: Optimized inference pipeline for real-time applications
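For orientation, here is a minimal sketch of a LoRA setup with Hugging Face `peft`; the rank, alpha, and target modules are illustrative defaults rather than the repo's exact training config, and the loader class may differ for multimodal Gemma 3 checkpoints:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Illustrative hyperparameters; the repo's actual values live in its training configs
lora_config = LoraConfig(
    r=16,                      # low-rank update dimension
    lora_alpha=32,             # scaling factor for the LoRA updates
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Assumes the checkpoint loads as a causal LM; adapt the Auto class if needed
base = AutoModelForCausalLM.from_pretrained("models/gemma-3-12b-pt")
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```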
Our models achieve state-of-the-art performance on multilingual e-commerce search (unseen records, unseen languages) and took 1st place in the CIKM 2025 Multilingual E-commerce Product Search Competition.
| Task | Model | Dev F1-Score | Test F1-Score | Languages Tested |
|---|---|---|---|---|
| QC | Gemma3-12B | 89.56% | 89.65% | EN, FR, ES, KO, PT, JA, DE, IT, PL, AR |
| QI | Gemma3-12B | 88.90% | 88.97% | EN, FR, ES, KO, PT, JA, DE, IT, PL, AR, TH, VN, ID |
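The F1 scores above are computed over binarized relevance decisions. A minimal sketch of that metric, assuming an illustrative 0.5 threshold (see the technical report for the exact evaluation protocol):

```python
from sklearn.metrics import f1_score

# Hypothetical gold labels and model scores, for illustration only
y_true = [1, 0, 1, 1, 0]
y_scores = [0.93, 0.12, 0.78, 0.41, 0.55]

# Binarize at an illustrative 0.5 cutoff, then compute F1
y_pred = [int(s >= 0.5) for s in y_scores]
print(f"F1 = {f1_score(y_true, y_pred):.4f}")
```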
- Multilingual Search: Enable users to search in their preferred language
- Cross-language Matching: Match English product descriptions with local language queries
- Query Understanding: Better interpret user intent from informal search terms
- Category Suggestion: Recommend relevant categories based on user queries
- Product Ranking: Improve product ranking by better query-item relevance scoring (see the sketch after this list)
- Personalization: Adapt search results based on user's language preferences
- Search Analytics: Analyze search patterns across different languages
- Content Optimization: Identify gaps in multilingual product information
- Market Expansion: Understand demand in different linguistic markets
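For instance, the product-ranking use case reduces to scoring each (query, product) pair and sorting. A minimal sketch using the `batch_predict` helper introduced in the quickstart below, assuming it accepts QI pairs the same way it accepts QC pairs:

```python
from quickstart import batch_predict

query = "red iPhone 128GB"
candidates = [
    "Apple iPhone 14 Pro Red 128GB Unlocked",
    "Samsung Galaxy S23 Red 128GB",
    "Apple iPhone 14 Silicone Case Red",
]

# Score every (query, product) pair, then sort products by relevance
scores = batch_predict(
    "models/best-gemma-3-QI-stage-02",
    [query] * len(candidates), candidates, task="QI",
)
ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
for title, s in ranked:
    print(f"{s:.3f}  {title}")
```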
```bash
# Install uv package manager
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone repository
git clone https://github.com/nhtlongcs/e-commerce-product-search.git
cd e-commerce-product-search

# Setup environment
uv sync
source .venv/bin/activate
```
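Optionally, a quick sanity check (our addition, not a repo script) that the environment sees a CUDA GPU:

```python
# Environment sanity check: confirm PyTorch and CUDA visibility
import torch

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```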
- Download our final Gemma3-12B checkpoints from Google Drive and unzip them into `./models`. The folder should then contain the following model paths:

```
./models/gemma-3-12b-pt
./models/best-gemma-3-QC-stage-02
./models/best-gemma-3-QI-stage-02
```
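A small sketch (our addition) to verify the checkpoints unpacked to the expected locations before running the quickstart:

```python
# Verify checkpoint folders exist before running the quickstart
from pathlib import Path

expected = [
    "models/gemma-3-12b-pt",
    "models/best-gemma-3-QC-stage-02",
    "models/best-gemma-3-QI-stage-02",
]
for path in expected:
    status = "ok" if Path(path).is_dir() else "MISSING"
    print(f"{status:7s} {path}")
```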
```python
from quickstart import predict_relevance

# Vietnamese query - automatically translated
score = predict_relevance(
    "models/best-gemma-3-QC-stage-02",
    "điện thoại thông minh",  # Vietnamese for "smartphone"
    "Electronics > Mobile Phones",
    task="QC"
)
print(f"Relevance: {score:.3f}")
# Output: Relevance: 0.997
```
```python
from quickstart import predict_relevance

# Direct prediction with model path
query = "red iPhone 128GB"
product = "Apple iPhone 14 Pro Red 128GB Unlocked"
relevance_score = predict_relevance(
    "models/best-gemma-3-QI-stage-02",
    query, product, task="QI"
)
print(f"Relevance: {relevance_score:.3f}")
# Output: Relevance: 0.956
```
```python
from quickstart import batch_predict
import pandas as pd

# Mixed-language queries (Japanese, Vietnamese, English)
queries = ["スマートフォン", "điện thoại", "laptop gaming"]
categories = ["Electronics > Phones", "Electronics > Phones", "Computers > Laptops"]

# Batch prediction with automatic translation
scores = batch_predict(
    "models/best-gemma-3-QC-stage-02",
    queries, categories, task="QC"
)

# Create results dataframe
results = [
    {"query": q, "category": c, "score": s}
    for q, c, s in zip(queries, categories, scores)
]
df = pd.DataFrame(results)
print(df)
# Output:
#            query              category  score
# 0  スマートフォン  Electronics > Phones  0.995
# 1     điện thoại  Electronics > Phones  0.998
# 2  laptop gaming   Computers > Laptops  0.975
```
Our algorithm requires queries to be translated to English for best performance (see our technical report for details). For performance-critical applications, you can pre-translate queries once and reuse them across multiple predictions:
```python
from quickstart import translate_queries, predict_relevance_pretranslated, load_model

# Pre-translate queries once for multiple predictions
queries = ["điện thoại", "máy tính", "áo thun"]
translated = translate_queries(queries)

print("Translation results:")
for orig, trans in zip(queries, translated):
    print(f"'{orig}' -> '{trans}'")
# Output:
# 'điện thoại' -> 'phone'
# 'máy tính' -> 'computer'
# 'áo thun' -> 't-shirt'

# Load model once for multiple predictions
model, tokenizer = load_model("models/best-gemma-3-QC-stage-02")
targets = ["Electronics > Phones", "Computers > Laptops", "Fashion > Clothing"]

for orig, trans, target in zip(queries, translated, targets):
    score = predict_relevance_pretranslated(
        (model, tokenizer), orig, trans, target, task="QC"
    )
    print(f"'{orig}' -> '{target}': {score:.3f}")
# Output:
# 'điện thoại' -> 'Electronics > Phones': 0.998
# 'máy tính' -> 'Computers > Laptops': 0.987
# 'áo thun' -> 'Fashion > Clothing': 0.975
```

```python
# Standalone translation
from quickstart import translate_queries

translated = translate_queries(["điện thoại", "スマートフォン", "手机"])
# Output: ['phone', 'smartphone', 'mobile phone']
```
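If the same queries recur across requests, the translation step can be memoized so each distinct query is translated only once. A minimal sketch; the cache layer is our illustration, not part of `quickstart`:

```python
from quickstart import translate_queries

# Simple in-process cache so each distinct query is translated only once
_translation_cache = {}

def translate_cached(queries):
    # Translate only queries we haven't seen before, then serve from the cache
    missing = [q for q in queries if q not in _translation_cache]
    if missing:
        for q, t in zip(missing, translate_queries(missing)):
            _translation_cache[q] = t
    return [_translation_cache[q] for q in queries]

# Repeated queries are served from the cache instead of re-translating
print(translate_cached(["điện thoại", "điện thoại", "máy tính"]))
```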
- Python 3.8+
- CUDA-compatible GPU (recommended: 4x 80GB+ for training)
- 32GB+ RAM for inference
- Linux
- PyTorch 2.0+
- Transformers 4.30+
- DeepSpeed (for distributed training)
- UV package manager
| Task | RAM | GPU Memory | GPUs | Training Time |
|---|---|---|---|---|
| Inference | 32GB | 32GB | 1 | - |
| Fine-tuning | 64GB | 80GB | 4 | 8-12 hours |
To train your own model, prepare your dataset in the same format as the provided data (`data/raw/`). Then run data preprocessing, followed by model training. For detailed steps, refer to `REPRODUCE.md`.
This work achieved 1st place in the CIKM 2025 Multilingual E-commerce Product Search Competition.
Team: DcuRAGONS - Dublin City University, Ireland
Members:
- Thang-Long Nguyen Ho: thanglong.nguyenho27@mail.dcu.ie
- Hoang-Bao Le: bao.le2@mail.dcu.ie
- Minh-Khoi Pham: minhkhoi.pham4@mail.dcu.ie
Technical Report: Available in the `report/` directory
**Port Already in Use**

```bash
# Change master port in training scripts
export MASTER_PORT=29501
```

**Model Loading Error**

```bash
# Ensure model paths contain "gemma-3" for proper loading
mv models/my-model models/gemma-3-my-model
```
- Alibaba AIDC for the competition dataset
- Dublin City University for computational resources
- The open-source community for tools and libraries used