feat:support for two long-context embedding models (Qwen3 and Gemma) #453

OneZero-Y · 2025-10-16T12:25:52Z

What type of PR is this?
Feature: adds support for two long-context embedding models (Qwen3-Embedding-0.6B and EmbeddingGemma-300M).

What this PR does / why we need it:

1. Qwen3-Embedding-0.6B

Model Specifications:

Architecture: 28 layers, 1024 hidden size, 16 attention heads, 8 KV heads (GQA)
Context Length: 32,768 tokens
Embedding Dimensions: [1024, 512, 256, 128] (Matryoshka)
Pooling Strategy: Last-token pooling with L2 normalization
Activation: SwiGLU
Position Encoding: RoPE (θ=1,000,000)
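To make the GQA figures above concrete, here is a minimal sketch of how query heads can share KV heads. The function names are hypothetical, and the contiguous grouping (query head `i` uses KV head `i // group_size`) is a common convention that I am assuming here; the PR text does not confirm the exact layout used in the Rust code.

```python
def kv_head_for_query_head(query_head, num_query_heads, num_kv_heads):
    """Return the KV head index that a given query head reads from.

    Assumes contiguous grouping, which is an illustrative convention,
    not something stated in this PR.
    """
    if num_query_heads % num_kv_heads != 0:
        raise ValueError("query heads must be a multiple of KV heads")
    if not 0 <= query_head < num_query_heads:
        raise ValueError("query head index out of range")
    group_size = num_query_heads // num_kv_heads
    return query_head // group_size


def kv_cache_reduction(num_query_heads, num_kv_heads):
    """Factor by which the KV cache shrinks compared with one KV head per query head."""
    return num_query_heads / num_kv_heads


# Qwen3-Embedding-0.6B: 16 query heads, 8 KV heads, so pairs of query heads share a KV head.
qwen3_mapping = [kv_head_for_query_head(q, 16, 8) for q in range(16)]

# EmbeddingGemma-300M: 3 query heads, 1 KV head (MQA), so every query head uses KV head 0.
gemma_mapping = [kv_head_for_query_head(q, 3, 1) for q in range(3)]
```

With these head counts the KV cache is 2x smaller for Qwen3 and 3x smaller for Gemma than it would be with one KV head per query head, which is where the long-context and latency benefits come from.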

Key Features:

Grouped Query Attention (GQA) for efficient long-context processing
Left-padding strategy optimized for last-token pooling
High-quality embeddings for retrieval and semantic search
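The pairing of left padding with last-token pooling can be sketched as follows. This is an illustration in plain Python with hypothetical helper names and a hypothetical `pad_id` of 0, not the candle implementation from this PR. With left padding the final position always holds a real token, so pooling reduces to taking the last hidden state.

```python
import math


def left_pad(token_ids, max_len, pad_id=0):
    """Pad on the left and return (padded_ids, attention_mask)."""
    if len(token_ids) > max_len:
        raise ValueError("sequence longer than max_len")
    pad = max_len - len(token_ids)
    return [pad_id] * pad + list(token_ids), [0] * pad + [1] * len(token_ids)


def last_token_pool(hidden_states, attention_mask):
    """Pick the hidden state of the last real (unmasked) token."""
    if attention_mask[-1] == 1:
        # Left-padded input: the last position is always a real token.
        return hidden_states[-1]
    # Fallback for right-padded input: find the last unmasked position.
    last = max(i for i, m in enumerate(attention_mask) if m == 1)
    return hidden_states[last]


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


ids, mask = left_pad([5, 6], max_len=4)
states = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]  # toy per-token hidden states
embedding = l2_normalize(last_token_pool(states, mask))
```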

2. EmbeddingGemma-300M

Model Specifications:

Architecture: 14 layers, 768 hidden size, 3 query heads, 1 KV head (MQA)
Context Length: 8,192 tokens
Embedding Dimensions: [768, 512, 256, 128] (Matryoshka)
Pooling Strategy: Mean pooling with L2 normalization
Dense Bottleneck: 768 → 3072 → 768 for quality enhancement
Activation: GeGLU
Attention: Hybrid sliding window (4K) + full attention
Position Encoding: Global RoPE (θ=1,000,000) + Local RoPE (θ=10,000)
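Both models list Matryoshka embedding dimensions. A common way to use them, shown here as a sketch rather than as the code in this PR, is to keep the leading components of the full vector and re-apply L2 normalization. The function name `truncate_embedding` is hypothetical.

```python
import math


def truncate_embedding(embedding, target_dim):
    """Keep the first target_dim components and renormalize to unit length."""
    if not 0 < target_dim <= len(embedding):
        raise ValueError("target_dim must be between 1 and the full dimension")
    head = list(embedding[:target_dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # Nothing to scale; return the zero vector unchanged.
        return head
    return [x / norm for x in head]


full = [3.0, 4.0, 12.0]             # toy 3-dim vector standing in for a 768- or 1024-dim one
small = truncate_embedding(full, 2)  # keeps [3.0, 4.0], then renormalizes
```

Renormalizing after truncation keeps cosine similarity equal to a plain dot product on the shortened vectors.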

Key Features:

Multi-Query Attention (MQA) for low-latency inference
Dense bottleneck layer for improved embedding quality
Right-padding strategy optimized for mean pooling
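Mean pooling, as used for EmbeddingGemma, averages only the hidden states of real tokens, so the attention mask must exclude the padding positions. The sketch below uses plain Python lists and hypothetical function names; it illustrates the technique and is not the Rust code from this PR.

```python
import math


def mean_pool(hidden_states, attention_mask):
    """Average the hidden states whose mask entry is 1."""
    dim = len(hidden_states[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(hidden_states, attention_mask):
        if keep:
            count += 1
            for i, x in enumerate(vec):
                total[i] += x
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [t / count for t in total]


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


# Right-padded toy sequence: two real tokens followed by one padding position.
states = [[1.0, 2.0], [5.0, 6.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(states, mask)
embedding = l2_normalize(pooled)
```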

3. Intelligent Routing System

Routing Logic:

| Condition | Selected Model | Rationale |
| --- | --- | --- |
| `quality_priority > 0.7` | Qwen3 | Higher quality (1024-dim, last-token pooling) |
| `latency_priority > 0.7` | Gemma | Lower latency |
| Balanced (`latency ≤ 0.7`) | Qwen3 | Default to quality |
| `512 < seq_len ≤ 4096` + Balanced | Gemma | Optimized for medium sequences |
| `seq_len > 4096` | Qwen3 | Long-context advantage (32K max) |

Features:

Sequence length awareness (0-512, 513-4096, 4097+)
Priority-based selection (quality vs latency trade-off)
Target dimension validation (supports Matryoshka truncation)
Fallback to quality-optimized model by default
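One possible reading of the routing table is sketched below. The function name `select_model` and the default priority values are hypothetical, and the rule precedence is my assumption: the table does not say which row wins when several match, so here the priority rows are checked first and the medium-sequence rule is checked before the balanced default (otherwise it could never fire). The actual logic lives in `candle-binding/src/model_architectures/routing.rs` and may differ.

```python
def select_model(seq_len, quality_priority=0.5, latency_priority=0.5):
    """Return "qwen3" or "gemma" following an assumed ordering of the table rows."""
    if seq_len < 0:
        raise ValueError("seq_len must be non-negative")
    if quality_priority > 0.7:
        return "qwen3"   # explicit quality preference
    if latency_priority > 0.7:
        return "gemma"   # explicit latency preference
    # Balanced case from here on.
    if 512 < seq_len <= 4096:
        return "gemma"   # medium-length sequences
    # Short sequences default to quality; long sequences need the 32K context.
    return "qwen3"


choices = {
    "quality": select_model(100, quality_priority=0.9, latency_priority=0.1),
    "latency": select_model(100, quality_priority=0.1, latency_priority=0.9),
    "balanced_short": select_model(100),
    "balanced_medium": select_model(1000),
    "balanced_long": select_model(5000),
}
```

Note that this sketch does not guard against a latency-priority request longer than Gemma's 8,192-token context; a real router would need to handle that case.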

4. Enhanced API Endpoints

4.1 Embedding Generation API

Endpoint: POST /api/v1/embeddings
Request:

curl -X POST http://localhost:8080/api/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "texts": ["Hello world", "How are you?"],
    "model": "auto",
    "dimension": 768,
    "quality_priority": 0.5,
    "latency_priority": 0.3
  }'

Response (embedding arrays abbreviated; each holds 768 values, per the `dimension` field):

{"embeddings":[{"text":"Hello world","embedding":[-0.0011776701,0.007934736,-0.011614905,...,-0.05291444],"dimension":768,"model_used":"qwen3","processing_time_ms":314},{"text":"How are you?","embedding":[-0.004742121,-0.028285975,-0.0072721327,...,-0.032540787],"dimension":768,"model_used":"qwen3","processing_time_ms":262}],"total_count":2,"total_processing_time_ms":576,"avg_processing_time_ms":288}
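For callers that prefer Python over curl, the request above can be built with the standard library alone. This is a sketch: the helper names are hypothetical, the field names are copied from the request and response shown above, and the base URL is the one from the curl example. `fetch_embeddings` needs a running server; the other two functions do not.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # host and port taken from the curl example


def build_embeddings_request(texts, model="auto", dimension=768,
                             quality_priority=0.5, latency_priority=0.3):
    """Build (but do not send) a POST request for /api/v1/embeddings."""
    payload = {
        "texts": list(texts),
        "model": model,
        "dimension": dimension,
        "quality_priority": quality_priority,
        "latency_priority": latency_priority,
    }
    return urllib.request.Request(
        BASE_URL + "/api/v1/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def check_embeddings_response(body):
    """Check vector lengths against the reported dimensions and return the vectors."""
    vectors = []
    for item in body["embeddings"]:
        if len(item["embedding"]) != item["dimension"]:
            raise ValueError("embedding length does not match 'dimension'")
        vectors.append(item["embedding"])
    if len(vectors) != body["total_count"]:
        raise ValueError("'total_count' does not match the number of embeddings")
    return vectors


def fetch_embeddings(texts, **kwargs):
    """Send the request to a running server and return the validated vectors."""
    with urllib.request.urlopen(build_embeddings_request(texts, **kwargs)) as resp:
        return check_embeddings_response(json.load(resp))
```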

4.2 Cosine Similarity Calculation API

Endpoint: POST /api/v1/similarity

Request:

curl -X POST http://localhost:8080/api/v1/similarity \
  -H "Content-Type: application/json" \
  -d '{
    "text1": "Hello world",
    "text2": "Hi there",
    "model": "auto",
    "dimension": 768
  }'

Response:

{"model_used":"qwen3","similarity":0.74183154,"processing_time_ms":30.0131}
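The `similarity` value is a cosine similarity between the two embeddings. As a reference for what that means, here is a minimal sketch in plain Python; the function name is hypothetical and this is not the code behind the endpoint. For L2-normalized embeddings, which both models produce according to the specifications above, the result equals the dot product.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


same = cosine_similarity([1.0, 0.0], [2.0, 0.0])        # parallel vectors
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])  # perpendicular vectors
opposite = cosine_similarity([1.0, 1.0], [-1.0, -1.0])  # opposite directions
```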

4.3 Batch Similarity Matching API

Endpoint: POST /api/v1/similarity/batch

Request:

curl -X POST http://localhost:8080/api/v1/similarity/batch \
  -H "Content-Type: application/json" \
  -d '{
    "query": "machine learning",
    "candidates": ["artificial intelligence", "cooking recipes", "deep learning", "gardening tips"],
    "top_k": 2,
    "model": "auto",
    "dimension": 768
  }'

Response:

{"matches":[{"index":2,"similarity":0.9605684,"text":"deep learning"},{"index":0,"similarity":0.9055326,"text":"artificial intelligence"}],"total_candidates":4,"model_used":"gemma","processing_time_ms":356.3315}
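The shape of the batch response (original `index`, `similarity`, `text`, sorted best first and cut to `top_k`) can be reproduced with a short ranking routine. The sketch below assumes the embeddings are already computed and uses hypothetical function names and toy 2-dimensional vectors; it illustrates the ranking step only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def top_k_matches(query_vec, candidate_vecs, candidate_texts, top_k):
    """Score every candidate against the query and keep the top_k best."""
    scored = [
        {"index": i, "similarity": cosine_similarity(query_vec, vec), "text": text}
        for i, (vec, text) in enumerate(zip(candidate_vecs, candidate_texts))
    ]
    # Highest similarity first; 'index' preserves the position in the request.
    scored.sort(key=lambda match: match["similarity"], reverse=True)
    return scored[:top_k]


matches = top_k_matches(
    query_vec=[1.0, 0.0],
    candidate_vecs=[[0.9, 0.1], [0.0, 1.0], [1.0, 0.0]],
    candidate_texts=["a", "b", "c"],
    top_k=2,
)
```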

4.4 Embedding Models Information API

Endpoint: GET /api/v1/embeddings/models

Request:

curl -X GET http://localhost:8080/api/v1/embeddings/models

Response:

{"count":2,"models":[{"name":"qwen3_embedding_model","type":"embedding","loaded":true,"model_path":"models/Qwen3-Embedding-0.6B","metadata":{"default_dimension":"1024","matryoshka_supported":"true","max_sequence_length":"32768","model_type":"qwen3"}},{"name":"gemma_embedding_model","type":"embedding","loaded":true,"model_path":"models/embeddinggemma-300m","metadata":{"default_dimension":"768","matryoshka_supported":"true","max_sequence_length":"8192","model_type":"gemma"}}]}
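In the response above every `metadata` value is a JSON string, including the numbers and the boolean. A client that wants typed values has to convert them. The sketch below shows one way to do that; the function name and the output layout are hypothetical.

```python
import json


def parse_model_entry(entry):
    """Convert one item of the 'models' array into typed Python values."""
    meta = entry.get("metadata", {})
    return {
        "name": entry["name"],
        "loaded": bool(entry.get("loaded", False)),
        "model_type": meta.get("model_type"),
        # Numeric fields arrive as strings such as "1024".
        "default_dimension": int(meta["default_dimension"]),
        "max_sequence_length": int(meta["max_sequence_length"]),
        # The boolean arrives as the string "true" or "false".
        "matryoshka_supported": meta.get("matryoshka_supported", "false").lower() == "true",
    }


raw = (
    '{"name":"qwen3_embedding_model","type":"embedding","loaded":true,'
    '"model_path":"models/Qwen3-Embedding-0.6B","metadata":{"default_dimension":"1024",'
    '"matryoshka_supported":"true","max_sequence_length":"32768","model_type":"qwen3"}}'
)
info = parse_model_entry(json.loads(raw))
```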

Configuration

config.yaml

semantic_router:
  models:
    qwen3_embedding:
      path: "models/Qwen3-Embedding-0.6B"
    gemma_embedding:
      path: "models/embeddinggemma-300m"

Which issue(s) this PR fixes:

part of #266

Release Notes: Yes/No

feat:support for two long-context embedding models (Qwen3-Embedding-0.6B and EmbeddingGemma-300M) Signed-off-by: OneZero-Y <aukovyps@163.com>

github-actions · 2025-10-16T12:26:10Z

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 `candle-binding`

Owners: @rootfs
Files changed:

candle-binding/src/ffi/embedding.rs
candle-binding/src/ffi/embedding_test.rs
candle-binding/src/model_architectures/embedding/dense_layers.rs
candle-binding/src/model_architectures/embedding/dense_layers_test.rs
candle-binding/src/model_architectures/embedding/gemma3_model.rs
candle-binding/src/model_architectures/embedding/gemma3_model_test.rs
candle-binding/src/model_architectures/embedding/gemma_embedding.rs
candle-binding/src/model_architectures/embedding/gemma_embedding_test.rs
candle-binding/src/model_architectures/embedding/mod.rs
candle-binding/src/model_architectures/embedding/pooling.rs
candle-binding/src/model_architectures/embedding/pooling_test.rs
candle-binding/src/model_architectures/embedding/qwen3_embedding.rs
candle-binding/src/model_architectures/embedding/qwen3_embedding_test.rs
candle-binding/test_data/gemma_reference_outputs.json
candle-binding/test_data/qwen3_reference_outputs.json
candle-binding/Cargo.toml
candle-binding/semantic-router.go
candle-binding/semantic-router_test.go
candle-binding/src/classifiers/lora/mod.rs
candle-binding/src/classifiers/mod.rs
candle-binding/src/classifiers/traditional/mod.rs
candle-binding/src/classifiers/unified.rs
candle-binding/src/classifiers/unified_test.rs
candle-binding/src/core/config_loader.rs
candle-binding/src/core/mod.rs
candle-binding/src/core/tokenization.rs
candle-binding/src/core/unified_error.rs
candle-binding/src/ffi/mod.rs
candle-binding/src/ffi/similarity.rs
candle-binding/src/ffi/types.rs
candle-binding/src/model_architectures/config.rs
candle-binding/src/model_architectures/mod.rs
candle-binding/src/model_architectures/model_factory.rs
candle-binding/src/model_architectures/routing.rs
candle-binding/src/model_architectures/traditional/modernbert.rs
candle-binding/src/model_architectures/traits.rs
candle-binding/src/test_fixtures.rs
candle-binding/src/utils/memory.rs

📁 `Root Directory`

Owners: @rootfs, @Xunzhuo
Files changed:

scripts/generate_gemma_reference.py
scripts/generate_qwen3_reference.py

📁 `config`

Owners: @rootfs
Files changed:

config/config.yaml

📁 `src`

Owners: @rootfs, @Xunzhuo, @wangchen615
Files changed:

src/semantic-router/cmd/main.go
src/semantic-router/pkg/api/server.go
src/semantic-router/pkg/config/config.go

📁 `tools`

Owners: @yuluo-yx, @rootfs, @Xunzhuo
Files changed:

tools/make/models.mk
tools/make/rust.mk

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

rootfs · 2025-10-16T12:30:35Z

Test failure is fixed in main branch, merging it for testing.

feat:support for two long-context embedding models (Qwen3-Embedding-0.6B and EmbeddingGemma-300M) (vllm-project#453) Signed-off-by: OneZero-Y <aukovyps@163.com> Signed-off-by: Huamin Chen <hchen@redhat.com>

OneZero-Y requested review from Xunzhuo, rootfs and wangchen615 as code owners October 16, 2025 12:25

github-actions bot assigned rootfs, wangchen615 and Xunzhuo Oct 16, 2025

rootfs merged commit fa4f5c7 into vllm-project:feat-candle-refactoring Oct 16, 2025
3 of 4 checks passed

OneZero-Y deleted the feat/support-embedding-models-1 branch October 18, 2025 06:56
