
The first time result of rerank is wrong each time restarting the ovms #3224


Open
Septa2112 opened this issue Apr 10, 2025 · 7 comments
Labels
bug Something isn't working

Comments

Septa2112 commented Apr 10, 2025

Describe the bug
Each time OVMS is restarted, the first rerank response always returns relevance scores of 0.5.

To Reproduce
Steps to reproduce the behavior:

  1. Start OVMS: .\ovms\ovms.exe --port 9000 --rest_port 8000 --config_path ..\local_models\config.json
  2. Client request: send a POST to http://127.0.0.1:8000/v3/rerank/ with the body
{
    "model": "BAAI/bge-reranker-base",
    "query": "Hello",
    "documents": ["Welcome","Farewell"]
}
  3. See the error: every relevance_score in the first response is 0.5
{
    "results": [
        {
            "index": 0,
            "relevance_score": 0.5
        },
        {
            "index": 1,
            "relevance_score": 0.5
        }
    ]
}
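
For convenience, the same reproduction can be scripted. Below is a minimal client sketch in Python, assuming the requests package is installed and the server is running locally as started above:

import requests

body = {
    "model": "BAAI/bge-reranker-base",
    "query": "Hello",
    "documents": ["Welcome", "Farewell"],
}

# Immediately after an OVMS restart, every relevance_score in this response
# comes back as 0.5 on the affected setup.
response = requests.post("http://127.0.0.1:8000/v3/rerank/", json=body, timeout=30)
response.raise_for_status()
print(response.json())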

Expected behavior
The first response should be identical to the second and all subsequent responses:

{
    "results": [
        {
            "index": 1,
            "relevance_score": 0.9936364889144897
        },
        {
            "index": 0,
            "relevance_score": 0.9777563810348511
        }
    ]
}
Septa2112 added the bug label Apr 10, 2025
Septa2112 changed the title from "The first time result of rerank is wrong after restarting the ovms" to "The first time result of rerank is wrong each time restarting the ovms" Apr 10, 2025
dtrawins (Collaborator) commented Apr 11, 2025

@Septa2112 I was unable to reproduce this behavior. Were you using the latest version, 2025.1? Can you send the server logs in debug mode with --log_level DEBUG?
Did the first request return a relevance score of 0.5 regardless of the documents and query? What are your host platform and Windows version? Did you export the model with target_device CPU or GPU?

Septa2112 (Author) commented

@dtrawins Thanks for your suggestions. I found that this problem occurs when target_device is GPU; with CPU, everything is fine.

I can work around the problem by setting the device to CPU. What puzzles me is why, with target_device GPU, the score is 0.5 for the first few requests and then normal afterwards.
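
One possible mitigation while staying on GPU, sketched below under the assumption that only the first inferences after startup are affected, is to send a throwaway warm-up request right after the server starts and discard its result:

import requests

ENDPOINT = "http://127.0.0.1:8000/v3/rerank/"  # assumed local deployment

def warm_up_rerank():
    # Dummy request whose (possibly wrong) scores are intentionally ignored,
    # so real traffic only hits the model after the first GPU inference.
    dummy = {
        "model": "BAAI/bge-reranker-base",
        "query": "warm-up",
        "documents": ["warm-up"],
    }
    try:
        requests.post(ENDPOINT, json=dummy, timeout=30)
    except requests.RequestException:
        pass  # best-effort: the server may still be starting up

warm_up_rerank()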

Version Information

  • openvino model server: 2025.0
  • CPU: i7-1185G7
  • Windows version: Windows 11 Enterprise 23H2

Septa2112 (Author) commented

Another problem occurred

When I run the model on MTL with the GPU using the latest OVMS (2025.1), it fails to load the model.

Version Information

  • ovms: 2025.1
  • CPU: Intel(R) Core(TM) Ultra 7 155H
  • Windows version: Windows 11 Pro 23H2

Command

  • export model
python .\export_model.py rerank --source_model BAAI/bge-reranker-base --target_device GPU --config_file_path .\local_models_gpu\config.json --model_repository_path .\local_models_gpu
  • run ovms
.\ovms.exe --port 9000 --rest_port 8000 --config_path ..\local_models_gpu\config.json --log_level DEBUG

Server Log

(base) PS C:\WorkSpace\LiuJia\ovms_models\2025.1\ovms> .\ovms.exe --port 9000 --rest_port 8000 --config_path ..\local_models_gpu\config.json --log_level DEBUG
[2025-04-14 14:42:16.930][16344][serving][info][src/server.cpp:89] OpenVINO Model Server 2025.1.a53a7255
[2025-04-14 14:42:16.930][16344][serving][info][src/server.cpp:90] OpenVINO backend 2025.1.0.0rc3
[2025-04-14 14:42:16.930][16344][serving][debug][src/server.cpp:91] CLI parameters passed to ovms server
[2025-04-14 14:42:16.931][16344][serving][debug][src/server.cpp:108] config_path: ..\local_models_gpu\config.json
[2025-04-14 14:42:16.931][16344][serving][debug][src/server.cpp:110] gRPC port: 9000
[2025-04-14 14:42:16.931][16344][serving][debug][src/server.cpp:111] REST port: 8000
[2025-04-14 14:42:16.931][16344][serving][debug][src/server.cpp:112] gRPC bind address: 0.0.0.0
[2025-04-14 14:42:16.931][16344][serving][debug][src/server.cpp:113] REST bind address: 0.0.0.0
[2025-04-14 14:42:16.931][16344][serving][debug][src/server.cpp:114] REST workers: 22
[2025-04-14 14:42:16.931][16344][serving][debug][src/server.cpp:115] gRPC workers: 1
[2025-04-14 14:42:16.931][16344][serving][debug][src/server.cpp:116] gRPC channel arguments:
[2025-04-14 14:42:16.931][16344][serving][debug][src/server.cpp:117] log level: DEBUG
[2025-04-14 14:42:16.931][16344][serving][debug][src/server.cpp:118] log path:
[2025-04-14 14:42:16.931][16344][serving][debug][src/server.cpp:119] file system poll wait milliseconds: 1000
[2025-04-14 14:42:16.931][16344][serving][debug][src/server.cpp:120] sequence cleaner poll wait minutes: 5
[2025-04-14 14:42:16.931][16344][serving][info][src/python/pythoninterpretermodule.cpp:37] PythonInterpreterModule starting
Python version:
3.11.9 (tags/v3.11.9:de54cf5, Apr  2 2024, 10:12:12) [MSC v.1938 64 bit (AMD64)]
Python sys.path output:
['', 'C:\\WorkSpace\\LiuJia\\ovms_models\\2025.1\\ovms\\python\\python311', 'C:\\WorkSpace\\LiuJia\\ovms_models\\2025.1\\ovms\\python', 'C:\\WorkSpace\\LiuJia\\ovms_models\\2025.1\\ovms\\python\\Scripts', 'C:\\WorkSpace\\LiuJia\\ovms_models\\2025.1\\ovms\\python\\Lib\\site-packages']
[2025-04-14 14:42:16.949][16344][serving][info][src/python/pythoninterpretermodule.cpp:50] PythonInterpreterModule started
[2025-04-14 14:42:16.950][16344][modelmanager][debug][src/mediapipe_internal/mediapipefactory.cpp:52] Registered Calculators: AddHeaderCalculator, AlignmentPointsRectsCalculator, AnnotationOverlayCalculator, AnomalyCalculator, AnomalySerializationCalculator, AssociationNormRectCalculator, BeginLoopDetectionCalculator, BeginLoopFloatCalculator, BeginLoopGpuBufferCalculator, BeginLoopImageCalculator, BeginLoopImageFrameCalculator, BeginLoopIntCalculator, BeginLoopMatrixCalculator, BeginLoopMatrixVectorCalculator, BeginLoopModelApiDetectionCalculator, BeginLoopNormalizedLandmarkListVectorCalculator, BeginLoopNormalizedRectCalculator, BeginLoopRectanglePredictionCalculator, BeginLoopStringCalculator, BeginLoopTensorCalculator, BeginLoopUint64tCalculator, BoxDetectorCalculator, BoxTrackerCalculator, CallbackCalculator, CallbackPacketCalculator, CallbackWithHeaderCalculator, ClassificationCalculator, ClassificationListVectorHasMinSizeCalculator, ClassificationListVectorSizeCalculator, ClassificationSerializationCalculator, ClipDetectionVectorSizeCalculator, ClipNormalizedRectVectorSizeCalculator, ColorConvertCalculator, ConcatenateBoolVectorCalculator, ConcatenateClassificationListCalculator, ConcatenateClassificationListVectorCalculator, ConcatenateDetectionVectorCalculator, ConcatenateFloatVectorCalculator, ConcatenateImageVectorCalculator, ConcatenateInt32VectorCalculator, ConcatenateJointListCalculator, ConcatenateLandmarListVectorCalculator, ConcatenateLandmarkListCalculator, ConcatenateLandmarkListVectorCalculator, ConcatenateLandmarkVectorCalculator, ConcatenateNormalizedLandmarkListCalculator, ConcatenateNormalizedLandmarkListVectorCalculator, ConcatenateRenderDataVectorCalculator, ConcatenateStringVectorCalculator, ConcatenateTensorVectorCalculator, ConcatenateTfLiteTensorVectorCalculator, ConcatenateUInt64VectorCalculator, ConstantSidePacketCalculator, CountingSourceCalculator, CropCalculator, DefaultSidePacketCalculator, DequantizeByteArrayCalculator, DetectionCalculator, DetectionClassificationCombinerCalculator, DetectionClassificationResultCalculator, DetectionClassificationSerializationCalculator, DetectionExtractionCalculator, DetectionLabelIdToTextCalculator, DetectionLetterboxRemovalCalculator, DetectionProjectionCalculator, DetectionSegmentationCombinerCalculator, DetectionSegmentationResultCalculator, DetectionSegmentationSerializationCalculator, DetectionSerializationCalculator, DetectionsToRectsCalculator, DetectionsToRenderDataCalculator, EmbeddingsCalculator, EmptyLabelCalculator, EmptyLabelClassificationCalculator, EmptyLabelDetectionCalculator, EmptyLabelRotatedDetectionCalculator, EmptyLabelSegmentationCalculator, EndLoopAffineMatrixCalculator, EndLoopBooleanCalculator, EndLoopClassificationListCalculator, EndLoopDetectionCalculator, EndLoopFloatCalculator, EndLoopGpuBufferCalculator, EndLoopImageCalculator, EndLoopImageFrameCalculator, EndLoopImageSizeCalculator, EndLoopLandmarkListVectorCalculator, EndLoopMatrixCalculator, EndLoopModelApiDetectionClassificationCalculator, EndLoopModelApiDetectionSegmentationCalculator, EndLoopNormalizedLandmarkListVectorCalculator, EndLoopNormalizedRectCalculator, EndLoopPolygonPredictionsCalculator, EndLoopRectanglePredictionsCalculator, EndLoopRenderDataCalculator, EndLoopTensorCalculator, EndLoopTfLiteTensorCalculator, FaceLandmarksToRenderDataCalculator, FeatureDetectorCalculator, FlowLimiterCalculator, FlowPackagerCalculator, FlowToImageCalculator, FromImageCalculator, GateCalculator, GetClassificationListVectorItemCalculator, 
GetDetectionVectorItemCalculator, GetLandmarkListVectorItemCalculator, GetNormalizedLandmarkListVectorItemCalculator, GetNormalizedRectVectorItemCalculator, GetRectVectorItemCalculator, GraphProfileCalculator, HandDetectionsFromPoseToRectsCalculator, HandLandmarksToRectCalculator, HttpLLMCalculator, HttpSerializationCalculator, ImageCloneCalculator, ImageCroppingCalculator, ImagePropertiesCalculator, ImageToTensorCalculator, ImageTransformationCalculator, ImmediateMuxCalculator, InferenceCalculatorCpu, InstanceSegmentationCalculator, InverseMatrixCalculator, IrisToRenderDataCalculator, KeypointDetectionCalculator, LandmarkLetterboxRemovalCalculator, LandmarkListVectorSizeCalculator, LandmarkProjectionCalculator, LandmarkVisibilityCalculator, LandmarksRefinementCalculator, LandmarksSmoothingCalculator, LandmarksToDetectionCalculator, LandmarksToRenderDataCalculator, MakePairCalculator, MatrixMultiplyCalculator, MatrixSubtractCalculator, MatrixToVectorCalculator, MediaPipeInternalSidePacketToPacketStreamCalculator, MergeCalculator, MergeDetectionsToVectorCalculator, MergeGpuBuffersToVectorCalculator, MergeImagesToVectorCalculator, ModelInferHttpRequestCalculator, ModelInferRequestImageCalculator, MotionAnalysisCalculator, MuxCalculator, NonMaxSuppressionCalculator, NonZeroCalculator, NormalizedLandmarkListVectorHasMinSizeCalculator, NormalizedRectVectorHasMinSizeCalculator, OpenCvEncodedImageToImageFrameCalculator, OpenCvImageEncoderCalculator, OpenCvPutTextCalculator, OpenCvVideoDecoderCalculator, OpenCvVideoEncoderCalculator, OpenVINOConverterCalculator, OpenVINOInferenceAdapterCalculator, OpenVINOInferenceCalculator, OpenVINOModelServerSessionCalculator, OpenVINOTensorsToClassificationCalculator, OpenVINOTensorsToDetectionsCalculator, OverlayCalculator, PacketGeneratorWrapperCalculator, PacketInnerJoinCalculator, PacketPresenceCalculator, PacketResamplerCalculator, PacketSequencerCalculator, PacketThinnerCalculator, PassThroughCalculator, PreviousLoopbackCalculator, PyTensorOvTensorConverterCalculator, PythonExecutorCalculator, QuantizeFloatVectorCalculator, RectToRenderDataCalculator, RectToRenderScaleCalculator, RectTransformationCalculator, RefineLandmarksFromHeatmapCalculator, RerankCalculator, ResourceProviderCalculator, RoiTrackingCalculator, RotatedDetectionCalculator, RotatedDetectionSerializationCalculator, RoundRobinDemuxCalculator, SegmentationCalculator, SegmentationSerializationCalculator, SegmentationSmoothingCalculator, SequenceShiftCalculator, SerializationCalculator, SetLandmarkVisibilityCalculator, SidePacketToStreamCalculator, SplitAffineMatrixVectorCalculator, SplitClassificationListVectorCalculator, SplitDetectionVectorCalculator, SplitFloatVectorCalculator, SplitImageVectorCalculator, SplitJointListCalculator, SplitLandmarkListCalculator, SplitLandmarkVectorCalculator, SplitMatrixVectorCalculator, SplitNormalizedLandmarkListCalculator, SplitNormalizedLandmarkListVectorCalculator, SplitNormalizedRectVectorCalculator, SplitTensorVectorCalculator, SplitTfLiteTensorVectorCalculator, SplitUint64tVectorCalculator, SsdAnchorsCalculator, StreamToSidePacketCalculator, StringToInt32Calculator, StringToInt64Calculator, StringToIntCalculator, StringToUint32Calculator, StringToUint64Calculator, StringToUintCalculator, SwitchDemuxCalculator, SwitchMuxCalculator, TensorsToClassificationCalculator, TensorsToDetectionsCalculator, TensorsToFloatsCalculator, TensorsToLandmarksCalculator, TensorsToSegmentationCalculator, TfLiteConverterCalculator, TfLiteCustomOpResolverCalculator, 
TfLiteInferenceCalculator, TfLiteModelCalculator, TfLiteTensorsToDetectionsCalculator, TfLiteTensorsToFloatsCalculator, TfLiteTensorsToLandmarksCalculator, ThresholdingCalculator, ToImageCalculator, TrackedDetectionManagerCalculator, UpdateFaceLandmarksCalculator, VideoPreStreamCalculator, VisibilityCopyCalculator, VisibilitySmoothingCalculator, WarpAffineCalculator, WarpAffineCalculatorCpu, WorldLandmarkProjectionCalculator

[2025-04-14 14:42:16.950][16344][modelmanager][debug][src/mediapipe_internal/mediapipefactory.cpp:52] Registered Subgraphs: FaceDetection, FaceDetectionFrontDetectionToRoi, FaceDetectionFrontDetectionsToRoi, FaceDetectionShortRange, FaceDetectionShortRangeByRoiCpu, FaceDetectionShortRangeCpu, FaceLandmarkCpu, FaceLandmarkFrontCpu, FaceLandmarkLandmarksToRoi, FaceLandmarksFromPoseCpu, FaceLandmarksFromPoseToRecropRoi, FaceLandmarksModelLoader, FaceLandmarksToRoi, FaceTracking, HandLandmarkCpu, HandLandmarkModelLoader, HandLandmarksFromPoseCpu, HandLandmarksFromPoseToRecropRoi, HandLandmarksLeftAndRightCpu, HandLandmarksToRoi, HandRecropByRoiCpu, HandTracking, HandVisibilityFromHandLandmarksFromPose, HandWristForPose, HolisticLandmarkCpu, HolisticTrackingToRenderData, InferenceCalculator, IrisLandmarkCpu, IrisLandmarkLandmarksToRoi, IrisLandmarkLeftAndRightCpu, IrisRendererCpu, PoseDetectionCpu, PoseDetectionToRoi, PoseLandmarkByRoiCpu, PoseLandmarkCpu, PoseLandmarkFiltering, PoseLandmarkModelLoader, PoseLandmarksAndSegmentationInverseProjection, PoseLandmarksToRoi, PoseSegmentationFiltering, SwitchContainer, TensorsToFaceLandmarks, TensorsToFaceLandmarksWithAttention, TensorsToPoseLandmarksAndSegmentation

[2025-04-14 14:42:16.950][16344][modelmanager][debug][src/mediapipe_internal/mediapipefactory.cpp:52] Registered InputStreamHandlers: BarrierInputStreamHandler, DefaultInputStreamHandler, EarlyCloseInputStreamHandler, FixedSizeInputStreamHandler, ImmediateInputStreamHandler, MuxInputStreamHandler, SyncSetInputStreamHandler, TimestampAlignInputStreamHandler

[2025-04-14 14:42:16.951][16344][modelmanager][debug][src/mediapipe_internal/mediapipefactory.cpp:52] Registered OutputStreamHandlers: InOrderOutputStreamHandler

[2025-04-14 14:42:17.101][16344][modelmanager][info][src/modelmanager.cpp:165] Available devices for Open VINO: CPU, GPU, NPU
[2025-04-14 14:42:17.101][16344][modelmanager][debug][ov_utils.hpp:56] Logging OpenVINO Core plugin: CPU; plugin configuration
[2025-04-14 14:42:17.102][16344][modelmanager][debug][ov_utils.hpp:91] OpenVINO Core plugin: CPU; plugin configuration: { AVAILABLE_DEVICES: , CPU_DENORMALS_OPTIMIZATION: NO, CPU_SPARSE_WEIGHTS_DECOMPRESSION_RATE: 1, DEVICE_ARCHITECTURE: intel64, DEVICE_ID: , DEVICE_TYPE: integrated, DYNAMIC_QUANTIZATION_GROUP_SIZE: 32, ENABLE_CPU_PINNING: YES, ENABLE_CPU_RESERVATION: NO, ENABLE_HYPER_THREADING: YES, EXECUTION_DEVICES: CPU, EXECUTION_MODE_HINT: PERFORMANCE, FULL_DEVICE_NAME: Intel(R) Core(TM) Ultra 7 155H, INFERENCE_NUM_THREADS: 0, INFERENCE_PRECISION_HINT: f32, KEY_CACHE_GROUP_SIZE: 0, KEY_CACHE_PRECISION: u8, KV_CACHE_PRECISION: u8, LOG_LEVEL: LOG_NONE, MODEL_DISTRIBUTION_POLICY: , NUM_STREAMS: 1, OPTIMIZATION_CAPABILITIES: FP32 INT8 BIN EXPORT_IMPORT, PERFORMANCE_HINT: LATENCY, PERFORMANCE_HINT_NUM_REQUESTS: 0, PERF_COUNT: NO, RANGE_FOR_ASYNC_INFER_REQUESTS: 1 1 1, RANGE_FOR_STREAMS: 1 22, SCHEDULING_CORE_TYPE: ANY_CORE, VALUE_CACHE_GROUP_SIZE: 0, VALUE_CACHE_PRECISION: u8 }
[2025-04-14 14:42:17.102][16344][modelmanager][debug][ov_utils.hpp:56] Logging OpenVINO Core plugin: GPU; plugin configuration
[2025-04-14 14:42:17.102][16344][modelmanager][debug][ov_utils.hpp:91] OpenVINO Core plugin: GPU; plugin configuration: { ACTIVATIONS_SCALE_FACTOR: -1, AVAILABLE_DEVICES: 0, CACHE_DIR: , CACHE_ENCRYPTION_CALLBACKS: , CACHE_MODE: optimize_speed, COMPILATION_NUM_THREADS: 22, DEVICE_ARCHITECTURE: GPU: vendor=0x8086 arch=v12.71.4, DEVICE_GOPS: {f16:9216,f32:4608,i8:18432,u8:18432}, DEVICE_ID: 0, DEVICE_LUID: 922b010000000000, DEVICE_TYPE: integrated, DEVICE_UUID: 8680557d080000000002000000000000, DYNAMIC_QUANTIZATION_GROUP_SIZE: 0, ENABLE_CPU_PINNING: NO, ENABLE_CPU_RESERVATION: NO, EXECUTION_MODE_HINT: PERFORMANCE, FULL_DEVICE_NAME: Intel(R) Arc(TM) Graphics (iGPU), GPU_DEVICE_TOTAL_MEM_SIZE: 15482728448, GPU_DISABLE_WINOGRAD_CONVOLUTION: NO, GPU_ENABLE_LOOP_UNROLLING: YES, GPU_ENABLE_SDPA_OPTIMIZATION: YES, GPU_EXECUTION_UNITS_COUNT: 128, GPU_HOST_TASK_PRIORITY: MEDIUM, GPU_MEMORY_STATISTICS: , GPU_QUEUE_PRIORITY: MEDIUM, GPU_QUEUE_THROTTLE: MEDIUM, GPU_UARCH_VERSION: 12.71.4, INFERENCE_PRECISION_HINT: f16, KV_CACHE_PRECISION: dynamic, MAX_BATCH_SIZE: 1, MODEL_PRIORITY: MEDIUM, MODEL_PTR: 0000000000000000, NUM_STREAMS: 1, OPTIMAL_BATCH_SIZE: 1, OPTIMIZATION_CAPABILITIES: FP32 BIN FP16 INT8 EXPORT_IMPORT, PERFORMANCE_HINT: LATENCY, PERFORMANCE_HINT_NUM_REQUESTS: 0, PERF_COUNT: NO, RANGE_FOR_ASYNC_INFER_REQUESTS: 1 2 1, RANGE_FOR_STREAMS: 1 2, WEIGHTS_PATH:  }
[2025-04-14 14:42:17.103][16344][modelmanager][debug][ov_utils.hpp:56] Logging OpenVINO Core plugin: NPU; plugin configuration
[2025-04-14 14:42:17.103][16344][modelmanager][debug][ov_utils.hpp:91] OpenVINO Core plugin: NPU; plugin configuration: { AVAILABLE_DEVICES: 3720, CACHE_DIR: , COMPILATION_NUM_THREADS: 22, DEVICE_ARCHITECTURE: 3720, DEVICE_GOPS: {bf16:0,f16:5734.4,f32:0,i8:11468.8,u8:11468.8}, DEVICE_ID: , DEVICE_PCI_INFO: {domain: 0 bus: 0 device: 0xb function: 0}, DEVICE_TYPE: integrated, DEVICE_UUID: 80d1d11eb73811eab3de0242ac130004, ENABLE_CPU_PINNING: NO, EXECUTION_DEVICES: NPU, EXECUTION_MODE_HINT: PERFORMANCE, FULL_DEVICE_NAME: Intel(R) AI Boost, INFERENCE_PRECISION_HINT: f16, LOG_LEVEL: LOG_ERROR, MODEL_PRIORITY: MEDIUM, NPU_BYPASS_UMD_CACHING: NO, NPU_COMPILATION_MODE_PARAMS: , NPU_COMPILER_VERSION: 327685, NPU_DEFER_WEIGHTS_LOAD: NO, NPU_DEVICE_ALLOC_MEM_SIZE: 0, NPU_DEVICE_TOTAL_MEM_SIZE: 2147483648, NPU_DRIVER_VERSION: 2552, NPU_MAX_TILES: 2, NPU_TILES: -1, NUM_STREAMS: 1, OPTIMAL_NUMBER_OF_INFER_REQUESTS: 1, OPTIMIZATION_CAPABILITIES: FP16 INT8 EXPORT_IMPORT, PERFORMANCE_HINT: LATENCY, PERFORMANCE_HINT_NUM_REQUESTS: 1, PERF_COUNT: NO, RANGE_FOR_ASYNC_INFER_REQUESTS: 1 10 1, RANGE_FOR_STREAMS: 1 4, WEIGHTS_PATH:  }
[2025-04-14 14:42:17.103][16344][serving][info][src/capi_frontend/capimodule.cpp:40] C-APIModule starting
[2025-04-14 14:42:17.103][16344][serving][info][src/capi_frontend/capimodule.cpp:42] C-APIModule started
[2025-04-14 14:42:17.103][16344][serving][info][src/grpcservermodule.cpp:172] GRPCServerModule starting
[2025-04-14 14:42:17.103][16344][serving][debug][src/grpcservermodule.cpp:204] setting grpc channel argument grpc.max_concurrent_streams: 22
[2025-04-14 14:42:17.104][16344][serving][debug][src/grpcservermodule.cpp:217] setting grpc MaxThreads ResourceQuota 176
[2025-04-14 14:42:17.104][16344][serving][debug][src/grpcservermodule.cpp:221] setting grpc Memory ResourceQuota 2147483648
[2025-04-14 14:42:17.104][16344][serving][debug][src/grpcservermodule.cpp:228] Starting gRPC servers: 1
[2025-04-14 14:42:17.104][16344][serving][info][src/grpcservermodule.cpp:249] GRPCServerModule started
[2025-04-14 14:42:17.104][16344][serving][info][src/grpcservermodule.cpp:250] Started gRPC server on port 9000
[2025-04-14 14:42:17.104][16344][serving][info][src/httpservermodule.cpp:33] HTTPServerModule starting
[2025-04-14 14:42:17.104][16344][serving][info][src/httpservermodule.cpp:37] Will start 22 REST workers
[2025-04-14 14:42:17.104][16344][serving][debug][src/drogon_http_server.cpp:39] Starting http thread pool for streaming (22 threads)
[2025-04-14 14:42:17.105][16344][serving][debug][src/drogon_http_server.cpp:41] Thread pool started
[2025-04-14 14:42:17.105][16344][serving][debug][src/drogon_http_server.cpp:65] DrogonHttpServer::startAcceptingRequests()
[2025-04-14 14:42:17.105][16344][serving][debug][src/drogon_http_server.cpp:129] Waiting for drogon to become ready on port 8000...
[2025-04-14 14:42:17.105][13600][serving][debug][src/drogon_http_server.cpp:101] Starting to listen on port 8000
[2025-04-14 14:42:17.105][13600][serving][debug][src/drogon_http_server.cpp:102] Thread pool size for unary (22 drogon threads)
[2025-04-14 14:42:17.162][16344][serving][debug][src/drogon_http_server.cpp:138] Drogon run procedure took: 57.118 ms
[2025-04-14 14:42:17.162][16344][serving][info][src/drogon_http_server.cpp:142] REST server listening on port 8000 with 22 unary threads and 22 streaming threads
[2025-04-14 14:42:17.162][16344][serving][info][src/httpservermodule.cpp:58] HTTPServerModule started
[2025-04-14 14:42:17.162][16344][serving][info][src/httpservermodule.cpp:59] Started REST server at 0.0.0.0:8000
[2025-04-14 14:42:17.162][16344][serving][info][src/servablemanagermodule.cpp:51] ServableManagerModule starting
[2025-04-14 14:42:17.162][16344][modelmanager][debug][src/modelmanager.cpp:1119] Loading configuration from ..\local_models_gpu\config.json for: 1 time
[2025-04-14 14:42:17.163][16344][modelmanager][debug][src/modelmanager.cpp:806] Configuration file doesn't have monitoring property.
[2025-04-14 14:42:17.163][16344][modelmanager][debug][src/modelmanager.cpp:1171] Reading metric config only once per server start.
[2025-04-14 14:42:17.163][16344][serving][debug][src/mediapipe_internal/mediapipegraphconfig.cpp:109] graph_path not defined in config so it will be set to default based on base_path and graph name: ..\local_models_gpu\BAAI\bge-reranker-base\graph.pbtxt
[2025-04-14 14:42:17.163][16344][serving][debug][src/mediapipe_internal/mediapipegraphconfig.cpp:118] No subconfig path was provided for graph: BAAI/bge-reranker-base so default subconfig file: ..\local_models_gpu\BAAI\bge-reranker-base\subconfig.json will be loaded.
[2025-04-14 14:42:17.163][16344][modelmanager][debug][src/modelmanager.cpp:942] Loading subconfig models from subconfig path: ..\local_models_gpu\BAAI\bge-reranker-base\subconfig.json provided for graph: BAAI/bge-reranker-base
[2025-04-14 14:42:17.163][16344][serving][debug][src/mediapipe_internal/mediapipegraphconfig.cpp:109] graph_path not defined in config so it will be set to default based on base_path and graph name: ..\local_models_gpu\BAAI\bge-reranker-base\tokenizer\graph.pbtxt
[2025-04-14 14:42:17.164][16344][serving][debug][src/mediapipe_internal/mediapipegraphconfig.cpp:118] No subconfig path was provided for graph: BAAI/bge-reranker-base_tokenizer_model so default subconfig file: ..\local_models_gpu\BAAI\bge-reranker-base\tokenizer\subconfig.json will be loaded.
[2025-04-14 14:42:17.164][16344][modelmanager][debug][src/modelmanager.cpp:848] Graph.pbtxt not found for config BAAI/bge-reranker-base_tokenizer_model, ..\local_models_gpu\BAAI\bge-reranker-base\..\local_models_gpu\BAAI\bge-reranker-base\..\local_models_gpu\BAAI\bge-reranker-base\tokenizer\1\graph.pbtxt
[2025-04-14 14:42:17.164][16344][serving][debug][src/modelconfig.cpp:634] Specified model parameters:
[2025-04-14 14:42:17.164][16344][serving][debug][src/modelconfig.cpp:635] model_basepath: ..\local_models_gpu\BAAI\bge-reranker-base\tokenizer
[2025-04-14 14:42:17.164][16344][serving][debug][src/modelconfig.cpp:636] model_name: BAAI/bge-reranker-base_tokenizer_model
[2025-04-14 14:42:17.164][16344][serving][debug][src/modelconfig.cpp:637] batch_size: not configured
[2025-04-14 14:42:17.164][16344][serving][debug][src/modelconfig.cpp:641] shape:
[2025-04-14 14:42:17.164][16344][serving][debug][src/modelconfig.cpp:647] model_version_policy: latest: 1
[2025-04-14 14:42:17.164][16344][serving][debug][src/modelconfig.cpp:649] nireq: 0
[2025-04-14 14:42:17.164][16344][serving][debug][src/modelconfig.cpp:650] target_device: CPU
[2025-04-14 14:42:17.164][16344][serving][debug][src/modelconfig.cpp:651] plugin_config:
[2025-04-14 14:42:17.164][16344][serving][debug][src/modelconfig.cpp:659] Batch size set: false, shape set: false
[2025-04-14 14:42:17.164][16344][serving][debug][src/modelconfig.cpp:666] stateful: false
[2025-04-14 14:42:17.164][16344][modelmanager][debug][src/ov_utils.cpp:101] Validating plugin: CPU; configuration
[2025-04-14 14:42:17.164][16344][serving][info][src/model.cpp:42] Getting model from ..\local_models_gpu\BAAI\bge-reranker-base\tokenizer
[2025-04-14 14:42:17.164][16344][serving][info][src/model.cpp:49] Model downloaded to ..\local_models_gpu\BAAI\bge-reranker-base\tokenizer
[2025-04-14 14:42:17.164][16344][serving][info][src/model.cpp:149] Will add model: BAAI/bge-reranker-base_tokenizer_model; version: 1 ...
[2025-04-14 14:42:17.164][16344][modelmanager][debug][src/modelconfig.cpp:421] Parsing model: BAAI/bge-reranker-base_tokenizer_model mapping from path: ..\local_models_gpu\BAAI\bge-reranker-base\tokenizer\1
[2025-04-14 14:42:17.164][16344][serving][debug][src/model.cpp:123] Creating new model instance - model name: BAAI/bge-reranker-base_tokenizer_model; model version: 1;
[2025-04-14 14:42:17.165][16344][serving][info][src/modelversionstatus.cpp:113] STATUS CHANGE: Version 1 of model BAAI/bge-reranker-base_tokenizer_model status change. New status: ( "state": "START", "error_code": "OK" )
[2025-04-14 14:42:17.165][16344][serving][info][src/modelinstance.cpp:1059] Loading model: BAAI/bge-reranker-base_tokenizer_model, version: 1, from path: ..\local_models_gpu\BAAI\bge-reranker-base\tokenizer\1, with target device: CPU ...
[2025-04-14 14:42:17.165][16344][serving][info][src/modelversionstatus.cpp:113] STATUS CHANGE: Version 1 of model BAAI/bge-reranker-base_tokenizer_model status change. New status: ( "state": "START", "error_code": "OK" )
[2025-04-14 14:42:17.165][16344][serving][debug][src/modelversionstatus.cpp:81] setLoading: BAAI/bge-reranker-base_tokenizer_model - 1 (previous state: START) -> error: OK
[2025-04-14 14:42:17.165][16344][serving][info][src/modelversionstatus.cpp:113] STATUS CHANGE: Version 1 of model BAAI/bge-reranker-base_tokenizer_model status change. New status: ( "state": "LOADING", "error_code": "OK" )
[2025-04-14 14:42:17.165][16344][serving][debug][src/modelinstance.cpp:901] Getting model files from path: ..\local_models_gpu\BAAI\bge-reranker-base\tokenizer\1
[2025-04-14 14:42:17.165][16344][serving][debug][src/modelinstance.cpp:724] Try reading model file: ..\local_models_gpu\BAAI\bge-reranker-base\tokenizer\1\model.xml
[2025-04-14 14:42:17.171][16344][modelmanager][debug][src/modelinstance.cpp:237] Applying layout configuration:
[2025-04-14 14:42:17.171][16344][modelmanager][debug][src/modelinstance.cpp:279] model: BAAI/bge-reranker-base_tokenizer_model, version: 1; Configuring layout: Tensor Layout:; Network Layout:[N,...] (default); input name: Parameter_4039
[2025-04-14 14:42:17.171][16344][modelmanager][debug][src/modelinstance.cpp:332] model: BAAI/bge-reranker-base_tokenizer_model, version: 1; Configuring layout: Tensor Layout:; Network Layout:[N,...] (default); output name: input_ids
[2025-04-14 14:42:17.171][16344][modelmanager][debug][src/modelinstance.cpp:332] model: BAAI/bge-reranker-base_tokenizer_model, version: 1; Configuring layout: Tensor Layout:; Network Layout:[N,...] (default); output name: attention_mask
[2025-04-14 14:42:17.172][16344][serving][debug][src/modelinstance.cpp:549] model: BAAI/bge-reranker-base_tokenizer_model, version: 1; reshaping inputs is not required
[2025-04-14 14:42:17.172][16344][modelmanager][debug][src/modelinstance.cpp:201] Reporting input layout from RTMap: [N,...]; for tensor name: Parameter_4039
[2025-04-14 14:42:17.172][16344][modelmanager][info][src/modelinstance.cpp:593] Input name: Parameter_4039; mapping_name: Parameter_4039; shape: (-1); precision: STRING; layout: N...
[2025-04-14 14:42:17.173][16344][modelmanager][debug][src/modelinstance.cpp:212] Reporting output layout from RTMap: [N,...]; for tensor name: input_ids
[2025-04-14 14:42:17.173][16344][modelmanager][info][src/modelinstance.cpp:656] Output name: input_ids; mapping_name: input_ids; shape: (-1,-1); precision: I64; layout: N...
[2025-04-14 14:42:17.173][16344][modelmanager][debug][src/modelinstance.cpp:212] Reporting output layout from RTMap: [N,...]; for tensor name: attention_mask
[2025-04-14 14:42:17.173][16344][modelmanager][info][src/modelinstance.cpp:656] Output name: attention_mask; mapping_name: attention_mask; shape: (-1,-1); precision: I64; layout: N...
[2025-04-14 14:42:17.197][16344][modelmanager][info][src/modelinstance.cpp:1363] Number of OpenVINO streams: 1
[2025-04-14 14:42:17.197][16344][modelmanager][info][src/modelinstance.cpp:863] Plugin config for device: CPU
[2025-04-14 14:42:17.197][16344][modelmanager][info][src/modelinstance.cpp:867] OVMS set plugin settings key: PERFORMANCE_HINT; value: LATENCY;
[2025-04-14 14:42:17.197][16344][modelmanager][debug][ov_utils.hpp:56] Logging compiled model: BAAI/bge-reranker-base_tokenizer_model;  version: 1; target device: CPU;plugin configuration
[2025-04-14 14:42:17.197][16344][modelmanager][debug][ov_utils.hpp:91] compiled model: BAAI/bge-reranker-base_tokenizer_model;  version: 1; target device: CPU;plugin configuration: { CPU_DENORMALS_OPTIMIZATION: NO, CPU_SPARSE_WEIGHTS_DECOMPRESSION_RATE: 1, DYNAMIC_QUANTIZATION_GROUP_SIZE: 32, ENABLE_CPU_PINNING: NO, ENABLE_CPU_RESERVATION: NO, ENABLE_HYPER_THREADING: NO, EXECUTION_DEVICES: CPU, EXECUTION_MODE_HINT: PERFORMANCE, INFERENCE_NUM_THREADS: 14, INFERENCE_PRECISION_HINT: f32, KEY_CACHE_GROUP_SIZE: 0, KEY_CACHE_PRECISION: u8, KV_CACHE_PRECISION: u8, LOG_LEVEL: LOG_NONE, MODEL_DISTRIBUTION_POLICY: , NETWORK_NAME: tokenizer, NUM_STREAMS: 1, OPTIMAL_NUMBER_OF_INFER_REQUESTS: 1, PERFORMANCE_HINT: LATENCY, PERFORMANCE_HINT_NUM_REQUESTS: 0, PERF_COUNT: NO, SCHEDULING_CORE_TYPE: ANY_CORE, VALUE_CACHE_GROUP_SIZE: 0, VALUE_CACHE_PRECISION: u8 }
[2025-04-14 14:42:17.197][16344][serving][info][src/modelinstance.cpp:934] Loaded model BAAI/bge-reranker-base_tokenizer_model; version: 1; batch size: -1; No of InferRequests: 1
[2025-04-14 14:42:17.197][16344][modelmanager][debug][src/modelinstance.cpp:1028] Is model loaded from cache: false
[2025-04-14 14:42:17.197][16344][serving][debug][src/modelversionstatus.cpp:88] setAvailable: BAAI/bge-reranker-base_tokenizer_model - 1 (previous state: LOADING) -> error: OK
[2025-04-14 14:42:17.197][16344][serving][info][src/modelversionstatus.cpp:113] STATUS CHANGE: Version 1 of model BAAI/bge-reranker-base_tokenizer_model status change. New status: ( "state": "AVAILABLE", "error_code": "OK" )
[2025-04-14 14:42:17.197][16344][serving][info][src/model.cpp:89] Updating default version for model: BAAI/bge-reranker-base_tokenizer_model, from: 0
[2025-04-14 14:42:17.197][16344][serving][info][src/model.cpp:99] Updated default version for model: BAAI/bge-reranker-base_tokenizer_model, to: 1
[2025-04-14 14:42:17.197][16344][serving][debug][src/mediapipe_internal/mediapipegraphconfig.cpp:109] graph_path not defined in config so it will be set to default based on base_path and graph name: ..\local_models_gpu\BAAI\bge-reranker-base\rerank\graph.pbtxt
[2025-04-14 14:42:17.198][16344][serving][debug][src/mediapipe_internal/mediapipegraphconfig.cpp:118] No subconfig path was provided for graph: BAAI/bge-reranker-base_rerank_model so default subconfig file: ..\local_models_gpu\BAAI\bge-reranker-base\rerank\subconfig.json will be loaded.
[2025-04-14 14:42:17.198][16344][modelmanager][debug][src/modelmanager.cpp:848] Graph.pbtxt not found for config BAAI/bge-reranker-base_rerank_model, ..\local_models_gpu\BAAI\bge-reranker-base\..\local_models_gpu\BAAI\bge-reranker-base\..\local_models_gpu\BAAI\bge-reranker-base\rerank\1\graph.pbtxt
[2025-04-14 14:42:17.198][16344][serving][debug][src/modelconfig.cpp:634] Specified model parameters:
[2025-04-14 14:42:17.198][16344][serving][debug][src/modelconfig.cpp:635] model_basepath: ..\local_models_gpu\BAAI\bge-reranker-base\rerank
[2025-04-14 14:42:17.198][16344][serving][debug][src/modelconfig.cpp:636] model_name: BAAI/bge-reranker-base_rerank_model
[2025-04-14 14:42:17.198][16344][serving][debug][src/modelconfig.cpp:637] batch_size: not configured
[2025-04-14 14:42:17.198][16344][serving][debug][src/modelconfig.cpp:641] shape:
[2025-04-14 14:42:17.198][16344][serving][debug][src/modelconfig.cpp:647] model_version_policy: latest: 1
[2025-04-14 14:42:17.198][16344][serving][debug][src/modelconfig.cpp:649] nireq: 0
[2025-04-14 14:42:17.198][16344][serving][debug][src/modelconfig.cpp:650] target_device: GPU
[2025-04-14 14:42:17.198][16344][serving][debug][src/modelconfig.cpp:651] plugin_config:
[2025-04-14 14:42:17.198][16344][serving][debug][src/modelconfig.cpp:653]   NUM_STREAMS: 1
[2025-04-14 14:42:17.198][16344][serving][debug][src/modelconfig.cpp:659] Batch size set: false, shape set: false
[2025-04-14 14:42:17.198][16344][serving][debug][src/modelconfig.cpp:666] stateful: false
[2025-04-14 14:42:17.198][16344][modelmanager][debug][src/ov_utils.cpp:101] Validating plugin: GPU; configuration
[2025-04-14 14:42:17.198][16344][serving][info][src/model.cpp:42] Getting model from ..\local_models_gpu\BAAI\bge-reranker-base\rerank
[2025-04-14 14:42:17.198][16344][serving][info][src/model.cpp:49] Model downloaded to ..\local_models_gpu\BAAI\bge-reranker-base\rerank
[2025-04-14 14:42:17.198][16344][serving][info][src/model.cpp:149] Will add model: BAAI/bge-reranker-base_rerank_model; version: 1 ...
[2025-04-14 14:42:17.198][16344][modelmanager][debug][src/modelconfig.cpp:421] Parsing model: BAAI/bge-reranker-base_rerank_model mapping from path: ..\local_models_gpu\BAAI\bge-reranker-base\rerank\1
[2025-04-14 14:42:17.198][16344][serving][debug][src/model.cpp:123] Creating new model instance - model name: BAAI/bge-reranker-base_rerank_model; model version: 1;
[2025-04-14 14:42:17.198][16344][serving][info][src/modelversionstatus.cpp:113] STATUS CHANGE: Version 1 of model BAAI/bge-reranker-base_rerank_model status change. New status: ( "state": "START", "error_code": "OK" )
[2025-04-14 14:42:17.198][16344][serving][info][src/modelinstance.cpp:1059] Loading model: BAAI/bge-reranker-base_rerank_model, version: 1, from path: ..\local_models_gpu\BAAI\bge-reranker-base\rerank\1, with target device: GPU ...
[2025-04-14 14:42:17.198][16344][serving][info][src/modelversionstatus.cpp:113] STATUS CHANGE: Version 1 of model BAAI/bge-reranker-base_rerank_model status change. New status: ( "state": "START", "error_code": "OK" )
[2025-04-14 14:42:17.198][16344][serving][debug][src/modelversionstatus.cpp:81] setLoading: BAAI/bge-reranker-base_rerank_model - 1 (previous state: START) -> error: OK
[2025-04-14 14:42:17.198][16344][serving][info][src/modelversionstatus.cpp:113] STATUS CHANGE: Version 1 of model BAAI/bge-reranker-base_rerank_model status change. New status: ( "state": "LOADING", "error_code": "OK" )
[2025-04-14 14:42:17.198][16344][serving][debug][src/modelinstance.cpp:901] Getting model files from path: ..\local_models_gpu\BAAI\bge-reranker-base\rerank\1
[2025-04-14 14:42:17.199][16344][serving][debug][src/modelinstance.cpp:724] Try reading model file: ..\local_models_gpu\BAAI\bge-reranker-base\rerank\1\model.xml
[2025-04-14 14:42:17.224][16344][modelmanager][debug][src/modelinstance.cpp:237] Applying layout configuration:
[2025-04-14 14:42:17.224][16344][modelmanager][debug][src/modelinstance.cpp:279] model: BAAI/bge-reranker-base_rerank_model, version: 1; Configuring layout: Tensor Layout:; Network Layout:[N,...] (default); input name: input_ids
[2025-04-14 14:42:17.224][16344][modelmanager][debug][src/modelinstance.cpp:279] model: BAAI/bge-reranker-base_rerank_model, version: 1; Configuring layout: Tensor Layout:; Network Layout:[N,...] (default); input name: attention_mask
[2025-04-14 14:42:17.224][16344][modelmanager][debug][src/modelinstance.cpp:332] model: BAAI/bge-reranker-base_rerank_model, version: 1; Configuring layout: Tensor Layout:; Network Layout:[N,...] (default); output name: logits
[2025-04-14 14:42:17.236][16344][serving][debug][src/modelinstance.cpp:549] model: BAAI/bge-reranker-base_rerank_model, version: 1; reshaping inputs is not required
[2025-04-14 14:42:17.236][16344][modelmanager][debug][src/modelinstance.cpp:201] Reporting input layout from RTMap: [N,...]; for tensor name: input_ids
[2025-04-14 14:42:17.236][16344][modelmanager][info][src/modelinstance.cpp:593] Input name: input_ids; mapping_name: input_ids; shape: (-1,-1); precision: I64; layout: N...
[2025-04-14 14:42:17.237][16344][modelmanager][debug][src/modelinstance.cpp:201] Reporting input layout from RTMap: [N,...]; for tensor name: attention_mask
[2025-04-14 14:42:17.237][16344][modelmanager][info][src/modelinstance.cpp:593] Input name: attention_mask; mapping_name: attention_mask; shape: (-1,-1); precision: I64; layout: N...
[2025-04-14 14:42:17.237][16344][modelmanager][debug][src/modelinstance.cpp:212] Reporting output layout from RTMap: [N,...]; for tensor name: logits
[2025-04-14 14:42:17.237][16344][modelmanager][info][src/modelinstance.cpp:656] Output name: logits; mapping_name: logits; shape: (-1,1); precision: FP32; layout: N...
[2025-04-14 14:42:17.237][16344][modelmanager][error][src/modelinstance.cpp:847] Cannot compile model into target device; error: Exception from src\inference\src\cpp\core.cpp:112:
Exception from src\inference\src\dev\plugin.cpp:53:
Exception from src\inference\src\dev\plugin_config.cpp:95:
Invalid value: 1 for property: NUM_STREAMS
Property description: Number of streams to be used for inference


; model: BAAI/bge-reranker-base_rerank_model; version: 1; device: GPU
[2025-04-14 14:42:17.237][16344][serving][debug][src/modelversionstatus.cpp:81] setLoading: BAAI/bge-reranker-base_rerank_model - 1 (previous state: LOADING) -> error: UNKNOWN
[2025-04-14 14:42:17.237][16344][serving][info][src/modelversionstatus.cpp:113] STATUS CHANGE: Version 1 of model BAAI/bge-reranker-base_rerank_model status change. New status: ( "state": "LOADING", "error_code": "UNKNOWN" )
[2025-04-14 14:42:17.237][16344][serving][error][src/model.cpp:157] Error occurred while loading model: BAAI/bge-reranker-base_rerank_model; version: 1; error: Cannot compile model into target device
[2025-04-14 14:42:17.237][16344][modelmanager][error][src/modelmanager.cpp:1595] Error occurred while loading model: BAAI/bge-reranker-base_rerank_model versions; error: Cannot compile model into target device
[2025-04-14 14:42:17.237][16344][modelmanager][debug][src/modelmanager.cpp:1694] Removing available version 1 due to load failure;
[2025-04-14 14:42:17.237][16344][serving][info][src/model.cpp:197] Will clean up model: BAAI/bge-reranker-base_rerank_model; version: 1 ...
[2025-04-14 14:42:17.237][16344][serving][info][src/model.cpp:89] Updating default version for model: BAAI/bge-reranker-base_rerank_model, from: 0
[2025-04-14 14:42:17.237][16344][serving][info][src/model.cpp:101] Model: BAAI/bge-reranker-base_rerank_model will not have default version since no version is available.
[2025-04-14 14:42:17.237][16344][serving][debug][src/modelversionstatus.cpp:81] setLoading: BAAI/bge-reranker-base_rerank_model - 1 (previous state: LOADING) -> error: UNKNOWN
[2025-04-14 14:42:17.237][16344][serving][info][src/modelversionstatus.cpp:113] STATUS CHANGE: Version 1 of model BAAI/bge-reranker-base_rerank_model status change. New status: ( "state": "LOADING", "error_code": "UNKNOWN" )
[2025-04-14 14:42:17.238][16344][modelmanager][debug][src/modelmanager.cpp:894] Cannot reload model: BAAI/bge-reranker-base_rerank_model with versions due to error: Cannot compile model into target device
[2025-04-14 14:42:17.238][16344][modelmanager][error][src/modelmanager.cpp:967] Loading Mediapipe BAAI/bge-reranker-base models from subconfig ..\local_models_gpu\BAAI\bge-reranker-base\subconfig.json failed.
[2025-04-14 14:42:17.238][16344][modelmanager][info][src/modelmanager.cpp:657] Configuration file doesn't have custom node libraries property.
[2025-04-14 14:42:17.238][16344][modelmanager][info][src/modelmanager.cpp:700] Configuration file doesn't have pipelines property.
[2025-04-14 14:42:17.238][16344][modelmanager][debug][src/modelmanager.cpp:488] Mediapipe graph:BAAI/bge-reranker-base was not loaded so far. Triggering load
[2025-04-14 14:42:17.238][16344][modelmanager][debug][src/mediapipe_internal/mediapipegraphdefinition.cpp:120] Started validation of mediapipe: BAAI/bge-reranker-base
[2025-04-14 14:42:17.239][16344][modelmanager][debug][src/mediapipe_internal/mediapipe_utils.cpp:81] setting input stream: input packet type: REQUEST from: REQUEST_PAYLOAD:input
[2025-04-14 14:42:17.239][16344][modelmanager][debug][src/mediapipe_internal/mediapipe_utils.cpp:81] setting output stream: output packet type: RESPONSE from: RESPONSE_PAYLOAD:output
[2025-04-14 14:42:17.239][16344][modelmanager][debug][src/mediapipe_internal/mediapipegraphdefinition.cpp:306] KServe for mediapipe graph: BAAI/bge-reranker-base; passing whole KFS request graph detected.
WARNING: Logging before InitGoogleLogging() is written to STDERR
I20250414 14:42:17.238828 16344 openvinomodelserversessioncalculator.cc:119] OpenVINOModelServerSessionCalculator GetContract start
I20250414 14:42:17.238828 16344 openvinomodelserversessioncalculator.cc:127] OpenVINOModelServerSessionCalculator ovms log level setting: INFO
I20250414 14:42:17.238828 16344 openvinomodelserversessioncalculator.cc:128] OpenVINOModelServerSessionCalculator GetContract end
I20250414 14:42:17.238828 16344 openvinomodelserversessioncalculator.cc:119] OpenVINOModelServerSessionCalculator GetContract start
I20250414 14:42:17.238828 16344 openvinomodelserversessioncalculator.cc:127] OpenVINOModelServerSessionCalculator ovms log level setting: INFO
I20250414 14:42:17.238828 16344 openvinomodelserversessioncalculator.cc:128] OpenVINOModelServerSessionCalculator GetContract end
[2025-04-14 14:42:17.240][16344][serving][info][src/mediapipe_internal/mediapipegraphdefinition.cpp:419] MediapipeGraphDefinition initializing graph nodes
[2025-04-14 14:42:17.240][16344][modelmanager][debug][src/mediapipe_internal/mediapipegraphdefinition.cpp:176] Finished validation of mediapipe: BAAI/bge-reranker-base
[2025-04-14 14:42:17.240][16344][modelmanager][info][src/mediapipe_internal/mediapipegraphdefinition.cpp:177] Mediapipe: BAAI/bge-reranker-base inputs:
name: input; mapping: ; shape: (); precision: UNDEFINED; layout: ...
[2025-04-14 14:42:17.240][16344][modelmanager][info][src/mediapipe_internal/mediapipegraphdefinition.cpp:178] Mediapipe: BAAI/bge-reranker-base outputs:
name: output; mapping: ; shape: (); precision: UNDEFINED; layout: ...
[2025-04-14 14:42:17.241][16344][modelmanager][info][src/mediapipe_internal/mediapipegraphdefinition.cpp:179] Mediapipe: BAAI/bge-reranker-base kfs pass through: false
[2025-04-14 14:42:17.241][16344][modelmanager][debug][../dags/pipelinedefinitionstatus.hpp:51] Mediapipe: BAAI/bge-reranker-base state: BEGIN handling: ValidationPassedEvent:
[2025-04-14 14:42:17.241][16344][modelmanager][info][../dags/pipelinedefinitionstatus.hpp:60] Mediapipe: BAAI/bge-reranker-base state changed to: AVAILABLE after handling: ValidationPassedEvent:
[2025-04-14 14:42:17.241][16344][serving][info][src/servablemanagermodule.cpp:55] ServableManagerModule started
[2025-04-14 14:42:17.241][15496][modelmanager][info][src/modelmanager.cpp:1313] Started model manager thread
[2025-04-14 14:42:17.241][18188][modelmanager][info][src/modelmanager.cpp:1332] Started cleaner thread
[2025-04-14 14:42:18.254][15496][modelmanager][debug][src/modelmanager.cpp:1605] Reloading model versions
[2025-04-14 14:42:18.254][15496][serving][info][src/model.cpp:259] Will reload model: BAAI/bge-reranker-base_rerank_model; version: 1 ...
[2025-04-14 14:42:18.254][15496][serving][info][src/model.cpp:42] Getting model from ..\local_models_gpu\BAAI\bge-reranker-base\rerank
[2025-04-14 14:42:18.254][15496][serving][info][src/model.cpp:49] Model downloaded to ..\local_models_gpu\BAAI\bge-reranker-base\rerank
[2025-04-14 14:42:18.254][15496][modelmanager][debug][src/modelconfig.cpp:421] Parsing model: BAAI/bge-reranker-base_rerank_model mapping from path: ..\local_models_gpu\BAAI\bge-reranker-base\rerank\1
[2025-04-14 14:42:18.255][15496][serving][debug][src/modelversionstatus.cpp:81] setLoading: BAAI/bge-reranker-base_rerank_model - 1 (previous state: LOADING) -> error: OK
[2025-04-14 14:42:18.255][15496][serving][info][src/modelversionstatus.cpp:113] STATUS CHANGE: Version 1 of model BAAI/bge-reranker-base_rerank_model status change. New status: ( "state": "LOADING", "error_code": "OK" )
[2025-04-14 14:42:18.255][15496][serving][debug][src/modelinstance.cpp:901] Getting model files from path: ..\local_models_gpu\BAAI\bge-reranker-base\rerank\1
[2025-04-14 14:42:18.255][15496][serving][debug][src/modelinstance.cpp:724] Try reading model file: ..\local_models_gpu\BAAI\bge-reranker-base\rerank\1\model.xml
[2025-04-14 14:42:18.281][15496][modelmanager][debug][src/modelinstance.cpp:237] Applying layout configuration:
[2025-04-14 14:42:18.281][15496][modelmanager][debug][src/modelinstance.cpp:279] model: BAAI/bge-reranker-base_rerank_model, version: 1; Configuring layout: Tensor Layout:; Network Layout:[N,...] (default); input name: input_ids
[2025-04-14 14:42:18.281][15496][modelmanager][debug][src/modelinstance.cpp:279] model: BAAI/bge-reranker-base_rerank_model, version: 1; Configuring layout: Tensor Layout:; Network Layout:[N,...] (default); input name: attention_mask
[2025-04-14 14:42:18.281][15496][modelmanager][debug][src/modelinstance.cpp:332] model: BAAI/bge-reranker-base_rerank_model, version: 1; Configuring layout: Tensor Layout:; Network Layout:[N,...] (default); output name: logits
[2025-04-14 14:42:18.293][15496][serving][debug][src/modelinstance.cpp:549] model: BAAI/bge-reranker-base_rerank_model, version: 1; reshaping inputs is not required
[2025-04-14 14:42:18.293][15496][modelmanager][debug][src/modelinstance.cpp:201] Reporting input layout from RTMap: [N,...]; for tensor name: input_ids
[2025-04-14 14:42:18.293][15496][modelmanager][info][src/modelinstance.cpp:593] Input name: input_ids; mapping_name: input_ids; shape: (-1,-1); precision: I64; layout: N...
[2025-04-14 14:42:18.293][15496][modelmanager][debug][src/modelinstance.cpp:201] Reporting input layout from RTMap: [N,...]; for tensor name: attention_mask
[2025-04-14 14:42:18.293][15496][modelmanager][info][src/modelinstance.cpp:593] Input name: attention_mask; mapping_name: attention_mask; shape: (-1,-1); precision: I64; layout: N...
[2025-04-14 14:42:18.293][15496][modelmanager][debug][src/modelinstance.cpp:212] Reporting output layout from RTMap: [N,...]; for tensor name: logits
[2025-04-14 14:42:18.293][15496][modelmanager][info][src/modelinstance.cpp:656] Output name: logits; mapping_name: logits; shape: (-1,1); precision: FP32; layout: N...
[2025-04-14 14:42:18.294][15496][modelmanager][error][src/modelinstance.cpp:847] Cannot compile model into target device; error: Exception from src\inference\src\cpp\core.cpp:112:
Exception from src\inference\src\dev\plugin.cpp:53:
Exception from src\inference\src\dev\plugin_config.cpp:95:
Invalid value: 1 for property: NUM_STREAMS
Property description: Number of streams to be used for inference
......

atobiszei (Collaborator) commented Apr 14, 2025

I confirm there is an issue with NUM_STREAMS being present in the plugin config for models exported with 2025.1. I also reproduced it on the iGPU of an Intel(R) Core(TM) Ultra 7 165U (1700 MHz, 12 cores, 14 logical processors).

dtrawins (Collaborator) commented

@Septa2112 The issue with NUM_STREAMS is now fixed on the main and releases/2025/1 branches. The export_model.py script was corrected to generate the proper format of the plugin parameters, so re-exporting the rerank model is needed.
The issue with the first invalid response was narrowed down to the MTL iGPU on Windows, most likely a driver issue. There was no reproduction on LNL and BMG GPUs.
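
Before re-exporting, it may help to confirm which files in the exported model directory still carry the old NUM_STREAMS entry. A small sketch follows; the directory path is taken from the server log above, and the file layout is an assumption that may differ between versions:

from pathlib import Path

# Assumed path of the exported rerank model, taken from the server log above.
model_dir = Path(r"..\local_models_gpu\BAAI\bge-reranker-base")

# Scan the exported JSON configs and graph files for NUM_STREAMS entries.
for cfg in list(model_dir.rglob("*.json")) + list(model_dir.rglob("*.pbtxt")):
    if "NUM_STREAMS" in cfg.read_text(encoding="utf-8"):
        print("NUM_STREAMS found in:", cfg)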

Septa2112 (Author) commented Apr 15, 2025

@dtrawins Thanks for your answer. I will try the latest release and driver, and close the issue if no error occurs.


After updating my graphics driver to the latest 32.0.101.6734 on the i7-1185G7, the issue with the first invalid response is still not resolved. It does not look like a driver issue.

Version information

  • target device: GPU
  • ovms: 2025.1
  • Graphics driver: 32.0.101.6734
  • CPU: i7-1185G7
  • Windows version: Windows 11 Enterprise 23H2
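
To make this check repeatable across driver updates, a sketch like the following sends the identical request twice after a restart and compares the scores (endpoint and body taken from the original report):

import requests

ENDPOINT = "http://127.0.0.1:8000/v3/rerank/"
BODY = {
    "model": "BAAI/bge-reranker-base",
    "query": "Hello",
    "documents": ["Welcome", "Farewell"],
}

def get_scores():
    r = requests.post(ENDPOINT, json=BODY, timeout=30)
    r.raise_for_status()
    return [d["relevance_score"] for d in r.json()["results"]]

# On the affected setup the first call returns all 0.5 while the second
# returns the expected scores, so the two lists differ after a restart.
first, second = get_scores(), get_scores()
print("first :", first)
print("second:", second)
print("bug reproduced" if first != second else "responses consistent")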

Septa2112 reopened this Apr 15, 2025
atobiszei (Collaborator) commented Apr 24, 2025

Fix:
openvinotoolkit/openvino#30278

Will be available in 2025.2
