Description
Current Behavior
When running APISIX 3.13.0 in file-driven standalone mode (deployment.role=data_plane, config_provider=yaml), the /status/ready health check endpoint always returns HTTP 503 with the error "worker id: X has not received configuration", despite:
- Routes working correctly
- Configuration being successfully loaded from apisix.yaml
- All workers functioning normally
Example error response:
{"error":"worker id: 0 has not received configuration","status":"error"}
Expected Behavior
The /status/ready endpoint should return HTTP 200 with {"status":"ok"} when all workers have successfully loaded the configuration from the YAML file.
Error Logs
2025/01/10 00:41:47 [warn] 33#33: *3 [lua] init.lua:1003: status_ready(): worker id: 0 has not received configuration, context: ngx.timer
Steps to Reproduce
- Configure APISIX in file-driven standalone mode:
# config.yaml
deployment:
  role: data_plane
  role_data_plane:
    config_provider: yaml
apisix:
  enable_admin: false
- Create a valid apisix.yaml with routes
- Start APISIX
- Test the health check endpoint:
curl http://127.0.0.1:7085/status/ready
- Observe HTTP 503 error despite routes working correctly
Environment
- APISIX version: 3.13.0
- Operating System: Docker (apache/apisix:3.13.0-debian)
- OpenResty / Nginx version: From official image
- Deployment mode: data_plane with yaml config_provider
Root Cause Analysis (UPDATED)
After extensive debugging with added logging, I've identified the actual root cause. The issue occurs when the configuration file is rendered before APISIX starts (common in container environments):
Timing Issue:
- The configuration file (apisix.yaml) is created by an entrypoint script before APISIX starts.
- The master process reads the file during startup, setting the apisix_yaml_mtime global variable.
- Workers initialize and call sync_status_to_shdict(false), marking themselves as unhealthy.
- Workers create timers that call read_apisix_config() every second.
- Critical bug: read_apisix_config() checks whether the file's mtime has changed:
  if apisix_yaml_mtime == last_modification_time then
      return -- File hasn't changed, return early
  end
- Because the file was rendered before startup, its mtime never changes, so update_config() is never called by the workers.
- Workers remain marked as unhealthy forever, and the /status/ready endpoint fails perpetually.
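The guard can be modelled in plain Lua, without OpenResty, to show why an untouched file keeps a worker unhealthy. This is a simplified sketch, not APISIX code: make_worker and tick are hypothetical names, the file mtime is an ordinary number (the value is made up), the loop stands in for the one-second ngx.timer.every ticks, and setting healthy = true stands in for update_config() calling sync_status_to_shdict(true).

```lua
-- Hypothetical model of the mtime guard in read_apisix_config(); not APISIX source.
-- apisix_yaml_mtime starts at the value the master read before forking.
local function make_worker(apisix_yaml_mtime)
    -- init_worker() has already run sync_status_to_shdict(false)
    local worker = { update_calls = 0, healthy = false }

    -- One timer tick: compare the file's current mtime with the cached one.
    function worker.tick(file_mtime)
        if apisix_yaml_mtime == file_mtime then
            return -- unchanged file: early return, update_config() is skipped
        end
        apisix_yaml_mtime = file_mtime
        worker.update_calls = worker.update_calls + 1
        worker.healthy = true -- stands in for sync_status_to_shdict(true)
    end

    return worker
end

local RENDERED_MTIME = 1736469707 -- made-up mtime set by the entrypoint script

-- Case 1: file rendered before startup and never modified afterwards.
local untouched = make_worker(RENDERED_MTIME)
for _ = 1, 60 do -- one minute of one-second ticks
    untouched.tick(RENDERED_MTIME)
end
print("untouched file: update_config calls =", untouched.update_calls,
      "healthy =", tostring(untouched.healthy))

-- Case 2: the file is modified once, 30 seconds after the workers start.
local touched = make_worker(RENDERED_MTIME)
for second = 1, 60 do
    touched.tick(second < 30 and RENDERED_MTIME or RENDERED_MTIME + 30)
end
print("touched file:   update_config calls =", touched.update_calls,
      "healthy =", tostring(touched.healthy))
```

In this model a single mtime change is enough to get a worker past the guard, which matches the analysis that only a file modification ever reaches update_config().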
Debug Evidence:
Adding logging to config_yaml.lua confirmed:
- update_config() is only called once, by the master process (PID 1) during startup.
- The master's call to sync_status_to_shdict(true) does nothing, because the function checks if process.type() ~= "worker" then return end.
- All 12 workers successfully create timers.
- The timers fire every second but return early because the mtime is unchanged.
- Workers never call update_config(), and thus never call sync_status_to_shdict(true).
Relevant Code
apisix/core/config_yaml.lua - Lines ~565-585:
function _M.init_worker()
    sync_status_to_shdict(false) -- Mark worker as unhealthy
    if is_use_admin_api() then
        apisix_yaml = {}
        apisix_yaml_mtime = 0
        return true
    end
    -- sync data in each non-master process
    ngx.timer.every(1, read_apisix_config) -- Timer created but never calls update_config
    return true
end
apisix/core/config_yaml.lua - Lines ~150-165:
local function read_apisix_config(premature, pre_mtime)
    if premature then
        return
    end
    local attributes, err = lfs.attributes(config_file.path)
    if not attributes then
        log.error("failed to fetch ", config_file.path, " attributes: ", err)
        return
    end
    local last_modification_time = attributes.modification
    if apisix_yaml_mtime == last_modification_time then
        return -- BUG: Returns early, never calls update_config()
    end
    -- This code is never reached if the file hasn't changed since startup
    local config_new, err = config_file:parse()
    if err then
        log.error("failed to parse the content of file ", config_file.path, ": ", err)
        return
    end
    update_config(config_new, last_modification_time)
    log.warn("config file ", config_file.path, " reloaded.")
end
apisix/core/config_yaml.lua - Lines ~136-148:
local function sync_status_to_shdict(status)
    if process.type() ~= "worker" then
        return -- Master process calls are ignored
    end
    local dict_name = "status-report"
    local key = worker_id()
    local shdict = ngx.shared[dict_name]
    local _, err = shdict:set(key, status)
    if err then
        log.error("failed to ", status and "set" or "clear",
                  " shdict " .. dict_name .. ", key=" .. key, ", err: ", err)
    end
end
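The 503 body can be tied back to this shared dict with a second plain-Lua model. It assumes, based only on the error message, that the readiness check walks worker ids 0 to N-1 and fails on the first id whose status-report entry is not true; status_ready here is a hypothetical simplification rather than APISIX's actual function, and a Lua table stands in for ngx.shared["status-report"].

```lua
-- Hypothetical readiness check, inferred from the error message; not APISIX source.
-- status_report stands in for ngx.shared["status-report"], keyed by worker id.
local function status_ready(status_report, worker_count)
    for id = 0, worker_count - 1 do
        if status_report[id] ~= true then
            return 503, string.format(
                '{"error":"worker id: %d has not received configuration","status":"error"}', id)
        end
    end
    return 200, '{"status":"ok"}'
end

local WORKERS = 12

-- Every worker ran sync_status_to_shdict(false) and nothing set it back to true.
local stuck = {}
for id = 0, WORKERS - 1 do
    stuck[id] = false
end

-- Every worker ran update_config(), which calls sync_status_to_shdict(true).
local ready = {}
for id = 0, WORKERS - 1 do
    ready[id] = true
end

local code_stuck, body_stuck = status_ready(stuck, WORKERS)
local code_ready, body_ready = status_ready(ready, WORKERS)
print(code_stuck, body_stuck)
print(code_ready, body_ready)
```

Under this assumption, a single worker that never reports true is enough to fail the whole endpoint, which is why the response names worker id 0.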
Proposed Solution
In init_worker(), immediately call update_config() after creating the timer to mark the worker as healthy:
function _M.init_worker()
    sync_status_to_shdict(false)
    if is_use_admin_api() then
        apisix_yaml = {}
        apisix_yaml_mtime = 0
        return true
    end
    -- sync data in each non-master process
    ngx.timer.every(1, read_apisix_config)
    -- FIX: Mark worker as healthy immediately if config already loaded
    if apisix_yaml then
        update_config(apisix_yaml, apisix_yaml_mtime)
    end
    return true
end
This ensures workers are marked healthy during initialization, before the timer first fires. The timer still reloads the configuration whenever the file changes.
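The effect of the proposed change can be checked with a plain-Lua model of init_worker() run once per worker. This is a sketch under stated assumptions, not the patch itself: run_init_workers and count_healthy are hypothetical helpers, a Lua table replaces ngx.shared["status-report"], update_config is reduced to the part that reports the worker healthy, and the timer is left out because its callback always returns early while the file is unchanged.

```lua
-- Hypothetical model of init_worker() with and without the proposed fix.
local INHERITED_MTIME = 1736469707 -- made-up mtime read by the master at startup

-- Runs init_worker() for every worker id and returns the resulting status table.
local function run_init_workers(worker_count, apply_fix)
    local status_report = {} -- stands in for ngx.shared["status-report"]
    for id = 0, worker_count - 1 do
        -- Each worker inherits the master's parsed config after fork.
        local apisix_yaml, apisix_yaml_mtime = { routes = {} }, INHERITED_MTIME

        local function update_config(conf, mtime)
            apisix_yaml, apisix_yaml_mtime = conf, mtime
            status_report[id] = true -- sync_status_to_shdict(true)
        end

        status_report[id] = false -- sync_status_to_shdict(false)
        -- ngx.timer.every(1, read_apisix_config) is omitted: with an
        -- unchanged file its callback always returns early.
        if apply_fix and apisix_yaml then
            update_config(apisix_yaml, apisix_yaml_mtime)
        end
    end
    return status_report
end

-- Counts worker ids whose status entry is exactly true.
local function count_healthy(status_report, worker_count)
    local n = 0
    for id = 0, worker_count - 1 do
        if status_report[id] == true then
            n = n + 1
        end
    end
    return n
end

local WORKERS = 12
local healthy_before = count_healthy(run_init_workers(WORKERS, false), WORKERS)
local healthy_after = count_healthy(run_init_workers(WORKERS, true), WORKERS)
print("healthy workers without fix:", healthy_before)
print("healthy workers with fix:   ", healthy_after)
```

In the model, no worker reports healthy without the extra call and all 12 do with it, consistent with the container results described under Verified Fix.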
Verified Fix
I patched the code in a running container and confirmed:
- All 12 workers call update_config() in the init_worker_by_lua* context.
- /status/ready returns {"status":"ok"} with HTTP 200.
- The Docker health check passes (the container shows "healthy" status).
- Routes continue working correctly.
Impact
This bug affects production deployments using:
- Kubernetes readiness probes with file-driven standalone mode
- Docker health checks
- Load balancers that depend on the /status/ready endpoint
- Any container orchestration that renders config files before starting APISIX
The health check always fails, preventing proper deployment orchestration, even though APISIX is functioning correctly and serving traffic.
Additional Context
The bug is specific to the timing of when the configuration file is created relative to APISIX startup. If the file is created and never modified, workers never get marked as healthy. This is a common pattern in containerized deployments where entrypoint scripts render configuration from environment variables before starting the main process.