
Alpine (musl) based haproxy ingress images performance issue #541

Open

Description

@amorozkin

Could you please consider adding an option to use non-alpine based haproxy ingress images?

Alpine's pthread implementation has a drastic CPU overhead (internals/details can be found here: https://stackoverflow.com/questions/73807754/how-one-pthread-waits-for-another-to-finish-via-futex-in-linux/73813907#73813907).
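
For reference, here is a minimal, hypothetical micro-benchmark (not taken from haproxy, file name pingpong.c is just an example) that ping-pongs two threads over a pthread mutex/condvar pair. Building it once against glibc (plain gcc) and once against musl (musl-gcc, or inside an Alpine container) and running each binary under `strace -cf` gives a rough, reproducible way to compare how many futex syscalls each libc issues for the same synchronization pattern:

```c
/* Hypothetical micro-benchmark: two threads take turns via a
 * pthread mutex + condition variable, ITERATIONS times each.
 * Build: gcc -O2 -pthread pingpong.c -o pingpong
 * Then compare: strace -cf ./pingpong (glibc vs musl build). */
#include <pthread.h>
#include <stdio.h>

#define ITERATIONS 100000

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static long turn;   /* 0: main's turn, 1: worker's turn */

static void *worker(void *arg)
{
    (void)arg;
    for (long i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&lock);
        while (turn != 1)
            pthread_cond_wait(&cond, &lock);   /* wait for main */
        turn = 0;
        pthread_cond_signal(&cond);            /* hand turn back */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t tid;
    pthread_create(&tid, NULL, worker, NULL);

    for (long i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&lock);
        turn = 1;
        pthread_cond_signal(&cond);            /* wake the worker */
        while (turn != 0)
            pthread_cond_wait(&cond, &lock);   /* wait for reply */
        pthread_mutex_unlock(&lock);
    }

    pthread_join(tid, NULL);
    puts("done");
    return 0;
}
```

The `futex` line of the `strace -cf` summary for each build can then be compared in the same way as the haproxy samples below.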

Here are two strace summaries for the same load profile (25K RPS via 3 haproxy ingress pods), each collected over the same length of time (about 1 minute):
1. GLIBC-based haproxy:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 47.55  147.946790          53   2787268    880506 recvfrom
 26.33   81.933249          88    929414           sendto
 16.81   52.295309          54    962217           epoll_ctl
  3.37   10.486387          51    203040           getpid
  1.48    4.597493          51     90048           clock_gettime
  1.41    4.380619          97     44924           epoll_wait
  0.64    2.003053          54     36497           getsockopt
  0.56    1.731618          97     17829           close
  0.51    1.582058          56     28118           setsockopt
  0.39    1.207813          66     18144      8945 accept4
  0.38    1.188416         116     10223     10223 connect
  0.29    0.903808          88     10223           socket
  0.18    0.548180          53     10223           fcntl
  0.10    0.299368          79      3785      1130 futex
  0.00    0.011658          60       193           timer_settime
  0.00    0.010546          54       193        30 rt_sigreturn
------ ----------- ----------- --------- --------- ----------------
100.00  311.126365               5152339    900834 total

2. MUSL-based haproxy:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 68.24  412.454997          96   4259899    419280 futex
 10.00   60.440537         120    502107           madvise
  8.74   52.833292         111    472438    121948 recvfrom
  4.22   25.477060         166    152913           sendto
  2.80   16.921311         107    157293           getpid
  2.26   13.680361         109    125062           epoll_ctl
  1.38    8.351141         119     69682           writev
  0.54    3.254861         106     30535           clock_gettime
  0.37    2.255775         148     15187           epoll_pwait
  0.34    2.033282         178     11419           close
  0.31    1.844610         117     15724      5964 accept4
  0.25    1.530881         110     13850           setsockopt
  0.25    1.509742         107     14001           getsockopt
  0.08    0.466851         157      2966           munmap
  0.06    0.392208         170      2294      2294 connect
  0.06    0.378519         107      3505           mmap
  0.05    0.287839         125      2294           socket
  0.04    0.234976         102      2294           fcntl
  0.00    0.014530          94       154           timer_settime
  0.00    0.014262          92       154        15 rt_sigreturn
  0.00    0.006613         143        46        23 read
  0.00    0.003571         148        24           write
  0.00    0.003377         241        14           shutdown
------ ----------- ----------- --------- --------- ----------------
100.00  604.390596               5853855    549524 total

As you can see, the MUSL-based build spends more than 60% of its traced time in futex system calls (FUTEX_WAKE_PRIVATE, to be exact).
As a result, CPU utilisation is more than twice as high for the same load profile, accompanied by spikes in the number of upstream sessions:
[screenshot: CPU utilisation and upstream session graphs]

Metadata

Labels: enhancement (New feature or request), investigation (more investigation needed on our side)
