The RecFlex paper points out that the embedding tables within a fused embedding bag collection can be heterogeneous: for example, they may differ in embedding dimension or in access pattern (e.g., one-hot vs. multi-hot). Applying the same code schedule to every table in the fused kernel can therefore lead to sub-optimal performance.
I’m wondering if there is any plan to support generating and compiling kernels at runtime, so that different tables can use different code schedules, for both inference and training?
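To make the request more concrete, here is a minimal sketch of the kind of per-table decision I have in mind. Everything in it is hypothetical: `TableSpec`, `pick_schedule`, the schedule names, and the thresholds are illustrative assumptions, not an existing API in this library and not the selection method from the RecFlex paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableSpec:
    """Hypothetical description of one table in a fused collection."""
    name: str
    embedding_dim: int
    avg_pooling_factor: float  # ~1.0 means one-hot, >1.0 means multi-hot


def pick_schedule(spec: TableSpec) -> str:
    """Pick a (hypothetical) schedule label for one table.

    The thresholds are arbitrary placeholders; a real system would
    presumably tune them or search over schedules per table.
    """
    if spec.avg_pooling_factor <= 1.0 and spec.embedding_dim <= 32:
        # One-hot lookup of a short row: little work per bag.
        return "subwarp_per_bag"
    if spec.avg_pooling_factor >= 32.0 and spec.embedding_dim >= 128:
        # Many long rows to reduce per bag: more parallelism per bag.
        return "block_per_bag"
    return "warp_per_bag"


def plan_schedules(tables: list[TableSpec]) -> dict[str, str]:
    """Map each table name to its chosen schedule label."""
    return {t.name: pick_schedule(t) for t in tables}


tables = [
    TableSpec("user_id", embedding_dim=16, avg_pooling_factor=1.0),
    TableSpec("clicked_items", embedding_dim=128, avg_pooling_factor=50.0),
    TableSpec("category", embedding_dim=64, avg_pooling_factor=3.0),
]
plan = plan_schedules(tables)
```

With a plan like this, a runtime code generator could emit a different code path per table inside the fused kernel instead of one schedule for all of them.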