How about move Yue auto generated anonymous function to upvalue with name prefix '__' #162

GokuHiki · 2024-03-08T04:36:55Z

Yue has some nice features and syntax that make it a joy to work with, such as existence ?, nil coalescing ??, backcalls, and destructuring varargs. But it also has a big drawback that makes it a bad choice for me when writing performance-sensitive code: it automatically creates a new function every time the enclosing function is called, which is a big performance issue. So I tend to avoid these features as much as possible.

It is a big shame that I cannot use these features when they are so nice to have, so I would like to propose a fix.
We can move Yue's auto-generated anonymous functions to upvalues named with the prefix '__', in the same scope as the parent function, so that a new function is not created every time the parent is called. This should give better performance.

In many cases, we could pass the variables a closure uses as function parameters and get the modified variables back as return values.
Let's take a look at the example below to see what I mean.

buff_strength = (char, item) ->
 item.buffer.strength? char.stats.strength?::ref()

Compiles to this Lua by default:

local buff_strength
buff_strength = function(char, item)
  local _obj_0 = item.buffer.strength
  if _obj_0 ~= nil then
    return _obj_0((function()
      local _obj_1 = char.stats.strength
      if _obj_1 ~= nil then
        return _obj_1:ref()
      end
      return nil
    end)())
  end
  return nil
end

It could instead compile to Lua without the inner function:

local __buff_strength__stub_0 = function(char)
  local _obj_1 = char.stats.strength
  if _obj_1 ~= nil then
    return _obj_1:ref()
  end
  return nil
end
local buff_strength
buff_strength = function(char, item)
  local _obj_0 = item.buffer.strength
  if _obj_0 ~= nil then
    return _obj_0(__buff_strength__stub_0(char))
  end
  return nil
end

Another more complex example:

exe_func = (func, env) ->
  ok, ... = try
    debug_env_before(env)
    func(env)
    debug_env_after(env)
  catch ex
    -- only access ex
    error ex
    return ex
  if ok
    return ...
  else
    os.exit(1)

To Lua with poor performance:

local exe_func
exe_func = function(func, env)
  return (function(_arg_0, ...)
    local ok = _arg_0
    if ok then
      return ...
    else
      return os.exit(1)
    end
  end)(xpcall(function()
    debug_env_before(env)
    func(env)
    return debug_env_after(env)
  end, function(ex)
    error(ex)
    return ex
  end))
end

To Lua with better performance:

local __exe_func__stub_0 = function(_arg_0, ...)
  local ok = _arg_0
  if ok then
    return ...
  else
    return os.exit(1)
  end
end
local __exe_func__stub_1 = function(func, env)
  debug_env_before(env)
  func(env)
  return debug_env_after(env)
end
local __exe_func__stub_2 = function(ex)
  error(ex)
  return ex
end
local exe_func
exe_func = function(func, env)
  -- func and env are forwarded through xpcall's extra arguments
  return __exe_func__stub_0(xpcall(__exe_func__stub_1, __exe_func__stub_2, func, env))
end

I may have made some mistakes in my rush, but I hope you get the idea.

This solution does not work in all cases, but at least it will save a lot of performance in some cases, and I think it is worth trying.
Thanks and regards.

pigpigyyy · 2024-03-19T09:24:17Z

It is a brilliant idea! It seems the TypeScriptToLua compiler does similar optimizations too.
I just tried implementing this feature and ran into a few more issues while coding.

Since we can alter the environment table for a function block to access different global variables, we have to pass those accessed global variables to the upvalue functions. For example:

f = ->
  func if cond
    print 123
    true
  else
    false

compiles to:

local _anon_func_0 = function(cond, print)
  if cond then
    print(123)
    return true
  else
    return false
  end
end
local f
f = function()
  -- passing the accessed global variable "print" from the call site
  return func(_anon_func_0(cond, print))
end
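
To illustrate why the global has to be passed from the call site, here is a minimal sketch, assuming Lua 5.2 or later, where a block can shadow _ENV. The names hoisted_global, hoisted_param and fake_print are hypothetical and only for illustration; this is not the compiler's actual output.

```lua
-- Hypothetical hoisted helper that looks up `print` itself: it is compiled
-- against the chunk-level _ENV, so it never sees an _ENV shadowed inside f.
local function hoisted_global()
  return print
end

-- Hypothetical hoisted helper that receives `print` from the call site.
local function hoisted_param(print)
  return print
end

local function f()
  local fake_print = function() end
  -- Shadow the environment for the rest of this block (Lua 5.2+ only).
  local _ENV = { print = fake_print }
  -- An inline anonymous function resolves `print` through the local _ENV.
  local inline = (function() return print end)()
  return inline == fake_print,
         hoisted_global() == fake_print,
         hoisted_param(print) == fake_print
end

local inline_ok, global_ok, param_ok = f()
```

Under these assumptions only the parameterized helper sees the same print as the inline function, which is consistent with passing the accessed globals as arguments.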

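When the formerly generated anonymous function contains code that creates new closures capturing local variables declared outside of it, we can no longer optimize it out. Compare the two cases below. In the first one the captured variable lives inside the generated function, so it can still be optimized: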

onEvent "start", ->
  -- the "with" expression below, which generates an anonymous function, can be optimized
  gameScene\addChild with ScoreBoard!
    gameScore = 100
    \schedule (deltaTime) ->
      .updateScore gameScore

compiles to:

local _anon_func_0 = function(ScoreBoard)
  local _with_0 = ScoreBoard()
  local gameScore = 100
  _with_0:schedule(function(deltaTime)
    return _with_0.updateScore(gameScore)
  end)
  return _with_0
end
onEvent("start", function()
  return gameScene:addChild(_anon_func_0(ScoreBoard))
end)

But in another case:

onEvent "start", ->
  gameScore = 100
  -- the "with" expression below can not be optimized due to capturing the upvalue "gameScore"
  gameScene\addChild with ScoreBoard!
    \schedule (deltaTime) ->
      .updateScore gameScore

compiles to:

onEvent("start", function()
  local gameScore = 100
  return gameScene:addChild((function()
    local _with_0 = ScoreBoard()
    _with_0:schedule(function(deltaTime)
      return _with_0.updateScore(gameScore)
    end)
    return _with_0
  end)())
end)

The try expression case can only be compiled this way:

-- the case to optimize
exe_func = (func, env) ->
  ok, ... = try
    debug_env_before(env)
    func(env)
    debug_env_after(env)
  catch ex
    -- accessed both 'ex' and 'error'
    error ex
    return ex
  if ok
    return ...
  else
    os.exit(1)

compiles to:

local _anon_func_0 = function(os, _arg_0, ...)
  do
    local ok = _arg_0
    if ok then
      return ...
    else
      return os.exit(1)
    end
  end
end
local _anon_func_1 = function(debug_env_after, debug_env_before, env, func)
  do
    debug_env_before(env)
    func(env)
    return debug_env_after(env)
  end
end
local exe_func
exe_func = function(func, env)
  -- there is no way to pass the global variable 'error' to the handler,
  -- so we have to keep the anonymous callback function below
  return _anon_func_0(os, xpcall(_anon_func_1, function(ex)
    error(ex)
    return ex
  end, debug_env_after, debug_env_before, env, func))
end
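
The limitation on the handler can be checked with a small sketch, assuming Lua 5.2 or later (or LuaJIT), where xpcall forwards extra arguments to the called function. The names body and handler are hypothetical. The extra arguments reach the body, but the message handler is only ever called with the error value, so there is no slot through which a global such as error could be handed to it.

```lua
local handler_argc

-- Hypothetical message handler that records how many arguments it receives.
local function handler(...)
  handler_argc = select("#", ...)
  return ...
end

-- Hypothetical body that receives the extra xpcall arguments and then fails.
local function body(a, b)
  error("boom: " .. (a + b))
end

-- 2 and 3 are forwarded to body, not to handler.
local ok, msg = xpcall(body, handler, 2, 3)
```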

pigpigyyy · 2024-03-19T09:33:43Z

The test cases can be found here.
https://github.com/pigpigyyy/Yuescript/blob/main/spec/inputs/upvalue_func.yue
https://github.com/pigpigyyy/Yuescript/blob/main/spec/outputs/upvalue_func.lua

GokuHiki · 2024-03-22T04:58:19Z

Yes! There is the case where Yue does not call the function itself; if that function needs to access and modify the enclosing variables, then we cannot do anything about it.
Uhm... except in a very hacky way, by passing the variables to the upper scope inside a weak holder table. Well... haha, this has a lot of limitations, is very bad practice and, of course, is not acceptable!

SkyyySi · 2025-03-18T07:29:04Z

Maybe I missed something important, but I don't see this being anywhere near the problem it is presented as here.

I made a test script based on the code provided in the initial comment: https://gist.github.com/SkyyySi/dce94707e15c1f5c304285cf9c524abc

My results were that the outlined function, while slightly faster to be fair, didn't provide a significant difference:

~$ lua5.4 benchmark.lua
Without stub --> took 9.141s
With stub --> took 8.761s

~$ luajit benchmark.lua
Without stub --> took 0.318s
With stub --> took 0.295s

And that shouldn't really be a surprise because closures do not create a new function each time that they are evaluated! Lua code only gets compiled once before launching it. Subsequently, closures behave more like a struct, bundling a function pointer with an array of argument pointers. That's why it looks like a different object each time. But all you do is set a pointer.

Of course, now that this optimization is already here, you may as well keep it, but please always test your assumptions properly before optimizing.

pigpigyyy · 2025-03-18T14:48:34Z

Thank you, @SkyyySi, for sharing your benchmarking code and insights. It provided a solid starting point for understanding the performance differences between Lua closures and functions. However, after testing your code, I noticed a few issues and wanted to share some refined benchmarking results for clarity.

Observations on Your Benchmark Code

xpcall Overhead:
In your first benchmark, most of the execution time is spent on xpcall error handling due to a nil func variable being passed. This skews the results, making it difficult to fairly evaluate the performance differences between closures and functions.
Second Benchmark Relevance:
While your second benchmark provides some performance comparisons, it doesn't directly address scenarios relevant to the YueScript optimization case. Specifically, it doesn't isolate or emphasize the overhead differences introduced by closures versus functions in a controlled and targeted context.
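
The first observation can be reproduced with a minimal sketch. The names handler and handler_calls are hypothetical, and func is deliberately left nil to stand in for the missing function. Every call then goes through the error path and invokes the message handler, so a benchmark built this way mostly measures error handling.

```lua
local handler_calls = 0

-- Hypothetical message handler that counts how often it runs.
local function handler(err)
  handler_calls = handler_calls + 1
  return err
end

local func = nil -- stands in for the function that was never defined

local ok, err = xpcall(function()
  func({}) -- raises an "attempt to call" error because func is nil
end, handler)
```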

Revised Benchmark Code

To address these issues, I created updated benchmarks that focus more directly on the performance differences. Below are the two refined test cases:

Benchmark 1: Closure vs Function in a Controlled Context

-- benchmark.lua
local function benchmark(name, func, ...)
  io.write(name, " --> ")
  io.flush()
  collectgarbage()
  local time_start = os.clock()
  for i = 1, 100000 do
    func(...)
  end
  collectgarbage()
  local time_finish = os.clock()
  print(string.format(
    "took %.03fs",
    time_finish - time_start
  ))
end

local debug_env_before = function(env) end
local debug_env_after = function(env) end
local env_shared = {}
local func_shared = function(env)
  local result = 1
  for i = 1, 100 do
    result = result * i
  end
  return result
end

-- Using closure
do
  local exe_func
  exe_func = function(func, env)
    return (function(_arg_0, ...)
      local ok = _arg_0
      if ok then
        return ...
      else
        return --os.exit(1)
      end
    end)(xpcall(function()
      debug_env_before(env)
      func(env)
      return debug_env_after(env)
    end, function(ex)
      error(ex)
      return ex
    end))
  end
  benchmark("Using closure", exe_func, func_shared, env_shared)
end

-- Using function
do
  local __exe_func__stub_0 = function(_arg_0, ...)
    local ok = _arg_0
    if ok then
      return ...
    else
      return --os.exit(1)
    end
  end
  local __exe_func__stub_1 = function(func, env)
    debug_env_before(env)
    func(env)
    return debug_env_after(env)
  end
  local __exe_func__stub_2 = function(ex)
    error(ex)
    return ex
  end
  local exe_func
  exe_func = function(func, env)
    return __exe_func__stub_0(xpcall(__exe_func__stub_1, __exe_func__stub_2, func, env))
  end
  benchmark("Using function", exe_func, func_shared, env_shared)
end

-- Results:
-- With Lua 5.4:
-- Using closure --> took 0.118s
-- Using function --> took 0.075s
--
-- With LuaJIT:
-- Using closure --> took 0.040s
-- Using function --> took 0.018s

Benchmark 2: Simplified Closure vs Function Comparison

-- benchmark2.lua
local function benchmark(name, func)
  io.write(name, " --> ")
  io.flush()
  collectgarbage()
  local time_start = os.clock()
  for i = 1, 100000 do
    func()
  end
  collectgarbage()
  local time_finish = os.clock()
  print(string.format(
    "took %.03fs",
    time_finish - time_start
  ))
end

local function using_closure()
  local result = 1
  for i = 1, 100 do
    result = (function()
      return result * i
    end)()
  end
  return result
end

local operation = function(acc, i)
  return acc * i
end
local function using_function()
  local result = 1
  for i = 1, 100 do
    result = operation(result, i)
  end
  return result
end

benchmark("Using closure", using_closure)
benchmark("Using function", using_function)

-- Results:
-- With Lua 5.4:
-- Using closure --> took 1.722s
-- Using function --> took 0.249s
--
-- With LuaJIT:
-- Using closure --> took 0.828s
-- Using function --> took 0.011s

Key Insights

Performance Differences:
- In both Lua 5.4 and LuaJIT, functions consistently outperform closures in these benchmarks.
- The performance gap is more pronounced in the second benchmark, where closures introduce additional overhead compared to functions.
Closures Behavior:
While closures in Lua behave more like a "struct" (combining a function pointer with an array of argument pointers), they are not free in terms of performance. They introduce overhead that can accumulate, especially in performance-critical code.
Optimization Context:
The YueScript optimization is valid and worthwhile if targeting performance-critical scenarios, especially when closures are used repeatedly in tight loops. However, as you pointed out, it's essential to test assumptions with targeted benchmarks before optimizing.

Conclusion

Thank you again for sharing your perspective! I hope this reply clarifies the differences and provides a more precise comparison. While the optimization may not always yield significant gains, it does have merit in specific contexts, particularly for performance-sensitive applications. Let me know if you have further questions or thoughts!

GokuHiki · 2025-03-18T16:43:46Z

Well... this was my problem, so let me share my view:

The performance gain you get from removing closures is not that significant in most cases. However, the performance loss from closures can be significant depending on your code context, especially in my case.
YueScript tends to create closures automatically, which is not a problem in most cases, but in certain situations it can lead to performance loss in real-time applications with in-game update loops or high-frequency function calls if you are not careful. So the fewer closures you use, the safer it is for your performance in the long run.
In my past experience, the problems I encountered with closures were mostly related to memory GC spikes, as I use a custom embedded Lua in a game engine. The app becomes laggier the longer it runs with closures in the game loop.

My Experience:

I avoid closures in tight loops, in-game update loops, or high-frequency function calls at all costs.
Closures, while more syntactically appealing, tend to introduce more hidden costs as the code base grows more complex.
Closures come with their own problems: memory leaks, GC spikes, etc. It is generally accepted that closures are more expensive than non-closures in most programming languages.
In game update loops, performance-critical functions, or high-frequency function calls, code that uses closures will have a hard time passing code review.

But in many cases closures cannot be avoided when YueScript compiles to Lua, so you have to deal with them as well. So let's see what a closure really is in Lua.

About the statement:
"Closures do not create a new function each time that they are evaluated"

As far as I know:
Lua DOES create a new closure instance every time a closure-creating code path is executed. While the bytecode for the function body is compiled only once (during initial code loading), each closure evaluation creates a new object containing:

Reference to the precompiled bytecode
Unique storage for upvalues (closed-over variables)
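
A minimal sketch of this behavior, using a hypothetical make_counter helper: each call evaluates the same function expression, yet yields a distinct function value with its own upvalue storage.

```lua
-- Hypothetical factory: the inner function body is compiled once,
-- but every call to make_counter creates a new closure over a fresh `n`.
local function make_counter()
  local n = 0
  return function()
    n = n + 1
    return n
  end
end

local a = make_counter()
local b = make_counter()

local distinct = (a ~= b) -- two separate closure objects
a()
a()
local b_first = b()       -- b has its own `n`, untouched by the calls to a
```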

Accurate Parts:

"Lua code only gets compiled once" - Correct (bytecode generation happens once)
"Bundling a function pointer with argument pointers" - Partially correct (it's more about upvalues than arguments)

Better Analogy:
Closures behave like objects containing:

A method (shared bytecode)
Instance variables (unique upvalues per closure instance)

Each time a closure is instantiated, Lua must:

Look up the function prototype reference
Create a new closure instance as a new 'function' type object
Allocate memory for upvalue storage and initialize the upvalue references
Initialize references to closed-over variables
Manage the upvalue lifecycle (Lua uses "upvalue joining" for efficiency)
...(and more, but it is not necessary to know, as Lua handles it automatically very well)
Clean up when no longer needed (garbage collection)

About Performance Impact:

Lua creates a new closure instance very, very fast, but it is not free, and it is more expensive than a simple call to an upvalue function.
Memory allocation: the biggest cost comes from new object allocation (which is still fast in Lua).
Cost scales with the number of upvalues and nested closures.
Avoiding closures in tight loops or high-frequency function calls, or using parameterized functions instead, can give a massive performance boost.
Closures in an in-game loop or in high-frequency function calls can lead to GC spikes due to allocation and GC overhead.
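
The allocation point above can be checked with a rough sketch that compares how much memory each variant allocates while the collector is stopped. The helpers measure, with_closure and with_hoisted are hypothetical names, and the exact figures depend on the Lua implementation, so only the relative difference is meaningful.

```lua
-- Hypothetical helper: returns the KB allocated by fn with the GC paused.
local function measure(fn)
  collectgarbage()        -- full collection to start from a clean state
  collectgarbage("stop")  -- pause the collector so allocations accumulate
  local before = collectgarbage("count")
  fn()
  local after = collectgarbage("count")
  collectgarbage("restart")
  return after - before
end

-- Plain function kept as an upvalue: created once, reused on every call.
local function add(a, b)
  return a + b
end

local function with_closure()
  local acc = 0
  for i = 1, 10000 do
    -- A new closure object is created on every iteration.
    acc = (function() return acc + i end)()
  end
  return acc
end

local function with_hoisted()
  local acc = 0
  for i = 1, 10000 do
    acc = add(acc, i)
  end
  return acc
end

local kb_closure = measure(with_closure)
local kb_hoisted = measure(with_hoisted)
print(string.format("closure: %.1f KB, hoisted: %.1f KB", kb_closure, kb_hoisted))
```

Both variants compute the same sum; the difference is only in how much garbage they leave behind for the collector.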

Example:

COUNT = 1
COUNT_STEP = 100000

benchmark = (name, func, ...) ->
  name = name or "<anonymous>"

  -- io.write(name, " --> ")
  -- io.flush()

  time_start = os.clock()

  for _ = 1, COUNT
    func(...)
  
  time_finish = os.clock()
  time_process = time_finish - time_start
  print(string.format(
    "%s --> took %.03fs",
    name,
    time_process
  ))
  return time_process


func_has_closure = () ->
  -- Yue automatically creates a closure here
  for i = 1, COUNT_STEP
    x, y, z = 1, 2, 3
    local res
    try
      res = x + y + z + i
    assert(res == x + y + z + i)

func_no_closure = () ->
  -- Yue automatically creates an upvalue function here
  for i = 1, COUNT_STEP
    x, y, z = 1, 2, 3
    _, res = try
      x + y + z + i
    assert(res == x + y + z + i)

time1 = benchmark("has_closure", func_has_closure)
time2 = benchmark("no_closure", func_no_closure)
print("VS: #{time1 / time2} time.")

-- Result with Unity3D+xlua
-- LUA: has_closure --> took 0.086s
-- LUA: no_closure --> took 0.016s
-- LUA: VS: 5.375 time.

Conclusion:

The impact of closures on performance is not that significant but still measurable, and it depends on context.
In a single benchmark it does not really matter much, but in my case, with a real-time game loop, the performance impact is significant. As I use xlua, it lags more and more the longer it runs with closures in the game loop. I think this is because of how xlua manages memory, as I need to call the GC manually from C#.
With vanilla Lua or LuaJIT, I don't think it will matter at all.
As a gamedev, the more performance I gain with minimal effort, the better. As people usually say, "If it ain't broke, don't fix it". But in my case, "Mosquito's meat is still meat".

pigpigyyy added a commit that referenced this issue Mar 18, 2024

try fixing issue #162.

afc8661
