
Implement FourierLayer for higher dimensions #31


Closed
pzimbrod wants to merge 8 commits into master from pzimbrod-FourierLayer-ND

Conversation

pzimbrod
Contributor

So far, FourierLayer only works in 1-D. This should be extended to more dimensions, ideally with minimal code repetition.

Replacing the standard implementation with a generated function can accomplish this.
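
A minimal sketch of the idea (hypothetical, not this PR's code): a @generated function can compute the set of transform dimensions from the input's dimensionality N at compile time, so one method body covers 1-D, 2-D and 3-D grids. A (channel, spatial..., batch) layout is assumed here.

# Hypothetical sketch: specialize on the input dimensionality N so the same
# code handles any number of spatial dimensions. Layout assumed to be
# (channel, spatial..., batch); not the implementation from this PR.
using FFTW

@generated function spectral_transform(x::AbstractArray{T,N}) where {T,N}
    dims = Tuple(2:N-1)          # spatial dimensions, computed at compile time
    return :(fft(x, $dims))      # spliced in as a constant, so no runtime branching
end

# e.g. spectral_transform(randn(Float32, 8, 32, 32, 16, 2)) transforms dims 2:4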

@codecov

codecov bot commented Feb 16, 2022

Codecov Report

Merging #31 (8e69b17) into master (9b16e02) will decrease coverage by 2.20%.
The diff coverage is 35.71%.

❗ Current head 8e69b17 differs from pull request most recent head 2e9d4f0. Consider uploading reports for the commit 2e9d4f0 to get more accurate results

@@            Coverage Diff             @@
##           master      #31      +/-   ##
==========================================
- Coverage   41.09%   38.88%   -2.21%     
==========================================
  Files           6        8       +2     
  Lines          73      108      +35     
==========================================
+ Hits           30       42      +12     
- Misses         43       66      +23     
Impacted Files            Coverage Δ
src/OperatorLearning.jl   100.00% <ø> (ø)
src/generated.jl          0.00% <0.00%> (ø)
src/FourierLayer.jl       43.75% <39.47%> (-1.42%) ⬇️
src/trainable.jl          100.00% <100.00%> (ø)


@pzimbrod
Contributor Author

Also, I ditched rfft for a regular fft for the sake of simplicity. But since this roughly doubles computation time and memory requirements, it should be changed later on.
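
For reference, a quick illustration of that cost difference (hypothetical shapes, not from this PR): for real input, rfft stores only the non-redundant half of the spectrum along the first transformed dimension.

# Rough illustration of the rfft vs. fft trade-off mentioned above
# (hypothetical shapes, channel-first layout):
using FFTW
x = randn(Float32, 8, 32, 32, 16, 2)   # (channel, x, y, z, batch)
size(fft(x, 2:4))                      # (8, 32, 32, 16, 2) -- full complex spectrum
size(rfft(x, 2:4))                     # (8, 17, 32, 16, 2) -- only 32÷2+1 = 17 modes along dim 2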

@ChrisRackauckas
Member

Also, I ditched rfft for a regular fft for the sake of simplicity. But since this roughly doubles computation time and memory requirements, it should be changed later on.

Can you capture this in an issue for a future performance improvement? Usually it's easier to remember these kinds of things when they are in the list.

@pzimbrod
Contributor Author

@ChrisRackauckas It seems that codecov complains about the new generated function not being tested. I imagine this can be fixed by taking care of #26 / including tests that call the layer on some data, right?

Can you capture this in an issue for a future performance improvement? Usually it's easier to remember these kinds of things when they are in the list.

Will do.

@ChrisRackauckas
Member

I imagine this can be fixed by taking care of #26 / including tests that call the layer on some data, right?

Maybe. @generated + coverage is a bit unreliable IIRC.
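
For reference, a minimal sketch of the kind of smoke test meant here (hypothetical; it assumes the N-D constructor from this branch and that the layer preserves the input shape when the channel counts match):

# Hypothetical smoke test calling the layer on data, roughly what #26 asks for.
# Assumes the N-D constructor from this branch and a shape-preserving output.
using OperatorLearning, Flux, Test
layer = FourierLayer(8, 8, (32, 32, 16), (2, 3, 4), σ)
x = randn(Float32, 8, 32, 32, 16, 2)
@test size(layer(x)) == size(x)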

@pzimbrod
Contributor Author

Training works fine on CPU and GPU with input randn(8,32,32,16,12) (a 3D grid of 32x32x16), but somehow memory requirements have shot up to the point where the burgers_FNO example - a rather simple 1D case - runs out of memory on a 24GB RTX A5000.

# examples/burgers_FNO.jl l.82
# Run on an RTX A5000 24GB
julia> train!(loss, parameters, train_loader, opt, cb = throttled_cb)
ERROR: a exception was thrown during kernel execution.
       Run Julia on debug level 2 for device stack traces.
ERROR: a exception was thrown during kernel execution.
       Run Julia on debug level 2 for device stack traces.
ERROR: a exception was thrown during kernel execution.
       Run Julia on debug level 2 for device stack traces.

[...] A whole lot more of those

ERROR: a exception was thrown during kernel execution.
       Run Julia on debug level 2 for device stack traces.
ERROR: a exception was thrown during kernel execution.
       Run Julia on debug level 2 for device stack traces.
ERROR: a exception was thrown during kernel execution.
       Run Julia on debug level 2 for device stack traces.
ERROR: KernelException: exception thrown during kernel execution on device NVIDIA RTX A5000
Stacktrace:
  [1] check_exceptions()
    @ CUDA ~/.julia/packages/CUDA/t62lT/src/compiler/exceptions.jl:34
  [2] nonblocking_synchronize
    @ ~/.julia/packages/CUDA/t62lT/lib/cudadrv/context.jl:329 [inlined]
  [3] device_synchronize()
    @ CUDA ~/.julia/packages/CUDA/t62lT/lib/cudadrv/context.jl:317
  [4] CUDA.CuModule(data::Vector{UInt8}, options::Dict{CUDA.CUjit_option_enum, Any})
    @ CUDA ~/.julia/packages/CUDA/t62lT/lib/cudadrv/module.jl:41
  [5] CuModule
    @ ~/.julia/packages/CUDA/t62lT/lib/cudadrv/module.jl:23 [inlined]
  [6] cufunction_link(job::GPUCompiler.CompilerJob, compiled::NamedTuple{(:image, :entry, :external_gvars), Tuple{Vector{UInt8}, String, Vector{String}}})
    @ CUDA ~/.julia/packages/CUDA/t62lT/src/compiler/execution.jl:451
  [7] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/1Ajz2/src/cache.jl:95
  [8] cufunction(f::GPUArrays.var"#broadcast_kernel#17", tt::Type{Tuple{CUDA.CuKernelContext, CUDA.CuDeviceMatrix{Float32, 1}, Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(+), Tuple{Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(*), Tuple{Float64, Base.Broadcast.Extruded{CUDA.CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(*), Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0}, Nothing, typeof(-), Tuple{Int64, Float64}}, Base.Broadcast.Extruded{CUDA.CuDeviceMatrix{ComplexF32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(conj), Tuple{Base.Broadcast.Extruded{CUDA.CuDeviceMatrix{ComplexF32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}}}}}, Int64}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA ~/.julia/packages/CUDA/t62lT/src/compiler/execution.jl:297
  [9] cufunction(f::GPUArrays.var"#broadcast_kernel#17", tt::Type{Tuple{CUDA.CuKernelContext, CUDA.CuDeviceMatrix{Float32, 1}, Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(+), Tuple{Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(*), Tuple{Float64, Base.Broadcast.Extruded{CUDA.CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(*), Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0}, Nothing, typeof(-), Tuple{Int64, Float64}}, Base.Broadcast.Extruded{CUDA.CuDeviceMatrix{ComplexF32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(conj), Tuple{Base.Broadcast.Extruded{CUDA.CuDeviceMatrix{ComplexF32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}}}}}, Int64}})
    @ CUDA ~/.julia/packages/CUDA/t62lT/src/compiler/execution.jl:291
 [10] macro expansion
    @ ~/.julia/packages/CUDA/t62lT/src/compiler/execution.jl:102 [inlined]
 [11] #launch_heuristic#274
    @ ~/.julia/packages/CUDA/t62lT/src/gpuarrays.jl:17 [inlined]
 [12] copyto!
    @ ~/.julia/packages/GPUArrays/umZob/src/host/broadcast.jl:65 [inlined]
 [13] copyto!
    @ ./broadcast.jl:913 [inlined]
 [14] materialize!
    @ ./broadcast.jl:871 [inlined]
 [15] materialize!
    @ ./broadcast.jl:868 [inlined]
 [16] apply!(o::ADAM, x::CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Δ::CUDA.CuArray{ComplexF32, 2, CUDA.Mem.DeviceBuffer})
    @ Flux.Optimise ~/.julia/packages/Flux/qAdFM/src/optimise/optimisers.jl:184
 [17] update!(opt::ADAM, x::CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, x̄::CUDA.CuArray{ComplexF32, 2, CUDA.Mem.DeviceBuffer})
    @ Flux.Optimise ~/.julia/packages/Flux/qAdFM/src/optimise/train.jl:26
 [18] update!(opt::ADAM, xs::Zygote.Params, gs::Zygote.Grads)
    @ Flux.Optimise ~/.julia/packages/Flux/qAdFM/src/optimise/train.jl:32
 [19] macro expansion
    @ ~/.julia/packages/Flux/qAdFM/src/optimise/train.jl:112 [inlined]
 [20] macro expansion
    @ ~/.julia/packages/Juno/n6wyj/src/progress.jl:134 [inlined]
 [21] train!(loss::Function, ps::Zygote.Params, data::Flux.Data.DataLoader{Tuple{CUDA.CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}}, Random._GLOBAL_RNG}, opt::ADAM; cb::Flux.var"#throttled#74"{Flux.var"#throttled#70#75"{Bool, Bool, typeof(evalcb), Int64}})
    @ Flux.Optimise ~/.julia/packages/Flux/qAdFM/src/optimise/train.jl:107
 [22] top-level scope
    @ REPL[21]:1
 [23] top-level scope
    @ ~/.julia/packages/CUDA/t62lT/src/initialization.jl:52

So this will need some additional work.
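
As an aside, the failing frame [16] above is Flux's ADAM apply! receiving a ComplexF32 gradient for a Float32 parameter array. A minimal, hypothetical broadcast of that shape shows the same failure mode:

# Hypothetical minimal example of the pattern visible in frame [16] above:
# storing ComplexF32 values into a Float32 CuArray throws (an InexactError on
# the CPU, surfacing as a KernelException on the GPU).
using CUDA
W = CUDA.zeros(Float32, 4, 4)      # real parameter, as in the trace
Δ = CUDA.rand(ComplexF32, 4, 4)    # complex gradient, as in the trace
W .+= Δ                            # expected to fail for nonzero imaginary parts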

@pzimbrod
Contributor Author

pzimbrod commented Feb 18, 2022

It seems that using @ein within @generated messes up type inference, since i𝔉, 𝔉 and 𝔏 all show type Any. Perhaps related to under-Peter/OMEinsum.jl#92.

julia> model = FourierLayer(8,8,(32,32,16),(2,3,4),σ)
FourierLayer with
Convolution path: (8, 8, 32, 32, 16)
Linear path: (8, 8)
Fourier modes: (2, 3, 4)
Activation: σ

julia> x = randn(Float32, 8, 32, 32, 16, 2);

julia> model(x);

julia> @code_warntype model(x)
MethodInstance for (::FourierLayer{typeof(σ), Array{ComplexF32, 5}, Matrix{Float32}, Array{ComplexF32, 5}, Array{Float32, 5}})(::Array{Float32, 5})
  from (a::FourierLayer)(x::AbstractArray{T, N}) where {T, N} in OperatorLearning at /home/zimbropa/OperatorLearning.jl/src/FourierLayer.jl:124
Static Parameters
  T = Float32
  N = 5
Arguments
  a::FourierLayer{typeof(σ), Array{ComplexF32, 5}, Matrix{Float32}, Array{ComplexF32, 5}, Array{Float32, 5}}
  x::Array{Float32, 5}
Locals
  i𝔉::Any
  𝔉::Any
  𝔏::Any
  xₚ::Array{Float32, 5}
  σ::typeof(sigmoid_fast)
  bₗ::Array{Float32, 5}
  bᵩ::Array{ComplexF32, 5}
  Wₗ::Matrix{Float32}
  Wᵩ::Array{ComplexF32, 5}
Body::Any
1 ─       (Wᵩ = Base.getproperty(a, :Wf))
│         (Wₗ = Base.getproperty(a, :Wl))
│         (bᵩ = Base.getproperty(a, :bf))
│         (bₗ = Base.getproperty(a, :bl))
│   %5  = Base.getproperty(a, :σ)::Core.Const(NNlib.σ)
│         (σ = OperatorLearning.fast_act(%5, x))
│   %7  = Core.tuple($(Expr(:static_parameter, 2)))::Core.Const((5,))
│   %8  = ($(Expr(:static_parameter, 2)) - 1)::Core.Const(4)
│   %9  = (1:%8)::Core.Const(1:4)
│   %10 = Base.Generator(Base.identity, %9)::Core.Const(Base.Generator{UnitRange{Int64}, typeof(identity)}(identity, 1:4))
│   %11 = Base.collect(%10)::Vector{Int64}
│   %12 = Core._apply_iterate(Base.iterate, OperatorLearning.tuple, %7, %11)::Core.PartialStruct(Tuple{Int64, Vararg{Int64}}, Any[Core.Const(5), Vararg{Int64}])
│         (xₚ = OperatorLearning.permutedims(x, %12))
│   %14 = Core.tuple((1, 2), (3, 2, 4, 5, 6))::Core.Const(((1, 2), (3, 2, 4, 5, 6)))
│   %15 = OMEinsum.EinCode(%14, (3, 1, 4, 5, 6))::OMEinsum.DynamicEinCode{Int64}
│   %16 = Core.tuple(Wₗ, xₚ)::Tuple{Matrix{Float32}, Array{Float32, 5}}
│         (𝔏 = OMEinsum.einsum(%15, %16))
│   %18 = Base.broadcasted(OperatorLearning.:+, 𝔏, bₗ)::Any
│         Base.materialize(%18)
│   %20 = Core.tuple((1, 2, 3, 4, 5), (6, 1, 3, 4, 5))::Core.Const(((1, 2, 3, 4, 5), (6, 1, 3, 4, 5)))
│   %21 = OMEinsum.EinCode(%20, (6, 2, 3, 4, 5))::OMEinsum.DynamicEinCode{Int64}
│   %22 = Wᵩ::Array{ComplexF32, 5}
│   %23 = xₚ::Array{Float32, 5}
│   %24 = (3:$(Expr(:static_parameter, 2)))::Core.Const(3:5)
│   %25 = Base.Generator(Base.identity, %24)::Core.Const(Base.Generator{UnitRange{Int64}, typeof(identity)}(identity, 3:5))
│   %26 = Base.collect(%25)::Vector{Int64}
│   %27 = OperatorLearning.fft(%23, %26)::Array{ComplexF32, 5}
│   %28 = Core.tuple(%22, %27)::Tuple{Array{ComplexF32, 5}, Array{ComplexF32, 5}}
│         (𝔉 = OMEinsum.einsum(%21, %28))
│   %30 = Base.broadcasted(OperatorLearning.:+, 𝔉, bᵩ)::Any
│         Base.materialize(%30)
│   %32 = 𝔉::Any
│   %33 = (3:$(Expr(:static_parameter, 2)))::Core.Const(3:5)
│   %34 = Base.Generator(Base.identity, %33)::Core.Const(Base.Generator{UnitRange{Int64}, typeof(identity)}(identity, 3:5))
│   %35 = Base.collect(%34)::Vector{Int64}
│         (i𝔉 = OperatorLearning.ifft(%32, %35))
│   %37 = σ::Core.Const(NNlib.sigmoid_fast)
│   %38 = 𝔏::Any
│   %39 = OperatorLearning.real(i𝔉)::Any
│   %40 = (%38 + %39)::Any
│   %41 = Base.broadcasted(%37, %40)::Any
│   %42 = Base.materialize(%41)::Any
│   %43 = (2:$(Expr(:static_parameter, 2)))::Core.Const(2:5)
│   %44 = Base.Generator(Base.identity, %43)::Core.Const(Base.Generator{UnitRange{Int64}, typeof(identity)}(identity, 2:5))
│   %45 = Base.collect(%44)::Vector{Int64}
│   %46 = Core.tuple(1)::Core.Const((1,))
│   %47 = Core._apply_iterate(Base.iterate, OperatorLearning.tuple, %45, %46)::Tuple{Vararg{Int64}}
│   %48 = OperatorLearning.permutedims(%42, %47)::Any
└──       return %48

Can't use Tullio.jl as an alternative since it won't work in generated functions at all - mcabbott/Tullio.jl#129
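
For comparison, a type-stable way to express just the channel-mixing (linear path) contraction without @ein is plain reshape + matmul; a sketch, assuming a channel-first (C_in, spatial..., batch) layout, not code from this PR:

# Hypothetical, type-stable alternative for the linear-path contraction
# W[o, i] * x[i, spatial..., b]; layout assumed (channel, spatial..., batch).
function channel_mix(W::AbstractMatrix, x::AbstractArray{T,N}) where {T,N}
    y = W * reshape(x, size(x, 1), :)              # contract over the channel dim
    return reshape(y, size(W, 1), size(x)[2:N]...) # restore (C_out, spatial..., batch)
end

# e.g. channel_mix(randn(Float32, 8, 8), randn(Float32, 8, 32, 32, 16, 2))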

pzimbrod mentioned this pull request Feb 21, 2022
pzimbrod closed this Jun 24, 2022
pzimbrod deleted the pzimbrod-FourierLayer-ND branch June 24, 2022 06:19