Implement FourierLayer for higher dimensions #31
Conversation
Codecov Report

```diff
@@            Coverage Diff             @@
##           master      #31      +/-   ##
==========================================
- Coverage   41.09%   38.88%    -2.21%
==========================================
  Files           6        8        +2
  Lines          73      108       +35
==========================================
+ Hits           30       42       +12
- Misses         43       66       +23
```
Also, I ditched […]
Can you capture this in an issue for a future performance improvement? Usually it's easier to remember these kinds of things when they are in the list.
@ChrisRackauckas It seems that codecov complains about the new generated function not being tested. I imagine this can be fixed by taking care of #26 / including tests that call the layer on some data, right?
Will do.

Maybe.
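A minimal smoke test along the lines suggested above might look like the following sketch (the constructor call mirrors the REPL example further down; the output type is an assumption):

```julia
using Test, NNlib, OperatorLearning

@testset "FourierLayer forward pass" begin
    # Constructor arguments copied from the REPL example below.
    model = FourierLayer(8, 8, (32, 32, 16), (2, 3, 4), σ)
    x = randn(Float32, 8, 32, 32, 16, 2)
    y = model(x)
    # Assumption: the layer returns a Float32 array of the same rank.
    @test y isa AbstractArray{Float32,5}
end
```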
Training works fine on CPU and GPU with input […]

```julia
# examples/burgers_FNO.jl l.82
# Run on an RTX A5000 24GB
julia> train!(loss, parameters, train_loader, opt, cb = throttled_cb)
ERROR: a exception was thrown during kernel execution.
Run Julia on debug level 2 for device stack traces.
ERROR: a exception was thrown during kernel execution.
Run Julia on debug level 2 for device stack traces.
ERROR: a exception was thrown during kernel execution.
Run Julia on debug level 2 for device stack traces.
[...] A whole lot more of those
ERROR: a exception was thrown during kernel execution.
Run Julia on debug level 2 for device stack traces.
ERROR: a exception was thrown during kernel execution.
Run Julia on debug level 2 for device stack traces.
ERROR: a exception was thrown during kernel execution.
Run Julia on debug level 2 for device stack traces.
ERROR: KernelException: exception thrown during kernel execution on device NVIDIA RTX A5000
Stacktrace:
[1] check_exceptions()
@ CUDA ~/.julia/packages/CUDA/t62lT/src/compiler/exceptions.jl:34
[2] nonblocking_synchronize
@ ~/.julia/packages/CUDA/t62lT/lib/cudadrv/context.jl:329 [inlined]
[3] device_synchronize()
@ CUDA ~/.julia/packages/CUDA/t62lT/lib/cudadrv/context.jl:317
[4] CUDA.CuModule(data::Vector{UInt8}, options::Dict{CUDA.CUjit_option_enum, Any})
@ CUDA ~/.julia/packages/CUDA/t62lT/lib/cudadrv/module.jl:41
[5] CuModule
@ ~/.julia/packages/CUDA/t62lT/lib/cudadrv/module.jl:23 [inlined]
[6] cufunction_link(job::GPUCompiler.CompilerJob, compiled::NamedTuple{(:image, :entry, :external_gvars), Tuple{Vector{UInt8}, String, Vector{String}}})
@ CUDA ~/.julia/packages/CUDA/t62lT/src/compiler/execution.jl:451
[7] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
@ GPUCompiler ~/.julia/packages/GPUCompiler/1Ajz2/src/cache.jl:95
[8] cufunction(f::GPUArrays.var"#broadcast_kernel#17", tt::Type{Tuple{CUDA.CuKernelContext, CUDA.CuDeviceMatrix{Float32, 1}, Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(+), Tuple{Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(*), Tuple{Float64, Base.Broadcast.Extruded{CUDA.CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(*), Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0}, Nothing, typeof(-), Tuple{Int64, Float64}}, Base.Broadcast.Extruded{CUDA.CuDeviceMatrix{ComplexF32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(conj), Tuple{Base.Broadcast.Extruded{CUDA.CuDeviceMatrix{ComplexF32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}}}}}, Int64}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ CUDA ~/.julia/packages/CUDA/t62lT/src/compiler/execution.jl:297
[9] cufunction(f::GPUArrays.var"#broadcast_kernel#17", tt::Type{Tuple{CUDA.CuKernelContext, CUDA.CuDeviceMatrix{Float32, 1}, Base.Broadcast.Broadcasted{Nothing, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, typeof(+), Tuple{Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(*), Tuple{Float64, Base.Broadcast.Extruded{CUDA.CuDeviceMatrix{Float32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(*), Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0}, Nothing, typeof(-), Tuple{Int64, Float64}}, Base.Broadcast.Extruded{CUDA.CuDeviceMatrix{ComplexF32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{2}, Nothing, typeof(conj), Tuple{Base.Broadcast.Extruded{CUDA.CuDeviceMatrix{ComplexF32, 1}, Tuple{Bool, Bool}, Tuple{Int64, Int64}}}}}}}}, Int64}})
@ CUDA ~/.julia/packages/CUDA/t62lT/src/compiler/execution.jl:291
[10] macro expansion
@ ~/.julia/packages/CUDA/t62lT/src/compiler/execution.jl:102 [inlined]
[11] #launch_heuristic#274
@ ~/.julia/packages/CUDA/t62lT/src/gpuarrays.jl:17 [inlined]
[12] copyto!
@ ~/.julia/packages/GPUArrays/umZob/src/host/broadcast.jl:65 [inlined]
[13] copyto!
@ ./broadcast.jl:913 [inlined]
[14] materialize!
@ ./broadcast.jl:871 [inlined]
[15] materialize!
@ ./broadcast.jl:868 [inlined]
[16] apply!(o::ADAM, x::CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Δ::CUDA.CuArray{ComplexF32, 2, CUDA.Mem.DeviceBuffer})
@ Flux.Optimise ~/.julia/packages/Flux/qAdFM/src/optimise/optimisers.jl:184
[17] update!(opt::ADAM, x::CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, x̄::CUDA.CuArray{ComplexF32, 2, CUDA.Mem.DeviceBuffer})
@ Flux.Optimise ~/.julia/packages/Flux/qAdFM/src/optimise/train.jl:26
[18] update!(opt::ADAM, xs::Zygote.Params, gs::Zygote.Grads)
@ Flux.Optimise ~/.julia/packages/Flux/qAdFM/src/optimise/train.jl:32
[19] macro expansion
@ ~/.julia/packages/Flux/qAdFM/src/optimise/train.jl:112 [inlined]
[20] macro expansion
@ ~/.julia/packages/Juno/n6wyj/src/progress.jl:134 [inlined]
[21] train!(loss::Function, ps::Zygote.Params, data::Flux.Data.DataLoader{Tuple{CUDA.CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{Float32, 3, CUDA.Mem.DeviceBuffer}}, Random._GLOBAL_RNG}, opt::ADAM; cb::Flux.var"#throttled#74"{Flux.var"#throttled#70#75"{Bool, Bool, typeof(evalcb), Int64}})
@ Flux.Optimise ~/.julia/packages/Flux/qAdFM/src/optimise/train.jl:107
[22] top-level scope
@ REPL[21]:1
[23] top-level scope
@ ~/.julia/packages/CUDA/t62lT/src/initialization.jl:52
```

So this will need some additional work.
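Frames [16] and [17] show `apply!`/`update!` receiving a `Float32` parameter with a `ComplexF32` gradient, so the kernel exception plausibly comes from that mixed-type broadcast rather than from the layer itself. A minimal sketch to test that hypothesis (hypothetical code, not from the PR):

```julia
using CUDA, Flux

W = CUDA.rand(Float32, 4, 4)      # real parameter, as in frame [16]
Δ = CUDA.rand(ComplexF32, 4, 4)   # complex gradient, as produced by the FFT path
opt = ADAM()

# Exercises apply!(::ADAM, ::CuArray{Float32}, ::CuArray{ComplexF32}),
# the same method combination that appears in the stack trace above.
Flux.Optimise.update!(opt, W, Δ)
```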
It seems that using […]

```julia
julia> model = FourierLayer(8,8,(32,32,16),(2,3,4),σ)
FourierLayer with
Convolution path: (8, 8, 32, 32, 16)
Linear path: (8, 8)
Fourier modes: (2, 3, 4)
Activation: σ

julia> x = randn(Float32, 8, 32, 32, 16, 2);

julia> model(x);

julia> @code_warntype model(x)
MethodInstance for (::FourierLayer{typeof(σ), Array{ComplexF32, 5}, Matrix{Float32}, Array{ComplexF32, 5}, Array{Float32, 5}})(::Array{Float32, 5})
from (a::FourierLayer)(x::AbstractArray{T, N}) where {T, N} in OperatorLearning at /home/zimbropa/OperatorLearning.jl/src/FourierLayer.jl:124
Static Parameters
T = Float32
N = 5
Arguments
a::FourierLayer{typeof(σ), Array{ComplexF32, 5}, Matrix{Float32}, Array{ComplexF32, 5}, Array{Float32, 5}}
x::Array{Float32, 5}
Locals
i𝔉::Any
𝔉::Any
𝔏::Any
xₚ::Array{Float32, 5}
σ::typeof(sigmoid_fast)
bₗ::Array{Float32, 5}
bᵩ::Array{ComplexF32, 5}
Wₗ::Matrix{Float32}
Wᵩ::Array{ComplexF32, 5}
Body::Any
1 ─ (Wᵩ = Base.getproperty(a, :Wf))
│ (Wₗ = Base.getproperty(a, :Wl))
│ (bᵩ = Base.getproperty(a, :bf))
│ (bₗ = Base.getproperty(a, :bl))
│ %5 = Base.getproperty(a, :σ)::Core.Const(NNlib.σ)
│ (σ = OperatorLearning.fast_act(%5, x))
│ %7 = Core.tuple($(Expr(:static_parameter, 2)))::Core.Const((5,))
│ %8 = ($(Expr(:static_parameter, 2)) - 1)::Core.Const(4)
│ %9 = (1:%8)::Core.Const(1:4)
│ %10 = Base.Generator(Base.identity, %9)::Core.Const(Base.Generator{UnitRange{Int64}, typeof(identity)}(identity, 1:4))
│ %11 = Base.collect(%10)::Vector{Int64}
│ %12 = Core._apply_iterate(Base.iterate, OperatorLearning.tuple, %7, %11)::Core.PartialStruct(Tuple{Int64, Vararg{Int64}}, Any[Core.Const(5), Vararg{Int64}])
│ (xₚ = OperatorLearning.permutedims(x, %12))
│ %14 = Core.tuple((1, 2), (3, 2, 4, 5, 6))::Core.Const(((1, 2), (3, 2, 4, 5, 6)))
│ %15 = OMEinsum.EinCode(%14, (3, 1, 4, 5, 6))::OMEinsum.DynamicEinCode{Int64}
│ %16 = Core.tuple(Wₗ, xₚ)::Tuple{Matrix{Float32}, Array{Float32, 5}}
│ (𝔏 = OMEinsum.einsum(%15, %16))
│ %18 = Base.broadcasted(OperatorLearning.:+, 𝔏, bₗ)::Any
│ Base.materialize(%18)
│ %20 = Core.tuple((1, 2, 3, 4, 5), (6, 1, 3, 4, 5))::Core.Const(((1, 2, 3, 4, 5), (6, 1, 3, 4, 5)))
│ %21 = OMEinsum.EinCode(%20, (6, 2, 3, 4, 5))::OMEinsum.DynamicEinCode{Int64}
│ %22 = Wᵩ::Array{ComplexF32, 5}
│ %23 = xₚ::Array{Float32, 5}
│ %24 = (3:$(Expr(:static_parameter, 2)))::Core.Const(3:5)
│ %25 = Base.Generator(Base.identity, %24)::Core.Const(Base.Generator{UnitRange{Int64}, typeof(identity)}(identity, 3:5))
│ %26 = Base.collect(%25)::Vector{Int64}
│ %27 = OperatorLearning.fft(%23, %26)::Array{ComplexF32, 5}
│ %28 = Core.tuple(%22, %27)::Tuple{Array{ComplexF32, 5}, Array{ComplexF32, 5}}
│ (𝔉 = OMEinsum.einsum(%21, %28))
│ %30 = Base.broadcasted(OperatorLearning.:+, 𝔉, bᵩ)::Any
│ Base.materialize(%30)
│ %32 = 𝔉::Any
│ %33 = (3:$(Expr(:static_parameter, 2)))::Core.Const(3:5)
│ %34 = Base.Generator(Base.identity, %33)::Core.Const(Base.Generator{UnitRange{Int64}, typeof(identity)}(identity, 3:5))
│ %35 = Base.collect(%34)::Vector{Int64}
│ (i𝔉 = OperatorLearning.ifft(%32, %35))
│ %37 = σ::Core.Const(NNlib.sigmoid_fast)
│ %38 = 𝔏::Any
│ %39 = OperatorLearning.real(i𝔉)::Any
│ %40 = (%38 + %39)::Any
│ %41 = Base.broadcasted(%37, %40)::Any
│ %42 = Base.materialize(%41)::Any
│ %43 = (2:$(Expr(:static_parameter, 2)))::Core.Const(2:5)
│ %44 = Base.Generator(Base.identity, %43)::Core.Const(Base.Generator{UnitRange{Int64}, typeof(identity)}(identity, 2:5))
│ %45 = Base.collect(%44)::Vector{Int64}
│ %46 = Core.tuple(1)::Core.Const((1,))
│ %47 = Core._apply_iterate(Base.iterate, OperatorLearning.tuple, %45, %46)::Tuple{Vararg{Int64}}
│ %48 = OperatorLearning.permutedims(%42, %47)::Any
└── return %48
```

Can't use […]
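Most of the `::Any`s above stem from the `collect`ed ranges passed to `permutedims` and `fft` (a `Vector{Int}` hides the tuple length from inference) and from `OMEinsum.DynamicEinCode`. One way to recover inference, sketched here with a hypothetical helper name, is to build the dimension arguments as compile-time tuples:

```julia
using FFTW

# Sketch: inferable dimension handling for an N-dimensional input.
function fourier_dims(x::AbstractArray{T,N}) where {T,N}
    perm  = (N, ntuple(identity, Val(N - 1))...)  # replaces tuple(N, collect(1:N-1)...)
    fdims = ntuple(i -> i + 2, Val(N - 2))        # replaces collect(3:N)
    xₚ = permutedims(x, perm)                     # inferred as Array{T,N}
    𝔉  = fft(xₚ, fdims)                           # inferred as Array{Complex{T},N}
    return xₚ, 𝔉
end
```

The remaining `::Any`s from `DynamicEinCode` would additionally need a statically typed contraction (e.g. OMEinsum's `ein"..."` string macro, which builds a `StaticEinCode`) or explicit products.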
So far, `FourierLayer` only works in 1-D. This should be extended to more dimensions, ideally with minimal code repetition. Replacing the standard implementation with a generated function can do this.
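As a rough illustration of that idea (a sketch with assumed names, not the package's actual implementation), a `@generated` function can compute the dimension-dependent tuples from `N` at compile time, giving each input rank its own specialized, type-stable method:

```julia
using FFTW

# Sketch: dimension-generic core of a Fourier layer via a generated function.
@generated function fourier_pass(x::AbstractArray{T,N}) where {T,N}
    perm  = (N, (1:N-1)...)   # batch-dim-first permutation, fixed once N is known
    fdims = Tuple(3:N)        # transform over the spatial dimensions
    quote
        xₚ = permutedims(x, $perm)
        𝔉  = fft(xₚ, $fdims)
        # ... weight contraction, bias and activation would go here ...
        return real(ifft(𝔉, $fdims))
    end
end

fourier_pass(randn(Float32, 8, 32, 32, 16, 2))  # specializes on N = 5
```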