@@ -46,6 +46,53 @@ function huber_loss(ŷ, y; agg=mean, δ=ofeltype(ŷ, 1))
    agg(((abs_error.^2) .* temp) .* x .+ δ*(abs_error .- x*δ) .* (1 .- temp))
end

+ """
+     label_smoothing(y::Union{Number, AbstractArray}, α; dims::Int=1)
+
+ Returns smoothed labels, meaning the confidence on label values is relaxed.
+
+ When `y` is given as a one-hot vector or a batch of one-hot vectors, it is calculated as
+
+     y .* (1 - α) .+ α / size(y, dims)
+
+ When `y` is given as a number or a batch of numbers for binary classification,
+ it is calculated as
+
+     y .* (1 - α) .+ α / 2
+
+ in which case the labels are squeezed towards `0.5`.
+
+ α is a number in the interval (0, 1) called the smoothing factor. The higher the
+ value of α, the larger the smoothing of `y`.
+
+ `dims` denotes the one-hot dimension, unless `dims=0`, which denotes the application
+ of label smoothing to binary distributions encoded in a single number.
+
+ Usage example:
+
+     sf = 0.1
+     y = onehotbatch([1, 1, 1, 0, 0], 0:1)
+     y_smoothed = label_smoothing(y, 2sf)
+     y_sim = y .* (1-2sf) .+ sf
+     y_dis = copy(y_sim)
+     y_dis[1,:], y_dis[2,:] = y_dis[2,:], y_dis[1,:]
+     @assert crossentropy(y_sim, y) < crossentropy(y_sim, y_smoothed)
+     @assert crossentropy(y_dis, y) > crossentropy(y_dis, y_smoothed)
+ """
+ function label_smoothing(y::Union{AbstractArray, Number}, α::Number; dims::Int=1)
+     if !(0 < α < 1)
+         throw(ArgumentError("α must be between 0 and 1"))
+     end
+     if dims == 0
+         y_smoothed = y .* (1 - α) .+ α*1//2
+     elseif dims == 1
+         y_smoothed = y .* (1 - α) .+ α*1//size(y, 1)
+     else
+         throw(ArgumentError("`dims` should be either 0 or 1"))
+     end
+     return y_smoothed
+ end
+
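A minimal usage sketch of the `label_smoothing` defined above, assuming it is reachable from the `Flux.Losses` module (as the surrounding docstrings suggest); the values are made up for illustration:

    using Flux
    using Flux.Losses: label_smoothing

    α = 0.2
    # One-hot labels (default dims=1, two classes): entries move from {0, 1} to {α/2, 1 - α/2}.
    y_hot = Flux.onehotbatch([1, 0], 0:1)
    label_smoothing(y_hot, α)                    # entries become 0.1 and 0.9

    # Plain binary labels with dims=0: each number is squeezed towards 0.5.
    label_smoothing([0, 1, 1, 0], α; dims=0)     # ≈ [0.1, 0.9, 0.9, 0.1]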
"""
    crossentropy(ŷ, y; dims=1, ϵ=eps(ŷ), agg=mean)
@@ -54,16 +101,20 @@ calculated as
    agg(-sum(y .* log.(ŷ .+ ϵ); dims=dims))

- Cross entropy is tipically used as a loss in multi-class classification,
+ Cross entropy is typically used as a loss in multi-class classification,
in which case the labels `y` are given in a one-hot format.
`dims` specifies the dimension (or the dimensions) containing the class probabilities.
The prediction `ŷ` is supposed to sum to one across `dims`,
as would be the case with the output of a [`softmax`](@ref) operation.

+ Use [`label_smoothing`](@ref) to smooth the true labels as preprocessing before
+ computing the loss.
+
Use of [`logitcrossentropy`](@ref) is recommended over `crossentropy` for
numerical stability.

- See also: [`Flux.logitcrossentropy`](@ref), [`Flux.binarycrossentropy`](@ref), [`Flux.logitbinarycrossentropy`](@ref)
+ See also: [`logitcrossentropy`](@ref), [`binarycrossentropy`](@ref), [`logitbinarycrossentropy`](@ref),
+ [`label_smoothing`](@ref)
"""
function crossentropy(ŷ, y; dims=1, agg=mean, ϵ=epseltype(ŷ))
    agg(.-sum(xlogy.(y, ŷ .+ ϵ); dims=dims))
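A short, illustrative sketch of the preprocessing flow described in the docstring: smooth the one-hot targets first, then compute `crossentropy` on softmax outputs (shapes and values here are made up, and the imports assume the `Flux.Losses` module):

    using Flux
    using Flux.Losses: crossentropy, label_smoothing

    ŷ = softmax(randn(Float32, 3, 5))            # predicted class probabilities, 3 classes × 5 samples
    y = Flux.onehotbatch([1, 2, 3, 1, 2], 1:3)   # hard one-hot targets
    loss = crossentropy(ŷ, label_smoothing(y, 0.1))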
@@ -72,15 +123,19 @@
"""
    logitcrossentropy(ŷ, y; dims=1, agg=mean)
- Return the crossentropy computed after a [`Flux.logsoftmax`](@ref) operation;
+ Return the crossentropy computed after a [`logsoftmax`](@ref) operation;
calculated as

    agg(.-sum(y .* logsoftmax(ŷ; dims=dims); dims=dims))
+ Use [`label_smoothing`](@ref) to smooth the true labels as preprocessing before
+ computing the loss.
+
`logitcrossentropy(ŷ, y)` is mathematically equivalent to
- [`Flux.Losses.crossentropy(softmax(ŷ), y)`](@ref) but it is more numerically stable.
+ [`crossentropy(softmax(ŷ), y)`](@ref) but it is more numerically stable.
+

- See also: [`Flux.Losses.crossentropy`](@ref), [`Flux.Losses.binarycrossentropy`](@ref), [`Flux.Losses.logitbinarycrossentropy`](@ref)
+ See also: [`crossentropy`](@ref), [`binarycrossentropy`](@ref), [`logitbinarycrossentropy`](@ref), [`label_smoothing`](@ref)
"""
function logitcrossentropy(ŷ, y; dims=1, agg=mean)
    agg(.-sum(y .* logsoftmax(ŷ; dims=dims); dims=dims))
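A small sketch of the equivalence claimed in the docstring; the two results agree up to the `ϵ` that `crossentropy` adds for stability (inputs are illustrative, imports assume `Flux.Losses`):

    using Flux
    using Flux.Losses: logitcrossentropy, crossentropy

    ŷ = randn(Float32, 3, 5)                     # raw logits, no softmax applied
    y = Flux.onehotbatch([1, 2, 3, 1, 2], 1:3)
    logitcrossentropy(ŷ, y) ≈ crossentropy(softmax(ŷ), y)   # ≈ true, and numerically safer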
@@ -97,9 +152,13 @@ The `ϵ` term provides numerical stability.
Typically, the prediction `ŷ` is given by the output of a [`sigmoid`](@ref) activation.

+ Use [`label_smoothing`](@ref) to smooth the `y` value as preprocessing before
+ computing the loss.
+
Use of `logitbinarycrossentropy` is recommended over `binarycrossentropy` for numerical stability.

- See also: [`Flux.Losses.crossentropy`](@ref), [`Flux.Losses.logitcrossentropy`](@ref), [`Flux.Losses.logitbinarycrossentropy`](@ref)
+ See also: [`crossentropy`](@ref), [`logitcrossentropy`](@ref), [`logitbinarycrossentropy`](@ref),
+ [`label_smoothing`](@ref)
"""
function binarycrossentropy(ŷ, y; agg=mean, ϵ=epseltype(ŷ))
    agg(@.(-xlogy(y, ŷ+ϵ) - xlogy(1-y, 1-ŷ+ϵ)))
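An illustrative sketch combining this loss with `label_smoothing(...; dims=0)` on plain binary targets, as the docstring suggests (values are made up, imports assume `Flux.Losses`):

    using Flux
    using Flux.Losses: binarycrossentropy, label_smoothing

    ŷ = σ.(randn(Float32, 8))                    # probabilities from a sigmoid output
    y = rand(0:1, 8)                             # hard 0/1 labels
    loss = binarycrossentropy(ŷ, label_smoothing(y, 0.1; dims=0))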
@@ -111,10 +170,12 @@ end
    logitbinarycrossentropy(ŷ, y; agg=mean)

Mathematically equivalent to
- [`Flux.binarycrossentropy(σ(ŷ), y)`](@ref) but is more numerically stable.
+ [`binarycrossentropy(σ(ŷ), y)`](@ref) but is more numerically stable.
+
+ Use [`label_smoothing`](@ref) to smooth the `y` value as preprocessing before
+ computing the loss.
- See also: [`Flux.Losses.crossentropy`](@ref), [`Flux.Losses.logitcrossentropy`](@ref), [`Flux.Losses.binarycrossentropy`](@ref)
- ```
+ See also: [`crossentropy`](@ref), [`logitcrossentropy`](@ref), [`binarycrossentropy`](@ref), [`label_smoothing`](@ref)
"""
function logitbinarycrossentropy(ŷ, y; agg=mean)
    agg(@.((1-y)*ŷ - logσ(ŷ)))
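A matching sketch of the equivalence stated above; the results agree up to the `ϵ` used inside `binarycrossentropy` (inputs are illustrative, imports assume `Flux.Losses`):

    using Flux
    using Flux.Losses: logitbinarycrossentropy, binarycrossentropy

    ŷ = randn(Float32, 8)        # raw logits
    y = rand(0:1, 8)
    logitbinarycrossentropy(ŷ, y) ≈ binarycrossentropy(σ.(ŷ), y)   # ≈ true, and numerically safer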