High level description
lm ignores the contrasts base setting for a categorical variable when that variable is interacted with another regressor.
MWE
using DataFrames, CategoricalArrays, GLM
df = DataFrame(y = rand(100), x1 = categorical(rand(1:3, 100)), x2 = categorical(rand(1:3, 100)))
lm(@formula(y ~ x1), df; contrasts = Dict(:x1 => DummyCoding(base = 2)))
lm(@formula(y ~ x1&x2), df; contrasts = Dict(:x1 => DummyCoding(base = 2)))
The first regression sets x1 = 2 as the base level, as expected. However, the second regression ignores that setting and instead uses the interaction of the highest levels of x1 and x2 as the base level. One could always re-normalize to the desired base level afterwards, but shouldn't there be an easy way to set the base level in the latter case too?
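To make the difference easier to see, here is a minimal sketch that prints the coefficient tables of both fits side by side. It only reuses the calls from the MWE plus coeftable, which GLM exports. The names contr, m_main and m_int are mine and purely illustrative, and I have left out the expected output because the exact table depends on the package versions.

```julia
using DataFrames, CategoricalArrays, GLM

df = DataFrame(y = rand(100),
               x1 = categorical(rand(1:3, 100)),
               x2 = categorical(rand(1:3, 100)))

# Same contrasts specification as in the MWE: level 2 of x1 as the base.
contr = Dict(:x1 => DummyCoding(base = 2))

m_main = lm(@formula(y ~ x1), df; contrasts = contr)       # main effect only
m_int  = lm(@formula(y ~ x1 & x2), df; contrasts = contr)  # interaction only

# The coefficient tables list each term name next to its estimate,
# which shows which levels were kept and which one acts as the reference.
println(coeftable(m_main))
println(coeftable(m_int))
```

In the first table there should be no row for level 2 of x1, while the second table shows how the interaction columns were actually coded.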