-
Notifications
You must be signed in to change notification settings - Fork 16
Description
Just came across this issue (thanks to @reifjulian for the prompt).
The TL;DR version is that using interaction term expansion --- i.e. f1*f2
, or even f1:f2
--- in the FE slot causes a major slowdown. The latter is faster than the former, but still significantly slower than creating the interaction outside of the felm()
call.
In the reprex below, I'm using an IV regression (adapted from the docs) since that's the use-case we've been troubleshooting. But I've tested a non-IV example and the effect is the same. From my limited testing, the relative disparities also appear to increase as the data get bigger.
PS. felm()
documentation warns users not to use *
expansion in the FE slot. But AFAIK this only applies in cases where both variables have not been specified as factors.
library(lfe)
#> Loading required package: Matrix
library(broom)
library(microbenchmark)
## Create the dataset (riffing off of the felm help documentation)
set.seed(141556)
n = 2e5
# Covariates
x1 = rnorm(n)
x2 = rnorm(n)
# Individuals and firms
id = factor(sample(200,n,replace=TRUE))
firm = factor(sample(130,n,replace=TRUE))
# Effects for them
id.eff = rnorm(nlevels(id))
firm.eff = rnorm(nlevels(firm))
# Left hand side
u = rnorm(n)
y = x1 + 0.5*x2 + id.eff[id] + firm.eff[firm] + u
# Example with 'reverse causation' (IV regression)
# Q and W are instrumented by x3 and the factor x4.
x3 = rnorm(n)
x4 = sample(12,n,replace=TRUE)
Q = 0.3*x3 + x1 + 0.2*x2 + id.eff[id] + 0.3*log(x4) - 0.3*y + rnorm(n,sd=0.3)
W = 0.7*x3 - 2*x1 + 0.1*x2 - 0.7*id.eff[id] + 0.8*cos(x4) - 0.2*y+ rnorm(n,sd=0.6)
# Add them to the outcome variable
y = y + Q + W
## Create interaction exansion outside of the felm call
idfirm = factor(paste0(id, '_', firm))
## Temp functions for running the models (useful for microbenchmark)
est1 = function() felm(y ~ x1 + x2 | id * firm | (Q|W ~ x3 + factor(x4)) | id)
est2 = function() felm(y ~ x1 + x2 | id + firm + id:firm | (Q|W ~ x3 + factor(x4)) | id)
est3 = function() felm(y ~ x1 + x2 | id + firm + idfirm | (Q|W ~ x3 + factor(x4)) | id)
## Benchmark
microbenchmark(est1(), est2(), est3(), times = 1)
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> est1() 15284.549 15284.549 15284.549 15284.549 15284.549 15284.549 1
#> est2() 8296.269 8296.269 8296.269 8296.269 8296.269 8296.269 1
#> est3() 682.677 682.677 682.677 682.677 682.677 682.677 1
Again, the first two cases with internal expansion (especially est1) are much slower than est3, which creates the interaction outside of the felm()
call.
And just to confirm that they're yielding the same output:
all.equal(tidy(est1()), tidy(est2()))
#> [1] TRUE
all.equal(tidy(est1()), tidy(est3()))
#> [1] TRUE
Created on 2020-10-05 by the reprex package (v0.3.0)