Skip to content

Interaction expansion causes major slowdown #35

@grantmcdermott

Description

@grantmcdermott

Just came across this issue (thanks to @reifjulian for the prompt).

The TL;DR version is that using interaction term expansion --- i.e. f1*f2, or even f1:f2 --- in the FE slot causes a major slowdown. The latter is faster than the former, but still significantly slower than creating the interaction outside of the felm() call.

In the reprex below, I'm using an IV regression (adapted from the docs) since that's the use-case we've been troubleshooting. But I've tested a non-IV example and the effect is the same. From my limited testing, the relative disparities also appear to increase as the data get bigger.

PS. felm() documentation warns users not to use * expansion in the FE slot. But AFAIK this only applies in cases where both variables have not been specified as factors.

library(lfe)
#> Loading required package: Matrix
library(broom)
library(microbenchmark)

## Create the dataset (riffing off of the felm help documentation)

set.seed(141556)
n = 2e5
# Covariates
x1 = rnorm(n)
x2 = rnorm(n)
# Individuals and firms
id = factor(sample(200,n,replace=TRUE))
firm = factor(sample(130,n,replace=TRUE))
# Effects for them
id.eff = rnorm(nlevels(id))
firm.eff = rnorm(nlevels(firm))
# Left hand side
u = rnorm(n)
y = x1 + 0.5*x2 + id.eff[id] + firm.eff[firm] + u
# Example with 'reverse causation' (IV regression)
# Q and W are instrumented by x3 and the factor x4.
x3 = rnorm(n)
x4 = sample(12,n,replace=TRUE)
Q = 0.3*x3 + x1 + 0.2*x2 + id.eff[id] + 0.3*log(x4) - 0.3*y + rnorm(n,sd=0.3)
W = 0.7*x3 - 2*x1 + 0.1*x2 - 0.7*id.eff[id] + 0.8*cos(x4) - 0.2*y+ rnorm(n,sd=0.6)
# Add them to the outcome variable
y = y + Q + W

## Create interaction exansion outside of the felm call
idfirm = factor(paste0(id, '_', firm))

## Temp functions for running the models (useful for microbenchmark)

est1 = function() felm(y ~ x1 + x2 | id * firm | (Q|W ~ x3 + factor(x4)) | id)
est2 = function() felm(y ~ x1 + x2 | id + firm + id:firm | (Q|W ~ x3 + factor(x4)) | id)
est3 = function() felm(y ~ x1 + x2 | id + firm + idfirm | (Q|W ~ x3 + factor(x4)) | id)

## Benchmark

microbenchmark(est1(), est2(), est3(), times = 1)
#> Unit: milliseconds
#>    expr       min        lq      mean    median        uq       max neval
#>  est1() 15284.549 15284.549 15284.549 15284.549 15284.549 15284.549     1
#>  est2()  8296.269  8296.269  8296.269  8296.269  8296.269  8296.269     1
#>  est3()   682.677   682.677   682.677   682.677   682.677   682.677     1

Again, the first two cases with internal expansion (especially est1) are much slower than est3, which creates the interaction outside of the felm() call.

And just to confirm that they're yielding the same output:

all.equal(tidy(est1()), tidy(est2()))
#> [1] TRUE
all.equal(tidy(est1()), tidy(est3()))
#> [1] TRUE

Created on 2020-10-05 by the reprex package (v0.3.0)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions