Interaction expansion causes major slowdown

Just came across this issue (thanks to @reifjulian for the prompt). 

The **TL;DR** version is that using interaction term expansion --- i.e. `f1*f2`, or even `f1:f2` --- in the FE slot causes a major slowdown. The latter is faster than the former, but still significantly slower than creating the interaction outside of the `felm()` call.

In the reprex below, I'm using an IV regression (adapted from the docs) since that's the use-case we've been troubleshooting. But I've tested a non-IV example and the effect is the same. From my limited testing, the relative disparities also appear to increase as the data get bigger.

PS. `felm()` documentation warns users not to use `*` expansion in the FE slot. But AFAIK this only applies in cases where both variables have not been specified as factors.
  
    library(lfe)
    #> Loading required package: Matrix
    library(broom)
    library(microbenchmark)

    ## Create the dataset (riffing off of the felm help documentation)

    set.seed(141556)
    n = 2e5
    # Covariates
    x1 = rnorm(n)
    x2 = rnorm(n)
    # Individuals and firms
    id = factor(sample(200,n,replace=TRUE))
    firm = factor(sample(130,n,replace=TRUE))
    # Effects for them
    id.eff = rnorm(nlevels(id))
    firm.eff = rnorm(nlevels(firm))
    # Left hand side
    u = rnorm(n)
    y = x1 + 0.5*x2 + id.eff[id] + firm.eff[firm] + u
    # Example with 'reverse causation' (IV regression)
    # Q and W are instrumented by x3 and the factor x4.
    x3 = rnorm(n)
    x4 = sample(12,n,replace=TRUE)
    Q = 0.3*x3 + x1 + 0.2*x2 + id.eff[id] + 0.3*log(x4) - 0.3*y + rnorm(n,sd=0.3)
    W = 0.7*x3 - 2*x1 + 0.1*x2 - 0.7*id.eff[id] + 0.8*cos(x4) - 0.2*y+ rnorm(n,sd=0.6)
    # Add them to the outcome variable
    y = y + Q + W

    ## Create interaction exansion outside of the felm call
    idfirm = factor(paste0(id, '_', firm))

    ## Temp functions for running the models (useful for microbenchmark)

    est1 = function() felm(y ~ x1 + x2 | id * firm | (Q|W ~ x3 + factor(x4)) | id)
    est2 = function() felm(y ~ x1 + x2 | id + firm + id:firm | (Q|W ~ x3 + factor(x4)) | id)
    est3 = function() felm(y ~ x1 + x2 | id + firm + idfirm | (Q|W ~ x3 + factor(x4)) | id)

    ## Benchmark

    microbenchmark(est1(), est2(), est3(), times = 1)
    #> Unit: milliseconds
    #>    expr       min        lq      mean    median        uq       max neval
    #>  est1() 15284.549 15284.549 15284.549 15284.549 15284.549 15284.549     1
    #>  est2()  8296.269  8296.269  8296.269  8296.269  8296.269  8296.269     1
    #>  est3()   682.677   682.677   682.677   682.677   682.677   682.677     1

Again, the first two cases with internal expansion (especially est1) are much slower than est3, which creates the interaction outside of the `felm()` call. 

And just to confirm that they're yielding the same output:

    all.equal(tidy(est1()), tidy(est2()))
    #> [1] TRUE
    all.equal(tidy(est1()), tidy(est3()))
    #> [1] TRUE

<sup>Created on 2020-10-05 by the [reprex package](https://reprex.tidyverse.org) (v0.3.0)</sup>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Interaction expansion causes major slowdown #35

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Interaction expansion causes major slowdown #35

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions