This is not necessarily wrong, but I want to point out that using a ReLU here is not a very common choice as far as I know. It might not hurt anything, but if something does go wrong, this is one place to check.
Also: you can tie the input/output embedding matrices (i.e. use a single parameter instead of separate `self.embedding.weight` and `self.out` weights). This roughly halves the number of embedding parameters and might help a bit with overfitting. Note that you would still need the bias, which is included in the `self.out` layer.
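To make the tying idea concrete, here is a minimal numpy sketch (not the actual model code): one shared matrix `E` is used both to embed input tokens and, transposed, to project hidden states back to vocabulary logits, with a separate output bias. All names here are illustrative assumptions, not taken from the repository.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model = 10, 4

# One shared parameter serves as both the input embedding table and the
# output projection (weight tying). The output bias stays a separate parameter.
E = rng.normal(size=(vocab_size, d_model))  # tied embedding/projection matrix
b = np.zeros(vocab_size)                    # output bias, kept untied

def embed(token_ids):
    # Input side: look up rows of the shared matrix.
    return E[np.asarray(token_ids)]

def logits(hidden):
    # Output side: project back to the vocabulary with E^T, plus the bias.
    return hidden @ E.T + b

h = embed([1, 2, 3])      # shape (3, d_model)
scores = logits(h)        # shape (3, vocab_size)
```

In a PyTorch module the same effect is usually achieved by assigning `self.out.weight = self.embedding.weight` after construction, so both layers share one tensor.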