Hello,

Thank you for the code!

I have a trivial question: why are the linear projection layers computed in an indirect way? For example, the query projection is written as

```python
self.q_proj = nn.Linear(dim, all_head_dim, bias=False)
self.q_bias = nn.Parameter(torch.zeros(all_head_dim))
...
q = F.linear(input=x, weight=self.q_proj.weight, bias=self.q_bias)
```

Why not directly do

```python
self.q_proj = nn.Linear(dim, all_head_dim, bias=True)
...
q = self.q_proj(x)
```
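A minimal sketch that checks whether the two query-projection styles compute the same thing when they share the same weight and bias. It assumes small made-up sizes for `dim` and `all_head_dim` and a random input `x`. The second half shows one situation where a separately stored bias is convenient: a fused `qkv` projection whose key slice gets a fixed, non-learned zero bias. That motivation is an assumption based on a pattern seen in some BEiT-style ViT code and is not confirmed for this repository. The names `qkv` and `v_bias` are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
dim, all_head_dim = 8, 8      # hypothetical sizes, for illustration only
x = torch.randn(2, 5, dim)    # (batch, tokens, dim)

# Style 1: bias-free Linear plus a separately registered bias Parameter
q_proj = nn.Linear(dim, all_head_dim, bias=False)
q_bias = nn.Parameter(torch.randn(all_head_dim))
q_indirect = F.linear(input=x, weight=q_proj.weight, bias=q_bias)

# Style 2: one Linear that owns its bias, loaded with the same values
q_proj_direct = nn.Linear(dim, all_head_dim, bias=True)
with torch.no_grad():
    q_proj_direct.weight.copy_(q_proj.weight)
    q_proj_direct.bias.copy_(q_bias)
q_direct = q_proj_direct(x)

print(torch.allclose(q_indirect, q_direct))  # True

# Hypothetical fused variant: one weight matrix for q, k and v, but only
# q and v get learnable biases; the key bias is a constant zero tensor.
qkv = nn.Linear(dim, 3 * all_head_dim, bias=False)
v_bias = nn.Parameter(torch.zeros(all_head_dim))
qkv_bias = torch.cat(
    (q_bias, torch.zeros_like(v_bias, requires_grad=False), v_bias)
)
qkv_out = F.linear(input=x, weight=qkv.weight, bias=qkv_bias)
q, k, v = qkv_out.split(all_head_dim, dim=-1)
```

For a single projection the two styles are mathematically identical, so the difference is only in how the bias is stored and combined. Keeping the bias outside `nn.Linear` makes it easy to assemble a custom bias vector, such as one with no key bias, which a plain `nn.Linear(..., bias=True)` cannot express.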