Would the transformer architecture — the backbone of large language models (LLMs) like ChatGPT — still work if we replaced its neural networks with quantum unitary operations?
The answer is yes!
This repository contains two Quantum Transformer (QT) models, each in its own Jupyter notebook, that show how this works. For a detailed explanation, see the code-block descriptions in the notebooks or my blog post on this work. While early results on the Tiny Shakespeare dataset are modest, the approach is promising. The two models differ in how they replace the transformer's linear layers:
- **Interferometric QT**: replaces the classical linear layers with interferometric networks of phase shifters and beam splitters (realizing Fourier transforms); see the first sketch below.
- **Qubit-rotation QT**: replaces the classical linear layers with qubit-rotation networks built from single-qubit Ry rotations only; see the second sketch below.
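
For intuition, here is a minimal NumPy sketch of an interferometric layer: phase shifters act as per-mode phase multiplications and beam splitters as 2x2 rotations on adjacent mode pairs. This is an illustration only; the function and parameter names (`interferometric_layer`, `thetas`, `phis`) are assumptions, not the notebooks' actual API.

```python
import numpy as np

def beam_splitter(theta):
    """2x2 beam-splitter unitary mixing a pair of modes."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]], dtype=complex)

def interferometric_layer(x, thetas, phis):
    """Apply phase shifters, then beam splitters on adjacent mode pairs.

    x      : complex feature vector of even length n (one entry per mode)
    thetas : n//2 beam-splitter angles
    phis   : n phase-shift angles
    """
    x = x * np.exp(1j * phis)               # per-mode phase shifters
    for i, theta in enumerate(thetas):      # couple modes (2i, 2i+1)
        x[2*i:2*i+2] = beam_splitter(theta) @ x[2*i:2*i+2]
    return x

n = 4
rng = np.random.default_rng(0)
x = rng.normal(size=n) + 1j * rng.normal(size=n)
y = interferometric_layer(x, thetas=rng.uniform(0, np.pi, n//2),
                          phis=rng.uniform(0, 2*np.pi, n))
# The layer is unitary, so the norm of the feature vector is preserved.
assert np.isclose(np.linalg.norm(x), np.linalg.norm(y))
```

Because each phase shifter and beam splitter is unitary, the whole layer is unitary, which is the key property the QT models rely on.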

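And here is an equally minimal sketch of a qubit-rotation layer: one trainable Ry rotation per qubit, assembled into the full unitary via Kronecker products. Again, the names (`ry_layer`, `thetas`) are illustrative assumptions, not the notebooks' code.

```python
import numpy as np
from functools import reduce

def ry(theta):
    """Single-qubit Ry rotation (a real-valued unitary)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s],
                     [s,  c]])

def ry_layer(state, thetas):
    """Apply Ry(theta_i) to qubit i of an n-qubit state vector.

    state  : real vector of length 2**n
    thetas : n rotation angles (the layer's trainable parameters)
    """
    U = reduce(np.kron, [ry(t) for t in thetas])  # full 2^n x 2^n unitary
    return U @ state

n = 3
rng = np.random.default_rng(1)
state = np.zeros(2**n); state[0] = 1.0        # |000> input state
out = ry_layer(state, thetas=rng.uniform(0, np.pi, n))
assert np.isclose(np.linalg.norm(out), 1.0)   # unitarity preserves the norm
```

Since Ry matrices are real-valued, this layer keeps the state's amplitudes real, which makes it particularly cheap to simulate classically.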