TranslodeP2C is an AI-powered pseudocode-to-C++ conversion system.
Leveraging a Transformer-based seq2seq model,
it translates pseudocode descriptions into structured C++ programs.
The project includes preprocessing, vocabulary building, training,
and inference, with an interactive Streamlit UI.
- Transformer-based sequence-to-sequence model for code generation.
- Converts pseudocode to C++ using deep learning.
- Preprocessing and vocabulary management for structured learning.
- Training pipeline with customizable hyperparameters.
- Inference system with greedy decoding.
- Streamlit-based web UI for user-friendly interactions.
Ensure you have the following installed:
- Python 3.8+
- PyTorch
- Streamlit
- tqdm
Then set up the project:
- Clone the repository and change into its directory:
  git clone https://github.com/absarraashid3/translodep2c.git
  cd translodep2c
- Install dependencies: pip install -r requirements.txt
- Prepare your dataset (for example, the SPoC training split spoc-train-train.tsv used below) and place it in data/train/split/.
Convert the TSV training data into paired pseudocode-code format:
python src/preprocess.py --input_tsv "C:\Projects\GenAi\data\train\split\spoc-train-train.tsv" --output_txt "C:\Projects\GenAi\data\train_pairs.txt"
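The preprocessing step can be pictured with the minimal sketch below. It is not the code in src/preprocess.py. It assumes a SPoC-style TSV whose header includes `text` (pseudocode) and `code` columns, and it produces line-level pairs, skipping rows with no pseudocode (such as a lone closing brace). The function names `tsv_to_pairs` and `write_pairs` and the tab-separated output format are hypothetical.

```python
import csv
import io
from typing import List, Tuple


def tsv_to_pairs(tsv_text: str) -> List[Tuple[str, str]]:
    """Turn a SPoC-style TSV (header with 'text' and 'code' columns) into
    line-level (pseudocode, code) pairs. Assumed format, not the project's."""
    # QUOTE_NONE keeps double quotes in C++ string literals as plain characters.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    pairs = []
    for row in reader:
        text = (row.get("text") or "").strip()
        code = (row.get("code") or "").strip()
        if text and code:  # e.g. a lone "}" has no pseudocode, so it is skipped
            pairs.append((text, code))
    return pairs


def write_pairs(pairs: List[Tuple[str, str]], path: str) -> None:
    # Hypothetical output format: one "pseudocode<TAB>code" pair per line.
    with open(path, "w", encoding="utf-8") as f:
        for text, code in pairs:
            f.write(f"{text}\t{code}\n")
```

Pairing at the line level keeps the sketch simple; grouping rows by program before pairing would be an equally plausible design if the model is meant to translate whole programs at once.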
Build the source (pseudocode) and target (C++) vocabularies from the training pairs and save them as pickle files:
python src/vocab.py --pairs_file "C:\Projects\GenAi\data\train_pairs.txt" --src_vocab_file "src/src_vocab.pkl" --tgt_vocab_file "src/tgt_vocab.pkl"
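As a rough illustration of what a vocabulary file could contain, the sketch below maps whitespace-separated tokens to integer ids and pickles the mapping. It is an assumption, not src/vocab.py: the special tokens `<pad>`, `<sos>`, `<eos>`, `<unk>`, the `min_freq` cutoff, and the helper names are all hypothetical. The source and target vocabularies would each be built from their own side of the pairs.

```python
import pickle
from collections import Counter
from typing import Dict, Iterable, List

# Assumed special tokens; the project's own set may differ.
SPECIAL_TOKENS = ["<pad>", "<sos>", "<eos>", "<unk>"]


def build_vocab(sentences: Iterable[str], min_freq: int = 1) -> Dict[str, int]:
    """Map each token seen at least min_freq times to a unique integer id."""
    counts = Counter(tok for s in sentences for tok in s.split())
    vocab = {tok: i for i, tok in enumerate(SPECIAL_TOKENS)}
    # Most frequent first; ties broken alphabetically so ids are deterministic.
    for tok, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq and tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def encode(sentence: str, vocab: Dict[str, int]) -> List[int]:
    """Convert a sentence to ids, mapping unseen tokens to <unk>."""
    unk = vocab["<unk>"]
    return [vocab.get(tok, unk) for tok in sentence.split()]


def save_vocab(vocab: Dict[str, int], path: str) -> None:
    with open(path, "wb") as f:
        pickle.dump(vocab, f)
```

Reserving the low ids for special tokens makes padding and start/end markers easy to reference by a fixed id during training and decoding.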
Train the Transformer model for pseudocode-to-C++ conversion:
python src/train.py --pairs_file "C:\Projects\GenAi\data\train_pairs.txt" --src_vocab_file "src/src_vocab.pkl" --tgt_vocab_file "src/tgt_vocab.pkl" --epochs 10 --batch_size 8
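Seq2seq Transformers are commonly trained with teacher forcing. The sketch below shows one way a batch (for example, 8 pairs with --batch_size 8) could be prepared for that: sequences are padded to a common length, the decoder input is the target prefixed with `<sos>`, and the decoder target is the same sequence shifted by one and ended with `<eos>`. This is an assumption about the general technique, not the contents of src/train.py; the ids `PAD`, `SOS`, `EOS` and the function names are hypothetical.

```python
from typing import List, Tuple

# Assumed ids, matching a vocabulary that starts with <pad>, <sos>, <eos>.
PAD, SOS, EOS = 0, 1, 2

Batch = List[List[int]]


def pad(seqs: Batch, pad_id: int = PAD) -> Batch:
    """Right-pad every sequence to the length of the longest one."""
    width = max(len(s) for s in seqs)
    return [s + [pad_id] * (width - len(s)) for s in seqs]


def make_batch(src: Batch, tgt: Batch) -> Tuple[Batch, Batch, Batch]:
    """Return (encoder input, decoder input, decoder target) for teacher forcing."""
    enc_in = pad(src)
    dec_in = pad([[SOS] + t for t in tgt])   # decoder sees the gold prefix...
    dec_out = pad([t + [EOS] for t in tgt])  # ...and learns to predict the next token
    return enc_in, dec_in, dec_out
```

In a PyTorch training loop these lists would become tensors, and the padded target positions would typically be excluded from the loss, for example with `torch.nn.CrossEntropyLoss(ignore_index=PAD)`.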
Generate C++ code from input pseudocode:
python src/infer.py --model_checkpoint transformer_seq2seq.pt --src_vocab_file "src/src_vocab.pkl" --tgt_vocab_file "src/tgt_vocab.pkl" --pseudocode "read n print factorial of n"
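Inference uses greedy decoding. The loop below is a minimal, framework-free sketch of that idea, not the code in src/infer.py. Starting from `<sos>`, it asks the model for a score per target token, appends the single highest-scoring token, and stops at `<eos>` or after `max_len` steps. The `step` callable stands in for the trained model and, like the function name, is a hypothetical interface.

```python
from typing import Callable, List, Sequence

# step(src_ids, prefix) -> one score per target-vocabulary id (assumed interface)
StepFn = Callable[[Sequence[int], List[int]], List[float]]


def greedy_decode(step: StepFn, src_ids: Sequence[int], sos_id: int,
                  eos_id: int, max_len: int = 100) -> List[int]:
    """Greedy decoding: at each step keep only the highest-scoring token."""
    out = [sos_id]
    for _ in range(max_len):
        scores = step(src_ids, out)
        next_id = max(range(len(scores)), key=scores.__getitem__)  # argmax
        if next_id == eos_id:
            break
        out.append(next_id)
    return out[1:]  # drop the leading <sos>
```

With a PyTorch Transformer, `step` would run the encoder and decoder and return the logits for the last decoder position; the resulting ids are then mapped back to C++ tokens through the target vocabulary. Greedy decoding commits to one token per step, which is why beam search, which keeps several candidate prefixes, is listed as a planned improvement.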
Launch the Streamlit UI:
streamlit run src/app.py
Enter pseudocode in the browser UI to get auto-generated C++ code.
Planned improvements:
- Implement beam search decoding for better predictions.
- Fine-tune on data covering more programming languages.
- Optimize the model for faster inference.