PerAugy: Diversity Augmentation of Dynamic User Preference Data for Boosting Personalized Text Summarizers
This repository provides a step-by-step pipeline for preparing and processing the PENS dataset for sequential recommendation tasks using synthetic user interaction graphs (UIGs).
- PENS Dataset: Includes train, validation, test, and news sets.
Start by downloading the PENS dataset, which contains the following files:
- train.json
- val.json
- test.json
- news.json
Ensure these files are placed in the appropriate directory for further processing.
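Before running the pipeline, it can help to confirm that all four files are present. The following is a minimal sketch, not part of the repository: the helper name `missing_pens_files` and the `data/PENS` directory are assumptions chosen for illustration.

```python
from pathlib import Path

# File names as listed above.
EXPECTED_FILES = ("train.json", "val.json", "test.json", "news.json")


def missing_pens_files(data_dir):
    """Return the expected PENS files that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # "data/PENS" is a hypothetical location; point this at your own directory.
    missing = missing_pens_files("data/PENS")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All PENS files found.")
```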
Run the script:
python Scripts/PENS_augmentation.py
This script:
- Sorts user interactions by timestamp.
- Appends summary nodes from the test set.
- Generates a Seed UIG (user interaction graph) structure.
The following file will be created:
synthetic-original.csv: the initial synthetic UIG dataset.
Additionally, a corresponding summary dataset will be generated with metadata about the summary nodes for each synthetic user.
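The sorting and appending steps above can be illustrated with a small sketch. It is not the code in Scripts/PENS_augmentation.py. The function name `build_seed_trajectory`, the `(timestamp, news_id)` input layout, the `"d"`/`"s"` node labels, and the choice to place summary nodes at the end of the trajectory are all assumptions made for illustration.

```python
def build_seed_trajectory(interactions, summary_ids):
    """Order clicked-news events by time, then append summary nodes.

    interactions: list of (timestamp, news_id) pairs (assumed layout).
    summary_ids: summary-node identifiers for one user (assumed layout).
    """
    # Sort by the timestamp field only, oldest first.
    ordered = sorted(interactions, key=lambda pair: pair[0])
    # Tag document nodes as "d" and summary nodes as "s" (hypothetical labels).
    trajectory = [("d", news_id) for _, news_id in ordered]
    trajectory.extend(("s", sid) for sid in summary_ids)
    return trajectory


clicks = [(30, "N3"), (10, "N1"), (20, "N2")]
print(build_seed_trajectory(clicks, ["S1"]))
# [('d', 'N1'), ('d', 'N2'), ('d', 'N3'), ('s', 'S1')]
```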
Open and run:
DS/DoubleShuffling.ipynb
This notebook generates Double Shuffled (DS) user trajectories. You can experiment with:
- offset
- gap
- segment length
- other relevant hyperparameters
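To give a feel for how the three hyperparameters might interact, here is a toy sketch that swaps aligned segments between two trajectories. It is one possible reading of the parameter names, not the algorithm implemented in the notebook: the function `swap_segments` is hypothetical, and the interpretation of offset (first swapped position), gap (positions left untouched between swaps), and segment length (size of each swapped block) is an assumption.

```python
def swap_segments(traj_a, traj_b, offset, gap, segment_length):
    """Swap aligned segments between two trajectories (illustrative only).

    Starting at `offset`, blocks of `segment_length` nodes are exchanged,
    and `gap` nodes are skipped between consecutive blocks.
    """
    if offset < 0 or gap < 0 or segment_length < 1:
        raise ValueError("offset and gap must be >= 0, segment_length >= 1")
    a, b = list(traj_a), list(traj_b)  # copy so the inputs stay unchanged
    limit = min(len(a), len(b))
    start = offset
    while start + segment_length <= limit:
        end = start + segment_length
        a[start:end], b[start:end] = b[start:end], a[start:end]
        start = end + gap
    return a, b


user_a = ["a0", "a1", "a2", "a3", "a4", "a5"]
user_b = ["b0", "b1", "b2", "b3", "b4", "b5"]
new_a, new_b = swap_segments(user_a, user_b, offset=1, gap=1, segment_length=2)
print(new_a)  # ['a0', 'b1', 'b2', 'a3', 'b4', 'b5']
print(new_b)  # ['b0', 'a1', 'a2', 'b3', 'a4', 'a5']
```

Under this reading, a larger gap preserves more of each user's original history, while a larger segment length moves longer runs of behavior between users.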
Run the following notebook to refine the DS dataset:
perturbation/perturbation_D2.ipynb
This step smooths the shuffled trajectories by applying history-aware perturbations to improve recommendation dynamics.
Run:
KG2PENS/KG2PENS_Trainer_New_Convertor.ipynb
This notebook:
- Replaces <d-s> pairs with the corresponding s-nodes only.
- Converts the processed dataset into PENS-compatible knowledge graph format.
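The first step above, replacing <d-s> pairs with their s-nodes, might look roughly like the sketch below. It assumes a trajectory is a list of `(kind, node_id)` tuples with kind `"d"` for a document node and `"s"` for a summary node, and that a pair means a d-node immediately followed by an s-node. Both the encoding and the function name `collapse_ds_pairs` are hypothetical and may differ from what the notebook uses.

```python
def collapse_ds_pairs(trajectory):
    """Replace each adjacent (d, s) pair with the s-node alone."""
    result = []
    i = 0
    while i < len(trajectory):
        kind, _ = trajectory[i]
        next_is_s = i + 1 < len(trajectory) and trajectory[i + 1][0] == "s"
        if kind == "d" and next_is_s:
            # Drop the d-node and keep only its paired s-node.
            result.append(trajectory[i + 1])
            i += 2
        else:
            result.append(trajectory[i])
            i += 1
    return result


example = [("d", "N1"), ("d", "N2"), ("s", "S2"), ("d", "N3")]
print(collapse_ds_pairs(example))
# [('d', 'N1'), ('s', 'S2'), ('d', 'N3')]
```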
The final output format will be:
UserID, ClickedNewsID, PositiveNewsID, NegativeNewsID
After completing all steps, you’ll obtain:
- A PENS-style dataset ready for sequential recommendation experiments.
- Summary statistics and user graph structures aligned with experimental design.
- Make sure all dependencies are installed before running the notebooks.
- Intermediate datasets are saved automatically after each step.
- You can tweak parameters in the Jupyter notebooks for different experimental settings.
- Paper: https://openreview.net/forum?id=JVx7Qi8tz3
- Datasets: https://doi.org/10.6084/m9.figshare.30327451