This repository provides a suite of scripts to automate the preparation of ligand topologies, conversion to GROMACS format, and the setup and analysis of Molecular Dynamics (MD) simulations in multiple solvents. The overall methodology is inspired by:
Schneider, S. H.; Boxer, S. G. Vibrational Stark Effects of Carbonyl Probes Applied to Reinterpret IR and Raman Data for Enzyme Inhibitors in Terms of Electric Fields at the Active Site. J. Phys. Chem. B 2016, 120 (36), 9672–9684.
DOI: 10.1021/acs.jpcb.6b08133
Additional elements are adapted from: https://github.com/yasminshamsudin/boxer-solvatochrom.git
For solvent parameters, please refer to:
van der Spoel, D.; Ghahremanpour, M. M.; Lemkul, J. Small Molecule Thermochemistry: A Tool For Empirical Force Field Development, J. Phys. Chem. A 2018, 122, 8982–8988.
Accessible via Virtual Chemistry.
The MD setup and analysis workflow is divided into several sequential steps:
-
Topology Generation and Conversion (Step 1)
step_1.sh
- Reads
smiles.txt
for each ligand and generates a 3D structure using Obabel. - Assigns GAFF atom types via antechamber and parmchk2, and creates AMBER topologies using tleap.
- Converts AMBER files to GROMACS format with acpype.
- Cleans up and organizes outputs into a
step_1_outputs/
folder.
- Reads
-
MD Simulation Setup (Step 2)
step_2.sh
- Uses the GROMACS files from Step 1 to build a simulation box and solvate the system in various solvents (H2O, D2O, and organic solvents).
- Generates both standard and zero-charge (0q) topologies for vibrational Stark analysis.
- Sets up triplicate MD simulation replicates (
run1/
,run2/
,run3/
) within each ligand’s solvent-specific MD folder, copying analysis scripts and creating an HPC submission script (run_md.sh
).
-
Job Submission (Step 3)
step_3.sh
- Iterates through each ligand’s MD run directories, ensures that the submission script (
run_md.sh
) is executable, changes to the appropriate directory, and submits the jobs viasbatch
. - This script ensures the working directory is correctly set so that all input files are found by GROMACS.
- Iterates through each ligand’s MD run directories, ensures that the submission script (
-
Post-Simulation Analysis (Step 4)
step_4.sh
- Loads the
scipy-stack
module to provide the necessary Python scientific libraries. - Iterates through all run directories, executes
calc_fields.py
(which computes field data and generatesFIELDS.txt
), and then runstime_series.py
withFIELDS.txt
as input. - Saves the output of
time_series.py
toresults.txt
in each run directory.
- Loads the
-
Data Aggregation and Plotting
Two Python scripts streamline the analysis of vibrational frequency data:aggregate_results.py
- Recursively parses each
results.txt
file (extracting key metrics from the "Eproj (average overall electric field)" block) from the MD runs, records one row per run, and then appends an average row per ligand–solvent pair. - Saves the aggregated data to
aggregated_results.csv
.
- Recursively parses each
plot_field_vs_frequency.py
- Reads
aggregated_results.csv
and frequency data fromfreq_data.txt
. - Merges the average electric field data with vibrational frequencies (provided per ligand and solvent).
- Creates a scatter plot where the x-axis is the average solvent electric field (in MV/cm) and the y-axis is the C=O frequency (in cm⁻¹).
- For each ligand, a solid black regression line (extended across the full x-range) is fitted and printed to the terminal in the format:
v̅C═O<ligand> = <slope>|F⃗solv| + <intercept> (R2 = <r²>)
- The solvent markers are color-coded; a legend (showing only solvent names and colors) is placed outside to the right.
- The final plot is saved as
electric_field_vs_frequency.png
.
- Reads
Auto-VSE/
├── ANALYSIS_SCRIPTS/
│ └── (custom analysis scripts, if any)
├── LIGANDS/
│ ├── ligandA/
│ │ ├── smiles.txt
│ │ ├── probe.ndx
│ │ └── (outputs from step_1.sh, step_2.sh, including *_md/ folders)
│ └── ligandB/
│ ├── smiles.txt
│ ├── probe.ndx
│ └── ...
├── MD_PARAMS/
│ ├── min.mdp
│ ├── nvt.mdp
│ ├── npt.mdp
│ └── md.mdp
├── SOLVENTS/
│ ├── H2O/
│ │ ├── spc216.gro
│ │ └── tip3p.itp
│ ├── D2O/
│ │ ├── spc216.gro
│ │ └── tip3p.itp (modified for D2O)
│ ├── dibutyl-ether/
│ │ ├── conf.gro
│ │ └── topol.top
│ └── (other solvents)/
│ ├── conf.gro
│ └── topol.top
├── freq_data.txt
├── step_1.sh
├── step_2.sh
├── step_3.sh
├── step_4.sh
├── aggregate_results.py
└── plot_field_vs_frequency.py
Key Files:
-
LIGANDS/
Each ligand must be placed in its own subdirectory. The minimal requirements inside each ligand folder are:smiles.txt
containing the SMILES string of the ligand (line 1).probe.ndx
containing an index group file for the carbonyl probe atoms (e.g., for post-MD analysis).- Any outputs from
step_1.sh
andstep_2.sh
will be placed here (e.g.,*_GMX.gro
,*_GMX.top
,*_md/
folders, etc.).
-
MD_PARAMS/
Contains the GROMACS.mdp
files for each MD phase: energy minimization (min.mdp
), NVT equilibration (nvt.mdp
), NPT equilibration (npt.mdp
), and production MD (md.mdp
). -
SOLVENTS/
Each solvent is placed in its own folder with at least:conf.gro
andtopol.top
for non-water solvents.spc216.gro
andtip3p.itp
for water (H2O
), or similar forD2O
.
The scripts use these to solvate the ligand and build the final topologies.
If you need additional solvent parameter files, you can obtain them from Virtual Chemistry and place them here.
-
ANALYSIS_SCRIPTS/
Place any custom analysis scripts here. They are automatically copied into everyrun*
folder created bystep_2.sh
.
For D2O simulations included here, we simply increase the hydrogen mass to 2.014 within the water model. This approach is a rough approximation and not strictly correct, but it may provide a first pass at simulating heavy water in these simulations.
-
Clone or Download this repository into a working directory.
git clone https://github.com/DomFico/Auto-VSE.git
-
Place Your Ligands:
- Create subfolders in
LIGANDS/
, one for each ligand. - Put
smiles.txt
(with exactly one SMILES on the first line) in each ligand folder. - Include
probe.ndx
to specify atomic indices for the carbonyl probe.
- Create subfolders in
-
Check Solvents:
- In
SOLVENTS/
, create subfolders for each solvent. - Ensure that each subfolder has either
spc216.gro
+tip3p.itp
(for H2O/D2O) orconf.gro
+topol.top
(for non-water). - You may drag and drop additional solvent parameter files from Virtual Chemistry if needed.
- In
-
Check MD Parameters:
- Ensure that
MD_PARAMS/
contains.mdp
files:min.mdp
,nvt.mdp
,npt.mdp
, andmd.mdp
.
- Ensure that
-
Run Step 1:
- From the
AUTO-VSE/
directory, execute:./step_1.sh
- Step 2:
./step_2.sh
- Step 3:
./step_3.sh
- Step 4:
./step_4.sh
- From the
-
Data Aggregation and Plotting:
- Aggregate the analysis results:
python aggregate_results.py
- Generate the publication-quality plot and print regression lines:
python plot_field_vs_frequency.py
- Aggregate the analysis results:
- Obabel (for 3D geometry generation)
- AmberTools (antechamber, parmchk2, tleap)
- Acpype (for AMBER-to-GROMACS conversion)
- GROMACS
- GNU Bash and standard GNU utilities
- SLURM (or equivalent HPC job scheduler)
- Python 3 with packages:
numpy
,pandas
,matplotlib
- HPC Job Submission:
The SLURM directives inrun_md.sh
andstep_3.sh
may need adaptation to your cluster's specifications.
Have Fun :)