Financial markets are inherently noisy, providing opportunities for algorithmic strategies to exploit pricing inefficiencies. This report develops a statistical arbitrage strategy inspired by Avellaneda and Lee, adapted for the Vietnamese stock market, where short selling of individual stocks is prohibited. I propose going long a basket of stocks and short the VN30F1M futures contract to capture mean-reverting relationships. Combinations are formed with clustering techniques and the Johansen cointegration test, and trading signals are generated from the s-score of an Ornstein-Uhlenbeck process. Backtesting shows that performance depends heavily on how the trading signal is designed, and that additional effort (time and computational power) is needed to get the best out of the model.
In the Vietnamese stock market, dominated by retail investors, stocks exhibit exaggerated price movements due to overreactions to news or sentiment. These deviations from fundamental values result in wider spreads between stock baskets and the VN30F1M futures, which revert to their historical mean, enabling profitable trades via algorithmic mean-reversion strategies.
Statistical arbitrage is a popular algorithmic trading strategy that delivers market-neutral returns, independent of market trends. It attracts investors with its diversification benefits and high-reward, low-risk potential—similar to earning high interest from a bank but with greater upside.
This strategy uses statistical methods to exploit pricing inefficiencies, often via mean-reverting portfolios. A classic example, pairs trading, involves trading two correlated securities (long one, short the other) when their price spread diverges, expecting it to revert. The model is:

$$\frac{dP_t}{P_t} = \alpha\,dt + \beta\,\frac{dQ_t}{Q_t} + dX_t$$

Here, $P_t$ and $Q_t$ are the prices of the two securities, $\alpha$ is a drift term, $\beta$ is the hedge ratio, and $X_t$ is a mean-reverting residual.
In Vietnam, short-selling stocks is banned, making classic pairs trading unfeasible. Instead, this strategy longs a basket of stocks and shorts the VN30F1M futures contract to stay market-neutral. The goal is to find basket weights such that

$$X_t = \sum_{i=1}^{n} w_i \ln P_{i,t} - \beta \ln F_t$$

where $P_{i,t}$ are the basket stock prices, $F_t$ is the VN30F1M futures price, and the residual $X_t$ is stationary.
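To illustrate the stationarity requirement, the following sketch checks a candidate basket-versus-futures residual with an Augmented Dickey-Fuller test. It is a minimal example, assuming daily closing prices are available as pandas objects; the equal weights, the OLS hedge ratio, and the 5% decision rule are illustrative assumptions rather than the exact procedure used in this project.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller

def residual_is_stationary(basket_prices: pd.DataFrame,
                           futures_prices: pd.Series,
                           alpha: float = 0.05) -> bool:
    """Check whether the basket/futures log-price residual looks stationary.

    basket_prices: daily closes of the candidate stocks (columns = tickers).
    futures_prices: daily closes of VN30F1M on the same dates.
    """
    # Equal-weighted log-price basket (illustrative choice of weights).
    log_basket = np.log(basket_prices).mean(axis=1)
    log_futures = np.log(futures_prices)

    # Hedge ratio from an OLS regression of the basket on the futures.
    ols = sm.OLS(log_basket, sm.add_constant(log_futures)).fit()
    beta = ols.params.iloc[1]

    # Residual X_t = basket - beta * futures; rejecting the unit root => stationary.
    residual = log_basket - beta * log_futures
    _, p_value, *_ = adfuller(residual.dropna())
    return p_value < alpha
```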
Statistical arbitrage is well-documented in finance. Avellaneda and Lee (2010) modeled pairs trading with cointegration and the Ornstein-Uhlenbeck process, where the spread $X_t$ follows

$$dX_t = \kappa (m - X_t)\,dt + \sigma\,dW_t$$

Here, $\kappa$ is the speed of mean reversion, $m$ is the long-run equilibrium level, $\sigma$ is the volatility of the spread, and $W_t$ is a standard Brownian motion.
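In practice the OU parameters are usually estimated from the sampled spread by fitting an AR(1) regression, which is the exact discretization of the process. The sketch below follows that standard approach under the assumption of daily sampling with 252 trading days per year; it is illustrative rather than the exact estimator used in the project code.

```python
import numpy as np

def fit_ou(spread: np.ndarray, dt: float = 1.0 / 252):
    """Estimate OU parameters (kappa, m, sigma_eq) from a sampled spread.

    Fits X_{t+1} = a + b * X_t + eps, the AR(1) discretization of the OU
    process at step dt, and maps (a, b, eps) back to the OU parameters.
    Valid when 0 < b < 1, i.e. when the spread is actually mean-reverting.
    """
    x, y = spread[:-1], spread[1:]
    b, a = np.polyfit(x, y, 1)                # slope b, intercept a
    eps = y - (a + b * x)

    kappa = -np.log(b) / dt                   # speed of mean reversion
    m = a / (1.0 - b)                         # long-run equilibrium level
    sigma_eq = np.std(eps, ddof=1) / np.sqrt(1.0 - b ** 2)  # equilibrium std dev of X_t
    return kappa, m, sigma_eq
```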
Stanford students (Lu, Parulekar, & Xu, 2018) proposed clustering and the Johansen cointegration test, which, unlike the Engle-Granger test, handles multiple cointegrating relationships, enhancing stock-futures cointegration analysis.
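For reference, the Johansen test is implemented in statsmodels, and a basket of log prices together with the futures can be screened for cointegrating relations as sketched below. The constant deterministic term, the single lag difference, and the 95% trace-statistic threshold are illustrative assumptions, not necessarily the settings used in this project.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.vector_ar.vecm import coint_johansen

def johansen_screen(prices: pd.DataFrame):
    """Run the Johansen trace test on log prices (candidate stocks plus VN30F1M).

    Returns the number of trace statistics exceeding their 95% critical value
    (a common shortcut; the formal procedure tests ranks sequentially) and the
    first cointegrating vector, usable as candidate basket/hedge weights.
    """
    log_prices = np.log(prices.dropna())
    result = coint_johansen(log_prices, det_order=0, k_ar_diff=1)

    # result.lr1: trace statistics; result.cvt[:, 1]: 95% critical values.
    n_relations = int((result.lr1 > result.cvt[:, 1]).sum())
    first_vector = result.evec[:, 0]
    return n_relations, first_vector
```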
The data is taken from the Algotrade database, covering 06/2021 to 12/2024, using daily closing prices. Data is stored in the data folder.
- Requirements: `pip`, `virtualenv`
- Create and activate a new virtual environment in the current working directory:
  - On Linux:

    ```bash
    # Create the virtual environment
    python3 -m venv venv
    # Activate the virtual environment
    source venv/bin/activate
    ```

  - On Windows:

    ```
    # Create the virtual environment
    python -m venv venv
    # Activate the virtual environment
    venv\Scripts\activate
    ```
- Install the dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- To load the data:

  ```bash
  python load_data.py
  ```
Combinations are formed using clustering techniques and the Johansen cointegration test, and trading signals are generated from the s-score of the Ornstein-Uhlenbeck process (more detail in Final_Report).
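To make the signal step concrete, the sketch below turns fitted OU parameters into an s-score and a simple long/flat position series. The entry and exit thresholds (-1.25 and -0.50) are the example values from Avellaneda and Lee; the actual thresholds and position logic in this project may differ (see Final_Report), and the OU parameters are assumed to come from an estimator such as the `fit_ou` sketch above.

```python
import numpy as np

def s_score(spread: np.ndarray, m: float, sigma_eq: float) -> np.ndarray:
    """s-score: distance of the spread from its mean in equilibrium standard deviations."""
    return (spread - m) / sigma_eq

def long_flat_signal(scores: np.ndarray,
                     entry: float = -1.25,
                     exit_: float = -0.50) -> np.ndarray:
    """Long-only signal for a no-short market: open the long-basket/short-futures
    position when the s-score is very negative (basket cheap relative to the
    futures), and close it once the score has reverted toward zero."""
    position = np.zeros_like(scores, dtype=float)
    holding = 0.0
    for t, s in enumerate(scores):
        if holding == 0.0 and s < entry:      # spread unusually low -> open
            holding = 1.0
        elif holding == 1.0 and s > exit_:    # spread reverted -> close
            holding = 0.0
        position[t] = holding
    return position
```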
- Run by (using existing data):

  ```bash
  python main.py in_sample use_data
  ```
- Parameters:

  ```json
  {
    "estimation_window": 60,
    "min_trading_days": 30,
    "max_clusters": 10,
    "top_stocks": 6,
    "tier": 1,
    "first_allocation": 0.4,
    "adding_allocation": 0.4,
    "correlation_threshold": 0.6
  }
  ```
- Cumulative Return
One run takes about 7-15 minutes, depending on whether existing data is used and on the machine's computing power.
(The turnover ratio in the code is calculated slightly wrong, so I manually adjusted it.)
Metric | Strategy (Initial) | VN30 |
---|---|---|
HPR | -1.03% | -20.80% |
Excess HPR | 19.78% | n/a |
Annual Return | -0.44% | -9.54% |
Annual Excess Return | 9.09% | n/a |
Maximum Drawdown | 8.01% | 42.46% |
Longest Drawdown (days) | 449 | 477 |
Turnover Ratio | 8.25% | n/a |
Sharpe Ratio | -0.68 | -0.65 |
Sortino Ratio | -0.06 | -0.54 |
Information Ratio | 0.45 | n/a |
The strategy performs poorly and does not really exhibit market-neutral characteristics. A possible explanation is that the hedge is insufficient (beta around 0.2), and that stocks usually fall more than the index during a sharp downturn, because the index also contains stocks with low correlation to it, which dampens index-level declines relative to the selected basket.
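To quantify that residual market exposure, the beta mentioned above can be estimated by regressing daily strategy returns on VN30 returns. The sketch below is a generic illustration (the return series are placeholders), not the exact code that produced the 0.2 figure.

```python
import numpy as np
import pandas as pd

def strategy_beta(strategy_returns: pd.Series, index_returns: pd.Series) -> float:
    """Estimate the strategy's beta to the index from daily returns.

    beta = Cov(r_strategy, r_index) / Var(r_index); a truly market-neutral
    strategy should have a beta close to zero.
    """
    aligned = pd.concat([strategy_returns, index_returns], axis=1, join="inner").dropna()
    r_strat, r_index = aligned.iloc[:, 0], aligned.iloc[:, 1]
    return float(np.cov(r_strat, r_index)[0, 1] / np.var(r_index, ddof=1))
```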
Due to limited computational power and an imperfect optimization methodology, I only ran 50 trials in 5 batches of 10 trials, so the parameters are at high risk of overfitting.
Each trial can take 10-15 minutes, which adds up to roughly 10 hours for the entire process. Be careful.
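As an illustration of how such a batched search could be organized (a generic sketch, not what optimization.py actually does), a simple random search over the parameter space might look like the following; the parameter ranges and the `run_backtest` objective are hypothetical.

```python
import random

# Hypothetical parameter ranges; the real ranges and objective live in optimization.py.
SEARCH_SPACE = {
    "estimation_window": [40, 50, 60, 70, 80],
    "min_trading_days": [20, 25, 30, 35, 40],
    "top_stocks": [3, 4, 5, 6],
    "first_allocation": [0.2, 0.3, 0.4],
    "adding_allocation": [0.2, 0.3, 0.4],
    "correlation_threshold": [0.5, 0.6, 0.7],
}

def sample_params() -> dict:
    """Draw one random parameter set from the search space."""
    return {name: random.choice(values) for name, values in SEARCH_SPACE.items()}

def random_search(run_backtest, n_batches: int = 5, trials_per_batch: int = 10, seed: int = 0):
    """Batched random search; run_backtest(params) should return a score to maximize."""
    random.seed(seed)
    best_score, best_params = float("-inf"), None
    for _ in range(n_batches):
        for _ in range(trials_per_batch):
            params = sample_params()
            score = run_backtest(params)   # e.g. in-sample Sharpe ratio
            if score > best_score:
                best_score, best_params = score, params
    return best_params, best_score
```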
- Run the optimization process by:

  ```bash
  python optimization.py
  ```
(The VN30 figures here and above differ because of the difference in estimation_window, which leads to a slightly different start_date.)
- Parameters:

  ```json
  {
    "estimation_window": 50,
    "min_trading_days": 25,
    "max_clusters": 10,
    "top_stocks": 3,
    "tier": 1,
    "first_allocation": 0.4,
    "adding_allocation": 0.2,
    "correlation_threshold": 0.6
  }
  ```
- Run the backtest with the optimized parameters (using existing data):

  ```bash
  python main.py optimization use_data
  ```
- Cumulative Return
Metric | Strategy (Initial) | VN30 |
---|---|---|
HPR | -2.06% | -24.29% |
Excess HPR | 22.23% | n/a |
Annual Return | -0.87% | -11.01% |
Annual Excess Return | 10.14% | n/a |
Maximum Drawdown | 10.33% | 42.46% |
Longest Drawdown (days) | 355 | 508 |
Turnover Ratio | 7.16% | n/a |
Sharpe Ratio | -0.92 | -0.74 |
Sortino Ratio | -0.11 | -0.63 |
Information Ratio | 0.50 | n/a |
The new set of parameters found by optimization seems better, although it still does not generate a positive profit. One cause may be the relatively high beta (0.2-0.4) of the strategy, which makes it move together with the index.
- Run by (using existing data):

  ```bash
  python main.py out_sample use_data
  ```
- Cumulative Return
Metric | Strategy (Optimal) | VN30 |
---|---|---|
HPR | 10.42% | 18.83% |
Excess HPR | -8.41% | n/a |
Annual Return | 10.46% | 18.9% |
Annual Excess Return | -8.45% | n/a |
Maximum Drawdown | 5.39% | 8.38% |
Longest Drawdown (days) | 187 | 71 |
Turnover Ratio | 9.72% | n/a |
Sharpe Ratio | 0.78 | 0.97 |
Sortino Ratio | 2.07 | 1.67 |
Information Ratio | -0.64 | n/a |
- Run the backtest over the whole period by (using existing data):

  ```bash
  python main.py overall use_data
  ```
Metric | Strategy (Optimal) | VN30 |
---|---|---|
HPR | 8.15% | -10.10% |
Excess HPR | 18.16% | n/a |
Annual Return | 2.34% | -3.06% |
Annual Excess Return | 5.40% | n/a |
Maximum Drawdown | 10.33% | 42.46% |
Longest Drawdown (days) | 383 | 762 |
Turnover Ratio | 6.55% | n/a |
Sharpe Ratio | -0.41 | -0.4 |
Sortino Ratio | 0.36 | -0.19 |
Information Ratio | 0.29 | n/a |
The overall result looks good; however, a closer look at the cumulative-return graph shows that the strategy still has a high correlation with the index.
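One simple way to visualize that dependence over time is a rolling correlation between daily strategy and index returns; the 60-day window and the series names below are illustrative assumptions.

```python
import pandas as pd

def rolling_index_correlation(strategy_returns: pd.Series,
                              index_returns: pd.Series,
                              window: int = 60) -> pd.Series:
    """Rolling correlation of daily strategy returns with VN30 index returns.

    Values that stay well above zero indicate the strategy is not market-neutral.
    """
    return strategy_returns.rolling(window).corr(index_returns)
```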
The strategy’s complexity suggests several improvements for better results:
- Optimize combination formation to reduce time and computational demands.
- Address the relatively high beta remaining in the strategy.
- Improve the optimization process, currently limited by computational power and methodology, to avoid overfitting.
This report introduces a statistical arbitrage strategy for Vietnam, bypassing short-selling limits by pairing stock baskets with futures. Clustering and cointegration aid combination formation, and backtesting shows promise. Further enhancements are needed to manage model complexity.
- Avellaneda, M., & Lee, J. (2010). "Statistical arbitrage in the U.S. equities market." Quantitative Finance, 10(7), 761–782. DOI: 10.1080/14697680903124632
- Lu, Y., Parulekar, A., & Xu, J. (2018). "Statistical arbitrage via cointegration and clustering." Stanford University Working Paper.