diff --git a/.github/workflows/ci-checks.yaml b/.github/workflows/ci-checks.yaml
new file mode 100644
index 0000000..168a39e
--- /dev/null
+++ b/.github/workflows/ci-checks.yaml
@@ -0,0 +1,33 @@
+name: CI Checks
+
+on:
+  push:
+    branches:
+      - main
+
+  pull_request:
+    branches:
+ - "*" # Run on all branches
+
+env:
+  MIN_PYTHON_VERSION: 3.11
+
+jobs:
+  job-pre-commit-check:
+    name: Pre-Commit Check
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v3
+
+      - name: Set up Python
+        uses: actions/setup-python@v3
+        with:
+          python-version: ${{ env.MIN_PYTHON_VERSION }}
+
+      - name: Install dependencies
+        run: pip install -r requirements-dev.txt
+
+      - name: Run pre-commit
+        run: pre-commit run --all-files
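This workflow assumes two things that are not part of the diff: `pre-commit` is provided by `requirements-dev.txt`, and a `.pre-commit-config.yaml` exists at the repository root for `pre-commit run --all-files` to pick up. A minimal sketch of such a config, using standard hooks from the `pre-commit/pre-commit-hooks` repo (illustrative only; the repository's actual hook set is not shown in this diff):

```yaml
# Hypothetical .pre-commit-config.yaml (illustrative; not the repository's real configuration).
# pre-commit reads this file from the repository root when `pre-commit run --all-files` executes.
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0  # pin hooks to a released tag
    hooks:
      - id: trailing-whitespace  # strip trailing whitespace
      - id: end-of-file-fixer    # ensure files end with a single newline
      - id: check-yaml           # validate YAML syntax, e.g. this workflow file
      - id: check-json           # validate JSON files
```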
diff --git a/adi_function_app/README.md b/adi_function_app/README.md
index 1794059..722734f 100644
--- a/adi_function_app/README.md
+++ b/adi_function_app/README.md
@@ -42,7 +42,7 @@ Using the [Phi-3 Technical Report: A Highly Capable Language Model Locally on Yo
```json
{
- "content": "\n
\nTable 1: Comparison results on RepoQA benchmark.\n\nModel | \nCtx Size | \nPython | \nC++ | \nRust | \nJava | \nTypeScript | \nAverage | \n
\n\ngpt-4O-2024-05-13 | \n128k | \n95 | \n80 | \n85 | \n96 | \n97 | \n90.6 | \n
\n\ngemini-1.5-flash-latest | \n1000k | \n93 | \n79 | \n87 | \n94 | \n97 | \n90 | \n
\n\nPhi-3.5-MoE | \n128k | \n89 | \n74 | \n81 | \n88 | \n95 | \n85 | \n
\n\nPhi-3.5-Mini | \n128k | \n86 | \n67 | \n73 | \n77 | \n82 | \n77 | \n
\n\nLlama-3.1-8B-Instruct | \n128k | \n80 | \n65 | \n73 | \n76 | \n63 | \n71 | \n
\n\nMixtral-8x7B-Instruct-v0.1 | \n32k | \n66 | \n65 | \n64 | \n71 | \n74 | \n68 | \n
\n\nMixtral-8x22B-Instruct-v0.1 | \n64k | \n60 | \n67 | \n74 | \n83 | \n55 | \n67.8 | \n
\n
\n\n\nsuch as Arabic, Chinese, Russian, Ukrainian, and Vietnamese, with average MMLU-multilingual scores\nof 55.4 and 47.3, respectively. Due to its larger model capacity, phi-3.5-MoE achieves a significantly\nhigher average score of 69.9, outperforming phi-3.5-mini.\n\nMMLU(5-shot) MultiLingual\n\nPhi-3-mini\n\nPhi-3.5-mini\n\nPhi-3.5-MoE\n\n\n\n\n\nWe evaluate the phi-3.5-mini and phi-3.5-MoE models on two long-context understanding tasks:\nRULER [HSK+24] and RepoQA [LTD+24]. As shown in Tables 1 and 2, both phi-3.5-MoE and phi-\n3.5-mini outperform other open-source models with larger sizes, such as Llama-3.1-8B, Mixtral-8x7B,\nand Mixtral-8x22B, on the RepoQA task, and achieve comparable performance to Llama-3.1-8B on\nthe RULER task. However, we observe a significant performance drop when testing the 128K context\nwindow on the RULER task. We suspect this is due to the lack of high-quality long-context data in\nmid-training, an issue we plan to address in the next version of the model release.\n\nIn the table 3, we present a detailed evaluation of the phi-3.5-mini and phi-3.5-MoE models\ncompared with recent SoTA pretrained language models, such as GPT-4o-mini, Gemini-1.5 Flash, and\nopen-source models like Llama-3.1-8B and the Mistral models. The results show that phi-3.5-mini\nachieves performance comparable to much larger models like Mistral-Nemo-12B and Llama-3.1-8B, while\nphi-3.5-MoE significantly outperforms other open-source models, offers performance comparable to\nGemini-1.5 Flash, and achieves above 90% of the average performance of GPT-4o-mini across various\nlanguage benchmarks.\n\n\n\n\n",
+ "content": "\n\nTable 1: Comparison results on RepoQA benchmark.\n\nModel | \nCtx Size | \nPython | \nC++ | \nRust | \nJava | \nTypeScript | \nAverage | \n
\n\ngpt-4O-2024-05-13 | \n128k | \n95 | \n80 | \n85 | \n96 | \n97 | \n90.6 | \n
\n\ngemini-1.5-flash-latest | \n1000k | \n93 | \n79 | \n87 | \n94 | \n97 | \n90 | \n
\n\nPhi-3.5-MoE | \n128k | \n89 | \n74 | \n81 | \n88 | \n95 | \n85 | \n
\n\nPhi-3.5-Mini | \n128k | \n86 | \n67 | \n73 | \n77 | \n82 | \n77 | \n
\n\nLlama-3.1-8B-Instruct | \n128k | \n80 | \n65 | \n73 | \n76 | \n63 | \n71 | \n
\n\nMixtral-8x7B-Instruct-v0.1 | \n32k | \n66 | \n65 | \n64 | \n71 | \n74 | \n68 | \n
\n\nMixtral-8x22B-Instruct-v0.1 | \n64k | \n60 | \n67 | \n74 | \n83 | \n55 | \n67.8 | \n
\n
\n\n\nsuch as Arabic, Chinese, Russian, Ukrainian, and Vietnamese, with average MMLU-multilingual scores\nof 55.4 and 47.3, respectively. Due to its larger model capacity, phi-3.5-MoE achieves a significantly\nhigher average score of 69.9, outperforming phi-3.5-mini.\n\nMMLU(5-shot) MultiLingual\n\nPhi-3-mini\n\nPhi-3.5-mini\n\nPhi-3.5-MoE\n\n\n\n\n\n We evaluate the phi-3.5-mini and phi-3.5-MoE models on two long-context understanding tasks:\nRULER [HSK+24] and RepoQA [LTD+24]. As shown in Tables 1 and 2, both phi-3.5-MoE and phi-\n3.5-mini outperform other open-source models with larger sizes, such as Llama-3.1-8B, Mixtral-8x7B,\nand Mixtral-8x22B, on the RepoQA task, and achieve comparable performance to Llama-3.1-8B on\nthe RULER task. However, we observe a significant performance drop when testing the 128K context\nwindow on the RULER task. We suspect this is due to the lack of high-quality long-context data in\nmid-training, an issue we plan to address in the next version of the model release.\n\n In the table 3, we present a detailed evaluation of the phi-3.5-mini and phi-3.5-MoE models\ncompared with recent SoTA pretrained language models, such as GPT-4o-mini, Gemini-1.5 Flash, and\nopen-source models like Llama-3.1-8B and the Mistral models. The results show that phi-3.5-mini\nachieves performance comparable to much larger models like Mistral-Nemo-12B and Llama-3.1-8B, while\nphi-3.5-MoE significantly outperforms other open-source models, offers performance comparable to\nGemini-1.5 Flash, and achieves above 90% of the average performance of GPT-4o-mini across various\nlanguage benchmarks.\n\n\n\n\n",
"sections": [],
"page_number": 7
}