This repository was archived by the owner on Oct 25, 2024. It is now read-only.

Commit 307ab8e: Refine the README for TPP usage (#1498)
1 parent 1663aad commit 307ab8e

File tree: 5 files changed (+200, -22 lines)
Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
This README serves as a guide to set up the backend for a code generation chatbot using the NeuralChat framework. You can deploy this code generation chatbot across various platforms, including Intel Xeon Scalable Processors, Habana's Gaudi processors (HPU), Intel Data Center and Client GPUs, and Nvidia Data Center and Client GPUs.

This example demonstrates how to deploy the code generation chatbot on Intel Xeon processors with [Intel Extension for PyTorch](https://github.com/intel/intel-extension-for-pytorch) BFloat16 optimization.

# Setup Conda

First, install and configure the Conda environment:

```bash
# Download and install Miniconda
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda*.sh
source ~/.bashrc
conda create -n demo python=3.9
conda activate demo
```
# Install numactl

Next, install the numactl library:

```shell
sudo apt install numactl
```
# Install ITREX

```bash
git clone https://github.com/intel/intel-extension-for-transformers.git
cd ./intel-extension-for-transformers/
python setup.py install
```
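After the build finishes, a quick sanity check (a minimal sketch; it only assumes the package installs under its usual Python module name) can confirm ITREX is importable:

```bash
# Verify that intel-extension-for-transformers installed into the active conda env
python -c "import intel_extension_for_transformers; print('ITREX import OK')"
```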
# Install NeuralChat Python Dependencies

Install the NeuralChat dependencies:

```bash
pip install -r ../../../../../../requirements_cpu.txt
```

# Install Python dependencies

```bash
conda install astunparse ninja pyyaml mkl mkl-include setuptools cmake cffi typing_extensions future six requests dataclasses -y
conda install jemalloc gperftools -c conda-forge -y
```
# Configure the codegen.yaml

You can customize the configuration file 'codegen.yaml' to match your environment setup. Here's a table to help you understand the configurable options:

| Item                | Value                                       |
| ------------------- | ------------------------------------------- |
| host                | 0.0.0.0                                     |
| port                | 8000                                        |
| model_name_or_path  | "ise-uiuc/Magicoder-S-DS-6.7B"              |
| device              | "cpu"                                       |
| tasks_list          | ['codegen']                                 |

Note: To switch from code generation to text generation mode, adjust the model_name_or_path setting accordingly, e.g. model_name_or_path can be set to "Intel/neural-chat-7b-v3-3".
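For example, a minimal sketch of the adjusted fields in codegen.yaml for text generation (assuming text generation maps to the 'textchat' entry listed among the task choices in codegen.yaml):

```yaml
# Hypothetical adjustment for text generation mode
model_name_or_path: "Intel/neural-chat-7b-v3-3"
tasks_list: ['textchat']   # assumption: text generation uses the 'textchat' task
```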
# Run the Code Generation Chatbot Server

To start the code-generating chatbot server, use the following command:

```shell
bash run.sh
```

Note: Please adapt the core count in the commands `export OMP_NUM_THREADS=48` and `numactl -l -C 0-47 python -m run_code_gen` in the `run.sh` script to your CPU specification, which you can check with `lscpu`.
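As a rough illustration (the 32-core figures below are assumptions; substitute the values `lscpu` reports for your machine), the adjustment might look like:

```bash
# Inspect core topology to pick the thread count and core range for run.sh
lscpu | grep -E "^CPU\(s\)|Core\(s\) per socket|NUMA node"

# Example for a 32-core socket: use cores 0-31 and 32 OpenMP threads
export OMP_NUM_THREADS=32
numactl -l -C 0-31 python -m run_code_gen 2>&1 | tee run.log
```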
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Copyright (c) 2023 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# This is the parameter configuration file for NeuralChat Serving.

#################################################################################
#                               SERVER SETTING                                  #
#################################################################################
host: 0.0.0.0
port: 8000

model_name_or_path: "ise-uiuc/Magicoder-S-DS-6.7B"
device: "cpu"

# task choices = ['textchat', 'voicechat', 'retrieval', 'text2image', 'finetune', 'codegen']
tasks_list: ['codegen']
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
#!/usr/bin/env bash
#
# Copyright (c) 2023 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Kill any existing run_code_gen process and re-run
ps -ef | grep 'run_code_gen' | awk '{print $2}' | xargs kill -9

# KMP (Intel OpenMP) tuning
export KMP_BLOCKTIME=1
export KMP_SETTINGS=1
export KMP_AFFINITY=granularity=fine,compact,1,0

# OMP thread count and Intel OpenMP runtime
export OMP_NUM_THREADS=48
export LD_PRELOAD=${CONDA_PREFIX}/lib/libiomp5.so

# tcmalloc allocator
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so

# Bind to local memory and cores 0-47, then launch the server
numactl -l -C 0-47 python -m run_code_gen 2>&1 | tee run.log
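Since run.sh preloads libiomp5 and libtcmalloc from the active conda environment, a quick hedged check (a sketch, assuming the `conda install mkl ...` and `conda install jemalloc gperftools ...` steps above provided these libraries) is to confirm both files exist before launching:

```bash
# Confirm the preloaded libraries are present in the conda environment
ls -l ${CONDA_PREFIX}/lib/libiomp5.so ${CONDA_PREFIX}/lib/libtcmalloc.so
```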
@@ -0,0 +1,26 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Copyright (c) 2023 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


from intel_extension_for_transformers.neural_chat import NeuralChatServerExecutor


def main():
    # Launch the NeuralChat server with the codegen configuration and log file
    server_executor = NeuralChatServerExecutor()
    server_executor(config_file="./codegen.yaml", log_file="./codegen.log")


if __name__ == "__main__":
    main()
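Once the server is up, a simple smoke test might look like the sketch below; the endpoint path and payload are assumptions based on NeuralChat's RESTful API conventions, so check the NeuralChat API documentation for the exact route exposed by the 'codegen' task:

```bash
# Hypothetical request to the codegen endpoint on the host/port from codegen.yaml
curl -X POST http://127.0.0.1:8000/v1/code_generation \
  -H "Content-Type: application/json" \
  -d '{"prompt": "def quick_sort(arr):"}'
```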

intel_extension_for_transformers/neural_chat/examples/deployment/codegen/backend/xeon/tpp/README.md

Lines changed: 40 additions & 22 deletions
@@ -29,8 +29,7 @@ sudo apt install numactl
 ```bash
 git clone https://github.com/intel/intel-extension-for-transformers.git
 cd ./intel-extension-for-transformers/
-pip install -r requirements.txt
-pip install -e .
+python setup.py install
 ```

 # Install NeuralChat Python Dependencies
@@ -73,21 +72,54 @@ export USE_MXFP4=1
 export KV_CACHE_INC_SIZE=512
 ```

-# Configure Multi-NumaNodes
-To use the multi-socket model parallelism with Xeon servers, you need to configure a hostfile first.
+# Configure the codegen.yaml
+
+You can customize the configuration file 'codegen.yaml' to match your environment setup. Here's a table to help you understand the configurable options:
+
+| Item                | Value                                       |
+| ------------------- | ------------------------------------------- |
+| host                | 0.0.0.0                                     |
+| port                | 8000                                        |
+| model_name_or_path  | "Phind/Phind-CodeLlama-34B-v2"              |
+| device              | "cpu"                                       |
+| use_tpp             | true                                        |
+| tasks_list          | ['codegen']                                 |
+
+Note: To switch from code generation to text generation mode, adjust the model_name_or_path setting accordingly, e.g. model_name_or_path can be set to "meta-llama/Llama-2-13b-chat-hf".
+
+# Using Single NumaNode
+To configure a single NUMA node on Xeon processors, edit the hostfile located at `../../../../../../server/config/hostfile` and set the NUMA node number to 1.
+
+## Modify hostfile
+```bash
+vim ../../../../../../server/config/hostfile
+localhost slots=1
+```
+
+## Run the Code Generation Chatbot Server
+
+To start the code-generating chatbot server, use the following command:
+
+```shell
+bash run.sh
+```
+

-Here is a example using 3 numa nodes on single socket on GNR server.
+# Configure Multi-NumaNodes
+To utilize multi-socket model parallelism on Xeon servers, you'll need to adjust the hostfile settings.
+For instance, to allocate 3 NUMA nodes on a single socket of the GNR server, modify the hostfile as shown below:

 ## Modify hostfile
 ```bash
 vim ../../../../../../server/config/hostfile
 localhost slots=3
 ```

+Afterward, run the run.sh script as previously instructed.
+

 # Configure Multi-Nodes
 To use the multi-node model parallelism with Xeon servers, you need to configure a hostfile first and make sure ssh is able between your servers.
-
 For example, you have two servers which have the IP of `192.168.1.1` and `192.168.1.2`, and each of it has 3 numa nodes on single socket.

 ## Modify hostfile
@@ -135,22 +167,8 @@ localhost slots=3
 192.168.1.2 slots=3
 ```

-# Configure the codegen.yaml
-
-You can customize the configuration file 'codegen.yaml' to match your environment setup. Here's a table to help you understand the configurable options:
-
-| Item                | Value                                    |
-| ------------------- | ---------------------------------------- |
-| host                | 0.0.0.0                                  |
-| port                | 8000                                     |
-| model_name_or_path  | "Phind/Phind-CodeLlama-34B-v2"           |
-| device              | "cpu"                                    |
-| use_tpp             | true                                     |
-| tasks_list          | ['codegen']                              |
-
-
-# Run the Code Generation Chatbot Server
-Before running the code-generating chatbot server, make sure you have already deploy the same `conda environment` and `intel-extension-for-tranformers codes` on both servers.
+## Run the Code Generation Chatbot Server
+Before running the code-generating chatbot server, make sure you have already deployed the same conda environment and intel-extension-for-transformers code on both servers.

 To start the code-generating chatbot server, use the following command:

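Assembled from the configuration table in the TPP README diff above, a hypothetical codegen.yaml for the TPP backend might look like the sketch below; it is an assumption pieced together from the table, not the exact file from the repository:

```yaml
# Hypothetical TPP codegen.yaml, following the table in the TPP README
host: 0.0.0.0
port: 8000

model_name_or_path: "Phind/Phind-CodeLlama-34B-v2"
device: "cpu"
use_tpp: true          # assumption: top-level boolean flag as listed in the table

tasks_list: ['codegen']
```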