
Commit b723c72

Merge pull request #157 from meiranp-nvidia/meiranp-nv/llm_prompt_helpers
Add new tool: llm prompt design helper
2 parents b2fd9c5 + ff6e6ec commit b723c72

22 files changed, +1648 -1 lines changed

experimental/README.md

Lines changed: 5 additions & 1 deletion
@@ -58,4 +58,8 @@ Experimental examples are sample code and deployments for RAG pipelines that are
* [NVIDIA Event Driven RAG for CVE Analysis with NVIDIA Morpheus](./event-driven-rag-cve-analysis/)

  This example demonstrates how NVIDIA Morpheus, NIMs, and RAG pipelines can be integrated to create LLM-based agent pipelines. These pipelines will be used to automatically and scalably triage and detect Common Vulnerabilities and Exposures (CVEs) in Docker containers using references to source code, dependencies, and information about the CVEs.

* [LLM Prompt Design Helper using NIM](./llm-prompt-design-helper/)

  This tool demonstrates how to use a user-friendly interface to interact with NVIDIA NIMs, including those available in the API catalog, self-deployed NIM endpoints, and NIMs hosted on Hugging Face. It also provides settings to integrate RAG pipelines with either local and temporary vector stores or self-hosted search engines. Developers can use this tool to design system prompts and few-shot prompts, and to configure LLM settings.
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
*.gif filter=lfs diff=lfs merge=lfs -text
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
FROM ubuntu:20.04

RUN apt-get -y update
RUN apt-get -y install python3 python3-pip

# RUN mkdir /chat_ui
# COPY chat_ui.py /chat_ui
# COPY config.yaml /chat_ui
# COPY api_request.py /chat_ui
COPY requirements.txt /chat_ui/
WORKDIR /chat_ui/

RUN pip3 install --upgrade pip
RUN pip3 install -r requirements.txt

ENTRYPOINT ["python3"]
CMD ["-u", "chat_ui.py"]
Lines changed: 132 additions & 0 deletions
@@ -0,0 +1,132 @@
# guide_to_integrate_api_catalog

This project provides a simple UI to interact with selectable NIM endpoints (see the supported endpoints below) and to integrate a RAG pipeline.

- [API catalog](https://build.nvidia.com/explore/discover) hosted by NVIDIA
- Self-hosted NIM
- Hugging Face NIM

## Target Users
This project targets developers who:
- Want to evaluate different NIM LLMs with small or large datasets.
- Need to tune parameters such as temperature, top_p, etc.
- Need to do prompt engineering, such as system prompts and few-shot examples.
- Need to design simple agents based on prompt engineering.
- Want to integrate a RAG pipeline to evaluate the designed system prompt.

## System prompt helper

![screenshot of the UI](./data/simple_ui.jpeg)

The provided interface supports designing a system prompt to call the LLM. The system prompt is configured in the `config.yaml` file using the model name as the key, e.g., `"meta/llama3-70b-instruct"`. You can also add few-shot examples in the `config.yaml` file (commented lines in the file describe the format) or via the UI in a defined format for your typical use case.

For development purposes, developers can use this interface to design the system prompt interactively. After selecting the model, you can input a new system prompt, which will override the one from `config.yaml`. Once the system prompt is settled, you can save it for the related model in `config.yaml` by clicking the `Update Yaml based on UI settings` button; a rough sketch of the assumed per-model structure is shown after this section's note.

The interface automatically loads the selected model's configuration from `config.yaml` and displays it in the UI. It also lists the available chat models from the API catalog via `langchain-nvidia-ai-endpoints` in a dropdown menu. To see the list from the API catalog, you need to set the API key by following the instructions in the next section. If new models are not yet available via the endpoints, or you want to test with self-hosted or Hugging Face NIM endpoints, you can insert the model manually via the UI textbox (enter the name under `Model name in API catalog`, then click the `Insert the model into list` button).

Note: To insert models deployed in the API catalog, use the same name as defined in the API catalog.
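As a rough, hedged sketch only: based on the keys read by `api_request.py` (`system_prompt`, `few_shot_examples`, `temperature`, `top_p`, `max_tokens`, plus a `default` entry), a per-model entry in `config.yaml` might look like the structure built below. The model name, prompt text, and example messages are illustrative, not part of the shipped configuration.

```python
# Illustrative sketch of an assumed per-model config.yaml entry; values are examples only.
import yaml

example_entry = {
    "meta/llama3-70b-instruct": {  # model name is the top-level key
        "system_prompt": "You are a helpful assistant for NVIDIA developers.",
        "few_shot_examples": [     # optional OpenAI-style messages appended after the system prompt
            {"role": "user", "content": "What is a NIM?"},
            {"role": "assistant", "content": "A NIM is an NVIDIA inference microservice."},
        ],
        "temperature": 0.0,        # defaults used by api_request.py when a key is missing
        "top_p": 0.7,
        "max_tokens": 1024,
    }
}

# Print the YAML form of this entry (roughly what the update-yaml button would persist).
print(yaml.dump(example_entry, default_flow_style=False, sort_keys=False))
```
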
## Integrate with RAG pipeline
![screenshot of the UI - DB](./data/simple_ui_db.jpeg)

This tool provides two methods to integrate with the RAG pipeline:
1. Generate a temporary vector store for retrieval.
2. Interact with a self-hosted retrieval engine that provides an endpoint for retrieval.

### Temporary vector store
This tool supports inserting website HTML links, downloadable PDF links, and uploading PDFs from local storage.

By clicking the **DataBase** tab in the UI, you can input website links or downloadable PDF links, using commas to separate multiple entries. You can also upload PDFs by clicking the `Click to Upload Local PDFs` button. Once the data sources are prepared, you can set the chunk size, the chunk overlap size, and select one of the embedder models from the [NVIDIA API catalog](https://build.nvidia.com/explore/retrieval). By clicking `Embedding and Insert`, the content will be parsed, embedded, and inserted into a temporary vector store; a hedged sketch of this ingestion flow follows the screenshot below.

With this vector store set up, go back to the **LLM Prompt Designer** tab and expand the `Data Base settings`. The retrieval settings will be available, and you can select one of the Reranker models from the [NVIDIA API catalog](https://build.nvidia.com/explore/retrieval) for the RAG pipeline.

![screenshot of the Local Database settings - DB](./data/local-database-settings.jpeg)
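The following is a minimal sketch of how such a temporary vector store could be assembled with LangChain, assuming `langchain-community`, `langchain-text-splitters`, `faiss-cpu`, and `langchain-nvidia-ai-endpoints` are installed; the URL, chunk sizes, and model names are illustrative, and the actual implementation in `chat_ui.py` may differ.

```python
# Hedged sketch: build a temporary, in-memory vector store from a web page,
# then embed and rerank with NVIDIA API catalog models (API_CATALOG_KEY must be set).
import os

from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings, NVIDIARerank

docs = WebBaseLoader("https://docs.nvidia.com/nim/").load()          # or PyPDFLoader for PDFs

splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)

embedder = NVIDIAEmbeddings(model="nvidia/nv-embedqa-e5-v5",          # example API catalog embedder
                            api_key=os.getenv("API_CATALOG_KEY"))
store = FAISS.from_documents(chunks, embedder)                        # temporary vector store

query = "How do I deploy a NIM?"
retrieved = store.similarity_search(query, k=4)

reranker = NVIDIARerank(model="nvidia/nv-rerankqa-mistral-4b-v3",     # example reranker; optional
                        api_key=os.getenv("API_CATALOG_KEY"))
top_docs = reranker.compress_documents(documents=retrieved, query=query)
print(top_docs[0].page_content)
```
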
### Self-deployed retrieval engine
This tool also supports interacting with a self-hosted retrieval engine that provides an endpoint for retrieval.

Expand `Data Base settings` -> `Self deployed vector database settings` in the **LLM Prompt Designer** tab, then input the engine endpoint and the query format string, using `{input}` as the placeholder for the query. The retrieval database selection `self-deployed-db` will then be available. You can select one of the Reranker models available in the [NVIDIA API catalog](https://build.nvidia.com/explore/retrieval) for the RAG pipeline, or disable the reranker by selecting `None`. A hedged sketch of how such a format string might be used follows the screenshot below.

![screenshot of the Self-Deployed Database settings - DB](./data/self-host-database-settings.jpeg)
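As a rough illustration only, the configured format string presumably has `{input}` substituted with the user query before the request is sent to the retrieval endpoint; the endpoint URL, payload shape, and response handling below are assumptions, not the tool's actual request format.

```python
# Hedged sketch: fill the {input} placeholder and query a hypothetical self-hosted retrieval endpoint.
import requests

endpoint = "http://localhost:9000/retrieve"                  # hypothetical engine endpoint
query_format = '{{"query": "{input}", "top_k": 4}}'          # format string configured in the UI

user_question = "How do I deploy a NIM?"
payload = query_format.format(input=user_question)           # {input} replaced with the query

response = requests.post(endpoint, data=payload, headers={"Content-Type": "application/json"})
contexts = response.json()                                    # retrieved passages to pass as context
print(contexts)
```
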
## Getting started
### Prepare the docker image
Run the command below to build the Docker image:
```bash
git clone https://github.com/NVIDIA/GenerativeAIExamples/ && cd GenerativeAIExamples/community/llm-prompt-design-helper
bash ./build_image.sh
```

### Start the project
#### API catalog NIM endpoints
Set the API key environment variables before starting the container:

```bash
export API_CATALOG_KEY="nvapi-*******************"
export NIM_INFER_URL="https://integrate.api.nvidia.com/v1"
```

If you don't have an API key, follow [these instructions](https://github.com/NVIDIA/GenerativeAIExamples/blob/main/docs/api-catalog.md#get-an-api-key-for-the-accessing-models-on-the-api-catalog) to sign up for an NVIDIA AI Foundation developer account and obtain access.

Run the command below to start the container:
```bash
bash ./run_container.sh
```

#### Self-hosted NIM endpoints
If you already have access to a self-hosted NIM, you can follow the [guide](https://docs.nvidia.com/nim/large-language-models/latest/introduction.html) to set it up.

To run inference via this UI, follow the [run inference](https://docs.nvidia.com/nim/large-language-models/latest/getting-started.html#openai-completion-request) guide to get the base_url and api_key, then run the commands below to set the environment:

```bash
export API_CATALOG_KEY="not-used"
export NIM_INFER_URL="http://0.0.0.0:8000/v1"
```

Run the command below to start the container:
```bash
bash ./run_container.sh
```

NOTE:
1. If you have models deployed at different IP addresses, you can set the environment once and then use UI -> Show more settings to input a different IP and port, e.g. "http://{IP}:{PORT}/v1".
2. The **Insert model manually** feature is disabled when inferencing with a self-hosted NIM endpoint.

#### Hugging Face NIM endpoints
NVIDIA has collaborated with Hugging Face to simplify generative AI model deployments. You can follow this [technical blog](https://developer.nvidia.com/blog/nvidia-collaborates-with-hugging-face-to-simplify-generative-ai-model-deployments/) to deploy a NIM on Hugging Face. After the deployment, you can interact with the NIM endpoints via this project.

To run inference via this UI, get the base_url and api_key from Hugging Face, then run the commands below to set the environment:
```bash
export API_CATALOG_KEY="hf_xxxx"
export NIM_INFER_URL="{hugging face inference URL}"
```

Run the command below to start the container:
```bash
bash ./run_container.sh
```
NOTE:
1. The **Insert model manually** feature is disabled when inferencing with a Hugging Face NIM endpoint.

### Access the UI
After the service starts up, you can open the UI at http://localhost:80/.

## Test with dataset
Once `config.yaml` is finalized, you can load a local test set and run inference against the saved configuration. Refer to [`test.py`](./test.py) for a sample script; a hedged sketch of such a loop is shown below.
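The sketch below assumes `OpenAIClient` (this repository's OpenAI-compatible client) is importable, that `API_CATALOG_KEY` and `NIM_INFER_URL` are exported as in the Getting started section, and that the test set is a JSONL file with a `question` field; the module path, dataset format, and model name are assumptions, and `test.py` may differ.

```python
# Hedged sketch: batch-test a finalized config.yaml over a local dataset.
import json

from open_ai_request import OpenAIClient   # module name is an assumption

client = OpenAIClient("config.yaml")
model = "meta/llama3-70b-instruct"          # example model key present in config.yaml

with open("test_set.jsonl") as f:
    questions = [json.loads(line)["question"] for line in f]

for question in questions:
    # chat_messages uses the (user, assistant) pair format expected by generate_response.
    stream = client.generate_response(model, chat_messages=[(question, "")])
    answer = "".join(token for token in stream)
    print(question, "->", answer)
```
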
### Demo
To change the application port from the default 80, do the following:
- Update the port number in `chat_ui.py` on the line `UI_SERVER_PORT = int(os.getenv("UI_SERVER_PORT", 80))`
- Update the port number in `run_container.sh` on the line `docker run -d -p80:80 ***`

See the demo: ![workflow demo](./data/llm-prompt-designer-demo.gif)

## Contributing

Please create a merge request to this repository. Our team appreciates any and all contributions that add features! We will review and get back to you as soon as possible.
Lines changed: 101 additions & 0 deletions
@@ -0,0 +1,101 @@
# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import yaml
import os
from abc import ABC, abstractmethod
import logging

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')

API_CATALOG_KEY = os.getenv("API_CATALOG_KEY", "")
NIM_INFER_URL = os.getenv("NIM_INFER_URL", "https://integrate.api.nvidia.com/v1")


class APIRequest(ABC):
    def __init__(self, config_path):
        self.config_path = config_path
        self.config = {}
        with open(config_path, 'r') as file:
            self.config = yaml.safe_load(file)

    def get_model_settings(self, api_model):
        model_settings = self.config.get(api_model, None)
        if model_settings is None:
            logging.info(f"No config for {api_model}, load the default")
            model_settings = self.config.get('default')

        return model_settings

    def update_yaml(self, api_model, parameters):
        self.config.update({api_model: parameters})
        with open(self.config_path, 'w') as file:
            yaml.dump(self.config, file, default_flow_style=False, sort_keys=False)

    def get_model_configuration(self, api_model):
        model_config = self.get_model_settings(api_model)
        return model_config

    @abstractmethod
    def send_request(self, api_model, oai_message, temperature, top_p, max_tokens, base_url=''):
        pass

    def generate_response(self, api_model, chat_messages, system_prompt=None, initial_prompt=None,
                          temperature=None, top_p=None, max_tokens=None, few_shot_exampls=None,
                          base_url='', context=''):
        # Step 1: Get the model config from the configuration yaml file.
        model_config = self.get_model_settings(api_model)

        # Step 2: Get the LLM parameters for the selected model.
        temperature = temperature if temperature is not None else model_config.get("temperature", 0.0)
        top_p = top_p if top_p is not None else model_config.get("top_p", 0.7)
        max_tokens = max_tokens if max_tokens is not None else model_config.get("max_tokens", 1024)

        # Step 3: Prepare the messages to be sent to the API catalog:
        # system prompt, optional retrieved context, few-shot examples, then chat history.
        oai_message = []
        system_prompt_message = system_prompt if system_prompt is not None else model_config.get('system_prompt', '')
        if context:
            system_prompt_message += f"\nUse the following pieces of retrieved context to answer the question. \n {context}"
        few_shot_examples = few_shot_exampls if few_shot_exampls else model_config.get('few_shot_examples', [])
        if system_prompt_message != '':
            oai_message.append({'role': 'system', 'content': system_prompt_message})
        for example in few_shot_examples:
            oai_message.append(example)

        for item in chat_messages:
            if item[0] is None and item[1] == initial_prompt:
                continue
            oai_message.append({'role': 'user', 'content': item[0]})
            if item[1] != '' and item[1] is not None:
                # Add the pure assistant response to the chat history.
                oai_message.append({'role': 'assistant', 'content': item[1]})

        # Logged for debugging only; the actual request is delegated to send_request below.
        request_body = {
            "model": api_model,
            "messages": oai_message,
            "temperature": temperature,
            "top_p": top_p,
            "max_tokens": max_tokens,
            "stream": True
        }
        logging.info(request_body)

        # Step 4: Send the request using an OpenAI-compatible API.
        return self.send_request(api_model, oai_message, temperature, top_p, max_tokens, base_url)
Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from .api_request import API_CATALOG_KEY, APIRequest, NIM_INFER_URL
from langchain_nvidia_ai_endpoints import ChatNVIDIA


class ChatNVDIAClient(APIRequest):
    def __init__(self, config_path):
        super().__init__(config_path)

    def send_request(self, api_model, oai_message, temperature, top_p, max_tokens, base_url=''):
        # Default NIM_INFER_URL is "https://integrate.api.nvidia.com/v1".
        base_url_infer = NIM_INFER_URL
        api_key = API_CATALOG_KEY
        if base_url != '':
            base_url_infer = base_url

        client = ChatNVIDIA(base_url=base_url_infer, api_key=api_key, model=api_model,
                            temperature=temperature, max_tokens=max_tokens, top_p=top_p)
        try:
            completion = client.stream(oai_message, timeout=10.0)

            # Step 5: Yield the delta content of each streamed chunk.
            for chunk in completion:
                if chunk.content is not None:
                    yield chunk.content
        except Exception as e:
            yield "Request is Error:\n" + str(e)
Lines changed: 83 additions & 0 deletions
@@ -0,0 +1,83 @@
# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from .api_request import API_CATALOG_KEY, APIRequest, NIM_INFER_URL
from openai import OpenAI
from langchain_openai import ChatOpenAI


class OpenAIClient(APIRequest):
    def __init__(self, config_path):
        super().__init__(config_path)

    def send_request_chain(self, api_model, oai_message, temperature, top_p, max_tokens, base_url=''):
        base_url_infer = NIM_INFER_URL
        api_key = API_CATALOG_KEY
        if base_url != '':
            base_url_infer = base_url

        client = ChatOpenAI(
            base_url=base_url_infer,
            api_key=api_key,
            model=api_model,
            temperature=temperature,
            max_tokens=max_tokens,
            model_kwargs={"top_p": top_p},
            timeout=10.0
        )
        try:
            completion = client.stream(oai_message)

            # Step 5: Yield the delta content of each streamed chunk.
            for chunk in completion:
                if chunk.content is not None:
                    yield chunk.content
        except Exception as e:
            yield "Request is Error:\n" + str(e)

    def send_request(self, api_model, oai_message, temperature, top_p, max_tokens, base_url=''):
        # Default NIM_INFER_URL is "https://integrate.api.nvidia.com/v1".
        base_url_infer = NIM_INFER_URL
        api_key = API_CATALOG_KEY
        if base_url != '':
            base_url_infer = base_url

        client = OpenAI(
            base_url=base_url_infer,
            api_key=api_key
        )
        try:
            completion = client.chat.completions.create(
                model=api_model,
                messages=oai_message,
                temperature=temperature,
                top_p=top_p,
                max_tokens=max_tokens,
                stream=True,
                timeout=10.0
            )

            # Step 5: Yield the delta content of each streamed chunk.
            for chunk in completion:
                if chunk.choices[0].delta.content is not None:
                    yield chunk.choices[0].delta.content
        except Exception as e:
            yield "Request is Error:\n" + str(e)
