
Utilities Overview

Emir Sahin edited this page Aug 26, 2024 · 1 revision


The utility functions live primarily in core.py and are designed to streamline common tasks and operations in your development projects. Below, each function is described with an explanation and an example to help you integrate these utilities into your work.


Enum: AgentType

Description:
Defines different types of agents used within the system. This enum simplifies the process of assigning and managing agent roles.

Members:

  • PLANNER: For planning algorithms or operations.
  • SUMMARIZER: For summarizing content or data.
  • GENERIC_RESPONDER: For handling general responses in interactions.
  • VALIDATOR: For validating data or operations.

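For illustration, a minimal stand-in with the documented members might look like the following. The real enum lives in core.py; the member values and the prompt strings below are assumptions for the sketch.

```python
from enum import Enum, auto

# Hypothetical stand-in for core.AgentType; the real definition lives in core.py
class AgentType(Enum):
    PLANNER = auto()
    SUMMARIZER = auto()
    GENERIC_RESPONDER = auto()
    VALIDATOR = auto()

# Example use: dispatch a system prompt based on the agent's role
role_prompts = {
    AgentType.PLANNER: "You break tasks into steps.",
    AgentType.SUMMARIZER: "You condense content.",
}
print(role_prompts[AgentType.PLANNER])
```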
Function: make_prompt

Description:
Creates a structured prompt in OpenAI format, optionally including images.

Parameters:

  • role (str): The role of the prompt.
  • content (str): The content of the prompt.
  • images (list, optional): A list of images to include in the prompt.

Returns:

  • dict: The structured prompt in OpenAI format.

Example:

prompt = make_prompt("system", "You are a helpful assistant")
print(prompt)
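
To make the return shape concrete, here is a simplified sketch of the behaviour, assuming images are attached as image_url content parts in the OpenAI vision format. The exact structure produced by make_prompt is defined in core.py; treat this as an approximation.

```python
# Hypothetical sketch of what make_prompt produces; see core.py for the real logic
def make_prompt_sketch(role, content, images=None):
    if not images:
        # Plain text prompt: content stays a string
        return {"role": role, "content": content}
    # With images, content becomes a list of typed parts (OpenAI vision format)
    parts = [{"type": "text", "text": content}]
    parts += [{"type": "image_url", "image_url": {"url": img}} for img in images]
    return {"role": role, "content": parts}

print(make_prompt_sketch("system", "You are a helpful assistant"))
```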

Function: read_pdf

Description:
Reads a PDF file and returns its text content.

Parameters:

  • file (str): The path to the PDF file.

Returns:

  • str: The text extracted from the PDF file.

Example:

text = read_pdf("sample.pdf")
print(text)

Function: get_yaml_prompt

Description:
Extracts a specific prompt from a YAML file based on the provided name. This is intended as a way to store prompts outside of your code. See the system_prompts.yaml file for an example of how it can be set up.

Parameters:

  • yaml_file_name (str): The name of the YAML file.
  • prompt_name (str): The specific prompt to retrieve.

Returns:

  • str: The content of the prompt.

Example:

prompt = get_yaml_prompt("system_prompts.yaml", "welcome_message")
print(prompt)
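
The prompt names map to keys in the YAML file. A plausible layout for such a file is sketched below; the actual system_prompts.yaml in the repository is authoritative, and the welcome_message content here is invented for illustration.

```yaml
# Hypothetical layout; see system_prompts.yaml in the repo for the real structure
welcome_message: |
  You are a helpful assistant.
  Greet the user warmly.
```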

Function: generate_schema

Description:
Generates a JSON schema for a list of functions using their documentation and signatures.

Parameters:

  • functions (list): A list of functions to generate the schema for.

Returns:

  • str: A JSON string representing the schema of the provided functions.

Example:

import core
schema = core.generate_schema([core.read_pdf, core.make_prompt])
print(schema)
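
A schema like this can be derived from each function's signature and docstring with the inspect module. The following is a simplified sketch of the idea, not the actual implementation in core.py:

```python
import inspect
import json

def generate_schema_sketch(functions):
    # Build a minimal schema entry per function from its signature and docstring
    schema = []
    for fn in functions:
        params = list(inspect.signature(fn).parameters)
        schema.append({
            "name": fn.__name__,
            "description": (fn.__doc__ or "").strip(),
            "parameters": params,
        })
    return json.dumps(schema, indent=2)

def read_pdf(file):
    """Reads a PDF file and returns its text content."""

print(generate_schema_sketch([read_pdf]))
```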

Function: safe_read_json

Description:
Safely parses a JSON string, handling potential errors.

Parameters:

  • response (str): The JSON string to parse.

Returns:

  • dict: The parsed JSON object, or None if the JSON is invalid.

Example:

json_str = '{"key": "value"}'
data = safe_read_json(json_str)
print(data)
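
The error handling boils down to a guarded json.loads. A minimal sketch of the pattern follows; the real function in core.py may do additional cleanup before parsing.

```python
import json

# Simplified sketch of safe JSON parsing; core.safe_read_json may also clean the input
def safe_read_json_sketch(response):
    try:
        return json.loads(response)
    except (json.JSONDecodeError, TypeError):
        return None

print(safe_read_json_sketch('{"key": "value"}'))  # {'key': 'value'}
print(safe_read_json_sketch("not json"))          # None
```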

Function: find_most_relevant

Description:
Finds the texts whose embeddings are most similar to prompt_embedding, useful for semantic search or recommendation systems.

Parameters:

  • text_embedding_pairs (list): A list of tuples where each tuple contains a text and its corresponding embedding.
  • prompt_embedding (list): The embedding of the prompt against which other embeddings are compared.
  • top_k (int, optional): The number of texts to return, in descending order of relevance. Defaults to 5.

Returns:

  • list: A list of the most relevant texts, ranked by cosine similarity to the prompt embedding.

Example:

# some_embedding_function is a placeholder for your embedding model
texts = ["Hello world", "Hello there", "Greetings", "Hi there", "Welcome"]
embeddings = [some_embedding_function(text) for text in texts]
prompt_embedding = some_embedding_function("Hello")
most_relevant_texts = find_most_relevant(list(zip(texts, embeddings)), prompt_embedding, top_k=3)
print(most_relevant_texts)  # e.g. ['Hello world', 'Hello there', 'Hi there'], depending on the embedding model
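
Relevance here is measured by cosine similarity. The ranking logic can be sketched in a self-contained way with toy 2-dimensional embeddings; core.py's implementation may differ in details.

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of vector norms
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def find_most_relevant_sketch(text_embedding_pairs, prompt_embedding, top_k=5):
    # Sort texts by descending cosine similarity to the prompt embedding
    ranked = sorted(text_embedding_pairs,
                    key=lambda pair: cosine_similarity(pair[1], prompt_embedding),
                    reverse=True)
    return [text for text, _ in ranked[:top_k]]

pairs = [("greeting", [1.0, 0.1]), ("farewell", [0.1, 1.0]), ("question", [0.7, 0.7])]
print(find_most_relevant_sketch(pairs, [1.0, 0.0], top_k=2))  # ['greeting', 'question']
```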

Function: split_into_sentences

Description:
Splits a given text into sentences using punctuation and other markers as delimiters. This function handles edge cases like abbreviations, numbers, websites, and more.

Parameters:

  • text (str): The text to split into sentences.

Returns:

  • list: A list of sentences derived from the text.

Example:

text = "My name is Bilbo Baggins. Who are you?"
sentences = split_into_sentences(text)
print(sentences)  # Output: ['My name is Bilbo Baggins.', 'Who are you?']
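
The simplest version of this is a regex split on sentence-ending punctuation; the real split_into_sentences additionally handles abbreviations, numbers, websites, and other edge cases. A naive sketch for comparison:

```python
import re

# Naive sentence splitter; the real split_into_sentences handles abbreviations, numbers, URLs, etc.
def split_into_sentences_naive(text):
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

print(split_into_sentences_naive("My name is Bilbo Baggins. Who are you?"))
# ['My name is Bilbo Baggins.', 'Who are you?']
```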

Function: split_into_chunks

Description:
Divides a given text into chunks, each containing a specified number of sentences. This is particularly useful for processing or summarization tasks where large text needs to be broken down into manageable parts.

Parameters:

  • text (str): The text to split into chunks.
  • sentences_per_chunk (int): The maximum number of sentences per chunk.

Returns:

  • list: A list of text chunks, each containing up to the specified number of sentences.

Example:

text = "Sentence one. Sentence two. Sentence three. Sentence four. Sentence five."
chunks = split_into_chunks(text, sentences_per_chunk=2)
print(chunks)  # Output: ['Sentence one. Sentence two.', 'Sentence three. Sentence four.', 'Sentence five.']
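
The chunking itself is just grouping consecutive sentences. Here is a sketch of that grouping step, assuming the sentences have already been split (core.py's split_into_chunks pairs this with split_into_sentences):

```python
# Sketch of grouping sentences into fixed-size chunks;
# core.py's version builds on split_into_sentences
def chunk_sentences(sentences, sentences_per_chunk):
    return [" ".join(sentences[i:i + sentences_per_chunk])
            for i in range(0, len(sentences), sentences_per_chunk)]

sentences = ["Sentence one.", "Sentence two.", "Sentence three.",
             "Sentence four.", "Sentence five."]
print(chunk_sentences(sentences, 2))
# ['Sentence one. Sentence two.', 'Sentence three. Sentence four.', 'Sentence five.']
```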

Additional Functions

  • clean_json_response: Cleans up a JSON response string by removing unnecessary characters.
  • internet_search: Performs an internet search for a given query and returns the top results.
  • read_website, selenium_reader, selenium_hybrid_reader: Different methods for retrieving and reading website content.
  • fetch_url_info: Fetches basic metadata like title and description from a URL.
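
LLM responses often wrap JSON in markdown code fences, which is the kind of wrapper clean_json_response is meant to strip. A hedged sketch of the common pattern follows; the exact characters core.py removes may differ. (The fence string is built with chr(96) only so this example can itself live inside a fenced code block.)

```python
import re

FENCE = chr(96) * 3  # the literal three-backtick fence, built indirectly

# Hypothetical sketch: strip markdown code fences around a JSON payload
def clean_json_response_sketch(response):
    cleaned = response.strip()
    cleaned = re.sub(r"^" + FENCE + r"(?:json)?\s*", "", cleaned)
    cleaned = re.sub(r"\s*" + FENCE + r"$", "", cleaned)
    return cleaned

raw = FENCE + 'json\n{"key": "value"}\n' + FENCE
print(clean_json_response_sketch(raw))  # {"key": "value"}
```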