yumiian/tnl


Question Answering System in Malaysian Language

A Question Answering (QA) system designed specifically for the Malaysian language. Users ask natural-language questions and receive answers grounded in the context they provide. Context can be supplied by file upload or as raw text through a user-friendly Gradio interface. The system also uses Retrieval-Augmented Generation (RAG) to handle multiple input documents for better information retrieval.

Features

  • Contextual QA using RAG, with support for .txt, .pdf, and .docx files.
  • Gradio UI: Intuitive chatbot interface for users to chat with the QA system.

Dataset

Model

Installation

  1. Clone the repository.

  2. Create a virtual environment.

  3. Install CUDA or check your installed CUDA version using this command (cmd):

nvcc --version

  4. Then, install the PyTorch build that matches your installed CUDA version.

  5. Install the required dependencies:

pip install -r requirements.txt

  6. Start the Gradio application:

python app.py

Wait until it shows this output:

Device set to use cuda:0
* Running on local URL:  http://127.0.0.1:7860
* To create a public link, set `share=True` in `launch()`.

  7. Then, navigate to http://localhost:7860/ and start using the application.

  8. Done!
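
The CUDA check in step 3 can also be scripted, which helps when choosing the PyTorch build in step 4. The following is a minimal sketch, not part of this repository: the helper names `parse_cuda_release` and `installed_cuda_release` are hypothetical, and it assumes `nvcc --version` prints a line containing `release X.Y`, as current CUDA toolkits do.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_cuda_release(nvcc_output: str) -> Optional[str]:
    """Return the CUDA release (e.g. "12.1") found in `nvcc --version` output."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def installed_cuda_release() -> Optional[str]:
    """Run `nvcc --version`; return the release, or None if nvcc is not on PATH."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=False
    )
    return parse_cuda_release(result.stdout)


if __name__ == "__main__":
    release = installed_cuda_release()
    if release is None:
        print("nvcc not found; install the CUDA toolkit first.")
    else:
        print(f"CUDA release {release} detected; pick the matching PyTorch build.")
```

Keeping the parsing separate from the subprocess call means the version logic can be checked against saved `nvcc` output on a machine without a GPU.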

Getting started

  1. Provide Context:
  • Upload .txt, .pdf, or .docx files, or
  • Paste raw text into the "Extracted Context" field.
  2. Ask Questions:
  • Enter your question in the input box.
  • The model will return an answer based on the uploaded/pasted context.
  3. RAG Pipeline:

When multiple documents are provided, the system uses RAG to extract relevant content before answering.
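
To illustrate the retrieval step in general terms, here is a minimal sketch that splits documents into chunks, scores each chunk against the question, and keeps the best matches as context. It is an assumption-laden illustration, not this project's pipeline: it swaps in plain bag-of-words cosine similarity where a real RAG system would typically use learned embeddings, and every function name (`chunk_text`, `retrieve`, `build_prompt`) is hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def chunk_text(text: str, chunk_size: int = 50) -> List[str]:
    """Split text into chunks of at most `chunk_size` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question: str, documents: List[str], top_k: int = 2,
             chunk_size: int = 50) -> List[Tuple[float, str]]:
    """Return the `top_k` highest-scoring (score, chunk) pairs across all documents."""
    query_vec = Counter(tokenize(question))
    scored = []
    for doc in documents:
        for chunk in chunk_text(doc, chunk_size):
            scored.append((cosine_similarity(query_vec, Counter(tokenize(chunk))), chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def build_prompt(question: str, documents: List[str], top_k: int = 2) -> str:
    """Assemble the retrieved chunks and the question into one prompt for a QA model."""
    context = "\n".join(chunk for _, chunk in retrieve(question, documents, top_k))
    return f"Context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    docs = [
        "Kuala Lumpur ialah ibu negara Malaysia.",
        "Nasi lemak ialah makanan kebangsaan Malaysia.",
    ]
    print(build_prompt("Apakah ibu negara Malaysia?", docs, top_k=1))
```

Passing only the top-scoring chunks to the model keeps the context short and focused, which is the main reason to retrieve before answering when several documents are uploaded.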

Screenshots

interface

Acknowledgement

Huge thanks to: