This project is a web application built with Streamlit for generating fake datasets. Users select fields such as names, addresses, emails, and phone numbers, and the app generates fake data for those fields. The app also lets users choose the output format (CSV, JSON, or JSONL) and supports different locales (e.g., en_US, de_DE, fr_FR).
- Select multiple fields to generate fake data (e.g., Name, Address, Email, Job).
- Choose from multiple output formats: CSV, JSON, or JSONL.
- Select the locale for Faker (e.g., English, French, German, Spanish).
- Easy-to-use interface built with Streamlit.
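As a rough illustration of the field and locale selection described above, the following sketch maps field labels to Faker provider methods and builds a list of records. The names `FIELD_METHODS`, `make_faker`, and `generate_records` are hypothetical and chosen for this example; the app's actual code in main.py may be organized differently.

```python
from typing import Any

# Hypothetical mapping from UI field labels to Faker provider method names.
FIELD_METHODS = {
    "Name": "name",
    "Address": "address",
    "Email": "email",
    "Phone Number": "phone_number",
    "Job": "job",
}


def make_faker(locale: str = "en_US") -> Any:
    """Create a Faker instance for the given locale (needs `pip install faker`)."""
    # Imported lazily so generate_records can be used with any compatible provider.
    from faker import Faker

    return Faker(locale)


def generate_records(fake: Any, fields: list[str], count: int) -> list[dict[str, str]]:
    """Return `count` records, each with one fake value per selected field."""
    unknown = [field for field in fields if field not in FIELD_METHODS]
    if unknown:
        raise ValueError(f"Unsupported fields: {unknown}")

    records = []
    for _ in range(count):
        # Look up the provider method by name and call it once per field.
        record = {field: getattr(fake, FIELD_METHODS[field])() for field in fields}
        records.append(record)
    return records
```

For example, `generate_records(make_faker("de_DE"), ["Name", "Email"], 3)` would return three dictionaries with German-style names and email addresses. The values are random, so they differ on every run.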
- Docker: Ensure that Docker is installed. You can download it from the official Docker website.
- Streamlit, Faker, Pandas: These libraries are required to run the app locally. If you're not using Docker, you can install them with the following command:
pip install streamlit faker pandas
To build the Docker image, run the following command from the root directory of the project:
docker build --tag fake_dataset_builder:latest .
This creates a Docker image named fake_dataset_builder.
To run the application in a Docker container, use the following command:
docker run -d --name fake_dataset_builder -p 8501:8501 fake_dataset_builder:latest
This command runs the container in the background, maps port 8501 of the container to port 8501 on your machine, and starts the Streamlit application.
Once the Docker container is running, open a web browser and go to:
http://localhost:8501
This will load the Streamlit interface where you can select the fields to include in your dataset, choose the locale, set the number of records to generate, and download the generated file.
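Because the container starts in the background, the page may not be reachable for the first few seconds. A small helper like the one below can wait until something is listening on port 8501 before you open the browser. This is a minimal sketch using only the Python standard library; `wait_for_port` is a hypothetical name, and an open port only shows that a server is accepting connections, not that the Streamlit app has finished loading.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 8501, timeout: float = 30.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connection means a server is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Not listening yet (or unreachable): wait briefly and retry.
            time.sleep(0.5)
    return False
```

Calling `wait_for_port()` with the defaults checks localhost:8501 for up to 30 seconds.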
You can find the source code and contribute to the project at:
Fake Dataset Creator GitHub Repository
- Select Fields: Choose the fields you want to include (e.g., Name, Address, Email).
- Select Locale: Choose the locale (e.g., en_US, fr_FR) for the generated data.
- Set Number of Records: Choose how many records you want to generate.
- Select Output Format: Choose between CSV, JSON, or JSONL format.
- Generate File: Click the Generate File button, and a download link will appear.
- Download File: Download the generated file and use it for testing or data simulation.
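To make the difference between the three output formats concrete, here is a minimal sketch that serializes the same records as CSV, JSON, or JSONL. It uses only the Python standard library rather than Pandas so that it runs without extra dependencies, and `serialize` is a hypothetical helper, not necessarily how the app writes its files.

```python
import csv
import io
import json


def serialize(records: list[dict], fmt: str) -> str:
    """Serialize a list of flat dictionaries as CSV, JSON, or JSONL text."""
    fmt = fmt.upper()
    if fmt == "CSV":
        buffer = io.StringIO()
        # Column order follows the keys of the first record.
        fieldnames = list(records[0].keys()) if records else []
        writer = csv.DictWriter(buffer, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
        return buffer.getvalue()
    if fmt == "JSON":
        # One JSON array containing every record.
        return json.dumps(records, ensure_ascii=False, indent=2)
    if fmt == "JSONL":
        # One JSON object per line, with no enclosing array.
        return "\n".join(json.dumps(record, ensure_ascii=False) for record in records)
    raise ValueError(f"Unsupported format: {fmt}")
```

JSON produces a single array that must be parsed as a whole, while JSONL can be read line by line, which tends to suit large files and streaming tools.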
If you prefer to run the application without Docker, you can run it locally by following these steps:
- Clone this repository:
  git clone https://github.com/We4TechAI/Fake-Dataset-Creator.git
- Navigate to the project directory:
  cd Fake-Dataset-Creator
- Install the required dependencies:
  pip install -r requirements.txt
- Run the Streamlit application:
  streamlit run main.py
- Open your web browser and go to http://localhost:8501 to interact with the app.
This project is licensed under the MIT License - see the LICENSE file for details.
