# TinyLlama CoreML iOS 18 Quantization

Welcome to the TinyLlama CoreML iOS 18 Quantization repository! This project converts the TinyLlama-1.1B-Chat model from PyTorch to Core ML in float16, int8, and int4 variants, enabling efficient on-device inference on iOS 18 and later.
You can find the latest releases on the repository's Releases page. Download the model files you need to get started.
## Table of Contents

- Overview
- Features
- Installation
- Usage
- Model Details
- Quantization Techniques
- Supported Formats
- Contributing
- License
- Contact
## Overview

TinyLlama is a compact 1.1B-parameter language model well suited to mobile applications. Quantizing it makes it lightweight and efficient for use on iOS devices. This repository provides the tools necessary to convert and optimize the TinyLlama model so it runs smoothly on Apple Silicon.
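As a rough illustration of the conversion step, the sketch below traces a TinyLlama checkpoint with `torch.jit.trace` and converts it with `coremltools` (8.0+ for the iOS 18 target). The Hugging Face model ID, dummy input shape, and output path are assumptions for illustration, not this repository's exact script:

```python
import numpy as np
import torch
import coremltools as ct
from transformers import AutoModelForCausalLM

# Hypothetical conversion sketch; the repository's own scripts may differ.
model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # assumed Hugging Face ID
model = AutoModelForCausalLM.from_pretrained(model_id, torchscript=True)
model.eval()

# Trace with a dummy batch of token IDs (sequence length is an assumption).
example_input = torch.randint(0, 32000, (1, 64))
traced = torch.jit.trace(model, example_input)

# Convert to an ML Program targeting iOS 18, with float16 compute precision.
mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="input_ids", shape=example_input.shape, dtype=np.int32)],
    minimum_deployment_target=ct.target.iOS18,
    compute_precision=ct.precision.FLOAT16,
)
mlmodel.save("TinyLlama.mlpackage")
```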
## Features

- Efficient Quantization: Convert models to float16, int8, and int4 formats.
- On-Device Inference: Optimized for iOS 18 and later.
- Easy Integration: Simple setup for developers.
- Hugging Face Compatibility: Leverage the power of Hugging Face transformers.
## Installation

To install the necessary tools and libraries, follow these steps:

1. Clone the repository:

   ```bash
   git clone https://github.com/ambv231/tinyllama-coreml-ios18-quantization.git
   cd tinyllama-coreml-ios18-quantization
   ```

2. Install dependencies using pip:

   ```bash
   pip install -r requirements.txt
   ```

3. Ensure you have the latest version of Xcode installed on your machine.

4. Download the latest model files from the Releases section.
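To confirm the environment is set up, you can check that the key packages import cleanly. The assumption here is that `requirements.txt` pins at least `torch` and `coremltools`:

```python
# Quick environment check; assumes requirements.txt includes torch and coremltools.
import torch
import coremltools as ct

print("torch:", torch.__version__)
print("coremltools:", ct.__version__)
```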
## Usage

After installation, you can begin using the TinyLlama model in your iOS applications. Here's a simple example of how to load and use the model:
```swift
import CoreML

guard let model = try? TinyLlama(configuration: MLModelConfiguration()) else {
    fatalError("Could not load model")
}

// Perform inference
let input = TinyLlamaInput(text: "Hello, world!")
let output = try? model.prediction(input: input)
print(output?.response ?? "No response")
```
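If you converted the model yourself, you can sanity-check predictions from Python on macOS before wiring it into Xcode. This is a hedged sketch: the path is hypothetical, and the feature names simply mirror the Swift example above; a model converted from PyTorch typically takes token IDs rather than raw text.

```python
import coremltools as ct

# Load the converted model on macOS (path is hypothetical).
mlmodel = ct.models.MLModel("TinyLlama.mlpackage")

# Feature names mirror the Swift example and may differ in your model.
result = mlmodel.predict({"text": "Hello, world!"})
print(result)
```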
## Model Details

- Parameters: 1.1 billion
- Architecture: Transformer-based
- Training Data: Diverse datasets for improved language understanding
## Supported Formats

- float16: A half-precision floating-point format that roughly halves model size relative to float32.
- int8: An 8-bit integer weight format, roughly a 4x size reduction, that also enables faster computation.
- int4: A 4-bit integer weight format for the smallest model sizes.
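To show how these formats map onto tooling, here is a hedged sketch of post-training weight quantization with the `coremltools.optimize.coreml` APIs (coremltools 8+, required for int4). The input path, output names, and block size are assumptions, not this repository's exact settings:

```python
import coremltools as ct
import coremltools.optimize.coreml as cto

# Load a previously converted float16 model (path is hypothetical).
mlmodel = ct.models.MLModel("TinyLlama.mlpackage")

# int8: linear weight quantization.
int8_config = cto.OptimizationConfig(
    global_config=cto.OpLinearQuantizerConfig(mode="linear_symmetric", dtype="int8")
)
mlmodel_int8 = cto.linear_quantize_weights(mlmodel, config=int8_config)
mlmodel_int8.save("TinyLlama-int8.mlpackage")

# int4: blockwise linear weight quantization (coremltools 8 / iOS 18 feature).
int4_config = cto.OptimizationConfig(
    global_config=cto.OpLinearQuantizerConfig(
        mode="linear_symmetric",
        dtype="int4",
        granularity="per_block",
        block_size=32,  # assumed block size
    )
)
mlmodel_int4 = cto.linear_quantize_weights(mlmodel, config=int4_config)
mlmodel_int4.save("TinyLlama-int4.mlpackage")
```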
## Quantization Techniques

Quantization is the process of mapping a large set of values to a smaller set. In the context of machine learning, it helps in reducing the model size and improving inference speed without significantly sacrificing accuracy.
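To make that concrete: linear quantization maps a float `x` to an integer `q = round(x / scale) + zero_point`. A minimal, self-contained sketch with made-up weight values:

```python
import numpy as np

# Toy symmetric int8 quantization of a weight vector (illustrative values only).
w = np.array([-0.52, -0.10, 0.00, 0.31, 0.49], dtype=np.float32)

# Symmetric scheme: scale maps the largest |w| to 127; zero_point is 0.
scale = np.abs(w).max() / 127.0
q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)

# Dequantize to see the small rounding error quantization introduces.
w_hat = q.astype(np.float32) * scale
print(q)                        # [-127  -24    0   76  120]
print(np.abs(w - w_hat).max())  # worst-case reconstruction error
```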
- Post-Training Quantization: Applies quantization after the model has been trained, allowing efficient conversion with minimal loss in performance.
- Dynamic Quantization: Quantizes weights ahead of time and activations on the fly during inference, trading a little precision for flexibility and speed (see the sketch after this list).
- Quantization-Aware Training: Trains the model with quantization in mind, helping it adapt to the reduced precision.
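As one concrete instance of the dynamic approach, PyTorch's built-in dynamic quantization can be applied to a model's Linear layers in a single call. This is a general PyTorch sketch, not this repository's pipeline, and the Hugging Face model ID is an assumption:

```python
import torch
from transformers import AutoModelForCausalLM

# Load the float model (Hugging Face ID is an assumption).
model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# Quantize all Linear layers to int8 weights; activations are quantized
# dynamically at inference time.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```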
## Contributing

We welcome contributions to improve this project. If you want to help, please follow these steps:
- Fork the repository.
- Create a new branch for your feature or bug fix.
- Make your changes and commit them.
- Push your changes to your fork.
- Submit a pull request.
Please ensure your code adheres to the project's coding standards and includes relevant tests.
## License

This project is licensed under the MIT License. See the LICENSE file for details.
## Contact

For questions or support, please open an issue on GitHub or contact the repository owner.
The latest releases are also available on the repository's Releases page; download the files you need and start working with TinyLlama today!
This README provides an overview of the TinyLlama CoreML iOS 18 Quantization project. For further details and updates, please check the repository frequently.