Exa

Ultra-optimized fast inference library for running exascale LLMs locally on modern consumer-class GPUs.

Principles

Radical Simplicity (Utilizing super-powerful LLMs with as minimal code as possible)
Ultra-Optimizated (High Performance classes that extract all the power from these LLMs)
Fludity & Shapelessness (Plug in and play and re-architecture as you please)

🤝 Schedule a 1-on-1 Session

Book a 1-on-1 Session with Kye, the Creator, to discuss any issues, provide feedback, or explore how we can improve Exa for you.

📦 Installation 📦

You can install the package using pip

pip install exxa

Usage

Inference

from exa import Inference

model = Inference(
    model_id="georgesung/llama2_7b_chat_uncensored",
    quantize=True
)

model.run("What is your name")

GPTQ Inference

from exa import GPTQInference


model_id = "facebook/opt-125m"
model = GPTQInference(model_id=model_id, max_length=400)

prompt = "in a land far far away"
result = model.run(prompt)
print(result)

🎉 Features 🎉

World-Class Quantization: Get the most out of your models with top-tier performance and preserved accuracy! 🏋️‍♂️
Automated PEFT: Simplify your workflow! Let our toolkit handle the optimizations. 🛠️
LoRA Configuration: Dive into the potential of flexible LoRA configurations, a game-changer for performance! 🌌
Seamless Integration: Designed to work seamlessly with popular models like LLAMA, Falcon, and more! 🤖

💌 Feedback & Contributions 💌

We're excited about the journey ahead and would love to have you with us! For feedback, suggestions, or contributions, feel free to open an issue or a pull request. Let's shape the future of fine-tuning together! 🌱

License

MIT