
Lighter inference.
Sharper model path.
Real bundle.
RPT packs, optimizes, and runs LLMs at ternary precision. Faster inference. Smaller models. Real-world performance.
$ curl -O https://downloads.rpt.dog/rpt/gemma2b-ternary/v0.1.0/production_triton_packed_recovered_l78.zip

Ternary Precision
-1, 0, +1 weights reduce memory, bandwidth and compute while preserving quality.
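To make the idea concrete, here is a minimal sketch of one common way to map float weights onto -1, 0, +1: divide by the mean absolute value, round, and clamp. This is an illustration only, not RPT's actual quantizer; the function names `ternarize` and `dequantize` and the single shared scale are assumptions made for this example.

```python
def ternarize(weights):
    """Map floats to {-1, 0, +1} plus one shared scale (hypothetical helper, not RPT's API)."""
    if not weights:
        return [], 0.0
    # Assumed scheme: scale by the mean absolute value of the weights.
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0.0:
        return [0] * len(weights), 0.0
    # Round to the nearest integer, then clamp into the ternary range.
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, scale


def dequantize(ternary, scale):
    """Approximate the original weights from ternary codes and the shared scale."""
    return [t * scale for t in ternary]


if __name__ == "__main__":
    codes, scale = ternarize([0.9, -0.05, -1.2, 0.3])
    print(codes, scale)
    print(dequantize(codes, scale))
```

Small weights collapse to 0 and large ones saturate at -1 or +1, so only the sign pattern and one scale per group need to be stored.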
Packed Models
High-density packing shrinks models by up to 50% compared with the original FP16 weights.
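As a rough illustration of what packing means, the sketch below stores four ternary weights per byte using 2 bits each. The bit layout and the names `pack_ternary` and `unpack_ternary` are assumptions for this example; the actual on-disk format of an RPT bundle is not described here and may differ.

```python
def pack_ternary(values):
    """Pack ternary weights into bytes, 4 per byte (hypothetical layout, not RPT's format)."""
    packed = bytearray()
    for i in range(0, len(values), 4):
        byte = 0
        for j, t in enumerate(values[i:i + 4]):
            if t not in (-1, 0, 1):
                raise ValueError(f"not a ternary value: {t!r}")
            # Store t + 1 (0, 1, or 2) in the j-th 2-bit slot of the byte.
            byte |= (t + 1) << (2 * j)
        packed.append(byte)
    return bytes(packed)


def unpack_ternary(packed, count):
    """Recover the first `count` ternary weights from packed bytes."""
    values = []
    for byte in packed:
        for j in range(4):
            values.append(((byte >> (2 * j)) & 0b11) - 1)
    # Drop the padding slots of the last byte.
    return values[:count]


if __name__ == "__main__":
    weights = [1, 0, -1, 0, 1]
    blob = pack_ternary(weights)
    print(len(blob), unpack_ternary(blob, len(weights)))
```

The caller has to keep the weight count alongside the bytes, because the last byte may contain unused slots.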
Faster Inference
Optimized kernels deliver real speedups across compatible hardware.
Production Ready
Stable runtime, simple APIs, and tooling for real-world deployments.
Accuracy (local): 100/100 on a custom benchmark
Size reduction: ~50% vs. original FP16
Dense baseline: 0.92 (student_pruned local score)
Original HF: 0.90 (Gemma 2B local score)
Engineered to run.
Built to scale.
RPT is a lean, high-performance runtime for ternary-packed models. Minimal deps, max performance.
- ✓ Cross-platform (Linux, Windows)
- ✓ CUDA and CPU backends
- ✓ Streaming and batching
- ✓ Extensible and open


Get RPT bundle
Everything you need to pack, run and deploy ternary models.
- ✓ RPT Runtime
- ✓ Packer and tooling
- ✓ Examples and configs
- ✓ Model adapters
$ curl -O https://downloads.rpt.dog/rpt/gemma2b-ternary/v0.1.0/production_triton_packed_recovered_l78.zip