
Ternary Precision
-1, 0, +1 weights reduce memory, bandwidth and compute while preserving quality.

RPT packs, optimizes and runs LLMs with ternary precision. Faster inference. Smaller models. Real world performance.


-1, 0, +1 weights reduce memory, bandwidth and compute while preserving quality.

High density packing shrinks models up to 50% vs standard formats.

Optimized kernels deliver real speedups across consumer hardware.

Stable runtime, simple APIs and tooling for real world deployments.
Accuracy (MMLU)
100/100
Custom Benchmark
Size Reduction
~50%
vs Original FP16
Speedup
2.4x
Avg. Inference
Memory Savings
48%
Peak Usage
RPT is a lean, high performance runtime path for ternary packed models. The public repo keeps the notebooks, loaders and result trail visible.


The useful pieces are public: scripts, notebooks, runtime loaders, result files, manifests and model notices. Use them to reproduce the path or build a new one.