Lighter inference.
Sharper model path.
Real bundle.

RPT packs, optimizes and runs LLMs at ternary precision. Faster inference. Smaller models. Real-world performance.

Public research is open.
Scripts, notebooks, results and runtime experiments are available.
Fork it, reproduce it, improve it.

Ternary
Packed
Runtime

3 States · More Density · Less Weight

Ternary Precision

-1, 0, +1 weights reduce memory, bandwidth and compute while preserving quality.
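As an illustration of what mapping weights to -1, 0, +1 can look like, here is a minimal sketch of absmean ternary quantization. This is one common scheme, not necessarily the one RPT uses. The function names `quantize_ternary` and `dequantize` are hypothetical and chosen only for this example.

```python
def quantize_ternary(weights, eps=1e-8):
    """Map floats to {-1, 0, +1} plus one shared scale (absmean rule).

    Assumption: a single per-tensor scale; real systems may use
    per-channel or per-group scales instead.
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    scale = max(scale, eps)  # avoid division by zero for all-zero input
    # Round to the nearest integer, then clip into the ternary range.
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, scale


def dequantize(ternary, scale):
    """Approximate the original weights from ternary values and the scale."""
    return [t * scale for t in ternary]


weights = [0.9, -0.05, -1.2, 0.4, 0.0, -0.7]
ternary, scale = quantize_ternary(weights)
approx = dequantize(ternary, scale)
```

Each weight is stored as one of three states, and a single floating-point scale is kept alongside them to recover approximate magnitudes.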

Packed Models

High-density packing shrinks models by up to 50% compared with standard formats.
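To make the idea of dense packing concrete, the sketch below stores five ternary values in one byte using base-3 encoding (3^5 = 243 fits in the 256 values of a byte, so about 1.6 bits per weight). This is an assumed scheme for illustration only; it does not describe RPT's actual file layout. `pack_ternary` and `unpack_ternary` are hypothetical names.

```python
def pack_ternary(values):
    """Pack values from {-1, 0, +1} into bytes, five values per byte."""
    out = bytearray()
    for i in range(0, len(values), 5):
        group = values[i:i + 5]
        byte = 0
        # Walk the group in reverse so the first value ends up in the
        # least significant base-3 digit.
        for v in reversed(group):
            if v not in (-1, 0, 1):
                raise ValueError(f"not a ternary value: {v!r}")
            byte = byte * 3 + (v + 1)  # shift -1..1 to digits 0..2
        out.append(byte)
    return bytes(out)


def unpack_ternary(data, count):
    """Recover `count` ternary values from bytes made by pack_ternary."""
    values = []
    for byte in data:
        for _ in range(5):
            values.append(byte % 3 - 1)  # digit 0..2 back to -1..1
            byte //= 3
    return values[:count]  # drop padding digits from the last byte


values = [1, 0, -1, 1, 0, -1, 1]
packed = pack_ternary(values)
restored = unpack_ternary(packed, len(values))
```

The element count has to be stored separately, because the final byte is padded when the number of values is not a multiple of five.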

Faster Inference

Optimized kernels deliver real speedups across consumer hardware.
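One reason ternary weights can be cheaper to compute is that a matrix-vector product needs no weight multiplications: each weight either adds the input, subtracts it, or skips it. The pure-Python sketch below shows that logic only. It is not RPT's kernel code, and `ternary_matvec` is a hypothetical name.

```python
def ternary_matvec(rows, x, scale):
    """Multiply a ternary matrix (list of rows) by vector x.

    Assumption: one shared scale for the whole matrix, applied once
    per output element after accumulation.
    """
    out = []
    for row in rows:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi   # +1 weight: add the input
            elif w == -1:
                acc -= xi   # -1 weight: subtract the input
            # 0 weight: contributes nothing, so it is skipped
        out.append(acc * scale)
    return out


rows = [[1, 0, -1], [-1, 1, 1]]
x = [2.0, 3.0, 5.0]
result = ternary_matvec(rows, x, scale=0.5)
```

A production kernel would operate on packed data and vectorized instructions, but the add, subtract or skip structure stays the same.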

Production Ready

Stable runtime, simple APIs and tooling for real-world deployments.

Benchmarks

Accuracy (MMLU): 100/100 (Custom Benchmark)
Size Reduction: ~50% (vs Original FP16)
Speedup: 2.4x (Avg. Inference)
Memory Savings: 48% (Peak Usage)

Gemma 2B Ternary vs FP16 - RTX 3060 12GB - Batch Size: 1 - Seq Len: 1024

View full report ->

Runtime

Engineered to run.
Built to scale.

RPT is a lean, high-performance runtime path for ternary packed models. The public repo keeps the notebooks, loaders and result trail visible.

✓ Validated Linux/NVIDIA target
✓ Open notebooks and scripts
✓ Streaming & batching
✓ Published result trail

Runtime docs ->

Open Source

Build from the working trail

The useful pieces are public: scripts, notebooks, runtime loaders, result files, manifests and model notices. Use them to reproduce the path or build a new one.

✓ Research notebooks
✓ Runtime loaders
✓ Benchmark JSON/CSV
✓ Model license notices

Read Guide ->
Open Source Map ->

Model weights remain subject to their original licenses.
