RPT ternary runtime

Lighter inference.
Sharper model path.
Real bundle.

RPT packs, optimizes and runs LLMs with ternary precision. Faster inference. Smaller models. Real-world performance.

Public research is open.
Scripts, notebooks, results and runtime experiments are available.
Fork it, reproduce it, improve it.

Ternary Precision

Weights restricted to -1, 0 and +1 reduce memory, bandwidth and compute while preserving quality.
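As an illustration of what ternary weights look like in practice, here is a minimal sketch that maps floating-point weights to -1, 0 and +1 with one shared scale. The mean-absolute-value ("absmean") rule and the `ternarize` and `dequantize` names are assumptions chosen for the example; this page does not state which quantization rule RPT uses.

```python
def ternarize(weights):
    """Map floats to {-1, 0, +1} plus one shared scale (assumed absmean rule)."""
    # Scale is the mean absolute value; fall back to 1.0 if every weight is zero.
    # Assumes a non-empty list of weights.
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0
    # Round each scaled weight to the nearest integer, then clamp to [-1, 1].
    trits = [max(-1, min(1, round(w / scale))) for w in weights]
    return trits, scale


def dequantize(trits, scale):
    """Approximate the original weights from the ternary values and the scale."""
    return [t * scale for t in trits]


weights = [0.9, -0.05, -1.2, 0.4, 0.0, -0.7]
trits, scale = ternarize(weights)
print(trits)  # [1, 0, -1, 1, 0, -1]
```

Small weights collapse to 0 and the rest keep only their sign, so each weight needs far fewer bits than an FP16 value.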

Packed Models

High-density packing shrinks models by up to 50% vs standard formats.
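One way to pack ternary values densely is to store five of them per byte as a base-3 number, since 3^5 = 243 fits in the 256 values of a byte. The sketch below shows that idea only; the actual RPT on-disk layout is not described on this page, and `pack_trits` and `unpack_trits` are hypothetical names.

```python
def pack_trits(trits):
    """Pack values from {-1, 0, +1} into bytes, five per byte (3**5 = 243 <= 256)."""
    out = bytearray()
    for i in range(0, len(trits), 5):
        byte = 0
        # Encode the group as a base-3 number; the first trit is the lowest digit.
        for t in reversed(trits[i:i + 5]):
            byte = byte * 3 + (t + 1)  # shift -1/0/+1 to digits 0/1/2
        out.append(byte)
    return bytes(out)


def unpack_trits(data, count):
    """Recover the first `count` ternary values from packed bytes."""
    trits = []
    for byte in data:
        for _ in range(5):
            trits.append(byte % 3 - 1)  # lowest base-3 digit back to -1/0/+1
            byte //= 3
    # A short final group decodes with padding digits; drop them here.
    return trits[:count]


values = [1, 0, -1, 1, 0, -1, 1]
packed = pack_trits(values)
restored = unpack_trits(packed, len(values))
```

The caller has to remember the original count, because the final byte may hold fewer than five real values.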

Faster Inference

Optimized kernels deliver real speedups across consumer hardware.
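Part of the appeal of ternary weights is that a matrix-vector product needs no per-weight multiplication: each weight either adds an input, subtracts it, or skips it. The reference loop below is an illustrative sketch of that property, not the RPT kernel; `ternary_matvec` and its arguments are hypothetical, and a real kernel would work on packed data with vectorized or GPU code.

```python
def ternary_matvec(rows, scale, x):
    """Compute y = scale * (W @ x), where every entry of W is -1, 0 or +1."""
    y = []
    for row in rows:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi   # +1: add the input
            elif w == -1:
                acc -= xi   # -1: subtract the input
            # 0: the input is skipped entirely
        # One multiplication per output row applies the shared scale.
        y.append(acc * scale)
    return y


rows = [[1, 0, -1], [0, 1, 1]]
result = ternary_matvec(rows, 0.5, [2.0, 3.0, 4.0])
print(result)  # [-1.0, 3.5]
```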

Production Ready

Stable runtime, simple APIs and tooling for real-world deployments.

Benchmarks

Accuracy (MMLU): 100/100 (Custom Benchmark)
Size Reduction: ~50% (vs Original FP16)
Speedup: 2.4x (Avg. Inference)
Memory Savings: 48% (Peak Usage)

Gemma 2B Ternary vs FP16 - RTX 3060 12GB - Batch Size: 1 - Seq Len: 1024

Runtime

Engineered to run.
Built to scale.

RPT is a lean, high-performance runtime path for ternary packed models. The public repo keeps the notebooks, loaders and result trail visible.

  • Validated Linux/NVIDIA target
  • Open notebooks and scripts
  • Streaming & batching
  • Published result trail

Open Source

Build from the working trail

The useful pieces are public: scripts, notebooks, runtime loaders, result files, manifests and model notices. Use them to reproduce the path or build a new one.

  • Research notebooks
  • Runtime loaders
  • Benchmark JSON/CSV
  • Model license notices

Model weights remain subject to their original licenses.