RPT ternary runtime

Lighter inference.
Sharper model path.
Real bundle.

RPT packs, optimizes and runs LLMs with ternary precision. Faster inference. Smaller models. Real-world performance.

$ curl -O https://downloads.rpt.dog/rpt/gemma2b-ternary/v0.1.0/production_triton_packed_recovered_l78.zip

[Graphic: RPT dog mark, labeled Ternary, Packed, Runtime, with the weight values +1, 0, -1]

Ternary Precision

-1, 0, +1 weights reduce memory, bandwidth and compute while preserving quality.
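
As a rough illustration of what mapping weights to -1, 0 and +1 can look like, here is a minimal sketch. It assumes a per-tensor scale equal to the mean absolute weight, followed by rounding and clipping. This page does not say how RPT quantizes, so the scheme and the names `ternarize` and `dequantize` are hypothetical.

```python
def ternarize(weights):
    """Map floats to {-1, 0, +1} plus one shared scale.

    Assumption: the scale is the mean absolute weight (an "absmean" style
    choice); RPT's actual scheme is not described on this page.
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0:
        return [0] * len(weights), 0.0
    # Round to the nearest integer, then clip into the ternary range.
    trits = [max(-1, min(1, round(w / scale))) for w in weights]
    return trits, scale


def dequantize(trits, scale):
    """Approximate the original weights from ternary values and the scale."""
    return [t * scale for t in trits]


trits, scale = ternarize([0.9, -0.05, -1.2, 0.4])
print(trits)  # [1, 0, -1, 1]
```

Each weight then needs only one of three states plus a single shared scale, which is where the memory and bandwidth savings come from.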

Packed Models

High-density packing shrinks models by up to 50% vs standard formats.
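
One way to picture dense packing of ternary values: five of them fit in one byte, because 3^5 = 243 is below 256. The sketch below encodes each group of five as a base-3 number. The layout is an assumption for illustration only; RPT's on-disk format is not described here, and `pack_trits` and `unpack_trits` are hypothetical names.

```python
def pack_trits(trits):
    """Pack values from {-1, 0, +1} into bytes, five per byte (base 3)."""
    out = bytearray()
    for i in range(0, len(trits), 5):
        value = 0
        # The first trit of the chunk becomes the least significant digit.
        for t in reversed(trits[i:i + 5]):
            value = value * 3 + (t + 1)  # shift -1..+1 to 0..2
        out.append(value)  # at most 242, so it always fits in a byte
    return bytes(out)


def unpack_trits(data, count):
    """Recover `count` ternary values from bytes produced by pack_trits."""
    trits = []
    for byte in data:
        for _ in range(5):
            trits.append(byte % 3 - 1)
            byte //= 3
    return trits[:count]  # drop padding from a short final chunk


packed = pack_trits([1, 0, -1, 1, 0, -1, 1])
assert unpack_trits(packed, 7) == [1, 0, -1, 1, 0, -1, 1]
```

The caller has to store the element count separately, since a short final chunk is padded.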

Faster Inference

Optimized kernels deliver real speedups across compatible hardware.

Production Ready

Stable runtime, simple APIs and tooling for real-world deployments.

Benchmarks

  • Accuracy (local): 100/100 on a custom benchmark
  • Size reduction: ~50% vs original FP16
  • Dense baseline: 0.92 (student_pruned local score)
  • Original HF: 0.90 (Gemma 2B local score)

Runtime

Engineered to run.
Built to scale.

RPT is a lean, high-performance runtime for ternary packed models. Minimal deps, max performance.

  • Cross-platform (Linux, Windows)
  • CUDA and CPU backends
  • Streaming and batching
  • Extensible and open

Documentation

Downloads

Get RPT bundle

Everything you need to pack, run and deploy ternary models.

  • RPT Runtime
  • Packer and tooling
  • Examples and configs
  • Model adapters
$ curl -O https://downloads.rpt.dog/rpt/gemma2b-ternary/v0.1.0/production_triton_packed_recovered_l78.zip
SHA256 371ed799c773...032a
downloads.rpt.dog
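
To check a downloaded bundle against the published SHA256, a short script along these lines could be used. It relies only on the Python standard library. The digest above is shown truncated, so the full value has to come from the publisher; the helper names `sha256_of` and `verify` are hypothetical.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path, expected_hex):
    """True if the file's SHA256 matches the full expected hex digest."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

Usage, assuming the full digest has been copied into a variable: `verify("production_triton_packed_recovered_l78.zip", full_digest)` returns `True` only on an exact match.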