Greater efficiency on consumer hardware

According to the company, the new framework significantly reduces memory and compute requirements. This opens the door to fine-tuning models on standard laptops, on consumer GPUs from AMD, Intel, and Apple, and on modern mobile devices.

Benchmarks show that the BitNet-1B model uses up to 77.8% less VRAM than traditional 16-bit models such as Gemma or Qwen. That efficiency gain makes it possible to run larger models on hardware that was previously seen as insufficient.
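A rough back-of-envelope calculation shows where savings of that order come from. BitNet-style models constrain weights to the ternary set {-1, 0, +1}, which takes about 1.58 bits per weight instead of 16. The sketch below is illustrative only, not the benchmark from the article; the reported 77.8% figure is lower than the weights-only saving because activations, the KV cache, and other buffers typically stay at higher precision.

```python
# Illustrative weights-only memory estimate for a 1-billion-parameter model.
# Not Tether's benchmark: real VRAM use also includes activations and caches.
params = 1_000_000_000

fp16_bytes = params * 2              # 16 bits = 2 bytes per weight
bitnet_bytes = params * 1.58 / 8     # ternary weights: ~1.58 bits per weight

saving = 1 - bitnet_bytes / fp16_bytes
print(f"{fp16_bytes / 1e9:.2f} GB vs {bitnet_bytes / 1e9:.2f} GB "
      f"({saving:.1%} smaller, weights only)")
```

On weights alone the ternary encoding cuts memory by roughly 90%, which is why the end-to-end VRAM reduction measured against 16-bit models can still approach 78% once higher-precision buffers are accounted for.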

Tests on iPhone 16 and Samsung S25

Tether demonstrated the technology on flagship smartphones. A BitNet model with 125 million parameters was fine-tuned on a biomedical dataset on a Samsung S25 in about 10 minutes.

On the iPhone 16, the team was able to fine-tune models with up to 13 billion parameters. In addition, inference on mobile GPUs was reported to run 2 to 11 times faster than on CPUs.

A step toward decentralized AI

Tether CEO Paolo Ardoino said that centralized training of AI models could slow innovation and create imbalances in access to technology. The company’s goal is to make AI more accessible by allowing users to work with models locally while keeping control over their data.

Tether also says the framework is the first to support LoRA fine-tuning of 1-bit LLMs on non-Nvidia hardware, reducing dependence on any single chipmaker.
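The core idea behind LoRA (low-rank adaptation) is what makes fine-tuning feasible on constrained hardware: the large base weight matrix stays frozen, and only two small low-rank matrices are trained. The sketch below is a minimal NumPy illustration of that idea, not code from QVAC Fabric; the dimensions and scaling factor are arbitrary assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 64, 64, 4  # rank << d is the low-rank bottleneck

# Frozen base weight; in a 1-bit model these would be ternary {-1, 0, +1}.
W = rng.choice([-1.0, 0.0, 1.0], size=(d_out, d_in))

# Trainable adapters: B starts at zero, so training begins at the base model.
A = rng.normal(scale=0.01, size=(rank, d_in))
B = np.zeros((d_out, rank))

def lora_forward(x, alpha=8.0):
    """Base output plus the low-rank update (B @ A) x, scaled by alpha/rank."""
    return W @ x + (alpha / rank) * (B @ (A @ x))

x = rng.normal(size=d_in)
assert np.allclose(lora_forward(x), W @ x)  # B = 0: adapter adds nothing yet

# Only A and B are updated during fine-tuning: 2*rank*d values vs d*d.
print(A.size + B.size, "trainable vs", W.size, "frozen")  # 512 vs 4096
```

Because only the small adapter matrices need gradients and optimizer state, the memory cost of fine-tuning scales with the rank rather than with the full model size, which is what allows training to fit on phones and consumer GPUs.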

Conclusion

QVAC Fabric could become an important step toward the democratization of AI. By lowering infrastructure requirements, it opens model development to a broader group of users and companies and may accelerate the growth of decentralized technology.