./wip/llama.cpp, LLM inference in C/C++

Branch: CURRENT, Version: 0.0.2.3183, Package name: llama.cpp-0.0.2.3183, Maintainer: pkgsrc-users

The main goal of llama.cpp is to enable LLM inference with minimal
setup and state-of-the-art performance on a wide variety of hardware
- locally and in the cloud.

* Plain C/C++ implementation without any dependencies
* Apple silicon is a first-class citizen - optimized via ARM NEON,
  Accelerate and Metal frameworks
* AVX, AVX2 and AVX512 support for x86 architectures
* 1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer
  quantization for faster inference and reduced memory use (see the
  first sketch after this list)
* Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for
  AMD GPUs via HIP)
* Vulkan and SYCL backend support
* CPU+GPU hybrid inference to partially accelerate models larger
  than the total VRAM capacity (see the second sketch after this
  list)
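
The quantization support listed above is exposed through the
library's C API as well as through a bundled command-line tool. The
following is a minimal sketch of converting an F16 GGUF file to the
4-bit Q4_K_M format, assuming the llama.h header shipped with this
version; the file names are placeholders, and Q4_K_M is just one of
the listed formats.

  /* Sketch: quantize an F16 GGUF model to 4-bit Q4_K_M via the C API.
     File names are placeholders; assumes this version's llama.h. */
  #include <stdio.h>
  #include "llama.h"

  int main(void) {
      llama_backend_init();

      llama_model_quantize_params qp = llama_model_quantize_default_params();
      qp.ftype   = LLAMA_FTYPE_MOSTLY_Q4_K_M;  /* target 4-bit K-quant */
      qp.nthread = 4;                          /* worker threads */

      /* llama_model_quantize() returns 0 on success. */
      unsigned rc = llama_model_quantize("model-f16.gguf",
                                         "model-q4_k_m.gguf", &qp);
      llama_backend_free();

      if (rc != 0) {
          fprintf(stderr, "quantization failed\n");
          return 1;
      }
      return 0;
  }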
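
Hybrid inference works by offloading a chosen number of model layers
to the GPU backend while the remaining layers run on the CPU. The
sketch below shows the corresponding C API setup, again assuming this
version's llama.h; the model path, layer count, and context size are
placeholder values, and real code would continue with tokenization,
llama_decode(), and sampling.

  /* Sketch: CPU+GPU hybrid inference setup via the C API. The model
     path, layer count, and context size are placeholders. */
  #include <stdio.h>
  #include "llama.h"

  int main(void) {
      llama_backend_init();

      /* Offload 20 transformer layers to the GPU backend (CUDA, HIP,
         Metal, Vulkan, or SYCL); the remaining layers stay on the CPU. */
      struct llama_model_params mp = llama_model_default_params();
      mp.n_gpu_layers = 20;

      struct llama_model * model =
          llama_load_model_from_file("model.gguf", mp);
      if (model == NULL) {
          fprintf(stderr, "failed to load model\n");
          return 1;
      }

      /* Inference context with a 4096-token window. */
      struct llama_context_params cp = llama_context_default_params();
      cp.n_ctx = 4096;

      struct llama_context * ctx = llama_new_context_with_model(model, cp);
      if (ctx == NULL) {
          llama_free_model(model);
          return 1;
      }

      /* ... tokenization, llama_decode() and sampling would go here ... */

      llama_free(ctx);
      llama_free_model(model);
      llama_backend_free();
      return 0;
  }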


Filesize: 20116.013 KB
