# Local inference engine
Acknowledgements: ggml-org/llama.cpp
The CLI is built for 4 platforms:
- macOS (Apple Silicon, Intel): 8667 (c08d28d08)
- Windows (AMD64): 8771 (873c82561)
- Windows (ARM64): 8667 (c08d28d08)
```
git clone https://github.com/ggml-org/llama.cpp.git --recursive
cd llama.cpp
```
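To reproduce one of the builds listed above, you can check out the matching release tag; this assumes the build numbers above follow llama.cpp's `b`-prefixed tag convention:

```
# assumption: build 8667 above corresponds to the tag b8667
git checkout b8667
```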
On macOS:

- Use `cmake.app` with generator=Xcode
- Set `BUILD_SHARED_LIBS` to `FALSE`
- `LLAMA_BUILD_SERVER` is `ON` by default
- Set `LLAMA_BUILD_TESTS` to `OFF`
- Set path to static OpenSSL lib, include
- Open Xcode
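For reference, the same configuration can be done from the command line instead of `cmake.app`. A minimal sketch; the OpenSSL paths are placeholders for wherever your static libraries and headers live:

```
# placeholder OpenSSL paths (e.g. a Homebrew install); adjust to your static build
cmake -B build -G Xcode \
  -DBUILD_SHARED_LIBS=FALSE \
  -DLLAMA_BUILD_TESTS=OFF \
  -DOPENSSL_ROOT_DIR=/opt/homebrew/opt/openssl@3 \
  -DOPENSSL_INCLUDE_DIR=/opt/homebrew/opt/openssl@3/include
```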
Build:
- libllama.a
- llama-bench
- llama-cli
- llama-diffusion-cli
- llama-embedding
- llama-gguf
- llama-gguf-split
- llama-imatrix
- llama-perplexity
- llama-quantize
- llama-server
- llama-tokenize
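These targets can also be built without the Xcode GUI, e.g. (target names taken from the list above; CMake 3.15+ accepts multiple targets after `--target`):

```
cmake --build build --config Release --target llama-cli llama-server
```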
Then, for the Intel (x86_64) build:

- Back to `cmake.app`
- Set `GGML_CPU` to `FALSE`
- Set `CMAKE_OSX_ARCHITECTURES` to `x86_64`
- Back to Xcode
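A command-line sketch of that second pass; the separate `build-x86_64` directory name is my own, to keep the Intel build apart from the Apple Silicon one:

```
# flag values copied from the steps above
cmake -B build-x86_64 -G Xcode \
  -DGGML_CPU=FALSE \
  -DCMAKE_OSX_ARCHITECTURES=x86_64 \
  -DBUILD_SHARED_LIBS=FALSE \
  -DLLAMA_BUILD_TESTS=OFF
cmake --build build-x86_64 --config Release
```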
On Windows, set `LLAMA_CURL` to `FALSE` (cf. ggml-org/llama.cpp#9937).
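That is, the flag goes on the configure line; a minimal sketch showing only this flag, to be combined with the full commands below:

```
cmake -B build -DLLAMA_CURL=FALSE
```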
```
cmake -B build -G "Visual Studio 17 2022" -A x64 ^
 -DOPENSSL_INCLUDE_DIR=C:\Users\miyako\Documents\GitHub\llama-cpp\include ^
 -DOPENSSL_ROOT_DIR=C:\Users\miyako\Documents\GitHub\llama-cpp\lib\x64 ^
 -DLLAMA_BUILD_TESTS=OFF ^
 -DLLAMA_BUILD_SERVER=ON ^
 -DGGML_OPENMP=OFF ^
 -DGGML_CCACHE=OFF ^
 -DCMAKE_MSVC_RUNTIME_LIBRARY=MultiThreaded ^
 -DBUILD_SHARED_LIBS=FALSE
```
ARM NEON and fp16 C intrinsics are not supported by the native MSVC compiler; use Clang (ClangCL) or Ninja instead.
```
cmake -B build -G "Visual Studio 17 2022" -A ARM64 -T ClangCL ^
 -DCMAKE_SYSTEM_PROCESSOR=ARM64 ^
 -DOPENSSL_INCLUDE_DIR=C:\Users\miyako\Documents\GitHub\llama-cpp\include ^
 -DOPENSSL_ROOT_DIR=C:\Users\miyako\Documents\GitHub\llama-cpp\lib\arm64 ^
 -DLLAMA_BUILD_TESTS=OFF ^
 -DLLAMA_BUILD_SERVER=ON ^
 -DGGML_OPENMP=OFF ^
 -DGGML_CCACHE=OFF ^
 -DCMAKE_MSVC_RUNTIME_LIBRARY=MultiThreaded ^
 -DBUILD_SHARED_LIBS=FALSE
```
```
cmake --build build --config Release
```
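Once the build finishes, a quick smoke test of the CLI; with the Visual Studio generator the binaries land under `build\bin\Release`, and `model.gguf` is a placeholder for any GGUF model you have on disk:

```
build\bin\Release\llama-cli -m model.gguf -p "Hello" -n 32
```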