Unsloth’s save_pretrained_gguf usually:

  1. merges LoRA,
  2. converts to 16-bit weights,
  3. writes a .gguf file (and metadata) and optionally quantizes to q4_k_m (like your previous script did).
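
As a rough sketch of how that export step is usually invoked, the helper below wraps the call. It assumes `model` and `tokenizer` are the objects you already have from your Unsloth training script. The `export_gguf` name and the `gguf_model` output folder are made up for illustration.

```python
def export_gguf(model, tokenizer, out_dir="gguf_model", quantization_method="q4_k_m"):
    """Export a fine-tuned Unsloth model to GGUF.

    Assumption: `model` is an Unsloth model object that exposes
    save_pretrained_gguf, and `tokenizer` is the matching tokenizer.
    `out_dir` is a hypothetical folder name.
    """
    # Unsloth handles the LoRA merge, the 16-bit conversion and the
    # optional quantization inside this single call.
    model.save_pretrained_gguf(
        out_dir,
        tokenizer,
        quantization_method=quantization_method,
    )
    return out_dir
```

Calling export_gguf(model, tokenizer) at the end of training should leave the .gguf file inside the gguf_model folder. The exact file name depends on the Unsloth version and the quantization method.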

Typical workflow (Windows)

  1. After training, confirm that the .gguf file exists in the output folder.
  2. Create a Modelfile (a plain text file) for Ollama, in the same folder or anywhere else. Example Modelfile content:

    FROM /absolute/path/to/gguf_model/model_file_name.gguf
    # optional: add SYSTEM or additional metadata

  3. Use the Ollama CLI to create and run the model:

    # create model in Ollama
    ollama create my-gemma-resume -f Modelfile

    # run it
    ollama run my-gemma-resume

  4. If ollama create complains about permissions or paths on Windows, copy the .gguf into your Ollama models folder:

    %USERPROFILE%\.ollama\models\

     Then create a Modelfile whose FROM line references that local path and run ollama create again.
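
If you would rather script steps 1 and 2 than do them by hand, here is a minimal Python sketch. It only uses the standard library. The function names are made up for illustration, and it writes nothing except the FROM line.

```python
from pathlib import Path


def find_gguf_files(model_dir):
    """Return every .gguf file under model_dir, sorted by path."""
    return sorted(Path(model_dir).rglob("*.gguf"))


def write_modelfile(gguf_path, modelfile_path="Modelfile"):
    """Write a minimal Ollama Modelfile pointing at gguf_path.

    The FROM line uses the absolute path so that ollama create
    does not depend on the current working directory.
    """
    gguf_path = Path(gguf_path).resolve()
    if not gguf_path.is_file():
        raise FileNotFoundError(f"No GGUF file at {gguf_path}")
    modelfile_path = Path(modelfile_path)
    modelfile_path.write_text(f"FROM {gguf_path}\n", encoding="utf-8")
    return modelfile_path
```

For example, find_gguf_files("gguf_model") lists the exported files, and write_modelfile on the first result produces a Modelfile you can pass to ollama create my-gemma-resume -f Modelfile.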

Why GGUF?

GGUF is commonly expanded as “GPT-Generated Unified Format”.

It is a binary model file format introduced by the llama.cpp project as the successor to the older GGML format, designed to make large language models easier to run on local machines.
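
To make the format a little more concrete, the sketch below reads the fixed-size header at the start of a GGUF file. It assumes GGUF version 2 or later in little-endian byte order, where the file begins with the 4-byte magic "GGUF", a 32-bit version, a 64-bit tensor count and a 64-bit metadata key/value count. The function name is made up for illustration.

```python
import struct

GGUF_MAGIC = b"GGUF"
HEADER_SIZE = 24  # 4-byte magic + uint32 version + two uint64 counts


def read_gguf_header(path):
    """Read the magic, version, tensor count and metadata KV count.

    Assumption: GGUF v2 or later, little-endian. Version 1 files
    used 32-bit counts and are not handled here.
    """
    with open(path, "rb") as f:
        header = f.read(HEADER_SIZE)
    if len(header) < HEADER_SIZE or header[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} does not look like a GGUF file")
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:])
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }
```

Running this on the exported file is a quick sanity check that the export produced a real GGUF file and not a truncated or differently formatted one.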

Key Points about GGUF:

  1. Unified Format
  2. Optimized for Local Inference
  3. Cross-Compatibility
  4. Quantization Support