Ollama - Running (small) LLMs at home
This is just documentation of my learning progress.
Memory allocation on multiple GPUs
In general, Ollama does a good job of finding multiple GPUs and splitting the layers of a model across them. However, it sometimes reserves too much overhead and does not utilize the cards' full potential. A more granular settings approach has not helped yet.
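As a sketch of what such an experiment can look like, the snippet below starts the Ollama server from Python with a few environment variables that influence how VRAM is allocated across GPUs. The variable names (CUDA_VISIBLE_DEVICES, OLLAMA_SCHED_SPREAD, OLLAMA_GPU_OVERHEAD) and the values chosen here are assumptions and should be checked against the Ollama documentation for the installed version.

```python
import os
import subprocess

# Copy the current environment and add the GPU-related knobs to experiment with.
env = os.environ.copy()

# Assumption: verify these variable names against the Ollama docs for your version.
env["CUDA_VISIBLE_DEVICES"] = "0,1"                   # only expose the first two GPUs
env["OLLAMA_SCHED_SPREAD"] = "1"                      # spread model layers over all visible GPUs
env["OLLAMA_GPU_OVERHEAD"] = str(512 * 1024 * 1024)   # extra VRAM (in bytes) reserved per GPU

# Start the Ollama server with those settings (requires the `ollama` binary on PATH).
subprocess.run(["ollama", "serve"], env=env, check=True)
```

For per-request control there is also the num_gpu model option (the number of layers to offload to the GPU), which is roughly what the granular settings approach above refers to.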
2025: starting on smartphones
I have some systems with several CPUs, and sometimes they can be used to train AI models.