Ollama - Running (small) LLMs at home
This is just documentation of my learning progress.
Memory allocation on multiple GPUs
In general, Ollama does a good job of finding multiple GPUs and splitting the layers of a model across them. However, it sometimes reserves too much overhead and does not utilize the cards' full potential. A more granular settings approach has not helped yet.
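As a sketch of what such an experiment can look like, the snippet below starts the Ollama server from Python with a few environment variables that influence how VRAM is allocated across GPUs. The variable names (CUDA_VISIBLE_DEVICES, OLLAMA_SCHED_SPREAD, OLLAMA_GPU_OVERHEAD) and the values chosen here are assumptions and should be checked against the Ollama documentation for the installed version.

```python
import os
import subprocess

# Copy the current environment and add the GPU-related knobs to experiment with.
env = os.environ.copy()

# Assumption: verify these variable names against the Ollama docs for your version.
env["CUDA_VISIBLE_DEVICES"] = "0,1"                   # only expose the first two GPUs
env["OLLAMA_SCHED_SPREAD"] = "1"                      # spread model layers over all visible GPUs
env["OLLAMA_GPU_OVERHEAD"] = str(512 * 1024 * 1024)   # extra VRAM (in bytes) reserved per GPU

# Start the Ollama server with those settings (requires the `ollama` binary on PATH).
subprocess.run(["ollama", "serve"], env=env, check=True)
```

For per-request control there is also the num_gpu model option (the number of layers to offload to the GPU), which is roughly what the granular settings approach above refers to.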
2025: starting on smartphones
I have some systems with several CPUs, and sometimes they can be used to train AI models.