I got the Jetson Nano Developer Kit A02 (only one CSI camera interface; the B01 has two) in early 2020. In 2021 I added the case, a proper power supply and a WiFi card. In early 2024 I started to use it again, but ran into limitations very early. Here is part of the documentation.

While some updated images with Ubuntu 20.04 exist, Nvidia officially only supports 18.04 LTS from the official website. This release is 7 years old in 2025 and reached end of support in May 2023. That adds severe software limitations to the already existing hardware limitations:
sudo apt-cache show nvidia-jetpack
The first system start (after having written the latest image from Nvidia to the SD card - 43 to 52 minutes) with some setup, useful software features and ssh login needs less than 8 minutes:
apt update (not upgrade), a few packages and a reboot
A few things you might want to do: disable the graphical login, update your apt repository (348 packages, 38 seconds) and install jtop. This will take another 3 minutes, followed by a reboot.
sudo systemctl set-default multi-user.target
sudo apt update
sudo apt install nano curl libcurl4-openssl-dev python3-pip
sudo -H pip3 install -U jetson-stats
sudo reboot

After this the system uses some 12,615,632 blocks of the SD card (as reported by df), roughly 12 GB. jtop reports JetPack 4.6.1 [L4T 32.7.1]. A run of sudo apt autoremove takes 45 seconds and saves 114 MBytes. The compiler /usr/local/cuda/bin/nvcc --version returns:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_28_22:34:44_PST_2021
Cuda compilation tools, release 10.2, V10.2.300
Build cuda_10.2_r440.TC440_70.29663091_0
The kernel (uname -a) is: Linux nano 4.9.253-tegra #1 SMP PREEMPT Sat Feb 19 08:59:22 PST 2022. Without the GUI it's 383M/3.87G Mem used. The compiler gcc --version is:
gcc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

The hardware was released in 2019, based on Nvidia's Maxwell architecture from 2014. That's 11 years old by 2025.
I checked the actual memory bandwidth with sysbench:
mk@jetson:~$ sysbench memory --memory-block-size=1m run
sysbench 1.0.11 (using system LuaJIT 2.1.0-beta3)
68903.00 MiB transferred (6887.13 MiB/sec)
It looks like only 6.8 GB/s of the LPDDR4 bandwidth are usable, not the theoretical 25.6 GB/s. This will limit the speed in token generation (TG) later.
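The theoretical figure follows from the memory configuration: a 64-bit bus, double data rate, at the commonly quoted 1600 MHz LPDDR4 I/O clock (treat that clock as an assumption). A quick sanity check of both numbers:

```shell
# Theoretical peak: 1600 MHz I/O clock x 2 (double data rate) x 64-bit bus / 8 bits per byte
awk 'BEGIN { printf "theoretical: %.1f GB/s\n", 1600e6 * 2 * 64 / 8 / 1e9 }'
# Fraction of that peak which the sysbench run actually reached
awk 'BEGIN { printf "measured:    %.0f %% of peak\n", 6.9 / 25.6 * 100 }'
```

So sysbench saw only about a quarter of the theoretical peak, which is not unusual for a single-threaded memory benchmark on a shared CPU/GPU memory bus.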

See the Nvidia Jetson AI Lab; it generally starts with the Jetson Orin Nano, and the original Jetson Nano is only there for comparison. For example, the Ollama tutorial does not even mention the 4GB Orin model, and lists JetPack 5 (L4T r35.x) and JetPack 6 (L4T r36.x) as requirements. The latest JetPack for the original Jetson Nano is JetPack 4.6.6 - see the archive.
You can install ollama on this machine. The simple instruction is curl -fsSL https://ollama.com/install.sh | sh. And it does run llama3.2:1b at 3.77 tokens/s with 100% CPU load. Gemma3 is faster with more than 5 t/s. Is it possible to get GPU acceleration? The hardware should be capable, since CUDA CC >= 5.0 is required and the Jetson has CC 5.3.
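To put 3.77 tokens/s into perspective, the inverse is the latency per generated token, which is what you actually feel while waiting for an answer:

```shell
# Seconds of wall-clock time per generated token at 3.77 tokens/s
awk 'BEGIN { printf "%.2f s per token\n", 1 / 3.77 }'
```

Roughly a quarter of a second per token: a 200-token answer takes close to a minute on the CPU.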
According to ollama issue 4140 on GitHub it should not be possible to run ollama with GPU acceleration on the Jetson Nano. The challenge is the version of gcc.
As tested by dtischler on May 4, 2024, an upgrade to gcc-11 is relatively easily done, but CUDA and the GPU remain unusable, and inference falls back to the CPU.
Technically the Maxwell GPU with CUDA CC 5.3 should be supported by CUDA 12. But it would have to be provided by Nvidia with their JetPack SDK, and as the list of old versions indicates, the latest version for the Jetson Nano is 4.6.6. The Jetson Orin Nano is supported since JetPack 5.1.1.
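The gcc conflict can be made concrete: to my understanding, the nvcc shipped with CUDA 10.2 accepts host compilers only up to gcc 8, while newer ollama/llama.cpp code wants a newer gcc. A small sketch of that version gate (the `check_gcc` helper is hypothetical, just for illustration):

```shell
# Hypothetical helper: CUDA 10.2's nvcc accepts a host gcc up to major version 8 (assumption)
check_gcc() {
  if [ "${1%%.*}" -le 8 ]; then
    echo "gcc $1: usable as CUDA 10.2 host compiler"
  else
    echo "gcc $1: too new for CUDA 10.2"
  fi
}
check_gcc 7.5.0   # the stock Ubuntu 18.04 compiler
check_gcc 11.4.0  # after the gcc-11 upgrade
```

So the stock gcc 7.5 works with nvcc but is too old for the llama.cpp sources, and gcc 11 builds the sources but is rejected by nvcc - gcc 8.5 is the narrow overlap the later sections rely on.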

You can compile and run llama.cpp on the Jetson Nano at a decent speed with the following commands:
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DLLAMA_CURL=ON
cmake --build build --config Release
More in these repositories:

Look here and on Medium.
The following procedure seems to be obsolete, now that specific entries for the Jetson Nano are included in the llama.cpp Makefile and even ollama runs out of the box. The gist does not mention using the GPU with CUDA for acceleration.
A gist article by Flor Sanders from April 2024 describes the process of running llama.cpp on a 2GB Jetson Nano. With 4GB it should be possible to run a complete llama3.2:1b model file. But you can't use the provided gcc 7.5 compiler; you need at least 8.5, so you compile it yourself overnight. It needs around 3 hours to complete.
Since I have a 32GB SD card and 11 GB free, it should be possible to compile GCC 8.5 overnight. But first we have to add nvcc to the PATH. In ~/.bashrc I have to add (with nano, which has to be installed too):
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
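The `${PATH:+:${PATH}}` idiom appends the old value (with a separating colon) only if PATH was non-empty, so the line is safe even in a bare environment. A quick demonstration in a subshell, so the real environment stays untouched:

```shell
# Run in a subshell; the outer shell's PATH is not modified
(
  PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
  # The CUDA directory is now first in the search order
  echo "$PATH" | cut -d: -f1
)
```

The same pattern applies to the LD_LIBRARY_PATH line for the CUDA runtime libraries.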
The recommended version of llama.cpp to check out is a33e6a0 from February 26, 2024. The current version of the Makefile has entries for the Jetson in line 476. It could well be that this only refers to running on the CPU (as with the mentioned Raspberry Pis) and not to using the GPU with CUDA. This aligns with the error message by VViliams123 from October 4, 2024.
Install a CUDA version of llama.cpp, llama-server and llama-bench on the Jetson Nano in one minute, compiled with gcc 8.5. Just type:
curl -fsSL https://kreier.github.io/llama.cpp-jetson.nano/install.sh | sh
If the path is not adjusted automatically, run export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH, or add this line permanently to the end of ~/.bashrc with nano.

An article on Medium from September 2021 describes the installation of PoCL 1.7 on the Jetson Nano. By 2024, PoCL 6.0 is the latest version, but it cannot be installed on the Nano. The reason is described here and is related to the old Ubuntu 18.04. That's why the latest version that can be installed is PoCL 3.0. This has been done successfully in October 2022.
On “old” Jetson boards (TX2, Xavier NX, AGX Xavier & Nano), it is not possible to install recent PoCL version 5 because the OS is too old (Ubuntu 18.04) and it is complicated to install a recent version of the required Clang compiler (version 17). This is why on these specific boards we will install PoCL version 3. One of the main drawback is that there is no GPU 16-bit float support in this version 😔.

Before we compile and install PoCL we need LLVM 11.1.0, the target-independent optimizer and code generator initially named Low Level Virtual Machine in 2003.
Install the LLVM for ARM64 and Jetson Nano on Ubuntu 18.04 (source):
export LLVM_VERSION=10
sudo apt install -y build-essential ocl-icd-libopencl1 cmake git pkg-config libclang-${LLVM_VERSION}-dev clang-${LLVM_VERSION} llvm-${LLVM_VERSION} make ninja-build ocl-icd-dev ocl-icd-opencl-dev libhwloc-dev zlib1g zlib1g-dev clinfo dialog apt-utils libxml2-dev libclang-cpp${LLVM_VERSION}-dev libclang-cpp${LLVM_VERSION} llvm-${LLVM_VERSION}-dev libncurses5
cd /opt
sudo wget https://github.com/llvm/llvm-project/releases/download/llvmorg-11.1.0/clang+llvm-11.1.0-aarch64-linux-gnu.tar.xz
sudo tar -xvvf clang+llvm-11.1.0-aarch64-linux-gnu.tar.xz
sudo mv clang+llvm-11.1.0-aarch64-linux-gnu llvm-11.1.0
sudo rm clang+llvm-11.1.0-aarch64-linux-gnu.tar.xz
cd ~/
mkdir softwares
cd softwares
git clone -b release_3_0 https://github.com/pocl/pocl.git pocl_3.0
cd pocl_3.0
mkdir build
cd build
cmake -DCMAKE_INSTALL_PREFIX=/opt/pocl-3.0 -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-funroll-loops -march=native" -DCMAKE_C_FLAGS="-funroll-loops -march=native" -DWITH_LLVM_CONFIG=/opt/llvm-11.1.0/bin/llvm-config -DSTATIC_LLVM=ON -DENABLE_CUDA=ON ..
make -j4
sudo make install
sudo mkdir -p /etc/OpenCL/vendors/
sudo touch /etc/OpenCL/vendors/pocl.icd
echo "/opt/pocl-3.0/lib/libpocl.so" | sudo tee --append /etc/OpenCL/vendors/pocl.icd
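The last three lines register PoCL with the OpenCL ICD loader: every file in /etc/OpenCL/vendors/ contains the path of one vendor's OpenCL library. The mechanism can be sketched in a scratch directory (the /tmp paths are illustrative only, nothing is actually installed):

```shell
# Sketch of the ICD registration mechanism, using /tmp instead of /etc
ICD_DIR=/tmp/OpenCL-demo/vendors
mkdir -p "$ICD_DIR"
# Each .icd file holds just the path of one OpenCL implementation
echo "/opt/pocl-3.0/lib/libpocl.so" > "$ICD_DIR/pocl.icd"
cat "$ICD_DIR/pocl.icd"
```

This is why no environment variable is needed afterwards: clinfo and other OpenCL clients discover PoCL through the .icd file alone.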

Now OpenCL should be successfully installed on the system. You can check if it works with the following command:
mk@jetson:~$ clinfo
My result:
Number of platforms 1
Platform Name Portable Computing Language
Platform Vendor The pocl project
Platform Version OpenCL 3.0 PoCL 3.0-rc2 Linux, RELOC, LLVM 11.1.0, SLEEF, FP16, CUDA, POCL_DEBUG
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd cl_pocl_content_size
Platform Host timer resolution 0ns
Platform Extensions function suffix POCL
Platform Name Portable Computing Language
Number of devices 2
Device Name pthread-cortex-a57
Device Vendor ARM
Device Vendor ID 0x13b5
Device Version OpenCL 1.2 PoCL HSTR: pthread-aarch64-unknown-linux-gnu-cortex-a57
Driver Version 3.0-rc2
Device OpenCL C Version OpenCL C 1.2 PoCL
Device Type CPU
Device Name NVIDIA Tegra X1
Device Vendor NVIDIA Corporation
Device Vendor ID 0x10de
Device Version OpenCL 1.2 PoCL HSTR: CUDA-sm_53
Driver Version 3.0-rc2
Device OpenCL C Version OpenCL C 1.2 PoCL
Device Type GPU
Device Topology (NV) PCI-E, 00:00.0
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 1
Max clock frequency 921MHz
Compute Capability (NV) 5.3
clpeak Benchmark
cd ~/ && mkdir workspace && cd workspace
git clone https://github.com/krrishnarraj/clpeak.git
cd clpeak
git submodule update --init --recursive --remote
mkdir build && cd build
cmake ..
make -j4
./clpeak
Result (also here in a Google Sheet):
Platform: Portable Computing Language
Device: pthread-cortex-a57
Driver version : 3.0-rc2 (Linux ARM64)
Compute units : 4
Clock frequency : 1479 MHz
Single-precision compute (GFLOPS)
float : 1.04
float2 : 2.09
float4 : 4.14
float8 : 8.20
float16 : 15.94
Integer compute (GIOPS)
int : 3.80
int2 : 3.28
int4 : 5.69
int8 : 11.14
int16 : 21.34
Device: NVIDIA Tegra X1
Driver version : 3.0-rc2 (Linux ARM64)
Compute units : 1
Clock frequency : 921 MHz
Single-precision compute (GFLOPS)
float : 220.29
float2 : 228.37
float4 : 229.89
float8 : 228.30
float16 : 228.25
Integer compute (GIOPS)
int : 62.01
int2 : 77.55
int4 : 77.35
int8 : 57.64
int16 : 63.62
The GPU is slightly faster, but provides only 228 GFLOPS (FP32) and 0.06 TOPS with 25.6 GB/s theoretical memory bandwidth. In theory it should achieve 472 GFLOPS in FP16. For comparison: the Jetson Orin Nano Super has 2560 GFLOPS and 67 TOPS with 102 GB/s theoretical memory bandwidth.
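The measured clpeak value is close to the theoretical FP32 peak of the Tegra X1: 128 CUDA cores, each capable of one FMA (2 FLOPs) per cycle, at 921.6 MHz; on Maxwell's Tegra X1 the FP16 rate is twice the FP32 rate. The arithmetic:

```shell
# FP32 peak: 128 CUDA cores x 2 FLOPs per FMA x 921.6 MHz
awk 'BEGIN { printf "FP32 peak: %.0f GFLOPS\n", 128 * 2 * 921.6e6 / 1e9 }'
# FP16 peak: twice the FP32 rate on the Tegra X1
awk 'BEGIN { printf "FP16 peak: %.0f GFLOPS\n", 128 * 2 * 2 * 921.6e6 / 1e9 }'
```

So clpeak's 228 GFLOPS reaches about 97% of the 236 GFLOPS FP32 peak; the 472 GFLOPS FP16 figure is out of reach here because this PoCL 3.0 build has no GPU 16-bit float support.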
Deactivate the GUI with sudo systemctl set-default multi-user.target. To apply, do a reboot with sudo reboot. Reactivate with sudo systemctl set-default graphical.target and sudo reboot.