How to Use LLMs
How to use LLMs (Large Language Models): how to host them, how to add a frontend to them, and more.
Ollama
- Ollama: https://ollama.com
- Ollama GitHub repo: https://github.com/ollama/ollama
Installation
My system configuration is:
- Ubuntu 22.04
- ISA: x86_64
- GPUs: two NVIDIA GTX 1080 Ti
Install on bare-metal
```
curl -fsSL https://ollama.com/install.sh | sh
```
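After the script finishes, a quick sanity check (assuming the installer registered the ollama systemd service, which it does on systemd-based distros such as Ubuntu):

```
# confirm the CLI is on the PATH
ollama --version

# confirm the background service is running
systemctl status ollama
```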
Manual install instructions
Install with Docker
First, you need to install the NVIDIA Container Toolkit. Since I use Ubuntu, I choose the apt
installation method.
Configure the production repository:
```
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
```

Optionally, configure the repository to use experimental packages:
```
sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list
```
Update the packages list from the repository:
```
sudo apt-get update
```
Install the NVIDIA Container Toolkit packages:
```
sudo apt-get install -y nvidia-container-toolkit
```
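The NVIDIA installation guide includes one more step before restarting Docker: register the NVIDIA runtime with Docker using nvidia-ctk.

```
sudo nvidia-ctk runtime configure --runtime=docker
```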
Restart the Docker daemon:
```
sudo systemctl restart docker
```
Next, check that your Docker service is healthy:
```
sudo systemctl status docker
```
If it shows an error like:

```
Failed to start Docker Application Container Engine.
```
You need to reboot the computer.
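Before starting the Ollama container, it is also worth confirming that containers can see the GPUs at all. The sample workload from the NVIDIA Container Toolkit docs works well here (the ubuntu image is just a convenient base):

```
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
```

If everything is wired up correctly, this prints the same table as running nvidia-smi on the host.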
Finally, you can run the container:
```
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```
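To confirm the server came up (assuming you kept the default port mapping of 11434):

```
curl http://localhost:11434/
# the server should reply with "Ollama is running"
```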
Run the model
To run and chat with Llama 2:
If you installed Ollama on bare metal:
```
ollama run llama2
```
If you installed Ollama via Docker:
```
docker exec -it ollama ollama run llama2
```
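Besides the interactive CLI, Ollama also serves a REST API on port 11434, which is what frontends like Open WebUI talk to. A minimal sketch of a one-off generation request (the prompt is just an example):

```
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```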
You can switch to other models; Ollama supports the open-source models available in the Ollama library.
Here are some example open-source models that can be downloaded:
Model | Parameters | Size | Download |
---|---|---|---|
Llama 2 | 7B | 3.8GB | ollama run llama2 |
Mistral | 7B | 4.1GB | ollama run mistral |
Dolphin Phi | 2.7B | 1.6GB | ollama run dolphin-phi |
Phi-2 | 2.7B | 1.7GB | ollama run phi |
Neural Chat | 7B | 4.1GB | ollama run neural-chat |
Starling | 7B | 4.1GB | ollama run starling-lm |
Code Llama | 7B | 3.8GB | ollama run codellama |
Llama 2 Uncensored | 7B | 3.8GB | ollama run llama2-uncensored |
Llama 2 13B | 13B | 7.3GB | ollama run llama2:13b |
Llama 2 70B | 70B | 39GB | ollama run llama2:70b |
Orca Mini | 3B | 1.9GB | ollama run orca-mini |
Vicuna | 7B | 3.8GB | ollama run vicuna |
LLaVA | 7B | 4.5GB | ollama run llava |
Note: You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
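Pulled models also take up disk space; the Ollama CLI can list and remove them (standard ollama subcommands):

```
# show models downloaded locally
ollama list

# remove one you no longer need
ollama rm llama2
```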
Customize a model
Import from GGUF
GGUF (GPT-Generated Unified Format), introduced as a successor to GGML (GPT-Generated Model Language), is a file format used for storing models for inference, especially in the context of language models like GPT (Generative Pre-trained Transformer).
Ollama supports importing GGUF models in the Modelfile:
Create a file named Modelfile, with a FROM instruction with the local filepath to the model you want to import:

```
FROM ./vicuna-33b.Q4_0.gguf
```
Create the model in Ollama
```
ollama create example -f Modelfile
```
Run the model
```
ollama run example
```
Import from PyTorch or Safetensors
See the guide on importing models for more information.
Customize a prompt
Models from the Ollama library can be customized with a prompt. For example, to customize the llama2
model:
```
ollama pull llama2
```
Create a Modelfile:
```
FROM llama2
```
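On its own, FROM llama2 does not change the prompt; the Modelfile example in the Ollama documentation for this "mario" model also adds a sampling parameter and a system prompt. A sketch along those lines, in Modelfile syntax (the temperature value and the wording of the system prompt are illustrative):

```
FROM llama2

# set the temperature (higher is more creative, lower is more coherent)
PARAMETER temperature 1

# set the custom system prompt
SYSTEM """
You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.
"""
```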
Next, create and run the model:
```
ollama create mario -f ./Modelfile
ollama run mario
```
For more examples, see the examples directory. For more information on working with a Modelfile, see the Modelfile documentation.
Open-webui
- GitHub: https://github.com/open-webui/open-webui
- Open WebUI Community: https://openwebui.com
Ollama runs in the terminal, so we can install a web frontend for it. We choose Open WebUI because it is fast and reliable.
Installation
Install Ollama; by default it listens on port 11434.
Then run:
```
docker run -d --network=host -v open-webui:/app/backend/data -e OLLAMA_API_BASE_URL=http://127.0.0.1:11434/api --name open-webui --restart always ghcr.io/open-webui/open-webui:main
```
Source: Open WebUI: Server Connection Error
Then visit http://localhost:8080 (with --network=host, Open WebUI serves on its default port 8080).
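If the page does not come up, the container logs usually show why, and it is worth checking that the Ollama API itself answers on the host (plain Docker and curl, nothing Open WebUI specific):

```
# follow the Open WebUI container logs
docker logs -f open-webui

# confirm the Ollama API answers on the host
curl http://127.0.0.1:11434/api/tags
```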
BionicGPT
- BionicGPT
- BionicGPT GitHub repo
Install BionicGPT via docker compose
The easiest way to get running with BionicGPT is with the project's docker-compose.yml file. You'll need Docker installed on your machine.
```
mkdir BionicGPT
```
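The remaining steps from the BionicGPT README download the compose file into that directory and bring the stack up. A sketch under the assumption that the docker-compose.yml URL below is still current (check the BionicGPT repo for the authoritative path):

```
cd BionicGPT

# download the project's docker-compose.yml (URL is an assumption; see the BionicGPT README)
curl -O https://raw.githubusercontent.com/bionic-gpt/bionic-gpt/main/docker-compose.yml

# start the stack in the background
docker compose up -d
```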
You must have access to ports 7800 and 7810.
Run the User Interface
You can then access the front end at http://localhost:7800 and you'll be redirected to a registration screen.
The first user to register with BionicGPT will become the system administrator.