
Ollama — User Guide

Run open-weight large language models entirely on your own machine.

Strengths
  • Minimal installation; a single command runs a large model
  • Supports mainstream models such as Llama 3, Mistral, Qwen, and DeepSeek
  • All data is processed locally, so privacy is preserved
  • Provides a REST API for easy integration into your own applications
  • Completely free, with no token limits and no network dependency
Best for
  • Running an AI assistant locally to protect data privacy
  • Developing and testing AI applications against a local API service
  • Using AI offline, with no internet connection required
  • Enterprise intranet deployment where data must not leave the internal network
  • Learning about and researching open-source large models

Installation and quick start

Ollama is extremely easy to install and you can run large models locally in minutes.

Scenario

Install Ollama and run your first model

Prompt example
# Step 1: Install Ollama

# macOS/Linux:
curl -fsSL https://ollama.com/install.sh | sh

# Windows:
# Visit https://ollama.com/download and download the installer

# Step 2: Run a model (it is downloaded automatically)
ollama run llama3.2

# Step 3: Start chatting
# After the download completes, type your question directly in the terminal
Output / what to expect

Ollama automatically downloads the Llama 3.2 model (about 2 GB) and then
drops straight into an interactive chat mode.

You can converse in the terminal much like you would with ChatGPT.

Type /bye to exit the conversation.

Tips

The model is downloaded on the first run. Choose a model size that suits your hardware; a 7B model needs roughly 8 GB of RAM.

Scenario

View available models and choose the right one

Prompt example
# List installed models
ollama list

# Browse available models at ollama.com/library
# Commonly used models:

# General chat (Chinese and English):
ollama pull qwen2.5:7b        # Alibaba's Qwen (Tongyi Qianwen), excellent Chinese performance
ollama pull deepseek-r1:7b    # DeepSeek reasoning model

# Code generation:
ollama pull qwen2.5-coder:7b  # code-specialized model
ollama pull codellama:7b      # Meta's code model

# Lightweight (for low-spec machines):
ollama pull llama3.2:1b       # 1B parameters, runs on very modest hardware
Output / what to expect

Choose a model size that fits your hardware (the sketch below shows how to inspect an installed model's footprint):

  • 8 GB RAM: choose a 7B-parameter model
  • 16 GB RAM: can run a 13B-parameter model
  • 32 GB+ RAM: can run a 34B-parameter model
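
Before committing RAM to a model, you can check its parameter size and quantization. Here is a minimal sketch against Ollama's native /api/show endpoint, assuming the default localhost:11434 service and that llama3.2 has already been pulled:

```python
import requests

# /api/show returns metadata about a locally installed model
resp = requests.post(
    'http://localhost:11434/api/show',
    json={'model': 'llama3.2'},
    timeout=10,
)
resp.raise_for_status()
details = resp.json().get('details', {})
print('Parameter size:', details.get('parameter_size'))    # e.g. "3.2B"
print('Quantization:', details.get('quantization_level'))  # e.g. "Q4_K_M"
```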

Tips

The Qwen series is recommended for Chinese tasks, and Qwen-Coder or DeepSeek-Coder is recommended for coding tasks.

API integration development

Ollama exposes a local REST API, plus an OpenAI-compatible endpoint, for easy integration into your own applications.
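
Before the library examples below, here is a minimal sketch of calling the native REST endpoint directly with Python's requests library, assuming the default localhost:11434 service and a pulled qwen2.5:7b model:

```python
import requests

# POST /api/chat is Ollama's native chat endpoint
resp = requests.post(
    'http://localhost:11434/api/chat',
    json={
        'model': 'qwen2.5:7b',
        'messages': [{'role': 'user', 'content': 'Hello, introduce yourself'}],
        'stream': False,  # True would return newline-delimited JSON chunks
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()['message']['content'])
```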

Scenario

Call the local Ollama API using Python

Prompt example
Please show how to call the local Ollama API from Python to hold a conversation.

Requirements:
- Use the ollama Python library (the simplest approach)
- Support streaming output
- Support multi-turn conversation (maintain context)
- Keep the code concise and directly runnable
Output / what to expect
import ollama

# Simple conversation
response = ollama.chat(
    model='qwen2.5:7b',
    messages=[{'role': 'user', 'content': 'Hello, introduce yourself'}]
)
print(response['message']['content'])

# Streaming output
stream = ollama.chat(
    model='qwen2.5:7b',
    messages=[{'role': 'user', 'content': 'Write a poem about autumn'}],
    stream=True
)
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)

# Multi-turn conversation (type 'quit' to exit)
messages = []
while True:
    user_input = input("You: ")
    if user_input == 'quit':
        break
    messages.append({'role': 'user', 'content': user_input})
    response = ollama.chat(model='qwen2.5:7b', messages=messages)
    assistant_msg = response['message']['content']
    messages.append({'role': 'assistant', 'content': assistant_msg})
    print(f"AI: {assistant_msg}\n")
Tips

First run `pip install ollama` to install the library, and make sure the Ollama service is running in the background (`ollama serve`).
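
To confirm the service is reachable before chatting, a quick sketch (again assuming the default port) queries the /api/tags endpoint, which lists installed models:

```python
import requests

try:
    # /api/tags lists the models installed in the local Ollama instance
    resp = requests.get('http://localhost:11434/api/tags', timeout=5)
    resp.raise_for_status()
    print('Installed models:', [m['name'] for m in resp.json()['models']])
except requests.ConnectionError:
    print('Ollama is not reachable; start it with `ollama serve`.')
```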

Scenario

Use the OpenAI-compatible API

Prompt example
Ollama is compatible with the OpenAI API format and can be called directly using the openai library:

```python
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='ollama' # Any string will do
)

response = client.chat.completions.create(
    model='qwen2.5:7b',
    messages=[
        {'role': 'system', 'content': 'You are a helpful assistant'},
        {'role': 'user', 'content': 'Explain what machine learning is'}
    ]
)
print(response.choices[0].message.content)
```
Output / what to expect

Existing OpenAI applications can switch to local Ollama simply by
changing base_url; the rest of the code stays the same.

Tips

This method is particularly suitable for migrating existing OpenAI applications to local deployment, with almost no need to change the code.
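
Streaming also works through the compatible endpoint. Here is a minimal sketch with the same client setup; the model name and prompt are just examples:

```python
from openai import OpenAI

client = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')

# stream=True yields incremental chunks instead of one final message
stream = client.chat.completions.create(
    model='qwen2.5:7b',
    messages=[{'role': 'user', 'content': 'Write a short poem about autumn'}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content  # each chunk carries a text fragment
    if delta:
        print(delta, end='', flush=True)
print()
```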

Integration with Open WebUI (graphical interface)

With Open WebUI, you can get a graphical interface experience similar to ChatGPT.

Scenario

Install Open WebUI to get a ChatGPT-like interface

Prompt example
# Install Open WebUI using Docker
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main

# Visit http://localhost:3000
# First access requires registering an administrator account
Output / what to expect

After installation completes, open localhost:3000 in your browser to get a
full ChatGPT-style interface with conversation history, model switching,
file upload, and more.

All data is still processed locally.

Tips

Docker needs to be installed first. Open WebUI supports multiple users and is suitable for internal team deployment.

Compared with similar tools

| Tool | Strength | Best for | Pricing |
| --- | --- | --- | --- |
| Ollama (this tool) | Simplest installation, fully local data, completely free | Local deployment, data privacy, offline use | Completely free |
| LM Studio | Graphical interface, beginner-friendly, no command line needed | Users who prefer a GUI over the command line | Completely free |
| Hugging Face | Largest model selection, most active community | Researchers who need the widest choice of models | Free / Pro $9/month |
| Together AI | Cloud inference, no local hardware required | Users whose local hardware is insufficient | Pay per token |
