LM Studio — User Guide

GUI for local LLMs.

Strengths
  • Graphical interface, no command line required, novice-friendly
  • Supports searching and downloading models directly from Hugging Face
  • Built-in ChatGPT-like conversation interface
  • Provides a local OpenAI-compatible API server
  • Supports multiple GGUF quantization levels to match available hardware
Best for
  • Beginners getting started with running large models locally
  • Running models such as Llama, Qwen, and Mistral locally
  • Serving an AI API to local applications
  • Testing and comparing the performance of different models
  • Scenarios with strict data-privacy requirements

Install and download models

LM Studio provides a complete graphical interface, from downloading the model to starting a conversation in just a few steps.

Scenario

Install and download the first model

Prompt example
Steps:
1. Visit lmstudio.ai to download the installation package
   - Windows: .exe installation package
   - macOS: .dmg installation package
   - Linux: .AppImage

2. After the installation is complete, open LM Studio

3. Click the "Search" icon (magnifying glass) on the left

4. Search for a model; recommended for beginners:
   - "Qwen2.5-7B-Instruct-GGUF" (strong Chinese support)
   - "Llama-3.2-3B-Instruct-GGUF" (lightweight)

5. Select a quantized version (Q4_K_M offers a good balance of quality and size)

6. Click Download and wait for completion
Output / what to expect

After the model is downloaded, it will be displayed in the “My Models” list.

A Q4_K_M-quantized 7B model is about 4-5 GB; download time depends on your connection speed.

Tips

Q4_K_M is the most commonly recommended quantization, offering the best balance between quality and file size.
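As a rule of thumb, a GGUF file's size is roughly the parameter count times the bits per weight. Q4_K_M works out to about 4.85 bits per weight (an approximation; the exact figure varies slightly per model, and quantization metadata adds a little overhead), which is why a 7B model lands in the 4-5 GB range:

```python
def estimate_gguf_size_gb(params_billions: float, bits_per_weight: float = 4.85) -> float:
    """Rough GGUF file size: parameters x bits per weight, converted to gigabytes."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

# A 7B model at Q4_K_M lands in the 4-5 GB range:
print(f"{estimate_gguf_size_gb(7):.1f} GB")  # → 4.2 GB
```

The same estimate puts a 3B model under 2 GB, which is why the smaller models are practical on 8 GB machines.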

Scenario

Choose the right model based on your hardware

Prompt example
Hardware configuration and model selection guide:

8GB RAM (no discrete GPU):
- Choose a 3B-4B parameter model
- Quantization: Q4_K_M or Q5_K_M
- Recommended: Llama-3.2-3B, Phi-3.5-mini

16GB RAM (no discrete GPU):
- Can run a 7B parameter model
- Quantization: Q4_K_M
- Recommended: Qwen2.5-7B, Llama-3.1-8B

Discrete GPU (8GB VRAM):
- Models can be loaded onto the GPU for a large speed boost
- A 7B model fits entirely in GPU memory
- Enable GPU acceleration in LM Studio's settings

32GB RAM or 24GB VRAM:
- Can run 13B-14B parameter models
- Recommended: Qwen2.5-14B
Output / what to expect

Choose a model that fits your hardware to avoid slow generation from an oversized model. GPU acceleration can speed up generation by 5-10x.

Tips

LM Studio's model download page shows each model's hardware requirements: green means recommended, yellow means it will barely run.
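The tiers above can be sketched as a small helper function. The thresholds and picks mirror this guide's table; they are rules of thumb, not logic taken from LM Studio itself:

```python
def recommend_model(ram_gb: int, vram_gb: int = 0) -> str:
    """Map available RAM/VRAM to a model size tier, per the guide above."""
    if vram_gb >= 24 or ram_gb >= 32:
        return "13B-14B (e.g. Qwen2.5-14B)"
    if vram_gb >= 8 or ram_gb >= 16:
        return "7B-8B (e.g. Qwen2.5-7B, Llama-3.1-8B)"
    return "3B-4B (e.g. Llama-3.2-3B, Phi-3.5-mini)"

print(recommend_model(ram_gb=16))              # 7B-8B tier
print(recommend_model(ram_gb=16, vram_gb=24))  # 13B-14B tier
```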

Dialogue and use

After the model is downloaded, it can be used directly in the built-in dialogue interface of LM Studio.

Scenario

Start your first conversation

Prompt example
Steps:
1. Click the "Chat" icon (speech bubble) on the left
2. Click the model-selection drop-down at the top
3. Select the downloaded model
4. Wait for the model to load (about 10-30 seconds the first time)
5. Type your question in the input box at the bottom

Example conversation:
"Hello, please introduce yourself in Chinese, and tell me what you can help me with."
Output / what to expect

The conversation starts once the model is loaded. Generation speed depends on your hardware: roughly 5-20 tokens/second on CPU, and 50-100+ tokens/second with GPU acceleration.

Tips

The model loads slowly the first time; subsequent conversations are faster. Context length can be adjusted in settings (it affects memory usage).
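One reason context length affects memory: the KV cache grows linearly with it. A back-of-the-envelope sketch, using Llama-3.1-8B-style shapes (32 layers, 8 KV heads under grouped-query attention, head dimension 128) at fp16; these architecture numbers are illustrative assumptions, not values read from LM Studio:

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    """KV cache size = 2 (K and V) x layers x KV heads x head dim x context x element size."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

# Llama-3.1-8B-style shapes at an 8192-token context, fp16:
gb = kv_cache_bytes(32, 8, 128, 8192) / 2**30
print(f"{gb:.2f} GiB")  # → 1.00 GiB
```

Doubling the context slider doubles this figure, which is why long contexts can push a model that "fits" out of memory.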

Local API server

LM Studio can start a local API server for other applications to call.

Scenario

Start the local API and call it with Python

Prompt example
Steps:
1. Click the "Developer" icon on the left in LM Studio
2. Select the model to use
3. Click "Start Server"
4. Default port: http://localhost:1234

Python call example:
```python
from openai import OpenAI

# Point to local LM Studio server
client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="lm-studio"  # any string works; LM Studio does not validate it
)

response = client.chat.completions.create(
    model="qwen2.5-7b-instruct", # Use the loaded model name
    messages=[
        {"role": "user", "content": "Explain what a vector database is"}
    ]
)
print(response.choices[0].message.content)
```
Output / what to expect

Once the local API server is running, local models can be called with OpenAI-compatible code, and the data never leaves your machine. Ideal for developing and testing AI applications.

Tips

LM Studio's API format is compatible with OpenAI's, so existing code can be reused by changing only the base_url.
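Because the server speaks the standard OpenAI wire format, it can also be called from any HTTP client without the openai package. A minimal sketch of the JSON body for a POST to /v1/chat/completions (field names follow the OpenAI chat-completions format):

```python
import json

def chat_request(model: str, prompt: str, temperature: float = 0.7) -> str:
    """Build the JSON body for POST http://localhost:1234/v1/chat/completions."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    })

body = chat_request("qwen2.5-7b-instruct", "Explain what a vector database is")
print(body)
```

Send this body with curl or urllib while the server is running and you get back the same response shape the openai client parses.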

Compared with similar tools

| Tool | Strength | Best for | Pricing |
| --- | --- | --- | --- |
| LM Studio (this tool) | Most user-friendly graphical interface; easy model downloading; first choice for beginners | Users unfamiliar with the command line who want a GUI for local AI | Completely free |
| Ollama | Command-line tool; lighter weight; more stable API | Developers who need a stable API service | Completely free |
| GPT4All | Simpler interface; focused on conversation | Chat only, no API needed | Completely free |
| Jan | Open source; polished interface; cross-platform | Users who value open source and a polished interface | Completely free |
