
Strengths
- Beginner-friendly graphical interface; no command line required
- Search and download models directly from Hugging Face
- Built-in ChatGPT-like chat interface
- Provides a local OpenAI-compatible API server
- Supports multiple quantization formats (GGUF) and adapts flexibly to your hardware
Best for
- Beginners running a large language model locally for the first time
- Running models such as Llama, Qwen, and Mistral locally
- Providing an AI API service to local applications
- Testing and comparing the performance of different models
- Scenarios with strict data-privacy requirements
Install and download models
LM Studio provides a complete graphical interface: going from downloading a model to starting a conversation takes only a few steps.
Install and download the first model
Steps:
1. Visit lmstudio.ai and download the installer:
   - Windows: `.exe` installer
   - macOS: `.dmg` image
   - Linux: `.AppImage`
2. Open LM Studio after installation completes
3. Click the "Search" icon (magnifying glass) on the left
4. Search for a model; recommended for beginners:
   - "Qwen2.5-7B-Instruct-GGUF" (excellent Chinese support)
   - "Llama-3.2-3B-Instruct-GGUF" (lightweight)
5. Select a quantized version (Q4_K_M balances quality and size)
6. Click Download and wait for it to finish
Once downloaded, the model appears in the "My Models" list. A Q4_K_M-quantized 7B model is about 4-5 GB, so download time depends on your connection speed. Q4_K_M is the most commonly recommended quantization, offering the best balance between quality and file size.
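You can roughly predict a GGUF download's size from the parameter count and the quantization level. A minimal sketch, assuming approximate effective bits-per-weight figures for each quantization type (these numbers are rounded estimates, not official values):

```python
# Rough GGUF file-size estimator.
# The effective bits-per-weight figures below are approximations
# (assumption on my part, not official llama.cpp/LM Studio numbers).
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.69,
    "Q8_0": 8.5,
}

def estimated_size_gb(params_billions: float, quant: str) -> float:
    """Approximate GGUF file size in gigabytes (1 GB = 1e9 bytes)."""
    bytes_total = params_billions * 1e9 * BITS_PER_WEIGHT[quant] / 8
    return bytes_total / 1e9

print(f"7B @ Q4_K_M ≈ {estimated_size_gb(7, 'Q4_K_M'):.1f} GB")
```

For a 7B model at Q4_K_M this lands at roughly 4 GB, consistent with the 4-5 GB figure above (the real file is slightly larger because of embeddings and metadata).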
Choose the right model based on your hardware
Hardware configuration and model selection guide:
- 8GB RAM (no discrete GPU):
   - Choose a 3B-4B parameter model
   - Quantization: Q4_K_M or Q5_K_M
   - Recommended: Llama-3.2-3B, Phi-3.5-mini
- 16GB RAM (no discrete GPU):
   - Can run 7B parameter models
   - Quantization: Q4_K_M
   - Recommended: Qwen2.5-7B, Llama-3.1-8B
- Discrete GPU (8GB VRAM):
   - Models can be loaded onto the GPU for a large speedup
   - A 7B model fits fully on the GPU
   - Enable GPU acceleration in LM Studio settings
- 32GB RAM or 24GB VRAM:
   - Can run 13B-14B parameter models
   - Recommended: Qwen2.5-14B
Choose a model that matches your hardware to avoid slowdowns from an oversized model. GPU acceleration can speed up generation by roughly 5-10x.
On the model download page of LM Studio, the hardware requirements of the model will be displayed, with green indicating recommended and yellow indicating barely usable.
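The rules of thumb above can be condensed into a small helper. This is a sketch of the guide in this article only; the thresholds are heuristics, not official LM Studio requirements, and `recommend_model_size` is a name I made up for illustration:

```python
# Rule-of-thumb model picker mirroring the hardware guide above.
# Thresholds are heuristics from this article, not official requirements.
def recommend_model_size(ram_gb: int, vram_gb: int = 0) -> str:
    """Suggest a parameter-count range for the given RAM/VRAM (in GB)."""
    if ram_gb >= 32 or vram_gb >= 24:
        return "13B-14B"
    if ram_gb >= 16 or vram_gb >= 8:
        return "7B-8B"
    return "3B-4B"

print(recommend_model_size(16))    # 16 GB RAM, no discrete GPU
print(recommend_model_size(8, 8))  # 8 GB RAM plus an 8 GB GPU
```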
Dialogue and use
After a model is downloaded, you can use it directly in LM Studio's built-in chat interface.
Start your first conversation
Steps:
1. Click the "Chat" icon (speech bubble) on the left
2. Click the model-selection drop-down at the top
3. Select the downloaded model
4. Wait for the model to load (about 10-30 seconds the first time)
5. Type your question into the input box at the bottom

Example conversation: "Hello, please introduce yourself in Chinese, and tell me what you can help me with."
Once the model has loaded you can start chatting. Generation speed depends on your hardware: roughly 5-20 tokens/second on CPU, and 50-100+ tokens/second with GPU acceleration.
The model loads slowly the first time; subsequent conversations start faster. The context length can be adjusted in settings (longer contexts use more memory).
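To get a feel for what those tokens-per-second figures mean in practice, here is a quick back-of-the-envelope calculation (the ~300-token answer length is an assumption for illustration):

```python
# Estimated wait time for a typical ~300-token answer at the
# generation speeds quoted above.
ANSWER_TOKENS = 300

for label, tok_per_s in [("CPU (slow)", 5), ("CPU (fast)", 20), ("GPU", 100)]:
    seconds = ANSWER_TOKENS / tok_per_s
    print(f"{label}: ~{seconds:.0f} s for {ANSWER_TOKENS} tokens")
```

At 5 tokens/second a full answer takes about a minute, while a GPU delivers it in a few seconds, which is why GPU offloading matters so much for interactive use.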
Local API server
LM Studio can start a local API server for other applications to call.
Start the local API and call it with Python
Steps:
1. Click the "Developer" icon on the left in LM Studio
2. Select the model to use
3. Click "Start Server"
4. The server listens at http://localhost:1234 by default
Python call example:
```python
from openai import OpenAI

# Point the client at the local LM Studio server
client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="lm-studio",  # any non-empty string works
)

response = client.chat.completions.create(
    model="qwen2.5-7b-instruct",  # name of the loaded model
    messages=[
        {"role": "user", "content": "Explain what a vector database is"},
    ],
)
print(response.choices[0].message.content)
```
Once the local API server is running, any OpenAI-compatible code can call the local model, and your data never leaves your machine. This makes it ideal for developing and testing AI applications.
LM Studio's API format is fully compatible with OpenAI's; existing code can be reused by simply changing the base_url.
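Because the server speaks the standard OpenAI wire format, you don't even need the `openai` package. A stdlib-only sketch of the HTTP request the endpoint expects (the model name and port match the example above; `build_chat_request` is a helper name I invented for illustration):

```python
import json
import urllib.request

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completions request for the local server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        "http://localhost:1234/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer lm-studio",  # any token is accepted
        },
        method="POST",
    )

req = build_chat_request("qwen2.5-7b-instruct", "Hello")
print(req.full_url)
# To actually send it (requires the server to be running):
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp)["choices"][0]["message"]["content"])
```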
Compared with similar tools
| Tool | Strength | Best for | Pricing |
|---|---|---|---|
| LM Studio (this tool) | Most user-friendly graphical interface; easy model downloading; the usual first choice for beginners | Users who want a graphical interface for local AI without the command line | Free |
| Ollama | Command-line tool; lighter weight; stable API | Developers who need a stable API service | Free |
| GPT4All | Simpler interface, focused on chat | Users who only need conversation, not an API | Free |
| Jan | Open source, polished interface, cross-platform | Users who want an open-source tool with a polished interface | Free |
Sources & references:
- LM Studio official website (2025-03)
- LM Studio Documentation (2025-03)