LM Studio — User Guide

GUI for local LLMs.

Strengths
  • Graphical interface, no command line required, novice-friendly
  • Supports searching and downloading models directly from Hugging Face
  • Built-in ChatGPT-like conversation interface
  • Provides a local OpenAI-compatible API server
  • Supports multiple GGUF quantization levels to match available hardware
Best for
  • Beginners getting started with running large models locally
  • Running models such as Llama, Qwen, and Mistral locally
  • Serving an AI API to local applications
  • Testing and comparing the performance of different models
  • Scenarios with strict data-privacy requirements

Install and download models

LM Studio provides a complete graphical interface, from downloading the model to starting a conversation in just a few steps.

Scenario

Install and download the first model

Prompt example
Steps:
1. Visit lmstudio.ai to download the installation package
   - Windows: .exe installation package
   - macOS: .dmg installation package
   - Linux: .AppImage

2. After the installation is complete, open LM Studio

3. Click the "Search" icon (magnifying glass) on the left

4. Search for a model; recommended for beginners:
   - "Qwen2.5-7B-Instruct-GGUF" (strong Chinese support)
   - "Llama-3.2-3B-Instruct-GGUF" (lightweight)

5. Select a quantized version (Q4_K_M offers a good balance of quality and size)

6. Click Download and wait for completion
Output / what to expect

After the model is downloaded, it will be displayed in the “My Models” list.

A Q4_K_M-quantized 7B model is about 4-5 GB; download time depends on your connection speed.

Tips

Q4_K_M is the most commonly recommended quantization, offering the best balance between quality and file size.
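As a rule of thumb, a GGUF file's size is roughly the parameter count times the bits per weight. Q4_K_M works out to about 4.85 bits per weight (an approximation; the exact figure varies slightly per model, and quantization metadata adds a little overhead), which is why a 7B model lands in the 4-5 GB range:

```python
def estimate_gguf_size_gb(params_billions: float, bits_per_weight: float = 4.85) -> float:
    """Rough GGUF file size: parameters x bits per weight, converted to gigabytes."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

# A 7B model at Q4_K_M lands in the 4-5 GB range:
print(f"{estimate_gguf_size_gb(7):.1f} GB")  # → 4.2 GB
```

The same estimate puts a 3B model under 2 GB, which is why the smaller models are practical on 8 GB machines.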

Scenario

Choose the right model based on your hardware

Prompt example
Hardware configuration and model selection guide:

8GB RAM (no discrete GPU):
- Choose a 3B-4B parameter model
- Quantization: Q4_K_M or Q5_K_M
- Recommended: Llama-3.2-3B, Phi-3.5-mini

16GB RAM (no discrete GPU):
- Can run a 7B parameter model
- Quantization: Q4_K_M
- Recommended: Qwen2.5-7B, Llama-3.1-8B

Discrete GPU (8GB VRAM):
- Models can be loaded onto the GPU for a large speed boost
- A 7B model fits entirely in GPU memory
- Enable GPU acceleration in LM Studio's settings

32GB RAM or 24GB VRAM:
- Can run 13B-14B parameter models
- Recommended: Qwen2.5-14B
Output / what to expect

Choose a model that fits your hardware to avoid slow generation from an oversized model. GPU acceleration can speed up generation by 5-10x.

Tips

LM Studio's model download page shows each model's hardware requirements: green means recommended, yellow means it will barely run.
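The tiers above can be sketched as a small helper function. The thresholds and picks mirror this guide's table; they are rules of thumb, not logic taken from LM Studio itself:

```python
def recommend_model(ram_gb: int, vram_gb: int = 0) -> str:
    """Map available RAM/VRAM to a model size tier, per the guide above."""
    if vram_gb >= 24 or ram_gb >= 32:
        return "13B-14B (e.g. Qwen2.5-14B)"
    if vram_gb >= 8 or ram_gb >= 16:
        return "7B-8B (e.g. Qwen2.5-7B, Llama-3.1-8B)"
    return "3B-4B (e.g. Llama-3.2-3B, Phi-3.5-mini)"

print(recommend_model(ram_gb=16))              # 7B-8B tier
print(recommend_model(ram_gb=16, vram_gb=24))  # 13B-14B tier
```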

Dialogue and use

After the model is downloaded, it can be used directly in the built-in dialogue interface of LM Studio.

Scenario

Start your first conversation

Prompt example
Steps:
1. Click the "Chat" icon (speech bubble) on the left
2. Click the model-selection drop-down at the top
3. Select the downloaded model
4. Wait for the model to load (about 10-30 seconds the first time)
5. Type your question in the input box at the bottom

Example conversation:
"Hello, please introduce yourself in Chinese, and tell me what you can help me with."
Output / what to expect

The conversation starts once the model is loaded. Generation speed depends on your hardware: roughly 5-20 tokens/second on CPU, and 50-100+ tokens/second with GPU acceleration.

Tips

The model loads slowly the first time; subsequent conversations are faster. Context length can be adjusted in settings (it affects memory usage).
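One reason context length affects memory: the KV cache grows linearly with it. A back-of-the-envelope sketch, using Llama-3.1-8B-style shapes (32 layers, 8 KV heads under grouped-query attention, head dimension 128) at fp16; these architecture numbers are illustrative assumptions, not values read from LM Studio:

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    """KV cache size = 2 (K and V) x layers x KV heads x head dim x context x element size."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

# Llama-3.1-8B-style shapes at an 8192-token context, fp16:
gb = kv_cache_bytes(32, 8, 128, 8192) / 2**30
print(f"{gb:.2f} GiB")  # → 1.00 GiB
```

Doubling the context slider doubles this figure, which is why long contexts can push a model that "fits" out of memory.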

Local API server

LM Studio can start a local API server for other applications to call.

Scenario

Start the local API and call it with Python

Prompt example
Steps:
1. Click the "Developer" icon on the left in LM Studio
2. Select the model to use
3. Click "Start Server"
4. Default port: http://localhost:1234

Python call example:
```python
from openai import OpenAI

# Point to local LM Studio server
client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="lm-studio"  # any string works; LM Studio does not validate it
)

response = client.chat.completions.create(
    model="qwen2.5-7b-instruct", # Use the loaded model name
    messages=[
        {"role": "user", "content": "Explain what a vector database is"}
    ]
)
print(response.choices[0].message.content)
```
Output / what to expect

Once the local API server is running, local models can be called with OpenAI-compatible code, and the data never leaves your machine. Ideal for developing and testing AI applications.

Tips

LM Studio's API format is compatible with OpenAI's, so existing code can be reused by changing only the base_url.
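Because the server speaks the standard OpenAI wire format, it can also be called from any HTTP client without the openai package. A minimal sketch of the JSON body for a POST to /v1/chat/completions (field names follow the OpenAI chat-completions format):

```python
import json

def chat_request(model: str, prompt: str, temperature: float = 0.7) -> str:
    """Build the JSON body for POST http://localhost:1234/v1/chat/completions."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    })

body = chat_request("qwen2.5-7b-instruct", "Explain what a vector database is")
print(body)
```

Send this body with curl or urllib while the server is running and you get back the same response shape the openai client parses.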

Compared with similar tools

| Tool | Strength | Best for | Pricing |
| --- | --- | --- | --- |
| LM Studio (this tool) | Most user-friendly graphical interface; easy model downloading; first choice for beginners | Users unfamiliar with the command line who want a GUI for local AI | Completely free |
| Ollama | Command-line tool; lighter weight; more stable API | Developers who need a stable API service | Completely free |
| GPT4All | Simpler interface; focused on conversation | Chat only, no API needed | Completely free |
| Jan | Open source; polished interface; cross-platform | Users who value open source and a polished interface | Completely free |
