
Together AI — User Guide
Cheap, fast cloud inference.
Strengths
- Extremely low prices: many models cost roughly a tenth of comparable OpenAI models
- Supports mainstream open-source models such as Llama 3, Qwen, Mistral, and DeepSeek
- Fast inference and low latency
- OpenAI compatible API, extremely low migration cost
- Free $1 credit, enough for testing
Best for
- Reducing API call costs for AI applications
- Using open-source models as an alternative to OpenAI
- Backend inference for highly concurrent AI applications
- Testing and comparing different open-source models
- Building cost-sensitive AI products
Quick start
Together AI’s API is fully compatible with OpenAI’s, so existing code needs little modification.
Calling with the OpenAI-compatible API
```python
from openai import OpenAI

# Only base_url and api_key differ from standard OpenAI usage
client = OpenAI(
    base_url="https://api.together.xyz/v1",
    api_key="your-together-api-key"
)

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
    messages=[
        {"role": "user", "content": "Explain what RAG technology is"}
    ],
    max_tokens=1000
)

print(response.choices[0].message.content)
```
This calls the Llama 3.1 70B model at roughly a tenth of the price of OpenAI’s GPT-4o, with answer quality close to GPT-4 level.
To migrate existing OpenAI code to Together AI, just modify the base_url and api_key lines.
Choose the right model
Together AI main models and prices (2025):

High-quality models:
- Llama-3.1-70B-Instruct: $0.88 / million tokens
- Qwen2.5-72B-Instruct: $1.20 / million tokens
- DeepSeek-V3: $0.27 / million tokens

Fast, lightweight models:
- Llama-3.2-3B-Instruct: $0.06 / million tokens
- Llama-3.1-8B-Instruct: $0.18 / million tokens

For comparison, OpenAI:
- GPT-4o: $2.50 / million input tokens
- GPT-4o-mini: $0.15 / million input tokens
For cost-sensitive applications, DeepSeek-V3 is one of the most cost-effective options available: quality close to GPT-4o at roughly a tenth of the price, well suited to production applications with a large call volume.
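To make the savings concrete, here is a minimal sketch that estimates monthly spend from the per-million-token prices listed above. The figures are the illustrative 2025 prices quoted in this guide, not live quotes:

```python
# Per-million-token prices as listed in this guide (2025 figures)
PRICE_PER_MILLION = {
    "DeepSeek-V3": 0.27,
    "GPT-4o": 2.50,
    "Llama-3.1-8B-Instruct": 0.18,
}

def estimate_cost(model: str, tokens: int) -> float:
    """Return the estimated USD cost of processing `tokens` tokens."""
    return PRICE_PER_MILLION[model] * tokens / 1_000_000

# Example: 50 million tokens per month
for model in ("DeepSeek-V3", "GPT-4o"):
    print(f"{model}: ${estimate_cost(model, 50_000_000):.2f}")
# DeepSeek-V3 comes to about $13.50 versus $125.00 for GPT-4o
```

At that volume, the difference is more than $100 per month for a single model swap.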
Batch inference optimization
Together AI supports batch inference, further reducing costs and increasing throughput.
Process text in batches
```python
import asyncio
from together import AsyncTogether

# Build prompts from a list of documents
texts = ["Please summarize: " + text for text in long_texts_list]

async def process_batch(texts):
    async_client = AsyncTogether(api_key="your-key")
    tasks = [
        async_client.chat.completions.create(
            model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
            messages=[{"role": "user", "content": text}],
        )
        for text in texts
    ]
    # Fire all requests concurrently and wait for every response
    return await asyncio.gather(*tasks)

results = asyncio.run(process_batch(texts))
```
Concurrent calls greatly increase processing speed, which makes this pattern well suited to batch text-processing tasks; Together AI’s concurrency limits are also looser than OpenAI’s. Asynchronous calls can speed up batch processing by a factor of 10–50.
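The bounded-concurrency pattern behind this can be sketched without the SDK: a semaphore caps how many requests are in flight at once, so a large batch stays within rate limits. In this sketch, `fake_api_call` is a stand-in for the real `async_client.chat.completions.create(...)` call, and the limit of 10 is an illustrative value to tune against your account’s actual limits:

```python
import asyncio

async def fake_api_call(text: str) -> str:
    """Stand-in for a real async API request."""
    await asyncio.sleep(0.01)  # simulate network latency
    return text.upper()

async def process_batch(texts, call=fake_api_call, max_concurrency=10):
    # The semaphore admits at most `max_concurrency` calls at a time;
    # the rest wait until a slot frees up.
    sem = asyncio.Semaphore(max_concurrency)

    async def one(text):
        async with sem:
            return await call(text)

    return await asyncio.gather(*(one(t) for t in texts))

results = asyncio.run(process_batch(["hello", "world"]))
print(results)  # ['HELLO', 'WORLD']
```

Swap `fake_api_call` for the real client call to get rate-limited batch processing without unbounded bursts.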
Compared with similar tools
| Tool | Strength | Best for | Pricing |
|---|---|---|---|
| Together AI (this tool) | Lowest prices, wide choice of open-source models, OpenAI-compatible | Cost-sensitive production applications with high API call volume | Pay per token (10x+ cheaper than OpenAI) |
| OpenAI API | Highest model quality, most complete ecosystem | When top quality is required and budget allows | Pay per token |
| Groq | Fastest inference (LPU chips) | Real-time applications with extremely low latency requirements | Free quota / paid tiers |
| Ollama | Fully local, zero cost | Local GPU available and strict data-privacy requirements | Completely free |
Sources & references:
- Together AI official website (2025-03)
- Together AI Documentation (2025-03)