Together AI — User Guide

Cheap, fast cloud inference.

Strengths
  • Extremely low prices, often 10x or more cheaper than comparable OpenAI models
  • Supports mainstream open-source models such as Llama 3, Qwen, Mistral, and DeepSeek
  • Fast inference and low latency
  • OpenAI-compatible API, so migration cost is minimal
  • Free $1 credit, enough for testing
Best for
  • Reducing API call costs for AI applications
  • Using open-source models as an alternative to OpenAI
  • Backend inference for highly concurrent AI applications
  • Testing and comparing different open-source models
  • Building cost-sensitive AI products

Quick start

Together AI's API is fully compatible with OpenAI's, so existing code needs little modification.

Scenario

Calling via the OpenAI-compatible API

Prompt example
from openai import OpenAI

# Just modify base_url and api_key
client = OpenAI(
    base_url="https://api.together.xyz/v1",
    api_key="your-together-api-key"
)

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
    messages=[
        {"role": "user", "content": "Explain what RAG technology is"}
    ],
    max_tokens=1000
)
print(response.choices[0].message.content)
Output / what to expect

This calls the Llama 3.1 70B model at roughly 1/10 the price of OpenAI's GPT-4o, with answer quality close to GPT-4 level.

Tips

To migrate existing OpenAI code to Together AI, just modify the base_url and api_key lines.
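To keep that migration a one-line change, the provider settings can live in environment variables rather than in code. A minimal sketch, assuming hypothetical variable names (`LLM_BASE_URL`, `LLM_API_KEY` are not an official convention):

```python
import os

# Hypothetical sketch: read provider settings from the environment so the same
# code can target Together AI or OpenAI without edits.
def provider_config() -> dict:
    return {
        # Defaults to Together AI's OpenAI-compatible endpoint
        "base_url": os.environ.get("LLM_BASE_URL", "https://api.together.xyz/v1"),
        "api_key": os.environ["LLM_API_KEY"],
    }

# Then: client = OpenAI(**provider_config())  # drop-in for either provider
os.environ.setdefault("LLM_API_KEY", "your-together-api-key")
print(provider_config()["base_url"])
```

Switching back to OpenAI is then just a matter of changing the two environment variables, with no code edit or redeploy.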

Scenario

Choosing the right model

Prompt example
Together AI main models and prices (2025):

High-quality models:
- Llama-3.1-70B-Instruct: $0.88 / million tokens
- Qwen2.5-72B-Instruct: $1.20 / million tokens
- DeepSeek-V3: $0.27 / million tokens

Fast, lightweight models:
- Llama-3.2-3B-Instruct: $0.06 / million tokens
- Llama-3.1-8B-Instruct: $0.18 / million tokens

For comparison, OpenAI:
- GPT-4o: $2.50 / million input tokens
- GPT-4o-mini: $0.15 / million input tokens
Output / what to expect

DeepSeek-V3 is the most cost-effective option: quality close to GPT-4o at roughly 1/10 the price, which makes it well suited to production applications with a high volume of calls.

Tips

For cost-sensitive applications, DeepSeek-V3 is one of the most cost-effective options available.
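The price differences above can be made concrete with a quick cost estimate. A minimal sketch, using the per-million-token prices listed in this guide (prices change over time, so treat them as assumptions):

```python
# Per-million-token prices taken from the list above (2025 figures; verify
# against current pricing before relying on them).
PRICE_PER_M_TOKENS = {
    "DeepSeek-V3": 0.27,
    "Llama-3.1-70B-Instruct": 0.88,
    "Qwen2.5-72B-Instruct": 1.20,
    "GPT-4o (input)": 2.50,
}

def estimate_cost(model: str, tokens: int) -> float:
    """Cost in USD for `tokens` tokens at the model's per-million-token rate."""
    return PRICE_PER_M_TOKENS[model] * tokens / 1_000_000

# A workload of 100M tokens per month:
for model, _ in PRICE_PER_M_TOKENS.items():
    print(f"{model}: ${estimate_cost(model, 100_000_000):.2f}/month")
```

At 100M tokens a month, the gap between DeepSeek-V3 (about $27) and GPT-4o input tokens (about $250) is what makes the "1/10 the price" claim tangible.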

Batch inference optimization

Together AI supports batch inference, further reducing costs and increasing throughput.

Scenario

Process text in batches

Prompt example
import asyncio
from together import AsyncTogether

# Texts to process (placeholder list)
long_texts_list = ["...", "..."]

# Prepend the instruction to each text
texts = ["Please summarize: " + text for text in long_texts_list]

async def process_batch(texts):
    async_client = AsyncTogether(api_key="your-key")
    tasks = [
        async_client.chat.completions.create(
            model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
            messages=[{"role": "user", "content": text}]
        )
        for text in texts
    ]
    # Fire all requests concurrently and wait for every response
    return await asyncio.gather(*tasks)

results = asyncio.run(process_batch(texts))
Output / what to expect

Concurrent calls greatly increase processing speed.

Suitable for batch text processing tasks,

Together AI has looser concurrency limits than OpenAI.

Tips

Using asynchronous, concurrent calls during batch processing can speed things up substantially, often by an order of magnitude or more depending on batch size and rate limits.
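Even with looser limits, an unbounded `asyncio.gather` over thousands of texts can still trip rate limiting. A minimal sketch of capping in-flight requests with a semaphore; `call_model` here is a stand-in for the real `AsyncTogether` request shown above:

```python
import asyncio

# Hypothetical sketch: run a batch through `worker` with at most `limit`
# requests in flight at once, preserving input order in the results.
async def bounded_gather(items, worker, limit=8):
    sem = asyncio.Semaphore(limit)

    async def run_one(item):
        async with sem:  # at most `limit` coroutines pass this point at a time
            return await worker(item)

    return await asyncio.gather(*(run_one(i) for i in items))

# Stand-in worker; replace the body with the real AsyncTogether API call.
async def call_model(text):
    await asyncio.sleep(0.01)  # simulates network latency
    return f"summary of {text!r}"

results = asyncio.run(bounded_gather(["doc1", "doc2", "doc3"], call_model, limit=2))
print(results)
```

The semaphore trades a little peak speed for predictable load: `limit` can be tuned upward until responses start slowing down or erroring.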

Compared with similar tools

Together AI (this tool)
  • Strength: lowest prices, wide choice of open-source models, OpenAI-compatible
  • Best for: cost-sensitive production applications, high volume of API calls
  • Pricing: pay per token (10x+ cheaper than OpenAI)

OpenAI API
  • Strength: highest model quality, most complete ecosystem
  • Best for: highest quality required, sufficient budget
  • Pricing: pay per token

Groq
  • Strength: fastest inference speed (LPU chips)
  • Best for: real-time applications with extremely strict latency requirements
  • Pricing: free quota / paid tier

Ollama
  • Strength: fully local, zero cost
  • Best for: a local GPU is available and data-privacy requirements are high
  • Pricing: completely free
