models: Unified Interface to LLM APIs

If you just want to call LLMs, use the models module. Let’s see how it works.

Note

If you are building a complex conversation flow, you may want to use the agents module. See the agents section for more details.

Make an API Call

PromptTrail implements many LLM providers under models. Let’s call one of OpenAI’s GPT models. (You need to set the OPENAI_API_KEY environment variable to your API key.)

import os
from prompttrail.core import Session, Message
from prompttrail.models.openai import OpenAIModel, OpenAIConfig

api_key = os.environ["OPENAI_API_KEY"]
config = OpenAIConfig(
    api_key=api_key,
    model_name="gpt-4o-mini",
    max_tokens=100,
    temperature=0
)
model = OpenAIModel(configuration=config)
session = Session(
  messages=[
    Message(content="Hey", role="user"),
  ]
)
message = model.send(session=session)
print(message)

The call returns the model’s response as a Message:

Message(content="Hello! How can I assist you today?", role="assistant")

Yay! You have successfully called an OpenAI GPT model.
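The example above reads the key with os.environ["OPENAI_API_KEY"], which raises a bare KeyError when the variable is not set. If you prefer a clearer failure, a small helper along the following lines may help. This is only a sketch: require_env and SKETCH_API_KEY are hypothetical names used for illustration and are not part of PromptTrail.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Set the {name} environment variable before running this example."
        )
    return value


# Demonstration with a hypothetical variable name, so no real key is touched.
os.environ["SKETCH_API_KEY"] = "sk-example"
api_key = require_env("SKETCH_API_KEY")
print(api_key)
```

In the real example you would call require_env("OPENAI_API_KEY") and pass the result to OpenAIConfig.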

Core Concepts

The example above introduces a few new concepts. Let’s look at them in detail. You can skip the following sections if you’re already familiar with other LLM libraries.

Message

Message(content="Hello! How can I assist you today?", role="assistant", metadata={})

Message represents a single message in a conversation. It has the following attributes:

  • content: str: the content of the message (text)

  • role: str: the role of the message

    • OpenAI’s API expects one of system, user, or assistant as the role.

    • Other providers have different rules.

  • metadata: additional metadata for the message (used for templates, hooks, and other features)
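To make the three attributes concrete, here is a minimal sketch of a message-like structure. SketchMessage is a hypothetical stand-in written for this page, not the real prompttrail.core.Message, so the snippet runs without PromptTrail installed; the real class may differ in details.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class SketchMessage:
    """Hypothetical stand-in for Message: content, role, and optional metadata."""

    content: str
    role: str
    # default_factory gives every message its own empty dict.
    metadata: Dict[str, Any] = field(default_factory=dict)


reply = SketchMessage(content="Hello! How can I assist you today?", role="assistant")
tagged = SketchMessage(content="Hey", role="user", metadata={"source": "docs-example"})

print(reply.role, reply.metadata)
print(tagged.metadata["source"])
```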

Session

session = Session(
  messages=[
    Message(content="Hey", role="user"),
  ]
)

Session represents a conversation. It is just an ordered collection of messages.

Note

If you want to use non-chat models as traditional language models, you can just pass a session with a single message.

Model and Configuration

config = OpenAIConfig(api_key=api_key, model_name="gpt-4o-mini")
model = OpenAIModel(configuration=config)
message = model.send(session=session)

Model is the interface for accessing an LLM.

A conversation with an LLM is a simple process:

  • Pass a Session to the LLM

  • Get a new Message from the LLM
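The two steps above can be sketched offline. Everything in the snippet is a hypothetical stand-in (SketchMessage, SketchSession, EchoModel), chosen so it runs without an API key; it only mirrors the send(session=...) call shape shown in this section and is not PromptTrail's implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SketchMessage:
    content: str
    role: str


@dataclass
class SketchSession:
    messages: List[SketchMessage] = field(default_factory=list)


class EchoModel:
    """Hypothetical offline model: replies by echoing the last message."""

    def send(self, session: SketchSession) -> SketchMessage:
        last = session.messages[-1]
        return SketchMessage(content=f"You said: {last.content}", role="assistant")


model = EchoModel()
session = SketchSession(messages=[SketchMessage(content="Hey", role="user")])

# Step 1: pass the session to the model. Step 2: get a new message back.
reply = model.send(session=session)

# To continue the conversation, add the reply and the next user turn
# to the session, then send the session again.
session.messages.append(reply)
session.messages.append(SketchMessage(content="Thanks", role="user"))
second_reply = model.send(session=session)

print(reply.content)
print(second_reply.content)
```

The point of the design is that the model call itself keeps no conversation state in this sketch: the session carries the whole history on every call.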

Each provider has its own configuration class that inherits from Config. This configuration includes:

  • Static settings (e.g., API keys, organization IDs)

  • Model parameters (e.g., model name, temperature, max tokens)

  • Optional providers (cache provider, mock provider)

Note

CacheProvider can be passed as a configuration parameter. PromptTrail has a built-in cache mechanism to reduce the number of API calls. See the Cache section for more details.
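As a rough illustration of why caching reduces API calls, the sketch below reuses a stored reply whenever the same conversation is sent again. CachedSender and fake_api are hypothetical names, and this is not how PromptTrail's CacheProvider is implemented; refer to the Cache section for the actual interface.

```python
from typing import Callable, Dict, List, Tuple

Turn = Tuple[str, str]  # (role, content)


class CachedSender:
    """Hypothetical wrapper: reuse the reply when the same conversation repeats."""

    def __init__(self, send: Callable[[List[Turn]], str]) -> None:
        self._send = send
        self._cache: Dict[Tuple[Turn, ...], str] = {}
        self.api_calls = 0

    def send(self, turns: List[Turn]) -> str:
        key = tuple(turns)  # the whole conversation is the cache key
        if key not in self._cache:
            self.api_calls += 1
            self._cache[key] = self._send(turns)
        return self._cache[key]


def fake_api(turns: List[Turn]) -> str:
    """Stand-in for a real provider call."""
    return f"reply to {turns[-1][1]!r}"


sender = CachedSender(fake_api)
first = sender.send([("user", "Hey")])
second = sender.send([("user", "Hey")])       # served from the cache
third = sender.send([("user", "Hey again")])  # different conversation, new call
print(sender.api_calls)
```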

Try a different API

Google

To call Google’s Gemini models, you only need to change a few lines.

import os
from prompttrail.core import Session, Message
from prompttrail.models.google import GoogleModel, GoogleConfig

api_key = os.environ["GOOGLE_CLOUD_API_KEY"]
config = GoogleConfig(
    api_key=api_key,
    model_name="models/gemini-1.5-flash",
    max_tokens=100,
    temperature=0
)
model = GoogleModel(configuration=config)
session = Session(
  messages=[
    Message(content="Hey", role="user"),
  ]
)
message = model.send(session=session)

You will get the following response:

Message(content='Hey there! How can I help you today?', role='1', metadata={})

Notice that the role value differs from OpenAI’s. That difference confirms the response came from Google’s model.

The code is almost the same as the OpenAI example. Just swap the model and configuration classes for Google’s.

Every model and configuration class comes with type hints, so you may not need to read the documentation for each provider.
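Because every provider is called the same way, application code can be written once against the shared send(session=...) shape. The following sketch shows the idea with typing.Protocol. All class names here (SupportsSend, ProviderA, ProviderB and the Sketch types) are hypothetical stand-ins rather than PromptTrail classes.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


@dataclass
class SketchMessage:
    content: str
    role: str


@dataclass
class SketchSession:
    messages: List[SketchMessage] = field(default_factory=list)


class SupportsSend(Protocol):
    """Anything with a send(session=...) method that returns a message."""

    def send(self, session: SketchSession) -> SketchMessage: ...


class ProviderA:
    def send(self, session: SketchSession) -> SketchMessage:
        return SketchMessage(content="Hello from A", role="assistant")


class ProviderB:
    def send(self, session: SketchSession) -> SketchMessage:
        # Providers may label roles differently, as the Google example shows.
        return SketchMessage(content="Hello from B", role="1")


def ask(model: SupportsSend, text: str) -> SketchMessage:
    """Provider-agnostic helper: build a one-message session and send it."""
    session = SketchSession(messages=[SketchMessage(content=text, role="user")])
    return model.send(session=session)


replies = [ask(m, "Hey") for m in (ProviderA(), ProviderB())]
for r in replies:
    print(r.role, r.content)
```

A type checker accepts both providers for the model parameter because each has a compatible send method, which is the same editor feedback the type hints are meant to give you.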

Note

PromptTrail is fully typed, so we recommend writing code in an editor that understands type hints, such as VSCode or PyCharm. Docstrings are also available for almost every class and method. We want users to be able to write code without consulting the documentation.

Anthropic

Anthropic’s Claude is also available:

import os
from prompttrail.core import Session, Message
from prompttrail.models.anthropic import AnthropicModel, AnthropicConfig

api_key = os.environ["ANTHROPIC_API_KEY"]
config = AnthropicConfig(
    api_key=api_key,
    model_name="claude-3-5-haiku-latest",
    max_tokens=100,
    temperature=0
)
model = AnthropicModel(configuration=config)
session = Session(
  messages=[
    Message(content="Hey", role="user"),
  ]
)
message = model.send(session=session)

You will get the following response:

Message(content='Hello! How can I assist you today?', role='assistant', metadata={})

Try local LLMs

One of the key features of PromptTrail is the ability to run LLMs locally using the Transformers library. This allows you to use any model from the HuggingFace Hub without relying on external APIs.

First, install the transformers library. Please refer to the Transformers installation guide for detailed instructions.

Then you can use it like this:

from transformers import AutoModelForCausalLM, AutoTokenizer
from prompttrail.models.transformers import TransformersModel, TransformersConfig
from prompttrail.core import Session, Message

# Load model and tokenizer from HuggingFace Hub
model_name = "facebook/opt-125m"  # You can use any model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Create TransformersModel instance
llm = TransformersModel(
    configuration=TransformersConfig(
        device="cuda",  # Use "cpu" if you don't have GPU
        model_name=model_name,
        temperature=0.7,
        max_tokens=100,
        top_p=0.9,
        top_k=50,
        repetition_penalty=1.2
    ),
    model=model,
    tokenizer=tokenizer
)

session = Session(
    messages=[
        Message(content="What is machine learning?", role="user")
    ]
)

# Generate response
response = llm.send(session=session)

The TransformersConfig supports various generation parameters:

  • temperature: Controls randomness in generation (default: 1.0)

  • max_tokens: Maximum number of tokens to generate (default: 1024)

  • top_p: Nucleus sampling parameter (default: 1.0)

  • top_k: Top-k sampling parameter (optional)

  • repetition_penalty: Penalizes repeated tokens (default: 1.0)
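To build intuition for what these parameters do, here is a simplified, pure-Python sketch that applies them to a toy set of next-token scores. It is not the code that Transformers or PromptTrail runs. The function name next_token_distribution and the toy logits are assumptions made for illustration, and the repetition penalty follows the common divide-positive, multiply-negative convention.

```python
import math
from typing import Dict, Iterable, List, Optional, Tuple


def next_token_distribution(
    logits: Dict[str, float],
    temperature: float = 1.0,
    top_k: Optional[int] = None,
    top_p: float = 1.0,
    repetition_penalty: float = 1.0,
    already_generated: Iterable[str] = (),
) -> Dict[str, float]:
    """Return the token probabilities that remain after applying each parameter."""
    scores = dict(logits)

    # repetition_penalty: make tokens that were already generated less likely.
    for token in set(already_generated):
        if token in scores:
            s = scores[token]
            scores[token] = s / repetition_penalty if s > 0 else s * repetition_penalty

    # temperature (must be > 0): below 1 sharpens the distribution, above 1 flattens it.
    scores = {t: s / temperature for t, s in scores.items()}

    # Softmax over the scaled scores, most likely token first.
    peak = max(scores.values())
    exp = {t: math.exp(s - peak) for t, s in scores.items()}
    total = sum(exp.values())
    ranked: List[Tuple[str, float]] = sorted(
        ((t, e / total) for t, e in exp.items()), key=lambda x: x[1], reverse=True
    )

    # top_k: keep only the k most likely tokens, then renormalize.
    if top_k is not None:
        ranked = ranked[:top_k]
        norm = sum(p for _, p in ranked)
        ranked = [(t, p / norm) for t, p in ranked]

    # top_p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept: List[Tuple[str, float]] = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize so the surviving probabilities sum to 1.
    norm = sum(p for _, p in kept)
    return {t: p / norm for t, p in kept}


logits = {"the": 2.0, "a": 1.0, "cat": 0.5, "dog": 0.0}
full = next_token_distribution(logits)
narrow = next_token_distribution(logits, temperature=0.7, top_k=3, top_p=0.9)
print(list(full))
print(list(narrow))
```

With the defaults nothing is filtered out, while a lower temperature combined with top_k and top_p leaves fewer candidate tokens to sample from.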

Stream Output

Streaming output is supported for OpenAI’s API and local Transformers models. For providers that support it, call the send_async method to receive the response incrementally.

message_generator = model.send_async(session=session)
for message in message_generator:
    print(message.content, end="", flush=True)  # end="" keeps the chunks on one line

This will print the following:

Hello! How can # text is incrementally typed

send_async returns a generator that yields Message objects.
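The consuming side of such a generator can be tried without any provider. In the sketch below, fake_stream and SketchMessage are hypothetical stand-ins, and it assumes that each yielded message carries only the newly generated chunk of text; check the behavior of the provider you use before relying on that.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class SketchMessage:
    content: str
    role: str


def fake_stream(chunks: List[str]) -> Iterator[SketchMessage]:
    """Hypothetical stand-in for a streaming call: yields one message per chunk."""
    for chunk in chunks:
        yield SketchMessage(content=chunk, role="assistant")


pieces: List[str] = []
for message in fake_stream(["Hello! ", "How can ", "I assist you today?"]):
    print(message.content, end="", flush=True)  # no newline between chunks
    pieces.append(message.content)
print()  # finish the line once the stream ends

# Join the chunks if you also need the complete text afterwards.
full_text = "".join(pieces)
```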

For Transformers models, you can use streaming in the same way:

for partial_response in llm.send_async(session=session):
    print(partial_response.content, end="", flush=True)