models: Unified Interface to LLM API
If you just want to call LLMs directly, you can use the models module.
Let’s see how to use it.
Note
If you are building a complex conversation flow, you may want to use the agents module.
See the agents section for more details.
Make an API Call
PromptTrail implements many LLM providers under models.
Let’s call one of OpenAI’s GPT models. (You need to set the OPENAI_API_KEY environment variable to your API key.)
import os
from prompttrail.core import Session, Message
from prompttrail.models.openai import OpenAIModel, OpenAIConfig
api_key = os.environ["OPENAI_API_KEY"]
config = OpenAIConfig(
    api_key=api_key,
    model_name="gpt-4o-mini",
    max_tokens=100,
    temperature=0,
)
model = OpenAIModel(configuration=config)
session = Session(
    messages=[
        Message(content="Hey", role="user"),
    ]
)
model.send(session=session)
You can see the response from the model like this:
Message(content="Hello! How can I assist you today?", role="assistant")
Yay! You have successfully called an OpenAI GPT model.
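If the environment variable is missing, `os.environ[...]` fails with a bare `KeyError`. The helper below is a small sketch of a friendlier lookup; `require_env` is a hypothetical name for this illustration, not part of PromptTrail.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        # Raise a descriptive error instead of the bare KeyError from os.environ[name].
        raise RuntimeError(
            f"Environment variable {name} is not set. "
            f"Export it before running, e.g. `export {name}=...`."
        )
    return value
```

With this in place, `api_key = require_env("OPENAI_API_KEY")` could stand in for the direct `os.environ` lookup in the example above.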
Core Concepts
Some new concepts are introduced in the example above. Let’s see them in detail. You can skip the following sections if you’re already familiar with other LLM libraries.
Message
Message(content="Hello! How can I assist you today?", role="assistant", metadata={})
Message represents a single message in a conversation. It has the following attributes:
content: str: the content of the message (text)
role: str: the role of the message. OpenAI’s API expects one of system, user, or assistant as the role. Other providers have different rules.
metadata: additional metadata for the message (used for templates, hooks, and other features)
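Because the accepted roles depend on the provider, it can help to check them before sending. The following is a minimal sketch of such a check for the three OpenAI roles listed above; `OPENAI_ROLES` and `invalid_roles` are hypothetical names for this illustration and are not part of PromptTrail.

```python
from typing import FrozenSet, Iterable, List

# Roles the text above lists for OpenAI's chat API; other providers differ.
OPENAI_ROLES: FrozenSet[str] = frozenset({"system", "user", "assistant"})


def invalid_roles(roles: Iterable[str], allowed: FrozenSet[str] = OPENAI_ROLES) -> List[str]:
    """Return the roles that are not in `allowed`, preserving their order."""
    return [role for role in roles if role not in allowed]


print(invalid_roles(["system", "user", "1"]))
```

Assuming a session exposes its messages as `session.messages`, the check could be applied as `invalid_roles(m.role for m in session.messages)`.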
Session
session = Session(
messages=[
Message(content="Hey", role="user"),
]
)
Session represents a conversation: it is simply an ordered collection of messages.
Note
If you want to use non-chat models as traditional language models, you can just pass a session with a single message.
Model and Configuration
config = OpenAIConfig(api_key=api_key, model_name="gpt-4o-mini")
model = OpenAIModel(configuration=config)
message = model.send(session=session)
The interface for accessing an LLM is called a Model.
A conversation with an LLM is a simple process:
Pass a Session to the LLM
Get a new Message from the LLM
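The two steps above can be repeated to build a multi-turn conversation. The sketch below shows that loop with offline stand-ins so it runs without an API key: the `Message` and `Session` dataclasses and `EchoModel` are simplified, hypothetical substitutes for the real classes, and appending to `session.messages` in place is an assumption about how you might manage the history yourself.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Message:  # stand-in, not prompttrail.core.Message
    content: str
    role: str
    metadata: Dict[str, object] = field(default_factory=dict)


@dataclass
class Session:  # stand-in, not prompttrail.core.Session
    messages: List[Message] = field(default_factory=list)


class EchoModel:
    """Hypothetical offline model: replies by echoing the last message."""

    def send(self, session: Session) -> Message:
        last = session.messages[-1]
        return Message(content=f"You said: {last.content}", role="assistant")


def chat_turn(model: EchoModel, session: Session, user_text: str) -> Message:
    """Run one turn: add the user message, call the model, record the reply."""
    session.messages.append(Message(content=user_text, role="user"))
    reply = model.send(session=session)  # step 1: pass the Session
    session.messages.append(reply)       # step 2: keep the new Message
    return reply


session = Session()
model = EchoModel()
chat_turn(model, session, "Hey")
chat_turn(model, session, "How are you?")
print([m.role for m in session.messages])
```

The point of the sketch is only the shape of the loop: every call sees the full history, and each reply becomes part of the next request.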
Each provider has its own configuration class that inherits from Config. This configuration includes:
Static settings (e.g., API keys, organization IDs)
Model parameters (e.g., model name, temperature, max tokens)
Optional providers (cache provider, mock provider)
Note
A CacheProvider can be passed as a configuration parameter. PromptTrail has a built-in cache mechanism to reduce the number of API calls. See the Cache section for more details.
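To illustrate why caching reduces API calls, here is a generic sketch of the idea: identical requests are keyed by a hash of their inputs and answered from memory on repeat. This is not PromptTrail's cache implementation; `cache_key` and `CachedCaller` are hypothetical names, and a conversation is modelled as plain (role, content) pairs.

```python
import hashlib
import json
from typing import Callable, Dict, List, Tuple

# A conversation is modelled here as a list of (role, content) pairs.
Conversation = List[Tuple[str, str]]


def cache_key(conversation: Conversation, model_name: str, temperature: float) -> str:
    """Derive a stable key from everything that influences the reply."""
    payload = json.dumps(
        {"model": model_name, "temperature": temperature, "messages": conversation},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class CachedCaller:
    """Wrap a `call` function so identical requests reach it only once."""

    def __init__(self, call: Callable[[Conversation], str], model_name: str, temperature: float):
        self._call = call
        self._model_name = model_name
        self._temperature = temperature
        self._store: Dict[str, str] = {}
        self.api_calls = 0  # counts real (uncached) calls

    def send(self, conversation: Conversation) -> str:
        key = cache_key(conversation, self._model_name, self._temperature)
        if key not in self._store:
            self.api_calls += 1
            self._store[key] = self._call(conversation)
        return self._store[key]
```

Caching like this is most meaningful with deterministic settings such as `temperature=0`, where a repeated request is expected to produce the same reply anyway.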
Try different APIs
Google
To call Google’s Gemini models, you only need to change a few lines.
import os
from prompttrail.core import Session, Message
from prompttrail.models.google import GoogleModel, GoogleConfig
api_key = os.environ["GOOGLE_CLOUD_API_KEY"]
config = GoogleConfig(
    api_key=api_key,
    model_name="models/gemini-1.5-flash",
    max_tokens=100,
    temperature=0,
)
model = GoogleModel(configuration=config)
session = Session(
    messages=[
        Message(content="Hey", role="user"),
    ]
)
message = model.send(session=session)
You will get the following response:
Message(content='Hey there! How can I help you today?', role='1', metadata={})
You may notice the role system is different from OpenAI’s! We’re successfully using Google’s model!
The code is almost the same as the OpenAI example. Just change the Model and Config to Google’s.
You will get plenty of type hints for every model and configuration, so you may not need to consult the documentation for each provider.
Note
PromptTrail is fully typed, so we recommend writing code in an editor such as VSCode or PyCharm. Docstrings are also available for almost every class and method. We want users to be able to write code without having to read the documentation.
Anthropic
Anthropic’s Claude is also available:
import os
from prompttrail.core import Session, Message
from prompttrail.models.anthropic import AnthropicModel, AnthropicConfig
api_key = os.environ["ANTHROPIC_API_KEY"]
config = AnthropicConfig(
    api_key=api_key,
    model_name="claude-3-5-haiku-latest",
    max_tokens=100,
    temperature=0,
)
model = AnthropicModel(configuration=config)
session = Session(
    messages=[
        Message(content="Hey", role="user"),
    ]
)
message = model.send(session=session)
You will get the following response:
Message(content='Hello! How can I assist you today?', role='assistant', metadata={})
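The three hosted providers above differ only in their module, class names, and API key variable. A small lookup table can make switching between them a one-word change. The module paths, class names, and environment variables below are taken from the examples in this section; `ProviderSpec`, `PROVIDERS`, and `load_provider` are hypothetical helpers, not part of PromptTrail.

```python
import importlib
from typing import Dict, NamedTuple, Tuple


class ProviderSpec(NamedTuple):
    module: str        # module path used in the examples above
    model_class: str   # name of the Model class
    config_class: str  # name of the Config class
    env_var: str       # environment variable holding the API key


PROVIDERS: Dict[str, ProviderSpec] = {
    "openai": ProviderSpec(
        "prompttrail.models.openai", "OpenAIModel", "OpenAIConfig", "OPENAI_API_KEY"
    ),
    "google": ProviderSpec(
        "prompttrail.models.google", "GoogleModel", "GoogleConfig", "GOOGLE_CLOUD_API_KEY"
    ),
    "anthropic": ProviderSpec(
        "prompttrail.models.anthropic", "AnthropicModel", "AnthropicConfig", "ANTHROPIC_API_KEY"
    ),
}


def load_provider(name: str) -> Tuple[type, type]:
    """Import a provider lazily and return its (Model, Config) classes."""
    if name not in PROVIDERS:
        raise KeyError(f"Unknown provider {name!r}; choose from {sorted(PROVIDERS)}")
    spec = PROVIDERS[name]
    # The import only happens here, so it requires prompttrail to be installed.
    module = importlib.import_module(spec.module)
    return getattr(module, spec.model_class), getattr(module, spec.config_class)
```

With PromptTrail installed, `Model, Config = load_provider("anthropic")` would return the same classes imported explicitly in the example above.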
Try local LLMs
One of the key features of PromptTrail is the ability to run LLMs locally using the Transformers library. This allows you to use any model from the HuggingFace Hub without relying on external APIs.
First, install the transformers library. Please refer to the Transformers installation guide for detailed instructions.
Then you can use it like this:
from transformers import AutoModelForCausalLM, AutoTokenizer
from prompttrail.models.transformers import TransformersModel, TransformersConfig
from prompttrail.core import Session, Message
# Load model and tokenizer from HuggingFace Hub
model_name = "facebook/opt-125m" # You can use any model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
# Create TransformersModel instance
llm = TransformersModel(
    configuration=TransformersConfig(
        device="cuda",  # Use "cpu" if you don't have GPU
        model_name=model_name,
        temperature=0.7,
        max_tokens=100,
        top_p=0.9,
        top_k=50,
        repetition_penalty=1.2,
    ),
    model=model,
    tokenizer=tokenizer,
)
session = Session(
    messages=[
        Message(content="What is machine learning?", role="user")
    ]
)
# Generate response
response = llm.send(session=session)
The TransformersConfig supports various generation parameters:
temperature: Controls randomness in generation (default: 1.0)
max_tokens: Maximum number of tokens to generate (default: 1024)
top_p: Nucleus sampling parameter (default: 1.0)
top_k: Top-k sampling parameter (optional)
repetition_penalty: Penalizes repeated tokens (default: 1.0)
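To give a feel for how temperature, top_k, and top_p interact, the sketch below applies them to a tiny list of logits in pure Python. It is a simplified illustration of the general sampling technique, not the code that Transformers or PromptTrail runs; `filtered_probs` is a hypothetical helper, it assumes `temperature > 0`, and it leaves out repetition_penalty.

```python
import math
from typing import Dict, List, Optional


def filtered_probs(
    logits: List[float],
    temperature: float = 1.0,
    top_k: Optional[int] = None,
    top_p: float = 1.0,
) -> List[float]:
    """Return next-token probabilities after temperature, top-k and top-p filtering."""
    # Temperature: divide logits before softmax; lower values sharpen the distribution.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]  # subtract peak for numerical stability
    total = sum(weights)
    probs = [w / total for w in weights]

    # Rank (index, probability) pairs from most to least likely.
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)

    # Top-k: keep only the k most likely tokens, then renormalise.
    if top_k is not None:
        ranked = ranked[:top_k]
    mass = sum(p for _, p in ranked)
    ranked = [(i, p / mass) for i, p in ranked]

    # Top-p (nucleus): keep the smallest prefix whose cumulative probability reaches top_p.
    kept: Dict[int, float] = {}
    cumulative = 0.0
    for i, p in ranked:
        kept[i] = p
        cumulative += p
        if cumulative >= top_p:
            break

    # Removed tokens get probability zero; the survivors are renormalised.
    mass = sum(kept.values())
    return [kept.get(i, 0.0) / mass for i in range(len(probs))]


print(filtered_probs([2.0, 1.0, 0.0], temperature=0.7, top_k=2, top_p=0.9))
```

Lower temperature and smaller top_k or top_p all concentrate probability on fewer tokens, which is why they are usually tuned together.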
Stream Output
Streaming output is supported for OpenAI’s API and local Transformers models.
To stream output, call the send_async method on a model whose provider supports it.
message_generator = model.send_async(session=session)
for message in message_generator:
    print(message.content, end="", flush=True)
This will print the following:
Hello! How can # text is incrementally typed
send_async returns a generator that yields Message objects.
For Transformers models, you can use streaming in the same way:
for partial_response in llm.send_async(session=session):
    print(partial_response.content, end="", flush=True)
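When streaming, you often want both the live output and the complete text afterwards. The sketch below shows one way to do that with an offline stand-in so it runs without a model. It assumes each yielded message carries only the newly generated text, as the printing loops above suggest; `Chunk`, `fake_stream`, and `print_and_collect` are hypothetical names for this illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Chunk:  # stand-in for a streamed Message
    content: str
    role: str = "assistant"


def fake_stream(text: str, size: int = 4) -> Iterator[Chunk]:
    """Hypothetical generator that mimics a streaming reply, `size` characters at a time."""
    for start in range(0, len(text), size):
        yield Chunk(content=text[start:start + size])


def print_and_collect(chunks: Iterable[Chunk]) -> str:
    """Print each chunk as it arrives and return the full reply text."""
    parts: List[str] = []
    for chunk in chunks:
        print(chunk.content, end="", flush=True)
        parts.append(chunk.content)
    print()  # finish the line once the stream ends
    return "".join(parts)


full_text = print_and_collect(fake_stream("Hello! How can I assist you today?"))
```

Under the same assumption, `print_and_collect(model.send_async(session=session))` would work with a real model in place of `fake_stream`.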