(models)= # `models`: Unified Interface to LLM API If you want to just use LLM models, you can use `models` module. Let's see how to use it. ```{Note} If you build complex conversation flow, you may want to use `agents` module. See [agents](#agents) section for more details. ``` ## Make an API Call PromptTrail implement many LLM models under `models`. Let's call OpenAI's GPT models. (You need to set `OPENAI_API_KEY` environment variable with your API key.) ```python import os from prompttrail.core import Session, Message from prompttrail.models.openai import OpenAIModel, OpenAIConfig api_key = os.environ["OPENAI_API_KEY"] config = OpenAIConfig( api_key=api_key, model_name="gpt-4o-mini", max_tokens=100, temperature=0 ) model = OpenAIModel(configuration=config) session = Session( messages=[ Message(content="Hey", role="user"), ] ) model.send(session=session) ``` You can see the response from the model like this: ```python Message(content="Hello! How can I assist you today?", role="assistant") ``` Yay! You have successfully called an OpenAI GPT model. ## Core Concepts Some new concepts are introduced in the example above. Let's see them in detail. You can skip the following sections if you're already familiar with other LLM libraries. ### Message ```python Message(content="Hello! How can I assist you today?", role="assistant", metadata={}) ``` Message represents a single message in a conversation. It has the following attributes: - `content: str`: the content of the message (text) - `role: str`: the role of the message - OpenAI's API expect one of `system`, `user`, `assistant` as the role. - Other providers have different rules. - `metadata`: additional metadata for the message (used for templates, hooks, and other features) ### Session ```python session = Session( messages=[ Message(content="Hey", role="user"), ] ) ``` Session represents a conversation. Session is just a collection of messages. ```{Note} If you want to use non-chat models as traditional language models, you can just pass a session with a single message. ``` ### Model and Configuration ```python config = OpenAIConfig(api_key=api_key, model_name="gpt-4o-mini") model = OpenAIModel(configuration=config) message = model.send(session=session) ``` We call the interface to access LLM models `Model`. Conversation with LLM is a simple process: - Pass a `Session` to the LLM - Get a new `Message` from the LLM Each provider has its own configuration class that inherits from `Config`. This configuration includes: - Static settings (e.g., API keys, organization IDs) - Model parameters (e.g., model name, temperature, max tokens) - Optional providers (cache provider, mock provider) ```{Note} `CacheProvider` can be passed as a configuration parameter. PromptTrail has a built-in cache mechanism to reduce the number of API calls. See `Cache` section for more details. ``` ## Try different API ### Google If you want to call Google's Gemini model, you can do it by changing some lines. ```python import os from prompttrail.core import Session, Message from prompttrail.models.google import GoogleModel, GoogleConfig api_key = os.environ["GOOGLE_CLOUD_API_KEY"] config = GoogleConfig( api_key=api_key, model_name="models/gemini-1.5-flash", max_tokens=100, temperature=0 ) model = GoogleModel(configuration=config) session = Session( messages=[ Message(content="Hey", role="user"), ] ) message = model.send(session=session) ``` You will get the following response: ```python Message(content='Hey there! How can I help you today?', role='1', metadata={}) ``` You may notice the role system is different from OpenAI's! We're successfully using Google's model! The code is almost the same as the OpenAI example. Just change the `Model` and `Config` to Google's. You will get plenty type hints for every model and configuration. So you may not need to view documentation for every provider. ```{Note} PromptTrail is fully typed. Therefore we recommend you to write code with VSCode or PyCharm. docstring is also available for almost every class and method. We want users to be able to write code without viewing documentation. ``` ### Anthropic Anthropic's Claude is also available: ```python import os from prompttrail.core import Session, Message from prompttrail.models.anthropic import AnthropicModel, AnthropicConfig api_key = os.environ["ANTHROPIC_API_KEY"] config = AnthropicConfig( api_key=api_key, model_name="claude-3-5-haiku-latest", max_tokens=100, temperature=0 ) model = AnthropicModel(configuration=config) session = Session( messages=[ Message(content="Hey", role="user"), ] ) message = model.send(session=session) ``` You will get the following response: ```python Message(content='Hello! How can I assist you today?', role='assistant', metadata={}) ``` ## Try local LLMs One of the key features of PromptTrail is the ability to run LLMs locally using the Transformers library. This allows you to use any model from the HuggingFace Hub without relying on external APIs. First, install the transformers library. Please refer to the [Transformers installation guide](https://huggingface.co/docs/transformers/installation) for detailed instructions. Then you can use it like this: ```python from transformers import AutoModelForCausalLM, AutoTokenizer from prompttrail.models.transformers import TransformersModel, TransformersConfig from prompttrail.core import Session, Message # Load model and tokenizer from HuggingFace Hub model_name = "facebook/opt-125m" # You can use any model from HuggingFace Hub tokenizer = AutoTokenizer.from_pretrained(model_name) model = AutoModelForCausalLM.from_pretrained(model_name) # Create TransformersModel instance llm = TransformersModel( configuration=TransformersConfig( device="cuda", # Use "cpu" if you don't have GPU model_name=model_name, temperature=0.7, max_tokens=100, top_p=0.9, top_k=50, repetition_penalty=1.2 ), model=model, tokenizer=tokenizer ) session = Session( messages=[ Message(content="What is machine learning?", role="user") ] ) # Generate response response = llm.send(session=session) ``` The TransformersConfig supports various generation parameters: - `temperature`: Controls randomness in generation (default: 1.0) - `max_tokens`: Maximum number of tokens to generate (default: 1024) - `top_p`: Nucleus sampling parameter (default: 1.0) - `top_k`: Top-k sampling parameter (optional) - `repetition_penalty`: Penalizes repeated tokens (default: 1.0) ## Stream Output Streaming output is supported for OpenAI's API and local Transformers models. If you want streaming output, you can use the `send_async` method if the provider offers the feature. ```python message_generator = model.send_async(session=session) for message in message_generator: print(message.content, sep="", flush=True) ``` This will print the following: ```shell Hello! How can # text is incrementally typed ``` `send_async` returns a generator that yields `Message` objects. For Transformers models, you can use streaming in the same way: ```python for partial_response in llm.send_async(session=session): print(partial_response.content, end="", flush=True)