You can select which AI model best suits each use from the list of models offered by the platform, or plug in your own custom AI model. Supported usages include:
Conversation
Image Analysis
Image Generation
Embeddings Generation
Audio Transcription
Text to Speech
Document Summarization
Speech to Speech (Conversation / Realtime)
Moderation
These models have various functions, performance profiles, and feature sets.
Models used for Conversation must support Tools and streaming simultaneously.
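As a rough illustration, this requirement can be expressed as a capability filter. The `ModelInfo` shape below is a hypothetical sketch, not the platform's actual model record:

```typescript
// Hypothetical shape; the real platform model records may differ.
interface ModelInfo {
  id: string;
  capabilities: string[]; // e.g. ["Tools", "Streaming", "Vision"]
}

// A model qualifies for Conversation only if it supports
// both Tools and Streaming at the same time.
function eligibleForConversation(model: ModelInfo): boolean {
  return (
    model.capabilities.includes("Tools") &&
    model.capabilities.includes("Streaming")
  );
}
```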
At runtime, you can switch the LLM used for Conversation from Settings in the Side Bar panel, and review model capabilities by clicking the Model Options detail. Custom models are displayed in the same section of the Side Bar.
You can specify which Conversation model to use for each assistant. Behind the scenes, Praxis AI’s Neural Engine can route requests to different models based on:
Use Case
Assistant-specific model
Token budget and cost constraints
Model availability and latency
User preferences and history
This allows you to balance quality, speed, and cost without changing your front-end integration.
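A simplified sketch of how such routing could work is shown below. The function and field names are illustrative assumptions, not the Neural Engine’s actual interface:

```typescript
// Illustrative only: a toy router over the criteria listed above.
interface RouteRequest {
  useCase: string;            // e.g. "Conversation"
  assistantModel?: string;    // assistant-specific override, if any
  maxCostPerMTokens?: number; // token budget / cost constraint
}

interface CandidateModel {
  id: string;
  useCases: string[];
  costPerMTokens: number;
  available: boolean; // availability / latency gate
}

function routeModel(
  req: RouteRequest,
  candidates: CandidateModel[]
): string | undefined {
  // 1. An assistant-specific model wins outright.
  if (req.assistantModel) return req.assistantModel;
  // 2. Otherwise filter by use case, availability, and cost ceiling,
  //    then pick the cheapest remaining candidate.
  const eligible = candidates
    .filter((m) => m.useCases.includes(req.useCase) && m.available)
    .filter(
      (m) =>
        req.maxCostPerMTokens === undefined ||
        m.costPerMTokens <= req.maxCostPerMTokens
    )
    .sort((a, b) => a.costPerMTokens - b.costPerMTokens);
  return eligible[0]?.id;
}
```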
Praxis AI middleware offers access to a broad catalog of state-of-the-art AI models. You can select the model that best fits your needs based on performance, cost, and capabilities. The default model is configured to use the latest, most capable model available on the platform. In most cases, you should keep the default selected unless you have a specific requirement (for example, strict cost control, a specific provider, or latency constraints).
Praxis AI exposes conversation and related capabilities (vision, audio, embeddings, moderation, realtime) through multiple provider types:
Amazon Bedrock
OpenAI-Compatible Clients (OpenAI, xAI, Cohere)
Anthropic Direct API
Google Gemini Native SDK
Mistral AI Native SDK
Each provider contains groups and individual models with specific capabilities and uses.
Amazon Bedrock
Anthropic
Amazon
OpenAI (Open Source)
Meta
Cohere
Mistral
Anthropic models via Bedrock are the platform’s models of choice, mainly for Conversation and Image Analysis. Models marked with Extended support the optional 1M-token context window (see Inference Settings).
| Model Name | Status | Capabilities | Input (tokens) | Output (tokens) | Thinking | Typical Uses |
| --- | --- | --- | --- | --- | --- | --- |
| global.anthropic.claude-sonnet-4-6 | New | Tools, Streaming, Vision | 200,000 | 64,000 | Yes (Extended) | Conversation, Image Analysis, Summary |
| us.anthropic.claude-sonnet-4-6 | Default | Tools, Streaming, Vision | 200,000 | 64,000 | Yes (Extended) | Conversation, Image Analysis, Summary |
| us.anthropic.claude-sonnet-4-5-20250929-v1:0 | Current | Tools, Streaming, Vision | 200,000 | 64,000 | Yes (Extended) | Conversation, Image Analysis, Summary |
| us.anthropic.claude-sonnet-4-20250514-v1:0 | Deprecated | Tools, Streaming, Vision | 200,000 | 64,000 | Yes (Extended) | Conversation, Image Analysis, Summary |
| us.anthropic.claude-3-7-sonnet-20250219-v1:0 | Deprecated | Tools, Streaming, Vision | 200,000 | 64,000 | Yes | Conversation, Image Analysis, Summary |
| us.anthropic.claude-3-5-sonnet-20241022-v2:0 | Deprecated | Tools, Streaming, Vision | 200,000 | 8,192 | — | Conversation, Image Analysis |
| us.anthropic.claude-opus-4-6-v1 | New | Tools, Streaming, Vision | 200,000 | 128,000 | Yes (Extended) | Conversation, Image Analysis, Summary |
| us.anthropic.claude-opus-4-5-20251101-v1:0 | Current | Tools, Streaming, Vision | 200,000 | 64,000 | Yes | Conversation, Image Analysis, Summary |
| us.anthropic.claude-opus-4-1-20250805-v1:0 | Deprecated | Tools, Streaming, Vision | 200,000 | 32,000 | Yes | Conversation, Image Analysis |
| us.anthropic.claude-opus-4-20250514-v1:0 | Deprecated | Tools, Streaming, Vision | 200,000 | 32,000 | Yes | Conversation, Image Analysis |
| us.anthropic.claude-haiku-4-5-20251001-v1:0 | Current | Tools, Streaming, Vision | 200,000 | 64,000 | Yes | Conversation, Summary, Image Analysis |
| us.anthropic.claude-3-5-haiku-20241022-v1:0 | Deprecated | Tools, Streaming, Vision | 200,000 | 8,192 | — | Conversation, Image Analysis |
Deprecated models will be removed in a future release. Migrate to a newer model. When a deprecated model is removed, any assistant or configuration referencing it will automatically fall back to the institution’s default model.
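In pseudocode, the fallback behavior amounts to something like this (the function and parameter names are hypothetical, not the platform’s internals):

```typescript
// Illustrative fallback: if a configured model has been removed,
// resolve to the institution's default model instead.
function resolveModel(
  configuredId: string,
  availableIds: Set<string>,
  institutionDefault: string
): string {
  return availableIds.has(configuredId) ? configuredId : institutionDefault;
}
```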
Amazon models hosted on Bedrock, used for Conversation, Image Analysis, Audio Transcription, Image Generation, and Embeddings.
ElevenLabs Conversational AI can be used as an alternative real-time speech-to-speech provider. When selected, Pria’s Convo Mode connects to your configured ElevenLabs agent instead of OpenAI GPT-Realtime.
Gemini models are accessed through the Google GenAI native SDK and are used for Conversation, Image Analysis, Summary, and Image Generation. Requires an API key.
Anthropic models via direct API, used for Conversation, Image Analysis, and Summary. Requires an API key. Models marked with Extended support the optional 1M token context window.
Mistral AI models are accessed through the native Mistral SDK (@mistralai/mistralai) and are used for Conversation, Image Analysis, Summary, Audio, TTS, Embeddings, and Moderation. Requires an API key.
Some AI models support extended thinking (also called reasoning), where the model can spend additional time analyzing a problem before responding. Praxis AI provides a unified 5-level reasoning effort system that works across all supported providers.
| Level | Description | Best For |
| --- | --- | --- |
| None | Disable thinking. Fastest responses, lowest cost. | Simple queries, quick lookups |
| Low | Minimal reasoning. | Straightforward questions |
| Medium | Balanced reasoning. | Most everyday tasks |
| High | Thorough reasoning. | Complex analysis, multi-step problems |
| Max | Maximum reasoning depth. Highest latency and cost. | — |
The reasoning effort level is resolved using this priority:
1. AI Model override — If a custom AI model has a reasoning effort configured, that takes precedence.
2. Institution setting — The institution-level default reasoning effort.
3. Platform default — None (thinking disabled).
Reasoning effort is mapped to each provider’s native format automatically — OpenAI reasoning_effort, Anthropic budget_tokens, Gemini thinkingConfig, Bedrock budgetTokens, and Mistral thinking. You don’t need to configure provider-specific parameters.
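As a sketch of what this mapping looks like for two of the providers, the snippet below translates a unified level into OpenAI’s `reasoning_effort` string and Anthropic’s `thinking` block. The specific token budgets are illustrative assumptions, not the platform’s actual values:

```typescript
type ReasoningLevel = "none" | "low" | "medium" | "high" | "max";

// Illustrative budgets only; the platform's real values are internal.
const ANTHROPIC_BUDGETS: Record<string, number> = {
  low: 2048,
  medium: 8192,
  high: 16384,
  max: 32768,
};

// Anthropic expresses effort as a thinking token budget.
function toAnthropicThinking(
  level: ReasoningLevel
): { type: string; budget_tokens?: number } {
  return level === "none"
    ? { type: "disabled" }
    : { type: "enabled", budget_tokens: ANTHROPIC_BUDGETS[level] };
}

// OpenAI's reasoning_effort takes "low" | "medium" | "high";
// "max" is clamped to "high", and "none" omits the parameter.
function toOpenAIEffort(level: ReasoningLevel): string | undefined {
  if (level === "none") return undefined;
  return level === "max" ? "high" : level;
}
```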
Not all models support extended thinking. Look for models marked with thinking support in the tables above. Currently supported thinking models include:
Anthropic: Claude Opus 4.6, Opus 4.5, Sonnet 4.6, Sonnet 4.5, Sonnet 4, Claude 3.7 Sonnet, Haiku 4.5 (via Bedrock or Direct API)
Some providers support prompt caching, which reduces latency and input token costs by reusing previously processed prompt prefixes. Praxis AI enables prompt caching automatically where supported — no configuration is needed.
| Provider | Caching Type | How It Works | Cost Savings |
| --- | --- | --- | --- |
| OpenAI | Automatic | Cached automatically on every request — no code changes needed. The API returns cached_tokens in the usage response. | Up to 50% on cached input tokens |
| Anthropic | Explicit | Praxis marks cache breakpoints on tools, system prompt, and the last user message using cache_control headers. Cached prefixes are reused on subsequent requests. | Up to 90% on cached reads |
| Google Gemini | Context caching | Supports context caching via a separate API to create reusable cached content objects. | Varies by content size and TTL |
| Amazon Bedrock | Varies | Depends on the underlying model provider (e.g., Anthropic models on Bedrock inherit Anthropic’s caching). | Varies |
| Mistral AI | Not available | The Mistral API does not currently support prompt caching. Usage tracking returns promptTokens, completionTokens, and totalTokens only. | — |
Prompt caching is most impactful for conversations with long system prompts, many tools, or extended history — exactly the pattern used by Praxis AI’s RAG pipeline. Anthropic and OpenAI caching are enabled by default for all eligible requests.
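For illustration, the Anthropic-style cache breakpoints described above look roughly like this in a request body. This is a sketch of the public cache_control format, not Praxis AI’s internal code, and the tool name and prompt text are made up:

```typescript
// Sketch of an Anthropic Messages API request body with explicit
// cache breakpoints. The system prompt and tool definition are marked
// cacheable so their processed prefix can be reused on later requests.
const requestBody = {
  model: "claude-sonnet-4-5",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: "You are a helpful teaching assistant...", // long system prompt
      cache_control: { type: "ephemeral" },            // cache breakpoint
    },
  ],
  tools: [
    {
      name: "search_course_content", // hypothetical tool
      description: "Search the course knowledge base.",
      input_schema: {
        type: "object",
        properties: { query: { type: "string" } },
      },
      cache_control: { type: "ephemeral" }, // cache breakpoint
    },
  ],
  messages: [{ role: "user", content: "Explain eigenvalues." }],
};
```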
Praxis AI routes AI requests through five backend providers:
| Provider | How It Works |
| --- | --- |
| Amazon Bedrock | Models hosted on AWS infrastructure. Uses IAM credentials for authentication. |
| OpenAI API | Direct OpenAI API calls. Used for OpenAI models and OpenAI-compatible endpoints. |
| Anthropic Direct API | Direct Anthropic API calls. Bypasses Bedrock for Claude models when preferred. |
| Google GenAI | Direct Google Gemini API calls via the @google/genai SDK. |
| Mistral AI | Direct Mistral API calls via the @mistralai/mistralai SDK. |
Some model families (e.g., Anthropic Claude, Mistral) are available through multiple providers — both via Bedrock and via Direct API. The admin can choose which provider to use based on latency, cost, and regional availability preferences.
You can connect your own hosted LLM (for example, a model deployed on Google Vertex AI, private OpenAI-compatible endpoint, or a Bedrock-hosted custom model) and use it as a replacement for any of the supported usages.
To add a custom model for Conversation (or any other use):
In the Admin UI, edit your Digital Twin.
Under Personalization and AI Models, click Add AI Model.
In the Add AI Model panel, enter the properties required to connect to your LLM:
Model Name
The exact model identifier published by your hosting platform.
This value is case sensitive and must match your provider’s model name, for example:
gemini-flash or projects/my-proj/locations/us/models/my-model.
Status
Active models are considered by the system for routing and selection.
Inactive models are ignored but kept in configuration.
Description
Human-readable description of the LLM for admins and authors using this Digital Twin.
Model Use
The specific usage for this model (for example, Conversation, Image Generation, Document Summarization).
This determines which internal calls will use this model.
Client Library Type
Choose from:
Open AI for OpenAI-compatible endpoints (including many custom or Vertex AI gateways exposing an OpenAI-style API).
Bedrock for Amazon Bedrock-hosted models.
Most Gemini-based models connected through an OpenAI-compatible proxy should use Open AI.
API URL
The base public URL of your model endpoint, for example:
https://ai.my-school.edu or your Bedrock-compatible endpoint.
Typically, the model name or ID is appended to this base URL when interacting with the LLM.
API Key
The secret key used to authenticate requests to your endpoint.
Keep this key secure and confidential; rotate it periodically for security.
Click Save to register the new custom AI model.
Once saved:
The model appears in the list of custom AI models.
For its configured Model Use, it will replace the platform default model.
All conversations or tasks mapped to that Model Use will start using your custom model without any client-side code changes.
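Putting the fields together, a completed custom model entry might look like the following. All values are placeholders for your own deployment:

```typescript
// Example values only; substitute your own endpoint and credentials.
const customModel = {
  modelName: "gemini-flash",          // must match the provider's model ID exactly (case sensitive)
  status: "Active",                   // Active models participate in routing and selection
  description: "Campus-hosted Gemini proxy for conversation",
  modelUse: "Conversation",           // determines which internal calls use this model
  clientLibraryType: "Open AI",       // OpenAI-compatible endpoint
  apiUrl: "https://ai.my-school.edu", // base URL; the model name is appended to it
  apiKey: "<load-from-secure-storage>", // never hard-code real secrets
};
```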
Use a non-production Digital Twin first to validate latency, cost, and behavior of your custom model before assigning it to high-traffic or mission-critical usages.
1. Go to Configuration → Personalization and AI Models and enter API keys and endpoints for each provider you plan to use (OpenAI-compatible, Bedrock, or custom gateways).
2. Select Models per Usage: For each Model Use (Conversation, Image, Audio, etc.), select the preferred model from the list of available platform and custom models.
3. Enable and Test Your Digital Twin: Use the Test or preview mode to run conversations against your updated configuration. Validate:
   - Response quality
   - Latency
   - Tool and streaming support (for Conversation models)
4. Monitor and Optimize: Use Analytics to track token usage, latency, and error rates per model. Adjust your model selection or routing preferences to balance performance and cost.
5. Scale to Production: Once validated, deploy your Digital Twin to users through LMS integration (e.g., Canvas), Web SDK, or REST APIs; no additional code changes are required when switching models.
6. Connect New Digital Twins: Repeat the configuration setup for any additional twins so they can connect to the same custom LLM.
Need help choosing models or configuring BYOM?
Praxis AI supports multi-LLM orchestration and can route across OpenAI, Anthropic, Amazon, Google, Mistral, and your own hosted models in a single Digital Twin configuration.