BoxAgnts Introduction (7) — OpenAI API and Anthropic API

The 2025 AI model market is in full bloom, but each provider has its own API format, authentication method, and streaming protocol. BoxAgnts' design goal is that users switch models by changing a single parameter while all internal logic stays unchanged.

This article dissects the abstraction at four levels:

- Unified Interface: how the LlmProvider trait defines a "model provider"
- Three Major API Format Comparisons: format differences between Anthropic, OpenAI, and Google Gemini
- Format Conversion: how to translate between three completely different message formats
- Engineering Practices: thinking configuration, error handling, ProviderQuirks, API key management

Everything starts with the interface definition:

    // boxagnts-api/src/provider.rs
    #[async_trait]
    pub trait LlmProvider: Send + Sync {
        fn id(&self) -> &ProviderId; // Unique identifier
        fn name(&self) -> &str;      // Human-readable name

        // Non-streaming request
        async fn create_message(
            &self,
            request: ProviderRequest,
        ) -> Result<ProviderResponse, ProviderError>;

        // Streaming request
        async fn create_message_stream(
            &self,
            request: ProviderRequest,
        ) -> Result<
            Pin<Box<dyn Stream<Item = Result<StreamEvent, ProviderError>> + Send>>,
            ProviderError,
        >;

        async fn list_models(&self) -> Result<Vec<ModelInfo>, ProviderError>;        // Model list
        async fn check_connectivity(&self) -> Result<ProviderStatus, ProviderError>; // Health check
        fn capabilities(&self) -> ProviderCapabilities;                              // Capability declaration
    }

Both input and output use provider-agnostic unified types:

    pub struct ProviderRequest {
        pub model: String,
        pub messages: Vec<Message>,              // Unified conversation format
        pub system_prompt: Option<SystemPrompt>,
        pub tools: Vec<ToolDefinition>,          // Unified tool definitions
        pub max_tokens: u32,
        pub temperature: Option<f64>,
        pub thinking: Option<ThinkingConfig>,    // Deep thinking configuration
        pub provider_options: Value,             // Provider-specific parameters
    }

    pub struct ProviderResponse {
        pub id: String,
        pub content: Vec<ContentBlock>, // Unified content blocks
        pub stop_reason: StopReason,    // Unified stop reason
        pub usage: UsageInfo,           // Token usage
        pub model: String,
    }

The core value of the normalization layer: whether the underlying model is Claude, GPT, or Gemini, upper-layer code only sees ProviderRequest and ProviderResponse.

Providers are looked up through a registry:

    // boxagnts-api/src/registry.rs
    pub struct ProviderRegistry {
        providers: HashMap<ProviderId, Arc<dyn LlmProvider>>,
        default_provider_id: ProviderId,
    }

    fn provider_from_key(provider_id: &str, key: String) -> Option<Arc<dyn LlmProvider>> {
        match provider_id {
            // Native implementations: each with its own API format
            "anthropic" => Some(Arc::new(AnthropicProvider::from_config(...))),
            "openai" => Some(Arc::new(OpenAiProvider::new(key))),
            "google" => Some(Arc::new(GoogleProvider::new(key))),
            "github-copilot" => Some(Arc::new(CopilotProvider::new(key))),
            "cohere" => Some(Arc::new(CohereProvider::new(key))),

            // OpenAI-compatible providers: share the same conversion logic, only base_url changes
            "deepseek" | "groq" | "ollama" | "mistral" | "xai" | "perplexity"
            | "openrouter" | "siliconflow" | "moonshot" | "zhipu" | "stepfun"
            | "fireworks" | "llamacpp" | "sambanova" | "huggingface" | "nvidia"
            | "cerebras" /* ... 30+ OpenAI-compatible providers in total */ => {
                // Construction elided in the original listing
                ...
            }

            _ => None,
        }
    }

Three implementation strategies:

| Type | Representative | Conversion Strategy | Count |
|---|---|---|---|
| Native Anthropic | claude-sonnet-4-5 | Near-zero conversion (internal format = Anthropic format) | 1 |
| Native OpenAI | gpt-4o, o3 | ProviderRequest → Chat Completions | 1 |
| Native Google | gemini-2.5-flash | ProviderRequest → generateContent | 1 |
| OpenAI Compatible | deepseek, groq, ollama, etc. | Same logic as OpenAI, only the URL changes | 30+ |
| Other Native | github-copilot, cohere | Independent format conversion | 3+ |

Anthropic, OpenAI, and Google Gemini expose three APIs whose message formats differ widely. Understanding these differences is essential to understanding the value of the conversion layer.
System prompt:

| Feature | Anthropic | OpenAI | Google Gemini |
|---|---|---|---|
| Location | Top-level "system" field | messages[0], role:"system" | Top-level "systemInstruction" field |
| Type | string or ContentBlock array | string (or an array of text parts) | content parts array only |

    // Anthropic: top-level standalone field
    {"model": "claude-sonnet-4-5", "system": "You are helpful.", "messages": [...]}

    // OpenAI: embedded in the messages array
    {"model": "gpt-4o", "messages": [{"role":"system","content":"You are helpful."}, ...]}

    // Google: uses the systemInstruction field, whose structure differs from messages
    {
      "systemInstruction": {"parts": [{"text": "You are helpful."}]},
      "contents": [{"role": "user", "parts": [{"text": "Hello"}]}]
    }

Tool definitions:

| Feature | Anthropic | OpenAI | Google |
|---|---|---|---|
| Field | "tools": [{name, description, input_schema}] | "tools": [{type:"function", function:{...}}] | "tools": [{functionDeclarations: [{name, description, parameters}]}] |
| Wrapping Layers | 0 | 1 | 1, with different nesting names |

Tool calls in the model's response:

    // Anthropic: native block in the content array
    {"content": [{"type":"tool_use", "id":"toolu_01A", "name":"read", "input": {...}}]}

    // OpenAI: standalone tool_calls array, arguments is a JSON string
    {"tool_calls": [{"id":"call_abc", "function": {"name":"read", "arguments": "{\"path\":\"...\"}"}}]}

    // Google: functionCall embedded in parts, args is a JSON object
    {"candidates": [{"content": {"parts": [{"functionCall": {"name":"read", "args": {...}}}]}}]}

Tool results sent back to the model:

    // Anthropic: tool_result is a block in the user message's content array
    {"role":"user", "content": [{"type":"tool_result", "tool_use_id":"toolu_01A", "content":"..."}]}

    // OpenAI: requires a separate role: "tool" message
    {"role":"tool", "tool_call_id":"call_abc", "content":"..."}

    // Google: functionResponse embedded in the user content's parts
    {"role":"user", "parts": [{"functionResponse": {"name":"read", "response": {...}}}]}

Message roles:

| Anthropic | OpenAI | Google |
|---|---|---|
| user | user | user |
| assistant | assistant | model |

Google uses model instead of assistant. This is the easiest difference to overlook and one of the most error-prone.
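The role and system-prompt differences above can be captured in a small lookup. The following is a minimal sketch and not BoxAgnts' actual code: the WireFormat enum and the functions wire_role and system_prompt_location are hypothetical names introduced only for illustration.

```rust
/// Unified roles, as used by the provider-agnostic message type.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Role {
    User,
    Assistant,
}

/// Hypothetical tag for the three wire formats compared above.
#[derive(Clone, Copy, Debug, PartialEq)]
enum WireFormat {
    Anthropic,
    OpenAi,
    Google,
}

/// Maps a unified role to the role string each API expects.
/// Only Google deviates: it says "model" where the others say "assistant".
fn wire_role(format: WireFormat, role: Role) -> &'static str {
    match (format, role) {
        (_, Role::User) => "user",
        (WireFormat::Google, Role::Assistant) => "model",
        (_, Role::Assistant) => "assistant",
    }
}

/// Names the place in the request body where the system prompt goes.
fn system_prompt_location(format: WireFormat) -> &'static str {
    match format {
        WireFormat::Anthropic => "system",         // top-level field
        WireFormat::OpenAi => "messages[0]",       // first message, role "system"
        WireFormat::Google => "systemInstruction", // top-level field holding parts
    }
}
```

Keeping the mapping in one function means a forgotten "model" rename shows up as a single failing match arm rather than a rejected request at runtime.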
OpenAiProvider is the most complete example of the conversion layer:

    // boxagnts-api/src/providers/openai.rs
    impl OpenAiProvider {
        fn to_openai_messages(
            messages: &[Message],
            system_prompt: Option<&SystemPrompt>,
        ) -> Vec<Value> {
            let mut result: Vec<Value> = Vec::new();

            // Step 1: system prompt → role: "system" message
            if let Some(sys) = system_prompt {
                // sys_text: `sys` flattened to plain text (derivation elided in the original)
                result.push(json!({"role": "system", "content": sys_text}));
            }

            for msg in messages {
                match msg.role {
                    Role::User => {
                        // User messages may mix text and tool_result blocks;
                        // each tool_result must be split into a separate role: "tool" message
                        Self::append_user_messages(&mut result, &msg.content);
                    }
                    Role::Assistant => {
                        let (text, tool_calls) = Self::assistant_content_to_openai(&msg.content);
                        result.push(json!({
                            "role": "assistant",
                            "content": text,
                            "tool_calls": tool_calls
                        }));
                    }
                }
            }
            result
        }

        fn to_openai_tools(tools: &[ToolDefinition]) -> Vec<Value> {
            tools.iter().map(|td| {
                json!({
                    "type": "function",
                    "function": {
                        "name": td.name,
                        "description": td.description,
                        "parameters": td.input_schema
                    }
                })
            }).collect()
        }
    }

The most complex part is tool_use_id sanitization: Anthropic's tool IDs (e.g., toolu_01Bx...) may contain characters that OpenAI does not accept.

GoogleProvider shows how to handle an API format that differs from both Anthropic's and OpenAI's:

    // boxagnts-api/src/providers/google.rs
    // URL pattern completely different from OpenAI's /v1/chat/completions
    fn generate_url(&self, model: &str) -> String {
        format!(
            "{}/v1beta/models/{}:generateContent?key={}",
            self.base_url, model, self.api_key // API key goes in the URL query parameters!
        )
    }

Key differences from OpenAI:

| Difference | Google Gemini | OpenAI |
|---|---|---|
| API Key Location | URL query parameter ?key= | HTTP header Authorization: Bearer |
| Endpoint Format | /v1beta/models/{model}:generateContent | /v1/chat/completions |
| Streaming Endpoint | /v1beta/models/{model}:streamGenerateContent?alt=sse | /v1/chat/completions + stream:true |
| Message Roles | user / model (not assistant) | user / assistant |
| Tool Results | functionResponse in parts | Separate role: tool message |
| Image Input | inlineData base64 | image_url or content parts |

ThinkingConfig is the normalized deep thinking configuration, but each provider handles it in its own way:

    // Normalized configuration
    pub struct ThinkingConfig {
        pub budget_tokens: u32, // Thinking token budget
    }

    // When building ProviderRequest, decide whether to pass it based on provider capabilities
    let provider_request = ProviderRequest {
        // ...
        thinking: if caps.thinking {
            effective_thinking_budget
                .map(|b| ThinkingConfig::enabled(b))
        } else {
            None // This provider doesn't support thinking, so don't pass it
        },
    };

| Provider | Thinking Support | How It's Passed |
|---|---|---|
| Anthropic (Claude 3.7+) | ✓ | "thinking": {"type": "enabled", "budget_tokens": N} |
| Google (Gemini 2.5+) | ✓ | "thinkingConfig": {"thinkingBudget": N} |
| OpenAI (o1/o3 series) | Partial | Via the reasoning_effort parameter |
| Other OpenAI Compatible | Mostly unsupported | Not passed |

At request construction time, ProviderCapabilities declares what each provider supports:

    pub struct ProviderCapabilities {
        pub thinking: bool,           // Whether deep thinking is supported
        pub prompt_caching: bool,     // Whether prompt caching is supported
        pub image_input: bool,        // Whether image input is supported
        pub native_tool_use: bool,    // Whether native tool calling exists
        pub supports_streaming: bool, // Whether streaming responses are supported
        // ...
    }

OpenAI-compatible providers expose roughly compatible APIs, but each has subtle differences.
ProviderQuirks handles the subtle differences between OpenAI-compatible providers:

    pub struct ProviderQuirks {
        /// Specific error message patterns for context overflow
        pub overflow_patterns: Vec<String>,
        /// Local services that don't require API keys (e.g., Ollama, LM Studio)
        pub no_api_key_required: bool,
        /// Whether streaming responses include usage info
        pub include_usage_in_stream: bool,
        /// Providers like DeepSeek need the reasoning_content field
        pub reasoning_field: Option<String>,
    }

For example, DeepSeek's streaming response returns reasoning content under a field name that OpenAI does not use; this is adapted via reasoning_field. Ollama's context overflow error message is "exceeds the available context size", while LM Studio's is "greater than the context length"; both are matched via overflow_patterns.

Streaming responses also differ across the three APIs:

| Feature | Anthropic (SSE) | OpenAI (SSE) | Google (SSE) |
|---|---|---|---|
| Event Granularity | High: 6 event types (start/delta/stop × 2) | Low: each chunk is a complete delta | Medium: pushed by chunk, but the structure is flat |
| Tool Call Increment | Fragmented: input_json_delta events | Fragmented: the arguments string arrives in pieces across chunks | Single send of a complete functionCall |
| Termination Signal | message_stop event | data: [DONE] marker | Stream ends naturally |
| Need to Reassemble by Index | Yes (reassemble by index for multiple tool_use blocks) | Yes | Yes |

All three formats are normalized to the same StreamEvent enum:

    // Field types omitted for brevity
    pub enum StreamEvent {
        MessageStart { id, model, usage },
        ContentBlockStart { index, content_block },
        TextDelta { text },
        ThinkingDelta { thinking },
        InputJsonDelta { index, partial_json },
        ContentBlockStop { index },
        MessageDelta { stop_reason, usage },
        MessageStop,
    }

Each provider's error format is also different, so errors are normalized as well:

    // Unified error types
    pub enum ProviderError {
        Auth { ... },            // Authentication failure
        RateLimited { ... },     // Rate limiting
        ContextOverflow { ... }, // Context exceeds window (matched via ProviderQuirks)
        InvalidRequest { ... },  // Invalid request parameters
        ServerError { ... },     // Server error
        StreamError { ... },     // Stream interruption
        Other { ... },           // Unknown error
    }

In the query loop, specific errors trigger specific recovery strategies:

- RateLimited / Overloaded → switch to fallback_model
- ContextOverflow → trigger auto_compact
- StreamError (stall) → retry (max 2 times, 45s timeout)
- Auth → unrecoverable, return the error

BoxAgnts defines environment variable name mappings for each provider:

    // boxagnts-workspace/src/config.rs
    pub fn api_key_env_vars_for_provider(provider_id: &str) -> &'static [&'static str] {
        match provider_id {
            "anthropic" => &["ANTHROPIC_API_KEY"],
            "openai" => &["OPENAI_API_KEY"],
            "google" => &["GOOGLE_API_KEY", "GOOGLE_GENERATIVE_AI_API_KEY"],
            "deepseek" => &["DEEPSEEK_API_KEY"],
            "mistral" => &["MISTRAL_API_KEY"],
            "xai" => &["XAI_API_KEY"],
            "zhipu" => &["ZHIPU_API_KEY"],
            // ... 40+ provider environment variables
        }
    }

Keys are resolved with a three-tier priority: Environment Variables > User Config JSON > No Default. This design supports scenarios such as multi-tenancy, CI/CD, and local development.
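The priority order above can be sketched as a lookup that tries the provider's environment variables first and only then falls back to user configuration. This is an illustration under stated assumptions, not the project's implementation: resolve_api_key_with, resolve_api_key, the trimmed api_key_env_vars table, and the representation of the user config as a HashMap keyed by provider ID are all hypothetical.

```rust
use std::collections::HashMap;

/// Trimmed, hypothetical stand-in for api_key_env_vars_for_provider.
fn api_key_env_vars(provider_id: &str) -> &'static [&'static str] {
    match provider_id {
        "anthropic" => &["ANTHROPIC_API_KEY"],
        "openai" => &["OPENAI_API_KEY"],
        "google" => &["GOOGLE_API_KEY", "GOOGLE_GENERATIVE_AI_API_KEY"],
        _ => &[],
    }
}

/// Resolves a key with the priority: environment variables > user config > none.
/// The environment lookup is injected so the logic can be exercised
/// without mutating the real process environment.
fn resolve_api_key_with<F>(
    provider_id: &str,
    lookup_env: F,
    user_config: &HashMap<String, String>, // assumed shape: provider ID -> key
) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    api_key_env_vars(provider_id)
        .iter()
        .copied()
        // The first non-empty environment variable wins.
        .find_map(|name| lookup_env(name).filter(|value| !value.is_empty()))
        // Otherwise fall back to the user config; there is no built-in default.
        .or_else(|| user_config.get(provider_id).cloned())
}

/// Convenience wrapper that reads the real process environment.
fn resolve_api_key(provider_id: &str, user_config: &HashMap<String, String>) -> Option<String> {
    resolve_api_key_with(provider_id, |name| std::env::var(name).ok(), user_config)
}
```

Returning Option<String> rather than a default keeps the "No Default" tier explicit: a caller that gets None can either report a missing key or, for a provider that needs none, proceed without one.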
BoxAgnts' model abstraction layer solves the essential problem of "one set of code adapting to all APIs":

    ┌───────────────────────────────────────────────────────┐
    │ boxagnts-query (Agent reasoning loop)                 │
    │ Only uses ProviderRequest / ProviderResponse          │
    └───────────────────────────┬───────────────────────────┘
                                │
    ┌───────────────────────────▼───────────────────────────┐
    │ LlmProvider trait                                     │
    │ + ProviderRegistry (40+ providers)                    │
    ├────────────┬────────────┬─────────────┬───────────────┤
    │ Anthropic  │ OpenAI     │ Google      │ OpenAiCompat  │
    │ Provider   │ Provider   │ Provider    │ (30+ vendors) │
    │ Near-zero  │ Full format│ Independent │ Shares OpenAI │
    │ conversion │ conversion │ format      │ conversion    │
    │            │            │ conversion  │ + Quirks      │
    └────────────┴────────────┴─────────────┴───────────────┘

Three key capabilities:

- User freedom: switch models by changing only the --model parameter
- Code unaffected: run_query_loop() has no idea what's underneath
- Extremely low extension cost: adding a new OpenAI-compatible provider takes about 3 lines of code

This is not a simple "adapter pattern"; it is a production-grade abstraction validated against 40+ real-world APIs.

- Boxagnts: https://github.com/guyoung/boxagnts
- Anthropic API: https://docs.anthropic.com/en/api/messages
- OpenAI API: https://platform.openai.com/docs/api-reference/chat
- Google Gemini API: https://ai.google.dev/gemini-api/docs
