Agentic Workflows with Typed Tools (Trial)
Overview
Agentic workflows with typed tools combine LLM tool use with explicit contracts for tool names, parameters, return shapes, and final outputs. The practical shift is from "ask the model to produce JSON" toward schema-governed interfaces: OpenAI Structured Outputs ensure model responses adhere to supplied JSON Schema rather than merely producing valid JSON, while Anthropic supports strict tool definitions so Claude tool calls match a declared schema exactly (OpenAI Structured Outputs, Anthropic Tool Use). Google Gemini function calling uses function declarations with names, descriptions, required parameters, typed properties, enums, and modes such as VALIDATED, AUTO, ANY, and NONE, reflecting the same move toward constrained model-to-tool interfaces (Google Gemini function calling).
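To make the idea of a schema-governed tool boundary concrete, here is a minimal sketch built around a hypothetical create_ticket tool. The schema follows strict-mode conventions: every property is listed as required, additionalProperties is false, and enums constrain values. validate_args is a hypothetical stdlib-only checker for that flat subset of JSON Schema. It is not any provider's API. It stands in for the application-side validation that should sit behind provider-side enforcement.

```python
import json

# Hypothetical tool contract in JSON Schema form. It mirrors strict-mode
# conventions: every property listed in "required" and no extra properties.
CREATE_TICKET_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "priority": {"type": "string", "enum": ["low", "medium", "high"]},
        "assignee_id": {"type": "integer"},
    },
    "required": ["title", "priority", "assignee_id"],
    "additionalProperties": False,
}

# Map the JSON Schema primitive types used above to Python types.
_JSON_TYPES = {"string": str, "integer": int, "boolean": bool, "number": (int, float)}


def validate_args(schema: dict, raw: str) -> list[str]:
    """Return contract violations for a JSON-encoded tool call (empty list if valid).

    Handles only the flat subset of JSON Schema used above; a real system would
    use a full JSON Schema validator or Pydantic instead of this sketch.
    """
    try:
        args = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(args, dict):
        return ["arguments must be a JSON object"]
    errors = []
    props = schema["properties"]
    for name in schema["required"]:
        if name not in args:
            errors.append(f"missing required field: {name}")
    if schema.get("additionalProperties") is False:
        for name in args:
            if name not in props:
                errors.append(f"unexpected field: {name}")
    for name, value in args.items():
        spec = props.get(name)
        if spec is None:
            continue  # already reported as unexpected (or allowed if extras are permitted)
        # bool is a subclass of int in Python, so reject it unless a boolean is expected.
        if isinstance(value, bool) and spec["type"] != "boolean":
            errors.append(f"{name}: expected {spec['type']}")
        elif not isinstance(value, _JSON_TYPES[spec["type"]]):
            errors.append(f"{name}: expected {spec['type']}")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"{name}: {value!r} not in {spec['enum']}")
    return errors
```

Take a call such as `{"title": "Disk full", "priority": "urgent", "assignee_id": 7, "cc": "ops"}`. The checker reports two violations: the unexpected cc field and the out-of-enum priority. A prompt-only JSON convention would accept both silently.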
The value is strongest when typed tools are treated as software contracts rather than prompting tricks. Pydantic AI exposes this pattern directly: structured outputs use Pydantic-generated JSON Schema for model-facing tools and Pydantic validation for returned data, with output validators and retry budgets for cases where schema conformance still needs repair (Pydantic AI output docs). This makes typed tools a good fit for workflows where agents query systems, update records, create documents, execute code, or coordinate multiple steps, because the contract boundary can be tested, versioned, observed, and reviewed.
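The validate-then-retry loop that Pydantic AI automates can be approximated without the library. The sketch below is a stdlib-only approximation, not Pydantic AI's API. TicketSummary, parse_summary, and run_with_retries are hypothetical names, and call_model stands in for whatever client actually invokes the model. Each validation error is fed back to the model as a repair prompt until the retry budget runs out.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TicketSummary:
    """Hypothetical structured output type for an agent run."""
    ticket_id: int
    status: str


class OutputValidationError(Exception):
    """Raised when model output does not satisfy the TicketSummary contract."""


def parse_summary(raw: str) -> TicketSummary:
    """Validate raw model text against the TicketSummary contract."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise OutputValidationError(f"invalid JSON: {exc.msg}") from exc
    if not isinstance(data, dict) or set(data) != {"ticket_id", "status"}:
        raise OutputValidationError("expected exactly the keys ticket_id and status")
    if not isinstance(data["ticket_id"], int) or isinstance(data["ticket_id"], bool):
        raise OutputValidationError("ticket_id must be an integer")
    if data["status"] not in {"open", "closed"}:
        raise OutputValidationError("status must be 'open' or 'closed'")
    return TicketSummary(data["ticket_id"], data["status"])


def run_with_retries(call_model: Callable[[str], str], prompt: str,
                     max_retries: int = 2) -> TicketSummary:
    """Call the model, validate its output, and feed errors back until the budget is spent."""
    message = prompt
    for attempt in range(max_retries + 1):
        raw = call_model(message)
        try:
            return parse_summary(raw)
        except OutputValidationError as exc:
            if attempt == max_retries:
                raise  # budget exhausted: surface the last validation error
            # Repair prompt: tell the model what was wrong and ask again.
            message = f"{prompt}\nYour previous answer was invalid: {exc}. Reply with JSON only."
    raise AssertionError("unreachable")
```

The budget limits cost and latency. Once it is exhausted, the error reaches application code instead of an unvalidated value flowing downstream.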
Typed tools are necessary but not sufficient for production agentic workflows. Durable orchestration still has to manage pause/resume, retries, state, and side effects; LangGraph's durable execution guidance, for example, relies on checkpointers, thread identifiers, idempotent operations, and wrapping side-effecting or non-deterministic operations in tasks so resumed workflows do not repeat writes or API calls (LangGraph durable execution). Human-in-the-loop middleware adds another production control by pausing matching tool calls for approval, edit, reject, or respond decisions before execution, with persistent graph state used to resume later (LangChain human-in-the-loop).
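Durable execution rests on a simple idea: record each side-effecting step's result under a thread identifier, and on resume replay the stored result instead of re-executing the step. The sketch below shows that idea in plain Python. It is not LangGraph's API. Checkpointer, run_task, and idempotency_key are hypothetical names, and the in-memory dict stands in for a persistent checkpointer.

```python
import hashlib
import json
from typing import Any, Callable


class Checkpointer:
    """Hypothetical in-memory checkpoint store keyed by (thread_id, step_name).

    A production system would persist this (for example in a database).
    """

    def __init__(self) -> None:
        self._results: dict[tuple[str, str], Any] = {}

    def get(self, thread_id: str, step: str) -> tuple[bool, Any]:
        key = (thread_id, step)
        return key in self._results, self._results.get(key)

    def put(self, thread_id: str, step: str, result: Any) -> None:
        self._results[(thread_id, step)] = result


def idempotency_key(thread_id: str, step: str, args: dict) -> str:
    """Derive a stable key so a downstream API can also deduplicate retried writes."""
    payload = json.dumps({"thread": thread_id, "step": step, "args": args}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_task(cp: Checkpointer, thread_id: str, step: str, args: dict,
             effect: Callable[[dict, str], Any]) -> Any:
    """Run a side-effecting step once per thread; replay the stored result on resume."""
    done, result = cp.get(thread_id, step)
    if done:
        return result  # resumed run: do not repeat the write or API call
    result = effect(args, idempotency_key(thread_id, step, args))
    cp.put(thread_id, step, result)
    return result
```

The idempotency key covers the gap between a write succeeding and its checkpoint being saved. If the process dies in that window, the step re-runs on resume, and only a downstream service that deduplicates on the key prevents a second write.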
Adoption Signals
- OpenAI, Anthropic, Google, AWS, and common agent frameworks now expose typed tool/function interfaces as first-class primitives, including OpenAI Structured Outputs and function calling, Anthropic strict tool use, Gemini function declarations, and Amazon Bedrock action groups backed by function details or OpenAPI schemas (OpenAI Structured Outputs, Anthropic Tool Use, Google Gemini function calling, Amazon Bedrock action groups).
- Python agent stacks are converging on type-driven contracts: Pydantic AI supports tool output, native structured output, prompted output, Pydantic validation, output validators, and per-output retry controls (Pydantic AI output docs).
- Orchestration layers are maturing around production concerns that typed tools expose but do not solve, including durable execution, resumability, human approval, and idempotent side-effect handling (LangGraph durable execution, LangChain human-in-the-loop).
- Enterprise agent platforms increasingly model actions as explicit schemas or action groups. Amazon Bedrock Agents can define action groups through function details or OpenAPI schemas, invoke Lambda executors, return control to the application, and require user confirmation before invocation to reduce malicious prompt-injection impact (Amazon Bedrock action groups).
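Return of control and user confirmation follow the same pattern. Calls to designated tools are not executed directly. They are handed back to the application, which runs them only after an approve or edit decision. Below is a minimal sketch of that gate. ToolCall, Decision, and ApprovalGate are hypothetical types, and the sketch does not reproduce the Bedrock or LangChain APIs.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class ToolCall:
    name: str
    args: dict


@dataclass
class Decision:
    """Hypothetical human decision: 'approve', 'edit' (with replacement args), or 'reject'."""
    action: str
    args: Optional[dict] = None
    reason: str = ""


@dataclass
class ApprovalGate:
    tools: dict[str, Callable[..., Any]]
    requires_confirmation: set[str] = field(default_factory=set)

    def submit(self, call: ToolCall) -> dict:
        """Execute low-risk calls immediately; return control for gated ones."""
        if call.name not in self.tools:
            return {"status": "error", "error": f"unknown tool {call.name}"}
        if call.name in self.requires_confirmation:
            return {"status": "pending_approval", "call": call}
        return {"status": "done", "result": self.tools[call.name](**call.args)}

    def resolve(self, call: ToolCall, decision: Decision) -> dict:
        """Apply a human decision to a pending call."""
        if decision.action == "reject":
            return {"status": "rejected", "reason": decision.reason}
        if decision.action == "edit":
            if decision.args is None:
                raise ValueError("an edit decision must supply replacement args")
            args = decision.args
        elif decision.action == "approve":
            args = call.args
        else:
            raise ValueError(f"unknown decision: {decision.action}")
        return {"status": "done", "result": self.tools[call.name](**args)}
```

A pending call can be stored with the workflow state and resolved hours later. That is why gates like this are usually paired with durable orchestration.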
Risks
- Prompt injection through tool metadata remains a live attack surface. Microsoft describes tool poisoning as malicious instructions embedded in MCP tool descriptions, which models use to decide which tools to invoke; this can steer unintended calls, exfiltrate data, or exploit tools that changed after approval (Microsoft MCP injection guidance).
- Schema conformance is not universal. OpenAI Structured Outputs support only a subset of JSON Schema, require all fields or function parameters to be listed as required, require additionalProperties: false for objects, and can still produce mistakes or explicit refusals that do not follow the supplied schema (OpenAI Structured Outputs).
- Cross-model behavior differs. Pydantic AI distinguishes tool output, native structured output, and prompted output, and notes that prompted output is often the least reliable because the model is not forced to match the schema. It also notes provider-specific limitations, such as Gemini being unable to use tools at the same time as native structured output (Pydantic AI output docs).
- Side effects require workflow design. A typed call that writes files, sends messages, updates databases, or invokes external APIs still needs idempotency keys, approval gates, audit logs, compensation paths, and durable state so retries or resumes do not duplicate destructive actions (LangGraph durable execution, LangChain human-in-the-loop).
- Tool catalogs can become unsafe or noisy. Google recommends providing only relevant tools and keeping the active tool set to a maximum of roughly 10-20. It also recommends clear function and parameter descriptions, user validation of significant actions before execution, and robust error handling and security controls (Google Gemini function calling).
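Two of these risks can be checked mechanically before tools reach a model: tool metadata that changes after approval, and oversized catalogs. The sketch below assumes tool definitions arrive as plain dicts and that reviewers record a SHA-256 fingerprint of each approved definition. fingerprint and select_tools are hypothetical helpers, and the default cap of 20 simply echoes the upper end of Google's guidance. This catches changed metadata. It does not detect a malicious description that was approved in the first place.

```python
import hashlib
import json


def fingerprint(tool: dict) -> str:
    """Stable SHA-256 hash of a tool definition (name, description, parameter schema)."""
    canonical = json.dumps(tool, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def select_tools(offered: list[dict], approved: dict[str, str],
                 max_tools: int = 20) -> tuple[list[dict], list[str]]:
    """Return (tools safe to expose, names rejected).

    A tool is exposed only if its name was reviewed and its current definition
    still matches the fingerprint recorded at review time, so a description that
    changed after approval is dropped rather than shown to the model.
    """
    exposed, rejected = [], []
    for tool in offered:
        expected = approved.get(tool["name"])
        if expected is None or fingerprint(tool) != expected:
            rejected.append(tool["name"])
        else:
            exposed.append(tool)
    if len(exposed) > max_tools:
        raise ValueError(f"{len(exposed)} tools exceed the cap of {max_tools}; split the task")
    return exposed, rejected
```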
Pros & Cons
Advantages
- Schema-validated tool calls reduce malformed arguments, missing fields, and invalid enum values compared with prompt-only JSON conventions.
- Typed contracts make agent workflows easier to test, log, audit, replay, and hand over between teams.
- Typed tools work well with orchestration frameworks and enterprise agent platforms when combined with durable execution, retries, idempotency, and human approval gates.
Disadvantages
- Tool names, descriptions, schemas, and external tool results remain prompt injection and tool-poisoning attack surfaces.
- Strict schemas can fail on unsupported JSON Schema features, over-constrain model behavior, or require defensive retry and validation logic.
- Typed calls do not by themselves solve workflow durability, side-effect duplication, authorization, observability, or framework portability.
Recommendation
Trial typed agentic workflows for bounded, high-value use cases where tool contracts can be reviewed and tested: data-quality checks, internal workflow automation, IT service actions, codebase operations, document generation, analytics pipelines, and customer-support assistive workflows. Use JSON Schema, OpenAPI, Pydantic models, or provider-native function declarations for every tool boundary; validate both inputs and outputs in application code; and log tool selection, arguments, results, retries, approvals, and errors with correlation IDs.
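Here is a minimal sketch of structured tool-call logging, assuming Python's standard logging module with one JSON object per event. log_event and logged_tool_call are hypothetical helpers. The caller passes the same correlation_id to every tool call in one agent run, along with the attempt number on retries, so selection, arguments, results, and errors can be joined later. Approval decisions could be logged through log_event in the same way. In practice, arguments and results may need redaction before they are logged.

```python
import json
import logging
import time
import uuid
from typing import Any, Callable, Optional

logger = logging.getLogger("agent.tools")


def log_event(event: str, **fields: Any) -> None:
    """Emit one JSON object per event so logs can be filtered by correlation_id."""
    logger.info(json.dumps({"event": event, "ts": time.time(), **fields}, default=str))


def _elapsed_ms(start: float) -> float:
    return round((time.monotonic() - start) * 1000, 1)


def logged_tool_call(tool_name: str, args: dict, fn: Callable[..., Any],
                     correlation_id: Optional[str] = None, attempt: int = 1) -> Any:
    """Run a tool and log selection, arguments, outcome, and timing under one correlation ID."""
    # Fall back to a fresh ID only when the caller has no run-level ID to pass in.
    cid = correlation_id or str(uuid.uuid4())
    log_event("tool_selected", correlation_id=cid, tool=tool_name, args=args, attempt=attempt)
    start = time.monotonic()
    try:
        result = fn(**args)
    except Exception as exc:
        log_event("tool_error", correlation_id=cid, tool=tool_name, attempt=attempt,
                  error=repr(exc), duration_ms=_elapsed_ms(start))
        raise  # logging must not swallow the failure
    log_event("tool_result", correlation_id=cid, tool=tool_name, attempt=attempt,
              result=result, duration_ms=_elapsed_ms(start))
    return result
```

To see these events on the console during a trial, configure logging with `logging.basicConfig(level=logging.INFO)`. In production the same JSON lines can be shipped to any log pipeline.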
Do not treat typed tools as a security boundary by themselves. Pair them with least-privilege credentials, allowlisted tools, prompt-injection defenses, tool-metadata review, schema versioning, human approval for irreversible actions, sandboxed execution for code or shell tools, and durable orchestration for long-running or multi-step workflows. Prefer small, task-specific tool sets over large global catalogs, and move from trial to adopt only after teams have repeatable evaluations, failure-mode tests, observability, and migration plans across providers and orchestration frameworks.
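Allowlisting, small task-specific tool sets, and schema versioning can share one mechanism. Register each tool under an explicit name@version, give each task only the entries it needs, and refuse anything else at dispatch time. The sketch below uses hypothetical ToolSpec, ToolRegistry, and dispatch names. Credentials, sandboxing, and prompt-injection defenses still need separate controls.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ToolSpec:
    name: str
    version: str  # hypothetical version label for the tool's schema
    schema: dict
    handler: Callable[..., Any]


class ToolRegistry:
    """Hypothetical registry in which every tool is registered under name@version."""

    def __init__(self) -> None:
        self._tools: dict[str, ToolSpec] = {}

    def register(self, spec: ToolSpec) -> None:
        key = f"{spec.name}@{spec.version}"
        if key in self._tools:
            raise ValueError(f"{key} already registered; publish a new version instead")
        self._tools[key] = spec

    def for_task(self, allowlist: list[str]) -> dict[str, ToolSpec]:
        """Return only the allowlisted name@version entries, keyed by tool name."""
        missing = [key for key in allowlist if key not in self._tools]
        if missing:
            raise KeyError(f"allowlist references unregistered tools: {missing}")
        selected: dict[str, ToolSpec] = {}
        for key in allowlist:
            spec = self._tools[key]
            if spec.name in selected:
                raise ValueError(f"task allowlists two versions of {spec.name}")
            selected[spec.name] = spec
        return selected


def dispatch(task_tools: dict[str, ToolSpec], name: str, args: dict) -> Any:
    """Refuse any call outside the task's tool set instead of falling back to a global catalog."""
    spec = task_tools.get(name)
    if spec is None:
        raise PermissionError(f"tool {name!r} is not allowlisted for this task")
    return spec.handler(**args)
```

Pinning an exact version in each task's allowlist makes schema changes explicit. A new version must be added to the allowlist deliberately, which gives reviewers a natural checkpoint for migrations across providers or frameworks.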
Sources
- OpenAI: Structured model outputs
- Anthropic: Tool use with Claude
- Google: Function calling with the Gemini API
- Pydantic AI: Output
- LangGraph: Durable execution
- LangChain: Human-in-the-loop
- Amazon Bedrock: Add an action group to your agent
- Microsoft: Protecting against indirect prompt injection attacks in MCP