Tool Use & Function Calling

TL;DR

Tool use (Anthropic) and function calling (OpenAI) are two names for the same pattern: the LLM emits a structured request to invoke a named function with JSON arguments, the application runs it, and the result is fed back into the next turn.
OpenAI introduced function calling in June 2023 with the gpt-3.5/4-0613 models. Anthropic added tool use in 2024. Google, Mistral, Cohere, and most open-weight model families followed.
Tools are defined by a name, a human-readable description, and a JSON Schema describing the expected arguments. The model's training has taught it to recognise when a tool is relevant and emit a syntactically valid call.
Tool use is the foundation under every modern agent framework. Understanding the raw mechanics — without a framework — is the single most useful thing to learn when starting LLM application work.

The Loop#

The tool-use loop is short. The application sends a request with a list of tool definitions and the conversation so far. The model returns a response that may contain a tool-call block with a name and JSON arguments. The application parses the call, runs the function (locally, via an API, anywhere), and sends a follow-up request containing the original conversation, the tool call, and the tool result. The model uses the result to compose a reply or call another tool.

That is the entire pattern. Agent frameworks add scheduling, observability, parallelism, memory, and multi-agent coordination, but at the bottom they are all running this loop.

Defining a Tool#

Tools are defined by three things: a name (snake_case, model-readable), a description (English, model-readable — this is what the model reads to decide whether to call the tool), and a JSON Schema for the arguments. The description is more important than people expect: a vague or generic description leads to over- or under-calling. A precise description with examples of when to use the tool and when not to is the single biggest lever for tool-use reliability.

json

{
  "name": "lookup_order",
  "description": "Look up the status of a customer order by its order ID. Use this only when the user provides an order ID; do not guess one.",
  "input_schema": {
    "type": "object",
    "properties": {
      "order_id": {
        "type": "string",
        "pattern": "^ORD-[0-9]{8}$",
        "description": "Order ID in the format ORD-12345678"
      }
    },
    "required": ["order_id"]
  }
}

Parallel Tool Calls#

Modern frontier models can emit multiple independent tool calls in a single response. Anthropic and OpenAI both support this and the SDK exposes them as a list of tool-use blocks in the response. The application is expected to execute them in parallel and return all results in the next request.

Parallel tool calls dramatically lower latency for fan-out workloads — checking three calendars, querying four databases, fetching five URLs. Make sure your tool implementations are actually safe to run concurrently; the model assumes they are.

Structured Outputs vs Tool Calls#

There is a related but distinct feature called structured outputs (OpenAI) or JSON mode / tool-result coercion (Anthropic). It uses the same JSON-schema mechanism but for the final response rather than to invoke a tool. The difference is intent: tool calls are for invoking external code; structured outputs are for getting typed data back.

When you want the model to produce a typed object for downstream code (extracting entities, classifying, filling a form), use structured outputs. When you want the model to take an action, use tool calls. The distinction matters because mis-modelled use of one for the other tends to produce worse results.

Reserve the model's tool budget for things only the model can decide. Don't expose pure data lookups as tools when you can pre-fetch them into the prompt — the model is better at reasoning over retrieved context than at deciding which lookup to perform.

Schema Quality Matters#

Tool-use reliability tracks schema quality more than model quality. A schema with enum values, regex patterns, required-field discipline, and per-property descriptions reliably produces valid arguments. A loose schema with `additionalProperties: true` and few constraints produces a fraction of valid calls — and the model is more likely to hallucinate fields.

Spend the time. Write descriptions for every field. Use enums whenever a property has a fixed value set. Mark required fields explicitly. The hours you invest pay back in lower retry rates, fewer customer-visible errors, and lower model spend.

Common Pitfalls#

Too many tools — beyond around 20-30 tools, model accuracy degrades. Split into agents or use a router pattern.
Vague descriptions — "search for things" is not enough. Be specific about what, when, and what not to use the tool for.
Side-effecting tools without confirmation — destructive operations (send email, delete record) should require an explicit confirmation step or human-in-the-loop approval.
Returning unstructured tool results — wrap results in a clear, parsable format so the model does not have to guess where the answer lives.
No timeout or retry — tools fail. Plan for it; surface the failure to the model so it can decide whether to retry or change tack.

References

The Loop#

That is the entire pattern. Agent frameworks add scheduling, observability, parallelism, memory, and multi-agent coordination, but at the bottom they are all running this loop.

Defining a Tool#

json

{
  "name": "lookup_order",
  "description": "Look up the status of a customer order by its order ID. Use this only when the user provides an order ID; do not guess one.",
  "input_schema": {
    "type": "object",
    "properties": {
      "order_id": {
        "type": "string",
        "pattern": "^ORD-[0-9]{8}$",
        "description": "Order ID in the format ORD-12345678"
      }
    },
    "required": ["order_id"]
  }
}

Parallel Tool Calls#

Structured Outputs vs Tool Calls#

Schema Quality Matters#

Common Pitfalls#

Too many tools — beyond around 20-30 tools, model accuracy degrades. Split into agents or use a router pattern.

Vague descriptions — "search for things" is not enough. Be specific about what, when, and what not to use the tool for.

Side-effecting tools without confirmation — destructive operations (send email, delete record) should require an explicit confirmation step or human-in-the-loop approval.

Returning unstructured tool results — wrap results in a clear, parsable format so the model does not have to guess where the answer lives.

No timeout or retry — tools fail. Plan for it; surface the failure to the model so it can decide whether to retry or change tack.

Tool Use & Function Calling

The Loop#

Defining a Tool#

Parallel Tool Calls#

Structured Outputs vs Tool Calls#

Schema Quality Matters#

Common Pitfalls#

References

Browse all entries

Deploy on Yobitel

Tool Use & Function Calling

The Loop#

Defining a Tool#

Parallel Tool Calls#

Structured Outputs vs Tool Calls#

Schema Quality Matters#

Common Pitfalls#

References

Browse all entries

Deploy on Yobitel