The Tool-Use Revolution: How Function Calling Transformed LLMs Into Agents

The single most important capability that turned language models into agents wasn't better reasoning — it was tool use. Here's the technical story of how function calling changed everything.
The Missing Piece
In 2023, GPT-4 could reason about code, explain algorithms, and even write functional programs. But it couldn't do anything. It couldn't run a command, read a file, or check if its code actually worked. The model was brilliant but trapped inside a text box.
Tool use — the ability for a model to call external functions — broke it free.
What Is Tool Use?
Tool use (also called function calling) allows a language model to:
- Recognize that it needs an external capability to fulfill a request
- Select the appropriate tool from an available set
- Format the correct parameters for that tool
- Interpret the tool's output and continue reasoning
// Model receives tool definitions:
tools: [
  { name: "read_file", params: { path: "string" } },
  { name: "run_command", params: { command: "string" } },
  { name: "edit_file", params: { path: "string", content: "string" } },
  { name: "web_search", params: { query: "string" } }
]
// Model decides to use a tool:
→ tool_call: read_file({ path: "src/auth.ts" })
← result: "import { verify } from 'jsonwebtoken'..."
→ model: "I see the auth module uses JWT. Let me check the middleware..."
→ tool_call: read_file({ path: "src/middleware/auth.ts" })
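The trace above can be sketched as a loop: ask the model for its next turn, execute any tool call it emits, feed the result back, and stop when it answers. This is a minimal sketch with a stubbed model (`nextTurn`) and stub tools; a real agent would call an LLM API in place of `nextTurn`.

```typescript
type ToolCall = { name: string; args: Record<string, string> };
type Turn =
  | { type: "tool_call"; call: ToolCall }
  | { type: "answer"; text: string };

// Stub tool implementation standing in for real file access.
const tools: Record<string, (args: Record<string, string>) => string> = {
  read_file: (args) => `// contents of ${args.path}`,
};

// Stubbed "model": reads one file, then answers. A real model would decide
// its next turn from the transcript so far.
function nextTurn(transcript: string[]): Turn {
  if (transcript.length === 0) {
    return { type: "tool_call", call: { name: "read_file", args: { path: "src/auth.ts" } } };
  }
  return { type: "answer", text: "The auth module was inspected." };
}

function runAgent(): string {
  const transcript: string[] = [];
  for (let step = 0; step < 10; step++) {         // cap steps to avoid runaway loops
    const turn = nextTurn(transcript);
    if (turn.type === "answer") return turn.text; // model is done: return its answer
    const result = tools[turn.call.name](turn.call.args);
    transcript.push(result);                      // feed tool output back to the model
  }
  return "step limit reached";
}
```

The step cap matters in practice: without it, a model that keeps calling tools would loop forever.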
The Evolution of Tool Use
Phase 1: Structured Output (2023)
Early function calling was fragile. Models would sometimes generate malformed JSON, call nonexistent functions, or hallucinate parameter values. Reliability was around 80-85%.
Phase 2: Reliable Tool Use (2024)
Claude 3, GPT-4 Turbo, and Gemini 1.5 made tool use reliable enough for production. JSON formatting became consistent, parameter validation improved, and models learned to handle tool errors gracefully. Reliability jumped to 95%+.
Phase 3: Agentic Tool Use (2025)
Models began using tools strategically — not just when asked, but proactively. They plan multi-step tool sequences, parallelize independent calls, and adjust their tool usage based on results. This is the agentic leap.
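Parallelizing independent calls, for example, is a runtime concern as much as a model one. A sketch, with a hypothetical async `readFile` stub: reads that have no ordering dependency can be issued concurrently with `Promise.all` instead of one at a time.

```typescript
async function readFile(path: string): Promise<string> {
  return `contents of ${path}`; // stub standing in for a real tool call
}

async function gatherContext(paths: string[]): Promise<string[]> {
  // All reads start immediately; we await the batch, not each call in turn.
  return Promise.all(paths.map((p) => readFile(p)));
}
```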
Tool Design Patterns
The Swiss Army Knife Anti-Pattern
Bad: One tool that does everything
tools: [{ name: "do_everything", params: { action: "string", ... } }]
Good: Focused tools with clear responsibilities
tools: [
  { name: "read_file", params: { path: "string" } },
  { name: "write_file", params: { path: "string", content: "string" } },
  { name: "run_tests", params: { test_path: "string" } },
  { name: "search_code", params: { pattern: "string", path: "string" } }
]
The Feedback Loop Pattern
Tools should return rich information that helps the model reason:
// Bad: run_tests returns "FAIL"
// Good: run_tests returns:
{
  "passed": 12,
  "failed": 1,
  "failures": [{
    "test": "test_auth_middleware",
    "error": "Expected 401, got 200",
    "file": "tests/auth.test.ts:45"
  }]
}
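On the agent side, a rich result like this can be rendered into a compact message the model can reason over. A sketch: the `TestResult` type mirrors the JSON above, and `formatForModel` is a hypothetical helper, not a standard API.

```typescript
type TestFailure = { test: string; error: string; file: string };
type TestResult = { passed: number; failed: number; failures: TestFailure[] };

// Render a structured test result as a message the model can act on.
function formatForModel(r: TestResult): string {
  if (r.failed === 0) return `All ${r.passed} tests passed.`;
  const lines = r.failures.map((f) => `- ${f.test} (${f.file}): ${f.error}`);
  return `${r.failed} of ${r.passed + r.failed} tests failed:\n${lines.join("\n")}`;
}
```

The failing test's name, location, and expected-vs-actual values give the model enough to navigate straight to the bug, which a bare "FAIL" never could.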
The Permission Tier Pattern
Not all tools should be equally accessible:
- Always available: read_file, search, list_directory
- Requires confirmation: write_file, run_command
- Requires explicit approval: delete_file, deploy, send_email
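One way to enforce these tiers is to tag each tool and gate execution in the agent runtime. A sketch, using the tool names from the tiers above; the gating function and tier labels are a hypothetical design, not a standard mechanism.

```typescript
type Tier = "auto" | "confirm" | "approve";

// Each tool is tagged with the tier from the list above.
const toolTiers: Record<string, Tier> = {
  read_file: "auto", search: "auto", list_directory: "auto",
  write_file: "confirm", run_command: "confirm",
  delete_file: "approve", deploy: "approve", send_email: "approve",
};

function mayRun(tool: string, userConfirmed: boolean, explicitlyApproved: boolean): boolean {
  switch (toolTiers[tool]) {
    case "auto": return true;                  // safe, read-only tools
    case "confirm": return userConfirmed;      // mutating tools need a yes
    case "approve": return explicitlyApproved; // destructive or external tools
    default: return false;                     // unknown tools never run
  }
}
```

Defaulting unknown tools to "never run" is the important design choice: a tool the registry has not classified should fail closed, not open.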
The Compounding Effect
Tool use enables other capabilities that enable more tool use:
- Tool use → agents can run tests → agents can verify their code → agents write better code
- Tool use → agents can search the web → agents have current information → agents give better advice
- Tool use → agents can read codebases → agents understand context → agents make targeted edits
This compounding effect is why tool use was the tipping point that created the agent era. Not smarter models — models that can act.
Conclusion
The tool-use revolution is easy to overlook because it's infrastructure, not a headline feature. But it's the foundation on which everything else — coding agents, security agents, research agents — is built. Language models were always intelligent. Tool use made them capable.