---
title: "OpenAI function calling tutorial: building tools for GPT"
canonical: "https://agenticup.dev/posts/openai-function-calling-tutorial/"
pubDate: "2026-06-01T00:00:00.000Z"
description: "OpenAI's function calling API lets the model request function execution — fetch data, interact with APIs, compute things. Here's how to use it from scratch."
tags: [openai, function calling, tutorials, gpt, tool use]
---

**TL;DR:** Function calling turns a chat model into something that can actually do things — query databases, call APIs, and compute results. This guide covers defining tools with JSON Schema, handling parallel calls, streaming with tool call deltas, and building a complete agent loop in 80 lines of Python with no frameworks.


Function calling is the single most important primitive in building AI agents. It's what turns a chat model from a text generator into something that can actually *do things* — query databases, call APIs, send emails, compute results.

I've built agents using both OpenAI's and Anthropic's tool use APIs. Here's my complete guide to OpenAI function calling, built from production experience rather than documentation examples.

> **Key takeaways:**
> - Function calling lets the model request structured function execution — it doesn't execute functions itself, it asks *you* to do it
> - Define tools as JSON Schema objects in the `tools` parameter alongside messages
> - Parallel function calling means the model can request multiple tools in a single response — handle them all before returning results
> - Streaming with function calls works by accumulating argument delta events for each function call item until the stream finishes
> - A complete agent loop needs just OpenAI's SDK — no frameworks required

OpenAI's [function calling documentation](https://platform.openai.com/docs/guides/function-calling) describes how models accept structured tool definitions and return function invocations for your code to run. The same general shape, JSON Schema tool definitions in and structured call requests out, is widely used across tool-use APIs.

## What function calling actually is

The name is misleading. OpenAI's function calling doesn't mean the model *calls* functions on your computer. The model outputs a structured request that says "I want to call this function with these arguments." Your code decides whether to execute it.

The flow looks like this:

```
User: "What's the weather in Bengaluru?"

Model: "I should check the weather API."
       ↓
Model outputs: { "function": "get_weather", "args": { "location": "Bengaluru" } }
       ↓
Your code executes get_weather("Bengaluru") → "26°C, partly cloudy"
       ↓
You send the result back to the model

Model: "The weather in Bengaluru is 26°C and partly cloudy."
```

The model never touches your API keys, never executes code on your server. It just requests tool execution. You control what runs.
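To make the request/execute split concrete, here is a minimal offline sketch. The `requested_call` dict only imitates the shape of a function call request (a name plus JSON-encoded arguments), and `get_weather` is a stub with hard-coded data. Both are assumptions for illustration, not real API output.

```python
import json

def get_weather(location: str) -> dict:
    # Stub with hard-coded data; a real implementation would call a weather API
    return {"location": location, "temperature_c": 26, "conditions": "partly cloudy"}

# Allowlist of functions the model is permitted to request
ALLOWED_FUNCTIONS = {"get_weather": get_weather}

def handle_request(requested_call: dict) -> dict:
    """Decide whether to run a requested call, then run it locally."""
    func = ALLOWED_FUNCTIONS.get(requested_call["name"])
    if func is None:
        return {"error": f"Function not allowed: {requested_call['name']}"}
    # Arguments arrive as a JSON string, not a Python dict
    args = json.loads(requested_call["arguments"])
    return func(**args)

# Imitation of what the model sends back: a name and JSON-encoded arguments
requested_call = {"name": "get_weather", "arguments": '{"location": "Bengaluru"}'}
result = handle_request(requested_call)
print(result)
```

The allowlist is the point: anything the model asks for that is not in `ALLOWED_FUNCTIONS` never runs.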

## Defining tools

Tools are defined as JSON Schema objects. Each tool has a name, description, and parameters schema. The description is critical — it's how the model knows *when* to call the tool.

```python
import openai

# The Responses API takes a flat tool definition: name, description, and
# parameters sit next to "type" (Chat Completions nests them under "function").
tools = [
    {
        "type": "function",
        "name": "get_weather",
        "description": "Get the current weather for a given location. Returns temperature, conditions, humidity, and wind speed.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name, e.g. 'Bengaluru, India' or 'San Francisco, CA'"
                },
                "units": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature units. Defaults to celsius for India, fahrenheit for US."
                }
            },
            "required": ["location"]
        }
    },
    {
        "type": "function",
        "name": "get_air_quality",
        "description": "Get air quality index and PM2.5 data for a location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"}
            },
            "required": ["location"]
        }
    }
]
```

**Rule of thumb for descriptions:** Describe *when* to call the function, not just what it does. A function name like `get_weather` is obvious. The description should clarify edge cases:

- "Call when user asks about weather, temperature, or climate conditions"
- "Call for both current conditions and short-term forecasts"
- "Does NOT support historical weather data"

This prevents the model from calling the wrong tool or calling a tool for tasks it can't handle.
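One way to keep descriptions consistent is to assemble them from the same three parts every time. The `build_description` helper below is a hypothetical convenience of my own naming, not part of the OpenAI SDK; it is a sketch of the idea rather than a required pattern.

```python
def build_description(summary: str, call_when: list, not_for: list) -> str:
    """Compose a tool description from a summary, triggers, and exclusions."""
    parts = [summary]
    if call_when:
        parts.append("Call when: " + "; ".join(call_when) + ".")
    if not_for:
        parts.append("Does NOT support: " + "; ".join(not_for) + ".")
    return " ".join(parts)

weather_description = build_description(
    summary="Get the current weather for a given location.",
    call_when=[
        "the user asks about weather, temperature, or climate conditions",
        "the user wants current conditions or a short-term forecast",
    ],
    not_for=["historical weather data"],
)
print(weather_description)
```

The resulting string goes in the tool's `description` field.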

## The basic function calling loop

Here's a working agent loop from scratch — no frameworks, just OpenAI's SDK:

```python
import json
import openai

# get_weather and get_air_quality are your own implementations

def agent_loop(user_input: str, tools: list, system_prompt: str = None):
    messages = []

    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})

    messages.append({"role": "user", "content": user_input})

    while True:
        response = openai.responses.create(
            model="gpt-4o",
            input=messages,
            tools=tools,
            tool_choice="auto",
            parallel_tool_calls=False  # this basic loop handles one call per turn
        )

        # Keep the model's output items in the history: each function_call
        # item must precede its matching function_call_output
        messages += response.output

        # Check if the model wants to call a tool
        tool_call = next(
            (item for item in response.output if item.type == "function_call"),
            None
        )

        # No tool call: return the text response
        if tool_call is None:
            return response.output_text

        # Extract function name and arguments (arguments is a JSON string)
        func_name = tool_call.name
        func_args = json.loads(tool_call.arguments)

        print(f"  → Calling: {func_name}({func_args})")

        # Execute the function
        if func_name == "get_weather":
            result = get_weather(**func_args)
        elif func_name == "get_air_quality":
            result = get_air_quality(**func_args)
        else:
            result = {"error": f"Unknown function: {func_name}"}

        # Send the result back, matched to the request by call_id.
        # The loop then repeats so the model can use the tool result.
        messages.append({
            "type": "function_call_output",
            "call_id": tool_call.call_id,
            "output": json.dumps(result)
        })
```

This is the core pattern. The loop:

1. Sends messages to the model with available tools
2. If the model requests a function call, executes it and sends the result back
3. If the model returns text, we're done

<div class="callout">
  <div class="callout-title">Note</div>
  <p>I'm using the newer <code>openai.responses.create()</code> API here (the Responses API), which is cleaner for agent loops than the older Chat Completions API. If you're on <code>openai.chat.completions.create()</code>, the loop is similar, but the formats differ: tool definitions nest under a <code>function</code> key, tool calls arrive on <code>response.choices[0].message.tool_calls</code>, and results go back as messages with <code>role: "tool"</code>.</p>
</div>

## Parallel function calling

One of the biggest improvements in recent OpenAI models is parallel function calling — the model can request multiple function calls at once. This is critical for efficiency.

When a user asks "What's the weather and air quality in Bengaluru?", the model can request both `get_weather` and `get_air_quality` in a single response instead of spending one round trip on each.

```python
def agent_loop_parallel(user_input: str, tools: list):
    messages = [{"role": "user", "content": user_input}]

    while True:
        response = openai.responses.create(
            model="gpt-4o",
            input=messages,
            tools=tools,
            tool_choice="auto"
        )

        # Keep the model's output items (including every function_call)
        # in the history so each output below has a matching request
        messages += response.output

        # Collect all function calls
        function_calls = [item for item in response.output if item.type == "function_call"]

        # No tool calls: return the text response
        if not function_calls:
            return response.output_text

        # Execute ALL function calls (these could run in parallel)
        for fc in function_calls:
            func_name = fc.name
            func_args = json.loads(fc.arguments)
            print(f"  → Calling: {func_name}({func_args})")

            if func_name == "get_weather":
                result = get_weather(**func_args)
            elif func_name == "get_air_quality":
                result = get_air_quality(**func_args)
            else:
                result = {"error": f"Unknown function: {func_name}"}

            # Add one function_call_output per call, matched by call_id
            messages.append({
                "type": "function_call_output",
                "call_id": fc.call_id,
                "output": json.dumps(result)
            })
```

The key insight: execute all parallel calls before returning to the model. Every function call in a response needs a matching result in the next request, and a request with an unanswered call is rejected by the API.

For performance, I run parallel calls with `concurrent.futures.ThreadPoolExecutor`:

```python
import concurrent.futures

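def execute_parallel_calls(function_calls):
    """Execute multiple function calls in parallel using threads.

    Assumes execute_function(fc) is a dispatcher you supply that runs one call.
    Results come back in completion order, so each is paired with its call_id.
    """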
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        future_to_call = {
            executor.submit(execute_function, fc): fc
            for fc in function_calls
        }
        results = []
        for future in concurrent.futures.as_completed(future_to_call):
            fc = future_to_call[future]
            try:
                result = future.result()
                results.append((fc.call_id, result))
            except Exception as e:
                results.append((fc.call_id, {"error": str(e)}))
        return results
```
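The helper above relies on an `execute_function` dispatcher that this post doesn't show. Here is a self-contained sketch of how the pieces could fit together, with `SimpleNamespace` objects standing in for the model's function call items and stub tools returning hard-coded data. It swaps `as_completed` for `executor.map`, which returns results in request order; that is a variation on the code above, not a replacement for it.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from types import SimpleNamespace

# Stub tools with hard-coded data (assumptions for illustration)
def get_weather(location: str) -> dict:
    return {"location": location, "temperature_c": 26}

def get_air_quality(location: str) -> dict:
    return {"location": location, "aqi": 72}

FUNCTIONS = {"get_weather": get_weather, "get_air_quality": get_air_quality}

def execute_function(fc) -> dict:
    """Hypothetical dispatcher: look up the function by name and call it."""
    func = FUNCTIONS.get(fc.name)
    if func is None:
        return {"error": f"Unknown function: {fc.name}"}
    try:
        return func(**json.loads(fc.arguments))
    except Exception as e:
        return {"error": str(e)}

def run_all(function_calls) -> list:
    """Run calls concurrently; executor.map keeps results in request order."""
    with ThreadPoolExecutor(max_workers=5) as executor:
        results = executor.map(execute_function, function_calls)
        return [(fc.call_id, result) for fc, result in zip(function_calls, results)]

# Fake function call items with the fields the loop reads
calls = [
    SimpleNamespace(call_id="call_1", name="get_weather", arguments='{"location": "Bengaluru"}'),
    SimpleNamespace(call_id="call_2", name="get_air_quality", arguments='{"location": "Bengaluru"}'),
]
outputs = run_all(calls)
print(outputs)
```

Threads help most when the tools are I/O-bound (HTTP calls, database queries), which is the common case for agent tools.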

## Streaming with function calls

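Streaming complicates function calling because the arguments arrive as fragments spread across many stream events instead of as one complete JSON string. In the Responses API, an `output_item.added` event announces each function call (with its name and `call_id`), and the argument fragments that follow carry an `item_id` that ties them back to that call.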

```python
def agent_loop_streaming(user_input: str, tools: list):
    messages = [{"role": "user", "content": user_input}]

    while True:
        stream = openai.responses.create(
            model="gpt-4o",
            input=messages,
            tools=tools,
            tool_choice="auto",
            stream=True
        )

        # Collect streaming chunks
        text_content = ""
        pending_calls = {}  # item_id → {call_id, name, arguments}

        for event in stream:
            if event.type == "response.output_text.delta":
                text_content += event.delta

            elif event.type == "response.output_item.added":
                # A new output item started; function calls carry their
                # name and call_id here
                # (structure depends on SDK version, so check your response schema)
                if event.item.type == "function_call":
                    pending_calls[event.item.id] = {
                        "call_id": event.item.call_id,
                        "name": event.item.name,
                        "arguments": ""
                    }

            elif event.type == "response.function_call_arguments.delta":
                # Argument JSON arrives in fragments, keyed by item_id
                pending_calls[event.item_id]["arguments"] += event.delta

        # After streaming completes, process tool calls
        if pending_calls:
            for call in pending_calls.values():
                func_args = json.loads(call["arguments"])

                if call["name"] == "get_weather":
                    result = get_weather(**func_args)
                else:
                    result = {"error": f"Unknown function: {call['name']}"}

                # Replay the function call, then attach its output by call_id
                messages.append({
                    "type": "function_call",
                    "call_id": call["call_id"],
                    "name": call["name"],
                    "arguments": call["arguments"]
                })
                messages.append({
                    "type": "function_call_output",
                    "call_id": call["call_id"],
                    "output": json.dumps(result)
                })
            continue

        return text_content
```

<div class="callout">
  <div class="callout-title">Pro tip</div>
  <p>When streaming, always check the stream event type before accessing fields. Different SDK versions structure streaming events differently. I've been burnt by this twice — test your stream parsing against the actual SDK version you're using.</p>
</div>
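Stream parsing is easy to get wrong, so it helps to test the accumulation logic offline. The sketch below feeds hand-written events through an accumulator. The event names and fields mirror the Responses API streaming events as I understand them (`response.output_item.added` and `response.function_call_arguments.delta`), but the objects are `SimpleNamespace` fakes, so check the shapes against your SDK version.

```python
import json
from types import SimpleNamespace as NS

def accumulate_calls(events) -> dict:
    """Collect function call arguments from a stream of events, keyed by item id."""
    calls = {}
    for event in events:
        if event.type == "response.output_item.added" and event.item.type == "function_call":
            # The item announces the call: name and call_id are known up front
            calls[event.item.id] = {
                "call_id": event.item.call_id,
                "name": event.item.name,
                "arguments": "",
            }
        elif event.type == "response.function_call_arguments.delta":
            # Each delta is a fragment of the JSON argument string
            calls[event.item_id]["arguments"] += event.delta
    return calls

# Two fake calls whose argument fragments are interleaved
fake_events = [
    NS(type="response.output_item.added",
       item=NS(type="function_call", id="fc_1", call_id="call_1", name="get_weather")),
    NS(type="response.output_item.added",
       item=NS(type="function_call", id="fc_2", call_id="call_2", name="get_air_quality")),
    NS(type="response.function_call_arguments.delta", item_id="fc_1", delta='{"locat'),
    NS(type="response.function_call_arguments.delta", item_id="fc_2", delta='{"location": '),
    NS(type="response.function_call_arguments.delta", item_id="fc_1", delta='ion": "Bengaluru"}'),
    NS(type="response.function_call_arguments.delta", item_id="fc_2", delta='"Bengaluru"}'),
]

calls = accumulate_calls(fake_events)
weather_args = json.loads(calls["fc_1"]["arguments"])
air_args = json.loads(calls["fc_2"]["arguments"])
print(calls["fc_1"]["name"], weather_args)
print(calls["fc_2"]["name"], air_args)
```

Only parse the arguments once the stream has finished. A partial fragment such as `{"locat` is not valid JSON.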

## Error handling for function calls

Function calls fail. APIs return 500s. Network drops. Invalid arguments. Your agent needs to handle these gracefully.

```python
def safe_execute_function(func_name: str, func_args: dict) -> dict:
    """Execute a function with error handling. Returns a result dict regardless of outcome."""
    try:
        if func_name == "get_weather":
            return get_weather(**func_args)
        elif func_name == "get_air_quality":
            return get_air_quality(**func_args)
        else:
            return {"error": f"Unknown function: {func_name}", "success": False}
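    except TypeError as e:
        # Raised when the model omits a required argument or passes an unexpected one
        return {"error": f"Invalid arguments: {e}", "success": False, "args": func_args}
    except KeyError as e:
        # Raised inside the function, e.g. an expected field missing from an API response
        return {"error": f"Missing key: {e}", "success": False}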
    except Exception as e:
        return {"error": f"Function execution failed: {str(e)}", "success": False}
```

When a function fails, return a structured error message to the model. The model can then:
- Explain the error to the user
- Try again with corrected arguments
- Try a different approach

Models handle errors surprisingly well if you return clear error messages. I've had the model suggest fixes for API credential issues based on the error text alone.
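Invalid arguments can also be caught before the function runs at all. This is a minimal sketch, assuming you pass in the `required` list from the tool's JSON Schema yourself; `parse_arguments` is a hypothetical helper, not an SDK function, and it only checks presence, not types.

```python
import json

def parse_arguments(raw_arguments: str, required: list) -> tuple:
    """Return (args, None) on success or (None, error_dict) on failure."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as e:
        return None, {"error": f"Arguments were not valid JSON: {e.msg}", "success": False}
    if not isinstance(args, dict):
        return None, {"error": "Arguments must be a JSON object", "success": False}
    # Check that every required parameter is present
    missing = [name for name in required if name not in args]
    if missing:
        return None, {"error": f"Missing required parameters: {missing}", "success": False}
    return args, None

args, error = parse_arguments('{"location": "Bengaluru"}', required=["location"])
bad_args, bad_error = parse_arguments('{"units": "celsius"}', required=["location"])
print(args, error)
print(bad_args, bad_error)
```

The error dict can go straight back to the model as the tool result, which gives it a chance to retry with corrected arguments.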

## Comparison with Anthropic tool use

I build with both providers. Here's how they compare for function calling:

| Aspect | OpenAI | Anthropic |
|--------|--------|-----------|
| Tool definition | JSON Schema in `tools` parameter | JSON Schema in `tools` parameter |
| Response format | `function_call` items in `output` (Responses API) or `tool_calls` array on message (Chat Completions) | `content` blocks with `tool_use` type |
| Parallel calls | Native in one response | Native in one response |
| Streaming | Typed events with per-call argument deltas | Content block deltas |
| Thinking before tools | No, calls directly | Optional `thinking` block before tool calls |
| Error recovery | Good with clear messages | Better — Claude is more cautious about retrying |

Anthropic's key difference: Claude can optionally *think* before calling tools, which produces better results for complex multi-step reasoning. OpenAI's models tend to call tools more eagerly but also more prematurely.

I use OpenAI for simpler tool use (fetch data, compute results) and Anthropic when the agent needs to reason deeply before acting (multi-step analysis, research agents).

## When function calling breaks

After months of production use, here's what causes function calling to fail:

**Ambiguous schemas.** If two functions have overlapping descriptions (e.g., `search_documents` and `search_web`), the model gets confused about which to call. I've seen the model call `search_documents` when it should call `search_web` simply because the descriptions weren't distinct enough.

Fix: Make descriptions mutually exclusive. "Use for searching the local document store" vs "Use for searching the internet."

**Contradictory instructions.** If your system prompt says "Never make up information" but you also have a `generate_report` function that expects complete data, the model may refuse to call the function because it can't satisfy both constraints.

Fix: Review your system prompt for conflicts with tool descriptions.

**Omitted optional parameters.** The model sometimes leaves out optional parameters it should include. Marking the parameter as required in the JSON Schema forces the model to provide it, but increases the chance of hallucinated values.

Fix: Accept reasonable defaults in your function implementation instead of requiring the model to provide every parameter.
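As a sketch of that fix, here is a `get_weather` stub that resolves `units` itself when the model omits it or sends something unexpected. The suffix-based country check and the hard-coded temperature are assumptions for illustration; real code would use proper geocoding and a real weather API.

```python
# Hypothetical rule: locations ending in these country codes default to fahrenheit
FAHRENHEIT_SUFFIXES = ("US", "USA")

def resolve_units(location: str, units=None) -> str:
    """Use the model's value when valid; otherwise fall back to a default."""
    if units in ("celsius", "fahrenheit"):
        return units
    # Missing or unrecognized value: fall back rather than fail
    country = location.split(",")[-1].strip()
    return "fahrenheit" if country in FAHRENHEIT_SUFFIXES else "celsius"

def get_weather(location: str, units=None) -> dict:
    units = resolve_units(location, units)
    temp_c = 26.0  # hard-coded stand-in for a real API response
    temperature = temp_c if units == "celsius" else round(temp_c * 9 / 5 + 32, 1)
    return {"location": location, "temperature": temperature, "units": units}

print(get_weather("Bengaluru, India"))
print(get_weather("Austin, US"))
print(get_weather("Austin, US", units="celsius"))
```

With this in place, `units` can stay out of `required` in the schema, and the model has no reason to invent a value.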

---

*Related: [Best AI agent frameworks in 2026](/posts/best-ai-agent-frameworks-2026/) — where frameworks help and where they get in the way.*

## Building a simple agent from scratch

Here's the complete agent pattern I use for production. It's about 80 lines of Python with no framework dependencies:

```python
import json
import openai
from datetime import datetime

class FunctionCallingAgent:
    def __init__(self, tools: list, functions: dict, model="gpt-4o", max_steps=10):
        self.tools = tools
        self.functions = functions  # {"function_name": callable}
        self.model = model
        self.max_steps = max_steps
        self.steps = 0
        self.messages = []

    def run(self, user_input: str) -> str:
        # Reset per-run state so the same agent can be reused
        self.steps = 0
        self.messages = [
            {"role": "system", "content": f"You are a helpful assistant. Today is {datetime.now().strftime('%Y-%m-%d')}. Use tools when needed."},
            {"role": "user", "content": user_input}
        ]

        while self.steps < self.max_steps:
            self.steps += 1

            response = openai.responses.create(
                model=self.model,
                input=self.messages,
                tools=self.tools,
                tool_choice="auto"
            )

            # Keep the model's output items (including function calls) in the
            # history so each output below has a matching request
            self.messages += response.output
            function_calls = [o for o in response.output if o.type == "function_call"]

            if not function_calls:
                return response.output_text

            # Execute all tool calls and attach one output per call_id
            for fc in function_calls:
                func = self.functions.get(fc.name)
                if not func:
                    result = {"error": f"Unknown function: {fc.name}"}
                else:
                    try:
                        args = json.loads(fc.arguments)
                        result = func(**args)
                    except Exception as e:
                        result = {"error": str(e)}

                self.messages.append({
                    "type": "function_call_output",
                    "call_id": fc.call_id,
                    "output": json.dumps(result)
                })

        return "Agent stopped: max steps reached."

# Usage
tools = [...]  # Your tool definitions
functions = {
    "get_weather": get_weather,
    "get_air_quality": get_air_quality,
}

agent = FunctionCallingAgent(tools, functions)
result = agent.run("What's the weather and air quality in Bengaluru?")

That's it. No LangChain. No LangGraph. One class, 80 lines, production-ready if you add logging and error handling on top.

Function calling is the foundation. Everything else — state machines, multi-agent orchestration, monitoring — is built on top of this pattern. Master this first, and you can build anything.
