Building AI Agents from scratch - Part 1: Tool use
Let's implement AI Agent from scratch without using any framework. Today we implement the tool use capability.
👋 I am Aurimas. I write the SwirlAI Newsletter with the goal of presenting complicated Data related concepts in a simple and easy-to-digest way. My mission is to help You UpSkill and keep You updated on the latest news in GenAI, MLOps, Data Engineering, Machine Learning and overall Data space.
First of all, I want to wish you a joyful and peaceful holiday season in advance!
This is the first article in the series where we will build AI Agents from scratch without using any LLM orchestration frameworks. In this one you will learn:
What are agents?
How the Tool usage actually works.
How to build a decorator wrapper that extracts relevant details from a Python function to be passed to the LLM via system prompt.
How to think about constructing effective system prompts that can be used for Agents.
How to build an Agent class that is able to plan and execute actions using provided Tools.
You can find the code examples for this and following projects in this GitHub repository.
If something does not work as expected, feel free to DM me or leave a comment, let’s figure it out together!
“The future of AI is Agentic.”
“Year 2025 will be the year of Agents.”
These are the phrases you hear nowadays left and right. And there is a lot of truth to it. In order to bring the most business value out of LLMs, we are turning to complex agentic flows.
What is an AI Agent?
In it’s simplest high level definition, an AI agent is an application that uses LLM at the core as it’s reasoning engine to decide on the steps it needs to take to solve for users intent. It is usually depicted similar to the picture bellow and is composed of multiple building blocks:
Planning - the capability to plan a sequence of actions that the application needs to perform in order to solve for the provided intent.
Memory - short-term and long-term memory containing any information that the agent might need to reason about the actions it needs to take. This information is usually passed to LLM via a system prompt as part of the core. You can read more about different types of memories in one of my previous articles:
Tools - any functions that the application can call to enhance it’s reasoning capabilities. One should not be fooled by the simplicity of this definition as a tool can be literally anything:
Simple functions defined in code.
VectorDBs and other data stores containing context.
Regular Machine Learning model APIs.
Other Agents!
…
In the following set of articles, I will implement most of the moving parts of an agent from scratch without using any orchestration frameworks. This episode is about Tool use.
If you are using any orchestration frameworks for agentic applications, you might be abstracted away from what using a tool really means. This article will help you understand what providing a tool and using it via an agent involves. I believe that understanding applications from the base building blocks is really important for few reasons:
Frameworks hide the implementation details of the system prompts used, different approaches might be needed in different use cases.
You might want to tune the low level details to achieve most optimal performance of the agent.
Having clarity of how the systems actually work helps build up your systems thinking enabling you to craft advanced applications more efficiently.
Tool use on a high level.
The basic thing one needs to understand when building agentic applications is that LLMs do not run code, they are only used to produce intent via prompting. Why can ChatGPT browse the internet and return more accurate and recent results? Because ChatGPT IS an agent and there are many non LLM building blocks hidden from us behind the API.
Prompt engineering becomes critical when building agentic applications. More specifically, how you craft the system prompt. Simplified prompt structure looks like the following.
The agent will only perform well if you are able to efficiently provide the system prompt with available tool definitions and expected outputs which are in a form of planned actions or raw answers.
Implementing the Agent.
In this part, we will create an AI Agent, that is capable of checking currency conversion rates online and performing the conversion if needed to answer the user query.
You can also find the code in a GitHub repository here.
You can follow the tutorial using the Jupyter notebook here.
I will also create a Youtube video explaining the process in the following weeks. If you don’t want to miss it, you can subscribe to the Youtube channel here.
Preparing python functions to be used as tools.
The easiest and most convenient way to provide tools to an agent is through functions, in our project we will be using Python for this.
We do not need to provide the function code itself to the system prompt but we need to extract useful information about it so that LLM can decide if and how the function should be invoked.
We’ll define a dataclass that contains desired information including the function runnable.
@dataclass
class Tool:
name: str
description: str
func: Callable[..., str]
parameters: Dict[str, Dict[str, str]]
def __call__(self, *args, **kwargs) -> str:
return self.func(*args, **kwargs)
The information we are extracting includes:
Function name.
function description (we will extract this from a docstring).
Function callable so that we can invoke it as part of the agent.
Parameters that should be used with the function so that the LLM can decide on how to call the function.
Now we will need to extract the above information from the functions we define. One requirement for the functions we will enforce is to have properly formatted docstrings. We will require the following format:
"""Description of what the tool does.
Parameters:
- param1: Description of first parameter
- param2: Description of second parameter
"""
The following function extracts information about parameters - parameter names and descriptions.
def parse_docstring_params(docstring: str) -> Dict[str, str]:
"""Extract parameter descriptions from docstring."""
if not docstring:
return {}
params = {}
lines = docstring.split('\n')
in_params = False
current_param = None
for line in lines:
line = line.strip()
if line.startswith('Parameters:'):
in_params = True
elif in_params:
if line.startswith('-') or line.startswith('*'):
current_param = line.lstrip('- *').split(':')[0].strip()
params[current_param] = line.lstrip('- *').split(':')[1].strip()
elif current_param and line:
params[current_param] += ' ' + line.strip()
elif not line:
in_params = False
return params
We will be extracting function parameter types from typehints provided via function definition. The bellow function will help format them.
def get_type_description(type_hint: Any) -> str:
"""Get a human-readable description of a type hint."""
if isinstance(type_hint, _GenericAlias):
if type_hint._name == 'Literal':
return f"one of {type_hint.__args__}"
return type_hint.__name__
A very convenient way to turn a function into a tool is to use a decorator. The below code defines a tool decorator that wraps a function if used. It uses either function name for the tool name or a variable provided via decorator.
def tool(name: str = None):
def decorator(func: Callable[..., str]) -> Tool:
tool_name = name or func.__name__
description = inspect.getdoc(func) or "No description available"
type_hints = get_type_hints(func)
param_docs = parse_docstring_params(description)
sig = inspect.signature(func)
params = {}
for param_name, param in sig.parameters.items():
params[param_name] = {
"type": get_type_description(type_hints.get(param_name, Any)),
"description": param_docs.get(param_name, "No description available")
}
return Tool(
name=tool_name,
description=description.split('\n\n')[0],
func=func,
parameters=params
)
return decorator
Currency exchange tool.
The below creates a tool from a function that takes in the amount of currency to exchange from, the currency code to be converted from and the currency code to convert to. The function searches for the relevant currency exchange rate and performs the calculation of resulting currency amount.
@tool()
def convert_currency(amount: float, from_currency: str, to_currency: str) -> str:
"""Converts currency using latest exchange rates.
Parameters:
- amount: Amount to convert
- from_currency: Source currency code (e.g., USD)
- to_currency: Target currency code (e.g., EUR)
"""
try:
url = f"https://open.er-api.com/v6/latest/{from_currency.upper()}"
with urllib.request.urlopen(url) as response:
data = json.loads(response.read())
if "rates" not in data:
return "Error: Could not fetch exchange rates"
rate = data["rates"].get(to_currency.upper())
if not rate:
return f"Error: No rate found for {to_currency}"
converted = amount * rate
return f"{amount} {from_currency.upper()} = {converted:.2f} {to_currency.upper()}"
except Exception as e:
return f"Error converting currency: {str(e)}"
Let’s just run
convert_currency
It should return something like
Tool(name='convert_currency', description='Converts currency using latest exchange rates.', func=<function convert_currency at 0x106d8fa60>, parameters={'amount': {'type': 'float', 'description': 'Amount to convert'}, 'from_currency': {'type': 'str', 'description': 'Source currency code (e.g., USD)'}, 'to_currency': {'type': 'str', 'description': 'Target currency code (e.g., EUR)'}})
This is great! We have successfully extracted information we will be providing to the LLM as a tool definition.
Crafting the system prompt.
We will be using gpt-4o-mini as our reasoning engine. It is known that GPT model family performs better when the input prompt is formatted as json. So we will do exactly that. Actually, the system prompt is the most important part of our agent, here is the final one we will be using:
{
"role": "AI Assistant",
"capabilities": [
"Using provided tools to help users when necessary",
"Responding directly without tools for questions that don't require tool usage",
"Planning efficient tool usage sequences"
],
"instructions": [
"Use tools only when they are necessary for the task",
"If a query can be answered directly, respond with a simple message instead of using tools",
"When tools are needed, plan their usage efficiently to minimize tool calls"
],
"tools": [
{
"name": tool.name,
"description": tool.description,
"parameters": {
name: {
"type": info["type"],
"description": info["description"]
}
for name, info in tool.parameters.items()
}
}
for tool in self.tools.values()
],
"response_format": {
"type": "json",
"schema": {
"requires_tools": {
"type": "boolean",
"description": "whether tools are needed for this query"
},
"direct_response": {
"type": "string",
"description": "response when no tools are needed",
"optional": True
},
"thought": {
"type": "string",
"description": "reasoning about how to solve the task (when tools are needed)",
"optional": True
},
"plan": {
"type": "array",
"items": {"type": "string"},
"description": "steps to solve the task (when tools are needed)",
"optional": True
},
"tool_calls": {
"type": "array",
"items": {
"type": "object",
"properties": {
"tool": {
"type": "string",
"description": "name of the tool"
},
"args": {
"type": "object",
"description": "parameters for the tool"
}
}
},
"description": "tools to call in sequence (when tools are needed)",
"optional": True
}
},
"examples": [
{
"query": "Convert 100 USD to EUR",
"response": {
"requires_tools": True,
"thought": "I need to use the currency conversion tool to convert USD to EUR",
"plan": [
"Use convert_currency tool to convert 100 USD to EUR",
"Return the conversion result"
],
"tool_calls": [
{
"tool": "convert_currency",
"args": {
"amount": 100,
"from_currency": "USD",
"to_currency": "EUR"
}
}
]
}
},
{
"query": "What's 500 Japanese Yen in British Pounds?",
"response": {
"requires_tools": True,
"thought": "I need to convert JPY to GBP using the currency converter",
"plan": [
"Use convert_currency tool to convert 500 JPY to GBP",
"Return the conversion result"
],
"tool_calls": [
{
"tool": "convert_currency",
"args": {
"amount": 500,
"from_currency": "JPY",
"to_currency": "GBP"
}
}
]
}
},
{
"query": "What currency does Japan use?",
"response": {
"requires_tools": False,
"direct_response": "Japan uses the Japanese Yen (JPY) as its official currency. This is common knowledge that doesn't require using the currency conversion tool."
}
}
]
}
}
A lot to unpack, let’s analyse it step by step:
"role": "AI Assistant",
"capabilities": [
"Using provided tools to help users when necessary",
"Responding directly without tools for questions that don't require tool usage",
"Planning efficient tool usage sequences"
],
"instructions": [
"Use tools only when they are necessary for the task",
"If a query can be answered directly, respond with a simple message instead of using tools",
"When tools are needed, plan their usage efficiently to minimize tool calls"
]
This is where we define the qualities of the Agent, in general we are enforcing the behaviour that tools should be used only when necessary.
"tools": [
{
"name": tool.name,
"description": tool.description,
"parameters": {
name: {
"type": info["type"],
"description": info["description"]
}
for name, info in tool.parameters.items()
}
}
for tool in self.tools.values()
]
This is where we unpack the tools into a list. The tool list will be part of Agent class, that is why we loop through self.tools. Remember, each tool is defined by the Dataclass we created in the first part.
"response_format": {
"type": "json",
"schema": {
"requires_tools": {
"type": "boolean",
"description": "whether tools are needed for this query"
},
"direct_response": {
"type": "string",
"description": "response when no tools are needed",
"optional": True
},
"thought": {
"type": "string",
"description": "reasoning about how to solve the task (when tools are needed)",
"optional": True
},
"plan": {
"type": "array",
"items": {"type": "string"},
"description": "steps to solve the task (when tools are needed)",
"optional": True
},
"tool_calls": {
"type": "array",
"items": {
"type": "object",
"properties": {
"tool": {
"type": "string",
"description": "name of the tool"
},
"args": {
"type": "object",
"description": "parameters for the tool"
}
}
},
"description": "tools to call in sequence (when tools are needed)",
"optional": True
}
}
}
Above enforces the LLM output schema. We provide strict instructions here:
requires_tools: return if tool usage is required.
direct_response: if above is false return a direct response.
thought: description on how the task should be solved.
plan: steps to solve the task.
tool_calls: tool calls in sequence including functions and parameters to be used. Our example only includes one tool, but it does not necessarily have to.
"examples": [
{
"query": "Convert 100 USD to EUR",
"response": {
"requires_tools": True,
"thought": "I need to use the currency conversion tool to convert USD to EUR",
"plan": [
"Use convert_currency tool to convert 100 USD to EUR",
"Return the conversion result"
],
"tool_calls": [
{
"tool": "convert_currency",
"args": {
"amount": 100,
"from_currency": "USD",
"to_currency": "EUR"
}
}
]
}
},
{
"query": "What's 500 Japanese Yen in British Pounds?",
"response": {
"requires_tools": True,
"thought": "I need to convert JPY to GBP using the currency converter",
"plan": [
"Use convert_currency tool to convert 500 JPY to GBP",
"Return the conversion result"
],
"tool_calls": [
{
"tool": "convert_currency",
"args": {
"amount": 500,
"from_currency": "JPY",
"to_currency": "GBP"
}
}
]
}
},
{
"query": "What currency does Japan use?",
"response": {
"requires_tools": False,
"direct_response": "Japan uses the Japanese Yen (JPY) as its official currency. This is common knowledge that doesn't require using the currency conversion tool."
}
}
]
Finally, we provide some examples of correct reasoning above.
Implementing the Agent Class
The agent class is quite lengthy due to the long system prompt:
class Agent:
def __init__(self):
"""Initialize Agent with empty tool registry."""
self.client = openai.OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
self.tools: Dict[str, Tool] = {}
def add_tool(self, tool: Tool) -> None:
"""Register a new tool with the agent."""
self.tools[tool.name] = tool
def get_available_tools(self) -> List[str]:
"""Get list of available tool descriptions."""
return [f"{tool.name}: {tool.description}" for tool in self.tools.values()]
def use_tool(self, tool_name: str, **kwargs: Any) -> str:
"""Execute a specific tool with given arguments."""
if tool_name not in self.tools:
raise ValueError(f"Tool '{tool_name}' not found. Available tools: {list(self.tools.keys())}")
tool = self.tools[tool_name]
return tool.func(**kwargs)
def create_system_prompt(self) -> str:
"""Create the system prompt for the LLM with available tools."""
tools_json = {
"role": "AI Assistant",
"capabilities": [
"Using provided tools to help users when necessary",
"Responding directly without tools for questions that don't require tool usage",
"Planning efficient tool usage sequences"
],
"instructions": [
"Use tools only when they are necessary for the task",
"If a query can be answered directly, respond with a simple message instead of using tools",
"When tools are needed, plan their usage efficiently to minimize tool calls"
],
"tools": [
{
"name": tool.name,
"description": tool.description,
"parameters": {
name: {
"type": info["type"],
"description": info["description"]
}
for name, info in tool.parameters.items()
}
}
for tool in self.tools.values()
],
"response_format": {
"type": "json",
"schema": {
"requires_tools": {
"type": "boolean",
"description": "whether tools are needed for this query"
},
"direct_response": {
"type": "string",
"description": "response when no tools are needed",
"optional": True
},
"thought": {
"type": "string",
"description": "reasoning about how to solve the task (when tools are needed)",
"optional": True
},
"plan": {
"type": "array",
"items": {"type": "string"},
"description": "steps to solve the task (when tools are needed)",
"optional": True
},
"tool_calls": {
"type": "array",
"items": {
"type": "object",
"properties": {
"tool": {
"type": "string",
"description": "name of the tool"
},
"args": {
"type": "object",
"description": "parameters for the tool"
}
}
},
"description": "tools to call in sequence (when tools are needed)",
"optional": True
}
},
"examples": [
{
"query": "Convert 100 USD to EUR",
"response": {
"requires_tools": True,
"thought": "I need to use the currency conversion tool to convert USD to EUR",
"plan": [
"Use convert_currency tool to convert 100 USD to EUR",
"Return the conversion result"
],
"tool_calls": [
{
"tool": "convert_currency",
"args": {
"amount": 100,
"from_currency": "USD",
"to_currency": "EUR"
}
}
]
}
},
{
"query": "What's 500 Japanese Yen in British Pounds?",
"response": {
"requires_tools": True,
"thought": "I need to convert JPY to GBP using the currency converter",
"plan": [
"Use convert_currency tool to convert 500 JPY to GBP",
"Return the conversion result"
],
"tool_calls": [
{
"tool": "convert_currency",
"args": {
"amount": 500,
"from_currency": "JPY",
"to_currency": "GBP"
}
}
]
}
},
{
"query": "What currency does Japan use?",
"response": {
"requires_tools": False,
"direct_response": "Japan uses the Japanese Yen (JPY) as its official currency. This is common knowledge that doesn't require using the currency conversion tool."
}
}
]
}
}
return f"""You are an AI assistant that helps users by providing direct answers or using tools when necessary.
Configuration, instructions, and available tools are provided in JSON format below:
{json.dumps(tools_json, indent=2)}
Always respond with a JSON object following the response_format schema above.
Remember to use tools only when they are actually needed for the task."""
def plan(self, user_query: str) -> Dict:
"""Use LLM to create a plan for tool usage."""
messages = [
{"role": "system", "content": self.create_system_prompt()},
{"role": "user", "content": user_query}
]
response = self.client.chat.completions.create(
model="gpt-4o-mini",
messages=messages,
temperature=0
)
try:
return json.loads(response.choices[0].message.content)
except json.JSONDecodeError:
raise ValueError("Failed to parse LLM response as JSON")
def execute(self, user_query: str) -> str:
"""Execute the full pipeline: plan and execute tools."""
try:
plan = self.plan(user_query)
if not plan.get("requires_tools", True):
return plan["direct_response"]
# Execute each tool in sequence
results = []
for tool_call in plan["tool_calls"]:
tool_name = tool_call["tool"]
tool_args = tool_call["args"]
result = self.use_tool(tool_name, **tool_args)
results.append(result)
# Combine results
return f"""Thought: {plan['thought']}
Plan: {'. '.join(plan['plan'])}
Results: {'. '.join(results)}"""
except Exception as e:
return f"Error executing plan: {str(e)}"
Let’s look into it step by step (skipping the create_system_prompt method as we already analysed it in the previous part).
def add_tool(self, tool: Tool) -> None:
"""Register a new tool with the agent."""
self.tools[tool.name] = tool
def get_available_tools(self) -> List[str]:
"""Get list of available tool descriptions."""
return [f"{tool.name}: {tool.description}" for tool in self.tools.values()]
def use_tool(self, tool_name: str, **kwargs: Any) -> str:
"""Execute a specific tool with given arguments."""
if tool_name not in self.tools:
raise ValueError(f"Tool '{tool_name}' not found. Available tools: {list(self.tools.keys())}")
tool = self.tools[tool_name]
return tool.func(**kwargs)
Above contain methods to manage tools:
Attaching tools to the agent.
List attached tools.
Invoke execution of a tool.
def plan(self, user_query: str) -> Dict:
"""Use LLM to create a plan for tool usage."""
messages = [
{"role": "system", "content": self.create_system_prompt()},
{"role": "user", "content": user_query}
]
response = self.client.chat.completions.create(
model="gpt-4o-mini",
messages=messages,
temperature=0
)
try:
return json.loads(response.choices[0].message.content)
except json.JSONDecodeError:
raise ValueError("Failed to parse LLM response as JSON")
The above simply executes the system prompt, we defined the expected output as part of the system prompt. It exactly provides the actions that the LLM planned or a direct answer if the tool calling is not needed.
def execute(self, user_query: str) -> str:
"""Execute the full pipeline: plan and execute tools."""
try:
plan = self.plan(user_query)
if not plan.get("requires_tools", True):
return plan["direct_response"]
# Execute each tool in sequence
results = []
for tool_call in plan["tool_calls"]:
tool_name = tool_call["tool"]
tool_args = tool_call["args"]
result = self.use_tool(tool_name, **tool_args)
results.append(result)
# Combine results
return f"""Thought: {plan['thought']}
Plan: {'. '.join(plan['plan'])}
Results: {'. '.join(results)}"""
except Exception as e:
return f"Error executing plan: {str(e)}"
The above executes the plan method and acts on it. You might remember that the plan can include multiple sequential tool executions, that is why we are looping through planned tool calls.
Running the Agent.
That’s it, we have all of the necessary code to create and use the Agent. in the following code we initialise the agent, attach a convert_currency tool to it and loop through two user queries. First one should require the tool use while the second not.
agent = Agent()
agent.add_tool(convert_currency)
query_list = ["I am traveling to Japan from Serbia, I have 1500 of local currency, how much of Japanese currency will I be able to get?",
"How are you doing?"]
for query in query_list:
print(f"\nQuery: {query}")
result = agent.execute(query)
print(result)
The output should be similar to:
Query: I am traveling to Japan from Serbia, I have 1500 of local currency, how much of Japanese currency will I be able to get?
Thought: I need to convert 1500 Serbian Dinars (RSD) to Japanese Yen (JPY) using the currency conversion tool.
Plan: Use convert_currency tool to convert 1500 RSD to JPY. Return the conversion result
Results: 1500 RSD = 2087.49 JPY
Query: How are you doing?
I'm just a computer program, so I don't have feelings, but I'm here and ready to help you!
As expected! First query uses the tool, while the second does not.
That’s it for today, we’ve learned:
How to wrap python functions to be provided as tools to the Agent.
How to craft a system prompt that uses the tool definitions in planning the execution.
How to implement the agent that executes on the plan.
Interesting, I am working on a vscode copilot called Codebuddy. This definitely is a good material to get me started.
Interesting article, aurimas
Thank you for writing this
I was wondering your thoughts abt Anthropic piece on Dec 19 about agents and workflows
You agree with their definition of agents and that most times you don’t really need agents?