Function Calling All-In
Qwen Korea Meetup - 2026.03.26
TL;DR
- AutoBe
  - A backend AI agent built entirely on function calling
  - The LLM never writes code; it fills typed structures, and the compiler converts them to code
  - 100% compilation success across all 4 Qwen models
- Typia
  - Infrastructure that automates the entire function calling lifecycle
  - Schema generation → lenient parsing → type coercion → validation feedback
  - qwen3-coder-next: 6.75% → 100%, qwen3.5 series: 0% → 100%
- The Case for Function Calling
  - A methodology for domains that demand precision
  - Constraints through structural absence, model-neutral, mechanically verifiable
- Why Qwen
  - Local models are essential for R&D
  - Small models make the best QA engineers
  - Open ecosystem
- The LLM doesn't need to be accurate; it just needs to be correctable
1. AutoBe
6.75%.
That's the probability that qwen3-coder-next produces a valid result on its first attempt when asked to generate API data types (input/output structures for products, orders, payments, etc.) for a shopping mall backend. Out of 100 tries, 93 fail.
And yet AutoBe's final compilation success rate is 100%. Across all four Qwen models.
1.1. What AutoBe Does
AutoBe is an open-source AI agent that generates production-ready backends from natural language conversation. It's developed by Wrtn Technologies.
"Build me a shopping mall backend. I need product listings, a shopping cart, orders, and payments." Say this, and AutoBe generates all of the following:
- Requirements analysis (SRS)
- Database schema (Prisma ERD)
- API specification (OpenAPI 3.1)
- E2E test code
- Complete implementation code
- Type-safe SDK
Every line of generated code compiles. What comes out is a real, working backend built on TypeScript + NestJS + Prisma.
Todo
| Phase | Output | Token Usage | Function Calls |
|---|---|---|---|
| Analyze | actors: 2, documents: 6 | 377.8K (in 308.6K / out 69.2K) | 47 / 47 (100.0%) |
| Database | namespaces: 2, models: 7 | 1.25M (in 1.03M / out 219.5K) | 37 / 38 (97.4%) |
| Interface | operations: 14, schemas: 28 | 11.65M (in 11.40M / out 242.6K) | 163 / 203 (80.3%) |
| Test | functions: 44 | 2.90M (in 2.76M / out 142.2K) | 102 / 107 (95.3%) |
| Realize | functions: 24 | 1.90M (in 1.78M / out 120.8K) | 71 / 82 (86.6%) |

| Phase | Output | Token Usage | Function Calls |
|---|---|---|---|
| Analyze | actors: 2, documents: 6 | 1.33M (in 1.07M / out 253.7K) | 99 / 105 (94.3%) |
| Database | namespaces: 6, models: 21 | 2.71M (in 2.53M / out 171.5K) | 60 / 63 (95.2%) |
| Interface | operations: 62, schemas: 80 | 67.76M (in 66.30M / out 1.46M) | 628 / 898 (69.9%) |
| Test | functions: 183 | 25.28M (in 24.14M / out 1.14M) | 608 / 624 (97.4%) |
| Realize | functions: 98 | 11.70M (in 11.03M / out 661.2K) | 286 / 320 (89.4%) |
Shopping
| Phase | Output | Token Usage | Function Calls |
|---|---|---|---|
| Analyze | actors: 3, documents: 6 | 3.83M (in 3.29M / out 541.3K) | 170 / 197 (86.3%) |
| Database | namespaces: 10, models: 30 | 5.01M (in 4.87M / out 148.1K) | 85 / 87 (97.7%) |
| Interface | operations: 148, schemas: 155 | 160.24M (in 157.56M / out 2.68M) | 1322 / 1764 (74.9%) |
| Test | functions: 429 | 84.24M (in 81.16M / out 3.08M) | 1403 / 1445 (97.1%) |
| Realize | functions: 207 | 32.63M (in 31.51M / out 1.12M) | 599 / 665 (90.1%) |
Erp
| Phase | Output | Token Usage | Function Calls |
|---|---|---|---|
| Analyze | actors: 2, documents: 6 | 1.86M (in 1.56M / out 295.3K) | 161 / 162 (99.4%) |
| Database | namespaces: 7, models: 27 | 2.55M (in 2.43M / out 120.9K) | 73 / 75 (97.3%) |
| Interface | operations: 101, schemas: 135 | 89.90M (in 87.74M / out 2.17M) | 920 / 1313 (70.1%) |
| Test | functions: 34 | 6.61M (in 6.37M / out 243.1K) | 208 / 215 (96.7%) |
| Realize | functions: 157, errors: 4 | 23.84M (in 22.46M / out 1.39M) | 581 / 638 (91.1%) |
1.2. The LLM Never Writes Code Directly
Most AI coding agents tell the LLM "write this code," then save whatever text it outputs straight to a source file. AutoBe doesn't work that way.
Instead, AutoBe uses function calling. Rather than letting the LLM generate freeform text, it hands the LLM a predefined structure (a JSON Schema) and says "fill in the blanks." Think of it as giving someone a form and asking them to complete it.
Once the LLM fills in the form and returns structured data, AutoBe's compiler reads that data and converts it into actual code. The LLM fills structures; the compiler writes code.
The entire pipeline works this way:
| Phase | What the LLM fills | Compiler validation |
|---|---|---|
| Requirements | AutoBeAnalyze → structured SRS | Structure check |
| Database | AutoBeDatabase → Prisma schema structure | Prisma compiler |
| API design | AutoBeOpenApi → OpenAPI spec structure | OpenAPI compiler |
| Tests | AutoBeTest β 30+ expression types | TypeScript compiler |
| Implementation | Modular code (Collector/Transformer/Operation) | TypeScript compiler |
At every phase, the LLM fills a structure, and a compiler validates it. This is AutoBe's all-in function calling strategy.
1.3. What the LLM Has to Fill Is Far from Simple
The "forms" the LLM has to fill are anything but trivial. Two examples will give you a sense of the precision required.
First, the DTO schema type that the LLM must generate during API design. A DTO (Data Transfer Object) describes the data structures in API requests and responses: things like "a product's price is a positive integer, its name is a string, and its category list is an array of strings."
The type that defines these DTO schemas is IJsonSchema. It's a union of 10 distinct kinds (constant, boolean, integer, number, string, array, object, and so on) with recursive nesting: arrays contain more IJsonSchema, objects map to more IJsonSchema:
```typescript
export type IJsonSchema =
  | IJsonSchema.IConstant
  | IJsonSchema.IBoolean
  | IJsonSchema.IInteger
  | IJsonSchema.INumber
  | IJsonSchema.IString
  | IJsonSchema.IArray // items: IJsonSchema → recursive
  | IJsonSchema.IObject // properties: Record<string, IJsonSchema> → recursive
  | IJsonSchema.IReference
  | IJsonSchema.IOneOf // oneOf: IJsonSchema[] → recursive
  | IJsonSchema.INull;
```

10 variants, infinitely recursive nesting. The 6.75% figure from earlier? That's the raw function calling success rate for this exact type.
The test phase takes complexity up another level. To generate E2E test code, the LLM has to express logic like "call this API, check that the response status is 200, verify that the body's items array has length greater than 0." The type that captures this is IExpression:
```typescript
export type IExpression =
  | IBooleanLiteral | INumericLiteral | IStringLiteral // literals
  | IArrayLiteralExpression | IObjectLiteralExpression // compound literals
  | INullLiteral | IUndefinedKeyword // null/undefined
  | IIdentifier | IPropertyAccessExpression // accessors
  | IElementAccessExpression | ITypeOfExpression // access/ops
  | IPrefixUnaryExpression | IPostfixUnaryExpression // unary ops
  | IBinaryExpression // binary ops
  | IArrowFunction | ICallExpression | INewExpression // functions
  | IArrayFilterExpression | IArrayForEachExpression // array ops
  | IArrayMapExpression | IArrayRepeatExpression // array ops
  | IPickRandom | ISampleRandom | IBooleanRandom // random generation
  | IIntegerRandom | INumberRandom | IStringRandom // random generation
  | IPatternRandom | IFormatRandom | IKeywordRandom // random generation
  | IEqualPredicate | INotEqualPredicate // assertions
  | IConditionalPredicate | IErrorPredicate; // assertions
```

Over 30 variants, recursively nested. This is essentially programming-language-level complexity, and the LLM must generate it through a single function call.
1.4. How 6.75% Becomes 100%
Given structures this complex, a 6.75% first-attempt success rate is no surprise. The real question is how to turn 6.75% into 100%.
The answer is a validation feedback loop: a cycle of verification, feedback, and correction.
When a function call fails, the system doesn't just say "wrong." Typia (a library we'll cover in detail shortly) takes the LLM's raw JSON output and inserts `// ←` inline annotations at every exact point where an error occurred. Here's an example from Typia's documentation:
```json
{
  "order": {
    "payment": {
      "type": "card",
      "cardNumber": 12345678 // ← [{"path":"$input.order.payment.cardNumber","expected":"string"}]
    },
    "product": {
      "name": "Laptop",
      "price": -100, // ← [{"path":"$input.order.product.price","expected":"number & Minimum<0>"}]
      "quantity": 2.5 // ← [{"path":"$input.order.product.quantity","expected":"number & Type<\"uint32\">"}]
    },
    "customer": {
      "name": "John Doe",
      "email": "invalid-email", // ← [{"path":"$input.order.customer.email","expected":"string & Format<\"email\">"}]
      "vip": "yes" // ← [{"path":"$input.order.customer.vip","expected":"boolean"}]
    }
  }
}
```

cardNumber should be a string, not a number. price must be ≥ 0. quantity must be a positive integer. email isn't a valid email. vip should be a boolean. Five errors, each with the exact path and expected type.
With this feedback, the LLM doesn't need to regenerate everything from scratch. It can precisely correct only the flagged fields and retry.
Compiler validation → precise diagnostics → LLM correction → revalidation. This loop repeats until it succeeds. Whether it takes one attempt or ten, the end result is 100%.
1.5. Qwen 3.5: From 0% to 100%
The qwen3.5 series presents an even more dramatic case.
Here's a function calling application from Typia's documentation:
```typescript
interface IOrder {
  payment: IPayment;
  product: {
    name: string;
    price: number & tags.Minimum<0>;
    quantity: number & tags.Type<"uint32">;
  };
  customer: {
    name: string;
    email: string & tags.Format<"email">;
    vip: boolean;
  };
}
type IPayment =
  | { type: "card"; cardNumber: string }
  | { type: "bank"; accountNumber: string };
```

And here's what the LLM actually returns:
```typescript
const llmOutput = `
> I'd be happy to help you with your order!
\`\`\`json
{
  "order": {
    "payment": "{\\"type\\":\\"card\\",\\"cardNumber\\":\\"1234-5678",
    "product": {
      name: "Laptop",
      price: 1300,
      quantity: 2,
    },
    "customer": {
      "name": "John Doe",
      "email": "john@example.com",
      vip: tru
\`\`\``;
```

Markdown wrapping, explanation prefix, unquoted keys, trailing commas, tru instead of true, unclosed brackets, and payment is double-stringified because IPayment is an anyOf (JSON Schema's way of saying "one of several types"). Double-stringify means the LLM wrote the object as a JSON string inside a string: instead of {"type": "card", ...} (an object), it produced "{\"type\": \"card\", ...}" (a string containing JSON). Seven problems in a single output.
The double-stringify is the one that makes the success rate 0%. The other errors are occasional; anyOf double-stringify is 100% consistent: every anyOf field, every time. This isn't Qwen-specific; Anthropic's Claude models do the same thing with oneOf. Every model family has its union-type blind spot.
Typia's parse() handles all of this in a single call: broken JSON recovery, type coercion, double-stringify unwrapping. No changes to the model. This is how Qwen 3.5 went from 0% to 100%.
1.6. Four Models, All at 100%
AutoBe currently tests against four Qwen models. All of them pass compilation.
| Model | Parameters (active / total) | Characteristics |
|---|---|---|
| qwen/qwen3-coder-next | 3B / 80B | Coding-focused, tool choice support |
| qwen/qwen3.5-397b-a17b | 17B / 397B | Largest MoE |
| qwen/qwen3.5-122b-a10b | 10B / 122B | Mid-size MoE |
| qwen/qwen3.5-35b-a3b | 3B / 35B | Compact MoE |
From 397B down to 35B. Even a compact model with just 3B active parameters can generate a complete shopping mall backend. Same pipeline, same schemas, same results.
1.7. It Runs Without System Prompts
One anecdote.
AI agents typically have system prompts, documents that instruct the LLM in natural language: "You are a backend development expert. Follow these rules when writing code..." In most AI agents, the system prompt is the crown jewel.
Once, we shipped a build where the system prompt was completely missing. The agent ran on nothing but function calling schemas and validation logic. No natural language instructions whatsoever.
Nobody noticed. Output quality was identical.
This wasnβt a one-time fluke. It happened multiple times, and the result was the same every time.
The types were the best prompt, and validation feedback was the best orchestration.
2. Typia: The Infrastructure Behind All of This
The things that kept appearing throughout Section 1 (schema conversion, broken JSON recovery, type coercion, precise error feedback): who does all of that?
To use function calling in production, there's no shortage of problems to solve. How do you generate the JSON Schema to send to the LLM? What do you do when the LLM returns broken JSON? How do you correct wrong types? How do you communicate errors in a format the LLM can understand?
Typia handles all of this in a single library.
2.1. From TypeScript Types to Function Calling Schemas
Function calling requires a JSON Schema that tells the LLM "give me data in this structure." Normally, developers write these schemas by hand: define the type, write a matching schema separately, then make sure the two don't drift apart over time.
Typia automates this. Define a TypeScript type, and Typia automatically generates its JSON Schema at compile time. Not through runtime reflection, but by directly leveraging the TypeScript compiler's type analyzer:
```typescript
import typia, { tags } from "typia";

interface IMember {
  /**
   * The member's age.
   *
   * Only adults aged 19 or older can register.
   * This is the platform's legal age restriction.
   */
  age: number & tags.Type<"uint32"> & tags.ExclusiveMinimum<18>;
  email: string & tags.Format<"email">;
  name: string & tags.MinLength<1> & tags.MaxLength<100>;
}

const schema = typia.llm.parameters<IMember>();
// {
//   type: "object",
//   properties: {
//     age: {
//       type: "integer",
//       description: "The member's age.\n\nOnly adults aged 19 or older can register.\nThis is the platform's legal age restriction.",
//       exclusiveMinimum: 18
//     },
//     email: { type: "string", format: "email" },
//     name: { type: "string", minLength: 1, maxLength: 100 }
//   },
//   required: ["age", "email", "name"]
// }
```

Two things to note here.
First, JSDoc comments become description fields. The LLM reads these descriptions to decide what values to generate. "Only adults aged 19 or older can register" is automatically included in the schema, giving the LLM the context it needs.
Second, type constraints become validation rules. ExclusiveMinimum<18> becomes a "greater than 18" validation rule; Format<"email"> becomes an email format check. A single type definition produces both LLM guidance and validation rules simultaneously.
When schemas are written by hand, they inevitably drift from the types over time. Typia eliminates this problem entirely. The type is the schema.
At the class level, typia.llm.application<T>() converts all public methods into function calling schemas, with parse(), coerce(), and validate() methods automatically built into each function.
2.2. Lenient JSON Parsing: Cleaning Up the LLM's Broken JSON
LLMs don't produce perfect JSON. Why? Because an LLM is a language model that generates text token by token, not a JSON generator. It forgets to close brackets, misplaces commas, prepends "Here is your answer:" before the JSON, and wraps everything in Markdown code blocks.
JSON.parse() rejects all of these. Typia's ILlmFunction.parse() handles every case:
| Problem | Example | Resolution |
|---|---|---|
| Unclosed bracket | {"name": "John" | Auto-close |
| Trailing comma | [1, 2, 3, ] | Ignore |
| JavaScript comments | {"a": 1 /* comment */} | Strip |
| Unquoted keys | {name: "John"} | Allow |
| Incomplete keywords | {"done": tru | Complete to true |
| Explanation prefix | Here is your JSON: {"a": 1} | Skip |
| Markdown code block | ```json\n{"a": 1}\n``` | Extract inner |
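To make the recovery rules concrete, here is a minimal sketch of the idea (illustration only; Typia's real parser is schema-aware and far more robust). It handles three of the table's cases: explanation prefix, trailing commas, and unclosed brackets.

```typescript
// Minimal lenient-recovery sketch: skip prose before the JSON, strip
// trailing commas, auto-close brackets, then delegate to JSON.parse().
function lenientParse(raw: string): unknown {
  let text = raw;

  // Skip any explanation prefix before the first brace/bracket.
  const start = text.search(/[{\[]/);
  if (start > 0) text = text.slice(start);

  // Strip trailing commas before a closing brace/bracket.
  // (A real implementation must be string-aware; this sketch is not.)
  text = text.replace(/,\s*([}\]])/g, "$1");

  // Auto-close unbalanced braces/brackets.
  const closers: string[] = [];
  for (const ch of text) {
    if (ch === "{") closers.push("}");
    else if (ch === "[") closers.push("]");
    else if (ch === "}" || ch === "]") closers.pop();
  }
  text = text.trimEnd().replace(/,$/, "") + closers.reverse().join("");

  return JSON.parse(text);
}
```

String-level tricks like these only go so far, which is exactly the limitation discussed below.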
In real LLM outputs, these problems occur simultaneously:
```typescript
const llmOutput = `
> I'd be happy to help you with your order!
\`\`\`json
{
  "order": {
    "payment": "{\\"type\\":\\"card\\",\\"cardNumber\\":\\"1234-5678",
    "product": {
      name: "Laptop",
      price: "1299.99",
      quantity: 2,
    },
    "customer": {
      "name": "John Doe",
      "email": "john@example.com",
      vip: tru
\`\`\` `;

const result = func.parse(llmOutput);
// Markdown code block, explanation prefix, unquoted keys, trailing commas,
// double-stringify, string→number, incomplete keyword, unclosed brackets
// → 8 problems at once, all handled by a single parse() call.
```

Most JSON repair tools (jsonrepair, dirty-json, LangChain's parse_partial_json, etc.) work at the string level: fix trailing commas, close brackets, strip Markdown, then pass the result to JSON.parse(). The output is a syntactically valid JSON string. But a double-stringified value like "{\"type\":\"card\"}" passes through unscathed; it's already valid JSON (it's a string). Without knowing the schema, there's no way to know it should be an object.
Typia's parse() works differently. It doesn't repair a string and hand it off; it parses greedily while consulting the schema. When it encounters a string value where the schema expects an object, it re-enters parse() on that string, applying the same lenient recovery. The result feeds into type coercion, which may find another string-where-object-expected, triggering yet another round. Parsing and coercion call each other recursively, unwinding layers of stringify naturally: double, triple, however deep.
This is why Section 1.5's seven problems are solved in "a single parse() call." It's not seven separate fixes applied sequentially. It's a schema-driven recursive cycle where parsing and coercion are inseparable. And it's why double-stringify, the problem that made Qwen 3.5's success rate 0%, can't be solved by string-level repair. You need the schema to know what's supposed to be an object.
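The recursive unwinding can be sketched in a few lines. The Schema shape and unwrap() below are invented for illustration, not Typia's internals:

```typescript
// Sketch of schema-consulting unwrap: when the schema expects an object but
// the value is a string, parse the string and recurse, so double- or
// triple-stringified layers unwind naturally.
type Schema =
  | { type: "object"; properties: Record<string, Schema> }
  | { type: "string" }
  | { type: "number" };

function unwrap(value: unknown, schema: Schema): unknown {
  if (schema.type === "object") {
    // A string where an object is expected: re-enter parsing on the string.
    if (typeof value === "string") return unwrap(JSON.parse(value), schema);
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const [key, sub] of Object.entries(schema.properties)) {
      out[key] = unwrap(source[key], sub);
    }
    return out;
  }
  return value;
}
```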
2.3. Schema-Based Type Coercion: Correction That Knows the Schema
LLMs frequently get types wrong, not just structure. They write "42" (a string) where 42 (a number) is expected, and "true" (a string) where true (a boolean) is expected. A human would see these as equivalent, but to a program they're completely different types.
Naive type casting canβt solve this. Whether "42" should be a number or remain a string depends entirely on whether the schema for that field says number or string.
Typia's ILlmFunction.coerce() consults the JSON Schema and converts values to the type the schema expects:
| LLM output | Expected type | Result |
|---|---|---|
"42" | number or integer | 42 |
"true" / "false" | boolean | true / false |
"null" | null | null |
"{\"x\": 1}" | object | { x: 1 } (recursive parsing) |
"[1, 2, 3]" | array | [1, 2, 3] (recursive parsing) |
Here's what this looks like in practice:
```typescript
const fromLlm = {
  order: {
    payment: '{"type":"card","cardNumber":"1234-5678"}', // double-stringify
    product: {
      name: "Laptop",
      price: "1299.99", // string, but schema says number
      quantity: "2", // string, but schema says integer
    },
    customer: {
      name: "John Doe",
      vip: "true", // string, but schema says boolean
    },
  },
};

const result = func.coerce(fromLlm);
// result.order.product.price === 1299.99 (number)
// result.order.product.quantity === 2 (integer)
// result.order.customer.vip === true (boolean)
// result.order.payment === { type: "card", cardNumber: "1234-5678" } (object)
```

For union types (structures where one of several types is selected), Typia structurally analyzes the data to identify the correct variant, then applies that variant's coercion rules; no discriminator required.
This is the mechanism behind the Qwen 3.5 series' 0% → 100% from Section 1. The model's tendency to double-stringify objects in union types was solved at the infrastructure level using schema information.
When the SDK has already parsed the JSON (Anthropic SDK, Vercel AI, LangChain, MCP, etc.), use coerce() instead of parse().
2.4. Validation and Precise Feedback
Even after parsing and type coercion, the values themselves can be wrong. A negative number for a price, a non-email string in an email field, a decimal where an integer is required.
Typia's ILlmFunction.validate() detects these schema violations and pinpoints not just that something is wrong, but exactly where and why:
```typescript
const result = func.validate(input);
// Error example:
// {
//   path: "$input.order.product.price",
//   expected: "number & Minimum<0>",
//   value: -100
// }
```

"The price inside product inside order must be a number ≥ 0, but you gave -100." That's the level of precision.
LlmJson.stringify() then inserts these errors as `// ←` inline annotations directly onto the LLM's original JSON output:
```json
{
  "order": {
    "payment": {
      "type": "card",
      "cardNumber": 12345678 // ← [{"path":"$input.order.payment.cardNumber","expected":"string"}]
    },
    "product": {
      "name": "Laptop",
      "price": -100, // ← [{"path":"$input.order.product.price","expected":"number & Minimum<0>"}]
      "quantity": 2.5 // ← [{"path":"$input.order.product.quantity","expected":"number & Type<\"uint32\">"}]
    },
    "customer": {
      "email": "invalid-email", // ← [{"path":"$input.order.customer.email","expected":"string & Format<\"email\">"}]
      "vip": "yes" // ← [{"path":"$input.order.customer.vip","expected":"boolean"}]
    }
  }
}
```

The LLM can see exactly where and why it went wrong, right on top of its own JSON. With this feedback, there's no need to rewrite everything; just correct the five flagged fields and retry.
2.5. The Full Loop: Parse → Coerce → Validate → Feedback → Retry
Combining everything introduced so far into a single loop, we get the complete picture of the validation feedback loop from Section 1:
```typescript
async function callWithFeedback(
  llm: LLM,
  func: ILlmFunction,
  prompt: string,
  maxRetries: number = 10,
): Promise<unknown> {
  let feedback: string | null = null;
  for (let i = 0; i < maxRetries; i++) {
    // 1. Request function call from LLM (with previous feedback)
    const rawOutput = await llm.call(prompt, feedback);

    // 2. Lenient JSON parsing + type coercion
    const parsed = func.parse(rawOutput);
    if (!parsed.success) {
      feedback = `JSON parsing failed: ${JSON.stringify(parsed.errors)}`;
      continue;
    }

    // 3. Schema validation
    const validated = func.validate(parsed.data);
    if (!validated.success) {
      // 4. Generate structured feedback (// ← inline annotations)
      feedback = LlmJson.stringify(validated);
      continue;
    }

    // 5. Success
    return validated.data;
  }
  throw new Error("Max retries exceeded");
}
```

parse() rescues broken JSON and performs first-pass type correction. validate() catches schema violations. LlmJson.stringify() renders errors in a format the LLM can read. The LLM reads this feedback and corrects itself. This is the complete engine that turns 6.75% into 100%.
2.6. One Type Does It All
To sum up: define a single TypeScript type, and Typia handles the rest:
- Generates the schema → `typia.llm.parameters<T>()`, `typia.llm.application<T>()`
- Parses → `ILlmFunction.parse()` (broken JSON recovery + type coercion)
- Coerces → `ILlmFunction.coerce()` (type coercion for SDK-parsed objects)
- Validates → `ILlmFunction.validate()` (schema violation detection)
- Generates feedback → `LlmJson.stringify()` (LLM-readable `// ←` inline diagnostics)
No other tool provides this complete pipeline. Individual pieces exist elsewhere: JSON repair libraries handle broken syntax, Pydantic offers validation, some frameworks have retry loops. But the schema-driven recursive cycle of parse and coerce, combined with structural variant identification and inline error feedback, exists only in Typia.
The type is the schema, the validator, and the prompt.
3. The Case for Function Calling
So far, we've seen how function calling works through AutoBe and Typia. Now let's talk about why function calling is an effective methodology for domains that demand precision and correctness.
3.1. Natural Language vs. Types
Natural language is, well, natural. It evolved organically over millennia of human society, and ambiguity is a feature, not a bug. Metaphor, nuance, politeness, humor: all of it runs on ambiguity. "Just make it look nice" works as an instruction between humans.
Programming languages are designed. Someone intentionally built them to eliminate room for interpretation. "Just make it look nice" doesn't compile. Ambiguity is a bug.
When people communicate in natural language, they misunderstand each other and argue. When they communicate in types and schemas, there's no misunderstanding.
Let's contrast an LLM prompt with a type schema.
Expressing constraints via prompt:
"The age field must be a positive integer greater than 18. Don't use string types for numeric fields. All required fields must be present..."
Several problems are visible. Does "greater than 18" mean >18 or ≥18? There's no way to verify whether the LLM followed these rules without inspecting the output. And as the schema grows more complex, rules like these multiply endlessly.
Expressing constraints via types:
```typescript
interface IMember {
  /** Only adults aged 19 or older can register */
  age: number & Type<"uint32"> & ExclusiveMinimum<18>;
}
```

ExclusiveMinimum<18> means >18. It's an integer. It's required. Unambiguous and mechanically verifiable.
In domains that demand precise results, defining a schema and annotating each field is far clearer, easier, and more verifiable than writing a natural language prompt.
3.2. The Pink Elephant Problem
If you've ever built a prompt-based AI agent, you've written prohibition rules:
- "Do not create utility functions"
- "Do not use the `any` type"
- "Do not create circular dependencies"
When someone says "don't think of a pink elephant," a pink elephant is the first thing that comes to mind. When you tell an LLM "don't do X," X is placed at the center of its attention. To avoid a forbidden pattern, the model must first recall that pattern, which paradoxically increases the probability of generating it. This is inherent to the token prediction mechanism.
Even knowing this problem, you can't avoid prohibition rules in prompts. "Don't do X" is the only tool natural language has for expressing constraints. I've never seen a prompt-based AI agent that doesn't use prohibition rules.
In schemas, this problem doesn't exist.
You don't need to say "don't use the any type": if any isn't in the schema, the LLM physically can't produce it. You don't need to say "don't create utility functions": if there's no slot for utility functions in the schema, that's the end of it. If the field type is limited to "boolean" | "int" | "double" | "string" | "uri" | "uuid" | "datetime" (seven options), there's no path for the LLM to write "varchar".
Not prohibition, but absence. Prompts try to forbid what you don't want; schemas only permit what you do. This is why function calling is particularly effective in domains that demand precise output.
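The seven-option field type from the paragraph above, written out as a type (the runtime guard is illustrative, not from AutoBe's schemas):

```typescript
// Constraint through absence: only these seven field kinds exist in the
// schema, so "varchar" is not forbidden; it is unexpressible.
type FieldType =
  | "boolean" | "int" | "double" | "string"
  | "uri" | "uuid" | "datetime";

const FIELD_TYPES: readonly string[] = [
  "boolean", "int", "double", "string", "uri", "uuid", "datetime",
];

// What a validator does with an out-of-vocabulary value: reject it. No
// prohibition sentence ever needs to be written.
function isFieldType(value: string): value is FieldType {
  return FIELD_TYPES.includes(value);
}
```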
3.3. Model Neutrality
Prompt engineering is inherently model-dependent. A prompt optimized for GPT behaves differently on Claude, and differently again on Qwen. When a new model comes out or you want to experiment with a different one, it's not uncommon to rewrite prompts from scratch.
Function calling schemas are model-neutral. JSON Schema is JSON Schema. It means the same thing regardless of which model reads it, and the validation feedback loop absorbs any performance differences between models. A strong model gets it right in 1-2 attempts; a weaker model takes 3-4; both converge to 100%.
AutoBe running Qwen, GLM, DeepSeek, and OpenAI models on the same schemas, the same pipeline, achieving 100% compilation across the board, is proof of this neutrality. We've never done model-specific prompt tuning.
This changes the nature of model selection. It goes from "can this model do this task?" (a capability question) to "which model is the most cost-effective?" (a cost optimization problem): average retries × tokens per attempt × price per token.
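As arithmetic, the selection criterion is a one-liner. All numbers below are hypothetical, purely for illustration:

```typescript
// Cost model from the text: average retries × tokens per attempt × price
// per token (quoted here as USD per million tokens).
function expectedCost(
  avgRetries: number,
  tokensPerAttempt: number,
  usdPerMillionTokens: number,
): number {
  return avgRetries * tokensPerAttempt * (usdPerMillionTokens / 1_000_000);
}

// A strong, expensive model vs. a weaker, cheaper one that retries more:
const strong = expectedCost(1.5, 50_000, 8.0); // 0.60 USD per task
const weak = expectedCost(3.5, 50_000, 1.0); // 0.175 USD per task
// Both converge to 100%; here the cheaper model wins despite extra retries.
```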
3.4. The Core: Verifiability and the Feedback Loop
One thread runs through everything we've discussed.
The most powerful advantage of function calling is that it brings LLM output into the domain of software engineering.
If you let an LLM generate freeform text, determining whether that output is correct becomes yet another AI problem. Parsing is fuzzy. Validation is fuzzy. Correction is fuzzy. Everything is uncertain.
With function calling, the output is structured data. From that moment on, you can use the tools of software engineering:
- Verification is deterministic: JSON Schema validation yields a clear pass/fail
- Feedback is precise: "field X should be type Y, but you gave Z" can be identified exactly
- Correction converges: precise feedback enables the model to fix only the affected parts
These three form a deterministic chain. The model is still probabilistic and still makes mistakes, but the loop outside the model is deterministic, so the process converges to 100%.
Typed Schema + Deterministic Validator + Structured Error Feedback = Reliable LLM Output
If prompt engineering is about tinkering with the inside of the model, function calling is about making the outside of the model rock-solid. In domains that demand precision, the effectiveness of the latter approach is proven by results: 6.75% → 100%.
The LLM doesn't need to be accurate. It just needs to be correctable. And correctability is not a property of the model; it's a property of the validation infrastructure.
3.5. Application Spectrum: How Far Can This Go?
So is this pattern (function calling plus validation feedback) limited to coding? No. It forms a spectrum based on verifiability.
3.5.1. Domains Where All Output Is Verifiable
AutoBe's Database, Interface, Test, and Realize phases fall here. The compiler serves as the validator, guaranteeing 100% correctness.
This isn't unique to software. Any field where "correct or incorrect" can be mechanically determined supports the same structure, with a natural hierarchy based on verification cost:
| Domain | Fast (ms) | Medium (sec) | Deep (min+) |
|---|---|---|---|
| Software | Type check | Compilation | Test execution |
| Semiconductors | DRC | LVS | SPICE simulation |
| Chemical Process | Mass balance | Energy balance | Process simulation |
| Interior Design | Dimension / clearance | Code compliance, clash detection | Lighting / HVAC simulation |
| Control Systems | Transfer function validity | Stability / margin analysis | Time-domain simulation |
The feedback loop naturally exploits this hierarchy: run the cheapest validator first, fix the errors, then move to the next level.
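The cheapest-first strategy can be sketched as a generic loop (the interfaces below are invented for illustration, not AutoBe's actual types):

```typescript
// Run validators in ascending cost order; return the first tier that fails
// so the feedback loop can fix cheap errors before paying for deep checks.
interface IValidator<T> {
  name: string;
  run: (artifact: T) => string[]; // diagnostics; empty array means pass
}

function validateTiered<T>(
  artifact: T,
  tiers: IValidator<T>[],
): { tier: string; errors: string[] } | null {
  for (const tier of tiers) {
    const errors = tier.run(artifact);
    if (errors.length > 0) return { tier: tier.name, errors };
  }
  return null; // every tier passed
}
```

For software the tiers would be type check → compilation → test execution; for chip design, DRC → LVS → SPICE.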
3.5.2. The Pattern in Practice
The table above shows the hierarchy in overview. Here's what the pattern looks like as concrete types, each from an engineering field with validators refined over decades.
Semiconductors. Physical rules in chip design are non-negotiable:
```typescript
interface IChipLayout {
  technology_node: "5nm" | "7nm" | "14nm" | "28nm";
  blocks: IBlock[];
  connections: IConnection[];
}
interface IBlock {
  type: "logic" | "memory" | "io" | "analog" | "pll";
  position: IPoint2D;
  dimensions: IDimension;
  sub_blocks: IBlock[]; // recursive hierarchy
}
```

DRC (Design Rule Check, fast), LVS (Layout vs. Schematic, medium), SPICE simulation (slow). Costs vary by tier, but all are deterministic validations. The feedback loop starts with the cheapest: DRC.
Chemical Process. Conservation laws are absolute validators:
```typescript
interface IProcessStream {
  temperature: number & Minimum<0>; // Kelvin
  pressure: number & Minimum<0>; // Pa
  composition: IComponent[]; // sum must equal 1.0
  phase: "liquid" | "vapor" | "solid" | "two_phase";
  flow_rate: number & Minimum<0>; // kg/s
}

interface IUnitOperation {
  type: "reactor" | "distillation" | "heat_exchanger"
    | "compressor" | "pump" | "mixer" | "splitter";
  inlet_streams: IProcessStream[];
  outlet_streams: IProcessStream[]; // mass balance: Σin = Σout
  energy_duty: number; // energy balance
}
```

Mass conservation (Σ inlet = Σ outlet), energy balance, thermodynamic consistency: these are the laws of physics, not opinions. Tools like ASPEN and HYSYS have provided deterministic validation for over 40 years. Mass balance check (fast) → energy balance (medium) → full process simulation (deep).
Interior Design. Beneath the aesthetics, hard constraints define the space:
```typescript
interface IRoom {
  type: "bedroom" | "living" | "kitchen" | "bathroom"
    | "office" | "hallway" | "storage";
  dimensions: IDimension3D;
  openings: IOpening[];
  fixtures: IFixture[];
}

interface IOpening {
  type: "door" | "window" | "sliding_door" | "arch";
  width: number & Minimum<0>; // door ≥ 900mm (accessibility)
  height: number & Minimum<0>;
  position: IPoint3D;
  swing_direction?: "inward" | "outward" | "sliding";
}

interface IFixture {
  type: "cabinet" | "counter" | "appliance"
    | "furniture" | "lighting" | "plumbing";
  position: IPoint3D;
  dimensions: IDimension3D;
  clearance_required: number; // min clear space (mm)
}
```

People think of interior design as purely aesthetic, but it's built on hard constraints: minimum passage width (800mm), door width for accessibility (≥ 900mm), fire compartment regulations, emergency egress distances. BIM tools like Revit have provided clash detection for decades. Dimension and clearance checks (fast) → building code compliance and collision detection (medium) → lighting (lux) and HVAC simulation (deep).
Control Systems. Stability is mathematically provable:
```typescript
interface IControlLoop {
  type: "PID" | "MPC" | "LQR" | "feedforward" | "cascade";
  plant_model: ITransferFunction;
  setpoint: number;
  sampling_rate: number & Minimum<0>; // Hz
  constraints: IConstraint[];
}

interface ITransferFunction {
  numerator: number[]; // polynomial coefficients
  denominator: number[]; // degree ≥ numerator
  delay: number & Minimum<0>; // transport delay (sec)
}
```

A control system is either stable or it isn't, and this can be proven mathematically. Bode plots, Nyquist diagrams, pole placement: over 60 years of established analysis tools. Transfer function validity (fast) → stability and gain/phase margin analysis (medium) → full time-domain simulation (deep).
Look at these types again. Every one has a type field with enumerated variants: "logic" | "memory" | ..., "reactor" | "distillation" | ..., "bedroom" | "living" | ..., "PID" | "MPC" | .... Several nest recursively. These are the same union + tree structures as AutoBe's IJsonSchema and IExpression. This isn't coincidence; it's the nature of engineering data. Appendix A.3 explains why.
Note: These domain examples are AI-recommended; all are engineering fields where deterministic validators have been established for decades, so the pattern fits in principle. That said, I'm a developer, not a domain expert, so take the specifics with a grain of salt.
3.5.3. Where This Doesnβt Apply
Conversely, this pattern doesn't fit domains where deterministic validators can't be built: creative writing, emotional intelligence, strategic decision-making. There's no validator for "a good novel" or "a wise business decision." I'll acknowledge that honestly.
4. Why Qwen
4.1. Function Calling Performance: Best in Class for Small/Medium Models
Let me start with the most direct answer to "why Qwen?"
AutoBe's entire pipeline is function calling. Whether a model writes good prose or carries on a smooth conversation doesn't matter. The only criterion is how accurately it fills complex JSON Schemas.
Qwen is not the only open-weight model that does function calling well. GLM, Kimi, and others deliver strong function calling performance at large model scales. But at the small and medium scale, Qwen was the only one that could handle function calling of this complexity.
Even a compact 3B-active-parameter MoE model supports tool choice and processes complex schemas containing 10+ variant recursive unions. For AutoBe, this small/medium-scale performance was decisive; the reasons why unfold in the following sections.
4.2. R&D Cost: Users vs. Developers
For customers using AutoBe, model cost isn't an issue. Even the most expensive model is cheaper than actually hiring a backend developer.
But for us developing AutoBe, it's different. Every time we design a new type or add a new validation rule, we need to run the entire pipeline end to end. Thousands of generate-compile-feedback cycles. Using commercial models every time would be financially ruinous.
Local models make this R&D cycle possible. We can experiment without limit, without worrying about cost. The journey from 6.75% to 100% required hundreds of experiment cycles, and that was only possible because the models were local.
4.3. Small Models Make the Best QA Engineers
Large models make fewer mistakes. That's an advantage, and simultaneously a disadvantage.
Even when our validation has blind spots we haven't thought of, large models rarely trigger those failures. They "guess correctly" through ambiguous parts of the schema and get it right. Our mistakes stay hidden.
Switch to a small model, and the story changes. These are separate models from the four in Section 1.6: smaller or non-coding-optimized variants we used specifically for QA:
| Model | Active / Total | Success rate | What it found |
|---|---|---|---|
| qwen3-30b-a3b-thinking | 3B / 30B | ~10% | Fundamental schema ambiguities, missing required fields |
| qwen3-next-80b-a3b-instruct | 3B / 80B | ~20% | Subtle type mismatches in complex nested relationships |
The 10% success rate was the most valuable result. Every failure pointed to a gap in our system, and each fix strengthened the pipeline not just for weak models, but for all models.
AI is probabilistic. Large models make mistakes less often, not never. Counterexamples that surface with small models will eventually occur with large models too, just rarely. In production, "rarely" is an outage.
When the schema is precise enough that even a 3B-active model can't misinterpret it, the probability of a strong model getting it wrong converges to effectively zero.
4.4. No Vendor Lock-In
Price changes, model deprecation, and rate limits for commercial APIs are entirely at the vendor's discretion. The model you use today could disappear tomorrow.
AutoBe's function calling schemas are designed to be model-neutral. We don't use model-specific prompt tricks. JSON Schema and type-based validation are industry standards; the code stays the same even when the model changes.
4.5. Open Source + Open Weights: A Virtuous Cycle
AutoBe is open source (AGPL 3.0), and Qwen is open-weight. Both are part of the open ecosystem.
This combination is what made thousands of experiments possible, what made edge case discovery possible, and what made system hardening possible. With commercial models, experimentation at this scale would have been financially impossible.
The open ecosystem creates a virtuous cycle of mutual reinforcement:
- AutoBe hardens its system using Qwen
- The hardened system proves Qwen's production-level viability
- Improvements to Qwen raise AutoBe's overall performance
- AutoBe's discoveries (like the double-stringify issue) can contribute to Qwen's improvement
5. Closing
AutoBe achieved 100% compilation success across all four Qwen models through an all-in function calling strategy.
What made it possible was neither smarter prompts nor more sophisticated orchestration. It was the type-based infrastructure Typia provides (automatic schema generation, lenient parsing, type coercion, validation feedback) deterministically overcoming the model's probabilistic limitations.
When you communicate in types, there's no misunderstanding. When you constrain with schemas, there's no pink elephant. When you have a deterministic validation loop, even 6.75% becomes 100%.
This pattern isn't limited to coding. It's transferable to any engineering field where deterministic validators exist.
And what made all of this experimentation and validation possible was Qwen, an open-weight model.
The LLM doesn't need to be accurate. It just needs to be correctable.
About AutoBe: AutoBe is an open-source AI agent developed by Wrtn Technologies. It generates production-ready backend applications from natural language.
About Typia: Typia is a compiler library that automatically generates runtime validators, JSON Schema, and function calling schemas from TypeScript types.
Appendix: Technical Deep Dives
Union types appear throughout this talk, from start to finish: the 10 variants of IJsonSchema (Section 1.3), the 30+ variants of IExpression (Section 1.3), Qwen 3.5's double-stringify issue (Section 1.5), type coercion (Section 2.3), validation feedback (Section 2.4). Sections A.1-A.4 dive deep into why union types are the make-or-break challenge for function calling infrastructure. Section A.5 explores a separate capability that Typia's schema-driven parsing makes possible.
A.1. What Is a Discriminated Union?
A union type represents "one of several kinds." For example, if a payment method can be either a card or a bank transfer:
```typescript
type Payment =
  | { type: "card"; cardNumber: string; cvc: string }
  | { type: "bank_transfer"; bankCode: string; accountNumber: string };
```

A discriminated union is a union that has a discriminator field: a single field whose value determines which variant the data belongs to. In the example above, type is the discriminator. If type is "card", the data has cardNumber and cvc; if it's "bank_transfer", it has bankCode and accountNumber. A single discriminator value determines the rest of the structure.
Why does this matter? When an LLM generates data for a union type and makes a mistake, correcting it requires first knowing which variant the data was intended to be. A discriminator makes this identification straightforward: check one field, know the variant. Without one, intent must be inferred from the data's shape, which is harder but still possible with the right infrastructure.
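The leverage a discriminator gives is visible directly in TypeScript: one field read narrows the whole structure. A minimal sketch using the same Payment union; the `describe` helper is a hypothetical illustration:

```typescript
// Same Payment union as above, redefined so this block is self-contained.
type Payment =
  | { type: "card"; cardNumber: string; cvc: string }
  | { type: "bank_transfer"; bankCode: string; accountNumber: string };

// Reading the single discriminator field narrows the type: inside each
// branch, only that variant's fields are visible, so error messages (and
// repairs) can target variant-specific fields by name.
function describe(p: Payment): string {
  switch (p.type) {
    case "card":
      return `card ending ${p.cardNumber.slice(-4)}`;
    case "bank_transfer":
      return `transfer to ${p.bankCode}:${p.accountNumber}`;
  }
}
```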
AutoBe's IJsonSchema (10 variants) and IExpression (30+ variants) are all discriminated unions, and Typia's ability to structurally identify variants and generate precise per-field feedback is the core mechanism behind 6.75% → 100%.
A.2. Typia's x-discriminator: Adding Intelligence to anyOf
The JSON Schema standard offers two ways to represent union types: anyOf (matches any) and oneOf (matches exactly one). But neither carries "which field distinguishes the variants"; they just say "match one of these schemas."
OpenAPI 3.x has a discriminator, but it's exclusive to oneOf, and most LLMs don't handle oneOf reliably.
Typia solves this with a plugin property called x-discriminator. It uses anyOf, which LLMs broadly support, while attaching discriminator metadata:
```jsonc
// Schema generated by Typia (simplified)
{
  "anyOf": [
    { "type": "object", "properties": { "type": { "const": "card" }, "cardNumber": { ... } } },
    { "type": "object", "properties": { "type": { "const": "bank_transfer" }, "bankCode": { ... } } }
  ],
  "x-discriminator": {
    "propertyName": "type",
    "mapping": {
      "card": "#/$defs/CardPayment",
      "bank_transfer": "#/$defs/BankTransferPayment"
    }
  }
}
```

This serves a distinct purpose from Typia's internal processing. Typia's coerce() and validate() identify the correct variant through structural analysis of the data itself, matching property names, types, and shapes against each variant's schema. This works with or without a discriminator.
x-discriminator is LLM-facing. It tells the model "use the type field to select a variant," reducing the chance of the LLM generating structurally ambiguous data in the first place. Better input from the LLM means fewer corrections needed downstream.
The two work in tandem:
- `x-discriminator` reduces errors at the source: the LLM reads the hint and generates data that more clearly belongs to one variant
- Typia's structural analysis handles the rest: `coerce()` identifies the variant and applies variant-specific coercion (including double-stringify unwrapping for Qwen 3.5), while `validate()` identifies the variant and produces precise per-field errors, not "doesn't match any of 10 variants" but "card variant's cardNumber should be string, but you gave number"
x-discriminator makes the LLM smarter; Typia's structural engine makes the infrastructure robust. This is why the type coercion from Section 2.3 and validation feedback from Section 2.4 work reliably on union types.
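The structural-analysis side can be sketched as variant scoring: rank each variant by how well the data's keys match its declared properties, then validate against the best match only. This illustrates the idea; it is not Typia's actual algorithm, and `IVariantSchema`/`identifyVariant` are hypothetical names:

```typescript
// Pick the variant whose declared properties best overlap the data's keys,
// then report per-field errors against that single variant.
interface IVariantSchema {
  name: string;
  properties: Record<string, "string" | "number" | "boolean">;
}

function identifyVariant(
  data: Record<string, unknown>,
  variants: IVariantSchema[],
): { variant: IVariantSchema; errors: string[] } {
  const keys = Object.keys(data);
  let best = variants[0];
  let bestScore = -1;
  for (const v of variants) {
    const declared = Object.keys(v.properties);
    const score = keys.filter((k) => declared.includes(k)).length;
    if (score > bestScore) { bestScore = score; best = v; }
  }
  // Errors name the chosen variant's fields, not "matches nothing".
  const errors = Object.entries(best.properties)
    .filter(([k, t]) => typeof data[k] !== t)
    .map(([k, t]) => `${best.name}.${k}: expected ${t}, got ${typeof data[k]}`);
  return { variant: best, errors };
}
```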
A.3. The World Is Made of Recursive Unions
Engineering manages complexity through hierarchical decomposition: break a big system into smaller parts, break those into smaller parts still. A chip is blocks; blocks are sub-blocks. A plant is sections; sections are units. A building is floors; floors are rooms. This decomposition is a tree. And at each level, the parts come in different kinds: a block can be logic, memory, or IO; a unit can be a reactor, distillation column, or heat exchanger. The moment a tree's nodes have kinds, it becomes a recursive union type.
The engineering domains from Section 3 are no exception:
- Semiconductors: `IBlock`'s `sub_blocks: IBlock[]` (chip → block → sub-block)
- Chemical Process: plants → sections → units → sub-units (recursive process hierarchy)
- Interior Design: buildings → floors → rooms → zones (recursive spatial decomposition)
- Control Systems: cascade control, where the outer loop's output becomes the inner loop's setpoint (recursive nesting)
These are structurally identical to AutoBe's IJsonSchema (10 variants) and IExpression (30+ variants). They're all ASTs: abstract syntax trees. This isn't a coincidence specific to these four fields. Hierarchical decomposition is how engineers manage complexity, and hierarchical decomposition produces recursive union types. Any deterministic engineering domain that structures its data (which is virtually all of them) will share this structure.
In Section 3, we said that if a domain's output is verifiable, the function calling + validation feedback pattern is transferable. But if the data structures of those domains are all recursive unions, then conquering union types is the prerequisite for that transfer.
If coercion doesn't work on union types, Qwen 3.5's double-stringify problem will surface in chip design too. If validation feedback doesn't work on union types, "doesn't match any of 30 variants" won't get the feedback loop to converge. If you can't identify which variant the data was intended to be, correction itself is impossible.
Typia's structural variant identification, schema-based coercion, and precise per-field validation are the solution for this universal structure. AutoBe's 6.75% → 100% is not just an achievement in code generation. It's the establishment of 100% reliability on the universal structure of recursive unions, an achievement transferable to every structured domain that shares this structure.
A.4. Why Not Zod?
Zod is the most popular runtime validation library in the TypeScript ecosystem. "Why don't you use Zod?" is a question we get often.
Let's see what happens when you try to define an AutoBe-scale, 30+ variant recursive discriminated union in Zod:
```typescript
const ExpressionSchema: z.ZodType<IExpression> = z.lazy(() =>
  z.discriminatedUnion("type", [
    z.object({ type: z.literal("booleanLiteral"), value: z.boolean() }),
    z.object({
      type: z.literal("callExpression"),
      expression: ExpressionSchema, // circular reference
      arguments: z.array(ExpressionSchema), // circular reference
    }),
    // ... 28 more
  ])
);
```

Three problems.
First, you must define TypeScript types and Zod schemas separately.
Zod's official documentation states this explicitly: "you can define a recursive schema in Zod, but because of a limitation of TypeScript, their type can't be statically inferred." When you use z.lazy(), z.infer doesn't work, so you must define a TypeScript interface separately and pass it manually via z.ZodType<T>:
```typescript
// 1. Define the TypeScript type first
type IExpression =
  | { type: "booleanLiteral"; value: boolean }
  | { type: "callExpression"; expression: IExpression; arguments: IExpression[] }
  | { type: "binaryExpression"; left: IExpression; operator: string; right: IExpression }
  // ... 27 more

// 2. Define the Zod schema separately, manually linking the type hint
const ExpressionSchema: z.ZodType<IExpression> = z.lazy(() =>
  z.discriminatedUnion("type", [
    z.object({ type: z.literal("booleanLiteral"), value: z.boolean() }),
    z.object({ type: z.literal("callExpression"), expression: ExpressionSchema, arguments: z.array(ExpressionSchema) }),
    z.object({ type: z.literal("binaryExpression"), left: ExpressionSchema, operator: z.string(), right: ExpressionSchema }),
    // ... 27 more
  ])
);
```

For a 30+ variant recursive union, this dual definition runs to hundreds of lines. Over time the two drift apart, and there's nothing to catch the mismatch.
Second, even with dual definitions, it won't compile.
As the depth of recursive unions increases, you hit TypeScript's generic instantiation limit:

```
TS2589: Type instantiation is excessively deep and possibly infinite.
```
Why does this happen with Zod but not with native TypeScript types? The difference is how the type checker resolves recursive references.
When TypeScript encounters IExpression in a native type alias, the recursive reference is a name lookup: a pointer back to the same definition. 30 variants referencing IExpression in their fields? 30 pointer lookups. O(N), linear.
In Zod, z.discriminatedUnion is a deeply nested generic. TypeScript must structurally expand each variant's output type through Zod's internal conditional types. When it hits ExpressionSchema inside a variant, z.lazy() forces re-entry into the full union: N variants × K recursive fields, each triggering another full expansion. Depth 0: N. Depth 1: N·K. Depth 2: (N·K)². For N=30, K=2, depth 3 alone is 216,000 type resolutions. O((N·K)^d), exponential.
This is the most recurrently reported error in Zod's issue tracker. #577, #5064, #5256: all recursive schemas, all TS2589, all unresolved in Zod v4. Discussion #1459 even shows the same error with complex discriminated unions that aren't recursive at all; the generic expansion is expensive enough on its own.
The practical consequence goes beyond compilation. The TypeScript language server runs the same type checker for IDE features: autocompletion, hover types, error highlighting. With a 30+ variant recursive Zod schema, the language server enters the same exponential expansion, memory spikes to gigabytes, and the IDE freezes, not just in the file where the schema is defined but in every file that imports it. The development environment becomes unusable.
Third, even after enduring all of that, validation feedback is fundamentally impossible.
This is the most critical problem.
When validation fails on a union type, Zod can't determine which variant a value was intended to be. In a 10-variant union, errors either flood out for all variants at once (#792), or, if the discriminator doesn't match, errors for other fields are silently hidden (#2202). In Zod v4, this actually regressed: on discriminator mismatch, it returns an empty error array and "No matching discriminator" (#4909, #5670).
Think about it from the LLM's perspective. If it intended a callExpression variant but got the arguments field's type wrong, it needs feedback like "arguments should be an IExpression array, but you gave a string." But Zod says "doesn't match any of 10 variants." Feedback that doesn't tell you what to fix isn't feedback at all.
Typia structurally identifies the intended variant by analyzing the data's shape, then generates precise per-field errors against that variant's schema. This is the prerequisite for the validation feedback loop to converge, and Zod lacks this mechanism entirely.
In summary: with Zod, you get dual definitions, compilation failure, and, even then, no feedback loop. The very engine behind AutoBe's 6.75% → 100% simply cannot exist on top of Zod.
With Typia, a single TypeScript interface is all you need:
```typescript
const result = typia.validate<AutoBeTest.IExpression>(llmOutput);
```

It operates at the compiler level, so it handles types of any complexity. No separate schema definitions, no generic depth limits, no incomplete error messages.
A.5. Beyond the Token Limit: Incremental Structured Output
Function calling has an unspoken constraint: the entire JSON must fit in a single response. If the model's max output is 32K tokens and the target JSON is 100K tokens, the output gets truncated mid-JSON. With JSON.parse(), a truncated JSON is a failed JSON. The entire generation is wasted.
This is the structured output equivalent of non-incremental compilation: if any part fails, you throw everything away and start from scratch.
Typia's schema-driven lenient parsing changes this equation. Because parse() auto-closes unclosed brackets, completes incomplete values, and applies type coercion recursively, a truncated JSON isn't a failure. It's a DeepPartial<T>: a typed object where completed fields are valid and missing fields are identifiable by the schema.
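The bracket-closing part of this can be sketched in plain TypeScript. `parseTruncated` is a hypothetical toy that only auto-closes open strings and brackets and drops a dangling key; Typia's schema-driven parse() does far more (value completion, coercion, typed partials):

```typescript
// Recover a parseable object from JSON that was cut off mid-stream:
// scan to find unclosed strings/brackets, trim a dangling key or comma,
// then append the missing closers and JSON.parse the result.
function parseTruncated(text: string): unknown {
  const stack: string[] = [];
  let inString = false;
  let escaped = false;
  for (const ch of text) {
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') inString = true;
    else if (ch === "{") stack.push("}");
    else if (ch === "[") stack.push("]");
    else if (ch === "}" || ch === "]") stack.pop();
  }
  let repaired = text;
  if (inString) repaired += '"'; // close an unterminated string
  // Drop a dangling key or trailing comma, e.g. `{"a":1,"b` or `{"a":1,"b":`
  repaired = repaired.replace(/,\s*("[^"]*")?\s*:?\s*$/, "");
  while (stack.length) repaired += stack.pop(); // close open brackets
  return JSON.parse(repaired);
}
```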
```
Turn 1: LLM generates 32K tokens → truncated mid-JSON
        → Typia parse() → DeepPartial<T>
        → Schema diff: "these fields are still missing"
Turn 2: "Fill in the remaining fields" + previous DeepPartial<T>
        → LLM generates next chunk → Typia parse()
        → DeepPartial<T> updated, validate() on completed subtrees
Turn N: → All fields present → validate() passes → T
```

At each turn, parse() recovers the truncated output, coerce() ensures correct types on what exists, and validate() can run on completed subtrees before the whole object is finished. Errors surface incrementally, not at the end.
This is incremental compilation applied to structured output. A traditional compiler recompiles everything on each run; an incremental compiler reuses previous results and only processes what changed. Similarly, traditional function calling discards truncated output and retries from scratch; Typia's approach reuses every valid field and only asks the LLM to fill what's missing.
The implication is that function calling's output size is no longer bounded by max_output_tokens. A 200K-token JSON, far beyond any single model response, can be built incrementally across multiple turns, with type safety maintained at every step. The schema tells you what you have and what you need; the lenient parser ensures nothing is wasted.
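The multi-turn loop can be sketched end to end with a mock in place of the model. Everything here (`Doc`, `REQUIRED`, `missingFields`, `mockLlm`, `fillIncrementally`) is a hypothetical stand-in; in the real pipeline the schema diff and the chunks come from Typia and the LLM:

```typescript
// A tiny target type and a schema-diff stand-in: which required
// fields are still absent from the accumulated partial?
interface Doc { title?: string; body?: string; tags?: string[] }
const REQUIRED: (keyof Doc)[] = ["title", "body", "tags"];

function missingFields(partial: Doc): (keyof Doc)[] {
  return REQUIRED.filter((k) => partial[k] === undefined);
}

// Each "turn" yields only one field, the way a truncated response would.
function mockLlm(missing: (keyof Doc)[]): Partial<Doc> {
  const bank: Doc = { title: "t", body: "b", tags: ["x"] };
  const out: Partial<Doc> = {};
  if (missing.length > 0) (out as any)[missing[0]] = bank[missing[0]];
  return out;
}

// Merge each chunk into the accumulator until the schema diff is empty
// (where validate() would finally pass on the complete object).
function fillIncrementally(maxTurns = 10): Doc {
  let acc: Doc = {};
  for (let turn = 0; turn < maxTurns; turn++) {
    const missing = missingFields(acc);
    if (missing.length === 0) break;
    acc = { ...acc, ...mockLlm(missing) };
  }
  return acc;
}
```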