
Preface

The AutoBE Delta roadmap focuses on transitioning from the horizontal expansion of Gamma to vertical deepening.

In the Gamma roadmap, we rapidly implemented various features such as RAG, Modularization, and Complementation under the “just ship it” philosophy. By prioritizing breadth of features over quality, AutoBE grew into a platform covering all areas of backend generation, but at the same time, stability gaps remained throughout.

In Delta, we fill these gaps. Through Local LLM benchmarks, we discover logic defects and validation omissions that commercial models never exposed, and fix them systematically. We also complete the Hybrid Search system by adding Vector Similarity search to the RAG introduced in Gamma, and fully launch Multi-lingual Support (Java/Kotlin).

1. Local LLM Benchmark


Todo

qwen/qwen3-next-80b-a3b-instruct

| Phase | Output | Token Usage | In / Out | Function Calls |
| --- | --- | --- | --- | --- |
| Analyze | actors: 3, documents: 12 | 1.09M | 930.7K / 154.3K | 31 / 42 (73.8%) |
| Database | namespaces: 3, models: 13 | 1.57M | 1.54M / 38.0K | 40 / 43 (93.0%) |
| Interface | operations: 18, schemas: 19 | 28.01M | 27.86M / 146.6K | 159 / 271 (58.7%) |
| Test | functions: 19 | 2.61M | 2.58M / 26.4K | 40 / 52 (76.9%) |
| Realize | functions: 22, errors: 3 | 10.24M | 10.06M / 171.7K | 135 / 187 (72.2%) |

Function Calling Success Rate: 67.30% · Elapsed Time: 1h 47m 45s · Total Tokens: 76.79M (in: 75.88M, 16.4K cached / out: 902.4K)

Bbs

qwen/qwen3-next-80b-a3b-instruct

| Phase | Output | Token Usage | In / Out | Function Calls |
| --- | --- | --- | --- | --- |
| Analyze | actors: 2, documents: 11 | 1.05M | 924.7K / 126.4K | 26 / 42 (61.9%) |
| Database | namespaces: 9, models: 53 | 6.42M | 6.26M / 155.0K | 156 / 167 (93.4%) |
| Interface | operations: 293, schemas: 297 | 352.91M | 349.25M / 3.65M | 2786 / 4027 (69.2%) |
| Test | functions: 169 | 138.02M | 136.58M / 1.44M | 574 / 2217 (25.9%) |
| Realize | functions: 110, errors: 30 | 60.27M | 58.67M / 1.60M | 413 / 911 (45.3%) |

Function Calling Success Rate: 53.71% · Elapsed Time: 6h 51m 42s · Total Tokens: 558.67M (in: 551.70M, 16.4K cached / out: 6.97M)

Reddit

qwen/qwen3-next-80b-a3b-instruct

| Phase | Output | Token Usage | In / Out | Function Calls |
| --- | --- | --- | --- | --- |
| Analyze | actors: 3, documents: 11 | 1.46M | 1.31M / 150.8K | 27 / 54 (50.0%) |
| Database | namespaces: 9, models: 90 | 10.70M | 10.36M / 344.5K | 228 / 265 (86.0%) |
| Interface | operations: 507, schemas: 515 | 741.29M | 734.04M / 7.25M | 4992 / 7522 (66.4%) |
| Test | functions: 781 | 435.87M | 423.32M / 12.55M | 4343 / 4736 (91.7%) |
| Realize | - | 60.87M | 59.74M / 1.13M | 486 / 1037 (46.9%) |

Function Calling Success Rate: 74.01% · Elapsed Time: 6h 12m 6s · Total Tokens: 1250.18M (in: 1228.76M, 0 cached / out: 21.43M)

Shopping

qwen/qwen3-next-80b-a3b-instruct

| Phase | Output | Token Usage | In / Out | Function Calls |
| --- | --- | --- | --- | --- |
| Analyze | actors: 3, documents: 12 | 1.81M | 1.55M / 263.2K | 33 / 54 (61.1%) |
| Database | namespaces: 11, models: 100 | 11.77M | 11.42M / 349.7K | 253 / 269 (94.1%) |
| Interface | operations: 560, schemas: 641 | 1001.00M | 992.30M / 8.70M | 5840 / 9409 (62.1%) |
| Test | functions: 557 | 495.09M | 486.86M / 8.23M | 3032 / 5141 (59.0%) |
| Realize | functions: 241, errors: 94 | 134.35M | 131.08M / 3.27M | 957 / 1987 (48.2%) |

Function Calling Success Rate: 59.99% · Elapsed Time: 14h 55m 1s · Total Tokens: 1644.03M (in: 1623.21M, 99.2K cached / out: 20.82M)

Commercial models like Claude Sonnet or GPT are smart. Even if system prompts are somewhat ambiguous or there are gaps in validation feedback logic, they rarely create situations that trigger those issues. This also means it’s difficult for developers to discover defects. Since everything works, we mistakenly assume there are no problems.

However, when running the same workflow with open-source models like qwen3-next-80b-a3b or qwen3-30b-a3b-thinking, the story changes. Workflows that never failed with commercial models frequently crash with Qwen3. They reference non-existent DB tables, use reserved words for DTO property names, and put description or non-existent spec data inside JSON Schema properties.

This is Delta's core strategy: using Qwen3 as a touchstone to surface hidden defects, then fixing them, ensures more robust operation even with commercial models. We benchmark each Phase (Database, Interface, Test, and Realize), repeatedly analyzing failure causes and making improvements. When one Phase reaches a 100% success rate, we move to the next, then analyze and improve again. This simple but tedious repetition is the foundation of Delta.

2. Validation Logic Enhancement

Schema and logic must be perfect before prompts. Edge cases that weren’t problematic with commercial models are exposed with Qwen3, revealing gaps in existing designs. We strengthen schemas and validation logic to be flawless, continuously improving edge cases discovered during benchmarking.

2.1. Dynamic Function Calling Schema

The Preliminary Asset introduced in Gamma allows subsequent Phases to access outputs from previous Phases. For example, when designing DTOs in Interface Phase, you can call getDatabaseTable(name: string) to reference table schemas designed in Database Phase. Test Phase uses getInterfaceOperation(endpoint: string) to get API endpoint information.

The problem is that these functions’ parameter types are statically declared as string. Even when the system prompt states “do not request non-existent endpoints” and validation feedback returns errors, Qwen3 rarely follows this.

Therefore, we construct dynamic function calling schemas for preliminary asset-related functions. By restricting requestable table names or endpoints to enum unions, we completely block access to non-existent DB tables or API operations at the type level.

```typescript
// Before: Static schema
interface IAutoBePreliminaryGetDatabaseSchemas {
  type: "getDatabaseSchemas";
  schemaNames: string[] & tags.MinItems<1>;
}

// After: Dynamic schema
interface IAutoBePreliminaryGetDatabaseSchemas {
  type: "getDatabaseSchemas";
  schemaNames: Array<
    | "shopping_customers"
    | "shopping_sales"
    | "shopping_sale_snapshots"
    | "shopping_sale_snapshot_units"
    | ...
  > &
    tags.MinItems<1>;
}
```
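A dynamic schema implies regenerating the function calling definition per run. Below is a minimal sketch of that assembly, assuming the Database Phase output exposes its model names; the names and function are illustrative, not the actual AutoBE API.

```typescript
// Sketch: build a JSON Schema enum union from the tables that actually
// exist in the Database Phase output (illustrative, not the real API).
interface IDatabaseModel {
  name: string;
}

function buildSchemaNamesParameter(models: IDatabaseModel[]) {
  return {
    type: "array",
    minItems: 1,
    items: {
      type: "string",
      // Restrict requestable names to tables that really exist, so a
      // request for a non-existent table is rejected at the type level.
      enum: models.map((m) => m.name),
    },
  };
}

// Usage: regenerate the parameter schema before each agent run.
const parameter = buildSchemaNamesParameter([
  { name: "shopping_customers" },
  { name: "shopping_sales" },
]);
```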

2.2. JSON Schema Validator

When designing DTOs in Interface Phase, Qwen3 makes a wide variety of JSON Schema errors. From simple constraint errors to structural defects, the types are extensive:

  • Placing Object type metadata (description, AutoBeOpenApi.IJsonSchema.IObject["x-autobe-database-schema"]) inside properties
  • Specifying AutoBeOpenApi.IJsonSchema.IObject["x-autobe-database-schema"] on non-Object types
  • Constructing mutually contradictory JSON schema constraints (e.g., minimum > maximum)
  • Mixed case in DTO type/property names and use of system reserved words
  • Non-compliance with UUID type for id and _id properties

Various other edge cases are continuously discovered during benchmarking. We add validators to verify these errors, preventing malformed schemas from propagating to subsequent Phases.
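As an illustration, here is a sketch of how two of these checks might look, over deliberately simplified schema types; the real validator set would be far broader.

```typescript
// Simplified schema shape for the sketch; the real validator operates on
// AutoBeOpenApi.IJsonSchema.
interface IJsonSchemaLike {
  type?: string;
  description?: string;
  minimum?: number;
  maximum?: number;
  properties?: Record<string, unknown>;
}

function validateSchema(name: string, schema: IJsonSchemaLike): string[] {
  const errors: string[] = [];
  // Mutually contradictory numeric constraints (minimum > maximum).
  if (
    schema.minimum !== undefined &&
    schema.maximum !== undefined &&
    schema.minimum > schema.maximum
  )
    errors.push(`${name}: minimum (${schema.minimum}) > maximum (${schema.maximum})`);
  if (schema.properties !== undefined)
    for (const [key, value] of Object.entries(schema.properties)) {
      // Object metadata leaked into `properties` shows up as a
      // non-schema value (e.g., a bare description string).
      if (typeof value !== "object" || value === null)
        errors.push(`${name}.properties.${key}: expected a schema object, got ${typeof value}`);
      else errors.push(...validateSchema(`${name}.${key}`, value as IJsonSchemaLike));
    }
  return errors;
}
```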

2.3. Validation Feedback Stringify

Previously, when runtime validation failed, the IValidation.IFailure value returned by typia.validate<T>() was fed back directly to the agent. However, this struct contains only mechanical error information, and Qwen3 often couldn’t accurately understand what went wrong.

We develop a custom JSON stringify function that annotates validation error details as comments. By specifying concrete error reasons in // ERROR: ... format next to properties where errors occurred, agents can intuitively understand what to fix and how.
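A minimal sketch of such a stringify, assuming typia-style errors that carry a dot-separated path, an expected type expression, and the received value; matching on the last path segment is a simplification for brevity.

```typescript
// Simplified error shape, modeled after typia's IValidation.IError.
interface IValidationError {
  path: string;     // e.g., "$input.price.minimum"
  expected: string; // expected type expression
  value: unknown;   // actual value received
}

function annotate(value: object, errors: IValidationError[]): string {
  // Index errors by the last path segment (simplification: collisions
  // between same-named properties are not disambiguated here).
  const byLastKey = new Map<string, IValidationError>();
  for (const e of errors) byLastKey.set(e.path.split(".").pop()!, e);
  // Append an `// ERROR:` comment to every line whose property failed.
  return JSON.stringify(value, null, 2)
    .split("\n")
    .map((line) => {
      const match = /^\s*"([^"]+)":/.exec(line);
      const error = match && byLastKey.get(match[1]);
      return error ? `${line} // ERROR: expected ${error.expected}` : line;
    })
    .join("\n");
}
```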

2.4. Schema Review Validation Logic

The Schema Relation/Structure/Content Agents covered in the Design Integrity section are core agents for ensuring DTO quality. For these agents to function properly, strong validation logic must be in place first. We systematically build validation logic to prevent issues revealed in Qwen3 benchmarks—partial property review omissions, nonsensical review results, and unimplemented modification instructions.

Complete Traversal Verification:

  • Verify that all types and properties are included in the agent’s output review results
  • Return error if even one type or property is missing
  • Confirm that the path field in review results matches actual schema paths

Modification Instruction Consistency Verification:

  • Verify required fields based on modification type (add/modify require property)
  • Confirm that the location pointed to by path exists in the actual schema (for delete, modify)
  • Confirm no existing property at the same path for add
  • Pre-verify circular reference possibility

Agent-Specific Verification:

  • Relation Agent: Bidirectional consistency of FK relationships, existence of referenced types
  • Structure Agent: 1:1 correspondence verification between DB columns and DTO properties, apply type compatibility matrix
  • Content Agent: Non-empty verification of description/example fields, type appropriateness of example values

This validation logic must be implemented before Schema Agents operate and applies commonly to all three agents in the Design Integrity section.
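As a sketch, the complete-traversal check reduces to set comparison once all schema paths can be enumerated up front; the types here are simplified and illustrative.

```typescript
// Sketch of complete-traversal verification: every type/property path in
// the schema must appear in the review output, and vice versa.
interface IReviewResult {
  path: string; // e.g., "IShoppingSale.seller_id"
}

function verifyCompleteTraversal(
  schemaPaths: string[], // all type/property paths in the schema
  reviews: IReviewResult[],
): string[] {
  const reviewed = new Set(reviews.map((r) => r.path));
  const missing = schemaPaths.filter((p) => !reviewed.has(p));
  // Review entries that point at non-existent paths are also errors.
  const known = new Set(schemaPaths);
  const phantom = reviews.filter((r) => !known.has(r.path));
  return [
    ...missing.map((p) => `missing review for ${p}`),
    ...phantom.map((r) => `review path ${r.path} does not exist in schema`),
  ];
}
```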

2.5. Test Mapping Plan Enhancement

Modularization introduced in Gamma is a feature that divides large projects into multiple modules, generating code independently for each. Test Phase’s Mapping Plan defines which test files cover which API endpoints, and the execution order and dependencies between tests. The following issues frequently occurred with Qwen3:

  • Specifying non-existent endpoints as test targets
  • Duplicate test assignment for the same endpoint
  • Circular dependencies between tests (A→B→C→A)
  • Attempting to execute subsequent tests without prerequisite tests

Validation Enhancement Items:

  • Endpoint existence verification
  • Duplicate assignment verification
  • Dependency DAG verification (circular reference detection; see the sketch at the end of this section)
  • Dependency order verification (topological sort feasibility)

Dynamic Enum Schema Application:

  • Change Mapping Plan’s endpoint field from string to enum union of actually existing endpoints
  • Restrict test filename field to dynamic enum to prevent typos and references to non-existent files
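The dependency DAG verification above reduces to cycle detection; the following sketch uses depth-first search with visitation marking, and topological-sort feasibility follows from the absence of cycles.

```typescript
// Sketch: detect a circular dependency (A→B→C→A) among tests.
function findCycle(deps: Map<string, string[]>): string[] | null {
  const state = new Map<string, "visiting" | "done">();
  const stack: string[] = [];
  const visit = (node: string): string[] | null => {
    if (state.get(node) === "done") return null;
    if (state.get(node) === "visiting")
      return [...stack.slice(stack.indexOf(node)), node]; // cycle found
    state.set(node, "visiting");
    stack.push(node);
    for (const next of deps.get(node) ?? []) {
      const cycle = visit(next);
      if (cycle) return cycle;
    }
    stack.pop();
    state.set(node, "done");
    return null;
  };
  for (const node of deps.keys()) {
    const cycle = visit(node);
    if (cycle) return cycle; // e.g., ["A", "B", "C", "A"]
  }
  return null;
}
```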

2.6. Realize Mapping Plan Enhancement

Realize Phase’s Mapping Plan defines which Provider implements which API endpoint and the call relationships between Providers. It has a more complex dependency structure than Test Phase, and the following additional issues occurred:

  • Omissions in N:M mapping between Providers and endpoints
  • Duplicate generation of common utility functions
  • Circular calls between Providers
  • DB transaction boundary inconsistencies

Validation Enhancement Items:

  • Complete endpoint coverage verification
  • Provider inter-dependency DAG verification
  • Common utility duplication detection
  • DB access pattern consistency verification

Hierarchical Mapping Plan Structure:

  • Redesign the existing flat Mapping Plan into a hierarchical structure
  • Define clear ownership with 3-tier hierarchy: Module → Provider → Endpoint
  • Perform validation independently per tier for easier error cause identification

Explicit Provider Dependency Declaration:

  • Force explicit declaration in Mapping Plan when a Provider calls another Provider
  • Treat undeclared Provider calls as compilation errors
  • Prevent runtime errors from implicit dependencies

Progressive Validation Pipeline:

  1. Syntactic Validation: Grammatical correctness of Mapping Plan structure
  2. Semantic Validation: Endpoint/Provider existence, type matching
  3. Dependency Validation: DAG verification, circular reference detection
  4. Coverage Validation: Confirm all endpoints are implemented

Each stage must pass before proceeding to the next, with detailed error messages specific to the failed stage.
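A minimal sketch of this staged gating, with stage bodies elided and names illustrative:

```typescript
// Each stage returns its own error list; later stages only run once
// earlier ones pass, and errors are prefixed with the failing stage.
type Stage = (plan: unknown) => string[];

function runPipeline(plan: unknown, stages: [string, Stage][]): string[] {
  for (const [name, stage] of stages) {
    const errors = stage(plan);
    if (errors.length > 0) return errors.map((e) => `[${name}] ${e}`);
  }
  return [];
}

declare const plan: unknown;
declare const syntactic: Stage, semantic: Stage, dependency: Stage, coverage: Stage;

const errors = runPipeline(plan, [
  ["syntactic", syntactic],
  ["semantic", semantic],
  ["dependency", dependency],
  ["coverage", coverage],
]);
```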

3. RAG Optimization

The RAG (Retrieval-Augmented Generation) introduced in Gamma had a monolithic structure where agents received all input materials at once. As project size grew, token consumption increased exponentially, and inefficiencies arose from including information the agent didn’t actually need in the context.

In Delta, we transition to a structure where agents actively request only the information they need selectively. By adding functions like analyzeFiles(), getDatabaseSchemas(), getInterfaceOperations(), and getInterfaceSchemas() to each workflow agent’s function calling schema, agents acquire necessary assets based on their own judgment. This aims to reduce function calling frequency for preliminary asset acquisition.

3.1. Hybrid Search (Vector + BM25)

Existing BM25-based keyword search relies on exact keyword matching, making it difficult to capture synonyms or similar concepts. When searching for “payment system,” “payment processing” or “purchase handling” might be missed.

In Delta, we build a Hybrid Search system by adding Vector Similarity search. Each section of the requirements documents is embedded with the Xenova Transformers library using the Xenova/all-MiniLM-L6-v2 model, and its cosine similarity with the query is calculated. Computing a weighted sum of the BM25 score and the vector score for the final ranking makes search consider both keyword matching and semantic relevance.

final_score = α × BM25_score + (1 - α) × vector_similarity

The optimal α value is derived through benchmarking, with an initial value of 0.5.
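As a sketch of the scoring formula in code, assuming BM25 scores are normalized into [0, 1] so the two scales are comparable:

```typescript
// Cosine similarity between a query embedding and a section embedding.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// final_score = α × BM25_score + (1 - α) × vector_similarity
function hybridScore(
  bm25Score: number, // normalized BM25 keyword score
  queryEmbedding: number[],
  sectionEmbedding: number[],
  alpha = 0.5, // initial α; tuned via benchmarks
): number {
  const vector = cosineSimilarity(queryEmbedding, sectionEmbedding);
  return alpha * bm25Score + (1 - alpha) * vector;
}
```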

3.2. Dynamic K Retrieval

Fixing the K value in Top-K search causes two problems. If K is too small, relevant documents are missed; if K is too large, unnecessary documents are included, wasting tokens. Depending on the query, there might be 3 or 30 actually relevant documents.

In Delta, we dynamically adjust the K value by analyzing the score distribution (sharpness) of search results. We detect the point where scores drop sharply (elbow point) and return results only up to that point.

  • Sharpness threshold: Raised from 0.2 to 0.5 (based on benchmarks)
  • K range: kMin=3, kMax=8~12

This loads only the appropriate amount of context for each query’s characteristics, reducing token costs while maintaining recall.
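A sketch of the elbow detection under these parameters, over final scores sorted in descending order:

```typescript
// Cut the result list where the score drops sharply, bounded by
// kMin/kMax; `sharpness` is the relative drop threshold.
function dynamicK(
  scores: number[], // final scores, sorted descending
  kMin = 3,
  kMax = 10,
  sharpness = 0.5,
): number {
  let k = Math.min(kMax, scores.length);
  for (let i = kMin; i < k; i++) {
    if (scores[i - 1] <= 0) { k = i; break; }
    // Relative drop between adjacent ranks; a sharp drop marks the elbow.
    const drop = (scores[i - 1] - scores[i]) / scores[i - 1];
    if (drop >= sharpness) { k = i; break; }
  }
  return Math.min(scores.length, Math.max(kMin, k));
}
```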

3.3. RAG Benchmark & Tuning

We build a benchmark framework to quantitatively evaluate RAG system performance. We construct 14 test cases with clear answers and verify whether intended documents are retrieved as Top-1.

Benchmark items:

  • Retrieval Accuracy: Whether intended documents are included in Top-K
  • Token Efficiency: Token consumption comparison before and after RAG introduction
  • Tool Calling Count: Measure function calling frequency for todo, bbs, shopping projects

To solve the “infinite re-request of already acquired assets” problem occurring with open-source models (Qwen3), we continuously strengthen validation feedback logic and system prompts.

3.4. RAG Preliminary Prompt Tuning

The core of the RAG system is that agents request appropriate assets at the right time. However, current agents rarely call the getAnalysisFiles() function that retrieves requirements documents. They request DB schemas and API specs well, but don’t reference the requirements documents that form the basis of those designs.

The cause of this problem lies in the RAG preliminary agent’s system prompt. Agents were only instructed to “request when needed,” lacking concrete guidance on when to reference requirements.

Prompt Optimization Directions:

  • Explicitly enumerate situations requiring requirements reference
  • Present specific trigger conditions for getAnalysisFiles() calls
  • Warn about problems that can occur when requirements aren’t referenced
  • Provide examples contrasting good and bad requirements utilization

Index File Dual Structure: For large requirements documents, loading everything at once exceeds token limits. To solve this, we apply a [index file] + [RAG top-k files] dual structure. The index file contains the table of contents, section titles, and 1-2 sentence summaries of each section. Agents first read the index file to understand the overall structure, then selectively request only the detailed sections needed for current work. This enables efficient information exploration where agents “see the forest and pick the trees.”
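A hypothetical shape for one index entry follows; the concrete format is a Delta design decision, not an existing AutoBE type.

```typescript
// Hypothetical index-file entry for the dual structure described above.
interface IIndexEntry {
  file: string;        // source requirements file
  sectionPath: string; // e.g., "3. Orders > 3.2 Cancellation"
  summary: string;     // 1-2 sentence summary of the section
  tokens: number;      // size hint for the agent's context budget
}
```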

We comprehensively review and optimize prompts for not just RAG preliminary agents but all agents that utilize requirements (Database, Interface, Test, Realize).

3.5. Analyze Agent Restructuring

Analyze Phase is the first step in transforming user requirements into structured form. Currently it only organizes input documents, but fundamental redesign is needed in conjunction with RAG optimization.

Current Problems:

  • Requirements documents are RAG searched and loaded at file granularity, causing large variance in content size between files
  • Main feature pages have thousands of tokens, subsidiary feature pages have hundreds, causing search ranking imbalance
  • File boundaries don’t match logical units
  • Chunk division criteria should be determined at Analyze stage but currently aren’t considered
  • Cross-references and dependency information between requirements are missing

Chunk Granulation Strategy (a chunking sketch follows the list):

  • Divide chunks from file units to topic or section units
  • Hierarchical division based on heading levels (H1, H2, H3)
  • Set appropriate token count range per chunk (e.g., 500~2000 tokens)
  • Ensure context continuity through chunk overlap
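A sketch of the heading-based division, without the overlap and token-range merging steps:

```typescript
// Split markdown requirements on H1-H3 headings, keeping the heading
// trail as the chunk's section path.
interface IChunk {
  sectionPath: string[]; // heading trail, e.g., ["Orders", "Cancellation"]
  body: string;
}

function chunkByHeadings(markdown: string): IChunk[] {
  const chunks: IChunk[] = [];
  const trail: string[] = [];
  let body: string[] = [];
  const flush = () => {
    if (body.length) chunks.push({ sectionPath: [...trail], body: body.join("\n") });
    body = [];
  };
  for (const line of markdown.split("\n")) {
    const heading = /^(#{1,3})\s+(.*)$/.exec(line);
    if (heading) {
      flush();
      // Truncate the trail to the parent level, then descend.
      trail.length = Math.min(trail.length, heading[1].length - 1);
      trail.push(heading[2]);
    } else body.push(line);
  }
  flush();
  return chunks;
}
```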

Search Method Improvements:

  • Automatically include parent context (parent section) after chunk-level search
  • Load related documents simultaneously through clustering of related chunks
  • Utilize chunk metadata (source file, section path, token count)

Analyze Agent Restructuring Directions:

  • Automatic hierarchical structure analysis of requirements documents and chunk boundary determination
  • Automatic summary generation for each chunk (for index file)
  • Build cross-reference graph between requirements
  • Automatic assignment of semantic tags for use in subsequent Phases

Clearly define and integrate the interface between the two systems so that Analyze Agent’s output can be directly used as RAG system input.

4. Design Integrity

In backend systems, DB design and Interface design must be closely connected. Tables defined in DB should be appropriately exposed via API, data referenced by API must exist in DB, and DB column types and DTO property types must be consistent. When this design consistency breaks, it can lead to runtime errors, data loss, and security vulnerabilities.

In Gamma, each Phase operated independently, lacking such cross-verification. In Delta, we introduce the concept of Design Integrity to build mechanisms that verify and ensure design consistency between Phases.

4.1. DB Coverage Agent

We develop an agent that verifies the coverage of DB table design against requirements documents. It analyzes requirements documents to extract mentioned entities, attributes, and relationships, and confirms whether these are reflected in actual DB schemas.

Verification items:

  • Entity Coverage: Are all concepts specified in requirements modeled as tables?
  • Attribute Coverage: Are each entity’s attributes defined as columns?
  • Relationship Coverage: Are entity relationships (1:1, 1:N, M:N) implemented as FKs and junction tables?

When omissions are detected, the agent suggests supplementary tables/columns, reflecting them in Database Phase outputs after user approval.

4.2. API Endpoint Coverage Agent

We develop an agent that verifies API endpoint coverage against requirements and DB design. It confirms whether appropriate CRUD endpoints exist for tables defined in DB and whether business logic specified in requirements is exposed as APIs.

Verification items:

  • Base CRUD Coverage: Existence of basic CRUD (Create, Read, Update, Delete) endpoints for each table
  • Action Endpoint Coverage: Existence of endpoints for special actions specified in requirements (e.g., “cancel order”, “approve payment”)
  • Query Endpoint Coverage: Existence of endpoints for query features like list retrieval, search, filtering

We provide an interactive review process enabling addition, modification, and deletion of already designed endpoints.

4.3. Schema Relation Agent

In Gamma, 4 review agents operated sequentially to ensure DTO quality:

  1. Relation Review Agent: FK relationships, referential integrity verification
  2. Security Review Agent: Security vulnerabilities, authentication boundary verification
  3. Content Review Agent: Field completeness, type appropriateness verification
  4. Phantom Review Agent: Remove properties that don’t exist in actual DB

These agents generally worked properly with commercial models. However, serious issues were revealed in Qwen3 benchmarks. They frequently skipped reviewing some properties, wrote nonsensical review results, or failed to properly reflect modification instructions in actual schemas.

In Delta, we restructure these into 3 agents (Relation, Structure, Content) and drastically strengthen their function calling schemas and validation logic. The goal is schema and logic hardening that leaves no room for error.

The previous approach regenerated entire JSON Schemas as review results, which consumed many tokens and risked unintended changes. In Delta, all three agents switch output to AST structures containing targeted modification instructions for individual properties:

```typescript
interface ISchemaModification {
  type: "add" | "delete" | "modify";
  path: string;         // e.g., "IShoppingSale.seller_id"
  property?: IProperty; // for add/modify
  reason: string;       // modification reason
}
```

Additionally, all three agents enforce complete traversal of all properties. Agents must review each type and property one by one, explicitly outputting review results (modification needed/not needed). The Validator checks for review omissions, returning errors if even a single property is missed. This completely blocks Qwen3 from skipping some properties.

Schema Relation Agent verifies and corrects reference relationships between DTOs:

  • FK Relation Integrity: Verify that DB FK relationships are correctly reflected in DTOs
  • Reference Type Consistency: Confirm correctness of reference types (IShoppingSale.seller: IShoppingSeller)
  • Aggregation/Composition: Verify that 1:N, M:N relationships are appropriately expressed as array types
  • Circular Reference Prevention: Prevent infinite loops from circular references

4.4. Schema Structure Agent

Schema Structure Agent verifies and corrects the structural correctness of DTOs. It particularly uses alignment with DB schema as the core criterion:

  • DB Existence Check: Confirm that DTO properties correspond to actual DB columns, instruct deletion if not (Phantom functionality integrated)
  • Type Consistency: Verify compatibility between DB types and DTO types, modify to match DB on mismatch
  • Nullable Consistency: Adjust DTO optional/required according to DB’s NOT NULL constraints
  • Security Boundary: Verify that sensitive information (actor_id, actor_session_id, etc.) is not exposed in Write DTOs (Security functionality integrated)

Like Relation Agent, complete traversal of all properties is enforced, returning errors on review omissions. When inconsistencies are detected, DTOs are modified based on DB schema. DB is the source of truth.

4.5. Schema Content Agent

Schema Content Agent doesn’t change DTO structure, only supplements metadata (descriptions, examples, comments). Like Relation and Structure Agents, it doesn’t regenerate entire JSON Schemas but outputs AST structures containing targeted modification instructions:

```typescript
interface IContentModification {
  path: string;         // e.g., "IShoppingSale.customer_id"
  description?: string; // type/property description
  example?: unknown;    // example value
  deprecated?: string;  // deprecation notice
}
```

Review items:

  • Type Description: Write clear descriptions of what each DTO type represents
  • Property Description: Describe each property’s meaning, purpose, and constraints
  • Example Values: Provide representative example values (OpenAPI’s example field)
  • Deprecation Notice: Guidance for properties scheduled for deprecation

Like other agents, complete traversal of all types and properties is enforced. The Validator checks for review omissions, returning errors if any property lacks description or example.

By separating Relation, Structure, and Content Agents, each agent’s role becomes clear, work can proceed independently, and review completeness can be individually guaranteed for each.

5. Multi-lingual Support

AutoBE currently generates TypeScript/NestJS-based backend code. However, demand for the Java/Spring ecosystem remains strong in the enterprise market. In Delta, we fully launch architecture restructuring and Java Compiler development for Multi-lingual support.

Each Phase of AutoBE has different code generation structures, so Java support strategies also differ by Phase. Some Phases already use language-neutral ASTs (Abstract Syntax Trees) and can be addressed by adding code generators, but some Phases have TypeScript-specific structures requiring fundamental architecture restructuring.

| Phase | Current Structure | Java Support Strategy |
| --- | --- | --- |
| Database | Agent generates AST | Develop Java code generator |
| Interface | Agent generates AST | Develop Java code generator |
| Test | Agent writes TypeScript as text | TS → AST conversion Agent → Java code gen |
| Realize | No AST structure | Language-neutral AST design needed |

5.1. Java Compiler PoC

We complete a Java Compiler PoC by the first week of January 2026, attempting Java code generation for each of the Database, Interface, and Test Phases and identifying in advance the difficulties expected during actual implementation.

Key issues derived from PoC:

  • Database: Define mapping rules between Prisma schema and Hibernate Entity
  • Interface: Define transformation rules from OpenAPI spec to Spring Controller/DTO
  • Test: Analyze TypeScript test code structure and review AST conversion feasibility

Through this, we derive focus work points for each Phase and concretize the overall implementation roadmap.

5.2. Java Database

In Database Phase, agents generate AutoBeDatabase.IApplication AST through function calling. This AST represents all DB schema information—tables, columns, relationships, indexes—in a language-neutral way.

The Java code generator takes this AST as input and generates:

  • Hibernate Entity classes: Java classes with JPA annotations like @Entity, @Table, @Column
  • Repository interfaces: Interfaces extending Spring Data JPA’s JpaRepository
  • Relationship mappings: Relationship annotations like @OneToMany, @ManyToOne, @ManyToMany

There are conceptual differences between Prisma’s relation syntax and Hibernate’s relationship mapping, requiring logic to accurately convert between them. Careful handling is particularly needed for implicit many-to-many relationships, cascade options, and fetch strategies.
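As a sketch of the AST-to-Entity direction, here is a generator over a drastically simplified model type standing in for AutoBeDatabase.IApplication; names and the type shape are illustrative.

```typescript
// Simplified stand-in for one model of the Database AST.
interface IModel {
  name: string; // snake_case table name, e.g., "shopping_sales"
  columns: { name: string; javaType: string; nullable: boolean }[];
}

const pascal = (snake: string): string =>
  snake.replace(/(?:^|_)(\w)/g, (_, c: string) => c.toUpperCase());

// Emit a minimal JPA entity class as a string.
function generateEntity(model: IModel): string {
  const fields = model.columns
    .map(
      (c) =>
        `    @Column(name = "${c.name}", nullable = ${c.nullable})\n` +
        `    private ${c.javaType} ${c.name.replace(/_(\w)/g, (_, x: string) => x.toUpperCase())};`,
    )
    .join("\n\n");
  return [
    `@Entity`,
    `@Table(name = "${model.name}")`,
    `public class ${pascal(model.name)} {`,
    fields,
    `}`,
  ].join("\n");
}
```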

5.3. Java Interface

In Interface Phase, agents also generate AutoBeOpenApi.IDocument AST. This AST represents API endpoints, request/response schemas, authentication information, etc., based on OpenAPI 3.1 spec.

The Java code generator takes this AST as input and generates:

  • Spring Controller: Controller classes with @RestController, @RequestMapping, @GetMapping, etc.
  • DTO classes: Java records or classes representing Request/Response bodies
  • Swagger documentation: SpringDoc annotations like @Operation, @ApiResponse, @Schema

Mapping between OpenAPI’s JSON Schema and Java type system, particularly handling of oneOf, anyOf, nullable, is the core challenge. Correspondence between NestJS’s decorator-based validation and Spring’s Bean Validation (@Valid, @NotNull, etc.) must also be considered.
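As an illustration of the type-mapping question, the sketch below maps a simplified OpenAPI schema node to a Java type name; the oneOf branch shows why untagged unions need a separate encoding such as sealed interfaces. The node shape is simplified, not the actual AutoBeOpenApi type.

```typescript
// Simplified OpenAPI schema node for the sketch.
interface ISchemaNode {
  type?: "string" | "integer" | "number" | "boolean" | "array";
  oneOf?: ISchemaNode[];
  items?: ISchemaNode;
  nullable?: boolean;
}

function toJavaType(schema: ISchemaNode): string {
  if (schema.oneOf)
    // Java has no untagged unions; one candidate encoding is a sealed
    // interface hierarchy, generated elsewhere.
    return "Object /* oneOf: needs sealed interface */";
  switch (schema.type) {
    case "string": return "String";
    case "integer": return schema.nullable ? "Long" : "long";   // boxed when nullable
    case "number": return schema.nullable ? "Double" : "double";
    case "boolean": return schema.nullable ? "Boolean" : "boolean";
    case "array": return `List<${toJavaType(schema.items ?? {})}>`;
    default: return "Object";
  }
}
```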

5.4. Java Test

Test Phase has a structure where agents directly write TypeScript code as text. Since code is generated without language-neutral AST, intermediate conversion steps are needed for Java support.

The AutoBeTest struct exists as a remnant of past AST mode. However, this struct also contains TypeScript-specific expressions, limiting its direct use as Java code generator input. Therefore, we employ the following multi-stage strategy:

  1. TypeScript code generation: Agent writes TypeScript e2e test code as text
  2. AST conversion Agent: Separate conversion agent parses TypeScript code into AutoBeTest struct
  3. Language neutralization: Refine TypeScript-specific expressions in AutoBeTest to language-neutral form
  4. Java code generator: Generate JUnit 5-based test code from refined AST
  5. Language-specific utility module support: Current test code uses random functions from libraries like Typia to ensure test integrity. These must be reimplemented for each language.

Particularly in step 3, language neutralization requires various experiments:

  • Convert TypeScript’s async/await pattern to Java’s synchronous calls or CompletableFuture
  • Convert arrow function callbacks to Java lambdas or anonymous classes
  • Convert type assertions to Java casting
  • Address test framework differences (vitest → JUnit 5, expect → AssertJ)

In this process, AutoBeTest struct itself may need restructuring, and a more language-neutral test expression system may need to be designed.

5.5. Java Realize

Realize Phase is the most challenging area. Currently no AST structure exists, and agents directly write TypeScript Provider code as text. Provider code implements actual business logic, defining API endpoint behavior.

For Java support, a language-neutral AST must be newly designed. This AST must be able to express the following programming concepts:

  • Control flow: if/else, switch, for/while, try/catch
  • Data manipulation: variable declaration, assignment, operations
  • Function calls: method calls, chaining, callbacks
  • Object manipulation: property access, instance creation

The problem is TypeScript-specific syntax:

  • Object Literal: Instant object creation in { key: value } form
  • Undefined/Null handling: undefined keyword, optional chaining (?.), nullish coalescing (??)
  • Spread Operator: Object/array spreading in ...obj form
  • Destructuring: Destructuring assignment in const { a, b } = obj form

We must design a universal AST that excludes these TypeScript-specific syntax elements and develop a code generator to convert it to Java code. The goal within the 3-month period is draft-level design, with actual implementation and verification proceeding in subsequent roadmaps.
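A draft-level sketch of what such a universal AST could look like, excluding the TypeScript-specific constructs listed above; all names are illustrative, not a committed design.

```typescript
// Language-neutral statements: expressible in both TypeScript and Java.
type IStatement =
  | { kind: "declare"; name: string; value: IExpression }
  | { kind: "if"; condition: IExpression; then: IStatement[]; else?: IStatement[] }
  | { kind: "for"; iterator: string; iterable: IExpression; body: IStatement[] }
  | { kind: "return"; value?: IExpression };

// Expressions deliberately omit object literals, spread, and optional
// chaining; construction goes through named types instead.
type IExpression =
  | { kind: "literal"; value: string | number | boolean | null }
  | { kind: "variable"; name: string }
  | { kind: "call"; target: IExpression; method: string; args: IExpression[] }
  | { kind: "construct"; type: string; args: IExpression[] };
```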

6. Human Modification Support

Backend code generated by AutoBE is complete production code. In reality, however, users frequently modify the generated code directly: new business requirements, bug fixes, and performance optimizations all drive changes. The problem is that if AutoBE can no longer maintain such modified code, its value degrades to that of a one-time code generator.

In Delta, we introduce Human Modification Support, ensuring maintenance continuity by converting user-modified code into AutoBE’s internal representation. The key is utilizing parsers. Previously, we generated code in the AST → source code direction; now we parse in the source code → AST direction. Parsed ASTs are reflected in AutoBE’s history, enabling continued maintenance in subsequent conversations while being aware of user modifications.

6.1. Database Schema Parser

When users directly modify Prisma schema files (schema.prisma), we parse them and convert them to the AutoBeDatabase.IApplication AST. We implement a parser that follows the Prisma schema grammar to syntactically analyze schema files and map them to AutoBE's internal representation; a rough parsing sketch follows at the end of this section.

Parsing targets:

  • Model definitions: Table names, column names, types, constraints
  • Relation definitions: FK relationships expressed via @relation annotations
  • Index definitions: Index settings like @@index, @@unique

Considerations when parsing:

  • AutoBE AST correspondence for Prisma-specific syntax (@default(autoincrement()), @updatedAt, etc.)
  • Preserve custom annotations or comments added by users
  • Identification and integration of new tables/columns not generated by AutoBE
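The following rough sketch extracts model blocks with a regular expression; a production parser would implement the full Prisma grammar, including attributes, comments, and block-level settings.

```typescript
// Extract `model X { ... }` blocks and their fields from schema.prisma.
interface IParsedField {
  name: string;
  type: string;
  attributes: string; // raw @... annotations, kept for later mapping
}

function parseModels(prisma: string): Map<string, IParsedField[]> {
  const models = new Map<string, IParsedField[]>();
  // Prisma model bodies do not nest braces, so [^}]* suffices here.
  const blocks = prisma.matchAll(/model\s+(\w+)\s*\{([^}]*)\}/g);
  for (const [, name, body] of blocks) {
    const fields: IParsedField[] = [];
    for (const line of body.split("\n")) {
      // `@@index`/`@@unique` lines start with "@" and do not match.
      const field = /^\s*(\w+)\s+(\S+)\s*(.*)$/.exec(line);
      if (field) fields.push({ name: field[1], type: field[2], attributes: field[3] });
    }
    models.set(name, fields);
  }
  return models;
}
```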

6.2. Interface Schema Parser

When users directly modify NestJS Controller/DTO code, we parse TypeScript source code and convert to AutoBeOpenApi.IDocument AST.

TypeScript code parsing:

  • Extract operation information from NestJS Controller files (parsing @Get, @Post, etc. decorators)
  • Extract schema information from DTO classes (type definitions, validation decorators)
  • Static analysis using a TypeScript AST parser (see the sketch at the end of this section)

Considerations when parsing:

  • Identification and integration of new endpoints added by users
  • Detection of modifications to existing endpoints (parameter additions, response type changes, etc.)
  • Accurate reflection of DTO schema changes
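A sketch of the decorator extraction using the TypeScript compiler API, assuming TypeScript 4.8+ where ts.getDecorators is available:

```typescript
import * as ts from "typescript";

// Collect HTTP-method decorators (@Get, @Post, ...) from a NestJS
// controller source, returning the decorator name and method name.
function extractRoutes(source: string): { method: string; name: string }[] {
  const file = ts.createSourceFile("controller.ts", source, ts.ScriptTarget.Latest, true);
  const routes: { method: string; name: string }[] = [];
  const visit = (node: ts.Node): void => {
    if (ts.isMethodDeclaration(node) && ts.canHaveDecorators(node))
      for (const decorator of ts.getDecorators(node) ?? []) {
        const expr = decorator.expression;
        if (ts.isCallExpression(expr) && ts.isIdentifier(expr.expression))
          if (["Get", "Post", "Put", "Patch", "Delete"].includes(expr.expression.text))
            routes.push({
              method: expr.expression.text,
              name: node.name.getText(file),
            });
      }
    ts.forEachChild(node, visit);
  };
  visit(file);
  return routes;
}
```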

6.3. Requirements Synchronization Agent

When DB schema or Interface changes, this signifies requirements changes. Users directly modifying code means new requirements not in original requirements have emerged, or existing requirements have changed.

Requirements Synchronization Agent analyzes parsed ASTs to automatically revise requirements documents:

  • Added entity/property detection: Reflect new tables or columns in requirements documents
  • Changed relationship detection: Update related requirements sections when FK relationships or reference structures change
  • Added endpoint detection: Document functionality requirements when new API endpoints are added
  • Changed schema detection: Reflect DTO field additions/deletions/type changes in requirements

The agent summarizes changes and presents them to users, updating requirements documents after approval. This maintains consistency between code and documentation, enabling change history tracking for future maintenance.

7. Miscellaneous

7.1. System Prompt Simplification

AutoBE’s system prompts are structured using Meta Prompting. We provide agents with functions to call and their type definitions, where JSDoc comments written on functions and interface types serve as documentation.

The problem is that we had an AI elaborate these comments into the system prompts. As a result, the prompts are overly verbose, with redundant expressions and missed key points. This causes unnecessary token consumption, and prompt quality itself cannot be guaranteed.

In Delta, we fundamentally review system prompts:

  • Remove duplicate expressions: Consolidate parts explaining the same concept multiple times
  • Clarify core instructions: Concisely organize rules agents must follow
  • Improve token efficiency: Reduce token consumption by removing unnecessary modifiers and explanations
  • Ensure consistency: Unify writing style and structure across Phase-specific system prompts

7.2. Playground Service Enhancement

AutoBE provides a Playground service that can run locally. However, the current version has only minimal functionality implemented, with poor user experience.

In Delta, we enhance Playground to the level of services developed in past hackathons:

  • SQLite-based history storage: Permanently preserve backend applications and conversation history in local SQLite DB
  • Project management: Management features to create, view, and delete multiple backend projects
  • Conversation history viewing: View conversation records with agents per project in timeline format
  • Artifact download: Batch download generated code, schemas, and documents

This enables users to systematically preserve and manage backend applications generated through AutoBE.

7.3. PR Articles

We conduct promotional activities to introduce AutoBE to the developer community. We continuously post technical articles on major developer communities like dev.to, Reddit, and Hacker News.

Planned topics:

  • AutoBE Introduction: Concepts and operating principles of AI-based automatic backend generation platform
  • Technical Deep Dive: Core technology explanations including RAG, Multi-Phase Architecture, Design Integrity
  • Use Cases: Tutorials on generating actual backends with AutoBE
  • Benchmark Results: Performance comparison analysis between commercial and open-source models

We continuously write articles throughout the quarter to raise AutoBE’s awareness and collect community feedback.
