
Preface

The AutoBE Delta roadmap focuses on transitioning from the horizontal expansion of Gamma to vertical deepening.

In the Gamma roadmap, we rapidly implemented various features such as RAG, Modularization, and Complementation under the “just ship it” philosophy. By prioritizing breadth of features over quality, AutoBE grew into a platform covering all areas of backend generation, but at the same time, stability gaps remained throughout.

In Delta, we fill these gaps. Through Local LLM benchmarks, we discover logic defects and validation omissions that commercial models never exposed, and fix them systematically. We also complete the Hybrid Search system by adding Vector Similarity search to the RAG introduced in Gamma, and fully launch Multi-lingual Support (Java/Kotlin).

1. Local LLM Benchmark


Todo

qwen/qwen3-next-80b-a3b-instruct

| Phase | Output | Token Usage | In / Out | Function Calls |
| --- | --- | --- | --- | --- |
| Analyze | actors: 3, documents: 12 | 1.09M | 930.7K / 154.3K | 31 / 42 (73.8%) |
| Database | namespaces: 3, models: 13 | 1.57M | 1.54M / 38.0K | 40 / 43 (93.0%) |
| Interface | operations: 18, schemas: 19 | 28.01M | 27.86M / 146.6K | 159 / 271 (58.7%) |
| Test | functions: 19 | 2.61M | 2.58M / 26.4K | 40 / 52 (76.9%) |
| Realize | functions: 22, errors: 3 | 10.24M | 10.06M / 171.7K | 135 / 187 (72.2%) |

Function Calling Success Rate: 67.30% · Elapsed Time: 1h 47m 45s · Total Tokens: 76.79M (in: 75.88M, 16.4K cached / out: 902.4K)

Bbs

qwen/qwen3-next-80b-a3b-instruct

| Phase | Output | Token Usage | In / Out | Function Calls |
| --- | --- | --- | --- | --- |
| Analyze | actors: 2, documents: 11 | 1.05M | 924.7K / 126.4K | 26 / 42 (61.9%) |
| Database | namespaces: 9, models: 53 | 6.42M | 6.26M / 155.0K | 156 / 167 (93.4%) |
| Interface | operations: 293, schemas: 297 | 352.91M | 349.25M / 3.65M | 2786 / 4027 (69.2%) |
| Test | functions: 169 | 138.02M | 136.58M / 1.44M | 574 / 2217 (25.9%) |
| Realize | functions: 110, errors: 30 | 60.27M | 58.67M / 1.60M | 413 / 911 (45.3%) |

Function Calling Success Rate: 53.71% · Elapsed Time: 6h 51m 42s · Total Tokens: 558.67M (in: 551.70M, 16.4K cached / out: 6.97M)

Reddit

qwen/qwen3-next-80b-a3b-instruct

| Phase | Output | Token Usage | In / Out | Function Calls |
| --- | --- | --- | --- | --- |
| Analyze | actors: 3, documents: 11 | 1.46M | 1.31M / 150.8K | 27 / 54 (50.0%) |
| Database | namespaces: 9, models: 90 | 10.70M | 10.36M / 344.5K | 228 / 265 (86.0%) |
| Interface | operations: 507, schemas: 515 | 741.29M | 734.04M / 7.25M | 4992 / 7522 (66.4%) |
| Test | functions: 781 | 435.87M | 423.32M / 12.55M | 4343 / 4736 (91.7%) |
| Realize | - | 60.87M | 59.74M / 1.13M | 486 / 1037 (46.9%) |

Function Calling Success Rate: 74.01% · Elapsed Time: 6h 12m 6s · Total Tokens: 1250.18M (in: 1228.76M, 0 cached / out: 21.43M)

Shopping

qwen/qwen3-next-80b-a3b-instruct

| Phase | Output | Token Usage | In / Out | Function Calls |
| --- | --- | --- | --- | --- |
| Analyze | actors: 3, documents: 12 | 1.81M | 1.55M / 263.2K | 33 / 54 (61.1%) |
| Database | namespaces: 11, models: 100 | 11.77M | 11.42M / 349.7K | 253 / 269 (94.1%) |
| Interface | operations: 560, schemas: 641 | 1001.00M | 992.30M / 8.70M | 5840 / 9409 (62.1%) |
| Test | functions: 557 | 495.09M | 486.86M / 8.23M | 3032 / 5141 (59.0%) |
| Realize | functions: 241, errors: 94 | 134.35M | 131.08M / 3.27M | 957 / 1987 (48.2%) |

Function Calling Success Rate: 59.99% · Elapsed Time: 14h 55m 1s · Total Tokens: 1644.03M (in: 1623.21M, 99.2K cached / out: 20.82M)

Commercial models like Claude Sonnet or GPT are smart. Even if system prompts are somewhat ambiguous or there are gaps in validation feedback logic, they rarely create situations that trigger those issues. This also means it’s difficult for developers to discover defects. Since everything works, we mistakenly assume there are no problems.

However, when running the same workflow with open-source models like qwen3-next-80b-a3b or qwen3-30b-a3b-thinking, the story changes. Workflows that never failed with commercial models frequently crash with Qwen3. They reference non-existent DB tables, use reserved words for DTO property names, and put description or non-existent spec data inside JSON Schema properties.

This is Delta's core strategy: using Qwen3 as a touchstone to surface hidden defects, then fixing them, ensures more robust operation even with commercial models. We benchmark each Phase (Database, Interface, Test, and Realize), repeatedly analyzing failure causes and making improvements. When one Phase reaches a 100% success rate, we move to the next, then analyze and improve again. This simple but tedious repetition is the foundation of Delta.

2. Validation Logic Enhancement

Schema and logic must be perfect before prompts. Edge cases that weren’t problematic with commercial models are exposed with Qwen3, revealing gaps in existing designs. We strengthen schemas and validation logic to be flawless, continuously improving edge cases discovered during benchmarking.

2.1. Dynamic Function Calling Schema

The Preliminary Asset introduced in Gamma allows subsequent Phases to access outputs from previous Phases. For example, when designing DTOs in Interface Phase, you can call getDatabaseTable(name: string) to reference table schemas designed in Database Phase. Test Phase uses getInterfaceOperation(endpoint: string) to get API endpoint information.

The problem is that these functions’ parameter types are statically declared as string. Even when the system prompt states “do not request non-existent endpoints” and validation feedback returns errors, Qwen3 rarely follows this.

Therefore, we construct dynamic function calling schemas for preliminary asset-related functions. By restricting requestable table names or endpoints to enum unions, we completely block access to non-existent DB tables or API operations at the type level.

```typescript
// Before: Static schema
interface IAutoBePreliminaryGetDatabaseSchemas {
  type: "getDatabaseSchemas";
  schemaNames: string[] & tags.MinItems<1>;
}

// After: Dynamic schema
interface IAutoBePreliminaryGetDatabaseSchemas {
  type: "getDatabaseSchemas";
  schemaNames: Array<
    | "shopping_customers"
    | "shopping_sales"
    | "shopping_sale_snapshots"
    | "shopping_sale_snapshot_units"
    | ...
  > &
    tags.MinItems<1>;
}
```
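A dynamic schema implies regenerating the function calling definition per run. Below is a minimal sketch of that assembly, assuming the Database Phase output exposes its model names; the names and function are illustrative, not the actual AutoBE API.

```typescript
// Sketch: build a JSON Schema enum union from the tables that actually
// exist in the Database Phase output (illustrative, not the real API).
interface IDatabaseModel {
  name: string;
}

function buildSchemaNamesParameter(models: IDatabaseModel[]) {
  return {
    type: "array",
    minItems: 1,
    items: {
      type: "string",
      // Restrict requestable names to tables that really exist, so a
      // request for a non-existent table is rejected at the type level.
      enum: models.map((m) => m.name),
    },
  };
}

// Usage: regenerate the parameter schema before each agent run.
const parameter = buildSchemaNamesParameter([
  { name: "shopping_customers" },
  { name: "shopping_sales" },
]);
```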

2.2. JSON Schema Validator

When designing DTOs in Interface Phase, Qwen3 makes a wide variety of JSON Schema errors. From simple constraint errors to structural defects, the types are extensive:

  • Placing Object type metadata (description, AutoBeOpenApi.IJsonSchema.IObject["x-autobe-database-schema"]) inside properties
  • Specifying AutoBeOpenApi.IJsonSchema.IObject["x-autobe-database-schema"] on non-Object types
  • Constructing mutually contradictory JSON schema constraints (e.g., minimum > maximum)
  • Mixed case in DTO type/property names and use of system reserved words
  • Non-compliance with UUID type for id and _id properties

Various other edge cases are continuously discovered during benchmarking. We add validators to verify these errors, preventing malformed schemas from propagating to subsequent Phases.
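As an illustration, here is a sketch of how two of these checks might look, over deliberately simplified schema types; the real validator set would be far broader.

```typescript
// Simplified schema shape for the sketch; the real validator operates on
// AutoBeOpenApi.IJsonSchema.
interface IJsonSchemaLike {
  type?: string;
  description?: string;
  minimum?: number;
  maximum?: number;
  properties?: Record<string, unknown>;
}

function validateSchema(name: string, schema: IJsonSchemaLike): string[] {
  const errors: string[] = [];
  // Mutually contradictory numeric constraints (minimum > maximum).
  if (
    schema.minimum !== undefined &&
    schema.maximum !== undefined &&
    schema.minimum > schema.maximum
  )
    errors.push(`${name}: minimum (${schema.minimum}) > maximum (${schema.maximum})`);
  if (schema.properties !== undefined)
    for (const [key, value] of Object.entries(schema.properties)) {
      // Object metadata leaked into `properties` shows up as a
      // non-schema value (e.g., a bare description string).
      if (typeof value !== "object" || value === null)
        errors.push(`${name}.properties.${key}: expected a schema object, got ${typeof value}`);
      else errors.push(...validateSchema(`${name}.${key}`, value as IJsonSchemaLike));
    }
  return errors;
}
```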

2.3. Validation Feedback Stringify

Previously, when runtime validation failed, the IValidation.IFailure value returned by typia.validate<T>() was fed back directly to the agent. However, this struct contains only mechanical error information, and Qwen3 often couldn’t accurately understand what went wrong.

We develop a custom JSON stringify function that annotates validation error details as comments. By specifying concrete error reasons in // ERROR: ... format next to properties where errors occurred, agents can intuitively understand what to fix and how.
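A minimal sketch of such a stringify, assuming typia-style errors that carry a dot-separated path, an expected type expression, and the received value; matching on the last path segment is a simplification for brevity.

```typescript
// Simplified error shape, modeled after typia's IValidation.IError.
interface IValidationError {
  path: string;     // e.g., "$input.price.minimum"
  expected: string; // expected type expression
  value: unknown;   // actual value received
}

function annotate(value: object, errors: IValidationError[]): string {
  // Index errors by the last path segment (simplification: collisions
  // between same-named properties are not disambiguated here).
  const byLastKey = new Map<string, IValidationError>();
  for (const e of errors) byLastKey.set(e.path.split(".").pop()!, e);
  // Append an `// ERROR:` comment to every line whose property failed.
  return JSON.stringify(value, null, 2)
    .split("\n")
    .map((line) => {
      const match = /^\s*"([^"]+)":/.exec(line);
      const error = match && byLastKey.get(match[1]);
      return error ? `${line} // ERROR: expected ${error.expected}` : line;
    })
    .join("\n");
}
```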

2.4. Schema Review Validation Logic

The Schema Relation/Structure/Content Agents covered in the Design Integrity section are core agents for ensuring DTO quality. For these agents to function properly, strong validation logic must be in place first. We systematically build validation logic to prevent issues revealed in Qwen3 benchmarks—partial property review omissions, nonsensical review results, and unimplemented modification instructions.

Complete Traversal Verification:

  • Verify that all types and properties are included in the agent’s output review results
  • Return error if even one type or property is missing
  • Confirm that the path field in review results matches actual schema paths

Modification Instruction Consistency Verification:

  • Verify required fields based on modification type (add/modify require property)
  • Confirm that the location pointed to by path exists in the actual schema (for delete, modify)
  • Confirm no existing property at the same path for add
  • Pre-verify circular reference possibility

Agent-Specific Verification:

  • Relation Agent: Bidirectional consistency of FK relationships, existence of referenced types
  • Structure Agent: 1:1 correspondence verification between DB columns and DTO properties, apply type compatibility matrix
  • Content Agent: Non-empty verification of description/example fields, type appropriateness of example values

This validation logic must be implemented before Schema Agents operate and applies commonly to all three agents in the Design Integrity section.
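As a sketch, the complete-traversal check reduces to set comparison once all schema paths can be enumerated up front; the types here are simplified and illustrative.

```typescript
// Sketch of complete-traversal verification: every type/property path in
// the schema must appear in the review output, and vice versa.
interface IReviewResult {
  path: string; // e.g., "IShoppingSale.seller_id"
}

function verifyCompleteTraversal(
  schemaPaths: string[], // all type/property paths in the schema
  reviews: IReviewResult[],
): string[] {
  const reviewed = new Set(reviews.map((r) => r.path));
  const missing = schemaPaths.filter((p) => !reviewed.has(p));
  // Review entries that point at non-existent paths are also errors.
  const known = new Set(schemaPaths);
  const phantom = reviews.filter((r) => !known.has(r.path));
  return [
    ...missing.map((p) => `missing review for ${p}`),
    ...phantom.map((r) => `review path ${r.path} does not exist in schema`),
  ];
}
```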

2.5. Test Mapping Plan Enhancement

Modularization introduced in Gamma is a feature that divides large projects into multiple modules, generating code independently for each. Test Phase’s Mapping Plan defines which test files cover which API endpoints, and the execution order and dependencies between tests. The following issues frequently occurred with Qwen3:

  • Specifying non-existent endpoints as test targets
  • Duplicate test assignment for the same endpoint
  • Circular dependencies between tests (A→B→C→A)
  • Attempting to execute subsequent tests without prerequisite tests

Validation Enhancement Items:

  • Endpoint existence verification
  • Duplicate assignment verification
  • Dependency DAG verification (circular reference detection; see the sketch at the end of this section)
  • Dependency order verification (topological sort feasibility)

Dynamic Enum Schema Application:

  • Change Mapping Plan’s endpoint field from string to enum union of actually existing endpoints
  • Restrict test filename field to dynamic enum to prevent typos and references to non-existent files
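The dependency DAG verification above reduces to cycle detection; the following sketch uses depth-first search with visitation marking, and topological-sort feasibility follows from the absence of cycles.

```typescript
// Sketch: detect a circular dependency (A→B→C→A) among tests.
function findCycle(deps: Map<string, string[]>): string[] | null {
  const state = new Map<string, "visiting" | "done">();
  const stack: string[] = [];
  const visit = (node: string): string[] | null => {
    if (state.get(node) === "done") return null;
    if (state.get(node) === "visiting")
      return [...stack.slice(stack.indexOf(node)), node]; // cycle found
    state.set(node, "visiting");
    stack.push(node);
    for (const next of deps.get(node) ?? []) {
      const cycle = visit(next);
      if (cycle) return cycle;
    }
    stack.pop();
    state.set(node, "done");
    return null;
  };
  for (const node of deps.keys()) {
    const cycle = visit(node);
    if (cycle) return cycle; // e.g., ["A", "B", "C", "A"]
  }
  return null;
}
```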

2.6. Realize Mapping Plan Enhancement

Realize Phase’s Mapping Plan defines which Provider implements which API endpoint and the call relationships between Providers. It has a more complex dependency structure than Test Phase, and the following additional issues occurred:

  • Omissions in N:M mapping between Providers and endpoints
  • Duplicate generation of common utility functions
  • Circular calls between Providers
  • DB transaction boundary inconsistencies

Validation Enhancement Items:

  • Complete endpoint coverage verification
  • Provider inter-dependency DAG verification
  • Common utility duplication detection
  • DB access pattern consistency verification

Hierarchical Mapping Plan Structure:

  • Redesign the existing flat Mapping Plan into a hierarchical structure
  • Define clear ownership with 3-tier hierarchy: Module → Provider → Endpoint
  • Perform validation independently per tier for easier error cause identification

Explicit Provider Dependency Declaration:

  • Force explicit declaration in Mapping Plan when a Provider calls another Provider
  • Treat undeclared Provider calls as compilation errors
  • Prevent runtime errors from implicit dependencies

Progressive Validation Pipeline:

  1. Syntactic Validation: Grammatical correctness of Mapping Plan structure
  2. Semantic Validation: Endpoint/Provider existence, type matching
  3. Dependency Validation: DAG verification, circular reference detection
  4. Coverage Validation: Confirm all endpoints are implemented

Each stage must pass before proceeding to the next, with detailed error messages specific to the failed stage.
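A minimal sketch of this staged gating, with stage bodies elided and names illustrative:

```typescript
// Each stage returns its own error list; later stages only run once
// earlier ones pass, and errors are prefixed with the failing stage.
type Stage = (plan: unknown) => string[];

function runPipeline(plan: unknown, stages: [string, Stage][]): string[] {
  for (const [name, stage] of stages) {
    const errors = stage(plan);
    if (errors.length > 0) return errors.map((e) => `[${name}] ${e}`);
  }
  return [];
}

declare const plan: unknown;
declare const syntactic: Stage, semantic: Stage, dependency: Stage, coverage: Stage;

const errors = runPipeline(plan, [
  ["syntactic", syntactic],
  ["semantic", semantic],
  ["dependency", dependency],
  ["coverage", coverage],
]);
```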

3. RAG Optimization

The RAG (Retrieval-Augmented Generation) introduced in Gamma had a monolithic structure where agents received all input materials at once. As project size grew, token consumption increased exponentially, and inefficiencies arose from including information the agent didn’t actually need in the context.

In Delta, we transition to a structure where agents actively request only the information they need selectively. By adding functions like analyzeFiles(), getDatabaseSchemas(), getInterfaceOperations(), and getInterfaceSchemas() to each workflow agent’s function calling schema, agents acquire necessary assets based on their own judgment. This aims to reduce function calling frequency for preliminary asset acquisition.

3.1. Hybrid Search (Vector + BM25)

Existing BM25-based keyword search relies on exact keyword matching, making it difficult to capture synonyms or similar concepts. When searching for “payment system,” “payment processing” or “purchase handling” might be missed.

In Delta, we build a Hybrid Search system by adding Vector Similarity search. Each section of the requirements documents is embedded with the Xenova Transformers library using the Xenova/all-MiniLM-L6-v2 model, and its cosine similarity with the query is calculated. Computing a weighted sum of the BM25 score and the vector score for the final ranking makes search consider both keyword matching and semantic relevance.

final_score = α × BM25_score + (1 - α) × vector_similarity

The optimal α value is derived through benchmarking, with an initial value of 0.5.
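As a sketch of the scoring formula in code, assuming BM25 scores are normalized into [0, 1] so the two scales are comparable:

```typescript
// Cosine similarity between a query embedding and a section embedding.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// final_score = α × BM25_score + (1 - α) × vector_similarity
function hybridScore(
  bm25Score: number, // normalized BM25 keyword score
  queryEmbedding: number[],
  sectionEmbedding: number[],
  alpha = 0.5, // initial α; tuned via benchmarks
): number {
  const vector = cosineSimilarity(queryEmbedding, sectionEmbedding);
  return alpha * bm25Score + (1 - alpha) * vector;
}
```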

3.2. Dynamic K Retrieval

Fixing the K value in Top-K search causes two problems. If K is too small, relevant documents are missed; if K is too large, unnecessary documents are included, wasting tokens. Depending on the query, there might be 3 or 30 actually relevant documents.

In Delta, we dynamically adjust the K value by analyzing the score distribution (sharpness) of search results. We detect the point where scores drop sharply (elbow point) and return results only up to that point.

  • Sharpness threshold: Raised from 0.2 to 0.5 (based on benchmarks)
  • K range: kMin=3, kMax=8~12

This loads only the appropriate amount of context for each query’s characteristics, reducing token costs while maintaining recall.
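A sketch of the elbow detection under these parameters, over final scores sorted in descending order:

```typescript
// Cut the result list where the score drops sharply, bounded by
// kMin/kMax; `sharpness` is the relative drop threshold.
function dynamicK(
  scores: number[], // final scores, sorted descending
  kMin = 3,
  kMax = 10,
  sharpness = 0.5,
): number {
  let k = Math.min(kMax, scores.length);
  for (let i = kMin; i < k; i++) {
    if (scores[i - 1] <= 0) { k = i; break; }
    // Relative drop between adjacent ranks; a sharp drop marks the elbow.
    const drop = (scores[i - 1] - scores[i]) / scores[i - 1];
    if (drop >= sharpness) { k = i; break; }
  }
  return Math.min(scores.length, Math.max(kMin, k));
}
```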

3.3. RAG Benchmark & Tuning

We build a benchmark framework to quantitatively evaluate RAG system performance. We construct 14 test cases with clear answers and verify whether intended documents are retrieved as Top-1.

Benchmark items:

  • Retrieval Accuracy: Whether intended documents are included in Top-K
  • Token Efficiency: Token consumption comparison before and after RAG introduction
  • Tool Calling Count: Measure function calling frequency for todo, bbs, shopping projects

To solve the “infinite re-request of already acquired assets” problem occurring with open-source models (Qwen3), we continuously strengthen validation feedback logic and system prompts.

3.4. RAG Preliminary Prompt Tuning

The core of the RAG system is that agents request appropriate assets at the right time. However, current agents rarely call the getAnalysisFiles() function that retrieves requirements documents. They request DB schemas and API specs well, but don’t reference the requirements documents that form the basis of those designs.

The cause of this problem lies in the RAG preliminary agent’s system prompt. Agents were only instructed to “request when needed,” lacking concrete guidance on when to reference requirements.

Prompt Optimization Directions:

  • Explicitly enumerate situations requiring requirements reference
  • Present specific trigger conditions for getAnalysisFiles() calls
  • Warn about problems that can occur when requirements aren’t referenced
  • Provide examples contrasting good and bad requirements utilization

Index File Dual Structure: For large requirements documents, loading everything at once exceeds token limits. To solve this, we apply a [index file] + [RAG top-k files] dual structure. The index file contains the table of contents, section titles, and 1-2 sentence summaries of each section. Agents first read the index file to understand the overall structure, then selectively request only the detailed sections needed for current work. This enables efficient information exploration where agents “see the forest and pick the trees.”
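A hypothetical shape for one index entry follows; the concrete format is a Delta design decision, not an existing AutoBE type.

```typescript
// Hypothetical index-file entry for the dual structure described above.
interface IIndexEntry {
  file: string;        // source requirements file
  sectionPath: string; // e.g., "3. Orders > 3.2 Cancellation"
  summary: string;     // 1-2 sentence summary of the section
  tokens: number;      // size hint for the agent's context budget
}
```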

We comprehensively review and optimize prompts for not just RAG preliminary agents but all agents that utilize requirements (Database, Interface, Test, Realize).

3.5. Analyze Agent Restructuring

Analyze Phase is the first step in transforming user requirements into structured form. Currently it only organizes input documents, but fundamental redesign is needed in conjunction with RAG optimization.

Current Problems:

  • Requirements documents are RAG searched and loaded at file granularity, causing large variance in content size between files
  • Main feature pages have thousands of tokens, subsidiary feature pages have hundreds, causing search ranking imbalance
  • File boundaries don’t match logical units
  • Chunk division criteria should be determined at Analyze stage but currently aren’t considered
  • Cross-references and dependency information between requirements are missing

Chunk Granulation Strategy (a chunking sketch follows the list):

  • Divide chunks from file units to topic or section units
  • Hierarchical division based on heading levels (H1, H2, H3)
  • Set appropriate token count range per chunk (e.g., 500~2000 tokens)
  • Ensure context continuity through chunk overlap
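A sketch of the heading-based division, without the overlap and token-range merging steps:

```typescript
// Split markdown requirements on H1-H3 headings, keeping the heading
// trail as the chunk's section path.
interface IChunk {
  sectionPath: string[]; // heading trail, e.g., ["Orders", "Cancellation"]
  body: string;
}

function chunkByHeadings(markdown: string): IChunk[] {
  const chunks: IChunk[] = [];
  const trail: string[] = [];
  let body: string[] = [];
  const flush = () => {
    if (body.length) chunks.push({ sectionPath: [...trail], body: body.join("\n") });
    body = [];
  };
  for (const line of markdown.split("\n")) {
    const heading = /^(#{1,3})\s+(.*)$/.exec(line);
    if (heading) {
      flush();
      // Truncate the trail to the parent level, then descend.
      trail.length = Math.min(trail.length, heading[1].length - 1);
      trail.push(heading[2]);
    } else body.push(line);
  }
  flush();
  return chunks;
}
```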

Search Method Improvements:

  • Automatically include parent context (parent section) after chunk-level search
  • Load related documents simultaneously through clustering of related chunks
  • Utilize chunk metadata (source file, section path, token count)

Analyze Agent Restructuring Directions:

  • Automatic hierarchical structure analysis of requirements documents and chunk boundary determination
  • Automatic summary generation for each chunk (for index file)
  • Build cross-reference graph between requirements
  • Automatic assignment of semantic tags for use in subsequent Phases

Clearly define and integrate the interface between the two systems so that Analyze Agent’s output can be directly used as RAG system input.

4. Design Integrity

In backend systems, DB design and Interface design must be closely connected. Tables defined in DB should be appropriately exposed via API, data referenced by API must exist in DB, and DB column types and DTO property types must be consistent. When this design consistency breaks, it can lead to runtime errors, data loss, and security vulnerabilities.

In Gamma, each Phase operated independently, lacking such cross-verification. In Delta, we introduce the concept of Design Integrity to build mechanisms that verify and ensure design consistency between Phases.

4.1. DB Coverage Agent

We develop an agent that verifies the coverage of DB table design against requirements documents. It analyzes requirements documents to extract mentioned entities, attributes, and relationships, and confirms whether these are reflected in actual DB schemas.

Verification items:

  • Entity Coverage: Are all concepts specified in requirements modeled as tables?
  • Attribute Coverage: Are each entity’s attributes defined as columns?
  • Relationship Coverage: Are entity relationships (1:1, 1:N, M:N) implemented as FKs and junction tables?

When omissions are detected, the agent suggests supplementary tables/columns, reflecting them in Database Phase outputs after user approval.

4.2. API Endpoint Coverage Agent

We develop an agent that verifies API endpoint coverage against requirements and DB design. It confirms whether appropriate CRUD endpoints exist for tables defined in DB and whether business logic specified in requirements is exposed as APIs.

Verification items:

  • Base CRUD Coverage: Existence of basic CRUD (Create, Read, Update, Delete) endpoints for each table
  • Action Endpoint Coverage: Existence of endpoints for special actions specified in requirements (e.g., “cancel order”, “approve payment”)
  • Query Endpoint Coverage: Existence of endpoints for query features like list retrieval, search, filtering

We provide an interactive review process enabling addition, modification, and deletion of already designed endpoints.

4.3. Schema Relation Agent

In Gamma, 4 review agents operated sequentially to ensure DTO quality:

  1. Relation Review Agent: FK relationships, referential integrity verification
  2. Security Review Agent: Security vulnerabilities, authentication boundary verification
  3. Content Review Agent: Field completeness, type appropriateness verification
  4. Phantom Review Agent: Remove properties that don’t exist in actual DB

These agents generally worked properly with commercial models. However, serious issues were revealed in Qwen3 benchmarks. They frequently skipped reviewing some properties, wrote nonsensical review results, or failed to properly reflect modification instructions in actual schemas.

In Delta, we restructure these into 3 agents (Relation, Structure, Content) and drastically strengthen their function calling schemas and validation logic. The goal is schema and logic hardening that leaves no room for error.

The previous approach regenerated entire JSON Schemas as review results, which consumed many tokens and risked unintended changes. In Delta, all three agents switch output to AST structures containing targeted modification instructions for individual properties:

```typescript
interface ISchemaModification {
  type: "add" | "delete" | "modify";
  path: string;         // e.g., "IShoppingSale.seller_id"
  property?: IProperty; // for add/modify
  reason: string;       // modification reason
}
```

Additionally, all three agents enforce complete traversal of all properties. Agents must review each type and property one by one, explicitly outputting review results (modification needed/not needed). The Validator checks for review omissions, returning errors if even a single property is missed. This completely blocks Qwen3 from skipping some properties.

Schema Relation Agent verifies and corrects reference relationships between DTOs:

  • FK Relation Integrity: Verify that DB FK relationships are correctly reflected in DTOs
  • Reference Type Consistency: Confirm correctness of reference types (IShoppingSale.seller: IShoppingSeller)
  • Aggregation/Composition: Verify that 1:N, M:N relationships are appropriately expressed as array types
  • Circular Reference Prevention: Prevent infinite loops from circular references

4.4. Schema Structure Agent

Schema Structure Agent verifies and corrects the structural correctness of DTOs. It particularly uses alignment with DB schema as the core criterion:

  • DB Existence Check: Confirm that DTO properties correspond to actual DB columns, instruct deletion if not (Phantom functionality integrated)
  • Type Consistency: Verify compatibility between DB types and DTO types, modify to match DB on mismatch
  • Nullable Consistency: Adjust DTO optional/required according to DB’s NOT NULL constraints
  • Security Boundary: Verify that sensitive information (actor_id, actor_session_id, etc.) is not exposed in Write DTOs (Security functionality integrated)

Like Relation Agent, complete traversal of all properties is enforced, returning errors on review omissions. When inconsistencies are detected, DTOs are modified based on DB schema. DB is the source of truth.

4.5. Schema Content Agent

Schema Content Agent doesn’t change DTO structure, only supplements metadata (descriptions, examples, comments). Like Relation and Structure Agents, it doesn’t regenerate entire JSON Schemas but outputs AST structures containing targeted modification instructions:

```typescript
interface IContentModification {
  path: string;         // e.g., "IShoppingSale.customer_id"
  description?: string; // type/property description
  example?: unknown;    // example value
  deprecated?: string;  // deprecation notice
}
```

Review items:

  • Type Description: Write clear descriptions of what each DTO type represents
  • Property Description: Describe each property’s meaning, purpose, and constraints
  • Example Values: Provide representative example values (OpenAPI’s example field)
  • Deprecation Notice: Guidance for properties scheduled for deprecation

Like other agents, complete traversal of all types and properties is enforced. The Validator checks for review omissions, returning errors if any property lacks description or example.

By separating Relation, Structure, and Content Agents, each agent’s role becomes clear, work can proceed independently, and review completeness can be individually guaranteed for each.

5. Multi-lingual Support

AutoBE currently generates TypeScript/NestJS-based backend code. However, demand for the Java/Spring ecosystem remains strong in the enterprise market. In Delta, we fully launch architecture restructuring and Java Compiler development for Multi-lingual support.

Each Phase of AutoBE has different code generation structures, so Java support strategies also differ by Phase. Some Phases already use language-neutral ASTs (Abstract Syntax Trees) and can be addressed by adding code generators, but some Phases have TypeScript-specific structures requiring fundamental architecture restructuring.

| Phase | Current Structure | Java Support Strategy |
| --- | --- | --- |
| Database | Agent generates AST | Develop Java code generator |
| Interface | Agent generates AST | Develop Java code generator |
| Test | Agent writes TypeScript as text | TS → AST conversion Agent → Java code gen |
| Realize | No AST structure | Language-neutral AST design needed |

5.1. Java Compiler PoC

We complete a Java Compiler PoC by the first week of January 2026, attempting Java code generation for each of the Database, Interface, and Test Phases and identifying in advance the difficulties expected during actual implementation.

Key issues derived from PoC:

  • Database: Define mapping rules between Prisma schema and Hibernate Entity
  • Interface: Define transformation rules from OpenAPI spec to Spring Controller/DTO
  • Test: Analyze TypeScript test code structure and review AST conversion feasibility

Through this, we derive focus work points for each Phase and concretize the overall implementation roadmap.

5.2. Java Database

In Database Phase, agents generate AutoBeDatabase.IApplication AST through function calling. This AST represents all DB schema information—tables, columns, relationships, indexes—in a language-neutral way.

The Java code generator takes this AST as input and generates:

  • Hibernate Entity classes: Java classes with JPA annotations like @Entity, @Table, @Column
  • Repository interfaces: Interfaces extending Spring Data JPA’s JpaRepository
  • Relationship mappings: Relationship annotations like @OneToMany, @ManyToOne, @ManyToMany

There are conceptual differences between Prisma’s relation syntax and Hibernate’s relationship mapping, requiring logic to accurately convert between them. Careful handling is particularly needed for implicit many-to-many relationships, cascade options, and fetch strategies.
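As a sketch of the AST-to-Entity direction, here is a generator over a drastically simplified model type standing in for AutoBeDatabase.IApplication; names and the type shape are illustrative.

```typescript
// Simplified stand-in for one model of the Database AST.
interface IModel {
  name: string; // snake_case table name, e.g., "shopping_sales"
  columns: { name: string; javaType: string; nullable: boolean }[];
}

const pascal = (snake: string): string =>
  snake.replace(/(?:^|_)(\w)/g, (_, c: string) => c.toUpperCase());

// Emit a minimal JPA entity class as a string.
function generateEntity(model: IModel): string {
  const fields = model.columns
    .map(
      (c) =>
        `    @Column(name = "${c.name}", nullable = ${c.nullable})\n` +
        `    private ${c.javaType} ${c.name.replace(/_(\w)/g, (_, x: string) => x.toUpperCase())};`,
    )
    .join("\n\n");
  return [
    `@Entity`,
    `@Table(name = "${model.name}")`,
    `public class ${pascal(model.name)} {`,
    fields,
    `}`,
  ].join("\n");
}
```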

5.3. Java Interface

In Interface Phase, agents also generate AutoBeOpenApi.IDocument AST. This AST represents API endpoints, request/response schemas, authentication information, etc., based on OpenAPI 3.1 spec.

The Java code generator takes this AST as input and generates:

  • Spring Controller: Controller classes with @RestController, @RequestMapping, @GetMapping, etc.
  • DTO classes: Java records or classes representing Request/Response bodies
  • Swagger documentation: SpringDoc annotations like @Operation, @ApiResponse, @Schema

Mapping between OpenAPI’s JSON Schema and Java type system, particularly handling of oneOf, anyOf, nullable, is the core challenge. Correspondence between NestJS’s decorator-based validation and Spring’s Bean Validation (@Valid, @NotNull, etc.) must also be considered.
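As an illustration of the type-mapping question, the sketch below maps a simplified OpenAPI schema node to a Java type name; the oneOf branch shows why untagged unions need a separate encoding such as sealed interfaces. The node shape is simplified, not the actual AutoBeOpenApi type.

```typescript
// Simplified OpenAPI schema node for the sketch.
interface ISchemaNode {
  type?: "string" | "integer" | "number" | "boolean" | "array";
  oneOf?: ISchemaNode[];
  items?: ISchemaNode;
  nullable?: boolean;
}

function toJavaType(schema: ISchemaNode): string {
  if (schema.oneOf)
    // Java has no untagged unions; one candidate encoding is a sealed
    // interface hierarchy, generated elsewhere.
    return "Object /* oneOf: needs sealed interface */";
  switch (schema.type) {
    case "string": return "String";
    case "integer": return schema.nullable ? "Long" : "long";   // boxed when nullable
    case "number": return schema.nullable ? "Double" : "double";
    case "boolean": return schema.nullable ? "Boolean" : "boolean";
    case "array": return `List<${toJavaType(schema.items ?? {})}>`;
    default: return "Object";
  }
}
```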

5.4. Java Test

Test Phase has a structure where agents directly write TypeScript code as text. Since code is generated without language-neutral AST, intermediate conversion steps are needed for Java support.

The AutoBeTest struct exists as a remnant of past AST mode. However, this struct also contains TypeScript-specific expressions, limiting its direct use as Java code generator input. Therefore, we employ the following multi-stage strategy:

  1. TypeScript code generation: Agent writes TypeScript e2e test code as text
  2. AST conversion Agent: Separate conversion agent parses TypeScript code into AutoBeTest struct
  3. Language neutralization: Refine TypeScript-specific expressions in AutoBeTest to language-neutral form
  4. Java code generator: Generate JUnit 5-based test code from refined AST
  5. Language-specific utility module support: Current test code uses random functions from libraries like Typia to ensure test integrity. These must be reimplemented for each language.

Particularly in step 3, language neutralization requires various experiments:

  • Convert TypeScript’s async/await pattern to Java’s synchronous calls or CompletableFuture
  • Convert arrow function callbacks to Java lambdas or anonymous classes
  • Convert type assertions to Java casting
  • Address test framework differences (vitest → JUnit 5, expect → AssertJ)

In this process, AutoBeTest struct itself may need restructuring, and a more language-neutral test expression system may need to be designed.

5.5. Java Realize

Realize Phase is the most challenging area. Currently no AST structure exists, and agents directly write TypeScript Provider code as text. Provider code implements actual business logic, defining API endpoint behavior.

For Java support, a language-neutral AST must be newly designed. This AST must be able to express the following programming concepts:

  • Control flow: if/else, switch, for/while, try/catch
  • Data manipulation: variable declaration, assignment, operations
  • Function calls: method calls, chaining, callbacks
  • Object manipulation: property access, instance creation

The problem is TypeScript-specific syntax:

  • Object Literal: Instant object creation in { key: value } form
  • Undefined/Null handling: undefined keyword, optional chaining (?.), nullish coalescing (??)
  • Spread Operator: Object/array spreading in ...obj form
  • Destructuring: Destructuring assignment in const { a, b } = obj form

We must design a universal AST that excludes these TypeScript-specific syntax elements and develop a code generator to convert it to Java code. The goal within the 3-month period is draft-level design, with actual implementation and verification proceeding in subsequent roadmaps.
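A draft-level sketch of what such a universal AST could look like, excluding the TypeScript-specific constructs listed above; all names are illustrative, not a committed design.

```typescript
// Language-neutral statements: expressible in both TypeScript and Java.
type IStatement =
  | { kind: "declare"; name: string; value: IExpression }
  | { kind: "if"; condition: IExpression; then: IStatement[]; else?: IStatement[] }
  | { kind: "for"; iterator: string; iterable: IExpression; body: IStatement[] }
  | { kind: "return"; value?: IExpression };

// Expressions deliberately omit object literals, spread, and optional
// chaining; construction goes through named types instead.
type IExpression =
  | { kind: "literal"; value: string | number | boolean | null }
  | { kind: "variable"; name: string }
  | { kind: "call"; target: IExpression; method: string; args: IExpression[] }
  | { kind: "construct"; type: string; args: IExpression[] };
```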

6. Human Modification Support

Backend code generated by AutoBE is complete production code. In reality, however, users frequently modify the generated code directly: new business requirements, bug fixes, and performance optimizations all drive changes. The problem is that if AutoBE can no longer maintain such modified code, its value degrades to that of a one-time code generator.

In Delta, we introduce Human Modification Support, ensuring maintenance continuity by converting user-modified code into AutoBE’s internal representation. The key is utilizing parsers. Previously, we generated code in the AST → source code direction; now we parse in the source code → AST direction. Parsed ASTs are reflected in AutoBE’s history, enabling continued maintenance in subsequent conversations while being aware of user modifications.

6.1. Database Schema Parser

When users directly modify Prisma schema files (schema.prisma), we parse them and convert them to the AutoBeDatabase.IApplication AST. We implement a parser that follows the Prisma schema grammar to syntactically analyze schema files and map them to AutoBE's internal representation; a rough parsing sketch follows at the end of this section.

Parsing targets:

  • Model definitions: Table names, column names, types, constraints
  • Relation definitions: FK relationships expressed via @relation annotations
  • Index definitions: Index settings like @@index, @@unique

Considerations when parsing:

  • AutoBE AST correspondence for Prisma-specific syntax (@default(autoincrement()), @updatedAt, etc.)
  • Preserve custom annotations or comments added by users
  • Identification and integration of new tables/columns not generated by AutoBE
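The following rough sketch extracts model blocks with a regular expression; a production parser would implement the full Prisma grammar, including attributes, comments, and block-level settings.

```typescript
// Extract `model X { ... }` blocks and their fields from schema.prisma.
interface IParsedField {
  name: string;
  type: string;
  attributes: string; // raw @... annotations, kept for later mapping
}

function parseModels(prisma: string): Map<string, IParsedField[]> {
  const models = new Map<string, IParsedField[]>();
  // Prisma model bodies do not nest braces, so [^}]* suffices here.
  const blocks = prisma.matchAll(/model\s+(\w+)\s*\{([^}]*)\}/g);
  for (const [, name, body] of blocks) {
    const fields: IParsedField[] = [];
    for (const line of body.split("\n")) {
      // `@@index`/`@@unique` lines start with "@" and do not match.
      const field = /^\s*(\w+)\s+(\S+)\s*(.*)$/.exec(line);
      if (field) fields.push({ name: field[1], type: field[2], attributes: field[3] });
    }
    models.set(name, fields);
  }
  return models;
}
```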

6.2. Interface Schema Parser

When users directly modify NestJS Controller/DTO code, we parse TypeScript source code and convert to AutoBeOpenApi.IDocument AST.

TypeScript code parsing:

  • Extract operation information from NestJS Controller files (parsing @Get, @Post, etc. decorators)
  • Extract schema information from DTO classes (type definitions, validation decorators)
  • Static analysis using a TypeScript AST parser (see the sketch at the end of this section)

Considerations when parsing:

  • Identification and integration of new endpoints added by users
  • Detection of modifications to existing endpoints (parameter additions, response type changes, etc.)
  • Accurate reflection of DTO schema changes
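A sketch of the decorator extraction using the TypeScript compiler API, assuming TypeScript 4.8+ where ts.getDecorators is available:

```typescript
import * as ts from "typescript";

// Collect HTTP-method decorators (@Get, @Post, ...) from a NestJS
// controller source, returning the decorator name and method name.
function extractRoutes(source: string): { method: string; name: string }[] {
  const file = ts.createSourceFile("controller.ts", source, ts.ScriptTarget.Latest, true);
  const routes: { method: string; name: string }[] = [];
  const visit = (node: ts.Node): void => {
    if (ts.isMethodDeclaration(node) && ts.canHaveDecorators(node))
      for (const decorator of ts.getDecorators(node) ?? []) {
        const expr = decorator.expression;
        if (ts.isCallExpression(expr) && ts.isIdentifier(expr.expression))
          if (["Get", "Post", "Put", "Patch", "Delete"].includes(expr.expression.text))
            routes.push({
              method: expr.expression.text,
              name: node.name.getText(file),
            });
      }
    ts.forEachChild(node, visit);
  };
  visit(file);
  return routes;
}
```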

6.3. Requirements Synchronization Agent

When DB schema or Interface changes, this signifies requirements changes. Users directly modifying code means new requirements not in original requirements have emerged, or existing requirements have changed.

Requirements Synchronization Agent analyzes parsed ASTs to automatically revise requirements documents:

  • Added entity/property detection: Reflect new tables or columns in requirements documents
  • Changed relationship detection: Update related requirements sections when FK relationships or reference structures change
  • Added endpoint detection: Document functionality requirements when new API endpoints are added
  • Changed schema detection: Reflect DTO field additions/deletions/type changes in requirements

The agent summarizes changes and presents them to users, updating requirements documents after approval. This maintains consistency between code and documentation, enabling change history tracking for future maintenance.

7. Miscellaneous

7.1. System Prompt Simplification

AutoBE’s system prompts are structured using Meta Prompting. We provide agents with functions to call and their type definitions, where JSDoc comments written on functions and interface types serve as documentation.

The problem is that we had an AI elaborate these comments into the system prompts. As a result, the prompts are overly verbose, with redundant expressions and missed key points. This causes unnecessary token consumption, and prompt quality itself cannot be guaranteed.

In Delta, we fundamentally review system prompts:

  • Remove duplicate expressions: Consolidate parts explaining the same concept multiple times
  • Clarify core instructions: Concisely organize rules agents must follow
  • Improve token efficiency: Reduce token consumption by removing unnecessary modifiers and explanations
  • Ensure consistency: Unify writing style and structure across Phase-specific system prompts

7.2. Playground Service Enhancement

AutoBE provides a Playground service that can run locally. However, the current version has only minimal functionality implemented, with poor user experience.

In Delta, we enhance Playground to the level of services developed in past hackathons:

  • SQLite-based history storage: Permanently preserve backend applications and conversation history in local SQLite DB
  • Project management: Management features to create, view, and delete multiple backend projects
  • Conversation history viewing: View conversation records with agents per project in timeline format
  • Artifact download: Batch download generated code, schemas, and documents

This enables users to systematically preserve and manage backend applications generated through AutoBE.

7.3. PR Articles

We conduct promotional activities to introduce AutoBE to the developer community. We continuously post technical articles on major developer communities like dev.to, Reddit, and Hacker News.

Planned topics:

  • AutoBE Introduction: Concepts and operating principles of AI-based automatic backend generation platform
  • Technical Deep Dive: Core technology explanations including RAG, Multi-Phase Architecture, Design Integrity
  • Use Cases: Tutorials on generating actual backends with AutoBE
  • Benchmark Results: Performance comparison analysis between commercial and open-source models

We continuously write articles throughout the quarter to raise AutoBE’s awareness and collect community feedback.
