
1. Overview

Wrtn Technologies is hosting the 1st AutoBE Hackathon.

Hackathon Information

Event Details

  • Participants: Maximum 40 people
  • Registration Period: September 5 - 10, 2025
  • Registration Form: Google Forms
  • Event Schedule: September 12 - 14, 2025 (64 hours)
    • Start: September 12, 08:00:00 (PDT, UTC-7)
    • End: September 14, 23:59:59 (PDT, UTC-7)
  • Winners Announcement: September 17, 2025
  • Total Prize Pool: $6,400
    • Grand Prize (1 person): $2,000
    • Excellence Award (1 person): $1,000
    • Participation Prize: $50 for all who submit detailed reviews for both models
  • NO API COST BARRIERS TO PARTICIPATION: Each participant will receive token usage credits worth $350
Backend Without Humans, Closer Than You Think?

AutoBE is a no-code AI platform that turns natural language into backend applications. It analyzes requirements, designs schemas and APIs, writes tests, and implements code.

This Hackathon challenges experienced backend developers to evaluate whether AutoBE’s output is truly production-ready. Assess its code quality, scalability, and performance, compare it with your own practices, and suggest improvements.

Your insights will be essential in proving whether AutoBE is a genuinely useful tool!

2. What is AutoBE?

AutoBE is an AI-based no-code platform for generating production-grade backend applications from natural language.

Key Innovation

AutoBE uses a Compiler-in-the-Loop approach to ensure generated code compiles and runs, addressing limitations of existing AI tools.

Core Achievement

Achieves a 100% build success rate (based on OpenAI GPT-4.1) for backend applications.

2.1. How It Works

AutoBE follows a 5-stage process with specialized AI agents and real-time compiler validation:

  1. Analyze Agent: Interprets natural language requirements, defines user roles, and clarifies ambiguities.
  2. Database Agent: Designs type-safe database schemas using Prisma ORM, validated by the Prisma compiler.
  3. Interface Agent: Creates RESTful APIs with OpenAPI 3.1 documentation, validated by an AutoBE-specific compiler.
  4. Test Agent: Writes E2E test code for normal, edge, and error cases, validated by the test runner.
  5. Realize Agent: Implements NestJS-based backend code with features like dependency injection, validated by TypeScript and NestJS compilers.
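The loop that ties these agents together can be sketched as follows. This is a minimal illustrative sketch of the Compiler-in-the-Loop idea, not AutoBE's actual API: all names (compile, regenerate, compilerInTheLoop) and the toy "compiler" logic are hypothetical.

```typescript
// Hypothetical sketch of a compiler-in-the-loop cycle: generated code is
// validated by a compiler, and diagnostics are fed back to the agent until
// the code builds. The "compiler" and "agent" below are toys.

interface Diagnostic {
  message: string;
}

interface CompileResult {
  success: boolean;
  diagnostics: Diagnostic[];
}

// Toy "compiler": rejects code containing a known flaw.
function compile(code: string): CompileResult {
  if (code.includes("unknownType")) {
    return {
      success: false,
      diagnostics: [{ message: "unknownType is not defined" }],
    };
  }
  return { success: true, diagnostics: [] };
}

// Toy "agent": repairs code based on compiler feedback.
function regenerate(code: string, _diagnostics: Diagnostic[]): string {
  return code.replace("unknownType", "string");
}

// The loop: generate, validate, feed diagnostics back until it compiles.
function compilerInTheLoop(initial: string, maxRounds = 5): string {
  let code = initial;
  for (let round = 0; round < maxRounds; round++) {
    const result = compile(code);
    if (result.success) return code;
    code = regenerate(code, result.diagnostics);
  }
  throw new Error("failed to converge");
}

console.log(compilerInTheLoop('const name: unknownType = "todo";'));
// const name: string = "todo";
```

In the real system each stage has its own compiler (Prisma, the AutoBE OpenAPI compiler, TypeScript/NestJS), but the retry-with-diagnostics shape is the same.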

2.2. Technical Features

AutoBE’s AI-specific compilers validate syntax, logic, and functionality in real-time, providing detailed feedback to AI for code correction. These compilers are optimized for Prisma, OpenAPI, and test domains, ensuring consistency via structured AST-based code generation. The tech stack includes TypeScript, NestJS, Prisma ORM, and PostgreSQL/SQLite.
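To make the "structured AST-based code generation" idea concrete, here is a hedged sketch of the approach: instead of emitting schema text directly, the agent fills a structured object that is printed deterministically. The types (PrismaField, PrismaModel) and printer are illustrative inventions, not AutoBE's actual AST.

```typescript
// Illustrative AST-based generation: the model produces structured data,
// and a deterministic printer renders it, so output format never drifts.

interface PrismaField {
  name: string;
  type: string;
  optional?: boolean;
  attributes?: string[];
}

interface PrismaModel {
  name: string;
  fields: PrismaField[];
}

// Deterministic printer: the only place schema text is produced.
function printModel(model: PrismaModel): string {
  const lines = model.fields.map((f) => {
    const opt = f.optional ? "?" : "";
    const attrs = f.attributes?.length ? " " + f.attributes.join(" ") : "";
    return `  ${f.name} ${f.type}${opt}${attrs}`;
  });
  return `model ${model.name} {\n${lines.join("\n")}\n}`;
}

const article: PrismaModel = {
  name: "Article",
  fields: [
    { name: "id", type: "String", attributes: ["@id", "@default(uuid())"] },
    { name: "title", type: "String" },
    { name: "deletedAt", type: "DateTime", optional: true },
  ],
};

console.log(printModel(article));
```

Because the AI fills typed structures rather than free-form text, the compiler can validate each node before any code is rendered.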

You can check each compiler’s AST structure in the AutoBE GitHub repository.

2.3. Live Demonstration

See AutoBE in action with fully functional backend applications:


Example Applications

How Simple Is It?

Create a discussion board with five natural language commands, generating a deployable backend in ~70 minutes.

[!TIP] For Hackathon Participants: provide detailed requirements for better results. Avoid vague prompts like “do everything.”

3. Eligibility

Who We’re Looking For

  • Experience: Working developers, or students majoring in a related field.
  • Tech Stack: Experience with Node.js, Java, Python, or similar frameworks.
  • Database Skills: Relational database design beyond CRUD.
  • API Knowledge: RESTful API design experience.
  • English Proficiency: Conversational and technical reading skills.
  • Technical Setup: Laptop with Node.js, Git, and a code editor.

4. How to Participate

4.1. Registration

Apply via Google Forms. Registration is limited to 70 participants on a first-come, first-served basis and closes September 10, 2025.

4.2. Account Issuance

On September 12, participants will receive AutoBE access credentials and usage guides via email.

4.3. Hackathon Process

During the hackathon on September 12–14, participants log into the AutoBE platform with their provided accounts and generate two backend applications on different themes: one using openai/gpt-4.1-mini and one using openai/gpt-4.1. Record your conversations, results, and any issues you encounter.

4.4. Submission

Submit two separate reviews, one per application, to GitHub Discussions by September 14, 2025. Provide detailed, specific feedback; further details will be provided via email.

5. Provided AI Models

5.1. openai/gpt-4.1-mini


Todo (openai/gpt-4.1-mini)

  • Analyze (actors: 2, documents: 11): tokens 457.8K (in 390.1K / out 67.8K); function calls 23 / 23 (100.0%)
  • Database (namespaces: 3, models: 14): tokens 1.81M (in 1.77M / out 38.8K); function calls 43 / 54 (79.6%)
  • Interface (operations: 35, schemas: 38): tokens 39.47M (in 39.25M / out 219.0K); function calls 339 / 483 (70.2%)
  • Test (functions: 22): tokens 2.92M (in 2.90M / out 28.0K); function calls 34 / 64 (53.1%)
  • Realize (functions: 50, errors: 2): tokens 7.16M (in 7.01M / out 149.6K); function calls 158 / 174 (90.8%)
  • Function Calling Success Rate: 72.86%
  • Elapsed Time: 2h 30m 51s
  • Total Tokens: 96.48M (in 95.63M, 2.53M cached / out 856.8K)

Bbs (openai/gpt-4.1-mini)

  • Analyze (actors: 3, documents: 11): tokens 644.3K (in 561.7K / out 82.6K); function calls 23 / 29 (79.3%)
  • Database (namespaces: 4, models: 8): tokens 266.6K (in 259.9K / out 6.8K); function calls 9 / 9 (100.0%)
  • Interface (operations: 48, schemas: 64): tokens 20.65M (in 20.50M / out 148.1K); function calls 264 / 346 (76.3%)
  • Test (functions: 52): tokens 7.04M (in 6.89M / out 147.4K); function calls 83 / 95 (87.4%)
  • Realize (functions: 74): tokens 16.37M (in 16.11M / out 258.4K); function calls 306 / 360 (85.0%)
  • Function Calling Success Rate: 81.64%
  • Elapsed Time: 1h 44m 39s
  • Total Tokens: 44.97M (in 44.33M, 488.4K cached / out 643.3K)

Reddit (openai/gpt-4.1-mini)

  • Analyze (actors: 4, documents: 12): tokens 568.0K (in 511.5K / out 56.5K); function calls 25 / 25 (100.0%)
  • Database (namespaces: 5, models: 17): tokens 497.7K (in 482.8K / out 14.9K); function calls 11 / 14 (78.6%)
  • Interface (operations: 105, schemas: 118): tokens 38.20M (in 37.81M / out 388.2K); function calls 486 / 632 (76.9%)
  • Test (functions: 94): tokens 13.66M (in 13.33M / out 326.8K); function calls 161 / 181 (89.0%)
  • Realize (functions: 152): tokens 36.34M (in 35.75M / out 590.0K); function calls 705 / 793 (88.9%)
  • Function Calling Success Rate: 84.38%
  • Elapsed Time: 2h 41m 22s
  • Total Tokens: 89.27M (in 87.90M, 865.8K cached / out 1.38M)

Shopping (openai/gpt-4.1-mini)

  • Analyze (actors: 4, documents: 12): tokens 628.5K (in 539.9K / out 88.6K); function calls 25 / 25 (100.0%)
  • Database (namespaces: 10, models: 40): tokens 791.0K (in 762.6K / out 28.4K); function calls 22 / 24 (91.7%)
  • Interface (operations: 211, schemas: 248): tokens 90.15M (in 89.09M / out 1.06M); function calls 1114 / 1390 (80.1%)
  • Test (functions: 177): tokens 27.96M (in 27.44M / out 512.3K); function calls 326 / 369 (88.3%)
  • Realize (functions: 323): tokens 61.71M (in 60.63M / out 1.08M); function calls 1199 / 1353 (88.6%)
  • Function Calling Success Rate: 84.97%
  • Elapsed Time: 3h 11m 14s
  • Total Tokens: 181.24M (in 178.46M, 904.1K cached / out 2.78M)

This model offers a cost-effective balance for generating small to medium backend applications (~20 tables, 150 API endpoints). It performs well for web services like community boards, blogs, or project management tools, supporting CRUD operations, user authentication, permission management, and file uploads. Its strengths are in requirements analysis and API design, producing clear specifications and clean, RESTful API structures, making it ideal for project initialization.

However, it may produce logical errors in complex business logic or fail to fully resolve compilation issues in E2E test code due to its lightweight design. We provide this model first to demonstrate the role of model capacity in code generation and to manage hackathon costs, as more powerful models are expensive. Developers often use it for initial setups, refining output with tools like Claude Code or GitHub Copilot for a cost-efficient workflow.

5.2. openai/gpt-4.1


Todo (openai/gpt-4.1)

  • Analyze (actors: 1, documents: 11): tokens 453.1K (in 409.3K / out 43.8K); function calls 24 / 25 (96.0%)
  • Database (namespaces: 3, models: 4): tokens 266.5K (in 260.4K / out 6.2K); function calls 7 / 8 (87.5%)
  • Interface (operations: 15, schemas: 21): tokens 4.79M (in 4.75M / out 44.2K); function calls 78 / 89 (87.6%)
  • Test (functions: 20): tokens 2.15M (in 2.11M / out 37.1K); function calls 30 / 30 (100.0%)
  • Realize (functions: 23): tokens 1.85M (in 1.82M / out 28.5K); function calls 48 / 49 (98.0%)
  • Function Calling Success Rate: 93.03%
  • Elapsed Time: 49m 36s
  • Total Tokens: 9.51M (in 9.35M, 63.2K cached / out 159.8K)

Bbs (openai/gpt-4.1)

  • Analyze (actors: 2, documents: 11): tokens 537.0K (in 460.1K / out 76.9K); function calls 23 / 27 (85.2%)
  • Database (namespaces: 6, models: 12): tokens 477.0K (in 462.9K / out 14.1K); function calls 13 / 14 (92.9%)
  • Interface (operations: 59, schemas: 63): tokens 17.12M (in 16.84M / out 281.2K); function calls 260 / 294 (88.4%)
  • Test (functions: 93): tokens 9.83M (in 9.67M / out 162.3K); function calls 127 / 131 (96.9%)
  • Realize (functions: 82): tokens 7.16M (in 7.03M / out 133.9K); function calls 164 / 175 (93.7%)
  • Function Calling Success Rate: 91.58%
  • Elapsed Time: 1h 26m 29s
  • Total Tokens: 35.13M (in 34.46M, 237.7K cached / out 668.4K)

Reddit (openai/gpt-4.1)

  • Analyze (actors: 3, documents: 12): tokens 664.6K (in 601.7K / out 62.9K); function calls 25 / 25 (100.0%)
  • Database (namespaces: 10, models: 56): tokens 1.28M (in 1.20M / out 75.2K); function calls 23 / 31 (74.2%)
  • Interface (operations: 245, schemas: 285): tokens 87.77M (in 86.56M / out 1.21M); function calls 1108 / 1365 (81.2%)
  • Test (functions: 257): tokens 30.59M (in 30.05M / out 532.6K); function calls 395 / 401 (98.5%)
  • Realize (functions: 369): tokens 37.20M (in 36.39M / out 810.1K); function calls 743 / 801 (92.8%)
  • Function Calling Success Rate: 87.46%
  • Elapsed Time: 3h 21m 12s
  • Total Tokens: 157.50M (in 154.80M, 519.0K cached / out 2.69M)

Shopping (openai/gpt-4.1)

  • Analyze (actors: 3, documents: 12): tokens 807.0K (in 735.1K / out 71.9K); function calls 25 / 28 (89.3%)
  • Database (namespaces: 10, models: 46): tokens 1.13M (in 1.06M / out 68.1K); function calls 23 / 28 (82.1%)
  • Interface (operations: 278, schemas: 255): tokens 83.01M (in 81.73M / out 1.28M); function calls 1050 / 1304 (80.5%)
  • Test (functions: 286): tokens 35.19M (in 34.55M / out 642.6K); function calls 448 / 452 (99.1%)
  • Realize (functions: 390): tokens 47.06M (in 46.00M / out 1.06M); function calls 885 / 966 (91.6%)
  • Function Calling Success Rate: 87.51%
  • Elapsed Time: 3h 39m 17s
  • Total Tokens: 167.20M (in 164.07M, 316.2K cached / out 3.12M)

Note: this model becomes available after you complete your openai/gpt-4.1-mini review.

This is the most advanced model, optimized for enterprise-grade backend applications (>500 APIs, 1,000 test scenarios). It excels at understanding complex requirements, inferring implicit needs, and implementing advanced features like real-time notifications, complex permissions, transaction processing, and caching. AutoBE achieves a 100% build success rate with this model, producing production-ready code with no compilation errors.

Generating an e-commerce platform costs ~$300–400 (150M tokens), so access is restricted to manage expenses. Completing the gpt-4.1-mini review unlocks free access, providing insight into how model capacity impacts code quality. This ensures participants can explore its full potential without cost concerns.

5.3. qwen/qwen3-next-80b-a3b


Todo (qwen/qwen3-next-80b-a3b-instruct)

  • Analyze (actors: 3, documents: 12): tokens 1.09M (in 930.7K / out 154.3K); function calls 31 / 42 (73.8%)
  • Database (namespaces: 3, models: 13): tokens 1.57M (in 1.54M / out 38.0K); function calls 40 / 43 (93.0%)
  • Interface (operations: 18, schemas: 19): tokens 28.01M (in 27.86M / out 146.6K); function calls 159 / 271 (58.7%)
  • Test (functions: 19): tokens 2.61M (in 2.58M / out 26.4K); function calls 40 / 52 (76.9%)
  • Realize (functions: 22, errors: 3): tokens 10.24M (in 10.06M / out 171.7K); function calls 135 / 187 (72.2%)
  • Function Calling Success Rate: 67.30%
  • Elapsed Time: 1h 47m 45s
  • Total Tokens: 76.79M (in 75.88M, 16.4K cached / out 902.4K)

Bbs (qwen/qwen3-next-80b-a3b-instruct)

  • Analyze (actors: 2, documents: 11): tokens 1.05M (in 924.7K / out 126.4K); function calls 26 / 42 (61.9%)
  • Database (namespaces: 9, models: 53): tokens 6.42M (in 6.26M / out 155.0K); function calls 156 / 167 (93.4%)
  • Interface (operations: 293, schemas: 297): tokens 352.91M (in 349.25M / out 3.65M); function calls 2786 / 4027 (69.2%)
  • Test (functions: 169): tokens 138.02M (in 136.58M / out 1.44M); function calls 574 / 2217 (25.9%)
  • Realize (functions: 110, errors: 30): tokens 60.27M (in 58.67M / out 1.60M); function calls 413 / 911 (45.3%)
  • Function Calling Success Rate: 53.71%
  • Elapsed Time: 6h 51m 42s
  • Total Tokens: 558.67M (in 551.70M, 16.4K cached / out 6.97M)

Reddit (qwen/qwen3-next-80b-a3b-instruct)

  • Analyze (actors: 3, documents: 11): tokens 1.46M (in 1.31M / out 150.8K); function calls 27 / 54 (50.0%)
  • Database (namespaces: 9, models: 90): tokens 10.70M (in 10.36M / out 344.5K); function calls 228 / 265 (86.0%)
  • Interface (operations: 507, schemas: 515): tokens 741.29M (in 734.04M / out 7.25M); function calls 4992 / 7522 (66.4%)
  • Test (functions: 781): tokens 435.87M (in 423.32M / out 12.55M); function calls 4343 / 4736 (91.7%)
  • Realize: tokens 60.87M (in 59.74M / out 1.13M); function calls 486 / 1037 (46.9%)
  • Function Calling Success Rate: 74.01%
  • Elapsed Time: 6h 12m 6s
  • Total Tokens: 1250.18M (in 1228.76M, 0 cached / out 21.43M)

Shopping (qwen/qwen3-next-80b-a3b-instruct)

  • Analyze (actors: 3, documents: 12): tokens 1.81M (in 1.55M / out 263.2K); function calls 33 / 54 (61.1%)
  • Database (namespaces: 11, models: 100): tokens 11.77M (in 11.42M / out 349.7K); function calls 253 / 269 (94.1%)
  • Interface (operations: 560, schemas: 641): tokens 1001.00M (in 992.30M / out 8.70M); function calls 5840 / 9409 (62.1%)
  • Test (functions: 557): tokens 495.09M (in 486.86M / out 8.23M); function calls 3032 / 5141 (59.0%)
  • Realize (functions: 241, errors: 94): tokens 134.35M (in 131.08M / out 3.27M); function calls 957 / 1987 (48.2%)
  • Function Calling Success Rate: 59.99%
  • Elapsed Time: 14h 55m 1s
  • Total Tokens: 1644.03M (in 1623.21M, 99.2K cached / out 20.82M)

Optional - Just for Fun! This model is NOT required for the hackathon; it’s included purely for those curious about local LLM performance.

This lightweight, open-source model runs on laptop-level resources and is included to explore local LLM performance. It’s suitable for small apps (5–10 tables, 20 APIs) like todo lists or simple accounting tools, handling basic CRUD operations and straightforward logic. However, it struggles with complex requirements and often fails to resolve compilation errors, leading to process interruptions. This model offers a fun way to compare open-source and commercial models, highlighting their performance gap.

6. Evaluation Criteria

6.1. Requirements Analysis

  • Accuracy: Are requirements clearly understood and prioritized?
  • User Personas: Are roles and permissions logical?
  • Non-functional Needs: Are performance, security, and scalability covered?
  • Document Quality: Is it clear and detailed for development?

6.2. Database Design

  • Production-Readiness: Are table relationships logical, without issues?
  • Normalization: Is it balanced for integrity and performance?
  • Keys & Indexing: Are keys and indexes set for efficiency?
  • Details: Are naming, data types, and scalability appropriate?

6.3. API Design

  • RESTful Compliance: Are methods, URIs, and status codes correct?
  • Consistency: Are endpoints and formats unified?
  • Documentation: Are OpenAPI specs clear with examples?
  • Security: Is authentication and data protection adequate?

6.4. Test Code

  • Validation: Does it test business logic effectively?
  • Completeness: Are normal, edge, and exception cases included?
  • Quality: Are tests clear, independent, and easy to debug?
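To illustrate what "normal, edge, and exception cases" means in practice, here is a hedged sketch of the test shape reviewers should look for. The API under test (createPost) and its 100-character limit are toy stand-ins, not AutoBE output.

```typescript
// Toy API under test: a create operation with simple validation rules.
interface Post {
  title: string;
  body: string;
}

function createPost(input: Post): Post {
  if (input.title.length === 0) throw new Error("title required");
  if (input.title.length > 100) throw new Error("title too long");
  return { ...input };
}

// Normal case: a valid post is accepted.
const ok = createPost({ title: "Hello", body: "world" });
console.assert(ok.title === "Hello");

// Edge case: a title at exactly the 100-character limit is still valid.
const edge = createPost({ title: "x".repeat(100), body: "" });
console.assert(edge.title.length === 100);

// Exception case: an empty title must be rejected.
let rejected = false;
try {
  createPost({ title: "", body: "no title" });
} catch {
  rejected = true;
}
console.assert(rejected);
```

A review can score test quality by checking that every endpoint has all three kinds of cases, that each test sets up its own data (independence), and that failures point clearly at the violated rule.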

6.5. Implementation Code

  • Quality: Is it readable, modular, and SOLID-compliant?
  • Architecture: Is it extensible with clear layer separation?
  • Performance: Are queries efficient, avoiding N+1 issues?
  • Security & Types: Are vulnerabilities absent and types used well?
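The N+1 issue mentioned above can be demonstrated with an in-memory "database" that counts queries; the helpers (findAuthor, findAuthorsBatch) are hypothetical, and real Prisma code would use `include` or an `in` filter to achieve the same batching.

```typescript
// In-memory tables and a query counter to make the N+1 pattern visible.
const authors = new Map<number, string>([[1, "alice"], [2, "bob"]]);
const posts = [
  { id: 1, authorId: 1 },
  { id: 2, authorId: 2 },
  { id: 3, authorId: 1 },
];

let queryCount = 0;

// One query per author lookup.
function findAuthor(id: number): string {
  queryCount++;
  return authors.get(id)!;
}

// One query resolves the whole batch of ids.
function findAuthorsBatch(ids: number[]): Map<number, string> {
  queryCount++;
  return new Map(ids.map((id) => [id, authors.get(id)!]));
}

// N+1 shape: after fetching the posts, each post triggers its own
// author query, so cost grows with result size.
queryCount = 0;
posts.forEach((p) => findAuthor(p.authorId));
console.log(queryCount); // 3

// Batched shape: deduplicate the ids and resolve them in one query.
queryCount = 0;
const batch = findAuthorsBatch([...new Set(posts.map((p) => p.authorId))]);
console.log(queryCount); // 1
console.log(batch.get(1)); // alice
```

When reviewing generated Realize code, look for per-row lookups inside loops; a constant query count regardless of result size is the property to verify.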

6.6. Overall Review Writing Guide

  • AutoBE Assessment: Strengths, weaknesses, and suitable projects?
  • Impact: Saves time? Code quality level? Maintainable?
  • Improvements: Specific areas and priorities for enhancement.
  • Further instructions regarding the Review Writing Guide will be provided via email.

7. Prizes and Benefits

  • Grand Prize (1 person): $2,000 for the best review.
  • Excellence Award (1 person): $1,000 for the second-best review.
  • Participation Prize: $50 for all who submit detailed reviews for both models.
  • Exclusions: AI-generated, perfunctory, or plagiarized reviews.
  • Judging: By AutoBE team and experts, announced September 17, 2025.

8. Disclaimer

8.1. Beta Limitations

AutoBE is in beta and may exhibit inefficiencies or errors. Treat these as expected limitations of its current development stage rather than unexpected defects.

8.2. Code Usage

Generated code is not recommended for production use without review and a security audit. Wrtn Technologies is not liable for issues arising from its use.

8.3. Open Source

Reviews and generated code are public on GitHub Discussions. Avoid sensitive information in conversations or projects.
