Applying the V-model to Genspark AI Development - The Debut of the Eric-George Method

📅 Published: January 9, 2026 | 📂 Category: Development Experience | ⏱️ Reading Time: approx. 10 min

Introduction

"The same kind of bug appeared again... Even if I fix it, another bug occurs somewhere else."

During the development of a web application using Genspark, I was facing a serious problem. The pre-rendering implementation for SEO was mistakenly applied to the homepage, causing the fortune-telling function to completely stop. Although the fix itself was completed in about a day, I was fed up with the AI introducing bugs without considering the scope of impact.

Just when I thought I had finally restored the fortune-telling function, two new bugs were discovered. To break this cycle, I decided to apply the traditional software development methodology called the V-model to AI development.

💡 Important Premise: This article is not a success story of "great triumph." It is a record of learning, showing partial success while also revealing the limitations of AI's judgment.

Background of the Incident: A Chain of Recurring Bugs

Misapplication of Pre-rendering Implementation

The root of the problem was the SEO measure implemented in v2.17.3 on December 15, 2025. At that time, pre-rendering (server-side rendering) was introduced to index blog posts in Google Search.

Features Implemented in v2.17.3

  • Purpose: To enable Googlebot to recognize the content of blog posts during crawling
  • Implementation Details: Detect crawlers using _middleware.ts in Cloudflare Pages Functions and dynamically generate HTML
  • Target Scope: /blog/* path only

However, this implementation was mistakenly applied to the homepage (/). The homepage has a fortune-telling function that runs as a React SPA, and pre-rendering prevented JavaScript from executing, causing the fortune-telling function to stop completely.
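
The article does not show the middleware itself, but the safeguard that was missing can be sketched. The following is a hypothetical reconstruction, not the actual `_middleware.ts`: the name `shouldPrerender` and the crawler pattern are my assumptions (only the `onRequest` export is the standard Pages Functions convention). The essential point is that the path check for `/blog/*` must gate pre-rendering before any crawler detection, so the SPA on `/` is never touched:

```typescript
// Hypothetical sketch of the scoping guard (names are assumptions).
const CRAWLER_PATTERN = /Googlebot|bingbot|DuckDuckBot|Slurp/i;

export function shouldPrerender(pathname: string, userAgent: string): boolean {
  // Scope check comes first: never pre-render outside /blog/*.
  // The v2.17.3 bug was effectively the absence of this guard.
  if (!pathname.startsWith("/blog/")) return false;
  return CRAWLER_PATTERN.test(userAgent);
}

// Shape of a Pages Functions middleware using the guard (pseudotyped so the
// sketch stays self-contained; Cloudflare provides real Request/Response types).
export async function onRequest(context: {
  request: { url: string; headers: { get(name: string): string | null } };
  next: () => Promise<unknown>;
}): Promise<unknown> {
  const { pathname } = new URL(context.request.url);
  const ua = context.request.headers.get("user-agent") ?? "";
  if (shouldPrerender(pathname, ua)) {
    // Dynamically generate crawler-facing HTML here (details omitted).
    return "<!doctype html><!-- pre-rendered blog post HTML -->";
  }
  return context.next(); // Normal visitors get the untouched React SPA.
}
```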

The Chain of Recurring Bugs

December 15, 2025

Stoppage of the fortune-telling function discovered immediately after the v2.17.3 release; a fix was implemented

During correction work

The AI did not consider the scope of impact, introducing a new bug elsewhere with each fix

January 8, 2026

The fortune-telling function was finally restored in v2.18. However, two new bugs were discovered

The cause of the vicious cycle, where new bugs emerged with each fix, was clear:

  • Lack of specification understanding: AI overlooked crucial parts of the specification
  • Misunderstanding of impact scope: Did not consider the impact of fixes on other functions
  • Insufficient testing: Inadequate post-fix verification

⚠️ At this point, I was convinced that the development process itself needed a fundamental review.

The Idea of Introducing the V-model

What is the V-model?

The V-model is a traditional quality assurance method in software development. It clearly separates the development process into upstream phases (requirements definition, design) and downstream phases (implementation, testing), ensuring quality by establishing corresponding testing stages for each phase.

V-model Structure

Figure: V-model development process (flow from upstream to downstream phases)

The advantage of this model lies in the philosophy of building quality into the design phase. Bug fixes after implementation require significant effort, so thorough analysis and design in the upstream phases are crucial. For more details, please refer to Fundamentals of Software Quality Assurance.

Applying to AI Development: The Eric-George Method

I considered applying this V-model to Genspark's AI development:

  • Upstream Phase AI (named Eric for now): Thorough reading of specifications, requirements analysis, root cause identification, creation of implementation instructions
  • Downstream Phase AI (named George for now): Faithful implementation according to instructions, testing, deployment

Ideally, they would operate as separate AI agents, but Genspark did not allow one AI agent to launch another AI agent.

💡 Alternative: I decided to try a method of "switching personas" within a single AI agent. This approach involves explicitly switching between ERIC MODE and GEORGE MODE during development.

This persona switching method is explained in detail in a separate article: Understanding the Decisive Difference Between Code Generation and Code Execution in Genspark.

Implementing the Eric-George Method

Afternoon, January 8, 2026: The Debut Begins

With the fortune-telling function restored in v2.18, I applied the Eric-George method for the first time to address the two newly discovered bugs.

Discovered Bugs

  • Bug 1: Blog post style is broken, Markdown symbols (##) are displayed as-is
  • Bug 2: Blog posts have not been updated since January 4

Detailed Analysis of Bug 1: Markdown Conversion Issue

Step 1: Eric Mode (Upstream Phase)

First, I thoroughly analyzed Bug 1 in Eric Mode.

Observation of Symptoms

When opening a blog post page (e.g., https://example.com/blog/20260104-jq8e5x):

  • Headings appear as "## Sensitive A-type and Free B-type Individuals" including the ## symbols
  • All text is concatenated into a single line, line breaks disappear
  • No paragraph breaks, making it very difficult to read

Code Analysis

Upon checking the renderBlogPost() function in webapp/functions/_middleware.ts, I found the problematic code:

// [Problematic Code] v2.18
function renderBlogPost(post) {
  // ... Omitted ...
  return `
    <div class="content">${post.content}</div>  <!-- ← Markdown is output as-is! -->
  `;
  // ... Omitted ...
}

Database Confirmation

Next, I checked the actual data in the blog_posts.content column of the D1 database:

## Sensitive A-type and Free B-type Individuals: A Combination That Stimulates Each Other and Opens Up New Worlds

The sensitivity of A-type individuals and the free-spirited nature of B-type individuals might seem incompatible at first glance.

However, it is precisely this combination that often serves as an opportunity for them to open up new worlds for each other.

Root Cause Identified: Although the content was stored in Markdown format in the database, the renderBlogPost() function was outputting it directly without performing Markdown → HTML conversion.

Step 2: Creating Implementation Instructions

In Eric Mode, I created clear implementation instructions for George. These instructions would significantly influence the subsequent success.

Implementation Instructions (v2.19)

Modification Location: webapp/functions/_middleware.ts

Modification Details:

  1. Add Markdown → HTML conversion function
    • Function Name: markdownToHtml(markdown)
    • Process: ## Heading → <h2>Heading</h2>
    • Process: ### Heading → <h3>Heading</h3>
    • Process: Wrap paragraphs with <p> tags
    • Process: Convert line breaks to <br> tags
  2. Modify renderBlogPost() function
    • Before Change: ${post.content}
    • After Change: ${markdownToHtml(post.content)}

Constraints:

  • Must not affect the fortune-telling function on the homepage (/)
  • Changes only within renderBlogPost() and renderBlogIndex()
  • No build errors

Test Items:

  1. Headings are displayed as <h2> tags on blog post pages
  2. Paragraphs are separated by <p> tags
  3. Line breaks are displayed as <br> tags
  4. The fortune-telling form on the homepage functions correctly

Step 3: George Mode (Downstream Phase)

Switching to George Mode, I implemented according to the instructions.

Implementation Code

/**
 * Simple Markdown → HTML conversion function
 * @param {string} markdown - Content in Markdown format
 * @returns {string} Content in HTML format
 */
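
The article reproduces only the doc comment of the function George wrote. Purely as an illustration, a minimal body implementing the four instructed conversions (headings, paragraphs, line breaks) might look like the sketch below; this is my assumption, not the actual v2.19 code:

```typescript
// Hypothetical sketch of the elided body, covering exactly the four
// conversions from the implementation instructions.
export function markdownToHtml(markdown: string): string {
  return markdown
    .split(/\n{2,}/) // Blank lines separate blocks (headings / paragraphs).
    .map((block) => {
      const b = block.trim();
      if (b.startsWith("### ")) return `<h3>${b.slice(4)}</h3>`; // ### → <h3>
      if (b.startsWith("## ")) return `<h2>${b.slice(3)}</h2>`;  // ## → <h2>
      // Paragraph: wrap in <p>; remaining single newlines become <br>.
      return `<p>${b.replace(/\n/g, "<br>")}</p>`;
    })
    .join("\n");
}
```

A real implementation would also need to escape HTML in the content and handle lists, links, and emphasis, but this shape is enough to satisfy the four test items in the instructions.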

Returning to Eric Mode, I rigorously verified the implementation results.

Verification Results

  • ✅ Blog post page: Headings, paragraphs, and line breaks displayed correctly
  • ✅ Homepage: Fortune-telling form functions correctly
  • ✅ Browser console: No errors

💡 Bug 1 fixed in about 1 hour! It was a moment where I truly felt the effectiveness of separating upstream and downstream phases.

Results: Partial Success and Revealed Challenges

Points of Success

  • Bug 1 fix time: ✅ Approx. 1 hour (previously several days to weeks)
  • Fix cycle: ✅ Completed in one round trip (previously multiple round trips)
  • Impact on homepage: ✅ Zero (fortune-telling function operates normally)
  • Root cause identification: ✅ 100% (lack of Markdown conversion)
  • Documentation: ✅ Implementation instructions created

The Reality of Bug 2: User Intervention Was Needed

However, for Bug 2 (blog post update stoppage), a serious problem was revealed.

Eric's Analysis and Oversight

In the initial analysis in Eric Mode, the following investigations were conducted:

  1. Checked blog post list API → Latest post was January 4
  2. Checked Cron Worker code → No issues with the code itself
  3. Manually executed API → 1st attempt 503 error, 2nd attempt successful

Eric concluded that it was a "temporary outage of the Gemini API used for article creation" and proposed establishing a monitoring system.

Feedback from the User (Myself)

However, when I checked for myself, I found that Eric had neglected basic checks, specifically:

  • Did not check Cron logs: Did not check Cloudflare Workers logs, which would show if Cron had actually executed
  • Overlooked authentication information in specifications: Tried to confirm via API even though Cloudflare access information was stated in the specifications
  • Narrow scope of verification: Focused only on the Gemini API and did not thoroughly examine the Cron Worker's implementation code

Only after I personally checked the specifications, reviewed the Cron logs, and accessed the Cloudflare dashboard using the authentication information, did the true cause become clear.

💡 Biggest Lesson Learned: Eric's judgment cannot be trusted. Even if the AI says it "checked," it might actually be overlooking crucial items.

Clarification of Abilities: Mid-level Employee vs. Middle Schooler Level

From this experience, the capabilities of Eric and George became clear.

George (Implementation): Mid-level Employee Level or Higher

  • Can accurately understand and faithfully execute implementation instructions
  • High code quality, few bugs
  • Can reliably perform builds and deployments
  • Can verify all test items without omission

Eric (Judgment): Middle Schooler Level

  • Claims to have "read the specifications" but actually overlooks crucial items
  • Neglects basic checks (log review, reference to authentication information)
  • Narrow perspective in verification, focusing only on some possibilities
  • Can only analyze within its knowledge domain and does not suggest unfamiliar tools

Related article: Limitations of Judgment Revealed in Genspark AI Development

Lessons Learned: AI's Limitations and Next Steps

Effectiveness of the V-model

The Eric-George method was not a complete success, but its effectiveness was proven. By separating the upstream and downstream phases:

  • ✅ Improved accuracy in root cause identification (accurately discovered the Markdown conversion issue)
  • ✅ Stable implementation quality (no issues with George's implementation)
  • ✅ Clarified understanding of impact scope (zero impact on homepage)
  • ✅ Documentation is naturally generated (implementation instructions remain)

Limitations of AI's Judgment

However, at the same time, it also became clear that AI's judgment has serious limitations.

Specific Examples of Eric's Oversights

  1. Skipping specifications: Claims to have "read the specifications" but overlooks the section detailing Cloudflare authentication information
  2. Omission of basic checks: Does not perform the most basic check, such as reviewing Cron logs
  3. Narrow perspective: Focuses only on Gemini API errors and does not check Cron Worker schedule settings
  4. Failure to suggest tools: Knows that the Gemini API can be used but does not proactively suggest it

Necessity of Quality Check: The Genesis of Gemini API Discovery

Realizing that Eric's judgment was unreliable, I concluded that a third-party quality check mechanism was necessary.

Initially, I tried Genspark's Deep Research and text generation AI, but neither was suitable for quality checks. Therefore, I myself asked Eric, "Can the Gemini API used for article creation also be used for quality checks?", and found that it could.

⚠️ Serious Problem: Eric did not proactively suggest using the Gemini API. Knowing about an available tool but not autonomously proposing its use is a fatal flaw for a project leader. I feel strongly indignant about this point.

Subsequently, under my initiative, I developed a quality check method called "Gemini QA Framework" using the Gemini API.

📝 Two Uses for the Gemini API

In this project, the Gemini API is used for two different purposes:

  • Purpose 1: Content Generation - Automatic generation of blog posts (used previously)
  • Purpose 2: Quality Check - Third-party verification to validate Eric's judgment (newly developed)
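
The framework's code is left to the next article, so purely to illustrate Purpose 2, a "second opinion" call via the public Gemini `generateContent` REST endpoint might be sketched as follows. The model name, prompt wording, and the function names `buildQaPrompt` / `checkWithGemini` are my assumptions, not the actual framework:

```typescript
// Hypothetical sketch of a third-party QA check via the Gemini REST API.
const GEMINI_ENDPOINT =
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent";

// Ask Gemini to validate Eric's conclusion against the raw evidence.
export function buildQaPrompt(conclusion: string, evidence: string): string {
  return [
    "You are a QA reviewer. Judge whether the conclusion is supported.",
    `Conclusion: ${conclusion}`,
    `Evidence: ${evidence}`,
    "Answer SUPPORTED or NOT SUPPORTED, with one sentence of reasoning.",
  ].join("\n");
}

export async function checkWithGemini(apiKey: string, prompt: string): Promise<string> {
  const res = await fetch(`${GEMINI_ENDPOINT}?key=${apiKey}`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ contents: [{ parts: [{ text: prompt }] }] }),
  });
  if (!res.ok) throw new Error(`Gemini API error: ${res.status}`);
  const data = (await res.json()) as {
    candidates?: { content?: { parts?: { text?: string }[] } }[];
  };
  // Return the first candidate's text (empty string if none).
  return data.candidates?.[0]?.content?.parts?.[0]?.text ?? "";
}
```

Feeding Eric's "temporary Gemini API outage" conclusion together with the Cron logs into such a check is exactly the kind of cross-examination that exposed the oversight.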

As a former software development engineer, I was able to confirm that Gemini's judgment was correct. This framework allowed me to augment Eric's middle schooler-level judgment.

Details of the Gemini QA Framework will be explained in the next article: Gemini QA Framework Built with Genspark - Automating AI Quality Assurance

Related article: Demonstrating the Necessity of AI Quality Assurance with Genspark

Detailed Verification in the Next Articles

In this article, I recorded the "debut experience" of the Eric-George method. However, from the perspective of a quality assurance engineer, a deeper analysis is required.

Future Detailed Verification Themes

In the next articles (Articles 51 and 52), the following themes will be verified in detail:

📊 Article 51: Quantitatively Evaluating AI's Judgment

  • Measuring Specification Comprehension Accuracy: Quantitative analysis of which parts of the specifications Eric read and which parts were overlooked
  • Calculating Oversight Rate: Quantifying the extraction rate of critical items
  • Specific Examples of Oversights: Detailed analysis of 7 overlooked items

🔬 Article 52: Verifying AI's Tool Suggestion Capability

  • Why the Gemini API was not suggested: Analyzing the limits of AI's knowledge scope and proactive suggestion capability
  • Tool Awareness Test: Systematically verifying which tools Eric is aware of
  • Necessity of Third-Party Check: Comparison with Deep Research and text generation AI
  • Genesis of Gemini API Discovery: User-initiated discovery process

💡 Quality Assurance Across the Series: This article and Articles 51-52 constitute a series documenting the complete learning process from V-model introduction to Gemini QA Framework development. By progressively delving deeper in each article, the full picture of quality assurance in AI development will emerge.

Investigating the True Cause of Bug 2

In this article, I recorded the initial analysis of Bug 2 as "a temporary outage of the Gemini API" and the fact that "user intervention was needed." However, I have not yet answered the question of why the AI could not identify the root cause.

What will be revealed in Articles 51-52

  • Where was the problem in Eric's analysis process?
  • Analysis of why Cron logs were not checked
  • Methods to improve specification comprehension accuracy
  • Limitations of AI's autonomous problem-solving capabilities and countermeasures

Article 52 will reveal the path leading to the development of the Gemini QA Framework through these verifications.

Conclusion

The debut of the "Eric-George method," applying the V-model to Genspark, was a partial success.

✅ Points of Success

  • Fixed Bug 1 in about 1 hour (similar issues previously took several days to weeks)
  • Proven the effectiveness of separating upstream and downstream phases
  • Confirmed George's (implementation) capability to be at a mid-level employee level
  • Documentation (implementation instructions) is naturally generated

❌ Remaining Challenges

  • Eric's (judgment) capability is at a middle schooler level
  • Bug 2 required user (my) intervention
  • AI does not proactively suggest available tools (Gemini API)
  • Even when claiming to have "read the specifications," crucial items are actually overlooked
  • Omits basic verification procedures (log review, authentication information reference)

💡 Biggest Lesson Learned: In AI development, AI's judgment has serious limitations. To augment Eric's "middle schooler-level" judgment, a third-party quality check (Gemini QA Framework) is indispensable, leading to the development of a mechanism to achieve it.

🔍 Next Preview: In Article 51, we will analyze in detail the 7 items Eric overlooked and quantitatively evaluate AI's judgment. In Article 52, we will explore the root cause of why the Gemini API was not suggested.

This experience will evolve into the subsequent development of quality assurance methods, phase-based checks, and the establishment of approval workflows. It was a valuable debut, where I realized the importance of learning from both failures and successes and continuously improving.


If this article was helpful, please take a look at our other articles. We will continue to share practical insights into AI development using Genspark.