The Limits of AI Judgment - A Unique Perspective on 7 Overlooked Items in AI Development
📘 Terms Used in This Article
- Eric: Upstream AI (AI responsible for requirements analysis and design)
- George: Downstream AI (AI responsible for implementation and testing)
- V-model: Software development quality assurance model (Wikipedia)
1. The Reality of V-model Application - Eric's Lack of Judgment Exposed
In the previous article 53, we reported a case of "partial success" in applying the V-model to the AI development of Genspark (AI search engine). We shared how we improved the development process through a division of roles between an upstream AI (provisionally named Eric) and a downstream AI (provisionally named George).
However, during the **web application development process (v2.17.3 to v2.19)**, the limits of Eric's (the upstream AI's) judgment became apparent. Specifically, prerendering implemented for SEO in v2.17.3 caused a critical bug that completely stopped the fortune-telling feature.
⚠️ Purpose of this Article
In this article, we quantitatively evaluate the 7 items Eric overlooked during **web application development (v2.17.3 to v2.19)** and honestly disclose the limits of AI's judgment. Perfect AI does not exist. That is precisely why external quality checks (Gemini QA Framework) are necessary.
2. Eric's 7 Overlooked Items - A Unique Perspective from Web App Development
We evaluate Eric's lack of judgment, which became apparent during web app development (v2.17.3 to v2.19), across 7 items. For each item, we assessed its **importance (★1-5)**, **lost time**, and **judgment level**.
📊 Evaluation Criteria
Omission Importance (★ Rating):
- ★1: Minor inconvenience (fix effort < 1 hour)
- ★2: Partial feature degradation (fix effort 1-4 hours)
- ★3: Affects main features (fix effort 4-8 hours)
- ★4: Critical feature stoppage (fix effort 8-16 hours)
- ★5: All features stopped, significant user impact (fix effort > 16 hours)
Lost Time: Actual time measured from bug occurrence to fix completion (calculated from Cron logs (scheduled job execution history) and GitHub history)
Fix Effort: Actual fix work time (extracted from conversation logs)
Note: The ★ rating is based on fix effort, but it is a comprehensive judgment that also considers the impact and importance of each item. In particular, Omission 4, "Lack of Gemini API proposal," was rated ★5 as a fundamental problem affecting the entire future development process, even though the lost time was short.
Omission 1: Prerendering Applied to the Top Page (v2.17.3)
🚨 Most Serious Omission - Fortune-telling Feature Stopped
In v2.17.3, prerendering using Cloudflare Pages Functions was implemented for SEO. Eric should have judged that it should only be applied to "/blog/*".
However, it was also applied to the **top page "/", causing the fortune-telling feature to stop completely**. Prerendering disables client-side JavaScript (JavaScript executed in the browser), which left the dynamic form (the fortune-telling feature) non-functional.
Impact Period: December 15, 2025 (v2.17.3 release) ~ January 8, 2026 (v2.18 recovery)
Quantitative Evaluation:
- Omission Importance: ★★★★★ (5/5) - **Most Critical**
- Scope of Impact: Core application features completely stopped
- Recovery Time: Approx. 1 day (fix work), Fortune-telling feature downtime: 2025-12-15~2026-01-08
- Judgment Level: **Elementary school level** (cannot distinguish between static content and dynamic features)
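The correct scoping is small. The sketch below is a hypothetical reconstruction (the project's actual Pages Functions layout is not shown in this article) that restricts prerendering to blog paths with a pure predicate:

```javascript
// Decide whether a path should be prerendered. The rule Eric should have
// applied: only static blog pages, never the dynamic top page "/".
function shouldPrerender(pathname) {
  return pathname.startsWith('/blog/');
}

// A Cloudflare Pages Functions middleware (functions/_middleware.js) could
// then gate the prerender step on that predicate. prerenderPage() is a
// placeholder for whatever SEO prerendering v2.17.3 actually used.
// export async function onRequest(context) {
//   const { pathname } = new URL(context.request.url);
//   if (shouldPrerender(pathname)) return prerenderPage(context);
//   return context.next(); // serve the SPA shell; client-side JS keeps working
// }
```

Keeping the routing decision in a pure function like this also makes the scope of impact trivially testable before release.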
Omission 2: Insufficient Cron Log Review (during v2.19 bug investigation)
📋 Details
In v2.19, a bug occurred where "article images were not displayed." Eric should have checked the Cloudflare Pages Cron logs, but he did not suggest reviewing the logs.
Result: The user manually checked the logs and discovered that "article data was not registered in the DB."
Quantitative Evaluation:
- Omission Importance: ★★★ (3/5)
- Original review effort: 5 minutes
- Lost time due to omission: 2 hours
- Judgment Level: **Junior high school level** (lack of basic troubleshooting procedures)
Omission 3: Lack of Markdown Rendering (v2.19 Bug 1)
🚨 All blog post displays corrupted
In v2.19, a bug occurred where "headings and paragraphs were not displayed correctly." The root cause was the lack of Markdown to HTML conversion processing in `renderBlogPost()`.
Eric should have instructed Markdown conversion processing in the v2.19 implementation specification, but he completely overlooked it.
Quantitative Evaluation:
- Omission Importance: ★★★★ (4/5)
- Scope of Impact: All blog post displays corrupted
- Fix Effort: 30 minutes (after user pointed it out)
- Judgment Level: **Elementary school level** (does not understand basic rendering processes)
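The missing step is a Markdown-to-HTML conversion inside `renderBlogPost()`. The toy converter below only illustrates the idea; in practice a full parser such as Marked.js (linked at the end of this article) would do this work, and the function and field names here are illustrative assumptions, not the project's actual code.

```javascript
// Toy Markdown-to-HTML conversion: enough to show why skipping this step
// leaves headings and paragraphs displayed incorrectly (the v2.19 bug).
function markdownToHtml(md) {
  return md
    .split(/\n{2,}/) // blocks are separated by blank lines
    .map((block) => {
      const h = block.match(/^(#{1,6})\s+(.*)$/); // ATX heading, e.g. "## Title"
      if (h) return `<h${h[1].length}>${h[2]}</h${h[1].length}>`;
      return `<p>${block}</p>`;
    })
    .join('\n');
}

// Hypothetical renderBlogPost() with the conversion step in place.
function renderBlogPost(post) {
  return `<article>${markdownToHtml(post.bodyMarkdown)}</article>`;
}
```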
Omission 4: Lack of Gemini API Proposal (Phase 5)
🚨 Most Critical Omission - Unable to Propose Tools Independently
In Phase 5, Eric's lack of judgment became apparent, and we were searching for quality check methods. Deep Research and text generation AI were attempted but found unsuitable.
Important: Eric did not propose the available Gemini API. It was only when the user asked, "Can Gemini API be used?" that he finally replied, "Yes, it can."
Chronology:
- Eric's lack of judgment became apparent
- Searched for quality check methods (Deep Research, text generation AI were attempted but unsuitable)
- No Gemini API proposal from Eric
- User asked, "Can Gemini API be used?"
- Eric replied, "Yes, it can."
Quantitative Evaluation:
- Omission Importance: ★★★★★ (5/5) - **Most Critical**
- Original proposal timing: During Phase 5 quality check method search
- Lost time due to omission: 4 hours (trial and error of alternatives)
- Judgment Level: **Elementary school level** (cannot independently propose available tools)
Reference: From the "True Chronology" document - it is recorded that the user "was frustrated why Eric did not make a proposal."
Omission 5: Insufficient Scope of Impact Analysis (during v2.17.3 design)
📋 Details
During v2.17.3 design, Eric should have analyzed the scope of impact of prerendering. The top page "/" has a fortune-telling feature (dynamic form), and prerendering disables client-side JS.
Result: This lack of analysis led to a long-term stoppage of the fortune-telling feature.
Quantitative Evaluation:
- Omission Importance: ★★★ (3/5)
- Original analysis effort: 15 minutes
- Impact of omission: Long-term core feature stoppage
- Judgment Level: **Junior high school level** (lack of ability to analyze technical impact scope)
Omission 6: Insufficient Test Items (v2.18~v2.19)
📋 Details
When the fortune-telling feature was restored in v2.18, Eric included "fortune-telling feature operation check" in the test items. However, he did not include "blog post display check."
Result: Two new bugs were discovered in v2.19.
Quantitative Evaluation:
- Omission Importance: ★★★ (3/5)
- Original effort for adding test items: 5 minutes
- Impact of omission: Occurrence of 2 new bugs
- Judgment Level: **Junior high school level** (lack of basics in test design)
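The five-minute fix would have been one more check on the list. A minimal smoke-check sketch, with hypothetical page fixtures and selectors (the project's real test items are not published), covering both the dynamic top page and blog post rendering:

```javascript
// Returns a list of failures across the two areas that actually broke.
// `pages` maps a path to its rendered HTML (hypothetical fixture shape).
function smokeCheck(pages) {
  const failures = [];
  // v2.18's test items covered the fortune-telling feature...
  if (!pages['/'].includes('<form')) {
    failures.push('fortune-telling form missing on "/"');
  }
  // ...but not blog post display, which let the v2.19 bugs through.
  if (!pages['/blog/sample'].includes('<h1')) {
    failures.push('blog post heading missing on "/blog/sample"');
  }
  return failures;
}
```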
Omission 7: Insufficient Understanding of Specification v2.17.3's Intent
🚨 Lack of Comprehension
The v2.17.3 specification stated "SEO measures for **blog pages**." Eric should have understood "blog pages = /blog/*".
However, he misunderstood it as "all pages" and applied prerendering.
Quantitative Evaluation:
- Omission Importance: ★★★★ (4/5)
- Original reading effort: 10 minutes (thorough reading of specifications)
- Impact of omission: Long-term core feature stoppage
- Judgment Level: **Elementary school level** (lack of comprehension, specification reading ability)
3. Analysis: Is Eric Really at a Junior High School Level?
📐 Objective Definition of Judgment Levels
| Level | Age Equivalent | Characteristics of Judgment Ability |
|---|---|---|
| **Upper Elementary School** | 10-11 years old | Can understand basic cause-and-effect, but weak in abstract thinking |
| **1st Year Junior High** | 12 years old | Can consider multiple factors, but cannot see the impact on the entire system |
| **2nd-3rd Year Junior High** | 13-14 years old | Can think logically, but has blind spots due to lack of experience |
| **High School** | 15-17 years old | Can think systematically, but lacks specialized knowledge |
| **Mid-level Employee** | 25-35 years old | Has practical experience and high problem-solving ability |
**Calculation of Experience Gap**: Eric averages roughly 12 years old (equivalent to 1st year of junior high school) vs. George, estimated at 25-30 years old (equivalent to a mid-level employee) = an experience gap of approximately 13 years
⚠️ Note: The expression of judgment levels by age is a **metaphorical explanation** to aid reader comprehension. It is not a scientific measurement of AI capabilities, but a **subjective evaluation** derived from actual development experience. The purpose of this metaphor is to make the limits of AI's judgment easier to visualize.
Analyzing the 7 omissions clarifies Eric's judgment level.
Definition of Evaluation Criteria
| Level | Characteristics of Ability | Applicable Items |
|---|---|---|
| **Elementary school level** | Cannot understand basic technology or text | Omissions 1, 3, 4, 7 |
| **Junior high school level** | Understands basic procedures but lacks adaptability and analytical skills | Omissions 2, 5, 6 |
| **High school level** | Can do basics but lacks specialized knowledge and design skills | - |
| **Mid-level employee level** | Capable of advanced judgment, design, and proposal | - |
Eric's Overall Evaluation
Breakdown of 7 Overlooked Items:
- Elementary school level: 4 items (57.1%)
- Junior high school level: 3 items (42.9%)
- High school level: 0 items (0%)
Average Judgment Ability: Upper elementary school to 1st year junior high school level (equivalent to 11-12 years old)
Most Serious Omissions
- **Omission 4: "Lack of Gemini API proposal"** - Unable to propose available tools
- **Omission 7: "Insufficient understanding of specification's intent"** - Lack of comprehension
- **Omissions 1, 3, 5: "Lack of technical understanding"** - Distinction between static/dynamic, rendering process, scope of impact analysis
Contrast: George's Implementation Capability
Evaluation of George (Downstream AI):
- **Implementation Quality**: Mid-level employee level or higher
- **Code Quality**: High (see reference)
- **Problem**: Faithfully implements even Eric's incorrect instructions
Conclusion: The difference in capabilities between Eric and George is equivalent to approximately 13 years of experience.
Reference: From the perspective of software quality assurance, the quality of upstream processes (requirements definition, design) determines the quality of downstream processes (implementation, testing). Eric's lack of judgment directly impacts overall quality.
4. Without External Checks, the Fortune-telling Feature Would Have Remained Stopped for a Long Time
Detecting Eric's omissions and improving the web app required external quality checks by the **Gemini QA Framework**.
Phase 6: Discovery of Gemini API (User-Led)
In Phase 5, Eric's lack of judgment became apparent, and we were searching for quality check methods. The user asked, "Can the Gemini API be used?", and from that suggestion we developed a quality check method using the Gemini API.
Phase 7: The Gemini QA method proved effective. The user (a former software development engineer) verified the results and confirmed that Gemini's judgments were correct.
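As a rough sketch of what such an external check can look like, the snippet below builds a review request for the Gemini API's REST `generateContent` endpoint. The prompt wording, rubric, and model name are illustrative assumptions; the article does not publish the actual Gemini QA Framework prompts.

```javascript
// Build a generateContent request body asking Gemini to act as an external
// QA reviewer of a specification plus an implementation summary.
function buildQaRequest(spec, implementation) {
  return {
    contents: [{
      parts: [{
        text:
          'You are an external QA reviewer.\n\n' +
          `Specification:\n${spec}\n\n` +
          `Implementation summary:\n${implementation}\n\n` +
          'List any scope-of-impact, rendering, or test-coverage omissions.',
      }],
    }],
  };
}

// The actual call (left as a comment so the sketch stays self-contained;
// MODEL and API_KEY are placeholders):
// const res = await fetch(
//   `https://generativelanguage.googleapis.com/v1beta/models/${MODEL}:generateContent?key=${API_KEY}`,
//   { method: 'POST', headers: { 'Content-Type': 'application/json' },
//     body: JSON.stringify(buildQaRequest(spec, implSummary)) },
// );
```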
Why External Checks Are Necessary
- Limits of Self-Evaluation: Eric cannot recognize his own omissions. Even if he self-evaluated as "perfect," there were actually many problems.
- Objective Perspective: Gemini can objectively evaluate specifications and implementations.
- Early Detection: Many problems can be detected in advance before the user's final confirmation.
- Judgment Reinforcement: Eric's junior high school level judgment can be reinforced with Gemini.
Lessons from v2.18~v2.19
If Gemini QA Framework had existed at v2.17.3:
- The "prerendering applied to the top page" could potentially have been detected in advance.
- The "lack of Markdown rendering" could have been detected before implementation.
- The "insufficient test items" could have been pointed out.
- The long-term stoppage of the fortune-telling feature could potentially have been prevented.
5. Summary - Acknowledge Eric's Limits and Strengthen the Checking System
✅ Key Learnings
- Do not overtrust AI's capabilities: This analysis revealed that Eric's judgment ability is at the upper elementary school to 1st year junior high school level.
- Omissions will inevitably occur: 2 of the 7 items were rated "Most Critical (★5)", and 4 of the 7 were rated ★4 or higher.
- Mandatory external checks: Reinforce judgment with Gemini QA Framework.
- User's final confirmation: AI alone cannot complete the task. User's expertise and judgment are essential.
- Continuous improvement: Generalize Gemini QA method in Phase 8 to make it usable in other projects.
Connection to the Next Article
In this article 51, we quantitatively evaluated the 7 items Eric overlooked during web app development (v2.17.3 to v2.19) and revealed the limits of AI's judgment. In particular, **Omission 4: "Lack of Gemini API proposal"** contains important implications for AI development.
In **the next article 52**, we will delve into the root cause of why Eric did not propose the Gemini API. Additionally, from the perspective of continuous quality improvement, we will propose measures to improve the quality assurance system.
Positioning of Article 51
Article 51 honestly discloses the background behind Article 53's "partial success" and **demonstrates the importance of quality assurance in web app development**. Perfect AI does not exist. That is precisely why an external checking system is necessary.
📚 Related Articles and Links
- Article 53: Applying the V-model to AI development for Genspark (AI Search Engine) - The Debut of the Eric-George Method
- Article 52: Gemini QA Framework - Implementation of Quality Check Automation (Planned)
- Web App (Production Environment)
- V-model - Wikipedia
- Software Quality Assurance - Union of Japanese Scientists and Engineers
- Gemini API Documentation
- Cloudflare Pages Functions
- Prerendering and SEO - web.dev
- Marked.js - Markdown Parser
- Continuous Quality Improvement - Union of Japanese Scientists and Engineers
- Software Development Life Cycle (SDLC) - IPA