Solving the "Don't make me repeat myself" problem:
Genspark Quality Check v3.0's History Accumulation Strategy
📑 Table of Contents
- Introduction: The Agony of "Don't Make Me Repeat Myself"
- Root Cause: LLM Context Window Limitations
- Solution Overview: History Accumulation Strategy v3.0
- Implementation Details: Conversation History Persistence and Reference
- Validation Failure: The Catastrophe during Article 47-53 Publication
- v4.0: History Validity Check Feature
- Future Test Plan
- Summary: How to Build a Long-Term Relationship with AI
"Genspark lies." In our development struggles so far, we have touched upon the risk of "hallucinations" where AI agents confidently present incorrect information. However, as development projects become long-term and the number of chat exchanges increases, another serious problem has come to light.
That is the problem: "Genspark forgets."
"Didn't I tell you in the last instruction that title changes are forbidden?" "Didn't I tell you multiple times that CSS must always be implemented?" — Have you ever had to repeat such instructions to an AI? We've decided to call this the "instruction forgetting" problem, or, with a cry from the development floor, the "Don't make me repeat myself" problem.
This article details the concept and implementation of the "Conversation History Accumulation Type Quality Check Workflow (v3.0)" devised to address this deep-rooted problem, and the painful failure we faced during the publication process of articles 47-53. Furthermore, we will disclose technical details of the "History Validity Check Feature" implemented in v4.0 to overcome that failure.
1. Introduction: The "Instruction Forgetting" Problem in Articles 47-53
The production process for Genspark Development Struggles articles 47 to 53 truly felt like a "test of endurance" with AI. We had established a quality check system using multiple AI personas, the "Eric George method," and it produced tangible results. However, as chat sessions grew longer, important rules defined early on were ignored with increasing frequency.
For example, during the creation of "Genspark Basic Functionality Complete Guide: 9 Weapons for AI Search, Research, and Content Creation," the rule "Unauthorized title changes are forbidden" was ignored three times, requiring correction instructions each time. Also, in "Understanding the Difference Between Code Generation and Code Execution with Genspark," the basic requirement "CSS is mandatory" was overlooked, leading to generated HTML being output in a plain, unstyled state.
These are not mere instances of AI "carelessness." A human would remember an important instruction after a strong reprimand from a superior; an AI agent, once a certain context length is exceeded, behaves as if it were hearing that instruction for the first time. The rework this caused reached an undeniable level of extra man-hours.
2. Analysis of the Root Cause
Why does this "memory loss" occur in Genspark, which is supposed to be equipped with the latest high-performance LLMs (such as Gemini 1.5 Pro and Gemini 2.0 Flash)? Technical investigation and interviews with the AI itself revealed the following structural problems.
Genspark's Limitation: Only Recent History Is Read
Genspark's AI agent does not read the entire chat history every time. To optimize system resources and maintain response speed, the agent can only refer to "recent conversation history." This is not limited to free plans; similar architectural constraints exist even with expensive paid plans.
Context Loss and Hallucinations
As chats become prolonged, initial "precondition definitions" and "prohibition instructions" are pushed out of the AI's accessible window. This is technically called "falling out of the context window." Information that can no longer be referenced is synonymous with "non-existent" for the AI, and as a result, the AI either hallucinates (plausible lies) to compensate for the missing information or reverts to its default behavior (pre-instruction state).
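The effect can be illustrated with a minimal sketch: if only the messages that fit in a fixed "token" budget remain visible, the oldest messages, including the initial rules, are the first to be evicted. The budget and the word-based token count here are simplified assumptions for illustration, not Genspark's actual mechanism.

```python
# Simplified model of a context window: keep only the most recent
# messages that fit in a fixed "token" budget (words, for illustration).
def visible_history(messages, budget=10):
    kept, used = [], 0
    # Walk from newest to oldest, keeping messages while the budget allows.
    for msg in reversed(messages):
        cost = len(msg.split())
        if used + cost > budget:
            break  # everything older than this is "forgotten"
        kept.append(msg)
        used += cost
    return list(reversed(kept))

chat = [
    "RULE: never change article titles",  # early instruction
    "Draft section 1 please",
    "Now draft section 2",
    "Add CSS and draft section 3",
]
print(visible_history(chat, budget=10))  # the RULE message has fallen out
```

With a budget of 10 words, only the two newest requests survive; the title-change rule is simply no longer there for the model to obey.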
3. Solution: Conversation History Accumulation Method (v3.0)
We cannot change the specifications of the AI platform. Therefore, we devised a mechanism to supplement this memory on the client side (user side). This is the "Conversation History Accumulation Type Quality Check Workflow v3.0."
The core of this strategy is the utilization of "AI Drive as external memory."
- Step 1: Each time a quality check is performed, extract the most recent conversation history.
- Step 2: Append and save the extracted history to a persistent file (conversation_history_accumulated.txt) on AI Drive.
- Step 3: The next time a quality check is performed, load this accumulated "full history file" and pass it to the Gemini API as part of the prompt.
With this mechanism, even past instructions that have disappeared from Genspark's chat screen remain vivid in the "brain" of the Gemini model performing the quality check. A physical text file forcefully maintains the context of "Didn't I tell you this last time?"
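Step 3 can be sketched as follows: the accumulated file is read from AI Drive and prepended to the quality-check prompt. The file path and the exact prompt wording are illustrative assumptions, and the actual Gemini API call is omitted; only the prompt assembly is shown.

```python
import os

# Illustrative path, not necessarily the project's actual AI Drive mount point.
HISTORY_FILE = "/mnt/aidrive/conversation_history_accumulated.txt"

def build_check_prompt(article_html, history_file=HISTORY_FILE):
    """Load the accumulated history and embed it in the QA prompt."""
    history = ""
    if os.path.exists(history_file):
        with open(history_file, "r", encoding="utf-8") as f:
            history = f.read()
    return (
        "You are a quality checker. Honor every past instruction below.\n\n"
        "=== Accumulated conversation history ===\n"
        f"{history}\n"
        "=== Article under review ===\n"
        f"{article_html}"
    )
```

The returned string is what gets sent to the Gemini API, so past instructions travel with every single check.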
4. Implementation Details
To realize this concept, we developed two core scripts: a Python version and a Shell version.
Python Script (gemini_qa_v25_accumulated.py)
This script parses the article's HTML file and evaluates its quality against the accumulated conversation history. A key addition in v3.0 is the logic for "deduplication" and "appending" history.
import os

# Path on AI Drive (illustrative; the actual mount point may differ)
HISTORY_FILE_AIDRIVE = "/mnt/aidrive/conversation_history_accumulated.txt"

def load_or_create_history(new_conversation):
    if os.path.exists(HISTORY_FILE_AIDRIVE):
        # Read existing history from AI Drive
        with open(HISTORY_FILE_AIDRIVE, 'r', encoding='utf-8') as f:
            existing_history = f.read()
        # Duplication check (simple version):
        # has the beginning of the new conversation already been saved?
        check_snippet = new_conversation[:50]
        if check_snippet in existing_history:
            print("⚠️ Duplicate content, skipping append.")
            return existing_history
        # Append the new conversation
        accumulated_history = existing_history + "\n\n" + new_conversation
    else:
        # First run: start the history with the new conversation
        accumulated_history = new_conversation
    # Persist the accumulated history back to AI Drive
    with open(HISTORY_FILE_AIDRIVE, 'w', encoding='utf-8') as f:
        f.write(accumulated_history)
    return accumulated_history
In this way, we implemented a minimal check so that the script does not merely append, but also prevents the history file from bloating through repeated saves of the same conversation.
5. Failure in Practical Validation: Lessons from Article 47-53 Publication
This v3.0 workflow, which seemed perfect in theory, suffered a critical failure when actually tested during the publication process of articles 47-53 (batch processing of 7 articles).
Details of the Failure
On January 10, 2026, we began the final quality check for the publication of articles 47-53. The workflow document (Markdown file) clearly stated: "Before performing a quality check, always accumulate the most recent conversation history in AI Drive."
However, the AI agent overlooked this instruction. When the user requested a quality check, the AI immediately launched the check program, but the history file serving as its input remained outdated. When the user noticed and pointed this out, saying "Please make sure to update the chat history before each quality check," the AI apologized but then neglected history accumulation again in subsequent processes.
As a result, important conversations from 05:50 JST to 06:02 JST on 2026-01-10 (e.g., discussions about the concept of "Choosing Between Genspark and Gemini API") were missing from the history file. I manually appended the history in a hurry, but the manual copy-and-paste was incomplete, and part of the context was lost.
Why Did the AI Ignore the Procedure?
The root cause of this failure lay in the naive assumption that "the AI can read the procedure document (Markdown)" and "the AI can act according to the procedure" are the same thing.
- Context Dilution: Due to the long session, the workflow definition gradually faded from the AI's "short-term memory."
- Hallucination: A false assumption (hallucination) occurred that "history should be updated automatically," causing the manual execution step to be skipped.
- Limitations of Human Intervention: Unless a human explicitly asks "Have you accumulated the history?" every time, the AI will readily skip steps and try to reach its goal (quality check completion).
The idea that "it's written in the workflow, so it's fine" proved invalid in collaboration with AI.
6. Improvement: History Validity Check Feature (v4.0)
From this painful lesson, we reached a conclusion: "Don't trust the AI's good intentions; enforce it with code."
We immediately modified the script and implemented the following three powerful safeguards as v4.0.
(1) History Validity Check Feature (check_history_validity)
This feature uses the AI (Gemini) itself to determine if "the current history file state is correct" immediately before executing a quality check. It performs strict checks from the following two perspectives:
- Timestamp Check: Checks the last modified date and time of the history file. Compares it with the current time and issues a "Warning" if more than 5 minutes have passed, or "Warning Level 2 (Requires Confirmation)" if more than 10 minutes have passed. This prevents "checks from proceeding with outdated history."
- Content Consistency Check: Reads the last 1000 characters of the history and confirms whether it contains a summary of the recent session or expected keywords. If keywords are missing, the process is interrupted as an "Error."
check_history_validity() {
    # Gather facts about the history file (path is illustrative)
    local history_file="/mnt/aidrive/conversation_history_accumulated.txt"
    local mtime_epoch last_modified current_time diff_minutes
    mtime_epoch=$(stat -c %Y "$history_file")  # GNU stat; use 'stat -f %m' on BSD/macOS
    last_modified=$(date -d "@${mtime_epoch}" '+%Y-%m-%d %H:%M:%S')
    current_time=$(date '+%Y-%m-%d %H:%M:%S')
    diff_minutes=$(( ($(date +%s) - mtime_epoch) / 60 ))

    # Generate prompt for the Gemini query
    local check_prompt="You are responsible for monitoring conversation history accumulation.
History file last modified: ${last_modified}
Current time: ${current_time}
Elapsed time: ${diff_minutes} minutes
【Judgment Criteria】
1. Elapsed time 5 minutes or more → Warning Level 1
2. Elapsed time 10 minutes or more → Warning Level 2
3. Expected keyword missing → Error (Abort)"
    # Gemini API call...
}
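The same two checks can also be expressed deterministically, without asking Gemini to judge. This Python sketch mirrors the shell logic using the thresholds stated above; the function signature and keyword argument are our own illustrative choices, not the project's actual code.

```python
import os
import time

def check_history_validity(path, expected_keyword, warn_min=5, warn2_min=10):
    """Return 'ok', 'warning1', 'warning2', or 'error' per the v4.0 criteria."""
    if not os.path.exists(path):
        return "error"  # no history file at all
    # Content consistency check: expected keyword in the last 1000 characters.
    with open(path, "r", encoding="utf-8") as f:
        tail = f.read()[-1000:]
    if expected_keyword not in tail:
        return "error"  # history does not reflect the recent session -> abort
    # Timestamp check: minutes elapsed since last modification.
    elapsed_min = (time.time() - os.path.getmtime(path)) / 60
    if elapsed_min >= warn2_min:
        return "warning2"  # requires confirmation
    if elapsed_min >= warn_min:
        return "warning1"
    return "ok"
```

Running this gate before every quality check means a stale or inconsistent history file stops the pipeline instead of silently feeding outdated context to the checker.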
(2) Automatic Backup Function
Overwriting the history file carries risks. If by any chance it is overwritten with empty data, all past memory will be lost. In v4.0, we implemented a feature that automatically creates a backup before updating the history file.
The backup destination is /mnt/user-data/outputs/history_backups/, and the filename includes a timestamp with seconds precision (e.g., conversation_history_20260110_080637.txt). This ensures that no matter when an accident occurs, we can reliably restore the previous state.
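A minimal version of this backup step can be sketched as follows, using the destination directory and filename pattern named above. The helper name `backup_history` is ours, not the project's.

```python
import os
import shutil
from datetime import datetime

# Destination directory from the article; created on first use.
BACKUP_DIR = "/mnt/user-data/outputs/history_backups"

def backup_history(history_path, backup_dir=BACKUP_DIR):
    """Copy the history file aside under a seconds-precision timestamp name."""
    if not os.path.exists(history_path):
        return None  # nothing to back up yet
    os.makedirs(backup_dir, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    dest = os.path.join(backup_dir, f"conversation_history_{stamp}.txt")
    shutil.copy2(history_path, dest)  # copy2 also preserves file timestamps
    return dest
```

Calling this immediately before every write to the history file guarantees that an accidental empty overwrite costs at most one session, never the whole memory.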
(3) Enhanced Duplication Check
The simple duplication check in v3.0 (matching the first 50 characters) had a risk of incorrectly rejecting different conversations that began with the same greeting. In v4.0, this was changed to a check of the "last 100 characters" to achieve more reliable duplication detection.
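The change can be sketched as comparing the tail of the new conversation against the existing history instead of its head. The 50- and 100-character lengths come from the text; the function names and sample conversations are our own illustrations.

```python
def is_duplicate_v3(new_conversation, existing_history):
    # v3.0 check: first 50 characters -- false positives when
    # different conversations open with the same greeting.
    return new_conversation[:50] in existing_history

def is_duplicate_v4(new_conversation, existing_history, tail_len=100):
    # v4.0 check: last `tail_len` characters -- conversation endings
    # are far more distinctive than openings.
    tail = new_conversation[-tail_len:]
    return bool(tail) and tail in existing_history

history = ("User: Good morning! Please run today's quality check as usual.\n"
           "AI: Running the check for article 47... done.")
new = ("User: Good morning! Please run today's quality check as usual.\n"
       "AI: Running the check for article 48... done.")

print(is_duplicate_v3(new, history))  # True  (false positive on the greeting)
print(is_duplicate_v4(new, history))  # False (correctly treated as new)
```

The shared greeting fools the head-based check, while the tail-based check sees the distinct ending and accepts the new conversation for appending.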
7. Future Test Plan
The implementation of v4.0 is complete, but whether it can truly prevent "AI instruction forgetting" needs to be proven through future practice.
Short-Term Plan (Articles 55-56)
First, we will verify that the new features of v4.0 function correctly during the production of the next few articles. We will particularly focus on whether warnings are correctly issued for "history older than 5 minutes" and whether automatic backups are created without fail.
Medium-Term Plan (Next 1 Month)
We will conduct quantitative effect measurements: compared to before implementation, how much have the number of times we had to say "Don't make me repeat myself" and the rework man-hours caused by AI memory loss decreased? The goal is "zero repeated instructions."
Long-Term Plan
If this mechanism stabilizes, it can be extended not only to Genspark but also to other AI agent development and large-scale document creation projects. By switching history files per project, we envision further development into "multi-context management" that distinguishes between multiple personas and contexts.
8. Summary: Learning How to Interact with AI from Failures
The failure in articles 47-53 taught us an important lesson: "AI is a very capable partner, but never a perfect manager."
AI forgets instructions. AI skips procedures. This is not malice but a characteristic of the current LLM architecture (context limitations). That is why, instead of blaming the AI, we must supplement it with "mechanisms" and "code" that prevent it from failing.
Instead of shouting "Don't make me repeat myself," we quietly execute the history accumulation script. We engrave all of the past onto AI Drive, extending the AI's brain. That is our new quality assurance strategy as we walk with Genspark.