Sound waves in oil

I. Introduction: The Era of Context Engineering

In 2025, the term "Vibe Coding" gained popularity. Vibe coding is the process of turning high-level written intent into computer code. Often times, the results go unreviewed, untested, or unvalidated. It makes use of a massive data storage of compressed coding patterns known as a large language model (LLM) to spit out code that seems to all fit together to satisfy the user's intent. It works remarkably well for a wide range of simple programming tasks, but often breaks down at scale on larger projects. At least currently.

As we continue to apply AI-assisted coding approaches to more complex problems, we often see results breaking down. The problem engineers seem to be focused on is the context window. For AI, the context window is the token buffer size presented to the LLM for response. Importantly, for each request you send to the LLM, the entire conversation is packaged up and sent. This is its entire memory of the conversation. Without it, the model has no recollection of you or your request. It's complete amnesia.

For humans, this context window could be compared to our focus and the number of things we hold in our short-term memory. According to Miller's Law, that's about seven, plus or minus two items. It's not exactly comparable but there are some interesting similarities that are worth exploring. For example, the term "context switching", also borrowed from computers, seems to apply here. It suggests there's some buffer or memory such as short-term memory that's being used to hold state for cognition.

However, one important difference between the two seems to be our awareness. Currently in 2026, LLMs don't seem to be aware at all. They can't self organize their thoughts on their own. I have yet to see a model that understands what it knows, and what it doesn't know. AI-assisted coding platforms like Cursor or Claude Code engineer agents to orchestrate that organization process. Unfortunately, this orchestration can still break down on complex projects. It's as though context engineering is a hack for our self awareness.

II. Understanding the Context Window

As we continue to give it input, the context window continues to fill with tokens. A token is a word fragment that's been statistically selected to be atomic for that language. For example, the fragment "tri" in the word trident has a meaning of three. The number of tokens in the context window is thus analogous to the number of bytes in a file. The bigger the context window, the more storage memory is requirement.

In 2026, models like GPT-4.1 and Gemini 2.5 Pro advertise windows of over 1 million tokens. However, while research from Atlan (2026) shows that a model can technically "hold" 1 million tokens, its accuracy in using specific details often drops by 30-90% when those details are buried in the middle of the window. This is the "Lost in the Middle" syndrome. Given that, the industry has recognized a more useful metric: the Maximum Effective Context Window (MECW).

In addition, context is also computationally expensive. Computing attention scores scales at a rate of $O(n^2)$. As the window fills, the AI becomes slower, more expensive, and more prone to "Context Rot". This is a degradation in reasoning where the model begins to hallucinate because it's overwhelmed by the volume of data it's trying to track.

Because of these issues, engineering the context window has become the number one infrastructure problem for AI engineers.

III. Context Window on Coding Tasks

Vibe coding feels like magic in a simple project because the context is small. The AI knows every file because there are only five of them. But as the project grows, the "vibe" starts getting context rot.

Things that contribute to context rot are:

1. Signal-to-Noise Failure

Every token competes for attention of the LLM. When extraneous details such as output from a debug session are pasted into the buffer, the model can get overwhelmed by all the details. In a typical debugging session, a developer might paste 50 lines of an error log into the chat. Those 50 lines are "noise." They eat up tokens and distract the model from what's important. In 2026, State Bloat is the leading cause of agentic failure.

Another issue is relevancy. Not all code is relevant to the task at hand. In large projects, dumping all the code is not only infeasible, but distracts the task at hand. If you ask the LLM to provide a fix for an authentication UI, providing details about the payment gateway completes for attention. The LLM doesn't know if it's replicating an authentication pattern or a payment gateway pattern.

2. The Recency Bias

LLMs have a recency bias. If you spend 15 turns trying to fix a bug, the "vibe" of the conversation is now dominated by failed attempts. The model starts to "think" that the incorrect patterns it generated in the last five turns are the standard to follow. This creates a feedback loop where the AI repeats its own mistakes because they are the most present thing in its memory.

IV. Platforms to Manage the Context Window

Agentic platforms like Cursor and Claude Code are effectively the Context Engineers of the vibe coding world. While a user might simply stuff a context window by pasting code, these platforms use additional behind-the-scene techniques to manage the content without overloading the model’s memory.

1. RAG & Semantic Indexing (An AI Mental Map)

Initial attempts to manage context used Retrieval-Augmented Generation (RAG). Both Cursor and Claude Code employed this approach initially. As of 2026, Claude no longer makes use of it. With RAG, they're not just reading your files; they're indexing them. This is the AI version of a human's "mental map."

When you ask a question in Cursor, it doesn't feed the whole codebase to the LLM. It uses RAG to find the most relevant chunks (functions, classes) and injects only those into the context.. In 2026, this approach has moved toward symbol-based indexing. Instead of reading raw text, they parse the syntax tree. This allows the agent to recall function signatures across the whole project while only presenting the full logic of the file you are currently editing in the context window.

2. Progressive Disclosure (The "Need-to-Know" Basis)

Claude Code uses a "Gather $\rightarrow$ Act $\rightarrow$ Verify" loop that mimics human focus.. Rather than loading everything upfront, the agent starts with a minimal skeleton of your project. This is a form of context budgeting. If it realizes it needs to understand the database schema to fix a bug, it decides to call a tool to read that specific file. As a session grows, Claude Code automatically compacts the history. It summarizes previous steps to free up token space for the current problem.

3. Externalized Memory (MCP & Memory Pointers)

One of the biggest developments in 2025 is the Model Context Protocol (MCP), which both platforms utilize to keep the precision of the information in the context window. MCP servers allow the AI agent to interact with external tools (GitHub, Google Drive, Slack) as if they were part of its own memory, but without the cost of computing on the dataset. It's a bit like a pocket calculator that saves us being distracted by a massive simple computation.

In addition, instead of holding a 10,000-line CSV result in context, the agent holds a pointer such as a small ID. It only looks at the data when it executes a specific calculation, keeping the context window clean for logic and reasoning.

V. The Cognitive Comparison: Human Memory vs. Model Context

Comparatively, humans are pretty bad at keeping things in context. If we're comparing apples and oranges, (you know, because they're both round), a context window of one million tokens is a far cry from seven items. However, this comparison should come with a grain of salt. Human memory is associative and those seven items often associate with many other things. For example, as you recall three to five notes of a song, suddenly the whole thing is being replayed in your mind. How much is actually in context in a human mind is thus hard to measure for the purposes of comparison.

That said, our memory isn't what gives us the ability to work on massive code bases. For complex code bases, it's common that no one person will have the whole system in their mind. Instead, we use abstraction to generalize and simplify concepts. We don't need to know all the details of the payment gateway to use it. We define what a payment gateway is at a high-level and use class and function definitions to break the problem into manageable pieces.

We do this automatically with complex concepts in general. We make use of the natural pattern matching in our brains to recognize all the basic things a payment gateway will be. We do this even before we call it a payment gateway. While working in the payment space, at some point someone defines the concept of a payment gateway. If it's useful and fits well, we all eventually agree on what's included in that definition. Then we have a standard definition or pattern of what a payment gateway should and shouldn't be.

We then use a hierarchical approach to swap concepts in and out of our focus. Through association, we keep the details of what is needed by traversing this conceptual tree, ignoring branches that aren't necessary.

Deciding what is and isn't inside this focus seems to be the key to managing complex problems.

Comparing these human techniques against our current context window engineering, there's an interesting parallel with progressive disclosure or agent summarizations.

Beyond that, RAG turns out to be more effective as a cost reduction. And in fact, Anthropic felt Claude performed better with a grep-style search than RAG. Though that's not to say that RAG isn't contributing to context management. The pattern matching makes the retrieval quicker making semantic search more feasible. However, this seems to be more a solution to how to get information, than why we need the information in context. It also seems to shift the work of relevancy to the indexer.

Finally, MCP shifts laborous well-defined work out of reasoning and into algorithms. By calling into an indexed database or compiler tool, a bunch of side details can be quickly retrieved or computed without consuming precious context space that is being used to reason with. Like RAG, this is more of an aspect of efficiency of thought. I see this a bit like a subconscious skill as well. You've done it enough that it's been refined into a subconscious process.

VI. Beyond Context Engineering: Human and AI Awareness

Little is scientifically known about human awareness. There are competing theories but the actual architecture is a mystery. The lack of understanding explains why awareness in AI is not apparent in its responses. Many describe AI as telling us what we want to hear, rather than having individual thought. I think this follows directly from its lack of awareness. It has no sense of self, and thus doesn't seem to elicit genuine curiosity or exploration of stimulus.

While some strategies for context engineer such as context summarization are starting to show glimmers of self reflection, the approach is still too rigid to suggest artificial general intelligence (AGI). In fact, the necessity of these solutions to make the technology useful underlines the limitations of LLMs in general.

Of course, this shouldn't distract from how useful the technology can be for certain applications such as coding. But the lack of awareness suggests a lack of adaptability to the responses. Without novel approaches or a limited view of what should or shouldn't in context, software engineering as a whole will likely be mostly unaffected for a while longer. Programming skills on the other hand are already seeing substantial changes.

VII. Conclusion

As AI technology continues to develop from this early stage, I think we're starting to see patterns that resemble awareness. Context window summarization requires a sort of self reflection. And self reflection is associated with strategies of awareness.

While these technologies don't yet suggest an obvious path forward to AGI, techniques such as context window summarization coupled with reasoning models make problem solving for common tasks much more useful.

It will be interesting to see if engineered solutions can create an environment for some basic awareness, or if a new data-science approach will be needed.