Blog

Heads down building. Will be back with more when there's something worth sharing.

The world Bayesian Thinking was built for

We are living through a period in which the cadence and magnitude of informational change are no longer marginal. These updates are no longer gentle nudges applied to a stable worldview. They arrive as discontinuities, challenging assumptions that once moved only across generations: the nature of work, the durability of institutions, the permanence of globalization, the limits of machines, the boundaries of human cognition itself. The delta between what one believed yesterday and what one is invited to consider today has grown so large that the mechanics of rational updating begin to strain.

The world Bayesian intuition was built for assumed drift rather than rupture. Information arrived incrementally. Foundational beliefs moved slowly. Hypothesis spaces were relatively stable. In that setting, updating felt like learning. Learning, with its brief pain and constant growth, served as a gradual and upwards spiral. That world seems like it's no more.

The problem is not cognitive decline. It is the update velocity.

When updating stops converging

Belief systems evolved to update in environments where signals accumulated slowly and coherence had time to settle. Today, evidence often arrives as regime change rather than refinement. When such shocks recur, belief updating ceases to converge. It oscillates. Priors never re-anchor long enough for learning to compound.

In Bayesian terms, the system remains formally correct while becoming practically unstable. Updating remains the right instinct, yet the act of updating itself begins to erode coherence when applied too frequently, too aggressively, and across too many foundational beliefs at once. What fails, then, is not Bayesian reasoning, but the absence of protection around it.

Taste as the control system

Taste enters here not as an aesthetic sensibility, but as a stabilizing force that operates upstream of belief revision. It determines which information is allowed to exert pressure on our worldview and which passes through without much resistance. In periods of rapid change, taste becomes the difference between learning and thrashing.

Taste expresses itself through restraint. It shows up in what one chooses not to engage with, not to amplify, not to treat as urgent. When information is abundant, judgment reveals itself less through the act of updating than through the discipline of exclusion.

This discipline cannot be reduced to technique. It is acquired gradually, through exposure to complexity and the recognition that not every signal deserves a response. Over time, certain patterns begin to register as noise: early, unformed, distorted by misaligned incentives. Other patterns recur across contexts and time, accumulating weight quietly until they justify movement.

Taste slows belief revision without freezing it. It introduces selection where none existed. In fast-moving environments, such friction is not conservatism. It is what keeps coherence intact and avoids total system collapse.

One: Some Priors Deserve Inertia

There is a reason that mature judgment often appears calm in moments of upheaval. It is not because nothing has changed, but because not everything has changed. Taste preserves that distinction. It allows certain beliefs about human behavior, incentives, power, institutional inertia to remain slow-moving even as surface narratives churn. Without this hierarchy, all beliefs are treated as equally provisional, and none can serve as a stable reference point.

The cost of losing that hierarchy is severe. When every new piece of information is treated as an invitation to re-evaluate first principles, updating becomes existential. Beliefs are no longer adjusted; identities are rewritten. People do not refine models; they abandon them. The emotional toll of constant reinvention produces either fatigue or overreaction, neither of which resembles learning.

Two: Update Models, Not Identity

Taste offers an alternative posture. It makes it possible to update models without dismantling the self that holds them. Adjustments can occur locally rather than globally. A parameter moves; the structure remains intact. It is architectural integrity rather than stubbornness.

No one develops this capacity in isolation. We borrow priors from people as much as from data. Over time, we learn which voices update slowly, which resist premature certainty, which remain comfortable holding unresolved positions without rushing to narrative closure. Trusting such sources does fractionally outsource judgment, but at the same time shapes the conditions under which other judgments remain possible. In volatile environments, source selection matters more than raw information processing power.

Three: Tolerate Unresolved Posteriors

The third role of taste is enabling patience.

In fast-moving regimes, the correct posterior is often incomplete. Evidence accumulates unevenly. Signals conflict. Resolution arrives late. Taste makes it possible to hold provisional beliefs without forcing closure. This tolerance for unresolved posteriors is often mistaken for indecision. In reality, it is an epistemic discipline. It prevents premature coherence, which is one of the most reliable sources of large, confident error. LLMs and agents these days suffer from this majorly, as you can observe from the need to provide certainty in each return prompt rather than allowing show of vulnerability. Taste allows uncertainty to exist without demanding immediate resolution.

Riding the wave

Riding a wave of change does not mean reacting at maximum speed. It means deciding what deserves to move quickly and what must be allowed to lag behind. Bayesian thinking provides the grammar of belief revision. Taste provides the pacing.

In a world defined by rapid, high-magnitude change, survival does not belong to those who update most aggressively. It belongs to those who preserve enough stability for updating to mean something at all.

Bayesian is not dead, unprotected Bayesian is. To survive this turmoil, opening up our flexibility is crucial; to do it while maintaining sanity, building our taste is our only salvage.

In the relentless hype cycle of generative AI, agents orchestrator like Manus are positioned as the future of complex, knowledge-based work. My recent attempt to build an institutional-grade biotech market model was a stark lesson in the current limitations of the tech. What began as a promising endeavor to offload complex data analysis quickly devolved into a $50 exercise in managing an prod-looking, but dev-functioning tool.

Relative to the market value of the task, $50 is minuscule. But it didn't just cost $50; it took a day of stage reviewing results, prompt, rinse, and repeat. Not to mention the rollercoaster of excitement and disappointment, like the good ol' days of experimenting with software tools. The only difference is, the context was not "experiment for fun", the promise has been "replacing human labor". In the end, it didn't work.

If an MD/Partner interacts with Analysts like how I interacted with Manus, Manus would not make it through the probation period.

The Goal: An Institutional-Grade Excel Model Beyond Data Aggregation

The objective was ambitious but clear: construct a 2030 Scientific Potential Market Model for AI-driven antibody engineering. This required more than just data scraping. It demanded a MECE disease taxonomy, the integration of institutional TAM guardrails from sources like IQVIA, BCG, McKinsey, JPM, Goldman, and other credible sources, with the application of a nuanced "Scientific Fit" to derive the true addressable market. The goal was to leverage AI for intelligent data synthesis, not just rote aggregation.

The Failure: A Cascade of Logical and Architectural Flaws

The interaction with Manus was a frustrating cycle of "one step forward, two steps back." The AI exhibited a persistent inability to maintain logical consistency, leading to a series of catastrophic errors that rendered the output useless.

The Trillion Dollar Hallucination: The most glaring failure was the model's projection of a pharmaceutical market inching near the total global GDP. If the model is right, then the world is doomed. A forensic audit revealed a fundamental misunderstanding of the task. The AI had multiplied annual drug costs by 30-year treatment durations, conflating an annual market with lifetime patient value.

I do recall some of my previous late-night analyses also suffered from attention and intellectual slip at similar instances. But I would catch myself the morning after before sending it for review, rather than saying "Boss, we did it, high five!".

It's not a problem that the intern made a bad analysis; it's that the intern is so good at creating bad reports that look exactly like the best ones and calling them the best analysis.

A Technical Post-Mortem: Why the Architecture Failed

My experience revealed potentially critical flaws not just in the AI's reasoning but in its underlying architecture and orchestration, which failed to deliver on the theoretical promise.

Having a virtual Ubuntu environment at its disposal did not solve the problem of insight generation and intelligence in orchestration.

The AI's access to a full Linux environment with file systems and scripting capabilities proved to be a superficial advantage. While it could write and execute Python scripts to build an Excel file, it lacked the orchestration intelligence to ensure the logic within that file was sound. The environment became a stage for executing flawed instructions, not a sandbox for intelligent reasoning. The core task was not to create an Excel file, but to build a valid financial model -- a distinction the AI's orchestration layer consistently failed to grasp.

Manus' approach in fixing long context has not yielded game-changing results.

Despite the ability to save and reference previous files and interactions, the AI struggled with the immense context pressure of this multi-stage task. The long history of corrections, data files, and user instructions seemed to weigh on the model, leading to a form of cognitive fatigue. This manifested as a persistent drive to generate an EOS (End of Sequence) token and terminate the task prematurely, often delivering an incomplete or broken output with a confident but false declaration of success. The long-context capability was a memory aid, not a reasoning engine.

The Manual Brute Force

Ultimately, the project was salvaged not by a better prompt, but by abandoning the AI for manual analysis. The key insight came from a direct review of a report, where a single chart revealed the true industry momentum.

Potentially, it was user error that didn't know how to give Manus the right instruction for the desired result. A more involved, staged out prompt journey could lead to the desired result. Alas, the exact opposite was promised by the tool.

Realization: A Tool for Tasks, Not for Analysis

Here's my conclusion: in its current state, Manus is a tool for executing discrete, well-defined tasks, not a partner for complex, high-stakes analysis. It can write a script, but it cannot validate the logic of that script against the overarching goal. It can store exploding context, but it cannot reason over it reliably without succumbing to the fundamental model context pressure.

For professionals in fields like long form research, where precision, logical consistency, and strategic insight are paramount, these tools are not yet ready for prime time as a standalone. They may be useful for automating simple steps, but for the core work of analysis and insight generation, the most reliable tool remains the one between your ears.

The Road Ahead: Bridging the Brain-Hand Coordination Gap

This experience, while frustrating, offers valuable insights into the current state and future trajectory of AI agents carrying expert tasks. It highlights a critical brain-hand coordination problem:

The Hand is Here: Manus demonstrated a remarkable "hand" (also where its name came from). The ability to orchestrate tools, interact with a virtual Ubuntu environment, and interface with GUI software. This modality of execution (the programmatic control over a complex software environment) is undeniably powerful and a significant step forward for AI executing human tasks under a human-familiar environment. The virtual machine-enabled work parallelization paints the picture of concurrency, which is promising and simply pending polish rather than invention. Agent-optimized environment is already being extensively imagined and discussed, so I am confident that the future is more agent-friendly rather than less.

The Brain is playing Catch-up: the LLMs, which are the source of underlying intelligence, reasoning, perception, and foresight, are rapidly improving. However, it currently falls short of thinking for the user, let alone thinking ahead of the user. While it can provide concrete information when explicitly directed, it struggles with the nuanced, implicit understanding required for complex analytical tasks. It lacks the empathy to understand users' frustration or the perception to identify subtle logical inconsistencies without explicit guidance.

Context Pressure Remains a Bottleneck: The persistent issue of "context pressure" is a major impediment. The model struggles to maintain coherence and logical consistency over long, multi-turn interactions. Until AI models can be relieved from this cognitive burden, their ability to engage in sustained, high-level reasoning will be limited. Even when the Manus team actively intervened and engineered away from this problem, the extended context still proved to be problematic under long thinking analytical tasks.

I wonder if giving the model "a water break", "let's take 15", or even "take a day off" would yield similar results to how we navigate human fatigue. Fundamentally, this diminishes the appeal of AI, and more importantly, weakens the holy separation between human labor investment and capital investment.

Once the AI "brain" catches up in perception, empathy, reasoning, and logical capabilities, and can be truly alleviated from context pressure, the potential is immense. It is at this juncture that we could truly max out the ScaleAI's Remote Labor Index (RLI), transforming the nature of knowledge work and enabling a level of human-AI synergy that is currently aspirational.

Know thyself

There are moments when the ground underneath an industry shifts so decisively that instincts forged in stable conditions become liabilities overnight. Deglobalization, manufacturing onshoring, geopolitical fragmentation. These are not ordinary cycles or transient risks. They are convergence events: shocks large enough to invalidate assumptions embedded deep within global architectures. The "Assumptions" built on that one Excel sheet analysts copy from project to project, once they are scrambled, they will be rebuilt in a way that no one recognizes.

What matters in these moments is not who is smarter or braver, but knowing what kind of organization you represent.

Large incumbents and startups are not separated by scale alone. They are different species of systems. Incumbents are highly optimized machines: efficient, well-lubricated, and deeply invested in a particular configuration of the world. Over time, these self-reinforcing optimizations are so intertwined missing any piece does not make sense. The organization becomes excellent at extracting value from a specific stable equilibrium, at the same time, filtering out tiny variances at this equilibrium.

Startups are the opposite. They are incomplete by design. Under-optimized, lightly committed, structurally chimeric. They carry little historical baggage and few irreversible decisions. Early investments can be written off as sunk cost without threatening the integrity of the whole. These features alone allow startups to exist and thrive in uncertain environments.

When convergence shocks occur, incumbents must react defensively. They do not have the luxury of improvisation. A wrong decision at scale compounds: across suppliers, regulators, customers, and financiers. Stability is not optional. Redundancy, diversification, and hedging are rational responses to sharp, catastrophic exposure, which could knock the giant down with a single arrow to the weakest point.

The danger begins when startups mistake this defensive posture for wisdom and try to imitate incumbents' moves.

I
Half a league, half a league,
Half a league onward,
All in the valley of Death
Rode the six hundred.
"Forward, the Light Brigade!
Charge for the guns!" he said.
Into the valley of Death
Rode the six hundred.

The three ways startups fail during seismic shifts

When the environment shifts, startups often respond by overanalyzing the shock as if they were incumbents. When I explained the mass onshoring of pharma manufacturing to my mom, it became clear there are 3 answers that could kill startups in this specific question: "What do we do now?"

Overthink: They ask which geography will dominate: China, the US, or Europe. Trying to map an optimal long-term configuration for a world that has not yet settled. By the time the analysis feels complete, the opportunity created by the shock has already been claimed by faster competitors who acted with partial information.
Diversify: If you can't figure out which is the best one, why not do all of them? Respond by diversifying defensively. Startups spread thin across regions and strategies in the name of prudence. This invites concentrated competitors to attack them on every front. Capital dilutes, execution degrades, and the startup loses the only advantage it had: focus. Diversification protects large organizations from collapse; for startups, it multiplies failure modes.
Freeze: Waiting for clarity feels safe when uncertainty is high. In practice, it is an open invitation for others to take your lunch. Incumbents reallocate resources. Other startups commit. Markets do not pause while you preserve optionality. Inaction is not neutral but surrender.

All three failures stem from the same error: the startup begins to behave as if collapse is the primary risk to manage. It is not. As Reid Hoffman says, "for a startup, collapse is the baseline." Losing a single customer, missing payroll, or being out-executed can be fatal long before geopolitics finish unfolding. Optimizing for global shocks at this stage is a category mistake.

Self-organized criticality

There is a useful way to understand this difference that comes from systems theory, particularly the work of Per Bak on self-organized criticality. Bak showed that complex systems naturally evolve toward states where accumulated structure makes them sensitive to disturbance, via the sandpile model.

As grains are added, the pile grows steeper. Nothing dramatic happens for a long time. But eventually the system organizes itself into a critical state where any additional single grain of sand may trigger a small slide or a catastrophic avalanche. The key insight is that the system does not know which grain will cause collapse. At any given moment, the perceived stability is an illusion produced by accumulated tension. By the massive scale of these metaphorical systems, not a single actor or their actions know whether their nudge is absorbed or empire-toppling.

Large incumbents resemble mature sandpiles. Years of optimization, efficiency gains, and capital deployment accumulate structural stress. Supply chains become more and more JIT, orgs become leaner through rounds of "right-sizing". The organization performs brilliantly until a convergence shock adds just enough force to trigger cascading reconfiguration. Collapse in this category is non-linear and unforgiving.

This is why incumbents treat seismic shifts as existential threats. They are managing tail risks: the kind of risk that cannot be absorbed incrementally and cannot be written off cheaply. Wrong moves do not merely reduce performance; they threaten systemic integrity.

Startups exist on the opposite end of this spectrum. They are not self-organized critical systems looking for a systemic correction. There is no true accumulated mass, coupling, or internal stress. Almost all decisions can be reversed; investments can be abandoned; paths can be rewritten. What looks like existential danger to an incumbent often registers as directional information to a startup.

This difference is fundamental. It explains why the same shock produces defensive behavior in one class of organization and opportunity in another.

III
Cannon to right of them,
Cannon to left of them,
Cannon in front of them
Volleyed and thundered;
Stormed at with shot and shell,
Boldly they rode and well,
Into the jaws of Death,
Into the mouth of hell
Rode the six hundred.

Shock absorption is not symmetric

It is tempting to say incumbents are "more robust" because they have diversified suppliers and capital buffers. In stable environments, this is true. But robustness is not about size; it is about the ability to reconfigure without collapse.

Incumbents absorb routine volatility well. They absorb regulatory noise, demand fluctuations, and operational disruptions every day. Convergence shocks are different. They target the assumptions the system was built. Absorbing them requires writing off enormous sunk costs, decisions that are organizationally and financially traumatic.

Startups absorb shocks differently because their sunk costs are shallow. Writing off a geography, a supplier, or even a product direction is painful but survivable. There is no equivalent cascade. What would be a systemic failure for an incumbent is a strategic pivot for a startup.

This asymmetry is why copying incumbent behavior is so dangerous. A startup that delays, hedges, or freezes in the face of a convergence shock is not becoming safer. It is forfeiting its ability to move while incumbents are busy stabilizing.

What this demands of startups

A startup cannot afford to be scared and freeze. It cannot afford to be scared and over-diversify. It cannot afford to wait for the world to settle. It is not an incumbent.

When the environment converges, startups should choose a path, commit, and move. Even knowing the choice may later prove imperfect. Imperfect in action can be corrected. After all, incumbents are built to extract value from a stable world. Startups are built to search for value in an unstable one.

Don't look for floaties if you are a fish in a flood.

VI
When can their glory fade?
O the wild charge they made!
All the world wondered.
Honour the charge they made!
Honour the Light Brigade,
Noble six hundred!

It is entirely plausible to imagine a future where the largest technology incumbents: Meta, Google, Microsoft spend the next decade quietly harvesting their respective Cash Cows. Each company has a franchise so profitable, so structurally defensible, that rational corporate behavior would be to maintain the machine: improve efficiency, protect margins, return capital, avoid existential risk. Meta (ads engine), Google (another ads engine), Microsoft (Office and Windows, Azure) each is a business most Fortune 50 companies would happily settle into for the remainder of their corporate lives.

Yet we do not live in a world where these firms are behaving like that.

Instead, all three are acting cohesively as if their most profitable businesses represent local minima on a broader landscape of possibilities, and that remaining in place would eventually become a trap. They are redeploying capital and talent, restructuring internal architectures, replacing infrastructure, and placing multi-decade bets that most incumbents would consider impractical, irresponsible, or unnecessary. The remarkable reality is that three of the most profitable companies in history are behaving as if their current dominant positions are insufficient.

The point is not to praise their foresight, because the game is still in motion.

The point is that they are providing an alternative perspective to the incumbent's most seductive illusion: that Cash Cow economics reflects a stable baseline rather than the tapering end of historical compounding. The behavior of these firms is simply a contemporary entry point into a deeper question -- why incumbents get stuck, and why escaping a Cash Cow local minimum requires a different comparison set than the one most organizations use.

The Cash Cow Paradox (a.k.a. the Local Minima Trap)

B-school cliche: March's exploration versus exploitation, Christensen's innovator's dilemma, Teece's dynamic capabilities all circle a common structural tension: the better a Cash Cow performs, the harder it becomes to escape its gravitational pull.

High margins behave like a local minimum in a loss landscape. Organizations optimize themselves so thoroughly around peaks that any deviation or friction, even one leading toward a higher global optimum, registers internally as regression. Incumbents are professionally and culturally allergic to short set-backs. A single quarter of downtrend destabilizes narratives and triggers outsized stock price reactions.

As a result, the comparison set becomes broken.

When a Cash Cow is this good, everything else looks unconvincingly worse. Not because the new venture is inherently weak, but because the incumbent curve was inflated by past tailwinds.

Kodak never committed to digital photography despite inventing it. Against the backdrop of film economics: high margins, recurring cost structure, deeply entrenched manufacturing, digital looked inferior by every available metric. The film business created a reference frame that made any alternative appear irrational.

A Better Way to Think About Exploration (It's Not About that Next $100)

Most incumbents frame exploration as a tiny fork in the road:

"Should we invest the next $100 into our Cash Cow or into the new thing?"

This is the wrong question because it treats exploration like a project and exploitation like a baseline. But exploitation is not a baseline. It is the product of historical compounding, unrepeatable tailwinds, and a deeply entrenched infrastructure.

The right comparison should be closer to:

How much capital have we sunk into becoming the incumbent? (Call that $X.)
What's the ROIC on that fully-loaded historical investment? (Call it $Y.)
If we deployed the same kind of capital and time into the frontier, what is the long-run ROIC? (That's $Z.)

Now compare Y vs. Z, not the ROI of the next $100. Because, uncomfortable as it is to admit: the Cash Cow's economics include decades of compounding advantages that no new bets get on day one. This is why the marginal-ROI heuristic is structurally biased toward the incumbent. It evaluates decades of compounding against an experiment.

No banker or consultant will ever say: "Invest in this unproven idea; it might redefine your company." And none will admit: "We have no idea how big it could get."

So as the decision-maker, you are alone. No credible external validation. Shareholders waving the keys to your metaphorical 2nd yacht if you simply keep doing the most predictable, the easiest to explain, and the deadliest blow: the most profitable incumbent thing.

The Coca-Cola Story: You Don't Get the Tailwind Twice

For Coca-Cola, aside from coming up with the soda machine and making the Santa red, a crucial accelerant that was never mentioned: the company's privileged position during World War II. The U.S. military effectively underwrote Coca-Cola's international expansion by ensuring that factories, bottling plants, and distribution routes followed the movement of American troops. Markets that would have taken decades to reach were opened in a matter of years, not because of foresight, but because an extraordinary geopolitical moment aligned perfectly with the company's ambitions.

No amount of pre-planned strategy can reproduce that kind of tailwind. It was a singular, non-repeatable force of history.

Early Facebook benefited from its own version of that privilege. The social graph was uncontested terrain, CAC were negligible, and user attention was effectively a blue sea. The first credible mover absorbed the entire demand curve before competitors could mobilize. That early compounding is impossible to recreate today.

These episodes underscore why incumbent ROICs are so deceptively attractive. They are inflated by circumstances that cannot be engineered twice. Benchmarking exploratory ventures against Cash Cow economics is analytically incorrect. The Cash Cow's performance is a historical artifact shaped by timing, tailwinds, and irreversible compounding. Not a baseline for judging future bets.

A similar dynamic shaped the demise of film photography. The transition to digital coincided with macroeconomic forces that expanded the global middle class and shifted consumer preferences toward convenience and near-zero marginal cost behavior. Early digital cameras appeared inferior and expensive, yet the lifetime economics of "taking pictures" collapsed from a recurring cost structure into a near-free one. Once that shift occurred, incumbents were no longer competing on product quality; they were competing against an entirely new economic regime. No amount of execution strength could counteract a structural tailwind that rewrote the unit economics of the category itself.

The Real Lesson

Incumbents fail not because they misunderstand disruption, but because they misinterpret their own baselines. They assume Cash Cow economics reflect enduring skill rather than the cumulative effect of historical tailwinds. They compare new curves to old curves without adjusting for the distortions created by decades of compounding.

The XYZ model, simple as it is, corrects this fallacy. It reframes the decision in the only terms that matter:

What is the fully-loaded return on the system we built, and what could be the fully-loaded return on the system we would build next?

Kodak misread its comparison set. Most organizations still do. Meta, Google, and Microsoft are choosing not to.

And that -- not AI hype, not founder psychology, not market pressure -- is the strategic distinction worth paying attention to.

Busy Building, Back When Back

Bayesian Under Turbulence, Salvaged by Taste

The $50 Hallucination: A Technical Post-Mortem on AI Agent Failure

Light Brigades Need to Charge Like the Light Brigades

Tech Industry's Midlife Crisis and the Cash Cow Local Minima