Or: How to Avoid Having Your Board Meeting Derailed by a Chatbot That Can’t Count to Ten

There’s a moment in every executive’s AI journey that feels a bit like discovering your brilliant new hire has been doing their accounting with crayons. One minute you’re marveling as ChatGPT writes a perfectly coherent strategic memo, and the next minute you’re staring in horror as it confidently explains why 7 x 8 = 54, with the same authoritative tone it used to nail your competitive analysis.

Welcome to the “jagged edge” of AI—a term that captures how these systems can perform like Nobel laureates on Monday and like confused interns on Tuesday, often within the same conversation. This isn’t an argument against using AI; properly deployed, these tools can generate tremendous business value. But “properly deployed” carries a lot of weight in that sentence. It means understanding exactly which problems these systems solve well (and which ones they don’t), building verification systems that catch their predictable failure modes, and measuring performance with the same rigor you’d apply to any other business-critical process.

The executives who succeed with AI aren’t the ones who deploy it most aggressively—they’re the ones who deploy it most strategically, with clear-eyed understanding of where the jagged edge cuts deepest and how to design processes that turn those limitations into manageable constraints rather than business-threatening vulnerabilities. Understanding where this edge gets particularly jagged isn’t just intellectually interesting; it’s essential for any executive who wants to deploy AI without accidentally turning their organization into a cautionary tale.

The Fundamental Misconception: It’s Not Actually Thinking

Let’s start with the uncomfortable truth that undermines most AI strategies: Large Language Models don’t reason the way humans do. They’re extraordinarily sophisticated pattern-matching machines that have memorized vast portions of the internet and can recombine those patterns in impressively human-like ways.

Think of it this way: if human intelligence is like having a conversation with a knowledgeable colleague who can think through problems step by step, LLM intelligence is like having access to someone with perfect recall of every conversation they’ve ever overheard, who can mix and match pieces of those conversations to create something that sounds original but is fundamentally derivative.

This distinction matters enormously for business applications. When an LLM writes a marketing email that perfectly captures your brand voice, it’s not because it understands your brand—it’s because it has seen thousands of similar emails and can pattern-match its way to something that feels right. When it fails spectacularly at basic logic problems, it’s not having an off day; it’s revealing the fundamental limitations of its underlying architecture.

The Seven Deadly Failures: Where LLMs Will Let You Down

1. Multi-Step Reasoning (Or: Why Your AI Can’t Plan a Dinner Party)

Ask an LLM to plan a complex project with dependencies, resource constraints, and contingencies, and you’ll quickly discover that it approaches planning the way a particularly eloquent drunk person approaches directions—lots of confident-sounding steps that don’t actually connect to each other in any logical way.

The problem isn’t that LLMs are bad at individual steps; they’re often quite good at generating plausible-sounding project phases or identifying relevant considerations. The problem is that they fundamentally lack the ability to model cause-and-effect relationships across multiple steps. They can tell you that Step A needs to happen before Step C, but they can’t actually verify that their suggested sequence won’t create logical contradictions or resource conflicts.

This manifests in business contexts as beautifully written project plans that fall apart the moment someone tries to actually execute them. The timeline looks reasonable, the deliverables sound comprehensive, and the whole thing reads like it was written by a very expensive consultant—right up until you realize that Phase 2 requires the output of Phase 4, and the budget assumes you can hire three senior developers who don’t exist in your local market.

2. Novel Problem Solving (Or: The Copy-Paste Limitation)

LLMs excel at problems they’ve seen before, in slightly different clothes. They struggle profoundly with genuinely novel situations that require creative problem-solving rather than sophisticated pattern matching.

Consider this scenario: you ask an LLM to help optimize your supply chain for a product that has never existed before, in a market with unusual regulatory constraints, using a business model that combines elements in a way that’s genuinely new. The LLM will confidently generate recommendations that sound sophisticated and well-researched. The problem is that those recommendations will be clever recombinations of supply chain advice for similar-but-not-identical situations, regulatory guidance for adjacent-but-not-applicable frameworks, and business model suggestions for companies that sort of resemble yours but aren’t actually facing your specific constraints.

The result is advice that sounds authoritative but fundamentally misses the novel aspects of your situation—the very aspects that probably make your business interesting in the first place.

3. Consistency Under Pressure (Or: The Brittleness Problem)

Perhaps the most professionally embarrassing failure mode is the brittleness problem: LLMs can solve a problem perfectly when presented one way and fail spectacularly when the same problem is presented with trivial variations in wording, context, or framing.

Imagine you’ve trained your team to use an LLM for financial analysis, and it consistently provides excellent insights when analyzing “quarterly revenue trends.” Then one day, someone asks it to analyze “Q4 revenue patterns” and it completely misunderstands the request, providing analysis that would get a junior analyst fired. The underlying question is identical; only the surface presentation has changed.

This brittleness makes LLMs particularly unsuitable for customer-facing applications where you can’t control how people phrase their requests, or for mission-critical processes where consistent performance is more important than peak performance.

4. Self-Verification (Or: The Confidence Paradox)

One of the most dangerous characteristics of LLMs is their complete inability to reliably assess the quality of their own output. They will deliver completely incorrect information with exactly the same tone of authority they use for accurate information.

Worse, when prompted to check their own work, they often become even more confident in their mistakes. It’s like asking someone to proofread their own writing while they’re still drunk—they’ll confidently tell you it’s perfect, possibly while making additional errors in the process of “checking.”

This creates a particularly treacherous scenario for executives: the more sophisticated the LLM’s output sounds, the more likely you are to trust it, but the sophistication of the language provides no meaningful signal about the accuracy of the content. A completely fabricated financial projection can be presented with exactly the same level of apparent expertise as a meticulously researched market analysis.

5. Abstract Reasoning (Or: The Symbol Grounding Problem)

LLMs manipulate symbols (words, numbers, concepts) without actually understanding what those symbols represent in the real world. They can discuss complex business concepts fluently without having any grounded understanding of how those concepts actually work in practice.

This manifests as advice that sounds strategically sophisticated but reveals, upon closer examination, fundamental misunderstandings about how businesses actually operate. An LLM might recommend a pricing strategy that sounds brilliant in theory but assumes customer behavior that doesn’t exist, or suggest an organizational restructuring that ignores basic human psychology, or propose a partnership structure that violates securities law in ways that would be obvious to anyone who actually understands what those legal terms mean in practice.

6. Domain Expertise Verification (Or: The Confident Ignorance Problem)

LLMs are remarkably good at sounding like experts in fields where they have no actual expertise. They’ve been trained on so much text that they can approximate the language patterns of specialists in virtually any domain, but this linguistic facility masks a complete absence of real understanding.

The seemingly simple task of creating PowerPoint presentations illustrates this perfectly. While LLMs can generate slides that contain all the right information and follow standard formats, they consistently produce presentations that feel oddly ineffective despite being technically correct. The content is accurate, the structure is logical, but something crucial is missing—the subtle understanding of how information flows, builds momentum, and persuades audiences that separates competent presentations from compelling ones. And perhaps more fundamentally, they lack the aesthetic judgment to create visuals that actually enhance rather than merely accompany the content.

But this effectiveness gap pales compared to the accuracy risks in technical domains. This is particularly dangerous in regulated industries, technical fields, or specialized business contexts where getting the details wrong has serious consequences. An LLM can write a compliance memo that uses all the right terminology and follows the standard format, but quietly includes recommendations that would expose your company to significant legal liability. It can produce a technical specification that sounds authoritative to non-experts but contains assumptions that would cause expensive failures during implementation.

7. Context Maintenance (Or: The Goldfish Memory Problem)

While LLMs can handle reasonably long conversations, they struggle to maintain consistent understanding of complex contexts over extended interactions. They’re particularly bad at tracking implicit context, evolving constraints, and the cumulative impact of previous decisions within a conversation.

In business contexts, this means that an LLM might provide excellent advice for the first part of a strategic discussion, then gradually lose track of the key constraints and objectives as the conversation evolves, eventually providing recommendations that directly contradict earlier advice or ignore crucial context that was established earlier in the discussion.

The Illusion of Competence: Why This Matters More Than You Think

The fundamental challenge with LLM failures isn’t that they’re obviously broken—it’s that they fail in ways that can be remarkably hard to detect, especially for executives who aren’t immersed in the technical details of how these systems work.

Traditional software fails in obvious ways: error messages, crashes, clearly incorrect outputs. LLMs fail by producing sophisticated-sounding outputs that contain subtle but critical flaws. It’s the difference between a calculator that displays “ERROR” when you ask it to divide by zero versus one that confidently tells you the answer is 47.

This creates a particularly insidious problem for executives: the more impressive an LLM’s output appears, the more likely it is to be trusted, but the apparent sophistication provides no reliable signal about accuracy or appropriateness. In fact, the most dangerous LLM outputs are often the ones that sound most authoritative.

Practical Implications: Building AI Strategy Around Limitations

Understanding these limitations doesn’t mean avoiding AI—it means using it strategically in contexts where its strengths matter more than its weaknesses, and building verification systems that account for its failure modes.

Use LLMs for tasks where getting it mostly right is more valuable than getting it perfectly right: Content generation, brainstorming, initial drafts, research synthesis. Don’t use them for tasks where precision matters more than creativity.

Never use LLMs as final authorities on facts, regulations, or technical specifications: They’re excellent research assistants but terrible fact-checkers. If accuracy matters, verify everything through authoritative sources.

Design workflows that assume inconsistency: Build processes that work even when the LLM has an off day. If your business process breaks when the AI provides inconsistent responses to similar queries, you’ve designed a fragile system.

Implement systematic verification for high-stakes decisions: The more important the decision, the more human oversight you need. LLM output should inform expert judgment, not replace it.

Be especially cautious with novel or complex scenarios: The more your situation deviates from common patterns, the less reliable LLM advice becomes. Use AI for inspiration and initial analysis, but rely on human expertise for the truly novel aspects of your business.

The Measurement Imperative: Why Your Gut Feeling About AI Is Probably Wrong

Here’s the uncomfortable truth about evaluating AI performance: human intuition is spectacularly bad at it. We’re naturally wired to be impressed by fluent language and confident tone, which means we systematically overestimate LLM performance in exactly the areas where they’re most dangerous.

Consider this scenario: your team uses an LLM to draft client proposals for six months. Everyone feels good about it—the proposals sound professional, clients respond positively, and your win rate seems steady. Then someone finally runs a systematic evaluation and discovers that 23% of the proposals contain factual errors about your company’s capabilities, 31% include pricing structures that would lose money if accepted, and 18% make commitments that violate your standard service agreements. Your intuitive assessment was “this is working great,” while the actual performance was “this is creating significant legal and financial risk.”

This isn’t an indictment of your judgment—it’s a feature of how these systems are designed. LLMs are optimized to produce outputs that feel right to humans, not outputs that actually are right. They’re essentially weaponized conviction, capable of making terrible advice sound authoritative and brilliant insights sound routine.

The Evaluation Discipline: Measuring What Actually Matters

The most successful AI implementations treat evaluation as a core competency, not an afterthought. This means building systematic measurement frameworks that capture the specific ways these systems succeed and fail in your business context.

Effective evaluation goes far beyond asking “does this seem good?” Instead, it requires developing specific, measurable criteria for success and failure. For a financial analysis LLM, this might mean tracking not just whether the analysis sounds sophisticated, but whether the assumptions are reasonable, the calculations are correct, the risk assessments align with historical data, and the recommendations would actually generate positive returns if implemented.

The revelation that comes from proper measurement is often sobering. Teams discover that their AI performs brilliantly on 70% of cases, adequately on 20%, and fails catastrophically on 10%—but that 10% includes scenarios that would cause serious business problems if not caught. More importantly, they discover that their intuitive sense of which cases would be problematic bears almost no relationship to which cases actually are problematic.

The Hidden Costs of Intuitive Assessment

Organizations that rely on intuitive evaluation of AI performance systematically underinvest in verification and systematically overestimate reliability. This creates a particularly insidious form of technical debt: the AI seems to be working well right up until it isn’t, and by then the problems are deeply embedded in business processes.

It’s rather like the corporate equivalent of carbon monoxide poisoning—everything seems fine until you realize you’ve been slowly accumulating dangerous exposures that could have been easily detected with proper monitoring equipment. Except in this case, the “carbon monoxide” is confidently incorrect AI output, and the “monitoring equipment” is systematic evaluation frameworks that most organizations consider too boring to implement properly.

The companies that invest early in measurement capabilities—building evaluation frameworks, tracking performance metrics, and developing systematic approaches to identifying edge cases—typically discover two things: first, their AI systems have more limitations than they initially realized; second, understanding those limitations precisely makes the systems vastly more valuable, because they can be deployed strategically rather than hopefully.

The Bottom Line: It’s a Tool, Not a Colleague

The most successful executives approach LLMs the way they approach any other sophisticated tool: with clear understanding of what it does well, what it does poorly, and how to structure work to leverage its strengths while mitigating its weaknesses.

The key insight is that LLMs are not artificial colleagues—they’re artificial consultants who have read everything but understand nothing, who can help you think through problems but can’t actually think through problems themselves. Used with appropriate caution and verification, they can be extraordinarily valuable. Used with excessive trust or inappropriate expectations, they can create expensive problems that are remarkably hard to debug.

The companies that will benefit most from AI aren’t the ones that deploy it most aggressively—they’re the ones that deploy it most thoughtfully, with clear-eyed understanding of where the jagged edge cuts deepest and how to design processes that turn those limitations into manageable constraints rather than business-threatening vulnerabilities.

After all, the goal isn’t to have the most impressive AI implementation at the next industry conference. The goal is to have AI that actually makes your business better, one carefully verified, appropriately constrained, systematically improved process at a time.