Traffic Metrics Are Lying to You

Your traffic is down. Your growth team is panicking. And your product metrics might be telling you absolutely nothing useful.

Kyle Poyar's 2025 State of B2B GTM report uncovered something fascinating: Webflow's aggregate traffic is declining while its business is accelerating. ChatGPT referrals convert at 24% compared to 4% from Google, and two-thirds of those ChatGPT-referred visitors convert within 7 days.

This isn't a Webflow-specific anomaly. It's what happens when AI search reshapes discovery.

The death of aggregate traffic as a north star

Google AI overviews are fundamentally changing what traffic even means. Low-intent, high-volume queries that used to pad your metrics are vanishing into AI-generated answer boxes. The traffic that remains is radically higher quality.

"A lot of our lower value and lower intent traffic has gone down, but there's higher quality traffic occurring even as the aggregate declines," Josh Grant, Webflow's VP of Growth, told Poyar.

Aggregate traffic is completely misleading without a quality metric. You're not measuring growth. You're measuring noise. This is a symptom of a broader problem with how metrics fail to capture what actually matters.

The new metrics layer: Visibility, comprehension, conversion

If traditional traffic metrics don't work, what does? Webflow built a three-layer framework for AI discovery:

Visibility: How often you're cited in AI search results. Not impressions or rankings. Citations. They track this across ChatGPT, Perplexity, and Claude using tools like Profound.

Comprehension: How accurately AI models describe your product versus competitors. Grant's team prompts multiple LLMs side by side to audit their narrative. If the description is wrong, they know where to improve.

Conversion: Signup rates and time-to-conversion from LLM-referred traffic. High-intent traffic doesn't just convert better. It converts faster.

Traditional SEO dashboards track rankings and clicks. This framework tracks whether AI systems understand, trust, and recommend you.
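
As a rough illustration of the conversion layer, here is a minimal sketch of how conversion rate and time-to-convert by referral source might be computed, assuming each session carries a referral source and an optional signup timestamp. The types and field names are illustrative assumptions, not Webflow's actual instrumentation.

```typescript
// Minimal sketch: conversion rate and time-to-convert by referral source.
// The ReferralSource values and field names are illustrative assumptions.
type ReferralSource = "chatgpt" | "perplexity" | "claude" | "google" | "direct";

interface Session {
  source: ReferralSource;
  landedAt: Date;
  signedUpAt?: Date; // undefined if the visit never converted
}

function conversionStats(sessions: Session[], source: ReferralSource) {
  const visits = sessions.filter((s) => s.source === source);
  const converted = visits.filter((s) => s.signedUpAt !== undefined);

  // Conversion rate: share of visits from this source that signed up.
  const rate = visits.length ? converted.length / visits.length : 0;

  // Days from landing to signup, used to check the "converts within 7 days" pattern.
  const days = converted
    .map((s) => (s.signedUpAt!.getTime() - s.landedAt.getTime()) / 86_400_000)
    .sort((a, b) => a - b);
  const within7Days = days.length ? days.filter((d) => d <= 7).length / days.length : null;

  return { visits: visits.length, rate, within7Days };
}
```

Comparing the output for "chatgpt" against "google" is exactly the kind of quality signal that an aggregate traffic number hides.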

What this means for product teams

This isn't just a marketing problem. Your product's narrative has to work for both humans and AI models. How ChatGPT describes your product when users aren't searching for you by name is your new positioning test.

The metrics you're optimizing for might be pushing you in the wrong direction. Volume-based goals (MAU, traffic, impressions) reward low-quality interactions. Quality-based goals (conversion rate, time-to-convert, citation frequency) reward relevance and trust. Instead of chasing traffic, map your work by customer value and business value to reveal what's actually moving the needle.

AI discovery is volatile, not fixed like Google rankings. Grant's observation: "Every query is a fresh model run that reshuffles sources in real time based on context, trust, and recency." You can't optimize once and coast.

Teams treating AI discovery as optional or as a one-time project will spend the next year explaining why their metrics look strong but their pipeline has dried up.

The question you should be asking

If aggregate traffic is misleading, what quality metrics are you tracking today? If you're not tracking quality separately from volume, how do you know whether your growth is real or just noise?

The shift from traffic to intent, from volume to quality, from rankings to comprehension is not a future state. It's happening now. Webflow's data proves it. The question is whether your metrics can see it.

45 Minutes with Claude Code: From Tag Chaos to Scalable Taxonomy

I publish daily. Over 100 posts already live.

The tag problem emerged quickly—60-ish unique tags across 101 posts. Twenty-one used exactly once. Tags like AI and ai coexisting. go-to-market-strategy next to go-to-market. moat, paradigm-shift, outcome-oriented, scattered everywhere.

Navigation was becoming noise.

The Real Problem Isn't Tags

It's what happens when you don't design for scale from day one.

Most blogs start small. A handful of posts. A few obvious tags. Everything feels manageable because the volume is low. But daily publishing changes the equation—you're not building a collection, you're building a system.

Without structure, tags proliferate. Every post gets 6-7 tags "just to be safe." New tags appear for one-off topics. Capitalization drifts. Similar concepts fragment across multiple tags.

The result? Your archive becomes harder to navigate as it grows. The opposite of what you want.

Enter Claude Code and Three-Tier Taxonomy

In a 45-minute session with Claude Code on my iPad, I migrated from the legacy tags to a clean taxonomy. This follows the same pattern I used to redesign the archive pages—defining the system, then letting AI implement it.

Claude Code Session

I built a simple framework: Tier 1, Tier 2, Tier 3.

Tier 1 (8 tags): Core themes. The pillars of what I write about—ai, product-strategy, product-management, product-leadership, innovation, customer-focus, team-empowerment, decision-making. Every post needs at least one.

Tier 2 (18 tags): Supporting topics. More specific angles—agentic-ai, systems-thinking, business-models, frameworks, metrics. These add nuance without fragmenting.

Tier 3 (6 tags): Niche specialists. Topics like vibe-coding, healthcare-tech, security. Use sparingly. Promote to Tier 2 if usage hits 15+ posts.

Total: 40 approved tags. Down from 63. Every tag earns its place.

The rules are strict: at most 5 relevant tags per post, all lowercase, approved list only. No exceptions. Consistency beats flexibility when you're shipping daily.
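
The whole system is small enough to live as a plain config. Here's a simplified sketch, with the tier lists abbreviated to the tags named above; the file name and shape are my illustration, not the exact implementation.

```typescript
// tag-taxonomy.ts -- simplified sketch of the approved-tag config.
// Tier lists are abbreviated to the tags mentioned in this post.
export const TAXONOMY: Record<string, string[]> = {
  tier1: [
    "ai", "product-strategy", "product-management", "product-leadership",
    "innovation", "customer-focus", "team-empowerment", "decision-making",
  ],
  tier2: ["agentic-ai", "systems-thinking", "business-models", "frameworks", "metrics" /* ... */],
  tier3: ["vibe-coding", "healthcare-tech", "security" /* ... */],
};

export const APPROVED_TAGS = new Set(Object.values(TAXONOMY).flat());

export const RULES = {
  maxTagsPerPost: 5,
  requireTier1: true,          // every post needs at least one core theme
  lowercaseOnly: true,
  tier3PromotionThreshold: 15, // promote a Tier 3 tag to Tier 2 at 15+ posts
};
```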

Automation Makes It Stick

Rules without enforcement don't work.

I built validation: npm run validate-tags. It checks everything—tag count, capitalization, approved list, tier distribution. The script fails the build if tags don't comply. No manual checking. No drift over time.
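
A stripped-down sketch of what that check might look like. The real script behind npm run validate-tags is more involved; this version assumes posts expose a tags array parsed from frontmatter and abbreviates the approved list.

```typescript
// validate-tags.ts -- stripped-down sketch of the build-time check.
// Lists abbreviated here; in practice they'd come from the taxonomy config.
const APPROVED = new Set(["ai", "product-strategy", "agentic-ai", "metrics", "vibe-coding"]);
const TIER_1 = new Set(["ai", "product-strategy"]);
const MAX_TAGS = 5;

interface Post {
  file: string;
  tags: string[]; // parsed from frontmatter by the site build (assumed)
}

function validateTags(posts: Post[]): string[] {
  const errors: string[] = [];
  for (const post of posts) {
    if (post.tags.length === 0 || post.tags.length > MAX_TAGS) {
      errors.push(`${post.file}: expected 1-${MAX_TAGS} tags, found ${post.tags.length}`);
    }
    for (const tag of post.tags) {
      if (tag !== tag.toLowerCase()) errors.push(`${post.file}: "${tag}" must be lowercase`);
      if (!APPROVED.has(tag.toLowerCase())) errors.push(`${post.file}: "${tag}" is not on the approved list`);
    }
    if (!post.tags.some((t) => TIER_1.has(t))) {
      errors.push(`${post.file}: needs at least one Tier 1 tag`);
    }
  }
  return errors;
}

// Fail the build on any violation so drift never creeps back in.
const problems = validateTags([{ file: "2025-10-12-example.md", tags: ["AI", "metrics"] }]);
if (problems.length > 0) {
  problems.forEach((p) => console.error(p));
  process.exit(1);
}
```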

Then migration: npm run migrate-tags. It consolidated the 63 tags into 40, updated all 92 existing posts, auto-suggested tags for posts with fewer than 5, and created backups before touching anything.

One command. Complete taxonomy overhaul. Reversible if needed.
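
Conceptually, the consolidation step is just an old-to-new rename map applied to every post's frontmatter. A sketch with illustrative mapping entries (the real script also writes backups and auto-suggests tags for under-tagged posts):

```typescript
// migrate-tags.ts -- conceptual sketch of the consolidation step only.
// The mapping entries are illustrative, not the actual 63-to-40 mapping.
const CONSOLIDATE: Record<string, string> = {
  AI: "ai",                          // capitalization drift
  "go-to-market": "go-to-market-strategy",
  "paradigm-shift": "innovation",    // one-off tags folded into a pillar
};

function migrateTags(tags: string[]): string[] {
  const mapped = tags.map((t) => CONSOLIDATE[t] ?? t.toLowerCase());
  return [...new Set(mapped)];       // de-duplicate after consolidation
}

// e.g. migrateTags(["AI", "ai", "paradigm-shift"]) -> ["ai", "innovation"]
```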

Categories Map to Navigation

Category Map

Tags are a backend structure. Categories are the frontend experience. I bucketed the tags into appropriate categories, and Claude Code helped me implement the Archives and the landing page around them.

I mapped all 40 tags to 6 high-level categories.
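
Under the hood this is just one more map, roughly like the sketch below. The category names here are placeholders, not the site's actual six.

```typescript
// Sketch of the tag -> category map behind the sidebar and archive filters.
// Category names are placeholders, not the site's actual six.
const TAG_TO_CATEGORY: Record<string, string> = {
  ai: "AI & Agents",
  "agentic-ai": "AI & Agents",
  "product-strategy": "Strategy",
  metrics: "Craft & Frameworks",
  // ...remaining tags roll up into the other categories
};

// The homepage sidebar and the archive filters both read from the same map,
// which is what keeps navigation consistent.
const CATEGORIES = [...new Set(Object.values(TAG_TO_CATEGORY))];
```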

Now the homepage sidebar and archive filters show the same 6 categories. Consistent navigation. Readers don't see "63 tags"—they see 6 clear paths into the content.

Build the structure. Automate the compliance. Let the content compound.

When Your Reports Become Your Customers

You call a meeting to "align on priorities." Your team spends two hours in a conference room. Decisions get deferred pending "more data." Everyone leaves to update their status decks for next week's follow-up.

You just cost your team 10 hours of productive work. What did they get in return?

If the answer isn't something concrete and valuable, you're net negative. And most managers are.

I've been testing a framework inspired by Roger Martin's A New Way to Think. His concept applies to organizational layers, corporate strategy, and entire business units. But the core idea hit me hard: every layer above the front line must add more value than it costs. If it doesn't, it weakens the competitive position.

I apply it to how I operate. One question before every initiative: Does this help my reports win more than it costs them?

What does that look like in practice? Here are the shifts:

The value-add question

Before I schedule a meeting, create a process, or ask for a deliverable, I force myself to answer: Does this help more than it costs?

Time is the cost. Coordination overhead is the cost. Delayed decisions are the cost.

If I can't articulate a specific value that exceeds those costs, I kill it. Even if my peers do it. Even if it's "standard practice."

What your reports actually need

I flipped my 1-on-1s. Instead of collecting status, I ask: "What do you need that I can provide or unblock?"

The shift is fundamental. Your reports aren't your employees—they're your customers. I justify my existence by providing services they can't get more efficiently elsewhere.

Legitimate services: bringing strategic rigor, negotiating vendor contracts, building executive relationships, creating shared capabilities, and removing obstacles. And rolling up my sleeves to prototype and build when necessary.

Not legitimate: coordination and alignment that doesn't produce outcomes, strategic oversight that doesn't make them more competitive, process standardization that doesn't reduce their work.

Most of what managers do falls in the second category. I'm trying to kill it.

The elimination test

If I disappeared tomorrow, what would break? That's my real value. Everything else is coordination theater.

Stay connected to real competition

I spend time where my team's work actually competes. Customer calls, demos, key artifacts for solution discovery and building. Anywhere customers make choices.

Competition doesn't happen in strategy decks. It happens at the front line, where a customer picks your product over the alternative.

Why this works

Most managers optimize for boss satisfaction, peer coordination, and risk minimization.

If you optimize for your direct reports' competitive advantage instead, you create alpha. Your team ships faster. They win more customers. Morale improves.

You're not changing the org chart or fighting power structures. You're shifting from coordination to capability-building. That shift requires no executive approval. It's a choice.

The organization may not reward it explicitly. But your team's results will compound over time. The performance difference becomes undeniable.

Not prescriptive, just experimental

This is what I'm testing in my own work. It's not the only way to manage. I'm learning as I go, adjusting when things don't work.

Roger Martin's framework goes much deeper than what I've described here. His book is worth reading if you're interested in the broader implications; it challenges a wide spectrum of existing operating models.

For me, the practical takeaway was simpler: I can choose to be net positive or net negative to my team. That choice is mine to make.

So I keep asking the question: Does this help my reports win more than it costs them?

And I keep killing the things that fail the test.

What Emerged: 99 Days of Product Thinking Journal

One hundred posts. For me, writing is clarifying thinking. I built this entire site with Claude Code—designed it, deployed it, automated the Obsidian-to-Cloudflare publishing flow. Now, sixty percent of what I've written is about AI. The tool became the subject. That's either profound or obvious, depending on your tolerance for meta-commentary.

Here's what I didn't expect: not the daily writing (I'm reading widely anyway, so ideas only compound), but the sheer pace at which AI developments demanded rethinking. Every week brought capability shifts, strategic implications, and deployment patterns worth exploring. You couldn't ignore the acceleration even if you tried.

This is what emerges when you show up every day without an agenda.

The AI Emergence I Didn't Plan

When I started in July, I knew AI would feature. Product thinking intersects with every major platform shift, and this one's moving faster than most. But I didn't anticipate writing fifty-nine posts with AI in the title, tags, or core argument. That's not editorial strategy—it's the environment forcing constant synthesis.

The vibe coding surprise compounds this. Claude Code didn't just help build features; it architected the entire site. Pagination systems, archive layouts, newsletter integration, test coverage—all generated through conversational iteration. The flow from Obsidian draft to live Cloudflare deployment is trivial now. No friction, no deployment anxiety, no "let me check if this breaks production."

What does that say about where we are? When the tool that builds your platform becomes the platform story itself, you're living through the shift everyone's theorizing about. The gap between "AI will change how we work" and "AI is how I work" closed faster than expected.

Systems thinking applies to content, too. Each post wasn't planned in isolation—ideas connected, frameworks built on frameworks, and patterns emerged that I didn't consciously design. The writing process became a feedback loop: publish insight, watch what resonates, follow threads that matter. Product thinking applied to product thinking itself. [...]

Two GTM Insights Product Managers Can Actually Use

I've been digging through the 2025 State of B2B GTM Report from Growth Unhinged, and while most of it focuses on channel strategy and GTM execution, two findings stood out for their direct relevance to product work.

These aren't prescriptions—they're observations from one dataset that might be useful as you think about your own product decisions.

Your pricing tier predicts your GTM motion (not the other way around)

The survey shows clear patterns between product pricing and which GTM motions actually work:

  • PLG dominates for products under $5k/year and companies under $1M ARR
  • Account-based motions work best for expensive products (above $25k ACV)
  • Mid-range products ($5k-$25k) see more success with paid acquisition

What caught my attention: this suggests that pricing isn't just a revenue decision—it's a GTM architecture decision.

When you're setting pricing tiers or deciding on packaging, you're also making a bet on how the product will go to market. A $2k/year product architected for sales-assisted conversion is fighting uphill. A $30k/year product expecting viral PLG growth faces the same problem.

This doesn't mean you can't defy these patterns. But it does mean your pricing strategy should be informed by the GTM motion you're willing and able to execute, or vice versa.

For PMs: the next time you're in a pricing discussion, it's worth asking explicitly: "Which GTM motions does this pricing strategy enable or constrain?"

AI features work better as augmentation than as replacement

The report shows high AI adoption across GTM teams, but 53% see limited or no impact from those investments.

The specifics are telling. AI SDRs (full replacement plays) are particularly disappointing. One team reported "six months, zero opportunities." Meanwhile, AI that augments human workflows—intent-driven outbound, market intelligence, content support—shows better results.

This maps to a broader product principle: automation that eliminates steps in an existing workflow tends to work better than automation that tries to replace the entire workflow. I've explored this pattern before—AI agents grow work rather than replace it.

For product teams building AI features, this suggests focusing on making humans more effective rather than eliminating them. AI that surfaces insights, automates tedious parts of a process, or handles high-volume, low-stakes tasks seems to land better than AI that tries to own an entire job function.

The nuance: this is one survey of B2B GTM teams, not a universal law. But it's consistent with what I'm seeing across other domains—the "copilot" framing works, the "autopilot" framing struggles. For now.

What this means in practice

These aren't definitive answers—they're data points worth considering as you make product decisions.

On pricing: think through the GTM implications before you lock in that tier structure. Your pricing model is also a distribution model.

On AI: consider whether your AI feature is designed to augment a human workflow or replace it entirely. The former seems to be landing better in the market right now.

What patterns are you seeing in your own product work? Do these observations match what you're experiencing, or are you seeing something different?

Early Experience: A Different Approach to Agent Training

AI agents are currently in use, handling customer service interactions, automating research workflows, and navigating complex software environments. But training them remains resource-intensive: you either need comprehensive expert demonstrations or the ability to define clear rewards at every decision point.

Meta's recent research explores a third path. Agent Learning via Early Experience proposes agents that learn from their own rollouts—without exhaustive expert coverage or explicit reward functions. It's early, but the direction is worth understanding.

Early Experience overview (Source: Meta's research paper)

Current Training Constraints

Today's agent training follows two primary approaches, each with different resource demands:

Imitation Learning works well when you can provide thorough expert demonstrations. The challenge isn't the method—it's achieving comprehensive coverage across the scenarios your agent will encounter in production.

Reinforcement Learning delivers strong results when you can define verifiable rewards. But most real-world agent tasks like content creation, customer support, and research assistance don't have clear numerical rewards at each step. You're left with engineering proxy metrics that may not capture what actually matters.

Neither approach is inherently limited. Both are constrained by what they require: extensive demonstrations or definable rewards.

What Early Experience Proposes

Meta's research introduces a training paradigm where agents use their own exploration as the learning signal. Two mechanisms drive this:

Implicit World Modeling: The agent learns to predict what happens after it takes actions. These predictions become training targets—future states serve as supervision without external reward signals. The agent builds intuition about environmental dynamics through its own experience.

Self-Reflection: The agent compares its actions to expert alternatives and generates natural language explanations for why different choices would be superior. It's learning from its suboptimal decisions through structured comparison.

The core idea: an agent's own rollouts contain a training signal. You don't need a human expert for every scenario or a reward function for every decision.
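
As a rough sketch of the data shapes involved, each rollout step can be turned into two kinds of self-supervised training examples, one per mechanism. This is not Meta's implementation; every type, field name, and prompt format below is an illustrative assumption.

```typescript
// Rough sketch of how one rollout step could become training examples under
// the two mechanisms described above. All names here are illustrative.
interface RolloutStep {
  state: string;          // observation before acting (e.g. a page snapshot)
  agentAction: string;    // what the agent actually did
  nextState: string;      // observation after acting
  expertAction?: string;  // a known-good alternative, where one exists
}

interface TrainingExample {
  prompt: string;
  target: string; // supervision signal: a future state or an explanation
}

// Implicit world modeling: predict the consequence of the agent's own action.
// The observed next state is the target, so no reward function is needed.
function worldModelExample(step: RolloutStep): TrainingExample {
  return {
    prompt: `State: ${step.state}\nAction: ${step.agentAction}\nPredict the next state:`,
    target: step.nextState,
  };
}

// Self-reflection: contrast the agent's choice with the expert alternative and
// train on a natural-language explanation of why the alternative is better.
// `explain` stands in for a model call that generates that explanation.
async function reflectionExample(
  step: RolloutStep,
  explain: (prompt: string) => Promise<string>,
): Promise<TrainingExample | null> {
  if (!step.expertAction || step.expertAction === step.agentAction) return null;
  const prompt =
    `State: ${step.state}\nAgent chose: ${step.agentAction}\n` +
    `Expert chose: ${step.expertAction}\nExplain why the expert choice is better:`;
  return { prompt, target: await explain(prompt) };
}
```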

Whether this scales to production environments across different domains remains an open question.

The Research Numbers

In controlled benchmark environments, Early Experience showed meaningful gains over imitation learning: +18.4% on e-commerce navigation tasks, +15.0% on multi-step travel planning, and +13.3% on scientific reasoning environments.

When used as initialization for reinforcement learning, the approach provided an additional +6.4% improvement over starting from standard imitation learning.

These are research benchmarks, not production deployments. The question is whether these gains transfer to real-world complexity and whether the approach works across different agent domains.

What Changes If This Materializes

If this training paradigm proves viable at scale, several implications follow:

Training economics shift: Less dependence on comprehensive expert demonstration coverage could reduce the human-in-the-loop burden during agent development. You're trading labor-intensive curation for computation-intensive self-supervised learning.

Deployment pathway evolves: Start with Early Experience training, deploy and collect production data, then layer reinforcement learning for further optimization where rewards are verifiable. Each stage builds on actual agent experience rather than static expert datasets.

Infrastructure requirements matter: The approach needs agents with enough initial capability to generate meaningful rollouts. It's most applicable in domains with rich state spaces like web navigation, API interactions, and complex planning tasks.

This isn't a universal solution. It's likely domain-dependent, and we don't yet know where the boundaries are.

The Question Worth Asking

It's too early to call this a paradigm shift. But it represents a direction worth watching: agents learning through structured exploration of their own experience rather than pure imitation or reward maximization.

The research suggests that training agents might become less labor-intensive. Whether that transfers from research benchmarks to production systems is still uncertain.

For teams building agents: what experiments could validate whether self-supervised learning works for your specific use cases? The window between "interesting research" and "table stakes capability" has a way of closing faster than expected.

Agentic AI: It's the Readiness and Access Story

The gap between hype and reality isn't the story everyone's missing about agentic AI. The gap between who's positioned to deploy it and who's stuck waiting for infrastructure—that's the story.

And that gap is widening every quarter.

The technology is proven—access to it is not

Nearly every senior enterprise developer is experimenting with AI agents right now. One in four enterprises is deploying them across teams this year. The question isn't whether autonomous AI systems work. It's whether your organization is set up to use them.

Agentic AI means systems that plan workflows, make decisions, use tools, and execute toward goals autonomously. Several companies are automating complex research workflows. Not demos—production deployments.

The constraint isn't capability. It's infrastructure readiness.

The divide that determines everything

Two types of organizations are emerging.

One group is navigating APIs that don't exist, data scattered across incompatible systems, procurement processes that take months, and compliance frameworks designed for a different era. They're blocked by legacy infrastructure.

The other group solved these problems early. They built integration layers, consolidated data architectures, and established governance processes before they were urgent.

This second group is deploying autonomous AI systems right now, while the first waits for infrastructure to catch up. In twelve months, the capability gap between these groups will be dramatic.

The comfort of "everyone's struggling together" is false. Some organizations aren't struggling; they're shipping.

What's changing about work itself

Humans are shifting toward workflow design and outcome verification rather than task execution. Less time gathering data, more time interpreting it.

This transition creates winners and losers. Product managers who learn to architect agent workflows will be indispensable. Those focused on task-level execution will find their roles increasingly automated. Technologists who understand how to build for autonomous systems will command premium value. Those who wait for clarity will find the market has moved past them.

Some roles will be eliminated. Others will be created. Most will transform beyond recognition.

The realistic path forward requires action now

Most deployments today are basic: simple tasks with predefined objectives. Not revolutionary, but achievable even with infrastructure constraints.

You don't need perfect systems to start learning. Pick one workflow: document triage, report generation, or data synthesis. Run it with full human review. Measure time saved. Identify what breaks. Iterate.

You're building organizational fluency with the technology, so when infrastructure catches up, you're ready to deploy at scale.

The teams treating this as optional will spend next year explaining to leadership why competitors moved faster.

What's actually at stake

The transformation is real, but access to it is unequal. That inequality is compounding.

Companies positioned to deploy autonomous AI systems are establishing leads measured in quarters, not weeks. The window for experimentation without falling behind is closing.

This isn't about whether agents will replace human work. It's about whether you're positioned to architect the systems that leverage them. Or whether you'll be explaining why your organization wasn't ready.

What experiment can your team run this quarter?

The Impact Scorecard

It's surprisingly easy to stay busy without making much of an impact.

A team ships features, hits sprint goals, and sees metrics move—but six months later, it's unclear what actually mattered. Not because the team wasn't working hard, but because "impact" is slippery to define.

I've found it helpful to think about impact along two dimensions: customer value and business value. When you map your work on both axes, patterns start to emerge about what's actually moving the needle.

The Impact Scorecard

Think of it as a simple 2x2 matrix. One axis measures how much customers value what you built. The other measures how much it helps the business. Every product initiative lands somewhere on this grid, and where it lands tells you something important.

impact-scorecard

Quadrant 1: High Customer Value + High Business Value

This is what you're aiming for. You've built something that genuinely helps customers while also driving metrics that matter to the business—maybe retention, revenue, or strategic positioning. A feature that reduces a key pain point and improves conversion. An onboarding flow that both helps new users succeed and increases activation rates. When customer needs and business needs align, you've found the sweet spot.

Quadrant 2: High Customer Value + Low Business Value

Customers love what you built, but it's not moving business metrics. Maybe it's a delightful feature that doesn't connect to conversion or retention. These aren't always wrong: some features are strategic investments in trust and brand. But if most of your roadmap lives here, it's worth asking whether your work is sustainable long-term.

Quadrant 3: Low Customer Value + High Business Value

This drives short-term business results, but customers don't find much value in it. Maybe it's an aggressive upsell prompt or a feature that benefits the business more than users. These can create tension over time. Numbers might look good this quarter, but you're spending down trust, and that has costs down the road that don't always show up in dashboards.

Quadrant 4: Low Customer Value + Low Business Value

Neither customers nor the business benefit much. This often happens when we build based on assumptions rather than evidence, or when we optimize for stakeholder requests without validating demand. It's not a failure; it's just learning. The goal is to recognize these early and redirect effort toward higher-impact work.

What This Means in Practice

Try mapping your recent launches on this grid. You'll probably find a mix across quadrants—that's normal. The exercise isn't about judging past decisions, but about spotting patterns in where you're investing time.
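
If it helps to make that mapping concrete, here's a trivial sketch. The 1-to-5 scale and the threshold are arbitrary assumptions; scoring honestly is the hard part.

```typescript
// Trivial sketch of placing initiatives on the 2x2.
// The 1-5 scale and the threshold of 3 are arbitrary assumptions.
interface Initiative {
  name: string;
  customerValue: number; // 1 (low) to 5 (high)
  businessValue: number; // 1 (low) to 5 (high)
}

type Quadrant = "Q1" | "Q2" | "Q3" | "Q4";

function quadrant(i: Initiative, threshold = 3): Quadrant {
  const customer = i.customerValue >= threshold;
  const business = i.businessValue >= threshold;
  if (customer && business) return "Q1"; // high customer, high business
  if (customer) return "Q2";             // high customer, low business
  if (business) return "Q3";             // low customer, high business
  return "Q4";                           // low customer, low business
}

// e.g. quadrant({ name: "Onboarding revamp", customerValue: 4, businessValue: 5 }) -> "Q1"
```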

If you notice most of your work clustering outside Quadrant 1, it might be worth asking: How could we shift more effort toward work that delivers both customer and business value?

Impact happens at the intersection of solving real customer problems and moving metrics that matter to your business. Everything else is still valuable work. You learn, you build skills, you discover what doesn't work. But knowing the difference helps you be more intentional about where you spend your time.