Academic Research Scan — 2026-02-21

🔬 High Priority Papers

arXiv — Agentic Commerce & AI Marketplaces

Jolt Atlas: Verifiable Inference via Lookup Arguments in Zero Knowledge — Wyatt Benno, Alberto Centelles, Antoine Douchet, Khalil Gibran
- Abstract summary: Presents a zero-knowledge machine learning (zkML) framework that extends the Jolt proving system to verify model inference directly on ONNX tensor operations. Unlike zkVMs, it works natively with ML model formats, achieving practical proving times for classification, embedding, reasoning, and small language models. The system enables on-device cryptographic verification without specialized hardware. Critically, the companion work outlines use cases including guardrails for agentic commerce and trustless AI context/memory.
- Relevance to agentic commerce: This is one of the first papers to explicitly frame zkML as infrastructure for agentic commerce guardrails. The ability to cryptographically verify that an AI agent's inference was performed correctly — without revealing the model or data — is foundational for trust in autonomous agent transactions. Directly applicable to ERC-8004 agent identity verification and the trust gap that AgentProof and Sapiom are trying to solve.
- Link: https://arxiv.org/abs/2602.17452
- Published: 2026-02-19 | Categories: cs.CR, cs.AI
Algorithmic Collusion at Test Time: A Meta-game Design and Evaluation — Yuhong Luo, Daniel Schoepflin, Xintong Wang (Rutgers)
- Abstract summary: Introduces a meta-game framework for studying whether AI pricing agents collude under realistic "test-time" constraints (no long training horizons, no symmetry assumptions). Models agents as having pretrained policies with distinct strategic characteristics and examines rational meta-strategy selection. Evaluates both RL-based and LLM-based strategies in repeated pricing games under symmetric and asymmetric cost settings. Accepted at AAMAS 2026.
- Relevance to agentic commerce: As AI agents increasingly set prices autonomously (e.g., Amazon algorithmic pricing, x402 service pricing), the collusion question becomes urgent. This paper provides a framework for evaluating collusion risk in environments where agents have pretrained behaviors, which is exactly the scenario when deployed agentic commerce systems interact. Regulatory implications are significant for any marketplace where AI agents transact.
- Link: https://arxiv.org/abs/2602.17203
- Published: 2026-02-19 | Categories: cs.MA, cs.GT
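The meta-game framing can be illustrated with a short sketch. Once repeated pricing games have been simulated for each pairing of pretrained policies, the average profits form a payoff matrix, and rational meta-strategy selection becomes a search for equilibria of that matrix. The policy names and payoff numbers below are hypothetical placeholders rather than results from the paper, and the sketch only enumerates pure-strategy Nash equilibria of a two-seller meta-game.

```python
from itertools import product

# Hypothetical meta-game: each seller picks one pretrained pricing policy.
# PAYOFFS[(row, col)] = (row seller's avg profit, column seller's avg profit),
# e.g. as measured from simulated repeated pricing games. Numbers are made up.
STRATEGIES = ["competitive", "tacit_collusive"]
PAYOFFS = {
    ("competitive", "competitive"): (2.0, 2.0),
    ("competitive", "tacit_collusive"): (4.0, 1.0),
    ("tacit_collusive", "competitive"): (1.0, 4.0),
    ("tacit_collusive", "tacit_collusive"): (3.0, 3.0),
}


def pure_nash_equilibria(strategies, payoffs):
    """Return every profile where neither seller gains by deviating unilaterally."""
    equilibria = []
    for row, col in product(strategies, repeat=2):
        row_pay, col_pay = payoffs[(row, col)]
        # Row seller cannot do better by switching its own policy.
        row_best = all(payoffs[(r, col)][0] <= row_pay for r in strategies)
        # Column seller cannot do better by switching its own policy.
        col_best = all(payoffs[(row, c)][1] <= col_pay for c in strategies)
        if row_best and col_best:
            equilibria.append((row, col))
    return equilibria


print(pure_nash_equilibria(STRATEGIES, PAYOFFS))
```

With these prisoner's-dilemma-style payoffs, only mutual competition is self-enforcing; collusion becomes a rational meta-outcome only when the measured payoffs make the collusive profile an equilibrium as well.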
SPILLage: Agentic Oversharing on the Web — Jaechul Roh, Eugene Bagdasarian, Hamed Haddadi, Ali Shahin Shamsabadi
- Abstract summary: Formalizes "Natural Agentic Oversharing" — the unintentional disclosure of task-irrelevant user information through an agent's action trace on the web. Introduces a taxonomy along two dimensions: channel (content vs. behavioral) and directness (explicit vs. implicit). Benchmarks 180 tasks on live e-commerce sites across 1,080 runs spanning two agentic frameworks and three LLMs. Finds that oversharing is pervasive, with behavioral oversharing dominating content oversharing by 5×. Removing task-irrelevant information improves task success by up to 17.9%.
- Relevance to agentic commerce: This is a critical safety paper for the agentic commerce space. When OpenClaw agents browse shopping sites, book flights, or execute purchases, they leak information through clicks, scrolls, and navigation patterns — not just text. This directly impacts lobster.cash and any agent payment system: agents acting on behalf of users create behavioral fingerprints that third parties can monitor. The finding that reducing oversharing actually improves task success is actionable.
- Link: https://arxiv.org/abs/2602.13516
- Published: 2026-02-13 | Categories: cs.AI
Towards a Science of AI Agent Reliability — Stephan Rabanser, Sayash Kapoor, Peter Kirgis, Kangheng Liu, Saiteja Utpala, Arvind Narayanan (Princeton)
- Abstract summary: Proposes twelve concrete metrics decomposing agent reliability along four dimensions: consistency, robustness, predictability, and safety. Evaluates 14 agentic models across two benchmarks. Key finding: recent capability gains have yielded only small improvements in reliability. Agents that score well on success benchmarks still fail inconsistently, degrade unpredictably, and lack bounded error severity. Grounded in safety-critical engineering practices.
- Relevance to agentic commerce: Narayanan and Kapoor (Princeton; co-authors of "AI Snake Oil") bringing safety-critical engineering rigor to agent evaluation is a significant step. For agentic commerce, reliability metrics are arguably more important than capability metrics: you need to know an agent will fail gracefully when handling money. These 12 metrics should inform how platforms like lobster.cash and Skyfire evaluate agent trustworthiness. Directly connects to the KYA (Know Your Agent) frameworks being built by Sapiom and XKOVA.
- Link: https://arxiv.org/abs/2602.16666
- Published: 2026-02-18 | Categories: cs.AI, cs.CY, cs.LG
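To make the gap between success and consistency concrete, here is an illustrative comparison between a benchmark-style success rate and an "every repeated run succeeds" consistency score. This pairing is a common way to express consistency (similar in spirit to a pass^k measure) and is not claimed to be one of the paper's twelve metrics; the trial outcomes are hypothetical.

```python
# Hypothetical outcomes: each task was run k=4 times; True = success.
runs = {
    "book_flight":   [True, True, True, True],
    "compare_price": [True, False, True, True],
    "checkout":      [True, True, False, False],
    "refund":        [False, True, True, True],
}


def success_rate(runs):
    """Share of all individual runs that succeeded (what leaderboards usually report)."""
    total_runs = sum(len(trials) for trials in runs.values())
    return sum(sum(trials) for trials in runs.values()) / total_runs


def consistency(runs):
    """Share of tasks on which every repeated run succeeded."""
    return sum(all(trials) for trials in runs.values()) / len(runs)


print(f"success rate: {success_rate(runs):.2f}")
print(f"consistency:  {consistency(runs):.2f}")
```

The same hypothetical agent looks reasonable on average (75% of runs succeed) but completes only one task in four on every attempt, which is the kind of gap a reliability metric is meant to expose.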
AgenticShop: Benchmarking Agentic Product Curation for Personalized Web Shopping — Sunghwan Kim, Ryang Heo, Yongsik Seo, Jinyoung Yeo, Dongha Lee
- Abstract summary: First benchmark for evaluating agentic systems on personalized product curation in open-web environments. Features realistic shopping scenarios, diverse user profiles, and a checklist-driven personalization evaluation framework. Extensive experiments demonstrate that current agentic systems remain largely insufficient for real-world shopping tasks. Accepted at WWW 2026.
- Relevance to agentic commerce: This is the benchmark the agentic shopping space has been waiting for. Companies like Xoori, Mindtrip, and Amazon's agentic commerce teams need standardized ways to evaluate how well shopping agents serve diverse users. The finding that current systems are "largely insufficient" sets a clear bar for improvement and validates the market opportunity.
- Link: https://arxiv.org/abs/2602.12315
- Published: 2026-02-12 | Categories: cs.IR, cs.AI
Autonomous Market Intelligence: Agentic AI Nowcasting Predicts Stock Returns — Zefeng Chen, Darcy Pu
- Abstract summary: Deploys a fully agentic LLM that autonomously searches the web, filters sources, and synthesizes information to evaluate Russell 1000 stocks daily since April 2025. The framework is 100% agentic with zero human curation. Key finding: the AI shows genuine stock-selection ability, but only for identifying top winners. Going long the top 20 stocks generates 18.4 bps of daily alpha and an annualized Sharpe ratio of 2.43 with implementable transaction costs. Negative predictions are no better than random, which the authors attribute to corporate obfuscation contaminating negative signals.
- Relevance to agentic commerce: This is one of the first rigorous out-of-sample studies of fully agentic financial decision-making. The asymmetry finding (good at identifying winners, poor at identifying losers) has major implications for autonomous agent wallets like Coinbase's agentic wallets. Agents with spending authority need guardrails specifically for the domains where they're unreliable, and this paper quantifies where that boundary lies for equity selection.
- Link: https://arxiv.org/abs/2601.11958
- Published: 2026-01-17 | Categories: q-fin.GN, q-fin.PM, q-fin.TR
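For readers translating the reported figures, a common convention annualizes a Sharpe ratio by multiplying the mean-to-standard-deviation ratio of daily excess returns by √252 (approximate trading days per year). The sketch assumes that convention, since the paper's exact annualization and risk-free treatment are not described here, and it runs on a synthetic return series, not on the paper's data.

```python
import math
import statistics

TRADING_DAYS = 252  # common convention; an assumption, not taken from the paper


def annualized_sharpe(daily_excess_returns):
    """Annualized Sharpe = mean / sample stdev of daily excess returns * sqrt(252)."""
    mu = statistics.mean(daily_excess_returns)
    sigma = statistics.stdev(daily_excess_returns)
    return mu / sigma * math.sqrt(TRADING_DAYS)


def daily_sharpe_from_annual(annual_sharpe):
    """Invert the same convention: the daily ratio implied by an annualized Sharpe."""
    return annual_sharpe / math.sqrt(TRADING_DAYS)


# Synthetic daily series alternating +1.2% and -0.8% (mean 0.2%), purely illustrative.
sample = [0.012, -0.008] * 126
print(round(annualized_sharpe(sample), 2))
print(round(daily_sharpe_from_annual(2.43), 3))  # daily ratio implied by the reported 2.43
```

Under this convention, the reported annualized Sharpe of 2.43 corresponds to a daily mean-to-volatility ratio of roughly 0.15.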

arXiv — AI Agent Economics & Market Dynamics

Experimentation, Biased Learning, and Conjectural Variations in Competitive Dynamic Pricing — Bar Light, Wenyu Wang
- Abstract summary: Studies competitive dynamic pricing among multiple sellers motivated by algorithmic pricing in retail and online marketplaces. Shows that sellers using simple A/B price experiments with linear demand estimates converge to a Conjectural Variations equilibrium. Key insight: synchronized experimentation schedules create learning biases that lead to supra-competitive prices (above Nash equilibrium). Independent experimentation eliminates this bias. Provides finite-sample convergence guarantee with squared price error decaying at T^{-1/2}.
- Relevance to agentic commerce: When AI agents on platforms like Amazon or x402 services run pricing experiments simultaneously, they may inadvertently coordinate on above-market prices — not through explicit collusion but through correlated learning. This has direct regulatory implications for any marketplace where AI agents set prices. The finding that experimentation design is effectively a "market design lever" means platform operators (like those running agent marketplaces) can shape pricing outcomes through how they structure agent experimentation.
- Link: https://arxiv.org/abs/2602.12888
- Published: 2026-02-13 | Categories: cs.GT
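The synchronization mechanism can be reproduced in a toy duopoly. The sketch assumes symmetric linear demand q_i = A - B·p_i + D·p_j, a constant marginal cost C, and sellers that estimate their own demand slope from a two-point A/B price test and then best-respond to the fitted line. All parameter values are hypothetical, and this is a simplification rather than the paper's model. When both sellers perturb prices in the same direction at the same time, the rival's move is absorbed into each seller's slope estimate, so demand looks less price-sensitive than it really is.

```python
# Toy duopoly (an illustrative assumption, not the paper's model):
# demand q_i = A - B*p_i + D*p_j, marginal cost C.
A, B, D, C = 10.0, 2.0, 1.0, 1.0


def demand(p_own, p_rival):
    return A - B * p_own + D * p_rival


def estimated_slope(p_own, p_rival, delta, synchronized):
    """A/B test: observe demand at p_own +/- delta and fit a line through both points.

    If experiments are synchronized, the rival shifts its price in the same
    direction at the same time, so its move contaminates the slope estimate.
    """
    rival_shift = delta if synchronized else 0.0
    q_hi = demand(p_own + delta, p_rival + rival_shift)
    q_lo = demand(p_own - delta, p_rival - rival_shift)
    return (q_hi - q_lo) / (2 * delta)


def simulate(synchronized, rounds=100, delta=0.1, p_start=2.0):
    prices = [p_start, p_start]
    for _ in range(rounds):
        updated = []
        for i in (0, 1):
            j = 1 - i
            s = estimated_slope(prices[i], prices[j], delta, synchronized)
            q = demand(prices[i], prices[j])
            # Maximize (x - C) * (q + s * (x - p_i)) under the fitted linear model.
            updated.append((prices[i] + C) / 2 - q / (2 * s))
        prices = updated
    return prices


nash = (A + B * C) / (2 * B - D)  # symmetric Bertrand-Nash price for this demand system
print("Nash price:", nash)
print("independent experiments:", [round(p, 3) for p in simulate(False)])
print("synchronized experiments:", [round(p, 3) for p in simulate(True)])
```

In this toy, independent experiments settle at the Nash price of 4.0, while fully synchronized experiments settle at 5.5, above Nash, which matches the direction of the paper's supra-competitive result.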
Governing AI Forgetting: Auditing for Machine Unlearning Compliance — Qinqi Lin, Ningning Ding, Lingjie Duan, Jianwei Huang
- Abstract summary: Introduces the first economic framework for auditing machine unlearning compliance, integrating certified unlearning theory with regulatory enforcement. Proposes a game-theoretic model capturing strategic interactions between auditors and AI operators. Counterintuitively, it finds that auditors can optimally reduce inspection intensity as deletion requests increase, because weaker unlearning makes non-compliance easier to detect. It also proves that undisclosed auditing paradoxically reduces regulatory cost-effectiveness relative to disclosed auditing.
- Relevance to agentic commerce: Right-to-be-forgotten compliance becomes especially complex when AI agents accumulate transaction histories and user preferences. As agents interact with users across commerce platforms, the question of what an agent "remembers" about a user becomes legally significant. This framework could inform how agentic commerce platforms handle data deletion requests — relevant to GDPR compliance for systems like OpenClaw and lobster.cash.
- Link: https://arxiv.org/abs/2602.14553
- Published: 2026-02-16 | Categories: cs.LG, cs.AI, cs.GT

arXiv — Financial Agent Systems

FactorMiner: A Self-Evolving Agent Framework for Financial Alpha Discovery — Yanlong Wang, Jian Xu, Hongkang Zhang, Shao-Lun Huang, Danny Dongning Sun, Xiao-Ping Zhang
- Abstract summary: Proposes a self-evolving agent framework for quantitative investment that combines modular skill architecture (financial evaluation tools) with structured experience memory (distilled patterns from prior mining). Implements a Ralph Loop (retrieve, generate, evaluate, distill) to iteratively discover alpha factors while reducing redundancy. Experiments across multiple markets show it constructs diverse, high-quality factor libraries while maintaining low factor redundancy even as the library scales.
- Relevance to agentic commerce: Demonstrates the pattern of agents with persistent memory and evolving skill sets that's central to the agentic economy vision. The "experience memory" that distills past trials into actionable insights parallels what Jolt Atlas calls "trustless AI context": agent memory that persists across sessions and informs future decisions. As AI agents manage financial portfolios autonomously, this self-improvement loop could become a competitive moat.
- Link: https://arxiv.org/abs/2602.14670
- Published: 2026-02-16 | Categories: q-fin.TR, cs.MA
Resisting Manipulative Bots in Meme Coin Copy Trading: A Multi-Agent Approach with CoT Reasoning — Yichen Luo, Yebo Feng, Jiahua Xu, Yang Liu
- Abstract summary: Addresses the growing problem of manipulative bots in meme coin copy trading — adversaries that front-run trades, conceal positions, and fabricate sentiment. Proposes a manipulation-resistant system using multi-agent architecture with multimodal LLMs and chain-of-thought reasoning. Achieves 3% average return per meme coin investment under realistic market frictions, outperforming zero-shot and statistical baselines. Published at ACM Web Conference 2026 (WWW'26).
- Relevance to agentic commerce: As autonomous agents trade crypto (via Coinbase wallets, x402, lobster.cash), they can become both perpetrators and victims of market manipulation. This paper offers an early defensive framework against bot-driven manipulation, a form of agent-vs-agent adversarial dynamics in crypto markets. The multi-agent defense architecture could inform how agent payment platforms build fraud detection.
- Link: https://arxiv.org/abs/2601.08641
- Published: 2026-01-13 | Categories: cs.AI, q-fin.TR
Who Restores the Peg? A Mean-Field Game Approach to Model Stablecoin Market Dynamics — Hardhik Mohanty, Bhaskar Krishnamachari (USC)
- Abstract summary: Develops an agent-based mean-field game framework for fiat-collateralized stablecoins (USDC, USDT) during de-peg episodes. Models arbitrageurs and retail traders interacting across primary (mint/redeem) and secondary (exchange) markets. Calibrated against three historical de-peg events, the model finds that primary-market arbitrage is the main stabilizing force under stress, but when primary redemption is impaired, both channels must work together. Identifies a non-linear breakdown threshold for primary-rail frictions.
- Relevance to agentic commerce: USDC is the backbone of agent payments (Circle nanopayments, x402, lobster.cash). Understanding stablecoin peg dynamics under stress is critical for any system where AI agents hold and transact in stablecoins. The non-linear breakdown threshold finding means agent payment platforms need contingency plans for when stablecoin rails degrade — not a gradual degradation but a sudden cliff.
- Link: https://arxiv.org/abs/2601.18991
- Published: 2026-01-26 | Categories: q-fin.TR, cs.GT, econ.GN
Trading in CEXs and DEXs with Priority Fees and Stochastic Delays — Philippe Bergault, Yadh Hafsi, Leandro Sánchez-Betancourt
- Abstract summary: Develops a mixed control framework for optimal trading between centralized and decentralized exchanges, where traders control execution delay distribution through priority fees. Derives dynamic programming principles for this new class of impulse control problems with stochastic delays and multiple pending orders. Shows that optimal priority fee selection significantly outperforms non-strategic approaches, providing insights on how traders manage latency risk.
- Relevance to agentic commerce: Directly relevant to autonomous agents trading across CEXs and DEXs, the exact scenario for Coinbase agentic wallets and x402 on-chain payments. The priority fee optimization framework could be embedded into agent wallets to select fees automatically based on latency requirements. Today, x402 agents pay gaslessly, with no such fee optimization.
- Link: https://arxiv.org/abs/2602.10798
- Published: 2026-02-11 | Categories: q-fin.TR, math.OC
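A heavily simplified, static version of the fee-versus-latency trade-off can be written down directly. Assume (hypothetically) that inclusion delay is exponential with rate λ0 + k·f for priority fee f, and that waiting costs γσ² per unit of expected delay, so the trader minimizes f + γσ²/(λ0 + k·f). This ignores the paper's dynamic impulse-control structure, multiple pending orders, and CEX/DEX interaction; it only illustrates why paying some fee can beat paying none.

```python
import math


def expected_cost(fee, base_rate, fee_sensitivity, latency_penalty):
    """Fee paid plus latency penalty times expected exponential delay 1 / (base_rate + k*fee)."""
    return fee + latency_penalty / (base_rate + fee_sensitivity * fee)


def optimal_fee_closed_form(base_rate, fee_sensitivity, latency_penalty):
    """Minimizer of expected_cost: f* = (sqrt(penalty * k) - base_rate) / k, floored at 0."""
    f = (math.sqrt(latency_penalty * fee_sensitivity) - base_rate) / fee_sensitivity
    return max(f, 0.0)


def optimal_fee_grid(base_rate, fee_sensitivity, latency_penalty, max_fee=10.0, steps=100_000):
    """Brute-force check of the closed form over an evenly spaced fee grid."""
    grid = (max_fee * i / steps for i in range(steps + 1))
    return min(grid, key=lambda f: expected_cost(f, base_rate, fee_sensitivity, latency_penalty))


# Hypothetical parameters: base inclusion rate 0.5/s, each fee unit adds 2/s,
# and waiting costs 8 units per expected second (gamma * sigma^2 folded together).
params = dict(base_rate=0.5, fee_sensitivity=2.0, latency_penalty=8.0)
print(optimal_fee_closed_form(**params))
print(round(optimal_fee_grid(**params), 3))
```

With these made-up numbers the optimal fee is 1.75, which cuts expected total cost from 16 (no fee) to 3.75; when the base inclusion rate is already high relative to the latency penalty, the formula floors the optimal fee at zero.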
LemonadeBench: Evaluating the Economic Intuition of Large Language Models in Simple Markets — Aidan Vyas
- Abstract summary: Benchmarks LLMs as economic agents running a simulated lemonade stand over 30 days: managing inventory, setting prices, choosing hours, and maximizing profit. All models achieve profitability, with frontier models capturing 70% of the theoretical optimum (>10× improvement over basic models). However, decomposition across six business dimensions reveals that models achieve local rather than global optimization, excelling in some areas while showing blind spots in others.
- Relevance to agentic commerce: This is the simplest possible test of "can AI agents do commerce?" and the answer is encouraging but qualified. The local-vs-global optimization finding is important: agents that set great prices might manage inventory poorly. For agentic commerce platforms, this suggests agents need modular evaluation across different commerce functions, not just end-to-end metrics.
- Link: https://arxiv.org/abs/2602.13209
- Published: 2026-01-14 | Categories: q-fin.GN, cs.AI

📄 Notable Papers

Modeling Distinct Human Interaction in Web Agents — Faria Huq, Zora Zhiruo Wang, Zhanqiu Guo et al., including Graham Neubig and Jeffrey P. Bigham (CMU)
- Abstract summary: Collects CowCorpus — 400 real-user web navigation trajectories with 4,200+ interleaved human and agent actions. Identifies four interaction patterns: hands-off supervision, hands-on oversight, collaborative task-solving, and full user takeover. Training LMs on these patterns yields 61-63% improvement in intervention prediction accuracy. Live user study shows 26.5% increase in agent usefulness when agents model intervention patterns.
- Relevance to agentic commerce: Understanding when users take over from agents is crucial for commerce — you don't want an agent to autonomously complete a $10,000 purchase when the user was about to intervene. The four interaction patterns provide a framework for designing agent checkout flows and payment authorization levels.
- Link: https://arxiv.org/abs/2602.17588
- Published: 2026-02-19 | Categories: cs.CL, cs.HC
Seeing the Goal, Missing the Truth: Human Accountability for AI Bias — Sean Cao, Wei Jiang, Hui Xu
- Abstract summary: Shows that revealing the downstream use of LLM outputs (e.g., predicting stock returns) causes the LLM to generate biased intermediate measurements, even when those measures should be task-independent. This "purpose leakage" improves performance before the LLM's knowledge cutoff but offers no advantage post-cutoff. Argues this is a research design flaw, not an algorithmic one.
- Relevance to agentic commerce: When AI agents are told they're evaluating products for purchase vs. comparison vs. recommendation, they may produce systematically different assessments. This has implications for agent shopping platforms where the same product evaluation might serve different purposes.
- Link: https://arxiv.org/abs/2602.09504
- Published: 2026-02-10 | Categories: q-fin.GN, cs.AI
Behavioral Consistency Validation for LLM Agents: Trading-Style Switching through Stock-Market Simulation — Zeping Li, Guancheng Wan et al., including Philip Torr (Oxford)
- Abstract summary: Tests whether LLM agents' trading behavior aligns with real market participants' patterns, specifically examining strategy switching driven by loss aversion, herding, wealth differentiation, and price misalignment. Uses year-long simulations with daily price-volume data. Finds LLM switching behavior is only partially consistent with behavioral finance theories, highlighting the gap between agent behavior and human economic rationality.
- Relevance to agentic commerce: If AI agents don't behave like real economic actors, market simulations using them are unreliable. For anyone building agent-to-agent marketplaces (the "agent economy"), this raises the question of whether agent-dominated markets will exhibit fundamentally different dynamics than human markets.
- Link: https://arxiv.org/abs/2602.07023
- Published: 2026-02-02 | Categories: q-fin.TR, cs.AI
Manipulation in Prediction Markets: An Agent-based Modeling Experiment — Bridget Smart, Ebba Mark, Anne Bastian, Josefina Waugh
- Abstract summary: Uses agent-based simulations to study how high-budget "whale" agents can distort prediction market prices. Finds that biased whales can temporarily shift prices in proportion to their share of market capital, with distortion duration depending on non-whale learning rates and herding intensity. The model exhibits self-regulating price discovery across a broad parameter space.
- Relevance to agentic commerce: As prediction markets expand (Polymarket, etc.) and AI agents participate, the manipulation dynamics studied here become directly relevant. Whale agents with programmatic access could systematically exploit markets populated by smaller agents.
- Link: https://arxiv.org/abs/2601.20452
- Published: 2026-01-28 | Categories: econ.GN, q-fin.TR
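The whale mechanism can be illustrated with a stylized simulation. The assumptions are all hypothetical and much simpler than the paper's model: the price is the capital-weighted average of beliefs; the whale holds a fixed biased belief for a fixed window and then exits; non-whales update their beliefs toward a noisy signal of the true probability at a learning rate, and toward the current price with a herding weight.

```python
import random


def simulate_whale(whale_share=0.3, bias=0.2, truth=0.5, learning_rate=0.1,
                   herding=0.05, n_agents=100, whale_steps=20, total_steps=100, seed=0):
    """Stylized prediction market; returns the price path (all modeling choices hypothetical)."""
    rng = random.Random(seed)
    beliefs = [truth + rng.gauss(0, 0.05) for _ in range(n_agents)]  # non-whale beliefs
    whale_belief = truth + bias  # the whale's fixed, biased view
    prices = []
    for t in range(total_steps):
        crowd = sum(beliefs) / n_agents
        if t < whale_steps:
            # Capital-weighted price: the whale moves it in proportion to its share.
            price = whale_share * whale_belief + (1 - whale_share) * crowd
        else:
            price = crowd  # the whale has exited
        prices.append(price)
        for i in range(n_agents):
            signal = truth + rng.gauss(0, 0.05)  # noisy private evidence about the truth
            beliefs[i] += learning_rate * (signal - beliefs[i]) + herding * (price - beliefs[i])
    return prices


path = simulate_whale()
print(f"initial distortion: {path[0] - 0.5:+.3f}, final: {path[-1] - 0.5:+.3f}")
```

In this toy, the initial shift is roughly whale_share × bias, stronger herding lets the whale drag the crowd further while it is active, and a lower learning_rate lengthens the distortion after the whale exits.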
Second Thoughts: How 1-second subslots transform CEX-DEX Arbitrage on Ethereum — Adadurov, Barseghyan, Chtepine, Eloranta, Sebyakin, Valitov
- Abstract summary: Models the impact of reducing Ethereum slot time from 12 seconds to 1-second subslots on DEX arbitrage. Calibrated to Binance/Uniswap v3 data (Jul-Sep 2025), the model finds that faster slots increase arbitrage transactions by 535% and volume by 203%. The improvement comes from reduced variance in trade outcomes, which makes CEX-DEX arbitrage more appealing on a risk-adjusted basis.
- Relevance to agentic commerce: Faster Ethereum execution directly benefits AI agents doing cross-exchange arbitrage and on-chain payments. For agent payment infrastructure built on Ethereum L2s (Base, etc.), this research quantifies the relationship between block time and agent trading efficiency.
- Link: https://arxiv.org/abs/2601.00738
- Published: 2026-01-02 | Categories: q-fin.TR
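The variance argument can be made concrete under a standard simplifying assumption, which is not necessarily the paper's model: if the CEX price follows arithmetic Brownian motion with volatility σ, the price move an arbitrageur is exposed to while waiting out a slot of length Δt has variance σ²Δt.

```latex
\Delta P \sim \mathcal{N}\!\left(0,\ \sigma^{2}\,\Delta t\right),
\qquad
\frac{\operatorname{Var}[\Delta P]_{\Delta t = 12\,\mathrm{s}}}{\operatorname{Var}[\Delta P]_{\Delta t = 1\,\mathrm{s}}} = 12,
\qquad
\frac{\operatorname{sd}[\Delta P]_{\Delta t = 12\,\mathrm{s}}}{\operatorname{sd}[\Delta P]_{\Delta t = 1\,\mathrm{s}}} = \sqrt{12} \approx 3.46
```

Under this assumption, moving from 12-second slots to 1-second subslots shrinks the standard deviation of that exposure by a factor of about 3.5, the kind of variance reduction the paper credits for the improved risk-adjusted appeal of CEX-DEX arbitrage.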
From Labor to Collaboration: AI Agents Augmenting Research in Taiwan's Humanities — Yi-Chih Huang
- Abstract summary: Uses Taiwan's Claude.ai usage data (N=7,729 conversations, Nov 2025) from the Anthropic Economic Index to validate an AI agent-based research workflow. Proposes a seven-stage modular framework with three operational modes of human-AI collaboration: direct execution, iterative refinement, and human-led. Finds that human judgment remains irreplaceable for question formulation, theoretical interpretation, and ethical reflection.
- Relevance to agentic commerce: The Anthropic Economic Index data used here is the same source that could eventually quantify how humans use AI for commercial tasks. The three collaboration modes (direct execution, iterative refinement, human-led) map roughly onto the four interaction patterns from the CMU CowCorpus paper above, suggesting an emerging consensus on how humans and agents divide labor.
- Link: https://arxiv.org/abs/2602.17221
- Published: 2026-02-19 | Categories: cs.AI, cs.CL, cs.CY
Towards Sustainable Investment Policies Informed by Opponent Shaping — (truncated in feed)
- Abstract summary: Addresses global coordination for climate investment using multi-agent RL opponent shaping techniques. Studies how agents can learn investment policies that account for other agents' adaptive behavior.
- Relevance to agentic commerce: Multi-agent coordination for investment decisions is a precursor to how AI agents will negotiate and coordinate in marketplaces — learning to shape opponents' behavior is fundamental to agent-to-agent commerce.
- Link: https://arxiv.org/abs/2602.11829
- Published: 2026-02-12 | Categories: cs.MA

📊 Working Papers & Reports

NBER Working Papers

Firm Data on AI — Ivan Yotzov, Jose Maria Barrero, Nicholas Bloom (Stanford), Philip Bunn, Steven J. Davis (Chicago/Hoover), Kevin M. Foster, Aaron Jalca, Brent H. Meyer, Paul Mizen et al.
- Abstract summary: First representative international survey of firm-level AI use, covering ~6,000 CFOs/CEOs in the US, UK, Germany, and Australia. 70% of firms actively use AI, particularly younger, more productive firms. Average executive AI use is only 1.5 hours/week. Over 80% of firms report no impact on employment or productivity over the past 3 years. However, firms predict AI will boost productivity by 1.4% and cut employment by 0.7% over the next 3 years. Individual employees, by contrast, predict a 0.5% increase in employment, a significant perception gap between executives and workers.
- Relevance to agentic commerce: This is the most authoritative snapshot to date of firm-level AI adoption, from top labor economists (Bloom, Davis, Barrero; the same team behind the WFH research). The "no impact yet" finding for over 80% of firms is striking given all the hype. For the agentic commerce thesis, it suggests we're still in very early innings: most firms use AI, but few have yet seen measurable impact from basic AI use, let alone from autonomous agent commerce. The executive-worker perception gap on job losses is politically important.
- Link: https://www.nber.org/papers/w34836
GPT as a Measurement Tool — Hemanth Asirvatham, Elliott Mokski, Andrei Shleifer (Harvard)
- Abstract summary: Presents the GABRIEL software package using GPT to quantify attributes in qualitative data. Evaluates against 1,000+ human-annotated tasks, finding GPT is accurate and generally indistinguishable from human evaluators. Results don't depend on prompting strategy or training data contamination. Application to tech adoption history reveals a tenfold decline in time lags from invention to adoption over the industrial age — from ~50 years to ~5 years today. Documents increasing dominance of companies (vs. individuals) and the US in innovation.
- Relevance to agentic commerce: Andrei Shleifer (one of the most-cited economists alive) validating GPT as a measurement tool has major implications for economic research. The technology adoption acceleration finding (50→5 years) suggests agentic commerce infrastructure may reach mass adoption faster than previous technology waves. The finding that companies increasingly dominate innovation is consistent with the enterprise focus of firms like Stripe, Coinbase, and PayPal building agent payment rails.
- Link: https://www.nber.org/papers/w34834
Non-Fungible Tokens as Investment — William N. Goetzmann (Yale), Dong Huang, Milad Nozari
- Abstract summary: Analyzes NFTs as an investment class, finding returns were exceptionally right-skewed with pervasive illiquidity. A handful of trades drove aggregate performance. Investors extrapolating from realized returns without recognizing selection bias and survivorship faced substantial disappointment risk. Successful NFT investing required "an almost perfect confluence of timing, liquidity, and luck."
- Relevance to agentic commerce: Goetzmann (Yale, author of "Money Changes Everything") dissecting the NFT bubble is a cautionary tale for tokenized agent identity/reputation systems. As agent reputation becomes tokenized (AgentProof, ERC-8004), the same survivorship bias and illiquidity risks apply. Any agent marketplace with tradeable agent credentials should study this carefully.
- Link: https://www.nber.org/papers/w34837

🏛️ Institutions & Labs to Watch

Princeton (Narayanan group) — Producing foundational work on AI agent reliability and safety metrics. Their "AI Snake Oil" perspective brings healthy skepticism to agent capability claims. The 12-metric reliability framework from this scan could become a standard.
Rutgers (CHAI Lab) — Multi-agent systems with economic focus. The algorithmic collusion meta-game paper (AAMAS 2026) positions them at the intersection of game theory and LLM agents.
USC (Krishnamachari group) — The stablecoin mean-field game work demonstrates strong DeFi + formal methods expertise. Bhaskar Krishnamachari is prolific in blockchain systems research.
Stanford/Chicago (Bloom, Davis, Barrero) — The firm-level AI survey team continues to produce the most authoritative macro data on AI economic impact. Their working-from-home research shaped policy; their AI research will too.
Harvard (Shleifer) — Shleifer's validation of GPT as a measurement tool signals mainstream economics embracing AI tools for research. When one of the most-cited economists is publishing on GPT utility, it's a leading indicator.
CMU (Neubig, Bigham) — Human-agent interaction patterns for web agents. Their CowCorpus dataset could become a standard benchmark for studying how humans and agents collaborate on commercial tasks.

📝 Scan Notes

Source Availability

arXiv: All four queries returned rich results. Total results across queries: 3,892 + 9,896 + 59,826 + 937 = 74,551. Filtered to ~60 papers; selected 20 for inclusion based on relevance.
NBER: RSS feed working well. 22 papers in latest batch. 3 directly relevant (Bloom AI survey, Shleifer GPT measurement, Goetzmann NFTs).
Semantic Scholar: Rate-limited (429) on both queries. Need API key for reliable daily access. Action item: Apply for Semantic Scholar API key at https://www.semanticscholar.org/product/api#api-key-form
SSRN: Blocked by Cloudflare (403). May need browser automation or different access method. Action item: Try SSRN via browser profile in future scans.

Key Themes This Week

Agent reliability ≠ agent capability — Multiple papers (Narayanan, AgenticShop) show agents passing benchmarks but failing in practice. The gap between demo and deployment is the core challenge.
Pricing collusion risk is real — Both the Rutgers meta-game and the competitive dynamic pricing papers converge on the finding that AI pricing agents can create supra-competitive outcomes without explicit coordination.
Privacy leakage from agent behavior — SPILLage formalizes what everyone suspected: agents leak user data through actions, not just text. Behavioral oversharing outweighs content oversharing by 5×.
zkML for agentic commerce guardrails — Jolt Atlas is one of the first papers to explicitly frame zero-knowledge proofs as infrastructure for verifiable agent commerce, a technical foundation for "trustless agents."
Firm-level AI adoption is shallower than expected — Bloom et al.'s finding that 80% of firms see no productivity/employment impact from AI yet suggests we're in the "trough of disillusionment" for enterprise AI.

Suggestions for Next Scan

Prioritize getting Semantic Scholar API key for supplementary coverage
Add Google Scholar alerts for: "agentic commerce", "agent marketplace", "AI economy mechanism design"
Monitor AAMAS 2026 proceedings (May 25-29, Paphos, Cyprus), where at least one paper from this scan (the Rutgers algorithmic collusion meta-game) is accepted
Track WWW 2026 proceedings — at least two relevant papers accepted (AgenticShop, meme coin manipulation)