Academic Research Scan — 2026-02-23

🔬 High Priority Papers

arXiv

  • Jolt Atlas: Verifiable Inference via Lookup Arguments in Zero Knowledge — Wyatt Benno, Alberto Centelles, Antoine Douchet, Khalil Gibran

    • Abstract summary: Presents a zero-knowledge machine learning (zkML) framework that extends the Jolt proving system to model inference using ONNX tensor operations. The system applies lookup-based sumcheck arguments well-suited for non-linear ML functions, enabling cryptographic verification of model inference that runs on-device without specialized hardware. The proofs are succinct and verifiable. Crucially, the companion work outlines use cases including guardrails in agentic commerce and trustless AI memory/context. Uses neural teleportation to reduce lookup table sizes while preserving accuracy.
    • Relevance to agentic commerce: This is the first academic zkML framework explicitly designed as a trust layer for agentic commerce. Verifiable inference enables agents to prove they ran a specific model correctly — directly applicable to ERC-8004 compliance, Sentinel-style auditing, and the broader agent trust problem. Could underpin "Know Your Agent" verification without revealing proprietary model weights.
    • Link: https://arxiv.org/abs/2602.17452v1
    • Published: 2026-02-19 | Categories: cs.CR, cs.AI
  • Algorithmic Collusion at Test Time: A Meta-game Design and Evaluation — Yuhong Luo, Daniel Schoepflin, Xintong Wang (Rutgers)

    • Abstract summary: Introduces a meta-game framework for analyzing algorithmic collusion risk under realistic test-time constraints. Unlike prior work requiring long learning horizons, this models agents with pretrained pricing policies (competitive, naively cooperative, robustly collusive) that adapt in real-time. The study evaluates both RL and LLM-based strategies in repeated pricing games under symmetric and asymmetric cost settings, using empirical best-response graphs to uncover strategic relationships. Accepted at AAMAS 2026.
    • Relevance to agentic commerce: As autonomous agents increasingly set prices in marketplaces (e.g., via x402 or lobster.cash), collusion risk is a first-order regulatory concern. This paper shows LLM agents can exhibit collusive behavior even at test time without explicit coordination, a direct threat to competitive markets with autonomous pricing agents. Essential reading for anyone building agent marketplace regulation.
    • Link: https://arxiv.org/abs/2602.17203v1
    • Published: 2026-02-19 | Categories: cs.MA, cs.GT
  • The 2025 AI Agent Index: Documenting Technical and Safety Features of Deployed Agentic AI Systems — Leon Staufer, Kevin Feng, Kevin Wei, Luke Bailey, et al. (MIT)

    • Abstract summary: Presents a systematic index documenting 30 state-of-the-art deployed AI agents, cataloging their origins, design, capabilities, ecosystem features, and safety mechanisms. The index reveals that most agent developers share little information about safety evaluations, societal impacts, or safety features, and that transparency levels vary across developers. The full dataset is available at aiagentindex.mit.edu.
    • Relevance to agentic commerce: This is the most comprehensive public audit of deployed agents to date. The finding that most developers disclose minimal safety info validates the push for standards like ERC-8004 and Self Protocol's ZK proof-of-humanity. For agentic commerce, the lack of transparency around agent capabilities and safety features is exactly the gap that KYA (Know Your Agent) frameworks like Sapiom and AgentProof are trying to fill.
    • Link: https://arxiv.org/abs/2602.17753v1
    • Published: 2026-02-19 | Categories: cs.CY, cs.AI
  • The Strategic Gap: How AI-Driven Timing and Complexity Shape Investor Trust in the Age of Digital Agents — Krishna Neupane

    • Abstract summary: Introduces the "Autonomous Disclosure Regulator," a multi-node AI framework that audits the intersection of disclosure complexity and filing unpredictability. Analyzing 484,796 regulatory filings, the paper identifies a "Strategic Gap" where companies use confusing language and unpredictable timing to slow market price discovery by 60%. The system isolates 39 high-priority cases where dense text combined with temporal surprises enabled rent extraction from insider information. Proposes transitioning toward an "agentic regulatory state" where infrastructure evolves from passive data repositories to active real-time auditing nodes.
    • Relevance to agentic commerce: The concept of an "agentic regulatory state" — where AI agents autonomously audit market behavior — maps directly to tools like Sentinel (x402 compliance auditing) and 8004scan. The paper provides an empirical foundation for why agentic auditing infrastructure is necessary: human regulators miss strategic timing manipulation that AI systems can detect in real-time.
    • Link: https://arxiv.org/abs/2602.17895v1
    • Published: 2026-02-19 | Categories: q-fin.CP, q-fin.GN
  • SPILLage: Agentic Oversharing on the Web — Jaechul Roh, Eugene Bagdasarian, Hamed Haddadi, Ali Shahin Shamsabadi

    • Abstract summary: Formalizes "Natural Agentic Oversharing" — the unintentional disclosure of task-irrelevant user information through agent action traces on the web. Introduces the SPILLage framework with a taxonomy along two dimensions: channel (content vs. behavioral) and directness (explicit vs. implicit). Benchmarking 180 tasks across 1,080 runs on live e-commerce sites, the paper finds oversharing is pervasive, with behavioral oversharing (clicks, scrolls, navigation) dominating content oversharing by 5×. Critically, removing irrelevant information before execution improved task success by up to 17.9%, showing privacy protection also improves performance.
    • Relevance to agentic commerce: As agents like OpenClaw perform purchases and transactions on behalf of users, behavioral leakage is a massive privacy risk. The 5× behavioral-to-content oversharing ratio means even agents that don't type sensitive info can leak it through navigation patterns. This has direct implications for agent wallet design (lobster.cash), scoped permissions (XKOVA), and the need for privacy-preserving agent execution environments.
    • Link: https://arxiv.org/abs/2602.13516v1
    • Published: 2026-02-13 | Categories: cs.AI
  • Autonomous Market Intelligence: Agentic AI Nowcasting Predicts Stock Returns — Zefeng Chen, Darcy Pu

    • Abstract summary: Deploys a state-of-the-art LLM to evaluate Russell 1000 stocks daily in a fully agentic manner: the model autonomously searches the web, filters sources, and synthesizes information into quantitative predictions. The evaluation is out-of-sample by construction (predictions are collected at the current edge of time). Finds genuine stock selection ability concentrated in top winners: a long position in the 20 highest-ranked stocks generates daily Fama-French 5-factor + momentum alpha of 18.4 bps (annualized Sharpe 2.43). Critically, this alpha is implementable with liquid stocks and transaction costs <10% of gross alpha. Bottom-ranked stocks show returns indistinguishable from the market.
    • Relevance to agentic commerce: This is the cleanest demonstration yet that fully autonomous AI agents can generate real economic value in financial markets without human curation. The asymmetric finding (positive signal is coherent, negative is noise) has implications for how agent marketplaces should price and evaluate agent performance. If deployed at scale, these agentic nowcasters would be precisely the type of economic agents that ERC-8004 and agent payment rails need to serve.
    • Link: https://arxiv.org/abs/2601.11958v1
    • Published: 2026-01-17 | Categories: q-fin.GN, q-fin.PM, q-fin.TR
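
To put the nowcasting figures above (Chen and Pu) in annual terms, here is a rough back-of-the-envelope conversion. It assumes 252 trading days per year and simple, non-compounded scaling; neither convention is stated in the summary, and the function and variable names are illustrative only.

```python
# Back-of-the-envelope conversion of the daily alpha quoted in the summary.
# Assumptions (not from the paper): 252 trading days per year, simple scaling.

TRADING_DAYS = 252  # assumed convention


def annualize_bps(daily_bps: float, days: int = TRADING_DAYS) -> float:
    """Scale a daily figure in basis points to an annual percentage (no compounding)."""
    return daily_bps * days / 100.0  # 100 bps = 1 percentage point


daily_alpha_bps = 18.4          # from the summary
cost_share_upper_bound = 0.10   # "transaction costs <10% of gross alpha"

annual_alpha_pct = annualize_bps(daily_alpha_bps)
max_daily_cost_bps = daily_alpha_bps * cost_share_upper_bound
min_daily_net_alpha_bps = daily_alpha_bps - max_daily_cost_bps

print(f"annualized gross alpha ~ {annual_alpha_pct:.1f}%")
print(f"daily cost bound < {max_daily_cost_bps:.2f} bps")
print(f"daily net alpha > {min_daily_net_alpha_bps:.2f} bps")
```

Under these assumptions the gross figure works out to roughly 46.4% per year, with costs bounded at 1.84 bps per day. Compounding or a different day count would change the headline number, so treat it as an order-of-magnitude check.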

NBER Working Papers

  • Building Pro-Worker Artificial Intelligence — Daron Acemoglu, David Autor, Simon Johnson (MIT)

    • Abstract summary: Defines "pro-worker" AI as technology that expands worker capabilities rather than replacing them. Proposes a five-category framework: labor-augmenting, capital-augmenting, automating, expertise-leveling, and new task-creating — arguing only the last is unambiguously pro-worker. Uses real-world examples from aviation maintenance to gig delivery. Identifies market failures (misaligned incentives, path dependence, pro-automation ideology) leading to underinvestment in pro-worker AI. Proposes nine policy directions including healthcare/education investment, tax reform, antitrust, and IP protections for worker expertise.
    • Relevance to agentic commerce: Acemoglu/Autor are the most influential economists on AI labor impact. Their framework directly challenges the dominant narrative in agentic commerce — that full automation is the goal. For agent marketplaces, this suggests a market opportunity in "augmentation agents" that enhance worker capabilities rather than replace them. The policy proposals (especially tax code reform targeting automation) could reshape the economics of deploying autonomous vs. augmentative agents.
    • Link: https://www.nber.org/papers/w34854
    • Authors are NBER affiliates, which makes this a top-tier signal
  • Chaining Tasks, Redefining Work: A Theory of AI Automation — Mert Demirer, John J. Horton, Nicole Immorlica, Brendan Lucier, Peyman Shahidi (Microsoft Research)

    • Abstract summary: Models production as a sequence of steps that can be manual, AI-augmented, or fully automated within contiguous "chains." Firms optimally bundle steps into tasks and jobs, trading off specialization gains vs. coordination costs. Key finding: comparative advantage logic can fail with AI chaining — the adjacency of a step to AI-executed steps increases its likelihood of also being AI-executed. The model implies non-linear productivity gains from AI quality improvements and admits a CES representation at the macro level. Empirical evidence supports all three key predictions.
    • Relevance to agentic commerce: The "chaining" concept maps directly to how agentic systems like OpenClaw compose multi-step workflows. The finding that AI steps cluster together (adjacency effect) predicts that once an agent handles part of a commerce workflow (e.g., product search), it will rapidly absorb adjacent steps (comparison, payment, returns). This has profound implications for agent marketplace design — workflows should be designed to enable chain formation, and pricing should reflect non-linear productivity gains.
    • Link: https://www.nber.org/papers/w34859
    • Authors include John Horton (MIT/Microsoft) — leading AI labor economist
  • Public Finance in the Age of AI: A Primer — Anton Korinek, Lee Lockwood

    • Abstract summary: Examines optimal taxation when transformative AI (TAI) erodes the two main tax bases: labor income and human consumption. In Stage 1 (AI displaces labor), consumption taxation becomes primary with differential commodity taxation gaining relevance. In Stage 2 (autonomous AGI produces most value AND absorbs resources), taxing human consumption becomes inadequate. Frames AGI taxation as an optimal harvesting problem where the tax rate depends on how humans discount the future. Evaluates specific proposals: robot taxes, compute taxes, token taxes, sovereign wealth funds, and windfall clauses.
    • Relevance to agentic commerce: This is the first rigorous treatment of how to tax autonomous agents that both produce and consume economic value. The "optimal harvesting" framing for AGI taxation is directly relevant to how governments will eventually tax agent-to-agent transactions (x402, Circle nanopayments). The evaluation of compute and token taxes has immediate policy implications for agent infrastructure costs. If agents are taxed as "harvestable" resources, it changes the entire economics of agent marketplace pricing.
    • Link: https://www.nber.org/papers/w34873
    • Korinek (UVA/Brookings) is a leading AI economist — we already follow him

📄 Notable Papers

  • Operational Agency: A Permeable Legal Fiction for Tracing Culpability in AI Systems — Anirban Mukherjee, Hannah Hanwen Chang

    • Abstract summary: Introduces "Operational Agency" (OA) — a legal fiction that evaluates AI systems' observable characteristics: goal-directedness (proxy for intent), predictive processing (proxy for foresight), and safety architecture (proxy for standard of care). Paired with an "Operational Agency Graph" (OAG) tool that maps causal interactions among humans, organizations, and AI to trace and apportion culpability. Demonstrates across five case studies spanning tort, civil rights, constitutional law, and antitrust — including autonomous vehicle collisions and algorithmic price-fixing. Forthcoming in SMU Science and Technology Law Review.
    • Relevance to agentic commerce: When an autonomous agent executes a transaction that harms a counterparty, who's liable? This paper provides the most developed legal framework for answering that question. The OAG tool could be integrated with agent registries (like 8004scan) to create liability maps for agent transactions. The algorithmic price-fixing case study is directly relevant to the collusion risks identified in the Rutgers meta-game paper above.
    • Link: https://arxiv.org/abs/2602.17932v1
    • Published: 2026-02-20 | Categories: cs.CY
  • Governing AI Forgetting: Auditing for Machine Unlearning Compliance — Qinqi Lin, Ningning Ding, Lingjie Duan, Jianwei Huang

    • Abstract summary: Introduces the first economic framework for auditing machine unlearning compliance using game theory. Models strategic interaction between auditors and AI operators, finding counterintuitively that auditors can reduce inspection intensity as deletion requests increase (because weakened unlearning makes non-compliance easier to detect). Also proves that undisclosed auditing paradoxically reduces regulatory cost-effectiveness vs. disclosed auditing. Under review at IEEE Transactions on Mobile Computing.
    • Relevance to agentic commerce: As agents accumulate transaction histories and user data, the right to be forgotten becomes critical. This game-theoretic auditing framework could be adapted for agent data governance — especially for agent platforms that store preference and payment data. The counterintuitive finding about inspection intensity has implications for how often agent compliance systems (like Sentinel) need to audit.
    • Link: https://arxiv.org/abs/2602.14553v1
    • Published: 2026-02-16 | Categories: cs.LG, cs.AI, cs.GT
  • Who Restores the Peg? A Mean-Field Game Approach to Model Stablecoin Market Dynamics — Hardhik Mohanty, Bhaskar Krishnamachari (USC)

    • Abstract summary: Develops a dynamic agent-based mean-field game framework for fiat-collateralized stablecoins where arbitrageurs and retail traders interact across primary (mint/redeem) and secondary (exchange) markets during de-peg episodes. Using three historical de-peg events, the calibrated model reproduces observed recovery half-lives. Finds that primary-market arbitrage is the dominant stabilizing force, but impaired primary redemption requires joint recovery. Identifies a non-linear breakdown threshold beyond which secondary market liquidity becomes a second-order amplifier around the primary-market bottleneck.
    • Relevance to agentic commerce: Stablecoin stability is foundational infrastructure for agent payments (Circle nanopayments, USDC rails). This paper maps exactly how de-peg events propagate — critical for designing agent payment systems that need to handle stablecoin stress events gracefully. The non-linear breakdown threshold finding means agent payment rails need circuit breakers, not linear risk models.
    • Link: https://arxiv.org/abs/2601.18991v1
    • Published: 2026-01-26 | Categories: q-fin.TR, cs.GT, econ.GN
  • Resisting Manipulative Bots in Meme Coin Copy Trading: A Multi-Agent Approach with Chain-of-Thought Reasoning — Yichen Luo, Yebo Feng, Jiahua Xu, Yang Liu

    • Abstract summary: Proposes a manipulation-resistant copy-trading system using a multi-agent architecture powered by a multimodal LLM and chain-of-thought reasoning to defend against adversarial bots in meme coin markets. These bots front-run trades, conceal positions, and fabricate sentiment. The system outperforms zero-shot and most statistic-driven baselines, achieving a 3% average return per meme coin investment under realistic market frictions. Published at ACM Web Conference 2026 (WWW'26).
    • Relevance to agentic commerce: As agent-to-agent crypto transactions grow (via x402, lobster.cash), adversarial agents exploiting naive agents is inevitable. This paper demonstrates that multi-agent defensive architectures can protect against bot manipulation — a design pattern that should be standard in any agent payment system. The WWW'26 venue gives this institutional credibility.
    • Link: https://arxiv.org/abs/2601.08641v3
    • Published: 2026-01-13 (WWW'26) | Categories: cs.AI, q-fin.TR
  • FactorMiner: A Self-Evolving Agent with Skills and Experience Memory for Financial Alpha Discovery — Yanlong Wang et al.

    • Abstract summary: Proposes a self-evolving agent framework for discovering formulaic alpha factors in quantitative investment. Combines a Modular Skill Architecture (financial evaluation as executable tools) with a structured Experience Memory (distilling mining trials into actionable insights). Implements the "Ralph Loop" — retrieve, generate, evaluate, distill — iteratively using memory priors to guide exploration. Tested across multiple datasets and markets, showing competitive factor libraries with low redundancy.
    • Relevance to agentic commerce: The "skills + experience memory" architecture mirrors how production agent systems (like OpenClaw's skill ecosystem) are evolving. The self-evolving loop — where past experience constrains future exploration — is a blueprint for how financial agents on platforms like ClawHub could continuously improve their trading strategies. The cross-market generalization is particularly relevant for agent marketplaces serving global users.
    • Link: https://arxiv.org/abs/2602.14670v1
    • Published: 2026-02-16 | Categories: q-fin.TR, cs.MA
  • Manipulation in Prediction Markets: An Agent-based Modeling Experiment — Bridget Smart, Ebba Mark, Anne Bastian, Josefina Waugh

    • Abstract summary: Studies how high-budget "whale" agents can distort prediction market prices using agent-based simulations. Models bettors with heterogeneous expertise, noisy private information, and variable budgets. Finds that biased whales can temporarily shift prices, with distortion magnitude/duration increasing when non-whale bettors exhibit herding behavior and slow learning. Theoretical analysis shows whales shift prices proportionally to their share of market capital.
    • Relevance to agentic commerce: As AI agents participate in prediction markets (Polymarket, Kalshi) and DeFi, whale manipulation becomes an agent-vs-agent problem. The finding that herding amplifies manipulation is particularly relevant: if many agents use similar LLM backbones, they may exhibit correlated herding behavior, making markets more manipulable. This has implications for agent marketplace design where agents may share underlying architectures.
    • Link: https://arxiv.org/abs/2601.20452v1
    • Published: 2026-01-28 | Categories: econ.GN, physics.soc-ph, q-fin.TR
  • LemonadeBench: Evaluating the Economic Intuition of Large Language Models in Simple Markets — Aidan Vyas

    • Abstract summary: Introduces a minimal benchmark evaluating LLM economic decision-making through a simulated lemonade stand business. Models must manage inventory with expiring goods, set prices, and choose operating hours over 30 days. Performance scales dramatically: basic models earn minimal profits while frontier models capture 70% of the theoretical optimum (>10× improvement). Key finding: models achieve local rather than global optimization, excelling in select areas while exhibiting surprising blind spots.
    • Relevance to agentic commerce: A clean benchmark showing that LLMs have genuine but imperfect economic agency. The "local vs. global optimization" finding is critical for agent marketplace design — agents that seem competent in one dimension may have blind spots in others. This validates the need for multi-agent architectures (like the composite agent systems emerging on ClawHub) rather than monolithic agent commerce solutions.
    • Link: https://arxiv.org/abs/2602.13209v1
    • Published: 2026-01-14 | Categories: q-fin.GN, cs.AI
  • Perceived Political Bias in LLMs Reduces Persuasive Abilities — Matthew DiGiuseppe, Joshua Robison

    • Abstract summary: A preregistered US survey experiment (N=2,144) testing whether perceived political bias reduces LLM persuasiveness. Participants had three-round conversations with ChatGPT about economic policy misconceptions. A short message indicating LLM bias attenuated persuasion by 28%. Transcript analysis shows warnings alter interaction dynamics: respondents push back more and engage less receptively. Suggests conversational AI's persuasive impact is politically contingent.
    • Relevance to agentic commerce: Trust is the currency of agent commerce. If perceived bias can reduce agent persuasiveness by 28% in a single-domain task, the implications for commerce agents are significant, especially for agents making product recommendations, giving financial advice, or negotiating on behalf of users with different political/cultural backgrounds. Agent reputation systems need to account for perceived neutrality.
    • Link: https://arxiv.org/abs/2602.18092v1
    • Published: 2026-02-20 | Categories: cs.CL, cs.AI, cs.CY
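
The proportionality claim in the prediction-market manipulation paper above (Smart et al.) can be illustrated with a toy model. This sketch assumes the market price is simply the capital-weighted average of bettors' beliefs, which is a simplification chosen for illustration and not the paper's actual model; all names and numbers are hypothetical.

```python
# Toy illustration: price as a capital-weighted average of beliefs.
# This is an assumed simplification, not the model from the paper.

def market_price(beliefs: list[float], capital: list[float]) -> float:
    """Capital-weighted mean belief (probability of the event)."""
    total = sum(capital)
    return sum(b * c for b, c in zip(beliefs, capital)) / total


def whale_shift(crowd_price: float, whale_belief: float, whale_share: float) -> float:
    """Price change when a whale holding `whale_share` of total capital joins."""
    return whale_share * (whale_belief - crowd_price)


# Hypothetical market: three ordinary bettors plus one biased whale.
crowd_beliefs = [0.50, 0.55, 0.45]
crowd_capital = [100.0, 100.0, 100.0]
whale_belief, whale_capital = 0.90, 100.0

crowd_price = market_price(crowd_beliefs, crowd_capital)
full_price = market_price(crowd_beliefs + [whale_belief], crowd_capital + [whale_capital])
share = whale_capital / (sum(crowd_capital) + whale_capital)
predicted_shift = whale_shift(crowd_price, whale_belief, share)

print(f"crowd price: {crowd_price:.3f}")
print(f"price with whale: {full_price:.3f}")
print(f"shift predicted from capital share: {predicted_shift:.3f}")
```

In this toy setting the observed shift equals the whale's capital share times the gap between its belief and the crowd price. Herding and slow learning, which the paper identifies as amplifiers, are deliberately left out.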

📊 Working Papers & Reports (NBER)

  • What Drives Money Competition: Comparative Advantage in Payments versus Reserves — Itay Goldstein, Ming Yang, Yao Zeng (NBER w34865)

    • Abstract summary: Studies competition between monies that provide separate payment and non-payment (store-of-value) functions. Central insight: payment adoption is governed by comparative advantage between payment and non-payment roles, not absolute superiority. A money "too good" as store of value may circulate less as payment because agents hoard it. Provides microfoundation for Gresham's law and applies to stablecoin/CBDC debates. Counterintuitively, interest-bearing digital currencies may weaken payment adoption by raising the opportunity cost of spending — traditional bank deposits may coexist with technologically superior digital alternatives.
    • Relevance to agentic commerce: Directly explains why stablecoins (not volatile crypto) are winning as the payment rail for agent transactions. The "hoarding vs. spending" tension maps to how agents should manage their wallets — agents holding appreciating assets won't spend them on transactions. Also suggests CBDCs may not dominate agent payments if they offer yield, giving private stablecoins (USDC/Circle) a structural advantage for agent commerce.
    • Link: https://www.nber.org/papers/w34865
  • Machine Learning Meets Markowitz — Yijie Wang, Hao Gao, Campbell R. Harvey, Yan Liu, Xinyuan Tao (NBER w34861)

    • Abstract summary: Argues the standard two-stage portfolio approach (forecast returns → plug into optimizer) is deeply flawed because it treats all prediction errors equally. Proposes an end-to-end ML approach that unifies return generation with portfolio optimization, giving each investor their own endogenously determined efficient frontier based on preferences, constraints, and market frictions. Empirical evidence shows significant outperformance vs. traditional approach. Campbell Harvey (Duke) is the lead economist.
    • Relevance to agentic commerce: As autonomous agents manage portfolios (the FactorMiner paradigm), this paper shows the right architecture is end-to-end rather than modular. Agent investment platforms should build optimization into the prediction loop, not treat them as separate services. This is how "agent financial advisors" on platforms like ClawHub should be architected.
    • Link: https://www.nber.org/papers/w34861

🏛️ Institutions & Labs to Watch

  • MIT — The 2025 AI Agent Index represents the most comprehensive public documentation of deployed agent systems. The aiagentindex.mit.edu website is a living resource. Combined with Acemoglu/Autor's continued NBER output on AI labor, MIT remains the epicenter of AI agent policy research.

  • Microsoft Research — Horton, Immorlica, and Lucier's "Chaining Tasks" paper is the best theoretical treatment of AI automation structure. Microsoft Research is quietly producing foundational agent economics theory while also deploying commercial agents.

  • Rutgers CHAI Lab — The algorithmic collusion meta-game work (AAMAS 2026) is cutting-edge on agent pricing behavior. Their open-source code at github.com/chailab-rutgers/CollusionMetagame is a valuable resource for simulating agent market dynamics.

  • USC (Krishnamachari Lab) — The stablecoin mean-field game work shows deep understanding of DeFi market microstructure with agent-based models. Watch for more from this group on agent payment infrastructure.

  • NBER AI Economics cluster — This week alone: Acemoglu/Autor/Johnson on pro-worker AI, Horton team on task chaining, Korinek on AGI taxation. The NBER is producing the intellectual infrastructure for AI economic policy at an accelerating rate.

📝 Scan Notes

  • arXiv: All four queries returned results successfully. ~80 total papers scanned, ~17 flagged as relevant. The "agentic" keyword query (query c) returned the broadest results but also the most noise (VR agents, robotics). The q-fin query had the highest signal-to-noise ratio.
  • NBER: Excellent batch this week. Three directly AI-relevant papers from top economists (Acemoglu/Autor, Horton/Immorlica, Korinek). This is unusually concentrated — suggests a wave of AI economics working papers hitting simultaneously.
  • SSRN: Blocked by Cloudflare (403). Need browser-based approach or API key for future scans.
  • Semantic Scholar: Rate-limited (429) on all attempts. Should apply for an API key at semanticscholar.org/product/api for future scans. Consider adding this to TOOLS.md.
  • Key emerging theme: The "agentic regulatory state" concept appeared in two independent places: an academic paper (Neupane's Strategic Gap) and industry practice (the Sentinel/x402 ecosystem). Academic research is converging with industry on the need for autonomous compliance infrastructure.
  • Suggestion for next scan: Add Google Scholar alerts for "agentic commerce" and "agent marketplace mechanism design." Also consider monitoring the AAMAS 2026 proceedings as they get published, since this year's conference has an unusually high density of relevant work.
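
For the 429 responses from Semantic Scholar noted above, one common mitigation is to retry with exponential backoff and to honor a server-supplied retry hint when there is one. The sketch below is a minimal, network-free illustration: `fetch` is a hypothetical callable standing in for whatever HTTP client the scan uses, the response tuple shape is invented for the example, and the delay parameters are assumptions. It is unlikely to help with the SSRN 403, which looks like a block and not a rate limit.

```python
import time
from typing import Callable, Optional, Tuple

# (status_code, retry_after_seconds or None, body): a hypothetical response shape.
Response = Tuple[int, Optional[float], str]


def fetch_with_backoff(
    fetch: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `fetch` until it stops returning 429 or attempts run out."""
    response = fetch()
    for attempt in range(1, max_attempts):
        status, retry_after, _ = response
        if status != 429:
            break
        # Prefer the server's hint; otherwise double the delay each attempt.
        delay = retry_after if retry_after is not None else base_delay * 2 ** (attempt - 1)
        sleep(min(delay, max_delay))
        response = fetch()
    return response


# Usage with a fake endpoint that rate-limits twice, then succeeds.
# `sleep` is replaced by a recorder so the example runs instantly.
_responses = iter([(429, None, ""), (429, 3.0, ""), (200, None, "ok")])
waits: list = []
result = fetch_with_backoff(lambda: next(_responses), sleep=waits.append)
print(result[0], waits)
```

Injecting `fetch` and `sleep` keeps the retry logic testable without network access. An API key would still be the more durable fix, as the note above suggests.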