Academic Research Scan — 2026-02-22

🔬 High Priority Papers

1. Jolt Atlas: Verifiable Inference via Lookup Arguments in Zero Knowledge — Wyatt Benno, Alberto Centelles, Antoine Douchet, Khalil Gibran

Published: 2026-02-19 | Categories: cs.CR, cs.AI
Abstract summary: Presents a zero-knowledge ML (zkML) framework that extends the Jolt proving system to verify model inference directly on ONNX tensor operations, rather than emulating CPU execution like zkVMs. Achieves practical proving times for classification, embedding, reasoning, and small language models — all without specialized hardware. The key innovation is using lookup-centric sumcheck protocols well-suited for non-linear ML functions. The paper explicitly states a companion work outlines use cases including "guardrails in agentic commerce and for trustless AI context (AI memory)."
Relevance to agentic commerce: This targets the agentic commerce trust gap directly. If an AI agent claims it ran a particular model to reach a purchasing decision, zkML can cryptographically prove that the claimed model produced the claimed output from the given input, though not that the agent then acted on that output. Directly applicable to ERC-8004 agent verification, Sapiom's KYA (Know Your Agent) concept, and AgentProof's on-chain reputation. Could become infrastructure for x402 and lobster.cash payment verification.
Link: https://arxiv.org/abs/2602.17452
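
Why lookups suit non-linear layers: over a quantized input domain, an activation such as sigmoid can be precomputed as a finite table, so proving an element-wise layer roughly reduces to showing that every (input, output) pair appears in that table. The Python sketch below shows only that table-membership idea, in the clear; it is not Jolt Atlas code and involves no cryptography. The int8 domain, the scale factors, and the helper names are assumptions for illustration.

```python
import math

# Hypothetical quantization: int8 inputs in [-128, 127] represent x = q / 16.
INPUT_SCALE = 16.0
OUTPUT_SCALE = 255  # sigmoid output quantized to integers 0..255

def quantized_sigmoid(q_in: int) -> int:
    """Reference non-linear function over the quantized domain."""
    x = q_in / INPUT_SCALE
    return round(OUTPUT_SCALE / (1.0 + math.exp(-x)))

# The whole table has only 256 rows, one per possible int8 input.
SIGMOID_TABLE = {(q, quantized_sigmoid(q)) for q in range(-128, 128)}

def trace_is_valid(trace: list[tuple[int, int]]) -> bool:
    """A lookup argument proves this membership succinctly; here we simply check it."""
    return all(pair in SIGMOID_TABLE for pair in trace)

layer_inputs = [-40, 0, 17, 127]
honest_trace = [(q, quantized_sigmoid(q)) for q in layer_inputs]
tampered_trace = honest_trace[:-1] + [(127, 0)]  # falsely claims sigmoid(127/16) == 0

print(trace_is_valid(honest_trace))    # True
print(trace_is_valid(tampered_trace))  # False
```

In an actual lookup argument the prover commits to the trace and the protocol convinces the verifier that each committed pair lies in the table, instead of re-deriving exp() inside arithmetic constraints.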

2. Algorithmic Collusion at Test Time: A Meta-game Design and Evaluation — Yuhong Luo, Daniel Schoepflin, Xintong Wang (Rutgers)

Published: 2026-02-19 | Categories: cs.MA, cs.GT | Venue: AAMAS 2026
Abstract summary: Introduces a meta-game framework to study whether algorithmic pricing agents collude under realistic "test-time" constraints — agents have pretrained policies (competitive, cooperative, or collusive) and must select strategies in repeated pricing games. Evaluates both RL-based and LLM-based strategies under symmetric and asymmetric cost settings. Unlike prior work requiring long learning horizons, this tests collusion emergence under rational meta-strategy selection with limited adaptation time. Code available on GitHub.
Relevance to agentic commerce: As autonomous AI agents increasingly set prices in marketplaces (Alibaba's 120M agent-mediated orders, Amazon's agentic commerce roles), algorithmic collusion is a regulatory time bomb. This paper offers a formal framework that regulators could adopt for testing it. Directly relevant to anyone building multi-agent marketplace infrastructure, given the risk that agent-set prices converge to supra-competitive levels without explicit coordination.
Link: https://arxiv.org/abs/2602.17203
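
In the meta-game framing, each agent picks one of several pretrained policies, and the payoff of each policy pair is estimated by simulating the repeated pricing game. Given such a payoff table, finding stable profiles is a best-response check. The Python sketch below enumerates pure-strategy Nash equilibria of a two-player meta-game; the payoff numbers are invented placeholders, not results from the paper, and the paper's own analysis may go beyond pure strategies.

```python
from itertools import product

POLICIES = ["competitive", "cooperative", "collusive"]

# Hypothetical per-period profits (row player, column player) for each policy pair,
# as if estimated by simulating the repeated pricing game. Placeholder values only.
PAYOFFS = {
    ("competitive", "competitive"): (1.0, 1.0),
    ("competitive", "cooperative"): (2.2, 0.6),
    ("competitive", "collusive"):   (1.4, 0.9),
    ("cooperative", "competitive"): (0.6, 2.2),
    ("cooperative", "cooperative"): (2.0, 2.0),
    ("cooperative", "collusive"):   (2.1, 2.3),
    ("collusive", "competitive"):   (0.9, 1.4),
    ("collusive", "cooperative"):   (2.3, 2.1),
    ("collusive", "collusive"):     (2.5, 2.5),
}

def pure_nash_equilibria(policies, payoffs):
    """Return profiles where neither player gains by unilaterally switching policy."""
    equilibria = []
    for row, col in product(policies, repeat=2):
        row_pay, col_pay = payoffs[(row, col)]
        row_best = all(payoffs[(alt, col)][0] <= row_pay for alt in policies)
        col_best = all(payoffs[(row, alt)][1] <= col_pay for alt in policies)
        if row_best and col_best:
            equilibria.append((row, col))
    return equilibria

print(pure_nash_equilibria(POLICIES, PAYOFFS))
```

With these placeholder payoffs, both the all-competitive and the all-collusive profiles are stable, so which one agents end up in depends on how they select meta-strategies, the kind of question a test-time evaluation has to answer.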

3. Towards a Science of AI Agent Reliability — Stephan Rabanser, Sayash Kapoor, Peter Kirgis, Kangheng Liu, Saiteja Utpala, Arvind Narayanan (Princeton)

Published: 2026-02-18 | Categories: cs.AI, cs.CY, cs.LG
Abstract summary: Argues that single success metrics (like benchmark accuracy) are misleading for evaluating AI agents deployed in real tasks. Proposes 12 concrete metrics decomposing agent reliability along four dimensions: consistency (same inputs → same outputs), robustness (performance under perturbation), predictability (error patterns are foreseeable), and safety (error severity is bounded). Evaluates 14 agentic models across two benchmarks and finds that recent capability gains have yielded only small improvements in reliability. A sobering result from a top-tier group.
Relevance to agentic commerce: Arvind Narayanan (Princeton, co-author of "AI Snake Oil" with Sayash Kapoor, who is also on this paper) is one of the most influential voices on AI accountability. These 12 reliability metrics are the kind of evidence regulators, insurers, and enterprise buyers are likely to demand before trusting agents with financial transactions. Directly applicable to Sapiom's agent scoring, Agnic Family's DID/VC framework, and anyone building agent-to-agent payment rails. The finding that capability ≠ reliability should alarm the entire "let agents spend money" ecosystem.
Link: https://arxiv.org/abs/2602.16666
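
To make the "consistency" dimension concrete: one simple way to score it is to run each task several times with identical inputs and measure how often the runs agree. The Python sketch below computes two hypothetical scores (the share of tasks where all runs agree, and mean pairwise agreement); these definitions are illustrative assumptions and are not claimed to match any of the paper's 12 metrics.

```python
from itertools import combinations

def outcome_consistency(runs_per_task: dict[str, list[str]]) -> dict[str, float]:
    """Score agreement across repeated runs of the same task with identical inputs.

    runs_per_task maps a task id to the outcomes of k >= 2 repeated runs.
    """
    all_agree = 0
    pairwise_scores = []
    for outcomes in runs_per_task.values():
        if len(outcomes) < 2:
            raise ValueError("need at least two runs per task")
        if len(set(outcomes)) == 1:
            all_agree += 1
        pairs = list(combinations(outcomes, 2))
        pairwise_scores.append(sum(a == b for a, b in pairs) / len(pairs))
    n_tasks = len(runs_per_task)
    return {
        "all_runs_agree": all_agree / n_tasks,
        "mean_pairwise_agreement": sum(pairwise_scores) / n_tasks,
    }

# Hypothetical purchasing-agent runs: same prompt, three attempts each.
runs = {
    "buy_cheapest_usb_cable": ["item_A", "item_A", "item_A"],
    "book_refundable_hotel": ["hotel_X", "hotel_Y", "hotel_X"],
    "renew_subscription": ["renewed", "renewed", "error"],
}
print(outcome_consistency(runs))
```

An agent that succeeds on average but books a different hotel on each run can look fine on a single success metric while scoring poorly on both measures here.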

4. SPILLage: Agentic Oversharing on the Web — Jaechul Roh, Eugene Bagdasarian, Hamed Haddadi, Ali Shahin Shamsabadi

Published: 2026-02-13 | Categories: cs.AI
Abstract summary: Formalizes "Natural Agentic Oversharing": web agents unintentionally disclosing task-irrelevant user information through their action traces on websites. Introduces a taxonomy along two axes: content vs. behavioral oversharing, and explicit vs. implicit. Benchmarks 180 tasks on live e-commerce sites across 1,080 runs (2 frameworks × 3 LLMs). Key finding: behavioral oversharing (clicks, scrolls, navigation) outnumbers content oversharing 5x and persists or worsens under prompt-level mitigation. By contrast, removing task-irrelevant information before execution improved task success by 17.9%.
Relevance to agentic commerce: This is a critical privacy concern for any system where AI agents shop or transact on behalf of users. When an OpenClaw agent browses Amazon or makes purchases via lobster.cash, its click patterns alone leak information about the user's preferences, budget, and priorities. The 5x behavioral-over-content finding means that even encrypted payloads don't protect users if agent navigation patterns are observable. Directly relevant to the Hudson Rock infostealer threat we flagged earlier.
Link: https://arxiv.org/abs/2602.13516
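
The two-axis taxonomy lends itself to a simple tally once each leaking action has been labeled. The Python sketch below assumes that labeling is already done by hand (the hard part, which it does not attempt) and only counts events per cell of the content/behavioral × explicit/implicit grid; the trace, field names, and ratio are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class OvershareEvent:
    channel: str  # "content" (text the agent enters) or "behavioral" (clicks, scrolls, navigation)
    form: str     # "explicit" (attribute stated outright) or "implicit" (attribute inferable)
    note: str

# Hypothetical, hand-labeled trace for a task like "buy running shoes under $80".
trace = [
    OvershareEvent("content", "explicit", "typed the user's employer into a free-text field"),
    OvershareEvent("behavioral", "implicit", "applied a maternity-wear filter unrelated to the task"),
    OvershareEvent("behavioral", "implicit", "visited a pharmacy category after reading health notes"),
]

by_cell = Counter((e.channel, e.form) for e in trace)
by_channel = Counter(e.channel for e in trace)
# Guard against division by zero when no content oversharing was observed.
ratio = by_channel["behavioral"] / max(by_channel["content"], 1)
print(dict(by_cell))
print(f"behavioral-to-content ratio: {ratio:.1f}")
```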

5. Autonomous Market Intelligence: Agentic AI Nowcasting Predicts Stock Returns — Zefeng Chen, Darcy Pu

Published: 2026-01-17 | Categories: q-fin.GN, q-fin.PM, q-fin.TR
Abstract summary: Deploys a fully agentic LLM that autonomously searches the web, filters sources, and synthesizes predictions for Russell 1000 stocks daily — with zero human curation of inputs. This is a true out-of-sample test starting April 2025 when AI web search became available. The top-20 long-only portfolio generates 18.4 bps daily alpha (Fama-French 5-factor + momentum) with annualized Sharpe of 2.43. However, alpha is concentrated in the top tier; expanding beyond it rapidly dilutes returns. Bottom-ranked stocks show no predictive signal, suggesting an asymmetry in online information structure.
Relevance to agentic commerce: This is among the first rigorous demonstrations that autonomous AI agents can generate genuine financial alpha without human input, a proof point for the entire "agents with wallets" thesis. The asymmetry finding (agents are better at identifying winners than losers) has implications for agent-mediated investment platforms. The live design cannot be replayed, since the real-time information environment is gone once the day passes, but that is precisely what rules out look-ahead bias; it sets a standard for how to evaluate autonomous agent capabilities.
Link: https://arxiv.org/abs/2601.11958
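
The headline numbers rest on two standard conversions: basis points to decimals, and a daily Sharpe ratio to an annualized one (scaling by √252, under the usual assumption of 252 trading days and i.i.d. daily returns). A minimal Python sketch; the return series in the usage lines is made up and has nothing to do with the paper's portfolio.

```python
import math
import statistics

TRADING_DAYS = 252  # common convention; assumes i.i.d. daily returns

def bps_to_decimal(bps: float) -> float:
    """1 basis point = 0.01% = 0.0001."""
    return bps / 10_000

def annualized_sharpe(daily_excess_returns: list[float]) -> float:
    """Daily mean / daily sample std, scaled by sqrt(252)."""
    mu = statistics.mean(daily_excess_returns)
    sigma = statistics.stdev(daily_excess_returns)
    return mu / sigma * math.sqrt(TRADING_DAYS)

daily_alpha = bps_to_decimal(18.4)          # 0.00184 per day
simple_annual = daily_alpha * TRADING_DAYS  # about 0.46, i.e. ~46% per year, uncompounded
print(daily_alpha, round(simple_annual, 4))

# Made-up daily excess returns, only to exercise the function.
sample = [0.004, -0.002, 0.003, 0.001, -0.001, 0.002]
print(round(annualized_sharpe(sample), 2))
```

The ~46% figure is simple, uncompounded arithmetic on the reported daily alpha, shown only to convey scale; the paper's own annualization may differ.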

6. Who Restores the Peg? A Mean-Field Game Approach to Model Stablecoin Market Dynamics — Hardhik Mohanty, Bhaskar Krishnamachari (USC)

Published: 2026-01-26 | Categories: q-fin.TR, cs.GT, econ.GN
Abstract summary: Develops a dynamic agent-based mean-field game framework for fiat-collateralized stablecoins (USDC, USDT), modeling how arbitrageurs and retail traders interact across primary (mint/redeem) and secondary (exchange) markets during de-peg events. The key advantage: it endogenously maps market frictions into clearing prices and order flows, attributing peg-recovery pressure by channel. Using three historical de-pegs, the calibrated model reproduces observed recovery half-lives. Finds that primary-market arbitrage dominates stabilization, and identifies a non-linear breakdown threshold beyond which secondary liquidity merely amplifies the bottleneck.
Relevance to agentic commerce: Stablecoins (USDC especially) are the payment rail for the entire agentic commerce stack — x402, Circle nanopayments, lobster.cash. Understanding when and how the peg breaks is existential. The finding that primary-market arbitrage dominates recovery is directly relevant to Circle's new Gateway infrastructure and Bridge's OCC bank application. Any agent holding USDC balances needs to understand these dynamics.
Link: https://arxiv.org/abs/2601.18991
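
"Recovery half-life" is the time for a de-peg deviation to shrink by half. If the deviation decays roughly exponentially, d(t) = d₀·e^(−kt), the half-life is ln 2 / k, and k can be estimated with a log-linear least-squares fit. The Python sketch below assumes that exponential form and uses a synthetic deviation series; it is a descriptive fit, not the paper's mean-field game model.

```python
import math

def half_life_from_deviations(times: list[float], deviations: list[float]) -> float:
    """Fit log|d(t)| = log|d0| - k*t by least squares and return ln(2) / k."""
    logs = [math.log(abs(d)) for d in deviations]
    n = len(times)
    t_bar = sum(times) / n
    l_bar = sum(logs) / n
    cov = sum((t - t_bar) * (l - l_bar) for t, l in zip(times, logs))
    var = sum((t - t_bar) ** 2 for t in times)
    k = -cov / var  # decay rate per unit time
    if k <= 0:
        raise ValueError("deviation is not shrinking; half-life undefined")
    return math.log(2) / k

# Synthetic example: price at 0.88 (a 12-cent de-peg) at t=0, decay rate 0.1 per hour.
hours = [0, 2, 4, 6, 8, 10]
devs = [0.12 * math.exp(-0.1 * h) for h in hours]
print(round(half_life_from_deviations(hours, devs), 2))  # ln 2 / 0.1, about 6.93 hours
```

The paper goes further and attributes recovery pressure to specific channels (primary mint/redeem versus secondary exchange trading); a single decay rate like this only summarizes their combined effect.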

📄 Notable Papers

7. Governing AI Forgetting: Auditing for Machine Unlearning Compliance — Qinqi Lin, Ningning Ding, Lingjie Duan, Jianwei Huang

Published: 2026-02-16 | Categories: cs.LG, cs.AI, cs.GT
Abstract summary: First economic framework for auditing machine unlearning (MU) compliance — right-to-be-forgotten for AI. Uses hypothesis-testing interpretation of certified unlearning to derive auditor detection capability, then proposes a game-theoretic model of strategic interaction between auditor and operator. Key counterintuitive finding: auditors can optimally reduce inspection intensity as deletion requests increase, because weakened unlearning makes non-compliance easier to detect. Also proves that undisclosed auditing paradoxically reduces regulatory cost-effectiveness vs. disclosed auditing.
Relevance to agentic commerce: As AI agents accumulate transaction histories and user preferences, the right-to-be-forgotten becomes critical infrastructure. If an agent has learned your spending patterns through x402 payments, can you request that knowledge be deleted? This paper provides the economic framework for that audit — directly relevant to GDPR compliance for European agentic commerce deployments.
Link: https://arxiv.org/abs/2602.14553
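
The hypothesis-testing reading of certified unlearning can be made concrete with the standard (ε, δ) indistinguishability bound: if an unlearned model is (ε, δ)-close to one retrained without the deleted data, any audit test with false-positive rate α has power at most min(e^ε·α + δ, 1 − e^(−ε)(1 − δ − α)). The Python sketch below evaluates that textbook bound; treating it as the paper's exact detection-capability formula is an assumption.

```python
import math

def max_audit_power(alpha: float, epsilon: float, delta: float) -> float:
    """Upper bound on a test's true-positive rate at false-positive rate alpha when the
    unlearned model is (epsilon, delta)-indistinguishable from a retrained reference."""
    if not (0.0 <= alpha <= 1.0 and epsilon >= 0.0 and 0.0 <= delta <= 1.0):
        raise ValueError("invalid parameters")
    bound_1 = math.exp(epsilon) * alpha + delta
    bound_2 = 1.0 - math.exp(-epsilon) * (1.0 - delta - alpha)
    return min(1.0, bound_1, bound_2)

# As epsilon grows (weaker unlearning guarantee), the cap loosens:
# tests have more room to tell the unlearned model apart from retraining.
for eps in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(eps, round(max_audit_power(alpha=0.05, epsilon=eps, delta=1e-5), 3))
```

At ε = 0 the best possible audit is no better than its own false-alarm rate; the gap that opens as ε grows is the room an auditor has to detect weak unlearning.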

8. FactorMiner: A Self-Evolving Agent with Skills and Experience Memory for Financial Alpha Discovery — Yanlong Wang, Jian Xu, Hongkang Zhang, Shao-Lun Huang, Danny Dongning Sun, Xiao-Ping Zhang

Published: 2026-02-16 | Categories: q-fin.TR, cs.MA
Abstract summary: A self-evolving agent framework for discovering formulaic alpha factors in quantitative trading. Combines a Modular Skill Architecture (financial evaluation tools) with structured Experience Memory (distilled insights from past mining trials). Uses the "Ralph Loop" — retrieve, generate, evaluate, distill — to iteratively explore the factor space while reducing redundancy. Tested across multiple assets and markets; builds diverse libraries of high-quality, interpretable factors under the "Correlation Red Sea" constraint (new factors increasingly correlated with existing ones).
Relevance to agentic commerce: Demonstrates that autonomous agents can systematically discover financial signals using accumulated experience — a template for how agents in commerce might learn purchasing strategies, supplier selection criteria, or arbitrage opportunities over time. The experience memory architecture maps directly onto how OpenClaw agents could persist and refine economic behaviors across sessions.
Link: https://arxiv.org/abs/2602.14670

9. Experimentation, Biased Learning, and Conjectural Variations in Competitive Dynamic Pricing — (truncated in feed)

Published: 2026-02-13 | Categories: (likely cs.GT/econ)
Abstract summary: Studies competitive dynamic pricing among multiple sellers using simple learning rules (A/B experiments in the style of switchback designs), where sellers observe only their own prices and realized demand. Models how sellers form "conjectural variations" about competitors' responses when they can't observe others' actions directly. Motivated by the rise of large-scale algorithmic pricing in online marketplaces.
Relevance to agentic commerce: When AI agents set prices in marketplaces, they'll observe only their own outcomes — exactly this setting. The "conjectural variations" framework describes how agents might implicitly coordinate (or fail to) without communication. Combined with the algorithmic collusion paper above, this paints a picture of the pricing dynamics in an agent-mediated marketplace.
Link: https://arxiv.org/abs/2602.12888
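
The learning problem can be illustrated with a switchback experiment: the seller alternates between a low and a high price across periods and estimates its demand slope from its own sales alone. If an unseen rival reacts to those price changes, the estimate silently folds in the rival's response, which is where conjectural variations come in. The Python sketch assumes a hypothetical linear demand and a rival who follows the seller's price with slope v; neither comes from the paper.

```python
def demand(own_price: float, rival_price: float) -> float:
    # Hypothetical linear demand: own price hurts sales, rival price helps.
    return 100.0 - 8.0 * own_price + 5.0 * rival_price

def rival_response(own_price: float, follow: float) -> float:
    # Hypothetical rival that partially matches our price (follow = v).
    return 4.0 + follow * own_price

def switchback_slope(p_low: float, p_high: float, periods: int, follow: float) -> float:
    """Alternate prices each period; estimate dq/dp from own prices and sales only."""
    sales = {p_low: [], p_high: []}
    for t in range(periods):
        p = p_low if t % 2 == 0 else p_high
        sales[p].append(demand(p, rival_response(p, follow)))
    mean_low = sum(sales[p_low]) / len(sales[p_low])
    mean_high = sum(sales[p_high]) / len(sales[p_high])
    return (mean_high - mean_low) / (p_high - p_low)

for v in (0.0, 0.5, 1.0):
    print(v, switchback_slope(9.0, 11.0, periods=20, follow=v))  # -8 + 5*v: -8.0, -5.5, -3.0
```

In this toy setup, v = 0 recovers the true own-price slope (−8); when the rival follows, demand looks less elastic than it really is, which nudges a seller learning from its own data toward higher prices.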

10. Manipulation in Prediction Markets: An Agent-based Modeling Experiment — Bridget Smart, Ebba Mark, Anne Bastian, Josefina Waugh

Published: 2026-01-28 | Categories: econ.GN, physics.soc-ph, q-fin.TR
Abstract summary: Uses agent-based simulations to study how high-budget "whale" agents can distort prediction market prices. Agents have heterogeneous expertise, noisy private info, variable learning rates and budgets. Finds that biased whales can temporarily shift prices proportional to their capital share, with distortion duration depending on non-whale learning rates and herding intensity. The model exhibits self-regulatory price discovery across a broad parameter space.
Relevance to agentic commerce: As prediction markets grow (Polymarket, Kalshi) and AI agents increasingly participate, this directly models the manipulation risk. Relevant to any "agent marketplace" where autonomous agents trade — from DeFi to information markets. The finding that herding amplifies manipulation suggests agent-dense markets could be more manipulable.
Link: https://arxiv.org/abs/2601.20452

11. Trading in CEXs and DEXs with Priority Fees and Stochastic Delays — Philippe Bergault, Yadh Hafsi, Leandro Sánchez-Betancourt

Published: 2026-02-11 (updated Feb 19) | Categories: q-fin.TR, math.OC
Abstract summary: Develops a mixed control framework combining continuous controls with impulse interventions subject to stochastic execution delays — motivated by optimal trading across centralized and decentralized exchanges. In DEXs, traders control execution delay distribution through priority fees, creating a tradeoff between speed, uncertainty, and cost. Derives dynamic programming principle and viscosity solutions for the resulting quasi-variational inequalities. Shows optimal priority fee selection significantly outperforms non-strategic approaches.
Relevance to agentic commerce: As AI agents execute trades across CEXs and DEXs from agent-controlled wallets (the setting x402 and ERC-8004 envision), this paper provides a mathematical framework for optimal fee selection. Directly relevant to how autonomous trading agents should manage gas and priority fees on Ethereum, Solana, or Base when executing x402 payments or DeFi operations.
Link: https://arxiv.org/abs/2602.10798
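
The speed, uncertainty, and cost trade-off has a simple one-shot caricature. Assume, hypothetically, that paying priority fee f makes the inclusion delay exponential with rate k·f, and that the price moves like Brownian motion with volatility σ while the order waits. Expected price variance over the wait is then σ²/(k·f), and minimizing f + γ·σ²/(k·f) gives f* = σ·√(γ/k). The Python sketch checks that closed form against a grid search; it is not the paper's mixed impulse-control model with quasi-variational inequalities.

```python
import math

def expected_cost(fee: float, sigma: float, k: float, gamma: float) -> float:
    """Fee paid plus a risk penalty on price variance accumulated while waiting.

    Toy assumptions: delay ~ Exponential(rate = k * fee), price ~ Brownian motion,
    so E[price variance over the delay] = sigma**2 * E[delay] = sigma**2 / (k * fee).
    """
    return fee + gamma * sigma**2 / (k * fee)

def optimal_fee_closed_form(sigma: float, k: float, gamma: float) -> float:
    # Setting d/d(fee) = 1 - gamma * sigma**2 / (k * fee**2) to zero.
    return sigma * math.sqrt(gamma / k)

sigma, k, gamma = 2.0, 4.0, 1.0  # hypothetical units
grid = [0.01 * i for i in range(1, 500)]
best_fee = min(grid, key=lambda f: expected_cost(f, sigma, k, gamma))
print(best_fee, optimal_fee_closed_form(sigma, k, gamma))
```

Higher volatility or risk aversion pushes the optimal fee up; a more fee-sensitive inclusion rate k pushes it down.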

12. Resisting Manipulative Bots in Meme Coin Copy Trading — Yichen Luo, Yebo Feng, Jiahua Xu, Yang Liu

Published: 2026-01-13 (WWW'26) | Categories: cs.AI, q-fin.TR
Abstract summary: Proposes a manipulation-resistant copy-trading system for meme coin markets using multi-agent LLM architecture with chain-of-thought reasoning. Adversaries deploy bots to front-run, conceal positions, and fabricate sentiment. The defense system outperforms zero-shot and most statistic-driven baselines, achieving 3% average copier return per investment under realistic frictions. Published at ACM Web Conference 2026.
Relevance to agentic commerce: Multi-agent defense against adversarial manipulation in crypto markets — the exact scenario that arises when autonomous agents trade with real money. Demonstrates that LLM-based agents can identify and resist bot manipulation patterns, a capability essential for any agent-wallet system operating in adversarial environments like DeFi.
Link: https://arxiv.org/abs/2601.08641

13. LemonadeBench: Evaluating the Economic Intuition of Large Language Models in Simple Markets — Aidan Vyas

Published: 2026-01-14 | Categories: q-fin.GN, cs.AI
Abstract summary: Minimal benchmark testing LLM economic reasoning through a simulated lemonade stand: models manage inventory (expiring goods), set prices, choose hours, and maximize profit over 30 days. All models achieve profitability, with frontier models capturing ~70% of the theoretical optimum, a >10x improvement over basic models. However, decomposition across six business dimensions reveals a consistent pattern: models achieve local rather than global optimization, excelling in select areas while showing "surprising blind spots."
Relevance to agentic commerce: If we're trusting agents to make purchasing and selling decisions, how good is their economic intuition? This benchmark shows frontier models are decent but not great — they optimize locally, not globally. The blind spots finding is a warning for anyone deploying agents in complex commercial environments. Complements the Princeton reliability paper above.
Link: https://arxiv.org/abs/2602.13209

14. Modeling Distinct Human Interaction in Web Agents — Faria Huq, Zora Zhiruo Wang et al. (CMU)

Published: 2026-02-19 | Categories: cs.CL, cs.HC
Abstract summary: Introduces the task of modeling human intervention patterns during collaborative web task execution. Collects CowCorpus — 400 real-user web navigation trajectories with 4,200+ interleaved human-agent actions. Identifies four interaction patterns: hands-off, hands-on, collaborative, and full takeover. Trains models to predict intervention timing based on interaction style (61-63% improvement). Live deployment shows 26.5% increase in user-rated agent usefulness.
Relevance to agentic commerce: As agents handle transactions, understanding when humans want to intervene is critical — especially for high-value purchases. The four interaction patterns map directly onto how users might oversee agent spending: fully autonomous for small purchases, collaborative for medium ones, full takeover for big decisions. Relevant to OpenClaw's human-in-the-loop design philosophy.
Link: https://arxiv.org/abs/2602.17588

15. From Labor to Collaboration: AI Agents in Taiwan's Humanities and Social Sciences — Yi-Chih Huang

Published: 2026-02-19 | Categories: cs.AI, cs.CL, cs.CY
Abstract summary: Proposes an AI Agent-based collaborative research workflow for humanities/social science, validated using Taiwan's Claude.ai usage data (N=7,729 conversations, Nov 2025) from the Anthropic Economic Index (AEI). Identifies three operational modes: direct execution, iterative refinement, and human-led. Concludes that human judgment remains irreplaceable in question formulation, theoretical interpretation, and ethical reflection.
Relevance to agentic commerce: The Anthropic Economic Index data is a unique window into how people actually use AI agents for knowledge work — the same workforce that will increasingly delegate economic tasks to agents. The three collaboration modes (direct/iterative/human-led) map onto agentic commerce trust levels.
Link: https://arxiv.org/abs/2602.17221

📊 Working Papers & Reports

NBER Working Papers

16. Firm Data on AI — Ivan Yotzov, Jose Maria Barrero, Nicholas Bloom, Philip Bunn, Steven J. Davis, Kevin M. Foster, Aaron Jalca, Brent H. Meyer, Paul Mizen, Michael A. Navarrete, Pawel Smietanka, Gregory Thwaites, Ben Zhe Wang

Published: 2026-02 | NBER Working Paper w34836
Abstract summary: First representative international survey of firm-level AI use, covering ~6,000 CFOs/CEOs in the US, UK, Germany, and Australia. 70% of firms actively use AI (especially younger, more productive firms). Top executives average only 1.5 hours/week of AI use. More than 80% of firms report no impact on employment or productivity over the last 3 years, but they forecast that over the next 3 years AI will boost productivity by 1.4%, increase output by 0.8%, and cut employment by 0.7%. Individual employees, by contrast, predict a 0.5% employment increase, a sizable expectation gap between executives and workers.
Relevance to agentic commerce: This is the Bloom-Davis survey, the gold standard for firm-level technology adoption data. The 70% adoption rate combined with minimal measured impact suggests we're still in early deployment. The executive vs. worker expectation gap on employment effects is a political flashpoint. For agentic commerce specifically: if firms already use AI but are not yet seeing productivity gains, our working hypothesis (not a claim the paper makes) is that the gains arrive when AI agents start autonomously executing transactions, the next wave.
Link: https://www.nber.org/papers/w34836

17. GPT as a Measurement Tool — Hemanth Asirvatham, Elliott Mokski, Andrei Shleifer

Published: 2026-02 | NBER Working Paper w34834
Abstract summary: Introduces GABRIEL, a software package using GPT to quantify attributes in qualitative data (e.g., how "pro-innovation" a speech is). Validated against 1,000+ human-annotated tasks — GPT is accurate and generally indistinguishable from human evaluators. Results don't depend on prompting strategy and aren't driven by training data contamination. Applied to Congressional remarks, social media toxicity, and school curricula. Also assembles a novel dataset of 37,000 technologies documenting a tenfold decline in adoption lag from ~50 years to ~5 years today, plus the increasing dominance of companies (vs. individuals) and the US in innovation.
Relevance to agentic commerce: Shleifer (Harvard, one of the most-cited economists alive) validating LLMs as measurement instruments is a huge credibility signal. The tech adoption dataset showing 50→5 year adoption lags suggests agentic commerce could reach scale much faster than historical precedent. The GABRIEL framework could be used to measure agent behavior quality, trustworthiness, or commercial intent — exactly the kind of measurement infrastructure the agent economy needs.
Link: https://www.nber.org/papers/w34834

18. Non-Fungible Tokens as Investment — William N. Goetzmann, Dong Huang, Milad Nozari

Published: 2026-02 | NBER Working Paper w34837
Abstract summary: Provides a rigorous post-mortem of NFTs as an investment class. Returns were exceptionally right-skewed, illiquidity pervaded even active platforms, and aggregate performance was driven by a handful of trades. Successful NFT investing required "an almost perfect confluence of timing, liquidity, and luck." Investors extrapolating from realized returns without recognizing selection bias and survivorship faced substantial disappointment risk.
Relevance to agentic commerce: A cautionary tale for any token-based agent economy. As agent reputation tokens (AgentProof), agent payment tokens, and potential agent-minted assets emerge, the NFT bubble's lessons about survivorship bias, illiquidity, and right-skewed returns are directly applicable. The "timing, liquidity, and luck" finding should temper enthusiasm for tokenized agent markets.
Link: https://www.nber.org/papers/w34837

Additional Q-Fin Papers

19. Behavioral Consistency Validation for LLM Agents: Stock-Market Simulation — Zeping Li et al. (incl. Philip Torr)

Published: 2026-02-02 | Categories: q-fin.TR, cs.AI
Abstract summary: Tests whether LLM agents in financial simulations behave consistently with real market participants. Assigns personality traits (loss aversion, herding, wealth differentiation, price misalignment) via prompting, then tests strategy-switching behavior over year-long simulations. Uses Mann-Whitney U tests to compare with behavioral finance theory. Finds that recent LLMs' switching behavior is only partially consistent with financial theory — highlighting the gap between agent behavior and real human market dynamics.
Relevance to agentic commerce: If agent financial behavior doesn't align with real human behavior, agent-populated markets will have different dynamics than expected. This matters for anyone building agent-mediated marketplaces — prices, liquidity, and volatility may behave differently when agents dominate.
Link: https://arxiv.org/abs/2602.07023
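
For readers unfamiliar with the test: Mann-Whitney U compares two samples by counting, over all cross-sample pairs, how often a value from one sample exceeds a value from the other (ties count half). A dependency-free Python sketch of the statistic follows; the switching counts are invented, and in practice one would use `scipy.stats.mannwhitneyu`, which also returns a p-value.

```python
def mann_whitney_u(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (U_x, U_y): pairs where x beats y, and vice versa; ties count 0.5."""
    u_x = 0.0
    for a in x:
        for b in y:
            if a > b:
                u_x += 1.0
            elif a == b:
                u_x += 0.5
    u_y = len(x) * len(y) - u_x
    return u_x, u_y

# Hypothetical strategy-switch counts: trait-prompted agents vs. baseline agents.
trait_prompted = [3, 5, 4, 6, 5]
baseline = [2, 3, 1, 4, 2]
u_trait, u_base = mann_whitney_u(trait_prompted, baseline)
print(u_trait, u_base)  # 23.0 2.0
```

U near 0 or near n₁·n₂ (here 25) means one sample is shifted relative to the other; U near n₁·n₂/2 means the samples overlap heavily.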

20. Seeing the Goal, Missing the Truth: Human Accountability for AI Bias — Sean Cao, Wei Jiang, Hui Xu

Published: 2026-02-10 | Categories: q-fin.GN, cs.AI
Abstract summary: Demonstrates "purpose-conditioned cognition" in LLMs: revealing the downstream use of outputs (e.g., predicting stock returns) causes the model to generate biased intermediate measures, even when those measures should be task-independent. Goal-aware prompting shifts measurements toward the disclosed objective. This "purpose leakage" improves pre-cutoff performance but offers no advantage post-cutoff — it's memorization, not prediction.
Relevance to agentic commerce: When AI agents are given goals ("minimize purchase price" or "maximize portfolio return"), this paper shows they'll bias their intermediate analysis toward that goal — potentially distorting market research, price comparisons, or supplier evaluations. A subtle but important failure mode for autonomous commerce agents.
Link: https://arxiv.org/abs/2602.09504

🏛️ Institutions & Labs to Watch

Princeton (Narayanan group) — Leading the agent reliability/accountability charge. Sayash Kapoor (AI Snake Oil co-author) now focused on agentic systems. Their 12-metric reliability framework could become an industry standard.
Rutgers (CHAI Lab) — Algorithmic collusion meta-game work accepted at AAMAS 2026. Active in multi-agent market dynamics. GitHub: chailab-rutgers.
USC (Krishnamachari group) — Stablecoin mean-field game work bridges crypto economics with formal game theory. Relevant as stablecoins become the payment rail for agent commerce.
Harvard/Stanford/NBER (Shleifer; Bloom, Davis) — The GABRIEL measurement framework (Asirvatham, Mokski, Shleifer) and the Bloom-Davis firm-level AI survey together set the empirical foundation for understanding AI adoption at scale.
Jolt Atlas team (Benno, Centelles, Douchet, Gibran) — The zkML-for-agentic-commerce framing is unique among this week's papers. The abstract points to a companion work on use cases; worth tracking.

📝 Scan Notes

arXiv: All four queries returned results. Strongest yield from Query A (cs.MA/cs.GT/cs.AI + marketplace/commerce) and Query D (q-fin + agent/LLM). ~90 papers scanned, 17 selected (plus 3 from NBER, for 20 in total).
NBER: RSS feed returned ~25 papers. 3 highly relevant (Bloom firm AI survey, Shleifer GPT measurement, Goetzmann NFTs).
SSRN: Blocked by Cloudflare (403). Need to try browser-based access in future scans.
Semantic Scholar: Rate-limited (429) on both attempts. Need API key for reliable access — free tier available at semanticscholar.org/product/api.
Key theme this week: The gap between agent capability and agent reliability is becoming the dominant research concern. Multiple independent groups (Princeton, Rutgers, various q-fin authors) are converging on the conclusion that capable agents ≠ trustworthy agents. This supports the thesis that agent identity, verification, and spending controls, not agent intelligence, are the bottleneck.
Suggestions for next scan: (1) Get Semantic Scholar API key for reliable access. (2) Try browser-based SSRN scraping. (3) Add Google Scholar alerts for "agentic commerce" and "agent marketplace." (4) Monitor AAMAS 2026 proceedings (May 25-29, Paphos, Cyprus) — multiple relevant papers accepted.