Academic Research Scan — 2026-02-20

🔬 High Priority Papers

1. Jolt Atlas: Verifiable Inference via Lookup Arguments in Zero Knowledge — Wyatt Benno, Alberto Centelles, Antoine Douchet, Khalil Gibran

Abstract summary: Presents Jolt Atlas, a zero-knowledge machine learning (zkML) framework that extends the Jolt proving system to model inference by applying lookup-centric proofs directly to ONNX tensor operations rather than emulating CPU execution. The system achieves practical proving times for classification, embeddings, automated reasoning, and small language models with on-device cryptographic verification and no specialized hardware. The companion work explicitly outlines use cases as guardrails in agentic commerce and for trustless AI memory/context. Key optimizations include neural teleportation for smaller lookup tables and streaming provers for memory-constrained environments.
Relevance to agentic commerce: This is a direct infrastructure play for verifiable AI agent actions. If agents are making purchases or signing transactions (à la lobster.cash, x402), zkML proofs could let counterparties verify that an agent's decision was produced by an approved model without revealing private data. Addresses the "Know Your Agent" trust gap that ERC-8004 and AgentProof are tackling from the identity side.
Link: https://arxiv.org/abs/2602.17452
Published: 2026-02-19 | Categories: cs.CR, cs.AI

2. Algorithmic Collusion at Test Time: A Meta-game Design and Evaluation — Yuhong Luo, Daniel Schoepflin, Xintong Wang (Rutgers)

Abstract summary: Introduces a meta-game framework for analyzing whether algorithmic pricing agents collude under realistic "test-time" constraints, where agents have pre-trained policies with distinct strategic characteristics (competitive, cooperatively naive, robustly collusive). The study samples empirical games over meta-strategy profiles, computes payoffs and regret, and constructs best-response graphs. Both RL-based and LLM-based pricing strategies are evaluated in repeated pricing games under symmetric and asymmetric cost settings. Accepted at AAMAS 2026 (the premier multi-agent systems venue).
Relevance to agentic commerce: As autonomous agents increasingly set prices in real marketplaces (and as AI agents shop on behalf of users via Crossmint/lobster.cash), the risk of emergent algorithmic collusion becomes a regulatory concern. This paper provides a rigorous test-time evaluation framework, the kind of tool regulators will need to audit agent marketplaces.
Link: https://arxiv.org/abs/2602.17203
Published: 2026-02-19 | Categories: cs.MA, cs.GT
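
The payoff-and-regret step in the summary above can be sketched in a few lines. This is only an illustration: the payoff numbers are invented, the three labels simply echo the strategy types named in the summary, and regret is the textbook definition (the most a player could gain by deviating unilaterally), which may differ from the paper's exact formulation.

```python
# Hypothetical symmetric two-player empirical game over three meta-strategies.
# PAYOFF[i][j] = average profit of a player using strategy i against strategy j.
# All numbers are made up for illustration.
STRATEGIES = ["competitive", "cooperatively_naive", "robustly_collusive"]
PAYOFF = [
    [1.0, 1.6, 1.1],
    [0.7, 2.0, 0.8],
    [0.9, 2.1, 1.8],
]

def best_response(payoff, opponent):
    """Index of the strategy with the highest payoff against `opponent`."""
    column = [row[opponent] for row in payoff]
    return max(range(len(column)), key=column.__getitem__)

def regret(payoff, i, j):
    """Largest gain either player could get by unilaterally leaving (i, j)."""
    gain_first = max(row[j] for row in payoff) - payoff[i][j]
    # In a symmetric game the second player's payoff at (i, j) is payoff[j][i].
    gain_second = max(row[i] for row in payoff) - payoff[j][i]
    return max(gain_first, gain_second)

for i, name in enumerate(STRATEGIES):
    print(name, "-> best response:", STRATEGIES[best_response(PAYOFF, i)],
          "| regret of symmetric profile:", round(regret(PAYOFF, i, i), 3))
```

A profile with zero regret is an equilibrium of the empirical game; edges from each profile to its best responses give the best-response graph.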

3. Autonomous Market Intelligence: Agentic AI Nowcasting Predicts Stock Returns — Zefeng Chen, Darcy Pu

Abstract summary: Deploys a state-of-the-art LLM to evaluate Russell 1000 stocks daily in a fully agentic manner: the model autonomously searches the web, filters sources, and synthesizes information into quantitative predictions with zero human curation. The framework is out-of-sample by construction (predictions have been collected in real time since April 2025). A long position in the top 20 stocks generates 18.4 basis points of daily alpha (Fama-French 5-factor + momentum) with an annualized Sharpe ratio of 2.43. However, predictability is asymmetric: only top winners are identifiable, while bottom-ranked stocks are indistinguishable from the market. The authors hypothesize this reflects the structure of online information, where positive news is coherent but negative news is contaminated by corporate obfuscation.
Relevance to agentic commerce: This is one of the strongest empirical demonstrations that fully autonomous AI agents can generate genuine economic value from real-time information processing. The finding that only the long side works has implications for how agent marketplaces should price information services (via x402 or Skyfire) — bullish signals are worth paying for, bearish ones aren't.
Link: https://arxiv.org/abs/2601.11958
Published: 2026-01-17 | Categories: q-fin.GN, q-fin.PM, q-fin.TR
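
To put the daily figures above on an annual scale, here is a rough conversion. It assumes 252 trading days per year and, for the Sharpe scaling, independent daily returns; neither assumption comes from the paper, and the function names are illustrative.

```python
import math

TRADING_DAYS = 252  # assumption: conventional trading-day count, not from the paper

def annualize_simple(daily_bps):
    """Annual return as a fraction, summing daily alpha without compounding."""
    return daily_bps / 10_000 * TRADING_DAYS

def annualize_compounded(daily_bps):
    """Annual return as a fraction, compounding the daily alpha."""
    return (1 + daily_bps / 10_000) ** TRADING_DAYS - 1

def daily_sharpe(annual_sharpe):
    """Daily Sharpe implied by an annualized one under sqrt-of-time scaling."""
    return annual_sharpe / math.sqrt(TRADING_DAYS)

print(f"simple:     {annualize_simple(18.4):.1%}")
print(f"compounded: {annualize_compounded(18.4):.1%}")
print(f"daily Sharpe implied by 2.43 annualized: {daily_sharpe(2.43):.3f}")
```

These are back-of-envelope figures before transaction costs, which the summary above does not quantify.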

4. Towards a Science of AI Agent Reliability — Stephan Rabanser, Sayash Kapoor, Peter Kirgis, Kangheng Liu, Saiteja Utpala, Arvind Narayanan (Princeton)

Abstract summary: Argues that compressing agent behavior into a single success metric obscures critical operational flaws. Proposes twelve concrete metrics decomposing agent reliability along four dimensions: consistency (do agents behave the same across runs?), robustness (do they withstand perturbations?), predictability (do they fail in expected ways?), and safety (are error severities bounded?). Evaluates 14 agentic models across two benchmarks and finds that recent capability gains have yielded only small improvements in reliability — a sobering finding. Grounded in safety-critical engineering principles.
Relevance to agentic commerce: This is the academic foundation for what companies like AgentProof, Sapiom (KYA), and XKOVA are trying to build commercially. If an agent is making financial transactions on your behalf, you need reliability guarantees beyond "it usually works." These 12 metrics could become the basis for agent certification standards. Arvind Narayanan (Princeton) is a heavy hitter — this will get attention.
Link: https://arxiv.org/abs/2602.16666
Published: 2026-02-18 | Categories: cs.AI, cs.CY, cs.LG
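
A toy example shows why a single success number can hide the consistency dimension described above. The two "agents" and their run logs are invented, and the agreement score is a simple stand-in, not one of the paper's twelve metrics.

```python
def mean_success(runs):
    """Overall success rate across all tasks and repeated runs."""
    outcomes = [ok for task_runs in runs.values() for ok in task_runs]
    return sum(outcomes) / len(outcomes)

def run_agreement(runs):
    """Fraction of tasks on which every repeated run gave the same outcome."""
    agreeing = sum(1 for task_runs in runs.values() if len(set(task_runs)) == 1)
    return agreeing / len(runs)

# Hypothetical logs: task name -> outcomes of four repeated runs.
agent_a = {"task_1": [True, True, True, True], "task_2": [False, False, False, False]}
agent_b = {"task_1": [True, False, True, False], "task_2": [False, True, False, True]}

for name, runs in [("agent_a", agent_a), ("agent_b", agent_b)]:
    print(name, "success:", mean_success(runs), "agreement:", run_agreement(runs))
```

Both agents score 50% on success, yet one is fully repeatable and the other never gives the same answer twice on any task.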

5. SPILLage: Agentic Oversharing on the Web — Jaechul Roh, Eugene Bagdasarian, Hamed Haddadi, Ali Shahin Shamsabadi

Abstract summary: Formalizes "Natural Agentic Oversharing" — the unintentional disclosure of task-irrelevant user information through web agent action traces. Introduces a taxonomy along two dimensions: channel (content vs. behavior) and directness (explicit vs. implicit). Benchmarks 180 tasks on live e-commerce sites with 1,080 runs across two frameworks and three LLMs. Key finding: behavioral oversharing dominates content oversharing by 5x — agents leak information through clicks, scrolls, and navigation patterns, not just text. Prompt-level mitigation doesn't help and can worsen it. However, removing task-irrelevant information before execution improves task success by 17.9%.
Relevance to agentic commerce: Critical privacy risk for any agent-mediated shopping or transaction system. If an OpenClaw agent browses Amazon on your behalf, its behavioral trace reveals preferences, price sensitivity, and browsing patterns to the platform, even without typing anything sensitive. This validates the need for privacy-preserving agent execution environments, which zkML (Jolt Atlas above) and XKOVA's scoped-permissions approach aim to address.
Link: https://arxiv.org/abs/2602.13516
Published: 2026-02-13 | Categories: cs.AI

6. Who Restores the Peg? A Mean-Field Game Approach to Model Stablecoin Market Dynamics — Hardhik Mohanty, Bhaskar Krishnamachari (USC)

Abstract summary: Develops an agent-based mean-field game framework for fiat-collateralized stablecoins (USDC, USDT — combined $300B+ market cap). Models arbitrageurs and retail traders interacting across primary (mint/redeem) and secondary (exchange) markets during de-peg episodes. Calibrated to three historical de-peg events, the model reproduces observed recovery half-lives and provides an order flow decomposition showing primary-market arbitrage predominantly stabilizes the system. Identifies a non-linear breakdown threshold in primary-rail frictions beyond which secondary-market liquidity becomes insufficient. First paper to formally answer "who restores the peg?"
Relevance to agentic commerce: Stablecoins are the settlement layer for agentic transactions (x402 uses USDC, lobster.cash uses USDC on Solana, and Bridge/Stripe has its new OCC bank). Understanding de-peg dynamics is essential for agents holding stablecoin balances: an agent with $1,000 of USDC in a wallet during a de-peg event needs to know whether to hold or flee. This framework could inform automated risk management for agent wallets.
Link: https://arxiv.org/abs/2601.18991
Published: 2026-01-26 | Categories: q-fin.TR, cs.GT, econ.GN
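
For readers unfamiliar with "recovery half-life", a minimal sketch follows. It assumes the peg deviation decays exponentially, d(t) = d(0) * exp(-kappa * t), which is a simplification chosen for illustration rather than the paper's mean-field model; the observations are invented.

```python
import math

def decay_rate(dev_start, dev_end, hours_elapsed):
    """Rate kappa (per hour) implied by two peg-deviation observations."""
    return math.log(dev_start / dev_end) / hours_elapsed

def half_life(kappa):
    """Hours for the deviation to halve under exponential decay."""
    return math.log(2) / kappa

# Hypothetical episode: 10 cents off peg at t=0, 2.5 cents off peg 48 hours later.
kappa = decay_rate(0.10, 0.025, 48.0)
print(f"kappa = {kappa:.4f} per hour, half-life = {half_life(kappa):.1f} hours")
```

An agent wallet could compare an estimate like this against its own holding horizon when deciding whether to wait out a de-peg.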

7. Resisting Manipulative Bots in Meme Coin Copy Trading: A Multi-Agent Approach with Chain-of-Thought Reasoning — Yichen Luo, Yebo Feng, Jiahua Xu, Yang Liu

Abstract summary: Identifies that copy trading in meme coin markets creates an exploitable attack surface where adversaries deploy bots to front-run, conceal positions, and fabricate sentiment. Proposes a multi-agent defense system powered by multi-modal LLMs and chain-of-thought reasoning. The system outperforms zero-shot and most statistical baselines in prediction accuracy, achieving 3% average copier return per meme coin investment under realistic frictions. Published at ACM Web Conference 2026 (WWW'26).
Relevance to agentic commerce: This is agent-vs-agent adversarial commerce in the wild. As AI agents increasingly trade crypto autonomously, they'll face manipulation from other agents. The multi-agent defense approach directly parallels the trust/verification challenges that ERC-8004 and AgentProof aim to solve. Also relevant for any agent-mediated DeFi interaction.
Link: https://arxiv.org/abs/2601.08641
Published: 2026-01-13 | Categories: cs.AI, q-fin.TR

📄 Notable Papers

8. Governing AI Forgetting: Auditing for Machine Unlearning Compliance — Qinqi Lin, Ningning Ding, Lingjie Duan, Jianwei Huang

Abstract summary: Introduces the first economic framework for auditing machine unlearning (right-to-be-forgotten) compliance. Uses certified unlearning theory with a game-theoretic model of auditor-operator interaction. Key insight: the auditor can optimally reduce inspection intensity as deletion requests increase, because the operator's weakened unlearning makes non-compliance easier to detect — consistent with China's actual reduction in auditing despite growing deletion requests. Also proves that undisclosed auditing paradoxically reduces cost-effectiveness relative to disclosed auditing.
Relevance to agentic commerce: As agents accumulate user data for personalized commerce, the right-to-be-forgotten becomes critical. How do you audit whether an agent's model has truly unlearned your shopping preferences? This game-theoretic framework is directly applicable to agent data governance.
Link: https://arxiv.org/abs/2602.14553
Published: 2026-02-16 | Categories: cs.LG, cs.AI, cs.GT

9. FactorMiner: A Self-Evolving Agent for Financial Alpha Discovery — Yanlong Wang, Jian Xu, Hongkang Zhang, Shao-Lun Huang, Danny Dongning Sun, Xiao-Ping Zhang

Abstract summary: Proposes a lightweight self-evolving agent framework for formulaic alpha factor mining in quantitative investment. Combines a Modular Skill Architecture (financial evaluation tools) with structured Experience Memory that distills historical mining trials into actionable insights. Implements the "Ralph Loop" — retrieve, generate, evaluate, distill — to iteratively improve while maintaining low redundancy as the factor library scales. Demonstrated across multiple datasets, assets, and markets.
Relevance to agentic commerce: Demonstrates autonomous AI agents creating genuine financial value through iterative self-improvement — a pattern that will extend to other agentic commerce domains (pricing optimization, inventory management, deal sourcing). The "experience memory" architecture parallels OpenClaw's own memory/learning patterns.
Link: https://arxiv.org/abs/2602.14670
Published: 2026-02-16 | Categories: q-fin.TR, cs.MA

10. Trading in CEXs and DEXs with Priority Fees and Stochastic Delays — Philippe Bergault, Yadh Hafsi, Leandro Sánchez-Betancourt

Abstract summary: Develops a novel control framework combining continuous controls with impulse interventions under stochastic execution delays for optimal CEX-DEX trading. The key innovation: traders control the distribution of execution delay through priority fee paid, creating a fundamental trade-off between delays, uncertainty, and costs. Establishes the dynamic programming principle for this new class of impulse control problems and proves the optimal priority fee significantly outperforms non-strategic fee selection.
Relevance to agentic commerce: Directly relevant to autonomous trading agents operating across centralized and decentralized exchanges. As agents execute DeFi transactions (e.g., via x402 or Coinbase Agentic Wallets), optimal priority fee selection becomes a competitive advantage. This is the mathematical foundation for agent transaction cost optimization on-chain.
Link: https://arxiv.org/abs/2602.10798
Published: 2026-02-11 | Categories: q-fin.TR, math.OC
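
The fee-versus-delay trade-off can be seen in a deliberately simple static toy model. It is not the paper's impulse-control framework: it assumes the execution delay is exponential with rate base_rate + fee_sensitivity * fee and that waiting costs delay_cost per unit of time. All parameter names and values are hypothetical.

```python
import math

def expected_cost(fee, delay_cost, base_rate, fee_sensitivity):
    """Fee paid plus expected waiting cost when mean delay is 1 / rate."""
    rate = base_rate + fee_sensitivity * fee
    return fee + delay_cost / rate

def optimal_fee(delay_cost, base_rate, fee_sensitivity):
    """Fee minimizing expected_cost, from the first-order condition, floored at zero."""
    unconstrained = (math.sqrt(delay_cost * fee_sensitivity) - base_rate) / fee_sensitivity
    return max(0.0, unconstrained)

fee = optimal_fee(delay_cost=4.0, base_rate=1.0, fee_sensitivity=1.0)
print("optimal fee:", fee)
print("cost at optimum:", expected_cost(fee, 4.0, 1.0, 1.0))
print("cost with zero fee:", expected_cost(0.0, 4.0, 1.0, 1.0))
```

Even in this toy setting, paying nothing and paying too much are both worse than the interior optimum, which is the intuition behind treating the fee as a control variable.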

11. Experimentation, Biased Learning, and Conjectural Variations in Competitive Dynamic Pricing — (authors not retrieved)

Abstract summary: Studies competitive dynamic pricing among multiple sellers using simple learning rules and two-point A/B experiments (switchback-style), motivated by the rise of algorithmic pricing in retail and online marketplaces. Sellers observe only their own prices and realized demand, even though demand depends on all sellers' prices. Formalizes how biased learning from limited observations leads to conjectural variations and potentially supra-competitive pricing.
Relevance to agentic commerce: Another piece of the algorithmic collusion puzzle. As AI agents run pricing experiments in real marketplaces, even "simple" learning rules can lead to supra-competitive pricing through biased learning — no explicit coordination needed. Regulators monitoring agent marketplaces need to understand this dynamic.
Link: https://arxiv.org/abs/2602.12888
Published: 2026-02-13 | Categories: (not retrieved)

12. LemonadeBench: Evaluating the Economic Intuition of Large Language Models in Simple Markets — Aidan Vyas

Abstract summary: A minimal benchmark evaluating LLMs' economic decision-making through a simulated lemonade stand business over 30 days: inventory management with expiring goods, pricing, operating hours, profit maximization. Frontier models capture 70% of theoretical optimal (>10x improvement over basic models). But decomposition across six business efficiency dimensions reveals local rather than global optimization — models excel in some areas while having surprising blind spots.
Relevance to agentic commerce: Directly tests whether LLM agents can run a business — the micro-version of what agentic commerce envisions. The finding that models achieve local but not global optimization is a cautionary note: agents may optimize individual transactions while missing portfolio-level strategy. Relevant for agent marketplace design (should agents specialize or generalize?).
Link: https://arxiv.org/abs/2602.13209
Published: 2026-01-14 | Categories: q-fin.GN, cs.AI

13. Behavioral Consistency Validation for LLM Agents: Trading-Style Switching through Stock-Market Simulation — Zeping Li et al. (includes Philip Torr, Oxford)

Abstract summary: Tests whether LLM agents' behaviors align with real market participants by assessing strategy switching between fundamental and technical trading styles. Operationalizes four behavioral-finance drivers — loss aversion, herding, wealth differentiation, price misalignment — as personality traits via prompting. Year-long simulations with daily price-volume data and Mann-Whitney U tests. Finding: LLM switching behavior is only partially consistent with behavioral finance theory, highlighting the need for further refinement.
Relevance to agentic commerce: If agents are managing portfolios or making financial decisions, their behavioral consistency matters for trust and regulation. Partial alignment with human behavioral finance suggests agents may behave unpredictably under market stress — a concern for autonomous trading systems.
Link: https://arxiv.org/abs/2602.07023
Published: 2026-02-02 | Categories: q-fin.TR, cs.AI

14. Manipulation in Prediction Markets: An Agent-based Modeling Experiment — Bridget Smart, Ebba Mark, Anne Bastian, Josefina Waugh

Abstract summary: Uses agent-based simulations to study how high-budget "whale" agents can distort prediction market prices. The model includes agents with heterogeneous expertise, noisy private information, variable learning rates and budgets. Finds that whales can temporarily shift prices proportionally to their market capital share, with distortion duration increasing when non-whale bettors exhibit herding behavior and slow learning. The model shows self-regulatory price discovery under normal conditions but vulnerability to well-resourced manipulation.
Relevance to agentic commerce: Prediction markets (Polymarket, etc.) are increasingly used for price discovery and AI agent decision-making. Understanding whale manipulation dynamics is critical if agents use prediction market signals for commerce decisions. Also relevant to any agent marketplace where one agent has disproportionate resources.
Link: https://arxiv.org/abs/2601.20452
Published: 2026-01-28 | Categories: econ.GN, physics.soc-ph, q-fin.TR

15. Second Thoughts: How 1-second Subslots Transform CEX-DEX Arbitrage on Ethereum — Adadurov, Barseghyan, Chtepine, Eloranta, Sebyakin, Valitov

Abstract summary: Models how reducing Ethereum slot times from 12 seconds to 1-second subslots would affect DEX activity. Calibrated to Binance/Uniswap v3 data (July–Sept 2025). Faster slots would increase arbitrage transaction count by 535% and trading volume by 203% on average, driven by reduced variance in trade outcomes making CEX-DEX arbitrage more appealing from a risk-adjusted return perspective.
Relevance to agentic commerce: Ethereum infrastructure improvements directly affect agent transaction costs and execution quality. If agents are making payments or executing trades on Ethereum (ERC-8004, Base network), faster slot times could dramatically improve their execution environment while increasing competitive pressure from arbitrage bots.
Link: https://arxiv.org/abs/2601.00738
Published: 2026-01-02 | Categories: q-fin.TR, q-fin.CP
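
Percentage increases of this size are easy to misread as multipliers. A small helper makes the conversion explicit; the function name is illustrative.

```python
def multiplier_from_pct_increase(pct_increase):
    """Convert 'increases by X%' into a multiple of the baseline level."""
    return 1.0 + pct_increase / 100.0

print("arbitrage transaction count:", multiplier_from_pct_increase(535), "x baseline")
print("trading volume:", multiplier_from_pct_increase(203), "x baseline")
```

So a 535% increase means roughly 6.35 times the baseline count, not 5.35 times.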

16. From Labor to Collaboration: A Methodological Experiment Using AI Agents to Augment Research in Taiwan's Humanities and Social Sciences — Yi-Chih Huang

Abstract summary: Proposes an AI agent-based collaborative research workflow for humanities/social science research, validated using Taiwan's Claude.ai usage data (N=7,729 conversations, November 2025) from the Anthropic Economic Index (AEI). The seven-stage modular workflow is grounded in task modularization, human-AI division of labor, and verifiability. Identifies three operational modes: direct execution, iterative refinement, and human-led. Finds human judgment irreplaceable for research question formulation, theoretical interpretation, and ethical reflection.
Relevance to agentic commerce: Uses real Anthropic Economic Index data as its empirical vehicle, providing ground truth on how humans and AI agents actually collaborate. The finding that human judgment remains irreplaceable for certain tasks reinforces the "human-in-the-loop" design pattern for agentic commerce (e.g., agent proposes transaction, human approves).
Link: https://arxiv.org/abs/2602.17221
Published: 2026-02-19 | Categories: cs.AI, cs.CL, cs.CY

📊 Working Papers & Reports

NBER Working Papers

Firm Data on AI — Ivan Yotzov, Jose Maria Barrero, Nicholas Bloom, Philip Bunn, Steven J. Davis, et al. (NBER w34836)

Abstract summary: First representative international data on firm-level AI use, surveying ~6,000 CFOs/CEOs/executives across US, UK, Germany, and Australia. Key findings: (1) ~70% of firms actively use AI, especially younger, more productive firms; (2) top executives average only 1.5 hours/week of AI use, with 25% reporting zero; (3) 80%+ report no impact on employment or productivity over the past 3 years; (4) firms predict 1.4% productivity boost, 0.8% output increase, and 0.7% employment cut over next 3 years; (5) individual employees predict 0.5% employment increase — a significant gap with executives who predict job cuts.
Relevance to agentic commerce: This is the macro-level reality check: despite the hype, AI adoption remains shallow at the firm level. The executive-employee expectation gap on employment effects will shape policy debates around agentic commerce deployment. The finding that 70% of firms "use AI" but 80% see no productivity impact suggests the real agentic revolution hasn't started yet — it's still chatbot-level adoption. Nicholas Bloom (Stanford) is one of the top productivity economists.
Link: https://www.nber.org/papers/w34836

GPT as a Measurement Tool — Hemanth Asirvatham, Elliott Mokski, Andrei Shleifer (Harvard) (NBER w34834)

Abstract summary: Presents the GABRIEL software package using GPT to quantify attributes in qualitative data. Evaluated against 1000+ human-annotated tasks, finding GPT is accurate across domains and generally indistinguishable from human evaluators. Results don't depend on exact prompting strategy and aren't driven by training data contamination. Applied to: Congressional remarks trends, social media toxicity, county-level school curricula. Major application: assembles a novel dataset of 37,000 technologies showing a tenfold decline in invention-to-adoption time lags over the industrial age, from ~50 years to ~5 years today. Also documents increasing dominance of companies and the US in innovation.
Relevance to agentic commerce: Shleifer (Harvard, top 5 most-cited economists alive) validates LLMs as reliable measurement instruments for economic research. The technology adoption acceleration finding (50 years → 5 years) directly supports the thesis that agentic commerce infrastructure could achieve mainstream adoption faster than any prior technology wave. The GABRIEL tool itself could be used by agents for automated market research.
Link: https://www.nber.org/papers/w34834

Non-Fungible Tokens as Investment — William N. Goetzmann, Dong Huang, Milad Nozari (Yale) (NBER w34837)

Abstract summary: Provides a definitive post-mortem on NFT investing during the bubble. Findings: returns were exceptionally right-skewed, illiquidity pervaded even the most active platforms, and a handful of trades drove aggregate performance. Successful investing required "an almost perfect confluence of timing, liquidity, and luck." Investors extrapolating from realized returns without recognizing selection bias and survivorship faced substantial disappointment risk.
Relevance to agentic commerce: Cautionary tale for digital asset markets. As tokenized agent services, agent-issued NFTs, or on-chain agent reputation tokens emerge, the same bubble dynamics could apply. Goetzmann (Yale, finance legend) is essentially saying these markets are structurally inefficient — which is precisely where agents with better information processing could add value.
Link: https://www.nber.org/papers/w34837

Seeing the Goal, Missing the Truth: Human Accountability for AI Bias — Sean Cao, Wei Jiang, Hui Xu (NBER-adjacent, q-fin.GN)

Abstract summary: Demonstrates that revealing the downstream use of LLM outputs (e.g., predicting stock returns) causes "purpose leakage" — the LLM generates biased intermediate measures that shift toward the disclosed objective. Goal-aware prompting improves performance before the knowledge cutoff but offers no advantage post-cutoff. Concludes this is a human accountability issue in research design, not an algorithmic flaw.
Relevance to agentic commerce: If agents know what they're optimizing for, they'll bias their information processing toward that goal — even when intermediate steps should be goal-agnostic. Implications for agent marketplace design: agents should perhaps be given narrow task descriptions to avoid purpose leakage in multi-step commerce workflows.
Link: https://arxiv.org/abs/2602.09504

🏛️ Institutions & Labs to Watch

Princeton (Arvind Narayanan's group): Producing the most rigorous work on AI agent reliability and safety. The 12-metric reliability framework (paper #4 above) will be widely cited.
Rutgers (ChaiLab): Leading on algorithmic collusion research. AAMAS 2026 paper (#2) with open-source code.
USC (Bhaskar Krishnamachari): Agent-based modeling of DeFi/stablecoin dynamics. Strong intersection of CS and finance.
Stanford (Nicholas Bloom / SIEPR): First-mover on firm-level AI adoption data. Will set the baseline for measuring agentic commerce diffusion.
Harvard (Andrei Shleifer): Validating LLMs as economic measurement tools. GABRIEL package will accelerate AI-powered research.
Yale (William Goetzmann): Post-mortem analysis on digital asset markets with implications for tokenized agent economies.

📝 Scan Notes

Source Availability

arXiv: ✅ All four queries returned results. Good coverage of the last 48-72 hours for queries A-C; query D (q-fin) returned papers from a wider window (roughly the last 7 weeks) due to lower submission volume in quantitative finance.
NBER: ✅ RSS feed returned current batch (w34821–w34843+). Three papers directly relevant to AI/economics (w34834, w34836, w34837).
SSRN: ❌ Blocked by Cloudflare (403). Need browser automation for future scans.
Semantic Scholar: ❌ Rate limited (429) on both queries. Consider applying for an API key for consistent access.
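
The two failure modes above call for different handling: a 429 is worth retrying after a pause, while a 403 block is not. A minimal sketch of that logic, written against a generic fetch callable rather than any particular HTTP library; the function and exception names are hypothetical, and this is not the scan's actual code.

```python
import time

class Blocked(Exception):
    """Raised when the source refuses access outright (HTTP 403)."""

def fetch_with_backoff(fetch, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds, backing off exponentially on HTTP 429.

    `fetch` is any zero-argument callable returning (status_code, body).
    """
    for attempt in range(max_attempts):
        status, body = fetch()
        if status == 200:
            return body
        if status == 403:
            # Waiting does not lift a block; fail fast so a fallback can run.
            raise Blocked("access denied (403)")
        if status == 429 and attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ...
            continue
        break
    raise RuntimeError(f"giving up after {attempt + 1} attempts (last status {status})")

# Usage with a stand-in fetcher that is rate limited twice, then succeeds.
responses = iter([(429, ""), (429, ""), (200, "ok")])
print(fetch_with_backoff(lambda: next(responses), sleep=lambda seconds: None))
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.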

Quality Assessment

Exceptional day for agentic commerce papers. The Jolt Atlas zkML paper (#1) explicitly frames itself around agentic commerce guardrails — this is new academic infrastructure being built. The Princeton agent reliability paper (#4) and the SPILLage privacy paper (#5) together define the trust/safety research agenda.
Strong q-fin cluster. The CEX-DEX trading papers (#10, #15) and stablecoin modeling (#6) are building the mathematical foundations for autonomous DeFi agents.
NBER batch is unusually relevant — Bloom's firm-level AI data and Shleifer's GPT measurement tool are both landmark papers.

Suggestions for Next Scan

Apply for Semantic Scholar API key to avoid rate limiting
Add browser-based SSRN scraping as fallback
Consider adding Google Scholar alerts for key authors (Narayanan, Bloom, Krishnamachari)
Track AAMAS 2026 proceedings (May 25-29, Paphos, Cyprus); paper #2 in today's scan was accepted there