Why I Write This Blog
LLM-based agents are going to change the world. Remarkable agent systems have already been built that change how we live. I was once on a team that aimed to build advanced agents for controlling digital devices, and the experience left a strong impression on me, so I want to keep collecting information about LLM agents and sharing my thoughts here.
Resources
Agentic RL
- survey
- papers
- LLM
- rStar2-Agent: Agentic Reasoning Technical Report
- (GiGPO) Group-in-Group Policy Optimization for LLM Agent Training
- (ARPO) Agentic Reinforced Policy Optimization
- (AEPO) Agentic Entropy-Balanced Policy Optimization
- Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs
- VLM
- multi-agent
- advantage computation
- MARL credit assignment
- experience replay
- Efficient RL Training for LLMs with Experience Replay
- Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning
- AgentHER: Hindsight Experience Replay for LLM Agent Trajectory
- ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay
- context management
- self evolution
- memory
- meta-RL
- test-time scaling
- online learning
- emergence
- LLM
- blogs
GUI Agent
- survey
- models
- UI-TARS: Pioneering Automated GUI Interaction with Native Agents
- AutoGLM
- ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents
- MobileRL: Advancing Mobile Use Agents With Adaptive Online Reinforcement Learning
- AndroidGen: Building an Android Language Agent under Data Scarcity
- AutoGLM: Autonomous Foundation Agents for GUIs
- WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
- AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents
- DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning
- SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
- AppAgent: Multimodal Agents as Smartphone Users
- (SeeAct) GPT-4V(ision) is a Generalist Web Agent, if Grounded
- Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V
- benchmarks
- web
- android
DeepResearch
- survey
- models
- Search-o1: Agentic Search-Enhanced Large Reasoning Models
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
- R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
- (Jina) node-DeepResearch
- Kimi-Researcher: End-to-End RL Training for Emerging Agentic Capabilities
- Language Modeling by Language Models