Why I Write This Blog
LLM-based agents are going to change the world. Amazing agent systems have already been built that change our lives. I was once on a team that aimed to build advanced agents for controlling digital devices, and that work impressed me, so I want to keep collecting information about LLM agents and share my thoughts here.
Resources
GUI Agents
- survey
- models
- autoglm
- ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents
- MobileRL: Advancing Mobile Use Agents With Adaptive Online Reinforcement Learning
- AutoGLM: Autonomous Foundation Agents for GUIs
- WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
- AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents
- others
- DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning
- SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
- AppAgent: Multimodal Agents as Smartphone Users
- (SeeAct) GPT-4V(ision) is a Generalist Web Agent, if Grounded
- Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V
- autoglm
- benchmarks
- web
- android
DeepResearch
- survey
- models
- Search-o1: Agentic Search-Enhanced Large Reasoning Models
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
- R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
- (Jina) node-DeepResearch
- Kimi-Researcher: End-to-End RL Training for Emerging Agentic Capabilities
- Language Modeling by Language Models