Thinking and Reasoning
The Purpose of This Blog

Thinking models are wildly popular nowadays. I first delved into this area in September 2023, then gradually drifted away from it until DeepSeek came along. I want to keep collecting information about LLM reasoning and share my thoughts here.

Thinking Models

- text-based explicit reasoning
  - DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
  - Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning
  - Kimi k1.5: Scaling Reinforcement Learning with LLMs
  - GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
  - Skywork Open Reasoner 1 Technical Report
- implicit reasoning
  - (Coconut) Training Large Language Models to Reason in a Continuous Latent Space
- others
  - ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline
- blogs
  - 自顶向下方式深度解读 DeepSeek-R1，内含大量细节 (a top-down, detail-rich deep dive into DeepSeek-R1)
  - MLA(1)：从代码角度学习和彻底理解 DeepSeek MLA 算法 (learning and thoroughly understanding the DeepSeek MLA algorithm from the code)
  - 从头理解思考模型（LLM based Reasoning Model），O1，DeepSeek R1，Kimi K1.5 (understanding thinking models from scratch: O1, DeepSeek R1, Kimi K1.5)

overthinking ...
LLM Agents
The Purpose of This Blog

LLM-based agents are going to change the world. Amazing agent systems have already been built that are changing our lives. I was once on a team that aimed to build advanced agents for controlling digital devices, and that experience left a deep impression on me, so I want to keep collecting information about LLM agents and share my thoughts here.

Resources

GUI Agents
- survey
  - Large Language Model-Brained GUI Agents: A Survey
  - GUI Agent 综述：揭秘 GUI 智能体的前世今生-1：总览篇-启程 (a GUI agent survey: the past and present of GUI agents, part 1: overview)
- models
  - autoglm
    - ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents
    - MobileRL: Advancing Mobile Use Agents With Adaptive Online Reinforcement Learning
    - AutoGLM: Autonomous Foundation Agents for GUIs
    - WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
    - AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents
  - others
    - DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning
    - SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
    - AppAgent: Multimodal Agents as Smartphone Users
    - (SeeAct) GPT-4V(ision) is a Generalist Web Agent, if Grounded
    - Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V
- benchmarks
  - web
    - WebArena: A Realistic Web Environment for Building Autonomous Agents
    - Mind2Web: Towards a Generalist Agent for the Web
    - OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
    - (MiniWoB) World of Bits: An Open-Domain Platform for Web-Based Agents
  - android
    - Android in the Wild: A Large-Scale Dataset for Android Device Control
    - (AndroidArena) Understanding the Weakness of Large Language Model Agents within a Complex Android Environment

DeepResearch
- survey
  - Deep Research Agents: A Systematic Examination And Roadmap
  - Towards AI Search Paradigm
- models
  - Search-o1: Agentic Search-Enhanced Large Reasoning Models
  - Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
  - R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
- repo
  - (Jina) node-DeepResearch
- Kimi-Researcher: End-to-End RL Training for Emerging Agentic Capabilities
- Language Modeling by Language Models

Agentic RL
- Reasoning LLM（四）：Agentic RL (Reasoning LLM, part 4: agentic RL)
- Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning
Interpretability for Multimodal Models
💡 This post initially focused on interpretability for multimodal models; later, many papers from other fields were included as well, just for convenience.

Resources

Interpretability for MLLMs
- survey
  - A Survey on Mechanistic Interpretability for Multi-Modal Foundation Models
  - Sparks of Explainability: Recent Advancements in Explaining Large Vision Models
  - Awesome LMMs Mechanistic Interpretability
- probing
  - Probing Multimodal Large Language Models for Global and Local Semantic Representations
- representation
  - Zoom In: An Introduction to Circuits
  - Multimodal Neurons in Artificial Neural Networks
  - Interpreting CLIP's Image Representation via Text-Based Decomposition
  - Interpreting the Second-Order Effects of Neurons in CLIP
  - CLIP 不同层 (the different layers of CLIP)
  - Multimodal Neurons in Pretrained Text-Only Transformers
- circuit
  - **(causal tracing) Understanding Information Storage and Transfer in Multi-modal Large Language Models
  - Automatic Discovery of Visual Circuits
  - Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP
- SAE
  - Case Study: Interpreting, Manipulating, and Controlling CLIP with Sparse Autoencoders
  - Towards Multimodal Interpretability: Learning Sparse Interpretable Features in Vision Transformers
  - Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery
- visualization
  - Visualizer！简化你的 Vision Transformer 可视化！(Visualizer! Simplify your Vision Transformer visualizations!)
  - (DVT) Denoising Vision Transformers
  - Token Activation Map to Visually Explain Multimodal LLMs
  - LVLM-Interpret: An Interpretability Tool for Large Vision Language Models
  - Transformer Interpretability Beyond Attention Visualization
- others
  - **Towards Interpreting Visual Information Processing in Vision-Language Models (demo)
  - (logit lens; see the sketch after this section)
    - Laying the Foundations for Vision and Multimodal Mechanistic Interpretability & Open Problems
    - Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space
- information flow
  - **Cross-modal Information Flow in Multimodal Large Language Models
  - *From Redundancy to Relevance: Information Flow in LVLMs Across Reasoning Tasks
  - *What's in the Image? A Deep-Dive into the Vision of Vision Language Models
  - The Narrow Gate: Localized Image-Text Communication in Vision-Language Models
  - Performance Gap in Entity Knowledge Extraction Across Modalities in Vision Language Models
  - Lifting the Veil on Visual Information Flow in MLLMs: Unlocking Pathways to Faster Inference
- analyses on MLLMs
  - Words or Vision: Do Vision-Language Models Have Blind Faith in Text?
  - Forgotten Polygons: Multimodal Large Language Models are Shape-Blind
  - Vision Transformers Need Registers
  - On the Rankability of Visual Embeddings

Other Fields of MLLMs
- visual pretraining ...
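Since the logit lens keeps coming up above, here is a minimal sketch of the idea, assuming TransformerLens and GPT-2 small (the prompt is an arbitrary example of mine): project the residual stream after each layer through the final LayerNorm and the unembedding, and read off what the model "currently" predicts at that depth.

```python
# Minimal logit-lens sketch, assuming TransformerLens and GPT-2 small.
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
tokens = model.to_tokens("The Eiffel Tower is in the city of")
_, cache = model.run_with_cache(tokens)

for layer in range(model.cfg.n_layers):
    # Residual stream after this layer, at the final position only.
    resid = cache["resid_post", layer][:, -1, :]
    # Apply the final LayerNorm, then unembed into vocabulary space.
    logits = model.ln_final(resid) @ model.W_U + model.b_U
    print(f"layer {layer:2d}: {model.to_string(logits.argmax(dim=-1))!r}")
```

Plotting this layer by layer gives the kind of "when does the prediction form" picture that the logit-lens papers above build on.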
Circuit-tuning: A Mechanistic Approach for Identifying Parameter Redundancy and Fine-tuning Neural Networks
arXiv (old version): https://arxiv.org/pdf/2502.06106
一些语言学的梗和有意思的知识 (Some Linguistics Memes and Fun Facts)
This post is written in Chinese. If you don't know Chinese, you can learn it lol. (Sorry about that; simply translating the post into English probably wouldn't be enough to make it understandable.)

Linguistics fun

Pidgin (皮钦语). People tend to feel a certain distaste for those who 1. constantly sprinkle English words into their speech, and 2. show off / humble-brag (凡尔赛). For example, here is a famous scene from a dating show, a conversation between overseas students (留子): ...
Possible Research Areas in Mechanistic Interpretability
💡 This post mainly focuses on text models. For multi-modal models, please refer to this post.

The Purpose of This Blog

To get started in mech interp research, we need a macro understanding of the area, so I wrote this blog as a summary of the field to help you (and me) choose a research topic.

Circuit Discovery

Methods (a minimal activation-patching sketch follows this list)
- basic
  - activation patching (causal mediation / interchange interventions…)
  - path patching
- scaling techniques: attribution patching
- DAS (distributed alignment search)
- directional activation patching? 🔭

Resources
- inspiration
  - Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
  - (ROME) Locating and Editing Factual Associations in GPT
- Attribution Patching: Activation Patching at Industrial Scale
- (ACDC) Towards Automated Circuit Discovery for Mechanistic Interpretability
- Attribution Patching Outperforms Automated Circuit Discovery
- AtP*: An Efficient and Scalable Method for Localizing LLM Behaviour to Components
- Causal Scrubbing: A Method for Rigorously Testing Interpretability Hypotheses
- new
  - using SAEs
    - Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
    - Automatically Identifying Local and Global Circuits with Linear Computation Graphs
  - contextual decomposition
    - Mechanistic Interpretation through Contextual Decomposition in Transformers
  - edge pruning?
    - Finding Transformer Circuits with Edge Pruning
    - Functional Faithfulness in the Wild: Circuit Discovery with Differentiable Computation Graph Pruning

Evaluation
- lack of ground truth ...
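For readers new to activation patching, here is a minimal sketch, assuming TransformerLens and GPT-2 small; the clean/corrupted prompt pair and the patched layer/position are arbitrary illustrative choices of mine, not a recommended setup. The idea: run the model on a corrupted prompt, splice in one activation from the clean run, and measure how much of the clean behaviour that single activation restores.

```python
# Minimal activation-patching sketch, assuming TransformerLens and GPT-2 small.
from transformer_lens import HookedTransformer
from transformer_lens.utils import get_act_name

model = HookedTransformer.from_pretrained("gpt2")

# A clean/corrupted pair with identical token lengths (BOS is prepended at index 0).
clean = model.to_tokens("The capital of France is")
corrupt = model.to_tokens("The capital of Italy is")

_, clean_cache = model.run_with_cache(clean)

layer, pos = 6, 4  # arbitrary middle layer; position 4 is the country token

def patch_resid(resid, hook):
    # Overwrite the corrupted run's residual stream at one (layer, position)
    # with the corresponding activation from the clean run.
    resid[:, pos, :] = clean_cache[hook.name][:, pos, :]
    return resid

patched_logits = model.run_with_hooks(
    corrupt,
    fwd_hooks=[(get_act_name("resid_post", layer), patch_resid)],
)

# How much does this single patch restore the clean answer?
paris = model.to_single_token(" Paris")
print("corrupted logit(' Paris'):", model(corrupt)[0, -1, paris].item())
print("patched   logit(' Paris'):", patched_logits[0, -1, paris].item())
```

Path patching, attribution patching, and the automated methods above all refine this basic recipe; sweeping (layer, position) pairs and plotting the restored logit difference is the usual first experiment.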
Exploring Emotional Features in GPT2-Small
🎶 The code in this post can be found in the Jupyter notebook in my "saeExploration" repo.

Find features that reflect positive emotions

To find the features related to a specific emotion, I write five sentences containing that emotion's key words. For example, for happiness I have:

```python
prompt_happy = ["I'll be on a vacation tomorrow and I'm so happy.",
                "My mom brings home a new puppy and I'm so happy.",
                "I'm so glad I got the job I wanted.",
                "I feel so happy when I'm with my friends.",
                "I'm so happy I got the promotion I wanted."]
```

I choose to look for features that reflect happiness and sadness. Apart from that, I also wonder whether the feature that reflects excitement has something to do with the one that reflects happiness (the two are similar at least on the semantic level). A sketch of one way to search for such features follows below. ...
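Here is a minimal sketch of how such a feature search could look, assuming GPT-2 small via TransformerLens and a residual-stream SAE loaded through the SAELens library; the release/ID strings and the triple return of `SAE.from_pretrained` follow SAELens's docs but should be treated as assumptions, and `top_features` is a hypothetical helper:

```python
# Minimal feature-search sketch; the SAE release/ID below are assumptions.
import torch
from transformer_lens import HookedTransformer
from sae_lens import SAE

model = HookedTransformer.from_pretrained("gpt2")
sae, _, _ = SAE.from_pretrained(
    release="gpt2-small-res-jb",       # assumed pretrained release
    sae_id="blocks.8.hook_resid_pre",  # assumed hook point
)

def top_features(prompts, k=10):
    # Average every SAE feature's activation over all tokens of all prompts,
    # then return the k most active features.
    acts = []
    for p in prompts:
        _, cache = model.run_with_cache(model.to_tokens(p))
        resid = cache[sae.cfg.hook_name][:, 1:, :]  # drop the BOS position
        acts.append(sae.encode(resid).squeeze(0))   # [seq, d_sae]
    return torch.cat(acts, dim=0).mean(dim=0).topk(k)

print(top_features(prompt_happy))
```

To probe whether an "excitement" feature is related to a "happiness" one, a natural first check would be the cosine similarity between their decoder directions (the corresponding rows of `sae.W_dec`).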
A Brief Introduction to Mechanistic Interpretability Research
⚠️ Warnings

This post was written when I first delved into this area, and it hasn't been updated for a long time, so it may contain many errors. My attitude toward the area has since changed: the field is not well defined, and most research in it is of low quality and does not appeal to me. Besides, I think the study of interpretability should be put to practical use, though we can also study it for fun. I'm still interested in interpretability and its applications, and I'll write something new and interesting later ~

💡 This post is accompanied by another post, which contains the concrete content of this area. ...