Thinking and Reasoning
The Purpose of This Blog

Thinking models are wildly popular nowadays. I first delved into this area in September 2023, then gradually drifted away from it until DeepSeek came along. I want to keep collecting information about LLM reasoning and share my thoughts here.

Thinking Models

- text-based explicit reasoning
  - DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
  - Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning
  - Kimi k1.5: Scaling Reinforcement Learning with LLMs
  - GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
  - Skywork Open Reasoner 1 Technical Report
- implicit reasoning
  - (Coconut) Training Large Language Models to Reason in a Continuous Latent Space
- others
  - ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline
- blogs
  - 自顶向下方式深度解读 DeepSeek-R1，内含大量细节 (a top-down, detail-rich deep dive into DeepSeek-R1)
  - MLA(1)：从代码角度学习和彻底理解 DeepSeek MLA 算法 (learning and thoroughly understanding the DeepSeek MLA algorithm from the code)
  - 从头理解思考模型（LLM based Reasoning Model），O1，DeepSeek R1，Kimi K1.5 (understanding thinking models from scratch: O1, DeepSeek R1, Kimi K1.5)

overthinking ...
LLM Agents
The Purpose of This Blog

LLM-based agents are going to change the world. Amazing agent systems have already been built that are changing our lives. I was once on a team that aimed to build advanced agents for controlling digital devices, and that experience left a deep impression on me, so I want to keep collecting information about LLM agents and share my thoughts here.

Resources

GUI Agents
- survey
  - Large Language Model-Brained GUI Agents: A Survey
  - GUI Agent 综述：揭秘 GUI 智能体的前世今生-1：总览篇-启程 (a GUI agent survey: the past and present of GUI agents, part 1: overview)
- models
  - autoglm
    - ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents
    - MobileRL: Advancing Mobile Use Agents With Adaptive Online Reinforcement Learning
    - AutoGLM: Autonomous Foundation Agents for GUIs
    - WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
    - AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents
  - others
    - DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning
    - SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
    - AppAgent: Multimodal Agents as Smartphone Users
    - (SeeAct) GPT-4V(ision) is a Generalist Web Agent, if Grounded
    - Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V
- benchmarks
  - web
    - WebArena: A Realistic Web Environment for Building Autonomous Agents
    - Mind2Web: Towards a Generalist Agent for the Web
    - OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
    - (MiniWoB) World of Bits: An Open-Domain Platform for Web-Based Agents
  - android
    - Android in the Wild: A Large-Scale Dataset for Android Device Control
    - (AndroidArena) Understanding the Weakness of Large Language Model Agents within a Complex Android Environment

DeepResearch
- survey
  - Deep Research Agents: A Systematic Examination And Roadmap
  - Towards AI Search Paradigm
- models
  - Search-o1: Agentic Search-Enhanced Large Reasoning Models
  - Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
  - R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
- repo
  - (Jina) node-DeepResearch
- Kimi-Researcher: End-to-End RL Training for Emerging Agentic Capabilities
- Language Modeling by Language Models

Agentic RL
- Reasoning LLM（四）：Agentic RL (Reasoning LLM, part 4: agentic RL)
- Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning
Interpretability for Multimodal Models
💡 This post initially focused on interpretability for multimodal models; later, many papers from other fields were included as well, just for convenience.

Resources

Interpretability for MLLMs
- survey
  - A Survey on Mechanistic Interpretability for Multi-Modal Foundation Models
  - Sparks of Explainability: Recent Advancements in Explaining Large Vision Models
  - Awesome LMMs Mechanistic Interpretability
- probing
  - Probing Multimodal Large Language Models for Global and Local Semantic Representations
- representation
  - Zoom In: An Introduction to Circuits
  - Multimodal Neurons in Artificial Neural Networks
  - Interpreting CLIP's Image Representation via Text-Based Decomposition
  - Interpreting the Second-Order Effects of Neurons in CLIP
  - CLIP 不同层 (the different layers of CLIP)
  - Multimodal Neurons in Pretrained Text-Only Transformers
- circuit
  - **(causal tracing) Understanding Information Storage and Transfer in Multi-modal Large Language Models
  - Automatic Discovery of Visual Circuits
  - Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP
- SAE
  - Case Study: Interpreting, Manipulating, and Controlling CLIP with Sparse Autoencoders
  - Towards Multimodal Interpretability: Learning Sparse Interpretable Features in Vision Transformers
  - Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery
- visualization
  - Visualizer！简化你的 Vision Transformer 可视化！(Visualizer! Simplify your Vision Transformer visualizations!)
  - (DVT) Denoising Vision Transformers
  - Token Activation Map to Visually Explain Multimodal LLMs
  - LVLM-Interpret: An Interpretability Tool for Large Vision Language Models
  - Transformer Interpretability Beyond Attention Visualization
- others
  - **Towards Interpreting Visual Information Processing in Vision-Language Models (demo)
  - (logit lens; see the sketch after this section)
    - Laying the Foundations for Vision and Multimodal Mechanistic Interpretability & Open Problems
    - Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space
- information flow
  - **Cross-modal Information Flow in Multimodal Large Language Models
  - *From Redundancy to Relevance: Information Flow in LVLMs Across Reasoning Tasks
  - *What's in the Image? A Deep-Dive into the Vision of Vision Language Models
  - The Narrow Gate: Localized Image-Text Communication in Vision-Language Models
  - Performance Gap in Entity Knowledge Extraction Across Modalities in Vision Language Models
  - Lifting the Veil on Visual Information Flow in MLLMs: Unlocking Pathways to Faster Inference
- analyses on MLLMs
  - Words or Vision: Do Vision-Language Models Have Blind Faith in Text?
  - Forgotten Polygons: Multimodal Large Language Models are Shape-Blind
  - Vision Transformers Need Registers
  - On the Rankability of Visual Embeddings

Other Fields of MLLMs
- visual pretraining ...
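Since the logit lens keeps coming up above, here is a minimal sketch of the idea, assuming TransformerLens and GPT-2 small (the prompt is an arbitrary example of mine): project the residual stream after each layer through the final LayerNorm and the unembedding, and read off what the model "currently" predicts at that depth.

```python
# Minimal logit-lens sketch, assuming TransformerLens and GPT-2 small.
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
tokens = model.to_tokens("The Eiffel Tower is in the city of")
_, cache = model.run_with_cache(tokens)

for layer in range(model.cfg.n_layers):
    # Residual stream after this layer, at the final position only.
    resid = cache["resid_post", layer][:, -1, :]
    # Apply the final LayerNorm, then unembed into vocabulary space.
    logits = model.ln_final(resid) @ model.W_U + model.b_U
    print(f"layer {layer:2d}: {model.to_string(logits.argmax(dim=-1))!r}")
```

Plotting this layer by layer gives the kind of "when does the prediction form" picture that the logit-lens papers above build on.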
Circuit-tuning: A Mechanistic Approach for Identifying Parameter Redundancy and Fine-tuning Neural Networks
arXiv (old version): https://arxiv.org/pdf/2502.06106
一些语言学的梗和有意思的知识 (Some Linguistics Memes and Fun Facts)
This post is written in Chinese. If you don't know Chinese, you can learn it lol. (Sorry about that; simply translating the post into English probably wouldn't be enough to make it understandable.)

Linguistics fun

Pidgin (皮钦语). People tend to feel a certain distaste for those who 1. constantly sprinkle English words into their speech, and 2. show off / humble-brag (凡尔赛). For example, here is a famous scene from a dating show, a conversation between overseas students (留子): ...
Possible Research Areas in Mechanistic Interpretability
💡 This post mainly focuses on text models. For multi-modal models, please refer to this post.

The Purpose of This Blog

To get started in mech interp research, we need a macro understanding of the area, so I wrote this blog as a summary of the field to help you (and me) choose a research topic.

Circuit Discovery

Methods (a minimal activation-patching sketch follows this list)
- basic
  - activation patching (causal mediation / interchange interventions…)
  - path patching
- scaling techniques: attribution patching
- DAS (distributed alignment search)
- directional activation patching? 🔭

Resources
- inspiration
  - Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
  - (ROME) Locating and Editing Factual Associations in GPT
- Attribution Patching: Activation Patching at Industrial Scale
- (ACDC) Towards Automated Circuit Discovery for Mechanistic Interpretability
- Attribution Patching Outperforms Automated Circuit Discovery
- AtP*: An Efficient and Scalable Method for Localizing LLM Behaviour to Components
- Causal Scrubbing: A Method for Rigorously Testing Interpretability Hypotheses
- new
  - using SAEs
    - Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
    - Automatically Identifying Local and Global Circuits with Linear Computation Graphs
  - contextual decomposition
    - Mechanistic Interpretation through Contextual Decomposition in Transformers
  - edge pruning?
    - Finding Transformer Circuits with Edge Pruning
    - Functional Faithfulness in the Wild: Circuit Discovery with Differentiable Computation Graph Pruning

Evaluation
- lack of ground truth ...
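For readers new to activation patching, here is a minimal sketch, assuming TransformerLens and GPT-2 small; the clean/corrupted prompt pair and the patched layer/position are arbitrary illustrative choices of mine, not a recommended setup. The idea: run the model on a corrupted prompt, splice in one activation from the clean run, and measure how much of the clean behaviour that single activation restores.

```python
# Minimal activation-patching sketch, assuming TransformerLens and GPT-2 small.
from transformer_lens import HookedTransformer
from transformer_lens.utils import get_act_name

model = HookedTransformer.from_pretrained("gpt2")

# A clean/corrupted pair with identical token lengths (BOS is prepended at index 0).
clean = model.to_tokens("The capital of France is")
corrupt = model.to_tokens("The capital of Italy is")

_, clean_cache = model.run_with_cache(clean)

layer, pos = 6, 4  # arbitrary middle layer; position 4 is the country token

def patch_resid(resid, hook):
    # Overwrite the corrupted run's residual stream at one (layer, position)
    # with the corresponding activation from the clean run.
    resid[:, pos, :] = clean_cache[hook.name][:, pos, :]
    return resid

patched_logits = model.run_with_hooks(
    corrupt,
    fwd_hooks=[(get_act_name("resid_post", layer), patch_resid)],
)

# How much does this single patch restore the clean answer?
paris = model.to_single_token(" Paris")
print("corrupted logit(' Paris'):", model(corrupt)[0, -1, paris].item())
print("patched   logit(' Paris'):", patched_logits[0, -1, paris].item())
```

Path patching, attribution patching, and the automated methods above all refine this basic recipe; sweeping (layer, position) pairs and plotting the restored logit difference is the usual first experiment.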
Exploring Emotional Features in GPT2-Small
🎶 The code in this post can be found in the Jupyter notebook in my "saeExploration" repo.

Find features that reflect positive emotions

To find the features related to a specific emotion, I write five sentences containing that emotion's key words. For example, for happiness I have:

```python
prompt_happy = ["I'll be on a vacation tomorrow and I'm so happy.",
                "My mom brings home a new puppy and I'm so happy.",
                "I'm so glad I got the job I wanted.",
                "I feel so happy when I'm with my friends.",
                "I'm so happy I got the promotion I wanted."]
```

I choose to look for features that reflect happiness and sadness. Apart from that, I also wonder whether the feature that reflects excitement has something to do with the one that reflects happiness (the two are similar at least on the semantic level). A sketch of one way to search for such features follows below. ...
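Here is a minimal sketch of how such a feature search could look, assuming GPT-2 small via TransformerLens and a residual-stream SAE loaded through the SAELens library; the release/ID strings and the triple return of `SAE.from_pretrained` follow SAELens's docs but should be treated as assumptions, and `top_features` is a hypothetical helper:

```python
# Minimal feature-search sketch; the SAE release/ID below are assumptions.
import torch
from transformer_lens import HookedTransformer
from sae_lens import SAE

model = HookedTransformer.from_pretrained("gpt2")
sae, _, _ = SAE.from_pretrained(
    release="gpt2-small-res-jb",       # assumed pretrained release
    sae_id="blocks.8.hook_resid_pre",  # assumed hook point
)

def top_features(prompts, k=10):
    # Average every SAE feature's activation over all tokens of all prompts,
    # then return the k most active features.
    acts = []
    for p in prompts:
        _, cache = model.run_with_cache(model.to_tokens(p))
        resid = cache[sae.cfg.hook_name][:, 1:, :]  # drop the BOS position
        acts.append(sae.encode(resid).squeeze(0))   # [seq, d_sae]
    return torch.cat(acts, dim=0).mean(dim=0).topk(k)

print(top_features(prompt_happy))
```

To probe whether an "excitement" feature is related to a "happiness" one, a natural first check would be the cosine similarity between their decoder directions (the corresponding rows of `sae.W_dec`).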
A Brief Introduction to Mechanistic Interpretability Research
⚠️ Warnings

This post was written when I first delved into this area, and it hasn't been updated for a long time, so it may contain many errors. My attitude toward the area has since changed: the field is not well defined, and most research in it is of low quality and does not appeal to me. Besides, I think the study of interpretability should be put to practical use, though we can also study it for fun. I'm still interested in interpretability and its applications, and I'll write something new and interesting later ~

💡 This post is accompanied by another post, which contains the concrete content of this area. ...