A Collection of My Publications

* Equal contribution ✉ Corresponding author

- [ICLR 2026] Reading Images Like Texts: Sequential Image Understanding in Vision-Language Models
  Yueyan Li*, Chenggong Zhao, Zeyuan Zhang, Caixia Yuan, Xiaojie Wang✉
  International Conference on Learning Representations (ICLR), 2026 · PDF · Code

- [arXiv] Sparse Model Diffing via Dynamic Circuits
  Yueyan Li*, Wenhao Gao, Caixia Yuan, Xiaojie Wang✉
  arXiv, 2026 · PDF · Code

- [Technical Report] AutoGLM: Autonomous Foundation Agents for GUIs
  Team AutoGLM
  arXiv, 2024 · PDF · Code

- [EMNLP 2024] ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline
  Yifan Xu*, Xiao Liu, Xinghan Liu, Zhenyu Hou, Yueyan Li, Xiaohan Zhang, Zihan Wang, Aohan Zeng, Zhengxiao Du, Wenyi Zhao, Jie Tang, Yuxiao Dong✉
  Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024 · PDF · Code

- [Technical Report] ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools
  Team GLM
  arXiv, 2024 · PDF · Code

- [arXiv] EntroKV: An Entropy-aware Memory Manager for KV Cache Compression
  Wenhao Gao*, Haoran Cao, Yueyan Li, Caixia Yuan, Xiaojie Wang✉
  arXiv, 2026 · Code

January 23, 2026 · 1 min · 164 words · Sirius

Interpretability (& other areas) for Multimodal Models

💡 This post initially focused on interpretability for multimodal models; papers from other fields were later included as well, simply for convenience.

Resource: Interpretability for MLLMs

- survey
  - A Survey on Mechanistic Interpretability for Multi-Modal Foundation Models
  - Sparks of Explainability: Recent Advancements in Explaining Large Vision Models
  - Awesome LMMs Mechanistic Interpretability
- probing
  - Probing Multimodal Large Language Models for Global and Local Semantic Representations
- representation
  - Zoom In: An Introduction to Circuits
  - Multimodal Neurons in Artificial Neural Networks
  - Interpreting CLIP's Image Representation via Text-Based Decomposition
  - Interpreting the Second-Order Effects of Neurons in CLIP (different layers of CLIP)
  - Multimodal Neurons in Pretrained Text-Only Transformers
  - Does Object Binding Naturally Emerge in Large Pretrained Vision Transformers?
- circuit
  - **(causal tracing) Understanding Information Storage and Transfer in Multi-modal Large Language Models
  - Automatic Discovery of Visual Circuits
  - Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP
- SAE
  - Case Study: Interpreting, Manipulating, and Controlling CLIP with Sparse Autoencoders
  - Towards Multimodal Interpretability: Learning Sparse Interpretable Features in Vision Transformers
  - Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery
- visualization
  - Visualizer! (a tool to simplify your Vision Transformer visualizations)
  - (DVT) Denoising Vision Transformers
  - Token Activation Map to Visually Explain Multimodal LLMs
  - LVLM-Interpret: An Interpretability Tool for Large Vision Language Models
  - Transformer Interpretability Beyond Attention Visualization
- others
  - **Towards Interpreting Visual Information Processing in Vision-Language Models
  - demo (logit lens)
  - Laying the Foundations for Vision and Multimodal Mechanistic Interpretability & Open Problems
  - Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space
  - Skip-It? Theoretical Conditions for Layer Skipping in Vision-Language Models
- tools
  - VLM-Lens
- information flow
  - **Cross-modal Information Flow in Multimodal Large Language Models
  - *From Redundancy to Relevance: Information Flow in LVLMs Across Reasoning Tasks
  - *What's in the Image? A Deep-Dive into the Vision of Vision Language Models
  - The Narrow Gate: Localized Image-Text Communication in Vision-Language Models
  - Performance Gap in Entity Knowledge Extraction Across Modalities in Vision Language Models
  - Lifting the Veil on Visual Information Flow in MLLMs: Unlocking Pathways to Faster Inference
- analyses on MLLMs
  - Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training
  - Lost in Embeddings: Information Loss in Vision-Language Models
  - Words or Vision: Do Vision-Language Models Have Blind Faith in Text?
  - Forgotten Polygons: Multimodal Large Language Models are Shape-Blind
  - Vision Transformers Need Registers
  - On the Rankability of Visual Embeddings

Other fields of MLLMs

- visual pretraining ...

February 25, 2025 · 8 min · 3589 words · Sirius

Possible Research Areas in Mechanistic Interpretability

💡 This post is mainly focused on text models. For multi-modal models, please refer to this post.

The Purpose of This Blog

To get started in mech interp research, we need a macro understanding of the area, so I wrote this blog as a summary of the field to help you (and me) choose a research topic.

Circuit Discovery

Methods

- basic: activation patching (causal mediation / interchange interventions…) and path patching; a minimal patching sketch follows this excerpt
- scaling techniques: attribution patching
- DAS (distributed alignment search)
- directional activation patching?

🔭 resources

- inspiration: Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
- what is circuit discovery?
  - Towards Best Practices of Activation Patching in Language Models: Metrics and Methods
  - How to Use and Interpret Activation Patching

representative work

- activation patching
  - Investigating Gender Bias in Language Models Using Causal Mediation Analysis
  - (ROME) Locating and Editing Factual Associations in GPT
  - Causal Scrubbing: A Method for Rigorously Testing Interpretability Hypotheses
  - (AtP) Attribution Patching: Activation Patching at Industrial Scale
  - AtP*: An Efficient and Scalable Method for Localizing LLM Behaviour to Components
- path patching
  - (ACDC) Towards Automated Circuit Discovery for Mechanistic Interpretability
  - (EAP) Attribution Patching Outperforms Automated Circuit Discovery
  - (EAP-IG) Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms
  - Localizing Model Behavior with Path Patching
- distributed alignment search
  - (DAS) Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations
  - Interpretability at Scale: Identifying Causal Mechanisms in Alpaca
- new
  - using SAE: Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models; Automatically Identifying Local and Global Circuits with Linear Computation Graphs
  - contextual decomposition: Mechanistic Interpretation through Contextual Decomposition in Transformers
  - edge pruning?: Finding Transformer Circuits with Edge Pruning; Functional Faithfulness in the Wild: Circuit Discovery with Differentiable Computation Graph Pruning
  - attribution graph: see Applications in the Dictionary Learning section

Evaluation

- lack of ground truth ...
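For readers new to the "basic" method above, here is a minimal sketch of single-position activation patching, assuming the TransformerLens API; the model, prompts, layer, and position are illustrative choices, not taken from the post.

```python
# Minimal activation-patching sketch (illustrative, not from the post),
# assuming the TransformerLens API.
import torch
from transformer_lens import HookedTransformer, utils

model = HookedTransformer.from_pretrained("gpt2")

# Clean/corrupt pair with identical token lengths (IOI-style).
clean_tokens = model.to_tokens("When John and Mary went to the store, John gave a drink to")
corrupt_tokens = model.to_tokens("When John and Mary went to the store, Mary gave a drink to")
mary, john = model.to_single_token(" Mary"), model.to_single_token(" John")

def logit_diff(logits):
    # How strongly the model prefers " Mary" over " John" at the final position.
    return (logits[0, -1, mary] - logits[0, -1, john]).item()

# Cache all activations on the clean run.
_, clean_cache = model.run_with_cache(clean_tokens)

layer, pos = 8, 10  # illustrative: residual stream at the second-name position
hook_name = utils.get_act_name("resid_pre", layer)

def patch_hook(resid, hook):
    # Overwrite the corrupted activation with the cached clean one.
    resid[:, pos, :] = clean_cache[hook_name][:, pos, :]
    return resid

print("clean:  ", logit_diff(model(clean_tokens)))
print("corrupt:", logit_diff(model(corrupt_tokens)))
print("patched:", logit_diff(model.run_with_hooks(corrupt_tokens,
                                                  fwd_hooks=[(hook_name, patch_hook)])))
```

If patching this single activation recovers most of the clean logit difference, that component and position are causally implicated in the behavior; sweeping over layers and positions gives the familiar patching heatmaps.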

September 6, 2024 · 7 min · 3281 words · Sirius

Exploring Emotional Features in GPT2-Small

🎶 Code in this post can be found in the Jupyter notebook in my "saeExploration" repo.

Find features that reflect positive emotions

To find the features related to a specific emotion, I write five sentences containing the key words for each emotion. For example, for happy emotions I have:

```python
prompt_happy = ["I'll be on a vacation tomorrow and I'm so happy.",
                "My mom brings home a new puppy and I'm so happy.",
                "I'm so glad I got the job I wanted.",
                "I feel so happy when I'm with my friends.",
                "I'm so happy I got the promotion I wanted.",]
```

I choose to look for features that reflect happiness and sadness. Apart from that, I also wonder whether the feature that reflects excitement has something to do with the one that reflects happiness (they are alike at the semantic level, at least). ...
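As a rough illustration of the kind of search described above (not the notebook's actual code), the sketch below scores SAE features on the happy prompts against neutral ones and keeps the top scorers; the weight names `W_enc`, `b_enc`, `b_dec` and all shapes are assumptions.

```python
# Hypothetical sketch: find SAE features that fire on "happy" prompts but
# not on neutral ones. Weight names and shapes are illustrative assumptions.
import torch

def sae_features(resid, W_enc, b_enc, b_dec):
    # A standard SAE encoder: ReLU((x - b_dec) @ W_enc + b_enc).
    return torch.relu((resid - b_dec) @ W_enc + b_enc)

def top_emotion_features(resid_happy, resid_neutral, W_enc, b_enc, b_dec, k=10):
    # Mean feature activation over all token positions of each prompt set.
    f_happy = sae_features(resid_happy, W_enc, b_enc, b_dec).mean(dim=0)
    f_neutral = sae_features(resid_neutral, W_enc, b_enc, b_dec).mean(dim=0)
    # Rank features by how much more they fire on the happy prompts.
    return torch.topk(f_happy - f_neutral, k)

# Toy shapes: d_model=768 (GPT2-small), d_sae=24576. A real run would load a
# pretrained SAE and cache residual-stream activations for each prompt.
d_model, d_sae = 768, 24576
W_enc, b_enc, b_dec = torch.randn(d_model, d_sae), torch.zeros(d_sae), torch.zeros(d_model)
resid_happy, resid_neutral = torch.randn(50, d_model), torch.randn(50, d_model)
print(top_emotion_features(resid_happy, resid_neutral, W_enc, b_enc, b_dec))
```

Contrasting against neutral prompts is one simple way to filter out features that merely respond to sentence structure rather than to the emotion itself.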

August 29, 2024 · 6 min · 1114 words · Sirius

A Brief Introduction to Mechanistic Interpretability Research

⚠️ Warning: this post was written when I first delved into this area, and it hasn't been updated for a long time, so it may contain errors. I'm still interested in interpretability and its applications, and I'll write something new and interesting later ~ 💡 This post is accompanied by another post, which contains the specific content of this area. ...

August 28, 2024 · 16 min · 3208 words · Sirius