Interpretability (& other areas) for Multimodal Models

💡 This post was initially focused on interpretability for multimodal models; papers from other fields were later added here for convenience.

## Methods

### Interpretability for MLLMs

#### survey

- A Survey on Mechanistic Interpretability for Multi-Modal Foundation Models
- Sparks of Explainability: Recent Advancements in Explaining Large Vision Models
- Awesome LMMs Mechanistic Interpretability

#### probing

- Probing Multimodal Large Language Models for Global and Local Semantic Representations (a minimal linear-probe sketch follows at the end of this section)

#### representation

- Zoom In: An Introduction to Circuits
- Multimodal Neurons in Artificial Neural Networks
- Interpreting CLIP’s Image Representation via Text-Based Decomposition
- Interpreting the Second-Order Effects of Neurons in CLIP (different layers of CLIP)
- Multimodal Neurons in Pretrained Text-Only Transformers
- Does Object Binding Naturally Emerge in Large Pretrained Vision Transformers?
- Understanding Video Transformers via Universal Concept Discovery

#### circuit

- **(causal tracing) Understanding Information Storage and Transfer in Multi-modal Large Language Models
- Automatic Discovery of Visual Circuits
- Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP

#### SAE

- Case Study: Interpreting, Manipulating, and Controlling CLIP with Sparse Autoencoders
- Towards Multimodal Interpretability: Learning Sparse Interpretable Features in Vision Transformers
- Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery

#### visualization

- VITAL: More Understandable Feature Visualization through Distribution Alignment and Relevant Information Flow
- Visualizer! Simplify your Vision Transformer visualizations!
- (DVT) Denoising Vision Transformers
- Token Activation Map to Visually Explain Multimodal LLMs
- LVLM-Intrepret: An Interpretability Tool for Large Vision Language Models
- Transformer Interpretability Beyond Attention Visualization

#### others

- **Towards Interpreting Visual Information Processing in Vision-Language Models (demo: dogit lens)
- Laying the Foundations for Vision and Multimodal Mechanistic Interpretability & Open Problems
- Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space (see the logit-lens sketch after this section)
- Skip-It? Theoretical Conditions for Layer Skipping in Vision-Language Models

#### tools

- VLM-Lens

#### information flow

- Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs
- **Cross-modal Information Flow in Multimodal Large Language Models
- *From Redundancy to Relevance: Information Flow in LVLMs Across Reasoning Tasks
- *What’s in the Image? A Deep-Dive into the Vision of Vision Language Models
- The Narrow Gate: Localized Image-Text Communication in Vision-Language Models
- Performance Gap in Entity Knowledge Extraction Across Modalities in Vision Language Models
- Lifting the Veil on Visual Information Flow in MLLMs: Unlocking Pathways to Faster Inference

#### analyses on MLLMs

- Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training
- Lost in Embeddings: Information Loss in Vision–Language Models
- Words or Vision: Do Vision-Language Models Have Blind Faith in Text?
- Forgotten Polygons: Multimodal Large Language Models are Shape-Blind
- Vision Transformers Need Registers
- On the Rankability of Visual Embeddings
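Most of the probing entries above reduce to the same recipe: freeze the model, extract hidden states for a labeled dataset, and fit a linear classifier on top. A minimal, self-contained sketch; the synthetic activations, shapes, and the scikit-learn classifier are illustrative stand-ins, not any particular paper's setup:

```python
# Minimal linear-probing sketch: fit a logistic-regression probe on frozen
# activations to test whether a concept is linearly decodable from a layer.
# The activations here are synthetic placeholders; in practice they would be
# hidden states extracted from a frozen MLLM.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
d_model, n_samples = 768, 2000

# Synthetic "activations": two classes separated along one random direction.
direction = rng.normal(size=d_model)
labels = rng.integers(0, 2, size=n_samples)
acts = rng.normal(size=(n_samples, d_model)) + np.outer(labels - 0.5, direction)

X_train, X_test, y_train, y_test = train_test_split(acts, labels, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"probe accuracy: {probe.score(X_test, y_test):.3f}")
```

High probe accuracy only shows the concept is linearly decodable at that layer, not that the model uses it causally; that gap is what the circuit and causal-tracing papers above address.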
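Likewise, the vocabulary-space readouts referenced under "others" (including the dogit lens) are variants of the logit lens: project intermediate hidden states through the model's unembedding and inspect which tokens each layer promotes. A minimal sketch on text-only GPT-2; the model choice and prompt are illustrative, and the multimodal versions apply the same projection to image-token positions inside an MLLM's language model:

```python
# Minimal logit-lens sketch: decode every layer's hidden state at the last
# position through the final layer norm and the unembedding matrix.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

inputs = tok("The Eiffel Tower is located in", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states: tuple of (num_layers + 1) tensors, each [1, seq, d_model]
for layer, h in enumerate(out.hidden_states):
    h = model.transformer.ln_f(h[:, -1])   # final layer norm, last position
    logits = model.lm_head(h)              # project into vocabulary space
    print(f"layer {layer:2d} -> {tok.decode(logits.argmax(-1))!r}")
```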
### Interpretability for Diffusion Models

#### survey

- awesome-generative-explainability

#### representation

##### general

- Interpreting Physics in Video World Models
- V-JEPA series
- The Hidden Language of Diffusion Models
- FreeU: Free Lunch in Diffusion U-Net
- Denoising Diffusion Autoencoders are Unified Self-supervised Learners
- Understanding the Latent Space of Diffusion Models through the Lens of Riemannian Geometry

##### localization

- Localizing Knowledge in Diffusion Transformers
- Precise Parameter Localization for Textual Generation in Diffusion Models
- Stable Flow: Vital Layers for Training-free Image Editing
- Unraveling MMDiT Blocks: Training-free Analysis and Enhancement of Text-conditioned Diffusion (submission: Revisiting Block-wise Interactions of MMDiT for Training-free Improved Synthesis; review)

#### motion (for video gen models)

- Video Diffusion Models are Training-free Motion Interpreter and Controller
- Emergent Temporal Correspondences from Video Diffusion Transformers

#### inference

- Demystifying Video Reasoning
- Understanding Representation Dynamics of Diffusion Models via Low-Dimensional Modeling
- Temporal Concept Dynamics in Diffusion Models via Prompt-Conditioned Interventions
- *Towards Understanding the Working Mechanism of Text-to-Image Diffusion Model

#### modules

##### positional encoding

- FreeFlux: Understanding and Exploiting Layer-Specific Roles in RoPE-Based MMDiT for Versatile Image Editing
- *Exploring Position Encoding in Diffusion U-Net for Training-free High-resolution Image Generation

##### attention

- What the DAAM: Interpreting Stable Diffusion Using Cross Attention

#### probing

- Interpreting Physics in Video World Models

#### circuit

- Localizing and Editing Knowledge in Text-to-Image Generative Models
- On Mechanistic Knowledge Localization in Text-to-Image Generative Models
- Circuit Mechanisms for Spatial Relation Generation in Diffusion Transformers

#### SAE

(a minimal SAE training sketch appears at the end of this outline)

- Emergence and Evolution of Interpretable Concepts in Diffusion Models
- TIDE: Temporal-aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation
- SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders

#### steering vector

(a minimal activation-steering sketch appears at the end of this outline)

- Video Unlearning via Low-Rank Refusal Vector
- Decoding Vision Transformers: the Diffusion Steering Lens

#### learning dynamics

- Biased Generalization in Diffusion Models
- Bigger Isn’t Always Memorizing: Early Stopping Overparameterized Diffusion Models
- Why Diffusion Models Don’t Memorize: The Role of Implicit Dynamical Regularization in Training
- An Analytical Theory of Spectral Bias in the Learning Dynamics of Diffusion Models

#### visualization

#### others

- The Spacetime of Diffusion Models: An Information Geometry Perspective
- VideoEraser: Concept Erasure in Text-to-Video Diffusion Models
- Radial Attention: O(n log n) Sparse Attention with Energy Decay for Long Video Generation
- Physics Steering: Causal Control of Cross-Domain Concepts in a Physics Foundation Model
- AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers

#### application-guided

- Physics-Driven Spatiotemporal Modeling for AI-Generated Video Detection
- Emergent Correspondence from Image Diffusion
- Moaw: Unleashing Motion Awareness for Video Diffusion Models
- Make It Count: Text-to-Image Generation with an Accurate Number of Objects
- Faster Diffusion: Rethinking the Role of the Encoder for Diffusion Model Inference

### Other fields of MLLMs

#### visual pretraining

...
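As promised in the SAE subsections above (for CLIP, ViTs, and diffusion transformers alike), the core object is the same: an overcomplete autoencoder with a sparsity penalty, trained on a stream of model activations. A minimal training-loop sketch; random vectors stand in for real activations, and `d_model`, the expansion factor, and `l1_coef` are illustrative hyperparameters, not values from the listed papers:

```python
# Minimal sparse-autoencoder (SAE) sketch: learn an overcomplete, sparse
# feature dictionary over activations via reconstruction + L1 penalty.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.enc = nn.Linear(d_model, d_hidden)
        self.dec = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        feats = torch.relu(self.enc(x))   # sparse feature activations
        return self.dec(feats), feats

d_model, d_hidden, l1_coef = 512, 4096, 1e-3   # illustrative sizes
sae = SparseAutoencoder(d_model, d_hidden)
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)

for step in range(200):
    acts = torch.randn(256, d_model)      # placeholder activation batch
    recon, feats = sae(acts)
    loss = ((recon - acts) ** 2).mean() + l1_coef * feats.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final loss: {loss.item():.4f}")
```

The papers differ mainly in where the activations come from (CLIP residual stream, ViT patches, diffusion-transformer blocks across timesteps) and in how the resulting features are named, steered, or unlearned.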
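And the sketch referenced under "steering vector": derive a direction as the difference of mean hidden states between two contrasting prompts, then add it back during generation with a forward hook. Everything concrete here (GPT-2 as the backbone, the layer index, the scale, the prompts) is an illustrative assumption; the listed papers build analogous refusal and steering vectors inside vision and diffusion models:

```python
# Minimal activation-steering sketch: contrast-derived direction added to one
# block's output via a forward hook. Layer index and scale are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
layer, scale = 6, 4.0

def hidden_at_layer(text):
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        hs = model(**ids, output_hidden_states=True).hidden_states
    return hs[layer].mean(dim=1)          # mean over sequence positions

# Contrast pair (arbitrary example prompts).
v = hidden_at_layer("I love this, it is wonderful") - \
    hidden_at_layer("I hate this, it is terrible")

def steer(module, inputs, output):
    # GPT2Block returns a tuple; shift its hidden states by the vector.
    return (output[0] + scale * v,) + output[1:]

handle = model.transformer.h[layer].register_forward_hook(steer)
out = model.generate(**tok("The movie was", return_tensors="pt"),
                     max_new_tokens=20, do_sample=False,
                     pad_token_id=tok.eos_token_id)
handle.remove()
print(tok.decode(out[0]))
```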

February 25, 2025 · 10 min · 4521 words · Sirius