# Thinking and Reasoning
## Why I Write This Blog

Thinking models are wildly popular nowadays. I first delved into this area in September 2023, then gradually drifted away from it until DeepSeek came along. I want to keep collecting information about LLM reasoning and share my thoughts here.

## Thinking Models

### Text-based explicit reasoning

- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
- Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning
- Kimi k1.5: Scaling Reinforcement Learning with LLMs
- GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
- Skywork Open Reasoner 1 Technical Report

### Implicit reasoning

- (Coconut) Training Large Language Models to Reason in a Continuous Latent Space (see the latent-thought sketch at the end of the Analyses section)

### Others

- ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline

### Blogs

- A Top-Down Deep Dive into DeepSeek-R1, with Plenty of Details
- MLA (1): Learning and Thoroughly Understanding the DeepSeek MLA Algorithm from the Code
- Understanding Thinking Models (LLM-Based Reasoning Models) from Scratch: O1, DeepSeek R1, Kimi K1.5

## Overthinking

### Survey

- Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
- (repo) Awesome-Efficient-Reasoning-LLMs

### Papers

- Qwen3 Technical Report
- AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning
- AdaptThink: Reasoning Models Can Learn When to Think

### Blogs

- Adaptive Fast/Slow-Thinking Reasoning Models: Qwen3 Hybrid Thinking -> ByteDance AdaCoT -> Tsinghua AdaptThink

## Parallel Thinking

- Deep Think with Confidence (see the confidence-voting sketch at the end of the Analyses section)

## Visual Reasoning

### Survey

- Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers

### Papers

- $V^{*}$: Guided Visual Search as a Core Mechanism in Multimodal LLMs

#### Active perception

- DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning
- Chain-of-Focus: Adaptive Visual Search and Zooming for Multimodal Reasoning via RL
- GRIT: Teaching MLLMs to Think with Images

#### Tool use

- VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection
- VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use
- Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration
- Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models

#### Imagination

- Thinking with Generated Images
- Visual Planning: Let’s Think Only with Images

### Blogs

- A Short Summary of Thinking with Images

## Others

- [Monte Carlo Tree Search] The MCT Self-Refine (MCTSr) Algorithm, with a Code Walkthrough
- A Chat about PRMs and MCTS in Reasoning Models

## Evaluation

### Dataset

## Analyses

### Implicit reasoning

- Emergence of Superposition: Unveiling the Training Dynamics of Chain of Continuous Thought

### Interpretability

- How Reinforcement Learning After Next-Token Prediction Facilitates Learning
- Base Models Know How to Reason, Thinking Models Learn When
- Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties
- Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning
- Demystifying Reasoning Dynamics with Mutual Information: Thinking Tokens are Information Peaks in LLM Reasoning
- Thought Anchors: Which LLM Reasoning Steps Matter?
- Understanding Reasoning in Thinking Language Models via Steering Vectors
- Chain-of-Thought Is Not Explainability
- Unveiling the Mechanisms of Explicit CoT Training: How Chain-of-Thought Enhances Reasoning Generalization
- How Do LLMs Perform Two-Hop Reasoning in Context?

### Theories

- Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought
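Since the Coconut paper, the superposition analysis, and the theory entry above all revolve around the same mechanism (feeding the last hidden state back as the next input embedding instead of decoding a token), here is a minimal sketch of that latent-thought loop. It assumes a HuggingFace-style forward pass; the function name and the `num_latent_steps` knob are mine, not the paper's.

```python
import torch

def latent_thought_rollout(model, input_embeds, num_latent_steps):
    """Roll out continuous 'thoughts' in the spirit of Coconut: each step
    appends the final-layer hidden state of the last position directly to the
    input embeddings, skipping token decoding entirely."""
    embeds = input_embeds  # (batch, seq, hidden)
    for _ in range(num_latent_steps):
        # Assumes a HuggingFace-style model that can return hidden states.
        out = model(inputs_embeds=embeds, output_hidden_states=True)
        last_hidden = out.hidden_states[-1][:, -1:, :]  # (batch, 1, hidden)
        # The continuous thought lives in embedding space, not token space.
        embeds = torch.cat([embeds, last_hidden], dim=1)
    return embeds  # feed into a final pass that decodes the answer as text
```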
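And for the parallel-thinking entry (Deep Think with Confidence), a sketch of confidence-filtered majority voting over sampled reasoning traces. I am assuming one scalar confidence per trace, such as its mean token log-probability; the paper works with more local confidence measures and online early stopping, which this ignores.

```python
from collections import Counter

def confidence_filtered_vote(traces, keep_frac=0.9):
    """Majority-vote over final answers, but only among the most confident
    reasoning traces. `traces` is a list of (answer, confidence) pairs."""
    ranked = sorted(traces, key=lambda t: t[1], reverse=True)
    kept = ranked[: max(1, int(keep_frac * len(ranked)))]
    votes = Counter(answer for answer, _ in kept)
    return votes.most_common(1)[0][0]

# e.g. confidence_filtered_vote([("42", -0.3), ("41", -1.7), ("42", -0.5)]) -> "42"
```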
## Reinforcement Learning

### RL algorithms

- (GAE) High-Dimensional Continuous Control Using Generalized Advantage Estimation
- (DPO) Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  - From r to Q*: Your Language Model is Secretly a Q-Function
  - An explainer of the newer DPO paper "Your Language Model is Secretly a Q-Function", and its connection to OpenAI's Q*?
- (PPO) Proximal Policy Optimization Algorithms
- (REINFORCE++) REINFORCE++: An Efficient RLHF Algorithm with Robustness to Both Prompt and Reward Models
- (GRPO) DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
- (DAPO) DAPO: An Open-Source LLM Reinforcement Learning System at Scale
- (GSPO) Group Sequence Policy Optimization
- (CISPO) MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

Quick sketches of GAE, the PPO/GRPO objectives, and the DPO loss appear in the appendix at the end of this post.

### Blogs

#### Algorithms

- RL-PPO Theory That Everyone Can Understand
- Reasoning LLM (3): LLM + RL
- Common Misconceptions about RLHF

### Reward modeling

#### Text

- (PRM) Let's Verify Step by Step
- (POLAR) Pre-Trained Policy Discriminators are General Reward Models

#### Reward models for generative models

- Improving Video Generation with Human Feedback
- VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation
- Black-Box Prompt Optimization: Aligning Large Language Models without Model Training

### Analyses

#### RL training

- Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning

#### Entropy

- Reasoning LLM (5): Entropy Collapse and Capability Boundaries
- [LLM x RL] Entropy Collapse and Mitigation Strategies
- (Clip-Cov / KL-Cov) The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models
- (forking tokens) Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning (see the token-entropy sketch in the appendix)

#### RL vs. SFT

- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-Training
- RL Squeezes, SFT Expands: A Comparative Study of Reasoning LLMs
- Understanding the Path from SFT to RL under a Unified View
- All Roads Lead to Likelihood: The Value of Reinforcement Learning in Fine-Tuning
- Generalist Reward Models: Found Inside Large Language Models
- (DFT) On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
- From SFT to RL: Seeing Their Connection Step by Step
- (NFT) Bridging Supervised Learning and Reinforcement Learning in Math Reasoning

## Resources

### RL infra

- (verl) HybridFlow: A Flexible and Efficient RLHF Framework (doc, repo)
- (slime) In the Era of RL Scaling, What Kind of RL Framework Do We Need?

### Blogs

#### Training

- A Quick Look at the Thriving, Fast-Evolving Landscape of RL Frameworks
- How we built our multi-agent research system
- verl
  - [AI Infra] An Introduction to the verl Framework, with a Guided Code Read
  - A From-Scratch Walkthrough of the verl Framework
  - A Personal Retrospective on an Internship: Adding verl RL Training Support for DeepSeek-V3 671B
  - OpenRLHF & verl Parameter Conversion Guide
  - verl Explained for Beginners
- A Deep and Comprehensive Guide to Distributed Parallelism Strategies for Large Models: DP/TP/PP/CP/EP/SP
- Understanding Megatron-LM in Depth (2): An Introduction to the Principles
- A Detailed Explanation of the Differences between DeepSpeed ZeRO-1/2/3 and FSDP

#### Inference

- SGLang: A New Direction for LLM Inference Engines
- Illustrated LLM Compute Acceleration Series: FlashAttention V1, from Hardware to Compute Logic
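## Appendix: Quick Sketches

These are quick references for the algorithm entries above: standard formulas and hedged sketches, not faithful reproductions of any single paper's setup.

GAE estimates the advantage as an exponentially weighted sum of TD residuals:

$$\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t), \qquad \hat{A}_t^{\mathrm{GAE}(\gamma,\lambda)} = \sum_{l=0}^{\infty} (\gamma\lambda)^{l}\,\delta_{t+l}$$

Setting $\lambda = 0$ recovers the one-step TD error, while $\lambda = 1$ gives the Monte Carlo return minus the value baseline, trading bias for variance.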
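PPO and GRPO share the clipped surrogate objective; GRPO's change is to drop the learned value baseline and instead normalize rewards within a group of $G$ responses sampled for the same prompt. A minimal sketch (sequence-level advantages and the `1e-8` stabilizer are my simplifications):

```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Group-relative advantages: normalize the rewards of G responses to the
    same prompt by the group mean and std, so no value network is needed."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate; GRPO plugs the group-relative
    advantages above into the same objective."""
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```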
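The DPO loss in code: the policy's implicit reward is $\beta(\log \pi_\theta - \log \pi_{\mathrm{ref}})$, and under a Bradley-Terry model the chosen response should out-score the rejected one. Inputs here are summed per-sequence log-probabilities:

```python
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO: maximize the margin between the implicit rewards of the chosen
    and rejected responses, measured against a frozen reference policy."""
    chosen_margin = logp_chosen - ref_logp_chosen
    rejected_margin = logp_rejected - ref_logp_rejected
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()
```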
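Finally, for the entropy thread: "Beyond the 80/20 Rule" restricts policy-gradient updates to the minority of high-entropy "forking" tokens. A sketch of the per-token entropy and mask (selecting globally across the batch is my simplification, and `top_frac` is just a knob for that fraction):

```python
import torch

def high_entropy_token_mask(logits: torch.Tensor, top_frac: float = 0.2) -> torch.Tensor:
    """Compute per-token entropy of the next-token distribution and mask the
    highest-entropy ('forking') tokens, e.g. to restrict RL updates to them."""
    logp = torch.log_softmax(logits, dim=-1)    # (batch, seq, vocab)
    entropy = -(logp.exp() * logp).sum(dim=-1)  # (batch, seq)
    k = max(1, int(top_frac * entropy.numel()))
    threshold = entropy.flatten().topk(k).values.min()
    return entropy >= threshold                 # True on tokens kept for updates
```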