Deep-Learning on Santhisenan's Blog

Global explanations in LLMs

Fri, 03 May 2024 20:33:03 +0800

Global explanations aim to offer insights into the inner workings of an LLM by understanding what individual components have encoded. Here, individual components could be neurons, hidden layers or even larger modules. In this post, we will look at four main methods – probing, neuron activation analysis, concept-based methods and mechanistic interpretability.

Probing-based methods

During self-supervised pre-training, LLMs acquire broad linguistic knowledge from training data. Using probing techniques, we can understand the knowledge that the LLMs have captured. There are two kinds of probing.

Approaches to generating local explanations in LLMs

Sat, 20 Apr 2024 11:31:28 +0800

Why care about explainability?

Explaining why Large Language Models (LLMs) make a certain prediction is difficult because LLMs are very complex “black box” models, i.e. their inner workings are opaque. However, there are two main reasons why we need to develop methods for explaining LLM predictions:

For end users of the models, explaining a model’s predictions helps them understand the reasoning behind a certain prediction, which can help build trust in the system they are using. For example, if an LLM is used in the medical domain to detect a certain disease, medical practitioners would need to understand the reasoning behind its predictions to verify their accuracy.

Intuition for deep neural networks

Sat, 13 Apr 2024 10:03:13 +0800

In this post, I will extend the idea of interpreting shallow neural networks as piecewise linear functions to deep neural networks. This post is based on chapter 4 of the Understanding Deep Learning textbook.

Composing two shallow neural networks

Before looking into deep neural networks, let’s look at composing two shallow neural networks and see how the composition impacts the linear regions that are formed. Let’s define the first neural network that takes an input $x$ and returns an output $y$ by:

Neural Networks as Piecewise Linear Functions

Sat, 06 Apr 2024 11:44:59 +0800

Defining a simple neural network

Today I learned that simple shallow neural networks can be thought of as piecewise linear functions. Consider a simple neural network that maps a single scalar value $x$ to a single scalar value $y$, given by

$$y = f[x, \phi]$$

Say this simple network only has 10 parameters, represented by

$$\phi = \{\phi_0, \phi_1, \phi_2, \phi_3, \theta_{10}, \theta_{11}, \theta_{20}, \theta_{21}, \theta_{30}, \theta_{31}\}$$

and the equation
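A minimal Python sketch of such a 10-parameter network may help make the piecewise linear behaviour concrete. It assumes three hidden units with ReLU activations, with $\theta_{i0}$ and $\theta_{i1}$ as the offset and slope of hidden unit $i$ and $\phi_0, \ldots, \phi_3$ as the output offset and weights. That form, the function names and the parameter values are all assumptions made for illustration; they are not given above.

```python
def relu(z):
    """Rectified linear unit: passes positive values, clips negatives to 0."""
    return max(0.0, z)


def shallow_network(x, phi, theta):
    """Evaluate an assumed 3-hidden-unit shallow network at scalar x.

    phi   = [phi_0, phi_1, phi_2, phi_3]            (output offset and weights)
    theta = [(theta_10, theta_11), (theta_20, theta_21), (theta_30, theta_31)]
    """
    hidden = [relu(t0 + t1 * x) for (t0, t1) in theta]
    return phi[0] + sum(p * h for p, h in zip(phi[1:], hidden))


def joints(theta):
    """x positions where a hidden unit switches between inactive and active,
    i.e. where theta_i0 + theta_i1 * x = 0. The output is linear between them."""
    return sorted(-t0 / t1 for (t0, t1) in theta if t1 != 0)


# Hypothetical parameter values, chosen only for illustration.
phi = [-0.5, 1.0, -2.0, 1.5]
theta = [(1.0, 1.0), (-1.0, 1.0), (-2.0, 1.0)]

if __name__ == "__main__":
    print("joints:", joints(theta))
    for x in [-2.0, -0.5, 0.0, 0.5, 1.5, 4.0]:
        print(x, shallow_network(x, phi, theta))
```

With these values the three hidden units switch on at different inputs, so the output is made of up to four straight-line segments, and the slope only changes at the joints.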