<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Deep-Learning on Santhisenan's Blog</title><link>https://santhisenan.github.io/blog/tags/deep-learning/</link><description>Recent content in Deep-Learning on Santhisenan's Blog</description><generator>Hugo</generator><language>en-us</language><copyright>© Santhisenan A</copyright><lastBuildDate>Fri, 03 May 2024 20:33:03 +0800</lastBuildDate><atom:link href="https://santhisenan.github.io/blog/tags/deep-learning/index.xml" rel="self" type="application/rss+xml"/><item><title>Global explanations in LLMs</title><link>https://santhisenan.github.io/blog/posts/global-explain-llm/</link><pubDate>Fri, 03 May 2024 20:33:03 +0800</pubDate><guid>https://santhisenan.github.io/blog/posts/global-explain-llm/</guid><description>&lt;p&gt;Global explanations aim to offer insights into the inner workings of an LLM by understanding what individual components have encoded. Here, individual components could be neurons, hidden layers or even larger modules. In this post, we will look at four main methods &amp;ndash; probing, neuron activation analysis, concept-based methods and mechanistic interpretation.&lt;/p&gt;
&lt;h2 id="probing-based-methods"&gt;Probing-based methods&lt;/h2&gt;
&lt;p&gt;During self-supervised pre-training, LLMs acquire broad linguistic knowledge from training data. Using probing techniques, we can understand the knowledge that the LLMs have captured. There are two kinds of probing.&lt;/p&gt;</description></item><item><title>Approaches to generating local explanations in LLMs</title><link>https://santhisenan.github.io/blog/posts/llm-explain-local/</link><pubDate>Sat, 20 Apr 2024 11:31:28 +0800</pubDate><guid>https://santhisenan.github.io/blog/posts/llm-explain-local/</guid><description>&lt;h2 id="why-care-about-explainability"&gt;Why care about explainability?&lt;/h2&gt;
&lt;p&gt;Explaining why Large Language Models (LLMs) make a certain prediction is difficult because LLMs are very complex &amp;ldquo;black box&amp;rdquo; models, i.e. their inner workings are opaque. However, there are two main reasons why we need to develop methods for explaining LLM predictions:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;For end users of the models, explaining a model&amp;rsquo;s predictions helps them understand the reasoning behind a certain prediction, which can help build trust in the system they are using. For example, if an LLM is used in the medical domain to detect a certain disease, medical practitioners would need to understand the reasoning behind the predictions to verify their accuracy.&lt;/p&gt;</description></item><item><title>Intuition for deep neural networks</title><link>https://santhisenan.github.io/blog/posts/intuition-dnns/</link><pubDate>Sat, 13 Apr 2024 10:03:13 +0800</pubDate><guid>https://santhisenan.github.io/blog/posts/intuition-dnns/</guid><description>&lt;p&gt;In this post, I will extend the idea of interpreting shallow neural networks as piecewise linear functions to deep neural networks. This post is based on chapter 4 of the &lt;a href="https://udlbook.github.io/udlbook/"&gt;Understanding Deep Learning&lt;/a&gt; textbook.&lt;/p&gt;
&lt;h2 id="composing-two-shallow-neural-networks"&gt;Composing two shallow neural networks&lt;/h2&gt;
&lt;p&gt;Before looking into deep neural networks, let&amp;rsquo;s look at composing two shallow neural networks and see how the composition impacts the linear regions that are formed. Let&amp;rsquo;s define the first neural network that takes an input $x$ and returns an output $y$ by:&lt;/p&gt;</description></item><item><title>Neural Networks as Piecewise Linear Functions</title><link>https://santhisenan.github.io/blog/posts/nn-as-piecewise-linear/</link><pubDate>Sat, 06 Apr 2024 11:44:59 +0800</pubDate><guid>https://santhisenan.github.io/blog/posts/nn-as-piecewise-linear/</guid><description>&lt;h2 id="defining-a-simple-neural-network"&gt;Defining a simple neural network&lt;/h2&gt;
&lt;p&gt;Today I learned that simple shallow neural networks can be thought of as piecewise linear functions. Consider a simple neural network that maps a single scalar value $x$ to a single scalar value $y$, given by
&lt;/p&gt;
$$y = f[x, \phi]$$&lt;p&gt;Say this simple network only has 10 parameters, represented by
&lt;/p&gt;
$$\phi = \{\phi_0, \phi_1, \phi_2, \phi_3, \theta_{10}, \theta_{11}, \theta_{20}, \theta_{21}, \theta_{30}, \theta_{31}\}$$&lt;p&gt;
and the equation&lt;br&gt;
&lt;/p&gt;</description></item></channel></rss>