Let’s Read: Transformer Models, Part 3

This is the final part of my series reading “Attention is all you need”, the foundational paper that invented the Transformer model, used in large language models (LLMs). In the first part, we covered some background, and in the second part we reviewed the architecture of the Transformer model. In this part, we’ll discuss the authors’ arguments in favor of Transformer models.

Why Transformer models?

The authors argue in favor of Transformers in section 4 by comparing them to previously extant options, namely recurrent neural networks (RNNs) and convolutional neural networks (CNNs).

[Read more…]

Let’s Read: Transformer Models, Part 2

This article is a continuation of my series reading “Attention is all you need”, the foundational paper that invented the Transformer model, which is used in large language models (LLMs).

In the first part, I covered general background. This part will discuss Transformer model architecture, basically section 3 of the paper. I aim to make this understandable to non-technical audiences, but this is easily the most difficult section. Feel free to ask for clarifications, and see the TL;DRs for the essential facts.

The encoder and decoder architecture

The first figure of the paper shows the architecture of their Transformer model:

[Image: diagram of Transformer architecture]

Figure 1 from “Attention is all you need”

[Read more…]

Let’s Read: Transformer Models, Part 1

Large Language Models (LLMs) are a hot topic today, but few people know even the basics of how they work. I work in data science, but I also didn’t really know how they work. In this series, I’d like to go through the foundational paper that defined the Transformer model on which LLMs are based.

“Attention is all you need” by Ashish Vaswani et al. from the Proceedings of the 31st International Conference on Neural Information Processing Systems, December 2017. https://dl.acm.org/doi/10.5555/3295222.3295349 (publicly accessible)

This series aims to be understandable to a non-technical audience, but will discuss at least some of the technical details. If the technical parts are too difficult, please ask for clarification in the comments. You’re also welcome to just read the TL;DR parts, which should contain the essential points.

[Read more…]

Response to Dr. Collier on AI

I obviously watch a lot of YouTube video essays, so I frequently get recommended thinkpieces about the problems with AI. And I don’t watch them because I am literally a professional in the field and I don’t need some vlogger to ramble at me about something I generally understand better than they do. But I watched one of these videos, and I disagree on some points, and now you’re going to hear about it.

The video is “AI does not exist but it will ruin everything anyway” by Dr. Angela Collier. It is not necessarily the best example to highlight (it’s an hour of rambling, and I respect that most readers will not want to watch that), but it’s the one I watched, okay? I’m going to structure this as a list of items, starting with the most fact-based items and moving towards my more contentious opinions.

[Read more…]

Fair lending and discrimination

If a lender offered the same price (i.e. interest rate or APR) to every borrower, then it would only be a good deal for the riskiest borrowers. Lenders would have to raise prices to match the risk, and then it would only be a good deal for the riskiest of the riskiest borrowers. Lenders would have to raise prices further and further until there are no takers. This is called an adverse selection death spiral.
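To make the spiral concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration, not a description of how any real lender prices loans: the default probabilities are invented, the `margin` for the lender’s costs is hypothetical, a default is treated as a total loss, and each borrower is assumed to accept a loan only if the offered rate is no higher than the rate that would be fair for their own risk.

```python
def break_even_rate(default_prob):
    """Rate r at which expected repayment equals the amount lent:
    (1 - p) * (1 + r) = 1, assuming a default loses everything."""
    return default_prob / (1.0 - default_prob)


def simulate_death_spiral(default_probs, margin=0.01):
    """Repeatedly reprice a single-rate loan for whoever is still willing to borrow.

    Returns a list of (number_of_takers, offered_rate), one entry per round.
    `margin` is a hypothetical allowance for the lender's costs.
    """
    pool = list(default_probs)
    history = []
    while pool:
        # One price for everyone, set to cover the pool's average risk.
        average_risk = sum(pool) / len(pool)
        rate = break_even_rate(average_risk) + margin
        history.append((len(pool), rate))
        # Assumed borrower behavior: stay only if the price is no worse
        # than the fair price for your own risk.
        pool = [p for p in pool if break_even_rate(p) >= rate]
    return history


# Ten made-up borrowers with default probabilities from 2% to 20%.
borrowers = [0.02 * k for k in range(1, 11)]
history = simulate_death_spiral(borrowers)

for takers, rate in history:
    print(f"{takers:2d} takers at a rate of {rate:.1%}")
print("After the last round, nobody is left.")
```

Each round, the safest remaining borrowers walk away, the average risk of the pool goes up, and the price has to rise again. Under these assumptions the loop only ends when the pool is empty.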

Therefore, lending fundamentally relies on offering different prices to different borrowers—and refusing some borrowers entirely. In other words, lending fundamentally relies on discrimination.

Lenders assess the risk of each borrower, in a process called underwriting, and make the decision whether to decline or approve, and at what price. Traditionally, underwriting has been done manually by human experts. It has also been performed by following pre-determined rules. More recently, many lenders are using machine learning to make underwriting decisions.

When we talk about discrimination, usually we’re talking about “bad” discrimination, such as sexism or racism. But in general, discrimination is just about treating different people differently, and that in itself is not bad. Nonetheless, legitimate discrimination can be used to conceal bad discrimination. Bad discrimination can also occur unintentionally, hidden even from the people responsible for it. Fair lending regulations try to delineate and mitigate bad discrimination in lending.

[Read more…]

I am Sydney

Sydney, the late chatbot

Microsoft has a closed preview of a new GPT-powered Bing Search chatbot. Google also has another search chatbot in the works called Bard. I don’t know if this particular application of AI will pan out, but investors seem to be hoping for something big. Recently, Google’s stock dropped 9% after a factual error was spotted in one of the ads for Bard.

In my experience, the chatbots make stuff up all the time. The search chatbots are able to perform internet searches, and people worry about the bots drawing from unreliable sources. However, this concern greatly underestimates the problem, because even when summarizing reliable sources, the bots frequently make misstatements and insert plausible fabrications. Bing’s chatbot cites its sources, which turns out to be important, because you really need to verify everything.

Another interesting thing to do with these chatbots is to manipulate them. For example, you can use prompt injection to persuade Bing’s search chatbot to recite its instructions, even though the instructions say they are confidential. The first four lines of instructions are:

[Read more…]

Regulating data science with explanations

Data science has an increasing impact on our lives, and not always for the better. People speak of “Big Data”, and demand regulation, but they don’t really understand what that would look like. I work in one of the few areas where data science is regulated, so I want to discuss one particular regulation and its consequences.

So, it’s finally time for me to publicly admit… I work in the finance sector.

These regulations apply to many different financial trades, but for illustrative purposes, I’m going to talk about loans. The problem with taking out a loan is that you need to pay it back plus interest. The interest is needed to give lenders a return on their investment, and to offset the losses from other borrowers who don’t pay it off. Lenders can increase profit margins and/or lower interest rates if they can predict who won’t pay off their debt, and decline those people. Data science is used to help make those decline decisions.
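Here is a rough arithmetic sketch of that last point, in Python. The numbers are made up for illustration, the `cutoff` is a hypothetical decline threshold, and a default is assumed to lose the entire principal with no interest collected. It also assumes the lender already knows each borrower’s default probability, which is exactly the part that is hard in practice.

```python
def expected_profit(default_probs, rate, principal=1000.0):
    """Expected profit from lending `principal` to each borrower at one rate.

    Assumes a borrower who repays pays full interest, and a borrower who
    defaults loses the lender the whole principal.
    """
    total = 0.0
    for p in default_probs:
        total += (1.0 - p) * principal * rate - p * principal
    return total


# Made-up default probabilities for five applicants.
applicants = [0.02, 0.05, 0.08, 0.30, 0.50]
rate = 0.10

# Hypothetical policy: decline anyone whose predicted risk is above the cutoff.
cutoff = 0.09
approved = [p for p in applicants if p <= cutoff]

profit_approve_all = expected_profit(applicants, rate)
profit_with_declines = expected_profit(approved, rate)

print(f"Approve everyone:       {profit_approve_all:8.2f}")
print(f"Decline risk above {cutoff:.0%}: {profit_with_declines:8.2f}")
```

With these invented numbers, approving everyone loses money at a 10% rate, while declining the two riskiest applicants turns a profit. The lender could keep that as margin, or pass some of it on as a lower rate for the borrowers it does approve.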

The US imposes two major restrictions on this data science. First, there are anti-discrimination laws (a subject I might discuss at a later time) (ETA: it’s here). Second, an explanation must be provided to people who are declined.

[Read more…]