Environmental impact of LLMs


A reader asked: what is the environmental impact of large language models (LLMs)? So I read some articles on the subject, and made comparisons to other technologies, such as video games and video streaming. My conclusion is that the environmental footprint is large enough that we shouldn’t ignore it, but I think people are overreacting.

Pricing

I’m not an expert in assessing environmental impact, but I’ve had a bit of experience assessing computational prices for LLMs. Pricing might be a good proxy for carbon footprint, because it doesn’t just represent energy costs, but also the costs of building and operating a data center. My guess is that across many different kinds of computation tasks, the carbon footprint per dollar spent is roughly similar. And in my experience, LLMs are far from the most significant computational cost in a typical tech company.

To start with a simple example, I worked on one task that required transcribing audio, and then processing the transcript with an LLM. Transcribing the audio cost an order of magnitude more than processing the transcript. And that makes sense, because audio files are much, much larger than text files! I have not seen any public consternation about the environmental impact of voice-to-text transcription. I think that people are very suspicious of new technologies, while taking the costs of older technologies for granted.

You can check pricing for yourself; let me walk you through it. If you google “LLM pricing”, you can find many pages like this one. Right now, the most expensive model is Claude Opus (larger than you need for most use cases), which costs $15 per million input tokens and $75 per million output tokens. Suppose I briefly prompt Claude Opus to write a short blog post (a terrible waste, by the way; real bloggers are better). Let’s say it generates a short essay of about 1000 words. That’s roughly 1000 tokens of output, which costs 7.5 cents. Let’s compare that to AWS’s pricing for virtual machines. My personal computer has 16 GB of RAM, so a comparable machine is a t3.xlarge, which costs 16.6 cents per hour. So using an LLM to generate a blog post costs about as much as running my PC for half an hour. More realistically, you would use a smaller model such as GPT 3.5 (what ChatGPT uses on the free tier), and that’s about 5 times cheaper than Claude Opus.
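For anyone who wants to redo this arithmetic with other prices, here is a minimal sketch. The two prices are the ones quoted in this post and will drift over time; the function and variable names are made up for illustration, and the short prompt is ignored because its cost is negligible next to the output.

```python
# Prices as quoted in the post; check current pricing pages before reusing them.
OPUS_OUTPUT_USD_PER_MILLION_TOKENS = 75.0
T3_XLARGE_USD_PER_HOUR = 0.166


def generation_cost_usd(output_tokens, usd_per_million_tokens):
    """Cost of generating `output_tokens` tokens, ignoring the short prompt."""
    return output_tokens * usd_per_million_tokens / 1_000_000


def equivalent_vm_hours(cost_usd, vm_usd_per_hour):
    """How many hours of a virtual machine the same money would buy."""
    return cost_usd / vm_usd_per_hour


essay_cost = generation_cost_usd(1000, OPUS_OUTPUT_USD_PER_MILLION_TOKENS)
vm_hours = equivalent_vm_hours(essay_cost, T3_XLARGE_USD_PER_HOUR)
print(f"essay: {essay_cost * 100:.1f} cents, "
      f"about {vm_hours * 60:.0f} minutes of t3.xlarge")
```

With these inputs the essay comes out to 7.5 cents, a bit under half an hour of machine time.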

It’s not nothing. But, most tech workers are running PCs for eight hours a day, and sometimes larger virtual machines, and hardly anyone seems to worry about that.

The real concern is when companies start using LLMs in industrial quantities. For instance, if Amazon decided to feed every product review into an LLM to generate summaries, that would be very expensive. But I don’t know how that would compare to their other computational costs.

Environmental Impact

Of course, pricing is just a proxy for environmental impact, and we want some more direct estimates. Let’s take a look at a popular article, “What Do Google’s AI Answers Cost the Environment?” in SciAm.

The first claim is “When compared to traditional search engines, AI uses ‘orders of magnitude more energy.’” This is not a very useful comparison, because I wasn’t previously worried about the carbon footprint of search engines, so if you tell me I should be 100 times more worried about AI search engines, I really don’t know how worried to be! (Also, there are several completely distinct ways in which AI could be used in a search engine: interpreting queries, finding results, and summarizing results. The article is unclear about which of these it means.)

The next claim is “the large language model BLOOM emitted greenhouse gases equivalent to 19 kilograms of CO2 per day of use, or the amount generated by driving 49 miles in an average gas-powered car.” Okay, but I can imagine lots of individuals driving 49 miles per day, while it would be extremely unusual for a single individual to run large language model queries for 24 hours straight. Keep in mind that when you use ChatGPT, you aren’t using it continuously; you’re spending a lot of time idling. While you’re idling, those computational resources aren’t getting wasted; they’re going to some other user.

Next: “Others have estimated in research posted on the preprint server arXiv.org that every 10 to 50 responses from ChatGPT running GPT-3 evaporate the equivalent of a bottle of water to cool the AI’s servers.” This comparison is unimpressive on its face, because a bottle of water is very obviously not that much? According to one website, a quarter pound of beef consumes about 2000 gallons of water, and one cup of soda consumes 350 cups of water. Also, the claim made me imagine drinking a bottle of water, until I realized that the key word was “evaporate”. Based on a quick calculation, evaporating 1 L of water requires at the very least 0.6 kWh, and producing that much electricity consumes 30-60 L of water. So consider me baffled by how they lowballed their own estimate.
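The 0.6 kWh figure can be checked from the latent heat of vaporization of water. This sketch uses standard textbook constants (they are not from the cited article), and it is a lower bound because it ignores the energy needed to warm the water up first. The function name is made up for illustration.

```python
# Standard physical constants, not taken from the cited article.
LATENT_HEAT_KJ_PER_KG = 2260.0  # latent heat of vaporization of water near 100 °C
KJ_PER_KWH = 3600.0             # 1 kWh = 3600 kJ
WATER_KG_PER_LITER = 1.0


def evaporation_energy_kwh(liters):
    """Minimum energy to evaporate `liters` of water that is already at boiling point."""
    mass_kg = liters * WATER_KG_PER_LITER
    return mass_kg * LATENT_HEAT_KJ_PER_KG / KJ_PER_KWH


print(f"{evaporation_energy_kwh(1.0):.2f} kWh per liter")
```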

SciAm also gives an estimate of total consumption: “Data centers, including those that house AI servers, currently represent about 1.5 percent of global energy usage but are projected to double by 2026, at which point they may collectively use as much power as the country of Japan does today.” The problem is, this doesn’t tell us how much of that energy is spent on large language models vs all the other things data centers are used for.

Another article says, “A recent study found that training just one AI model can emit more than 626,000 pounds of carbon dioxide, which is equivalent to nearly five times the lifetime emissions of an average American car.” That sounds like a lot, but most companies aren’t training their own models; they’re just using models off the shelf. How many new LLMs are trained every year anyway?

Here’s an assessment from another article:

Even if inference uses 100 times as much as training, and even if there are 100 models as popular as ChatGPT, these LLMs still account for only 5 million tonnes CO2e (100 x 100 x 500 = 5 million): 0.01% of global emissions or at most half a percent of global ICT [information and communications technology] emissions.

The assumption here is that ChatGPT is 1% of total LLM costs, and that training a model accounts for 1% of that model’s total cost. The assumptions seem sketchy to me, but that’s the only total estimate I have.
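To make the structure of that estimate explicit, here is the multiplication written out. Reading the 500 as tonnes of CO2e for training one ChatGPT-scale model is my interpretation of the quote, and the global total is not an independent figure: it is simply backed out of the quoted 0.01% share.

```python
# Inputs taken from the quoted estimate; the meaning of 500 is an assumption.
TRAINING_TONNES = 500        # assumed: tonnes CO2e to train one ChatGPT-scale model
INFERENCE_MULTIPLIER = 100   # "inference uses 100 times as much as training"
POPULAR_MODELS = 100         # "100 models as popular as ChatGPT"
QUOTED_GLOBAL_SHARE = 0.0001 # the quoted 0.01% of global emissions

total_tonnes = TRAINING_TONNES * INFERENCE_MULTIPLIER * POPULAR_MODELS

# Global emissions implied by the quote's own percentage, not a measured value.
implied_global_tonnes = total_tonnes / QUOTED_GLOBAL_SHARE

print(f"LLM total: {total_tonnes / 1e6:.0f} million tonnes CO2e")
print(f"implied global emissions: {implied_global_tonnes / 1e9:.0f} billion tonnes CO2e")
```

Changing either multiplier by a factor of ten moves the total by the same factor, which is why the result is only as good as those two guesses.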

Personally, my takeaway is that people should stop eating beef.

Comparable technologies

I think part of the reason people worry about the environmental impact of LLMs is that we’ve been primed by the previous technological fad, cryptocurrency. Cryptocurrency is extremely wasteful because proof-of-work incentivizes an arms race among crypto miners to build bigger and bigger data centers. According to this page, a single Bitcoin transaction generates 433 kg of carbon dioxide, the equivalent of watching 72 thousand hours of YouTube. It consumes power equivalent to 26 days of household power consumption. It also uses 12,000 liters of water, which is about one swimming pool. Incredible.

In total, Bitcoin generates 96 million tons of carbon dioxide annually, which is more than an order of magnitude greater than the earlier estimate for LLMs. And remember, this is for a service used by a tiny group of crypto geeks.

Okay, but crypto is an easy target. What about more common activities, like streaming videos? According to this estimate, streaming one hour of video consumes about 0.077 kWh of electricity, producing about 0.036 kg of carbon dioxide. If you recall, the estimate for running BLOOM was 19 kg/day. So minute by minute, running BLOOM has a carbon footprint that is about 20 times larger than streaming Netflix for the same amount of time. But it’s a lot easier to imagine streaming an hour of video than it is to imagine running an LLM for an hour (not including idle time).
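Here is that comparison as a quick sketch, using only the two figures quoted above. The variable names are made up for illustration.

```python
# Figures as quoted in the post.
BLOOM_KG_PER_DAY = 19.0        # kg CO2 per day of running BLOOM
STREAMING_KG_PER_HOUR = 0.036  # kg CO2 per hour of video streaming

# Put both on the same per-hour basis before comparing.
bloom_kg_per_hour = BLOOM_KG_PER_DAY / 24
ratio = bloom_kg_per_hour / STREAMING_KG_PER_HOUR

print(f"BLOOM: {bloom_kg_per_hour:.2f} kg CO2 per hour")
print(f"ratio to streaming: about {ratio:.0f}x")
```

The ratio lands a little above 20, which is where the “about 20 times” figure comes from.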

Next: video games. According to one estimate, the entire games industry (including developers and players) produced between 3 and 15 million tons of carbon dioxide in 2020, which is comparable to the global film industry. That’s in the same range as the (admittedly dubious) estimate of the total LLM footprint. Of course, personally I derive much greater joy from the global games industry than from the global LLM industry.

Why we’re concerned

Let’s wrap this up with a more conceptual understanding of what’s going on.

An LLM is a neural network, and neural networks mainly involve performing a lot of matrix multiplication.  Training an LLM basically means deciding what numbers go in each cell of each matrix.  There are a lot of matrices, and they’re quite large, so all in all there are about 100 billion parameters.  Some models are larger, and some are smaller.  Typically, each parameter takes 2 bytes, so simply storing an LLM requires hundreds of gigabytes.
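The storage figure is a one-line calculation. This sketch assumes 2 bytes per parameter, as stated above, and uses decimal gigabytes; the function name is made up for illustration.

```python
def model_size_gb(n_parameters, bytes_per_parameter=2):
    """Storage needed for the raw weights, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_parameters * bytes_per_parameter / 1e9


# A 100-billion-parameter model at 2 bytes per parameter.
print(f"{model_size_gb(100e9):.0f} GB")
```

Smaller models scale down proportionally: a model with a tenth of the parameters needs a tenth of the storage.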

Those hundreds of gigabytes are what allow LLMs to be so versatile.  But given any specific task, far fewer parameters would have sufficed.  So there is a sense in which LLMs are inefficient.  You’re bringing 100 billion parameters to bear on a narrow issue, because you didn’t want to invest the resources to develop a more narrowly tailored tool.  LLMs are essentially a form of software bloat, made possible by the ever decreasing costs of computation.

So, on a conceptual level, I really don’t think people are wrong to worry about the environmental impact of large language models. I commend the researchers who looked into the subject, and came up with a bunch of (frequently dubious) analogies to explain the size of the footprint. That’s important work. But I’m also not very impressed by the numbers, and if anything I have been surprised by how small the footprint is. I think the public’s attention to the environmental impact has not been proportionate to the size of the impact.

But I think it’s fair for readers to look at these scattered estimates and draw different conclusions from me. Obviously, if you feel LLMs are useless then the environmental impact is too high for what you get in return.

Comments

  1. says

    im inclined to agree broadly with your assessment, but when I think about the inefficiencies and how they’re about to multiply, i can’t help but think of this as a capitalism problem, yet again.

    it’s in the usefulness you mentioned in the last paragraph: what are LLMs used for now and what are corporations planning to use them for? if this enterprise could be driven by the best interests of humanity instead of profit, how much more efficient could it be?

    jean q public would be hard-pressed to argue against an environmentally costly computer process that brought together disparate medical and scientific info to propose testable cures for cancer, but instead we have a few dozen companies chasing investor dollars by coming up with their own redundant applications of something we already know is a fundamentally flawed bullshit generator.

    even with the flaws of gpt 3.5 it can be a very useful polite bullshit generator. i listen to the banal repetitive conversations of most people and think this *could be* a fully adequate replacement for an average person, conversationally, which to me suggests we could be on the verge of curing loneliness. i’m totally serious about that.

    that alone is worth an environmental cost to me, especially when any given thing we do is a trade-off for something else, and it’d reduce time spent streaming fox news, for example.

    idk. i usually do not care one jot about tech innovations, but the new leap in AI has such radical potential, it just sucks to see it percolating in the late crapitalist hellscape.

  2. says

    for another topic i might like to see u strap on the speculative fiction hat and generate some ideas for: how could the tech be used if human well-being was the only consideration, how is it actually going to be used, and more imaginatively, where could this tech go in the future?
