There has been a great deal of buzz about the latest developments in AI such as ChatGPT. There have been practical concerns about how dangerous it might be to develop such systems, but also skepticism that the current incarnations of AI are overblown: that they are merely large language models that comb massive databases of text for patterns and then use those patterns to produce a facsimile of intelligence, similar in principle to Siri and Alexa or to the algorithms that autocorrect words and suggest the next word in our text messages, only far more sophisticated.
Leaving aside those issues, Elizabeth Kolbert writes about a very practical application of large language models: trying to decipher whale communication, because whale vocalizations seem to follow regular patterns.
The world’s largest predators, sperm whales spend most of their lives hunting. To find their prey—generally squid—in the darkness of the depths, they rely on echolocation. By means of a specialized organ in their heads, they generate streams of clicks that bounce off any solid (or semi-solid) object. Sperm whales also produce quick bursts of clicks, known as codas, which they exchange with one another. The exchanges seem to have the structure of conversation.
…Since then, cetologists have spent thousands of hours listening to codas, trying to figure out what that function might be. Gero, who wrote his Ph.D. thesis on vocal communication between sperm whales, told me that one of the “universal truths” about codas is their timing. There are always four seconds between the start of one coda and the beginning of the next. Roughly two of those seconds are given over to clicks; the rest is silence. Only after the pause, which may or may not be analogous to the pause a human speaker would put between words, does the clicking resume.
Codas are clearly learned or, to use the term of art, socially transmitted. Whales in the eastern Pacific exchange one set of codas, those in the eastern Caribbean another, and those in the South Atlantic yet another. Baby sperm whales pick up the codas exchanged by their relatives, and before they can click them out proficiently they “babble.”

The whales around Dominica have a repertoire of around twenty-five codas. These codas differ from one another in the number of their clicks and also in their rhythms. The coda known as three regular, or 3R, for example, consists of three clicks issued at equal intervals. The coda 7R consists of seven evenly spaced clicks. In seven increasing, or 7I, by contrast, the interval between the clicks grows longer; it’s about five-hundredths of a second between the first two clicks, and between the last two it’s twice that long. In four decreasing, or 4D, there’s a fifth of a second between the first two clicks and only a tenth of a second between the last two. Then, there are syncopated codas. The coda most frequently issued by members of Unit R, which has been dubbed 1+1+3, has a cha-cha-esque rhythm and might be rendered in English as click . . . click . . . click-click-click.
If codas are in any way comparable to words, a repertoire of twenty-five represents a pretty limited vocabulary. But, just as no one can yet say what, if anything, codas mean to sperm whales, no one can say exactly what features are significant to them. It may be that there are nuances in, say, pacing or pitch that have so far escaped human detection. Already, CETI team members have identified a new kind of signal—a single click—that may serve as some kind of punctuation mark.
At present, we know that these codas seem to serve the purpose of language for whales, but we don’t know what they are saying. It has occurred to some researchers that large language models could be fed a database of codas and might then be able to interpret their meanings. Doing that would require collecting a large database of whale codas.
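To get a sense of what such a database might contain, here is a minimal sketch of one way a coda could be digitized: as the list of time intervals between successive clicks. The coda names and the 7I and 4D interval figures come from the passages quoted here; the remaining numbers are interpolated or invented purely for illustration.

```python
# Represent each coda as its inter-click intervals, in seconds.
# The 7I and 4D figures are from the article; the rest are illustrative guesses.
codas = {
    "3R": [0.10, 0.10],                          # three evenly spaced clicks
    "7R": [0.10] * 6,                            # seven evenly spaced clicks
    "7I": [0.05, 0.06, 0.07, 0.08, 0.09, 0.10],  # intervals grow longer
    "4D": [0.20, 0.15, 0.10],                    # intervals shrink
    "1+1+3": [0.40, 0.40, 0.10, 0.10],           # click . . click . . click-click-click
}

for name, gaps in codas.items():
    clicks = len(gaps) + 1  # n intervals separate n + 1 clicks
    print(f"{name}: {clicks} clicks over {sum(gaps):.2f} s of clicking")
```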
CETI [Cetacean Translation Initiative] has its unofficial headquarters in a rental house above Roseau, [Dominica’s] capital. The group’s plan is to turn Dominica’s west coast into a giant whale-recording studio. This involves installing a network of underwater microphones to capture the codas of passing whales. It also involves planting recording devices on the whales themselves—cetacean bugs, as it were. The data thus collected can then be used to “train” machine-learning algorithms.
…The most famous whale calls are the long, melancholy “songs” issued by humpbacks. Sperm-whale codas are neither mournful nor musical. Some people compare them to the sound of bacon frying, others to popcorn popping. That morning, as I listened through the headphones, I thought of horses clomping over cobbled streets. Then I changed my mind. The clatter was more mechanical, as if somewhere deep beneath the waves someone was pecking out a memo on a manual typewriter.
…As anyone who has been conscious for the past ten months knows, ChatGPT is capable of amazing feats. It can write essays, compose sonnets, explain scientific concepts, and produce jokes (though these last are not necessarily funny). If you ask ChatGPT how it was created, it will tell you that first it was trained on a “massive corpus” of data from the Internet. This phase consisted of what’s called “unsupervised machine learning,” which was performed by an intricate array of processing nodes known as a neural network. Basically, the “learning” involved filling in the blanks; according to ChatGPT, the exercise entailed “predicting the next word in a sentence given the context of the previous words.” By digesting millions of Web pages—and calculating and recalculating the odds—ChatGPT got so good at this guessing game that, without ever understanding English, it mastered the language. (Other languages it is “fluent” in include Chinese, Spanish, and French.)
In theory at least, what goes for English (and Chinese and French) also goes for sperm whale. Provided that a computer model can be trained on enough data, it should be able to master coda prediction. It could then—once again in theory—generate sequences of codas that a sperm whale would find convincing. The model wouldn’t understand sperm whale-ese, but it could, in a manner of speaking, speak it. Call it ClickGPT.
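As a rough illustration of the “guessing game” Kolbert describes, here is a minimal sketch of next-token prediction using simple frequency counts rather than a neural network. The coda sequences below are invented for the example; only the coda names come from the article.

```python
# Predict the next coda from the previous one by counting, over a training
# corpus, which coda most often follows which (a bigram model -- the crudest
# possible stand-in for an LLM's next-token prediction).
from collections import Counter, defaultdict

# Hypothetical training corpus: each item is one exchange of codas.
corpus = [
    ["1+1+3", "3R", "7R", "3R"],
    ["1+1+3", "3R", "7I", "4D"],
    ["1+1+3", "3R", "7R", "4D"],
]

follows = defaultdict(Counter)
for sequence in corpus:
    for current, nxt in zip(sequence, sequence[1:]):
        follows[current][nxt] += 1

def predict_next(context):
    """Return the most frequent continuation of `context`, or None."""
    counts = follows.get(context)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("1+1+3"))  # '3R' -- it follows 1+1+3 in every exchange
print(predict_next("3R"))     # '7R' -- the most common continuation
```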
…Andreas told me that CETI had already made significant strides, just by reanalyzing Gero’s archive. Not only had the team uncovered the new kind of signal but also it had found that codas have much more internal structure than had previously been recognized. “The amount of information that this system can carry is much bigger,” he said.
“The holy grail here—the thing that separates human language from all other animal communication systems—is what’s called ‘duality of patterning,’ ” Andreas went on. “Duality of patterning” refers to the way that meaningless units—in English, sounds like “sp” or “ot”—can be combined to form meaningful units, like “spot.” If, as is suspected, clicks are empty of significance but codas refer to something, then sperm whales, too, would have arrived at duality of patterning. “Based on what we know about how the coda inventory works, I’m optimistic—though still not sure—that this is going to be something that we find in sperm whales,” Andreas said.
There have long been heated debates about whether language is unique to humans. While developments like ChatGPT are interesting, this particular application of the technology is the most exciting one I have encountered so far. Communicating with other species would open up a wonderful new world.
steve oberski says
In David Brin’s “Uplift Universe” novels, the only thing of value that humans can offer the Five Galaxies (a multitude of sapient races that has existed for billions of years) are recordings of whale songs, and in fact the Five Galaxies value the whale civilization far more than they do human civilization.
The only reason that more severe sanctions were not imposed on humans for the ecological carnage they inflicted on the Earth is the fact that they had “uplifted” (in which a “patron” species genetically modifies a pre-sapient “client” species until it is sapient) several species at the time of first contact (chimpanzees, gorillas, dolphins).
moarscienceplz says
When ChatGPT was trained in Chinese, were the trainers themselves ignorant of Chinese? Because that would be analogous to the sperm whale situation. If all that happens is ChatGPT learns common patterns of codas, it might generate something equivalent to “purple grape shot in the arm,” where each word is mathematically related to its neighbors but the totality is gibberish. ISTM we need to first generate a dictionary of whale codas, which would entail cataloging whale behavior after they hear some codas. Has anyone fed millions of hours of TV shows to ChatGPT and had it learn English verbally? That is what would be needed to learn whale talk IMO.
jimf says
The model wouldn’t understand sperm whale-ese, but it could, in a manner of speaking, speak it.
And what, precisely, does anyone gain if I can speak German but do not know what I am saying? I guess it might be good for a comedy sketch…
https://youtu.be/grA5XmBRC6g
moarscienceplz says
@jimf
I knew immediately what clip that link would take me to. Thanks, and may your hovercraft always be full of eels.
SchreiberBike says
“The model wouldn’t understand sperm whale-ese, but it could, in a manner of speaking, speak it.”
The same with the large language models that understand English; they can respond to what we say in convincing ways, but we have no idea what, if anything, they understand. Similarly I have no idea what any of you all understand, but I’m operating on the assumption that you are human and think in ways somewhat similar to myself.
Translating another language without context is another whole kettle of cetaceans.
Heidi Nemeth says
Many conversations are uninteresting to me. I doubt I’d find myself fascinated by sperm whale conversations if they are all about where the squid are, how many squid there are, and grandma squid’s aches and pains. The conversations would be more interesting if they involved sex, danger, past, present, and future. Most interesting would be if they covered relationships and emotions. How well does ChatGPT deal with emotions?
sonofrojblake says
If the first translated sperm whale communication turned out to be the words “oh no not again”, a million geek pedants’ heads would explode.
Pierce R. Butler says
Has anybody tried large-language models on the vocalizations of animals that ethologists have already studied in depth and whose typical meanings they have a (putative) grasp of? A few practice runs with primates, songbirds, etc., would make more sense than jumping into the (very) deep end with organisms whose brains are larger than ours and consist mostly of sound-processing circuits.
John Morales says
“This phase consisted of what’s called “*unsupervised* machine learning,” which was performed by an intricate array of processing nodes known as a neural network.”
(My emphasis)
Marcus Ranum says
I find this whole topic fascinating, and keep thinking of Carl Safina’s book How Animals Think and Feel -- I read it a few years ago and recommend it. His point is basically that, “of course animals are trying to communicate -- that’s what the noises they make are for!” Well, duh. At the time I read the book I had two dogs who spoke to each other all the time in dog-bro (they were littermates) and would make these chirbling noises that apparently they understood between themselves. One thing that made me think: maybe we have situations where a pair of dog-bros grows up together, not in the context of a dog pack, and develops their own language extensions, like some children also do. How would we know?
I knew someone who used to research crows’ speech, and apparently many birds have species-specific vocalizations, but also develop regional dialects. e.g.: a crow from Alabama might sound like a redneck, or something.
Many years ago I tangled online with what I can only describe as a “human supremacist” -- a deeply religious clown who wanted to push the idea that animals have no souls, and therefore no inner life, and therefore cannot communicate. I found that “reasoning” to be absolutely absurd, and -- literally -- asked my dogs, who did what they usually did when I talked to them about something at length: sat down, perked up their ears, put attentive expressions on their faces, and pretty much completely failed to understand a thing I was saying. But it was obvious they were trying; per Safina, if they didn’t understand me at all, they would not have reacted. The fact that they sat still and listened was communication -- it just was not very effective.
Safina’s book was pretty eye opening, since it made me realize that, yeah, anyone who has known a dog ought to understand that when they growl, it’s communication. And when a whale pings, it’s communication. And when a cat pisses on your shoes, it’s communication.
I think the problem with training AI to communicate is that you need to have a “… therefore” attached to the communication. I would need to tell the AI, “If I say the word ‘sit’ to my dog, it sometimes sits.” If I tell a whale, “come here, big guy” and it comes over to me 90% of the time, is it just curious or have we learned to communicate? That’s not that big a problem, though, because I can vary the inputs and observe the outputs just like we do when we are teaching a human child. I would also hypothesize that two AIs that wanted to exchange a language could do so by performing the signaling and variation at warp speed.
Like with a human child, why should we hold AI to a standard that is unrealistic? It’ll get some things wrong -- but part of communication is correcting communication errors. I don’t know how to say “I don’t understand you” in whale but my dog-bros sure knew how to say it with their ears and carefully constructed confused expressions. I wonder if I can still find the videos I made of me talking to them -- they were pretty cute. Or, they were to me. Maybe a pair of 150lb dogs attempting a “friendly grin” might scare the crap out of someone who didn’t know dogs.
Matt G says
“So long, and thanks for all the krill.”
xohjoh2n says
@10:
Ha! That’s what you think. That ear twitch actually meant “I think when we take over I’ll recommend we keep this one -- he’s pretty useless, but somewhat amusing.”
John Morales says
Well, I did wait.
“Using large language models to understand whales”
≠
“Using large language models to *try to* understand whales”
I’m reminded of two Gary Larsen cartoons:
One in which three doodled french poodles (in the foreground) are discussing murdering their owner (shown doing dishes in the background)
“Well, yes, that is the downside, Fluffy. When we kill her, the pampering will end.”
Another where the scientist is walking down the street where dogs are vocalising, wearing his almost undetectable apparatus:
“Donning his new canine decoder, Professor Schwartzman becomes the first human being on Earth to hear what barking dogs are actually saying.”
John Morales says
[Larson]
Silentbob says
Since Morales apparently can’t be bothered:
https://live.staticflickr.com/4121/4872650427_cdec0a24a3_b.jpg
https://pbs.twimg.com/media/F8T3ncvWAAAdf0c.jpg:large
Srsly, dude. The second one doesn’t even work without the image. Think before posting. *facepalm*
John Morales says
<snicker>
I’ve got a coolie to do my work for me, bub.
John Morales says
But hey:
http://hyperboleandahalf.blogspot.com/2010/03/animals.html
sonofrojblake says
@mjr, 10:
That’s not necessarily true though -- there’s the confounding factor that when a whale pings, it might very well simply be attempting to sense its surroundings. That emitted sound may have no informational content AT ALL -- its function being to reflect off things and provide information with its echoes. I’m put in mind of the noises my three-year-old son makes when we walk through a tunnel or other acoustically interesting space. He’s not communicating anything -- he’s experiencing the different way the noise bounces back in this space compared to what he’s used to. It’s an entirely internal experience to him, like, say, looking at a rainbow; it’s just that I can’t HEAR him looking at a rainbow. I can’t learn anything about his mental state from the noise he makes, though, because the noise has no content… it’s just a noise. How could an AI “understand” that?
It’s a problem almost unique to cetaceans (bats are conceivably similar?), which just reinforces the sense that doing this with almost ANY land animal vocalisation would be more likely to yield results than with creatures that we KNOW use their “voice” for things other than stuff that actually sounds like “talking”.
Holms says
“Coolie”? Real nice.
John Morales says
I know, Holms. Bloody excellent.
(And self-elected, at that!)
Raging Bee says
And what, precisely, does anyone gain if I can speak German but do not know what I am saying?
“I will not buy this large language model, it is scratched.”
birgerjohansson says
Sonofrojblake @ 7
It took me an hour to get it.
sonofrojblake says
@birgerjohansson,22: that makes me very happy 🙂
moarscienceplz says
@#9 John Morales
I reject that assertion. I don’t see any way to have an “AI” “learn” a language without giving it correct feedback from people competent in that language.
John Morales says
https://en.wikipedia.org/wiki/Unsupervised_learning
sonofrojblake says
@John Morales, 25:
You’re obviously an expert.
As such, I’m sure you can explain the following, given how the link you supplied describes the training of a network.
You can surely explain how, in the complete absence of anyone competent to discriminate between correct output and error, an unsupervised network can nevertheless be successfully trained. Do enlighten us.
John Morales says
sonofrojblake:
Well, I can obviously read the OP and quote from it.
So I suppose that’s obvious expertise.
Not to the likes of you.
Sure:
“Basically, the “learning” involved filling in the blanks; according to ChatGPT, the exercise entailed “predicting the next word in a sentence given the context of the previous words.” By digesting millions of Web pages—and calculating and recalculating the odds—ChatGPT got so good at this guessing game that, without ever understanding English, it mastered the language.”
cf. https://en.wikipedia.org/wiki/Chinese_room
sonofrojblake says
@John Morales, 27:
So, “no”, then. As you go on to demonstrate.
You’ve very obviously not thought this through. The LLM training you’re talking about involved using a training set consisting of millions of pages of known-to-be-valid English text inputs. Its digestion of that training set was, yes, “unsupervised”. Whoopy fuckin doo.
But it was only useful for producing outputs in English because there was someone upfront filtering what went into that training set, someone who understood English.
How well could you produce a training set for language X if
(a) you can’t initially be sure you’re even dealing with language at all per se
(b) you can’t work with text, which is readily categorisable, but instead have to work with just sounds, and far more importantly
(c) you have no idea whether the noise you’re hearing is
-- gibberish (is that a whale word, or just the equivalent of the noise my kid makes when he knows it will echo?)
-- a valid expression in language X
-- where that expression starts and stops (is that one word, or two?)
-- a valid expression in the mutually unintelligible language Y (is that English, or Chinese?)
-- a valid expression in the mutually unintelligible language Z (oh, hang on, could be Hungarian…)
How well do you suppose ChatGPT would answer questions in English if its training set had just been twenty million randomly selected pages of text in all human languages, and another 980 million of just random text? It is the complete inability, even in principle, to discriminate errors from correct outputs at any stage of the process that makes what’s being discussed here… challenging.
John Morales says
Exactly. Turns out I am not an expert, whether obviously or not.
But I can read and quote, expertly.
I can read, but.
“This phase consisted of what’s called “*unsupervised* machine learning,” which was performed by an intricate array of processing nodes known as a neural network.”
(My emphasis)
“To find their prey—generally squid—in the darkness of the depths, they rely on echolocation. By means of a specialized organ in their heads, they generate streams of clicks that bounce off any solid (or semi-solid) object. Sperm whales also produce quick bursts of clicks, known as codas, which they exchange with one another. The exchanges seem to have the structure of conversation.”
Training set of communications in English, training set of communications in Whalish. Same thing to the algorithm.
(Presumably, whales communicate in whalish just as people communicate in peoplish)
Wow, you really are dense.
Issue at hand is what I addressed directly: “When ChatGPT was trained in Chinese, were the trainers themselves ignorant of Chinese?”
Again, there are no trainers either in English or in Chinese or in Whalish.
(There’s a dataset upon which the LLM trains itself — unsupervised)
—
Get it into your head: I am quoting the OP, you are regurgitating it.
(Remind you of anything? 😉 )
John Morales says
[exercise]
The wages of sin is _____.
birgerjohansson says
It would be interesting to “teach” a large language model English using the dialogue of a typical Spike Lee film.
John Morales says
PS a bit more than a million pages: https://community.openai.com/t/what-is-the-size-of-the-training-set-for-gpt-3/360896
Dunc says
Very basically, you take your corpus of candidate training material and you split it into a training set and testing set, to which you apply partial masking. You train the model on the training set, then you see how well it can reproduce masked bits of the test set. If your corpus is garbage, there are no consistent patterns to be detected, and so the model will not be able to reproduce the test set. If you can train the model to reproduce the test set from the training set, then there must be a consistent pattern across the whole corpus.
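To make that concrete, here is a toy version of the procedure Dunc describes, using frequency counts in place of a neural model. The token sequences are invented, and a real pipeline would operate on a vastly larger corpus.

```python
# Train a simple model on part of a corpus, mask tokens in the held-out part,
# and score how often the model recovers them.
from collections import Counter, defaultdict

sequences = [
    ["a", "b", "c", "a", "b", "c"],
    ["a", "b", "c", "a", "b"],
    ["a", "b", "a", "b", "c"],
    ["a", "b", "c", "c", "a", "b"],
]
train, test = sequences[:3], sequences[3:]

# Train: count continuations of each token (a bigram model).
follows = defaultdict(Counter)
for seq in train:
    for cur, nxt in zip(seq, seq[1:]):
        follows[cur][nxt] += 1

# Test: mask each token after the first and ask the model to fill it in.
hits = trials = 0
for seq in test:
    for cur, masked in zip(seq, seq[1:]):
        guess = follows[cur].most_common(1)
        if guess and guess[0][0] == masked:
            hits += 1
        trials += 1

print(f"recovered {hits}/{trials} masked tokens")
# A corpus of pure noise has no consistent patterns, so the score would sit
# near chance -- that is the sense in which this test validates the corpus.
```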
Given the size of ChatGPT’s training corpus, I very much doubt that it was all pre-selected by people, although they were able to point it in the direction of vast swathes of known good(ish) material. Of course, for ChatGPT there were additional rounds of both supervised learning and reinforcement learning from human feedback, neither of which would be available in the case of sperm whales, but that doesn’t mean that a model trained purely through unsupervised learning is worthless. But yes, it’s unlikely to be able to have a completely convincing conversation.
I suspect that the researchers involved in this project are probably quite well aware of its limits -- probably rather more so than any interested layman who’s given the matter a few minutes thought. I also suspect that the pop-sci write up we’ve got here hasn’t fully captured those issues and complexities.
Silentbob says
@ Morales
Why do hyperliteral trolls keep posting pedantic messages on blogs where they are not welcome?
Actual ChatGPT response 🙂
Seems pretty cluey to me Juan Ramón, lol. X-D
John Morales says
Ah yes, Blob of Quietude.
Because you are yet to respond to this, one of my perennial questions to you:
What do you imagine is the difference between literal and “hyperliteral”?
Whence the prefix?
See, I’ve responded thus (corpus right there) each time you foolishly try to paint me as beyond literal. Your little neologism is vapid, of course.
(By its fruit shall ye know the tree)
Nah. Not even slightly. You are just saying that.
And, again, for the umpteenth time, yet another perennial question:
What is your natal name? You’ve latched onto mine since in the last couple of decades I have let it be known, and after due schooling you know how it goes.
So, PustulentBoil, what is yours?
—
And, of course, here is the pattern yet again manifest. From two of the current trio, though WMDK is a postulant.
All about me. Me, me, me.
Ah well. I am more interesting than the topic at hand, to these junkies.
(I am special)
Holms says
I notice the Voynich manuscript is still unsolved. Any takers from the machine learning advocates…? Though with its thousands of distinct words I suspect it will be a few orders of magnitude more complex than whales and their two dozen.
…
If he had gone with just ‘no’, the reply would have been useful and lacking any derision. And he can’t be having with that now, can he; there’s an image to maintain. So, extra verbiage specifically to add a jab.
Dunc says
@36: That’s an entirely different class of problem. You can’t do cryptanalysis with an LLM, although if you had a large enough sample of Voynich text, you could train one to produce more of it (assuming that it’s not just gibberish).
John Morales says
Ah, the usual triumvirate has had its toke.
Very psychoanalytic, that is.
(You mean ‘persona’?)
John Morales says
I notice you presume it’s solvable.
(cf. my #13)
sonofrojblake says
@John Morales, 29:
Corrected it for you.
And it generates the dataset itself, unsupervised?
Dunc, 33:
Here’s a thing: why has AI in its currently massively overhyped, glorified-auto-correct form only really blown up in the last 12-18 months? Because it’s only relatively recently that sufficiently massive corpuses (corpi?) of training material have become available and processable in less than years. And the success of these models has been pretty dependent on the staggering size of their training sets. A recently famous example of just one small subset of these things is “Books3”, a collection of pirated ebooks numbering close to two hundred THOUSAND.
Does the totality of all the available whale “communication” represent a corpus sufficient to fill even ONE book the size of the average “Books3” entry? Limited size of corpus is likely to massively limit the effectiveness of any training possible even if there were someone in a position to weed out the gibberish and sort the “communication” into separate sets depending on which “language” they were in (you didn’t assume all whales speak the same language, right?) -- which of course there isn’t.
I think what you’ve got here is an example of what’s a pretty common thing in science reporting -- someone trying to attach their possibly/probably legit research to whatever’s currently sexy, and right now it’s AI and LLMs.
I’ll believe it when they publish a translation of whale conversation. I won’t hold my breath…
Dunc says
@40
From the OP:
[My emphasis]
I’m pretty sure they’re not really trying to come up with an LLM that can converse with whales. They’re applying similar techniques and technologies to try and understand more about the structure of whale vocalisations, with apparently some success already.
To quote my own earlier comment: “I suspect that the researchers involved in this project are probably quite well aware of its limits — probably rather more so than any interested layman who’s given the matter a few minutes thought.”
sonofrojblake says
@Dunc, 41:
Yeah, fair enough -- I’m possibly just massively cynical about AI hype, having heard it before. Can it really be almost 20 years since the I-then-thought-relatively-reputable New Scientist magazine uncritically parroted a press release about “ChatNannies”, a supposed software agent that could spot grooming behaviour employed by paedophiles in online chatrooms? https://hoaxes.org/weblog/comments/chatnannies
Dunc says
There certainly is a depressingly large amount of bollocks AI hype out there right now, and it’s good to be cynical about it -- but equally, machine learning is a real thing with lots of interesting applications that’s going through an amazing growth phase at the moment. Telling the difference can be really quite hard, and the breathless pop-sci stuff doesn’t help much. This sounds legit to me though, once you strip out the obligatory references to ChatGPT. Not that I’m any kind of an expert…
Holms says
#39
No I don’t.
John Morales says
@44, yes, you do.
“is still unsolved” is only applicable to something solvable; something unsolvable can neither be solved nor unsolved.
(Your grasp of logic remains as strong as ever)
Holms says
Something that is not solved can be termed unsolved, whether it is solvable or not.
John Morales says
Heh heh heh.
Something that is not dead can be termed undead, whether it can die or not.
I have here an undead glass.
It is not yet dead, it will never be dead since it was never alive, so it is undead.
(Oh, right. +1 to your thread quota of persistent pointless personal protestations)
Holms says
Analogy fail. Something that is not solved can be termed unsolved, without even knowing if it is possible to be solved.
The point is to correct lousy reasoning. You are a prolific producer of such.
You take disagreement with your reasoning personally?? This explains a lot!
John Morales says
+1. A few more to go, Holms.
Of course, you have nothing to offer on the topic at hand.
It’s all about your perception of me.
(Me, me, me)
Holms says
I mentioned the Voynich in relation to solving language; you then jumped on that to make a silly claim about my use of ‘unsolved’ rather than comment on topic. Thus the diversion from the topic was, as usual, all you.
John Morales says
Solving language!
(Why, is it unsolved?)
+1
John Morales says
It’s always about me, where you, bobiferous, and the other are concerned.
But I do get it. I am more interesting than the topic at hand.
(AI, whales)
sonofrojblake says
Holms -- you’re feeding it.
John Morales says
Heh heh heh.
Whale songs.
Holms says
Look in a mirror. #36 was on topic, your response to it was the tangent. All I do from there is swat at the dumb stuff you post.
John Morales says
Heh heh heh.
And you try to pretend that it’s not all about me.
<snicker>
+1.
Holms says
It began with you being wrong about my on-topic #36, so, yes. Mirror.
But sure, your usual spin. Other replies to you = ‘all about you’; your replies to others = [ignored].
sonofrojblake says
Holms — you’re feeding it.
John Morales says
Ah well, persistent patterns persist.
That’s enough for now, Holms.
(“it” is a cute thing it says)
John Morales says
Related to the actual topic: https://www.theguardian.com/science/2023/oct/12/researchers-use-ai-to-read-word-on-ancient-scroll-burned-by-vesuvius
Holms says
See you in the next pedantic tangent!
John Morales says
[boo!]