
Using large language models to understand whales

There has been a great deal of buzz about the latest developments in AI, such as ChatGPT. Some of the discussion concerns how dangerous it might be to develop such systems. There have also been concerns that the current incarnations of AI are overblown: that they are merely large language models that search massive databases of language for patterns and then use those patterns to provide a facsimile of intelligence, similar in principle to Siri, Alexa, and the algorithms that autocorrect words or suggest the next words in our text messages, only far more sophisticated.

Leaving aside those issues, Elizabeth Kolbert writes about a very practical application of large language models: trying to decipher whale communication, because the whales’ clicks seem to follow regular patterns.

The world’s largest predators, sperm whales spend most of their lives hunting. To find their prey—generally squid—in the darkness of the depths, they rely on echolocation. By means of a specialized organ in their heads, they generate streams of clicks that bounce off any solid (or semi-solid) object. Sperm whales also produce quick bursts of clicks, known as codas, which they exchange with one another. The exchanges seem to have the structure of conversation.
…

Since then, cetologists have spent thousands of hours listening to codas, trying to figure out what that function might be. Gero, who wrote his Ph.D. thesis on vocal communication between sperm whales, told me that one of the “universal truths” about codas is their timing. There are always four seconds between the start of one coda and the beginning of the next. Roughly two of those seconds are given over to clicks; the rest is silence. Only after the pause, which may or may not be analogous to the pause a human speaker would put between words, does the clicking resume.
Codas are clearly learned or, to use the term of art, socially transmitted. Whales in the eastern Pacific exchange one set of codas, those in the eastern Caribbean another, and those in the South Atlantic yet another. Baby sperm whales pick up the codas exchanged by their relatives, and before they can click them out proficiently they “babble.”

The whales around Dominica have a repertoire of around twenty-five codas. These codas differ from one another in the number of their clicks and also in their rhythms. The coda known as three regular, or 3R, for example, consists of three clicks issued at equal intervals. The coda 7R consists of seven evenly spaced clicks. In seven increasing, or 7I, by contrast, the interval between the clicks grows longer; it’s about five-hundredths of a second between the first two clicks, and between the last two it’s twice that long. In four decreasing, or 4D, there’s a fifth of a second between the first two clicks and only a tenth of a second between the last two. Then, there are syncopated codas. The coda most frequently issued by members of Unit R, which has been dubbed 1+1+3, has a cha-cha-esque rhythm and might be rendered in English as click . . . click . . . click-click-click.

If codas are in any way comparable to words, a repertoire of twenty-five represents a pretty limited vocabulary. But, just as no one can yet say what, if anything, codas mean to sperm whales, no one can say exactly what features are significant to them. It may be that there are nuances in, say, pacing or pitch that have so far escaped human detection. Already, CETI team members have identified a new kind of signal—a single click—that may serve as some kind of punctuation mark.
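To make the coda names in the excerpt concrete, here is a minimal Python sketch that labels a coda from the timing of its clicks. It is only an illustration of the naming scheme (3R, 7I, 4D) as described above, not the method the researchers use. The function names, the `tolerance` threshold, and the example timings other than those quoted are all assumptions made for this sketch.

```python
from itertools import accumulate
from typing import List


def from_gaps(gaps: List[float]) -> List[float]:
    """Turn a list of gaps between clicks (seconds) into click times."""
    return list(accumulate([0.0] + gaps))


def inter_click_intervals(click_times: List[float]) -> List[float]:
    """Gaps (seconds) between consecutive clicks."""
    return [b - a for a, b in zip(click_times, click_times[1:])]


def label_coda(click_times: List[float], tolerance: float = 0.2) -> str:
    """Label a coda as regular (R), increasing (I), decreasing (D) or unknown (?).

    `tolerance` is a hypothetical relative threshold chosen for this sketch,
    not a published value.
    """
    n = len(click_times)
    gaps = inter_click_intervals(click_times)
    if not gaps:
        return f"{n}?"
    mean = sum(gaps) / len(gaps)
    # Regular: every gap is close to the average gap.
    if all(abs(g - mean) <= tolerance * mean for g in gaps):
        return f"{n}R"
    # Increasing: each gap is longer than the one before it.
    if all(b > a for a, b in zip(gaps, gaps[1:])):
        return f"{n}I"
    # Decreasing: each gap is shorter than the one before it.
    if all(b < a for a, b in zip(gaps, gaps[1:])):
        return f"{n}D"
    return f"{n}?"


three_regular = from_gaps([0.2, 0.2])
seven_increasing = from_gaps([0.05, 0.06, 0.07, 0.08, 0.09, 0.10])
four_decreasing = from_gaps([0.2, 0.15, 0.1])
one_one_three = from_gaps([0.3, 0.3, 0.1, 0.1])  # click . . . click . . . click-click-click

for coda in (three_regular, seven_increasing, four_decreasing, one_one_three):
    print(label_coda(coda))
```

The syncopated 1+1+3 coda falls through to the unknown label, which is a small reminder that count and simple rhythm alone do not capture the whole repertoire.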

At present, we know that these codas seem to serve the purpose of language for whales but we don’t know what they are saying. The idea occurred to some researchers that these large language models could be fed a database of codas and that they may be able to interpret the meanings. To do that would require the collection of a large database of whale codas.

CETI [Cetacean Translation Initiative] has its unofficial headquarters in a rental house above Roseau, [Dominica’s] capital. The group’s plan is to turn Dominica’s west coast into a giant whale-recording studio. This involves installing a network of underwater microphones to capture the codas of passing whales. It also involves planting recording devices on the whales themselves—cetacean bugs, as it were. The data thus collected can then be used to “train” machine-learning algorithms.
…

The most famous whale calls are the long, melancholy “songs” issued by humpbacks. Sperm-whale codas are neither mournful nor musical. Some people compare them to the sound of bacon frying, others to popcorn popping. That morning, as I listened through the headphones, I thought of horses clomping over cobbled streets. Then I changed my mind. The clatter was more mechanical, as if somewhere deep beneath the waves someone was pecking out a memo on a manual typewriter.
…

As anyone who has been conscious for the past ten months knows, ChatGPT is capable of amazing feats. It can write essays, compose sonnets, explain scientific concepts, and produce jokes (though these last are not necessarily funny). If you ask ChatGPT how it was created, it will tell you that first it was trained on a “massive corpus” of data from the Internet. This phase consisted of what’s called “unsupervised machine learning,” which was performed by an intricate array of processing nodes known as a neural network. Basically, the “learning” involved filling in the blanks; according to ChatGPT, the exercise entailed “predicting the next word in a sentence given the context of the previous words.” By digesting millions of Web pages—and calculating and recalculating the odds—ChatGPT got so good at this guessing game that, without ever understanding English, it mastered the language. (Other languages it is “fluent” in include Chinese, Spanish, and French.)

In theory at least, what goes for English (and Chinese and French) also goes for sperm whale. Provided that a computer model can be trained on enough data, it should be able to master coda prediction. It could then—once again in theory—generate sequences of codas that a sperm whale would find convincing. The model wouldn’t understand sperm whale-ese, but it could, in a manner of speaking, speak it. Call it ClickGPT.
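The "predict the next item" idea can be shown at toy scale. The Python sketch below counts which coda tends to follow which and then guesses the most likely successor. This is a simple bigram counter, far cruder than the neural networks behind ChatGPT, and the coda sequences are invented for illustration; they are not real recordings. The function names are assumptions of this sketch.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def train_bigram(sequences: List[List[str]]) -> Dict[str, Counter]:
    """Count how often each coda is followed by each other coda."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts: Dict[str, Counter], prev: str) -> Optional[str]:
    """Return the most frequently observed successor, or None if unseen."""
    if prev not in counts:
        return None
    return counts[prev].most_common(1)[0][0]


# Invented exchanges, using coda names from the article.
exchanges = [
    ["1+1+3", "1+1+3", "7R"],
    ["1+1+3", "7R", "4D"],
    ["7R", "4D", "1+1+3"],
]

model = train_bigram(exchanges)
print(predict_next(model, "1+1+3"))  # successor seen most often after 1+1+3
print(predict_next(model, "7I"))     # never seen, so no prediction
```

Note that the model ends up able to continue a sequence plausibly without any notion of what a coda means, which is the point the excerpt makes about "speaking" sperm whale-ese without understanding it.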
…

Andreas told me that CETI had already made significant strides, just by reanalyzing Gero’s archive. Not only had the team uncovered the new kind of signal but also it had found that codas have much more internal structure than had previously been recognized. “The amount of information that this system can carry is much bigger,” he said.

“The holy grail here—the thing that separates human language from all other animal communication systems—is what’s called ‘duality of patterning,’ ” Andreas went on. “Duality of patterning” refers to the way that meaningless units—in English, sounds like “sp” or “ot”—can be combined to form meaningful units, like “spot.” If, as is suspected, clicks are empty of significance but codas refer to something, then sperm whales, too, would have arrived at duality of patterning. “Based on what we know about how the coda inventory works, I’m optimistic—though still not sure—that this is going to be something that we find in sperm whales,” Andreas said.

There have long been heated debates as to whether language is something that is unique to humans. While developments like ChatGPT are interesting, I must say that this particular application of the technology is what I have found to be the most exciting so far. Communicating with other species would open up a wonderful new world.


Comments

steve oberski says

October 16, 2023 at 10:23 am

In David Brin’s “Uplift Universe” novels, the only thing of value that humans can offer the Five Galaxies (a multitude of sapient races that has existed for billions of years) are recordings of whale songs, and in fact the Five Galaxies value the whale civilization far more than they do human civilization.

The only reason that more severe sanctions were not imposed on humans for the ecological carnage they inflicted on the earth is the fact that they had “uplifted” (in which a “patron” species genetically modifies a pre-sapient “client” species until it is sapient) several species at the time of first contact (Chimpanzees, Gorilla, Dolphins).
moarscienceplz says

October 16, 2023 at 11:55 am

When ChatGPT was trained in Chinese, were the trainers themselves ignorant of Chinese? Because that would be analogous to the sperm whale situation. If all that happens is ChatGPT learns common patterns of codas it might generate something equivalent to “purple grape shot in the arm” where each word is mathematically related to its neighbors, but the totality is gibberish. ISTM we need to first generate a dictionary of whale codas which would entail cataloging whale behavior after they hear some codas. Has anyone fed millions of hours of TV shows to ChatGPT and had it learn English verbally? That is what would be needed to learn whale talk IMO.
jimf says

October 16, 2023 at 1:50 pm

The model wouldn’t understand sperm whale-ese, but it could, in a manner of speaking, speak it.

And what, precisely, does anyone gain if I can speak German but do not know what I am saying? I guess it might be good for a comedy sketch…
https://youtu.be/grA5XmBRC6g
moarscienceplz says

October 16, 2023 at 2:34 pm

@jimf
I knew immediately what clip that link would take me to. Thanks, and may your hovercraft always be full of eels.
SchreiberBike says

October 16, 2023 at 2:41 pm

“The model wouldn’t understand sperm whale-ese, but it could, in a manner of speaking, speak it.”

The same with the large language models that understand English; they can respond to what we say in convincing ways, but we have no idea what, if anything, they understand. Similarly I have no idea what any of you all understand, but I’m operating on the assumption that you are human and think in ways somewhat similar to myself.

Translating another language without context is another whole kettle of cetaceans.
Heidi Nemeth says

October 16, 2023 at 2:41 pm

Many conversations are uninteresting to me. I doubt I’d find myself fascinated by sperm whale conversations if they are all about where the squid are, how many squid there are, and grandma squid’s aches and pains. The conversations would be more interesting if they involve sex, danger, past, present and future. Most interesting would be if they cover relationships and emotions. How well does Chat-GPT deal with emotions?
sonofrojblake says

October 16, 2023 at 4:13 pm

If the first translated sperm whale communication turned out to be the words “oh no not again”, a million geek pedants’ heads would explode.
Pierce R. Butler says

October 16, 2023 at 4:53 pm

Has anybody tried large-language models on the vocalizations of animals which ethologists have studied in depth already and have a (putative) grasp of typical meanings? A few practice runs with primates, songbirds, etc, would make more sense than jumping into the (very) deep end with organisms with brains larger than ours, and said brains consisting mostly of sound-processing circuits.
John Morales says

October 16, 2023 at 6:04 pm

When ChatGPT was trained in Chinese, were the trainers themselves ignorant of Chinese?

“This phase consisted of what’s called “unsupervised machine learning,” which was performed by an intricate array of processing nodes known as a neural network.”

(My emphasis)
Marcus Ranum says

October 16, 2023 at 6:10 pm

I find this whole topic fascinating, and keep thinking of Carl Safina’s book How Animals Think and Feel -- I read it a few years ago and recommend it. His point is basically that, “of course animals are trying to communicate -- that’s what the noises they make are for!” Well, duh. At the time I read the book I had two dogs who spoke to each other all the time in dog-bro (they were littermates) and would kind of make these chirbling noises that apparently they understood between each other. One of the things that made me think is that maybe we have situations where a pair of dog-bros grow up together, not in the context of a dog pack, and develop their own language extensions, like some children also do. How would we know?

I knew someone who used to research crows’ speech, and apparently many birds have species-specific vocalizations, but also develop regional dialects. e.g.: a crow from Alabama might sound like a redneck, or something.

Many years ago I tangled online with what I can only describe as a “human supremacist” -- a deeply religious clown who wanted to push the idea that animals have no souls, and therefore no inner life, and therefore cannot communicate. I found that “reasoning” to be absolutely absurd, and -- literally -- asked my dogs, who did what they usually did when I talked to them about something, at length: sat down, perked up their ears, put attentive expressions on their faces, and pretty much completely failed to understand a thing I was saying. But, it was obvious they were trying, per Safina, if they didn’t understand me at all, they would not have reacted. The fact that they sat still and listened was communication -- it just was not very effective.

Safina’s book was pretty eye opening, since it made me realize that, yeah, anyone who has known a dog ought to understand that when they growl, it’s communication. And when a whale pings, it’s communication. And when a cat pisses on your shoes, it’s communication.

I think the problem with training AI to communicate is that you need to have a “… therefore” attached to the communication. I would need to tell the AI, “If I say the word ‘sit’ to my dog, it sometimes sits.” If I tell a whale, “come here, big guy” and it comes over to me 90% of the time, is it just curious or have we learned to communicate? That’s not that big a problem, though, because I can vary the inputs and observe the outputs just like we do when we are teaching a human child. I would also hypothesize that two AIs that wanted to exchange a language could do so by performing the signaling and variation at warp speed.

Like with a human child, why should we hold AI to a standard that is unrealistic? It’ll get some things wrong -- but part of communication is correcting communication errors. I don’t know how to say “I don’t understand you” in whale but my dog-bros sure knew how to say it with their ears and carefully constructed confused expressions. I wonder if I can still find the videos I made of me talking to them -- they were pretty cute. Or, they were to me. Maybe a pair of 150lb dogs attempting a “friendly grin” might scare the crap out of someone who didn’t know dogs.
Matt G says

October 16, 2023 at 7:48 pm

“So long, and thanks for all the krill.”
xohjoh2n says

October 16, 2023 at 9:07 pm

@10:

asked my dogs, who did what they usually did when I talked to them about something, at length: sat down, perked up their ears, put attentive expressions on their faces, and pretty much completely failed to understand a thing I was saying.

Ha! That’s what you think. That ear twitch actually meant “I think when we take over I’ll recommend we keep this one -- he’s pretty useless, but somewhat amusing.”
John Morales says

October 16, 2023 at 11:34 pm

Well, I did wait.

“Using large language models to understand whales”
≠
“Using large language models to try to understand whales”

—

I’m reminded of two Gary Larsen cartoons:

One in which three doodled french poodles (in the foreground) are discussing murdering their owner (shown doing dishes in the background)
“Well, yes, that is the downside, Fluffy. When we kill her, the pampering will end.”

Another where the scientist is walking down the street where dogs are vocalising, wearing his almost indetectable apparatus:
“Donning his new canine decoder, Professor Schwartzman becomes the first human being on Earth to hear what barking dogs are actually saying.”
John Morales says

October 16, 2023 at 11:40 pm

[Larson]
Silentbob says

October 17, 2023 at 2:11 am

Since Morales apparently can’t be bothered:

https://live.staticflickr.com/4121/4872650427_cdec0a24a3_b.jpg

https://pbs.twimg.com/media/F8T3ncvWAAAdf0c.jpg:large

Srsly, dude. The second one doesn’t even work without the image. Think before posting. *facepalm*
John Morales says

October 17, 2023 at 2:55 am

Srsly, dude. The second one doesn’t even work without the image. Think before posting. *facepalm*

<snicker>

I’ve got a coolie to do my work for me, bub.
John Morales says

October 17, 2023 at 3:01 am

But hey:
http://hyperboleandahalf.blogspot.com/2010/03/animals.html
sonofrojblake says

October 17, 2023 at 4:51 am

@mjr, 10:

when a whale pings, it’s communication

That’s not necessarily true though -- there’s the confounding factor that when a whale pings, it might very well simply be attempting to sense its surroundings. That emitted sound may have no informational content AT ALL -- its function being to reflect off things and provide information with its echoes. I’m put in mind of the noises my three-year-old son makes when we walk through a tunnel or other acoustically interesting space. He’s not communicating anything -- he’s experiencing the different way the noise bounces back in this space compared to what he’s used to. It’s an entirely internal experience to him, like say looking at a rainbow, it’s just that I can’t HEAR him looking at a rainbow. I can’t learn anything about his mental state from the noise he makes though, because the noise has no content… it’s just a noise. How could an AI “understand” that?

It’s a problem almost unique to cetaceans (bats are conceivably similar?), which just reinforces the sense that doing this with almost ANY land animal vocalisation would be more likely to yield results than with creatures that we KNOW use their “voice” for things other than stuff that actually sounds like “talking”.
Holms says

October 17, 2023 at 5:57 am

“Coolie”? Real nice.
John Morales says

October 17, 2023 at 7:56 am

I know, Holms. Bloody excellent.

(And self-elected, at that!)
Raging Bee says

October 17, 2023 at 10:28 am

And what, precisely, does anyone gain if I can speak German but do not know what I am saying?

“I will not buy this large language model, it is scratched.”
birgerjohansson says

October 17, 2023 at 12:20 pm

Sonofrojblake @ 7
It took me an hour to get it.
sonofrojblake says

October 17, 2023 at 1:46 pm

@birgerjohansson,22: that makes me very happy 🙂
moarscienceplz says

October 17, 2023 at 2:33 pm

@#9 John Morales
I reject that assertion. I don’t see any way to have an “AI” “learn” a language without giving it correct feedback from people competent in that language.
John Morales says

October 17, 2023 at 4:16 pm

https://en.wikipedia.org/wiki/Unsupervised_learning
sonofrojblake says

October 17, 2023 at 4:29 pm

@John Morales, 25:
You’re obviously an expert.
As such, I’m sure you can explain the following: the link you supplied describes the training of a network as follows:

During the learning phase, an unsupervised network tries to mimic the data it’s given and uses the error in its mimicked output to correct itself

You can surely explain how, in the complete absence of anyone competent to discriminate between correct output and error, an unsupervised network can nevertheless be successfully trained. Do enlighten us.
John Morales says

October 17, 2023 at 6:05 pm

sonofrojblake:

You’re obviously an expert.

Well, I can obviously read the OP and quote from it.
So I suppose that’s obvious expertise.

You can surely explain how, in the complete absence of anyone competent to discriminate between correct output and error, an unsupervised network can nevertheless be successfully trained.

Not to the likes of you.

Do enlighten us.

Sure:
“Basically, the “learning” involved filling in the blanks; according to ChatGPT, the exercise entailed “predicting the next word in a sentence given the context of the previous words.” By digesting millions of Web pages—and calculating and recalculating the odds—ChatGPT got so good at this guessing game that, without ever understanding English, it mastered the language.”

cf. https://en.wikipedia.org/wiki/Chinese_room
sonofrojblake says

October 18, 2023 at 3:50 am

@John Morales, 27:

You can surely explain how, in the complete absence of anyone competent to discriminate between correct output and error, an unsupervised network can nevertheless be successfully trained.

Not to the likes of you.

So, “no”, then. As you go on to demonstrate.

You’ve very obviously not thought this through. The LLM training you’re talking about involved using a training set consisting of millions of pages of known-to-be-valid English text inputs. Its digestion of that training set was, yes, “unsupervised”. Whoopy fuckin doo.

But it was only useful for producing outputs in English because there was someone upfront filtering what went into that training set, someone who understood English.

How well could you produce a training set for language X if
(a) you can’t initially be sure you’re even dealing with language at all per se
(b) you can’t work with text, which is readily categorisable, but instead have to work with just sounds, and far more importantly
(c) you have no idea whether the noise you’re hearing is
-- gibberish (is that a whale word, or just the equivalent of the noise my kid makes when he knows it will echo?)
-- a valid expression in language X
-- where that expression starts and stops (is that one word, or two?)
-- a valid expression in the mutually unintelligible language Y (is that English, or Chinese?)
-- a valid expression in the mutually unintelligible language Z (oh, hang on, could be Hungarian…)

How well do you suppose ChatGPT would answer questions in English if its training set had just been twenty million randomly selected pages of text in all human languages, and another 980 million of just random text? It is the complete inability, even in principle, to discriminate errors from correct outputs at any stage of the process that makes what’s being discussed here… challenging.
John Morales says

October 18, 2023 at 5:03 am

So, “no”, then. As you go on to demonstrate.

Exactly. Turns out I am not an expert, whether obviously or not.

But I can read and quote, expertly.

You’ve very obviously not thought this through.

I can read, but.

“This phase consisted of what’s called “unsupervised machine learning,” which was performed by an intricate array of processing nodes known as a neural network.”
(My emphasis)

But it was only useful for producing outputs in English because there was someone upfront filtering what went into that training set, someone who understood English.

“To find their prey—generally squid—in the darkness of the depths, they rely on echolocation. By means of a specialized organ in their heads, they generate streams of clicks that bounce off any solid (or semi-solid) object. Sperm whales also produce quick bursts of clicks, known as codas, which they exchange with one another. The exchanges seem to have the structure of conversation.”

Training set of communications in English, training set of communications in Whalish. Same thing to the algorithm.

(Presumably, whales communicate in whalish just as people communicate in peoplish)

How well do you suppose ChatGPT would answer questions in English if its training set had just been twenty million randomly selected pages of text in all human languages, and another 980 million of just random text?

Wow, you really are dense.

Issue at hand is what I addressed directly: “When ChatGPT was trained in Chinese, were the trainers themselves ignorant of Chinese?”

Again, there are no trainers either in English or in Chinese or in Whalish.

(There’s a dataset upon which the LLM trains itself — unsupervised)

—

Get it into your head: I am quoting the OP, regurgitating it.

(Remind you of anything? 😉 )
John Morales says

October 18, 2023 at 5:06 am

[exercise]

The wages of sin is _____.
birgerjohansson says

October 18, 2023 at 5:11 am

It would be interesting to “teach” a large language model English using the dialogue of a typical Spike Lee film.
John Morales says

October 18, 2023 at 5:12 am

PS a bit more than a million pages: https://community.openai.com/t/what-is-the-size-of-the-training-set-for-gpt-3/360896
Dunc says

October 18, 2023 at 5:30 am

Very basically, you take your corpus of candidate training material and you split it into a training set and testing set, to which you apply partial masking. You train the model on the training set, then you see how well it can reproduce masked bits of the test set. If your corpus is garbage, there are no consistent patterns to be detected, and so the model will not be able to reproduce the test set. If you can train the model to reproduce the test set from the training set, then there must be a consistent pattern across the whole corpus.
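The procedure described above can be sketched in a few lines of Python. This is a deliberately simplified version under stated assumptions: the "model" is a lookup table of the most common successor for each token, "masking" means hiding each test token and guessing it from the token to its left, and both corpora are synthetic. None of the names or numbers come from the CETI project or from how ChatGPT was actually trained.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def fit(tokens: List[str]) -> Dict[str, str]:
    """Learn, for each token, the successor seen most often in training."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        table[prev][nxt] += 1
    return {prev: c.most_common(1)[0][0] for prev, c in table.items()}


def masked_accuracy(model: Dict[str, str], tokens: List[str]) -> float:
    """Hide each token in turn and try to recover it from its left neighbour."""
    pairs = list(zip(tokens, tokens[1:]))
    hits = sum(1 for prev, hidden in pairs if model.get(prev) == hidden)
    return hits / len(pairs)


def split(tokens: List[str], train_fraction: float = 0.8) -> Tuple[List[str], List[str]]:
    """Split the corpus into a training part and a held-out test part."""
    cut = int(len(tokens) * train_fraction)
    return tokens[:cut], tokens[cut:]


def score(corpus: List[str]) -> float:
    train, test = split(corpus)
    return masked_accuracy(fit(train), test)


symbols = ["A", "B", "C", "D"]
patterned = [symbols[i % 4] for i in range(5000)]       # consistent structure
rng = random.Random(0)
garbage = [rng.choice(symbols) for _ in range(5000)]    # no structure at all

print("patterned corpus:", score(patterned))
print("garbage corpus:  ", score(garbage))
```

The patterned corpus is recovered perfectly, while the garbage corpus scores around chance for four symbols. Nobody had to label either corpus in advance; the held-out score itself signals whether there was a consistent pattern to find.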

Given the size of ChatGPT’s training corpus, I very much doubt that it was all pre-selected by people, although they were able to point it in the direction of vast swathes of known good(ish) material. Of course, for ChatGPT there were additional rounds of both supervised learning and reinforcement learning from human feedback, neither of which would be available in the case of sperm whales, but that doesn’t mean that a model trained purely through unsupervised learning is worthless. But yes, it’s unlikely to be able to have a completely convincing conversation.

I suspect that the researchers involved in this project are probably quite well aware of its limits -- probably rather more so than any interested layman who’s given the matter a few minutes’ thought. I also suspect that the pop-sci write up we’ve got here hasn’t fully captured those issues and complexities.
Silentbob says

October 18, 2023 at 6:00 am

@ Morales

Why do hyperliteral trolls keep posting pedantic messages on blogs where they are not welcome?

Actual ChatGPT response 🙂

Hyperliteral trolls, or individuals who engage in pedantic and overly precise behavior, often post such messages on blogs and online forums for various reasons. While it can be frustrating for the community and the blog’s author, there are several underlying motivations that drive this behavior:

1) Attention-seeking: Trolls seek attention and reactions from others, and they might think that being hyperliteral and pedantic will provoke strong responses.

2) Annoyance and disruption: Some trolls derive satisfaction from disrupting discussions and annoying other users. By nitpicking and being overly pedantic, they can derail conversations and create chaos.

3) Personal enjoyment: Some individuals simply find it amusing to point out minor errors or inconsistencies, even when it’s not particularly relevant to the topic at hand.

4) Intellectual superiority: Hyperliteral trolls may feel a sense of intellectual superiority, believing they are exposing the flaws in others’ arguments or writing.

5) Boredom: Online trolling is often a way for people to alleviate boredom or find entertainment, even if it’s at the expense of others.

6) Ideological or political reasons: Trolls may have strong ideological or political beliefs and use hyperliteral tactics to challenge or undermine opposing viewpoints.

7) Lack of empathy: Trolls might not fully grasp or care about the social norms and etiquette of online communities, leading them to disregard the wishes of other users and the intended tone of the blog.

8) Group dynamics: In some cases, trolling can become a form of group behavior, with trolls encouraging each other to be pedantic and hyperliteral as part of a collective effort to disrupt and annoy.

Dealing with hyperliteral trolls can be challenging, as responding with anger or frustration is often what they want. Moderators and the community can take steps to mitigate their impact, such as enforcing forum rules, ignoring the trolls, and not engaging in arguments. The goal is to minimize the satisfaction the troll gains from their behavior and maintain a positive online environment.

Seems pretty cluey to me Juan Ramón, lol. X-D
John Morales says

October 18, 2023 at 6:29 am

Actual ChatGPT response

Ah yes, Blob of Quietude.

Because you are yet to respond to this, one of my perennial questions to you:
What do you imagine is the difference between literal and “hyperliteral”?
Whence the prefix?

See, I’ve responded thus (corpus right there) each time you foolishly try to paint me as beyond literal. Your little neologism is vapid, of course.

(By its fruit shall ye know the tree)

Seems pretty cluey to me Juan Ramón, lol. X-D

Nah. Not even slightly. You are just saying that.

And, again, for the umpteenth time, yet another perennial question:
What is your natal name? You’ve latched onto mine since in the last couple of decades I have let it be known, and after due schooling you know how it goes.

So, PustulentBoil, what is yours?

—

And, of course, here is the pattern yet again manifest. From two of the current trio, though WMDK is a postulant.

All about me. Me, me, me.

Ah well. I am more interesting than the topic at hand, to these junkies.

(I am special)
Holms says

October 18, 2023 at 7:21 am

I notice the Voynich manuscript is still unsolved. Any takers from the machine learning advocates…? Though with its thousands of distinct words I suspect it will be a few orders of magnitude more complex than whales and their two dozen.

…

Not to the likes of you. [Morales]

So, “no”, then. [sonof]

If he had gone with just ‘no’, the reply would have been useful and lacking any derision. And he can’t be having with that now, can he; there’s an image to maintain. So, extra verbiage specifically to add a jab.
Dunc says

October 18, 2023 at 7:37 am

@36: That’s an entirely different class of problem. You can’t do cryptanalysis with an LLM, although if you had a large enough sample of Voynich text, you could train one to produce more of it (assuming that it’s not just gibberish).
John Morales says

October 18, 2023 at 7:48 am

So, extra verbiage specifically to add a jab.

Ah, the usual triumvirate has had its toke.

And he can’t be having with that now, can he; there’s an image to maintain.

Very psychoanalytic, that is.

(You mean ‘persona’?)
John Morales says

October 18, 2023 at 7:55 am

I notice the Voynich manuscript is still unsolved.

I notice you presume it’s solvable.

(cf. my #13)
sonofrojblake says

October 18, 2023 at 8:37 am

@John Morales, 29:

Training set of communications in English, training set of something that is assumed to be communications in Whalish.

Corrected it for you.

Again, there are no trainers either in English or in Chinese or in Whalish. (There’s a dataset upon which the LLM trains itself — unsupervised)

And it generates the dataset itself, unsupervised?

Dunc, 33:

you take your corpus of candidate training material and you split it into a training set and testing set, to which you apply partial masking. You train the model on the training set,

Here’s a thing: why has AI in its currently massively overhyped, glorified-auto-correct form only really blown up in the last 12-18 months? Because it’s only relatively recently that sufficiently massive corpuses (corpi?) of training material have become available and processable in less than years. And the success of these models has been pretty dependent on the staggering size of their training sets. A recently famous example of just one small subset of these things is “Books3”, a collection of pirated ebooks numbering close to two hundred THOUSAND.

Does the totality of all the available whale “communication” represent a corpus sufficient to fill even ONE book the size of the average “Books3” entry? Limited size of corpus is likely to massively limit the effectiveness of any training possible even if there were someone in a position to weed out the gibberish and sort the “communication” into separate sets depending on which “language” they were in (you didn’t assume all whales speak the same language, right?) -- which of course there isn’t.

I think what you’ve got here is an example of what’s a pretty common thing in science reporting -- someone trying to attach their possibly/probably legit research to whatever’s currently sexy, and right now it’s AI and LLMs.

I’ll believe it when they publish a translation of whale conversation. I won’t hold my breath…
Dunc says

October 18, 2023 at 9:30 am

@40

Limited size of corpus is likely to massively limit the effectiveness of any training possible

[…]

I’ll believe it when they publish a translation of whale conversation. I won’t hold my breath…

From the OP:

Provided that a computer model can be trained on enough data, it should be able to master coda prediction. It could then—once again in theory—generate sequences of codas that a sperm whale would find convincing. The model wouldn’t understand sperm whale-ese, but it could, in a manner of speaking, speak it.

[…]

Andreas told me that CETI had already made significant strides, just by reanalyzing Gero’s archive. Not only had the team uncovered the new kind of signal but also it had found that codas have much more internal structure than had previously been recognized.

[My emphasis]

I’m pretty sure they’re not really trying to come up with an LLM that can converse with whales. They’re applying similar techniques and technologies to try and understand more about the structure of whale vocalisations, with apparently some success already.

To quote my own earlier comment: “I suspect that the researchers involved in this project are probably quite well aware of its limits — probably rather more so than any interested layman who’s given the matter a few minutes thought.”
sonofrojblake says

October 18, 2023 at 10:15 am

@Dunc, 41:
Yeah, fair enough -- I’m possibly just massively cynical about AI hype, having heard it before. Can it really be almost 20 years since the I-then-thought-relatively-reputable New Scientist magazine uncritically parroted a press release about “ChatNannies”, a supposed software agent that could spot grooming behaviour employed by paedophiles in online chatrooms? https://hoaxes.org/weblog/comments/chatnannies
Dunc says

October 18, 2023 at 11:17 am

There certainly is a depressingly large amount of bollocks AI hype out there right now, and it’s good to be cynical about it -- but equally, machine learning is a real thing with lots of interesting applications that’s going through an amazing growth phase at the moment. Telling the difference can be really quite hard, and the breathless pop-sci stuff doesn’t help much. This sounds legit to me though, once you strip out the obligatory references to ChatGPT. Not that I’m any kind of an expert…
Holms says

October 18, 2023 at 11:17 am

#39
No I don’t.
John Morales says

October 18, 2023 at 4:51 pm

@44, yes, you do.
“is still unsolved” is only applicable to something solvable; something unsolvable can neither be solved nor unsolved.

(Your grasp of logic remains as strong as ever)
Holms says

October 18, 2023 at 6:43 pm

Something that is not solved can be termed unsolved, whether it is solvable or not.
John Morales says

October 18, 2023 at 6:54 pm

Heh heh heh.

Something that is not dead can be termed undead, whether it can die or not.

I have here an undead glass.

It is not yet dead, it will never be dead since it was never alive, so it is undead.

(Oh, right. +1 to your thread quota of persistent pointless personal protestations)
Holms says

October 18, 2023 at 9:18 pm

Analogy fail. Something that is not solved can be termed unsolved, without even knowing if it is possible to be solved.

pointless

The point is to correct lousy reasoning. You are a prolific producer of such.

personal

You take disagreement with your reasoning personally?? This explains a lot!
John Morales says

October 18, 2023 at 9:23 pm

+1. A few more to go, Holms.

Of course, you have nothing to offer on the topic at hand.

It’s all about your perception of me.

(Me, me, me)
Holms says

October 18, 2023 at 10:44 pm

I mentioned the Voynich in relation to solving language; you then jumped on that to make a silly claim about my use of ‘unsolved’ rather than comment on topic. Thus the diversion from the topic was, as usual, all you.
John Morales says

October 18, 2023 at 11:11 pm

Solving language!

(Why, is it unsolved?)

+1
John Morales says

October 18, 2023 at 11:13 pm

Thus the diversion from the topic was, as usual, all you.

It’s always about me, where you, bobiferous, and the other are concerned.

But I do get it. I am more interesting than the topic at hand.

(AI, whales)
sonofrojblake says

October 19, 2023 at 3:47 am

Holms -- you’re feeding it.
John Morales says

October 19, 2023 at 3:52 am

Heh heh heh.

Whale songs.
Holms says

October 19, 2023 at 3:54 am

Look in a mirror. #36 was on topic, your response to it was the tangent. All I do from there is swat at the dumb stuff you post.
John Morales says

October 19, 2023 at 4:44 am

Look in a mirror.

Heh heh heh.

And you try to pretend that it’s not all about me.

All I do from there is swat at the dumb stuff you post.

<snicker>

+1.
Holms says

October 19, 2023 at 10:01 am

It began with you being wrong about my on-topic #36, so, yes. Mirror.

But sure, your usual spin. Other replies to you = ‘all about you’; your replies to others = [ignored].
sonofrojblake says

October 19, 2023 at 1:30 pm

Holms — you’re feeding it.
John Morales says

October 19, 2023 at 4:27 pm

Ah well, persistent patterns persist.

That’s enough for now, Holms.

(“it” is a cute thing it says)
John Morales says

October 19, 2023 at 4:29 pm

Related to the actual topic: https://www.theguardian.com/science/2023/oct/12/researchers-use-ai-to-read-word-on-ancient-scroll-burned-by-vesuvius
Holms says

October 19, 2023 at 4:48 pm

See you in the next pedantic tangent!
John Morales says

October 19, 2023 at 5:15 pm

[boo!]

Mano Singham

Just another Freethought Blogs site
