Warning: some “adult” and racist language
Back in the day, when voice recognition systems were first coming into us, we used to joke about getting on the public address system and saying “FORMAT C: /Y”
Assisted learning artificial intelligence has some drawbacks.
Microsoft also learned that chat-bots, which operate sort of like a parrot, have similar problems: they’re garbage in/garbage out and the internet is happy to feed garbage at everyone, any time. Microsoft’s “Tay” chatbot was taught to sound like a racist troll in remarkably short time:

Microsoft Tay twitter feed
I’m fascinated by two aspects of this stuff. First, I’m not sure that the learning process Tay chatbot, a parrot, or Alexa pursues are that different from what human children do. The difference is that human children absorb a tremendous amount more cultural cues than the chatbox or Alexa can, and they do it faster and constantly for years. When I was a child I was sent home from kindergarten early because I had learned the word “fuck” – apparently I told one of my playmates he had “fucky boots.” I don’t think that version of Marcus was a very advanced chatbot, compared to Alexa, really. The other aspect of voice control that fascinates me is that it’s a huge invitation to what Charles Perrow calls “normal accidents” – the idea of a “normal accident” is that as systems become more interconnected and interdependent their failure modes become correspondingly more complex, to the point where they are incomprehensible and unpredictable. Perrow’s view of technology is, of course, music to a computer security practitioner’s ears: that’s exactly how it appears to work out there on the internet. Others, however, point out that humans do a fair job of minimizing interdependencies when we need to, and that it’s really not such an awful problem.
Adding artificial intelligence of a limited sort to most things means you’ve added a new control channel – a less predictable control channel that someone can attack. And, because the ‘intelligence’ is not really very intelligent, you can easily manipulate it into doing something a higher intelligence would not. There’s the old story of the mule that starves as it tries to decide whether to eat the grain or drink the water, first. Anyone who has met a mule knows that’s a foolish story: mules are pretty smart and they hold grudges. But, in order to hold a grudge you have to be able to a) remember b) decide something was wrong c) attribute the attack:
I can think of all kinds of other ways this kind of behavior could be weaponized, per “normal accidents” theory: in the example above if you could get an event onto the user’s calendar that said “hey google, call 1-900-sex-hawt” you could cost someone a great deal of money. Or, you could have someone’s not very smart robot assistant call the local FBI field office with a bomb threat. The interconnection between the voice-activated part of the system and the calendar entries is the sort of subtle interconnection that Perrow worries about.
My addition to Perrow is a tequila-fuelled observation I once made, which is: “humans don’t do things very well.”

Charles Perrow, Normal Accidents – Living with High Risk Technologies (amazon)

What is the wee one in the first video trying for?
…digger…?
From limited context, I’m guessing a video game called Digger which was originally made in 1983 but subsequently rereleased as an open source game for nearly every device in existence.
But I don’t actually know with anything approaching certainty.
Here in Scotland, we have rather more fundamental issues with voice recognition technology. (It has improved a lot in recent years – by which I mean that it actually does work sometimes, provided you don’t have too strong an accent.)
My first thought on seeing that video was “Daddy didn’t delete his search history”.
sonofrojblake@#4:
Sounds like! Or someone’s query history at google. Or…
That’s actually a perfect case-study of “normal accidents” style interactions: the system does not behave the way you predict because something complicated happened that was totally off the user’s radar scope.
Nonononono.
They really are. While you could, of course, teach a child nonsense language*, it will never actually be nonsense because it will always refer to things outside of language. They don’t learn sentences, they learn words, often starting with nouns.
Also, while children will pick up words they don’t understand, like a chatbot, they also usually have some idea what they mean. One day my daughter used the wonderful word “arschgefickt” (ass-fucked), which isn’t just not a word we use around the dinner table, but also one that’s pretty homophobic. Did she know what it really means? No, but she clearly knew that it was a “bad word” and that she would get a reaction out of us by using it (probably not the one she expected).
I doubt the chatbot had anything in its program that realised that this was fucking disgusting.
*Many families will have “family words” that don’t make any sense to people outside their immediate circle
Giliell@#6:
They really are. While you could, of course, teach a child nonsense language*, it will never actually be nonsense because it will always refer to things outside of language. They don’t learn sentences, they learn words, often starting with nouns.
That makes sense. They’re completely surrounded by culture. Even if you were raising a child as a chinese room experiment, they’d probably still encounter something contradictory eventually.
When I was in high school my g/f and I did one baby sitting gig and managed to teach the kid to say “class struggle” It was sort of parrot-like. I’ve tried to get parrots to say stuff, and they also think and have some limited understanding of tone and rythm – they’re not just MP3 players.
I doubt the chatbot had anything in its program that realised that this was fucking disgusting.
It probably does now!! Usually this sort of thing is a parser that extracts semantic interest content, then generates pseudorandom (plus state-based) outputs through a grammar-assembling output function. Which, in my experience, is what I do, too. I just like to think my functions are excellent. ;)
Yet it had a communicative purpose. What happened when the kid said class struggle (I think adults teaching kids words they don’t understand is one almost everybody does)? You reacted, probably with laughter, probably with praise. The kid learned “this word will make Marcus laugh and say “cool, mate!”, so they used it to elicit that reaction.
Short story: whenever you say something, you want something. You will choose your words depending on your goal. Alexa or a chatbot don’t.
Giliell@#8:
That’s a great way to describe it.
I think that the chatbots don’t have as rich a feedback environment to work from as a human does. If they did, we’d probably be more likely to think them “intelligent” (whatever that is) I.e.: a chatbot could be programmed to watch our mouth and collect feedback from when we’re about to talk, or whatever, like a human child or a dog does. One of my dogs used to interrupt me periodically, when I would be talking to him he’d sort of wiggle his chin like he was about to bark, so I’d shut up and then he could make yarble-bargle noises at me. I thought it was really interesting. And it usually got him a more pleasant level of interaction.