They grow up too fast


She might have growed up a little since this photo was taken.

My daughter, Skatje, is doing her PhD defense on Thursday, and Mary and I will be attending over Zoom. Naturally, not wanting to look like a dope, I thought I’d look up her work (finally) and get a little hint of what I’m going to hear. I’m already lost.

Hallo, I’m Skatje Myers, a PhD student in Computer Science (joint degree in Cognitive Science) at the University of Colorado at Boulder, advised by Martha Palmer.

My research focus is in accelerating development of new corpora for semantic role labels (SRL).

I’m investigating techniques for conducting active learning for semantic role labeling: How can we determine which sentences will most improve the model when annotated and added to our training data? This methodology enables us to improve annotation efficiency by selecting only the most informative sentences to annotate.
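To make the active-learning idea more concrete, here is a minimal Python sketch of uncertainty sampling, one common selection strategy. It is an illustration only and not necessarily the technique used in this research. It assumes a hypothetical SRL model has already produced a probability distribution over role labels for every token in each unannotated sentence. It then picks the sentences whose predictions have the highest average entropy, meaning the model is least sure about them. The function names and the probabilities in the pool are invented for illustration.

```python
import math
from typing import Dict, List, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one token's predicted role distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def sentence_uncertainty(token_probs: List[Sequence[float]]) -> float:
    """Average per-token entropy; a higher value means the model is less certain."""
    if not token_probs:
        return 0.0
    return sum(token_entropy(p) for p in token_probs) / len(token_probs)


def select_for_annotation(pool: Dict[str, List[Sequence[float]]], k: int) -> List[str]:
    """Return the ids of the k sentences the model is least certain about."""
    ranked = sorted(pool, key=lambda sid: sentence_uncertainty(pool[sid]), reverse=True)
    return ranked[:k]


# Hypothetical per-token role probabilities (3 candidate labels per token),
# invented for illustration; a real pool would come from a trained SRL model.
pool = {
    "s1": [[0.98, 0.01, 0.01], [0.97, 0.02, 0.01]],  # model is confident
    "s2": [[0.34, 0.33, 0.33], [0.50, 0.25, 0.25]],  # model is unsure
    "s3": [[0.70, 0.20, 0.10]],                      # somewhere in between
}

print(select_for_annotation(pool, 2))  # ['s2', 's3']
```

Averaging over tokens keeps long sentences from ranking high just because they are long. Real systems may score sentences in other ways, for example by margin or by disagreement among a committee of models.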

Additionally, I’m examining approaches for projecting semantic annotation cross-lingually: If we know what the semantic roles are in an English sentence, and we know the translation of that sentence, can we figure out which words to assign those roles to in the target language? These projected annotations may serve either as a starting point for manual annotation that will expedite the process, or as training data themselves.
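One simple way to project labels, sometimes called direct or alignment-based projection, is to copy each English token's role onto the target-language word it is aligned with. The Python sketch below assumes the word alignments are already available; here they are written by hand. It uses PropBank-style labels on a toy English/Russian sentence pair. The function name, the sentence, and the alignment are hypothetical and not taken from this research.

```python
from typing import Dict, List, Optional, Tuple


def project_roles(
    src_roles: Dict[int, str],
    alignment: List[Tuple[int, int]],
    tgt_len: int,
) -> List[Optional[str]]:
    """Copy source-token role labels onto aligned target tokens.

    src_roles maps a source token index to its role label (e.g. "ARG0").
    alignment is a list of (source index, target index) word-alignment pairs.
    Target tokens with no aligned, labeled source token stay None.
    """
    tgt_roles: List[Optional[str]] = [None] * tgt_len
    for s, t in alignment:
        label = src_roles.get(s)
        # If several source tokens align to one target token, the first one wins.
        if label is not None and tgt_roles[t] is None:
            tgt_roles[t] = label
    return tgt_roles


# English: John(0) gave(1) Mary(2) a(3) book(4)
roles = {0: "ARG0", 1: "V", 2: "ARG2", 3: "ARG1", 4: "ARG1"}
russian = ["Джон", "дал", "Марии", "книгу"]
# Hand-written alignment; "a" has no Russian counterpart, so it is not aligned.
align = [(0, 0), (1, 1), (2, 2), (4, 3)]

print(list(zip(russian, project_roles(roles, align, len(russian)))))
# [('Джон', 'ARG0'), ('дал', 'V'), ('Марии', 'ARG2'), ('книгу', 'ARG1')]
```

Unaligned words, like the English article "a", simply drop out, and any unaligned target words stay unlabeled. A human annotator would still need to fill in gaps like these.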

I’m presently exploring these techniques specifically with regard to developing and expanding a Russian PropBank corpus.

I suspect that plunging right into a thesis defense in this field is going to be bewildering, but we’ll try.

Comments

  1. says

    When I defended, I told my parents I was studying the development of sensorimotor integration in embryonic zebrafish. That made perfect sense, right?

  2. birgerjohansson says

    Time…
    James Bond aka Pierce Brosnan turned 70 today. That young guy was chasing baddies just a short time ago and now he is old???

  3. wzrd1 says

    This methodology enables us to improve annotation efficiency by selecting only the most informative sentences to annotate.

    That’s not highly original, but it is a critical area of research, one that can impact, and has impacted, even things as simple as search engine queries.
    Ever make a search engine query of only three words on a technical subject and get good results? Your meatware essentially did what she’s been working on and improving. Adding in cross-lingual bits makes it a challenge, as most non-Romance languages don’t have true translations but require more associative transliterations.
    Given the mixture, it sounds like she’s likely working with nested neural networks, a network of neural networks; the work I’ve seen in that area shows great promise.
    Defending the thesis? Comprehending it in detail is way beyond my pay grade! ;)
    I can understand the mile-high view, but get down into the sand and I’ll be about as lost as the rest of us.

  4. larpar says

    PZ @ 2
    It makes more sense than “new corpora for semantic role labels”.
    Of course, I’ve been reading this one biologist’s blog for over 10 years. A little has seeped in. : )

  5. René says

    SQB, might that be a (data)bank of Propositions (a.k.a. declarative sentences) in the Russian lingo?

  6. René says

    [murmurs] Probably programmed in a product by another former boss annex crook, Larry. [/m]

  7. moonslicer says

    Aye, they do grow up fast. I remember the days when I was first on my own with my son. He was 2 years old at the time and barely tall enough to see over the table when he was eating his dinner. I told myself, “He won’t always be that small. Time will go by.”

    And of course it did. These days he’s 37 and about 9 inches taller than me and he’s doing OK for himself. Not the most scintillating job, but he’s good at it and he’s pretty well paid. So we’re both getting by better than we were during the dark days.

  8. birgerjohansson says

    I suspect translation software is to AI what fusion is to physics.

    PZ@2
    Funding idea: tell Pentagon you are weaponizing Australian zebrafish.

  9. René says

    @PZ

    sensorimotor integration in embryonic zebrafish

    I had no idea stupid fish were that precocious.

    On another note, I always wondered how you and your trophy wife came up with Skatje for her name. Skat and its diminutive Skatje are definitely West-Frisian (not spoken in Friesland but in the north of North-Holland), used jokingly for darling.

    Anyway, have a great day Thursday, and all the best for Skatje’s defense!

  10. Matt G says

    My research focus is in accelerating development of new corpora for semantic role labels (SRL).

    In other words, witchcraft.

  11. says

    The original idea was to name her Skaði, but eth isn’t a common letter in English. It’s often transcribed as “d”, but we worried that Skadi would be read as Scotty, you know, a boy’s name, and also at the time a close colleague was named Scott, and we didn’t want it to sound as if we were naming her after him. The “tj” was just a way to soften the name a bit.
    And then in the 1980s we discovered all this pop and disco music from the Netherlands that referred to Skatje. You can’t win. Should have just stuck with Skaði.
    Skaði is a Norse goddess of hunting & skiing, by the way.

  12. robro says

    I happen to have some idea about what she’s talking about. I’ve worked on an ML project to do known entity extraction from content. I’m currently working on a “knowledge graph” project with two ontologists. And in the long-ago past, I worked with some folks using Latent Semantic Analysis to analyze some other content. One of those folks was a PhD in cognitive science from Cambridge (England), and one of the most interesting people I have ever met…plus a damn good magician. Anyway, it’s an interesting field to work in, with lots of opportunities, although apparently the current crop of semantic web technologies are a bit spooked by LLMs (large language models such as ChatGPT).

  13. robro says

    Matt G @ #14 — After reading the article in the May issue of Scientific American about witch hunts around the world, I’m making no jokes about witchcraft, voodoo, sorcery, or anything similar. Witchcraft still gets people killed All. The. Time. And while the legislators of Massachusetts finally made some gesture to atone for murdering women (mostly) and men as “witches” in the state’s past, the legislators of Connecticut wanted to know what evidence there was that these people weren’t guilty…you know, like there’s something real about witches beyond just greedy fanatics wanting to get someone out of the way to take their property. Anyway, look it up — Title: Witch Hunts. Authors: Silvia Federici, Alice Markham Cantor.

  14. René says

    You could have named her Cyneðryð or borrowed her th spelling as Skathe.

    I like her name Skatje — I photographed a half-sunken boat with that name under spring blossoms a season or two ago — if I only knew how to forward that picture.

    https://en.wikipedia.org/wiki/Cynethryth

  15. says

    Skaði is a Norse goddess of hunting & skiing, by the way.

    I knew that. One of the few things Waldorf schools are good for.

  16. René says

    So, now, names of goddesses who never existed are better names than those of queens who actually did? Not that I like queens…

  17. says

    @11:

    Considering that people can’t do translation very well, let alone understand how it’s done, expecting machines to do it well when we’re the ones who’ve taught them (or at least established the parameters for the learning system) is at minimum wildly overoptimistic.

    And that’s before getting into the nonverbal aspects of translation to really understand “meaning.” A couple of decades back, when US forces “liberated” Baghdad, there was one three-minute-or-so sequence shown on TV news (multiple sources) showing a crowd of Iraqi men pulling down a statue of Saddam Hussein, then removing their shoes and smacking the statue with the soles of their shoes. Nobody on the news commented on the nonverbal communication. I suspect that an AI wouldn’t get it, either…

  18. John Morales says

    … I thought I’d look up her work (finally) and get a little hint of what I’m going to hear. I’m already lost.

    :)

    (You are a successful dad)

  19. Jim Balter says

    I’m curious to learn what a Russian PropBank is.

    Intelligent people who are curious look things up via a remarkable modern resource called a search engine, or skip directly to Wikipedia.

    I suspect translation software is to AI what fusion is to physics.

    You suspect wrongly. Google Translate became at least an order of magnitude more accurate when it was (silently) changed to use an LLM.

    Nobody on the news commented on the nonverbal communication. I suspect that an AI wouldn’t get it, either…

    But it might inform you that the statue was pulled down by U.S. Marines using an M88 ARV.

  20. wzrd1 says

    Jim Balter @ 26, true, but Google also silently changed to LLM in their main search engine, with an order of magnitude lower accuracy. It’s low enough that my youngest, whose computer skills are middling to average, has noticed and remarked upon it as well. I was going through merry hell earlier trying to find titanium tubing of a specific size and wall thickness; Google’s wonderful engine, which used to give me the correct results, kept pointing me at aluminum.

    As for the Saddam statue, you missed the point; whether that was intentional or not, I’m uncertain. Beating with shoes holds a specific meaning in that culture: utter contempt. That’s also reflected in the US within the Jewish community, and it leaked into some of our movies. In “My Fellow Americans”, there was a threat to beat the other with his shoe that reflects that very same context.
    That makes sense, as Arabs and Jews share a common cultural origin, which is also reflected in their traditions.
    All of which get missed by a LLM, but should be caught by a GAI, should we ever get an effective one. What we have now reflects single-level research, so it gives erroneous answers to some questions. An example was given by Anton Petrov shortly after ChatGPT made its splash in the news. Getting an accurate answer was an exercise in query by dentistry; in short, like pulling teeth. Initially it gave a wrong answer, then it tried to stick with that answer, and only when challenged did it look a level deeper and arrive at the correct answer, which concerned the first exoplanet observed versus an interesting, recently studied one.

    AIs, as they currently stand, are valuable tools, indeed invaluable in some applications, but we’re a long, long way to general AI yet. Chat GPT is only a small first step.

  21. birgerjohansson says

    Jim Balter @ 26
    It is great that translation software is getting better.
    People with a mediocre understanding of language probably cannot do better than the best current software.
    But to catch the finer nuances the way the best translators do, I suspect it will take general AI to get that far.

    BTW, what is the situation regarding translating spoken language into text? The videos on YouTube do not have very good subtitles so far.

  22. rietpluim says

    In Dutch, “schat” literally means treasure. As a pet name, a better translation would be sweetheart or darling. The suffix “-je” means little, so “schatje” means little treasure. “Skatje” is the slang version. It may be used either affectionately or (when used ironically) contemptuously. The word doesn’t seem to be related to Skaði. Its etymological origin is unknown.

  23. birgerjohansson says

    Rietpluim @ 29
    In Swedish, the closely related “skatt” means treasure.

    Skatt has also come to mean what the English call “tax”.
    Not to be confused with Swedish “tax” = dachshund.

  24. birgerjohansson says

    Rietpluim @ 31
    “Min skatt” is “My treasure” but it is such a strong compliment that it is rarely used.
    In Swedish translations, Gollum addresses the Ring as “min älskade” or “min skatt”.

    As “je” is not a Swedish suffix “Skatje” becomes just syllables without obvious meaning.
    Other stuff. Fock is a Swedish naval term of Dutch origin describing part of the rig of sailing ships.
    As the wind makes this component hit other parts, “fock” in Dutch naval slang became synonymous with (repeatedly) striking something.
    Sometime in the 18th century the term entered the English language but rarely in connection to naval matters.

  25. says

    @26, @27

    There were multiple incidents of crowds beating on statues with their shoes. The one I particularly recall, and that I remarked upon at the time, showed the crowd itself pulling the statue down. I’m sure there were others (chosen by American news directors) in which Marines pulled down the statue and left it for the crowds.

    That said, the point that failure of context in translations Is a Problem is, if anything, perhaps worse.

  26. wzrd1 says

    @36, true enough. There was definitely no shortage of Saddam statues to go around.
    A fair number were assisted by US service members, but we certainly didn’t have enough men to go around to handle them all and perform their duties.
    And you’re spot on about context, it’s everything and trivially missed by an AI at its current state of development. I figure they’ll get all the bugs ironed out and context accurately assessed around when we’re in full production of fusion power.
    I’ll see myself to the door…

  27. Jim Balter says

    Google also silently changed to LLM in their main search engine, with an order of magnitude lower accuracy.

    No they didn’t.

    As for the Saddam statue, you missed the point

    No, you did. I didn’t say that Iraqis didn’t beat the statue with shoes–they did. But I pointed out additional important facts about the event that were largely suppressed by the media.

    All of which get missed by a LLM, but should be caught by a GAI, should we ever get an effective one.

    I don’t disagree.

    we’re a long, long way to general AI yet.

    Yup.

    Chat GPT is only a small first step.

    Nope … ChatGPT is not a step toward AGI, it’s a diversion that impedes progress (which isn’t necessarily a bad thing given the risks, but LLMs have their own dangers). See Gary Marcus for details.

    The one I particularly recall, and that I remarked upon at the time, showed the crowd itself pulling the statue down.

    Nope; didn’t happen.

  28. Pierce R. Butler says

    No doubt an urban legend, but (perhaps) worth passing along:

    Supposedly somebody (Bell Labs?) attempted a Chinese-English computer translation project in the ’60s.

    They tested it, in part, by giving it an English phrase, then taking the Chinese output and rendering it back into English.

    “Out of sight, out of mind” came back as “invisible idiot”.

  29. Tethys says

    Good luck to Skatje!

    I’ve always wondered about her name.

    In the Eddas, Skaði is married to Njord, in recompense for the death of her Father.

    Her father was a Jotun named Þjazi who kidnapped Idunn, and was then killed by Thor.

    Etymology-wise:
    Skað means damage in ON, and is related to English scathe.

    Sceat, the name of a silver coin in Anglo-Saxon, is cognate with Dutch schat, Swedish skatt, and German Schatz: treasure, or darling (especially if you add the diminutive).

    I think you made a good choice with the spelling. It works with both etymologies, but English lost eth and thorn long ago.

  30. wzrd1 says

    Google also silently changed to LLM in their main search engine, with an order of magnitude lower accuracy.

    No they didn’t.

    OK, the results are consistent with the chatbot’s responses in search engine output format and show much the same errors the bot shows, but you’re right, Google just randomizes results instead because of hand wave.

    The one I particularly recall, and that I remarked upon at the time, showed the crowd itself pulling the statue down.

    Nope; didn’t happen.

    I saw the raw intelligence footage and didn’t pay attention to polished press accounts, as I dislike viewing things through yellow-journalism-colored glasses. There were many instances where the statues were pulled down, as you couldn’t throw a rock without hitting a Saddam statue when he was in power.
    I actually asked our intelligence folks why Iraqis had that much rope lying about and got a rather baffled look in response. Alas, that’s not unusual outside of an intelligence agency, where there would be a dedicated analyst for the country who likely would’ve had the answer.

    @40, the Snopes consensus was that the tale likely isn’t true; however, it could be, given the complexities of language.
    For some background, look up the thought experiments “Chinese room”. Wikipedia has a decent article on the subject. Peter Watts used it as a key plot element in Blindsight.
    Frankly, I couldn’t make an acceptable case in support of the AI understanding English, let alone Chinese. It’s just using a net to process longer strings, with minimal context analysis of the input, be it a question or decision-driving data sources. It’s pretty much the same way neural nets are used to process tons of image data to find deltas and/or similarities between objects. The net doesn’t have a clue what it’s “looking at”; it sees data without context and just looks for sum and difference results, as well as averages falling out of tolerance over time.

  31. Pierce R. Butler says

    wzrd1 @ # 42: … the thought experiments “Chinese room”.

    Thanks for the refresher. I feel the Wiki article you suggest somewhat misses the point in stressing differences between “minds” and “models of minds”, because I consider “mind” as a modeling process to begin with. Those of us with biological brains have a 3.5-billion-year advantage in dealing with the unexpected and the unknown; it may take the computers as long as 35 more years to catch up. (Possibly a bit less if Skatje Myers devotes herself fully to the project.)

  32. wzrd1 says

    The biggest problem we had was the model that “brains are computers”, along with the actual belief that brains are binary-based digital computers, rather than what they are: neuromodulated, with multiple neurotransmitters that give different results than a plain electronic binary gate could process. The closest we came was tristate logic, thanks to its high-impedance output mode, and that’s largely used only in multiplexers.
    Biological brains are pattern recognition engines, something computers formerly had to work hard at when processing data like imagery.
    Basically, due to deficient theoretical concepts, we hobbled ourselves in development. Neural networks, once a curiosity, are now beginning to emulate more closely what biology has given us, while retaining their advantage in speed, and they are in current use in research, such as in astronomy.