I am an AI training module


I’m trying to plan some alternative teaching options for the Fall, since I might be temporarily incapacitated for a bit — I’m waiting on a call from the podiatrist right now, which might define some of my limitations. One of the obvious fall-back strategies would be to do some lectures remotely, since we’re all well-trained on using Zoom nowadays. Except that now I learn Zoom wants to use us.

Zoom has rolled out a controversial update to its terms of service, adding a clause that allows it to use customer data for AI and ML training.

The pertinent clause is quoted below:

You consent to Zoom’s access, use, collection, creation, modification, distribution, processing, sharing, maintenance, and storage of Service Generated Data for any purpose, to the extent and in the manner permitted under applicable Law, including for the purpose of product and service development, marketing, analytics, quality assurance, machine learning or artificial intelligence (including for the purposes of training and tuning of algorithms and models), training, testing, improvement of the Services, Software, or Zoom’s other products, services, and software, or any combination thereof, and as otherwise provided in this Agreement. In furtherance of the foregoing, if, for any reason, there are any rights in such Service Generated Data which do not accrue to Zoom under this Section 10.2 or as otherwise provided in this Agreement, you hereby unconditionally and irrevocably assign and agree to assign to Zoom on your behalf, and you shall cause your End Users to unconditionally and irrevocably assign and agree to assign to Zoom, all right, title, and interest in and to the Service Generated Data, including all Proprietary Rights relating thereto.

Those bastards. This is a sneaky way of violating privacy, confidentiality, and our ownership of classroom content. If they announced that my lectures could be lifted wholesale and sold for use by anyone else, that would be less of a violation than this. They figure they could use a computer proxy to take all that content, massage it, refilter it, and then distribute it without attribution in the guise of AI — no one will be able to blame or credit me as a source, which is an essential part of the way science works.

They’ll also be able to steal my students’ contributions, although mostly they’re all silent black rectangles on the screen. They will chime in with good stuff now and then, though, all of it to be grist for the AI machine.

Also, people use Zoom for business meetings, where real money is at stake. I wonder how they’re going to take to the idea of a digital spy lurking in the background?

Or how about medical consultations? Are those to be AI fodder, too?

Comments

  1. birgerjohansson says

    Using AI tools in a medical context puts demands on the MDs to be aware of the limitations of their new tools, which is not a trivial detail.

  2. robro says

    My first question is what’s “Service Generated Data”? So I went to the Zoom Terms of Service and in the sentence leading into the quoted excerpt the agreement says:

    10.2 Service Generated Data; Consent to Use. Customer Content does not include any telemetry data, product usage data, diagnostic data, and similar content or data that Zoom collects or generates in connection with your or your End Users’ use of the Services or Software (“Service Generated Data”).

    So they are making some kind of distinction between “Customer Content” and “Service Generated Data.” The entire section 10 is titled “Customer Content”, and here’s the first section:

    10.1 Customer Content. You or your End Users may provide, upload, or originate data, content, files, documents, or other materials (collectively, “Customer Input”) in accessing or using the Services or Software, and Zoom may provide, create, or make available to you, in its sole discretion or as part of the Services, certain derivatives, transcripts, analytics, outputs, visual displays, or data sets resulting from the Customer Input (together with Customer Input, “Customer Content”); provided, however, that no Customer Content provided, created, or made available by Zoom results in any conveyance, assignment, or other transfer of Zoom’s Proprietary Rights contained or embodied in the Services, Software, or other technology used to provide, create, or make available any Customer Content in any way and Zoom retains all Proprietary Rights therein. You further acknowledge that any Customer Content provided, created, or made available to you by Zoom is for your or your End Users’ use solely in connection with use of the Services, and that you are solely responsible for Customer Content.

    However, the ZOOM TERMS OF SERVICE are so heavily legalistic that I’m not sure a mere mortal can make any sense of them. I, a mere mortal, don’t understand the distinction they are making between “Customer Content” and “Service Generated Data” and what is covered by what. Perhaps a lawyer in the group can decipher it.

    In any case, I suspect business clients of Zoom will scream and get some kind of way to opt out of this feature. The business I work for would, but we use WebEx.

  3. birgerjohansson says

    …Also, welcome to late-stage capitalism. Teddy Roosevelt saved the future of the USA when he put restraints on the robber barons. FDR et al. put further constraints on the banks. Richard Goddamn Nixon created the EPA.

    All those restraints have been removed, or so hollowed out that, if you want a smidgen of government protection, I recommend moving to one of the EU countries. Canada is dangerously close and will get dragged down if your economy implodes.

  4. Dunc says

    They figure they could use a computer proxy to take all that content, massage it, refilter it, and then distribute it without attribution in the guise of AI

    I think it’s rather more likely that they want to use AI for things like noise suppression and brightness modulation, but that’s not clearly specified in the text of the terms.

  5. Alan G. Humphrey says

    The first thing that popped into my mind was the use of high-quality video to capture moving faces in conversation for AI facial recognition. That one is worth many billions of dollars, and now I’m thinking it’s time to check the terms of use for smartphones, too.

  6. wzrd1 says

    It’s a bit more damning, really. Zoom is admitting openly that they’re recording all user sessions.
    Which is entertaining, really, given that in many jurisdictions that’s a felony.
    Gotta love it when someone openly admits to committing a felony. Hopefully, quite a few state attorneys general are paying attention.

  7. says

    Yep, it’s a job for the lawyers.

    Personally, if I were a business (like a university), I’d send them a cancellation notice on my subscription, with an offer to renegotiate the contract subject to rewritten terms of use. Pushback is the only way to end enshittification.

  8. Pierce R. Butler says

    Marcus Ranum @ #10 – The world shall end with an “Aaah, what’s on TikTok?”

  9. wzrd1 says

    Marcus Ranum @ 10, I pity those poor AIs, as I am decidedly a confounder for any study. Poor Google has largely given up, returning discordant results when trying to predict my interests. Facebook, well, it’s pretty much forgotten what I look like.
    sys 64738. ;)

  10. robro says

    Marcus Ranum @ #10 — Based on my current work, there is no AI capable of doing anything like what you suggest, except in the feverish minds of a few Hollywood producers. Could it in the near or long term? That’s difficult to say, of course, because my Ouija board refuses to return a coherent response no matter how I design the prompt. I think very long term is a maybe. There’s a tremendous amount of hype around AI in general, and LLMs in particular. Anyway, there are probably more immediate existential crises to worry about. Man, is it hot again today.

    wzrd1 @ #12 — A work colleague was just telling me about a video he saw with Meta’s main AI guy saying that their AI/LLM is a mess.

  11. birgerjohansson says

    AIs will not reach “strong AI” levels in our lifetimes, as the problem of consciousness always turns out to be harder than we expected, like tokamak fusion energy, only worse.

    I exchanged a few letters with Arthur C Clarke. He had been certain that strong AI would have emerged not much later than it did in his stories, so we were all disappointed.

    But in regard to real, strong AI, the webcomic Saturday Morning Breakfast Cereal is addressing that topic with much greater wit than I could ever do.

  12. wzrd1 says

    Strong general AI is right around the corner. Fusion power is right around the corner. Flying cars are right around the corner. Unicorns are farting right around the corner.
    I expect to smell unicorn farts first out of all of those lousy predictions.

    Oh, another new policy from Zoom. They’re forcing their workforce to return to the office a minimum of two days per week.
    https://www.cnn.com/2023/08/07/business/zoom-return-to-office/index.html

  13. Snarki, child of Loki says

    Set up video cameras observing your spiders 24/7; feed them into Zoom.

    Not sure what that’ll train their AI to do, but I’m sure it won’t be what they’re hoping for.

    If they find their CEO wrapped up in silk and sucked to a husk? Karma.

  14. says

    @15 Honest, competent, personable politicians, however, are not right around the corner (if we’re lucky, we can choose one; anywhere in Illinois, those are undefined terms).

    @3 “Service Generated Data” includes all aggregation and telemetry (e.g., “just what is the actual IP address, upload/download speed, and native screen resolution of the machine?”). Zoom is carefully distinguishing between anything that might be copyrightable and anything for which there is clearly no copyright interest but which may involve a different type of interest (hypothetically — one must look in other parts of the TOS to see it — location data -> privacy, because one cannot run Zoom even through a VPN without transmitting location data).

    And why is that? Because unlike most other aspects of “confidential information,” one cannot transfer a copyright interest without a signed writing. Zoom really, really, really doesn’t want to get into the “who owns the copyright in audiovisual representations directly and intentionally procured by the person appearing in them?” quagmire. The answer, of course, is “It depends — and not on the TOS, which is as a matter of law not a signed writing sufficient to convey a copyright interest or the entire copyright unless it’s, you know, signed and not just clicked/checkboxed, presuming that there is a copyright interest to convey.”

  15. says

    @16 I think he already has. If you look at the “new” TOS terms, they’re obviously designed to cocoon the information and save it to suck out later…

  16. imthegenieicandoanything says

    They saw that Black Mirror episode “Joan is Awful” and took it as being a great idea, but plan on changing the end and becoming… all powerful… forever!

    Really, deep inside, they want to create Skynet, because they (the people behind pretty much everything huge in this version of “capitalism”) are self-pitying, childish dweebs who want to pretend they’re rational nihilists. They want to be in a fucking story with an ending.

    And then a sequel.

    And then a franchise.

  17. Raging Bee says

    I hope the Zoom folks are at least taking some measures to keep all “their” content from being accessed by OTHER people’s AIs…

  18. John Morales says

    Raging Bee, you write as if “the Zoom folks” [folk is already a collective noun] themselves had AI, which is an unwarranted inference.
    You also write as if AIs had agency (they don’t); what they are is a tool.

    Anyway, to your point: from a game-theoretical standpoint, if you can make it cheaper to buy the data rather than to steal it, then people will buy your data rather than steal it. $profit$

  19. Raging Bee says

    Actually, no, I’m not talking about anyone’s AIs having agency; I’m talking about what the people in charge are doing (if anything) about keeping all this information they own out of other hands.

  20. beholder says

    Those bastards. This is a sneaky way of violating privacy, confidentiality, and our ownership of classroom content.

    You relied on an unaccountable third party running untrusted and fundamentally unauditable software to provide this service. What did you expect would happen?

    There are alternatives out there. Jitsi Meet and other WebRTC-based platforms come to mind. You’ll have to set up the server yourself, but if you care about classroom privacy and accountability, you, or someone you trust, shouldn’t have a problem with that.

  21. John Morales says

    Raging Bee, you wrote “at least taking some measures to keep all “their” content from being accessed by OTHER people’s AIs”, so…

    First, the only reason to write “OTHER people’s AIs” is to distinguish those from their AIs, else one would just leave out the capitalised, emphasised OTHER.

    Second, if it’s being done by other people, then it’s not the AIs doing it, is it?
    It’s being accessed by OTHER people, not by AIs.

    Third, why should it be kept out of other hands when $profit$ is the name of the game? That was my very point, oblique as it might seem to you.

    (The locution ‘you write as if’ does not mean ‘you write because’)

  22. John Morales says

    beholder,

    There are alternatives out there.

    True. But PZ is not self-employed, is he?
    He has to use what his employer deems proper, if anything is to be used.

  23. snarkhuntr says

    This is probably going to get a lot worse before it gets better. We’re currently in the midst of an AI goldrush/hype cycle. Some AI systems have shown themselves able to do some really neat things (ChatGPT, Midjourney, Stable Diffusion, etc.). When you’re trying to train an AI to do things ‘like humans do them’, you need a vast source of training information produced by humans. Without the vast network of (mostly) human-created text to train LLMs, ChatGPT doesn’t exist. Without vast archives of images created by humans and labelled by humans with machine-readable text, text-to-image systems like Midjourney and SD just produce noise. Without the large archives (GitHub etc.) of human-produced, annotated, and categorized computer code, no LLM is going to write usable software.

    Unfortunately for the AI rush, AI output is also being distributed through the internet, often passed off as human-created or otherwise not identified as AI output. This is a serious concern for AI makers, because the more AI output you feed into an AI’s training data, the noisier and less useful the new AI’s output is going to be (a toy sketch at the end of this comment illustrates the effect). Finding clean, human-produced datasets is going to be a big part of the new goldrush, I think.

    I suspect that you’re going to see more and more companies who process human-created information realize that that information has resale value to the AI industry. I’m sure that the vampires at Zoom are trying to figure out a way to market ‘anonymized’ human conversation data/recordings/audio to future AI trainers without completely alienating their userbase – just like Twitter, Reddit, GitHub, many stock-photography companies, and probably hundreds of other companies I can’t even imagine are doing. Clean human-produced information is going to get more valuable for the next few years.
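
    To make the feedback problem concrete, here’s a toy sketch (purely illustrative, and not anyone’s actual training pipeline): each “generation” fits a simple model, here just a Gaussian, to samples produced by the previous generation and then samples from that fit. With finite data the estimated spread tends to drift downward, so the tails of the original human-produced distribution gradually disappear. Everything in the snippet is made up for illustration.

        # Toy illustration: a generative "model" repeatedly retrained on its own output.
        import numpy as np

        rng = np.random.default_rng(0)

        N_SAMPLES = 20        # small "training set" per generation
        N_GENERATIONS = 300

        # Generation 0: "human-produced" data with a known spread.
        data = rng.normal(loc=0.0, scale=1.0, size=N_SAMPLES)

        for gen in range(1, N_GENERATIONS + 1):
            mu, sigma = data.mean(), data.std()
            # Retrain on the model's own output: draw the next "dataset"
            # from the distribution fitted to the previous one.
            data = rng.normal(loc=mu, scale=sigma, size=N_SAMPLES)
            if gen % 50 == 0:
                print(f"generation {gen:3d}: estimated std = {data.std():.4f}")

        # Typical run: the standard deviation shrinks from ~1.0 toward 0,
        # i.e. the self-trained "model" loses the diversity of the original data.

    The same logic is why AI vendors want guaranteed-human sources like Zoom calls: fresh human data keeps the distribution from narrowing on itself.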

  24. wzrd1 says

    beholder @ 25, but then one has to retain and pay skilled staff to configure and maintain such things, while cheaper options abound, with mere data retention being a thing.
    I wish I was joking, but I’m not. My video experience goes back to ancient times, with massively bandwidth-intensive CU-SeeMe upgrades; Microsoft stepped in briefly with a freebie that killed CU-SeeMe and the bandwidth-intensive options, then that went away.
    At the end of the day, corporate sees dollar signs: payroll and benefits vs hosted and hostaged. And given the sheer volume of ransomware attacks that are successful and paid off, yeah, retaining skills isn’t very high on the corporate list of requirements.

    snarkhuntr @ 28, remember the dot bomb? Enough said: tons of vaporware to sell quickly. Hell, John Morales stumbled over it and didn’t bother to detail it.
    Yeah, my AI code is in my other set of balls, excuse me while I get them. Oh, oh… Ummm, I seem to have mislaid them. Unicorn farts are all that’s on offer right now.
    And for job offers, if I see AI offered as a service to “improve my resume”, it’s a gigantic, gamma-ray-illuminated warning label.
    I enjoy science fiction films and books, not science fiction resumes, which would just go onto a blacklist.

  25. John Morales says

    Me: [folk is already a collective noun]
    Fanboi querulousbob: “the eyerolling parenthetical”

    <snicker>

    Dude, I am 100% correct, however many RPM your eyes can manage.

    And, as usual, you pop into a thread merely to try to somehow, anyhow, denigrate me. As usual, you fall flat on your face.

    As usual, nothing to say about Zoom, about PZ’s plight, or about AI, or anything related to the topic at hand.

    (Your singular focus, as usual, is me. There, there, fanboi — have this dropping)

  26. John Morales says

    So tedious.

    You can’t help yourself from making threads tedious, quackybog. I get it.
    You’re doing it right now.

    You feel impelled to write about me, instead of about whatever the post is about.
    Evidently.

    (Quite the sad Walter Mitty thing you have going there)

  27. steve1 says

    Zoom is a HIPAA-compliant platform, but I question whether this update to their terms of service violates HIPAA.
    I think lawyers are needed. My experience with the law has only taught me that I don’t know much about the law.

  28. seversky says

    Podiatrist joke, “I didn’t think wearing orthopedic shoes would help but I stand corrected.”

  29. says

    @35 Steve, most lawyers don’t know that much about the law. The good ones… admit that they need to do further research, and know enough to get started. (This is why proclamations from “activists” on what the law is should be taken with 10 kg of pure NaCl: most activists don’t, and those who do tend to ignore what they don’t like and instead substitute their view of what the law should be for what the law is.)

  30. snarkhuntr says

    @35 steve1,

    Remember that HIPAA also allows the sale of so-called anonymized medical data to for-profit data brokers. Deanonymizing such datasets is trivially easy once they’re correlated with other large datasets (a toy sketch at the end of this comment shows the basic linkage idea). No government agency is dealing with this, because that would be messing with The Money, which government is loath to do. See also: commercial sale of cell-phone location data.

    All Zoom would need to do is promise to ‘anonymize’ the training datasets they sell; there will not be consequences if people are subsequently able to deanonymize those datasets after the fact. Any other result would require people with money to suffer consequences, and our system avoids that at all costs.
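
    To make “trivially easy” concrete, here is a minimal sketch with entirely made-up data and column names: once an “anonymized” release shares a few quasi-identifiers (ZIP code, birth date, sex) with any public dataset, an ordinary join re-attaches names to the sensitive records.

        # Toy linkage attack. All data and column names are hypothetical.
        import pandas as pd

        # "Anonymized" release: names stripped, sensitive attribute kept.
        anonymized = pd.DataFrame({
            "zip":        ["55901", "55901", "55902"],
            "birth_date": ["1970-03-02", "1981-11-15", "1970-03-02"],
            "sex":        ["F", "M", "F"],
            "diagnosis":  ["hypertension", "asthma", "diabetes"],
        })

        # Public records (e.g. a voter roll) with names attached.
        public = pd.DataFrame({
            "name":       ["A. Smith", "B. Jones", "C. Lee"],
            "zip":        ["55901", "55901", "55902"],
            "birth_date": ["1970-03-02", "1981-11-15", "1970-03-02"],
            "sex":        ["F", "M", "F"],
        })

        # Joining on the quasi-identifiers re-attaches names to diagnoses.
        reidentified = public.merge(anonymized, on=["zip", "birth_date", "sex"])
        print(reidentified[["name", "diagnosis"]])

    The only thing the “anonymization” removed was the one column that was easiest to reconstruct.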

  31. rorschach says

    We use Jitsi at work; it works well and doesn’t seem to suffer from Zoom’s privacy craziness.
    As to AI, I used the pro version of ChatGPT this week to have it code me a trading robot, and all I got was a bare C# template with the helpful comment “insert your trading logic here”. Needs work.

  32. wzrd1 says

    A/V encryption suffers from far too much thinking rooted in the past.
    Need encryption, which is expensive and intensive on CPU, not realizing that’s not the reality today.
    Shell into a machine and it’s encrypted with rather strong encryption, if you set it up correctly.
    Anything beyond that is encrypted.
    Use the ancient CU-SeeMe protocol, so what? Use Network Video, so what? Use see-my-balls, so what? It’s already encrypted. Add an encryption layer and it slows down a bit, but it arrives intact.
    The biggies are man-in-the-middle and escrow-key attacks. Add in bugs in the software to perfect the picture, not confound it.
    I can’t go into the reasons for the early objections (an NDA), but the latter is open source. As is much of the NDA griping.
    All of it currently used to object to the existence of crypto, due to a few pedophile criminals.
    Who were arrested and convicted despite their crypto.
    Which is supposedly utterly impenetrable.
    Ignore the convictions; it’s 2×2=5.

  33. John Morales says

    wzrd1:

    Need encryption, which is expensive and intensive on CPU, not realizing that’s not the reality today.

    Not really. And, of course, there are GPUs as a matter of course, in reality today.

  34. wzrd1 says

    John Morales: Yeah, because encryption cards haven’t been available since, what, ever?
    And GPUs vs supercomputers is, well, GPUs vs supercomputers is nothing, as if scale means nothing; quantum computing hand-wave or something.

  35. wzrd1 says

    John, the point being: back in the DriveSpace days, which I do well remember, my nut size and my wife’s tit dimensions weren’t recalled.
    With Zoom, it is, precisely and essentially, without real permission.

  36. John Morales says

    wzrd1, whether or not Zoom calls store participants’ testicular and breast volumes, I note you never did say how they store all that data from all those video streams so they can trawl it for AI training. (cf. #20)

    With Zoom, it is, precisely and essentially, without real permission.

    Unreal permission, in other words. Permitted permission?

    Functionally, if a user knows about this and still uses the platform, they are tacitly granting permission, no? If they don’t know about this, then they are ignorant users.

    (A bit of an asymptotic curve, are rules for the mitigation of outcomes for the ignorant)

  37. wzrd1 says

    How does the NSA store all that metadata in Utah?

    As for tacit permission, is that the same permission a woman gives by going outside and then being raped, knowing that’s possible whenever she goes outside? So that means it’s acceptable in your distorted universe?