Gary Farber, unsung prophet of the internets


I’ve been catching up with the blogs, and I’m seeing outrage over the revelation that the NSA has been carrying out wide-spectrum data mining of the American people…that it hasn’t just been surveillance of suspected terrorists. You know, if you would all just read Gary Farber, you’d have known this five months ago. That’s how data mining works. Now people are trying to argue that we knew it all along, so it’s OK, but this is exactly what the administration has spent the last several months denying.

It’s not just the surveillance. It’s the lying. Well, the obtuseness, too.

Comments

  1. Miguelito says

    Now they can track who is calling who.

    Next, they will be searching domestic telephone conversations for keywords, then linking the keywords to who calls who, because an anomalously high number of keywords for one person can lead to a terrorist. If you’ve done nothing wrong, you have nothing to fear by appearing in their database. You want to keep America safe, right?

    Finally, there will be transcriptions of all conversations. Again, if you’ve done nothing wrong, you have nothing to fear about them looking at everybody’s past conversations, right?

    And, with Bush and whatever Republican president follows him in charge (can anybody honestly see a Democrat getting elected in the next decade?), you can guarantee more Alito-like Supreme Court nominations, where the candidates believe in strong police powers over civil liberties. Nothing will be able to control intelligence gathering if dictated by the president.

    And the police state will be there, but it will come upon you so gradually that you won’t know it until the morality police are at your bedside one morning because of a conversation you had the previous evening.

    Yes, I’m being paranoid. But, the line between security at the federal and state levels can become blurred if states demand access to such a resource. Should sodomy ever become a crime again, then it becomes possible for state police to mine existing databases to round up offenders.

  2. Keanus says

    Seldom am I grateful for being serviced by a very local independent phone company, but this is one time I am. Our service (which includes reliable very high speed DSL) comes from that giant of telecom, D & E, a mostly rural service provider in central Pennsylvania. Bush knows nothing (at least I don’t think he does) about whom I’ve phoned in the last four years. Having said all that, where possible, those who buy their wireless, land lines, or long distance from Verizon, AT&T, or BellSouth should find another provider. Any sob who reveals private phone usage to Dubya needs to be punished.

    Tellingly, the quotes attributed by Qwest and the USA Today report to the NSA officials reveal that they knew perfectly well that what they were doing was at best very suspect. How do people get away with this?

  3. Mark Paris says

    “Next, they will be searching domestic telephone conversations for keywords …”

    Miguelito, I strongly suspect that the NSA is already doing that. As I pointed out in a comment on Ed Brayton’s blog on this subject, that’s the only reason I can think of for the Bush administration not to get warrants for the NSA program, since they can tap first and ask later. It would be hard to get a warrant for every tappable call in the US.

    I wish SB would fix their commenting problems.

  4. Stwriley says

    Actually, Bush is still denying that data mining is taking place, even though the only possible use of such a database is for (wait for it…) data mining! His public comments today, as widely reported, fall back on the old “we’re only targeting al-Qaeda” line. The quote found in most press outlets runs like this (from the BBC newsfeed):

    “The privacy of ordinary Americans is fiercely protected,” he said, adding: “We are not mining or trolling through the personal lives of millions of innocent Americans.”

    If you’re unsure exactly how it’s possible to gather data on who Americans are calling without “trolling through” their personal lives, then you have passed the reason test and are clearly an enemy of the administration (oops, good thing I didn’t say that on the phone.) Yes, once again, the current occupant has lied in an obvious and clumsy attempt to keep his lawless behavior acceptable to the American people. The question now is, will this finally make people see how important this story really is?

  5. says

    Paul Begala made an excellent comment on CNN. He pointed out that the large telephone companies that have agreed to cooperate with the administration on domestic surveillance may have an ulterior motive. Begala reminded viewers that the phone companies are looking to remove net neutrality and would need the support of the administration to get that done. His point is that these companies and the administration are playing “we scratch your back and you scratch our backs.” It certainly sounds like a plausible scenario.

    more observations here:

    http://www.thoughttheater.com

  6. says

    “Next, they will be searching domestic telephone conversations for keywords, then linking the keywords to who calls who, because an anomalously high number of keywords for one person can lead to a terrorist.”

    Close, but I’m dead sure they’ve been doing this for years. The only “next” is when that will be leaked, and everyone will express such surprise.

    Similarly, the nonsense they’re putting out now about how it’s just phone records, without names attached, is inane. Ever heard of a phone book?

    And as I’ve said many times, anyone who thinks that the NSA isn’t also making use of every commercial data-mining databank out there, credit cards, bank records, utilities, everything, and correlating it all, is incredibly naive.

    But we’ll have to wait for that, too, to be leaked, and then people will be further shocked, shocked at that.

    “Bush knows nothing (at least I don’t think he does) about whom I’ve phoned in the last four years.”

    Yeah, if you’ve ever called outside your county, I wouldn’t count on that, since, as I’ve been writing for months, and as James Risen and Eric Lichtblau and Murray Waas and others have documented, the switches are where the data is flowing from. (Bruce Schneier linked to my last substantive post on this, if you want someone else with authoritative knowledge to vouch for it.)

    Thanks, as always, for the link, P.Z.; I didn’t have you in mind with the post I just did about the Founding Fathers, but I imagine you’ll appreciate it.

  7. says

    Incidentally, I did find that clicking “remember personal info” prevented me from commenting after I signed into Typekey, and I had to go back to sign in to Typekey, but not click on that option, to be able to comment. So it’s quite possible that it’s not a “random” problem, but that problem, specifically.

    Meanwhile, this is terrible advice: “Delete your scienceblogs.com cookies. This is made very simple with the use of the “Zap Cookies” bookmarklet, which can be found on this page.”

    Why? Because if people are so ignorant as to not know how to delete their cookies eight other ways from Sunday, they’re not going to know that this advice will delete all their cookies, and you’ve just mightily screwed up zillions of their interactions with the internets! For goodness sake, advise them to delete just the cookie for here, if need be (and apparently it’s unnecessary), but don’t tell them to get rid of all their cookies without telling them what effect that will have on every site they’ve got a standing log-in at!

  8. Frost says

    This is outrageous. How much more are Americans willing to tolerate? Does the majority of them care about civil liberties at all?

  9. Caledonian says

    No, they don’t.

    Perhaps the solution is for a percentage of the population to get on their phones and discuss assassinating the President. That’d likely screw up the database right quick.

  10. says

    “Next, they will be searching domestic telephone conversations for keywords, then linking the keywords to who calls who, because an anomalously high number of keywords for one person can lead to a terrorist.”
    “Close, but I’m dead sure they’ve been doing this for years. The only ‘next’ is when that will be leaked, and everyone will express such surprise.”

    No. They haven’t been. Not because they don’t want to (I’m sure the NSA would love to), but because they can’t. Effective voice recognition needs training data pegged against known text. That’s why when you buy Dragon Speaking or similar products you have to read pages and pages of material to it, and even after doing so, it still doesn’t work for anyone but you. Voice recognition uses nearly identical machine learning technology to what is used in gene finders (and I’ve written a gene finder).

    So, unless a bunch of Men in Black show up at your house asking you to recite dozens of pages of text into a microphone, they aren’t going to automatically find key words in your phone messages. Of course, if you are a high-profile “threat” they may employ *people* to listen to your phone calls, but that doesn’t scale well.

  11. says

    “Effective voice recognition needs training data pegged against known text. That’s why when you buy Dragon Speaking or similar products you have to read pages and pages of material to it, and even after doing so, it still doesn’t work for anyone but you.”

    I could be wrong, but I’m willing to bet a nickel that the Puzzle Palace has more advanced voice recognition software than DragonSpeaking. I’m even willing to bet a dime.

  12. says

    Not that I’m saying that they have magically perfect-working voice recognition software, mind. I merely suspect that they have such that’s good enough to get some use from. The entire Program wouldn’t make much sense without it. Have you read Risen and Lichtblau?

  13. says

    “Not that I’m saying that they have magically perfect-working voice recognition software, mind. I merely suspect that they have such that’s good enough to get some use from.”

    Maybe the NSA does have somewhat better software than what’s out there commercially, but it isn’t any more reasonable to assume that they have machine learning techniques that don’t require training sets than to assume that they have perpetual motion machines. Unless Majestic-12 is giving them alien technology from Roswell or something.

    “The entire Program wouldn’t make much sense without it. Have you read Risen and Lichtblau?”

    If it’s the article I’m thinking of, they claim that the NSA is listening in (without warrants) on the conversations of about 500 people. That’s completely believable using people. The NSA has over 30,000 employees.

  14. says

    In point of fact, I’ve posted so many posts and links about NSA voice recognition that I’ve lost track, off the top of my head. Here’s one. I see you’re not familiar with Risen, so here’s another. It might be no larger than 500 people in the U.S. being listened to at a given time, but it’s thousands overseas (Risen said seven thousand), and from a vastly larger pool. As you can see from my previous cite, “machine analysis” and filtering are specifically outlined.

    That is significant to the public debate because this kind of filtering intrudes into content, and machines “listen” to more Americans than humans do.

    Etc.

  15. says

    Gary, the point is that claims on your blog, or even those of a professional journalist like Risen, can’t negate actual technological limitations. I think it is pretty significant that nobody making claims of machine recognition of key words in phone conversations has any background in machine learning. To someone who has never coded up a Hidden Markov Model or Support Vector Machine, the idea that algorithms could train themselves without data may seem completely within the dark powers of the NSA, but to someone who has, the idea just doesn’t make sense.

    Again, I’m not defending the NSA — illegal wiretapping is unethical no matter how it’s done — but the NSA is made up of mere mortals and their algorithms have to obey the same constraints as anybody else’s.

  16. says

    I’ll certainly agree I can’t speak to the technical aspects. I’d be rather surprised to find that Risen got things so wrong, though.

    For the record, though, no, I don’t believe that “the idea that algorithms could train themselves without data maybe seems completely within the dark powers of the NSA.”