I could have used this last semester

I’m on a search committee for a tenure track position in statistics and computer science — we’re looking for someone to teach a data science course, maybe a little bioinformatics on the side, and work with both our statistics and computer science disciplines. I’m the outside member of the committee — you know, the weirdo who isn’t steeped deeply in the culture of the disciplines and maybe is better able to provide the big picture perspective on how candidates will fit with the rest of the university — so I know next to nothing about this stuff. My eyes were crossing and my brain was breaking as I reviewed candidate applications. What I really needed was this bingo card. I think I saw all of those terms fly by as I was flipping through CVs and research and teaching statements.

Don’t worry, I deferred to the expertise of my colleagues on all matters dealing with the details of their work.

It’s always interesting, though, to peek into the domains outside my own, and feel a little humbled at all the stuff I don’t know.


  1. says

    It’s funny because it is true. But only for US conferences, not European ones, where functional programming and cloud would have to be much more prominent.

  2. sirbedevere says

    Being the “outsider” can actually be enjoyable and informative if the “insiders” have good attitudes. I’m a photographer and digital artist, and I currently teach in the Art & Graphic Design department. But at my previous job (my first full-time teaching position) all the courses I taught – Photoshop, Multimedia Design and Web Design – were deemed by that university to belong to the Computer Science & Information Systems department. So I found myself to be the lone artist in a department full of people with PhDs in math and computer science! Smart people with good attitudes and a pleasure to work with.

  3. Enkidum says

    R doesn’t suck! The help files suck, in a truly appalling way, and the user community are a bunch of snobbish assholes. But when you actually figure out how to do something it’s probably the most powerful and flexible stats-and-plotting-specific language out there.

    (Well, actually, I really only know three or four languages, but from what I understand little else comes close).

    And, uh, Merry Christmas?

  4. ekwhite says


    I was thinking the same thing. What sucks about a free programming language with the same functionality as statistical packages costing thousands of dollars? I haven’t personally interacted with the user community, but I can usually find a package that does what I need online.

    Oh yes, and Merry Christmas to you too.

  5. says

    R doesn’t suck. I personally prefer Python+numpy+scipy+matplotlib by a long shot, but R seems to be good at what it does.

  6. anchor says


    So anyone who may be qualified for that position not only knows what those things in those boxes mean, but (according to the CVs of the best candidates, no doubt) actually has an academically-trained aptitude or expertise in those things, whatever they may be…over and beyond, presumably, knowing that these items are specifically sought by potential hirers, who just as presumably understand exactly what these things are and know they need them.

    I am perplexed impressed.

    PZ, you must be my hero. I insist.

  7. Chelydra says

    The “iris data set” is from Edgar Anderson’s work in the ’30s on morphological delineation of species. He carefully measured petal and sepal dimensions for several populations of “blue-flag” irises and was able to demonstrate that the difficult-to-distinguish, variable Iris versicolor and Iris virginica really were consistently separable. He also proposed from the data set that I. versicolor was a hybrid between I. virginica (eastern US) and Iris setosa (Alaska), which was subsequently proven genetically.

  8. Pierce R. Butler says


    What do the opening notes of the Pink Panther theme signify in data science?

  9. Compuholic says


    Hadoop is a programming framework that makes large datasets easier to handle. It lets you distribute storage and computation across many systems.