90% of everything is junk


I read Larry Moran’s What’s in Your Genome?: 90% of Your Genome Is Junk this week — it’s a truly excellent book, everyone should read it, and I’ll be making a more thorough review once I get a little time to breathe again. Basically, though, he makes an interdisciplinary case for the sloppiness of our genome, and it’s all that evidence that we should be giving our biology students from day one.

Anyway, I ran into a similar story online. Everything accumulates junk, from your genome to my office to Google. Cory Doctorow explains how search engines are choking on their own filth.

The internet is increasingly full of garbage, much of it written by other confident habitual liar chatbots, which are now extruding plausible sentences at enormous scale. Future confident habitual liar chatbots will be trained on the output of these confident liar chatbots, producing Jathan Sadowski’s “Habsburg AI”:

https://twitter.com/jathansadowski/status/1625245803211272194

But the declining quality of Google Search isn’t merely a function of chatbot overload. For many years, Google’s local business listings have been terrible. Anyone who’s tried to find a handyman, a locksmith, an emergency tow, or other small businessperson has discovered that Google is worse than useless for this. Try to search for that locksmith on the corner that you pass every day? You won’t find them – but you will find a fake locksmith service that will dispatch an unqualified, fumble-fingered guy with a drill and a knockoff lock, who will drill out your lock, replace it with one made of bubblegum and spit, and charge you 400% the going rate (and then maybe come back to rob you):

https://www.nytimes.com/2016/01/31/business/fake-online-locksmiths-may-be-out-to-pick-your-pocket-too.html

Google is clearly losing the fraud/spam wars, which is pretty awful, given that they have spent billions to put every other search engine out of business. They spend $45b every year to secure exclusivity deals that prevent people from discovering or using rivals – that’s like buying a whole Twitter every year, just so they don’t have to compete:

https://www.thebignewsletter.com/p/how-a-google-antitrust-case-could/

I’m thinking I should advertise Myers Spider Removal Service on Google, and then respond to calls by showing up, collecting a few spiders, bringing them back to my lab, and increasing their numbers a thousand-fold before returning them to the house in the dead of night. Then they call me again.

Hey, it’s a business model.

The comparison of Google’s junk to our genome’s junk falls apart pretty quickly, though, because your cells have mechanisms to silence the expression of garbage, while Google is instead motivated to increase expression of junk, because capitalism.

Comments

  1. says

    A year or so ago I saw a listing for a Saskatoon bar that had closed more than 30 years ago, showing it as open and giving its hours of business. I don’t know how many businesses have been in that particular space subsequently.

  2. wzrd1 says

    Well, a sizable amount of that junk DNA is of retroviral origin.
    Such as this junk DNA…
    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6177113/
    Downside is, some oncogenes originate from ERVs as well.

    As for fake locksmiths, not a problem here, as locksmithing is one of my favorite hobbies. I’ve also been known to leave a killer key lying around. Lock killer keys are a nuisance to folks who abuse spare keys that are left about, as once inserted, they cannot open the lock or be removed without first removing and dismantling the lock or having a specialized insert to allow that key to be removed.

    As for Google, been pondering increasingly moving away from Google’s crap search engine, as the results are growing worse and worse. Yahoo’s been giving results on par with what Google returned before they “improved” their search engine into uselessness by turning it into an advertisement engine.

  3. Reginald Selkirk says

    The short form:

    What’s in Your Genome?
    Larry Moran, 2011

    This is a list breaking down the components of junk DNA by size. Obviously the book is going to have more context and explanatory text. I bought my copy, but haven’t read it yet.

  4. hemidactylus says

    Moran’s evisceration of ENCODE was priceless and well deserved. He sets context before getting into the nitty gritty of that. Paraphrasing Metallica ‘the light at the end of ENCODE’s tunnel was a freight train of junk coming their way’. He’d been doing this on his blog for years.

    I was a bit confused on the distinctions between general noncoding DNA and junk DNA beforehand, as much RNA transcription (tRNA, rRNA etc) comes from noncoding regions. I think I’m clearer on that. It’s the fuzziness of the concept that adaptationist propagandists exploit by creating the popular perception that biologists thought so much of the noncoding regions were junk and suddenly functions are being ascribed. Urban legend.

    The notion of spurious transcription and junk RNA was a little bit novel to me. ENCODE made mountains out of molehills with the spurious stuff that meant nothing in actuality and hyped it as discovery of a largely functional genome.

    One thing I pondered was to what extent there is spurious translation into junk peptides.

    Also he takes issue I think (it’s been a while since I read the book) with the concept of evolvability as in junk regions allegedly being exaptable pools of novelty. Not so much.

    He does touch upon the interesting case of Syncytin-1 where a retrovirus just happened to be functionally cooptable into aiding placental development (PZ’s ballpark).

    I probably need to reread the book for more depth. Looking forward to PZ’s take.

  5. says

    Me, too. The plan was to put it up before I left for Cornell, but that idea got sideswiped by that drunken trash article by Coyne.

  6. robro says

    Like nomaduk @ #5, I use DuckDuckGo and have no complaints. I did searches for “locksmiths” and was offered an auto-complete of “locksmiths near my location”. The first two results were ads, so I skipped those. This was followed by an info box from Apple Maps with locksmiths in the town where I live and then other local locksmiths. I did a similar search for “handyman” with the auto-complete of “handyman near me” and got local results, although no ads.

    In all fairness, I did similar searches in Google search with similar results. I don’t use Google search much unless I’m looking for photos.

    Obviously Google, DuckDuckGo, and other search bots are in no position to vet the results. “Buyer beware” is sage advice when shopping online.

    The “auto-complete” aspect of those searches is probably the result of AI/ML/LLM technologies on the back end. I have a colleague who is a graph database “ontologist” who refers to chatbots, like ChatGPT, as just high-powered “auto-complete”: you give it a prompt, it returns a statistical result.

    In any case, I don’t think we’re anywhere near doomed by AI/ML/LLM. There are other bigger problems to deal with.

  7. birgerjohansson says

    When I look back at stuff I have written in the past I would say 50-70% of it is bland or garbage. So I am slightly better than machine learning (I will not dignify it with the term AI).

  8. chrislawson says

    I have plenty of complaints about DuckDuckGo, but its deficiencies are stable while Google is rapidly decaying as a functional search engine.

  9. hemidactylus says

    I do also recall Larry taking the hype of alternative mRNA splicing down quite a few notches. I was enthralled by that idea back in the late 90s and want to think Larry keelhauled me a bit back then on talk.origins or at least that idea. He earned the nickname Waldorf (balcony muppet) as a term of affection from me.

    https://en.m.wikipedia.org/wiki/File:Statler_and_Waldorf_2.jpg

    https://static.wikia.nocookie.net/muppet/images/4/48/WaldorfLookingUpStock2022.jpg/revision/latest?cb=20221017222518

    https://biochemistry.utoronto.ca/wp-content/uploads/2014/10/Moran-e1412390438910.jpg

    Striking resemblance?

  10. hemidactylus says

    Also I don’t know if John Wilkins lurks here but Larry references him several times in the book. “[M]y friend John Wilkins” in one reference! Another “One of my friends is John Wilkins, a philosopher and a veteran of the creationist wars. He has been studying the species concept for several decades. I highly recommend his latest book: Species: The Evolution of the Idea (2nd ed.).”

    I’m going old school with that!

  11. wzrd1 says

    hemidactylus @ 6, the fun part is, Syncytin-1 is part of an ERV. The paper I referenced above discusses it at some length, as well as other ERVs that became part of the placental system.
    I’m feeling a tad lazy, but it shouldn’t be too difficult to find links to some genetic diseases that involve junk DNA defects preventing proper reading of the “good” DNA. Fouled non-coding DNA as well, and more numerous, as those can foul peptides, structural proteins, enzymes and a bit more.

  12. tacitus says

    Google has become a lot less useful on the tech side too, since the top results have all long since been hijacked by superficial autogenerated comparison sites and “top 10” lists filled with useless drivel and/or promoting whichever products earn them the most money through their Amazon affiliate link.

    For the last year or two, more and more people were discovering that adding “reddit” to the search terms was a more fruitful way to discover good information about tech products created by real people and not bots. Trust the Reddit CEO to go and screw all that up too…

    I guess the one upside is that it gives more specialized well curated sites a chance to compete against the behemoth that is Google, not long after it all seemed to be a hopeless task. They don’t need a billion views to be profitable either, so maybe there’s room for a healthy ecosystem to thrive where mega corporations are failing.

  13. wzrd1 says

    They’re not top 10 lists, they’re top 10 paid advertisement sites that paid Google top 10 for top listing. Followed by sponsored sites, followed by search results that reflect the sponsored sites, then random bullshit that has absolutely no relationship with what one was searching for.
    I used to be able to use regex to search Google, but that’s been fouled for much of this year.
    Heh, just checked Google search trends, they’re filtering “Google search sucks”, displaying hit numbers on the chart, but mysteriously, “no data is available”. I know that’s been one of my searches, call it feedback.

  14. says

    The commenters on PZ’s blog are usually astute. But, they show extreme gullibility by using GOOGLE. For years we have been warning people that – –
    G00GLE is dangerous crap. It profiles you, gathers a personal dossier on you and feeds you results that IT thinks you want to see (or what it is paid most to push at people). There are almost no ‘native search’ results. (native search being an unbiased search for what’s most relevant to your search terms)

    Please, use duckduckgo. It is not perfect, it does have a few ads, but clearly labels them and does not track you or create its own profile of you personally. Its native search results are quite good.

  15. John Morales says

    shermanj:

    The commenters on PZ’s blog are usually astute. But, they show extreme gullibility by using GOOGLE.

    You are exceedingly naive; Google is just another service, made to make money just as all other “free” searches.

    I use it, it does not use me.

    Please, use duckduckgo.

    You don’t think they’re there to make money? Heh.

  16. John Morales says

    I admit Google is far, far less useful than it was before it was monetised without end and its functionality scrapped.

    Used to be one could construct a search query using logical operators and set operators and so forth, and set various limits on the search.

    Ostensibly, it’s all still there (some, such as the site: operator still work as expected), but now are overridden somewhat poorly. Basically, the easiest and most non-frustrating way to use it is to just treat it as a natural language search and limit the terms as much as possible.

  17. says

    Hmmm, I think I hear a troll scrambling around in the debris under this bridge. I’m going to ignore it and hope it goes away.

  18. says

    @21 wzrd1: I apologize, I was working on repairing a friend’s computer and almost missed your CLEVER reference to an old expression: You’re not going to let him ‘get your goat’.

  19. John Morales says

    Calling someone an ‘it’ is what some might categorise as dehumanising, not just othering.
    Saying to the world at large that you are ignoring that ‘it’ is not ignoring that ‘it’.

    (Take a look at yourself)

  20. John Morales says

    And chortling about how your goat was not got is also not ignoring me, again.

    (You’re so sure I’ve not got your goat?)