The Scots version of Wikipedia is written by an American teenager


An American teenager who, by the way, doesn’t speak Scots. They copy Wikipedia articles written in English and use a dictionary to change a few words here and there, and they’ve been doing this at such a steady pace they’ve churned out tens of thousands of articles and hundreds of thousands of edits, and are the sole author on about a third of that version of the wiki.

The problem is that this person cannot speak Scots. I don’t mean this in a mean-spirited or gatekeeping way where they’re trying their best but are making a few mistakes; I mean they don’t seem to have any knowledge of the language at all. They misuse common elements of Scots that are even regularly found in Scots English like “syne” and “an aw”. They invent words which look like phonetically written English words spoken in a Scottish accent like “knaw” (an actual Middle Scots word to be fair, thanks u/lauchteuch9) instead of “ken”, “saive” instead of “hain” and “moost” instead of “maun”. Sometimes they just leave entire English phrases and sentences in the articles without even making an attempt at Scottifying them, never mind using the appropriate Scots words. Scots words that aren’t also found in an alternate form in English are barely ever used, and never used correctly. Scots grammar is simply not used; there are only Scots words inserted at random into English sentences.

Wow. As a kid, this person must have stumbled into Wikipedia editing, discovered a formula for getting credit as an “author” by rote copying, and then turned it into a matter of personal prestige. He probably thinks he’s making a legitimate contribution, but it’s “simply English, spelled poorly, likely intending to resemble a stereotypical Scottish accent.”

This is my problem with Wikipedia, and why I tell students it’s not an acceptable source for their papers. The lack of professional oversight means some people’s enthusiasms take over and are used as a substitute for expertise, and you can never be sure when that has happened, so you have to double- and triple-check everything the wiki says — you can use it in an initial exploration, but mainly to pluck out additional sources and rely more on authenticated publications.

This is the first I’ve heard of such wholesale fabrication, though.

This is going to sound incredibly hyperbolic and hysterical but I think this person has possibly done more damage to the Scots language than anyone else in history. They engaged in cultural vandalism on a hitherto unprecedented scale. Wikipedia is one of the most visited websites in the world. Potentially tens of millions of people now think that Scots is a horribly mangled rendering of English rather than being a language or dialect of its own, all because they were exposed to a mangled rendering of English being called Scots by this person and by this person alone. They wrote such a massive volume of this pretend Scots that anyone writing in genuine Scots would have their work drowned out by rubbish. Or, even worse, edited to be more in line with said rubbish.

Now I’m wondering what Wikipedia will do about this massive vandalism. Are they just going to rely on crowd-sourcing the cleanup? Will they even try to do a cleanup?

Comments

  1. Dunc says

    Bonus points: people use wikipedia as training data for language processing applications.

  2. cartomancer says

    Aaaahz an Ahmerrrrriken aaaai thaenk thus iyyyys jerrrrrst fahhhhhn. Ahhhhh downet sayeee waaaaaahut ther prahhhblum iyyyyys.

    Look! I did a comment in American English! This is fun!

  3. vole says

    Don’t get me started. The “UK English” dictionary and grammar checker in MS Word contains loads of Americanisms, my pet hate being the way “any more” (2 words, correctly) gets flagged as an error. To write it as a single word is incorrect over here, but if supposedly authoritative software keeps telling people otherwise, it’s only a matter of time before the Americanism takes hold. It’s sneaky cultural imperialism.

  4. mailliw says

    I’d like to know where the series Castle got its appalling mangling of my native dialect in the episode The Geordie.

    The producer has since apologised to the people of Newcastle upon Tyne.

  5. says

    #2: I’m sorry, Cartomancer, that’s southern American. Yankee American is flatter and more nasal, and is the true correct way to speak the language and doesn’t need any substitution. Get used to it. You will be assimilated.

  6. says

    Although, I should mention, a kind reader donated a Macbook to me that I’m now using for all my teaching (the Linux machine does some things very well, but presentations and image editing…not). Only it’s from Australia. I first noticed when I went looking for the trash, and it’s not there — it’s a “bin”. Then I was getting interesting spelling corrections. Why do non-Americans like the letter “u” so much? When I activated Siri, I discovered she has a strong Aussie accent, too.

    It’s growing on me. I’ve decided to name the new computer “Sheila”.

  7. mailliw says

    #5

    In a previous job we had some colleagues from Texas visit us. We went for a meal in the evening and Eddie’s wife asked us “did y’all give Eddie a hard time?” I remember wondering, who is Jawl? Is he that Norwegian sysadmin?

  8. cartomancer says

    As long as you still call it “English”, I and my hilariously uptight fellow RP speakers get to decide how it’s spoken. This includes a rolling crackdown on split infinitives, proper use of “less” and “fewer”, and whatever other arcane rules 19th Century grammarians from Oxford and Cambridge came up with to make other people feel inadequate.

    The “u” in colour and ardour and suchlike stays, I’m afraid. It is necessary to remind people they are not writing in Latin. The only way around this stipulation is to actually write in Latin, which is a perfectly acceptable solution and entirely in keeping with our ongoing campaign of archaizing prescriptivist tyranny. Classical Greek is also acceptable (none of this New Testament Koine Greek rubbish, mind), and indeed highly recommended, because knowledge of it will help you to work within our antiquated British protocols for when you use -ise and when you use -ize (spoiler alert – the z is for when the verb has a Greek root, the s for when it has a French or Latin one).

    We may not have Macau or Mafeking or the New World possessions anymore, but by jingo we’re still in the Imperialism business where our language is concerned, and don’t you upstart colonials forget it!

  9. mailliw says

    #8 cartomancer

    The “u” in colour and ardour and suchlike stays

    But what happened to the u in governor?

  10. brucegee1962 says

    I’m sorry, cartomancer, but William Shatner has made it officially acceptable to boldly split infinitives now and for all time. A Captain outranks a bunch of fusty 19th-century Latin-loving grammarians any day.

  11. cartomancer says

    mailliw, #9

    governor is sufficiently different from the Latin gubernator that it doesn’t need the “u” to mark it out. Though if you want to put one in for old times’ sake then feel free. The letter doesn’t get as much work as its fellow vowels, so anything we can do to help it out is a bonus.

  12. mailliw says

    #8
    I think you might enjoy the Gesellschaft zur Stärkung der Verben – Society for the Strengthening of Verbs (https://neutsch.org/Startseite), an organisation concerned by the number of irregular verbs becoming regular and dedicated to reversing this trend.

    They go beyond keeping strong verbs strong; they also want to make weak verbs irregular, including loan words.

    A particular favourite loan word example of mine is chillen (to chill), for which they suggest chill, chall, gechullen as opposed to the regular chill, chillt, gechillt.

  13. stroppy says

    But when traveling in Ireland with your laptop, ‘Sheila’ would be considered somewhat derogatory, no? Or maybe spelling makes the difference. On the other hand, maybe appropriate if you’re mapping sheela-na-gigs…?

  14. PaulBC says

    This is my problem with Wikipedia, and why I tell students it’s not an acceptable source for their papers.

    It’s a good starting point. Most Wikipedia pages provide a list of citations, and these can be verified and compared to other sources.

    Wikipedia is an incredibly useful resource (I’m not saying you were saying it’s not, but I don’t know) and a great place to begin investigations of nearly anything. It’s clearly not the final word, and even if it were more carefully curated, it would not suffice as a primary source. But neither does Encyclopedia Britannica.

  15. imback says

    It was Noah Webster, he of the dictionary, who tried to reform spelling by dropping the u in mold (and many other words) to make clear it rhymed with cold and not could. Starting in 1783, two years after the end of the Revolutionary War, Webster changed spellings to jail, draft, defense, magic, mask, center, plow, and catalog. He wanted to drop the ue in tongue and spell it tung, but that did not take. Also not taking was ake for ache, nabor for neighbo(u)r, soop for soup, wimman and wimmen for woman and women, and iz and waz for is and was.

  16. springa73 says

    One odd thing I’ve read is that the pronunciation of American English has actually changed less over time than that of English in England itself, meaning that American English pronunciation is actually closer to 17th and 18th century English pronunciation than modern English English is. Apparently English in England underwent major changes in the 19th century that didn’t happen in the US.

  17. quotetheunquote says

    @brucegee1962:
    Okay, sorry, but, like Bill’s a Canadian right? Like, what’s he know about English, ya know what I’m saying…?

  18. mailliw says

    #16

    That’s good. She misses out the Bristol habit of adding an l to words that end in a vowel.

    My television reception is very poor.
    It’s cos of your areal.
    Do you mean my aerial or my area?
    Like I said, it’s cos of your areal.

    They have checkpoints in the posher parts of Bristol where they ask people “what’s red and goes in salads” and if they say tomatol, they aren’t allowed in. The correct answer is of course radicchio.

  19. says

    @#8, cartomancer:

    Ah, I see you agree with me that we really ought to change the name of the language to “American”, because America has more native speakers of it than anybody else, which means that the evolution of the language is no longer in the hands of the English and presumably never will be again. After all, prescriptive dictionaries are obvious dead letters, and descriptive ones will seek to describe the language as the majority of speakers speak it, which means the way Americans speak it. Your grammar and spelling, therefore, are either a quaint dialect or merely wrong.

  20. vereverum says

    #3 Vole
    “It’s sneaky cultural imperialism.”

    better than the not at all sneaky imperialism of the Romans, the Angles, and the Normans, n’est-ce pas?

  21. starskeptic says

    PaulBC@15
    “It’s a good starting point. Most Wikipedia pages provide a list of citations,..”
    PZ very ably pointed that out at the end of the paragraph….

  22. Azkyroth, B*Cos[F(u)]==Y says

    [Doesn’t seem to have posted, trying again]

    The “UK English” dictionary and grammar checker in MS Word contains loads of Americanisms, my pet hate being the way “any more” (2 words, correctly) gets flagged as an error. To write it as a single word is incorrect over here, but if supposedly authoritative software keeps telling people otherwise, it’s only a matter of time before the Americanism takes hold. It’s sneaky cultural imperialism.

    Thinks back to the last few hundred Schroedinger-ironic steaming-piles-of-over-the-top-smuggardry I’ve encountered about “[comic sans]Colonials[/comic sans]” and their supposedly “improper” spelling and pluralization choices …sounds awful.

    (Like, seriously, apparently I’m weird for not finding British idiosyncrasies and affectations “charming” but I just cannot fathom the logic of a culture apparently deliberately choosing a variation on its own “upper class twit” stock character as the face it presents to the world.)

  23. nomdeplume says

    Oh PZ, “Sheila” was seen as old-fashioned in the 1950s. And it is good you are learning to spell words correctly with a “u”. Next thing you will be pronouncing “aluminium” correctly….

    I agree on Wikipedia. Many years ago I attempted to correct an entry on a project I had created, and which contained errors of fact about that project. My attempted corrections were removed and I was told I was not permitted to do anything with the entry because of my involvement. Catch-22 or what?

  24. birgerjohansson says

    With a zillion former Brit colonies keeping English as a parallel official language alongside the local languages, we inevitably get Indian English, Sri Lanka English, Nigerian English and so on.
    And the local varieties of English spoken by local groups of Black Americans have influences not only from West African languages but also from the 17th- and 18th-century Brit dialects spoken by Brit indentured workers (slaves in everything but name) who worked beside the slaves.

  25. PaulBC says

    starskeptic@25 You are right. Maybe I ought to read more carefully. I would turn the emphasis around though: there is a ton of information online in wikipedia and other sources, and it’s worth reading. Just be careful when you try to claim it as authoritative.

  26. Dunc says

    I just cannot fathom the logic of a culture apparently deliberately choosing a variation on its own “upper class twit” stock character as the face it presents to the world.

    It’s a form of self-deprecation, which is a big thing in British culture.

    I wouldn’t get too smug if I were you, since the face the USA tends to present to the rest of the world isn’t exactly great either.

  27. blf says

    The mildly deranged penguin points out that having some other sod do the work for free is a very characteristic Scottish trait…

  28. richardh says

    “More than a dialect.”
    In the words of Max Weinreich: “a shprakh iz a dialekt mit an armey un flot”.
    IOW the distinction between “language” and “dialect” is politics, not linguistics.

  29. blf says

    @28, “indentured workers — slaves in everything but name”.

    Whilst it is true that indentured service, along with chattel slavery, was abolished in the States by the 13th Amendment, and condemned by the Universal Declaration of Human Rights as a form of slavery; and that numerous abuses occurred with poor-to-no legal redress; and that the treatment wasn’t much, if any, different from the treatment of slaves, the quoted remark MAY be sailing a bit too close to the discredited meme that indentured workers were slaves (or versa-visa, as some thug politicians in the States (e.g., the governor of Virginia, Ralph Northam) are known to have claimed).

    For example, from Snopes, discussing perhaps the more persistent variant of that claim, the ludicrous notion that there were Irish slaves in the Americas, Were There Irish Slaves in America, Too?:

    What’s False: Unlike institutionalized chattel slavery, indentured servitude was neither hereditary nor lifelong; unlike black slaves, white indentured servants had legal rights; unlike black slaves, indentured servants weren’t considered property.

    […]

    Is it mere quibbling? Generically speaking, any form of forced labor can be called slavery. But what do we gain by doing so, besides blurring historical distinctions? Consider impressment, the 18th-century British naval practice of kidnapping young men and forcing them to serve on sailing vessels. That’s slavery, in a sense. So is being sentenced to hard labor in prison. But while these share features in common with the institution of chattel slavery in America, they are on a whole separate plane.

    It isn’t “bias” that keeps legitimate historians from substituting the term “slavery” for “impressment,” “hard labor,” or even “forced indentured servitude.” It’s a simple respect for the facts.

    (As an aside, it seems that every time I stumble across that Irish-were-slaves claim, it’s illustrated with a carefully-cropped image of Jean-Léon Gérôme’s Slave Market in Ancient Rome, making it even more hilarious.)

  30. stroppy says

    “IOW the distinction between ‘language’ and ‘dialect’ is politics, not linguistics.”

    There’s truth in that, maybe more to the extent that Scots can be spoken as a register of English…?

    But one measure of language over dialect is mutual unintelligibility. Speaking for myself, I find plenty of Scottish people completely incomprehensible…at least when speaking. I did manage to work my way through But N Ben A-Go-Go with a little background study and a dictionary though. Perfectly enjoyable science fiction, by the way. I recommend it.

  31. wzrd1 says

    @blf #33, there is one area in which contracted workers are essentially treated still as property and have extremely limited recourse and rights, the US Armed Forces. The SCOTUS and legislators both claim it as a reasonable necessity, “due to the unique nature of military forces”.

    As for the working populace, especially the poor, legal recourse is essentially a promised unicorn fart, as retaining an attorney is out of the reach of the poor and lower middle class.

  32. KG says

    It’s a form of self-deprecation, which is a big thing in British culture. – Dunc@30

    No, really, we’re not all that good at it!

  33. Dunc says

    stroppy, @ #34:

    maybe more to the extent that Scots can be spoken as a register of English…?

    You’re confusing Scots with Scots English. Scots is definitely not a register of English.

    But one measure of language over dialect is mutual unintelligibility. Speaking for myself, I find plenty of Scottish people completely incomprehensible…

    I can pretty much guarantee that none of those people were actually speaking Scots, as hardly anybody really does anymore. Even most Scottish people struggle to understand true Scots, in much the same way that most English people struggle with Shakespeare or Chaucer.