Machine learning has a pseudoscience problem

I saw this interesting paper linked on Bluesky

The reanimation of pseudoscience in machine learning and its ethical repercussions

It is from Patterns, Volume 5, Issue 9, September 13, 2024, and discusses the harms ML causes through its promotion of pseudoscience, or as the paper states:

The bigger picture

Machine learning has a pseudoscience problem. An abundance of ethical issues arising from the use of machine learning (ML)-based technologies—by now, well documented—is inextricably entwined with the systematic epistemic misuse of these tools. We take a recent resurgence of deep learning-assisted physiognomic research as a case study in the relationship between ML-based pseudoscience and attendant social harms—the standard purview of “AI ethics.” In practice, the epistemic and ethical dimensions of ML misuse often arise from shared underlying reasons and are resolvable by the same pathways. Recent use of ML toward the ends of predicting protected attributes from photographs highlights the need for philosophical, historical, and domain-specific perspectives of particular sciences in the prevention and remediation of misused ML.

Summary

The present perspective outlines how epistemically baseless and ethically pernicious paradigms are recycled back into the scientific literature via machine learning (ML) and explores connections between these two dimensions of failure. We hold up the renewed emergence of physiognomic methods, facilitated by ML, as a case study in the harmful repercussions of ML-laundered junk science. A summary and analysis of several such studies is delivered, with attention to the means by which unsound research lends itself to social harms. We explore some of the many factors contributing to poor practice in applied ML. In conclusion, we offer resources for research best practices to developers and practitioners.
Simply put, the problem is that the people responsible for the ML cannot evaluate the data they feed into it. Or, as the paper explains:
When embarking on a project in applied ML, it is not standard practice to read the historical legacy of domain-specific research. For any applied ML project, there exists a field or fields of research devoted to the study of that subject matter, be it on housing markets or human emotions. This ahistoricity contributes to a lack of understanding of the subject matter and of the evolution of methods with which it has been studied. The wealth of both subject-matter expertise and methodological training possessed by trained scientists is typically not known to ML developers and practitioners.
The gatekeeping methods present in scientific disciplines that typically prevent pseudoscientific research practices from getting through are not present for applied ML in either industry or academic research settings. The same lack of domain expertise and subject-matter-specific methodological training characteristic of those undertaking applied ML projects is typically also lacking in corporate oversight mechanisms as well as among reviewers at generalist ML conferences.
ML has largely shrugged off the yoke of traditional peer-review mechanisms, opting instead to disseminate research via online archive platforms. ML scholars do not submit their work to refereed academic journals. Research in ML receives visibility and acclaim when it is accepted for presentation at a prestigious conference. However, it is typically shared and cited, and its methods built upon and extended, without first having gone through a peer-review process. This changes the function of refereeing scholarship. The peer-review process that does exist for ML conferences does not exist for the purpose of selecting which work is suitable for public consumption but, rather, as a kind of merit-awarding mechanism. The process awards (the appearance of) novelty and clear quantitative results. Even relative to the modified functional role of refereeing in ML, however, peer-reviewing procedures in the field are widely acknowledged to be ineffective and unprincipled. Reviewers are often overburdened and ill-equipped to the task. What is more, they are neither trained nor incentivized to review fairly or to prioritize meaningful measures of success and adequacy in the work they are reviewing.
This brings us to the matter of perverse incentives in ML engineering and scholarship. Both ML qua academic field and ML qua software engineering profession possess a culture that pushes to maximize output and quantitative gains at the cost of appropriate training and quality control. In most scientific domains, a student is not standardly expected to publish until the PhD, at which point they have typically had at least half a decade of training in the field. Within ML, it is now typical for students to have their names on several papers upon exiting their undergraduate. The incentives force scholars and scholars in training to churn out ever higher quantities of research. As limited biological agents, however, there is a bottleneck on time and critical thought that can be devoted to research. As quantity of output is pushed ever higher, the quality of scholarship necessarily degrades.
The field of ML has a culture of obsession with quantification—a kind of “measurement mania.” Determinations of success or failure at every stage and level are made quantitatively. Quantitative measures are intrinsically limited in how informative they can be—they are, as we have said, only informative to the extent that they are lent content by a theory or narrative. Quantitative measure cannot, for instance, capture the relative soundness of problem formulation. It has been widely acknowledged that benchmarking is given undue import in the field of ML and, in many cases, is actively harmful in that it penalizes careful theorizing while rewarding kludgy or hardware-based solutions.
A further contributing factor is the increased distribution of labor within scientific and science-adjacent activities. The Taylorization or industrialization of science and engineering pushes its practitioners into increasingly specialized roles whose operations are increasingly opaque to one another. This fact is not intrinsically negative—its repercussions for the legitimacy of science can be, when care is taken, a net positive. In combination with the other facets already mentioned, however, it can cause a host of problems. Increasingly, scholars and industry actors outsource the collection and labeling of their data to third parties. When—as we have argued—much of the theoretical commitments of a modeling exercise come in at the level of data collection and labeling, offloading these tasks can have damaging repercussions for the epistemic integrity of research.
All of the above realities work alongside a basic fact of modern ML: its ease of use. With data in hand and the computing power necessary to train a model, it is possible to achieve publishable or actionable results with a few hours of scripting and write-up. The rapidity with which such models are able to be trained and deployed works alongside a lack of gatekeeping and critical oversight to ill effect.
In my opinion, the paper makes the case for a new process where people who actually know the field are part of vetting the data given to the ML model.
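To see how little the tooling itself demands, here is a minimal, hypothetical sketch of the kind of "few hours of scripting" workflow the paper describes, using scikit-learn and one of its bundled datasets as a stand-in for any labeled data. Note that nothing in it checks whether the labels are meaningful, whether the problem formulation is sound, or whether the features can plausibly predict the target, which is exactly the paper's concern.

```python
# A minimal sketch of the "ease of use" the paper describes: with labeled data
# in hand, an off-the-shelf model can be trained and scored in a few lines.
# None of this validates the data, the labels, or the problem formulation.
from sklearn.datasets import load_breast_cancer  # stand-in for any labeled dataset
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print(f"Accuracy: {accuracy_score(y_test, model.predict(X_test)):.2f}")
```

A single headline number like the accuracy above is precisely the kind of quantitative result the paper warns can be reported and built upon without any domain expert ever asking whether it means anything.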

Computer models advancing science

Computer modeling has become an increasingly important tool for science. We have seen it in climatology for decades, as well as in a number of other fields. People who have a poor understanding of science, or who are trying to deny science, such as creationists and climate change deniers, will often claim that it isn't real science, but that is of course pure nonsense, as empirical evidence has demonstrated again and again.

Now, there is a new great example of how a computer model is advancing our understanding of science. As ScienceDaily reports:

First hominin muscle reconstruction shows 3.2 million-year-old ‘Lucy’ could stand as erect as we can

A Cambridge University researcher has digitally reconstructed the missing soft tissue of an early human ancestor — or hominin — for the first time, revealing a capability to stand as erect as we do today.

Dr Ashleigh Wiseman has 3D-modelled the leg and pelvis muscles of the hominin Australopithecus afarensis using scans of ‘Lucy’: the famous fossil specimen discovered in Ethiopia in the mid-1970s.

Wiseman was able to use recently published open source data on the Lucy fossil to create a digital model of the 3.2 million-year-old hominin’s lower body muscle structure. The study is published in the journal Royal Society Open Science.

The research recreated 36 muscles in each leg, most of which were much larger in Lucy and occupied greater space in the legs compared to modern humans.

For example, major muscles in Lucy’s calves and thighs were over twice the size of those in modern humans, as we have a much higher fat to muscle ratio. Muscles made up 74% of the total mass in Lucy’s thigh, compared to just 50% in humans.

Paleoanthropologists agree that Lucy was bipedal, but disagree on how she walked. Some have argued that she moved in a crouching waddle, similar to chimpanzees — our common ancestor — when they walk on two legs. Others believe that her movement was closer to our own upright bipedalism.

Research in the last 20 years has seen a consensus begin to emerge for fully erect walking, and Wiseman’s work adds further weight to this. Lucy’s knee extensor muscles, and the leverage they would allow, confirm an ability to straighten the knee joints as much as a healthy person can today.

The paper can be found at the Royal Society Open Science: Three-dimensional volumetric muscle reconstruction of the Australopithecus afarensis pelvis and limb, with estimations of limb leverage

Abstract

To understand how an extinct species may have moved, we first need to reconstruct the missing soft tissues of the skeleton, which rarely preserve, with an understanding of segmental volume and muscular composition within the body. The Australopithecus afarensis specimen AL 288-1 is one of the most complete hominin skeletons. Despite 40+ years of research, the frequency and efficiency of bipedal movement in this specimen is still debated. Here, 36 muscles of the pelvis and lower limb were reconstructed using three-dimensional polygonal modelling, guided by imaging scan data and muscle scarring. Reconstructed muscle masses and configurations guided musculoskeletal modelling of the lower limb in comparison with a modern human. Results show that the moment arms of both species were comparable, hinting towards similar limb functionality. Moving forward, the polygonal muscle modelling approach has demonstrated promise for reconstructing the soft tissues of hominins and providing information on muscle configuration and space filling. This method demonstrates that volumetric reconstructions are required to know where space must be occupied by muscles and thus where lines of action might not be feasible due to interference with another muscle. This approach is effective for reconstructing muscle volumes in extinct hominins for which musculature is unknown.
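For readers unfamiliar with the terms, here is a small, purely illustrative sketch of two quantities the abstract leans on: estimating a muscle's mass from its reconstructed volume, and relating a moment arm to the torque a muscle can generate about a joint. This is not the authors' modelling pipeline; the density constant and the example numbers are assumptions chosen only to show the arithmetic.

```python
# Toy illustration (not the paper's actual method) of volume-to-mass
# conversion and of the force/moment-arm relationship behind "limb leverage".
MUSCLE_DENSITY_G_PER_CM3 = 1.06  # commonly cited density of skeletal muscle

def muscle_mass_g(volume_cm3: float) -> float:
    """Estimate muscle mass from a reconstructed (polygonal) volume."""
    return volume_cm3 * MUSCLE_DENSITY_G_PER_CM3

def joint_torque_nm(muscle_force_n: float, moment_arm_m: float) -> float:
    """Torque a muscle can produce about a joint: force times moment arm."""
    return muscle_force_n * moment_arm_m

# Hypothetical numbers: a 900 cm^3 muscle volume, a 4 cm knee-extensor
# moment arm, and an assumed 1000 N of muscle force.
print(f"Estimated mass: {muscle_mass_g(900):.0f} g")
print(f"Knee extension torque: {joint_torque_nm(1000, 0.04):.0f} N*m")
```

The point of comparing moment arms across species, as the abstract describes, is that two limbs with similar moment arms can convert muscle force into joint movement in a similar way, which is why comparable moment arms hint at similar limb functionality.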

The paper is an interesting read and in my opinion fairly accessible.