Let’s dig out another article from my drafts bin, something I don’t think I was ever going to finish. This draft was titled “Data Science as function fitting”. It begins:
In the buzzword-ridden age of AI hype, perhaps there’s value in having more unmagical ways of talking about data science. My most unmagical description is that data science is basically function fitting. It’s like what Excel does when you ask it to draw a trendline through a set of data points.
So that’s the thesis statement. But is it a thesis worth arguing? It feels kind of “hot take” quality to me. When you have a large enough audience, you start to be aware that at least a few readers are experts, and will call you out on your bullshit. And so immediately after stating the thesis, I felt it was necessary to add a ton of qualifying statements. You know, like I didn’t really mean it.
In the draft, I explained some of the basics of data science, specifically supervised machine learning. Asking Excel to draw a trendline through a set of data points is an example of machine learning.
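To make the trendline comparison concrete, here is a minimal sketch of a straight-line fit by ordinary least squares, written in plain Python. It assumes the trendline is linear, and it is not a claim about how Excel computes its own. The function names and the data points are made up for illustration.

```python
def fit_trendline(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Evaluate the fitted line at a new x."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical data points, invented for this example.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

model = fit_trendline(xs, ys)  # "training"
print(model)
print(predict(model, 6.0))     # "prediction" on an unseen x
```

In supervised learning terms, `fit_trendline` is the training step and `predict` is the model applied to new data.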
However, I also wanted to explain why machine learning isn’t quite like an Excel trendline. When Excel draws a trendline, the trendline exists in two dimensions: x and y. A typical machine learning model has N dimensions, where N can be large. This has significant implications for both methodology and intuition. In data science, the problems that show up as N grows are known as “the curse of dimensionality”.
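One way to get a feel for the curse of dimensionality is to scatter random points in a unit hypercube and measure how far each point is from its nearest neighbor. The sketch below is only an illustration under assumed settings (uniform random points, Euclidean distance, 200 points); the function name is made up. As the number of dimensions goes up, the same number of points ends up much more spread out, so every point looks isolated.

```python
import math
import random


def mean_nearest_neighbor_distance(n_points, n_dims, seed=0):
    """Average distance from each point to its closest other point."""
    rng = random.Random(seed)
    # Uniform random points in the unit hypercube [0, 1]^n_dims.
    points = [[rng.random() for _ in range(n_dims)] for _ in range(n_points)]
    total = 0.0
    for i, p in enumerate(points):
        nearest = min(
            math.dist(p, q) for j, q in enumerate(points) if j != i
        )
        total += nearest
    return total / n_points


# Same number of points, increasing number of dimensions.
for n_dims in (1, 2, 10, 100):
    print(n_dims, round(mean_nearest_neighbor_distance(200, n_dims), 3))
```

The printed distances grow with the number of dimensions. That is the intuition gap: in two dimensions a few hundred points cover the plot nicely, while in a hundred dimensions the same few hundred points barely sample the space.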
What’s wrong with this draft? Aside from the hot-take thesis, it’s fine. Nothing wrong with an explainer. I just think I lost motivation to finish it, especially considering that this is well-trodden ground.
Sometimes I wonder, why bother writing explainers? It’s a bit like handing out homemade treats. If you feel a connection with me, you might place more value on what I make, even if objectively it may not be any better than what you can get at the store. If I wrote a good explainer (big if), some readers may come out more informed, whereas if I merely linked a good explainer, readers probably wouldn’t bother clicking through.
So, the pro-social benefit of educating people about data science fundamentals seems pretty marginal to me. And it’s directly proportional to the size of my audience, which is not that large. This is why I lean much more on intrinsic motivation for blogging than on pro-social motivations.