WHO ARE YOU CALLING A SOPHIST?

No, I don’t mean AI art, though that is beautiful. I mean the intrinsic, intellectual beauty of the field, before it’s ever taken into the realm of art. I
suppose this is going to be my “why data science” post. My last semester at college, I had the good luck to take a class that left me with so much to think about that I still think about it,
years later. The serendipitous combination of my being interested in the material, the whole of the class seeming to get along well, and the whole of the class seeming to adore our young instructor (as well as it being a summer session, with all of us feeling there was just a little extra time in the day to smell the roses) has left me with the memory of those last salad days being some of my favorites on campus.

The class: a reading of Plato’s _Republic_. It’s full of rich ideas and rich language. Armchair Socrates opens with, “I went down to the
Piraeus yesterday…” and later delivers such gorgeous lines as these:

> “The object of education is to teach us to love what is beautiful.”
>
> “Nothing beautiful without struggle.”

More famous than any one quote, though, is his allegory of the cave. There’s a good chance it’s already familiar to you in some version. Plato tells the story of a man (all men, actually) chained to a wall in a cave, forced to watch the wall for shadows cast by the fire behind them. That’s their whole world until one man breaks free, ascends from the cave, and is
blinded, literally, by truth, the light of day. He’s disoriented, in pain from too much sunlight, but gradually he adjusts and comes to a clearer understanding of reality. He returns to the
cave to tell the others, but they’re afraid of, even angry at, what he has to say: the shadows on the wall are their reality.

Plato, and through him Socrates, is the grandfather, the cornerstone of thought on which so much of Western philosophy is built, and perhaps nowhere does the lineage show itself better than in this allegory: it is about the effect of education on human nature, about education as the way out of the cave for humankind, and about the relationship, perhaps contrastive, between perception and reality. Like all good metaphors, though, this one isn’t about just one thing. It’s also about Plato’s theory of forms: the idea that our reality is populated by imitations of truth. For his theft, a burglar is made to
pay a fine, an act of justice, but that is not the _form_ of justice — merely an (imperfect) instance of it. One of the shadows on the wall. None of us really knows what perfect justice
looks like.

To make an analogy for this audience, think about object-oriented programming. The forms are like the classes themselves (perfect), and the imitations are like the various instances of those classes (imperfect). You never actually have the _class_ (the form) as an object; it’s untouchable, too abstract to be realized in any literal way. What you have are _instances_ (imitations) of that class as objects. Not a perfect analogy, but a good enough one.
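Here’s a minimal sketch of the analogy in Python; the class and the examples are mine, purely illustrative:

```python
# The class as the "form"; instances as its imperfect "imitations."

class Justice:
    """The form itself: an abstraction we never hold directly."""
    def __init__(self, description: str):
        self.description = description

# We never possess Justice itself as a value in the world;
# we only ever possess particular instances of it.
fine_for_theft = Justice("a burglar pays a fine for his theft")
returned_wallet = Justice("a found wallet is returned to its owner")

# Each instance is one shadow on the wall: a concrete, partial
# realization of something we can only name in the abstract.
for shadow in (fine_for_theft, returned_wallet):
    print(type(shadow).__name__, "->", shadow.description)
```

The class statement defines what every act of justice has in common; every object built from it is one more particular, never the ideal itself.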
I felt myself stumble into some of these same ideas reading something that many here should be familiar with, _An Introduction to Statistical Learning_. Early on, the authors talk about how, given some quantitative response _Y_ and _p_ different predictors _X_₁, _X_₂, …, _X_ₚ, there is an assumed relationship between _Y_ and _X_ = (_X_₁, _X_₂, …, _X_ₚ), defined as

_Y_ = _f_(_X_) + _ε_,

where _f_ is some function on the input space and _ε_ is a random error term with mean 0 (p. 16 of _An Introduction_). In other words, the relationship between the input _X_ and the output _Y_ _can_ be perfectly described, but for the most part, that knowledge is beyond us. What we can do instead is approximate the relationship between the input and the output as

_Ŷ_ = _f̂_(_X_),

since _ε_ has mean 0, where _f̂_ represents an estimate of the function _f_ and _Ŷ_ represents the prediction made from it (p. 17 of _An Introduction_). Even if we could make the estimated function _f̂_ a perfect match for the true function _f_, there would still be some error in _Ŷ_, since we haven’t accounted at all for the random error _ε_. In other words, some of the error in our predictions is _irreducible_: unfixable.
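Here’s a toy simulation of that setup, with a true _f_ and a noise level I’ve invented for illustration, so we can watch the irreducible error appear:

```python
# Simulate Y = f(X) + ε with a known "form" f, then fit an imitation f-hat.
import numpy as np

rng = np.random.default_rng(42)

def f(x):
    """The form: the true relationship, normally unknowable."""
    return np.sin(2 * x) + 0.5 * x

n = 10_000
x = rng.uniform(0, 5, n)
eps = rng.normal(0, 0.3, n)   # random error ε with mean 0
y = f(x) + eps                # Y = f(X) + ε

# The imitation: a polynomial estimate f-hat, fit to the observed data.
coeffs = np.polynomial.polynomial.polyfit(x, y, deg=7)
f_hat = np.polynomial.polynomial.Polynomial(coeffs)

rmse_imitation = np.sqrt(np.mean((y - f_hat(x)) ** 2))
rmse_form = np.sqrt(np.mean((y - f(x)) ** 2))  # even the true f errs

print(f"RMSE of f-hat:      {rmse_imitation:.3f}")
print(f"RMSE of the true f: {rmse_form:.3f}")  # ≈ 0.3, the σ of ε
```

Even scoring with the true _f_ leaves an RMSE of about 0.3, the standard deviation of _ε_; no _f̂_, however well trained, can reliably beat that floor.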
In some ways, our approximation is very much an imitation in the Platonic sense; the form, _Y_, is unreachable.

And yet! The imitation can be made very good. That’s what the whole study of mathematical prediction is, in a nutshell: if nature has a blueprint, how close can we come to understanding it, that we might anticipate it? The form is out there; how close can we get the imitation? When we train a model well, we’re approaching this perfect form that
describes the relationship between our predictors and the outcome. We’re essentially picking the model that’s closest to the Platonic _form_ of the data. Every time you train a model, every
time you see your accuracy jump or your RMSE plummet, you feel the thrill of approaching something previously unknowable. Bit by bit, you can see your algorithm hacking away at the shadows, pulling you out of the cave, throwing you into the sunlight.

It’s always the right time to read a good book. Here’s a link to _Republic_ (I’m partial to the Reeve translation,
though there are myriad free versions available online).