Again, a short memo.
You can pretty much get what I am going to say here from the title: I think we should view statistics in two frames, a science perspective and an engineering perspective. This is, in a way, inspired by Greg Mankiw's nice essay on viewing macroeconomics in these two frames (see http://www.nber.org/papers/w12349 ).
I have made a 180-degree turn on how statistics should be done (I mention this in case anyone knows my past views) - I now believe that statistics should become more like machine learning (more on how machine learning differs from statistics later), at least as a matter of "science."
That is, I think the way machine learning frames statistical inference - basically as learning algorithms - is the right way to do the science of statistics.
But no science is complete, and the tools a developing science produces are often incomplete and not ready for use. This is why we need engineering tools. In statistics, these tend to be human modeling intuitions and tips built from experience confronting reality. While a science is still developing, far from complete, there are engineering tools that seem to work in practice and yet look suspicious in terms of the existing scientific framework. For example, Dirac's hole theory was replaced by a different scientific understanding, and yet in engineering the idea of "holes" is still sometimes used for convenience. With hindsight, we can regard Dirac's hole theory as an engineering tool for describing reality, even if it is incomplete.
Similarly, we may one day come to understand the Standard Model in physics as an engineering model, once we learn how a unified theory of physics works.
To return to statistics, there is the question of "what statistics should be" (science) and that of "the most useful pictures statistics can provide" (engineering), and the two sometimes diverge. Thus, even if statistics should ultimately be about learning algorithms, given our current understanding of learning algorithms, more traditional statistical approaches may be the best option available (though we can also understand them as particular learning algorithms).
------------------------------------------
Addition: it is often said that machine learning tools just look at data and disregard structural models. But this is not necessarily the case. Consider how machine learning tools could incorporate knowledge of classical physics, which can be said to hold in our fairly non-relativistic world (that is, we can live as if the world is non-relativistic most of the time). Simply use a meta-learning algorithm to choose among learning algorithms that exploit classical-physics knowledge, so that the analysis never conflicts with classical physics (a toy sketch of this idea follows below). It is actually traditional statistical approaches that have a harder time incorporating such established knowledge. And when it comes to checking established truths, the machine learning approach deals much better with data inconsistent with those truths, but this is a topic for another day.
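To make that remark concrete, here is a minimal sketch in Python (not from the original memo). It assumes a toy projectile-motion dataset and NumPy; the candidate learners (polynomial fits) and the physics-consistency check (the fitted curve should imply a roughly constant downward acceleration near g) are illustrative assumptions of mine, not a prescribed method.

```python
# A minimal sketch of a meta-level selection rule that only admits
# candidate learners whose fitted models respect a piece of classical-
# physics knowledge. The toy data and all names are illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: height of a projectile, y(t) = v0*t - (g/2)*t^2, plus noise.
g, v0 = 9.81, 20.0
t = np.linspace(0, 3, 40)
y = v0 * t - 0.5 * g * t**2 + rng.normal(0, 0.5, t.size)

def fit_poly(deg):
    """Candidate learner: unconstrained polynomial regression."""
    coefs = np.polyfit(t, y, deg)
    return lambda s: np.polyval(coefs, s)

def physics_consistent(model, tol=2.0):
    """Classical-mechanics check: the curve's implied acceleration
    should stay close to -g everywhere on the observed range."""
    s = np.linspace(t.min(), t.max(), 200)
    accel = np.gradient(np.gradient(model(s), s), s)
    return np.all(np.abs(accel + g) < tol)

# Meta-level step: keep only candidates that do not conflict with
# classical physics, then pick the best-fitting one among them.
candidates = {deg: fit_poly(deg) for deg in (1, 2, 8)}
admissible = {d: m for d, m in candidates.items() if physics_consistent(m)}
best = min(admissible, key=lambda d: np.mean((admissible[d](t) - y) ** 2))
print("degrees passing the physics check:", sorted(admissible))
print("selected degree:", best)
```

Here the established knowledge enters as a hard admissibility filter at the meta level; a softer variant would instead add the physics constraint as a penalty inside each learner's objective.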
(One can also think of typical machine learning problems. Many of these can be stated clearly - like "winning strategies in Go" - and so one can look for learning algorithms that work best for these particular classes of problems, rather than general ones. One can go purely theoretical and derive an optimal learning algorithm from the characteristics of these problems, or go somewhat empirical and consider what has so far worked best.)