## regression analysis -- 12/20/18

**Today's encore selection -- from Super Crunchers: Why Thinking By Numbers is the New Way to Be Smart by Ian Ayres.** Regression analysis is the original and still the most widely used statistical technique for analyzing data. Today retailers, banks, internet search engines and even internet dating services use it to determine customer preferences, tendencies and other correlations:

"A regression is a statistical procedure that takes raw historical data and estimates how various causal factors influence a single variable of interest. In [internet dating services, for example,] the variable of interest is how compatible a couple is likely to be. And the causal factors are twenty-nine emotional, social, and cognitive attributes of each person in the couple.

"The regression technique was developed more than 100 years ago by Francis Galton, a cousin of Charles Darwin. Galton estimated the first regression line way back in 1877. [Take for example] Orley Ashenfelter's simple equation to predict the quality of wine. That equation came from a regression. Galton's very first regression was also agricultural. He estimated a formula to predict the size of sweet pea seeds based on the size of their parent seeds. Galton found that the offspring of large seeds tended to be larger than the offspring of average or small seeds, but they weren't quite as large as their large parents.

Portrait of Galton by Octavius Oakley, 1840

"Galton calculated a different regression equation and found a similar tendency for the heights of sons and fathers. The sons of tall fathers were taller than average but not quite as tall as their fathers. In terms of the regression equation, this means that the formula predicting a son's height will multiply the father's height by some factor less than one. In fact, Galton estimated that every additional inch that a father was above average only contributed two-thirds of an inch to the son's predicted height.
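To make the two-thirds figure concrete, here is a minimal sketch of the prediction it implies. The function name `predict_son_height` and the 68-inch population average are hypothetical choices for illustration, not values from the book, and the sketch assumes fathers and sons share the same average height.

```python
def predict_son_height(father_in, mean_in=68.0, slope=2 / 3):
    """Predict a son's height (inches) from his father's height.

    mean_in is a hypothetical population average, assumed here to be
    the same for fathers and sons. slope is the two-thirds factor
    quoted in the excerpt.
    """
    # Each inch the father stands above (or below) the average moves
    # the prediction by only `slope` inches in the same direction.
    return mean_in + slope * (father_in - mean_in)


tall_father = 74.0   # 6 inches above the assumed average
short_father = 62.0  # 6 inches below the assumed average

son_of_tall = predict_son_height(tall_father)
son_of_short = predict_son_height(short_father)

print(f"Father {tall_father} in -> predicted son {son_of_tall:.1f} in")
print(f"Father {short_father} in -> predicted son {son_of_short:.1f} in")
```

Under these assumptions a father six inches above average has a predicted son about four inches above average, and the same pull toward the middle applies to a father six inches below it.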

"He found the pattern again when he calculated the regression equation estimating the relationship between the IQ of parents and children. The children of smart parents were smarter than the average person but not as smart as their folks. The very term 'regression' doesn't have anything to do with the technique itself. Galton just called the technique a regression because the first things that he happened to estimate displayed this tendency -- what Galton called 'regression toward mediocrity' -- and what we now call 'regression toward the mean.'

"The regression literally produces an equation that best fits the data. Even though the regression equation is estimated using historical data, the equation can be used to predict what will happen in the future. Galton's first equation predicted seed and child size as a function of their progenitors' size. Orley Ashenfelter's wine equation predicted how temperature and rain would impact wine quality."
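For readers who want to see what "an equation that best fits the data" can look like in practice, the sketch below fits a straight line by ordinary least squares, the standard modern way to estimate a simple regression (the excerpt does not say how Galton computed his). The helper `fit_line` and the height figures are made up for illustration; they are not Galton's measurements.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical father/son heights in inches (illustrative only).
fathers = [64.0, 66.0, 68.0, 70.0, 72.0, 74.0]
sons = [65.5, 66.5, 68.2, 69.1, 70.9, 71.8]

slope, intercept = fit_line(fathers, sons)

# Use the equation estimated from "historical" data to predict a new case.
new_father = 71.0
predicted_son = intercept + slope * new_father

print(f"son = {intercept:.2f} + {slope:.2f} * father")
print(f"Predicted son height for a {new_father} in father: {predicted_son:.1f} in")
```

With this invented data the fitted slope comes out below one, which is the "regression toward the mean" pattern the excerpt describes: the historical pairs set the equation, and the equation then supplies a prediction for a father who was not in the data.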
