Data science is a new field, and it isn’t always obvious what makes a good data scientist. What should they know? Tools, frameworks, and technologies are always changing. In the midst of this shifting landscape, data scientists can differentiate themselves by mastering one of the most useful tools from applied statistics: linear modeling. Last week I spoke to the latest class of fellows at Insight about this very topic, and the slides from my talk can be found here.
Linear models are often underappreciated. At first glance, they can seem to lack the novelty of more recent trends. But together with their modern extensions, they are an important foundational tool for anyone working with data. Much of their virtue lies in their simplicity: they are interpretable, easy to extend, and scale to even the largest problems. They are also surprisingly effective across an enormous range of problems.