I ♥ data. But I also ♥ privacy. Even though I have been running a few data meetups for the past few years, I have managed to keep myself out of the public eye – even in my professional life. I must admit, I’m giving a bit away in a recent interview with Data Science Weekly. They posed excellent questions, and I’d like to take a moment here to summarize two main points from the interview: when I became a data scientist and how long it took, and how I feel about the hype in the field today. If you’d like, you can read the full interview here.
I think I became what is today called a data scientist many years ago (before the term data science was used to describe it), but it took me a good number of years, at least 10, to be comfortable with the term. I often like to cite Peter Norvig, Director of Research at Google: “Teach Yourself Programming in Ten Years”. Like most others in the data business years ago, I came from a data-intensive hard-science field (Physics), and I picked up the tools of the trade from books in various disciplines and by working with data. It’s certainly much easier now, as there are many more books, online courses, etc., and you don’t need to gather this knowledge from various disciplines on your own. The software tools are also more mature – but it still takes some time to become a seasoned data scientist.
The field has certainly changed, but many things stay the same. I like to temper the extraordinary hype I often see around data science with the long history the field had before the term data science existed. Many of the machine learning algorithms we use today were originally developed in the 1990s or 2000s; many of today’s software tools have foundations laid more than a decade ago. Of course, building on this strong foundation are innovations arriving at a quickened pace, which have resulted in new, primarily “add-on” tools that have massively improved my productivity over the last few years (note: read the full interview to find out more). Despite this pace and improved productivity, it’s important to remember that we were doing many of these kinds of data analysis and building advanced analytics applications long before the boom of the past five years. Through these changes in the field, let’s also stay honest about what data science can and cannot do, as there are still many challenges and limitations we face. And even in cases where it can produce results, there are several pitfalls (e.g. invalid statistical procedures) that require thinking and expertise to avoid.