William is a data scientist at Quora, interested in data-driven decision making to improve both product and business. He is always interested in learning new things and exploring the ubiquity of data in everyday life.
The Interviewees
DJ Patil is VP of Product at RelateIQ (acquired by Salesforce). He has held many other roles, including Chief Scientist at LinkedIn. DJ co-coined the term "Data Scientist" and co-authored "Data Scientist: The Sexiest Job of the 21st Century." He transitioned into data science following a research scientist position at the University of Maryland. Follow him at @dpatil (Twitter) or DJ Patil (Quora).
Michelangelo D’Agostino is currently a Senior Data Scientist at Civis Analytics. Previously, he was head of data science at Braintree and a senior analyst on the 2012 Obama campaign's analytics team. He transitioned into data science following a PhD in Astrophysics from Berkeley. Follow him at @MichelangeloDA (Twitter) or Michelangelo D’Agostino (Quora).
1. Seek fast, collaborative environments
For his PhD, Michelangelo worked on IceCube, a neutrino physics experiment at the South Pole that detects cosmic neutrinos with sensors buried in the polar ice cap. One big shift for him as a physicist was the opportunity to learn in a fast, collaborative environment. Michelangelo explains:
All of a sudden, I was working with a couple hundred people all around the world, half in Europe, half in the US in all these different time zones.
It felt like I wasn’t working on something by myself. I was working on really interesting problems with other smart people and doing really hard work. I think that was what kept me in grad school – knowing that I was working with other smart people in a collaborative environment.
Michelangelo goes on to explain that fast, collaborative environments are what distinguish data science in industry from research in academia. After working on the 2012 Obama re-election campaign, he briefly considered going back to finish his postdoc, but decided to stay in data science because the environment suited him better.
I like working with people a lot more than I like working by myself. I like to work on things that have more impact. You see a lot more of it in industry, in data science, than you do in research.
I like the pace a lot more. I think research can often be very slow, especially particle physics. It takes 10 years to build an experiment now. You have to have a monastic personality to be a physicist nowadays.
Unfortunately, this kind of environment is rare for many PhD students in research settings. DJ Patil describes the culture shock that many PhD students experience when they start their first data science job:
In academia, the first thing you do is sit at your desk and then close the door. There’s no door anywhere in Silicon Valley; you’re out on the open floor. These people are very much culture shocked when people tell them, “No you must be working, collaborating, engaging, fighting, debating, rather than hiding behind the desk and the door.”
I think that’s just lacking in the training, and where academia fails people. They don’t get a chance to work in teams; they don’t work in groups.
Ultimately, DJ says that a common mistake among people considering a move into data science is forgetting that the work is collaborative.
People make a mistake by forgetting that Data Science is a team sport. People might point to people like me or Hammerbacher or Hilary or Peter Norvig and they say, oh look at these people! It’s false, it’s totally false, there’s not one single data scientist that does it all on their own.
Data science is a team sport: somebody has to bring the data together, somebody has to move it, someone needs to analyze it, someone needs to be there to bounce ideas around.
It’s a common trap during a PhD to focus only on your dissertation and research. Seek out opportunities to further your research in fast, collaborative environments. These include getting involved with large collaborative projects in your department, entering team-based competitions in your field, working with others on side or related projects, actively speaking about your research, and attending conferences, events, and other activities.
2. Delve deeply into hard, dirty problems
Not everything you learn in graduate school is specialized domain knowledge. In fact, the experience of working on difficult problems, and the strategies for approaching them, were among the most valuable things Michelangelo gained during his astrophysics PhD. To gain experience that will carry over to data science, Michelangelo suggests:
Work on a hard problem for a long time and figure out how to push through and not be frustrated when something doesn’t work, because things just don’t work most of the time. You just have to keep trying and keep having faith that you can get a project to work in the end. Even if you try many, many things that don’t work, you can find all the bugs, all the mistakes in your reasoning and logic and push through to a working solution in the end.
Specifically, you should always be looking for ways to apply your research to real, live datasets. Doing so teaches you the nuances of handling large, messy datasets and helps you understand much more than the theory behind your research. Michelangelo explains:
You can read about it, and people can teach you techniques, but until you’ve actually dealt with a nasty data set that has a formatting issue or other problems, you don’t really appreciate what it’s like when you have to merge a bunch of data sets together or make a bunch of graphs to sanity check something and all of a sudden nothing makes sense in your distributions and you have to figure out what’s going on.
3. Do things beyond your academic specialty
During graduate school, the research you do with your advisor might seem all-consuming. However, it is useful to step back, look at the bigger picture, and develop other skills that complement your experience as a PhD student. DJ offers a reminder:
Many people who come out of academia are very one-dimensional. They haven’t proven that they can make anything, all they’ve proven is that they can study something that nobody (except maybe your advisor and your advisor’s past two students) cares about. That’s a mistake in my opinion.
During that time, you can solve that hard PhD-caliber problem AND develop other skills: for example, giving talks or coding in hackathons. Do things in parallel and you’ll get much more out of your academic experience.
A traditional academic curriculum does not teach all of the skills needed to become a data scientist. Michelangelo notes this in his interview:
You can’t finish a degree and know all the things you need to know to be a data scientist. You have to be willing to constantly teach yourself new techniques.
Michelangelo adds that failing to keep teaching yourself new things sends a negative signal to companies looking to hire data scientists.
From a hiring perspective, when I talk to PhD students who say they want to be data scientists, I become skeptical if they haven’t taken any active steps.
“Hey, I participated in these Coursera courses or these Kaggle competitions.” or “I’ve gone to the Open Government Meetup and have done these data visualizations.”
Things like that demonstrate that you can work on problems outside your academic specialty, and they show that you really have initiative.
One of the biggest dangers of coming out of academia is that you have spent years in an environment that rewards an intensely narrow focus on one thing. To broaden your abilities and tackle the challenges of becoming a data scientist, you must keep developing other skills in parallel and always be on the lookout for new challenges and opportunities.
More Resources
The Data Science Handbook features interviews with 25 amazing data scientists, including DJ and Michelangelo. Sign up at Data Science Handbook to get 3 free interviews (including the full interviews with DJ and Michelangelo).
For more concrete advice on making the transition from academia to data science, check out the answers to these questions:
- How do I become a data scientist?
- What is the Data Science topic FAQ?
- How do I become a data scientist as a PhD student?
Subscribe to Storytelling with Statistics to get updates on more posts like these!
(This post was originally published on Quora)
(Image Credit: Eric Fischer)