Trent McConaghy has been doing AI and ML research since the mid-90s. He co-founded ascribe GmbH, which enables copyright protection via internet-scale ML and the blockchain. Before that, he co-founded Solido where he applied ML to circuit design; the majority of big semis now use Solido. Before that, he co-founded ADA also doing ML + circuits; it was acquired in 2004. Before that, he did ML research at the Canadian Department of Defense. He has written two books and 50 papers+patents on ML. He co-organizes the Berlin ML meetup. He keynoted Data Science Day Berlin 2014, PyData Berlin 2015, and more. He holds a PhD in ML from KU Leuven, Belgium.
Follow Peadar’s series of interviews with data scientists here.
At PyData in Berlin I chaired a panel – one of the guests was Trent McConaghy and so I reached out to him, to hear his views about analytics. I liked his views on shipping it, and the challenges he’s run into in his own world.
What project have you worked on do you wish you could go back to, and do better?
Before I answer this I must say: I strongly prefer looking forward. There’s so much to build!
I’ve made many mistakes! One is having rose-colored glasses for criteria that ultimately mattered little. For example, for my first startup, I hired a professor who’d written 100+ papers, and textbooks. Sounds great, right? Well, he’d optimized his way of thinking for academia, but was not terribly effective on the novel ML problems in my startup. It was no fun for anyone. We had to let him go.
What advice do you have to younger analytics professionals and in particular PhD students in the Sciences?
Do something that you are passionate about, and that matters to the future. It starts with asking interesting scientific questions, and ends (ideally) with results that make a meaningful impact on the world’s knowledge.
What do you wish you knew earlier about being a data scientist?
As an AI researcher and an engineer: one thing that I didn’t know, but served me well because I did it anyway, was voracious reading of the literature. IEEE Transactions for breakfast:) That foundation has served me well my whole career.
How do you respond when you hear the phrase ‘big data’?
Marketing alert!!
That said: I like how unreasonably effective large amounts of data can be. And that it’s shifted some of the focus away from algorithmic development on toy problems.
What is the most exciting thing about your field?
AI as a field has been around since the 50s. Some of the original aims of AI are still the most exciting! Getting computers to do tasks in superhuman fashions is amazing. These days it’s routine in narrow settings. When the world hits AI that can perform at the cognitive levels of humans or beyond, it changes everything. Wow! It’s my hope to help shepherd those changes in a way that is not catastrophic for humanity.
How do you go about framing a data problem – in particular, how do you avoid spending too long, how do you manage expectations, etc.? How do you know what is good enough?
I follow steps, along the lines of the following.
- Write down goals, what question(s) I’m trying to answer. Give yourself a time limit.
- Get benchmark data, and measure(s) of quality. Draw mockups of graphs I might plot.
- Test against dumbest possible initial off-the-shelf algorithm and problem framing (including where I get the data)
- Is it good enough compared to the goals? Great, stop! (Yes, linear regression will solve some problems:)
- Try the next highest bang-for-the-buck algorithm & problem framing. Ideally, it’s off the shelf too. Benchmark / plot / etc. Repeat. Stop as soon as successful, or when time limit is hit.
- Ship!
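As a rough illustration of the loop described above, here is a minimal Python sketch. It is not Trent's own code: the toy data, the goal of 0.5 mean absolute error, the 60-second time limit, and the helper names (`mean_baseline`, `linear_fit`, `search`) are all assumptions made up for the example. It tries the dumbest model first, measures it against the benchmark, and stops as soon as the goal is met or time runs out.

```python
import time
from statistics import mean


def mean_baseline(train_x, train_y):
    """Dumbest possible model: always predict the training mean."""
    avg = mean(train_y)
    return lambda x: avg


def linear_fit(train_x, train_y):
    """Ordinary least squares for a single input variable."""
    mx, my = mean(train_x), mean(train_y)
    sxx = sum((x - mx) ** 2 for x in train_x)
    sxy = sum((x - mx) * (y - my) for x, y in zip(train_x, train_y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def mean_abs_error(model, xs, ys):
    """Quality measure: mean absolute error on the benchmark data."""
    return mean(abs(model(x) - y) for x, y in zip(xs, ys))


def search(candidates, train, test, goal_error, time_limit_s):
    """Try candidates in the given (cheapest-first) order.

    Stops at the first candidate that meets the goal, or when the
    time limit is hit. Returns (name, error, model) for the best
    candidate seen, or None if nothing was tried.
    """
    deadline = time.monotonic() + time_limit_s
    best = None
    for name, fit in candidates:
        if time.monotonic() > deadline:
            break  # time limit hit: stop with what we have
        model = fit(*train)
        err = mean_abs_error(model, *test)
        if best is None or err < best[1]:
            best = (name, err, model)
        if err <= goal_error:
            break  # good enough: stop
    return best


# Hypothetical toy benchmark data, invented for this example.
train = ([0, 1, 2, 3, 4, 5], [1.0, 3.1, 4.9, 7.2, 9.0, 10.8])
test = ([6, 7], [13.1, 14.9])

# Ordered by bang for the buck: cheapest and simplest first.
candidates = [
    ("mean baseline", mean_baseline),
    ("linear regression", linear_fit),
]

name, err, model = search(candidates, train, test,
                          goal_error=0.5, time_limit_s=60)
print(f"ship: {name} (mean abs error {err:.2f})")
```

The design choice worth noting is that the goal and the time limit are fixed before any model is tried, so "good enough" is decided up front rather than after seeing results.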
Peadar Coyle is a Data Analytics Professional based in Luxembourg. He has helped companies solve problems using data relating to Business Process Optimization, Supply Chain Management, Air Traffic Data Analysis, Data Product Architecture and in Commercial Sales teams. He is always excited to evangelize about ‘Big Data’ and the ‘Data Mentality’, which comes from his experience as a Mathematics teacher and his Masters studies in Mathematics and Statistics. His recent speaking engagements include PyCon Sei in Florence and he will soon be speaking at PyData in Berlin and London. His expertise includes Bayesian Statistics, Optimization, Statistical Modelling and Data Products.
(Image Credit: Tris Linnell / Turing Bombe / CC BY SA 2.0 )