Romeo Kienzler works as Chief Data Scientist at the IBM Cloud Transformation Lab in Zurich. He holds an M.Sc. degree in Information Systems, Bioinformatics and Applied Statistics from the Swiss Federal Institute of Technology, and is a big fan of open source and the Apache Software Foundation.
Romeo will be giving a talk on Apache Flink cluster deployment on Docker using Docker Compose at the Flink Forward conference in Berlin this year!
Dataconomy: When did you first realise that working with data was going to be your career?
Romeo: Fifteen years ago, I was involved as a JEE developer in some DB performance tuning and came across the metrics field “cardinality” – a column whose meaning I didn’t know. That was actually the first time I realised that an aggregated view of a data set can give you valuable insights.
I then started an M.Sc. degree at the Swiss Federal Institute of Technology and failed the statistics course. I took two weeks off to prepare for the repeat exam, which I passed with an A, and since then I’ve been in love with statistics.
Dataconomy: Could you describe the journey that led to joining IBM and your current role?
Romeo: Passing this very tough exam gave me the self-confidence to take a specialisation track in Bioinformatics and Applied Statistics during my studies. In parallel, the Big Data hype at IBM was starting, and I was able to transfer from the WebSphere group to the Information Management group. At that time IBM was launching BigInsights – a commercial, enterprise-ready version of Hadoop – and we installed it at a major Swiss bank for a machine-learning-based fraud detection use case, in conjunction with our real-time analytics product, InfoSphere Streams.
Dataconomy: You’ve taken a fairly staggering number of online courses – is this how you stay ahead of the curve? Any other tips?
Romeo: Yes, this is actually my major source of knowledge. But it is also very important to apply it in real projects. I’ve been very lucky in that respect, because I was nearly the only one with this set of skills and have been assigned to very interesting international projects. So if you are not that lucky, I’d suggest working on something like the Kaggle competitions – ideally with other team members. And of course, check out the local meetups in this area.
Dataconomy: Any courses you particularly recommend?
Romeo: Yes – the best course on earth is Machine Learning from Stanford with Andrew Ng. But I can also recommend Mining Massive Datasets, also from Stanford, the Advanced Statistics for the Life Sciences course from Harvard, and the two new Apache Spark courses from Berkeley.
Dataconomy: What kind of problems do you take on at the IBM Innovation Lab?
Romeo: I’ve been working on some life science problems, such as massively parallel, in-memory DNA sequence alignment in the cloud using IBM InfoSphere Streams, as well as brain seizure prediction based on EEG streams using IBM InfoSphere Streams and BigInsights (Hadoop). We have also implemented a system for a major financial client capable of intercepting internal HTTPS network traffic to detect suspicious behaviour, based on a multidimensional outlier detection algorithm, using IBM InfoSphere Streams, BigInsights (Hadoop) and BigR/SystemML.
Currently I’m working on a risk management use case for a major bank, analysing payment data with GraphX on Spark to detect critical nodes in the payment graph that are at risk of causing the financial system to collapse. Using Spark and MLlib, we are working on an NLP system able to detect major events in streams of financial news feeds. In addition to that, we are migrating Apache Flink to the IBM Container Cloud using Docker Compose and Docker Swarm.
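For readers curious what a Compose-based Flink deployment looks like, here is a minimal sketch. It assumes the stock flink image from Docker Hub, and the service names and slot count are illustrative rather than the actual IBM configuration:

```yaml
# Minimal Flink cluster: one JobManager and one TaskManager (illustrative sketch).
version: "3"
services:
  jobmanager:
    image: flink:latest          # stock image; pin a specific version in practice
    command: jobmanager
    ports:
      - "8081:8081"              # Flink web dashboard
    environment:
      - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: jobmanager
  taskmanager:
    image: flink:latest
    command: taskmanager
    depends_on:
      - jobmanager
    environment:
      - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: jobmanager
        taskmanager.numberOfTaskSlots: 2
```

With the cluster up (`docker-compose up -d`), additional workers can be added via `docker-compose up --scale taskmanager=3`, and jobs can be submitted through the dashboard on port 8081.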
Dataconomy: Is there a project you’re particularly proud of?
Romeo: I think the most interesting (and useful) project I’ve been working on is the brain seizure prediction system based on EEG streams, because the first patients have already been connected to it. It will eventually save lives, and it gives me the feeling that I’ve done at least something useful.
Dataconomy: If you could go back 10 years, to when you were starting your career as a software engineer, what would you do differently?
Romeo: Nothing 🙂 I’ve always done what my heart told me to do and followed opportunities which just felt “right”. Although there were times I bit off more than I could chew, I always ate it up (to say it in Frank Sinatra’s words). And it was especially those challenging projects – the ones where I asked myself why the heck I had offered my help to the team – that led to a dramatic increase in the skills and experience that make up who I am today.
Dataconomy: If you could throw yourself at any of the world’s problems and develop a technical solution, which would it be and why?
Romeo: Finding affordable drugs to treat HIV in third-world countries is something I feel I should work on one day. Life science has evolved into a Big Data problem (next-generation sequencing), and data scientists with a life science background (which holds true in my case) should not waste time writing software to optimise the profits of financial-sector companies – they can do more useful things. 🙂
Dataconomy: Which companies or individuals inspire you, and keep you motivated to achieve great things?
Romeo: My personal guru is Manfred Spitzer – a Ph.D. in Medicine and Philosophy. I’ve watched nearly all of his YouTube videos, and he has taught me so much about myself, my brain, and how and why things are the way they are in this world. The similarities and differences between human brains and computer systems are now much clearer to me. Personality aside – which I leave to the philosophers to discuss – I think the human brain is nothing but a very sophisticated classifier and pattern recognition engine, capable of processing a very high-dimensional feature set on a massively parallel scale and with complete data processing and data storage locality, because in the neuron, processing and storage happen at the same place. A very interesting concept 🙂
Dataconomy: Thanks for the time Romeo!
Don’t forget to check out Romeo’s presentation at this year’s Flink Forward conference on 12-13 October!
(image source: Flink Forward)