Finding and hiring a top-notch data scientist is a tough endeavor. The distinct skill set at the intersection of mathematics, statistics, information technology and business is rare, and suitable candidates are well aware of their market value. Salary expectations well beyond $100,000 are the new normal. Nevertheless, high expectations are often disappointed when hired data scientists don’t perform as promised. Pre-hire assessment of a candidate’s skill set is therefore essential. Is it time for a data science GMAT?
Ever since DJ Patil praised the data science profession as the “sexiest job of the 21st century”, use of the term has become inflationary. Many professions have rebranded themselves under the umbrella of “data science”, and anyone who deals with data in some way in their daily work tends to be called a “data scientist”.
Since there is still no common definition of “data science” and the associated skill set, the flexibility of the term is prone to exploitation. While companies with established data science teams have formed clear expectations regarding the skills and mindset of new recruits, companies without any prior data science experience often struggle in this regard. Consequently, they can assess candidates in the recruitment process only on a superficial level. Resume credentials, published and peer-reviewed research, GitHub repositories or prior work experience might serve as good proxies for the suitability of a candidate. But how does the candidate tackle real-life data science challenges? How does the prospective hire distill a problem and implement value-adding solutions? Currently, there is almost no way to check.
Every year, universities confront a similar challenge. Admission to graduate programs at top universities is not one-dimensional: enrollment committees consider applicants from around the world and from a broad variety of academic institutions with different scientific standards. Assessing the suitability of a candidate cannot simply be narrowed down to a review of grades. In light of this situation, many programs, business schools in particular, require applicants to submit GMAT scores.
Do we need a GMAT for data scientists?
Given the similar structure of the problem, why is there no such thing as a GMAT for data scientists? For companies, it would provide great value by enabling them to screen incoming applications quickly. Indeed, Facebook and Google support the Udacity nanodegree programs, which try to teach candidates the skills required for jobs in their tech departments. Graduates are often automatically eligible to apply, and in some cases they receive job offers straight after graduation.
This tendency underlines the structural problem in hiring qualified data scientists. If you have not seen them work on actual problems yourself, it is very tough to judge their quality and qualifications. Companies with established data science teams, for example SoundCloud, Zalando and Amazon, have made it a practice to give candidates a real-life data challenge on which they can prove their skills. However, those companies can only compare incoming applications with one another, since they lack an overarching relative benchmark. Companies without an active data science team face an even tougher challenge, as they do not possess the in-house capabilities to pose or judge the challenge. Consequently, a reasonably standardized assessment challenge to test data science skills would be of great value to both.
What would a “data science GMAT” look like?
As opposed to the university version, a “data science GMAT” would ideally resemble a real-world challenge. Just as consultancies test candidates in interviews with “case studies”, data science candidates should be confronted with a challenge that requires them to demonstrate precisely those skills and qualifications needed in a real job scenario. What does that mean? Essentially: structuring a problem, working with data, building a minimum viable solution and suggesting improvements. Take the famous LinkedIn “people you may know” recommendation engine as an example. Its implementation gave the data scientists valuable new insights, allowing them to refine and optimize their models.
For data scientists, problems rarely come in a structured manner. Understanding challenges from a business perspective and narrowing them down to workable and testable questions is thus essential. Any challenge would describe a scenario in detail, requiring the candidate to distill the most relevant information. However, to enable comparison across solutions, the data to work with and the coding challenge should be clearly formulated and focused on the standard “data science tasks” (classification, prediction, etc.), without branching out into highly domain-specific knowledge. After all, the code needs to be evaluated for performance against a clearly defined set of metrics.
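To make the idea of a clearly defined set of metrics concrete, here is a minimal sketch of how submissions to a binary classification challenge could be scored against hidden labels. The function name `score_submission`, the `Scorecard` fields and the choice of accuracy, precision, recall and F1 are illustrative assumptions, not an established standard; a real assessment would have to fix its own tasks and metrics.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Scorecard:
    """Hypothetical result record for one candidate submission."""
    accuracy: float
    precision: float
    recall: float
    f1: float


def score_submission(y_true: Sequence[int], y_pred: Sequence[int]) -> Scorecard:
    """Score binary predictions (0/1) against the hidden true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("labels and predictions must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot score an empty submission")

    # Count the four cells of the confusion matrix.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a ratio is undefined (no positive predictions/labels).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return Scorecard(accuracy, precision, recall, f1)


if __name__ == "__main__":
    # Made-up labels and predictions, purely for illustration.
    hidden_labels = [1, 0, 1, 1, 0, 0, 1, 0]
    candidate_predictions = [1, 0, 1, 0, 0, 1, 1, 0]
    print(score_submission(hidden_labels, candidate_predictions))
```

Because every candidate is scored with the same function on the same hidden labels, the resulting numbers can be compared across submissions, which is the point of keeping the task and metrics fixed.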
As a second component, the candidate should be asked to reflect on his or her solution. While the code (if properly documented) often gives good insight into the candidate’s approach, it is only this qualitative angle that makes the assessment holistic. Candidates should be asked to elaborate on challenges, pros and cons, and possible ways to evaluate their solution. While it might seem superfluous to ask these questions (especially if a candidate has performed well in the coding section), the answers often reveal lines of thought and insights into the candidate’s work ethic and habits, and thus provide relevant information for the hiring decision. Such a “technical essay” would, needless to say, have to be assessed by an actual person. Compared with a multiple-choice GMAT, that would certainly drive up the cost of evaluation. However, relative to current salary expectations and opportunity costs, these increased evaluation costs are still marginal.
Can a standardized test really assess all the necessary data science skills?
Of course not. Just as the GMAT cannot fully assess the suitability of a candidate for graduate programs, a standardized test or challenge for data scientists cannot rule out that suitable candidates will fail or that unsuitable candidates will perform well. Given the broad range of skills required, any challenge will always be biased toward certain skills.
While a somewhat standardized test as outlined will certainly never be the universal solution to hiring a data scientist, it can still provide adequate guidance to help companies make more informed decisions. Especially for companies that do not have any data scientists in-house, it minimizes the costly risk of making the wrong initial hire. As long as there is no such thing as a standardized data science assessment, companies should seek third-party support, whether from other companies with existing data science teams, service providers or independent institutions, in order to holistically assess their data scientist candidates with adequate and comparable challenges.