A long time ago, in January 2006, Business Week published an article entitled “Math Will Rock Your World,” declaring, “There has never been a better time to be a mathematician.” Although the article is almost 15 years old, it makes a case that remains valid: math and physics students are positioned not only to build Artificial Intelligence careers, but to dominate hiring in the space.
Talent is at a premium, and that premium is huge for any AI company
Our AI company is constantly on the hunt for hard-driving, fun, caring and smart team members. They are our competitive advantage. We have the benefit of a growing marketplace, and we are focused on being the best at math, at delivery and at choosing which platforms to pursue. Our current focus is on enhancing Human Capital and Sales and Market Research and Forecasting.
How much is a talented AI expert worth to an AI company?
To set the stage for AI company talent battles, it may help to provide some simple math:
(1) Robust AI expert = +/- $7.5 million in company value
Company value is defined as the value of the company at exit. In fact, I may be a little conservative here, because recent sales of AI talent teams have yielded approximately $10 million per employee at exit.
Since all AI companies are looking for talent, it is important that we try to get the best talent from the various available sources and academic backgrounds. We are looking more and more for smart and capable math and physics experts whom we can assist and support as they “surrender” to solving machine learning and, more broadly, AI problems. As part of our goal of building an innovative team, we continually consider ways to enhance the skillsets of math and physics students, those who perhaps don’t have deep ML or CS experience, so as to enable them to become leading engineers and thinkers in the AI space.
For Math and Science graduates building Artificial Intelligence careers, the future is extremely bright
We are in the age of the mathematicians and opportunities abound if you apply the right approach to the pedagogy of AI. It all starts with the training. In general, undergraduate math courses look something like this:
- Abstract algebra
- Calculus I and II
- Chemistry 100 level
- General Physics I and II
- Number theory
- Real analysis
- Complex analysis
- Intermediate analysis (point set topology)
- Intro to C++, C#, or perhaps Python
- Linear algebra (100/200 level course – maybe some 300)
- Discrete math
- Ordinary Differential equations
- Theoretical statistics (100/200 level)
- Numerical analysis 100 and 200
- Math reasoning
Although this academic preparation does provide a robust background, it is still not enough to form a strong foundation in and of itself.
Advanced degrees in Mathematics and Physics begin to build depth, or ‘Why Proofs Matter’
An undergraduate degree is just the start; once we consider advanced degrees, we begin to see robust value. Some think this is also where things become interesting, because it is where students begin to understand the value of applying proofs to their work.
Why do proofs matter? The same reason Learning Theory does.
A mathematical proof is an inferential argument which convinces other people that something is true. Math isn’t a court of law, so a “preponderance of the evidence” or “beyond any reasonable doubt” isn’t good enough. In principle we try to prove things beyond any doubt at all — although in real life people always make mistakes.
We need proofs in math, first, because we want to be sure that what we do is correct, and not just correct but efficient (I will come back to this notion of efficiency later, as we touch on the importance of Learning Theory for aspiring data and AI scientists). There are enough sources of error in our calculations, from imprecise measurement to misunderstanding of the formulas we should use, that it’s important to make sure that our thinking doesn’t add more error. So proofs mean checking not only our accuracy but also our underlying reasoning.
In math, unlike science or any other field, we actually can prove that what we do is right. That’s because math is not dependent on partially known physical laws or unpredictable human behavior, but simply on reason.
No one understands the relationships I refer to here better than my colleague, friend and advisor, Dr. Brenda Dietrich. Dr. Dietrich is steadfast in the belief that we can prove that we are right through math. As Dr. Dietrich always says, “trust the data.”
In math, unlike the real world, mathematicians set the rules, so they can know everything they need to know in order to be certain what will happen. For example, mathematicians can define what they mean by addition, and then prove that if they add b + a, they will always get the same as a + b. It is said that, “Since we can do it, we should take advantage of the possibility. Truth is rare enough to value highly.”
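As a small illustration of what “proving it beyond any doubt” can look like, the commutativity claim above can be stated and checked by a proof assistant. The sketch below assumes Lean 4 and the natural numbers; the theorem name `add_comm_example` is made up for this example, while `Nat.add_comm` is the existing library lemma it relies on.

```lean
-- For any natural numbers a and b, adding b + a gives the same result as a + b.
-- The proof simply appeals to the library lemma for commutativity of addition.
theorem add_comm_example (a b : Nat) : b + a = a + b :=
  Nat.add_comm b a
```

Once the checker accepts the statement, it holds for every pair of natural numbers, not just the ones we happened to test.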
What is Learning Theory and Why Does It Matter?
Computational Learning Theory (we can also refer to it as Learning Theory) in AI is similar in application to proofs in mathematics and as such it is extremely valuable in identifying the rigor with which AI experts explore and implement their work.
The renowned Russian mathematician Dr. Vladimir Arnold was a popularizer of mathematics. Through his lectures and seminars, and as the author of several textbooks and popular mathematics books, he influenced many mathematicians and physicists.
According to Dr. Arnold, “Proofs are to mathematics what spelling (or even calligraphy) is to poetry. Mathematical works do consist of proofs, just as poems do consist of characters.”
We start taking candidates seriously well before their PhD. That is not to minimize in any way the rigor associated with a PhD. In fact, like all companies, we prefer seasoned PhDs as candidates. However, we are also aware of the market conditions that shape finding, nurturing and building a team of accomplished members.
For example, here is a sample of coursework for one year of a master’s degree in math at an unidentified Ivy League school.
- MATH 500a/380a Modern Algebra I and II
- MATH 520a/320a Measure Theory and Integration
- MATH 544a Introduction to Algebraic Topology I
- MATH 573a/373a Algebraic Number
- MATH 620a/420a, Introduction to Ergodic Theory
- MATH 650a, Introduction to Categorification
- MATH 683a, Categories of Representations
- MATH 710a/AMth 710a, Harmonic Analysis and Applications
- MATH 739a, Random Structures & Algorithms
- MATH 841a, K-3 Surfaces
- MATH 991a/CS 991a Ethical Conduct of Research
- MATH 515b/381b, Intermediate Complex Analysis
- MATH 525b/325b, Introduction to Functional Analysis
- MATH 608b, Introduction to Arithmetic Geometry
- MATH 619b, Foundations of Algebraic Geometry
- MATH 620b, Homogenous Dynamics & Number Theory
- MATH 624b, Topics in Dynamics
- MATH 645b, High Dimensional Expanders
- MATH 665b, Tropical Brill-Noether Theory
- MATH 701b, Topics in Analysis
- MATH 738b, Introduction to Random Structures
- MATH 741b, Selected Topics in Random Matrix Theory
- MATH 765b/AMth 775b, Integral Equations & Fast Algorithms
- MATH 822b, Introduction to Geometric Group Theory
- MATH 830b, Introduction to Differential Geometry
- MATH 845b/440b, Introduction to Algebraic Geometry
- MATH 868b, Spectral Geometry
With not a hint of fluff, and concluding at 900-level courses, this type of preparation leaves candidates well prepared to assume leadership roles in AI after a period of training and acclimation to the fundamentals of our work.
Although the “data science” chasm is considerably narrower for candidates who hold an Ivy League master’s degree, math curricula can still fail to provide all of the skills one needs for a data science career. If there is one area where many see a significant gap, it is programming and statistical experience. Tim Hooper recently made it clear that, although few would dispute that programming skills are crucial for data scientists, there is a fundamental need to understand how to handle, manage and manipulate data in a structured fashion, particularly since math students in college tend to use C#, R, Matlab, SPSS or Python as a salve for individual complex data challenges rather than as critical tools in their arsenal.
Descriptive and Inferential Statistics at the College Level
Descriptive and inferential statistics each give different insights into the nature of the data gathered. One statistical approach alone cannot give the whole picture. Together, they provide a powerful tool for both description and prediction. A further challenge is that math training often lacks statistics courses, and filling that gap with a few SAS, R or SPSS sessions does not quite satisfy. Mathematical statistics is valuable in picking up machine learning, yet inferential statistical analysis can sometimes be missing entirely.
Descriptive Statistics refers to a discipline that quantitatively “describes” the important characteristics of a dataset. In describing these properties, it uses measures of central tendency, i.e. mean, median, mode and the measures of dispersion i.e. range, standard deviation, quartile deviation and variance, etc. The dataset is summarised with the help of numerical and graphical tools such as charts, tables, graphs, etc., to represent data in an accurate way and text is presented in support of any diagrams, to actually “explain” what they represent.
With inferential statistics, we are trying to reach conclusions that extend beyond the immediate data alone. Thus, we rely on inferential statistics to make inferences from our data to more general conditions; we use descriptive statistics simply to describe what’s going on in our data. Many data teams are interested in questions of causal inference and design and analysis of experiments; some would make these essential skills for a data scientist.
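To make the distinction concrete, here is a minimal sketch using only Python’s standard library. The sample values are invented for illustration, and the 95% interval uses the normal critical value 1.96 as a simplifying assumption (with only ten observations, a t-based interval would be the more careful choice).

```python
import math
import statistics

# Hypothetical sample: ten observations of some measured quantity.
sample = [12, 15, 11, 15, 18, 14, 15, 13, 17, 10]

# Descriptive statistics: summarize the data we actually have.
mean = statistics.mean(sample)          # central tendency
median = statistics.median(sample)
mode = statistics.mode(sample)
value_range = max(sample) - min(sample) # dispersion
variance = statistics.variance(sample)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(sample)

# Inferential statistics: reach beyond the sample to the population mean.
# Approximate 95% confidence interval using the normal critical value 1.96.
standard_error = std_dev / math.sqrt(len(sample))
ci_low = mean - 1.96 * standard_error
ci_high = mean + 1.96 * standard_error

print(f"mean={mean}, median={median}, mode={mode}, range={value_range}")
print(f"approx. 95% CI for the population mean: ({ci_low:.2f}, {ci_high:.2f})")
```

The first group of numbers only describes these ten observations; the interval is a claim about the wider population they were drawn from, and it is only as good as the assumptions behind it.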
What is the Cause?
In Econometrics, the Multiple Linear Regression Model is a key tool. It is applied when we want to predict the value of one variable (the dependent variable, also known as the target) based on the values of two or more other variables. In Data Science we rely on Causal Inference to prevent data scientists from saying incorrect things and making recommendations that could be problematic. (See the theme emerging here between proofs and learning theory?)
For example, in 2010 Jennifer Hill of NYU made the case that, for Causal Inference, Big Data can both help and hurt. Big Data has the potential to be helpful (particularly if we can use it to measure more and better), but it can also actually exacerbate problems. “If we have a poor design/use inappropriate methods/don’t think hard about the assumptions, more data will simply make us more confident about our incorrect inference.” Moreover, machine learning, also a cornerstone of data science, is not a subject most Math majors would have been able to define until after they had finished their math coursework. Essentially, when instruction on effectively modeling real-world problems is absent from a math program, as it is from many, graduates can be left with gaps in expertise.
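The point about poor design can be sketched with the multiple linear regression model mentioned above. The example below is a toy under stated assumptions: the data are simulated, the true coefficients (1.0 for `x`, 2.0 for the confounder `z`) are chosen for illustration, and the helper names `solve_linear_system` and `fit_ols` are hypothetical. It uses plain Python rather than a statistics package so that it stays self-contained.

```python
import random

def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(features, target):
    """Ordinary least squares via the normal equations.

    Returns [intercept, coef_1, coef_2, ...].
    """
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    return solve_linear_system(xtx, xty)

rng = random.Random(0)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]    # confounder
x = [zi + rng.gauss(0, 1) for zi in z]     # variable of interest, driven by z
y = [1.0 * xi + 2.0 * zi + rng.gauss(0, 0.1) for xi, zi in zip(x, z)]

naive = fit_ols([(xi,) for xi in x], y)    # model that omits the confounder
adjusted = fit_ols(list(zip(x, z)), y)     # model that includes it

print(f"effect of x, ignoring z:      {naive[1]:.2f}")
print(f"effect of x, adjusting for z: {adjusted[1]:.2f}")
```

In this simulation the model that leaves out `z` attributes roughly twice the true effect to `x`, and collecting more rows would only tighten the estimate around that wrong value, which is the sense in which more data can make us more confident about an incorrect inference.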
The Importance of Learning Theory
The use of Learning Theory with regard to ML is critical to success in the space. Why? Understanding the proper application of an algorithm, which algorithm or algorithms to use, or, more broadly, when to use or combine Supervised, Unsupervised or Reinforcement Learning is critical. Dr. Andrew Ng, the esteemed AI expert, often tells of meeting with data scientists only to find that a mathematical approach based on a specific algorithm they have been working on for over six months should have been “obviously wrong” to them from the start. Further, the proper size of the training set, when to add data, and when no further data is required are all questions that Machine Learning Theory addresses. It is, as Dr. Ng suggests, the difference between a carpenter at school and a “Master Carpenter”.
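One place where learning theory speaks directly to “how much training data is enough” is the classical sample-complexity bound for a finite hypothesis class. The sketch below is an illustration, not a recipe: it assumes a finite class of `num_hypotheses` candidate models, and the function names are made up for this example. The first bound covers the case where some hypothesis fits the data perfectly; the second follows from Hoeffding’s inequality and a union bound when no perfect hypothesis is assumed.

```python
import math

def realizable_sample_bound(num_hypotheses, epsilon, delta):
    """Examples sufficient so that, with probability at least 1 - delta,
    every hypothesis consistent with the training data has true error
    at most epsilon (finite class, a perfect hypothesis exists)."""
    return math.ceil((math.log(num_hypotheses) + math.log(1 / delta)) / epsilon)

def agnostic_sample_bound(num_hypotheses, epsilon, delta):
    """Examples sufficient so that, with probability at least 1 - delta,
    every hypothesis's training error is within epsilon of its true error
    (finite class, no perfect hypothesis assumed)."""
    return math.ceil(
        (math.log(num_hypotheses) + math.log(2 / delta)) / (2 * epsilon ** 2)
    )

# 1,000 candidate hypotheses, 10% error tolerance, 95% confidence.
print(realizable_sample_bound(1000, 0.1, 0.05))  # 100
print(agnostic_sample_bound(1000, 0.1, 0.05))    # 530
```

Both bounds grow only logarithmically in the number of hypotheses but much faster as the error tolerance shrinks, which gives a rough feel for when adding data is likely to help and when it is not.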
Considering learning theory when designing an algorithm has a few important effects in practice:
1. It can help make your algorithm behave right at a crude level of analysis, leaving finer details to tuning or common sense. A good example might be Isomap, where the algorithm is informed by the analysis, yielding substantial improvements in sample complexity over earlier algorithmic ideas.
2. An algorithm with learning theory considered in its design can be more automatic. Consider Rifkin’s claim that the one-against-all reduction, when tuned well, can often perform as well as other approaches. The “when tuned well” caveat is however substantial, because learning algorithms may be applied by nonexperts or by other algorithms which are computationally constrained. A reasonable and worthwhile hope for other methods of addressing multiclass problems is that they are more automatic and computationally faster. The subtle issue here is: How do you measure “more automatic”?
Yet, for those who are non-believers in Learning Theory, consider that perhaps learning theory is most useful in its crudest forms. A good example comes in the architecting problem: how do you go about solving a learning problem? This is in the broadest sense imaginable:
1. Is it a learning problem or not? Many problems are most easily solved via other means such as engineering, because that’s easier, because there is a severe data gathering problem, or because there is so much data that memorization works fine. Learning theory such as statistical bounds and online learning with experts helps substantially here because it provides guidelines about what is possible to learn and what not.
2. What type of learning problem is it? Is it a problem where exploration is required or not? Is it a structured learning problem? A multitask learning problem? A cost sensitive learning problem? Are you interested in the median or the mean? Is active learning useable or not? Online or not? Answering these questions correctly can easily make a difference between a successful application and not. Answering these questions is partly definition checking, and since the answer is often “all of the above”, figuring out which aspect of the problem to address first or next is helpful.
3. What is the right learning algorithm to use? Here the relative capacity of a learning algorithm and its computational efficiency are most important. If you have few features and many examples, a nonlinear algorithm with more representational capacity is a good idea. If you have many features and little data, linear representations or even exponentiated gradient style algorithms are important. If you have very large amounts of data, the most scalable algorithms (so far) use a linear representation. If you have little data and few features, a Bayesian approach may be your only option. Learning theory can help in all of the above by quantifying “many”, “little”, “most”, and “few”. How do you deal with the overfitting problem? One thing I realized recently is that the overfitting problem can be a concern even with very large natural datasets, because some examples are naturally more important than others.
Given this description of how traditional Mathematics schooling may leave candidates unprepared for a career in data science, one might ask how many of those who graduate with a Math degree can step directly into data science roles. The reasons and suggestions outlined below offer a possible career path.
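The rules of thumb in the third question can be written down as a small lookup. This is only a sketch: the function name `suggest_learner` is hypothetical, and the numeric thresholds standing in for “many” and “very large” are placeholder assumptions, since quantifying those words for a real problem is exactly the job the text assigns to learning theory.

```python
def suggest_learner(num_examples, num_features,
                    many_examples=10_000, many_features=1_000,
                    very_large=10_000_000):
    """Map dataset shape to the rule of thumb described in the text.

    The threshold defaults are illustrative placeholders, not recommendations.
    """
    if num_examples >= very_large:
        return "linear representation (most scalable)"
    has_many_examples = num_examples >= many_examples
    has_many_features = num_features >= many_features
    if has_many_examples and not has_many_features:
        return "nonlinear algorithm with more representational capacity"
    if has_many_features and not has_many_examples:
        return "linear representation or exponentiated-gradient style algorithm"
    if not has_many_examples and not has_many_features:
        return "Bayesian approach"
    # Many features and many (but not very large) examples: the text gives no rule.
    return "not covered by the rules of thumb above"

print(suggest_learner(num_examples=500, num_features=20))
print(suggest_learner(num_examples=50_000, num_features=20))
```

Changing the placeholder thresholds changes the answers, which is a reminder that the heuristics are only as useful as the quantities behind them.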
The Good News for Math and Physics Majors Trying to Build Artificial Intelligence Careers
First, many of the underpinnings of data science exist in math study.
Mathematics is closely aligned with machine learning through statistics, data, and data management.
Knowing mathematics correlates strongly with success, because it helps the learner grasp each of these fields more quickly. Linear Algebra, Calculus for Data Science and Gradient Descent are all significant data science underpinnings.
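Gradient descent is a good example of how directly the calculus carries over. The sketch below minimizes a one-variable function chosen for illustration, f(w) = (w - 3)^2, whose derivative is 2(w - 3); the function name, learning rate and step count are assumptions made for this example.

```python
def gradient_descent(gradient, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from `start`."""
    w = start
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w

# f(w) = (w - 3)^2 has derivative f'(w) = 2 * (w - 3) and its minimum at w = 3.
minimum = gradient_descent(lambda w: 2 * (w - 3), start=0.0)
print(f"estimated minimizer: {minimum:.6f}")
```

Anyone who has taken a first calculus course can verify each step by hand; training a machine learning model applies the same idea to a loss function with many parameters.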
Of all the possible strategies and opportunities for a math or physics student to thrive in AI and Machine Learning, the most powerful are your own creativity and curiosity, together with alignment and partnership with colleagues who give you some time (6-12 months) to bone up.
An interest in computer science and curiosity about solving problems play a key role in career success. Most of those with a math or physics background exhibit this zeal. An eagerness to learn something, anything, new about computer programming drives programming skill development, and it doesn’t matter for what purpose (e.g., building a platform in Hadoop or working with SQL, Shogun, C#, Scikit, etc.).
Building some experience in Matlab, Octave, Scilab, etc. is another sure way to gain exposure, as something as complex as building code for ICA (Independent Component Analysis) can be handled in only a very few lines of code.
I have met many very successful ML professionals who developed their skills by self-learning, studying hard and applying their innate scientific skills to ML algorithms. Also, Matlab can get things done very quickly. ICA (a technique to separate linearly mixed sources) can be accomplished very quickly in spite of the significant work that would go into coding such an analysis from scratch. One person I know who has a strong background in Math and Physics is a team leader at Goldman Sachs, having locked himself away for close to six months only to come out a darn good applied data scientist.
A huge opportunity can arise from having employers who teach and give new team members the opportunity to learn on their own.
Given the challenges of finding talented ML expertise, companies are developing a unique pedagogical approach to minting new ML talent by teaching and by allowing self-directed learning. We can’t overstate the value of participating in the data science community on social media platforms, where you can have the ear of some of data science’s brightest minds. Here you can build a peer network that can find solutions to problems and perhaps even your next job!
So, what is the wind up?
For those hiring data scientists, we recognize that mathematics as taught might not be the same mathematics we use daily within our teams. Quite frankly, we expect this. Plenty of people with a Masters or PhD in mathematics would be unable to define Polynomial Regression, Elastic Net Regression, or Lasso Regression. They may, however, be able to read academic papers and understand difficult (even if new) mathematics more quickly than a computer scientist or social scientist. Given enough practice and training, they will most likely be excellent programmers.
Any pedagogical approach to advancing one’s Machine Learning knowledge must be based on the adoption of Machine Learning Theory principles. Without the application of learning theory, companies will not engage.
We would prefer to have our team members adopt a “first-time right” approach to the application of algorithms, statistical models and data to solve complex problems as we serve our clients.
Not all Math majors need to be “math forever”
As Mark Twain said, “It takes a thousand men to invent a telegraph and the last man gets the credit and we forget the others.”
In “The Wrong Way to Treat Child Geniuses,” Jordan Ellenberg, a professor of mathematics at the University of Wisconsin, Madison, writes: “One of the most painful aspects of mathematics and physics is seeing students damaged by the cult of genius. That cult tells students that it’s not worth doing math or physics unless you’re the best at it – because those special few are the only ones whose contributions really count. We don’t treat any other subject that way.
“I’ve never heard a student say, ‘I like “Hamlet,” but I don’t really belong in AP English – that child who sits in the front row knows half the plays by heart, and he started reading Shakespeare when he was 7!’ Basketball players don’t quit just because one of their teammates outshines them. But I see promising young mathematicians quit every year because someone in their range of vision is ‘ahead’ of them.”
This awareness for us is critical to the ability to move talented math and physics students toward an impactful and rewarding career in AI.
How Math and Physics Majors Can Build Artificial Intelligence Careers
For those studying math or physics, recognize that the field you love, in its formal sense, may create an amazing future career-path.
Consider taking computer science classes (e.g. R programming, Reproducible Research, Natural Language Processing, ML, etc.) and statistics classes (e.g. statistical inference, design, data analysis, data mining, data quality, data stewardship, data integrity, autonomous agents). For both students and graduates, recognize that your math knowledge becomes very marketable, and in fact compounds almost exponentially, as you layer skill upon skill on top of your formal training.
I would like to express sincere thanks to T.D. Hooper who was integral to the creation of the content and ideas for this piece.