predictions – Dataconomy – Bridging the gap between technology and business
https://dataconomy.ru

ML models’ estimates may be less accurate for disadvantaged subgroups, an MIT study finds
https://dataconomy.ru/2022/06/02/explanation-methods-machine-learning/ – Thu, 02 Jun 2022

Machine learning (ML) may assist human decision-makers in high-risk situations. For example, an ML model might forecast which law school applicants have the greatest chances of passing the bar exam and help an admissions officer choose which students should be admitted. ML systems have even been shown to detect deadly earthquakes swiftly.

MIT researchers examined explanation methods

These models typically have millions of parameters, so how they reach their conclusions is nearly impossible for researchers to fully comprehend, let alone an admissions officer with no machine-learning expertise. Researchers sometimes use explanation methods that mimic a larger model by generating simple approximations of its predictions. These approximations, which are considerably easier to understand, help users decide whether or not to trust the model’s predictions.

Is it acceptable for researchers to employ these explanation approaches? If explanation methods produce better approximations for men than for women, or for white people than for Black people, users may end up trusting the model’s predictions for some groups while distrusting them for others.


MIT researchers examined some widely used explanation methods closely. They discovered that the quality of these approximations varies considerably across subgroups and is often worse for minority subgroups.

In practice, this means that if the approximation quality is lower for female applicants than for male applicants, the explanations may misrepresent the model’s estimates for women more often than for men.

When the MIT researchers saw how wide these fidelity gaps are, they tried various techniques to level the playing field. They managed to narrow some of the gaps but could not eliminate them.

“What this means in the real world is that people might incorrectly trust predictions more for some subgroups than for others. So, improving explanation methods is important, but communicating the details of these models to end-users is equally important. These gaps exist, so users may want to adjust their expectations as to what they are getting when they use these explanations,” explained Aparna Balagopalan, the lead author of the study and a graduate student in the Healthy ML group of the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL).


The study’s co-authors include graduate students Haoran Zhang and Kimia Hamidieh and postdoc Thomas Hartvigsen, all of CSAIL; Frank Rudzicz, an associate professor of computer science at the University of Toronto; and senior author Marzyeh Ghassemi, an MIT assistant professor and head of the Healthy ML Group. The work will be presented at the ACM Conference on Fairness, Accountability, and Transparency.

Can we fully trust ML models?

Simplified explanation methods approximate the predictions of a more complicated machine-learning model in a way that humans can comprehend. An effective explanation method maximizes fidelity, a property that measures how well the simplified explanation model matches the larger model’s predictions.
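As a rough sketch (the paper’s exact formulation may differ), fidelity can be measured as the rate at which the simple explanation model agrees with the black-box model’s predictions:

```python
import numpy as np

def fidelity(surrogate_preds, model_preds):
    """Fraction of instances where the simple explanation (surrogate)
    model agrees with the black-box model's prediction."""
    surrogate_preds = np.asarray(surrogate_preds)
    model_preds = np.asarray(model_preds)
    return float(np.mean(surrogate_preds == model_preds))

# Toy example: the surrogate matches the black box on 4 of 5 instances.
print(fidelity([1, 0, 1, 0, 0], [1, 0, 1, 1, 0]))  # 0.8
```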

Rather than looking only at average fidelity across the entire explanation model, the MIT researchers examined fidelity for specific subgroups of individuals in the data set. In a dataset with men and women, the fidelity should be very similar for each gender, with both groups having fidelity close to that of the overall explanation model.

“When you are just looking at the average fidelity across all instances, you might be missing out on artifacts that could exist in the explanation model,” said Balagopalan.


The researchers created two metrics to measure fidelity gaps, or differences in fidelity between subgroups. The first is the difference between the fidelity of the overall explanation model and the fidelity of the worst-performing subgroup. The second compares all possible pairs of subgroups and averages the differences in their fidelity.
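A hypothetical sketch of those two gap metrics, assuming the simple agreement-rate notion of fidelity above (the variable names and details are illustrative, not the paper’s exact formulation):

```python
import numpy as np
from itertools import combinations

def fidelity(surrogate, model):
    """Agreement rate between the explanation model and the black box."""
    return float(np.mean(np.asarray(surrogate) == np.asarray(model)))

def fidelity_gaps(surrogate, model, groups):
    surrogate = np.asarray(surrogate)
    model = np.asarray(model)
    groups = np.asarray(groups)
    overall = fidelity(surrogate, model)
    per_group = {g: fidelity(surrogate[groups == g], model[groups == g])
                 for g in np.unique(groups)}
    # Metric 1: overall fidelity minus the worst-performing subgroup's fidelity.
    worst_gap = overall - min(per_group.values())
    # Metric 2: average absolute fidelity difference over all subgroup pairs.
    pair_gaps = [abs(a - b) for a, b in combinations(per_group.values(), 2)]
    mean_pair_gap = float(np.mean(pair_gaps)) if pair_gaps else 0.0
    return worst_gap, mean_pair_gap

# Toy example: the surrogate is perfect for group "m" but not for group "f".
worst, pairwise = fidelity_gaps([1, 0, 1, 1], [1, 1, 1, 1], ["f", "f", "m", "m"])
print(worst, pairwise)  # 0.25 0.5
```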

To search for fidelity gaps, they used two types of explanation methods trained on four real-world data sets for high-stakes situations, such as predicting whether a patient dies in the ICU, whether a defendant reoffends, or whether a law school applicant will pass the bar exam. Each dataset included protected attributes, such as the sex and race of each individual. Protected attributes are characteristics that may not be used in decisions because of laws or organizational rules, and their definitions can vary depending on the particular decision at hand.

The researchers found clear fidelity gaps for all datasets and explanation methods. In certain situations, the fidelity for disadvantaged subgroups was considerably lower, with one example showing a difference of 21%. The law school dataset had a fidelity gap of 7% between race subgroups, implying that the approximations erred 7% more frequently on average for some subgroups. If there are 10,000 applicants from these subgroups in the data set, for example, a significant portion could be wrongly rejected because of those extra approximation errors.

“I was surprised by how pervasive these fidelity gaps are in all the datasets we evaluated. It is hard to overemphasize how commonly explanations are used as a ‘fix’ for black-box machine-learning models. In this paper, we are showing that the explanation methods themselves are imperfect approximations that may be worse for some subgroups,” explained Ghassemi.


The researchers then tried several machine-learning techniques to close the fidelity gaps. They trained the explanation methods to recognize regions of a dataset that might be prone to low fidelity and to concentrate on those samples. They also experimented with balanced datasets containing an equal number of items from every subgroup.
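The second of those mitigations, balancing the training data across subgroups, might look something like this simple downsampling sketch (an illustration, not the authors’ actual procedure):

```python
import random
from collections import defaultdict

def balance_by_group(samples, group_key, seed=0):
    """Downsample every subgroup to the size of the smallest one,
    so each group is equally represented in training."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for sample in samples:
        by_group[sample[group_key]].append(sample)
    n = min(len(members) for members in by_group.values())
    balanced = []
    for members in by_group.values():
        balanced.extend(rng.sample(members, n))
    rng.shuffle(balanced)
    return balanced

data = [{"sex": "f", "x": 1}, {"sex": "f", "x": 2},
        {"sex": "f", "x": 3}, {"sex": "m", "x": 4}]
print(len(balance_by_group(data, "sex")))  # 2: one sample per group
```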

These training methods reduced some fidelity gaps, but they didn’t eliminate them. The researchers then modified the explanation methods to investigate why the disparities arise in the first place. Their analysis revealed that an explanation model, even when group labels are hidden, may indirectly pick up protected attributes like sex or race from other features in the dataset and thereby cause fidelity gaps.

Because they want to delve deeper into this problem in the future, they’ve started a new research project. They also intend to look at the consequences of fidelity gaps in the real world when making decisions. Balagopalan is pleased that concurrent work by an independent lab has arrived at similar findings, illustrating how critical it is to understand this issue thoroughly.

“Choose the explanation model carefully. But even more importantly, think carefully about the goals of using an explanation model and who it eventually affects,” added Balagopalan. Separately, researchers have shown that undetectable backdoors can be implanted in ML models, one more reason to address these issues for the well-being and security of everyone.

5 Big Data Disruptions Coming in 2022
https://dataconomy.ru/2021/12/13/5-big-data-disruptions-coming-2022/ – Mon, 13 Dec 2021

Big data has already transformed how many industries operate. Now that the pandemic has accelerated digital transformation around the globe, the field has grown faster than most could have predicted. This unprecedented growth will undoubtedly bring considerable disruption in 2022.

Big data will disrupt industries further as new challenges and opportunities arise in the upcoming year. Here are five of the most significant changes professionals can expect in 2022.

1. Big Data Becomes a Matter of Foreign Policy

Governments will regulate big data more closely as it becomes a larger industry. This trend has already begun to take shape with laws like the GDPR and China’s Data Security Law, but government interest will expand in 2022. China’s recently announced plan to triple its big data industry by 2025 is a sign of things to come.

Big data will become a foreign policy issue as more governments take steps to regulate the industry and support their local sectors. Nations may start to draw lines and issue digital trade restrictions relating to the industry. Operations will have to navigate increasingly complex regulatory issues as a result.

2. Big Data Optimizes Recruiting and Training

Businesses in 2022 will apply big data more heavily to recruitment amid widespread worker shortages. The Marine Corps has announced it will use big data to match recruits to roles where they’re best suited. Other organizations will likely employ similar tactics as capitalizing on available workers becomes more important.

Passive candidates make up 70% of the workforce, and big data analytics can help companies recruit workers they wouldn’t find otherwise. Similarly, organizations will use information to personalize training programs and maximize their staff’s potential. These operations will help mitigate worker shortages and boost productivity.

3. Real-Time Analytics Sustains E-Commerce

Another big data application that will grow in 2022 is real-time analytics, specifically in e-commerce. Online shopping has skyrocketed throughout the pandemic, and brands will have to capitalize on big data to make the most of it. Real-time analytics can help online stores market more effectively and optimize shipping routes for greater customer satisfaction.

Map data can already define deliverability polygons, which inform what steps are necessary for deliveries, and real-time analytics can take these further. In 2022, e-commerce delivery routes will update in real-time according to traffic patterns, weather developments, and other factors. Companies and their logistics partners will then reduce expenses and vastly improve efficiency.

4. Data Poisoning Grows More Severe

One of big data’s most significant applications is machine learning. Already, 50% of surveyed companies have implemented ML in at least one function, and that number will only grow. As additional businesses rely heavily on these models, data poisoning will become a more relevant and severe problem.

Data professionals preparing for 2022 must anticipate a wave of data poisoning attacks. Companies must understand these threats to improve security around their machine learning models and data pools. If cybersecurity standards in 2022 don’t adapt to meet these threats, machine learning may cause more harm than good.
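To make the threat concrete, here is a toy sketch (all data invented) of one form of data poisoning: an attacker injects a handful of mislabeled, extreme points into the training set, corrupting a simple nearest-centroid classifier.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy training data: two well-separated 2-D clusters, one per class.
X = np.vstack([rng.normal(-2.0, 1.0, (200, 2)),   # class 0
               rng.normal(+2.0, 1.0, (200, 2))])  # class 1
y = np.array([0] * 200 + [1] * 200)

def fit_centroids(X, y):
    """One centroid per class; predict the class of the nearest centroid."""
    return np.stack([X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)])

def accuracy(centroids, X, y):
    d0 = np.linalg.norm(X - centroids[0], axis=1)
    d1 = np.linalg.norm(X - centroids[1], axis=1)
    return float(((d1 < d0).astype(int) == y).mean())

clean_acc = accuracy(fit_centroids(X, y), X, y)

# Poisoning: the attacker injects a few extreme points labeled class 1,
# dragging that class's centroid far away from its true cluster.
X_poison = np.vstack([X, np.full((20, 2), -100.0)])
y_poison = np.concatenate([y, np.ones(20, dtype=int)])
poisoned_acc = accuracy(fit_centroids(X_poison, y_poison), X, y)

print(clean_acc, poisoned_acc)  # accuracy drops sharply after poisoning
```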

5. The Rise of Green Data Centers

As big data demands rise, so will their impact on the environment. With climate change growing increasingly severe, more companies will look for ways to use big data sustainably in 2022. Namely, green data centers and renewable energy facilities will become more popular.

Businesses that transition to green data centers early could gain more loyalty from eco-conscious consumers. Companies may also face government pressure to use these facilities in some areas as sustainability becomes a larger focus of world politics. This transition may cause initial disruption for the industry, but it will ensure success and protect the planet in the long term.

Big Data Is Reaching New Heights in 2022

Big data has already made impressive strides in its relatively short history, and it’s only going to keep growing in 2022. This means there are still many disruptions ahead before the field reaches maturity. Changing technologies, social trends, and legal developments will reshape big data and how companies use it.

These five shifts represent the most significant disruptions that will likely come to big data in 2022. If companies and data professionals can prepare for them now, they can ensure success in the future.

2016 Outlook on Artificial Intelligence in the Enterprise
https://dataconomy.ru/2016/08/10/2016-outlook-on-artificial-intelligence-in-the-enterprise/ – Wed, 10 Aug 2016

Artificial intelligence (AI) has been around for decades, but only recently has it begun to emerge as a viable field with massive commercial opportunity. Why now? The proliferation of data. Data is the fuel that feeds AI, and it is opening opportunities that we could only have imagined until now. And large enterprises are waking up to this opportunity.

Findings from our newly published research report, “Outlook on Artificial Intelligence in the Enterprise 2016,” indicate major changes in the perception and use of AI-powered solutions. The survey results clearly speak to the power of partnership: AI technologies, when combined with human skills, produce results beyond what either could achieve alone.

The research report, based on a survey of 235 senior business executives from a variety of industries such as healthcare, manufacturing, and financial services, reveals insights such as:

AI adoption is imminent – 38 percent of the survey group are using AI technologies, and another 26 percent plan to do so by 2018.

Predictive analytics is dominating the enterprise – While it is not that surprising that 58 percent of respondents confirmed use of the technology, we think this finding reflects a larger pattern. Given the vast amounts of data required to enable predictive analytics, this finding also points to the growing availability of high-quality data. As companies become more sophisticated at extracting information from their data, we’re going to see transformative shifts in the decision-making process related to operations, product design, and customer service.

The shortage of data science talent continues to affect organizations – 20 percent of respondents named ‘shortage of data science talent’ as the primary barrier to realizing value from their big data technologies. Data scientists don’t scale, and new AI technologies are beginning to emerge that help automate some of their tasks.

Below is an infographic summarizing the findings; the full report is also available.

 

[Infographic: Outlook on Artificial Intelligence in the Enterprise 2016]

 

 

Like this article? Subscribe to our weekly newsletter to never miss out!

How will data science change in 2016?
https://dataconomy.ru/2016/01/11/how-will-data-science-change-in-2016/ – Mon, 11 Jan 2016

2015 was a good year for data science. A cursory glance at any tech jobs board reveals the sheer breadth of companies looking for data science expertise. Technical terms such as machine learning are slowly entering the public consciousness. Many people still don’t realise how much data science touches their everyday lives, from Amazon recommendations to the algorithms powering their Uber app. With adoption of data science up across most business verticals, it’s natural to wonder how the sector will develop in 2016. My gut feeling is that it will be the year data science proves its worth on a spectacular scale.

Many of the systems in financial institutions are underpinned by data science. Indeed, the financial industry is one of the pioneers of data science techniques. Nevertheless, the adoption of data science has been far from uniform across all banking services. In 2016 I expect this picture to change. Better use of data and personalisation of services will move from the financial markets to retail banking. It will have a profound impact on marketing, customer service and product development.

Atom Bank has already announced its intention to use data models to predict its customers’ needs. It’s worth noting that Atom Bank’s model of prioritising mobile services over bricks and mortar branches is, in the long-term, likely to be adopted by most major banks in the UK. However, such a move will require large scale investment in IT infrastructure, something that is notoriously difficult to get right in financial corporations with bespoke legacy infrastructure.

Data science will inform the best marketing initiatives next year. Targeting has got much more accurate, thanks to a better understanding that collecting the right data goes way beyond an email address and a first name. The personalisation that information from social media platforms enables has opened the door to a huge swathe of new marketing opportunities.

By marrying information from traditional sources and social media, with other dynamic data sets such as weather, economic news, major events, and in-store activity (for retail), ultra-targeted and personalised marketing becomes a reality. The issue of joining the world of in-store marketing and online marketing could finally be solved, much like the difficulties around multi-platform marketing have been largely surmounted.

Underpinning the explosion in mobile advertising and ever more impressive personalisation is the surge in the number of marketers intelligently using data. Indeed, it is this growth in ‘data-savviness’ by marketers that will inform many of the major changes we are likely to see in 2016.

Use of data science within the insurance sector will also continue to take off. The most exciting area of development is the use of wearable technology to better monitor and assess health and wellbeing. Not only will this help to give health insurance companies more useful information, it will also have a growing impact within the HR and recruitment function of some pioneering businesses.

Don’t expect it to be plain sailing for data science in 2016, though: there are a number of headwinds. First, a new Safe Harbour agreement seems a long way off. In October, the US Senate passed the Cybersecurity Information Sharing Act, which should make it easier for US companies to share data with American security agencies. Given that around seven different US security agencies employing thousands of people could access and share this information, the result is a significant erosion of online privacy standards in America.

These decisions taken together, along with the Microsoft judgement (more on that later), have created an environment where the US and EU are going in completely different directions on data protection and, by extension, data security standards.

The consequence of this fragmentation is likely to be serious disruption in the free movement of data across the world. For businesses, this means increased restrictions on how they manage and use data, resulting in higher costs both in relation to infrastructure and compliance.

Second, the Microsoft case should reach a conclusion in January. If the Federal Court in the US rules against Microsoft and allows the US Government to access data held in a data centre in the Republic of Ireland, we should expect serious repercussions. Cloud computing businesses would be the most severely affected, and a dangerous precedent that other governments could follow would be set. Whatever happens, the case will probably be appealed, so expect this issue to rumble on for the rest of the year (and beyond).

However, these issues are unlikely to derail the strong development of data science over the next twelve months. More businesses are going to undertake innovative applications of data science, strengthening the profession by providing valuable experience and thought-provoking case studies. This will create a virtuous circle, prompting more people to become data scientists, increasing the talent pool and spurring more innovation.

Like this article? Subscribe to our weekly newsletter to never miss out!

“Being comfortable with ambiguity and successfully framing problems is a great way to differentiate yourself” – Interview with Stitch Fix’s Brad Klingenberg
https://dataconomy.ru/2015/12/14/being-comfortable-with-ambiguity-and-successfully-framing-problems-is-a-great-way-to-differentiate-yourself-interview-with-stitich-fixs-brad-klingenberg/ – Mon, 14 Dec 2015

Brad Klingenberg is the Director of Styling Algorithms at Stitch Fix in San Francisco. His team uses data and algorithms to improve the selection of merchandise sent to clients. Prior to joining Stitch Fix Brad worked with data and predictive analytics at financial and technology companies. He studied applied mathematics at the University of Colorado at Boulder and earned his PhD in Statistics at Stanford University in 2012.

 


What project that you have worked on do you wish you could go back to and do better?

Nearly everything! A common theme would be not taking the framing of a problem for granted. Even seemingly basic questions like how to measure success can have subtleties. As a concrete example, I work at Stitch Fix, an online personal styling service for women. One of the problems that we study is predicting the probability that a client will love an item that we select and send to her. I have definitely tricked myself in the past by trying to optimize a measure of prediction error like AUC.

This is trickier than it seems because there are some sources of variance that are not useful for making recommendations. For example, if I can predict the marginal probability that a given client will love any item then that model may give me a great AUC when making predictions over many clients, because some clients may be more likely to love things than others and the model will capture this. But if the model has no other information it will be useless for making recommendations because it doesn’t even depend on the item. Despite its AUC, such a model is therefore useless for ranking items for a given client. It is important to think carefully about what you are really measuring.
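The pitfall Klingenberg describes is easy to reproduce. In the illustrative simulation below (the client names and probabilities are invented), a model that outputs only each client’s marginal base rate earns a respectable AUC when predictions are pooled across clients, yet exactly 0.5 within every client, because it cannot rank items at all:

```python
import numpy as np

def auc(scores, labels):
    """Pairwise AUC: probability that a random positive instance
    scores higher than a random negative one (ties count as 0.5)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return float(greater + 0.5 * ties)

rng = np.random.default_rng(0)
base_rates = {"client_a": 0.2, "client_b": 0.5, "client_c": 0.8}

scores, labels, clients = [], [], []
for client, p in base_rates.items():
    for _ in range(200):          # 200 candidate items per client
        scores.append(p)          # the model predicts only the client's base rate
        labels.append(int(rng.random() < p))
        clients.append(client)
scores = np.array(scores)
labels = np.array(labels)
clients = np.array(clients)

print(round(auc(scores, labels), 2))  # pooled AUC looks respectable
for client in base_rates:
    mask = clients == client
    print(client, auc(scores[mask], labels[mask]))  # 0.5 within every client
```

The pooled AUC is high only because some clients are more likely to love items than others; within a client, every item gets the same score, so the model is useless for ranking.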

What advice do you have to younger analytics professionals and in particular PhD students in the Sciences and Social Sciences?

Focus on learning the basic tools of applied statistics. It can be tempting to assume that more complicated means better, but you will be well-served by investing time in learning workhorse tools like basic inference, model selection and linear models with their modern extensions. It is very important to be practical. Start with simple things.

Learn enough computer science and software engineering to be able to get things done. Some tools and best practices from engineering, like careful version control, go a long way. Try to write clean, reusable code. Popular tools in R and Python are great for starting to work with data. Learn about convex optimization so you can fit your own models when you need to – it’s extremely useful to be able to cast statistical estimates as the solution to optimization problems.

Finally, try to get experience framing problems. Talk with colleagues about problems they are solving. What tools did they choose? Why? How did they measure success? Being comfortable with ambiguity and successfully framing problems is a great way to differentiate yourself. You will get better with experience – try to seek out opportunities.

What do you wish you knew earlier about being a data scientist?

I have always had trouble identifying as a data scientist – almost everything I do with data can be considered applied statistics or (very) basic software engineering. When starting my career I was worried that there must be something more to it – surely, there had to be some magic that I was missing. There’s not. There is no magic. A great majority of what an effective data scientist does comes back to the basic elements of looking at data, framing problems, and designing experiments. Very often the most important part is framing problems and choosing a reasonable model so that you can estimate its parameters or make inferences about them.

How do you respond when you hear the phrase ‘big data’?

I tend to lose interest. It’s a very over-used phrase. Perhaps more importantly I find it to be a poor proxy for problems that are interesting. It can be true that big data brings engineering challenges, but data science is generally made more interesting by having data with high information content rather than by sheer scale. Having lots of data does not necessarily mean that there are interesting questions to answer or that those answers will be important to your business or application. That said, there are some applications like computer vision where it can be important to have a very large amount of data.

What is the most exciting thing about your field?

While “big data” is overhyped, a positive side effect has been an increased awareness of the benefits of learning from data, especially in tech companies. The range of opportunities for data scientists today is very exciting. The abundance of opportunities makes it easier to be picky and to find the problems you are most excited to work on. An important aspect of this is to look in places you might not expect. I work at Stitch Fix, an online personal styling service for women. I never imagined working in women’s apparel, but due to the many interesting problems I get to work on it has been the most exciting work of my career.

How do you go about framing a data problem – in particular, how do you avoid spending too long, how do you manage expectations etc. How do you know what is good enough?

As I mentioned previously, it can be helpful to start framing a problem by thinking about how you would measure success. This will often help you figure out what to focus on. You will also seldom go wrong by starting simple. Even if you eventually find that another approach is more effective a simple model can be a hugely helpful benchmark. This will also help you understand how well you can reasonably expect your ultimate approach to perform. In industry, it is not uncommon to find problems where (1) it is just not worth the effort to do more than something simple, or (2) no plausible method will do well enough to be considered successful. Of course, measuring these trade-offs depends on the context of your problem, but a quick pass with a simple model can often help you make an assessment.

How do you explain to C-level execs the importance of Data Science? How do you deal with the ‘educated selling’ parts of the job? In particular – how does this differ from sports and industry?

It is usually better if you are not the first to evangelize the use of data. That said, data scientists will be most successful if they put themselves in situations where they have value to offer a business. Not all problems that are statistically interesting are important to a business. If you can deliver insights, products or predictions that have the potential to help the business then people will usually listen. Of course this is most effective when the data scientist clearly articulates the problem they are solving and what its impact will be.

The perceived importance of data science is also a critical aspect of choosing where to work – you should ask yourself if the company values what you will be working on and whether data science can really make it better. If this is the case then things will be much easier.

What is the most exciting thing you’ve been working on lately and tell us a bit about it.

I lead the styling algorithms team at Stitch Fix. Among the problems we work on is making recommendations to our stylists, human experts who curate our recommendations for our clients. Making recommendations with humans in the loop is a fascinating problem because it introduces an extra layer of feedback – the selections made by our stylists. Combining this feedback with direct feedback from our clients to make better recommendations is an interesting and challenging problem.

What is the biggest challenge of leading a data science team?

Hiring and growing a team are constant challenges, not least because there is not much consensus around what data science even is. In my experience a successful data science team needs people with a variety of skills. Hiring people with a command of applied statistics fundamentals is a key element, but having enough engineering experience and domain knowledge can also be important. At Stitch Fix we are fortunate to partner with a very strong data platform team, and this enables us to handle the engineering work that comes with taking on ever more ambitious problems.

4 Predictions for Big Data in 2015 from Industry Leaders
https://dataconomy.ru/2015/01/12/4-predictions-for-big-data-in-2015-from-industry-leaders/
Mon, 12 Jan 2015 14:22:29 +0000

2014 was a fantastic year for data science. Funding rounds were huge, the mergers and acquisitions space was active all year, and data science skills proved to be among the hottest of the year. But will data science continue to flourish in 2015? We asked four industry experts, working in AI, big data strategy, Hadoop and data transformation respectively, to share their thoughts on how big data will progress in 2015.

1. Data Scientists Not So Sexy in 2015

“In 2015, CEOs will demand more from their data than the elusive “big insight” that data scientists keep promising but haven’t been able to deliver. They will decrease investments in human-powered data science and adopt scalable automation solutions that understand data, unlock insights trapped in it and then provide answers to ongoing problems of understanding performance, logistics, provisioning and HR, just to name a few.”

Kris Hammond, Chief Scientist for Narrative Science
Read our interview with Kris here.

2. Big Data Goes Mainstream in the Enterprise

In 2014 one of the things that we noticed changing rapidly in Big Data was its increasing enterprise focus. Adoption of open source platforms like Hadoop was originally limited to specific applications within early adopters like ad-tech and global web properties. But today, more and more mainstream companies view Big Data as a must-have. Manufacturing companies, for example, are now able to combine reliability and performance data from the field with testing data from the factory to help design and build better and more profitable products. Expect to see Big Data make major impacts on the competitive landscape in 2015. Companies which effectively embrace and deploy these solutions will expand their market and profit shares at the expense of lagging competitors.

Ron Bodkin, Founder of ThinkBig
Read all of Ron’s predictions here.

3. Self-Service Big Data Goes Mainstream

In 2015, IT will embrace self-service Big Data, giving business users direct access to it. Self-service empowers developers, data scientists and data analysts to conduct data exploration directly. Previously, IT was required to establish centralized data structures, a time-consuming and expensive step. Hadoop has made the enterprise comfortable with structure-on-read for some use cases. Advanced organizations will move to data bindings on execution and away from a central structure to fulfill ongoing requirements. This self-service speeds organizations’ ability to leverage new data sources and respond to opportunities and threats.

John Schroeder, CEO of MapR

4. Data Science Will Belong to the Economists

We will start to see data science (to the extent that it operates as a coherent entity) increasingly rely on the domain expertise of economists. The early days of data science were very math, statistics and programming oriented. Then there was the rise of the “computational social scientist,” which added sociology to the mix.

Many trend-setting data science organizations are finding that sociology and similar disciplines tend to be retrospective, while other fields, like economics, offer simulation, auction modeling and other techniques for getting more proactive and predictive with data. Of course, most economists don’t have the programming chops to land most data science jobs, but I think we’ll see that start to change significantly.

Tye Rattenbury, Data Scientist at Trifacta & Former Data Scientist at Facebook
Read our interview with Tye here.


(Image credit: “Happy New Year” by Peter Thoeny)

Predicting Deaths in Game of Thrones With Statistical Modelling
https://dataconomy.ru/2014/10/10/predicting-deaths-game-of-thrones/
Fri, 10 Oct 2014 16:02:25 +0000

Who Will Be the Next Character to Die in Game of Thrones?

It’s a question that is likely to have been brought up in your social circle at some point in the last couple of years – and a perfect opportunity to arm yourself with predictive analytics.

Richard Vale, a statistician at the University of Canterbury in Christchurch, has created a model that looks at the number of ‘point of view’ chapters from each of the major characters in the series, creating a probability distribution for their continuance in the future novels. Essentially, it estimates at what point dear old George is going to write them out of the story in the usual sanguinary manner.

This project is an attempt to encourage other statisticians to test their theories on events yet to happen, rather than simply explaining observed data. Models geared toward prediction are much more valuable in real world applications, which is demonstrated by the swift adoption of the ‘black-box’ machine learning models which often predict more accurately than statistical models.

“A model which explains but does not predict isn’t a model; it’s a religion.”  [source]

Is Jon Snow Dead?

The model gives Jon Snow roughly a 60% chance of surviving into book six; in the model’s terms, a character’s death means they have zero future POV chapters, and his estimated probability of that is around 40%. To readers of the series this may seem like an inaccurate probability, and it should be noted that the model doesn’t account for the exact circumstances of each character. Other limitations include not being able to take into account the introduction of new characters, and how that impacts the number of POV chapters distributed amongst the existing cast.
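To make the idea concrete, here is a toy sketch of chapter-count survival modelling. This is not Vale’s actual model (his is considerably more elaborate), and the chapter counts below are made up for illustration: we treat a character’s POV chapter count in the next book as Poisson-distributed with a rate equal to their average chapters per book, and read “written out of the story” as zero future POV chapters.

```python
import math

def survival_probability(pov_counts_per_book):
    """P(at least one POV chapter in the next book) under a simple Poisson model.

    Under Poisson(rate), P(count == 0) = exp(-rate), so survival is its complement.
    """
    rate = sum(pov_counts_per_book) / len(pov_counts_per_book)
    return 1 - math.exp(-rate)

# Illustrative, invented POV chapter counts for books 1-5 (not the real data)
history = {
    "Jon Snow": [9, 8, 12, 0, 13],
    "Arya Stark": [5, 10, 13, 3, 2],
}

for name, counts in history.items():
    print(f"{name}: {survival_probability(counts):.3f}")
```

With a high historical chapter rate this toy model makes survival nearly certain, which is exactly the kind of naivety the caveats above point at: it knows nothing about plot, only counts.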

So how likely is your favourite character to make it through the next two books? This graph will help to demonstrate:

[Graph: estimated survival probabilities for the major characters over the next two books]

Read the full study here.

(Image credit: vagueonthehow)
