Securing Competitive Advantage with Machine Learning
https://dataconomy.ru/2017/09/18/competitive-advantage-machine-learning/
Mon, 18 Sep 2017 08:35:05 +0000

Business dynamics are evolving with every passing second. There is no doubt that the competition in today’s business world is much more intense than it was a decade ago. Companies are fighting to hold on to any advantages.

Digitalization and the introduction of machine learning into day-to-day business processes have created a prominent structural shift in the last decade. The algorithms have continuously improved and developed.

Every idea that has completely transformed our lives was initially met with criticism. Skepticism always precedes acceptance, and only when the idea becomes reality does the mainstream truly embrace it. At first, data integration, data visualization and data analytics were no different.

Incorporating data structures into business processes to reach a valuable conclusion is not a new practice. The methods, however, have continuously improved. Initially, such data was only available to the government, which used it to devise defense strategies. Ever heard of Enigma?

In the modern day, continuous development and improvement in data structures, along with the introduction of open source cloud-based platforms, has made it possible for everyone to access data. The commercialization of data has minimized public criticism and skepticism.

Companies now realize that data is knowledge and knowledge is power. Data is probably the most important asset a company owns. Businesses go to great lengths to obtain more information, improve the processes of data analytics and protect that data from potential theft. This is because nearly anything about a business can be revealed by crunching the right data.

It is impossible to reap the maximum benefit from data integration without incorporating the right kind of data structure. The foundation of a data-driven organization is laid on four pillars. It becomes increasingly difficult for any organization to thrive if it lacks any of the following features.

Here are the four key elements of a comprehensive data management system:

  • Hybrid data management
  • Unified governance
  • Data science and machine learning
  • Data analytics and visualization

Hybrid data management refers to the accessibility and repeated usage of the data. The primary step for incorporating a data-driven structure in your organization is to ensure that the data is available. Then you proceed by bringing all the departments within the business on board. The primary data structure unifies all the individual departments in a company and streamlines the flow of information between those departments.

If there is a communication gap between the departments, it will hinder the flow of information. Mismanagement of communication will result in chaos and havoc instead of increasing the efficiency of business operations.

Initially, strict rules and regulations governed data and restricted people from accessing it. The new form of data governance makes data accessible, but it also ensures security and protection. You can learn more about the new European Union General Data Protection Regulation (GDPR) and unified data governance in Rob Thomas’ GDPR session.

The other two aspects of data management are concerned with data engineering. A spreadsheet full of numbers is of no use if it cannot be tailored to deduce some useful insights about business operations. This requires analytical skills to filter out irrelevant information. There are various visualization technologies that make it possible and easier for people to handle and comprehend data.

Like this article? Subscribe to our weekly newsletter to never miss out!

Confused by data visualization? Here’s how to cope in a world of many features
https://dataconomy.ru/2017/05/15/data-visualisation-features/
Mon, 15 May 2017 07:30:02 +0000

The late data visionary Hans Rosling mesmerised the world with his work, contributing to a more informed society. Rosling used global health data to paint a stunning picture of how our world is a better place now than it was in the past, bringing hope through data.

Now more than ever, data are collected from every aspect of our lives. From social media and advertising to artificial intelligence and automated systems, understanding and parsing information have become highly valuable skills. But we often overlook the importance of knowing how to communicate data to peers and to the public in an effective, meaningful way.

Hans Rosling paved the way for effectively communicating global health data. (Image: Vimeo)

The first tools that come to mind in considering how to best communicate data – especially statistics – are graphs and scatter plots. These simple visuals help us understand elementary causes and consequences, trends and so on. They are invaluable and have an important role in disseminating knowledge.

Data visualisation can take many other forms, just as data itself can be interpreted in many different ways. It can be used to highlight important achievements, as Bill and Melinda Gates have shown with their annual letters in which their main results and aspirations are creatively displayed.

Everyone has the potential to better explore data sets and provide more thorough, yet simple, representations of facts. But how can we do this when faced with daunting levels of complex data?

A world of too many features

We can start by breaking the data down. Any data set consists of two main elements: samples and features. The former correspond to individual elements in a group; the latter are the characteristics they share.

Anyone interested in presenting information about a given data set should focus on analysing the relationship between features in that set. This is the key to understanding which factors are most affecting sales, for example, or which elements are responsible for an advertising campaign’s success.

When only a few features are present, data visualisation is straightforward. For instance, the relationship between two features is best understood using a simple scatter plot or bar graph. While not that exciting, these formats can give all the information that the system has to offer.
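As a minimal illustration of that point (the numbers below are invented, not taken from any study), a two-feature relationship can be plotted with a few lines of Python and matplotlib:

```python
import matplotlib.pyplot as plt

# Hypothetical samples: hours spent studying and the resulting test score.
hours_studied = [1, 2, 3, 4, 5, 6, 7, 8]
test_score = [52, 55, 61, 64, 70, 74, 79, 83]

plt.scatter(hours_studied, test_score)   # one point per sample
plt.xlabel("Hours studied")
plt.ylabel("Test score")
plt.title("Two features, one scatter plot")
plt.show()
```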

Global temperature rise over the years: the relationship between both features is easy to see and conclusions can be quickly drawn. (Image: NASA)

Data visualisation really comes into play when we seek to analyse a large number of features simultaneously. Imagine you are at a live concert. Consciously or unconsciously, you’re simultaneously taking into account different aspects of it (stagecraft and sound quality, for instance, or melody and lyrics), to decide whether the show is good or not.

This approach, which we use to categorise elements in different groups, is called a classification strategy. And while humans can unconsciously handle many different classification tasks, we might not really be conscious of the features being considered, or realise which ones are the most important.

Now let’s say you try to rank dozens of concerts from best to worst. That’s more complex. In fact, your task is twofold, as you must first classify a show as good or bad and then put similar concerts together.

Finding the most relevant features

Data visualisation tools enable us to bunch different samples (in this case, concerts) into similar groups and present the differences between them.

Clearly, some features are more important in deciding whether a show is good or not. You might feel an inept singer is more likely to affect concert quality than, say, poor lighting. Figuring out which features impact a given outcome is a good starting point for visualising data.

Imagine that we could transpose live shows onto a huge landscape, one that is generated by the features we were previously considering (sound for instance, or lyrics). In this new terrain, great gigs are played on mountains and poor ones in valleys. We can initially translate this landscape into a two-dimensional map representing a general split between good and bad.

We can then go even further and reshape that map to specify which regions are rocking in “Awesome Guitar Solo Mountain” or belong in “Cringe Valley”.

When in a data landscape, look for peaks and valleys.

From a technical standpoint, this approach is broadly called dimensionality reduction, where a given data set with too many features (dimensions) can be reduced into a map where only relevant, meaningful information is represented. While a programming background is advantageous, several accessible resources, tutorials and straightforward approaches can help you capitalise on this great tool in a short period of time.
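As a rough sketch of the idea (assuming Python with scikit-learn, and using random numbers in place of real concert data), a many-feature data set can be projected onto a two-dimensional map like this:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# 200 hypothetical samples (e.g. concerts) described by 15 features
# (sound quality, lyrics, lighting, ...).
X = rng.normal(size=(200, 15))

pca = PCA(n_components=2)        # keep the two directions with the most variance
coords = pca.fit_transform(X)    # shape (200, 2): a flat "map" of the landscape

print(coords.shape)
print(pca.explained_variance_ratio_)  # how much information each map axis retains
```

PCA is only one possible choice here; non-linear techniques such as t-SNE or UMAP are common alternatives when the structure of the data is not well captured by straight lines.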

Network analysis and the pursuit of similarity

Finding similarity between samples is another good starting point. Network analysis is a well-known technique that relies on establishing connections between samples (also called nodes). Strong connections between samples indicate a high level of similarity between their features.

Once these connections are established, the network rearranges itself so that samples with like characteristics stick together. While before we were considering only the most relevant features of each live show and using that as reference, now all features are assessed simultaneously – similarity is more broadly defined.

Networks show a highly connected yet well-defined world.

The amount of information that can be visualised with networks is comparable to what dimensionality reduction offers, but features are assessed differently. Whereas previously samples were grouped based on a few specific marking features, here samples that share many features stick together. That leaves it up to users to choose their approach based on their goals.

Venturing into network analysis is easier than undertaking dimensionality reduction, since usually a high level of programming skills is not required. Widely available user-friendly software and tutorials allow people new to data visualisation to explore several aspects of network science.
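For readers who do want to code it, here is a minimal sketch (assuming Python with networkx and made-up feature vectors, not any specific tool mentioned above): connect samples whose features are close, then let a force-directed layout pull the strongly connected samples together.

```python
import itertools
import networkx as nx
import numpy as np

rng = np.random.default_rng(1)
features = rng.normal(size=(30, 8))   # 30 hypothetical samples, 8 features each

G = nx.Graph()
G.add_nodes_from(range(len(features)))

# Connect pairs of samples whose feature vectors are sufficiently close.
for i, j in itertools.combinations(range(len(features)), 2):
    distance = np.linalg.norm(features[i] - features[j])
    if distance < 3.5:
        G.add_edge(i, j, weight=1.0 / (1.0 + distance))

# A force-directed layout pulls strongly connected (similar) samples together.
positions = nx.spring_layout(G, weight="weight", seed=0)

print(f"{G.number_of_nodes()} nodes, {G.number_of_edges()} edges")
print(f"{len(list(nx.connected_components(G)))} groups of similar samples")
```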

The world of data visualisation is vast and it goes way beyond what has been introduced here, but those who actually reap its benefits, garnering new insights and becoming agents of positive and efficient change, are few. In an age of overwhelming information, knowing how to communicate data can make a difference – and it can help keep data’s relevance in check.

 

Like this article? Subscribe to our weekly newsletter to never miss out!

This article was originally published on The Conversation. Read the original article.

Image: KamiPhuc/Flickr, CC BY-SA

 

Three Mistakes that Set Data Scientists up for Failure
https://dataconomy.ru/2017/05/10/three-mistakes-data-scientist-failure/
Wed, 10 May 2017 08:35:38 +0000

The rise of the data scientist continues, and social media is filled with success stories – but what about those who fail? There are no cover articles about the failures of the many data scientists who don’t live up to the hype and don’t meet the needs of their stakeholders.

The job of the data scientist is solving problems, and some data scientists can’t solve them. They either don’t know how to, or are so obsessed with the technology part of the craft that they forget what the job is all about. Some get frustrated that “those business people” are asking them to do “simple, trivial data tasks” while they’re working on something “really important and complex”. There are many ways a data scientist can fail – here is a summary of the top three mistakes that set data scientists on a straight path towards failure.

Mistake #1 – Less communication is better

What I have seen in great data scientists is that they are communicators first and data geeks second. A very common mistake data scientists make is avoiding business people at all costs: they keep interactions to a minimum so they can go back to doing “cool geek stuff”. Now, I really like the geeky part of the work – that’s why I got into the field in the first place. But we are hired to solve problems, and without communication those problems won’t be solved. Data scientists must follow up on the progress of their data analysis and collect feedback from their peers all the time, especially when they don’t find anything peculiar – maybe that’s good news? Collecting feedback is not enough, though; the analysis and assumptions must also be adjusted based on that feedback. This is the “science” in “data science” – the scientific method is founded on the principle of refining hypotheses based on new data. And the only way to collect and interpret new data is by communicating with the stakeholders who defined the hypothesis in the first place!

Mistake #2 – Delaying simple data requests from business teams

This is a golden one – simple data requests drive data scientists crazy (“it’s just 30 lines of SQL code, yuck!”). And this is where they fail. While the request might be very simple for a data scientist, the data might have only just become available, and it might solve a problem that has been open for years. The data scientist, however, tends to think like an engineer (“trust me, I’m an engineer”) and tries to build scalable architectures to support long-term solutions. But the business doesn’t care about architectures, scale or engineering – it only cares about insights, actionable insights. If you’re not providing them, you fail in their eyes. And, well, they do the sales, so their decisions matter. If you don’t help improve those decisions, you’re just a sunk cost – and finance theory has some pretty rough advice on how to deal with sunk costs. Don’t ignore the simple requests. First make sure they support a decision, and that the decision will improve the business once it has the data – and when it does, swallow your pride and run those trivial 30 lines of SQL. You’ll turn into a high-ROI unit instead of a sunk cost.

Mistake #3 – Preference for a complex solution over an easy one

This is a very costly mistake. A whole mantra has been built around the data scientist occupation: depictions of data scientists as ultimate geniuses who can code, do math and statistics, and understand business better than most have done a big disservice to the profession. The expectation becomes a perverse one – data scientists start to think they need to solve every problem by applying the top-of-the-line statistical and computer science methods. Ultimately you get to a situation where junior data scientists think that everything can be solved with deep learning and don’t know how to explore data, because the industry sold them the complexity obsession. Basic data exploration and visualization are the main tools of a data scientist, and you will spend most of your time exploring data. Not building machine learning models – unless you’re hired to do exclusively that. Not building back-end architectures that scale. Not writing a 10-page in-depth hypothesis-testing study for a simple business question – unless you were hired for that or specifically asked to do it. Your main role is discovering actionable insights and sharing them as recommendations with your stakeholders.

Don’t over-complicate an already overly complex field with superstitions. The most typical situation showcasing this mistake is when data scientists want to apply machine learning everywhere – for every use case, every project. This not only slows down the delivery of the desired output; in many cases a machine learning model is not required at all. As explained earlier, the core work of a data scientist is to solve problems, not to apply and use every shiny new tool that’s out there.


So how do I succeed as a data scientist?

As with every field, there are many ways to succeed and to fail – and many mistakes need to be made to learn which are which – but the fundamental lessons can be learned without trial and error. What matters most is being passionate about the problems and building solutions for your stakeholders instead of obsessing over tools and geeky stuff. Unless your role is a purely engineering one where you are not required to interact with other human beings, you will have to deal with human-to-human communication and run very simple – trivial, in your mind! – code that delivers an unglamorous 3×3 data table. But sometimes simple is better, and it is all that is needed: “everything should be made as simple as possible, but not simpler”, as one pretty famous scientist, Albert Einstein, once said.

Like this article? Subscribe to our weekly newsletter to never miss out!

Big Data for Humans: The Importance of Data Visualization
https://dataconomy.ru/2017/05/08/big-data-data-visualization/
Mon, 08 May 2017 07:30:02 +0000

Everyone has heard the old adage “garbage in, garbage out”. It is a simple way of saying that machine learning is only as good as the data, algorithms, and human experience that go into it. But even the best results can be thought of as garbage if no one can see and understand the value of the output.

That’s where the importance of visualization comes in. Visualization is the means by which humans understand complex analytics and is often the most crucial and overlooked step in the analytics process. As you increase the complexity of your data, the complexity of your final model increases as well, making effective communication and visualization of data even more difficult and critical to end users.  

Data Visualization is the key to actionable insights

Visualization allows you to take your complex findings and present them in a way that is informative and engaging to all stakeholders – and a strong understanding of data science is required for that visualization to be successful.

We must all remember that in the end, the consumer of the product of all artificial intelligence or machine learning endeavors will be people. We should ensure results are delivered as actionable, impactful insights to act upon in business and in life. The human brain is only able to process two to three pieces of information at a time and many different aspects of consumer behavior are influenced by more than just two or three events. This means you have to utilize advanced analytics and statistical modeling to accurately predict consumer behavior and KPIs for businesses.

Every Data team needs a talented communicator

The first thing many companies focus on when starting a department or initiative for big data is either the actual data or the talent needed to analyze that data. Most data scientists will tell you the more data they have the better the model, and that often becomes the main focus. A skilled data scientist can assist with this process, but you also need someone who has domain knowledge of your business and the ability to effectively communicate information back to end users.  

The amount of data available to businesses and consumers can often be overwhelming, and it is only continuing to increase, which makes finding accurate, granular, and relevant data through the clutter more difficult and important.

The weather industry is a great example for effective use of big data. Weather models utilize a vast amount of data and the final forecast a consumer receives is often the result of several models. Forecasts for weather and businesses are becoming increasingly complex, so being able to take a model output and deliver that information in a fashion that audiences can understand and quickly act upon is necessary for success.

However, once you have those results, how do you explain them? When you have 20 different components going into a model with interactions, lags, and non-linear relationships, how do you explain that to a user in a way that makes it easy for them to act upon? 

The impact of integrating multiple sources of data

Weather shows how one data source can be used for multiple purposes. Weather influences what beer people drink, what music they listen to, how many steps they take, and their drive time to work – in other words, virtually every part of their day. Quantifying and communicating how weather impacts people in their daily lives is where visualization comes in.

Through the right graphics, a user can quickly ingest multiple pieces of complex information. For weather this is especially important because weather is highly dependent on geography. Each climate zone has different weather events and different reactions to weather. We know that six inches of snow in Chicago will have a much different impact than six inches of snow in Dallas, Texas, but what happens when we’re looking at hundreds or thousands of locations? How do you properly communicate those complex relationships? This is where having capabilities in GIS is key. The quality and granularity of your data influence the accuracy of model outputs and the relevancy of the end results – regardless of how compelling the visuals are.

You can create the most complex and accurate forecasts possible, but those solutions also have to be scalable to a large audience and available through multiple delivery channels. Taking in multiple data sources across the world and serving it up to a large global audience in a way that is responsive and accurate takes the right skills and resources.

When you think about the impact this new age of analytics will have and what it could do for your business, remember that it is the smart people involved at all levels of the process who will help deliver the insights you and your customers need to make informed decisions that will impact your life and bottom line.

Like this article? Subscribe to our weekly newsletter to never miss out!

6 Ways Business Intelligence is Going to Change in 2017
https://dataconomy.ru/2017/02/06/6-ways-business-intelligence-changes/
Mon, 06 Feb 2017 09:00:28 +0000

Data-driven businesses are five times more likely to make faster decisions than their market peers, and twice as likely to land in the top quartile of financial performance within their industries. Business Intelligence, previously known as data mining combined with analytical processing and reporting, is changing how organizations move forward.

Since decisions based on evidence are considerably more reliable than decisions based on instinct, assumptions or perceptions, it has become clear that success is now cultivated from analyzing relevant data and letting the conclusions drawn from that data drive the direction of the company.

Although it’s become crystal clear that data-driven strategy is the way to go, up until recently, access to sophisticated Business Intelligence tools has been restricted to large enterprises and enterprise-level solutions. Only the industry giants have benefitted from sophisticated analytics due to the considerable investment required not only to collect the data, but also to maintain an in-house data scientist to translate it into usable information.

But 2017 is the year of change.

SMBs are desperate to take advantage of the same analytics as the big players, and are therefore demanding an alternative – self-sufficient Business Intelligence tools. More than half of business users and analysts are projected to have access to self-service Business Intelligence tools this year. According to Gartner’s Research Vice President Rita Sallam, BI is rapidly transitioning from “IT-led, system-of-record reporting to pervasive, business-led, self-service analytics.”

Thanks to more intuitive interfaces, increasingly intelligent data preparation tools, improved integrations and a distinctly lower price tag, 2017 is the year SMBs become empowered to become their own data scientists. Here’s what to expect this year:

  • Affordable Access

The good news for SMBs is that complex data analytics is becoming more cost effective, and as a result, considerably more accessible. In 2017, we expect to see the trend continue to grow as more players enter the market. The wave of new, self-service BI tools allows SMBs to gather, analyze and interpret data, draw detailed analytics and discern trends, filter useful information from the raw data and automate data mining for quicker turnaround.

  • Smarter Integration

BI innovations are becoming more widely available through a variety of integrations into messaging services and IoT. Sisense, for example, is rolling out voice-activated BI interfaces for Amazon Alexa, chatbots and connected lightbulbs. “Our entire focus is on simplifying complex data,” CEO Amir Orad comments.

Alexa handles the natural language processing of voice-to-text, and then “understands” the question it was asked. It passes the information over to Sisense, which parses the text to find out what it means, and then delivers an answer. The plug-and-play approach allows complex analytics to leverage these new interfaces and platforms, as evidenced by the use of Alexa for natural language processing.

Welcome to the new BI: it’s on-demand.

  • Simplified Analytics

The commoditization of Business Intelligence platforms has evolved to the point where enterprises are no longer required to possess sophisticated analysis skills to process and utilize raw data. For example, both Tableau and Domo provide comprehensive suites of layman-accessible services from back-end number crunching to front-end visualization. Users can simply drag-and-drop to pull data from multiple sources and link up data fields, creating interactive dashboards to help with visualization.

  • Cloud Based Data

In years past, BI analytics required processing huge amounts of data stored on company servers. Given the sheer data volume, the trend is to embrace distributed infrastructures in cloud-based BI solutions. It’s a cycle: the enormous amount of data collected on a daily basis has spurred a demand for more data storage mediums, causing the price of bandwidth and storage-per-gigabyte to fall to historical lows, thus encouraging increased usage. The widespread adoption of cloud platforms for data warehousing underwrites the push toward SMB BI self-sufficiency.

  • Evolved Visualization

Expect analytics to become more “in your face”. While data visualization has always enabled decision makers to see analytics and therefore identify patterns, the new self-sufficient BI tools offer interactive visualization, which takes the concept a step further. The interactive dashboards on many of these tools allow users to drill down into charts and graphs for more detail, interactively changing which pieces of data are displayed and how they are processed – all in real time.

  • Collaboration

Because BI is becoming more accessible, the opportunity for SMBs to employ cross-team collaboration will increase. For example, content marketing teams are suddenly able to work closely with data teams to measure how each piece of content works best across multiple formats and contexts. With the insights from that data, the content team can adjust their editorial calendar to include the types of content that perform best and focus on the topics that earn the most attention. This collaboration makes closed-loop marketing possible.

A Better, Data-Fueled Future

The increased availability of BI solutions means that SMBs are no longer tethered to expensive, slow enterprise software. Affordable, meaningful data insights are increasingly accessible, positioning everyone as their own Data Scientist.

 

Like this article? Subscribe to our weekly newsletter to never miss out!

Get the facts straight: The 10 Most Common Statistical Blunders
https://dataconomy.ru/2017/01/27/10-most-common-statistical-blunders/
Fri, 27 Jan 2017 09:00:34 +0000

Competent analysis is not only about understanding statistics, but about implementing the correct statistical approach or method. In this brief article I will showcase some common statistical blunders that we generally make and how to avoid them.

To make this information simple and consumable I have divided these errors into two parts:

  • Data Visualization Errors
  • Statistical Blunders Galore

Data Visualization Errors

This is a nightmare-inducing area for both the presenter and the audience. Incorrect data presentation can skew the inference and leave the interpretation at the mercy of the audience.

Pie Charts

Pie charts are considered the best graph when you want to show how a categorical variable breaks down. However, they can be seriously deceptive or misleading. Below are some quick points to remember when looking at pie charts, followed by a short sanity-check sketch:

  • Percentages should add up to 100%
  • 3D fits better in VR consoles than in pie charts
  • Thou shall not have ‘Other’ – Beware of the slices with ‘Other’. If that is larger than the rest of the slices, you have a problem, because it makes the pie chart vague
  • Show the total number of reported cases to determine how big the pie is
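A short sanity-check sketch in Python covering the points above (the category names and numbers are invented for illustration):

```python
# Hypothetical pie-chart slices, in percent.
slices = {"Product A": 38.0, "Product B": 27.0, "Product C": 21.0, "Other": 14.0}
n_reported = 450   # hypothetical total number of cases behind the chart

total = sum(slices.values())
assert abs(total - 100.0) < 1e-6, f"Slices add up to {total}%, not 100%"

largest = max(slices, key=slices.get)
if largest == "Other":
    print("Warning: the vague 'Other' slice is the largest one")

print(f"Pie based on {n_reported} reported cases")
```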

Bar Graphs

Bar graphs are great graphs to show the categorical data by the number or percent for a particular group. Points to consider when examining a Bar Graph:

  • Thou shall have the right scale: beware of a scale that has been shrunk to make differences look bigger or more severe than they are
  • Consider the units represented by the height of the bars and what the result actually means in terms of those units

Time Charts

A time chart is used to show how the measurable quantities change by time.

  • Thou shall have the right scale and the axis: It is a good practice to check the scale on the vertical axis (usually the quantity) as well as the horizontal axis (timeline) as the results can be made to look very impactful by switching the scales
  • Don’t try to answer the “Why is it happening?” question using the time charts as they only show “What is happening”
  • Ensure that your time charts show empty spaces (gaps) for the times when no data was recorded, as in the sketch below
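One way to keep those gaps visible, assuming Python with pandas and matplotlib (the dates and values are made up):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical daily measurements with two missing days (Jan 4 and 5).
recorded = pd.Series(
    [10, 12, 11, 13, 14],
    index=pd.to_datetime(["2017-01-01", "2017-01-02", "2017-01-03",
                          "2017-01-06", "2017-01-07"]),
)

# Reindex to the full daily range; the missing days become NaN.
full_range = pd.date_range("2017-01-01", "2017-01-07", freq="D")
with_gaps = recorded.reindex(full_range)

with_gaps.plot(marker="o")   # matplotlib breaks the line where values are NaN
plt.ylabel("Measured quantity")
plt.show()
```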

Histograms

  • It is good practice to check the scale used for the vertical (frequency) axis, relative or otherwise, especially when the results are played down through the use of an inappropriate scale
  • Ensure that no intervals are skipped on the x or y axis to make the data look smaller
  • Ensure a histogram is the right choice for the data, as people tend to confuse histograms with bar graphs (see the sketch below)
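A brief sketch of the difference, with invented data: a histogram bins a numerical variable, while a bar graph counts categories.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
ages = rng.normal(35, 10, size=500)       # numerical variable -> histogram

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.hist(ages, bins=20)                   # evenly spaced bins, no skipped intervals
ax1.set_xlabel("Age")
ax1.set_ylabel("Frequency")

departments = ["Sales", "IT", "HR"]       # categorical variable -> bar graph
headcount = [40, 25, 10]
ax2.bar(departments, headcount)
ax2.set_ylabel("Headcount")

plt.show()
```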

Statistical Blunders Galore

This is probably a ‘no-nonsense zone’ where one would not want to make false assumptions or erroneous selections. Statistical errors can be a costly affair if not checked or looked into carefully.

Biased Data


Bias in statistics means systematically over- or underestimating the true value. Below are some of the most common sources of such errors.

  • Measurement instruments that are systematically off, for example a scale that adds 5 pounds every time you weigh yourself
  • Survey participants influenced by the questioning techniques
  • A sample of individuals that doesn’t represent the population of interest – for example, examining exercise habits by only visiting people in gyms will introduce a bias

No Margin of Error

The margin of error quantifies how much the result from a sample study can be expected to differ from the number you would get from the entire population, simply because of sampling. It is a good idea to always look for this statistic so that audiences are not left to wonder about the accuracy of the study.
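As a worked example (using the standard large-sample formula for a proportion and invented survey numbers), the margin of error at 95% confidence is roughly z · sqrt(p(1 − p)/n):

```python
import math

p = 0.54    # hypothetical sample proportion: 54% answered "yes"
n = 1000    # hypothetical sample size
z = 1.96    # z-score for a 95% confidence level

margin_of_error = z * math.sqrt(p * (1 - p) / n)
print(f"54% +/- {margin_of_error * 100:.1f} percentage points")
# prints roughly: 54% +/- 3.1 percentage points
```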

Non-Random Sample

Non-random samples are biased, and their data cannot be used to represent any population beyond themselves. It is pivotal to ensure that any study is based on a random sample – and if it isn’t, well, you are about to get into big trouble.

Correlation is not Causation

Beyond that well-known statement, correlation is one statistic that is misused more often than it is used correctly. Below are a few reasons why I believe that.

Correlation applies only to two numerical variables, such as weight and height, call duration and hold time, or test scores for a subject and time spent studying that subject. So, if you hear someone say, “It appears that the study pattern is correlated with gender,” you know that’s statistically incorrect. Study pattern and gender might have some level of association, but they cannot be correlated in the statistical sense.

Correlation measures the strength and the direction of a linear relationship. If the correlation is weak, one can say that there is no linear relationship, but that doesn’t mean that no other type of relationship exists.
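A small sketch of that last point (invented data, Python with NumPy): here y is determined entirely by x, yet the Pearson correlation is essentially zero because the relationship is not linear.

```python
import numpy as np

x = np.linspace(-3, 3, 201)
y = x ** 2                      # a perfect, but non-linear, relationship

r = np.corrcoef(x, y)[0, 1]     # Pearson correlation captures linear association only
print(f"Pearson r = {r:.3f}")   # close to 0 despite the strong relationship
```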

Botched Numbers

One should not believe everything that comes wrapped in statistics. Errors appear all the time (either by design or by mistake), so check the points below to ensure that there are no botched numbers.

  • Make sure everything adds up to what it is reported to be
  • “A stitch in time saves nine” – do not hesitate to double-check the numbers and the basic calculations
  • Look at the response rate of a survey – the number of people who responded divided by the number of people surveyed
  • Question the statistic type used to ensure it is the best fit

As a consumer of information, it is your job to identify shortcomings in the data and analysis presented, to avoid that “oops” moment. Statistics are nothing but simple calculations, often used cleverly by people who are either ignorant or don’t want you to catch them dressing up their story. So, to be a certified skeptic, put on your statistics glasses.

 

Like this article? Subscribe to our weekly newsletter to never miss out!

Three Unexpected Uses for 3D Printing in Big Data
https://dataconomy.ru/2016/05/16/three-unexpected-uses-3d-printing-big-data/
Mon, 16 May 2016 08:00:29 +0000

The future of 3D printing is heavily reliant on the power of data. The Internet of Things means users will be able to access and print files remotely, as well as create incredible scans and share prints. But there are also several more opportunities for big data and 3D printing to work together. 3D printing will help manufacturers and researchers, and will see uses that expand far beyond our expectations. Here are three of the more exciting possibilities for 3D printing and big data to work together.

3D Printing and Data Visualization

Data means nothing if it can’t be understood. This is true not only for major companies looking to turn numbers into profit, but also for ordinary folks and smaller institutions. The world has seen pie charts, interactive graphs, and even videos to portray data, but what if you could touch those visualizations? The results would not just be pretty; they would lead to more thorough understanding and faster insights. Previously unnoticed patterns would be given a physical presence, and viewers could absorb information with greater depth.

Two researchers at MIT made this solution popular back in 2014. Their study started with a 3D printed model of the university campus. Twitter data was then streamed onto the mock-up according to geographical location. These tweets could also be described by topic or time. “Other demonstrations may include animating twitter traffic volume as a function of time and space to provide insight into campus patterns or life,” they add.

Possible cited applications include understanding traffic flows or other duties performed by city planners. Small companies are finally growing and utilizing their data, and proper visualization is the next step. Some enthusiasts aren’t just sharing their data, they’re sharing tips for turning datasets into 3D printed models. One research agency even took data from live streaming video platform Twitch and turned it into physical prints. While written numbers can tell the basic story, a 3D model allows everyone involved to see the data from all angles at once. The Twitch researchers managed to create visualizations that could be physically held and even compared to one another in the real world.

Printing Data Storage

One of the biggest problems looming in the shadows of big data is just how much space it requires. The world is generating 2.5 quintillion bytes of data daily. Companies big and small are going to continue storing (maybe even hoarding) data. For some, this detail overshadows the many possibilities big data has to disrupt the modern world. In order to keep all of that information, companies need lots of space – and that gets expensive. Not just for the user, but for the cloud providers looking after it all. This will also lead to negative environmental impacts, so future data do-gooders will be stuck dealing with a double-edged sword.

3D-printed electronics are not yet the norm, but they are on their way. Printing these electronic components could lead to less wasted time and fewer wasted resources in the manufacturing process. At the moment, printing technology is still in its relatively early stages, and printing storage devices and components wouldn’t necessarily be faster than traditional manufacturing methods – yet. More importantly, one huge reason that 3D printing is being pushed in many sectors is the near-complete lack of waste. Traditional manufacturing leads to ample waste, but printing uses only the material necessary to create the object. Once 3D printing masters electronics (which it likely will, based on how many people care about the field), printed storage drives and circuit boards will become as common as traditional technology. In fact, they could become more common, and that will be a huge relief for data science.

Monitoring Manufacturing

One of 3D printing’s biggest roles is in the future of manufacturing. It is disrupting the way companies create and produce. Many seem to think printing isn’t yet capable of creating truly useful parts, picturing some awkward hunk of plastic that clearly looks printed. That, however, is only what the consumer sees. Major companies are already turning to printing for prototyping and even for end-product creation. However, whenever technology progresses at rapid speed, there are always drawbacks and steps that can go wrong. That’s why big data is needed to keep 3D printing safe.

GE recently reported on their own creative use of 3D printing in the aviation sector. The company, like many others, is using 3D printing to create powerful new parts. A recent jet engine fuel nozzle was 3D printed to be lighter and more durable than its predecessors. The problem is that creating such an exact tool isn’t simple, and new quality control methods need to be developed fast.

“We are dealing with a microscopic weld pool that’s moving at hundreds of millimeters per second,” says GE Aviation mechanical engineer, Todd Rockstroh. “Every cubic millimeter is a chance for a defect.”

These prints take anywhere from 10 to 100 hours to produce, and occur at minuscule scales and extreme temperatures. Engineers can’t just press “print” and come back the next week to check it out. Instead, they must develop new data-driven technologies to actively prevent possible issues. Thanks to data analysis, temperature anomalies can now be easily spotted, and the data can even help retrospectively determine why a print failed. GE expects this kind of inspection to increase production speeds by 25%. It also keeps post-print work to a minimum.

The exponential growth of 3D printing is fueled by the technology’s great impact in almost every field. Hobbyists, international manufacturers, hospitals, NASA – nearly everyone has a use for it. Without data science, however, it would have only limited uses and a much shorter lifespan. The low-cost, highly customizable, zero-waste solution will become completely entwined with data, and it will be a huge victory for both sides. Best of all, the combined solution will lead to possibilities that neither printers nor data scientists have yet considered. Future warehouses will, no doubt, be full of incredible, connected opportunities.

image credit: MKzero

Like this article? Subscribe to our weekly newsletter to never miss out!
