Four Steps for Building a Successful Enterprise Metadata Catalog (Dataconomy, 25 June 2015)

With the fast-growing interest in data lakes — a storage solution that allows structured and semi-structured data to live in the same place — attention is turning toward metadata as a way to organize large amounts of diverse enterprise data.

Metadata is an ambiguous and generic term, but it most commonly refers to attribute names, data types, relationships, basic data quality metrics, usage stats and access controls. Metadata is literally data about data and thus is often left unwritten — stored only in the heads of those in the know.

Capturing and harnessing this metadata in a robust, easily accessible catalog can open dramatic opportunities for an organization. Specifically, a catalog improves the availability of enterprise data: data scientists can quickly and confidently gather the data they need for analysis, and data stewards can better understand how data interacts and connects across sources and silos. User interfaces that make the catalog easy for data experts, owners and users to access foster collaboration and a shared understanding of your data assets across the enterprise.

Why is this so important? Imagine you’re a data scientist tasked with analyzing payment terms across all of your organization’s suppliers, with data from hundreds of ERP systems flowing into a single data lake. The fact that this data is in a single location might give you the illusion that it’s easily accessible for analysis, but it’s not. You know the data you need is in there, but you don’t know where to find it.

A well-maintained metadata catalog would make it far easier for you to identify the sources and attributes required for your payment analysis across organizational silos. The result? Significantly less time spent collecting data and more accurate outcomes.
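A minimal sketch of what such a catalog lookup could look like. All source, table, attribute and tag names here are illustrative, not drawn from any real ERP system or from Tamr's product:

```python
# A toy in-memory metadata catalog: each entry records where an attribute
# lives and how it is tagged, so an analyst can locate data without
# manually scanning every source. Entirely hypothetical names.
CATALOG = [
    {"source": "erp_emea", "table": "vendors", "attribute": "payment_terms",
     "type": "string", "tags": {"supplier", "payments"}},
    {"source": "erp_na", "table": "supplier_master", "attribute": "pay_terms_code",
     "type": "string", "tags": {"supplier", "payments"}},
    {"source": "erp_na", "table": "supplier_master", "attribute": "supplier_name",
     "type": "string", "tags": {"supplier"}},
]

def find_attributes(catalog, required_tags):
    """Return catalog entries tagged with every tag the analysis needs."""
    required = set(required_tags)
    return [entry for entry in catalog if required <= entry["tags"]]

# The payment-terms analysis asks the catalog, not each silo in turn.
for entry in find_attributes(CATALOG, {"supplier", "payments"}):
    print(f'{entry["source"]}.{entry["table"]}.{entry["attribute"]}')
```

Even this trivial version shows the payoff: the question "where do payment terms live?" is answered from one index instead of hundreds of systems.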

Following are four best practices for starting to manage your own metadata for analytic applications.

  1.  Start with Questions (The Hard Ones)

Before you begin thinking about metadata, start by thinking of the most impactful business-level questions that your organization would want to solve, and the data required to answer them. For example, the data needed for an analysis of cross-selling opportunities among your lines of business may be different from the data needed for inventory projections for your stores. Thinking through requirements ahead of time and ensuring they are baked into your metadata catalog can be an immense time-saver when it comes time to perform analysis. You may not have the data immediately ready, but you’ll know what’s available, its quality and reliability, and its location.

  2.  Identify Core Attributes and Sources

As you develop key business questions, you’ll no doubt get a better idea of the underlying entities required for analysis. In the payment analysis example above, the key entities would be suppliers and payment terms; for a pharmaceutical analysis, they could be patients, drugs and experiments. Understanding these entities and their relationships is critical for downstream analytics.
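One way to make those entities concrete is a mapping from raw source columns to canonical entity fields. The entity and column names below are hypothetical, chosen only to mirror the payment example:

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    """A canonical business entity the catalog organizes attributes around."""
    name: str
    attributes: set = field(default_factory=set)

# Canonical entities for the payment analysis (illustrative fields).
supplier = Entity("supplier", {"name", "address", "tax_id"})
payment_terms = Entity("payment_terms", {"net_days", "discount_pct"})

# Each (source, table, column) triple maps to an (entity, field) pair,
# so differently named columns in different ERPs land on one model.
SOURCE_TO_ENTITY = {
    ("erp_emea", "vendors", "vendor_name"): ("supplier", "name"),
    ("erp_na", "supplier_master", "supplier_name"): ("supplier", "name"),
    ("erp_na", "supplier_master", "pay_net_days"): ("payment_terms", "net_days"),
}

def canonical_field(source, table, column):
    """Look up which canonical entity field a raw column feeds, if any."""
    return SOURCE_TO_ENTITY.get((source, table, column))

print(canonical_field("erp_na", "supplier_master", "supplier_name"))
```

The mapping table is the part worth maintaining: it is what lets an analysis be written once against the canonical entities rather than per source system.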

  3.  Identify Key Data Experts

The most valuable metadata often isn’t stored in a database or data lake. It’s stored in people’s heads: the data owners and experts who are spread throughout the enterprise.

Understanding table relationships, completeness or emptiness indicators, and table structure is far too big a job for one person. The knowledge is split up among the various domain experts who use the data regularly and the IT analysts who create and maintain the data structure. Everything from quality metrics (e.g., ‘99999’ means null in this attribute) to data origins (e.g., average county income was used for the ‘income’ attribute) to much more nuanced information (e.g., inflation in the US vs. inflation in Mexico) is stored somewhere in the minds of your owners and experts. Once you have identified the business goals and what kind of data you’ll require, make sure you verify with these experts that you have everything you need, and that you have what you think you have.
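That expert knowledge only pays off once it is captured in machine-readable form. A sketch of how the article's own '99999'-means-null example could become a catalog annotation that cleans data before analysis (the rule format and source names are assumptions, not a real catalog schema):

```python
# Expert-contributed annotations keyed by (source, table, column).
# "null_sentinels" captures quality knowledge ('99999' means missing);
# "provenance" captures origin knowledge (county average, not raw value).
ANNOTATIONS = {
    ("census", "households", "income"): {
        "null_sentinels": {"99999"},
        "provenance": "average county income used when household value absent",
    },
}

def normalize(source, table, column, values):
    """Replace expert-flagged sentinel values with None before analysis."""
    annotation = ANNOTATIONS.get((source, table, column), {})
    sentinels = annotation.get("null_sentinels", set())
    return [None if v in sentinels else v for v in values]

print(normalize("census", "households", "income", ["42000", "99999", "51500"]))
# -> ['42000', None, '51500']
```

Without the annotation, the sentinel would silently skew every average computed on the column; with it, the knowledge survives the expert leaving the team.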

Beyond verifying that you have what you need, however, people can also help you find the metadata by registering it in the catalog in the first place, as well as by collaboratively annotating it.

  4.  Embrace Variety

Data changes constantly. New business initiatives and needs pop up every day. Responding to all these changes ad hoc is not going to lead to long-term data stability. Instead, create a more deliberate process for reviewing metadata changes and monitoring data streams for change. Metadata is a critical part of a healthy data ecosystem, but it only takes one oversight or mistake to render it ineffective.
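Monitoring sources for change can start very simply: periodically compare what a source actually exposes against what the catalog says it should, and queue any drift for review rather than letting it slip by. A minimal sketch (function and field names are assumptions):

```python
def schema_diff(cataloged, observed):
    """Compare the attribute set recorded in the catalog with what a
    source currently exposes; flag additions and removals for review."""
    cataloged, observed = set(cataloged), set(observed)
    return {"added": observed - cataloged, "removed": cataloged - observed}

# A new column has appeared in the source since the catalog was written.
diff = schema_diff(
    cataloged={"supplier_name", "pay_terms_code"},
    observed={"supplier_name", "pay_terms_code", "preferred_currency"},
)
print(diff)  # {'added': {'preferred_currency'}, 'removed': set()}
```

Running a check like this on a schedule turns "respond ad hoc" into a deliberate review process: every delta is surfaced, and a human decides whether the catalog or the source is wrong.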

Clearly, this last step is the most difficult to implement. Part of this implementation is deciding what tools to use for tracking and maintaining data deltas.

Master Data Management (MDM) software, which uses user-defined rules for matching entities and mapping attributes, was developed for exactly this reason. And many MDM tools have been incredibly effective in mapping segments of a data ecosystem. But there are a few problems with using this top-down, rigid approach in today’s world of data lakes and Big Data: namely, we don’t know what we don’t know.

Some of the newer enterprise data unification products are trying to overcome this problem with human-guided machine learning. Algorithms automatically connect the vast majority of data sources and resolve duplications, errors and inconsistencies among entities and attributes. When the system can’t resolve connections automatically, it calls on people in the organization who are familiar with the data to weigh in on the mapping and improve its quality and integrity. As a result, the data gets better over time, so when you do start with the hard questions, as these best practices encourage, you’ll have quick help identifying the sources, attributes and key experts central to good metadata management.



Maggie Soderholm is a field engineer at Tamr, Inc., where she helps customers deploy metadata catalogs and other enterprise data unification software. Before joining Tamr, she was a data analyst on the Business Intelligence Team at Evernote, where she helped create the infrastructure for storing and pulling data as well as accessing reports and dashboards. She holds a degree in statistics from Carnegie Mellon University.

Database Wizard Dr. Michael Stonebraker Wins the 2014 ACM Turing Award (Dataconomy, 7 April 2015)

The A.M. Turing Award, bestowed by the world’s largest educational and scientific computing society, Association for Computing Machinery (ACM), honours computer scientists and engineers who have contributed towards the progress of the information technology industry.

Often considered the ‘Nobel Prize of Computing,’ the 2014 Turing Award recognises the efforts of Dr. Michael Stonebraker towards the concepts and practices underlying modern database systems. ACM will present him with the award at its annual Awards Banquet on June 20 in San Francisco, California.

“Michael Stonebraker’s work is an integral part of how business gets done today,” explains ACM President Alexander L. Wolf. “Moreover, through practical application of his innovative database management technologies and numerous business start-ups, he has continually demonstrated the role of the research university in driving economic development.”

The 71-year-old is credited with inventing numerous concepts that have been crucial in making databases a reality and that are used in almost all modern database systems, explains the announcement. With Ingres, he introduced the notion of query modification, used for integrity constraints and views; with Postgres, he introduced the object-relational model, effectively merging databases with abstract data types while keeping the database separate from the programming language.

Dr. Stonebraker later released these systems as open-source software, enabling widespread adoption; their code bases have been incorporated into many modern database systems.

Currently, Dr. Stonebraker is adjunct professor at the Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory (MIT CSAIL) where he is also co-founder and co-director of the Intel Science and Technology Center for Big Data.

Previously, he served as professor of computer science at the University of California, Berkeley for 29 years. He is a graduate of Princeton University and earned his master’s degree and Ph.D. from the University of Michigan.

Prior to the Turing Award, Dr. Stonebraker received various honours, including the ACM Software System Award, shared with Gerald Held and Eugene Wong, for the development of INGRES, the relational DBMS for which he served as main architect (IBM’s System R was also recognised).

He also founded several successful companies to commercialize his work, including Ingres Corporation (acquired by ASK and then Computer Associates), Illustra Corporation (acquired by Informix), Cohera Corporation (acquired by PeopleSoft), StreamBase, Inc. (acquired by TIBCO), Vertica Systems, Inc. (acquired by HP), VoltDB, Paradigm4 and Tamr, Inc.

The ACM Turing Award was started in 1966 and named after Alan Turing, the British mathematician regarded as the father of modern computing, who led the effort to crack the German Enigma machine during World War II. The award carries a $1 million prize, with financial support provided by Google, Inc.

Photo credit: Tamr

Tamr Bags Industry Veteran James Markarian as Advisor (Dataconomy, 27 January 2015)

Tamr, the MIT CSAIL spin-off startup that deals in Big Data, has snapped up James Markarian, former executive VP and CTO at Informatica Corporation, as an advisor.

Tamr co-founder and CEO Andy Palmer lauded the industry veteran: “He’s been a visionary leader in data technologies on the West Coast for more than 15 years. We are thrilled that James has chosen to work closely with us as we grow.”

Mr. Markarian is presently an Entrepreneur in Residence (EIR) at Khosla Ventures, and also an investor in StreamSets, EnerAllies and Waterline Data Science. Drawing on his past experience at Informatica and, before that, Oracle, he will advise Tamr on product architecture and strategy, customer acquisition and West Coast leadership for the company’s award-winning data unification platform.

“Tamr has the potential to change the status quo, which today is that all analytics involving diverse data sets have to be science projects,” noted Markarian. “Tamr has figured out how to integrate and deliver high-quality data directly to the people, analytics and applications that can solve the big, P&L-level business problems ─ and do this automatically, at scale. I’m really looking forward to being part of this evolution.”

Founded in 2014, Tamr has been building a commercial-grade solution to connect and enrich diverse data inside and outside an organization, enabling enterprises to use all available data for business intelligence and analytics quickly, at scale and cost-effectively.


(Image credit: Tamr)

Tamr: Data Curation Powered by Machine Learning (Dataconomy, 21 May 2014)

Monday saw the unveiling of Tamr, a scalable platform for data curation, at the Databeat Conference in San Francisco. Founders Andy Palmer and Michael Stonebraker also announced they had accrued an impressive $16 million of first-round funding from Google Ventures and New Enterprise Associates.

Palmer and Stonebraker came up with the idea for the software during their research at MIT, during which they discovered widespread demand among tech companies for software that would enable them to prepare data for analysis faster. “We came to the conclusion that what was required was something that automated the integration of new sources and new attributes over time and would insulate the system from the changes in the fundamental sources so that you didn’t have to go … and re-engineer from the top down all the time all this ETL [extract, transform, and load work],” said Palmer.

Tamr sits on top of a company’s existing databases and provides a holistic view across all of the systems. Through its machine learning algorithm, Tamr roots out similar sets of data across different databases and delivers a report on this data to the company. Each report item has a ‘confidence rating’ (i.e., an indicator of how certain the program is that the datasets are similar). Human input is then required to decide if the data is similar; if it is, Tamr maps the two fields or columns together, making the systems more cohesive and streamlined.
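The confidence-plus-review loop can be illustrated with a toy matcher. This sketch scores two columns by the overlap of their sampled values and routes the pair based on the score; Tamr's real algorithms are far richer, and the thresholds and similarity measure here are pure assumptions:

```python
def jaccard(a, b):
    """Jaccard similarity of two value samples: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def propose_mapping(col_a_values, col_b_values,
                    auto_threshold=0.8, review_threshold=0.5):
    """Score two columns and route the pair: map automatically when the
    confidence is high, ask a human expert in the gray zone, and drop
    clear non-matches. Thresholds are illustrative."""
    score = jaccard(col_a_values, col_b_values)
    if score >= auto_threshold:
        decision = "auto-map"
    elif score >= review_threshold:
        decision = "ask-expert"
    else:
        decision = "no-match"
    return score, decision

# Two payment-terms columns that mostly, but not fully, agree:
# the gray-zone score sends the pair to a human reviewer.
print(propose_mapping(["NET30", "NET60", "NET90"],
                      ["NET30", "NET60", "NET45"]))
```

The key design point is the middle band: instead of forcing every borderline pair into a yes/no, the system spends scarce human attention only where the algorithm is genuinely unsure.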

Using this software to integrate and combine datasets could give companies a competitive advantage in data analysis over firms who are only exploring one data set in isolation, rather than gaining a wider perspective. Rich Miner of Google Ventures, one of Tamr’s primary investors, said of its value: “Businesses can’t keep up with the number and depth of data sources exploding within their companies. Tamr combines machine learning and corporate knowledge to unlock a unified view of companies’ most valued data repositories.”

(Photo credit: Tamr website)
