Data Warehousing – Dataconomy
https://dataconomy.ru
Bridging the gap between technology and business
Tue, 31 May 2016 10:55:51 +0000

Data Mining Tops LinkedIn’s List of the Hottest Skills in 2014
https://dataconomy.ru/2015/01/12/data-mining-tops-linkedins-list-of-the-hottest-skills-in-2014/
Mon, 12 Jan 2015 07:54:14 +0000

LinkedIn, the business networking service, published its list of the 25 Hottest Skills of 2014 that got people hired in mid-December last year, and the rankings have shuffled significantly since 2013.

Statistical Analysis and Data Mining took the number one spot, up from fifth place on 2013’s list, when Social Media Marketing came top.

“If your skills fit one of the categories below, there’s a good chance you either started a new job or garnered the interest of a recruiter in the past year,” says the blog post making the announcement.

LinkedIn analyzed the skills and employment history of more than 330 million LinkedIn member profiles. Expertise (skills) and experience (work history) represent the primary components of professional identity on LinkedIn.

It was good news for data professionals all round, with Storage Systems, Information Security, Business Intelligence, Algorithm Design and Perl/Python/Ruby also appearing in the top 10.

Here is the entire list, courtesy of LinkedIn’s blogpost:

[Image: Data Mining Tops LinkedIn's List of the Hottest Skills in 2014]


(Featured Image credit: A Name Like Shields Can Make You Defensive, via Flickr)

Exasol: Building the Fastest Database in The World
https://dataconomy.ru/2014/10/17/exasol-building-the-fastest-database-in-the-world/
Fri, 17 Oct 2014 16:12:23 +0000

Exasol was founded 14 years ago with a mission in mind: to build an ultra-fast, highly scalable database for analytics. Having been in the business of crafting an in-memory database for nearly a decade and a half, Exasol is uniquely placed to offer insights into the data science ecosystem, and will be giving talks at our events in Berlin, Munich, and London. We recently caught up with Graham Mossman, Exasol’s Senior Solutions Architect, to discuss Exasol’s development, its product, and his upcoming talk at Data Enthusiasts London.


Exasol has been around for 14 years. How has the product changed over time?

Over the past 14 years, our priority has been to stay true to our roots and be absolutely world-class when it comes to analytical SQL queries. We haven’t tried to build our OLTP capabilities on top of that. Rather, we have gone for being the best at our particular niche.

We have worked tirelessly on making this the easiest product to use on the market and, crucially, on doing so without compromising on speed. On top of this, our aim has been to make our product easier to integrate with other systems. Our software engineers have taken this product from strength to strength, and I think it shows in how quick, easy and adaptable it is.

Speed is a differentiating factor for Exasol: you were recently named the world’s fastest database according to the TPC-H Benchmark 2014. What else makes you unique?

It’s undoubtedly the fact that there are very few tuning knobs on the Exasol system. We believe that the system should do as much of the work as possible. There are actually only two things you can change in an Exasol system; otherwise it just calculates the best way of operating.

The system is really like a car engine: the driver does not need an in-depth knowledge of engine mechanics to operate it. To put it quite simply, the system just works!

Exasol’s system is quite intuitive, but I’m wondering how you deal with customers who have very specific requirements.

We provide a very simple tool for a job and we set the system up in such a way that it is extremely easy to integrate. To return to the engine analogy, our system is like a Mercedes engine, which is extremely versatile. It fits easily into a Formula 1 car, a pickup truck, a cab, etc. Exasol’s engine is exactly like this. We make it easy for people to plug it into their particular architecture.

We do not choose sides when it comes to the ETL tools or the BI front end you use; we just make our system easy to talk to. That way, we provide you with maximum power, without making any significant compromises.

You offer Hadoop integration too. What is the basic distinction between what you would expect an enterprise to do with Hadoop and what you would expect them to do with a data warehouse?

This really depends on the way a particular enterprise is set up. Take our client King: they have a very large Hadoop implementation with petabytes of data, they use us to answer analytical queries, and their integration is really straightforward.

For example, if King wants to know about a particular month of data, rather than pull all the data they’ve collected over the past 3 years, they will pull only a copy of the data they need into the database. Once they’ve loaded that data into our solution, they can run some very challenging queries extremely quickly, queries that would have stymied their Hive solution. This is one integration we offer, which I call a “loose integration”, where you pull from Hadoop down into a database.
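The “loose integration” pattern described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory list `raw_events` stands in for data held in Hadoop, SQLite stands in for the analytical database, and the `load_month` helper and the table layout are hypothetical names invented for the example. None of it reflects Exasol’s or King’s actual tooling.

```python
import sqlite3
from datetime import date

# Hypothetical stand-in for several years of records held in Hadoop.
raw_events = [
    {"day": date(2014, 8, 30), "player": "a", "level": 3},
    {"day": date(2014, 9, 2), "player": "a", "level": 4},
    {"day": date(2014, 9, 15), "player": "b", "level": 1},
    {"day": date(2014, 10, 1), "player": "b", "level": 2},
]


def load_month(conn, events, year, month):
    """Copy only the requested month's records into the database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (day TEXT, player TEXT, level INTEGER)"
    )
    rows = [
        (e["day"].isoformat(), e["player"], e["level"])
        for e in events
        if e["day"].year == year and e["day"].month == month
    ]
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    return len(rows)


# SQLite (in memory) plays the role of the analytical database.
conn = sqlite3.connect(":memory:")
loaded = load_month(conn, raw_events, 2014, 9)

# Analytical queries now touch only the slice that was pulled down.
players = conn.execute("SELECT COUNT(DISTINCT player) FROM events").fetchone()[0]
print(loaded, players)
```

The point of the sketch is the division of labour: the large store keeps everything, while the database only ever sees the slice a question needs.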

What else do you provide with Hadoop?

We provide a Hadoop integration product, which uses 0MQ messaging and Google Protocol Buffers to build an infrastructure that allows messages to pass between Hadoop and the database. This means that, as part of your SQL in our database, you can call an external function. This is actually a MapReduce job that will run on the Hadoop cluster, which is a much more intimate integration, available to those who want both systems working closely together.

From our experience, however, what most people want is to have Hadoop and the database world separate so that both can be optimised to do a particular job, rather than compromising both worlds.

Can you give us a brief summary of the talk you will be giving in our Meetup in London on November 13th and why you chose this topic?

The title we’re going with is “a tool for a job” and what we’re going to be talking about is how analytics queries are imperfectly dealt with by traditional multipurpose databases. Equally, we are also going to touch upon the fact that Hadoop is not well equipped to deal with such analytical queries either.

Although Hadoop has a number of projects trying to do SQL at speed, I believe that these approaches lead you to a very impure kind of architecture. When SQL is grafted onto something else, which is what people are trying to do with SQL and Hadoop, you lose the purity of having a product that is developed from the ground up for a particular job.

I’ve also got some nice pictures of me mowing the lawn and, if you come to the event, you will understand how they are relevant!

This will be the backbone and I’m looking forward to seeing how people respond!

This interview was conducted by Furhaad Shah, a journalist at Dataconomy.

(Image source: Exasol)

What is a Data Warehouse?
https://dataconomy.ru/2014/10/14/what-is-a-data-warehouse/
Tue, 14 Oct 2014 15:54:26 +0000

When you mention “data warehouse” to someone not acquainted with the term, the first image that springs to mind is usually something like the one above. In fact, it looks a little more like this:

[Image: Data warehouse overview]

A data warehouse is, at the highest level, a central repository of data. Just as big data has Doug Laney and his 3 V’s, data warehousing has Bill Inmon and his much less catchy but oft-cited definition:

A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision making process.

This might not be all that illuminating at first glance, but if you unpack it, there’s a lot of information to be found here.

  • “Subject-oriented”: Contrary to what many people believe when first hearing the term, a data warehouse does not have to be one monolithic repository of all your data. A data warehouse can be used to store and analyse data on a specific subject.
  • “Integrated”: One of the key features of data warehouses is that they store and leverage data from multiple different sources.
  • “Time-variant”: Data warehouses are the home of historical data. You can pull up data from a specific time period, be that 3 months or 3 years ago. Data warehouses often also contain detailed temporal data, showing changes to a dataset. For example, if you’re using your data warehouse to store key customer metrics, you can see not only a customer’s current address or recent purchase history, but all data previously associated with this customer and how it has changed over time.
  • “Non-volatile”: Once loaded, the data contained within a data warehouse is not subject to change.
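To make “time-variant” and “non-volatile” a little more concrete, here is a minimal Python sketch using the customer-address example from the list above. It assumes an append-only history table in SQLite; the table name `customer_address_history` and the helpers `record_address` and `address_as_of` are hypothetical, chosen for illustration rather than taken from Inmon or any particular product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer_address_history (
        customer_id INTEGER,
        address     TEXT,
        valid_from  TEXT  -- ISO date on which this address took effect
    )
    """
)


def record_address(conn, customer_id, address, valid_from):
    """Non-volatile: new facts are appended; existing rows are never updated."""
    conn.execute(
        "INSERT INTO customer_address_history VALUES (?, ?, ?)",
        (customer_id, address, valid_from),
    )


def address_as_of(conn, customer_id, day):
    """Time-variant: return the address that was in effect on a given day."""
    row = conn.execute(
        "SELECT address FROM customer_address_history "
        "WHERE customer_id = ? AND valid_from <= ? "
        "ORDER BY valid_from DESC LIMIT 1",
        (customer_id, day),
    ).fetchone()
    return row[0] if row else None


record_address(conn, 1, "1 Old Street", "2012-01-01")
record_address(conn, 1, "9 New Road", "2014-06-01")

current = address_as_of(conn, 1, "2014-10-14")
earlier = address_as_of(conn, 1, "2013-03-01")
print(current, "|", earlier)
```

Because a move adds a row instead of overwriting one, the same table answers both “where does this customer live now?” and “where did they live in March 2013?”.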

Another key definition often cited when defining the term “data warehouse” comes courtesy of Ralph Kimball. He states:

A data warehouse is a copy of transaction data specifically structured for query and analysis.

The Data Warehousing Information Center adds two caveats to Kimball’s definition, which I think are worth highlighting:

“1) Sometimes non-transaction data are stored in a data warehouse – though probably 95-99% of the data usually are transaction data.
2) I say “querying and reporting” rather than “query and analysis” because the main output from data warehouse systems are either tabular listings (queries) with minimal formatting or highly formatted “formal” reports. Queries and reports generated from data stored in a data warehouse may or may not be used for analysis.”

They also raise the salient point that what Kimball purposefully doesn’t say is also worth taking into consideration. No mention is made of the form of the stored data, as the form of the data never qualifies or disqualifies a system from being considered a data warehouse. A data warehouse can be a relational database, a multidimensional database, a hierarchical database, an object database; the list goes on. There is no one form, or set of forms, that a data warehouse is expected to take.

Some leaders in the field of data warehousing solutions include:

  • Exasol: Exasol offer a ready-to-run data warehouse appliance, tailored to your data volume and analysis requirements. They offer a free single-node copy of their software, if you’d like to try before you buy.
  • Teradata: Teradata offer a broad spectrum of data warehousing products, from their fully inclusive Teradata Active Enterprise Data Warehouse to the Teradata Database Express, which is free to download straight away. Features within the portfolio include support for JSON documents and a range of language plugins (including Ruby, Perl and Python).
  • Data Virtuality: Data Virtuality’s USP is that it claims to unite “all data, all systems and all sources” on one platform, automatically, in less than one week. It’s quite a claim, but reviewing their extensive list of connectors should convince you of their credibility.

Other important vendors in the data warehousing space include EMC, Greenplum, Splunk, Amazon Web Services and Google Mesa. Stay tuned in the coming weeks for in-depth guides focused on these technologies.

(Image sources: Wikipedia, Sébastien Barré)

BitYota Announces Launch of its Flagship Data Warehouse Service, Promises More Power Packed and Flexible Experience
https://dataconomy.ru/2014/10/10/bityota-announces-launch-of-its-flagship-data-warehouse-service-promises-more-power-packed-and-flexible-experience/
Fri, 10 Oct 2014 08:41:47 +0000

BitYota, a Warehouse-as-a-Service provider for big data analytics, has now made available its flagship Data Warehouse Service (DWS).

Founded in late 2011, the startup claims that the new capabilities bring greater power, versatility and convenience to its multi-structured data analytics platform.

Dev Patel, CEO of BitYota, explained, “Some of the most valuable data available today comes from external sources such as 3rd party analytics APIs. With this new version of our Data Warehouse Service, BitYota offers users the ability to bring data in from numerous external sources, process it using their custom business rules and immediately begin interrogating data in multiple structures, using industry-standard SQL query language, all from within the DWS.”

The new DWS version offers a range of features and upgrades that provide new performance and flexibility:

  • The platform’s data collection framework provides a unified way to funnel data from a wide variety of upstream 3rd party API sources
  • An in-database processing pipeline for ELT (extract-load-transform); the ability to build a custom data pipeline using SQL within the DWS that can be run on a schedule
  • Enhanced resource management
  • Platform-specific improvements to boost analytics performance
  • Availability of multiple new configurations
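The ELT item in the list above (load first, then transform with SQL inside the database) can be illustrated with a short Python sketch. It is an illustration of the general ELT pattern only, not of BitYota’s DWS: SQLite stands in for the warehouse, and the `raw_sessions` and `session_totals` tables and the `run_transform` function are hypothetical names. Scheduling is left out; `run_transform` is simply the step a scheduler would call.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Load: raw records land unchanged in a staging table, untyped and unvalidated.
conn.execute("CREATE TABLE raw_sessions (user_id TEXT, seconds TEXT)")
conn.executemany(
    "INSERT INTO raw_sessions VALUES (?, ?)",
    [("u1", "30"), ("u1", "45"), ("u2", "10"), ("u2", "")],
)

# Transform: business rules are expressed in SQL and run inside the database.
TRANSFORM_SQL = """
    CREATE TABLE session_totals AS
    SELECT user_id, SUM(CAST(seconds AS INTEGER)) AS total_seconds
    FROM raw_sessions
    WHERE seconds <> ''          -- rule: ignore rows with a missing duration
    GROUP BY user_id
"""


def run_transform(conn):
    """Rebuild the derived table from the raw data; safe to re-run."""
    conn.execute("DROP TABLE IF EXISTS session_totals")
    conn.execute(TRANSFORM_SQL)


run_transform(conn)
totals = dict(conn.execute("SELECT user_id, total_seconds FROM session_totals"))
print(totals)
```

Keeping the raw table untouched means the transformation rules can change later and simply be re-run over the same data.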

Jay Zaveri, Chief Product Officer of CloudOn, a cloud storage provider that allows users to create, review and share files from any device, notes:
“BitYota serves as a cost-effective, high performance data warehouse that enables us to analyze raw session data from millions of users in seconds. A traditional analytics system just wouldn’t work given the price and the flexibility we need. We load data into BitYota every hour, store, and explore this raw data. We look deep into user behavior with complete ease, and run ad-hoc queries, for example understanding churn and usage funnels, all using SQL over native JSON.”

This release is now available at no additional cost.

(Image Source: BitYota)

What the Future Holds for the Internet of Things & Smart Cities
https://dataconomy.ru/2014/09/30/what-the-future-holds-for-the-internet-of-things-smart-cities/
Tue, 30 Sep 2014 09:41:11 +0000

In the latest installment of our expert interviews about the future of the Internet of Things, we discuss the promise of smart cities with RF Code’s VP of Worldwide Marketing & Strategic Partnerships, Richard Jenkins. As a leading environmental monitoring and asset management solutions provider, RF Code have been working within the RFID and Internet of Things space for over a decade, and are uniquely placed to offer insights into what the future holds for the Internet of Things and smart cities.


Why do you think the interest in the Internet of Things and smart cities has grown so much in recent years?

The Internet of Things describes the connection of people and things, via the internet, to other people and things. The IoT is all about data, how that data is used to extract valuable information and, subsequently, how that information is used to improve ‘something’.

Public interest has grown through the invention and availability of consumer devices and interconnected media. We socialize instantaneously through mobile devices that track our every activity and now wearable technology is enabling the monitoring and understanding of our health, sport and environmental interactions.

From a commercial perspective, the IoT has been around longer, but not always under its current much-hyped moniker. Supply chains have tracking devices, the retail industry manages inventory using RFID and critical assets in healthcare are located in real-time to ensure they are where they should be, when they are needed.

The commercial value of the IoT is rapidly becoming indispensable; a competitive differentiator that could not only affect success, but also the existence of brands and services if not utilized correctly. Automated communities, product development and accurate management of organizations’ facilities and data centers further increase profitability and market dominance.

What are the opportunities that the IoT and sensor data open up to us? Are there any current uses you find particularly interesting?

The applications are almost limitless but that is where a common issue lies. Too many people discuss an ambitious world vision where data from countless sources is shared and utilized. This vision may eventually occur in the future but currently there are three areas with tangible benefits:

  • The inbound corporate IoT – real-time intelligence and a competitive differentiator that delivers realistic corporate outcomes
  • The internal IoT – intelligent buildings, the smart office, operational control and efficiency
  • Smart urban infrastructure management – transport and traffic efficiency, utilities management, healthcare advancement, public services

Take the intelligent business as an example – with integrated internal and inbound IoT environments, the business knows in real-time what something is, its available capacity, where it is, where it is going and what condition it is in.

This combination of sensor networks, advanced data analytics and business process management ensures a business can be connected from the customer all the way to the boardroom.

What about the challenges?

Maintaining focus is critical. Thinking ‘Big’ is undeniably valuable for certain sectors – financial services, healthcare and other public services – but for the majority, a targeted and strategic implementation of sensor data will deliver positive outcomes and benefit our lives.

Data security and scalability are issues, but the largest challenge is how to store, analyze, predict and utilize the data being generated and collated. The IoT relies entirely on the data center. Businesses and governments have to address data center inefficiency before they can strive forward with the IoT.

How will the rise of the Internet of Things change our data warehousing requirements?

Data warehousing is an interesting term as the IoT will put significant pressure on both the application (storage, Big Data analytics, etc.) of the IoT and the facilities in which these massive data processing environments are housed.

The aforementioned focus will be the only method that ensures the IoT data deluge does not confound organizations. Understanding what is relevant and valuable, and then how it can be used to better the provision of products or services will be critical. The data center allows an organization to collate, analyze, predict and act on data which, ultimately, will keep the hub of any company running.

How intelligent do you think intelligent cities are going to get? What possible future applications do you foresee?

We already see cities looking at far more intelligent uses of sensor networks. Today, the infrastructure that runs cities (transportation, utilities, traffic flow, financial services) is using a rudimentary level of sensors. As population density increases, the public’s expectations rise and technology advances, the instrumentation of cities will demand unprecedented data center infrastructure and data analytics.

Intelligent buildings that manage their own lighting, temperature, power distribution, IT networks and space are increasingly being assessed. The power demands of data centers, and their associated climatic impact, are driving the deployment of sophisticated environmental monitoring technology, with positive results seeing rewards from energy companies and governments. Elsewhere, asset control in hospitals is leading to improved patient care and lower treatment costs.

The IoT is already having an impact on the quality of our lives and will, noticeably, drive new economies and allow a small planet to accommodate a large number of people!


(Image credit: Brian Koprowski)
