Criteo's Prediction on Hadoop: How and Why it Came About
https://dataconomy.ru/2015/02/24/criteos-prediction-on-hadoop-how-and-why-it-came-about/
24 February 2015

At Criteo we display online advertisements, and we sell clicks to our clients. So for each of the 2 billion banners we serve every day, we have to predict how likely it is to be clicked. That's why we use machine learning, and we feed the algorithms of our well-oiled engine with big data.

But it has not always been like this. The shape of our prediction engine, and its underlying architecture, had to evolve as our business grew.

At first, when our dataset fit in a SQL table, there was no need to invoke the name of Hadoop. At that time, we implemented regression tree algorithms in C#. Each learning ran as a single process on a single server. And we were happy.

An issue with regression trees is that their size can explode exponentially as you add dimensions. Before our algorithm reached those limits, we improved it. First we used Bayesian networks for a while. Then we implemented generalized linear models. This change improved our performance considerably. And we were proud.
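To make the idea concrete, here is a minimal sketch of what click prediction with a generalized linear model can look like: a logistic regression over hashed categorical features. It is purely illustrative (Python with scikit-learn, invented feature names and data), not our actual C# implementation.

    # Illustrative sketch: logistic regression (a generalized linear model)
    # for click prediction over hashed categorical features.
    # Feature names and data are made up; this is not Criteo's implementation.
    from sklearn.feature_extraction import FeatureHasher
    from sklearn.linear_model import LogisticRegression

    # Each banner display is described by a handful of categorical features.
    displays = [
        {"advertiser": "shoes_r_us", "site": "news.example", "hour": "18"},
        {"advertiser": "shoes_r_us", "site": "blog.example", "hour": "03"},
        {"advertiser": "travel_co", "site": "news.example", "hour": "12"},
        {"advertiser": "travel_co", "site": "blog.example", "hour": "02"},
    ]
    clicked = [1, 0, 1, 0]

    # Hashing maps (feature, value) pairs into a fixed-size sparse vector,
    # so the model size stays bounded however many distinct values show up.
    hasher = FeatureHasher(n_features=2**20, input_type="dict")
    X = hasher.transform(displays)

    model = LogisticRegression()
    model.fit(X, clicked)

    new_display = {"advertiser": "travel_co", "site": "news.example", "hour": "20"}
    p_click = model.predict_proba(hasher.transform([new_display]))[0, 1]
    print(f"predicted click probability: {p_click:.3f}")

One appeal of this family of models is that the hashed feature space stays a fixed size even as new advertisers and sites appear, which avoids the blow-up that trees suffer as dimensions are added.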

But then, as the needs of the business increased, we had to add another server. And another one. And… and we were worried that our architecture would reach its limits in the near future.

Migrating our existing solution onto our Hadoop cluster seemed a healthy way to go. It was far from trivial, but it was definitely where we wanted to be. To achieve this, a lot of questions had to be answered. How do we distribute a single-threaded gradient descent across several mappers and reducers? How do we run an existing C# codebase on a Hadoop Linux cluster? How do we keep the reliability of an architecture developed and tuned over the years when we apply such a big bang?
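One common answer to the first question is sketched below, under the assumption of a logistic-loss model trained by batch gradient descent: each mapper computes a partial gradient over its shard of the training data, a reducer sums the partials and takes one descent step, and a driver reruns the map/reduce pair once per iteration. This is an illustrative Hadoop Streaming-style sketch in Python, not our actual scheme.

    # Illustrative sketch of distributing a batch gradient descent over
    # mappers and reducers (Hadoop Streaming style). Not Criteo's actual code.
    import json
    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def map_partial_gradient(weights, lines):
        """Mapper: accumulate the logistic-loss gradient of one data shard.

        Each input line holds one sample: "label f1 f2 ... fn".
        """
        grad = [0.0] * len(weights)
        for line in lines:
            parts = line.split()
            y = float(parts[0])
            x = [float(v) for v in parts[1:]]
            p = sigmoid(sum(w * v for w, v in zip(weights, x)))
            for j, v in enumerate(x):
                grad[j] += (p - y) * v
        # Emit everything under a single key so all partial gradients
        # meet in one reducer.
        return "grad\t" + json.dumps(grad)

    def reduce_step(weights, mapper_outputs, learning_rate=0.1):
        """Reducer: sum the partial gradients and take one descent step."""
        total = [0.0] * len(weights)
        for line in mapper_outputs:
            partial = json.loads(line.split("\t", 1)[1])
            total = [t + p for t, p in zip(total, partial)]
        return [w - learning_rate * g for w, g in zip(weights, total)]

    # A driver would broadcast the current weights to every mapper (for
    # example through the distributed cache), run one map/reduce pass per
    # iteration, and stop once the weights converge.

Summing partial gradients is only one option; parameter averaging, or running several local passes per iteration, are common variants when one full pass per update is too slow.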

Our Prediction and Scalability teams worked hand in hand to answer those concerns. Our data scientists showed us how we could distribute our learnings. Our technology watch surfaced tools that would fit our stack. Reliability was handled as always, thanks to our engineering culture.

This was one of our major projects last year. It took time and effort. But the outcome met expectations: we have been able to increase the size of our training set while nearly doubling the number of trained algorithms. However, I won't have time to talk much more about it: there are still a lot of improvements and new technologies I want to test!

If you are curious about what we do and want to join us, have a look at our tech blog and drop us a line at r&drecruitment@criteo.com!

-By Guillaume Turri, Software Developer, R&D, Criteo


(Image credit: Diana Robinson, via Flickr)

Hadoop's Unprecedented Growth Trips Switch for Data Migration Tools to Ease Enterprises' Adoption of the Platform
https://dataconomy.ru/2014/10/29/hadoops-unprecedented-growth-trips-switch-for-data-migration-tools-to-ease-enterprises-adoption-of-the-platform/
29 October 2014

In what is essentially a domino effect, the growth of Hadoop, as more enterprises adopt the platform, has triggered the need for tools to help with data migration in and out of Hadoop.

“In theory, getting data into and out of Hadoop is well within the capacity of both the software and its users,” notes Serdar Yegulalp, senior writer at InfoWorld. However, “not everyone is comfortable doing the work themselves, so vendors are offering polished import/export solutions that require less manual labor.” Apache’s Sqoop deals with Hadoop import and export, providing native support for MySQL, Oracle, PostgreSQL, and HSQLDB.
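As a rough idea of what that looks like in practice, a plain Sqoop import pulls a relational table into HDFS from the command line; the host name, credentials, table, and paths below are illustrative, not taken from the article:

    sqoop import \
      --connect jdbc:mysql://dbhost/sales \
      --username etl_user -P \
      --table orders \
      --target-dir /data/sales/orders \
      --num-mappers 4

The matching "sqoop export" command pushes results from an HDFS directory back into a relational table, covering the "out of Hadoop" half of the job.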

A plethora of tools are available, providing data migration solutions for other, pre-existing platforms. Attunity, with its Attunity Replicate, handles Oracle, SQL Server, DB2, and Teradata alongside Hadoop. Diyotta DataMover supports Hadoop as either a source or a target, with an equally large roster of data formats and repositories, explains Mr. Yegulalp.

Working in conjunction with Cloudera, Syncsort picks up data directly from existing mainframes and loads it into Hadoop.

The other important aspect is "future-proofing": "being able to work well with the changes coming down the pike for Hadoop."

“With these offerings, the main attraction isn’t the number of supported data sources, but rather the convenience and the expertise-in-a-box approach,” Mr. Yegulalp writes. On the other hand, vendors like Hortonworks offer support and migration services, so there may be “less incentive on their part to make Sqoop into a full-blown replacement for third-party products.”

"You might not want to roll your own Sqoop import connector for a mission-critical job, but the work could pay off in the long run for a future inward migration," Yegulalp writes, "or if an option even bigger and more ambitious than Hadoop comes along."

Read more here.

(Image credit: Flickr)
