academia – Dataconomy https://dataconomy.ru Bridging the gap between technology and business

The Blurring Lines Between AI Academia and Industry https://dataconomy.ru/2024/10/22/blurring-lines-ai-academia-industry/ Tue, 22 Oct 2024 13:51:51 +0000

The world of AI research is in constant flux, with breakthroughs emerging at a dizzying pace. But where are these advancements happening? While universities have traditionally been the hotbed of scientific discovery, a significant shift is underway. Increasingly, big tech companies play a pivotal role in AI research, blurring the lines between academia and industry. 

In 2019, 65% of graduating North American PhDs in AI opted for industry roles, a significant jump from 44.4% in 2010. This trend highlights the growing influence of industry labs in shaping the future of AI.

To understand this evolving landscape, I spoke with Shakarim Soltanayev, a Research Scientist at Sony Interactive Entertainment and a former Research Engineer at Huawei. His insights shed light on the motivations, benefits, and challenges of conducting AI research within a large company and how this interplay with academia drives innovation.

Why Companies Embrace Academic Publishing

Tech giants like Google, Meta, Microsoft, and NVIDIA publish research at academic conferences for various reasons.

“First and foremost, publishing research at conferences can be a powerful marketing tool for companies,” Soltanayev said. “These publications serve as a form of indirect marketing, demonstrating the company’s technical prowess and commitment to advancing the field. This boosts their brand image within the research community and in the eyes of customers, partners, and investors. These publications help companies stand out from competitors and strengthen their overall market presence.”

The role that publishing plays in talent acquisition is vital.

“Top-tier conferences such as NeurIPS and CVPR are a prime venue for networking with leading researchers and engineers and recruiting promising students,” Soltanayev said. “By showcasing their work, research laboratories such as Google DeepMind and Meta AI can attract the brightest minds in the field, as top talent often wants to work on groundbreaking problems with access to high-quality resources and collaborators.”

A Two-Way Street: The Exchange of Value

The relationship between academia and industry is not one-sided; it’s a dynamic exchange of knowledge and resources that benefits both sides.

“A great example of academic research directly influencing industry is the development of the convolutional neural network (CNN) architecture,” Soltanayev said. “It was pioneered by Yann LeCun and his colleagues in the academic space, and it has had a major impact on tech products, particularly in computer vision. When AlexNet, a CNN-based model, won the ImageNet competition in 2012, it sparked widespread adoption in the industry. Nowadays, CNNs have a wide range of applications, including image recognition for facial identification and object detection, medical imaging for disease diagnosis, and autonomous vehicles for real-time object recognition.”

On the other hand, the industry has significantly contributed to academic research in several ways.

“One of the most notable contributions is the development of large-scale datasets and powerful computing frameworks,” Soltanayev said. “For example, companies have released massive datasets, such as those for image recognition, language models, and self-driving car simulations, that have become critical for academic research. These datasets provide the necessary scale for training advanced machine learning models, which would be difficult for most academic labs to collect independently. Industry also drives innovation in hardware and software, with the development of GPUs by NVIDIA and deep learning frameworks like TensorFlow by Google and PyTorch by Meta, now standard tools in academic and industrial research.”

Different Priorities, Different Cultures

As AI advances, academia and industry are taking different paths to prioritize and approach these developments.

“The main difference between academia and industry research is the focus,” Soltanayev said. “In academia, the priority is often on long-term, fundamental questions that push the boundaries of knowledge. Researchers have the freedom to explore ideas without the pressure of immediate application. In industry, research focuses more on solving real-world problems and creating products, so the timeline is usually shorter, and there’s more pressure to deliver practical results.”

The variations between the two environments significantly influence the cultural dynamics.

“Academia encourages deep exploration, independent thinking, and publishing findings to advance knowledge,” Soltanayev said. “Industry research, on the other hand, is more collaborative, with teams working together to quickly turn ideas into products or solutions. While academic research often provides the theoretical groundwork, industry research pushes innovation by applying these ideas in real-world situations.”

The Allure of Industry Labs

So, why are more researchers pursuing careers in industry labs rather than traditional academic institutions, and what are the advantages and disadvantages of each path?

“Many researchers are choosing to work at big companies due to the attractive compensation packages,” Soltanayev said. “Salaries in industry labs are typically much higher than those in academia, and they often come with additional benefits such as health insurance, retirement plans, and bonuses. In particular, stock options or equity can be a major draw, especially in tech companies where shares have the potential to grow significantly in value. These financial incentives can offer long-term security that’s harder to achieve in academia, where researchers may face grant-based funding cycles and lower salaries, especially in the early stages of their careers. The stability and benefits that big companies provide, combined with the opportunity to work on high-impact, well-funded projects, make industry labs an appealing choice for many.”

Industry research is often focused on achieving specific business goals and developing new products, which can limit researchers’ freedom to explore topics purely for the sake of knowledge.

“In contrast, academia offers the ability to pursue long-term, curiosity-driven projects, which can be deeply rewarding for those passionate about fundamental research,” Soltanayev said. “Academia also encourages the development of independent research programs and the ability to mentor and teach the next generation of scientists, which many researchers find fulfilling. That said, the ‘publish or perish’ culture in academia can create pressure to produce papers frequently, which may sometimes limit the freedom to take big risks or explore novel ideas. Securing funding and tenure positions can also be highly competitive, adding to the stress of an academic career.”

The industry provides superior financial incentives, job security, and access to resources for tackling significant real-world challenges. On the other hand, academia offers greater intellectual autonomy and opportunities for self-directed research. Both paths have their own advantages, and the decision depends on the researcher’s personal motivations—whether they prioritize immediate impact and compensation or a deeper exploration of fundamental ideas.

The Future of Collaboration

Soltanayev envisions an even more intertwined future for academia and industry.

“I see the relationship between academia and industry in AI becoming even more collaborative,” Soltanayev said. “In the future, I expect to see more partnerships between universities and companies, where academic research provides the groundwork for industry to build upon, while companies provide the data, computing power, and funding necessary to drive large-scale experiments and applications. Companies will continue to play a major role in shaping AI’s future, particularly in applied research and development. With their vast amounts of data and access to powerful computing resources, they’re uniquely positioned to accelerate progress in machine learning, natural language processing, and computer vision.”

Organizations will maintain their influence on AI research by contributing to open-source projects, sharing data, and creating new tools and frameworks. This cooperative environment will play a critical role in expediting advancements in AI and ensuring its responsible progress. With the boundaries between academia and industry becoming increasingly indistinct, we can anticipate even more remarkable progress in AI, driven by the collaborative relationship between these two influential entities.

Behind the Scenes at the Berlin Big Data Center https://dataconomy.ru/2015/03/16/behind-the-scenes-at-the-berlin-big-data-center-bbdc/ Mon, 16 Mar 2015 13:55:14 +0000

When we think of “governments” and “data”, the associations that spring to mind aren’t often positive. But we’ve seen government-backed and government-funded initiatives around the globe that aim to harness data science for positive growth and change. From fighting fires in Israel to amplifying the voice of the electorate in India, governments working with big data isn’t all bad news. Here in Berlin, Dataconomy’s home, the German ministry of research and economics is funding the Berlin Big Data Center, an institution committed to advancing technology and innovation here and abroad. We recently spoke to Dr. Stefan Edlich, one of the center’s Principal Investigators, about the institution and what we can expect from it in the future.


 

Could you give us a brief introduction to yourself and the Berlin Big Data Center?

In recent years the German ministry of research and economics has put big data on its agenda, which resulted in several funding awards in 2014. Two centres won the race for funds: one in Leipzig / Dresden called ScaDS and the other in Berlin. With three universities, many research institutes and a vibrant start-up community, Berlin is a perfect environment for cutting-edge research. For this reason we have many partners in bbdc.berlin working together in research: TU-Berlin, Zuse Institut Berlin, Fritz-Haber Institut, Max-Planck-Gesellschaft, DFKI and Beuth Hochschule.

Talk us through the conception of BBDC.

The bbdc.berlin has technical and non-technical tasks to solve. Let’s start with the non-technical aspects. The first is education. Fortunately, Germany and the government have come to the conclusion that Data Science on Big Data is an important field and one of the key success factors for the economy. That’s why TU-Berlin now offers a master’s degree in this area, together with international partners. But of course the vision is much bigger. We need better funding, more master’s programs, deeper integration with companies and much more. This goes hand in hand with the support of young researchers. Furthermore, our research has to deliver on innovation and sustainability. Finally, Germany needs successful and visible application examples as lighthouse projects to motivate the industry to create similar projects.

What are the goals for the BBDC?

As researchers, some of our main goals are technical, and here we are focused on a fascinating field: we are trying to bring together scalable data processing with scalable machine learning. This is the heart of the project, as scalable machine learning is still in its early days. Many people in industry are doing this with Hadoop or with languages such as R for statistical analysis, but this is often neither scalable nor real-time. For this reason we need new systems, new libraries and comprehensive experience across many application areas. We have many research areas around this topic, such as declarative programming models, debugging of such systems, adaptive big data processing, system integration and much more.
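
To make the contrast with batch-style analysis concrete, here is a minimal Python sketch of online learning, where a model is updated one record at a time as data streams in. It is only an illustration under stated assumptions: the `record_stream` generator, the `OnlineLinearModel` class and the synthetic data are hypothetical, and this is neither the BBDC's system nor any framework's API.

```python
def record_stream(n):
    """Yield n synthetic (x, y) records following y = 2x + 1 (made-up data)."""
    for i in range(n):
        x = (i % 10) / 10.0
        yield x, 2.0 * x + 1.0


class OnlineLinearModel:
    """A one-feature linear model trained by stochastic gradient descent."""

    def __init__(self, learning_rate=0.1):
        self.w = 0.0  # slope
        self.b = 0.0  # intercept
        self.lr = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def update(self, x, y):
        # One gradient step on the squared error of a single record.
        error = self.predict(x) - y
        self.w -= self.lr * error * x
        self.b -= self.lr * error


model = OnlineLinearModel()
for x, y in record_stream(5000):
    # Each record is seen once and then discarded, so memory use stays constant.
    model.update(x, y)

print(model.w, model.b)
```

Because the model never needs the full dataset in memory, the same update loop can in principle be split across partitions of a stream, which is the kind of property a scalable, real-time system would build on.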

The other interesting part is that this core must be surrounded by application areas. Here we have strong partners doing research in material sciences, video and text mining, and image analysis in medicine. These are really exciting fields.

Do you have some examples of applications?

Let’s talk about two fields. Imagine what would happen if all videos (not only YouTube) could be analyzed so that you have the complete metadata of the film, meaning the complete transcript and the complete plot. For example, at 5:45 in a film the computer could automatically derive that a man dressed in green gets into a car and tells his neighbour to send best wishes to his wife! And this for the entire film! This would be a huge step towards real-time knowledge of any video or audio stream, with interesting implications.

Another cool area is material sciences. Here you normally have a terabyte of data for just one material, with its features and its interactions with other materials. In earlier days you would have to do many thousands of costly experiments to gain new insights about materials and how they interact in combination. But if you have a next-generation big data processing system with superior performance and strong machine learning capabilities, you are suddenly able to predict material features, which can save you a lot of money and put you ahead in global competition.

What system are you using?

Thanks to generous funding by institutions such as DFG, universities such as TU, HU, HPI and others have started to build a superior big data processing system called Stratosphere. During the lifetime of the bbdc.berlin project (which runs until 2018) we will leverage this to produce the strongest system in this area. Several successful steps have already been made since the start of bbdc.berlin: the project became an Apache Top-Level Project in record time and is now called Apache Flink. Anyone can now download it and use it within ten minutes!

What’s in store for the BBDC in 2015?

There is too much to enumerate! Some highlights: we will improve our education activities and connect with industry and with many more research projects up to the EU level. And I am sure the first practical results will be available this year. Another opportunity for everyone is the series of events we will be launching this year. One of the most visible will be a conference around Apache Flink in mid-October, where we will attract many people from industry and research, and everyone interested in big data processing.

Photo credit: Alexander Steinhof / Foter / CC BY-NC-ND

Using Wikipedia Data to Predict Box Office Success https://dataconomy.ru/2014/04/17/using-wikipedia-activity-data-forecast-movie-success/ Thu, 17 Apr 2014 13:38:51 +0000

My colleagues and I have devised a mathematical model which can be used to predict films that become blockbusters or flops at the box office – up to a month before the movie is released.

Our model is based on an analysis of the activity on Wikipedia pages about American films released in 2009 and 2010. After examining 312 movies, taking into account the number of page views for the movie’s article, the number of human editors contributing to the article, the number of edits made and the diversity of online users, we could come up with good estimates of a movie’s prospective popularity at the box office. The results obtained using this model and the actual figures (published in the Internet Movie Database (IMDb)) showed a high degree of correlation.
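
As a rough illustration of how activity measures like these can feed a prediction, the Python sketch below fits an ordinary least-squares linear model to four numeric features per film. It assumes a plain linear model purely for illustration; the feature values, the coefficients in `TRUE_COEFFS` and the helper names `fit_ols` and `predict` are all hypothetical, and the data is synthetic rather than taken from the study.

```python
def fit_ols(rows, targets):
    """Fit y = b0 + b1*x1 + ... by solving the normal equations (X^T X) b = X^T y."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


def predict(coeffs, row):
    """Apply the fitted coefficients (intercept first) to one feature row."""
    return coeffs[0] + sum(c * x for c, x in zip(coeffs[1:], row))


# Synthetic, already-scaled features per film (all values made up):
# (page views, human editors, edits, user diversity)
rows = [
    (1.0, 0.0, 2.0, 1.0),
    (2.0, 1.0, 0.0, 3.0),
    (3.0, 1.0, 1.0, 0.0),
    (4.0, 3.0, 5.0, 2.0),
    (5.0, 2.0, 3.0, 7.0),
    (6.0, 5.0, 1.0, 4.0),
]
# Hypothetical "true" relationship used only to generate targets for the demo.
TRUE_COEFFS = [1.0, 2.0, 3.0, -0.5, 0.25]
targets = [predict(TRUE_COEFFS, row) for row in rows]

coeffs = fit_ols(rows, targets)
print([round(c, 6) for c in coeffs])
```

On noise-free synthetic data the fit recovers the generating coefficients; with real activity data the residual error is what determines how accurate the forecasts are.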

Actual first weekend box office revenue in the United States against its predicted value based on Wikipedia data 30 days before the release. The green line, indicating the perfect prediction, is drawn for comparison. Each dot represents a movie from the sample and the size of the dot indicates the amount of the error in the prediction. Predictions for more successful movies are more accurate.

Our mathematical algorithm has allowed us to predict box office revenues with an overall accuracy of around 77 per cent. This is higher than the best existing predictive models applied by marketing firms (estimated at around 57 per cent). For six of the 312 films we could predict box office takings with 99 per cent accuracy, meaning the predicted value was within one per cent of the real value. Some 23 movies were predicted with 90 per cent accuracy and 70 movies with an accuracy of 70 per cent and above.
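
The per-film figures above can be read as one minus the relative error of each prediction. The Python sketch below counts how many films clear a given accuracy threshold under that reading; the definition in `relative_accuracy` is an assumption based on the "within one per cent of the real value" wording, the helper names are hypothetical, and the revenue pairs are made up for illustration.

```python
def relative_accuracy(predicted, actual):
    """Accuracy as 1 - |predicted - actual| / actual (an assumed definition)."""
    return 1.0 - abs(predicted - actual) / actual


def count_at_least(pairs, threshold):
    """Count (predicted, actual) pairs whose accuracy meets the threshold."""
    return sum(1 for p, a in pairs if relative_accuracy(p, a) >= threshold)


# Made-up (predicted, actual) revenues in millions of dollars.
films = [
    (99.5, 100.0),  # off by 0.5 per cent
    (46.0, 50.0),   # off by 8 per cent
    (15.0, 20.0),   # off by 25 per cent
    (5.0, 10.0),    # off by 50 per cent
]

for threshold in (0.99, 0.90, 0.70):
    print(threshold, count_at_least(films, threshold))
```

Note that the counts here are cumulative: a film that clears the 99 per cent threshold also clears the lower ones.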

The more successful the film, the more accurately we were able to predict box office takings. This is possibly due to the larger amount of online data generated by films that turn out to be successful. The model correctly forecast the commercial success of Iron Man 2, Alice in Wonderland, Toy Story 3 and Inception, but failed to accurately forecast the financial return on the less successful movies Never Let Me Go and Animal Kingdom.

Box Office Prediction Graph

These results can be of great value to marketing firms, but more importantly for us, we were able to demonstrate how socially generated online data can be used to predict a lot about future human behaviour.

We have demonstrated for the first time that Wikipedia edit statistics provide us with another tool to predict social events. We studied the problem of predicting the financial success of movies and concluded that, in some aspects, forecasting based on Wikipedia outperforms tweets as Wikipedia activity has a longer timescale which enables earlier predictions.

The efficiency of the predictions might be improved by applying more sophisticated statistical methods, such as including the controversy measure of an article.


Taha Yasseri is a Big Data Research Officer at the Oxford Internet Institute. Prior to the Oxford Internet Institute, he spent two years as a Postdoctoral Researcher at the Budapest University of Technology and Economics, working on socio-physical aspects of the community of Wikipedia editors, focusing on conflict and editorial wars, along with Big Data analysis to understand human dynamics, language complexity, and popularity spread.

This research has been published in PLoS ONE and can be accessed at “Mestyán, M., Yasseri, T., and Kertész, J. (2013) Early Prediction of Movie Box Office Success based on Wikipedia Activity Big Data. PLoS ONE 8 (8) e71226.”


(Image Credit: Brett Sayer)
