Understanding different machine learning techniques

Is reinforcement learning supervised or unsupervised? While this technical question is important, let’s shift our focus to a business lens. Reinforcement learning (RL) holds immense potential for transforming decision-making processes and optimizing strategies across industries.

The sheer volume of data produced by computers, smartphones, and various technologies can be daunting, particularly for those uncertain about its implications. To harness this data effectively, researchers and programmers frequently employ machine learning to enhance user experiences.

New methodologies for data scientists emerge constantly, encompassing supervised, unsupervised, and reinforcement learning techniques. This article aims to succinctly describe each of these approaches, highlight their distinctions, and illustrate how prominent companies apply them.

Is reinforcement learning supervised or unsupervised?

Reinforcement learning carves its own path in the world of machine learning, distinct from both supervised and unsupervised learning. But before we get there, let's look at what supervised and unsupervised learning are.

What is supervised learning?

Supervised learning is a machine learning technique where a model is trained on a labeled dataset. This means the data includes both input examples and their corresponding desired outputs (labels). The goal is for the model to learn the relationship between the inputs and outputs, so it can accurately predict the output for new, unseen data.

Think of it like a student learning with a teacher. The labeled dataset is like practice problems with solutions. The student (the model) studies these examples and the teacher (the algorithm) guides the learning process. The goal is for the student to learn how to solve similar problems independently.

Key concepts:

  • Labeled data: The heart of supervised learning. Each data point has an input (features) and its corresponding correct output (label).
  • Training: The model is fed the labeled data. It analyzes patterns and correlations between inputs and outputs.
  • Learning function: The model develops a mathematical function that maps inputs to outputs as accurately as possible.
  • Prediction: Once trained, the model can take new inputs and predict their corresponding outputs.
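To make the idea concrete, here is a minimal sketch of the supervised workflow in Python with scikit-learn. The synthetic dataset and every parameter choice below are illustrative, not drawn from the article:

```python
# A minimal supervised learning sketch using scikit-learn.
# The synthetic data here is made up purely for illustration.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

# Labeled dataset: X holds the input features, y holds the correct outputs.
X, y = make_classification(n_samples=200, n_features=4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LogisticRegression()
model.fit(X_train, y_train)          # training: learn the input-output mapping
print(model.predict(X_test[:5]))     # prediction on new, unseen inputs
print(model.score(X_test, y_test))   # accuracy against the held-out labels
```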

What is unsupervised learning?

Unsupervised learning is a machine learning technique where the model is trained on an unlabeled dataset. This means the data only includes the inputs, with no corresponding target outputs. The goal is for the model to discover hidden patterns, structures, or relationships within the data itself.

Think of it like a child exploring a new environment without any specific instructions. The child learns by observing patterns, grouping similar objects, and understanding relationships without anyone directly telling them what things are called.

Key concepts:

  • Unlabeled data: Unsupervised learning doesn’t have pre-defined answers to learn from.
  • Pattern discovery: The model analyzes the data to find similarities, differences, and underlying structures.
  • No explicit guidance: No “teacher” corrects the model. It learns through self-discovery.
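A comparable sketch of unsupervised learning, again with scikit-learn: K-Means is asked to group points without ever seeing a label. The synthetic blobs and the cluster count are assumptions made for illustration:

```python
# A minimal unsupervised learning sketch: K-Means clustering with scikit-learn.
# No labels are provided; the model groups the points on its own.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)  # labels discarded

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)       # pattern discovery: each point gets a cluster
print(labels[:10])                   # assignments found without explicit guidance
print(kmeans.cluster_centers_)       # the structure the model discovered
```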

What is reinforcement learning?

Reinforcement learning is a type of machine learning where an agent learns through trial and error by interacting with an environment. The agent tries different actions, receives rewards or penalties based on its actions, and adjusts its strategy to maximize the total reward over time.

Imagine training a dog. You don't explicitly tell the dog how to sit. Instead, you give it rewards (treats) when it performs actions that lead to sitting. Over time, the dog learns to associate sitting with rewards.

Key concepts:

  • Agent: The decision-maker, the entity that learns.
  • Environment: The system the agent interacts with.
  • State: The current situation of the agent within its environment.
  • Actions: What the agent can do in its environment.
  • Rewards: Positive or negative feedback signals the agent receives for its actions.
  • Policy: The strategy the agent uses to determine what action to take in a given state.
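The loop below is a toy sketch of tabular Q-learning, one classic RL algorithm, on a made-up five-state corridor; the environment, reward values, and hyperparameters are all invented for illustration:

```python
# Toy reinforcement learning sketch: tabular Q-learning on a 5-state corridor.
# The environment, rewards, and hyperparameters are invented for illustration.
import random

n_states, n_actions = 5, 2             # actions: 0 = left, 1 = right
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.1, 0.9, 0.2  # learning rate, discount, exploration

for episode in range(500):
    state = 0                          # agent starts at the left end
    while state != n_states - 1:       # goal state at the right end
        # policy: mostly exploit current knowledge, sometimes explore
        if random.random() < epsilon:
            action = random.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: Q[state][a])
        next_state = max(0, state - 1) if action == 0 else state + 1
        reward = 1.0 if next_state == n_states - 1 else -0.01
        # trial and error: update the value estimate from the reward signal
        Q[state][action] += alpha * (
            reward + gamma * max(Q[next_state]) - Q[state][action]
        )
        state = next_state

# learned policy: move right (action 1) in every non-goal state
print([max(range(n_actions), key=lambda a: Q[s][a]) for s in range(n_states)])
```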

Which machine learning technique to choose?

There’s no single “best” machine learning technique that universally outperforms all others. The best technique depends entirely on these factors:

  • The problem: What task are you trying to solve?
    • Classification (e.g., email spam filtering)?
    • Regression (e.g., predicting housing prices)?
  • Clustering (e.g., grouping customers)?
    • Anomaly detection (e.g., identifying fraudulent transactions)?
  • Type of data:
    • Is your data labeled or unlabeled?
    • How large is your dataset?
    • Is the data structured (e.g., numbers, categories) or unstructured (e.g., images, text)?
  • Desired performance:
    • Do you prioritize speed or high accuracy?
    • How important is it for the model to be easily interpretable (understanding how it makes decisions)?

Choose supervised learning if you have a dataset with labeled examples (input data and their corresponding correct outputs). Popular techniques include Linear Regression (for predicting continuous values), Logistic Regression (for classification), Decision Trees (for creating rule-based models), SVMs (for finding boundaries between data classes), and Neural Networks (for complex pattern recognition).

Unsupervised learning is perfect for exploring your dataset, uncovering hidden patterns, or grouping similar data points when you don’t have a predefined outcome in mind. Popular techniques include K-Means Clustering (grouping data by similarity), Principal Component Analysis (PCA) (reducing data complexity), and Autoencoders (for finding compact representations of data).

Reinforcement learning is particularly useful for problems focused on decision-making with long-term rewards, like in games or robotics. In reinforcement learning, an agent interacts with an environment, gets feedback in the form of rewards or penalties, and learns the optimal strategy to maximize rewards over time.


Image credits: Kerem Gülen/Midjourney

Study: Dealing with increasing power needs of ML

Recent research from MIT Lincoln Laboratory and Northeastern University has investigated the savings that can be made by power capping GPUs used in model training and inference, as well as several other methods of reducing AI energy use, in light of growing concern over huge machine learning models' energy demands.

Power capping can significantly reduce energy usage when training ML

The study's main focus is power capping (limiting the power available to the GPU training the model). The researchers find that power capping yields significant energy savings, especially for Masked Language Modeling (MLM) and frameworks like BERT and its descendants. Language modeling is a rapidly growing area. Did you know that Pathways Language Model can explain a joke?

Similar savings in training time and energy are available for larger-scale models, which have grabbed attention in recent years owing to hyperscale data and new models with billions or trillions of parameters.

For bigger deployments, the researchers found that lowering the power limit to 150W produced an average 13.7% reduction in energy usage and a modest 6.8% increase in training time compared to the standard 250W maximum. If you want to dig into more detail, find out how to manage the machine learning lifecycle by reading our article.


The researchers further contend that, despite the headlines about the cost of model training in recent years, the energy requirements of utilizing those trained models are significantly greater.

“For language modeling with BERT, energy gains through power capping are noticeably greater when performing inference than training. If this is consistent for other AI applications, this could have significant ramifications in energy consumption for large-scale or cloud computing platforms serving inference applications for research and industry.”

Finally, the study suggests that extensive machine learning training should be scheduled for the colder months of the year and for nighttime, to save money on cooling.


“Evidently, heavy NLP workloads are typically much less efficient in the summer than those executed during winter. Given the large seasonal variation, if there are computationally expensive experiments that can be timed to cooler months, this timing can significantly reduce the carbon footprint,” the authors stated.

The study also recognizes the potential for energy savings in optimizing model architecture and processes. However, it leaves further development to other efforts.

Finally, the authors recommend that new scientific papers from the machine learning industry end with a statement detailing the energy usage of the study and the potential energy consequences of adopting the technologies documented in it.

The study, titled “Great Power, Great Responsibility: Recommendations for Reducing Energy for Training Language Models,” was conducted by six researchers: Joseph McDonald, Baolin Li, Nathan Frey, Devesh Tiwari, Vijay Gadepally, and Siddharth Samsi of MIT Lincoln Laboratory and Northeastern University.

How to create power-efficient ML?

To achieve ever-higher accuracy, machine learning algorithms require increasingly large amounts of data and computing power, and the current ML culture effectively equates energy usage with improved performance.

According to a 2022 MIT collaboration, achieving a tenfold improvement in model performance would require a 10,000-fold increase in computational requirements, with a corresponding increase in energy.

As a result, interest in more power-efficient ML training has grown in recent years. According to the researchers, the new paper is the first to focus on the influence of power constraints on machine learning training and inference, with particular attention paid to NLP approaches.

“[This] method does not affect the predictions of trained models or consequently their performance accuracy on tasks. That is, if two networks with the same structure, initial values, and batched data are trained for the same number of batches under different power caps, their resulting parameters will be identical, and only the energy required to produce them may differ,” explained the authors.


To evaluate the impact of power capping on training and inference, the researchers utilized Nvidia-smi (System Management Interface) and a HuggingFace MLM library.
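For readers who want to experiment, here is a hedged sketch of how such a power cap could be set programmatically through NVML, the management library that nvidia-smi itself uses (via the nvidia-ml-py bindings). The paper does not publish its tooling, so this is only one plausible way to reproduce the setup; changing limits normally requires root privileges, and the permitted range depends on the GPU:

```python
# Sketch: reading and setting a GPU power cap through NVML, the library behind
# nvidia-smi. Roughly equivalent to `nvidia-smi -i 0 -pl 150` on the command
# line. Setting a limit normally requires root; 150 W mirrors the study.
import pynvml  # pip install nvidia-ml-py

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# NVML reports power values in milliwatts.
current_mw = pynvml.nvmlDeviceGetPowerManagementLimit(handle)
lo_mw, hi_mw = pynvml.nvmlDeviceGetPowerManagementLimitConstraints(handle)
print(f"current cap: {current_mw // 1000} W "
      f"(allowed {lo_mw // 1000}-{hi_mw // 1000} W)")

pynvml.nvmlDeviceSetPowerManagementLimit(handle, 150_000)  # cap at 150 W
pynvml.nvmlShutdown()
```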

The researchers trained BERT, DistilBERT, and Big Bird using MLM and tracked their energy usage throughout training and deployment.

For the experiment, DeepAI's WikiText-103 dataset was used for four epochs of training in batches of eight on 16 V100 GPUs, under four different power caps: 100W, 150W, 200W, and 250W (the default, or baseline, for an NVIDIA V100 GPU). To guard against bias during training, parameters were trained from scratch with random initialization values.

As demonstrated in the first graph, considerable energy savings can be achieved, with non-linear but favorable changes in training time.

“Our experiments indicate that implementing power caps can significantly reduce energy usage at the cost of training time,” said the authors.

The authors then used the same method to tackle a more challenging problem: training BERT on distributed configurations of numerous GPUs, a more typical case for well-funded and well-publicized FAANG NLP models.

The paper states:

“Averaging across each configuration choice, a 150W bound on power utilization led to an average 13.7% decrease in energy usage and 6.8% increase in training time compared to the default maximum. [The] 100W setting has significantly longer training times (31.4% longer on average).”

The authors explained that a 200W limit corresponds to almost the same training time as a 250W limit but offers more modest energy savings than a 150W limit.

The researchers determined that these findings support the notion of power-capping GPU architectures and applications that run on them at 150W. They also noted that energy savings apply to various hardware platforms, so they repeated the tests to see how things fared for NVIDIA K80, T4, and A100 GPUs.

Inference requires a lot of power

Despite the headlines, prior research suggests it is inference (i.e., using a completed model, such as an NLP model) rather than training that draws the most power. This implies that as popular models are commercialized and enter the mainstream, power usage may become more problematic than it is at this early phase of NLP development.

The researchers quantified the influence of inference on power usage, finding that restricting power use has a significant impact on inference latency:

“Compared to 250W, a 100W setting required double the inference time (a 114% increase) and consumed 11.0% less energy, 150W required 22.7% more time and saved 24.2% of the energy, and 200W required 8.2% more time with 12.0% less energy,” explained the authors.
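A quick back-of-the-envelope check helps interpret those figures: energy is roughly average power times runtime, so a lower cap saves energy only if the runtime penalty does not outweigh the power reduction. The sketch below plugs in the paper's reported slowdowns and naively assumes the GPU draws the full cap for the whole run, so its outputs are indicative rather than exact:

```python
# Back-of-the-envelope view of the inference tradeoff: energy ~ power x time.
# Time multipliers are the paper's reported slowdowns relative to 250 W;
# actual draw under a cap varies, so treat the output as illustrative only.
baseline_w = 250
settings = {100: 2.14, 150: 1.227, 200: 1.082}  # cap (W) -> relative runtime

for cap_w, time_mult in settings.items():
    # naive upper bound: assume the GPU runs at the full cap the whole time
    relative_energy = (cap_w * time_mult) / baseline_w
    print(f"{cap_w} W cap: ~{1 - relative_energy:.0%} energy saved "
          f"at {time_mult - 1:.0%} extra time")
```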


The importance of PUE

The paper's authors propose that training be scheduled for when Power Usage Effectiveness (PUE) is at its best, roughly at night and in the winter, when the data center is most efficient.

“Significant energy savings can be obtained if workloads can be scheduled at times when a lower PUE is expected. For example, moving a short-running job from daytime to nighttime may provide a roughly 10% reduction, and moving a longer, expensive job (e.g., a language model taking weeks to complete) from summer to winter may see a 33% reduction. While it is difficult to predict the savings that an individual researcher may achieve, the information presented here highlights the importance of environmental factors affecting the overall energy consumed by their workloads,” stated the authors.
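Since PUE is the ratio of total facility energy to IT equipment energy, the arithmetic behind these savings is straightforward. In the sketch below, the seasonal PUE values are assumptions chosen to reproduce the quoted 33% figure, not numbers from the paper:

```python
# PUE (Power Usage Effectiveness) = total facility energy / IT equipment
# energy, so the same job costs less grid energy when PUE is lower. The PUE
# values below are assumed for illustration; real values vary by facility.
def facility_energy(it_energy_kwh: float, pue: float) -> float:
    """Total energy drawn from the grid for a job consuming it_energy_kwh."""
    return it_energy_kwh * pue

job_kwh = 1000.0                      # hypothetical IT-side energy of a job
summer_day, winter_night = 1.5, 1.0   # assumed seasonal/diurnal PUE values

saving = 1 - (facility_energy(job_kwh, winter_night)
              / facility_energy(job_kwh, summer_day))
print(f"moving the job saves ~{saving:.0%} of total energy")  # ~33% here
```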

Finally, the paper suggests that because local processing resources are unlikely to have implemented the same efficiency measures as big data centers and high-level cloud computing players, transferring workloads to regions with deep energy investments may provide environmental benefits.

“While there is convenience in having private computing resources that are accessible, this convenience comes at a cost. Generally speaking, energy savings and impact are more easily obtained at larger scales. Datacenters and cloud computing providers make significant investments in the efficiency of their facilities,” added the authors.

This is not the only attempt to create power-efficient machine learning and artificial intelligence models. The latest research shows that nanomagnets may pave the way for low-energy AI.

Data Science Career Building: Our Top 5 Articles

What does it take to get into tech’s hottest career path?

Over the past year we've posted many articles related to getting into Data Science and related fields. Experts from across the industry have talked about the core technical skills and where they can be acquired. To celebrate the launch of our Dataconomy candidate database, we wanted to provide a quick round-up of the top 5:

1) The Data Science Skills Network

A look at the top skills listed by Data Scientists on LinkedIn (from a Data Scientist at LinkedIn!) and the part they play in the world of Data Science.

2) The Top 10 Data Science Skills, and How to Learn Them

A follow-up to Ferris' Data Science Skills Network article, looking at where you can go to learn the top 10 skills.

3) Why You Should Learn R First for Data Science

Joshua Ebner looks at the choices of programming language for aspiring Data Scientists, and why R is his choice for the first to learn.

4) The Importance of Soft Skills in Data Science

Peadar Coyle looks at the soft skills that are crucial to accelerating your career as a Data Scientist and adding value to your employer.

5) 10 Online Big Data Courses

Eileen McNulty looks at some of the best ways to sharpen up your Data Science skill-set from the comfort of your own home.

Anything else we’re missing? Any thoughts on the above? Leave a comment or drop us a line through the contact form on our website!

If you’re looking for new opportunities in the field of Data Science, be sure to add your details to our candidate database and we’ll match you up with suitable vacancies!

(image credit: Texas A&M University)

Hortonworks' Comprehensive Certification Program for Enterprise Hadoop Expands Domain with Latest Additions

Enterprise Apache Hadoop providers Hortonworks revealed last month the expansion of the Hortonworks Certified Technology Program to include certification for ‘key capabilities of operations, security and governance focused tools and applications supporting the growth of enterprise Hadoop and ecosystem integration.’

This comes as a follow-up to the introduction of the certification program a few months back, which saw more than 70 technologies become HDP™ YARN Ready.

“By offering one of the most comprehensive certification programs in the market, our technology partners are better enabled to help drive the modernization of enterprise data architectures,” explains Tim Hall, vice president of product management at Hortonworks. “Our program gives vendors extremely valuable guidance and integration tools for the ecosystem and allows customers to be assured of an ecosystem that integrates well.”

“Our partnership with Hortonworks enhances the customer value of the HP Haven platform,” noted Chris Selland, Vice President, HP Software Big Data Business Group, at HP. “HP is proud to be a key member of the Hortonworks Certification program, as our organizations work together to provide comprehensive capabilities for our joint customers, and drive successful enterprise Big Data projects with speed and confidence.”

A press release announcing the expansion further added that ISVs can now more precisely integrate Hortonworks’ 100-percent open source Apache Hadoop platform with their applications.

Salient aspects of the new certification program components include:

  • HDP Operations Ready: Delivers assurance to manage and run applications on HDP from an operational perspective. Specifically, integrates with Apache Ambari, using Ambari as a client to an enterprise management system, integrating Ambari-managed Hadoop components via Ambari Stacks, or providing tailored user tools with Ambari Views.
  • HDP Security Ready: Delivers tested and validated integration with security-related components of the platform. Beyond the ability to work in a Kerberos-enabled cluster, it is also designed to work with the Apache Knox gateway and Apache Ranger for comprehensive security integration and centralized management.
  • HDP Governance Ready: Provides assurance that data is integrated into the platform via automated and managed data pipelines as described and facilitated by the Apache Falcon data workflow engine.


(Image credit: Hortonworks)

Booz Allen Hamilton Launch Data Science Training Programme To Address Data Scientist Shortage

In a move that directly addresses the need for a greater number of data scientists, strategy & consulting giants Booz Allen Hamilton launched the Explore Data Science online training program last week.

BAH is calling it “a self-paced, hands-on course geared toward all levels of data science proficiency – from introductory to professional,” that introduces common data science theory and techniques to help programmers, mathematicians, and other technical professionals expand their data science expertise.

“In today’s highly competitive marketplace, being able to effectively compile, manage and analyze data is critically important to a business’ growth, strategic development and overall success,” explains Peter Guerra, a Principal in the firm’s Strategic Innovation Group.

“We’re therefore seeing a clear demand – across all industries – for data science expertise, but it far exceeds the available supply of trained professionals. By introducing training like Explore Data Science, we’re providing companies and interested individuals with an invaluable resource while empowering an increasingly data-driven workforce,” he added.

The application interface is laced with “galactic-themed gamification” elements that run through data science principles as students face increasingly advanced, scenario-based challenges across 32 missions.

Students are expected to “transform raw information into business-critical insights”; on successful completion, participants earn points, awards and badges that are shareable on social media, enabling users to “develop or increase competency in the core tenets of data science.”

“For example, analyzing a densely forested planet’s wildlife to identify the most dangerous plants and animals introduces players to data organization techniques as well as algorithms for pattern learning – integral tools for improving business outcomes,” states the press release.

Last year, the firm released the Field Guide to Data Science, whose principles are complemented in the Explore Data Science training program, “enabling future, or current, data scientists to practice, test and grow their capabilities in a simulation-based environment.”

Read more here.

(Image credit: E2 Conference)

Data Science workshop in London, from Pivigo Recruitment and KPMG

In order to address the lack of skilled data scientists in Europe, professional services giant KPMG and London-based training and recruitment agency Pivigo Recruitment have joined efforts to roll out their data science boot camp, aptly titled Science to Data Science (S2DS).

The program is led by Pivigo’s Managing Director, Ms. Kim Nilsson, herself a data scientist and an astronomer. She believes that many companies in Europe are yet to understand the significance of data science.

“They know it is coming, they want to get started but they are a little afraid to invest,” Ms. Nilsson said. A general trend among investing companies is to look for “data unicorns” who can both code and analyze data. “These people don’t exist and if they do exist, they are extremely expensive,” she said. She advised that companies should instead direct efforts to building teams with complementary skills.

85 analytical PhD students between the ages of 27 and 41, representing over 24 different countries and fields as varied as astronomy, physics, economics and neuroscience, will participate in this five-week course. They will be mentored through 23 actual commercial problems from Pivigo's clients, such as Royal Mail, KPMG and Digital Shadows, to name a few.

A report published by the Royal Statistical Society and Nesta, an independent innovation foundation, highlights the severe lack of talent with the right data skills to meet the industry's seemingly inevitable demand for data scientists. Such an exercise could prove essential in building a strong, industry-directed talent force in the near future.

Read more here.

Interested in more content like this? Sign up to our newsletter, and you won't miss a thing!


(Image Credit: Waag Society)

https://dataconomy.ru/2014/08/04/data-science-workshop-london-pivigo-recruitment-kgmd/feed/ 0