Fundamentals of AI
Introduction to Artificial Intelligence
For something to be considered AI, it must be able to learn from data and adapt so that its performance improves. On the simpler end of the spectrum, AI systems are based on predefined rules and logic, such as decision trees, rule-based systems, and basic automation tools. Often referred to as expert systems, early AI primarily used IF-THEN logic to perform inferences about data. An email spam filter is a simple example of AI: the techniques spammers use are always changing, so to classify messages effectively, the filtering software must continually incorporate new data. On the more complex end of the spectrum, AI systems use machine learning, deep learning, and neural networks (computer systems designed to model the human brain) to perform tasks without explicit programming for each scenario. Examples of complex AI include image recognition systems, natural language processing models, and autonomous vehicles.
The artificial intelligence many people are familiar with refers to advanced machine learning models, especially generative deep learning models that create new content. These models, such as OpenAI's ChatGPT, can produce human-like text and perform various language tasks, including answering questions, writing creatively, conceptualizing games, and, at times, generating false information. However, there is more to generative models than just text generation. For example, Adobe's AI-powered Generative Fill in Photoshop allows users to seamlessly add or remove elements from images, creating realistic modifications based on simple text prompts. Similarly, generative models can be used for music composition, generating original pieces of music that mimic different styles and genres. These systems are very different from the AI of the past.
Traditional AI

Also referred to as classical, rule-based, or symbolic AI, traditional AI is an early form of artificial intelligence that works primarily with logic and rules. These systems imitate human thought processes by using formal logic to manipulate symbols and derive conclusions. A symbol can represent concepts like ideas in mathematics or language: it could be words, letters, numbers, or any other form of notation that stands for something else.
Traditional AI encodes knowledge as symbols and applies logical rules to manipulate those symbols, using formal logic to infer conclusions from premises. An example of this process is a medical diagnosis system with a rule that states "IF patient has fever AND cough THEN diagnose as flu." Such systems are highly interpretable because the rules and reasoning processes are clear and easy to comprehend.
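The sketch below shows this kind of rule application in Python. The facts and rules are hypothetical examples invented for illustration, not part of any real diagnostic system.

```python
# A minimal rule-based inference sketch (hypothetical facts and rules).
# Facts are symbols; a rule fires when all of its conditions are present.

facts = {"fever", "cough"}

rules = [
    ({"fever", "cough"}, "flu"),             # IF fever AND cough THEN flu
    ({"sneezing", "itchy_eyes"}, "allergy"),
]

def infer(facts, rules):
    """Return every conclusion whose conditions are all satisfied by the facts."""
    return [conclusion for conditions, conclusion in rules if conditions <= facts]

print(infer(facts, rules))  # ['flu']
```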
Knowledge representation is an important aspect of traditional AI that involves the structuring of information to facilitate reasoning and problem-solving. Semantic networks represent knowledge in graphs where nodes represent concepts and edges represent relationships between concepts. For example, a semantic network for animals might include nodes for "mammal," "bird," and "fish," with edges indicating relationships like "is-a" or "has-a." Semantic networks help AI systems understand and process relationships between different concepts.
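As an illustration, a semantic network can be sketched as a list of (subject, relation, object) edges; the animals and relations below are assumed toy examples rather than a real ontology.

```python
# A toy semantic network: each edge is (subject, relation, object).
edges = [
    ("dog", "is-a", "mammal"),
    ("sparrow", "is-a", "bird"),
    ("mammal", "has-a", "fur"),
    ("bird", "has-a", "wings"),
]

def related(node, relation, edges):
    """Follow edges of a given relation type outward from a node."""
    return [obj for subj, rel, obj in edges if subj == node and rel == relation]

print(related("dog", "is-a", edges))      # ['mammal']
print(related("mammal", "has-a", edges))  # ['fur']
```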

Similarly, frames are data structures for representing stereotyped situations, like objects in object-oriented programming. Frames include slots for attributes and values, providing a structured way to represent knowledge and support reasoning about objects and their properties. For example, a frame for a "house" may include slots for "rooms," "roof type," and "building material."
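A frame can likewise be sketched as a structure of named slots; the slot names and values below are illustrative assumptions that mirror the house example.

```python
# A "house" frame sketched as a dictionary of slots and values (illustrative only).
house_frame = {
    "rooms": 4,
    "roof_type": "gable",
    "building_material": "brick",
}

# Reasoning over a frame amounts to reading and testing its slots.
if house_frame["building_material"] == "brick":
    print("Masonry construction: schedule a mortar inspection.")
```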
Traditional AI Process Flow
Traditional AI expert systems use symbolic AI to emulate the decision-making capabilities of human experts. An AI expert system is made up of two basic components: the knowledge base, which holds domain-specific information and rules, and the inference engine, which uses these rules on input data to reach conclusions or decisions.

The traditional AI process begins with user input, in which users supply data or ask questions. Rules, facts, and domain knowledge are stored within the knowledge base, represented symbolically and logically.
As shown in the diagram, the processing flow demonstrates the real-time interaction between the inference engine and the knowledge base. First, the inference engine retrieves rules and facts from the knowledge base. It consults the knowledge base to interpret the user's input and then performs data parsing, preprocessing, rule matching, and inference-making. The processing flow is an iterative cycle: the inference engine repeatedly consults the knowledge base, refining its intermediate results until it arrives at a satisfactory solution. This cyclical approach breaks complex queries into smaller parts that are easier to solve while ensuring accurate, context-specific outputs.
Finally, the results or decisions made by the inference engine are presented as output. The output is the response or action taken by the system based on the logical analysis of the input data and the rules defined in the knowledge base.
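Below is a hedged sketch of that flow: a small knowledge base of IF-THEN rules plus a forward-chaining inference engine that keeps applying rules until no new conclusions can be derived. The rules and inputs are invented purely for illustration.

```python
# A minimal expert-system sketch: knowledge base + forward-chaining inference engine.
knowledge_base = [
    ({"fever", "cough"}, "flu"),
    ({"flu"}, "recommend_rest"),
    ({"flu", "high_risk_patient"}, "refer_to_doctor"),
]

def inference_engine(user_input, knowledge_base):
    """Iteratively apply rules to the working facts until no new facts appear."""
    facts = set(user_input)
    while True:
        new_facts = {
            conclusion
            for conditions, conclusion in knowledge_base
            if conditions <= facts and conclusion not in facts
        }
        if not new_facts:      # no rule fired: the conclusion set is stable
            return facts
        facts |= new_facts     # derived facts can trigger further rules next pass

# Output based on the logical analysis of the input and the stored rules.
print(inference_engine({"fever", "cough"}, knowledge_base))
```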
Traditional AI Challenges
Scalability is one of the biggest problems with traditional AI systems. Traditional AI must be programmed explicitly for each possible situation an application can face. As more rules and cases are added, managing and maintaining the system becomes harder and harder. Not only does this complexity make management of the AI more difficult, but it also slows down system performance. For instance, a rule-based customer service system may become infeasible as it attempts to cover every conceivable customer inquiry, resulting in slower response times and increased maintenance costs.
A major limitation of traditional AI is its lack of understanding. Traditional AI systems work from predefined rules; they neither comprehend context nor generalize from limited information. This rigidity means they cannot handle new or unexpected events. For example, a conventional AI-based chatbot may be unable to deal with ambiguous or complex queries outside its programmed responses, leading to a poor user experience. This restriction emphasizes the need for more flexible AI systems that can learn and grow from new data.
Traditional AI relies heavily on fixed guidelines, limiting its adaptability to the variability seen in real-world situations. Modern AI techniques such as machine learning (ML) differ from traditional AI by training algorithms to recognize patterns and make predictions instead of relying on explicit rules alone. This data-driven methodology enables ML models to adjust and improve over time, making them more adaptable than traditional AI. Deep learning (DL) is a branch of machine learning that advances AI further by using neural networks to learn from large datasets. DL models can perform complicated tasks like image and speech recognition that are beyond the reach of traditional AI. Though traditional AI is logically transparent and easily traceable in its decision-making, its rigidity and limited scalability are serious disadvantages compared with modern approaches like deep learning.
Modern Applications of Traditional AI
Traditional AI has played a significant role in the development of modern AI. Early artificial intelligence research contributed many ideas, including search algorithms, knowledge representation, and rule-based thinking. These concepts have been improved upon over time, but their essential principles still play an important part in today's AI systems. For example, search algorithms derived from traditional AI are still used in various applications, including pathfinding in robotics (navigating an environment) and data analysis optimization. The foundational work in traditional AI has paved the way for contemporary AI advancements, as the following pairings of traditional techniques and their modern counterparts show.
Traditional AI: Search algorithms to navigate paths. Modern AI: Machine learning for dynamic pathfinding in robotics.
Traditional AI: Symbolic representation to encode knowledge. Modern AI: Knowledge graphs and embeddings for semantic understanding.
Traditional AI: Rule-based systems with IF-THEN logic. Modern AI: Decision trees and random forests for predictive analytics.
Traditional AI: Predefined optimization rules. Modern AI: Machine learning algorithms for data optimization and pattern recognition.
Traditional AI: Expert systems with predefined rules for medical diagnosis. Modern AI: Machine learning models for clinical decision support, predicting outcomes based on patient data.
Traditional AI: Formal logic to draw conclusions. Modern AI: Deep learning in NLP to understand and generate human language.
Traditional AI has significantly impacted the computer and cognitive sciences. The work in traditional AI that investigates how machines can imitate human thinking has resulted in new algorithms, data structures, and methods for solving problems.
One of traditional AI's major contributions is John McCarthy's creation of Lisp, a programming language designed specifically for artificial intelligence research. Lisp's strong symbolic-computation capabilities have shaped many contemporary languages, and it is still used within the field of AI. The insights gained from traditional AI research have driven innovation and enhanced understanding of both human cognition and computational processes.
Traditional AI has also played a role in shaping AI R&D. The limitations of traditional AI, such as its reliance on explicit rules and difficulty in handling complex, unstructured data, have driven the development of new approaches like machine learning and deep learning. These innovative techniques use vast amounts of data combined with advanced algorithms to learn from experience over time, thus overcoming some shortcomings associated with traditional AI. The shift from rule-based systems to data-driven approaches demonstrates the ongoing evolution of AI.
Machine Learning vs Deep Learning
The two main subsets of modern AI, machine learning (ML) and deep learning (DL), take different approaches in how they analyze and learn from the data provided to them. Developers and businesses need to understand the differences between ML and DL. This knowledge allows them to select the most appropriate AI technique for their specific needs. Moreover, this knowledge drives innovation, encouraging the development of hybrid models that combine the strengths of both ML and DL. For instance, a hybrid approach might use ML for initial data preprocessing and feature selection, followed by DL for complex pattern recognition and prediction.
Machine Learning
Consists of algorithms that use statistical techniques to improve at specific tasks as they gain more experience over time.
Simply speaking, ML uses training data to build a model, then lets that model make predictions on new data it was not trained on.
Example of ML: email spam filters, in which algorithms identify and filter out spam based on patterns learned from previously categorized emails.
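A hedged sketch of that spam-filter idea, assuming scikit-learn is installed and using a tiny made-up dataset (a real filter would train on thousands of emails):

```python
# Toy spam filter: learn word patterns from previously categorized emails.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = [
    "win a free prize now", "limited offer claim your reward",   # spam examples
    "meeting moved to 3pm", "lunch tomorrow with the team",      # legitimate examples
]
labels = ["spam", "spam", "ham", "ham"]

vectorizer = CountVectorizer()
features = vectorizer.fit_transform(emails)   # word counts as features

model = MultinomialNB()
model.fit(features, labels)                   # learn patterns from labeled data

new_email = vectorizer.transform(["claim your free prize"])
print(model.predict(new_email))               # expected: ['spam']
```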
Deep Learning
Uses neural networks with many layers (referred to as deep neural networks) to learn from vast amounts of data.
Example of DL: image and sound recognition systems.
Data Requirements
ML often requires less data and can work effectively with structured data. For example, an ML model for predicting house prices might need relatively few data points and can operate with a simpler structure. With less data required, these models can be trained much more quickly and with less computational power; training an ML model for a basic task, such as email filtering, might take minutes to hours. This makes ML suitable for tasks where data is limited or structured. ML models are also generally easier to interpret and explain due to their simpler structure. These factors make ML accessible for a wide range of tasks. However, ML models cannot efficiently handle unstructured data for tasks like image or speech recognition.
In contrast, DL requires a much larger amount of data, and that data is typically unstructured, such as images and audio. Other inputs, like human-generated text from novels or news reports, may be smaller in volume, but they still demand significant processing time because they are unstructured: the model must do a great deal of work to figure out what the text means. In either case, training a DL model for speech recognition, for example, can take days to weeks, demanding significant computational power and resources. Additionally, the decisions of DL models can be difficult to interpret and explain. Their decision-making processes are embedded deep in complex network structures, making it hard to gain insight into the details of any specific decision.
Machine Learning and Deep Learning Techniques and Methodology
Machine learning encompasses many techniques and methodologies, some of which are:
Supervised
Unsupervised
Reinforcement
Deep Learning
Supervised Learning
Supervised learning is a common approach in which models are trained on labeled data, meaning each input has a corresponding output. This technique includes methods such as regression and classification in which the model learns to predict outcomes based on input features.
The figure on "Retirement Preparedness" demonstrates the concept of supervised learning. Underlying this chart is a dataset of labeled data points representing individuals' ages, their corresponding retirement account balances, and whether they are "On Track" or "Not on Track" for retirement. In supervised learning, models are trained on such labeled data, in which each input combination (in this case, age and retirement balance) has an associated output label (retirement preparedness).

Supervised learning lets the algorithm examine the data and discover the patterns that indicate when someone is properly prepared for retirement. It has a set of verified inputs (age, balance) and outputs (prepared or not), and it attempts to map new inputs to an output consistent with the verified outputs in the training set. The training process involves the model analyzing these data points to learn the boundary that separates the "On Track" and "Not on Track" categories. The verified outputs in the training dataset are why this technique is considered supervised.
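A minimal sketch of this idea, assuming scikit-learn is available and using made-up (age, balance) pairs with hypothetical labels:

```python
# Supervised learning: fit a classifier to labeled (age, balance) examples.
from sklearn.tree import DecisionTreeClassifier

# Each row is [age, retirement balance]; the labels are the verified outputs.
X = [[30, 20000], [35, 60000], [45, 40000], [50, 250000], [55, 90000], [60, 400000]]
y = ["not_on_track", "on_track", "not_on_track", "on_track", "not_on_track", "on_track"]

model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)  # learn the boundary separating the two labeled categories

# Map a new, unseen input to one of the learned output labels.
print(model.predict([[40, 150000]]))
```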
Unsupervised Learning
In contrast, unsupervised learning lacks these verified I/O mappings. This strategy involves training models on unlabeled data to discover hidden patterns and structures within it. Some techniques in unsupervised learning include clustering and dimensionality reduction.
An unsupervised learning model can analyze the same data points without the verified preparedness output in the training data. For instance, the model could use clustering techniques to group individuals based on similarities in their age and retirement balance. It might identify and group customers who are above a trendline (represented by a certain ratio between age and retirement account balance) while placing others who are below that trendline in a different group. These clusters can reveal hidden insights into common characteristics among different groups. Because the model lacks the verified outputs of "prepared" or "not prepared" in the training data, however, a researcher or financial analyst would need to interpret the clusters to extract meaningful insights. Unsupervised learning models can therefore help financial advisors discover these clusters and tailor their advice to the different groups.
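A hedged clustering sketch over the same kind of unlabeled (age, balance) data, assuming scikit-learn is installed; the groupings it finds still need human interpretation:

```python
# Unsupervised learning: group unlabeled points purely by similarity (k-means).
from sklearn.cluster import KMeans

X = [[30, 20000], [35, 60000], [45, 40000], [50, 250000], [55, 90000], [60, 400000]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)   # no labels are provided or predicted

print(cluster_ids)                    # which cluster each individual fell into
print(kmeans.cluster_centers_)        # an analyst interprets what each cluster means
```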

Reinforcement Learning
Reinforcement learning is another core technique in ML in which models learn to make decisions by interacting with an environment and receiving rewards or penalties. This approach is widely used in robotics, gaming, and autonomous systems. An example of reinforcement learning is training an AI system to play a video game by rewarding it for achieving high scores, encouraging the system to prioritize strategies that maximize its performance.
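The sketch below shows the core reward-driven loop using tabular Q-learning in a hypothetical five-cell corridor; all states, rewards, and parameters are invented for illustration.

```python
# Tabular Q-learning: the agent acts, receives rewards, and updates value estimates.
import random

n_states, actions = 5, [-1, +1]          # move left or right along a corridor
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.2    # learning rate, discount, exploration rate

for episode in range(200):
    state = 0
    while state != n_states - 1:         # the episode ends at the rewarding cell
        if random.random() < epsilon:
            action = random.choice(actions)                     # explore
        else:
            action = max(actions, key=lambda a: Q[(state, a)])  # exploit
        next_state = min(max(state + action, 0), n_states - 1)
        reward = 1.0 if next_state == n_states - 1 else 0.0     # reward at the goal
        best_next = max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state

# With enough episodes, the learned policy is to move right (+1) toward the reward.
print({s: max(actions, key=lambda a: Q[(s, a)]) for s in range(n_states - 1)})
```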
Deep Learning
Lastly, deep learning builds on these core learning concepts. DL models use neural networks with multiple layers to learn from large datasets. These networks consist of interconnected nodes, or neurons, organized into layers. Each layer processes the input data and passes the output to the next layer, allowing the model to learn complex representations of the data. This deep structure enables the model to automatically extract features from raw data, reducing the need for manual feature engineering.
In an approach blending supervised and unsupervised learning, a DL model first learns from the labeled data in the preceding figure to identify patterns and relationships that distinguish "Good" instances from "Bad" ones. This initial training phase uses techniques like backpropagation and gradient descent to minimize the error between the model's predictions and the actual labels. During this phase, the model goes through multiple epochs, each representing one complete pass through the labeled dataset. These epochs are essential as they allow the model to adjust its weights and improve its accuracy iteratively.
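A hedged PyTorch sketch of that training phase (PyTorch assumed installed; the tiny dataset and labels are invented): a multi-layer network is trained with backpropagation and gradient descent over several epochs.

```python
# Train a small multi-layer network with backpropagation and gradient descent.
import torch
import torch.nn as nn

# Toy labeled data: two input features; 1.0 = "good", 0.0 = "bad" (illustrative only).
X = torch.tensor([[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]])
y = torch.tensor([[1.0], [1.0], [0.0], [0.0]])

# Interconnected layers: each layer transforms its input and passes it on.
model = nn.Sequential(
    nn.Linear(2, 8), nn.ReLU(),
    nn.Linear(8, 8), nn.ReLU(),
    nn.Linear(8, 1), nn.Sigmoid(),
)

loss_fn = nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)

for epoch in range(100):             # each epoch is one full pass over the dataset
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)      # error between predictions and actual labels
    loss.backward()                  # backpropagation computes the gradients
    optimizer.step()                 # gradient descent adjusts the weights

print(model(torch.tensor([[0.15, 0.85]])).item())  # prediction trends toward 1.0
```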

After the initial training, the model is used to predict labels for some new or novel unlabeled data. These predictions are referred to as pseudo-labels because they aren’t verified yet but are instead generated based on the model's current understanding of what separates “good” and “bad.”

Next, the pseudo-labeled data is combined with the original labeled data to create an expanded training set that includes both the original labeled examples and the newly pseudo-labeled examples. The model then undergoes additional epochs of training on this expanded dataset. During this phase, the model refines its predictions by adjusting its parameters to minimize errors across both labeled and pseudo-labeled data. The process can be repeated many times to refine and improve the accuracy of the results.
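A minimal sketch of that pseudo-labeling loop, using a simple scikit-learn classifier as a stand-in for the deep model and invented data (scikit-learn and NumPy assumed installed):

```python
# Pseudo-labeling loop: train on labeled data, label new data, retrain on both.
import numpy as np
from sklearn.linear_model import LogisticRegression

X_labeled = np.array([[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]])
y_labeled = np.array(["good", "good", "bad", "bad"])
X_unlabeled = np.array([[0.15, 0.85], [0.85, 0.15], [0.3, 0.7]])

model = LogisticRegression()
model.fit(X_labeled, y_labeled)                      # initial training on labeled data

for round_number in range(3):                        # repeat to refine the results
    pseudo_labels = model.predict(X_unlabeled)       # unverified "pseudo-labels"
    X_expanded = np.vstack([X_labeled, X_unlabeled]) # expanded training set
    y_expanded = np.concatenate([y_labeled, pseudo_labels])
    model.fit(X_expanded, y_expanded)                # retrain on labeled + pseudo-labeled

print(model.predict(X_unlabeled))
```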
Machine Learning and Deep Learning Applications and Use Cases
Machine learning and deep learning have distinct use cases based on their strengths and weaknesses. With their simpler structure, ML models are easier to maintain and scale in environments where the data and requirements are relatively static. DL models, with their ability to learn from vast amounts of data, are more adaptable to dynamic environments where data is constantly changing.
Machine Learning use cases: Predictive Analytics, Recommendation Systems, Anomaly Detection, Credit Scoring, Predictive Maintenance, Customer Segmentation.
Deep Learning use cases: Image Recognition, Speech Recognition, Natural Language Processing, Autonomous Driving, Medical Image Analysis, Algorithmic Trading.
ML is often preferred for applications requiring high interpretability and transparency.
In finance, an ML model might be used for credit scoring because its decision-making process can be easily understood and explained to stakeholders, a requirement for regulatory compliance.
In the automotive industry, ML algorithms are used for predictive maintenance, analyzing sensor data to predict equipment failures and schedule timely repairs. ML is also commonly used in predictive analytics, recommendation systems, and anomaly detection.
ML models are used by marketers to segment customers and predict their buying behavior based on structured data like past purchases and demographic information. Having structured data and the desire for clear actionable insights makes ML appropriate for these types of applications.
In contrast, DL outperforms in handling complex, unstructured data and delivering high accuracy in tasks such as image and speech recognition.
In healthcare, DL models are used for medical image analysis, where the ability to extract features from large volumes of data automatically can significantly improve diagnostic accuracy.
Self-driving cars use DL models to process and interpret visual data from cameras, enabling the vehicle to identify objects, detect lane markings, and make driving decisions.
It is also used in voice-activated assistants like Siri and Alexa, which understand and respond to spoken commands by analyzing speech patterns.
DL models can also be employed in more complex tasks like sentiment analysis, in which systems analyze unstructured data from social media posts and reviews to gauge public opinion about products and brands.
ML and DL can complement each other to great effect in specific domains:
In healthcare, ML models are used for patient diagnosis and risk prediction, while DL can be used for medical image analysis, interpreting complex visual data from scans and X-rays to detect diseases.
Complementary methods can be found in the finance industry in which ML is applied for credit scoring and fraud detection, while DL models are used for algorithmic trading and sentiment analysis.
There are significant considerations about how the different types of learning may influence ethics and responsibility in AI development. Machine learning's transparency makes it easier to detect and mitigate biases in decision-making processes such as those used in applications like hiring and lending in which fairness is paramount. Deep learning's complexity, however, needs more robust mechanisms to ensure accountability and prevent unintended consequences. For example, in healthcare, where AI-driven diagnostic tools can significantly impact patient outcomes, it is vital to ensure that DL models are trained on diverse and representative datasets to avoid biased or inaccurate predictions.