Classification vs Regression in Supervised Learning

June 16, 2024

1. Supervised Learning

Supervised learning is a type of machine learning in which an algorithm learns from labeled training data: input features paired with their corresponding output labels. Its two main task types are classification and regression, which differ in the kind of output they predict: discrete categories for classification, continuous numerical values for regression.
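As a rough illustration of how the nature of the labels determines the task, the sketch below inspects a target array with scikit-learn's type_of_target utility. The helper name suggest_task and the sample label arrays are hypothetical and exist only for this example.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target


def suggest_task(y):
    """Hypothetical helper: guess the supervised task from the label values."""
    kind = type_of_target(y)
    # Continuous targets point to regression; discrete targets to classification
    if kind in ("continuous", "continuous-multioutput"):
        return "regression"
    return "classification"


# Hypothetical labels: species names (categorical) and prices (continuous)
species = np.array(["setosa", "versicolor", "virginica", "setosa"])
prices = np.array([199.5, 250.75, 310.2, 180.1])

print(suggest_task(species))
print(suggest_task(prices))
```

This is only a heuristic: integer-valued targets such as counts are reported as discrete even when a regression model would be the better fit, so the final choice still depends on the problem.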

2. Classification

Definition: Classification is a supervised learning task where the goal is to predict the categorical class labels of new instances based on past observations.

Key Points:

Output Variable: The output variable in classification is categorical, representing discrete classes or categories.
Decision Boundaries: Classification algorithms learn decision boundaries to separate different classes in the input feature space.
Evaluation Metrics: Classification models are evaluated using metrics such as accuracy, precision, recall, and F1-score, often alongside a confusion matrix.
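To make the idea of a decision boundary concrete, here is a minimal sketch using a one-dimensional toy problem. The data values are made up for illustration, and logistic regression is used only because its boundary is easy to read off; it is not the classifier used elsewhere in this post.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical 1-D data: small values belong to class 0, large values to class 1
X = np.array([[1.0], [2.0], [3.0], [6.0], [7.0], [8.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression()
clf.fit(X, y)

# For a 1-D logistic model the boundary is where w * x + b = 0, i.e. x = -b / w
w = clf.coef_[0, 0]
b = clf.intercept_[0]
boundary = -b / w

print(f"Learned decision boundary at x = {boundary:.2f}")
print("Predictions for x = 2 and x = 7:", clf.predict([[2.0], [7.0]]))
```

The learned boundary should fall between the two groups of points: inputs below it are assigned to class 0 and inputs above it to class 1.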

Code Example (Python - Scikit-Learn):


from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train a K-Nearest Neighbors classifier
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Make predictions on the test data
y_pred = knn.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')
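Accuracy is only one of the metrics listed in the key points. The sketch below shows how the others could be computed with scikit-learn on a small set of hypothetical binary labels, chosen so the numbers are easy to check by hand.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical true and predicted labels for a binary problem (1 = positive class)
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]]
cm = confusion_matrix(y_true, y_pred)

accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / total
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall

print("Confusion matrix:")
print(cm)
print(f"Accuracy: {accuracy:.2f}, Precision: {precision:.2f}, "
      f"Recall: {recall:.2f}, F1: {f1:.2f}")
```

With these labels there are 3 true positives, 3 true negatives, 1 false positive, and 1 false negative, so all four scores come out to 0.75. For a multiclass problem such as Iris, precision_score, recall_score, and f1_score need an averaging strategy, for example average='macro'.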

3. Regression

Definition: Regression is a supervised learning task where the goal is to predict continuous numerical values based on input features.

Key Points:

Output Variable: The output variable in regression is continuous, representing a range of numerical values.
Function Approximation: Regression algorithms approximate the underlying function that maps input features to the target variable.
Evaluation Metrics: Regression models are evaluated using metrics such as mean squared error (MSE), root mean squared error (RMSE), and coefficient of determination (R-squared).
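The regression metrics above can be illustrated on a handful of hypothetical values. The sketch computes each metric with scikit-learn and also directly from its definition with NumPy, so the formulas are visible.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical true targets and model predictions
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 10.0])

# Library versions
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)  # RMSE is the square root of MSE, in the target's units
r2 = r2_score(y_true, y_pred)

# The same quantities from their definitions
mse_manual = np.mean((y_true - y_pred) ** 2)
ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2_manual = 1 - ss_res / ss_tot

print(f"MSE: {mse:.3f}, RMSE: {rmse:.3f}, R-squared: {r2:.3f}")
```

Here the squared errors are 0.25, 0.25, 0, and 1, giving an MSE of 0.375 and an R-squared of 1 - 1.5/20 = 0.925.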

Code Example (Python - Scikit-Learn):


import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Generate sample data
np.random.seed(0)
X = np.random.rand(100, 1) * 10
y = 2 * X.squeeze() + np.random.randn(100) * 2  # True relationship: y = 2x + noise

# Initialize and train a Linear Regression model
model = LinearRegression()
model.fit(X, y)

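# Make predictions on new data
X_new = np.array([[5.0], [7.0], [9.0]])  # New data points
y_pred = model.predict(X_new)
print('Predictions:', y_pred)

# Evaluate the fit on the training data (in-sample, so the scores are optimistic)
y_fit = model.predict(X)
print('MSE:', mean_squared_error(y, y_fit))
print('R-squared:', r2_score(y, y_fit))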
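Because the sample data was generated from y = 2x plus noise, one way to see function approximation at work is to inspect the fitted slope and intercept. The sketch below regenerates the same synthetic data so that it runs on its own; the variable names slope and intercept are chosen for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Recreate the synthetic data: true relationship is y = 2x + noise
np.random.seed(0)
X = np.random.rand(100, 1) * 10
y = 2 * X.squeeze() + np.random.randn(100) * 2

model = LinearRegression()
model.fit(X, y)

# The fitted line is y = slope * x + intercept
slope = model.coef_[0]
intercept = model.intercept_

print(f"Learned function: y = {slope:.2f} * x + {intercept:.2f}")
```

The learned slope should land close to the true value of 2 and the intercept close to 0, with the gap coming from the random noise added to the targets.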

4. Conclusion

In summary, classification and regression are the two fundamental supervised learning tasks: classification predicts discrete class labels, while regression predicts continuous values. Understanding this difference helps in choosing an appropriate algorithm and evaluation metric for a given problem.
