Course · 12 modules · 82 lessons · 495 min

Machine Learning Foundations

Mathematical foundations, learning theory, supervised and unsupervised methods, neural networks, and production ML systems.

All lessons

Module 01: Mathematical Foundations
01 Derivatives and Gradients (5 min): The mathematical machinery for measuring how outputs change with inputs -- the foundation of gradient-based learning.
02 Information Theory (5 min): Entropy, KL divergence, and mutual information -- quantifying uncertainty, surprise, and the difference between distributions.
03 Matrix Decompositions (5 min): Eigendecomposition, SVD, and Cholesky -- factoring matrices to reveal structure, compress data, and solve systems efficiently.
04 Maximum Likelihood Estimation (5 min): Finding the parameter values that make observed data most probable -- the dominant paradigm for fitting ML models.
05 Norms and Distance Metrics (6 min): Measuring size and similarity in feature space -- L1, L2, cosine, Mahalanobis, and when each is appropriate.
06 Optimization and Gradient Descent (5 min): Iteratively adjusting parameters to minimize a loss function -- the engine that drives model training.
07 Probability Fundamentals (5 min): Random variables, distributions, Bayes' theorem, and conditional probability -- the language of uncertainty in ML.
08 Statistical Inference (5 min): Drawing conclusions about populations from samples -- hypothesis testing, confidence intervals, and the frequentist-Bayesian divide.
09 Vectors and Matrices (5 min): The fundamental data structures of ML -- representing data as points in high-dimensional space and transformations as matrices.

Module 02: Data Science Fundamentals
01 Data Cleaning and Preprocessing (6 min): Handling noise, inconsistencies, and formatting issues -- garbage in, garbage out is the first law of ML.
02 Data Splitting and Sampling (8 min): Train/validation/test splits, stratification, and handling class imbalance -- the foundation of honest evaluation.
03 Data Types and Structures (5 min): Numerical, categorical, ordinal, text, and time series -- understanding your data's nature determines every downstream decision.
04 Encoding Categorical Variables (7 min): One-hot, label, target, and embedding-based encoding -- translating categories into numbers without introducing false relationships.
05 Exploratory Data Analysis (6 min): Visualizing distributions, correlations, and anomalies before modeling -- the most undervalued step in the ML pipeline.
06 Feature Scaling and Normalization (6 min): Standardization, min-max scaling, and robust scaling -- putting features on a comparable scale so that no feature dominates simply because of its original units.
07 Handling Missing Data (7 min): Deletion, imputation, and model-based approaches -- the strategy depends on why data is missing, not just how much.

Module 03: Core Learning Theory
01 Bias-Variance Tradeoff (6 min): The fundamental tension between underfitting and overfitting -- every model navigates this tradeoff whether you manage it or not.
02 Curse of Dimensionality (6 min): As dimensionality increases, data becomes sparse, pairwise distances concentrate and lose their power to discriminate near from far, and exponentially more data is needed.
03 Empirical Risk Minimization (6 min): Minimizing average loss on training data as a proxy for true risk -- the theoretical framework underlying most ML algorithms.
04 Loss Functions (5 min): The objective being optimized -- MSE, cross-entropy, hinge loss, and how the choice shapes what the model learns.
05 Overfitting and Underfitting (6 min): Memorizing training data vs. failing to capture patterns -- the two failure modes of every learning algorithm.
06 Regularization (6 min): Constraining model complexity to improve generalization -- L1, L2, dropout, early stopping, and the bias-variance connection.
07 Types of Machine Learning (6 min): Supervised, unsupervised, semi-supervised, and self-supervised -- a taxonomy based on what labels are available.
08 What Is Machine Learning? (6 min): Learning patterns from data rather than programming rules explicitly -- the three paradigms and when each applies.

Module 04: Supervised Learning: Regression
01 Generalized Linear Models (7 min): Extending linear regression to non-normal responses via link functions -- unifying logistic, Poisson, and other regression types.
02 Linear Regression (6 min): Fitting a hyperplane to data by minimizing squared errors -- one of the most interpretable predictive models and the foundation for many others.
03 Polynomial Regression (6 min): Capturing nonlinear relationships within the linear regression framework by adding polynomial feature terms.
04 Regression Diagnostics (6 min): Residual analysis, heteroscedasticity, multicollinearity, and influential points -- verifying assumptions before trusting results.
05 Ridge and Lasso Regression (7 min): L2 and L1 penalties that shrink coefficients toward zero -- Ridge for stability, Lasso for sparsity and feature selection.

Module 05: Supervised Learning: Classification
01 Decision Trees (6 min): Recursive binary splitting that produces interpretable if-then rules -- the building block of ensemble methods.
02 K-Nearest Neighbors (6 min): Classify by majority vote of the K closest training examples -- no training phase, all computation at prediction time.
03 Kernel Methods (6 min): The kernel trick computes inner products in a high-dimensional feature space without explicitly mapping the data there -- letting linear methods learn nonlinear boundaries.
04 Logistic Regression (6 min): Linear model with sigmoid output for probability estimation -- the workhorse baseline for binary classification.
05 Multi-Class Classification (7 min): Extending binary classifiers to multiple classes via one-vs-rest, one-vs-one, and native multi-class approaches.
06 Naive Bayes (6 min): Applying Bayes' theorem with a strong conditional-independence assumption -- surprisingly effective even though that assumption rarely holds in practice.
07 Support Vector Machines (6 min): Finding the maximum-margin hyperplane that separates classes -- elegant geometry with strong theoretical guarantees.

Module 06: Ensemble Methods
01 AdaBoost (5 min): Sequentially training weak learners that focus on previously misclassified examples -- boosting accuracy through reweighting.
02 Bagging and Bootstrap (5 min): Training multiple models on bootstrapped samples and averaging predictions -- reducing variance through diversity.
03 Gradient Boosting (6 min): Building an additive model by fitting each new tree to the residual errors (more generally, the negative gradient of the loss) of the current ensemble -- often the strongest method on tabular data.
04 Random Forests (5 min): Bagged decision trees with random feature subsets -- robust, parallelizable, and hard to overfit with more trees.
05 Stacking and Blending (7 min): Training a meta-learner on base model predictions -- combining diverse model families for competition-winning performance.
06 XGBoost, LightGBM, and CatBoost (7 min): Industrial-strength gradient boosting implementations with regularization, histogram binning, and GPU acceleration.

Module 07: Unsupervised Learning
01 Anomaly Detection (7 min): Identifying data points that deviate significantly from the norm -- isolation forests, autoencoders, and statistical approaches.
02 Association Rules (7 min): Discovering frequent itemsets and co-occurrence patterns in transactional data -- the Apriori algorithm and market basket analysis.
03 DBSCAN (8 min): Discovering arbitrarily-shaped clusters based on point density -- no need to specify K, naturally identifies outliers.
04 Gaussian Mixture Models (7 min): Soft clustering via a weighted sum of Gaussians fitted with EM -- probabilistic assignment captures cluster uncertainty.
05 Hierarchical Clustering (6 min): Building a tree of nested clusters via agglomerative merging or divisive splitting -- revealing multi-scale data structure.
06 K-Means Clustering (7 min): Partitioning data into K groups by iteratively assigning points to nearest centroids -- simple, fast, and surprisingly effective.
07 Principal Component Analysis (8 min): Projecting data onto orthogonal directions of maximum variance -- the foundational dimensionality reduction technique.
08 t-SNE and UMAP (8 min): Nonlinear dimensionality reduction for visualization -- preserving local neighborhood structure in 2D/3D plots.

Module 08: Neural Network Foundations
01 Activation Functions (5 min): Nonlinear transforms between layers -- ReLU, sigmoid, tanh, and why the choice matters for gradient flow and expressivity.
02 Backpropagation (5 min): Computing gradients layer by layer via the chain rule -- the algorithm that makes deep learning computationally feasible.
03 Batch Normalization (5 min): Normalizing layer inputs within each mini-batch -- stabilizing training, enabling higher learning rates, and acting as a mild regularizer.
04 Dropout and Regularization (6 min): Randomly zeroing activations during training -- an implicit ensemble that prevents co-adaptation of neurons.
05 Optimizers (5 min): SGD, momentum, RMSProp, Adam, and AdamW -- from momentum-accelerated descent to adaptive methods that often navigate loss landscapes faster than vanilla gradient descent.
06 Perceptrons and Multilayer Networks (5 min): From single linear classifiers to universal function approximators -- stacking layers creates representational power.
07 Universal Approximation Theorem (7 min): A single hidden layer with enough neurons and a suitable nonlinear activation can approximate any continuous function on a compact domain to arbitrary accuracy -- but finding those weights is the hard part.
08 Weight Initialization (5 min): Xavier, He, and orthogonal initialization -- breaking symmetry and controlling signal magnitude at the start of training.

Module 09: Probabilistic Methods
01 Bayesian Inference (6 min): Updating beliefs with evidence via Bayes' theorem -- treating parameters as distributions rather than fixed values.
02 Expectation-Maximization (6 min): Iteratively inferring latent variables (E-step) and optimizing parameters (M-step) -- the workhorse for incomplete data.
03 Gaussian Processes (6 min): Nonparametric Bayesian regression defining distributions over functions -- elegant uncertainty quantification at $O(n^3)$ cost in the number of training points.
04 Graphical Models (7 min): Bayesian networks and Markov random fields representing conditional dependencies as graphs -- structured probabilistic reasoning.
05 Markov Chain Monte Carlo (7 min): Sampling from complex posterior distributions by constructing Markov chains -- when exact inference is intractable.
06 Variational Inference (7 min): Approximating intractable posteriors by optimization rather than sampling -- trading exactness for scalability.

Module 10: Model Selection and Evaluation
01 Calibration (6 min): When a model reports 80% confidence, it should be right about 80% of the time -- reliability diagrams, Platt scaling, and isotonic regression.
02 Classification Metrics (5 min): Accuracy, precision, recall, F1, AUC-ROC, and AUC-PR -- choosing the right metric depends on what errors cost.
03 Cross-Validation (6 min): K-fold, stratified, and leave-one-out validation -- maximizing use of limited data for both training and evaluation.
04 Hyperparameter Tuning (5 min): Grid search, random search, and Bayesian optimization -- finding good settings without overfitting to the validation set.
05 Learning Curves (6 min): Plotting performance vs. training set size or training iterations -- diagnosing whether you need more data, more capacity, or more regularization.
06 Model Comparison (6 min): Paired t-tests, McNemar's test, and the Wilcoxon signed-rank test -- determining whether performance differences are real or noise.
07 Regression Metrics (5 min): MSE, RMSE, MAE, MAPE, and R-squared -- each captures different aspects of prediction quality.

Module 11: Feature Engineering
01 Automated Feature Engineering (7 min): AutoML, Featuretools, and neural feature learning -- when manual engineering doesn't scale.
02 Feature Extraction and Transformation (7 min): Creating new informative features from raw data through domain knowledge, mathematical transforms, and automated methods.
03 Feature Selection Methods (7 min): Filter, wrapper, and embedded approaches for identifying the most informative features -- removing noise to improve generalization.
04 Handling High-Cardinality Features (7 min): Target encoding, hashing, and embedding approaches for categorical features with thousands of unique values.
05 Time-Series Feature Engineering (6 min): Lags, rolling statistics, seasonality decomposition, and calendar features -- encoding temporal patterns for ML models.

Module 12: ML Systems and Production
01 A/B Testing for ML (6 min): Comparing model versions in production with statistical rigor -- offline metrics don't always predict online impact.
02 Data Drift and Model Monitoring (5 min): Detecting when production data diverges from training data -- models degrade silently without monitoring.
03 Experiment Tracking (5 min): Logging parameters, metrics, artifacts, and code versions -- reproducing results and navigating the experiment space systematically.
04 ML Pipelines (5 min): Chaining data processing, feature engineering, and model training into reproducible, deployable workflows.
05 Model Deployment and Serving (6 min): Batch vs. real-time inference, containerization, model registries, and the infrastructure of production ML.
06 Responsible AI and Fairness (7 min): Measuring and mitigating bias, ensuring transparency, and building ML systems that are accountable and equitable.