Course · 12 modules · 82 lessons · 495 min

Machine Learning Foundations

Mathematical foundations, learning theory, supervised and unsupervised methods, neural networks, and production ML systems.

All lessons

№ 01 Mathematical Foundations

01 Derivatives and Gradients · The mathematical machinery for measuring how outputs change with inputs -- the foundation of all learning algorithms. · 5 min
02 Information Theory · Entropy, KL divergence, and mutual information -- quantifying uncertainty, surprise, and the difference between distributions. · 5 min
03 Matrix Decompositions · Eigendecomposition, SVD, and Cholesky -- factoring matrices to reveal structure, compress data, and solve systems efficiently. · 5 min
04 Maximum Likelihood Estimation · Finding the parameter values that make observed data most probable -- the dominant paradigm for fitting ML models. · 5 min
05 Norms and Distance Metrics · Measuring size and similarity in feature space -- L1, L2, cosine, Mahalanobis, and when each is appropriate. · 6 min
06 Optimization and Gradient Descent · Iteratively adjusting parameters to minimize a loss function -- the engine that drives model training. · 5 min
07 Probability Fundamentals · Random variables, distributions, Bayes' theorem, and conditional probability -- the language of uncertainty in ML. · 5 min
08 Statistical Inference · Drawing conclusions about populations from samples -- hypothesis testing, confidence intervals, and the frequentist-Bayesian divide. · 5 min
09 Vectors and Matrices · The fundamental data structures of ML -- representing data as points in high-dimensional space and transformations as matrices. · 5 min

№ 02 Data Science Fundamentals

01 Data Cleaning and Preprocessing · Handling noise, inconsistencies, and formatting issues -- garbage in, garbage out is the first law of ML. · 6 min
02 Data Splitting and Sampling · Train/validation/test splits, stratification, and handling class imbalance -- the foundation of honest evaluation. · 8 min
03 Data Types and Structures · Numerical, categorical, ordinal, text, time series -- understanding your data's nature determines every downstream decision. · 5 min
04 Encoding Categorical Variables · One-hot, label, target, and embedding-based encoding -- translating categories into numbers without introducing false relationships. · 7 min
05 Exploratory Data Analysis · Visualizing distributions, correlations, and anomalies before modeling -- the most undervalued step in the ML pipeline. · 6 min
06 Feature Scaling and Normalization · Standardization, min-max scaling, and robust scaling -- ensuring features contribute equally regardless of their original units. · 6 min
07 Handling Missing Data · Deletion, imputation, and model-based approaches -- the strategy depends on why data is missing, not just how much. · 7 min

№ 03 Core Learning Theory

01 Bias-Variance Tradeoff · The fundamental tension between underfitting and overfitting -- every model navigates this tradeoff whether you manage it or not. · 6 min
02 Curse of Dimensionality · As dimensions increase, data becomes sparse, distances become meaningless, and exponentially more data is needed. · 6 min
03 Empirical Risk Minimization · Minimizing average loss on training data as a proxy for true risk -- the theoretical framework underlying most ML algorithms. · 6 min
04 Loss Functions · The objective being optimized -- MSE, cross-entropy, hinge loss, and how the choice shapes what the model learns. · 5 min
05 Overfitting and Underfitting · Memorizing training data vs. failing to capture patterns -- the two failure modes of every learning algorithm. · 6 min
06 Regularization · Constraining model complexity to improve generalization -- L1, L2, dropout, early stopping, and the bias-variance connection. · 6 min
07 Types of Machine Learning · Supervised, unsupervised, semi-supervised, and self-supervised -- a taxonomy based on what labels are available. · 6 min
08 What Is Machine Learning? · Learning patterns from data rather than programming rules explicitly -- the three paradigms and when each applies. · 6 min

№ 04 Supervised Learning Regression

01 Generalized Linear Models · Extending linear regression to non-normal responses via link functions -- unifying logistic, Poisson, and other regression types. · 7 min
02 Linear Regression · Fitting a hyperplane to data by minimizing squared errors -- the most interpretable and foundational predictive model. · 6 min
03 Polynomial Regression · Capturing nonlinear relationships within the linear regression framework by adding polynomial feature terms. · 6 min
04 Regression Diagnostics · Residual analysis, heteroscedasticity, multicollinearity, and influence points -- verifying assumptions before trusting results. · 6 min
05 Ridge and Lasso Regression · L2 and L1 penalties that shrink coefficients toward zero -- Ridge for stability, Lasso for sparsity and feature selection. · 7 min

№ 05 Supervised Learning Classification

01 Decision Trees · Recursive binary splitting that produces interpretable if-then rules -- the building block of ensemble methods. · 6 min
02 K-Nearest Neighbors · Classify by majority vote of the K closest training examples -- no training phase, all computation at prediction time. · 6 min
03 Kernel Methods · The kernel trick maps data to higher dimensions without explicit computation -- making linear methods handle nonlinear boundaries. · 6 min
04 Logistic Regression · Linear model with sigmoid output for probability estimation -- the workhorse baseline for binary classification. · 6 min
05 Multi-Class Classification · Extending binary classifiers to multiple classes via one-vs-rest, one-vs-one, and native multi-class approaches. · 7 min
06 Naive Bayes · Applying Bayes' theorem with a strong independence assumption -- surprisingly effective despite being "wrong" in theory. · 6 min
07 Support Vector Machines · Finding the maximum-margin hyperplane that separates classes -- elegant geometry with strong theoretical guarantees. · 6 min

№ 06 Ensemble Methods

01 AdaBoost · Sequentially training weak learners that focus on previously misclassified examples -- boosting accuracy through reweighting. · 5 min
02 Bagging and Bootstrap · Training multiple models on bootstrapped samples and averaging predictions -- reducing variance through diversity. · 5 min
03 Gradient Boosting · Building an additive model by fitting each new tree to the residual errors of the ensemble -- the most powerful tabular method. · 6 min
04 Random Forests · Bagged decision trees with random feature subsets -- robust, parallelizable, and hard to overfit with more trees. · 5 min
05 Stacking and Blending · Training a meta-learner on base model predictions -- combining diverse model families for competition-winning performance. · 7 min
06 XGBoost, LightGBM, and CatBoost · Industrial-strength gradient boosting implementations with regularization, histogram binning, and GPU acceleration. · 7 min

№ 07 Unsupervised Learning

01 Anomaly Detection · Identifying data points that deviate significantly from the norm -- isolation forests, autoencoders, and statistical approaches. · 7 min
02 Association Rules · Discovering frequent itemsets and co-occurrence patterns in transactional data -- the Apriori algorithm and market basket analysis. · 7 min
03 DBSCAN · Discovering arbitrarily-shaped clusters based on point density -- no need to specify K, naturally identifies outliers. · 8 min
04 Gaussian Mixture Models · Soft clustering via a weighted sum of Gaussians fitted with EM -- probabilistic assignment captures cluster uncertainty. · 7 min
05 Hierarchical Clustering · Building a tree of nested clusters via agglomerative merging or divisive splitting -- revealing multi-scale data structure. · 6 min
06 K-Means Clustering · Partitioning data into K groups by iteratively assigning points to nearest centroids -- simple, fast, and surprisingly effective. · 7 min
07 Principal Component Analysis · Projecting data onto orthogonal directions of maximum variance -- the foundational dimensionality reduction technique. · 8 min
08 t-SNE and UMAP · Nonlinear dimensionality reduction for visualization -- preserving local neighborhood structure in 2D/3D plots. · 8 min

№ 08 Neural Network Foundations

01 Activation Functions · Nonlinear transforms between layers -- ReLU, sigmoid, tanh, and why the choice matters for gradient flow and expressivity. · 5 min
02 Backpropagation · Computing gradients layer by layer via the chain rule -- the algorithm that makes deep learning computationally feasible. · 5 min
03 Batch Normalization · Normalizing layer inputs within each mini-batch -- stabilizing training, enabling higher learning rates, and acting as regularization. · 5 min
04 Dropout and Regularization · Randomly zeroing activations during training -- an implicit ensemble that prevents co-adaptation of neurons. · 6 min
05 Optimizers · SGD, momentum, RMSProp, Adam, and AdamW -- adaptive methods that navigate loss landscapes faster than vanilla gradient descent. · 5 min
06 Perceptrons and Multilayer Networks · From single linear classifiers to universal function approximators -- stacking layers creates representational power. · 5 min
07 Universal Approximation Theorem · A single hidden layer with enough neurons can approximate any continuous function -- but finding those weights is the hard part. · 7 min
08 Weight Initialization · Xavier, He, and orthogonal initialization -- breaking symmetry and controlling signal magnitude at the start of training. · 5 min

№ 09 Probabilistic Methods

01 Bayesian Inference · Updating beliefs with evidence via Bayes' theorem -- treating parameters as distributions rather than fixed values. · 6 min
02 Expectation-Maximization · Iteratively inferring latent variables (E-step) and optimizing parameters (M-step) -- the workhorse for incomplete data. · 6 min
03 Gaussian Processes · Nonparametric Bayesian regression defining distributions over functions -- elegant uncertainty quantification with $O(n^3)$ cost. · 6 min
04 Graphical Models · Bayesian networks and Markov random fields representing conditional dependencies as graphs -- structured probabilistic reasoning. · 7 min
05 Markov Chain Monte Carlo · Sampling from complex posterior distributions by constructing Markov chains -- when exact inference is intractable. · 7 min
06 Variational Inference · Approximating intractable posteriors by optimization rather than sampling -- trading exactness for scalability. · 7 min

№ 10 Model Selection and Evaluation

01 Calibration · When a model says "80% confidence" it should be right 80% of the time -- reliability diagrams, Platt scaling, and isotonic regression. · 6 min
02 Classification Metrics · Accuracy, precision, recall, F1, AUC-ROC, and AUC-PR -- choosing the right metric depends on what errors cost. · 5 min
03 Cross-Validation · K-fold, stratified, and leave-one-out validation -- maximizing use of limited data for both training and evaluation. · 6 min
04 Hyperparameter Tuning · Grid search, random search, and Bayesian optimization -- finding optimal settings without overfitting to the validation set. · 5 min
05 Learning Curves · Plotting performance vs. training set size or training iterations -- diagnosing whether you need more data, more capacity, or more regularization. · 6 min
06 Model Comparison · Paired t-tests, McNemar's test, and Wilcoxon signed-rank -- determining if performance differences are real or noise. · 6 min
07 Regression Metrics · MSE, RMSE, MAE, MAPE, and R-squared -- each captures different aspects of prediction quality. · 5 min

№ 11 Feature Engineering

01 Automated Feature Engineering · AutoML, Featuretools, and neural feature learning -- when manual engineering doesn't scale. · 7 min
02 Feature Extraction and Transformation · Creating new informative features from raw data through domain knowledge, mathematical transforms, and automated methods. · 7 min
03 Feature Selection Methods · Filter, wrapper, and embedded approaches for identifying the most informative features -- removing noise to improve generalization. · 7 min
04 Handling High-Cardinality Features · Target encoding, hashing, and embedding approaches for categorical features with thousands of unique values. · 7 min
05 Time-Series Feature Engineering · Lags, rolling statistics, seasonality decomposition, and calendar features -- encoding temporal patterns for ML models. · 6 min

№ 12 ML Systems and Production

01 A/B Testing for ML · Comparing model versions in production with statistical rigor -- offline metrics don't always predict online impact. · 6 min
02 Data Drift and Model Monitoring · Detecting when production data diverges from training data -- models degrade silently without monitoring. · 5 min
03 Experiment Tracking · Logging parameters, metrics, artifacts, and code versions -- reproducing results and navigating the experiment space systematically. · 5 min
04 ML Pipelines · Chaining data processing, feature engineering, and model training into reproducible, deployable workflows. · 5 min
05 Model Deployment and Serving · Batch vs. real-time inference, containerization, model registries, and the infrastructure of production ML. · 6 min
06 Responsible AI and Fairness · Measuring and mitigating bias, ensuring transparency, and building ML systems that are accountable and equitable. · 7 min