Top 100 Most Frequently Asked Machine Learning Interview Questions 2025 (Domain Knowledge)
In this article, we have put together 100 popular questions asked in machine learning interviews. They focus primarily on machine learning domain knowledge, along with the fundamental concepts required for MLE positions. MLE interviews often include numerous conceptual questions that test your foundational understanding; these may appear in a standalone domain-knowledge round or be blended into every round of the interview loop. Most are straightforward questions about ML concepts, but they demand precise and thorough answers. Make sure you have fully mastered these concepts before your interview so you can demonstrate both depth and breadth of knowledge.
(We will keep updating this list.)
Part I: Foundations of Machine Learning
Understanding the Core Concepts
- What do overfitting/underfitting refer to?
- What is the bias-variance trade-off?
- Explain the mechanism behind why regularization helps prevent overfitting.
- What are relative entropy, cross-entropy, and KL divergence? Explain their intuitions.
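To make the entropy question concrete, here is a minimal NumPy sketch (the two distributions are made-up toy values) showing the relationship KL(p‖q) = H(p, q) − H(p), i.e. relative entropy equals cross-entropy minus entropy:

```python
import numpy as np

# Toy discrete distributions for illustration only.
p = np.array([0.7, 0.2, 0.1])   # "true" distribution
q = np.array([0.5, 0.3, 0.2])   # model / approximating distribution

entropy_p     = -np.sum(p * np.log(p))        # H(p)
cross_entropy = -np.sum(p * np.log(q))        # H(p, q)
kl_divergence =  np.sum(p * np.log(p / q))    # D_KL(p || q), the relative entropy

# Relative entropy (KL divergence) = cross-entropy minus entropy.
assert np.isclose(kl_divergence, cross_entropy - entropy_p)
print(entropy_p, cross_entropy, kl_divergence)
```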
Model Categories
- What is the difference between Generative and Discriminative models?
- Compared to discriminative models, are generative models more prone to overfitting or underfitting?
Part II: The ML Pipeline
Data Preprocessing and Feature Engineering
- How do you handle missing data?
- How do you handle imbalanced data?
- What problems arise in high-dimensional classification? How do you handle them?
- How do you perform feature selection?
- How do you capture feature interactions?
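One reasonable way to approach the missing-data and imbalanced-data questions in code, sketched with scikit-learn (the column names and pipeline choices are illustrative assumptions, not the only valid answer):

```python
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column split; adapt to the dataset at hand.
numeric_cols = ["age", "income"]
categorical_cols = ["city"]

preprocess = ColumnTransformer([
    # Numeric features: impute missing values with the median, then standardize.
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric_cols),
    # Categorical features: impute with the most frequent category, then one-hot encode.
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]), categorical_cols),
])

# class_weight="balanced" re-weights the loss to compensate for class imbalance;
# resampling (over-/under-sampling, SMOTE) is the other common family of fixes.
model = Pipeline([("prep", preprocess),
                  ("clf", LogisticRegression(class_weight="balanced", max_iter=1000))])
```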
Model Evaluation and Metrics
- Given a set of ground truths and 2 models, how do you determine with confidence that one model is better than another?
- Which metrics should be used for classification problems and why?
- Explain the confusion matrix and its components.
- Explain precision and recall. What is the trade-off between them?
- What metrics should you use when dealing with imbalanced datasets?
- What is AUC? Explain it as the probability of ranking a randomly selected positive sample higher than a randomly selected negative sample.
- Define true positive rate and false positive rate. How do they relate to ROC curves?
- What is log-loss? When should you use log-loss as a metric?
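A quick scikit-learn sketch covering several of the metrics above (confusion matrix, precision/recall, ROC-AUC, log-loss); the labels and scores are toy values chosen only to make the outputs easy to check:

```python
from sklearn.metrics import (confusion_matrix, precision_score, recall_score,
                             roc_auc_score, log_loss)

y_true  = [0, 0, 1, 1, 1, 0, 1, 0]                   # toy ground-truth labels
y_score = [0.1, 0.4, 0.8, 0.3, 0.9, 0.2, 0.7, 0.6]   # toy predicted probabilities
y_pred  = [int(s >= 0.5) for s in y_score]            # hard predictions at threshold 0.5

# Rows are actual classes, columns are predicted classes: [[TN, FP], [FN, TP]].
print(confusion_matrix(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))   # TP / (TP + FP)
print("recall:   ", recall_score(y_true, y_pred))      # TP / (TP + FN)

# Threshold-free metrics are computed from the scores, not the hard predictions.
print("roc auc:  ", roc_auc_score(y_true, y_score))
print("log loss: ", log_loss(y_true, y_score))
```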
Part III: Optimization and Training
Loss Functions
- Write the formula for MSE. When is MSE typically used?
- What is the loss function for logistic regression?
- Derive the loss function for logistic regression.
- What is the loss function for SVM?
- For multiclass logistic regression, why do we use cross-entropy as the cost function?
- What is the optimization objective when splitting nodes in a decision tree?
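For reference when answering the loss-function questions above, the standard textbook forms (y_i are labels, ŷ_i regression predictions, p̂_i predicted probabilities; the SVM form uses labels in {-1, +1}):

```latex
% Mean squared error (regression)
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2

% Logistic regression: negative log-likelihood (log loss / binary cross-entropy)
\mathcal{L}_{\mathrm{log}} = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i\log\hat{p}_i + (1-y_i)\log\left(1-\hat{p}_i\right)\right]

% Soft-margin SVM: hinge loss with L2 regularization, y_i \in \{-1, +1\}
\mathcal{L}_{\mathrm{SVM}} = \frac{1}{n}\sum_{i=1}^{n}\max\left(0,\; 1 - y_i\, w^{\top}x_i\right) + \lambda\lVert w\rVert_2^2

% Multiclass cross-entropy over K classes with softmax outputs \hat{p}_{i,k}
\mathcal{L}_{\mathrm{CE}} = -\frac{1}{n}\sum_{i=1}^{n}\sum_{k=1}^{K} y_{i,k}\log\hat{p}_{i,k}
```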
Optimization Theory
- Is logistic regression with MSE loss a convex optimization problem?
- What is the relationship between least squares estimation and MLE in linear regression?
- What is the relationship between minimizing squared error and maximizing likelihood?
- Explain the problems of plateaus and saddle points in optimization.
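A worked sketch of the least-squares/MLE connection asked about above: assuming i.i.d. Gaussian noise on the targets, maximizing the likelihood is the same optimization as minimizing squared error.

```latex
% Model assumption: y_i = w^\top x_i + \varepsilon_i, \quad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)\ \text{i.i.d.}
p(y_i \mid x_i, w) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(y_i - w^\top x_i)^2}{2\sigma^2}\right)

% Log-likelihood of the whole dataset:
\log L(w) = -\frac{n}{2}\log\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - w^\top x_i\right)^2

% The first term does not depend on w, so maximizing the likelihood
% is exactly minimizing the sum of squared errors:
\hat{w}_{\mathrm{MLE}} = \arg\max_w \log L(w) = \arg\min_w \sum_{i=1}^{n}\left(y_i - w^\top x_i\right)^2 = \hat{w}_{\mathrm{OLS}}
```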
Regularization Techniques
- What are common methods to prevent overfitting?
- What is the difference between L1 and L2 regularization?
- Explain Lasso and Ridge regression. What are their respective priors?
- Derive the optimization formulations for Lasso and Ridge regression.
- Why does L1 regularization produce sparse solutions while L2 does not?
- Why are L1 and L2 norms commonly used for regularization instead of higher-order norms like L3 or L4?
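For quick reference on the Lasso/Ridge questions, the standard penalized least-squares formulations (Ridge corresponds to a zero-mean Gaussian prior on the weights, Lasso to a Laplace prior):

```latex
% Ridge regression (L2 penalty; MAP estimate under a Gaussian prior on w)
\hat{w}_{\mathrm{ridge}} = \arg\min_{w} \; \lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_2^2

% Lasso regression (L1 penalty; MAP estimate under a Laplace prior on w)
\hat{w}_{\mathrm{lasso}} = \arg\min_{w} \; \lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_1
```

The usual geometric intuition for the sparsity question: the L1 constraint region has corners on the coordinate axes, so the constrained optimum often lands with some coefficients exactly at zero, whereas the smooth L2 ball only shrinks coefficients toward zero without zeroing them out.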
Part IV: Traditional Machine Learning Algorithms
Linear Models
- What are the fundamental assumptions of Linear Regression?
- What happens when we have correlated variables? How do you solve this?
- Explain regression coefficients and their interpretation.
- How can you minimize inter-correlation between variables in Linear Regression?
- If the relationship between y and x is non-linear, can linear regression solve it?
- Why and when should you use interaction variables?
- What are the differences between Logistic Regression and SVM? (Consider loss functions and outputs)
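For the correlated-variables questions, a standard diagnostic is the variance inflation factor (VIF); a minimal statsmodels sketch on synthetic data (the features and the deliberate collinearity are fabricated for illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.95 * x1 + rng.normal(scale=0.1, size=200)   # deliberately collinear with x1
x3 = rng.normal(size=200)
X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))

# VIF_j = 1 / (1 - R_j^2), where R_j^2 regresses feature j on the remaining features.
# Values well above ~5-10 indicate problematic multicollinearity; typical remedies are
# dropping/combining features, PCA, or Ridge regularization.
for j, col in enumerate(X.columns):
    print(col, variance_inflation_factor(X.values, j))
```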
Tree-Based Methods
- How do regression and classification decision trees split nodes?
- How do you prevent overfitting in decision trees?
- How do you apply regularization in decision trees?
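One way to ground the node-splitting questions: a classification tree evaluates candidate splits by the reduction in impurity (Gini or entropy), while a regression tree typically uses the reduction in variance/MSE. A small NumPy sketch of weighted Gini impurity on toy labels:

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 over the class proportions p_k."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_impurity(left, right):
    """Weighted Gini impurity of a candidate split; the tree picks the split minimizing this."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

parent = np.array([0, 0, 0, 1, 1, 1])
print(split_impurity(parent[:3], parent[3:]))     # 0.0   -> perfectly pure split
print(split_impurity(parent[::2], parent[1::2]))  # ~0.44 -> mixed, worse split
```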
Ensemble Methods
- What is the difference between bagging and boosting?
- Compare GBDT and Random Forest. What are their pros and cons?
- Explain how GBDT and Random Forest work.
- Does Random Forest help reduce bias or variance? Why does it reduce variance?
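A side-by-side sketch for the bagging vs. boosting questions, using scikit-learn on synthetic data: Random Forest (bagging) trains deep trees independently on bootstrap samples and averages them, which mainly reduces variance; GBDT (boosting) adds shallow trees sequentially to correct the current ensemble's errors, which mainly reduces bias.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Bagging: independent, deep, decorrelated trees whose predictions are averaged.
rf = RandomForestClassifier(n_estimators=200, random_state=0)

# Boosting: shallow trees fitted sequentially, each one correcting the previous ensemble.
gbdt = GradientBoostingClassifier(n_estimators=200, max_depth=3,
                                  learning_rate=0.1, random_state=0)

for name, model in [("random forest", rf), ("gbdt", gbdt)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())
```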
Probabilistic and Statistical Models
- Explain Naive Bayes. What are its fundamental assumptions?
- What are LDA and QDA? What are their assumptions?
Support Vector Machines and Kernel Methods
- Explain SVM. How do you introduce non-linearity?
- Explain kernel methods. Why use them? What kernels do you know?
- How do you convert SVM outputs to probabilities?
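For the SVM-to-probabilities question, one standard answer is Platt scaling (fitting a sigmoid to the decision scores), which scikit-learn exposes via probability=True; the data below is synthetic:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# An RBF kernel introduces non-linearity; probability=True enables Platt scaling,
# i.e. a logistic mapping fitted from the raw margins to calibrated probabilities.
svm = SVC(kernel="rbf", probability=True, random_state=0).fit(X, y)

print(svm.decision_function(X[:3]))   # raw signed margins
print(svm.predict_proba(X[:3]))       # calibrated class probabilities
```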
Instance-Based Learning
- Explain KNN (K-Nearest Neighbors).
Unsupervised Learning
- Explain K-means clustering algorithm in detail. Does it converge to global or local optima? What are the stopping criteria?
- What is the EM algorithm?
- What is GMM (Gaussian Mixture Model)? How does it relate to K-means?
- Explain PCA (Principal Component Analysis).
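A from-scratch sketch of Lloyd's algorithm for the K-means questions: it alternates assignment and centroid-update steps, converges only to a local optimum (so the result depends on initialization), and stops when centroids stop moving or an iteration budget is reached. The data here is random, and empty clusters are not handled, to keep the sketch short.

```python
import numpy as np

def kmeans(X, k, n_iters=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]    # random init from data points
    for _ in range(n_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Stopping criterion: centroids barely move (a local optimum of the distortion).
        if np.linalg.norm(new_centroids - centroids) < tol:
            break
        centroids = new_centroids
    return centroids, labels

X = np.random.default_rng(1).normal(size=(300, 2))
centroids, labels = kmeans(X, k=3)
```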
Part V: Deep Learning
Neural Network Fundamentals
- Why do DNNs need bias terms? What is the intuition behind bias terms?
- Can you initialize all weights in a neural network to zero? Why or why not?
- What is the difference between DNNs and Logistic Regression?
- Why do DNNs have stronger fitting capability than Logistic Regression?
- What are common activation functions (sigmoid, tanh, ReLU, Leaky ReLU)? What are their pros and cons?
- Why do we need non-linear activation functions?
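A NumPy sketch of the four activation functions named above, with their usual talking points as comments:

```python
import numpy as np

def sigmoid(x):                   # squashes to (0, 1); saturates at both ends -> vanishing gradients
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):                      # zero-centered version of sigmoid; still saturates
    return np.tanh(x)

def relu(x):                      # cheap, non-saturating for x > 0, but units can "die" at x <= 0
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):    # small negative slope keeps otherwise-dead units trainable
    return np.where(x > 0, x, alpha * x)

x = np.linspace(-5, 5, 11)
print(sigmoid(x), tanh(x), relu(x), leaky_relu(x), sep="\n")
```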
Training Deep Networks
- What is backpropagation and how does it work?
- What are vanishing and exploding gradients? How do you solve these problems?
- Compare different optimizers: SGD, RMSprop, Momentum, Adagrad, Adam.
- What are the pros and cons of batch gradient descent vs SGD? How does batch size affect training?
- How do learning rates that are too large or too small affect model training?
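For the optimizer-comparison question, the update rules themselves are worth being able to write down; a NumPy sketch of plain SGD, momentum, and Adam single-step updates (hyperparameter values are common defaults, not prescriptions):

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    return w - lr * grad

def momentum_step(w, grad, velocity, lr=0.01, beta=0.9):
    velocity = beta * velocity + grad            # exponential moving average of past gradients
    return w - lr * velocity, velocity

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad           # first moment estimate (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2      # second moment estimate (per-parameter scale)
    m_hat = m / (1 - beta1 ** t)                 # bias correction for the zero initialization
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Toy usage: take Adam steps on f(w) = ||w||^2, whose gradient is 2w.
w, m, v = np.ones(3), np.zeros(3), np.zeros(3)
for t in range(1, 201):
    w, m, v = adam_step(w, 2 * w, m, v, t)
print(w)   # closer to the minimum at 0 than the starting point
```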
Deep Learning Regularization
- What methods can prevent overfitting in Deep Learning?
- What is Dropout? Why does it work? What is the difference in dropout behavior during training vs testing?
- What is Batch Normalization? Why does it work? What is the difference in BN behavior during training vs testing?
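A NumPy sketch of (inverted) dropout for the train-vs-test question: units are randomly zeroed during training and the survivors are rescaled by 1/keep_prob, so at test time the layer is used unchanged with no mask.

```python
import numpy as np

def dropout(activations, keep_prob=0.8, training=True, seed=None):
    """Inverted dropout: rescale at train time so inference needs no adjustment."""
    if not training:
        return activations                        # test/inference: identity
    rng = np.random.default_rng(seed)
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob         # keep the expected activation unchanged

h = np.ones((2, 5))
print(dropout(h, keep_prob=0.8, training=True, seed=0))   # some units zeroed, rest scaled by 1/0.8
print(dropout(h, training=False))                          # unchanged at test time
```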
Advanced Deep Learning Practices
- How do you perform hyperparameter tuning in deep learning? Compare grid search vs random search.
- When does transfer learning make sense?
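A scikit-learn sketch contrasting grid search with random search for the tuning question (the model and search space are illustrative choices):

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=1000, random_state=0)
model = LogisticRegression(max_iter=1000)

# Grid search: exhaustive over a fixed, discrete grid (cost grows exponentially with dimensions).
grid = GridSearchCV(model, {"C": [0.01, 0.1, 1, 10]}, cv=5).fit(X, y)

# Random search: samples from distributions; typically finds good settings with far fewer
# trials when only a few hyperparameters actually matter.
rand = RandomizedSearchCV(model, {"C": loguniform(1e-3, 1e2)},
                          n_iter=10, cv=5, random_state=0).fit(X, y)

print(grid.best_params_, rand.best_params_)
```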
Part VI: Model Selection and Comparison
- What are the recent advances in ML models and their performance benchmarks?
- Can you name the popular models within each ML category (e.g., tree-based, neural networks, probabilistic)?
- Which models have you used in your previous projects? Walk me through your experience with them.
AI Infra System Design Private Course
- Would you like to master modern AI infrastructure from FAANG engineers with 15+ years of experience? Contact us at [email protected]