Question 1

Explain the bias-variance trade-off.

Accepted Answer

Bias is error from oversimplified assumptions (underfitting). Variance is error from sensitivity to fluctuations in the training data (overfitting). As model complexity increases, bias typically decreases while variance increases. The goal is the level of complexity that minimizes total expected error, which is the sum of squared bias, variance, and irreducible noise. Techniques like cross-validation, regularization, and ensemble methods help balance the trade-off.
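
A minimal simulation can make the trade-off concrete. The sketch below is illustrative only: the target function, noise level, and the choice of k-nearest-neighbour regression are assumptions, not part of the answer above. Small k gives a flexible model (low bias, high variance); k equal to the training set size gives a constant model (high bias, low variance).

```python
import random

def true_f(x):
    # Hypothetical ground-truth function, chosen only for illustration.
    return x * x

def knn_predict(train, x0, k):
    # Average the targets of the k training points closest to x0.
    nearest = sorted(train, key=lambda p: abs(p[0] - x0))[:k]
    return sum(y for _, y in nearest) / k

def bias_variance(k, x0=0.9, n=30, trials=500, noise=0.2, seed=0):
    # Refit the model on many independent training sets and look at how
    # its prediction at x0 is distributed around the true value.
    rng = random.Random(seed)
    preds = []
    for _ in range(trials):
        train = []
        for _ in range(n):
            x = rng.random()
            train.append((x, true_f(x) + rng.gauss(0, noise)))
        preds.append(knn_predict(train, x0, k))
    mean_pred = sum(preds) / trials
    bias_sq = (mean_pred - true_f(x0)) ** 2
    variance = sum((p - mean_pred) ** 2 for p in preds) / trials
    return bias_sq, variance

# k=1 is the most flexible model, k=30 (= n) always predicts the global mean.
results = {k: bias_variance(k) for k in (1, 5, 30)}
for k, (bias_sq, variance) in results.items():
    print(f"k={k:2d}  bias^2={bias_sq:.4f}  variance={variance:.4f}")
```

Under these assumptions, the k=1 model should show the largest variance and the k=30 model the largest squared bias.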

Question 2

How would you design an A/B test to measure the impact of a new recommendation algorithm?

Accepted Answer

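Define the primary metric (CTR, revenue, engagement time) before the test starts. Calculate the required sample size from the minimum detectable effect, the significance level (commonly 0.05), and the power (commonly 0.8). Randomly assign users to control and treatment groups. Run for at least one full business cycle. Check for novelty effects. For analysis, use a two-sample t-test for continuous metrics, a two-proportion z-test for binary metrics such as CTR, or the Mann-Whitney U test when the metric is heavily skewed.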
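
The sample-size step can be sketched for a binary metric such as CTR. This uses the standard normal-approximation formula for comparing two proportions; the baseline rate and the minimum detectable effect below are made-up inputs, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_per_group(p_control, mde_abs, alpha=0.05, power=0.8):
    # p_control: baseline conversion rate; mde_abs: absolute lift to detect.
    p_treat = p_control + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    variance_sum = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    return math.ceil((z_alpha + z_beta) ** 2 * variance_sum / mde_abs ** 2)

# Assumed inputs: 10% baseline CTR, detect an absolute lift of 1 point.
n = sample_size_per_group(p_control=0.10, mde_abs=0.01)
print(f"users needed per group: {n}")
```

Halving the minimum detectable effect roughly quadruples the required sample, which is why the effect size should be agreed before the test starts.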

Question 3

What is the difference between L1 and L2 regularization?

Accepted Answer

L1 (Lasso) adds the sum of absolute weights to the loss function, driving some weights to exactly zero (feature selection). L2 (Ridge) adds the sum of squared weights, shrinking all weights toward zero but rarely eliminating them. L1 produces sparse models; L2 produces small but dense weights. Use Elastic Net for a combination of both.
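
The difference in behaviour shows up in a simplified setting. Assuming orthonormal features (so each weight can be solved independently), the L1 solution is a soft-threshold of the unregularized weight, while the L2 solution is a proportional shrink. The weights and penalty strength below are arbitrary illustrative values.

```python
def l1_shrink(w, lam):
    # Minimizes 0.5 * (v - w)**2 + lam * abs(v): soft-thresholding.
    magnitude = max(abs(w) - lam, 0.0)
    if magnitude == 0.0:
        return 0.0
    return magnitude if w > 0 else -magnitude

def l2_shrink(w, lam):
    # Minimizes 0.5 * (v - w)**2 + 0.5 * lam * v**2: proportional shrinkage.
    return w / (1.0 + lam)

ols_weights = [3.0, -0.4, 0.2, -2.0]  # hypothetical unregularized weights
lam = 0.5

lasso_weights = [l1_shrink(w, lam) for w in ols_weights]
ridge_weights = [l2_shrink(w, lam) for w in ols_weights]

print("L1:", lasso_weights)  # small weights become exactly zero
print("L2:", ridge_weights)  # every weight shrinks, none reaches zero
```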

Question 4

How do you handle class imbalance in a classification problem?

Accepted Answer

Options: resample the training data (oversample minority with SMOTE, undersample majority), use class weights in the loss function, change the evaluation metric (use precision-recall AUC instead of accuracy), or use anomaly detection approaches. The right choice depends on how severe the imbalance is and the cost of false positives vs. false negatives.
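
As one concrete option, class weights can be derived from label frequencies. The sketch below uses the common "balanced" heuristic, n_samples / (n_classes * class_count); the 95/5 label split is invented for illustration.

```python
from collections import Counter

def balanced_class_weights(labels):
    # Weight each class inversely to its frequency so that every class
    # contributes the same total weight to the loss.
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * count) for c, count in counts.items()}

labels = [0] * 95 + [1] * 5  # assumed 95/5 imbalance
weights = balanced_class_weights(labels)
print(weights)
```

With these weights, a misclassified minority example costs far more than a majority one, which pushes the model away from always predicting the majority class.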

Question 5

Explain the difference between correlation and causation. Give a real example.

Accepted Answer

Correlation means two variables move together; causation means one directly influences the other. Ice cream sales and drowning deaths are correlated because both rise in hot weather, a common cause (confounder), but neither causes the other. Establishing causation requires controlled experiments (A/B tests) or careful causal inference methods (instrumental variables, difference-in-differences).
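
The ice cream example can be simulated. In the sketch below, all numbers are synthetic and the variable names are illustrative: temperature drives both series, so they correlate strongly, yet the correlation largely disappears once the linear effect of temperature is removed from each.

```python
import random

def pearson(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def residuals(ys, xs):
    # What is left of ys after subtracting a least-squares line fitted on xs.
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]

rng = random.Random(0)
temperature = [rng.uniform(0, 30) for _ in range(2000)]          # confounder
ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temperature]     # caused by temperature
drownings = [0.5 * t + rng.gauss(0, 2) for t in temperature]     # caused by temperature

raw_corr = pearson(ice_cream, drownings)
partial_corr = pearson(residuals(ice_cream, temperature),
                       residuals(drownings, temperature))
print(f"raw correlation: {raw_corr:.2f}")
print(f"after controlling for temperature: {partial_corr:.2f}")
```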

Question 6

What is gradient descent, and what are its variants?

Accepted Answer

Gradient descent iteratively adjusts model parameters in the direction of the negative gradient, which reduces the loss function. Batch GD uses all training data per step (stable but slow). Stochastic GD (SGD) uses one sample per step (fast but noisy). Mini-batch GD uses a small batch per step (the usual compromise between the two). Variants like Adam, RMSProp, and Adagrad adapt the learning rate per parameter.
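
The three basic variants differ only in batch size, which a single routine can show. This is a minimal sketch on a one-feature linear regression with noiseless synthetic data; the learning rate and epoch count are assumptions picked so that all three settings converge.

```python
import random

def minibatch_gd(xs, ys, batch_size, lr=0.05, epochs=2000, seed=0):
    # batch_size == len(xs) gives batch GD, batch_size == 1 gives SGD.
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error over the current batch.
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

xs = [i / 10 for i in range(20)]
ys = [3.0 * x + 1.0 for x in xs]  # synthetic data: slope 3, intercept 1

fits = {size: minibatch_gd(xs, ys, batch_size=size) for size in (1, 5, 20)}
for size, (w, b) in fits.items():
    print(f"batch_size={size:2d}  w={w:.3f}  b={b:.3f}")
```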

Question 7

How would you handle missing data in a dataset?

Accepted Answer

First, understand why data is missing (MCAR, MAR, MNAR). Options: drop rows (only if MCAR and small fraction), impute with mean/median/mode (simple but loses variance), use model-based imputation (KNN, iterative imputer), or create a 'missing' indicator feature. For tree-based models, some implementations handle missing values natively.
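
Two of these options, simple imputation and a missing indicator, are often combined. A minimal sketch, assuming missing values are represented as None and using a made-up column of ages:

```python
from statistics import median

def impute_with_indicator(values):
    # Fill gaps with the median of the observed values and keep a 0/1 flag
    # so a model can still learn from the fact that a value was missing.
    observed = [v for v in values if v is not None]
    fill = median(observed)
    imputed = [fill if v is None else v for v in values]
    was_missing = [1 if v is None else 0 for v in values]
    return imputed, was_missing

ages = [25, None, 31, 40, None, 29]  # hypothetical column
imputed, was_missing = impute_with_indicator(ages)
print(imputed)
print(was_missing)
```

In a real pipeline the fill value should be computed on the training split only and reused for validation and test data, to avoid leakage.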

Question 8

Explain cross-validation and why it is preferred over a simple train/test split.

Accepted Answer

Cross-validation (e.g., k-fold) splits data into k parts, trains on k-1 folds and validates on the remaining fold, rotating k times. Averaging the k scores gives a more reliable estimate of model performance because every data point is used for validation exactly once and for training k-1 times. A single train/test split can be misleading if the split is unrepresentative.
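
The rotation can be sketched as an index generator. The function name and the shuffling scheme are illustrative; libraries offer equivalent, more complete splitters.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    # Shuffle once, deal the indices into k folds, then let each fold
    # take one turn as the validation set.
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val

splits = list(k_fold_indices(n_samples=10, k=5))
for train, val in splits:
    print("train:", sorted(train), "validate:", sorted(val))
```

The model is fitted on each train list and scored on the matching validation list; the k scores are then averaged.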

Question 9

What is feature engineering, and how do you approach it?

Accepted Answer

Feature engineering is creating new input features from raw data to improve model performance. Approaches: domain knowledge (creating ratios, time-since features), interaction terms, binning continuous variables, encoding categoricals (one-hot, target encoding), date/time decomposition, and text features (TF-IDF, embeddings). It is often the highest-leverage activity in an ML project.
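
A few of these approaches applied to a single record might look like the sketch below. The record layout, field names, and category list are hypothetical, chosen only to show a time-since feature, date decomposition, a ratio, and one-hot encoding.

```python
from datetime import date

def engineer_features(row, today, categories):
    signup = date.fromisoformat(row["signup_date"])
    features = {
        "days_since_signup": (today - signup).days,        # time-since feature
        "signup_month": signup.month,                      # date decomposition
        "signup_is_weekend": int(signup.weekday() >= 5),   # Saturday=5, Sunday=6
        # Ratio feature; max(..., 1) guards against division by zero.
        "spend_per_order": row["total_spend"] / max(row["n_orders"], 1),
    }
    for category in categories:                            # one-hot encoding
        features[f"plan_{category}"] = int(row["plan"] == category)
    return features

row = {"signup_date": "2024-03-02", "total_spend": 120.0, "n_orders": 4, "plan": "pro"}
features = engineer_features(row, today=date(2024, 4, 1),
                             categories=["free", "pro", "team"])
print(features)
```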

Question 10

Describe how a random forest works and when you would choose it over a single decision tree.

Accepted Answer

A random forest builds many decision trees, each trained on a bootstrapped sample and restricted to a random subset of features at each split, and combines their predictions (averaging for regression, majority vote for classification). This reduces overfitting (lower variance) compared to a single tree. Random forests are robust to noise, handle non-linear relationships, and require minimal tuning. Choose a single tree only when interpretability is the top priority.
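
The mechanics can be sketched with a deliberately simplified ensemble. This is not a full random forest: each "tree" is a depth-1 stump, and the random feature subset is drawn once per tree rather than at every split. The toy data and all function names are assumptions for illustration.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature_ids):
    # Pick the (feature, threshold) split with the fewest training errors.
    best = None
    for f in feature_ids:
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:  # no usable split: always predict the majority class
        majority = Counter(labels).most_common(1)[0][0]
        return (feature_ids[0], float("inf"), majority, majority)
    return best[1:]

def stump_predict(stump, row):
    f, threshold, left_label, right_label = stump
    return left_label if row[f] <= threshold else right_label

def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]             # bootstrap sample
        feature_ids = rng.sample(range(len(rows[0])), n_features)  # random features
        forest.append(fit_stump([rows[i] for i in sample],
                                [labels[i] for i in sample], feature_ids))
    return forest

def forest_predict(forest, row):
    # Majority vote across all trees.
    votes = Counter(stump_predict(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]

rows = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.15, 0.25),
        (0.8, 0.9), (0.9, 0.8), (0.7, 0.7), (0.85, 0.75)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(rows, labels)
print(forest_predict(forest, (0.1, 0.1)), forest_predict(forest, (0.9, 0.9)))
```

Because each stump sees a different resample and a different feature, their individual errors are only partly correlated, and the vote is more stable than any single stump.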

Question 11

How do you evaluate a classification model beyond accuracy?

Accepted Answer

Use: precision (of predicted positives, how many are correct), recall (of actual positives, how many were found), F1 score (harmonic mean of precision and recall), ROC-AUC (threshold-independent ranking quality), and the confusion matrix. For heavily imbalanced classes, precision-recall AUC is often more informative than ROC-AUC, because ROC-AUC can look strong while precision on the minority class is poor. Choose the metric that aligns with the business cost of errors.
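
The threshold-based metrics follow directly from the confusion-matrix counts. A minimal sketch for binary labels, with an invented set of predictions on an imbalanced sample:

```python
def classification_metrics(y_true, y_pred):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]  # 3 positives, 7 negatives
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Here accuracy comes out well above recall, which illustrates why accuracy alone hides missed positives.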

Question 12

What is the curse of dimensionality, and how do you mitigate it?

Accepted Answer

As the number of features grows, the volume of the feature space increases exponentially, making data sparse. Distance metrics become less meaningful, and models need exponentially more data to generalize. Mitigate with: feature selection (remove irrelevant features), dimensionality reduction (PCA, UMAP), regularization, and domain knowledge to limit feature count.
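
The loss of meaning in distance metrics can be observed directly. The sketch below draws uniform random points (an assumption; real data has more structure) and measures the relative gap between the farthest and nearest neighbour of a query point as the dimension grows.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    # (farthest - nearest) / nearest distance from one query point.
    # Values near zero mean all points look almost equally far away.
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)

contrast = {dim: relative_contrast(dim) for dim in (2, 10, 100, 1000)}
for dim, value in contrast.items():
    print(f"dim={dim:4d}  relative contrast={value:.2f}")
```

As the contrast shrinks, nearest-neighbour style reasoning degrades, which is one motivation for feature selection and dimensionality reduction.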

