Data Scientist interview questions

Data science interview questions covering statistics, machine learning, experimental design, feature engineering, and communicating results.

12 questions
4 easy, 7 medium, 1 hard

1. Explain the bias-variance trade-off.

medium
How to approach this: Bias is error from overly simple assumptions (underfitting). Variance is error from sensitivity to fluctuations in the training data (overfitting). As model complexity increases, bias typically decreases while variance increases, so the goal is the level of complexity that minimizes total error on unseen data. Cross-validation helps locate that point; regularization and ensemble methods help control variance.
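
The trade-off is usually stated through the standard decomposition of expected squared error. The notation here is an assumption introduced for illustration: data are generated as $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, and $\hat{f}$ is a model fit on a randomly drawn training set.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Only the first two terms depend on the model; the noise term sets a floor that no choice of complexity can remove.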

2. How would you design an A/B test to measure the impact of a new recommendation algorithm?

medium
How to approach this: Define the primary metric up front (CTR, revenue, engagement time). Calculate the required sample size from the minimum detectable effect, the significance level (commonly 0.05), and power (commonly 0.8). Randomly assign users to control and treatment groups. Run for at least one full business cycle, and check for novelty effects. For analysis, use a two-sample t-test or Mann-Whitney U test for continuous metrics, or a two-proportion z-test for rates such as CTR.
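
The sample size step can be sketched with the usual normal-approximation formula for comparing two proportions. This is a minimal sketch, assuming a two-sided test on a rate metric such as CTR; the function name and the 10% and 12% rates are hypothetical values chosen for illustration.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.8):
    """Approximate users needed per group for a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_power = NormalDist().inv_cdf(power)          # quantile matching the desired power
    variance_sum = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control               # minimum detectable effect (absolute)
    return math.ceil((z_alpha + z_power) ** 2 * variance_sum / effect ** 2)


# Hypothetical baseline CTR of 10% and a smallest lift worth detecting of 2 points
n_per_group = sample_size_per_group(0.10, 0.12)
```

Halving the detectable effect roughly quadruples the required sample, which is why the minimum detectable effect should be agreed on before the test starts.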

3. What is the difference between L1 and L2 regularization?

medium
How to approach this: L1 (Lasso) adds a penalty proportional to the sum of absolute weights to the loss function, which drives some weights to exactly zero (implicit feature selection). L2 (Ridge) adds a penalty proportional to the sum of squared weights, shrinking all weights toward zero but rarely eliminating any. L1 produces sparse models; L2 produces small but dense weights. Elastic Net combines both penalties.
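
The difference in sparsity is easiest to see on a single weight. The sketch below assumes a simplified one-dimensional problem in which the penalized weight is pulled away from its unpenalized value `w_ols`; the function names and the example weights are hypothetical.

```python
def soft_threshold(w_ols, lam):
    """L1 solution for one weight: minimizes 0.5*(w - w_ols)**2 + lam*abs(w)."""
    if w_ols > lam:
        return w_ols - lam
    if w_ols < -lam:
        return w_ols + lam
    return 0.0  # weights smaller than lam in magnitude are set exactly to zero


def ridge_shrink(w_ols, lam):
    """L2 solution for one weight: minimizes 0.5*(w - w_ols)**2 + 0.5*lam*w**2."""
    return w_ols / (1.0 + lam)  # proportional shrinkage, never exactly zero unless w_ols is


unpenalized = [2.0, -0.3, 0.1, -1.5]  # hypothetical unpenalized weights
lam = 0.5
l1_weights = [soft_threshold(w, lam) for w in unpenalized]
l2_weights = [ridge_shrink(w, lam) for w in unpenalized]
```

With these inputs the L1 rule zeroes the two small weights, while the L2 rule keeps all four non-zero but smaller.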

4. How do you handle class imbalance in a classification problem?

medium
How to approach this: Options include resampling the training data (oversampling the minority class, for example with SMOTE, or undersampling the majority class), using class weights in the loss function, changing the evaluation metric (precision-recall AUC instead of accuracy), or reframing the problem as anomaly detection. The right choice depends on how severe the imbalance is and on the relative cost of false positives and false negatives.
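
Class weights are often set inversely proportional to class frequency. The sketch below uses the inverse-frequency heuristic that scikit-learn documents for `class_weight="balanced"`; the function name and the 90/10 label split are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


labels = [0] * 90 + [1] * 10  # hypothetical 90/10 imbalance
weights = balanced_class_weights(labels)
```

Each class then contributes the same total weight to the loss, so errors on the minority class are no longer drowned out by the majority.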

5. Explain the difference between correlation and causation. Give a real example.

easy
How to approach this: Correlation means two variables move together; causation means a change in one produces a change in the other. Ice cream sales and drowning deaths are correlated because both rise in summer: hot weather is a confounder that drives both, and neither causes the other. Establishing causation requires randomized controlled experiments (A/B tests) or careful causal inference methods (instrumental variables, difference-in-differences).
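
A small simulation can make the ice cream example concrete. This is an illustrative sketch on synthetic data, not real figures: temperature is assumed to drive both series linearly, and all coefficients and noise levels are made up.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, xs):
    """Remove the linear effect of xs from ys using simple least squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]


rng = random.Random(0)
temperature = [rng.uniform(0, 30) for _ in range(5000)]       # the confounder
ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temperature]  # depends only on temperature
drownings = [0.5 * t + rng.gauss(0, 2) for t in temperature]  # depends only on temperature

raw_corr = pearson(ice_cream, drownings)
adjusted_corr = pearson(residuals(ice_cream, temperature), residuals(drownings, temperature))
```

The raw correlation is strong even though neither series influences the other; once the effect of temperature is removed from both, the remaining correlation is close to zero.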

6. What is gradient descent, and what are its variants?

medium
How to approach this: Gradient descent iteratively adjusts model parameters in the direction of the negative gradient, which locally reduces the loss function. Batch GD uses all training data per step (stable but slow). Stochastic GD (SGD) uses one sample per step (fast but noisy). Mini-batch GD uses a small batch per step (usually the best balance). Optimizers such as Adam, RMSProp, and Adagrad adapt the learning rate per parameter.
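
The three basic variants differ only in how many samples feed each update. The sketch below fits a one-variable linear model on noiseless toy data; the function name, learning rate, and data are hypothetical choices for illustration.

```python
import random


def minibatch_gd(xs, ys, batch_size, lr=0.1, epochs=200, seed=0):
    """Fit y ~ w*x + b by minimizing mean squared error with mini-batch updates."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # new sample order every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            errors = [(w * xs[i] + b) - ys[i] for i in batch]
            # Gradient of the batch MSE with respect to w and b
            grad_w = 2 * sum(e * xs[i] for e, i in zip(errors, batch)) / len(batch)
            grad_b = 2 * sum(errors) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


data_rng = random.Random(1)
xs = [data_rng.uniform(-1, 1) for _ in range(200)]
ys = [3.0 * x + 1.0 for x in xs]  # noiseless toy data with slope 3 and intercept 1

w, b = minibatch_gd(xs, ys, batch_size=20)                 # mini-batch GD
w_full, b_full = minibatch_gd(xs, ys, batch_size=len(xs))  # batch GD
```

Setting `batch_size=1` gives stochastic GD, and `batch_size=len(xs)` gives batch GD, so one loop covers all three variants.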

7. How would you handle missing data in a dataset?

easy
How to approach this: First, understand why data is missing: completely at random (MCAR), at random given observed variables (MAR), or not at random (MNAR). Options: drop rows (only if MCAR and a small fraction of the data), impute with the mean, median, or mode (simple but understates variance), use model-based imputation (KNN, iterative imputer), or add a 'missing' indicator feature. Some tree-based implementations handle missing values natively.
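
Simple imputation and the indicator feature are often combined. This is a minimal sketch, assuming missing values are represented as `None`; the function name and the example values are hypothetical.

```python
from statistics import median


def impute_with_indicator(values):
    """Fill missing entries with the median and flag which entries were filled."""
    observed = [v for v in values if v is not None]
    fill = median(observed)  # in practice, compute this on the training split only
    imputed = [fill if v is None else v for v in values]
    was_missing = [1 if v is None else 0 for v in values]
    return imputed, was_missing


incomes = [42.0, None, 55.0, None, 38.0]  # hypothetical column with two gaps
imputed, was_missing = impute_with_indicator(incomes)
```

Keeping the indicator lets a model learn from the fact that a value was absent, which matters when the data is not missing completely at random.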

8. Explain cross-validation and why it is preferred over a simple train/test split.

easy
How to approach this: Cross-validation (e.g., k-fold) splits the data into k parts, trains on k-1 folds and validates on the remaining fold, rotating k times and averaging the k scores. This gives a more reliable estimate of model performance because every data point is used for validation exactly once and for training k-1 times. A single train/test split can be misleading if the split is unrepresentative.
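
The rotation of folds can be sketched as an index generator. This is a minimal illustration with a hypothetical function name; a real project would normally use a library splitter, and stratified or grouped splitting where the data calls for it.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


splits = list(k_fold_indices(n_samples=10, k=5))
```

Each index lands in exactly one validation fold, so the k validation scores together cover the whole dataset.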

9. What is feature engineering, and how do you approach it?

medium
How to approach this: Feature engineering is creating new input features from raw data to improve model performance. Approaches: domain knowledge (creating ratios, time-since features), interaction terms, binning continuous variables, encoding categoricals (one-hot, target encoding), date/time decomposition, and text features (TF-IDF, embeddings). It is often the highest-leverage activity in an ML project.
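
A few of these approaches can be shown on a single record. The sketch below assumes a made-up order record; every field name and value is hypothetical.

```python
from datetime import datetime


def engineer_features(order, categories):
    """Build model inputs from one raw record (all field names are hypothetical)."""
    placed = datetime.fromisoformat(order["placed_at"])
    signup = datetime.fromisoformat(order["signup_at"])
    features = {
        "day_of_week": placed.weekday(),                       # date/time decomposition, 0 = Monday
        "hour": placed.hour,
        "days_since_signup": (placed - signup).days,           # time-since feature
        "price_per_item": order["total"] / order["n_items"],   # ratio feature
    }
    for cat in categories:                                     # one-hot encoding
        features[f"channel_{cat}"] = 1 if order["channel"] == cat else 0
    return features


order = {
    "placed_at": "2024-03-15T18:30:00",
    "signup_at": "2024-01-15T09:00:00",
    "total": 60.0,
    "n_items": 4,
    "channel": "mobile",
}
features = engineer_features(order, categories=["web", "mobile", "store"])
```

Passing the category list explicitly keeps the one-hot columns identical between training and inference, even when a given record never shows some categories.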

10. Describe how a random forest works and when you would choose it over a single decision tree.

medium
How to approach this: A random forest builds many decision trees, each trained on a bootstrapped sample and restricted to a random subset of features at each split, then averages their predictions (or takes a majority vote for classification). This reduces overfitting (lower variance) compared to a single tree. Random forests are robust to noise, handle non-linear relationships, and need little tuning. Choose a single tree only when interpretability is the top priority.
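
The mechanics (bootstrap sample, random feature subset, combined vote) can be sketched with a heavily simplified stand-in: depth-1 trees (stumps) instead of full trees, and majority voting for classification. All function names and the toy data are hypothetical, and this is not how a production library implements the algorithm.

```python
import random
from collections import Counter


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, feature_ids):
    """Best single split (feature, threshold) by training errors, searched over feature_ids."""
    best = None
    for f in feature_ids:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between neighboring values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no split possible: always predict the majority class
        m = majority(y)
        return (feature_ids[0], float("inf"), m, m)
    return best[1:]


def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(len(X)) for _ in range(len(X))]   # bootstrap sample
        feature_ids = rng.sample(range(len(X[0])), n_features)  # random feature subset
        forest.append(fit_stump([X[i] for i in rows], [y[i] for i in rows], feature_ids))
    return forest


def predict(forest, row):
    votes = [l_lab if row[f] <= t else r_lab for f, t, l_lab, r_lab in forest]
    return majority(votes)  # majority vote across trees


# Two well-separated toy clusters
X = [[0.3 * i, 0.3 * (9 - i)] for i in range(10)] + [[7 + 0.3 * i, 7 + 0.3 * (9 - i)] for i in range(10)]
y = [0] * 10 + [1] * 10
forest = fit_forest(X, y)
```

Because each stump sees a different resample and feature, their individual thresholds vary; the vote smooths out that variation, which is the variance reduction described above.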

11. How do you evaluate a classification model beyond accuracy?

easy
How to approach this: Use precision (of predicted positives, how many are correct), recall (of actual positives, how many were found), F1 score (harmonic mean of precision and recall), ROC-AUC (threshold-independent ranking quality), and the confusion matrix. For imbalanced classes, precision-recall AUC is more informative than ROC-AUC. Choose the metric that aligns with the business cost of errors.
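
The threshold-dependent metrics all derive from the four confusion-matrix counts. The sketch below computes them for binary labels; the function name and the example predictions are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix counts plus precision, recall, and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall, "f1": f1}


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

Returning 0.0 when a denominator is empty is a convention chosen here for the sketch; other conventions exist, so it is worth stating which one is in use when reporting results.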

12. What is the curse of dimensionality, and how do you mitigate it?

hard
How to approach this: As the number of features grows, the volume of the feature space increases exponentially, so a fixed amount of data becomes sparse. Distance metrics become less meaningful, and models need exponentially more data to generalize. Mitigate with feature selection (remove irrelevant features), dimensionality reduction (PCA, UMAP), regularization, and domain knowledge to limit the feature count.
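
The loss of meaning in distance metrics can be observed directly. This is an illustrative sketch on uniformly random points, with a hypothetical function name; it measures how much farther the farthest point is than the nearest one, relative to the nearest.

```python
import math
import random


def distance_contrast(n_dims, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from one query point to random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(n_dims)]
    dists = [
        math.dist(query, [rng.random() for _ in range(n_dims)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)


contrast_low = distance_contrast(n_dims=2)     # nearest neighbor is much closer than farthest
contrast_high = distance_contrast(n_dims=500)  # all points are at nearly the same distance
```

When the contrast shrinks toward zero, "nearest" neighbors are barely nearer than anything else, which undermines distance-based methods such as KNN and clustering.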
