1. Explain the difference between supervised, unsupervised, and reinforcement learning.
Difficulty: easy
How to approach this: Supervised learning learns from labeled data (classification, regression). Unsupervised learning finds patterns in unlabeled data (clustering, dimensionality reduction). Reinforcement learning learns by interacting with an environment and receiving rewards (game playing, robotics). Semi-supervised learning (a small labeled set plus a large unlabeled one) and self-supervised learning (labels derived from the data itself) sit between supervised and unsupervised learning and are increasingly common.
2. How would you handle a model that performs well on training data but poorly on test data?
Difficulty: medium
How to approach this: This is overfitting. Diagnose it by plotting learning curves (training vs. validation loss over epochs): training loss keeps falling while validation loss flattens or rises. Remedies: add more training data, use data augmentation, reduce model complexity (fewer layers, lower polynomial degree), apply regularization (L1/L2, dropout), or use early stopping.
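As a minimal sketch of early stopping driven by the validation curve, the function below scans a list of per-epoch validation losses and reports where training would stop. The function name, the `patience` and `min_delta` parameters, and the loss values are illustrative assumptions, not output from a real training run.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once the validation loss has failed to improve by more
    than `min_delta` for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    # Never triggered: training ran through the last epoch.
    return len(val_losses) - 1, best_epoch


# Made-up curves showing the overfitting pattern: training loss keeps
# falling while validation loss bottoms out and then rises.
train_losses = [0.90, 0.60, 0.45, 0.35, 0.28, 0.22, 0.18]
val_losses = [0.95, 0.70, 0.58, 0.55, 0.57, 0.61, 0.66]

stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=2)
print(f"stop at epoch {stop_epoch}, restore weights from epoch {best_epoch}")
```

In a real training loop you would also checkpoint the weights at the best epoch so they can be restored after stopping.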
3. What is the transformer architecture, and why did it revolutionize NLP?
Difficulty: hard
How to approach this: Transformers use self-attention to process all tokens in parallel (unlike RNNs, which process them sequentially). The attention mechanism lets the model weigh the importance of each token relative to every other token. This enables massive parallelization during training, better modeling of long-range dependencies, and transfer learning via pre-training (BERT, GPT). Transformers are now the dominant architecture in NLP and are widely used in vision and other domains.
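The core of self-attention can be sketched in a few lines of plain Python. This is a simplified illustration of scaled dot-product attention, assuming the token vectors are used directly as queries, keys, and values; a real transformer first applies learned projection matrices and uses multiple heads. The token vectors are made up.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Score this token against every token, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Three made-up 2-dimensional token vectors (assumption: no learned projections).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, tokens, tokens)
for i, row in enumerate(weights):
    print(f"token {i} attends with weights {[round(w, 3) for w in row]}")
```

Each row of `weights` is computed independently of the others, which is why the per-token work can run in parallel rather than step by step as in an RNN.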
4. How do you choose between precision and recall for a given problem?
Difficulty: medium
How to approach this: Optimize precision when false positives are costly (spam filter: users hate losing real emails to the spam folder). Optimize recall when false negatives are costly (cancer screening: missing a case is dangerous). Use the F1 score (the harmonic mean of precision and recall) when you need a balance. In practice, plot the precision-recall curve and choose the threshold that matches your business requirements.
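The threshold trade-off can be sketched as follows. The labels and scores are made up, and the function name is an assumption for illustration; raising the threshold trades recall for precision.

```python
def precision_recall_f1(y_true, scores, threshold):
    """Compute precision, recall, and F1 when predicting 1 for score >= threshold."""
    tp = fp = fn = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 1:
            fn += 1
    # Convention (an assumption here): precision is 1.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up ground truth and model scores.
y_true = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.95, 0.85, 0.70, 0.65, 0.40, 0.30, 0.20, 0.10]

for threshold in (0.3, 0.5, 0.8):
    p, r, f1 = precision_recall_f1(y_true, scores, threshold)
    print(f"threshold={threshold}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```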
5. Explain how backpropagation works in neural networks.
Difficulty: hard
How to approach this: Backpropagation computes the gradient of the loss function with respect to each weight by applying the chain rule, working backwards from the output layer. These gradients indicate how to adjust each weight to reduce the loss. An optimizer such as gradient descent then uses them to update the weights, and repeating this over many batches improves the network. Understanding the chain rule and computation graphs is key to debugging training issues.
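To make the chain rule concrete, here is a minimal sketch for a hypothetical two-weight network, `y_hat = w2 * tanh(w1 * x)`, with squared-error loss. The network, the starting weights, and the learning rate are all assumptions chosen for illustration. The analytic gradients are compared with finite differences, a common way to debug a backward pass.

```python
import math


def forward(w1, w2, x):
    z = w1 * x            # pre-activation
    h = math.tanh(z)      # hidden activation
    y_hat = w2 * h        # output
    return z, h, y_hat


def loss_fn(w1, w2, x, y):
    _, _, y_hat = forward(w1, w2, x)
    return 0.5 * (y_hat - y) ** 2


def backward(w1, w2, x, y):
    """Apply the chain rule from the loss back to each weight."""
    _, h, y_hat = forward(w1, w2, x)
    d_yhat = y_hat - y               # dL/dy_hat
    d_w2 = d_yhat * h                # dL/dw2 = dL/dy_hat * dy_hat/dw2
    d_h = d_yhat * w2                # dL/dh
    d_z = d_h * (1.0 - h ** 2)       # tanh'(z) = 1 - tanh(z)^2
    d_w1 = d_z * x                   # dL/dw1 = dL/dz * dz/dw1
    return d_w1, d_w2


w1, w2, x, y = 0.5, -0.3, 1.5, 1.0
grad_w1, grad_w2 = backward(w1, w2, x, y)

# Finite-difference check of the analytic gradients.
eps = 1e-6
numeric_w1 = (loss_fn(w1 + eps, w2, x, y) - loss_fn(w1 - eps, w2, x, y)) / (2 * eps)
numeric_w2 = (loss_fn(w1, w2 + eps, x, y) - loss_fn(w1, w2 - eps, x, y)) / (2 * eps)

# Plain gradient descent using the backpropagated gradients.
initial_loss = loss_fn(w1, w2, x, y)
learning_rate = 0.1
for _ in range(100):
    g1, g2 = backward(w1, w2, x, y)
    w1 -= learning_rate * g1
    w2 -= learning_rate * g2
final_loss = loss_fn(w1, w2, x, y)
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```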
6. What is transfer learning, and when is it effective?
Difficulty: medium
How to approach this: Transfer learning takes a model pre-trained on a large dataset (ImageNet, Common Crawl) and fine-tunes it on your smaller, task-specific dataset. It is effective when you have limited labeled data, the source domain is similar to your target domain, and the pre-trained model has learned general features. For small datasets, freeze the earlier layers and fine-tune only the last few.
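The freeze-and-fine-tune idea can be sketched without a deep learning framework. In this toy illustration, `FROZEN_WEIGHTS` is a hypothetical stand-in for pre-trained layers and is never updated; only a small logistic-regression head is trained on the new task. The weights, data, and hyperparameters are all made up. In practice you would load a real pre-trained model in your framework of choice.

```python
import math

# Hypothetical stand-in for pre-trained layers: fixed weights, never updated.
FROZEN_WEIGHTS = [[1.0, -1.0], [0.5, 0.5]]


def frozen_features(x):
    """Run the frozen layers to turn a raw input into features."""
    return [
        math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_WEIGHTS
    ]


def train_head(examples, epochs=200, learning_rate=0.5):
    """Train only the new output layer (logistic regression) on frozen features."""
    head_w = [0.0] * len(FROZEN_WEIGHTS)
    head_b = 0.0
    for _ in range(epochs):
        for x, label in examples:
            feats = frozen_features(x)
            logit = sum(w * f for w, f in zip(head_w, feats)) + head_b
            prob = 1.0 / (1.0 + math.exp(-logit))
            error = prob - label  # gradient of the log loss w.r.t. the logit
            head_w = [w - learning_rate * error * f for w, f in zip(head_w, feats)]
            head_b -= learning_rate * error
    return head_w, head_b


def predict(x, head_w, head_b):
    feats = frozen_features(x)
    logit = sum(w * f for w, f in zip(head_w, feats)) + head_b
    return 1 if logit >= 0 else 0


# Tiny made-up target task: label is 1 when the first input exceeds the second.
examples = [([2.0, 0.0], 1), ([1.5, 0.5], 1), ([0.0, 2.0], 0), ([0.5, 1.5], 0)]
frozen_before = [row[:] for row in FROZEN_WEIGHTS]
head_w, head_b = train_head(examples)
print([predict(x, head_w, head_b) for x, _ in examples])
```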
7. How would you deploy a machine learning model to production?
Difficulty: medium
How to approach this: Package the model (ONNX, TorchScript, SavedModel) and serve it via an API (Flask, FastAPI, TF Serving). Use a model registry for versioning. Implement A/B testing to compare model versions. Monitor prediction latency, throughput, and data drift. Set up automated retraining pipelines triggered by performance degradation. Provide a graceful fallback for when the model service is down.
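The fallback and latency-monitoring points can be sketched independently of any serving framework. The wrapper below is a minimal illustration; `predict_with_fallback`, `healthy_model`, and `broken_model` are hypothetical names, and a real service would log the latency and failure to a monitoring system rather than return them.

```python
import time


def predict_with_fallback(model_predict, features, fallback_value):
    """Call the model, record latency, and fall back to a default on failure."""
    start = time.perf_counter()
    try:
        prediction, source = model_predict(features), "model"
    except Exception:
        # Broad catch is deliberate in this sketch: any model failure
        # should degrade to the fallback instead of failing the request.
        prediction, source = fallback_value, "fallback"
    latency_ms = (time.perf_counter() - start) * 1000.0
    return {"prediction": prediction, "source": source, "latency_ms": latency_ms}


def healthy_model(features):
    """Hypothetical model: averages its inputs."""
    return sum(features) / len(features)


def broken_model(features):
    """Hypothetical model whose backing service is unreachable."""
    raise ConnectionError("model service unavailable")


ok = predict_with_fallback(healthy_model, [0.2, 0.4], fallback_value=0.0)
degraded = predict_with_fallback(broken_model, [0.2, 0.4], fallback_value=0.0)
print(ok["source"], degraded["source"])
```

A sensible fallback value depends on the product: a cached prediction, a popularity-based default, or a simple rule are common choices.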
8. What is data drift, and how do you detect it?
Difficulty: medium
How to approach this: Data drift occurs when the distribution of input data in production differs from the distribution of the training data, which degrades model performance over time. Detect it by monitoring feature statistics (mean, variance, quantiles) over time, applying statistical tests or distance measures (Kolmogorov-Smirnov test, Population Stability Index), and tracking the distribution of model predictions. When drift is detected, retrain on recent data.
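As a minimal sketch of one of these checks, the function below computes the Population Stability Index for a single numeric feature, using equal-width bins derived from the training data. The bin count, the small floor that avoids `log(0)`, and the sample data are assumptions for illustration. Alert thresholds are a convention to tune per feature; values above roughly 0.25 are often treated as a significant shift.

```python
import math


def psi(expected, actual, n_bins=4, floor=1e-6):
    """Population Stability Index between a training and a production sample."""
    lo, hi = min(expected), max(expected)
    if hi == lo:
        raise ValueError("expected sample has no spread; cannot build bins")
    width = (hi - lo) / n_bins

    def bin_fractions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # clip out-of-range values
            counts[idx] += 1
        # Floor empty bins so the logarithm below stays defined.
        return [max(c / len(values), floor) for c in counts]

    e = bin_fractions(expected)
    a = bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Made-up feature values: production is shifted upward relative to training.
training = [float(v) for v in range(20)]
production_same = list(training)
production_shifted = [v + 10.0 for v in training]

print(f"no drift: {psi(training, production_same):.3f}")
print(f"shifted:  {psi(training, production_shifted):.3f}")
```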
9. Explain the difference between batch prediction and real-time prediction.
Difficulty: easy
How to approach this: Batch prediction precomputes predictions for all items periodically (e.g., nightly recommendations). It suits high-throughput, non-time-sensitive workloads whose features change slowly. Real-time prediction computes each prediction on demand, per request. It suits time-sensitive decisions (fraud detection) and features that change rapidly. Many systems use both: batch for a baseline, real-time for adjustments.
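The combined pattern (batch for a baseline, real-time for adjustments) might look like the sketch below. The scoring formulas, function names, and user data are hypothetical; the point is only the split between work done ahead of time and work done per request.

```python
def baseline_score(profile):
    """Hypothetical slow-changing score, cheap to precompute for every user."""
    return 0.1 * profile["purchases"]


def run_nightly_batch(profiles):
    """Batch path: score every known user once and store the results."""
    return {user_id: baseline_score(p) for user_id, p in profiles.items()}


def serve_request(user_id, session_clicks, batch_scores, default=0.0):
    """Real-time path: look up the baseline, then adjust with fresh signals."""
    baseline = batch_scores.get(user_id, default)  # unknown users get a default
    adjustment = 0.05 * session_clicks             # hypothetical per-request signal
    return baseline + adjustment


profiles = {"alice": {"purchases": 10}, "bob": {"purchases": 2}}
batch_scores = run_nightly_batch(profiles)

print(serve_request("alice", session_clicks=4, batch_scores=batch_scores))
print(serve_request("carol", session_clicks=2, batch_scores=batch_scores))
```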
10. What are embeddings, and why are they useful?
Difficulty: medium
How to approach this: Embeddings are dense, relatively low-dimensional vector representations of discrete entities (words, users, products), learned so that similar items have similar vectors. They enable measuring similarity (cosine similarity), feeding categorical data into neural networks, transfer learning (pre-trained word2vec, GloVe), and efficient nearest-neighbor search. They are the foundation of modern recommendation and search systems.
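A minimal sketch of similarity search over embeddings follows. The three-dimensional vectors are made up for illustration; real embeddings are learned and usually have hundreds of dimensions, and large-scale search uses approximate nearest-neighbor indexes rather than the brute-force scan shown here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, embeddings):
    """Brute-force nearest neighbor of `query` among the other entries."""
    candidates = [name for name in embeddings if name != query]
    return max(
        candidates,
        key=lambda name: cosine_similarity(embeddings[query], embeddings[name]),
    )


# Made-up embeddings, not taken from any trained model.
embeddings = {
    "king": [0.90, 0.80, 0.10],
    "queen": [0.85, 0.90, 0.15],
    "apple": [0.10, 0.20, 0.95],
}

print(nearest("king", embeddings))
```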
11. How do you handle the cold start problem in a recommendation system?
Difficulty: medium
How to approach this: For new users: use content-based recommendations (based on profile data), popularity-based recommendations, or ask onboarding questions. For new items: use item metadata (category, description) for content-based matching. Hybrid approaches combine collaborative filtering (user behavior) with content-based methods. As data accumulates, gradually shift weight toward collaborative signals.
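The gradual shift toward collaborative signals can be sketched as a weighted blend. The linear ramp and the `ramp=20` interaction count are assumptions chosen for illustration; a production system would tune or learn this weighting.

```python
def collaborative_weight(n_interactions, ramp=20):
    """Weight on the collaborative score: 0 for a brand-new user, 1 after `ramp` interactions."""
    return min(n_interactions / ramp, 1.0)


def hybrid_score(content_score, collaborative_score, n_interactions, ramp=20):
    """Blend content-based and collaborative scores based on available history."""
    w = collaborative_weight(n_interactions, ramp)
    return w * collaborative_score + (1.0 - w) * content_score


# Same item scored for users with different amounts of history (made-up scores).
for n in (0, 10, 20, 50):
    print(n, round(hybrid_score(0.8, 0.2, n), 3))
```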
12. What is the difference between bagging and boosting?
Difficulty: medium
How to approach this: Bagging (Bootstrap Aggregating) trains multiple models independently, and therefore in parallel, on bootstrap samples of the data (drawn with replacement), then averages their predictions or takes a majority vote (Random Forest). It mainly reduces variance. Boosting trains models sequentially, with each new model focusing on the errors of the previous ones (XGBoost, AdaBoost). It mainly reduces bias. Boosting often achieves higher accuracy but is more prone to overfitting and slower to train.
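The bagging half can be sketched with a very simple base learner. The example below trains regression stumps (one-split trees) on bootstrap samples and averages their predictions. The stump learner, the toy data, and the parameter values are assumptions for illustration; Random Forest additionally subsamples features and grows deeper trees.

```python
import random


def fit_stump(points):
    """Fit a one-split regression stump on (x, y) pairs by least squared error."""
    best = None
    xs = sorted({x for x, _ in points})
    for left_x, right_x in zip(xs, xs[1:]):
        threshold = (left_x + right_x) / 2
        left = [y for x, y in points if x <= threshold]
        right = [y for x, y in points if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((y - left_mean) ** 2 for y in left)
        error += sum((y - right_mean) ** 2 for y in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:
        # Every x in the sample is identical: predict the mean.
        mean_y = sum(y for _, y in points) / len(points)
        return lambda x: mean_y
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def bagged_predict(points, x, n_models=25, seed=0):
    """Average the predictions of stumps trained on bootstrap samples."""
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_models):
        # Bootstrap: draw len(points) examples with replacement.
        sample = [rng.choice(points) for _ in points]
        predictions.append(fit_stump(sample)(x))
    return sum(predictions) / len(predictions)


# Made-up step-function data: y is 0 for x < 5 and 1 otherwise.
points = [(float(x), 0.0) for x in range(5)] + [(float(x), 1.0) for x in range(5, 10)]

print(bagged_predict(points, 1.0), bagged_predict(points, 8.0))
```

Boosting would instead fit each new stump to the residual errors of the ensemble built so far, which is why its models cannot be trained in parallel.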