Q1
Walk me through how you would approach feature engineering for a dataset with 200+ raw variables. What techniques would you use to reduce dimensionality and avoid overfitting?
Why they ask this: They want to assess your practical understanding of feature selection, domain knowledge application, and awareness of the bias-variance tradeoff in real-world scenarios.
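If you want something concrete to talk through in your answer, a first filtering pass is easy to sketch. The example below is a minimal illustration, not a full feature-engineering workflow: it drops near-constant columns and then drops any column that is almost perfectly correlated with one already kept. The function names, the column layout (a dict of name to list of values), the thresholds, and the toy data are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def prune_features(columns, min_variance=1e-8, max_abs_corr=0.95):
    """Return the names of columns that survive two simple filters.

    columns: dict mapping feature name -> list of numeric values
             (hypothetical layout chosen for this sketch).
    """
    kept = []
    for name, values in columns.items():
        n = len(values)
        mean = sum(values) / n
        variance = sum((v - mean) ** 2 for v in values) / n
        if variance <= min_variance:
            continue  # near-constant: nothing for a model to learn from
        if any(abs(pearson(values, columns[k])) > max_abs_corr for k in kept):
            continue  # redundant with a feature we already kept
        kept.append(name)
    return kept


# Toy data, invented for the example.
data = {
    "age": [23, 35, 47, 51, 62],
    "age_months": [276, 420, 564, 612, 744],  # exactly 12 * age
    "constant_flag": [1, 1, 1, 1, 1],
    "income": [40, 38, 90, 55, 61],
}
kept = prune_features(data)
print(kept)
```

In an interview it is worth adding that any filter like this should be fit on the training folds only, so the selection step does not leak information from the validation data.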
Q2
Explain the difference between L1 and L2 regularization and describe a situation where you would choose one over the other in a production model.
Why they ask this: They're evaluating your grasp of fundamental machine learning concepts and your ability to make informed decisions about model complexity and interpretability.
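A small numerical sketch can help anchor the explanation. It assumes the special case of an orthonormal design matrix, where both penalties have closed-form solutions in terms of the unpenalized least-squares coefficient: with the objective written as half the squared error plus `lam * |w|`, L1 soft-thresholds each coefficient, and with half the squared error plus `(lam / 2) * w**2`, L2 rescales it by `1 / (1 + lam)`. The function names and the example coefficients are made up for illustration.

```python
def lasso_shrink(beta, lam):
    """L1 solution under an orthonormal design: soft-thresholding."""
    magnitude = abs(beta) - lam
    if magnitude <= 0:
        return 0.0  # small coefficients are set exactly to zero
    return magnitude if beta > 0 else -magnitude


def ridge_shrink(beta, lam):
    """L2 solution under an orthonormal design: proportional shrinkage."""
    return beta / (1.0 + lam)


# Hypothetical unpenalized coefficients and penalty strength.
ols = [3.0, -0.4, 0.2, -2.5]
lam = 0.5

l1 = [lasso_shrink(b, lam) for b in ols]
l2 = [ridge_shrink(b, lam) for b in ols]

print("L1:", l1)  # the two small coefficients become exactly zero
print("L2:", l2)  # every coefficient shrinks, none reaches zero
```

This is the usual basis for the production argument: L1 is attractive when you want a sparse model with fewer features to collect and explain, while L2 tends to be preferred when many correlated features each carry some signal and you want stable coefficients.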
Q3
How would you design a data pipeline to handle a streaming dataset with missing values, outliers, and class imbalance for a classification task?
Why they ask this: They want to see your understanding of end-to-end data engineering, preprocessing best practices, and practical problem-solving for real production environments.
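One possible shape for the per-record preprocessing step is sketched below for a single numeric feature. It is a simplified illustration under several assumptions: missing values arrive as `None`, running statistics are kept with Welford's online algorithm, missing values are imputed with the running mean, outliers are clipped to a band of `clip_sigmas` standard deviations, and class imbalance is handled with inverse-frequency sample weights. The class name, method names, and the toy stream are hypothetical, and a real pipeline would also need to address drift, multiple features, and persistence of state.

```python
from collections import Counter
from math import sqrt


class StreamingPreprocessor:
    """Cleans one numeric feature per record and emits a sample weight."""

    def __init__(self, clip_sigmas=3.0):
        self.clip_sigmas = clip_sigmas
        self.count = 0      # number of observed (non-missing) values
        self.mean = 0.0     # running mean (Welford)
        self.m2 = 0.0       # running sum of squared deviations (Welford)
        self.class_counts = Counter()

    def _update_stats(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    def process(self, value, label):
        self.class_counts[label] += 1
        if value is None:
            # Impute with the running mean; statistics are left unchanged.
            cleaned = self.mean
        else:
            cleaned = value
            if self.count >= 2:
                # Clip against statistics from records seen before this one.
                std = sqrt(self.m2 / (self.count - 1))
                low = self.mean - self.clip_sigmas * std
                high = self.mean + self.clip_sigmas * std
                cleaned = min(max(value, low), high)
            # Update with the clipped value so outliers do not inflate variance.
            self._update_stats(cleaned)
        # Inverse-frequency weight: rarer classes get larger weights.
        total = sum(self.class_counts.values())
        weight = total / (len(self.class_counts) * self.class_counts[label])
        return cleaned, weight


# Toy stream of (value, label) pairs, invented for the example.
stream = [(10.0, 0), (12.0, 0), (None, 0), (11.0, 0), (500.0, 1)]
pre = StreamingPreprocessor()
results = [pre.process(value, label) for value, label in stream]
for record in results:
    print(record)
```

Updating the statistics with the clipped value is a design choice worth stating out loud: it keeps a single spike from widening the clipping band, at the cost of adapting more slowly if the underlying distribution genuinely shifts.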
Q4
Describe your experience with A/B testing. How do you determine sample size, handle multiple comparisons, and communicate statistical significance to non-technical stakeholders?