Modern artificial intelligence relies on deep learning to extract useful information from complex, high-dimensional data and has achieved remarkable practical success. However, significant concerns have recently emerged: (1) Reliability. Deep learning models lack robustness and exhibit prediction bias across subpopulations. (2) Data scarcity. Training a deep model requires large amounts of data, which are often unavailable in practice. In my talk, I will show the importance of tackling distribution shifts to address these concerns. Specifically, I will present several theoretically principled tools: (a) Invariance, learning invariant representations that eliminate nuisance factors to improve robustness. (b) Optimization, optimizing over all subgroup distributions equally to achieve fairness. (c) Selection, learning from limited labeled data by selecting useful information from related data. I will further explore how these ideas are being applied in real-world domains such as healthcare.
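
As one illustration of the optimization idea (b), a common way to formalize "optimizing over all subgroup distributions equally" is a group distributionally robust objective; the formulation below is a standard sketch under that assumption, not necessarily the exact method presented in the talk:

\[
\min_{\theta} \; \max_{g \in \{1,\dots,G\}} \; \mathbb{E}_{(x,y)\sim P_g}\big[\ell(\theta; x, y)\big],
\]

where \(P_g\) denotes the data distribution of subgroup \(g\) and \(\ell\) is the training loss. Minimizing the worst-case subgroup risk, rather than the average risk, discourages the model from sacrificing performance on minority subgroups.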