Optimizing thermodynamic efficiency requires optimal information processing. I show how one can proceed from the thermodynamics of information engines to optimal data representation, to information theory, and to machine learning. In particular, I show that Shannon's information rate is proportional to the least effort necessary for data representation, and that Shannon's channel capacity is related to maximal work potential. Furthermore, by minimizing the smallest achievable heat dissipation in an information engine over all possible data representations, one can derive the machine learning method known as the "Information Bottleneck."
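For reference, a sketch of the Information Bottleneck in its standard variational form (the symbols $X$, $T$, $Y$, and $\beta$ below follow the conventional formulation of Tishby, Pereira, and Bialek and are illustrative; the paper's own notation may differ): one seeks a compressed representation $T$ of data $X$ that retains information about a relevance variable $Y$, by minimizing over stochastic encodings $p(t|x)$ the functional
\[
  \mathcal{L}\!\left[p(t|x)\right] \;=\; I(X;T) \;-\; \beta\, I(T;Y),
\]
where $I(\cdot\,;\cdot)$ denotes mutual information and the Lagrange multiplier $\beta \geq 0$ sets the trade-off between compression of $X$ and retention of information relevant to $Y$.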