Language is given meaning through its correspondence with a representation of the world. This correspondence can hold at multiple levels of granularity, or resolutions. In this project, we study multi-resolution language grounding across multiple domains: sports commentaries, geometry questions, and images. The project aims to build a framework that learns, with weak supervision, to understand and generate narratives for natural dynamic environments.