A venerable tradition in neuroscience seeks to understand sensory processing, and in particular vision, through unsupervised learning of the statistics of natural stimuli. Generative models have been key to explaining the response statistics of low-level vision, including nonlinearities in response means, response variability, and oscillations, as consequences of probabilistic inference. However, progress has been hampered by the limited capacity of existing generative models to meet the requirements posed by nonlinear hierarchical computations in the visual cortex. Here we harness inspiration from neuroscience to develop a novel flavor of Variational Autoencoders, a class of models capable of learning and performing inference in nonlinear generative models. We discuss the design choices required for efficient training of a hierarchical generative model. Further, motivated by theoretical results on the non-identifiability of latent variable representations, we investigate a range of inductive biases that shape the hierarchical representations learned by a purely unsupervised model. In particular, we examine the parametric form of the prior, computational complexity, and information complexity. We contrast the insights obtained from Variational Autoencoders with the properties of cortical neural representations.
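As a concrete point of reference (a minimal sketch, not the specific architecture proposed here), the standard training objective for a two-layer hierarchical Variational Autoencoder with generative model $p_\theta(x, z_1, z_2) = p_\theta(x \mid z_1)\, p_\theta(z_1 \mid z_2)\, p(z_2)$ and approximate posterior $q_\phi(z_1, z_2 \mid x)$ is the evidence lower bound; the particular factorization and the symbols $\theta$, $\phi$, $z_1$, $z_2$ are generic assumptions for illustration:
\[
\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z_1, z_2 \mid x)}\!\left[ \log p_\theta(x \mid z_1) \right] \;-\; \mathrm{KL}\!\left( q_\phi(z_1, z_2 \mid x) \,\middle\|\, p_\theta(z_1, z_2) \right).
\]
Both learning (optimizing $\theta$) and amortized inference (optimizing $\phi$) proceed by maximizing this bound; the parametric form of the prior $p(z_2)$ and of the conditionals are among the inductive biases discussed above.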