As you've probably heard, a very small hole in a thin wall will project a focused image (of a landscape, say) onto a flat surface placed behind the wall. No lens is required. That's called a "pinhole camera" in English. Basically, the light from any given part of the scene has only one path to follow through the hole to the image plane behind the wall.
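To make the "only one path" idea concrete, here is a minimal sketch of where an ideal pinhole puts a scene point on the screen. It assumes the hole sits at the origin, the scene point is a distance `z` in front of the wall, and the screen is `screen_distance` behind it; the function name and parameters are my own, chosen for illustration.

```python
def project_through_pinhole(x, y, z, screen_distance):
    """Return the (x, y) spot on the screen hit by light from a scene point.

    Assumes an ideal, infinitely small hole at the origin. The scene point
    is z units in front of the wall; the screen is screen_distance behind it.
    """
    if z <= 0:
        raise ValueError("scene point must be in front of the wall")
    # The single straight line through the hole fixes the landing spot.
    # The negative sign means the image comes out upside down and mirrored.
    scale = -screen_distance / z
    return (x * scale, y * scale)


# A point 1 m to the right and 2 m up, 10 m away, screen 0.1 m behind the hole
image_x, image_y = project_through_pinhole(1.0, 2.0, 10.0, 0.1)
print(image_x, image_y)
```

Each scene point gets exactly one landing spot, which is why the image looks sharp with no lens at all.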
Actually, many paths exist, but they are grouped so tightly that they land within a very small area on the image, so it appears to be in complete focus without help from a lens. The smaller the hole, the better this appearance of focus. (Up to a point. Different topic.) But a small hole admits little light, so the scene must be in bright sunlight, and the area behind the wall must be a fairly dark room for our eyes to pick up the faint projected image. Most film is less sensitive than our eyes, so the problem is worse when we want to capture the image for posterity. To capture enough light, we must make the hole bigger.
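The trade-off between sharpness and brightness can be put in rough numbers. This is a sketch using straight-line ray geometry only (similar triangles), so it deliberately ignores whatever sets the "up to a point" limit mentioned above. The function names and the example dimensions are assumptions for illustration.

```python
def geometric_blur_diameter(hole_diameter, subject_distance, screen_distance):
    """Diameter of the patch that one scene point paints on the screen.

    Rays from the point fan out through the hole and keep spreading until
    they hit the screen, so by similar triangles the patch is slightly
    larger than the hole itself. All lengths share one unit.
    """
    return hole_diameter * (subject_distance + screen_distance) / subject_distance


def relative_light(hole_diameter, reference_diameter):
    """Light admitted relative to a reference hole (scales with hole area)."""
    return (hole_diameter / reference_diameter) ** 2


# Subject 10 m away, screen 100 mm behind the wall (all values in mm)
for hole in (0.5, 2.0):
    blur = geometric_blur_diameter(hole, 10_000, 100)
    light = relative_light(hole, 0.5)
    print(f"hole {hole} mm: blur patch {blur:.3f} mm, light x{light:.0f}")
```

Going from a 0.5 mm hole to a 2 mm hole buys sixteen times the light, but every point in the scene now smears over a patch about four times as wide.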
When we make the hole bigger, light from any given portion of the scene has more paths to follow. The optics of a lens bring each of those paths to the same point on the image plane, or at least that's what we try to accomplish. The bigger the hole, the harder the problem of designing a lens that will bring each path to the same point for the three critical frequencies of light. (Light of different frequencies is refracted through a different angle by any given optical material.)
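The point about different frequencies bending by different amounts can be sketched with Snell's law. The refractive indices below are approximate values for a common crown glass at three standard reference wavelengths; treat them as illustrative assumptions, not as the three frequencies any particular lens design uses.

```python
import math

# Approximate refractive indices of a typical crown glass (illustrative only)
INDEX_BY_COLOR = {
    "red (656 nm)": 1.5143,
    "yellow (588 nm)": 1.5168,
    "blue (486 nm)": 1.5224,
}


def refraction_angle_deg(incidence_deg, n_glass, n_air=1.0):
    """Angle of the ray inside the glass, from Snell's law.

    n_air * sin(incidence) = n_glass * sin(refraction)
    Angles are measured from the surface normal, in degrees.
    """
    sin_refracted = n_air * math.sin(math.radians(incidence_deg)) / n_glass
    return math.degrees(math.asin(sin_refracted))


# The same ray, hitting the glass at 30 degrees, splits slightly by color
for color, index in INDEX_BY_COLOR.items():
    angle = refraction_angle_deg(30.0, index)
    print(f"{color}: {angle:.3f} degrees inside the glass")
```

The differences are only a fraction of a degree, but they are enough that a single simple lens focuses blue light at a slightly different distance than red, which is the problem the lens designer has to correct.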