Robert De Niro and Martin Scorsese on set
Thu 24 Aug 2017

Scorsese and the four key elements of visual literacy

Luke Buckmaster

Film Critic & Author

Film Critic Luke Buckmaster traces Martin Scorsese's passion for visual literacy

Martin Scorsese is passionate about visual literacy. The filmmaker’s fondness for the subject harks back to his childhood: he grew up in a poor home with little access to books or other printed materials.

He has spoken eloquently about it and matched words with action. The organisation Scorsese founded in 1990, The Film Foundation, develops free curricula that have been used in the classrooms of more than 100,000 educators.

But what does “visual literacy” actually mean? The topic could consume an entire book, or even several.

For now, we will look at the four things Scorsese believes are core to understanding it (mentioned in a 2013 keynote lecture), drawing examples from his own work to see how they can be applied. They are light, movement, time and inference.


When we think of effective lighting in cinema we tend to envision the entire frame, or large parts of it. As demonstrated in the director’s epic 1995 crime drama Casino, lighting can also be used for highly specific purposes.

In the film, Italian actor Pasquale Cajano plays the Chicago Mafia's top boss, the quietly spoken but deadly Remo Gaggi. In almost every one of his scenes Cajano is lit with spotlights that either shine on him from above or reach up to him from below, visually differentiating him from other cast members.


Pasquale Cajano as Remo Gaggi in 'Casino' (1995)

Lighting an actor in this way has a similar effect to filming them from a low angle. It implies a position of power, in this case psychologically reminding audiences that Remo calls the shots.

Such lighting techniques weren’t always available. In the early years of cinema, technology was so rudimentary that everything needed to be shot in the open (including interior scenes, which were filmed outside on purpose-built stages).


Movement is another component of film language that has greatly evolved over time. Pan shots, which move the camera left or right, have been possible more or less since the beginning of film. But cameras have shrunk substantially, becoming more agile and thus capable of highly energetic camerawork.

In the opening scene of After Hours, Scorsese's 1985 one-night-on-the-town black comedy, the camera almost literally leaps out of a chair – as if it’s been fired from a rocket. In an office environment, from behind a person’s desk, the frame scuttles forward in a diagonal direction, past two or three other desks then right up into the face of mild-mannered protagonist Paul Hackett (Griffin Dunne).

This fast-paced nine-second shot has a big impact, immediately establishing the film as a visually playful work. The final shot takes place when an exasperated Paul returns to work the next morning covered in white gunk, after a bizarre incident turns him into a human papier-mâché sculpture.

For this image the camera begins at his desk and careens backwards, navigating the space between other desks in big loop-like reverse swings. The world of this ordinarily calm and collected man has finally, literally, spiralled out of control.


The third of Scorsese’s four key components of visual literacy, time, is something filmmakers generally work to compress. If we see a person beginning to ascend a flight of stairs, for example, the logical next shot will depict them at the top of it (thus sparing us the dull sight of them climbing).

But sometimes, as in the beginning of 1980's Raging Bull, editing is used instead to expand time. We see Jake La Motta (Robert De Niro) alone in the ring, moving like a dancer. Scorsese slows the footage down so it has a graceful, almost balletic quality.  

This is a very different tempo to the conventional, fast-paced, wham-bam-slam boxing movie scene that has been rehashed countless times over the years. Scorsese's manipulation of time is essential in establishing Raging Bull (which is often cited as the finest boxing film ever made) as an experience that is lyrical rather than visceral. 


The fourth of Scorsese’s visual literacy components, inference, refers to things we don’t actually see. Or rather, things we don’t see on the screen. We see them with what some critics and artists describe as our "mind's eye". 

Inference goes to the very fundamentals of film editing. Although we may remember movies as a series of single pictures, meaning is most powerfully created not by one image but by a combination of them. When we see two shots together, a third is created: the one we see in our heads.

Consider seeing the following two images. The first is of a well-dressed gambler betting recklessly and throwing down wads of cash at the roulette wheel. The second is of that same man, but now he is dressed in shabby clothes and is sitting in the gutter asking passers-by for change.

When consumed in isolation, these images don't impart much in the way of a story. When combined, they communicate a very clear one: the man has lost all his money gambling and is now destitute. The middle part, his journey downwards, plays out entirely in our minds.

Scorsese experiments with inference in Shutter Island, the 2010 hallucinatory thriller about federal marshal Edward Daniels (Leonardo DiCaprio), who investigates a case on an island that houses a hospital for the criminally insane. Inside a beautiful, swanky lounge room, about an hour into the running time, Daniels is suddenly confronted by a person who looks more like a ghoul than a man. He has hideous scars, a bung eye and a ghastly Freddy Krueger-esque smile.

When the man offers him a drink, Scorsese cuts to DiCaprio for his response. When he cuts back to where the horrible yeti-like person was standing, a different actor is now there: Mark Ruffalo, who plays Daniels’ well-dressed partner. A further shot, after the sound of a scream from somewhere off-frame, then shows that nobody other than DiCaprio is present in the room.

In isolation, these images don't tell us much. Combined, they raise big questions. Is Daniels talking to ghosts? Is he hallucinating? Scorsese uses inference to call into question the sanity of his protagonist.

The sanity of the director, of course, remains intact: the 74-year-old is truly a master of film language. There are many ways to go about exploring visual literacy and many other components to consider, in addition to the four Scorsese listed.

If anybody asks why we ought to bother in the first place, perhaps we can refer to words from the man himself.

“Young people need to understand that not all images are there to be consumed like fast food and then forgotten,” Scorsese said in his 2013 keynote. “We need to educate them to understand the difference between moving images that engage their humanity and their intelligence, and moving images that are just selling them something.”