Emotional symbologies, realism and the next hardware generation

Recently I re-consumed Final Fantasy VII, released in 1997. One thing that threw me off at first was the gestures. The characters all use comically exaggerated gestures that always seem a little off: Cloud’s response to everything is shrugging, while shaking seems to denote almost anything from fear to laughter to anger. But I couldn’t recall being bothered by this during my first run-through fourteen years ago, and I realized that we used to have this whole symbolic gestural language for conveying emotion back in the ultra-low polygon count days. The gestures didn’t necessarily depict real-world body language, but had their own unique symbology that could be deciphered by consumers in order to understand character emotion. It’s impressive, actually; each of the characters has their own distinguishing gestures linked to their personalities, and the game is able to convey a fairly wide range of emotion even though the only facial movement available is blinking.

This made me think of the recent revelation that the next generation of console hardware will be able to push out four times as many polygons as the current generation. The picture accompanying the linked article, however, is pretty underwhelming:


It looks as if this increased processing power will be used primarily to… add more flaps to characters’ clothing.

Of course, this technological advance could be used to greatly expand the range of expression in interactive narratives. L.A. Noire (2011), for example, a crime drama set in 1947 Los Angeles, used facial capture technology to portray the complex facial expressions of actors (and consequently it is one of the only interactive narratives to feature recognizable actors). The idea is that the consumer must look for subtle facial clues while interviewing suspects to determine if they are lying. Ultimately, however, the game fell flat for many consumers because the level of facial detail was not fine enough to trigger instinctive emotional comprehension, making that part of the experience very frustrating. But L.A. Noire with four times the detail might work perfectly.

However, the objection might be reasonably raised that this kind of technological fetishism will not actually improve narrative expression. Except for the specific case of L.A. Noire, perhaps, most interactive narratives aren’t being held back by technological constraints. There are plenty of ways to express emotion without faithfully rendering every single facial muscle. There are all kinds of emotional symbologies available. After all, Final Fantasy VII managed to create compelling, emotionally expressive characters way back in 1997, even though the characters’ only available facial expression was blinking. The technology may have improved by orders of magnitude since then, but the stories certainly haven’t. The solution isn’t better technology, but better writing.

Sam Keeper puts it succinctly:

…so much in AAA game design is mindbendingly wrongheaded. You can’t tech demo your way to emotion, no matter how many pretty wrinkles you put on the face of your sad old man sprite! Emotion isn’t higher resolution, you’re not saying anything more profound with those pixels! You say profound things with a marriage of form and content, a blending of experimentation and sound communication techniques.

I mostly agree with this, but I want to point out that other representational visual mediums have clever ways of increasing or decreasing the level of detail in their representations according to the narrative’s need to express emotion at any particular moment. Keeper here is writing about Homestuck, a great example of this. It is a verbal-visual narrative that constantly moves between levels of detail in order to manipulate the consumer’s level of emotional engagement. It even acknowledges this manipulation at one point: when the narrative introduces a character named Aradia, she is foreshadowed as a mysterious, ominous character with great power committing acts of violent destruction, and she is drawn in a relatively realistic fashion. When it comes time to reveal more about her, however, the narrative explicitly acknowledges that it will “render the girl in a more symbolic manner.” The next page then depicts her in the armless “chibi” form Homestuck frequently uses. The narrator remarks, “That’s better. We can now be properly introduced. Who’s this spooky lady?” The narrative acknowledges that it had been using a less symbolic representational style to convey mystery and menace, and that by changing to a more symbolic style the representation has purposefully eliminated that response and replaced it with familiarity: now we can be properly introduced.

Speaking of chibi, anime and manga change representational detail levels to manipulate reader engagement all the time. It is common for characters’ representation to switch from realistic proportions to chibi style without warning, usually to express an emotion such as anxiety, embarrassment, or anger. Drawing the characters wildly disproportionately, so that they are barely recognizable as human, reduces the reader’s ability to empathize with their emotions. And emotions that we don’t empathize with are funny. Here is Wikipedia’s example of chibi style:


This character is clearly very angry, but the style allows us to distance ourselves from that emotion, rather than mirror it, and from that distance we can find such extreme anger amusing. If this level of emotion were represented in a more realistic style the effect would be very different: we would only expect it in some sort of tense, angry confrontation with great narrative weight. And since emotions are frequently depicted in this defamiliarizing way, depicting emotions less symbolically becomes a powerful signal to the consumer that the emotions on display carry narrative weight, and that emotional engagement and empathy are the appropriate response at that particular moment. Furthermore, chibi style isn’t the only way anime and manga manipulate consumer response:


This is not a chibi representation, but the lack of pupils, the perfectly round eyes, and the lack of a nose all signal that the consternation depicted here is something that the consumer should laugh at more than empathize with. Anime and manga constantly manipulate the consumer’s emotional engagement through shifting representational detail.

So other representational visual mediums don’t need to come ever-closer to photorealism in order to convey emotion; they rely on emotional symbologies. But they’re also not constrained by those symbologies: they can increase or decrease the level of representational detail at will in order to manipulate consumer response.

However, while interactive narratives are a representational visual medium like paintings or comics or anime, they have a hard time shifting levels of detail on the fly: the hardware simply cannot render more detail. Or rather, the main method of increasing detail is very awkward: cutscenes. Final Fantasy VII, for example, plays a movie at emotionally impactful points, when the narrative needs to increase the level of representational detail in order to increase the consumer’s empathetic response. The movies are prerendered by powerful computers, so the level of representational detail is not constrained by hardware. Nowadays cutscenes might actually be rendered in real time with higher-quality models swapped in; tight control of the camera means more processing power can be focused on faces rather than other elements. However, this means that the narrative experience essentially changes to watching a movie. The only way for interactive narratives to manipulate their level of representational detail is to switch to another medium. This is, again, awkward. It wrenches the narrative experience from participation to voyeurism, from second person to third person.

So while more powerful consoles will probably be used mostly just to render more explosions at once, they do have the potential to unlock the medium from its representational constraints. That may come in the form of detailed faces that are available to the consumer at all times, while in control of the camera, for example, not just during cutscenes. Or it may take some other form: as technological limits fall away, interactive narratives will begin to have just as many options for representation as other representational visual mediums. So while better tech won’t automatically mean better narratives, and there’s good reason to be skeptical of the fetishization of ever-faster hardware as a way to improve the medium, I think more polygons can at least create the potential for better representation. But of course, none of this will make much difference without good writing!

