
Here, there, everywhere: Scatter plots and heat maps in Illustrator

June 15, 2012 By carla in information

Many designers choose to export data from Excel into Adobe Illustrator’s charting tool. A few months ago, we found ourselves scratching our heads over a request that we received. The internal client had lots and lots of data on personal income and wanted to show this data as a scatter plot graphic, divided into quintiles. They had already created a basic version of this graphic in Excel, but needed the designers to redesign it to make it easier to understand and to add the visual polish and presentation for which design software is better suited.

Goals, approach and tools

We took a look and determined that our biggest task was simply learning how to create a scatter plot graphic that could handle the large volume of data in the original Excel file: 1,043 rows.

Our goal: Show intensity and data patterns across five categories (quintiles). Keep the data “live” in order to be able to quickly update the graphic with new data.

Our approach: Use Illustrator’s scatter plot tool to graph the data. Customize the graph to create a heat map in order to show intensity/concentration of data.

Our tools: Adobe Illustrator’s graph creation tool and Illustrator’s transparency settings to create a heat map

An explanation of the final product: a heat map produced using Illustrator’s scatter plot graph tool

Take a look at the graph below. I recreated something similar to what my team designed (update: because we haven’t yet published the data, this graph shows widgets instead of the subject matter of the original graphic). This hypothetical graph shows costs of production (money) spread out across four categories (quartiles): a bottom quartile, a second quartile, a third quartile and the top (fourth) quartile. The darker the color (the heat map effect), the greater the intensity of those data. In other words, the darkest areas represent a large number of widgets with that cost of production, and the lightest areas represent a smaller number of widgets with that cost of production.

Fig. 7: Bells and whistles: Showing intensity in a scatter plot graph in Adobe Illustrator
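Our client’s data came in quintiles and this example uses quartiles, but either way every row needs a category number before it goes into Illustrator’s data table. If your raw numbers don’t already carry that label, a short script can assign it. The sketch below is only one way to do it, not what my team used: `assign_quantile` is a hypothetical helper, and it assumes the quantile cut points are computed from a single numeric column using Python’s `statistics.quantiles`.

```python
import bisect
import statistics


def assign_quantile(values, n=4):
    """Return a category number 1..n for each value (1 = bottom quantile).

    Hypothetical helper. Uses statistics.quantiles (Python 3.8+) with its
    default 'exclusive' method; a value equal to a cut point goes into the
    upper group, so group sizes can be slightly uneven.
    """
    cuts = statistics.quantiles(values, n=n)  # n - 1 cut points
    return [bisect.bisect_right(cuts, v) + 1 for v in values]


# Hypothetical widget costs; n=4 gives quartiles, n=5 would give quintiles.
costs = [3, 5, 6, 8, 12, 15, 15, 18, 20, 22, 25, 30]
print(list(zip(assign_quantile(costs), costs)))  # (category, value) pairs
```

The resulting (category, value) pairs are exactly what the data table described below expects: the category number in one column, the value next to it.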

Needless to say, we learned a lot about Illustrator’s scatter plot capabilities in six hours.

What you need to know before you begin this tutorial

Before I begin, I’m assuming a basic familiarity with Illustrator (we were using CS5, but I believe any CS version should work for this example) and its graph creation tool. If you’re not familiar with them, search for Illustrator graphs and you’ll find plenty of tutorials. Better yet, FlowingData has a good basic tutorial on Illustrator graphs, and so does Adobe. If you’re a designer, you probably already know the basics.

Although I’m also assuming that you know what a scatter plot graph and a heat map are, this tutorial will briefly explain their uses and compare a scatter plot to a line graph.

To better explain all of this, let’s first start by building a more basic graph, a scatter plot graph.

As you can see from the example above, the graph contains four category rows (labeled “Category 1,” “Category 2,” etc.). In the real world, these categories could be years, quartiles or however else you wish to divide up your data. Each category shows a row of data points associated with it (Category 1 shows a gap in the values between 6 and 15, for example). Keep an eye on Category 4’s outlier, the number 20 in the top right. More on that later.

How to correctly import data into Illustrator’s data tool

Half the battle is learning how to enter or import this data into Illustrator. Essentially, think of your data as a series of columns that alternate. The first column has your “Y” axis values (your categories); the second column has your “X” axis values (the data associated with each category).

In my example, for Category 1 to appear first on my “Y” axis, I entered “1” in the first column (repeating “1” as many times as I had data points for that category). In the second column I entered all the data associated with Category 1. Repeat for each remaining category in the next pair of columns (a column of “2”s followed by Category 2’s data, and so on), and you’re all set.

Take a look at these first two columns in the data table to see how simple this is: [FIGURE 1].

Data table for a scatter plot graph: FIGURE 1
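With 1,043 rows, typing this layout by hand isn’t realistic. If your data start out as a long two-column list (category, value), a short script can rearrange them into the alternating column pairs described above. This is a minimal sketch under a few assumptions: the function names are hypothetical, it assumes you will bring the result in through the Import Data button in Illustrator’s graph data window (which reads tab-delimited text), and it assumes Illustrator skips the empty cells used to pad shorter categories. Check both against your version of Illustrator.

```python
import csv
import io
from collections import defaultdict
from itertools import zip_longest


def to_illustrator_columns(pairs):
    """Turn (category, value) pairs into rows laid out as alternating columns:
    category 1 label, category 1 value, category 2 label, category 2 value, ...
    Categories with fewer values are padded with empty cells."""
    groups = defaultdict(list)
    for category, value in pairs:
        groups[category].append(value)

    columns = []
    for category in sorted(groups):
        values = groups[category]
        columns.append([category] * len(values))  # Y column: the category number, repeated
        columns.append(values)                    # X column: that category's data
    # zip_longest turns the columns into rows, filling short columns with "".
    return [list(row) for row in zip_longest(*columns, fillvalue="")]


def write_tab_delimited(rows):
    """Serialize rows as tab-delimited text, one row per line."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(rows)
    return buf.getvalue()


# Hypothetical example: a long list of (category, value) rows, as exported from Excel.
pairs = [(1, 2), (1, 5), (2, 7), (4, 20), (4, 20), (3, 11)]
print(write_tab_delimited(to_illustrator_columns(pairs)))
```

Save the printed text as a .txt file and import it. Sorting the categories keeps Category 1 in the first column pair, matching the setup in Figure 1.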

Customizing your graph by using Illustrator’s “Graph Type” feature

As a final step, once you are finished working in the data table, click the checkmark button in the top right corner to output the graph. Then right-click on the graph itself and select “Type” from the menu. In the dialog box that opens, use the dropdown (it starts on “Graph Options”) to select “Value Axis,” and make sure that “Override Calculated Values” is checked. This is how to format your values for those fields:

Minimum value: should always be set to zero
Maximum value: should match the number of your categories (in the basic example shown in Figure 1, I had four categories, so I entered “4”)
Divisions value: this is how the categories will be divided up. I always find this one intuitive, though difficult to explain. For this tutorial, make sure that the number you enter is one less than the total number of categories that you have (I had 4 categories, so I entered a “3”).

Basic scatter plot graphic in Illustrator: FIGURE 2

Here’s the resulting graphic that Illustrator will produce at this point (I added the color manually). The blue row in the graphic represents column 2 in the data table. Remember: the reason Illustrator knows to put those blue points under the row labeled “1” is because you labeled them as 1s in column 1 of the data table. You can later change the name of that row from “1” to “Category 1” (as an example) in the graphic itself. [FIGURE 2]

Fig. 2: Unformatted scatter plot graph in Adobe Illustrator

[Aside] How a scatter plot graph is different from a line graph: FIGURE 3

As an aside, if you’re not familiar with scatter plot graphs, here’s a quick explanation of how to interpret this one. Take a look at Category 4 (it’s the green square in the top right of the graph shown in Figure 2; in the data table in Figure 1, it is the last column). Do you see how Category 4 has ten values, each of them 20?

But on the scatter plot graph, you only see the number 20 represented once (the green square). Scatter plot graphs won’t let you see data points that overlap exactly: identical points are drawn directly on top of one another, so ten 20s look like a single 20. A line graph, however, will show them. Here’s the same data in a line graph. Category 3 (green) now shows you each data point that is numbered as 20. [FIGURE 3]

Fig. 3: Comparison of the same data: A line graph in Adobe Illustrator
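If you want to know how much a scatter plot is hiding before you start designing, you can count identical coordinate pairs in the data. This is a diagnostic sketch outside the Illustrator workflow, not part of it: `overlap_report` is a hypothetical helper, and apart from Category 4’s ten 20s the sample points are made up.

```python
from collections import Counter


def overlap_report(points):
    """Count how many points share each exact (category, value) position,
    and how many points are therefore hidden under an identical point."""
    counts = Counter(points)
    hidden = sum(n - 1 for n in counts.values())
    return counts, hidden


# Category 4 has ten values of 20; the Category 1 points are hypothetical.
points = [(4, 20)] * 10 + [(1, 2), (1, 5), (1, 5)]
counts, hidden = overlap_report(points)
print(counts.most_common(2))  # the most heavily stacked positions
print(hidden, "of", len(points), "points are hidden")
```

The per-position counts are also the “intensity” a heat map is meant to reveal, which is where transparency comes in.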

Customizing the scatter plot graph: FIGURE 4

Back to the scatter plot graph: you can customize the labels in the graphic itself once you’re finished with the data view. For example, you can change the category numbers from 1, 2, 3 and 4 to specific category values that reflect how the data is actually organized (e.g., by quartile, by year, etc.). More importantly, you can customize further with fonts, colors, stroke widths, etc., some of which Illustrator will retain if you return to data view and change the data. Which ones, you might ask? That’s a post for another day. [FIGURE 4]

So, in the real world, what can Illustrator do for you? FIGURE 5

You can create a graphic that looks like this (this is not real data, of course): [FIGURE 5]

Fig. 5: Finished: A scatter plot graph in Adobe Illustrator

How to set up 1,043 rows of data: FIGURE 6

Here’s a look at the data. You’ll notice that the setup is identical to the basics that I outlined earlier. The figure below shows you a snapshot of how each row is set up. [FIGURE 6]

Figure 6: Data view of hypothetical widgets

Turning a scatter plot into a heat map: Using transparency to further customize Illustrator’s scatter plot graph to create a heat map: FIGURE 7

I promised you a heat map, and here it is. Remember that a heat map essentially shows areas of concentration (or lack thereof) in data–intensity.

To show intensity for those data points that overlap (like the repeated series of 20s that I mentioned in Figure 2), simply select *all* the points in a category with your direct selection tool. (If you’re familiar with Illustrator’s “Select Similar” feature, use your direct selection tool to choose just one data point, then use “Select Similar” to automatically select all of the points in that row.) Then apply transparency to them all at once. Because transparency has a cumulative effect when a transparent object is layered on top of another transparent object, you are essentially creating a heat map effect: the more points stack up in one spot, the darker that spot appears. [FIGURE 7]

A simple explanation of transparency: FIGURE 8

Fig. 8: Bells and whistles: How to use transparency to show intensity in Adobe Illustrator
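The cumulative effect follows a simple rule, at least for Illustrator’s default Normal blending mode (other modes such as Multiply combine differently, and this ignores anti-aliasing). Each point at opacity a lets (1 - a) of whatever is beneath it show through, so n stacked points reach a combined opacity of 1 - (1 - a)^n. The sketch below, with hypothetical function names, computes that and works backwards to a per-point opacity for a chosen darkest value.

```python
def stacked_opacity(alpha, n):
    """Combined opacity of n identical marks stacked with Normal blending.

    Each mark lets (1 - alpha) of whatever is beneath it show through,
    so after n marks only (1 - alpha) ** n of the background remains.
    """
    return 1 - (1 - alpha) ** n


def alpha_for_target(target, n_max):
    """Per-point opacity that makes the densest stack (n_max points)
    reach the target combined opacity."""
    return 1 - (1 - target) ** (1 / n_max)


for n in (1, 2, 5, 10):
    print(n, "points at 20% opacity ->", round(stacked_opacity(0.2, n), 3))

a = alpha_for_target(0.9, 10)
print("per-point opacity for a 10-point stack to reach 90%:", round(a, 3))
```

The curve flattens quickly: each extra point adds less darkness than the one before. If the opacity is too high, a spot with 20 points and a spot with 40 can look almost identical, so it helps to choose the per-point opacity from your densest spot.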

Well, that’s it. Please let me know if I’ve left anything out. Remember, this tutorial is meant to show how to customize Illustrator’s scatter plot graph tool. Just because you can, doesn’t mean you should! Once we publish the actual graphic, I’ll post that as well.

Should graphics be easy to understand?

June 7, 2012 By carla in information, news

Ah, the glamorous life of the data visualization designer… to draw or not to draw? To obfuscate or not to obfuscate? I’ve been doing some reading lately about a debate that is making its way through the data viz community. At what point does too much illustration, creativity or innovation get in the way of the primary purpose of data visualization? And how transparent is the design community being about the difference between art based on data and data visualization? Or, to put it more simply, should data visualization be easy to understand, and what happens when it’s not?

Allow me, first, to offer up my own definition, artfully cadged from people much smarter than I and enhanced by my own experience in the field, such as it is. So, data visualization is what, exactly?

Information served up visually in order to inform and improve/enhance our understanding of the data.

Clumsy, but I’m hitting the main points: inform and understanding. If pressed, I would add the word “easily.” Actually, it’s the word “easily” that prompted me to write this.

If you can’t understand a data visualization piece, then it’s pretty useless, isn’t it? Maybe it’s beautiful, but if you walk away more confused than when you began, it’s useless. And if you walk away just as confused, or only a bit less confused, it’s still useless.

How far can we take this concept? Here is a quick survey of what folks have been saying lately. Props to infosthetics for providing a good starting point for these discussions. And here they are:

Stephen Few’s blog post on the two types of data viz is a good start. According to Few (Tufte’s alter ego), there are two approaches to presenting data graphically: data visualization and data art. As he puts it, “rarely do the twain meet.” Therein lies the problem. They do meet. All the time. Few makes a good point (failing to distinguish between them creates confusion and harm), but I would argue that the two are not mutually exclusive.

Few defines data visualizations as products created to inform, and “data art” as visualizations of data created to entertain—“art based on data”—something which can be judged accordingly.

My response? Would that the public were quite as discerning as he. The train has left the station, and what we have before us is, at worst, a proliferation of eager designers too quick on the draw to consider the very important questions that need to be asked about the data being depicted. At best, it is a cadre of informed (and willing to learn) designers who humbly allow the information, the audience and the goals of the visualization to drive the design: designers who are loath to add one extra pixel that doesn’t belong, and willing to take away any element that obscures a better understanding of the data. I’d like to think that I fall into the latter category, but, like most designers, I fall somewhere in the middle.

Rather than drawing a bright line between these two approaches and dogmatically refusing to accept a middle ground, I suggest we embrace a blend of the two when it is done well: when the result informs and presents a clearer understanding of the data and is at the same time aesthetically pleasing. As a designer who chooses to serve both masters, art and data, I find joy in being able to translate a jumble of Excel rows and columns into a plain bar chart; sometimes the beauty lies in the hard work of sifting through the data and simplifying complexity. And sometimes the joy comes from experimenting with different formats and adding visual accents to enhance the data, provided, of course, that the user’s ability to understand the data is enhanced, not impeded.

Nevertheless, I agree with Few’s depiction of the pitfalls of “data art” being misperceived as data visualization, and I’ll add one myself. In addition to spreading poor practice instead of best practice, it creates unrealistic expectations about what is acceptable in a data visualization, particularly for those of us who are working in the industry in a supportive capacity to researchers and writers with an uneven understanding of best practices (how many of us have been asked to create 3D graphics or exploding pie charts on a whim?).

And a rising tide lifts all boats. In this case, I’ll agree with Few’s point that the proliferation of “data art” and other fancy-schmancy graphics that pass for data visualization implies that data viz is a closely guarded secret known only to denizens of the data underworld (paraphrasing liberally from Mr. Few, here). But I take issue with his assertion that this prevents the “democratization of data,” implying that the public is somehow being dissuaded from engaging with and creating data. For better or for worse, they aren’t. Just google “infographics.”

As an interesting aside, note that Eagereyes’ Robert Kosara wrote a primer on the two types of data visualization that Few discusses, waaaay back in 2007. Like Few, Kosara was also bothered by the blurred line between data and art. What Few calls “data art,” Kosara called “artistic visualization.” Either way, they both underscore the same point: keep data and art separate in order to be as transparent and clear about the data as possible. I agree with the goal.

As Kosara puts it, “looking at one type of visualization expecting the other will lead to disappointment and misunderstandings.”

Kosara uses what is, in my opinion, one of the best data viz sites out there (infosthetics) as an example of a site that doesn’t make those distinctions, thus creating confusion. Granted, this was back in 2007. I wonder what he’d say now? Nonetheless, I disagree. Let’s not confuse a lack of best practice (for example, normalizing your data to prove a point, and not being transparent about it) with the so-called sin of creating a piece that is visually striking. A designer can produce a graph with no artistic aspirations whatsoever that nonetheless obscures the data. And a designer can produce a terrific visual that observes best practices (to inform) and serves up the data artistically and well.

Adam Crymble has a different moniker for Few’s “data art” and Kosara’s “artistic visualizations.” He calls these graphics “shock and awe.” I love that term. Of all the discussions that I have read, Adam’s makes the most sense to me. He doesn’t address all data viz that is artistic, but rather focuses on the extreme, and on that I strongly agree with the points he makes.

Adam Crymble: “shock and awe” graphics

We’ve all seen these very beautiful, complex visualizations that belong inside of a picture frame or a screensaver. Or, for a few seconds, they give us pause and food for thought.

I’ve seen them, written about them and admire them for what they are—unique explorations of the complexity of data. An artistic or visual expression of the complexity of the information we spew out and take in. But they don’t inform in the traditional sense of the definitions of data viz. They may underscore a pattern, convey a sense of weight through sheer numbers or complexity (as the example above does), but that’s about it. They’re pretty much impossible to understand on a granular level without some work.

Adam’s assertion that these complex visualizations have no place in the academic world is beyond my ken. For the record, the example above is mine, not his (see his post for his own, more humorous example). But if he is correct that peer reviewers are afraid to betray their lack of understanding of these graphics, and thus—through tacit acceptance—are endorsing their validity, well then that should concern all of us.

The most interesting point to be gleaned from Adam’s perspective, I think, is the bullying nature of shoving a terabyte of data in front of someone’s face and saying “Aren’t I clever? Don’t you get it?” I don’t. Point well-taken, Adam.

Mark Ravina writes an interesting rebuttal to Adam’s criticism of “shock and awe” graphics. He compares these artistic and complex visualizations to early feminist scholarship that provoked anger when it challenged the systemic sexism of the ivory tower. I’m a huge fan of confrontation and anger-provoking methods to push movements forward. In the early 90s, ACT-UP did the same thing for GLBT rights, if you’ll recall. Without ACT-UP, Queer Nation and Lesbian Avengers, there would be no fancy Human Rights Campaign fundraising dinners today. I get it.

But Ravina’s assertion that these complex visualizations of data somehow push the field forward is a bit much for me. He calls them “intellectual challenges.” I’m not so sure about that. How many of us are willing to spend more than a few seconds trying to piece together a gazillion threads and data points in a fancy graphic? I think most of us consider it a waste of time to do anything other than admire the concept and the novelty of the presentation, and then move on. Intellect doesn’t play a big role here (the creator, on the other hand, gets some bragging rights for creativity). Does it stick? Does it move the field forward? Um, maybe, sometimes?

Ravina spends a fair amount of time discussing how humanities researchers (he knows them better than I do, certainly) insist on tables when they ask for data. I didn’t really read that into Adam’s criticism of these graphics: he was merely pointing out that data viz designers were making information too complex, and he never claimed that the solution was to create charts. Then Ravina cites the misuse of pie charts to make the point that just because something is familiar doesn’t mean it can’t be misused. Is he implying that unfamiliar things can’t be? As he puts it, “is schlock worse than shock?” Aside from the clever turn of phrase, it’s a bit of a moot point. Nothing that I have read criticizes innovation, merely obfuscation.

Mark Ravina: “Is schlock worse than shock?”

Ravina makes good points. He surveyed (presumably informally) graphs produced in history journals and notes that the bulk of them rely on formats developed (according to him) 200 years ago—pie charts, line charts and bar graphs. And he mentions how slow the field (I’m unclear if he means academics or history journals in particular) has been to adopt and thus understand formats that even today’s eighth graders are learning (box plots, for example). That’s a valid argument, certainly, but it has little to do with the complex visualizations that Adam was addressing or, for that matter, that Kosara and Few discuss. (To be fair, Ravina’s post was mostly in response to Adam’s).

However, he conflates different types of complexity, predictably citing Tufte and Minard (some of you know how I feel about that) as well as Rosling. Perhaps it’s a matter of taste, but I feel that Rosling bends over backwards to make his visualizations inspiring and accessible (not necessarily complex and beautiful), whereas the Minard graphic, while certainly elegant and groundbreaking, does not (and, given when it was produced, how could it?).

Lastly, one of the most important concerns that Adam raised was around obscuring data. By introducing unnecessary complexity into a visualization or graphic, data visualization designers can make academic and peer review verification and transparency needlessly difficult. Ravina counters this by saying that liars will lie. I don’t think that’s the point. They will lie, but transparency is as much about spotting errors or raising valid concerns as it is about unmasking willful deceit. Hats off to Ravina for taking the time to provide some very thoughtful counterpoints to the discussion.

Excelcharts is a pretty good resource for charting and data viz in general, despite the Excelcharts.com name (*smiling*). Jorge Camoes nicely (and literally) draws the elusive line between art/entertainment and data/information.

More importantly, he puts a restraining hand on eager designers, quite reasonably underscoring Few’s point that, as designers, we should make sure charts and graphics are readable and easy to understand rather than memorable or beautiful. Of course, I’ll see your readability and raise you ten, Jorge. Let’s make the data understandable and, if we can, beautiful as well.

Lastly, there is this. It is a tome. You could spend hours here. It’s an open-review paper, part of which is around data viz, part of which I have skimmed. It deserves careful reading, and I’m eager to do so and write a follow-up post.

Well, if you’ve hung in there with me, I hope you have learned something. I know I have.

The joys and sorrows of concentric circle graphics

May 29, 2012 By carla in information, interactives, news

There are not many good examples of concentric circle graphics out there. La Nacion produced one last year about subway strikes, and The Guardian produced an interactive graphic on gay rights in the U.S. Both of these intrigued me because, in my day job, I produce endless variations of graphics dealing with 50-state data. And most of the time, when we look at 50-state data, we draw… you guessed it: maps. Or bar graphs showing quantity, or line graphs showing changes and trends over time. But no matter what we do, it involves data for the 50 states, most often over time. Fifty states multiplied by several years is a lot of lines to draw, bars to fill and state maps to create. So I’ve been thinking about ways to tell the story in different formats, going beyond the map, so to speak. Last Wednesday, we created this concentric circle interactive. Here’s how we did it, and the process we followed to decide on the format.

One of the most onerous dimensions of 50-state data is its sheer physical size and length. Our website used to allow for a content well of 500 pixels. Try shoving 50 state labels across 500 pixels (that’s 10 pixels per label) and you’ll quickly see why it’s a challenge.

But even with all the real estate in the world, long, horizontal displays are also taxing on the user if there is a comparative aspect to the data. There is simply too much bouncing back and forth from the left to the right. Go long and you lose the comparative advantages of a horizontal layout because users with small screens must scroll vertically and can’t see the entire landscape at once. Of course, layering the data into different views as an interactive can solve that. But sometimes you want to show the data all at once. And for that, a static graphic can work well.

Understandably, a map is often the solution. But maps have their limitations too. There’s only so much that you can infer from a map. If your data consist of more than 4-5 gradations it can be tough to create the at-a-glance, concise overview for which a map is best suited.

And if there are no regional patterns discernible in your map, readers wind up staring at a jumble of color with only a legend to tie it all together.

Which brings me to concentric charts. They’re not pie charts. (If you look up pie charts on Wikipedia, you will see that there is a distant cousin of the pie chart called a “ring chart,” also known as a multi-level pie chart or a radial tree.) Ring charts appear somewhat visually similar to concentric circle graphs but have a different use: they tend to show hierarchy in data. You might see one when your computer shows you how much disk space you have, for example.

Filelight disk usage graphic — This ring chart shows computer hard drive disk space

A concentric chart, on the other hand, can tell a different story altogether. In a recent post on La Nacion’s subway strike graphic, I mentioned how designer Florencia Abd manages to plot time across four nodes (year, month, day and time) as well as another variable: type of incident/strike. That’s a lot of ground to cover in a static graphic. Imagine doing it in other ways and I’m sure you’ll agree.

Because a circle is, well, round, its shape lends itself quite well to a relationship-based approach. Not in the way a pie chart works (where the user sees the parts in their physical relationship to the whole), but rather by using the organic form of a circle to help the user more easily compare complex data. And if you add concentric circles, you take advantage of the hierarchy inherent in those circles to create layers. That’s an intuitive way to order your data, perfect for showing levels or ratings where you use the inner and outer rings to denote the endpoints of a scale (e.g., one thing is stronger, larger or more intense on the outside than it is on the inside) or time, as the subway graphic above shows (the outer ring shows 5 a.m. and the inner ring shows 11 p.m.).

So, what does all this have to do with the U.S. map? As I mentioned, the strength of a map is to show geographic relationships in data. For example, southern states vote “red” (or conservative) in the U.S., whereas a swath of northeastern states might vote “blue” (progressive). For this, a map is helpful because regional differences tell the story and are easy to spot.

But the nice thing about concentric charts is that they, too, can show geography, or any groupings, for that matter. As the Guardian’s example shows, each “slice” of the concentric chart belongs to a state and groups of slices are regions. In the Guardian example, each ring (or level) of the chart denotes a particular right afforded to gay couples.

My team took this in a different direction. We wanted to show states and regions as well. But we also wanted to show change over time, as well as intensity on a scale. So when the Bureau of Labor Statistics released its employment figures, we had a few choices. We needed to show how changes in employment have affected each state over the course of the recession (from April 2007 to April 2012). Because the recession started in December 2007, we wanted to show how employment looked in each state before the recession, during the recession and how (and which) states were pulling themselves out of the recession.

We could have created an interactive that showed how the same views above changed over time (presumably you’d see a pre-recession view showing states doing well, a recession view showing most states doing poorly, and post-recession years showing mixed results). The most valuable piece of this would be, of course, geographical patterns in the data, if they existed (how did the Rust Belt fare, or the East Coast, for example). You could overlay this with population or any other demographic data to tell an interesting story.

When we looked at the data, we saw that there were not very strong geographic patterns to show. So we decided to create a concentric chart. Why? Because we didn’t have geographic patterns, but we did have temporal patterns (most states did poorly during a particular period of time, which contrasted well with the mixed results that states showed as they were attempting to pull themselves out of the recession, at least in terms of their employment figures). And the fact that we used a circle meant that we didn’t have to create a very long or wide table or chart, and we could stray from the map approach.
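The post doesn’t say how our interactive was built, so the following is only a sketch of the underlying geometry for anyone who wants to generate a similar chart from code. Each cell is an annular sector: one wedge per state, one ring per time period. All names and dimensions here are hypothetical assumptions (50 wedges, 5 rings, ring 0 innermost, sized to fit inside a 500-pixel content well), and the output is SVG path data, which Illustrator can open.

```python
import math


def annular_sector(cx, cy, r_inner, r_outer, start_deg, end_deg):
    """SVG path data for one cell of a concentric-circle chart: the band
    between two radii, cut to the wedge between two angles. Angles are in
    degrees, measured clockwise from 12 o'clock (SVG's y axis points down)."""
    def point(r, deg):
        rad = math.radians(deg - 90)  # shift so 0 degrees points straight up
        return cx + r * math.cos(rad), cy + r * math.sin(rad)

    large_arc = 1 if (end_deg - start_deg) % 360 > 180 else 0
    x0, y0 = point(r_outer, start_deg)
    x1, y1 = point(r_outer, end_deg)
    x2, y2 = point(r_inner, end_deg)
    x3, y3 = point(r_inner, start_deg)
    return (
        f"M {x0:.2f} {y0:.2f} "
        f"A {r_outer} {r_outer} 0 {large_arc} 1 {x1:.2f} {y1:.2f} "  # outer arc, clockwise
        f"L {x2:.2f} {y2:.2f} "
        f"A {r_inner} {r_inner} 0 {large_arc} 0 {x3:.2f} {y3:.2f} Z"  # inner arc, back counterclockwise
    )


def cell(state_index, period_index, n_states=50, hole=40, ring_width=30, cx=250, cy=250):
    """Hypothetical layout: one wedge per state, one ring per time period,
    with period 0 as the innermost ring."""
    wedge = 360 / n_states
    start = state_index * wedge
    r_in = hole + period_index * ring_width
    return annular_sector(cx, cy, r_in, r_in + ring_width, start, start + wedge)


# One path per state and period; a fill color per cell would encode intensity.
paths = [cell(s, p) for s in range(50) for p in range(5)]
print(len(paths))
print(paths[0])
```

To get regional groupings like the Guardian’s, sort the states by region before assigning wedge indices, so neighboring wedges belong to the same region.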

We decided to make this a light interactive–by rolling your cursor over each state’s cell you can see a small bar graph showing change in employment over time. This worked for us because our goal wasn’t to show specific numbers (how much employment rose and fell in a particular state), but rather intensity and patterns over time.

The debate continues (check out the comments on Nathan Yau’s post on the Guardian graphic) on whether or not these concentric graphs are merely eye candy when a simple bar or line chart would do just as well. I would opine that, if used correctly, they work well. Let me know if you agree. Here’s a screenshot of our interactive, and you can view the live version here.

Quick help for Tableau

May 18, 2012 By carla in information

What did I learn at Alan Smithee’s Tableau blog? That Alan Smithee is not Alan Smithee. The name is a pseudonym that Hollywood directors use when they don’t wish to use their real names in the credits for a (usually terrible) movie.

I also learned a few helpful tips for getting around Tableau, the proprietary dataviz software (with a free Tableau Public edition) that many bloggers use. “Alan” catalogs and discusses a fairly robust array of dataviz formats, offering helpful insights into the less well-known ways of presenting information. There are also a few very good sections on geocoding and Tableau hacks. Love it.

Icing on the cake for newbies? A tabbed data viz presentation (in Tableau of course) of all the charts that Tableau can produce. Oh Alan, I love you.

The commenters on the blog are engaged, responding to both newbies and math geeks. It’s a good blog.

Bolivia’s global information and communications technology rankings: 2012

May 13, 2012 By carla in information, news

I’m beginning to realize that, for developing countries like Bolivia, technology (by that I mean information and communications technologies ranging from cellphones and internet access, usage and affordability to the use of social media) is a chicken-and-egg dynamic. For Bolivia, both the egg and the chicken seem out of reach, though there are signs that some things might be improving.

The World Economic Forum and INSEAD recently released the 2012 Global Information Technology Report which scores 142 world economies on their use of information and communications technologies. Below is an infographic that I designed detailing how poorly and how well (mostly the former) Bolivia is using technology to improve the lives of its citizens and to become modestly globally competitive in, as the report puts it, “a hyper connected world.”

Don’t get too depressed; there are some bright spots. If you’re interested, read more about how a newspaper in Argentina is using open data to circumvent its government’s lack of open data transparency. And if you’re really interested, e-mail me.

The good (rankings out of 12 countries in South America):

Bolivia’s political and regulatory environment (as it relates to technology) ranks 7th in South America.
Although Bolivia ranks last in business and innovation, it does show a relatively high (3rd) availability of venture capital.
The quality of Bolivia’s math and science education, its educational system overall, and its adult literacy rate rank 7th, 7th and 8th, respectively.
And, though Bolivia’s individual usage of technologies ranks last (12th), its citizen participation measure ranks a promising 6th.
Additionally, Bolivia’s capacity for innovation rank (5th) is highly encouraging, despite another last-place ranking for business usage of information and communications technologies overall.

The bad:

One of the clearest challenges for Bolivia is to increase the affordability, availability and reliability of its information and communication technologies (ICT) for its citizens and the businesses that operate within its borders.
Bolivia ranks last, or close to last, along almost every index. The country’s overall Network Readiness rank is 12.

Data Journalism Awards – whodunits in Spain, business in Brazil and bus subsidies in Argentina

May 13, 2012 | By carla | information, interactives, news

Three solid entries from Spain, Brazil and Argentina are among the 58 nominees in the Data Journalism Awards, the first-ever international competition for data journalism. The winners of the awards, organized by the Global Editors Network, will be announced on May 31. In the meantime, keep your eye on these three nominees:

“La trama de la SGAE,” by El Mundo’s Spanish designer David Alameda, covers last year’s “Operation Saga,” an undercover investigation into fraudulent financial activities conducted by the president and other members of Spain’s influential Society of Authors and Editors (SGAE). This piece boils down the complex network of who gave money to whom, how much and when into one of the best examples of interactive flowcharts that I’ve seen. As with the best data visualizations, this interactive avoids the common missteps of overusing photos, text, talking heads, etc. Instead, Alameda keeps his focus, and ours, on a tightly scripted interactive that guides the user quickly and efficiently through the web of financial whodunits.

The 2011 Brazil State-Level Business Environment Ranking assesses the country’s business environment across eight categories (ranging from the political climate to innovation) and a series of indicators specific to each category. The interface is clean and easy to understand. Navigation, categories and indicators are well prioritized and intuitive. One of my favorite features is the linked rollover behavior among all four elements on the screen: a regional map, a more detailed state-specific map, a regional bar graph and an overall scoring graph. It packs a lot of information into a clean, well-designed interactive.

Lastly, Argentina’s La Nación is doing great stuff with open data. By my calculations, Argentina ranks sixth of 12 South American countries (and 92nd of 142 economies globally on the Networked Readiness Index in the recent Global Information Technology Report), so this is a telling example of how the country’s relatively advanced use of information and communication technologies seems to be paying off, even if its government doesn’t always play along.

La Nación’s Subsidies for the Bus Transportation System is not so much a data visualization as a series of efforts to use open data to report on how bus subsidies in Argentina are being administered. Dig a little and you’ll find a few good infographics, investigative pieces that detail the government’s efforts to be less than transparent about dollar figures, and an encouraging collaboration between the newspaper and Junar’s open data platform to create a Tableau dashboard that is beginning to circumvent Argentina’s lack of open data infrastructure. Interestingly, the newspaper compares its early efforts to the U.S.’s Freedom of Information Act laws and the American government’s data.gov platform. The dashboard presents a snapshot of indicators key to Argentina (ranging from crime and accident rates to political indicators and legislative data). It’s a promising approach that may help other countries (like Bolivia) with similar challenges (see related article on Bolivia’s recent technology rankings).

A Public Service Announcement on rhetoric and logic

May 1, 2012 | By carla | information

David McCandless has published a brilliant infographic on how we/you/I manipulate rhetoric and logical thinking. Whether you appeal to authority, flattery, probability or tradition, this infographic is for you. Faulty deduction or garbled cause and effect? There’s a place for all of us in this chart. As a smart consumer of visual information, I’m sure you’ll appreciate this infographic.

Bolivia 2.0? The role of data, technology and information in Bolivia in 2012

April 29, 2012 | By carla | information, interactives, news

It is by now a cliché to point out that data journalism, with its credible, fact-based approach that cuts through the noise of bias and helps average citizens become informed participants in solving social and political challenges, is not (quite) manifesting itself in the developing countries that need it most. Yeah, that’s a long sentence. But Bolivia is a case in point.

A search for data visualization in Bolivia yields mostly European NGOs posting myriad Tableau and Google Maps visualizations about the usual statistics on health and the economy: laudable efforts in their own right, but not a good representation of the state of information and data visualization in Bolivia proper.

To find what Bolivians are doing, you need lots of time and a high level of tolerance for dead links. But it’s out there. As a recent example, Bolivian@s Globales produced a modern, candid video on the state of Bolivia. It’s a solid blend of information and optimism, and shows us what today’s Bolivians are capable of producing in the digital space.

And in a country where the government can be reliably counted upon to discourage openness and transparency, multimedia, even the simple use of video, is critical. Fortunately, there is evidence that digital journalism is growing. The major papers went online years ago, but more importantly, there are now digital journalism sites, and there are signs that the Bolivian blogosphere is growing, both in quality and in numbers.

Crowdsourcing, mapping and social media in Bolivian elections

Sadly, one of the most promising examples of data visualization and social media in Bolivia has gone dark, but the screenshots and documentation that remain are encouraging. In 2009, Voces Bolivianas and other Bolivians began using data visualization to monitor Bolivian elections (Elecciones 2.0 Bolivia), crowdsourcing the monitoring through Google Maps.

Coupled with Twitter, a Facebook page and other social media, Elecciones 2.0 Bolivia was groundbreaking for Bolivians. Re-visto, an online investigative journalism site run by Deutsche Welle, interviewed Mario Duran (a noted Bolivian blogger) on the groundswell of acceptance and use of social media and digital journalism in the 2009 elections (English translation here). And there was a New York Times write-up of how Bolivians were covering the elections referendum on Twitter.

Other Bolivian data visualization projects of note:

Ushahidi (a project that develops open-source software for crowdsourcing and mapping information) provided the platform for a CrowdMap tool that Bolivians are using to document civil unrest in Bolivia. Guatemalans are using the same tool to map incidents of extortion against citizens.
There’s also FLOW, a Water for People interactive built on Google Maps that shows the safety and reliability of water supplies in countries such as Bolivia.

Bolivians’ access to reliable Internet:

Bolivia (like other developing nations and rural communities in the U.S.) faces another challenge: reliable internet speeds. A recent Bolivian infographic (in Spanish) describes the problem and the citizen lobbying effort on social media (Más y mejor internet en Bolivia, “More and better Internet in Bolivia”) to address it.

I’ll be honest. As I was researching this post, I found myself frustrated that, after days of searching, I couldn’t easily point to a few examples of cutting-edge data visualization pieces. There was a part of me that wanted to say to the world, “see, we’re doing it too, you just haven’t found us.” But I’m walking away from this experience with a much more sober understanding of the challenges that Bolivians face. I’m not a journalist. I no longer live in Bolivia. I don’t have to deal with civil unrest, strikes, sketchy Internet access and the uneasy history that Bolivian governments have bequeathed to journalists and citizens concerned with civil liberties and human rights.

The willingness of Bolivians to put in the sweat equity to learn, exploit and disseminate these technologies is self-evident and encouraging.

The next steps, as I see them? Helping Bolivian journalists continue to embrace data journalism, raising awareness of free and open-source data platforms such as Tableau Public and Ushahidi, and empowering today’s technology-minded Bolivians to learn how to turn information into power through openness and transparency. I’d be most interested in hearing from you on how this is happening, and I look forward to writing more about it.

Data visualization as multi-media narrative

April 28, 2012 | By carla | information, interactives, news

I’ve been on a multimedia kick lately, digging for examples of how journalists are telling their stories through this catch-all for pictures, animations and all things that move with words. A multimedia interactive timeline produced back in September 2010 remains, in my view, a stellar example. Yes, that was over a year and a half ago, but I challenge you to find anything this good that has come out since.

El Mundo, a Spanish newspaper with a very good data visualization design team, created an interactive data visualization and multimedia narrative recreating the attempts to rescue the Chilean miners trapped in the copper-gold mine near Copiapó on August 5, 2010: “Rescate de los mineros chilenos atrapados bajo tierra” (“Rescue of Chilean Miners Trapped Underground”).

Created a month after the successful rescue, this piece by David Alameda deftly deconstructs the messy reality of three rescue plans, changing information on the ground, technical obstacles and engineering solutions, as well as the human faces behind the crisis. If I counted correctly, there are about 30 animated frames in this piece, several of which contain infographics polished enough to be published in their own right. The only thing I’d add would be a scrubber with a timeline, to let users move through the piece at their own pace and get a sense of the timing.

This is a solid interactive and a beautifully understated display of process, timelines and information. Amid our ongoing fascination with data visualization, it reminds me of why I started this blog.

Conflict underground: What’s up with Argentina’s subways (infographic)

April 26, 2012 | By carla | information, interactives

La Nación, a newspaper in Argentina, created an infographic about problems plaguing the “subte” (subway). Some problems are universal, aren’t they? I think it’s safe to say that anyone on the planet with access to trains and airplanes is bound to complain about both.

In “Los conflictos bajo tierra” (“Conflict underground”), the designer, Florencia Abd, was tasked with showing on which hours, days and months of the year subway service was disrupted by four types of labor action: a work stoppage across all lines (red), a stoppage on one line only (black), a suspension of card renewals (yellow), and the closure of entrance/exit turnstiles (blue).

Take a look at how nicely the designer solved the logistical nightmare (pun intended, sort of) of laying out various levels of time (hours, days, months and peak travel times) as well as frequency (how often problems occurred) and categorization (which types of problems). It’s a terrific infographic that simplifies complex data in a way that is immediately easy to understand.

I’d love to hear how others would redesign this very, very good infographic. It invites a few different approaches.
