<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Visualization, etc.</title><atom:link href="http://cscheid.net/rss/blog.xml" rel="self" type="application/rss+xml" /><link>http://cscheid.net/blog</link><description>cscheid.net | Visualization, etc.</description><language>en-us</language><item><title>So you want to look at a graph, part 2</title><link>http://cscheid.net/blog/so_you_want_to_look_at_a_graph__part_2</link><guid>http://cscheid.net/blog/so_you_want_to_look_at_a_graph__part_2</guid><pubDate>Wed, 29 Feb 2012 10:25:46 -0500</pubDate><description>&lt;div&gt;&lt;div /&gt;&lt;p&gt;This series of posts is a thorough examination of the design space of
graph visualization
(&lt;a href="http://cscheid.net/blog/so_you_want_to_look_at_a_graph"&gt;Intro&lt;/a&gt;,
&lt;a href="http://cscheid.net/blog/so_you_want_to_look_at_a_graph__part_1"&gt;part
1&lt;/a&gt;). In the previous post, we talked about graphs and their
properties.  We will now talk about constraints arising from the
process of transforming our data into a visualization.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;h2&gt;What is in a sheet of paper? Marks&lt;/h2&gt;&lt;/div&gt;&lt;p&gt;&lt;span style="float:left;"&gt;&lt;img src="/static/20120229/marks.png" /&gt;&lt;/span&gt;
Paper is like external memory. We can make marks on it, and later we
can &lt;span class="bold"&gt;read&lt;/span&gt; marks that we made on particular spots. We will say
then that visualizations are encodings of data as particular
configurations of marks on paper. The process of &amp;ldquo;reading a
visualization&amp;rdquo;, of getting the dataset back into our heads, is simply
the decoding of the marks on paper into what they mean. I will refrain
from defining a mark precisely: it is only important that the writer
and the reader both agree as to what constitutes one, and that they&amp;rsquo;re
both capable of reading and writing marks. Let&amp;rsquo;s see how far this notion takes
us.&lt;/p&gt;&lt;p&gt;We can draw marks of different shapes, and we use the difference
between the shapes to encode aspects of our data.  Using this idea,
we could just write down a description of a graph in English prose.
If we wanted to &amp;ldquo;visualize&amp;rdquo; the data, we would then literally read
the prose, reconstruct the graph in our heads, and be done. But
this is not a visualization!&lt;/p&gt;&lt;p&gt;If we went ahead with this boneheaded idea, we would clearly be
employing our visual system to read the prose describing the graph,
even though no one in their right mind would describe that encoding
as &amp;ldquo;visual&amp;rdquo;.  One reason for this is we know that the process of
reading prose feels fundamentally different than the process of
looking at a scatterplot, or other abstract graphical depictions.  So
let&amp;rsquo;s constrain our encodings to be &amp;ldquo;graphical&amp;rdquo; in nature. I&amp;rsquo;ll keep
this notion underspecified, but for now think of it as requiring
encodings to only be configurations of dots, lines, circles and their
shapes and positions. This makes the encoding more &amp;ldquo;visual&amp;rdquo;, and
that should be enough.&lt;/p&gt;&lt;p&gt;... or should it?
&lt;div style="clear:both" /&gt;&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;h3&gt;Even the simplest possible abstract encoding can go boink&lt;/h3&gt;&lt;/div&gt;&lt;p&gt;&lt;span style="float:left;"&gt;&lt;img src="/static/20120229/innocent_marks.png" /&gt;&lt;/span&gt;
When there are &lt;span class="bold"&gt;two&lt;/span&gt; dot marks on a piece of paper, it turns out
that we immediately see that these two marks have some distance
between them. Given two marks written on paper, I can then read a real
number, in principle to arbitrarily large precision. Since we know how
to read and write this configuration, we know how to &lt;span class="bold"&gt;encode&lt;/span&gt; a
number as distance between two points.&lt;/p&gt;&lt;p&gt;This two-mark encoding appears to be completely innocent and boring,
but it turns out that we can already get ourselves in deep trouble if
we&amp;rsquo;re not careful! If you&amp;rsquo;ve ever read about G&amp;ouml;del encodings, you
should have started feeling uneasy around where I said &amp;ldquo;arbitrarily
large precision&amp;rdquo;. With arbitrarily large precision, I can encode
a number with as many decimal places as I want, and I can define the
encoding of my graph to be, roughly, the ASCII string representing the
graph vertices, edges and properties. Furthermore, this encoding is
lossless.&lt;/p&gt;&lt;p&gt;&lt;span style="float:left;"&gt;&lt;img src="/static/20120229/oh_oh.png" /&gt;&lt;/span&gt; Although
encoding an entire graph as a single distance between two points is
valid, it&amp;rsquo;s clearly ludicrous. What went wrong?  Remember that we
decided the prose encoding was bad because &amp;ldquo;it wasn&amp;rsquo;t visual&amp;rdquo;. Now
this new encoding of ours is (superficially) visual, but it feels
similarly bad to the prose encoding. One thing the two encodings have
in common is that the part of the decoding process that confers
&amp;ldquo;graphitude&amp;rdquo; to the data does not seem to come from our vision
system, but from some other part of the brain. Because we are using
arbitrarily small differences in distances to distinguish potentially
arbitrarily large differences in graphs, this encoding needs
&amp;ldquo;additional decoding&amp;rdquo;. Somehow, there are these visual bits (in this
case, a distance) which get sent to other parts of the brain for
further interpretation. And this indirectness is precisely what is bad
about the encoding.&lt;/p&gt;&lt;p&gt;&lt;span class="bold"&gt;A good visual encoding is &amp;ldquo;direct&amp;rdquo;&lt;/span&gt;. Such encodings get their
&amp;ldquo;meaning&amp;rdquo; straight from the vision system, without requiring an
explicit &amp;ldquo;reading&amp;rdquo;. This notion should be familiar to you, if you have
read Bertin before: it is related to his ideas of &amp;ldquo;perception of
correspondences&amp;rdquo; and &amp;ldquo;retinal legibility&amp;rdquo; in Semiology of
Graphics.&lt;/p&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;h3&gt;How is any of this at all relevant?&lt;/h3&gt;&lt;/div&gt;&lt;p&gt;At this point, you may be thinking that this entire discussion
is a gigantic waste of time, a treatise in picking nits. But the fact
of the matter is that arguing about the effectiveness of different
visualization techniques &lt;span class="bold"&gt;is exactly&lt;/span&gt; arguing about encoding
choices. And if we ever hope for our theoretical arguments to be valid
regardless of which encoding we choose to examine, we need to be able to
articulate why stupid encodings like the above are, in fact, stupid.&lt;/p&gt;&lt;p&gt;&lt;span style="float:left;"&gt;&lt;img src="/static/20120229/chernoff_faces.png" /&gt;&lt;/span&gt;And
even armed with the simplest of the observations above,
we can already make some nontrivial statements.
If you&amp;rsquo;ve ever heard about Chernoff faces and
wondered why they&amp;rsquo;re a terrible idea, worry no more. Remember that the idea behind
Chernoff faces is that since we&amp;rsquo;re incredibly good at face
recognition, we should encode different attributes of our data as
different aspects of a face. (If you&amp;rsquo;ve never heard about them before:
yes, they are hilariously bad. But I assure you they were proposed
completely seriously.)&lt;/p&gt;&lt;p&gt;So why are Chernoff faces bad? It&amp;rsquo;s simple: although recognizing
different faces and telling them apart is something we do thousands of
times a day, we almost never think &amp;ldquo;yes, George Clooney&amp;rsquo;s face is
different from Julia Roberts&amp;rsquo;s face because his eyes are obviously
12.3% larger, his ears are 15.7% smaller, and his nose is half as
hooked.&amp;rdquo;  In fact, Chris Morris, David Ebert and Penny Rheingans have
experimentally confirmed this:
&lt;a href="http://www.research.ibm.com/people/c/cjmorris/publications/Chernoff_990402.pdf"&gt;Chernoff faces are not pre-attentive&lt;/a&gt;.  The important point is that
even though we can, given enough time, make precise judgements
of face proportions, the values don&amp;rsquo;t jump out at us: we have to
&lt;span class="bold"&gt;read&lt;/span&gt; faces in the same way we would read numbers from a
spreadsheet. And, if the encoding forces me to read, then it&amp;rsquo;s not a
visual encoding at all. If they were visual, we&amp;rsquo;d just stick with spreadsheet rows
in the first place!&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;h2&gt;Next&lt;/h2&gt;&lt;/div&gt;&lt;p&gt;The above argument means that, right now, settling almost every
question of which visual encodings are good and which are bad (and at
what) needs the heavy lifting of experiments in perceptual
psychology. Morris, Ebert and Rheingans&amp;rsquo;s study is important, and
appears to answer that one question definitively. But we would like to
have a theory which would explain, ahead of time, why Chernoff faces
are bad, and many other ideas we might have, without needing to
recruit 500 people on Mechanical Turk. I might come back to this later
on in the series, when we have enough theory under our belts to
actually say something about Chernoff faces.
Still, a lot of work has been done in evaluating visual encodings in
isolation, and we will have to recap the most important results.&lt;/p&gt;&lt;p&gt;Next: what do we know about good visual encodings, and can we use
these encodings directly for graph visualization? If not, &lt;span class="bold"&gt;why
not&lt;/span&gt;?&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;</description></item><item><title>HCL color space blues</title><link>http://cscheid.net/blog/hcl_color_space_blues</link><guid>http://cscheid.net/blog/hcl_color_space_blues</guid><pubDate>Thu, 16 Feb 2012 01:20:06 -0500</pubDate><description>&lt;div&gt;&lt;div /&gt;&lt;p&gt;I&amp;rsquo;ve been playing around with the HCL color space. HCL, if you&amp;rsquo;ve
never heard of it before, is a color space that tries to combine the
advantages of perceptual uniformity of Luv, and the simplicity of
specification of HSV and HSL. HCL is an improvement over HSV and HSL,
but it is not exactly ideal: there is a nasty discontinuity at some
bits of the transformation! I have been trying to find a way around
this, but I&amp;rsquo;m stumped. Let me explain, and maybe you can help me.&lt;/p&gt;&lt;div&gt;&lt;div /&gt;&lt;p&gt;&lt;span style="float:left;"&gt;&lt;iframe src="http://cscheid.net/static/20120216/xyz_frame.html" width="400" height="480" frameBorder="0"&gt;&lt;/iframe&gt;&lt;/span&gt;The transformation from RGB to HCL is somewhat complicated, and
involves two intermediate color spaces,
&lt;a href="http://en.wikipedia.org/wiki/CIE_1931_color_space"&gt;CIEXYZ&lt;/a&gt; and
&lt;a href="http://en.wikipedia.org/wiki/CIELUV"&gt;CIELUV&lt;/a&gt;.
Going from RGB to XYZ is a simple matrix transformation: $(x,y,z) = M
\cdot (r,g,b)$. For arcane reasons, there are many possible matrices: the
one most relevant nowadays is the
&lt;a href="http://www.brucelindbloom.com/index.html?Eqn_XYZ_to_RGB.html"&gt;sRGB/D65
matrix&lt;/a&gt;. This is a linear transformation designed to make a
&amp;ldquo;brightness&amp;rdquo; coordinate, Y, while encoding the rest of the space in
the other coordinates by roughly mapping them to &amp;ldquo;red&amp;rdquo; and &amp;ldquo;blue&amp;rdquo;
stimuli.&lt;/p&gt;&lt;p&gt;&lt;div style="clear:both" /&gt;
&lt;span style="float:left;"&gt;&lt;iframe src="http://cscheid.net/static/20120216/luv_frame.html" width="400" height="480" frameBorder="0"&gt;&lt;/iframe&gt;&lt;/span&gt;To go from XYZ to CIELUV, things are a bit more complicated: this is
the bit that tries to match the physiology of a typical human vision
system, which is much better at telling shades
of yellow and green apart than it is at telling shades of blue
apart. The &lt;a href="http://en.wikipedia.org/wiki/CIELUV"&gt;full
transformation&lt;/a&gt; behaves nonlinearly, and tries to make the Euclidean
distance in CIELUV correspond roughly to perceptual differences. In
this space, L encodes the lightness of the color, or how bright it is,
and uv encodes the chromaticity portion: the particular tint or shade
of the color.&lt;/p&gt;&lt;p&gt;&lt;div style="clear:both" /&gt;
&lt;span style="float:left;"&gt;&lt;iframe src="http://cscheid.net/static/20120216/hcl_frame.html" width="400" height="480" frameBorder="0"&gt;&lt;/iframe&gt;&lt;/span&gt;Finally, HCL is then obtained by simply transforming the UV
coordinates of Luv to polar coordinates. The phase is interpreted as
&lt;span class="bold"&gt;hue&lt;/span&gt;, and the length of the vector as &amp;ldquo;saturation&amp;rdquo;
(specifically, it&amp;rsquo;s then called &lt;span class="bold"&gt;chroma&lt;/span&gt;).&lt;/p&gt;&lt;p&gt;The goal of HCL is to be perceptually uniform along its axes, and so
the thing to notice is how the apparent brightness of the colors
appears roughly the same for any given slider setting; and while moving
along the horizontal axis changes the hue of the color, it doesn&amp;rsquo;t
change the perceived lightness or saturation. Compare this with the
HSV colorspace.&lt;/p&gt;&lt;p&gt;&lt;div style="clear:both" /&gt;
&lt;span style="float:left;"&gt;&lt;iframe src="http://cscheid.net/static/20120216/hsv_frame.html" width="400" height="480" frameBorder="0"&gt;&lt;/iframe&gt;&lt;/span&gt;So you can play with these color spaces, I&amp;rsquo;ve written a few little
demos of the color spaces using
&lt;a href="http://cscheid.github.com/facet/"&gt;Facet&lt;/a&gt;. The sliders control
the axes which resemble brightness, and the image then shows a slice
of the resulting parameter space. You will need WebGL and Chrome for
these to work (sorry!). Pay attention to the boundary of the gamut.&lt;/p&gt;&lt;p&gt;One of the great conveniences of HSV is that no matter what you do in
HSV, you will end up somewhere inside the (0,0,0)-(1,1,1) cube of
valid RGB colors. That means nothing too strange happens.&lt;/p&gt;&lt;p&gt;On the other hand, if you play a bit with the LUV and HCL colorspaces
at low luminances, you will see a discontinuity in the
conversion. Although it happens outside the RGB gamut, it is still
quite annoying: some paths through HCL are cut off in RGB. The issue
happens when clamping the values from outside of the gamut back into
(0,0,0)-(1,1,1). This is what I would like to solve: is there a simple
way to create a (clamped) conversion from HCL to RGB that is
continuous and reasonable?&lt;/p&gt;&lt;p&gt;The procedure that is used in the
&lt;a href="http://cran.r-project.org/web/packages/colorspace/index.html"&gt;R
package for colorspace management&lt;/a&gt; is the one I&amp;rsquo;m currently using in
the demo above: after converting from HCL to an RGB value, we find the
closest point to the raw conversion that is inside the RGB cube.&lt;/p&gt;&lt;p&gt;Here&amp;rsquo;s a different approach that &lt;span class="bold"&gt;is&lt;/span&gt; continuous: instead of
converting the color $c$ directly, we search for the closest color $c'$ in
HCL space that converts to a value inside the RGB gamut. Now
the problem is: how do we actually find such a transformation
efficiently? It&amp;rsquo;s easy to see that if $c$ goes outside the RGB gamut,
then $c'$ will be on the boundary of the gamut. So this is &amp;ldquo;merely&amp;rdquo;
a two-dimensional search problem. Except that the boundary of the RGB gamut in HCL or
CIELUV space is complicated. So we&amp;rsquo;re looking for the minimum
of a function constrained to a complicated 2D surface, and I don&amp;rsquo;t
think there&amp;rsquo;s any simple algorithm to do this.&lt;/p&gt;&lt;p&gt;Or is there?&lt;/p&gt;&lt;p&gt;&lt;div style="clear:both" /&gt;&lt;/p&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;h2&gt;Acknowledgements&lt;/h2&gt;&lt;/div&gt;&lt;p&gt;Thanks to &lt;a href="http://had.co.nz/"&gt;Hadley Wickham&lt;/a&gt; for teaching me
about HCL; his &lt;a href="http://had.co.nz/ggplot/"&gt;ggplot&lt;/a&gt; library
uses that color space extensively. This post grew out of trying to
make continuous HCL scales easier to specify.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;</description></item><item><title>So you want to look at a graph, part 1</title><link>http://cscheid.net/blog/so_you_want_to_look_at_a_graph__part_1</link><guid>http://cscheid.net/blog/so_you_want_to_look_at_a_graph__part_1</guid><pubDate>Wed, 25 Jan 2012 08:34:01 -0500</pubDate><description>&lt;div&gt;&lt;div /&gt;&lt;p&gt;This series of posts is a tour through the design space of graph
visualization. As I promised, I will do my best to objectively justify
as many visualization decisions as I can.  This means we will have to
go slow; I won&amp;rsquo;t even draw anything today!  In this post, I will only
take the very first step: all we will do is think about graphs, and
what might be interesting about them.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;h2&gt;What is in a graph?&lt;/h2&gt;&lt;/div&gt;&lt;p&gt;A graph $G$ has two things: a set of &lt;span class="bold"&gt;vertices&lt;/span&gt; $V$, and a set of
&lt;span class="bold"&gt;edges&lt;/span&gt; $E$, where each edge is represented by an ordered pair of
distinct vertices (so in this definition we will not have multiple
edges or &amp;ldquo;self-edges&amp;rdquo;). To denote that $(a, b)$ is in $E$, I will
use $a \to b$.&lt;/p&gt;&lt;p&gt;Usually, we also have a mapping $v_\textrm{attr}$
from $V$ to some other space $V_A$. This gives us attributes of these
vertices (names of the people in your social network, names of the
computers in your intranet, etc.). A similar mapping $e_\textrm{attr}$
from $E$ to $E_A$ does the same for edges (is $b$ married to $c$ or
does $b$ work for $c$? How far is $h$ from $j$?, etc.).&lt;/p&gt;&lt;p&gt;These define a graph, but they don&amp;rsquo;t say much about what is interesting
about it. So let&amp;rsquo;s list some properties of (these very general)
graphs. By explicitly thinking about them, we can see the impact they
will have on our choices of pictures.&lt;/p&gt;&lt;div&gt;&lt;div&gt;&lt;h3&gt;Graphs are directed or undirected&lt;/h3&gt;&lt;/div&gt;&lt;p&gt;One important characteristic of graphs is whether they are
&lt;span class="bold"&gt;directed&lt;/span&gt; or &lt;span class="bold"&gt;undirected&lt;/span&gt;. When we say that a graph is
undirected, we mean that whenever $(a,b) \in E$, it is implied
that $(b,a) \in E$: in other words, the has-edge relation is
symmetric, and $e_\textrm{attr}((a,b)) = e_\textrm{attr}((b,a))$.
(For undirected graphs, I will write $a &amp;ndash; b$ to mean
that both $(a, b) \in E$ and $(b, a) \in E$ are true).  Otherwise, we
say that $G$ is directed.&lt;/p&gt;&lt;p&gt;This distinction is important because, remember, the first rule of
visualization is &amp;ldquo;draw all there is, but no more&amp;rdquo;. If our graph is
such that $a \to b$ does not imply $b \to a$, our visualization of it
had better not make the relationship between $a$ and $b$
look symmetric. Of course, &amp;ldquo;making the relationship look symmetric&amp;rdquo;
is not a formal statement, and we might argue about what it
really says. But this is what I meant about the
difference between a formal systematization and an &amp;ldquo;informal&amp;rdquo; one:
we should not disregard the notion simply because we don&amp;rsquo;t know how
to formalize it! And, as we will see, I believe this distinction
&lt;span class="bold"&gt;does&lt;/span&gt; guide the visualization choice.&lt;/p&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;h3&gt;Graphs have paths&lt;/h3&gt;&lt;/div&gt;&lt;p&gt;An edge $a \to b$ in a graph implies some sort of connection between
$a$ and $b$, and we typically think of these connections being
&lt;span class="bold"&gt;transitive&lt;/span&gt;. So if $a \to b$ and $b \to c$ encode some
relationship, we tend to think of there existing some relationship
between $a$ and $c$ as well (we will say $a \leadsto b$ to say that
there exists some path $a \to \ldots \to b$).&lt;/p&gt;&lt;p&gt;This reveals another interesting property of graphs. Let&amp;rsquo;s say you send
the elements of $V$ into new sets, such
that whenever $a \leadsto b$ and $b \leadsto a$, $a$ and
$b$ must go into the same set. Then, every element of $V$ ends up in exactly
one new set. These sets form a &lt;span class="bold"&gt;partition&lt;/span&gt; (into &amp;ldquo;strongly
connected components&amp;rdquo;, SCCs). Natural partitions like this
are your data&amp;rsquo;s way of telling you to consider divide-and-conquer. If
you think paths are important (implying that SCCs are important as well), then
your resulting visualization should be &amp;ldquo;partition-preserving&amp;rdquo;
too: 1) your visualization should have the ability to visually
represent a partition of vertices (call it a &amp;ldquo;visual partition&amp;rdquo;) and
2) iff $a$ and $b$ are in the same partition, then the visualization
of $G$ should put $a$ and $b$ in the same &amp;ldquo;visual partition&amp;rdquo;.&lt;/p&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;h3&gt;Paths have cycles&lt;/h3&gt;&lt;/div&gt;&lt;p&gt;We will call a path $a \leadsto a$ which does not repeat internal
vertices a &lt;span class="bold"&gt;cycle&lt;/span&gt; (and we will require that cycles in undirected
graphs have at least two internal vertices). A directed graph
with no cycles is a dag (&amp;ldquo;directed acyclic graph&amp;rdquo;) and an undirected
graph with no cycles is a forest (a tree, if it is connected).&lt;/p&gt;&lt;p&gt;Vertices of a dag can be assigned natural numbers $f(v)$ such that for every
pair of distinct vertices $a$ and $b$ such that $a \leadsto b$, $f(a) &amp;lt; f(b)$. If your paths encode
dependencies, this assignment of numbers &lt;span class="bold"&gt;ranks&lt;/span&gt; the
dependencies, and is good information to have around.&lt;/p&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;h3&gt;Many undirected graphs have a metric structure&lt;/h3&gt;&lt;/div&gt;&lt;p&gt;The final structure I want to mention is the &lt;span class="bold"&gt;metric
structure&lt;/span&gt;. For some undirected graphs, there is a very natural way to
come up with a distance function between two vertices such that it
resembles the familiar distances in plain old
two- and three-dimensional space. Our eyes are reasonably good at
distance judgements (yes, that&amp;rsquo;s somewhat controversial because of
optical illusions and such. But if we are sensitive to these issues, I
believe we can use Cleveland to back the statement.)&lt;/p&gt;&lt;p&gt;Anyway, a function $d: V \times V \to R$ is a &lt;span class="bold"&gt;metric&lt;/span&gt; if:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;$d(a, b) \ge 0$, with equality iff $a = b$.&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;$d(a, b) = d(b, a)$&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;$d(a, c) \le d(a, b) + d(b, c)$, for all $b$&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;(Assume that the graph is connected for now; any pair of vertices
$(a,b)$ is such that $a \leadsto b$ or $b \leadsto a$.) If one of the
attributes of undirected graph edges is a positive &lt;span class="bold"&gt;weight&lt;/span&gt;
associated with each edge, then the standard metric to assign to a
graph is the &lt;span class="bold"&gt;shortest-path metric&lt;/span&gt;, where we say that the
distance $d(a, b)$ is given by the smallest cost of a path, this cost
being the sum of the edge weights along the path.&lt;/p&gt;&lt;p&gt;&amp;ldquo;But what if my graph has negative edge attributes?&amp;rdquo;, you ask. Good
question!  Then you simply can&amp;rsquo;t use a metric to describe that
particular attribute of your graph. And slightly less trivially, if your
visualization technique implies that your graph obeys some metric, then
it is telling a lie. As a preview of the next few posts, this
&amp;ldquo;metric-friendliness&amp;rdquo; will be a crucial distinction between network
diagrams and matrix diagrams.&lt;/p&gt;&lt;p&gt;Next up, I will talk about 2D space: a sheet of blank paper where we
get to write. Then we will put those things together, and bam,
&lt;span class="bold"&gt;visualization&lt;/span&gt;.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;h2&gt;Previous posts&lt;/h2&gt;&lt;/div&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;&lt;a href="http://cscheid.net/blog/so_you_want_to_look_at_a_graph"&gt;Series
introduction&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;/div&gt;</description></item><item><title>So you want to look at a graph</title><link>http://cscheid.net/blog/so_you_want_to_look_at_a_graph</link><guid>http://cscheid.net/blog/so_you_want_to_look_at_a_graph</guid><pubDate>Mon, 16 Jan 2012 18:56:48 -0500</pubDate><description>&lt;div&gt;&lt;div /&gt;&lt;p&gt;Say you are given a
&lt;a href="http://en.wikipedia.org/wiki/Graph_theory"&gt;graph&lt;/a&gt; and are
told: &amp;ldquo;Tell me everything that is interesting about this graph&amp;rdquo;.
What do you do? We visualization folks like to
believe that good pictures show much of what is interesting about
data; this series of posts will
carve a path from graph data to good graph plots. The path will take
us mostly through well-known research results and techniques;
the trick here is I will try to motivate the choices from first
principles, or at least as close to it as I can manage.&lt;/p&gt;&lt;div&gt;&lt;div /&gt;&lt;p&gt;One of the ideas I hope to get across is that, when
designing a visualization, it pays to systematically
consider the design space.
Jock Mackinlay&amp;rsquo;s real breakthrough in 1986 was not the
technique for turning a relational schema into a drawing
specification. It was the realization that this systematization was
possible and desirable. That his technique was formal enough to be
encoded in a computer program is great gravy, but the
basic insight is deeper.&lt;/p&gt;&lt;p&gt;Of course, the theory and practice of visualization in general is not
ready for a complete systematization, but there are portions
ripe for the picking. In this series, I want to see what I can do
about graph visualization.&lt;/p&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;h2&gt;Why graphs?&lt;/h2&gt;&lt;/div&gt;&lt;p&gt;Graphs have enough structure to make this discussion possible
and interesting, while not being so complicated
that the specific lessons they have to teach us would not translate well
to other domains.&lt;/p&gt;&lt;p&gt;They also happen to be the &lt;a href="http://www.graphviz.org"&gt;expertise&lt;/a&gt; of the research
department &lt;a href="http://www.research.att.com/groups/infovis"&gt;where I am&lt;/a&gt;.
I have spent a good amount of time in the last two years learning
the graph drawing landscape, and this seems like an
opportunity for a braindump.&lt;/p&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;h2&gt;What to draw&lt;/h2&gt;&lt;/div&gt;&lt;p&gt;The first rule of visualization is to &lt;span class="bold"&gt;draw all there is, but no
more&lt;/span&gt;. This goal is usually not attainable.&lt;/p&gt;&lt;p&gt;But as we will see, keeping the rule in mind helps us
navigate the design space. When we are aware of the rule, the process
of thinking how we are bending it gives
natural descriptions of the trade-offs.
It also raises the question &amp;ldquo;have we missed any spots?&amp;rdquo;, and I
&lt;a href="http://cscheid.net/blog/how_many_visweek_papers_could_the_nyt_write_in_three_weeks_"&gt;think&lt;/a&gt;
this is a fundamental question to ask.&lt;/p&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;h2&gt;What not to draw&lt;/h2&gt;&lt;/div&gt;&lt;p&gt;I am a huge fan of data visualization. Often, it gives the most
effective path from data to insight.&lt;/p&gt;&lt;p&gt;Still, if we are to defend visualization as a first-class citizen in data
analysis, we have to be honest about it. One of my pet peeves is
that some visualizations fall into the &amp;ldquo;Something must
be done; visualization is something; therefore, we must visualize it&amp;rdquo;
trap. So here&amp;rsquo;s another rule: &lt;span class="bold"&gt;use visualization only when
writing an algorithm would be harder, or not as
effective&lt;/span&gt;.&lt;/p&gt;&lt;p&gt;As a corollary of this rule, visualizations forever must
be judged against a moving goalpost. Humans will continually become
better at solving problems with algorithms; visualization as a field
exists to help from the other side of the fence.&lt;/p&gt;&lt;p&gt;Incidentally, this is why
&amp;ldquo;automatic outlier detection&amp;rdquo; is always a big red flag for me: outliers are by definition
things which fall outside a model. &amp;ldquo;Outlier detection&amp;rdquo; as performed by
a computer is, necessarily, a model. Good visualization techniques
don&amp;rsquo;t try to detect outliers: if the goal of the task is to detect
outliers, and the outliers could be detected effectively by
a computer, then a visualization wouldn&amp;rsquo;t be the right tool to find
them! (Now, replace &amp;ldquo;outliers&amp;rdquo; with &amp;ldquo;features&amp;rdquo;, and think
about machine learning as it relates to visualization
practices.)&lt;/p&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;h2&gt;Roadmap, or, where are we headed?&lt;/h2&gt;&lt;/div&gt;&lt;p&gt;These will be some of our stops along the way:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;What are you actually showing? Network diagrams vs. matrix
diagrams&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;What do you want to show? Distance embedding and energy
minimization&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Stress majorization, and breaking the $O(|V|^3)$ barrier&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Edge bundling: pesky visuals, always getting in the way&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Some possible interludes:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;p&gt;Reading Bertin&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;&amp;ldquo;Your drawings are hairballs! Graph drawing doesn&amp;rsquo;t work for
large graphs!&amp;rdquo; A quick digression into lower bounds&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;Where are your users?!&lt;/p&gt;&lt;/li&gt;&lt;li&gt;&lt;p&gt;A visualization blog with so few pictures! Isn&amp;rsquo;t that ironic?&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;If you have a particular request, let me know. Much of this is
shaping up as I write it.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;</description></item><item><title>The Beauty of Roots, a Facet demo</title><link>http://cscheid.net/blog/the_beauty_of_roots__a_facet_demo</link><guid>http://cscheid.net/blog/the_beauty_of_roots__a_facet_demo</guid><pubDate>Tue, 13 Dec 2011 23:34:20 -0500</pubDate><description>&lt;div&gt;&lt;div /&gt;&lt;p&gt;John Baez over at his
&lt;a href="http://johncarlosbaez.wordpress.com/2011/12/11/the-beauty-of-roots/"&gt;new
blog Azimuth&lt;/a&gt; has a post with an amazing looking fractal: the set of
all roots of all polynomials with coefficients -1 or 1. Since it&amp;rsquo;s
&amp;ldquo;just&amp;rdquo; a set of points, it seemed
like the perfect opportunity to try Facet on a large, good-looking dataset, and
&lt;a href="http://cscheid.github.com/facet/demos/beauty_of_roots.html"&gt;here
is the result&lt;/a&gt;. I think it looks pretty nice. If you want to know more
about the mathematics behind it, read Baez&amp;rsquo;s post. If you care about
the visualization details of this, read on!&lt;/p&gt;&lt;div&gt;&lt;div /&gt;&lt;p&gt;The original dataset used by Baez in the pictures is the set of all
roots of polynomials of degree up to 24. That gives about 400 million
points, and at 8 bytes per point, we&amp;rsquo;re talking 3.2GB of data. Not a
good idea :) What I show here are the roots of polynomials of degree
up to 15. It&amp;rsquo;s still fairly large, clocking in at almost two million points.
Even so, the total amount of data being fetched from the server is only about
15MB. This would be hard to do in anything but WebGL, and would be
painful to write in anything but Facet.&lt;/p&gt;&lt;p&gt;It&amp;rsquo;s worth mentioning that the whole thing is 180 lines of Javascript,
of which about half is jQuery and GUI-related cruft, and the other
half is Facet. The actual rendering is done in two passes. The first
pass splats additive Gaussian blobs of adjustable size and weight onto
a floating-point texture (so that we don&amp;rsquo;t get too much accumulation
error). The shape of the Gaussian blobs is computed in a fragment
shader. Then, we read back the texture and pass it through a simple
tonemapping and colormap on another shader. If you read the
&lt;a href="http://cscheid.github.com/facet/demos/beauty_of_roots.js"&gt;source&lt;/a&gt;,
however, you&amp;rsquo;ll see that there are no shaders being written anywhere:
they&amp;rsquo;re all synthesized from the Javascript expressions.&lt;/p&gt;&lt;p&gt;The bit that took a lot of ugly parameter hacking was getting a
pleasant tradeoff between a global view of the fractal structure
and visible detail when zooming in. A fixed screen-space width
for each blob looks bad (you can&amp;rsquo;t really see the points when zooming
in deep; they become too small), but a fixed world-space width for
each blob looks bad too (the blobs never resolve into roots). The
solution is to use, essentially, the geometric mean between those two
sizes. It works well in practice, but I can&amp;rsquo;t really justify it
theoretically.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;</description></item><item><title>Announcing Facet, the EDSL for WebGL visualization and graphics</title><link>http://cscheid.net/blog/announcing_facet__the_edsl_for_webgl_visualization_and_graphics</link><guid>http://cscheid.net/blog/announcing_facet__the_edsl_for_webgl_visualization_and_graphics</guid><pubDate>Mon, 14 Nov 2011 12:18:07 -0500</pubDate><description>&lt;div&gt;&lt;div /&gt;&lt;p&gt;&lt;a href="http://cscheid.github.com/facet"&gt;Facet&lt;/a&gt; is a Javascript
library I&amp;rsquo;m writing, part of a research project on high-performance
visualization and graphics on the web. It&amp;rsquo;s peculiar how historical
accidents are opportunities in disguise. If everyone knew Lisp, and if
Javascript and the WebGL shading languages were Lisp, Facet would be
what everyone would write. But Javascript is no Lisp, and WebGL&amp;rsquo;s
vertex and fragment programming languages aren&amp;rsquo;t Lisp either, and many
people still don&amp;rsquo;t know about Lisp.&lt;/p&gt;&lt;div&gt;&lt;div /&gt;&lt;p&gt;There&amp;rsquo;s a massive opportunity, then, to narrow this chasm between the
Javascript and the WebGL worlds, a chasm that&amp;rsquo;s currently giving every
WebGL programmer out there a world of hurt, even if they don&amp;rsquo;t know
about it. This chasm must be narrowed by programming language
technology: WebGL embeds a set of programming languages into
Javascript, and it is the mismatch between the two that causes much of
the pain.&lt;/p&gt;&lt;p&gt;Facet is a Javascript library to bring high-level, composable
primitives to high-performance graphics and visualizations on the
web. Facet is an embedded, domain-specific language in Javascript, and
it is built around an optimizing source-to-source compiler.&lt;/p&gt;&lt;p&gt;Facet is still very much a work in progress. But if you care about this
sort of thing, jump over to the
&lt;a href="http://cscheid.github.com/facet/"&gt;Github&lt;/a&gt; page, or just
&lt;a href="http://github.com/cscheid/facet"&gt;fork Facet&lt;/a&gt; directly; I&amp;rsquo;d
love to hear your feedback.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;</description></item><item><title>How many VisWeek papers could the NYT write in three weeks?</title><link>http://cscheid.net/blog/how_many_visweek_papers_could_the_nyt_write_in_three_weeks_</link><guid>http://cscheid.net/blog/how_many_visweek_papers_could_the_nyt_write_in_three_weeks_</guid><pubDate>Tue, 1 Nov 2011 16:43:16 -0400</pubDate><description>&lt;div&gt;&lt;div /&gt;&lt;p&gt;Last week&amp;rsquo;s &lt;a href="http://visweek.org"&gt;VisWeek&lt;/a&gt; finished with Amanda
Cox&amp;rsquo;s amazing capstone talk about the visualization work that goes on
at the &lt;a href="http://www.nytimes.com"&gt;New York Times&lt;/a&gt;. As everyone in
the room was rightfully being blown away by the incredible
productivity of their graphics department,
&lt;a href="http://twitter.com/timrdf"&gt;Tim Lebo&lt;/a&gt;
&lt;a href="https://twitter.com/#!/timrdf/status/129943843171868672"&gt;asked&lt;/a&gt;:
How many papers could NYT submit in 3 weeks?&lt;/p&gt;&lt;p&gt;This same sentiment echoed in the hallways after her talk. Now, I
don&amp;rsquo;t know what this number is; but I know what it should be. It
should be &lt;span class="italic"&gt;zero&lt;/span&gt;! NYT obviously has much to say about
visualization,
but there is a distinction to draw here.
I think it is important not to confuse the top-of-the-line
visualization practices seen at the NYT with what we do at
VisWeek. Let me be more specific.&lt;/p&gt;&lt;div&gt;&lt;div /&gt;&lt;p&gt;For example, I see at least two reasons for design study papers at
VisWeek. First, these might offer a comprehensive view of the design
space for a particular type of data or visual primitive. A great
example of this type of paper this year was Claessen and van Wijk&amp;rsquo;s
&lt;a href="http://www.computer.org/csdl/trans/tg/2011/12/ttg2011122310-abs.html"&gt;Flexible
Linked Axes for Multivariate Data Visualization&lt;/a&gt;. As Jarke put it in
his talk, one of the questions design study papers should strive to
answer is &amp;ldquo;is that all there is?&amp;rdquo;. In other words: have we looked at
the entire design space? Why or why not? Those are typically research
questions, and deserve papers. The other reason for a design study
paper at VisWeek is to highlight an especially important area of
current research in domain sciences, and point at directions of future
work. I almost see these papers as provocations: they show how other
areas might need visualization, and how our research might use those
other areas&amp;rsquo; problems and research questions as framing devices for
our own problems and research
questions. &lt;a href="http://www.cs.utah.edu/~miriah/"&gt;Miriah Meyer&lt;/a&gt;&amp;rsquo;s
recent biological visualization papers are the perfect examples of
this kind of research program.&lt;/p&gt;&lt;p&gt;Although I have not asked the NYT directly, I think it is fair to claim
that their graphics department does not, and should not,
care about either of the categories above. The NYT is in the business
of informing people in a timely manner, as truthfully, thoroughly and
compellingly as possible. We now know that visualizations are an
integral aspect of this process, and it is rewarding and stimulating to see
visualizations executed with the skill and speed of which only the NYT
seems to be capable. But notice that none of this says anything
about &lt;span class="italic"&gt;research&lt;/span&gt; in visualization!&lt;/p&gt;&lt;p&gt;We are in the business of &lt;span class="italic"&gt;understanding&lt;/span&gt; visualizations: why,
when, and how they work. We are in the business of &lt;span class="italic"&gt;making it
easier&lt;/span&gt; to produce good, compelling visualizations. We are even in the
business of screaming to the rest of the world, at the top of our
lungs, that they should be using visualizations, and not stupid Excel
tables of numeric values.  As our field becomes more practical and has
more impact, it is nonetheless important to keep in mind that we are
&lt;span class="italic"&gt;not&lt;/span&gt; in the business of actually producing visualizations,
however fun and rewarding that might be. Bringing brilliant
practitioners such as Amanda Cox to our conference is the ideal way of
interacting with visualization as it is practiced. Thinking that the
work which goes on at the NYT should become our research is not.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;</description></item><item><title>VisWeek 2011 day 2</title><link>http://cscheid.net/blog/visweek_2011_day_2</link><guid>http://cscheid.net/blog/visweek_2011_day_2</guid><pubDate>Tue, 25 Oct 2011 10:44:08 -0400</pubDate><description>&lt;div&gt;&lt;div /&gt;&lt;p&gt;The VisWeek keynote is just over; the fast-forward and the first VAST
session are now underway. There&amp;rsquo;s more VAST in the afternoon, and also
what I think will be a great panel.&lt;/p&gt;&lt;div&gt;&lt;div /&gt;&lt;p&gt;I thought that the keynote was somewhat disappointing. It seems wrong
to snark at an invited speaker, but I wish his talk had a little more
justification for the visualizations he presented; the animation of
the neural network process, for example, seemed to me to use
visualizations for precisely the wrong reasons: it showed something,
it looked good, but it never became clear what the visualization
taught the users. What&amp;rsquo;s the point, then? Am I missing something here?&lt;/p&gt;&lt;p&gt;One paper I&amp;rsquo;m looking forward to seeing this morning is &lt;a href="http://www.cg.tu-bs.de/media/publications/Albuquerque2011PBQ.pdf"&gt;Albuquerque et
al.&amp;rsquo;s Perception-Based Visual Quality Metrics&lt;/a&gt;. The interesting bit here
is that they use perceptual experiments to derive a metric through
explicit energy minimization. This is neat: the paper uses human
subjects where computers can&amp;rsquo;t do a very good job, and computers where
humans don&amp;rsquo;t. This is something that visualization papers tend
to get wrong: because the community is good at creating depictions of
data which are intelligible and attractive, we tend to forget that
computers are actually great at computing things!&lt;/p&gt;&lt;p&gt;In the afternoon, there&amp;rsquo;s &lt;a href="http://vis.stanford.edu/papers/orion"&gt;Orion&lt;/a&gt;, Heer and Perer&amp;rsquo;s system for
manipulating network data. The only systemsy tool I know for
generating and processing network data is GVPR, and it feels, ahem,
slightly outdated. It certainly gets simple jobs done, but the AWK
expertise it assumes seems to be falling out of favor, and Heer has a knack for
good computational building blocks. I&amp;rsquo;m looking forward to that talk.&lt;/p&gt;&lt;p&gt;In the afternoon as well, there&amp;rsquo;s van Wijk, Ware, Demiralp and
Laidlaw&amp;rsquo;s aptly named panel: &amp;ldquo;Theories of Visualization: are there
any?&amp;rdquo;. This continues the long-standing debate on how to
appropriately define and measure the goodness and utility of a
visualization. Longtime readers will remember my discussion of the topic at
the time of Laidlaw&amp;rsquo;s capstone some years ago. This problem continues
to be important, continues to be hard, and continues to be completely
open.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;</description></item><item><title>VisWeek iOS app</title><link>http://cscheid.net/blog/visweek_ios_app</link><guid>http://cscheid.net/blog/visweek_ios_app</guid><pubDate>Sun, 23 Oct 2011 12:56:52 -0400</pubDate><description>&lt;div&gt;&lt;div /&gt;&lt;p&gt;There is now a &lt;a href="http://itunes.apple.com/us/app/conference-manager/id459194322?mt=8"&gt;free iOS app&lt;/a&gt; where you can browse conference programs,
including VisWeek. No more losing that little piece of paper with the
program!&lt;/p&gt;&lt;/div&gt;</description></item><item><title>VisWeek 2011 day 0</title><link>http://cscheid.net/blog/visweek_2011_day_0</link><guid>http://cscheid.net/blog/visweek_2011_day_0</guid><pubDate>Sun, 23 Oct 2011 07:41:16 -0400</pubDate><description>&lt;div&gt;&lt;div /&gt;&lt;p&gt;&lt;a href="http://visweek.org"&gt;VisWeek 2011&lt;/a&gt; starts tomorrow, with two
workshops, three tutorials and the contest. Having helped organize the
workshops program, I&amp;rsquo;m obviously biased on the matter. Come attend the
workshops! They&amp;rsquo;re fantastic.&lt;/p&gt;&lt;div&gt;&lt;div /&gt;&lt;p&gt;You get to choose between two great
events; it&amp;rsquo;s a pity they run in parallel. The first
workshop is on visual analytics and health care, and in particular about
&lt;a href="http://www.visualanalyticshealthcare.org/"&gt;getting physicians
to pay attention to us and vice-versa&lt;/a&gt;. This perspective cannot be
stressed nearly enough, and it&amp;rsquo;s obvious how important this topic
is. In particular, I&amp;rsquo;d like you to consider attending the panel they
will host at 11:00am. Let&amp;rsquo;s get a conversation started between these
two universes!&lt;/p&gt;&lt;p&gt;(Yes, I was half-asleep when I typed the original post. The
uncertainty workshop is on Monday!) The second workshop is on
&lt;a href="http://data-stories.com/"&gt;story-telling&lt;/a&gt;, which is noteworthy
in that it follows the extremely successful workshop from last year,
and should be very popular as well. The interaction with the
journalist community is very exciting, and getting people in newsrooms
to care about what we do, and us caring about what they need, is
well worth your attendance. I&amp;rsquo;ll note in particular a talk by the New York Times&amp;rsquo; Brad
Stenger.&lt;/p&gt;&lt;p&gt;I&amp;rsquo;ll try to do some live-blogging, since not everyone is going to be
around for the whole week, in particular for the first few days. The
last couple of years were a complete live-blogging failure on my part,
but let&amp;rsquo;s see if I can make it work this year.&lt;/p&gt;&lt;p&gt;If you see me around tomorrow, stop and say hi!&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;</description></item></channel></rss>
