2026-05-14 · paper

There and Back Again: From Arrows to Tables

Work in progress. Picks up where part one left off; same cast, same slow pace.

Last time we built the RDPG recipe. Every node gets a yellow arrow and a blue arrow, every potential edge is a coin flip whose bias is the dot product of the giver’s yellow and the receiver’s blue. The bigger that number, the more likely the edge. We did that with two nodes, Anubis and Bastet, and we waved at the world with the kind of confidence you can have when you only have two of anything.

This time the cast widens. Anubis and Bastet are joined by the rest of the Egyptian cosmogony: Ra, Isis, Osiris, Horus, Thoth, and friends, related to one another in all the messy mythical ways the texts allow. (I’m building the real pantheon network for a later post; for today the extras stay abstract.) Having more than two nodes is what lets us ask the question the whole research programme is really about: if all we get to see is the network, and not the arrows themselves, can we recover the positions of the nodes that produced it?

Here is the path for today. First we turn arrows into rows of numbers and stack those rows into tables. Then the yellow and blue tables make a green table of probabilities. Finally, from one noisy graph drawn from that green table, we ask how much of the hidden geometry can be found again.

Ra, falcon-headed Egyptian sun god with a sun disk — Ra, falcon-headed under the sun disk. Sails the sky by day, fights chaos by night.

Isis, throne-crowned Egyptian goddess — **Isis**, throne-crowned, magical and unstoppable. Recombined the world by sheer will.

Osiris, mummified Egyptian lord of the afterlife — **Osiris**, mummy-bound and green-skinned, lord of the afterlife and the Nile's annual return.

Horus, falcon-headed Egyptian sky god — **Horus**, falcon-headed, sky and kingship. His left and right eyes are the moon and the sun.

Thoth, ibis-headed Egyptian god of writing — **Thoth**, ibis-headed, god of writing, calendars, and the patience to count things.

Portraits by Jeff Dahl, CC BY-SA 4.0.

Let’s start by changing how we draw the arrows.

From an arrow to a list of numbers

An arrow in two dimensions is two numbers (an $x$ coordinate and a $y$ coordinate). An arrow in three dimensions is three numbers. An arrow in $d$ dimensions is $d$ numbers. We can write each arrow as a row of numbers, left to right, one entry per dimension. There’s an equivalent way to read those numbers, and it’s the one we’ll use from here on: they are the coordinates of a single point in $d$ -dimensional space, the tip of the arrow, with the shaft no longer drawn.

Each of those numbers is a coordinate along one axis. Anubis’s first yellow coordinate might be how funereal he is, his second how meticulous, his third how dog-like; whatever the abstract axes are meant to capture, his coordinates pin him to a point in yellow space, his identity-as-a-giver. The same holds for blue space: every node also has a point there, its identity as a receiver. The two positions together, one in each space, fully describe that node in the model.

There’s a catch hiding in that “might be”, and it’s worth flagging now. No single coordinate carries a meaning on its own. As we’ll see by the end of this post, the embedding is pinned down only up to a rotation (the thread you may have tugged with the rotate-together toggle in part one), and rotating reshuffles the individual coordinates while leaving all the geometry untouched. So “funereal” or “meticulous” cannot live on any one axis; what survives a rotation, and can therefore carry real meaning, are the lengths of the arrows, the distances between them, and the angles between them. A later post will read the cloud with exactly that in mind.

We also don’t always know what the axes are meant to capture. Often they’re implicit, found by the data; in that case we use them without bothering to name them, and the model still works.

Stacking arrows into matrices

Once each arrow is a list of $d$ numbers, $n$ nodes give us $2n$ such lists, $n$ yellow and $n$ blue. Each colour stacks into its own table: $n$ rows of $d$ numbers. Mathematicians call that kind of table a matrix; this one has $n$ rows and $d$ columns, so we call it an $n \times d$ matrix.

We end up with two matrices. A yellow matrix $\textcolor{#ca8a04}{Y}$ , whose row $i$ is node $i$ ‘s yellow coordinates. A blue matrix $\textcolor{#1d4ed8}{B}$ , whose row $i$ is node $i$ ‘s blue coordinates. Both are $n \times d$ . You can read each matrix two ways: as a stack of arrows (one per row, the way we built it) or as a cloud of $n$ points in $d$ -dimensional space, each row a point. The point-cloud reading is the one that serves the geometry.

Each tip leaves its space, settles into a row of the matrix, and gives up its two coordinates: first as the colour intensity of two cells, deeper for larger values, and then as the literal numbers inside them. Same construction in yellow and in blue. From here on we’ll mostly draw and think about the tips, not the shafts.

The point of stacking is that we get to work with the whole population at once. When the arrows live one-per-node, we can only reach for them one or two at a time. Once they’re in matrices, we can operate on all nodes at once. We can stop counting too: $n$ becomes just a row index, and what we’ll do next works the same whatever value it takes.

Multiplying gives the bias matrix

We want all $n^2$ bias values at once: one for each directed pair of nodes, each one the dot product of one node’s yellow row against another’s blue row. Matrix multiplication computes every such dot product in a single operation. We multiply $\textcolor{#ca8a04}{Y}$ and $\textcolor{#1d4ed8}{B}$ , in the matrix sense of the word.

Matrix multiplication is a row-by-column operation: cell $(i, j)$ of the product is built by taking row $i$ of the left matrix, pairing it entry by entry with column $j$ of the right matrix, and summing the products. For that to make sense, the row and the column must have the same length. $\textcolor{#ca8a04}{Y}$ ’s rows have $d$ entries; so do $\textcolor{#1d4ed8}{B}$ ‘s rows, but its columns have $n$ . We transpose $\textcolor{#1d4ed8}{B}$ to swap rows and columns: row $i$ of $\textcolor{#1d4ed8}{B}$ becomes column $i$ of $\textcolor{#1d4ed8}{B}^\top$ , same numbers, written along a column instead of along a row.

Now we can multiply: $\textcolor{#ca8a04}{Y} \cdot \textcolor{#1d4ed8}{B}^\top$ . Slot $(i, j)$ of the result is built by combining row $i$ of $\textcolor{#ca8a04}{Y}$ , which is node $i$ ‘s yellow arrow, with column $j$ of $\textcolor{#1d4ed8}{B}^\top$ , which is node $j$ ‘s blue arrow. The pairing-and-summing rule turns those two arrows into a single number, and from part one we already know what that number is: the bias of the coin for the edge from node $i$ to node $j$ .

The product matrix therefore collects every coin bias the model has to offer, all $n^2$ of them, in one place. Call it $\textcolor{#15803d}{P}$ , the bias matrix: each entry a number between zero and one, with larger entries meaning more likely edges. To keep the colour palette going, think of it as a green matrix, mixing the warm yellow and the cool blue.

$\textcolor{#1d4ed8}{B}$ opens beside $\textcolor{#ca8a04}{Y}$ and then pivots into its transpose $\textcolor{#1d4ed8}{B}^\top$ , each cell travelling from its row and column in $\textcolor{#1d4ed8}{B}$ to its column and row in $\textcolor{#1d4ed8}{B}^\top$ . The bias matrix then assembles cell by cell. The first entry, $\textcolor{#15803d}{P}_{3,1}$ , is constructed in detail: row 3 of $\textcolor{#ca8a04}{Y}$ and column 1 of $\textcolor{#1d4ed8}{B}^\top$ light up, leave their matrices, pair up entry by entry, multiply into green cells, and sum to $0.41$ . That value drops into $\textcolor{#15803d}{P}$ . The right-hand inset shows the same dot product geometrically, as shadow rectangles of equal area $0.41$ along $\textcolor{#ca8a04}{Y_3}$ and $\textcolor{#1d4ed8}{B_1}$ . A quick sweep then fills the remaining fifteen cells.

The pairing-and-summing operation has a name we’ve already met. The dot product showed up in part one as a geometric object: the area of two equal shadow rectangles, or $|\mathbf{a}|\,|\mathbf{b}|\cos\theta$ . There’s an equivalent algebraic recipe for it. If formulas are friendly to you, here is the same rule in symbols. If not, the animation has already carried the idea: pair the entries, multiply each pair, add the results.

If two arrows in $d$ dimensions are written as lists of numbers, $\mathbf{a} = (a_1, a_2, \dots, a_d)$ and $\mathbf{b} = (b_1, b_2, \dots, b_d)$ , their dot product is

$\mathbf{a} \cdot \mathbf{b} \;=\; a_1 b_1 + a_2 b_2 + \cdots + a_d b_d.$

In words: multiply the corresponding entries of the two lists, then add up the products. The geometric definition and this algebraic one are the same quantity, computed two different ways. You can check it by hand in two dimensions. For an intuition in general, imagine rotating the coordinate system so $\mathbf{a}$ lies along the first axis. Then $\mathbf{a}$ has only one nonzero component, $|\mathbf{a}|$ , and the algebraic recipe collapses to $|\mathbf{a}|$ times the first component of $\mathbf{b}$ , which is exactly $|\mathbf{b}|\cos\theta$ . The recipe doesn’t care which coordinate system we picked, so the formula holds in any dimension.

Every entry of $\textcolor{#15803d}{P}$ is a single number between zero and one (the positive-orthant housekeeping from part one is what keeps it there). To generate a random network, flip $n^2$ coins, one per entry of $\textcolor{#15803d}{P}$ , each coin’s bias being the corresponding entry. The result is an $n \times n$ adjacency matrix $A$ of zeros and ones, with $A_{ij} = 1$ if the edge from $i$ to $j$ fired and $0$ otherwise.

Let’s pause. The yellow and blue tables make a green table of probabilities. That green table is not the network yet. It is the set of coin biases from which the network is drawn. The network we see is one noisy realization of that hidden green table.

What we get to see, and what we don’t

So far we’ve worked forward, from arrows to graph. Now we’re on the observer’s side, where the graph is all we have.

When we look at a real network, what we have is one draw of $A$ . We don’t have $\textcolor{#15803d}{P}$ , $\textcolor{#ca8a04}{Y}$ , or $\textcolor{#1d4ed8}{B}$ ; one adjacency matrix of zeros and ones is all we get to work with.

The model says $A$ was generated from $\textcolor{#15803d}{P}$ by independent coin flips. So $A$ is a noisy version of $\textcolor{#15803d}{P}$ . If we could observe many independent draws of the same network (the same $\textcolor{#15803d}{P}$ , repeated) and average them, the average would converge to $\textcolor{#15803d}{P}$ as the number of draws grew. With infinitely many draws, the average of all the $A$ s would equal $\textcolor{#15803d}{P}$ exactly. That’s the Law of Large Numbers, working entry by entry of the matrix.

The top two panels show the fixed yellow and blue tips of eight nodes, split into two clusters of four. Those tips produce a fixed bias matrix $\textcolor{#15803d}{P}$ with green block structure (bottom left): high biases within each cluster, lower between. Each press of resample draws a fresh $A$ from the same $\textcolor{#15803d}{P}$ (bottom right), dense within clusters and sparse between, but with the exact pattern of edges shifting every draw.

This is rarely how networks come to us. Usually we see one snapshot. It’s still useful to keep both regimes in mind: with one $A$ , we have to do statistics; with many independent draws of the same $\textcolor{#15803d}{P}$ , we can almost see $\textcolor{#15803d}{P}$ directly.

SVD and the recovery question

Suppose we have $A$ . Suppose we believe the RDPG model: $A$ is sampled from $\textcolor{#15803d}{P} = \textcolor{#ca8a04}{Y} \textcolor{#1d4ed8}{B}^\top$ , each entry of $A$ an independent coin flip whose bias is the corresponding entry of $\textcolor{#15803d}{P}$ . Can we recover $\textcolor{#ca8a04}{Y}$ and $\textcolor{#1d4ed8}{B}$ from $A$ ?

This is a matrix-factorisation question. We want to find two thin matrices ( $n \times d$ each, with $d$ much smaller than $n$ ) whose product is approximately $A$ .

Another way to say the same thing: if this big table is mostly made from a few simple patterns, what are those patterns?

The classical tool for this is the Singular Value Decomposition. Loosely: hidden inside any matrix is an ordered list of orthogonal directions that capture its structure, from most important to least. The first direction is the one along which the matrix has the most variation; the second, perpendicular to the first, has the next-most; and so on. Each direction comes with a number, the singular value, that says how big the variation along it is.

For our $A$ , the underlying $\textcolor{#15803d}{P} = \textcolor{#ca8a04}{Y} \textcolor{#1d4ed8}{B}^\top$ has only $d$ nonzero singular values: as the product of two thin matrices, only $d$ directions carry any structure; past those, the singular values are zero. $A$ is $\textcolor{#15803d}{P}$ plus coin-flip noise, which leaks small extra singular values past the first $d$ . The signal stays concentrated in the top $d$ ; the rest is noise to throw away.

Formally, the SVD writes $A$ as $\textcolor{#ca8a04}{U} \textcolor{#15803d}{\Sigma} \textcolor{#1d4ed8}{V}^\top$ , where $\textcolor{#ca8a04}{U}$ and $\textcolor{#1d4ed8}{V}$ are orthogonal (their columns are perpendicular to each other and have length one) and $\textcolor{#15803d}{\Sigma}$ is diagonal, with the singular values in decreasing order. The columns of $\textcolor{#ca8a04}{U}$ are the row-side directions, the columns of $\textcolor{#1d4ed8}{V}$ the column-side directions, and the singular values the weights.

The truncated SVD keeps only the top $d$ singular values and their matching columns of $\textcolor{#ca8a04}{U}$ and $\textcolor{#1d4ed8}{V}$ . It produces the best rank- $d$ approximation of $A$ : of every matrix of rank at most $d$ , the truncated SVD gives the closest to $A$ .

Try this widget first with $n = 8$ and $k = 2$ , then with $n = 64$ and $k = 2$ . The middle panel should start to look much more like the true green matrix as the population grows.

The rank- $k$ truncation $A_k$ is the best rank- $k$ approximation of the observed graph $A$ . At $k = d = 2$ it is aiming for the signal we care about, but at small $n$ the coin-flip noise still gets in the way. As $n$ grows, the top two singular values pull clear of the noise floor, and $A_k$ starts to resemble the hidden $\textcolor{#15803d}{P}$ .

The SVD writes $A$ as a product of three matrices, $\textcolor{#ca8a04}{U}$ , $\textcolor{#15803d}{\Sigma}$ , and $\textcolor{#1d4ed8}{V}^\top$ . We want two matrices whose product approximates $A$ . So we need to fold the middle one into the other two.

The truncated SVD, $\textcolor{#ca8a04}{U_d} \, \textcolor{#15803d}{\Sigma_d} \, \textcolor{#1d4ed8}{V_d}^\top$ , is the best rank- $d$ approximation of $A$ . (If we have many independent draws from the same $\textcolor{#15803d}{P}$ , we run the SVD on their average instead, which has less coin-flip noise to fight through.) Three matrices, still, not two.

The folding is easier than it looks, because $\textcolor{#15803d}{\Sigma_d}$ is special. It’s a diagonal matrix: only $d$ entries on its diagonal, $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_d$ , and zero everywhere else. Multiplying anything by $\textcolor{#15803d}{\Sigma_d}$ just scales each direction by the corresponding $\sigma_k$ . That scaling can be split in two: scale by $\sqrt{\sigma_k}$ on one side, scale by $\sqrt{\sigma_k}$ on the other, and you recover the full $\sigma_k$ scaling.

Let $\textcolor{#15803d}{\Sigma_d^{1/2}}$ denote the diagonal matrix with $\sqrt{\sigma_k}$ in its $k$ -th slot. Then $\textcolor{#15803d}{\Sigma_d^{1/2}} \, \textcolor{#15803d}{\Sigma_d^{1/2}} = \textcolor{#15803d}{\Sigma_d}$ : the “matrix square root” of a diagonal matrix is just the entry-wise square root. For instance, if $d = 2$ and the top two singular values happen to be $\sigma_1 = 4$ and $\sigma_2 = 1$ , then

$\textcolor{#15803d}{\Sigma_d} = \begin{pmatrix} 4 & 0 \\ 0 & 1 \end{pmatrix}, \quad \textcolor{#15803d}{\Sigma_d^{1/2}} = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}, \quad \textcolor{#15803d}{\Sigma_d^{1/2}} \, \textcolor{#15803d}{\Sigma_d^{1/2}} = \begin{pmatrix} 4 & 0 \\ 0 & 1 \end{pmatrix}.$

Now tuck one copy of $\textcolor{#15803d}{\Sigma_d^{1/2}}$ into $\textcolor{#ca8a04}{U_d}$ on the left and the other into $\textcolor{#1d4ed8}{V_d}$ on the right:

$\textcolor{#ca8a04}{U_d} \, \textcolor{#15803d}{\Sigma_d} \, \textcolor{#1d4ed8}{V_d}^\top \;=\; \textcolor{#ca8a04}{U_d} \, \textcolor{#15803d}{\Sigma_d^{1/2}} \, \textcolor{#15803d}{\Sigma_d^{1/2}} \, \textcolor{#1d4ed8}{V_d}^\top \;=\; \bigl(\textcolor{#ca8a04}{U_d \, \Sigma_d^{1/2}}\bigr) \, \bigl(\textcolor{#1d4ed8}{V_d \, \Sigma_d^{1/2}}\bigr)^\top.$

The two thin matrices we wanted are right there:

$\textcolor{#ca8a04}{\hat Y} \;=\; \textcolor{#ca8a04}{U_d \, \Sigma_d^{1/2}}, \qquad \textcolor{#1d4ed8}{\hat B} \;=\; \textcolor{#1d4ed8}{V_d \, \Sigma_d^{1/2}}.$

By construction, $\textcolor{#ca8a04}{\hat Y} \textcolor{#1d4ed8}{\hat B}^\top$ is the best rank- $d$ approximation of $A$ . Under reasonable assumptions, $\textcolor{#ca8a04}{\hat Y}$ and $\textcolor{#1d4ed8}{\hat B}$ converge to the true $\textcolor{#ca8a04}{Y}$ and $\textcolor{#1d4ed8}{B}$ as the number of nodes grows.

This whole recipe, taking the truncated SVD of the adjacency matrix and folding the square root of the singular values into each side, has a name we’ll reuse throughout the series: the adjacency spectral embedding, or ASE. It is the workhorse of RDPG estimation, the standard way to turn one observed network back into a cloud of estimated positions.

Why more nodes help

The last clause, “as the number of nodes grows”, is doing real work. Here’s why more nodes tighten the estimate. Each node is, in a sense, a probe of the others.

Consider Anubis. We want to know his yellow arrow. The information about that arrow lives entirely in the row of $A$ that records who Anubis sends edges to. That row has $n$ entries, one per potential receiver. Each entry is a coin flip whose bias is Anubis’s yellow against that receiver’s blue. So each receiver is, in effect, a slightly different probe of Anubis’s outgoing personality.

If we have only two nodes (Anubis and Bastet), Anubis’s yellow gets probed by exactly one other arrow, Bastet’s blue, for the single edge from Anubis to Bastet. One probe, one coin flip, almost no information. Anubis’s yellow could be anything compatible with that one flip.

If we have a hundred nodes, Anubis’s yellow is probed against a hundred different blue arrows. Each probe is a noisy measurement, but a hundred noisy measurements pinned to a hundred different reference directions can pin down a vector pretty tightly. The same logic applies for the blue arrows, working through the columns of $A$ .

The same Anubis at three sample sizes. Top row: the network we get to see, Anubis with directed edges out to the other $n - 1$ nodes; solid edges fired, dashed grey ones didn’t. Neither his yellow nor any of the blues is visible to us. Bottom row: the feasible region for his yellow, the positive orthant intersected with the unit disk, shaded by how much weight the data places on each point. With few outgoing edges to learn from, the density is broad across the region; with many, it concentrates to a peak. The peak is sharp but not a single point: even with arbitrarily many observations there is a rotation we can’t get past, which we’ll meet in a moment.

More nodes does double work: a bigger network to look at, and more probes per node from which to triangulate that node’s arrows.

The rotation we left waiting

The SVD recipe works, but the “converge to the true $\textcolor{#ca8a04}{Y}$ and $\textcolor{#1d4ed8}{B}$ ” clause needs a qualifier. If you played with the rotate-together toggle in part one, you might already see it coming.

The dot product of two arrows is invariant under rotation. Rotate $\textcolor{#ca8a04}{Y}$ and $\textcolor{#1d4ed8}{B}$ together by the same angle, and every entry of $\textcolor{#ca8a04}{Y}$ and every entry of $\textcolor{#1d4ed8}{B}$ moves, but the bias matrix $\textcolor{#15803d}{P}$ doesn’t budge at all. Watch:

The latent tips spin together around the origin (left), and the entries of $\textcolor{#ca8a04}{Y}$ and $\textcolor{#1d4ed8}{B}$ swing through new values each frame, some of them turning negative as the cloud leaves the positive orthant. The green matrix $\textcolor{#15803d}{P}$ on the right sits perfectly still: every cell holds its value the whole way around. That’s the invariance.

So the positions of the nodes, as the dot-product rule sees them, are only defined up to a common orthogonal rotation. The SVD picks one particular rotation (the one that aligns the axes with the principal directions of the data). Any other rotation, applied to both $\textcolor{#ca8a04}{Y}$ and $\textcolor{#1d4ed8}{B}$ together, would be just as good an answer.

This is a structural feature of the model: identifiability of node positions, in RDPG, is up to rotation. When you do downstream work (clustering, alignment, comparing two networks) you have to live with it. Most of the time you either ignore it (clustering is rotation-invariant anyway) or fix it (find a particular rotation that lines the answer up with something you care about). We’ll come back to the alignment question in a later post.

Where we go next

That’s the basic recovery story. Stack arrows into matrices, multiply to get the bias matrix, flip coins to get the observed adjacency, do truncated SVD to get the arrows back, remember the rotation.

Two large questions stay on the table for later. First, what happens when the network changes over time: the arrows drift along some path, the bias matrix moves with them, and the sequence of observed graphs describes a dynamical system. The rotation ambiguity from this post turns into a real obstacle to recovering the underlying dynamics, and the SVD recovery brings its own artifacts when you try to track trajectories through a time series of graphs. Second, what if the population of nodes isn’t a fixed list but a random sprinkling drawn from an intensity over space: the arrows become a continuous cloud rather than a finite set of tips, and the bias matrix becomes a heat map over that space, with classical RDPGs emerging as a concentrated limit. These are where my recent papers live, and where the next posts in the series will go.

For now, here’s where we’ve arrived. From the arrow of a single node, to two stacked matrices; from those matrices to a bias matrix and then a random network; and from a random network, via SVD, back to estimates of the arrows.

The surprising part is that one messy network can still carry the shadow of the geometry that made it. From the visible pattern of who reached toward whom, we can begin to infer the hidden arrangement of tendencies: who gives, who receives, who is aligned with whom. Anubis and Bastet’s mutual desire was never written directly in the graph. It only appeared as a chance of connection, then as a coin flip, then as an edge. But with enough company around them, enough other gods acting as witnesses, the shape of that desire starts to become recoverable.