NKS Midwest

The computational capacity of the universe

In the afternoon session of NKS Midwest 2008, we had several remote talks by video conference. The first was by Seth Llyod, speaking on his famous calculation “the computational capacity of the universe”. He gave a nice background to the idea in thermodynamics of entropy and how it tied in with Shannon information, and then the stages he went through in coming up with it. Originally he had the question of the “ultimate laptop”, how fast could one possible get a single kilogram of matter in a one liter volume to compute, subject only to the limits of QM, arriving at the figure of 10^51 ops/second on 10^31 bits. The basic idea is then to extend this calculation to the knowable universe, and derive from it an upper bound on the logical operations (or, distinctions possible, perhaps more accurately) in the history of the universe.

When explaining entropy he got an audience question of typical epistemological bent, claiming that information is a relation between a system and us, not something about the system. He shrugged this off with the statement that entropy is info we don’t know about the system and info is info we do know about the system, but the total info is what is2nd-law non-decreasing. He was on the other hand careful to distinguish mere information processing in the sense of logical operations from computation in the sense of universality. Someone makes the claim that the universe is capable of universal computation and presents as evidence our own practical computers; they wouldn’t be able to do it if the universe cannot. There is a potential objection from cardinality, that universality needs an unbounded store, or to add memory as needed. This relates to the cosmology question whether the universe is infinitely extendable.

I thought he could have done a better job with this, as the point of his calculation is to show the universe to date is only capable of a finite number of operations, yet everything we see was able to result from it, including all of our practical computers and their actual flexibility. To me this shows that infinite cardinality is not actually required for everything we know, empirically, as either practical computation or physical complexity. Our mathematical idealizations in computation theory make use of a single countable infinity, that the real world cases are not seen to require in practice. One can say this shows our computers or the universe are not universal, or one can say the mathematical definition of universality misses somewhat. I prefer the latter, not being wedded to that abstraction.

He next explained the QM origins of the processing speed limit, both the Heisenberg limit on the time to go from one distinguishable state to another related to an energy spread, and the Margolis limit related to the absolute energy. QC has a tendency to operate at the Margolis limit, which is the most one could expect, QM being posited of course. This gives around 10^123 ops since the big bang, maximum. The maximum memory calculation on the other hand comes from a volume calculation and the amount of information that can reside on the boundary of a given volume (evolution inside said boundary being deterministic etc). The finite volume comes with the qualifier “knowable” applied to universe – take a light cone from us into the past as far as the big bang, then forward from that region at light speed. Note that in principle this means a larger present region could access more memory than “here”, and a future one likewise more than “now”.

One then asks, what is the max entropy of this much energy (bounded below by the critical density, since the universe is seen to expand) in that much volume, and one gets 10^92 bits, which he noted are about the amount stored in the black body radiation (most of it, in other words). He speculated that maybe this figure can be pushed higher when dark energy is allowed for, but that was frankly a bit hand-wavy. He wanted to point out these numbers seem to be similar to the universe age over the planck time to various powers (3/2 for 10^92 and 2 for 10^123, one hopes, since the former is about 5×10^60 planck times), but it isn’t really exact.

Overall it was a fun talk, on an argument I had read before. He was clear and the history of physics bits on Maxwell, Boltzman, Gibbs, Planck, Heisenberg, Shannon, and Margolis were interesting.

Here is what I like most about this sort of argument, from my own philosophic perspective. So often we are presented with QM indeterminacy or continuous value cardinality as reasons for expecting hypercomputation or a universe that exceeds all finite grasp, but in fact the theory itself has a quite different operational tendency, rigorously limiting the operational distinctions possible and requiring “a distinction” to have a clear physical meaning. Said clear physical meanings always having an actual “spannedness” or positive measure in both time and energy terms. In a walnutshell, if QM is true then the universe is rigorously finite in information-theoretic terms.

But two provisos have to be allowed to that statement, for the partisans of actual infinities (I think e.g. of Max Tegmark). One, Llyod is speaking of a knowable (in principle) region of space-time and not making claims about inaccessible infinities beyond it, pro or con. And two, he nowhere addressed the many-worlds interpretation or any possible “contemporary orthogonals” it might posit. So one might allow a “knowable” between “the” and “universe” in the last sentence of the previous paragraph.

NKS Midwest, reduction & emergence

Invariants, native primitives, and computation


Paul-Jean LeTourneau of Wolfram Research, a former student of Stuart Kaufman, gave a fine pure NKS talk in the afternoon of the first day.  I want to discuss a few of the issues it raises because they seem to me to go beyond the specific case he was analysing. 

His rule system is ECA 146, and the point he noticed in analysing it is that this rule differs from the well-known additive rule 90 only in cases of adjacent blacks within the pattern.  Which in turn are produced in rule 90 (or 146, necessarily) by runs of white of even length.  Therefore, in any region in which there are neither adjacent black cells nor even runs of white, the evolution must be identical to rule 90.  Then the idea is to track these things through various evolutions of rule 146.  And he finds persistent structures, which themselves meander about on the background, can annihilate in pairs, etc.  These are present but not obvious in the rule 146 evolution, but comparing what rule 146 and rule 90 do from the same initial lets them pop out.

What is of more general interest about this?  One question is whether these localized structures might be manipulated to get the rule to perform meaningful computations, which would be a step toward solving the “class 3 problem” in the affirmative.  The class 3 problem being the question, are there class 3 random-looking rules that are universal?  Wolfram’s principle of computational equivalence (PCE) predicts there are, but this has not been proven.  All the simple rule universality proofs to date have exploited localized structures and their predictability in constructing analogs to practical computer components.  Class 3s might have been thought to lack the necessary local stability to support programming, though PCE conjectures otherwise.  This may be of broader information-theoretic interest – class 3s are thought of as “maximum entropy” systems, and appear to be computationally irreducible.  But universality is a strictly stronger attribute than irreducibility, leaving the class 3 question – are many instances of apparent complexity reducible but not universal, or are most universal?  So this is a pure NKS issue broader than the rule system itself.

But there are others, tangential to the core concerns of NKS, perhaps, but not to philosophy and issues of reductionism.  Notice these patterns refer to invariants of the evolution of rule 146, but to invariants that exceed the scope of its native primitive states.  The case of runs of even length is a particularly fine instance of that, not being a single pattern but a whole class of them.  (Those are not, however, strictly preserved by the evolution – instead they always give rise to a black-pair particle at some point, which then is preserved, up to collisions with like particles etc).

Next notice that regions of the rule seem to behave with the same additivity as rule 90, where information always passes through unchanged, at maximum (we say, “light”) speed.  While the regions marked by non-rule-90 behaving subpatterns move more slowly, and interact in a non-information-preserving way (e.g. a particle collision leading to both ceasing, has many possible time-inverse pre-images, etc).  As though two rules were operating on the same lattice, one having the information transfer properties we associate with light, the other having the information transfer properties we associate with matter (at least, macroscopically).  But it is a single underlying rule – there are simply many possible subpatterns that behave the first way because the deviations from additivity in the rule cannot arise without special subpatterns being present.  To me this is a fine example of emergence.

Notice further that in principle the two regimes are reducible to a single underlying rule (146), but to understand its internal complex behavior it may actually be superior to decompose into a simpler rule followed “some of the time” (additive rule 90) and to analyse the behavior of the “emergent” particles (black cell pairs and their even-white-run generators) “empirically”.  Why?  Because the reducibility of the rule-90 portion of the behavior can be exploited fully, if it is separated off from the non-additive remainder.

The point is to notice that these relationships (among levels of analysis, apparent particles, reduction, simplification, “factoring” of laws, etc) can arise in a purely formal system, for entirely analytical reasons.  They are not facts about physics.  They are practical realities of formalism and analysis itself.

NKS Midwest, real numbers

Computability of real valued relations

Gilles Dowek (Ecole Polytechnique and INRIA, France) gave a nice talk trying to formalize the requirements for keeping real numbers real, in my terms.  More specifically, he was interested in the restrictions required on functions or relations that take reals to reals, so that they remain formally computable — with various distinguished levels of the latter.  He laid it all out in relation to the more familiar discrete valued cases and was quite clear.

First notice that when we are discussing functions, aka deterministic or single valued relations, computability in maps from reals to reals is exactly equivalent to continuity of the mapping.  In that case we have the typical uses of analysis, and theorems telling us that we can approximate as close as we need to with rationals, and they will converge, etc.

So next he relaxes single valued and asks, first, which of the relations of reals to reals are fully decidable.  The answer is not much, only relations that are either empty or full (the only open sets that don’t care whether you distinguish members or not). 

So relax the map category to semi-decidable, meaning that the relation is either continuous as you shrink a ball around a source point or it becomes undefined at that point.  This gives a stable concept of semi-decidable but not a very useful one, because even the identity relation is not semi-decidable (since the domain is an open set) — which I see as a fine formulation of the objection computational “constructivists” have to reals in general.

So he wants a looser concept, analogous to being effectively enumerable for discrete sets.  That is equivalent to semi-decidable in the discrete case but not with reals, where it is strictly weaker.  Effectively enumerable in the discrete case means you start from the image set, and since it is countable (as is the domain, of course), you can just impose some enumeration scheme on the image, and equate any points in the domain that map to that image-point, identifying them by their image-index, in effect.

To extend this notion to the reals, what is required is that up to some intervening index (perhaps countable), the mapping be continuous.  You can regard the extra index as a random variable or some other sort of unknown, but once it is fixed then the map must become continuous.  All deviations from deterministic continuity are ascribed to an extra variable.   This then allows a stable identity relation – it is simply a projection operator that ignores the extra hidden index.

What I liked about this formalism is it fits the use we actually make of non-determinism in real valued models (or at least machine precision valued ones, not to beg too many questions).  We interpose some probability measure or we generate random samples and branch the relation-behavior on them.  I wasn’t so keen on the name he wanted to give it, however — since both domain and range are non-denumerable, “effectively enumerable” seemed a stretch to describe the real-number analogue.   He means it to apply to the relation, of course, but it is still too confusing a name.  Constructable or model-able are perhaps the right idea, but no better as terms.  Call it EE for now.

Then his programmatic statement is that we should restrict ourselves when using non-deterministic relations of reals to only such maps as are EE, that is, such that there exists some random variable that if fixed makes the outcome fully computable, because the projection of the relation to the plane of that specific value of said random variable, is fully continuous.  He notes that if the relation is a true function, this collapses to the usual definition, leaving all deterministic-model experiments or tests unaffected, etc.

It was a point-set topology sort of talk, and quite clear if you have a background in that sort of analysis.  I found it useful for getting more precise about an intuition I’ve had for some time, and something stressed by Gregory Chaitin, that real numbers (without any continuity requirements) can have pathological properties in an information-theoretic context.  There are any number of constructions in which one encodes entire possible universes in single real numbers — Chaitin pointed out one due to Borel later in the conference, that lay behind his own Omega.  Well, when we tried to say what we meant by a real number we were thinking of the limit of a rational series, not entire histories of possible universes.  The definition evidently misses, if it means the latter when it is trying to mean only the former.  Dawek’s definitional work helps us pin down where we can safely use reals without a qualm, and where they imply a formal non-computability too extreme for rationalism.

I should also say that I spoke with Dawek repeatedly throughout the conference and found him a talented and interesting guy, with a good take on everything going on etc.  He asked the last question of the conference, left unanswered by the panel for want of time.  (There was a discussion panel of luminaries on the last day, I’ll cover it later).  He wanted to know each operative’s take on whether there could be experimental tests for the computability of the universe, and for discreteness as well.  (With some preface comments wondering if everything looks like a nail now that we know what computation is and consider it our finest new hammer – sensible question).

NKS Midwest

Internal time in dynamic graphs

Tommaso Bolognesi is working on dynamic trivalent networks, along the lines suggested by Wolfram in the NKS book.  He has presented on them in the past, and in this talk he first reviewed how they work, then got to his recent questions.  Basically he distinguishes between a global time for the model as viewed from outside, from the sense of time “experienced” (or perhaps more properly, undergone) at a specific location within one of his graphs. 

Global time simply means the model steps themselves – at each of those one local subgraph is rewritten according to a definite list of rules with a preferred order of application.  The update specifies the location of the next “active site” or the next updated node.  He tracks where this occurs by node number and notices that those rules that produce instrinsic randomness tend to revisit all nodes “fairly”, in an average or statistical sense.  While globally the network may show exponential growth, for example, if the spacings out of revisits keep pace, the size of the graph at each (global) step on which a given node is updated, appears to grow linearly.  Similarly for a rule system that showed square root growth globally – again the “internal” sense of time is of linear growth.  It is a feature of fair revisitation that local updates on a graph growing at any speed appear as linear growth in a frame that regards each self-update as a clock-tick, and ignores all other updates.

He also discussed constructing the causal network of site updates, by tracking the faces shared by a given node-update.  I suggested linking those backward through time at the update time-slices given by the “effected” node, to get an internal sense of locality, as well.