
October 25, 2010



A very interesting approach to the problem. I think a few parameters of what we know about the lab are worth examining in more detail. For example, do we know how many generations of software there were? Do we know how software design elements are represented and remixed at the 'DNA' level? Finally (and most critically) do we have a good grasp of how selection occurs?

I would say that if we knew those things, it would be fairly easy for experts to decide whether the Rat's story was plausible. This may or may not be easy to boil down to a percentage - much depends on the details - but genetic software design is (so far) not nearly as robust or complex as the natural selection we can watch happening in the biosphere. This means my response sort of side-steps the issue, of course, because Nathan wants to use this as an analogous situation to natural selection.

From that perspective, I would perhaps expand his thought experiment to the point where many different Rat engineers were involved, trying a variety of different approaches (the natural environment has undergone fairly drastic changes, both from endogenous and exogenous causes, through the ages), some of which are documented and some of which are not. Then the genetic information representation would have to be understandable at a first-order level but not perfectly understood at some higher levels that are relevant to the selection process. Of course, the selection process would also have to be not overly defined, and be able to interact in some way with the solution set of populations of genetically-designed algorithms. Then one would have to employ large teams of specialists to analyze the findings and present them to jurors, who are presumably also educated in the relevant fields. And the trial goes on for quite a bit longer than normal trials.

At this point intuition isn't telling me much.

Nathan Smith

Well, the main point is that there needs to be some kind of field that could produce these kinds of proofs about the possibilities and impossibilities of what can emerge from complex systems before we are in a position to judge whether a particular phenomenon could have emerged from a particular evolutionary process. It would be a subfield of computer science or mathematics, I think, though maybe epistemology and formal logic would be involved. Does this field exist? How well-developed is it? I suspect that if the Magic Rat is sufficiently skillful, it would be very difficult to prove in a courtroom that his story is false, because we're not very capable of defining the space of possible outcomes of evolutionary processes. The fact that "genetic software design is... not nearly as robust or complex as natural selection... in the biosphere" only means that the problem of whether evolution-as-a-theory-of-how-all-life-emerged is plausible is far more difficult than the Magic Rat problem, and if "intuition isn't telling [us] much" about the latter, then we're certainly in no position to settle the former question. That's why I would like to see evolution-as-a-comprehensive-theory-of-life replaced by a more modest theory of imperfect ecosystemic homeostasis.


I agree with much of Nathan's October 29 commentary. When I said that natural selection was much more complicated and sophisticated than the artificial genetic design approaches currently in use in software, I was intentionally trying to say that just because it would be (relatively) easy to "prove" that the Magic Rat intended to break into the Goodman Company doesn't mean that evolutionary mechanisms would be similarly easy to prove in any given case. In fact, they are much, much harder to prove, which is why we've expended orders of magnitude more time and effort to do so, and why many specific conclusions about evolutionary history remain dubious. For example, it was only recently (in the last five years) that the savannah theory of human origins went from the most popular (but surprisingly weak) theory to the clear best theory*, based on calculating the age of genes related to sweat glands. And that is a realm in which we have a tremendous amount of data. Now that we can start extracting protein data from ancient tissues, we might be able to project some measure of certainty back further, but it's likely that many questions will never get answers that are much better than conjecture. I think a fair observer will admit that there are ample grounds for skepticism about many claims of evolutionary theory.

That said, there are well-defined ways of disconfirming evolutionary hypotheses, as proponents of the interesting-and-plausible-but-wrong aquatic ape theory can hardly deny. I suspect that it would be computationally intractable to define the space of possible solutions that a sufficiently-sophisticated genetic algorithm design program could reach, but given data comparable to what we have for biological history (mutation distributions in final populations, some preserved specimens of predecessor populations, measures of historical resource usage, and so on), I expect we would have standards on which to make calls about specific questions.

If someone did tinker, though, there's no guarantee we would ever be able to tell that they did. We might or might not be able to say "no, known mechanisms can't account for that", but we could never conclude that everything that did happen happened because of a certain set of defined mechanisms. I will defend practical naturalism to the death, but I won't try to defend necessary naturalism.

*Granted, the modern savannah theory and the historical one look very different, as the original hypothesized mechanisms for the evolution of bipedalism have also been shown to be wrong.


I don't know how much you guys know about the field, so I'll just assume ignorance. Evolutionary algorithms are used extensively in computer science and electrical engineering, usually to do things that are too difficult for us to figure out how to design for. Speech recognition is a good example. The main problem with using the natural selection and descent-with-modification design paradigm is that you can never be sure how exactly the final circuit or program works, and you can never be sure that it will do what you want for inputs you've never tested it with. If you wanted to add or subtract features, you'd probably have to evolve the thing from scratch; just like in biological evolution, you can't get an "alligator-duck" in computer science evolution. However, programs that are designed and not evolved can be modified willy-nilly like virtual Frankensteins to produce "alligator-ducks" and other abominations. Computer scientists in general would rather have the flexibility of a god to mix and match and modify code to do their bidding than to have code that just works for unknown reasons. But if you're not sure how to design a solution for a problem, using an evolutionary algorithm might be a good alternative.
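The whole paradigm fits in a few lines, by the way. Here's a toy sketch in Python (matching a fixed bit string stands in for a real fitness problem, which of course wouldn't have a known answer; everything here - population size, mutation rate, the target - is an arbitrary illustration, not anyone's actual lab setup):

```python
import random

TARGET = [1, 0, 1, 1, 0, 0, 1, 0]  # toy goal; real problems have no known answer

def fitness(genome):
    # Count positions matching the target; higher is fitter.
    return sum(g == t for g, t in zip(genome, TARGET))

def mutate(genome, rate=0.1):
    # Flip each bit independently with the given probability.
    return [1 - g if random.random() < rate else g for g in genome]

def evolve(pop_size=30, generations=200):
    pop = [[random.randint(0, 1) for _ in TARGET] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        if fitness(pop[0]) == len(TARGET):
            break
        survivors = pop[: pop_size // 2]  # selection: keep the fitter half
        pop = survivors + [mutate(random.choice(survivors)) for _ in survivors]
    return max(pop, key=fitness)

best = evolve()
print(fitness(best))  # usually reaches the maximum of 8
```

Notice that nothing in `evolve` knows anything about the problem; all the "design" lives in the fitness function, which is the point I suspect Nato will make.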


Not to contradict Tom (which I'm not), but I want to clarify something. It's not at all difficult to understand the teleology of artificial algorithmic design, given that the purpose of the process is written all over the fitness test. Software programs setting fitness tests for other software programs seems plausible only in an extremely constrained way, and the fitness tests that evolved those couldn't be very ambiguous.
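To make that concrete: whatever the evolved program looks like inside, a fitness function like the purely hypothetical one below leaves no doubt about what its winners were bred to do - every name and weight in it announces the goal (none of this is real forensic evidence from the parable, just an illustration of how legible a fitness test is):

```python
class CandidateReport:
    """Hypothetical summary of one evolved program's observed behavior."""
    def __init__(self, opens_socket, bypasses_auth, exfiltrates):
        self.opens_socket = opens_socket
        self.bypasses_auth = bypasses_auth
        self.exfiltrates = exfiltrates

def fitness(report):
    # The weights announce the intent, however opaque the evolved code is:
    # merely talking on the network is worth little, stealing files a lot.
    return (1 * report.opens_socket
            + 5 * report.bypasses_auth
            + 10 * report.exfiltrates)

harmless = CandidateReport(True, False, False)
intruder = CandidateReport(True, True, True)
print(fitness(harmless), fitness(intruder))  # prints: 1 16
```

A jury that never reads a line of the evolved program could still read this and know what the lab was selecting for.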

Nathan may say, "but what if we have access only to ancestor designs, but not fitness test conditions?" I will certainly agree that without a fairly complete history by which we can reverse-engineer the test, we're in a lot of trouble. But that doesn't compare well to evolutionary history, for which we have a pretty strong understanding of the fitness test.

A final option is that the Rat admits he was attempting to evolve programs to attack networks that are generally like those of Goodman Company, but had never intended for a program to escape and attack an actual network. I doubt most jurors would find this very convincing, but he might at least win a hung jury on criminal charges.


While we can be relatively sure of what an evolved program is for (i.e. the teleology given to it by us), most of the time we're not sure how it achieves its end. We know the inputs and the outputs, but we don't know how the stuff in between works. An evolved program is as much of a magical black box as much of nature seems to be.

Nathan Smith

I don't think Tom is right about the last point. Presumably even if the code is "written" by the computer, the Magic Rat can still see the code, and figure out why the program does what it does. I can analyze a program written by someone else and figure out what it's doing, without talking to the program's author about it (though the comments that human programmers intersperse in their code are admittedly very helpful in understanding their programs).


There are two ways in which evolved programs can be relatively inscrutable: 1) in current operation, and 2) in the reason for their rise.

1) Depending on how genes are represented and how mutation occurs, there's likely no source code, commented or not. A program may respect the division between data and instructions, or it may not. The only level at which I would feel pretty confident of finding human-comprehensible function, regardless of the mutation method, would be when looking at the binary through a disassembler like IDA Pro (no Hex-Rays here: mutation would chop up those nice clean compiler patterns). Even then, who knows if that would really tell you what it's doing? There's nothing guaranteeing that a double in data doesn't get executed later. Could we eventually decipher even the toughest program? Yes, but it would be a far more difficult task than deciphering uncommented source, requiring skills and approaches different in kind as well as scale from any other software reverse engineering.

2) There's also no real agreement on *how* programs of mutation and crossover achieve the optimizations they do. I don't really know much about this part of things, but from what I can tell, we just try this, and try that, and hope that something awesome falls out.
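To be clear, the mechanics themselves are simple to state even if their power isn't well understood. Single-point crossover over bit-string genomes, for instance, is just this (a toy sketch, not any particular system's implementation):

```python
import random

def crossover(parent_a, parent_b):
    # Single-point crossover: splice a prefix of one genome onto
    # the suffix of the other at a random cut point.
    point = random.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]

child = crossover([0, 0, 0, 0], [1, 1, 1, 1])
print(child)  # e.g. [0, 0, 1, 1], depending on the cut point
```

Why splicing genomes like this so often produces good hybrids, rather than garbage, is the part nobody can really explain from first principles.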

Nathan Smith

Wow. Very interesting.


Take neural networks as an example. Every neural network has some number of inputs and outputs and some sort of neural architecture in between. In order for this thing to do what we want it to do, we give it specific inputs and tell it when it has correctly given us the outputs we want or expect. It then modifies the weights of its neural pathways, and the whole process is repeated until it correctly handles all of the inputs we give to it. Neural networks are thus trained using descent with modification and natural selection. If we look at the source code, all we see is the architecture of the neural network and its weight table. We don't know if it recognizes voices or images, if it does basic arithmetic, if it plays chess, etc. I mean, you could theoretically give it all sorts of random inputs that represent things like chess moves/positions, for instance, and you might occasionally get an output that could be interpreted as a valid chess move/position, but if it was optimized to recognize English speech patterns, it is vanishingly unlikely to be able to do anything else. Is that something one could figure out by looking at the neural weights? Not a chance. You'd have to discover what it was good at by implementing it and feeding it gobs of data, and even then you'd be vanishingly unlikely to hit upon the right telos.
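Here's the smallest possible case of what I mean - a single artificial neuron trained by repeated weight adjustment (a perceptron learning the OR function; I'm simplifying enormously, and real nets differ in architecture and training rule, but the moral is the same):

```python
import random

# A single neuron learning OR via the classic perceptron rule:
# nudge the weights whenever the output is wrong, repeat until it isn't.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w = [random.uniform(-1, 1) for _ in range(2)]
bias = random.uniform(-1, 1)

def predict(x):
    return 1 if w[0] * x[0] + w[1] * x[1] + bias > 0 else 0

for _ in range(200):  # training epochs
    for x, target in data:
        error = target - predict(x)
        w = [wi + 0.1 * error * xi for wi, xi in zip(w, x)]
        bias += 0.1 * error

print(w, bias)  # just a table of numbers: nothing in it says "OR"
```

The trained weights solve the problem perfectly, but they read as a bare table of numbers. Scale that table up to thousands of weights and you have my point: you can't see the telos in the weights.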


My previous post is a little beside the point. What I really want to say is that if the neural network is trained to do something like recognize speech, and let's say it does it really well, we still can't be sure how it actually accomplishes this amazing feat! Does it do some sort of convolution of samples in the time-domain? Probably not, but how could we know for sure? We can't learn how speech recognition works by looking at the nuts and bolts of the trained neural network, and thus neural networks are not really of any scientific interest except as a possible analog of the human brain (though they are of extreme interest in engineering, where people care more about things working than they care about why they work).


I would want to dispute a point that Tom probably put too strongly. We can in some sense "learn how speech recognition works by looking at the nuts and bolts of the trained neural network." The problem is that this is not a *useful* sense. We can (with a huge amount of effort) learn how this net solves the problem insofar as it does, but we will not have learned the "right" answer, nor will the "knowledge" be at all portable. We probably won't have learned something robust enough to be called an approach or a technique. More like 'it just so happens that this configuration can perform the following task well under these specific circumstances,' which is really no more information than we had before.

Also, I think it might be worth clarifying that neural nets qua neural nets can be of great interest in science (once they get organized into systems, their properties/capabilities aren't as easily understood). It's just that the solutions provided by a neural net are generally not interesting.


I guess Nato is playing semantics, but not in a *useful* sense. If given two separately trained neural networks, it would be impossible to determine which one was trained to play chess and which one was trained to recognize speech simply by looking at the neural weights. So really, is there any sense in which we could learn about speech recognition or chess playing from a table of numbers absent a context? Maybe we could, given prior knowledge of which is which, say that this particular set of numbers in this neural architecture can play chess, but that was already a given; we don't learn anything, as Nato concedes.


Hey, just trying to be clear about what's being claimed. One could easily interpret "We can't learn how speech recognition works by looking at the nuts and bolts of the trained neural network" as "neural networks are mysterious vortexes of knowledge beyond human understanding!"* But of course, this is silly, so I think it can be useful to state, "we can't learn anything from neural networks because they don't really have anything to teach."

*I know this is easy because I've seen people say things like this, though generally with less drama.


Also, I guess I should mention that genetic algorithms as a whole tend to have similar limitations to neural nets in terms of generating interesting solutions. I skipped over this fact in my original response, which may have been why Tom chimed in. I don't think there's a theoretical limitation; it's just that designing fitness criteria for evolving toward a novel solution is really difficult. Optimizations tend to be easily quantifiable and thus testable. Evolutionary solutions come about because a fitness test for one (set of) optimization(s) ends up generating a solution that also "passes" some other, frequently unrelated, fitness test. Scales pass as hairs, hairs pass as quills, or feathers, or whatever. An equivalent GA program would have to set up fitness tests for many different problems and cross results across problem domains. One suspects researchers would only try this approach if they already had the idea that there was a solution composed of a finite set of parts. I can imagine the Rat doing something like this, but I'm not aware of successful real-world analogues. Maybe someday some physicist will use a GA like that to find a grand unified theory or something.

Tom, are you aware of anyone trying things like this?


I know it is a little bit late, but I still would like to comment on your post, since I think you terminated the parable prematurely. Thus some pieces of evidence are missing from your parable which might not be critical, but I would still like to take the liberty of adding them:

1) The FBI did analyze the Rat's computer and found that it could indeed produce programs with new functions, though while the FBI let the computer run, no programs that could escape the lab or hack into other computers were produced. But this was expected, since the time was too short.
2) The FBI found a million hacking algorithms outside the lab that fit into a nested hierarchy, thus suggesting that they could have evolved.
3) There is no intrinsic limitation to the complexity of the programs that "evolutionary software engineering" can produce. The only limitations would be external factors, like insufficient time or resources.

You will probably object to the third point, which is a little bit difficult to explain, but this additional information would make the Rat's story more plausible.

One question to clarify: would it be enough to demonstrate that the computer could have generated the hacker program if one let it run for some time and it generated a similar program?

Nathan Smith

Is the "no intrinsic limitation" supposed to be part of the parable - that is, an assumption? It seems like a substantive general claim, and a very controversial one.

As far as the "time was too short," how would one quantify that? Would it be possible to say "the odds are 1 in 100,000 that this would have happened by accident?" How?


I think point 3 may refer to the fact that types of fitness tests, mutation methods, logical space allocation and so on could permanently close off the creation of some kinds of complexity. That said, I'm not sure how this ties into the story because presumably the Rat-with-intent would make sure that the relevant (meta)design of the evolutionary process gave enough room for the goal functionality.

Perhaps it needs to be said - it would almost certainly be obvious from the start if humans just wrote the code and tried to pass it off as an algorithmic design. It would have to be real GA for it to be remotely plausible - the question would be whether it was GA elicited for a particular goal or if it was a departure from a similar (but presumably non-criminal) goal solution.


Concerning the "intrinsic limitations," it is pretty much what Nato said. It should be part of the parable, since if there were any, this would immediately falsify the Rat's story. In reality, irreducible complexity would be such a limitation.

I guess it would be difficult to calculate the likelihood of a short run of the hacker-program generator producing the same results as a long run. But I guess the law of large numbers applies, so the probability should drop to nearly zero very quickly.
