Announcement: Thesis available

Peter Nordin Peter.Nordin at dacapo.se
Wed Oct 15 11:56:06 EDT 1997



My thesis on evolution or induction of register machine code is now
available from the publisher Krehl-verlag in Germany. The described
machine learning technique is a variant of genetic programming applied
to evolution of binary machine code for a real computer--a technique
which is very machine efficient.

The title of the thesis is:
Evolutionary Program Induction of Binary Machine Code and its
Applications

The thesis includes applications to robot control as well as image and
sound processing.

It can be ordered from:

Krehl Verlag
Postfach 51 01 42
D-48163 Muenster
GERMANY              

Tel/Fax +49 231 261550
email: krehl-verlag at t-online.de
Price: 68DM

Below you will find the abstract, table of contents and details on how
to order.

Best regards,

Peter Nordin
---------------------------------------------------------------------
Evolutionary Program Induction of Binary Machine Code and its
Applications
Author: Peter Nordin
Supervisor: Wolfgang Banzhaf
ISBN 3-931546-07-1
290 pages

Abstract

This thesis presents the Compiling Genetic Programming System (CGPS) a
machine learning method for automatic program induction. The objective
of the system is to automatically produce computer programs. CGPS is
the marriage of two new ideas (1) a special evolutionary program
induction algorithm variant and (2) the use of large scale meta
manipulation of binary code. Both ideas may have merits on their own
but it is by combining the two that the real benefits emerge, making
CGPS a powerful machine learning paradigm.

(1) The evolutionary program induction method is an instance of an
evolutionary algorithm (EA)|a class of algorithms that borrow
metaphors from biology, evolution and natural selection. It uses a
linear program representation in contrast to other well used methods
such as Koza's genetic programming (GP) approach which has a
hierarchical tree{based program structure. One way to view CGPS is as
a large alphabet genetic algorithm (GA) where each letter in the
alphabet corresponds to a syntactically closed computer program
structure. A letter in the GA could for example be a line in a
computer program i.e. an assignment a := a + 1. CGPS uses
recombination (crossover) in between letters which guarantees
syntactic closure during evolution.  However, from a GA point of view,
the letter in the linear string normally does not have an internal
structure. A program line in a computer language the program or a
machine code instruction have plenty of internal structure determining
the operation to be performed and the operands used. A better metaphor
could therefore be the gene concept in nature.  Genes in DNA are
syntactically closed sequences providing the recipe of a protein. As
in CGPS, crossover normally acts between genes/instructions preserving
the syntactic closure of the object. There is a mutation operator, to
produce variation within the internal structure of the
gene/instruction. In CGPS, the mutation operator changes the
syntactically closed program object into another syntactically closed
program object|it replaces a valid letter with another valid letter in
the GA alphabet.  CGPS also uses two significant features in the
program individual (genome) which is the header and the footer. The
header is a prefix part of the individual while the footer is the
suffix part.  The genetic operators prevent the header and the footer
from being changed, and the header and the footer are used to ensure
syntactic closure of the whole individual.

(2) CGPS in this thesis is mostly applied to program induction of
binary machine code. Binary machine code is the code in the computer
that is directly executed by the processor. The program individual in
CGPS is a binary machine code function and the genetic operators
operate directly on the binary machine code. This implies that CGPS is
a meta-manipulating program. In low level programming, a
meta-manipulating program is a program that changes its own binary
code or the binary code of another pro- gram. Usually, this only means
changing a few bytes e.g. dynamic linking of a function.  However,
CGPS is the first real instance of a large scale meta manipulating
program which constantly manipulates, shuffles around and changes
large chunks of binary code as a part of the learning process. So, the
output from CGPS is a binary machine code program that can be executed
directly by the processor. The meta manipulation of the individual
(genome) is done with the evolutionary algorithm, mentioned in (1)"
above.  The headers and footers are used to make sure that the
individual always preserves syntactic closure during evolution, no
matter what the genetic operators do with the code in between.

The marriage of these two ideas produce a system which can induce
turing-complete machine code programs very efficiently. The system
also has several other attractive properties, such as constant memory
usage, compact representation and uncomplicated memory management.

In this thesis the background to CGPS is presented together with other
evolutionary algorithms and other evolutionary program induction
techniques. The detailed descrip- tion of CGPS and its implementation
is described together with several evaluations or case studies in
different feasible domains such as robotic control, image processing
and natural language processing. Some of the evaluations make
comparisons to other ma- chine learning algorithms; neural networks
and hierarchical genetic programming. The formal definition of CGPS is
given followed by aninvestigation of generalization in vari- able
length evolutionary algorithms. Finally, several possible directions
for the future of CGPS research are presented.



Table of Contents

1 Introduction .....21
1.1 Evolutionary Algorithms . . . . . . . . . . . . . . . . . . . . . .
 . . . . 22
1.2 Genetic Programming . . .. . .. .. . .. . .. . .. . .. . .. .. . .
25
1.3 Machine Language Genetic Programming . . .. . .. . .. . .. .. . . 31
1.4 Introduction to CGPS . . . . . . . . . . . . . . . . . . . . . . . .
 . . . 38
I Implementation 47
2 Computer Hardware 49
2.1 Introduction to Computer Hardware. . .. . .. . .. . .. . .. .. . .
50
2.2 Von Neumann Computers . . . . . . . . . . . . . . . . . . . . . . .
 . . 50
3 The SPARC Architecture and CGPS 55
3.1 Fundamentals of CGPS . .. . .. .. . .. . .. . .. . .. . .. .. . . 56
3.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . .
 . . . . 58
3.3 Floating Point Instructions . . .. .. . .. . .. . .. . .. . .. .. .
 69
3.4 Self Modifying Code in C language . . . . . . . . . . . . . . . . .
 . . . 69
3.5 How to Call a Self|made Function from C . .. . .. . .. . .. .. . .
69
3.6 Genetic Operators . .. . .. . .. .. . .. . .. . .. . .. . .. .. . .
70
3.7 Initialisation . .. . .. . .. . .. .. . .. . .. . .. . .. . .. .. .
 72
4 Additional Features of the System 75
4.1 Leaf Procedures and Primitives.... . .. . .. . .. . .. . .. .. . .
77
4.2 Memory in Tree{Based GP . . .. .. . .. . .. . .. . .. . .. .. . . 78
4.3 Conditionals . . . . . . . . . . . . . . . . . . . . . . . . . . . .
 . . . . . 80
4.4 Automatically De
ned Subroutines in CGPS .. . .. . .. . .. .. . . 80
4.5 Leaf Procedure Examples .. . .. .. . .. . .. . .. . .. . .. .. . .
84
4.6 Loops and Recursion in Tree{Based GP . . . . . . . . . . . . . . . .
 . 85
4.7 Loops and Recursion in CGPS .... . .. . .. . .. . .. . .. .. . . 88
4.8 Loop Example in CGPS . . . . . . . . . . . . . . . . . . . . . . . .
 . . 88
4.9 External Functions . . . . . . . . . . . . . . . . . . . . . . . . .
 . . . . 91
4.10 Strings and Lists . . . . . . . . . . . . . . . . . . . . . . . . .
 . . . . . 91
4.11 Parameters to the System . . . . . . . . . . . . . . . . . . . . .
 . . . . 92
4.12 C-language Output . . . . . . . . . . . . . . . . . . . . . . . . .
 . . . . 93
4.13 Platforms for CGPS .. . .. . .. .. . .. . .. . .. . .. . .. .. . .
94
4.14 Portability Methods .. . .. . .. .. . .. . .. . .. . .. . .. .. . .
94
4.15 Caveats . .. . .. . .. . .. . .. .. . .. . .. . .. . .. . .. .. . .
94
4.16 How to Get Started .. . .. . .. .. . .. . .. . .. . .. . .. .. . .
95
4.17 Using Tree Representation . . . . . . . . . . . . . . . . . . . . .
 . . . . 96
4.18 Speed Evaluation . . . . . . . . . . . . . . . . . . . . . . . . .
 . . . . . 97
4.19 Why is Binary Manipulation so Fast? . .. . .. . .. . .. . .. .. . .
98
4.20 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . .
 . . . . . . 99
5 A Walkthrough of an Example System. 101
5.1 Variables Constants and Parameters . . .. . .. . .. . .. . .. .. . .
102
5.2 Random Number Generation . .. .. . .. . .. . .. . .. . .. .. . . 104
5.3 Initialisation . .. . .. . .. . .. .. . .. . .. . .. . .. . .. .. .
 107
5.4 Output Functions . . . . . . . . . . . . . . . . . . . . . . . . . .
 . . . . 107
5.5 The Fitness Function . . . . . . . . . . . . . . . . . . . . . . . .
 . . . . 108
5.6 Reproduce an Individual . . . . . . . . . . . . . . . . . . . . . .
 . . . . 109
5.7 Crossover .. . .. . .. . .. . .. .. . .. . .. . .. . .. . .. .. . .
110
5.8 Mutation .. . .. . .. . .. . .. .. . .. . .. . .. . .. . .. .. . .
110
5.9 Tournament Selection . . .. . .. .. . .. . .. . .. . .. . .. .. . .
111
5.10 Read Training Data .. . .. . .. .. . .. . .. . .. . .. . .. .. . .
111
5.11 The Main Procedure . . . . . . . . . . . . . . . . . . . . . . . .
 . . . . 112
II Evaluations 115
6 On-line CGPS 117
6.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
 . . . . 118
6.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . .
 . . . . . 119
6.3 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
 . . . . . 122
6.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
 . . . . . . 130
6.5 Conclusions of On-line CGPS . . . . . . . . . . . . . . . . . . . .
 . . . 135
7 Control Using Memory ofPast Events 139
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . .
 . . . . . 140
7.2 The Evolutionary Algorithm . . . . . . . . . . . . . . . . . . . . .
 . . . 140
7.3 Setup . . .. . .. . .. . .. . .. .. . .. . .. . .. . .. . .. .. . .
141
7.4 Objectives.. . .. . .. . .. . .. .. . .. . .. . .. . .. . .. .. . .
141
7.5 The Memory Based GP Control Architecture .. . .. . .. . .. .. . .
142
7.6 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
 . . . . . . 146
7.7 Future Directions of Memory Based GP in Control . . . . . . . . . .
 . 152
7.8 Summary .. . .. . .. . .. . .. .. . .. . .. . .. . .. . .. .. . .
153
8 High-performance Applications 155
8.1 Historic Remarks on CGPS and Image Coding . . .. . .. . .. .. . .
156
8.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . .
 . . . . . 156
8.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . .
 . . . . 157
8.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
 . . . . . . 159
8.5 Summary and Conclusions . . . . . . . . . . . . . . . . . . . . . .
 . . . 161
9 CGPS and Programmatic Compression 163
9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . .
 . . . . . 164
9.2 Programmatic Compression (PC) . . . . . . . . . . . . . . . . . . .
 . . 164
9.3 Compression of Sound . . . . . . . . . . . . . . . . . . . . . . . .
 . . . 166
9.4 Compression of Pictures . . . . . . . . . . . . . . . . . . . . . .
 . . . . 171
9.5 Summary and Conclusion .. . .. .. . .. . .. . .. . .. . .. .. . .
173
10 CGPS and Tree{Based GP 175
10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . .
 . . . . . . 176
10.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
 . . . . . 176
10.3 The Evolutionary Algorithm . . . . . . . . . . . . . . . . . . . .
 . . . . 177
10.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
 . . . . . . 177
10.5 Discussion and Conclusions . . .. .. . .. . .. . .. . .. . .. .. .
 177
11 Neural Networks and CGPS 179
11.1 The Sample Problem . . . . . . . . . . . . . . . . . . . . . . . .
 . . . . 180
11.2 The Neural Network.. . .. . .. .. . .. . .. . .. . .. . .. .. . .
180
11.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
 . . . . . . 182
11.4 Summary .. . .. . .. . .. . .. .. . .. . .. . .. . .. . .. .. . .
183
12 Neural Networks and Generalisation 185
12.1 The Problems Used In This Study .. . .. . .. . .. . .. . .. .. . .
186
12.2 Classi
cation as Symbolic Regression . . . . . . . . . . . . . . . . . . . .
187
12.3 Introns and Explicitly De
ned Introns . . . . . . . . . . . . . . . . . . . 187
12.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
 . . . . . . 187
12.5 Summary .. . .. . .. . .. . .. .. . .. . .. . .. . .. . .. .. . .
190
III Explanation 193
13 Formalisation of CGPS 195
13.1 Register Machines . .. . .. . .. .. . .. . .. . .. . .. . .. .. . .
196
13.2 Evolutionary Algorithms . . . . . . . . . . . . . . . . . . . . . .
 . . . . 199
13.3 Compiling Genetic Programming . . . . . . . . . . . . . . . . . . .
 . . 203
14 Complexity, Compression and Evolution 205
14.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . .
 . . . . . . 206
14.2 Complexity, E
ective Fitness and Evolution . . . . . . . . . . . . . . . . 209
14.3 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
 . . . . . 212
14.4 Kolmogorov Complexity and Generalisation . . . . . . . . . . . . .
 . . 217
14.5 Empirical Results . . . . . . . . . . . . . . . . . . . . . . . . .
 . . . . . 220
14.6 Other Evolutionary Techniques . . . . . . . . . . . . . . . . . . .
 . . . 223
14.7 Summary and Conclusions . . . . . . . . . . . . . . . . . . . . . .
 . . . 223
15 Explicitly Defined Introns 229
15.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . .
 . . . . . . 230
15.2 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . .
 . . . . . . 232
15.3 The Experimental Setup . . . . . . . . . . . . . . . . . . . . . .
 . . . . 233
15.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
 . . . . . . 235
16 Conclusions and Outline of Future Perspectives 245
16.1 A Computer Language for Meta-manipulation . . . . . . . . . . . . .
 . 246
16.2 Typed GP, Constrained Crossover, Grammar and Search Bias . . . . .
 247
16.3 Other Machine Learning Algorithms . . .. . .. . .. . .. . .. .. . .
251
16.4 Special Processors . .. . .. . .. .. . .. . .. . .. . .. . .. .. .
 252
16.5 Large Number of Input Parameters . . . . . . . . . . . . . . . . .
 . . . 253
16.6 Reasoning about Machine Code Programs with a Meta GP System . . .
254
16.7 The Logic of Genetic Reasoning . . . . . . . . . . . . . . . . . .
 . . . . 258
16.8 Some Brief Initial Results .. . .. .. . .. . .. . .. . .. . .. .. .
 259
Appendix A: Flow Charts of CGPS 263


* Orders within Germany can easily order in any bookstore (DM 68,-),
  although firms/institutions or (Master or Visa) credit card owners
also can
  order directly (shipping and credit card service charge included).
* International Sales are possible to Master or Visa-Card owners:
  - order should arrive via FAX (+49 2501 261550) or PGP-signed (and
    optionally encrypted, our PGP-key is available upon request) email
    (with PGP public key trustfully available). In the moment, these
    ways are the only ones protecting the customers sensitive payment data
    (we are working on secure order via WWW but have no final timeline
    for this now)
    The order should mention credit card type, number and expiration date,
    as well as card holders name and address.
  - Beside the cost of the book (w/o VAT), i.e. DM 68,- minus 7%
    the interested party has to decide about "surface" or "air" mail
    (rates available upon request for different areas, e.g. USA
    DM 3.50 for 3-5 weeks delivery or DM 24,-- for 1-2 weeks delivery).
    The total amount is drawn from the credit card in german currency.


More information about the Connectionists mailing list