thesis on Q-learning for packet routing
Shailesh Kumar
skumar at cs.utexas.edu
Wed Oct 14 11:41:53 EDT 1998
Dear Connectionists,
My thesis on packet routing with Q-learning is available at the UTCS
Neural Networks Research Group web site http://www.cs.utexas.edu/users/nn.
- Shailesh
CONFIDENCE BASED DUAL REINFORCEMENT Q-ROUTING: AN ON-LINE ADAPTIVE
NETWORK ROUTING ALGORITHM
Shailesh Kumar
Master's Thesis; Technical Report AI98-267, Department of Computer
Sciences, The University of Texas at Austin (108 pages)
FTP-host: ftp.cs.utexas.edu
FTP-filename: pub/neural-nets/papers/kumar.msthesis.ps.Z
http://www.cs.utexas.edu/users/nn/pages/publications/abstracts.html#kumar.msthesis.ps.Z
Efficient routing of information packets in dynamically changing
communication networks requires that the routing policy adapt as the
load levels, traffic patterns, and topology of the network
change. Making globally optimal routing decisions would require a
central observer/controller with complete information about the state
of all nodes and links in the network, which is not
realistic. Therefore, the routing decisions must be made locally by
individual nodes (routers) using only local routing information. The
routing information at a node could be estimates of packet delivery
time to other nodes via its neighbors or estimates of queue lengths of
other nodes in the network. An adaptive routing algorithm should
efficiently explore and update routing information available at
different nodes as it routes packets. It should continuously evolve
efficient routing policies with minimum overhead on network resources.
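The per-node routing information described above can be pictured as a
small table of delivery-time estimates. The following is a minimal
sketch (not the thesis's code; the class and variable names are
illustrative assumptions) of such a table and the greedy next-hop
choice it supports:

```python
class Node:
    """Illustrative router state: Q[neighbor][dest] estimates the time
    to deliver a packet to dest when forwarding via that neighbor."""

    def __init__(self, name, neighbors, destinations):
        self.name = name
        # Estimates start at an arbitrary value and would be refined
        # through exploration as packets are routed.
        self.Q = {n: {d: 0.0 for d in destinations} for n in neighbors}

    def next_hop(self, dest):
        # Greedy policy: forward to the neighbor with the lowest
        # estimated delivery time to the destination.
        return min(self.Q, key=lambda n: self.Q[n][dest])

node = Node("A", neighbors=["B", "C"], destinations=["D"])
node.Q["B"]["D"] = 3.0
node.Q["C"]["D"] = 5.0
print(node.next_hop("D"))  # B
```

Each node holds only such local estimates, so no central observer is
needed; the open question the thesis addresses is how quickly and
reliably these estimates can be learned.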
In this thesis, an on-line adaptive network routing algorithm called
Confidence-based Dual Reinforcement Q-Routing (CDRQ-routing), based on
the Q-learning framework, is proposed and evaluated. In this
framework, the routing information at individual nodes is maintained
as Q value estimates of how long it will take to send a packet to any
particular destination via each of the node's neighbors. These Q
values are updated through exploration as the packets are
transmitted. The main contribution of this work is faster adaptation
and improved quality of routing policies compared to Q-Routing. The
improvement is based on two ideas. First, the quality
of exploration is improved by including a confidence measure with each
Q value representing how reliable the Q value is. The learning rate is
a function of these confidence values. Second, the quantity of
exploration is increased by incorporating backward exploration into
Q-learning. As a packet hops from one node to another, it not only
updates a Q value in the sending node (forward exploration similar to
Q-Routing), but also updates a Q value in the receiving node using the
information appended to the packet when it is sent out (backward
exploration). Thus two Q value updates occur per packet hop in
CDRQ-Routing, compared to only one in Q-Routing. Certain properties of
forward and backward exploration that form the basis of these update
rules are stated and proved in this work.
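The two ideas above can be sketched as a single confidence-weighted
update rule. This is a hedged illustration, not the thesis's exact
formulas: the learning-rate function, the confidence blend, and all
names (`cdrq_update`, `base_lr`, etc.) are assumptions made for the
sake of the example.

```python
def cdrq_update(Q, C, node, via, dest, q_delay, t_est, c_est, base_lr=0.5):
    """One confidence-weighted Q update (illustrative sketch).

    Q[node][via][dest]: estimated delivery time from node to dest via `via`
    C[node][via][dest]: confidence (0..1) in that estimate
    q_delay: delay incurred at `node` (queue wait plus transmission)
    t_est:   the reporting node's own best delivery-time estimate to dest
    c_est:   confidence attached to t_est
    """
    old_c = C[node][via][dest]
    # One plausible choice: the learning rate grows when the incoming
    # estimate is trusted and the stored value is not, so unreliable
    # entries are corrected faster.
    lr = min(base_lr * (c_est + (1.0 - old_c)), 1.0)
    target = q_delay + t_est
    Q[node][via][dest] += lr * (target - Q[node][via][dest])
    # The confidence of the updated value blends toward the estimate's.
    C[node][via][dest] = old_c + lr * (c_est - old_c)
    return Q[node][via][dest]

# Demo: a stale estimate (10.0, confidence 0.5) pulled toward a fresher
# target of 7.0 reported with confidence 0.8.
Q = {"A": {"B": {"D": 10.0}}}
C = {"A": {"B": {"D": 0.5}}}
print(round(cdrq_update(Q, C, "A", "B", "D",
                        q_delay=1.0, t_est=6.0, c_est=0.8), 2))  # 8.05
```

In CDRQ-Routing such an update would fire twice per hop: once at the
sending node using the receiver's estimate (forward exploration), and
once at the receiving node using the information appended to the
packet by the sender (backward exploration).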
Experiments over several network topologies, including a 36 node
irregular grid and 128 node 7-D hypercube, indicate that the
improvement in quality and increase in quantity of exploration
contribute in complementary ways to the performance of the overall
routing algorithm. CDRQ-Routing was able to learn optimal shortest
path routing at low loads and efficient routing policies at medium
loads almost twice as fast as Q-Routing. At high load levels, the
routing policy learned by CDRQ-Routing was twice as good as that
learned by Q-Routing in terms of average packet delivery
time. CDRQ-Routing was found to adapt significantly faster than
Q-Routing to changes in network traffic patterns and network
topology. The final routing policies learned by CDRQ-Routing were able
to sustain much higher load levels than those learned by
Q-Routing. Analysis shows that the exploration overhead incurred in
CDRQ-Routing is less than 0.5% of the packet traffic. Various
extensions of CDRQ-Routing, namely routing in heterogeneous networks
(different link delays and router processing speeds), routing with
adaptive congestion control (in case of finite queue buffers), and
adding predictive features to CDRQ-Routing, have been proposed as
future work. CDRQ-Routing is both superior to and more realistic than
state-of-the-art distance vector routing and the Q-Routing algorithm.