thesis on Q-learning for packet routing
Shailesh Kumar
skumar at cs.utexas.edu
Wed Oct 14 11:41:53 EDT 1998
Dear Connectionists,
My thesis on packet routing with Q-learning is available at the UTCS
Neural Networks Research Group web site http://www.cs.utexas.edu/users/nn.
- Shailesh
CONFIDENCE BASED DUAL REINFORCEMENT Q-ROUTING: AN ON-LINE ADAPTIVE
NETWORK ROUTING ALGORITHM
Shailesh Kumar
Master's Thesis; Technical Report AI98-267, Department of Computer
Sciences, The University of Texas at Austin (108 pages)
FTP-host: ftp.cs.utexas.edu
FTP-filename: pub/neural-nets/papers/kumar.msthesis.ps.Z
http://www.cs.utexas.edu/users/nn/pages/publications/abstracts.html#kumar.msthesis.ps.Z
Efficient routing of information packets in dynamically changing
communication networks requires that the routing policy adapt as the
load levels, traffic patterns, and topology of the network
change. Making globally optimal routing decisions would require a
central observer/controller with complete information about the state
of all nodes and links in the network, which is not
realistic. Therefore, the routing decisions must be made locally by
individual nodes (routers) using only local routing information. The
routing information at a node could be estimates of packet delivery
time to other nodes via its neighbors or estimates of queue lengths of
other nodes in the network. An adaptive routing algorithm should
efficiently explore and update routing information available at
different nodes as it routes packets. It should continuously evolve
efficient routing policies with minimum overhead on network resources.
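The per-node routing information described above can be pictured as a
small table of delivery-time estimates. The following is a minimal
sketch (not the thesis's code; the class and variable names are
illustrative assumptions) of such a table and the greedy next-hop
choice it supports:

```python
class Node:
    """Illustrative router state: Q[neighbor][dest] estimates the time
    to deliver a packet to dest when forwarding via that neighbor."""

    def __init__(self, name, neighbors, destinations):
        self.name = name
        # Estimates start at an arbitrary value and would be refined
        # through exploration as packets are routed.
        self.Q = {n: {d: 0.0 for d in destinations} for n in neighbors}

    def next_hop(self, dest):
        # Greedy policy: forward to the neighbor with the lowest
        # estimated delivery time to the destination.
        return min(self.Q, key=lambda n: self.Q[n][dest])

node = Node("A", neighbors=["B", "C"], destinations=["D"])
node.Q["B"]["D"] = 3.0
node.Q["C"]["D"] = 5.0
print(node.next_hop("D"))  # B
```

Each node holds only such local estimates, so no central observer is
needed; the open question the thesis addresses is how quickly and
reliably these estimates can be learned.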
In this thesis, an on-line adaptive network routing algorithm called
Confidence-based Dual Reinforcement Q-Routing (CDRQ-routing), based on
the Q-learning framework, is proposed and evaluated. In this
framework, the routing information at individual nodes is maintained
as Q value estimates of how long it will take to send a packet to any
particular destination via each of the node's neighbors. These Q
values are updated through exploration as the packets are
transmitted. The main contribution of this work is faster adaptation
and improved quality of routing policies compared to Q-Routing. The
improvement is based on two ideas. First, the quality
of exploration is improved by including a confidence measure with each
Q value representing how reliable the Q value is. The learning rate is
a function of these confidence values. Second, the quantity of
exploration is increased by incorporating backward exploration into
Q-learning. As a packet hops from one node to another, it not only
updates a Q value in the sending node (forward exploration similar to
Q-Routing), but also updates a Q value in the receiving node using the
information appended to the packet when it is sent out (backward
exploration). Thus two Q value updates occur per packet hop in
CDRQ-Routing, compared to only one in Q-Routing. Certain properties of
forward and backward exploration that form the basis of these update
rules are stated and proved in this work.
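The two ideas above can be sketched as a single confidence-weighted
update rule. This is a hedged illustration, not the thesis's exact
formulas: the learning-rate function, the confidence blend, and all
names (`cdrq_update`, `base_lr`, etc.) are assumptions made for the
sake of the example.

```python
def cdrq_update(Q, C, node, via, dest, q_delay, t_est, c_est, base_lr=0.5):
    """One confidence-weighted Q update (illustrative sketch).

    Q[node][via][dest]: estimated delivery time from node to dest via `via`
    C[node][via][dest]: confidence (0..1) in that estimate
    q_delay: delay incurred at `node` (queue wait plus transmission)
    t_est:   the reporting node's own best delivery-time estimate to dest
    c_est:   confidence attached to t_est
    """
    old_c = C[node][via][dest]
    # One plausible choice: the learning rate grows when the incoming
    # estimate is trusted and the stored value is not, so unreliable
    # entries are corrected faster.
    lr = min(base_lr * (c_est + (1.0 - old_c)), 1.0)
    target = q_delay + t_est
    Q[node][via][dest] += lr * (target - Q[node][via][dest])
    # The confidence of the updated value blends toward the estimate's.
    C[node][via][dest] = old_c + lr * (c_est - old_c)
    return Q[node][via][dest]

# Demo: a stale estimate (10.0, confidence 0.5) pulled toward a fresher
# target of 7.0 reported with confidence 0.8.
Q = {"A": {"B": {"D": 10.0}}}
C = {"A": {"B": {"D": 0.5}}}
print(round(cdrq_update(Q, C, "A", "B", "D",
                        q_delay=1.0, t_est=6.0, c_est=0.8), 2))  # 8.05
```

In CDRQ-Routing such an update would fire twice per hop: once at the
sending node using the receiver's estimate (forward exploration), and
once at the receiving node using the information appended to the
packet by the sender (backward exploration).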
Experiments over several network topologies, including a 36 node
irregular grid and 128 node 7-D hypercube, indicate that the
improvement in quality and increase in quantity of exploration
contribute in complementary ways to the performance of the overall
routing algorithm. CDRQ-Routing was able to learn optimal shortest
path routing at low loads and efficient routing policies at medium
loads almost twice as fast as Q-Routing. At high load levels, the
routing policy learned by CDRQ-Routing was twice as good as that
learned by Q-Routing in terms of average packet delivery
time. CDRQ-Routing was found to adapt significantly faster than
Q-Routing to changes in network traffic patterns and network
topology. The final routing policies learned by CDRQ-Routing were able
to sustain much higher load levels than those learned by
Q-Routing. Analysis shows that the exploration overhead incurred in
CDRQ-Routing is less than 0.5% of the packet traffic. Various
extensions of CDRQ-Routing, namely routing in heterogeneous networks
(different link delays and router processing speeds), routing with
adaptive congestion control (in case of finite queue buffers), and
adding predictive features to CDRQ-Routing, have been proposed as
future work. CDRQ-Routing is both superior to and more realistic than
state-of-the-art distance vector routing and the Q-Routing algorithm.