Based on a plenary talk presented at the 1996 Summer Meetings of the Econometric Society.
Drew Fudenberg and David K. Levine
June 26, 1996; revised July 30, 1996
This essay has two goals. First, it is an overview of the field, intended to give
an idea of what has been going on recently. Second, it is an editorial expressing our own views
of how future research ought to proceed.
The basic research agenda is to explore non-equilibrium explanations of
equilibrium in games; to view equilibrium as the long-run outcome of a dynamic process of
adjustment or learning. In other words, equilibrium concepts should be interpreted as
steady states or ergodic distributions of a dynamic process. This essay focuses on
theoretical models of learning, but is informed by and addressed to experimenters as well,
and we have suggestions about how each group "ought" to proceed.
In our view, game theorists ought to pay more attention to game theory
experiments; for example, the literature on refinements of Nash equilibrium proceeded
based almost entirely on introspection and theoretical concerns. Laboratory experiments
may not be perfect tests of theories, but they are much better than no tests at all. Our
second message to theorists is that more attention should be paid to the "medium
run" properties of learning models: Most work, including our own, has focused on
asymptotic results, which tend to be easier to obtain.
Of course, any experiment in which subjects play more than one round at
least implicitly involves learning, and indeed most experiments focus on play in the last
few rounds of the experiment, when subjects have had the most time to learn. Given this
heavy implicit reliance on the idea of learning, we feel that experimenters ought to pay
more attention to the relevant learning theory. For example, a variety of learning models
have been studied by experimental economists without much concern for the extent to which
the models do a good job of learning. Although we are prepared to be convinced otherwise
by sufficiently strong data, our prior belief is that in environments where simple
learning rules can do well, agents will tend not to use rules that do poorly. For this
reason we are skeptical of claims that very simple models fit the data well unless those
models are compared with medium-complexity models that do a better job of learning.
From a theorist's perspective, there are three criteria for good
experiments: interesting questions, appropriate theoretical foundations, and careful
experimental design. In our opinion, too many experiments focus on careful experimental
design, changing only a few treatment variables from a past experiment in order to isolate
their effect, and too few experiments try to answer questions of theoretical interest. Of
course, this procedure is useful for identifying the reasons that outcomes differ between
the treatments, but we believe that more attention to theoretical issues would help
experimental economics move beyond laundry-lists of treatment effects in various games.
Instead of drawing a distinction between models of "learning" and other forms of adjustment, it is easier to distinguish between models that describe the behavior of individual agents and those that start at the aggregate level. Consequently, the real taxonomy is not between "learning" and "evolutionary" models, but rather between "individual-level" models and "aggregate" models. This essay focuses on individual-level models, but we would like to make a few comments on aggregate models before we begin.
An aggregate model starts with a description of aggregate behavior of a
population of agents. An example of this is the "replicator dynamic" that
postulates that the fraction of the population playing a strategy increases if the utility
received from that strategy is above average. There are two main reasons that economists
are interested in the replicator and related models.
First, although the replicator dynamic was originally motivated by
biological evolution, it can be derived from various sorts of individual-level models. One
such model is the stimulus-response model of Borgers and Sarin [1994] which postulates
that actions that perform well are "positively reinforced" and so more likely to
be played in the future. An alternative model is the "emulation" model in which
players copy their more successful predecessors, as in Binmore and Samuelson [1995],
Bjornerstedt and Weibull [1995], or Schlag [1995]. Note, however, that the
"emulation" stories are not applicable to most experiments, because subjects are
not told what happens in other matches. These emulation models are very similar to
non-equilibrium models of "social learning," where players are trying to learn
what technology, crop or brand is best, such as Kirman [1993] and Ellison and Fudenberg
[1993, 1995]. The links between the emulation and social learning models need further
exploration.
A second reason the replicator dynamic is of interest is that some of
its properties extend to more general classes of processes
that correspond to other sorts of learning or emulation. Samuelson and Zhang [1992] and
Hofbauer and Weibull [1995] have results along these lines, giving sufficient conditions
for a dynamic process to eliminate all strategies that are eliminated by iterated strict
dominance.
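As a concrete illustration, here is a minimal numerical sketch of the discrete-time (single-population) replicator dynamic described above; the payoff matrix, step size, and initial shares are illustrative assumptions rather than anything tied to a particular paper.

```python
import numpy as np

def replicator_step(x, A, dt=0.01):
    """One Euler step of the replicator dynamic: the share of strategy k grows in
    proportion to x_k times (payoff of k minus the population-average payoff)."""
    u = A @ x                      # payoff to each pure strategy against the mix x
    u_bar = x @ u                  # population-average payoff
    return x + dt * x * (u - u_bar)

# Illustrative symmetric 2x2 coordination game and an arbitrary initial population mix.
A = np.array([[2.0, 0.0],
              [0.0, 1.0]])
x = np.array([0.4, 0.6])
for _ in range(2000):
    x = replicator_step(x, A)
print(x)   # the share of the strategy earning above-average payoff grows
```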
The focus of the remainder of the essay is on individual-level models, and
more specifically on variants of "fictitious play" in two-player games. (This is
not a comprehensive survey, and many relevant papers have been left out.) Moreover, we
will suppose that players observe the strategies played by their opponents at the end of
each round. If the game has a non-trivial extensive form, this implies that players see
how opponents would have played at information sets that were not reached.
The first question a learning model must address is what individuals
know before the game starts, and what it is that they are learning about. Are they
learning about exogenous data and choosing best responses, or are they learning how to
play? A closely connected issue is how much rationality to attribute to the agents. We
believe that for most purposes the right models involve neither full rationality nor the
extreme naïveté of most stimulus-response models; "better" models have agents
consciously, but perhaps imperfectly, trying to get a good payoff.
In this essay we will simplify matters by assuming that players know the
extensive form of the game and their own payoff functions. They may or may not know their
opponents' payoffs, but they are concerned with that information only insofar as they can use it to
deduce how their opponents will play. In this setup, the only thing players are learning
about from one iteration of the game to the next is the distribution of strategies
used by opponents.
From this point of view, if players do have information about opponents'
payoffs, they use it to form prior beliefs about how their opponents are likely to play.
For example, they might think it is unlikely that opponents will play strategies that are
dominated given the presumed payoff functions.
These two cases fit most laboratory experiments, but in other applications players might have less information. At the other extreme, players might not know anything about their opponents, so that players do not even know if they are playing a game, and are only aware of a subset of their own possible strategies. They might observe only the history of payoffs and their own strategies, and discover new strategies by experimentation or deduction. So far only the genetic algorithm model of learning (for example, Holland [1975]) has attempted to explore a setting like this.
There are several ways of modeling learning about opponents' strategies.
In the Bayesian model of learning, agents have priors and likelihood functions, and update
according to Bayes rule. Such agents can be "Savage rational," but they need not
be fully rational in the sense of basing their inference on a correct model of the world.
Indeed, if a correct model of the world means one consistent with a common prior, then the
issue of whether play converges to an equilibrium is moot, since play must be an
equilibrium (either Bayesian correlated, or Bayesian Nash) from the outset, as shown by
Aumann [1987], and Brandenburger and Dekel [1987]. So it is not sensible to impose a
common prior when trying to understand how learning leads to equilibrium.
Since we are trying to understand how common knowledge might arise from
common experience, and it is not clear what we might mean by a model being correct without
common knowledge, we look at learning processes where the players' models of the
environment are "sensible" as opposed to "correct." (If we focus not
on players' models of the environment, but on the "behavior rules" by which they
choose their actions, then we are interested in sensible rules.) This means that players
are trying to get a good payoff. Moreover, sensible rules should do reasonably well in a
reasonably broad set of circumstances. Rules that do poorly even in simple environments
are not likely to be used for important decisions.
There are two levels of sophistication in learning. One is simply to
forecast how opposing players will play. However, if two people repeatedly play a two
person game against each other, they ought to consider the possibility that their current
play may influence the future play of their opponent. For example, players might think
that if they are nice they will be rewarded by their opponent being nice in the future, or
that they can teach their opponent to play a best response to a particular action by
playing that action over and over. Most of learning theory abstracts from these repeated
game considerations by explicitly or implicitly relying on a model in which the incentive
to try to alter the future play of opponents is small enough to be negligible. Lock-in
with small discount factors is one such model. A second class of models that makes
repeated play considerations negligible is that of a large number of players, who interact
relatively anonymously, with the population size large compared to the discount factor.
There are a variety of large-population models, depending on how players
meet, and what information is revealed at the end of each round of play.
Random Matching Model: Each period, all players are randomly matched. At
the end of each round each player observes only the play in his own match. The way a
player acts today will influence the way his current opponent plays tomorrow, but the
player is unlikely to be matched with his current opponent or anyone who has met the
current opponent for a long time. Myopic play is approximately optimal if the population
is finite but large compared to the discount factors. Most game theory experiments use
this design. Note, however, that myopic play need not be optimal for a fully rational
player (who knows that others play myopically) even in a large population if that player
is sufficiently patient; see Ellison [1995].
Aggregate Statistic Model: Each period, all players are randomly
matched. At the end of the round, the population aggregates are announced. If the
population is large each player has little influence on the population aggregates, and
consequently little influence on future play, so players have no reason to depart from
myopic play. Some, but too few, experiments use this design.
Note that with either form of a large population, repeated game effects are small but not zero with the typical population size of 20 to 30 subjects. Since the aggregate statistic
model seems to approximate about as many real-world situations of
interest as the random-matching model, and has the additional virtue of promoting faster
learning, it is unfortunate that it has not been used more often. This may be due to a
fear that repeated game effects would prove larger in the aggregate statistic model, but
we are unaware of theoretical or empirical evidence that this is the case. Our guess is
that the preponderance of the random-matching model is due rather to the desire to
maintain comparability with past experiments.
Although the large population stories explain naïve or myopic play, they only apply if the relevant population is large. Such cases may be more prevalent than it first appears, since players may extrapolate between similar games. Most non-equilibrium models in the literature either explicitly or implicitly have a large population model in mind; Kalai and Lehrer [1993] is one notable exception.
Fictitious play is a kind of quasi-Bayesian model: Agents behave as if
they think they are facing an exogenous, stationary, unknown, distribution of opponents'
strategies. The standard formal model has a single agent in each player role. There is
some recent work by Kaniovski and Young [1994] that examines the dynamics of fictitious
play in populations of players.
In the standard formal model, a strategy for player i is denoted $s^i$, and the set of available strategies is $S^i$. As usual, $-i$ denotes the player(s) other than i. The best response correspondence is denoted $BR^i$. In fictitious play, player i has an exogenous initial weight function $\kappa_0^i : S^{-i} \to \mathbb{R}_+$. This weight is updated by adding 1 to the weight of each opponent strategy each time it is played:
$$\kappa_t^i(s^{-i}) = \kappa_{t-1}^i(s^{-i}) + \begin{cases} 1 & \text{if } s^{-i}_{t-1} = s^{-i}, \\ 0 & \text{otherwise.} \end{cases}$$
The probability that player i assigns to player -i playing $s^{-i}$ at date t is given by
$$\gamma_t^i(s^{-i}) = \frac{\kappa_t^i(s^{-i})}{\sum_{\tilde s^{-i} \in S^{-i}} \kappa_t^i(\tilde s^{-i})}.$$
Fictitious play itself is then defined as any rule that assigns $s_t^i \in BR^i(\gamma_t^i)$. This method corresponds to
Bayesian inference when player i believes that his opponents' play corresponds to a
sequence of i.i.d. multinomial random variables with a fixed but unknown distribution, and
player i's prior beliefs over that unknown distribution take the form of a Dirichlet
distribution.
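To make the updating rule concrete, the following is a minimal sketch of two-player fictitious play under the definitions above; the payoff matrices, the initial weights, and the tie-breaking convention (lowest index) are illustrative assumptions.

```python
import numpy as np

def fictitious_play(U1, U2, kappa1, kappa2, T=100):
    """Two-player fictitious play.
    U1[a, b]: payoff to player 1 when 1 plays a and 2 plays b.
    U2[b, a]: payoff to player 2 when 2 plays b and 1 plays a.
    kappa1, kappa2: each player's initial weights on the opponent's strategies."""
    kappa1 = np.asarray(kappa1, dtype=float)
    kappa2 = np.asarray(kappa2, dtype=float)
    history = []
    for _ in range(T):
        gamma1 = kappa1 / kappa1.sum()        # player 1's assessment of player 2's play
        gamma2 = kappa2 / kappa2.sum()        # player 2's assessment of player 1's play
        a = int(np.argmax(U1 @ gamma1))       # best responses; ties go to the lowest index
        b = int(np.argmax(U2 @ gamma2))
        kappa1[b] += 1.0                      # add 1 to the weight of the strategy just observed
        kappa2[a] += 1.0
        history.append((a, b))
    return history

# Illustrative run on a 2x2 coordination game with uniform prior weights:
# play locks on to the strict equilibrium reached in the first period.
U1 = np.array([[1.0, 0.0],
               [0.0, 1.0]])
U2 = U1.copy()
print(fictitious_play(U1, U2, [1.0, 1.0], [1.0, 1.0], T=10))
```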
Proposition (Fudenberg and Kreps 1993): (a) If s is a strict Nash
equilibrium, and s is played at date t in the process of fictitious play, s is played at
all subsequent dates. That is, strict Nash equilibria are absorbing for the process of
fictitious play. (b) Any pure strategy steady state of fictitious play must be a Nash
equilibrium. (c) If the empirical marginal distributions converge, the strategy profile
corresponding to the product of these distributions is a Nash equilibrium.
The early literature on fictitious play viewed the process as describing
pre-play calculations players might use to coordinate their expectations on a particular
Nash equilibrium (hence the name "fictitious" play.) From this viewpoint, or
when using fictitious play as a means of calculating Nash equilibria, the identification
of a cycle with its time average is not problematic, and the early literature on
fictitious play accordingly focused on finding conditions that guarantee that the empirical
distributions converge.
The convergence result above supposes players ignore cycles, even if, because of these cycles, the empirical joint distribution of the two players' play is correlated and does not equal the product of the empirical marginals. Consider the following example, from Fudenberg and Kreps [1990], [1993]. (Similar examples can be found in Jordan [1993] and Young [1993].)
  | A   | B   |
A | 0,0 | 1,1 |
B | 1,1 | 0,0 |
Suppose this game is played according to the process of fictitious play,
with initial weights $(1, \sqrt{2})$ for each player. In the first period, both players
think the other will play B, so both play A. The next period the weights are $(2, \sqrt{2})$
and both play B; the outcome is the alternating sequence (B,B), (A,A), (B,B), and so on. The
empirical frequencies of each player's choices do converge to (1/2, 1/2), which is the Nash
equilibrium, but the realized play is always on the diagonal, and both players receive
payoff 0 in every period. The empirical joint distribution on pairs of actions does not
equal the product of the two marginal distributions, so that the empirical joint
distribution corresponds to correlated as opposed to independent play. As a model of
players learning how their opponents behave, this is not a very satisfactory notion of
"converging to an equilibrium." Moreover, even if the empirical joint
distribution does converge to the product of the empirical marginals, so that the players
will end up getting the payoffs they would expect to get from maximizing against i.i.d.
draws from the long-run empirical distribution of their opponents' play, one might still
wonder if players would ignore persistent cycles in their opponents' play.
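For readers who want to reproduce the example, here is a minimal self-contained simulation with the initial weights $(1, \sqrt{2})$; the action labels and the loop length are incidental.

```python
import numpy as np

# Payoffs to either player (the game is symmetric): a player earns 1 by mismatching.
U = np.array([[0.0, 1.0],    # own action A against opponent A, B
              [1.0, 0.0]])   # own action B against opponent A, B

w1 = np.array([1.0, np.sqrt(2.0)])   # player 1's weights on the opponent playing (A, B)
w2 = np.array([1.0, np.sqrt(2.0)])   # player 2's weights on the opponent playing (A, B)
labels = "AB"

for t in range(8):
    a = int(np.argmax(U @ (w1 / w1.sum())))   # player 1's best response to current beliefs
    b = int(np.argmax(U @ (w2 / w2.sum())))   # player 2's best response to current beliefs
    print(t, labels[a], labels[b])            # prints (A,A), (B,B), (A,A), ...: payoff 0 each period
    w1[b] += 1.0                              # update weights with the observed play
    w2[a] += 1.0
```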
One response to this is to use a more demanding convergence notion, and
only say that a player's behavior converges to a mixed strategy if his intended actions in
each period converge to that mixed strategy. The standard fictitious play process cannot
converge to a mixed strategy in this sense (for generic payoffs). Another response is to
think that fictitious play does describe the long-run outcome, and players will not notice
that their fictitious play model is badly off, provided that their payoffs are not worse
than the model predicts.
Fudenberg and Levine [1994] and Monderer, Samet, and Sela [1994] show that the realized time-average payoffs under fictitious play cannot be much below what the players expect if the time path of play exhibits "infrequent switches."
We now wish to contrast fictitious play with the partial best-response model of learning in a large population. The partial adjustment model has a state variable $\theta_t^i$ giving the fractions of the population of player i's playing each $s^i \in S^i$. In discrete time, if a fraction $\lambda$ of the population is picked at random to switch to a best response to their opponents' current play, and the rest of the population continues its current play, the partial best-response dynamic is given by
$$\theta_{t+1}^i = (1-\lambda)\,\theta_t^i + \lambda\, BR^i(\theta_t^{-i}).$$
If the time periods are very short, and the fraction of the population adjusting is very small, this may be approximated by the continuous-time adjustment process
$$\dot{\theta}^i = BR^i(\theta^{-i}) - \theta^i.$$
In contrast, the fictitious play process moves more and more slowly over time, because the ratio of new observations to old observations becomes smaller and smaller. Let $d_t^i$ denote the historical frequency with which player i's strategies have been played up to time t. Then in fictitious play
$$d_{t+1}^i = \left(1 - \tfrac{1}{t+1}\right) d_t^i + \tfrac{1}{t+1}\, BR^i(d_t^{-i}).$$
This is of course very much like the partial best-response dynamic, except that the weight on the past converges to one, and the weight on the best response to zero.
Discrete-time fictitious play has approximately the same limiting behavior as the continuous-time best-response dynamic. More precisely, the set of limit points of discrete-time fictitious play is an invariant subset for the continuous-time best-response dynamic, and the path of the discrete-time fictitious play process starting from some large T remains close to that of the corresponding continuous-time best-response process until some time $T + L$, where $L$ can be made arbitrarily large by taking T arbitrarily large. This was shown by Hofbauer [1995].
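A rough sketch of this comparison, collapsing to a single population in a symmetric game purely to keep the code short; the game, the adjustment fraction, and the initial shares are illustrative assumptions. The two recursions differ only in their step sizes: a constant fraction for the partial best-response dynamic versus the shrinking weight 1/(t+1) for fictitious play.

```python
import numpy as np

def br_mix(U, belief):
    """Pure best response to a belief over opponents' strategies, as a unit vector."""
    br = np.zeros(len(U))
    br[int(np.argmax(U @ belief))] = 1.0
    return br

# Symmetric mismatching game; a single population plays against itself for brevity.
U = np.array([[0.0, 1.0],
              [1.0, 0.0]])

theta = np.array([0.7, 0.3])   # partial best response: a constant fraction lam adjusts each period
d = np.array([0.7, 0.3])       # fictitious play frequencies: the step size 1/(t+1) shrinks over time
lam = 0.05
for t in range(1, 500):
    theta = (1.0 - lam) * theta + lam * br_mix(U, theta)
    d = (1.0 - 1.0 / (t + 1)) * d + (1.0 / (t + 1)) * br_mix(U, d)
print(theta, d)   # both end up near (1/2, 1/2); the fictitious play path moves more and more slowly
```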
In fictitious play, players can only randomize if exactly indifferent;
they cannot learn to play mixed strategies. To develop a sensible model of learning to
play mixed strategies, one should start with an explanation for mixing. One such
explanation is Harsanyi's [1973] purification theorem, which explains a mixed distribution
over actions as the result of unobserved payoff perturbations.
A second explanation for mixing is that it is descriptively realistic:
some psychology experiments show that choices between alternatives that are perceived as
similar tend to be relatively random.
A third motivation for supposing that players randomize is that
stochastic fictitious play allows behavior to avoid the discontinuity inherent in standard
fictitious play; consequently it allows good long-run performance in a much wider
range of environments.
All three motivations lead to models in which the best response functions $BR^i$ are replaced by smooth best response functions $BR^i_\sigma$, with $BR^i_\sigma$ a continuous function of player i's beliefs about opponents' play.
The notion of a Nash equilibrium in games with smoothed best response
functions is the following.
Definition: The profile $\hat\sigma$ is a Nash distribution if $\hat\sigma^i = BR^i_\sigma(\hat\sigma^{-i})$ for all i.
This distribution may be very different from any Nash equilibria of the
original game if the smoothed best response functions are very far from the original ones.
But in the context of randomly perturbed payoffs, Harsanyi's purification theorem shows
that ("generically") the Nash distributions of the perturbed game approach the
Nash equilibria of the original game as the support of the payoff perturbations become
concentrated about 0.
The smoothed best-response function for the game of matching pennies is illustrated below.
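The following sketch computes a logit smooth best response for matching pennies in place of a picture; the smoothing parameter lam is an illustrative assumption, and as lam shrinks toward zero the curve approaches the exact best-response step function.

```python
import numpy as np

def logit_best_response(U, belief, lam=0.2):
    """Smooth best response: probabilities proportional to exp(expected payoff / lam).
    As lam shrinks toward zero, this approaches the exact best-response correspondence."""
    v = (U @ belief) / lam
    v -= v.max()                    # subtract the max for numerical stability
    p = np.exp(v)
    return p / p.sum()

# Matching pennies payoffs for the row player (match wins +1, mismatch loses -1).
U_row = np.array([[1.0, -1.0],
                  [-1.0, 1.0]])

# Probability of playing Heads as the opponent's probability of Heads varies:
for q in np.linspace(0.0, 1.0, 5):
    belief = np.array([q, 1.0 - q])
    print(round(q, 2), logit_best_response(U_row, belief))
```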
The "threshold perception" literature in psychology literature
shows that when asked to discriminate between two alternatives, behavior is random,
becoming more reliable as the alternatives become more distinct. In choosing between two
different strategies, one measure of the distinctness of the strategies is the difference
in utility between the strategies. With this interpretation, Thurstone's [1927] law of
comparative judgment becomes similar to Harsanyi's random utility model. Indeed, the
picture of the smooth best response curve drawn above is very similar to behavior curves
that have been derived empirically in psychological experiments.
While smooth fictitious play is random, the time average of many independent random variables has very little randomness. Asymptotically the dynamics resemble those of the continuous-time "near best response" dynamic
$$\dot\theta^i = BR^i_\sigma(\theta^{-i}) - \theta^i.$$
Consequently, if the stochastic system eventually converges to a point
or a cycle, the point or cycle should be a closed orbit of the continuous-time dynamics.
Moreover, if a point or cycle is an unstable orbit of the continuous time dynamics, we
expect that the noise would drive the system away, so that the stochastic system can only
converge to stable orbits of the continuous-time system.
The literature on "stochastic approximation" gives conditions that allow the asymptotic limit points of stochastic discrete-time systems to be inferred from the stability of the steady states of a corresponding deterministic system in continuous time.
Proposition (Benaim and Hirsch [1996]): Consider a two-player smooth fictitious play in which every strategy profile has positive probability at any state $\theta$. If $\hat\theta$ is an asymptotically stable equilibrium of the continuous-time process, then, regardless of the initial conditions, there is positive probability that $\theta_t \to \hat\theta$.
Definition: A rule $\sigma^i$ is $\epsilon$-universally consistent if for any play by the opponents
$$\limsup_{T \to \infty}\left[\max_{s^i} \frac{1}{T}\sum_{t=1}^{T} u^i(s^i, s_t^{-i}) \;-\; \frac{1}{T}\sum_{t=1}^{T} u^i(s_t^i, s_t^{-i})\right] \le \epsilon \quad \text{a.s.}$$
In words, using the time-average payoff as the criterion, the realized utility
cannot be much less than that from the best time-invariant strategy.
This objective differs from the Bayesian approach of specifying prior
beliefs about the strategies used by opponents and choosing a best response to those beliefs.
However, any Bayesian expects to be consistent almost surely.
Existence of such rules was originally shown by Hannan [1957], and by
Blackwell [1956]. The result has been lost and rediscovered several times since then by
many authors. In the computer science literature this is called the "on-line decision
problem."
Fudenberg and Levine [1995] show that universal consistency can be accomplished by a smooth fictitious play procedure in which $BR^i_\sigma$ is derived from maximizing a perturbed objective of the form $u^i(\sigma^i, \gamma^i) + \lambda v^i(\sigma^i)$, and at each date players play $BR^i_\sigma$ applied to their current beliefs. For example, if $v^i(\sigma^i) = -\sum_{s^i} \sigma^i(s^i)\log\sigma^i(s^i)$, then
$$BR^i_\sigma(\gamma^i)(s^i) = \frac{\exp\big(u^i(s^i,\gamma^i)/\lambda\big)}{\sum_{\tilde s^i} \exp\big(u^i(\tilde s^i,\gamma^i)/\lambda\big)}.$$
This is "exponential fictitious play": each strategy is played
in proportion to an exponential function of the utility it has historically yielded; it
corresponds to the logit decision model that has been extensively used in empirical work.
Notice that as $\lambda \to 0$, the
probability that any strategy that is not a best response is played goes to zero.
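A minimal sketch of such an exponential fictitious play, run against an opponent whose play is, from the learner's point of view, an arbitrary sequence (here i.i.d., although consistency does not require that); the payoff matrix, prior weights, smoothing parameter, and the opponent's mixture are illustrative assumptions. The learner's time-average payoff should end up close to that of the best constant strategy against the empirical distribution of the opponent's play.

```python
import numpy as np

rng = np.random.default_rng(0)

def exponential_weights(avg_payoffs, lam):
    """Play each strategy with probability proportional to exp(average payoff / lam)."""
    v = avg_payoffs / lam
    v -= v.max()
    p = np.exp(v)
    return p / p.sum()

# Row player's payoffs in matching pennies.
U = np.array([[1.0, -1.0],
              [-1.0, 1.0]])
counts = np.ones(2)                        # prior weights on the opponent's strategies
total, T, lam = 0.0, 5000, 0.05
for t in range(T):
    avg = U @ (counts / counts.sum())      # payoff of each own strategy vs the empirical distribution
    p = exponential_weights(avg, lam)
    a = rng.choice(2, p=p)
    b = rng.choice(2, p=[0.7, 0.3])        # the opponent's mixture, unknown to the learner
    total += U[a, b]
    counts[b] += 1.0

best_fixed = (U @ (counts / counts.sum())).max()   # best constant strategy vs empirical play (incl. prior)
print(total / T, best_fixed)               # the two numbers should be close
```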
Note also that this is not a very complex rule. Since it is not difficult to implement universally consistent rules, universal consistency may be a useful benchmark for evaluating learning rules, even if it is not exactly descriptive of experimental data in moderate-size samples, and even if individual players do not implement the particular procedures discussed here.
The stimulus-response model was developed largely in response to observations by psychologists about human and animal behavior: choices that lead to good outcomes are more likely to be repeated, and behavior is somewhat random. These properties are shared by many learning
processes. Stimulus-response models have received a fair bit of recent attention, but we feel that
researchers would do better to investigate models more in the style of smooth fictitious
play.
First consider stimulus response with only positive reinforcements, as in Borgers and Sarin [1994]. Agents at each date use a mixed strategy, and the state of the system is the vector $\theta_t^i$ of mixed actions played at time t. Payoffs are normalized to lie between zero and one, so that they have the same scale as probabilities. The state evolves in the following way: if player i plays $s_t^i$ and receives payoff $u_t^i$ at date t, then
$$\theta_{t+1}^i(s^i) = \begin{cases} (1-u_t^i)\,\theta_t^i(s^i) + u_t^i & \text{if } s^i = s_t^i, \\ (1-u_t^i)\,\theta_t^i(s^i) & \text{otherwise.} \end{cases}$$
Because all payoffs are nonnegative, the propensity to play any action
increases each time that action is played.
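A minimal simulation of this positive-reinforcement rule in a single-agent decision problem; the payoff distributions are illustrative assumptions, chosen so that payoffs stay in [0, 1].

```python
import numpy as np

rng = np.random.default_rng(1)

def reinforce(theta, action, payoff):
    """Positive reinforcement: shift probability toward the action just played,
    with step size equal to the realized payoff (normalized to lie in [0, 1])."""
    theta = (1.0 - payoff) * theta
    theta[action] += payoff
    return theta / theta.sum()          # guard against floating-point drift

# Illustrative environment: two actions with payoffs uniform on [0, upper[a]],
# so the better action has mean payoff 0.4 and the worse one 0.2.
upper = np.array([0.8, 0.4])
theta = np.array([0.5, 0.5])
for _ in range(5000):
    a = rng.choice(2, p=theta)
    payoff = rng.uniform(0.0, upper[a])
    theta = reinforce(theta, a, payoff)
print(theta)   # the propensity to play the better action typically grows
```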
Narendra and Thatcher [1974] showed that against an i.i.d. opponent, as
the reinforcement parameter goes
to zero, the time-average utility converges to the maximum that could be obtained against
the distribution of opponents' play. But this weak property is satisfied by many learning
rules that are not consistent, including "pure" fictitious play; smooth
fictitious play retains its consistency property regardless of opponents' play.
Borgers and Sarin [1995] consider a more general model in which
reinforcements can be either positive or negative, depending on whether the realized
payoff is greater or less than the agent's "aspiration level." The simplest case
is a constant but nonzero aspiration level: if the aspiration level is greater than 1
(all outcomes are disappointing), play cannot lock on to a pure action. Less obviously, if
there are only two strategies H and T, all payoff realizations are either 0 or 1, the
probability that H yields 1 is p, and the probability that T yields 1 is 1-p (as if the
agent is playing against an i.i.d. strategy in the game matching pennies), and the
aspiration level is constant at 1/2, then the probability of playing H converges to p. This is
referred to as probability matching. It is not optimal, but at one time psychologists
believed it was characteristic of human behavior; we do not believe that they still do.
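A rough simulation of this probability-matching prediction under one simple specification of aspiration-based reinforcement; the step-size scale and the exact form of the negative reinforcement (shifting probability toward the other action) are assumptions made for illustration, not Borgers and Sarin's precise model.

```python
import numpy as np

rng = np.random.default_rng(2)

p, aspiration, step = 0.7, 0.5, 0.05   # P(H pays 1), aspiration level, reinforcement scale
q = 0.5                                 # current probability of playing H
history = []
for _ in range(20000):
    play_H = rng.random() < q
    win_prob = p if play_H else 1.0 - p
    payoff = float(rng.random() < win_prob)
    beta = step * abs(payoff - aspiration)
    if (payoff > aspiration) == play_H:
        q = (1.0 - beta) * q + beta     # reinforce H: good outcome from H, or bad outcome from T
    else:
        q = (1.0 - beta) * q            # reinforce T: bad outcome from H, or good outcome from T
    history.append(q)
print(np.mean(history[-5000:]))         # hovers near p = 0.7: probability matching
```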
Roth and Er'ev [1994],[1996] develop a closely related but more highly
parameterized variation of the stimulus-response model and use it to study experimental
data on 2-player matrix games with mixed-strategy equilibria. They assume that the
aspiration level follows a linear adjustment process that ensures that the system moves
away from probability matching in the long run, and add a minimum probability on every
strategy. Their motivation is to see how far they can get with a "simple" model
with low cognition.
They have the nice idea of looking at experiments with very long run
lengths. This is important because learning theory suggests that many more trials are
needed for players to learn the distribution of opponents' play when it corresponds to a
mixed strategy than when it corresponds to a pure one.
The Roth and Er'ev model fits the aggregate data (pooled over subjects)
well, but it has a large number of free parameters. Most learning models with enough
flexibility in functional form are going to fit the aggregate data relatively well,
because of its high degree of autocorrelation.
Roth and Er'ev compare their stimulus-response model to a simple
(0-parameter) deterministic fictitious play on individual data, and argue that this
comparison makes a "case for low-rationality game theory." In my opinion, this
case is far from proven: they looked only at deterministic fictitious play, and we would
expect a priori that a deterministic model will not fit the individual data nearly as well
as a stochastic one. (This is acknowledged in Er'ev and Roth [1996], who say that
stochastic modifications of fictitious play are beyond the scope of the paper.) Also,
Er'ev and Roth specify a fictitious play with zero prior weights, which implies that the
players will be very responsive to their first few observations, while actual behavior is
known to be more "sluggish" than that.
In contrast, Cheung and Friedman [1994] have had some success in fitting
modified fictitious play models to experimental data. Their model (in games with 2
actions) has three parameters per player: the probability that a player uses his first
action is a linear function (2 parameters) of the exponentially-weighted history (1 more
parameter for the exponential weight). The median exponential weighting over players and
experiments is about .3 or .4; best response corresponds to weight zero and classic
fictitious play is weight one.
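To indicate the flavor of such a specification, here is a minimal sketch of a three-parameter rule of this general kind: an exponentially weighted history of the opponent's play mapped through a clipped linear function into a choice probability. This is only one possible reading; Cheung and Friedman's exact functional form and estimation procedure may differ, and the parameter values below are arbitrary.

```python
import numpy as np

def cf_style_choice_prob(opponent_history, a, b, w):
    """Probability of playing the first action, as a clipped linear function of an
    exponentially weighted history of the opponent's first-action play.
    w = 0: respond only to the last observation; w = 1: classic fictitious play weighting."""
    weights = np.array([float(w) ** k for k in range(len(opponent_history))])  # most recent gets weight 1
    index = np.dot(weights, opponent_history[::-1]) / weights.sum()
    return float(np.clip(a + b * index, 0.0, 1.0))

# Illustrative history of an opponent's first-action indicator (1 = first action was played).
history = np.array([1, 1, 0, 1, 0, 0, 1], dtype=float)
print(cf_style_choice_prob(history, a=0.9, b=-0.8, w=0.35))
```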
Reflecting on these papers suggests to us that, as a group, theorists and experimenters should develop, characterize, and test models of "intermediate-cognition" procedural rationality; by this we mean models in which players are trying to maximize something in an approximate or loosely interpreted sort of way.
A simple experiment shows that prior information can have an important
impact on the course of play. This is the Prasnikar and Roth [1992] experiment on the
"best-shot" game, in which two players sequentially decide how much to
contribute to a public good. The backwards-induction solution to this game is for player 1
to contribute nothing and player 2 to contribute 4; there is also an imperfect Nash
equilibrium in which player 1 contributes 4 and player 2 contributes nothing. There is no
Nash equilibrium where player 1 mixes.
Prasnikar and Roth ran two treatments of this game. In the first,
players were informed of the function determining opponents' monetary payoffs. Here, by
the last few rounds of the experiment the first movers had stopped contributing. In the
second treatment, subjects were not given any information about the payoffs of their
opponents. In this treatment even in the later rounds of the experiment a substantial
number of player 1's contributed 4 to the public good, while others played 0.
This is not consistent with backwards induction or even Nash
equilibrium, but it is consistent with an (approximate, heterogeneous) self-confirming
equilibrium (Fudenberg and Levine [1996]). These experiments provide evidence that
players use information about other players' payoffs in forming beliefs about how their opponents will play.
Aumann, R. [1987]: "Correlated Equilibrium as an Extension of Bayesian Rationality," Econometrica, 55:1-18.
Benaim, M. and M. Hirsch [1994]: "Learning Processes, Mixed Equilibria and Dynamical Systems Arising from Repeated Games".
Binmore, K. and L. Samuelson [1995]: "Evolutionary Drift and Equilibrium Selection," University College London.
Blackwell, D. [1956]: "Controlled Random Walks," Proceedings International Congress of Mathematicians, III 336-338, Amsterdam, North Holland.
Bjornerstedt, J. and J. Weibull [1995]: "Nash Equilibrium and Evolution by Imitation," in The Rational Foundations of Economic Behavior, (K. Arrow et al, ed.), Macmillan.
Borgers, T. and R. Sarin [1994]: "Learning through Reinforcement and the Replicator Dynamics," mimeo.
Brandenburger, A. and E. Dekel [1987]: "Rationalizability and Correlated Equilibria," Econometrica, 55:1391-1402.
Cheung, Y. and D. Friedman [1994]: "Learning in Evolutionary Games: Some Laboratory Results," Santa Cruz.
Ellison, G. [1995]: "Learning from Personal Experience: On the Large Population Justification of Myopic Learning Models," mimeo, MIT.
Ellison, G. and D. Fudenberg [1993]: "Rules of Thumb for Social Learning," Journal of Political Economy, 101:612-623.
Ellison, G. and D. Fudenberg [1995]: "Word of Mouth Communication and Social Learning," Quarterly Journal of Economics, 110:93-126.
Er'ev, I. and A. Roth [1996]: "On the Need for Low Rationality Cognitive Game Theory: Reinforcement Learning in Experimental Games with Unique Mixed Strategy Equilibria," mimeo, University of Pittsburgh.
Fudenberg, D. and D. Kreps [1990]: "Lectures on Learning and Equilibrium in Strategic-Form Games," CORE Lecture Series.
Fudenberg, D. and D. Kreps [1993]: "Learning Mixed Equilibria," Games and Economic Behavior, 5, 320-367.
Fudenberg, D. and D. K. Levine [1994]: "Consistency and Cautious Fictitious Play," Journal of Economic Dynamics and Control, 19, 1065-1090.
Fudenberg, D. and D. K. Levine [1996]: "Measuring Subjects' Losses in Experimental Games," mimeo, UCLA.
Hannan, J. [1957]: "Approximation to Bayes Risk in Repeated Plays," in M. Dresher, A.W. Tucker and P. Wolfe, eds. Contributions to the Theory of Games, volume 3, 97-139, Princeton University Press.
Hofbauer, J. [1995]: "Stability for the Best Response Dynamic," mimeo.
Hofbauer, J. and J. Weibull [1995]: "Evolutionary Selection against Dominated Strategies," mimeo, Industriens Utredningsinstitut, Stockholm.
Holland, J.H. [1975] Adaptation in Natural and Artificial Systems, Ann Arbor: The University of Michigan Press.
Jordan, J. [1993]: "Three Problems in Learning Mixed-Strategy Nash Equilibria," Games and Economic Behavior, 5, 368-386.
Kalai, E. and E. Lehrer [1993]: "Rational Learning Leads to Nash Equilibrium," Econometrica 61:1019-1045.
Kaniovski, Y. and P. Young [1994]: "Learning Dynamics in Games with Stochastic Perturbations," mimeo.
Kirman, A. [1993]: "Ants, Rationality and Recruitment," Quarterly Journal of Economics, 108, 137-156.
Monderer, D., D. Samet and A. Sela [1994]: "Belief Affirming in Learning Processes," Technion.
Narendra, K. and M. Thatcher [1974]: "Learning Automata: A Survey," IEEE Transactions on Systems, Man and Cybernetics, 4, 889-899.
Prasnikar, V. and A. Roth [1992]: "Considerations of Fairness and Strategy: Experimental Data from Sequential Games," Quarterly Journal of Economics.
Roth, A. and I. Er'ev [1994]: "Learning in Extensive-Form Games: Experimental Data and Simple Dynamic Models in the Intermediate Term," Games and Economic Behavior, 8, 164-212.
Samuelson, L. and J. Zhang [1992]: "Evolutionary Stability in Asymmetric Games," Journal of Economic Theory, 57, 363-391.
Schlag, K. [1995]: "Why Imitate, and if so, How? Exploring a Model of Social Evolution," D.P. B-296, Universität Bonn.
Young, P. [1993]: "The Evolution of Conventions," Econometrica, 61, 57-83.