Simultaneous Experimentation With More Than 2 Projects

Alejandro Francetich
School of Business, University of Washington Bothell
May 12, 2016

Abstract

A researcher has n > 2 projects she can undertake; one and only one of these projects can succeed, but there is uncertainty about which one will work out. She may experiment on any subset of the n projects over any interval of time. Each additional project undertaken entails a cost, but simultaneous experimentation generates more data. Due to the complexity and intractability of the problem, we cannot hope for a closed-form, complete solution. Instead, we present the numerical solution for the case n = 3. Provided the cost is not too high, or the researcher is sufficiently patient, the optimal research strategy is as follows. If the researcher is sufficiently confident about a given project, she takes on this favored project alone. If she is sufficiently confident about which project will not work out, but not so much about which of the other two will, she takes on the latter two simultaneously. Finally, when she is sufficiently unsure about the projects, she takes on all three at once. Continued failure on a project pushes confidence towards the other projects. But the researcher does not give up on the failing project; rather, she takes on other projects as well, despite the higher costs and knowing that all but one of them are doomed to fail. A conjecture regarding the structure of the optimal strategy for the general problem is provided.

Keywords: Experimentation, two-armed bandits, multi-choice bandits, negatively correlated arms, Poisson process

JEL Classification Numbers: D83, D90

Email address: aletich@uw.edu. This work features research undertaken while I was at the Decision Sciences Department at Bocconi University as a postdoctoral fellow. I am deeply indebted to David Kreps, Alejandro Manelli, Pierpaolo Battigalli, Massimo Marinacci, and Juan Camilo Gomez for their support, guidance, and encouragement. I gratefully acknowledge financial support from ERC advanced grant 324219. Any remaining errors and omissions are all mine.

1 Introduction

Imagine an archipelago in which a treasure ship has sunk. An explorer is after the treasure, which is known to be buried on one of the islands, but not on which one. The explorer can organize an expedition to one island at a time, or she can organize simultaneous expeditions to multiple islands. It is more costly to set up multiple expeditions, and all but one of said expeditions are doomed to fail; but simultaneous expeditions can cover more ground faster.

In standard experimentation problems, decision makers are allowed to experiment on at most one project at a time. In the context of binary problems, Francetich (2016) shows that decision makers benefit from the option to experiment on more than one project simultaneously, even if it is known that only one of the projects is fruitful ex-post. For instance, a research hypothesis may be true or false, and the cause of a disease may be a virus or a bacterium.

But what if there are more than 2 islands in the archipelago? The problem becomes intractable or far too cumbersome. With n projects, the number of possible sets of projects to undertake is $2^n$. Moreover, the state space of the problem, the simplex of posterior beliefs, is multidimensional. Thus, the Bellman equation and pasting conditions involve partial differential equations, and the regions over which the decision maker switches from one set of projects to the other are characterized by surfaces rather than by simple cutoffs.

As a way around these technical and computational limitations, we present the numerical solution to the problem for the case n = 3. [1] Provided the cost is not too high, or the researcher is sufficiently patient, the optimal strategy dictates conducting research as follows. If the researcher is sufficiently confident about a single project, she takes on this favored project alone.
If she is sufficiently confident about which project will not work out, but not so much about which one of the other two will, she takes on the latter two simultaneously. Finally, when she is sufficiently unsure about the projects, she takes on all three at once. Continued failure on a project pushes the researcher's confidence towards the neglected projects. But the researcher does not give up on the failing project; rather, she takes on other projects as well, despite the higher cost of simultaneous research and knowing that all but one of the projects are doomed to fail. For the general problem of n > 2 projects, a conjecture regarding the structure of the optimal strategy is provided.

[1] As another way around said issues, Francetich and Kreps (2014) explore heuristics in a similar problem.

2 The Model

There is a finite set of n projects $X = \{x_0, \ldots, x_{n-1}\}$ on which a decision maker (henceforth, DM) can experiment. The DM allocates her time between the different subsets of X. The set of allocations of a divisible unit of time between the subsets of X is $\mathcal{A} := S^{2^n - 1}$, the $(2^n - 1)$-dimensional simplex. Given a labelling of the subsets of X, $2^X = \{A_j \subseteq X : j = 0, \ldots, 2^n - 1\}$, the j-th component $\alpha_j$ of vector $\alpha \in \mathcal{A}$ denotes the fraction of time spent on $A_j$.

There is a (flow) research cost c > 0 to undertaking each project. Successes yield a gross reward of 1, and they arrive over time for project $i = 0, \ldots, n-1$ according to a Poisson process with arrival rate $\lambda I(\omega = \omega_i)$, where $I(\cdot)$ is the indicator function, $\lambda > c$ is the known arrival rate, and $\omega \in \Omega := \{\omega_0, \ldots, \omega_{n-1}\}$ is the ex-ante unobserved state of nature. In words, it is known that one and only one of these projects is profitable to undertake, and exactly how profitable it is, but there is uncertainty as to which one is the profitable one. Payoffs are discounted at the subjective rate $\rho > 0$.

The DM starts with a prior $\pi_0$ over the states of nature; this prior is a point in $\Pi := S^{n-1}$, the $(n-1)$-dimensional simplex. If $\pi \in \Pi$ represents the beliefs of the DM, her expected immediate payoff from experimenting on subset $A \subseteq X$ for a time interval of length $\Delta > 0$ is:

\[ \lambda \Delta \sum_{i=0}^{n-1} \pi_i I(x_i \in A) - c \Delta \# A. \]

In addition, she observes whether any successes arrive over $\Delta$ for each of the projects $x \in A$. In particular, by working on a single project, she cannot distinguish between the event of an arrival for one of the other projects and the event of failure of arrival altogether.

Let $\pi_t = (\pi_{0,t}, \ldots, \pi_{n-1,t})$ denote the period-t posterior. At any moment, observing an arrival makes the posterior jump to 1 for the successful project and to 0 for the rest. By spending time on all projects, either nothing new is learned, or the model uncertainty is resolved immediately. This is due to the symmetry in arrival rates; the event of failure of arrival is equally likely for all of the projects.
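As a minimal sketch of the objects just defined, the snippet below enumerates the $2^n$ research agendas and evaluates the expected immediate payoff $\lambda\Delta\sum_i \pi_i I(x_i \in A) - c\Delta\#A$. The function names and the belief vector are illustrative assumptions, not part of the paper; the values of $\lambda$, $c$, and $\Delta$ are borrowed from the numerical section.

```python
from itertools import combinations


def research_agendas(n):
    """All 2**n subsets of the project indices 0..n-1, smallest first."""
    return [frozenset(s) for r in range(n + 1)
            for s in combinations(range(n), r)]


def immediate_payoff(pi, agenda, lam, c, dt):
    """Expected payoff lam*dt*sum_{i in A} pi_i - c*dt*#A over an interval dt."""
    return lam * dt * sum(pi[i] for i in agenda) - c * dt * len(agenda)


if __name__ == "__main__":
    pi = [0.5, 0.3, 0.2]  # illustrative beliefs (an assumption)
    for agenda in research_agendas(3):
        value = immediate_payoff(pi, agenda, lam=0.75, c=0.3, dt=0.01)
        print(sorted(agenda), value)
```

The immediate payoff alone ignores the informational value of experimentation, which is why the full problem requires the dynamic analysis that follows.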
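The more interesting dynamics take place when the DM spends time on non-empty proper subsets of X, namely, when she works on some but not all projects. Given $\alpha \in \mathcal{A}$, let $\alpha^i$ denote the fraction of time spent on project i, be it exclusively or as part of a larger set of projects:

\[ \alpha^i = \sum_{j : x_i \in A_j} \alpha_j. \]

If no arrival results over $[t, t + \Delta t)$, Bayes' rule gives the posterior for project $x_i$:

\[ \pi_{i,t+\Delta t} = \frac{\pi_{i,t}\, e^{-\alpha^i \lambda \Delta t}}{\sum_{k=0}^{n-1} \pi_{k,t}\, e^{-\alpha^k \lambda \Delta t}}. \]

As $\Delta t$ shrinks, we obtain:

\[ \dot{\pi}_{i,t} = -\lambda \pi_{i,t} \Big( \alpha^i - \sum_{k=0}^{n-1} \alpha^k \pi_{k,t} \Big). \]

In particular, when the DM works exclusively on a set A, this reduces to $\dot{\pi}_{i,t} = -\lambda \pi_{i,t} \big( 1 - \sum_{k : x_k \in A} \pi_{k,t} \big)$ for each $x_i \in A$. While working unsuccessfully on some but not all of the projects, the DM becomes progressively pessimistic about them and optimistic about the neglected ones. See Figure 1.

The environment is stationary, and the state variable of the problem is the belief of the DM, $\pi \in \Pi$. Let $w : \Pi \to \mathbb{R}$ denote the (optimal, average) value function; w satisfies the Bellman equation that follows.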
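The no-arrival belief update can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author's code: the function names are hypothetical, `alpha_i[k]` stands for the total fraction of time spent on project k, and the update is plain Bayes' rule with no-arrival likelihood $e^{-\alpha^k \lambda \Delta t}$ in state $\omega_k$.

```python
import math


def update_no_arrival(pi, alpha_i, lam, dt):
    """Posterior after an interval dt with no arrival (Bayes' rule)."""
    weights = [p * math.exp(-a * lam * dt) for p, a in zip(pi, alpha_i)]
    total = sum(weights)
    return [w / total for w in weights]


def belief_drift(pi, alpha_i, lam):
    """Limit of (posterior - prior) / dt as dt shrinks to zero."""
    avg = sum(a * p for a, p in zip(alpha_i, pi))
    return [-lam * p * (a - avg) for p, a in zip(pi, alpha_i)]


if __name__ == "__main__":
    pi = [0.5, 0.3, 0.2]           # illustrative beliefs (an assumption)
    only_x0 = [1.0, 0.0, 0.0]      # all time spent on project x_0
    print(update_no_arrival(pi, only_x0, lam=0.75, dt=0.01))
    print(belief_drift(pi, only_x0, lam=0.75))
```

Working on all projects at once (`alpha_i = [1, 1, 1]`) leaves beliefs unchanged, which matches the observation that nothing is learned from a failure of arrival in that case.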

Bellman equation:

\[ w(\pi) = \max_{\alpha \in \mathcal{A}} \left\{ \sum_{j=0}^{2^n - 1} \alpha_j \left( \lambda \Delta \sum_{i=0}^{n-1} \pi_i I(x_i \in A_j) - c \Delta \# A_j + \frac{E_{A_j,\pi}[C(w, \nabla w, \pi)]}{\rho} \right) \right\}, \]

where C is the continuation value of the problem, which depends on the distribution of posteriors and on the value function and its gradient. The optimal strategy is a stationary strategy, recommending an allocation of time $\alpha \in \mathcal{A}$ as a function of the state $\pi \in \Pi$.

Figure 1: Evolution of posteriors when the DM works on the projects in set $A \subseteq X$. Panel (a): DM works on $A = \{x_0\}$. Panel (b): DM works on $A = \{x_0, x_1\}$. The curved lines pointing to the corners represent the jump in the posterior in the event of success. The straight lines represent the gradual updating of beliefs while no successes are observed.

3 Numerical Solution for n = 3

With n = 3 projects, the DM can spend her time on 8 different research agendas: $\emptyset$ (namely, doing no research at all), $\{x_0\}$, $\{x_1\}$, $\{x_2\}$, $\{x_0, x_1\}$, $\{x_0, x_2\}$, $\{x_1, x_2\}$, and X.

To compute the optimal strategy numerically, we transform the control problem into a discrete-time programming problem and adjust the length of the time period. We also discretize the state space. The length of the time period and the fineness of the state space, as well as the arrival rate, the discount rate, and the cost, are the parameters of the problem.

Figure 2 presents the optimal strategy computed using MATLAB, specifying a state space of 200 × 200 points and a length of time of $\Delta = 0.01$. [2] The arrival rate is always $\lambda = 0.75$; the different subfigures depict the optimal strategy for different values of $\rho$ and c. The axes represent the probabilities $\pi_0$ and $\pi_1$, respectively. [3] The different shaded areas of the (lower) triangle represent different recommended subsets of projects given the posteriors. The subsets are color coded as follows: grey = $\emptyset$; blue = $\{x_0\}$; yellow = $\{x_1\}$; red = $\{x_2\}$; green = $\{x_0, x_1\}$; purple = $\{x_0, x_2\}$; orange = $\{x_1, x_2\}$; and white = X.

Along the boundaries, the decision maker splits her time evenly between the subsets recommended on each of the corresponding neighboring regions. She must split her time in this way for the path of posteriors to be well defined. On the interior of these regions, beliefs evolve in different directions; by shifting back and forth, beliefs are pushed in a single direction. [4]

Figure 2a features the optimal strategy when $\rho = 0.1$ and c = 0.3. If the DM is sufficiently confident about a given project, namely if her prior is sufficiently close to one of the corners of the state space, she takes on the favored project alone. When the DM is sufficiently confident about which project will not work out, but not so much about the other two, she takes on the latter two simultaneously. Finally, when she is sufficiently unsure about the projects, she takes on all three at once: information is valuable to her, and the cost of each project is not too high.

Compare with figure 2b, where we have $\rho = 0.1$ but c = 0.65. Now, the DM works on at most two projects at once: she appreciates information, but it is too costly to take on all three projects simultaneously. Finally, figure 2c depicts the case $\rho = 100$ and c = 0.7. The cost is even higher, and the DM is far too impatient to appreciate the information that comes from experimenting. Thus, she takes on a single project if she is sufficiently confident about it, and otherwise gives up altogether.

Figure 3 describes the path of posteriors and the research dynamics under the strategy in figure 2a. Figure 3a reproduces figure 2a in the 2-dimensional simplex. Assume the prior falls in the region where the DM starts working on $x_0$ alone. While working unsuccessfully on it, her posterior starts moving towards the orange region (figure 3b); eventually, she takes on project $x_2$ as well (figure 3c). Continued failure now pushes the posterior gradually in the direction of the (0, 1, 0) corner. When beliefs reach the frontier of the yellow and orange regions, the DM holds on to $x_0$ and splits her time evenly between $x_1$ and $x_2$ (figure 3d). Eventually, if no successes are observed, she becomes sufficiently unsure and takes on the third project as well. At this point, she works on all three projects at once until the winner is identified.

[2] The MATLAB code is available from the author upon request.

[3] The triangles depicted in figure 2 are projections of the 2-dimensional simplex onto the plane $(\pi_0, \pi_1)$. Such projections are easier to compute, and the resulting figures are practically identical.

[4] Such splitting ensures what Klein and Rady (2011) call admissibility of the strategies. For more on admissibility in the binary version of the present problem, see Francetich (2016).

4 Conjecture for the General Case

Based on the analysis of section 3, and given the results in Francetich (2016), we pose the following conjecture for the structure of the optimal strategy for a generic $n \in \mathbb{N}$.
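As an aside on the computations of Section 3, the discretization described there (a discrete-time dynamic program on a gridded belief space) can be sketched in Python. This is not the author's MATLAB code, and several choices are assumptions made only for illustration: a coarse triangular grid with linear interpolation instead of 200 × 200 points, pure agendas only (no time-splitting along boundaries), a per-period success probability of $1 - e^{-\lambda\Delta}$ on the good project, and a post-success continuation value of $\Delta(\lambda - c)/(1 - e^{-\rho\Delta})$ from working on the identified project forever. All function and variable names are hypothetical.

```python
import math
from itertools import combinations


def solve_three_projects(lam=0.75, rho=0.1, c=0.3, dt=0.1, N=12,
                         tol=1e-6, max_iter=20000):
    """Value iteration on a triangular belief grid; returns (V, policy)."""
    agendas = [A for r in range(4) for A in combinations(range(3), r)]
    beta = math.exp(-rho * dt)      # one-period discount factor
    q = math.exp(-lam * dt)         # P(no arrival in dt | working on the good project)
    v_known = dt * (lam - c) / (1.0 - beta)   # value once the good project is known
    nodes = [(i, j) for i in range(N + 1) for j in range(N + 1 - i)]
    V = {s: 0.0 for s in nodes}

    def interp(V, p0, p1):
        # Linear interpolation on the grid triangle containing (p0, p1).
        u, v = p0 * N, p1 * N
        i, j = min(int(u), N), min(int(v), N)
        fu, fv = u - i, v - j
        if fu + fv <= 1.0:
            pts = (((i, j), 1.0 - fu - fv), ((i + 1, j), fu), ((i, j + 1), fv))
        else:
            pts = (((i + 1, j), 1.0 - fv), ((i, j + 1), 1.0 - fu),
                   ((i + 1, j + 1), fu + fv - 1.0))
        # Nodes outside the simplex only ever carry (numerically) zero weight.
        return sum(w * V.get(s, 0.0) for s, w in pts)

    def q_value(V, p, A):
        mass = sum(p[i] for i in A)              # belief mass on the agenda
        flow = dt * (lam * mass - c * len(A))    # expected payoff over dt
        p_succ = (1.0 - q) * mass                # P(an arrival over dt)
        no_succ = 1.0 - p_succ
        post0 = p[0] * (q if 0 in A else 1.0) / no_succ   # Bayes, no arrival
        post1 = p[1] * (q if 1 in A else 1.0) / no_succ
        cont = p_succ * v_known + no_succ * interp(V, post0, post1)
        return flow + beta * cont

    policy = {}
    for _ in range(max_iter):
        new_V = {}
        for (i, j) in nodes:
            p = (i / N, j / N, (N - i - j) / N)
            vals = [q_value(V, p, A) for A in agendas]
            k = max(range(len(agendas)), key=vals.__getitem__)
            new_V[(i, j)], policy[(i, j)] = vals[k], agendas[k]
        gap = max(abs(new_V[s] - V[s]) for s in nodes)
        V = new_V
        if gap < tol:
            break
    return V, policy


if __name__ == "__main__":
    # Impatient, high-cost case (rho = 100, c = 0.7), as in figure 2c.
    V, policy = solve_three_projects(rho=100.0, c=0.7, dt=0.01, N=12, tol=1e-12)
    print("corner (1,0,0):", policy[(12, 0)], " center:", policy[(4, 4)])
```

Grid node `(i, j)` stands for the belief $(i/N, j/N, 1 - (i+j)/N)$, so `N` should be a multiple of 3 if the uniform belief is to be a grid point. Because the interpolation is linear and the grid is coarse, region boundaries from this sketch should be read as rough approximations only.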

Figure 2: Optimal strategy for n = 3. Panel (a): optimal strategy for $\rho = 0.1$ and c = 0.3. Panel (b): optimal strategy for $\rho = 0.1$ and c = 0.65. Panel (c): optimal strategy for $\rho = 100$ and c = 0.7.

Conjecture. Partition the parameter space into n + 1 different regions labeled $k = 0, 1, \ldots, n$. For parameters in region k, the optimal strategy recommends undertaking up to only k out of the n projects. On region 0, the optimal strategy recommends never doing any research. For parameters in region 1, the state space $\Pi$ is partitioned into up to 4 regions; the 3 outer regions represent the sets of beliefs where a single project is recommended, while the (possibly empty) inner region dictates when the DM should give up. For parameters in region $k = 2, \ldots, n$, the state space is partitioned into $\sum_{j=1}^{k} \binom{n}{j}$ regions. On each of these, a j-subset of X is recommended, for $j = 1, \ldots, k$. Singletons are recommended in the neighborhood of the corners of the simplex. In the neighborhood of points with j equal non-zero entries and $n - j$ zero entries, j-sets are recommended. For parameters in region n, the full set X is recommended in the neighborhood of the point $\pi = (1/n, \ldots, 1/n)$.
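The region count in the conjecture, $\sum_{j=1}^{k} \binom{n}{j}$, is easy to tabulate. The helper below is a small illustrative sketch (the function name is an assumption); it covers parameter regions $k = 2, \ldots, n$ only, since the conjecture treats regions 0 and 1 separately.

```python
from math import comb


def belief_regions(n, k):
    """Number of belief regions sum_{j=1}^{k} C(n, j) for parameter region k."""
    if not 2 <= k <= n:
        raise ValueError("the formula applies to k = 2, ..., n")
    return sum(comb(n, j) for j in range(1, k + 1))


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, [belief_regions(n, k) for k in range(2, n + 1)])
```

For n = 3 and k = 3 the count is 7, one region for each of the seven non-empty research agendas listed in Section 3. The count grows towards $2^n - 1$ as k approaches n, which is one way to see why the general problem is computationally demanding.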

Figure 3: Belief and research dynamics under the optimal strategy for n = 3. Panel (a): optimal strategy from figure 2a. Panel (b): the DM takes on project $x_0$ alone. Panel (c): the DM takes on both $x_0$ and $x_2$. Panel (d): the DM keeps working on $x_0$ while alternating between $x_2$ and $x_1$; eventually, if no successes occur, she takes on all three projects. The arrows pointing to the corners represent the jump in the posterior in the event of success. The lines pointing inward represent the gradual updating of beliefs as the DM works unsuccessfully.

On the boundaries of the different regions of the state space, the optimal strategy recommends splitting time equally between the subsets recommended on each of the neighboring regions.

This strategy dictates conducting research as follows. Depending on the cost and discount-rate ranges, the DM takes on at most k out of the total n projects at a time. If the posterior falls sufficiently near a corner of the simplex, the DM is sufficiently confident about the corresponding project and focuses on it. For posteriors on or near the faces of the simplex, but sufficiently far from the corners, the DM takes on multiple projects at once, namely the ones corresponding to the given face of the simplex. If the cost and discount rate fall in region n, namely if the cost is sufficiently low or the DM is sufficiently patient, this strategy dictates working on all projects at once when the DM is sufficiently unsure about the projects, even though simultaneous research is more expensive and ultimately only one project can succeed.

The dynamics of beliefs and research when parameters fall in region n are as follows. In the neighborhood of the corners of the belief simplex, the optimal strategy recommends focusing on the favored project. Thus, if the DM is sufficiently confident in a project, she starts working on it exclusively. As long as she does not encounter a success, she becomes gradually pessimistic about this project and gradually optimistic about the neglected ones. As her posterior moves inward in the state space, she progressively takes on additional projects one at a time (possibly alternating between sets of projects along the boundaries), without abandoning the failing project. Eventually, she takes on all of the projects simultaneously until the fruitful one is identified.

References

Francetich, A. (2016). Managing multiple research projects. Working Paper.

Francetich, A. and Kreps, D. (2014). Choosing a good toolkit: An essay in behavioral economics. Working Paper.

Klein, N. and Rady, S. (2011). Negatively correlated bandits. Review of Economic Studies 78:693-732.