## What the Meeting is About

Random graphs and randomized algorithms continue to play a very important role in algorithmic research and discrete mathematics. This meeting will be an opportunity to focus on some recent developments and discuss open problems, old and new.

## Tentative Schedule

26 27 28 29 Joel Spencer Uri Zwick Eli Upfal Luca Becchetti Aravind Srinivasan Micah Adler Alan Frieze Leonard Schulman Dana Randall Berthold Vöcking Muthu Muthukrishnan Ashish Goel Eric Vigoda Jirka Matousek Alessandro Panconesi Michael Mitzenmacher Jaikumar Radhakrishnan

## Important Dates

Arrival: Sunday 26 May 2002
Departure: Friday 31 May - Saturday 1 June, 2002

## Location

The meeting will be held in the small medieval hilltop town of Bertinoro. This town is in Emilia Romagna about 50km east of Bologna at an elevation of about 230m.  Here is a map putting it in context. It is easily reached by train and taxi from Bologna and is close to many splendid Italian locations such as Ravenna, a treasure trove of byzantine art and history, and the Republic of San Marino (all within 35km) as well as some less well-known locations like the thermal springs of Fratta Terme and the castle and monastic gardens of Monte Maggio.  Bertinoro can also be a base for visiting some of the better-known Italian locations such as Padua, Ferrara, Vicenza, Venice, Florence and Siena.

Bertinoro itself is picturesque, with many narrow streets and walkways winding around the central peak.  The meeting will be held in a redoubtable ex-Episcopal fortress that has been converted by the University of Bologna into a modern conference center with computing facilities and Internet access.  From the fortress you can enjoy a beautiful the vista that stretches from the Tuscan Apennines to the Adriatic coast.

## List of participants

• Micah Adler, University of Massachussets
• Luca Becchetti, La Sapienza of Rome
• Gianluca De Marco, IMC, CNR, Pisa
• Devdatt Dubhashi, Chalmers University
• Alan Frieze, Carnegie Mellon University
• Ashish Goel, University of Southern California
• Jirka Matousek, Charles University of Prague
• Michael Mitzenmacher, Harvard University
• S. Muthukrishnan, ATT Bell Labs
• Alessandro Panconesi, La Sapienza of Rome
• Marco Pellegrini, IMC, CNR, Pisa
• Jaikumar Radhakrishnan, Tata Institute of Fundamental Research
• Dana Randall, Georgia Institute of Technology
• Leonard Schulman, Caltech
• Joel Spencer, Courant Institute, NYU
• Aravind Srinivasan, University of Maryland
• Eli Upfal, Brown University
• Berthold Vöcking, Max Planck Institut
• Eric Vigoda, University of Chicago
• Uri Zwick, Tel Aviv University

Scientific Organizing Committee Michael Mitzenmacher Harvard University Alessandro Panconesi, University La Sapienza of Rome Aravind Srinivasan University of Maryland, College Park Andrea Bandini, Elena Della Godenza, Centro Congressi di Bertinoro BICI   Bertinoro International Center for Informatics

### ABSTRACTS

Tradeoffs in Probabilistic Packet Marking for IP Traceback
by Micah Adler, University of Massachussets

There has been considerable recent interest in probabilistic packet marking schemes for the problem of tracing a sequence of network packets back to an anonymous source. An important consideration for such schemes is b, the number of packet header bits that need to be allocated to the marking protocol. However, all previous schemes belong to a class of protocols for which b must be at least log(n), where n is the number of bits used to represent the path of the packets. In this talk, we introduce a new marking technique for tracing a sequence of packets sent along the same path. This new technique is effective even when b=1. In other words, the sequence of packets can be traced back to their source using only a single bit in the packet header. With this scheme, the number of packets required to reconstruct the path is O(2^(2n)), but we also show that Omega(2^n) packets are required for any protocol where b=1.
We also study the tradeoff between b and the number of packets required. We provide a protocol and a lower bound that together demonstrate that for the optimal protocol, the number of packets required (roughly) increases exponentially with n, but decreases doubly exponentially with b. The protocol we introduce is simple enough to be useful in practice. We also study the case where the packets are sent along k different paths. For this case, we demonstrate that any protocol must use at least log(2k-2) header bits. We also provide a protocol that requires log(2k+1) header bits in some restricted scenarios (to which the lower bound applies). This protocol introduces a new coding technique that may be of independent interest.

Phase Transitions for Random Processes
by Joel Spencer, Courant Institute

It has been forty years since the discovery by Paul Erd\H{o}s and Alfred R\'enyi that the random graph undergoes a phase transition (they called it: double jump) near $p=\frac{1}{n}$, many small components rapidly melding into the giant component." We now view this as a percolation effect, with classical connections to Mathematical Physics. In recent years a host of random processes coming from the fecund intersection of Discrete Math, Computer Science and Probability have shown strong percolation behavior. A particular set of examples involve what are called Achlioptos processes: each time unit two random edges are generated on $n$ vertices. Precisely one of them must be added to an evolving graph, initially empty. For example, add the first edge if isolated, otherwise add the second. There appears to be an explicit $t_0$ such that near time $t_0\frac{n}{2}$ a giant component suddenly emerges but there are many open questions about the nature of the process near that critical time.

Non-Clairvoyant Scheduling to Minimize the Total Flow Time on Single and Parallel Machines
by Luca Becchetti, La Sapienza

Scheduling a sequence of jobs released over time when the processing time of a job is only known at its completion is a classical problem in CPU scheduling in time sharing operating systems. A widely used measure for the responsiveness of the system is the average flow time of the jobs, i.e. the average time spent by jobs in the system between release and completion.

The Windows NT and the Unix operating system scheduling policies are based on the Multi-level Feedback algorithm. In our work we proved that a randomized version of the Multi-level Feedback algorithm is competitive for single and parallel machine systems, in our opinion providing one theoretical validation of the goodness of an idea that has proven effective in practice along the last two decades.

Joint work with Stefano Leonardi

Probabilistic Analysis of Symmetric Travelling Salesman Problems
by Alan Frieze, CMU

Let the edges of the complete graph $K_n$ be assigned independent uniform $[0,1]$ random edge weights. Let $Z_{TSP}$ and $Z_{2FAC}$ be the weights of the minimum length travelling salesman tour and minimum weight 2-factor respectively. We show that with high probability $|Z_{TSP}-Z_{2FAC}|=o(1)$. The proof is via by the analysis of a polynomial time algorithm that finds a tour only a little longer than $Z_{2FAC}$.

Unique-sink orientations of cubes, a talk in two-parts: Part I
by Emo Welzl, ETH

There is an ongoing attempt of designing a provably fast method for solving Linear Progams and related optimization problems by combinatorial methods in the unit cost model (as opposed to the bit-model, where polynomial methods are known). Among others, this research has brought forth randomized methods with subexponential running time, but also a number of interesting combinatorial models like Abstract Objective Functions (AOF), Unimodal Numberings, LP-type Problems, Abstract Optimization Problems (AOP), Unique Sink Orientations of Cubes (USO), et cetera. After some general introduction we will concentrate on the USO model.

Unique-sink orientations of cubes, a talk in two-parts: Part II
Jirka Matousek, Charles University

The $n$-cube is considered as a graph (with vertex set $\{0,1\}^n$). An unique-sink orientation (USO) is an orientation of the edges of the $n$-cube such that every face of the cube has exactly one sink (directed cycles are allowed). Such orientations arise from several sources, such as linear programs (considering a generic linear function on a polytope isomorphic to the cube), certain linear complementarity problems, and certain convex programs. Algorithms have been studied for finding the global sink in a USO; the USO is specified by an oracle that, given a vertex, returns the orientation of the edges incident to that vertex. Upper and lower bounds for the complexity of such algorithms have recently been given by Welzl, Szab\'o, and Schurr; improving them significantly is the main challenge. The speaker has proved that the number of USO is $2^{\Theta(2^n \log n)}$. The number of acyclic USO is easily seen to be between $2^{\Omega(2^n)}$ and $2^{O(2^n\log n)}$; it would be nice to find better bounds.

Simultaneous Optimization with Concave and Convex Costs
by Ashish Goel, University of Southern California

In simultaneous optimization, we need to simultaneously approximate a whole family of objective functions. In this talk, we will consider two problems:
a) Suppose we want to construct a tree for sending information from a single source to k sinks. Further, suppose that the cost of sending x units of information over a link is proportional to f(x), where f is concave, f(0) = 0, and f is non-decreasing. This problem is known as single source buy-at-bulk network design. We will present a simple randomized algorithm that guarantees an O(log k) approximation to the cheapest tree for all such functions f, simultaneously.
b) Suppose we want to assign jobs to machines, and the cost of an allocation is some symmetric convex function of the loads on individual machines. We will present a general framework for approximating all such functions simultaneously, and apply it to the case when job sizes are random variables.
(b) represents joint work with Adam Meyerson.

Building bipartite graphs with smaller bipartite graphs

We say that a bipartite graph $G=(V,W,E)$ has an independent set of size $k$ if there exist sets $S\subseteq V$ and $T \subseteq W$ of size $k$ each with no edge connecting them. K\H{o}vari, S\'{o}s and Tur\'{a}n proved that if $G$ has no independent set of size $k$ ($n^{\epsilon} \leq k \leq n^{1-\epsilon}$), then it has $\Omega(\frac{n^2}{k}\log n)$ edges. We consider a generalization of this.
Given $n_1,n_2,\ldots,n_r$, we wish to construcuct a bipartite graph $G$ (with vertex sets of size $n$) by putting together copies of $K_{n_i,n_i}$ such that $G$ has no independent set of size $k$. The theorem of K\"{o}vari, S\'{o}s and Tur\'{a}n refers to the situation when all $n_i$ are $1$, and implies the following necessary condition for our generalization: $$\sum_{i} n_i^2 = \Omega(\frac{n^2}{k}\log n). \label{eq:kst}$$ On the other hand, if we put copies of $K_{n_i,n_i}$ together at random, the resulting graph has no independent set of size $k$ with high probability provided $$\sum_{n_i < k} \left(\frac{n_i}{n/k}\right)^2 + \sum_{n_i \geq k} \left(\frac{n_i}{n/k}\right) \geq ck \log n. \label{eq:suff}$$ Note that (\ref{eq:kst}) explains the first term on the left hand side of (\ref{eq:suff}) but not the second. We will show that the sufficient condition (\ref{eq:suff}) is also necessary, and derive from this a lower bound on the size of depth-two superconcentrators.

Approximate distance oracles
by Uri Zwick, Tel Aviv University

Let G=(V,E) be an undirected *weighted* graph with |V|=n and |E|=m. Let k>1 be an integer. We show that G=(V,E) can be preprocessed in O(kmn^{1/k}) expected time, constructing a data structure of size O(kn^{1+1/k}), such that any subsequent *distance query* can be answered, approximately, in O(k) time. The approximate distance returned is of stretch at most 2k-1, i.e., the quotient obtained by dividing the estimated distance by the actual distance lies between 1 and 2k-1. We show that a 1963 girth conjecture of Erdo"s, implies that Omega(n^{1+1/k}) space is needed in the worst case for any real stretch strictly smaller than 2k+1. The space requirement of our algorithm is, therefore, essentially optimal. The most impressive feature of our data structure is its *constant* query time, hence the name oracle''. Previously, data structures that used only O(n^{1+1/k}) space had a query time of Omega(n^{1/k}) and a slightly larger, non-optimal, stretch.
Our algorithms are extremely simple and easy to implement efficiently. They also provide faster constructions of sparse *spanners* of weighted graphs, and improved *tree covers* and *distance labelings* of weighted or unweighted graphs.

Joint work with Mikkel Thorup

Distributions on level-sets with applications to approximation algorithms
by Aravind Srinivasan, University of Maryland - College Park

We consider a family of distributions on fixed-weight vectors of some $t$ bits; these distributions enjoy a natural negative correlation property and also satisfy pre-specified conditions on their marginal distributions. We show the existence of such families, and present a linear-time algorithm to sample from them. We then present applications to approximation algorithms.

Building Low-Diameter P2P Networks
by Eli Upfal, Brown University

In a peer-to-peer (P2P) network, nodes connect into an existing network and participate in providing and availing of services. There is no dichotomy between a central server and distributed clients. Current P2P networks (e.g., Gnutella) are constructed by participants following their own un-coordinated (and often whimsical) protocols; they consequently suffer from frequent network overload and fragmentation into disconnected pieces separated by choke-points with inadequate bandwidth.
In this paper we propose a simple scheme for participants to build P2P networks in a distributed fashion, and prove that it results in connected networks of constant degree and logarithmic diameter. It does so with no global knowledge of all the nodes in the network. In the most common P2P application to date (search), these properties are crucial.
In evaluating the performance of the P2P protocol we focus on the long term behavior of the system in a fully decentralized environment in which nodes arrive and depart in an uncoordinated, and unpredictable fashion. This setting is best modeled by a stochastic, memoryless, continuous-time setting. The arrival of new nodes is modeled by Poisson distribution with rate $\lambda$, and the duration of time a node stays connected to the network has an exponential distribution with parameter $\mu$. Let $G_t$ be the network at time $t$ ($G_0$ has no vertices). We analyze the evolution in time of the stochastic process \${\cal G}=(G_t)_{t\geq 0} (Joint work with G. Pandurangan and P. Raghavan.)

Tight Bounds for Worst-Case Equilibria
by Berthold Vöcking, Max Planck Institut

The central question of this talk is how much does network performance suffer from the lack of regulation? - As a first step toward formalizing this question mathematically, we assume that, in the absence of network regulation, users act in a purely selfish manner. Under this assumption, we can view network users as independent agents participating in a noncooperative game and expect the routes chosen by users to form a Nash equilibrium in the sense of classical game theory. We present some first results from this new and fascinating area of research.

Some general connections between phase transitions and Markov chains
by Eric Vigoda, University of Chicago

We'll examine some general results relating absence of a phase transition in certain statistical physics models with convergence rates of associated Markov chains (known as Glauber dynamics). Recently Cesi (PTRF 2001) proved some general connections using Log-Sobolev inequalities. For the case of attractive systems we present a simpler coupling based proof of his result.

This talk is based on a joint work with M. Dyer, A. Sinclair, and D. Weitz.

Dynamic Models for File Sizes and Double Pareto Distributions
by Michael Mitzenmacher, Harvard University

We introduce and analyze a new generative user model to explain the behavior of file size distributions. Our Recursive Forest File model combines ideas from recent work by Downey with ideas from recent work on random graph models for the Web. Unlike similar previous work, our Recursive Forest File model allows new files to be created and old files to be deleted over time, and our analysis covers problematic issues such as correlation among file sizes. Moreover, our model allows natural variations where files that are copied or modified are more likely to be copied or modified subsequently.

Previous empirical work suggests that file sizes tend to have a lognormal body but a Pareto tail. The Recursive Forest File model explains this behavior, yielding a double Pareto distribution, which has a Pareto tail but close to a lognormal body. We believe the Recursive Forest model may be useful for describing other power law phenomena in computer systems as well as other fields.

by Leonard Schulman, Caltech

I will explain this classical majorization inequality, and discuss the compact, complex extension.

Surfing Streams: Randomized Techniques for Sublinear Algorithms
by S. Muthu Muthukrishnan, Rutgers University and AT&T Research

Say you go fishing in a pond. There are many different types of fish and you catch a bounty. At the end of the day, how can you quickly estimate the fish types you have caught that are only a few in your catch? Suppose now that your colleague goes fishing as well and she also has a bountiful catch. At the end of the day, how do you quickly check if the fish types you have caught are similar or dissimilar? These problems are interesting when you can not remember all the fish types in your catch, and do not have the time to sort through them for answering the questions above.
When you mine stream data from sensors, Internet routers, and other sources, the problems one faces are not unlike the fishing examples above. I will present solutions to these and other problems in data streaming context. Solutions rely on a number of randomized techniques: embeddings, construction of range summable random variables, group testing with algebraic tests, use of approximate min-wish hash functions, etc. Problems become hard when one throws some of the catch back into the pond!
This is joint work with several algorithms and database researchers.

Some Open Problems Concerning Distributed Algorithms
by Alessandro Panconesi, La Sapienza

Distributed graph algorithms is an area where the use of randomization typically leads to simple and fast algorithms, whereas deterministic solutions almost invariably are contrived and exponentially slower. In this talk we shall present an overview of the state of the art and discuss some open problems, the most important of which being whether a separation result might be lurking in the background.

Simulated Tempering: Fast or Slow?
by Dana Randall, Georgia Institute of Technology

Simulated tempering is a compelling Markov chain heuristic used for random sampling when other Markov chains are known to be slow. The idea is to enhance the state space with a parameter modeling temperature, and to allow the temperature to vary during the simulation. At high temperature bottlenecks (which cause slow mixing) disappear, mixing occurs, and lowering the temperature recovers the stationary distribution of interest. The swapping algorithm is a variant of this method.

Recently Madras and Zheng analyzed the swapping algorithm on two bimodal distributions, including the mean-field Ising model, and showed that it is efficient. We generalize their results to asymmetric models and suggest a more general approach to implementing tempering-type algorithms.