• Nebyly nalezeny žádné výsledky

LucDevroye AndrásGyörgy GáborLugosi FredericUdina High-dimensionalrandomgeometricgraphsandtheircliquenumber

N/A
N/A
Protected

Academic year: 2022

Podíl "LucDevroye AndrásGyörgy GáborLugosi FredericUdina High-dimensionalrandomgeometricgraphsandtheircliquenumber"

Copied!
28
0
0

Načítání.... (zobrazit plný text nyní)

Fulltext

(1)

El e c t ro nic

Journ a l of

Pr

ob a b il i t y

Vol. 16 (2011), Paper no. 90, pages 2481–2508.

Journal URL

http://www.math.washington.edu/~ejpecp/

High-dimensional random geometric graphs and their clique number

Luc Devroye András György Gábor Lugosi Frederic Udina

Abstract

We study the behavior of random geometric graphs in high dimensions. We show that as the di- mension grows, the graph becomes similar to an Erd˝os-Rényi random graph. We pay particular attention to the clique number of such graphs and show that it is very close to that of the corre- sponding Erd˝os-Rényi graph when the dimension is larger than log3nwherenis the number of vertices. The problem is motivated by a statistical problem of testing dependencies..

Key words:Clique number; dependency testing; geometric graphs; random graphs.

AMS 2010 Subject Classification:Primary 05C80; 62H15.

Submitted to EJP on October 1, 2010, final version accepted November 29, 2011.

School of Computer Science, McGill University, 3480 University Street Montreal, Canada H3A 2A7 (email: lucde- vroye@gmail.com). Supported by NSERC.

Machine Learning Research Group, Computer and Automation Research Institute of the Hungarian Academy of Sci- ences, Kende u. 13-17, 1111 Budapest, Hungary (email: gya@szit.bme.hu). Partially supported by the National Devel- opment Agency of Hungary from the Research and Technological Innovation Fund (KTIA-OTKA CNK 77782), and by the PASCAL2 Network of Excellence (EC grant no. 216886).

Department of Economics, Pompeu Fabra University, Ramon Trias Fargas 25-27, 08005, Barcelona, Spain (email: ga- bor.lugosi@gmail.com and frederic.udina@gmail.com). G. Lugosi is also with ICREA. Supported by the Spanish Ministry of Science and Technology grant MTM2009-09063 and by the PASCAL Network of Excellence under EC grant no. 216886.

(2)

1 Introduction

Arandom geometric graphis defined bynindependent random points taking values inRd, drawn from the same distribution. These points correspond to the vertices of the graph and two of them are joined by an edge if and only if their Euclidean distance is less than a certain threshold. Such random geometric graphs have been studied extensively and many of their basic properties are now well understood. We refer to Penrose[16]for an extensive treatment. These graphs are usually studied in an asymptotic framework when the numbernof vertices is very large (it grows to infinity) while the dimensiond is held fixed. However, in some applications it is of interest to consider situations when the dimension is large. In such cases the graph is expected to behave differently. In this paper we consider random geometric graphs defined by nindependent vectors uniformly distributed on the surface of the unit ball inRd. We show that ifd → ∞whilenis held fixed, the random graph becomes, in a very strong sense, similar to an Erd˝os-Rényi random graph. Motivated by a hypothesis testing problem, we pay particular attention to the clique number of such random geometric graphs.

We show that if d is at least of the order of log3n, then the clique number is essentially the same as that of the corresponding Erd˝os-Rényi random graph. This is in sharp contrast to the behavior of the clique number when the dimension is fixed.

The paper is organized as follows. In Section 2 the basic model is described and the asymptotic equivalence of the random geometric graph and the Erd˝os-Rényi random graph is presented (The- orem 2). In Section 3 the main results of the paper are stated and proved on the behavior of the clique number of high-dimensional random geometric graphs. In Section 4 some numerical experi- ments are reported in which the behavior of the clique number is illustrated. In Section 5 we show a statistical application that motivated our research. We describe a hypothesis testing problem arising in applications of remote sensing and finance and propose a test based on computing the clique number of random geometric graphs. Finally, the Appendix contains some of the proofs of results announced in Sections 3 and 5.

2 Notation, set-up

Denote the unit sphere inRd by Sd−1 ={x ∈Rd :kxk= 1}where k · k stands for the Euclidean norm. LetX1, . . . ,Xnbe independent vector-valued random variables, uniformly distributed inSd1. We denote the components of the random vector Xi by Xi = (Xi,1, . . . ,Xi,d). For a given value of p ∈ (0, 1) (possibly depending on n and d) we define the random geometric graph G(n,d,p) as follows: the graph hasnvertexes labeled by 1, . . . ,nand vertexiand vertex jare connected by an edge if an only if

€Xi,XjŠ

tp,d ,

where(x,y)denotes the inner product of the vectorsx and yand tp,d is determined such that the probability of each edge equalsp, that is,

P¦€Xi,XjŠ

tp,d©

=p.

Equivalently, vertexiand vertex j are connected if and only ifkXiXjk ≤p

2(1−tp,d).

For example, for p = 1/2, tp,d = 0. To understand the behavior of tp,d as a function of p, we introduce some notation. Letµd−1 denote the uniform probability measure over Sd−1. For a unit

(3)

t 1−t u Cd1(u, t)

Figure 1: A spherical cap of height 1−t.

vectoruSd−1 and real number 0≤t≤1, letCd−1(u,t) ={x∈Rd:xSd−1,(x,u)≥t}denote a spherical cap of height 1−t aroundu (see Figure 1). Theangleof a spherical cap Cd1(u,t) is defined by arccos(t).

Then p = µd1(Cd1(e,tp,d)) is the normalized surface area of a spherical cap of height 1−tp,d centered at (say) the first standard basis vectore= (1, 0, 0, . . . , 0). The following estimates for the measure of a spherical cap will be used (see Brieden et al.[6]): forp

2/dtp,d≤1, 1

6tp,dp

d(1−t2p,d)d21p≤ 1 2tp,dp

d(1−t2p,d)d21 . (1) These bounds show that ifpis fixed andd is large,tp,d is of the order of 1/p

d.

Sometimes it is useful to think about random points onSd−1 as projections of Gaussian vectors on the unit sphere. In particular, letZ1, . . . ,Zn be independent standard normal vectors (i.e.,Zi has mean0= (0, . . . , 0)and unit covariance matrix). Then the vectors

X1= Z1

kZ1k, . . . ,Xn= Zn kZnk

are independent and uniformly distributed onSd1. This representation will be used in some proofs.

For example, this representation may be used to determine the asymptotic value of tp,d. Let Z = (Z1, . . . ,Zd)be a standard Gaussian vector and letX=Z/kZk= (X1, . . . ,Xd). Observe thatEkZk2= d. Also, by the law of large numbers,kZk/p

d→1 in probability. This implies thatX1p

d converges, in distribution, to a standard normal random variable. In fact, for any fixedk, the joint distribution ofp

d(X1, . . . ,Xk)is asymptotically standard normal. One consequence of this is that for anys>0, µd1(Cd1(e,s/p

d)) =P{X1>s/p

d}=P{Z1/kZk>s/p

d} →1−Φ(s)

(4)

as d → ∞where Φ(x) = (2π)1/2Rx

−∞e−t2/2d t is the standard normal distribution function. This implies thattp,d satisfies, for any fixed p∈(0, 1),

d→∞lim tp,dp

d= Φ1(1−p). (2)

Later we will need a quantitative estimate of the rate of convergence. Such a bound is given by the next lemma, proved in Appendix C.

Lemma 1. Assume0<p≤1/2and d≥max¦

(2/p)2, 27© . Then

|tp,d p

d−Φ−1(1−p)| ≤Up,d , where

Up,d=κp

plnd/d+κ0p/p d . withκp=2p

1(1−p)andκ0p=2p

2πe(Φ1(1−p/2))2/2.

One of the main messages of this paper is that the random geometric graph G(n,d,p) defined above behaves like an Erd˝os-Rényi random graph whend is large. An Erd˝os-Rényi random graph G(n,p) is defined as a graph on nvertices such that any pair of vertices is connected by an edge with probability p and all edges are present independently. TheG(n,p)random graph model was introduced by Erd˝os and Rényi[10]and most of its properties are well understood – see Bollobás [5], Palmer[15], Janson, Łuczak, and Ruci´nski[11]for monographs dedicated to the subject.

First we point out that asymptotically (i.e., as d → ∞), the random geometric graph G(n,d,p) converges to G(n,p) in total variation distance. However, our proof only implies a small total variation distance for astronomically large values ofd. Certain characteristics ofG(n,d,p)resemble those of G(n,p)for moderate values of d. In Section 3 we show that when(log3n)/d =o(1), the clique numbers of the two random graphs behave quite similarly.

The next theorem states that the distribution of the random geometric graphG(n,d,p)converges to that of the Erd˝os-Rényi random graphG(n,p)in total variation distance. The total variation distance between two random graphsG andG0defined on the same set of vertices (say{1, . . . ,n}) is defined by

dT V(G,G0) =max

G |P{G∈ G } −P{G0∈ G }|=1 2

X

g

|P{G=g} −P{G0= g}|,

where the maximum is taken over all 22(

n 2)

setsG of graphs over nvertices and the sum is taken over all such graphs.

Theorem 2. Fix a positive integer n and0≤p≤1. Then

dlim→∞dT V(G(n,d,p),G(n,p)) =0 .

The proof, given in Appendix A, is based on a relatively straightforward application of the multivari- ate central limit theorem.

Theorem 2 shows that, asymptotically, the random geometric graph behaves like an ordinary random graph. However, by the bounds provided by the proof, astronomical values ofdare required to make

(5)

the total variation distance small. (Just note that the total variation distance is the sum over all 2(n2) possible graphs g and therefored needs to be much bigger than 2n2 in order to make the obtained bound meaningful.) For this reason, the interest in Theorem 2 is purely theoretical.

On the other hand, the notion by which Theorem 2 relates the random geometric graph to an ordinary random graph is very strong. If one is interested in simple characteristics ofG(n,d,p), it may behave as that ofG(n,p)for much smaller values ofd. The main result of the paper, presented in the next section, shows that if d is poly-logarithmic in n, then the clique number of G(n,d,p) already behaves very similarly to that of G(n,p). At the same time, for values of d significantly smaller than logn, the clique number behaves very differently.

In this paper we study the (random) clique numberω(n,d,p)ofG(n,d,p), that is, the number of vertices in the largest clique contained inG(n,d,p). It is well-known (see, e.g., Bollobás [5]) that the clique number of the Erd˝os-Rényi random graphG(n,p)is, with probability converging to one, within a constant of 2 log1/pn−2 log1/plog1/pnwhen p is held fixed as ngrows. This is in sharp contrast with the behavior ofω(n,d,p)for small values of d. It is easy to see that for any fixedd, the clique number grows linearlywith n and even for d = εlogn, for sufficiently small values of ε >0,ω(n,d,p)grows asnαwhereα→1 asε→0 (see Proposition 4 below).

Theorem 2 implies that, for very large values ofd,ω(n,d,p)behaves similarly to the clique number ofG(n,p). The more interesting question is how larged needs to be. The main result of the paper (Theorem 3) establishes that when d is about log3n, the behavior of the clique number is already similar to that of G(n,p)(for fixed p). This result is complemented by Theorem 5 which implies that ford∼(3 logn)2, we haveω(n,d,p) =Op(log3n).

3 The clique number of G ( n, d , p )

The following result describes the behavior of the clique number of the random geometric graph G(n,d,p)for large values ofd.

Theorem 3. Fix p≤1/2and define the positive constant p0=p0(p) =

¨ 1/2 if p=1/2 1−Φ(2Φ1(1−p) +2.5) if p<1/2.

Letδn∈(0,p)and suppose

d=dnκbp

δ2n log31/(p−δ

n)n

where bκp = 65 ln2(1/p0). If either δn → 0 or δnδ for some constant 0 < δ < p, then, with probability converging to1(as n→ ∞),

ω(n,d,p)≤2 log1/(p+δ

n)n−2 log1/(p+δ

n)log1/(p+δ

n)n+O(1). Also, iflim supn→∞δnlog2n<, then with probability converging to1,

ω(n,d,p)≥2 log1/(p−δ

n)n−2 log1/(p−δ

n)log1/(p−δ

n)n+ Ω(1).

(6)

Observe that the theorem implies that if d is about log3n thenω(n,d,p) is already of the order of logn. This is obtained by choosing δn as a constant. By letting δn go to zero slowly, we see that if(log3n)/d =o(1)thenω(n,d,p)≤(2+o(1))log1/pn. Finally, by takingδn ∼1/logn, we obtain that whend∼log5nthenω(n,d,p)≤2 log1/pn−2 log1/plog1/pn+O(1)and therefore the clique number is at most as large as in an Erd˝os-Rényi random graph, up to an additive constant.

For the lower bound, we need the extra condition thatδn = O(1/log2n) and therefore the lower bound is only meaningful ford at least of the order of log7n. We believe that this condition is not necessary. In fact, we conjecture that for fixed d and p, the clique number is non-increasing in d (in the stochastic sense thatP{ω(n,d,p)≥k}is non-increasing for each k). If this conjecture was true that the lower bound would hold without any condition for d simply because, as d → ∞, by Theorem 2,ω(n,d,p)converges, in distribution, to the clique number of the Erd˝os-Rényi random graph.

The theorem follows from Theorems 8 and 9 (together with the observation that p0 ≤min(bp,ep)) which are shown in Section 3.2. Before turning to the proof we note that for small values of d, ω(n,d,p)behaves in an entirely different way, as the next simple proposition shows.

Proposition 4. If d≥8,

Eω(n,d,p)≥ n 3p

d(1+t2p,d)

 1−

1+t2p,d

2

4

d1 2

.

The proposition follows simply by observing that if k points fall in any spherical cap C of angle arccos(tp,d)/2 that is, a spherical cap of height 1−(1+t2p,d)/2, then they are mutually connected and therefore form a clique. The expected number of points that fall in any such fixed cap C is d1(C)which, by (1) is at least

n 1

3p

d(1+t2p,d)

 1−

1+t2p,d2

4

d−1 2

provided p

2/d ≤ (1+t2p,d)/2. This lower bound may be improved by packing as many non- overlapping spherical caps of height 1−(1+t2p,d)/2 inSd1 as possible and considering the one containing the largest number of points. Even though the number of such caps is exponentially large ind, the refined bound is not significantly better than the one obtained above. The negligible benefit does not justify the inclusion of the more technical details.

On the one hand, Proposition 4 above shows that ifd logn, the clique number grows linearly, or almost linearly, with nwhile according to Theorem 3, ifd is at least of the order of log3n, the clique number is logarithmic inn. The next result, using very different techniques, shows that when d ∼log2n, then the clique number is already poly-logarithmic inn, at least for p<1/2. The proof is given in Appendix B.

Theorem 5. For any p<1/2and0< η <1, the clique numberω(n,d,p)of the random geometric graph G(n,d,p)satisfies, with probability at least1−η,

ω(n,d,p)n

È d+1

d(d tp,d+1)exp

‚−(d−1)(d tp,d+1) 2(d+1)

Œ

+4(d+1)ln 2ne

d+1+4 ln4 η .

(7)

To interpret this bound, recall that forp<1/2 fixed andd large,tp,dd1/2Φ1(1−p). Thus, the upper bound is of the ordernd−1/2exp(−d1/2/2) +dlog(n/d). Thus, whend∼(3 logn)2, we have ω(n,d,p) =Op(log3n). Notice also that as soon asd→ ∞,ω(n,d,p) =op(n).

3.1 The expected number of cliques

The proof of Theorem 3 is based on the first and second moment methods (see, e.g., Alon and Spencer[2]). To this end, first we need to study the expected number of cliques of size k in the random geometric graphG(n,d,p). In particular, we compare it to the expected number of cliques of the same size inG(n,p)which is

n k

p(k2) .

Denote the (random) number of cliques of sizekbyNk=Nk(n,d,p). But ENk=

n k

P{X1, . . . ,Xk form a clique}

and therefore it suffices to study the probability thatkpoints are all connected with each other. Let pk=pk(d,p) =P{X1, . . . ,Xkform a clique}denote this probability.

The cases whenp=1/2 and p<1/2 are slightly different and we treat them separately.

Theorem 6. (UPPER BOUND FOR THE EXPECTED NUMBER OF CLIQUES.) Let K≥2be a positive integer, letδn>0, and define

bp=bp(p) =1−Φ(tp,d pd) Assume

d

8(K+1)2ln1

bp

δ2n

Kln4

bp+lnK−1 2

. Then, for any1≤kK,

ENk(n,d, 1/2)≤e n

k

Φ(δn)(k2) . Furthermore, for p<1/2, defineβ =2p

ln(4/bp)and forβp

K/d<1, letα

1−βp

K/d. Then for any0< δn< αtp,d

pd we have, for any1≤kK,

ENk(n,d,p)e1/

p2

n k



1−ց αtp,d

p

dδn‹‹(k2) .

Remark. Note that (2) implies that asα→1 andδn→0, 1−Φ€ αtp,dp

dδn

Š→p.

Proof. Fix akK. We use the Gaussian representation of the Xi described in Section 2. That is, we write Xi =Zi/kZikwhereZ1, . . . ,Zn are independent standard normal vectors inRd. First we perform Gram-Schmidt orthogonalization forZk−1 1=Z1, . . . ,Zk−1. In other words, let

v1= Z1 kZ1k

(8)

and definer1=0(thed-dimensional zero vector). For j=2, . . . ,k−1, introduce, recursively, rj=

j−1

X

i=1

(Zj,vi)vi and vj= Zjrj kZjrjk . Thenv1, . . . ,vk1are orthonormal vectors, depending onZk11 only.

First we treat the casep<1/2. Introduce the “bad” event Bk−1=

jk−1 :krjk2>2(k+1)2ln(1/bp)or∃jk−1 :kZjk2< d 2

and write

pk ≤ P{X1, . . . ,Xkform a clique,Bkc1}+P{Bk1}

= E

– P

¨‚ Zk kZkk, Zj

kZjk

Œ

tp,d for all jk−1|Zk−11

«

I{X1,...,Xk−1form a clique}I{Bk−c 1}

™

+P{Bk−1}. (3)

Now fixZk11 such thatX1, . . . ,Xk−1 form a clique andBk−1does not occur. Then, for anyδn>0, P

¨‚ Zk kZkk, Zj

kZjk

Œ

tp,d for all jk−1|Zk11

«

= P

¨‚

Zk, rj

kZjk+kZjrjk kZjk vj

Œ

tp,dkZkkfor all jk−1|Zk11

«

≤ P

¨‚

Zk,kZjrjk kZjk vj

Œ

tp,dkZkk −δnfor all jk−1|Z1k−1

«

+

k−1

X

j=1

P

¨‚

Zk, rj kZjk

Œ

δn|Zk11

«

. (4)

For any fixed 1≤ jk−1 andδn>0, P

¨‚

Zk, rj kZjk

Œ

δn|Zk11

«

≤ PZk,rjŠ

> δnp

d/2|Zk11 o

≤ 1 2e

δ2 n d 4krjk2

≤ 1 2e

8(k+1)2 ln 1δ2n d

bp , (5)

where we used the fact that, conditionally onZ1k1, (Zk,rj)has centered normal distribution with variancekrjk2≤2(k+1)2ln(1/bp). Furthermore, onBk−c 1, for any 0< α <1, ifαtp,d

pd> δn then

P

¨‚

Zk,kZjrjk kZjk vj

Œ

tp,dkZkk −δn for all jk−1|Zk−11

«

≤ P

§€ Zk,vjŠ

tp,dαp

dδn for all jk−1|Zk−11

ª+P{kZkk< αp

d} (6)



1−ց αtp,dp

dδn

‹‹k−1

+e(1−α)

2d

4 , (7)

(9)

where we used the fact that by rotational invariance of the multivariate standard normal distribu- tion, the (Zk,v1), . . . ,(Zk,vk1) are independent standard normal random variables, and the last term follows from the standard tail bound on theχ2distribution

P{χd2<d−2p

d t} ≤e−t (8)

witht= (1−α2)2d/4, whereχd2denotes a random variable withχ2distribution withd degrees of freedom (see, e.g., Massart[13]). Therefore, the first term in (3) can be bounded as

E

– P

¨‚ Zk kZkk, Zj

kZjk

Œ

tp,d for all jk−1|Z1k1

«

I{X1,...,Xk1form a clique}I{Bck

1}

™

pk1

‚

1−ց αtp,d

p dδn

‹‹k1

+e(1−α

2)2d

4 + k−1

2 e

δ2 n d 8(k+1)2 ln(1/bp)

Œ

. (9)

Using the definition ofα, the second term above may be bounded, by e(1−α

2)2d

4

bp 4

K

. The last term in (9) can also be bounded by(bp/4)K using

δ2n

8(k+1)2ln1

bp

d

Kln4

bp +lnk−1 2

. Thus, (9) is bounded from above as

pk−1

‚

1−ց

αtp,dp

dδn‹‹k−1

+e(

1−α2)2d

4 +k−1

2 e

δ2 n d 8(k+1)2 ln(1/bp)

Œ

pk−1

‚

1−ց

αtp,dp dδn

‹‹k1

+2

bp 4

KŒ

pk−1€

1+23K+kŠ

1−ց

αtp,dp

dδn‹‹k−1

, (10)

where we used the fact thatbp≤1−Φ€ αtp,d

pdδn

Š<1/2 (asαtp,d

pd> δnby our assumptions).

We may bound the probability of the “bad” event as follows.

P{Bk−1} ≤ P

jk−1 :krjk2>2(k+1)2ln1 bp

+P

jk−1 :kZjk2< d 2

≤ (k−1)P

χk−2 1>2(k+1)2ln1 bp

+ (k−1)P

χd2< d 2

.

Here the second term can be bounded by using the tail inequality (8) with t =d/16, which yields P{χd2<d/2} ≤ed/16. The first term can be bounded using the standard tail bound

P{χl2l>2t+2p

l t} ≤et (11)

(10)

(see[13]) with

t=

q

4(k+1)2ln1

bp)−p k−1

‹2

4 = (k+1)2ln1

bp− q

(k−1)

4(k+1)2ln1

bp)−k+1 2

andl =k−1, which implies

P{χk−12 >2(k+1)2ln(1/bp)} ≤e2(k+1)2ln(1/bp)/4=bp(k+1)2/2. Thus

P{Bk1} ≤(k−1)

bp(k+1)2/2+ed/16

. (12)

If, in addition,d≥8(k+1)2ln(1/bp), we obtain

P{Bk1} ≤2(k−1)bp(k+1)2/2, (13) and so, by (3), (9) and (10) we have

pkpk−1€

1+23K+kŠ

1−ց

αtp,dp dδn

‹‹k−1

+2(k−1)bp(k+1)2/2. (14) Next we show that

pk



1−ց αtp,d

p dδn

‹‹(k2)k−Y1

j=1

(1+2−j−1/2) (15)

which finishes the proof of the theorem forp<1/2 sinceQk

j=1(1+2j1/2)≤e

Pk

j=12j1/2<e1/sqr t2. We proceed by induction. (15) trivially holds fork=1. Assuming it holds fork−1 for somek≥2, from (14) we obtain

pk



1−ց

αtp,dp dδn

‹‹(k21)

k2

Y

j=1

(1+2−j−1/2)

€1+23K+kŠ

1−ց αtp,dp

dδn

‹‹k1

+2(k−1)bp(k+1)2/2



1−ց αtp,d

p dδn

‹‹(k2)

k−2

Y

j=1

(1+2j1/2)

1+23K+k+2(k−1)23k+21



1−ց αtp,d

p dδn

‹‹(k2)Yk1

j=1

(1+2j−1/2)

where we used thatbp≤1−Φ€ αtp,dp

dδn

Š <1/2 (asαtp,dp

d > δn by our assumptions) and that 2−3K+k+2(k−1)23k+12 <2−k+1/2forK≥2 since 2(k−1)2−k/2≤3/2 for allk. This completes the proof of (15), and hence that of the theorem forp<1/2.

Forp=1/2, we need the following modifications. Bk−1 is now defined as Bk1=

jk−1 :krjk2>2(k+1)2ln 2 or ∃jk−1 :kZjrjk2< d 2

.

(11)

Then (3) still holds, but instead of (4) we write P

¨‚ Zk kZkk, Zj

kZjk

Œ

≥0 for all jk−1|Zk11

«

≤ P¦€Zk,vjŠ

≥ −δn for all jk−1|Zk11© +

k−1X

j=1

P

¨‚

Zk, rj kZjrjk

Œ

> δn|Zk11

« .

From here, similarly to (5) and (7), the following analog of (9) can be obtained:

E h

P¦€Xk,XjŠ

tp,d for all jk−1|Zk11©

I{X1,...,Xk1form a clique}I{Bkc

1}

i

pk1

‚

1−Φ −δn

k1

+k−1 2 e

δ2 n d 8(k+1)2 ln 2

Π.

As the bound (12) remains valid for the redefined Bk1 (with bp=1/2), the proof may be finished as before ford≥8(k+1)2ln 2 and

δ2n≥ 8(K+1)2ln 2 d

Kln 4

Φ(δn)+lnK−1 2

.

Theorem 7. (LOWER BOUND FOR THE EXPECTED NUMBER OF CLIQUES.)Introduce

ep=ep(p) =

(1/2 if p=1/2;

1−Φ€ 2tp,dp

d+1Š

if p<1/2;

and letδn∈(0, 2/3]and K≥3. Assume

d> 8(K+1)2ln1

ep

δ2n

Kln4

ep+ln(K−1) 2

. (16)

Then, for any1≤kK,

ENk(n,d, 1/2)≥ 4 5

n k

1−Φ(δn)(k2) . For p<1/2, defineα >0as

α2=1+ r8K

d ln4 ep . Then

ENk(n,d,p)≥ 4 5

n k

€1−ΦeK(d,p)Š(k2) , (17)

whereΦeK(d,p) = Φ αtp,d

pdn

q

1−2(K+1)2 ln(1/ep) d

! .

(12)

Proof. The proof is a simple variant of the previous theorem, and we use the notation introduced there. Fix akK. Define the “bad” eventBek1as

Bek1=

jk−1 :krjk2>2(k+1)2ln(1/ep)or ∃jk−1 :kZjrjk2< d 2

. (18)

Then

pk ≥ P{X1, . . . ,Xkform a clique ,eBkc1} (19)

= E

– P

¨‚ Zk kZkk, Zj

kZjk

Œ

tp,d for all jk−1|Z1k−1

«

I{X1,...,Xk−1form a clique}I{eBc

k1}

™

FixZ1, . . . ,Zk−1 such that they form a clique andBek−1does not occur. Then P

¨‚ Zk kZkk, Zj

kZjk

Œ

tp,d for all jk−1|Zk−1 1

«

≥ P

¨‚

Zk,kZjrjk kZjk vj

Œ

tp,dkZkk+δnfor all jk−1|Zk11

«

k−1X

j=1

P

¨‚

Zk, rj

kZjk≤ −δn

Œ

|Zk11

«

. (20)

Now for anyα >1, the first term can be bounded as P

¨‚

Zk,kZjrjk kZjk vj

Œ

tp,dkZkk+δnfor all jk−1|Zk11

«

≥ P

Zk, r

1−2(k+1)2ln(1/ep)

d vj

≥tp,dkZkk+δnfor all jk−1|Zk11

≥ P

€Zk,vjŠ

αtp,d

pd+δn q

1−2(k+1)2dln(1/ep)

for all jk−1|Z1k1

−P

§

kZkk> αp d

ª

≥ €

1−Φek(d,p)Šk−1

e

d

4·2α21)2 , (21)

where the first inequality holds since rj andZjrj are orthogonal andkrjk2<2(k+1)2ln(1/ep) onBek−1, implyingkZjrjk/kZjk ≥p

1−2(k+1)2ln(1/ep)/d, and the last inequality follows again by (11) (witht = d4·2α21)2) and the fact that the(Zk,v1), . . . ,(Zk,vk−1)are independent standard normal random variables. The second term in (20) can be bounded similarly to (5). The conditions

(13)

of the theorem forαanddimply P

¨‚ Zk kZkk, Zj

kZjk

Œ

tp,d for all jk−1|Z1k−1

«

≥ €

1−Φek(d,p)Šk−1

e

d

4·2α21)2k−1 2 e

8(k+1)2 ln 1δ2n d

ep (22)

≥ €

1−Φek(d,p)Šk−1

−2

ep 4

K

≥ €

1−Φek(d,p)Šk−1€

1−22K+1Š

(23) where at the last step we used the fact thatep<1−Φek(d,p)<1/2,1

To finish the proof of (17), we proceed, again, by induction, to prove that pkηkK 1−

k

X

i=2

4−i

!

€1−Φek(d,p)Š(k2)

withηK

1−22K+1Š

. This is sufficient to prove the theorem becauseηkK 1−Pk

i=24−i

>4/5 for allkKwhenK≥3. This clearly holds fork=1. Assuming it holds for somek−1,k≥2, and taking into account that, similarly to (13),

P{Bek−1} ≤2(k−1)ep(k+1)2/2≤2(k−1)€

1−Φek(d,p(k+1)2/2

, we obtain

pkηK

€1−Φek(d,pk1€

pk−1−P{Bek−1

+

ηK

€1−Φek(d,p)Šk−1

× ηk−K 1 1−

k1

X

i=2

4−i

!

€1−Φek(d,p)Š(k−21)

−2(k−1)€

1−Φek(d,p(k+1)2/2

!

+

= ηK

€1−Φek(d,p)Š(k2) ηk−K 1 1−

k1

X

i=2

4−i

!

−2(k−1)€

1−Φek(d,p(5k+1)/2

!

+

ηkK€

1−Φek(d,p)Š(k2) 1− Xk−1

i=2

4i−4k

!

where x+=max(x, 0)denotes the positive part of a real numberx and we used that 1−Φek(d,p)<

1/2 and 2(k−1)2−k/2−1< ηkK< ηk−K 1 for all 2≤kK whenK≥3.

1The second inequality is trivial. The first one can be obtained by noting that (16) impliesαp

1+δn1+δn/2 and 2(K+1)2ln(1/ep)/d< δ2n/4. From here, usingδn2/3, we have

αtp,dp d+δn

q

12(K+1)2dln(1/ep)

<(1+δn/2)tp,dp d+δn

1δn/2 <2tp,d

p d+1

which impliesep<1ΦeK(d,p).

Odkazy

Související dokumenty

The importance of remote sensing data (and, first and foremost, satellite images) as the most suitable technique for large-scale landscape prospection and feature detection in

Test statistic matters... Motivation and Recap P-value Confidence Intervals Bootstrapping.

First we show that the testability and the fault coverage achieved by a certain number of pseudo-random test vectors strictly depend on the tested circuit.. Knowledge of the

Abstract. We generalize a previous result concerning the geometric reali- zability of model spaces as curvature homogeneous spaces, and investigate applications of this approach.

In this paper, we give some lower and upper bounds on the first Zagreb index M 1 (G) of graphs and trees in terms of number of vertices, irregularity index, maxi- mum degree,

McKay, The asymptotic number of labeled connected graphs with a given number of vertices and edges, Random Structures and Algorithms, 1 (1990) 127–169..

In Section 4, we use Theorem 7 to obtain an asymptotic distribution for the number of Euler tours of a random d-in/d-out graph.. 2 Expectation and Variance of

Concretely, we will test the hypothesis that countries with higher foreign bank participation tend to finance large businesses more than businesses of small and