Probability of events and the law of large numbers. The law of large numbers in Chebyshev form

Law of Large Numbers

The practice of studying random phenomena shows that although the results of individual observations, even those carried out under identical conditions, can differ greatly, the average results of a sufficiently large number of observations are stable and depend only weakly on the results of individual observations. The theoretical justification for this remarkable property of random phenomena is the law of large numbers. The general meaning of the law of large numbers is that the joint action of a large number of random factors leads to a result that is almost independent of chance.

Central limit theorem

Lyapunov's theorem explains the wide prevalence of the normal distribution law and the mechanism of its formation. It allows us to assert that whenever a random variable is formed by the addition of a large number of independent random variables whose variances are small compared with the variance of the sum, the distribution law of this random variable turns out to be practically normal. And since random variables are always generated by an enormous number of causes, and most often none of these causes has a variance comparable to the variance of the random variable itself, most random variables encountered in practice obey the normal distribution law.

Let us dwell in more detail on the content of the theorems of each of these groups.

In practical research, it is very important to know in what cases it is possible to guarantee that the probability of an event will be either sufficiently small or arbitrarily close to unity.

The law of large numbers is understood as a set of propositions stating that, with probability arbitrarily close to one (or to zero), an event will occur that depends on a very large, indefinitely increasing number of random events, each of which exerts only a slight influence on it.

More precisely, the law of large numbers is understood as a set of propositions stating that, with probability arbitrarily close to one, the deviation of the arithmetic mean of a sufficiently large number of random variables from a constant value, the arithmetic mean of their mathematical expectations, will not exceed a given arbitrarily small number.

Separate, single phenomena that we observe in nature and in social life often appear random (for example, a recorded death, the sex of a newborn child, the air temperature, etc.) because they are affected by many factors not related to the essence of the emergence or development of the phenomenon. Their total effect on the observed phenomenon cannot be predicted, and they manifest themselves differently in individual phenomena. From the result of a single phenomenon, nothing can be said about the patterns inherent in many such phenomena.

However, it has long been noted that the arithmetic mean of the numerical characteristics of certain features (the relative frequency of occurrence of an event, the results of measurements, etc.) is subject to only very slight fluctuations when the experiment is repeated many times. In the average, as it were, the regularity inherent in the essence of the phenomena manifests itself; in it, the influence of the individual factors that made the results of single observations random cancels out. Theoretically, this behavior of the average can be explained by the law of large numbers: if some very general conditions on the random variables are met, the stability of the arithmetic mean is a practically certain event. These conditions constitute the most important content of the law of large numbers.

The first example of the operation of this principle is the convergence of the frequency of occurrence of a random event to its probability as the number of trials increases, a fact established in Bernoulli's theorem (after the Swiss mathematician Jacob Bernoulli (1654-1705)). Bernoulli's theorem is one of the simplest forms of the law of large numbers and is often used in practice; for example, the frequency with which some quality of a respondent occurs in a sample is taken as an estimate of the corresponding probability.

The outstanding French mathematician Siméon Denis Poisson (1781-1840) generalized this theorem and extended it to the case when the probability of the event varies from trial to trial independently of the results of previous trials. He was also the first to use the term "law of large numbers".

The great Russian mathematician Pafnuty Lvovich Chebyshev (1821-1894) proved that the law of large numbers operates in phenomena with any variation and also extends to the regularity of averages.

Further generalizations of the theorems of the law of large numbers are connected with the names of A. A. Markov, S. N. Bernstein, A. Ya. Khinchin and A. N. Kolmogorov.

The general modern formulation of the problem, the formulation of the law of large numbers, the development of ideas and methods for proving theorems related to this law belong to Russian scientists P. L. Chebyshev, A. A. Markov and A. M. Lyapunov.

CHEBYSHEV'S INEQUALITY

Let us first consider auxiliary theorems: the lemma and the Chebyshev inequality, with the help of which the law of large numbers in the Chebyshev form is easily proved.

Lemma (Chebyshev).

If a random variable X takes no negative values, then the probability that it takes a value exceeding a positive number A is not greater than the fraction whose numerator is the mathematical expectation of the random variable and whose denominator is the number A:

P(X > A) ≤ M(X)/A.

Proof. Let the distribution law of the random variable X be known:

P(X = xi) = pi (i = 1, 2, ..., n),

and let the values of the random variable be arranged in ascending order.

In relation to the number A, the values of the random variable fall into two groups: some do not exceed A, while the others are greater than A. Suppose that the first group consists of the first k values x1, x2, ..., xk (xk ≤ A < xk+1).

By definition, M(X) = x1·p1 + x2·p2 + ... + xn·pn. Since xi ≥ 0, all terms of this sum are non-negative. Therefore, discarding the first k terms, we obtain the inequality

M(X) ≥ xk+1·pk+1 + ... + xn·pn.

Insofar as

xk+1 > A, ..., xn > A and pk+1 + ... + pn = P(X > A),

then

M(X) ≥ A·(pk+1 + ... + pn) = A·P(X > A), whence P(X > A) ≤ M(X)/A.

Q.E.D.

Random variables can have different distributions with the same mathematical expectations. However, for them, Chebyshev's lemma will give the same estimate of the probability of one or another test result. This shortcoming of the lemma is related to its generality: it is impossible to achieve a better estimate for all random variables at once.

Chebyshev's inequality.

The probability that the deviation of a random variable from its mathematical expectation exceeds a positive number ε in absolute value is not greater than the fraction whose numerator is the variance of the random variable and whose denominator is ε²:

P(|X − M(X)| > ε) ≤ D(X)/ε².

Proof. Since (X − M(X))² is a random variable that does not take negative values, we may apply to it the inequality from Chebyshev's lemma with A = ε²:

P((X − M(X))² > ε²) ≤ M((X − M(X))²)/ε² = D(X)/ε².

The event (X − M(X))² > ε² is equivalent to the event |X − M(X)| > ε, which yields the required inequality.

Q.E.D.

Consequence. Insofar as the events |X − M(X)| > ε and |X − M(X)| ≤ ε are opposite,

P(|X − M(X)| ≤ ε) ≥ 1 − D(X)/ε²

is another form of Chebyshev's inequality.

We accept without proof the fact that the lemma and Chebyshev's inequality are also true for continuous random variables.

Chebyshev's inequality underlies the qualitative and quantitative statements of the law of large numbers. It gives an upper bound on the probability that the deviation of a random variable from its mathematical expectation exceeds some given number. It is remarkable that Chebyshev's inequality estimates the probability of an event for a random variable whose distribution is unknown: only its mathematical expectation and variance need be known.
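To see how conservative this bound is, here is a minimal numerical sketch (an illustration added to the text; the exponential distribution is an arbitrary choice) comparing the Chebyshev estimate with the actual probability:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)   # M(X) = 1, D(X) = 1

m, d = x.mean(), x.var()
for eps in (1.0, 2.0, 3.0):
    empirical = np.mean(np.abs(x - m) > eps)     # P(|X - M(X)| > eps), estimated
    bound = d / eps**2                           # Chebyshev upper bound D(X)/eps^2
    print(f"eps={eps}: empirical = {empirical:.4f}, Chebyshev bound = {bound:.4f}")
```

The bound holds for every distribution with finite variance, which is precisely why it is loose for any particular one.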

Theorem. (Law of large numbers in Chebyshev form)

If the variances of independent random variables X1, X2, ..., Xn are bounded by one and the same constant C, and their number n is large enough, then the probability is arbitrarily close to unity that the deviation of the arithmetic mean of these random variables from the arithmetic mean of their mathematical expectations will not exceed in absolute value a given positive number ε, however small it may be:

lim(n→∞) P( |(X1 + X2 + ... + Xn)/n − (M(X1) + M(X2) + ... + M(Xn))/n| ≤ ε ) = 1.

We accept the theorem without proof.

Consequence 1. If independent random variables have the same mathematical expectation a, their variances are bounded by the same constant C, and their number is large enough, then, no matter how small the given positive number ε, the probability is arbitrarily close to unity that the deviation of the arithmetic mean of these random variables from a will not exceed ε in absolute value.

This theorem justifies taking the arithmetic mean of the results of a sufficiently large number of measurements, made under identical conditions, as an approximate value of an unknown quantity. Indeed, the measurement results are random, since they are affected by very many random factors. The absence of systematic errors means that the mathematical expectations of the individual measurement results coincide and equal the true value. Consequently, by the law of large numbers, the arithmetic mean of a sufficiently large number of measurements will in practice differ little from the true value of the quantity sought.

(Recall that errors are called systematic if they distort the measurement result in one and the same direction according to a more or less clear law. These include errors arising from the imperfection of instruments (instrumental errors), from the personal characteristics of the observer (personal errors), etc.)

Consequence 2 . (Bernoulli's theorem.)

If the probability p of the occurrence of event A in each of n independent trials is constant, and the number of trials is sufficiently large, then the probability is arbitrarily close to unity that the frequency m/n of occurrence of the event differs arbitrarily little from its probability:

lim(n→∞) P( |m/n − p| ≤ ε ) = 1.

Bernoulli's theorem states that if the probability of an event is the same in all trials, then with an increase in the number of trials, the frequency of the event tends to the probability of the event and ceases to be random.

In practice, experiments in which the probability of the event is the same in every trial are relatively rare; more often it differs from trial to trial. Poisson's theorem refers to a scheme of trials of this type:

Corollary 3 . (Poisson's theorem.)

If the probability pi of the occurrence of an event in the i-th trial does not change when the results of the previous trials become known, and the number of trials n is large enough, then the probability that the frequency m/n of occurrence of the event differs arbitrarily little from the arithmetic mean of the probabilities is arbitrarily close to unity:

lim(n→∞) P( |m/n − (p1 + p2 + ... + pn)/n| ≤ ε ) = 1.

Poisson's theorem states that the frequency of an event in a series of independent trials tends to the arithmetic mean of its probabilities and ceases to be random.

In conclusion, we note that none of the considered theorems gives either an exact or even an approximate value of the desired probability, but only its lower or upper bound is indicated. Therefore, if it is required to establish the exact or at least approximate value of the probabilities of the corresponding events, the possibilities of these theorems are very limited.

Approximate values of probabilities for large n can only be obtained using limit theorems. In them, either additional restrictions are imposed on the random variables (as, for example, in Lyapunov's theorem), or random variables of a particular type are considered (as, for example, in the de Moivre-Laplace integral theorem).

The theoretical significance of Chebyshev's theorem, which is a very general formulation of the law of large numbers, is great. However, when it is applied to decide whether the law of large numbers holds for a particular sequence of independent random variables, it often requires far more random variables than are actually necessary for the law to come into force. This shortcoming of Chebyshev's theorem is explained by its generality. It is therefore desirable to have theorems that indicate the lower (or upper) bound of the desired probability more precisely. They can be obtained by imposing on the random variables some additional restrictions, which are usually satisfied by the random variables encountered in practice.

REMARKS ON THE CONTENT OF THE LAW OF LARGE NUMBERS

If the number of random variables is large enough and they satisfy some very general conditions, then, however they are distributed, it is practically certain that their arithmetic mean deviates arbitrarily little from a constant value, the arithmetic mean of their mathematical expectations, that is, it is practically a constant value. Such is the content of the theorems relating to the law of large numbers. Consequently, the law of large numbers is one of the expressions of the dialectical connection between chance and necessity.

One can give many examples of the emergence of new qualitative states as manifestations of the law of large numbers, primarily among physical phenomena. Let's consider one of them.

According to modern concepts, gases consist of individual particles, molecules, in chaotic motion, and it is impossible to say exactly where a given molecule will be at a given moment or with what speed it will move. However, observations show that the total effect of the molecules, such as the pressure of a gas on the vessel wall, manifests itself with amazing constancy. It is determined by the number of impacts and the strength of each of them. Although both are a matter of chance, instruments do not detect fluctuations in the pressure of a gas under normal conditions. This is explained by the fact that, owing to the huge number of molecules even in the smallest volumes, a change in pressure by a noticeable amount is practically impossible. Hence, the physical law stating the constancy of gas pressure is a manifestation of the law of large numbers.

The constancy of pressure and certain other characteristics of a gas at one time served as a weighty argument against the molecular theory of the structure of matter. Subsequently, physicists learned to isolate a relatively small number of molecules, so that the influence of individual molecules remained noticeable and the law of large numbers could not manifest itself to a sufficient degree. It then became possible to observe fluctuations in gas pressure, confirming the hypothesis of the molecular structure of matter.

The law of large numbers underlies various types of insurance (human life insurance for various periods, property, livestock, crops, etc.).

When planning the range of consumer goods, the demand for them from the population is taken into account. In this demand, the operation of the law of large numbers is manifested.

The sampling method widely used in statistics finds its scientific justification in the law of large numbers. For example, the quality of wheat brought from a collective farm to a procurement point is judged by the quality of the grains caught at random in a small measure. There are few grains in the measure compared with the whole batch, but in any case the measure is chosen so that there are quite enough grains in it for the law of large numbers to manifest itself with an accuracy that satisfies the need. We are entitled to take the corresponding indicators of the sample as indicators of the weediness, moisture content and average weight of the grains of the entire batch of incoming grain.

Further efforts of scientists to deepen the content of the law of large numbers were aimed at obtaining the most general conditions for the applicability of this law to a sequence of random variables. For a long time there were no fundamental successes in this direction. After P. L. Chebyshev and A. A. Markov, only in 1926 did the Soviet academician A. N. Kolmogorov manage to obtain conditions necessary and sufficient for the law of large numbers to be applicable to a sequence of independent random variables. In 1928, the Soviet scientist A. Ya. Khinchin showed that a sufficient condition for the applicability of the law of large numbers to a sequence of independent identically distributed random variables is the existence of their mathematical expectation.

For practice, it is extremely important to clarify fully the question of the applicability of the law of large numbers to dependent random variables, since phenomena in nature and society are mutually dependent and mutually determine each other. Much work has been devoted to elucidating the restrictions that must be imposed on dependent random variables so that the law of large numbers can be applied to them; the most important results are those of the outstanding Russian scientist A. A. Markov and the great Soviet scientists S. N. Bernstein and A. Ya. Khinchin.

The main result of these papers is that the law of large numbers applies to dependent random variables provided that strong dependence exists only between random variables with close indices, while between random variables with distant indices the dependence is sufficiently weak. Examples of random variables of this type are the numerical characteristics of climate. The weather of each day is noticeably influenced by the weather of the preceding days, and the influence weakens noticeably as the days become more distant from each other. Consequently, the long-term average temperature, pressure and other characteristics of the climate of a given area should, in accordance with the law of large numbers, be practically close to their mathematical expectations. The latter are objective characteristics of the local climate.

In order to experimentally verify the law of large numbers, the following experiments were carried out at different times.

1. Buffon's experiment. A coin was tossed 4040 times; heads came up 2048 times. The frequency of heads was 2048/4040 ≈ 0.5069.

2. Pearson's experiment. A coin was tossed 12,000 and then 24,000 times. The frequency of heads turned out to be 0.5016 in the first case and 0.5005 in the second.

3. Westergaard's experiment. From an urn containing equal numbers of white and black balls, 10,000 draws (with each drawn ball returned to the urn) yielded 5011 white and 4989 black balls. The frequency of white balls was 5011/10000 = 0.5011, of black ones 0.4989.

4. V. I. Romanovsky's experiment. Four coins were tossed 20160 times. The counts and frequencies of the various combinations of heads and tails were distributed as follows:

Combination (heads and tails) | Empirical count | Empirical frequency | Theoretical frequency
4 and 0 | 1181 | 0.05858 | 0.0625
3 and 1 | 4909 | 0.24350 | 0.2500
2 and 2 | 7583 | 0.37614 | 0.3750
1 and 3 | 5085 | 0.25224 | 0.2500
0 and 4 | 1402 | 0.06954 | 0.0625
Total | 20160 | 1.00000 | 1.0000

The results of experimental tests of the law of large numbers convince us that the experimental frequencies are close to the probabilities.
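These historical experiments are easy to reproduce numerically today. A minimal sketch (added here for illustration) simulating coin tosses shows the frequency of heads settling near the probability 0.5:

```python
import numpy as np

rng = np.random.default_rng(1)
for n in (100, 10_000, 1_000_000):
    heads = rng.integers(0, 2, size=n).sum()   # each toss: 0 (tails) or 1 (heads)
    print(f"n = {n:>9}: frequency of heads = {heads / n:.5f}")
```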

CENTRAL LIMIT THEOREM

It is easy to prove that the sum of any finite number of independent normally distributed random variables is also distributed according to the normal law.

Even if independent random variables are not distributed according to the normal law, then, under some rather mild restrictions imposed on them, their sum will still be approximately normally distributed.

This problem was posed and solved mainly by Russian scientists P. L. Chebyshev and his students A. A. Markov and A. M. Lyapunov.

Theorem (Lyapunov).

If independent random variables X1, X2, ..., Xn have finite mathematical expectations a1, a2, ..., an and finite variances D1, D2, ..., Dn, their number is large enough, and with an unlimited increase of n

lim(n→∞) (b1 + b2 + ... + bn) / (D1 + D2 + ... + Dn)^(3/2) = 0,

where bi = M(|Xi − ai|³) are the absolute central moments of the third order, then their sum X1 + X2 + ... + Xn has, to a sufficient degree of accuracy, a normal distribution with mathematical expectation a1 + a2 + ... + an and variance D1 + D2 + ... + Dn.

(In fact, we present not Lyapunov's theorem itself but one of its corollaries, since this corollary is quite sufficient for practical applications. The condition above, called the Lyapunov condition, is a stronger requirement than is necessary for the proof of Lyapunov's theorem itself.)

The meaning of the condition is that the action of each individual term (random variable) is small compared with the total action of all of them. Many random phenomena occurring in nature and in social life proceed exactly according to this pattern. For this reason Lyapunov's theorem is of exceptionally great importance, and the normal distribution law is one of the basic laws of probability theory.

Suppose, for example, that some quantity is measured. Various deviations of the observed values from its true value (the mathematical expectation) result from the influence of a very large number of factors, each of which generates a small error, and these errors are independent. Then, by Lyapunov's theorem, the total measurement error is a random variable that must be distributed according to the normal law.
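The mechanism can be illustrated with a short simulation (added here; the uniform elementary errors are an arbitrary assumption of the demo): the sum of many small independent errors is practically normal.

```python
import numpy as np

rng = np.random.default_rng(2)
n_errors, n_trials = 200, 20_000
# each elementary error is uniform on [-0.01, 0.01]: small, independent, zero mean
errors = rng.uniform(-0.01, 0.01, size=(n_trials, n_errors))
total = errors.sum(axis=1)                 # total measurement error per trial

z = (total - total.mean()) / total.std()
# for a normal law these fractions are about 0.683, 0.954, 0.997
for k in (1, 2, 3):
    print(f"P(|Z| < {k}) = {np.mean(np.abs(z) < k):.4f}")
```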

In gunnery, under the influence of a very large number of random causes, shells are scattered over a certain area. Random effects on the projectile trajectory can be considered independent, and each cause produces only a small change in the trajectory compared with the total change due to all causes. Therefore the deviation of the burst point from the target should be expected to be a random variable distributed according to the normal law.

By Lyapunov's theorem we have the right to expect that, for example, the height of an adult man is a random variable distributed according to the normal law. This hypothesis, like those considered in the two previous examples, agrees well with observations. To confirm it, we present the distribution by height of 1000 adult male workers together with the corresponding theoretical numbers of men, i.e. the numbers of men who should fall in the given height groups on the assumption that men's height is distributed according to the normal law.

[Table: heights grouped in 3-cm intervals from 143-146 cm to 185-188 cm, with, for each interval, the number of men observed experimentally and the number predicted theoretically; the counts themselves are not preserved.]

It would be difficult to expect a more accurate agreement between the experimental data and the theoretical ones.

One can easily prove, as a corollary of Lyapunov's theorem, a proposition that will be needed in what follows to justify the sampling method.

Proposition.

The sum of a sufficiently large number of identically distributed random variables having finite absolute central moments of the third order is distributed approximately according to the normal law.

The limit theorems of probability theory, the de Moivre-Laplace theorems, explain the nature of the stability of the frequency of occurrence of an event. This nature consists in the fact that the limiting distribution of the number of occurrences of an event with an unlimited increase in the number of trials (if the probability of the event is the same in all trials) is the normal distribution.

System of random variables.

The random variables considered above were one-dimensional, i.e. determined by a single number; however, there are also random variables determined by two, three, etc. numbers. Such random variables are called two-dimensional, three-dimensional, etc.

Depending on the type of random variables included in the system, systems can be discrete, continuous or mixed if the system includes different types of random variables.

Let us consider systems of two random variables in more detail.

Definition. The distribution law of a system of random variables is a relation establishing the connection between the regions of possible values of the system of random variables and the probabilities of the system falling in these regions.

Example. From an urn containing 2 white and 3 black balls, two balls are drawn. Let X be the number of white balls drawn, and let Y be the number of black balls drawn, so that (X, Y) is a two-dimensional random variable (with X + Y = 2).

Let us make the distribution table of the system of random variables.

The probability that no white ball is drawn (hence two black ones are drawn) is

P(X = 0, Y = 2) = C(3,2)/C(5,2) = 3/10 = 0.3.

The probability that one white ball (and hence one black one) is drawn is

P(X = 1, Y = 1) = C(2,1)·C(3,1)/C(5,2) = 6/10 = 0.6.

The probability that two white balls (and hence no black ones) are drawn is

P(X = 2, Y = 0) = C(2,2)/C(5,2) = 1/10 = 0.1.

Thus, the distribution series of the two-dimensional random variable has the form:

(X, Y): (0, 2) (1, 1) (2, 0)
P:      0.3    0.6    0.1

Definition. The distribution function of a system of two random variables is the function of two arguments F(x, y) equal to the probability of the joint fulfillment of the two inequalities X < x and Y < y:

F(x, y) = P(X < x, Y < y).

We note the following properties of the distribution function of a system of two random variables:

1) 0 ≤ F(x, y) ≤ 1;

2) the distribution function is a non-decreasing function of each argument;

3) F(−∞, y) = F(x, −∞) = F(−∞, −∞) = 0;

4) F(x, +∞) = F1(x) is the distribution function of the component X, F(+∞, y) = F2(y) is the distribution function of the component Y, and F(+∞, +∞) = 1;

5) the probability that the random point (X, Y) falls in an arbitrary rectangle with sides parallel to the coordinate axes is calculated by the formula

P(x1 ≤ X < x2, y1 ≤ Y < y2) = F(x2, y2) − F(x1, y2) − F(x2, y1) + F(x1, y1).

Distribution density of a system of two random variables.

Definition. The joint distribution density of the probabilities of a two-dimensional random variable (X, Y) is the second mixed partial derivative of the distribution function:

f(x, y) = ∂²F(x, y)/(∂x ∂y).

If the distribution density is known, the distribution function can be found by the formula

F(x, y) = ∫∫ f(u, v) du dv, the integrals taken from −∞ to x and from −∞ to y respectively.

The two-dimensional distribution density is non-negative, and the double integral of it with infinite limits equals one.

From the known joint distribution density one can find the distribution density of each component of the two-dimensional random variable:

f1(x) = ∫ f(x, y) dy;   f2(y) = ∫ f(x, y) dx, the integrals taken from −∞ to +∞.

Conditional laws of distribution.

As shown above, knowing the joint distribution law, one can easily find the distribution laws for each random variable included in the system.

However, in practice the inverse problem arises more often: from the known distribution laws of the individual random variables, find their joint distribution law.

In the general case, this problem is unsolvable, because the distribution law of a random variable says nothing about the relationship of this variable with other random variables.

Moreover, if the random variables are dependent on each other, then the joint distribution law cannot be expressed in terms of the distribution laws of the components at all, since it must also establish the connection between the components.

All this leads to the need to consider conditional distribution laws.

Definition. The distribution of one random variable belonging to the system, found under the condition that the other random variable has taken a certain value, is called a conditional distribution law.

The conditional distribution law can be specified both by the distribution function and by the distribution density.

The conditional distribution densities are calculated by the formulas

f(x/y) = f(x, y)/f2(y),   f(y/x) = f(x, y)/f1(x).

The conditional distribution density has all the properties of the distribution density of a single random variable.

Conditional mathematical expectation.

Definition. The conditional expectation of a discrete random variable Y at X = x (where x is a given possible value of X) is the sum of the products of all possible values of Y and their conditional probabilities:

M(Y/x) = Σj yj·p(yj/x).

For continuous random variables,

M(Y/x) = ∫ y·f(y/x) dy, the integral taken from −∞ to +∞,

where f(y/x) is the conditional density of the random variable Y at X = x.

The conditional expectation M(Y/x) = f(x) is a function of x, called the regression function of Y on X.

Example. Find the conditional expectation of the component Y at X = x1 = 1 for the discrete two-dimensional random variable given by the table:

Y \ X  | x1 = 1 | x2 = 3 | x3 = 4 | x4 = 8
y1 = 3 | 0.15   | 0.06   | 0.25   | 0.04
y2 = 6 | 0.30   | 0.10   | 0.03   | 0.07
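The computation (worked out here for completeness under the table above) goes as follows:

P(X = 1) = 0.15 + 0.30 = 0.45;

p(y1/x1) = 0.15/0.45 = 1/3,   p(y2/x1) = 0.30/0.45 = 2/3;

M(Y/x1) = 3·(1/3) + 6·(2/3) = 1 + 4 = 5.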

The conditional variance and conditional moments of the system of random variables are defined similarly.

Dependent and independent random variables.

Definition. Random variables are called independent, if the distribution law of one of them does not depend on what value the other random variable takes.

The concept of dependence of random variables is very important in probability theory.

Conditional distributions of independent random variables are equal to their unconditional distributions.

Let us define the necessary and sufficient conditions for the independence of random variables.

Theorem. In order for the random variables X and Y to be independent, it is necessary and sufficient that the distribution function of the system (X, Y) be equal to the product of the distribution functions of the components:

F(x, y) = F1(x)·F2(y).

A similar theorem can be formulated for the distribution density:

Theorem. In order for the random variables X and Y to be independent, it is necessary and sufficient that the joint distribution density of the system (X, Y) be equal to the product of the distribution densities of the components:

f(x, y) = f1(x)·f2(y).

To characterize the connection between random variables one uses the correlation moment

μxy = M[(X − mx)(Y − my)], where mx = M(X), my = M(Y).

In practice the following formulas are used.

For discrete random variables:

μxy = Σi Σj (xi − mx)(yj − my)·pij.

For continuous random variables:

μxy = ∫∫ (x − mx)(y − my)·f(x, y) dx dy.

If the random variables are independent, then their correlation moment is zero.

The correlation moment has a dimension equal to the product of the dimensions of the random variables X and Y . This fact is a disadvantage of this numerical characteristic, since with different units of measurement, different correlation moments are obtained, which makes it difficult to compare the correlation moments of different random variables.

In order to eliminate this shortcoming, another characteristic is applied - the correlation coefficient.

Definition. The correlation coefficient rxy of random variables X and Y is the ratio of the correlation moment to the product of the standard deviations of these quantities:

rxy = μxy/(σx·σy).

The correlation coefficient is a dimensionless quantity. For independent random variables, the correlation coefficient is zero.

Property: The absolute value of the correlation moment of two random variables X and Y does not exceed the geometric mean of their variances: |μxy| ≤ σx·σy.

Property: The absolute value of the correlation coefficient does not exceed unity.

Random variables are called correlated if their correlation moment is nonzero, and uncorrelated if their correlation moment is zero.

If random variables are independent, then they are uncorrelated, but from uncorrelation one cannot conclude that they are independent.

If two quantities are dependent, then they can be either correlated or uncorrelated.

Often, according to a given distribution density of a system of random variables, one can determine the dependence or independence of these variables.

Along with the correlation coefficient, the degree of dependence of random variables can also be characterized by the covariance, which is determined by the formula

cov(X, Y) = M[(X − M(X))(Y − M(Y))] = M(XY) − M(X)·M(Y)

and coincides with the correlation moment μxy.

Example. If the distribution density of a system of random variables X and Y factors into the product of a function of x alone and a function of y alone, then X and Y are independent; of course, they are then also uncorrelated.

Linear regression.

Consider a two-dimensional random variable ( X , Y ), where X and Y are dependent random variables.

Let us represent one random variable approximately as a function of the other; an exact representation is in general impossible. We assume that this function is linear:

g(X) = aX + b.

To determine this function it remains only to find the constants a and b.

Definition. The function g(X) is called the best approximation of the random variable Y in the sense of the least squares method if the mathematical expectation

M[(Y − g(X))²]

takes the smallest possible value. The function g(x) is also called the mean square regression of Y on X.

Theorem. The linear mean square regression of Y on X is calculated by the formula

g(x) = my + r·(σy/σx)·(x − mx),

where mx = M(X), my = M(Y), σx and σy are the standard deviations, and r = rxy is the correlation coefficient of the random variables X and Y. The quantity σy²·(1 − r²) is called the residual variance of the random variable Y relative to the random variable X; it characterizes the magnitude of the error resulting from replacing the random variable Y by the linear function g(X) = aX + b.

It is seen that if r = ±1, then the residual variance is zero; hence the error is zero, and the random variable Y is exactly represented by a linear function of the random variable X.

The straight line of mean square regression of X on Y is determined similarly:

x(y) = mx + r·(σx/σy)·(y − my).

If X and Y have linear regression functions with respect to each other, then the quantities X and Y are said to be connected by a linear correlation dependence.

Theorem. If a two-dimensional random variable ( X, Y) is normally distributed, then X and Y are connected by a linear correlation dependence.
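As a small sketch (added here, with synthetic data generated for the purpose), the regression line and the residual variance can be computed directly from the moments named in the theorem:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(0.0, 2.0, size=50_000)
y = 1.5 * x + rng.normal(0.0, 1.0, size=50_000)   # linear dependence plus noise

mx, my = x.mean(), y.mean()
sx, sy = x.std(), y.std()
r = np.corrcoef(x, y)[0, 1]                       # correlation coefficient r_xy

slope = r * sy / sx                               # regression of Y on X: g(x) = my + slope*(x - mx)
residual_var = sy**2 * (1 - r**2)                 # residual variance of Y relative to X
print(f"slope = {slope:.3f}, residual variance = {residual_var:.3f}")  # about 1.5 and 1.0
```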

E.G. Nikiforova


The distribution function of a random variable and its properties.

The distribution function of a random variable X is the function F(x) which, for each x, expresses the probability that the random variable X takes a value less than x: F(x) = P(X < x).

The function F(x) is sometimes called the integral distribution function or the integral distribution law.

Distribution function properties:

1. The distribution function of a random variable is a non-negative function enclosed between zero and one:

0 ≤ F(x) ≤ 1.

2. The distribution function of a random variable is a non-decreasing function on the whole number axis.

3. At minus infinity the distribution function equals zero, at plus infinity it equals one, i.e. F(−∞) = 0, F(+∞) = 1.

4. The probability that a random variable falls in the interval [x1, x2) (including x1) equals the increment of its distribution function on this interval, i.e. P(x1 ≤ X < x2) = F(x2) − F(x1).


Markov and Chebyshev inequality

Markov inequality

Theorem: If a random variable X takes only non-negative values and has a mathematical expectation, then for any positive number A the inequality holds: P(X > A) ≤ M(X)/A.

Since the events X > A and X ≤ A are opposite, replacing P(X > A) by 1 − P(X ≤ A) we arrive at another form of Markov's inequality: P(X ≤ A) ≥ 1 − M(X)/A.

Markov's inequality is applicable to any non-negative random variables.

Chebyshev's inequality

Theorem: For any random variable with mathematical expectation and variance, Chebyshev's inequality is true:

P(|X − a| > ε) ≤ D(X)/ε², or P(|X − a| ≤ ε) ≥ 1 − D(X)/ε², where a = M(X), ε > 0.


The law of large numbers "in the form" of Chebyshev's theorem.

Chebyshev's theorem: If the variances of n independent random variables X1, X2, ..., Xn are bounded by the same constant, then with an unlimited increase of the number n the arithmetic mean of the random variables converges in probability to the arithmetic mean of their mathematical expectations a1, a2, ..., an, i.e.

lim(n→∞) P( |(X1 + X2 + ... + Xn)/n − (a1 + a2 + ... + an)/n| < ε ) = 1.

The meaning of the law of large numbers is that the average values of random variables tend to their mathematical expectations as n → ∞ in probability. The deviation of the averages from the mathematical expectation becomes arbitrarily small with probability close to one if n is large enough. In other words, the probability of any appreciable deviation of the averages from a becomes arbitrarily small as n grows.



30. Bernoulli's theorem.

Bernoulli's theorem: The frequency m/n of an event in n repeated independent trials, in each of which it can occur with the same probability p, converges in probability, as n increases without limit, to the probability p of this event in a single trial:

lim(n→∞) P( |m/n − p| < ε ) = 1.

Bernoulli's theorem is a consequence of Chebyshev's theorem, because the frequency of an event can be represented as the arithmetic mean of n independent alternative random variables that have the same distribution law.

18. Mathematical expectation of a discrete and continuous random variable and their properties.

The mathematical expectation of a random variable is, for a discrete variable, the sum of the products of all its values and the corresponding probabilities.

For a discrete random variable:

M(X) = Σi xi·pi.

For a continuous random variable:

M(X) = ∫ x·f(x) dx, the integral taken from −∞ to +∞.

Properties of mathematical expectation:

1. The mathematical expectation of a constant equals the constant itself: M(C) = C.

2. The constant factor can be taken out of the expectation sign, i.e. M(kX)=kM(X).

3. The mathematical expectation of the algebraic sum of a finite number of random variables is equal to the same sum of their mathematical expectations, i.e. M(X±Y)=M(X)±M(Y).

4. The mathematical expectation of the product of a finite number of independent random variables is equal to the product of their mathematical expectations: M(XY)=M(X)*M(Y).

5. If all values ​​of a random variable are increased (decreased) by a constant C, then the mathematical expectation of this random variable will increase (decrease) by the same constant C: M(X±C)=M(X)±C.

6. The mathematical expectation of the deviation of a random variable from its mathematical expectation is zero: M[X − M(X)] = 0.

If the phenomenon of stability of averages takes place in reality, then in the mathematical model with which we study random phenomena there must be a theorem reflecting this fact.

Under the conditions of this theorem we introduce restrictions on the random variables X1, X2, ..., Xn:

a) each random variable Xi has mathematical expectation

M(Xi) = a;

b) the variance of each random variable is finite, or, we may say, the variances are bounded above by the same number C, i.e.

D(Xi) < C, i = 1, 2, ..., n;

c) the random variables are pairwise independent, i.e. any two Xi and Xj with i ≠ j are independent.

Then, obviously,

D(X1 + X2 + ... + Xn) = D(X1) + D(X2) + ... + D(Xn).

Let us formulate the law of large numbers in the Chebyshev form.

Chebyshev's theorem: with an unlimited increase of the number n of independent trials, "the arithmetic mean X̄ of the observed values of a random variable converges in probability to its mathematical expectation", i.e. for any positive ε

lim(n→∞) P(|X̄ − a| < ε) = 1. (4.1.1)

The meaning of the expression "the arithmetic mean X̄ converges in probability to a" is that the probability that X̄ differs arbitrarily little from a approaches 1 without limit as the number n grows.

Proof. For a finite number n of independent trials we apply Chebyshev's inequality to the random variable X̄ = (X1 + X2 + ... + Xn)/n:

P(|X̄ − M(X̄)| < ε) ≥ 1 − D(X̄)/ε². (4.1.2)

Taking into account the restrictions a)-c), we calculate M(X̄) and D(X̄):

M(X̄) = (M(X1) + M(X2) + ... + M(Xn))/n = na/n = a;

D(X̄) = (D(X1) + D(X2) + ... + D(Xn))/n² < nC/n² = C/n.

Substituting M(X̄) and D(X̄) into inequality (4.1.2), we obtain

P(|X̄ − a| < ε) ≥ 1 − C/(nε²).

If in this inequality we take an arbitrarily small ε > 0 and let n → ∞, then we get

lim(n→∞) P(|X̄ − a| < ε) = 1,

which proves Chebyshev's theorem.

An important practical conclusion follows from this theorem: we are entitled to replace the unknown value of the mathematical expectation of a random variable by the arithmetic mean obtained from a sufficiently large number of experiments. The more experiments are used in the calculation, the more probable (reliable) it is that the error of this replacement, X̄ − a, will not exceed the given value ε.

In addition, other practical problems can be solved. For example, from the value of the probability (reliability) P = P(|X̄ − a| < ε) and the maximum allowable error ε one can determine the required number of experiments n; from P and n one can determine ε; from ε and n one can determine the probability of the event |X̄ − a| < ε.

A special case. Suppose that in n trials n values of a random variable X, having mathematical expectation M(X) and variance D(X), are observed. The values obtained can be considered as random variables X1, X2, X3, ..., Xn. This should be understood as follows: the series of n trials is carried out repeatedly, so that as a result of the i-th trial, i = 1, 2, 3, ..., n, each series of trials yields one or another value of the random variable X, not known in advance. Hence, the i-th value xi of the random variable, obtained in the i-th trial, changes randomly as we pass from one series of trials to another. Thus each value xi can be considered as a value of a random variable Xi.


Assume that the trials meet the following requirements:

1. The trials are independent. This means that the results X1, X2, X3, ..., Xn of the trials are independent random variables.

2. The trials are carried out under identical conditions. From the point of view of probability theory this means that each of the random variables X1, X2, X3, ..., Xn has the same distribution law as the original variable X, so that M(Xi) = M(X) and D(Xi) = D(X), i = 1, 2, ..., n.

Taking the above conditions into account, we obtain

P(|X̄ − a| < ε) ≥ 1 − D(X)/(nε²). (4.1.3)

Example 4.1.1. The variance of a random variable X equals 4. How many independent experiments are required so that, with probability at least 0.9, the arithmetic mean of this random variable can be expected to differ from its mathematical expectation by less than 0.5?

Solution. By the condition of the problem, ε = 0.5 and P(|X̄ − a| < 0.5) ≥ 0.9. Applying formula (4.1.3) to the random variable X, we get

P(|X̄ − M(X)| < ε) ≥ 1 − D(X)/(nε²).

From the relation

1 − D(X)/(nε²) = 0.9

we find

n = D(X)/(0.1·ε²) = 4/(0.1·0.25) = 160.

Answer: 160 independent experiments are required.

Assuming instead that the arithmetic mean X̄ is normally distributed, we get

P(|X̄ − a| < ε) = 2Φ(ε·√n/σ) ≥ 0.9,

where σ = √D(X) = 2 and Φ is the Laplace function. Using the table of the Laplace function, we get ε·√n/σ ≥ 1.645, i.e. √n ≥ 6.58, whence n ≥ 44. The normality assumption thus sharply reduces the required number of experiments compared with the Chebyshev estimate of 160.

Example 4.1.2. The variance of a random variable X equals D(X) = 5. 100 independent experiments were carried out, from which X̄ was computed. Instead of the unknown value of the mathematical expectation a we accept X̄. Determine the maximum error allowed in this case with probability at least 0.8.

Solution. By the condition, n = 100 and P(|X̄ − a| < ε) ≥ 0.8. We apply formula (4.1.3):

P(|X̄ − a| < ε) ≥ 1 − D(X)/(nε²).

From the relation

1 − D(X)/(nε²) = 0.8

we determine ε:

ε² = D(X)/(0.2·n) = 5/(0.2·100) = 0.25.

Hence, ε = 0.5.

Answer: the maximum error value is ε = 0.5.

4.2. Law of large numbers in Bernoulli form

Although the concept of probability is the basis of any statistical inference, only in a few cases can we determine the probability of an event directly. Sometimes this probability can be established from considerations of symmetry, equal possibility, etc., but there is no universal method that would allow one to indicate the probability of an arbitrary event. Bernoulli's theorem makes it possible to estimate the probability approximately if repeated independent trials can be carried out for the event A of interest to us. Suppose n independent trials are carried out, in each of which the probability of occurrence of the event A is constant and equal to p.

Bernoulli's theorem. With an unlimited increase in the number n of independent trials, the relative frequency m/n of occurrence of the event A converges in probability to the probability p of occurrence of the event A, i.e.

lim(n→∞) P(|m/n − p| ≤ ε) = 1, (4.2.1)

where ε is an arbitrarily small positive number.

For finite n, with q = 1 − p, Chebyshev's inequality for the random variable m/n takes the form:

P(|m/n − p| < ε) ≥ 1 − pq/(nε²). (4.2.2)

Proof. We apply Chebyshev's theorem. Let Xi be the number of occurrences of the event A in the i-th trial, i = 1, 2, ..., n. Each of the quantities Xi can take only two values:

Xi = 1 (the event A occurred) with probability p,

Xi = 0 (the event A did not occur) with probability q = 1 − p.

Let Yn = (X1 + X2 + ... + Xn)/n. The sum X1 + X2 + ... + Xn equals the number m of occurrences of the event A in n trials (0 ≤ m ≤ n), so that Yn = m/n is the relative frequency of occurrence of the event A in n trials. The mathematical expectation and variance of Xi are, respectively,

M(Xi) = 1·p + 0·q = p,

D(Xi) = (1 − p)²·p + (0 − p)²·q = pq.

Since M(Yn) = p and D(Yn) = pq/n, Chebyshev's inequality applied to Yn gives (4.2.2), and passing to the limit as n → ∞ gives (4.2.1).

Example 4.2.1. In order to determine the percentage of defective products, 1000 units were tested according to the sampling-with-return scheme. What is the probability that the reject rate determined from this sample differs in absolute value from the reject rate of the entire batch by no more than 0.01, if it is known that on average there are 500 defective items per 10,000 items?

Solution. By the condition of the problem, the number of independent trials is n = 1000;

p = 500/10000 = 0.05; q = 1 − p = 0.95; ε = 0.01.

Applying formula (4.2.2), we obtain

P(|m/n − p| < 0.01) ≥ 1 − pq/(nε²) = 1 − 0.0475/(1000·0.0001) = 1 − 0.475 = 0.525.

Answer: with probability at least 0.525 it can be expected that the sample fraction of defects (the relative frequency of defects) will differ from the fraction of defects in the whole output (from the probability of a defect) by no more than 0.01.

Example 4.2.2. In the stamping of parts, the probability of a defect is 0.05. How many parts must be checked so that, with probability at least 0.95, the relative frequency of defective items can be expected to differ from the probability of a defect by less than 0.01?

Solution. By the condition, p = 0.05; q = 0.95; ε = 0.01;

P(|m/n − p| < 0.01) ≥ 0.95.

From the equality 1 − pq/(nε²) = 0.95 we find n:

n = pq/(0.05·ε²) = 0.0475/(0.05·0.0001) = 9500.

Answer: 9500 parts must be checked.

Comment. Estimates of the required number of observations obtained from Bernoulli's (or Chebyshev's) theorem are greatly exaggerated. More precise estimates were proposed by Bernstein and Khinchin, but they require a more complex mathematical apparatus. To avoid exaggerated estimates, the Laplace formula is sometimes used:

P(|m/n − p| < ε) ≈ 2Φ(ε·√(n/(pq))).

The disadvantage of this formula is the absence of an estimate of the allowable error.
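A small numerical sketch (added here; the value 1.96 is the standard normal quantile for reliability 0.95) contrasts the Chebyshev-based estimate of Example 4.2.2 with the Laplace approximation:

```python
import math

p, q, eps, reliability = 0.05, 0.95, 0.01, 0.95

# Chebyshev/Bernoulli estimate: 1 - pq/(n*eps^2) >= reliability
n_chebyshev = math.ceil(p * q / ((1 - reliability) * eps**2))

# Laplace (normal) approximation: 2*Phi(eps*sqrt(n/(pq))) >= reliability
z = 1.96  # quantile with 2*Phi(z) = 0.95
n_laplace = math.ceil(z**2 * p * q / eps**2)

print(n_chebyshev, n_laplace)  # 9500 versus roughly 1825
```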

LECTURE 5

Review of previous material

Part 1 - CHAPTER 9. LAW OF LARGE NUMBERS. LIMIT THEOREMS

Under the statistical definition of probability, it is treated as the number towards which the relative frequency of a random event tends. Under the axiomatic definition, probability is, in essence, an additive measure of the set of outcomes favoring the random event. In the first case we are dealing with an empirical limit, in the second with the theoretical concept of a measure. It is by no means obvious that they refer to the same concept. The relationship between the different definitions of probability is established by Bernoulli's theorem, which is a special case of the law of large numbers.

As the number of trials increases, the binomial law tends to the normal distribution. This is the de Moivre-Laplace theorem, which is a special case of the central limit theorem. The latter states that the distribution function of a sum of independent random variables tends to the normal law as the number of terms increases.

The law of large numbers and the central limit theorem underlie mathematical statistics.

9.1. Chebyshev's inequality

Let the random variable ξ have finite mathematical expectation M[ξ] and variance D[ξ]. Then for any positive number ε the following inequality holds:

P(|ξ − M[ξ]| ≥ ε) ≤ D[ξ]/ε².

Notes

For the opposite event: P(|ξ − M[ξ]| < ε) ≥ 1 − D[ξ]/ε².

Chebyshev's inequality is valid for any distribution law.

Putting ε = 3σ, where σ = √D[ξ], we get the nontrivial fact: P(|ξ − M[ξ]| ≥ 3σ) ≤ 1/9.

9.2. The law of large numbers in Chebyshev form

Theorem. Let the random variables ξ1, ξ2, ..., ξn be pairwise independent and have finite variances bounded by the same constant: D[ξi] ≤ C. Then for any ε > 0 we have

lim(n→∞) P( |(1/n)·Σi ξi − (1/n)·Σi M[ξi]| < ε ) = 1.

Thus, the law of large numbers speaks of the convergence in probability of the arithmetic mean of the random variables (itself a random variable) to the arithmetic mean of their mathematical expectations (a non-random value).

9.2. The law of large numbers in Chebyshev form: complement

Theorem (Markov): the law of large numbers holds if the variance of the sum of the random variables does not grow too fast as n grows:

lim(n→∞) D[ξ1 + ξ2 + ... + ξn]/n² = 0.

9.3. Bernoulli's theorem

Theorem: Consider the Bernoulli scheme. Let μn be the number of occurrences of event A in n independent trials, and let p be the probability of occurrence of event A in a single trial. Then for any ε > 0

lim(n→∞) P(|μn/n − p| < ε) = 1,

i.e. the probability that the deviation of the relative frequency of the random event from its probability p is arbitrarily small in absolute value tends to unity as the number of trials n increases.

Proof: The random variable μn is distributed according to the binomial law, so we have

M[μn] = np, D[μn] = npq, whence M[μn/n] = p and D[μn/n] = pq/n,

and Chebyshev's inequality gives

P(|μn/n − p| ≥ ε) ≤ pq/(nε²) → 0 as n → ∞.

9.4. Characteristic functions

The characteristic function of a random variable ξ is the function

φ(t) = M[exp(itξ)],

where exp(x) = eˣ. Thus, φ(t) represents the mathematical expectation of a certain complex random variable associated with ξ. In particular, if ξ is a discrete random variable given by the distribution series (xi, pi), where i = 1, 2, ..., n, then

φ(t) = Σi exp(it·xi)·pi.

For a continuous random variable ξ with probability distribution density f(x),

φ(t) = ∫ exp(itx)·f(x) dx, the integral taken from −∞ to +∞.
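A minimal sketch (reusing the distribution series 0.3, 0.6, 0.1 of the urn example above) of evaluating φ(t) for a discrete random variable:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])      # values of the discrete random variable
p = np.array([0.3, 0.6, 0.1])      # their probabilities

def phi(t: float) -> complex:
    # phi(t) = M[exp(i*t*xi)] = sum_i exp(i*t*x_i) * p_i
    return complex(np.sum(np.exp(1j * t * x) * p))

print(phi(0.0))   # always (1+0j): the probabilities sum to one
print(phi(0.5))
```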

9.5. Central limit theorem (Lyapunov's theorem)

Review of previous material

FUNDAMENTALS OF THE THEORY OF PROBABILITY AND MATHEMATICAL STATISTICS

PART II. MATHEMATICAL STATISTICS

Epigraph

"There are three kinds of lies: lies, damned lies, and statistics."
Benjamin Disraeli

Introduction

The two main tasks of mathematical statistics:

the collection and grouping of statistical data;
the development of methods for analyzing the data obtained, depending on the goals of the research.

Methods of statistical data analysis:

estimation of the unknown probability of an event;
estimation of an unknown distribution function;
estimation of the parameters of a distribution of known form;
testing of statistical hypotheses about the form of an unknown distribution or about the parameter values of a known distribution.

CHAPTER 1. BASIC CONCEPTS OF MATHEMATICAL STATISTICS

1.1. General population and sample

The general population is the entire set of objects under study; a sample is a set of objects randomly selected from the general population for study. The sizes of the general population and of the sample, i.e. the numbers of objects in them, will be denoted by N and n, respectively.

Sampling is called repeated (with replacement) if each selected object is returned to the general population before the next one is chosen, and non-repeated (without replacement) if the selected object is not returned to the general population.

A representative sample correctly reflects the features of the general population, i.e. it is representative. By the law of large numbers, this condition is met if:

1) the sample size n is large enough;
2) each object of the sample is chosen randomly;
3) for each object, the probability of getting into the sample is the same.

The general population and the sample may be one-dimensional (single-factor) or multidimensional (multifactor).

1.2. Sample distribution law (statistical series)

Let the random variable ξ of interest to us (some parameter of the objects of the population) take, in a sample of size n, the value x1 n1 times, the value x2 n2 times, ..., and the value xk nk times. Then the observed values x1, x2, ..., xk of the random variable ξ are called variants, and n1, n2, ..., nk their frequencies.

The difference xmax − xmin is called the range of the sample, and the ratio ωi = ni/n the relative frequency of the variant xi. Obviously, Σi ni = n and Σi ωi = 1.

If we write the variants in ascending order, we obtain a variational series. A table composed of the ordered variants and their frequencies (and/or relative frequencies) is called a statistical series, or the sample distribution law. It is the analogue of the distribution law of a discrete random variable in probability theory.

If the variational series consists of very many numbers, or a continuous feature is observed, a grouped sample is used. To obtain it, the interval containing all observed values of the feature is divided into several, usually equal, parts (subintervals) of length h. When compiling the statistical series, the midpoints of the subintervals are usually chosen as the xi, and ni is set equal to the number of variants falling in the i-th subinterval.

[Figure: frequency histogram of a grouped sample. The ordinate shows the frequencies n1, n2, n3, ..., ns; the abscissa shows the variants, the subinterval midpoints a + h/2, a + 3h/2, ..., b − h/2.]

1.3. Frequency polygon, sample distribution function

Let us plot the values xi of the random variable along the abscissa axis and the values ni along the ordinate axis. The broken line whose segments connect the points with coordinates (x1, n1), (x2, n2), ..., (xk, nk) is called the frequency polygon. If, instead of the absolute values ni, the relative frequencies ωi are plotted on the ordinate axis, we obtain the polygon of relative frequencies.

By analogy with the distribution function of a discrete random variable, from the sample distribution law one can build the sample (empirical) distribution function

F*(x) = (1/n)·Σ(xi < x) ni,

where the summation is over all frequencies corresponding to variants smaller than x. Note that the empirical distribution function depends on the sample size n.

Unlike the function F*(x), found for the random variable ξ experimentally by processing statistical data, the true distribution function F(x) associated with the general population is called theoretical. (Usually the general population is so large that it is impossible to process all of it; it can only be studied theoretically.)

Note that, for each fixed x, the empirical distribution function F*(x) is the relative frequency of the event ξ < x and therefore, by Bernoulli's theorem, converges in probability to the theoretical distribution function F(x) as n → ∞.

1.4. Properties of the empirical distribution function

The empirical distribution function F*(x) has a stepped form.

Another graphical representation of the sample of interest is the histogram: a step figure consisting of rectangles whose bases are the subintervals of width h and whose heights are segments of length ni/h (frequency histogram) or ωi/h (histogram of relative frequencies). In the first case the area of the histogram equals the sample size n, in the second, unity.
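A minimal sketch (with synthetic data, generated for the purpose) of how a grouped sample and histogram heights can be computed:

```python
import numpy as np

rng = np.random.default_rng(4)
sample = rng.normal(170, 6, size=1000)               # e.g. heights, cm

edges = np.linspace(sample.min(), sample.max(), 15)  # 14 equal subintervals
h = edges[1] - edges[0]                              # subinterval length
counts, _ = np.histogram(sample, bins=edges)         # frequencies n_i
midpoints = (edges[:-1] + edges[1:]) / 2             # variants x_i of the grouped sample

omega = counts / len(sample)                         # relative frequencies
heights = omega / h                                  # heights of the relative-frequency histogram
print(omega.sum())                                   # 1.0
print((heights * h).sum())                           # unit area under the histogram
```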


CHAPTER 2. NUMERICAL CHARACTERISTICS OF THE SAMPLE

The task of mathematical statistics is to obtain, from the available sample, information about the general population. The numerical characteristics of a representative sample serve as estimates of the corresponding characteristics of the random variable under study, associated with the general population.

2.1. Sample mean and sample variance, empirical moments

The sample mean is the arithmetic mean of the values of the variants in the sample:

x̄ = (1/n)·Σi ni·xi.

The sample mean is used for the statistical estimation of the mathematical expectation of the random variable under study.

The sample variance is the quantity

D* = (1/n)·Σi ni·(xi − x̄)².

The sample mean square deviation is

σ* = √D*.

It is easy to show that the following relation, convenient for calculating the variance, holds:

D* = (1/n)·Σi ni·xi² − (x̄)²,

i.e. the mean of the squares minus the square of the mean.
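A quick numerical check of this identity (a sketch with arbitrary data):

```python
import numpy as np

x = np.array([2.0, 5.0, 2.0, 11.0, 5.0, 6.0, 3.0, 13.0, 5.0])

d_direct = np.mean((x - x.mean())**2)      # D* by definition
d_shortcut = np.mean(x**2) - x.mean()**2   # mean of squares minus square of mean
print(d_direct, d_shortcut)                # identical values
```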

Other characteristics of the variational series are the mode M0, the variant having the highest frequency, and the median me, the variant that divides the variational series into two parts with equal numbers of variants. For example:

2, 5, 2, 11, 5, 6, 3, 13, 5 (mode = 5);
2, 2, 3, 5, 5, 5, 6, 11, 13 (median = 5).

By analogy with the corresponding theoretical expressions, one can build empirical moments, used for the statistical estimation of the initial and central moments of the random variable under study. By analogy with the moments of probability theory, the initial empirical moment of order m is the quantity

α*m = (1/n)·Σi ni·xi^m,

and the central empirical moment of order m is

μ*m = (1/n)·Σi ni·(xi − x̄)^m.

2.2. Properties of statistical estimates of distribution parameters: unbiasedness, efficiency, consistency

Having obtained statistical estimates of the parameters of the distribution of the random variable ξ (the sample mean, the sample variance, etc.), one must make sure that they are a good approximation of the corresponding parameters of the theoretical distribution of ξ. Let us find the conditions that must be fulfilled for this.


A statistical estimate A* is called unbiased if its mathematical expectation equals the estimated parameter A of the general population for any sample size, i.e. M(A*) = A. If this condition is not met, the estimate is called biased. Unbiasedness is not a sufficient condition for a good approximation of the statistical estimate A* to the true (theoretical) value of the estimated parameter A.

The scatter of individual values of the estimate about the mean value M(A*) depends on the variance D(A*). If the variance is large, then the value found from the data of one sample may differ significantly from the estimated parameter. Therefore, for reliable estimation the variance D(A*) should be small. A statistical estimate is called efficient if, for a given sample size n, it has the smallest possible variance.

Statistical estimates are also subject to the requirement of consistency. An estimate is called consistent if, as n → ∞, it tends in probability to the parameter being estimated. Note that an unbiased estimate is consistent if, as n → ∞, its variance tends to 0.

2.3. Sample mean properties

We will assume that the variants x1, x2, ..., xn are the values of corresponding independent identically distributed random variables ξ1, ξ2, ..., ξn having mathematical expectation a and variance D. Then the sample mean can be treated as the random variable

X̄ = (1/n)·Σi ξi.

Unbiasedness. From the properties of the mathematical expectation it follows that

M[X̄] = (1/n)·Σi M[ξi] = a,

i.e. the sample mean is an unbiased estimate of the mathematical expectation of the random variable. One can also show the efficiency of the sample mean as an estimate of the mathematical expectation (for the normal distribution).

Consistency. Let a be the estimated parameter, namely the mathematical expectation of the general population, and D the variance of the general population. Consider Chebyshev's inequality

P(|X̄ − M[X̄]| < ε) ≥ 1 − D[X̄]/ε².

We have M[X̄] = a and D[X̄] = D/n, so

P(|X̄ − a| < ε) ≥ 1 − D/(nε²).

As n → ∞ the right side of the inequality tends to one for any ε > 0, i.e. the value X̄ representing the sample estimate tends to the estimated parameter a in probability.

Thus, it can be concluded that the sample mean is an unbiased, efficient (at least for the normal distribution) and consistent estimate of the mathematical expectation of the random variable associated with the general population.


LECTURE 6

2.4. Sample variance properties

We investigate the unbiasedness of the sample variance D* as an estimate of the variance of a random variable. A direct calculation shows that

M[D*] = ((n − 1)/n)·D,

i.e. the sample variance is a biased estimate of D. The bias is removed by the corrected sample variance

s² = (n/(n − 1))·D*,

which is an unbiased estimate.

Example

Find the sample mean, the sample variance and mean square deviation, the mode and the corrected sample variance for a sample having a given distribution law. (The data table of this example is not preserved; a numerical illustration follows.)
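Since the original data were lost, here is a minimal sketch with hypothetical data (variants xi with frequencies ni are an assumption of the demo) showing all the required computations:

```python
import numpy as np

# hypothetical statistical series: variants and their frequencies
x = np.array([1.0, 2.0, 3.0, 4.0])
n = np.array([20, 15, 10, 5])
size = n.sum()

mean = np.sum(n * x) / size                   # sample mean
d_star = np.sum(n * (x - mean)**2) / size     # sample variance D*
sigma = np.sqrt(d_star)                       # mean square deviation
mode = x[np.argmax(n)]                        # variant with the highest frequency
s2 = size / (size - 1) * d_star               # corrected sample variance

print(f"mean={mean}, D*={d_star:.4f}, sigma={sigma:.4f}, mode={mode}, s^2={s2:.4f}")
```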

62. CHAPTER 3. POINT ESTIMATION OF PARAMETERS OF A KNOWN DISTRIBUTION

63.

We assume that the general form of the distribution law is known to us, and it remains to clarify the details: the parameters that define its actual form. There are several methods for solving this problem, two of which we will consider: the method of moments and the method of maximum likelihood.

64. 3.1. Method of moments

65.

The method of moments, developed by Karl Pearson in 1894, is based on the use of approximate equalities α_m ≈ α*_m: the moments α_m are calculated theoretically from the known distribution law with parameters θ, and the sample moments α*_m are calculated from the available sample. The unknown parameters θ1, ..., θr are determined as the result of solving a system of r equations linking the corresponding theoretical and empirical moments, for example α_m(θ1, ..., θr) = α*_m, m = 1, ..., r.

66.

It can be shown that the estimates of the parameters θ obtained by the method of moments are consistent, their mathematical expectations differ from the true values of the parameters by a quantity of order n^−1, and their standard deviations are quantities of order n^−0.5.

67. Example

It is known that the characteristic ξ of the objects of the general population, being a random variable, has a uniform distribution depending on the parameters a and b. It is required to determine, by the method of moments, the parameters a and b from the known sample mean x̄ and sample variance D*.

68. Reminder

α1 is the mathematical expectation, μ2 is the variance.

69.

(a + b)/2 = x̄,  (b − a)2/12 = D*.   (*)

70.
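A sketch of this computation in Python (the true values a = 2 and b = 8 are assumptions used only to generate test data): solving system (*) gives a = x̄ − √(3D*) and b = x̄ + √(3D*).

import math
import random
import statistics

random.seed(2)
xs = [random.uniform(2.0, 8.0) for _ in range(10_000)]  # assumed true a = 2, b = 8

mean = statistics.fmean(xs)
d_star = statistics.pvariance(xs)       # sample variance with the 1/n factor

a_hat = mean - math.sqrt(3.0 * d_star)  # from (a + b)/2 = mean, (b - a)^2/12 = D*
b_hat = mean + math.sqrt(3.0 * d_star)
print(a_hat, b_hat)                     # close to 2 and 8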

71. 3.2. Maximum likelihood method

The method is based on the likelihood function L(x1, x2,..., xn, θ), which is the distribution law of the vector (ξ1, ..., ξn), where the random variables ξi take the values of the sample variants, i.e. have the same distribution as ξ. Since the random variables are independent, the likelihood function has the form
L(x1, x2,..., xn, θ) = f(x1, θ) f(x2, θ) ... f(xn, θ).

72.

The idea of the maximum likelihood method is that we look for the values of the parameters θ at which the probability of the occurrence in the sample of the variant values x1, x2,..., xn is the largest. In other words, as the estimate of the parameters θ one takes the vector θ* for which the likelihood function has a local maximum for the given x1, x2, …, xn:

73.

Estimates by the method of maximum likelihood are obtained from the necessary extremum condition for the function L(x1, x2,..., xn, θ) at the point θ*.

74. Notes:

1. When searching for the maximum of the likelihood function, to simplify the calculations one may perform actions that do not change the result: first, use instead of L(x1, x2,..., xn, θ) the log-likelihood function l(x1, x2,..., xn, θ) = log L(x1, x2,..., xn, θ); second, discard in the expression for the likelihood function the terms independent of θ (for l) or the positive factors (for L).
2. The parameter estimates considered by us may be called point estimates, since for the unknown parameter θ a single point θ* is determined, which is its approximate value. However, this approach can lead to gross errors, and a point estimate may differ significantly from the true value of the estimated parameter (especially for a small sample size).

75. Example

Solution. In this problem it is necessary to estimate two unknown parameters: a and σ2. The log-likelihood function has the form

76.

Discarding in this formula the term that does not depend on a and σ2, we compose the system of likelihood equations. Solving it, we get: a* equal to the sample mean x̄, and (σ2)* equal to the sample variance (1/n) Σ (xi − x̄)2.
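A small numerical check of this result (the true parameters are assumptions used only to generate test data): the maximum likelihood estimates for the normal law are the sample mean and the uncorrected sample variance.

import random
import statistics

random.seed(3)
xs = [random.gauss(1.5, 0.7) for _ in range(50_000)]  # assumed a = 1.5, sigma = 0.7

a_mle = statistics.fmean(xs)
sigma2_mle = statistics.pvariance(xs)  # the 1/n normalization, as in the MLE solution
print(a_mle, sigma2_mle)               # ~ 1.5 and ~ 0.49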

77. CHAPTER 4. INTERVAL ESTIMATION OF PARAMETERS OF A KNOWN DISTRIBUTION

78.

The problem of estimating a parameter of a known distribution can also be solved by constructing an interval in which the true value of the parameter lies with a given probability. This method of estimation is called interval estimation. Usually, to estimate a parameter θ, one constructs the inequality
|θ − θ*| < δ,   (*)
where the number δ characterizes the accuracy of the estimate: the smaller δ, the better the estimate.

79.

P(|θ − θ*| < δ) = γ.   (*)

80. 4.1. Estimation of the mathematical expectation of a normally distributed quantity with a known variance

Let the random variable ξ under study be distributed according to the normal law with known standard deviation σ and unknown mathematical expectation a. It is required to estimate the mathematical expectation of ξ from the value of the sample mean, which we consider, as before, as a random variable X.

81.

We have:
M[X] = a,   (1)
D[X] = σ2/n.   (2)

82.

Substituting (1) and (2) into relation (*), we obtain the confidence interval
x̄ − uσ/√n < a < x̄ + uσ/√n,   (*)
where u is determined from the condition 2Φ(u) = γ, Φ being the Laplace function.

83. 4.2. Estimation of the mathematical expectation of a normally distributed quantity with an unknown variance

It is known that the random variable tn = (X − a)√n / s, where s is the corrected sample standard deviation, has Student's distribution with k = n − 1 degrees of freedom. The probability density of such a quantity is:

85.

86. Student's density distribution with n - 1 degrees of freedom

87.

88.

89.

Note. For a large number of degrees of freedom k, Student's distribution tends to the normal distribution with zero mathematical expectation and unit variance. Therefore, for k ≥ 30 the confidence interval can in practice be found by the formulas for the case of known variance.

90. 4.3. Estimating the standard deviation of a normally distributed quantity

Let the random variable ξ under study be distributed according to the normal law with expectation a and unknown standard deviation σ. We consider two cases: known and unknown mathematical expectation.

91. 4.3.1. A special case: known mathematical expectation

Let the value M[ξ] = a be known, and let only σ, or the variance D[ξ] = σ2, need to be estimated. Recall that for a known mathematical expectation the unbiased estimate of the variance is the sample variance D* = (σ*)2. Using the quantities ξ1, ξ2, ..., ξn defined above, we introduce a random variable Y that takes the values of the sample variance D*:

92.

Consider the random variable Hn = nY/σ2. The summands in this sum are the squares of the random variables (ξi − a)/σ, which have the normal distribution with density fN(x, 0, 1). Then Hn has the distribution χ2 with n degrees of freedom as the sum of squares of n independent standard (a = 0, σ = 1) normal random variables.
93.

Let us determine the confidence interval from the conditions P(x1 < Hn < x2) = γ, where fχ2(x) is the distribution density of χ2 and γ is the reliability (confidence probability).

94.

95.

96.

97. 4.3.2. A special case: unknown mathematical expectation

In practice, both parameters of the normal distribution, the mathematical expectation a and the standard deviation σ, are most often unknown. In this case the construction of the confidence interval is based on Fisher's theorem, from which it follows that the random variable (n − 1)s2/σ2 (where the random variable s2 takes the values of the unbiased sample variance) has the distribution χ2 with n − 1 degrees of freedom.

98.

99. 4.4. Estimating the mathematical expectation of a random variable for an arbitrary sample

The interval estimates of the mathematical expectation M[ξ] obtained for a normally distributed random variable ξ are generally unsuitable for random variables having a different form of distribution. However, there is a situation in which similar interval relations can be used for arbitrary random variables; this takes place for a large sample (n >> 1).

100.

As above, we consider the variants x1, x2,..., xn as values of independent, identically distributed random variables ξ1, ξ2, ..., ξn having expectation M[ξi] = mξ and variance D[ξi] = Dξ, and the resulting sample mean as a value of the random variable X. According to the central limit theorem, the quantity X has an asymptotically normal distribution law with expectation mξ and variance Dξ/n.

101.

Therefore, if the value of the variance of the random variable ξ is known, one can use the approximate formulas of § 4.1; if the value of the variance of ξ is unknown, then for large n one can use the same formulas with the corrected standard deviation s in place of σ.

102.

103.

LECTURE 7

104.

Review of previous material

105. CHAPTER 4. INTERVAL ESTIMATION OF THE PARAMETERS OF A KNOWN DISTRIBUTION

106.

The problem of estimating a parameter of a known distribution can also be solved by constructing an interval in which the true value of the parameter lies with a given probability. This method of estimation is called interval estimation. Usually in mathematics, to estimate a parameter θ, one constructs the inequality
|θ − θ*| < δ,   (*)
where the number δ characterizes the accuracy of the estimate: the smaller δ, the better the estimate.

107.

P(|θ − θ*| < δ) = γ.   (*)

108. 4.1. Estimation of the mathematical expectation of a normally distributed quantity with a known variance

Let the random variable ξ under study be distributed according to the normal law with known
standard deviation σ and
unknown mathematical expectation a.
Required by the value of the sample mean
estimate the mathematical expectation ξ.
As before, we will consider the resulting
sample mean
as random value
values, and the values ​​are the sample variant x1, x2, …,
xn - respectively, as the values ​​​​are the same
distributed independent random variables
, each of which has a mat. expectation a and standard deviation σ.

109.

We have:
M[X] = a,   (1)
D[X] = σ2/n.   (2)

110.

Substituting (1) and (2) into relation (*), we obtain the confidence interval
x̄ − uσ/√n < a < x̄ + uσ/√n,   (*)
where u is determined from the condition 2Φ(u) = γ, Φ being the Laplace function.
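A sketch of the computation of this interval in Python (scipy is assumed available; the numbers are illustrative): u is the quantile of the standard normal law of level (1 + γ)/2, which is equivalent to the condition 2Φ(u) = γ.

import math
from scipy.stats import norm

x_mean, sigma, n, gamma = 10.2, 2.0, 100, 0.95   # illustrative values
u = norm.ppf((1.0 + gamma) / 2.0)                # 1.96 for gamma = 0.95
delta = u * sigma / math.sqrt(n)
print(x_mean - delta, x_mean + delta)            # the confidence interval for a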

111. 4.2. Estimation of the mathematical expectation of a normally distributed quantity with an unknown variance

112.

It is known that the random variable tn = (X − a)√n / s, where s is the corrected sample standard deviation, has Student's distribution with k = n − 1 degrees of freedom. The probability density of such a quantity is:

113.

114. Student's density distribution with n - 1 degrees of freedom

115.

116.

117.

Note. For a large number of degrees of freedom k, Student's distribution tends to the normal distribution with zero mathematical expectation and unit variance. Therefore, for k ≥ 30 the confidence interval can in practice be found by the formulas for the case of known variance.
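A sketch of the Student-based interval (scipy assumed; illustrative numbers): for k = n − 1 degrees of freedom the interval is x̄ ± t s/√n.

import math
from scipy.stats import t

x_mean, s, n, gamma = 10.2, 2.1, 15, 0.95        # illustrative values
t_q = t.ppf((1.0 + gamma) / 2.0, df=n - 1)       # ~ 2.145 for 14 degrees of freedom
delta = t_q * s / math.sqrt(n)
print(x_mean - delta, x_mean + delta)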

118. 4.3. Estimating the standard deviation of a normally distributed quantity

Let the random variable ξ under study be distributed according to the normal law with expectation a and unknown standard deviation σ.
We consider two cases: known and unknown mathematical expectation.

119. 4.3.1. A special case: known mathematical expectation

Let the value M[ξ] = a be known, and let only σ, or the variance D[ξ] = σ2, need to be estimated. Recall that for a known mathematical expectation the unbiased estimate of the variance is the sample variance D* = (σ*)2. Using the quantities ξ1, ξ2, ..., ξn defined above, we introduce a random variable Y that takes the values of the sample variance D*:

120.

Consider the random variable Hn = nY/σ2. The summands in this sum are the squares of the random variables (ξi − a)/σ, which have the normal distribution with density fN(x, 0, 1). Then Hn has the distribution χ2 with n degrees of freedom as the sum of squares of n independent standard (a = 0, σ = 1) normal random variables.

121.

Let us determine the confidence interval from the conditions P(x1 < Hn < x2) = γ, where fχ2(x) is the distribution density of χ2 and γ is the reliability (confidence probability). The value of γ is numerically equal to the area of the shaded figure in the figure.

122.

123.

124.

125. 4.3.2. A special case: unknown mathematical expectation

In practice, the most common situation is the one where both parameters of the normal distribution are unknown: the mathematical expectation a and the standard deviation σ. In this case the construction of the confidence interval is based on Fisher's theorem, from which it follows that the random variable (n − 1)s2/σ2 (where the random variable s2 takes the values of the unbiased sample variance) has the distribution χ2 with n − 1 degrees of freedom.

126.
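A sketch of the resulting interval for the variance (scipy assumed; illustrative numbers): since (n − 1)s2/σ2 has the χ2 distribution with n − 1 degrees of freedom, a γ-confidence interval for σ2 is ((n − 1)s2/x2, (n − 1)s2/x1), where x1 and x2 are the χ2 quantiles of levels (1 − γ)/2 and (1 + γ)/2.

from scipy.stats import chi2

s2, n, gamma = 4.4, 20, 0.95                     # illustrative values
x1 = chi2.ppf((1.0 - gamma) / 2.0, df=n - 1)
x2 = chi2.ppf((1.0 + gamma) / 2.0, df=n - 1)
print((n - 1) * s2 / x2, (n - 1) * s2 / x1)      # interval for sigma^2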

127. 4.4. Estimating the mathematical expectation of a random variable for an arbitrary sample

The interval estimates of the mathematical expectation M[ξ] obtained for a normally distributed random variable ξ are generally unsuitable for random variables having a different form of distribution. However, there is a situation in which similar interval relations can be used for arbitrary random variables; this takes place for a large sample (n >> 1).

128.

As above, we consider the variants x1, x2,..., xn as values of independent, identically distributed random variables ξ1, ξ2, ..., ξn having expectation M[ξi] = mξ and variance D[ξi] = Dξ, and the resulting sample mean as a value of the random variable X. According to the central limit theorem, the quantity X has an asymptotically normal distribution law with expectation mξ and variance Dξ/n.

129.

Therefore, if the value of the variance of the random variable ξ is known, we can use the approximate formulas of § 4.1. If the value of the variance of ξ is unknown, then for large n one can use the same formulas with s in place of σ, where s is the corrected standard deviation.

130.

End of the review of previous material.

131. CHAPTER 5. VERIFICATION OF STATISTICAL HYPOTHESES

132.

A statistical hypothesis is a hypothesis about the form of an unknown distribution or about the parameters of a known distribution of a random variable.
The hypothesis to be tested, usually denoted H0, is called the null or main hypothesis.
The additionally used hypothesis H1, contradicting the hypothesis H0, is called the competing or alternative hypothesis.
Statistical verification of the advanced null hypothesis H0 consists in comparing it with the sample data. In such a check, two types of errors may occur:
a) errors of the first kind: cases when the correct hypothesis H0 is rejected;
b) errors of the second kind: cases when the wrong hypothesis H0 is accepted.

133.

The probability of an error of the first kind is called the level of significance and is denoted by α.
The main technique for testing a statistical hypothesis is that, from the available sample, the value of a statistical criterion is calculated: some random variable T with a known distribution law. The range of values of T for which the main hypothesis H0 must be rejected is called the critical region, and the range of values of T for which this hypothesis can be accepted is the region of acceptance of the hypothesis.

134.

135. 5.1. Testing hypotheses about the parameters of a known distribution

5.1.1. Testing a hypothesis about the mathematical expectation of a normally distributed random variable
Let the random variable ξ have a normal distribution. We need to test the assumption that its mathematical expectation is some number a0. We consider separately the cases where the variance of ξ is known and where it is unknown.

136.

In the case of known variance D[ξ] = σ2, as in § 4.1, we define a random variable X that takes the values of the sample mean. The hypothesis H0 is initially formulated as M[ξ] = a0. Since the sample mean X is an unbiased estimate of M[ξ], the hypothesis H0 can be represented as M[X] = a0.
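A sketch of this test in Python (scipy assumed; illustrative numbers): under H0 the statistic z = (x̄ − a0)√n/σ is standard normal, and H0 is rejected when |z| exceeds the critical value for the significance level α.

import math
from scipy.stats import norm

x_mean, a0, sigma, n, alpha = 10.6, 10.0, 2.0, 64, 0.05   # illustrative values
z = (x_mean - a0) * math.sqrt(n) / sigma
z_crit = norm.ppf(1.0 - alpha / 2.0)                      # 1.96 for alpha = 0.05
print(z, z_crit, "reject H0" if abs(z) > z_crit else "accept H0")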

137.

Taking into account the unbiasedness of the corrected sample variance, the null hypothesis can be written as follows: M[s2] = σ02, where the random variable s2 takes the values of the corrected sample variance of ξ and is similar to the random variable Z considered in § 4.2.
As a statistical criterion we choose the random variable F, taking the value of the ratio of the larger sample variance to the smaller one.

145.

The random variable F has the Fisher-Snedecor distribution with the numbers of degrees of freedom k1 = n1 − 1 and k2 = n2 − 1, where n1 is the size of the sample for which the larger corrected variance was found, and n2 is the size of the second sample, for which the smaller variance was found.
We consider two types of competing hypotheses:

146.

147.
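A sketch of the criterion (scipy assumed; illustrative numbers) for the one-sided competing hypothesis: F is the ratio of the larger corrected variance to the smaller one and is compared with the quantile of the Fisher-Snedecor distribution.

from scipy.stats import f

s2_big, s2_small, n1, n2, alpha = 6.0, 2.5, 21, 16, 0.05   # illustrative values
F = s2_big / s2_small
f_crit = f.ppf(1.0 - alpha, dfn=n1 - 1, dfd=n2 - 1)
print(F, f_crit, "variances differ" if F > f_crit else "no significant difference")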

148. 5.1.3. Comparison of mathematical expectations of independent random variables

Let us first consider the case of a normal distribution of random variables with known variances, and then, based on it, the more general case of an arbitrary distribution of the quantities for sufficiently large independent samples.
Let the random variables ξ1 and ξ2 be independent and normally distributed, and let their variances D[ξ1] and D[ξ2] be known. (For example, they may be found from some other experiment or calculated theoretically.) Samples of sizes n1 and n2, respectively, are extracted. Let X1 and X2 be the sample means of these samples. It is required, from the sample means, at a given significance level α, to test the hypothesis of the equality of the mathematical expectations of the random variables under consideration.
Hypotheses about the parameters of a distribution can be made from a priori considerations, based on the conditions of the experiment, and then the assumptions about the parameters of the distribution are examined as shown previously. However, very often there arises the need to verify a hypothesis about the law of distribution itself. Statistical tests designed for such checks are usually called goodness-of-fit criteria.

154.

Several goodness-of-fit criteria are known. The advantage of Pearson's criterion is its universality: with its help one can test hypotheses about various distribution laws.
Pearson's criterion is based on comparing the frequencies found from the sample (empirical frequencies) with the frequencies calculated using the tested distribution law (theoretical frequencies). Usually the empirical and theoretical frequencies differ. We need to find out whether the discrepancy between the frequencies is accidental, or whether it is significant and is explained by the fact that the theoretical frequencies were calculated on the basis of an incorrect hypothesis about the distribution of the general population.
Pearson's criterion, like any other, answers the question of whether there is agreement between the proposed hypothesis and the empirical data at a given level of significance.

155. 5.2.1. Testing the Hypothesis of Normal Distribution

Let there be a random variable ξ and a sample of sufficiently large size n with a large number of different variant values. It is required, at significance level α, to test the null hypothesis H0 that the random variable ξ is normally distributed.
For the convenience of processing the sample, we take two numbers α and β and divide the interval [α, β] into s subintervals. We will assume that the variant values falling into each subinterval are approximately equal to the number specifying the middle of the subinterval. Counting the number of variants falling into each subinterval, we obtain the empirical frequencies.
The quantile of order α (0 < α < 1) of a continuous random variable ξ is the number xα for which P(ξ < xα) = α.
The quantile x1/2 is called the median of the random variable ξ, the quantiles x1/4 and x3/4 are its quartiles, and x0.1, x0.2,..., x0.9 are its deciles.
For the standard normal distribution (a = 0, σ = 1) we have FN(x, 0, 1) = 1/2 + Φ(x), where FN(x, a, σ) is the distribution function of a normally distributed random variable and Φ(x) is the Laplace function.
The quantile xα of the standard normal distribution for a given α can be found from the relation Φ(xα) = α − 1/2.

162. 6.2. Student's distribution

If ξ0, ξ1, ..., ξn are independent random variables having the normal distribution with zero mathematical expectation and unit variance, then the distribution of the random variable
tn = ξ0 / √((ξ1² + ... + ξn²)/n)
is called Student's t-distribution with n degrees of freedom (W. S. Gosset).

The phenomenon of stabilization of the frequency of occurrence of random events, discovered on large and varied material, at first had no justification at all and was perceived as a purely empirical fact. The first theoretical result in this area was the famous Bernoulli theorem, published in 1713, which laid the foundation for the laws of large numbers.

Bernoulli's theorem is, in its content, a limit theorem, i.e., a statement of asymptotic meaning, saying what happens to the probabilistic parameters for a large number of observations. The progenitor of all the numerous modern statements of this type is precisely Bernoulli's theorem.

Today it seems that the mathematical law of large numbers is a reflection of a common property of many real processes.

Wishing to give the law of large numbers the greatest possible scope, corresponding to the far from exhausted potential for applying this law, one of the greatest mathematicians of our century, A. N. Kolmogorov, formulated its essence as follows: the law of large numbers is "a general principle by virtue of which the cumulative action of a large number of random factors leads to a result that is almost independent of chance."

Thus, the law of large numbers has, as it were, two interpretations. One is mathematical, associated with specific mathematical models, formulations, and theories; the second is more general, going beyond this framework. The second interpretation is associated with the phenomenon, often noted in practice, of the formation of a directed action, to one degree or another, against the background of a large number of hidden or visible acting factors that outwardly have no such direction. Examples related to the second interpretation are pricing in a free market and the formation of public opinion on a particular issue.

Having noted this general interpretation of the law of large numbers, let us turn to the specific mathematical formulations of this law.

As we said above, the first and fundamentally most important for the theory of probability is Bernoulli's theorem. The content of this mathematical fact, which reflects one of the most important regularities of the surrounding world, is reduced to the following.

Consider a sequence of unrelated (i.e., independent) trials, the conditions of which are reproduced invariably from trial to trial. The result of each trial is the appearance or non-appearance of the event of interest to us, A.

This procedure (the Bernoulli scheme) can obviously be considered typical for many practical areas: "boy or girl" in the sequence of newborns, daily meteorological observations ("it rained or it did not"), control of the flow of manufactured products ("normal or defective"), etc.

The frequency of occurrence of the event A in n trials (the ratio mA/n, where mA is the number of occurrences of the event A in n trials) has, as n grows, a tendency to stabilize its value; this is an empirical fact.

Bernoulli's theorem. Let us choose an arbitrarily small positive number ε. Then
P(|mA/n − p| < ε) → 1 as n → ∞.   (9.1)

We emphasize that the mathematical fact established by Bernoulli in a certain mathematical model (in the Bernoulli scheme) should not be confused with the empirically established regularity of frequency stability. Bernoulli was not satisfied with the statement of formula (9.1) alone; taking into account the needs of practice, he also gave an estimate of the inequality appearing in this formula. We will return to this interpretation below.

Bernoulli's law of large numbers has been the subject of research by a large number of mathematicians who have sought to refine it. One such refinement was obtained by the English mathematician de Moivre and is currently called the de Moivre-Laplace theorem. In the Bernoulli scheme, consider the sequence of normalized quantities
xn = (mA − np)/√(npq).   (9.2)

Integral theorem of de Moivre-Laplace. Pick any two numbers x1 and x2 with x1 < x2; then, as n → ∞,
P(x1 < xn < x2) → F(x2) − F(x1).   (9.3)

If in the right side of formula (9.3) the variable x1 is made to tend to minus infinity, then the resulting limit, which depends only on x2 (the index 2 can then be dropped), will be a distribution function; it is called the standard normal distribution, or the Gauss law.

The right side of formula (9.3) is equal to γ = F(x2) − F(x1). Since F(x2) → 1 as x2 → ∞ and F(x1) → 0 as x1 → −∞, by choosing a sufficiently large x2 > 0 and a negative x1 sufficiently large in absolute value, we obtain the inequality:

Taking into account formula (9.2), we can obtain practically reliable estimates:

If the reliability γ = 0.95 (i.e., an error probability of 0.05) seems insufficient to someone, one can "play it safe" and build a slightly wider confidence interval using the three sigma rule mentioned above:

This interval corresponds to a very high confidence level γ = 0.997 (see the normal distribution tables).

Consider the example of tossing a coin. Let us toss a coin n = 100 times. Can it happen that the frequency p* differs greatly from the probability p = 0.5 (assuming the coin is symmetric); for example, can it be equal to zero? For this, heads must not come up even once. Such an event is theoretically possible, but we have already calculated such probabilities: for this event it equals 2^−100. This value is extremely small; its order is a number with thirty zeros after the decimal point. An event with such a probability can safely be considered practically impossible. What deviations of the frequency from the probability are practically possible with a large number of experiments? Using the de Moivre-Laplace theorem, we answer this question as follows: with probability γ = 0.95 the frequency of heads p* fits into the confidence interval:

If an error of 0.05 seems too large, it is necessary to increase the number of experiments (coin tosses). As n increases, the width of the confidence interval decreases (unfortunately, not as fast as we would like, but inversely proportional to √n). For example, for n = 10 000 we get that p* lies in the confidence interval 0.5 ± 0.01 with confidence probability γ = 0.95.
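A sketch of the computation behind these numbers: the half-width of the interval is u√(pq/n) with u = 1.96 for γ = 0.95, and it falls as 1/√n.

import math

p, u = 0.5, 1.96
for n in (100, 10_000):
    half = u * math.sqrt(p * (1 - p) / n)
    print(n, f"0.5 +/- {half:.3f}")   # ~0.098 for n = 100, ~0.0098 for n = 10000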

Thus, we have dealt quantitatively with the question of the approximation of frequency to probability.

Now let's find the probability of an event from its frequency and estimate the error of this approximation.

Suppose we have made a large number n of experiments (tossed a coin), found the frequency of the event A, and want to estimate its probability p.

From the law of large numbers it follows that
p ≈ p*.   (9.7)

Let us now estimate the practically possible error of the approximate equality (9.7). To do this, we use inequality (9.5) in the form:

To find p from p*, it is necessary to solve inequality (9.8); for this it must be squared and the corresponding quadratic equation solved. As a result, we get:

where

For an approximate estimate of p from p*, one can replace p by p* on the right-hand side of formula (9.8), or assume in formulas (9.10), (9.11) that

Then we get:

Suppose that in n = 400 experiments the frequency value p* = 0.25 was obtained; then at the confidence level γ = 0.95 we find:
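A sketch of this computation: by the approximate formula, p ≈ p* ± u√(p*(1 − p*)/n).

import math

p_star, n, u = 0.25, 400, 1.96        # u corresponds to gamma = 0.95
half = u * math.sqrt(p_star * (1 - p_star) / n)
print(f"{p_star} +/- {half:.3f}")     # 0.25 +/- 0.042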

But what if we need to know the probability more accurately, with an error of, say, no more than 0.01? To do this, you need to increase the number of experiments.

Assuming in formula (9.12) the probability p = 0.25, we equate the error value to the given value 0.01 and obtain an equation for n:

Solving this equation, we get n ≈ 7500.

Let us now consider one more question: can the deviation of frequency from probability obtained in experiments be explained by random causes, or does this deviation show that the probability is not what we assumed it to be? In other words, does experience confirm the accepted statistical hypothesis or, on the contrary, require it to be rejected?

Let, for example, in tossing a coin n = 800 times, the heads frequency p* = 0.52 be obtained. We suspected that the coin was not symmetric. Is this suspicion justified? To answer this question, we proceed from the assumption that the coin is symmetric (p = 0.5). Let us find the confidence interval (with confidence probability γ = 0.95) for the frequency of heads. If the value p* = 0.52 obtained in the experiment fits into this interval, everything is normal: the accepted hypothesis about the symmetry of the coin does not contradict the experimental data. Formula (9.12) for p = 0.5 gives the interval 0.5 ± 0.035; the obtained value p* = 0.52 fits into this interval, which means the coin has to be "cleared" of the suspicion of asymmetry.

Similar methods are used to judge whether various deviations from the mathematical expectation observed in random phenomena are accidental or "significant". For example, was the underweight found in several samples of packaged goods accidental, or does it indicate systematic deception of buyers? Did the recovery rate increase by chance in the patients who used the new drug, or is this due to the effect of the drug?

The normal law plays an especially important role in probability theory and its practical applications. We have already seen above that a random variable, the number of occurrences of some event in the Bernoulli scheme, reduces to the normal law as n → ∞. However, there is a much more general result.

Central limit theorem. The sum of a large number of independent (or weakly dependent) random variables, comparable with each other in the order of their variances, is distributed according to the normal law, regardless of the distribution laws of the summands. The above statement is a rough qualitative formulation of the central limit theorem. This theorem has many forms that differ from each other in the conditions the random variables must satisfy in order for their sum to "normalize" as the number of summands increases.

The density of the normal distribution f(x) is expressed by the formula

f(x) = (1/(σ√(2π))) exp(−(x − a)²/(2σ²)),   (9.13)

where a is the mathematical expectation of the random variable x and σ = √D is its standard deviation.

To calculate the probability that x falls within the interval (x1, x2), the integral

P(x1 < x < x2) = ∫ from x1 to x2 of f(x) dx   (9.14)

is used.

Since the integral (9.14) with density (9.13) is not expressed in terms of elementary functions ("it is not taken"), the tables of the integral distribution function of the standard normal distribution, with a = 0, σ = 1, are used to calculate (9.14) (such tables are available in any textbook on probability theory):

F(x) = (1/√(2π)) ∫ from −∞ to x of exp(−t²/2) dt.   (9.15)

Probability (9.14) is expressed through function (9.15) by the formula

P(x1 < x < x2) = F((x2 − a)/σ) − F((x1 − a)/σ).   (9.16)

Example. Find the probability that the random variable x, having a normal distribution with parameters a, σ, deviates from its mathematical expectation in absolute value by no more than 3σ.

Using formula (9.16) and the table of the distribution function of the normal law, we get:

P(|x − a| < 3σ) = F(3) − F(−3) ≈ 0.9973.

Example. In each of 700 independent experiments an event A occurs with constant probability p = 0.35. Find the probability that the event A occurs:

  • 1) exactly 270 times;
  • 2) less than 270 and more than 230 times;
  • 3) more than 270 times.

We find the mathematical expectation a = np = 700 · 0.35 = 245 and the standard deviation σ = √(npq) = √(700 · 0.35 · 0.65) ≈ 12.62 of the random variable, the number of occurrences of the event A.

We find the centered and normalized value x = (270 − 245)/12.62 ≈ 1.98.

From the density tables of the normal distribution we find f(1.98).

Let us now find P700(x > 270) = 1 − F(1.98) = 1 − 0.97615 = 0.02385.

A serious step in the study of the problems of large numbers was made in 1867 by P. L. Chebyshev. He considered a very general case, when nothing is required of the independent random variables except the existence of mathematical expectations and variances.

Chebyshev's inequality. For an arbitrarily small positive number ε, the following inequality holds:

P(|x − E(x)| ≥ ε) ≤ D(x)/ε².

Chebyshev's theorem. If x1, x2, ..., xn are pairwise independent random variables, each of which has a mathematical expectation E(xi) = ai and a variance D(xi) = σi², and the variances are uniformly bounded (σi² ≤ C, i = 1, 2, ...), then for an arbitrarily small positive number ε the relation

P(|(x1 + ... + xn)/n − (a1 + ... + an)/n| < ε) → 1 as n → ∞   (9.19)

is fulfilled.

Corollary. If ai = a and σi = σ, i = 1, 2, ..., then

P(|(x1 + ... + xn)/n − a| < ε) → 1 as n → ∞.

Task. How many times must a coin be tossed so that, with probability at least γ = 0.997, it could be asserted that the frequency of heads falls in the interval (0.499; 0.501)?

Suppose the coin is symmetric: p = q = 0.5. We apply Chebyshev's theorem in the form of formula (9.19) to the random variable X, the frequency of heads in n coin tosses. We have already shown above that X = (X1 + X2 + ... + Xn)/n, where Xi is a random variable that takes the value 1 if heads comes up and the value 0 if tails comes up. So:

We write inequality (9.19) for an event opposite to the event indicated under the probability sign:

In our case ε = 0.001 and σ² = pq = 0.25, and m is the number of heads in n tosses. Substituting these quantities into the last inequality and taking into account that, by the condition of the problem, it must be satisfied with probability at least 0.997, we obtain:

n ≥ pq/((1 − γ)ε²) = 0.25/(0.003 · 0.001²) ≈ 8.3 · 10⁷.

The given example illustrates the possibility of using Chebyshev's inequality for estimating the probabilities of certain deviations of random variables (as well as for problems, like this example, related to the calculation of these probabilities). The advantage of Chebyshev's inequality is that it does not require knowledge of the laws of distribution of the random variables. Of course, if such a law is known, then Chebyshev's inequality gives too rough an estimate.

Consider the same example, but using the fact that coin tossing is a special case of the Bernoulli scheme. The number of successes (in the example, the number of heads) obeys the binomial law, and for large n this law can be represented, by the integral theorem of de Moivre-Laplace, as the normal law with mathematical expectation a = np = 0.5n and standard deviation σ = √(npq) = 0.5√n. The random variable, the frequency of heads, has mathematical expectation 0.5 and standard deviation 0.5/√n.

Then we have:

P(|X − 0.5| < 0.001) ≈ 2F(0.001√n/0.5) − 1 ≥ 0.997.

From the last inequality we get, using the normal distribution tables (the quantile corresponding to 0.997 is 3, the three sigma rule):

0.001√n/0.5 ≥ 3, whence n ≥ (3 · 0.5/0.001)² = 2.25 · 10⁶.

We see that the normal approximation gives a number of coin tosses ensuring the given error in estimating the probability of heads that is 37 times smaller than the estimate obtained using Chebyshev's inequality (but Chebyshev's inequality makes it possible to carry out similar calculations even in the case when we have no information on the distribution law of the random variable under study).
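A sketch comparing the two sample-size estimates for this task: Chebyshev's inequality requires pq/(nε²) ≤ 1 − γ, while the normal approximation requires ε√n/σ ≥ 3.

eps, pq, gamma = 0.001, 0.25, 0.997

n_chebyshev = pq / ((1.0 - gamma) * eps ** 2)   # from pq / (n eps^2) <= 1 - gamma
n_normal = (3.0 * pq ** 0.5 / eps) ** 2         # from the three sigma rule
print(n_chebyshev)                              # ~ 8.3e7
print(n_normal)                                 # 2.25e6
print(n_chebyshev / n_normal)                   # ~ 37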

Let us now consider an applied problem solved with the help of formula (9.16).

Competition problem. Two competing railway companies each have one train running between Moscow and St. Petersburg. These trains are equipped approximately equally, and they also depart and arrive at approximately the same times. Let us assume that n = 1000 passengers independently and randomly choose a train, so that as a mathematical model of the choice of train by the passengers we may use the Bernoulli scheme with n trials and success probability p = 0.5. The company must decide how many seats to provide on the train, taking into account two mutually contradictory conditions: on the one hand, it does not want to have empty seats; on the other hand, it does not want people to be dissatisfied because of a lack of seats (next time they will prefer the competing firm). Of course, one can provide n = 1000 seats on the train, but then there will certainly be empty seats. The random variable, the number of passengers on the train, within the accepted mathematical model, by the integral theorem of de Moivre-Laplace obeys the normal law with mathematical expectation a = np = n/2 and variance σ² = npq = n/4, respectively. The probability that more than s passengers come to the train is determined by the relation:

Let us set the risk level α, i.e., the probability that more than s passengers arrive:

From here:

If xα is the root of the last equation, which is found from the tables of the distribution function of the normal law, we get:

s = n/2 + xα√n/2.

If, for example, n = 1000 and α = 0.01 (this risk level means that the number of seats s will be sufficient in 99 cases out of 100), then xα ≈ 2.33 and s = 537 seats. Moreover, if both companies accept the same risk level α = 0.01, then the two trains will have a total of 1074 seats, 74 of which will be empty. Similarly, one can calculate that 514 seats would be enough in 80% of all cases, and 549 seats in 999 cases out of 1000.
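A sketch of the seat computation (scipy assumed): s = np + xα√(npq), with xα the quantile of level 1 − α of the standard normal law.

import math
from scipy.stats import norm

n, p, alpha = 1000, 0.5, 0.01
x_a = norm.ppf(1.0 - alpha)                     # ~ 2.33
s = n * p + x_a * math.sqrt(n * p * (1 - p))
print(math.ceil(s))                             # 537 seats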

Similar considerations apply to other competitive service problems. For example, if m cinemas compete for the same n spectators, one should take p = 1/m. We get that the number of seats s in a cinema should be determined by the relation:

The total number of empty seats is equal to ms − n.

For α = 0.01, n = 1000 and m = 2, 3, 4 the values of this number are approximately equal to 74, 126, 147, respectively.

Let us consider one more example. Let a train consist of n = 100 wagons. The weight of each wagon is a random variable with mathematical expectation a = 65 tons and standard deviation σ = 9 tons. A locomotive can pull the train if its weight does not exceed 6600 tons; otherwise a second locomotive has to be attached. We need to find the probability that this will not be necessary.

The weight of the train X is the sum of the weights of the individual wagons: random variables having the same mathematical expectation a = 65 and the same variance D = σ² = 81. By the rule of addition of mathematical expectations: E(X) = 100 · 65 = 6500. By the rule of addition of variances: D(X) = 100 · 81 = 8100. Taking the root, we find the standard deviation: 90. In order for one locomotive to be able to pull the train, it is necessary that the weight of the train X not exceed the limit, i.e., fall within the interval (0; 6600). The random variable X, a sum of 100 summands, can be considered normally distributed. By formula (9.16) we get:

It follows that the locomotive will "handle" the train with probability approximately 0.864. Let us now reduce the number of wagons in the train by two, i.e., take n = 98. Calculating now the probability that the locomotive will "handle" the train, we get a value of the order of 0.99, i.e., a practically certain event, although only two wagons had to be removed for this.
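A sketch of this computation via the normal law (scipy assumed): the train weight is approximately N(65n, 9√n), and we want the probability that it does not exceed 6600; the text's 0.864 comes from rounded tables.

from scipy.stats import norm

for n in (100, 98):
    mean, std = n * 65.0, 9.0 * n ** 0.5           # 6500 and 90 for n = 100
    print(n, norm.cdf(6600, loc=mean, scale=std))  # ~ 0.87 and ~ 0.995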

So, if we are dealing with sums of a large number of random variables, we can use the normal law. Naturally, the question arises: how many random variables must be added so that the distribution law of the sum is already "normalized"? It depends on the distribution laws of the summands. There are such intricate laws that normalization occurs only with a very large number of summands. But these laws were invented by mathematicians; nature, as a rule, specifically does not arrange such troubles. Usually in practice, in order to be able to use the normal law, five or six summands are sufficient.

The speed with which the distribution law of a sum of identically distributed random variables "normalizes" can be illustrated by the example of random variables with a uniform distribution on the interval (0, 1). The curve of such a distribution has the form of a rectangle, which is quite unlike the normal law. Let us add two such independent quantities: we get a random variable distributed according to the so-called Simpson law, whose graph has the form of an isosceles triangle. It does not look like the normal law either, but it is better. And if we add three such uniformly distributed random variables, we get a curve consisting of three segments of parabolas, very similar to a normal curve. Adding six such random variables gives a curve that is practically indistinguishable from a normal one. This is the basis of a widely used method for obtaining a normally distributed random variable, using the generators of uniformly distributed (0, 1) random numbers with which all modern computers are equipped.
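A small simulation sketch of this "normalization": we standardize sums of k uniform (0, 1) values (mean k/2, variance k/12) and look at the fraction falling within one standard deviation, which for the normal law is about 0.683.

import random

random.seed(4)
for k in (1, 2, 3, 6):
    z = []
    for _ in range(100_000):
        s = sum(random.random() for _ in range(k))
        z.append((s - k / 2) / (k / 12) ** 0.5)   # standardized sum
    inside = sum(abs(v) <= 1 for v in z) / len(z)
    print(k, round(inside, 3))                    # approaches 0.683 as k grows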

The following method is recommended as one practical way of checking this. We build a confidence interval for the frequency of the event with level γ = 0.997 according to the three sigma rule:

and if both of its ends do not go beyond the segment (0, 1), then the normal law can be used. If either of the boundaries of the confidence interval goes outside the segment (0, 1), then the normal law cannot be used. However, under certain conditions the binomial law for the frequency of some random event, if it does not tend to the normal law, can tend to another law.

In many applications the Bernoulli scheme is used as the mathematical model of a random experiment in which the number of trials n is large and the random event is quite rare, i.e., p is small while λ = np is neither small nor large (it fluctuates roughly in the range from 0.5 to 20). In this case the following relation holds:

Formula (9.20) is called the Poisson approximation for the binomial law, since the probability distribution on its right side is called Poisson's law. The Poisson distribution is said to be the probability distribution of rare events, since it occurs in the limit n → ∞, p → 0, with λ = np remaining bounded.

Example. Birthdays. What is the probability P500(k) that in a company of 500 people, k people were born on New Year's Day? If these 500 people are chosen at random, then the Bernoulli scheme can be applied with success probability p = 1/365. Then

calculations of the probabilities for various k give the following values: P1 = 0.3484...; P2 = 0.2388...; P3 = 0.1089...; P4 = 0.0372...; P5 = 0.0101...; P6 = 0.0023... The corresponding approximations by the Poisson formula for λ = 500 · 1/365 ≈ 1.37

give the following values: P1 = 0.3481...; P2 = 0.2385...; P3 = 0.1089...; P4 = 0.0373...; P5 = 0.0102...; P6 = 0.0023... All the errors are only in the fourth decimal place.
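A sketch reproducing these numbers in Python: the exact binomial probabilities with n = 500, p = 1/365 against the Poisson values with λ = 500/365 ≈ 1.37.

import math

n, p = 500, 1.0 / 365.0
lam = n * p

for k in range(1, 7):
    binom = math.comb(n, k) * p ** k * (1 - p) ** (n - k)
    poisson = lam ** k * math.exp(-lam) / math.factorial(k)
    print(k, round(binom, 4), round(poisson, 4))   # agree to ~4 decimal places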

Let us give examples of situations where Poisson's law of rare events can be used.

At a telephone exchange, an incorrect connection occurs with small probability p, usually p ≈ 0.005. The Poisson formula then allows one to find the probability of k incorrect connections for a given total number of connections n ≈ 1000, when λ = np = 1000 · 0.005 = 5.

When baking buns, raisins are placed in the dough. Because of the stirring, it should be expected that the frequency of buns containing k raisins approximately follows the Poisson distribution Pn(k, λ), where λ is the density of raisins in the dough.

A radioactive substance emits α-particles. The event that the number of α-particles reaching a given region of space in the course of time t takes a fixed value k obeys Poisson's law.

The number of living cells with altered chromosomes under the influence of X-rays follows the Poisson distribution.

So, the laws of large numbers make it possible to solve the problem of mathematical statistics associated with estimating the unknown probabilities of the elementary outcomes of a random experiment. Thanks to this knowledge, the methods of probability theory become practically meaningful and useful. The laws of large numbers also make it possible to obtain information about unknown elementary probabilities in another form: the form of testing statistical hypotheses.

Let us consider in more detail the formulation and the probabilistic mechanism for solving problems of testing statistical hypotheses.