Kendall's rank correlation coefficient: how to calculate

KENDALL'S RANK CORRELATION COEFFICIENT

One of the sample measures of dependence between two random variables (features) $X$ and $Y$, based on the ranking of the sample elements $(X_1, Y_1), \ldots, (X_n, Y_n)$. Kendall's rank correlation coefficient is thus a rank statistic and is defined by the formula

$$\tau = \frac{2S}{n(n-1)},$$

where $r_i$ is the rank of $Y$ in the pair $(X, Y)$ whose $X$ has rank $i$, $S = 2N - n(n-1)/2$, and $N$ is the number of pairs of sample elements for which both $j > i$ and $r_j > r_i$. Always $-1 \le \tau \le 1$. Kendall's rank correlation coefficient was widely used as a sample measure of dependence by M. Kendall (see Kendall M., Rank correlations, in the literature list).
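The definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming there are no tied ranks; the function name `kendall_tau_from_ranks` is made up for this example and does not come from the source.

```python
def kendall_tau_from_ranks(r):
    """r[i] is the rank of Y for the pair whose X has rank i + 1 (no ties assumed)."""
    n = len(r)
    # N: number of pairs (i, j) with j > i and r[j] > r[i]
    N = sum(1 for i in range(n) for j in range(i + 1, n) if r[j] > r[i])
    S = 2 * N - n * (n - 1) // 2
    return 2 * S / (n * (n - 1))


print(kendall_tau_from_ranks([1, 2, 3, 4]))  # identical orderings
print(kendall_tau_from_ranks([4, 3, 2, 1]))  # reversed orderings
```

Identical orderings give N = n(n-1)/2 and hence τ = 1; reversed orderings give N = 0 and τ = -1.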

Kendall's rank correlation coefficient is used to test the hypothesis that two random variables are independent. If the independence hypothesis is true, then $\mathsf{E}\tau = 0$ and $\mathsf{D}\tau = \frac{2(2n+5)}{9n(n-1)}$. For small samples, the independence hypothesis is tested using special tables (see Bolshev L. N., Smirnov N. V., Tables of mathematical statistics). For $n > 10$, the normal approximation to the distribution of $\tau$ is used: if

$$|\tau| > u_{\alpha/2}\sqrt{\frac{2(2n+5)}{9n(n-1)}},$$

then the hypothesis of independence is rejected; otherwise it is accepted. Here $\alpha$ is the significance level and $u_{\alpha/2}$ is the percentage point of the normal distribution. Kendall's rank correlation coefficient, like any rank statistic, can be used to detect dependence between two qualitative characteristics, provided the sample elements can be ordered with respect to these characteristics. If $X$ and $Y$ have a joint normal distribution with correlation coefficient $\rho$, then the relationship between Kendall's rank correlation coefficient and $\rho$ has the form

$$\mathsf{E}\tau = \frac{2}{\pi}\arcsin\rho.$$
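For the jointly normal case, the link between τ and ρ can be illustrated numerically. The sketch below assumes the standard relation Eτ = (2/π) arcsin ρ; the function names are hypothetical, and the inverse is only a rough plug-in conversion, not an estimator endorsed by the source.

```python
from math import asin, pi, sin


def expected_tau(rho):
    """Expected Kendall tau under a bivariate normal law with correlation rho."""
    return 2 / pi * asin(rho)


def rho_from_tau(tau):
    """Inverse of the relation above: plug-in value of rho for a given tau."""
    return sin(pi * tau / 2)


print(expected_tau(0.5))    # about 1/3
print(rho_from_tau(1 / 3))  # about 0.5
```

For moderate correlations τ is noticeably smaller in absolute value than ρ, which is worth remembering when the two coefficients are compared.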

See also Spearman rank correlation, Rank test.

Lit.: Kendall M., Rank correlations, trans. from English, M., 1975; Van der Waerden B. L., Mathematical statistics, trans. from German, M., 1960; Bolshev L. N., Smirnov N. V., Tables of mathematical statistics, M., 1965.

A. V. Prokhorov.


Mathematical encyclopedia. - M.: Soviet Encyclopedia. I. M. Vinogradov. 1977-1985.


When ranking, the expert must arrange the evaluated elements in ascending (descending) order of their preference and assign ranks to each of them in the form of natural numbers. In direct ranking, the most preferred element has rank 1 (sometimes 0), and the least preferred element has rank m.

If the expert cannot carry out a strict ranking because, in his opinion, some elements are the same in preference, then it is permissible to assign the same ranks to such elements. To ensure that the sum of ranks is equal to the sum of places of ranked elements, so-called standardized ranks are used. The standardized rank is the arithmetic mean of the numbers of elements in a ranked series that are the same in preference.

Example 2.6. The expert ranked the six items by preference as follows:

Then the standardized ranks of these elements will be

Thus, the sum of the ranks assigned to the elements will be equal to the sum of the numbers in the natural series.
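The rule for standardized ranks can be sketched as follows. The input data are hypothetical (they are not the values of Example 2.6, which are not reproduced here), and the helper name `standardized_ranks` is an assumption made for illustration.

```python
def standardized_ranks(raw):
    """Replace tied places by the arithmetic mean of the positions they occupy.

    raw[i] is the place the expert gave to item i; equal values mean equal preference.
    """
    order = sorted(range(len(raw)), key=lambda i: raw[i])
    ranks = [0.0] * len(raw)
    pos = 0
    while pos < len(order):
        end = pos
        # extend the group while the next item has the same raw place
        while end + 1 < len(order) and raw[order[end + 1]] == raw[order[pos]]:
            end += 1
        mean_rank = (pos + 1 + end + 1) / 2  # mean of positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


raw_places = [1, 2, 2, 3, 3, 3]  # hypothetical ranking of six items with ties
result = standardized_ranks(raw_places)
print(result, sum(result))
```

The sum of the standardized ranks equals 1 + 2 + ... + 6 = 21, as the text requires.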

The accuracy with which ranking expresses preference depends strongly on the cardinality of the set of items presented. The ranking procedure gives the most reliable results (in terms of closeness between the revealed preference and the “true” one) when no more than 10 items are evaluated. The set of items presented should contain at most 20 elements.

Rankings are processed and analyzed in order to construct a group preference relation from individual preferences. The following tasks can be posed: a) determining the closeness of the connection between the rankings of two experts over the elements of the presented set; b) determining the relationship between two elements from the individual opinions of group members about various characteristics of these elements; c) assessing the consistency of expert opinions in a group of more than two experts.

In the first two cases, the rank correlation coefficient is used as a measure of the closeness of the connection. Depending on whether only strict or non-strict ranking is allowed, either Kendall's or Spearman's rank correlation coefficient is used.

Kendall's rank correlation coefficient for problem (a) is

$$\tau = \frac{2}{m(m-1)} \sum_{i<j} \operatorname{sign}(r_{1i} - r_{1j})\,\operatorname{sign}(r_{2i} - r_{2j}), \tag{2.5}$$

where $m$ is the number of elements; $r_{1i}$ is the rank assigned by the first expert to the $i$-th element; $r_{2i}$ is the same for the second expert.

For problem (b), the components of (2.5) have the following meaning: $m$ is the number of characteristics of the two elements being assessed; $r_{1i}$ ($r_{2i}$) is the rank of the $i$-th characteristic in the ranking of the first (second) element, as set by the group of experts.

For strict ranking, Spearman's rank correlation coefficient $\rho$ is used:

$$\rho = 1 - \frac{6\sum_{i=1}^{m}(r_{1i} - r_{2i})^2}{m(m^2-1)}, \tag{2.6}$$

whose components have the same meaning as in (2.5).

Correlation coefficients (2.5) and (2.6) vary from -1 to +1. A value of +1 means that the rankings are identical; a value of -1 means that they are opposite to each other. A value of zero means that the rankings are linearly independent (uncorrelated).
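The limiting values ±1 can be checked with a small sketch. It assumes the usual sign-based form of Kendall's coefficient for (2.5) and the usual squared-rank-difference form of Spearman's coefficient for (2.6), both for strict rankings; the function names and the two rankings are hypothetical.

```python
def sign(x):
    return (x > 0) - (x < 0)


def kendall_tau(r1, r2):
    """Kendall coefficient of two strict rankings of the same m elements."""
    m = len(r1)
    s = sum(sign(r1[i] - r1[j]) * sign(r2[i] - r2[j])
            for i in range(m) for j in range(i + 1, m))
    return 2 * s / (m * (m - 1))


def spearman_rho(r1, r2):
    """Spearman coefficient of two strict rankings of the same m elements."""
    m = len(r1)
    d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * d2 / (m * (m * m - 1))


expert_1 = [1, 2, 3, 4, 5]            # hypothetical ranking by the first expert
expert_2 = list(reversed(expert_1))   # the exactly opposite ranking

print(kendall_tau(expert_1, expert_1), spearman_rho(expert_1, expert_1))
print(kendall_tau(expert_1, expert_2), spearman_rho(expert_1, expert_2))
```

Both coefficients return +1 for identical rankings and -1 for opposite ones; for intermediate cases their values generally differ.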

Since with this approach (an expert is a “measurer” with a random error) individual rankings are treated as random, the problem arises of statistically testing the hypothesis that the resulting correlation coefficient is significant. The Neyman-Pearson criterion is used: the significance level α of the test is fixed and, knowing the distribution of the correlation coefficient, the threshold value $c_\alpha$ is determined, with which the computed value of the correlation coefficient is compared. The critical region is right-sided (in practice, the value of the statistic is usually computed first and the significance level corresponding to it is determined, which is then compared with the threshold level α).

For $m > 10$, Kendall's rank correlation coefficient τ has a distribution close to normal with the parameters

$$M[\tau] = 0, \qquad D[\tau] = \frac{2(2m+5)}{9m(m-1)},$$

where $M[\tau]$ is the mathematical expectation and $D[\tau]$ is the variance.

In this case, tables of the standard normal distribution function are used:

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\,dt,$$

and the boundary $\tau_\alpha$ of the critical region is defined as the root of the equation

$$1 - \Phi\!\left(\frac{\tau_\alpha}{\sqrt{D[\tau]}}\right) = \alpha.$$

If the calculated value of the coefficient satisfies $\tau \ge \tau_\alpha$, the rankings are considered to be in significant agreement. Typically, α is chosen in the range 0.01-0.05. For $m \le 10$, the distribution of τ is given in Table 2.1.

The significance of the agreement of two rankings measured by the Spearman coefficient ρ is checked in the same order, using Student distribution tables for $m > 10$.

In this case the value

$$t = \rho\sqrt{\frac{m-2}{1-\rho^2}}$$

has a distribution well approximated by the Student distribution with $m - 2$ degrees of freedom. For $m > 30$ the distribution of ρ agrees well with the normal one, with $M[\rho] = 0$ and $D[\rho] = \frac{1}{m-1}$.

For $m \le 10$, the significance of ρ is checked using Table 2.2.
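The two approximations for ρ can be sketched as follows. The code assumes the usual statistic t = ρ·sqrt((m - 2)/(1 - ρ²)) and, for large m, the standardized value ρ·sqrt(m - 1) that follows from M[ρ] = 0 and D[ρ] = 1/(m - 1); the function names and the numbers are hypothetical. Critical values still have to be taken from Student or normal tables.

```python
from math import sqrt


def spearman_t(rho, m):
    """Statistic compared with Student's distribution with m - 2 degrees of freedom."""
    return rho * sqrt((m - 2) / (1 - rho ** 2))


def spearman_z(rho, m):
    """Standardized rho for the normal approximation (intended for m > 30)."""
    return rho * sqrt(m - 1)


print(spearman_t(0.6, 18))  # t statistic for rho = 0.6 on 18 elements
print(spearman_z(0.5, 37))  # normal-approximation statistic for rho = 0.5, m = 37
```

The t statistic is undefined for |ρ| = 1, where the rankings agree or disagree perfectly and no test is needed.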

If the rankings are not strict, then the Spearman coefficient is

$$\tilde\rho = \frac{\rho - T_1 - T_2}{\sqrt{(1 - 2T_1)(1 - 2T_2)}},$$

where $\rho$ is calculated according to (2.6);

$$T_j = \frac{1}{2m(m^2-1)} \sum_{i=1}^{k_j} l_i\,(l_i^2 - 1), \qquad j = 1, 2,$$

where $k_1$, $k_2$ are the numbers of distinct groups of tied (non-strict) ranks in the first and second rankings, respectively, and $l_i$ is the number of identical ranks in the $i$-th group.

In practical use of the rank correlation coefficients ρ (Spearman) and τ (Kendall), it should be kept in mind that the coefficient ρ provides a more accurate result in the sense of minimum variance.

Table 2.1. Distribution of Kendall's rank correlation coefficient

Kendall's correlation coefficient is used when the variables are measured on two ordinal scales, provided there are no tied ranks. Its calculation involves counting the number of matches and inversions. Let us consider this procedure using the example of the previous problem.

The algorithm for solving the problem is as follows:

1. Rearrange the data of Table 8.5 so that one of the series (here the series x_i) is in rank order. In other words, rearrange the pairs x and y into the required order and enter the data in columns 1 and 2 of Table 8.6.

Table 8.6 (columns: x_i, y_i, matches, inversions)

2. Determine the “degree of ranking” of the second series (y_i). This is done in the following sequence:

a) Take the first value of the unranked series, 3. Count the number of ranks below it that are greater than it. There are 9 such values (6, 7, 4, 9, 5, 11, 8, 12 and 10). Enter the number 9 in the “matches” column. Then count the number of values that are less than 3. There are 2 such values (ranks 1 and 2); enter the number 2 in the “inversions” column.

b) Discard the number 3 (it has already been processed) and repeat the procedure for the next value, 6: the number of matches is 6 (ranks 7, 9, 11, 8, 12 and 10), and the number of inversions is 4 (ranks 1, 2, 4 and 5). Enter the number 6 in the “matches” column and the number 4 in the “inversions” column.

c) The procedure is repeated in the same way until the end of the series; remember that each processed value is excluded from further consideration (only the ranks that lie below the current number are counted).

Note

To avoid mistakes in the calculations, keep in mind that with each “step” the sum of matches and inversions decreases by one; this is because one value is excluded from consideration each time.
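The counting procedure of step 2, together with the check from the note, can be sketched as below. The series `y` is a short hypothetical example without ties (not the data of Table 8.6), and the function name is an assumption made for illustration.

```python
def matches_and_inversions(y_ranks):
    """For each value, count larger (matches) and smaller (inversions) ranks below it."""
    rows = []
    for i, current in enumerate(y_ranks):
        below = y_ranks[i + 1:]  # already processed values are excluded
        matches = sum(1 for v in below if v > current)
        inversions = sum(1 for v in below if v < current)
        rows.append((current, matches, inversions))
    return rows


y = [2, 4, 1, 3, 5]  # hypothetical y ranks, listed in the order of increasing x
rows = matches_and_inversions(y)
P = sum(m for _, m, _ in rows)
Q = sum(q for _, _, q in rows)

# Check from the note: matches + inversions drops by one at every step.
totals = [m + q for _, m, q in rows]
assert all(a - b == 1 for a, b in zip(totals, totals[1:]))

print(rows)
print(P, Q)
```

With no ties, P + Q always equals the number of pairs n(n - 1)/2, which gives a second quick consistency check.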

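3. The sum of matches (P) and the sum of inversions (Q) are calculated; the data are substituted into one of the three interchangeable formulas for the Kendall coefficient (8.10), and the corresponding calculations are carried out.

$$\tau = \frac{P - Q}{n(n-1)/2} = 1 - \frac{4Q}{n(n-1)} = \frac{4P}{n(n-1)} - 1 \tag{8.10}$$

In our case, with $n = 12$, $P = 51$ and $Q = 15$:

$$\tau = \frac{51 - 15}{66} \approx 0.55$$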

Table XIV of the Appendix gives the critical values of the coefficient for this sample size: τ_cr = 0.45 and 0.59. The empirically obtained value is compared with the tabulated one.

Conclusion

τ = 0.55 > τ_cr = 0.45. The correlation is statistically significant at the first significance level.

Note:

If necessary (for example, if no table of critical values is available), the statistical significance of Kendall's τ can be determined from the following formula:

$$z = \frac{S^*}{\sqrt{n(n-1)(2n+5)/18}} \tag{8.11}$$

where $S^* = P - Q + 1$ if $P < Q$, and $S^* = P - Q - 1$ if $P > Q$.

The values of z for a given significance level correspond to the Pearson measure and are found in the corresponding tables (not included in the appendix). For the standard significance levels, z_cr = 1.96 (for β1 = 0.95) and 2.58 (for β2 = 0.99). Kendall's correlation coefficient is statistically significant if z > z_cr.

In our case S* = P - Q - 1 = 35 and z = 2.40, i.e. the initial conclusion is confirmed: the correlation between the characteristics is statistically significant at the first significance level.
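The z check can be reproduced with a short sketch. It assumes the usual normal approximation with a continuity correction, z = S* / sqrt(n(n - 1)(2n + 5)/18). The values n = 12, P = 51 and Q = 15 are inferred from the worked example (twelve ranks, S* = 35) rather than quoted from it, and returning 0 when P = Q is an assumption, since the text does not cover that case.

```python
from math import sqrt


def kendall_z(P, Q, n):
    """Continuity-corrected z statistic for Kendall's coefficient (no ties assumed)."""
    if P == Q:
        return 0.0  # assumption: the text defines S* only for P != Q
    s_star = P - Q - 1 if P > Q else P - Q + 1
    return s_star / sqrt(n * (n - 1) * (2 * n + 5) / 18)


z = kendall_z(51, 15, 12)
print(round(z, 2))
```

The result exceeds 1.96 but not 2.58, which matches the conclusion that the correlation is significant at the first level only.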

The rank correlation coefficient characterizes the general nature of a nonlinear relationship: whether the resultant attribute increases or decreases as the factor attribute increases. It is an indicator of the strength of a monotonic nonlinear relationship.


The coefficient proposed by Kendall is based on relations of the “greater than / less than” type, whose validity was established when the scales were constructed.
Take a pair of objects and compare their ranks on one characteristic and on the other. If the ranks on a given characteristic are in direct order (i.e., the order of the natural series), the pair is assigned +1; if in reverse order, -1. For the selected pair, the corresponding plus and minus ones (for attribute X and for attribute Y) are multiplied. The result is +1 if the ranks of the pair on both attributes are in the same order, and -1 if they are in opposite orders.
If the rank orders on both characteristics coincide for all pairs, the sum of the ones assigned to all pairs of objects is maximal and equals the number of pairs, $C_N^2$. If the rank orders of all pairs are reversed, the sum equals $-C_N^2$. In the general case, $C_N^2 = P + Q$, where P is the number of positive and Q the number of negative ones assigned to the pairs when their ranks on both characteristics are compared.
The value $\tau = \dfrac{P - Q}{C_N^2}$ is called the Kendall coefficient.
It is clear from the formula that the coefficient τ is the difference between the proportion of pairs of objects whose order is the same on both characteristics (relative to the number of all pairs) and the proportion of pairs whose order does not coincide.
For example, a coefficient value of 0.60 means that 80% of pairs have the same order of objects and 20% do not (80% + 20% = 100%; 0.80 - 0.20 = 0.60). That is, τ can be interpreted as the difference between the probabilities of matching and non-matching orders on the two characteristics for a randomly selected pair of objects.
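The 80% / 20% split in the example follows from two relations, written out here as a short derivation. It assumes no tied ranks, so that every pair is either concordant or discordant; the symbols $p_+$ and $p_-$ for the two proportions are introduced only for this illustration.

```latex
\begin{aligned}
p_+ &= \frac{P}{C_N^2}, \qquad p_- = \frac{Q}{C_N^2}, \qquad p_+ + p_- = 1, \qquad \tau = p_+ - p_- \\
\Rightarrow\quad p_+ &= \frac{1 + \tau}{2}, \qquad p_- = \frac{1 - \tau}{2} \\
\tau = 0.60:\quad p_+ &= \frac{1 + 0.60}{2} = 0.80, \qquad p_- = \frac{1 - 0.60}{2} = 0.20
\end{aligned}
```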
In the general case, the calculation of τ (more precisely P or Q) even for N of the order of 10 turns out to be cumbersome.
We'll show you how to simplify the calculations.


Example. The relationship between the volume of industrial output and investment in fixed capital in 10 regions of one of the federal districts of the Russian Federation in 2003 is characterized by the following data:


Calculate the Spearman and Kendall rank correlation coefficients. Check their significance at α = 0.05. Formulate a conclusion about the relationship between the volume of industrial output and investment in fixed capital for the regions of the Russian Federation under consideration.

Solution. Let us assign ranks to feature Y and factor X.


Let's sort the data by X.
In the row Y to the right of 3 there are 7 ranks greater than 3, therefore, 3 will generate the term 7 in P.
To the right of 1 are 8 ranks greater than 1 (these are 2, 4, 6, 9, 5, 10, 7, 8), i.e. P will include 8, etc. As a result, P = 37 and using the formulas we have:

X      Y       rank X, d_x   rank Y, d_y   P    Q
18.4   5.57    1             3             7    2
20.6   2.88    2             1             8    0
21.5   4.12    3             2             7    0
35.7   7.24    4             4             6    0
37.1   9.67    5             6             4    1
39.8   10.48   6             9             1    3
51.1   8.58    7             5             3    0
54.4   14.79   8             10            0    2
64.6   10.22   9             7             1    0
90.6   10.45   10            8             0    0
Total                                      37   8
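The table can be reproduced from the raw X and Y values with a short sketch. The helper names `ranks` and `concordant_discordant` are hypothetical, and the code assumes there are no tied values (true for these data).

```python
X = [18.4, 20.6, 21.5, 35.7, 37.1, 39.8, 51.1, 54.4, 64.6, 90.6]
Y = [5.57, 2.88, 4.12, 7.24, 9.67, 10.48, 8.58, 14.79, 10.22, 10.45]


def ranks(values):
    """Rank 1 for the smallest value; assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for place, idx in enumerate(order, start=1):
        r[idx] = place
    return r


def concordant_discordant(x, y):
    """Sort the pairs by x, then count larger (P) and smaller (Q) y ranks to the right."""
    ry = ranks(y)
    y_by_x = [ry[i] for i in sorted(range(len(x)), key=lambda i: x[i])]
    n = len(y_by_x)
    P = sum(y_by_x[j] > y_by_x[i] for i in range(n) for j in range(i + 1, n))
    Q = sum(y_by_x[j] < y_by_x[i] for i in range(n) for j in range(i + 1, n))
    return P, Q


P, Q = concordant_discordant(X, Y)
n = len(X)
tau = (P - Q) / (n * (n - 1) / 2)
print(P, Q, round(tau, 2))  # 37 8 0.64
```

The double loop is quadratic in n, which is perfectly adequate for samples of this size.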


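Using the simplified formula:

$$\tau = \frac{P - Q}{n(n-1)/2} = \frac{37 - 8}{45} \approx 0.64.$$

To test, at significance level α, the null hypothesis that the population Kendall rank correlation coefficient is zero against the competing hypothesis $H_1: \tau \ne 0$, the critical point is calculated:

$$T_{kp} = z_{kp}\sqrt{\frac{2(2n+5)}{9n(n-1)}},$$

where n is the sample size; z_kp is the critical point of the two-sided critical region, found from the table of the Laplace function by the equality Ф(z_kp) = (1 - α)/2.
If |τ| < T_kp, there are no grounds to reject the null hypothesis: the rank correlation between the qualitative characteristics is insignificant. If |τ| > T_kp, the null hypothesis is rejected: there is a significant rank correlation between the qualitative characteristics.
Let us find the critical point z_kp:
Ф(z_kp) = (1 - α)/2 = (1 - 0.05)/2 = 0.475
Using the Laplace table we find z_kp = 1.96.
Let us find the critical point:

$$T_{kp} = 1.96\sqrt{\frac{2(2\cdot 10+5)}{9\cdot 10\cdot 9}} \approx 0.49.$$

Since τ > T_kp, we reject the null hypothesis; the rank correlation between the volume of industrial output and investment in fixed capital is significant.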
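The table lookup can be replaced by a computation. The sketch below assumes that the Laplace function is the standard normal distribution function minus 0.5, so that Ф(z_kp) = (1 - α)/2 corresponds to the 1 - α/2 quantile; the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def laplace(z):
    """Laplace function: area under the standard normal density from 0 to z."""
    return NormalDist().cdf(z) - 0.5


def kendall_critical_point(n, alpha=0.05):
    """Return (z_kp, T_kp) for the two-sided test of tau = 0."""
    z_kp = NormalDist().inv_cdf(1 - alpha / 2)
    t_kp = z_kp * sqrt(2 * (2 * n + 5) / (9 * n * (n - 1)))
    return z_kp, t_kp


z_kp, t_kp = kendall_critical_point(10)
print(round(z_kp, 2), round(laplace(z_kp), 3), round(t_kp, 2))  # 1.96 0.475 0.49
```

With τ ≈ 0.64 the inequality |τ| > T_kp holds, in line with the conclusion of the example.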

Example. Using data on the volume of construction and installation work performed in-house and the number of employees at 10 construction companies in one of the cities of the Russian Federation, determine the relationship between these characteristics using the Kendall coefficient.

Solution.
Let us assign ranks to feature Y and factor X.
Arrange the objects so that their ranks in X form the natural series. Since the scores assigned to each pair of this series are positive, the “+1” values entering P are generated only by those pairs whose ranks in Y are in direct order.
They are easily counted by successively comparing the rank of each object in the Y row with the remaining ones.
Kendall coefficient:

$$\tau = \frac{P - Q}{n(n-1)/2}$$

or, equivalently when there are no tied ranks,

$$\tau = \frac{4P}{n(n-1)} - 1 = 1 - \frac{4Q}{n(n-1)}.$$
Let's sort the data by X.
In the row Y to the right of 2 there are 8 ranks greater than 2, therefore, 2 will generate the term 8 in P.
To the right of 4 are 6 ranks greater than 4 (these are 7, 5, 6, 8, 9, 10), i.e. P will include 6, etc. As a result, P = 29 and using the formulas we have:

X    Y     rank X, d_x   rank Y, d_y   P    Q
38   292   1             2             8    1
50   302   2             4             6    2
52   366   3             7             3    4
54   312   4             5             4    2
59   359   5             6             3    2
61   398   6             8             2    2
66   401   7             9             1    2
70   298   8             3             1    1
71   283   9             1             1    0
73   413   10            10            0    0
Total                                  29   16
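With the totals P = 29 and Q = 16 from the table, the equivalent forms of the coefficient can be compared directly. The sketch below assumes no tied ranks, so that P + Q equals the number of pairs; the variable names are chosen only for illustration.

```python
P, Q, n = 29, 16, 10
pairs = n * (n - 1) // 2  # number of pairs of objects; equals P + Q without ties

tau_a = (P - Q) / pairs             # difference form
tau_b = 4 * P / (n * (n - 1)) - 1   # form using only P
tau_c = 1 - 4 * Q / (n * (n - 1))   # form using only Q

print(pairs, round(tau_a, 2), round(tau_b, 2), round(tau_c, 2))  # 45 0.29 0.29 0.29
```

The forms that use only P or only Q are convenient when just one of the two counts has been tallied.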


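Using the simplified formula:

$$\tau = \frac{P - Q}{n(n-1)/2} = \frac{29 - 16}{45} \approx 0.29.$$

To test, at significance level α, the null hypothesis that the population Kendall rank correlation coefficient is zero against the competing hypothesis $H_1: \tau \ne 0$, the critical point is calculated:

$$T_{kp} = z_{kp}\sqrt{\frac{2(2n+5)}{9n(n-1)}},$$

where n is the sample size; z_kp is the critical point of the two-sided critical region, found from the table of the Laplace function by the equality Ф(z_kp) = (1 - α)/2.
If |τ| < T_kp, there are no grounds to reject the null hypothesis: the rank correlation between the qualitative characteristics is insignificant. If |τ| > T_kp, the null hypothesis is rejected: there is a significant rank correlation between the qualitative characteristics.
Let us find the critical point z_kp:
Ф(z_kp) = (1 - α)/2 = (1 - 0.05)/2 = 0.475
Using the Laplace table we find z_kp = 1.96.
Let us find the critical point:

$$T_{kp} = 1.96\sqrt{\frac{2(2\cdot 10+5)}{9\cdot 10\cdot 9}} \approx 0.49.$$

Since τ < T_kp, there are no grounds to reject the null hypothesis; the rank correlation between the volume of construction and installation work and the number of employees is insignificant.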
