Measuring ED&I scientifically

It is now widely (although not universally) accepted that promoting and delivering Equality, Diversity, and Inclusion (ED&I) is a good idea. But what exactly do we mean by ED&I? The usual definition is something along the lines of: ‘equality’ means ensuring everyone has the same rights and opportunities, regardless of their personal characteristics; ‘diversity’ means recognizing, respecting, and valuing individual differences, including backgrounds, beliefs, skills, and identities; and ‘inclusion’ means creating an environment where all people feel welcomed, respected, and able to contribute fully. That’s all well and good, but how might we go about measuring these things? In this blog post I will attempt to answer that question using the concept of entropy.

Consider an organization whose members are categorized into n groups based on personal characteristics (race, gender, sexual orientation, and so on). Let p_i denote the proportion of people in the organization who belong to group i, so that p₁ + … + p_n = 1. To measure the ‘fairness’ of the organization scientifically we need to define a function H_n() which takes the proportions p₁,…,p_n and outputs a non-negative number, with the interpretation that H_n(p₁,…,p_n) represents a quantitative measure of the organization’s fairness. What properties should this function have? First of all, it should be continuous, so that changing the proportions by a small amount only changes the fairness of the organization by a small amount.

The function H_n() should also be symmetric, so that the output remains the same if the proportions are reordered. This ensures that we are not applying preferential treatment to one particular group. It should attain its maximum when all proportions are equal. It should also increase with the number of groups when all proportions are equal; that is, H_n(1/n,…,1/n) < H_{n+1}(1/(n+1),…,1/(n+1)). The final property this function should have might be called ‘additivity’: if the groups are themselves bundled into broader categories, then the fairness of the organization should equal the fairness measured across the categories plus a weighted sum of the fairness within each category, where each weight is the proportion of members belonging to that category.

It turns out that, up to the choice of logarithm base, the only function that satisfies these criteria is the so-called ‘entropy’ function: H_n(p₁,…,p_n) = -∑_i p_i log(p_i), with the convention that 0 log(0) = 0. This gives us a basic measure of fairness. However, it is not sufficient for most purposes. In the real world we usually want to compare these proportions against a baseline, typically the corresponding proportions in the population as a whole. Let us now denote these baseline proportions by p₁,…,p_n and the corresponding proportions within the organization by q₁,…,q_n. Further, let X and Y be random variables which take the value i with probabilities p_i and q_i respectively. Then we can measure the fairness of the organization using the ‘relative entropy’ (also known as the Kullback-Leibler divergence) H(X|Y) = -∑_i p_i log(q_i/p_i), where I have suppressed the dependence on n for convenience.
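As a rough illustration, the entropy and relative entropy formulas above might be computed along the following lines. This is only a sketch: the function names `entropy` and `relative_entropy` and the example proportions are my own assumptions, natural logarithms are used, and terms with p_i = 0 are skipped.

```python
import math


def entropy(p):
    """Entropy -sum_i p_i log(p_i), skipping terms where p_i is zero."""
    return -sum(x * math.log(x) for x in p if x > 0)


def relative_entropy(p, q):
    """Relative entropy -sum_i p_i log(q_i / p_i) of baseline p against q."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # a group absent from the baseline contributes nothing
        if q_i == 0:
            return math.inf  # a baseline group is missing from the organization
        total -= p_i * math.log(q_i / p_i)
    return total


# Hypothetical proportions for three groups (illustrative numbers only).
population = [0.5, 0.3, 0.2]    # baseline proportions p_1, ..., p_n
organization = [0.7, 0.2, 0.1]  # organization proportions q_1, ..., q_n

print(entropy(organization))
print(relative_entropy(population, organization))
```

An organization whose proportions match the baseline exactly gives a relative entropy of zero, and any mismatch gives a positive value.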

Note that H(X|Y) is never negative. A value of zero means that the organization is perfectly representative of the population as a whole, whereas a high value means that it is not. An organization would therefore generally want to decrease the value of H(X|Y), which gives us a measure of (lack of) diversity. But what about equality? Let us now suppose that the organization is stratified into a hierarchy with m levels. Let q_ij be the proportion of people in level i of the organization belonging to group j, and let Y_i be a random variable which takes the value j with probability q_ij. Then the within-level entropy for level i is just the relative entropy H(X|Y_i).
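To make the within-level entropy concrete, here is a minimal sketch that starts from a table of headcounts. The table `counts`, the baseline `population`, and the helper names are all hypothetical, chosen purely for illustration.

```python
import math


def relative_entropy(p, q):
    """Relative entropy -sum_i p_i log(q_i / p_i); assumes every q_i > 0."""
    return -sum(p_i * math.log(q_i / p_i) for p_i, q_i in zip(p, q) if p_i > 0)


def within_level_entropies(counts, population):
    """Return H(X|Y_i) for each level i, where counts[i][j] is the headcount
    of group j in level i and population holds the baseline proportions."""
    result = []
    for row in counts:
        level_total = sum(row)
        q_i = [c / level_total for c in row]  # q_ij for this level
        result.append(relative_entropy(population, q_i))
    return result


# Hypothetical data: two levels (rows) and three groups (columns).
counts = [
    [50, 30, 20],  # level 0, e.g. junior staff
    [30, 8, 2],    # level 1, e.g. senior staff
]
population = [0.5, 0.3, 0.2]

within = within_level_entropies(counts, population)
print(within)
```

In this made-up table the first level mirrors the baseline, so its within-level entropy is essentially zero, while the second level is skewed and scores higher.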

How do we account for entropy across levels? Let r_ij be the proportion of people in group j belonging to level i of the organization, and let Z_j be a random variable which takes the value i with probability r_ij. Further, let r_i be the overall proportion of people belonging to level i of the organization, and let Z be a random variable which takes the value i with probability r_i. Then we may define the cross-level entropy for group j as the relative entropy H(Z|Z_j). A high value of H(Z|Z_j) means that members of group j are facing discrimination (either positive or negative). Therefore, an organization would generally want to decrease the value of H(Z|Z_j), as a high value represents a lack of equality with respect to that particular group.
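The cross-level entropy can be sketched in the same way. Again, the headcount table `counts` and the function names are assumptions of mine, and the code assumes every group has at least one member in every level so that no logarithm blows up.

```python
import math


def relative_entropy(p, q):
    """Relative entropy -sum_i p_i log(q_i / p_i); assumes every q_i > 0."""
    return -sum(p_i * math.log(q_i / p_i) for p_i, q_i in zip(p, q) if p_i > 0)


def cross_level_entropies(counts):
    """Return H(Z|Z_j) for each group j, where counts[i][j] is the headcount
    of group j in level i."""
    n_levels, n_groups = len(counts), len(counts[0])
    grand_total = sum(sum(row) for row in counts)
    r = [sum(row) / grand_total for row in counts]  # overall r_i
    result = []
    for j in range(n_groups):
        group_total = sum(counts[i][j] for i in range(n_levels))
        r_j = [counts[i][j] / group_total for i in range(n_levels)]  # r_ij
        result.append(relative_entropy(r, r_j))
    return result


# Hypothetical data: two levels (rows) and three groups (columns).
counts = [
    [50, 30, 20],  # level 0, e.g. junior staff
    [30, 8, 2],    # level 1, e.g. senior staff
]

cross = cross_level_entropies(counts)
print(cross)
```

In this made-up example the third group is concentrated in the lower level, so it receives the largest cross-level entropy of the three.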

The overall entropy H can then be defined as a weighted double sum: H = ∑_i ∑_j [q_ij H(X|Y_i) + r_ij H(Z|Z_j)]. This gives a combined measure of both equality and diversity, which can be considered as a measure of inclusion. Thus we have successfully defined measures of equality, diversity, and inclusion. The next step is to apply these measures to some real-world data; I will leave that for a future blog post.
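For completeness, here is a small sketch of the double sum itself. The function name `overall_entropy` is my own, and the inputs are placeholder numbers rather than real data: `q[i][j]` and `r[i][j]` stand for q_ij and r_ij, while `within[i]` and `cross[j]` stand for H(X|Y_i) and H(Z|Z_j).

```python
def overall_entropy(q, r, within, cross):
    """Sum over levels i and groups j of q[i][j]*within[i] + r[i][j]*cross[j]."""
    return sum(
        q[i][j] * within[i] + r[i][j] * cross[j]
        for i in range(len(within))
        for j in range(len(cross))
    )


# Placeholder inputs for two levels and three groups (illustrative only).
q = [[0.5, 0.3, 0.2], [0.75, 0.2, 0.05]]     # each row sums to one over j
r = [[0.625, 0.8, 0.9], [0.375, 0.2, 0.1]]   # each column sums to one over i
within = [0.0, 0.2]                          # assumed values of H(X|Y_i)
cross = [0.02, 0.05, 0.15]                   # assumed values of H(Z|Z_j)

H = overall_entropy(q, r, within, cross)
print(H)
```

One thing worth noticing: because the q_ij sum to one over j for each level, and the r_ij sum to one over i for each group, the double sum as written works out to the sum of the within-level entropies plus the sum of the cross-level entropies.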
