How fair or unfair is your algorithm?

The recruiting system rated applicants in a way similar to the way we rate products. A year later, however, it turned out that the algorithm did not give everyone the same chance: women were systematically scored lower for technical vacancies, regardless of their knowledge, skills or experience.

Ethics in AI

What had gone wrong? Because the system was trained on CVs submitted in earlier years, it reinforced the bias that already existed in the tech world: the industry was dominated by men, so most of the submitted CVs also came from men. Once Amazon realized this, the algorithm was retrained to remove the bias. By then, however, confidence in the system had already been lost.

This story is a well-known example of how AI systems can turn out to be unfair despite the best of intentions. Unfortunately, Amazon is not the only company that has made this kind of mistake. More and more news articles show the risks of big data. Because the importance of (big) data will only continue to grow, and the risks with it, an important public debate has arisen about ethics in the field of Artificial Intelligence.

Nobody consciously chooses unfairness

Central to this debate are the choices made while building a big data product. These choices can make the system unfair despite good intentions. The more wrong choices are made, the more unfair the system becomes and the greater the consequences. Although these consequences differ enormously in their impact on society, they usually have one thing in common: a (large) group of people is unfairly disadvantaged. With an ever-growing number of algorithms helping us make decisions, the need to do something about this is all the greater.

Which choices lead to these consequences? Although things can go wrong in many different ways, the causes can usually be traced back to three phases of a (big) data project: data, design and decisions.


Data: What data do you use to train your system?

A system is never better than the data that is put into it. We often say: garbage in, garbage out. The same goes for bias: bias in, bias out. When the data is already biased, such as the data Amazon used to predict success in a position, the system will only confirm that bias. As a result, the system reinforces the status quo. A world dominated by men will continue to be dominated by men.
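To make "bias in, bias out" concrete, here is a minimal sketch in Python. It is not how Amazon's system worked; the hiring records, the group labels and the function `fit_hire_rates` are all invented for illustration. The "model" simply learns the historical hire rate per group, which is enough to show how a skew in the training data comes straight back out as a skew in the predictions.

```python
from collections import defaultdict

# Hypothetical historical hiring records: (gender, was_hired).
# The numbers are invented for illustration only.
history = (
    [("m", True)] * 60 + [("m", False)] * 40
    + [("f", True)] * 6 + [("f", False)] * 14
)


def fit_hire_rates(records):
    """'Train' a naive model: the observed hire rate per group."""
    hired = defaultdict(int)
    total = defaultdict(int)
    for group, was_hired in records:
        total[group] += 1
        hired[group] += int(was_hired)
    return {group: hired[group] / total[group] for group in total}


# The learned scores mirror the historical imbalance exactly.
model = fit_hire_rates(history)
for group, rate in model.items():
    print(f"predicted chance of success for group {group}: {rate:.2f}")
```

Nothing in this sketch measures knowledge, skills or experience, yet one group receives half the score of the other. A real model is more complex, but when group membership (or a proxy for it) correlates with the historical outcome, the same effect can appear.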


Design: Which algorithm do you choose?

Even when you make all the right choices in selecting your data, a system can go wrong because you do not understand what it bases its decisions on. The well-known "black box" algorithms make it difficult to understand why person A is placed in box 1 and person B in box 2.
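As a contrast to a black box, the sketch below shows one simple kind of transparent model: a linear score in which every feature's contribution can be read off directly. The feature names, the weights in `WEIGHTS` and the function `explain_score` are hypothetical and chosen only for illustration; this is not a recommendation for a specific algorithm.

```python
# Hypothetical weights of a transparent linear scoring model.
WEIGHTS = {"years_experience": 0.5, "relevant_skills": 1.0, "certifications": 0.25}


def explain_score(candidate):
    """Return the total score and each feature's contribution to it."""
    contributions = {name: WEIGHTS[name] * candidate[name] for name in WEIGHTS}
    return sum(contributions.values()), contributions


# Invented example candidate.
person_a = {"years_experience": 4, "relevant_skills": 3, "certifications": 2}
score, parts = explain_score(person_a)

# List the contributions from largest to smallest, then the total.
for name, value in sorted(parts.items(), key=lambda item: -item[1]):
    print(f"{name}: {value:+.2f}")
print(f"total: {score:.2f}")
```

With a model like this you can say exactly why person A ended up in box 1. With a black box algorithm, a comparable breakdown has to be approximated afterwards, if it can be produced at all.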


Decisions: What decisions will you make based on the predictions?

This is the decisive phase: what will you do with the results of your model? Do you use them to determine who receives an email with an offer, or to determine whose jail term is extended? Obviously, a wrong decision in the second example is far worse than in the first. The weight of the decision thus amplifies the impact of any unfairness.

Stay critical

Each of these phases entails its own choices. To make those choices carefully, you can use the following five pillars:


Is the privacy of your customers guaranteed? Is it impossible to trace the data back to an individual person?


Can you explain what the algorithm bases its predictions on? Is this explanation also clear to the business?


Can you not only explain why a choice was made, but also justify it? Do you dare to take responsibility for the predictions that the model makes?


Does everyone in the dataset have an equal chance of a certain outcome? Is no group disadvantaged?


Do the choices you make serve the best interests of humanity? How big is the impact of your actions on people who have been given the wrong prediction?
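The fairness question above (does everyone have an equal chance of a certain outcome?) is one of the few pillars that can be checked directly in the data. The sketch below compares the share of positive predictions per group. The groups, the predictions and the helper names `selection_rates` and `parity_ratio` are invented for illustration, and equal selection rates are only one of several possible definitions of fairness.

```python
def selection_rates(predictions):
    """Share of positive predictions per group.

    predictions: iterable of (group, predicted_positive) pairs.
    """
    counts = {}
    for group, positive in predictions:
        total, positives = counts.get(group, (0, 0))
        counts[group] = (total + 1, positives + int(positive))
    return {group: p / t for group, (t, p) in counts.items()}


def parity_ratio(rates):
    """Lowest selection rate divided by the highest; 1.0 means equal rates."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest > 0 else 1.0


# Invented model output for two groups of 100 people each.
preds = (
    [("a", True)] * 50 + [("a", False)] * 50
    + [("b", True)] * 20 + [("b", False)] * 80
)

rates = selection_rates(preds)
ratio = parity_ratio(rates)
print(rates)
print(f"parity ratio: {ratio:.2f}")
```

A ratio well below 1.0, as in this invented example, does not prove that the system is unfair, but it is a clear signal to go back through the data, design and decision phases and ask why the groups are treated so differently.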

By weighing every decision in each of the phases against each of these five pillars, you can check whether your (big) data project is as fair as you would like it to be. While the examples of unfair systems that reach the news, such as Amazon's recruiting system, come mainly from the big tech companies, it is up to all of us to prevent them. Everyone who works with data should think about this. Stay critical, both towards others and towards yourself. Ultimately, data should make our world a better place, not divide it even further.

This article was written by Esther Lietaert Peerbolte, senior data science consultant at EY VODW, and previously appeared in MarketingTribune 15/16, 2020.

