What a picture of Alexandria Ocasio-Cortez in a bikini tells us about the disturbing future of AI
Source and Background
There are two readings associated with this week's topic:
- What a picture of Alexandria Ocasio-Cortez in a bikini tells us about the disturbing future of AI: New research on image-generating algorithms has raised alarming evidence of bias. It’s time to tackle the problem of discrimination being baked into tech, before it is too late
- Image Representations Learned With Unsupervised Pre-Training Contain Human-like Biases: When compared with statistical patterns in online image datasets, our findings suggest that machine learning models can automatically learn bias from the way people are stereotypically portrayed on the web.
The first reading is very accessible: Arwa Mahdawi makes the case that "discrimination [is] being baked into tech". The people creating that tech? Mathematicians and computer scientists.
Arwa's article has links to several articles which you might check out, such as
- Amazon scraps secret AI recruiting tool that showed bias against women
- Wrongfully Accused by an Algorithm: In what may be the first known case of its kind, a faulty facial recognition match led to a Michigan man’s arrest for a crime he did not commit.
Let me give you a very personal example. My wife (who is Togolese) is very dark-skinned; I am very light-skinned. When we Zoom (as we have been doing a lot during this pandemic), we sometimes use a virtual background; when we do, my wife is far more likely to "disappear" than I am. We jokingly accuse Zoom of racial bias, but IT'S NO JOKE. Similarly, when taking pictures in Africa with cameras that have an "auto focus" feature, many will zoom right in on a white person in a sea of black folks. Now why is that? It's not chance....
The second reading is technical -- extremely technical. I noticed that some of you were saying in the discussion that you wished that you knew more about the source material (for Global Warming, say). Well, be careful what you wish for! I don't expect any of you to actually read this! But I would like for you all to examine it. Generally the abstract and the conclusion may be accessible (although jargon-filled). The rest may be weeds....
While technical, I think that you can read it for some sense of how these algorithms work, and for the evidence from which Arwa draws the summary statistics in her article.
For those of you with an interest in statistics and computer science, there are some interesting technical aspects to this research; for those of you with an interest in education, you need to be aware of what your students might be doing -- and might be subjected to -- someday!
And for those of you with an interest in programming and statistics, you can find the code, images, etc. used to produce this paper at https://github.com/ryansteed/ieat (the code there is written in Python).
One more recent news-item, relevant to the discussion: ‘This is bigger than just Timnit’: How Google tried to silence a critic and ignited a movement: Big Tech has used its power to control the field of AI ethics and avoid accountability. Now, the ouster of Timnit Gebru is putting the movement for equitable tech in the spotlight.
- The title is very provocative (which, one might argue, every title should be). In order to train anything, you need a training set: that is, you need to train something to recognize certain objects, or reach a conclusion, or.... But it has to be trained. There are two kinds of training: supervised and unsupervised. The argument here is that, if you leave an algorithm to reach its conclusions based on what it finds on the web, it's going to learn all the evils of the web! But if you allow humans to supervise its learning, the trainers build their own prejudices into the process. How can you win?
- While not exactly pertinent to the issue of AI, it was interesting years back when the "Math is Hard" Barbie came out. How are stereotypes "baked in" to your areas of interest? What have you observed?
- Examining the "source document", Image Representations Learned With Unsupervised Pre-Training Contain Human-like Biases: can you pick apart any particular mathematics you've encountered before, and how it's useful here? For any of you with any statistical training, you'll certainly notice there are a lot of p-values.... Can you imagine yourself performing any of these analyses? Can you imagine how you might make some of these tests with the training you already have?
- The Turing test is a classic challenge, dealing with the question of whether machines can think. We decide that they are actually "thinking machines" if we cannot distinguish between interactions with a real human and the machine. About this "imitation game", Turing (1950) says: "I believe that in about fifty years' time it will be possible to programme computers, with a storage capacity of about 10⁹, to make them play the imitation game so well that an average interrogator will not have more than 70 per cent chance of making the right identification after five minutes of questioning. I believe that at the end of the century the use of words and general educated opinion will have altered so much that one will be able to speak of machines thinking without expecting to be contradicted." It's 70 years on: are we there? Are Siri and Alexa there? And if machines can think, what makes us think that they won't think like sexist, racist bigots?
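To make the "learning from the web" point above concrete, here is a toy sketch of my own (not the paper's method, and with a made-up corpus): an "unsupervised" learner that builds word associations purely from co-occurrence counts. Nobody labels anything; the skew in the data does all the work.

```python
# Toy illustration: an unsupervised learner picks up bias straight from
# co-occurrence statistics in a small, deliberately skewed corpus.
from collections import Counter
from itertools import combinations

corpus = [
    "doctor he hospital", "doctor he surgery", "doctor she clinic",
    "nurse she hospital", "nurse she ward", "nurse she clinic",
]

# Count how often each pair of words appears in the same sentence.
pair_counts = Counter()
for sentence in corpus:
    for a, b in combinations(sorted(sentence.split()), 2):
        pair_counts[(a, b)] += 1

def cooccur(w1, w2):
    return pair_counts[tuple(sorted((w1, w2)))]

# No one told the model "doctors are male"; the skewed data did.
print(cooccur("doctor", "he"), cooccur("doctor", "she"))  # 2 1
print(cooccur("nurse", "he"), cooccur("nurse", "she"))    # 0 3
```

Real systems use far richer statistics than raw pair counts, but the principle is the same: whatever regularities sit in the data, stereotyped or not, become the model's "knowledge".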
There's no winning or losing in this situation. Machine learning is just a black box, and whatever you feed it is all it learns (Ganga Adhikari). Models learn what they experience. Maybe the best way to approach this problem of bias is to combine both unsupervised learning and supervised learning (Liam Painter, Ganga Adhikari, Jenna Handerson). Since AI picks up and develops "knowledge" from what is directly produced by humans (Chelsea Debord), it's always going to be biased (Justin Horn, Octavia Dieng, Jenna Handerson), but hopefully awareness of the issue in machine learning and in general can help lead to smart, innovative people finding better solutions as time goes on (Rachel Ritchie).
A fair number of the students were not far enough along in their courses to fully understand the mathematics exhibited in the research paper. They could tell that there were functions and variables being used and, from the context in the paper, understand that these would help detect biases in the models. A lot of students recognized the p-values used in hypothesis testing from their introductory statistics courses. Overall, they could pick apart the little pieces of the mathematics but had trouble understanding how everything fit together.
Another group of students were a lot further along in their studies and could understand the more advanced topics, like the permutation tests used to construct the null distribution in the hypothesis testing. A permutation test is another way of doing a hypothesis test when you have a non-normal population or do not want to assume normality. Beyond the mathematics, the topics students recognized included the "python programming libraries and computer vision models" (Shawn Huesman). Some even knew how important it is to test for biases in algorithms beyond what was presented in the article. These biases could lead to dangerous and misleading models, which is why "data validation, beta testing, and User Acceptance Testing (UAT) is extremely important in any research or project" (Ganga Adhikari).
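The permutation test those students recognized can be sketched in a few lines. The data here are made up; the idea is simply to build the null distribution by reshuffling the group labels and asking how often a difference at least as large as the observed one appears.

```python
# Minimal one-sided permutation test for a difference in group means --
# no normality assumption needed, just reshuffled labels (toy data).
import random

def perm_test(group1, group2, n_perm=10_000, seed=0):
    rng = random.Random(seed)
    observed = sum(group1) / len(group1) - sum(group2) / len(group2)
    pooled = group1 + group2
    n1 = len(group1)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # pretend the group labels are arbitrary
        diff = sum(pooled[:n1]) / n1 - sum(pooled[n1:]) / (len(pooled) - n1)
        if diff >= observed:  # as extreme as, or more extreme than, observed
            count += 1
    return count / n_perm  # estimated p-value

# Two made-up groups with clearly separated means: a small p-value.
p = perm_test([5.1, 4.8, 5.4, 5.0], [4.2, 4.0, 4.5, 4.1])
```

With fully separated groups like these, only about 1 shuffle in 70 reproduces the observed gap, so the estimated p-value comes out well under 0.05.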
Overall, students could not currently see themselves doing this type of research, but look forward to their coming classes where they will hopefully learn more about the topics presented in this article!
Siri and Alexa (current AI) are not yet to the point where we cannot distinguish between interactions with a human and the machine. (John Nuestro) As of right now, machines are only good at what they are programmed to do. (Madison Goodwin) Currently only the chatbot Eugene Goostman has passed the Turing Test, but no machine has reached the point where it thinks like a human. (Craig McGhee) What currently makes it possible to distinguish between humans and machines is humans' emotional capacity, rather than just logic. Machines will 'think' whatever information is given to them (Rachel Ritchie), so we need to ask: who is creating our machines/AI? Are they racist or sexist, or do they have biased, hateful thinking? (Shawn Huesman) Since machines do not have the capacity to actually think, they won't have emotion behind bigotry, but their actions may indicate that they were designed by a bigot or that the information they were programmed with was biased. As technology progresses, bias should be removed, or programming put in place that prevents it. (Octavia Dieng)