There are 2 problems with not having enough diversity in training data:
- The AI will be worse at depicting diversity when prompted, e.g. if the AI hasn’t seen enough pictures of black people, it may not be able to depict black hair properly because it doesn’t “know what it looks like”.
- The AI will not show as much diversity when not prompted. The AI is working off statistics, so if you tell it to depict a person and most of the people it’s “seen” are white men, it will almost always depict a white man, because that’s statistically what a person is according to its data.
This method combats the second problem, but not the first. The first can mostly be solved just by scaling up the training data, though, which is mostly what these companies have been doing. Even if only 1% of your images are of POC, if you have 1 billion images, 10 million of them will be of POC, which may be enough to train it. The second problem would remain unsolved, though, since the AI will always go with the statistically safe 99%.
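To put rough numbers on that, here's a toy Python sketch. It is only an illustration: the 1% share and 1 billion total are the figures from above, the group labels and function names are made up for the example, and real image generators don't literally pick the single most likely option. The point is just that a bigger dataset raises the absolute count while leaving the proportion untouched, and that anything leaning toward the most likely answer ends up showing the minority even less often than its share.

```python
import random
from collections import Counter

# Figures from the comment above; everything else here is a toy assumption.
TOTAL_IMAGES = 1_000_000_000
MINORITY_PERCENT = 1

# Integer arithmetic avoids floating-point rounding: 1% of 1 billion.
minority_images = TOTAL_IMAGES * MINORITY_PERCENT // 100

# Hypothetical shares of each group in the training data.
shares = {"majority": 100 - MINORITY_PERCENT, "minority": MINORITY_PERCENT}


def sample_proportional(rng, shares, n):
    """Draw n unprompted outputs in proportion to the training shares."""
    labels = list(shares)
    weights = [shares[label] for label in labels]
    return Counter(rng.choices(labels, weights=weights, k=n))


def pick_most_likely(shares, n):
    """Always output the statistically 'safest' group, n times."""
    top = max(shares, key=shares.get)
    return Counter({top: n})


if __name__ == "__main__":
    rng = random.Random(0)
    print("minority images in the dataset:", minority_images)
    print("proportional sampling:", sample_proportional(rng, shares, 1000))
    print("always most likely:", pick_most_likely(shares, 1000))
```

Scaling `TOTAL_IMAGES` up changes `minority_images` but not `shares`, which is why more data helps with the first problem and not the second.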
We really need to vote her out this round, San Francisco deserves better than this