I again submit the last two years, in which model collapse did not happen. The doom-and-gloom predictions - some rather gleeful - plainly missed the mark. The proliferation of generated content has not, in fact, ruined the content generators, and it's certainly not because we're any good at marking generated content. Early symptoms went away entirely, and the problem has been addressed in practice.
As for “unlearning,” its universality is why it's a made-up problem. Nobody loudly complains that X-rays make doctors worse at feeling around for lumps.
Is classifying generated data ever intractable when you can just… generate some? Even if the proportion of human content dwindled - rather, of new human content, since there's a recent hard cutoff - there's a faucet for counterexamples. You can have as much for-certain machine-generated content as you like, to point at and say: wrong.
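The "faucet" argument above can be sketched in a few lines. This is purely illustrative - `sample_model` and `build_training_set` are hypothetical names, and a stand-in replaces an actual LLM call - but it shows the asymmetry: one class of training data (machine-generated) can be labeled with certainty and produced in unlimited quantity, even if fresh human-labeled data is scarce.

```python
# Hypothetical sketch of the "faucet for counterexamples" idea.
# sample_model stands in for an actual LLM sampling call: every output
# is machine-generated by construction, so its label is certain.

def sample_model(n):
    """Stand-in generator: each returned string is, by construction,
    machine-generated, so the label carries no uncertainty."""
    return [f"generated text {i}" for i in range(n)]

def build_training_set(human_texts, n_machine):
    """Pair scarce human-labeled data with as much certain
    machine-labeled data as desired. Labels: 0 = human, 1 = machine."""
    machine_texts = sample_model(n_machine)
    return (
        [(text, 0) for text in human_texts] +
        [(text, 1) for text in machine_texts]
    )

# Even if new human text dwindles, the machine class never runs dry:
data = build_training_set(["a human-written sentence"], n_machine=3)
```

The point is not the toy classifier data itself but the label source: the machine class is self-supplying, so a detector is never starved of known-positive examples.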
The article links a study. What’s your study that collapse isn’t a concern?
For what it's worth, my worry was never focused on cancer; the doctors were just an example population in which the likely universal unlearning effect was measured.
Others seem to disagree: https://cacm.acm.org/blogcacm/model-collapse-is-already-happening-we-just-pretend-it-isnt/
This is approaching unfalsifiable.