What We Lost the Last Time Code Got Cheap

codeinabox@programming.dev · 3 days ago

ell1e@leminal.space · 1 day ago

Have you checked on that narrative?

The only workaround known so far seems to be making sure enough of the data is fresh: https://www.inria.fr/en/collapse-ia-generatives and https://en.wikipedia.org/wiki/Model_collapse. But read for yourself.

mindbleach@sh.itjust.works · 1 day ago

That’s a lot of “could” and “will” from an article a year old, primarily about concerns from two years ago, while image models today keep getting smaller and better. They didn’t find a second internet’s worth of JPEGs. Better training on the same data, or even better labels on less data, beats a simple obsession with scale.

Yes, photocopying a photocopy will degrade, but diffusion is a denoising algorithm. Un-degrading an image is its central function. ‘Make it look less AI’ is how you get generative adversarial networks.

Anyway, the grim truth is that the central concern is mistaken. Training data for cancer screening does not require the patient lived.

ell1e@leminal.space · 1 day ago

The article links a study. What’s your study that collapse isn’t a concern?

For what it’s worth, my worry was never focused on cancer; these doctors were just one measured example of what is likely a universal unlearning effect.

mindbleach@sh.itjust.works · 23 hours ago

I again submit the last two years where model collapse did not happen. The doom-and-gloom predictions - some rather gleeful - plainly missed the mark. The proliferation of generated content has not in fact ruined the content generators, and it’s sure not because we’re any good at marking generated content. Early symptoms went away entirely and the problem has been practically addressed.

As for “unlearning,” universality is why it’s a made-up problem. Nobody loudly complains that x-rays make doctors worse at feeling around for lumps.

ell1e@leminal.space · 20 hours ago

Others seem to disagree: https://cacm.acm.org/blogcacm/model-collapse-is-already-happening-we-just-pretend-it-isnt/