A Glimmer of Hope in the AI Replacing Jobs Argument

News stories continue to emerge claiming that AI is going to take away most jobs. Generative AI can do the work of multiple writers, so the argument goes. AI can replace artists, voice-over talent, actors, and so on.

But here's an interesting twist. A research paper submitted to arXiv suggests that as more models are trained on the output of previous generations of generative AI, the information becomes progressively more diluted. In other words, when AI-generated content feeds back into the training data, the result is what the authors refer to as model collapse.

[2305.17493v2] The Curse of Recursion: Training on Generated Data Makes Models Forget (arxiv.org)

Disclaimer: arXiv offers a repository for researchers. The organization (run by Cornell University) does not peer-review the research, although the articles are moderated for completeness. As with anything, it's important to keep that in mind when considering the content of the research. That doesn't necessarily make it wrong. But use that information for what it's worth.

If you decide to forge ahead and read the research, be forewarned that it is a bit mathematically intensive. I have not gotten through the entire article yet, but I did submit it to ChatGPT to summarize. Here is the result of that summarization:

----------- Start of ChatGPT results --------------

This paper explores the phenomenon of "Model Collapse" in the context of generative models, particularly Large Language Models (LLMs) like GPT. Model collapse is a process where generative models progressively lose the ability to represent the true underlying data distribution when they are trained on data generated by their predecessors. The paper makes several key points:

1. **Background**: The paper begins by discussing the importance of human communication online, mentioning the role of search engines and smart text editors. It highlights that LLMs, such as GPT-4, are capable of generating text that is often indistinguishable from human-written text.

2. **Model Collapse**: The paper introduces the concept of model collapse, which is a degenerative process observed in generative models. It occurs as models are trained on data generated by previous generations of models, leading to a progressive loss of information about the true data distribution.

3. **Causes of Model Collapse**: The paper identifies two primary causes of model collapse:
- **Statistical Approximation Error**: This error arises due to the finite number of samples used in training, leading to the loss of information.
- **Functional Approximation Error**: This error stems from limitations in the expressiveness of the function approximators (e.g., neural networks) used in the models.

4. **Discrete Distributions**: The paper discusses the case of discrete probability distributions and how model collapse affects them. It explains that model collapse in this scenario is primarily due to statistical errors.

5. **Single-Dimensional Gaussian**: The paper explores the case of a single-dimensional Gaussian distribution. It shows that even with perfect functional approximation (i.e., fitting a Gaussian to the data), model collapse can still occur due to sampling errors. Variance increases linearly with each generation.

6. **Wasserstein Distance**: The paper suggests using the Wasserstein-2 distance as a measure to quantify the distance between the true distribution and the approximated distribution at each generation.

7. **Implications**: The paper concludes by discussing the broader implications of model collapse. It emphasizes the importance of having access to genuine human-generated content to avoid model collapse, particularly when the tails of the data distribution matter. It suggests that the use of LLMs to generate content on the internet can contaminate training data with model-generated data.

In summary, the paper presents a theoretical exploration of model collapse in generative models, illustrating how both statistical and functional approximation errors contribute to the phenomenon. It emphasizes the need for careful consideration of data sources when training generative models to avoid model collapse.

----------- End of ChatGPT results --------------
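To make the single-dimensional Gaussian case (point 5 above) concrete, here is a small Python simulation of my own. It is a toy sketch, not code from the paper, and the function name and parameters are my invention. Each "generation" fits a Gaussian to a finite sample drawn from the previous generation's fit, using only the sample mean and standard deviation:

```python
import random
import statistics

def collapse_demo(n_samples=100, generations=1000, seed=0):
    """Recursively fit a Gaussian to samples drawn from the previous fit."""
    rng = random.Random(seed)
    # Generation 0 is trained on real data from the true distribution N(0, 1).
    data = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    mu_hat, sigma_hat = 0.0, 1.0
    for _ in range(generations):
        # Perfect functional form (a Gaussian), but statistical error
        # from estimating its parameters on a finite sample.
        mu_hat = statistics.fmean(data)
        sigma_hat = statistics.pstdev(data)
        # The next generation trains only on the previous model's output.
        data = [rng.gauss(mu_hat, sigma_hat) for _ in range(n_samples)]
    return mu_hat, sigma_hat

mu_hat, sigma_hat = collapse_demo()
print(f"after 1000 generations: mean={mu_hat:.4f}, stdev={sigma_hat:.6f}")
```

Running this, the fitted standard deviation ends up far below the original 1.0: the sampling error compounds with each generation, so later "models" see an ever-narrower slice of the original distribution.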

The research seems to conclude that humans will continue to be needed in the creative process for the models to "stay fresh." Even though I knew this, I decided to ask ChatGPT the following question:

Me: Does the paper suggest that we will continue to require human-created content to keep the information fresh for the models?

----------- Start of ChatGPT results --------------

Yes, the paper suggests that access to genuine human-created content is essential to keep the information fresh for generative models and to avoid model collapse. The paper emphasizes that when models are trained on data generated by their predecessors, they can progressively lose touch with the true underlying data distribution. This means that relying solely on model-generated content for training new generations of models can lead to a degradation in the quality and diversity of the generated content.

To prevent or mitigate model collapse, the paper suggests that continuing to incorporate real human-generated data into the training process is crucial. Human-generated content serves as a valuable source of diverse and authentic data that can help maintain the models' ability to generate high-quality, relevant, and up-to-date content.

In essence, the paper highlights the ongoing importance of human-created content in training and fine-tuning generative models to ensure that they remain reliable and capable of producing fresh and contextually accurate output.

----------- End of ChatGPT results --------------

AI is here to stay, and it will improve remarkably over the next few years. There may even come a time when AI thinks exactly like humans, and that may be enough to offer the diversity in the "gene pool" of content. Who knows? But at least in the short term, it seems that human-generated content is still held in high regard.
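The "gene pool" metaphor also maps nicely onto the discrete case from the summary (point 4): when each generation is trained only on samples of the previous one's output, rare categories in the tail disappear first and can never come back. Here is another toy Python sketch of my own (the names and numbers are made up for illustration):

```python
import random
from collections import Counter

def tail_collapse(n_categories=50, sample_size=200, generations=30, seed=1):
    """Resample a long-tailed discrete distribution from its own output."""
    rng = random.Random(seed)
    categories = list(range(n_categories))
    weights = [1.0 / (i + 1) for i in categories]  # Zipf-like long tail
    data = rng.choices(categories, weights=weights, k=sample_size)
    initial_distinct = len(set(data))
    for _ in range(generations):
        # Each generation's "model" is just the empirical distribution
        # of the previous generation's output.
        counts = Counter(data)
        cats = list(counts)
        data = rng.choices(cats, weights=[counts[c] for c in cats],
                           k=sample_size)
    return initial_distinct, len(set(data))

before, after = tail_collapse()
print(f"distinct categories: generation 0 = {before}, generation 30 = {after}")
```

Once a category draws zero samples in some generation, no later generation can ever produce it, so the number of distinct categories can only shrink over time: exactly the loss of the distribution's tails that the paper warns about.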

Best Regards,

Jim


Recent Comments

We would definitely hope so, James! Some good info!

Jeff

Hey Jeff,
Nice to hear from you again. Thank you for your comment.
Best Regards,
Jim

Thanks, Jim! A compelling post, my friend!

Jeff

Hi Jim, You know I like your post, but I have to disagree on one point: we cannot do without humans in the workforce. We were skeptical about computers, but we still need a paper trail because we don't know when a computer will crash.

It is the same with AI: it has no feelings, no compassion, no human touch. We have that as human beings, but computers don't have that capability. We need human touch, human companionship. We can't cuddle up with AI or a computer.

But what are the benefits of dealing with AI? Writers show feelings through their work, and the same goes for artists. AI can't do that; only human beings can.

Mary

Thanks for your comment, Mary. I don't believe I alluded to doing without humans in the workforce. In fact, I present the counter-argument that suggests we cannot do without humans, at least in the short term.

I did state that if there ever comes a time when AI technology can make machines think completely like humans, then that could change the landscape. But some of the more prominent research on this suggests that we are decades away from that. Again, who knows.

Best Regards,
Jim

Hi Jim, I wasn't saying that; I was stating that in general. But you know, the way things are going, AI might become more sophisticated faster than we think.

We just have to be prepared in either case. In general, society thought computers were a fad, and look what has happened: computers have evolved so fast that now we can't seem to catch up with all the new things.

We started with floppy disks, then we went to hard disks, and now we have no disks but flash drives. We went from vinyl records to CDs, and now we are back to vinyl records. We went from 8-track tapes to cassette tapes.

So things are evolving faster than we can catch up to. We used to have internal hard drives; now we have external hard drives. Also, we went from TVs with picture tubes to flat screens.

So going back to what I was saying, AI could be evolving just as fast; we don't know exactly when that will happen.

Mary

Hey Mary,
Thanks for the follow-up comment.
Best Regards,
Jim

Hi, Jim

Yes, I've already read that paper; very interesting stuff!

Check out "Fusing Large Language Models with
Completion Engines for Automated Program Repair."
https://arxiv.org/pdf/2309.00608.pdf

The Repilot framework is a workable alternative, for sure!

By the way, arXiv is run by Cornell University, not Columbia. 😎

Rock On! 🤘
Frank 🎸

Hey Frank, thanks for your comment. I will check out that article. And thanks for the clarification on the university. I am changing it now in the article :)

Best Regards,
Jim

You're welcome, Jim. 😎

Have a great week ahead!
Frank 🎸

You as well, Frank!
