Researchers warn we could run out of data to train AI by 2026. What then?
As artificial intelligence (AI) reaches the peak of its popularity, researchers have warned the industry might be running out of training data – the fuel that runs powerful AI systems.
- This could slow down the growth of AI models, especially large language models, and may even alter the trajectory of the AI revolution.
Why high-quality data are important for AI
- For instance, ChatGPT was trained on 570 gigabytes of text data, or about 300 billion words.
- If an algorithm is trained on an insufficient amount of data, it will produce inaccurate or low-quality outputs.
- Low-quality data such as social media posts or blurry photographs are easy to source, but aren’t sufficient to train high-performing AI models.
Do we have enough data?
- At the same time, research shows online data stocks are growing much more slowly than the datasets used to train AI.
- The researchers also estimated low-quality language data will be exhausted sometime between 2030 and 2050, and low-quality image data between 2030 and 2060.
- AI could contribute up to US$15.7 trillion (A$24.1 trillion) to the world economy by 2030, according to accounting and consulting group PwC.
Should we be worried?
- One opportunity is for AI developers to improve algorithms so they use the data they already have more efficiently.
- It’s likely in the coming years they will be able to train high-performing AI systems using less data, and possibly less computational power.
- Remunerating creatives for their work may help redress some of the power imbalance that exists between them and AI companies.
Rita Matulionyte is a member of Standards Australia, IT-043 working group.