“Tech giants harvest data to boost A.I. without ethical concern.”




Key Elements of Article on Tech Giants Harvesting Data for A.I.

TLDR

Key points:

  • Tech giants like OpenAI, Google, and Meta cut corners to harvest data for A.I.
  • OpenAI transcribed over one million hours of YouTube videos to train its latest A.I. system.

Article Summary

In late 2021, OpenAI ran short of English-language text for training its A.I. system. To get more data, it built a speech recognition tool called Whisper and used it to transcribe YouTube videos. Some employees raised concerns that this might violate YouTube’s rules, but the data was collected anyway. Meta, for its part, discussed buying Simon & Schuster to acquire long works and debated gathering copyrighted data without licenses. These episodes highlight how desperately tech companies like Google, OpenAI, and Meta are hunting for data.

The article also discusses the shift toward training A.I. models on larger amounts of data, spurred by developments like GPT-3. Google, for example, uses public information to train its language A.I. models and to build products such as Google Translate and Cloud AI capabilities.