UMass Amherst researchers published an 85-page paper analyzing random samples of YouTube videos to better understand the platform's archive [2]. They found that many videos were intended for personal use or small groups, and that a significant portion were created by children who appear to be under 13. The research raises questions about using such content to train AI models and about the privacy risks involved.
Personal YouTube videos can affect AI training data by contributing diverse content that may not be widely viewed but has high engagement, such as conversations between friends. Such videos can be valuable for training chatbot language models, and even videos with few views may be included in AI training datasets.
The bulk of YouTube's content consists of diverse video types, including personal videos meant for friends and family, educational content such as tutorials and explainers, and popular content like influencer stunts, news clips, and product reviews [2][4]. A significant proportion of this content is created by children who appear to be under 13, which raises privacy concerns for AI companies that train their models on it.