JustAnotherOldGuy shares a report from Live Science: A neural network named "Speech2Face" was trained by scientists on millions of educational videos from the internet showing more than 100,000 different people talking. From this dataset, Speech2Face learned associations between vocal cues and certain physical features in a human face, researchers wrote in a new study. The AI then used an audio clip to model a photorealistic face matching the voice, and the results are surprisingly close to the actual faces of the people whose voices it analyzed. The faces generated by Speech2Face didn't precisely match the people behind the voices, but the images did usually capture the correct age ranges, ethnicities and genders of the individuals, according to the study. The findings have been published on the preprint server arXiv but have not been peer-reviewed.
Read more of this story at Slashdot.