On Tuesday, Microsoft Research Asia unveiled VASA-1, an AI model that can create a synchronized animated video of a person talking or singing from a single photo and an existing audio track. In the future, it could power virtual avatars that render locally and don't require video feeds—or allow anyone with similar tools to take a photo of a person found online and make them appear to say whatever they want.
"It paves the way for real-time engagements with lifelike avatars that emulate human conversational behaviors," reads the abstract of the accompanying research paper, titled "VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time." It's the work of Sicheng Xu, Guojun Chen, Yu-Xiao Guo, Jiaolong Yang, Chong Li, Zhenyu Zang, Yizhong Zhang, Xin Tong, and Baining Guo.
The VASA framework (short for "Visual Affective Skills Animator") uses machine learning to analyze a static image along with a speech audio clip. It is then able to generate a realistic video with precise facial expressions, head movements, and lip-syncing to the audio. It does not clone or simulate voices (like other Microsoft research) but relies on an existing audio input that could be specially recorded or spoken for a particular purpose.
Subscribers lodged thousands of complaints related to inaccuracies in Amazon's Prime Video catalog, including incorrect content and missing episodes, according to a Business Insider report this week. While Prime Video subscribers aren't the only streaming customers dealing with these problems, Insider's examination of leaked "internal documents" brings more perspective to the impact of mislabeling and similar errors on streaming platforms.
Insider didn't publish the documents but said they show that "60 percent of all content-related customer-experience complaints for Prime Video last year were about catalogue errors," such as movies or shows labeled with wrong or missing titles.
Specific examples reportedly named in the document include Season 1, Episode 2 of The Rings of Power being available before Season 1, Episode 1; character names being mistranslated; Continuum displaying the wrong age rating; and the Spanish-audio version of Die Hard With a Vengeance missing a chunk of audio.
Like the games, the show depicts a Vault Dweller making her way out into the Wasteland. [credit: Amazon]
Amazon has had a rocky history with big, geeky properties making their way onto Prime Video. The Wheel of Time wasn’t for everyone, and I have almost nothing good to say about The Lord of the Rings: The Rings of Power.
Fallout, the first season of which premiered this week, seems to break that bad streak. All the episodes are online now, but I’ve only watched three so far. I love it.
I’ve spent hundreds of hours playing the games that inspired it, so I can only speak to that experience; I don’t know how well it will work for people who never played the games. But as a video game adaptation, it’s up there with The Last of Us.
It's been over two years since major players in the international game industry united to largely cut off the Russian market in response to a request from a beleaguered Ukraine. The relative isolation has apparently forced Vladimir Putin's government to contemplate the kind of homegrown gaming hardware and software that characterized Cold War gaming behind the Iron Curtain.
PC Gamer brings word of a series of recently approved Russian economic orders from the Kremlin. Amid talk of airport and museum funding, ocean shipping, and road construction is the somewhat bewildering instruction for the government to (machine translation):
consider the issue of organizing the production of stationary and portable game consoles and game consoles, as well as the creation of an operating system and a cloud system for delivering games and programs to users
Oh, is that all?
OpenAI has launched a charm offensive in Hollywood, holding meetings with major studios including Paramount, Universal, and Warner Bros. Discovery to showcase its video generation technology Sora and allay fears the artificial intelligence model will harm the movie industry.
Chief Executive Sam Altman and Chief Operating Officer Brad Lightcap gave presentations to executives from the film industry giants, said multiple people with knowledge of the meetings, which took place in recent days.
Altman and Lightcap showed off Sora, a new generative AI model that can create detailed videos from simple written prompts.
All the available Immersive Video launch content fit on a small strip in the TV app. [credit: Samuel Axon]
Tonight, Apple will debut some new Immersive Video content for the Vision Pro headset—the first sports content for the device. It doesn't seem like much after two months of no new content, though.
Starting at 6 pm PT/9 pm ET, Vision Pro users will be able to watch a sports film captured for the platform's Immersive Video format. The video will be a series of highlights from last year's Major League Soccer (MLS) playoffs, and according to Six Colors, it will run just five minutes. It will be free for all Vision Pro users.
On February 2, Apple released what appeared to be the first episodes of three Immersive Video series: Adventure, Prehistoric Planet, and Wildlife. Each debuted alongside the Vision Pro's launch with one episode labeled "Episode 1" of "Season 1."
A Vault Dweller navigates a post-apocalyptic wasteland in Fallout, based on the bestselling gaming franchise.
Amazon Prime Video has dropped the full official trailer for Fallout, the streaming platform's forthcoming post-apocalyptic sci-fi series. It's based on the bestselling role-playing gaming franchise set in a satirical, 1950s-style future post-nuclear apocalypse. There's plenty for gaming fans to be pleased about, judging by the trailer, but casting national treasure Walton Goggins (Justified) as a gunslinging Ghoul was quite simply a stroke of genius.
The first Fallout RPG was released in 1997, followed by several sequels and spinoffs. According to the game's lore, modern civilization is destroyed in 2077 by a global nuclear war between the US and China. Survivors live in various underground vaults (fallout shelters). Each iteration of the game takes place somewhere across a post-apocalyptic US metro area and features a Vault Dweller—someone born and raised underground—as the protagonist. The first game takes place in 2161 and features a Vault Dweller from Vault 13, deep in the mountains of Southern California. The Vault Dweller must complete various missions to save the residents of Vault 13, which takes said protagonist to in-world places like Junktown; a merchant city called the Hub; and Necropolis, filled with Ghouls, i.e., humans badly mutated by exposure to nuclear radiation.
The series was announced in July 2020, with Westworld writers Jonathan Nolan and Lisa Joy serving as executive producers. In January 2022, it was revealed that Nolan would direct the first three episodes but that two other writers—Geneva Robertson-Dworet and Graham Wagner—would be the showrunners. Todd Howard, who directed several games in the franchise, is also an executive producer and has said the series is not an adaptation of any particular game, but it is set within the same continuity. Per the official premise:
Video doorbell cameras have been commoditized to the point where they're available for $30–$40 on marketplaces like Amazon, Walmart, Temu, and Shein. The true cost of owning one might be much greater, however.
Consumer Reports (CR) has released the findings of a security investigation into two budget-minded doorbell brands, Eken and Tuck, which are largely the same hardware produced by the Eken Group in China, according to CR. The cameras are further resold under at least 10 more brands. The cameras are set up through a common mobile app, Aiwit. And the cameras share something else, CR claims: "troubling security vulnerabilities."
In an interview with The Hollywood Reporter published Thursday, filmmaker Tyler Perry spoke about his concerns related to the impact of AI video synthesis on entertainment industry jobs. In particular, he revealed that he has suspended a planned $800 million expansion of his production studio after seeing what OpenAI's recently announced AI video generator Sora can do.
"I have been watching AI very closely," Perry said in the interview. "I was in the middle of, and have been planning for the last four years... an $800 million expansion at the studio, which would’ve increased the backlot a tremendous size—we were adding 12 more soundstages. All of that is currently and indefinitely on hold because of Sora and what I’m seeing. I had gotten word over the last year or so that this was coming, but I had no idea until I saw recently the demonstrations of what it’s able to do. It’s shocking to me."
OpenAI, the company behind ChatGPT, revealed a preview of Sora's capabilities last week. Sora is a text-to-video synthesis model, and it uses a neural network—previously trained on video examples—that can take written descriptions of a scene and turn them into high-definition video clips up to 60 seconds long. Sora caused shock in the tech world because it appeared to dramatically surpass other AI video generators in capability. It seems that a similar shock also rippled into adjacent professional fields. "Being told that it can do all of these things is one thing, but actually seeing the capabilities, it was mind-blowing," Perry said in the interview.
On Monday, Will Smith posted a video on his official Instagram feed that parodied an AI-generated video of the actor eating spaghetti that went viral last year. With the recent announcement of OpenAI's Sora video synthesis model, many people have noted the dramatic jump in AI-video quality over the past year compared to the infamous spaghetti video. Smith's new video plays on that comparison by showing the actual actor eating spaghetti in a comical fashion and claiming that it is AI-generated.
Captioned "This is getting out of hand!", the Instagram video uses a split-screen layout to show the original AI-generated spaghetti video, created by a Reddit user named "chaindrop" in March 2023, on top, labeled with the subtitle "AI Video 1 year ago." Below that, in a box titled "AI Video Now," the real Smith shows 11 video segments of himself actually eating spaghetti by slurping it up while shaking his head, pouring it into his mouth with his fingers, and even nibbling on a friend's hair. Lil Jon's 2006 track "Snap Yo Fingers" plays in the background.
In the Instagram comments section, some people expressed confusion about the new (non-AI) video, saying, "I'm still in doubt if second video was also made by AI or not." In a reply, someone else wrote, "Boomers are gonna loose [sic] this one. Second one is clearly him making a joke but I wouldn’t doubt it in a couple months time it will get like that."
On Thursday, OpenAI announced Sora, a text-to-video AI model that can generate 60-second-long photorealistic HD video from written descriptions. While it's only a research preview that we have not tested, it reportedly creates synthetic video (but not audio yet) at a fidelity and consistency greater than any text-to-video model available at the moment. It's also freaking people out.
"It was nice knowing you all. Please tell your grandchildren about my videos and the lengths we went to to actually record them," wrote Wall Street Journal tech reporter Joanna Stern on X.
"This could be the 'holy shit' moment of AI," wrote Tom Warren of The Verge.
On Thursday, AMC notified subscribers of a proposed $8.3 million settlement that provides awards to an estimated 6 million subscribers of its six streaming services: AMC+, Shudder, Acorn TV, ALLBLK, SundanceNow, and HIDIVE.
The settlement comes in response to allegations that AMC illegally shared subscribers' viewing history with tech companies like Google, Facebook, and X (aka Twitter) in violation of the Video Privacy Protection Act (VPPA).
Passed in 1988, the VPPA prohibits AMC and other video service providers from sharing "information which identifies a person as having requested or obtained specific video materials or services from a video tape service provider." It was originally passed to protect the privacy of individuals' viewing habits after a journalist published the mostly unrevealing video rental history of Robert Bork, a judge who had been nominated to the Supreme Court by Ronald Reagan.
Streaming services like Amazon Prime Video promote annual subscriptions as a way to save money. But long-term commitments to streaming companies that are still struggling to figure out how to maintain or achieve growth typically end up biting subscribers in the butt—and they're getting fed up.
As first reported by The Hollywood Reporter, a lawsuit seeking class-action certification [PDF] hit Amazon on February 9. The complaint centers on Amazon showing ads with Prime Video streams, which it started doing for US subscribers in January unless customers paid an extra $2.99/month. This approach differed from how other streaming services previously introduced ads: by launching a new subscription plan with ads and lower prices and encouraging subscribers to switch.
A problem with this approach, per the lawsuit, is that people who signed up for an annual Prime Video subscription before Amazon’s September 2023 announcement about ads had already paid for a service that’s different from what they expected.
On January 29, Amazon started showing ads to Prime Video subscribers in the US unless they pay an additional $2.99 per month. But this wasn't the only change to the service. Those who don't pay up also lose features; their accounts no longer support Dolby Vision or Dolby Atmos.
As noticed by German tech outlet 4K Filme on Sunday, Prime Video users who choose to sit through ads can no longer use Dolby Vision or Atmos while streaming. Ad-tier subscribers are limited to HDR10+ and Dolby Digital 5.1.
4K Filme confirmed that this was the case on TVs from both LG and Sony; Forbes also confirmed the news using a TCL TV.
On Tuesday, Meta announced its plan to start labeling AI-generated images from other companies like OpenAI and Google, as reported by Reuters. The move aims to enhance transparency on platforms such as Facebook, Instagram, and Threads by informing users when the content they see is digitally synthesized media rather than an authentic photo or video.
Coming during a US election year that is expected to be contentious, Meta's decision is part of a larger effort within the tech industry to establish standards for labeling content created using generative AI models, which are capable of producing fake but realistic audio, images, and video from written prompts. (Even non-AI-generated fake content can potentially confuse social media users, as we covered yesterday.)
Meta President of Global Affairs Nick Clegg made the announcement in a blog post on Meta's website. "We’re taking this approach through the next year, during which a number of important elections are taking place around the world," wrote Clegg. "During this time, we expect to learn much more about how people are creating and sharing AI content, what sort of transparency people find most valuable, and how these technologies evolve."
On Tuesday, Google announced Lumiere, an AI video generator that it calls "a space-time diffusion model for realistic video generation" in the accompanying preprint paper. But let's not kid ourselves: It does a great job of creating videos of cute animals in ridiculous scenarios, such as roller skating, driving a car, or playing a piano. Sure, it can do more, but it is perhaps the most advanced text-to-animal AI video generator yet demonstrated.
According to Google, Lumiere utilizes unique architecture to generate a video's entire temporal duration in one go. Or, as the company put it, "We introduce a Space-Time U-Net architecture that generates the entire temporal duration of the video at once, through a single pass in the model. This is in contrast to existing video models which synthesize distant keyframes followed by temporal super-resolution—an approach that inherently makes global temporal consistency difficult to achieve."
In layperson terms, Google's tech is designed to handle both the space (where things are in the video) and time (how things move and change throughout the video) aspects simultaneously. So, instead of making a video by putting together many small parts or frames, it can create the entire video, from start to finish, in one smooth process.
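The contrast between the two approaches can be loosely illustrated in code. This is not Lumiere's actual implementation—the array shapes and helper names below are invented for the example—but it shows the structural difference between synthesizing sparse keyframes and interpolating between them versus producing every frame of the spatiotemporal volume in one pass:

```python
import numpy as np

# Treat a video as one spatiotemporal volume: (time, height, width, channels).
T, H, W, C = 16, 64, 64, 3

def keyframe_then_upsample(t, h, w, c, key_every=4):
    """Older approach: synthesize distant keyframes, then fill the gaps
    with temporal super-resolution (here, naive linear interpolation)."""
    keyframes = np.random.rand(t // key_every, h, w, c)
    # Interpolate between consecutive keyframes to reach t total frames.
    idx = np.linspace(0, keyframes.shape[0] - 1, t)
    lo = np.floor(idx).astype(int)
    hi = np.ceil(idx).astype(int)
    frac = (idx - lo)[:, None, None, None]
    return (1 - frac) * keyframes[lo] + frac * keyframes[hi]

def single_pass(t, h, w, c):
    """Lumiere-style idea: generate the entire temporal duration at once,
    so every frame is produced jointly rather than interpolated later."""
    return np.random.rand(t, h, w, c)  # one pass over the full volume

a = keyframe_then_upsample(T, H, W, C)
b = single_pass(T, H, W, C)
assert a.shape == b.shape == (T, H, W, C)
```

The outputs have the same shape, but in the first sketch most frames are blends of two keyframes, which is why the paper argues global temporal consistency is harder to achieve that way.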
Patreon, a monetization platform for content creators, has asked a federal judge to deem unconstitutional a rarely invoked law that some privacy advocates consider one of the nation's "strongest protections of consumer privacy against a specific form of data collection." Such a ruling would undo decades of US law carefully shielding the privacy of millions of Americans' personal video viewing habits.
The Video Privacy Protection Act (VPPA) blocks businesses from sharing data with third parties on customers' video purchases and rentals. At a minimum, the VPPA requires written consent each time a business wants to share this sensitive video data—including the title, description, and, in most cases, the subject matter.
The VPPA was passed in 1988 in response to backlash over a reporter sharing the video store rental history of a judge, Robert Bork, who had been nominated to the Supreme Court by Ronald Reagan. The report revealed that Bork apparently liked spy thrillers and British costume dramas and suggested that maybe the judge had a family member who dug John Hughes movies.
Amazon confirmed today in an email to Prime members that it will begin showing ads alongside its streaming Prime Video content starting January 29, 2024. The price will remain the same, but subscribers who don't wish to see any ads will have to pay an additional $2.99 per month on top of their monthly or yearly Amazon Prime subscription. The change was first reported back in September.
"Starting January 29, Prime Video movies and TV shows will include limited advertisements," Amazon wrote in an email sent to Amazon Prime subscribers. "This will allow us to continue investing in compelling content and keep increasing that investment over a long period of time. We aim to have meaningfully fewer ads than linear TV and other streaming TV providers. No action is required from you, and there is no change to the current price of your Prime membership."
Subscribers who want to avoid ads can sign up for the extra monthly fee at the Prime Video website.
"Here, There, and Everywhere" isn't just a Beatles song. It's also a phrase that recalls the spread of generative AI into the tech industry during 2023. Whether you think AI is just a fad or the dawn of a new tech revolution, it's been impossible to deny that AI news has dominated the tech space for the past year.
We've seen a large cast of AI-related characters emerge that includes tech CEOs, machine learning researchers, and AI ethicists—as well as charlatans and doomsayers. From public feedback on the subject of AI, we've heard that it's been difficult for non-technical people to know whom to believe, what AI products (if any) to use, and whether they should fear for their lives or their jobs.
Meanwhile, in keeping with a much-lamented trend of 2022, machine learning research has not slowed down over the past year. On X, former Biden administration tech advisor Suresh Venkatasubramanian wrote, "How do people manage to keep track of ML papers? This is not a request for support in my current state of bewilderment—I'm genuinely asking what strategies seem to work to read (or "read") what appear to be 100s of papers per day."
Users of UniFi, the popular line of wireless devices from manufacturer Ubiquiti, are reporting receiving private camera feeds from, and control over, devices belonging to other users, posts published to social media site Reddit over the past 24 hours show.
“Recently, my wife received a notification from UniFi Protect, which included an image from a security camera,” one Reddit user reported. “However, here's the twist—this camera doesn't belong to us.”
The post included two images. The first showed a notification pushed to the person’s phone reporting that their UDM Pro, a network controller and network gateway used by tech-enthusiast consumers, had detected someone moving in the backyard. A still shot of video recorded by a connected surveillance camera showed a three-story house surrounded by trees. The second image showed the dashboard belonging to the Reddit user. The user’s connected device was a UDM SE, and the video it captured showed a completely different house.
On Tuesday, Stability AI released Stable Video Diffusion, a new free AI research tool that can turn any still image into a short video—with mixed results. It's an open-weights preview of two AI models that use a technique called image-to-video, and it can run locally on a machine with an Nvidia GPU.
Last year, Stability AI made waves with the release of Stable Diffusion, an "open weights" image synthesis model that kick-started a wave of open image synthesis and inspired a large community of hobbyists who have built on the technology with their own custom fine-tunings. Now Stability wants to do the same with AI video synthesis, although the tech is still in its infancy.
Right now, Stable Video Diffusion consists of two models: one that produces 14 frames of image-to-video synthesis (called "SVD"), and another that generates 25 frames (called "SVD-XT"). They can operate at playback speeds ranging from 3 to 30 frames per second, and they output short (typically 2- to 4-second) MP4 video clips at 576×1024 resolution.
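As a quick back-of-the-envelope check on those numbers (this is simple arithmetic, not part of Stability's tooling), clip duration follows directly from frame count divided by playback rate, which is why both models land in the short-clip range at mid-range frame rates:

```python
# Clip duration in seconds is just frame count divided by playback rate.
def clip_seconds(frames: int, fps: int) -> float:
    return frames / fps

# SVD (14 frames) and SVD-XT (25 frames) at a 7 fps playback rate:
svd = clip_seconds(14, 7)     # 2.0 seconds
svd_xt = clip_seconds(25, 7)  # ~3.57 seconds
```

At the extremes of the supported 3-30 fps range, the same 25 frames could span anywhere from under a second to over eight seconds, so the "typically 2-4 seconds" figure reflects the mid-range playback speeds.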
On Tuesday, YouTube announced it will soon implement stricter measures on realistic AI-generated content hosted by the service. "We’ll require creators to disclose when they've created altered or synthetic content that is realistic, including using AI tools," the company wrote in a statement. The changes will roll out over the coming months and into next year.
The move by YouTube comes as part of a series of efforts by the platform to address challenges posed by generative AI in content creation, including deepfakes, voice cloning, and disinformation. When creators upload content, YouTube will provide new options to indicate if the content includes realistic AI-generated or AI-altered material. "For example, this could be an AI-generated video that realistically depicts an event that never happened, or content showing someone saying or doing something they didn't actually do," YouTube writes.
In the detailed announcement, Jennifer Flannery O'Connor and Emily Moxley, vice presidents of product management at YouTube, explained that the policy update aims to maintain a positive ecosystem in the face of generative AI. "We believe it’s in everyone’s interest to maintain a healthy ecosystem of information on YouTube," they write. "We have long-standing policies that prohibit technically manipulated content that misleads viewers ... However, AI’s powerful new forms of storytelling can also be used to generate content that has the potential to mislead viewers—particularly if they’re unaware that the video has been altered or is synthetically created."
The Amazon-produced TV adaptation of the popular, long-running video game series Fallout will premiere on Amazon Prime Video on April 12, 2024, the company announced on Monday.
Amazon also announced that the series will be part of the same canon as the video games. The announcement was made through social media posts that showed an interface resembling the games' iconic Pip-Boy wrist accessory.
The series was announced in July 2020, alongside news that Westworld writers Jonathan Nolan and Lisa Joy would be executive producers. In January 2022, it was revealed that Nolan would direct the first episode but that two other writers—Geneva Robertson-Dworet (Captain Marvel, Tomb Raider) and Graham Wagner (Portlandia, The Office, Silicon Valley)—would be the showrunners.
News of AI deepfakes spreads quickly when you're Tom Hanks. On Sunday, the actor posted a warning on Instagram about an unauthorized AI-generated version of himself being used to sell a dental plan. Hanks' warning spread in the media, including The New York Times. The next day, CBS anchor Gayle King warned of a similar scheme using her likeness to sell a weight-loss product. The now widely reported incidents have raised new concerns about the use of AI in digital media.
"BEWARE!! There’s a video out there promoting some dental plan with an AI version of me. I have nothing to do with it," wrote Hanks on his Instagram feed. Similarly, King shared an AI-augmented video with the words "Fake Video" stamped across it, stating, "I've never heard of this product or used it! Please don't be fooled by these AI videos."
Also on Monday, YouTube celebrity MrBeast posted on social media network X about a similar scam that features a modified video of him with manipulated speech and lip movements promoting a fraudulent iPhone 15 giveaway. "Lots of people are getting this deepfake scam ad of me," he wrote. "Are social media platforms ready to handle the rise of AI deepfakes? This is a serious problem."