Meta SAM 3 and 3D available as open models
SAM segments objects in images and videos, and even audio can be separated by prompt. The AI models are freely available.
All zebras have been segmented using a prompt.
(Image: SAM)
Meta is making SAM 3, SAM 3D Objects, and SAM 3D Body freely available. These are the company's latest segmentation models. The term sounds cumbersome, but it simply means automatically separating objects in images and videos. This allows objects to be cut out and, for example, inserted into new environments. Meta sees this as an important step in computer vision.
SAM stands for Segment Anything Model. You can try the new version in the Edits app or in the Playground, a website set up specifically for this purpose. The models are also freely available on the common platforms.
(Image: Eva-Maria Weiß / KI / SAM)
In the Playground, you can upload images or videos and select elements via prompt that SAM will isolate. Typically, a single word is enough; SAM recognizes a person, a dog, or a kettle. Only simple prompts with one, two, or three words are possible. This is because Meta opted for an encoder model instead of integrating a large language model – as Nikhila Ravi, Research Engineer at Meta, explained to us in an interview.
Afterwards, effects can be applied: you can clone the object, pixelate it, add frames, change the background, black it out, and much more. The results can be downloaded. Instagram is the obvious destination for this kind of tinkering, but the images can also be uploaded and shared elsewhere.
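Effects of this kind boil down to changing only the pixels that fall inside the segmented object's mask. The following is a minimal sketch in plain Python, assuming the mask is already available as a grid of booleans and the image is a small grayscale grid. The functions `black_out` and `pixelate` are hypothetical illustrations, not part of SAM or of Meta's apps, and say nothing about how the Playground implements its effects.

```python
from typing import List

Image = List[List[int]]   # grayscale pixels, 0-255 (assumed format)
Mask = List[List[bool]]   # True where the object was segmented (assumed format)


def black_out(image: Image, mask: Mask) -> Image:
    """Return a copy of the image with every masked pixel set to 0."""
    return [
        [0 if m else px for px, m in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]


def pixelate(image: Image, mask: Mask, block: int = 2) -> Image:
    """Return a copy where masked pixels take the mean value of their block."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]  # copy so the input stays untouched
    for top in range(0, height, block):
        for left in range(0, width, block):
            cells = [
                (y, x)
                for y in range(top, min(top + block, height))
                for x in range(left, min(left + block, width))
            ]
            mean = sum(image[y][x] for y, x in cells) // len(cells)
            for y, x in cells:
                if mask[y][x]:  # pixels outside the object are left as they are
                    out[y][x] = mean
    return out
```

Called with a 2x4 image whose left half is masked, `black_out` zeroes only the left half, while `pixelate` replaces it with the block average and leaves the right half unchanged.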
SAM 3D as a Shopping Assistant
It is also possible to select objects in photos, which SAM then turns into a 3D image. These can be inserted into other backgrounds and overlaid with effects. A lamp in a vacuum over ice with firefly-like dots is not a truly realistic use case, but it's fun. If you click and hold the object with the mouse, you can change the viewing angle. People can also be turned into 3D images, and their movements are segmented across multiple images. For this, there is the specialized model SAM 3D Body, which behind the scenes captures a person's skeletal structure rather than the shape of the entire body volume.
(Image: SAM)
Meta is already testing the 3D function in Facebook Marketplace. There, buyers can isolate home furnishings and place them in their own rooms. Online retailers have been working on similar functions for a long time, but the cutouts and objects were previously much more laborious to create. The ability to convert used goods into a 3D object on the fly is new.
In addition to the Playground, SAM 3 can also be used in Edits, Meta's AI-based video editing app. Here too, the functions are geared toward uploading videos and images to Instagram. TikTok offers a similar app with CapCut.
SAM's application areas range from fun image and video editing to robotics and data labeling. The model is not trained for specialized fields such as medicine, however. That would require further fine-tuning.
Also new is the SAM Audio model, which segments sounds, speech, and music. For example, the guitar alone can be filtered out of a video recording of a band, and a conversation or birdsong can be separated from its surroundings. Here too, a simple prompt is sufficient. Under the hood, Meta relies on Perception Encoder Audio Video, another new, freely available model.
(emw)