
See, Hear, Write


Text-only AI (2022): one modality
Modern multimodal AI: multiple modalities
One prompt to analyze a photo, get a playlist, and organize your desk

Take a photo of your messy desk. Upload it to an AI that understands images. Ask it: "Look at this photo of my workspace. Describe what you see, suggest an organization system, and write me a motivational cleaning playlist with 10 song suggestions."


In one interaction, the AI processed a visual image, analyzed spatial organization, applied knowledge about productivity, and crossed into music recommendation, all seamlessly. This is multimodal AI in action.
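For readers curious what "one interaction" might look like under the hood, here is a minimal sketch of bundling a photo and a text instruction into a single request body. The field names (`content`, `type`, `media_type`, `data`) and the helper `build_multimodal_request` are hypothetical and chosen for illustration only; real multimodal services each define their own request format, so treat this as a picture of the idea rather than any vendor's API.

```python
import base64
import json

PROMPT = (
    "Look at this photo of my workspace. Describe what you see, "
    "suggest an organization system, and write me a motivational "
    "cleaning playlist with 10 song suggestions."
)


def build_multimodal_request(image_bytes: bytes, prompt: str,
                             media_type: str = "image/jpeg") -> dict:
    """Bundle one image and one text instruction into a single request body.

    The structure is hypothetical: it only illustrates that the image and
    the text travel together as parts of the same message.
    """
    # Binary image data is commonly base64-encoded so it can sit inside JSON.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "content": [
            {"type": "image", "media_type": media_type, "data": encoded},
            {"type": "text", "text": prompt},
        ]
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real photo read from disk.
    fake_photo = b"placeholder bytes, not a real JPEG"
    request = build_multimodal_request(fake_photo, PROMPT)
    print(json.dumps(request, indent=2))
```

The point of the sketch is that the photo and the instruction arrive as parts of one message, which is what lets the model relate the request ("organize this") to what it sees in the image.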


As recently as 2022, mainstream AI assistants could only work with text. Now they can see your photos, listen to audio recordings, read documents, and respond with text, images, or even code. The walls between different types of content are falling, and the possibilities are expanding quickly.

One prompt that uses text, image, and audio together.
