Cosmos 3 Super Prompt Guide:
Write Prompts That Actually Work
Cosmos 3 Super is xAI's image-to-video model that animates still images into short videos with synchronized audio. The model already sees your uploaded image — your prompt should focus on motion, camera, atmosphere, and audio. This guide follows the official image-to-video prompt structure with copy-ready examples and the mistakes to avoid.
// Image-to-Video Prompt Formula
The Image-to-Video Prompt Structure
Cosmos 3 Super is image-to-video only. Your uploaded image provides the scene — your prompt describes what should change: motion, camera, atmosphere, and audio.
// The Formula
Motion / Action
What should change — the action, movement, or transformation. Be specific about degree and speed.
Camera Movement
How the camera moves — use standard cinematic language the model understands.
Atmosphere / Lighting
Mood, time of day, or light quality — not what's already visible in the image.
Audio
Native audio is generated automatically. Describe music, SFX, ambience, or dialogue.
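The four-part formula above can be sketched as a small helper that joins the parts into one prompt string. This is a minimal illustration, not an official API: `build_prompt` and its parameter names are hypothetical, and the only assumption about the model is that it accepts a plain-text prompt.

```python
def build_prompt(motion: str, camera: str, atmosphere: str = "", audio: str = "") -> str:
    """Join the four formula parts into one prompt, skipping empty parts.

    Hypothetical helper for illustration; the part order follows the guide's formula.
    """
    sentences = []
    for part in (motion, camera, atmosphere, audio):
        text = part.strip()
        if not text:
            continue
        # End each part with a period so the parts read as separate sentences.
        # Trailing quotes are ignored when checking for existing punctuation.
        if text.rstrip("'\"")[-1:] not in (".", "!", "?"):
            text += "."
        sentences.append(text)
    return " ".join(sentences)


prompt = build_prompt(
    motion="Water droplets falling on watch face in slow motion",
    camera="Slow push-in",
    atmosphere="Dramatic rim light sweeping",
    audio="Orchestral swell building",
)
print(prompt)
```

Keeping each part as its own short sentence makes it easy to swap a single element, such as the camera move, without rewriting the rest of the prompt.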
// Camera Movements
Cinematic Camera Language
Cosmos 3 Super understands standard cinematic camera terminology. Always specify a shot type and camera movement in your image-to-video prompts.
| Movement | What it does |
|---|---|
| Pan left / right | Camera rotates horizontally to reveal a scene |
| Tilt up / down | Camera rotates vertically for dramatic reveals |
| Zoom in / out | Lens zooms toward or away from the subject while the camera stays in place |
| Dolly in / out | Camera physically moves forward or backward (more cinematic than zoom) |
| Tracking / follow shot | Camera follows a moving subject |
| Orbit / surround | Camera circles around the subject |
| Aerial / drone | Elevated bird's-eye perspective |
| Handheld | Natural shake for documentary feel or urgency |
| Slow push-in | Gradual forward movement to build tension |
| Static / tripod | No camera movement for stable, formal compositions |
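One way to check whether a draft prompt uses any of the terms from the table is a simple keyword scan. The sketch below is an assumption-laden heuristic: `CAMERA_TERMS` and `find_camera_terms` are hypothetical names, and the prefix matching (so that "orbiting" counts as "orbit") is a choice made for this example, not documented model behavior.

```python
import re

# Terms copied from the camera movement table; the matching rules are an assumption.
CAMERA_TERMS = [
    "pan left", "pan right", "tilt up", "tilt down",
    "zoom in", "zoom out", "dolly in", "dolly out",
    "tracking", "follow shot", "orbit", "surround",
    "aerial", "drone", "handheld", "push-in",
    "static", "tripod",
]


def find_camera_terms(prompt: str) -> list[str]:
    """Return the table terms that appear in the prompt, in table order."""
    lowered = prompt.lower()
    found = []
    for term in CAMERA_TERMS:
        # Match at a word start only, so "orbiting" matches "orbit"
        # but a term buried inside another word does not.
        if re.search(r"\b" + re.escape(term), lowered):
            found.append(term)
    return found


print(find_camera_terms("The sneaker rotates smoothly, camera orbiting at eye level"))
print(find_camera_terms("Car passing"))
```

An empty result is a hint that the prompt has no camera direction yet.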
// Audio Prompts
Native Audio Generation
You don't need to add audio separately: Cosmos 3 Super handles it automatically as part of the same generation. Just mention music, sound effects, ambience, or short dialogue in your prompt.
Background music
- “with upbeat electronic music”
- “dramatic orchestral score”
Sound effects
- “footsteps on gravel”
- “wind howling”
- “engine revving”
Ambient audio
- “quiet café ambience”
- “forest sounds with birdsong”
Short dialogue
- “a quiet whisper: 'We made it.'”
- “urgent shout: 'Stop him!'”
Tip: Add an AUDIO: section at the end of your prompt for clarity. This helps separate visual and audio instructions.
Want to see how native audio performs in practice? The Cosmos 3 Super review covers real audio test results from 40+ generation runs.
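The idea of a separate AUDIO: section at the end of the prompt can be sketched as follows. The guide does not specify an exact layout, so the newline before the label and the comma-separated cues are assumptions, and `with_audio_section` is a hypothetical helper name.

```python
def with_audio_section(visual_prompt: str, audio_cues: list[str]) -> str:
    """Append an 'AUDIO:' section listing the audio cues after the visual prompt.

    Layout (newline, label, comma-separated cues) is an assumption for illustration.
    """
    visual = visual_prompt.strip()
    cues = [cue.strip() for cue in audio_cues if cue.strip()]
    if not cues:
        # Nothing to add: return the visual prompt unchanged.
        return visual
    return f"{visual}\nAUDIO: {', '.join(cues)}"


print(with_audio_section(
    "Slow pull-back as she walks forward. Ocean breeze moving her hair.",
    ["footsteps on gravel", "wind howling"],
))
```

Keeping audio cues in their own list also makes it easy to try the same visual prompt with different sound designs.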
// Prompt Keywords
Prompt Keyword Library
Combine motion, camera, atmosphere, and audio terms for image-to-video prompts.
Motion & Action
Camera Movement
Atmosphere & Lighting
Audio (Native Generation)
// 30 Copy-Ready Prompts
Prompt Examples by Scene
All examples are motion-focused for image-to-video: upload your image first, then paste and customize for your scene.
The sneaker rotates smoothly on the pedestal, camera orbiting at eye level, dramatic spotlight sweeping across the surface.
Slow 360-degree rotation. Studio lighting sweeping across surface. Subtle electronic hum.
Static shot, steam rising from cup. Natural kitchen sounds with distant conversation.
Water droplets falling on watch face in slow motion. Dramatic rim light sweeping. Orchestral swell building.
Product slowly tilts forward revealing details. Clean studio lighting. Quiet ambient tone.
// Bad vs Good Prompts
What Makes a Prompt Actually Work
The fastest way to learn is to see what doesn't work next to what does. These image-to-video prompt comparisons show exactly what to fix — motion, camera, and audio instead of re-describing your image.
Bad: A woman with brown hair and blue dress walking on a beach at sunset with waves
Good: Slow pull-back as she walks forward. Ocean breeze moving her hair. Ambient wave sounds.
Bad: Car passing
Good: Car racing past at high speed. Static wide shot. Engine revving loudly.
Bad: A woman dances gracefully
Good: The man slowly nods and smiles. Gentle camera push-in. Soft room tone.
Bad: Steam rising from the coffee cup
Good: Static close-up, steam rising gently. Morning window light. Quiet café ambience.
Bad: No blur, avoid shaking, without grain
Good: Sharp focus, stable tripod shot, clean cinematic look.
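The last comparison, turning negative phrases into positive descriptions, can be sketched as a lookup table. The three pairings below are taken from that example; the `POSITIVE_REWRITES` table and `rewrite_negatives` function are hypothetical, and any phrase not in the table is simply left unchanged.

```python
# Pairings taken from the negative-prompt comparison; extend as needed.
POSITIVE_REWRITES = {
    "no blur": "sharp focus",
    "avoid shaking": "stable tripod shot",
    "without grain": "clean cinematic look",
}


def rewrite_negatives(prompt: str) -> str:
    """Replace known negative phrases in a comma-separated prompt with positive ones."""
    phrases = [p.strip() for p in prompt.split(",") if p.strip()]
    rewritten = []
    for phrase in phrases:
        # Look up the phrase case-insensitively, ignoring a trailing period.
        key = phrase.lower().rstrip(".")
        rewritten.append(POSITIVE_REWRITES.get(key, phrase))
    return ", ".join(rewritten)


print(rewrite_negatives("No blur, avoid shaking, without grain"))
```

A table like this only covers phrases you have already decided how to rephrase, so it works best as a reminder list rather than an automatic fixer.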
// Common Mistakes
7 Prompt Mistakes That Kill Your Output
These are the most frequent problems we see with Cosmos 3 Super prompts. Each mistake leads to blurry, inconsistent, or off-target results, and each has a simple fix.
Re-describing the image
The model already sees it. Describing what's in the photo wastes prompt budget and can cause drift.
Fix: Focus on motion, camera movement, atmosphere, and audio.
Contradicting the source image
Writing actions or subjects that don't match the uploaded photo confuses the output.
Fix: Match your prompt to what's actually in the image.
Tag stacking
"knight, castle, epic, 8K, cinematic" doesn't help — the model needs intent, not keywords.
Fix: Write a natural sentence with clear motion and camera direction.
Too many simultaneous actions
Multiple unrelated actions at once produce inconsistent results.
Fix: Keep it to one subject, one action, one camera move — or list multi-beat actions in order.
No camera direction
Without camera direction, the model defaults to static or unpredictable motion.
Fix: Always specify a shot type and camera movement.
Vague motion
"The thing moves" gives the model nothing to work with.
Fix: Use specific verbs with intensity modifiers — 'racing past at high speed' not 'passing.'
Using negative prompts
"No blur", "avoid shaking" — the model ignores negative instructions entirely.
Fix: Describe what you want instead.
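Three of the mistakes above (negative prompts, missing camera direction, and tag stacking) lend themselves to a rough automated check. The sketch below is a heuristic under stated assumptions: `lint_prompt`, the word lists, and the thresholds are all hypothetical choices for this example, and it will produce false positives, for instance when "no" appears inside quoted dialogue.

```python
import re

# Word lists and thresholds are assumptions chosen for illustration.
NEGATIVE_PATTERN = re.compile(r"\b(no|avoid|without|don't|never)\b", re.IGNORECASE)
CAMERA_PATTERN = re.compile(
    r"\b(pan|tilt|zoom|dolly|tracking|follow shot|orbit\w*|aerial|drone|"
    r"handheld|push-in|pull-back|static|tripod)\b",
    re.IGNORECASE,
)


def lint_prompt(prompt: str) -> list[str]:
    """Return warnings for a few common prompt mistakes (heuristic only)."""
    warnings = []
    if NEGATIVE_PATTERN.search(prompt):
        warnings.append("negative phrasing: describe what you want instead")
    if not CAMERA_PATTERN.search(prompt):
        warnings.append("no camera direction: specify a shot type and camera movement")
    # Tag stacking: four or more comma-separated fragments of at most two words each.
    fragments = [f.strip() for f in prompt.split(",") if f.strip()]
    if len(fragments) >= 4 and all(len(f.split()) <= 2 for f in fragments):
        warnings.append("tag stacking: write a natural sentence")
    return warnings


for draft in (
    "knight, castle, epic, 8K, cinematic",
    "Car racing past at high speed. Static wide shot. Engine revving loudly.",
):
    print(draft, "->", lint_prompt(draft))
```

The remaining mistakes, such as contradicting or re-describing the source image, depend on what the image actually shows and are not something a text-only check can catch.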
See the difference these fixes make by testing them in the Cosmos 3 Super generator — compare a corrected prompt against the original mistake.
// Think Like a Director
Write Prompts for Image-to-Video
Think like a director — your image is the scene. Write about motion, camera, and audio, not description. Every generation requires an input image.
Don't re-describe the image
The model sees it. Tell it what should change — the action, the camera movement, the atmosphere.
Don't contradict the image
If there's a man in the photo, don't write 'a woman dances.' Match your prompt to what's actually there.
Be specific about motion
'Car passing' is vague — 'car racing past at high speed' gives the model something to work with.
Anchor the subject
Mention prominent features: 'the old man wearing glasses' or 'the woman in the red jacket.'
Negative prompts don't work
The model ignores them. Describe what you want instead.
Apply these rules directly in the Cosmos 3 Super generator — upload an image and paste any motion prompt from this guide to test immediately.
// What You Can Make
Use Cases for Image-to-Video
Cosmos 3 Super animates still images into short videos with synchronized audio. It handles both visual generation and audio synthesis in one pass.
Product Showcases
Got a product photo sitting flat? Give it a slow rotation and dramatic lighting sweep — suddenly it looks like a real ad. Works great for watches, sneakers, and anything you want to show off from multiple angles.
Character Animation
Illustrated characters move surprisingly well — think smooth walk cycles and exaggerated expressions, without needing a full animation crew. The model handles cartoon physics better than you'd expect.
Portrait Videos
Animate professional headshots into video introductions with natural human motion. The model handles realistic facial expressions, head turns, and body language.
Creative Projects
Bring concept art to life, animate historical photos, or turn memes into short video clips with appropriate sound effects and music.
Curious how the model performs across these scenarios in real tests? Read the Cosmos 3 Super review for benchmark data and head-to-head results.
// FAQ
Frequently Asked Questions
Common questions about writing prompts for Cosmos 3 Super — covering length, structure, audio, camera control, and consistency.
// Ready to Generate
Put Your Prompts to Work
Upload a still image, paste a motion-focused prompt from this guide, and generate a short clip with native audio — no API key required on this site.
Want benchmark data and real test results? Read the full review