Image-to-Video Only · 30 Copy-Ready Prompts · Preview · June 2026

Cosmos 3 Super Prompt Guide:
Write Prompts That Actually Work

Cosmos 3 Super is xAI's image-to-video model that animates still images into short videos with synchronized audio. The model already sees your uploaded image — your prompt should focus on motion, camera, atmosphere, and audio. This guide follows the official image-to-video prompt structure with copy-ready examples and the mistakes to avoid.


Director's 5 Rules

30 Copy-Ready Examples

Camera + Audio Guide

1–15s · 480p/720p

// Image-to-Video Prompt Formula

The Image-to-Video Prompt Structure

Cosmos 3 Super is image-to-video only. Your uploaded image provides the scene — your prompt describes what should change: motion, camera, atmosphere, and audio.

// The Formula

1 · Motion → 2 · Camera → 3 · Atmosphere → 4 · Audio

Step 1

Motion / Action

What should change — the action, movement, or transformation. Be specific about degree and speed.

“turns head to the right and smiles”
“car racing past at high speed”
“steam rising gently”

Step 2

Camera Movement

How the camera moves — use standard cinematic language the model understands.

“gentle camera push-in”
“camera orbiting at eye level”
“slow pan left”

Step 3

Atmosphere / Lighting

Mood, time of day, or light quality — not what's already visible in the image.

“soft morning window light”
“golden hour warmth”
“cozy kitchen mood”

Step 4

Audio

Native audio is generated automatically. Describe music, SFX, ambience, or dialogue.

“soft room tone”
“footsteps on gravel”
“upbeat electronic music”

// Complete Example

1 · Motion: The woman slowly turns her head to the right and smiles, soft breeze moving her hair.
2 · Camera: Gentle camera push-in.
3 · Atmosphere: Soft morning window light, cozy mood.
4 · Audio: Quiet room tone with faint birdsong.
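If you generate many prompts, the four-part formula can be assembled in code. The following is a minimal Python sketch, not an official tool: `build_prompt` and its parameter names are hypothetical, and only the ordering (motion, camera, atmosphere, audio) comes from this guide.

```python
def build_prompt(motion, camera="", atmosphere="", audio=""):
    """Join the four formula parts, in guide order, into one prompt string.

    Hypothetical helper for illustration; it does not call any model API.
    """
    sentences = []
    for part in (motion, camera, atmosphere, audio):
        text = part.strip()
        if not text:
            continue  # skip any part the caller left empty
        # Look past a closing quote so dialogue like 'Perfect.' is not
        # given a second full stop.
        if text.rstrip("'\"”’")[-1:] not in (".", "!", "?"):
            text += "."
        sentences.append(text)
    return " ".join(sentences)


prompt = build_prompt(
    motion="The woman slowly turns her head to the right and smiles, "
           "soft breeze moving her hair",
    camera="Gentle camera push-in",
    atmosphere="Soft morning window light, cozy mood",
    audio="Quiet room tone with faint birdsong",
)
print(prompt)
```

Keeping each part as its own sentence makes it easy to swap one element, for example a different camera move, without touching the rest of the prompt.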

// Camera Movements

Cinematic Camera Language

Cosmos 3 Super understands standard cinematic camera terminology. Always specify a shot type and camera movement in your image-to-video prompts.

Movement	What it does
Pan left / right	Camera rotates horizontally to reveal a scene
Tilt up / down	Camera rotates vertically for dramatic reveals
Zoom in / out	Lens zooms closer or further
Dolly in / out	Camera physically moves forward or backward (more cinematic than zoom)
Tracking / follow shot	Camera follows a moving subject
Orbit / surround	Camera circles around the subject
Aerial / drone	Elevated bird's-eye perspective
Handheld	Natural shake for documentary feel or urgency
Slow push-in	Gradual forward movement to build tension
Static / tripod	No camera movement for stable, formal compositions

// Audio Prompts

Native Audio Generation

You don't need to add audio separately: Cosmos 3 Super produces it automatically as part of the same generation. Just mention music, sound effects, ambience, or short dialogue in your prompt.

Background music

“with upbeat electronic music”
“dramatic orchestral score”

Sound effects

“footsteps on gravel”
“wind howling”
“engine revving”

Ambient audio

“quiet café ambience”
“forest sounds with birdsong”

Short dialogue

“a quiet whisper: 'We made it.'”
“urgent shout: 'Stop him!'”

Tip: You can add an AUDIO: section at the end of your prompt for clarity. This helps separate visual and audio instructions.

// AUDIO: Section Example

Close-up of hands pulling apart a warm cinnamon roll, steam rising, soft morning window light, slow camera push-in, cozy kitchen mood. AUDIO: soft room tone, faint kettle hiss, gentle pastry tear sound, a quiet satisfied whisper: 'Perfect.'
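If you store visual and audio instructions separately, the AUDIO: convention is easy to apply and undo in code. This is a small Python sketch under that assumption: the helper names `add_audio_section` and `split_audio_section` are hypothetical, and the only thing taken from the guide is the literal `AUDIO:` marker at the end of the prompt.

```python
def add_audio_section(visual, audio):
    """Append the audio instructions after an 'AUDIO:' marker."""
    return f"{visual.strip()} AUDIO: {audio.strip()}"


def split_audio_section(prompt):
    """Split a prompt into (visual, audio) at the first 'AUDIO:' marker.

    If the marker is absent, the whole prompt is treated as visual and
    the audio part is an empty string.
    """
    visual, _marker, audio = prompt.partition("AUDIO:")
    return visual.strip(), audio.strip()


visual = ("Close-up of hands pulling apart a warm cinnamon roll, steam rising, "
          "soft morning window light, slow camera push-in, cozy kitchen mood.")
audio = ("soft room tone, faint kettle hiss, gentle pastry tear sound, "
         "a quiet satisfied whisper: 'Perfect.'")

full_prompt = add_audio_section(visual, audio)
print(full_prompt)
print(split_audio_section(full_prompt))
```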

Want to see how native audio performs in practice? The Cosmos 3 Super review covers real audio test results from 40+ generation runs.

// Prompt Keywords

Prompt Keyword Library

Combine motion, camera, atmosphere, and audio terms for image-to-video prompts.

Motion & Action

Camera Movement

Atmosphere & Lighting

Audio (Native Generation)

// 30 Copy-Ready Prompts

Prompt Examples by Scene

All examples are motion-focused for image-to-video: upload your image first, then paste and customize for your scene.

The sneaker rotates smoothly on the pedestal, camera orbiting at eye level, dramatic spotlight sweeping across the surface.

Slow 360-degree rotation. Studio lighting sweeping across surface. Subtle electronic hum.

Static shot, steam rising from cup. Natural kitchen sounds with distant conversation.

Water droplets falling on watch face in slow motion. Dramatic rim light sweeping. Orchestral swell building.

Product slowly tilts forward revealing details. Clean studio lighting. Quiet ambient tone.

// Bad vs Good Prompts

What Makes a Prompt Actually Work

The fastest way to learn is to see what doesn't work next to what does. These image-to-video prompt comparisons show exactly what to fix — motion, camera, and audio instead of re-describing your image.

Re-describing the image

Bad Prompt

A woman with brown hair and blue dress walking on a beach at sunset with waves

Good Prompt

Slow pull-back as she walks forward. Ocean breeze moving her hair. Ambient wave sounds.

Vague motion

Bad Prompt

Car passing

Good Prompt

Car racing past at high speed. Static wide shot. Engine revving loudly.

Contradicting the image (here, the source photo shows a man)

Bad Prompt

A woman dances gracefully

Good Prompt

The man slowly nods and smiles. Gentle camera push-in. Soft room tone.

No camera direction

Bad Prompt

Steam rising from the coffee cup

Good Prompt

Static close-up, steam rising gently. Morning window light. Quiet café ambience.

Negative prompts

Bad Prompt

No blur, avoid shaking, without grain

Good Prompt

Sharp focus, stable tripod shot, clean cinematic look.

// Common Mistakes

7 Prompt Mistakes That Kill Your Output

These are the most frequent problems we see with Cosmos 3 Super prompts. Each mistake leads to blurry, inconsistent, or off-target results, and each has a simple fix.

Mistake 1

Re-describing the image

The model already sees it. Describing what's in the photo wastes prompt budget and can cause drift.

Fix: Focus on motion, camera movement, atmosphere, and audio.

Mistake 2

Contradicting the source image

Writing actions or subjects that don't match the uploaded photo confuses the output.

Fix: Match your prompt to what's actually in the image.

Mistake 3

Tag stacking

"knight, castle, epic, 8K, cinematic" doesn't help — the model needs intent, not keywords.

Fix: Write a natural sentence with clear motion and camera direction.

Mistake 4

Too many simultaneous actions

Multiple unrelated actions at once produce inconsistent results.

Fix: Keep it to one subject, one action, one camera move — or list multi-beat actions in order.

Mistake 5

No camera direction

Without camera direction, the model defaults to static or unpredictable motion.

Fix: Always specify a shot type and camera movement.

Mistake 6

Vague motion

"The thing moves" gives the model nothing to work with.

Fix: Use specific verbs with intensity modifiers — 'racing past at high speed' not 'passing.'

Mistake 7

Using negative prompts

"No blur", "avoid shaking" — the model ignores negative instructions entirely.

Fix: Describe what you want instead.
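Three of these mistakes (negative prompts, missing camera direction, and tag stacking) can be caught with a rough automated check before you spend a generation. The sketch below is a heuristic in Python, not a model feature: `lint_prompt`, the word lists, and the thresholds are all assumptions chosen for illustration, and the camera terms are taken from the camera table in this guide. The negative-word check is deliberately crude and will also flag dialogue such as 'No!'.

```python
import re

# Words that usually signal a negative instruction (Mistake 7).
NEGATIVE = re.compile(r"\b(?:no|not|without|avoid|never|don't)\b", re.IGNORECASE)

# Camera vocabulary from this guide, with optional -s / -ing endings (Mistake 5).
CAMERA = re.compile(
    r"\b(?:pan|tilt|zoom|dolly|tracking|follow|orbit|aerial|drone|handheld|"
    r"push-in|pull-back|static|tripod)(?:s|ing)?\b",
    re.IGNORECASE,
)


def lint_prompt(prompt):
    """Return a list of warning strings; an empty list means no issues found."""
    warnings = []
    if NEGATIVE.search(prompt):
        warnings.append("negative phrasing: describe what you want instead")
    if not CAMERA.search(prompt):
        warnings.append("no camera direction: add a shot type or camera move")
    # Tag stacking (Mistake 3): four or more comma-separated fragments,
    # each only one or two words long.
    fragments = [f.strip() for f in prompt.split(",") if f.strip()]
    if len(fragments) >= 4 and all(len(f.split()) <= 2 for f in fragments):
        warnings.append("tag stacking: write a natural sentence")
    return warnings


for text in (
    "No blur, avoid shaking, without grain",
    "knight, castle, epic, 8K, cinematic",
    "Sharp focus, stable tripod shot, clean cinematic look.",
):
    print(text, "->", lint_prompt(text))
```

The remaining mistakes, such as contradicting the image or stacking too many actions, depend on what is in the photo and still need a human check.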

See the difference these fixes make by testing them in the Cosmos 3 Super generator — compare a corrected prompt against the original mistake.

// Think Like a Director

Write Prompts for Image-to-Video

Think like a director — your image is the scene. Write about motion, camera, and audio, not description. Every generation requires an input image.

Don't re-describe the image

The model sees it. Tell it what should change — the action, the camera movement, the atmosphere.

Don't contradict the image

If there's a man in the photo, don't write 'a woman dances.' Match your prompt to what's actually there.

Be specific about motion

'Car passing' is vague — 'car racing past at high speed' gives the model something to work with.

Anchor the subject

Mention prominent features: 'the old man wearing glasses' or 'the woman in the red jacket.'

Negative prompts don't work

The model ignores them. Describe what you want instead.

Apply these rules directly in the Cosmos 3 Super generator — upload an image and paste any motion prompt from this guide to test immediately.

// What You Can Make

Use Cases for Image-to-Video

Cosmos 3 Super animates still images into short videos with synchronized audio. It handles both visual generation and audio synthesis in one pass.

Product Showcases

Got a product photo sitting flat? Give it a slow rotation and dramatic lighting sweep — suddenly it looks like a real ad. Works great for watches, sneakers, and anything you want to show off from multiple angles.

Character Animation

Illustrated characters move surprisingly well — think smooth walk cycles and exaggerated expressions, without needing a full animation crew. The model handles cartoon physics better than you'd expect.

Portrait Videos

Animate professional headshots into video introductions with natural human motion. The model handles realistic facial expressions, head turns, and body language.

Creative Projects

Bring concept art to life, animate historical photos, or turn memes into short video clips with appropriate sound effects and music.

Curious how the model performs across these scenarios in real tests? Read the Cosmos 3 Super review for benchmark data and head-to-head results.

// FAQ

Frequently Asked Questions

Common questions about writing prompts for Cosmos 3 Super — covering length, structure, audio, camera control, and consistency.

Does Cosmos 3 Super support text-to-video?

No. Cosmos 3 Super Preview is image-to-video only: every generation requires an input image. Write prompts focused on motion, camera movement, atmosphere, and audio rather than describing the scene from scratch.

How long should a Cosmos 3 Super prompt be?

Keep it short: 1–3 sentences focusing on motion, camera movement, atmosphere, and audio. The image already provides the visual context, so you don't need to describe what's in the photo.

Do I need to describe the image when using image-to-video?

No. The model already sees your uploaded image. Describing what's in the photo wastes prompt budget and can cause the output to drift from your source. Focus only on what should change.

How do I add audio to my Cosmos 3 Super generation?

Include audio descriptions in your prompt: 'ambient rain sounds', 'orchestral swell', 'footsteps on gravel'. Cosmos 3 Super generates synchronized audio automatically. You can add an 'AUDIO:' section at the end of your prompt for clarity.

Can I specify camera movement in Cosmos 3 Super prompts?

Yes. Use standard cinematic language: 'slow push-in', 'aerial drone shot', 'handheld tracking', 'orbit around subject', 'static tripod shot'. The model understands professional camera terminology.

What's the best prompt structure for Cosmos 3 Super image-to-video?

[Motion/Action] + [Camera Movement] + [Atmosphere] + [Audio]. Example: 'Slow zoom out, leaves falling gently. Ocean breeze moving her hair. Ambient wave sounds.' Don't describe what's in the image; focus on what should change.

How do I get more consistent results with Cosmos 3 Super?

For more consistent Cosmos 3 Super results, keep each prompt to one subject, one action, and one camera move. Use specific verbs with intensity modifiers. Iterate in small steps, changing one element at a time. Shorter clips (5–8 seconds) are more stable than 15-second clips. Match aspect ratio to your platform.
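The advice to change one element at a time can be made mechanical. As a rough Python sketch (the function `one_change_variants` and the dictionary layout are hypothetical, not part of any Cosmos 3 Super tooling), the helper below takes a base prompt split into parts and returns variants that each differ from the base in exactly one part, so any change in the output can be traced to a single edit.

```python
def one_change_variants(base, alternatives):
    """Return copies of `base` that each differ from it in exactly one field.

    base:         dict mapping part name -> text, e.g. {"camera": "Static close-up"}
    alternatives: dict mapping part name -> list of texts to try for that part
    """
    variants = []
    for field, options in alternatives.items():
        for option in options:
            if option == base.get(field):
                continue  # identical to the base, so nothing would change
            variant = dict(base)  # shallow copy; the base stays untouched
            variant[field] = option
            variants.append(variant)
    return variants


base = {
    "motion": "Steam rising gently",
    "camera": "Static close-up",
    "audio": "Quiet café ambience",
}
alternatives = {
    "camera": ["Slow push-in", "Static close-up"],
    "audio": ["Soft room tone"],
}

variants = one_change_variants(base, alternatives)
for variant in variants:
    print(". ".join(variant.values()) + ".")
```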

// Ready to Generate

Put Your Prompts to Work

Upload a still image, paste a motion-focused prompt from this guide, and generate a short clip with native audio — no API key required on this site.

Want benchmark data and real test results? Read the full review

