Wan 2.7 Reference to Video is Now on Segmind: Character-Consistent Video from Any Photo

Generate character-consistent AI videos from reference images at up to 1080P. Wan 2.7-R2V is now live on the Segmind API.

Wan 2.7 Reference to Video — Segmind API featured illustration

Search interest in AI video generation has been climbing steadily since the start of 2026, but most tools still struggle with one fundamental problem: keeping a face consistent across frames. Every time you try to generate a branded spokesperson, a recurring character for a YouTube series, or a digital actor for film pre-visualization, you get a different person in every clip. Wan 2.7 Reference to Video solves that.

What is Wan 2.7 Reference to Video?

Wan 2.7-R2V is a video generation model built specifically for character-consistent outputs from reference images. You pass in one or more portrait photos, write a scene description, and get back a video where that person is doing exactly what you described, in your chosen environment, at up to 1080P resolution. It also supports multi-subject inputs (so you can have two distinct characters in the same scene) and voice cloning, where the character in the video can be made to speak in a specific voice using a reference audio clip. I ran it across a set of industry scenarios and the character fidelity is noticeably better than general-purpose text-to-video models for this specific task.
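To make the multi-subject and voice-cloning features concrete, here is a sketch of what a two-character request payload might look like. The `prompt` and `reference_images` fields mirror the single-subject call shown later in this post; the `reference_audio` field name is an assumption for illustration, so check the Segmind model docs for the actual voice-cloning parameter.

```python
# Sketch of a multi-subject Wan 2.7-R2V request payload.
# "Image1" / "Image2" in the prompt refer to the reference images in order.
# NOTE: "reference_audio" is a hypothetical field name for voice cloning —
# confirm the real parameter in the Segmind docs before using it.
payload = {
    "prompt": (
        "Image1 and Image2 sit across a cafe table, Image1 laughs "
        "while Image2 raises a coffee cup, warm afternoon light"
    ),
    "reference_images": [
        "https://your-cdn.com/person-a.jpg",
        "https://your-cdn.com/person-b.jpg",
    ],
    "reference_audio": "https://your-cdn.com/voice-sample.wav",  # hypothetical
    "resolution": "1080P",
    "duration": 5,
}
```

The key idea is that each subject keyword in the prompt maps to one entry in `reference_images`, in order, so two photos give you two distinct, consistent characters in the same scene.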

What you can build with it

  • Marketing agencies can generate product spokesperson clips, lifestyle brand walkthroughs, and ad-style videos from a single photo of a model or brand ambassador, at a fraction of the cost of a shoot.
  • Film studios and VFX teams can run rapid character pre-visualization, placing a digital actor into a scene and iterating on direction and lighting before committing to production time.
  • Production houses and MCNs can create custom YouTube channel intros, talking head segments, and recurring on-screen characters at scale without repeat shoots.

See it in action

Here is a sample I generated using a single reference image. The model keeps the character's features stable across all five seconds of output.

Prompt used: "Image1 walks through a lush green garden with blooming flowers, smiling warmly at the camera, golden hour lighting"

Wan 2.7-R2V output — character-consistent garden walk from a single reference image

Get started on Segmind

Wan 2.7-R2V is live on the Segmind API right now. 720P clips come in at $0.625 per request, 1080P at $0.9375. No infrastructure to set up, no queue to manage. You call the endpoint, pass in your reference image URL and a prompt, and get back a ready-to-use MP4 in seconds.

import requests

response = requests.post(
    "https://api.segmind.com/v1/wan2.7-r2v",
    headers={"x-api-key": "YOUR_API_KEY"},
    json={
        # "Image1" in the prompt refers to the first reference image
        "prompt": "Image1 walks into a bright product launch event, gestures at camera, confident smile",
        "reference_images": ["https://your-cdn.com/your-photo.jpg"],
        "resolution": "720P",
        "duration": 5,
        "seed": 42
    }
)
response.raise_for_status()  # fail loudly on auth or validation errors

# The response body is the generated video
with open("output.mp4", "wb") as f:
    f.write(response.content)
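Because pricing is flat per request, budgeting a batch of clips is simple multiplication. A minimal sketch, using the per-request prices listed above:

```python
# Rough cost estimate at Segmind's listed per-request prices for Wan 2.7-R2V.
PRICE_PER_REQUEST = {"720P": 0.625, "1080P": 0.9375}

def batch_cost(num_clips: int, resolution: str = "720P") -> float:
    """Total cost in USD for num_clips requests at the given resolution."""
    return num_clips * PRICE_PER_REQUEST[resolution]

print(batch_cost(100, "720P"))   # 100 draft clips -> 62.5
print(batch_cost(20, "1080P"))   # 20 final clips  -> 18.75
```

A common pattern is to iterate on prompts at 720P and re-render only the keepers at 1080P, which keeps the bulk of spend at the lower rate.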

Check the full docs and try it live at segmind.com/models/wan2.7-r2v.