# Remove Background (transparent video)

> Remove the background from a video or image and drop it into any composition as a transparent overlay.

Background removal — also called *matting* in VFX — separates a foreground subject (typically a person) from its background. The output is a video with an alpha channel: fully transparent where the background was, opaque where the subject is. Drop it into any HyperFrames composition as a `<video>` tag and the subject floats over whatever you put behind them.

The CLI ships a built-in `remove-background` command that runs locally — no API keys, no cloud upload, no green screen.

## Quick Start

<Steps>
  <Step title="Verify ffmpeg is installed">
    The pipeline needs `ffmpeg` and `ffprobe` for decode + encode. Most systems already have them; if not:

    ```bash Terminal theme={null}
    # macOS
    brew install ffmpeg

    # Ubuntu / Debian
    sudo apt install ffmpeg
    ```

    Confirm with `npx hyperframes doctor` — both should be green.
  </Step>

  <Step title="Remove the background from your video">
    ```bash Terminal theme={null}
    npx hyperframes remove-background subject.mp4 -o transparent.webm
    ```

    On the first run, the CLI downloads \~168 MB of model weights to `~/.cache/hyperframes/background-removal/models/`. Subsequent runs reuse the cache.

    Output:

    ```
    ◇  Removed background from 240 frames in 38.4s (6.3 fps, CoreML) → ./transparent.webm
    ```
  </Step>

  <Step title="Drop it into a composition">
    The output is a standard VP9-with-alpha WebM. Chrome's `<video>` element decodes the alpha plane natively — no special player needed:

    ```html composition.html theme={null}
    <div class="scene">
      <!-- background layer -->
      <img src="city.jpg" class="bg" />

      <!-- transparent subject floats on top -->
      <video src="transparent.webm" autoplay muted loop playsinline></video>
    </div>
    ```

    Render the composition with the usual `hyperframes render`.
  </Step>
</Steps>

## How it works

The pipeline runs four stages, all locally:

```
ffmpeg decode  →  u²-net_human_seg inference  →  alpha composite  →  ffmpeg encode
   (raw RGB)         (320×320 mask, then upsampled)                    (VP9-alpha)
```

The model is **u²-net\_human\_seg** (MIT license, \~168 MB ONNX). It runs through `onnxruntime-node` with the best-available execution provider on your machine: CoreML on Apple Silicon, CUDA on NVIDIA (opt-in, see [Picking a device explicitly](#picking-a-device-explicitly)), CPU otherwise.

The output is encoded with the exact ffmpeg flags Chrome's `<video>` element needs to decode alpha — `-pix_fmt yuva420p` plus the `alpha_mode=1` metadata tag. Get those wrong and the alpha plane is silently discarded by browsers.
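The "320×320 mask, then upsampled" stage in the diagram can be pictured with a small resampling routine. This is a minimal sketch, assuming bilinear interpolation on an 8-bit single-channel mask; the filter the CLI actually uses is not stated on this page, and `upsampleMask` is a hypothetical name used only for illustration.

```javascript
// Sketch only: bilinear upsampling of a single-channel 8-bit mask.
// Assumption: the real pipeline may use a different resampling filter.
function upsampleMask(mask, srcW, srcH, dstW, dstH) {
  const out = new Uint8Array(dstW * dstH);
  for (let y = 0; y < dstH; y++) {
    // Map the destination pixel centre back into source coordinates, clamped to the edge.
    const sy = Math.min(Math.max((y + 0.5) * (srcH / dstH) - 0.5, 0), srcH - 1);
    const y0 = Math.floor(sy);
    const y1 = Math.min(y0 + 1, srcH - 1);
    const fy = sy - y0;
    for (let x = 0; x < dstW; x++) {
      const sx = Math.min(Math.max((x + 0.5) * (srcW / dstW) - 0.5, 0), srcW - 1);
      const x0 = Math.floor(sx);
      const x1 = Math.min(x0 + 1, srcW - 1);
      const fx = sx - x0;
      // Blend the four neighbouring mask samples.
      const top = mask[y0 * srcW + x0] * (1 - fx) + mask[y0 * srcW + x1] * fx;
      const bottom = mask[y1 * srcW + x0] * (1 - fx) + mask[y1 * srcW + x1] * fx;
      out[y * dstW + x] = Math.round(top * (1 - fy) + bottom * fy);
    }
  }
  return out;
}
```

For a 1080p frame the call would look like `upsampleMask(mask, 320, 320, 1920, 1080)`. The softened hair tips mentioned later on this page are a direct consequence of stretching a 320×320 mask this far.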

## Output formats

| Extension         | Codec                  | When to use                                                      | Size (4s @ 1080p) |
| ----------------- | ---------------------- | ---------------------------------------------------------------- | ----------------- |
| `.webm` (default) | VP9 with alpha         | Drop into `<video>` for HTML5-native transparent playback        | \~1 MB            |
| `.mov`            | ProRes 4444 with alpha | Editing round-trip in Premiere / Resolve / Final Cut             | \~50 MB           |
| `.png`            | PNG with alpha         | Single-image cutout (only when the input is also a single image) | varies            |

```bash Terminal theme={null}
npx hyperframes remove-background subject.mp4 -o transparent.webm        # web playback
npx hyperframes remove-background subject.mp4 -o transparent.mov         # editing
npx hyperframes remove-background portrait.jpg -o cutout.png             # still image
```
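If you script the command, the extension rules in the table can be checked before spawning the CLI. The sketch below only mirrors the table; `describeOutput` and the list of image extensions are assumptions made for illustration, not part of the HyperFrames API.

```javascript
// Mirrors the "Output formats" table. stillOnly marks formats that need a single-image input.
const OUTPUT_FORMATS = {
  ".webm": { codec: "VP9 with alpha", stillOnly: false },
  ".mov": { codec: "ProRes 4444 with alpha", stillOnly: false },
  ".png": { codec: "PNG with alpha", stillOnly: true },
};

// Assumption: which input extensions count as still images is not listed on this page.
const IMAGE_INPUTS = [".jpg", ".jpeg", ".png", ".webp"];

function extensionOf(path) {
  const dot = path.lastIndexOf(".");
  return dot === -1 ? "" : path.slice(dot).toLowerCase();
}

// Returns the codec description for a valid input/output pair, or throws.
function describeOutput(inputPath, outputPath) {
  const format = OUTPUT_FORMATS[extensionOf(outputPath)];
  if (!format) throw new Error(`Unsupported output extension: ${outputPath}`);
  const inputIsImage = IMAGE_INPUTS.includes(extensionOf(inputPath));
  if (format.stillOnly && !inputIsImage) {
    throw new Error(".png output is only valid when the input is a single image");
  }
  return format.codec;
}
```

Calling `describeOutput("subject.mp4", "cutout.png")` throws, which catches the most common mistake before any frames are decoded.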

## Layer separation: emit the cutout and the background plate together

Pass `--background-output` (alias `-b`) to write a *second* transparent video alongside the cutout. It uses the same source RGB, but its alpha is the *inverse* mask: opaque where the surroundings were, transparent where the subject is. The result is a clean two-layer separation from a single inference pass:

```bash Terminal theme={null}
npx hyperframes remove-background subject.mp4 \
  -o subject.webm \
  --background-output plate.webm
```

| Output         | Alpha                                     | Use it as                                                                                                    |
| -------------- | ----------------------------------------- | ------------------------------------------------------------------------------------------------------------ |
| `subject.webm` | Mask — subject opaque                     | Foreground layer (top of stack)                                                                              |
| `plate.webm`   | `255 − mask`: subject region transparent  | Background layer (bottom of stack); anything between this and `subject.webm` is occluded by the subject      |

Both encoders share the source width, height, and frame rate plus your `--quality` preset, so the layers are pixel-aligned. Encode cost roughly doubles; segmentation cost is unchanged.
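The relationship between the two outputs is easiest to see per pixel. The following is a minimal sketch, assuming packed 8-bit RGB input and an 8-bit mask of the same dimensions; `splitLayers` is a hypothetical helper, not the CLI's internal code.

```javascript
// Sketch: build both RGBA layers from one RGB frame and one mask.
// rgb holds 3 bytes per pixel, mask holds 1 byte per pixel (255 = subject).
function splitLayers(rgb, mask) {
  const pixels = mask.length;
  if (rgb.length !== pixels * 3) throw new Error("rgb must hold 3 bytes per mask pixel");
  const subject = new Uint8Array(pixels * 4);
  const plate = new Uint8Array(pixels * 4);
  for (let i = 0; i < pixels; i++) {
    // Both layers carry identical colour; only alpha differs.
    for (let c = 0; c < 3; c++) {
      subject[i * 4 + c] = rgb[i * 3 + c];
      plate[i * 4 + c] = rgb[i * 3 + c];
    }
    subject[i * 4 + 3] = mask[i];      // subject opaque
    plate[i * 4 + 3] = 255 - mask[i];  // subject region transparent
  }
  return { subject, plate };
}
```

Because the two alpha values always sum to 255, one mask (and therefore one inference pass) is enough to produce both layers.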

<Tip>
  **This is a hole-cut plate, not an inpainted clean plate.** The subject region in `plate.webm` is fully transparent — you have to composite something opaque under it (a graphic, a blurred copy, a different scene) to fill the hole. If you need an actual filled background where the subject was, use a video inpainter (LaMa, ProPainter, RunwayML Inpaint) — `remove-background` is not the right tool for that.
</Tip>

### Hole-cut vs. clean plate — when does the difference matter?

A **hole-cut plate** keeps the original surroundings and makes the subject region transparent. A **clean plate** fills the subject region with reconstructed background — produced by a separate inpainting model. Display each alone over black:

|                    | Hole-cut plate (this command)               | Clean plate (inpainted)                   |
| ------------------ | ------------------------------------------- | ----------------------------------------- |
| Subject region     | Transparent silhouette                      | Reconstructed background pixels           |
| What you see alone | A person-shaped hole                        | An empty room                             |
| Cost               | One inference pass, one extra ffmpeg encode | A second model (LaMa, ProPainter, E2FGVI) |
| Tool               | `remove-background --background-output`     | Outside this CLI                          |

The deciding question is: **does anything ever need to be visible *through* the subject's silhouette where the subject used to be?**

| Use case                                                                  | What you need                                                     |
| ------------------------------------------------------------------------- | ----------------------------------------------------------------- |
| Text/graphics live *between* the cutout and the plate (the example below) | **Hole-cut**: the graphics fill the hole.                         |
| Composite the subject onto an unrelated scene                             | Neither. Just use `subject.webm`; the plate is irrelevant.        |
| Show "the room without the person" as a real background                   | **Clean plate**: a hole-cut plate would show a transparent void.  |
| Replace the person with a different subject (re-target)                   | **Clean plate**: the new subject needs real pixels under it.      |
| VFX rotoscoping / "remove an extra from this take"                        | **Clean plate**: the canonical inpainting use case.               |

If something opaque always covers the silhouette, hole-cut is sufficient and \~1000× cheaper than running an inpainter.

### The two-layer composition pattern

The two-layer pattern is functionally a drop-in for [text-behind-subject](#text-behind-subject-the-recommended-layout) without needing the original `presenter.mp4` in the project — the plate replaces it as the bottom layer:

```html theme={null}
<!-- z=1 inverse-alpha plate fills everything except the subject's silhouette -->
<video src="plate.webm" data-start="0" data-duration="6" data-track-index="0" muted playsinline></video>

<!-- z=2 anything you want occluded by the subject lives here -->
<h1 style="z-index:2; position:absolute; top:50%; left:50%; transform:translate(-50%,-50%);">
  MAKE IT IN HYPERFRAMES
</h1>

<!-- z=3 the cutout puts the subject back on top -->
<div class="cutout-wrap" style="position:absolute;inset:0;z-index:3">
  <video src="subject.webm" data-start="0" data-duration="6" data-track-index="1" muted playsinline></video>
</div>
```

Constraints: the flag requires a video input and `.webm` or `.mov` for both outputs. It's not valid for image inputs (no temporal pairing to do) and won't accept `.png` for the plate.

## Performance

Real-world numbers from the [matting eval](https://www.heygenverse.com/a/0dd5a431-1832-4858-862d-de7fb7d02654), running u²-net\_human\_seg on a 4-second 1080p clip:

| Platform                         | Provider | ms/frame | 30-second clip |
| -------------------------------- | -------- | -------- | -------------- |
| Apple Silicon (M2 Pro / M3 / M4) | CoreML   | \~263    | \~2 min        |
| NVIDIA GPU (T4, A10, RTX)        | CUDA     | \~80–150 | \~30–60 s      |
| Linux x86                        | CPU      | \~1100   | \~16 min       |
| macOS Intel                      | CPU      | \~900    | \~13 min       |

Matting is offline preprocessing — you run it once per asset and reuse the output. CPU-only is slow but always works; if you reuse the same subject clip repeatedly, run it once on a faster machine and check the transparent output into your project.
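To extrapolate from your own machine rather than from the table, you can reuse the summary line the CLI prints after a run (the format shown in Quick Start). The sketch below assumes that line keeps the shape shown on this page; `parseSummary` and `estimateSeconds` are hypothetical helpers.

```javascript
// Matches e.g. "Removed background from 240 frames in 38.4s (6.3 fps, CoreML)".
// Assumption: the CLI's summary format stays as shown in this page's Quick Start.
const SUMMARY = /Removed background from (\d+) frames in ([\d.]+)s \([\d.]+ fps, ([^)]+)\)/;

// Extracts frame count, wall time, and provider, or returns null if the line does not match.
function parseSummary(line) {
  const m = SUMMARY.exec(line);
  if (!m) return null;
  return { frames: Number(m[1]), seconds: Number(m[2]), provider: m[3] };
}

// Projects processing time for another clip from a measured run.
function estimateSeconds(summary, clipSeconds, clipFps) {
  const framesPerSecond = summary.frames / summary.seconds;
  return (clipSeconds * clipFps) / framesPerSecond;
}

const measured = parseSummary(
  "Removed background from 240 frames in 38.4s (6.3 fps, CoreML) → ./transparent.webm"
);
const projected = Math.round(estimateSeconds(measured, 30, 30)); // 900 frames at the measured rate
```

The projection is linear in frame count, so it ignores model download time and any fixed startup cost.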

## Picking a device explicitly

`--device auto` is the default and right for almost everyone. The flag exists for two cases:

* **Force CPU on a GPU box** when you want to keep the GPU free for other work, or are debugging an issue specific to one execution provider:

  ```bash Terminal theme={null}
  npx hyperframes remove-background subject.mp4 -o transparent.webm --device cpu
  ```

* **Opt into CUDA** by setting `HYPERFRAMES_CUDA=1` and providing a GPU-enabled `onnxruntime-node` build (the bundled build is CPU + CoreML only, which keeps the install small for the majority of users without an NVIDIA GPU):

  ```bash Terminal theme={null}
  HYPERFRAMES_CUDA=1 npx hyperframes remove-background subject.mp4 -o transparent.webm --device cuda
  ```

Run `npx hyperframes remove-background --info` to see what providers are detected on your machine and which one `auto` would pick.
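The selection order described in this section can be written down as a small decision function. This is a sketch under stated assumptions, not the CLI's implementation: it covers only the `auto`, `cpu`, and `cuda` values shown on this page, and it assumes `auto` considers CUDA only when `HYPERFRAMES_CUDA=1` is set. `pickProvider` and its argument shape are hypothetical.

```javascript
// Sketch of the provider choice. `available` lists providers the runtime reports,
// e.g. ["coreml", "cpu"]. All names here are illustrative assumptions.
function pickProvider({ requested = "auto", platform, arch, env = {}, available = [] }) {
  const cudaReady = env.HYPERFRAMES_CUDA === "1" && available.includes("cuda");
  if (requested === "cpu") return "cpu";
  if (requested === "cuda") {
    if (!cudaReady) {
      throw new Error("CUDA needs HYPERFRAMES_CUDA=1 and a GPU-enabled onnxruntime-node build");
    }
    return "cuda";
  }
  if (requested !== "auto") throw new Error(`Device not covered by this sketch: ${requested}`);
  // auto: CoreML on Apple Silicon, opted-in CUDA next, CPU as the universal fallback.
  if (platform === "darwin" && arch === "arm64" && available.includes("coreml")) return "coreml";
  if (cudaReady) return "cuda";
  return "cpu";
}
```

In Node you could feed it `process.platform`, `process.arch`, and `process.env`; treat the result as a guess and use `--info` for the authoritative answer.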

## Using the transparent video in a composition

The transparent WebM behaves like any other video element. The two patterns you'll use most:

**Subject over a background image:**

```html theme={null}
<div style="position: relative; width: 1920px; height: 1080px;">
  <img src="background.jpg" style="position: absolute; inset: 0;" />
  <video
    src="transparent.webm"
    autoplay
    muted
    loop
    playsinline
    style="position: absolute; right: 80px; bottom: 0; height: 90%;"
  ></video>
</div>
```

**Subject over a HyperFrames scene:**

```html theme={null}
<!-- scene contents (text, animations, etc.) -->
<div class="title-card">Welcome</div>

<!-- subject layered on top -->
<video src="transparent.webm" autoplay muted loop playsinline class="subject"></video>
```

The cutout inherits the composition's frame rate and timeline — it plays through once during the scene's duration, so match the source clip length to the scene length when possible. If the scene is longer than the clip, `loop` handles it.

<Tip>
  When rendering a composition that contains a `<video>` element, the renderer reads the source via ffmpeg internally. Transparent WebMs are decoded with the alpha plane preserved.
</Tip>

## Compositing patterns and pitfalls

The cutout webm is a **re-encoded copy** of the source mp4's RGB: the matting pipeline decodes the source to raw RGB, runs segmentation, and re-encodes to VP9 with alpha. That choice has consequences depending on what you put behind it.

### The three patterns

| Pattern                                                                                | Behind the cutout                                         | Result                                                                                                                                                                                                                                                  |
| -------------------------------------------------------------------------------------- | --------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Cutout over a different scene** *(most common)*                                      | Static image, gradient, animated bg, or unrelated footage | Clean. The cutout is the only source of the subject — no doubling, no edge halo. Use any `--quality`.                                                                                                                                                   |
| **Cutout over its own source mp4** *(text-behind-subject, talking-head with overlays)* | The same mp4 the cutout was generated from                | Two RGB sources for the same person. At default `--quality balanced` (crf 18) the doubling is barely visible; at `--quality fast` (crf 30) you'll see a slight color shift / soft edge on the silhouette. Use `--quality best` (crf 12) for hero shots. |
| **Cutout over different footage of the same subject**                                  | Another take of the same person                           | Looks like two overlapping people. Avoid — re-shoot or re-cut the source.                                                                                                                                                                               |

### Text-behind-subject: the recommended layout

Putting a headline *behind* a presenter so their silhouette occludes the text:

```html theme={null}
<!-- z=1 base mp4: full lobby + presenter, plays the whole scene -->
<video
  id="cf-base"
  data-start="0" data-duration="6" data-media-start="0" data-track-index="0"
  src="presenter.mp4"
  muted playsinline
></video>

<!-- z=2 headline -->
<h1 id="cf-headline" style="position:absolute;top:50%;left:50%;
     transform:translate(-50%,-50%); z-index:2;
     color:#fff; text-shadow:0 6px 32px rgba(0,0,0,.55);
     clip-path:inset(0 0 100% 0); font-size:220px; font-weight:900;">
  MAKE IT IN HYPERFRAMES
</h1>

<!-- z=3 cutout: same source, alpha around presenter, hidden until the cut.
     The wrapper carries the opacity, NOT the <video> itself. -->
<div class="cutout-wrap" style="position:absolute;inset:0;z-index:3;opacity:0">
  <video
    id="cf-cutout"
    data-start="0" data-duration="6" data-media-start="0" data-track-index="1"
    src="presenter.webm"
    muted playsinline
  ></video>
</div>
```

```js theme={null}
const tl = gsap.timeline({ paused: true });
const CUT = 3.3;

// Reveal the headline early
tl.to("#cf-headline", { clipPath: "inset(0 0 0% 0)", duration: 0.6, ease: "expo.out" }, 0.25);

// At the cut, flip the cutout wrapper visible — silhouette punches through the headline
tl.set(".cutout-wrap", { opacity: 1 }, CUT);

// Sentinel: extend timeline to the composition's full duration so the renderer
// doesn't bail past the last meaningful tween.
tl.set({}, {}, 6);
```

### Two non-obvious rules

**1. Wrap the cutout video in a non-timed `<div>` and animate the wrapper, not the video.**

The framework forces `opacity: 1` on any element with `data-start`/`data-duration` while it's "active" — that's how it controls clip visibility. CSS `opacity: 0` on the video element is silently overwritten by the framework's clip lifecycle, so an opacity tween on the video element won't do anything. Wrap the video in a `<div>` that has no `data-*` attributes; the wrapper is owned entirely by your CSS/GSAP.

**2. Both videos start at `data-start="0"` and decode in sync from t=0.**

It's tempting to "late-mount" the cutout (`data-start="3.3"` to match the cut). Don't — Chrome does a seek + decoder warm-up at mount, which can land one frame off the base mp4 at the cut moment. With both videos mounted from t=0 and the cutout's wrapper opacity-animated, both decoders advance the same way and stay frame-accurate.
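Both rules are easy to forget when a composition is generated by a script, so it can help to encode them as a check. The sketch below works on plain descriptor objects whose shape is invented for this example (`dataStart` is `null` when the element has no `data-start`); it is not a HyperFrames API.

```javascript
// Hypothetical descriptors: { dataStart: number | null, animatesOpacity: boolean }.
function checkCutoutLayout({ base, cutoutVideo, cutoutWrapper }) {
  const problems = [];
  // Rule 1: the framework owns opacity on timed elements, so tweens belong on the wrapper.
  if (cutoutVideo.animatesOpacity) {
    problems.push("animate the wrapper's opacity, not the timed <video>");
  }
  if (cutoutWrapper.dataStart !== null) {
    problems.push("the wrapper must not carry data-* timing attributes");
  }
  // Rule 2: both videos mount together so their decoders stay frame-accurate.
  if (cutoutVideo.dataStart !== base.dataStart) {
    problems.push("base and cutout videos should share the same data-start");
  }
  return problems;
}
```

An empty array means the layout follows both rules; a late-mounted cutout with `dataStart: 3.3` would be reported by the last check.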

### Quality preset and color match

When the cutout is overlaid on its own source mp4, the encoder's CRF directly affects how visible the doubling is at edges:

| `--quality`            | CRF | File size (12s @ 1080p) | When to use                                                                     |
| ---------------------- | --- | ----------------------- | ------------------------------------------------------------------------------- |
| `fast`                 | 30  | \~2 MB                  | Cutout sits over an unrelated background and file size matters                  |
| `balanced` *(default)* | 18  | \~6 MB                  | Recommended for text-behind-subject and any pattern that overlays on the source |
| `best`                 | 12  | \~12 MB                 | Hero shots, masters, or anything you'll re-encode downstream                    |

The encoder also writes BT.709 + limited-range color metadata so Chrome's YUV→RGB pipeline matches the source mp4's. Without those tags, the cutout would render slightly differently from the underlying mp4 even at lossless quality (visible red/skin shift).
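If you encode alpha WebMs yourself and want them to line up with the presets above, the argument list can be assembled in one place. This is a sketch that combines the flags named on this page with standard ffmpeg color options; whether the CLI passes exactly these arguments is an assumption, and `vp9AlphaArgs` is a hypothetical helper.

```javascript
// CRF values from the "--quality" table on this page.
const CRF_BY_QUALITY = new Map([["fast", 30], ["balanced", 18], ["best", 12]]);

// Builds ffmpeg arguments for a PNG sequence with alpha -> VP9-with-alpha WebM.
function vp9AlphaArgs({ input, output, fps, quality = "balanced" }) {
  if (!CRF_BY_QUALITY.has(quality)) throw new Error(`Unknown quality preset: ${quality}`);
  return [
    "-framerate", String(fps),
    "-i", input,
    // Convert RGB to BT.709 limited-range YUV so pixel values agree with the tags below.
    "-vf", "scale=out_color_matrix=bt709:out_range=tv,format=yuva420p",
    "-c:v", "libvpx-vp9",
    "-pix_fmt", "yuva420p",
    "-metadata:s:v:0", "alpha_mode=1",
    "-auto-alt-ref", "0",
    "-b:v", "0", "-crf", String(CRF_BY_QUALITY.get(quality)),
    // Color tags (assumed flag choice): BT.709 primaries, transfer, matrix, limited range.
    "-colorspace", "bt709", "-color_primaries", "bt709", "-color_trc", "bt709",
    "-color_range", "tv",
    output,
  ];
}
```

The returned array is meant to be handed to a process spawner such as Node's `child_process.spawn("ffmpeg", args)`, which avoids shell quoting issues with the filter string.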

## What u²-net\_human\_seg is and isn't good for

The model is purpose-built for **portrait / human matting**. It excels when:

* ✅ The subject is a person, head-and-shoulders or full-body
* ✅ The framing is reasonably stable (not a wide handheld shot)
* ✅ The background contrasts with the subject

It struggles or fails on:

* ❌ Non-human subjects (products, animals, objects). The model will return a mostly-empty mask.
* ❌ Very fine hair detail on a busy background. The 320×320 inference resolution means hair tips get softened — fine for most use cases, but compositors notice.
* ❌ Frame-to-frame temporal consistency. Each frame is processed independently, so static backgrounds with moving subjects can show subtle edge flicker. For most web playback this is invisible; for high-end VFX it may matter.
* ❌ Live streams or real-time capture. The pipeline is batch-only.

If your use case hits one of these, see the alternatives below.

## Alternatives — when the built-in command isn't the right tool

The CLI ships **one model on purpose** — the one that's MIT-licensed, runs everywhere, and produces production-quality output for person/portrait video. The list below leads with **free, open-source tools** that pair naturally with HyperFrames. Each entry calls out the actual catch — license, install effort, hardware needs — so you can pick the right one for your situation. Full benchmarks are in the [matting eval](https://www.heygenverse.com/a/0dd5a431-1832-4858-862d-de7fb7d02654).

### Free, open-source CLIs and libraries

These all run locally with no account, no upload, no watermark.

| Tool                                                                                                | When to use it                                                                                                                                                                                                                   | Catch                                                                                                               |
| --------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------- |
| [`rembg`](https://github.com/danielgatis/rembg) (Python, MIT)                                       | You need a different subject type — `isnet-general-use` for objects/animals/products, `birefnet-portrait` for a quality ceiling on hair, `silueta` for a tiny \~40 MB footprint. Same family as our default model, more variety. | Requires Python + `pip install rembg`. Some bundled models (`birefnet-*`) need \~4 GB RAM and are CPU-only          |
| [BiRefNet](https://github.com/ZhengPeng7/BiRefNet) (PyTorch, MIT)                                   | Highest-fidelity portrait mattes available — visibly better hair edges than u²-net                                                                                                                                               | Heavy (\~4 GB inference RAM), slow on CPU, broken on Apple CoreML at the time of the eval                           |
| [Robust Video Matting (RVM)](https://github.com/PeterL1n/RobustVideoMatting) (PyTorch, **GPL-3.0**) | The only widely-available model with **temporal consistency** built in — no edge flicker on moving subjects. Best choice when you're matting a long talking-head clip and frame-to-frame stability matters                       | GPL-3.0 license is incompatible with most commercial / proprietary codebases. Read your repo's license before using |
| [Backgroundremover](https://github.com/nadermx/backgroundremover) (Python, MIT)                     | Simple `pip install` wrapper around u²-net; nice if you want a Python API instead of our Node CLI                                                                                                                                | Same model family as ours, no quality difference — pick whichever fits your stack                                   |
| [ComfyUI](https://github.com/comfyanonymous/ComfyUI) (open-source, GPL-3.0 core)                    | Custom workflows: chain a segmentation model + alpha refinement + temporal smoothing. The right tool for tricky cases (multiple subjects, hair against a similar background, sports footage)                                     | Setup is involved (Python, models, node graph). Worth it for repeat specialty work                                  |

After running any of these externally, encode the output as a HyperFrames-compatible transparent WebM with:

```bash Terminal theme={null}
# Set -framerate to your source frame rate; PNG sequences default to 25 fps
ffmpeg -framerate 30 -i frames-%04d.png -c:v libvpx-vp9 \
  -pix_fmt yuva420p \
  -metadata:s:v:0 alpha_mode=1 \
  -auto-alt-ref 0 -cpu-used 4 -b:v 0 -crf 30 \
  transparent.webm
```

### Free desktop / GUI tools

| Tool                                                                                     | When to use it                                                                                                                       | Catch                                                                                                                                     |
| ---------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------- |
| [DaVinci Resolve — Magic Mask](https://www.blackmagicdesign.com/products/davinciresolve) | You're already editing in Resolve, want a brush-based UI with manual refinement, and need to round-trip the alpha into a larger edit | macOS / Windows / Linux desktop install. The free tier covers Magic Mask; paid Studio version unlocks higher resolutions on some features |
| [Backgroundremover.app](https://backgroundremover.app) (web)                             | One-off image cutout, no signup, no watermark                                                                                        | Single images only, not video. Free tier is hosted but the underlying tool is the same `rembg` model family                               |
| [PhotoRoom Background Remover](https://www.photoroom.com/tools/background-remover) (web) | Quick one-off image, polished UI, no signup                                                                                          | Single images only, e-commerce-tuned model                                                                                                |

### Web SaaS tools (free tiers, with strings)

| Tool                                                                                  | When to use it                                                                                          | Catch                                                                                                                       |
| ------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------- |
| [unscreen.com](https://www.unscreen.com)                                              | Quick one-off video, no install, drag-and-drop                                                          | **Free tier is watermarked and capped at short clips** (\~10s preview). Paid removes both. Run by the team behind remove.bg |
| [RunwayML — Green Screen](https://runwayml.com)                                       | Polished UI with brush refinement and time-aware tracking; the closest a SaaS gets to professional roto | Free tier exists but is credit-limited; serious use is a subscription                                                       |
| [Kapwing — Background Remover](https://www.kapwing.com/tools/remove-video-background) | Browser-based, integrates with their video editor                                                       | Free tier is watermarked; paid removes it                                                                                   |

### How to choose

* **Person / portrait video, web playback, MIT-clean** → use the built-in `hyperframes remove-background` (this is what it's tuned for).
* **Non-human subject** (product, animal, object) → `rembg` with `isnet-general-use`.
* **Maximum portrait quality, especially hair** → `BiRefNet` via Python.
* **Long video where edge flicker would be visible**, GPL is OK → `RVM`.
* **One-off marketing clip, no account or watermark** → DaVinci Resolve (free desktop install) for video, Backgroundremover.app for a still image.
* **Specialty case the off-the-shelf models can't handle** → ComfyUI with a custom graph.

## Troubleshooting

### Model download fails or hangs

The weights live on GitHub Releases (rembg's `v0.0.0` release, \~168 MB). If your network blocks GitHub or the download is interrupted:

```bash Terminal theme={null}
# Manually download and drop into the cache
mkdir -p ~/.cache/hyperframes/background-removal/models
curl -L -o ~/.cache/hyperframes/background-removal/models/u2net_human_seg.onnx \
  https://github.com/danielgatis/rembg/releases/download/v0.0.0/u2net_human_seg.onnx
```

Subsequent `remove-background` runs skip the download and use your local copy.

### "ffmpeg and ffprobe are required"

The pipeline shells out to ffmpeg for decode + encode. Install via `brew install ffmpeg` on macOS or `sudo apt install ffmpeg` on Debian/Ubuntu. Verify with `npx hyperframes doctor`.

### The output WebM looks fully opaque in the browser

Chrome only reads the alpha plane when the WebM is encoded as `yuva420p` with the `alpha_mode=1` metadata tag. The CLI sets both. If you re-encode the output yourself (e.g. with another ffmpeg invocation), preserve those flags:

```bash Terminal theme={null}
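# The leading -c:v forces the libvpx decoder so the input's alpha plane is read;
# ffmpeg's native VP9 decoder may drop it
ffmpeg -c:v libvpx-vp9 -i in.webm -c:v libvpx-vp9 \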
  -pix_fmt yuva420p \
  -metadata:s:v:0 alpha_mode=1 \
  -auto-alt-ref 0 -cpu-used 4 \
  out.webm
```

To verify a WebM has alpha, extract the first frame and inspect:

```bash Terminal theme={null}
ffmpeg -y -c:v libvpx-vp9 -i out.webm -frames:v 1 -pix_fmt rgba -update 1 frame0.png
```

The decoded `frame0.png` should be RGBA and have non-trivial alpha values.
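Because `-pix_fmt rgba` always produces an RGBA file, the format alone does not prove that alpha survived; the alpha values do. One way to check them without an image viewer is to dump the frame as raw bytes instead, for example `ffmpeg -y -c:v libvpx-vp9 -i out.webm -frames:v 1 -pix_fmt rgba -f rawvideo frame0.rgba`, and summarise the fourth byte of every pixel. The helper below is a sketch with a hypothetical name.

```javascript
// Summarises the alpha channel of packed RGBA bytes (4 bytes per pixel).
function summarizeAlpha(rgba) {
  if (rgba.length === 0 || rgba.length % 4 !== 0) {
    throw new Error("expected RGBA bytes (4 per pixel)");
  }
  let min = 255;
  let max = 0;
  let transparent = 0;
  for (let i = 3; i < rgba.length; i += 4) {
    const a = rgba[i];
    if (a < min) min = a;
    if (a > max) max = a;
    if (a === 0) transparent++;
  }
  return { min, max, transparentFraction: transparent / (rgba.length / 4) };
}
```

Pass it the file contents (in Node, the `Buffer` returned by `fs.readFileSync("frame0.rgba")` works, since it is a `Uint8Array`). A `min` of 255 means every pixel is opaque and the alpha plane was lost somewhere in the chain.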

### CoreML is "available" but inference fails to start

The pipeline auto-falls-back to CPU if CoreML fails to bind, with a warning. If you want to skip the CoreML attempt entirely, force CPU:

```bash Terminal theme={null}
npx hyperframes remove-background subject.mp4 -o transparent.webm --device cpu
```

### The alpha mask has rough or jagged edges

That usually means the subject sits against a similar-toned, low-contrast background and the model's 320×320 inference resolution is showing through. Two paths forward:

1. Re-frame or re-shoot to give the subject a more contrasting background.
2. Try `birefnet-portrait` via `rembg` (see [Free, open-source CLIs and libraries](#free-open-source-clis-and-libraries)); it's higher quality at hair edges but slower and heavier.

## Reference

* CLI: [`hyperframes remove-background`](/packages/cli#remove-background)
* Eval: [Matting eval — v7](https://www.heygenverse.com/a/0dd5a431-1832-4858-862d-de7fb7d02654)
* Source model: [danielgatis/rembg](https://github.com/danielgatis/rembg)
* ONNX runtime: [`onnxruntime-node`](https://www.npmjs.com/package/onnxruntime-node)