What is auto reframe in video?

Auto reframe is the process of intelligently cropping a horizontal (16:9) video to vertical (9:16). It detects faces and identifies who is speaking, so the main subject stays centered in the frame throughout the scene. Without auto reframe, the crop is static and can leave the face out of frame.

What's the difference between manual and auto reframe?

With manual reframe, you drag the crop frame scene by scene, positioning the subject by hand. With auto reframe, the AI detects faces and voices and moves the frame on its own, following the speaker in real time. For long livestreams or podcasts with several participants, doing it manually is impractical.

Does auto reframe work on podcasts with more than one person?

Yes. Speaker-detection-based reframe identifies which person is talking and shifts the frame's focus to them. In a conversation with two or three participants, the frame follows whoever has the floor, with no jarring cut. The result looks hand-edited, but it's automatic.

Why does bad framing hurt a clip's performance?

Vertical clips are watched full-screen on the phone. If the creator's face appears cropped or pushed into a corner, the viewer notices instantly and swipes away. That raises the early drop-off rate, and the algorithm reads it as a negative signal, reducing the clip's distribution.

Bad framing kills a clip: what auto reframe is and why it matters

You cut the clip, it looked great, you posted it, and the face vanished from the frame. The problem wasn't the edit. It was the framing. Auto reframe fixes this, but a lot of people still don't get what it really does.

You record a two-hour livestream, use a tool to cut out the best moments, export everything in vertical format and post it. The clips look bad. Not because of the content, not because of the captions. The creator's face shows up cut in half, or pushed into the corner of the screen, with empty space taking over the frame.

This happens more than you'd think. And the cause is always the same: the vertical crop was done with no framing logic at all.

The problem starts at the source of the video

Cameras, screen capture, OBS, webcams: nearly all of it records in 16:9, which is horizontal. The TikTok, Reels and Shorts feed is 9:16, which is vertical. The two aspect ratios are exact inverses of each other.

When you turn a horizontal video into a vertical one, you have to choose which part of the width is going to show. Most basic tools do this the simple way: they grab the center of the image and that's it. If the person talking happens to be in the center, it works. If they're not, the face disappears.
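To put numbers on it: in a 1920x1080 source, a full-height 9:16 window is only about 606 pixels wide (1080 × 9 / 16 = 607.5), so a center crop keeps roughly a third of the width and throws the rest away. The sketch below is a minimal Python illustration, not any tool's actual code. The function names are hypothetical, and rounding down to an even width is an assumption based on video encoders commonly expecting even dimensions.

```python
def vertical_crop_width(src_h: int, aspect_w: int = 9, aspect_h: int = 16) -> int:
    """Width of a full-height crop window with the given aspect ratio (9:16 by default)."""
    width = src_h * aspect_w // aspect_h  # e.g. 1080 * 9 // 16 = 607 (607.5 floored)
    return width - (width % 2)            # round down to an even width, e.g. 606


def center_crop_x(src_w: int, crop_w: int) -> int:
    """Left edge of a crop window placed at the horizontal center of the source."""
    return (src_w - crop_w) // 2


src_w, src_h = 1920, 1080
crop_w = vertical_crop_width(src_h)
x = center_crop_x(src_w, crop_w)
print(crop_w, x, f"{crop_w / src_w:.0%} of the width kept")  # 606 657 32% of the width kept
```

Whatever sits outside pixels 657 to 1263 simply isn't in the clip, which is why a subject standing off to one side vanishes.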

In a typical webcam livestream, the creator is usually reasonably centered. But in a livestream with two participants on screen at once? In a gameplay stream with a facecam in the corner? In a podcast recorded at a table with three people? The center crop grabs whatever happens to sit in the middle, often empty space or the game itself, and leaves most or all of the people out of the frame.

What auto reframe really is

Auto reframe isn't just "cropping at the center, but smarter." It's a process that analyzes the video and identifies where the faces are and, more importantly, who is speaking at that moment.

Based on that, it decides where the 9:16 frame will sit inside the original 16:9. If the person is on the left, the frame goes left. If they move, the frame follows. If they stop talking and someone else starts, the frame migrates to that other person.
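In its simplest form, that positioning step comes down to centering the 9:16 window on the subject and clamping it so it never leaves the source image. This is a minimal sketch, assuming a face or speaker detector already supplies the subject's horizontal center in pixels (`subject_cx`, a hypothetical input). Real tools also smooth the motion over time, which this sketch leaves out.

```python
def follow_crop_x(subject_cx: float, src_w: int, crop_w: int) -> int:
    """Left edge of a crop window centered on the subject, kept inside the frame."""
    x = round(subject_cx - crop_w / 2)
    # Clamp so the window never extends past the left or right edge of the source.
    return max(0, min(x, src_w - crop_w))


# 1920-px-wide source, 606-px-wide vertical window
print(follow_crop_x(400, 1920, 606))   # 97   (subject on the left: frame goes left)
print(follow_crop_x(100, 1920, 606))   # 0    (subject at the edge: window pinned to the edge)
print(follow_crop_x(1800, 1920, 606))  # 1314 (subject on the right: window pinned right)
```

The clamp is what keeps a subject near the edge from pushing the window outside the image. In that case the subject ends up slightly off-center instead of the crop breaking.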

The result looks like someone did a careful scene-by-scene edit, but it's all automatic.

Cut.Pro implements reframe with speaker detection: the frame follows whoever is speaking, not just wherever there's a face. That makes a huge difference in conversational content.

Why framing matters so much for a clip's performance

On the phone, the clip takes up the whole screen. There's no context around it, no other elements, nothing to pull attention away. The full screen is the face of the person talking, or it should be.

When the face is cropped or pushed into a corner, the viewer notices in under a second. It's not conscious, it's not a critique. It's instinctive rejection. The thumb swipes up before they even process what was being said.

That raises the clip's early drop-off rate. And the TikTok, Instagram and YouTube Shorts algorithms use that signal as a quality indicator. A clip with bad framing gets distributed less, even if the content is good. I've already written about how the first seconds and a clip's duration affect distribution, and framing works on the same logic: it's the bare minimum that has to be right.

Manual versus auto reframe

There are editing tools that let you reframe manually, frame by frame or scene by scene. You drag the crop point, position it where you want, and save. It works well for short videos with static content.

The problem is scale and complexity.

In a 2-hour livestream with 30 clips to extract, doing manual reframe on each one means dozens of hours of work. For anyone with an active channel on Twitch or YouTube who posts content every day, that simply doesn't fit the routine.

On top of that, when there's more than one person in the scene, manual reframe demands constant attention during editing. You have to notice when the speaker changes and adjust the frame at exactly the right moment. Any distraction and it comes out wrong.

Auto reframe solves this by taking the scene-by-scene decision out of your hands.

Practical examples of where reframe makes a difference

Livestream with two creators. A classic situation on Twitch and Kick: two streamers on screen at once, one on each side. The center of the frame is the empty space between them, so a center crop grabs nothing relevant. With speaker detection, the frame goes to the creator who's talking in that stretch of the clip, and switches when the other one starts talking.

Gameplay with a facecam. The game takes up 80% of the screen and the facecam sits in a corner. Depending on the moment in the clip, what matters is the facecam's reaction, not the game. Auto reframe that detects faces will prioritize the face when it's active, and can balance the two elements when it makes sense.

Podcast at a table. Three people sitting around a table, static camera. Center reframe grabs the person in the middle and ignores the other two. With speaker detection, the frame moves to whoever has the floor at each moment in the extracted clip. The vertical clip looks like a multi-cam edit, even though it came from a single still camera.

These cases come up all the time when we work with creators who run variety channels, talk shows and streams with guests. The difference in quality between dumb reframe and AI reframe is immediate.

What the algorithm "sees" when the frame is wrong

Here's something few people talk about: platform algorithms also process the video's content, not just the metadata. TikTok, for example, uses computer vision to understand what the video is about, what type of content it is, whether there's a visible face, what the expression is.

A clip with a cropped or badly positioned face can be interpreted differently by the categorization system. It's not an explicit penalty, but videos with a well-framed human face tend to get more organic distribution in entertainment and conversation niches, because the system identifies the content type better.

This isn't the main factor, but it's one more reason not to treat framing as a detail.

The limits of auto reframe

Auto reframe solves a lot, but it isn't magic. There are situations where it'll get it wrong or need adjusting.

A very low-quality camera, or bad lighting, makes face detection harder. In very dark scenes, the model can lose the subject and the frame sits in the wrong spot for a few seconds.
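One simple way to keep a lost detection from sending the frame somewhere random is to hold the last known position for a short window, then fall back to a neutral position such as the frame center. This is a hypothetical sketch, not a description of how Cut.Pro or any other tool behaves. `None` marks a frame where the assumed detector found no face, and `max_hold` is an illustrative parameter.

```python
from typing import Optional, Sequence


def fill_detection_gaps(
    centers: Sequence[Optional[float]],  # per-frame subject center x, None = not detected
    fallback: float,                     # neutral position, e.g. the frame center
    max_hold: int,                       # how many missed frames to hold the last position
) -> list[float]:
    """Replace missing detections by holding the last known position, then falling back."""
    out: list[float] = []
    last: Optional[float] = None
    misses = 0
    for c in centers:
        if c is not None:
            last, misses = c, 0
            out.append(c)
        else:
            misses += 1
            # Hold briefly; after too many misses (or before any detection), go neutral.
            if last is not None and misses <= max_hold:
                out.append(last)
            else:
                out.append(fallback)
    return out


print(fill_detection_gaps([400.0, None, None, None, 420.0], fallback=960.0, max_hold=2))
# [400.0, 400.0, 400.0, 960.0, 420.0]
```

This doesn't make detection in the dark any better. It only makes the failure predictable, so a short blackout doesn't leave the frame parked on an arbitrary spot.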

There's also the case of a livestream with a heavy overlay, a small camera and graphic elements covering the face. In that case, the reframe detects the face, but the visual result still isn't great, because the problem is in the original composition of the stream, not in the crop.

For those cases, the ideal is to have the option of manual adjustment on top of the automatic result, so you can fix things selectively without redoing everything from scratch.
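Manual fixes on top of an automatic result can be modeled as segment overrides applied to the automatic crop track, so only the frames you touch change. This is a minimal sketch under that assumption: the `Override` type and `apply_overrides` function are hypothetical, and the track is assumed to hold one crop x position per frame.

```python
from dataclasses import dataclass


@dataclass
class Override:
    start: int  # first frame index (inclusive)
    end: int    # last frame index (exclusive)
    x: int      # manually chosen crop left edge for that segment


def apply_overrides(auto_track: list[int], overrides: list[Override]) -> list[int]:
    """Return a new crop track with manual segments replacing the automatic values."""
    track = list(auto_track)  # copy, so the automatic result stays untouched
    for ov in overrides:      # later overrides win where segments overlap
        for i in range(max(0, ov.start), min(ov.end, len(track))):
            track[i] = ov.x
    return track


auto = [100, 110, 120, 130, 140]
print(apply_overrides(auto, [Override(1, 3, 500)]))  # [100, 500, 500, 130, 140]
```

Keeping the automatic track intact means a bad manual fix can be undone without re-running the analysis.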

How to use reframe in a real clipping workflow

The workflow that works in practice is: let the AI do auto reframe on all the clips as a starting point, quickly review the results, and adjust only the ones that came out bad.

In a batch of 20 clips, maybe 2 or 3 need manual adjustment. The rest are ready to go. That's very different from doing everything by hand.
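To make the review pass even quicker, one option is to rank clips by how often the subject went undetected and look at those first. This is a hypothetical heuristic, assuming you have per-frame detection results for each clip (`None` where no subject was found). The 10% threshold is an arbitrary illustrative value, not a figure from any tool.

```python
from typing import Optional, Sequence


def missing_ratio(centers: Sequence[Optional[float]]) -> float:
    """Fraction of frames where no subject was detected (None)."""
    if not centers:
        return 1.0  # an empty track counts as fully missing
    return sum(1 for c in centers if c is None) / len(centers)


def clips_to_review(
    clips: dict[str, Sequence[Optional[float]]], threshold: float = 0.1
) -> list[str]:
    """Names of clips whose missing ratio exceeds `threshold`, worst first."""
    flagged = [name for name, centers in clips.items() if missing_ratio(centers) > threshold]
    return sorted(flagged, key=lambda name: missing_ratio(clips[name]), reverse=True)


batch = {
    "clip_01": [400.0, 410.0, 420.0, 430.0],  # always detected
    "clip_02": [None, None, 500.0, 510.0],    # lost half the time
    "clip_03": [None, 600.0, 610.0, 620.0],   # lost a quarter of the time
}
print(clips_to_review(batch))  # ['clip_02', 'clip_03']
```

A heuristic like this only catches detection gaps. A clip where the frame confidently follows the wrong person still needs a human look.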

Cut.Pro generates clips with reframe already applied, so when you open the clip's timeline, the vertical framing is already set and you see the result immediately, without having to process anything separately. You can adjust if you need to, but most of the time you don't.

If you want to understand more about the end-to-end AI clipping process, there's a complete guide on AI clipping for Twitch and Kick that goes into more detail on each step.

One thing that changes everything

There's one detail that separates a useful reframe from one that looks good but delivers bad clips: the difference between following the face and following the speaker.

Following the face is simpler to implement. You detect where there's a face, put the frame there, done. The problem is that in a scene with two faces, the system doesn't know which one to prioritize. Sometimes it keeps switching between them abruptly, which looks like a glitch in the clip.

Following the speaker combines face detection with audio analysis. The system knows who's talking, so it knows which face to prioritize. The transition between speakers can be smooth, similar to a real camera cut. It's that behavior that makes the clip look like it was edited by a human.
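The speaker-following idea can be sketched in a few lines. Pick the person with the strongest voice activity. Switch targets only after a new speaker has led for a few consecutive frames, so a one-word interjection doesn't yank the frame. Then ease the crop toward the target instead of jumping. This is a minimal sketch, assuming a static camera where each person's face position is fixed and per-person voice activity scores already come from some audio analysis step. All names and parameter values are hypothetical.

```python
from typing import Optional


def speaker_track(
    faces: dict[str, float],        # person id -> face center x (static camera)
    voice: list[dict[str, float]],  # per frame: person id -> voice activity, 0..1 (non-empty)
    min_hold: int = 3,              # frames a new speaker must lead before switching
    alpha: float = 0.5,             # fraction of the remaining distance covered per frame
) -> list[float]:
    """Per-frame crop center that follows the active speaker."""
    target: Optional[str] = None     # person the frame is following
    candidate: Optional[str] = None  # person who might take over
    streak = 0                       # consecutive frames the candidate has been loudest
    pos: Optional[float] = None
    out: list[float] = []
    for activity in voice:
        loudest = max(activity, key=activity.get)
        if target is None:
            target = loudest
        elif loudest != target:
            # Hysteresis: count how long the new speaker stays loudest.
            if loudest == candidate:
                streak += 1
            else:
                candidate, streak = loudest, 1
            if streak >= min_hold:
                target, candidate, streak = loudest, None, 0
        else:
            # The current speaker is still loudest: forget any pending switch.
            candidate, streak = None, 0
        goal = faces[target]
        # Exponential smoothing: glide toward the target instead of jumping.
        pos = goal if pos is None else pos + alpha * (goal - pos)
        out.append(pos)
    return out


faces = {"A": 400.0, "B": 1500.0}
voice = [
    {"A": 0.9, "B": 0.1}, {"A": 0.8, "B": 0.2},
    {"A": 0.2, "B": 0.7},  # one-frame interjection from B: ignored
    {"A": 0.9, "B": 0.1},
    {"A": 0.1, "B": 0.9}, {"A": 0.1, "B": 0.9}, {"A": 0.1, "B": 0.9}, {"A": 0.1, "B": 0.9},
]
print(speaker_track(faces, voice))
# [400.0, 400.0, 400.0, 400.0, 400.0, 400.0, 950.0, 1225.0]
```

With `alpha=1.0` the move becomes an instant cut rather than a pan, which is closer to a real camera switch. Lower values give a gentler glide. The returned value is a crop center, which would still need clamping to the frame edges.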

This distinction isn't visible in the tool's interface, but it shows up in the result. It's worth paying attention to when you're evaluating which tool to use for clipping: test a podcast with two participants and see what happens to the framing when the speaker changes.

If the frame locks on the first face and ignores the second, you're using face following. If it migrates smoothly to whoever has the floor, that's speaker following.

For anyone producing conversational content, such as interviews or streams with guests, that difference determines whether the clip looks professional or needs rework half the time.
