Bad framing kills a clip: what auto reframe is and why it matters
You cut the clip, it looked great, you posted it, and the face vanished from the frame. The problem wasn't the edit. It was the framing. Auto reframe fixes this, but a lot of people still don't get what it really does.

You record a two-hour live stream, use a tool to cut out the best moments, export everything as vertical video and post it. The clips look bad. Not because of the content, not because of the captions. The creator's face shows up cut in half, or pushed into the corner of the screen, with empty space taking over the frame.
This happens more than you'd think. And the cause is always the same: the vertical crop was done with no framing logic at all.
The problem starts at the source of the video
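Camera, screen capture, OBS, webcam: by default, nearly all of it records in 16:9, which is horizontal. The TikTok, Reels and Shorts feed is 9:16, which is vertical. One ratio is the other turned on its side, so a full-height vertical crop keeps less than a third of the original width.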
When you take a horizontal video and turn it into vertical, you have to choose which part of the width is going to show. Most basic tools do this the simple way: they grab the center of the image and that's it. If the person talking happens to be in the center, it works. If they're not, the face disappears.
In a typical webcam stream, the creator is usually reasonably centered. But in a stream with two participants on screen at once? In gameplay with a facecam in the corner? In a podcast recorded at a table with three people? The center crop grabs whatever sits in the middle of the image, often the empty space between people, and cuts off whoever is actually talking.
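To make the center-crop problem concrete, here is a minimal Python sketch of what a basic tool does. The function names and the 1920x1080 source are assumptions chosen for illustration, not taken from any particular product. At full height, the 9:16 window is only about 32% of the source width, so anything outside the middle third is lost.

```python
def center_crop_window(src_w: int, src_h: int) -> tuple[int, int]:
    """Return the (left, right) pixel columns of a full-height 9:16 center crop."""
    crop_w = src_h * 9 // 16      # width of a 9:16 window at full source height
    crop_w -= crop_w % 2          # keep the width even, which many encoders expect
    left = (src_w - crop_w) // 2  # center the window horizontally
    return left, left + crop_w


def is_inside(face_x: float, window: tuple[int, int]) -> bool:
    """True if a face centered at face_x falls inside the crop window."""
    left, right = window
    return left <= face_x < right


window = center_crop_window(1920, 1080)
print(window)                    # (657, 1263)
print(is_inside(960, window))    # True: a centered creator survives the crop
print(is_inside(480, window))    # False: streamer on the left is cut out
print(is_inside(1440, window))   # False: streamer on the right is cut out
```

With two streamers sitting at roughly one quarter and three quarters of the width, neither face lands inside the window, which is exactly the failure described above.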
What auto reframe really is
Auto reframe isn't just "cropping at the center, but smarter." It's a process that analyzes the video and identifies where the faces are and, more importantly, who is speaking at that moment.
Based on that, it decides where the 9:16 frame will sit inside the original 16:9. If the person is on the left, the frame goes left. If they move, the frame follows. If they stop talking and someone else starts, the frame migrates to that other person.
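The positioning step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any specific tool implements it: `subject_x` is a hypothetical horizontal face position coming from some detector, and the crop is a full-height window that only slides left and right. The window is centered on the subject and then clamped so it never leaves the source image.

```python
def crop_left_for_subject(subject_x: float, src_w: int, crop_w: int) -> int:
    """Left edge of a crop window centered on subject_x, kept inside the source."""
    left = subject_x - crop_w / 2             # center the window on the subject
    left = max(0, min(left, src_w - crop_w))  # clamp to the edges of the source
    return int(left)


# 1920 px wide source, 606 px wide vertical window (both values are assumptions)
print(crop_left_for_subject(480, 1920, 606))   # 177: frame moves left with the subject
print(crop_left_for_subject(100, 1920, 606))   # 0: clamped at the left edge
print(crop_left_for_subject(1900, 1920, 606))  # 1314: clamped at the right edge
```

Running this per frame, or per scene, is what makes the frame "go left" or "follow" in the sense described above.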
The result looks like someone did a careful scene-by-scene edit, but it's all automatic.
Cut.Pro implements reframe with speaker detection: the frame follows whoever has the voice, not just where there's a face. That makes a huge difference in conversational content.
Why framing matters so much for a clip's performance
On the phone, the clip takes up the whole screen. There's no context around it, no other elements, nothing to pull attention away. The full screen is the face of the person talking, or it should be.
When the face is cropped or pushed into a corner, the viewer notices in under a second. It's not conscious, it's not a critique. It's instinctive rejection. The thumb swipes up before they even process what was being said.
That raises the clip's early drop-off rate. And the TikTok, Instagram and YouTube Shorts algorithms use that signal as a quality indicator. A clip with bad framing gets distributed less, even if the content is good. I've already written about how the first seconds and a clip's duration affect distribution, and framing works on the same logic: it's the bare minimum that has to be right.
Manual versus auto reframe
There are editing tools that let you reframe manually, frame by frame or scene by scene. You drag the crop point, position it where you want, and save. It works well for short videos with static content.
The problem is scale and complexity.
In a 2-hour live stream with 30 clips to extract, reframing each one by hand adds up to hours of work. For anyone with an active channel on Twitch or YouTube who posts content every day, that simply doesn't fit the routine.
On top of that, when there's more than one person in the scene, manual reframe demands constant attention during editing. You have to notice when the speaker changes and adjust the frame at exactly the right moment. Any distraction and it comes out wrong.
Auto reframe solves this by taking the scene-by-scene decision out of your hands.
Practical examples of where reframe makes a difference
Live stream with two creators. A classic situation on Twitch and Kick: two streamers on screen at once, one on each side. The center of the frame is the empty space between them. Without smart reframe, a center crop grabs nothing relevant. With speaker detection, the frame goes to the creator who's talking in that stretch of the clip, and switches when the other one starts talking.
Gameplay with a facecam. The game takes up 80% of the screen and the facecam sits in a corner. Depending on the moment in the clip, what matters is the facecam's reaction, not the game. Auto reframe that detects faces will prioritize the face when it's active, and can balance the two elements when it makes sense.
Podcast at a table. Three people sitting around a table, static camera. A center crop grabs the person in the middle and ignores the other two. With speaker detection, the frame moves to whoever has the floor at each moment in the extracted clip. The vertical clip looks like a multi-cam edit, even though it came from a single still camera.
These cases come up all the time when we work with creators who have variety channels, talk shows, streams with guests. The difference in quality between a plain center crop and speaker-aware reframe is immediate.
What the algorithm "sees" when the frame is wrong
Here's something few people talk about: platform algorithms also process the video's content, not just the metadata. TikTok, for example, uses computer vision to understand what the video is about, what type of content it is, whether there's a visible face, what the expression is.
A clip with a cropped or badly positioned face can be interpreted differently by the categorization system. It's not an explicit penalty, but videos with a well-framed human face tend to get more organic distribution in entertainment and conversation niches, because the system identifies the content type better.
This isn't the main factor, but it's one more reason not to treat framing as a detail.
The limits of auto reframe
Auto reframe solves a lot, but it isn't magic. There are situations where it'll get it wrong or need adjusting.
A very low-quality camera, or bad lighting, makes face detection harder. In very dark scenes, the model can lose the subject and the frame sits in the wrong spot for a few seconds.
There's also the case of a live stream with a heavy overlay, a small facecam and graphic elements covering the face. In that case the reframe detects the face but the visual result still isn't great, because the problem is in the original composition of the stream, not in the crop.
For those cases, the ideal is to have the option of manual adjustment on top of the automatic, where you can fix things selectively without redoing everything from scratch.
How to use reframe in a real clipping workflow
The workflow that works in practice is: let the AI do auto reframe on all the clips as a starting point, quickly review the results, and adjust only the ones that came out bad.
In a batch of 20 clips, maybe 2 or 3 need manual adjustment. The rest are ready to go. That's very different from doing everything by hand.
Cut.Pro generates clips with reframe already applied, so when you open the clip's timeline, the vertical framing is already set and you see the result immediately, without having to process anything separately. You can adjust if you need to, but most of the time you don't.
If you want to understand more about the end-to-end AI clipping process, there's a complete guide on AI clipping for Twitch and Kick that goes into more detail on each step.
One thing that changes everything
There's one detail that separates a useful reframe from one that looks good but delivers bad clips: the difference between following the face and following the speaker.
Following the face is simpler to implement. You detect where there's a face, put the frame there, done. The problem is that in a scene with two faces, the system doesn't know which one to prioritize. Sometimes it keeps switching between them abruptly, which looks like a glitch in the clip.
Following the speaker combines face detection with audio analysis. The system knows who's talking, so it knows which face to prioritize. The transition between speakers can be smooth, similar to a real camera cut. It's that behavior that makes the clip look like it was edited by a human.
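A rough Python sketch of the difference, under stated assumptions: each frame provides a list of detected faces, and each face carries a hypothetical `speaking` score between 0 and 1 that would come from audio analysis. The `Face` class, the 0.5 threshold and the smoothing factor are all illustrative choices, not the method of any specific tool. The frame targets the face with the highest speaking score and eases toward it instead of jumping.

```python
from dataclasses import dataclass


@dataclass
class Face:
    x: float         # horizontal center of the face, in source pixels
    speaking: float  # hypothetical 0..1 score from audio analysis


def pick_target(faces: list[Face], previous_x: float) -> float:
    """Target the most active speaker; hold the previous target if nobody speaks."""
    active = [f for f in faces if f.speaking >= 0.5]
    if not active:
        return previous_x
    return max(active, key=lambda f: f.speaking).x


def follow_speaker(frames: list[list[Face]], start_x: float,
                   smoothing: float = 0.2) -> list[float]:
    """Return the frame's horizontal center for each input frame."""
    positions = []
    current = target = start_x
    for faces in frames:
        target = pick_target(faces, target)
        current += smoothing * (target - current)  # ease toward the target
        positions.append(current)
    return positions


# Left person talks for 3 frames, then the right person takes over.
left_talks = [Face(480, 0.9), Face(1440, 0.1)]
right_talks = [Face(480, 0.1), Face(1440, 0.9)]
path = follow_speaker([left_talks] * 3 + [right_talks] * 5, start_x=480.0)
print([round(x) for x in path])
```

The frame stays on the left speaker while they talk, then moves step by step toward the right speaker once the scores flip. A face-only system has no `speaking` score to rank the two faces, which is why it has no principled way to choose between them.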
This distinction isn't visible in the tool's interface, but it shows up in the result. It's worth paying attention to when you're evaluating which tool to use for clipping: test a podcast with two participants and see what happens to the framing when the speaker changes.
If the frame locks on the first face and ignores the second, or jumps between faces regardless of who's talking, you're using face following. If it migrates smoothly to whoever has the floor, that's speaker following.
For anyone producing conversational content, interviews, streams with guests, that difference defines whether the clip will look professional or need rework in half the cases.


