Overview (Updated 1/23/2013)
Vitaly’s hack enables users to reconfigure the way the GH2 camera encodes the video, in order to improve the quality of its recorded images and motion. New or less-techy users, trying to decide which patch to try or use, may find it daunting to read through Personal View’s lengthy forum discussions. This FAQ is intended to give such users a rough idea of what’s going on.
Video and Data Compression: the basics
In an ideal world of fast cameras and unlimited data storage, video would be stored as a series of high-resolution frame images, all stored in a lossless and uncompressed data format. Imagine storing your video as a series of TIF or RAW files.
Only the really high-end video cameras, like the Red or the Arri, support this method. But for most other cameras, video is usually compressed. This is needed because the memory card may not be large or fast enough to capture the massive data streams created by uncompressed video. This requires the camera to perform some very intensive processing to compress the video data.
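To get a feel for how massive those data streams are, here is a rough back-of-the-envelope calculation. It assumes 1920×1080 frames, 3 bytes per pixel (8-bit RGB) and 24 frames per second. These figures are illustrative assumptions, not a description of what the GH2 sensor actually outputs. The function name is made up for this sketch.

```python
def uncompressed_mbps(width, height, bytes_per_pixel, fps):
    """Data rate of uncompressed video in megabits per second (1 Mbit = 1,000,000 bits)."""
    bits_per_frame = width * height * bytes_per_pixel * 8
    return bits_per_frame * fps / 1_000_000

# Assumed figures: 1080p, 8-bit RGB, 24 frames per second
rate = uncompressed_mbps(1920, 1080, 3, 24)
print(round(rate))       # 1194 (Mbps)
print(round(rate / 24))  # 50, i.e. roughly fifty times the stock GH2's 24 Mbps
```

Under those assumptions, the camera would have to squeeze the data down by a factor of about fifty to fit the stock bit rate discussed later in this FAQ.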
This processing takes place during shooting, so the camera has to do this very quickly. So the camera relies on shortcuts, which will be described later in this FAQ. As a result, the camera uses “lossy” data compression, and the image and motion quality can be compromised. (For now, we recommend reading Wikipedia on http://en.wikipedia.org/wiki/Video_coding#Video.)
Vitaly’s hack enables us to change the camera’s shortcuts a little, and improve the quality of its video. This reconfiguration is complex. Sometimes, the settings may not work with each other. (For example, some patches with high image quality may have trouble recording acceptable sound, or “spanning” across multiple files. More about this below.) Some users have spent a lot of time and effort experimenting with the settings, and they have developed “patches” with various strengths, weaknesses, and reliabilities. These include the Sanity patch, the Flow Motion patch, and the many patches developed by Driftwood.
Intraframe and Interframe compression
Earlier, we said that video could be stored as a series of full-resolution TIF files, but this would create far too much data. One way of compressing that data would be to store each frame not as a TIF file, but as a compressed JPEG. (As you may know, saving a TIF file as a JPEG makes a much smaller file.) This technique of compressing each frame individually is called intraframing. Video compression doesn’t use JPEG format, but you get the idea. (See http://en.wikipedia.org/wiki/Intraframe for more information.)
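As a rough illustration of intraframing, the sketch below compresses each frame on its own, with no reference to its neighbours. It uses Python's zlib as a stand-in for the per-frame compressor. That is an assumption made for simplicity: zlib is lossless, while JPEG and the GH2's codec are lossy. The frames are tiny made-up byte strings, and the function names are hypothetical.

```python
import zlib

def intraframe_encode(frames):
    """Compress every frame independently of the others."""
    return [zlib.compress(frame) for frame in frames]

def intraframe_decode(packets):
    """Each packet can be decoded without looking at any other packet."""
    return [zlib.decompress(packet) for packet in packets]

# Three tiny fake "frames" of 64 pixels each; the last one differs by a single pixel
frames = [bytes([128] * 64), bytes([128] * 64), bytes([128] * 63 + [200])]
packets = intraframe_encode(frames)

assert intraframe_decode(packets) == frames
print([len(f) for f in frames], [len(p) for p in packets])
```

Notice that the second frame is compressed all over again even though it is identical to the first. That wasted effort is what interframing goes after.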
The fact that we’re compressing video files, and that video frames look very much like each other, enables a more complicated technique called interframing. This is the technique used by the GH2.
Here, the camera records an initial frame (or “I-frame”). Usually, this I-frame is compressed by intraframing. But the second frame is recorded as only the changes that occur from the I-frame. (See http://en.wikipedia.org/wiki/Video_coding#Video for a discussion.) So the video file may consist of the I-frame (Frame 1), then data that contains only the changes between Frames 1 and 2, then the changes between Frames 2 and 3, etc. This is what interframing is.
Again, interframing requires the camera to perform a LOT of math before writing the data to the memory card. (And such data requires a lot of math to play back.) It can also be a lossy technique, especially when combined with further compression techniques such as macroblocking and motion prediction.
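The idea can be sketched in a few lines. This is a toy version that assumes frames are short lists of pixel values and that a "change" is simply a (position, new value) pair. A real codec stores changes very differently, and the function names here are made up for illustration.

```python
def interframe_encode(frames):
    """Keep the first frame whole, then store only the pixels that changed."""
    first = list(frames[0])
    deltas = []
    for prev, cur in zip(frames, frames[1:]):
        # Record (position, new value) only where a pixel differs from the previous frame
        deltas.append([(i, c) for i, (p, c) in enumerate(zip(prev, cur)) if p != c])
    return first, deltas

def interframe_decode(first, deltas):
    """Rebuild every frame by applying each set of changes to the frame before it."""
    frames = [list(first)]
    for delta in deltas:
        frame = list(frames[-1])
        for position, value in delta:
            frame[position] = value
        frames.append(frame)
    return frames

frames = [[10, 10, 10, 10], [10, 10, 99, 10], [10, 10, 99, 50]]
first, deltas = interframe_encode(frames)
print(deltas)  # [[(2, 99)], [(3, 50)]]
assert interframe_decode(first, deltas) == frames
```

Only one pixel changes per frame in this example, so each later frame shrinks to a single entry. It also shows why playback takes work: Frame 3 can't be rebuilt without first rebuilding Frame 2.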
Macroblocking and Motion Prediction
Earlier we said that, in interframing, the video records only the changes between one frame and the next. The recording of these changes can be compressed even more.
Sometimes, instead of recording all the individual pixel changes, it’s easier to say (in essence) “Take this group of pixels and move it one pixel to the left.” So, instead of recording the changes for every individual pixel, the camera may calculate the changes for small blocks of pixels: perhaps a 4×4 block, a 4×8 block, an 8×8 block, or even a 16×16 block. These are called macroblocks. This technique enables the camera to perform its recording math much more quickly, but it does lose some data because the camera’s basically averaging the content of those blocks. (Consult http://en.wikipedia.org/wiki/Macroblocks for more details.)
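Here is a minimal sketch of how a block's movement might be found. It assumes tiny greyscale frames stored as lists of rows and a brute-force search over nearby offsets. The helper names, frame size and search range are all made up for illustration, and this is not how the GH2 firmware does it.

```python
def make_frame(block_x, block_y, width=6, height=6, size=2):
    """Build a dark frame with one bright size x size block at (block_x, block_y)."""
    frame = [[0] * width for _ in range(height)]
    for row in range(block_y, block_y + size):
        for col in range(block_x, block_x + size):
            frame[row][col] = 255
    return frame

def block_difference(cur, prev, x, y, px, py, size):
    """Sum of absolute pixel differences between a block in cur and a block in prev."""
    return sum(
        abs(cur[y + r][x + c] - prev[py + r][px + c])
        for r in range(size) for c in range(size)
    )

def find_motion(prev, cur, x, y, size=2, search=2):
    """Return the (dx, dy) offset in prev that best matches the block at (x, y) in cur."""
    height, width = len(prev), len(prev[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            px, py = x + dx, y + dy
            # Skip candidate blocks that would fall outside the frame
            if 0 <= px <= width - size and 0 <= py <= height - size:
                cost = block_difference(cur, prev, x, y, px, py, size)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best[1], best[2]

prev = make_frame(3, 2)  # bright block at x=3
cur = make_frame(2, 2)   # same block, moved one pixel to the left
print(find_motion(prev, cur, 2, 2))  # (1, 0): the block came from one pixel to the right
```

Instead of storing four changed pixels, the encoder can store one small offset for the whole block. When no candidate matches well, the leftover difference is where the blocky artifacts come from.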
When macroblocking isn’t done properly, you’ll see little squares in your video that look like flickering blocks. (This is usually seen in areas like the crowds at sports events, leaves on trees, or water in a running stream—where the details of motion can be lost because there’s a lot of similar movement nearby.) This is an artifact of macroblocking, and it’s one of the reasons why people hack the GH2 camera; we want to reduce or eliminate that garbage.
In addition to macroblocking, the camera uses motion prediction to compress the data further. This involves examining the frames, examining how parts of the image move between certain frames, and then storing estimates of where those parts are in the frames in-between.
It’s like this. The camera is looking at Frame 1 and sees a bundle of pixels in one spot. It looks at Frame 5 and sees a similar batch of pixels somewhere else. So it assumes that the two groups of pixels depict the same object, and it’s moved between the five frames. So it estimates where those pixels would be on frames 2, 3 and 4. This is called “motion prediction.” Again, it loses some data if it’s not done properly. Again, this is a complicated operation, so read http://en.wikipedia.org/wiki/Inter_frame for more information.
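The estimate described above can be sketched as simple straight-line interpolation. This is a deliberate simplification of the FAQ's description: it assumes the object moves at a constant speed between the two known frames, and the function name and coordinates are invented for the example.

```python
def predict_positions(start, end, start_frame, end_frame):
    """Estimate an object's (x, y) position on the frames between two known frames."""
    span = end_frame - start_frame
    positions = {}
    for frame in range(start_frame + 1, end_frame):
        t = (frame - start_frame) / span  # fraction of the way from start to end
        positions[frame] = (
            start[0] + (end[0] - start[0]) * t,
            start[1] + (end[1] - start[1]) * t,
        )
    return positions

# The object is at x=100 on Frame 1 and at x=140 on Frame 5
print(predict_positions((100, 40), (140, 40), 1, 5))
# {2: (110.0, 40.0), 3: (120.0, 40.0), 4: (130.0, 40.0)}
```

If the object actually sped up, slowed down or changed direction in between, these guesses are wrong, and that is the kind of error the rest of this FAQ is concerned with.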
But here’s what you can take away for now. The video compression used by the GH2 works like this. The first frame is stored whole, compressed only by intraframing. Each subsequent frame is stored as the changes from the previous frame. The changes are calculated using techniques like macroblocking and motion prediction. And these techniques sometimes make mistakes. Errors crop up. Motion doesn’t always match predictions. Macroblocking can look blocky. Both of these are lossy compression techniques. So some correction needs to be performed as well.
Temporal correction: P frames and B frames
We said earlier that full video would be a series of full frames stored at full resolution, while compressed video stores only one frame and estimates the rest. So you may have one initial, perfect frame, but the frames that follow become less and less perfect because prediction’s never quite accurate.
So, to compensate for this, maybe you have the camera record every tenth frame at full resolution. That way, the data compression doesn’t get too out of line: it’s as though you’re nudging it back to being accurate every third of a second.
This is where data compression and video can get very complicated. The Wikipedia page http://en.wikipedia.org/wiki/Inter_frame#Frame_types has a good description of this, and we’ll use its illustrations as an example.
Every tenth frame will be an I-frame. It’s usually compressed by intraframing, but these frames are the least compressed in the data. Your camera uses motion prediction to “predict” what the fourth and seventh frames will be, based on the I-frame or P-frame that comes before them. The fourth and seventh frames are called P-frames (for “predicted”).
And the frames in between (the B-frames, for “bi-directional”) are, in turn, “predicted” from the I-frames and P-frames on either side of them.
The three types of frames differ in how accurate they are. I-frames are, of course, very accurate and reliable. P-frames are estimates derived from the I-frame, so they’re not perfect, but they’re very good. B-frames are estimates derived from P-frames, so they’re not as good as P-frames. (Some video codecs allow for deriving B-frames from other B-frames, but we don’t have to worry about that for now.)
Because the predicted frames are checked against frames in the past and future, the “drift” from macroblocking and motion prediction is reduced.
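The layout in this example can be written out as a string of frame types. The sketch assumes a group of ten frames that starts with an I-frame and places a P-frame on every third frame after it. Both parameters are hypothetical names chosen for this illustration, and under this simple rule the tenth frame also comes out as a P-frame.

```python
def frame_types(group_length, anchor_spacing):
    """Return the frame types for one group, e.g. 'IBBPBB...'."""
    types = []
    for index in range(group_length):
        if index == 0:
            types.append("I")  # the group always starts with an I-frame
        elif index % anchor_spacing == 0:
            types.append("P")  # every anchor_spacing-th frame is a P-frame
        else:
            types.append("B")  # everything in between is a B-frame
    return "".join(types)

print(frame_types(10, 3))  # IBBPBBPBBP
```

Reading the output from left to right, the first frame is the I-frame, the fourth and seventh are P-frames, and the frames between them are B-frames, as in the text above.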
Quantization and Deblocking Tables
We won’t discuss these subjects in any great detail. This FAQ’s intended to explain things so that people have a general idea of what Vitaly’s hack changes. But these topics (quantization and deblocking tables) can be deeply detailed. Unless you’re writing patches for the GH2, this is something you can skip.
But here’s a general idea of what they are. We’ve mentioned that data compression requires the GH2 to perform a lot of math. One way to speed up that process is to provide the GH2 with pre-calculated tables of data to use during compression. Some hacks, like Flow Motion, use custom-designed tables to improve the quality of the compressed video.
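For the curious, here is a very rough sketch of what a quantization table does in video coding generally (this is not taken from the GH2's firmware). Each number describing the image is divided by the matching table entry and rounded, which makes it cheaper to store but throws away some precision. The values and table below are made up for the example.

```python
def quantize(coefficients, table):
    """Divide each coefficient by its table entry and round to a whole number."""
    return [round(c / q) for c, q in zip(coefficients, table)]

def dequantize(levels, table):
    """Multiply back on playback; the rounding loss can't be undone."""
    return [level * q for level, q in zip(levels, table)]

coefficients = [312, -47, 18, 5]  # made-up values describing a patch of the image
table = [8, 16, 24, 32]           # made-up table: bigger entries mean coarser rounding

levels = quantize(coefficients, table)
print(levels)                     # [39, -3, 1, 0]
print(dequantize(levels, table))  # [312, -48, 24, 0]
```

The restored numbers are close to the originals but not identical. A table with smaller entries would lose less, at the cost of more data to store.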
Group of Pictures (GOP)
This batch of frames (the initial I-frame and the following P- and B-frames) is referred to as a Group of Pictures, or GOP. (Every I-frame starts a new GOP.) Each GOP has two values that determine its structure: the GOP Size and the GOP Length.
The GOP Size is the number of frames between each P-frame. In our example above, that value would be 4.
The GOP Length is the number of frames in the group—or the number of frames between I-frames. In our example, this value would be 10.
(Duartix has pointed out that, in many of the discussions on Personal View, people use both “GOP Size” and “GOP length” to refer to the length between I-frames. This may confuse some people. But for the purposes of this FAQ, we’re not trying to turn everyone into a patch-designer; we just want people to have a good idea of what’s going on under the hood.)
Obviously, changing these numbers affects the quality of the data compression. If the GOP Size were reduced, then we would have more P-frames and fewer B-frames: the data would be less compressed, less lossy, and with less drift. And if the GOP Length were reduced, we would have more frequent I-frames, and the video data would be less compressed and more accurate.
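To see the effect, the sketch below counts frame types over 30 frames for a few different settings. It assumes each group starts with an I-frame and places a P-frame at a fixed spacing after it. The parameter names are hypothetical, and the spacing here is counted as the step from one P-frame to the next, which may not match the way “GOP Size” is counted in patch settings.

```python
from collections import Counter

def count_types(total_frames, group_length, anchor_spacing):
    """Count I-, P- and B-frames over a run of frames."""
    counts = Counter()
    for n in range(total_frames):
        index = n % group_length  # position of this frame inside its group
        if index == 0:
            counts["I"] += 1
        elif index % anchor_spacing == 0:
            counts["P"] += 1
        else:
            counts["B"] += 1
    return dict(counts)

print(count_types(30, 10, 3))  # {'I': 3, 'B': 18, 'P': 9}
print(count_types(30, 10, 2))  # {'I': 3, 'B': 15, 'P': 12}  closer P-frames: more P, fewer B
print(count_types(30, 5, 3))   # {'I': 6, 'B': 18, 'P': 6}   shorter groups: twice the I-frames
```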
So, at long last, we learn something about how the GH2 patches improve the quality of your video—by changing the GOP values.
Variable GOP values and bit rates
The GH2 can also change its strategy for compression, by changing the GOP values “on the fly.” For a video with very little movement (a shot of a still pond with no wind ruffling the trees), it may use high GOP values, use large macroblocks, and thus compress the data by a huge margin. For a video with a lot of movement and action, where every pixel changes from frame to frame (like a panning shot of a football game), the camera will switch to low GOP values, store more I-frames, use smaller macroblocks, and create data that isn’t as compressed so you get to see more details. (It can also do things like change how it predicts movement.)
But there are limits on how much the GH2 can do with this. That’s where bit rates come in.
The GH2 can handle only so much data at a time. So the GH2 can’t simply shoot constant I-frames, one after the other, because that’d create too much data for the system. But it can’t rely too much on the predictions of P-frames and B-frames, because they’ll start to look bad. It can’t use macroblocks that are too big, because that’d look blocky, but it can’t use macroblocks that are too small because it’d create too much data.
So the camera has to have limits. All of the techniques above can be adjusted, on the fly, for greater or lesser compression. So the camera is adjusting these techniques so that its data falls within a range, a sweet spot between good compression and high detail. This range is specified by the bit rates. There has to be a minimum bit rate so that all the frames have a minimum standard of quality, but also a maximum bit rate so the data limits aren’t broken. An unhacked GH2’s highest bit rate is 24 Mbps. So the GH2 is constantly adjusting its compression techniques, as described above, so that the data stream doesn’t exceed 24 Mbps.
But if you have a hacked GH2, you can specify a higher bit rate: say, 40 Mbps, 80 Mbps, or in the case of some of Driftwood’s patches, more than 100 Mbps. Now, the camera has the leeway to use compression techniques that are less lossy, and preserve movement and detail more effectively. That is the advantage of higher bit rates.
This does not always work. You can’t simply use Vitaly’s hack to specify a 100 Mbps bit rate and expect the camera to work flawlessly. Your memory card may not be fast enough or large enough to handle a 100 Mbps flood of data. The CPU can overload. Or the other settings, like the GOP values, may create conflicts or not work right.
Happily, we don’t have to go into why this doesn’t always work. That’s for the people who design and test the patches, and who read all of the forum posts about these things. They try various combinations of GOP sizes, bit rates, specs for the macroblocking and motion prediction, until they find combinations that do work.
The rest of us don’t need to understand this at that level. All we need to know for now is what the “bit rate” really means for us. The bit rate is a number that shows how much range the camera has in compressing video; the higher the number, the less is lost in compression.
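To put the bit rates mentioned above in practical terms, here is a small calculation of what each one asks of a memory card. It assumes a 32 GB card (taken as 32,000 megabytes) and a steady video stream, ignoring audio and file-system overhead. Those are assumptions for illustration, and real recording times will differ.

```python
def megabytes_per_second(mbps):
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbps / 8

def minutes_on_card(card_gb, mbps):
    """Rough recording time for a card of card_gb gigabytes at a constant bit rate."""
    seconds = card_gb * 1000 / megabytes_per_second(mbps)
    return seconds / 60

for rate in (24, 40, 80, 100):
    print(rate, "Mbps:", megabytes_per_second(rate), "MB/s,",
          round(minutes_on_card(32, rate)), "minutes on a 32 GB card")
# 24 Mbps: 3.0 MB/s, 178 minutes on a 32 GB card
# 40 Mbps: 5.0 MB/s, 107 minutes on a 32 GB card
# 80 Mbps: 10.0 MB/s, 53 minutes on a 32 GB card
# 100 Mbps: 12.5 MB/s, 43 minutes on a 32 GB card
```

So a 100 Mbps patch needs a card that can sustain at least 12.5 MB/s of writing, and it fills that card about four times faster than the stock setting.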