Recherche avancée

Médias (91)

Autres articles (57)

  • Websites made ​​with MediaSPIP

    2 mai 2011, par

    This page lists some websites based on MediaSPIP.

  • Creating farms of unique websites

    13 avril 2011, par

    MediaSPIP platforms can be installed as a farm, with a single "core" hosted on a dedicated server and used by multiple websites.
    This allows (among other things) : implementation costs to be shared between several different projects / individuals rapid deployment of multiple unique sites creation of groups of like-minded sites, making it possible to browse media in a more controlled and selective environment than the major "open" (...)

  • Other interesting software

    13 avril 2011, par

    We don’t claim to be the only ones doing what we do ... and especially not to assert claims to be the best either ... What we do, we just try to do it well and getting better ...
    The following list represents softwares that tend to be more or less as MediaSPIP or that MediaSPIP tries more or less to do the same, whatever ...
    We don’t know them, we didn’t try them, but you can take a peek.
    Videopress
    Website : http://videopress.com/
    License : GNU/GPL v2
    Source code : (...)

Sur d’autres sites (5170)

  • Of ctors and dtors

    18 février 2011, par Multimedia Mike — Programming, Sega Dreamcast

    I haven’t given up on the Sega Dreamcast programming. I was able to compile a bunch of homebrew code for the DC many years ago and I can’t make it work anymore. Again, I was working with a purpose-built, open source RTOS named KallistiOS (or KOS). I can make the programs compile but not run. I had ELF files left over from years ago which still executed. But when I tried to build new ELF files, no luck— the programs crashed before even reaching my main() function.

    I found the problem : ELF files are comprised of a number of sections and 2 of these sections are named ’.ctors’ and ’.dtors’ which stand for constructors and destructors. The KOS RTOS performs a manual traversal of .ctors section during program initialization and this is where things go bad. The traversal code doesn’t seem to account for a .ctors section that only contains a single entry. I commented out the function that does the traversal and programs started to work, at least until it was time to exit the program and return control to the program loader. That’s when the counterpart .dtors section traversal code ran and demonstrated the same problem. I’ll exhibit the problematic code at the end of this post.

    So I’m finally tinkering with Sega Dreamcast programming once again and with a slightly better grasp of software engineering than the first time I did this.

    Portable and Compatible C ?
    If nothing else, this low-level embedded stuff exposes you to some serious toolchain arcana, the likes of which you will likely never see working strictly in the desktop arena.

    Still, this exercise makes me wonder why C code from a decade ago doesn’t compile reliably now. Part of it is because gcc has gotten stricter about the syntax it will accept. In the case of this specific crashing problem, I suspect it comes down to a difference in the way the linker generates the final ELF file. I’ve written a list of items I have had to modify in the KOS codebase in order to get it to compile on more recent gcc versions. I wonder if it would be worth publishing the specifics, or if anyone would ever find the information useful ? Oh, who am I kidding ? Of course I’ll write it up, perhaps publish a new version of the code, if only because that’s the best chance I have of finding my own work again some years down the road.

    Problematic C Code
    See if this code makes any sense to you. It somehow traverse a list of 32-bit function pointers (in different directions, depending on constructors or destructors), executing each in turn. However, it appears to fall over if the list of pointers consists of a single entry.

    C :
    1. typedef void (*fptr)(void) ;
    2.  
    3. static fptr ctor_list[1] __attribute__((section(".ctors"))) = { (fptr) -1 } ;
    4. static fptr dtor_list[1] __attribute__((section(".dtors"))) = { (fptr) -1 } ;
    5.  
    6. /* Call this to execute all ctors */
    7. void arch_ctors() {
    8.     fptr *fpp ;
    9.  
    10.     /* Run up to the end of the list (defined by crtend) */
    11.     for (fpp=ctor_list + 1 ; *fpp != 0 ; ++fpp)
    12.          ;
    13.  
    14.     /* Now run the ctors backwards */
    15.     while (—fpp> ctor_list)
    16.         (**fpp)() ;
    17. }
    18.  
    19. /* Call this to execute all dtors */
    20. void arch_dtors() {
    21.     fptr *fpp ;
    22.  
    23.     /* Do the dtors forwards */
    24.     for (fpp=dtor_list + 1 ; *fpp != 0 ; ++fpp )
    25.         (**fpp)() ;
    26. }
  • How to cheat on video encoder comparisons

    21 juin 2010, par Dark Shikari — H.264, benchmark, stupidity, test sequences

    Over the past few years, practically everyone and their dog has published some sort of encoder comparison. Sometimes they’re actually intended to be something for the world to rely on, like the old Doom9 comparisons and the MSU comparisons. Other times, they’re just to scratch an itch — someone wants to decide for themselves what is better. And sometimes they’re just there to outright lie in favor of whatever encoder the author likes best. The latter is practically an expected feature on the websites of commercial encoder vendors.

    One thing almost all these comparisons have in common — particularly (but not limited to !) the ones done without consulting experts — is that they are horribly done. They’re usually easy to spot : for example, two videos at totally different bitrates are being compared, or the author complains about one of the videos being “washed out” (i.e. he screwed up his colorspace conversion). Or the results are simply nonsensical. Many of these problems result from the person running the test not “sanity checking” the results to catch mistakes that he made in his test. Others are just outright intentional.

    The result of all these mistakes, both intentional and accidental, is that the results of encoder comparisons tend to be all over the map, to the point of absurdity. For any pair of encoders, it’s practically a given that a comparison exists somewhere that will “prove” any result you want to claim, even if the result would be beyond impossible in any sane situation. This often results in the appearance of a “controversy” even if there isn’t any.

    Keep in mind that every single mistake I mention in this article has actually been done, usually in more than one comparison. And before I offend anyone, keep in mind that when I say “cheating”, I don’t mean to imply that everyone that makes the mistake is doing it intentionally. Especially among amateur comparisons, most of the mistakes are probably honest.

    So, without further ado, we will investigate a wide variety of ways, from the blatant to the subtle, with which you too can cheat on your encoder comparisons.

    Blatant cheating

    1. Screw up your colorspace conversions. A common misconception is that converting from YUV to RGB and back is a simple process where nothing can go wrong. This is quite untrue. There are two primary attributes of YUV : PC range (0-255) vs TV range (16-235) and BT.709 vs BT.601 conversion coefficients. That sums up to a total of 4 possible different types of YUV. When people compare encoders, they often use different frontends, some of which make incorrect assumptions about these attributes.

    Incorrect assumptions are so common that it’s often a matter of luck whether the tool gets it right or not. It doesn’t help that most videos don’t even properly signal which they are to begin with ! Often even the tool that the person running the comparison is using to view the source material gets the conversion wrong.

    Subsampling YUV (aka what everyone uses) adds yet another dimension to the problem : the locations which the chroma data represents (“chroma siting”) isn’t constant. For example, JPEG and MPEG-2 define different positions. This is even worse because almost nobody actually handles this correctly — the best approach is to simply make sure none of your software is doing any conversion. A mistake in chroma siting is what created that infamous PSNR graph showing Theora beating x264, which has been cited for ages since despite the developers themselves retracting it after realizing their mistake.

    Keep in mind that the video encoder is not responsible for colorspace conversion — almost all video encoders operate in the YUV domain (usually subsampled 4:2:0 YUV, aka YV12). Thus any problem in colorspace conversion is usually the fault of the tools used, not the actual encoder.

    How to spot it : “The color is a bit off” or “the contrast of the video is a bit duller”. There were a staggering number of “H.264 vs Theora” encoder comparisons which came out in favor of one or the other solely based on “how well the encoder kept the color” — making the results entirely bogus.

    2. Don’t compare at the same (or nearly the same) bitrate. I saw a VP8 vs x264 comparison the other day that gave VP8 30% more bitrate and then proceeded to demonstrate that it got better PSNR. You would think this is blindingly obvious, but people still make this mistake ! The most common cause of this is assuming that encoders will successfully reach the target bitrate you ask of them — particularly with very broken encoders that don’t. Always check the output filesizes of your encodes.

    How to spot it : The comparison lists perfectly round bitrates for every single test, as opposed to the actual bitrates achieved by the encoders, which will never be exactly matching in any real test.

    3. Use unfair encoding settings. This is a bit of a wide topic : there are many ways to do this. We’ll cover the more blatant ones in this part. Here’s some common ones :

    a. Simply cheat. Intentionally pick awful settings for the encoder you don’t like.

    b. Don’t consider performance. Pick encoding settings without any regard for some particular performance goal. For example, it’s perfectly reasonable to say “use the best settings possible, regardless of speed”. It’s also reasonable to look for a particular encoding speed target. But what isn’t reasonable is to pick extremely fast settings for one encoder and extremely slow settings for another encoder.

    c. Don’t attempt match compatibility options when it’s reasonable to do so. Keyframe interval is a classic one of these : shorter values reduce compression but improve seeking. An easy way to cheat is to simply not set them to the same value, biasing towards whatever encoder has the longer interval. This is most common as an accidental mistake with comparisons involving ffmpeg, where the default keyframe interval is an insanely low 12 frames.

    How to spot it : The comparison doesn’t document its approach regarding choice of encoding settings.

    4. Use ratecontrol methods unfairly. Constant bitrate is not the same as average bitrate — using one instead of the other is a great way to completely ruin a comparison. Another method is to use 1-pass bitrate mode for one encoder and 2-pass or constant quality for another. A good general approach is that, for any given encoder, one should use 2-pass if available and constant quality if not (it may take a few runs to get the bitrate you want, of course).

    Of course, it’s also fine to run a comparison with a particular mode in mind — for example, a comparison targeted at streaming applications might want to test using 1-pass CBR. Of course, in such a case, if CBR is not available in an encoder, you can’t compare to that encoder.

    How to spot it : It’s usually pretty obvious if the encoding settings are given.

    5. Use incredibly old versions of encoders. As it happens, Debian stable is not the best source for the most recent encoding software. Equally, using recent versions known to be buggy.

    6. Don’t distinguish between video formats and the software that encodes them. This is incredibly common : I’ve seen tests that claim to compare “H.264″ against something else while in fact actually comparing “Quicktime” against something else. It’s impossible to compare all H.264 encoders at once, so don’t even try — just call the comparison “Quicktime versus X” instead of “H.264 versus X”. Or better yet, use a good H.264 encoder, like x264 and don’t bother testing awful encoders to begin with.

    Less-obvious cheating

    1. Pick a bitrate that’s way too low. Low bitrate testing is very effective at making differences between encoders obvious, particularly if doing a visual comparison. But past a certain point, it becomes impossible for some encoders to keep up. This is usually an artifact of the video format itself — a scalability limitation. Practically all DCT-based formats have this kind of limitation (wavelets are mostly immune).

    In reality, this is rarely a problem, because one could merely downscale the video to resolve the problem — lower resolutions need fewer bits. But people rarely do this in comparisons (it’s hard to do it fairly), so the best approach is to simply not use absurdly low bitrates. What is “absurdly low” ? That’s a hard question — it ends up being a matter of using one’s best judgement.

    This tends to be less of a problem in larger-scale tests that use many different bitrates.

    How to spot it : At least one of the encoders being compared falls apart completely and utterly in the screenshots.

    Biases towards, a lot : Video formats with completely scalable coding methods (Dirac, Snow, JPEG-2000, SVC).

    Biases towards, a little : Video formats with coding methods that improve scalability, such as arithmetic coding, B-frames, and run-length coding. For example, H.264 and Theora tend to be more scalable than MPEG-4.

    2. Pick a bitrate that’s way too high. This is staggeringly common mistake : pick a bitrate so high that all of the resulting encodes look absolutely perfect. The claim is then made that “there’s no significant difference” between any of the encoders tested. This is surprisingly easy to do inadvertently on sources like Big Buck Bunny, which looks transparent at relatively low bitrates. An equally common but similar mistake is to test at a bitrate that isn’t so high that the videos look perfect, but high enough that they all look very good. The claim is then made that “the difference between these encoders is small”. Well, of course, if you give everything tons of bitrate, the difference between encoders is small.

    How to spot it : You can’t tell which image is the source and which is the encode.

    3. Making invalid comparisons using objective metrics. I explained this earlier in the linked blog post, but in short, if you’re going to measure PSNR, make sure all the encoders are optimized for PSNR. Equally, if you’re going to leave the encoder optimized for visual quality, don’t measure PSNR — post screenshots instead. Same with SSIM or any other objective metric. Furthermore, don’t blindly do metric comparisons — always at least look at the output as a sanity test. Finally, do not claim that PSNR is particularly representative of visual quality, because it isn’t.

    How to spot it : Encoders with psy optimizations, such as x264 or Theora 1.2, do considerably worse than expected in PSNR tests, but look much better in visual comparisons.

    4. Lying with graphs. Using misleading scales on graphs is a great way to make the differences between encoders seem larger or smaller than they actually are. A common mistake is to scale SSIM linearly : in fact, 0.99 is about twice as good as 0.98, not 1% better. One solution for this is to use db to compare SSIM values.

    5. Using lossy screenshots. Posting screenshots as JPEG is a silly, pointless way to worsen an encoder comparison.

    Subtle cheating

    1. Unfairly pick screenshots for comparison. Comparing based on stills is not ideal, but it’s often vastly easier than comparing videos in motion. But it also opens up the door to unfairness. One of the most common mistakes is to pick a frame immediately after (or on) a keyframe for one encoder, but which isn’t for the other encoder. Particularly in the case of encoders that massively boost keyframe quality, this will unfairly bias in favor of the one with the recent keyframe.

    How to spot it : It’s very difficult to tell, if not impossible, unless they provide the video files to inspect.

    2. Cherry-pick source videos. Good source videos are incredibly hard to come by — almost everything is already compressed and what’s left is usually a very poor example of real content. Here’s some common ways to bias unfairly using cherry-picking :

    a. Pick source videos that are already heavily compressed. Pre-compressed source isn’t much of an issue if your target quality level for testing is much lower than that of the source, since any compression artifacts in the source will be a lot smaller than those created by the encoders. But if the source is already very compressed, or you’re testing at a relatively high quality level, this becomes a significant issue.

    Biases towards : Anything that uses a similar transform to the source content. For MPEG-2 source material, this biases towards formats that use the 8x8dct or a very close approximation : MPEG-1/2/4, H.263, and Theora. For H.264 source material, this biases towards formats that use a 4×4 transform : H.264 and VP8.

    b. Pick standard test clips that were not intended for this purpose. There are a wide variety of uncompressed “standard test clips“. Some of these are not intended for general-purpose use, but rather exist to test specific encoder capabilities. For example, Mobile Calendar (“mobcal”) is extremely sharp and low motion, serving to test interpolation capabilities. It will bias incredibly heavily towards whatever encoder uses more B-frames and/or has higher-precision motion compensation. Other test clips are almost completely static, such as the classic “akiyo”. These are also not particularly representative of real content.

    c. Pick very noisy content. Noise is — by definition — not particularly compressible. Both in terms of PSNR and visual quality, a very noisy test clip will tend to reduce the differences between encoders dramatically.

    d. Pick a test clip to exercise a specific encoder feature. I’ve often used short clips from Touhou games to demonstrate the effectiveness of x264′s macroblock-tree algorithm. I’ve sometimes even used it to compare to other encoders as part of such a demonstration. I’ve also used the standard test clip “parkrun” as a demonstration of adaptive quantization. But claiming that either is representative of most real content — and thus can be used as a general determinant of how good encoders are — is of course insane.

    e. Simply encode a bunch of videos and pick the one your favorite encoder does best on.

    3. Preprocessing the source. A encoder test is a test of encoders, not preprocessing. Some encoding apps may add preprocessors to the source, such as noise reduction. This may make the video look better — possibly even better than the source — but it’s not a fair part of comparing the actual encoders.

    4. Screw up decoding. People often forget that in addition to encoding, a test also involves decoding — a step which is equally possible to do wrong. One common error caused by this is in tests of Theora on content whose resolution isn’t divisible by 16. Decoding is often done with ffmpeg — which doesn’t crop the edges properly in some cases. This isn’t really a big deal visually, but in a PSNR comparison, misaligning the entire frame by 4 or 8 pixels is a great way of completely invalidating the results.

    The greatest mistake of all

    Above all, the biggest and most common mistake — and the one that leads to many of the problems mentioned here – is the mistaken belief that one, or even a few tests can really represent all usage fairly. Any comparison has to have some specific goal — to compare something in some particular case, whether it be “maximum offline compression ignoring encoding speed” or “real-time high-speed video streaming” or whatnot. And even then, no comparison can represent all use-cases in that category alone. An encoder comparison can only be honest if it’s aware of its limitations.

  • Anatomy of an optimization : H.264 deblocking

    26 mai 2010, par Dark Shikari — H.264, assembly, development, speed, x264

    As mentioned in the previous post, H.264 has an adaptive deblocking filter. But what exactly does that mean — and more importantly, what does it mean for performance ? And how can we make it as fast as possible ? In this post I’ll try to answer these questions, particularly in relation to my recent deblocking optimizations in x264.

    H.264′s deblocking filter has two steps : strength calculation and the actual filter. The first step calculates the parameters for the second step. The filter runs on all the edges in each macroblock. That’s 4 vertical edges of length 16 pixels and 4 horizontal edges of length 16 pixels. The vertical edges are filtered first, from left to right, then the horizontal edges, from top to bottom (order matters !). The leftmost edge is the one between the current macroblock and the left macroblock, while the topmost edge is the one between the current macroblock and the top macroblock.

    Here’s the formula for the strength calculation in progressive mode. The highest strength that applies is always selected.

    If we’re on the edge between an intra macroblock and any other macroblock : Strength 4
    If we’re on an internal edge of an intra macroblock : Strength 3
    If either side of a 4-pixel-long edge has residual data : Strength 2
    If the motion vectors on opposite sides of a 4-pixel-long edge are at least a pixel apart (in either x or y direction) or the reference frames aren’t the same : Strength 1
    Otherwise : Strength 0 (no deblocking)

    These values are then thrown into a lookup table depending on the quantizer : higher quantizers have stronger deblocking. Then the actual filter is run with the appropriate parameters. Note that Strength 4 is actually a special deblocking mode that performs a much stronger filter and affects more pixels.

    One can see somewhat intuitively why these strengths are chosen. The deblocker exists to get rid of sharp edges caused by the block-based nature of H.264, and so the strength depends on what exists that might cause such sharp edges. The strength calculation is a way to use existing data from the video stream to make better decisions during the deblocking process, improving compression and quality.

    Both the strength calculation and the actual filter (not described here) are very complex if naively implemented. The latter can be SIMD’d with not too much difficulty ; no H.264 decoder can get away with reasonable performance without such a thing. But what about optimizing the strength calculation ? A quick analysis shows that this can be beneficial as well.

    Since we have to check both horizontal and vertical edges, we have to check up to 32 pairs of coefficient counts (for residual), 16 pairs of reference frame indices, and 128 motion vector values (counting x and y as separate values). This is a lot of calculation ; a naive implementation can take 500-1000 clock cycles on a modern CPU. Of course, there’s a lot of shortcuts we can take. Here’s some examples :

    • If the macroblock uses the 8×8 transform, we only need to check 2 edges in each direction instead of 4, because we don’t deblock inside of the 8×8 blocks.
    • If the macroblock is a P-skip, we only have to check the first edge in each direction, since there’s guaranteed to be no motion vector differences, reference frame differences, or residual inside of the macroblock.
    • If the macroblock has no residual at all, we can skip that check.
    • If we know the partition type of the macroblock, we can do motion vector checks only along the edges of the partitions.
    • If the effective quantizer is so low that no deblocking would be performed no matter what, don’t bother calculating the strength.

    But even all of this doesn’t save us from ourselves. We still have to iterate over a ton of edges, checking each one. Stuff like the partition-checking logic greatly complicates the code and adds overhead even as it reduces the number of checks. And in many cases decoupling the checks to add such logic will make it slower : if the checks are coupled, we can avoid doing a motion vector check if there’s residual, since Strength 2 overrides Strength 1.

    But wait. What if we could do this in SIMD, just like the actual loopfilter itself ? Sure, it seems more of a problem for C code than assembly, but there aren’t any obvious things in the way. Many years ago, Loren Merritt (pengvado) wrote the first SIMD implementation that I know of (for ffmpeg’s decoder) ; it is quite fast, so I decided to work on porting the idea to x264 to see if we could eke out a bit more speed here as well.

    Before I go over what I had to do to make this change, let me first describe how deblocking is implemented in x264. Since the filter is a loopfilter, it acts “in loop” and must be done in both the encoder and decoder — hence why x264 has it too, not just decoders. At the end of encoding one row of macroblocks, x264 goes back and deblocks the row, then performs half-pixel interpolation for use in encoding the next frame.

    We do it per-row for reasons of cache coherency : deblocking accesses a lot of pixels and a lot of code that wouldn’t otherwise be used, so it’s more efficient to do it in a single pass as opposed to deblocking each macroblock immediately after encoding. Then half-pixel interpolation can immediately re-use the resulting data.

    Now to the change. First, I modified deblocking to implement a subset of the macroblock_cache_load function : spend an extra bit of effort loading the necessary data into a data structure which is much simpler to address — as an assembly implementation would need (x264_macroblock_cache_load_deblock). Then I massively cleaned up deblocking to move all of the core strength-calculation logic into a single, small function that could be converted to assembly (deblock_strength_c). Finally, I wrote the assembly functions and worked with Loren to optimize them. Here’s the result.

    And the timings for the resulting assembly function on my Core i7, in cycles :

    deblock_strength_c : 309
    deblock_strength_mmx : 79
    deblock_strength_sse2 : 37
    deblock_strength_ssse3 : 33

    Now that is a seriously nice improvement. 33 cycles on average to perform that many comparisons–that’s absurdly low, especially considering the SIMD takes no branchy shortcuts : it always checks every single edge ! I walked over to my performance chart and happily crossed off a box.

    But I had a hunch that I could do better. Remember, as mentioned earlier, we’re reloading all that data back into our data structures in order to address it. This isn’t that slow, but takes enough time to significantly cut down on the gain of the assembly code. And worse, less than a row ago, all this data was in the correct place to be used (when we just finished encoding the macroblock) ! But if we did the deblocking right after encoding each macroblock, the cache issues would make it too slow to be worth it (yes, I tested this). So I went back to other things, a bit annoyed that I couldn’t get the full benefit of the changes.

    Then, yesterday, I was talking with Pascal, a former Xvid dev and current video hacker over at Google, about various possible x264 optimizations. He had seen my deblocking changes and we discussed that a bit as well. Then two lines hit me like a pile of bricks :

    <_skal_> tried computing the strength at least ?
    <_skal_> while it’s fresh

    Why hadn’t I thought of that ? Do the strength calculation immediately after encoding each macroblock, save the result, and then go pick it up later for the main deblocking filter. Then we can use the data right there and then for strength calculation, but we don’t have to do the whole deblock process until later.

    I went and implemented it and, after working my way through a horde of bugs, eventually got a working implementation. A big catch was that of slices : deblocking normally acts between slices even though normal encoding does not, so I had to perform extra munging to get that to work. By midday today I was able to go cross yet another box off on the performance chart. And now it’s committed.

    Sometimes chatting for 10 minutes with another developer is enough to spot the idea that your brain somehow managed to miss for nearly a straight week.

    NB : the performance chart is on a specific test clip at a specific set of settings (super fast settings) relevant to the company I work at, so it isn’t accurate nor complete for, say, default settings.

    Update : Here’s a higher resolution version of the current chart, as requested in the comments.