Recherche avancée

Médias (91)

Autres articles (73)

  • Organiser par catégorie

    17 mai 2013, par

    Dans MédiaSPIP, une rubrique a 2 noms : catégorie et rubrique.
    Les différents documents stockés dans MédiaSPIP peuvent être rangés dans différentes catégories. On peut créer une catégorie en cliquant sur "publier une catégorie" dans le menu publier en haut à droite ( après authentification ). Une catégorie peut être rangée dans une autre catégorie aussi ce qui fait qu’on peut construire une arborescence de catégories.
    Lors de la publication prochaine d’un document, la nouvelle catégorie créée sera proposée (...)

  • Récupération d’informations sur le site maître à l’installation d’une instance

    26 novembre 2010, par

    Utilité
    Sur le site principal, une instance de mutualisation est définie par plusieurs choses : Les données dans la table spip_mutus ; Son logo ; Son auteur principal (id_admin dans la table spip_mutus correspondant à un id_auteur de la table spip_auteurs)qui sera le seul à pouvoir créer définitivement l’instance de mutualisation ;
    Il peut donc être tout à fait judicieux de vouloir récupérer certaines de ces informations afin de compléter l’installation d’une instance pour, par exemple : récupérer le (...)

  • Taille des images et des logos définissables

    9 février 2011, par

    Dans beaucoup d’endroits du site, logos et images sont redimensionnées pour correspondre aux emplacements définis par les thèmes. L’ensemble des ces tailles pouvant changer d’un thème à un autre peuvent être définies directement dans le thème et éviter ainsi à l’utilisateur de devoir les configurer manuellement après avoir changé l’apparence de son site.
    Ces tailles d’images sont également disponibles dans la configuration spécifique de MediaSPIP Core. La taille maximale du logo du site en pixels, on permet (...)

Sur d’autres sites (5754)

  • VP8 : a retrospective

    13 juillet 2010, par Dark Shikari — DCT, speed, VP8

    I’ve been working the past few weeks to help finish up the ffmpeg VP8 decoder, the first community implementation of On2′s VP8 video format. Now that I’ve written a thousand or two lines of assembly code and optimized a good bit of the C code, I’d like to look back at VP8 and comment on a variety of things — both good and bad — that slipped the net the first time, along with things that have changed since the time of that blog post.

    These are less-so issues related to compression — that issue has been beaten to death, particularly in MSU’s recent comparison, where x264 beat the crap out of VP8 and the VP8 developers pulled a Pinocchio in the developer comments. But that was expected and isn’t particularly interesting, so I won’t go into that. VP8 doesn’t have to be the best in the world in order to be useful.

    When the ffmpeg VP8 decoder is complete (just a few more asm functions to go), we’ll hopefully be able to post some benchmarks comparing it to libvpx.

    1. The spec, er, I mean, bitstream guide.

    Google has reneged on their claim that a spec existed at all and renamed it a “bitstream guide”. This is probably after it was found that — not merely was it incomplete — but at least a dozen places in the spec differed wildly from what was actually in their own encoder and decoder software ! The deblocking filter, motion vector clamping, probability tables, and many more parts simply disagreed flat-out with the spec. Fortunately, Ronald Bultje, one of the main authors of the ffmpeg VP8 decoder, is rather skilled at reverse-engineering, so we were able to put together a matching implementation regardless.

    Most of the differences aren’t particularly important — they don’t have a huge effect on compression or anything — but make it vastly more difficult to implement a “working” VP8 decoder, or for that matter, decide what “working” really is. For example, Google’s decoder will, if told to “swap the ALT and GOLDEN reference frames”, overwrite both with GOLDEN, because it first sets GOLDEN = ALT, and then sets ALT = GOLDEN. Is this a bug ? Or is this how it’s supposed to work ? It’s hard to tell — there isn’t a spec to say so. Google says that whatever libvpx does is right, but I doubt they intended this.

    I expect a spec will eventually be written, but it was a bit obnoxious of Google — both to the community and to their own developers — to release so early that they didn’t even have their own documentation ready.

    2. The TM intra prediction mode.

    One thing I glossed over in the original piece was that On2 had added an extra intra prediction mode to the standard batch that H.264 came with — they replaced Planar with “TM pred”. For i4x4, which didn’t have a Planar mode, they just added it without replacing an old one, resulting in a total of 10 modes to H.264′s 9. After understanding and writing assembly code for TM pred, I have to say that it is quite a cool idea. Here’s how it works :

    1. Let us take a block of size 4×4, 8×8, or 16×16.

    2. Define the pixels bordering the top of this block (starting from the left) as T[0], T[1], T[2]…

    3. Define the pixels bordering the left of this block (starting from the top) as L[0], L[1], L[2]…

    4. Define the pixel above the top-left of the block as TL.

    5. Predict every pixel <X,Y> in the block to be equal to clip3( T[X] + L[Y] – TL, 0, 255).

    It’s effectively a generalization of gradient prediction to the block level — predict each pixel based on the gradient between its top and left pixels, and the topleft. According to the VP8 devs, it’s chosen by the encoder quite a lot of the time, which isn’t surprising ; it seems like a pretty good idea. As just one more intra pred mode, it’s not going to do magic for compression, but it’s a cool idea and elegantly simple.

    3. Performance and the deblocking filter.

    On2 advertised for quite some that VP8′s goal was to be significantly faster to decode than H.264. When I saw the spec, I waited for the punchline, but apparently they were serious. There’s nothing wrong with being of similar speed or a bit slower — but I was rather confused as to the fact that their design didn’t match their stated goal at all. What apparently happened is they had multiple profiles of VP8 — high and low complexity profiles. They marketed the performance of the low complexity ones while touting the quality of the high complexity ones, a tad dishonest. More importantly though, practically nobody is using the low complexity modes, so anyone writing a decoder has to be prepared to handle the high complexity ones, which are the default.

    The primary time-eater here is the deblocking filter. VP8, being an H.264 derivative, has much the same problem as H.264 does in terms of deblocking — it spends an absurd amount of time there. As I write this post, we’re about to finish some of the deblocking filter asm code, but before it’s committed, up to 70% or more of total decoding time is spent in the deblocking filter ! Like H.264, it suffers from the 4×4 transform problem : a 4×4 transform requires a total of 8 length-16 and 8 length-8 loopfilter calls per macroblock, while Theora, with only an 8×8 transform, requires half that.

    This problem is aggravated in VP8 by the fact that the deblocking filter isn’t strength-adaptive ; if even one 4×4 block in a macroblock contains coefficients, every single edge has to be deblocked. Furthermore, the deblocking filter itself is quite complicated ; the “inner edge” filter is a bit more complex than H.264′s and the “macroblock edge” filter is vastly more complicated, having two entirely different codepaths chosen on a per-pixel basis. Of course, in SIMD, this means you have to do both and mask them together at the end.

    There’s nothing wrong with a good-but-slow deblocking filter. But given the amount of deblocking one needs to do in a 4×4-transform-based format, it might have been a better choice to make the filter simpler. It’s pretty difficult to beat H.264 on compression, but it’s certainly not hard to beat it on speed — and yet it seems VP8 missed a perfectly good chance to do so. Another option would have been to pick an 8×8 transform instead of 4×4, reducing the amount of deblocking by a factor of 2.

    And yes, there’s a simple filter available in the low complexity profile, but it doesn’t help if nobody uses it.

    4. Tree-based arithmetic coding.

    Binary arithmetic coding has become the standard entropy coding method for a wide variety of compressed formats, ranging from LZMA to VP6, H.264 and VP8. It’s simple, relatively fast compared to other arithmetic coding schemes, and easy to make adaptive. The problem with this is that you have to come up with a method for converting non-binary symbols into a list of binary symbols, and then choosing what probabilities to use to code each one. Here’s an example from H.264, the sub-partition mode symbol, which is either 8×8, 8×4, 4×8, or 4×4. encode_decision( context, bit ) writes a binary decision (bit) into a numbered context (context).

    8×8 : encode_decision( 21, 0 ) ;

    8×4 : encode_decision( 21, 1 ) ; encode_decision( 22, 0 ) ;

    4×8 : encode_decision( 21, 1 ) ; encode_decision( 22, 1 ) ; encode_decision( 23, 1 ) ;

    4×4 : encode_decision( 21, 1 ) ; encode_decision( 22, 1 ) ; encode_decision( 23, 0 ) ;

    As can be seen, this is clearly like a Huffman tree. Wouldn’t it be nice if we could represent this in the form of an actual tree data structure instead of code ? On2 thought so — they designed a simple system in VP8 that allowed all binarization schemes in the entire format to be represented as simple tree data structures. This greatly reduces the complexity — not speed-wise, but implementation-wise — of the entropy coder. Personally, I quite like it.

    5. The inverse transform ordering.

    I should at some point write a post about common mistakes made in video formats that everyone keeps making. These are not issues that are patent worries or huge issues for compression — just stupid mistakes that are repeatedly made in new video formats, probably because someone just never asked the guy next to him “does this look stupid ?” before sticking it in the spec.

    One common mistake is the problem of transform ordering. Every sane 2D transform is “separable” — that is, it can be done by doing a 1D transform vertically and doing the 1D transform again horizontally (or vice versa). The original iDCT as used in JPEG, H.263, and MPEG-1/2/4 was an “idealized” iDCT — nobody had to use the exact same iDCT, theirs just had to give very close results to a reference implementation. This ended up resulting in a lot of practical problems. It was also slow ; the only way to get an accurate enough iDCT was to do all the intermediate math in 32-bit.

    Practically every modern format, accordingly, has specified an exact iDCT. This includes H.264, VC-1, RV40, Theora, VP8, and many more. Of course, with an exact iDCT comes an exact ordering — while the “real” iDCT can be done in any order, an exact iDCT usually requires an exact order. That is, it specifies horizontal and then vertical, or vertical and then horizontal.

    All of these transforms end up being implemented in SIMD. In SIMD, a vertical transform is generally the only option, so a transpose is added to the process instead of doing a horizontal transform. Accordingly, there are two ways to do it :

    1. Transpose, vertical transform, transpose, vertical transform.

    2. Vertical transform, transpose, vertical transform, transpose.

    These may seem to be equally good, but there’s one catch — if the transpose is done first, it can be completely eliminated by merging it into the coefficient decoding process. On many modern CPUs, particularly x86, transposes are very expensive, so eliminating one of the two gives a pretty significant speed benefit.

    H.264 did it way 1).

    VC-1 did it way 1).

    Theora (inherited from VP3) did it way 1).

    But no. VP8 has to do it way 2), where you can’t eliminate the transpose. Bah. It’s not a huge deal ; probably only 1-2% overall at most speed-wise, but it’s just a needless waste. What really bugs me is that VP3 got it right — why in the world did they screw it up this time around if they got it right beforehand ?

    RV40 is the other modern format I know that made this mistake.

    (NB : You can do transforms without a transpose, but it’s generally not worth it unless the intermediate needs 32-bit math, as in the case of the “real” iDCT.)

    6. Not supporting interlacing.

    THANK YOU THANK YOU THANK YOU THANK YOU THANK YOU THANK YOU THANK YOU.

    Interlacing was the scourge of H.264. It weaseled its way into every nook and cranny of the spec, making every decoder a thousand lines longer. H.264 even included a highly complicated — and effective — dedicated interlaced coding scheme, MBAFF. The mere existence of MBAFF, despite its usefulness for broadcasters and others still stuck in the analog age with their 1080i, 576i , and 480i content, was a blight upon the video format.

    VP8 has once and for all avoided it.

    And if anyone suggests adding interlaced support to the experimental VP8 branch, find a straightjacket and padded cell for them before they cause any real damage.

  • VP8 : a retrospective

    13 juillet 2010, par Dark Shikari — DCT, VP8, speed

    I’ve been working the past few weeks to help finish up the ffmpeg VP8 decoder, the first community implementation of On2′s VP8 video format. Now that I’ve written a thousand or two lines of assembly code and optimized a good bit of the C code, I’d like to look back at VP8 and comment on a variety of things — both good and bad — that slipped the net the first time, along with things that have changed since the time of that blog post.

    These are less-so issues related to compression — that issue has been beaten to death, particularly in MSU’s recent comparison, where x264 beat the crap out of VP8 and the VP8 developers pulled a Pinocchio in the developer comments. But that was expected and isn’t particularly interesting, so I won’t go into that. VP8 doesn’t have to be the best in the world in order to be useful.

    When the ffmpeg VP8 decoder is complete (just a few more asm functions to go), we’ll hopefully be able to post some benchmarks comparing it to libvpx.

    1. The spec, er, I mean, bitstream guide.

    Google has reneged on their claim that a spec existed at all and renamed it a “bitstream guide”. This is probably after it was found that — not merely was it incomplete — but at least a dozen places in the spec differed wildly from what was actually in their own encoder and decoder software ! The deblocking filter, motion vector clamping, probability tables, and many more parts simply disagreed flat-out with the spec. Fortunately, Ronald Bultje, one of the main authors of the ffmpeg VP8 decoder, is rather skilled at reverse-engineering, so we were able to put together a matching implementation regardless.

    Most of the differences aren’t particularly important — they don’t have a huge effect on compression or anything — but make it vastly more difficult to implement a “working” VP8 decoder, or for that matter, decide what “working” really is. For example, Google’s decoder will, if told to “swap the ALT and GOLDEN reference frames”, overwrite both with GOLDEN, because it first sets GOLDEN = ALT, and then sets ALT = GOLDEN. Is this a bug ? Or is this how it’s supposed to work ? It’s hard to tell — there isn’t a spec to say so. Google says that whatever libvpx does is right, but I doubt they intended this.

    I expect a spec will eventually be written, but it was a bit obnoxious of Google — both to the community and to their own developers — to release so early that they didn’t even have their own documentation ready.

    2. The TM intra prediction mode.

    One thing I glossed over in the original piece was that On2 had added an extra intra prediction mode to the standard batch that H.264 came with — they replaced Planar with “TM pred”. For i4x4, which didn’t have a Planar mode, they just added it without replacing an old one, resulting in a total of 10 modes to H.264′s 9. After understanding and writing assembly code for TM pred, I have to say that it is quite a cool idea. Here’s how it works :

    1. Let us take a block of size 4×4, 8×8, or 16×16.

    2. Define the pixels bordering the top of this block (starting from the left) as T[0], T[1], T[2]…

    3. Define the pixels bordering the left of this block (starting from the top) as L[0], L[1], L[2]…

    4. Define the pixel above the top-left of the block as TL.

    5. Predict every pixel <X,Y> in the block to be equal to clip3( T[X] + L[Y] – TL, 0, 255).

    It’s effectively a generalization of gradient prediction to the block level — predict each pixel based on the gradient between its top and left pixels, and the topleft. According to the VP8 devs, it’s chosen by the encoder quite a lot of the time, which isn’t surprising ; it seems like a pretty good idea. As just one more intra pred mode, it’s not going to do magic for compression, but it’s a cool idea and elegantly simple.

    3. Performance and the deblocking filter.

    On2 advertised for quite some that VP8′s goal was to be significantly faster to decode than H.264. When I saw the spec, I waited for the punchline, but apparently they were serious. There’s nothing wrong with being of similar speed or a bit slower — but I was rather confused as to the fact that their design didn’t match their stated goal at all. What apparently happened is they had multiple profiles of VP8 — high and low complexity profiles. They marketed the performance of the low complexity ones while touting the quality of the high complexity ones, a tad dishonest. More importantly though, practically nobody is using the low complexity modes, so anyone writing a decoder has to be prepared to handle the high complexity ones, which are the default.

    The primary time-eater here is the deblocking filter. VP8, being an H.264 derivative, has much the same problem as H.264 does in terms of deblocking — it spends an absurd amount of time there. As I write this post, we’re about to finish some of the deblocking filter asm code, but before it’s committed, up to 70% or more of total decoding time is spent in the deblocking filter ! Like H.264, it suffers from the 4×4 transform problem : a 4×4 transform requires a total of 8 length-16 and 8 length-8 loopfilter calls per macroblock, while Theora, with only an 8×8 transform, requires half that.

    This problem is aggravated in VP8 by the fact that the deblocking filter isn’t strength-adaptive ; if even one 4×4 block in a macroblock contains coefficients, every single edge has to be deblocked. Furthermore, the deblocking filter itself is quite complicated ; the “inner edge” filter is a bit more complex than H.264′s and the “macroblock edge” filter is vastly more complicated, having two entirely different codepaths chosen on a per-pixel basis. Of course, in SIMD, this means you have to do both and mask them together at the end.

    There’s nothing wrong with a good-but-slow deblocking filter. But given the amount of deblocking one needs to do in a 4×4-transform-based format, it might have been a better choice to make the filter simpler. It’s pretty difficult to beat H.264 on compression, but it’s certainly not hard to beat it on speed — and yet it seems VP8 missed a perfectly good chance to do so. Another option would have been to pick an 8×8 transform instead of 4×4, reducing the amount of deblocking by a factor of 2.

    And yes, there’s a simple filter available in the low complexity profile, but it doesn’t help if nobody uses it.

    4. Tree-based arithmetic coding.

    Binary arithmetic coding has become the standard entropy coding method for a wide variety of compressed formats, ranging from LZMA to VP6, H.264 and VP8. It’s simple, relatively fast compared to other arithmetic coding schemes, and easy to make adaptive. The problem with this is that you have to come up with a method for converting non-binary symbols into a list of binary symbols, and then choosing what probabilities to use to code each one. Here’s an example from H.264, the sub-partition mode symbol, which is either 8×8, 8×4, 4×8, or 4×4. encode_decision( context, bit ) writes a binary decision (bit) into a numbered context (context).

    8×8 : encode_decision( 21, 0 ) ;

    8×4 : encode_decision( 21, 1 ) ; encode_decision( 22, 0 ) ;

    4×8 : encode_decision( 21, 1 ) ; encode_decision( 22, 1 ) ; encode_decision( 23, 1 ) ;

    4×4 : encode_decision( 21, 1 ) ; encode_decision( 22, 1 ) ; encode_decision( 23, 0 ) ;

    As can be seen, this is clearly like a Huffman tree. Wouldn’t it be nice if we could represent this in the form of an actual tree data structure instead of code ? On2 thought so — they designed a simple system in VP8 that allowed all binarization schemes in the entire format to be represented as simple tree data structures. This greatly reduces the complexity — not speed-wise, but implementation-wise — of the entropy coder. Personally, I quite like it.

    5. The inverse transform ordering.

    I should at some point write a post about common mistakes made in video formats that everyone keeps making. These are not issues that are patent worries or huge issues for compression — just stupid mistakes that are repeatedly made in new video formats, probably because someone just never asked the guy next to him “does this look stupid ?” before sticking it in the spec.

    One common mistake is the problem of transform ordering. Every sane 2D transform is “separable” — that is, it can be done by doing a 1D transform vertically and doing the 1D transform again horizontally (or vice versa). The original iDCT as used in JPEG, H.263, and MPEG-1/2/4 was an “idealized” iDCT — nobody had to use the exact same iDCT, theirs just had to give very close results to a reference implementation. This ended up resulting in a lot of practical problems. It was also slow ; the only way to get an accurate enough iDCT was to do all the intermediate math in 32-bit.

    Practically every modern format, accordingly, has specified an exact iDCT. This includes H.264, VC-1, RV40, Theora, VP8, and many more. Of course, with an exact iDCT comes an exact ordering — while the “real” iDCT can be done in any order, an exact iDCT usually requires an exact order. That is, it specifies horizontal and then vertical, or vertical and then horizontal.

    All of these transforms end up being implemented in SIMD. In SIMD, a vertical transform is generally the only option, so a transpose is added to the process instead of doing a horizontal transform. Accordingly, there are two ways to do it :

    1. Transpose, vertical transform, transpose, vertical transform.

    2. Vertical transform, transpose, vertical transform, transpose.

    These may seem to be equally good, but there’s one catch — if the transpose is done first, it can be completely eliminated by merging it into the coefficient decoding process. On many modern CPUs, particularly x86, transposes are very expensive, so eliminating one of the two gives a pretty significant speed benefit.

    H.264 did it way 1).

    VC-1 did it way 1).

    Theora (inherited from VP3) did it way 1).

    But no. VP8 has to do it way 2), where you can’t eliminate the transpose. Bah. It’s not a huge deal ; probably only 1-2% overall at most speed-wise, but it’s just a needless waste. What really bugs me is that VP3 got it right — why in the world did they screw it up this time around if they got it right beforehand ?

    RV40 is the other modern format I know that made this mistake.

    (NB : You can do transforms without a transpose, but it’s generally not worth it unless the intermediate needs 32-bit math, as in the case of the “real” iDCT.)

    6. Not supporting interlacing.

    THANK YOU THANK YOU THANK YOU THANK YOU THANK YOU THANK YOU THANK YOU.

    Interlacing was the scourge of H.264. It weaseled its way into every nook and cranny of the spec, making every decoder a thousand lines longer. H.264 even included a highly complicated — and effective — dedicated interlaced coding scheme, MBAFF. The mere existence of MBAFF, despite its usefulness for broadcasters and others still stuck in the analog age with their 1080i, 576i , and 480i content, was a blight upon the video format.

    VP8 has once and for all avoided it.

    And if anyone suggests adding interlaced support to the experimental VP8 branch, find a straightjacket and padded cell for them before they cause any real damage.

  • IJG swings again, and misses

    1er février 2010, par Mans — Multimedia

    Earlier this month the IJG unleashed version 8 of its ubiquitous libjpeg library on the world. Eager to try out the “major breakthrough in image coding technology” promised in the README file accompanying v7, I downloaded the release. A glance at the README file suggests something major indeed is afoot :

    Version 8.0 is the first release of a new generation JPEG standard to overcome the limitations of the original JPEG specification.

    The text also hints at the existence of a document detailing these marvellous new features, and a Google search later a copy has found its way onto my monitor. As I read, however, my state of mind shifts from an initial excited curiosity, through bewilderment and disbelief, finally arriving at pure merriment.

    Already on the first page it becomes clear no new JPEG standard in fact exists. All we have is an unsolicited proposal sent to the ITU-T by members of the IJG. Realising that even the most brilliant of inventions must start off as mere proposals, I carry on reading. The summary informs me that I am about to witness the introduction of three extensions to the T.81 JPEG format :

    1. An alternative coefficient scan sequence for DCT coefficient serialization
    2. A SmartScale extension in the Start-Of-Scan (SOS) marker segment
    3. A Frame Offset definition in or in addition to the Start-Of-Frame (SOF) marker segment

    Together these three extensions will, it is promised, “bring DCT based JPEG back to the forefront of state-of-the-art image coding technologies.”

    Alternative scan

    The first of the proposed extensions introduces an alternative DCT coefficient scan sequence to be used in place of the zigzag scan employed in most block transform based codecs.

    Alternative scan sequence

    Alternative scan sequence

    The advantage of this scan would be that combined with the existing progressive mode, it simplifies decoding of an initial low-resolution image which is enhanced through subsequent passes. The author of the document calls this scheme “image-pyramid/hierarchical multi-resolution coding.” It is not immediately obvious to me how this constitutes even a small advance in image coding technology.

    At this point I am beginning to suspect that our friend from the IJG has been trapped in a half-world between interlaced GIF images transmitted down noisy phone lines and today’s inferno of SVC, MVC, and other buzzwords.

    (Not so) SmartScale

    Disguised behind this camel-cased moniker we encounter a method which, we are told, will provide better image quality at high compression ratios. The author has combined two well-known (to us) properties in a (to him) clever way.

    The first property concerns the perceived impact of different types of distortion in an image. When encoding with JPEG, as the quantiser is increased, the decoded image becomes ever more blocky. At a certain point, a better subjective visual quality can be achieved by down-sampling the image before encoding it, thus allowing a lower quantiser to be used. If the decoded image is scaled back up to the original size, the unpleasant, blocky appearance is replaced with a smooth blur.

    The second property belongs to the DCT where, as we all know, the top-left (DC) coefficient is the average of the entire block, its neighbours represent the lowest frequency components etc. A top-left-aligned subset of the coefficient block thus represents a low-resolution version of the full block in the spatial domain.

    In his flash of genius, our hero came up with the idea of using the DCT for down-scaling the image. Unfortunately, he appears to possess precious little knowledge of sampling theory and human visual perception. Any block-based resampling will inevitably produce sharp artefacts along the block edges. The human visual system is particularly sensitive to sharp edges, so this is one of the most unwanted types of distortion in an encoded image.

    Despite the obvious flaws in this approach, I decided to give it a try. After all, the software is already written, allowing downscaling by factors of 8/8..16.

    Using a 1280×720 test image, I encoded it with each of the nine scaling options, from unity to half size, each time adjusting the quality parameter for a final encoded file size of no more than 200000 bytes. The following table presents the encoded file size, the libjpeg quality parameter used, and the SSIM metric for each of the images.

    Scale Size Quality SSIM
    8/8 198462 59 0.940
    8/9 196337 70 0.936
    8/10 196133 79 0.934
    8/11 197179 84 0.927
    8/12 193872 89 0.915
    8/13 197153 92 0.914
    8/14 188334 94 0.899
    8/15 198911 96 0.886
    8/16 197190 97 0.869

    Although the smaller images allowed a higher quality setting to be used, the SSIM value drops significantly. Numbers may of course be misleading, but the images below speak for themselves. These are cut-outs from the full image, the original on the left, unscaled JPEG-compressed in the middle, and JPEG with 8/16 scaling to the right.

    Looking at these images, I do not need to hesitate before picking the JPEG variant I prefer.

    Frame offset

    The third and final extension proposed is quite simple and also quite pointless : a top-left cropping to be applied to the decoded image. The alleged utility of this feature would be to enable lossless cropping of a JPEG image. In a typical image workflow, however, JPEG is only used for the final published version, so the need for this feature appears quite far-fetched.

    The grand finale

    Throughout the text, the author makes references to “the fundamental DCT property for image representation.” In his own words :

    This property was found by the author during implementation of the new DCT scaling features and is after his belief one of the most important discoveries in digital image coding after releasing the JPEG standard in 1992.

    The secret is to be revealed in an annex to the main text. This annex quotes in full a post by the author to the comp.dsp Usenet group in a thread with the subject why DCT. Reading the entire thread proves quite amusing. A few excerpts follow.

    The actual reason is much simpler, and therefore apparently very difficult to recognize by complicated-thinking people.

    Here is the explanation :

    What are people doing when they have a bunch of images and want a quick preview ? They use thumbnails ! What are thumbnails ? Thumbnails are small downscaled versions of the original image ! If you want more details of the image, you can zoom in stepwise by enlarging (upscaling) the image.

    So with proper understanding of the fundamental DCT property, the MPEG folks could make their videos more scalable, but, as in the case of JPEG, they are unable to recognize this simple but basic property, unfortunately, and pursue rather inferior approaches in actual developments.

    These are just phrases, and they don’t explain anything. But this is typical for the current state in this field : The relevant people ignore and deny the true reasons, and thus they turn in a circle and no progress is being made.

    However, there are dark forces in action today which ignore and deny any fruitful advances in this field. That is the reason that we didn’t see any progress in JPEG for more than a decade, and as long as those forces dominate, we will see more confusion and less enlightenment. The truth is always simple, and the DCT *is* simple, but this fact is suppressed by established people who don’t want to lose their dubious position.

    I believe a trip to the Total Perspective Vortex may be in order. Perhaps his tin-foil hat will save him.