Recherche avancée

Médias (91)

Autres articles (81)

  • Automated installation script of MediaSPIP

    25 avril 2011, par

    To overcome the difficulties mainly due to the installation of server side software dependencies, an "all-in-one" installation script written in bash was created to facilitate this step on a server with a compatible Linux distribution.
    You must have access to your server via SSH and a root account to use it, which will install the dependencies. Contact your provider if you do not have that.
    The documentation of the use of this installation script is available here.
    The code of this (...)

  • Contribute to a better visual interface

    13 avril 2011

    MediaSPIP is based on a system of themes and templates. Templates define the placement of information on the page, and can be adapted to a wide range of uses. Themes define the overall graphic appearance of the site.
    Anyone can submit a new graphic theme or template and make it available to the MediaSPIP community.

  • Support de tous types de médias

    10 avril 2011

    Contrairement à beaucoup de logiciels et autres plate-formes modernes de partage de documents, MediaSPIP a l’ambition de gérer un maximum de formats de documents différents qu’ils soient de type : images (png, gif, jpg, bmp et autres...) ; audio (MP3, Ogg, Wav et autres...) ; vidéo (Avi, MP4, Ogv, mpg, mov, wmv et autres...) ; contenu textuel, code ou autres (open office, microsoft office (tableur, présentation), web (html, css), LaTeX, Google Earth) (...)

Sur d’autres sites (7078)

  • FFmpeg Time Skipping after concat

    18 septembre 2017, par Manul Hüttinger

    i am trying to concatenate videos and in my resulting Videos i have the issue that at the concat edges my video is skipping/glitching.


    General Workflow

    in general my workflow looks like this

    1. Stream via ffmpeg
    2. capture stream and segment all 2 mins in to TS files
    3. concat TS file to MP4 video
    4. cut MP4 video at the desired Timestamps

    Detailed Look at Step 4

    as i am handling 2-hours+ videos i don’t want to re-encode the whole video. So i am looking for the closest I Frames to my In and Outpoint and cut 1 Frame before/after. As my result i get 3 Videos (front/middle/back). Now it is only needed to re-encode front and back and concat all three Videos.
    So far the theory


    General Observations

    As i look at my resulting video, i obeservated the following.
    frontpart duration is e.g. 3.2 secs. i skip framewise through the Video and get nice and clean timestamps ( 00:00:02.620, 00:00:02:660 )
    As soon as i get to my cutting edge the timestamps are skipping some frames and changing (00:00:03.200, 00:00:03.408 ,00:00:03.409). When i look only at the single frame images, their are no missing/dropped frames

    when i concat my TS files, the results contains a messed up frame rate e.g from constant 25 F/s to 25.0096 F/s


    FFMPEG instructions

    FRONT Video

    ffmpeg -ss 232.52 -i src.mp4 -t 3.24 -filter_complex "setpts=PTS-STARTPTS" -vcodec libx264 -profile:v baseline -level 31 -bf 0  -f mp4 -y front.mp4
    ffmpeg version n3.3 Copyright (c) 2000-2017 the FFmpeg developers
     built with gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.4) 20160609
     configuration: --prefix=/usr --extra-cflags=-I/usr/include --extra-ldflags=-L/usr/lib --bindir=/usr/local/bin --extra-libs=-ldl --enable-gpl --enable-libass --enable-libfdk-aac --enable-libmp3lame --enable-libopus --enable-libtheora --enable-libvorbis --enable-libvpx --enable-libx264 --enable-nonfree --enable-libssh --enable-openssl
     libavutil      55. 58.100 / 55. 58.100
     libavcodec     57. 89.100 / 57. 89.100
     libavformat    57. 71.100 / 57. 71.100
     libavdevice    57.  6.100 / 57.  6.100
     libavfilter     6. 82.100 /  6. 82.100
     libswscale      4.  6.100 /  4.  6.100
     libswresample   2.  7.100 /  2.  7.100
     libpostproc    54.  5.100 / 54.  5.100
    Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'src.mp4':
     Metadata:
       major_brand     : isom
       minor_version   : 512
       compatible_brands: isomiso2avc1mp41
       encoder         : Lavf55.33.100
     Duration: 01:06:00.33, start: 0.000000, bitrate: 610 kb/s
       Stream #0:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 1280x464 [SAR 1:1 DAR 80:29], 537 kb/s, 25 fps, 25 tbr, 90k tbn, 50 tbc (default)
       Metadata:
         handler_name    : VideoHandler
       Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, mono, fltp, 65 kb/s (default)
       Metadata:
         handler_name    : SoundHandler
    Stream mapping:
     Stream #0:0 (h264) -> setpts (graph 0)
     setpts (graph 0) -> Stream #0:0 (libx264)
     Stream #0:1 -> #0:1 (aac (native) -> aac (native))
    Press [q] to stop, [?] for help
    [libx264 @ 0x31ea160] using SAR=1/1
    [libx264 @ 0x31ea160] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX LZCNT
    [libx264 @ 0x31ea160] profile Constrained Baseline, level 3.1
    [libx264 @ 0x31ea160] 264 - core 148 - H.264/MPEG-4 AVC codec - Copyleft 2003-2016 - http://www.videolan.org/x264.html - options: cabac=0 ref=3 deblock=1:0:0 analyse=0x1:0x111 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=0 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=3 lookahead_threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=0 weightp=0 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
    Output #0, mp4, to 'front.mp4':
     Metadata:
       major_brand     : isom
       minor_version   : 512
       compatible_brands: isomiso2avc1mp41
       encoder         : Lavf57.71.100
       Stream #0:0: Video: h264 (libx264) ([33][0][0][0] / 0x0021), yuv420p, 1280x464 [SAR 1:1 DAR 80:29], q=-1--1, 25 fps, 12800 tbn, 25 tbc (default)
       Metadata:
         encoder         : Lavc57.89.100 libx264
       Side data:
         cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: -1
       Stream #0:1(und): Audio: aac (LC) ([64][0][0][0] / 0x0040), 44100 Hz, mono, fltp, 69 kb/s (default)
       Metadata:
         handler_name    : SoundHandler
         encoder         : Lavc57.89.100 aac
    frame=   81 fps= 31 q=-1.0 Lsize=     151kB time=00:00:03.25 bitrate= 379.5kbits/s speed=1.23x
    video:119kB audio:28kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 2.531915%
    [libx264 @ 0x31ea160] frame I:1     Avg QP:17.43  size: 38268
    [libx264 @ 0x31ea160] frame P:80    Avg QP:21.05  size:  1038
    [libx264 @ 0x31ea160] mb I  I16..4: 71.4%  0.0% 28.6%
    [libx264 @ 0x31ea160] mb P  I16..4:  0.2%  0.0%  0.0%  P16..4:  7.8%  1.8%  0.6%  0.0%  0.0%    skip:89.6%
    [libx264 @ 0x31ea160] coded y,uvDC,uvAC intra: 21.8% 40.8% 23.1% inter: 1.6% 4.2% 0.2%
    [libx264 @ 0x31ea160] i16 v,h,dc,p: 66% 25%  5%  4%
    [libx264 @ 0x31ea160] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 30% 28% 15%  4%  4%  5%  4%  5%  4%
    [libx264 @ 0x31ea160] i8c dc,h,v,p: 51% 24% 21%  4%
    [libx264 @ 0x31ea160] ref P L0: 74.3% 15.2% 10.5%
    [libx264 @ 0x31ea160] kb/s:299.56
    [aac @ 0x31edfa0] Qavg: 123.323

    MIDDLE Video

    ffmpeg -ss 235.88 -i src.mp4 -c copy -t 1492.96 -f mp4 -y middle.mp4
    ffmpeg version n3.3 Copyright (c) 2000-2017 the FFmpeg developers
     built with gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.4) 20160609
     configuration: --prefix=/usr --extra-cflags=-I/usr/include --extra-ldflags=-L/usr/lib --bindir=/usr/local/bin --extra-libs=-ldl --enable-gpl --enable-libass --enable-libfdk-aac --enable-libmp3lame --enable-libopus --enable-libtheora --enable-libvorbis --enable-libvpx --enable-libx264 --enable-nonfree --enable-libssh --enable-openssl
     libavutil      55. 58.100 / 55. 58.100
     libavcodec     57. 89.100 / 57. 89.100
     libavformat    57. 71.100 / 57. 71.100
     libavdevice    57.  6.100 / 57.  6.100
     libavfilter     6. 82.100 /  6. 82.100
     libswscale      4.  6.100 /  4.  6.100
     libswresample   2.  7.100 /  2.  7.100
     libpostproc    54.  5.100 / 54.  5.100
    Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'src.mp4':
     Metadata:
       major_brand     : isom
       minor_version   : 512
       compatible_brands: isomiso2avc1mp41
       encoder         : Lavf55.33.100
     Duration: 01:06:00.33, start: 0.000000, bitrate: 610 kb/s
       Stream #0:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 1280x464 [SAR 1:1 DAR 80:29], 537 kb/s, 25 fps, 25 tbr, 90k tbn, 50 tbc (default)
       Metadata:
         handler_name    : VideoHandler
       Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, mono, fltp, 65 kb/s (default)
       Metadata:
         handler_name    : SoundHandler
    Output #0, mp4, to 'middle.mp4':
     Metadata:
       major_brand     : isom
       minor_version   : 512
       compatible_brands: isomiso2avc1mp41
       encoder         : Lavf57.71.100
       Stream #0:0(und): Video: h264 (Main) ([33][0][0][0] / 0x0021), yuv420p, 1280x464 [SAR 1:1 DAR 80:29], q=2-31, 537 kb/s, 25 fps, 25 tbr, 90k tbn, 90k tbc (default)
       Metadata:
         handler_name    : VideoHandler
       Stream #0:1(und): Audio: aac (LC) ([64][0][0][0] / 0x0040), 44100 Hz, mono, fltp, 65 kb/s (default)
       Metadata:
         handler_name    : SoundHandler
    Stream mapping:
     Stream #0:0 -> #0:0 (copy)
     Stream #0:1 -> #0:1 (copy)
    Press [q] to stop, [?] for help
    frame=37329 fps=16333 q=-1.0 Lsize=  113774kB time=00:24:52.95 bitrate= 624.3kbits/s speed= 653x
    video:100359kB audio:12095kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 1.173957%

    BACK Video

    ffmpeg -ss 1728.88 -i 'src.mp4' -t 3.9199999999998 -filter_complex "setpts=PTS-STARTPTS" -vcodec libx264 -profile:v baseline -level 31 -bf 0  -f mp4 -y back.mp4
    ffmpeg version n3.3 Copyright (c) 2000-2017 the FFmpeg developers
     built with gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.4) 20160609
     configuration: --prefix=/usr --extra-cflags=-I/usr/include --extra-ldflags=-L/usr/lib --bindir=/usr/local/bin --extra-libs=-ldl --enable-gpl --enable-libass --enable-libfdk-aac --enable-libmp3lame --enable-libopus --enable-libtheora --enable-libvorbis --enable-libvpx --enable-libx264 --enable-nonfree --enable-libssh --enable-openssl
     libavutil      55. 58.100 / 55. 58.100
     libavcodec     57. 89.100 / 57. 89.100
     libavformat    57. 71.100 / 57. 71.100
     libavdevice    57.  6.100 / 57.  6.100
     libavfilter     6. 82.100 /  6. 82.100
     libswscale      4.  6.100 /  4.  6.100
     libswresample   2.  7.100 /  2.  7.100
     libpostproc    54.  5.100 / 54.  5.100
    Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'src.mp4':
     Metadata:
       major_brand     : isom
       minor_version   : 512
       compatible_brands: isomiso2avc1mp41
       encoder         : Lavf55.33.100
     Duration: 01:06:00.33, start: 0.000000, bitrate: 610 kb/s
       Stream #0:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 1280x464 [SAR 1:1 DAR 80:29], 537 kb/s, 25 fps, 25 tbr, 90k tbn, 50 tbc (default)
       Metadata:
         handler_name    : VideoHandler
       Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, mono, fltp, 65 kb/s (default)
       Metadata:
         handler_name    : SoundHandler
    Stream mapping:
     Stream #0:0 (h264) -> setpts (graph 0)
     setpts (graph 0) -> Stream #0:0 (libx264)
     Stream #0:1 -> #0:1 (aac (native) -> aac (native))
    Press [q] to stop, [?] for help
    [libx264 @ 0x2672160] using SAR=1/1
    [libx264 @ 0x2672160] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX LZCNT
    [libx264 @ 0x2672160] profile Constrained Baseline, level 3.1
    [libx264 @ 0x2672160] 264 - core 148 - H.264/MPEG-4 AVC codec - Copyleft 2003-2016 - http://www.videolan.org/x264.html - options: cabac=0 ref=3 deblock=1:0:0 analyse=0x1:0x111 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=0 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=3 lookahead_threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=0 weightp=0 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
    Output #0, mp4, to 'back.mp4':
     Metadata:
       major_brand     : isom
       minor_version   : 512
       compatible_brands: isomiso2avc1mp41
       encoder         : Lavf57.71.100
       Stream #0:0: Video: h264 (libx264) ([33][0][0][0] / 0x0021), yuv420p, 1280x464 [SAR 1:1 DAR 80:29], q=-1--1, 25 fps, 12800 tbn, 25 tbc (default)
       Metadata:
         encoder         : Lavc57.89.100 libx264
       Side data:
         cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: -1
       Stream #0:1(und): Audio: aac (LC) ([64][0][0][0] / 0x0040), 44100 Hz, mono, fltp, 69 kb/s (default)
       Metadata:
         handler_name    : SoundHandler
         encoder         : Lavc57.89.100 aac
    frame=   98 fps= 27 q=-1.0 Lsize=     268kB time=00:00:03.92 bitrate= 559.2kbits/s speed=1.07x
    video:230kB audio:33kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 1.538479%
    [libx264 @ 0x2672160] frame I:1     Avg QP:19.57  size:112129
    [libx264 @ 0x2672160] frame P:97    Avg QP:21.68  size:  1270
    [libx264 @ 0x2672160] mb I  I16..4: 35.4%  0.0% 64.6%
    [libx264 @ 0x2672160] mb P  I16..4:  0.2%  0.0%  0.1%  P16..4:  6.9%  2.0%  0.8%  0.0%  0.0%    skip:90.1%
    [libx264 @ 0x2672160] coded y,uvDC,uvAC intra: 54.4% 69.1% 54.2% inter: 2.0% 3.6% 0.3%
    [libx264 @ 0x2672160] i16 v,h,dc,p: 63% 23%  8%  6%
    [libx264 @ 0x2672160] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 26% 17% 11%  6%  8%  9%  7%  8%  8%
    [libx264 @ 0x2672160] i8c dc,h,v,p: 50% 18% 24%  7%
    [libx264 @ 0x2672160] ref P L0: 81.8% 11.5%  6.7%
    [libx264 @ 0x2672160] kb/s:480.15
    [aac @ 0x2675fa0] Qavg: 119.994

    CONCAT Videos

    ffmpeg -f concat -safe 0 -i concat.txt -c copy -r 25 -y /concat.mp4
    ffmpeg version n3.3 Copyright (c) 2000-2017 the FFmpeg developers
     built with gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.4) 20160609
     configuration: --prefix=/usr --extra-cflags=-I/usr/include --extra-ldflags=-L/usr/lib --bindir=/usr/local/bin --extra-libs=-ldl --enable-gpl --enable-libass --enable-libfdk-aac --enable-libmp3lame --enable-libopus --enable-libtheora --enable-libvorbis --enable-libvpx --enable-libx264 --enable-nonfree --enable-libssh --enable-openssl
     libavutil      55. 58.100 / 55. 58.100
     libavcodec     57. 89.100 / 57. 89.100
     libavformat    57. 71.100 / 57. 71.100
     libavdevice    57.  6.100 / 57.  6.100
     libavfilter     6. 82.100 /  6. 82.100
     libswscale      4.  6.100 /  4.  6.100
     libswresample   2.  7.100 /  2.  7.100
     libpostproc    54.  5.100 / 54.  5.100
    Input #0, concat, from 'concat.txt':
     Duration: N/A, start: 0.000000, bitrate: N/A
       Stream #0:0: Video: h264 (Constrained Baseline) ([27][0][0][0] / 0x001B), yuv420p(progressive), 1280x464 [SAR 1:1 DAR 80:29], 25 fps, 25 tbr, 90k tbn, 50 tbc
       Stream #0:1(und): Audio: aac (LC) ([15][0][0][0] / 0x000F), 44100 Hz, mono, fltp, 74 kb/s
    Output #0, mp4, to 'concat.mp4':
     Metadata:
       encoder         : Lavf57.71.100
       Stream #0:0: Video: h264 (Constrained Baseline) ([33][0][0][0] / 0x0021), yuv420p(progressive), 1280x464 [SAR 1:1 DAR 80:29], q=2-31, 25 fps, 25 tbr, 12800 tbn, 25 tbc
       Stream #0:1(und): Audio: aac (LC) ([64][0][0][0] / 0x0040), 44100 Hz, mono, fltp, 74 kb/s
    Stream mapping:
     Stream #0:0 -> #0:0 (copy)
     Stream #0:1 -> #0:1 (copy)
    Press [q] to stop, [?] for help
    frame=37508 fps=7256 q=-1.0 Lsize=  114285kB time=00:25:00.59 bitrate= 623.9kbits/s speed= 290x
    video:100672kB audio:12598kB subtitle:0kB other streams:0kB global headers:1kB muxing overhead: 0.896435%

    thank you very much.
    If needed i can provide samples

    Manu

  • Translating Return To Ringworld

    17 août 2016, par Multimedia Mike — Game Hacking

    As indicated in my previous post, the Translator has expressed interest in applying his hobby towards another DOS adventure game from the mid 1990s : Return to Ringworld (henceforth R2RW) by Tsunami Media. This represents significantly more work than the previous outing, Phantasmagoria.


    Return to Ringworld Title Screen
    Return to Ringworld Title Screen

    I have been largely successful thus far in crafting translation tools. I have pushed the fruits of these labors to a Github repository named improved-spoon (named using Github’s random name generator because I wanted something more interesting than ‘game-hacking-tools’).

    Further, I have recorded everything I have learned about the game’s resource format (named RLB) at the XentaxWiki.

    New Challenges
    The previous project mostly involved scribbling subtitle text on an endless series of video files by leveraging a separate software library which took care of rendering fonts. In contrast, R2RW has at least 30k words of English text contained in various blocks which require translation. Further, the game encodes its own fonts (9 of them) which stubbornly refuse to be useful for rendering text in nearly any other language.

    Thus, the immediate 2 challenges are :

    1. Translating volumes of text to Spanish
    2. Expanding the fonts to represent Spanish characters

    Normally, “figuring out the file format data structures involved” is on the list as well. Thankfully, understanding the formats is not a huge challenge since the folks at the ScummVM project already did all the heavy lifting of reverse engineering the file formats.

    The Pitch
    Here was the plan :

    • Create a tool that can dump out the interesting data from the game’s master resource file.
    • Create a tool that can perform the elaborate file copy described in the previous post. The new file should be bit for bit compatible with the original file.
    • Modify the rewriting tool to repack some modified strings into the new resource file.
    • Unpack the fonts and figure out a way to add new characters.
    • Repack the new fonts into the resource file.
    • Repack message strings with Spanish characters.

    Showing The Work : Modifying Strings
    First, I created the tool to unpack blocks of message string resources. I elected to dump the strings to disk as JSON data since it’s easy to write and read JSON using Python, and it’s quick to check if any mistakes have crept in.

    The next step is to find a string to focus on. So I started the game and looked for the first string I could trigger :


    Return to Ringworld: Original text

    This shows up in the JSON string dump as :

      
        "Spanish" : " !0205Your quarters on the Lance of Truth are spartan, in accord with your mercenary lifestyle.",
        "English" : " !0205Your quarters on the Lance of Truth are spartan, in accord with your mercenary lifestyle."
      ,
    

    As you can see, many of the strings are encoded with an ID key as part of the string which should probably be left unmodified. I changed the Spanish string :

      
        "Spanish" : " !0205Hey, is this thing on ?",
        "English" : " !0205Your quarters on the Lance of Truth are spartan, in accord with your mercenary lifestyle."
      ,
    

    And then I wrote the repacking tool to substitute this message block for the original one. Look ! The engine liked it !


    Return to Ringworld: Modified text

    Little steps, little steps.

    Showing The Work : Modifying Fonts
    The next little step is to find a place to put the new characters. First, a problem definition : The immediate goal is to translate the game into Spanish. The current fonts encoded in the game resource only support 128 characters, corresponding to 7-bit ASCII. In order to properly express Spanish, 16 new characters are required : á, é, í, ó, ú, ü, ñ (each in upper and lower case for a total of 14 characters) as well as the inverted punctuation symbols : ¿, ¡.

    Again, ScummVM already documents (via code) the font coding format. So I quickly determined that each of the 9 fonts is comprised of 128 individual bitmaps with either 1 or 2 bits per pixel. I wrote a tool to unpack each character into an individual portable grey map (PGM) image. These can be edited with graphics editors or with text editors since they are just text files.

    Where to put the 16 new Spanish characters ? ASCII characters 1-31 are non-printable, so my first theory was that these characters would be empty and could be repurposed. However, after dumping and inspecting, I learned that they represent the same set of characters as seen in DOS Code Page 437. So that’s a no-go (so I assumed ; I didn’t check if any existing strings leveraged those characters).

    My next plan was hope that I could extend the font beyond index 127 and use positions 128-143. This worked superbly. This is the new example string :

      
        "Spanish" : " !0205¿Ves esto ? ¡La puntuacion se hace girar !",
        "English" : " !0205Your quarters on the Lance of Truth are spartan, in accord with your mercenary lifestyle."
      ,
    

    Fortunately, JSON understands UTF-8 and after mapping the 16 necessary characters down to the numeric range of 128-143, I repacked the new fonts and the new string :


    Return to Ringworld: Espanol
    Translation : “See this ? The punctuation is rotated !”

    Another victory. Notice that there are no diacritics in this string. None are required for this translation (according to Google Translate). But adding the diacritics to the 14 characters isn’t my department. My tool does help by prepopulating [aeiounAEIOUN] into the right positions to make editing easier for the Translator. But the tool does make the effort to rotate the punctuation since that is easy to automate.

    Next Steps and Residual Weirdness
    There is another method for storing ASCII text inside the R2RW resource called strip resources. These store conversation scripts. There are plenty of fields in the data structures that I don’t fully understand. So, following the lessons I learned from my previous translation outing, I was determined to modify as little as possible. This means copying over most of the original data structures intact, but changing the field representing the relative offset that points to the corresponding string. This works well since the strings are invariably stored NULL-terminated in a concatenated manner.

    I wanted to document for the record that the format that R2RW uses has some weirdness in they way it handles residual bytes in a resource. The variant of the resource format that R2RW uses requires every block to be aligned on a 16-byte boundary. If there is space between the logical end of the resource and the start of the next resource, there are random bytes in that space. This leads me to believe that these bytes were originally recorded from stale/uninitialized memory. This frustrates me because when I write the initial file copy tool which unpacks and repacks each block, I want the new file to be identical to the original. However, these apparent nonsense bytes at the end thwart that effort.

    But leaving those bytes as 0 produces an acceptable resource file.

    Text On Static Images
    There is one last resource type we are working on translating. There are various bits of text that are rendered as images. For example, from the intro :


    Return to Ringworld: Static text

    It’s possible to locate and extract the exact image that is overlaid on this scene, though without the colors :


    Original static text

    The palettes are stored in a separate resource type. So it seems the challenge is to figure out the palette in use for these frames and render a transparent image that uses the same palette, then repack the new text-image into the new resource file.

  • IJG swings again, and misses

    1er février 2010, par Mans — Multimedia

    Earlier this month the IJG unleashed version 8 of its ubiquitous libjpeg library on the world. Eager to try out the “major breakthrough in image coding technology” promised in the README file accompanying v7, I downloaded the release. A glance at the README file suggests something major indeed is afoot :

    Version 8.0 is the first release of a new generation JPEG standard to overcome the limitations of the original JPEG specification.

    The text also hints at the existence of a document detailing these marvellous new features, and a Google search later a copy has found its way onto my monitor. As I read, however, my state of mind shifts from an initial excited curiosity, through bewilderment and disbelief, finally arriving at pure merriment.

    Already on the first page it becomes clear no new JPEG standard in fact exists. All we have is an unsolicited proposal sent to the ITU-T by members of the IJG. Realising that even the most brilliant of inventions must start off as mere proposals, I carry on reading. The summary informs me that I am about to witness the introduction of three extensions to the T.81 JPEG format :

    1. An alternative coefficient scan sequence for DCT coefficient serialization
    2. A SmartScale extension in the Start-Of-Scan (SOS) marker segment
    3. A Frame Offset definition in or in addition to the Start-Of-Frame (SOF) marker segment

    Together these three extensions will, it is promised, “bring DCT based JPEG back to the forefront of state-of-the-art image coding technologies.”

    Alternative scan

    The first of the proposed extensions introduces an alternative DCT coefficient scan sequence to be used in place of the zigzag scan employed in most block transform based codecs.

    Alternative scan sequence

    Alternative scan sequence

    The advantage of this scan would be that combined with the existing progressive mode, it simplifies decoding of an initial low-resolution image which is enhanced through subsequent passes. The author of the document calls this scheme “image-pyramid/hierarchical multi-resolution coding.” It is not immediately obvious to me how this constitutes even a small advance in image coding technology.

    At this point I am beginning to suspect that our friend from the IJG has been trapped in a half-world between interlaced GIF images transmitted down noisy phone lines and today’s inferno of SVC, MVC, and other buzzwords.

    (Not so) SmartScale

    Disguised behind this camel-cased moniker we encounter a method which, we are told, will provide better image quality at high compression ratios. The author has combined two well-known (to us) properties in a (to him) clever way.

    The first property concerns the perceived impact of different types of distortion in an image. When encoding with JPEG, as the quantiser is increased, the decoded image becomes ever more blocky. At a certain point, a better subjective visual quality can be achieved by down-sampling the image before encoding it, thus allowing a lower quantiser to be used. If the decoded image is scaled back up to the original size, the unpleasant, blocky appearance is replaced with a smooth blur.

    The second property belongs to the DCT where, as we all know, the top-left (DC) coefficient is the average of the entire block, its neighbours represent the lowest frequency components etc. A top-left-aligned subset of the coefficient block thus represents a low-resolution version of the full block in the spatial domain.

    In his flash of genius, our hero came up with the idea of using the DCT for down-scaling the image. Unfortunately, he appears to possess precious little knowledge of sampling theory and human visual perception. Any block-based resampling will inevitably produce sharp artefacts along the block edges. The human visual system is particularly sensitive to sharp edges, so this is one of the most unwanted types of distortion in an encoded image.

    Despite the obvious flaws in this approach, I decided to give it a try. After all, the software is already written, allowing downscaling by factors of 8/8..16.

    Using a 1280×720 test image, I encoded it with each of the nine scaling options, from unity to half size, each time adjusting the quality parameter for a final encoded file size of no more than 200000 bytes. The following table presents the encoded file size, the libjpeg quality parameter used, and the SSIM metric for each of the images.

    Scale Size Quality SSIM
    8/8 198462 59 0.940
    8/9 196337 70 0.936
    8/10 196133 79 0.934
    8/11 197179 84 0.927
    8/12 193872 89 0.915
    8/13 197153 92 0.914
    8/14 188334 94 0.899
    8/15 198911 96 0.886
    8/16 197190 97 0.869

    Although the smaller images allowed a higher quality setting to be used, the SSIM value drops significantly. Numbers may of course be misleading, but the images below speak for themselves. These are cut-outs from the full image, the original on the left, unscaled JPEG-compressed in the middle, and JPEG with 8/16 scaling to the right.

    Looking at these images, I do not need to hesitate before picking the JPEG variant I prefer.

    Frame offset

    The third and final extension proposed is quite simple and also quite pointless : a top-left cropping to be applied to the decoded image. The alleged utility of this feature would be to enable lossless cropping of a JPEG image. In a typical image workflow, however, JPEG is only used for the final published version, so the need for this feature appears quite far-fetched.

    The grand finale

    Throughout the text, the author makes references to “the fundamental DCT property for image representation.” In his own words :

    This property was found by the author during implementation of the new DCT scaling features and is after his belief one of the most important discoveries in digital image coding after releasing the JPEG standard in 1992.

    The secret is to be revealed in an annex to the main text. This annex quotes in full a post by the author to the comp.dsp Usenet group in a thread with the subject why DCT. Reading the entire thread proves quite amusing. A few excerpts follow.

    The actual reason is much simpler, and therefore apparently very difficult to recognize by complicated-thinking people.

    Here is the explanation :

    What are people doing when they have a bunch of images and want a quick preview ? They use thumbnails ! What are thumbnails ? Thumbnails are small downscaled versions of the original image ! If you want more details of the image, you can zoom in stepwise by enlarging (upscaling) the image.

    So with proper understanding of the fundamental DCT property, the MPEG folks could make their videos more scalable, but, as in the case of JPEG, they are unable to recognize this simple but basic property, unfortunately, and pursue rather inferior approaches in actual developments.

    These are just phrases, and they don’t explain anything. But this is typical for the current state in this field : The relevant people ignore and deny the true reasons, and thus they turn in a circle and no progress is being made.

    However, there are dark forces in action today which ignore and deny any fruitful advances in this field. That is the reason that we didn’t see any progress in JPEG for more than a decade, and as long as those forces dominate, we will see more confusion and less enlightenment. The truth is always simple, and the DCT *is* simple, but this fact is suppressed by established people who don’t want to lose their dubious position.

    I believe a trip to the Total Perspective Vortex may be in order. Perhaps his tin-foil hat will save him.