Recherche avancée

Médias (0)

Mot : - Tags -/xmlrpc

Aucun média correspondant à vos critères n’est disponible sur le site.

Autres articles (79)

  • Organiser par catégorie

    17 mai 2013, par

    Dans MédiaSPIP, une rubrique a 2 noms : catégorie et rubrique.
    Les différents documents stockés dans MédiaSPIP peuvent être rangés dans différentes catégories. On peut créer une catégorie en cliquant sur "publier une catégorie" dans le menu publier en haut à droite ( après authentification ). Une catégorie peut être rangée dans une autre catégorie aussi ce qui fait qu’on peut construire une arborescence de catégories.
    Lors de la publication prochaine d’un document, la nouvelle catégorie créée sera proposée (...)

  • Récupération d’informations sur le site maître à l’installation d’une instance

    26 novembre 2010, par

    Utilité
    Sur le site principal, une instance de mutualisation est définie par plusieurs choses : Les données dans la table spip_mutus ; Son logo ; Son auteur principal (id_admin dans la table spip_mutus correspondant à un id_auteur de la table spip_auteurs)qui sera le seul à pouvoir créer définitivement l’instance de mutualisation ;
    Il peut donc être tout à fait judicieux de vouloir récupérer certaines de ces informations afin de compléter l’installation d’une instance pour, par exemple : récupérer le (...)

  • Emballe Médias : Mettre en ligne simplement des documents

    29 octobre 2010, par

    Le plugin emballe médias a été développé principalement pour la distribution mediaSPIP mais est également utilisé dans d’autres projets proches comme géodiversité par exemple. Plugins nécessaires et compatibles
    Pour fonctionner ce plugin nécessite que d’autres plugins soient installés : CFG Saisies SPIP Bonux Diogène swfupload jqueryui
    D’autres plugins peuvent être utilisés en complément afin d’améliorer ses capacités : Ancres douces Légendes photo_infos spipmotion (...)

Sur d’autres sites (6857)

  • Dreamcast Finds

    15 avril 2022, par Multimedia Mike — Sega Dreamcast

    Pursuant to my recent post about finally understanding how Sega Dreamcast GD-ROM rips are structured, I was able to prepare the contents of various demo discs in a manner that makes exploration easy via the Internet Archive. This is due to the way that IA makes it easy to browse archives such as ZIP or ISO files (anything that 7zip knows how to unpack), and also presents the audio tracks for native playback directly through the web browser.

    These are some of the interesting things I have found while perusing the various Dreamcast sampler discs.

    Multimedia Formats
    First and foremost : Multimedia-wise, SFD and ADX files abound on all the discs. SFD files are Sofdec, a middleware format used for a lot of FMV on Dreamcast games. These were little more than MPEG video files with a non-MPEG (ADPCM instead) audio codec. VLC will usually play the video portions of these files but has trouble detecting the audio. It’s not for lack of audio codec support because it can play the ADX files just fine.

    It should be noted that Dreamcast Magazine Disc 11 has an actual .mpg file (as opposed to a .sfd file) that has proper MPEG audio instead instead of ADX ADPCM.

    The only other multimedia format I know of that was used in any Dreamcast games was 4XM, used on Alone In The Dark : The New Nightmare. I wrote a simple C tool a long time to recover these files from a disc image I extracted myself. Rather than interpreting the ISO-9660 filesystem, the tool just crawled through the binary blob searching for ‘4XMV’ file signatures and using length data within the files for extraction.

    Also, there are plentiful PVR files (in reference to the PowerVR2 GPU hardware that the DC uses) which ‘file’ dutifully identifies as “Sega PVR image”. There are probably tools to view them. It doesn’t appear to be a complicated format.

    Scripting
    I was fascinated to see Lua files on at least one of the discs. It turns out that MDK 2 leverages the language, as several other games do. But it was still interesting to see the .lua files show up in the Dreamcast version as well.

    That Windows CE Logo
    Every Sega Dreamcast is famously emblazoned with a logo mentioning Microsoft Windows CE :


    Windows CE Logo on Dreamcast

    It has confused many folks. It also confused me until this exploratory exercise. Many would wonder if the Dreamcast booted up into some Windows CE OS environment that then ran the game, but that certainly wasn’t it. Indeed, Dreamcast was one of the last consoles that really didn’t have any kind of hypervisor operating system managing everything.

    I found a file called rt2dc.exe on one sampler disc. At first, I suspected that this was a development utility for Windows to convert some “RT” graphical format into a format more suitable for the Dreamcast. Then, ‘file’ told me that it was actually a Windows EXE but compiled for the Hitachi SH-4 CPU (the brain inside the DC). Does the conversion utility run on the Dreamcast itself ? Then I analyzed the strings inside the binary and saw references to train stations. That’s when it started to click for me that this was the binary executable for the demo version of Railroad Tycoon 2 : Gold Edition, hence “rt2dc.exe”. Still, this provides some insight about whether Dreamcast “runs” Windows. This binary was built against a series of Windows CE libraries. The symbols also imply DirectX compatibility.

    Here is a page with more info about the WinCE/DirectX variant for the Sega Dreamcast. It seems that this was useful for closing the gap between PC and DC ports of games (i.e., being able to re-use more code between the 2 platforms). I guess this was part of what made Dreamcast a dry run for the DirectXbox (later Xbox).

    Here is a list of all the Dreamcast games that are known to use Windows CE.

    Suddenly, I am curious if tools such as IDA Pro or Ghidra can possibly open up Windows CE binaries that contain SH-4 code. Not that I’m particularly interested in reverse engineering any algorithms locked up in Dreamcast land.

    Tomb Raider Easter Egg
    The volume 6 sampler disc has a demo of Tomb Raider : The Last Revelation. While inspecting the strings, I found an Easter egg. I was far from the first person to discover it, though, as seen on this The Cutting Room Floor wiki page (look under “Developer Message”). It looks like I am the first person to notice it on the Dreamcast version. It shows up at offset 0xE3978 in the Dreamcast (demo version) binary, if anyone with permissions wants to update the page.

    Web Browser
    Then there’s the Web Browser for Sega Dreamcast. It seemed to be included on a lot of these sampler discs. But only mentioning the web browser undersells it– the thing also bundled an email client and an IRC client. It’s important to remember that the Dreamcast also had a keyboard peripheral.

    I need to check the timeline for when the web browser first became available vs. when the MIL-CD hack became known. My thinking is that there is no way that the web browser program didn’t have some security issues– buffer overflows and the like. It seems like this would have been a good method of breaking the security of the system.

    Ironically, I suddenly can think of a reason why one might want to use advanced reverse engineering tools on Dreamcast binaries, something I struggled with just a few paragraphs ago.

    Odds ‘n Ends
    It’s always fun to find plain text files among video game assets and speculating on the precise meaning… while also marveling how long people have been struggling to correctly spell “length”.

    Internationalization via plain text files.

    Another game (Slave Zero) saw fit to zip its assets. Maybe this was to save space in order to fit everything on the magazine sampler disc. Quizzically, this didn’t really save an appreciable amount of space.

    Finally, all the discs have an audio track 2 that advises that the disc must be played in a Dreamcast console. Not unusual. However, volume 4 also has a Japanese lady saying the same thing on track 4. This is odd because track 4 is one of the GD area audio tracks and is not accessible with normal CD hardware. Further, she identifies the disc as a “Windows CE disc”.

    The post Dreamcast Finds first appeared on Breaking Eggs And Making Omelettes.

  • Anatomy of an optimization : H.264 deblocking

    26 mai 2010, par Dark Shikari — H.264, assembly, development, speed, x264

    As mentioned in the previous post, H.264 has an adaptive deblocking filter. But what exactly does that mean — and more importantly, what does it mean for performance ? And how can we make it as fast as possible ? In this post I’ll try to answer these questions, particularly in relation to my recent deblocking optimizations in x264.

    H.264′s deblocking filter has two steps : strength calculation and the actual filter. The first step calculates the parameters for the second step. The filter runs on all the edges in each macroblock. That’s 4 vertical edges of length 16 pixels and 4 horizontal edges of length 16 pixels. The vertical edges are filtered first, from left to right, then the horizontal edges, from top to bottom (order matters !). The leftmost edge is the one between the current macroblock and the left macroblock, while the topmost edge is the one between the current macroblock and the top macroblock.

    Here’s the formula for the strength calculation in progressive mode. The highest strength that applies is always selected.

    If we’re on the edge between an intra macroblock and any other macroblock : Strength 4
    If we’re on an internal edge of an intra macroblock : Strength 3
    If either side of a 4-pixel-long edge has residual data : Strength 2
    If the motion vectors on opposite sides of a 4-pixel-long edge are at least a pixel apart (in either x or y direction) or the reference frames aren’t the same : Strength 1
    Otherwise : Strength 0 (no deblocking)

    These values are then thrown into a lookup table depending on the quantizer : higher quantizers have stronger deblocking. Then the actual filter is run with the appropriate parameters. Note that Strength 4 is actually a special deblocking mode that performs a much stronger filter and affects more pixels.

    One can see somewhat intuitively why these strengths are chosen. The deblocker exists to get rid of sharp edges caused by the block-based nature of H.264, and so the strength depends on what exists that might cause such sharp edges. The strength calculation is a way to use existing data from the video stream to make better decisions during the deblocking process, improving compression and quality.

    Both the strength calculation and the actual filter (not described here) are very complex if naively implemented. The latter can be SIMD’d with not too much difficulty ; no H.264 decoder can get away with reasonable performance without such a thing. But what about optimizing the strength calculation ? A quick analysis shows that this can be beneficial as well.

    Since we have to check both horizontal and vertical edges, we have to check up to 32 pairs of coefficient counts (for residual), 16 pairs of reference frame indices, and 128 motion vector values (counting x and y as separate values). This is a lot of calculation ; a naive implementation can take 500-1000 clock cycles on a modern CPU. Of course, there’s a lot of shortcuts we can take. Here’s some examples :

    • If the macroblock uses the 8×8 transform, we only need to check 2 edges in each direction instead of 4, because we don’t deblock inside of the 8×8 blocks.
    • If the macroblock is a P-skip, we only have to check the first edge in each direction, since there’s guaranteed to be no motion vector differences, reference frame differences, or residual inside of the macroblock.
    • If the macroblock has no residual at all, we can skip that check.
    • If we know the partition type of the macroblock, we can do motion vector checks only along the edges of the partitions.
    • If the effective quantizer is so low that no deblocking would be performed no matter what, don’t bother calculating the strength.

    But even all of this doesn’t save us from ourselves. We still have to iterate over a ton of edges, checking each one. Stuff like the partition-checking logic greatly complicates the code and adds overhead even as it reduces the number of checks. And in many cases decoupling the checks to add such logic will make it slower : if the checks are coupled, we can avoid doing a motion vector check if there’s residual, since Strength 2 overrides Strength 1.

    But wait. What if we could do this in SIMD, just like the actual loopfilter itself ? Sure, it seems more of a problem for C code than assembly, but there aren’t any obvious things in the way. Many years ago, Loren Merritt (pengvado) wrote the first SIMD implementation that I know of (for ffmpeg’s decoder) ; it is quite fast, so I decided to work on porting the idea to x264 to see if we could eke out a bit more speed here as well.

    Before I go over what I had to do to make this change, let me first describe how deblocking is implemented in x264. Since the filter is a loopfilter, it acts “in loop” and must be done in both the encoder and decoder — hence why x264 has it too, not just decoders. At the end of encoding one row of macroblocks, x264 goes back and deblocks the row, then performs half-pixel interpolation for use in encoding the next frame.

    We do it per-row for reasons of cache coherency : deblocking accesses a lot of pixels and a lot of code that wouldn’t otherwise be used, so it’s more efficient to do it in a single pass as opposed to deblocking each macroblock immediately after encoding. Then half-pixel interpolation can immediately re-use the resulting data.

    Now to the change. First, I modified deblocking to implement a subset of the macroblock_cache_load function : spend an extra bit of effort loading the necessary data into a data structure which is much simpler to address — as an assembly implementation would need (x264_macroblock_cache_load_deblock). Then I massively cleaned up deblocking to move all of the core strength-calculation logic into a single, small function that could be converted to assembly (deblock_strength_c). Finally, I wrote the assembly functions and worked with Loren to optimize them. Here’s the result.

    And the timings for the resulting assembly function on my Core i7, in cycles :

    deblock_strength_c : 309
    deblock_strength_mmx : 79
    deblock_strength_sse2 : 37
    deblock_strength_ssse3 : 33

    Now that is a seriously nice improvement. 33 cycles on average to perform that many comparisons–that’s absurdly low, especially considering the SIMD takes no branchy shortcuts : it always checks every single edge ! I walked over to my performance chart and happily crossed off a box.

    But I had a hunch that I could do better. Remember, as mentioned earlier, we’re reloading all that data back into our data structures in order to address it. This isn’t that slow, but takes enough time to significantly cut down on the gain of the assembly code. And worse, less than a row ago, all this data was in the correct place to be used (when we just finished encoding the macroblock) ! But if we did the deblocking right after encoding each macroblock, the cache issues would make it too slow to be worth it (yes, I tested this). So I went back to other things, a bit annoyed that I couldn’t get the full benefit of the changes.

    Then, yesterday, I was talking with Pascal, a former Xvid dev and current video hacker over at Google, about various possible x264 optimizations. He had seen my deblocking changes and we discussed that a bit as well. Then two lines hit me like a pile of bricks :

    <_skal_> tried computing the strength at least ?
    <_skal_> while it’s fresh

    Why hadn’t I thought of that ? Do the strength calculation immediately after encoding each macroblock, save the result, and then go pick it up later for the main deblocking filter. Then we can use the data right there and then for strength calculation, but we don’t have to do the whole deblock process until later.

    I went and implemented it and, after working my way through a horde of bugs, eventually got a working implementation. A big catch was that of slices : deblocking normally acts between slices even though normal encoding does not, so I had to perform extra munging to get that to work. By midday today I was able to go cross yet another box off on the performance chart. And now it’s committed.

    Sometimes chatting for 10 minutes with another developer is enough to spot the idea that your brain somehow managed to miss for nearly a straight week.

    NB : the performance chart is on a specific test clip at a specific set of settings (super fast settings) relevant to the company I work at, so it isn’t accurate nor complete for, say, default settings.

    Update : Here’s a higher resolution version of the current chart, as requested in the comments.

  • Anatomy of an optimization : H.264 deblocking

    26 mai 2010, par Dark Shikari — H.264, assembly, development, speed, x264

    As mentioned in the previous post, H.264 has an adaptive deblocking filter. But what exactly does that mean — and more importantly, what does it mean for performance ? And how can we make it as fast as possible ? In this post I’ll try to answer these questions, particularly in relation to my recent deblocking optimizations in x264.

    H.264′s deblocking filter has two steps : strength calculation and the actual filter. The first step calculates the parameters for the second step. The filter runs on all the edges in each macroblock. That’s 4 vertical edges of length 16 pixels and 4 horizontal edges of length 16 pixels. The vertical edges are filtered first, from left to right, then the horizontal edges, from top to bottom (order matters !). The leftmost edge is the one between the current macroblock and the left macroblock, while the topmost edge is the one between the current macroblock and the top macroblock.

    Here’s the formula for the strength calculation in progressive mode. The highest strength that applies is always selected.

    If we’re on the edge between an intra macroblock and any other macroblock : Strength 4
    If we’re on an internal edge of an intra macroblock : Strength 3
    If either side of a 4-pixel-long edge has residual data : Strength 2
    If the motion vectors on opposite sides of a 4-pixel-long edge are at least a pixel apart (in either x or y direction) or the reference frames aren’t the same : Strength 1
    Otherwise : Strength 0 (no deblocking)

    These values are then thrown into a lookup table depending on the quantizer : higher quantizers have stronger deblocking. Then the actual filter is run with the appropriate parameters. Note that Strength 4 is actually a special deblocking mode that performs a much stronger filter and affects more pixels.

    One can see somewhat intuitively why these strengths are chosen. The deblocker exists to get rid of sharp edges caused by the block-based nature of H.264, and so the strength depends on what exists that might cause such sharp edges. The strength calculation is a way to use existing data from the video stream to make better decisions during the deblocking process, improving compression and quality.

    Both the strength calculation and the actual filter (not described here) are very complex if naively implemented. The latter can be SIMD’d with not too much difficulty ; no H.264 decoder can get away with reasonable performance without such a thing. But what about optimizing the strength calculation ? A quick analysis shows that this can be beneficial as well.

    Since we have to check both horizontal and vertical edges, we have to check up to 32 pairs of coefficient counts (for residual), 16 pairs of reference frame indices, and 128 motion vector values (counting x and y as separate values). This is a lot of calculation ; a naive implementation can take 500-1000 clock cycles on a modern CPU. Of course, there’s a lot of shortcuts we can take. Here’s some examples :

    • If the macroblock uses the 8×8 transform, we only need to check 2 edges in each direction instead of 4, because we don’t deblock inside of the 8×8 blocks.
    • If the macroblock is a P-skip, we only have to check the first edge in each direction, since there’s guaranteed to be no motion vector differences, reference frame differences, or residual inside of the macroblock.
    • If the macroblock has no residual at all, we can skip that check.
    • If we know the partition type of the macroblock, we can do motion vector checks only along the edges of the partitions.
    • If the effective quantizer is so low that no deblocking would be performed no matter what, don’t bother calculating the strength.

    But even all of this doesn’t save us from ourselves. We still have to iterate over a ton of edges, checking each one. Stuff like the partition-checking logic greatly complicates the code and adds overhead even as it reduces the number of checks. And in many cases decoupling the checks to add such logic will make it slower : if the checks are coupled, we can avoid doing a motion vector check if there’s residual, since Strength 2 overrides Strength 1.

    But wait. What if we could do this in SIMD, just like the actual loopfilter itself ? Sure, it seems more of a problem for C code than assembly, but there aren’t any obvious things in the way. Many years ago, Loren Merritt (pengvado) wrote the first SIMD implementation that I know of (for ffmpeg’s decoder) ; it is quite fast, so I decided to work on porting the idea to x264 to see if we could eke out a bit more speed here as well.

    Before I go over what I had to do to make this change, let me first describe how deblocking is implemented in x264. Since the filter is a loopfilter, it acts “in loop” and must be done in both the encoder and decoder — hence why x264 has it too, not just decoders. At the end of encoding one row of macroblocks, x264 goes back and deblocks the row, then performs half-pixel interpolation for use in encoding the next frame.

    We do it per-row for reasons of cache coherency : deblocking accesses a lot of pixels and a lot of code that wouldn’t otherwise be used, so it’s more efficient to do it in a single pass as opposed to deblocking each macroblock immediately after encoding. Then half-pixel interpolation can immediately re-use the resulting data.

    Now to the change. First, I modified deblocking to implement a subset of the macroblock_cache_load function : spend an extra bit of effort loading the necessary data into a data structure which is much simpler to address — as an assembly implementation would need (x264_macroblock_cache_load_deblock). Then I massively cleaned up deblocking to move all of the core strength-calculation logic into a single, small function that could be converted to assembly (deblock_strength_c). Finally, I wrote the assembly functions and worked with Loren to optimize them. Here’s the result.

    And the timings for the resulting assembly function on my Core i7, in cycles :

    deblock_strength_c : 309
    deblock_strength_mmx : 79
    deblock_strength_sse2 : 37
    deblock_strength_ssse3 : 33

    Now that is a seriously nice improvement. 33 cycles on average to perform that many comparisons–that’s absurdly low, especially considering the SIMD takes no branchy shortcuts : it always checks every single edge ! I walked over to my performance chart and happily crossed off a box.

    But I had a hunch that I could do better. Remember, as mentioned earlier, we’re reloading all that data back into our data structures in order to address it. This isn’t that slow, but takes enough time to significantly cut down on the gain of the assembly code. And worse, less than a row ago, all this data was in the correct place to be used (when we just finished encoding the macroblock) ! But if we did the deblocking right after encoding each macroblock, the cache issues would make it too slow to be worth it (yes, I tested this). So I went back to other things, a bit annoyed that I couldn’t get the full benefit of the changes.

    Then, yesterday, I was talking with Pascal, a former Xvid dev and current video hacker over at Google, about various possible x264 optimizations. He had seen my deblocking changes and we discussed that a bit as well. Then two lines hit me like a pile of bricks :

    <_skal_> tried computing the strength at least ?
    <_skal_> while it’s fresh

    Why hadn’t I thought of that ? Do the strength calculation immediately after encoding each macroblock, save the result, and then go pick it up later for the main deblocking filter. Then we can use the data right there and then for strength calculation, but we don’t have to do the whole deblock process until later.

    I went and implemented it and, after working my way through a horde of bugs, eventually got a working implementation. A big catch was that of slices : deblocking normally acts between slices even though normal encoding does not, so I had to perform extra munging to get that to work. By midday today I was able to go cross yet another box off on the performance chart. And now it’s committed.

    Sometimes chatting for 10 minutes with another developer is enough to spot the idea that your brain somehow managed to miss for nearly a straight week.

    NB : the performance chart is on a specific test clip at a specific set of settings (super fast settings) relevant to the company I work at, so it isn’t accurate nor complete for, say, default settings.

    Update : Here’s a higher resolution version of the current chart, as requested in the comments.