
Media (1)
-
Rennes Emotion Map 2010-11
19 October 2011
Updated: July 2013
Language: French
Type: Text
Other articles (38)
-
MediaSPIP version 0.1 Beta
16 April 2011
MediaSPIP 0.1 beta is the first version of MediaSPIP declared "usable".
The zip file provided here contains only the sources of MediaSPIP in its standalone version.
To get a working installation, all of the software dependencies must be installed manually on the server.
If you want to use this archive for an installation in "farm" mode, you will also need to make other modifications (...)
-
MediaSPIP 0.1 Beta version
25 April 2011
MediaSPIP 0.1 beta is the first version of MediaSPIP proclaimed as "usable".
The zip file provided here only contains the sources of MediaSPIP in its standalone version.
To get a working installation, you must manually install all software dependencies on the server.
If you want to use this archive for an installation in "farm mode", you will also need to proceed with other manual (...)
-
Updating from version 0.1 to 0.2
24 June 2013
Explanation of the notable changes involved in moving from MediaSPIP version 0.1 to version 0.3. What's new?
Software dependencies: use of the latest versions of FFmpeg (>= v1.2.1); installation of the dependencies for Smush; installation of MediaInfo and FFprobe for metadata retrieval; ffmpeg2theora is no longer used; flvtool2 is no longer installed, in favor of flvtool++; ffmpeg-php, which is no longer maintained, is no longer installed (...)
On other sites (2697)
-
CD-R Read Speed Experiments
21 May 2011, by Multimedia Mike — Science Projects, Sega Dreamcast
I want to know how fast I can really read data from a CD-R. Pursuant to my previous musings on this subject, I was informed that it is inadequate to profile reading just any file from a CD-R, since data might be read faster or slower depending on whether the data is closer to the inside or the outside of the disc.
Conclusion / Executive Summary
It is 100% true that reading data from the outside of a CD-R is faster than reading data from the inside. Read on if you care to know the details of how I arrived at this conclusion, and to find out just how much speed advantage there is to reading from the outside rather than the inside.
Science Project Outline
- Create some sample CD-Rs with various properties
- Get a variety of optical drives
- Write a custom program that profiles the read speed
Creating The Test Media
It’s my understanding that not all CD-Rs are created equal. Fortunately, I have 3 spindles of media handy: some plain-looking Memorex discs, some rather flamboyant Maxell discs, and some 80mm TDK discs.
My approach for burning is to create a single file to be burned into a standard ISO-9660 filesystem. The size of the file will be the advertised length of the CD-R minus 1 megabyte for overhead: 699 MB for the 120mm discs, 209 MB for the 80mm disc. The file will contain a repeating sequence of 0..0xFF bytes.
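The post does not show how the test file was generated, but it is only a few lines of C. A minimal sketch, assuming a 120mm disc; the output filename and exact byte target are my choices for illustration:

/* makebigfile.c: write a file consisting of the repeating byte
 * sequence 0x00..0xFF, sized to nearly fill a 120mm CD-R. */
#include <stdio.h>

#define CHUNK 256
#define TARGET_BYTES (699L * 1024 * 1024)  /* 699 MB, per the text above */

int main()
{
    unsigned char chunk[CHUNK];
    long written = 0;
    int i;
    FILE *out = fopen("bigfile", "wb");

    if (!out)
        return 1;
    for (i = 0; i < CHUNK; i++)
        chunk[i] = i;                 /* 0x00 through 0xFF */
    while (written < TARGET_BYTES)
    {
        fwrite(chunk, 1, CHUNK, out);
        written += CHUNK;
    }
    fclose(out);

    return 0;
}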
Profiling
I don’t want to leave this to the vagaries of any filesystem handling layer, so I will conduct this experiment at the sector level. Profiling program outline:
- Read the CD-ROM TOC and get the number of sectors that comprise the data track
- Profile reading the first 20 MB of sectors
- Profile reading 20 MB of sectors in the middle of the track
- Profile reading the last 20 MB of sectors
Unfortunately, I couldn’t figure out how to do raw sector reading on modern Linux incarnations (which is annoying, since I remember it being pretty straightforward years ago). So I left it to the filesystem after all. New algorithm:
- Open the single, large file on the CD-R and query the file length
- Profile reading the first 20 MB of data, 512 kbytes at a time
- Profile reading 20 MB of sectors in the middle of the track (starting from filesize / 2 - 10 MB), 512 kbytes at a time
- Profile reading the last 20 MB of sectors (starting from filesize - 20MB), 512 kbytes at a time
Empirical Data
I tested the program in Linux using an LG Slim external multi-drive (seen at the top of the pile in this post) and one of my Sega Dreamcast units. I gathered the median value of 3 runs for each area (inner, middle, and outer). I also flushed the buffer cache in between Linux runs (as root: 'sync; echo 3 > /proc/sys/vm/drop_caches').
LG Slim external multi-drive (reading from inner, middle, and outer areas in kbytes/sec):
- TDK-80mm : 721, 897, 1048
- Memorex-120mm : 1601, 2805, 3623
- Maxell-120mm : 1660, 2806, 3624
So the 120mm discs can range from about 10.5X all the way up to a full 24X on this drive. For whatever reason, the 80mm disc fares a bit worse — even at the inner track — with a range of 4.8X - 7X.
Sega Dreamcast (reading from inner, middle, and outer areas in kbytes/sec) :
- TDK-80mm : 502, 632, 749
- Memorex-120mm : 499, 889, 1143
- Maxell-120mm : 500, 890, 1156
It’s interesting that the 80mm disc performed comparably to the 120mm discs in the Dreamcast, in contrast to the LG Slim drive. Also, the results are consistent with my previous profiling experiments, which largely only touched the inner area. The read speeds range from 3.3X - 7.7X. The middle of a 120mm disc reads at about 6X.
Implications
A few thoughts regarding these results:
- Since the very definition of 1X is the minimum speed necessary to stream data from an audio CD, presumably original 1X CD-ROM drives would have needed to be capable of reading 1X from the inner area. I wonder what the max read speed at the outer edges was? It’s unlikely I would be able to get a 1X drive working easily in this day and age, since the earliest CD-ROM drives required custom controllers.
- I think 24X is the max rated read speed for CD-Rs, at least for this drive. This implies that the marketing literature only cites the best possible numbers. I guess this is no surprise, similar to how monitors and TVs have always been measured by their diagonal dimension.
- Given this data, how do you engineer an ISO-9660 filesystem image so that the timing-sensitive multimedia files live on the outermost track ? In the Dreamcast case, if you can guarantee your FMV files will live somewhere between the middle and the end of the disc, you should be able to count on a bitrate of at least 900 kbytes/sec.
Source Code
Here is the program I wrote for profiling. Note that the filename is hardcoded (#define FILENAME). Compiling for Linux is a simple 'gcc -Wall profile-cdr.c -o profile-cdr'. Compiling for Dreamcast is performed in the standard KallistiOS manner (people skilled in the art already know what they need to know); the only variation is to compile with the '-D_arch_dreamcast' flag, which the default KOS environment adds anyway.
C:
#ifdef _arch_dreamcast
    #include <kos.h>

    /* map I/O functions to their KOS equivalents */
    #define open  fs_open
    #define lseek fs_seek
    #define read  fs_read
    #define close fs_close

    #define FILENAME "/cd/bigfile"
#else
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <sys/time.h>
    #include <fcntl.h>
    #include <unistd.h>

    #define FILENAME "/media/Full disc/bigfile"
#endif

/* Get a current absolute millisecond count; it doesn't have to be in
 * reference to anything special. */
unsigned int get_current_milliseconds()
{
#ifdef _arch_dreamcast
    return timer_ms_gettime64();
#else
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec * 1000 + tv.tv_usec / 1000;
#endif
}

#define READ_SIZE (20 * 1024 * 1024)
#define READ_BUFFER_SIZE (512 * 1024)

int main()
{
    int i, j;
    int fd;
    char read_buffer[READ_BUFFER_SIZE];
    off_t filesize;
    unsigned int start_time, end_time;

    fd = open(FILENAME, O_RDONLY);
    if (fd == -1)
    {
        return 1;
    }
    filesize = lseek(fd, 0, SEEK_END);

    for (i = 0; i < 3; i++)
    {
        /* position at the inner, middle, or outer area of the data */
        if (i == 0)
        {
            lseek(fd, 0, SEEK_SET);
        }
        else if (i == 1)
        {
            lseek(fd, (filesize / 2) - (READ_SIZE / 2), SEEK_SET);
        }
        else
        {
            lseek(fd, filesize - READ_SIZE, SEEK_SET);
        }

        /* read 20 MB; 40 chunks of 1/2 MB */
        start_time = get_current_milliseconds();
        for (j = 0; j < (READ_SIZE / READ_BUFFER_SIZE); j++)
            if (read(fd, read_buffer, READ_BUFFER_SIZE) != READ_BUFFER_SIZE)
            {
                break;
            }
        end_time = get_current_milliseconds();

        /* report the elapsed time and the effective read rate in kbytes/sec */
        printf("%u - %u = %u ms => %u kbytes/sec\n",
               end_time, start_time, end_time - start_time,
               READ_SIZE / (end_time - start_time));
    }

    close(fd);

    return 0;
}
-
Creating A Lossless SMC Encoder
26 April 2011, by Multimedia Mike — General
Look, I can’t explain how or why I come up with this stuff. For some reason, I thought it would be interesting to write a new encoder for the Apple SMC video codec. I can’t even remember why. I just sat down the other day, started writing, and now I have a lossless SMC encoder that I’m not sure what to do with. Maybe this is to be my new thing: writing encoders for marginal multimedia formats.
Introduction
SMC is a vector quantizer (a lossy method), but I decided to attack it from the angle of lossless encoding. Also known as the Apple Graphics Codec, SMC operates on 4x4 blocks in an 8-bit paletted colorspace. Each 4x4 block can be encoded with 1, 2, 4, 8, or 16 colors. Blocks can also be skipped (copied from the previous frame) or copied from blocks rendered immediately prior within the same frame.
Step 1: Validating Infrastructure
The goal of this step is to encode the most braindead SMC frame possible and see if FFmpeg/libav’s QuickTime muxer can create a valid file. I think the simplest frame would be one in which each vector is encoded with the single-color mode, starting with color 0 and incrementing through the palette.
Status: Successful. The only ’trick’ was to set avctx->bits_per_coded_sample to 8. (For fun, this can also be set to 40 (8 | 0x20) to specify a grayscale palette.)
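That amounts to a single assignment in the encoder's init function. A minimal sketch, assuming an FFmpeg-style encoder; the function name and body here are illustrative, not the actual patch:

#include <libavcodec/avcodec.h>

/* illustrative init function; the bits_per_coded_sample assignment is
 * the only "trick" described above */
int smc_encode_init_sketch(AVCodecContext *avctx)
{
    /* 8 = paletted output; 40 (8 | 0x20) would flag a grayscale palette */
    avctx->bits_per_coded_sample = 8;
    return 0;
}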
Step 2: Preprocessing
The video frames will arrive at the encoder as 32-bit RGB. These will need to be converted to a paletted colorspace before encoding. I don’t want to use FFmpeg’s default dithering approach, as this will result in a substantial loss of quality as described in this post. I would rather maintain a palette built from observed colors throughout successive frames. If the total number of unique observed colors ever exceeds 256, error out.
That’s what I would like to do. However, I noticed that FFmpeg/libav’s QuickTime muxer has never taken into account the possibility of encoding palettes. The path of least resistance in this case is to dither the input to match QuickTime’s default 8-bit palette (if a paletted QuickTime file does not specify a palette, a default 1-, 2-, 4-, or 8-bit palette is selected).
Status: Successful, if slow. I definitely need to optimize this step later.
Step 3: Most Naive Encoding
The most basic encoding is to "encode" each block as a 16-color block. This will actually result in a slightly larger frame size than a raw encoding, since each 4x4 block will be prepended by a byte opcode (0xE0 in this case) to indicate the encoding mode. This should demonstrate that the encoder is functioning at the most basic level.
Status: Successful. Try not to laugh too hard at Big Buck Bunny dithered to an 8-bit palette.
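A sketch of what this naive step amounts to, assuming raster order for the 16 palette indices within each 4x4 block (the ordering is my assumption, not a detail from the post):

#include <stdint.h>

/* Emit one 4x4 block in SMC's 16-color mode: a 0xE0 opcode byte
 * followed by the block's 16 palette indices. Returns bytes written. */
int emit_16color_block(uint8_t *out, const uint8_t block[16])
{
    int i;

    out[0] = 0xE0;              /* 16-color mode, one block */
    for (i = 0; i < 16; i++)
        out[1 + i] = block[i];  /* one palette index per pixel */
    return 17;
}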
Step 4: Better Representation
It seems to me that encoding this format (losslessly) will entail performing vector operations on lots of 16-element (4x4-pixel) vectors. These could be done on the frame as-is, but it strikes me as more efficient and perhaps less error prone to rearrange the input images into a vector of vectors (or array of arrays if you prefer):

0 1 2 3  w ...
4 5 6 7  x ...
8 9 A B  y ...
C D E F  z ...

0: [0 1 2 3 4 5 6 7 8 9 A B C D E F]
1: [...]

Status: Successful.
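A minimal sketch of that rearrangement, assuming the source frame is 8-bit paletted, row-major, with a given stride; the names and signature are mine:

#include <stdint.h>

/* Copy one 4x4 block of an 8-bit paletted image into a flat 16-element
 * vector, in the 0..F order shown above. block_x and block_y are in
 * block (4-pixel) units. */
void block_to_vector(const uint8_t *frame, int stride,
                     int block_x, int block_y, uint8_t vector[16])
{
    int row, col;
    const uint8_t *src = frame + (block_y * 4) * stride + (block_x * 4);

    for (row = 0; row < 4; row++)
        for (col = 0; col < 4; col++)
            vector[row * 4 + col] = src[row * stride + col];
}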
Step 5: Add Interframe Skip Codes
Time to add a bit of brainpower to the proceedings: on non-keyframes, compare the current vector to the vector at the same position in the previous frame.
Test this by encoding a pair of identical frames. Ideally, all codes should be skip codes.
Status: Successful, though my vector matching function could probably be improved.
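The comparison itself is just a 16-byte equality test. A sketch, assuming the previous frame is kept around in the same vector-of-vectors form; the function name is mine:

#include <stdint.h>
#include <string.h>

/* Return 1 if the block at index i is unchanged from the previous frame
 * and can therefore be emitted as a skip code on a non-keyframe. */
int block_is_skippable(const uint8_t (*cur_vectors)[16],
                       const uint8_t (*prev_vectors)[16], int i)
{
    return memcmp(cur_vectors[i], prev_vectors[i], 16) == 0;
}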
Step 6: Analyze Blocks For Optimal Color Coding
This is where things get potentially interesting, algorithmically. At least, I need to figure out (or look up) an algorithm to count the unique elements in a vector.
Naive algorithm (i.e., the first thing I can think of), sketched in C after the list:
- initialize a count variable to 0
- initialize an array of 256 flags to false
- for each 8-bit element in vector :
- if the flag at array[element] is false, set it to true and increment count
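A direct C translation of that naive count; count_distinct() is the name the post uses for this function, but this body is mine:

#include <stdint.h>

/* Count the number of distinct 8-bit values in a 16-element vector. */
int count_distinct(const uint8_t vector[16])
{
    int count = 0;
    int i;
    uint8_t seen[256] = { 0 };   /* flags, initialized to false */

    for (i = 0; i < 16; i++)
    {
        if (!seen[vector[i]])
        {
            seen[vector[i]] = 1;
            count++;
        }
    }
    return count;
}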
Status: Successful. Here is the distribution for the 640x360 Big Buck Bunny title:
1194 4636 4113 2140 1138 568 325 154 80 36 9 5 2 0 0 0
Or, in pretty graph form, demonstrating that vectors with few distinct elements dominate.
Step 7: Encode Monochrome Blocks
At this point, the structure is starting to come together pretty well. This phase involves encoding a 0x60 opcode and a palette index when the count_distinct() function returns 1.
Status: Absolutely no problem.
Step 8: Encode 2-, 4-, and 8-color Modes
This step is a little more involved. This is where SMC’s 2-, 4-, and 8-color circular palette caches come into play. E.g., when the first 2-color block is encoded, the pair of colors it uses will be inserted into entry 0 of the 2-color cache. During the next 2-color block encoding, if the block uses a pair of colors that already occurs in the cache, the encoding can reference that cache entry. Otherwise, it adds the pair to the next available cache entry, looping back around to 0 as necessary.
I think I should modify the count_distinct() function to also return a 16-byte array that contains a sorted list of the palette indices used in the vector. The caches will contain 256 entries each: 16-bit ints for the color pairs, 32-bit ints for the quads, and 64-bit ints for the octets. This will allow a slightly faster linear cache search.
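A sketch of the 2-color cache bookkeeping described above, packing each color pair into a 16-bit int for the linear search; the exact packing and the insertion policy are my reading of the description, not code from the post:

#include <stdint.h>

#define PAIR_CACHE_SIZE 256

static uint16_t pair_cache[PAIR_CACHE_SIZE];  /* packed color pairs */
static int pair_cache_used = 0;               /* entries filled so far */
static int pair_cache_next = 0;               /* next slot to fill; wraps to 0 */

/* Return the cache index for a pair of palette indices, inserting the
 * pair into the circular cache if it is not already present. *was_hit
 * tells the caller whether to emit a cache reference or literal colors. */
int pair_cache_lookup(uint8_t color0, uint8_t color1, int *was_hit)
{
    uint16_t packed = (uint16_t)((color0 << 8) | color1);
    int i;

    for (i = 0; i < pair_cache_used; i++)
    {
        if (pair_cache[i] == packed)
        {
            *was_hit = 1;
            return i;
        }
    }

    /* miss: insert at the next slot, looping back around to 0 as necessary */
    i = pair_cache_next;
    pair_cache[i] = packed;
    pair_cache_next = (pair_cache_next + 1) % PAIR_CACHE_SIZE;
    if (pair_cache_used < PAIR_CACHE_SIZE)
        pair_cache_used++;
    *was_hit = 0;
    return i;
}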
Status: The 2-color encoding wasn’t too much trouble and I was able to adapt it to the 4-color mode pretty quickly afterward. I’m still having trouble with the insane 8-color coding mode, though. So that’s commented out for the time being.
Step 9: Run Encoding and Putting It All Together
For each frame, convert the input pixels to a paletted format via one method or another (match to the default QuickTime palette for the first pass). Then, preprocess each vector to determine the minimum number of elements that can be used to represent it, storing the sorted list of distinct colors in a separate array. The number of elements can be 0 (only for interframes, indicating a skip block), 1, 2, 4, 8, or 16. Also during this phase, for each vector after the first, test whether the vector is the same as the previous vector. If it is, denote this fact in the preprocessed encoding (set the high bit of the element count number).
Finally, pack it into the bytestream. Iterate through the element count array and search for the longest runs of elements that are encoded with the same mode (up to 256 for skip modes, up to 16 for other modes). If the high bit of an element count is set, that indicates that a copy mode can be encoded. Look for the longest run of element counts with the high bit set and encode a copy mode.
Status: In progress. Will finish this as motivation strikes.
-
An introduction to reverse engineering
22 January 2011
(This blog is still in hibernation, but I needed somewhere to post this.)
Reverse engineering is one of those wonderful topics, covering everything from simple "guess how this program works" problem solving, to poking at silicon with scanning electron microscopes. I’m always hugely fascinated by articles that walk through the steps involved in these types of activities, so I thought I’d contribute one back to the world.
In this case, I’m going to be looking at the export bundle format created by the Tandberg Content Server, a device for recording video conferences. The bundle is intended for moving recordings between Tandberg devices, but it’s also the easiest way to get all of the related assets for a recorded conference. Unfortunately, there’s no parser available to take the bundle files (.tcb) and output the component pieces. Well, that just won’t do.
For this type of reverse engineering, I basically want to learn enough about the TCB format to be able to parse out the individual files within it. The only tools I’ll need in this process are a hex editor, a notepad, and a way to convert between hex and decimal (the OS X calculator will do fine if you’re not the type to do it in your head).
Step 1: Basic Research
After Googling around to see if this was a solved issue, I decided to dive into the format. I brought a sample bundle into my trusty hex editor (in this case Hex Fiend).
A few things are immediately obvious. First, we see that the first four bytes are the letters TCSB. Another quick visit to Google confirms this header type isn’t found elsewhere, and there’s essentially no discussion of it. Going a few bytes further, we see "contents.xml." And a few bytes after that, we see what looks like plaintext XML. This is a pretty good clue that the TCB file consists of a series of embedded files, each preceded by some header data including its filename. Let’s scan a bit further and see if we can confirm that.
In this segment, we see the end of the XML, and something that could be another filename - "dbtransfer" - followed by what looks like gibberish. That doesn’t help too much. Let’s keep looking.
Great - a .jpg! Looking a bit further, we see the letters "JFIF," which is recognizable as part of a JPEG header. If you weren’t already familiar with that, a quick Google for "jpg hex header" would clear up any confusion. So, we’ve got the basics of the file format down, but we’ll need a little bit more information if we’re going to write a parser.
Step 2: Finding the pattern
We can make an educated guess that a file like this has to provide a few hints to a decoder. We would expect either a table of contents describing where in the bundle each individual file is located, some sort of stop bit marking the boundary between files, byte offsets describing the locations of files, or a listing of file lengths.
There isn’t any sign of a table of contents. Let’s start looking for a stop bit, as that would make writing our parser really easy. What I’m going to do is pull out all of the data between two prospective files, and I want two sets to compare.
I’ve placed asterisks to flag the bytes corresponding to the filenames, since those are known.
1E D1 70 4C 25 06 36 4D 42 E9 65 6A 9F 5D 88 38 0A 00 *64 62 74 72 61 6E 73 66 65 72* 42 06 ED 48 0B 50 0A C4 14 D6 63 42 F2 BF E3 9D 20 29 00 00 00 00 00 00 DE E5 FD
01 0C 00 *63 6F 6E 74 65 6E 74 73 2E 78 6D 6C* 9E 0E FE D3 C9 3A 3A 85 F4 E4 22 FE D0 21 DC D7 53 03 00 00 00 00 00 00
The first line corresponds to the "dbtransfer" entry, the second to the "contents.xml" entry. Let’s trim the first entry to match the second.
38 0A 00 *64 62 74 72 61 6E 73 66 65 72* 42 06 ED 48 0B 50 0A C4 14 D6 63 42 F2 BF E3 9D 20 29 00 00 00 00 00 00
01 0C 00 *63 6F 6E 74 65 6E 74 73 2E 78 6D 6C* 9E 0E FE D3 C9 3A 3A 85 F4 E4 22 FE D0 21 DC D7 53 03 00 00 00 00 00 00
It looks like we’ve got three bytes before the filename, followed by 18 bytes, followed by six bytes of zero. Unfortunately, there’s no obvious pattern of bits which would correspond to a "break" between segments. However, looking at those first three bytes, we see a 0x0A and a 0x0C, two small values in the same place: 10 and 12 in decimal. Interesting - the 12 entry corresponds with "contents.xml" (12 characters) and the 10 entry corresponds with "dbtransfer" (10 characters). Could that byte describe the length of the filename? Let’s look at our much longer JPG entry to be sure.
70 4A 00 *77 77 77 5C 73 6C 69 64 65 73 5C 64 37 30 64 35 34 63 66 2D 32 39 35 62 2D 34 31 34 63 2D 61 38 64 66 2D 32 66 37 32 64 66 33 30 31 31 35 65 5C 74 68 75 6D 62 6E 61 69 6C 73 5C 74 68 75 6D 62 6E 61 69 6C 30 30 2E 6A 70 67*
0x4A is 74 in decimal, corresponding to a 74-character filename. Looks like we’re in business.
At this point, it’s worth an aside to talk about endianness. I happen to know that the Tandberg Content Server runs Windows on Intel, so I went into this with the assumption that the format was little-endian. However, if you’re not sure, it’s always worth looking at words backwards and forwards, just in case.
So we know how to find our filename. Now how do we find our file data ? Let’s go back to our JPEG. We know that JPEGs start with 0xFFD8FFE0, and a quick trip to Google also tells us that they end with 0xFFD9. We can use that to pull a sample jpeg out of our TCB, save it to disk, and confirm that we’re on the right track.
This is one of those great steps in reverse engineering - concrete proof that you’re on the right track. Everything seems to go quicker from this point on.
So, we know we’ve got a JPEG file in a continuous 2177-byte segment. We know that the format uses byte lengths to describe filenames - maybe it also uses byte lengths to describe file lengths. Let’s look for 2177 (0x0881, which will appear in the file as the little-endian byte sequence 81 08) near our JPEG.
Well, that’s a good sign. But it could be coincidental, so at this point we’d want to check a few other files to be sure. In fact, looking further in the file, we find some larger .mp4 files which don’t quite match our guess. It turns out that the file length is a 32-bit value, not a 16-bit value - with our two JPEGs, the higher-order bytes just happened to be zero.
Step 3: Writing a parser
"Bbbbbut...", I hear you say ! "You have all these chunks of data you don’t understand !"
True enough, but all I care about is getting the files out, with the proper names. I don’t care about creation dates, file permissions, or any of the other crud that this file format likely contains.
Let’s look at the first two files in this bundle. A little bit of byte counting shows us the pattern that we can follow. We’ll treat the first file as a special case. After that, we seek 16 bytes from the end of file data to find the filename length (2 bytes), then we’re at the filename, then we seek 16 bytes to find the file length (4 bytes) and seek another 4 bytes to find the start of the file data. Rinse, repeat.
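Expressed in C, the walk just described looks something like this. It is only a sketch of the byte counting above (the 16-byte gaps, the 2-byte little-endian filename length, and the 4-byte little-endian file length), and it starts from a caller-supplied offset because the first entry is a special case not detailed here:

/* tcb_walk.c: walk the per-file records of a Tandberg .tcb bundle,
 * starting from a caller-supplied offset (the first file is treated as
 * a special case in the post and is not handled here). */
#include <stdio.h>
#include <stdlib.h>

/* read an nbytes-wide little-endian integer */
static unsigned long read_le(FILE *f, int nbytes)
{
    unsigned long value = 0;
    int i;
    for (i = 0; i < nbytes; i++)
        value |= (unsigned long)fgetc(f) << (8 * i);
    return value;
}

int main(int argc, char *argv[])
{
    FILE *in;
    unsigned long name_len, file_len;
    char name[1024];

    if (argc != 3)
    {
        fprintf(stderr, "usage: tcb_walk <bundle.tcb> <start offset>\n");
        return 1;
    }
    in = fopen(argv[1], "rb");
    if (!in)
        return 1;
    fseek(in, atol(argv[2]), SEEK_SET);

    while (1)
    {
        fseek(in, 16, SEEK_CUR);               /* 16 unknown bytes */
        name_len = read_le(in, 2);             /* 2-byte filename length */
        if (feof(in) || name_len == 0 || name_len >= sizeof(name))
            break;
        if (fread(name, 1, name_len, in) != name_len)
            break;
        name[name_len] = '\0';                 /* the filename itself */
        fseek(in, 16, SEEK_CUR);               /* 16 more unknown bytes */
        file_len = read_le(in, 4);             /* 4-byte file length */
        fseek(in, 4, SEEK_CUR);                /* 4 bytes to start of data */
        printf("%s: %lu bytes\n", name, file_len);
        fseek(in, (long)file_len, SEEK_CUR);   /* skip (or extract) the data */
    }

    fclose(in);
    return 0;
}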
I wrote a quick parser in PHP, since the eventual use for this information is part of a larger PHP-based application, but any language with basic raw file handling would work just as well.
tcsParser.txt
This was about the simplest possible type of reverse engineering - we had known data in an unknown format, without any compression or encryption. It only gets harder from here...