Other articles (103)

  • Encoding and processing into web-friendly formats

    13 April 2011

    MediaSPIP automatically converts uploaded files to internet-compatible formats.
    Video files are encoded as MP4, OGV and WebM (supported by HTML5), with MP4 also playable through Flash.
    Audio files are encoded as MP3 and Ogg (supported by HTML5), with MP3 also playable through Flash.
    Where possible, text is analyzed to extract the data needed for search engine indexing, and then exported as a series of image files.
    All uploaded files are stored online in their original format, so you can (...)

  • Supporting all media types

    13 April 2011

    Unlike most software and media-sharing platforms, MediaSPIP aims to manage as many different media types as possible. The following are just a few examples from an ever-expanding list of supported formats: images: png, gif, jpg, bmp and more; audio: MP3, Ogg, Wav and more; video: AVI, MP4, OGV, mpg, mov, wmv and more; text, code and other data: OpenOffice, Microsoft Office (Word, PowerPoint, Excel), web (HTML, CSS), LaTeX, Google Earth and (...)

  • Customizable form

    21 June 2013

    This page presents the fields available in the form used to publish a media item and lists the fields that can be added. Media creation form
    For a media-type document, the fields offered by default are: Text; Enable/disable the forum (the invitation to comment can be disabled for each article); Licence; Add/remove authors; Tags.
    This form can be modified under:
    Administration > Configuration des masques de formulaire. (...)

On other sites (3514)

  • Processing Big Data Problems

    8 January 2011, by Multimedia Mike — Big Data

    I’m becoming more interested in big data problems, i.e., extracting useful information out of absurdly sized sets of input data. I know it’s a growing field and there is a lot to read on the subject. But you know how I roll— just think of a problem to solve and dive right in.

    Here’s how my adventure unfolded.

    The Corpus
    I need to run a command line program on a set of files I have collected. This corpus is on the order of 350,000 files. The files range from 7 bytes to 175 MB. Combined, they occupy around 164 GB of storage space.

    Oh, and said storage space resides on an external, USB 2.0-connected hard drive. Stop laughing.

    A file is named according to the SHA-1 hash of its data. The files are organized in a directory hierarchy according to the first 6 hex digits of the SHA-1 hash (e.g., a file named a4d5832f... is stored in a4/d5/83/a4d5832f...). All of this file hash, path, and size information is stored in an SQLite database.
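
    As a sketch of that layout (the function names are mine; the actual script was not published):

        import hashlib
        import os

        def hash_to_path(root, sha1_hex):
            # The first 6 hex digits become 3 directory levels:
            # a4d5832f... -> root/a4/d5/83/a4d5832f...
            return os.path.join(root, sha1_hex[0:2], sha1_hex[2:4],
                                sha1_hex[4:6], sha1_hex)

        def sha1_of_file(path):
            # Hash in 1 MB chunks so huge files need not fit in RAM.
            h = hashlib.sha1()
            with open(path, 'rb') as f:
                for chunk in iter(lambda: f.read(1 << 20), b''):
                    h.update(chunk)
            return h.hexdigest()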

    First Pass
    I wrote a Python script that read all the filenames from the database, fed them into a pool of worker processes using Python’s multiprocessing module, and wrote some resulting data for each file back to the SQLite database. My Eee PC has a single-core, hyperthreaded Atom which presents 2 CPUs to the system. Thus, 2 worker threads crunched the corpus. It took a while. It took somewhere on the order of 9 or 10 or maybe even 12 hours. It took long enough that I’m in no hurry to re-run the test and get more precise numbers.
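
    The script itself was not published; a minimal sketch of its shape (Python 3 idiom; the files table, result column, and process_file are my placeholders for the real per-file analysis) might be:

        import sqlite3
        from multiprocessing import Pool

        def process_file(path):
            # Stand-in for the real work (running the external command
            # line program against the file).
            with open(path, 'rb') as f:
                data = f.read()
            return path, len(data)

        if __name__ == '__main__':
            db = sqlite3.connect('corpus.db')
            paths = [r[0] for r in db.execute('SELECT path FROM files')]
            with Pool(2) as pool:  # Eee PC Atom: 1 core, 2 hardware threads
                for path, result in pool.imap_unordered(process_file, paths):
                    db.execute('UPDATE files SET result = ? WHERE path = ?',
                               (result, path))
            db.commit()
            db.close()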

    At least I extracted my initial set of data from the corpus. Or did I?

    Think About The Future

    A few days later, I went back to revisit the data only to notice that the SQLite database was corrupted. To add insult to that bit of injury, the script I had written to process the data was also completely corrupted (overwritten with something unrelated to Python code). BTW, this was on a RAID brick configured for redundancy. So that’s strike 3 in my personal dealings with RAID technology.

    I moved the corpus to a different external drive and also verified the files after writing (easy to do since I already had the SHA-1 hashes on record).

    The corrupted script was pretty simple to rewrite, even a little better than before. Then I got to re-run it. However, this run was on a faster machine, a hyperthreaded, quad-core beast that exposes 8 CPUs to the system. The reason I wasn’t too concerned about the poor performance with my Eee PC is that I knew I was going to be able to run it on this monster later.

    So I let the rewritten script rip. The script gave me little updates regarding its progress. As it did so, I ran some rough calculations and realized that it wasn’t predicted to finish much sooner than it would have if I were running it on the Eee PC.

    Limiting Factors
    It had been suggested to me that I/O bandwidth of the external USB drive might be a limiting factor. This is when I started to take that idea very seriously.

    The first idea I had was to move the SQLite database to a different drive. The script records data to the database for every file processed, though it only commits once every 100 UPDATEs, so at least it’s not constantly syncing the disk. I ran before and after tests with a small subset of the corpus and noticed a substantial speedup thanks to this policy change.
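
    A sketch of that batched-commit policy (the files table and result column are again my assumptions):

        import sqlite3

        def record_results(db_path, results, batch=100):
            # Commit once per `batch` UPDATEs instead of once per row, so
            # SQLite isn't syncing the database file for every corpus file.
            db = sqlite3.connect(db_path)
            pending = 0
            for path, result in results:
                db.execute('UPDATE files SET result = ? WHERE path = ?',
                           (result, path))
                pending += 1
                if pending == batch:
                    db.commit()
                    pending = 0
            db.commit()  # flush the final partial batch
            db.close()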

    Then I remembered hearing something about "atime", which is access time. Linux filesystems, by default, record the time that a file was last accessed. You can watch this in action by running 'stat <file> ; cat <file> > /dev/null ; stat <file>' and observing that the "Access" field has been updated to NOW(). This also means that every single file that gets read from the external drive still causes an additional write. To avoid this, I started mounting the external drive with '-o noatime', which instructs Linux not to record "last accessed" time for files.

    On the limited subset test, this more than doubled script performance. I then wondered about mounting the external drive as read-only; this had the same performance as noatime. I thought about using both options together, but verified that access times are not updated on a read-only filesystem anyway, so combining them would gain nothing.

    A Note On Profiling
    Once you start accessing files in Linux, those files start getting cached in RAM. Thus, if you profile, say, reading a gigabyte file from a disk and get 31 MB/sec, and then repeat the same test, you’re likely to see the test complete instantaneously. That’s because the file is already sitting in memory, cached. This is useful in general application use, but not if you’re trying to profile disk performance.

    Thus, in between runs, do (as root) 'sync; echo 3 > /proc/sys/vm/drop_caches' in order to wipe caches.

    Even Better?
    I re-ran the test using these little improvements. Now it takes somewhere around 5 or 6 hours to run.

    I contrived an artificially large file on the external drive and did some 'dd' tests to measure what the drive could really do. The drive consistently measured a bit over 31 MB/sec. If I could read and process the data at 30 MB/sec, the script would be done in about 95 minutes.
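
    If you would rather measure from Python than with 'dd', a rough equivalent (my own helper, not from the post; drop the caches first, as described above, or the number is fiction):

        import time

        def measure_read_speed(path, chunk=1 << 20):
            # One sequential pass over the file; returns MB/sec.
            start = time.time()
            total = 0
            with open(path, 'rb') as f:
                while True:
                    data = f.read(chunk)
                    if not data:
                        break
                    total += len(data)
            return total / (time.time() - start) / (1 << 20)

        if __name__ == '__main__':
            print('%.1f MB/sec' % measure_read_speed('/path/to/large-test-file'))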

    But it’s probably rather unreasonable to expect that kind of transfer rate for lots of smaller files scattered around a filesystem. And it can hardly help to have 8 different processes constantly asking the hard drive for 8 different files at any one time.

    So I wrote a script called stream-corpus.py which simply fetched all the filenames from the database and loaded the contents of each in turn, leaving the data to be garbage-collected at Python’s leisure. This test completed in 174 minutes, just shy of 3 hours. I computed an average read speed of around 17 MB/sec.
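
    stream-corpus.py was not published either, but the description suggests something about this small (my reconstruction):

        import sqlite3

        def stream_corpus(db_path):
            # Read every file once, in sorted order, and let the contents
            # be garbage-collected; the point is purely to measure how
            # fast a single reader can pull data off the drive.
            db = sqlite3.connect(db_path)
            total = 0
            for (path,) in db.execute('SELECT path FROM files ORDER BY path'):
                with open(path, 'rb') as f:
                    total += len(f.read())
            db.close()
            return total

        if __name__ == '__main__':
            print('%d bytes read' % stream_corpus('corpus.db'))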

    Single-Reader Script
    I began to theorize that if I only have one thread reading, performance should improve greatly. To test this hypothesis without having to do a lot of extra work, I cleared the caches and ran stream-corpus.py until 'top' reported that about half of the real memory had been filled with data. Then I let the main processing script loose on the data. As both scripts were using sorted lists of files, they iterated over the filenames in the same order.

    Result: The processing script tore through the files that had obviously been cached thanks to stream-corpus.py, degrading drastically once it had caught up to the streaming script.

    Thus, I was motivated to reorganize the processing script just slightly. Now, there is a reader thread which reads each file and stuffs the name of the file into an IPC queue that one of the worker threads can pick up and process. Note that no file data is exchanged between threads. No need— the operating system is already implicitly holding onto the file data, waiting in case someone asks for it again before something needs that bit of RAM. Technically, this approach accesses each file multiple times. But it makes little practical difference thanks to caching.
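
    In outline (my reconstruction of the architecture just described, not the author's code):

        import multiprocessing as mp
        import sqlite3

        def reader(paths, queue, n_workers):
            # The single reader faults each file into the OS page cache,
            # then hands only the filename to a worker.
            for path in paths:
                with open(path, 'rb') as f:
                    while f.read(1 << 20):
                        pass
                queue.put(path)
            for _ in range(n_workers):
                queue.put(None)  # sentinel: one per worker

        def worker(queue):
            while True:
                path = queue.get()
                if path is None:
                    break
                with open(path, 'rb') as f:
                    data = f.read()  # served from the cache, not the disk
                _ = len(data)        # stand-in for the real processing

        if __name__ == '__main__':
            db = sqlite3.connect('corpus.db')
            paths = [r[0] for r in
                     db.execute('SELECT path FROM files ORDER BY path')]
            db.close()
            queue = mp.Queue()
            n_workers = 7  # 8 CPUs: one reader plus seven workers
            procs = [mp.Process(target=worker, args=(queue,))
                     for _ in range(n_workers)]
            for p in procs:
                p.start()
            reader(paths, queue, n_workers)
            for p in procs:
                p.join()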

    Result: About 183 minutes to process the complete corpus (which works out to a little over 16 MB/sec).

    Why Multiprocess?
    Is it even worthwhile to bother multiprocessing this operation? Monitoring the whole operation via 'top' shows that most instances of the processing script are barely using any CPU time. Indeed, it’s likely that only one of the worker threads is doing any work most of the time, pulling a file out of the IPC queue as soon as the reader thread triggers its load into cache. Right now, the processing is usually pretty quick. There are cases where the processing (external program) might hang (one of the reasons I’m running this project is to find those cases); the multiprocessing architecture at least allows other processes to take over until a hanging process is timed out and killed by its monitoring process.

    Further, the processing is pretty simple now but is likely to get more intensive in future iterations. Plus, there’s the possibility that I might move everything onto a more appropriately-connected storage medium which should help alleviate the bottleneck bravely battled in this post.

    There’s also the theoretical possibility that the reader thread could read too far ahead of the processing threads. Obviously, that’s not too much of an issue in the current setup. But to guard against it, the processes could share a variable that tracks the total number of bytes that have been processed. The reader thread adds filesizes to the count while the processing threads subtract file sizes. The reader thread would delay reading more if the number got above a certain threshold.
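
    A sketch of that byte-count throttle (names are mine, and the 512 MB threshold is arbitrary):

        import multiprocessing as mp
        import os
        import sqlite3
        import time

        MAX_AHEAD = 512 * 1024 * 1024  # reader may run ~512 MB ahead

        def throttled_reader(paths, queue, ahead, n_workers):
            for path in paths:
                # Unsynchronized read is fine for a soft limit like this.
                while ahead.value > MAX_AHEAD:
                    time.sleep(0.1)  # wait for the workers to catch up
                size = os.path.getsize(path)
                with open(path, 'rb') as f:
                    while f.read(1 << 20):  # warm the page cache
                        pass
                with ahead.get_lock():
                    ahead.value += size  # bytes read but not yet processed
                queue.put((path, size))
            for _ in range(n_workers):
                queue.put(None)

        def throttled_worker(queue, ahead):
            while True:
                item = queue.get()
                if item is None:
                    break
                path, size = item
                with open(path, 'rb') as f:
                    f.read()             # cached; stand-in for processing
                with ahead.get_lock():
                    ahead.value -= size

        if __name__ == '__main__':
            db = sqlite3.connect('corpus.db')
            paths = [r[0] for r in
                     db.execute('SELECT path FROM files ORDER BY path')]
            db.close()
            queue = mp.Queue()
            ahead = mp.Value('q', 0)  # shared 64-bit unprocessed-byte count
            workers = [mp.Process(target=throttled_worker,
                                  args=(queue, ahead)) for _ in range(7)]
            for w in workers:
                w.start()
            throttled_reader(paths, queue, ahead, 7)
            for w in workers:
                w.join()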

    Leftovers
    I wondered if the order of accessing the files mattered. I didn’t write them to the drive in any special order. The drive is formatted with Linux ext3. I ran stream-corpus.py on all the filenames sorted by filename (remember the SHA-1 naming convention described above) and also by sorting them randomly.

    Result: It helps immensely for the filenames to be sorted. The sorted variant was a little more than twice as fast as the random variant. Maybe it has to do with accessing all the files in a single directory before moving on to another directory.

    Further, I have long been under the impression that the best read speed you can expect from USB 2.0 is 27 Mbytes/sec (even though 480 Mbit/sec is bandied about in relation to the spec). This comes from profiling I performed with an external enclosure that supports both USB 2.0 and FireWire-400 (and eSATA): FW-400 was able to read at nearly 40 Mbytes/sec the same file that USB 2.0 could only read at 27 Mbytes/sec. Other sources I have read corroborate this number. But this test (using different hardware) achieved over 31 Mbytes/sec.

  • Ode to the Gravis Ultrasound

    1 August 2011, by Multimedia Mike — General

    WARNING: This post is a bunch of nostalgia. Feel free to follow along if you recall the DOS days of the early-mid 1990s.

    I finally let go of my Gravis Ultrasound MAX sound card a little while ago. It felt like the end of an era for me, even though I had scarcely used the card in recent memory.



    The Beginning
    What is the Gravis Ultrasound? Only the finest PC sound card from the classic DOS days. Back in the day (very early 1990s), most consumer PC sound cards were Yamaha OPL FM synthesizers paired with a basic digital to analog converter (DAC). Gravis, a company known for game controllers, dared to break with the dominant paradigm of Sound Blaster clones and create a sound card that had 32 digital channels.

    I heard about the GUS sometime in 1992 through one of the dominant online services at the time, Prodigy. Through the message boards, I learned of a promotion with Electronic Arts in which customers could pre-order a GUS at a certain discount along with 2 EA games from a selected catalog (with progressive discounts when ordering more games from the list). I know I got the DOS version of PowerMonger; I think the other was Night Shift, though that doesn’t seem to be an EA title.

    Anyway, 1992 saw many maddening delays of the GUS hardware. Finally, reports of GUS shipments began to trickle into the Prodigy message forums. Then one day in November, 1992, mine arrived. Into the 286 machine it went and a valiant attempt at software installation was made. A friend and I fought with the software late into the evening, trying to make this thing work reasonably. I remember grabbing a pair of old headphones sitting near the computer that were used for an ancient (even for the time) portable radio. That was the only means of sound reproduction we had available at that moment. And it still sounded incredible.

    After graduating to progressively superior headphones, I would later return to that original pair only to feel my ears were being physically assaulted. Strange, they sounded fine that first night I was trying to make the GUS work. I guess this was my first understanding that the degree to which one is a snobby audiophile is all a matter of hard-earned experience.

    Technology
    The GUS was powered by something called a GF1 which was supposed to use a technology called wavetable synthesis. In the early days, I thought (and I wasn’t alone in this) that this meant that the GF1 chip had a bunch of digitized instrument samples stored in the ASIC. That wasn’t it.

    However, it did feature 32 digital channels at a time when most PC audio cards had 2 (plus that Yamaha FM synthesizer). There was some hemming and hawing about how the original GUS couldn’t drive all 32 channels at a full 44.1 kHz ("CD quality") playback rate. It’s true— if 14 channels were enabled, all could be played at 44.1 kHz. Enabling more channels started progressive degradation and with all 32 channels, each was only playing at around 19 kHz. Still, from my emerging game programmer perspective, that allowed for 8-channel tracker music and 6 channels of sound effects, all at the vaunted CD level of quality.
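
    Those two figures are consistent with a fixed total mixing throughput shared among the active voices (my inference, not an official Gravis spec): 14 × 44,100 = 617,400 samples/sec in total, and 617,400 ÷ 32 ≈ 19,300 samples/sec per channel, which matches the roughly 19 kHz quoted above.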

    Games and Compatibility
    The primary reason to have a discrete sound card was for entertainment applications — ahem, games. GUS support was pretty sketchy out of the gate (ostensibly a major reason for the card’s delay). While many sound cards offered Sound Blaster emulation by basically having the same hardware as Sound Blaster cards, the GUS took a software route towards emulating the SB. To do this required a program called the Sound Blaster Operating System, or SBOS.

    Oh, how awesome it was to hear the program exclaim "SBOS installed!" And how harshly it grated on your nerves after the 200th time hearing it due to so many reboots and fiddling with options to make your games work. Also, I’ve always wondered if there’s something special about sampling an ’s’ sound — does it strain the sampling frequency range? Perhaps the phrase was sampled at too low a bitrate because the ’s’ sounds didn’t come through very clearly, which is something you notice after hundreds of iterations when there are 3 ’s’ sounds in the phrase.

    Fortunately, SBOS became less relevant with the advent of Mega-Em, a separate emulator which intercepted calls to Roland MIDI systems and routed them to the very capable GUS. Roland-supporting games sounded beautiful.

    Eventually, more and more DOS games were released with native Gravis support, sometimes with the help of The Miles Sound System (from our friends at RAD Game Tools — you know, the people behind Smacker and Bink). The library changelog is quite the trip down PC memory lane.

    An important area where the GUS shined brightly was that of demos and music trackers. The emerging PC demo scene embraced the powerful GUS (aided, no doubt, by Gravis’ sponsorship of the community) and the coolest computer art and music of the time natively supported the card.

    Programming
    At this point in my life, I was a budding programmer in high school and was fairly intent on programming video games. So far, I had figured out how to make a few blips using a borrowed Sound Blaster card. I went to great lengths to learn how to program the Gravis Ultrasound.

    Oh you kids today, with your easy access to information at the tips of your fingers thanks to Google and the broader internet. I had to track down whatever information I could find through a combination of Prodigy message boards and local dialup BBSes and FidoNet message bases. Gravis was initially tight-lipped about programming information for its powerful card, as was de rigueur for hardware companies (something that largely persists to this day). But Gravis eventually saw an opportunity to one-up incumbent Creative Labs and released a full SDK for the Ultrasound. I wanted the SDK badly.

    So it was early-mid 1993. Gravis released an SDK. I heard that it was available on their support BBS. Their BBS with a long distance phone number. If memory serves, the SDK was only in the neighborhood of 1.5 Mbytes. That takes a long time to transfer via a 2400 baud modem (roughly 240 bytes/sec, so nearly 2 hours of connect time) at a time when long distance phone charges were still a thing and not insubstantial.

    Luckily, they also put the SDK on something called an ’FTP site’. And about this time, I had the opportunity to get some internet access via the local university.

    Indeed, my entire motivation for initially wanting to get on the internet was to obtain special programming information. Is that nerdy enough for you?

    I see that the GUS SDK is still available via the Gravis FTP site. The file GUSDK222.ZIP is dated 1998 and is less than a megabyte.

    Next Generation: CD Support
    So I had my original GUS by the end of 1992. That was just the first iteration of the Gravis Ultrasound. The next generation was the GUS MAX. When I was ready to get into the CD-ROM era, this was what I wanted in my computer. This is because the GUS MAX had CD-ROM support. This is odd to think about now when all optical drives have SATA interfaces and (P)ATA interfaces before that— what did CD-ROM compatibility mean back then? I wasn’t quite sure. But in early 1995, I headed over to Computer City (R.I.P.) and bought a new GUS MAX and Sony double-speed CD-ROM drive to install in the family’s PC.



    About the "CD-ROM compatibility": It seems that there were numerous competing interfaces in the early days of CD-ROM technology. The GUS MAX simply integrated 3 different CD-ROM controllers onto the audio card. This was superfluous to me since the Sony drive came with an appropriate controller card anyway, though I didn’t figure out that the extra controller card was unnecessary until after I installed it. No matter; computers of the day were rife with expansion ports.



    The 3 different CD-ROM controllers on the GUS MAX

    Explaining The Difference
    It was difficult to explain the difference in quality to those who didn’t really care. Sometime during 1995, I picked up a quasi-promotional CD-ROM called "The Gravis Ultrasound Experience" from Babbage’s computer store (remember when that was a thing?). As most PC software had been distributed on floppy discs up until this point, this CD-ROM was an embarrassment of riches. Tons of game demos, scene demos, tracker music, and all the latest GUS drivers and support software.

    Further, the CD-ROM had a number of red book CD audio tracks that illustrated the difference between Sound Blaster cards and the GUS. I remember loaning this to a tech-savvy coworker who disbelieved how awesome the GUS was. The coworker took it home, listened to it, and wholly agreed that the GUS audio sounded better than the SB audio in the comparison — and was thoroughly confused because she was hearing this audio emanating from her Sound Blaster. It was the difference between real-time and pre-rendered audio, I suppose, but I failed to convey that message. I imagine the same issue comes up even today regarding real-time video rendering vs., e.g., a pre-rendered HD cinematic posted on YouTube.

    Regrettably, I can’t find that CD-ROM anymore which leads me to believe that the coworker never gave it back. Too bad, because it was quite the treasure trove.

    Aftermath
    According to folklore I’ve heard, Gravis couldn’t keep up as the world changed to Windows and failed to deliver decent drivers. Indeed, I remember trying to keep my GUS in service under Windows 95 well into 1998 but eventually relented and installed some kind of more appropriate sound card that was better supported under Windows.

    Of course, audio output capability has been standard issue for any PC for at least 10 years and many people aren’t even aware that discrete sound cards still exist. Real-time audio rendering has become less essential as full musical tracks can be composed and compressed into PCM format and delivered with the near limitless space afforded by optical storage.

    A few years ago, it was easy to pick up old GUS cards on eBay for cheap. As of this writing, there are only a few and they’re pricy (but perhaps not selling). Maybe I was just viewing during the trough of no value a few years ago.

    Nowadays, of course, anyone interested in studying the old GUS or getting a nostalgia fix need only boot up the always-excellent DOSBox emulator which provides remarkable GUS emulation support.

  • Physical Calculus Education

    2 September 2011, by Multimedia Mike — General

    I have never claimed to be especially proficient at math. I did take Advanced Placement calculus in my senior year of high school. While digging through some boxes, I found an old grade report from that high school year. I wondered what motivated me to save it. Maybe it’s because it offered this clue as to why I can’t perform adequately in math class:



    Mystery solved : I did not wear proper P.E. attire to calculus class.