
Recherche avancée
Médias (91)
-
#3 The Safest Place
16 octobre 2011, par
Mis à jour : Février 2013
Langue : English
Type : Audio
-
#4 Emo Creates
15 octobre 2011, par
Mis à jour : Février 2013
Langue : English
Type : Audio
-
#2 Typewriter Dance
15 octobre 2011, par
Mis à jour : Février 2013
Langue : English
Type : Audio
-
#1 The Wires
11 octobre 2011, par
Mis à jour : Février 2013
Langue : English
Type : Audio
-
ED-ME-5 1-DVD
11 octobre 2011, par
Mis à jour : Octobre 2011
Langue : English
Type : Audio
-
Revolution of Open-source and film making towards open film making
6 octobre 2011, par
Mis à jour : Juillet 2013
Langue : English
Type : Texte
Autres articles (44)
-
Gestion des droits de création et d’édition des objets
8 février 2011, parPar défaut, beaucoup de fonctionnalités sont limitées aux administrateurs mais restent configurables indépendamment pour modifier leur statut minimal d’utilisation notamment : la rédaction de contenus sur le site modifiables dans la gestion des templates de formulaires ; l’ajout de notes aux articles ; l’ajout de légendes et d’annotations sur les images ;
-
Supporting all media types
13 avril 2011, parUnlike most software and media-sharing platforms, MediaSPIP aims to manage as many different media types as possible. The following are just a few examples from an ever-expanding list of supported formats : images : png, gif, jpg, bmp and more audio : MP3, Ogg, Wav and more video : AVI, MP4, OGV, mpg, mov, wmv and more text, code and other data : OpenOffice, Microsoft Office (Word, PowerPoint, Excel), web (html, CSS), LaTeX, Google Earth and (...)
-
Dépôt de média et thèmes par FTP
31 mai 2013, parL’outil MédiaSPIP traite aussi les média transférés par la voie FTP. Si vous préférez déposer par cette voie, récupérez les identifiants d’accès vers votre site MédiaSPIP et utilisez votre client FTP favori.
Vous trouverez dès le départ les dossiers suivants dans votre espace FTP : config/ : dossier de configuration du site IMG/ : dossier des média déjà traités et en ligne sur le site local/ : répertoire cache du site web themes/ : les thèmes ou les feuilles de style personnalisées tmp/ : dossier de travail (...)
Sur d’autres sites (3162)
-
Computer crashing when using python tools in same script
5 février 2023, par SL1997I am attempting to use the speech recognition toolkit VOSK and the speech diarization package Resemblyzer to transcibe audio and then identify the speakers in the audio.


Tools :


https://github.com/alphacep/vosk-api

https://github.com/resemble-ai/Resemblyzer

I can do both things individually but run into issues when trying to do them when running the one python script.


I used the following guide when setting up the diarization system :




Computer specs are as follows :


Intel(R) Core(TM) i3-7100 CPU @ 3.90GHz, 3912 Mhz, 2 Core(s), 4 Logical Processor(s)

32GB RAM

The following is my code, I am not to sure if using threading is appropriate or if I even implemented it correctly, how can I best optimize this code as to achieve the results I am looking for and not crash.


from vosk import Model, KaldiRecognizer
from pydub import AudioSegment
import json
import sys
import os
import subprocess
import datetime
from resemblyzer import preprocess_wav, VoiceEncoder
from pathlib import Path
from resemblyzer.hparams import sampling_rate
from spectralcluster import SpectralClusterer
import threading
import queue
import gc



def recognition(queue, audio, FRAME_RATE):

 model = Model("Vosk_Models/vosk-model-small-en-us-0.15")

 rec = KaldiRecognizer(model, FRAME_RATE)
 rec.SetWords(True)

 rec.AcceptWaveform(audio.raw_data)
 result = rec.Result()

 transcript = json.loads(result)#["text"]

 #return transcript
 queue.put(transcript)



def diarization(queue, audio):

 wav = preprocess_wav(audio)
 encoder = VoiceEncoder("cpu")
 _, cont_embeds, wav_splits = encoder.embed_utterance(wav, return_partials=True, rate=16)
 print(cont_embeds.shape)

 clusterer = SpectralClusterer(
 min_clusters=2,
 max_clusters=100,
 p_percentile=0.90,
 gaussian_blur_sigma=1)

 labels = clusterer.predict(cont_embeds)

 def create_labelling(labels, wav_splits):

 times = [((s.start + s.stop) / 2) / sampling_rate for s in wav_splits]
 labelling = []
 start_time = 0

 for i, time in enumerate(times):
 if i > 0 and labels[i] != labels[i - 1]:
 temp = [str(labels[i - 1]), start_time, time]
 labelling.append(tuple(temp))
 start_time = time
 if i == len(times) - 1:
 temp = [str(labels[i]), start_time, time]
 labelling.append(tuple(temp))

 return labelling

 #return
 labelling = create_labelling(labels, wav_splits)
 queue.put(labelling)



def identify_speaker(queue1, queue2):

 transcript = queue1.get()
 labelling = queue2.get()

 for speaker in labelling:

 speakerID = speaker[0]
 speakerStart = speaker[1]
 speakerEnd = speaker[2]

 result = transcript['result']
 words = [r['word'] for r in result if speakerStart < r['start'] < speakerEnd]
 #return
 print("Speaker",speakerID,":",' '.join(words), "\n")





def main():

 queue1 = queue.Queue()
 queue2 = queue.Queue()

 FRAME_RATE = 16000
 CHANNELS = 1

 podcast = AudioSegment.from_mp3("Podcast_Audio/Film-Release-Clip.mp3")
 podcast = podcast.set_channels(CHANNELS)
 podcast = podcast.set_frame_rate(FRAME_RATE)

 first_thread = threading.Thread(target=recognition, args=(queue1, podcast, FRAME_RATE))
 second_thread = threading.Thread(target=diarization, args=(queue2, podcast))
 third_thread = threading.Thread(target=identify_speaker, args=(queue1, queue2))

 first_thread.start()
 first_thread.join()
 gc.collect()

 second_thread.start()
 second_thread.join()
 gc.collect()

 third_thread.start()
 third_thread.join()
 gc.collect()

 # transcript = recognition(podcast,FRAME_RATE)
 #
 # labelling = diarization(podcast)
 #
 # print(identify_speaker(transcript, labelling))


if __name__ == '__main__':
 main()



When I say crash I mean everything freezes, I have to hold down the power button on the desktop and turn it back on again. No blue/blank screen, just frozen in my IDE looking at my code. Any help in resolving this issue would be greatly appreciated.


-
FFMPEG Steam Stream stop after a few min
24 novembre 2022, par Fabian BoulegueI try set a video stream to STREAM via FFMPEG so far everything works fine but after a few min it just seem to timeout (at last the stream stops but ffmpeg still send everything)



Here is the ffmpeg string



OUTRES="1920x1080"
GOP="60"
GOPMIN="15"
THREADS="4"
CBR="2500k"
QUALITY="ultrafast"
VIDEO="rescue.mp4"
STREAM_KEY="steam_62879128_f5xxxxxxxca5b1c"
SERVER="rtmp://ingest-01-fra1.broadcast.steamcontent.com/app/"

ffmpeg -stream_loop -1 -i $VIDEO -f flv \
-vcodec libx264 -g $GOP -keyint_min $GOPMIN -b:v $CBR -minrate $CBR -maxrate $CBR -pix_fmt yuv420p \
-s $OUTRES -flvflags no_duration_filesize -preset $QUALITY -tune film -acodec aac -threads $THREADS -strict normal \
-bufsize $CBR "$SERVER$STREAM_KEY"




this is the ffmpeg output



ffmpeg version 3.4.6-0ubuntu0.18.04.1 Copyright (c) 2000-2019 the FFmpeg developers
 built with gcc 7 (Ubuntu 7.3.0-16ubuntu3)
 configuration: --prefix=/usr --extra-version=0ubuntu0.18.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --enable-gpl --disable-stripping --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librubberband --enable-librsvg --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-omx --enable-openal --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libopencv --enable-libx264 --enable-shared
 libavutil 55. 78.100 / 55. 78.100
 libavcodec 57.107.100 / 57.107.100
 libavformat 57. 83.100 / 57. 83.100
 libavdevice 57. 10.100 / 57. 10.100
 libavfilter 6.107.100 / 6.107.100
 libavresample 3. 7. 0 / 3. 7. 0
 libswscale 4. 8.100 / 4. 8.100
 libswresample 2. 9.100 / 2. 9.100
 libpostproc 54. 7.100 / 54. 7.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'rescue.mp4':
 Metadata:
 major_brand : isom
 minor_version : 512
 compatible_brands: isomiso2avc1mp41
 encoder : Lavf57.83.100
 Duration: 00:44:48.08, start: 0.000000, bitrate: 2250 kb/s
 Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt709), 1920x1080 [SAR 1:1 DAR 16:9], 2114 kb/s, 30 fps, 30 tbr, 15360 tbn, 60 tbc (default)
 Metadata:
 handler_name : VideoHandler
 Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 127 kb/s (default)
 Metadata:
 handler_name : SoundHandler
Stream mapping:
 Stream #0:0 -> #0:0 (h264 (native) -> h264 (libx264))
 Stream #0:1 -> #0:1 (aac (native) -> aac (native))
Press [q] to stop, [?] for help
[libx264 @ 0x55a3ddc7e520] using SAR=1/1
[libx264 @ 0x55a3ddc7e520] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX
[libx264 @ 0x55a3ddc7e520] profile Constrained Baseline, level 4.0
[libx264 @ 0x55a3ddc7e520] 264 - core 152 r2854 e9a5903 - H.264/MPEG-4 AVC codec - Copyleft 2003-2017 - http://www.videolan.org/x264.html - options: cabac=0 ref=1 deblock=0:-1:-1 analyse=0:0 me=dia subme=0 psy=1 psy_rd=1.00:0.15 mixed_ref=0 me_range=16 chroma_me=1 trellis=0 8x8dct=0 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=0 threads=4 lookahead_threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=0 weightp=0 keyint=60 keyint_min=15 scenecut=0 intra_refresh=0 rc_lookahead=0 rc=cbr mbtree=0 bitrate=2500 ratetol=1.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 vbv_maxrate=2500 vbv_bufsize=2500 nal_hrd=none filler=0 ip_ratio=1.40 aq=0
Output #0, flv, to 'rtmp://ingest-01-fra1.broadcast.steamcontent.com/app/steam_62879128_f5999013aeca5b1c':
 Metadata:
 major_brand : isom
 minor_version : 512
 compatible_brands: isomiso2avc1mp41
 encoder : Lavf57.83.100
 Stream #0:0(und): Video: h264 (libx264) ([7][0][0][0] / 0x0007), yuv420p, 1920x1080 [SAR 1:1 DAR 16:9], q=-1--1, 2500 kb/s, 30 fps, 1k tbn, 30 tbc (default)
 Metadata:
 handler_name : VideoHandler
 encoder : Lavc57.107.100 libx264
 Side data:
 cpb: bitrate max/min/avg: 2500000/0/2500000 buffer size: 2500000 vbv_delay: -1
 Stream #0:1(eng): Audio: aac (LC) ([10][0][0][0] / 0x000A), 44100 Hz, stereo, fltp, 128 kb/s (default)
 Metadata:
 handler_name : SoundHandler
 encoder : Lavc57.107.100 aac
frame= 479 fps= 74 q=-1.0 Lsize= 4669kB time=00:00:16.13 bitrate=2370.2kbits/s speed=2.49x



-
More Cinepak Madness
20 octobre 2011, par Multimedia Mike — Codec TechnologyFellow digital archaeologist Clone2727 found a possible fifth variant of the Cinepak video codec. He asked me if I cared to investigate the sample. I assured him I wouldn’t be able to die a happy multimedia nerd unless I have cataloged all possible Cinepak variants known to exist in the wild. I’m sure there are chemistry nerds out there who are ecstatic when another element is added to the periodic table. Well, that’s me, except with weird multimedia formats.
Background
Cinepak is a video codec that saw widespread use in the early days of digital multimedia. To date, we have cataloged 4 variants of Cinepak in the wild. This distinction is useful when trying to write and maintain an all-in-one decoder. The variants are :- The standard type : Most Cinepak data falls into this category. It decodes to a modified/simplified YUV 4:2:0 planar colorspace and is often seen in AVI and QuickTime/MOV files.
- 8-bit greyscale : Essentially the same as the standard type but with only a Y plane. This has only been identified in AVI files and is distinguished by the file header’s video bits/pixel field being set to 8 instead of 24.
- 8-bit paletted : Again, this is identified by the video header specifying 8 bits/pixel for a Cinepak stream. There is essentially only a Y plane in the data, however, each 8-bit value is a palette index. The palette is transported along with the video header. To date, only one known sample of this format has even been spotted in the wild, and it’s classified as NSFW. It is also a QuickTime/MOV file.
- Sega/FILM CPK data : Sega Saturn games often used CPK files which stored a variant of Cinepak that, while very close the standard Cinepak, couldn’t be decoded with standard decoder components.
So, a flexible Cinepak decoder has to identify if the file’s video header specified 8 bits/pixel. How does it distinguish between greyscale and paletted ? If a file is paletted, a custom palette should have been included with the video header. Thus, if video bits/pixel is 8 and a palette is present, use paletted ; else, use greyscale. Beyond that, the Cinepak decoder has a heuristic to determine how to handle the standard type of data, which might deviate slightly if it comes from a Sega CPK file.
The Fifth Variant ?
Now, regarding this fifth variant– the reason this issue came up is because of that aforementioned heuristic. Basically, a Cinepak chunk is supposed to store the length of the entire chunk in its header. The data from a Sega CPK file plays fast and loose with this chunk size and the discrepancy makes it easy to determine if the data requires special handling. However, a handful of files discovered on a Macintosh game called “The Journeyman Project : Pegasus Prime” have chunk lengths which are sometimes in disagreement with the lengths reported in the containing QuickTime file’s stsz atom. This trips the heuristic and tries to apply the CPK rules against Cinepak data which, aside from the weird chunk length, is perfectly compliant.Here are the first few chunk sizes, as reported by the file header (stsz atom) and the chunk :
size from stsz = 7880 (0x1EC8) ; from header = 3940 (0xF64) size from stsz = 3940 (0xF64) ; from header = 3940 (0xF64) size from stsz = 15792 (0x3DB0) ; from header = 3948 (0xF6C) size from stsz = 11844 (0x2E44) ; from header = 3948 (0xF6C)
Hey, there’s a pattern here. If they don’t match, then the stsz size is an even multiple of the chunk size (2x, 3x, or 4x in my observation). I suppose I could revise the heuristic to state that if the stsz size is 2x, 3x, 4x, or equal to the chunk header, qualify it as compliant Cinepak data.
Of course it feels impure, but software engineering is rarely about programmatic purity. A decade of special cases in the FFmpeg / Libav codebases are a testament to that.
What’s A Variant ?
Suddenly, I find myself contemplating what truly constitutes a variant. Maybe this was just a broken encoder program making these files ? And for that, I assign it the designation of distinct variation, like some sort of special, unique showflake ?Then again, I documented Magic Carpet FLIC as being a distinct variant of the broader FLIC format (which has an enormous number of variants as well).