Newest 'ffmpeg' Questions - Stack Overflow

http://stackoverflow.com/questions/tagged/ffmpeg

Articles published on the site

  • Decoding HEVC with Alpha Channel via NVDEC: Monochrome (4:0:0) Workarounds?

    February 22, by Holy_diver

    I’m working on decoding HEVC streams that include an alpha (transparency) channel using NVIDIA’s NVDEC. The alpha channel is encoded in monochrome (YUV 4:0:0), but NVDEC’s HEVC decoder appears to lack support for monochrome formats. How can I work around this limitation?

    Problem Details:

    HEVC Profile: Stream uses HEVC_Rext (Range Extensions) 
    with a monochrome alpha layer (4:0:0 chroma subsampling).
    
    NVDEC Limitations: The SDK documentation states support 
    for 4:0:0 (8-bit) only for specific codecs like JPEG, not HEVC.
    Attempting to decode returns 
    cudaError_InvalidValue or NVCUDACB_STATUS_INVALID_PARAM.
    
    Alpha Storage: The alpha is either a separate stream 
    or a dual-layer HEVC bitstream (e.g., DUAL_LAYER_DEPTH_SEPARATE).
    
  • How to fix fps (stream? container? - wrong value) in mov file - using ffmpeg [closed]

    February 22, by rgr
    Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'test.mov':
      Metadata:
        major_brand     : qt
        minor_version   : 512
        compatible_brands: qt
        encoder         : Lavf61.9.106
      Duration: 00:02:29.98, start: 0.000000, bitrate: 2622795 kb/s
      Stream #0:0[0x1]: Video: prores (HQ) (apch / 0x68637061), yuv422p10le(progressive), 3840x2160, 2622789 kb/s,
      59.95 fps, 59.94 tbr, 60k tbn (default)
      ---------
          Metadata:
            handler_name    : VideoHandler
            vendor_id       : FFMP
            encoder         : Lavc60.37.100 prores_ks
    

    How can I fix this value (59.95 fps) in a mov file? Because of it, Vegas loads the MOV file incorrectly (it recognizes it as 60fps and not 59.94fps). Frame times (dts) are ok, only this value is a problem.

    (Is this the fps value read from the container or the stream?)

    Using "ffmpeg -t -c copy" I can cut a fragment and then this value is fixed. But how can I fix the whole file without cutting it into fragments?

  • Animated watermark moving on the edges of the movie - ffmpeg

    February 22, by saeid ezzati

    I want to overlay a picture on a video as a watermark. How do I insert an animated watermark that randomly moves from side to side?

    For example, a watermark placed in the upper-left corner moves randomly to the upper-right corner, freezes there for five seconds, and then moves down to the lower-right corner.

    I don't want the watermark to move diagonally, for example from the upper-right corner to the lower-left corner.

    Here is my current command, with which the watermark randomly jumps to a corner every 200 frames without any animation:

    ffmpeg -i "source.mp4" -i "watermark.png" -filter_complex "[1:v]scale=50:-1[a]; [0:v][a]overlay=x='st(0,floor(random(n)*2)+1);if(eq(mod(n-1,200),0), if(eq(ld(0),1),0,  main_w-overlay_w   ) ,x)':y='st(0,floor(random(n)*2)+1);if(eq(mod(n-1,200),0),if(eq(ld(0),1),0,  main_h-overlay_h   ),y)'" -codec:a copy "out.mp4"
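    The motion described above (moves only between adjacent corners, followed by a hold) can be modelled outside ffmpeg before it is turned into an overlay expression. The following is a minimal JavaScript sketch of that model, not an ffmpeg filter; the function names, the clockwise corner numbering, and the `dims` object are assumptions made for illustration.

```javascript
// Corners are numbered clockwise (an assumption of this sketch):
// 0 = top-left, 1 = top-right, 2 = bottom-right, 3 = bottom-left.
function cornerXY(corner, dims) {
  const right = corner === 1 || corner === 2;
  const bottom = corner === 2 || corner === 3;
  return {
    x: right ? dims.mainW - dims.overlayW : 0,
    y: bottom ? dims.mainH - dims.overlayH : 0,
  };
}

// Pick one of the two neighbouring corners, never the diagonal one.
// `rnd` is any function returning a number in [0, 1), e.g. Math.random.
function nextCorner(current, rnd = Math.random) {
  return rnd() < 0.5 ? (current + 1) % 4 : (current + 3) % 4;
}

// Position `t` seconds into one cycle: slide from corner `from` to corner
// `to` over `moveSecs` seconds, then stay at `to` for the rest of the cycle.
function positionAt(t, from, to, moveSecs, dims) {
  const a = cornerXY(from, dims);
  const b = cornerXY(to, dims);
  const k = Math.min(Math.max(t / moveSecs, 0), 1); // progress clamped to [0, 1]
  return { x: a.x + (b.x - a.x) * k, y: a.y + (b.y - a.y) * k };
}
```

    Because neighbouring corners always share either their x or their y coordinate, the interpolated position stays on one edge of the frame for the whole move.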
    
  • FFMPEG says MP3 file is longer than it actually is

    February 21, by badr2001

    I have an MP3 file that is 01:04:09 long. When I use the following command:

    ffmpeg -i TestAudio_123.mp3 -ss 60 -to 120 -c:a libmp3lame -q:a 2 output.mp3
    

    I get this output in the console:

    Input #0, mp3, from 'TestAudio_123.mp3':
      Metadata:
        major_brand     : M4A
        minor_version   : 0
        compatible_brands: M4A isommp42
        voice-memo-uuid : 07BF4A32-29E8-4A28-89D5-B6676F9CB945
        title           : تسجيل جديد ٣٨
        encoder         : Lavf61.1.100
      Duration: 01:07:22.01, start: 0.023021, bitrate: 32 kb/s
      Stream #0:0: Audio: mp3 (mp3float), 48000 Hz, mono, fltp, 32 kb/s
    

    My question is: why is the reported duration longer than the actual input file? Just to show the input file duration:

    [Screenshot showing the input file's duration]

  • Twilio Real-Time Media Streaming to WebSocket Receives Only Noise Instead of Speech

    February 21, by dannym25

    I'm setting up a Twilio Voice call with real-time media streaming to a WebSocket server for speech-to-text processing using Google Cloud Speech-to-Text. The connection is established successfully, and I receive a continuous stream of audio data from Twilio. However, when I play back the received audio, all I hear is a rapid clicking/jackhammering noise instead of the actual speech spoken during the call.

    Setup:

    • Twilio sends inbound audio to my WebSocket server.
    • WebSocket receives and saves the raw mulaw-encoded audio data from Twilio.
    • The audio is processed via Google Speech-to-Text for transcription.
    • When I attempt to play back the audio, it sounds like machine-gun-like noise instead of spoken words.

    1. Confirmed WebSocket Receives Data

    • The WebSocket successfully logs incoming audio chunks from Twilio:

    🔊 Received 379 bytes of audio from Twilio
    🔊 Received 379 bytes of audio from Twilio
    

    • This suggests Twilio is sending audio data, but it's not being interpreted correctly.

    2. Saving and Playing Raw Audio

    • I save the incoming raw mulaw (8000Hz) audio from Twilio to a file:

    fs.appendFileSync('twilio-audio.raw', message);
    

    • Then, I convert it to a .wav file using FFmpeg:

    ffmpeg -f mulaw -ar 8000 -ac 1 -i twilio-audio.raw twilio-audio.wav
    

    Problem: When I play the audio using ffplay, it contains no speech, only rapid clicking sounds.

    3. Ensured Correct Audio Encoding

    • Twilio sends mulaw 8000Hz mono format.
    • Verified that my ffmpeg conversion is using the same settings.
    • Tried different conversion methods:

    ffmpeg -f mulaw -ar 8000 -ac 1 -i twilio-audio.raw -c:a pcm_s16le twilio-audio-fixed.wav
    

    → Same issue.
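    For reference, here is a rough sketch of what a raw mulaw decode assumes about its input: every byte in the file is one G.711 mu-law sample. If the file contains anything else (text, headers, wrappers), each of those bytes is still decoded as a sample, and that would sound like noise. The function names are hypothetical, and this is a plain JavaScript illustration of the standard G.711 expansion, not ffmpeg's actual code.

```javascript
// Decode one G.711 mu-law byte into a signed 16-bit linear PCM sample.
function mulawByteToPcm16(byte) {
  const u = ~byte & 0xff;             // mu-law bytes are stored bit-inverted
  const sign = u & 0x80;              // top bit: set means negative
  const exponent = (u >> 4) & 0x07;   // 3-bit segment number
  const mantissa = u & 0x0f;          // 4-bit step within the segment
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign ? -magnitude : magnitude;
}

// Decode a whole buffer: one input byte becomes one 16-bit sample.
function decodeMulaw(bytes) {
  const pcm = new Int16Array(bytes.length);
  for (let i = 0; i < bytes.length; i++) {
    pcm[i] = mulawByteToPcm16(bytes[i]);
  }
  return pcm;
}
```

    Running the first few hundred bytes of twilio-audio.raw through a hex dump or through this decoder is one way to check whether the file really holds audio samples or something else, such as readable text.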

    4. Checked Google Speech-to-Text Input Format

    • Google STT requires proper encoding configuration:

    const request = {
        config: {
            encoding: 'MULAW',
            sampleRateHertz: 8000,
            languageCode: 'en-US',
        },
        interimResults: false,
    };
    

    • No errors from Google STT, but it never detects speech, likely because the input audio is just noise.

    5. Confirmed That Raw Audio is Not a WAV File

    • Since Twilio sends raw audio, I checked whether I needed to strip the header before processing.
    • Tried manually extracting raw bytes, but the issue persists.

    Current Theory:

    • The WebSocket server might be handling Twilio’s raw audio incorrectly before saving it.
    • There might be an additional header in the Twilio stream that needs to be removed before playback.
    • Twilio’s <Stream> tag expects a WebSocket URL starting with wss:// instead of https://, and switching to wss:// partially fixed some previous connection issues.
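    One way to test the first two theories is to treat each WebSocket message as a JSON envelope rather than as raw audio. The sketch below assumes the message layout described in Twilio's Media Streams documentation: JSON text frames where `event` is "media" and `media.payload` holds base64-encoded mulaw audio. Those field names come from that documentation, not from the code in this post, and `extractMulawPayload` is a hypothetical helper name. The 379-byte message size in the logs would be consistent with a small audio frame wrapped in JSON, but that is only a guess until the messages are inspected.

```javascript
// Return the decoded audio bytes from one WebSocket message, or null when the
// message is not a "media" event (for example "connected", "start", "stop").
function extractMulawPayload(message) {
  let parsed;
  try {
    // `message` may be a Buffer or a string; toString() handles both.
    parsed = JSON.parse(message.toString());
  } catch (err) {
    return null; // not JSON: the caller should inspect the raw bytes instead
  }
  const isMedia =
    parsed !== null &&
    typeof parsed === 'object' &&
    parsed.event === 'media' &&
    parsed.media &&
    typeof parsed.media.payload === 'string';
  if (!isMedia) return null;
  return Buffer.from(parsed.media.payload, 'base64');
}

// Hypothetical usage inside the existing 'message' handler:
//   const audio = extractMulawPayload(message);
//   if (audio) fs.appendFileSync('twilio-audio.raw', audio);
```

    If the helper returns non-null buffers for most messages, the file written this way contains only audio bytes, and the ffmpeg command with -f mulaw -ar 8000 -ac 1 can be retried on it unchanged.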

    Code Snippets:

    Twilio Setup in TwiML Response

    app.post('/voice-response', (req, res) => {
        console.log("📞 Incoming call from Twilio");
    
        const twiml = new twilio.twiml.VoiceResponse();
        twiml.say("Hello! Welcome to the service. How can I help you?");
        
        // Prevent Twilio from hanging up too early
        twiml.pause({ length: 5 });
    
        twiml.connect().stream({
            url: `wss://your-ngrok-url/ws`,
            track: "inbound_track"
        });
    
        console.log("🛠️ Twilio Stream URL:", `wss://your-ngrok-url/ws`);
        
        res.type('text/xml').send(twiml.toString());
    });
    

    WebSocket Server Handling Twilio Audio Stream

    wss.on('connection', (ws) => {
        console.log("🔗 WebSocket Connected! Waiting for audio input...");
    
        ws.on('message', (message) => {
            console.log(`🔊 Received ${message.length} bytes of audio from Twilio`);
    
            // Save raw audio data for debugging
            fs.appendFileSync('twilio-audio.raw', message);
    
            // Warn if the message is suspiciously small (it may carry little or no audio)
            if (message.length < 100) {
                console.warn("⚠️ Warning: Audio data from Twilio is very small. Might be silent.");
            }
        });
    
        ws.on('close', () => {
            console.log("❌ WebSocket Disconnected!");
            
            // Convert Twilio audio for debugging
            exec(`ffmpeg -f mulaw -ar 8000 -ac 1 -i twilio-audio.raw twilio-audio.wav`, (err) => {
                if (err) console.error("❌ FFmpeg Conversion Error:", err);
                else console.log("✅ Twilio Audio Saved as `twilio-audio.wav`");
            });
        });
    
        ws.on('error', (error) => console.error("⚠️ WebSocket Error:", error));
    });
    

    Questions:

    • Why is the audio from Twilio being received as a clicking noise instead of actual speech?
    • Do I need to strip any additional metadata from the raw bytes before saving?
    • Is there a known issue with Twilio’s mulaw format when streaming audio over WebSockets?
    • How can I confirm that Google STT is receiving properly formatted audio?

    Additional Context:

    • Twilio is connected and receiving data (confirmed by logs).
    • WebSocket successfully receives and saves audio, but it only plays noise.
    • Tried multiple ffmpeg conversions, Google STT configurations, and raw data inspection.
    • Still no recognizable speech in the audio output.

    Any help is greatly appreciated! 🙏