Newest 'ffmpeg' Questions - Stack Overflow
Articles published on the site
-
Decoding HEVC with Alpha Channel via NVDEC : Monochrome (4:0:0) Workarounds ?
22 February, by Holy_diver
I’m working on decoding HEVC streams that include an alpha (transparency) channel using NVIDIA’s NVDEC. The alpha channel is encoded in monochrome (YUV 4:0:0), but NVDEC’s HEVC decoder appears to lack support for monochrome formats. How can I work around this limitation?
Problem Details:
- HEVC Profile: Stream uses HEVC_Rext (Range Extensions) with a monochrome alpha layer (4:0:0 chroma subsampling).
- NVDEC Limitations: The SDK documentation states support for 4:0:0 (8-bit) only for specific codecs like JPEG, not HEVC. Attempting to decode returns cudaError_InvalidValue or NVCUDACB_STATUS_INVALID_PARAM.
- Alpha Storage: The alpha is either a separate stream or a dual-layer HEVC bitstream (e.g., DUAL_LAYER_DEPTH_SEPARATE).
-
How to fix fps (stream ? container ? - wrong value) in mov file - using ffmpeg [closed]
22 February, by rgr
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'test.mov':
  Metadata:
    major_brand : qt
    minor_version : 512
    compatible_brands: qt
    encoder : Lavf61.9.106
  Duration: 00:02:29.98, start: 0.000000, bitrate: 2622795 kb/s
  Stream #0:0[0x1]: Video: prores (HQ) (apch / 0x68637061), yuv422p10le(progressive), 3840x2160, 2622789 kb/s, 59.95 fps, 59.94 tbr, 60k tbn (default)
  ---------
    Metadata:
      handler_name : VideoHandler
      vendor_id : FFMP
      encoder : Lavc60.37.100 prores_ks
How can I fix this value (59.95 fps) in a mov file? Because of it, Vegas loads the MOV file incorrectly (it recognizes it as 60fps and not 59.94fps). Frame times (dts) are ok, only this value is a problem.
(Is this the fps value read from the container or the stream?)
Using "ffmpeg -t -c copy" I can cut a fragment and then this value is fixed. But how can I fix the whole file without cutting it into fragments?
-
Animated watermark moving on the edges of the movie - ffmpeg
22 February, by saeid ezzati
I want to overlay a picture on a video as a watermark. How do I insert an animated watermark that randomly moves from side to side?
For example, a watermark placed in the upper-left corner moves randomly to the upper-right corner and freezes there for five seconds before moving down to the lower-right corner.
I don't want the watermark to move diagonally, for example from the upper-right corner to the lower-left corner.
Here is an example of my code, with which the watermark jumps to a random corner every 200 frames without animation:
ffmpeg -i "source.mp4" -i "watermark.png" -filter_complex "[1:v]scale=50:-1[a]; [0:v][a]overlay=x='st(0,floor(random(n)*2)+1);if(eq(mod(n-1,200),0), if(eq(ld(0),1),0, main_w-overlay_w ) ,x)':y='st(0,floor(random(n)*2)+1);if(eq(mod(n-1,200),0),if(eq(ld(0),1),0, main_h-overlay_h ),y)'" -codec:a copy "out.mp4"
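The "no diagonal movement" constraint described above can be modelled separately from the FFmpeg expression. The sketch below is only an illustration of that logic, not a drop-in overlay filter: the helper name `edgeOnlyPath` and the clockwise corner numbering are assumptions introduced here. It picks each next corner from the two that share an edge with the current one.

```javascript
// Corners in clockwise order, so neighbours in this array share an edge.
const CORNERS = ['top-left', 'top-right', 'bottom-right', 'bottom-left'];

// Hypothetical helper: build a corner sequence with no diagonal moves.
// `rand` is any function returning a number in [0, 1), e.g. Math.random.
function edgeOnlyPath(start, steps, rand = Math.random) {
  const path = [start];
  let current = start;
  for (let i = 0; i < steps; i++) {
    // Step clockwise (+1) or counter-clockwise (+3 mod 4).
    // A step of +2 would be the diagonal, so it is never chosen.
    current = (current + (rand() < 0.5 ? 1 : 3)) % 4;
    path.push(current);
  }
  return path.map((index) => CORNERS[index]);
}
```

Timing (the five-second freeze) and the interpolation between corners are not covered here; a precomputed path like this would still have to be translated into `x` and `y` expressions for the overlay filter.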
-
FFMPEG says MP3 file is longer than it actually is
21 February, by badr2001
I have an MP3 file which is 01:04:09 long. When I use the following command:
ffmpeg -i TestAudio_123.mp3 -ss 60 -to 120 -c:a libmp3lame -q:a 2 output.mp3
I get this output in the console:
Input #0, mp3, from 'TestAudio_123.mp3':
  Metadata:
    major_brand : M4A
    minor_version : 0
    compatible_brands: M4A isommp42
    voice-memo-uuid : 07BF4A32-29E8-4A28-89D5-B6676F9CB945
    title : تسجيل جديد ٣٨
    encoder : Lavf61.1.100
  Duration: 01:07:22.01, start: 0.023021, bitrate: 32 kb/s
  Stream #0:0: Audio: mp3 (mp3float), 48000 Hz, mono, fltp, 32 kb/s
My question is: why is the reported duration longer than the actual duration of the input file? Just to show the input file duration:
-
Twilio Real-Time Media Streaming to WebSocket Receives Only Noise Instead of Speech
21 February, by dannym25
I'm setting up a Twilio Voice call with real-time media streaming to a WebSocket server for speech-to-text processing using Google Cloud Speech-to-Text. The connection is established successfully, and I receive a continuous stream of audio data from Twilio. However, when I play back the received audio, all I hear is a rapid clicking/jackhammering noise instead of the actual speech spoken during the call.
Setup:
- Twilio <Stream> sends inbound audio to my WebSocket server.
- WebSocket receives and saves the raw mulaw-encoded audio data from Twilio.
- The audio is processed via Google Speech-to-Text for transcription.
- When I attempt to play back the audio, it sounds like machine-gun-like noise instead of spoken words.
1. Confirmed WebSocket Receives Data
• The WebSocket successfully logs incoming audio chunks from Twilio:
🔊 Received 379 bytes of audio from Twilio
🔊 Received 379 bytes of audio from Twilio
• This suggests Twilio is sending audio data, but it's not being interpreted correctly.
2. Saving and Playing Raw Audio
• I save the incoming raw mulaw (8000Hz) audio from Twilio to a file:
fs.appendFileSync('twilio-audio.raw', message);
• Then, I convert it to a .wav file using FFmpeg:
ffmpeg -f mulaw -ar 8000 -ac 1 -i twilio-audio.raw twilio-audio.wav
• Problem: When I play the audio using ffplay, it contains no speech, only rapid clicking sounds.
3. Ensured Correct Audio Encoding
• Twilio sends mulaw 8000Hz mono format.
• Verified that my ffmpeg conversion is using the same settings.
• Tried different conversion methods:
ffmpeg -f mulaw -ar 8000 -ac 1 -i twilio-audio.raw -c:a pcm_s16le twilio-audio-fixed.wav
→ Same issue.
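To check the captured bytes independently of FFmpeg, the mulaw expansion can be done by hand. The following is a sketch of the standard G.711 mulaw to 16-bit PCM expansion; the function name `mulawToPcm16` is illustrative and not part of any library used in the post.

```javascript
// Expand one G.711 mulaw byte (0..255) to a signed 16-bit PCM sample.
function mulawToPcm16(byte) {
  const u = ~byte & 0xff;            // mulaw bytes are stored bit-inverted
  const sign = u & 0x80;             // top bit: sign
  const exponent = (u >> 4) & 0x07;  // next three bits: segment number
  const mantissa = u & 0x0f;         // low four bits: step within the segment
  // Rebuild the magnitude, then remove the bias (0x84) added by the encoder.
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign ? -magnitude : magnitude;
}
```

Real speech should decode to samples that vary smoothly around zero, with silence sitting at or near byte `0xFF`. If the file instead holds mostly bytes in the printable ASCII range, the data is probably not bare mulaw audio.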
4. Checked Google Speech-to-Text Input Format
• Google STT requires proper encoding configuration:
const request = {
  config: {
    encoding: 'MULAW',
    sampleRateHertz: 8000,
    languageCode: 'en-US',
  },
  interimResults: false,
};
• No errors from Google STT, but it never detects speech, likely because the input audio is just noise.
5. Confirmed That Raw Audio is Not a WAV File
• Since Twilio sends raw audio, I checked whether I needed to strip the header before processing.
• Tried manually extracting raw bytes, but the issue persists.
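A quick way to make the header check concrete is to look at the first bytes of the capture. The sketch below uses a hypothetical helper, `sniffCapture`, to distinguish a RIFF/WAVE header, JSON text, and anything else. It is a rough diagnostic under those assumptions, not a full format detector.

```javascript
// Hypothetical diagnostic: classify the first bytes of a captured buffer.
function sniffCapture(buf) {
  // WAV files start with the ASCII tag "RIFF".
  if (buf.length >= 4 && buf.toString('latin1', 0, 4) === 'RIFF') {
    return 'wav';
  }
  // Serialized JSON starts with "{" (ignoring leading whitespace).
  const head = buf.toString('latin1', 0, Math.min(buf.length, 64)).trimStart();
  if (head.startsWith('{')) {
    return 'json-text';
  }
  // Anything else may be headerless audio, but that needs further checks.
  return 'unknown-binary';
}
```

For example, `sniffCapture(fs.readFileSync('twilio-audio.raw'))` returning `'json-text'` would suggest that the saved file contains message envelopes and not audio samples.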
Current Theory:
- The WebSocket server might be handling Twilio’s raw audio incorrectly before saving it.
- There might be an additional header in the Twilio stream that needs to be removed before playback.
- Twilio’s <Stream> tag expects a WebSocket connection starting with wss:// instead of https://, and switching to wss:// partially fixed some previous connection issues.
Code Snippets:
Twilio <Stream> Setup in TwiML Response
app.post('/voice-response', (req, res) => {
  console.log("📞 Incoming call from Twilio");
  const twiml = new twilio.twiml.VoiceResponse();
  twiml.say("Hello! Welcome to the service. How can I help you?");
  // Prevent Twilio from hanging up too early
  twiml.pause({ length: 5 });
  twiml.connect().stream({ url: `wss://your-ngrok-url/ws`, track: "inbound_track" });
  console.log("🛠️ Twilio Stream URL:", `wss://your-ngrok-url/ws`);
  res.type('text/xml').send(twiml.toString());
});
WebSocket Server Handling Twilio Audio Stream
wss.on('connection', (ws) => {
  console.log("🔗 WebSocket Connected! Waiting for audio input...");
  ws.on('message', (message) => {
    console.log(`🔊 Received ${message.length} bytes of audio from Twilio`);
    // Save raw audio data for debugging
    fs.appendFileSync('twilio-audio.raw', message);
    // Check if audio is non-empty but contains only noise
    if (message.length < 100) {
      console.warn("⚠️ Warning: Audio data from Twilio is very small. Might be silent.");
    }
  });
  ws.on('close', () => {
    console.log("❌ WebSocket Disconnected!");
    // Convert Twilio audio for debugging
    exec(`ffmpeg -f mulaw -ar 8000 -ac 1 -i twilio-audio.raw twilio-audio.wav`, (err) => {
      if (err) console.error("❌ FFmpeg Conversion Error:", err);
      else console.log("✅ Twilio Audio Saved as `twilio-audio.wav`");
    });
  });
  ws.on('error', (error) => console.error("⚠️ WebSocket Error:", error));
});
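One likely cause, offered as an assumption since the post does not confirm it: Twilio Media Streams deliver each WebSocket message as a JSON text frame, with the mulaw audio base64-encoded in `media.payload` of messages whose `event` is `media`. Appending the whole message to the file would then write JSON text, not audio, which FFmpeg would decode as noise. A minimal sketch of extracting only the audio bytes follows; the helper name `extractMulawPayload` is hypothetical.

```javascript
// Hypothetical helper: return the raw mulaw bytes carried by one Twilio
// Media Streams message, or null for anything that is not audio.
// Assumption: each message is a JSON string shaped like
//   {"event":"media","media":{"payload":"<base64 mulaw>"}}
function extractMulawPayload(message) {
  let parsed;
  try {
    parsed = JSON.parse(message.toString());
  } catch (err) {
    return null; // not JSON, so not a Media Streams envelope
  }
  const isMedia =
    parsed !== null &&
    typeof parsed === 'object' &&
    parsed.event === 'media' &&
    parsed.media &&
    typeof parsed.media.payload === 'string';
  if (!isMedia) {
    return null; // e.g. "connected", "start" or "stop" events
  }
  return Buffer.from(parsed.media.payload, 'base64');
}
```

Inside the `ws.on('message', ...)` handler, this would mean appending the result of `extractMulawPayload(message)` to `twilio-audio.raw` when it is non-null, instead of appending `message` itself. The existing `ffmpeg -f mulaw -ar 8000 -ac 1` conversion could then stay unchanged.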
Questions:
- Why is the audio from Twilio being received as a clicking noise instead of actual speech?
- Do I need to strip any additional metadata from the raw bytes before saving?
- Is there a known issue with Twilio’s mulaw format when streaming audio over WebSockets?
- How can I confirm that Google STT is receiving properly formatted audio?
Additional Context:
- Twilio <Stream> is connected and receiving data (confirmed by logs).
- WebSocket successfully receives and saves audio, but it only plays noise.
- Tried multiple ffmpeg conversions, Google STT configurations, and raw data inspection.
- Still no recognizable speech in the audio output.
Any help is greatly appreciated! 🙏
- Twilio