
Media (3)

Keyword: Tags / Valkaama

Other articles (81)

  • Support for all media types

    10 April 2011

    Unlike many other programs and modern document-sharing platforms, MediaSPIP aims to handle as many different document formats as possible, whether they are images (png, gif, jpg, bmp and others), audio (MP3, Ogg, Wav and others), video (AVI, MP4, OGV, mpg, mov, wmv and others), or textual content, code and other data (OpenOffice, Microsoft Office (spreadsheets, presentations), web (HTML, CSS), LaTeX, Google Earth) (...)

  • Supporting all media types

    13 April 2011

    Unlike most software and media-sharing platforms, MediaSPIP aims to manage as many different media types as possible. The following are just a few examples from an ever-expanding list of supported formats: images (png, gif, jpg, bmp and more); audio (MP3, Ogg, Wav and more); video (AVI, MP4, OGV, mpg, mov, wmv and more); text, code and other data (OpenOffice, Microsoft Office (Word, PowerPoint, Excel), web (HTML, CSS), LaTeX, Google Earth and (...)

  • The farm's recurring Cron tasks

    1 December 2010

    Managing the farm relies on running several repetitive tasks, known as Cron tasks, at regular intervals.
    The super Cron (gestion_mutu_super_cron)
    This task, scheduled to run every minute, simply calls the Cron of every instance in the shared-hosting farm on a regular basis. Coupled with a system Cron on the farm's central site, this generates regular visits to the various sites and prevents the tasks of rarely visited sites from being too (...)
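
    A rough sketch of this "ping every instance regularly" pattern in Python follows; the instance URLs and the trigger path are placeholders rather than SPIP's actual API, and the real farm drives this through SPIP's own task scheduler, not a standalone script.

    # Illustrative only: visit every instance of the farm at a regular interval so that
    # rarely visited sites still get their scheduled (Cron) tasks run.
    import time
    import urllib.request

    INSTANCES = ["https://site-a.example.org", "https://site-b.example.org"]  # placeholder list
    CRON_PATH = "/"  # placeholder: any URL whose visit lets the site run its pending tasks

    def ping_all_instances() -> None:
        for base in INSTANCES:
            try:
                urllib.request.urlopen(base + CRON_PATH, timeout=10).close()
            except OSError:
                pass  # an unreachable instance should not block the others

    while True:  # in practice a system cron entry replaces this loop
        ping_all_instances()
        time.sleep(60)  # the super Cron described above runs every minute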

On other sites (5584)

  • How to Stream Audio from Google Cloud Storage in Chunks and Convert Each Chunk to WAV for Whisper Transcription

    14 November 2024, by Douglas Landvik

    I'm working on a project where I need to transcribe audio stored in a Google Cloud Storage bucket using OpenAI's Whisper model. The audio is stored in WebM format with Opus encoding, and due to the file size, I'm streaming the audio in 30-second chunks.

    To convert each chunk to WAV (16 kHz, mono, 16-bit PCM) compatible with Whisper, I'm using FFmpeg. The first chunk converts successfully, but subsequent chunks fail to convert. I suspect this is because each chunk lacks the WebM container's header, which FFmpeg needs to interpret the Opus codec correctly.
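
    A quick way to test this hypothesis (a minimal sketch; the chunking itself is shown further below): a WebM/Matroska file starts with the 4-byte EBML magic 1A 45 DF A3, so only the slice that begins at byte 0 of the blob carries the container header.

    # Illustrative check: only a chunk starting at byte 0 of a WebM file begins with the
    # EBML magic; later byte-range slices look like headerless data to FFmpeg.
    EBML_MAGIC = b"\x1a\x45\xdf\xa3"

    def looks_like_webm_start(chunk: bytes) -> bool:
        return chunk[:4] == EBML_MAGIC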

    Here’s a simplified version of my approach:

    • Download chunk: I download each chunk from GCS as bytes.
    • Convert with FFmpeg: I pass the bytes to FFmpeg to convert each chunk from WebM/Opus to WAV.

# Imports used by this snippet; application-specific names (logger, send_discord_alert,
# notification_manager, detect_audio_format, ConsultationService, ConsultationProcessor,
# Consultation, TemplateService) are defined elsewhere in the project.
import asyncio
import io
import json
import os
import subprocess

from google.cloud import storage
from google.oauth2 import service_account


async def handle_transcription_and_notify(
    consultation_service: ConsultationService,
    consultation_processor: ConsultationProcessor,
    consultation: Consultation,
    language: str,
    notes: str,
    clinic_id: str,
    vet_email: str,
    trace_id: str,
    blob_path: str,
    max_retries: int = 3,
    retry_delay: int = 5,
    max_concurrent_tasks: int = 3
):
    """
    Handles the transcription process by streaming the file from GCS, converting to a compatible format, 
    and notifying the client via WebSocket.
    """
    chunk_duration_sec = 30  # 30 seconds per chunk
    logger.info(f"Starting transcription process for consultation {consultation.consultation_id}",
                extra={'trace_id': trace_id})

    # Initialize GCS client
    service_account_key = os.environ.get('SERVICE_ACCOUNT_KEY_BACKEND')
    if not service_account_key:
        logger.error("Service account key not found in environment variables", extra={'trace_id': trace_id})
        await send_discord_alert(
            f"Service account key not found for consultation {consultation.consultation_id}.\nTrace ID: {trace_id}"
        )
        return

    try:
        service_account_info = json.loads(service_account_key)
        credentials = service_account.Credentials.from_service_account_info(service_account_info)
    except Exception as e:
        logger.error(f"Error loading service account credentials: {str(e)}", extra={'trace_id': trace_id})
        await send_discord_alert(
            f"Error loading service account credentials for consultation {consultation.consultation_id}.\nError: {str(e)}\nTrace ID: {trace_id}"
        )
        return

    storage_client = storage.Client(credentials=credentials)
    bucket_name = 'vetz_consultations'
    blob = storage_client.bucket(bucket_name).get_blob(blob_path)
    bytes_per_second = 16000 * 2  # 32,000 bytes per second
    chunk_size_bytes = 30 * bytes_per_second
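    # NOTE: this sizing assumes raw 16-bit, 16 kHz PCM; the stored blob is compressed
    # WebM/Opus, so chunk_size_bytes will not correspond to 30 s of that file.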
    size = blob.size

    async def stream_blob_in_chunks(blob, chunk_size):
        loop = asyncio.get_running_loop()
        start = 0
        size = blob.size
        while start < size:
            end = min(start + chunk_size - 1, size - 1)
            try:
                logger.info(f"Requesting chunk from {start} to {end}", extra={'trace_id': trace_id})
                chunk = await loop.run_in_executor(
                    None, lambda: blob.download_as_bytes(start=start, end=end)
                )
                if not chunk:
                    break
                logger.info(f"Yielding chunk from {start} to {end}, size: {len(chunk)} bytes",
                            extra={'trace_id': trace_id})
                yield chunk
                start += chunk_size
            except Exception as e:
                logger.error(f"Error downloading chunk from {start} to {end}: {str(e)}", exc_info=True,
                             extra={'trace_id': trace_id})
                raise e

    async def convert_to_wav(chunk_bytes, chunk_idx):
        """
        Convert audio chunk to WAV format compatible with Whisper, ensuring it's 16 kHz, mono, and 16-bit PCM.
        """
        try:
            logger.debug(f"Processing chunk {chunk_idx}: size = {len(chunk_bytes)} bytes")

            detected_format = await detect_audio_format(chunk_bytes)
            logger.info(f"Detected audio format for chunk {chunk_idx}: {detected_format}")
            input_io = io.BytesIO(chunk_bytes)
            output_io = io.BytesIO()

            # ffmpeg command (with debug logging) intended to produce 16 kHz, mono,
            # 16-bit PCM WAV for Whisper
            ffmpeg_command = [
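                # NOTE: the "-f s16le" / "-ar 48000" input flags below declare the piped
                # bytes to be raw signed 16-bit PCM, so the WebM/Opus data is never
                # demuxed or decoded as Opus before resampling.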
                "ffmpeg",
                "-loglevel", "debug",
                "-f", "s16le",            # Treat input as raw PCM data
                "-ar", "48000",           # Set input sample rate
                "-ac", "1",               # Set input to mono
                "-i", "pipe:0",
                "-ar", "16000",           # Set output sample rate to 16 kHz
                "-ac", "1",               # Ensure mono output
                "-sample_fmt", "s16",     # Set output format to 16-bit PCM
                "-f", "wav",              # Output as WAV format
                "pipe:1"
            ]

            process = subprocess.Popen(
                ffmpeg_command,
                stdin=subprocess.PIPE,
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE
            )

            stdout, stderr = process.communicate(input=input_io.read())

            if process.returncode == 0:
                logger.info(f"FFmpeg conversion completed successfully for chunk {chunk_idx}")
                output_io.write(stdout)
                output_io.seek(0)

                # Save the WAV file locally for listening
                output_dir = "converted_chunks"
                os.makedirs(output_dir, exist_ok=True)
                file_path = os.path.join(output_dir, f"chunk_{chunk_idx}.wav")

                with open(file_path, "wb") as f:
                    f.write(stdout)
                logger.info(f"Chunk {chunk_idx} saved to {file_path}")

                return output_io
            else:
                logger.error(f"FFmpeg failed for chunk {chunk_idx} with return code {process.returncode}")
                logger.error(f"Chunk {chunk_idx} - FFmpeg stderr: {stderr.decode()}")
                return None

        except Exception as e:
            logger.error(f"Unexpected error in FFmpeg conversion for chunk {chunk_idx}: {str(e)}")
            return None

    async def transcribe_chunk(idx, chunk_bytes):
        for attempt in range(1, max_retries + 1):
            try:
                logger.info(f"Transcribing chunk {idx + 1} (attempt {attempt}).", extra={'trace_id': trace_id})

                # Convert to WAV format
                wav_io = await convert_to_wav(chunk_bytes, idx)
                if not wav_io:
                    logger.error(f"Failed to convert chunk {idx + 1} to WAV format.")
                    return ""

                wav_io.name = "chunk.wav"
                chunk_transcription = await consultation_processor.transcribe_audio_whisper(wav_io)
                logger.info(f"Chunk {idx + 1} transcribed successfully.", extra={'trace_id': trace_id})
                return chunk_transcription
            except Exception as e:
                logger.error(f"Error transcribing chunk {idx + 1} (attempt {attempt}): {str(e)}", exc_info=True,
                             extra={'trace_id': trace_id})
                if attempt < max_retries:
                    await asyncio.sleep(retry_delay)
                else:
                    await send_discord_alert(
                        f"Max retries reached for chunk {idx + 1} in consultation {consultation.consultation_id}.\nError: {str(e)}\nTrace ID: {trace_id}"
                    )
                    return ""  # Return empty string for failed chunk

    await notification_manager.send_personal_message(
        f"Consultation {consultation.consultation_id} is being transcribed.", vet_email
    )

    try:
        idx = 0
        full_transcription = []
        async for chunk in stream_blob_in_chunks(blob, chunk_size_bytes):
            transcription = await transcribe_chunk(idx, chunk)
            if transcription:
                full_transcription.append(transcription)
            idx += 1

        combined_transcription = " ".join(full_transcription)
        consultation.full_transcript = (consultation.full_transcript or "") + " " + combined_transcription
        consultation_service.save_consultation(clinic_id, vet_email, consultation)
        logger.info(f"Transcription saved for consultation {consultation.consultation_id}.",
                    extra={'trace_id': trace_id})

    except Exception as e:
        logger.error(f"Error during transcription process: {str(e)}", exc_info=True, extra={'trace_id': trace_id})
        await send_discord_alert(
            f"Error during transcription process for consultation {consultation.consultation_id}.\nError: {str(e)}\nTrace ID: {trace_id}"
        )
        return

    await notification_manager.send_personal_message(
        f"Consultation {consultation.consultation_id} has been transcribed.", vet_email
    )

    try:
        template_service = TemplateService()
        medical_record_template = template_service.get_template_by_name(
            consultation.medical_record_template_id).sections

        sections = await consultation_processor.extract_structured_sections(
            transcription=consultation.full_transcript,
            notes=notes,
            language=language,
            template=medical_record_template,
        )
        consultation.sections = sections
        consultation_service.save_consultation(clinic_id, vet_email, consultation)
        logger.info(f"Sections processed for consultation {consultation.consultation_id}.",
                    extra={'trace_id': trace_id})
    except Exception as e:
        logger.error(f"Error processing sections for consultation {consultation.consultation_id}: {str(e)}",
                     exc_info=True, extra={'trace_id': trace_id})
        await send_discord_alert(
            f"Error processing sections for consultation {consultation.consultation_id}.\nError: {str(e)}\nTrace ID: {trace_id}"
        )
        raise e

    await notification_manager.send_personal_message(
        f"Consultation {consultation.consultation_id} is fully processed.", vet_email
    )
    logger.info(f"Successfully processed consultation {consultation.consultation_id}.",
                extra={'trace_id': trace_id})


  • On-premise analytics demand grows as Google Analytics GDPR uncertainties continue

    7 January 2020, by Jake Thornton (Privacy)

    The relationship between Google Analytics and the GDPR is a complicated one. Website owners in German states such as Berlin are now required to ask users for consent before collecting their data. This doesn’t make for the friendliest user experience, and often the website visitor will simply click “no.”

    The problem Google Analytics now presents to website owners in the EU is that the more visitors click “no”, the less accurate your data becomes.

    Why do you need to ask your visitors for consent?

    At this stage it’s simply because Google Analytics collects data for its own purposes. One example is the use of your visitors’ personal data for retargeting across Google’s advertising platforms, such as Google Ads and YouTube.

    Google’s Privacy & Terms states: “when you visit a website that uses advertising services like AdSense, including analytics tools like Google Analytics, or embeds video content from YouTube, your web browser automatically sends certain information to Google. This includes the URL of the page you’re visiting and your IP address. We may also set cookies on your browser or read cookies that are already there. Apps that use Google advertising services also share information with Google, such as the name of the app and a unique identifier for advertising.”

    The rise of hosting web analytics on-premise

    Managing Google Analytics and GDPR can quickly become complicated, so there’s been an increase in website owners switching from cloud-hosted web analytics platforms, like Google Analytics, to more GDPR-compliant alternatives where you host the web analytics software on your own servers. This is called hosting web analytics on-premise.

    Hosting web analytics on your own servers means:

    No third parties are involved

    The visitor data your website collects is stored on your own internal infrastructure. This means no third parties are involved and there’s no risk of personal data being used the way Google Analytics uses it, e.g. sending personal data to its advertising platforms.

    When you sign up with Google Analytics you sign away control of your users’ personal data. With on-premise website analytics, you own your data and are in full control.

    NOTE: Though Google Analytics uses personal data for its own purposes, not all cloud-hosted web analytics platforms do this. As an example, the Matomo Analytics Cloud-hosted solution states that the personal data collected is not used for Matomo’s own purposes and that Matomo has no rights to access or use this personal data.

    You control where in the world your personal data is stored

    Google Analytics servers are based in the USA, Europe and Asia, so where your personal data will end up is uncertain, and with the free version of Google Analytics you don’t have the option to choose which location it goes to.

    Different countries have different laws when it comes to accessing personal data. When you choose to host your web analytics on-premise, you can choose the location of your servers and where the personal data is stored.

    More flexibility

    With self-hosted web analytics platforms like Matomo On-Premise, you can extend the platform to do anything you want without the restrictions that cloud-hosted platforms impose.

    You can:

    • Get full access to the source code of open-source solutions, like Matomo
    • Extend the platform however you want for your business
    • Get access to APIs
    • Have no data limitations or restrictions
    • Get RAW data access
    • Have control over security

    >> Read more about on-premise flexibility for web analytics here

    So what does the future look like for Google Analytics and GDPR?

    It’s difficult to assess this right now, as how exactly the GDPR will be enforced is still quite unclear.

    What is clear, however, is that website owners in Berlin who use Google Analytics are now legally required to ask their visitors for consent to collect personal data. It has been reported that Google Analytics has already received 200,000 complaints in Germany alone, and this trend appears likely to continue across much of the EU.

    When using Google Analytics in the EU you must also ensure your privacy policy is updated so website visitors are aware that data is being collected through Google Analytics for its own purposes.

    Moving to an on-premise web analytics platform

    Matomo Analytics is the #1 open-source web analytics platform in the world and has been rated as an exceptional alternative to Google Analytics. Check the reviews on Capterra.

    Choosing Matomo On-Premise means you can control exactly where your data is stored, you have full flexibility to customise the platform to do what you want, and it’s FREE.

    Matomo’s mission is to give control back to website owners and the team has designed the platform so that moving away from Google Analytics is seamless. Matomo offers most of your favourite Google Analytics features, a leaner interface to navigate, and the option to add free and paid premium features that Google Analytics can’t even offer you.

    And now you can import your historical Google Analytics data directly into Matomo with the Google Analytics Importer plugin.

    And if you can’t host web analytics on your own servers ...

    Hosting web analytics on-premise is not an option for all businesses as you do need the internal infrastructure and technical knowledge to host your own platform.

    If you can’t self-host, then Matomo has a cloud-hosted solution that you can easily set up and operate like Google Analytics, hosted on Matomo’s servers in the EU.

    The GDPR advantages of choosing Matomo Cloud over Google Analytics are:

    • Servers are secure and based in the EU (strict laws forbid outside access)
    • 100% data ownership – we never use data for our own purposes
    • You can export your data anytime and switch to Matomo On-Premise whenever you like
    • User-privacy protection
    • Advanced GDPR Manager and data anonymisation features which GA doesn’t offer

    Interested in learning more?

    If you want to learn more about why users are making the move from Google Analytics to Matomo, check out our Matomo Analytics vs Google Analytics comparison page.

    >> Matomo Analytics vs Google Analytics

  • Anomalie #2910: 404 error after the redirect that follows a forum message on a site using tree-structured URLs ("URLs arborescentes")

    8 January 2013, by Joachim SENE

    Interesting: with "URLs Propres" + .html the forum works, but the URLs are sometimes correct (-Titre-article.html) and sometimes not (?page=article&id_article=17); even so, in both cases the redirect after the forum message is submitted is fine… but this is not satisfactory from the point of (...)