Other articles (81)

Keeping control of your media in your hands

13 April 2011, by kent1

The vocabulary used on this site, and around MediaSPIP in general, aims to avoid references to Web 2.0 and to the companies that profit from media sharing.
While using MediaSPIP, you are invited to avoid words like "Brand", "Cloud" and "Market".
MediaSPIP is designed to facilitate the sharing of creative media online, while allowing authors to retain complete control of their work.
MediaSPIP aims to be accessible to as many people as possible, and development is based on expanding the (...)
Sounds

15 May 2013, by kent1
Submitting bugs and patches

10 April 2011

Unfortunately, software is never perfect...
If you think you have found a bug, report it in our ticket system, taking care to give us some relevant information: the type of browser, and its exact version, in which you see the anomaly; as precise an explanation as possible of the problem you encountered; if possible, the steps to reproduce the problem; a link to the site/page in question.
If you think you have fixed the bug yourself (...)

On other sites (5905)

Google Speech API + Go - Transcribing Audio Stream of Unknown Length

14 February 2018, by Josh

I have an RTMP stream of a video call and I want to transcribe it. I have created two services in Go and I’m getting results, but they’re not very accurate and a lot of data seems to get lost.

Let me explain.

I have a transcode service that uses ffmpeg to transcode the video to Linear16 audio and places the output bytes onto a PubSub queue for a transcribe service to handle. Obviously there is a limit to the size of a PubSub message, and I want to start transcribing before the video call ends. So I chunk the transcoded data into 3-second clips (not a fixed length; it just seems about right) and put them onto the queue.

The data is transcoded quite simply:

var stdout bytes.Buffer

cmd := exec.Command("ffmpeg", "-i", url, "-f", "s16le", "-acodec", "pcm_s16le", "-ar", "16000", "-ac", "1", "-")
cmd.Stdout = &stdout

if err := cmd.Start(); err != nil {
    log.Fatal(err)
}

ticker := time.NewTicker(3 * time.Second)

for {
    select {
    case <-ticker.C:
        bytesConverted := stdout.Len()
        log.Infof("Converted %d bytes", bytesConverted)

        // Send the data we converted, even if there are no bytes.
        topic.Publish(ctx, &pubsub.Message{
            Data: stdout.Bytes(),
        })

        stdout.Reset()
    }
}

The transcribe service pulls messages from the queue at a rate of one every 3 seconds, so the audio is processed at about the same rate as it’s created. The Speech API limits a stream to 60 seconds, so I stop the old stream and start a new one every 30 seconds; that way we never hit the limit, no matter how long the video call lasts.

This is how I’m transcribing it:

stream := prepareNewStream()
clipLengthTicker := time.NewTicker(30 * time.Second)
chunkLengthTicker := time.NewTicker(3 * time.Second)

cctx, cancel := context.WithCancel(context.TODO())
defer cancel()

err := subscription.Receive(cctx, func(ctx context.Context, msg *pubsub.Message) {
    select {
    case <-clipLengthTicker.C:
        log.Infof("Clip length reached.")
        log.Infof("Closing stream and starting over")

        err := stream.CloseSend()
        if err != nil {
            log.Fatalf("Could not close stream: %v", err)
        }

        go getResult(stream)
        stream = prepareNewStream()

    case <-chunkLengthTicker.C:
        log.Infof("Chunk length reached.")

        bytesConverted := len(msg.Data)
        log.Infof("Received %d bytes", bytesConverted)

        if bytesConverted > 0 {
            if err := stream.Send(&speechpb.StreamingRecognizeRequest{
                StreamingRequest: &speechpb.StreamingRecognizeRequest_AudioContent{
                    AudioContent: msg.Data,
                },
            }); err != nil {
                resp, _ := stream.Recv()
                log.Errorf("Could not send audio: %v", resp.GetError())
            }
        }

        msg.Ack()
    }
})
if err != nil {
    log.Fatalf("Receive failed: %v", err)
}
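Two details of this callback may also matter. The Go Pub/Sub client can run the `Receive` callback for several messages concurrently, so the shared `stream` variable may be read and replaced from different goroutines at once, and messages are not guaranteed to arrive in order by default. Also, when `clipLengthTicker` fires, the message being handled in that call is neither sent to the Speech API nor acknowledged, so its audio is skipped for now and may only be redelivered later, out of order. A minimal sketch of one way to avoid both, assuming the callback only forwards `msg.Data` to a channel: a single goroutine owns the stream and rotates it between chunks. `audioStream`, `pump` and `memStream` are hypothetical, with `audioStream` standing in for the Speech API stream and a chunk count standing in for the rotation rule:

```go
package main

// audioStream is a hypothetical stand-in for the Speech API streaming
// client in the post: Send delivers one chunk of audio, CloseSend ends input.
type audioStream interface {
	Send(chunk []byte) error
	CloseSend() error
}

// pump is the only goroutine that touches the current stream. It sends each
// chunk from the channel in order and, after maxChunks chunks, closes the
// stream and opens a new one before sending the next chunk, so no chunk is
// skipped at a rotation. finish receives each closed stream (the role
// getResult plays in the post) and may start its own goroutine to read results.
func pump(chunks <-chan []byte, maxChunks int, newStream func() audioStream, finish func(audioStream)) error {
	stream := newStream()
	sent := 0
	for chunk := range chunks {
		if sent == maxChunks {
			if err := stream.CloseSend(); err != nil {
				return err
			}
			finish(stream)
			stream = newStream()
			sent = 0
		}
		if err := stream.Send(chunk); err != nil {
			return err
		}
		sent++
	}
	// The channel is closed: close the last stream as well.
	if err := stream.CloseSend(); err != nil {
		return err
	}
	finish(stream)
	return nil
}

// memStream is an in-memory audioStream for trying pump without the
// Speech API; it only records what it receives.
type memStream struct {
	chunks [][]byte
	closed bool
}

func (m *memStream) Send(chunk []byte) error {
	m.chunks = append(m.chunks, chunk)
	return nil
}

func (m *memStream) CloseSend() error {
	m.closed = true
	return nil
}
```

With the real client, the callback would shrink to roughly `chunks <- msg.Data; msg.Ack()`. If strict ordering matters, setting the subscription's `ReceiveSettings.MaxOutstandingMessages` to 1 is one way to stop the client from handling messages in parallel, at some cost in throughput.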

I think the problem is that my 3-second chunks don’t necessarily line up with the starts and ends of phrases or sentences. I suspect the Speech API uses a recurrent neural network trained on full sentences rather than individual words, so starting a clip in the middle of a sentence loses some data: the model can’t make out the first few words up to the next natural end of a phrase. I also lose some data when changing from an old stream to a new one, because some context is lost. I guess overlapping clips might help with this.
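The overlapping-clips idea can be sketched as follows, assuming the transcript text produced from replayed audio is de-duplicated afterwards: keep the last couple of seconds of audio sent on the old stream and send them first on the new one, so the recognizer gets some context from before the cut. `overlapBuffer` is a hypothetical helper; 2 s of 16 kHz mono s16le would be `max: 64000`:

```go
package main

// overlapBuffer keeps the most recent audio sent on a stream so that the
// last few seconds can be replayed at the start of the next stream.
type overlapBuffer struct {
	max  int    // bytes to keep, e.g. 64000 for 2 s of 16 kHz mono s16le
	tail []byte // most recent audio, at most max bytes
}

// add records audio that was just sent and trims to the last max bytes.
func (o *overlapBuffer) add(chunk []byte) {
	o.tail = append(o.tail, chunk...)
	if extra := len(o.tail) - o.max; extra > 0 {
		// Copy so the dropped prefix can be garbage-collected.
		o.tail = append([]byte(nil), o.tail[extra:]...)
	}
}

// replay returns a copy of the retained audio to send first on a new stream.
func (o *overlapBuffer) replay() []byte {
	return append([]byte(nil), o.tail...)
}
```

On rotation, the loop would send `replay()` on the new stream before the next chunk. Keeping `max` and every chunk an even number of bytes ensures the trimmed tail always starts on a whole 16-bit sample. The trade-off is that words near the boundary may appear in both transcripts.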

I have a couple of questions:

1) Does this architecture seem appropriate for my constraints (unknown length of the audio stream, etc.)?

2) What can I do to improve accuracy and minimise lost data ?

(Note: I’ve simplified the examples for readability. Point out anything that doesn’t make sense; I may have been heavy-handed in cutting the examples down.)

How do IP cameras stream video across a home network?

22 January 2018, by Ouroboros
My question is how IP cameras stream data from a home network to the public network. Here’s how I think it could be done:
1. If I were to set up something like this using a Raspberry Pi camera module, I’d probably use port forwarding on my access point/Wi-Fi router. However, this is clearly not a scalable solution, and off-the-shelf IP cameras must be doing something else.
2. One option is to stream the video (using ffmpeg) to a remote server, which can then "re-stream" it. If this is indeed the case, how is it done?
I understand backend architecture very well and have developed fairly complex systems, so I’d like a fairly technical answer.
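Option 2 is, roughly, a relay: the camera opens an outbound connection to a server (outbound connections pass through a home router's NAT without any port forwarding), and the server fans the stream out to viewers. A minimal in-memory sketch of just the fan-out step, assuming packets are opaque byte slices; a real deployment would put a media server and a protocol such as RTMP, RTSP, HLS or WebRTC around it, and `relay` is a hypothetical type:

```go
package main

import "sync"

// relay is a minimal fan-out hub: one publisher (the camera's outbound
// connection) pushes packets in, and every subscribed viewer gets a copy.
type relay struct {
	mu      sync.Mutex
	viewers map[chan []byte]struct{}
}

func newRelay() *relay {
	return &relay{viewers: make(map[chan []byte]struct{})}
}

// subscribe registers a viewer with a buffer of buf packets.
func (r *relay) subscribe(buf int) chan []byte {
	ch := make(chan []byte, buf)
	r.mu.Lock()
	r.viewers[ch] = struct{}{}
	r.mu.Unlock()
	return ch
}

// unsubscribe removes a viewer and closes its channel.
func (r *relay) unsubscribe(ch chan []byte) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.viewers[ch]; ok {
		delete(r.viewers, ch)
		close(ch)
	}
}

// publish forwards one packet to every viewer. A viewer whose buffer is
// full misses this packet instead of stalling the camera and the others.
func (r *relay) publish(pkt []byte) {
	r.mu.Lock()
	defer r.mu.Unlock()
	for ch := range r.viewers {
		select {
		case ch <- pkt:
		default: // slow viewer: drop this packet
		}
	}
}
```

Dropping packets for a slow viewer keeps one bad connection from holding up everyone else; with compressed video, a real relay would typically skip ahead to the next keyframe rather than drop arbitrary packets.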

intreadwrite: Use __unaligned in MSVC for ARM64 as well

16 January 2018, by Martin Storsjö

This attribute is supported for this architecture in MSVC as well
(but produces errors if used for 32 bit x86).

Signed-off-by: Martin Storsjö <martin@martin.st>

libavutil/intreadwrite.h
