Media (1)

Keyword: Tags / ogv

Other articles (28)

Publishing on MédiaSpip

13 June 2013

Can I post content from an iPad?
Yes, if your MédiaSpip installation is version 0.2 or later. If needed, contact your MédiaSpip administrator to find out.
The farm's regular Cron tasks

1 December 2010, by kent1

Managing the farm involves running several repetitive tasks, known as Cron tasks, at regular intervals.
The super Cron (gestion_mutu_super_cron)
This task, scheduled every minute, simply calls the Cron of every instance in the shared hosting farm (the "mutualisation") at regular intervals. Combined with a system Cron on the farm's central site, this makes it easy to generate regular visits to the various sites and prevents the tasks of rarely visited sites from being too (...)
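
The super Cron's fan-out can be sketched as follows. This is a hypothetical C++ illustration, not MédiaSpip's actual code, which is PHP running inside SPIP. The `Instance` type and the `visit` callback are assumptions that stand in for whatever request actually triggers an instance's Cron.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical description of one site (instance) in the farm.
struct Instance {
    std::string name;
    std::string cron_url;  // assumption: visiting this URL runs the site's Cron
};

// One run of the super Cron. A system Cron on the central site starts it every
// minute. It then triggers the Cron of every instance, so that even rarely
// visited sites get their scheduled tasks executed. `visit` performs the
// request and returns true on success. Returns how many visits succeeded.
int run_super_cron(const std::vector<Instance>& instances,
                   const std::function<bool(const std::string&)>& visit) {
    int succeeded = 0;
    for (const Instance& instance : instances) {
        // A failing site does not stop the remaining sites from being visited.
        if (visit(instance.cron_url)) {
            ++succeeded;
        }
    }
    return succeeded;
}
```

The sketch visits every instance sequentially on each run. A real farm might add per-site timeouts so that one slow site cannot delay the others.
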
Possible deployments

31 January 2010, by kent1

Two types of deployment are possible, depending on two factors: the intended installation method (standalone or farm), and the expected number of daily encodings and the expected traffic.
Video encoding is a heavy process that consumes a great deal of system resources (CPU and RAM), so all of this must be taken into account. This system is therefore only feasible on one or more dedicated servers.
Single-server version
The single-server version consists of using only one (...)


On other sites (6087)

How to write NALs produced by x264_encoder_encode() using ffmpeg av_interleaved_write_frame()

21 January 2013, by Haleeq Usman

I have been trying to produce an FLV video file with the following sequence:

av_register_all();

// Open video file
if (avformat_open_input(&pFormatCtx, "6.mp4", NULL, NULL) != 0)
    return -1; // Couldn't open file

// Retrieve stream information
if (avformat_find_stream_info(pFormatCtx, NULL) < 0)
    return -1; // Couldn't find stream information

// Dump information about file onto standard error
av_dump_format(pFormatCtx, 0, "input_file.mp4", 0);

// Find the first video stream
videoStream = -1;
for (i = 0; i < pFormatCtx->nb_streams; i++)
    if (pFormatCtx->streams[i]->codec->codec_type == AVMEDIA_TYPE_VIDEO) {
        videoStream = i;
        break;
    }
if (videoStream == -1)
    return -1; // Didn't find a video stream

// Get a pointer to the codec context for the video stream
pCodecCtx = pFormatCtx->streams[videoStream]->codec;

// Find the decoder for the video stream
pCodec = avcodec_find_decoder(pCodecCtx->codec_id);
if (pCodec == NULL) {
    fprintf(stderr, "Unsupported codec!\n");
    return -1; // Codec not found
}

// Open codec
if (avcodec_open2(pCodecCtx, pCodec, NULL) < 0)
    return -1; // Could not open codec

// Allocate video frame (receives the decoded input pictures)
pFrame = avcodec_alloc_frame();
if (pFrame == NULL)
    return -1;

// Allocate an AVFrame structure for the YUV420 copy
pFrameYUV420 = avcodec_alloc_frame();
if (pFrameYUV420 == NULL)
    return -1;

// Determine required buffer size and allocate buffer
numBytes = avpicture_get_size(pCodecCtx->pix_fmt, pCodecCtx->width, pCodecCtx->height);
buffer = (uint8_t *) av_malloc(numBytes * sizeof(uint8_t));

// Assign appropriate parts of buffer to image planes in pFrameYUV420
// Note that pFrameYUV420 is an AVFrame, but AVFrame is a superset of AVPicture
avpicture_fill((AVPicture *) pFrameYUV420, buffer, pCodecCtx->pix_fmt, pCodecCtx->width, pCodecCtx->height);

// Setup scaler
img_convert_ctx = sws_getContext(pCodecCtx->width, pCodecCtx->height, pCodecCtx->pix_fmt, pCodecCtx->width, pCodecCtx->height, pCodecCtx->pix_fmt, SWS_BILINEAR, 0, 0, 0);
if (img_convert_ctx == NULL) {
    fprintf(stderr, "Cannot initialize the conversion context!\n");
    exit(1);
}

// Setup encoder/muxing now
filename = "output_file.flv";
fmt = av_guess_format("flv", filename, NULL);
if (fmt == NULL) {
    printf("Could not guess format.\n");
    return -1;
}

/* allocate the output media context */
oc = avformat_alloc_context();
if (oc == NULL) {
    printf("could not allocate context.\n");
    return -1;
}
oc->oformat = fmt;
snprintf(oc->filename, sizeof(oc->filename), "%s", filename);

video_st = NULL;
if (fmt->video_codec != AV_CODEC_ID_NONE) {
    video_st = add_stream(oc, &video_codec, fmt->video_codec);
}

// Let's see some information about our format
av_dump_format(oc, 0, filename, 1);

/* open the output file, if needed */
if (!(fmt->flags & AVFMT_NOFILE)) {
    ret = avio_open(&oc->pb, filename, AVIO_FLAG_WRITE);
    if (ret < 0) {
        fprintf(stderr, "Could not open '%s': %s\n", filename, av_err2str(ret));
        return 1;
    }
}

/* Write the stream header, if any. */
ret = avformat_write_header(oc, NULL);
if (ret < 0) {
    fprintf(stderr, "Error occurred when opening output file: %s\n", av_err2str(ret));
    return 1;
}

// Setup x264 params
x264_param_t param;
x264_param_default_preset(&param, "veryfast", "zerolatency");
param.i_threads = 1;
param.i_width = video_st->codec->width;
param.i_height = video_st->codec->height;
param.i_fps_num = STREAM_FRAME_RATE; // 30 fps, same as video
param.i_fps_den = 1;
// Intra refresh:
param.i_keyint_max = STREAM_FRAME_RATE;
param.b_intra_refresh = 1;
// Rate control:
param.rc.i_rc_method = X264_RC_CRF;
param.rc.f_rf_constant = 25;
param.rc.f_rf_constant_max = 35;
// For streaming:
param.b_repeat_headers = 1;
param.b_annexb = 1;
x264_param_apply_profile(&param, "baseline");

x264_t* encoder = x264_encoder_open(&param);
x264_picture_t pic_in, pic_out;
x264_picture_alloc(&pic_in, X264_CSP_I420, video_st->codec->width, video_st->codec->height);

x264_nal_t* nals;
int i_nals;

// The loop:
// 1. Read frames
// 2. Decode the frame
// 3. Attempt to re-encode using x264
// 4. Write the x264 encoded frame using av_interleaved_write_frame
while (av_read_frame(pFormatCtx, &packet) >= 0) {
    // Is this a packet from the video stream?
    if (packet.stream_index == videoStream) {
        // Decode video frame
        avcodec_decode_video2(pCodecCtx, pFrame, &frameFinished, &packet);

        // Did we get a video frame?
        if (frameFinished) {
            sws_scale(img_convert_ctx, pFrame->data, pFrame->linesize, 0, pCodecCtx->height, pic_in.img.plane, pic_in.img.i_stride);
            int frame_size = x264_encoder_encode(encoder, &nals, &i_nals, &pic_in, &pic_out);

            if (frame_size >= 0) {
                if (i_nals < 0)
                    printf("invalid frame size: %d\n", i_nals);
                // write out NALs
                for (i = 0; i < i_nals; i++) {
                    // initialize a packet
                    AVPacket p;
                    av_init_packet(&p);
                    p.data = nals[i].p_payload;
                    p.size = nals[i].i_payload;
                    p.stream_index = video_st->index;
                    p.flags = AV_PKT_FLAG_KEY;
                    p.pts = AV_NOPTS_VALUE;
                    p.dts = AV_NOPTS_VALUE;
                    ret = av_interleaved_write_frame(oc, &p);
                }
            }
            printf("encoded frame #%d\n", frame_count);
            frame_count++;
        }
    }

    // Free the packet that was allocated by av_read_frame
    av_free_packet(&packet);
}

// Now we free up resources used/close codecs, and finally close our program.

Here is the implementation of the add_stream() function:

/* Add an output stream. */
static AVStream *add_stream(AVFormatContext *oc, AVCodec **codec, enum AVCodecID codec_id) {
    AVCodecContext *c;
    AVStream *st;
    int r;
    /* find the encoder */
    *codec = avcodec_find_encoder(codec_id);
    if (!(*codec)) {
        fprintf(stderr, "Could not find encoder for '%s'\n",
                avcodec_get_name(codec_id));
        exit(1);
    }
    st = avformat_new_stream(oc, *codec);
    if (!st) {
        fprintf(stderr, "Could not allocate stream\n");
        exit(1);
    }
    st->id = oc->nb_streams - 1;
    c = st->codec;
    switch ((*codec)->type) {
    case AVMEDIA_TYPE_AUDIO:
        st->id = 1;
        c->sample_fmt = AV_SAMPLE_FMT_FLTP;
        c->bit_rate = 64000;
        c->sample_rate = 44100;
        c->channels = 2;
        break;
    case AVMEDIA_TYPE_VIDEO:
        avcodec_get_context_defaults3(c, *codec);
        c->codec_id = codec_id;
        c->bit_rate = 500*1000;
        //c->rc_min_rate = 500*1000;
        //c->rc_max_rate = 500*1000;
        //c->rc_buffer_size = 500*1000;
        /* Resolution must be a multiple of two. */
        c->width = 1280;
        c->height = 720;
        /* timebase: This is the fundamental unit of time (in seconds) in terms
         * of which frame timestamps are represented. For fixed-fps content,
         * timebase should be 1/framerate and timestamp increments should be
         * identical to 1. */
        c->time_base.den = STREAM_FRAME_RATE;
        c->time_base.num = 1;
        c->gop_size = 12; /* emit one intra frame every twelve frames at most */
        c->pix_fmt = STREAM_PIX_FMT;
        if (c->codec_id == AV_CODEC_ID_MPEG2VIDEO) {
            /* just for testing, we also add B frames */
            c->max_b_frames = 2;
        }
        if (c->codec_id == AV_CODEC_ID_MPEG1VIDEO) {
            /* Needed to avoid using macroblocks in which some coeffs overflow.
             * This does not happen with normal video, it just happens here as
             * the motion of the chroma plane does not match the luma plane. */
            c->mb_decision = 2;
        }
        break;
    default:
        break;
    }
    /* Some formats want stream headers to be separate. */
    if (oc->oformat->flags & AVFMT_GLOBALHEADER)
        c->flags |= CODEC_FLAG_GLOBAL_HEADER;
    return st;
}

After the encoding is complete, I check the output file output_file.flv. Its size is very large (101 MB) and it does not play. If I use ffmpeg to decode and re-encode the input file, I get an output file of about 83 MB, roughly the same size as the original .mp4 input. That 83 MB output, produced with the ffmpeg C API alone instead of x264 for the encoding step, plays just fine. Does anyone know where I am going wrong? I have been researching this for a few days now with no luck :(. I feel that I am close to making it work, but I just cannot figure out what I am doing wrong. Thank you!
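
Here is a hedged reading of what may be going wrong, based only on the code above.

- The flv muxer's default `video_codec` is `AV_CODEC_ID_FLV1` (Sorenson Spark). So `add_stream(oc, &video_codec, fmt->video_codec)` probably declares an FLV1 stream rather than an H.264 one.
- The write loop sends each NAL unit as a separate packet, marks every one as a keyframe and leaves `pts`/`dts` unset. A muxer normally expects one packet per encoded picture, with timestamps in the stream's `time_base` (the FLV muxer uses 1/1000).
- FLV stores H.264 NAL units with length prefixes (normally 4 bytes) rather than the Annex B start codes requested by `b_annexb = 1`. Whether the muxer converts Annex B input for you depends on the FFmpeg version and on the stream's extradata. With `b_annexb = 0`, x264 writes 4-byte length prefixes itself.
- The scaler is created with the decoder's own size and pixel format as its destination, not the 1280x720 I420 picture allocated for x264.

The sketch below covers only the packet-building step. `Nal` and `Packet` are hypothetical stand-ins for `x264_nal_t` and `AVPacket`, so it compiles without either library. `rescale()` approximates `av_rescale_q()` for non-negative timestamps.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rational { int64_t num; int64_t den; };

// Rescale a non-negative timestamp between time bases, rounding to nearest.
// Same idea as av_rescale_q(), without its overflow protection.
int64_t rescale(int64_t ts, Rational from, Rational to) {
    assert(ts >= 0);
    int64_t num = ts * from.num * to.den;
    int64_t den = from.den * to.num;
    return (num + den / 2) / den;
}

// Hypothetical stand-in for x264_nal_t: only the two fields used here.
struct Nal {
    const uint8_t* p_payload;
    int i_payload;
};

// Hypothetical stand-in for the AVPacket fields that matter here.
struct Packet {
    std::vector<uint8_t> data;
    int64_t pts;
    int64_t dts;
    bool key;
};

// Length of a leading Annex B start code (00 00 01 or 00 00 00 01), or 0.
std::size_t start_code_len(const uint8_t* p, std::size_t n) {
    if (n >= 4 && p[0] == 0 && p[1] == 0 && p[2] == 0 && p[3] == 1) return 4;
    if (n >= 3 && p[0] == 0 && p[1] == 0 && p[2] == 1) return 3;
    return 0;
}

// Build ONE packet from all NAL units of one encoded picture. Each start code
// is replaced by a 4-byte big-endian length, and pts/dts are stamped in the
// output stream's time base.
Packet make_packet(const std::vector<Nal>& nals, int64_t frame_index,
                   Rational encoder_tb, Rational stream_tb, bool keyframe) {
    Packet pkt;
    for (const Nal& nal : nals) {
        std::size_t n = static_cast<std::size_t>(nal.i_payload);
        std::size_t skip = start_code_len(nal.p_payload, n);
        uint32_t len = static_cast<uint32_t>(n - skip);
        pkt.data.push_back(static_cast<uint8_t>(len >> 24));
        pkt.data.push_back(static_cast<uint8_t>(len >> 16));
        pkt.data.push_back(static_cast<uint8_t>(len >> 8));
        pkt.data.push_back(static_cast<uint8_t>(len));
        pkt.data.insert(pkt.data.end(), nal.p_payload + skip, nal.p_payload + n);
    }
    pkt.pts = rescale(frame_index, encoder_tb, stream_tb);
    pkt.dts = pkt.pts;   // only valid without B-frames (true for the baseline profile)
    pkt.key = keyframe;  // e.g. from pic_out.b_keyframe, instead of flagging every packet
    return pkt;
}
```

In the question's loop, this would replace the per-NAL `for` loop. Each call to `x264_encoder_encode()` that returns a positive size yields one `AVPacket` filled from `data`, `pts`, `dts` and the key flag, followed by a single `av_interleaved_write_frame()` call. `encoder_tb` would be `{1, STREAM_FRAME_RATE}`, and `stream_tb` the value of `video_st->time_base` after `avformat_write_header()`. Putting the SPS/PPS into the stream's extradata for the FLV header is a separate step not shown here.
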

How to stream H.264 video with MP3 audio using libavcodec?

18 September 2012, by dasg

I read H.264 frames from a webcam and capture audio from a microphone. I need to stream live video to ffserver. While debugging, I read the video back from ffserver using ffmpeg with the following command:

ffmpeg -i http://127.0.0.1:12345/robot.avi -vcodec copy -acodec copy out.avi

The video in the output file is slightly sped up. If I add an audio stream, it is sped up several times over. Sometimes there is no audio in the output file at all.

Here is my code for encoding audio:

#include "v_audio_encoder.h"

extern "C" {
#include <libavcodec/avcodec.h>
}

#include <cassert>
#include <cstring>
#include <vector>

struct VAudioEncoder::Private
{
    AVCodec *m_codec;
    AVCodecContext *m_context;

    std::vector<uint8_t> m_outBuffer;
};

VAudioEncoder::VAudioEncoder( int sampleRate, int bitRate )
{
    d = new Private( );
    d->m_codec = avcodec_find_encoder( CODEC_ID_MP3 );
    assert( d->m_codec );
    d->m_context = avcodec_alloc_context3( d->m_codec );

    // put sample parameters
    d->m_context->channels = 2;
    d->m_context->bit_rate = bitRate;
    d->m_context->sample_rate = sampleRate;
    d->m_context->sample_fmt = AV_SAMPLE_FMT_S16;
    strcpy( d->m_context->codec_name, "libmp3lame" );

    // open it
    int res = avcodec_open2( d->m_context, d->m_codec, 0 );
    assert( res >= 0 );

    d->m_outBuffer.resize( d->m_context->frame_size );
}

VAudioEncoder::~VAudioEncoder( )
{
    avcodec_close( d->m_context );
    av_free( d->m_context );
    delete d;
}

void VAudioEncoder::encode( const std::vector<short>& samples, std::vector<uint8_t>& outbuf )
{
    assert( (int)samples.size( ) == d->m_context->frame_size );

    int outSize = avcodec_encode_audio( d->m_context, d->m_outBuffer.data( ),
                                        d->m_outBuffer.size( ), reinterpret_cast<const short*>( samples.data( ) ) );
    if( outSize ) {
        outbuf.resize( outSize );
        memcpy( outbuf.data( ), d->m_outBuffer.data( ), outSize );
    }
    else
        outbuf.clear( );
}

int VAudioEncoder::getFrameSize( ) const
{
    return d->m_context->frame_size;
}

Here is my code for streaming video:

#include "v_out_video_stream.h"

extern "C" {
#include <libavformat/avformat.h>
#include <libavutil/opt.h>
#include <libavutil/avstring.h>
#include <libavformat/avio.h>
}

#include <stdexcept>
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

struct VStatticRegistrar
{
    VStatticRegistrar( )
    {
        av_register_all( );
        avformat_network_init( );
    }
};

VStatticRegistrar __registrar;

struct VOutVideoStream::Private
{
    AVFormatContext * m_context;
    int m_videoStreamIndex;
    int m_audioStreamIndex;

    int m_videoBitrate;
    int m_width;
    int m_height;
    int m_fps;
    int m_bitrate;

    bool m_waitKeyFrame;
};

VOutVideoStream::VOutVideoStream( int width, int height, int fps, int bitrate )
{
    d = new Private( );
    d->m_width = width;
    d->m_height = height;
    d->m_fps = fps;
    d->m_context = 0;
    d->m_videoStreamIndex = -1;
    d->m_audioStreamIndex = -1;
    d->m_bitrate = bitrate;
    d->m_waitKeyFrame = true;
}

bool VOutVideoStream::connectToServer( const std::string& uri )
{
    assert( ! d->m_context );

    // initialize the AV context
    d->m_context = avformat_alloc_context();
    if( !d->m_context )
        return false;
    // get the output format
    d->m_context->oformat = av_guess_format( "ffm", NULL, NULL );
    if( ! d->m_context->oformat )
        return false;

    strcpy( d->m_context->filename, uri.c_str( ) );

    // add an H.264 stream
    AVStream *stream = avformat_new_stream( d->m_context, NULL );
    if ( ! stream )
        return false;
    // initialize codec
    AVCodecContext* codec = stream->codec;
    if( d->m_context->oformat->flags & AVFMT_GLOBALHEADER )
        codec->flags |= CODEC_FLAG_GLOBAL_HEADER;
    codec->codec_id = CODEC_ID_H264;
    codec->codec_type = AVMEDIA_TYPE_VIDEO;
    strcpy( codec->codec_name, "libx264" );
//    codec->codec_tag = ( unsigned('4') << 24 ) + (unsigned('6') << 16 ) + ( unsigned('2') << 8 ) + 'H';
    codec->width = d->m_width;
    codec->height = d->m_height;
    codec->time_base.den = d->m_fps;
    codec->time_base.num = 1;
    codec->bit_rate = d->m_bitrate;
    d->m_videoStreamIndex = stream->index;

    // add an MP3 stream
    stream = avformat_new_stream( d->m_context, NULL );
    if ( ! stream )
        return false;
    // initialize codec
    codec = stream->codec;
    if( d->m_context->oformat->flags & AVFMT_GLOBALHEADER )
        codec->flags |= CODEC_FLAG_GLOBAL_HEADER;
    codec->codec_id = CODEC_ID_MP3;
    codec->codec_type = AVMEDIA_TYPE_AUDIO;
    strcpy( codec->codec_name, "libmp3lame" );
    codec->sample_fmt = AV_SAMPLE_FMT_S16;
    codec->channels = 2;
    codec->bit_rate = 64000;
    codec->sample_rate = 44100;
    d->m_audioStreamIndex = stream->index;

    // try to open the stream
    if( avio_open( &d->m_context->pb, d->m_context->filename, AVIO_FLAG_WRITE ) < 0 )
         return false;

    // write the header
    return avformat_write_header( d->m_context, NULL ) == 0;
}

void VOutVideoStream::disconnect( )
{
    assert( d->m_context );

    avio_close( d->m_context->pb );
    avformat_free_context( d->m_context );
    d->m_context = 0;
}

VOutVideoStream::~VOutVideoStream( )
{
    if( d->m_context )
        disconnect( );
    delete d;
}

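int VOutVideoStream::getVopType( const std::vector<uint8_t>& image )
{
    if( image.size( ) < 6 )
        return -1;
    const unsigned char *b = image.data( );

    // Verify NAL marker: Annex B start code 00 00 01, optionally preceded by one extra 00
    if( b[ 0 ] || b[ 1 ] || 0x01 != b[ 2 ] ) {
        ++b;
        if ( b[ 0 ] || b[ 1 ] || 0x01 != b[ 2 ] )
            return -1;
    }

    b += 3;

    // MPEG-4 Part 2 VOP start code (00 00 01 B6): the top two bits of the
    // next byte are vop_coding_type (0 = I, 1 = P, 2 = B, 3 = S)
    if( 0xb6 == *b ) {
        ++b;
        return ( *b & 0xc0 ) >> 6;
    }

    // H.264 NAL header byte: 0x65 = IDR slice, 0x61 = non-IDR slice (nal_ref_idc 3),
    // 0x01 = non-IDR slice (nal_ref_idc 0); anything else (e.g. SPS/PPS) returns -1
    switch( *b ) {
    case 0x65: return 0;
    case 0x61: return 1;
    case 0x01: return 2;
    }

    return -1;
}
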
bool VOutVideoStream::sendVideoFrame( std::vector<uint8_t>& image )
{
    // Init packet
    AVPacket pkt;
    av_init_packet( &pkt );
    pkt.flags |= ( 0 >= getVopType( image ) ) ? AV_PKT_FLAG_KEY : 0;

    // Wait for key frame
    if ( d->m_waitKeyFrame ) {
        if( pkt.flags & AV_PKT_FLAG_KEY )
            d->m_waitKeyFrame = false;
        else
            return true;
    }

    pkt.stream_index = d->m_videoStreamIndex;
    pkt.data = image.data( );
    pkt.size = image.size( );
    pkt.pts = pkt.dts = AV_NOPTS_VALUE;

    return av_write_frame( d->m_context, &pkt ) >= 0;
}

bool VOutVideoStream::sendAudioFrame( std::vector<uint8_t>& audio )
{
    // Init packet
    AVPacket pkt;
    av_init_packet( &pkt );
    pkt.stream_index = d->m_audioStreamIndex;
    pkt.data = audio.data( );
    pkt.size = audio.size( );
    pkt.pts = pkt.dts = AV_NOPTS_VALUE;

    return av_write_frame( d->m_context, &pkt ) >= 0;
}

Here is how I use it:

BOOST_AUTO_TEST_CASE(testSendingVideo)
{
    const int framesToGrab = 90000;

    VOutVideoStream stream( VIDEO_WIDTH, VIDEO_HEIGHT, FPS, VIDEO_BITRATE );
    if( stream.connectToServer( URI ) ) {
        VAudioEncoder audioEncoder( AUDIO_SAMPLE_RATE, AUDIO_BIT_RATE );
        VAudioCapture microphone( MICROPHONE_NAME, AUDIO_SAMPLE_RATE, audioEncoder.getFrameSize( ) );

        VLogitecCamera camera( VIDEO_WIDTH, VIDEO_HEIGHT );
        BOOST_REQUIRE( camera.open( CAMERA_PORT ) );
        BOOST_REQUIRE( camera.startCapturing( ) );

        std::vector<uint8_t> image, encodedAudio;
        std::vector<short> voice;
        boost::system_time startTime;
        int delta;
        for( int i = 0; i < framesToGrab; ++i ) {
            startTime = boost::posix_time::microsec_clock::universal_time( );

            BOOST_REQUIRE( camera.read( image ) );
            BOOST_REQUIRE( microphone.read( voice ) );
            audioEncoder.encode( voice, encodedAudio );

            BOOST_REQUIRE( stream.sendVideoFrame( image ) );
            BOOST_REQUIRE( stream.sendAudioFrame( encodedAudio ) );

            delta = ( boost::posix_time::microsec_clock::universal_time( ) - startTime ).total_milliseconds( );
            if( delta < 1000 / FPS )
                boost::thread::sleep( startTime + boost::posix_time::milliseconds( 1000 / FPS - delta ) );
        }

        BOOST_REQUIRE( camera.stopCapturing( ) );
        BOOST_REQUIRE( camera.close( ) );
    }
    else
        std::cout << "failed to connect to server" << std::endl;
}

I think my problem is with PTS and DTS. Can anyone help me?
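
Here is a hedged reading of the PTS/DTS question. Every packet above is written with `pts = dts = AV_NOPTS_VALUE`, so nothing tells the muxer when a video frame was captured or how much time an audio packet covers. The audio stream's codec context is also never given a `time_base` in `connectToServer()`.

One common approach has three steps:

- derive audio timestamps from the running sample count (each encoded frame covers `frame_size` samples per channel);
- derive video timestamps from the capture time;
- rescale both into each stream's `time_base`, read from `AVStream::time_base` after `avformat_write_header()`, since the muxer may change it.

The sketch below covers only that arithmetic. `StreamClock` is a hypothetical helper, not an FFmpeg type, and the integer rounding matches `av_rescale_q()` for small non-negative values.

```cpp
#include <cassert>
#include <cstdint>

struct Rational { int64_t num; int64_t den; };

// Rescale a non-negative timestamp between time bases, rounding to nearest.
int64_t rescale(int64_t ts, Rational from, Rational to) {
    assert(ts >= 0);
    int64_t num = ts * from.num * to.den;
    int64_t den = from.den * to.num;
    return (num + den / 2) / den;
}

// Hypothetical helper that hands out pts values for one audio stream and one
// video stream, already converted to each stream's time base.
class StreamClock {
public:
    StreamClock(int sample_rate, Rational audio_tb, Rational video_tb)
        : sample_rate_(sample_rate), audio_tb_(audio_tb), video_tb_(video_tb) {}

    // Audio: a frame's pts is the number of samples (per channel) sent before
    // it, counted in 1/sample_rate units and then rescaled.
    int64_t next_audio_pts(int frame_size) {
        int64_t pts = rescale(samples_sent_, Rational{1, sample_rate_}, audio_tb_);
        samples_sent_ += frame_size;
        return pts;
    }

    // Video: the pts comes from the capture time in microseconds since the
    // first frame, so frames are not assumed to arrive at an exact FPS.
    int64_t video_pts(int64_t capture_us) const {
        return rescale(capture_us, Rational{1, 1000000}, video_tb_);
    }

private:
    int sample_rate_;
    Rational audio_tb_;
    Rational video_tb_;
    int64_t samples_sent_ = 0;
};
```

In `sendAudioFrame()` and `sendVideoFrame()`, the returned values would be assigned to `pkt.pts`. `pkt.dts` equals `pts` for MP3 audio. It also equals `pts` for the H.264 video, assuming the camera emits no B-frames. When two streams are muxed together, `av_interleaved_write_frame()` is generally preferred over `av_write_frame()`.
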

Method For Crawling Google

28 May 2011, by Multimedia Mike (Big Data)
I wanted to crawl Google in order to harvest a large corpus of certain types of data as yielded by a certain search term (we’ll call it “term” for this exercise). Google doesn’t appear to offer any API to automatically harvest their search results (why would they?). So I sat down and thought about how to do it. This is the solution I came up with.

FAQ
Q: Is this legal / ethical / compliant with Google’s terms of service?
A: Does it look like I care? Moving right along…

Manual Crawling Process
For this exercise, I essentially automated the task that would be performed by a human. It goes something like this:
1. Search for “term”
2. On the first page of results, download each of the 10 results returned
3. Click on the next page of results
4. Go to step 2, until Google doesn’t return any more pages of search results
Google returns up to 1000 results for a given search term. Fetching them 10 at a time is less than efficient. Fortunately, the search URL can easily be tweaked to return up to 100 results per page.
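
The paging in steps 1 to 4 can be sketched as a loop over result offsets. The `q`, `start` and `num` query parameters are an assumption about Google's search URL: the post does not name them, and they may have changed since. The point is only that each page request advances the start index by the page size until the 1000-result cap.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Percent-encode a query string, keeping only RFC 3986 "unreserved" characters.
std::string url_encode(const std::string& s) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : s) {
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

// One URL per results page. The start index advances by the page size until
// the per-query cap (1000 results in the post) is reached. The parameter
// names q/start/num are assumptions, not taken from the post.
std::vector<std::string> page_urls(const std::string& query, int per_page, int max_results) {
    std::vector<std::string> urls;
    for (int start = 0; start < max_results; start += per_page) {
        urls.push_back("https://www.google.com/search?q=" + url_encode(query) +
                       "&start=" + std::to_string(start) +
                       "&num=" + std::to_string(per_page));
    }
    return urls;
}
```

In the actual crawl, the loop would stop earlier, as soon as a results page no longer offers a next page (step 4).
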

Expanding Reach
Problem: 1000 results for the “term” search isn’t that many. I need a way to expand the search. I’m not aiming for relevancy; I’m just searching for random examples of some data that occurs around the internet.

My solution for this is to refine the search using the “site:” operator. For example, you can ask Google to search for “term” at all Canadian domains using “site:.ca”. So the manual process now involves harvesting up to 1000 results for every single internet top-level domain (TLD). But many TLDs can be more granular than that. For example, there are 50 sub-domains under .us, one for each state (e.g., .ca.us, .ny.us). Those all need to be searched independently. The same goes for all the sub-domains under TLDs which don’t allow domains directly under the main TLD, such as .uk (search under .co.uk, .ac.uk, etc.).
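
The TLD expansion can be sketched as a function that turns a list of zones into one `site:` query each. The `Tld` struct and the sample zones in the usage are illustrative assumptions, a tiny subset of the real list. `direct_domains = false` models TLDs such as .uk that, as described above, must be searched only through their sub-domains.

```cpp
#include <string>
#include <vector>

// Hypothetical description of one top-level domain and its searchable sub-zones.
struct Tld {
    std::string name;               // e.g. ".us"
    bool direct_domains;            // false for TLDs like .uk (per the post)
    std::vector<std::string> subs;  // e.g. {".ca", ".ny"} for .us, {".co", ".ac"} for .uk
};

// One "term site:<zone>" query per zone: the TLD itself (when domains can be
// registered directly under it) plus each sub-zone, searched independently.
std::vector<std::string> site_queries(const std::string& term, const std::vector<Tld>& tlds) {
    std::vector<std::string> queries;
    for (const Tld& tld : tlds) {
        if (tld.direct_domains) {
            queries.push_back(term + " site:" + tld.name);
        }
        for (const std::string& sub : tld.subs) {
            queries.push_back(term + " site:" + sub + tld.name);
        }
    }
    return queries;
}
```

For example, `{".us", true, {".ca", ".ny"}}` yields `term site:.us`, `term site:.ca.us` and `term site:.ny.us`, while `{".uk", false, {".co", ".ac"}}` yields only the two sub-zone queries.
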

Another extension is to combine “term” searches with other terms that are likely to have a rich correlation with “term”. For example, if “term” is relevant to various scientific fields, search for “term” in conjunction with various scientific disciplines.

Algorithmically
My solution is to create an SQLite database that contains a table of search seeds. Each seed is essentially a “site:” string combined with a starting index.

Each TLD and sub-TLD is inserted as a searchseed record with a starting index of 0.

A script performs the following crawling algorithm:
- Fetch the next record from the searchseed table which has not been crawled
- Fetch search result page from Google
- Scrape URLs from page and insert each into URL table
- Mark the searchseed record as having been crawled
- If the results page indicates there are more results for this search, insert a new searchseed for the same seed but with a starting index 100 higher
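
The crawling loop can be sketched as below. This is a swapped-in, in-memory technique: a plain `std::vector` stands in for the SQLite `searchseed` table and a `std::set` for the URL table (the post's own script was Python on SQLite). `fetch` is a hypothetical callback that downloads and scrapes one results page. The step of 100 matches the 100-results-per-page URL tweak described earlier.

```cpp
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <vector>

// In-memory stand-in for the post's "searchseed" table.
struct SearchSeed {
    std::string site;  // the "site:" part of the query
    int start;         // starting index of the results page
    bool crawled;
};

// What scraping one results page yields.
struct ResultsPage {
    std::vector<std::string> urls;
    bool has_more;
};

// Hypothetical callback that downloads and scrapes one results page.
using FetchPage = std::function<ResultsPage(const std::string& site, int start)>;

// Crawl until every seed, including the ones added along the way, is done.
void crawl(std::vector<SearchSeed>& seeds, std::set<std::string>& url_table,
           const FetchPage& fetch) {
    // Index-based loop: push_back below may reallocate, and new seeds are
    // appended at the end, so later iterations pick them up.
    for (std::size_t i = 0; i < seeds.size(); ++i) {
        if (seeds[i].crawled) continue;  // next record not yet crawled
        const std::string site = seeds[i].site;
        const int start = seeds[i].start;

        ResultsPage page = fetch(site, start);                 // fetch results page
        url_table.insert(page.urls.begin(), page.urls.end());  // insert scraped URLs
        seeds[i].crawled = true;                               // mark seed as crawled
        if (page.has_more) {
            seeds.push_back(SearchSeed{site, start + 100, false});  // same seed, +100
        }
    }
}
```

Using a `std::set` also silently drops duplicate URLs, roughly what a unique constraint on the URL column would do in the database version.
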
Digging Into Sites
Sometimes, Google notes that certain sites are particularly rich sources of “term” and offers to let you search that site for “term”. This basically links to another search for “term site:somesite”. That site gets its own search seed, and the program might harvest up to 1000 URLs from that site alone.

Harvesting the Data
Armed with a database of URLs, employ the following algorithm:
- Fetch a random URL from the database which has yet to be downloaded
- Try to download it
- For goodness’ sake, have a mechanism in place to detect whether the download process has stalled and automatically kill it after a certain period of time
- Store the data and update the database, noting where the information was stored and that it is already downloaded
This step is easy to parallelize by simply executing multiple copies of the script. It is useful to update the URL table to indicate that one process is already trying to download a URL so multiple processes don’t duplicate work.
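
The key detail of the harvesting step is claiming a URL so that parallel workers don't duplicate work. It can be sketched as follows. Threads sharing one in-memory table stand in for the post's separate processes sharing a database; with a real database, the random pick and the claim would need to happen in one atomic update. The stall timeout itself is left out: `release()` only shows what would happen after a watchdog kills a stalled download.

```cpp
#include <cstddef>
#include <mutex>
#include <random>
#include <string>
#include <utility>
#include <vector>

enum class State { Pending, InProgress, Done };

struct UrlRow {
    std::string url;
    State state;
    std::string stored_at;  // where the downloaded data was saved
};

// In-memory stand-in for the URL table, shared by several worker threads.
class UrlTable {
public:
    explicit UrlTable(std::vector<UrlRow> rows) : rows_(std::move(rows)) {}

    // Pick a random pending URL and mark it InProgress in one locked step, so
    // two workers never download the same URL. Returns -1 when none is left.
    long claim(std::mt19937& rng) {
        std::lock_guard<std::mutex> lock(mu_);
        std::vector<std::size_t> pending;
        for (std::size_t i = 0; i < rows_.size(); ++i) {
            if (rows_[i].state == State::Pending) pending.push_back(i);
        }
        if (pending.empty()) return -1;
        std::uniform_int_distribution<std::size_t> pick(0, pending.size() - 1);
        std::size_t i = pending[pick(rng)];
        rows_[i].state = State::InProgress;
        return static_cast<long>(i);
    }

    // Note where the data was stored and that the URL is downloaded.
    void finish(long i, const std::string& path) {
        std::lock_guard<std::mutex> lock(mu_);
        rows_[static_cast<std::size_t>(i)].state = State::Done;
        rows_[static_cast<std::size_t>(i)].stored_at = path;
    }

    // Give a URL back, e.g. after its stalled download was killed by a timeout.
    void release(long i) {
        std::lock_guard<std::mutex> lock(mu_);
        rows_[static_cast<std::size_t>(i)].state = State::Pending;
    }

    State state(long i) {
        std::lock_guard<std::mutex> lock(mu_);
        return rows_[static_cast<std::size_t>(i)].state;
    }

private:
    std::mutex mu_;
    std::vector<UrlRow> rows_;
};
```
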

Acting Human
A few factors here:
- Google allegedly doesn’t like automated programs crawling its search results. Thus, at the very least, don’t let your script advertise itself as an automated program. At a basic level, this means forging the User-Agent: HTTP header. By default, Python’s urllib2 will identify itself as a programming language. Change this to a well-known browser string.
- Be patient; don’t fire off these search requests as quickly as possible. My crawling algorithm inserts a random delay of a few seconds in between each request. This can still yield hundreds of useful URLs per minute.
- On harvesting the data : Even though you can parallelize this and download data as quickly as your connection can handle, it’s a good idea to randomize the URLs. If you hypothetically had 4 download processes running at once and they got to a point in the URL table which had many URLs from a single site, the server might be configured to reject too many simultaneous requests from a single client.
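
The two politeness measures above can be sketched with the standard `<random>` facilities. The 2 to 5 second bounds are an assumption standing in for "a few seconds"; the post gives no exact numbers.

```cpp
#include <algorithm>
#include <chrono>
#include <random>
#include <string>
#include <vector>

// Random pause between search requests. The 2-5 s default bounds are assumptions.
std::chrono::milliseconds polite_delay(std::mt19937& rng, int min_ms = 2000, int max_ms = 5000) {
    std::uniform_int_distribution<int> dist(min_ms, max_ms);
    return std::chrono::milliseconds(dist(rng));
}

// Shuffle the download order, so that concurrent workers are unlikely to hit
// one server with many simultaneous requests when the table holds many URLs
// from the same site.
std::vector<std::string> shuffled(std::vector<std::string> urls, std::mt19937& rng) {
    std::shuffle(urls.begin(), urls.end(), rng);
    return urls;
}
```

A worker would call `std::this_thread::sleep_for(polite_delay(rng))` (from `<thread>`) between search requests, and iterate over `shuffled(urls, rng)` when downloading.
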
Conclusion
Anyway, that’s just the way I would (and did) do it. What did I do with all the data? That’s a subject for a different post.

