Recherche avancée

Médias (91)

Autres articles (68)

  • Modifier la date de publication

    21 juin 2013, par

    Comment changer la date de publication d’un média ?
    Il faut au préalable rajouter un champ "Date de publication" dans le masque de formulaire adéquat :
    Administrer > Configuration des masques de formulaires > Sélectionner "Un média"
    Dans la rubrique "Champs à ajouter, cocher "Date de publication "
    Cliquer en bas de la page sur Enregistrer

  • MediaSPIP Init et Diogène : types de publications de MediaSPIP

    11 novembre 2010, par

    À l’installation d’un site MediaSPIP, le plugin MediaSPIP Init réalise certaines opérations dont la principale consiste à créer quatre rubriques principales dans le site et de créer cinq templates de formulaire pour Diogène.
    Ces quatre rubriques principales (aussi appelées secteurs) sont : Medias ; Sites ; Editos ; Actualités ;
    Pour chacune de ces rubriques est créé un template de formulaire spécifique éponyme. Pour la rubrique "Medias" un second template "catégorie" est créé permettant d’ajouter (...)

  • Le profil des utilisateurs

    12 avril 2011, par

    Chaque utilisateur dispose d’une page de profil lui permettant de modifier ses informations personnelle. Dans le menu de haut de page par défaut, un élément de menu est automatiquement créé à l’initialisation de MediaSPIP, visible uniquement si le visiteur est identifié sur le site.
    L’utilisateur a accès à la modification de profil depuis sa page auteur, un lien dans la navigation "Modifier votre profil" est (...)

Sur d’autres sites (5980)

  • Linker error by trying example program with ffmpeg

    11 mars, par Chris

    I think this might be a stupid question and I'm just blind but that thing is driving me nuts for hours now.
I downloaded ffmpeg and build it. Now I want to try the thing out in a program but I can't setup cmake to link ffmpeg properly and have no idea what is wrong.

    


    The linker always tells me this :

    


    christoph@christoph-ThinkPad-T490:~/develop/ffmpg_example/build$ make
[ 50%] Linking CXX executable test
CMakeFiles/test.dir/main.cxx.o: In function `main':
main.cxx:(.text+0x180): undefined reference to `av_register_all()'
main.cxx:(.text+0x1a7): undefined reference to `avformat_open_input(AVFormatContext**, char const*, AVInputFormat*, AVDictionary**)'
main.cxx:(.text+0x1ce): undefined reference to `avformat_find_stream_info(AVFormatContext*, AVDictionary**)'
main.cxx:(.text+0x206): undefined reference to `av_dump_format(AVFormatContext*, int, char const*, int)'
main.cxx:(.text+0x2bb): undefined reference to `avcodec_find_decoder(AVCodecID)'
main.cxx:(.text+0x2fc): undefined reference to `avcodec_alloc_context3(AVCodec const*)'
main.cxx:(.text+0x316): undefined reference to `avcodec_copy_context(AVCodecContext*, AVCodecContext const*)'
main.cxx:(.text+0x361): undefined reference to `avcodec_open2(AVCodecContext*, AVCodec const*, AVDictionary**)'
main.cxx:(.text+0x377): undefined reference to `av_frame_alloc()'
main.cxx:(.text+0x383): undefined reference to `av_frame_alloc()'
main.cxx:(.text+0x3ba): undefined reference to `avpicture_get_size(AVPixelFormat, int, int)'
main.cxx:(.text+0x3d0): undefined reference to `av_malloc(unsigned long)'
main.cxx:(.text+0x3ff): undefined reference to `avpicture_fill(AVPicture*, unsigned char const*, AVPixelFormat, int, int)'
main.cxx:(.text+0x43d): undefined reference to `sws_getContext(int, int, AVPixelFormat, int, int, AVPixelFormat, int, SwsFilter*, SwsFilter*, double const*)'
main.cxx:(.text+0x465): undefined reference to `av_read_frame(AVFormatContext*, AVPacket*)'
main.cxx:(.text+0x49f): undefined reference to `avcodec_decode_video2(AVCodecContext*, AVFrame*, int*, AVPacket const*)'
main.cxx:(.text+0x4fd): undefined reference to `sws_scale(SwsContext*, unsigned char const* const*, int const*, int, int, unsigned char* const*, int const*)'
main.cxx:(.text+0x545): undefined reference to `av_free_packet(AVPacket*)'
main.cxx:(.text+0x556): undefined reference to `av_free(void*)'
main.cxx:(.text+0x565): undefined reference to `av_frame_free(AVFrame**)'
main.cxx:(.text+0x574): undefined reference to `av_frame_free(AVFrame**)'
main.cxx:(.text+0x580): undefined reference to `avcodec_close(AVCodecContext*)'
main.cxx:(.text+0x58f): undefined reference to `avcodec_close(AVCodecContext*)'
main.cxx:(.text+0x59e): undefined reference to `avformat_close_input(AVFormatContext**)'
collect2: error: ld returned 1 exit status
CMakeFiles/test.dir/build.make:87: recipe for target 'test' failed
make[2]: *** [test] Error 1
CMakeFiles/Makefile2:75: recipe for target 'CMakeFiles/test.dir/all' failed
make[1]: *** [CMakeFiles/test.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2


    


    The cmake list looks like this :

    


    cmake_minimum_required(VERSION 3.16)

project(ffmpeg_test)

add_library(avformat STATIC IMPORTED)
set_target_properties(avformat
    PROPERTIES IMPORTED_LOCATION /home/christoph/develop/FFmpeg/build/lib/libavformat.a
)
add_library(avcodec STATIC IMPORTED)
set_target_properties(avcodec
    PROPERTIES IMPORTED_LOCATION /home/christoph/develop/FFmpeg/build/lib/libavcodec.a
)
add_library(swscale STATIC IMPORTED)
set_target_properties(swscale
    PROPERTIES IMPORTED_LOCATION /home/christoph/develop/FFmpeg/build/lib/libswscale.a
)
add_library(avutil STATIC IMPORTED)
set_target_properties(avutil
    PROPERTIES IMPORTED_LOCATION /home/christoph/develop/FFmpeg/build/lib/libavutil.a
)
add_executable(test main.cxx)

target_link_libraries(test PRIVATE
    /home/christoph/develop/FFmpeg/build/lib/libavformat.a
    avcodec
    swscale
    avutil
)
target_include_directories(test PRIVATE /home/christoph/develop/FFmpeg/build/include)


    


    And here are the ffmpeg libs :

    


    christoph@christoph-ThinkPad-T490:~/develop/FFmpeg/build/lib$ ll
total 277840
drwxr-xr-x  3 christoph christoph      4096 Dez  7 23:59 ./
drwxr-xr-x 17 christoph christoph      4096 Dez  7 23:59 ../
-rw-r--r--  1 christoph christoph 173479270 Dez  7 23:59 libavcodec.a
-rw-r--r--  1 christoph christoph   2174910 Dez  7 23:59 libavdevice.a
-rw-r--r--  1 christoph christoph  37992438 Dez  7 23:59 libavfilter.a
-rw-r--r--  1 christoph christoph  59222040 Dez  7 23:59 libavformat.a
-rw-r--r--  1 christoph christoph   4759514 Dez  7 23:59 libavutil.a
-rw-r--r--  1 christoph christoph    695698 Dez  7 23:59 libswresample.a
-rw-r--r--  1 christoph christoph   6164398 Dez  7 23:59 libswscale.a
drwxr-xr-x  2 christoph christoph      4096 Dez  7 23:59 pkgconfig/


    


    And this is the example code :

    


    #include&#xA;&#xA;#include <libavcodec></libavcodec>avcodec.h>&#xA;#include <libavformat></libavformat>avformat.h>&#xA;#include <libswscale></libswscale>swscale.h>&#xA;&#xA;// compatibility with newer API&#xA;#if LIBAVCODEC_VERSION_INT &lt; AV_VERSION_INT(55,28,1)&#xA;#define av_frame_alloc avcodec_alloc_frame&#xA;#define av_frame_free avcodec_free_frame&#xA;#endif&#xA;&#xA;void SaveFrame(AVFrame *pFrame, int width, int height, int iFrame) {&#xA;  FILE *pFile;&#xA;  char szFilename[32];&#xA;  int  y;&#xA;  &#xA;  // Open file&#xA;  sprintf(szFilename, "frame%d.ppm", iFrame);&#xA;  pFile=fopen(szFilename, "wb");&#xA;  if(pFile==NULL)&#xA;    return;&#xA;  &#xA;  // Write header&#xA;  fprintf(pFile, "P6\n%d %d\n255\n", width, height);&#xA;  &#xA;  // Write pixel data&#xA;  for(y=0; ydata[0]&#x2B;y*pFrame->linesize[0], 1, width*3, pFile);&#xA;  &#xA;  // Close file&#xA;  fclose(pFile);&#xA;}&#xA;&#xA;int main(int argc, char *argv[]) {&#xA;  // Initalizing these to NULL prevents segfaults!&#xA;  AVFormatContext   *pFormatCtx = NULL;&#xA;  int               i, videoStream;&#xA;  AVCodecContext    *pCodecCtxOrig = NULL;&#xA;  AVCodecContext    *pCodecCtx = NULL;&#xA;  AVCodec           *pCodec = NULL;&#xA;  AVFrame           *pFrame = NULL;&#xA;  AVFrame           *pFrameRGB = NULL;&#xA;  AVPacket          packet;&#xA;  int               frameFinished;&#xA;  int               numBytes;&#xA;  uint8_t           *buffer = NULL;&#xA;  struct SwsContext *sws_ctx = NULL;&#xA;&#xA;  if(argc &lt; 2) {&#xA;    printf("Please provide a movie file\n");&#xA;    return -1;&#xA;  }&#xA;  // Register all formats and codecs&#xA;  av_register_all();&#xA;  &#xA;  // Open video file&#xA;  if(avformat_open_input(&amp;pFormatCtx, argv[1], NULL, NULL)!=0)&#xA;    return -1; // Couldn&#x27;t open file&#xA;  &#xA;  // Retrieve stream information&#xA;  if(avformat_find_stream_info(pFormatCtx, NULL)&lt;0)&#xA;    return -1; // Couldn&#x27;t find stream information&#xA;  &#xA;  // Dump information about file onto standard error&#xA;  av_dump_format(pFormatCtx, 0, argv[1], 0);&#xA;  &#xA;  // Find the first video stream&#xA;  videoStream=-1;&#xA;  for(i=0; inb_streams; i&#x2B;&#x2B;)&#xA;    if(pFormatCtx->streams[i]->codec->codec_type==AVMEDIA_TYPE_VIDEO) {&#xA;      videoStream=i;&#xA;      break;&#xA;    }&#xA;  if(videoStream==-1)&#xA;    return -1; // Didn&#x27;t find a video stream&#xA;  &#xA;  // Get a pointer to the codec context for the video stream&#xA;  pCodecCtxOrig=pFormatCtx->streams[videoStream]->codec;&#xA;  // Find the decoder for the video stream&#xA;  pCodec=avcodec_find_decoder(pCodecCtxOrig->codec_id);&#xA;  if(pCodec==NULL) {&#xA;    fprintf(stderr, "Unsupported codec!\n");&#xA;    return -1; // Codec not found&#xA;  }&#xA;  // Copy context&#xA;  pCodecCtx = avcodec_alloc_context3(pCodec);&#xA;  if(avcodec_copy_context(pCodecCtx, pCodecCtxOrig) != 0) {&#xA;    fprintf(stderr, "Couldn&#x27;t copy codec context");&#xA;    return -1; // Error copying codec context&#xA;  }&#xA;&#xA;  // Open codec&#xA;  if(avcodec_open2(pCodecCtx, pCodec, NULL)&lt;0)&#xA;    return -1; // Could not open codec&#xA;  &#xA;  // Allocate video frame&#xA;  pFrame=av_frame_alloc();&#xA;  &#xA;  // Allocate an AVFrame structure&#xA;  pFrameRGB=av_frame_alloc();&#xA;  if(pFrameRGB==NULL)&#xA;    return -1;&#xA;&#xA;  // Determine required buffer size and allocate buffer&#xA;  numBytes=avpicture_get_size(AV_PIX_FMT_RGB24, pCodecCtx->width,&#xA;                  pCodecCtx->height);&#xA;  buffer=(uint8_t *)av_malloc(numBytes*sizeof(uint8_t));&#xA;  &#xA;  // Assign appropriate parts of buffer to image planes in pFrameRGB&#xA;  // Note that pFrameRGB is an AVFrame, but AVFrame is a superset&#xA;  // of AVPicture&#xA;  avpicture_fill((AVPicture *)pFrameRGB, buffer, AV_PIX_FMT_RGB24,&#xA;         pCodecCtx->width, pCodecCtx->height);&#xA;  &#xA;  // initialize SWS context for software scaling&#xA;  sws_ctx = sws_getContext(pCodecCtx->width,&#xA;               pCodecCtx->height,&#xA;               pCodecCtx->pix_fmt,&#xA;               pCodecCtx->width,&#xA;               pCodecCtx->height,&#xA;               AV_PIX_FMT_RGB24,&#xA;               SWS_BILINEAR,&#xA;               NULL,&#xA;               NULL,&#xA;               NULL&#xA;               );&#xA;&#xA;  // Read frames and save first five frames to disk&#xA;  i=0;&#xA;  while(av_read_frame(pFormatCtx, &amp;packet)>=0) {&#xA;    // Is this a packet from the video stream?&#xA;    if(packet.stream_index==videoStream) {&#xA;      // Decode video frame&#xA;      avcodec_decode_video2(pCodecCtx, pFrame, &amp;frameFinished, &amp;packet);&#xA;      &#xA;      // Did we get a video frame?&#xA;      if(frameFinished) {&#xA;    // Convert the image from its native format to RGB&#xA;    sws_scale(sws_ctx, (uint8_t const * const *)pFrame->data,&#xA;          pFrame->linesize, 0, pCodecCtx->height,&#xA;          pFrameRGB->data, pFrameRGB->linesize);&#xA;    &#xA;    // Save the frame to disk&#xA;    if(&#x2B;&#x2B;i&lt;=5)&#xA;      SaveFrame(pFrameRGB, pCodecCtx->width, pCodecCtx->height, &#xA;            i);&#xA;      }&#xA;    }&#xA;    &#xA;    // Free the packet that was allocated by av_read_frame&#xA;    av_free_packet(&amp;packet);&#xA;  }&#xA;  &#xA;  // Free the RGB image&#xA;  av_free(buffer);&#xA;  av_frame_free(&amp;pFrameRGB);&#xA;  &#xA;  // Free the YUV frame&#xA;  av_frame_free(&amp;pFrame);&#xA;  &#xA;  // Close the codecs&#xA;  avcodec_close(pCodecCtx);&#xA;  avcodec_close(pCodecCtxOrig);&#xA;&#xA;  // Close the video file&#xA;  avformat_close_input(&amp;pFormatCtx);&#xA;  &#xA;  return 0;&#xA;}&#xA;

    &#xA;

  • WebVTT Discussions at FOMS

    18 décembre 2013, par silvia

    At the recent FOMS (Foundations of Open Media Software and Standards) Developer Workshop, we had a massive focus on WebVTT and the state of its feature set. You will find links to summaries of the individual discussions in the FOMS Schedule page. Here are some of the key results I went away with.

    1. WebVTT Regions

    The key driving force for improvements to WebVTT continues to be the accurate representation of CEA608/708 captioning. As part of that drive, we’ve introduced regions (the CEA708 “window” concept) to WebVTT. WebVTT regions satisfy multiple requirements of CEA608/708 captions :

    1. support for rollup captions
    2. support for background color and border color on a group of cues independent of the background color of the individual cue
    3. possibility to move a group of cues from one location on screen to a different
    4. support to specify an anchor point and a growth direction for cues when their text size changes
    5. support for specifying a fixed number of lines to be rendered
    6. possibility to specify which region is rendered in front of which other one when regions overlap

    While WebVTT regions enable us to satisfy all of the above points, the specification isn’t actually complete yet and some of the above needs aren’t satisfied yet.

    We have an open bug to move a region elsewhere. A first discussion at FOMS seemed to to indicate that we’ll have to add syntax for updating a region at a particular time and thus give region definitions a way to be valid only for a certain time frame. I can imagine that the region definitions that we have in the header of the WebVTT file now would have an implicitly defined time frame from the start to the end of the file, but can be overruled by a re-definition anywhere within the WebVTT file. That redefinition needs to provide a start and end time.

    We registered a bug to add specifying the width and height of regions (and possibly of cues) by em (i.e. by multiples of the largest character in a font). This should allow us to have the region grow/shrink around the region anchor point with a change of font size by script or a user. em specifications should also be applied to cues – that matches the column count of CEA708/608 better.

    When regions overlap, the original region extension spec already suggested a “layer” cue setting. It will be easy to add it.

    Another change that we will ultimately need is the “scroll” setting : we will need to introduce support for scrolling text down or from left-to-right or right-to-left, e.g. vertical scrolling text seems to be used in some Chinese caption use cases.

    2. Unify Rendering Approach

    The introduction of regions created a second code path in the rendering spec with some duplication. At FOMS we discussed if it was possible to unify that. The suggestion is to render all cues into a region. Those that are not part of a region would be rendered into an anonymous region that covers the complete viewport. There may be some consequences to this, e.g. cue settings should be usable across all cues, no matter whether or not part of a region, and avoiding cue overlap may need to be done within regions.

    Here’s a rough outline of the path of the new rendering algorithm :

    (1) Render the regions :

    Specified Region Anonymous Region
    Render values as given : Render following values :
    • width
    • lines
    • regionanchor
    • viewportanchor
    • scroll
    • 100%
    • videoheight/lineheight
    • 0,0
    • 0,0
    • none

    (2) Render the cues :

    • Create a cue box and put it in its region (anonymous if none given).
    • Calculate position & size of cue box from cue settings (position, line, size).
    • Calculate position of cue text inside cue box from remaining cue settings (vertical, align).

    3. Vertical Features

    WebVTT includes vertical rendering, both right-to-left and left-to-right. However, regions are not defined for vertical. Eventually, we’re going to have to look at the vertical features of WebVTT with more details and figure out whether the spec is working for them and what real-world requirements we have missed. We hope we can get some help from users in countries where vertically rendered captions/subtitles are the norm.

    4. Best Practices

    Some of he WebVTT users at FOMS suggested it would be advantageous to start a list of “best practices” for how to author captions with WebVTT. Example recommendations are :

    • Use line numbers only to position cues from top or bottom of viewport. Don’t use otherwise.
    • Note that when the user increases the fontsize in rollup captions and thus introduces new line breaks, your cues will roll by faster because the number of lines of a rollup is fixed.
    • Make sure to use &lrm ; and &rlm ; UTF-8 markers to control the directionality of your text.

    It would be nice if somebody started such a document.

    5. Non-caption use cases

    Instead of continuing to look back and improve our support of captions/subtitles in WebVTT, one session at FOMS also went ahead and looked forward to other use cases. The following requirements came out of this :

    5.1 Preview Thumbnails

    A common use case for timed data is the use of preview thumbnails on the navigation bar of videos. A native implementation of preview thumbnails would allow crawlers and search engines to have a standardised way of extracting timed images for media files, so introduction of a new @kind value “thumbnails” was suggested.

    The content of a “thumbnails” cue could be any of :

    • an image URL
    • a sprite URL to a single image
    • a spatial & temporal media fragment URL to a media resource
    • base64 encoded image (data URI)
    • an iframe offset to the media resource

    The suggestion is to allow anything that would work in a img @src attribute as value in a cue of @kind=”thumbnails”. Responsive images might also be useful for a track of @kind=”thumbnails”. It may even be possible to define an inband thumbnail track based on the track of @kind=”thumbnails”. Such cues should also work in the JavaScript track API.

    5.2 Chapter markers

    There is interest to put richer content than just a chapter title into chapter cues. Often, chapters consist of a title, text and and image. The text is not so important, but the image is used almost everywhere that chapters are used. There may be a need to extend chapter cue content with images, similar to what a @kind=”thumbnails” track offers.

    The conclusion that we arrived at was that we need to make @kind=”thumbnails” work first and then look at using the learnings from that to extend @kind=”chapters”.

    5.3 Inband tracks for live video

    A difficult topic was opened with the question of how to transport text tracks in live video. In live captioning, end times are never created for cues, but are implied by the start time of the next cue. This is a use case that hasn’t been addressed in HTML5/WebVTT yet. An old proposal to allow a special end time value of “NEXT” was discussed and recommended for adoption. Also, there was support for the spec change that stops blocking loading VTT until all cues have been loaded.

    5.4 Cross-domain VTT loading

    A brief discussion centered around the fact that the spec disallows cross-domain loading of WebVTT files, but that no browser implements this. This needs to be discussion at the HTML WG level.

    6. Regions in live captioning

    The final topic that we discussed was how we could provide support for regions in live captioning.

    • The currently active region definitions will need to be come part of every header of every VTT file segment that HLS uses, so it’s available in case the cues in the segment file reference it.
    • “NEXT” in end time markers would make authoring of live captioned VTT files easier.
    • If the application wants to use 1 word at a time and doesn’t want to delay sending the word until the full cue is authored (e.g. in a Hangout type environment), we will need to introduce the concept of “cue continuation markers”, so we know that a cue could be extended with the next VTT file fragment.

    This is an extensive and impressive amount of discussion around WebVTT and a lot of new work to be performed in the future. I’m very grateful for all the people who have contributed to these discussions at FOMS and will hopefully continue to help get the specifications right.

  • WebVTT Discussions at FOMS

    1er janvier 2014, par silvia

    At the recent FOMS (Foundations of Open Media Software and Standards) Developer Workshop, we had a massive focus on WebVTT and the state of its feature set. You will find links to summaries of the individual discussions in the FOMS Schedule page. Here are some of the key results I went away with.

    1. WebVTT Regions

    The key driving force for improvements to WebVTT continues to be the accurate representation of CEA608/708 captioning. As part of that drive, we’ve introduced regions (the CEA708 “window” concept) to WebVTT. WebVTT regions satisfy multiple requirements of CEA608/708 captions :

    1. support for rollup captions
    2. support for background color and border color on a group of cues independent of the background color of the individual cue
    3. possibility to move a group of cues from one location on screen to a different
    4. support to specify an anchor point and a growth direction for cues when their text size changes
    5. support for specifying a fixed number of lines to be rendered
    6. possibility to specify which region is rendered in front of which other one when regions overlap

    While WebVTT regions enable us to satisfy all of the above points, the specification isn’t actually complete yet and some of the above needs aren’t satisfied yet.

    We have an open bug to move a region elsewhere. A first discussion at FOMS seemed to to indicate that we’ll have to add syntax for updating a region at a particular time and thus give region definitions a way to be valid only for a certain time frame. I can imagine that the region definitions that we have in the header of the WebVTT file now would have an implicitly defined time frame from the start to the end of the file, but can be overruled by a re-definition anywhere within the WebVTT file. That redefinition needs to provide a start and end time.

    We registered a bug to add specifying the width and height of regions (and possibly of cues) by em (i.e. by multiples of the largest character in a font). This should allow us to have the region grow/shrink around the region anchor point with a change of font size by script or a user. em specifications should also be applied to cues – that matches the column count of CEA708/608 better.

    When regions overlap, the original region extension spec already suggested a “layer” cue setting. It will be easy to add it.

    Another change that we will ultimately need is the “scroll” setting : we will need to introduce support for scrolling text down or from left-to-right or right-to-left, e.g. vertical scrolling text seems to be used in some Chinese caption use cases.

    2. Unify Rendering Approach

    The introduction of regions created a second code path in the rendering spec with some duplication. At FOMS we discussed if it was possible to unify that. The suggestion is to render all cues into a region. Those that are not part of a region would be rendered into an anonymous region that covers the complete viewport. There may be some consequences to this, e.g. cue settings should be usable across all cues, no matter whether or not part of a region, and avoiding cue overlap may need to be done within regions.

    Here’s a rough outline of the path of the new rendering algorithm :

    (1) Render the regions :

    Specified Region Anonymous Region
    Render values as given : Render following values :
    • width
    • lines
    • regionanchor
    • viewportanchor
    • scroll
    • 100%
    • videoheight/lineheight
    • 0,0
    • 0,0
    • none

    (2) Render the cues :

    • Create a cue box and put it in its region (anonymous if none given).
    • Calculate position & size of cue box from cue settings (position, line, size).
    • Calculate position of cue text inside cue box from remaining cue settings (vertical, align).

    3. Vertical Features

    WebVTT includes vertical rendering, both right-to-left and left-to-right. However, regions are not defined for vertical. Eventually, we’re going to have to look at the vertical features of WebVTT with more details and figure out whether the spec is working for them and what real-world requirements we have missed. We hope we can get some help from users in countries where vertically rendered captions/subtitles are the norm.

    4. Best Practices

    Some of he WebVTT users at FOMS suggested it would be advantageous to start a list of “best practices” for how to author captions with WebVTT. Example recommendations are :

    • Use line numbers only to position cues from top or bottom of viewport. Don’t use otherwise.
    • Note that when the user increases the fontsize in rollup captions and thus introduces new line breaks, your cues will roll by faster because the number of lines of a rollup is fixed.
    • Make sure to use &lrm ; and &rlm ; UTF-8 markers to control the directionality of your text.

    It would be nice if somebody started such a document.

    5. Non-caption use cases

    Instead of continuing to look back and improve our support of captions/subtitles in WebVTT, one session at FOMS also went ahead and looked forward to other use cases. The following requirements came out of this :

    5.1 Preview Thumbnails

    A common use case for timed data is the use of preview thumbnails on the navigation bar of videos. A native implementation of preview thumbnails would allow crawlers and search engines to have a standardised way of extracting timed images for media files, so introduction of a new @kind value “thumbnails” was suggested.

    The content of a “thumbnails” cue could be any of :

    • an image URL
    • a sprite URL to a single image
    • a spatial & temporal media fragment URL to a media resource
    • base64 encoded image (data URI)
    • an iframe offset to the media resource

    The suggestion is to allow anything that would work in a img @src attribute as value in a cue of @kind=”thumbnails”. Responsive images might also be useful for a track of @kind=”thumbnails”. It may even be possible to define an inband thumbnail track based on the track of @kind=”thumbnails”. Such cues should also work in the JavaScript track API.

    5.2 Chapter markers

    There is interest to put richer content than just a chapter title into chapter cues. Often, chapters consist of a title, text and and image. The text is not so important, but the image is used almost everywhere that chapters are used. There may be a need to extend chapter cue content with images, similar to what a @kind=”thumbnails” track offers.

    The conclusion that we arrived at was that we need to make @kind=”thumbnails” work first and then look at using the learnings from that to extend @kind=”chapters”.

    5.3 Inband tracks for live video

    A difficult topic was opened with the question of how to transport text tracks in live video. In live captioning, end times are never created for cues, but are implied by the start time of the next cue. This is a use case that hasn’t been addressed in HTML5/WebVTT yet. An old proposal to allow a special end time value of “NEXT” was discussed and recommended for adoption. Also, there was support for the spec change that stops blocking loading VTT until all cues have been loaded.

    5.4 Cross-domain VTT loading

    A brief discussion centered around the fact that the spec disallows cross-domain loading of WebVTT files, but that no browser implements this. This needs to be discussion at the HTML WG level.

    6. Regions in live captioning

    The final topic that we discussed was how we could provide support for regions in live captioning.

    • The currently active region definitions will need to be come part of every header of every VTT file segment that HLS uses, so it’s available in case the cues in the segment file reference it.
    • “NEXT” in end time markers would make authoring of live captioned VTT files easier.
    • If the application wants to use 1 word at a time and doesn’t want to delay sending the word until the full cue is authored (e.g. in a Hangout type environment), we will need to introduce the concept of “cue continuation markers”, so we know that a cue could be extended with the next VTT file fragment.

    This is an extensive and impressive amount of discussion around WebVTT and a lot of new work to be performed in the future. I’m very grateful for all the people who have contributed to these discussions at FOMS and will hopefully continue to help get the specifications right.