Recherche avancée

Médias (91)

Autres articles (82)

  • Contribute to a better visual interface

    13 avril 2011

    MediaSPIP is based on a system of themes and templates. Templates define the placement of information on the page, and can be adapted to a wide range of uses. Themes define the overall graphic appearance of the site.
    Anyone can submit a new graphic theme or template and make it available to the MediaSPIP community.

  • Configuration spécifique pour PHP5

    4 février 2011, par

    PHP5 est obligatoire, vous pouvez l’installer en suivant ce tutoriel spécifique.
    Il est recommandé dans un premier temps de désactiver le safe_mode, cependant, s’il est correctement configuré et que les binaires nécessaires sont accessibles, MediaSPIP devrait fonctionner correctement avec le safe_mode activé.
    Modules spécifiques
    Il est nécessaire d’installer certains modules PHP spécifiques, via le gestionnaire de paquet de votre distribution ou manuellement : php5-mysql pour la connectivité avec la (...)

  • Multilang : améliorer l’interface pour les blocs multilingues

    18 février 2011, par

    Multilang est un plugin supplémentaire qui n’est pas activé par défaut lors de l’initialisation de MediaSPIP.
    Après son activation, une préconfiguration est mise en place automatiquement par MediaSPIP init permettant à la nouvelle fonctionnalité d’être automatiquement opérationnelle. Il n’est donc pas obligatoire de passer par une étape de configuration pour cela.

Sur d’autres sites (5396)

  • What's the most desireable way to capture system display and audio in the form of individual encoded audio and video packets in go (language) ? [closed]

    11 janvier 2023, par Tiger Yang

    Question (read the context below first) :

    


    For those of you familiar with the capabilities of go, Is there a better way to go about all this ? Since ffmpeg is so ubiquitous, I'm sure it's been optomized to perfection, but what's the best way to capture system display and audio in the form of individual encoded audio and video packets in go (language), so that they can be then sent via webtransport-go ? I wish for it to prioritize efficiency and low latency, and ideally capture and encode the framebuffer directly like ffmpeg does.

    


    Thanks ! I have many other questions about this, but I think it's best to ask as I go.

    


    Context and what I've done so far :

    


    I'm writing a remote desktop software for my personal use because of grievances with current solutions out there. At the moment, it consists of a web app that uses the webtransport API to send input datagrams and receive AV packets on two dedicated unidirectional streams, and the webcodecs API to decode these packets. On the serverside, I originally planned to use python with the aioquic library as a webtransport server. Upon connection and authentication, the server would start ffmpeg as a subprocess with this command :

    


    ffmpeg -init_hw_device d3d11va -filter_complex ddagrab=video_size=1920x1080:framerate=60 -vcodec hevc_nvenc -tune ll -preset p7 -spatial_aq 1 -temporal_aq 1 -forced-idr 1 -rc cbr -b:v 400K -no-scenecut 1 -g 216000 -f hevc -

    


    What I really appreciate about this is that it uses windows' desktop duplication API to copy the framebuffer of my GPU and hand that directly to the on-die hardware encoder with zero round trips to the CPU. I think it's about as efficient and elegant a solution as I can manage. It then outputs the encoded stream to the stdout, which python can read and send to the client.

    


    As for the audio, there is another ffmpeg instance :

    


    ffmpeg -f dshow -channels 2 -sample_rate 48000 -sample_size 16 -audio_buffer_size 15 -i audio="RD Audio (High Definition Audio Device)" -acodec libopus -vbr on -application audio -mapping_family 0 -apply_phase_inv true -b:a 25K -fec false -packet_loss 0 -map 0 -f data -

    


    which listens to a physical loopback interface, which is literally just a short wire bridging the front panel headphone and microphone jacks (I'm aware of the quality loss of converting to analog and back, but the audio is then crushed down to 25kbps so it's fine) ()

    


    Unfortunately, aioquic was not easy to work with IMO, and I found webtransport-go https://github.com/adriancable/webtransport-go, which was a hell of a lot better in both simplicity and documentation. However, now I'm dealing with a whole new language, and I wanna ask : (above)

    


    EDIT : Here's the code for my server so far :

    


    

    

    package main

import (
    "bytes"
    "context"
    "fmt"
    "log"
    "net/http"
    "os/exec"
    "time"

    "github.com/adriancable/webtransport-go"
)

func warn(str string) {
    fmt.Printf("\n===== WARNING ===================================================================================================\n   %s\n=================================================================================================================\n", str)
}

func main() {

    password := []byte("abc")

    videoString := []string{
        "ffmpeg",
        "-init_hw_device", "d3d11va",
        "-filter_complex", "ddagrab=video_size=1920x1080:framerate=60",
        "-vcodec", "hevc_nvenc",
        "-tune", "ll",
        "-preset", "p7",
        "-spatial_aq", "1",
        "-temporal_aq", "1",
        "-forced-idr", "1",
        "-rc", "cbr",
        "-b:v", "500K",
        "-no-scenecut", "1",
        "-g", "216000",
        "-f", "hevc", "-",
    }

    audioString := []string{
        "ffmpeg",
        "-f", "dshow",
        "-channels", "2",
        "-sample_rate", "48000",
        "-sample_size", "16",
        "-audio_buffer_size", "15",
        "-i", "audio=RD Audio (High Definition Audio Device)",
        "-acodec", "libopus",
        "-mapping_family", "0",
        "-b:a", "25K",
        "-map", "0",
        "-f", "data", "-",
    }

    connected := false

    http.HandleFunc("/", func(writer http.ResponseWriter, request *http.Request) {
        session := request.Body.(*webtransport.Session)

        session.AcceptSession()
        fmt.Println("\nAccepted incoming WebTransport connection.")
        fmt.Println("Awaiting authentication...")

        authData, err := session.ReceiveMessage(session.Context()) // Waits here till first datagram
        if err != nil {                                            // if client closes connection before sending anything
            fmt.Println("\nConnection closed:", err)
            return
        }

        if len(authData) >= 2 && bytes.Equal(authData[2:], password) {
            if connected {
                session.CloseSession()
                warn("Client has authenticated, but a session is already taking place! Connection closed.")
                return
            } else {
                connected = true
                fmt.Println("Client has authenticated!\n")
            }
        } else {
            session.CloseSession()
            warn("Client has failed authentication! Connection closed. (" + string(authData[2:]) + ")")
            return
        }

        videoStream, _ := session.OpenUniStreamSync(session.Context())

        videoCmd := exec.Command(videoString[0], videoString[1:]...)
        go func() {
            videoOut, _ := videoCmd.StdoutPipe()
            videoCmd.Start()

            buffer := make([]byte, 15000)
            for {
                len, err := videoOut.Read(buffer)
                if err != nil {
                    break
                }
                if len > 0 {
                    videoStream.Write(buffer[:len])
                }
            }
        }()

        time.Sleep(50 * time.Millisecond)

        audioStream, err := session.OpenUniStreamSync(session.Context())

        audioCmd := exec.Command(audioString[0], audioString[1:]...)
        go func() {
            audioOut, _ := audioCmd.StdoutPipe()
            audioCmd.Start()

            buffer := make([]byte, 15000)
            for {
                len, err := audioOut.Read(buffer)
                if err != nil {
                    break
                }
                if len > 0 {
                    audioStream.Write(buffer[:len])
                }
            }
        }()

        for {
            data, err := session.ReceiveMessage(session.Context())
            if err != nil {
                videoCmd.Process.Kill()
                audioCmd.Process.Kill()

                connected = false

                fmt.Println("\nConnection closed:", err)
                break
            }

            if len(data) == 0 {

            } else if data[0] == byte(0) {
                fmt.Printf("Received mouse datagram: %s\n", data)
            }
        }

    })

    server := &webtransport.Server{
        ListenAddr: ":1024",
        TLSCert:    webtransport.CertFile{Path: "SSL/fullchain.pem"},
        TLSKey:     webtransport.CertFile{Path: "SSL/privkey.pem"},
        QuicConfig: &webtransport.QuicConfig{
            KeepAlive:      false,
            MaxIdleTimeout: 3 * time.Second,
        },
    }

    fmt.Println("Launching WebTransport server at", server.ListenAddr)
    ctx, cancel := context.WithCancel(context.Background())
    if err := server.Run(ctx); err != nil {
        log.Fatal(err)
        cancel()
    }

}

    


    


    



  • What is Multi-Touch Attribution ? (And How To Get Started)

    2 février 2023, par Erin — Analytics Tips

    Good marketing thrives on data. Or more precisely — its interpretation. Using modern analytics software, we can determine which marketing actions steer prospects towards the desired action (a conversion event). 

    An attribution model in marketing is a set of rules that determine how various marketing tactics and channels impact the visitor’s progress towards a conversion. 

    Yet, as customer journeys become more complicated and involve multiple “touches”, standard marketing reports no longer tell the full picture. 

    That’s when multi-touch attribution analysis comes to the fore. 

    What is Multi-Touch Attribution ?

    Multi-touch attribution (also known as multi-channel attribution or cross-channel attribution) measures the impact of all touchpoints on the consumer journey on conversion. 

    Unlike single-touch reporting, multi-touch attribution models give credit to each marketing element — a social media ad, an on-site banner, an email link click, etc. By seeing impacts from every touchpoint and channel, marketers can avoid false assumptions or subpar budget allocations.

    To better understand the concept, let’s interpret the same customer journey using a standard single-touch report vs a multi-touch attribution model. 

    Picture this : Jammie is shopping around for a privacy-centred web analytics solution. She saw a recommendation on Twitter and ended up on the Matomo website. After browsing a few product pages and checking comparisons with other web analytics tools, she signs up for a webinar. One week after attending, Jammie is convinced that Matomo is the right tool for her business and goes directly to the Matomo website a starts a free trial. 

    • A standard single-touch report would attribute 100% of the conversion to direct traffic, which doesn’t give an accurate view of the multiple touchpoints that led Jammie to start a free trial. 
    • A multi-channel attribution report would showcase all the channels involved in the free trial conversion — social media, website content, the webinar, and then the direct traffic source.

    In other words : Multi-touch attribution helps you understand how prospects move through the sales funnel and which elements tinder them towards the desired outcome. 

    Types of Attribution Models

    As marketers, we know that multiple factors play into a conversion — channel type, timing, user’s stage on the buyer journey and so on. Various attribution models exist to reflect this variability. 

    Types of Attribution Models

    First Interaction attribution model (otherwise known as first touch) gives all credit for the conversion to the first channel (for example — a referral link) and doesn’t report on all the other interactions a user had with your company (e.g., clicked a newsletter link, engaged with a landing page, or browsed the blog campaign).

    First-touch helps optimise the top of your funnel and establish which channels bring the best leads. However, it doesn’t offer any insight into other factors that persuaded a user to convert. 

    Last Interaction attribution model (also known as last touch) allocates 100% credit to the last channel before conversion — be it direct traffic, paid ad, or an internal product page.

    The data is useful for optimising the bottom-of-the-funnel (BoFU) elements. But you have no visibility into assisted conversions — interactions a user had prior to conversion. 

    Last Non-Direct attribution model model excludes direct traffic and assigns 100% credit for a conversion to the last channel a user interacted with before converting. For instance, a social media post will receive 100% of credit if a shopper buys a product three days later. 

    This model is more telling about the other channels, involved in the sales process. Yet, you’re seeing only one step backwards, which may not be sufficient for companies with longer sales cycles.

    Linear attribution model distributes an equal credit for a conversion between all tracked touchpoints.

    For instance, with a four touchpoint conversion (e.g., an organic visit, then a direct visit, then a social visit, then a visit and conversion from an ad campaign) each touchpoint would receive 25% credit for that single conversion.

    This is the simplest multi-channel attribution modelling technique many tools support. The nuance is that linear models don’t reflect the true impact of various events. After all, a paid ad that introduced your brand to the shopper and a time-sensitive discount code at the checkout page probably did more than the blog content a shopper browsed in between. 

    Position Based attribution model allocates a 40% credit to the first and the last touchpoints and then spreads the remaining 20% across the touchpoints between the first and last. 

    This attribution model comes in handy for optimising conversions across the top and the bottom of the funnel. But it doesn’t provide much insight into the middle, which can skew your decision-making. For instance, you may overlook cases when a shopper landed via a social media post, then was re-engaged via email, and proceeded to checkout after an organic visit. Without email marketing, that sale may not have happened.

    Time decay attribution model adjusts the credit, based on the timing of the interactions. Touchpoints that preceded the conversion get the highest score, while the first ones get less weight (e.g., 5%-5%-10%-15%-25%-30%).

    This multi-channel attribution model works great for tracking the bottom of the funnel, but it underestimates the impact of brand awareness campaigns or assisted conversions at mid-stage. 

    Why Use Multi-Touch Attribution Modelling

    Multi-touch attribution provides you with the full picture of your funnel. With accurate data across all touchpoints, you can employ targeted conversion rate optimisation (CRO) strategies to maximise the impact of each campaign. 

    Most marketers and analysts prefer using multi-touch attribution modelling — and for some good reasons.

    Issues multi-touch attribution solves 

    • Funnel visibility. Understand which tactics play an important role at the top, middle and bottom of your funnel, instead of second-guessing what’s working or not. 
    • Budget allocations. Spend money on channels and tactics that bring a positive return on investment (ROI). 
    • Assisted conversions. Learn how different elements and touchpoints cumulatively contribute to the ultimate goal — a conversion event — to optimise accordingly. 
    • Channel segmentation. Determine which assets drive the most qualified and engaged leads to replicate them at scale.
    • Campaign benchmarking. Compare how different marketing activities from affiliate marketing to social media perform against the same metrics.

    How To Get Started With Multi-Touch Attribution 

    To make multi-touch attribution part of your analytics setup, follow the next steps :

    1. Define Your Marketing Objectives 

    Multi-touch attribution helps you better understand what led people to convert on your site. But to capture that, you need to first map the standard purchase journeys, which include a series of touchpoints — instances, when a prospect forms an opinion about your business.

    Touchpoints include :

    • On-site interactions (e.g., reading a blog post, browsing product pages, using an on-site calculator, etc.)
    • Off-site interactions (e.g., reading a review, clicking a social media link, interacting with an ad, etc.)

    Combined these interactions make up your sales funnel — a designated path you’ve set up to lead people toward the desired action (aka a conversion). 

    Depending on your business model, you can count any of the following as a conversion :

    • Purchase 
    • Account registration 
    • Free trial request 
    • Contact form submission 
    • Online reservation 
    • Demo call request 
    • Newsletter subscription

    So your first task is to create a set of conversion objectives for your business and add them as Goals or Conversions in your web analytics solution. Then brainstorm how various touchpoints contribute to these objectives. 

    Web analytics tools with multi-channel attribution, like Matomo, allow you to obtain an extra dimension of data on touchpoints via Tracked Events. Using Event Tracking, you can analyse how many people started doing a desired action (e.g., typing details into the form) but never completed the task. This way you can quickly identify “leaking” touchpoints in your funnel and fix them. 

    2. Select an Attribution Model 

    Multi-attribution models have inherent tradeoffs. Linear attribution model doesn’t always represent the role and importance of each channel. Position-based attribution model emphasises the role of the last and first channel while diminishing the importance of assisted conversions. Time-decay model, on the contrary, downplays the role awareness-related campaigns played.

    To select the right attribution model for your business consider your objectives. Is it more important for you to understand your best top of funnel channels to optimise customer acquisition costs (CAC) ? Or would you rather maximise your on-site conversion rates ? 

    Your industry and the average cycle length should also guide your choice. Position-based models can work best for eCommerce and SaaS businesses where both CAC and on-site conversion rates play an important role. Manufacturing companies or educational services providers, on the contrary, will benefit more from a time-decay model as it better represents the lengthy sales cycles. 

    3. Collect and Organise Data From All Touchpoints 

    Multi-touch attribution models are based on available funnel data. So to get started, you will need to determine which data sources you have and how to best leverage them for attribution modelling. 

    Types of data you should collect : 

    • General web analytics data : Insights on visitors’ on-site actions — visited pages, clicked links, form submissions and more.
    • Goals (Conversions) : Reports on successful conversions across different types of assets. 
    • Behavioural user data : Some tools also offer advanced features such as heatmaps, session recording and A/B tests. These too provide ample data into user behaviours, which you can use to map and optimise various touchpoints.

    You can also implement extra tracking, for instance for contact form submissions, live chat contacts or email marketing campaigns to identify repeat users in your system. Just remember to stay on the good side of data protection laws and respect your visitors’ privacy. 

    Separately, you can obtain top-of-the-funnel data by analysing referral traffic sources (channel, campaign type, used keyword, etc). A Tag Manager comes in handy as it allows you to zoom in on particular assets (e.g., a newsletter, an affiliate, a social campaign, etc). 

    Combined, these data points can be parsed by an app, supporting multi-touch attribution (or a custom algorithm) and reported back to you as specific findings. 

    Sounds easy, right ? Well, the devil is in the details. Getting ample, accurate data for multi-touch attribution modelling isn’t easy. 

    Marketing analytics has an accuracy problem, mainly for two reasons :

    • Cookie consent banner rejection 
    • Data sampling application

    Please note that we are not able to provide legal advice, so it’s important that you consult with your own DPO to ensure compliance with all relevant laws and regulations.

    If you’re collecting web analytics in the EU, you know that showing a cookie consent banner is a GDPR must-do. But many consumers don’t often rush to accept cookie consent banners. The average consent rate for cookies in 2021 stood at 54% in Italy, 45% in France, and 44% in Germany. The consent rates are likely lower in 2023, as Google was forced to roll out a “reject all” button for cookie tracking in Europe, while privacy organisations lodge complaints against individual businesses for deceptive banners. 

    For marketers, cookie rejection means substantial gaps in analytics data. The good news is that you can fill in those gaps by using a privacy-centred web analytics tool like Matomo. 

    Matomo takes extra safeguards to protect user privacy and supports fully cookieless tracking. Because of that, Matomo is legally exempt from tracking consent in France. Plus, you can configure to use our analytics tool without consent banners in other markets outside of Germany and the UK. This way you get to retain the data you need for audience modelling without breaching any privacy regulations. 

    Data sampling application partially stems from the above. When a web analytics or multi-channel attribution tool cannot secure first-hand data, the “guessing game” begins. Google Analytics, as well as other tools, often rely on synthetic AI-generated data to fill in the reporting gaps. Respectively, your multi-attribution model doesn’t depict the real state of affairs. Instead, it shows AI-produced guesstimates of what transpired whenever not enough real-world evidence is available.

    4. Evaluate and Select an Attribution Tool 

    Google Analytics (GA) offers several multi-touch attribution models for free (linear, time-decay and position-based). The disadvantage of GA multi-touch attribution is its lower accuracy due to cookie rejection and data sampling application.

    At the same time, you cannot create custom credit allocations for the proposed models, unless you have the paid version of GA, Google Analytics 360. This version of GA comes with a custom Attribution Modeling Tool (AMT). The price tag, however, starts at USD $50,000 per year. 

    Matomo Cloud offers multi-channel conversion attribution as a feature and it is available as a plug-in on the marketplace for Matomo On-Premise. We support linear, position-based, first-interaction, last-interaction, last non-direct and time-decay modelling, based fully on first-hand data. You also get more precise insights because cookie consent isn’t an issue with us. 

    Most multi-channel attribution tools, like Google Analytics and Matomo, provide out-of-the-box multi-touch attribution models. But other tools, like Matomo On-Premise, also provide full access to raw data so you can develop your own multi-touch attribution models and do custom attribution analysis. The ability to create custom attribution analysis is particularly beneficial for data analysts or organisations with complex and unique buyer journeys. 

    Conclusion

    Ultimately, multi-channel attribution gives marketers greater visibility into the customer journey. By analysing multiple touchpoints, you can establish how various marketing efforts contribute to conversions. Then use this information to inform your promotional strategy, budget allocations and CRO efforts. 

    The key to benefiting the most from multi-touch attribution is accurate data. If your analytics solution isn’t telling you the full story, your multi-touch model won’t either. 

    Collect accurate visitor data for multi-touch attribution modelling with Matomo. Start your free 21-day trial now

  • Parallelize Youtube video frame download using yt-dlp and cv2

    4 mars 2023, par zulle99

    My task is to download multiple sequences of successive low resolution frames of Youtube videos.

    


    I summarize the main parts of the process :

    


      

    • Each bag of shots have a dimension of half a second (depending on the current fps)
    • 


    • In order to grab useful frames I've decided to remove the initial and final 10% of each video since it is common to have an intro and outro. Moreover
    • 


    • I've made an array of pair of initial and final frame to distribute the load on multiple processes using ProcessPoolExecutor(max_workers=multiprocessing.cpu_count())
    • 


    • In case of failure/exception I completly remove the relative directory
    • 


    


    The point is that it do not scale up, since while running I noticesd that all CPUs had always a load lower that the 20% more or less. In addition since with these shots I have to run multiple CNNs, to prevent overfitting it is suggested to have a big dataset and not a bounch of shots.

    


    Here it is the code :

    


    import yt_dlp
import os
from tqdm import tqdm
import cv2
import shutil
import time
import random
from concurrent.futures import ProcessPoolExecutor
import multiprocessing
import pandas as pd
import numpy as np
from pathlib import Path
import zipfile


# PARAMETERS
percentage_train_test = 50
percentage_bag_shots = 20
percentage_to_ignore = 10

zip_f_name = f'VideoClassificationDataset_{percentage_train_test}_{percentage_bag_shots}_{percentage_to_ignore}'
dataset_path = Path('/content/VideoClassificationDataset')

# DOWNOAD ZIP FILES
!wget --no-verbose https://github.com/gtoderici/sports-1m-dataset/archive/refs/heads/master.zip

# EXTRACT AND DELETE THEM
!unzip -qq -o '/content/master.zip' 
!rm '/content/master.zip'

DATA = {'train_partition.txt': {},
        'test_partition.txt': {}}

LABELS = []

train_dict = {}
test_dict = {}

path = '/content/sports-1m-dataset-master/original'

for f in os.listdir(path):
  with open(path + '/' + f) as f_txt:
    lines = f_txt.readlines()
    for line in lines:
      splitted_line = line.split(' ')
      label_indices = splitted_line[1].rstrip('\n').split(',') 
      DATA[f][splitted_line[0]] = list(map(int, label_indices))

with open('/content/sports-1m-dataset-master/labels.txt') as f_labels:
  LABELS = f_labels.read().splitlines()


TRAIN = DATA['train_partition.txt']
TEST = DATA['test_partition.txt']
print('Original Train Test length: ', len(TRAIN), len(TEST))

# sample a subset percentage_train_test
TRAIN = dict(random.sample(TRAIN.items(), (len(TRAIN)*percentage_train_test)//100))
TEST = dict(random.sample(TEST.items(), (len(TEST)*percentage_train_test)//100))

print(f'Sampled {percentage_train_test} Percentage  Train Test length: ', len(TRAIN), len(TEST))


if not os.path.exists(dataset_path): os.makedirs(dataset_path)
if not os.path.exists(f'{dataset_path}/train'): os.makedirs(f'{dataset_path}/train')
if not os.path.exists(f'{dataset_path}/test'): os.makedirs(f'{dataset_path}/test')


    


    Function to extract a sequence of continuous frames :

    


    def extract_frames(directory, url, idx_bag, start_frame, end_frame):
  capture = cv2.VideoCapture(url)
  count = start_frame

  capture.set(cv2.CAP_PROP_POS_FRAMES, count)
  os.makedirs(f'{directory}/bag_of_shots{str(idx_bag)}')

  while count < end_frame:

    ret, frame = capture.read()

    if not ret: 
      shutil.rmtree(f'{directory}/bag_of_shots{str(idx_bag)}')
      return False

    filename = f'{directory}/bag_of_shots{str(idx_bag)}/shot{str(count - start_frame)}.png'

    cv2.imwrite(filename, frame)
    count += 1

  capture.release()
  return True


    


    Function to spread the load along multiple processors :

    


    def video_to_frames(video_url, labels_list, directory, dic, percentage_of_bags):
  url_id = video_url.split('=')[1]
  path_until_url_id = f'{dataset_path}/{directory}/{url_id}'
  try:   

    ydl_opts = {
        'ignoreerrors': True,
        'quiet': True,
        'nowarnings': True,
        'simulate': True,
        'ignorenoformatserror': True,
        'verbose':False,
        'cookies': '/content/all_cookies.txt',
        #https://stackoverflow.com/questions/63329412/how-can-i-solve-this-youtube-dl-429
    }
    ydl = yt_dlp.YoutubeDL(ydl_opts)
    info_dict = ydl.extract_info(video_url, download=False)

    if(info_dict is not None and  info_dict['fps'] >= 20):
      # I must have a least 20 frames per seconds since I take half of second bag of shots for every video

      formats = info_dict.get('formats', None)

      # excluding the initial and final 10% of each video to avoid noise
      video_length = info_dict['duration'] * info_dict['fps']

      shots = info_dict['fps'] // 2

      to_ignore = (video_length * percentage_to_ignore) // 100
      new_len = video_length - (to_ignore * 2)
      tot_stored_bags = ((new_len // shots) * percentage_of_bags) // 100   # ((total_possbile_bags // shots) * percentage_of_bags) // 100
      if tot_stored_bags == 0: tot_stored_bags = 1 # minimum 1 bag of shots

      skip_rate_between_bags = (new_len - (tot_stored_bags * shots)) // (tot_stored_bags-1) if tot_stored_bags > 1 else 0

      chunks = [[to_ignore+(bag*(skip_rate_between_bags+shots)), to_ignore+(bag*(skip_rate_between_bags+shots))+shots] for bag in range(tot_stored_bags)]
      # sequence of [[start_frame, end_frame], [start_frame, end_frame], [start_frame, end_frame], ...]


      # ----------- For the moment I download only shots form video that has 144p resolution -----------

      res = {
          '160': '144p',
          '133': '240p',
          '134': '360p',
          '135': '360p',
          '136': '720p'
      }

      format_id = {}
      for f in formats: format_id[f['format_id']] = f
      #for res in resolution_id:
      if list(res.keys())[0] in list(format_id.keys()):
          video = format_id[list(res.keys())[0]]
          url = video.get('url', None)
          if(video.get('url', None) != video.get('manifest_url', None)):

            if not os.path.exists(path_until_url_id): os.makedirs(path_until_url_id)

            with ProcessPoolExecutor(max_workers=multiprocessing.cpu_count()) as executor:
              for idx_bag, f in enumerate(chunks): 
                res = executor.submit(
                  extract_frames, directory = path_until_url_id, url = url, idx_bag = idx_bag, start_frame = f[0], end_frame = f[1])
                
                if res.result() is True: 
                  l = np.zeros(len(LABELS), dtype=int) 
                  for label in labels_list: l[label] = 1
                  l = np.append(l, [shots]) # appending the number of shots taken in the list before adding it on the dictionary

                  dic[f'{directory}/{url_id}/bag_of_shots{str(idx_bag)}'] = l.tolist()


  except Exception as e:
    shutil.rmtree(path_until_url_id)
    pass


    


    Download of TRAIN bag of shots :

    


    start_time = time.time()
pbar = tqdm(enumerate(TRAIN.items()), total = len(TRAIN.items()), leave=False)

for _, (url, labels_list) in pbar: video_to_frames(
  video_url = url, labels_list = labels_list, directory = 'train', dic = train_dict, percentage_of_bags = percentage_bag_shots)

print("--- %s seconds ---" % (time.time() - start_time))


    


    Download of TEST bag of shots :

    


    start_time = time.time()
pbar = tqdm(enumerate(TEST.items()), total = len(TEST.items()), leave=False)

for _, (url, labels_list) in pbar: video_to_frames(
  video_url = url, labels_list = labels_list, directory = 'test', dic = test_dict, percentage_of_bags = percentage_bag_shots)

print("--- %s seconds ---" % (time.time() - start_time))


    


    Save the .csv files

    


    train_df = pd.DataFrame.from_dict(train_dict, orient='index', dtype=int).reset_index(level=0)
train_df = train_df.rename(columns={train_df.columns[-1]: 'shots'})
train_df.to_csv('/content/VideoClassificationDataset/train.csv', index=True)

test_df = pd.DataFrame.from_dict(test_dict, orient='index', dtype=int).reset_index(level=0)
test_df = test_df.rename(columns={test_df.columns[-1]: 'shots'})
test_df.to_csv('/content/VideoClassificationDataset/test.csv', index=True)