I have fixed the problem of "different IDs for the same content".
List of changes:
- Revert to the old behavior of extracting media IDs.
- Support segmented videos (extract only the relevant parts of the
whole episode).
- Reduce verbosity of _VALID_URL.
The SRGSSR Play websites now often use the integrationlayer version 2.0
API instead of version 1.0. I have modified the SRGSSR information
extractor to use this new integrationlayer instead of the old one. All
the old media support this new version too, so there is no need to
stick with the old one. It is possible that support for the old
integrationlayer will be dropped, so this switch has to be made anyway.
Here is a list of the changes:
- Use integrationlayer version 2.0 API instead of version 1.0.
- Ensure consistent media IDs. In the old version of the information
extractor, youtube-dl extracts the same video for the URLs
"http://www.srf.ch/play/tv/schweiz-aktuell/video/schweiz-aktuell-vom-22-02-2017?id=d0206674-6125-49ef-b85d-3cf36d24d582"
and
"http://www.srf.ch/play/tv/schweiz-aktuell/video/walliser-baubaubranche-wehrt-sich?id=967590f0-f812-4941-8f6a-06a2db7bd083",
but assigns different media IDs. Now it still extracts the same video
(since there is no support to cut videos into parts in youtube-dl,
right?), but it uses the same media ID for both URLs. So we always have
consistent media IDs for the media.
- Add extraction of media duration.
- Add extraction of video subtitles.
- Use multiline regular expressions for _VALID_URL for better
readability.
- Indicate direct podcast downloads in format_id.
- Update tests.
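As an illustration of the multiline _VALID_URL style mentioned above, here is a minimal sketch using Python's verbose regex flag `(?x)`. The pattern, the list of site names and the helper `parse_play_url` are hypothetical and only approximate the URL shape of the two examples quoted above; this is not the extractor's actual _VALID_URL.

```python
import re

# Hypothetical pattern for illustration only; not the real _VALID_URL.
# With (?x), whitespace and "#" comments inside the pattern are ignored,
# so each URL component can sit on its own line.
VALID_URL = r'''(?x)
    https?://
    (?:www\.)?
    (?P<bu>srf|rts|rsi|rtr|swissinfo)\.ch/   # site name (assumed list)
    play/
    (?:tv|radio)/
    [^/]+/                                   # show slug
    (?P<type>video|audio)/
    [^?]+                                    # episode or segment slug
    \?id=(?P<id>[0-9a-f\-]{36})              # media ID from the query string
'''


def parse_play_url(url):
    """Return (site, media type, media ID), or None if the URL does not match."""
    m = re.match(VALID_URL, url)
    if m is None:
        return None
    return m.group('bu'), m.group('type'), m.group('id')
```

Called on the first URL quoted above, `parse_play_url` picks the ID out of the `id` query parameter; the slug before it plays no role in the result.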
To fix rg3#10359, use `-attach` to embed the thumbnail as an attachment.
In this version, the attached filename follows the https://matroska.org/technical/cover_art/index.html convention, as pointed out in #6046.
The image dimensions are probably not going to respect the convention though, and we assume the thumbnail is in landscape orientation.
As only two image types seem to be recognized by the convention, they are hardcoded (the mimetypes module would match on extension too).
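A minimal sketch of how the attachment arguments could be assembled, assuming ffmpeg's `-attach` output option together with per-stream metadata on attachment streams (`-metadata:s:t`). The helper `cover_attachment_args` and the `COVER_MIMETYPES` table are hypothetical names for illustration and are not taken from this change; the `cover_land` base name reflects the landscape assumption stated above.

```python
import os

# The two image types the cover art convention appears to recognize,
# hardcoded here instead of relying on the mimetypes module.
COVER_MIMETYPES = {
    'jpg': 'image/jpeg',
    'jpeg': 'image/jpeg',
    'png': 'image/png',
}


def cover_attachment_args(thumbnail_path):
    """Build ffmpeg arguments that attach a thumbnail as landscape cover art."""
    ext = os.path.splitext(thumbnail_path)[1][1:].lower()
    mimetype = COVER_MIMETYPES.get(ext)
    if mimetype is None:
        raise ValueError('unsupported thumbnail type: %r' % ext)
    # Normalize the attached name to cover_land.jpg or cover_land.png.
    cover_ext = 'jpg' if mimetype == 'image/jpeg' else 'png'
    return [
        '-attach', thumbnail_path,
        '-metadata:s:t', 'mimetype=' + mimetype,
        '-metadata:s:t', 'filename=cover_land.' + cover_ext,
    ]
```

The returned list is meant to be spliced into a larger ffmpeg command line that copies the existing streams into the Matroska output; thumbnails in any other format would have to be converted before being attached.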