Merge remote-tracking branch 'rg3/master'

This commit is contained in:
lucas 2017-04-26 10:56:28 -04:00
commit 011d1ab1be
41 changed files with 1049 additions and 442 deletions

View File

@ -6,8 +6,8 @@
--- ---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.04.17*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected. ### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.04.26*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.04.17** - [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.04.26**
### Before submitting an *issue* make sure you have: ### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections - [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: [] [debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj'] [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2017.04.17 [debug] youtube-dl version 2017.04.26
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {} [debug] Proxy map: {}

2
.gitignore vendored
View File

@ -35,8 +35,8 @@ updates_key.pem
*.mkv *.mkv
*.swf *.swf
*.part *.part
*.ytdl
*.swp *.swp
test/testdata
test/local_parameters.json test/local_parameters.json
.tox .tox
youtube-dl.zsh youtube-dl.zsh

View File

@ -1,3 +1,47 @@
version 2017.04.26
Core
* Introduce --keep-fragments for keeping fragments of fragmented download
on disk after download is finished
* [YoutubeDL] Fix output template for missing timestamp (#12796)
* [socks] Handle cases where credentials are required but missing
* [extractor/common] Improve HLS extraction (#12211)
- Extract m3u8 parsing to separate method
- Improve rendition groups extraction
- Build stream name according stream GROUP-ID
- Ignore reference to AUDIO group without URI when stream has no CODECS
- Use float for scaled tbr in _parse_m3u8_formats
* [utils] Add support for TTML styles in dfxp2srt
* [downloader/hls] No need to download keys for fragments that have been
already downloaded
* [downloader/fragment] Improve fragment downloading
- Resume immediately
- Don't concatenate fragments and decrypt them on every resume
- Optimize disk storage usage, don't store intermediate fragments on disk
- Store bookkeeping download state file
+ [extractor/common] Add support for multiple getters in try_get
+ [extractor/common] Add support for video of WebPage context in _json_ld
(#12778)
+ [extractor/common] Relax JWPlayer regular expression and remove
duplicate URLs (#12768)
Extractors
* [iqiyi] Fix extraction of Yule videos
* [vidio] Improve extraction and sort formats
+ [brightcove] Match only video elements with data-video-id attribute
* [iqiyi] Fix playlist detection (#12504)
- [azubu] Remove extractor (#12813)
* [porn91] Fix extraction (#12814)
* [vidzi] Fix extraction (#12793)
+ [amp] Extract error message (#12795)
+ [xfileshare] Add support for gorillavid.com and daclips.com (#12776)
* [instagram] Fix extraction (#12777)
+ [generic] Support Brightcove videos in <iframe> (#12482)
+ [brightcove] Support URLs with bcpid instead of playerID (#12482)
* [brightcove] Fix _extract_url (#12782)
+ [odnoklassniki] Extract HLS formats
version 2017.04.17 version 2017.04.17
Extractors Extractors

View File

@ -1,7 +1,7 @@
all: youtube-dl README.md CONTRIBUTING.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish supportedsites all: youtube-dl README.md CONTRIBUTING.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish supportedsites
clean: clean:
rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish youtube_dl/extractor/lazy_extractors.py *.dump *.part* *.info.json *.mp4 *.m4a *.flv *.mp3 *.avi *.mkv *.webm *.3gp *.wav *.ape *.swf *.jpg *.png CONTRIBUTING.md.tmp ISSUE_TEMPLATE.md.tmp youtube-dl youtube-dl.exe rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish youtube_dl/extractor/lazy_extractors.py *.dump *.part* *.ytdl *.info.json *.mp4 *.m4a *.flv *.mp3 *.avi *.mkv *.webm *.3gp *.wav *.ape *.swf *.jpg *.png CONTRIBUTING.md.tmp ISSUE_TEMPLATE.md.tmp youtube-dl youtube-dl.exe
find . -name "*.pyc" -delete find . -name "*.pyc" -delete
find . -name "*.class" -delete find . -name "*.class" -delete

View File

@ -187,6 +187,9 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
and ISM) and ISM)
--abort-on-unavailable-fragment Abort downloading when some fragment is not --abort-on-unavailable-fragment Abort downloading when some fragment is not
available available
--keep-fragments Keep downloaded fragments on disk after
downloading is finished; fragments are
erased by default
--buffer-size SIZE Size of download buffer (e.g. 1024 or 16K) --buffer-size SIZE Size of download buffer (e.g. 1024 or 16K)
(default is 1024) (default is 1024)
--no-resize-buffer Do not automatically adjust the buffer --no-resize-buffer Do not automatically adjust the buffer

View File

@ -81,8 +81,6 @@
- **AZMedien**: AZ Medien videos - **AZMedien**: AZ Medien videos
- **AZMedienPlaylist**: AZ Medien playlists - **AZMedienPlaylist**: AZ Medien playlists
- **AZMedienShowPlaylist**: AZ Medien show playlists - **AZMedienShowPlaylist**: AZ Medien show playlists
- **Azubu**
- **AzubuLive**
- **BaiduVideo**: 百度视频 - **BaiduVideo**: 百度视频
- **bambuser** - **bambuser**
- **bambuser:channel** - **bambuser:channel**

View File

@ -3,12 +3,13 @@
from __future__ import unicode_literals from __future__ import unicode_literals
# Allow direct execution # Allow direct execution
import io
import os import os
import sys import sys
import unittest import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import FakeYDL, expect_dict from test.helper import FakeYDL, expect_dict, expect_value
from youtube_dl.extractor.common import InfoExtractor from youtube_dl.extractor.common import InfoExtractor
from youtube_dl.extractor import YoutubeIE, get_info_extractor from youtube_dl.extractor import YoutubeIE, get_info_extractor
from youtube_dl.utils import encode_data_uri, strip_jsonp, ExtractorError, RegexNotFoundError from youtube_dl.utils import encode_data_uri, strip_jsonp, ExtractorError, RegexNotFoundError
@ -175,6 +176,354 @@ jwplayer("mediaplayer").setup({"abouttext":"Visit Indie DB","aboutlink":"http:\/
}] }]
}) })
def test_parse_m3u8_formats(self):
_TEST_CASES = [
(
# https://github.com/rg3/youtube-dl/issues/11507
# http://pluzz.francetv.fr/videos/le_ministere.html
'pluzz_francetv_11507',
'http://replayftv-vh.akamaihd.net/i/streaming-adaptatif_france-dom-tom/2017/S16/J2/156589847-58f59130c1f52-,standard1,standard2,standard3,standard4,standard5,.mp4.csmil/master.m3u8?caption=2017%2F16%2F156589847-1492488987.m3u8%3Afra%3AFrancais&audiotrack=0%3Afra%3AFrancais',
[{
'url': 'http://replayftv-vh.akamaihd.net/i/streaming-adaptatif_france-dom-tom/2017/S16/J2/156589847-58f59130c1f52-,standard1,standard2,standard3,standard4,standard5,.mp4.csmil/master.m3u8?caption=2017%2F16%2F156589847-1492488987.m3u8%3Afra%3AFrancais&audiotrack=0%3Afra%3AFrancais',
'ext': 'mp4',
'format_id': 'meta',
'format_note': 'Quality selection URL',
'protocol': 'm3u8',
'preference': -100,
'resolution': 'multiple'
}, {
'url': 'http://replayftv-vh.akamaihd.net/i/streaming-adaptatif_france-dom-tom/2017/S16/J2/156589847-58f59130c1f52-,standard1,standard2,standard3,standard4,standard5,.mp4.csmil/index_0_av.m3u8?null=0',
'manifest_url': 'http://replayftv-vh.akamaihd.net/i/streaming-adaptatif_france-dom-tom/2017/S16/J2/156589847-58f59130c1f52-,standard1,standard2,standard3,standard4,standard5,.mp4.csmil/index_0_av.m3u8?null=0',
'ext': 'mp4',
'format_id': '180',
'protocol': 'm3u8',
'acodec': 'mp4a.40.2',
'vcodec': 'avc1.66.30',
'tbr': 180,
'width': 256,
'height': 144,
}, {
'url': 'http://replayftv-vh.akamaihd.net/i/streaming-adaptatif_france-dom-tom/2017/S16/J2/156589847-58f59130c1f52-,standard1,standard2,standard3,standard4,standard5,.mp4.csmil/index_1_av.m3u8?null=0',
'manifest_url': 'http://replayftv-vh.akamaihd.net/i/streaming-adaptatif_france-dom-tom/2017/S16/J2/156589847-58f59130c1f52-,standard1,standard2,standard3,standard4,standard5,.mp4.csmil/index_1_av.m3u8?null=0',
'ext': 'mp4',
'format_id': '303',
'protocol': 'm3u8',
'acodec': 'mp4a.40.2',
'vcodec': 'avc1.66.30',
'tbr': 303,
'width': 320,
'height': 180,
}, {
'url': 'http://replayftv-vh.akamaihd.net/i/streaming-adaptatif_france-dom-tom/2017/S16/J2/156589847-58f59130c1f52-,standard1,standard2,standard3,standard4,standard5,.mp4.csmil/index_2_av.m3u8?null=0',
'manifest_url': 'http://replayftv-vh.akamaihd.net/i/streaming-adaptatif_france-dom-tom/2017/S16/J2/156589847-58f59130c1f52-,standard1,standard2,standard3,standard4,standard5,.mp4.csmil/index_2_av.m3u8?null=0',
'ext': 'mp4',
'format_id': '575',
'protocol': 'm3u8',
'acodec': 'mp4a.40.2',
'vcodec': 'avc1.66.30',
'tbr': 575,
'width': 512,
'height': 288,
}, {
'url': 'http://replayftv-vh.akamaihd.net/i/streaming-adaptatif_france-dom-tom/2017/S16/J2/156589847-58f59130c1f52-,standard1,standard2,standard3,standard4,standard5,.mp4.csmil/index_3_av.m3u8?null=0',
'manifest_url': 'http://replayftv-vh.akamaihd.net/i/streaming-adaptatif_france-dom-tom/2017/S16/J2/156589847-58f59130c1f52-,standard1,standard2,standard3,standard4,standard5,.mp4.csmil/index_3_av.m3u8?null=0',
'ext': 'mp4',
'format_id': '831',
'protocol': 'm3u8',
'acodec': 'mp4a.40.2',
'vcodec': 'avc1.77.30',
'tbr': 831,
'width': 704,
'height': 396,
}, {
'url': 'http://replayftv-vh.akamaihd.net/i/streaming-adaptatif_france-dom-tom/2017/S16/J2/156589847-58f59130c1f52-,standard1,standard2,standard3,standard4,standard5,.mp4.csmil/index_4_av.m3u8?null=0',
'manifest_url': 'http://replayftv-vh.akamaihd.net/i/streaming-adaptatif_france-dom-tom/2017/S16/J2/156589847-58f59130c1f52-,standard1,standard2,standard3,standard4,standard5,.mp4.csmil/index_4_av.m3u8?null=0',
'ext': 'mp4',
'protocol': 'm3u8',
'format_id': '1467',
'acodec': 'mp4a.40.2',
'vcodec': 'avc1.77.30',
'tbr': 1467,
'width': 1024,
'height': 576,
}]
),
(
# https://github.com/rg3/youtube-dl/issues/11995
# http://teamcoco.com/video/clueless-gamer-super-bowl-for-honor
'teamcoco_11995',
'http://ak.storage-w.teamcococdn.com/cdn/2017-02/98599/ed8f/main.m3u8',
[{
'url': 'http://ak.storage-w.teamcococdn.com/cdn/2017-02/98599/ed8f/main.m3u8',
'ext': 'mp4',
'format_id': 'meta',
'format_note': 'Quality selection URL',
'protocol': 'm3u8',
'preference': -100,
'resolution': 'multiple',
}, {
'url': 'http://ak.storage-w.teamcococdn.com/cdn/2017-02/98599/ed8f/hls/CONAN_020217_Highlight_show-audio-160k_v4.m3u8',
'ext': 'mp4',
'format_id': 'audio-0-Default',
'protocol': 'm3u8',
'vcodec': 'none',
}, {
'url': 'http://ak.storage-w.teamcococdn.com/cdn/2017-02/98599/ed8f/hls/CONAN_020217_Highlight_show-audio-64k_v4.m3u8',
'ext': 'mp4',
'format_id': 'audio-1-Default',
'protocol': 'm3u8',
'vcodec': 'none',
}, {
'url': 'http://ak.storage-w.teamcococdn.com/cdn/2017-02/98599/ed8f/hls/CONAN_020217_Highlight_show-audio-64k_v4.m3u8',
'manifest_url': 'http://ak.storage-w.teamcococdn.com/cdn/2017-02/98599/ed8f/hls/CONAN_020217_Highlight_show-audio-64k_v4.m3u8',
'ext': 'mp4',
'format_id': '71',
'protocol': 'm3u8',
'acodec': 'mp4a.40.5',
'vcodec': 'none',
'tbr': 71,
}, {
'url': 'http://ak.storage-w.teamcococdn.com/cdn/2017-02/98599/ed8f/hls/CONAN_020217_Highlight_show-400k_v4.m3u8',
'manifest_url': 'http://ak.storage-w.teamcococdn.com/cdn/2017-02/98599/ed8f/hls/CONAN_020217_Highlight_show-400k_v4.m3u8',
'ext': 'mp4',
'format_id': '413',
'protocol': 'm3u8',
'acodec': 'none',
'vcodec': 'avc1.42001e',
'tbr': 413,
'width': 400,
'height': 224,
}, {
'url': 'http://ak.storage-w.teamcococdn.com/cdn/2017-02/98599/ed8f/hls/CONAN_020217_Highlight_show-400k_v4.m3u8',
'manifest_url': 'http://ak.storage-w.teamcococdn.com/cdn/2017-02/98599/ed8f/hls/CONAN_020217_Highlight_show-400k_v4.m3u8',
'ext': 'mp4',
'format_id': '522',
'protocol': 'm3u8',
'acodec': 'none',
'vcodec': 'avc1.42001e',
'tbr': 522,
'width': 400,
'height': 224,
}, {
'url': 'http://ak.storage-w.teamcococdn.com/cdn/2017-02/98599/ed8f/hls/CONAN_020217_Highlight_show-1m_v4.m3u8',
'manifest_url': 'http://ak.storage-w.teamcococdn.com/cdn/2017-02/98599/ed8f/hls/CONAN_020217_Highlight_show-1m_v4.m3u8',
'ext': 'mp4',
'format_id': '1205',
'protocol': 'm3u8',
'acodec': 'none',
'vcodec': 'avc1.4d001e',
'tbr': 1205,
'width': 640,
'height': 360,
}, {
'url': 'http://ak.storage-w.teamcococdn.com/cdn/2017-02/98599/ed8f/hls/CONAN_020217_Highlight_show-2m_v4.m3u8',
'manifest_url': 'http://ak.storage-w.teamcococdn.com/cdn/2017-02/98599/ed8f/hls/CONAN_020217_Highlight_show-2m_v4.m3u8',
'ext': 'mp4',
'format_id': '2374',
'protocol': 'm3u8',
'acodec': 'none',
'vcodec': 'avc1.4d001f',
'tbr': 2374,
'width': 1024,
'height': 576,
}]
),
(
# https://github.com/rg3/youtube-dl/issues/12211
# http://video.toggle.sg/en/series/whoopie-s-world/ep3/478601
'toggle_mobile_12211',
'http://cdnapi.kaltura.com/p/2082311/sp/208231100/playManifest/protocol/http/entryId/0_89q6e8ku/format/applehttp/tags/mobile_sd/f/a.m3u8',
[{
'url': 'http://cdnapi.kaltura.com/p/2082311/sp/208231100/playManifest/protocol/http/entryId/0_89q6e8ku/format/applehttp/tags/mobile_sd/f/a.m3u8',
'ext': 'mp4',
'format_id': 'meta',
'format_note': 'Quality selection URL',
'protocol': 'm3u8',
'preference': -100,
'resolution': 'multiple'
}, {
'url': 'http://k.toggle.sg/fhls/p/2082311/sp/208231100/serveFlavor/entryId/0_89q6e8ku/v/2/pv/1/flavorId/0_sa2ntrdg/name/a.mp4/index.m3u8',
'ext': 'mp4',
'format_id': 'audio-English',
'protocol': 'm3u8',
'language': 'eng',
'vcodec': 'none',
}, {
'url': 'http://k.toggle.sg/fhls/p/2082311/sp/208231100/serveFlavor/entryId/0_89q6e8ku/v/2/pv/1/flavorId/0_r7y0nitg/name/a.mp4/index.m3u8',
'ext': 'mp4',
'format_id': 'audio-Undefined',
'protocol': 'm3u8',
'language': 'und',
'vcodec': 'none',
}, {
'url': 'http://k.toggle.sg/fhls/p/2082311/sp/208231100/serveFlavor/entryId/0_89q6e8ku/v/2/pv/1/flavorId/0_qlk9hlzr/name/a.mp4/index.m3u8',
'manifest_url': 'http://k.toggle.sg/fhls/p/2082311/sp/208231100/serveFlavor/entryId/0_89q6e8ku/v/2/pv/1/flavorId/0_qlk9hlzr/name/a.mp4/index.m3u8',
'ext': 'mp4',
'format_id': '155',
'protocol': 'm3u8',
'tbr': 155.648,
'width': 320,
'height': 180,
}, {
'url': 'http://k.toggle.sg/fhls/p/2082311/sp/208231100/serveFlavor/entryId/0_89q6e8ku/v/2/pv/1/flavorId/0_oefackmi/name/a.mp4/index.m3u8',
'manifest_url': 'http://k.toggle.sg/fhls/p/2082311/sp/208231100/serveFlavor/entryId/0_89q6e8ku/v/2/pv/1/flavorId/0_oefackmi/name/a.mp4/index.m3u8',
'ext': 'mp4',
'format_id': '502',
'protocol': 'm3u8',
'tbr': 502.784,
'width': 480,
'height': 270,
}, {
'url': 'http://k.toggle.sg/fhls/p/2082311/sp/208231100/serveFlavor/entryId/0_89q6e8ku/v/12/pv/1/flavorId/0_vyg9pj7k/name/a.mp4/index.m3u8',
'manifest_url': 'http://k.toggle.sg/fhls/p/2082311/sp/208231100/serveFlavor/entryId/0_89q6e8ku/v/12/pv/1/flavorId/0_vyg9pj7k/name/a.mp4/index.m3u8',
'ext': 'mp4',
'format_id': '827',
'protocol': 'm3u8',
'tbr': 827.392,
'width': 640,
'height': 360,
}, {
'url': 'http://k.toggle.sg/fhls/p/2082311/sp/208231100/serveFlavor/entryId/0_89q6e8ku/v/12/pv/1/flavorId/0_50n4psvx/name/a.mp4/index.m3u8',
'manifest_url': 'http://k.toggle.sg/fhls/p/2082311/sp/208231100/serveFlavor/entryId/0_89q6e8ku/v/12/pv/1/flavorId/0_50n4psvx/name/a.mp4/index.m3u8',
'ext': 'mp4',
'format_id': '1396',
'protocol': 'm3u8',
'tbr': 1396.736,
'width': 854,
'height': 480,
}]
),
(
# http://www.twitch.tv/riotgames/v/6528877
'twitch_vod',
'https://usher.ttvnw.net/vod/6528877?allow_source=true&allow_audio_only=true&allow_spectre=true&player=twitchweb&nauth=%7B%22user_id%22%3Anull%2C%22vod_id%22%3A6528877%2C%22expires%22%3A1492887874%2C%22chansub%22%3A%7B%22restricted_bitrates%22%3A%5B%5D%7D%2C%22privileged%22%3Afalse%2C%22https_required%22%3Afalse%7D&nauthsig=3e29296a6824a0f48f9e731383f77a614fc79bee',
[{
'url': 'https://usher.ttvnw.net/vod/6528877?allow_source=true&allow_audio_only=true&allow_spectre=true&player=twitchweb&nauth=%7B%22user_id%22%3Anull%2C%22vod_id%22%3A6528877%2C%22expires%22%3A1492887874%2C%22chansub%22%3A%7B%22restricted_bitrates%22%3A%5B%5D%7D%2C%22privileged%22%3Afalse%2C%22https_required%22%3Afalse%7D&nauthsig=3e29296a6824a0f48f9e731383f77a614fc79bee',
'ext': 'mp4',
'format_id': 'meta',
'format_note': 'Quality selection URL',
'protocol': 'm3u8',
'preference': -100,
'resolution': 'multiple'
}, {
'url': 'https://vod.edgecast.hls.ttvnw.net/e5da31ab49_riotgames_15001215120_261543898/audio_only/index-muted-HM49I092CC.m3u8',
'manifest_url': 'https://vod.edgecast.hls.ttvnw.net/e5da31ab49_riotgames_15001215120_261543898/audio_only/index-muted-HM49I092CC.m3u8',
'ext': 'mp4',
'format_id': 'Audio Only',
'protocol': 'm3u8',
'acodec': 'mp4a.40.2',
'vcodec': 'none',
'tbr': 182.725,
}, {
'url': 'https://vod.edgecast.hls.ttvnw.net/e5da31ab49_riotgames_15001215120_261543898/mobile/index-muted-HM49I092CC.m3u8',
'manifest_url': 'https://vod.edgecast.hls.ttvnw.net/e5da31ab49_riotgames_15001215120_261543898/mobile/index-muted-HM49I092CC.m3u8',
'ext': 'mp4',
'format_id': 'Mobile',
'protocol': 'm3u8',
'acodec': 'mp4a.40.2',
'vcodec': 'avc1.42C00D',
'tbr': 280.474,
'width': 400,
'height': 226,
}, {
'url': 'https://vod.edgecast.hls.ttvnw.net/e5da31ab49_riotgames_15001215120_261543898/low/index-muted-HM49I092CC.m3u8',
'manifest_url': 'https://vod.edgecast.hls.ttvnw.net/e5da31ab49_riotgames_15001215120_261543898/low/index-muted-HM49I092CC.m3u8',
'ext': 'mp4',
'format_id': 'Low',
'protocol': 'm3u8',
'acodec': 'mp4a.40.2',
'vcodec': 'avc1.42C01E',
'tbr': 628.347,
'width': 640,
'height': 360,
}, {
'url': 'https://vod.edgecast.hls.ttvnw.net/e5da31ab49_riotgames_15001215120_261543898/medium/index-muted-HM49I092CC.m3u8',
'manifest_url': 'https://vod.edgecast.hls.ttvnw.net/e5da31ab49_riotgames_15001215120_261543898/medium/index-muted-HM49I092CC.m3u8',
'ext': 'mp4',
'format_id': 'Medium',
'protocol': 'm3u8',
'acodec': 'mp4a.40.2',
'vcodec': 'avc1.42C01E',
'tbr': 893.387,
'width': 852,
'height': 480,
}, {
'url': 'https://vod.edgecast.hls.ttvnw.net/e5da31ab49_riotgames_15001215120_261543898/high/index-muted-HM49I092CC.m3u8',
'manifest_url': 'https://vod.edgecast.hls.ttvnw.net/e5da31ab49_riotgames_15001215120_261543898/high/index-muted-HM49I092CC.m3u8',
'ext': 'mp4',
'format_id': 'High',
'protocol': 'm3u8',
'acodec': 'mp4a.40.2',
'vcodec': 'avc1.42C01F',
'tbr': 1603.789,
'width': 1280,
'height': 720,
}, {
'url': 'https://vod.edgecast.hls.ttvnw.net/e5da31ab49_riotgames_15001215120_261543898/chunked/index-muted-HM49I092CC.m3u8',
'manifest_url': 'https://vod.edgecast.hls.ttvnw.net/e5da31ab49_riotgames_15001215120_261543898/chunked/index-muted-HM49I092CC.m3u8',
'ext': 'mp4',
'format_id': 'Source',
'protocol': 'm3u8',
'acodec': 'mp4a.40.2',
'vcodec': 'avc1.100.31',
'tbr': 3214.134,
'width': 1280,
'height': 720,
}]
),
(
# http://www.vidio.com/watch/165683-dj_ambred-booyah-live-2015
# EXT-X-STREAM-INF tag with NAME attribute that is not defined
# in HLS specification
'vidio',
'https://www.vidio.com/videos/165683/playlist.m3u8',
[{
'url': 'https://www.vidio.com/videos/165683/playlist.m3u8',
'ext': 'mp4',
'format_id': 'meta',
'format_note': 'Quality selection URL',
'protocol': 'm3u8',
'preference': -100,
'resolution': 'multiple'
}, {
'url': 'https://cdn1-a.production.vidio.static6.com/uploads/165683/dj_ambred-4383-b300.mp4.m3u8',
'manifest_url': 'https://cdn1-a.production.vidio.static6.com/uploads/165683/dj_ambred-4383-b300.mp4.m3u8',
'ext': 'mp4',
'format_id': '270p 3G',
'protocol': 'm3u8',
'tbr': 300,
'width': 480,
'height': 270,
}, {
'url': 'https://cdn1-a.production.vidio.static6.com/uploads/165683/dj_ambred-4383-b600.mp4.m3u8',
'manifest_url': 'https://cdn1-a.production.vidio.static6.com/uploads/165683/dj_ambred-4383-b600.mp4.m3u8',
'ext': 'mp4',
'format_id': '360p SD',
'protocol': 'm3u8',
'tbr': 600,
'width': 640,
'height': 360,
}, {
'url': 'https://cdn1-a.production.vidio.static6.com/uploads/165683/dj_ambred-4383-b1200.mp4.m3u8',
'manifest_url': 'https://cdn1-a.production.vidio.static6.com/uploads/165683/dj_ambred-4383-b1200.mp4.m3u8',
'ext': 'mp4',
'format_id': '720p HD',
'protocol': 'm3u8',
'tbr': 1200,
'width': 1280,
'height': 720,
}]
)
]
for m3u8_file, m3u8_url, expected_formats in _TEST_CASES:
with io.open('./test/testdata/m3u8/%s.m3u8' % m3u8_file,
mode='r', encoding='utf-8') as f:
formats = self.ie._parse_m3u8_formats(
f.read(), m3u8_url, ext='mp4')
self.ie._sort_formats(formats)
expect_value(self, formats, expected_formats, None)
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()

View File

@ -1069,6 +1069,47 @@ The first line
''' '''
self.assertEqual(dfxp2srt(dfxp_data_no_default_namespace), srt_data) self.assertEqual(dfxp2srt(dfxp_data_no_default_namespace), srt_data)
dfxp_data_with_style = '''<?xml version="1.0" encoding="utf-8"?>
<tt xmlns="http://www.w3.org/2006/10/ttaf1" xmlns:ttp="http://www.w3.org/2006/10/ttaf1#parameter" ttp:timeBase="media" xmlns:tts="http://www.w3.org/2006/10/ttaf1#style" xml:lang="en" xmlns:ttm="http://www.w3.org/2006/10/ttaf1#metadata">
<head>
<styling>
<style id="s2" style="s0" tts:color="cyan" tts:fontWeight="bold" />
<style id="s1" style="s0" tts:color="yellow" tts:fontStyle="italic" />
<style id="s3" style="s0" tts:color="lime" tts:textDecoration="underline" />
<style id="s0" tts:backgroundColor="black" tts:fontStyle="normal" tts:fontSize="16" tts:fontFamily="sansSerif" tts:color="white" />
</styling>
</head>
<body tts:textAlign="center" style="s0">
<div>
<p begin="00:00:02.08" id="p0" end="00:00:05.84">default style<span tts:color="red">custom style</span></p>
<p style="s2" begin="00:00:02.08" id="p0" end="00:00:05.84"><span tts:color="lime">part 1<br /></span><span tts:color="cyan">part 2</span></p>
<p style="s3" begin="00:00:05.84" id="p1" end="00:00:09.56">line 3<br />part 3</p>
<p style="s1" tts:textDecoration="underline" begin="00:00:09.56" id="p2" end="00:00:12.36"><span style="s2" tts:color="lime">inner<br /> </span>style</p>
</div>
</body>
</tt>'''
srt_data = '''1
00:00:02,080 --> 00:00:05,839
<font color="white" face="sansSerif" size="16">default style<font color="red">custom style</font></font>
2
00:00:02,080 --> 00:00:05,839
<b><font color="cyan" face="sansSerif" size="16"><font color="lime">part 1
</font>part 2</font></b>
3
00:00:05,839 --> 00:00:09,560
<u><font color="lime">line 3
part 3</font></u>
4
00:00:09,560 --> 00:00:12,359
<i><u><font color="yellow"><font color="lime">inner
</font>style</font></u></i>
'''
self.assertEqual(dfxp2srt(dfxp_data_with_style), srt_data)
def test_cli_option(self): def test_cli_option(self):
self.assertEqual(cli_option({'proxy': '127.0.0.1:3128'}, '--proxy', 'proxy'), ['--proxy', '127.0.0.1:3128']) self.assertEqual(cli_option({'proxy': '127.0.0.1:3128'}, '--proxy', 'proxy'), ['--proxy', '127.0.0.1:3128'])
self.assertEqual(cli_option({'proxy': None}, '--proxy', 'proxy'), []) self.assertEqual(cli_option({'proxy': None}, '--proxy', 'proxy'), [])

View File

@ -0,0 +1,14 @@
#EXTM3U
#EXT-X-VERSION:5
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="Francais",DEFAULT=NO,FORCED=NO,URI="http://replayftv-pmd.francetv.fr/subtitles/2017/16/156589847-1492488987.m3u8",LANGUAGE="fra"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aac",LANGUAGE="fra",NAME="Francais",DEFAULT=YES, AUTOSELECT=YES
#EXT-X-STREAM-INF:SUBTITLES="subs",AUDIO="aac",PROGRAM-ID=1,BANDWIDTH=180000,RESOLUTION=256x144,CODECS="avc1.66.30, mp4a.40.2"
http://replayftv-vh.akamaihd.net/i/streaming-adaptatif_france-dom-tom/2017/S16/J2/156589847-58f59130c1f52-,standard1,standard2,standard3,standard4,standard5,.mp4.csmil/index_0_av.m3u8?null=0
#EXT-X-STREAM-INF:SUBTITLES="subs",AUDIO="aac",PROGRAM-ID=1,BANDWIDTH=303000,RESOLUTION=320x180,CODECS="avc1.66.30, mp4a.40.2"
http://replayftv-vh.akamaihd.net/i/streaming-adaptatif_france-dom-tom/2017/S16/J2/156589847-58f59130c1f52-,standard1,standard2,standard3,standard4,standard5,.mp4.csmil/index_1_av.m3u8?null=0
#EXT-X-STREAM-INF:SUBTITLES="subs",AUDIO="aac",PROGRAM-ID=1,BANDWIDTH=575000,RESOLUTION=512x288,CODECS="avc1.66.30, mp4a.40.2"
http://replayftv-vh.akamaihd.net/i/streaming-adaptatif_france-dom-tom/2017/S16/J2/156589847-58f59130c1f52-,standard1,standard2,standard3,standard4,standard5,.mp4.csmil/index_2_av.m3u8?null=0
#EXT-X-STREAM-INF:SUBTITLES="subs",AUDIO="aac",PROGRAM-ID=1,BANDWIDTH=831000,RESOLUTION=704x396,CODECS="avc1.77.30, mp4a.40.2"
http://replayftv-vh.akamaihd.net/i/streaming-adaptatif_france-dom-tom/2017/S16/J2/156589847-58f59130c1f52-,standard1,standard2,standard3,standard4,standard5,.mp4.csmil/index_3_av.m3u8?null=0
#EXT-X-STREAM-INF:SUBTITLES="subs",AUDIO="aac",PROGRAM-ID=1,BANDWIDTH=1467000,RESOLUTION=1024x576,CODECS="avc1.77.30, mp4a.40.2"
http://replayftv-vh.akamaihd.net/i/streaming-adaptatif_france-dom-tom/2017/S16/J2/156589847-58f59130c1f52-,standard1,standard2,standard3,standard4,standard5,.mp4.csmil/index_4_av.m3u8?null=0

16
test/testdata/m3u8/teamcoco_11995.m3u8 vendored Normal file
View File

@ -0,0 +1,16 @@
#EXTM3U
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="audio-0",NAME="Default",AUTOSELECT=YES,DEFAULT=YES,URI="hls/CONAN_020217_Highlight_show-audio-160k_v4.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="audio-1",NAME="Default",AUTOSELECT=YES,DEFAULT=YES,URI="hls/CONAN_020217_Highlight_show-audio-64k_v4.m3u8"
#EXT-X-I-FRAME-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=37862000,CODECS="avc1.4d001f",URI="hls/CONAN_020217_Highlight_show-2m_iframe.m3u8"
#EXT-X-I-FRAME-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=18750000,CODECS="avc1.4d001e",URI="hls/CONAN_020217_Highlight_show-1m_iframe.m3u8"
#EXT-X-I-FRAME-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=6535000,CODECS="avc1.42001e",URI="hls/CONAN_020217_Highlight_show-400k_iframe.m3u8"
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=2374000,RESOLUTION=1024x576,CODECS="avc1.4d001f,mp4a.40.2",AUDIO="audio-0"
hls/CONAN_020217_Highlight_show-2m_v4.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1205000,RESOLUTION=640x360,CODECS="avc1.4d001e,mp4a.40.2",AUDIO="audio-0"
hls/CONAN_020217_Highlight_show-1m_v4.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=522000,RESOLUTION=400x224,CODECS="avc1.42001e,mp4a.40.2",AUDIO="audio-0"
hls/CONAN_020217_Highlight_show-400k_v4.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=413000,RESOLUTION=400x224,CODECS="avc1.42001e,mp4a.40.5",AUDIO="audio-1"
hls/CONAN_020217_Highlight_show-400k_v4.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=71000,CODECS="mp4a.40.5",AUDIO="audio-1"
hls/CONAN_020217_Highlight_show-audio-64k_v4.m3u8

View File

@ -0,0 +1,13 @@
#EXTM3U
#EXT-X-VERSION:4
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="audio",LANGUAGE="eng",NAME="English",URI="http://k.toggle.sg/fhls/p/2082311/sp/208231100/serveFlavor/entryId/0_89q6e8ku/v/2/pv/1/flavorId/0_sa2ntrdg/name/a.mp4/index.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="audio",LANGUAGE="und",NAME="Undefined",URI="http://k.toggle.sg/fhls/p/2082311/sp/208231100/serveFlavor/entryId/0_89q6e8ku/v/2/pv/1/flavorId/0_r7y0nitg/name/a.mp4/index.m3u8"
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=155648,RESOLUTION=320x180,AUDIO="audio"
http://k.toggle.sg/fhls/p/2082311/sp/208231100/serveFlavor/entryId/0_89q6e8ku/v/2/pv/1/flavorId/0_qlk9hlzr/name/a.mp4/index.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=502784,RESOLUTION=480x270,AUDIO="audio"
http://k.toggle.sg/fhls/p/2082311/sp/208231100/serveFlavor/entryId/0_89q6e8ku/v/2/pv/1/flavorId/0_oefackmi/name/a.mp4/index.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=827392,RESOLUTION=640x360,AUDIO="audio"
http://k.toggle.sg/fhls/p/2082311/sp/208231100/serveFlavor/entryId/0_89q6e8ku/v/12/pv/1/flavorId/0_vyg9pj7k/name/a.mp4/index.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1396736,RESOLUTION=854x480,AUDIO="audio"
http://k.toggle.sg/fhls/p/2082311/sp/208231100/serveFlavor/entryId/0_89q6e8ku/v/12/pv/1/flavorId/0_50n4psvx/name/a.mp4/index.m3u8

20
test/testdata/m3u8/twitch_vod.m3u8 vendored Normal file
View File

@ -0,0 +1,20 @@
#EXTM3U
#EXT-X-TWITCH-INFO:ORIGIN="s3",CLUSTER="edgecast_vod",REGION="EU",MANIFEST-CLUSTER="edgecast_vod",USER-IP="109.171.17.81"
#EXT-X-MEDIA:TYPE=VIDEO,GROUP-ID="chunked",NAME="Source",AUTOSELECT=YES,DEFAULT=YES
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=3214134,CODECS="avc1.100.31,mp4a.40.2",RESOLUTION="1280x720",VIDEO="chunked"
https://vod.edgecast.hls.ttvnw.net/e5da31ab49_riotgames_15001215120_261543898/chunked/index-muted-HM49I092CC.m3u8
#EXT-X-MEDIA:TYPE=VIDEO,GROUP-ID="high",NAME="High",AUTOSELECT=YES,DEFAULT=YES
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1603789,CODECS="avc1.42C01F,mp4a.40.2",RESOLUTION="1280x720",VIDEO="high"
https://vod.edgecast.hls.ttvnw.net/e5da31ab49_riotgames_15001215120_261543898/high/index-muted-HM49I092CC.m3u8
#EXT-X-MEDIA:TYPE=VIDEO,GROUP-ID="medium",NAME="Medium",AUTOSELECT=YES,DEFAULT=YES
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=893387,CODECS="avc1.42C01E,mp4a.40.2",RESOLUTION="852x480",VIDEO="medium"
https://vod.edgecast.hls.ttvnw.net/e5da31ab49_riotgames_15001215120_261543898/medium/index-muted-HM49I092CC.m3u8
#EXT-X-MEDIA:TYPE=VIDEO,GROUP-ID="low",NAME="Low",AUTOSELECT=YES,DEFAULT=YES
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=628347,CODECS="avc1.42C01E,mp4a.40.2",RESOLUTION="640x360",VIDEO="low"
https://vod.edgecast.hls.ttvnw.net/e5da31ab49_riotgames_15001215120_261543898/low/index-muted-HM49I092CC.m3u8
#EXT-X-MEDIA:TYPE=VIDEO,GROUP-ID="mobile",NAME="Mobile",AUTOSELECT=YES,DEFAULT=YES
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=280474,CODECS="avc1.42C00D,mp4a.40.2",RESOLUTION="400x226",VIDEO="mobile"
https://vod.edgecast.hls.ttvnw.net/e5da31ab49_riotgames_15001215120_261543898/mobile/index-muted-HM49I092CC.m3u8
#EXT-X-MEDIA:TYPE=VIDEO,GROUP-ID="audio_only",NAME="Audio Only",AUTOSELECT=NO,DEFAULT=NO
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=182725,CODECS="mp4a.40.2",VIDEO="audio_only"
https://vod.edgecast.hls.ttvnw.net/e5da31ab49_riotgames_15001215120_261543898/audio_only/index-muted-HM49I092CC.m3u8

10
test/testdata/m3u8/vidio.m3u8 vendored Normal file
View File

@ -0,0 +1,10 @@
#EXTM3U
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=300000,RESOLUTION=480x270,NAME="270p 3G"
https://cdn1-a.production.vidio.static6.com/uploads/165683/dj_ambred-4383-b300.mp4.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=600000,RESOLUTION=640x360,NAME="360p SD"
https://cdn1-a.production.vidio.static6.com/uploads/165683/dj_ambred-4383-b600.mp4.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1200000,RESOLUTION=1280x720,NAME="720p HD"
https://cdn1-a.production.vidio.static6.com/uploads/165683/dj_ambred-4383-b1200.mp4.m3u8

View File

@ -640,7 +640,7 @@ class YoutubeDL(object):
NUMERIC_FIELDS = set(( NUMERIC_FIELDS = set((
'width', 'height', 'tbr', 'abr', 'asr', 'vbr', 'fps', 'filesize', 'filesize_approx', 'width', 'height', 'tbr', 'abr', 'asr', 'vbr', 'fps', 'filesize', 'filesize_approx',
'upload_year', 'upload_month', 'upload_day', 'timestamp', 'upload_year', 'upload_month', 'upload_day',
'duration', 'view_count', 'like_count', 'dislike_count', 'repost_count', 'duration', 'view_count', 'like_count', 'dislike_count', 'repost_count',
'average_rating', 'comment_count', 'age_limit', 'average_rating', 'comment_count', 'age_limit',
'start_time', 'end_time', 'start_time', 'end_time',

View File

@ -343,6 +343,7 @@ def _real_main(argv=None):
'retries': opts.retries, 'retries': opts.retries,
'fragment_retries': opts.fragment_retries, 'fragment_retries': opts.fragment_retries,
'skip_unavailable_fragments': opts.skip_unavailable_fragments, 'skip_unavailable_fragments': opts.skip_unavailable_fragments,
'keep_fragments': opts.keep_fragments,
'buffersize': opts.buffersize, 'buffersize': opts.buffersize,
'noresizebuffer': opts.noresizebuffer, 'noresizebuffer': opts.noresizebuffer,
'continuedl': opts.continue_dl, 'continuedl': opts.continue_dl,

View File

@ -187,6 +187,9 @@ class FileDownloader(object):
return filename[:-len('.part')] return filename[:-len('.part')]
return filename return filename
def ytdl_filename(self, filename):
return filename + '.ytdl'
def try_rename(self, old_filename, new_filename): def try_rename(self, old_filename, new_filename):
try: try:
if old_filename == new_filename: if old_filename == new_filename:
@ -327,21 +330,22 @@ class FileDownloader(object):
os.path.exists(encodeFilename(filename)) os.path.exists(encodeFilename(filename))
) )
continuedl_and_exists = ( if not hasattr(filename, 'write'):
self.params.get('continuedl', True) and continuedl_and_exists = (
os.path.isfile(encodeFilename(filename)) and self.params.get('continuedl', True) and
not self.params.get('nopart', False) os.path.isfile(encodeFilename(filename)) and
) not self.params.get('nopart', False)
)
# Check file already present # Check file already present
if filename != '-' and (nooverwrites_and_exists or continuedl_and_exists): if filename != '-' and (nooverwrites_and_exists or continuedl_and_exists):
self.report_file_already_downloaded(filename) self.report_file_already_downloaded(filename)
self._hook_progress({ self._hook_progress({
'filename': filename, 'filename': filename,
'status': 'finished', 'status': 'finished',
'total_bytes': os.path.getsize(encodeFilename(filename)), 'total_bytes': os.path.getsize(encodeFilename(filename)),
}) })
return True return True
min_sleep_interval = self.params.get('sleep_interval') min_sleep_interval = self.params.get('sleep_interval')
if min_sleep_interval: if min_sleep_interval:

View File

@ -1,13 +1,7 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import os
from .fragment import FragmentFD from .fragment import FragmentFD
from ..compat import compat_urllib_error from ..compat import compat_urllib_error
from ..utils import (
sanitize_open,
encodeFilename,
)
class DashSegmentsFD(FragmentFD): class DashSegmentsFD(FragmentFD):
@ -28,31 +22,24 @@ class DashSegmentsFD(FragmentFD):
self._prepare_and_start_frag_download(ctx) self._prepare_and_start_frag_download(ctx)
segments_filenames = []
fragment_retries = self.params.get('fragment_retries', 0) fragment_retries = self.params.get('fragment_retries', 0)
skip_unavailable_fragments = self.params.get('skip_unavailable_fragments', True) skip_unavailable_fragments = self.params.get('skip_unavailable_fragments', True)
def process_segment(segment, tmp_filename, num): frag_index = 0
segment_url = segment['url'] for i, segment in enumerate(segments):
segment_name = 'Frag%d' % num frag_index += 1
target_filename = '%s-%s' % (tmp_filename, segment_name) if frag_index <= ctx['fragment_index']:
continue
# In DASH, the first segment contains necessary headers to # In DASH, the first segment contains necessary headers to
# generate a valid MP4 file, so always abort for the first segment # generate a valid MP4 file, so always abort for the first segment
fatal = num == 0 or not skip_unavailable_fragments fatal = i == 0 or not skip_unavailable_fragments
count = 0 count = 0
while count <= fragment_retries: while count <= fragment_retries:
try: try:
success = ctx['dl'].download(target_filename, { success, frag_content = self._download_fragment(ctx, segment['url'], info_dict)
'url': segment_url,
'http_headers': info_dict.get('http_headers'),
})
if not success: if not success:
return False return False
down, target_sanitized = sanitize_open(target_filename, 'rb') self._append_fragment(ctx, frag_content)
ctx['dest_stream'].write(down.read())
down.close()
segments_filenames.append(target_sanitized)
break break
except compat_urllib_error.HTTPError as err: except compat_urllib_error.HTTPError as err:
# YouTube may often return 404 HTTP error for a fragment causing the # YouTube may often return 404 HTTP error for a fragment causing the
@ -63,22 +50,14 @@ class DashSegmentsFD(FragmentFD):
# HTTP error. # HTTP error.
count += 1 count += 1
if count <= fragment_retries: if count <= fragment_retries:
self.report_retry_fragment(err, segment_name, count, fragment_retries) self.report_retry_fragment(err, frag_index, count, fragment_retries)
if count > fragment_retries: if count > fragment_retries:
if not fatal: if not fatal:
self.report_skip_fragment(segment_name) self.report_skip_fragment(frag_index)
return True continue
self.report_error('giving up after %s fragment retries' % fragment_retries) self.report_error('giving up after %s fragment retries' % fragment_retries)
return False return False
return True
for i, segment in enumerate(segments):
if not process_segment(segment, ctx['tmpfilename'], i):
return False
self._finish_frag_download(ctx) self._finish_frag_download(ctx)
for segment_file in segments_filenames:
os.remove(encodeFilename(segment_file))
return True return True

View File

@ -3,7 +3,6 @@ from __future__ import division, unicode_literals
import base64 import base64
import io import io
import itertools import itertools
import os
import time import time
from .fragment import FragmentFD from .fragment import FragmentFD
@ -16,9 +15,7 @@ from ..compat import (
compat_struct_unpack, compat_struct_unpack,
) )
from ..utils import ( from ..utils import (
encodeFilename,
fix_xml_ampersands, fix_xml_ampersands,
sanitize_open,
xpath_text, xpath_text,
) )
@ -366,17 +363,21 @@ class F4mFD(FragmentFD):
dest_stream = ctx['dest_stream'] dest_stream = ctx['dest_stream']
write_flv_header(dest_stream) if ctx['complete_frags_downloaded_bytes'] == 0:
if not live: write_flv_header(dest_stream)
write_metadata_tag(dest_stream, metadata) if not live:
write_metadata_tag(dest_stream, metadata)
base_url_parsed = compat_urllib_parse_urlparse(base_url) base_url_parsed = compat_urllib_parse_urlparse(base_url)
self._start_frag_download(ctx) self._start_frag_download(ctx)
frags_filenames = [] frag_index = 0
while fragments_list: while fragments_list:
seg_i, frag_i = fragments_list.pop(0) seg_i, frag_i = fragments_list.pop(0)
frag_index += 1
if frag_index <= ctx['fragment_index']:
continue
name = 'Seg%d-Frag%d' % (seg_i, frag_i) name = 'Seg%d-Frag%d' % (seg_i, frag_i)
query = [] query = []
if base_url_parsed.query: if base_url_parsed.query:
@ -386,17 +387,10 @@ class F4mFD(FragmentFD):
if info_dict.get('extra_param_to_segment_url'): if info_dict.get('extra_param_to_segment_url'):
query.append(info_dict['extra_param_to_segment_url']) query.append(info_dict['extra_param_to_segment_url'])
url_parsed = base_url_parsed._replace(path=base_url_parsed.path + name, query='&'.join(query)) url_parsed = base_url_parsed._replace(path=base_url_parsed.path + name, query='&'.join(query))
frag_filename = '%s-%s' % (ctx['tmpfilename'], name)
try: try:
success = ctx['dl'].download(frag_filename, { success, down_data = self._download_fragment(ctx, url_parsed.geturl(), info_dict)
'url': url_parsed.geturl(),
'http_headers': info_dict.get('http_headers'),
})
if not success: if not success:
return False return False
(down, frag_sanitized) = sanitize_open(frag_filename, 'rb')
down_data = down.read()
down.close()
reader = FlvReader(down_data) reader = FlvReader(down_data)
while True: while True:
try: try:
@ -411,12 +405,8 @@ class F4mFD(FragmentFD):
break break
raise raise
if box_type == b'mdat': if box_type == b'mdat':
dest_stream.write(box_data) self._append_fragment(ctx, box_data)
break break
if live:
os.remove(encodeFilename(frag_sanitized))
else:
frags_filenames.append(frag_sanitized)
except (compat_urllib_error.HTTPError, ) as err: except (compat_urllib_error.HTTPError, ) as err:
if live and (err.code == 404 or err.code == 410): if live and (err.code == 404 or err.code == 410):
# We didn't keep up with the live window. Continue # We didn't keep up with the live window. Continue
@ -436,7 +426,4 @@ class F4mFD(FragmentFD):
self._finish_frag_download(ctx) self._finish_frag_download(ctx)
for frag_file in frags_filenames:
os.remove(encodeFilename(frag_file))
return True return True

View File

@ -2,6 +2,7 @@ from __future__ import division, unicode_literals
import os import os
import time import time
import json
from .common import FileDownloader from .common import FileDownloader
from .http import HttpFD from .http import HttpFD
@ -28,15 +29,37 @@ class FragmentFD(FileDownloader):
and hlsnative only) and hlsnative only)
skip_unavailable_fragments: skip_unavailable_fragments:
Skip unavailable fragments (DASH and hlsnative only) Skip unavailable fragments (DASH and hlsnative only)
keep_fragments: Keep downloaded fragments on disk after downloading is
finished
For each incomplete fragment download youtube-dl keeps on disk a special
bookkeeping file with download state and metadata (in future such files will
be used for any incomplete download handled by youtube-dl). This file is
used to properly handle resuming, check download file consistency and detect
potential errors. The file has a .ytdl extension and represents a standard
JSON file of the following format:
extractor:
Dictionary of extractor related data. TBD.
downloader:
Dictionary of downloader related data. May contain following data:
current_fragment:
Dictionary with current (being downloaded) fragment data:
index: 0-based index of current fragment among all fragments
fragment_count:
Total count of fragments
This feature is experimental and file format may change in future.
""" """
def report_retry_fragment(self, err, fragment_name, count, retries): def report_retry_fragment(self, err, frag_index, count, retries):
self.to_screen( self.to_screen(
'[download] Got server HTTP error: %s. Retrying fragment %s (attempt %d of %s)...' '[download] Got server HTTP error: %s. Retrying fragment %d (attempt %d of %s)...'
% (error_to_compat_str(err), fragment_name, count, self.format_retries(retries))) % (error_to_compat_str(err), frag_index, count, self.format_retries(retries)))
def report_skip_fragment(self, fragment_name): def report_skip_fragment(self, frag_index):
self.to_screen('[download] Skipping fragment %s...' % fragment_name) self.to_screen('[download] Skipping fragment %d...' % frag_index)
def _prepare_url(self, info_dict, url): def _prepare_url(self, info_dict, url):
headers = info_dict.get('http_headers') headers = info_dict.get('http_headers')
@ -46,6 +69,51 @@ class FragmentFD(FileDownloader):
self._prepare_frag_download(ctx) self._prepare_frag_download(ctx)
self._start_frag_download(ctx) self._start_frag_download(ctx)
@staticmethod
def __do_ytdl_file(ctx):
return not ctx['live'] and not ctx['tmpfilename'] == '-'
def _read_ytdl_file(self, ctx):
stream, _ = sanitize_open(self.ytdl_filename(ctx['filename']), 'r')
ctx['fragment_index'] = json.loads(stream.read())['downloader']['current_fragment']['index']
stream.close()
def _write_ytdl_file(self, ctx):
frag_index_stream, _ = sanitize_open(self.ytdl_filename(ctx['filename']), 'w')
downloader = {
'current_fragment': {
'index': ctx['fragment_index'],
},
}
if ctx.get('fragment_count') is not None:
downloader['fragment_count'] = ctx['fragment_count']
frag_index_stream.write(json.dumps({'downloader': downloader}))
frag_index_stream.close()
def _download_fragment(self, ctx, frag_url, info_dict, headers=None):
fragment_filename = '%s-Frag%d' % (ctx['tmpfilename'], ctx['fragment_index'])
success = ctx['dl'].download(fragment_filename, {
'url': frag_url,
'http_headers': headers or info_dict.get('http_headers'),
})
if not success:
return False, None
down, frag_sanitized = sanitize_open(fragment_filename, 'rb')
ctx['fragment_filename_sanitized'] = frag_sanitized
frag_content = down.read()
down.close()
return True, frag_content
def _append_fragment(self, ctx, frag_content):
try:
ctx['dest_stream'].write(frag_content)
finally:
if self.__do_ytdl_file(ctx):
self._write_ytdl_file(ctx)
if not self.params.get('keep_fragments', False):
os.remove(ctx['fragment_filename_sanitized'])
del ctx['fragment_filename_sanitized']
def _prepare_frag_download(self, ctx): def _prepare_frag_download(self, ctx):
if 'live' not in ctx: if 'live' not in ctx:
ctx['live'] = False ctx['live'] = False
@ -66,11 +134,38 @@ class FragmentFD(FileDownloader):
} }
) )
tmpfilename = self.temp_name(ctx['filename']) tmpfilename = self.temp_name(ctx['filename'])
dest_stream, tmpfilename = sanitize_open(tmpfilename, 'wb') open_mode = 'wb'
resume_len = 0
# Establish possible resume length
if os.path.isfile(encodeFilename(tmpfilename)):
open_mode = 'ab'
resume_len = os.path.getsize(encodeFilename(tmpfilename))
# Should be initialized before ytdl file check
ctx.update({
'tmpfilename': tmpfilename,
'fragment_index': 0,
})
if self.__do_ytdl_file(ctx):
if os.path.isfile(encodeFilename(self.ytdl_filename(ctx['filename']))):
self._read_ytdl_file(ctx)
else:
self._write_ytdl_file(ctx)
if ctx['fragment_index'] > 0:
assert resume_len > 0
else:
assert resume_len == 0
dest_stream, tmpfilename = sanitize_open(tmpfilename, open_mode)
ctx.update({ ctx.update({
'dl': dl, 'dl': dl,
'dest_stream': dest_stream, 'dest_stream': dest_stream,
'tmpfilename': tmpfilename, 'tmpfilename': tmpfilename,
# Total complete fragments downloaded so far in bytes
'complete_frags_downloaded_bytes': resume_len,
}) })
def _start_frag_download(self, ctx): def _start_frag_download(self, ctx):
@ -79,9 +174,9 @@ class FragmentFD(FileDownloader):
# hook # hook
state = { state = {
'status': 'downloading', 'status': 'downloading',
'downloaded_bytes': 0, 'downloaded_bytes': ctx['complete_frags_downloaded_bytes'],
'frag_index': 0, 'fragment_index': ctx['fragment_index'],
'frag_count': total_frags, 'fragment_count': total_frags,
'filename': ctx['filename'], 'filename': ctx['filename'],
'tmpfilename': ctx['tmpfilename'], 'tmpfilename': ctx['tmpfilename'],
} }
@ -89,8 +184,6 @@ class FragmentFD(FileDownloader):
start = time.time() start = time.time()
ctx.update({ ctx.update({
'started': start, 'started': start,
# Total complete fragments downloaded so far in bytes
'complete_frags_downloaded_bytes': 0,
# Amount of fragment's bytes downloaded by the time of the previous # Amount of fragment's bytes downloaded by the time of the previous
# frag progress hook invocation # frag progress hook invocation
'prev_frag_downloaded_bytes': 0, 'prev_frag_downloaded_bytes': 0,
@ -106,11 +199,12 @@ class FragmentFD(FileDownloader):
if not ctx['live']: if not ctx['live']:
estimated_size = ( estimated_size = (
(ctx['complete_frags_downloaded_bytes'] + frag_total_bytes) / (ctx['complete_frags_downloaded_bytes'] + frag_total_bytes) /
(state['frag_index'] + 1) * total_frags) (state['fragment_index'] + 1) * total_frags)
state['total_bytes_estimate'] = estimated_size state['total_bytes_estimate'] = estimated_size
if s['status'] == 'finished': if s['status'] == 'finished':
state['frag_index'] += 1 state['fragment_index'] += 1
ctx['fragment_index'] = state['fragment_index']
state['downloaded_bytes'] += frag_total_bytes - ctx['prev_frag_downloaded_bytes'] state['downloaded_bytes'] += frag_total_bytes - ctx['prev_frag_downloaded_bytes']
ctx['complete_frags_downloaded_bytes'] = state['downloaded_bytes'] ctx['complete_frags_downloaded_bytes'] = state['downloaded_bytes']
ctx['prev_frag_downloaded_bytes'] = 0 ctx['prev_frag_downloaded_bytes'] = 0
@ -132,6 +226,10 @@ class FragmentFD(FileDownloader):
def _finish_frag_download(self, ctx): def _finish_frag_download(self, ctx):
ctx['dest_stream'].close() ctx['dest_stream'].close()
if self.__do_ytdl_file(ctx):
ytdl_filename = encodeFilename(self.ytdl_filename(ctx['filename']))
if os.path.isfile(ytdl_filename):
os.remove(ytdl_filename)
elapsed = time.time() - ctx['started'] elapsed = time.time() - ctx['started']
self.try_rename(ctx['tmpfilename'], ctx['filename']) self.try_rename(ctx['tmpfilename'], ctx['filename'])
fsize = os.path.getsize(encodeFilename(ctx['filename'])) fsize = os.path.getsize(encodeFilename(ctx['filename']))

View File

@ -1,6 +1,5 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import os.path
import re import re
import binascii import binascii
try: try:
@ -18,8 +17,6 @@ from ..compat import (
compat_struct_pack, compat_struct_pack,
) )
from ..utils import ( from ..utils import (
encodeFilename,
sanitize_open,
parse_m3u8_attributes, parse_m3u8_attributes,
update_url_query, update_url_query,
) )
@ -103,17 +100,18 @@ class HlsFD(FragmentFD):
media_sequence = 0 media_sequence = 0
decrypt_info = {'METHOD': 'NONE'} decrypt_info = {'METHOD': 'NONE'}
byte_range = {} byte_range = {}
frags_filenames = [] frag_index = 0
for line in s.splitlines(): for line in s.splitlines():
line = line.strip() line = line.strip()
if line: if line:
if not line.startswith('#'): if not line.startswith('#'):
frag_index += 1
if frag_index <= ctx['fragment_index']:
continue
frag_url = ( frag_url = (
line line
if re.match(r'^https?://', line) if re.match(r'^https?://', line)
else compat_urlparse.urljoin(man_url, line)) else compat_urlparse.urljoin(man_url, line))
frag_name = 'Frag%d' % i
frag_filename = '%s-%s' % (ctx['tmpfilename'], frag_name)
if extra_query: if extra_query:
frag_url = update_url_query(frag_url, extra_query) frag_url = update_url_query(frag_url, extra_query)
count = 0 count = 0
@ -122,15 +120,10 @@ class HlsFD(FragmentFD):
headers['Range'] = 'bytes=%d-%d' % (byte_range['start'], byte_range['end']) headers['Range'] = 'bytes=%d-%d' % (byte_range['start'], byte_range['end'])
while count <= fragment_retries: while count <= fragment_retries:
try: try:
success = ctx['dl'].download(frag_filename, { success, frag_content = self._download_fragment(
'url': frag_url, ctx, frag_url, info_dict, headers)
'http_headers': headers,
})
if not success: if not success:
return False return False
down, frag_sanitized = sanitize_open(frag_filename, 'rb')
frag_content = down.read()
down.close()
break break
except compat_urllib_error.HTTPError as err: except compat_urllib_error.HTTPError as err:
# Unavailable (possibly temporary) fragments may be served. # Unavailable (possibly temporary) fragments may be served.
@ -139,28 +132,29 @@ class HlsFD(FragmentFD):
# https://github.com/rg3/youtube-dl/issues/10448). # https://github.com/rg3/youtube-dl/issues/10448).
count += 1 count += 1
if count <= fragment_retries: if count <= fragment_retries:
self.report_retry_fragment(err, frag_name, count, fragment_retries) self.report_retry_fragment(err, frag_index, count, fragment_retries)
if count > fragment_retries: if count > fragment_retries:
if skip_unavailable_fragments: if skip_unavailable_fragments:
i += 1 i += 1
media_sequence += 1 media_sequence += 1
self.report_skip_fragment(frag_name) self.report_skip_fragment(frag_index)
continue continue
self.report_error( self.report_error(
'giving up after %s fragment retries' % fragment_retries) 'giving up after %s fragment retries' % fragment_retries)
return False return False
if decrypt_info['METHOD'] == 'AES-128': if decrypt_info['METHOD'] == 'AES-128':
iv = decrypt_info.get('IV') or compat_struct_pack('>8xq', media_sequence) iv = decrypt_info.get('IV') or compat_struct_pack('>8xq', media_sequence)
decrypt_info['KEY'] = decrypt_info.get('KEY') or self.ydl.urlopen(decrypt_info['URI']).read()
frag_content = AES.new( frag_content = AES.new(
decrypt_info['KEY'], AES.MODE_CBC, iv).decrypt(frag_content) decrypt_info['KEY'], AES.MODE_CBC, iv).decrypt(frag_content)
ctx['dest_stream'].write(frag_content) self._append_fragment(ctx, frag_content)
frags_filenames.append(frag_sanitized)
# We only download the first fragment during the test # We only download the first fragment during the test
if test: if test:
break break
i += 1 i += 1
media_sequence += 1 media_sequence += 1
elif line.startswith('#EXT-X-KEY'): elif line.startswith('#EXT-X-KEY'):
decrypt_url = decrypt_info.get('URI')
decrypt_info = parse_m3u8_attributes(line[11:]) decrypt_info = parse_m3u8_attributes(line[11:])
if decrypt_info['METHOD'] == 'AES-128': if decrypt_info['METHOD'] == 'AES-128':
if 'IV' in decrypt_info: if 'IV' in decrypt_info:
@ -170,7 +164,8 @@ class HlsFD(FragmentFD):
man_url, decrypt_info['URI']) man_url, decrypt_info['URI'])
if extra_query: if extra_query:
decrypt_info['URI'] = update_url_query(decrypt_info['URI'], extra_query) decrypt_info['URI'] = update_url_query(decrypt_info['URI'], extra_query)
decrypt_info['KEY'] = self.ydl.urlopen(decrypt_info['URI']).read() if decrypt_url != decrypt_info['URI']:
decrypt_info['KEY'] = None
elif line.startswith('#EXT-X-MEDIA-SEQUENCE'): elif line.startswith('#EXT-X-MEDIA-SEQUENCE'):
media_sequence = int(line[22:]) media_sequence = int(line[22:])
elif line.startswith('#EXT-X-BYTERANGE'): elif line.startswith('#EXT-X-BYTERANGE'):
@ -183,7 +178,4 @@ class HlsFD(FragmentFD):
self._finish_frag_download(ctx) self._finish_frag_download(ctx)
for frag_file in frags_filenames:
os.remove(encodeFilename(frag_file))
return True return True

View File

@ -1,6 +1,5 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import os
import time import time
import struct import struct
import binascii import binascii
@ -8,10 +7,6 @@ import io
from .fragment import FragmentFD from .fragment import FragmentFD
from ..compat import compat_urllib_error from ..compat import compat_urllib_error
from ..utils import (
sanitize_open,
encodeFilename,
)
u8 = struct.Struct(b'>B') u8 = struct.Struct(b'>B')
@ -225,50 +220,39 @@ class IsmFD(FragmentFD):
self._prepare_and_start_frag_download(ctx) self._prepare_and_start_frag_download(ctx)
segments_filenames = []
fragment_retries = self.params.get('fragment_retries', 0) fragment_retries = self.params.get('fragment_retries', 0)
skip_unavailable_fragments = self.params.get('skip_unavailable_fragments', True) skip_unavailable_fragments = self.params.get('skip_unavailable_fragments', True)
track_written = False track_written = False
frag_index = 0
for i, segment in enumerate(segments): for i, segment in enumerate(segments):
segment_url = segment['url'] frag_index += 1
segment_name = 'Frag%d' % i if frag_index <= ctx['fragment_index']:
target_filename = '%s-%s' % (ctx['tmpfilename'], segment_name) continue
count = 0 count = 0
while count <= fragment_retries: while count <= fragment_retries:
try: try:
success = ctx['dl'].download(target_filename, { success, frag_content = self._download_fragment(ctx, segment['url'], info_dict)
'url': segment_url,
'http_headers': info_dict.get('http_headers'),
})
if not success: if not success:
return False return False
down, target_sanitized = sanitize_open(target_filename, 'rb')
down_data = down.read()
if not track_written: if not track_written:
tfhd_data = extract_box_data(down_data, [b'moof', b'traf', b'tfhd']) tfhd_data = extract_box_data(frag_content, [b'moof', b'traf', b'tfhd'])
info_dict['_download_params']['track_id'] = u32.unpack(tfhd_data[4:8])[0] info_dict['_download_params']['track_id'] = u32.unpack(tfhd_data[4:8])[0]
write_piff_header(ctx['dest_stream'], info_dict['_download_params']) write_piff_header(ctx['dest_stream'], info_dict['_download_params'])
track_written = True track_written = True
ctx['dest_stream'].write(down_data) self._append_fragment(ctx, frag_content)
down.close()
segments_filenames.append(target_sanitized)
break break
except compat_urllib_error.HTTPError as err: except compat_urllib_error.HTTPError as err:
count += 1 count += 1
if count <= fragment_retries: if count <= fragment_retries:
self.report_retry_fragment(err, segment_name, count, fragment_retries) self.report_retry_fragment(err, frag_index, count, fragment_retries)
if count > fragment_retries: if count > fragment_retries:
if skip_unavailable_fragments: if skip_unavailable_fragments:
self.report_skip_fragment(segment_name) self.report_skip_fragment(frag_index)
continue continue
self.report_error('giving up after %s fragment retries' % fragment_retries) self.report_error('giving up after %s fragment retries' % fragment_retries)
return False return False
self._finish_frag_download(ctx) self._finish_frag_download(ctx)
for segment_file in segments_filenames:
os.remove(encodeFilename(segment_file))
return True return True

View File

@ -7,15 +7,19 @@ from ..utils import (
parse_iso8601, parse_iso8601,
mimetype2ext, mimetype2ext,
determine_ext, determine_ext,
ExtractorError,
) )
class AMPIE(InfoExtractor): class AMPIE(InfoExtractor):
# parse Akamai Adaptive Media Player feed # parse Akamai Adaptive Media Player feed
def _extract_feed_info(self, url): def _extract_feed_info(self, url):
item = self._download_json( feed = self._download_json(
url, None, 'Downloading Akamai AMP feed', url, None, 'Downloading Akamai AMP feed',
'Unable to download Akamai AMP feed')['channel']['item'] 'Unable to download Akamai AMP feed')
item = feed.get('channel', {}).get('item')
if not item:
raise ExtractorError('%s said: %s' % (self.IE_NAME, feed['error']))
video_id = item['guid'] video_id = item['guid']

View File

@ -180,7 +180,7 @@ class ArteTVBaseIE(InfoExtractor):
class ArteTVPlus7IE(ArteTVBaseIE): class ArteTVPlus7IE(ArteTVBaseIE):
IE_NAME = 'arte.tv:+7' IE_NAME = 'arte.tv:+7'
_VALID_URL = r'https?://(?:(?:www|sites)\.)?arte\.tv/[^/]+/(?P<lang>fr|de|en|es)/(?:[^/]+/)*(?P<id>[^/?#&]+)' _VALID_URL = r'https?://(?:(?:www|sites)\.)?arte\.tv/(?:[^/]+/)?(?P<lang>fr|de|en|es)/(?:videos/)?(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.arte.tv/guide/de/sendungen/XEN/xenius/?vid=055918-015_PLUS7-D', 'url': 'http://www.arte.tv/guide/de/sendungen/XEN/xenius/?vid=055918-015_PLUS7-D',
@ -188,6 +188,9 @@ class ArteTVPlus7IE(ArteTVBaseIE):
}, { }, {
'url': 'http://sites.arte.tv/karambolage/de/video/karambolage-22', 'url': 'http://sites.arte.tv/karambolage/de/video/karambolage-22',
'only_matching': True, 'only_matching': True,
}, {
'url': 'http://www.arte.tv/de/videos/048696-000-A/der-kluge-bauch-unser-zweites-gehirn',
'only_matching': True,
}] }]
@classmethod @classmethod

View File

@ -1,140 +0,0 @@
from __future__ import unicode_literals
import json
from .common import InfoExtractor
from ..utils import (
ExtractorError,
float_or_none,
sanitized_Request,
)
class AzubuIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?azubu\.(?:tv|uol.com.br)/[^/]+#!/play/(?P<id>\d+)'
_TESTS = [
{
'url': 'http://www.azubu.tv/GSL#!/play/15575/2014-hot6-cup-last-big-match-ro8-day-1',
'md5': 'a88b42fcf844f29ad6035054bd9ecaf4',
'info_dict': {
'id': '15575',
'ext': 'mp4',
'title': '2014 HOT6 CUP LAST BIG MATCH Ro8 Day 1',
'description': 'md5:d06bdea27b8cc4388a90ad35b5c66c01',
'thumbnail': r're:^https?://.*\.jpe?g',
'timestamp': 1417523507.334,
'upload_date': '20141202',
'duration': 9988.7,
'uploader': 'GSL',
'uploader_id': 414310,
'view_count': int,
},
},
{
'url': 'http://www.azubu.tv/FnaticTV#!/play/9344/-fnatic-at-worlds-2014:-toyz---%22i-love-rekkles,-he-has-amazing-mechanics%22-',
'md5': 'b72a871fe1d9f70bd7673769cdb3b925',
'info_dict': {
'id': '9344',
'ext': 'mp4',
'title': 'Fnatic at Worlds 2014: Toyz - "I love Rekkles, he has amazing mechanics"',
'description': 'md5:4a649737b5f6c8b5c5be543e88dc62af',
'thumbnail': r're:^https?://.*\.jpe?g',
'timestamp': 1410530893.320,
'upload_date': '20140912',
'duration': 172.385,
'uploader': 'FnaticTV',
'uploader_id': 272749,
'view_count': int,
},
'skip': 'Channel offline',
},
]
def _real_extract(self, url):
video_id = self._match_id(url)
data = self._download_json(
'http://www.azubu.tv/api/video/%s' % video_id, video_id)['data']
title = data['title'].strip()
description = data.get('description')
thumbnail = data.get('thumbnail')
view_count = data.get('view_count')
user = data.get('user', {})
uploader = user.get('username')
uploader_id = user.get('id')
stream_params = json.loads(data['stream_params'])
timestamp = float_or_none(stream_params.get('creationDate'), 1000)
duration = float_or_none(stream_params.get('length'), 1000)
renditions = stream_params.get('renditions') or []
video = stream_params.get('FLVFullLength') or stream_params.get('videoFullLength')
if video:
renditions.append(video)
if not renditions and not user.get('channel', {}).get('is_live', True):
raise ExtractorError('%s said: channel is offline.' % self.IE_NAME, expected=True)
formats = [{
'url': fmt['url'],
'width': fmt['frameWidth'],
'height': fmt['frameHeight'],
'vbr': float_or_none(fmt['encodingRate'], 1000),
'filesize': fmt['size'],
'vcodec': fmt['videoCodec'],
'container': fmt['videoContainer'],
} for fmt in renditions if fmt['url']]
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'timestamp': timestamp,
'duration': duration,
'uploader': uploader,
'uploader_id': uploader_id,
'view_count': view_count,
'formats': formats,
}
class AzubuLiveIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?azubu\.(?:tv|uol.com.br)/(?P<id>[^/]+)$'
_TESTS = [{
'url': 'http://www.azubu.tv/MarsTVMDLen',
'only_matching': True,
}, {
'url': 'http://azubu.uol.com.br/adolfz',
'only_matching': True,
}]
def _real_extract(self, url):
user = self._match_id(url)
info = self._download_json(
'http://api.azubu.tv/public/modules/last-video/{0}/info'.format(user),
user)['data']
if info['type'] != 'STREAM':
raise ExtractorError('{0} is not streaming live'.format(user), expected=True)
req = sanitized_Request(
'https://edge-elb.api.brightcove.com/playback/v1/accounts/3361910549001/videos/ref:' + info['reference_id'])
req.add_header('Accept', 'application/json;pk=BCpkADawqM1gvI0oGWg8dxQHlgT8HkdE2LnAlWAZkOlznO39bSZX726u4JqnDsK3MDXcO01JxXK2tZtJbgQChxgaFzEVdHRjaDoxaOu8hHOO8NYhwdxw9BzvgkvLUlpbDNUuDoc4E4wxDToV')
bc_info = self._download_json(req, user)
m3u8_url = next(source['src'] for source in bc_info['sources'] if source['container'] == 'M2TS')
formats = self._extract_m3u8_formats(m3u8_url, user, ext='mp4')
self._sort_formats(formats)
return {
'id': info['id'],
'title': self._live_title(info['title']),
'uploader_id': user,
'formats': formats,
'is_live': True,
'thumbnail': bc_info['poster'],
}

View File

@ -131,6 +131,12 @@ class BrightcoveLegacyIE(InfoExtractor):
}, },
'playlist_mincount': 10, 'playlist_mincount': 10,
}, },
{
# playerID inferred from bcpid
# from http://www.un.org/chinese/News/story.asp?NewsID=27724
'url': 'https://link.brightcove.com/services/player/bcpid1722935254001/?bctid=5360463607001&autoStart=false&secureConnections=true&width=650&height=350',
'only_matching': True, # Tested in GenericIE
}
] ]
FLV_VCODECS = { FLV_VCODECS = {
1: 'SORENSON', 1: 'SORENSON',
@ -266,9 +272,13 @@ class BrightcoveLegacyIE(InfoExtractor):
if matches: if matches:
return list(filter(None, [cls._build_brighcove_url(m) for m in matches])) return list(filter(None, [cls._build_brighcove_url(m) for m in matches]))
return list(filter(None, [ matches = re.findall(r'(customBC\.createVideo\(.+?\);)', webpage)
cls._build_brighcove_url_from_js(custom_bc) if matches:
for custom_bc in re.findall(r'(customBC\.createVideo\(.+?\);)', webpage)])) return list(filter(None, [
cls._build_brighcove_url_from_js(custom_bc)
for custom_bc in matches]))
return [src for _, src in re.findall(
r'<iframe[^>]+src=([\'"])((?:https?:)?//link\.brightcove\.com/services/player/(?!\1).+)\1', webpage)]
def _real_extract(self, url): def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {}) url, smuggled_data = unsmuggle_url(url, {})
@ -285,6 +295,10 @@ class BrightcoveLegacyIE(InfoExtractor):
if videoPlayer: if videoPlayer:
# We set the original url as the default 'Referer' header # We set the original url as the default 'Referer' header
referer = smuggled_data.get('Referer', url) referer = smuggled_data.get('Referer', url)
if 'playerID' not in query:
mobj = re.search(r'/bcpid(\d+)', url)
if mobj is not None:
query['playerID'] = [mobj.group(1)]
return self._get_video_info( return self._get_video_info(
videoPlayer[0], query, referer=referer) videoPlayer[0], query, referer=referer)
elif 'playerKey' in query: elif 'playerKey' in query:
@ -484,8 +498,8 @@ class BrightcoveNewIE(InfoExtractor):
}] }]
@staticmethod @staticmethod
def _extract_url(webpage): def _extract_url(ie, webpage):
urls = BrightcoveNewIE._extract_urls(webpage) urls = BrightcoveNewIE._extract_urls(ie, webpage)
return urls[0] if urls else None return urls[0] if urls else None
@staticmethod @staticmethod
@ -508,7 +522,7 @@ class BrightcoveNewIE(InfoExtractor):
# [2] looks like: # [2] looks like:
for video, script_tag, account_id, player_id, embed in re.findall( for video, script_tag, account_id, player_id, embed in re.findall(
r'''(?isx) r'''(?isx)
(<video\s+[^>]+>) (<video\s+[^>]*data-video-id=['"]?[^>]+>)
(?:.*? (?:.*?
(<script[^>]+ (<script[^>]+
src=["\'](?:https?:)?//players\.brightcove\.net/ src=["\'](?:https?:)?//players\.brightcove\.net/

View File

@ -976,6 +976,22 @@ class InfoExtractor(object):
return info return info
if isinstance(json_ld, dict): if isinstance(json_ld, dict):
json_ld = [json_ld] json_ld = [json_ld]
def extract_video_object(e):
assert e['@type'] == 'VideoObject'
info.update({
'url': e.get('contentUrl'),
'title': unescapeHTML(e.get('name')),
'description': unescapeHTML(e.get('description')),
'thumbnail': e.get('thumbnailUrl') or e.get('thumbnailURL'),
'duration': parse_duration(e.get('duration')),
'timestamp': unified_timestamp(e.get('uploadDate')),
'filesize': float_or_none(e.get('contentSize')),
'tbr': int_or_none(e.get('bitrate')),
'width': int_or_none(e.get('width')),
'height': int_or_none(e.get('height')),
})
for e in json_ld: for e in json_ld:
if e.get('@context') == 'http://schema.org': if e.get('@context') == 'http://schema.org':
item_type = e.get('@type') item_type = e.get('@type')
@ -1000,18 +1016,11 @@ class InfoExtractor(object):
'description': unescapeHTML(e.get('articleBody')), 'description': unescapeHTML(e.get('articleBody')),
}) })
elif item_type == 'VideoObject': elif item_type == 'VideoObject':
info.update({ extract_video_object(e)
'url': e.get('contentUrl'), elif item_type == 'WebPage':
'title': unescapeHTML(e.get('name')), video = e.get('video')
'description': unescapeHTML(e.get('description')), if isinstance(video, dict) and video.get('@type') == 'VideoObject':
'thumbnail': e.get('thumbnailUrl') or e.get('thumbnailURL'), extract_video_object(video)
'duration': parse_duration(e.get('duration')),
'timestamp': unified_timestamp(e.get('uploadDate')),
'filesize': float_or_none(e.get('contentSize')),
'tbr': int_or_none(e.get('bitrate')),
'width': int_or_none(e.get('width')),
'height': int_or_none(e.get('height')),
})
break break
return dict((k, v) for k, v in info.items() if v is not None) return dict((k, v) for k, v in info.items() if v is not None)
@ -1303,17 +1312,25 @@ class InfoExtractor(object):
entry_protocol='m3u8', preference=None, entry_protocol='m3u8', preference=None,
m3u8_id=None, note=None, errnote=None, m3u8_id=None, note=None, errnote=None,
fatal=True, live=False): fatal=True, live=False):
res = self._download_webpage_handle( res = self._download_webpage_handle(
m3u8_url, video_id, m3u8_url, video_id,
note=note or 'Downloading m3u8 information', note=note or 'Downloading m3u8 information',
errnote=errnote or 'Failed to download m3u8 information', errnote=errnote or 'Failed to download m3u8 information',
fatal=fatal) fatal=fatal)
if res is False: if res is False:
return [] return []
m3u8_doc, urlh = res m3u8_doc, urlh = res
m3u8_url = urlh.geturl() m3u8_url = urlh.geturl()
return self._parse_m3u8_formats(
m3u8_doc, m3u8_url, ext=ext, entry_protocol=entry_protocol,
preference=preference, m3u8_id=m3u8_id, live=live)
def _parse_m3u8_formats(self, m3u8_doc, m3u8_url, ext=None,
entry_protocol='m3u8', preference=None,
m3u8_id=None, live=False):
if '#EXT-X-FAXS-CM:' in m3u8_doc: # Adobe Flash Access if '#EXT-X-FAXS-CM:' in m3u8_doc: # Adobe Flash Access
return [] return []
@ -1324,19 +1341,21 @@ class InfoExtractor(object):
if re.match(r'^https?://', u) if re.match(r'^https?://', u)
else compat_urlparse.urljoin(m3u8_url, u)) else compat_urlparse.urljoin(m3u8_url, u))
# We should try extracting formats only from master playlists [1], i.e. # References:
# playlists that describe available qualities. On the other hand media # 1. https://tools.ietf.org/html/draft-pantos-http-live-streaming-21
# playlists [2] should be returned as is since they contain just the media # 2. https://github.com/rg3/youtube-dl/issues/12211
# without qualities renditions.
# We should try extracting formats only from master playlists [1, 4.3.4],
# i.e. playlists that describe available qualities. On the other hand
# media playlists [1, 4.3.3] should be returned as is since they contain
# just the media without qualities renditions.
# Fortunately, master playlist can be easily distinguished from media # Fortunately, master playlist can be easily distinguished from media
# playlist based on particular tags availability. As of [1, 2] master # playlist based on particular tags availability. As of [1, 4.3.3, 4.3.4]
# playlist tags MUST NOT appear in a media playist and vice versa. # master playlist tags MUST NOT appear in a media playist and vice versa.
# As of [3] #EXT-X-TARGETDURATION tag is REQUIRED for every media playlist # As of [1, 4.3.3.1] #EXT-X-TARGETDURATION tag is REQUIRED for every
# and MUST NOT appear in master playlist thus we can clearly detect media # media playlist and MUST NOT appear in master playlist thus we can
# playlist with this criterion. # clearly detect media playlist with this criterion.
# 1. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.4
# 2. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3
# 3. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3.1
if '#EXT-X-TARGETDURATION' in m3u8_doc: # media playlist, return as is if '#EXT-X-TARGETDURATION' in m3u8_doc: # media playlist, return as is
return [{ return [{
'url': m3u8_url, 'url': m3u8_url,
@ -1345,52 +1364,71 @@ class InfoExtractor(object):
'protocol': entry_protocol, 'protocol': entry_protocol,
'preference': preference, 'preference': preference,
}] }]
audio_in_video_stream = {}
last_info = {} groups = {}
last_media = {} last_stream_inf = {}
def extract_media(x_media_line):
media = parse_m3u8_attributes(x_media_line)
# As per [1, 4.3.4.1] TYPE, GROUP-ID and NAME are REQUIRED
media_type, group_id, name = media.get('TYPE'), media.get('GROUP-ID'), media.get('NAME')
if not (media_type and group_id and name):
return
groups.setdefault(group_id, []).append(media)
if media_type not in ('VIDEO', 'AUDIO'):
return
media_url = media.get('URI')
if media_url:
format_id = []
for v in (group_id, name):
if v:
format_id.append(v)
f = {
'format_id': '-'.join(format_id),
'url': format_url(media_url),
'language': media.get('LANGUAGE'),
'ext': ext,
'protocol': entry_protocol,
'preference': preference,
}
if media_type == 'AUDIO':
f['vcodec'] = 'none'
formats.append(f)
def build_stream_name():
# Despite specification does not mention NAME attribute for
# EXT-X-STREAM-INF tag it still sometimes may be present (see [1]
# or vidio test in TestInfoExtractor.test_parse_m3u8_formats)
# 1. http://www.vidio.com/watch/165683-dj_ambred-booyah-live-2015
stream_name = last_stream_inf.get('NAME')
if stream_name:
return stream_name
# If there is no NAME in EXT-X-STREAM-INF it will be obtained
# from corresponding rendition group
stream_group_id = last_stream_inf.get('VIDEO')
if not stream_group_id:
return
stream_group = groups.get(stream_group_id)
if not stream_group:
return stream_group_id
rendition = stream_group[0]
return rendition.get('NAME') or stream_group_id
for line in m3u8_doc.splitlines(): for line in m3u8_doc.splitlines():
if line.startswith('#EXT-X-STREAM-INF:'): if line.startswith('#EXT-X-STREAM-INF:'):
last_info = parse_m3u8_attributes(line) last_stream_inf = parse_m3u8_attributes(line)
elif line.startswith('#EXT-X-MEDIA:'): elif line.startswith('#EXT-X-MEDIA:'):
media = parse_m3u8_attributes(line) extract_media(line)
media_type = media.get('TYPE')
if media_type in ('VIDEO', 'AUDIO'):
group_id = media.get('GROUP-ID')
media_url = media.get('URI')
if media_url:
format_id = []
for v in (group_id, media.get('NAME')):
if v:
format_id.append(v)
f = {
'format_id': '-'.join(format_id),
'url': format_url(media_url),
'language': media.get('LANGUAGE'),
'ext': ext,
'protocol': entry_protocol,
'preference': preference,
}
if media_type == 'AUDIO':
f['vcodec'] = 'none'
if group_id and not audio_in_video_stream.get(group_id):
audio_in_video_stream[group_id] = False
formats.append(f)
else:
# When there is no URI in EXT-X-MEDIA let this tag's
# data be used by regular URI lines below
last_media = media
if media_type == 'AUDIO' and group_id:
audio_in_video_stream[group_id] = True
elif line.startswith('#') or not line.strip(): elif line.startswith('#') or not line.strip():
continue continue
else: else:
tbr = int_or_none(last_info.get('AVERAGE-BANDWIDTH') or last_info.get('BANDWIDTH'), scale=1000) tbr = float_or_none(
last_stream_inf.get('AVERAGE-BANDWIDTH') or
last_stream_inf.get('BANDWIDTH'), scale=1000)
format_id = [] format_id = []
if m3u8_id: if m3u8_id:
format_id.append(m3u8_id) format_id.append(m3u8_id)
# Despite specification does not mention NAME attribute for stream_name = build_stream_name()
# EXT-X-STREAM-INF it still sometimes may be present
stream_name = last_info.get('NAME') or last_media.get('NAME')
# Bandwidth of live streams may differ over time thus making # Bandwidth of live streams may differ over time thus making
# format_id unpredictable. So it's better to keep provided # format_id unpredictable. So it's better to keep provided
# format_id intact. # format_id intact.
@ -1403,11 +1441,11 @@ class InfoExtractor(object):
'manifest_url': manifest_url, 'manifest_url': manifest_url,
'tbr': tbr, 'tbr': tbr,
'ext': ext, 'ext': ext,
'fps': float_or_none(last_info.get('FRAME-RATE')), 'fps': float_or_none(last_stream_inf.get('FRAME-RATE')),
'protocol': entry_protocol, 'protocol': entry_protocol,
'preference': preference, 'preference': preference,
} }
resolution = last_info.get('RESOLUTION') resolution = last_stream_inf.get('RESOLUTION')
if resolution: if resolution:
mobj = re.search(r'(?P<width>\d+)[xX](?P<height>\d+)', resolution) mobj = re.search(r'(?P<width>\d+)[xX](?P<height>\d+)', resolution)
if mobj: if mobj:
@ -1423,13 +1461,26 @@ class InfoExtractor(object):
'vbr': vbr, 'vbr': vbr,
'abr': abr, 'abr': abr,
}) })
f.update(parse_codecs(last_info.get('CODECS'))) codecs = parse_codecs(last_stream_inf.get('CODECS'))
if audio_in_video_stream.get(last_info.get('AUDIO')) is False and f['vcodec'] != 'none': f.update(codecs)
# TODO: update acodec for audio only formats with the same GROUP-ID audio_group_id = last_stream_inf.get('AUDIO')
f['acodec'] = 'none' # As per [1, 4.3.4.1.1] any EXT-X-STREAM-INF tag which
# references a rendition group MUST have a CODECS attribute.
# However, this is not always respected, for example, [2]
# contains EXT-X-STREAM-INF tag which references AUDIO
# rendition group but does not have CODECS and despite
# referencing audio group an audio group, it represents
# a complete (with audio and video) format. So, for such cases
# we will ignore references to rendition groups and treat them
# as complete formats.
if audio_group_id and codecs and f.get('vcodec') != 'none':
audio_group = groups.get(audio_group_id)
if audio_group and audio_group[0].get('URI'):
# TODO: update acodec for audio only formats with
# the same GROUP-ID
f['acodec'] = 'none'
formats.append(f) formats.append(f)
last_info = {} last_stream_inf = {}
last_media = {}
return formats return formats
@staticmethod @staticmethod
@ -1803,7 +1854,7 @@ class InfoExtractor(object):
'ext': mimetype2ext(mime_type), 'ext': mimetype2ext(mime_type),
'width': int_or_none(representation_attrib.get('width')), 'width': int_or_none(representation_attrib.get('width')),
'height': int_or_none(representation_attrib.get('height')), 'height': int_or_none(representation_attrib.get('height')),
'tbr': int_or_none(bandwidth, 1000), 'tbr': float_or_none(bandwidth, 1000),
'asr': int_or_none(representation_attrib.get('audioSamplingRate')), 'asr': int_or_none(representation_attrib.get('audioSamplingRate')),
'fps': int_or_none(representation_attrib.get('frameRate')), 'fps': int_or_none(representation_attrib.get('frameRate')),
'language': lang if lang not in ('mul', 'und', 'zxx', 'mis') else None, 'language': lang if lang not in ('mul', 'und', 'zxx', 'mis') else None,
@ -2182,7 +2233,7 @@ class InfoExtractor(object):
def _find_jwplayer_data(self, webpage, video_id=None, transform_source=js_to_json): def _find_jwplayer_data(self, webpage, video_id=None, transform_source=js_to_json):
mobj = re.search( mobj = re.search(
r'(?s)jwplayer\((?P<quote>[\'"])[^\'" ]+(?P=quote)\).*?\.setup\s*\((?P<options>[^)]+)\)', r'(?s)jwplayer\((?P<quote>[\'"])[^\'" ]+(?P=quote)\)(?!</script>).*?\.setup\s*\((?P<options>[^)]+)\)',
webpage) webpage)
if mobj: if mobj:
try: try:

View File

@ -87,7 +87,6 @@ from .azmedien import (
AZMedienPlaylistIE, AZMedienPlaylistIE,
AZMedienShowPlaylistIE, AZMedienShowPlaylistIE,
) )
from .azubu import AzubuIE, AzubuLiveIE
from .baidu import BaiduVideoIE from .baidu import BaiduVideoIE
from .bambuser import BambuserIE, BambuserChannelIE from .bambuser import BambuserIE, BambuserChannelIE
from .bandcamp import BandcampIE, BandcampAlbumIE from .bandcamp import BandcampIE, BandcampAlbumIE

View File

@ -430,6 +430,22 @@ class GenericIE(InfoExtractor):
'skip_download': True, # m3u8 download 'skip_download': True, # m3u8 download
}, },
}, },
{
# Brightcove video in <iframe>
'url': 'http://www.un.org/chinese/News/story.asp?NewsID=27724',
'md5': '36d74ef5e37c8b4a2ce92880d208b968',
'info_dict': {
'id': '5360463607001',
'ext': 'mp4',
'title': '叙利亚失明儿童在废墟上演唱《心跳》 呼吁获得正常童年生活',
'description': '联合国儿童基金会中东和北非区域大使、作曲家扎德·迪拉尼Zade Dirani在3月15日叙利亚冲突爆发7周年纪念日之际发布了为叙利亚谱写的歌曲《心跳》HEARTBEAT为受到六年冲突影响的叙利亚儿童发出强烈呐喊呼吁世界做出共同努力使叙利亚儿童重新获得享有正常童年生活的权利。',
'uploader': 'United Nations',
'uploader_id': '1362235914001',
'timestamp': 1489593889,
'upload_date': '20170315',
},
'add_ie': ['BrightcoveLegacy'],
},
{ {
# Brightcove with alternative playerID key # Brightcove with alternative playerID key
'url': 'http://www.nature.com/nmeth/journal/v9/n7/fig_tab/nmeth.2062_SV1.html', 'url': 'http://www.nature.com/nmeth/journal/v9/n7/fig_tab/nmeth.2062_SV1.html',

View File

@ -112,7 +112,8 @@ class InstagramIE(InfoExtractor):
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
(video_url, description, thumbnail, timestamp, uploader, (video_url, description, thumbnail, timestamp, uploader,
uploader_id, like_count, comment_count, height, width) = [None] * 10 uploader_id, like_count, comment_count, comments, height,
width) = [None] * 11
shared_data = self._parse_json( shared_data = self._parse_json(
self._search_regex( self._search_regex(
@ -121,7 +122,10 @@ class InstagramIE(InfoExtractor):
video_id, fatal=False) video_id, fatal=False)
if shared_data: if shared_data:
media = try_get( media = try_get(
shared_data, lambda x: x['entry_data']['PostPage'][0]['media'], dict) shared_data,
(lambda x: x['entry_data']['PostPage'][0]['graphql']['shortcode_media'],
lambda x: x['entry_data']['PostPage'][0]['media']),
dict)
if media: if media:
video_url = media.get('video_url') video_url = media.get('video_url')
height = int_or_none(media.get('dimensions', {}).get('height')) height = int_or_none(media.get('dimensions', {}).get('height'))

View File

@ -189,7 +189,11 @@ class IqiyiIE(InfoExtractor):
'only_matching': True, 'only_matching': True,
}, { }, {
'url': 'http://yule.iqiyi.com/pcb.html', 'url': 'http://yule.iqiyi.com/pcb.html',
'only_matching': True, 'info_dict': {
'id': '4a0af228fddb55ec96398a364248ed7f',
'ext': 'mp4',
'title': '第2017-04-21期 女艺人频遭极端粉丝骚扰',
},
}, { }, {
# VIP-only video. The first 2 parts (6 minutes) are available without login # VIP-only video. The first 2 parts (6 minutes) are available without login
# MD5 sums omitted as values are different on Travis CI and my machine # MD5 sums omitted as values are different on Travis CI and my machine
@ -337,15 +341,18 @@ class IqiyiIE(InfoExtractor):
url, 'temp_id', note='download video page') url, 'temp_id', note='download video page')
# There's no simple way to determine whether an URL is a playlist or not # There's no simple way to determine whether an URL is a playlist or not
# So detect it # Sometimes there are playlist links in individual videos, so treat it
playlist_result = self._extract_playlist(webpage) # as a single video first
if playlist_result:
return playlist_result
tvid = self._search_regex( tvid = self._search_regex(
r'data-player-tvid\s*=\s*[\'"](\d+)', webpage, 'tvid') r'data-(?:player|shareplattrigger)-tvid\s*=\s*[\'"](\d+)', webpage, 'tvid', default=None)
if tvid is None:
playlist_result = self._extract_playlist(webpage)
if playlist_result:
return playlist_result
raise ExtractorError('Can\'t find any video')
video_id = self._search_regex( video_id = self._search_regex(
r'data-player-videoid\s*=\s*[\'"]([a-f\d]+)', webpage, 'video_id') r'data-(?:player|shareplattrigger)-videoid\s*=\s*[\'"]([a-f\d]+)', webpage, 'video_id')
formats = [] formats = []
for _ in range(5): for _ in range(5):
@ -377,7 +384,8 @@ class IqiyiIE(InfoExtractor):
self._sort_formats(formats) self._sort_formats(formats)
title = (get_element_by_id('widget-videotitle', webpage) or title = (get_element_by_id('widget-videotitle', webpage) or
clean_html(get_element_by_attribute('class', 'mod-play-tit', webpage))) clean_html(get_element_by_attribute('class', 'mod-play-tit', webpage)) or
self._html_search_regex(r'<span[^>]+data-videochanged-title="word"[^>]*>([^<]+)</span>', webpage, 'title'))
return { return {
'id': video_id, 'id': video_id,

View File

@ -28,7 +28,7 @@ class NownessBaseIE(InfoExtractor):
bc_url = BrightcoveLegacyIE._extract_brightcove_url(player_code) bc_url = BrightcoveLegacyIE._extract_brightcove_url(player_code)
if bc_url: if bc_url:
return self.url_result(bc_url, BrightcoveLegacyIE.ie_key()) return self.url_result(bc_url, BrightcoveLegacyIE.ie_key())
bc_url = BrightcoveNewIE._extract_url(player_code) bc_url = BrightcoveNewIE._extract_url(self, player_code)
if bc_url: if bc_url:
return self.url_result(bc_url, BrightcoveNewIE.ie_key()) return self.url_result(bc_url, BrightcoveNewIE.ie_key())
raise ExtractorError('Could not find player definition') raise ExtractorError('Could not find player definition')

View File

@ -38,7 +38,7 @@ class OdnoklassnikiIE(InfoExtractor):
}, { }, {
# metadataUrl # metadataUrl
'url': 'http://ok.ru/video/63567059965189-0?fromTime=5', 'url': 'http://ok.ru/video/63567059965189-0?fromTime=5',
'md5': '9676cf86eff5391d35dea675d224e131', 'md5': '6ff470ea2dd51d5d18c295a355b0b6bc',
'info_dict': { 'info_dict': {
'id': '63567059965189-0', 'id': '63567059965189-0',
'ext': 'mp4', 'ext': 'mp4',
@ -54,7 +54,7 @@ class OdnoklassnikiIE(InfoExtractor):
}, { }, {
# YouTube embed (metadataUrl, provider == USER_YOUTUBE) # YouTube embed (metadataUrl, provider == USER_YOUTUBE)
'url': 'http://ok.ru/video/64211978996595-1', 'url': 'http://ok.ru/video/64211978996595-1',
'md5': '5d7475d428845cd2e13bae6f1a992278', 'md5': '2f206894ffb5dbfcce2c5a14b909eea5',
'info_dict': { 'info_dict': {
'id': '64211978996595-1', 'id': '64211978996595-1',
'ext': 'mp4', 'ext': 'mp4',
@ -62,8 +62,8 @@ class OdnoklassnikiIE(InfoExtractor):
'description': 'md5:848eb8b85e5e3471a3a803dae1343ed0', 'description': 'md5:848eb8b85e5e3471a3a803dae1343ed0',
'duration': 440, 'duration': 440,
'upload_date': '20150826', 'upload_date': '20150826',
'uploader_id': '750099571', 'uploader_id': 'tvroscosmos',
'uploader': 'Алина П', 'uploader': 'Телестудия Роскосмоса',
'age_limit': 0, 'age_limit': 0,
}, },
}, { }, {
@ -82,6 +82,7 @@ class OdnoklassnikiIE(InfoExtractor):
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
'skip': 'Video has not been found',
}, { }, {
'url': 'http://ok.ru/web-api/video/moviePlayer/20079905452', 'url': 'http://ok.ru/web-api/video/moviePlayer/20079905452',
'only_matching': True, 'only_matching': True,

View File

@ -1,10 +1,6 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
from ..compat import (
compat_urllib_parse_unquote,
compat_urllib_parse_urlencode,
)
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
parse_duration, parse_duration,
@ -19,7 +15,7 @@ class Porn91IE(InfoExtractor):
_TEST = { _TEST = {
'url': 'http://91porn.com/view_video.php?viewkey=7e42283b4f5ab36da134', 'url': 'http://91porn.com/view_video.php?viewkey=7e42283b4f5ab36da134',
'md5': '6df8f6d028bc8b14f5dbd73af742fb20', 'md5': '7fcdb5349354f40d41689bd0fa8db05a',
'info_dict': { 'info_dict': {
'id': '7e42283b4f5ab36da134', 'id': '7e42283b4f5ab36da134',
'title': '18岁大一漂亮学妹水嫩性感再爽一次', 'title': '18岁大一漂亮学妹水嫩性感再爽一次',
@ -43,24 +39,7 @@ class Porn91IE(InfoExtractor):
r'<div id="viewvideo-title">([^<]+)</div>', webpage, 'title') r'<div id="viewvideo-title">([^<]+)</div>', webpage, 'title')
title = title.replace('\n', '') title = title.replace('\n', '')
# get real url info_dict = self._parse_html5_media_entries(url, webpage, video_id)[0]
file_id = self._search_regex(
r'so.addVariable\(\'file\',\'(\d+)\'', webpage, 'file id')
sec_code = self._search_regex(
r'so.addVariable\(\'seccode\',\'([^\']+)\'', webpage, 'sec code')
max_vid = self._search_regex(
r'so.addVariable\(\'max_vid\',\'(\d+)\'', webpage, 'max vid')
url_params = compat_urllib_parse_urlencode({
'VID': file_id,
'mp4': '1',
'seccode': sec_code,
'max_vid': max_vid,
})
info_cn = self._download_webpage(
'http://91porn.com/getfile.php?' + url_params, video_id,
'Downloading real video url')
video_url = compat_urllib_parse_unquote(self._search_regex(
r'file=([^&]+)&', info_cn, 'url'))
duration = parse_duration(self._search_regex( duration = parse_duration(self._search_regex(
r'时长:\s*</span>\s*(\d+:\d+)', webpage, 'duration', fatal=False)) r'时长:\s*</span>\s*(\d+:\d+)', webpage, 'duration', fatal=False))
@ -68,11 +47,12 @@ class Porn91IE(InfoExtractor):
comment_count = int_or_none(self._search_regex( comment_count = int_or_none(self._search_regex(
r'留言:\s*</span>\s*(\d+)', webpage, 'comment count', fatal=False)) r'留言:\s*</span>\s*(\d+)', webpage, 'comment count', fatal=False))
return { info_dict.update({
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'url': video_url,
'duration': duration, 'duration': duration,
'comment_count': comment_count, 'comment_count': comment_count,
'age_limit': self._rta_search(webpage), 'age_limit': self._rta_search(webpage),
} })
return info_dict

View File

@ -49,8 +49,11 @@ class VidioIE(InfoExtractor):
thumbnail = clip.get('image') thumbnail = clip.get('image')
m3u8_url = m3u8_url or self._search_regex( m3u8_url = m3u8_url or self._search_regex(
r'data(?:-vjs)?-clip-hls-url=(["\'])(?P<url>.+?)\1', webpage, 'hls url') r'data(?:-vjs)?-clip-hls-url=(["\'])(?P<url>(?!\1).+)\1',
formats = self._extract_m3u8_formats(m3u8_url, display_id, 'mp4', entry_protocol='m3u8_native') webpage, 'hls url')
formats = self._extract_m3u8_formats(
m3u8_url, display_id, 'mp4', entry_protocol='m3u8_native')
self._sort_formats(formats)
duration = int_or_none(duration or self._search_regex( duration = int_or_none(duration or self._search_regex(
r'data-video-duration=(["\'])(?P<duartion>\d+)\1', webpage, 'duration')) r'data-video-duration=(["\'])(?P<duartion>\d+)\1', webpage, 'duration'))

View File

@ -42,14 +42,15 @@ class VidziIE(InfoExtractor):
title = self._html_search_regex( title = self._html_search_regex(
r'(?s)<h2 class="video-title">(.*?)</h2>', webpage, 'title') r'(?s)<h2 class="video-title">(.*?)</h2>', webpage, 'title')
packed_codes = [mobj.group(0) for mobj in re.finditer( codes = [webpage]
PACKED_CODES_RE, webpage)] codes.extend([
for num, pc in enumerate(packed_codes, 1): decode_packed_codes(mobj.group(0)).replace('\\\'', '\'')
code = decode_packed_codes(pc).replace('\\\'', '\'') for mobj in re.finditer(PACKED_CODES_RE, webpage)])
for num, code in enumerate(codes, 1):
jwplayer_data = self._parse_json( jwplayer_data = self._parse_json(
self._search_regex( self._search_regex(
r'setup\(([^)]+)\)', code, 'jwplayer data', r'setup\(([^)]+)\)', code, 'jwplayer data',
default=NO_DEFAULT if num == len(packed_codes) else '{}'), default=NO_DEFAULT if num == len(codes) else '{}'),
video_id, transform_source=js_to_json) video_id, transform_source=js_to_json)
if jwplayer_data: if jwplayer_data:
break break

View File

@ -17,24 +17,24 @@ from ..utils import (
class XFileShareIE(InfoExtractor): class XFileShareIE(InfoExtractor):
_SITES = ( _SITES = (
('daclips.in', 'DaClips'), (r'daclips\.(?:in|com)', 'DaClips'),
('filehoot.com', 'FileHoot'), (r'filehoot\.com', 'FileHoot'),
('gorillavid.in', 'GorillaVid'), (r'gorillavid\.(?:in|com)', 'GorillaVid'),
('movpod.in', 'MovPod'), (r'movpod\.in', 'MovPod'),
('powerwatch.pw', 'PowerWatch'), (r'powerwatch\.pw', 'PowerWatch'),
('rapidvideo.ws', 'Rapidvideo.ws'), (r'rapidvideo\.ws', 'Rapidvideo.ws'),
('thevideobee.to', 'TheVideoBee'), (r'thevideobee\.to', 'TheVideoBee'),
('vidto.me', 'Vidto'), (r'vidto\.me', 'Vidto'),
('streamin.to', 'Streamin.To'), (r'streamin\.to', 'Streamin.To'),
('xvidstage.com', 'XVIDSTAGE'), (r'xvidstage\.com', 'XVIDSTAGE'),
('vidabc.com', 'Vid ABC'), (r'vidabc\.com', 'Vid ABC'),
('vidbom.com', 'VidBom'), (r'vidbom\.com', 'VidBom'),
('vidlo.us', 'vidlo'), (r'vidlo\.us', 'vidlo'),
) )
IE_DESC = 'XFileShare based sites: %s' % ', '.join(list(zip(*_SITES))[1]) IE_DESC = 'XFileShare based sites: %s' % ', '.join(list(zip(*_SITES))[1])
_VALID_URL = (r'https?://(?P<host>(?:www\.)?(?:%s))/(?:embed-)?(?P<id>[0-9a-zA-Z]+)' _VALID_URL = (r'https?://(?P<host>(?:www\.)?(?:%s))/(?:embed-)?(?P<id>[0-9a-zA-Z]+)'
% '|'.join(re.escape(site) for site in list(zip(*_SITES))[0])) % '|'.join(site for site in list(zip(*_SITES))[0]))
_FILE_NOT_FOUND_REGEXES = ( _FILE_NOT_FOUND_REGEXES = (
r'>(?:404 - )?File Not Found<', r'>(?:404 - )?File Not Found<',

View File

@ -258,7 +258,7 @@ class YahooIE(InfoExtractor):
return self.url_result(bc_url, BrightcoveLegacyIE.ie_key()) return self.url_result(bc_url, BrightcoveLegacyIE.ie_key())
# Look for Brightcove New Studio embeds # Look for Brightcove New Studio embeds
bc_url = BrightcoveNewIE._extract_url(webpage) bc_url = BrightcoveNewIE._extract_url(self, webpage)
if bc_url: if bc_url:
return self.url_result(bc_url, BrightcoveNewIE.ie_key()) return self.url_result(bc_url, BrightcoveNewIE.ie_key())

View File

@ -468,6 +468,10 @@ def parseOpts(overrideArguments=None):
'--abort-on-unavailable-fragment', '--abort-on-unavailable-fragment',
action='store_false', dest='skip_unavailable_fragments', action='store_false', dest='skip_unavailable_fragments',
help='Abort downloading when some fragment is not available') help='Abort downloading when some fragment is not available')
downloader.add_option(
'--keep-fragments',
action='store_true', dest='keep_fragments', default=False,
help='Keep downloaded fragments on disk after downloading is finished; fragments are erased by default')
downloader.add_option( downloader.add_option(
'--buffer-size', '--buffer-size',
dest='buffersize', metavar='SIZE', default='1024', dest='buffersize', metavar='SIZE', default='1024',

View File

@ -193,9 +193,10 @@ class sockssocket(socket.socket):
self._check_response_version(SOCKS5_VERSION, version) self._check_response_version(SOCKS5_VERSION, version)
if method == Socks5Auth.AUTH_NO_ACCEPTABLE: if method == Socks5Auth.AUTH_NO_ACCEPTABLE or (
method == Socks5Auth.AUTH_USER_PASS and (not self._proxy.username or not self._proxy.password)):
self.close() self.close()
raise Socks5Error(method) raise Socks5Error(Socks5Auth.AUTH_NO_ACCEPTABLE)
if method == Socks5Auth.AUTH_USER_PASS: if method == Socks5Auth.AUTH_USER_PASS:
username = self._proxy.username.encode('utf-8') username = self._proxy.username.encode('utf-8')

View File

@ -2103,13 +2103,16 @@ def dict_get(d, key_or_keys, default=None, skip_false_values=True):
def try_get(src, getter, expected_type=None): def try_get(src, getter, expected_type=None):
try: if not isinstance(getter, (list, tuple)):
v = getter(src) getter = [getter]
except (AttributeError, KeyError, TypeError, IndexError): for get in getter:
pass try:
else: v = get(src)
if expected_type is None or isinstance(v, expected_type): except (AttributeError, KeyError, TypeError, IndexError):
return v pass
else:
if expected_type is None or isinstance(v, expected_type):
return v
def encode_compat_str(string, encoding=preferredencoding(), errors='strict'): def encode_compat_str(string, encoding=preferredencoding(), errors='strict'):
@ -2508,27 +2511,97 @@ def srt_subtitles_timecode(seconds):
def dfxp2srt(dfxp_data): def dfxp2srt(dfxp_data):
LEGACY_NAMESPACES = (
('http://www.w3.org/ns/ttml', [
'http://www.w3.org/2004/11/ttaf1',
'http://www.w3.org/2006/04/ttaf1',
'http://www.w3.org/2006/10/ttaf1',
]),
('http://www.w3.org/ns/ttml#styling', [
'http://www.w3.org/ns/ttml#style',
]),
)
SUPPORTED_STYLING = [
'color',
'fontFamily',
'fontSize',
'fontStyle',
'fontWeight',
'textDecoration'
]
_x = functools.partial(xpath_with_ns, ns_map={ _x = functools.partial(xpath_with_ns, ns_map={
'ttml': 'http://www.w3.org/ns/ttml', 'ttml': 'http://www.w3.org/ns/ttml',
'ttaf1': 'http://www.w3.org/2006/10/ttaf1', 'tts': 'http://www.w3.org/ns/ttml#styling',
'ttaf1_0604': 'http://www.w3.org/2006/04/ttaf1',
}) })
styles = {}
default_style = {}
class TTMLPElementParser(object): class TTMLPElementParser(object):
out = '' _out = ''
_unclosed_elements = []
_applied_styles = []
def start(self, tag, attrib): def start(self, tag, attrib):
if tag in (_x('ttml:br'), _x('ttaf1:br'), 'br'): if tag in (_x('ttml:br'), 'br'):
self.out += '\n' self._out += '\n'
else:
unclosed_elements = []
style = {}
element_style_id = attrib.get('style')
if default_style:
style.update(default_style)
if element_style_id:
style.update(styles.get(element_style_id, {}))
for prop in SUPPORTED_STYLING:
prop_val = attrib.get(_x('tts:' + prop))
if prop_val:
style[prop] = prop_val
if style:
font = ''
for k, v in sorted(style.items()):
if self._applied_styles and self._applied_styles[-1].get(k) == v:
continue
if k == 'color':
font += ' color="%s"' % v
elif k == 'fontSize':
font += ' size="%s"' % v
elif k == 'fontFamily':
font += ' face="%s"' % v
elif k == 'fontWeight' and v == 'bold':
self._out += '<b>'
unclosed_elements.append('b')
elif k == 'fontStyle' and v == 'italic':
self._out += '<i>'
unclosed_elements.append('i')
elif k == 'textDecoration' and v == 'underline':
self._out += '<u>'
unclosed_elements.append('u')
if font:
self._out += '<font' + font + '>'
unclosed_elements.append('font')
applied_style = {}
if self._applied_styles:
applied_style.update(self._applied_styles[-1])
applied_style.update(style)
self._applied_styles.append(applied_style)
self._unclosed_elements.append(unclosed_elements)
def end(self, tag): def end(self, tag):
pass if tag not in (_x('ttml:br'), 'br'):
unclosed_elements = self._unclosed_elements.pop()
for element in reversed(unclosed_elements):
self._out += '</%s>' % element
if unclosed_elements and self._applied_styles:
self._applied_styles.pop()
def data(self, data): def data(self, data):
self.out += data self._out += data
def close(self): def close(self):
return self.out.strip() return self._out.strip()
def parse_node(node): def parse_node(node):
target = TTMLPElementParser() target = TTMLPElementParser()
@ -2536,13 +2609,45 @@ def dfxp2srt(dfxp_data):
parser.feed(xml.etree.ElementTree.tostring(node)) parser.feed(xml.etree.ElementTree.tostring(node))
return parser.close() return parser.close()
for k, v in LEGACY_NAMESPACES:
for ns in v:
dfxp_data = dfxp_data.replace(ns, k)
dfxp = compat_etree_fromstring(dfxp_data.encode('utf-8')) dfxp = compat_etree_fromstring(dfxp_data.encode('utf-8'))
out = [] out = []
paras = dfxp.findall(_x('.//ttml:p')) or dfxp.findall(_x('.//ttaf1:p')) or dfxp.findall(_x('.//ttaf1_0604:p')) or dfxp.findall('.//p') paras = dfxp.findall(_x('.//ttml:p')) or dfxp.findall('.//p')
if not paras: if not paras:
raise ValueError('Invalid dfxp/TTML subtitle') raise ValueError('Invalid dfxp/TTML subtitle')
repeat = False
while True:
for style in dfxp.findall(_x('.//ttml:style')):
style_id = style.get('id')
parent_style_id = style.get('style')
if parent_style_id:
if parent_style_id not in styles:
repeat = True
continue
styles[style_id] = styles[parent_style_id].copy()
for prop in SUPPORTED_STYLING:
prop_val = style.get(_x('tts:' + prop))
if prop_val:
styles.setdefault(style_id, {})[prop] = prop_val
if repeat:
repeat = False
else:
break
for p in ('body', 'div'):
ele = xpath_element(dfxp, [_x('.//ttml:' + p), './/' + p])
if ele is None:
continue
style = styles.get(ele.get('style'))
if not style:
continue
default_style.update(style)
for para, index in zip(paras, itertools.count(1)): for para, index in zip(paras, itertools.count(1)):
begin_time = parse_dfxp_time_expr(para.attrib.get('begin')) begin_time = parse_dfxp_time_expr(para.attrib.get('begin'))
end_time = parse_dfxp_time_expr(para.attrib.get('end')) end_time = parse_dfxp_time_expr(para.attrib.get('end'))

View File

@ -1,3 +1,3 @@
from __future__ import unicode_literals from __future__ import unicode_literals
__version__ = '2017.04.17' __version__ = '2017.04.26'