Merge remote-tracking branch 'upstream/master' into yourporn
commit 2784e9376e
6
.github/ISSUE_TEMPLATE.md
vendored
@ -6,8 +6,8 @@
|
|||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2019.01.17*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
|
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2019.01.23*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
|
||||||
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2019.01.17**
|
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2019.01.23**
|
||||||
|
|
||||||
### Before submitting an *issue* make sure you have:
|
### Before submitting an *issue* make sure you have:
|
||||||
- [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
|
- [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
|
||||||
@ -36,7 +36,7 @@ Add the `-v` flag to **your command line** you run youtube-dl with (`youtube-dl
|
|||||||
[debug] User config: []
|
[debug] User config: []
|
||||||
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
|
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
|
||||||
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
|
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
|
||||||
[debug] youtube-dl version 2019.01.17
|
[debug] youtube-dl version 2019.01.23
|
||||||
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
|
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
|
||||||
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
|
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
|
||||||
[debug] Proxy map: {}
|
[debug] Proxy map: {}
|
||||||
|
29
ChangeLog
@ -1,3 +1,32 @@
|
|||||||
|
version 2019.01.23
|
||||||
|
|
||||||
|
Core
|
||||||
|
* [utils] Fix urljoin for paths with non-http(s) schemes
|
||||||
|
* [extractor/common] Improve jwplayer relative URL handling (#18892)
|
||||||
|
+ [YoutubeDL] Add negation support for string comparisons in format selection
|
||||||
|
expressions (#18600, #18805)
|
||||||
|
* [extractor/common] Improve HLS video-only format detection (#18923)
|
||||||
|
|
||||||
|
Extractors
|
||||||
|
* [crunchyroll] Extend URL regular expression (#18955)
|
||||||
|
* [pornhub] Bypass scrape detection (#4822, #5930, #7074, #10175, #12722,
|
||||||
|
#17197, #18338, #18842, #18899)
|
||||||
|
+ [vrv] Add support for authentication (#14307)
|
||||||
|
* [videomore:season] Fix extraction
|
||||||
|
* [videomore] Improve extraction (#18908)
|
||||||
|
+ [tnaflix] Pass Referer in metadata request (#18925)
|
||||||
|
* [radiocanada] Relax DRM check (#18608, #18609)
|
||||||
|
* [vimeo] Fix video password verification for videos protected by
|
||||||
|
Referer HTTP header
|
||||||
|
+ [hketv] Add support for hkedcity.net (#18696)
|
||||||
|
+ [streamango] Add support for fruithosts.net (#18710)
|
||||||
|
+ [instagram] Add support for tags (#18757)
|
||||||
|
+ [odnoklassniki] Detect paid videos (#18876)
|
||||||
|
* [ted] Correct acodec for HTTP formats (#18923)
|
||||||
|
* [cartoonnetwork] Fix extraction (#15664, #17224)
|
||||||
|
* [vimeo] Fix extraction for password protected player URLs (#18889)
|
||||||
|
|
||||||
|
|
||||||
version 2019.01.17
|
version 2019.01.17
|
||||||
|
|
||||||
Extractors
|
Extractors
|
||||||
|
@ -667,7 +667,7 @@ The following numeric meta fields can be used with comparisons `<`, `<=`, `>`, `
|
|||||||
- `asr`: Audio sampling rate in Hertz
|
- `asr`: Audio sampling rate in Hertz
|
||||||
- `fps`: Frame rate
|
- `fps`: Frame rate
|
||||||
|
|
||||||
Also filtering work for comparisons `=` (equals), `!=` (not equals), `^=` (begins with), `$=` (ends with), `*=` (contains) and following string meta fields:
|
Filtering also works for the comparisons `=` (equals), `^=` (starts with), `$=` (ends with), `*=` (contains) and the following string meta fields:
|
||||||
- `ext`: File extension
|
- `ext`: File extension
|
||||||
- `acodec`: Name of the audio codec in use
|
- `acodec`: Name of the audio codec in use
|
||||||
- `vcodec`: Name of the video codec in use
|
- `vcodec`: Name of the video codec in use
|
||||||
@ -675,6 +675,8 @@ Also filtering work for comparisons `=` (equals), `!=` (not equals), `^=` (begin
|
|||||||
- `protocol`: The protocol that will be used for the actual download, lower-case (`http`, `https`, `rtsp`, `rtmp`, `rtmpe`, `mms`, `f4m`, `ism`, `http_dash_segments`, `m3u8`, or `m3u8_native`)
|
- `protocol`: The protocol that will be used for the actual download, lower-case (`http`, `https`, `rtsp`, `rtmp`, `rtmpe`, `mms`, `f4m`, `ism`, `http_dash_segments`, `m3u8`, or `m3u8_native`)
|
||||||
- `format_id`: A short description of the format
|
- `format_id`: A short description of the format
|
||||||
|
|
||||||
|
Any string comparison may be prefixed with negation `!` in order to produce an opposite comparison, e.g. `!*=` (does not contain).
|
||||||
|
|
||||||
Note that none of the aforementioned meta fields are guaranteed to be present since this solely depends on the metadata obtained by particular extractor, i.e. the metadata offered by the video hoster.
|
Note that none of the aforementioned meta fields are guaranteed to be present since this solely depends on the metadata obtained by particular extractor, i.e. the metadata offered by the video hoster.
|
||||||
|
|
||||||
Formats for which the value is not known are excluded unless you put a question mark (`?`) after the operator. You can combine format filters, so `-f "[height <=? 720][tbr>500]"` selects up to 720p videos (or videos where the height is not known) with a bitrate of at least 500 KBit/s.
|
Formats for which the value is not known are excluded unless you put a question mark (`?`) after the operator. You can combine format filters, so `-f "[height <=? 720][tbr>500]"` selects up to 720p videos (or videos where the height is not known) with a bitrate of at least 500 KBit/s.
|
||||||
|
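The none-inclusive `?` suffix and the chaining of filters described in the README text can be modelled in a few lines. This is only a sketch under stated assumptions: `numeric_filter` and `select` are hypothetical helpers, not youtube-dl functions, and the sample format dicts are invented for illustration.

```python
import operator


def numeric_filter(key, op, value, none_inclusive=False):
    """Build a predicate for one bracketed filter (hypothetical helper)."""
    def _filter(fmt):
        actual = fmt.get(key)
        if actual is None:
            # Unknown values are excluded unless the filter carries a '?'.
            return none_inclusive
        return op(actual, value)
    return _filter


def select(formats, filters):
    """Keep only the formats that pass every filter."""
    return [f for f in formats if all(flt(f) for flt in filters)]


# Mirrors the README example: -f "[height <=? 720][tbr>500]"
FILTERS = [
    numeric_filter('height', operator.le, 720, none_inclusive=True),
    numeric_filter('tbr', operator.gt, 500),
]

# Invented sample data, not taken from any real extractor.
FORMATS = [
    {'format_id': 'a', 'height': 1080, 'tbr': 3000},
    {'format_id': 'b', 'height': 720, 'tbr': 1500},
    {'format_id': 'c', 'tbr': 800},          # height not known
    {'format_id': 'd', 'height': 360, 'tbr': 400},
    {'format_id': 'e', 'height': 480},       # tbr not known
]

selected_ids = [f['format_id'] for f in select(FORMATS, FILTERS)]
```

Format `c` survives because its unknown height is admitted by `?`, while `e` is dropped because the `tbr` filter has no `?`.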
@ -361,6 +361,7 @@
|
|||||||
- **hitbox**
|
- **hitbox**
|
||||||
- **hitbox:live**
|
- **hitbox:live**
|
||||||
- **HitRecord**
|
- **HitRecord**
|
||||||
|
- **hketv**: 香港教育局教育電視 (HKETV) Educational Television, Hong Kong Educational Bureau
|
||||||
- **HornBunny**
|
- **HornBunny**
|
||||||
- **HotNewHipHop**
|
- **HotNewHipHop**
|
||||||
- **hotstar**
|
- **hotstar**
|
||||||
@ -386,6 +387,7 @@
|
|||||||
- **IndavideoEmbed**
|
- **IndavideoEmbed**
|
||||||
- **InfoQ**
|
- **InfoQ**
|
||||||
- **Instagram**
|
- **Instagram**
|
||||||
|
- **instagram:tag**: Instagram hashtag search
|
||||||
- **instagram:user**: Instagram user profile
|
- **instagram:user**: Instagram user profile
|
||||||
- **Internazionale**
|
- **Internazionale**
|
||||||
- **InternetVideoArchive**
|
- **InternetVideoArchive**
|
||||||
|
@ -497,7 +497,64 @@ jwplayer("mediaplayer").setup({"abouttext":"Visit Indie DB","aboutlink":"http:\/
|
|||||||
'width': 1280,
|
'width': 1280,
|
||||||
'height': 720,
|
'height': 720,
|
||||||
}]
|
}]
|
||||||
)
|
),
|
||||||
|
(
|
||||||
|
# https://github.com/rg3/youtube-dl/issues/18923
|
||||||
|
# https://www.ted.com/talks/boris_hesser_a_grassroots_healthcare_revolution_in_africa
|
||||||
|
'ted_18923',
|
||||||
|
'http://hls.ted.com/talks/31241.m3u8',
|
||||||
|
[{
|
||||||
|
'url': 'http://hls.ted.com/videos/BorisHesser_2018S/audio/600k.m3u8?nobumpers=true&uniqueId=76011e2b',
|
||||||
|
'format_id': '600k-Audio',
|
||||||
|
'vcodec': 'none',
|
||||||
|
}, {
|
||||||
|
'url': 'http://hls.ted.com/videos/BorisHesser_2018S/audio/600k.m3u8?nobumpers=true&uniqueId=76011e2b',
|
||||||
|
'format_id': '68',
|
||||||
|
'vcodec': 'none',
|
||||||
|
}, {
|
||||||
|
'url': 'http://hls.ted.com/videos/BorisHesser_2018S/video/64k.m3u8?nobumpers=true&uniqueId=76011e2b',
|
||||||
|
'format_id': '163',
|
||||||
|
'acodec': 'none',
|
||||||
|
'width': 320,
|
||||||
|
'height': 180,
|
||||||
|
}, {
|
||||||
|
'url': 'http://hls.ted.com/videos/BorisHesser_2018S/video/180k.m3u8?nobumpers=true&uniqueId=76011e2b',
|
||||||
|
'format_id': '481',
|
||||||
|
'acodec': 'none',
|
||||||
|
'width': 512,
|
||||||
|
'height': 288,
|
||||||
|
}, {
|
||||||
|
'url': 'http://hls.ted.com/videos/BorisHesser_2018S/video/320k.m3u8?nobumpers=true&uniqueId=76011e2b',
|
||||||
|
'format_id': '769',
|
||||||
|
'acodec': 'none',
|
||||||
|
'width': 512,
|
||||||
|
'height': 288,
|
||||||
|
}, {
|
||||||
|
'url': 'http://hls.ted.com/videos/BorisHesser_2018S/video/450k.m3u8?nobumpers=true&uniqueId=76011e2b',
|
||||||
|
'format_id': '984',
|
||||||
|
'acodec': 'none',
|
||||||
|
'width': 512,
|
||||||
|
'height': 288,
|
||||||
|
}, {
|
||||||
|
'url': 'http://hls.ted.com/videos/BorisHesser_2018S/video/600k.m3u8?nobumpers=true&uniqueId=76011e2b',
|
||||||
|
'format_id': '1255',
|
||||||
|
'acodec': 'none',
|
||||||
|
'width': 640,
|
||||||
|
'height': 360,
|
||||||
|
}, {
|
||||||
|
'url': 'http://hls.ted.com/videos/BorisHesser_2018S/video/950k.m3u8?nobumpers=true&uniqueId=76011e2b',
|
||||||
|
'format_id': '1693',
|
||||||
|
'acodec': 'none',
|
||||||
|
'width': 853,
|
||||||
|
'height': 480,
|
||||||
|
}, {
|
||||||
|
'url': 'http://hls.ted.com/videos/BorisHesser_2018S/video/1500k.m3u8?nobumpers=true&uniqueId=76011e2b',
|
||||||
|
'format_id': '2462',
|
||||||
|
'acodec': 'none',
|
||||||
|
'width': 1280,
|
||||||
|
'height': 720,
|
||||||
|
}]
|
||||||
|
),
|
||||||
]
|
]
|
||||||
|
|
||||||
for m3u8_file, m3u8_url, expected_formats in _TEST_CASES:
|
for m3u8_file, m3u8_url, expected_formats in _TEST_CASES:
|
||||||
|
@ -239,6 +239,52 @@ class TestFormatSelection(unittest.TestCase):
|
|||||||
downloaded = ydl.downloaded_info_dicts[0]
|
downloaded = ydl.downloaded_info_dicts[0]
|
||||||
self.assertEqual(downloaded['format_id'], 'vid-vcodec-dot')
|
self.assertEqual(downloaded['format_id'], 'vid-vcodec-dot')
|
||||||
|
|
||||||
|
def test_format_selection_string_ops(self):
|
||||||
|
formats = [
|
||||||
|
{'format_id': 'abc-cba', 'ext': 'mp4', 'url': TEST_URL},
|
||||||
|
]
|
||||||
|
info_dict = _make_result(formats)
|
||||||
|
|
||||||
|
# equals (=)
|
||||||
|
ydl = YDL({'format': '[format_id=abc-cba]'})
|
||||||
|
ydl.process_ie_result(info_dict.copy())
|
||||||
|
downloaded = ydl.downloaded_info_dicts[0]
|
||||||
|
self.assertEqual(downloaded['format_id'], 'abc-cba')
|
||||||
|
|
||||||
|
# does not equal (!=)
|
||||||
|
ydl = YDL({'format': '[format_id!=abc-cba]'})
|
||||||
|
self.assertRaises(ExtractorError, ydl.process_ie_result, info_dict.copy())
|
||||||
|
|
||||||
|
# starts with (^=)
|
||||||
|
ydl = YDL({'format': '[format_id^=abc]'})
|
||||||
|
ydl.process_ie_result(info_dict.copy())
|
||||||
|
downloaded = ydl.downloaded_info_dicts[0]
|
||||||
|
self.assertEqual(downloaded['format_id'], 'abc-cba')
|
||||||
|
|
||||||
|
# does not start with (!^=)
|
||||||
|
ydl = YDL({'format': '[format_id!^=abc-cba]'})
|
||||||
|
self.assertRaises(ExtractorError, ydl.process_ie_result, info_dict.copy())
|
||||||
|
|
||||||
|
# ends with ($=)
|
||||||
|
ydl = YDL({'format': '[format_id$=cba]'})
|
||||||
|
ydl.process_ie_result(info_dict.copy())
|
||||||
|
downloaded = ydl.downloaded_info_dicts[0]
|
||||||
|
self.assertEqual(downloaded['format_id'], 'abc-cba')
|
||||||
|
|
||||||
|
# does not end with (!$=)
|
||||||
|
ydl = YDL({'format': '[format_id!$=abc-cba]'})
|
||||||
|
self.assertRaises(ExtractorError, ydl.process_ie_result, info_dict.copy())
|
||||||
|
|
||||||
|
# contains (*=)
|
||||||
|
ydl = YDL({'format': '[format_id*=-]'})
|
||||||
|
ydl.process_ie_result(info_dict.copy())
|
||||||
|
downloaded = ydl.downloaded_info_dicts[0]
|
||||||
|
self.assertEqual(downloaded['format_id'], 'abc-cba')
|
||||||
|
|
||||||
|
# does not contain (!*=)
|
||||||
|
ydl = YDL({'format': '[format_id!*=-]'})
|
||||||
|
self.assertRaises(ExtractorError, ydl.process_ie_result, info_dict.copy())
|
||||||
|
|
||||||
def test_youtube_format_selection(self):
|
def test_youtube_format_selection(self):
|
||||||
order = [
|
order = [
|
||||||
'38', '37', '46', '22', '45', '35', '44', '18', '34', '43', '6', '5', '17', '36', '13',
|
'38', '37', '46', '22', '45', '35', '44', '18', '34', '43', '6', '5', '17', '36', '13',
|
||||||
|
@ -507,6 +507,8 @@ class TestUtil(unittest.TestCase):
|
|||||||
self.assertEqual(urljoin('http://foo.de/', ''), None)
|
self.assertEqual(urljoin('http://foo.de/', ''), None)
|
||||||
self.assertEqual(urljoin('http://foo.de/', ['foobar']), None)
|
self.assertEqual(urljoin('http://foo.de/', ['foobar']), None)
|
||||||
self.assertEqual(urljoin('http://foo.de/a/b/c.txt', '.././../d.txt'), 'http://foo.de/d.txt')
|
self.assertEqual(urljoin('http://foo.de/a/b/c.txt', '.././../d.txt'), 'http://foo.de/d.txt')
|
||||||
|
self.assertEqual(urljoin('http://foo.de/a/b/c.txt', 'rtmp://foo.de'), 'rtmp://foo.de')
|
||||||
|
self.assertEqual(urljoin(None, 'rtmp://foo.de'), 'rtmp://foo.de')
|
||||||
|
|
||||||
def test_url_or_none(self):
|
def test_url_or_none(self):
|
||||||
self.assertEqual(url_or_none(None), None)
|
self.assertEqual(url_or_none(None), None)
|
||||||
|
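The two new assertions say that a path which already carries its own scheme is returned unchanged, even when the base is missing. A minimal sketch of that behaviour follows, assuming Python 3 and the standard library only. `urljoin_sketch` is a hypothetical stand-in written for this note and is not the code in `youtube_dl/utils.py`.

```python
import re
from urllib.parse import urljoin as std_urljoin


def urljoin_sketch(base, path):
    """Join base and path, leaving absolute URLs of any scheme untouched."""
    if not isinstance(path, str) or not path:
        return None
    # Any "scheme://" or protocol-relative "//" prefix counts as absolute,
    # so rtmp:// and similar non-http(s) URLs pass through unchanged.
    if re.match(r'^(?:[a-zA-Z][a-zA-Z0-9+.-]*:)?//', path):
        return path
    # A relative path needs a usable http(s) or protocol-relative base.
    if not isinstance(base, str) or not re.match(r'^(?:https?:)?//', base):
        return None
    return std_urljoin(base, path)
```

The scheme check deliberately runs before the base check; that ordering is what lets `urljoin_sketch(None, 'rtmp://foo.de')` return the path rather than `None`.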
28
test/testdata/m3u8/ted_18923.m3u8
vendored
Normal file
@ -0,0 +1,28 @@
|
|||||||
|
#EXTM3U
|
||||||
|
#EXT-X-VERSION:4
|
||||||
|
#EXT-X-STREAM-INF:AUDIO="600k",BANDWIDTH=1255659,PROGRAM-ID=1,CODECS="avc1.42c01e,mp4a.40.2",RESOLUTION=640x360
|
||||||
|
/videos/BorisHesser_2018S/video/600k.m3u8?nobumpers=true&uniqueId=76011e2b
|
||||||
|
#EXT-X-STREAM-INF:AUDIO="600k",BANDWIDTH=163154,PROGRAM-ID=1,CODECS="avc1.42c00c,mp4a.40.2",RESOLUTION=320x180
|
||||||
|
/videos/BorisHesser_2018S/video/64k.m3u8?nobumpers=true&uniqueId=76011e2b
|
||||||
|
#EXT-X-STREAM-INF:AUDIO="600k",BANDWIDTH=481701,PROGRAM-ID=1,CODECS="avc1.42c015,mp4a.40.2",RESOLUTION=512x288
|
||||||
|
/videos/BorisHesser_2018S/video/180k.m3u8?nobumpers=true&uniqueId=76011e2b
|
||||||
|
#EXT-X-STREAM-INF:AUDIO="600k",BANDWIDTH=769968,PROGRAM-ID=1,CODECS="avc1.42c015,mp4a.40.2",RESOLUTION=512x288
|
||||||
|
/videos/BorisHesser_2018S/video/320k.m3u8?nobumpers=true&uniqueId=76011e2b
|
||||||
|
#EXT-X-STREAM-INF:AUDIO="600k",BANDWIDTH=984037,PROGRAM-ID=1,CODECS="avc1.42c015,mp4a.40.2",RESOLUTION=512x288
|
||||||
|
/videos/BorisHesser_2018S/video/450k.m3u8?nobumpers=true&uniqueId=76011e2b
|
||||||
|
#EXT-X-STREAM-INF:AUDIO="600k",BANDWIDTH=1693925,PROGRAM-ID=1,CODECS="avc1.4d401f,mp4a.40.2",RESOLUTION=853x480
|
||||||
|
/videos/BorisHesser_2018S/video/950k.m3u8?nobumpers=true&uniqueId=76011e2b
|
||||||
|
#EXT-X-STREAM-INF:AUDIO="600k",BANDWIDTH=2462469,PROGRAM-ID=1,CODECS="avc1.640028,mp4a.40.2",RESOLUTION=1280x720
|
||||||
|
/videos/BorisHesser_2018S/video/1500k.m3u8?nobumpers=true&uniqueId=76011e2b
|
||||||
|
#EXT-X-STREAM-INF:AUDIO="600k",BANDWIDTH=68101,PROGRAM-ID=1,CODECS="mp4a.40.2",DEFAULT=YES
|
||||||
|
/videos/BorisHesser_2018S/audio/600k.m3u8?nobumpers=true&uniqueId=76011e2b
|
||||||
|
|
||||||
|
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=74298,PROGRAM-ID=1,CODECS="avc1.42c00c",RESOLUTION=320x180,URI="/videos/BorisHesser_2018S/video/64k_iframe.m3u8?nobumpers=true&uniqueId=76011e2b"
|
||||||
|
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=216200,PROGRAM-ID=1,CODECS="avc1.42c015",RESOLUTION=512x288,URI="/videos/BorisHesser_2018S/video/180k_iframe.m3u8?nobumpers=true&uniqueId=76011e2b"
|
||||||
|
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=304717,PROGRAM-ID=1,CODECS="avc1.42c015",RESOLUTION=512x288,URI="/videos/BorisHesser_2018S/video/320k_iframe.m3u8?nobumpers=true&uniqueId=76011e2b"
|
||||||
|
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=350933,PROGRAM-ID=1,CODECS="avc1.42c015",RESOLUTION=512x288,URI="/videos/BorisHesser_2018S/video/450k_iframe.m3u8?nobumpers=true&uniqueId=76011e2b"
|
||||||
|
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=495850,PROGRAM-ID=1,CODECS="avc1.42c01e",RESOLUTION=640x360,URI="/videos/BorisHesser_2018S/video/600k_iframe.m3u8?nobumpers=true&uniqueId=76011e2b"
|
||||||
|
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=810750,PROGRAM-ID=1,CODECS="avc1.4d401f",RESOLUTION=853x480,URI="/videos/BorisHesser_2018S/video/950k_iframe.m3u8?nobumpers=true&uniqueId=76011e2b"
|
||||||
|
#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=1273700,PROGRAM-ID=1,CODECS="avc1.640028",RESOLUTION=1280x720,URI="/videos/BorisHesser_2018S/video/1500k_iframe.m3u8?nobumpers=true&uniqueId=76011e2b"
|
||||||
|
|
||||||
|
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="600k",LANGUAGE="en",NAME="Audio",AUTOSELECT=YES,DEFAULT=YES,URI="/videos/BorisHesser_2018S/audio/600k.m3u8?nobumpers=true&uniqueId=76011e2b",BANDWIDTH=614400
|
@ -1063,21 +1063,24 @@ class YoutubeDL(object):
|
|||||||
if not m:
|
if not m:
|
||||||
STR_OPERATORS = {
|
STR_OPERATORS = {
|
||||||
'=': operator.eq,
|
'=': operator.eq,
|
||||||
'!=': operator.ne,
|
|
||||||
'^=': lambda attr, value: attr.startswith(value),
|
'^=': lambda attr, value: attr.startswith(value),
|
||||||
'$=': lambda attr, value: attr.endswith(value),
|
'$=': lambda attr, value: attr.endswith(value),
|
||||||
'*=': lambda attr, value: value in attr,
|
'*=': lambda attr, value: value in attr,
|
||||||
}
|
}
|
||||||
str_operator_rex = re.compile(r'''(?x)
|
str_operator_rex = re.compile(r'''(?x)
|
||||||
\s*(?P<key>ext|acodec|vcodec|container|protocol|format_id)
|
\s*(?P<key>ext|acodec|vcodec|container|protocol|format_id)
|
||||||
\s*(?P<op>%s)(?P<none_inclusive>\s*\?)?
|
\s*(?P<negation>!\s*)?(?P<op>%s)(?P<none_inclusive>\s*\?)?
|
||||||
\s*(?P<value>[a-zA-Z0-9._-]+)
|
\s*(?P<value>[a-zA-Z0-9._-]+)
|
||||||
\s*$
|
\s*$
|
||||||
''' % '|'.join(map(re.escape, STR_OPERATORS.keys())))
|
''' % '|'.join(map(re.escape, STR_OPERATORS.keys())))
|
||||||
m = str_operator_rex.search(filter_spec)
|
m = str_operator_rex.search(filter_spec)
|
||||||
if m:
|
if m:
|
||||||
comparison_value = m.group('value')
|
comparison_value = m.group('value')
|
||||||
op = STR_OPERATORS[m.group('op')]
|
str_op = STR_OPERATORS[m.group('op')]
|
||||||
|
if m.group('negation'):
|
||||||
|
op = lambda attr, value: not str_op(attr, value)
|
||||||
|
else:
|
||||||
|
op = str_op
|
||||||
|
|
||||||
if not m:
|
if not m:
|
||||||
raise ValueError('Invalid filter specification %r' % filter_spec)
|
raise ValueError('Invalid filter specification %r' % filter_spec)
|
||||||
|
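The regular expression and operator table from this hunk can be exercised on their own. The sketch below copies the pattern and the `STR_OPERATORS` entries from the new side of the hunk; `build_string_filter` and its inner `_filter` are hypothetical names, and the none-handling is a simplification rather than the method's actual body.

```python
import operator
import re

STR_OPERATORS = {
    '=': operator.eq,
    '^=': lambda attr, value: attr.startswith(value),
    '$=': lambda attr, value: attr.endswith(value),
    '*=': lambda attr, value: value in attr,
}

# Pattern as in the hunk, including the optional negation group.
STR_OPERATOR_REX = re.compile(r'''(?x)
    \s*(?P<key>ext|acodec|vcodec|container|protocol|format_id)
    \s*(?P<negation>!\s*)?(?P<op>%s)(?P<none_inclusive>\s*\?)?
    \s*(?P<value>[a-zA-Z0-9._-]+)
    \s*$
    ''' % '|'.join(map(re.escape, STR_OPERATORS.keys())))


def build_string_filter(filter_spec):
    """Turn a spec such as 'format_id!^=abc' into a predicate (hypothetical)."""
    m = STR_OPERATOR_REX.search(filter_spec)
    if not m:
        raise ValueError('Invalid filter specification %r' % filter_spec)
    key, value = m.group('key'), m.group('value')
    str_op = STR_OPERATORS[m.group('op')]
    if m.group('negation'):
        # The wrapper has to call the wrapped operator. Negating the function
        # object itself would always yield False, because functions are truthy.
        op = lambda attr, val: not str_op(attr, val)
    else:
        op = str_op
    none_inclusive = bool(m.group('none_inclusive'))

    def _filter(fmt):
        actual = fmt.get(key)
        if actual is None:
            return none_inclusive
        return op(actual, value)
    return _filter
```

For example, `build_string_filter('format_id!^=cba')` accepts a format whose `format_id` is `abc-cba`, because that ID does not start with `cba`.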
@ -1596,6 +1596,7 @@ class InfoExtractor(object):
|
|||||||
# References:
|
# References:
|
||||||
# 1. https://tools.ietf.org/html/draft-pantos-http-live-streaming-21
|
# 1. https://tools.ietf.org/html/draft-pantos-http-live-streaming-21
|
||||||
# 2. https://github.com/rg3/youtube-dl/issues/12211
|
# 2. https://github.com/rg3/youtube-dl/issues/12211
|
||||||
|
# 3. https://github.com/rg3/youtube-dl/issues/18923
|
||||||
|
|
||||||
# We should try extracting formats only from master playlists [1, 4.3.4],
|
# We should try extracting formats only from master playlists [1, 4.3.4],
|
||||||
# i.e. playlists that describe available qualities. On the other hand
|
# i.e. playlists that describe available qualities. On the other hand
|
||||||
@ -1667,11 +1668,16 @@ class InfoExtractor(object):
|
|||||||
rendition = stream_group[0]
|
rendition = stream_group[0]
|
||||||
return rendition.get('NAME') or stream_group_id
|
return rendition.get('NAME') or stream_group_id
|
||||||
|
|
||||||
|
# parse EXT-X-MEDIA tags before EXT-X-STREAM-INF in order to have the
|
||||||
|
# chance to detect video only formats when EXT-X-STREAM-INF tags
|
||||||
|
# precede EXT-X-MEDIA tags in HLS manifest such as [3].
|
||||||
|
for line in m3u8_doc.splitlines():
|
||||||
|
if line.startswith('#EXT-X-MEDIA:'):
|
||||||
|
extract_media(line)
|
||||||
|
|
||||||
for line in m3u8_doc.splitlines():
|
for line in m3u8_doc.splitlines():
|
||||||
if line.startswith('#EXT-X-STREAM-INF:'):
|
if line.startswith('#EXT-X-STREAM-INF:'):
|
||||||
last_stream_inf = parse_m3u8_attributes(line)
|
last_stream_inf = parse_m3u8_attributes(line)
|
||||||
elif line.startswith('#EXT-X-MEDIA:'):
|
|
||||||
extract_media(line)
|
|
||||||
elif line.startswith('#') or not line.strip():
|
elif line.startswith('#') or not line.strip():
|
||||||
continue
|
continue
|
||||||
else:
|
else:
|
||||||
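The comment in this hunk explains why `EXT-X-MEDIA` tags are collected in a separate first pass: a manifest may list them after the `EXT-X-STREAM-INF` tags that refer to them. The following sketch illustrates the idea on a shortened manifest. `parse_attributes`, `video_only_streams` and the rule used to flag a variant as video-only are assumptions made for this example and are simpler than the extractor's logic; the sample paths are abbreviated.

```python
import re


def parse_attributes(line):
    """Parse the KEY=VALUE attribute list that follows the tag name."""
    _, _, attrs = line.partition(':')
    return {
        key: value.strip('"')
        for key, value in re.findall(r'([A-Z0-9-]+)=("[^"]*"|[^",]+)', attrs)
    }


def video_only_streams(m3u8_doc):
    """Return (variant URI, looks video-only) pairs for a master playlist."""
    lines = m3u8_doc.splitlines()

    # First pass: collect audio renditions that live in their own playlist.
    audio_groups = {}
    for line in lines:
        if line.startswith('#EXT-X-MEDIA:'):
            media = parse_attributes(line)
            if media.get('TYPE') == 'AUDIO' and media.get('URI'):
                audio_groups[media.get('GROUP-ID')] = media['URI']

    # Second pass: every variant can now see all groups, wherever they appear.
    result = []
    last_stream_inf = None
    for line in lines:
        if line.startswith('#EXT-X-STREAM-INF:'):
            last_stream_inf = parse_attributes(line)
        elif line.startswith('#') or not line.strip():
            continue
        elif last_stream_inf is not None:
            uri = line.strip()
            audio_uri = audio_groups.get(last_stream_inf.get('AUDIO'))
            # Assumed rule: audio served from a different URI means this
            # variant carries video only.
            result.append((uri, audio_uri is not None and audio_uri != uri))
            last_stream_inf = None
    return result


# Shortened manifest in the style of ted_18923.m3u8; EXT-X-MEDIA comes last.
SAMPLE = '''#EXTM3U
#EXT-X-STREAM-INF:AUDIO="600k",BANDWIDTH=163154,CODECS="avc1.42c00c,mp4a.40.2",RESOLUTION=320x180
/videos/video/64k.m3u8
#EXT-X-STREAM-INF:AUDIO="600k",BANDWIDTH=68101,CODECS="mp4a.40.2"
/videos/audio/600k.m3u8
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="600k",NAME="Audio",DEFAULT=YES,URI="/videos/audio/600k.m3u8"
'''
```

With a single pass, the audio group would still be unknown when the first variant is reached, so that variant could not be flagged.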
@ -2624,7 +2630,7 @@ class InfoExtractor(object):
|
|||||||
'id': this_video_id,
|
'id': this_video_id,
|
||||||
'title': unescapeHTML(video_data['title'] if require_title else video_data.get('title')),
|
'title': unescapeHTML(video_data['title'] if require_title else video_data.get('title')),
|
||||||
'description': video_data.get('description'),
|
'description': video_data.get('description'),
|
||||||
'thumbnail': self._proto_relative_url(video_data.get('image')),
|
'thumbnail': urljoin(base_url, self._proto_relative_url(video_data.get('image'))),
|
||||||
'timestamp': int_or_none(video_data.get('pubdate')),
|
'timestamp': int_or_none(video_data.get('pubdate')),
|
||||||
'duration': float_or_none(jwplayer_data.get('duration') or video_data.get('duration')),
|
'duration': float_or_none(jwplayer_data.get('duration') or video_data.get('duration')),
|
||||||
'subtitles': subtitles,
|
'subtitles': subtitles,
|
||||||
@ -2651,12 +2657,9 @@ class InfoExtractor(object):
|
|||||||
for source in jwplayer_sources_data:
|
for source in jwplayer_sources_data:
|
||||||
if not isinstance(source, dict):
|
if not isinstance(source, dict):
|
||||||
continue
|
continue
|
||||||
source_url = self._proto_relative_url(source.get('file'))
|
source_url = urljoin(
|
||||||
if not source_url:
|
base_url, self._proto_relative_url(source.get('file')))
|
||||||
continue
|
if not source_url or source_url in urls:
|
||||||
if base_url:
|
|
||||||
source_url = compat_urlparse.urljoin(base_url, source_url)
|
|
||||||
if source_url in urls:
|
|
||||||
continue
|
continue
|
||||||
urls.append(source_url)
|
urls.append(source_url)
|
||||||
source_type = source.get('type') or ''
|
source_type = source.get('type') or ''
|
||||||
|
@ -144,7 +144,7 @@ class CrunchyrollBaseIE(InfoExtractor):
|
|||||||
|
|
||||||
class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
|
class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
|
||||||
IE_NAME = 'crunchyroll'
|
IE_NAME = 'crunchyroll'
|
||||||
_VALID_URL = r'https?://(?:(?P<prefix>www|m)\.)?(?P<url>crunchyroll\.(?:com|fr)/(?:media(?:-|/\?id=)|[^/]*/[^/?&]*?)(?P<video_id>[0-9]+))(?:[/?&]|$)'
|
_VALID_URL = r'https?://(?:(?P<prefix>www|m)\.)?(?P<url>crunchyroll\.(?:com|fr)/(?:media(?:-|/\?id=)|(?:[^/]*/){1,2}[^/?&]*?)(?P<video_id>[0-9]+))(?:[/?&]|$)'
|
||||||
_TESTS = [{
|
_TESTS = [{
|
||||||
'url': 'http://www.crunchyroll.com/wanna-be-the-strongest-in-the-world/episode-1-an-idol-wrestler-is-born-645513',
|
'url': 'http://www.crunchyroll.com/wanna-be-the-strongest-in-the-world/episode-1-an-idol-wrestler-is-born-645513',
|
||||||
'info_dict': {
|
'info_dict': {
|
||||||
@ -269,6 +269,9 @@ class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
|
|||||||
}, {
|
}, {
|
||||||
'url': 'http://www.crunchyroll.com/media-723735',
|
'url': 'http://www.crunchyroll.com/media-723735',
|
||||||
'only_matching': True,
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://www.crunchyroll.com/en-gb/mob-psycho-100/episode-2-urban-legends-encountering-rumors-780921',
|
||||||
|
'only_matching': True,
|
||||||
}]
|
}]
|
||||||
|
|
||||||
_FORMAT_IDS = {
|
_FORMAT_IDS = {
|
||||||
|
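The effect of the extended `_VALID_URL` can be checked directly with `re`. Both patterns below are copied from the first hunk of this file and the URLs from its `_TESTS`; the `video_id` helper is a hypothetical convenience for this sketch.

```python
import re

OLD_VALID_URL = r'https?://(?:(?P<prefix>www|m)\.)?(?P<url>crunchyroll\.(?:com|fr)/(?:media(?:-|/\?id=)|[^/]*/[^/?&]*?)(?P<video_id>[0-9]+))(?:[/?&]|$)'
NEW_VALID_URL = r'https?://(?:(?P<prefix>www|m)\.)?(?P<url>crunchyroll\.(?:com|fr)/(?:media(?:-|/\?id=)|(?:[^/]*/){1,2}[^/?&]*?)(?P<video_id>[0-9]+))(?:[/?&]|$)'


def video_id(pattern, url):
    """Return the captured video ID, or None if the URL does not match."""
    m = re.match(pattern, url)
    return m.group('video_id') if m else None


LOCALIZED = 'https://www.crunchyroll.com/en-gb/mob-psycho-100/episode-2-urban-legends-encountering-rumors-780921'
PLAIN = 'http://www.crunchyroll.com/wanna-be-the-strongest-in-the-world/episode-1-an-idol-wrestler-is-born-645513'

new_id = video_id(NEW_VALID_URL, LOCALIZED)   # ID taken from the episode slug
old_id = video_id(OLD_VALID_URL, LOCALIZED)   # digits taken from the series slug
```

Allowing one or two path segments before the episode slug lets the pattern skip a locale prefix such as `en-gb/`. With the old pattern, the match stops at the trailing digits of `mob-psycho-100`.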
@ -452,6 +452,7 @@ from .hellporno import HellPornoIE
|
|||||||
from .helsinki import HelsinkiIE
|
from .helsinki import HelsinkiIE
|
||||||
from .hentaistigma import HentaiStigmaIE
|
from .hentaistigma import HentaiStigmaIE
|
||||||
from .hgtv import HGTVComShowIE
|
from .hgtv import HGTVComShowIE
|
||||||
|
from .hketv import HKETVIE
|
||||||
from .hidive import HiDiveIE
|
from .hidive import HiDiveIE
|
||||||
from .historicfilms import HistoricFilmsIE
|
from .historicfilms import HistoricFilmsIE
|
||||||
from .hitbox import HitboxIE, HitboxLiveIE
|
from .hitbox import HitboxIE, HitboxLiveIE
|
||||||
@ -494,7 +495,11 @@ from .ina import InaIE
|
|||||||
from .inc import IncIE
|
from .inc import IncIE
|
||||||
from .indavideo import IndavideoEmbedIE
|
from .indavideo import IndavideoEmbedIE
|
||||||
from .infoq import InfoQIE
|
from .infoq import InfoQIE
|
||||||
from .instagram import InstagramIE, InstagramUserIE
|
from .instagram import (
|
||||||
|
InstagramIE,
|
||||||
|
InstagramUserIE,
|
||||||
|
InstagramTagIE,
|
||||||
|
)
|
||||||
from .internazionale import InternazionaleIE
|
from .internazionale import InternazionaleIE
|
||||||
from .internetvideoarchive import InternetVideoArchiveIE
|
from .internetvideoarchive import InternetVideoArchiveIE
|
||||||
from .iprima import IPrimaIE
|
from .iprima import IPrimaIE
|
||||||
|
191
youtube_dl/extractor/hketv.py
Normal file
@ -0,0 +1,191 @@
|
|||||||
|
# coding: utf-8
|
||||||
|
from __future__ import unicode_literals
|
||||||
|
|
||||||
|
from .common import InfoExtractor
|
||||||
|
from ..compat import compat_str
|
||||||
|
from ..utils import (
|
||||||
|
clean_html,
|
||||||
|
ExtractorError,
|
||||||
|
int_or_none,
|
||||||
|
merge_dicts,
|
||||||
|
parse_count,
|
||||||
|
str_or_none,
|
||||||
|
try_get,
|
||||||
|
unified_strdate,
|
||||||
|
urlencode_postdata,
|
||||||
|
urljoin,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
class HKETVIE(InfoExtractor):
|
||||||
|
IE_NAME = 'hketv'
|
||||||
|
IE_DESC = '香港教育局教育電視 (HKETV) Educational Television, Hong Kong Educational Bureau'
|
||||||
|
_GEO_BYPASS = False
|
||||||
|
_GEO_COUNTRIES = ['HK']
|
||||||
|
_VALID_URL = r'https?://(?:www\.)?hkedcity\.net/etv/resource/(?P<id>[0-9]+)'
|
||||||
|
_TESTS = [{
|
||||||
|
'url': 'https://www.hkedcity.net/etv/resource/2932360618',
|
||||||
|
'md5': 'f193712f5f7abb208ddef3c5ea6ed0b7',
|
||||||
|
'info_dict': {
|
||||||
|
'id': '2932360618',
|
||||||
|
'ext': 'mp4',
|
||||||
|
'title': '喜閱一生(共享閱讀樂) (中、英文字幕可供選擇)',
|
||||||
|
'description': 'md5:d5286d05219ef50e0613311cbe96e560',
|
||||||
|
'upload_date': '20181024',
|
||||||
|
'duration': 900,
|
||||||
|
'subtitles': 'count:2',
|
||||||
|
},
|
||||||
|
'skip': 'Geo restricted to HK',
|
||||||
|
}, {
|
||||||
|
'url': 'https://www.hkedcity.net/etv/resource/972641418',
|
||||||
|
'md5': '1ed494c1c6cf7866a8290edad9b07dc9',
|
||||||
|
'info_dict': {
|
||||||
|
'id': '972641418',
|
||||||
|
'ext': 'mp4',
|
||||||
|
'title': '衣冠楚楚 (天使系列之一)',
|
||||||
|
'description': 'md5:10bb3d659421e74f58e5db5691627b0f',
|
||||||
|
'upload_date': '20070109',
|
||||||
|
'duration': 907,
|
||||||
|
'subtitles': {},
|
||||||
|
},
|
||||||
|
'params': {
|
||||||
|
'geo_verification_proxy': '<HK proxy here>',
|
||||||
|
},
|
||||||
|
'skip': 'Geo restricted to HK',
|
||||||
|
}]
|
||||||
|
|
||||||
|
_CC_LANGS = {
|
||||||
|
'中文(繁體中文)': 'zh-Hant',
|
||||||
|
'中文(简体中文)': 'zh-Hans',
|
||||||
|
'English': 'en',
|
||||||
|
'Bahasa Indonesia': 'id',
|
||||||
|
'\u0939\u093f\u0928\u094d\u0926\u0940': 'hi',
|
||||||
|
'\u0928\u0947\u092a\u093e\u0932\u0940': 'ne',
|
||||||
|
'Tagalog': 'tl',
|
||||||
|
'\u0e44\u0e17\u0e22': 'th',
|
||||||
|
'\u0627\u0631\u062f\u0648': 'ur',
|
||||||
|
}
|
||||||
|
_FORMAT_HEIGHTS = {
|
||||||
|
'SD': 360,
|
||||||
|
'HD': 720,
|
||||||
|
}
|
||||||
|
_APPS_BASE_URL = 'https://apps.hkedcity.net'
|
||||||
|
|
||||||
|
def _real_extract(self, url):
|
||||||
|
video_id = self._match_id(url)
|
||||||
|
webpage = self._download_webpage(url, video_id)
|
||||||
|
|
||||||
|
title = (
|
||||||
|
self._html_search_meta(
|
||||||
|
('ed_title', 'search.ed_title'), webpage, default=None) or
|
||||||
|
self._search_regex(
|
||||||
|
r'data-favorite_title_(?:eng|chi)=(["\'])(?P<id>(?:(?!\1).)+)\1',
|
||||||
|
webpage, 'title', default=None, group='id') or
|
||||||
|
self._html_search_regex(
|
||||||
|
r'<h1>([^<]+)</h1>', webpage, 'title', default=None) or
|
||||||
|
self._og_search_title(webpage)
|
||||||
|
)
|
||||||
|
|
||||||
|
file_id = self._search_regex(
|
||||||
|
r'post_var\[["\']file_id["\']\s*\]\s*=\s*(.+?);',
|
||||||
|
webpage, 'file ID')
|
||||||
|
curr_url = self._search_regex(
|
||||||
|
r'post_var\[["\']curr_url["\']\s*\]\s*=\s*"(.+?)";',
|
||||||
|
webpage, 'curr URL')
|
||||||
|
data = {
|
||||||
|
'action': 'get_info',
|
||||||
|
'curr_url': curr_url,
|
||||||
|
'file_id': file_id,
|
||||||
|
'video_url': file_id,
|
||||||
|
}
|
||||||
|
|
||||||
|
response = self._download_json(
|
||||||
|
self._APPS_BASE_URL + '/media/play/handler.php', video_id,
|
||||||
|
data=urlencode_postdata(data),
|
||||||
|
headers=merge_dicts({
|
||||||
|
'Content-Type': 'application/x-www-form-urlencoded'},
|
||||||
|
self.geo_verification_headers()))
|
||||||
|
|
||||||
|
result = response['result']
|
||||||
|
|
||||||
|
if not response.get('success') or not response.get('access'):
|
||||||
|
error = clean_html(response.get('access_err_msg'))
|
||||||
|
if 'Video streaming is not available in your country' in error:
|
||||||
|
self.raise_geo_restricted(
|
||||||
|
msg=error, countries=self._GEO_COUNTRIES)
|
||||||
|
else:
|
||||||
|
raise ExtractorError(error, expected=True)
|
||||||
|
|
||||||
|
formats = []
|
||||||
|
|
||||||
|
width = int_or_none(result.get('width'))
|
||||||
|
height = int_or_none(result.get('height'))
|
||||||
|
|
||||||
|
playlist0 = result['playlist'][0]
|
||||||
|
for fmt in playlist0['sources']:
|
||||||
|
file_url = urljoin(self._APPS_BASE_URL, fmt.get('file'))
|
||||||
|
if not file_url:
|
||||||
|
continue
|
||||||
|
# If we ever wanted to provide the final resolved URL that
|
||||||
|
# does not require cookies, albeit with a shorter lifespan:
|
||||||
|
# urlh = self._downloader.urlopen(file_url)
|
||||||
|
# resolved_url = urlh.geturl()
|
||||||
|
label = fmt.get('label')
|
||||||
|
h = self._FORMAT_HEIGHTS.get(label)
|
||||||
|
w = h * width // height if h and width and height else None
|
||||||
|
formats.append({
|
||||||
|
'format_id': label,
|
||||||
|
'ext': fmt.get('type'),
|
||||||
|
'url': file_url,
|
||||||
|
'width': w,
|
||||||
|
'height': h,
|
||||||
|
})
|
||||||
|
self._sort_formats(formats)
|
||||||
|
|
||||||
|
subtitles = {}
|
||||||
|
tracks = try_get(playlist0, lambda x: x['tracks'], list) or []
|
||||||
|
for track in tracks:
|
||||||
|
if not isinstance(track, dict):
|
||||||
|
continue
|
||||||
|
track_kind = str_or_none(track.get('kind'))
|
||||||
|
if not track_kind or not isinstance(track_kind, compat_str):
|
||||||
|
continue
|
||||||
|
if track_kind.lower() not in ('captions', 'subtitles'):
|
||||||
|
continue
|
||||||
|
track_url = urljoin(self._APPS_BASE_URL, track.get('file'))
|
||||||
|
if not track_url:
|
||||||
|
continue
|
||||||
|
track_label = track.get('label')
|
||||||
|
subtitles.setdefault(self._CC_LANGS.get(
|
||||||
|
track_label, track_label), []).append({
|
||||||
|
'url': self._proto_relative_url(track_url),
|
||||||
|
'ext': 'srt',
|
||||||
|
})
|
||||||
|
|
||||||
|
# Likes
|
||||||
|
emotion = self._download_json(
|
||||||
|
'https://emocounter.hkedcity.net/handler.php', video_id,
|
||||||
|
data=urlencode_postdata({
|
||||||
|
'action': 'get_emotion',
|
||||||
|
'data[bucket_id]': 'etv',
|
||||||
|
'data[identifier]': video_id,
|
||||||
|
}),
|
||||||
|
headers={'Content-Type': 'application/x-www-form-urlencoded'},
|
||||||
|
fatal=False) or {}
|
||||||
|
like_count = int_or_none(try_get(
|
||||||
|
emotion, lambda x: x['data']['emotion_data'][0]['count']))
|
||||||
|
|
||||||
|
return {
|
||||||
|
'id': video_id,
|
||||||
|
'title': title,
|
||||||
|
'description': self._html_search_meta(
|
||||||
|
'description', webpage, fatal=False),
|
||||||
|
'upload_date': unified_strdate(self._html_search_meta(
|
||||||
|
'ed_date', webpage, fatal=False), day_first=False),
|
||||||
|
'duration': int_or_none(result.get('length')),
|
||||||
|
'formats': formats,
|
||||||
|
'subtitles': subtitles,
|
||||||
|
'thumbnail': urljoin(self._APPS_BASE_URL, result.get('image')),
|
||||||
|
'view_count': parse_count(result.get('view_count')),
|
||||||
|
'like_count': like_count,
|
||||||
|
}
|
@ -227,44 +227,37 @@ class InstagramIE(InfoExtractor):
|
|||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
class InstagramUserIE(InfoExtractor):
|
class InstagramPlaylistIE(InfoExtractor):
|
||||||
_VALID_URL = r'https?://(?:www\.)?instagram\.com/(?P<id>[^/]{2,})/?(?:$|[?#])'
|
# A superclass for handling any kind of query based on GraphQL which
|
||||||
IE_DESC = 'Instagram user profile'
|
# results in a playlist.
|
||||||
IE_NAME = 'instagram:user'
|
|
||||||
_TEST = {
|
|
||||||
'url': 'https://instagram.com/porsche',
|
|
||||||
'info_dict': {
|
|
||||||
'id': 'porsche',
|
|
||||||
'title': 'porsche',
|
|
||||||
},
|
|
||||||
'playlist_count': 5,
|
|
||||||
'params': {
|
|
||||||
'extract_flat': True,
|
|
||||||
'skip_download': True,
|
|
||||||
'playlistend': 5,
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
_gis_tmpl = None
|
_gis_tmpl = None # used to cache GIS request type
|
||||||
|
|
||||||
def _entries(self, data):
|
def _parse_graphql(self, webpage, item_id):
|
||||||
|
# Reads a webpage and returns its GraphQL data.
|
||||||
|
return self._parse_json(
|
||||||
|
self._search_regex(
|
||||||
|
r'sharedData\s*=\s*({.+?})\s*;\s*[<\n]', webpage, 'data'),
|
||||||
|
item_id)
|
||||||
|
|
||||||
|
def _extract_graphql(self, data, url):
|
||||||
|
# Parses GraphQL queries containing videos and generates a playlist.
|
||||||
def get_count(suffix):
|
def get_count(suffix):
|
||||||
return int_or_none(try_get(
|
return int_or_none(try_get(
|
||||||
node, lambda x: x['edge_media_' + suffix]['count']))
|
node, lambda x: x['edge_media_' + suffix]['count']))
|
||||||
|
|
||||||
uploader_id = data['entry_data']['ProfilePage'][0]['graphql']['user']['id']
|
uploader_id = self._match_id(url)
|
||||||
csrf_token = data['config']['csrf_token']
|
csrf_token = data['config']['csrf_token']
|
||||||
rhx_gis = data.get('rhx_gis') or '3c7ca9dcefcf966d11dacf1f151335e8'
|
rhx_gis = data.get('rhx_gis') or '3c7ca9dcefcf966d11dacf1f151335e8'
|
||||||
|
|
||||||
self._set_cookie('instagram.com', 'ig_pr', '1')
|
|
||||||
|
|
||||||
cursor = ''
|
cursor = ''
|
||||||
for page_num in itertools.count(1):
|
for page_num in itertools.count(1):
|
||||||
variables = json.dumps({
|
variables = {
|
||||||
'id': uploader_id,
|
|
||||||
'first': 12,
|
'first': 12,
|
||||||
'after': cursor,
|
'after': cursor,
|
||||||
})
|
}
|
||||||
|
variables.update(self._query_vars_for(data))
|
||||||
|
variables = json.dumps(variables)
|
||||||
|
|
||||||
if self._gis_tmpl:
|
if self._gis_tmpl:
|
||||||
gis_tmpls = [self._gis_tmpl]
|
gis_tmpls = [self._gis_tmpl]
|
||||||
@ -276,21 +269,26 @@ class InstagramUserIE(InfoExtractor):
|
|||||||
'%s:%s:%s' % (rhx_gis, csrf_token, std_headers['User-Agent']),
|
'%s:%s:%s' % (rhx_gis, csrf_token, std_headers['User-Agent']),
|
||||||
]
|
]
|
||||||
|
|
||||||
|
# try each known way of generating the GIS query; use the first one
|
||||||
|
# that works and cache it for future requests
|
||||||
for gis_tmpl in gis_tmpls:
|
for gis_tmpl in gis_tmpls:
|
||||||
try:
|
try:
|
||||||
media = self._download_json(
|
json_data = self._download_json(
|
||||||
'https://www.instagram.com/graphql/query/', uploader_id,
|
'https://www.instagram.com/graphql/query/', uploader_id,
|
||||||
'Downloading JSON page %d' % page_num, headers={
|
'Downloading JSON page %d' % page_num, headers={
|
||||||
'X-Requested-With': 'XMLHttpRequest',
|
'X-Requested-With': 'XMLHttpRequest',
|
||||||
'X-Instagram-GIS': hashlib.md5(
|
'X-Instagram-GIS': hashlib.md5(
|
||||||
('%s:%s' % (gis_tmpl, variables)).encode('utf-8')).hexdigest(),
|
('%s:%s' % (gis_tmpl, variables)).encode('utf-8')).hexdigest(),
|
||||||
}, query={
|
}, query={
|
||||||
'query_hash': '42323d64886122307be10013ad2dcc44',
|
'query_hash': self._QUERY_HASH,
|
||||||
'variables': variables,
|
'variables': variables,
|
||||||
})['data']['user']['edge_owner_to_timeline_media']
|
})
|
||||||
|
media = self._parse_timeline_from(json_data)
|
||||||
self._gis_tmpl = gis_tmpl
|
self._gis_tmpl = gis_tmpl
|
||||||
break
|
break
|
||||||
except ExtractorError as e:
|
except ExtractorError as e:
|
||||||
|
# if it's an error caused by a bad query, and there are
|
||||||
|
# more GIS templates to try, ignore it and keep trying
|
||||||
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
|
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
|
||||||
if gis_tmpl != gis_tmpls[-1]:
|
if gis_tmpl != gis_tmpls[-1]:
|
||||||
continue
|
continue
|
||||||
@ -348,14 +346,80 @@ class InstagramUserIE(InfoExtractor):
|
|||||||
break
|
break
|
||||||
|
|
||||||
def _real_extract(self, url):
|
def _real_extract(self, url):
|
||||||
username = self._match_id(url)
|
user_or_tag = self._match_id(url)
|
||||||
|
webpage = self._download_webpage(url, user_or_tag)
|
||||||
|
data = self._parse_graphql(webpage, user_or_tag)
|
||||||
|
|
||||||
webpage = self._download_webpage(url, username)
|
self._set_cookie('instagram.com', 'ig_pr', '1')
|
||||||
|
|
||||||
data = self._parse_json(
|
|
||||||
self._search_regex(
|
|
||||||
r'sharedData\s*=\s*({.+?})\s*;\s*[<\n]', webpage, 'data'),
|
|
||||||
username)
|
|
||||||
|
|
||||||
return self.playlist_result(
|
return self.playlist_result(
|
||||||
self._entries(data), username, username)
|
self._extract_graphql(data, url), user_or_tag, user_or_tag)
|
||||||
|
|
||||||
|
|
||||||
|
class InstagramUserIE(InstagramPlaylistIE):
|
||||||
|
_VALID_URL = r'https?://(?:www\.)?instagram\.com/(?P<id>[^/]{2,})/?(?:$|[?#])'
|
||||||
|
IE_DESC = 'Instagram user profile'
|
||||||
|
IE_NAME = 'instagram:user'
|
||||||
|
_TEST = {
|
||||||
|
'url': 'https://instagram.com/porsche',
|
||||||
|
'info_dict': {
|
||||||
|
'id': 'porsche',
|
||||||
|
'title': 'porsche',
|
||||||
|
},
|
||||||
|
'playlist_count': 5,
|
||||||
|
'params': {
|
||||||
|
'extract_flat': True,
|
||||||
|
'skip_download': True,
|
||||||
|
'playlistend': 5,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
_QUERY_HASH = '42323d64886122307be10013ad2dcc44'
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def _parse_timeline_from(data):
|
||||||
|
# extracts the media timeline data from a GraphQL result
|
||||||
|
return data['data']['user']['edge_owner_to_timeline_media']
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def _query_vars_for(data):
|
||||||
|
# returns a dictionary of variables to add to the timeline query based
|
||||||
|
# on the GraphQL of the original page
|
||||||
|
return {
|
||||||
|
'id': data['entry_data']['ProfilePage'][0]['graphql']['user']['id']
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
class InstagramTagIE(InstagramPlaylistIE):
|
||||||
|
_VALID_URL = r'https?://(?:www\.)?instagram\.com/explore/tags/(?P<id>[^/]+)'
|
||||||
|
IE_DESC = 'Instagram hashtag search'
|
||||||
|
IE_NAME = 'instagram:tag'
|
||||||
|
_TEST = {
|
||||||
|
'url': 'https://instagram.com/explore/tags/lolcats',
|
||||||
|
'info_dict': {
|
||||||
|
'id': 'lolcats',
|
||||||
|
'title': 'lolcats',
|
||||||
|
},
|
||||||
|
'playlist_count': 50,
|
||||||
|
'params': {
|
||||||
|
'extract_flat': True,
|
||||||
|
'skip_download': True,
|
||||||
|
'playlistend': 50,
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
_QUERY_HASH = 'f92f56d47dc7a55b606908374b43a314'
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def _parse_timeline_from(data):
|
||||||
|
# extracts the media timeline data from a GraphQL result
|
||||||
|
return data['data']['hashtag']['edge_hashtag_to_media']
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def _query_vars_for(data):
|
||||||
|
# returns a dictionary of variables to add to the timeline query based
|
||||||
|
# on the GraphQL of the original page
|
||||||
|
return {
|
||||||
|
'tag_name':
|
||||||
|
data['entry_data']['TagPage'][0]['graphql']['hashtag']['name']
|
||||||
|
}
|
||||||
|
@ -115,6 +115,10 @@ class OdnoklassnikiIE(InfoExtractor):
|
|||||||
}, {
|
}, {
|
||||||
'url': 'https://m.ok.ru/dk?st.cmd=movieLayer&st.discId=863789452017&st.retLoc=friend&st.rtu=%2Fdk%3Fst.cmd%3DfriendMovies%26st.mode%3Down%26st.mrkId%3D%257B%2522uploadedMovieMarker%2522%253A%257B%2522marker%2522%253A%25221519410114503%2522%252C%2522hasMore%2522%253Atrue%257D%252C%2522sharedMovieMarker%2522%253A%257B%2522marker%2522%253Anull%252C%2522hasMore%2522%253Afalse%257D%257D%26st.friendId%3D561722190321%26st.frwd%3Don%26_prevCmd%3DfriendMovies%26tkn%3D7257&st.discType=MOVIE&st.mvId=863789452017&_prevCmd=friendMovies&tkn=3648#lst#',
|
'url': 'https://m.ok.ru/dk?st.cmd=movieLayer&st.discId=863789452017&st.retLoc=friend&st.rtu=%2Fdk%3Fst.cmd%3DfriendMovies%26st.mode%3Down%26st.mrkId%3D%257B%2522uploadedMovieMarker%2522%253A%257B%2522marker%2522%253A%25221519410114503%2522%252C%2522hasMore%2522%253Atrue%257D%252C%2522sharedMovieMarker%2522%253A%257B%2522marker%2522%253Anull%252C%2522hasMore%2522%253Afalse%257D%257D%26st.friendId%3D561722190321%26st.frwd%3Don%26_prevCmd%3DfriendMovies%26tkn%3D7257&st.discType=MOVIE&st.mvId=863789452017&_prevCmd=friendMovies&tkn=3648#lst#',
|
||||||
'only_matching': True,
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
# Paid video
|
||||||
|
'url': 'https://ok.ru/video/954886983203',
|
||||||
|
'only_matching': True,
|
||||||
}]
|
}]
|
||||||
|
|
||||||
def _real_extract(self, url):
|
def _real_extract(self, url):
|
||||||
@ -244,6 +248,11 @@ class OdnoklassnikiIE(InfoExtractor):
|
|||||||
'ext': 'flv',
|
'ext': 'flv',
|
||||||
})
|
})
|
||||||
|
|
||||||
|
if not formats:
|
||||||
|
payment_info = metadata.get('paymentInfo')
|
||||||
|
if payment_info:
|
||||||
|
raise ExtractorError('This video is paid, subscribe to download it', expected=True)
|
||||||
|
|
||||||
self._sort_formats(formats)
|
self._sort_formats(formats)
|
||||||
|
|
||||||
info['formats'] = formats
|
info['formats'] = formats
|
||||||
|
@ -10,7 +10,9 @@ from .common import InfoExtractor
|
|||||||
from ..compat import (
|
from ..compat import (
|
||||||
compat_HTTPError,
|
compat_HTTPError,
|
||||||
compat_str,
|
compat_str,
|
||||||
|
compat_urllib_request,
|
||||||
)
|
)
|
||||||
|
from .openload import PhantomJSwrapper
|
||||||
from ..utils import (
|
from ..utils import (
|
||||||
ExtractorError,
|
ExtractorError,
|
||||||
int_or_none,
|
int_or_none,
|
||||||
@ -22,7 +24,29 @@ from ..utils import (
|
|||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
class PornHubIE(InfoExtractor):
|
class PornHubBaseIE(InfoExtractor):
|
||||||
|
def _download_webpage_handle(self, *args, **kwargs):
|
||||||
|
def dl(*args, **kwargs):
|
||||||
|
return super(PornHubBaseIE, self)._download_webpage_handle(*args, **kwargs)
|
||||||
|
|
||||||
|
webpage, urlh = dl(*args, **kwargs)
|
||||||
|
|
||||||
|
if any(re.search(p, webpage) for p in (
|
||||||
|
r'<body\b[^>]+\bonload=["\']go\(\)',
|
||||||
|
r'document\.cookie\s*=\s*["\']RNKEY=',
|
||||||
|
r'document\.location\.reload\(true\)')):
|
||||||
|
url_or_request = args[0]
|
||||||
|
url = (url_or_request.get_full_url()
|
||||||
|
if isinstance(url_or_request, compat_urllib_request.Request)
|
||||||
|
else url_or_request)
|
||||||
|
phantom = PhantomJSwrapper(self, required_version='2.0')
|
||||||
|
phantom.get(url, html=webpage)
|
||||||
|
webpage, urlh = dl(*args, **kwargs)
|
||||||
|
|
||||||
|
return webpage, urlh
|
||||||
|
|
||||||
|
|
||||||
|
class PornHubIE(PornHubBaseIE):
|
||||||
IE_DESC = 'PornHub and Thumbzilla'
|
IE_DESC = 'PornHub and Thumbzilla'
|
||||||
_VALID_URL = r'''(?x)
|
_VALID_URL = r'''(?x)
|
||||||
https?://
|
https?://
|
||||||
@ -307,7 +331,7 @@ class PornHubIE(InfoExtractor):
|
|||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
class PornHubPlaylistBaseIE(InfoExtractor):
|
class PornHubPlaylistBaseIE(PornHubBaseIE):
|
||||||
def _extract_entries(self, webpage, host):
|
def _extract_entries(self, webpage, host):
|
||||||
# Only process container div with main playlist content skipping
|
# Only process container div with main playlist content skipping
|
||||||
# drop-down menu that uses similar pattern for videos (see
|
# drop-down menu that uses similar pattern for videos (see
|
||||||
|
@ -49,6 +49,16 @@ class RadioCanadaIE(InfoExtractor):
|
|||||||
# m3u8 download
|
# m3u8 download
|
||||||
'skip_download': True,
|
'skip_download': True,
|
||||||
},
|
},
|
||||||
|
},
|
||||||
|
{
|
||||||
|
# with protectionType but not actually DRM protected
|
||||||
|
'url': 'radiocanada:toutv:140872',
|
||||||
|
'info_dict': {
|
||||||
|
'id': '140872',
|
||||||
|
'title': 'Épisode 1',
|
||||||
|
'series': 'District 31',
|
||||||
|
},
|
||||||
|
'only_matching': True,
|
||||||
}
|
}
|
||||||
]
|
]
|
||||||
|
|
||||||
@ -67,8 +77,10 @@ class RadioCanadaIE(InfoExtractor):
|
|||||||
el = find_xpath_attr(metadata, './/Meta', 'name', name)
|
el = find_xpath_attr(metadata, './/Meta', 'name', name)
|
||||||
return el.text if el is not None else None
|
return el.text if el is not None else None
|
||||||
|
|
||||||
|
# protectionType does not necessarily mean the video is DRM protected (see
|
||||||
|
# https://github.com/rg3/youtube-dl/pull/18609).
|
||||||
if get_meta('protectionType'):
|
if get_meta('protectionType'):
|
||||||
raise ExtractorError('This video is DRM protected.', expected=True)
|
self.report_warning('This video is probably DRM protected.')
|
||||||
|
|
||||||
device_types = ['ipad']
|
device_types = ['ipad']
|
||||||
if not smuggled_data:
|
if not smuggled_data:
|
||||||
|
@ -14,7 +14,7 @@ from ..utils import (
|
|||||||
|
|
||||||
|
|
||||||
class StreamangoIE(InfoExtractor):
|
class StreamangoIE(InfoExtractor):
|
||||||
_VALID_URL = r'https?://(?:www\.)?streamango\.com/(?:f|embed)/(?P<id>[^/?#&]+)'
|
_VALID_URL = r'https?://(?:www\.)?(?:streamango\.com|fruithosts\.net)/(?:f|embed)/(?P<id>[^/?#&]+)'
|
||||||
_TESTS = [{
|
_TESTS = [{
|
||||||
'url': 'https://streamango.com/f/clapasobsptpkdfe/20170315_150006_mp4',
|
'url': 'https://streamango.com/f/clapasobsptpkdfe/20170315_150006_mp4',
|
||||||
'md5': 'e992787515a182f55e38fc97588d802a',
|
'md5': 'e992787515a182f55e38fc97588d802a',
|
||||||
@ -38,6 +38,9 @@ class StreamangoIE(InfoExtractor):
|
|||||||
}, {
|
}, {
|
||||||
'url': 'https://streamango.com/embed/clapasobsptpkdfe/20170315_150006_mp4',
|
'url': 'https://streamango.com/embed/clapasobsptpkdfe/20170315_150006_mp4',
|
||||||
'only_matching': True,
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://fruithosts.net/f/mreodparcdcmspsm/w1f1_r4lph_2018_brrs_720p_latino_mp4',
|
||||||
|
'only_matching': True,
|
||||||
}]
|
}]
|
||||||
|
|
||||||
def _real_extract(self, url):
|
def _real_extract(self, url):
|
||||||
|
@ -265,6 +265,8 @@ class TEDIE(InfoExtractor):
|
|||||||
'format_id': m3u8_format['format_id'].replace('hls', 'http'),
|
'format_id': m3u8_format['format_id'].replace('hls', 'http'),
|
||||||
'protocol': 'http',
|
'protocol': 'http',
|
||||||
})
|
})
|
||||||
|
if f.get('acodec') == 'none':
|
||||||
|
del f['acodec']
|
||||||
formats.append(f)
|
formats.append(f)
|
||||||
|
|
||||||
audio_download = talk_info.get('audioDownload')
|
audio_download = talk_info.get('audioDownload')
|
||||||
|
@ -96,7 +96,7 @@ class TNAFlixNetworkBaseIE(InfoExtractor):
|
|||||||
|
|
||||||
cfg_xml = self._download_xml(
|
cfg_xml = self._download_xml(
|
||||||
cfg_url, display_id, 'Downloading metadata',
|
cfg_url, display_id, 'Downloading metadata',
|
||||||
transform_source=fix_xml_ampersands)
|
transform_source=fix_xml_ampersands, headers={'Referer': url})
|
||||||
|
|
||||||
formats = []
|
formats = []
|
||||||
|
|
||||||
|
@ -4,8 +4,14 @@ from __future__ import unicode_literals
|
|||||||
import re
|
import re
|
||||||
|
|
||||||
from .common import InfoExtractor
|
from .common import InfoExtractor
|
||||||
|
from ..compat import compat_str
|
||||||
from ..utils import (
|
from ..utils import (
|
||||||
int_or_none,
|
int_or_none,
|
||||||
|
orderedSet,
|
||||||
|
parse_duration,
|
||||||
|
str_or_none,
|
||||||
|
unified_strdate,
|
||||||
|
url_or_none,
|
||||||
xpath_element,
|
xpath_element,
|
||||||
xpath_text,
|
xpath_text,
|
||||||
)
|
)
|
||||||
@ -13,7 +19,19 @@ from ..utils import (
|
|||||||
|
|
||||||
class VideomoreIE(InfoExtractor):
|
class VideomoreIE(InfoExtractor):
|
||||||
IE_NAME = 'videomore'
|
IE_NAME = 'videomore'
|
||||||
_VALID_URL = r'videomore:(?P<sid>\d+)$|https?://videomore\.ru/(?:(?:embed|[^/]+/[^/]+)/|[^/]+\?.*\btrack_id=)(?P<id>\d+)(?:[/?#&]|\.(?:xml|json)|$)'
|
_VALID_URL = r'''(?x)
|
||||||
|
videomore:(?P<sid>\d+)$|
|
||||||
|
https?://(?:player\.)?videomore\.ru/
|
||||||
|
(?:
|
||||||
|
(?:
|
||||||
|
embed|
|
||||||
|
[^/]+/[^/]+
|
||||||
|
)/|
|
||||||
|
[^/]*\?.*?\btrack_id=
|
||||||
|
)
|
||||||
|
(?P<id>\d+)
|
||||||
|
(?:[/?#&]|\.(?:xml|json)|$)
|
||||||
|
'''
|
||||||
_TESTS = [{
|
_TESTS = [{
|
||||||
'url': 'http://videomore.ru/kino_v_detalayah/5_sezon/367617',
|
'url': 'http://videomore.ru/kino_v_detalayah/5_sezon/367617',
|
||||||
'md5': '44455a346edc0d509ac5b5a5b531dc35',
|
'md5': '44455a346edc0d509ac5b5a5b531dc35',
|
||||||
@ -79,6 +97,9 @@ class VideomoreIE(InfoExtractor):
|
|||||||
}, {
|
}, {
|
||||||
'url': 'videomore:367617',
|
'url': 'videomore:367617',
|
||||||
'only_matching': True,
|
'only_matching': True,
|
||||||
|
}, {
|
||||||
|
'url': 'https://player.videomore.ru/?partner_id=97&track_id=736234&autoplay=0&userToken=',
|
||||||
|
'only_matching': True,
|
||||||
}]
|
}]
|
||||||
|
|
||||||
@staticmethod
|
@staticmethod
|
||||||
@ -136,7 +157,7 @@ class VideomoreIE(InfoExtractor):
|
|||||||
|
|
||||||
class VideomoreVideoIE(InfoExtractor):
|
class VideomoreVideoIE(InfoExtractor):
|
||||||
IE_NAME = 'videomore:video'
|
IE_NAME = 'videomore:video'
|
||||||
_VALID_URL = r'https?://videomore\.ru/(?:(?:[^/]+/){2})?(?P<id>[^/?#&]+)[/?#&]*$'
|
_VALID_URL = r'https?://videomore\.ru/(?:(?:[^/]+/){2})?(?P<id>[^/?#&]+)(?:/*|[?#&].*?)$'
|
||||||
_TESTS = [{
|
_TESTS = [{
|
||||||
# single video with og:video:iframe
|
# single video with og:video:iframe
|
||||||
'url': 'http://videomore.ru/elki_3',
|
'url': 'http://videomore.ru/elki_3',
|
||||||
@ -176,6 +197,9 @@ class VideomoreVideoIE(InfoExtractor):
|
|||||||
'params': {
|
'params': {
|
||||||
'skip_download': True,
|
'skip_download': True,
|
||||||
},
|
},
|
||||||
|
}, {
|
||||||
|
'url': 'https://videomore.ru/molodezhka/6_sezon/29_seriya?utm_so',
|
||||||
|
'only_matching': True,
|
||||||
}]
|
}]
|
||||||
|
|
||||||
@classmethod
|
@classmethod
|
||||||
@ -196,13 +220,16 @@ class VideomoreVideoIE(InfoExtractor):
|
|||||||
r'track-id=["\'](\d+)',
|
r'track-id=["\'](\d+)',
|
||||||
r'xcnt_product_id\s*=\s*(\d+)'), webpage, 'video id')
|
r'xcnt_product_id\s*=\s*(\d+)'), webpage, 'video id')
|
||||||
video_url = 'videomore:%s' % video_id
|
video_url = 'videomore:%s' % video_id
|
||||||
|
else:
|
||||||
|
video_id = None
|
||||||
|
|
||||||
return self.url_result(video_url, VideomoreIE.ie_key())
|
return self.url_result(
|
||||||
|
video_url, ie=VideomoreIE.ie_key(), video_id=video_id)
|
||||||
|
|
||||||
|
|
||||||
class VideomoreSeasonIE(InfoExtractor):
|
class VideomoreSeasonIE(InfoExtractor):
|
||||||
IE_NAME = 'videomore:season'
|
IE_NAME = 'videomore:season'
|
||||||
_VALID_URL = r'https?://videomore\.ru/(?!embed)(?P<id>[^/]+/[^/?#&]+)[/?#&]*$'
|
_VALID_URL = r'https?://videomore\.ru/(?!embed)(?P<id>[^/]+/[^/?#&]+)(?:/*|[?#&].*?)$'
|
||||||
_TESTS = [{
|
_TESTS = [{
|
||||||
'url': 'http://videomore.ru/molodezhka/sezon_promo',
|
'url': 'http://videomore.ru/molodezhka/sezon_promo',
|
||||||
'info_dict': {
|
'info_dict': {
|
||||||
@ -210,8 +237,16 @@ class VideomoreSeasonIE(InfoExtractor):
|
|||||||
'title': 'Молодежка Промо',
|
'title': 'Молодежка Промо',
|
||||||
},
|
},
|
||||||
'playlist_mincount': 12,
|
'playlist_mincount': 12,
|
||||||
|
}, {
|
||||||
|
'url': 'http://videomore.ru/molodezhka/sezon_promo?utm_so',
|
||||||
|
'only_matching': True,
|
||||||
}]
|
}]
|
||||||
|
|
||||||
|
@classmethod
|
||||||
|
def suitable(cls, url):
|
||||||
|
return (False if (VideomoreIE.suitable(url) or VideomoreVideoIE.suitable(url))
|
||||||
|
else super(VideomoreSeasonIE, cls).suitable(url))
|
||||||
|
|
||||||
def _real_extract(self, url):
|
def _real_extract(self, url):
|
||||||
display_id = self._match_id(url)
|
display_id = self._match_id(url)
|
||||||
|
|
||||||
@ -219,6 +254,51 @@ class VideomoreSeasonIE(InfoExtractor):
|
|||||||
|
|
||||||
title = self._og_search_title(webpage)
|
title = self._og_search_title(webpage)
|
||||||
|
|
||||||
|
data = self._parse_json(
|
||||||
|
self._html_search_regex(
|
||||||
|
r'\bclass=["\']seasons-tracks["\'][^>]+\bdata-custom-data=(["\'])(?P<value>{.+?})\1',
|
||||||
|
webpage, 'data', default='{}', group='value'),
|
||||||
|
display_id, fatal=False)
|
||||||
|
|
||||||
|
entries = []
|
||||||
|
|
||||||
|
if data:
|
||||||
|
episodes = data.get('episodes')
|
||||||
|
if isinstance(episodes, list):
|
||||||
|
for ep in episodes:
|
||||||
|
if not isinstance(ep, dict):
|
||||||
|
continue
|
||||||
|
ep_id = int_or_none(ep.get('id'))
|
||||||
|
ep_url = url_or_none(ep.get('url'))
|
||||||
|
if ep_id:
|
||||||
|
e = {
|
||||||
|
'url': 'videomore:%s' % ep_id,
|
||||||
|
'id': compat_str(ep_id),
|
||||||
|
}
|
||||||
|
elif ep_url:
|
||||||
|
e = {'url': ep_url}
|
||||||
|
else:
|
||||||
|
continue
|
||||||
|
e.update({
|
||||||
|
'_type': 'url',
|
||||||
|
'ie_key': VideomoreIE.ie_key(),
|
||||||
|
'title': str_or_none(ep.get('title')),
|
||||||
|
'thumbnail': url_or_none(ep.get('image')),
|
||||||
|
'duration': parse_duration(ep.get('duration')),
|
||||||
|
'episode_number': int_or_none(ep.get('number')),
|
||||||
|
'upload_date': unified_strdate(ep.get('date')),
|
||||||
|
})
|
||||||
|
entries.append(e)
|
||||||
|
|
||||||
|
if not entries:
|
||||||
|
entries = [
|
||||||
|
self.url_result(
|
||||||
|
'videomore:%s' % video_id, ie=VideomoreIE.ie_key(),
|
||||||
|
video_id=video_id)
|
||||||
|
for video_id in orderedSet(re.findall(
|
||||||
|
r':(?:id|key)=["\'](\d+)["\']', webpage))]
|
||||||
|
|
||||||
|
if not entries:
|
||||||
entries = [
|
entries = [
|
||||||
self.url_result(item) for item in re.findall(
|
self.url_result(item) for item in re.findall(
|
||||||
r'<a[^>]+href="((?:https?:)?//videomore\.ru/%s/[^/]+)"[^>]+class="widget-item-desc"'
|
r'<a[^>]+href="((?:https?:)?//videomore\.ru/%s/[^/]+)"[^>]+class="widget-item-desc"'
|
||||||
|
@ -435,6 +435,8 @@ class VimeoIE(VimeoBaseInfoExtractor):
|
|||||||
'url': 'https://vimeo.com/160743502/abd0e13fb4',
|
'url': 'https://vimeo.com/160743502/abd0e13fb4',
|
||||||
'only_matching': True,
|
'only_matching': True,
|
||||||
}
|
}
|
||||||
|
# https://gettingthingsdone.com/workflowmap/
|
||||||
|
# vimeo embed with check-password page protected by Referer header
|
||||||
]
|
]
|
||||||
|
|
||||||
@staticmethod
|
@staticmethod
|
||||||
@ -465,20 +467,22 @@ class VimeoIE(VimeoBaseInfoExtractor):
|
|||||||
urls = VimeoIE._extract_urls(url, webpage)
|
urls = VimeoIE._extract_urls(url, webpage)
|
||||||
return urls[0] if urls else None
|
return urls[0] if urls else None
|
||||||
|
|
||||||
def _verify_player_video_password(self, url, video_id):
|
def _verify_player_video_password(self, url, video_id, headers):
|
||||||
password = self._downloader.params.get('videopassword')
|
password = self._downloader.params.get('videopassword')
|
||||||
if password is None:
|
if password is None:
|
||||||
raise ExtractorError('This video is protected by a password, use the --video-password option')
|
raise ExtractorError('This video is protected by a password, use the --video-password option')
|
||||||
data = urlencode_postdata({
|
data = urlencode_postdata({
|
||||||
'password': base64.b64encode(password.encode()),
|
'password': base64.b64encode(password.encode()),
|
||||||
})
|
})
|
||||||
pass_url = url + '/check-password'
|
headers = merge_dicts(headers, {
|
||||||
password_request = sanitized_Request(pass_url, data)
|
'Content-Type': 'application/x-www-form-urlencoded',
|
||||||
password_request.add_header('Content-Type', 'application/x-www-form-urlencoded')
|
})
|
||||||
password_request.add_header('Referer', url)
|
checked = self._download_json(
|
||||||
return self._download_json(
|
url + '/check-password', video_id,
|
||||||
password_request, video_id,
|
'Verifying the password', data=data, headers=headers)
|
||||||
'Verifying the password', 'Wrong password')
|
if checked is False:
|
||||||
|
raise ExtractorError('Wrong video password', expected=True)
|
||||||
|
return checked
|
||||||
|
|
||||||
def _real_initialize(self):
|
def _real_initialize(self):
|
||||||
self._login()
|
self._login()
|
||||||
@ -591,7 +595,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
|
|||||||
cause=e)
|
cause=e)
|
||||||
else:
|
else:
|
||||||
if config.get('view') == 4:
|
if config.get('view') == 4:
|
||||||
config = self._verify_player_video_password(redirect_url, video_id)
|
config = self._verify_player_video_password(redirect_url, video_id, headers)
|
||||||
|
|
||||||
vod = config.get('video', {}).get('vod', {})
|
vod = config.get('video', {}).get('vod', {})
|
||||||
|
|
||||||
|
@ -11,10 +11,12 @@ import time
|
|||||||
|
|
||||||
from .common import InfoExtractor
|
from .common import InfoExtractor
|
||||||
from ..compat import (
|
from ..compat import (
|
||||||
|
compat_HTTPError,
|
||||||
compat_urllib_parse_urlencode,
|
compat_urllib_parse_urlencode,
|
||||||
compat_urllib_parse,
|
compat_urllib_parse,
|
||||||
)
|
)
|
||||||
from ..utils import (
|
from ..utils import (
|
||||||
|
ExtractorError,
|
||||||
float_or_none,
|
float_or_none,
|
||||||
int_or_none,
|
int_or_none,
|
||||||
)
|
)
|
||||||
@ -24,29 +26,41 @@ class VRVBaseIE(InfoExtractor):
|
|||||||
_API_DOMAIN = None
|
_API_DOMAIN = None
|
||||||
_API_PARAMS = {}
|
_API_PARAMS = {}
|
||||||
_CMS_SIGNING = {}
|
_CMS_SIGNING = {}
|
||||||
|
_TOKEN = None
|
||||||
|
_TOKEN_SECRET = ''
|
||||||
|
|
||||||
def _call_api(self, path, video_id, note, data=None):
|
def _call_api(self, path, video_id, note, data=None):
|
||||||
|
# https://tools.ietf.org/html/rfc5849#section-3
|
||||||
base_url = self._API_DOMAIN + '/core/' + path
|
base_url = self._API_DOMAIN + '/core/' + path
|
||||||
encoded_query = compat_urllib_parse_urlencode({
|
query = [
|
||||||
'oauth_consumer_key': self._API_PARAMS['oAuthKey'],
|
('oauth_consumer_key', self._API_PARAMS['oAuthKey']),
|
||||||
'oauth_nonce': ''.join([random.choice(string.ascii_letters) for _ in range(32)]),
|
('oauth_nonce', ''.join([random.choice(string.ascii_letters) for _ in range(32)])),
|
||||||
'oauth_signature_method': 'HMAC-SHA1',
|
('oauth_signature_method', 'HMAC-SHA1'),
|
||||||
'oauth_timestamp': int(time.time()),
|
('oauth_timestamp', int(time.time())),
|
||||||
'oauth_version': '1.0',
|
]
|
||||||
})
|
if self._TOKEN:
|
||||||
|
query.append(('oauth_token', self._TOKEN))
|
||||||
|
encoded_query = compat_urllib_parse_urlencode(query)
|
||||||
headers = self.geo_verification_headers()
|
headers = self.geo_verification_headers()
|
||||||
if data:
|
if data:
|
||||||
data = json.dumps(data).encode()
|
data = json.dumps(data).encode()
|
||||||
headers['Content-Type'] = 'application/json'
|
headers['Content-Type'] = 'application/json'
|
||||||
method = 'POST' if data else 'GET'
|
base_string = '&'.join([
|
||||||
base_string = '&'.join([method, compat_urllib_parse.quote(base_url, ''), compat_urllib_parse.quote(encoded_query, '')])
|
'POST' if data else 'GET',
|
||||||
|
compat_urllib_parse.quote(base_url, ''),
|
||||||
|
compat_urllib_parse.quote(encoded_query, '')])
|
||||||
oauth_signature = base64.b64encode(hmac.new(
|
oauth_signature = base64.b64encode(hmac.new(
|
||||||
(self._API_PARAMS['oAuthSecret'] + '&').encode('ascii'),
|
(self._API_PARAMS['oAuthSecret'] + '&' + self._TOKEN_SECRET).encode('ascii'),
|
||||||
base_string.encode(), hashlib.sha1).digest()).decode()
|
base_string.encode(), hashlib.sha1).digest()).decode()
|
||||||
encoded_query += '&oauth_signature=' + compat_urllib_parse.quote(oauth_signature, '')
|
encoded_query += '&oauth_signature=' + compat_urllib_parse.quote(oauth_signature, '')
|
||||||
|
try:
|
||||||
return self._download_json(
|
return self._download_json(
|
||||||
'?'.join([base_url, encoded_query]), video_id,
|
'?'.join([base_url, encoded_query]), video_id,
|
||||||
note='Downloading %s JSON metadata' % note, headers=headers, data=data)
|
note='Downloading %s JSON metadata' % note, headers=headers, data=data)
|
||||||
|
except ExtractorError as e:
|
||||||
|
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 401:
|
||||||
|
raise ExtractorError(json.loads(e.cause.read().decode())['message'], expected=True)
|
||||||
|
raise
|
||||||
|
|
||||||
def _call_cms(self, path, video_id, note):
|
def _call_cms(self, path, video_id, note):
|
||||||
if not self._CMS_SIGNING:
|
if not self._CMS_SIGNING:
|
||||||
@ -55,19 +69,22 @@ class VRVBaseIE(InfoExtractor):
|
|||||||
self._API_DOMAIN + path, video_id, query=self._CMS_SIGNING,
|
self._API_DOMAIN + path, video_id, query=self._CMS_SIGNING,
|
||||||
note='Downloading %s JSON metadata' % note, headers=self.geo_verification_headers())
|
note='Downloading %s JSON metadata' % note, headers=self.geo_verification_headers())
|
||||||
|
|
||||||
def _set_api_params(self, webpage, video_id):
|
|
||||||
if not self._API_PARAMS:
|
|
||||||
self._API_PARAMS = self._parse_json(self._search_regex(
|
|
||||||
r'window\.__APP_CONFIG__\s*=\s*({.+?})</script>',
|
|
||||||
webpage, 'api config'), video_id)['cxApiParams']
|
|
||||||
self._API_DOMAIN = self._API_PARAMS.get('apiDomain', 'https://api.vrv.co')
|
|
||||||
|
|
||||||
def _get_cms_resource(self, resource_key, video_id):
|
def _get_cms_resource(self, resource_key, video_id):
|
||||||
return self._call_api(
|
return self._call_api(
|
||||||
'cms_resource', video_id, 'resource path', data={
|
'cms_resource', video_id, 'resource path', data={
|
||||||
'resource_key': resource_key,
|
'resource_key': resource_key,
|
||||||
})['__links__']['cms_resource']['href']
|
})['__links__']['cms_resource']['href']
|
||||||
|
|
||||||
|
def _real_initialize(self):
|
||||||
|
webpage = self._download_webpage(
|
||||||
|
'https://vrv.co/', None, headers=self.geo_verification_headers())
|
||||||
|
self._API_PARAMS = self._parse_json(self._search_regex(
|
||||||
|
[
|
||||||
|
r'window\.__APP_CONFIG__\s*=\s*({.+?})(?:</script>|;)',
|
||||||
|
r'window\.__APP_CONFIG__\s*=\s*({.+})'
|
||||||
|
], webpage, 'app config'), None)['cxApiParams']
|
||||||
|
self._API_DOMAIN = self._API_PARAMS.get('apiDomain', 'https://api.vrv.co')
|
||||||
|
|
||||||
|
|
||||||
class VRVIE(VRVBaseIE):
|
class VRVIE(VRVBaseIE):
|
||||||
IE_NAME = 'vrv'
|
IE_NAME = 'vrv'
|
||||||
@ -86,6 +103,22 @@ class VRVIE(VRVBaseIE):
|
|||||||
'skip_download': True,
|
'skip_download': True,
|
||||||
},
|
},
|
||||||
}]
|
}]
|
||||||
|
_NETRC_MACHINE = 'vrv'
|
||||||
|
|
||||||
|
def _real_initialize(self):
|
||||||
|
super(VRVIE, self)._real_initialize()
|
||||||
|
|
||||||
|
email, password = self._get_login_info()
|
||||||
|
if email is None:
|
||||||
|
return
|
||||||
|
|
||||||
|
token_credentials = self._call_api(
|
||||||
|
'authenticate/by:credentials', None, 'Token Credentials', data={
|
||||||
|
'email': email,
|
||||||
|
'password': password,
|
||||||
|
})
|
||||||
|
self._TOKEN = token_credentials['oauth_token']
|
||||||
|
self._TOKEN_SECRET = token_credentials['oauth_token_secret']
|
||||||
|
|
||||||
def _extract_vrv_formats(self, url, video_id, stream_format, audio_lang, hardsub_lang):
|
def _extract_vrv_formats(self, url, video_id, stream_format, audio_lang, hardsub_lang):
|
||||||
if not url or stream_format not in ('hls', 'dash'):
|
if not url or stream_format not in ('hls', 'dash'):
|
||||||
@ -116,27 +149,15 @@ class VRVIE(VRVBaseIE):
|
|||||||
|
|
||||||
def _real_extract(self, url):
|
def _real_extract(self, url):
|
||||||
video_id = self._match_id(url)
|
video_id = self._match_id(url)
|
||||||
webpage = self._download_webpage(
|
|
||||||
url, video_id,
|
|
||||||
headers=self.geo_verification_headers())
|
|
||||||
media_resource = self._parse_json(self._search_regex(
|
|
||||||
[
|
|
||||||
r'window\.__INITIAL_STATE__\s*=\s*({.+?})(?:</script>|;)',
|
|
||||||
r'window\.__INITIAL_STATE__\s*=\s*({.+})'
|
|
||||||
], webpage, 'inital state'), video_id).get('watch', {}).get('mediaResource') or {}
|
|
||||||
|
|
||||||
video_data = media_resource.get('json')
|
|
||||||
if not video_data:
|
|
||||||
self._set_api_params(webpage, video_id)
|
|
||||||
episode_path = self._get_cms_resource(
|
episode_path = self._get_cms_resource(
|
||||||
'cms:/episodes/' + video_id, video_id)
|
'cms:/episodes/' + video_id, video_id)
|
||||||
video_data = self._call_cms(episode_path, video_id, 'video')
|
video_data = self._call_cms(episode_path, video_id, 'video')
|
||||||
title = video_data['title']
|
title = video_data['title']
|
||||||
|
|
||||||
streams_json = media_resource.get('streams', {}).get('json', {})
|
streams_path = video_data['__links__'].get('streams', {}).get('href')
|
||||||
if not streams_json:
|
if not streams_path:
|
||||||
self._set_api_params(webpage, video_id)
|
self.raise_login_required()
|
||||||
streams_path = video_data['__links__']['streams']['href']
|
|
||||||
streams_json = self._call_cms(streams_path, video_id, 'streams')
|
streams_json = self._call_cms(streams_path, video_id, 'streams')
|
||||||
|
|
||||||
audio_locale = streams_json.get('audio_locale')
|
audio_locale = streams_json.get('audio_locale')
|
||||||
@ -202,11 +223,7 @@ class VRVSeriesIE(VRVBaseIE):
|
|||||||
|
|
||||||
def _real_extract(self, url):
|
def _real_extract(self, url):
|
||||||
series_id = self._match_id(url)
|
series_id = self._match_id(url)
|
||||||
webpage = self._download_webpage(
|
|
||||||
url, series_id,
|
|
||||||
headers=self.geo_verification_headers())
|
|
||||||
|
|
||||||
self._set_api_params(webpage, series_id)
|
|
||||||
seasons_path = self._get_cms_resource(
|
seasons_path = self._get_cms_resource(
|
||||||
'cms:/seasons?series_id=' + series_id, series_id)
|
'cms:/seasons?series_id=' + series_id, series_id)
|
||||||
seasons_data = self._call_cms(seasons_path, series_id, 'seasons')
|
seasons_data = self._call_cms(seasons_path, series_id, 'seasons')
|
||||||
|
@ -1868,7 +1868,7 @@ def urljoin(base, path):
|
|||||||
path = path.decode('utf-8')
|
path = path.decode('utf-8')
|
||||||
if not isinstance(path, compat_str) or not path:
|
if not isinstance(path, compat_str) or not path:
|
||||||
return None
|
return None
|
||||||
if re.match(r'^(?:https?:)?//', path):
|
if re.match(r'^(?:[a-zA-Z][a-zA-Z0-9+-.]*:)?//', path):
|
||||||
return path
|
return path
|
||||||
if isinstance(base, bytes):
|
if isinstance(base, bytes):
|
||||||
base = base.decode('utf-8')
|
base = base.decode('utf-8')
|
||||||
|
@ -1,3 +1,3 @@
|
|||||||
from __future__ import unicode_literals
|
from __future__ import unicode_literals
|
||||||
|
|
||||||
__version__ = '2019.01.17'
|
__version__ = '2019.01.23'
|
||||||
|