Merge branch 'master' into fix.25.12.2018

# Conflicts:
#	youtube_dl/version.py
Avi Peretz 2019-01-28 12:31:34 +02:00
commit d086a13545
21 changed files with 332 additions and 102 deletions


@ -6,8 +6,8 @@
---
-### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2019.01.17*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
+### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2019.01.27*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
-- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2019.01.17**
+- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2019.01.27**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@ -36,7 +36,7 @@ Add the `-v` flag to **your command line** you run youtube-dl with (`youtube-dl
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2019.01.17
+[debug] youtube-dl version 2019.01.27
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}


@ -339,7 +339,7 @@ Incorrect:
'PLMYEtVRpaqY00V9W81Cwmzp6N6vZqfUKD4'
```
-### Use safe conversion functions
+### Use convenience conversion and parsing functions
Wrap all extracted numeric data into safe functions from [`youtube_dl/utils.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/utils.py): `int_or_none`, `float_or_none`. Use them for string to number conversions as well.
@ -347,6 +347,8 @@ Use `url_or_none` for safe URL processing.
Use `try_get` for safe metadata extraction from parsed JSON.
Use `unified_strdate` for uniform `upload_date` or any `YYYYMMDD` meta field extraction, `unified_timestamp` for uniform `timestamp` extraction, `parse_filesize` for `filesize` extraction, `parse_count` for count meta fields extraction, `parse_resolution`, `parse_duration` for `duration` extraction, `parse_age_limit` for `age_limit` extraction.
Explore [`youtube_dl/utils.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/utils.py) for more useful convenience functions.
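The helpers above all follow the same pattern: return `None` instead of raising when metadata is missing or malformed. A minimal sketch of that pattern — these are simplified stand-ins, not the real implementations; the actual `int_or_none` and `try_get` in `youtube_dl/utils.py` take extra arguments such as `scale`, `default` and `expected_type`:

```python
# Simplified stand-ins for the helpers in youtube_dl/utils.py, shown
# only to illustrate the "never raise on missing/bad metadata" pattern.
def int_or_none(v):
    try:
        return int(v)
    except (TypeError, ValueError):
        return None

def try_get(src, getter):
    try:
        return getter(src)
    except (AttributeError, KeyError, TypeError, IndexError):
        return None

meta = {'duration': '347', 'stats': {'views': '1203'}}
duration = int_or_none(meta.get('duration'))                        # 347
views = int_or_none(try_get(meta, lambda x: x['stats']['views']))   # 1203
likes = int_or_none(try_get(meta, lambda x: x['likes']['count']))   # None, no KeyError
```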
#### More examples


@ -1,3 +1,57 @@
version 2019.01.27
Core
+ [extractor/common] Extract season in _json_ld
* [postprocessor/ffmpeg] Fallback to ffmpeg/avconv for audio codec detection
(#681)
Extractors
* [vice] Fix extraction for locked videos (#16248)
+ [wakanim] Detect DRM protected videos
+ [wakanim] Add support for wakanim.tv (#14374)
* [usatoday] Fix extraction for videos with custom brightcove partner id
(#18990)
* [drtv] Fix extraction (#18989)
* [nhk] Extend URL regular expression (#18968)
* [go] Fix Adobe Pass requests for Disney Now (#18901)
+ [openload] Add support for oload.club (#18969)
version 2019.01.24
Core
* [YoutubeDL] Fix negation for string operators in format selection (#18961)
version 2019.01.23
Core
* [utils] Fix urljoin for paths with non-http(s) schemes
* [extractor/common] Improve jwplayer relative URL handling (#18892)
+ [YoutubeDL] Add negation support for string comparisons in format selection
expressions (#18600, #18805)
* [extractor/common] Improve HLS video-only format detection (#18923)
Extractors
* [crunchyroll] Extend URL regular expression (#18955)
* [pornhub] Bypass scrape detection (#4822, #5930, #7074, #10175, #12722,
#17197, #18338, #18842, #18899)
+ [vrv] Add support for authentication (#14307)
* [videomore:season] Fix extraction
* [videomore] Improve extraction (#18908)
+ [tnaflix] Pass Referer in metadata request (#18925)
* [radiocanada] Relax DRM check (#18608, #18609)
* [vimeo] Fix video password verification for videos protected by
Referer HTTP header
+ [hketv] Add support for hkedcity.net (#18696)
+ [streamango] Add support for fruithosts.net (#18710)
+ [instagram] Add support for tags (#18757)
+ [odnoklassniki] Detect paid videos (#18876)
* [ted] Correct acodec for HTTP formats (#18923)
* [cartoonnetwork] Fix extraction (#15664, #17224)
* [vimeo] Fix extraction for password protected player URLs (#18889)
version 2019.01.17
Extractors


@ -1213,7 +1213,7 @@ Incorrect:
'PLMYEtVRpaqY00V9W81Cwmzp6N6vZqfUKD4'
```
-### Use safe conversion functions
+### Use convenience conversion and parsing functions
Wrap all extracted numeric data into safe functions from [`youtube_dl/utils.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/utils.py): `int_or_none`, `float_or_none`. Use them for string to number conversions as well.
@ -1221,6 +1221,8 @@ Use `url_or_none` for safe URL processing.
Use `try_get` for safe metadata extraction from parsed JSON.
Use `unified_strdate` for uniform `upload_date` or any `YYYYMMDD` meta field extraction, `unified_timestamp` for uniform `timestamp` extraction, `parse_filesize` for `filesize` extraction, `parse_count` for count meta fields extraction, `parse_resolution`, `parse_duration` for `duration` extraction, `parse_age_limit` for `age_limit` extraction.
Explore [`youtube_dl/utils.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/utils.py) for more useful convenience functions.
#### More examples


@ -361,6 +361,7 @@
- **hitbox**
- **hitbox:live**
- **HitRecord**
- **hketv**: 香港教育局教育電視 (HKETV) Educational Television, Hong Kong Educational Bureau
- **HornBunny**
- **HotNewHipHop**
- **hotstar**
@ -386,6 +387,7 @@
- **IndavideoEmbed**
- **InfoQ**
- **Instagram**
- **instagram:tag**: Instagram hashtag search
- **instagram:user**: Instagram user profile
- **Internazionale**
- **InternetVideoArchive**
@ -1067,6 +1069,7 @@
- **VVVVID**
- **VyboryMos**
- **Vzaar**
- **Wakanim**
- **Walla**
- **WalyTV**
- **washingtonpost**


@ -242,6 +242,7 @@ class TestFormatSelection(unittest.TestCase):
def test_format_selection_string_ops(self):
formats = [
{'format_id': 'abc-cba', 'ext': 'mp4', 'url': TEST_URL},
{'format_id': 'zxc-cxz', 'ext': 'webm', 'url': TEST_URL},
]
info_dict = _make_result(formats)
@ -253,6 +254,11 @@ class TestFormatSelection(unittest.TestCase):
# does not equal (!=)
ydl = YDL({'format': '[format_id!=abc-cba]'})
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'zxc-cxz')
ydl = YDL({'format': '[format_id!=abc-cba][format_id!=zxc-cxz]'})
self.assertRaises(ExtractorError, ydl.process_ie_result, info_dict.copy())
# starts with (^=)
@ -262,7 +268,12 @@ class TestFormatSelection(unittest.TestCase):
self.assertEqual(downloaded['format_id'], 'abc-cba')
# does not start with (!^=)
-ydl = YDL({'format': '[format_id!^=abc-cba]'})
+ydl = YDL({'format': '[format_id!^=abc]'})
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'zxc-cxz')
ydl = YDL({'format': '[format_id!^=abc][format_id!^=zxc]'})
self.assertRaises(ExtractorError, ydl.process_ie_result, info_dict.copy())
# ends with ($=)
@ -272,16 +283,29 @@ class TestFormatSelection(unittest.TestCase):
self.assertEqual(downloaded['format_id'], 'abc-cba')
# does not end with (!$=)
-ydl = YDL({'format': '[format_id!$=abc-cba]'})
+ydl = YDL({'format': '[format_id!$=cba]'})
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'zxc-cxz')
ydl = YDL({'format': '[format_id!$=cba][format_id!$=cxz]'})
self.assertRaises(ExtractorError, ydl.process_ie_result, info_dict.copy())
# contains (*=)
-ydl = YDL({'format': '[format_id*=-]'})
+ydl = YDL({'format': '[format_id*=bc-cb]'})
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'abc-cba')
# does not contain (!*=)
ydl = YDL({'format': '[format_id!*=bc-cb]'})
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'zxc-cxz')
ydl = YDL({'format': '[format_id!*=abc][format_id!*=zxc]'})
self.assertRaises(ExtractorError, ydl.process_ie_result, info_dict.copy())
ydl = YDL({'format': '[format_id!*=-]'})
self.assertRaises(ExtractorError, ydl.process_ie_result, info_dict.copy())


@ -1078,7 +1078,7 @@ class YoutubeDL(object):
comparison_value = m.group('value')
str_op = STR_OPERATORS[m.group('op')]
if m.group('negation'):
-op = lambda attr, value: not str_op
+op = lambda attr, value: not str_op(attr, value)
else:
op = str_op
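The one-character fix above is easy to miss: `not str_op` negates the function object itself, which is always truthy, so the negated operator returned a constant `False` and rejected every format. Calling the operator first and then negating its result restores the intended semantics. A self-contained illustration (here `str_op` stands in for one entry of the `STR_OPERATORS` table, the "starts with" comparison):

```python
# str_op stands in for an entry of the STR_OPERATORS table,
# e.g. the 'starts with' (^=) comparison.
str_op = lambda attr, value: attr.startswith(value)

# Before the fix: negates the function object (always truthy),
# so the filter is False for every format.
buggy_op = lambda attr, value: not str_op

# After the fix: negates the result of the comparison.
fixed_op = lambda attr, value: not str_op(attr, value)

print(buggy_op('zxc-cxz', 'abc'))  # False -- wrong: 'zxc-cxz' does not start with 'abc'
print(fixed_op('zxc-cxz', 'abc'))  # True
print(fixed_op('abc-cba', 'abc'))  # False
```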


@ -1249,7 +1249,10 @@ class InfoExtractor(object):
info['title'] = episode_name
part_of_season = e.get('partOfSeason')
if isinstance(part_of_season, dict) and part_of_season.get('@type') in ('TVSeason', 'Season', 'CreativeWorkSeason'):
-info['season_number'] = int_or_none(part_of_season.get('seasonNumber'))
+info.update({
'season': unescapeHTML(part_of_season.get('name')),
'season_number': int_or_none(part_of_season.get('seasonNumber')),
})
part_of_series = e.get('partOfSeries') or e.get('partOfTVSeries')
if isinstance(part_of_series, dict) and part_of_series.get('@type') in ('TVSeries', 'Series', 'CreativeWorkSeries'):
info['series'] = unescapeHTML(part_of_series.get('name'))
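The new `info.update(...)` call mirrors the existing `partOfSeries` handling: check the JSON-LD node's `@type`, HTML-unescape the name, and coerce the number defensively. A standalone sketch of that logic, using the Python 3 stdlib (`html.unescape`, a hand-rolled `int` guard) in place of youtube-dl's `unescapeHTML` and `int_or_none`; the sample JSON-LD document is made up for illustration:

```python
import json
from html import unescape  # stands in for youtube_dl.utils.unescapeHTML

# A hypothetical schema.org TVEpisode node, as found in a page's JSON-LD.
episode = json.loads('''{
    "@type": "TVEpisode",
    "name": "Episode 02",
    "partOfSeason": {"@type": "TVSeason", "name": "Season &amp; More", "seasonNumber": "1"}
}''')

info = {}
part_of_season = episode.get('partOfSeason')
if isinstance(part_of_season, dict) and part_of_season.get('@type') in ('TVSeason', 'Season', 'CreativeWorkSeason'):
    season_number = part_of_season.get('seasonNumber')
    try:
        season_number = int(season_number)  # int_or_none: swallow bad values
    except (TypeError, ValueError):
        season_number = None
    info.update({
        'season': unescape(part_of_season.get('name')),
        'season_number': season_number,
    })
```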


@ -144,7 +144,7 @@ class CrunchyrollBaseIE(InfoExtractor):
class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
IE_NAME = 'crunchyroll'
-_VALID_URL = r'https?://(?:(?P<prefix>www|m)\.)?(?P<url>crunchyroll\.(?:com|fr)/(?:media(?:-|/\?id=)|[^/]*/[^/?&]*?)(?P<video_id>[0-9]+))(?:[/?&]|$)'
+_VALID_URL = r'https?://(?:(?P<prefix>www|m)\.)?(?P<url>crunchyroll\.(?:com|fr)/(?:media(?:-|/\?id=)|(?:[^/]*/){1,2}[^/?&]*?)(?P<video_id>[0-9]+))(?:[/?&]|$)'
_TESTS = [{
'url': 'http://www.crunchyroll.com/wanna-be-the-strongest-in-the-world/episode-1-an-idol-wrestler-is-born-645513',
'info_dict': {
@ -269,6 +269,9 @@ class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
}, {
'url': 'http://www.crunchyroll.com/media-723735',
'only_matching': True,
}, {
'url': 'https://www.crunchyroll.com/en-gb/mob-psycho-100/episode-2-urban-legends-encountering-rumors-780921',
'only_matching': True,
}]
_FORMAT_IDS = {


@ -77,10 +77,9 @@ class DRTVIE(InfoExtractor):
r'data-resource="[^>"]+mu/programcard/expanded/([^"]+)"'),
webpage, 'video id')
-programcard = self._download_json(
+data = self._download_json(
-'http://www.dr.dk/mu/programcard/expanded/%s' % video_id,
+'https://www.dr.dk/mu-online/api/1.4/programcard/%s' % video_id,
-video_id, 'Downloading video JSON')
+video_id, 'Downloading video JSON', query={'expanded': 'true'})
-data = programcard['Data'][0]
title = remove_end(self._og_search_title(
webpage, default=None), ' | TV | DR') or data['Title']
@ -97,7 +96,7 @@ class DRTVIE(InfoExtractor):
formats = []
subtitles = {}
-for asset in data['Assets']:
+for asset in [data['PrimaryAsset']]:
kind = asset.get('Kind')
if kind == 'Image':
thumbnail = asset.get('Uri')


@ -1373,6 +1373,7 @@ from .vuclip import VuClipIE
from .vvvvid import VVVVIDIE
from .vyborymos import VyboryMosIE
from .vzaar import VzaarIE
from .wakanim import WakanimIE
from .walla import WallaIE
from .washingtonpost import (
WashingtonPostIE,


@ -25,15 +25,15 @@ class GoIE(AdobePassIE):
},
'watchdisneychannel': {
'brand': '004',
-'requestor_id': 'Disney',
+'resource_id': 'Disney',
},
'watchdisneyjunior': {
'brand': '008',
-'requestor_id': 'DisneyJunior',
+'resource_id': 'DisneyJunior',
},
'watchdisneyxd': {
'brand': '009',
-'requestor_id': 'DisneyXD',
+'resource_id': 'DisneyXD',
}
}
_VALID_URL = r'https?://(?:(?P<sub_domain>%s)\.)?go\.com/(?:(?:[^/]+/)*(?P<id>vdka\w+)|(?:[^/]+/)*(?P<display_id>[^/?#]+))'\
@ -130,8 +130,8 @@ class GoIE(AdobePassIE):
'device': '001',
}
if video_data.get('accesslevel') == '1':
-requestor_id = site_info['requestor_id']
+requestor_id = site_info.get('requestor_id', 'DisneyChannels')
-resource = self._get_mvpd_resource(
+resource = site_info.get('resource_id') or self._get_mvpd_resource(
requestor_id, title, video_id, None)
auth = self._extract_mvpd_auth(
url, video_id, requestor_id, resource)


@ -5,8 +5,8 @@ from ..utils import ExtractorError
class NhkVodIE(InfoExtractor):
-_VALID_URL = r'https?://www3\.nhk\.or\.jp/nhkworld/en/vod/(?P<id>[^/]+/[^/?#&]+)'
+_VALID_URL = r'https?://www3\.nhk\.or\.jp/nhkworld/en/(?:vod|ondemand)/(?P<id>[^/]+/[^/?#&]+)'
-_TEST = {
+_TESTS = [{
# Videos available only for a limited period of time. Visit
# http://www3.nhk.or.jp/nhkworld/en/vod/ for working samples.
'url': 'http://www3.nhk.or.jp/nhkworld/en/vod/tokyofashion/20160815',
@ -19,7 +19,10 @@ class NhkVodIE(InfoExtractor):
'episode': 'The Kimono as Global Fashion',
},
'skip': 'Videos available only for a limited period of time',
-}
+}, {
'url': 'https://www3.nhk.or.jp/nhkworld/en/ondemand/video/2015173/',
'only_matching': True,
}]
_API_URL = 'http://api.nhk.or.jp/nhkworld/vodesdlist/v1/all/all/all.json?apikey=EJfK8jdS57GqlupFgAfAAwr573q01y6k'
def _real_extract(self, url):


@ -249,7 +249,7 @@ class OpenloadIE(InfoExtractor):
(?:www\.)?
(?:
openload\.(?:co|io|link)|
-oload\.(?:tv|stream|site|xyz|win|download|cloud|cc|icu|fun)
+oload\.(?:tv|stream|site|xyz|win|download|cloud|cc|icu|fun|club)
)
)/
(?:f|embed)/
@ -334,6 +334,9 @@ class OpenloadIE(InfoExtractor):
}, {
'url': 'https://oload.fun/f/gb6G1H4sHXY',
'only_matching': True,
}, {
'url': 'https://oload.club/f/Nr1L-aZ2dbQ',
'only_matching': True,
}]
_USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'


@ -10,7 +10,9 @@ from .common import InfoExtractor
from ..compat import (
compat_HTTPError,
compat_str,
compat_urllib_request,
)
from .openload import PhantomJSwrapper
from ..utils import (
ExtractorError,
int_or_none,
@ -22,7 +24,29 @@ from ..utils import (
)
-class PornHubIE(InfoExtractor):
+class PornHubBaseIE(InfoExtractor):
def _download_webpage_handle(self, *args, **kwargs):
def dl(*args, **kwargs):
return super(PornHubBaseIE, self)._download_webpage_handle(*args, **kwargs)
webpage, urlh = dl(*args, **kwargs)
if any(re.search(p, webpage) for p in (
r'<body\b[^>]+\bonload=["\']go\(\)',
r'document\.cookie\s*=\s*["\']RNKEY=',
r'document\.location\.reload\(true\)')):
url_or_request = args[0]
url = (url_or_request.get_full_url()
if isinstance(url_or_request, compat_urllib_request.Request)
else url_or_request)
phantom = PhantomJSwrapper(self, required_version='2.0')
phantom.get(url, html=webpage)
webpage, urlh = dl(*args, **kwargs)
return webpage, urlh
class PornHubIE(PornHubBaseIE):
IE_DESC = 'PornHub and Thumbzilla'
_VALID_URL = r'''(?x)
https?://
@ -307,7 +331,7 @@ class PornHubIE(InfoExtractor):
}
-class PornHubPlaylistBaseIE(InfoExtractor):
+class PornHubPlaylistBaseIE(PornHubBaseIE):
def _extract_entries(self, webpage, host):
# Only process container div with main playlist content skipping
# drop-down menu that uses similar pattern for videos (see


@ -3,21 +3,23 @@ from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
ExtractorError,
get_element_by_attribute,
parse_duration,
try_get,
update_url_query,
-ExtractorError,
)
from ..compat import compat_str
class USATodayIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?usatoday\.com/(?:[^/]+/)*(?P<id>[^?/#]+)'
-_TEST = {
+_TESTS = [{
# Brightcove Partner ID = 29906170001
'url': 'http://www.usatoday.com/media/cinematic/video/81729424/us-france-warn-syrian-regime-ahead-of-new-peace-talks/',
-'md5': '4d40974481fa3475f8bccfd20c5361f8',
+'md5': '033587d2529dc3411a1ab3644c3b8827',
'info_dict': {
-'id': '81729424',
+'id': '4799374959001',
'ext': 'mp4',
'title': 'US, France warn Syrian regime ahead of new peace talks',
'timestamp': 1457891045,
@ -25,8 +27,20 @@ class USATodayIE(InfoExtractor):
'uploader_id': '29906170001',
'upload_date': '20160313',
}
-}
+}, {
-BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/29906170001/38a9eecc-bdd8-42a3-ba14-95397e48b3f8_default/index.html?videoId=%s'
+# ui-video-data[asset_metadata][items][brightcoveaccount] = 28911775001
'url': 'https://www.usatoday.com/story/tech/science/2018/08/21/yellowstone-supervolcano-eruption-stop-worrying-its-blow/973633002/',
'info_dict': {
'id': '5824495846001',
'ext': 'mp4',
'title': 'Yellowstone more likely to crack rather than explode',
'timestamp': 1534790612,
'description': 'md5:3715e7927639a4f16b474e9391687c62',
'uploader_id': '28911775001',
'upload_date': '20180820',
}
}]
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/%s/default_default/index.html?videoId=%s'
def _real_extract(self, url):
display_id = self._match_id(url)
@ -35,10 +49,11 @@ class USATodayIE(InfoExtractor):
if not ui_video_data:
raise ExtractorError('no video on the webpage', expected=True)
video_data = self._parse_json(ui_video_data, display_id)
item = try_get(video_data, lambda x: x['asset_metadata']['items'], dict) or {}
return {
'_type': 'url_transparent',
-'url': self.BRIGHTCOVE_URL_TEMPLATE % video_data['brightcove_id'],
+'url': self.BRIGHTCOVE_URL_TEMPLATE % (item.get('brightcoveaccount', '29906170001'), item.get('brightcoveid') or video_data['brightcove_id']),
'id': compat_str(video_data['id']),
'title': video_data['title'],
'thumbnail': video_data.get('thumbnail'),


@ -94,7 +94,6 @@ class ViceIE(AdobePassIE):
'url': 'https://www.viceland.com/en_us/video/thursday-march-1-2018/5a8f2d7ff1cdb332dd446ec1',
'only_matching': True,
}]
-_PREPLAY_HOST = 'vms.vice'
@staticmethod
def _extract_urls(webpage):
@ -158,9 +157,8 @@ class ViceIE(AdobePassIE):
})
try:
-host = 'www.viceland' if is_locked else self._PREPLAY_HOST
preplay = self._download_json(
-'https://%s.com/%s/video/preplay/%s' % (host, locale, video_id),
+'https://vms.vice.com/%s/video/preplay/%s' % (locale, video_id),
video_id, query=query)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code in (400, 401):


@ -11,10 +11,12 @@ import time
from .common import InfoExtractor
from ..compat import (
compat_HTTPError,
compat_urllib_parse_urlencode,
compat_urllib_parse,
)
from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
)
@ -24,29 +26,41 @@ class VRVBaseIE(InfoExtractor):
_API_DOMAIN = None
_API_PARAMS = {}
_CMS_SIGNING = {}
_TOKEN = None
_TOKEN_SECRET = ''
def _call_api(self, path, video_id, note, data=None):
# https://tools.ietf.org/html/rfc5849#section-3
base_url = self._API_DOMAIN + '/core/' + path
-encoded_query = compat_urllib_parse_urlencode({
+query = [
-'oauth_consumer_key': self._API_PARAMS['oAuthKey'],
+('oauth_consumer_key', self._API_PARAMS['oAuthKey']),
-'oauth_nonce': ''.join([random.choice(string.ascii_letters) for _ in range(32)]),
+('oauth_nonce', ''.join([random.choice(string.ascii_letters) for _ in range(32)])),
-'oauth_signature_method': 'HMAC-SHA1',
+('oauth_signature_method', 'HMAC-SHA1'),
-'oauth_timestamp': int(time.time()),
+('oauth_timestamp', int(time.time())),
-'oauth_version': '1.0',
+]
-})
+if self._TOKEN:
query.append(('oauth_token', self._TOKEN))
encoded_query = compat_urllib_parse_urlencode(query)
headers = self.geo_verification_headers()
if data:
data = json.dumps(data).encode()
headers['Content-Type'] = 'application/json'
-method = 'POST' if data else 'GET'
+base_string = '&'.join([
-base_string = '&'.join([method, compat_urllib_parse.quote(base_url, ''), compat_urllib_parse.quote(encoded_query, '')])
+'POST' if data else 'GET',
compat_urllib_parse.quote(base_url, ''),
compat_urllib_parse.quote(encoded_query, '')])
oauth_signature = base64.b64encode(hmac.new(
-(self._API_PARAMS['oAuthSecret'] + '&').encode('ascii'),
+(self._API_PARAMS['oAuthSecret'] + '&' + self._TOKEN_SECRET).encode('ascii'),
base_string.encode(), hashlib.sha1).digest()).decode()
encoded_query += '&oauth_signature=' + compat_urllib_parse.quote(oauth_signature, '')
-return self._download_json(
+try:
-'?'.join([base_url, encoded_query]), video_id,
+return self._download_json(
-note='Downloading %s JSON metadata' % note, headers=headers, data=data)
+'?'.join([base_url, encoded_query]), video_id,
note='Downloading %s JSON metadata' % note, headers=headers, data=data)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 401:
raise ExtractorError(json.loads(e.cause.read().decode())['message'], expected=True)
raise
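The signing in `_call_api` is OAuth 1.0 HMAC-SHA1 request signing (RFC 5849, section 3.4): the base string joins the method, percent-encoded URL, and percent-encoded query with `&`, and the signing key is `consumer_secret + '&' + token_secret`, with the token secret empty until the user logs in — exactly what the new `_TOKEN_SECRET = ''` default provides. A standalone sketch of the same computation (parameter values are illustrative; the RFC also mandates sorted parameter normalization, which the extractor satisfies by building its query in order):

```python
import base64
import hashlib
import hmac
from urllib.parse import quote, urlencode  # Python 3; the extractor uses compat_urllib_parse

def oauth1_signature(method, base_url, params, consumer_secret, token_secret=''):
    # RFC 5849 section 3.4.1: the signature base string is the method,
    # the percent-encoded URL, and the percent-encoded query, joined by '&'.
    encoded_query = urlencode(params)
    base_string = '&'.join([
        method,
        quote(base_url, safe=''),
        quote(encoded_query, safe=''),
    ])
    # Signing key: consumer secret '&' token secret; the token secret is
    # the empty string for unauthenticated (pre-login) calls.
    key = (consumer_secret + '&' + token_secret).encode('ascii')
    digest = hmac.new(key, base_string.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()

# Illustrative values only, not real credentials.
sig = oauth1_signature(
    'GET', 'https://api.vrv.co/core/cms_resource',
    [('oauth_consumer_key', 'key'), ('oauth_nonce', 'abcdef'),
     ('oauth_signature_method', 'HMAC-SHA1'), ('oauth_timestamp', 1548000000)],
    consumer_secret='secret')
# A SHA-1 digest is 20 bytes, so the Base64 signature is always 28 characters.
```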
def _call_cms(self, path, video_id, note):
if not self._CMS_SIGNING:
@ -55,19 +69,22 @@ class VRVBaseIE(InfoExtractor):
self._API_DOMAIN + path, video_id, query=self._CMS_SIGNING,
note='Downloading %s JSON metadata' % note, headers=self.geo_verification_headers())
-def _set_api_params(self, webpage, video_id):
-if not self._API_PARAMS:
-self._API_PARAMS = self._parse_json(self._search_regex(
-r'window\.__APP_CONFIG__\s*=\s*({.+?})</script>',
-webpage, 'api config'), video_id)['cxApiParams']
-self._API_DOMAIN = self._API_PARAMS.get('apiDomain', 'https://api.vrv.co')
def _get_cms_resource(self, resource_key, video_id):
return self._call_api(
'cms_resource', video_id, 'resource path', data={
'resource_key': resource_key,
})['__links__']['cms_resource']['href']
def _real_initialize(self):
webpage = self._download_webpage(
'https://vrv.co/', None, headers=self.geo_verification_headers())
self._API_PARAMS = self._parse_json(self._search_regex(
[
r'window\.__APP_CONFIG__\s*=\s*({.+?})(?:</script>|;)',
r'window\.__APP_CONFIG__\s*=\s*({.+})'
], webpage, 'app config'), None)['cxApiParams']
self._API_DOMAIN = self._API_PARAMS.get('apiDomain', 'https://api.vrv.co')
class VRVIE(VRVBaseIE):
IE_NAME = 'vrv'
@ -86,6 +103,22 @@ class VRVIE(VRVBaseIE):
'skip_download': True,
},
}]
_NETRC_MACHINE = 'vrv'
def _real_initialize(self):
super(VRVIE, self)._real_initialize()
email, password = self._get_login_info()
if email is None:
return
token_credentials = self._call_api(
'authenticate/by:credentials', None, 'Token Credentials', data={
'email': email,
'password': password,
})
self._TOKEN = token_credentials['oauth_token']
self._TOKEN_SECRET = token_credentials['oauth_token_secret']
def _extract_vrv_formats(self, url, video_id, stream_format, audio_lang, hardsub_lang):
if not url or stream_format not in ('hls', 'dash'):
@ -116,28 +149,16 @@ class VRVIE(VRVBaseIE):
def _real_extract(self, url):
video_id = self._match_id(url)
-webpage = self._download_webpage(
-url, video_id,
-headers=self.geo_verification_headers())
-media_resource = self._parse_json(self._search_regex(
-[
-r'window\.__INITIAL_STATE__\s*=\s*({.+?})(?:</script>|;)',
-r'window\.__INITIAL_STATE__\s*=\s*({.+})'
-], webpage, 'inital state'), video_id).get('watch', {}).get('mediaResource') or {}
-video_data = media_resource.get('json')
+episode_path = self._get_cms_resource(
-if not video_data:
+'cms:/episodes/' + video_id, video_id)
-self._set_api_params(webpage, video_id)
+video_data = self._call_cms(episode_path, video_id, 'video')
-episode_path = self._get_cms_resource(
-'cms:/episodes/' + video_id, video_id)
-video_data = self._call_cms(episode_path, video_id, 'video')
title = video_data['title']
-streams_json = media_resource.get('streams', {}).get('json', {})
+streams_path = video_data['__links__'].get('streams', {}).get('href')
-if not streams_json:
+if not streams_path:
-self._set_api_params(webpage, video_id)
+self.raise_login_required()
-streams_path = video_data['__links__']['streams']['href']
+streams_json = self._call_cms(streams_path, video_id, 'streams')
-streams_json = self._call_cms(streams_path, video_id, 'streams')
audio_locale = streams_json.get('audio_locale') audio_locale = streams_json.get('audio_locale')
formats = [] formats = []
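For reference, the scraping path being deleted worked by cutting a JSON blob out of the page's `window.__INITIAL_STATE__` assignment. A self-contained sketch of that pattern, run against a made-up page snippet (real pages embed a much larger state object):

```python
import json
import re

# Made-up page snippet for illustration only.
webpage = '<script>window.__INITIAL_STATE__ = {"watch": {"mediaResource": {"id": "ep1"}}};</script>'

state = {}
# The two patterns the removed code tried, most specific first:
# a lazy match bounded by the statement end, then a greedy fallback.
for pattern in (
        r'window\.__INITIAL_STATE__\s*=\s*({.+?})(?:</script>|;)',
        r'window\.__INITIAL_STATE__\s*=\s*({.+})'):
    mobj = re.search(pattern, webpage)
    if mobj:
        state = json.loads(mobj.group(1))
        break

media_resource = state.get('watch', {}).get('mediaResource') or {}
print(media_resource)  # {'id': 'ep1'}
```

The lazy first pattern backtracks just far enough to end on the `;` (or `</script>`) that terminates the assignment, which is why it is tried before the greedy fallback.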
@@ -202,11 +223,7 @@ class VRVSeriesIE(VRVBaseIE):
 
     def _real_extract(self, url):
         series_id = self._match_id(url)
-        webpage = self._download_webpage(
-            url, series_id,
-            headers=self.geo_verification_headers())
-        self._set_api_params(webpage, series_id)
         seasons_path = self._get_cms_resource(
             'cms:/seasons?series_id=' + series_id, series_id)
         seasons_data = self._call_cms(seasons_path, series_id, 'seasons')

View File

@@ -0,0 +1,66 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    ExtractorError,
+    merge_dicts,
+    urljoin,
+)
+
+
+class WakanimIE(InfoExtractor):
+    _VALID_URL = r'https://(?:www\.)?wakanim\.tv/[^/]+/v2/catalogue/episode/(?P<id>\d+)'
+    _TESTS = [{
+        'url': 'https://www.wakanim.tv/de/v2/catalogue/episode/2997/the-asterisk-war-omu-staffel-1-episode-02-omu',
+        'info_dict': {
+            'id': '2997',
+            'ext': 'mp4',
+            'title': 'Episode 02',
+            'description': 'md5:2927701ea2f7e901de8bfa8d39b2852d',
+            'series': 'The Asterisk War (OmU.)',
+            'season_number': 1,
+            'episode': 'Episode 02',
+            'episode_number': 2,
+        },
+        'params': {
+            'format': 'bestvideo',
+            'skip_download': True,
+        },
+    }, {
+        # DRM Protected
+        'url': 'https://www.wakanim.tv/de/v2/catalogue/episode/7843/sword-art-online-alicization-omu-arc-2-folge-15-omu',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        m3u8_url = urljoin(url, self._search_regex(
+            r'file\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage, 'm3u8 url',
+            group='url'))
+        # https://docs.microsoft.com/en-us/azure/media-services/previous/media-services-content-protection-overview#streaming-urls
+        encryption = self._search_regex(
+            r'encryption%3D(c(?:enc|bc(?:s-aapl)?))',
+            m3u8_url, 'encryption', default=None)
+        if encryption and encryption in ('cenc', 'cbcs-aapl'):
+            raise ExtractorError('This video is DRM protected.', expected=True)
+
+        formats = self._extract_m3u8_formats(
+            m3u8_url, video_id, 'mp4', entry_protocol='m3u8_native',
+            m3u8_id='hls')
+
+        info = self._search_json_ld(webpage, video_id, default={})
+
+        title = self._search_regex(
+            (r'<h1[^>]+\bclass=["\']episode_h1[^>]+\btitle=(["\'])(?P<title>(?:(?!\1).)+)\1',
+             r'<span[^>]+\bclass=["\']episode_title["\'][^>]*>(?P<title>[^<]+)'),
+            webpage, 'title', default=None, group='title')
+
+        return merge_dicts(info, {
+            'id': video_id,
+            'title': title,
+            'formats': formats,
+        })
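The DRM check in the new extractor keys off the Azure Media Services encryption marker, which appears percent-encoded (`%3D` is `=`) inside the HLS manifest URL. A small sketch of how that regex classifies streams, using made-up manifest URLs:

```python
import re

def detect_encryption(m3u8_url):
    # Same pattern the extractor uses: matches 'cenc', 'cbc', or
    # 'cbcs-aapl' after a URL-encoded 'encryption=' parameter.
    return re.search(r'encryption%3D(c(?:enc|bc(?:s-aapl)?))', m3u8_url)

# Hypothetical manifest URLs for illustration only.
drm = 'https://example.streaming.mediaservices.windows.net/x/manifest(format=m3u8-aapl,encryption%3Dcbcs-aapl)'
clear = 'https://example.streaming.mediaservices.windows.net/x/manifest(format=m3u8-aapl)'

print(detect_encryption(drm).group(1))  # cbcs-aapl
print(detect_encryption(clear))         # None
```

When the captured group is `cenc` or `cbcs-aapl`, the extractor raises an expected `ExtractorError` instead of handing the user an undecryptable download.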

View File

@@ -9,9 +9,6 @@ import re
 
 from .common import AudioConversionError, PostProcessor
-from ..compat import (
-    compat_subprocess_get_DEVNULL,
-)
 from ..utils import (
     encodeArgument,
     encodeFilename,
@@ -165,27 +162,45 @@ class FFmpegPostProcessor(PostProcessor):
         return self._paths[self.probe_basename]
 
     def get_audio_codec(self, path):
-        if not self.probe_available:
-            raise PostProcessingError('ffprobe or avprobe not found. Please install one.')
+        if not self.probe_available and not self.available:
+            raise PostProcessingError('ffprobe/avprobe and ffmpeg/avconv not found. Please install one.')
         try:
-            cmd = [
-                encodeFilename(self.probe_executable, True),
-                encodeArgument('-show_streams'),
-                encodeFilename(self._ffmpeg_filename_argument(path), True)]
+            if self.probe_available:
+                cmd = [
+                    encodeFilename(self.probe_executable, True),
+                    encodeArgument('-show_streams')]
+            else:
+                cmd = [
+                    encodeFilename(self.executable, True),
+                    encodeArgument('-i')]
+            cmd.append(encodeFilename(self._ffmpeg_filename_argument(path), True))
             if self._downloader.params.get('verbose', False):
-                self._downloader.to_screen('[debug] %s command line: %s' % (self.basename, shell_quote(cmd)))
-            handle = subprocess.Popen(cmd, stderr=compat_subprocess_get_DEVNULL(), stdout=subprocess.PIPE, stdin=subprocess.PIPE)
-            output = handle.communicate()[0]
-            if handle.wait() != 0:
+                self._downloader.to_screen(
+                    '[debug] %s command line: %s' % (self.basename, shell_quote(cmd)))
+            handle = subprocess.Popen(
+                cmd, stderr=subprocess.PIPE,
+                stdout=subprocess.PIPE, stdin=subprocess.PIPE)
+            stdout_data, stderr_data = handle.communicate()
+            expected_ret = 0 if self.probe_available else 1
+            if handle.wait() != expected_ret:
                 return None
         except (IOError, OSError):
             return None
-        audio_codec = None
-        for line in output.decode('ascii', 'ignore').split('\n'):
-            if line.startswith('codec_name='):
-                audio_codec = line.split('=')[1].strip()
-            elif line.strip() == 'codec_type=audio' and audio_codec is not None:
-                return audio_codec
+        output = (stdout_data if self.probe_available else stderr_data).decode('ascii', 'ignore')
+        if self.probe_available:
+            audio_codec = None
+            for line in output.split('\n'):
+                if line.startswith('codec_name='):
+                    audio_codec = line.split('=')[1].strip()
+                elif line.strip() == 'codec_type=audio' and audio_codec is not None:
+                    return audio_codec
+        else:
+            # Stream #FILE_INDEX:STREAM_INDEX[STREAM_ID](LANGUAGE): CODEC_TYPE: CODEC_NAME
+            mobj = re.search(
+                r'Stream\s*#\d+:\d+(?:\[0x[0-9a-f]+\])?(?:\([a-z]{3}\))?:\s*Audio:\s*([0-9a-z]+)',
+                output)
+            if mobj:
+                return mobj.group(1)
         return None
 
     def run_ffmpeg_multiple_files(self, input_paths, out_path, opts):
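The new fallback path parses `ffmpeg -i` stderr (where ffmpeg exits non-zero because no output file is given) instead of ffprobe's `-show_streams` key/value output. A self-contained sketch of that regex against a made-up stderr fragment:

```python
import re

# Hypothetical 'ffmpeg -i clip.mkv' stderr fragment for illustration.
stderr = '''
Input #0, matroska,webm, from 'clip.mkv':
    Stream #0:0: Video: h264 (High), yuv420p, 1920x1080
    Stream #0:1(eng): Audio: aac (LC), 48000 Hz, stereo
'''

# Same pattern the patch adds; the line shape it targets is
# Stream #FILE_INDEX:STREAM_INDEX[STREAM_ID](LANGUAGE): CODEC_TYPE: CODEC_NAME
mobj = re.search(
    r'Stream\s*#\d+:\d+(?:\[0x[0-9a-f]+\])?(?:\([a-z]{3}\))?:\s*Audio:\s*([0-9a-z]+)',
    stderr)
print(mobj.group(1))  # aac
```

The optional groups absorb the hex stream id and three-letter language tag when present, so only the codec name after `Audio:` is captured; the video line fails the `Audio:` literal and is skipped.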

View File

@@ -1,5 +1,3 @@
 from __future__ import unicode_literals
 
-__version__ = 'vc.2019.01.22'
+__version__ = 'vc.2019.01.28'