Merge remote-tracking branch 'refs/remotes/rg3/master' into shoutfactorytv

gkoelln 2017-01-27 21:34:03 -06:00
commit aa204d3da0
59 changed files with 1610 additions and 592 deletions


@@ -6,8 +6,8 @@
 ---
-### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.01.10*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
+### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.01.28*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
-- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.01.10**
+- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.01.28**
 ### Before submitting an *issue* make sure you have:
 - [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
 [debug] User config: []
 [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2017.01.10
+[debug] youtube-dl version 2017.01.28
 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
 [debug] Proxy map: {}

ChangeLog

@@ -1,3 +1,114 @@
version 2017.01.28
Core
* [utils] Improve parse_duration
Extractors
* [crunchyroll] Improve series and season metadata extraction (#11832)
* [soundcloud] Improve formats extraction and extract audio bitrate
+ [soundcloud] Extract HLS formats
* [soundcloud] Fix track URL extraction (#11852)
+ [twitch:vod] Expand URL regular expressions (#11846)
* [aenetworks] Fix season episodes extraction (#11669)
+ [tva] Add support for videos.tva.ca (#11842)
* [jamendo] Improve and extract more metadata (#11836)
+ [disney] Add support for Disney sites (#7409, #11801, #4975, #11000)
* [vevo] Remove request to old API and catch API v2 errors
+ [cmt,mtv,southpark] Add support for episode URLs (#11837)
+ [youtube] Add fallback for duration extraction (#11841)
version 2017.01.25
Extractors
+ [openload] Fallback video extension to mp4
+ [extractor/generic] Add support for Openload embeds (#11536, #11812)
* [srgssr] Fix rts video extraction (#11831)
+ [afreecatv:global] Add support for afreeca.tv (#11807)
+ [crackle] Extract vtt subtitles
+ [crackle] Extract multiple resolutions for thumbnails
+ [crackle] Add support for mobile URLs
+ [konserthusetplay] Extract subtitles (#11823)
+ [konserthusetplay] Add support for HLS videos (#11823)
* [vimeo:review] Fix config URL extraction (#11821)
version 2017.01.24
Extractors
* [pluralsight] Fix extraction (#11820)
+ [nextmedia] Add support for NextTV (壹電視)
* [24video] Fix extraction (#11811)
* [youtube:playlist] Fix nonexistent and private playlist detection (#11604)
+ [chirbit] Extract uploader (#11809)
version 2017.01.22
Extractors
+ [pornflip] Add support for pornflip.com (#11556, #11795)
* [chaturbate] Fix extraction (#11797, #11802)
+ [azmedien] Add support for AZ Medien sites (#11784, #11785)
+ [nextmedia] Support redirected URLs
+ [vimeo:channel] Extract videos' titles for playlist entries (#11796)
+ [youtube] Extract episode metadata (#9695, #11774)
+ [cspan] Support Ustream embedded videos (#11547)
+ [1tv] Add support for HLS videos (#11786)
* [uol] Fix extraction (#11770)
* [mtv] Relax triforce feed regular expression (#11766)
version 2017.01.18
Extractors
* [bilibili] Fix extraction (#11077)
+ [canalplus] Add fallback for video id (#11764)
* [20min] Fix extraction (#11683, #11751)
* [imdb] Extend URL regular expression (#11744)
+ [naver] Add support for tv.naver.com links (#11743)
version 2017.01.16
Core
* [options] Apply custom config to final composite configuration (#11741)
* [YoutubeDL] Improve protocol auto determining (#11720)
Extractors
* [xiami] Relax URL regular expressions
* [xiami] Improve track metadata extraction (#11699)
+ [limelight] Check hand-make direct HTTP links
+ [limelight] Add support for direct HTTP links at video.llnw.net (#11737)
+ [brightcove] Recognize another player ID pattern (#11688)
+ [niconico] Support login via cookies (#7968)
* [yourupload] Fix extraction (#11601)
+ [beam:live] Add support for beam.pro live streams (#10702, #11596)
* [vevo] Improve geo restriction detection
+ [dramafever] Add support for URLs with language code (#11714)
* [cbc] Improve playlist support (#11704)
version 2017.01.14
Core
+ [common] Add ability to customize akamai manifest host
+ [utils] Add more date formats
Extractors
- [mtv] Eliminate _transform_rtmp_url
* [mtv] Generalize triforce mgid extraction
+ [cmt] Add support for full episodes and video clips (#11623)
+ [mitele] Extract DASH formats
+ [ooyala] Add support for videos with embedToken (#11684)
* [mixcloud] Fix extraction (#11674)
* [openload] Fix extraction (#10408)
* [tv4] Improve extraction (#11698)
* [freesound] Fix and improve extraction (#11602)
+ [nick] Add support for beta.nick.com (#11655)
* [mtv,cc] Use HLS by default with native HLS downloader (#11641)
* [mtv] Fix non-HLS extraction
version 2017.01.10
Extractors


@@ -374,7 +374,7 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
                                      avprobe)
     --audio-format FORMAT            Specify audio format: "best", "aac",
                                      "vorbis", "mp3", "m4a", "opus", or "wav";
-                                     "best" by default
+                                     "best" by default; No effect without -x
     --audio-quality QUALITY          Specify ffmpeg/avconv audio quality, insert
                                      a value between 0 (better) and 9 (worse)
                                      for VBR or a specific bitrate like 128K
@@ -841,7 +841,7 @@ Use the `--cookies` option, for example `--cookies /path/to/cookies/file.txt`.
 In order to extract cookies from browser use any conforming browser extension for exporting cookies. For example, [cookies.txt](https://chrome.google.com/webstore/detail/cookiestxt/njabckikapfpffapmjgojcnbfjonfjfg) (for Chrome) or [Export Cookies](https://addons.mozilla.org/en-US/firefox/addon/export-cookies/) (for Firefox).
-Note that the cookies file must be in Mozilla/Netscape format and the first line of the cookies file must be either `# HTTP Cookie File` or `# Netscape HTTP Cookie File`. Make sure you have correct [newline format](https://en.wikipedia.org/wiki/Newline) in the cookies file and convert newlines if necessary to correspond with your OS, namely `CRLF` (`\r\n`) for Windows, `LF` (`\n`) for Linux and `CR` (`\r`) for Mac OS. `HTTP Error 400: Bad Request` when using `--cookies` is a good sign of invalid newline format.
+Note that the cookies file must be in Mozilla/Netscape format and the first line of the cookies file must be either `# HTTP Cookie File` or `# Netscape HTTP Cookie File`. Make sure you have correct [newline format](https://en.wikipedia.org/wiki/Newline) in the cookies file and convert newlines if necessary to correspond with your OS, namely `CRLF` (`\r\n`) for Windows and `LF` (`\n`) for Unix and Unix-like systems (Linux, Mac OS, etc.). `HTTP Error 400: Bad Request` when using `--cookies` is a good sign of invalid newline format.
 Passing cookies to youtube-dl is a good way to workaround login when a particular extractor does not implement it explicitly. Another use case is working around [CAPTCHA](https://en.wikipedia.org/wiki/CAPTCHA) some websites require you to solve in particular cases in order to get access (e.g. YouTube, CloudFlare).
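The newline requirement is easy to violate when a cookies file is moved between systems; a minimal standalone sketch of a converter (hypothetical helper, not part of youtube-dl):

```python
import os


def normalize_cookie_newlines(path):
    """Rewrite a Netscape-format cookies file with the current OS's newlines."""
    with open(path, 'rb') as f:
        data = f.read()
    # Collapse any CRLF/CR/LF mix down to bare LF first...
    data = data.replace(b'\r\n', b'\n').replace(b'\r', b'\n')
    # ...then re-encode using the platform's own line separator.
    data = data.replace(b'\n', os.linesep.encode())
    with open(path, 'wb') as f:
        f.write(data)
```

Running this before passing the file to `--cookies` should avoid the `HTTP Error 400: Bad Request` symptom described above.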


@@ -33,7 +33,8 @@
 - **AdobeTVVideo**
 - **AdultSwim**
 - **aenetworks**: A+E Networks: A&E, Lifetime, History.com, FYI Network
-- **AfreecaTV**: afreecatv.com
+- **afreecatv**: afreecatv.com
+- **afreecatv:global**: afreecatv.com
 - **AirMozilla**
 - **AlJazeera**
 - **Allocine**
@@ -74,6 +75,8 @@
 - **awaan:live**
 - **awaan:season**
 - **awaan:video**
+- **AZMedien**: AZ Medien videos
+- **AZMedienShow**: AZ Medien shows
 - **Azubu**
 - **AzubuLive**
 - **BaiduVideo**: 百度视频
@@ -86,6 +89,7 @@
 - **bbc.co.uk:article**: BBC articles
 - **bbc.co.uk:iplayer:playlist**
 - **bbc.co.uk:playlist**
+- **Beam:live**
 - **Beatport**
 - **Beeg**
 - **BehindKink**
@@ -198,6 +202,7 @@
 - **Digiteka**
 - **Discovery**
 - **DiscoveryGo**
+- **Disney**
 - **Dotsub**
 - **DouyuTV**: 斗鱼
 - **DPlay**
@@ -482,6 +487,7 @@
 - **Newstube**
 - **NextMedia**: 蘋果日報
 - **NextMediaActionNews**: 蘋果日報 - 動新聞
+- **NextTV**: 壹電視
 - **nfb**: National Film Board of Canada
 - **nfl.com**
 - **NhkVod**
@@ -571,6 +577,7 @@
 - **PolskieRadio**
 - **PolskieRadioCategory**
 - **PornCom**
+- **PornFlip**
 - **PornHd**
 - **PornHub**: PornHub and Thumbzilla
 - **PornHubPlaylist**
@@ -779,6 +786,7 @@
 - **TV2Article**
 - **TV3**
 - **TV4**: tv4.se and tv4play.se
+- **TVA**
 - **TVANouvelles**
 - **TVANouvellesArticle**
 - **TVC**


@@ -510,6 +510,7 @@ class TestUtil(unittest.TestCase):
         self.assertEqual(parse_duration('1 hour 3 minutes'), 3780)
         self.assertEqual(parse_duration('87 Min.'), 5220)
         self.assertEqual(parse_duration('PT1H0.040S'), 3600.04)
+        self.assertEqual(parse_duration('PT00H03M30SZ'), 210)

     def test_fix_xml_ampersands(self):
         self.assertEqual(
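The new test case feeds `parse_duration` an ISO 8601 duration with zero-padded fields and a stray trailing `Z`; a rough standalone sketch of that kind of tolerant parsing (hypothetical helper, far simpler than youtube-dl's real `parse_duration`):

```python
import re


def parse_iso_duration(s):
    # Tolerates zero-padded fields and a stray trailing 'Z',
    # as in 'PT00H03M30SZ'. Returns seconds, or None on no match.
    m = re.match(r'PT(?:(\d+)H)?(?:(\d+)M)?(?:([\d.]+)S)?Z?$', s)
    if not m:
        return None
    hours, minutes, seconds = (float(g) if g else 0 for g in m.groups())
    return hours * 3600 + minutes * 60 + seconds
```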


@@ -1363,7 +1363,7 @@ class YoutubeDL(object):
                 format['ext'] = determine_ext(format['url']).lower()
             # Automatically determine protocol if missing (useful for format
             # selection purposes)
-            if 'protocol' not in format:
+            if format.get('protocol') is None:
                 format['protocol'] = determine_protocol(format)
             # Add HTTP headers, so that external programs can use them from the
             # json output
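The switch from `'protocol' not in format` to `format.get('protocol') is None` also covers entries where an extractor stored an explicit `None`; a minimal standalone illustration (the hypothetical `fill_protocol` and its `default` stand in for the real `determine_protocol` call):

```python
def fill_protocol(fmt, default='https'):
    # Fill the slot whenever it is absent *or* explicitly None.
    # The old `'protocol' not in fmt` check would have skipped a
    # dict that contains the key with a None value.
    if fmt.get('protocol') is None:
        fmt['protocol'] = default
    return fmt
```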


@@ -87,7 +87,7 @@ class AENetworksIE(AENetworksBaseIE):
                 self._html_search_meta('aetn:SeriesTitle', webpage))
         elif url_parts_len == 2:
             entries = []
-            for episode_item in re.findall(r'(?s)<div[^>]+class="[^"]*episode-item[^"]*"[^>]*>', webpage):
+            for episode_item in re.findall(r'(?s)<[^>]+class="[^"]*(?:episode|program)-item[^"]*"[^>]*>', webpage):
                 episode_attributes = extract_attributes(episode_item)
                 episode_url = compat_urlparse.urljoin(
                     url, episode_attributes['data-canonical'])


@@ -18,6 +18,7 @@ from ..utils import (

 class AfreecaTVIE(InfoExtractor):
+    IE_NAME = 'afreecatv'
     IE_DESC = 'afreecatv.com'
     _VALID_URL = r'''(?x)
         https?://
@@ -143,3 +144,94 @@ class AfreecaTVIE(InfoExtractor):
                 expected=True)
         return info
+
+
+class AfreecaTVGlobalIE(AfreecaTVIE):
+    IE_NAME = 'afreecatv:global'
+    _VALID_URL = r'https?://(?:www\.)?afreeca\.tv/(?P<channel_id>\d+)(?:/v/(?P<video_id>\d+))?'
+    _TESTS = [{
+        'url': 'http://afreeca.tv/36853014/v/58301',
+        'info_dict': {
+            'id': '58301',
+            'title': 'tryhard top100',
+            'uploader_id': '36853014',
+            'uploader': 'makgi Hearthstone Live!',
+        },
+        'playlist_count': 3,
+    }]
+
+    def _real_extract(self, url):
+        channel_id, video_id = re.match(self._VALID_URL, url).groups()
+        video_type = 'video' if video_id else 'live'
+        query = {
+            'pt': 'view',
+            'bid': channel_id,
+        }
+        if video_id:
+            query['vno'] = video_id
+        video_data = self._download_json(
+            'http://api.afreeca.tv/%s/view_%s.php' % (video_type, video_type),
+            video_id or channel_id, query=query)['channel']
+        if video_data.get('result') != 1:
+            raise ExtractorError('%s said: %s' % (self.IE_NAME, video_data['remsg']))
+        title = video_data['title']
+
+        info = {
+            'thumbnail': video_data.get('thumb'),
+            'view_count': int_or_none(video_data.get('vcnt')),
+            'age_limit': int_or_none(video_data.get('grade')),
+            'uploader_id': channel_id,
+            'uploader': video_data.get('cname'),
+        }
+
+        if video_id:
+            entries = []
+            for i, f in enumerate(video_data.get('flist', [])):
+                video_key = self.parse_video_key(f.get('key', ''))
+                f_url = f.get('file')
+                if not video_key or not f_url:
+                    continue
+                entries.append({
+                    'id': '%s_%s' % (video_id, video_key.get('part', i + 1)),
+                    'title': title,
+                    'upload_date': video_key.get('upload_date'),
+                    'duration': int_or_none(f.get('length')),
+                    'url': f_url,
+                    'protocol': 'm3u8_native',
+                    'ext': 'mp4',
+                })
+
+            info.update({
+                'id': video_id,
+                'title': title,
+                'duration': int_or_none(video_data.get('length')),
+            })
+            if len(entries) > 1:
+                info['_type'] = 'multi_video'
+                info['entries'] = entries
+            elif len(entries) == 1:
+                i = entries[0].copy()
+                i.update(info)
+                info = i
+        else:
+            formats = []
+            for s in video_data.get('strm', []):
+                s_url = s.get('purl')
+                if not s_url:
+                    continue
+                # TODO: extract rtmp formats
+                if s.get('stype') == 'HLS':
+                    formats.extend(self._extract_m3u8_formats(
+                        s_url, channel_id, 'mp4', fatal=False))
+            self._sort_formats(formats)
+
+            info.update({
+                'id': channel_id,
+                'title': self._live_title(title),
+                'is_live': True,
+                'formats': formats,
+            })
+        return info


@@ -0,0 +1,145 @@
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from .kaltura import KalturaIE
+from ..utils import (
+    get_element_by_class,
+    strip_or_none,
+)
+
+
+class AZMedienBaseIE(InfoExtractor):
+    def _kaltura_video(self, partner_id, entry_id):
+        return self.url_result(
+            'kaltura:%s:%s' % (partner_id, entry_id), ie=KalturaIE.ie_key(),
+            video_id=entry_id)
+
+
+class AZMedienIE(AZMedienBaseIE):
+    IE_DESC = 'AZ Medien videos'
+    _VALID_URL = r'''(?x)
+                    https?://
+                        (?:www\.)?
+                        (?:
+                            telezueri\.ch|
+                            telebaern\.tv|
+                            telem1\.ch
+                        )/
+                        [0-9]+-show-[^/\#]+
+                        (?:
+                            /[0-9]+-episode-[^/\#]+
+                            (?:
+                                /[0-9]+-segment-(?:[^/\#]+\#)?|
+                                \#
+                            )|
+                            \#
+                        )
+                        (?P<id>[^\#]+)
+                    '''
+
+    _TESTS = [{
+        # URL with 'segment'
+        'url': 'http://www.telezueri.ch/62-show-zuerinews/13772-episode-sonntag-18-dezember-2016/32419-segment-massenabweisungen-beim-hiltl-club-wegen-pelzboom',
+        'info_dict': {
+            'id': '1_2444peh4',
+            'ext': 'mov',
+            'title': 'Massenabweisungen beim Hiltl Club wegen Pelzboom',
+            'description': 'md5:9ea9dd1b159ad65b36ddcf7f0d7c76a8',
+            'uploader_id': 'TeleZ?ri',
+            'upload_date': '20161218',
+            'timestamp': 1482084490,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        # URL with 'segment' and fragment:
+        'url': 'http://www.telebaern.tv/118-show-news/14240-episode-dienstag-17-januar-2017/33666-segment-achtung-gefahr#zu-wenig-pflegerinnen-und-pfleger',
+        'only_matching': True
+    }, {
+        # URL with 'episode' and fragment:
+        'url': 'http://www.telem1.ch/47-show-sonntalk/13986-episode-soldaten-fuer-grenzschutz-energiestrategie-obama-bilanz#soldaten-fuer-grenzschutz-energiestrategie-obama-bilanz',
+        'only_matching': True
+    }, {
+        # URL with 'show' and fragment:
+        'url': 'http://www.telezueri.ch/66-show-sonntalk#burka-plakate-trump-putin-china-besuch',
+        'only_matching': True
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+
+        partner_id = self._search_regex(
+            r'<script[^>]+src=["\'](?:https?:)?//(?:[^/]+\.)?kaltura\.com(?:/[^/]+)*/(?:p|partner_id)/([0-9]+)',
+            webpage, 'kaltura partner id')
+        entry_id = self._html_search_regex(
+            r'<a[^>]+data-id=(["\'])(?P<id>(?:(?!\1).)+)\1[^>]+data-slug=["\']%s'
+            % re.escape(video_id), webpage, 'kaltura entry id', group='id')
+
+        return self._kaltura_video(partner_id, entry_id)
+
+
+class AZMedienShowIE(AZMedienBaseIE):
+    IE_DESC = 'AZ Medien shows'
+    _VALID_URL = r'''(?x)
+                    https?://
+                        (?:www\.)?
+                        (?:
+                            telezueri\.ch|
+                            telebaern\.tv|
+                            telem1\.ch
+                        )/
+                        (?P<id>[0-9]+-show-[^/\#]+
+                            (?:
+                                /[0-9]+-episode-[^/\#]+
+                            )?
+                        )$
+                    '''
+
+    _TESTS = [{
+        # URL with 'episode'
+        'url': 'http://www.telebaern.tv/118-show-news/13735-episode-donnerstag-15-dezember-2016',
+        'info_dict': {
+            'id': '118-show-news/13735-episode-donnerstag-15-dezember-2016',
+            'title': 'News - Donnerstag, 15. Dezember 2016',
+        },
+        'playlist_count': 9,
+    }, {
+        # URL with 'show' only
+        'url': 'http://www.telezueri.ch/86-show-talktaeglich',
+        'only_matching': True
+    }]
+
+    def _real_extract(self, url):
+        show_id = self._match_id(url)
+        webpage = self._download_webpage(url, show_id)
+
+        entries = []
+
+        partner_id = self._search_regex(
+            r'src=["\'](?:https?:)?//(?:[^/]+\.)kaltura\.com/(?:[^/]+/)*(?:p|partner_id)/(\d+)',
+            webpage, 'kaltura partner id', default=None)
+
+        if partner_id:
+            entries = [
+                self._kaltura_video(partner_id, m.group('id'))
+                for m in re.finditer(
+                    r'data-id=(["\'])(?P<id>(?:(?!\1).)+)\1', webpage)]
+
+        if not entries:
+            entries = [
+                self.url_result(m.group('url'), ie=AZMedienIE.ie_key())
+                for m in re.finditer(
+                    r'<a[^>]+data-real=(["\'])(?P<url>http.+?)\1', webpage)]
+
+        title = self._search_regex(
+            r'episodeShareTitle\s*=\s*(["\'])(?P<title>(?:(?!\1).)+)\1',
+            webpage, 'title',
+            default=strip_or_none(get_element_by_class(
+                'title-block-cell', webpage)), group='title')
+
+        return self.playlist_result(entries, show_id, title)
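Both extractors above lean on the `(["\'])(?P<id>(?:(?!\1).)+)\1` idiom, which matches an attribute value delimited by either quote style; a standalone demonstration outside the extractor classes:

```python
import re

# Group 1 captures the opening quote; the backreference \1 requires the
# closing quote to be the same character, and the (?!\1) lookahead stops
# the value at that same quote.
ATTR_RE = r'data-id=(["\'])(?P<id>(?:(?!\1).)+)\1'


def extract_data_ids(html):
    return [m.group('id') for m in re.finditer(ATTR_RE, html)]
```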


@@ -0,0 +1,73 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    ExtractorError,
+    clean_html,
+    compat_str,
+    int_or_none,
+    parse_iso8601,
+    try_get,
+)
+
+
+class BeamProLiveIE(InfoExtractor):
+    IE_NAME = 'Beam:live'
+    _VALID_URL = r'https?://(?:\w+\.)?beam\.pro/(?P<id>[^/?#&]+)'
+    _RATINGS = {'family': 0, 'teen': 13, '18+': 18}
+    _TEST = {
+        'url': 'http://www.beam.pro/niterhayven',
+        'info_dict': {
+            'id': '261562',
+            'ext': 'mp4',
+            'title': 'Introducing The Witcher 3 // The Grind Starts Now!',
+            'description': 'md5:0b161ac080f15fe05d18a07adb44a74d',
+            'thumbnail': r're:https://.*\.jpg$',
+            'timestamp': 1483477281,
+            'upload_date': '20170103',
+            'uploader': 'niterhayven',
+            'uploader_id': '373396',
+            'age_limit': 18,
+            'is_live': True,
+            'view_count': int,
+        },
+        'skip': 'niterhayven is offline',
+        'params': {
+            'skip_download': True,
+        },
+    }
+
+    def _real_extract(self, url):
+        channel_name = self._match_id(url)
+
+        chan = self._download_json(
+            'https://beam.pro/api/v1/channels/%s' % channel_name, channel_name)
+
+        if chan.get('online') is False:
+            raise ExtractorError(
+                '{0} is offline'.format(channel_name), expected=True)
+
+        channel_id = chan['id']
+
+        formats = self._extract_m3u8_formats(
+            'https://beam.pro/api/v1/channels/%s/manifest.m3u8' % channel_id,
+            channel_name, ext='mp4', m3u8_id='hls', fatal=False)
+        self._sort_formats(formats)
+
+        user_id = chan.get('userId') or try_get(chan, lambda x: x['user']['id'])
+
+        return {
+            'id': compat_str(chan.get('id') or channel_name),
+            'title': self._live_title(chan.get('name') or channel_name),
+            'description': clean_html(chan.get('description')),
+            'thumbnail': try_get(chan, lambda x: x['thumbnail']['url'], compat_str),
+            'timestamp': parse_iso8601(chan.get('updatedAt')),
+            'uploader': chan.get('token') or try_get(
+                chan, lambda x: x['user']['username'], compat_str),
+            'uploader_id': compat_str(user_id) if user_id else None,
+            'age_limit': self._RATINGS.get(chan.get('audience')),
+            'is_live': True,
+            'view_count': int_or_none(chan.get('viewersTotal')),
+            'formats': formats,
+        }
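`try_get` guards the nested JSON lookups above; a minimal sketch of how such a helper behaves (simplified: youtube-dl's real helper also accepts a list of getters):

```python
def try_get(src, getter, expected_type=None):
    # Apply the getter, swallowing the errors a missing or oddly-typed
    # nested field would raise; optionally type-check the result.
    try:
        v = getter(src)
    except (AttributeError, KeyError, TypeError, IndexError):
        return None
    if expected_type is None or isinstance(v, expected_type):
        return v
    return None
```

This is why expressions like `try_get(chan, lambda x: x['thumbnail']['url'], compat_str)` can be written without any `if 'thumbnail' in chan` guards.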


@@ -34,8 +34,8 @@ class BiliBiliIE(InfoExtractor):
         },
     }

-    _APP_KEY = '6f90a59ac58a4123'
-    _BILIBILI_KEY = '0bfd84cc3940035173f35e6777508326'
+    _APP_KEY = '84956560bc028eb7'
+    _BILIBILI_KEY = '94aba54af9065f71de72f5508f1cd42e'

     def _real_extract(self, url):
         video_id = self._match_id(url)


@@ -179,7 +179,7 @@ class BrightcoveLegacyIE(InfoExtractor):
         params = {}

-        playerID = find_param('playerID')
+        playerID = find_param('playerID') or find_param('playerId')
         if playerID is None:
             raise ExtractorError('Cannot find player ID')
         params['playerID'] = playerID
@@ -204,7 +204,7 @@ class BrightcoveLegacyIE(InfoExtractor):
         # // build Brightcove <object /> XML
         # }
         m = re.search(
-            r'''(?x)customBC.\createVideo\(
+            r'''(?x)customBC\.createVideo\(
                 .*? # skipping width and height
                 ["\'](?P<playerID>\d+)["\']\s*,\s* # playerID
                 ["\'](?P<playerKey>AQ[^"\']{48})[^"\']*["\']\s*,\s* # playerKey begins with AQ and is 50 characters


@@ -107,7 +107,7 @@ class CanalplusIE(InfoExtractor):
             [r'<canal:player[^>]+?videoId=(["\'])(?P<id>\d+)',
              r'id=["\']canal_video_player(?P<id>\d+)',
              r'data-video=["\'](?P<id>\d+)'],
-            webpage, 'video id', group='id')
+            webpage, 'video id', default=mobj.group('vid'), group='id')

         info_url = self._VIDEO_INFO_TEMPLATE % (site_id, video_id)
         video_data = self._download_json(info_url, video_id, 'Downloading video JSON')


@@ -90,36 +90,49 @@ class CBCIE(InfoExtractor):
             },
         }],
         'skip': 'Geo-restricted to Canada',
+    }, {
+        # multiple CBC.APP.Caffeine.initInstance(...)
+        'url': 'http://www.cbc.ca/news/canada/calgary/dog-indoor-exercise-winter-1.3928238',
+        'info_dict': {
+            'title': 'Keep Rover active during the deep freeze with doggie pushups and other fun indoor tasks',
+            'id': 'dog-indoor-exercise-winter-1.3928238',
+        },
+        'playlist_mincount': 6,
     }]

     @classmethod
     def suitable(cls, url):
         return False if CBCPlayerIE.suitable(url) else super(CBCIE, cls).suitable(url)

+    def _extract_player_init(self, player_init, display_id):
+        player_info = self._parse_json(player_init, display_id, js_to_json)
+        media_id = player_info.get('mediaId')
+        if not media_id:
+            clip_id = player_info['clipId']
+            feed = self._download_json(
+                'http://tpfeed.cbc.ca/f/ExhSPC/vms_5akSXx4Ng_Zn?byCustomValue={:mpsReleases}{%s}' % clip_id,
+                clip_id, fatal=False)
+            if feed:
+                media_id = try_get(feed, lambda x: x['entries'][0]['guid'], compat_str)
+            if not media_id:
+                media_id = self._download_json(
+                    'http://feed.theplatform.com/f/h9dtGB/punlNGjMlc1F?fields=id&byContent=byReleases%3DbyId%253D' + clip_id,
+                    clip_id)['entries'][0]['id'].split('/')[-1]
+        return self.url_result('cbcplayer:%s' % media_id, 'CBCPlayer', media_id)
+
     def _real_extract(self, url):
         display_id = self._match_id(url)
         webpage = self._download_webpage(url, display_id)
-        player_init = self._search_regex(
-            r'CBC\.APP\.Caffeine\.initInstance\(({.+?})\);', webpage, 'player init',
-            default=None)
-        if player_init:
-            player_info = self._parse_json(player_init, display_id, js_to_json)
-            media_id = player_info.get('mediaId')
-            if not media_id:
-                clip_id = player_info['clipId']
-                feed = self._download_json(
-                    'http://tpfeed.cbc.ca/f/ExhSPC/vms_5akSXx4Ng_Zn?byCustomValue={:mpsReleases}{%s}' % clip_id,
-                    clip_id, fatal=False)
-                if feed:
-                    media_id = try_get(feed, lambda x: x['entries'][0]['guid'], compat_str)
-                if not media_id:
-                    media_id = self._download_json(
-                        'http://feed.theplatform.com/f/h9dtGB/punlNGjMlc1F?fields=id&byContent=byReleases%3DbyId%253D' + clip_id,
-                        clip_id)['entries'][0]['id'].split('/')[-1]
-            return self.url_result('cbcplayer:%s' % media_id, 'CBCPlayer', media_id)
-        else:
-            entries = [self.url_result('cbcplayer:%s' % media_id, 'CBCPlayer', media_id) for media_id in re.findall(r'<iframe[^>]+src="[^"]+?mediaId=(\d+)"', webpage)]
-            return self.playlist_result(entries)
+        entries = [
+            self._extract_player_init(player_init, display_id)
+            for player_init in re.findall(r'CBC\.APP\.Caffeine\.initInstance\(({.+?})\);', webpage)]
+        entries.extend([
+            self.url_result('cbcplayer:%s' % media_id, 'CBCPlayer', media_id)
+            for media_id in re.findall(r'<iframe[^>]+src="[^"]+?mediaId=(\d+)"', webpage)])
+        return self.playlist_result(
+            entries, display_id,
+            self._og_search_title(webpage, fatal=False),
+            self._og_search_description(webpage))


 class CBCPlayerIE(InfoExtractor):
class CBCPlayerIE(InfoExtractor): class CBCPlayerIE(InfoExtractor):


@@ -1,5 +1,7 @@
 from __future__ import unicode_literals

+import re
+
 from .common import InfoExtractor
 from ..utils import ExtractorError

@@ -31,30 +33,35 @@ class ChaturbateIE(InfoExtractor):
         webpage = self._download_webpage(url, video_id)

-        m3u8_url = self._search_regex(
-            r'src=(["\'])(?P<url>http.+?\.m3u8.*?)\1', webpage,
-            'playlist', default=None, group='url')
+        m3u8_formats = [(m.group('id').lower(), m.group('url')) for m in re.finditer(
+            r'hlsSource(?P<id>.+?)\s*=\s*(?P<q>["\'])(?P<url>http.+?)(?P=q)', webpage)]

-        if not m3u8_url:
+        if not m3u8_formats:
             error = self._search_regex(
                 [r'<span[^>]+class=(["\'])desc_span\1[^>]*>(?P<error>[^<]+)</span>',
                  r'<div[^>]+id=(["\'])defchat\1[^>]*>\s*<p><strong>(?P<error>[^<]+)<'],
                 webpage, 'error', group='error', default=None)
             if not error:
-                if any(p not in webpage for p in (
+                if any(p in webpage for p in (
                         self._ROOM_OFFLINE, 'offline_tipping', 'tip_offline')):
                     error = self._ROOM_OFFLINE
             if error:
                 raise ExtractorError(error, expected=True)
             raise ExtractorError('Unable to find stream URL')

-        formats = self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4')
+        formats = []
+        for m3u8_id, m3u8_url in m3u8_formats:
+            formats.extend(self._extract_m3u8_formats(
+                m3u8_url, video_id, ext='mp4',
+                # ffmpeg skips segments for fast m3u8
+                preference=-10 if m3u8_id == 'fast' else None,
+                m3u8_id=m3u8_id, fatal=False, live=True))
         self._sort_formats(formats)

         return {
             'id': video_id,
             'title': self._live_title(video_id),
-            'thumbnail': 'https://cdn-s.highwebmedia.com/uHK3McUtGCG3SMFcd4ZJsRv8/roomimage/%s.jpg' % video_id,
+            'thumbnail': 'https://roomimg.stream.highwebmedia.com/ri/%s.jpg' % video_id,
             'age_limit': self._rta_search(webpage),
             'is_live': True,
             'formats': formats,
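The negative `preference` given to the 'fast' HLS variant demotes it when formats are ranked; a toy sketch of preference-based sorting (a hypothetical stand-in, not youtube-dl's actual `_sort_formats`):

```python
def sort_formats(formats):
    # Worst-to-best order, as youtube-dl conventionally sorts: a missing
    # preference counts as 0, so a penalized variant sorts first (worst)
    # even if its bitrate is higher.
    return sorted(
        formats,
        key=lambda f: (f.get('preference') or 0, f.get('tbr') or 0))
```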


@@ -19,6 +19,7 @@ class ChirbitIE(InfoExtractor):
             'title': 'md5:f542ea253f5255240be4da375c6a5d7e',
             'description': 'md5:f24a4e22a71763e32da5fed59e47c770',
             'duration': 306,
+            'uploader': 'Gerryaudio',
         },
         'params': {
             'skip_download': True,
@@ -54,6 +55,9 @@ class ChirbitIE(InfoExtractor):
         duration = parse_duration(self._search_regex(
             r'class=["\']c-length["\'][^>]*>([^<]+)',
             webpage, 'duration', fatal=False))
+        uploader = self._search_regex(
+            r'id=["\']chirbit-username["\'][^>]*>([^<]+)',
+            webpage, 'uploader', fatal=False)

         return {
             'id': audio_id,
@@ -61,6 +65,7 @@ class ChirbitIE(InfoExtractor):
             'title': title,
             'description': description,
             'duration': duration,
+            'uploader': uploader,
         }


@@ -1,13 +1,11 @@
 from __future__ import unicode_literals

 from .mtv import MTVIE
-from ..utils import ExtractorError


 class CMTIE(MTVIE):
     IE_NAME = 'cmt.com'
-    _VALID_URL = r'https?://(?:www\.)?cmt\.com/(?:videos|shows)/(?:[^/]+/)*(?P<videoid>\d+)'
-    _FEED_URL = 'http://www.cmt.com/sitewide/apps/player/embed/rss/'
+    _VALID_URL = r'https?://(?:www\.)?cmt\.com/(?:videos|shows|(?:full-)?episodes|video-clips)/(?P<id>[^/]+)'

     _TESTS = [{
         'url': 'http://www.cmt.com/videos/garth-brooks/989124/the-call-featuring-trisha-yearwood.jhtml#artist=30061',
@@ -33,17 +31,24 @@ class CMTIE(MTVIE):
     }, {
         'url': 'http://www.cmt.com/shows/party-down-south/party-down-south-ep-407-gone-girl/1738172/playlist/#id=1738172',
         'only_matching': True,
+    }, {
+        'url': 'http://www.cmt.com/full-episodes/537qb3/nashville-the-wayfaring-stranger-season-5-ep-501',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.cmt.com/video-clips/t9e4ci/nashville-juliette-in-2-minutes',
+        'only_matching': True,
     }]

-    @classmethod
-    def _transform_rtmp_url(cls, rtmp_video_url):
-        if 'error_not_available.swf' in rtmp_video_url:
-            raise ExtractorError(
-                '%s said: video is not available' % cls.IE_NAME, expected=True)
-        return super(CMTIE, cls)._transform_rtmp_url(rtmp_video_url)
-
     def _extract_mgid(self, webpage):
-        return self._search_regex(
-            r'MTVN\.VIDEO\.contentUri\s*=\s*([\'"])(?P<mgid>.+?)\1',
-            webpage, 'mgid', group='mgid')
+        mgid = self._search_regex(
+            r'MTVN\.VIDEO\.contentUri\s*=\s*([\'"])(?P<mgid>.+?)\1',
+            webpage, 'mgid', group='mgid', default=None)
+        if not mgid:
+            mgid = self._extract_triforce_mgid(webpage)
+        return mgid
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+        mgid = self._extract_mgid(webpage)
+        return self.url_result('http://media.mtvnservices.com/embed/%s' % mgid)

View File

@@ -48,17 +48,8 @@ class ComedyCentralFullEpisodesIE(MTVServicesInfoExtractor):
     def _real_extract(self, url):
         playlist_id = self._match_id(url)
         webpage = self._download_webpage(url, playlist_id)
-
-        feed_json = self._search_regex(r'var triforceManifestFeed\s*=\s*(\{.+?\});\n', webpage, 'triforce feeed')
-        feed = self._parse_json(feed_json, playlist_id)
-        zones = feed['manifest']['zones']
-        video_zone = zones['t2_lc_promo1']
-        feed = self._download_json(video_zone['feed'], playlist_id)
-        mgid = feed['result']['data']['id']
-
+        mgid = self._extract_triforce_mgid(webpage, data_zone='t2_lc_promo1')
         videos_info = self._get_videos_info(mgid)
         return videos_info

@@ -94,12 +85,6 @@ class ToshIE(MTVServicesInfoExtractor):
         'only_matching': True,
     }]

-    @classmethod
-    def _transform_rtmp_url(cls, rtmp_video_url):
-        new_urls = super(ToshIE, cls)._transform_rtmp_url(rtmp_video_url)
-        new_urls['rtmp'] = rtmp_video_url.replace('viacomccstrm', 'viacommtvstrm')
-        return new_urls
-

 class ComedyCentralTVIE(MTVServicesInfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?comedycentral\.tv/(?:staffeln|shows)/(?P<id>[^/?#&]+)'

View File

@@ -6,7 +6,7 @@ from ..utils import int_or_none

 class CrackleIE(InfoExtractor):
-    _VALID_URL = r'(?:crackle:|https?://(?:www\.)?crackle\.com/(?:playlist/\d+/|(?:[^/]+/)+))(?P<id>\d+)'
+    _VALID_URL = r'(?:crackle:|https?://(?:(?:www|m)\.)?crackle\.com/(?:playlist/\d+/|(?:[^/]+/)+))(?P<id>\d+)'
     _TEST = {
         'url': 'http://www.crackle.com/comedians-in-cars-getting-coffee/2498934',
         'info_dict': {
@@ -31,8 +31,32 @@ class CrackleIE(InfoExtractor):
         }
     }

+    _THUMBNAIL_RES = [
+        (120, 90),
+        (208, 156),
+        (220, 124),
+        (220, 220),
+        (240, 180),
+        (250, 141),
+        (315, 236),
+        (320, 180),
+        (360, 203),
+        (400, 300),
+        (421, 316),
+        (460, 330),
+        (460, 460),
+        (462, 260),
+        (480, 270),
+        (587, 330),
+        (640, 480),
+        (700, 330),
+        (700, 394),
+        (854, 480),
+        (1024, 1024),
+        (1920, 1080),
+    ]
+
     # extracted from http://legacyweb-us.crackle.com/flash/ReferrerRedirect.ashx
-    _THUMBNAIL_TEMPLATE = 'http://images-us-am.crackle.com/%stnl_1920x1080.jpg?ts=20140107233116?c=635333335057637614'
     _MEDIA_FILE_SLOTS = {
         'c544.flv': {
             'width': 544,
@@ -61,17 +85,25 @@ class CrackleIE(InfoExtractor):
         item = self._download_xml(
             'http://legacyweb-us.crackle.com/app/revamp/vidwallcache.aspx?flags=-1&fm=%s' % video_id,
-            video_id).find('i')
+            video_id, headers=self.geo_verification_headers()).find('i')
         title = item.attrib['t']

         subtitles = {}
         formats = self._extract_m3u8_formats(
             'http://content.uplynk.com/ext/%s/%s.m3u8' % (config_doc.attrib['strUplynkOwnerId'], video_id),
             video_id, 'mp4', m3u8_id='hls', fatal=None)
-        thumbnail = None
+        thumbnails = []
         path = item.attrib.get('p')
         if path:
-            thumbnail = self._THUMBNAIL_TEMPLATE % path
+            for width, height in self._THUMBNAIL_RES:
+                res = '%dx%d' % (width, height)
+                thumbnails.append({
+                    'id': res,
+                    'url': 'http://images-us-am.crackle.com/%stnl_%s.jpg' % (path, res),
+                    'width': width,
+                    'height': height,
+                    'resolution': res,
+                })
             http_base_url = 'http://ahttp.crackle.com/' + path
             for mfs_path, mfs_info in self._MEDIA_FILE_SLOTS.items():
                 formats.append({
@@ -86,10 +118,11 @@ class CrackleIE(InfoExtractor):
                 if locale and v:
                     if locale not in subtitles:
                         subtitles[locale] = []
-                    subtitles[locale] = [{
-                        'url': '%s/%s%s_%s.xml' % (config_doc.attrib['strSubtitleServer'], path, locale, v),
-                        'ext': 'ttml',
-                    }]
+                    for url_ext, ext in (('vtt', 'vtt'), ('xml', 'tt')):
+                        subtitles.setdefault(locale, []).append({
+                            'url': '%s/%s%s_%s.%s' % (config_doc.attrib['strSubtitleServer'], path, locale, v, url_ext),
+                            'ext': ext,
+                        })
         self._sort_formats(formats, ('width', 'height', 'tbr', 'format_id'))

         return {
@@ -100,7 +133,7 @@ class CrackleIE(InfoExtractor):
             'series': item.attrib.get('sn'),
             'season_number': int_or_none(item.attrib.get('se')),
             'episode_number': int_or_none(item.attrib.get('ep')),
-            'thumbnail': thumbnail,
+            'thumbnails': thumbnails,
             'subtitles': subtitles,
             'formats': formats,
         }
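The Crackle hunk above swaps the single 1920x1080 thumbnail template for one CDN URL per known resolution. A minimal standalone sketch of that URL construction (the `path` value below is illustrative, not a real Crackle asset path):

```python
def crackle_thumbnails(path, res_list):
    # Each (width, height) pair maps to a CDN URL of the form
    # http://images-us-am.crackle.com/<path>tnl_<w>x<h>.jpg,
    # mirroring the loop added in the diff above.
    thumbnails = []
    for width, height in res_list:
        res = '%dx%d' % (width, height)
        thumbnails.append({
            'id': res,
            'url': 'http://images-us-am.crackle.com/%stnl_%s.jpg' % (path, res),
            'width': width,
            'height': height,
        })
    return thumbnails
```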

View File

@@ -166,6 +166,25 @@ class CrunchyrollIE(CrunchyrollBaseIE):
             # m3u8 download
             'skip_download': True,
         },
+    }, {
+        'url': 'http://www.crunchyroll.com/konosuba-gods-blessing-on-this-wonderful-world/episode-1-give-me-deliverance-from-this-judicial-injustice-727589',
+        'info_dict': {
+            'id': '727589',
+            'ext': 'mp4',
+            'title': "KONOSUBA -God's blessing on this wonderful world! 2 Episode 1 Give Me Deliverance from this Judicial Injustice!",
+            'description': 'md5:cbcf05e528124b0f3a0a419fc805ea7d',
+            'thumbnail': r're:^https?://.*\.jpg$',
+            'uploader': 'Kadokawa Pictures Inc.',
+            'upload_date': '20170118',
+            'series': "KONOSUBA -God's blessing on this wonderful world!",
+            'season_number': 2,
+            'episode': 'Give Me Deliverance from this Judicial Injustice!',
+            'episode_number': 1,
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
     }, {
         'url': 'http://www.crunchyroll.fr/girl-friend-beta/episode-11-goodbye-la-mode-661697',
         'only_matching': True,
@@ -439,6 +458,18 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
         subtitles = self.extract_subtitles(video_id, webpage)

+        # webpage provide more accurate data than series_title from XML
+        series = self._html_search_regex(
+            r'id=["\']showmedia_about_episode_num[^>]+>\s*<a[^>]+>([^<]+)',
+            webpage, 'series', default=xpath_text(metadata, 'series_title'))
+
+        episode = xpath_text(metadata, 'episode_title')
+        episode_number = int_or_none(xpath_text(metadata, 'episode_number'))
+
+        season_number = int_or_none(self._search_regex(
+            r'(?s)<h4[^>]+id=["\']showmedia_about_episode_num[^>]+>.+?</h4>\s*<h4>\s*Season (\d+)',
+            webpage, 'season number', default=None))
+
         return {
             'id': video_id,
             'title': video_title,
@@ -446,9 +477,10 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
             'thumbnail': xpath_text(metadata, 'episode_image_url'),
             'uploader': video_uploader,
             'upload_date': video_upload_date,
-            'series': xpath_text(metadata, 'series_title'),
-            'episode': xpath_text(metadata, 'episode_title'),
-            'episode_number': int_or_none(xpath_text(metadata, 'episode_number')),
+            'series': series,
+            'season_number': season_number,
+            'episode': episode,
+            'episode_number': episode_number,
             'subtitles': subtitles,
             'formats': formats,
         }

View File

@@ -12,6 +12,7 @@ from ..utils import (
     ExtractorError,
 )
 from .senateisvp import SenateISVPIE
+from .ustream import UstreamIE


 class CSpanIE(InfoExtractor):
@@ -22,14 +23,13 @@ class CSpanIE(InfoExtractor):
         'md5': '94b29a4f131ff03d23471dd6f60b6a1d',
         'info_dict': {
             'id': '315139',
-            'ext': 'mp4',
             'title': 'Attorney General Eric Holder on Voting Rights Act Decision',
-            'description': 'Attorney General Eric Holder speaks to reporters following the Supreme Court decision in [Shelby County v. Holder], in which the court ruled that the preclearance provisions of the Voting Rights Act could not be enforced.',
         },
+        'playlist_mincount': 2,
         'skip': 'Regularly fails on travis, for unknown reasons',
     }, {
         'url': 'http://www.c-span.org/video/?c4486943/cspan-international-health-care-models',
-        'md5': '8e5fbfabe6ad0f89f3012a7943c1287b',
+        # md5 is unstable
         'info_dict': {
             'id': 'c4486943',
             'ext': 'mp4',
@@ -38,14 +38,11 @@ class CSpanIE(InfoExtractor):
         }
     }, {
         'url': 'http://www.c-span.org/video/?318608-1/gm-ignition-switch-recall',
-        'md5': '2ae5051559169baadba13fc35345ae74',
         'info_dict': {
             'id': '342759',
-            'ext': 'mp4',
             'title': 'General Motors Ignition Switch Recall',
-            'duration': 14848,
-            'description': 'md5:118081aedd24bf1d3b68b3803344e7f3'
         },
+        'playlist_mincount': 6,
     }, {
         # Video from senate.gov
         'url': 'http://www.c-span.org/video/?104517-1/immigration-reforms-needed-protect-skilled-american-workers',
@@ -57,12 +54,30 @@ class CSpanIE(InfoExtractor):
         'params': {
             'skip_download': True,  # m3u8 downloads
         }
+    }, {
+        # Ustream embedded video
+        'url': 'https://www.c-span.org/video/?114917-1/armed-services',
+        'info_dict': {
+            'id': '58428542',
+            'ext': 'flv',
+            'title': 'USHR07 Armed Services Committee',
+            'description': 'hsas00-2118-20150204-1000et-07\n\n\nUSHR07 Armed Services Committee',
+            'timestamp': 1423060374,
+            'upload_date': '20150204',
+            'uploader': 'HouseCommittee',
+            'uploader_id': '12987475',
+        },
     }]

     def _real_extract(self, url):
         video_id = self._match_id(url)
         video_type = None
         webpage = self._download_webpage(url, video_id)
+
+        ustream_url = UstreamIE._extract_url(webpage)
+        if ustream_url:
+            return self.url_result(ustream_url, UstreamIE.ie_key())
+
         # We first look for clipid, because clipprog always appears before
         patterns = [r'id=\'clip(%s)\'\s*value=\'([0-9]+)\'' % t for t in ('id', 'prog')]
         results = list(filter(None, (re.search(p, webpage) for p in patterns)))

View File

@@ -0,0 +1,115 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
unified_strdate,
compat_str,
determine_ext,
)
class DisneyIE(InfoExtractor):
_VALID_URL = r'''(?x)
https?://(?P<domain>(?:[^/]+\.)?(?:disney\.[a-z]{2,3}(?:\.[a-z]{2})?|disney(?:(?:me|latino)\.com|turkiye\.com\.tr)|starwars\.com))/(?:embed/|(?:[^/]+/)+[\w-]+-)(?P<id>[a-z0-9]{24})'''
_TESTS = [{
'url': 'http://video.disney.com/watch/moana-trailer-545ed1857afee5a0ec239977',
'info_dict': {
'id': '545ed1857afee5a0ec239977',
'ext': 'mp4',
'title': 'Moana - Trailer',
'description': 'A fun adventure for the entire Family! Bring home Moana on Digital HD Feb 21 & Blu-ray March 7',
'upload_date': '20170112',
},
'params': {
# m3u8 download
'skip_download': True,
}
}, {
'url': 'http://videos.disneylatino.com/ver/spider-man-de-regreso-a-casa-primer-adelanto-543a33a1850bdcfcca13bae2',
'only_matching': True,
}, {
'url': 'http://video.en.disneyme.com/watch/future-worm/robo-carp-2001-544b66002aa7353cdd3f5114',
'only_matching': True,
}, {
'url': 'http://video.disneyturkiye.com.tr/izle/7c-7-cuceler/kimin-sesi-zaten-5456f3d015f6b36c8afdd0e2',
'only_matching': True,
}, {
'url': 'http://disneyjunior.disney.com/embed/546a4798ddba3d1612e4005d',
'only_matching': True,
}, {
'url': 'http://www.starwars.com/embed/54690d1e6c42e5f09a0fb097',
'only_matching': True,
}]
def _real_extract(self, url):
domain, video_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(
'http://%s/embed/%s' % (domain, video_id), video_id)
video_data = self._parse_json(self._search_regex(
r'Disney\.EmbedVideo=({.+});', webpage, 'embed data'), video_id)['video']
for external in video_data.get('externals', []):
if external.get('source') == 'vevo':
return self.url_result('vevo:' + external['data_id'], 'Vevo')
title = video_data['title']
formats = []
for flavor in video_data.get('flavors', []):
flavor_format = flavor.get('format')
flavor_url = flavor.get('url')
if not flavor_url or not re.match(r'https?://', flavor_url):
continue
tbr = int_or_none(flavor.get('bitrate'))
if tbr == 99999:
formats.extend(self._extract_m3u8_formats(
flavor_url, video_id, 'mp4', m3u8_id=flavor_format, fatal=False))
continue
format_id = []
if flavor_format:
format_id.append(flavor_format)
if tbr:
format_id.append(compat_str(tbr))
ext = determine_ext(flavor_url)
if flavor_format == 'applehttp' or ext == 'm3u8':
ext = 'mp4'
width = int_or_none(flavor.get('width'))
height = int_or_none(flavor.get('height'))
formats.append({
'format_id': '-'.join(format_id),
'url': flavor_url,
'width': width,
'height': height,
'tbr': tbr,
'ext': ext,
'vcodec': 'none' if (width == 0 and height == 0) else None,
})
self._sort_formats(formats)
subtitles = {}
for caption in video_data.get('captions', []):
caption_url = caption.get('url')
caption_format = caption.get('format')
if not caption_url or caption_format.startswith('unknown'):
continue
subtitles.setdefault(caption.get('language', 'en'), []).append({
'url': caption_url,
'ext': {
'webvtt': 'vtt',
}.get(caption_format, caption_format),
})
return {
'id': video_id,
'title': title,
'description': video_data.get('description') or video_data.get('short_desc'),
'thumbnail': video_data.get('thumb') or video_data.get('thumb_secure'),
'duration': int_or_none(video_data.get('duration_sec')),
'upload_date': unified_strdate(video_data.get('publish_date')),
'formats': formats,
'subtitles': subtitles,
}
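In the new Disney extractor above, a flavor with bitrate 99999 is treated as the HLS master playlist and everything else becomes a progressive format whose id joins the format name and bitrate. A minimal sketch of that branching (the `'hls-master'` label is illustrative, not a youtube-dl format id):

```python
def classify_flavor(flavor):
    # Bitrate 99999 is the sentinel the embed data uses for the HLS
    # master playlist; other flavors get a "<format>-<bitrate>" id,
    # dropping whichever part is missing.
    tbr = flavor.get('bitrate')
    if tbr == 99999:
        return 'hls-master'
    format_id = []
    if flavor.get('format'):
        format_id.append(flavor['format'])
    if tbr:
        format_id.append(str(tbr))
    return '-'.join(format_id)
```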

View File

@@ -66,7 +66,7 @@ class DramaFeverBaseIE(AMPIE):

 class DramaFeverIE(DramaFeverBaseIE):
     IE_NAME = 'dramafever'
-    _VALID_URL = r'https?://(?:www\.)?dramafever\.com/drama/(?P<id>[0-9]+/[0-9]+)(?:/|$)'
+    _VALID_URL = r'https?://(?:www\.)?dramafever\.com/(?:[^/]+/)?drama/(?P<id>[0-9]+/[0-9]+)(?:/|$)'
     _TESTS = [{
         'url': 'http://www.dramafever.com/drama/4512/1/Cooking_with_Shin/',
         'info_dict': {
@@ -103,6 +103,9 @@ class DramaFeverIE(DramaFeverBaseIE):
             # m3u8 download
             'skip_download': True,
         },
+    }, {
+        'url': 'https://www.dramafever.com/zh-cn/drama/4972/15/Doctor_Romantic/',
+        'only_matching': True,
     }]

     def _real_extract(self, url):
@@ -148,7 +151,7 @@ class DramaFeverIE(DramaFeverBaseIE):

 class DramaFeverSeriesIE(DramaFeverBaseIE):
     IE_NAME = 'dramafever:series'
-    _VALID_URL = r'https?://(?:www\.)?dramafever\.com/drama/(?P<id>[0-9]+)(?:/(?:(?!\d+(?:/|$)).+)?)?$'
+    _VALID_URL = r'https?://(?:www\.)?dramafever\.com/(?:[^/]+/)?drama/(?P<id>[0-9]+)(?:/(?:(?!\d+(?:/|$)).+)?)?$'
     _TESTS = [{
         'url': 'http://www.dramafever.com/drama/4512/Cooking_with_Shin/',
         'info_dict': {

View File

@@ -30,7 +30,10 @@ from .aenetworks import (
     AENetworksIE,
     HistoryTopicIE,
 )
-from .afreecatv import AfreecaTVIE
+from .afreecatv import (
+    AfreecaTVIE,
+    AfreecaTVGlobalIE,
+)
 from .airmozilla import AirMozillaIE
 from .aljazeera import AlJazeeraIE
 from .alphaporno import AlphaPornoIE
@@ -77,6 +80,10 @@ from .awaan import (
     AWAANLiveIE,
     AWAANSeasonIE,
 )
+from .azmedien import (
+    AZMedienIE,
+    AZMedienShowIE,
+)
 from .azubu import AzubuIE, AzubuLiveIE
 from .baidu import BaiduVideoIE
 from .bambuser import BambuserIE, BambuserChannelIE
@@ -88,6 +95,7 @@ from .bbc import (
     BBCCoUkPlaylistIE,
     BBCIE,
 )
+from .beampro import BeamProLiveIE
 from .beeg import BeegIE
 from .behindkink import BehindKinkIE
 from .bellmedia import BellMediaIE
@@ -243,6 +251,7 @@ from .dumpert import DumpertIE
 from .defense import DefenseGouvFrIE
 from .discovery import DiscoveryIE
 from .discoverygo import DiscoveryGoIE
+from .disney import DisneyIE
 from .dispeak import DigitallySpeakingIE
 from .dropbox import DropboxIE
 from .dw import (
@@ -593,6 +602,7 @@ from .nextmedia import (
     NextMediaIE,
     NextMediaActionNewsIE,
     AppleDailyIE,
+    NextTVIE,
 )
 from .nfb import NFBIE
 from .nfl import NFLIE
@@ -719,6 +729,7 @@ from .polskieradio import (
 )
 from .porn91 import Porn91IE
 from .porncom import PornComIE
+from .pornflip import PornFlipIE
 from .pornhd import PornHdIE
 from .pornhub import (
     PornHubIE,
@@ -975,6 +986,7 @@ from .tv2 import (
 )
 from .tv3 import TV3IE
 from .tv4 import TV4IE
+from .tva import TVAIE
 from .tvanouvelles import (
     TVANouvellesIE,
     TVANouvellesArticleIE,

View File

@@ -86,18 +86,43 @@ class FirstTVIE(InfoExtractor):
         title = item['title']
         quality = qualities(QUALITIES)
         formats = []
+        path = None
         for f in item.get('mbr', []):
             src = f.get('src')
             if not src or not isinstance(src, compat_str):
                 continue
             tbr = int_or_none(self._search_regex(
                 r'_(\d{3,})\.mp4', src, 'tbr', default=None))
+            if not path:
+                path = self._search_regex(
+                    r'//[^/]+/(.+?)_\d+\.mp4', src,
+                    'm3u8 path', default=None)
             formats.append({
                 'url': src,
                 'format_id': f.get('name'),
                 'tbr': tbr,
-                'quality': quality(f.get('name')),
+                'source_preference': quality(f.get('name')),
             })
+        # m3u8 URL format is reverse engineered from [1] (search for
+        # master.m3u8). dashEdges (that is currently balancer-vod.1tv.ru)
+        # is taken from [2].
+        # 1. http://static.1tv.ru/player/eump1tv-current/eump-1tv.all.min.js?rnd=9097422834:formatted
+        # 2. http://static.1tv.ru/player/eump1tv-config/config-main.js?rnd=9097422834
+        if not path and len(formats) == 1:
+            path = self._search_regex(
+                r'//[^/]+/(.+?$)', formats[0]['url'],
+                'm3u8 path', default=None)
+        if path:
+            if len(formats) == 1:
+                m3u8_path = ','
+            else:
+                tbrs = [compat_str(t) for t in sorted(f['tbr'] for f in formats)]
+                m3u8_path = '_,%s,%s' % (','.join(tbrs), '.mp4')
+            formats.extend(self._extract_m3u8_formats(
+                'http://balancer-vod.1tv.ru/%s%s.urlset/master.m3u8'
+                % (path, m3u8_path),
+                display_id, 'mp4',
+                entry_protocol='m3u8_native', m3u8_id='hls', fatal=False))
         self._sort_formats(formats)

         thumbnail = item.get('poster') or self._og_search_thumbnail(webpage)
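The comment in the hunk above explains that the 1tv.ru HLS master URL is reverse engineered: the path shared by the progressive MP4 URLs plus the sorted bitrate list form a `.urlset` path. A standalone sketch of that assembly (`cdn.example` is a made-up host for illustration):

```python
import re

def build_master_m3u8(mp4_urls):
    # Collect per-file bitrates and the shared path, as the FirstTVIE
    # hunk does, then fold them into the balancer-vod .urlset URL.
    path = None
    tbrs = []
    for src in mp4_urls:
        m = re.search(r'_(\d{3,})\.mp4', src)
        if m:
            tbrs.append(int(m.group(1)))
        if path is None:
            pm = re.search(r'//[^/]+/(.+?)_\d+\.mp4', src)
            if pm:
                path = pm.group(1)
    if path is None:
        return None
    if len(mp4_urls) == 1:
        m3u8_path = ','
    else:
        m3u8_path = '_,%s,%s' % (','.join(str(t) for t in sorted(tbrs)), '.mp4')
    return 'http://balancer-vod.1tv.ru/%s%s.urlset/master.m3u8' % (path, m3u8_path)
```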

View File

@@ -81,7 +81,7 @@ class FlipagramIE(InfoExtractor):
             'filesize': int_or_none(cover.get('size')),
         } for cover in flipagram.get('covers', []) if cover.get('url')]

-        # Note that this only retrieves comments that are initally loaded.
+        # Note that this only retrieves comments that are initially loaded.
         # For videos with large amounts of comments, most won't be retrieved.
         comments = []
         for comment in video_data.get('comments', {}).get(video_id, {}).get('items', []):

View File

@@ -78,6 +78,9 @@ from .vbox7 import Vbox7IE
 from .dbtv import DBTVIE
 from .piksel import PikselIE
 from .videa import VideaIE
+from .twentymin import TwentyMinutenIE
+from .ustream import UstreamIE
+from .openload import OpenloadIE


 class GenericIE(InfoExtractor):
@@ -422,6 +425,26 @@ class GenericIE(InfoExtractor):
                 'skip_download': True,  # m3u8 download
             },
         },
+        {
+            # Brightcove with alternative playerID key
+            'url': 'http://www.nature.com/nmeth/journal/v9/n7/fig_tab/nmeth.2062_SV1.html',
+            'info_dict': {
+                'id': 'nmeth.2062_SV1',
+                'title': 'Simultaneous multiview imaging of the Drosophila syncytial blastoderm : Quantitative high-speed imaging of entire developing embryos with simultaneous multiview light-sheet microscopy : Nature Methods : Nature Research',
+            },
+            'playlist': [{
+                'info_dict': {
+                    'id': '2228375078001',
+                    'ext': 'mp4',
+                    'title': 'nmeth.2062-sv1',
+                    'description': 'nmeth.2062-sv1',
+                    'timestamp': 1363357591,
+                    'upload_date': '20130315',
+                    'uploader': 'Nature Publishing Group',
+                    'uploader_id': '1964492299001',
+                },
+            }],
+        },
         # ooyala video
         {
             'url': 'http://www.rollingstone.com/music/videos/norwegian-dj-cashmere-cat-goes-spartan-on-with-me-premiere-20131219',
@@ -567,17 +590,6 @@ class GenericIE(InfoExtractor):
                 'description': 'md5:8145d19d320ff3e52f28401f4c4283b9',
             }
         },
-        # Embedded Ustream video
-        {
-            'url': 'http://www.american.edu/spa/pti/nsa-privacy-janus-2014.cfm',
-            'md5': '27b99cdb639c9b12a79bca876a073417',
-            'info_dict': {
-                'id': '45734260',
-                'ext': 'flv',
-                'uploader': 'AU SPA: The NSA and Privacy',
-                'title': 'NSA and Privacy Forum Debate featuring General Hayden and Barton Gellman'
-            }
-        },
         # nowvideo embed hidden behind percent encoding
         {
             'url': 'http://www.waoanime.tv/the-super-dimension-fortress-macross-episode-1/',
@@ -1448,6 +1460,20 @@ class GenericIE(InfoExtractor):
             },
             'playlist_mincount': 2,
         },
+        {
+            # 20 minuten embed
+            'url': 'http://www.20min.ch/schweiz/news/story/So-kommen-Sie-bei-Eis-und-Schnee-sicher-an-27032552',
+            'info_dict': {
+                'id': '523629',
+                'ext': 'mp4',
+                'title': 'So kommen Sie bei Eis und Schnee sicher an',
+                'description': 'md5:117c212f64b25e3d95747e5276863f7d',
+            },
+            'params': {
+                'skip_download': True,
+            },
+            'add_ie': [TwentyMinutenIE.ie_key()],
+        },
         # {
         #     # TODO: find another test
         #     # http://schema.org/VideoObject
@@ -1939,7 +1965,14 @@ class GenericIE(InfoExtractor):
             re.search(r'SBN\.VideoLinkset\.ooyala\([\'"](?P<ec>.{32})[\'"]\)', webpage) or
             re.search(r'data-ooyala-video-id\s*=\s*[\'"](?P<ec>.{32})[\'"]', webpage))
         if mobj is not None:
-            return OoyalaIE._build_url_result(smuggle_url(mobj.group('ec'), {'domain': url}))
+            embed_token = self._search_regex(
+                r'embedToken[\'"]?\s*:\s*[\'"]([^\'"]+)',
+                webpage, 'ooyala embed token', default=None)
+            return OoyalaIE._build_url_result(smuggle_url(
+                mobj.group('ec'), {
+                    'domain': url,
+                    'embed_token': embed_token,
+                }))

         # Look for multiple Ooyala embeds on SBN network websites
         mobj = re.search(r'SBN\.VideoLinkset\.entryGroup\((\[.*?\])', webpage)
@@ -2070,10 +2103,9 @@ class GenericIE(InfoExtractor):
             return self.url_result(mobj.group('url'), 'TED')

         # Look for embedded Ustream videos
-        mobj = re.search(
-            r'<iframe[^>]+?src=(["\'])(?P<url>http://www\.ustream\.tv/embed/.+?)\1', webpage)
-        if mobj is not None:
-            return self.url_result(mobj.group('url'), 'Ustream')
+        ustream_url = UstreamIE._extract_url(webpage)
+        if ustream_url:
+            return self.url_result(ustream_url, UstreamIE.ie_key())

         # Look for embedded arte.tv player
         mobj = re.search(
@@ -2394,6 +2426,18 @@ class GenericIE(InfoExtractor):
         if videa_urls:
             return _playlist_from_matches(videa_urls, ie=VideaIE.ie_key())

+        # Look for 20 minuten embeds
+        twentymin_urls = TwentyMinutenIE._extract_urls(webpage)
+        if twentymin_urls:
+            return _playlist_from_matches(
+                twentymin_urls, ie=TwentyMinutenIE.ie_key())
+
+        # Look for Openload embeds
+        openload_urls = OpenloadIE._extract_urls(webpage)
+        if openload_urls:
+            return _playlist_from_matches(
+                openload_urls, ie=OpenloadIE.ie_key())
+
         # Looking for http://schema.org/VideoObject
         json_ld = self._search_json_ld(
             webpage, video_id, default={}, expected_type='VideoObject')

View File

@@ -13,7 +13,7 @@ from ..utils import (
 class ImdbIE(InfoExtractor):
     IE_NAME = 'imdb'
     IE_DESC = 'Internet Movie Database trailers'
-    _VALID_URL = r'https?://(?:www|m)\.imdb\.com/(?:video/[^/]+/|title/tt\d+.*?#lb-)vi(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www|m)\.imdb\.com/(?:video/[^/]+/|title/tt\d+.*?#lb-|videoplayer/)vi(?P<id>\d+)'

     _TESTS = [{
         'url': 'http://www.imdb.com/video/imdb/vi2524815897',
@@ -32,6 +32,9 @@ class ImdbIE(InfoExtractor):
     }, {
         'url': 'http://www.imdb.com/title/tt1667889/#lb-vi2524815897',
         'only_matching': True,
+    }, {
+        'url': 'http://www.imdb.com/videoplayer/vi1562949145',
+        'only_matching': True,
     }]

     def _real_extract(self, url):

View File

@@ -5,9 +5,27 @@ import re

 from ..compat import compat_urlparse
 from .common import InfoExtractor
+from ..utils import parse_duration


-class JamendoIE(InfoExtractor):
+class JamendoBaseIE(InfoExtractor):
+    def _extract_meta(self, webpage, fatal=True):
+        title = self._og_search_title(
+            webpage, default=None) or self._search_regex(
+            r'<title>([^<]+)', webpage,
+            'title', default=None)
+        if title:
+            title = self._search_regex(
+                r'(.+?)\s*\|\s*Jamendo Music', title, 'title', default=None)
+        if not title:
+            title = self._html_search_meta(
+                'name', webpage, 'title', fatal=fatal)
+        mobj = re.search(r'(.+) - (.+)', title or '')
+        artist, second = mobj.groups() if mobj else [None] * 2
+        return title, artist, second
+
+
+class JamendoIE(JamendoBaseIE):
     _VALID_URL = r'https?://(?:www\.)?jamendo\.com/track/(?P<id>[0-9]+)/(?P<display_id>[^/?#&]+)'
     _TEST = {
         'url': 'https://www.jamendo.com/track/196219/stories-from-emona-i',
@@ -16,7 +34,10 @@ class JamendoIE(InfoExtractor):
             'id': '196219',
             'display_id': 'stories-from-emona-i',
             'ext': 'flac',
-            'title': 'Stories from Emona I',
+            'title': 'Maya Filipič - Stories from Emona I',
+            'artist': 'Maya Filipič',
+            'track': 'Stories from Emona I',
+            'duration': 210,
             'thumbnail': r're:^https?://.*\.jpg'
         }
     }
@@ -28,7 +49,7 @@ class JamendoIE(InfoExtractor):

         webpage = self._download_webpage(url, display_id)

-        title = self._html_search_meta('name', webpage, 'title')
+        title, artist, track = self._extract_meta(webpage)

         formats = [{
             'url': 'https://%s.jamendo.com/?trackid=%s&format=%s&from=app-97dab294'
@@ -46,37 +67,47 @@ class JamendoIE(InfoExtractor):
         thumbnail = self._html_search_meta(
             'image', webpage, 'thumbnail', fatal=False)
+        duration = parse_duration(self._search_regex(
+            r'<span[^>]+itemprop=["\']duration["\'][^>]+content=["\'](.+?)["\']',
+            webpage, 'duration', fatal=False))

         return {
             'id': track_id,
             'display_id': display_id,
             'thumbnail': thumbnail,
             'title': title,
+            'duration': duration,
+            'artist': artist,
+            'track': track,
             'formats': formats
         }


-class JamendoAlbumIE(InfoExtractor):
+class JamendoAlbumIE(JamendoBaseIE):
     _VALID_URL = r'https?://(?:www\.)?jamendo\.com/album/(?P<id>[0-9]+)/(?P<display_id>[\w-]+)'
     _TEST = {
         'url': 'https://www.jamendo.com/album/121486/duck-on-cover',
         'info_dict': {
             'id': '121486',
-            'title': 'Duck On Cover'
+            'title': 'Shearer - Duck On Cover'
         },
         'playlist': [{
             'md5': 'e1a2fcb42bda30dfac990212924149a8',
             'info_dict': {
                 'id': '1032333',
                 'ext': 'flac',
-                'title': 'Warmachine'
+                'title': 'Shearer - Warmachine',
+                'artist': 'Shearer',
+                'track': 'Warmachine',
             }
         }, {
             'md5': '1f358d7b2f98edfe90fd55dac0799d50',
             'info_dict': {
                 'id': '1032330',
                 'ext': 'flac',
-                'title': 'Without Your Ghost'
+                'title': 'Shearer - Without Your Ghost',
+                'artist': 'Shearer',
+                'track': 'Without Your Ghost',
             }
         }],
         'params': {
@@ -90,18 +121,18 @@ class JamendoAlbumIE(InfoExtractor):

         webpage = self._download_webpage(url, mobj.group('display_id'))

-        title = self._html_search_meta('name', webpage, 'title')
+        title, artist, album = self._extract_meta(webpage, fatal=False)

-        entries = [
-            self.url_result(
-                compat_urlparse.urljoin(url, m.group('path')),
-                ie=JamendoIE.ie_key(),
-                video_id=self._search_regex(
-                    r'/track/(\d+)', m.group('path'),
-                    'track id', default=None))
-            for m in re.finditer(
-                r'<a[^>]+href=(["\'])(?P<path>(?:(?!\1).)+)\1[^>]+class=["\'][^>]*js-trackrow-albumpage-link',
-                webpage)
-        ]
+        entries = [{
+            '_type': 'url_transparent',
+            'url': compat_urlparse.urljoin(url, m.group('path')),
+            'ie_key': JamendoIE.ie_key(),
+            'id': self._search_regex(
+                r'/track/(\d+)', m.group('path'), 'track id', default=None),
+            'artist': artist,
+            'album': album,
+        } for m in re.finditer(
+            r'<a[^>]+href=(["\'])(?P<path>(?:(?!\1).)+)\1[^>]+class=["\'][^>]*js-trackrow-albumpage-link',
+            webpage)]

         return self.playlist_result(entries, album_id, title)
@@ -2,29 +2,31 @@
 from __future__ import unicode_literals

 from .common import InfoExtractor
+from ..compat import compat_str
 from ..utils import (
+    determine_ext,
     float_or_none,
     int_or_none,
 )


 class KonserthusetPlayIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?konserthusetplay\.se/\?.*\bm=(?P<id>[^&]+)'
-    _TEST = {
+    _VALID_URL = r'https?://(?:www\.)?(?:konserthusetplay|rspoplay)\.se/\?.*\bm=(?P<id>[^&]+)'
+    _TESTS = [{
         'url': 'http://www.konserthusetplay.se/?m=CKDDnlCY-dhWAAqiMERd-A',
+        'md5': 'e3fd47bf44e864bd23c08e487abe1967',
         'info_dict': {
             'id': 'CKDDnlCY-dhWAAqiMERd-A',
-            'ext': 'flv',
+            'ext': 'mp4',
             'title': 'Orkesterns instrument: Valthornen',
             'description': 'md5:f10e1f0030202020396a4d712d2fa827',
             'thumbnail': 're:^https?://.*$',
-            'duration': 398.8,
+            'duration': 398.76,
         },
-        'params': {
-            # rtmp download
-            'skip_download': True,
-        },
-    }
+    }, {
+        'url': 'http://rspoplay.se/?m=elWuEH34SMKvaO4wO_cHBw',
+        'only_matching': True,
+    }]

     def _real_extract(self, url):
         video_id = self._match_id(url)
@@ -42,12 +44,18 @@ class KonserthusetPlayIE(InfoExtractor):
         player_config = media['playerconfig']
         playlist = player_config['playlist']
-        source = next(f for f in playlist if f.get('bitrates'))
+        source = next(f for f in playlist if f.get('bitrates') or f.get('provider'))

         FORMAT_ID_REGEX = r'_([^_]+)_h264m\.mp4'

         formats = []

+        m3u8_url = source.get('url')
+        if m3u8_url and determine_ext(m3u8_url) == 'm3u8':
+            formats.extend(self._extract_m3u8_formats(
+                m3u8_url, video_id, 'mp4', entry_protocol='m3u8_native',
+                m3u8_id='hls', fatal=False))
+
         fallback_url = source.get('fallbackUrl')
         fallback_format_id = None
         if fallback_url:
@@ -97,6 +105,13 @@ class KonserthusetPlayIE(InfoExtractor):
         thumbnail = media.get('image')
         duration = float_or_none(media.get('duration'), 1000)

+        subtitles = {}
+        captions = source.get('captionsAvailableLanguages')
+        if isinstance(captions, dict):
+            for lang, subtitle_url in captions.items():
+                if lang != 'none' and isinstance(subtitle_url, compat_str):
+                    subtitles.setdefault(lang, []).append({'url': subtitle_url})
+
         return {
             'id': video_id,
             'title': title,
@@ -104,4 +119,5 @@ class KonserthusetPlayIE(InfoExtractor):
             'thumbnail': thumbnail,
             'duration': duration,
             'formats': formats,
+            'subtitles': subtitles,
         }
@@ -59,14 +59,26 @@ class LimelightBaseIE(InfoExtractor):
                     format_id = 'rtmp'
                     if stream.get('videoBitRate'):
                         format_id += '-%d' % int_or_none(stream['videoBitRate'])
-                    http_url = 'http://cpl.delvenetworks.com/' + rtmp.group('playpath')[4:]
-                    urls.append(http_url)
-                    http_fmt = fmt.copy()
-                    http_fmt.update({
-                        'url': http_url,
-                        'format_id': format_id.replace('rtmp', 'http'),
-                    })
-                    formats.append(http_fmt)
+                    http_format_id = format_id.replace('rtmp', 'http')
+
+                    CDN_HOSTS = (
+                        ('delvenetworks.com', 'cpl.delvenetworks.com'),
+                        ('video.llnw.net', 's2.content.video.llnw.net'),
+                    )
+                    for cdn_host, http_host in CDN_HOSTS:
+                        if cdn_host not in rtmp.group('host').lower():
+                            continue
+                        http_url = 'http://%s/%s' % (http_host, rtmp.group('playpath')[4:])
+                        urls.append(http_url)
+                        if self._is_valid_url(http_url, video_id, http_format_id):
+                            http_fmt = fmt.copy()
+                            http_fmt.update({
+                                'url': http_url,
+                                'format_id': http_format_id,
+                            })
+                            formats.append(http_fmt)
+                        break
+
                     fmt.update({
                         'url': rtmp.group('url'),
                         'play_path': rtmp.group('playpath'),
@@ -190,7 +190,7 @@ class MiTeleIE(InfoExtractor):
         return {
             '_type': 'url_transparent',
             # for some reason only HLS is supported
-            'url': smuggle_url('ooyala:' + embedCode, {'supportedformats': 'm3u8'}),
+            'url': smuggle_url('ooyala:' + embedCode, {'supportedformats': 'm3u8,dash'}),
             'id': video_id,
             'title': title,
             'description': description,
@@ -16,7 +16,6 @@ from ..utils import (
     clean_html,
     ExtractorError,
     OnDemandPagedList,
-    parse_count,
     str_to_int,
 )

@@ -36,7 +35,6 @@ class MixcloudIE(InfoExtractor):
             'uploader_id': 'dholbach',
             'thumbnail': r're:https?://.*\.jpg',
             'view_count': int,
-            'like_count': int,
         },
     }, {
         'url': 'http://www.mixcloud.com/gillespeterson/caribou-7-inch-vinyl-mix-chat/',
@@ -49,7 +47,6 @@ class MixcloudIE(InfoExtractor):
             'uploader_id': 'gillespeterson',
             'thumbnail': 're:https?://.*',
             'view_count': int,
-            'like_count': int,
         },
     }, {
         'url': 'https://beta.mixcloud.com/RedLightRadio/nosedrip-15-red-light-radio-01-18-2016/',
@@ -89,26 +86,18 @@ class MixcloudIE(InfoExtractor):
         song_url = play_info['stream_url']

-        PREFIX = (
-            r'm-play-on-spacebar[^>]+'
-            r'(?:\s+[a-zA-Z0-9-]+(?:="[^"]+")?)*?\s+')
-        title = self._html_search_regex(
-            PREFIX + r'm-title="([^"]+)"', webpage, 'title')
+        title = self._html_search_regex(r'm-title="([^"]+)"', webpage, 'title')
         thumbnail = self._proto_relative_url(self._html_search_regex(
-            PREFIX + r'm-thumbnail-url="([^"]+)"', webpage, 'thumbnail',
-            fatal=False))
+            r'm-thumbnail-url="([^"]+)"', webpage, 'thumbnail', fatal=False))
         uploader = self._html_search_regex(
-            PREFIX + r'm-owner-name="([^"]+)"',
-            webpage, 'uploader', fatal=False)
+            r'm-owner-name="([^"]+)"', webpage, 'uploader', fatal=False)
         uploader_id = self._search_regex(
             r'\s+"profile": "([^"]+)",', webpage, 'uploader id', fatal=False)
         description = self._og_search_description(webpage)
-        like_count = parse_count(self._search_regex(
-            r'\bbutton-favorite[^>]+>.*?<span[^>]+class=["\']toggle-number[^>]+>\s*([^<]+)',
-            webpage, 'like count', default=None))
         view_count = str_to_int(self._search_regex(
             [r'<meta itemprop="interactionCount" content="UserPlays:([0-9]+)"',
-             r'/listeners/?">([0-9,.]+)</a>'],
+             r'/listeners/?">([0-9,.]+)</a>',
+             r'm-tooltip=["\']([\d,.]+) plays'],
             webpage, 'play count', default=None))

         return {
@@ -120,7 +109,6 @@ class MixcloudIE(InfoExtractor):
             'uploader': uploader,
             'uploader_id': uploader_id,
             'view_count': view_count,
-            'like_count': like_count,
         }
@@ -13,11 +13,11 @@ from ..utils import (
     fix_xml_ampersands,
     float_or_none,
     HEADRequest,
-    NO_DEFAULT,
     RegexNotFoundError,
     sanitized_Request,
     strip_or_none,
     timeconvert,
+    try_get,
     unescapeHTML,
     update_url_query,
     url_basename,
@@ -42,15 +42,6 @@ class MTVServicesInfoExtractor(InfoExtractor):
         # Remove the templates, like &device={device}
         return re.sub(r'&[^=]*?={.*?}(?=(&|$))', '', url)

-    # This was originally implemented for ComedyCentral, but it also works here
-    @classmethod
-    def _transform_rtmp_url(cls, rtmp_video_url):
-        m = re.match(r'^rtmpe?://.*?/(?P<finalid>gsp\..+?/.*)$', rtmp_video_url)
-        if not m:
-            return {'rtmp': rtmp_video_url}
-        base = 'http://viacommtvstrmfs.fplive.net/'
-        return {'http': base + m.group('finalid')}
-
     def _get_feed_url(self, uri):
         return self._FEED_URL

@@ -91,22 +82,28 @@ class MTVServicesInfoExtractor(InfoExtractor):
             if rendition.get('method') == 'hls':
                 hls_url = rendition.find('./src').text
                 formats.extend(self._extract_m3u8_formats(
-                    hls_url, video_id, ext='mp4', entry_protocol='m3u8_native'))
+                    hls_url, video_id, ext='mp4', entry_protocol='m3u8_native',
+                    m3u8_id='hls'))
             else:
                 # fms
                 try:
                     _, _, ext = rendition.attrib['type'].partition('/')
                     rtmp_video_url = rendition.find('./src').text
+                    if 'error_not_available.swf' in rtmp_video_url:
+                        raise ExtractorError(
+                            '%s said: video is not available' % self.IE_NAME,
+                            expected=True)
                     if rtmp_video_url.endswith('siteunavail.png'):
                         continue
-                    new_urls = self._transform_rtmp_url(rtmp_video_url)
                     formats.extend([{
-                        'ext': 'flv' if new_url.startswith('rtmp') else ext,
-                        'url': new_url,
-                        'format_id': '-'.join(filter(None, [kind, rendition.get('bitrate')])),
+                        'ext': 'flv' if rtmp_video_url.startswith('rtmp') else ext,
+                        'url': rtmp_video_url,
+                        'format_id': '-'.join(filter(None, [
+                            'rtmp' if rtmp_video_url.startswith('rtmp') else None,
+                            rendition.get('bitrate')])),
                         'width': int(rendition.get('width')),
                         'height': int(rendition.get('height')),
-                    } for kind, new_url in new_urls.items()])
+                    }])
                 except (KeyError, TypeError):
                     raise ExtractorError('Invalid rendition field.')
         self._sort_formats(formats)
@@ -212,7 +209,28 @@ class MTVServicesInfoExtractor(InfoExtractor):
             [self._get_video_info(item, use_hls) for item in idoc.findall('.//item')],
             playlist_title=title, playlist_description=description)

-    def _extract_mgid(self, webpage, default=NO_DEFAULT):
+    def _extract_triforce_mgid(self, webpage, data_zone=None, video_id=None):
+        triforce_feed = self._parse_json(self._search_regex(
+            r'triforceManifestFeed\s*=\s*({.+?})\s*;\s*\n', webpage,
+            'triforce feed', default='{}'), video_id, fatal=False)
+
+        data_zone = self._search_regex(
+            r'data-zone=(["\'])(?P<zone>.+?_lc_promo.*?)\1', webpage,
+            'data zone', default=data_zone, group='zone')
+
+        feed_url = try_get(
+            triforce_feed, lambda x: x['manifest']['zones'][data_zone]['feed'],
+            compat_str)
+        if not feed_url:
+            return
+
+        feed = self._download_json(feed_url, video_id, fatal=False)
+        if not feed:
+            return
+
+        return try_get(feed, lambda x: x['result']['data']['id'], compat_str)
+
+    def _extract_mgid(self, webpage):
         try:
             # the url can be http://media.mtvnservices.com/fb/{mgid}.swf
             # or http://media.mtvnservices.com/{mgid}
@@ -232,7 +250,11 @@ class MTVServicesInfoExtractor(InfoExtractor):
             sm4_embed = self._html_search_meta(
                 'sm4:video:embed', webpage, 'sm4 embed', default='')
             mgid = self._search_regex(
-                r'embed/(mgid:.+?)["\'&?/]', sm4_embed, 'mgid', default=default)
+                r'embed/(mgid:.+?)["\'&?/]', sm4_embed, 'mgid', default=None)
+
+        if not mgid:
+            mgid = self._extract_triforce_mgid(webpage)
+
         return mgid

     def _real_extract(self, url):
@@ -282,7 +304,7 @@ class MTVServicesEmbeddedIE(MTVServicesInfoExtractor):

 class MTVIE(MTVServicesInfoExtractor):
     IE_NAME = 'mtv'
-    _VALID_URL = r'https?://(?:www\.)?mtv\.com/(?:video-clips|full-episodes)/(?P<id>[^/?#.]+)'
+    _VALID_URL = r'https?://(?:www\.)?mtv\.com/(?:video-clips|(?:full-)?episodes)/(?P<id>[^/?#.]+)'
     _FEED_URL = 'http://www.mtv.com/feeds/mrss/'

     _TESTS = [{
@@ -299,6 +321,9 @@ class MTVIE(MTVServicesInfoExtractor):
     }, {
         'url': 'http://www.mtv.com/full-episodes/94tujl/unlocking-the-truth-gates-of-hell-season-1-ep-101',
         'only_matching': True,
+    }, {
+        'url': 'http://www.mtv.com/episodes/g8xu7q/teen-mom-2-breaking-the-wall-season-7-ep-713',
+        'only_matching': True,
     }]
@@ -12,10 +12,10 @@ from ..utils import (

 class NaverIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:m\.)?tvcast\.naver\.com/v/(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:m\.)?tv(?:cast)?\.naver\.com/v/(?P<id>\d+)'

     _TESTS = [{
-        'url': 'http://tvcast.naver.com/v/81652',
+        'url': 'http://tv.naver.com/v/81652',
         'info_dict': {
             'id': '81652',
             'ext': 'mp4',
@@ -24,7 +24,7 @@ class NaverIE(InfoExtractor):
             'upload_date': '20130903',
         },
     }, {
-        'url': 'http://tvcast.naver.com/v/395837',
+        'url': 'http://tv.naver.com/v/395837',
         'md5': '638ed4c12012c458fefcddfd01f173cd',
         'info_dict': {
             'id': '395837',
@@ -34,6 +34,9 @@ class NaverIE(InfoExtractor):
             'upload_date': '20150519',
         },
         'skip': 'Georestricted',
+    }, {
+        'url': 'http://tvcast.naver.com/v/81652',
+        'only_matching': True,
     }]

     def _real_extract(self, url):
@@ -2,7 +2,15 @@
 from __future__ import unicode_literals

 from .common import InfoExtractor
-from ..utils import parse_iso8601
+from ..compat import compat_urlparse
+from ..utils import (
+    clean_html,
+    get_element_by_class,
+    int_or_none,
+    parse_iso8601,
+    remove_start,
+    unified_timestamp,
+)


 class NextMediaIE(InfoExtractor):
@@ -30,6 +38,12 @@ class NextMediaIE(InfoExtractor):
         return self._extract_from_nextmedia_page(news_id, url, page)

     def _extract_from_nextmedia_page(self, news_id, url, page):
+        redirection_url = self._search_regex(
+            r'window\.location\.href\s*=\s*([\'"])(?P<url>(?!\1).+)\1',
+            page, 'redirection URL', default=None, group='url')
+        if redirection_url:
+            return self.url_result(compat_urlparse.urljoin(url, redirection_url))
+
         title = self._fetch_title(page)
         video_url = self._search_regex(self._URL_PATTERN, page, 'video url')

@@ -93,7 +107,7 @@ class NextMediaActionNewsIE(NextMediaIE):

 class AppleDailyIE(NextMediaIE):
     IE_DESC = '臺灣蘋果日報'
-    _VALID_URL = r'https?://(www|ent)\.appledaily\.com\.tw/(?:animation|appledaily|enews|realtimenews|actionnews)/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)(/.*)?'
+    _VALID_URL = r'https?://(www|ent)\.appledaily\.com\.tw/[^/]+/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)(/.*)?'
     _TESTS = [{
         'url': 'http://ent.appledaily.com.tw/enews/article/entertainment/20150128/36354694',
         'md5': 'a843ab23d150977cc55ef94f1e2c1e4d',
@@ -157,6 +171,10 @@ class AppleDailyIE(NextMediaIE):
     }, {
         'url': 'http://www.appledaily.com.tw/actionnews/appledaily/7/20161003/960588/',
         'only_matching': True,
+    }, {
+        # Redirected from http://ent.appledaily.com.tw/enews/article/entertainment/20150128/36354694
+        'url': 'http://ent.appledaily.com.tw/section/article/headline/20150128/36354694',
+        'only_matching': True,
     }]

     _URL_PATTERN = r'\{url: \'(.+)\'\}'
@@ -173,3 +191,48 @@ class AppleDailyIE(NextMediaIE):

     def _fetch_description(self, page):
         return self._html_search_meta('description', page, 'news description')
+
+
+class NextTVIE(InfoExtractor):
+    IE_DESC = '壹電視'
+    _VALID_URL = r'https?://(?:www\.)?nexttv\.com\.tw/(?:[^/]+/)+(?P<id>\d+)'
+
+    _TEST = {
+        'url': 'http://www.nexttv.com.tw/news/realtime/politics/11779671',
+        'info_dict': {
+            'id': '11779671',
+            'ext': 'mp4',
+            'title': '「超收稅」近4千億 藍議員籲發消費券',
+            'thumbnail': r're:^https?://.*\.jpg$',
+            'timestamp': 1484825400,
+            'upload_date': '20170119',
+            'view_count': int,
+        },
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        title = self._html_search_regex(
+            r'<h1[^>]*>([^<]+)</h1>', webpage, 'title')
+
+        data = self._hidden_inputs(webpage)
+
+        video_url = data['ntt-vod-src-detailview']
+
+        date_str = get_element_by_class('date', webpage)
+        timestamp = unified_timestamp(date_str + '+0800') if date_str else None
+
+        view_count = int_or_none(remove_start(
+            clean_html(get_element_by_class('click', webpage)), '點閱:'))
+
+        return {
+            'id': video_id,
+            'title': title,
+            'url': video_url,
+            'thumbnail': data.get('ntt-vod-img-src'),
+            'timestamp': timestamp,
+            'view_count': view_count,
+        }
@@ -7,7 +7,6 @@ import datetime

 from .common import InfoExtractor
 from ..compat import (
-    compat_urllib_parse_urlencode,
     compat_urlparse,
 )
 from ..utils import (
@@ -40,6 +39,7 @@ class NiconicoIE(InfoExtractor):
             'description': '(c) copyright 2008, Blender Foundation / www.bigbuckbunny.org',
             'duration': 33,
         },
+        'skip': 'Requires an account',
     }, {
         # File downloaded with and without credentials are different, so omit
         # the md5 field
@@ -55,6 +55,7 @@ class NiconicoIE(InfoExtractor):
             'timestamp': 1304065916,
             'duration': 209,
         },
+        'skip': 'Requires an account',
     }, {
         # 'video exists but is marked as "deleted"
         # md5 is unstable
@@ -65,9 +66,10 @@ class NiconicoIE(InfoExtractor):
             'description': 'deleted',
             'title': 'ドラえもんエターナル第3話「決戦第3新東京市」前編',
             'upload_date': '20071224',
-            'timestamp': 1198527840,  # timestamp field has different value if logged in
+            'timestamp': int,  # timestamp field has different value if logged in
             'duration': 304,
         },
+        'skip': 'Requires an account',
     }, {
         'url': 'http://www.nicovideo.jp/watch/so22543406',
         'info_dict': {
@@ -79,13 +81,12 @@ class NiconicoIE(InfoExtractor):
             'upload_date': '20140104',
             'uploader': 'アニメロチャンネル',
             'uploader_id': '312',
-        }
+        },
+        'skip': 'The viewing period of the video you were searching for has expired.',
     }]

     _VALID_URL = r'https?://(?:www\.|secure\.)?nicovideo\.jp/watch/(?P<id>(?:[a-z]{2})?[0-9]+)'
     _NETRC_MACHINE = 'niconico'
-    # Determine whether the downloader used authentication to download video
-    _AUTHENTICATED = False

     def _real_initialize(self):
         self._login()
@@ -109,8 +110,6 @@ class NiconicoIE(InfoExtractor):
         if re.search(r'(?i)<h1 class="mb8p4">Log in error</h1>', login_results) is not None:
             self._downloader.report_warning('unable to log in: bad username or password')
             return False
-        # Successful login
-        self._AUTHENTICATED = True
         return True

     def _real_extract(self, url):
@@ -128,35 +127,19 @@ class NiconicoIE(InfoExtractor):
             'http://ext.nicovideo.jp/api/getthumbinfo/' + video_id, video_id,
             note='Downloading video info page')

-        if self._AUTHENTICATED:
-            # Get flv info
-            flv_info_webpage = self._download_webpage(
-                'http://flapi.nicovideo.jp/api/getflv/' + video_id + '?as3=1',
-                video_id, 'Downloading flv info')
-        else:
-            # Get external player info
-            ext_player_info = self._download_webpage(
-                'http://ext.nicovideo.jp/thumb_watch/' + video_id, video_id)
-            thumb_play_key = self._search_regex(
-                r'\'thumbPlayKey\'\s*:\s*\'(.*?)\'', ext_player_info, 'thumbPlayKey')
-
-            # Get flv info
-            flv_info_data = compat_urllib_parse_urlencode({
-                'k': thumb_play_key,
-                'v': video_id
-            })
-            flv_info_request = sanitized_Request(
-                'http://ext.nicovideo.jp/thumb_watch', flv_info_data,
-                {'Content-Type': 'application/x-www-form-urlencoded'})
-            flv_info_webpage = self._download_webpage(
-                flv_info_request, video_id,
-                note='Downloading flv info', errnote='Unable to download flv info')
+        # Get flv info
+        flv_info_webpage = self._download_webpage(
+            'http://flapi.nicovideo.jp/api/getflv/' + video_id + '?as3=1',
+            video_id, 'Downloading flv info')

         flv_info = compat_urlparse.parse_qs(flv_info_webpage)
         if 'url' not in flv_info:
             if 'deleted' in flv_info:
                 raise ExtractorError('The video has been deleted.',
                                      expected=True)
+            elif 'closed' in flv_info:
+                raise ExtractorError('Niconico videos now require logging in',
+                                     expected=True)
             else:
                 raise ExtractorError('Unable to find video URL')
@@ -18,7 +18,7 @@ class OoyalaBaseIE(InfoExtractor):
     _CONTENT_TREE_BASE = _PLAYER_BASE + 'player_api/v1/content_tree/'
     _AUTHORIZATION_URL_TEMPLATE = _PLAYER_BASE + 'sas/player_api/v2/authorization/embed_code/%s/%s?'

-    def _extract(self, content_tree_url, video_id, domain='example.org', supportedformats=None):
+    def _extract(self, content_tree_url, video_id, domain='example.org', supportedformats=None, embed_token=None):
         content_tree = self._download_json(content_tree_url, video_id)['content_tree']
         metadata = content_tree[list(content_tree)[0]]
         embed_code = metadata['embed_code']
@@ -29,7 +29,8 @@ class OoyalaBaseIE(InfoExtractor):
             self._AUTHORIZATION_URL_TEMPLATE % (pcode, embed_code) +
             compat_urllib_parse_urlencode({
                 'domain': domain,
-                'supportedFormats': supportedformats or 'mp4,rtmp,m3u8,hds',
+                'supportedFormats': supportedformats or 'mp4,rtmp,m3u8,hds,dash,smooth',
+                'embedToken': embed_token,
             }), video_id)

         cur_auth_data = auth_data['authorization_data'][embed_code]
@@ -52,6 +53,12 @@ class OoyalaBaseIE(InfoExtractor):
                 elif delivery_type == 'hds' or ext == 'f4m':
                     formats.extend(self._extract_f4m_formats(
                         s_url + '?hdcore=3.7.0', embed_code, f4m_id='hds', fatal=False))
+                elif delivery_type == 'dash' or ext == 'mpd':
+                    formats.extend(self._extract_mpd_formats(
+                        s_url, embed_code, mpd_id='dash', fatal=False))
+                elif delivery_type == 'smooth':
+                    self._extract_ism_formats(
+                        s_url, embed_code, ism_id='mss', fatal=False)
                 elif ext == 'smil':
                     formats.extend(self._extract_smil_formats(
                         s_url, embed_code, fatal=False))
@@ -146,8 +153,9 @@ class OoyalaIE(OoyalaBaseIE):
         embed_code = self._match_id(url)
         domain = smuggled_data.get('domain')
         supportedformats = smuggled_data.get('supportedformats')
+        embed_token = smuggled_data.get('embed_token')
         content_tree_url = self._CONTENT_TREE_BASE + 'embed_code/%s/%s' % (embed_code, embed_code)
-        return self._extract(content_tree_url, embed_code, domain, supportedformats)
+        return self._extract(content_tree_url, embed_code, domain, supportedformats, embed_token)


 class OoyalaExternalIE(OoyalaBaseIE):
@@ -1,6 +1,8 @@
 # coding: utf-8
 from __future__ import unicode_literals

+import re
+
 from .common import InfoExtractor
 from ..compat import compat_chr
 from ..utils import (
@@ -56,6 +58,12 @@ class OpenloadIE(InfoExtractor):
         'only_matching': True,
     }]

+    @staticmethod
+    def _extract_urls(webpage):
+        return re.findall(
+            r'<iframe[^>]+src=["\']((?:https?://)?(?:openload\.(?:co|io)|oload\.tv)/embed/[a-zA-Z0-9-_]+)',
+            webpage)
+
     def _real_extract(self, url):
         video_id = self._match_id(url)
         webpage = self._download_webpage('https://openload.co/embed/%s/' % video_id, video_id)
@@ -64,16 +72,17 @@ class OpenloadIE(InfoExtractor):
             raise ExtractorError('File not found', expected=True)

         ol_id = self._search_regex(
-            '<span[^>]+id="[a-zA-Z0-9]+x"[^>]*>([0-9]+)</span>',
+            '<span[^>]+id="[^"]+"[^>]*>([0-9]+)</span>',
             webpage, 'openload ID')

-        first_two_chars = int(float(ol_id[0:][:2]))
+        first_three_chars = int(float(ol_id[0:][:3]))
+        fifth_char = int(float(ol_id[3:5]))
         urlcode = ''
-        num = 2
+        num = 5

         while num < len(ol_id):
-            urlcode += compat_chr(int(float(ol_id[num:][:3])) -
-                                  first_two_chars * int(float(ol_id[num + 3:][:2])))
+            urlcode += compat_chr(int(float(ol_id[num:][:3])) +
+                                  first_three_chars - fifth_char * int(float(ol_id[num + 3:][:2])))
             num += 5

         video_url = 'https://openload.co/stream/' + urlcode
@@ -92,7 +101,7 @@ class OpenloadIE(InfoExtractor):
             'thumbnail': self._og_search_thumbnail(webpage, default=None),
             'url': video_url,
             # Seems all videos have extensions in their titles
-            'ext': determine_ext(title),
+            'ext': determine_ext(title, 'mp4'),
             'subtitles': subtitles,
         }
         return info_dict
@@ -157,13 +157,10 @@ class PluralsightIE(PluralsightBaseIE):

         display_id = '%s-%s' % (name, clip_id)

-        parsed_url = compat_urlparse.urlparse(url)
-
-        payload_url = compat_urlparse.urlunparse(parsed_url._replace(
-            netloc='app.pluralsight.com', path='player/api/v1/payload'))
-
         course = self._download_json(
-            payload_url, display_id, headers={'Referer': url})['payload']['course']
+            'https://app.pluralsight.com/player/user/api/v1/player/payload',
+            display_id, data=urlencode_postdata({'courseId': course_name}),
+            headers={'Referer': url})

         collection = course['modules']
@ -0,0 +1,92 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_str,
)
from ..utils import (
int_or_none,
try_get,
unified_timestamp,
)
class PornFlipIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?pornflip\.com/(?:v|embed)/(?P<id>[0-9A-Za-z]{11})'
_TESTS = [{
'url': 'https://www.pornflip.com/v/wz7DfNhMmep',
'md5': '98c46639849145ae1fd77af532a9278c',
'info_dict': {
'id': 'wz7DfNhMmep',
'ext': 'mp4',
'title': '2 Amateurs swallow make his dream cumshots true',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 112,
'timestamp': 1481655502,
'upload_date': '20161213',
'uploader_id': '106786',
'uploader': 'figifoto',
'view_count': int,
'age_limit': 18,
}
}, {
'url': 'https://www.pornflip.com/embed/wz7DfNhMmep',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(
'https://www.pornflip.com/v/%s' % video_id, video_id)
flashvars = compat_parse_qs(self._search_regex(
r'<embed[^>]+flashvars=(["\'])(?P<flashvars>(?:(?!\1).)+)\1',
webpage, 'flashvars', group='flashvars'))
title = flashvars['video_vars[title]'][0]
def flashvar(kind):
return try_get(
flashvars, lambda x: x['video_vars[%s]' % kind][0], compat_str)
formats = []
for key, value in flashvars.items():
if not (value and isinstance(value, list)):
continue
format_url = value[0]
if key == 'video_vars[hds_manifest]':
formats.extend(self._extract_mpd_formats(
format_url, video_id, mpd_id='dash', fatal=False))
continue
height = self._search_regex(
r'video_vars\[video_urls\]\[(\d+)', key, 'height', default=None)
if not height:
continue
formats.append({
'url': format_url,
'format_id': 'http-%s' % height,
'height': int_or_none(height),
})
self._sort_formats(formats)
uploader = self._html_search_regex(
(r'<span[^>]+class="name"[^>]*>\s*<a[^>]+>\s*<strong>(?P<uploader>[^<]+)',
r'<meta[^>]+content=(["\'])[^>]*\buploaded by (?P<uploader>.+?)\1'),
webpage, 'uploader', fatal=False, group='uploader')
return {
'id': video_id,
'formats': formats,
'title': title,
'thumbnail': flashvar('big_thumb'),
'duration': int_or_none(flashvar('duration')),
'timestamp': unified_timestamp(self._html_search_meta(
'uploadDate', webpage, 'timestamp')),
'uploader_id': flashvar('author_id'),
'uploader': uploader,
'view_count': int_or_none(flashvar('views')),
'age_limit': 18,
}
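The PornFlip extractor above relies on `compat_parse_qs` (the standard library's `parse_qs` under Python 2/3) turning the `<embed>` tag's flashvars attribute into a dict of lists, which is why every lookup indexes `[0]`. A minimal sketch with a made-up flashvars string:

```python
try:
    from urllib.parse import parse_qs  # Python 3
except ImportError:
    from urlparse import parse_qs  # Python 2

# Illustrative flashvars, not real site data: every value parses to a list.
flashvars = parse_qs(
    'video_vars[title]=Example&video_vars[duration]=112'
    '&video_vars[video_urls][0]=http://example.com/360.mp4')

title = flashvars['video_vars[title]'][0]
assert title == 'Example'
assert flashvars['video_vars[duration]'] == ['112']
```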


@@ -173,46 +173,54 @@ class SoundcloudIE(InfoExtractor):
            })
        # We have to retrieve the url
        format_dict = self._download_json(
            'http://api.soundcloud.com/i1/tracks/%s/streams' % track_id,
            track_id, 'Downloading track url', query={
                'client_id': self._CLIENT_ID,
                'secret_token': secret_token,
            })

        for key, stream_url in format_dict.items():
            abr = int_or_none(self._search_regex(
                r'_(\d+)_url', key, 'audio bitrate', default=None))
            if key.startswith('http'):
                stream_formats = [{
                    'format_id': key,
                    'ext': ext,
                    'url': stream_url,
                }]
            elif key.startswith('rtmp'):
                # The url doesn't have an rtmp app, we have to extract the playpath
                url, path = stream_url.split('mp3:', 1)
                stream_formats = [{
                    'format_id': key,
                    'url': url,
                    'play_path': 'mp3:' + path,
                    'ext': 'flv',
                }]
            elif key.startswith('hls'):
                stream_formats = self._extract_m3u8_formats(
                    stream_url, track_id, 'mp3', entry_protocol='m3u8_native',
                    m3u8_id=key, fatal=False)
            else:
                continue

            for f in stream_formats:
                f['abr'] = abr

            formats.extend(stream_formats)

        if not formats:
            # We fallback to the stream_url in the original info, this
            # cannot be always used, sometimes it can give an HTTP 404 error
            formats.append({
                'format_id': 'fallback',
                'url': info['stream_url'] + '?client_id=' + self._CLIENT_ID,
                'ext': ext,
            })

        for f in formats:
            f['vcodec'] = 'none'

        self._check_formats(formats, track_id)
        self._sort_formats(formats)
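The audio bitrate in the hunk above is pulled out of the stream key itself. A minimal standalone sketch of that regex, using made-up stream keys in the same shape:

```python
import re

# Stream keys look like 'http_mp3_128_url' (illustrative samples); the
# audio bitrate is the run of digits immediately before '_url'.
def audio_bitrate(key):
    m = re.search(r'_(\d+)_url', key)
    return int(m.group(1)) if m else None

assert audio_bitrate('http_mp3_128_url') == 128
assert audio_bitrate('hls_opus_64_url') == 64
assert audio_bitrate('preview_url') is None  # no bitrate encoded
```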


@@ -6,7 +6,7 @@ from .mtv import MTVServicesInfoExtractor
class SouthParkIE(MTVServicesInfoExtractor):
    IE_NAME = 'southpark.cc.com'
    _VALID_URL = r'https?://(?:www\.)?(?P<url>southpark\.cc\.com/(?:clips|(?:full-)?episodes)/(?P<id>.+?)(\?|#|$))'

    _FEED_URL = 'http://www.southparkstudios.com/feeds/video-player/mrss'

@@ -75,7 +75,7 @@ class SouthParkDeIE(SouthParkIE):
class SouthParkNlIE(SouthParkIE):
    IE_NAME = 'southpark.nl'
    _VALID_URL = r'https?://(?:www\.)?(?P<url>southpark\.nl/(?:clips|(?:full-)?episodes)/(?P<id>.+?)(\?|#|$))'
    _FEED_URL = 'http://www.southpark.nl/feeds/video-player/mrss/'
    _TESTS = [{


@@ -46,7 +46,7 @@ class SpikeIE(MTVServicesInfoExtractor):
    _CUSTOM_URL_REGEX = re.compile(r'spikenetworkapp://([^/]+/[-a-fA-F0-9]+)')

    def _extract_mgid(self, webpage):
        mgid = super(SpikeIE, self)._extract_mgid(webpage)
        if mgid is None:
            url_parts = self._search_regex(self._CUSTOM_URL_REGEX, webpage, 'episode_id')
            video_type, episode_id = url_parts.split('/', 1)


@@ -48,9 +48,6 @@ class SRGSSRIE(InfoExtractor):
    def _real_extract(self, url):
        bu, media_type, media_id = re.match(self._VALID_URL, url).groups()

        media_data = self.get_media_data(bu, media_type, media_id)
        metadata = media_data['AssetMetadatas']['AssetMetadata'][0]


@@ -0,0 +1,54 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_iso8601,
smuggle_url,
)
class TVAIE(InfoExtractor):
_VALID_URL = r'https?://videos\.tva\.ca/episode/(?P<id>\d+)'
_TEST = {
'url': 'http://videos.tva.ca/episode/85538',
'info_dict': {
'id': '85538',
'ext': 'mp4',
'title': 'Épisode du 25 janvier 2017',
'description': 'md5:e9e7fb5532ab37984d2dc87229cadf98',
'upload_date': '20170126',
'timestamp': 1485442329,
},
'params': {
# m3u8 download
'skip_download': True,
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
video_data = self._download_json(
"https://d18jmrhziuoi7p.cloudfront.net/isl/api/v1/dataservice/Items('%s')" % video_id,
video_id, query={
'$expand': 'Metadata,CustomId',
'$select': 'Metadata,Id,Title,ShortDescription,LongDescription,CreatedDate,CustomId,AverageUserRating,Categories,ShowName',
'$format': 'json',
})
metadata = video_data.get('Metadata', {})
return {
'_type': 'url_transparent',
'id': video_id,
'title': video_data['Title'],
'url': smuggle_url('ooyala:' + video_data['CustomId'], {'supportedformats': 'm3u8,hds'}),
'description': video_data.get('LongDescription') or video_data.get('ShortDescription'),
'series': video_data.get('ShowName'),
'episode': metadata.get('EpisodeTitle'),
'episode_number': int_or_none(metadata.get('EpisodeNumber')),
'categories': video_data.get('Categories'),
'average_rating': video_data.get('AverageUserRating'),
'timestamp': parse_iso8601(video_data.get('CreatedDate')),
'ie_key': 'Ooyala',
}
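The TVA extractor above defers to the Ooyala extractor and smuggles the supported-formats hint along inside the URL. A simplified, hedged sketch of that smuggle/unsmuggle idea (this is an illustration of the concept, not youtube-dl's actual helper):

```python
import json

# Extra data for the downstream extractor rides along in a URL fragment.
def smuggle(url, data):
    return url + '#__smuggle=' + json.dumps(data)

def unsmuggle(url):
    base, _, frag = url.partition('#__smuggle=')
    return base, (json.loads(frag) if frag else None)

url = smuggle('ooyala:SOMECUSTOMID', {'supportedformats': 'm3u8,hds'})
base, data = unsmuggle(url)
assert base == 'ooyala:SOMECUSTOMID'
assert data == {'supportedformats': 'm3u8,hds'}
```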


@@ -12,7 +12,7 @@ from ..utils import (
class TwentyFourVideoIE(InfoExtractor):
    IE_NAME = '24video'
    _VALID_URL = r'https?://(?:www\.)?24video\.(?:net|me|xxx|sex)/(?:video/(?:view|xml)/|player/new24_play\.swf\?id=)(?P<id>\d+)'

    _TESTS = [{
        'url': 'http://www.24video.net/video/view/1044982',

@@ -43,7 +43,7 @@ class TwentyFourVideoIE(InfoExtractor):
        video_id = self._match_id(url)

        webpage = self._download_webpage(
            'http://www.24video.sex/video/view/%s' % video_id, video_id)

        title = self._og_search_title(webpage)
        description = self._html_search_regex(

@@ -69,11 +69,11 @@ class TwentyFourVideoIE(InfoExtractor):
        # Sets some cookies
        self._download_xml(
            r'http://www.24video.sex/video/xml/%s?mode=init' % video_id,
            video_id, 'Downloading init XML')

        video_xml = self._download_xml(
            'http://www.24video.sex/video/xml/%s?mode=play' % video_id,
            video_id, 'Downloading video XML')

        video = xpath_element(video_xml, './/video', 'video', fatal=True)


@@ -4,91 +4,88 @@ from __future__ import unicode_literals
import re

from .common import InfoExtractor
from ..utils import (
    int_or_none,
    try_get,
)


class TwentyMinutenIE(InfoExtractor):
    IE_NAME = '20min'
    _VALID_URL = r'''(?x)
                    https?://
                        (?:www\.)?20min\.ch/
                        (?:
                            videotv/*\?.*?\bvid=|
                            videoplayer/videoplayer\.html\?.*?\bvideoId@
                        )
                        (?P<id>\d+)
                    '''
    _TESTS = [{
        'url': 'http://www.20min.ch/videotv/?vid=469148&cid=2',
        'md5': 'e7264320db31eed8c38364150c12496e',
        'info_dict': {
            'id': '469148',
            'ext': 'mp4',
            'title': '85 000 Franken für 15 perfekte Minuten',
            'thumbnail': r're:https?://.*\.jpg$',
        },
    }, {
        'url': 'http://www.20min.ch/videoplayer/videoplayer.html?params=client@twentyDE|videoId@523629',
        'info_dict': {
            'id': '523629',
            'ext': 'mp4',
            'title': 'So kommen Sie bei Eis und Schnee sicher an',
            'description': 'md5:117c212f64b25e3d95747e5276863f7d',
            'thumbnail': r're:https?://.*\.jpg$',
        },
        'params': {
            'skip_download': True,
        },
    }, {
        'url': 'http://www.20min.ch/videotv/?cid=44&vid=468738',
        'only_matching': True,
    }]

    @staticmethod
    def _extract_urls(webpage):
        return [m.group('url') for m in re.finditer(
            r'<iframe[^>]+src=(["\'])(?P<url>(?:https?://)?(?:www\.)?20min\.ch/videoplayer/videoplayer.html\?.*?\bvideoId@\d+.*?)\1',
            webpage)]

    def _real_extract(self, url):
        video_id = self._match_id(url)

        video = self._download_json(
            'http://api.20min.ch/video/%s/show' % video_id,
            video_id)['content']

        title = video['title']

        formats = [{
            'format_id': format_id,
            'url': 'http://podcast.20min-tv.ch/podcast/20min/%s%s.mp4' % (video_id, p),
            'quality': quality,
        } for quality, (format_id, p) in enumerate([('sd', ''), ('hd', 'h')])]
        self._sort_formats(formats)

        description = video.get('lead')
        thumbnail = video.get('thumbnail')

        def extract_count(kind):
            return try_get(
                video,
                lambda x: int_or_none(x['communityobject']['thumbs_%s' % kind]))

        like_count = extract_count('up')
        dislike_count = extract_count('down')

        return {
            'id': video_id,
            'title': title,
            'description': description,
            'thumbnail': thumbnail,
            'like_count': like_count,
            'dislike_count': dislike_count,
            'formats': formats,
        }
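The 20min formats list is built from two fixed podcast variants, with the `enumerate()` index doubling as the quality ranking (hd above sd). The same comprehension, run standalone with a placeholder video id:

```python
video_id = '469148'  # placeholder id for illustration

formats = [{
    'format_id': format_id,
    'url': 'http://podcast.20min-tv.ch/podcast/20min/%s%s.mp4' % (video_id, p),
    'quality': quality,
} for quality, (format_id, p) in enumerate([('sd', ''), ('hd', 'h')])]

# 'sd' gets the bare id, 'hd' gets the id with an 'h' suffix.
assert formats[0]['url'].endswith('/469148.mp4')
assert formats[1]['url'].endswith('/469148h.mp4')
assert formats[1]['quality'] > formats[0]['quality']
```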


@@ -209,7 +209,7 @@ class TwitchVodIE(TwitchItemBaseIE):
    _VALID_URL = r'''(?x)
                    https?://
                        (?:
                            (?:www\.)?twitch\.tv/(?:[^/]+/v|videos)/|
                            player\.twitch\.tv/\?.*?\bvideo=v
                        )
                        (?P<id>\d+)

@@ -259,6 +259,9 @@ class TwitchVodIE(TwitchItemBaseIE):
    }, {
        'url': 'http://player.twitch.tv/?t=5m10s&video=v6528877',
        'only_matching': True,
    }, {
        'url': 'https://www.twitch.tv/videos/6528877',
        'only_matching': True,
    }]

    def _real_extract(self, url):
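The widened `_VALID_URL` above accepts both the old `/<channel>/v/<id>` links and the new `/videos/<id>` links. A quick standalone check against sample URLs in those shapes:

```python
import re

pattern = re.compile(r'''(?x)
                    https?://
                        (?:
                            (?:www\.)?twitch\.tv/(?:[^/]+/v|videos)/|
                            player\.twitch\.tv/\?.*?\bvideo=v
                        )
                        (?P<id>\d+)
                    ''')

# New-style, old-style, and player URLs all resolve to the same VOD id.
assert pattern.match('https://www.twitch.tv/videos/6528877').group('id') == '6528877'
assert pattern.match('http://www.twitch.tv/riotgames/v/6528877').group('id') == '6528877'
assert pattern.match('http://player.twitch.tv/?t=5m10s&video=v6528877').group('id') == '6528877'
```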


@@ -84,12 +84,27 @@ class UOLIE(InfoExtractor):
    def _real_extract(self, url):
        video_id = self._match_id(url)

        media_id = None

        if video_id.isdigit():
            media_id = video_id

        if not media_id:
            embed_page = self._download_webpage(
                'https://jsuol.com.br/c/tv/uol/embed/?params=[embed,%s]' % video_id,
                video_id, 'Downloading embed page', fatal=False)
            if embed_page:
                media_id = self._search_regex(
                    (r'uol\.com\.br/(\d+)', r'mediaId=(\d+)'),
                    embed_page, 'media id', default=None)

        if not media_id:
            webpage = self._download_webpage(url, video_id)
            media_id = self._search_regex(r'mediaId=(\d+)', webpage, 'media id')

        video_data = self._download_json(
            'http://mais.uol.com.br/apiuol/v3/player/getMedia/%s.json' % media_id,
            media_id)['item']
        title = video_data['title']

        query = {

@@ -118,7 +133,7 @@ class UOLIE(InfoExtractor):
                tags.append(tag_description)

        return {
            'id': media_id,
            'title': title,
            'description': clean_html(video_data.get('desMedia')),
            'thumbnail': video_data.get('thumbnail'),
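The media-id resolution above tries progressively heavier sources: a numeric URL id is used directly, then the embed page, then the full webpage. A minimal sketch of that fallback order, with the network lookups stubbed out by a made-up mapping:

```python
# 'lookup' stands in for the embed-page/webpage regex searches; the real
# extractor downloads pages instead of consulting a dict.
def resolve_media_id(video_id, lookup):
    if video_id.isdigit():
        return video_id
    return lookup.get(video_id)

assert resolve_media_id('15000000', {}) == '15000000'
assert resolve_media_id('some-slug', {'some-slug': '15000000'}) == '15000000'
assert resolve_media_id('unknown-slug', {}) is None
```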


@@ -69,6 +69,13 @@ class UstreamIE(InfoExtractor):
        },
    }]

    @staticmethod
    def _extract_url(webpage):
        mobj = re.search(
            r'<iframe[^>]+?src=(["\'])(?P<url>http://www\.ustream\.tv/embed/.+?)\1', webpage)
        if mobj is not None:
            return mobj.group('url')

    def _get_stream_info(self, url, video_id, app_id_ver, extra_note=None):
        def num_to_hex(n):
            return hex(n)[2:]
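The new `_extract_url` helper above pulls the first Ustream iframe URL out of an embedding page; the backreference `\1` makes the regex tolerate either quote style. A standalone run against a made-up HTML snippet:

```python
import re

webpage = '<iframe width="640" src="http://www.ustream.tv/embed/12345?v=3"></iframe>'
mobj = re.search(
    r'<iframe[^>]+?src=(["\'])(?P<url>http://www\.ustream\.tv/embed/.+?)\1', webpage)

assert mobj.group('url') == 'http://www.ustream.tv/embed/12345?v=3'
```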


@@ -4,9 +4,9 @@ import re
from .common import InfoExtractor
from ..compat import (
    compat_str,
    compat_urlparse,
    compat_HTTPError,
)
from ..utils import (
    ExtractorError,

@@ -140,21 +140,6 @@ class VevoIE(VevoBaseIE):
        'url': 'http://www.vevo.com/watch/INS171400764',
        'only_matching': True,
    }]

    _VERSIONS = {
        0: 'youtube',  # only in AuthenticateVideo videoVersions
        1: 'level3',

@@ -163,41 +148,6 @@ class VevoIE(VevoBaseIE):
        4: 'amazon',
    }

    def _initialize_api(self, video_id):
        req = sanitized_Request(
            'http://www.vevo.com/auth', data=b'')

@@ -206,7 +156,7 @@ class VevoIE(VevoBaseIE):
            note='Retrieving oauth token',
            errnote='Unable to retrieve oauth token')

        if re.search(r'(?i)THIS PAGE IS CURRENTLY UNAVAILABLE IN YOUR REGION', webpage):
            self.raise_geo_restricted(
                '%s said: This page is currently unavailable in your region' % self.IE_NAME)

@@ -214,148 +164,91 @@ class VevoIE(VevoBaseIE):
        self._api_url_template = self.http_scheme() + '//apiv2.vevo.com/%s?token=' + auth_info['access_token']

    def _call_api(self, path, *args, **kwargs):
        try:
            data = self._download_json(self._api_url_template % path, *args, **kwargs)
        except ExtractorError as e:
            if isinstance(e.cause, compat_HTTPError):
                errors = self._parse_json(e.cause.read().decode(), None)['errors']
                error_message = ', '.join([error['message'] for error in errors])
                raise ExtractorError('%s said: %s' % (self.IE_NAME, error_message), expected=True)
            raise
        return data

    def _real_extract(self, url):
        video_id = self._match_id(url)

        self._initialize_api(video_id)

        video_info = self._call_api(
            'video/%s' % video_id, video_id, 'Downloading api video info',
            'Failed to download video info')
        video_versions = self._call_api(
            'video/%s/streams' % video_id, video_id,
            'Downloading video versions info',
            'Failed to download video versions info',
            fatal=False)

        # Some videos are only available via webpage (e.g.
        # https://github.com/rg3/youtube-dl/issues/9366)
        if not video_versions:
            webpage = self._download_webpage(url, video_id)
            video_versions = self._extract_json(webpage, video_id, 'streams')[video_id][0]

        uploader = None
        artist = None
        featured_artist = None
        artists = video_info.get('artists')
        for curr_artist in artists:
            if curr_artist.get('role') == 'Featured':
                featured_artist = curr_artist['name']
            else:
                artist = uploader = curr_artist['name']

        formats = []
        for video_version in video_versions:
            version = self._VERSIONS.get(video_version['version'])
            version_url = video_version.get('url')
            if not version_url:
                continue

            if '.ism' in version_url:
                continue
            elif '.mpd' in version_url:
                formats.extend(self._extract_mpd_formats(
                    version_url, video_id, mpd_id='dash-%s' % version,
                    note='Downloading %s MPD information' % version,
                    errnote='Failed to download %s MPD information' % version,
                    fatal=False))
            elif '.m3u8' in version_url:
                formats.extend(self._extract_m3u8_formats(
                    version_url, video_id, 'mp4', 'm3u8_native',
                    m3u8_id='hls-%s' % version,
                    note='Downloading %s m3u8 information' % version,
                    errnote='Failed to download %s m3u8 information' % version,
                    fatal=False))
            else:
                m = re.search(r'''(?xi)
                        _(?P<width>[0-9]+)x(?P<height>[0-9]+)
                        _(?P<vcodec>[a-z0-9]+)
                        _(?P<vbr>[0-9]+)
                        _(?P<acodec>[a-z0-9]+)
                        _(?P<abr>[0-9]+)
                        \.(?P<ext>[a-z0-9]+)''', version_url)
                if not m:
                    continue

                formats.append({
                    'url': version_url,
                    'format_id': 'http-%s-%s' % (version, video_version['quality']),
                    'vcodec': m.group('vcodec'),
                    'acodec': m.group('acodec'),
                    'vbr': int(m.group('vbr')),
                    'abr': int(m.group('abr')),
                    'ext': m.group('ext'),
                    'width': int(m.group('width')),
                    'height': int(m.group('height')),
                })
        self._sort_formats(formats)

        track = video_info['title']

@@ -376,17 +269,15 @@ class VevoIE(VevoBaseIE):
        else:
            age_limit = None

        return {
            'id': video_id,
            'title': title,
            'formats': formats,
            'thumbnail': video_info.get('imageUrl') or video_info.get('thumbnailUrl'),
            'timestamp': parse_iso8601(video_info.get('releaseDate')),
            'uploader': uploader,
            'duration': int_or_none(video_info.get('duration')),
            'view_count': int_or_none(video_info.get('views', {}).get('total')),
            'age_limit': age_limit,
            'track': track,
            'artist': uploader,
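The HTTP fallback branch in the Vevo hunk parses the stream properties straight out of the file name using a verbose, case-insensitive regex. The same pattern against a made-up URL in that shape:

```python
import re

url = 'http://example.com/video_1920x1080_h264_2500_aac_192.mp4'  # illustrative
m = re.search(r'''(?xi)
    _(?P<width>[0-9]+)x(?P<height>[0-9]+)
    _(?P<vcodec>[a-z0-9]+)
    _(?P<vbr>[0-9]+)
    _(?P<acodec>[a-z0-9]+)
    _(?P<abr>[0-9]+)
    \.(?P<ext>[a-z0-9]+)''', url)

assert m.group('width') == '1920'
assert m.group('vcodec') == 'h264'
assert int(m.group('abr')) == 192
assert m.group('ext') == 'mp4'
```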


@@ -338,7 +338,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
            'expected_warnings': ['Unable to download JSON metadata'],
        },
        {
            # redirects to ondemand extractor and should be passed through it
            # for successful extraction
            'url': 'https://vimeo.com/73445910',
            'info_dict': {

@@ -730,12 +730,12 @@ class VimeoChannelIE(VimeoBaseInfoExtractor):
        # Try extracting href first since not all videos are available via
        # short https://vimeo.com/id URL (e.g. https://vimeo.com/channels/tributes/6213729)
        clips = re.findall(
            r'id="clip_(\d+)"[^>]*>\s*<a[^>]+href="(/(?:[^/]+/)*\1)(?:[^>]+\btitle="([^"]+)")?', webpage)
        if clips:
            for video_id, video_url, video_title in clips:
                yield self.url_result(
                    compat_urlparse.urljoin(base_url, video_url),
                    VimeoIE.ie_key(), video_id=video_id, video_title=video_title)
        # More relaxed fallback
        else:
            for video_id in re.findall(r'id=["\']clip_(\d+)', webpage):

@@ -884,10 +884,14 @@ class VimeoReviewIE(VimeoBaseInfoExtractor):
    def _get_config_url(self, webpage_url, video_id, video_password_verified=False):
        webpage = self._download_webpage(webpage_url, video_id)
        config_url = self._html_search_regex(
            r'data-config-url=(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
            'config URL', default=None, group='url')
        if not config_url:
            data = self._parse_json(self._search_regex(
                r'window\s*=\s*_extend\(window,\s*({.+?})\);', webpage, 'data',
                default=NO_DEFAULT if video_password_verified else '{}'), video_id)
            config_url = data.get('vimeo_esi', {}).get('config', {}).get('configUrl')
        if config_url is None:
            self._verify_video_password(webpage_url, video_id, webpage)
            config_url = self._get_config_url(
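The VimeoChannel change above adds an optional third capture group for the title; `re.findall` still returns 3-tuples, with `''` for the missing group, which is why the loop can always unpack `(video_id, video_url, video_title)`. A standalone run against made-up HTML snippets:

```python
import re

pattern = r'id="clip_(\d+)"[^>]*>\s*<a[^>]+href="(/(?:[^/]+/)*\1)(?:[^>]+\btitle="([^"]+)")?'
page = ('<li id="clip_123" class="clip"><a href="/channels/demo/123" title="Demo clip">'
        '<li id="clip_456" class="clip"><a href="/456">')
clips = re.findall(pattern, page)

# The second clip has no title attribute, so its third field is ''.
assert clips == [('123', '/channels/demo/123', 'Demo clip'), ('456', '/456', '')]
```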


@ -16,7 +16,9 @@ class XiamiBaseIE(InfoExtractor):
return webpage return webpage
def _extract_track(self, track, track_id=None): def _extract_track(self, track, track_id=None):
title = track['title'] track_name = track.get('songName') or track.get('name') or track['subName']
artist = track.get('artist') or track.get('artist_name') or track.get('singers')
title = '%s - %s' % (artist, track_name) if artist else track_name
track_url = self._decrypt(track['location']) track_url = self._decrypt(track['location'])
subtitles = {} subtitles = {}
@ -31,9 +33,10 @@ class XiamiBaseIE(InfoExtractor):
'thumbnail': track.get('pic') or track.get('album_pic'), 'thumbnail': track.get('pic') or track.get('album_pic'),
'duration': int_or_none(track.get('length')), 'duration': int_or_none(track.get('length')),
'creator': track.get('artist', '').split(';')[0], 'creator': track.get('artist', '').split(';')[0],
'track': title, 'track': track_name,
'album': track.get('album_name'), 'track_number': int_or_none(track.get('track')),
'artist': track.get('artist'), 'album': track.get('album_name') or track.get('title'),
'artist': artist,
'subtitles': subtitles, 'subtitles': subtitles,
} }
@@ -68,14 +71,14 @@ class XiamiBaseIE(InfoExtractor):
 class XiamiSongIE(XiamiBaseIE):
     IE_NAME = 'xiami:song'
     IE_DESC = '虾米音乐'
-    _VALID_URL = r'https?://(?:www\.)?xiami\.com/song/(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://(?:www\.)?xiami\.com/song/(?P<id>[^/?#&]+)'
     _TESTS = [{
         'url': 'http://www.xiami.com/song/1775610518',
         'md5': '521dd6bea40fd5c9c69f913c232cb57e',
         'info_dict': {
             'id': '1775610518',
             'ext': 'mp3',
-            'title': 'Woman',
+            'title': 'HONNE - Woman',
             'thumbnail': r're:http://img\.xiami\.net/images/album/.*\.jpg',
             'duration': 265,
             'creator': 'HONNE',
@@ -95,7 +98,7 @@ class XiamiSongIE(XiamiBaseIE):
         'info_dict': {
             'id': '1775256504',
             'ext': 'mp3',
-            'title': '悟空',
+            'title': '戴荃 - 悟空',
             'thumbnail': r're:http://img\.xiami\.net/images/album/.*\.jpg',
             'duration': 200,
             'creator': '戴荃',
@@ -109,6 +112,26 @@ class XiamiSongIE(XiamiBaseIE):
             },
         },
         'skip': 'Georestricted',
+    }, {
+        'url': 'http://www.xiami.com/song/1775953850',
+        'info_dict': {
+            'id': '1775953850',
+            'ext': 'mp3',
+            'title': 'До Скону - Чума Пожирает Землю',
+            'thumbnail': r're:http://img\.xiami\.net/images/album/.*\.jpg',
+            'duration': 683,
+            'creator': 'До Скону',
+            'track': 'Чума Пожирает Землю',
+            'track_number': 7,
+            'album': 'Ад',
+            'artist': 'До Скону',
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        'url': 'http://www.xiami.com/song/xLHGwgd07a1',
+        'only_matching': True,
     }]

     def _real_extract(self, url):
@@ -124,7 +147,7 @@ class XiamiPlaylistBaseIE(XiamiBaseIE):
 class XiamiAlbumIE(XiamiPlaylistBaseIE):
     IE_NAME = 'xiami:album'
     IE_DESC = '虾米音乐 - 专辑'
-    _VALID_URL = r'https?://(?:www\.)?xiami\.com/album/(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://(?:www\.)?xiami\.com/album/(?P<id>[^/?#&]+)'
     _TYPE = '1'
     _TESTS = [{
         'url': 'http://www.xiami.com/album/2100300444',
@@ -136,28 +159,34 @@ class XiamiAlbumIE(XiamiPlaylistBaseIE):
     }, {
         'url': 'http://www.xiami.com/album/512288?spm=a1z1s.6843761.1110925389.6.hhE9p9',
         'only_matching': True,
+    }, {
+        'url': 'http://www.xiami.com/album/URVDji2a506',
+        'only_matching': True,
     }]


 class XiamiArtistIE(XiamiPlaylistBaseIE):
     IE_NAME = 'xiami:artist'
     IE_DESC = '虾米音乐 - 歌手'
-    _VALID_URL = r'https?://(?:www\.)?xiami\.com/artist/(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://(?:www\.)?xiami\.com/artist/(?P<id>[^/?#&]+)'
     _TYPE = '2'
-    _TEST = {
+    _TESTS = [{
         'url': 'http://www.xiami.com/artist/2132?spm=0.0.0.0.dKaScp',
         'info_dict': {
             'id': '2132',
         },
         'playlist_count': 20,
         'skip': 'Georestricted',
-    }
+    }, {
+        'url': 'http://www.xiami.com/artist/bC5Tk2K6eb99',
+        'only_matching': True,
+    }]


 class XiamiCollectionIE(XiamiPlaylistBaseIE):
     IE_NAME = 'xiami:collection'
     IE_DESC = '虾米音乐 - 精选集'
-    _VALID_URL = r'https?://(?:www\.)?xiami\.com/collect/(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://(?:www\.)?xiami\.com/collect/(?P<id>[^/?#&]+)'
     _TYPE = '3'
     _TEST = {
         'url': 'http://www.xiami.com/collect/156527391?spm=a1z1s.2943601.6856193.12.4jpBnr',
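The `_VALID_URL` change from `[0-9]+` to `[^/?#&]+` is what makes the new `only_matching` tests (e.g. `song/xLHGwgd07a1`) pass: Xiami ids are no longer purely numeric. A quick check of the old versus new pattern:

```python
import re

OLD = r'https?://(?:www\.)?xiami\.com/song/(?P<id>[0-9]+)'
NEW = r'https?://(?:www\.)?xiami\.com/song/(?P<id>[^/?#&]+)'

numeric = 'http://www.xiami.com/song/1775610518?spm=x'
slug = 'http://www.xiami.com/song/xLHGwgd07a1'

# Both patterns accept numeric ids and stop the id at the query string:
assert re.match(OLD, numeric).group('id') == '1775610518'
assert re.match(NEW, numeric).group('id') == '1775610518'
# Only the new pattern accepts alphanumeric slugs:
assert re.match(OLD, slug) is None
assert re.match(NEW, slug).group('id') == 'xLHGwgd07a1'
```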

@@ -2,44 +2,37 @@
 from __future__ import unicode_literals

 from .common import InfoExtractor
+from ..utils import urljoin


 class YourUploadIE(InfoExtractor):
-    _VALID_URL = r'''(?x)https?://(?:www\.)?
-        (?:yourupload\.com/watch|
-           embed\.yourupload\.com|
-           embed\.yucache\.net
-        )/(?P<id>[A-Za-z0-9]+)
-        '''
-    _TESTS = [
-        {
-            'url': 'http://yourupload.com/watch/14i14h',
-            'md5': '5e2c63385454c557f97c4c4131a393cd',
-            'info_dict': {
-                'id': '14i14h',
-                'ext': 'mp4',
-                'title': 'BigBuckBunny_320x180.mp4',
-                'thumbnail': r're:^https?://.*\.jpe?g',
-            }
-        },
-        {
-            'url': 'http://embed.yourupload.com/14i14h',
-            'only_matching': True,
-        },
-        {
-            'url': 'http://embed.yucache.net/14i14h?client_file_id=803349',
-            'only_matching': True,
-        },
-    ]
+    _VALID_URL = r'https?://(?:www\.)?(?:yourupload\.com/(?:watch|embed)|embed\.yourupload\.com)/(?P<id>[A-Za-z0-9]+)'
+    _TESTS = [{
+        'url': 'http://yourupload.com/watch/14i14h',
+        'md5': '5e2c63385454c557f97c4c4131a393cd',
+        'info_dict': {
+            'id': '14i14h',
+            'ext': 'mp4',
+            'title': 'BigBuckBunny_320x180.mp4',
+            'thumbnail': r're:^https?://.*\.jpe?g',
+        }
+    }, {
+        'url': 'http://www.yourupload.com/embed/14i14h',
+        'only_matching': True,
+    }, {
+        'url': 'http://embed.yourupload.com/14i14h',
+        'only_matching': True,
+    }]

     def _real_extract(self, url):
         video_id = self._match_id(url)

-        embed_url = 'http://embed.yucache.net/{0:}'.format(video_id)
+        embed_url = 'http://www.yourupload.com/embed/%s' % video_id
         webpage = self._download_webpage(embed_url, video_id)

         title = self._og_search_title(webpage)
-        video_url = self._og_search_video_url(webpage)
+        video_url = urljoin(embed_url, self._og_search_video_url(webpage))
         thumbnail = self._og_search_thumbnail(webpage, default=None)

         return {
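Wrapping `_og_search_video_url` in `urljoin` guards against the embed page exposing a relative `og:video` value. The project ships its own `urljoin` helper; the stdlib equivalent behaves the same way for this case (the media path below is illustrative, not an actual YourUpload URL):

```python
from urllib.parse import urljoin

embed_url = 'http://www.yourupload.com/embed/14i14h'

# A relative og:video value resolves against the embed URL:
assert urljoin(embed_url, '/vod/14i14h.mp4') == 'http://www.yourupload.com/vod/14i14h.mp4'
# Absolute URLs pass through unchanged, so already-correct pages keep working:
assert urljoin(embed_url, 'http://cdn.example.com/v.mp4') == 'http://cdn.example.com/v.mp4'
```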

@@ -40,6 +40,7 @@ from ..utils import (
     sanitized_Request,
     smuggle_url,
     str_to_int,
+    try_get,
     unescapeHTML,
     unified_strdate,
     unsmuggle_url,
@@ -383,6 +384,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             'description': 'test chars: "\'/\\ä↭𝕐\ntest URL: https://github.com/rg3/youtube-dl/issues/1892\n\nThis is a test video for youtube-dl.\n\nFor more information, contact phihag@phihag.de .',
             'categories': ['Science & Technology'],
             'tags': ['youtube-dl'],
+            'duration': 10,
             'like_count': int,
             'dislike_count': int,
             'start_time': 1,
@@ -402,6 +404,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             'tags': ['Icona Pop i love it', 'sweden', 'pop music', 'big beat records', 'big beat', 'charli',
                      'xcx', 'charli xcx', 'girls', 'hbo', 'i love it', "i don't care", 'icona', 'pop',
                      'iconic ep', 'iconic', 'love', 'it'],
+            'duration': 180,
             'uploader': 'Icona Pop',
             'uploader_id': 'IconaPop',
             'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/IconaPop',
@@ -419,6 +422,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             'title': 'Justin Timberlake - Tunnel Vision (Explicit)',
             'alt_title': 'Tunnel Vision',
             'description': 'md5:64249768eec3bc4276236606ea996373',
+            'duration': 419,
             'uploader': 'justintimberlakeVEVO',
             'uploader_id': 'justintimberlakeVEVO',
             'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/justintimberlakeVEVO',
@@ -458,6 +462,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             'description': 'test chars: "\'/\\ä↭𝕐\ntest URL: https://github.com/rg3/youtube-dl/issues/1892\n\nThis is a test video for youtube-dl.\n\nFor more information, contact phihag@phihag.de .',
             'categories': ['Science & Technology'],
             'tags': ['youtube-dl'],
+            'duration': 10,
             'like_count': int,
             'dislike_count': int,
         },
@@ -493,6 +498,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             'ext': 'm4a',
             'title': 'Afrojack, Spree Wilson - The Spark ft. Spree Wilson',
             'description': 'md5:12e7067fa6735a77bdcbb58cb1187d2d',
+            'duration': 244,
             'uploader': 'AfrojackVEVO',
             'uploader_id': 'AfrojackVEVO',
             'upload_date': '20131011',
@@ -512,6 +518,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             'title': 'Taylor Swift - Shake It Off',
             'alt_title': 'Shake It Off',
             'description': 'md5:95f66187cd7c8b2c13eb78e1223b63c3',
+            'duration': 242,
             'uploader': 'TaylorSwiftVEVO',
             'uploader_id': 'TaylorSwiftVEVO',
             'upload_date': '20140818',
@@ -529,6 +536,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
         'info_dict': {
             'id': 'T4XJQO3qol8',
             'ext': 'mp4',
+            'duration': 219,
             'upload_date': '20100909',
             'uploader': 'The Amazing Atheist',
             'uploader_id': 'TheAmazingAtheist',
@@ -546,6 +554,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             'ext': 'mp4',
             'title': 'The Witcher 3: Wild Hunt - The Sword Of Destiny Trailer',
             'description': r're:(?s).{100,}About the Game\n.*?The Witcher 3: Wild Hunt.{100,}',
+            'duration': 142,
             'uploader': 'The Witcher',
             'uploader_id': 'WitcherGame',
             'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/WitcherGame',
@@ -562,6 +571,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             'ext': 'mp4',
             'title': 'Dedication To My Ex (Miss That) (Lyric Video)',
             'description': 'md5:33765bb339e1b47e7e72b5490139bb41',
+            'duration': 247,
             'uploader': 'LloydVEVO',
             'uploader_id': 'LloydVEVO',
             'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/LloydVEVO',
@@ -576,6 +586,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
         'info_dict': {
             'id': '__2ABJjxzNo',
             'ext': 'mp4',
+            'duration': 266,
             'upload_date': '20100430',
             'uploader_id': 'deadmau5',
             'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/deadmau5',
@@ -596,6 +607,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
         'info_dict': {
             'id': 'lqQg6PlCWgI',
             'ext': 'mp4',
+            'duration': 6085,
             'upload_date': '20150827',
             'uploader_id': 'olympic',
             'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/olympic',
@@ -615,6 +627,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             'id': '_b-2C3KPAM0',
             'ext': 'mp4',
             'stretched_ratio': 16 / 9.,
+            'duration': 85,
             'upload_date': '20110310',
             'uploader_id': 'AllenMeow',
             'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/AllenMeow',
@@ -649,6 +662,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             'ext': 'mp4',
             'title': 'md5:7b81415841e02ecd4313668cde88737a',
             'description': 'md5:116377fd2963b81ec4ce64b542173306',
+            'duration': 220,
             'upload_date': '20150625',
             'uploader_id': 'dorappi2000',
             'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/dorappi2000',
@@ -691,6 +705,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'ext': 'mp4',
                 'title': 'teamPGP: Rocket League Noob Stream (Main Camera)',
                 'description': 'md5:dc7872fb300e143831327f1bae3af010',
+                'duration': 7335,
                 'upload_date': '20150721',
                 'uploader': 'Beer Games Beer',
                 'uploader_id': 'beergamesbeer',
@@ -703,6 +718,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'ext': 'mp4',
                 'title': 'teamPGP: Rocket League Noob Stream (kreestuh)',
                 'description': 'md5:dc7872fb300e143831327f1bae3af010',
+                'duration': 7337,
                 'upload_date': '20150721',
                 'uploader': 'Beer Games Beer',
                 'uploader_id': 'beergamesbeer',
@@ -715,6 +731,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'ext': 'mp4',
                 'title': 'teamPGP: Rocket League Noob Stream (grizzle)',
                 'description': 'md5:dc7872fb300e143831327f1bae3af010',
+                'duration': 7337,
                 'upload_date': '20150721',
                 'uploader': 'Beer Games Beer',
                 'uploader_id': 'beergamesbeer',
@@ -727,6 +744,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'ext': 'mp4',
                 'title': 'teamPGP: Rocket League Noob Stream (zim)',
                 'description': 'md5:dc7872fb300e143831327f1bae3af010',
+                'duration': 7334,
                 'upload_date': '20150721',
                 'uploader': 'Beer Games Beer',
                 'uploader_id': 'beergamesbeer',
@@ -768,6 +786,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             'title': '{dark walk}; Loki/AC/Dishonored; collab w/Elflover21',
             'alt_title': 'Dark Walk',
             'description': 'md5:8085699c11dc3f597ce0410b0dcbb34a',
+            'duration': 133,
             'upload_date': '20151119',
             'uploader_id': 'IronSoulElf',
             'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/IronSoulElf',
@@ -809,10 +828,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             'ext': 'mp4',
             'title': 'md5:e41008789470fc2533a3252216f1c1d1',
             'description': 'md5:a677553cf0840649b731a3024aeff4cc',
+            'duration': 721,
             'upload_date': '20150127',
             'uploader_id': 'BerkmanCenter',
             'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/BerkmanCenter',
-            'uploader': 'BerkmanCenter',
+            'uploader': 'The Berkman Klein Center for Internet & Society',
             'license': 'Creative Commons Attribution license (reuse allowed)',
         },
         'params': {
@@ -827,6 +847,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             'ext': 'mp4',
             'title': 'Democratic Socialism and Foreign Policy | Bernie Sanders',
             'description': 'md5:dda0d780d5a6e120758d1711d062a867',
+            'duration': 4060,
             'upload_date': '20151119',
             'uploader': 'Bernie 2016',
             'uploader_id': 'UCH1dpzjCEiGAt8CXkryhkZg',
@@ -864,6 +885,31 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'skip_download': True,
             },
         },
+        {
+            # YouTube Red video with episode data
+            'url': 'https://www.youtube.com/watch?v=iqKdEhx-dD4',
+            'info_dict': {
+                'id': 'iqKdEhx-dD4',
+                'ext': 'mp4',
+                'title': 'Isolation - Mind Field (Ep 1)',
+                'description': 'md5:8013b7ddea787342608f63a13ddc9492',
+                'duration': 2085,
+                'upload_date': '20170118',
+                'uploader': 'Vsauce',
+                'uploader_id': 'Vsauce',
+                'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/Vsauce',
+                'license': 'Standard YouTube License',
+                'series': 'Mind Field',
+                'season_number': 1,
+                'episode_number': 1,
+            },
+            'params': {
+                'skip_download': True,
+            },
+            'expected_warnings': [
+                'Skipping DASH manifest',
+            ],
+        },
         {
             # itag 212
             'url': '1t24XAntNCY',
@@ -1454,6 +1500,16 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
         else:
             video_alt_title = video_creator = None

+        m_episode = re.search(
+            r'<div[^>]+id="watch7-headline"[^>]*>\s*<span[^>]*>.*?>(?P<series>[^<]+)</a></b>\s*S(?P<season>\d+)\s*•\s*E(?P<episode>\d+)</span>',
+            video_webpage)
+        if m_episode:
+            series = m_episode.group('series')
+            season_number = int(m_episode.group('season'))
+            episode_number = int(m_episode.group('episode'))
+        else:
+            series = season_number = episode_number = None
+
         m_cat_container = self._search_regex(
             r'(?s)<h4[^>]*>\s*Category\s*</h4>\s*<ul[^>]*>(.*?)</ul>',
             video_webpage, 'categories', default=None)
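The episode regex above can be checked in isolation. The HTML snippet here is a hand-made approximation of YouTube's 2017-era `watch7-headline` markup (the markup details are assumptions), showing how the named groups feed the new `series`/`season_number`/`episode_number` fields:

```python
import re

# Regex copied from the hunk above, split for readability:
EPISODE_RE = (r'<div[^>]+id="watch7-headline"[^>]*>\s*<span[^>]*>.*?>'
              r'(?P<series>[^<]+)</a></b>\s*S(?P<season>\d+)\s*•\s*E(?P<episode>\d+)</span>')

html = ('<div class="yt" id="watch7-headline"><span class="title">'
        '<b><a href="/show/mindfield">Mind Field</a></b> S1 • E1</span>')

m = re.search(EPISODE_RE, html)
assert m.group('series') == 'Mind Field'
assert int(m.group('season')) == 1 and int(m.group('episode')) == 1
```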
@@ -1482,11 +1538,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
         video_subtitles = self.extract_subtitles(video_id, video_webpage)
         automatic_captions = self.extract_automatic_captions(video_id, video_webpage)

-        if 'length_seconds' not in video_info:
-            self._downloader.report_warning('unable to extract video duration')
-            video_duration = None
-        else:
-            video_duration = int(compat_urllib_parse_unquote_plus(video_info['length_seconds'][0]))
+        video_duration = try_get(
+            video_info, lambda x: int_or_none(x['length_seconds'][0]))
+        if not video_duration:
+            video_duration = parse_duration(self._html_search_meta(
+                'duration', video_webpage, 'video duration'))

         # annotations
         video_annotations = None
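The rewritten duration lookup no longer raises on a missing or malformed `length_seconds`; it yields `None` and falls back to the page's `<meta>` duration. These are minimal re-implementations of the project's `try_get`/`int_or_none` helpers, just to show the degradation behavior:

```python
def int_or_none(v):
    # Coerce to int, returning None on anything unparseable.
    try:
        return int(v)
    except (TypeError, ValueError):
        return None

def try_get(src, getter):
    # Apply getter, swallowing the lookup errors a bad dict shape can raise.
    try:
        return getter(src)
    except (AttributeError, KeyError, TypeError, IndexError):
        return None

video_info = {'length_seconds': ['265']}
assert try_get(video_info, lambda x: int_or_none(x['length_seconds'][0])) == 265
# A missing key no longer raises; the caller then tries the <meta> duration:
assert try_get({}, lambda x: int_or_none(x['length_seconds'][0])) is None
```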
@@ -1743,6 +1799,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             'is_live': is_live,
             'start_time': start_time,
             'end_time': end_time,
+            'series': series,
+            'season_number': season_number,
+            'episode_number': episode_number,
         }
@@ -1819,6 +1878,7 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
             'title': 'YDL_Empty_List',
         },
         'playlist_count': 0,
+        'skip': 'This playlist is private',
     }, {
         'note': 'Playlist with deleted videos (#651). As a bonus, the video #51 is also twice in this list.',
         'url': 'https://www.youtube.com/playlist?list=PLwP_SiAcdui0KVebT0mU9Apz359a4ubsC',
@@ -1850,6 +1910,7 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
             'id': 'PLtPgu7CB4gbY9oDN3drwC3cMbJggS7dKl',
         },
         'playlist_count': 2,
+        'skip': 'This playlist is private',
     }, {
         'note': 'embedded',
         'url': 'https://www.youtube.com/embed/videoseries?list=PL6IaIsEjSbf96XFRuNccS_RuEXwNdsoEu',
@@ -1961,14 +2022,18 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
         url = self._TEMPLATE_URL % playlist_id
         page = self._download_webpage(url, playlist_id)

-        for match in re.findall(r'<div class="yt-alert-message">([^<]+)</div>', page):
+        # the yt-alert-message now has tabindex attribute (see https://github.com/rg3/youtube-dl/issues/11604)
+        for match in re.findall(r'<div class="yt-alert-message"[^>]*>([^<]+)</div>', page):
             match = match.strip()
             # Check if the playlist exists or is private
-            if re.match(r'[^<]*(The|This) playlist (does not exist|is private)[^<]*', match):
-                raise ExtractorError(
-                    'The playlist doesn\'t exist or is private, use --username or '
-                    '--netrc to access it.',
-                    expected=True)
+            mobj = re.match(r'[^<]*(?:The|This) playlist (?P<reason>does not exist|is private)[^<]*', match)
+            if mobj:
+                reason = mobj.group('reason')
+                message = 'This playlist %s' % reason
+                if 'private' in reason:
+                    message += ', use --username or --netrc to access it'
+                message += '.'
+                raise ExtractorError(message, expected=True)
             elif re.match(r'[^<]*Invalid parameters[^<]*', match):
                 raise ExtractorError(
                     'Invalid parameters. Maybe URL is incorrect.',
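The `[^>]*` added to the alert regex is what restores detection after YouTube added a `tabindex` attribute to the alert div (issue #11604). A side-by-side check with simplified alert markup (the snippet is illustrative, not captured page source):

```python
import re

OLD = r'<div class="yt-alert-message">([^<]+)</div>'
NEW = r'<div class="yt-alert-message"[^>]*>([^<]+)</div>'

page = '<div class="yt-alert-message" tabindex="0">This playlist is private.</div>'

# The old pattern silently misses the alert once the attribute appears:
assert re.findall(OLD, page) == []
# The new one still captures the message text:
assert re.findall(NEW, page) == ['This playlist is private.']
```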

@@ -751,7 +751,7 @@ def parseOpts(overrideArguments=None):
         help='Convert video files to audio-only files (requires ffmpeg or avconv and ffprobe or avprobe)')
     postproc.add_option(
         '--audio-format', metavar='FORMAT', dest='audioformat', default='best',
-        help='Specify audio format: "best", "aac", "vorbis", "mp3", "m4a", "opus", or "wav"; "%default" by default')
+        help='Specify audio format: "best", "aac", "vorbis", "mp3", "m4a", "opus", or "wav"; "%default" by default; No effect without -x')
     postproc.add_option(
         '--audio-quality', metavar='QUALITY',
         dest='audioquality', default='5',
@@ -867,7 +867,7 @@ def parseOpts(overrideArguments=None):
             if '--ignore-config' not in system_conf:
                 user_conf = _readUserConf()

-        argv = system_conf + user_conf + command_line_conf
+        argv = system_conf + user_conf + custom_conf + command_line_conf

        opts, args = parser.parse_args(argv)

        if opts.verbose:
            for conf_label, conf in (
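The `custom_conf` fix works because of optparse's last-occurrence-wins behavior: appending the custom config file's args before `command_line_conf` means it overrides user and system configs but still loses to explicit command-line flags. A sketch with illustrative values:

```python
from optparse import OptionParser

parser = OptionParser()
parser.add_option('--audio-format', dest='audioformat', default='best')

# Precedence: system < user < custom (--config-location) < command line,
# because with optparse the last occurrence of a 'store' option wins.
system_conf = ['--audio-format', 'mp3']
user_conf = []
custom_conf = ['--audio-format', 'vorbis']
command_line_conf = ['--audio-format', 'm4a']

opts, _ = parser.parse_args(system_conf + user_conf + custom_conf + command_line_conf)
assert opts.audioformat == 'm4a'
```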

@@ -143,6 +143,7 @@ DATE_FORMATS = (
    '%Y/%m/%d',
    '%Y/%m/%d %H:%M',
    '%Y/%m/%d %H:%M:%S',
+    '%Y-%m-%d %H:%M',
    '%Y-%m-%d %H:%M:%S',
    '%Y-%m-%d %H:%M:%S.%f',
    '%d.%m.%Y %H:%M',
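The new format covers ISO-style timestamps that omit seconds, which previously matched none of the `DATE_FORMATS` entries:

```python
from datetime import datetime

# '%Y-%m-%d %H:%M' accepts a timestamp without a seconds field:
dt = datetime.strptime('2017-01-28 21:34', '%Y-%m-%d %H:%M')
assert (dt.year, dt.month, dt.day, dt.hour, dt.minute) == (2017, 1, 28, 21, 34)
```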
@@ -1772,7 +1773,7 @@ def parse_duration(s):
     s = s.strip()

     days, hours, mins, secs, ms = [None] * 5
-    m = re.match(r'(?:(?:(?:(?P<days>[0-9]+):)?(?P<hours>[0-9]+):)?(?P<mins>[0-9]+):)?(?P<secs>[0-9]+)(?P<ms>\.[0-9]+)?$', s)
+    m = re.match(r'(?:(?:(?:(?P<days>[0-9]+):)?(?P<hours>[0-9]+):)?(?P<mins>[0-9]+):)?(?P<secs>[0-9]+)(?P<ms>\.[0-9]+)?Z?$', s)
     if m:
         days, hours, mins, secs, ms = m.groups()
     else:
@@ -1789,11 +1790,11 @@ def parse_duration(s):
                 )?
                 (?:
                     (?P<secs>[0-9]+)(?P<ms>\.[0-9]+)?\s*s(?:ec(?:ond)?s?)?\s*
-                )?$''', s)
+                )?Z?$''', s)
         if m:
             days, hours, mins, secs, ms = m.groups()
         else:
-            m = re.match(r'(?i)(?:(?P<hours>[0-9.]+)\s*(?:hours?)|(?P<mins>[0-9.]+)\s*(?:mins?\.?|minutes?)\s*)$', s)
+            m = re.match(r'(?i)(?:(?P<hours>[0-9.]+)\s*(?:hours?)|(?P<mins>[0-9.]+)\s*(?:mins?\.?|minutes?)\s*)Z?$', s)
             if m:
                 hours, mins = m.groups()
             else:
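The `Z?` appended to all three patterns makes `parse_duration` tolerate a stray trailing "Z" on duration strings. Checking the first (clock-style) pattern in isolation:

```python
import re

# First of the three parse_duration patterns, with the new trailing Z?:
CLOCK_RE = (r'(?:(?:(?:(?P<days>[0-9]+):)?(?P<hours>[0-9]+):)?'
            r'(?P<mins>[0-9]+):)?(?P<secs>[0-9]+)(?P<ms>\.[0-9]+)?Z?$')

# Durations like '1:02:03Z' now parse instead of falling through to None:
m = re.match(CLOCK_RE, '1:02:03Z')
assert (m.group('hours'), m.group('mins'), m.group('secs')) == ('1', '02', '03')
# Plain durations are unaffected since the Z is optional:
assert re.match(CLOCK_RE, '1:02:03') is not None
```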

@@ -1,3 +1,3 @@
 from __future__ import unicode_literals

-__version__ = '2017.01.10'
+__version__ = '2017.01.28'