Merge remote-tracking branch 'upstream/master' into yuvutu

Simon Morgan 2016-10-09 11:57:29 +01:00
commit 54542a3ed0
20 changed files with 277 additions and 84 deletions

View File

@@ -6,8 +6,8 @@
 ---
-### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.10.02*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
+### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.10.07*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
-- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.10.02**
+- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.10.07**
 
 ### Before submitting an *issue* make sure you have:
 - [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
 [debug] User config: []
 [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2016.10.02
+[debug] youtube-dl version 2016.10.07
 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
 [debug] Proxy map: {}

View File

@@ -1,3 +1,32 @@
+version <unreleased>
+
+Core
+* [Makefile] Support for GNU make < 4 is fixed; BSD make dropped (#9387)
+
+Extractors
++ [nextmedia] Recognize action news on AppleDaily
+
+
+version 2016.10.07
+
+Extractors
++ [iprima] Detect geo restriction
+* [facebook] Fix video extraction (#10846)
++ [commonprotocols] Support direct MMS links (#10838)
++ [generic] Add support for multiple vimeo embeds (#10862)
++ [nzz] Add support for nzz.ch (#4407)
++ [npo] Detect geo restriction
++ [npo] Add support for 2doc.nl (#10842)
++ [lego] Add support for lego.com (#10369)
++ [tonline] Add support for t-online.de (#10376)
+* [techtalks] Relax URL regular expression (#10840)
+* [youtube:live] Extend URL regular expression (#10839)
++ [theweatherchannel] Add support for weather.com (#7188)
++ [thisoldhouse] Add support for thisoldhouse.com (#10837)
++ [nhl] Add support for wch2016.com (#10833)
+* [pornoxo] Use JWPlatform to improve metadata extraction
+
+
 version 2016.10.02
 
 Core

View File

@@ -12,7 +12,7 @@ SHAREDIR ?= $(PREFIX)/share
 PYTHON ?= /usr/bin/env python
 
 # set SYSCONFDIR to /etc if PREFIX=/usr or PREFIX=/usr/local
-SYSCONFDIR != if [ $(PREFIX) = /usr -o $(PREFIX) = /usr/local ]; then echo /etc; else echo $(PREFIX)/etc; fi
+SYSCONFDIR = $(shell if [ $(PREFIX) = /usr -o $(PREFIX) = /usr/local ]; then echo /etc; else echo $(PREFIX)/etc; fi)
 
 install: youtube-dl youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish
 	install -d $(DESTDIR)$(BINDIR)
@@ -90,7 +90,7 @@ fish-completion: youtube-dl.fish
 lazy-extractors: youtube_dl/extractor/lazy_extractors.py
 
-_EXTRACTOR_FILES != find youtube_dl/extractor -iname '*.py' -and -not -iname 'lazy_extractors.py'
+_EXTRACTOR_FILES = $(shell find youtube_dl/extractor -iname '*.py' -and -not -iname 'lazy_extractors.py')
 youtube_dl/extractor/lazy_extractors.py: devscripts/make_lazy_extractors.py devscripts/lazy_load_template.py $(_EXTRACTOR_FILES)
 	$(PYTHON) devscripts/make_lazy_extractors.py $@

View File

@@ -923,7 +923,7 @@ To run the test, simply invoke your favorite test runner, or execute a test file
 If you want to create a build of youtube-dl yourself, you'll need
 
 * python
-* make (both GNU make and BSD make are supported)
+* make (only GNU make is supported)
 * pandoc
 * zip
 * nosetests

View File

@@ -364,6 +364,7 @@
 - **Le**: 乐视网
 - **Learnr**
 - **Lecture2Go**
+- **LEGO**
 - **Lemonde**
 - **LePlaylist**
 - **LetvCloud**: 乐视云
@@ -507,6 +508,7 @@
 - **Nuvid**
 - **NYTimes**
 - **NYTimesArticle**
+- **NZZ**
 - **ocw.mit.edu**
 - **OdaTV**
 - **Odnoklassniki**
@@ -692,6 +694,7 @@
 - **SWRMediathek**
 - **Syfy**
 - **SztvHu**
+- **t-online.de**
 - **Tagesschau**
 - **tagesschau:player**
 - **Tass**
@@ -721,8 +724,10 @@
 - **TheScene**
 - **TheSixtyOne**
 - **TheStar**
+- **TheWeatherChannel**
 - **ThisAmericanLife**
 - **ThisAV**
+- **ThisOldHouse**
 - **tinypic**: tinypic.com videos
 - **tlc.de**
 - **TMZ**

View File

@@ -21,6 +21,7 @@ from ..compat import (
     compat_os_name,
     compat_str,
     compat_urllib_error,
+    compat_urllib_parse_unquote,
     compat_urllib_parse_urlencode,
     compat_urllib_request,
     compat_urlparse,
@@ -2020,6 +2021,12 @@ class InfoExtractor(object):
             headers['Ytdl-request-proxy'] = geo_verification_proxy
         return headers
 
+    def _generic_id(self, url):
+        return compat_urllib_parse_unquote(os.path.splitext(url.rstrip('/').split('/')[-1])[0])
+
+    def _generic_title(self, url):
+        return compat_urllib_parse_unquote(os.path.splitext(url_basename(url))[0])
+
 
 class SearchInfoExtractor(InfoExtractor):
     """

View File

@@ -1,13 +1,9 @@
 from __future__ import unicode_literals
 
-import os
-
 from .common import InfoExtractor
 from ..compat import (
-    compat_urllib_parse_unquote,
     compat_urlparse,
 )
-from ..utils import url_basename
 
 
 class RtmpIE(InfoExtractor):
@@ -23,8 +19,8 @@ class RtmpIE(InfoExtractor):
     }]
 
     def _real_extract(self, url):
-        video_id = compat_urllib_parse_unquote(os.path.splitext(url.rstrip('/').split('/')[-1])[0])
-        title = compat_urllib_parse_unquote(os.path.splitext(url_basename(url))[0])
+        video_id = self._generic_id(url)
+        title = self._generic_title(url)
         return {
             'id': video_id,
             'title': title,
@@ -34,3 +30,31 @@ class RtmpIE(InfoExtractor):
                 'format_id': compat_urlparse.urlparse(url).scheme,
             }],
         }
+
+
+class MmsIE(InfoExtractor):
+    IE_DESC = False  # Do not list
+    _VALID_URL = r'(?i)mms://.+'
+
+    _TEST = {
+        # Direct MMS link
+        'url': 'mms://kentro.kaist.ac.kr/200907/MilesReid(0709).wmv',
+        'info_dict': {
+            'id': 'MilesReid(0709)',
+            'ext': 'wmv',
+            'title': 'MilesReid(0709)',
+        },
+        'params': {
+            'skip_download': True,  # rtsp downloads, requiring mplayer or mpv
+        },
+    }
+
+    def _real_extract(self, url):
+        video_id = self._generic_id(url)
+        title = self._generic_title(url)
+        return {
+            'id': video_id,
+            'title': title,
+            'url': url,
+        }
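
Note: MmsIE claims URLs by scheme alone. A quick sketch of the `(?i)`-flagged pattern; the second URL is made up to show case-insensitivity.

```python
import re

_VALID_URL = r'(?i)mms://.+'
for candidate in ('mms://kentro.kaist.ac.kr/200907/MilesReid(0709).wmv',
                  'MMS://EXAMPLE.COM/stream',
                  'http://example.com/a.wmv'):
    print(candidate, bool(re.match(_VALID_URL, candidate)))
# -> True, True, False
```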

View File

@@ -186,7 +186,10 @@ from .comedycentral import (
 )
 from .comcarcoff import ComCarCoffIE
 from .commonmistakes import CommonMistakesIE, UnicodeBOMIE
-from .commonprotocols import RtmpIE
+from .commonprotocols import (
+    MmsIE,
+    RtmpIE,
+)
 from .condenast import CondeNastIE
 from .cracked import CrackedIE
 from .crackle import CrackleIE
@@ -638,6 +641,7 @@ from .nytimes import (
     NYTimesArticleIE,
 )
 from .nuvid import NuvidIE
+from .nzz import NZZIE
 from .odatv import OdaTVIE
 from .odnoklassniki import OdnoklassnikiIE
 from .oktoberfesttv import OktoberfestTVIE

View File

@@ -258,7 +258,7 @@ class FacebookIE(InfoExtractor):
         if not video_data:
             server_js_data = self._parse_json(self._search_regex(
-                r'handleServerJS\(({.+})\);', webpage, 'server js data', default='{}'), video_id)
+                r'handleServerJS\(({.+})(?:\);|,")', webpage, 'server js data', default='{}'), video_id)
             for item in server_js_data.get('instances', []):
                 if item[1][0] == 'VideoConfig':
                     video_data = video_data_list2dict(item[2][0]['videoData'])
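
Note: the relaxed regex also accepts payloads terminated by `,"` (handleServerJS() calls that take extra arguments), not just `);`. Both page snippets below are invented to show the two shapes.

```python
import re

PATTERN = r'handleServerJS\(({.+})(?:\);|,")'
for page in ('bigPipe.handleServerJS({"instances": [1]});',
             'bigPipe.handleServerJS({"instances": [1]},"extra argument");'):
    print(re.search(PATTERN, page).group(1))
# both print {"instances": [1]}
```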

View File

@@ -27,7 +27,6 @@ from ..utils import (
     unified_strdate,
     unsmuggle_url,
     UnsupportedError,
-    url_basename,
     xpath_text,
 )
 from .brightcove import (
@@ -1549,7 +1548,7 @@ class GenericIE(InfoExtractor):
                 force_videoid = smuggled_data['force_videoid']
                 video_id = force_videoid
             else:
-                video_id = compat_urllib_parse_unquote(os.path.splitext(url.rstrip('/').split('/')[-1])[0])
+                video_id = self._generic_id(url)
 
         self.to_screen('%s: Requesting header' % video_id)
@@ -1578,7 +1577,7 @@ class GenericIE(InfoExtractor):
         info_dict = {
             'id': video_id,
-            'title': compat_urllib_parse_unquote(os.path.splitext(url_basename(url))[0]),
+            'title': self._generic_title(url),
             'upload_date': unified_strdate(head_response.headers.get('Last-Modified'))
         }
@@ -1754,9 +1753,9 @@ class GenericIE(InfoExtractor):
         if matches:
             return _playlist_from_matches(matches, ie='RtlNl')
 
-        vimeo_url = VimeoIE._extract_vimeo_url(url, webpage)
-        if vimeo_url is not None:
-            return self.url_result(vimeo_url)
+        vimeo_urls = VimeoIE._extract_urls(url, webpage)
+        if vimeo_urls:
+            return _playlist_from_matches(vimeo_urls, ie=VimeoIE.ie_key())
 
         vid_me_embed_url = self._search_regex(
             r'src=[\'"](https?://vid\.me/[^\'"]+)[\'"]',

View File

@@ -81,6 +81,9 @@ class IPrimaIE(InfoExtractor):
         for _, src in re.findall(r'src["\']\s*:\s*(["\'])(.+?)\1', playerpage):
             extract_formats(src)
 
+        if not formats and '>GEO_IP_NOT_ALLOWED<' in playerpage:
+            self.raise_geo_restricted()
+
         self._sort_formats(formats)
 
         return {

View File

@@ -1,45 +1,86 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
+import re
+
 from .common import InfoExtractor
 from ..compat import compat_str
 from ..utils import (
     unescapeHTML,
-    int_or_none,
+    parse_duration,
+    get_element_by_class,
 )
 
 
 class LEGOIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?lego\.com/(?:[^/]+/)*videos/(?:[^/]+/)*[^/?#]+-(?P<id>[0-9a-f]+)'
-    _TEST = {
+    _VALID_URL = r'https?://(?:www\.)?lego\.com/(?P<locale>[^/]+)/(?:[^/]+/)*videos/(?:[^/]+/)*[^/?#]+-(?P<id>[0-9a-f]+)'
+    _TESTS = [{
         'url': 'http://www.lego.com/en-us/videos/themes/club/blocumentary-kawaguchi-55492d823b1b4d5e985787fa8c2973b1',
         'md5': 'f34468f176cfd76488767fc162c405fa',
         'info_dict': {
             'id': '55492d823b1b4d5e985787fa8c2973b1',
             'ext': 'mp4',
             'title': 'Blocumentary Great Creations: Akiyuki Kawaguchi',
-        }
-    }
+            'description': 'Blocumentary Great Creations: Akiyuki Kawaguchi',
+        },
+    }, {
+        # geo-restricted but the contentUrl contain a valid url
+        'url': 'http://www.lego.com/nl-nl/videos/themes/nexoknights/episode-20-kingdom-of-heroes-13bdc2299ab24d9685701a915b3d71e7##sp=399',
+        'md5': '4c3fec48a12e40c6e5995abc3d36cc2e',
+        'info_dict': {
+            'id': '13bdc2299ab24d9685701a915b3d71e7',
+            'ext': 'mp4',
+            'title': 'Aflevering 20 - Helden van het koninkrijk',
+            'description': 'md5:8ee499aac26d7fa8bcb0cedb7f9c3941',
+        },
+    }, {
+        # special characters in title
+        'url': 'http://www.lego.com/en-us/starwars/videos/lego-star-wars-force-surprise-9685ee9d12e84ff38e84b4e3d0db533d',
+        'info_dict': {
+            'id': '9685ee9d12e84ff38e84b4e3d0db533d',
+            'ext': 'mp4',
+            'title': 'Force Surprise LEGO® Star Wars™ Microfighters',
+            'description': 'md5:9c673c96ce6f6271b88563fe9dc56de3',
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }]
     _BITRATES = [256, 512, 1024, 1536, 2560]
 
     def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage = self._download_webpage(
-            'http://www.lego.com/en-US/mediaplayer/video/' + video_id, video_id)
-        title = self._search_regex(r'<title>(.+?)</title>', webpage, 'title')
-        video_data = self._parse_json(unescapeHTML(self._search_regex(
-            r"video='([^']+)'", webpage, 'video data')), video_id)
-        progressive_base = self._search_regex(
-            r'data-video-progressive-url="([^"]+)"',
-            webpage, 'progressive base', default='https://lc-mediaplayerns-live-s.legocdn.com/')
-        streaming_base = self._search_regex(
-            r'data-video-streaming-url="([^"]+)"',
-            webpage, 'streaming base', default='http://legoprod-f.akamaihd.net/')
-        item_id = video_data['ItemId']
+        locale, video_id = re.match(self._VALID_URL, url).groups()
+        webpage = self._download_webpage(url, video_id)
+        title = get_element_by_class('video-header', webpage).strip()
+        progressive_base = 'https://lc-mediaplayerns-live-s.legocdn.com/'
+        streaming_base = 'http://legoprod-f.akamaihd.net/'
+        content_url = self._html_search_meta('contentUrl', webpage)
+        path = self._search_regex(
+            r'(?:https?:)?//[^/]+/(?:[iz]/s/)?public/(.+)_[0-9,]+\.(?:mp4|webm)',
+            content_url, 'video path', default=None)
+        if not path:
+            player_url = self._proto_relative_url(self._search_regex(
+                r'<iframe[^>]+src="((?:https?)?//(?:www\.)?lego\.com/[^/]+/mediaplayer/video/[^"]+)',
+                webpage, 'player url', default=None))
+            if not player_url:
+                base_url = self._proto_relative_url(self._search_regex(
+                    r'data-baseurl="([^"]+)"', webpage, 'base url',
+                    default='http://www.lego.com/%s/mediaplayer/video/' % locale))
+                player_url = base_url + video_id
+            player_webpage = self._download_webpage(player_url, video_id)
+            video_data = self._parse_json(unescapeHTML(self._search_regex(
+                r"video='([^']+)'", player_webpage, 'video data')), video_id)
+            progressive_base = self._search_regex(
+                r'data-video-progressive-url="([^"]+)"',
+                player_webpage, 'progressive base', default='https://lc-mediaplayerns-live-s.legocdn.com/')
+            streaming_base = self._search_regex(
+                r'data-video-streaming-url="([^"]+)"',
+                player_webpage, 'streaming base', default='http://legoprod-f.akamaihd.net/')
+            item_id = video_data['ItemId']
 
-        net_storage_path = video_data.get('NetStoragePath') or '/'.join([item_id[:2], item_id[2:4]])
-        base_path = '_'.join([item_id, video_data['VideoId'], video_data['Locale'], compat_str(video_data['VideoVersion'])])
-        path = '/'.join([net_storage_path, base_path])
+            net_storage_path = video_data.get('NetStoragePath') or '/'.join([item_id[:2], item_id[2:4]])
+            base_path = '_'.join([item_id, video_data['VideoId'], video_data['Locale'], compat_str(video_data['VideoVersion'])])
+            path = '/'.join([net_storage_path, base_path])
 
         streaming_path = ','.join(map(lambda bitrate: compat_str(bitrate), self._BITRATES))
         formats = self._extract_akamai_formats(
@@ -80,7 +121,8 @@ class LEGOIE(InfoExtractor):
         return {
             'id': video_id,
             'title': title,
-            'thumbnail': video_data.get('CoverImageUrl'),
-            'duration': int_or_none(video_data.get('Length')),
+            'description': self._html_search_meta('description', webpage),
+            'thumbnail': self._html_search_meta('thumbnail', webpage),
+            'duration': parse_duration(self._html_search_meta('duration', webpage)),
             'formats': formats,
         }
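
Note: the geo-restriction workaround reads the contentUrl meta tag and recovers the CDN path from it. A sketch of the path regex on a made-up URL of the expected shape.

```python
import re

PATH_RE = r'(?:https?:)?//[^/]+/(?:[iz]/s/)?public/(.+)_[0-9,]+\.(?:mp4|webm)'
content_url = '//lc-mediaplayerns-live-s.legocdn.com/public/55/49/55492d823b1b4d5e985787fa8c2973b1_Blocumentary_en-US_1_2560.mp4'
print(re.search(PATH_RE, content_url).group(1))
# -> 55/49/55492d823b1b4d5e985787fa8c2973b1_Blocumentary_en-US_1
```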

View File

@@ -93,7 +93,7 @@ class NextMediaActionNewsIE(NextMediaIE):
 
 class AppleDailyIE(NextMediaIE):
     IE_DESC = '臺灣蘋果日報'
-    _VALID_URL = r'https?://(www|ent)\.appledaily\.com\.tw/(?:animation|appledaily|enews|realtimenews)/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)(/.*)?'
+    _VALID_URL = r'https?://(www|ent)\.appledaily\.com\.tw/(?:animation|appledaily|enews|realtimenews|actionnews)/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)(/.*)?'
     _TESTS = [{
         'url': 'http://ent.appledaily.com.tw/enews/article/entertainment/20150128/36354694',
         'md5': 'a843ab23d150977cc55ef94f1e2c1e4d',
@@ -154,6 +154,9 @@ class AppleDailyIE(NextMediaIE):
             'description': 'md5:7b859991a6a4fedbdf3dd3b66545c748',
             'upload_date': '20140417',
         },
+    }, {
+        'url': 'http://www.appledaily.com.tw/actionnews/appledaily/7/20161003/960588/',
+        'only_matching': True,
     }]
 
     _URL_PATTERN = r'\{url: \'(.+)\'\}'
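
Note: sketch of the widened _VALID_URL against the new only_matching test URL.

```python
import re

_VALID_URL = r'https?://(www|ent)\.appledaily\.com\.tw/(?:animation|appledaily|enews|realtimenews|actionnews)/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)(/.*)?'
m = re.match(_VALID_URL, 'http://www.appledaily.com.tw/actionnews/appledaily/7/20161003/960588/')
print(m.group('date'), m.group('id'))
# -> 20161003 960588
```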

View File

@@ -3,6 +3,7 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
+from ..compat import compat_HTTPError
 from ..utils import (
     fix_xml_ampersands,
     orderedSet,
@@ -10,6 +11,7 @@ from ..utils import (
     qualities,
     strip_jsonp,
     unified_strdate,
+    ExtractorError,
 )
@@ -181,9 +183,16 @@ class NPOIE(NPOBaseIE):
                 continue
             streams = format_info.get('streams')
             if streams:
-                video_info = self._download_json(
-                    streams[0] + '&type=json',
-                    video_id, 'Downloading %s stream JSON' % format_id)
+                try:
+                    video_info = self._download_json(
+                        streams[0] + '&type=json',
+                        video_id, 'Downloading %s stream JSON' % format_id)
+                except ExtractorError as ee:
+                    if isinstance(ee.cause, compat_HTTPError) and ee.cause.code == 404:
+                        error = (self._parse_json(ee.cause.read().decode(), video_id, fatal=False) or {}).get('errorstring')
+                        if error:
+                            raise ExtractorError(error, expected=True)
+                    raise
             else:
                 video_info = format_info
             video_url = video_info.get('url')
@@ -459,8 +468,9 @@ class NPOPlaylistBaseIE(NPOIE):
 
 class VPROIE(NPOPlaylistBaseIE):
     IE_NAME = 'vpro'
-    _VALID_URL = r'https?://(?:www\.)?(?:tegenlicht\.)?vpro\.nl/(?:[^/]+/){2,}(?P<id>[^/]+)\.html'
-    _PLAYLIST_TITLE_RE = r'<h1[^>]+class=["\'].*?\bmedia-platform-title\b.*?["\'][^>]*>([^<]+)'
+    _VALID_URL = r'https?://(?:www\.)?(?:(?:tegenlicht\.)?vpro|2doc)\.nl/(?:[^/]+/)*(?P<id>[^/]+)\.html'
+    _PLAYLIST_TITLE_RE = (r'<h1[^>]+class=["\'].*?\bmedia-platform-title\b.*?["\'][^>]*>([^<]+)',
+                          r'<h5[^>]+class=["\'].*?\bmedia-platform-subtitle\b.*?["\'][^>]*>([^<]+)')
     _PLAYLIST_ENTRY_RE = r'data-media-id="([^"]+)"'
 
     _TESTS = [
@@ -492,6 +502,27 @@ class VPROIE(NPOPlaylistBaseIE):
                 'title': 'education education',
             },
             'playlist_count': 2,
+        },
+        {
+            'url': 'http://www.2doc.nl/documentaires/series/2doc/2015/oktober/de-tegenprestatie.html',
+            'info_dict': {
+                'id': 'de-tegenprestatie',
+                'title': 'De Tegenprestatie',
+            },
+            'playlist_count': 2,
+        }, {
+            'url': 'http://www.2doc.nl/speel~VARA_101375237~mh17-het-verdriet-van-nederland~.html',
+            'info_dict': {
+                'id': 'VARA_101375237',
+                'ext': 'm4v',
+                'title': 'MH17: Het verdriet van Nederland',
+                'description': 'md5:09e1a37c1fdb144621e22479691a9f18',
+                'upload_date': '20150716',
+            },
+            'params': {
+                # Skip because of m3u8 download
+                'skip_download': True
+            },
         }
     ]
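
Note: the new 404 handler surfaces NPO's own error message. A sketch of the `(parse_json(...) or {}).get('errorstring')` pattern; the JSON body is made up, and the stand-in mirrors fatal=False behaviour (invalid JSON yields None, absorbed by `or {}`).

```python
import json

def parse_json_nonfatal(raw):
    # stand-in for self._parse_json(..., fatal=False)
    try:
        return json.loads(raw)
    except ValueError:
        return None

for body in ('{"errorstring": "Dit programma is niet beschikbaar"}',
             '<html>not json</html>'):
    print((parse_json_nonfatal(body) or {}).get('errorstring'))
# -> Dit programma is niet beschikbaar, then None
```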

View File

@@ -0,0 +1,36 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    extract_attributes,
+)
+
+
+class NZZIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?nzz\.ch/(?:[^/]+/)*[^/?#]+-ld\.(?P<id>\d+)'
+    _TEST = {
+        'url': 'http://www.nzz.ch/zuerich/gymizyte/gymizyte-schreiben-schueler-heute-noch-diktate-ld.9153',
+        'info_dict': {
+            'id': '9153',
+        },
+        'playlist_mincount': 6,
+    }
+
+    def _real_extract(self, url):
+        page_id = self._match_id(url)
+        webpage = self._download_webpage(url, page_id)
+
+        entries = []
+        for player_element in re.findall(r'(<[^>]+class="kalturaPlayer"[^>]*>)', webpage):
+            player_params = extract_attributes(player_element)
+            if player_params.get('data-type') not in ('kaltura_singleArticle',):
+                self.report_warning('Unsupported player type')
+                continue
+            entry_id = player_params['data-id']
+            entries.append(self.url_result(
+                'kaltura:1750922:' + entry_id, 'Kaltura', entry_id))
+
+        return self.playlist_result(entries, page_id)
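
Note: roughly what NZZIE does per player element; a crude regex attribute parser stands in for youtube-dl's extract_attributes(), and the data-id value is invented.

```python
import re

html = '<div class="kalturaPlayer" data-type="kaltura_singleArticle" data-id="1_abcdefgh">'
attrs = dict(re.findall(r'([\w-]+)="([^"]*)"', html))
if attrs.get('data-type') == 'kaltura_singleArticle':
    print('kaltura:1750922:' + attrs['data-id'])
# -> kaltura:1750922:1_abcdefgh
```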

View File

@@ -1,29 +1,29 @@
 from __future__ import unicode_literals
 
-import re
-
 from .common import InfoExtractor
-from ..utils import str_or_none
+from ..utils import (
+    qualities,
+    str_or_none,
+)
 
 
 class ReverbNationIE(InfoExtractor):
     _VALID_URL = r'^https?://(?:www\.)?reverbnation\.com/.*?/song/(?P<id>\d+).*?$'
     _TESTS = [{
         'url': 'http://www.reverbnation.com/alkilados/song/16965047-mona-lisa',
-        'md5': '3da12ebca28c67c111a7f8b262d3f7a7',
+        'md5': 'c0aaf339bcee189495fdf5a8c8ba8645',
         'info_dict': {
             'id': '16965047',
             'ext': 'mp3',
             'title': 'MONA LISA',
             'uploader': 'ALKILADOS',
             'uploader_id': '216429',
-            'thumbnail': 're:^https://gp1\.wac\.edgecastcdn\.net/.*?\.jpg$'
+            'thumbnail': 're:^https?://.*\.jpg',
         },
     }]
 
     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        song_id = mobj.group('id')
+        song_id = self._match_id(url)
 
         api_res = self._download_json(
             'https://api.reverbnation.com/song/%s' % song_id,
@@ -31,14 +31,23 @@ class ReverbNationIE(InfoExtractor):
             note='Downloading information of song %s' % song_id
         )
 
+        THUMBNAILS = ('thumbnail', 'image')
+        quality = qualities(THUMBNAILS)
+        thumbnails = []
+        for thumb_key in THUMBNAILS:
+            if api_res.get(thumb_key):
+                thumbnails.append({
+                    'url': api_res[thumb_key],
+                    'preference': quality(thumb_key)
+                })
+
         return {
             'id': song_id,
-            'title': api_res.get('name'),
-            'url': api_res.get('url'),
+            'title': api_res['name'],
+            'url': api_res['url'],
             'uploader': api_res.get('artist', {}).get('name'),
             'uploader_id': str_or_none(api_res.get('artist', {}).get('id')),
-            'thumbnail': self._proto_relative_url(
-                api_res.get('image', api_res.get('thumbnail'))),
+            'thumbnails': thumbnails,
             'ext': 'mp3',
             'vcodec': 'none',
         }
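
Note: qualities() turns a fixed ordering of keys into ascending numeric preferences, so 'image' outranks 'thumbnail'. A self-contained sketch mirroring the utility's behaviour.

```python
def qualities(quality_ids):
    # index in the ordering = preference; unknown keys rank lowest
    def q(qid):
        try:
            return quality_ids.index(qid)
        except ValueError:
            return -1
    return q

quality = qualities(('thumbnail', 'image'))
print(quality('thumbnail'), quality('image'), quality('banner'))
# -> 0 1 -1
```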

View File

@@ -1,7 +1,5 @@
 from __future__ import unicode_literals
 
-import re
-
 from .common import InfoExtractor
 
 
@@ -9,7 +7,7 @@ class SlutloadIE(InfoExtractor):
     _VALID_URL = r'^https?://(?:\w+\.)?slutload\.com/video/[^/]+/(?P<id>[^/]+)/?$'
     _TEST = {
         'url': 'http://www.slutload.com/video/virginie-baisee-en-cam/TD73btpBqSxc/',
-        'md5': '0cf531ae8006b530bd9df947a6a0df77',
+        'md5': '868309628ba00fd488cf516a113fd717',
         'info_dict': {
             'id': 'TD73btpBqSxc',
             'ext': 'mp4',
@@ -20,9 +18,7 @@ class SlutloadIE(InfoExtractor):
     }
 
     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-
+        video_id = self._match_id(url)
         webpage = self._download_webpage(url, video_id)
 
         video_title = self._html_search_regex(r'<h1><strong>([^<]+)</strong>',
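
Note: _match_id is shorthand for exactly the boilerplate removed here - match _VALID_URL and return the id group.

```python
import re

_VALID_URL = r'^https?://(?:\w+\.)?slutload\.com/video/[^/]+/(?P<id>[^/]+)/?$'
url = 'http://www.slutload.com/video/virginie-baisee-en-cam/TD73btpBqSxc/'
print(re.match(_VALID_URL, url).group('id'))
# -> TD73btpBqSxc
```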

View File

@@ -355,23 +355,28 @@ class VimeoIE(VimeoBaseInfoExtractor):
         return smuggle_url(url, {'http_headers': {'Referer': referrer_url}})
 
     @staticmethod
-    def _extract_vimeo_url(url, webpage):
-        # Look for embedded (iframe) Vimeo player
-        mobj = re.search(
-            r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//player\.vimeo\.com/video/.+?)\1', webpage)
-        if mobj:
-            player_url = unescapeHTML(mobj.group('url'))
-            return VimeoIE._smuggle_referrer(player_url, url)
-        # Look for embedded (swf embed) Vimeo player
-        mobj = re.search(
-            r'<embed[^>]+?src="((?:https?:)?//(?:www\.)?vimeo\.com/moogaloop\.swf.+?)"', webpage)
-        if mobj:
-            return mobj.group(1)
-        # Look more for non-standard embedded Vimeo player
-        mobj = re.search(
-            r'<video[^>]+src=(?P<q1>[\'"])(?P<url>(?:https?:)?//(?:www\.)?vimeo\.com/[0-9]+)(?P=q1)', webpage)
-        if mobj:
-            return mobj.group('url')
+    def _extract_urls(url, webpage):
+        urls = []
+        # Look for embedded (iframe) Vimeo player
+        for mobj in re.finditer(
+                r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//player\.vimeo\.com/video/.+?)\1',
+                webpage):
+            urls.append(VimeoIE._smuggle_referrer(unescapeHTML(mobj.group('url')), url))
+        PLAIN_EMBED_RE = (
+            # Look for embedded (swf embed) Vimeo player
+            r'<embed[^>]+?src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?vimeo\.com/moogaloop\.swf.+?)\1',
+            # Look more for non-standard embedded Vimeo player
+            r'<video[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?vimeo\.com/[0-9]+)\1',
+        )
+        for embed_re in PLAIN_EMBED_RE:
+            for mobj in re.finditer(embed_re, webpage):
+                urls.append(mobj.group('url'))
+        return urls
+
+    @staticmethod
+    def _extract_url(url, webpage):
+        urls = VimeoIE._extract_urls(url, webpage)
+        return urls[0] if urls else None
 
     def _verify_player_video_password(self, url, video_id):
         password = self._downloader.params.get('videopassword')
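
Note: the point of _extract_urls is that re.finditer keeps collecting where re.search stopped at the first hit. Sketch with two made-up embeds.

```python
import re

IFRAME_RE = r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//player\.vimeo\.com/video/.+?)\1'
webpage = ('<iframe src="https://player.vimeo.com/video/111111"></iframe>\n'
           '<iframe src="https://player.vimeo.com/video/222222"></iframe>')
print([m.group('url') for m in re.finditer(IFRAME_RE, webpage)])
# -> both player URLs; re.search would have yielded only the first
```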

View File

@@ -341,7 +341,7 @@ class VKIE(VKBaseIE):
         if youtube_url:
             return self.url_result(youtube_url, 'Youtube')
 
-        vimeo_url = VimeoIE._extract_vimeo_url(url, info_page)
+        vimeo_url = VimeoIE._extract_url(url, info_page)
         if vimeo_url is not None:
             return self.url_result(vimeo_url)

View File

@@ -1,3 +1,3 @@
 from __future__ import unicode_literals
 
-__version__ = '2016.10.02'
+__version__ = '2016.10.07'