Gilles Habran 2016-09-09 14:49:00 +02:00
commit 47ab27c9a5
74 changed files with 1672 additions and 819 deletions

View File

@ -6,8 +6,8 @@
---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.08.31*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with an outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.08.31**
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.09.08*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with an outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.09.08**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2016.08.31
[debug] youtube-dl version 2016.09.08
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

View File

@ -182,3 +182,6 @@ Rob van Bekkum
Petr Zvoníček
Pratyush Singh
Aleksander Nitecki
Sebastian Blunt
Matěj Cepl
Xie Yanbo

View File

@ -1,3 +1,85 @@
version 2016.09.08
Extractors
+ [jwplatform] Extract height from format label
+ [yahoo] Extract Brightcove Legacy Studio embeds (#9345)
* [videomore] Fix extraction (#10592)
* [foxgay] Fix extraction (#10480)
+ [rmcdecouverte] Add extractor for rmcdecouverte.bfmtv.com (#9709)
* [gamestar] Fix metadata extraction (#10479)
* [puls4] Fix extraction (#10583)
+ [cctv] Add extractor for CCTV and CNTV (#8153)
+ [lci] Add extractor for lci.fr (#10573)
+ [wat] Extract DASH formats
+ [viafree] Improve video id detection (#10569)
+ [trutv] Add extractor for trutv.com (#10519)
+ [nick] Add support for nickelodeon.nl (#10559)
+ [abcotvs:clips] Add support for clips.abcotvs.com
+ [abcotvs] Add support for ABC Owned Television Stations sites (#9551)
+ [miaopai] Add extractor for miaopai.com (#10556)
+ [bilibili] Add support for episodes (#10190)
+ [tvnoe] Add extractor for tvnoe.cz (#10524)
version 2016.09.04.1
Core
* In the DASH downloader, abort the whole download process if the first
segment fails, to prevent throttling (#10497)
+ Add support for --skip-unavailable-fragments and --fragment-retries in
hlsnative downloader (#10165, #10448).
+ Add support for --skip-unavailable-fragments in DASH downloader
+ Introduce --skip-unavailable-fragments option for fragment based downloaders
that allows skipping fragments that are unavailable due to an HTTP error
* Fix extraction of video/audio entries with src attribute in
_parse_html5_media_entries (#10540)
Extractors
* [theplatform] Relax URL regular expression (#10546)
* [youtube:playlist] Extend URL regular expression
* [rottentomatoes] Delegate extraction to internetvideoarchive extractor
* [internetvideoarchive] Extract all formats
* [pornvoisines] Fix extraction (#10469)
* [rottentomatoes] Fix extraction (#10467)
* [espn] Extend URL regular expression (#10549)
* [vimple] Extend URL regular expression (#10547)
* [youtube:watchlater] Fix extraction (#10544)
* [youjizz] Fix extraction (#10437)
+ [foxnews] Add support for FoxNews Insider (#10445)
+ [fc2] Recognize Flash player URLs (#10512)
version 2016.09.03
Core
* Restore usage of NAME attribute from EXT-X-MEDIA tag for format codes in
_extract_m3u8_formats (#10522)
* Handle semicolon in mimetype2ext
Extractors
+ [youtube] Add support for rental videos' previews (#10532)
* [youtube:playlist] Fallback to video extraction for video/playlist URLs when
no playlist is actually served (#10537)
+ [drtv] Add support for dr.dk/nyheder (#10536)
+ [facebook:plugins:video] Add extractor (#10530)
+ [go] Add extractor for *.go.com sites
* [adobepass] Check for authz_token expiration (#10527)
* [nytimes] Improve extraction
* [thestar] Fix extraction (#10465)
* [glide] Fix extraction (#10478)
- [exfm] Remove extractor (#10482)
* [youporn] Fix categories and tags extraction (#10521)
+ [curiositystream] Add extractor for app.curiositystream.com
- [thvideo] Remove extractor (#10464)
* [movingimage] Fix for the new site name (#10466)
+ [cbs] Add support for once formats (#10515)
* [limelight] Skip ism and duplicate manifests
+ [porncom] Extract categories and tags (#10510)
+ [facebook] Extract timestamp (#10508)
+ [yahoo] Extract more formats
version 2016.08.31
Extractors

View File

@ -89,6 +89,8 @@ which means you can modify it, redistribute it or use it however you like.
--mark-watched Mark videos watched (YouTube only)
--no-mark-watched Do not mark videos watched (YouTube only)
--no-color Do not emit color codes in output
--abort-on-unavailable-fragment Abort downloading when some fragment is not
available
## Network Options:
--proxy URL Use the specified HTTP/HTTPS/SOCKS proxy.
@ -173,7 +175,10 @@ which means you can modify it, redistribute it or use it however you like.
-R, --retries RETRIES Number of retries (default is 10), or
"infinite".
--fragment-retries RETRIES Number of retries for a fragment (default
is 10), or "infinite" (DASH only)
is 10), or "infinite" (DASH and hlsnative
only)
--skip-unavailable-fragments Skip unavailable fragments (DASH and
hlsnative only)
--buffer-size SIZE Size of download buffer (e.g. 1024 or 16K)
(default is 1024)
--no-resize-buffer Do not automatically adjust the buffer
@ -846,6 +851,16 @@ will download the complete `PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re` playlist and cre
youtube-dl --download-archive archive.txt "https://www.youtube.com/playlist?list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re"
### Should I add `--hls-prefer-native` into my config?
When youtube-dl detects an HLS video, it can download it either with the built-in downloader or ffmpeg. Since many HLS streams are slightly invalid and ffmpeg/youtube-dl each handle some invalid cases better than the other, there is an option to switch the downloader if needed.
When youtube-dl knows that one particular downloader works better for a given website, that downloader will be picked. Otherwise, youtube-dl will pick the best downloader for general compatibility, which at the moment happens to be ffmpeg. This choice may change in future versions of youtube-dl, with improvements of the built-in downloader and/or ffmpeg.
In particular, the generic extractor (used when your website is not in the [list of supported sites by youtube-dl](http://rg3.github.io/youtube-dl/supportedsites.html)) cannot mandate one specific downloader.
If you put either `--hls-prefer-native` or `--hls-prefer-ffmpeg` into your configuration, a different subset of videos will fail to download correctly. Instead, it is much better to [file an issue](https://yt-dl.org/bug) or a pull request which details why the native or the ffmpeg HLS downloader is a better choice for your use case.
### Can you add support for this anime video site, or site which shows current movies for free?
As a matter of policy (as well as legality), youtube-dl does not include support for services that specialize in infringing copyright. As a rule of thumb, if you cannot easily find a video that the service is quite obviously allowed to distribute (i.e. that has been uploaded by the creator, the creator's distributor, or is published under a free license), the service is probably unfit for inclusion in youtube-dl.

View File

@ -19,9 +19,10 @@
- **9now.com.au**
- **abc.net.au**
- **abc.net.au:iview**
- **Abc7News**
- **abcnews**
- **abcnews:video**
- **abcotvs**: ABC Owned Television Stations
- **abcotvs:clips**
- **AcademicEarth:Course**
- **acast**
- **acast:channel**
@ -128,6 +129,7 @@
- **CBSNews**: CBS News
- **CBSNewsLiveVideo**: CBS News Live Videos
- **CBSSports**
- **CCTV**
- **CDA**
- **CeskaTelevize**
- **channel9**: Channel 9
@ -171,6 +173,8 @@
- **CTVNews**
- **culturebox.francetvinfo.fr**
- **CultureUnplugged**
- **curiositystream**
- **curiositystream:collection**
- **CWTV**
- **DailyMail**
- **dailymotion**
@ -223,13 +227,14 @@
- **EsriVideo**
- **Europa**
- **EveryonesMixtape**
- **exfm**: ex.fm
- **ExpoTV**
- **ExtremeTube**
- **EyedoTV**
- **facebook**
- **FacebookPluginsVideo**
- **faz.net**
- **fc2**
- **fc2:embed**
- **Fczenit**
- **features.aol.com**
- **fernsehkritik.tv**
@ -243,6 +248,7 @@
- **FOX**
- **Foxgay**
- **FoxNews**: Fox News and Fox Business Video
- **foxnews:insider**
- **FoxSports**
- **france2.fr:generation-quoi**
- **FranceCulture**
@ -271,6 +277,7 @@
- **Glide**: Glide mobile video messages (glide.me)
- **Globo**
- **GloboArticle**
- **Go**
- **GodTube**
- **GodTV**
- **Golem**
@ -347,6 +354,7 @@
- **kuwo:song**: 酷我音乐
- **la7.it**
- **Laola1Tv**
- **LCI**
- **Lcp**
- **LcpPlay**
- **Le**: 乐视网
@ -385,6 +393,7 @@
- **Metacritic**
- **Mgoon**
- **MGTV**: 芒果TV
- **MiaoPai**
- **Minhateca**
- **MinistryGrid**
- **Minoto**
@ -406,6 +415,7 @@
- **MovieClips**
- **MovieFap**
- **Moviezine**
- **MovingImage**
- **MPORA**
- **MSN**
- **mtg**: MTG services
@ -570,6 +580,7 @@
- **revision3:embed**
- **RICE**
- **RingTV**
- **RMCDecouverte**
- **RockstarGames**
- **RoosterTeeth**
- **RottenTomatoes**
@ -659,7 +670,6 @@
- **sr:mediathek**: Saarländischer Rundfunk
- **SRGSSR**
- **SRGSSRPlay**: srf.ch, rts.ch, rsi.ch, rtr.ch and swissinfo.ch play sites
- **SSA**
- **stanfordoc**: Stanford Open ClassRoom
- **Steam**
- **Stitcher**
@ -702,8 +712,6 @@
- **TheStar**
- **ThisAmericanLife**
- **ThisAV**
- **THVideo**
- **THVideoPlaylist**
- **tinypic**: tinypic.com videos
- **tlc.de**
- **TMZ**
@ -718,6 +726,7 @@
- **TrailerAddict** (Currently broken)
- **Trilulilu**
- **trollvids**
- **TruTV**
- **Tube8**
- **TubiTv**
- **tudou**
@ -739,6 +748,7 @@
- **TVCArticle**
- **tvigle**: Интернет-телевидение Tvigle.ru
- **tvland.com**
- **TVNoe**
- **tvp**: Telewizja Polska
- **tvp:embed**: Telewizja Polska
- **tvp:series**

View File

@ -39,6 +39,7 @@ from youtube_dl.utils import (
is_html,
js_to_json,
limit_length,
mimetype2ext,
ohdave_rsa_encrypt,
OnDemandPagedList,
orderedSet,
@ -625,6 +626,14 @@ class TestUtil(unittest.TestCase):
limit_length('foo bar baz asd', 12).startswith('foo bar'))
self.assertTrue('...' in limit_length('foo bar baz asd', 12))
def test_mimetype2ext(self):
self.assertEqual(mimetype2ext(None), None)
self.assertEqual(mimetype2ext('video/x-flv'), 'flv')
self.assertEqual(mimetype2ext('application/x-mpegURL'), 'm3u8')
self.assertEqual(mimetype2ext('text/vtt'), 'vtt')
self.assertEqual(mimetype2ext('text/vtt;charset=utf-8'), 'vtt')
self.assertEqual(mimetype2ext('text/html; charset=utf-8'), 'html')
def test_parse_codecs(self):
self.assertEqual(parse_codecs(''), {})
self.assertEqual(parse_codecs('avc1.77.30, mp4a.40.2'), {
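
The new `test_mimetype2ext` cases above pin down the behaviour for MIME types that carry parameters after a semicolon. A minimal sketch of that behaviour follows; `mimetype_to_ext` and `KNOWN_SUBTYPES` are hypothetical names, and the two-entry table is an assumption for illustration, not the mapping used by `youtube_dl.utils.mimetype2ext`.

```python
# Simplified sketch only; the real helper lives in youtube_dl/utils.py.
# KNOWN_SUBTYPES is a hypothetical, deliberately partial mapping.
KNOWN_SUBTYPES = {
    'x-flv': 'flv',
    'x-mpegurl': 'm3u8',
}


def mimetype_to_ext(mt):
    """Map a MIME type such as 'text/vtt;charset=utf-8' to a file extension."""
    if mt is None:
        return None
    # Drop parameters such as ";charset=utf-8" before looking at the subtype.
    mt = mt.split(';')[0].strip()
    # Keep only the part after the slash and normalise its case.
    subtype = mt.rpartition('/')[2].lower()
    # Fall back to the subtype itself when no special mapping is known.
    return KNOWN_SUBTYPES.get(subtype, subtype)
```

Splitting on the semicolon first means the lookup never sees the charset parameter, which is the case the two new `text/vtt;charset=utf-8` and `text/html; charset=utf-8` assertions exercise.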

View File

@ -318,6 +318,7 @@ def _real_main(argv=None):
'nooverwrites': opts.nooverwrites,
'retries': opts.retries,
'fragment_retries': opts.fragment_retries,
'skip_unavailable_fragments': opts.skip_unavailable_fragments,
'buffersize': opts.buffersize,
'noresizebuffer': opts.noresizebuffer,
'continuedl': opts.continue_dl,

View File

@ -38,8 +38,10 @@ class DashSegmentsFD(FragmentFD):
segments_filenames = []
fragment_retries = self.params.get('fragment_retries', 0)
skip_unavailable_fragments = self.params.get('skip_unavailable_fragments', True)
def append_url_to_file(target_url, tmp_filename, segment_name):
def process_segment(segment, tmp_filename, fatal):
target_url, segment_name = segment
target_filename = '%s-%s' % (tmp_filename, segment_name)
count = 0
while count <= fragment_retries:
@ -52,26 +54,35 @@ class DashSegmentsFD(FragmentFD):
down.close()
segments_filenames.append(target_sanitized)
break
except (compat_urllib_error.HTTPError, ) as err:
except compat_urllib_error.HTTPError as err:
# YouTube may often return a 404 HTTP error for a fragment, causing the
# whole download to fail. However, if the same fragment is immediately
# retried with the same request data, this usually succeeds (1-2 attempts
# is usually enough), allowing the whole file to be downloaded successfully.
# So, we will retry all fragments that fail with 404 HTTP error for now.
if err.code != 404:
raise
# Retry fragment
# To be future-proof we will retry all fragments that fail with any
# HTTP error.
count += 1
if count <= fragment_retries:
self.report_retry_fragment(segment_name, count, fragment_retries)
self.report_retry_fragment(err, segment_name, count, fragment_retries)
if count > fragment_retries:
if not fatal:
self.report_skip_fragment(segment_name)
return True
self.report_error('giving up after %s fragment retries' % fragment_retries)
return False
return True
if initialization_url:
append_url_to_file(initialization_url, ctx['tmpfilename'], 'Init')
for i, segment_url in enumerate(segment_urls):
append_url_to_file(segment_url, ctx['tmpfilename'], 'Seg%d' % i)
segments_to_download = [(initialization_url, 'Init')] if initialization_url else []
segments_to_download.extend([
(segment_url, 'Seg%d' % i)
for i, segment_url in enumerate(segment_urls)])
for i, segment in enumerate(segments_to_download):
# In DASH, the first segment contains necessary headers to
# generate a valid MP4 file, so always abort for the first segment
fatal = i == 0 or not skip_unavailable_fragments
if not process_segment(segment, ctx['tmpfilename'], fatal):
return False
self._finish_frag_download(ctx)
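
The control flow introduced in this hunk (retry each segment, then either skip it or abort, with the first segment always fatal) can be sketched outside the downloader as follows. This is an illustration under stated assumptions: `FragmentUnavailable`, `download_fragments` and `make_fetch` are hypothetical names, and the exception stands in for the HTTP error caught in the real code.

```python
class FragmentUnavailable(Exception):
    """Hypothetical stand-in for an HTTP error raised while fetching."""


def download_fragments(names, fetch, retries=2, skip_unavailable=True):
    """Return (ok, downloaded, skipped) after trying every fragment in order."""
    downloaded, skipped = [], []
    for index, name in enumerate(names):
        # The first DASH segment carries the headers needed for a valid
        # file, so a failure there is always fatal.
        fatal = index == 0 or not skip_unavailable
        attempt = 0
        while True:
            try:
                fetch(name)
                downloaded.append(name)
                break
            except FragmentUnavailable:
                attempt += 1
                if attempt > retries:
                    if fatal:
                        return False, downloaded, skipped
                    skipped.append(name)
                    break
    return True, downloaded, skipped


def make_fetch(failures):
    """Build a fake fetcher that fails a given number of times per fragment."""
    remaining = dict(failures)

    def fetch(name):
        if remaining.get(name, 0) > 0:
            remaining[name] -= 1
            raise FragmentUnavailable(name)

    return fetch
```

With `retries=2` each fragment gets three attempts in total, mirroring the `while count <= fragment_retries` loop in the diff.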

View File

@ -6,6 +6,7 @@ import time
from .common import FileDownloader
from .http import HttpFD
from ..utils import (
error_to_compat_str,
encodeFilename,
sanitize_open,
)
@ -22,13 +23,19 @@ class FragmentFD(FileDownloader):
Available options:
fragment_retries: Number of times to retry a fragment for HTTP error (DASH only)
fragment_retries: Number of times to retry a fragment for HTTP error (DASH
and hlsnative only)
skip_unavailable_fragments:
Skip unavailable fragments (DASH and hlsnative only)
"""
def report_retry_fragment(self, fragment_name, count, retries):
def report_retry_fragment(self, err, fragment_name, count, retries):
self.to_screen(
'[download] Got server HTTP error. Retrying fragment %s (attempt %d of %s)...'
% (fragment_name, count, self.format_retries(retries)))
'[download] Got server HTTP error: %s. Retrying fragment %s (attempt %d of %s)...'
% (error_to_compat_str(err), fragment_name, count, self.format_retries(retries)))
def report_skip_fragment(self, fragment_name):
self.to_screen('[download] Skipping fragment %s...' % fragment_name)
def _prepare_and_start_frag_download(self, ctx):
self._prepare_frag_download(ctx)

View File

@ -13,6 +13,7 @@ from .fragment import FragmentFD
from .external import FFmpegFD
from ..compat import (
compat_urllib_error,
compat_urlparse,
compat_struct_pack,
)
@ -83,6 +84,10 @@ class HlsFD(FragmentFD):
self._prepare_and_start_frag_download(ctx)
fragment_retries = self.params.get('fragment_retries', 0)
skip_unavailable_fragments = self.params.get('skip_unavailable_fragments', True)
test = self.params.get('test', False)
extra_query = None
extra_param_to_segment_url = info_dict.get('extra_param_to_segment_url')
if extra_param_to_segment_url:
@ -99,15 +104,37 @@ class HlsFD(FragmentFD):
line
if re.match(r'^https?://', line)
else compat_urlparse.urljoin(man_url, line))
frag_filename = '%s-Frag%d' % (ctx['tmpfilename'], i)
frag_name = 'Frag%d' % i
frag_filename = '%s-%s' % (ctx['tmpfilename'], frag_name)
if extra_query:
frag_url = update_url_query(frag_url, extra_query)
success = ctx['dl'].download(frag_filename, {'url': frag_url})
if not success:
count = 0
while count <= fragment_retries:
try:
success = ctx['dl'].download(frag_filename, {'url': frag_url})
if not success:
return False
down, frag_sanitized = sanitize_open(frag_filename, 'rb')
frag_content = down.read()
down.close()
break
except compat_urllib_error.HTTPError as err:
# Unavailable (possibly temporarily) fragments may be served.
# First we retry, then either skip or abort.
# See https://github.com/rg3/youtube-dl/issues/10165 and
# https://github.com/rg3/youtube-dl/issues/10448.
count += 1
if count <= fragment_retries:
self.report_retry_fragment(err, frag_name, count, fragment_retries)
if count > fragment_retries:
if skip_unavailable_fragments:
i += 1
media_sequence += 1
self.report_skip_fragment(frag_name)
continue
self.report_error(
'giving up after %s fragment retries' % fragment_retries)
return False
down, frag_sanitized = sanitize_open(frag_filename, 'rb')
frag_content = down.read()
down.close()
if decrypt_info['METHOD'] == 'AES-128':
iv = decrypt_info.get('IV') or compat_struct_pack('>8xq', media_sequence)
frag_content = AES.new(
@ -115,7 +142,7 @@ class HlsFD(FragmentFD):
ctx['dest_stream'].write(frag_content)
frags_filenames.append(frag_sanitized)
# We only download the first fragment during the test
if self.params.get('test', False):
if test:
break
i += 1
media_sequence += 1
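
One detail that makes the `media_sequence += 1` on skip matter: when an AES-128 key carries no explicit IV, the hunk above derives the IV from the media sequence number with the struct format `'>8xq'`. A small sketch of that derivation using the standard `struct` module directly (the function name `default_aes128_iv` is hypothetical):

```python
import struct


def default_aes128_iv(media_sequence):
    """Build the 16-byte IV used when the key tag supplies no IV attribute."""
    # '>8xq' means: big-endian, 8 zero pad bytes, then a signed 64-bit
    # integer, which yields the sequence number as a 128-bit value.
    return struct.pack('>8xq', media_sequence)
```

If a skipped fragment did not advance the counter, every later fragment would be decrypted with the wrong IV, which is presumably why the skip branch increments both `i` and `media_sequence` before continuing.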

View File

@ -12,7 +12,7 @@ from ..compat import compat_urlparse
class AbcNewsVideoIE(AMPIE):
IE_NAME = 'abcnews:video'
_VALID_URL = 'http://abcnews.go.com/[^/]+/video/(?P<display_id>[0-9a-z-]+)-(?P<id>\d+)'
_VALID_URL = r'https?://abcnews\.go\.com/[^/]+/video/(?P<display_id>[0-9a-z-]+)-(?P<id>\d+)'
_TESTS = [{
'url': 'http://abcnews.go.com/ThisWeek/video/week-exclusive-irans-foreign-minister-zarif-20411932',
@ -49,7 +49,7 @@ class AbcNewsVideoIE(AMPIE):
class AbcNewsIE(InfoExtractor):
IE_NAME = 'abcnews'
_VALID_URL = 'https?://abcnews\.go\.com/(?:[^/]+/)+(?P<display_id>[0-9a-z-]+)/story\?id=(?P<id>\d+)'
_VALID_URL = r'https?://abcnews\.go\.com/(?:[^/]+/)+(?P<display_id>[0-9a-z-]+)/story\?id=(?P<id>\d+)'
_TESTS = [{
'url': 'http://abcnews.go.com/Blotter/News/dramatic-video-rare-death-job-america/story?id=10498713#.UIhwosWHLjY',
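
The `_VALID_URL` changes in this file switch to raw strings, escape the dots and accept both schemes. A quick sketch of what the new video pattern captures, using only the `re` module; `parse_video_url` is a hypothetical helper and not part of the extractor:

```python
import re

# Pattern copied from the new AbcNewsVideoIE._VALID_URL line above.
VALID_URL = r'https?://abcnews\.go\.com/[^/]+/video/(?P<display_id>[0-9a-z-]+)-(?P<id>\d+)'


def parse_video_url(url):
    """Return the named groups for a matching URL, or None otherwise."""
    m = re.match(VALID_URL, url)
    return m.groupdict() if m else None
```

Because `[0-9a-z-]+` is greedy, the numeric id is whatever follows the last hyphen; with the escaped dots, a host such as `abcnewsxgo.com` no longer matches.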

View File

@ -1,13 +1,19 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import parse_iso8601
from ..utils import (
int_or_none,
parse_iso8601,
)
class Abc7NewsIE(InfoExtractor):
_VALID_URL = r'https?://abc7news\.com(?:/[^/]+/(?P<display_id>[^/]+))?/(?P<id>\d+)'
class ABCOTVSIE(InfoExtractor):
IE_NAME = 'abcotvs'
IE_DESC = 'ABC Owned Television Stations'
_VALID_URL = r'https?://(?:abc(?:7(?:news|ny|chicago)?|11|13|30)|6abc)\.com(?:/[^/]+/(?P<display_id>[^/]+))?/(?P<id>\d+)'
_TESTS = [
{
'url': 'http://abc7news.com/entertainment/east-bay-museum-celebrates-vintage-synthesizers/472581/',
@ -15,7 +21,7 @@ class Abc7NewsIE(InfoExtractor):
'id': '472581',
'display_id': 'east-bay-museum-celebrates-vintage-synthesizers',
'ext': 'mp4',
'title': 'East Bay museum celebrates history of synthesized music',
'title': 'East Bay museum celebrates vintage synthesizers',
'description': 'md5:a4f10fb2f2a02565c1749d4adbab4b10',
'thumbnail': 're:^https?://.*\.jpg$',
'timestamp': 1421123075,
@ -41,7 +47,7 @@ class Abc7NewsIE(InfoExtractor):
webpage = self._download_webpage(url, display_id)
m3u8 = self._html_search_meta(
'contentURL', webpage, 'm3u8 url', fatal=True)
'contentURL', webpage, 'm3u8 url', fatal=True).split('?')[0]
formats = self._extract_m3u8_formats(m3u8, display_id, 'mp4')
self._sort_formats(formats)
@ -66,3 +72,41 @@ class Abc7NewsIE(InfoExtractor):
'uploader': uploader,
'formats': formats,
}
class ABCOTVSClipsIE(InfoExtractor):
IE_NAME = 'abcotvs:clips'
_VALID_URL = r'https?://clips\.abcotvs\.com/(?:[^/]+/)*video/(?P<id>\d+)'
_TEST = {
'url': 'https://clips.abcotvs.com/kabc/video/214814',
'info_dict': {
'id': '214814',
'ext': 'mp4',
'title': 'SpaceX launch pad explosion destroys rocket, satellite',
'description': 'md5:9f186e5ad8f490f65409965ee9c7be1b',
'upload_date': '20160901',
'timestamp': 1472756695,
},
'params': {
# m3u8 download
'skip_download': True,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
video_data = self._download_json('https://clips.abcotvs.com/vogo/video/getByIds?ids=' + video_id, video_id)['results'][0]
title = video_data['title']
formats = self._extract_m3u8_formats(
video_data['videoURL'].split('?')[0], video_id, 'mp4')
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': video_data.get('description'),
'thumbnail': video_data.get('thumbnailURL'),
'duration': int_or_none(video_data.get('duration')),
'timestamp': int_or_none(video_data.get('pubDate')),
'formats': formats,
}

View File

@ -37,6 +37,10 @@ class AdobePassIE(InfoExtractor):
return self._search_regex(
'<%s>(.+?)</%s>' % (tag, tag), xml_str, tag)
def is_expired(token, date_ele):
token_expires = unified_timestamp(re.sub(r'[_ ]GMT', '', xml_text(token, date_ele)))
return token_expires and token_expires <= int(time.time())
mvpd_headers = {
'ap_42': 'anonymous',
'ap_11': 'Linux i686',
@ -47,11 +51,8 @@ class AdobePassIE(InfoExtractor):
guid = xml_text(resource, 'guid')
requestor_info = self._downloader.cache.load('mvpd', requestor_id) or {}
authn_token = requestor_info.get('authn_token')
if authn_token:
token_expires = unified_timestamp(re.sub(r'[_ ]GMT', '', xml_text(authn_token, 'simpleTokenExpires')))
if token_expires and token_expires <= int(time.time()):
authn_token = None
requestor_info = {}
if authn_token and is_expired(authn_token, 'simpleTokenExpires'):
authn_token = None
if not authn_token:
# TODO add support for other TV Providers
mso_id = 'DTV'
@ -98,6 +99,8 @@ class AdobePassIE(InfoExtractor):
self._downloader.cache.store('mvpd', requestor_id, requestor_info)
authz_token = requestor_info.get(guid)
if authz_token and is_expired(authz_token, 'simpleTokenTTL'):
authz_token = None
if not authz_token:
authorize = self._download_webpage(
self._SERVICE_PROVIDER_TEMPLATE % 'authorize', video_id,

View File

@ -238,7 +238,7 @@ class ARDMediathekIE(InfoExtractor):
class ARDIE(InfoExtractor):
_VALID_URL = '(?P<mainurl>https?://(www\.)?daserste\.de/[^?#]+/videos/(?P<display_id>[^/?#]+)-(?P<id>[0-9]+))\.html'
_VALID_URL = r'(?P<mainurl>https?://(www\.)?daserste\.de/[^?#]+/videos/(?P<display_id>[^/?#]+)-(?P<id>[0-9]+))\.html'
_TEST = {
'url': 'http://www.daserste.de/information/reportage-dokumentation/dokus/videos/die-story-im-ersten-mission-unter-falscher-flagge-100.html',
'md5': 'd216c3a86493f9322545e045ddc3eb35',

View File

@ -10,11 +10,12 @@ from ..utils import (
int_or_none,
float_or_none,
unified_timestamp,
urlencode_postdata,
)
class BiliBiliIE(InfoExtractor):
_VALID_URL = r'https?://www\.bilibili\.(?:tv|com)/video/av(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.|bangumi\.|)bilibili\.(?:tv|com)/(?:video/av|anime/v/)(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.bilibili.tv/video/av1074402/',
@ -77,6 +78,17 @@ class BiliBiliIE(InfoExtractor):
'skip_download': True,
},
'expected_warnings': ['upload time'],
}, {
'url': 'http://bangumi.bilibili.com/anime/v/40068',
'md5': '08d539a0884f3deb7b698fb13ba69696',
'info_dict': {
'id': '40068',
'ext': 'mp4',
'duration': 1402.357,
'title': '混沌武士 : 第7集 四面楚歌 A Risky Racket',
'description': 'md5:6a9622b911565794c11f25f81d6a97d2',
'thumbnail': 're:^https?://.+\.jpg',
},
}]
_APP_KEY = '6f90a59ac58a4123'
@ -84,13 +96,19 @@ class BiliBiliIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
cid = compat_parse_qs(self._search_regex(
[r'EmbedPlayer\([^)]+,\s*"([^"]+)"\)',
r'<iframe[^>]+src="https://secure\.bilibili\.com/secure,([^"]+)"'],
webpage, 'player parameters'))['cid'][0]
if 'anime/v' not in url:
cid = compat_parse_qs(self._search_regex(
[r'EmbedPlayer\([^)]+,\s*"([^"]+)"\)',
r'<iframe[^>]+src="https://secure\.bilibili\.com/secure,([^"]+)"'],
webpage, 'player parameters'))['cid'][0]
else:
js = self._download_json(
'http://bangumi.bilibili.com/web_api/get_source', video_id,
data=urlencode_postdata({'episode_id': video_id}),
headers={'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8'})
cid = js['result']['cid']
payload = 'appkey=%s&cid=%s&otype=json&quality=2&type=mp4' % (self._APP_KEY, cid)
sign = hashlib.md5((payload + self._BILIBILI_KEY).encode('utf-8')).hexdigest()
@ -125,6 +143,7 @@ class BiliBiliIE(InfoExtractor):
description = self._html_search_meta('description', webpage)
timestamp = unified_timestamp(self._html_search_regex(
r'<time[^>]+datetime="([^"]+)"', webpage, 'upload time', fatal=False))
thumbnail = self._html_search_meta(['og:image', 'thumbnailUrl'], webpage)
# TODO 'view_count' requires deobfuscating Javascript
info = {
@ -132,7 +151,7 @@ class BiliBiliIE(InfoExtractor):
'title': title,
'description': description,
'timestamp': timestamp,
'thumbnail': self._html_search_meta('thumbnailUrl', webpage),
'thumbnail': thumbnail,
'duration': float_or_none(video_info.get('timelength'), scale=1000),
}
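
Both the regular and the new `anime/v` code paths end in the same signing step: the query payload is concatenated with a secret key and hashed with MD5. A self-contained sketch of that step follows. `sign_payload` and `build_signed_query` are hypothetical names, the key values used with them are placeholders, and appending the digest as a `sign` parameter is an assumption, since that part of the request is not shown in this hunk.

```python
import hashlib


def sign_payload(payload, secret):
    """Return the hex MD5 digest of the payload followed by the secret."""
    return hashlib.md5((payload + secret).encode('utf-8')).hexdigest()


def build_signed_query(app_key, cid, secret):
    """Build the query string and append its signature (assumed layout)."""
    payload = 'appkey=%s&cid=%s&otype=json&quality=2&type=mp4' % (app_key, cid)
    return '%s&sign=%s' % (payload, sign_payload(payload, secret))
```

Since the signature covers `cid`, the only thing the new branch has to change is how `cid` is obtained; the rest of the request stays identical.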

View File

@ -30,7 +30,7 @@ class CartoonNetworkIE(TurnerBaseIE):
return self._extract_cvp_info(
'http://www.cartoonnetwork.com/video-seo-svc/episodeservices/getCvpPlaylist?networkName=CN2&' + query, video_id, {
'secure': {
'media_src': 'http://apple-secure.cdn.turner.com/toon/big',
'media_src': 'http://androidhls-secure.cdn.turner.com/toon/big',
'tokenizer_src': 'http://www.cartoonnetwork.com/cntv/mvpd/processors/services/token_ipadAdobe.do',
},
})

View File

@ -51,7 +51,7 @@ class CBSIE(CBSBaseIE):
path = 'dJ5BDC/media/guid/2198311517/' + guid
smil_url = 'http://link.theplatform.com/s/%s?mbr=true' % path
formats, subtitles = self._extract_theplatform_smil(smil_url + '&manifest=m3u', guid)
for r in ('HLS&formats=M3U', 'RTMP', 'WIFI', '3G'):
for r in ('OnceURL&formats=M3U', 'HLS&formats=M3U', 'RTMP', 'WIFI', '3G'):
try:
tp_formats, _ = self._extract_theplatform_smil(smil_url + '&assetTypes=' + r, guid, 'Downloading %s SMIL data' % r.split('&')[0])
formats.extend(tp_formats)

View File

@ -0,0 +1,53 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import float_or_none
class CCTVIE(InfoExtractor):
_VALID_URL = r'''(?x)https?://(?:.+?\.)?
(?:
cctv\.(?:com|cn)|
cntv\.cn
)/
(?:
video/[^/]+/(?P<id>[0-9a-f]{32})|
\d{4}/\d{2}/\d{2}/(?P<display_id>VID[0-9A-Za-z]+)
)'''
_TESTS = [{
'url': 'http://english.cntv.cn/2016/09/03/VIDEhnkB5y9AgHyIEVphCEz1160903.shtml',
'md5': '819c7b49fc3927d529fb4cd555621823',
'info_dict': {
'id': '454368eb19ad44a1925bf1eb96140a61',
'ext': 'mp4',
'title': 'Portrait of Real Current Life 09/03/2016 Modern Inventors Part 1',
}
}, {
'url': 'http://tv.cctv.com/2016/09/07/VIDE5C1FnlX5bUywlrjhxXOV160907.shtml',
'only_matching': True,
}, {
'url': 'http://tv.cntv.cn/video/C39296/95cfac44cabd3ddc4a9438780a4e5c44',
'only_matching': True
}]
def _real_extract(self, url):
video_id, display_id = re.match(self._VALID_URL, url).groups()
if not video_id:
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
r'(?:fo\.addVariable\("videoCenterId",\s*|guid\s*=\s*)"([0-9a-f]{32})',
webpage, 'video_id')
api_data = self._download_json(
'http://vdn.apps.cntv.cn/api/getHttpVideoInfo.do?pid=' + video_id, video_id)
m3u8_url = re.sub(r'maxbr=\d+&?', '', api_data['hls_url'])
return {
'id': video_id,
'title': api_data['title'],
'formats': self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native', fatal=False),
'duration': float_or_none(api_data.get('video', {}).get('totalLength')),
}
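
The new extractor relies on a verbose regular expression with two alternative capture groups and on stripping the `maxbr` parameter from the HLS URL. A short sketch of both pieces with the `re` module alone; `match_ids` and `strip_max_bitrate` are hypothetical helper names, and the `example.com` URLs used with them are placeholders.

```python
import re

# Pattern copied from CCTVIE._VALID_URL above; (?x) lets it span lines.
VALID_URL = r'''(?x)https?://(?:.+?\.)?
    (?:
        cctv\.(?:com|cn)|
        cntv\.cn
    )/
    (?:
        video/[^/]+/(?P<id>[0-9a-f]{32})|
        \d{4}/\d{2}/\d{2}/(?P<display_id>VID[0-9A-Za-z]+)
    )'''


def match_ids(url):
    """Return (video_id, display_id); exactly one is set for a valid URL."""
    m = re.match(VALID_URL, url)
    return m.groups() if m else None


def strip_max_bitrate(url):
    """Remove a maxbr=<digits> query parameter, as the extractor does."""
    return re.sub(r'maxbr=\d+&?', '', url)
```

Only one of the two groups can match, which is why `_real_extract` falls back to downloading the page when `video_id` is empty.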

View File

@ -1163,13 +1163,6 @@ class InfoExtractor(object):
m3u8_id=None, note=None, errnote=None,
fatal=True, live=False):
formats = [self._m3u8_meta_format(m3u8_url, ext, preference, m3u8_id)]
format_url = lambda u: (
u
if re.match(r'^https?://', u)
else compat_urlparse.urljoin(m3u8_url, u))
res = self._download_webpage_handle(
m3u8_url, video_id,
note=note or 'Downloading m3u8 information',
@ -1180,6 +1173,13 @@ class InfoExtractor(object):
m3u8_doc, urlh = res
m3u8_url = urlh.geturl()
formats = [self._m3u8_meta_format(m3u8_url, ext, preference, m3u8_id)]
format_url = lambda u: (
u
if re.match(r'^https?://', u)
else compat_urlparse.urljoin(m3u8_url, u))
# We should try extracting formats only from master playlists [1], i.e.
# playlists that describe available qualities. On the other hand media
# playlists [2] should be returned as is since they contain just the media
@ -1201,7 +1201,8 @@ class InfoExtractor(object):
'protocol': entry_protocol,
'preference': preference,
}]
last_info = None
last_info = {}
last_media = {}
for line in m3u8_doc.splitlines():
if line.startswith('#EXT-X-STREAM-INF:'):
last_info = parse_m3u8_attributes(line)
@ -1224,23 +1225,24 @@ class InfoExtractor(object):
'protocol': entry_protocol,
'preference': preference,
})
else:
# When there is no URI in EXT-X-MEDIA let this tag's
# data be used by regular URI lines below
last_media = media
elif line.startswith('#') or not line.strip():
continue
else:
if last_info is None:
formats.append({'url': format_url(line)})
continue
tbr = int_or_none(last_info.get('AVERAGE-BANDWIDTH') or last_info.get('BANDWIDTH'), scale=1000)
format_id = []
if m3u8_id:
format_id.append(m3u8_id)
# Although the specification does not mention a NAME attribute for
# EXT-X-STREAM-INF, it may still sometimes be present
stream_name = last_info.get('NAME') or last_media.get('NAME')
# The bandwidth of live streams may vary over time, making
# format_id unpredictable, so it is better to keep the provided
# format_id intact.
if not live:
# Although the specification does not mention a NAME attribute for
# EXT-X-STREAM-INF, it may still sometimes be present
stream_name = last_info.get('NAME')
format_id.append(stream_name if stream_name else '%d' % (tbr if tbr else len(formats)))
f = {
'format_id': '-'.join(format_id),
@ -1269,6 +1271,7 @@ class InfoExtractor(object):
f.update(parse_codecs(last_info.get('CODECS')))
formats.append(f)
last_info = {}
last_media = {}
return formats
@staticmethod
@ -1746,7 +1749,7 @@ class InfoExtractor(object):
media_attributes = extract_attributes(media_tag)
src = media_attributes.get('src')
if src:
_, formats = _media_formats(src)
_, formats = _media_formats(src, media_type)
media_info['formats'].extend(formats)
media_info['thumbnail'] = media_attributes.get('poster')
if media_content:
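
The `last_info` / `last_media` bookkeeping in the `_extract_m3u8_formats` hunks decides how each variant's `format_id` is built. The sketch below is a heavily simplified illustration of that naming rule only. `parse_attributes` and `format_ids` are hypothetical names, `EXT-X-MEDIA` tags that do carry a `URI` are ignored here, and the fallback counter differs from the real code, which counts formats including the meta entry.

```python
import re


def parse_attributes(line):
    """Parse KEY=value and KEY="value" pairs from an m3u8 tag line."""
    attrs = {}
    for key, val in re.findall(r'([A-Z0-9-]+)=("[^"]*"|[^",]+)', line):
        attrs[key] = val.strip('"')
    return attrs


def format_ids(m3u8_doc, m3u8_id=None, live=False):
    """Return the format_id each variant URI line would receive."""
    ids = []
    last_info, last_media = {}, {}
    for line in m3u8_doc.splitlines():
        if line.startswith('#EXT-X-STREAM-INF:'):
            last_info = parse_attributes(line)
        elif line.startswith('#EXT-X-MEDIA:'):
            media = parse_attributes(line)
            # Without a URI, the tag describes the next regular URI line.
            if 'URI' not in media:
                last_media = media
        elif line.startswith('#') or not line.strip():
            continue
        else:
            bandwidth = last_info.get('AVERAGE-BANDWIDTH') or last_info.get('BANDWIDTH')
            tbr = int(bandwidth) // 1000 if bandwidth else None
            parts = [m3u8_id] if m3u8_id else []
            name = last_info.get('NAME') or last_media.get('NAME')
            # Live streams keep the provided id only (bandwidth may drift).
            if not live:
                parts.append(name if name else '%d' % (tbr if tbr else len(ids)))
            ids.append('-'.join(parts))
            # Reset so stale tag data never leaks into the next variant.
            last_info, last_media = {}, {}
    return ids
```

Resetting both dictionaries after every URI line is the part the diff adds at the end of the loop; without it, a `NAME` from one variant could label the next one.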

View File

@ -0,0 +1,120 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
urlencode_postdata,
compat_str,
ExtractorError,
)
class CuriosityStreamBaseIE(InfoExtractor):
_NETRC_MACHINE = 'curiositystream'
_auth_token = None
_API_BASE_URL = 'https://api.curiositystream.com/v1/'
def _handle_errors(self, result):
error = result.get('error', {}).get('message')
if error:
if isinstance(error, dict):
error = ', '.join(error.values())
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, error), expected=True)
def _call_api(self, path, video_id):
headers = {}
if self._auth_token:
headers['X-Auth-Token'] = self._auth_token
result = self._download_json(
self._API_BASE_URL + path, video_id, headers=headers)
self._handle_errors(result)
return result['data']
def _real_initialize(self):
(email, password) = self._get_login_info()
if email is None:
return
result = self._download_json(
self._API_BASE_URL + 'login', None, data=urlencode_postdata({
'email': email,
'password': password,
}))
self._handle_errors(result)
self._auth_token = result['message']['auth_token']
def _extract_media_info(self, media):
video_id = compat_str(media['id'])
limelight_media_id = media['limelight_media_id']
title = media['title']
subtitles = {}
for closed_caption in media.get('closed_captions', []):
sub_url = closed_caption.get('file')
if not sub_url:
continue
lang = closed_caption.get('code') or closed_caption.get('language') or 'en'
subtitles.setdefault(lang, []).append({
'url': sub_url,
})
return {
'_type': 'url_transparent',
'id': video_id,
'url': 'limelight:media:' + limelight_media_id,
'title': title,
'description': media.get('description'),
'thumbnail': media.get('image_large') or media.get('image_medium') or media.get('image_small'),
'duration': int_or_none(media.get('duration')),
'tags': media.get('tags'),
'subtitles': subtitles,
'ie_key': 'LimelightMedia',
}
class CuriosityStreamIE(CuriosityStreamBaseIE):
IE_NAME = 'curiositystream'
_VALID_URL = r'https?://app\.curiositystream\.com/video/(?P<id>\d+)'
_TEST = {
'url': 'https://app.curiositystream.com/video/2',
'md5': 'a0074c190e6cddaf86900b28d3e9ee7a',
'info_dict': {
'id': '2',
'ext': 'mp4',
'title': 'How Did You Develop The Internet?',
'description': 'Vint Cerf, Google\'s Chief Internet Evangelist, describes how he and Bob Kahn created the internet.',
'timestamp': 1448388615,
'upload_date': '20151124',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
media = self._call_api('media/' + video_id, video_id)
return self._extract_media_info(media)
class CuriosityStreamCollectionIE(CuriosityStreamBaseIE):
IE_NAME = 'curiositystream:collection'
_VALID_URL = r'https?://app\.curiositystream\.com/collection/(?P<id>\d+)'
_TEST = {
'url': 'https://app.curiositystream.com/collection/2',
'info_dict': {
'id': '2',
'title': 'Curious Minds: The Internet',
'description': 'How is the internet shaping our lives in the 21st Century?',
},
'playlist_mincount': 17,
}
def _real_extract(self, url):
collection_id = self._match_id(url)
collection = self._call_api(
'collections/' + collection_id, collection_id)
entries = []
for media in collection.get('media', []):
entries.append(self._extract_media_info(media))
return self.playlist_result(
entries, collection_id,
collection.get('title'), collection.get('description'))
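
The caption handling in `_extract_media_info` above can be tried in isolation. This is a minimal sketch, assuming each caption entry is a plain dict with the `file`, `code` and `language` keys the extractor reads; the helper name `group_subtitles` is hypothetical and not part of youtube-dl.

```python
def group_subtitles(closed_captions):
    """Group caption URLs by language, skipping entries without a URL."""
    subtitles = {}
    for closed_caption in closed_captions or []:
        sub_url = closed_caption.get('file')
        if not sub_url:
            # An entry without a URL cannot be downloaded, so drop it.
            continue
        # Prefer the short code, then the language name, then fall back to 'en'.
        lang = closed_caption.get('code') or closed_caption.get('language') or 'en'
        subtitles.setdefault(lang, []).append({'url': sub_url})
    return subtitles
```

Using `setdefault` keeps several caption files for the same language instead of overwriting the earlier ones.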


@ -394,7 +394,7 @@ class DailymotionUserIE(DailymotionPlaylistIE):
class DailymotionCloudIE(DailymotionBaseInfoExtractor):
_VALID_URL_PREFIX = r'http://api\.dmcloud\.net/(?:player/)?embed/'
_VALID_URL_PREFIX = r'https?://api\.dmcloud\.net/(?:player/)?embed/'
_VALID_URL = r'%s[^/]+/(?P<id>[^/?]+)' % _VALID_URL_PREFIX
_VALID_EMBED_URL = r'%s[^/]+/[^\'"]+' % _VALID_URL_PREFIX


@ -4,26 +4,45 @@ from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none,
float_or_none,
mimetype2ext,
parse_iso8601,
remove_end,
)
class DRTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?dr\.dk/tv/se/(?:[^/]+/)*(?P<id>[\da-z-]+)(?:[/#?]|$)'
_VALID_URL = r'https?://(?:www\.)?dr\.dk/(?:tv/se|nyheder)/(?:[^/]+/)*(?P<id>[\da-z-]+)(?:[/#?]|$)'
_TEST = {
'url': 'https://www.dr.dk/tv/se/boern/ultra/panisk-paske/panisk-paske-5',
'md5': 'dc515a9ab50577fa14cc4e4b0265168f',
_TESTS = [{
'url': 'https://www.dr.dk/tv/se/boern/ultra/klassen-ultra/klassen-darlig-taber-10',
'md5': '25e659cccc9a2ed956110a299fdf5983',
'info_dict': {
'id': 'panisk-paske-5',
'id': 'klassen-darlig-taber-10',
'ext': 'mp4',
'title': 'Panisk Påske (5)',
'description': 'md5:ca14173c5ab24cd26b0fcc074dff391c',
'timestamp': 1426984612,
'upload_date': '20150322',
'duration': 1455,
'title': 'Klassen - Dårlig taber (10)',
'description': 'md5:815fe1b7fa656ed80580f31e8b3c79aa',
'timestamp': 1471991907,
'upload_date': '20160823',
'duration': 606.84,
},
}
'params': {
'skip_download': True,
},
}, {
'url': 'https://www.dr.dk/nyheder/indland/live-christianias-rydning-af-pusher-street-er-i-gang',
'md5': '2c37175c718155930f939ef59952474a',
'info_dict': {
'id': 'christiania-pusher-street-ryddes-drdkrjpo',
'ext': 'mp4',
'title': 'LIVE Christianias rydning af Pusher Street er i gang',
'description': '- Det er det fedeste, der er sket i 20 år, fortæller christianit til DR Nyheder.',
'timestamp': 1472800279,
'upload_date': '20160902',
'duration': 131.4,
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
@ -35,7 +54,8 @@ class DRTVIE(InfoExtractor):
'Video %s is not available' % video_id, expected=True)
video_id = self._search_regex(
r'data-(?:material-identifier|episode-slug)="([^"]+)"',
(r'data-(?:material-identifier|episode-slug)="([^"]+)"',
r'data-resource="[^>"]+mu/programcard/expanded/([^"]+)"'),
webpage, 'video id')
programcard = self._download_json(
@ -43,9 +63,12 @@ class DRTVIE(InfoExtractor):
video_id, 'Downloading video JSON')
data = programcard['Data'][0]
title = data['Title']
description = data['Description']
timestamp = parse_iso8601(data['CreatedTime'])
title = remove_end(self._og_search_title(
webpage, default=None), ' | TV | DR') or data['Title']
description = self._og_search_description(
webpage, default=None) or data.get('Description')
timestamp = parse_iso8601(data.get('CreatedTime'))
thumbnail = None
duration = None
@ -56,16 +79,18 @@ class DRTVIE(InfoExtractor):
subtitles = {}
for asset in data['Assets']:
if asset['Kind'] == 'Image':
thumbnail = asset['Uri']
elif asset['Kind'] == 'VideoResource':
duration = asset['DurationInMilliseconds'] / 1000.0
restricted_to_denmark = asset['RestrictedToDenmark']
spoken_subtitles = asset['Target'] == 'SpokenSubtitles'
for link in asset['Links']:
uri = link['Uri']
target = link['Target']
format_id = target
if asset.get('Kind') == 'Image':
thumbnail = asset.get('Uri')
elif asset.get('Kind') == 'VideoResource':
duration = float_or_none(asset.get('DurationInMilliseconds'), 1000)
restricted_to_denmark = asset.get('RestrictedToDenmark')
spoken_subtitles = asset.get('Target') == 'SpokenSubtitles'
for link in asset.get('Links', []):
uri = link.get('Uri')
if not uri:
continue
target = link.get('Target')
format_id = target or ''
preference = None
if spoken_subtitles:
preference = -1
@ -76,8 +101,8 @@ class DRTVIE(InfoExtractor):
video_id, preference, f4m_id=format_id))
elif target == 'HLS':
formats.extend(self._extract_m3u8_formats(
uri, video_id, 'mp4', preference=preference,
m3u8_id=format_id))
uri, video_id, 'mp4', entry_protocol='m3u8_native',
preference=preference, m3u8_id=format_id))
else:
bitrate = link.get('Bitrate')
if bitrate:
@ -85,7 +110,7 @@ class DRTVIE(InfoExtractor):
formats.append({
'url': uri,
'format_id': format_id,
'tbr': bitrate,
'tbr': int_or_none(bitrate),
'ext': link.get('FileFormat'),
})
subtitles_list = asset.get('SubtitlesList')
@ -94,12 +119,18 @@ class DRTVIE(InfoExtractor):
'Danish': 'da',
}
for subs in subtitles_list:
lang = subs['Language']
subtitles[LANGS.get(lang, lang)] = [{'url': subs['Uri'], 'ext': 'vtt'}]
if not subs.get('Uri'):
continue
lang = subs.get('Language') or 'da'
subtitles.setdefault(LANGS.get(lang, lang), []).append({
'url': subs['Uri'],
'ext': mimetype2ext(subs.get('MimeType')) or 'vtt'
})
if not formats and restricted_to_denmark:
raise ExtractorError(
'Unfortunately, DR is not allowed to show this program outside Denmark.', expected=True)
self.raise_geo_restricted(
'Unfortunately, DR is not allowed to show this program outside Denmark.',
expected=True)
self._sort_formats(formats)


@ -5,7 +5,7 @@ from ..utils import remove_end
class ESPNIE(InfoExtractor):
_VALID_URL = r'https?://espn\.go\.com/(?:[^/]+/)*(?P<id>[^/]+)'
_VALID_URL = r'https?://(?:espn\.go|(?:www\.)?espn)\.com/(?:[^/]+/)*(?P<id>[^/]+)'
_TESTS = [{
'url': 'http://espn.go.com/video/clip?id=10365079',
'md5': '60e5d097a523e767d06479335d1bdc58',
@ -47,6 +47,9 @@ class ESPNIE(InfoExtractor):
}, {
'url': 'http://espn.go.com/nba/playoffs/2015/story/_/id/12887571/john-wall-washington-wizards-no-swelling-left-hand-wrist-game-5-return',
'only_matching': True,
}, {
'url': 'http://www.espn.com/video/clip?id=10365079',
'only_matching': True,
}]
def _real_extract(self, url):


@ -1,58 +0,0 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class ExfmIE(InfoExtractor):
IE_NAME = 'exfm'
IE_DESC = 'ex.fm'
_VALID_URL = r'https?://(?:www\.)?ex\.fm/song/(?P<id>[^/]+)'
_SOUNDCLOUD_URL = r'http://(?:www\.)?api\.soundcloud\.com/tracks/([^/]+)/stream'
_TESTS = [
{
'url': 'http://ex.fm/song/eh359',
'md5': 'e45513df5631e6d760970b14cc0c11e7',
'info_dict': {
'id': '44216187',
'ext': 'mp3',
'title': 'Test House "Love Is Not Enough" (Extended Mix) DeadJournalist Exclusive',
'uploader': 'deadjournalist',
'upload_date': '20120424',
'description': 'Test House \"Love Is Not Enough\" (Extended Mix) DeadJournalist Exclusive',
},
'note': 'Soundcloud song',
'skip': 'The site is down too often',
},
{
'url': 'http://ex.fm/song/wddt8',
'md5': '966bd70741ac5b8570d8e45bfaed3643',
'info_dict': {
'id': 'wddt8',
'ext': 'mp3',
'title': 'Safe and Sound',
'uploader': 'Capital Cities',
},
'skip': 'The site is down too often',
},
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
song_id = mobj.group('id')
info_url = 'http://ex.fm/api/v3/song/%s' % song_id
info = self._download_json(info_url, song_id)['song']
song_url = info['url']
if re.match(self._SOUNDCLOUD_URL, song_url) is not None:
self.to_screen('Soundcloud song detected')
return self.url_result(song_url.replace('/stream', ''), 'Soundcloud')
return {
'id': song_id,
'url': song_url,
'ext': 'mp3',
'title': info['title'],
'thumbnail': info['image']['large'],
'uploader': info['artist'],
'view_count': info['loved_count'],
}


@ -5,11 +5,14 @@ from .abc import (
ABCIE,
ABCIViewIE,
)
from .abc7news import Abc7NewsIE
from .abcnews import (
AbcNewsIE,
AbcNewsVideoIE,
)
from .abcotvs import (
ABCOTVSIE,
ABCOTVSClipsIE,
)
from .academicearth import AcademicEarthCourseIE
from .acast import (
ACastIE,
@ -143,6 +146,7 @@ from .cbsnews import (
)
from .cbssports import CBSSportsIE
from .ccc import CCCIE
from .cctv import CCTVIE
from .cda import CDAIE
from .ceskatelevize import CeskaTelevizeIE
from .channel9 import Channel9IE
@ -194,6 +198,10 @@ from .ctsnews import CtsNewsIE
from .ctv import CTVIE
from .ctvnews import CTVNewsIE
from .cultureunplugged import CultureUnpluggedIE
from .curiositystream import (
CuriosityStreamIE,
CuriosityStreamCollectionIE,
)
from .cwtv import CWTVIE
from .dailymail import DailyMailIE
from .dailymotion import (
@ -257,13 +265,18 @@ from .espn import ESPNIE
from .esri import EsriVideoIE
from .europa import EuropaIE
from .everyonesmixtape import EveryonesMixtapeIE
from .exfm import ExfmIE
from .expotv import ExpoTVIE
from .extremetube import ExtremeTubeIE
from .eyedotv import EyedoTVIE
from .facebook import FacebookIE
from .facebook import (
FacebookIE,
FacebookPluginsVideoIE,
)
from .faz import FazIE
from .fc2 import FC2IE
from .fc2 import (
FC2IE,
FC2EmbedIE,
)
from .fczenit import FczenitIE
from .firstpost import FirstpostIE
from .firsttv import FirstTVIE
@ -278,7 +291,10 @@ from .formula1 import Formula1IE
from .fourtube import FourTubeIE
from .fox import FOXIE
from .foxgay import FoxgayIE
from .foxnews import FoxNewsIE
from .foxnews import (
FoxNewsIE,
FoxNewsInsiderIE,
)
from .foxsports import FoxSportsIE
from .franceculture import FranceCultureIE
from .franceinter import FranceInterIE
@ -315,6 +331,7 @@ from .globo import (
GloboIE,
GloboArticleIE,
)
from .go import GoIE
from .godtube import GodTubeIE
from .godtv import GodTVIE
from .golem import GolemIE
@ -408,6 +425,7 @@ from .kuwo import (
)
from .la7 import LA7IE
from .laola1tv import Laola1TvIE
from .lci import LCIIE
from .lcp import (
LcpPlayIE,
LcpIE,
@ -458,6 +476,7 @@ from .metacafe import MetacafeIE
from .metacritic import MetacriticIE
from .mgoon import MgoonIE
from .mgtv import MGTVIE
from .miaopai import MiaoPaiIE
from .microsoftvirtualacademy import (
MicrosoftVirtualAcademyIE,
MicrosoftVirtualAcademyCourseIE,
@ -486,6 +505,7 @@ from .motherless import MotherlessIE
from .motorsport import MotorsportIE
from .movieclips import MovieClipsIE
from .moviezine import MoviezineIE
from .movingimage import MovingImageIE
from .msn import MSNIE
from .mtv import (
MTVIE,
@ -704,6 +724,7 @@ from .revision3 import (
)
from .rice import RICEIE
from .ringtv import RingTVIE
from .rmcdecouverte import RMCDecouverteIE
from .ro220 import Ro220IE
from .rockstargames import RockstarGamesIE
from .roosterteeth import RoosterTeethIE
@ -806,7 +827,6 @@ from .srgssr import (
SRGSSRPlayIE,
)
from .srmediathek import SRMediathekIE
from .ssa import SSAIE
from .stanfordoc import StanfordOpenClassroomIE
from .steam import SteamIE
from .streamable import StreamableIE
@ -841,6 +861,7 @@ from .telebruxelles import TeleBruxellesIE
from .telecinco import TelecincoIE
from .telegraaf import TelegraafIE
from .telemb import TeleMBIE
from .telequebec import TeleQuebecIE
from .teletask import TeleTaskIE
from .telewebion import TelewebionIE
from .testurl import TestURLIE
@ -869,15 +890,12 @@ from .tnaflix import (
MovieFapIE,
)
from .toggle import ToggleIE
from .thvideo import (
THVideoIE,
THVideoPlaylistIE
)
from .toutv import TouTvIE
from .toypics import ToypicsUserIE, ToypicsIE
from .traileraddict import TrailerAddictIE
from .trilulilu import TriluliluIE
from .trollvids import TrollvidsIE
from .trutv import TruTVIE
from .tube8 import Tube8IE
from .tubitv import TubiTvIE
from .tudou import (
@ -907,6 +925,7 @@ from .tvc import (
)
from .tvigle import TvigleIE
from .tvland import TVLandIE
from .tvnoe import TVNoeIE
from .tvp import (
TVPEmbedIE,
TVPIE,


@ -351,3 +351,32 @@ class FacebookIE(InfoExtractor):
self._VIDEO_PAGE_TEMPLATE % video_id,
video_id, fatal_if_no_video=True)
return info_dict
class FacebookPluginsVideoIE(InfoExtractor):
_VALID_URL = r'https?://(?:[\w-]+\.)?facebook\.com/plugins/video\.php\?.*?\bhref=(?P<id>https.+)'
_TESTS = [{
'url': 'https://www.facebook.com/plugins/video.php?href=https%3A%2F%2Fwww.facebook.com%2Fgov.sg%2Fvideos%2F10154383743583686%2F&show_text=0&width=560',
'md5': '5954e92cdfe51fe5782ae9bda7058a07',
'info_dict': {
'id': '10154383743583686',
'ext': 'mp4',
'title': 'What to do during the haze?',
'uploader': 'Gov.sg',
'upload_date': '20160826',
'timestamp': 1472184808,
},
'add_ie': [FacebookIE.ie_key()],
}, {
'url': 'https://www.facebook.com/plugins/video.php?href=https%3A%2F%2Fwww.facebook.com%2Fvideo.php%3Fv%3D10204634152394104',
'only_matching': True,
}, {
'url': 'https://www.facebook.com/plugins/video.php?href=https://www.facebook.com/gov.sg/videos/10154383743583686/&show_text=0&width=560',
'only_matching': True,
}]
def _real_extract(self, url):
return self.url_result(
compat_urllib_parse_unquote(self._match_id(url)),
FacebookIE.ie_key())


@ -1,10 +1,12 @@
#! -*- coding: utf-8 -*-
# coding: utf-8
from __future__ import unicode_literals
import hashlib
import re
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_urllib_request,
compat_urlparse,
)
@ -16,7 +18,7 @@ from ..utils import (
class FC2IE(InfoExtractor):
_VALID_URL = r'^https?://video\.fc2\.com/(?:[^/]+/)*content/(?P<id>[^/]+)'
_VALID_URL = r'^(?:https?://video\.fc2\.com/(?:[^/]+/)*content/|fc2:)(?P<id>[^/]+)'
IE_NAME = 'fc2'
_NETRC_MACHINE = 'fc2'
_TESTS = [{
@ -75,12 +77,17 @@ class FC2IE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
self._login()
webpage = self._download_webpage(url, video_id)
self._downloader.cookiejar.clear_session_cookies() # must clear
self._login()
webpage = None
if not url.startswith('fc2:'):
webpage = self._download_webpage(url, video_id)
self._downloader.cookiejar.clear_session_cookies() # must clear
self._login()
title = self._og_search_title(webpage)
thumbnail = self._og_search_thumbnail(webpage)
title = 'FC2 video %s' % video_id
thumbnail = None
if webpage is not None:
title = self._og_search_title(webpage)
thumbnail = self._og_search_thumbnail(webpage)
refer = url.replace('/content/', '/a/content/') if '/a/content/' not in url else url
mimi = hashlib.md5((video_id + '_gGddgPfeaf_gzyr').encode('utf-8')).hexdigest()
@ -113,3 +120,41 @@ class FC2IE(InfoExtractor):
'ext': 'flv',
'thumbnail': thumbnail,
}
class FC2EmbedIE(InfoExtractor):
_VALID_URL = r'https?://video\.fc2\.com/flv2\.swf\?(?P<query>.+)'
IE_NAME = 'fc2:embed'
_TEST = {
'url': 'http://video.fc2.com/flv2.swf?t=201404182936758512407645&i=20130316kwishtfitaknmcgd76kjd864hso93htfjcnaogz629mcgfs6rbfk0hsycma7shkf85937cbchfygd74&i=201403223kCqB3Ez&d=2625&sj=11&lang=ja&rel=1&from=11&cmt=1&tk=TlRBM09EQTNNekU9&tl=プリズン・ブレイク%20S1-01%20マイケル%20【吹替】',
'md5': 'b8aae5334cb691bdb1193a88a6ab5d5a',
'info_dict': {
'id': '201403223kCqB3Ez',
'ext': 'flv',
'title': 'プリズン・ブレイク S1-01 マイケル 【吹替】',
'thumbnail': 're:^https?://.*\.jpg$',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
query = compat_parse_qs(mobj.group('query'))
video_id = query['i'][-1]
title = query.get('tl', ['FC2 video %s' % video_id])[0]
sj = query.get('sj', [None])[0]
thumbnail = None
if sj:
# See thumbnailImagePath() in ServerConst.as of flv2.swf
thumbnail = 'http://video%s-thumbnail.fc2.com/up/pic/%s.jpg' % (
sj, '/'.join((video_id[:6], video_id[6:8], video_id[-2], video_id[-1], video_id)))
return {
'_type': 'url_transparent',
'ie_key': FC2IE.ie_key(),
'url': 'fc2:%s' % video_id,
'title': title,
'thumbnail': thumbnail,
}
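
The query handling in `FC2EmbedIE._real_extract` above can be reproduced outside the extractor. This is a sketch only: it uses Python 3's `urllib.parse.parse_qs` in place of `compat_parse_qs`, and the function name `fc2_embed_info` is hypothetical. The thumbnail URL pattern is copied from the diff and is not verified against the live site.

```python
from urllib.parse import parse_qs


def fc2_embed_info(query_string):
    """Return (video_id, title, thumbnail) from an flv2.swf query string."""
    query = parse_qs(query_string)
    # The `i` parameter may appear more than once; the last value is the video id.
    video_id = query['i'][-1]
    title = query.get('tl', ['FC2 video %s' % video_id])[0]
    sj = query.get('sj', [None])[0]
    thumbnail = None
    if sj:
        # Path layout: first 6 chars / next 2 chars / second-to-last char / last char / full id
        path = '/'.join((
            video_id[:6], video_id[6:8], video_id[-2], video_id[-1], video_id))
        thumbnail = 'http://video%s-thumbnail.fc2.com/up/pic/%s.jpg' % (sj, path)
    return video_id, title, thumbnail
```

Taking the last `i` value matters because the test URL in the diff carries two `i` parameters and only the second one is the video id.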


@ -1,18 +1,24 @@
from __future__ import unicode_literals
import itertools
from .common import InfoExtractor
from ..utils import (
get_element_by_id,
remove_end,
)
class FoxgayIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?foxgay\.com/videos/(?:\S+-)?(?P<id>\d+)\.shtml'
_TEST = {
'url': 'http://foxgay.com/videos/fuck-turkish-style-2582.shtml',
'md5': '80d72beab5d04e1655a56ad37afe6841',
'md5': '344558ccfea74d33b7adbce22e577f54',
'info_dict': {
'id': '2582',
'ext': 'mp4',
'title': 'md5:6122f7ae0fc6b21ebdf59c5e083ce25a',
'description': 'md5:5e51dc4405f1fd315f7927daed2ce5cf',
'title': 'Fuck Turkish-style',
'description': 'md5:6ae2d9486921891efe89231ace13ffdf',
'age_limit': 18,
'thumbnail': 're:https?://.*\.jpg$',
},
@ -22,27 +28,35 @@ class FoxgayIE(InfoExtractor):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = self._html_search_regex(
r'<title>(?P<title>.*?)</title>',
webpage, 'title', fatal=False)
description = self._html_search_regex(
r'<div class="ico_desc"><h2>(?P<description>.*?)</h2>',
webpage, 'description', fatal=False)
title = remove_end(self._html_search_regex(
r'<title>([^<]+)</title>', webpage, 'title'), ' - Foxgay.com')
description = get_element_by_id('inf_tit', webpage)
# The default user-agent with foxgay cookies leads to pages without videos
self._downloader.cookiejar.clear('.foxgay.com')
# Find the URL for the iFrame which contains the actual video.
iframe_url = self._html_search_regex(
r'<iframe[^>]+src=([\'"])(?P<url>[^\'"]+)\1', webpage,
'video frame', group='url')
iframe = self._download_webpage(
self._html_search_regex(r'iframe src="(?P<frame>.*?)"', webpage, 'video frame'),
video_id)
video_url = self._html_search_regex(
r"v_path = '(?P<vid>http://.*?)'", iframe, 'url')
thumb_url = self._html_search_regex(
r"t_path = '(?P<thumb>http://.*?)'", iframe, 'thumbnail', fatal=False)
iframe_url, video_id, headers={'User-Agent': 'curl/7.50.1'},
note='Downloading video frame')
video_data = self._parse_json(self._search_regex(
r'video_data\s*=\s*([^;]+);', iframe, 'video data'), video_id)
formats = [{
'url': source,
'height': resolution,
} for source, resolution in zip(
video_data['sources'], video_data.get('resolutions', itertools.repeat(None)))]
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'url': video_url,
'formats': formats,
'description': description,
'thumbnail': thumb_url,
'thumbnail': video_data.get('act_vid', {}).get('thumb'),
'age_limit': 18,
}
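
The format list in `FoxgayIE` above pairs each source URL with a resolution and falls back to `itertools.repeat(None)` when no resolutions are given. A minimal sketch of that pairing follows, assuming `video_data` is a dict shaped like the one the extractor parses; the helper name `build_formats` is hypothetical.

```python
import itertools


def build_formats(video_data):
    """Pair every source URL with its height, or with None if heights are missing."""
    sources = video_data['sources']
    # repeat(None) is an endless iterator; zip stops once `sources` is exhausted.
    resolutions = video_data.get('resolutions', itertools.repeat(None))
    return [
        {'url': source, 'height': resolution}
        for source, resolution in zip(sources, resolutions)
    ]
```

Because `zip` stops at the shortest input, the endless `repeat(None)` never causes an infinite loop, and every source still gets a format entry.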


@ -3,11 +3,12 @@ from __future__ import unicode_literals
import re
from .amp import AMPIE
from .common import InfoExtractor
class FoxNewsIE(AMPIE):
IE_DESC = 'Fox News and Fox Business Video'
_VALID_URL = r'https?://(?P<host>video\.fox(?:news|business)\.com)/v/(?:video-embed\.html\?video_id=)?(?P<id>\d+)'
_VALID_URL = r'https?://(?P<host>video\.(?:insider\.)?fox(?:news|business)\.com)/v/(?:video-embed\.html\?video_id=)?(?P<id>\d+)'
_TESTS = [
{
'url': 'http://video.foxnews.com/v/3937480/frozen-in-time/#sp=show-clips',
@ -49,6 +50,11 @@ class FoxNewsIE(AMPIE):
'url': 'http://video.foxbusiness.com/v/4442309889001',
'only_matching': True,
},
{
# From http://insider.foxnews.com/2016/08/25/univ-wisconsin-student-group-pushing-silence-certain-words
'url': 'http://video.insider.foxnews.com/v/video-embed.html?video_id=5099377331001&autoplay=true&share_url=http://insider.foxnews.com/2016/08/25/univ-wisconsin-student-group-pushing-silence-certain-words&share_title=Student%20Group:%20Saying%20%27Politically%20Correct,%27%20%27Trash%27%20and%20%27Lame%27%20Is%20Offensive&share=true',
'only_matching': True,
},
]
def _real_extract(self, url):
@ -58,3 +64,43 @@ class FoxNewsIE(AMPIE):
'http://%s/v/feed/video/%s.js?template=fox' % (host, video_id))
info['id'] = video_id
return info
class FoxNewsInsiderIE(InfoExtractor):
_VALID_URL = r'https?://insider\.foxnews\.com/([^/]+/)+(?P<id>[a-z-]+)'
IE_NAME = 'foxnews:insider'
_TEST = {
'url': 'http://insider.foxnews.com/2016/08/25/univ-wisconsin-student-group-pushing-silence-certain-words',
'md5': 'a10c755e582d28120c62749b4feb4c0c',
'info_dict': {
'id': '5099377331001',
'display_id': 'univ-wisconsin-student-group-pushing-silence-certain-words',
'ext': 'mp4',
'title': 'Student Group: Saying \'Politically Correct,\' \'Trash\' and \'Lame\' Is Offensive',
'description': 'Is campus censorship getting out of control?',
'timestamp': 1472168725,
'upload_date': '20160825',
'thumbnail': 're:^https?://.*\.jpg$',
},
'add_ie': [FoxNewsIE.ie_key()],
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
embed_url = self._html_search_meta('embedUrl', webpage, 'embed URL')
title = self._og_search_title(webpage)
description = self._og_search_description(webpage)
return {
'_type': 'url_transparent',
'ie_key': FoxNewsIE.ie_key(),
'url': embed_url,
'display_id': display_id,
'title': title,
'description': description,
}


@ -1,14 +1,10 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_duration,
str_to_int,
unified_strdate,
remove_end,
)
@ -21,8 +17,9 @@ class GameStarIE(InfoExtractor):
'id': '76110',
'ext': 'mp4',
'title': 'Hobbit 3: Die Schlacht der Fünf Heere - Teaser-Trailer zum dritten Teil',
'description': 'Der Teaser-Trailer zu Hobbit 3: Die Schlacht der Fünf Heere zeigt einige Szenen aus dem dritten Teil der Saga und kündigt den vollständigen Trailer an.',
'thumbnail': 'http://images.gamestar.de/images/idgwpgsgp/bdb/2494525/600x.jpg',
'description': 'Der Teaser-Trailer zu Hobbit 3: Die Schlacht der Fünf Heere zeigt einige Szenen aus dem dritten Teil der Saga und kündigt den...',
'thumbnail': 're:^https?://.*\.jpg$',
'timestamp': 1406542020,
'upload_date': '20140728',
'duration': 17
}
@ -32,41 +29,27 @@ class GameStarIE(InfoExtractor):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
og_title = self._og_search_title(webpage)
title = re.sub(r'\s*- Video (bei|-) GameStar\.de$', '', og_title)
url = 'http://gamestar.de/_misc/videos/portal/getVideoUrl.cfm?premium=0&videoId=' + video_id
description = self._og_search_description(webpage).strip()
thumbnail = self._proto_relative_url(
self._og_search_thumbnail(webpage), scheme='http:')
upload_date = unified_strdate(self._html_search_regex(
r'<span style="float:left;font-size:11px;">Datum: ([0-9]+\.[0-9]+\.[0-9]+)&nbsp;&nbsp;',
webpage, 'upload_date', fatal=False))
duration = parse_duration(self._html_search_regex(
r'&nbsp;&nbsp;Länge: ([0-9]+:[0-9]+)</span>', webpage, 'duration',
fatal=False))
view_count = str_to_int(self._html_search_regex(
r'&nbsp;&nbsp;Zuschauer: ([0-9\.]+)&nbsp;&nbsp;', webpage,
'view_count', fatal=False))
# TODO: there are multiple ld+json objects in the webpage,
# while _search_json_ld finds only the first one
json_ld = self._parse_json(self._search_regex(
r'(?s)<script[^>]+type=(["\'])application/ld\+json\1[^>]*>(?P<json_ld>[^<]+VideoObject[^<]+)</script>',
webpage, 'JSON-LD', group='json_ld'), video_id)
info_dict = self._json_ld(json_ld, video_id)
info_dict['title'] = remove_end(info_dict['title'], ' - GameStar')
view_count = json_ld.get('interactionCount')
comment_count = int_or_none(self._html_search_regex(
r'>Kommentieren \(([0-9]+)\)</a>', webpage, 'comment_count',
r'([0-9]+) Kommentare</span>', webpage, 'comment_count',
fatal=False))
return {
info_dict.update({
'id': video_id,
'title': title,
'url': url,
'ext': 'mp4',
'thumbnail': thumbnail,
'description': description,
'upload_date': upload_date,
'duration': duration,
'view_count': view_count,
'comment_count': comment_count
}
})
return info_dict


@ -2,7 +2,6 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import unified_strdate
class GlideIE(InfoExtractor):
@ -14,10 +13,8 @@ class GlideIE(InfoExtractor):
'info_dict': {
'id': 'UZF8zlmuQbe4mr+7dCiQ0w==',
'ext': 'mp4',
'title': 'Damon Timm\'s Glide message',
'title': "Damon's Glide message",
'thumbnail': 're:^https?://.*?\.cloudfront\.net/.*\.jpg$',
'uploader': 'Damon Timm',
'upload_date': '20140919',
}
}
@ -27,7 +24,8 @@ class GlideIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
title = self._html_search_regex(
r'<title>(.+?)</title>', webpage, 'title')
r'<title>(.+?)</title>', webpage,
'title', default=None) or self._og_search_title(webpage)
video_url = self._proto_relative_url(self._search_regex(
r'<source[^>]+src=(["\'])(?P<url>.+?)\1',
webpage, 'video URL', default=None,
@ -36,18 +34,10 @@ class GlideIE(InfoExtractor):
r'<img[^>]+id=["\']video-thumbnail["\'][^>]+src=(["\'])(?P<url>.+?)\1',
webpage, 'thumbnail url', default=None,
group='url')) or self._og_search_thumbnail(webpage)
uploader = self._search_regex(
r'<div[^>]+class=["\']info-name["\'][^>]*>([^<]+)',
webpage, 'uploader', fatal=False)
upload_date = unified_strdate(self._search_regex(
r'<div[^>]+class="info-date"[^>]*>([^<]+)',
webpage, 'upload date', fatal=False))
return {
'id': video_id,
'title': title,
'url': video_url,
'thumbnail': thumbnail,
'uploader': uploader,
'upload_date': upload_date,
}


@ -19,7 +19,7 @@ from ..utils import (
class GloboIE(InfoExtractor):
_VALID_URL = '(?:globo:|https?://.+?\.globo\.com/(?:[^/]+/)*(?:v/(?:[^/]+/)?|videos/))(?P<id>\d{7,})'
_VALID_URL = r'(?:globo:|https?://.+?\.globo\.com/(?:[^/]+/)*(?:v/(?:[^/]+/)?|videos/))(?P<id>\d{7,})'
_API_URL_TEMPLATE = 'http://api.globovideos.com/videos/%s/playlist'
_SECURITY_URL_TEMPLATE = 'http://security.video.globo.com/videos/%s/hash?player=flash&version=17.0.0.132&resource_id=%s'
@ -396,7 +396,7 @@ class GloboIE(InfoExtractor):
class GloboArticleIE(InfoExtractor):
_VALID_URL = 'https?://.+?\.globo\.com/(?:[^/]+/)*(?P<id>[^/]+)(?:\.html)?'
_VALID_URL = r'https?://.+?\.globo\.com/(?:[^/]+/)*(?P<id>[^/]+)(?:\.html)?'
_VIDEOID_REGEXES = [
r'\bdata-video-id=["\'](\d{7,})',

youtube_dl/extractor/go.py

@ -0,0 +1,101 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
determine_ext,
parse_age_limit,
)
class GoIE(InfoExtractor):
_BRANDS = {
'abc': '001',
'freeform': '002',
'watchdisneychannel': '004',
'watchdisneyjunior': '008',
'watchdisneyxd': '009',
}
_VALID_URL = r'https?://(?:(?P<sub_domain>%s)\.)?go\.com/.*?vdka(?P<id>\w+)' % '|'.join(_BRANDS.keys())
_TESTS = [{
'url': 'http://abc.go.com/shows/castle/video/most-recent/vdka0_g86w5onx',
'info_dict': {
'id': '0_g86w5onx',
'ext': 'mp4',
'title': 'Sneak Peek: Language Arts',
'description': 'md5:7dcdab3b2d17e5217c953256af964e9c',
},
'params': {
# m3u8 download
'skip_download': True,
},
}, {
'url': 'http://abc.go.com/shows/after-paradise/video/most-recent/vdka3335601',
'only_matching': True,
}]
def _real_extract(self, url):
sub_domain, video_id = re.match(self._VALID_URL, url).groups()
video_data = self._download_json(
'http://api.contents.watchabc.go.com/vp2/ws/contents/3000/videos/%s/001/-1/-1/-1/%s/-1/-1.json' % (self._BRANDS[sub_domain], video_id),
video_id)['video'][0]
title = video_data['title']
formats = []
for asset in video_data.get('assets', {}).get('asset', []):
asset_url = asset.get('value')
if not asset_url:
continue
format_id = asset.get('format')
ext = determine_ext(asset_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
asset_url, video_id, 'mp4', m3u8_id=format_id or 'hls', fatal=False))
else:
formats.append({
'format_id': format_id,
'url': asset_url,
'ext': ext,
})
self._sort_formats(formats)
subtitles = {}
for cc in video_data.get('closedcaption', {}).get('src', []):
cc_url = cc.get('value')
if not cc_url:
continue
ext = determine_ext(cc_url)
if ext == 'xml':
ext = 'ttml'
subtitles.setdefault(cc.get('lang'), []).append({
'url': cc_url,
'ext': ext,
})
thumbnails = []
for thumbnail in video_data.get('thumbnails', {}).get('thumbnail', []):
thumbnail_url = thumbnail.get('value')
if not thumbnail_url:
continue
thumbnails.append({
'url': thumbnail_url,
'width': int_or_none(thumbnail.get('width')),
'height': int_or_none(thumbnail.get('height')),
})
return {
'id': video_id,
'title': title,
'description': video_data.get('longdescription') or video_data.get('description'),
'duration': int_or_none(video_data.get('duration', {}).get('value'), 1000),
'age_limit': parse_age_limit(video_data.get('tvrating', {}).get('rating')),
'episode_number': int_or_none(video_data.get('episodenumber')),
'series': video_data.get('show', {}).get('title'),
'season_number': int_or_none(video_data.get('season', {}).get('num')),
'thumbnails': thumbnails,
'formats': formats,
'subtitles': subtitles,
}


@ -48,13 +48,23 @@ class InternetVideoArchiveIE(InfoExtractor):
# There are multiple videos in the playlist while only the first one
# matches the video played in browsers
video_info = configuration['playlist'][0]
title = video_info['title']
formats = []
for source in video_info['sources']:
file_url = source['file']
if determine_ext(file_url) == 'm3u8':
formats.extend(self._extract_m3u8_formats(
file_url, video_id, ext='mp4', m3u8_id='hls'))
m3u8_formats = self._extract_m3u8_formats(
file_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False)
if m3u8_formats:
formats.extend(m3u8_formats)
file_url = m3u8_formats[0]['url']
formats.extend(self._extract_f4m_formats(
file_url.replace('.m3u8', '.f4m'),
video_id, f4m_id='hds', fatal=False))
formats.extend(self._extract_mpd_formats(
file_url.replace('.m3u8', '.mpd'),
video_id, mpd_id='dash', fatal=False))
else:
a_format = {
'url': file_url,
@ -70,7 +80,6 @@ class InternetVideoArchiveIE(InfoExtractor):
self._sort_formats(formats)
title = video_info['title']
description = video_info.get('description')
thumbnail = video_info.get('image')
else:


@ -63,10 +63,17 @@ class JWPlatformBaseIE(InfoExtractor):
'ext': ext,
})
else:
height = int_or_none(source.get('height'))
if height is None:
# Often no height is provided but there is a label in
# format like 1080p.
height = int_or_none(self._search_regex(
r'^(\d{3,})[pP]$', source.get('label') or '',
'height', default=None))
a_format = {
'url': source_url,
'width': int_or_none(source.get('width')),
'height': int_or_none(source.get('height')),
'height': height,
'ext': ext,
}
if source_url.startswith('rtmp'):
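
The height fallback added in this hunk (use the numeric `height` if present, otherwise parse a label such as `1080p`) can be sketched as a standalone function. This is an approximation under stated assumptions: `to_int` is a simplified stand-in for youtube-dl's `int_or_none`, `re.match` stands in for `_search_regex`, and `height_from_source` is a hypothetical name.

```python
import re


def to_int(value):
    """Simplified stand-in for int_or_none: return an int, or None on failure."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def height_from_source(source):
    """Return the height of a source dict, falling back to its label."""
    height = to_int(source.get('height'))
    if height is None:
        # Labels such as '720p' or '1080P' carry the height; require 3+ digits.
        match = re.match(r'^(\d{3,})[pP]$', source.get('label') or '')
        if match:
            height = to_int(match.group(1))
    return height
```

The `or ''` guard keeps the regex from receiving `None` when a source has no label at all.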


@ -5,7 +5,7 @@ from .common import InfoExtractor
class KaraoketvIE(InfoExtractor):
_VALID_URL = r'http://www.karaoketv.co.il/[^/]+/(?P<id>\d+)'
_VALID_URL = r'https?://www\.karaoketv\.co\.il/[^/]+/(?P<id>\d+)'
_TEST = {
'url': 'http://www.karaoketv.co.il/%D7%A9%D7%99%D7%A8%D7%99_%D7%A7%D7%A8%D7%99%D7%95%D7%A7%D7%99/58356/%D7%90%D7%99%D7%96%D7%95%D7%9F',
'info_dict': {


@ -0,0 +1,24 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class LCIIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?lci\.fr/[^/]+/[\w-]+-(?P<id>\d+)\.html'
_TEST = {
'url': 'http://www.lci.fr/international/etats-unis-a-j-62-hillary-clinton-reste-sans-voix-2001679.html',
'md5': '2fdb2538b884d4d695f9bd2bde137e6c',
'info_dict': {
'id': '13244802',
'ext': 'mp4',
'title': 'Hillary Clinton et sa quinte de toux, en plein meeting',
'description': 'md5:a4363e3a960860132f8124b62f4a01c9',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
wat_id = self._search_regex(r'data-watid=[\'"](\d+)', webpage, 'wat id')
return self.url_result('wat:' + wat_id, 'Wat', wat_id)


@ -0,0 +1,40 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class MiaoPaiIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?miaopai\.com/show/(?P<id>[-A-Za-z0-9~_]+)'
_TEST = {
'url': 'http://www.miaopai.com/show/n~0hO7sfV1nBEw4Y29-Hqg__.htm',
'md5': '095ed3f1cd96b821add957bdc29f845b',
'info_dict': {
'id': 'n~0hO7sfV1nBEw4Y29-Hqg__',
'ext': 'mp4',
'title': '西游记音乐会的秒拍视频',
'thumbnail': 're:^https?://.*/n~0hO7sfV1nBEw4Y29-Hqg___m.jpg',
}
}
_USER_AGENT_IPAD = 'Mozilla/5.0 (iPad; CPU OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1'
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(
url, video_id, headers={'User-Agent': self._USER_AGENT_IPAD})
title = self._html_search_regex(
r'<title>([^<]+)</title>', webpage, 'title')
thumbnail = self._html_search_regex(
r'<div[^>]+class=(?P<q1>[\'"]).*\bvideo_img\b.*(?P=q1)[^>]+data-url=(?P<q2>[\'"])(?P<url>[^\'"]+)(?P=q2)',
webpage, 'thumbnail', fatal=False, group='url')
videos = self._parse_html5_media_entries(url, webpage, video_id)
info = videos[0]
info.update({
'id': video_id,
'title': title,
'thumbnail': thumbnail,
})
return info


@ -35,7 +35,8 @@ class MoeVideoIE(InfoExtractor):
'height': 360,
'duration': 179,
'filesize': 17822500,
}
},
'skip': 'Video has been removed',
},
{
'url': 'http://playreplay.net/video/77107.7f325710a627383d40540d8e991a',


@ -7,22 +7,19 @@ from ..utils import (
)
class SSAIE(InfoExtractor):
_VALID_URL = r'https?://ssa\.nls\.uk/film/(?P<id>\d+)'
class MovingImageIE(InfoExtractor):
_VALID_URL = r'https?://movingimage\.nls\.uk/film/(?P<id>\d+)'
_TEST = {
'url': 'http://ssa.nls.uk/film/3561',
'url': 'http://movingimage.nls.uk/film/3561',
'md5': '4caa05c2b38453e6f862197571a7be2f',
'info_dict': {
'id': '3561',
'ext': 'flv',
'ext': 'mp4',
'title': 'SHETLAND WOOL',
'description': 'md5:c5afca6871ad59b4271e7704fe50ab04',
'duration': 900,
'thumbnail': 're:^https?://.*\.jpg$',
},
'params': {
# rtmp download
'skip_download': True,
},
}
def _real_extract(self, url):
@ -30,10 +27,9 @@ class SSAIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
streamer = self._search_regex(
r"'streamer'\s*,\S*'(rtmp[^']+)'", webpage, 'streamer')
play_path = self._search_regex(
r"'file'\s*,\s*'([^']+)'", webpage, 'file').rpartition('.')[0]
formats = self._extract_m3u8_formats(
self._html_search_regex(r'file\s*:\s*"([^"]+)"', webpage, 'm3u8 manifest URL'),
video_id, ext='mp4', entry_protocol='m3u8_native')
def search_field(field_name, fatal=False):
return self._search_regex(
@ -44,13 +40,11 @@ class SSAIE(InfoExtractor):
description = unescapeHTML(search_field('Description'))
duration = parse_duration(search_field('Running time'))
thumbnail = self._search_regex(
r"'image'\s*,\s*'([^']+)'", webpage, 'thumbnails', fatal=False)
r"image\s*:\s*'([^']+)'", webpage, 'thumbnail', fatal=False)
return {
'id': video_id,
'url': streamer,
'play_path': play_path,
'ext': 'flv',
'formats': formats,
'title': title,
'description': description,
'duration': duration,


@ -13,7 +13,7 @@ class MyVidsterIE(InfoExtractor):
'id': '3685814',
'title': 'md5:7d8427d6d02c4fbcef50fe269980c749',
'upload_date': '20141027',
'uploader_id': 'utkualp',
'uploader': 'utkualp',
'ext': 'mp4',
'age_limit': 18,
},


@ -69,13 +69,16 @@ class NickIE(MTVServicesInfoExtractor):
class NickDeIE(MTVServicesInfoExtractor):
IE_NAME = 'nick.de'
_VALID_URL = r'https?://(?:www\.)?nick\.de/(?:playlist|shows)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_VALID_URL = r'https?://(?:www\.)?(?:nick\.de|nickelodeon\.nl)/(?:playlist|shows)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://www.nick.de/playlist/3773-top-videos/videos/episode/17306-zu-wasser-und-zu-land-rauchende-erdnusse',
'only_matching': True,
}, {
'url': 'http://www.nick.de/shows/342-icarly',
'only_matching': True,
}, {
'url': 'http://www.nickelodeon.nl/shows/474-spongebob/videos/17403-een-kijkje-in-de-keuken-met-sandy-van-binnenuit',
'only_matching': True,
}]
def _real_extract(self, url):


@ -1,26 +1,37 @@
from __future__ import unicode_literals
import hmac
import hashlib
import base64
from .common import InfoExtractor
from ..utils import (
float_or_none,
int_or_none,
parse_iso8601,
mimetype2ext,
determine_ext,
)
class NYTimesBaseIE(InfoExtractor):
_SECRET = b'pX(2MbU2);4N{7J8)>YwKRJ+/pQ3JkiU2Q^V>mFYv6g6gYvt6v'
def _extract_video_from_id(self, video_id):
video_data = self._download_json(
'http://www.nytimes.com/svc/video/api/v2/video/%s' % video_id,
video_id, 'Downloading video JSON')
# Authorization generation algorithm is reverse engineered from `signer` in
# http://graphics8.nytimes.com/video/vhs/vhs-2.x.min.js
path = '/svc/video/api/v3/video/' + video_id
hm = hmac.new(self._SECRET, (path + ':vhs').encode(), hashlib.sha512).hexdigest()
video_data = self._download_json('http://www.nytimes.com' + path, video_id, 'Downloading video JSON', headers={
'Authorization': 'NYTV ' + base64.b64encode(hm.encode()).decode(),
'X-NYTV': 'vhs',
}, fatal=False)
if not video_data:
video_data = self._download_json(
'http://www.nytimes.com/svc/video/api/v2/video/' + video_id,
video_id, 'Downloading video JSON')
title = video_data['headline']
description = video_data.get('summary')
duration = float_or_none(video_data.get('duration'), 1000)
uploader = video_data.get('byline')
publication_date = video_data.get('publication_date')
timestamp = parse_iso8601(publication_date[:-8]) if publication_date else None
def get_file_size(file_size):
if isinstance(file_size, int):
@ -28,35 +39,59 @@ class NYTimesBaseIE(InfoExtractor):
elif isinstance(file_size, dict):
return int(file_size.get('value', 0))
else:
return 0
return None
formats = [
{
'url': video['url'],
'format_id': video.get('type'),
'vcodec': video.get('video_codec'),
'width': int_or_none(video.get('width')),
'height': int_or_none(video.get('height')),
'filesize': get_file_size(video.get('fileSize')),
} for video in video_data['renditions'] if video.get('url')
]
urls = []
formats = []
for video in video_data.get('renditions', []):
video_url = video.get('url')
format_id = video.get('type')
if not video_url or format_id == 'thumbs' or video_url in urls:
continue
urls.append(video_url)
ext = mimetype2ext(video.get('mimetype')) or determine_ext(video_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', 'm3u8_native',
m3u8_id=format_id or 'hls', fatal=False))
elif ext == 'mpd':
continue
# formats.extend(self._extract_mpd_formats(
# video_url, video_id, format_id or 'dash', fatal=False))
else:
formats.append({
'url': video_url,
'format_id': format_id,
'vcodec': video.get('videoencoding') or video.get('video_codec'),
'width': int_or_none(video.get('width')),
'height': int_or_none(video.get('height')),
'filesize': get_file_size(video.get('file_size') or video.get('fileSize')),
'tbr': int_or_none(video.get('bitrate'), 1000),
'ext': ext,
})
self._sort_formats(formats)
thumbnails = [
{
'url': 'http://www.nytimes.com/%s' % image['url'],
thumbnails = []
for image in video_data.get('images', []):
image_url = image.get('url')
if not image_url:
continue
thumbnails.append({
'url': 'http://www.nytimes.com/' + image_url,
'width': int_or_none(image.get('width')),
'height': int_or_none(image.get('height')),
} for image in video_data.get('images', []) if image.get('url')
]
})
publication_date = video_data.get('publication_date')
timestamp = parse_iso8601(publication_date[:-8]) if publication_date else None
return {
'id': video_id,
'title': title,
'description': description,
'description': video_data.get('summary'),
'timestamp': timestamp,
'uploader': uploader,
'duration': duration,
'uploader': video_data.get('byline'),
'duration': float_or_none(video_data.get('duration'), 1000),
'formats': formats,
'thumbnails': thumbnails,
}
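
The `Authorization` header built in `_extract_video_from_id` follows a simple recipe: HMAC-SHA512 over the API path plus `:vhs`, hex-encoded, then base64-encoded and prefixed with `NYTV `. A minimal standalone sketch of that recipe follows; `nytv_authorization` is a hypothetical helper name, and the key used in the usage note is a placeholder rather than the extractor's `_SECRET`.

```python
import base64
import hashlib
import hmac


def nytv_authorization(secret, video_id, client='vhs'):
    """Build the header value the way the hunk above does."""
    path = '/svc/video/api/v3/video/' + video_id
    # Hex digest of HMAC-SHA512 over "<path>:<client>".
    digest = hmac.new(
        secret, (path + ':' + client).encode(), hashlib.sha512).hexdigest()
    # The hex string itself (not the raw digest bytes) is base64-encoded.
    return 'NYTV ' + base64.b64encode(digest.encode()).decode()
```

For example, `nytv_authorization(b'placeholder-key', '100000002847155')` returns a string starting with `NYTV ` whose base64 payload decodes to 128 hexadecimal characters.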
@ -67,7 +102,7 @@ class NYTimesIE(NYTimesBaseIE):
_TESTS = [{
'url': 'http://www.nytimes.com/video/opinion/100000002847155/verbatim-what-is-a-photocopier.html?playlistId=100000001150263',
'md5': '18a525a510f942ada2720db5f31644c0',
'md5': 'd665342765db043f7e225cff19df0f2d',
'info_dict': {
'id': '100000002847155',
'ext': 'mov',


@ -90,7 +90,7 @@ class OnetBaseIE(InfoExtractor):
class OnetIE(OnetBaseIE):
_VALID_URL = 'https?://(?:www\.)?onet\.tv/[a-z]/[a-z]+/(?P<display_id>[0-9a-z-]+)/(?P<id>[0-9a-z]+)'
_VALID_URL = r'https?://(?:www\.)?onet\.tv/[a-z]/[a-z]+/(?P<display_id>[0-9a-z-]+)/(?P<id>[0-9a-z]+)'
IE_NAME = 'onet.tv'
_TEST = {


@ -1,53 +1,40 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class ParliamentLiveUKIE(InfoExtractor):
IE_NAME = 'parliamentlive.tv'
IE_DESC = 'UK parliament videos'
_VALID_URL = r'https?://www\.parliamentlive\.tv/Main/Player\.aspx\?(?:[^&]+&)*?meetingId=(?P<id>[0-9]+)'
_VALID_URL = r'https?://(?:www\.)?parliamentlive\.tv/Event/Index/(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
_TEST = {
'url': 'http://www.parliamentlive.tv/Main/Player.aspx?meetingId=15121&player=windowsmedia',
'url': 'http://parliamentlive.tv/Event/Index/c1e9d44d-fd6c-4263-b50f-97ed26cc998b',
'info_dict': {
'id': '15121',
'ext': 'asf',
'title': 'hoc home affairs committee, 18 mar 2014.pm',
'description': 'md5:033b3acdf83304cd43946b2d5e5798d1',
'id': 'c1e9d44d-fd6c-4263-b50f-97ed26cc998b',
'ext': 'mp4',
'title': 'Home Affairs Committee',
'uploader_id': 'FFMPEG-01',
'timestamp': 1422696664,
'upload_date': '20150131',
},
'params': {
'skip_download': True, # Requires mplayer (mms)
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
asx_url = self._html_search_regex(
r'embed.*?src="([^"]+)" name="MediaPlayer"', webpage,
'metadata URL')
asx = self._download_xml(asx_url, video_id, 'Downloading ASX metadata')
video_url = asx.find('.//REF').attrib['HREF']
title = self._search_regex(
r'''(?x)player\.setClipDetails\(
(?:(?:[0-9]+|"[^"]+"),\s*){2}
"([^"]+",\s*"[^"]+)"
''',
webpage, 'title').replace('", "', ', ')
description = self._html_search_regex(
r'(?s)<span id="MainContentPlaceHolder_CaptionsBlock_WitnessInfo">(.*?)</span>',
webpage, 'description')
video_id = self._match_id(url)
webpage = self._download_webpage(
'http://vodplayer.parliamentlive.tv/?mid=' + video_id, video_id)
widget_config = self._parse_json(self._search_regex(
r'kWidgetConfig\s*=\s*({.+});',
webpage, 'kaltura widget config'), video_id)
kaltura_url = 'kaltura:%s:%s' % (widget_config['wid'][1:], widget_config['entry_id'])
event_title = self._download_json(
'http://parliamentlive.tv/Event/GetShareVideo/' + video_id, video_id)['event']['title']
return {
'_type': 'url_transparent',
'id': video_id,
'ext': 'asf',
'url': video_url,
'title': title,
'description': description,
'title': event_title,
'description': '',
'url': kaltura_url,
'ie_key': 'Kaltura',
}
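
The `kaltura:` URL assembled in the new `_real_extract` can be reproduced outside the extractor. This is a minimal sketch under stated assumptions: `kaltura_url_from_page` is a hypothetical helper, `json.loads` and `re.search` stand in for `_parse_json` and `_search_regex`, and the sample page in the usage note is invented for illustration.

```python
import json
import re


def kaltura_url_from_page(webpage):
    """Return 'kaltura:<partner id>:<entry id>' or None if no config is found."""
    match = re.search(r'kWidgetConfig\s*=\s*({.+});', webpage)
    if not match:
        return None
    config = json.loads(match.group(1))
    # 'wid' carries a leading underscore before the partner id; drop it.
    return 'kaltura:%s:%s' % (config['wid'][1:], config['entry_id'])
```

With the made-up input `'var kWidgetConfig = {"wid": "_2000292", "entry_id": "1_abcd1234"};'` the helper returns `kaltura:2000292:1_abcd1234`.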


@ -2,7 +2,6 @@
from __future__ import unicode_literals
import re
import random
from .common import InfoExtractor
from ..utils import (
@ -13,61 +12,69 @@ from ..utils import (
class PornoVoisinesIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?pornovoisines\.com/showvideo/(?P<id>\d+)/(?P<display_id>[^/]+)'
_VIDEO_URL_TEMPLATE = 'http://stream%d.pornovoisines.com' \
'/static/media/video/transcoded/%s-640x360-1000-trscded.mp4'
_SERVER_NUMBERS = (1, 2)
_VALID_URL = r'https?://(?:www\.)?pornovoisines\.com/videos/show/(?P<id>\d+)/(?P<display_id>[^/.]+)'
_TEST = {
'url': 'http://www.pornovoisines.com/showvideo/1285/recherche-appartement/',
'md5': '5ac670803bc12e9e7f9f662ce64cf1d1',
'url': 'http://www.pornovoisines.com/videos/show/919/recherche-appartement.html',
'md5': '6f8aca6a058592ab49fe701c8ba8317b',
'info_dict': {
'id': '1285',
'id': '919',
'display_id': 'recherche-appartement',
'ext': 'mp4',
'title': 'Recherche appartement',
'description': 'md5:819ea0b785e2a04667a1a01cdc89594e',
'description': 'md5:fe10cb92ae2dd3ed94bb4080d11ff493',
'thumbnail': 're:^https?://.*\.jpg$',
'upload_date': '20140925',
'duration': 120,
'view_count': int,
'average_rating': float,
'categories': ['Débutantes', 'Scénario', 'Sodomie'],
'categories': ['Débutante', 'Débutantes', 'Scénario', 'Sodomie'],
'age_limit': 18,
'subtitles': {
'fr': [{
'ext': 'vtt',
}]
},
}
}
@classmethod
def build_video_url(cls, num):
return cls._VIDEO_URL_TEMPLATE % (random.choice(cls._SERVER_NUMBERS), num)
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
display_id = mobj.group('display_id')
settings_url = self._download_json(
'http://www.pornovoisines.com/api/video/%s/getsettingsurl/' % video_id,
video_id, note='Getting settings URL')['video_settings_url']
settings = self._download_json(settings_url, video_id)['data']
formats = []
for kind, data in settings['variants'].items():
if kind == 'HLS':
formats.extend(self._extract_m3u8_formats(
data, video_id, ext='mp4', entry_protocol='m3u8_native', m3u8_id='hls'))
elif kind == 'MP4':
for item in data:
formats.append({
'url': item['url'],
'height': item.get('height'),
'bitrate': item.get('bitrate'),
})
self._sort_formats(formats)
webpage = self._download_webpage(url, video_id)
video_url = self.build_video_url(video_id)
title = self._og_search_title(webpage)
description = self._og_search_description(webpage)
title = self._html_search_regex(
r'<h1>(.+?)</h1>', webpage, 'title', flags=re.DOTALL)
description = self._html_search_regex(
r'<article id="descriptif">(.+?)</article>',
webpage, 'description', fatal=False, flags=re.DOTALL)
thumbnail = self._search_regex(
r'<div id="mediaspace%s">\s*<img src="/?([^"]+)"' % video_id,
webpage, 'thumbnail', fatal=False)
if thumbnail:
thumbnail = 'http://www.pornovoisines.com/%s' % thumbnail
# The webpage has a bug - there's no space between "thumb" and src=
thumbnail = self._html_search_regex(
r'<img[^>]+class=([\'"])thumb\1[^>]*src=([\'"])(?P<url>[^\'"]+)\2',
webpage, 'thumbnail', fatal=False, group='url')
upload_date = unified_strdate(self._search_regex(
r'Publié le ([\d-]+)', webpage, 'upload date', fatal=False))
duration = int_or_none(self._search_regex(
'Durée (\d+)', webpage, 'duration', fatal=False))
r'Le\s*<b>([\d/]+)', webpage, 'upload date', fatal=False))
duration = settings.get('main', {}).get('duration')
view_count = int_or_none(self._search_regex(
r'(\d+) vues', webpage, 'view count', fatal=False))
average_rating = self._search_regex(
@ -75,15 +82,19 @@ class PornoVoisinesIE(InfoExtractor):
if average_rating:
average_rating = float_or_none(average_rating.replace(',', '.'))
categories = self._html_search_meta(
'keywords', webpage, 'categories', fatal=False)
categories = self._html_search_regex(
r'(?s)Catégories\s*:\s*<b>(.+?)</b>', webpage, 'categories', fatal=False)
if categories:
categories = [category.strip() for category in categories.split(',')]
subtitles = {'fr': [{
'url': subtitle,
} for subtitle in settings.get('main', {}).get('vtt_tracks', {}).values()]}
return {
'id': video_id,
'display_id': display_id,
'url': video_url,
'formats': formats,
'title': title,
'description': description,
'thumbnail': thumbnail,
@ -93,4 +104,5 @@ class PornoVoisinesIE(InfoExtractor):
'average_rating': average_rating,
'categories': categories,
'age_limit': 18,
'subtitles': subtitles,
}


@ -15,7 +15,111 @@ from ..utils import (
)
class ProSiebenSat1IE(InfoExtractor):
class ProSiebenSat1BaseIE(InfoExtractor):
def _extract_video_info(self, url, clip_id):
client_location = url
video = self._download_json(
'http://vas.sim-technik.de/vas/live/v2/videos',
clip_id, 'Downloading videos JSON', query={
'access_token': self._TOKEN,
'client_location': client_location,
'client_name': self._CLIENT_NAME,
'ids': clip_id,
})[0]
if video.get('is_protected') is True:
raise ExtractorError('This video is DRM protected.', expected=True)
duration = float_or_none(video.get('duration'))
source_ids = [compat_str(source['id']) for source in video['sources']]
client_id = self._SALT[:2] + sha1(''.join([clip_id, self._SALT, self._TOKEN, client_location, self._SALT, self._CLIENT_NAME]).encode('utf-8')).hexdigest()
sources = self._download_json(
'http://vas.sim-technik.de/vas/live/v2/videos/%s/sources' % clip_id,
clip_id, 'Downloading sources JSON', query={
'access_token': self._TOKEN,
'client_id': client_id,
'client_location': client_location,
'client_name': self._CLIENT_NAME,
})
server_id = sources['server_id']
def fix_bitrate(bitrate):
bitrate = int_or_none(bitrate)
if not bitrate:
return None
return (bitrate // 1000) if bitrate % 1000 == 0 else bitrate
formats = []
for source_id in source_ids:
client_id = self._SALT[:2] + sha1(''.join([self._SALT, clip_id, self._TOKEN, server_id, client_location, source_id, self._SALT, self._CLIENT_NAME]).encode('utf-8')).hexdigest()
urls = self._download_json(
'http://vas.sim-technik.de/vas/live/v2/videos/%s/sources/url' % clip_id,
clip_id, 'Downloading urls JSON', fatal=False, query={
'access_token': self._TOKEN,
'client_id': client_id,
'client_location': client_location,
'client_name': self._CLIENT_NAME,
'server_id': server_id,
'source_ids': source_id,
})
if not urls:
continue
if urls.get('status_code') != 0:
raise ExtractorError('This video is unavailable', expected=True)
urls_sources = urls['sources']
if isinstance(urls_sources, dict):
urls_sources = urls_sources.values()
for source in urls_sources:
source_url = source.get('url')
if not source_url:
continue
protocol = source.get('protocol')
mimetype = source.get('mimetype')
if mimetype == 'application/f4m+xml' or 'f4mgenerator' in source_url or determine_ext(source_url) == 'f4m':
formats.extend(self._extract_f4m_formats(
source_url, clip_id, f4m_id='hds', fatal=False))
elif mimetype == 'application/x-mpegURL':
formats.extend(self._extract_m3u8_formats(
source_url, clip_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
else:
tbr = fix_bitrate(source['bitrate'])
if protocol in ('rtmp', 'rtmpe'):
mobj = re.search(r'^(?P<url>rtmpe?://[^/]+)/(?P<path>.+)$', source_url)
if not mobj:
continue
path = mobj.group('path')
mp4colon_index = path.rfind('mp4:')
app = path[:mp4colon_index]
play_path = path[mp4colon_index:]
formats.append({
'url': '%s/%s' % (mobj.group('url'), app),
'app': app,
'play_path': play_path,
'player_url': 'http://livepassdl.conviva.com/hf/ver/2.79.0.17083/LivePassModuleMain.swf',
'page_url': 'http://www.prosieben.de',
'tbr': tbr,
'ext': 'flv',
'format_id': 'rtmp%s' % ('-%d' % tbr if tbr else ''),
})
else:
formats.append({
'url': source_url,
'tbr': tbr,
'format_id': 'http%s' % ('-%d' % tbr if tbr else ''),
})
self._sort_formats(formats)
return {
'duration': duration,
'formats': formats,
}
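
The two `client_id` values computed in `_extract_video_info` share one scheme: the first two characters of the salt, followed by the SHA-1 hex digest of a fixed ordering of the request parameters. The sketch below isolates that scheme; `make_client_id` is a hypothetical helper, and every value shown is a placeholder chosen for illustration, not a real token or salt.

```python
from hashlib import sha1


def make_client_id(salt, parts):
    """Prefix the SHA-1 hex digest of the joined parts with salt[:2]."""
    digest = sha1(''.join(parts).encode('utf-8')).hexdigest()
    return salt[:2] + digest


# Placeholder values for illustration only.
SALT = '01!example-salt'
TOKEN = 'example-token'
CLIENT_NAME = 'example-client'
clip_id = '12345'
client_location = 'http://www.example.invalid/video'
server_id = '7'
source_id = '3'

# Ordering used for the ".../sources" request.
sources_client_id = make_client_id(
    SALT, [clip_id, SALT, TOKEN, client_location, SALT, CLIENT_NAME])

# Ordering used for each ".../sources/url" request.
url_client_id = make_client_id(
    SALT, [SALT, clip_id, TOKEN, server_id, client_location,
           source_id, SALT, CLIENT_NAME])
```

Keeping the token, salt and client name as class attributes, as the base class does, lets a subclass such as `Puls4IE` reuse the signing logic by overriding only those three values.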
class ProSiebenSat1IE(ProSiebenSat1BaseIE):
IE_NAME = 'prosiebensat1'
IE_DESC = 'ProSiebenSat.1 Digital'
_VALID_URL = r'https?://(?:www\.)?(?:(?:prosieben|prosiebenmaxx|sixx|sat1|kabeleins|the-voice-of-germany|7tv)\.(?:de|at|ch)|ran\.de|fem\.com)/(?P<id>.+)'
@ -188,6 +292,9 @@ class ProSiebenSat1IE(InfoExtractor):
},
]
_TOKEN = 'prosieben'
_SALT = '01!8d8F_)r9]4s[qeuXfP%'
_CLIENT_NAME = 'kolibri-2.0.19-splec4'
_CLIPID_REGEXES = [
r'"clip_id"\s*:\s+"(\d+)"',
r'clipid: "(\d+)"',
@ -234,123 +341,22 @@ class ProSiebenSat1IE(InfoExtractor):
def _extract_clip(self, url, webpage):
clip_id = self._html_search_regex(
self._CLIPID_REGEXES, webpage, 'clip id')
access_token = 'prosieben'
client_name = 'kolibri-2.0.19-splec4'
client_location = url
video = self._download_json(
'http://vas.sim-technik.de/vas/live/v2/videos',
clip_id, 'Downloading videos JSON', query={
'access_token': access_token,
'client_location': client_location,
'client_name': client_name,
'ids': clip_id,
})[0]
if video.get('is_protected') is True:
raise ExtractorError('This video is DRM protected.', expected=True)
duration = float_or_none(video.get('duration'))
source_ids = [compat_str(source['id']) for source in video['sources']]
g = '01!8d8F_)r9]4s[qeuXfP%'
client_id = g[:2] + sha1(''.join([clip_id, g, access_token, client_location, g, client_name]).encode('utf-8')).hexdigest()
sources = self._download_json(
'http://vas.sim-technik.de/vas/live/v2/videos/%s/sources' % clip_id,
clip_id, 'Downloading sources JSON', query={
'access_token': access_token,
'client_id': client_id,
'client_location': client_location,
'client_name': client_name,
})
server_id = sources['server_id']
title = self._html_search_regex(self._TITLE_REGEXES, webpage, 'title')
def fix_bitrate(bitrate):
bitrate = int_or_none(bitrate)
if not bitrate:
return None
return (bitrate // 1000) if bitrate % 1000 == 0 else bitrate
formats = []
for source_id in source_ids:
client_id = g[:2] + sha1(''.join([g, clip_id, access_token, server_id, client_location, source_id, g, client_name]).encode('utf-8')).hexdigest()
urls = self._download_json(
'http://vas.sim-technik.de/vas/live/v2/videos/%s/sources/url' % clip_id,
clip_id, 'Downloading urls JSON', fatal=False, query={
'access_token': access_token,
'client_id': client_id,
'client_location': client_location,
'client_name': client_name,
'server_id': server_id,
'source_ids': source_id,
})
if not urls:
continue
if urls.get('status_code') != 0:
raise ExtractorError('This video is unavailable', expected=True)
urls_sources = urls['sources']
if isinstance(urls_sources, dict):
urls_sources = urls_sources.values()
for source in urls_sources:
source_url = source.get('url')
if not source_url:
continue
protocol = source.get('protocol')
mimetype = source.get('mimetype')
if mimetype == 'application/f4m+xml' or 'f4mgenerator' in source_url or determine_ext(source_url) == 'f4m':
formats.extend(self._extract_f4m_formats(
source_url, clip_id, f4m_id='hds', fatal=False))
elif mimetype == 'application/x-mpegURL':
formats.extend(self._extract_m3u8_formats(
source_url, clip_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
else:
tbr = fix_bitrate(source['bitrate'])
if protocol in ('rtmp', 'rtmpe'):
mobj = re.search(r'^(?P<url>rtmpe?://[^/]+)/(?P<path>.+)$', source_url)
if not mobj:
continue
path = mobj.group('path')
mp4colon_index = path.rfind('mp4:')
app = path[:mp4colon_index]
play_path = path[mp4colon_index:]
formats.append({
'url': '%s/%s' % (mobj.group('url'), app),
'app': app,
'play_path': play_path,
'player_url': 'http://livepassdl.conviva.com/hf/ver/2.79.0.17083/LivePassModuleMain.swf',
'page_url': 'http://www.prosieben.de',
'tbr': tbr,
'ext': 'flv',
'format_id': 'rtmp%s' % ('-%d' % tbr if tbr else ''),
})
else:
formats.append({
'url': source_url,
'tbr': tbr,
'format_id': 'http%s' % ('-%d' % tbr if tbr else ''),
})
self._sort_formats(formats)
info = self._extract_video_info(url, clip_id)
description = self._html_search_regex(
self._DESCRIPTION_REGEXES, webpage, 'description', fatal=False)
thumbnail = self._og_search_thumbnail(webpage)
upload_date = unified_strdate(self._html_search_regex(
self._UPLOAD_DATE_REGEXES, webpage, 'upload date', default=None))
return {
info.update({
'id': clip_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'upload_date': upload_date,
'duration': duration,
'formats': formats,
}
})
return info
def _extract_playlist(self, url, webpage):
playlist_id = self._html_search_regex(


@ -1,88 +1,51 @@
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
from .common import InfoExtractor
from .prosiebensat1 import ProSiebenSat1BaseIE
from ..utils import (
ExtractorError,
unified_strdate,
int_or_none,
parse_duration,
compat_str,
)
class Puls4IE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?puls4\.com/video/[^/]+/play/(?P<id>[0-9]+)'
class Puls4IE(ProSiebenSat1BaseIE):
_VALID_URL = r'https?://(?:www\.)?puls4\.com/(?P<id>(?:[^/]+/)*?videos/[^?#]+)'
_TESTS = [{
'url': 'http://www.puls4.com/video/pro-und-contra/play/2716816',
'md5': '49f6a6629747eeec43cef6a46b5df81d',
'url': 'http://www.puls4.com/2-minuten-2-millionen/staffel-3/videos/2min2miotalk/Tobias-Homberger-von-myclubs-im-2min2miotalk-118118',
'md5': 'fd3c6b0903ac72c9d004f04bc6bb3e03',
'info_dict': {
'id': '2716816',
'ext': 'mp4',
'title': 'Pro und Contra vom 23.02.2015',
'description': 'md5:293e44634d9477a67122489994675db6',
'duration': 2989,
'upload_date': '20150224',
'id': '118118',
'ext': 'flv',
'title': 'Tobias Homberger von myclubs im #2min2miotalk',
'description': 'md5:f9def7c5e8745d6026d8885487d91955',
'upload_date': '20160830',
'uploader': 'PULS_4',
},
'skip': 'Only works from Germany',
}, {
'url': 'http://www.puls4.com/video/kult-spielfilme/play/1298106',
'md5': '6a48316c8903ece8dab9b9a7bf7a59ec',
'info_dict': {
'id': '1298106',
'ext': 'mp4',
'title': 'Lucky Fritz',
},
'skip': 'Only works from Germany',
}]
_TOKEN = 'puls4'
_SALT = '01!kaNgaiNgah1Ie4AeSha'
_CLIENT_NAME = ''
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
error_message = self._html_search_regex(
r'<div[^>]+class="message-error"[^>]*>(.+?)</div>',
webpage, 'error message', default=None)
if error_message:
raise ExtractorError(
'%s returned error: %s' % (self.IE_NAME, error_message), expected=True)
real_url = self._html_search_regex(
r'\"fsk-button\".+?href=\"([^"]+)',
webpage, 'fsk_button', default=None)
if real_url:
webpage = self._download_webpage(real_url, video_id)
player = self._search_regex(
r'p4_video_player(?:_iframe)?\("video_\d+_container"\s*,(.+?)\);\s*\}',
webpage, 'player')
player_json = self._parse_json(
'[%s]' % player, video_id,
transform_source=lambda s: s.replace('undefined,', ''))
formats = None
result = None
for v in player_json:
if isinstance(v, list) and not formats:
formats = [{
'url': f['url'],
'format': 'hd' if f.get('hd') else 'sd',
'width': int_or_none(f.get('size_x')),
'height': int_or_none(f.get('size_y')),
'tbr': int_or_none(f.get('bitrate')),
} for f in v]
self._sort_formats(formats)
elif isinstance(v, dict) and not result:
result = {
'id': video_id,
'title': v['videopartname'].strip(),
'description': v.get('videotitle'),
'duration': int_or_none(v.get('videoduration') or v.get('episodeduration')),
'upload_date': unified_strdate(v.get('clipreleasetime')),
'uploader': v.get('channel'),
}
result['formats'] = formats
return result
path = self._match_id(url)
content_path = self._download_json(
'http://www.puls4.com/api/json-fe/page/' + path, path)['content'][0]['url']
media = self._download_json(
'http://www.puls4.com' + content_path,
content_path)['mediaCurrent']
player_content = media['playerContent']
info = self._extract_video_info(url, player_content['id'])
info.update({
'id': compat_str(media['objectId']),
'title': player_content['title'],
'description': media.get('description'),
'thumbnail': media.get('previewLink'),
'upload_date': unified_strdate(media.get('date')),
'duration': parse_duration(player_content.get('duration')),
'episode': player_content.get('episodePartName'),
'show': media.get('channel'),
'season_id': player_content.get('seasonId'),
'uploader': player_content.get('sourceCompany'),
})
return info


@ -0,0 +1,39 @@
# encoding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from .brightcove import BrightcoveLegacyIE
from ..compat import (
compat_parse_qs,
compat_urlparse,
)
class RMCDecouverteIE(InfoExtractor):
_VALID_URL = r'https?://rmcdecouverte\.bfmtv\.com/mediaplayer-replay.*?\bid=(?P<id>\d+)'
_TEST = {
'url': 'http://rmcdecouverte.bfmtv.com/mediaplayer-replay/?id=1430&title=LES%20HEROS%20DU%2088e%20ETAGE',
'info_dict': {
'id': '5111223049001',
'ext': 'mp4',
'title': ': LES HEROS DU 88e ETAGE',
'description': 'Découvrez comment la bravoure de deux hommes dans la Tour Nord du World Trade Center a sauvé la vie d\'innombrables personnes le 11 septembre 2001.',
'uploader_id': '1969646226001',
'upload_date': '20160904',
'timestamp': 1472951103,
},
'params': {
# rtmp download
'skip_download': True,
},
'skip': 'Only works from France',
}
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/1969646226001/default_default/index.html?videoId=%s'
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
brightcove_legacy_url = BrightcoveLegacyIE._extract_brightcove_url(webpage)
brightcove_id = compat_parse_qs(compat_urlparse.urlparse(brightcove_legacy_url).query)['@videoPlayer'][0]
return self.url_result(self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, 'BrightcoveNew', brightcove_id)
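
The ID lookup above parses only the query component of the legacy URL before reading `@videoPlayer`. A minimal Python 3 sketch of that step follows, with `urllib.parse` standing in for the `compat_*` wrappers; `brightcove_video_id` is a hypothetical helper and the sample URL in the usage note is invented.

```python
from urllib.parse import parse_qs, urlparse


def brightcove_video_id(legacy_url):
    """Read the '@videoPlayer' parameter from a legacy Brightcove URL."""
    # Parse the query string only; feeding the full URL to parse_qs would
    # fold the scheme, host and path into the name of the first parameter.
    query = urlparse(legacy_url).query
    return parse_qs(query)['@videoPlayer'][0]
```

For a made-up URL such as `http://example.invalid/viewer?playerID=1&%40videoPlayer=5111223049001` the helper returns `5111223049001`, since `parse_qs` percent-decodes parameter names.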


@ -1,7 +1,6 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_urlparse
from .internetvideoarchive import InternetVideoArchiveIE
@ -11,21 +10,23 @@ class RottenTomatoesIE(InfoExtractor):
_TEST = {
'url': 'http://www.rottentomatoes.com/m/toy_story_3/trailers/11028566/',
'info_dict': {
'id': '613340',
'id': '11028566',
'ext': 'mp4',
'title': 'Toy Story 3',
'description': 'From the creators of the beloved TOY STORY films, comes a story that will reunite the gang in a whole new way.',
'thumbnail': 're:^https?://.*\.jpg$',
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
og_video = self._og_search_video_url(webpage)
query = compat_urlparse.urlparse(og_video).query
iva_id = self._search_regex(r'publishedid=(\d+)', webpage, 'internet video archive id')
return {
'_type': 'url_transparent',
'url': InternetVideoArchiveIE._build_xml_url(query),
'url': 'http://video.internetvideoarchive.net/player/6/configuration.ashx?domain=www.videodetective.com&customerid=69249&playerid=641&publishedid=' + iva_id,
'ie_key': InternetVideoArchiveIE.ie_key(),
'id': video_id,
'title': self._og_search_title(webpage),
}


@ -88,7 +88,7 @@ class RutubeIE(InfoExtractor):
class RutubeEmbedIE(InfoExtractor):
IE_NAME = 'rutube:embed'
IE_DESC = 'Rutube embedded videos'
_VALID_URL = 'https?://rutube\.ru/(?:video|play)/embed/(?P<id>[0-9]+)'
_VALID_URL = r'https?://rutube\.ru/(?:video|play)/embed/(?P<id>[0-9]+)'
_TESTS = [{
'url': 'http://rutube.ru/video/embed/6722881?vk_puid37=&vk_puid38=',


@ -35,6 +35,7 @@ class SouthParkEsIE(SouthParkIE):
'description': 'Cartman Consigue Una Sonda Anal',
},
'playlist_count': 4,
'skip': 'Geo-restricted',
}]


@ -103,7 +103,7 @@ class SpiegelIE(InfoExtractor):
class SpiegelArticleIE(InfoExtractor):
_VALID_URL = 'https?://www\.spiegel\.de/(?!video/)[^?#]*?-(?P<id>[0-9]+)\.html'
_VALID_URL = r'https?://www\.spiegel\.de/(?!video/)[^?#]*?-(?P<id>[0-9]+)\.html'
IE_NAME = 'Spiegel:Article'
IE_DESC = 'Articles on spiegel.de'
_TESTS = [{


@ -53,7 +53,7 @@ class TBSIE(TurnerBaseIE):
'media_src': 'http://ht.cdn.turner.com/%s/big' % site,
},
'secure': {
'media_src': 'http://apple-secure.cdn.turner.com/%s/big' % site,
'media_src': 'http://androidhls-secure.cdn.turner.com/%s/big' % site,
'tokenizer_src': 'http://www.%s.com/video/processors/services/token_ipadAdobe.do' % domain,
},
})


@ -0,0 +1,36 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import int_or_none
class TeleQuebecIE(InfoExtractor):
_VALID_URL = r'https?://zonevideo\.telequebec\.tv/media/(?P<id>\d+)'
_TEST = {
'url': 'http://zonevideo.telequebec.tv/media/20984/le-couronnement-de-new-york/couronnement-de-new-york',
'md5': 'fe95a0957e5707b1b01f5013e725c90f',
'info_dict': {
'id': '20984',
'ext': 'mp4',
'title': 'Le couronnement de New York',
'description': 'md5:f5b3d27a689ec6c1486132b2d687d432',
'upload_date': '20160220',
'timestamp': 1455965438,
}
}
def _real_extract(self, url):
media_id = self._match_id(url)
media_data = self._download_json(
'https://mnmedias.api.telequebec.tv/api/v2/media/' + media_id,
media_id)['media']
return {
'_type': 'url_transparent',
'id': media_id,
'url': 'limelight:media:' + media_data['streamInfo']['sourceId'],
'title': media_data['title'],
'description': media_data.get('descriptions', [{'text': None}])[0].get('text'),
'duration': int_or_none(media_data.get('durationInMilliseconds'), 1000),
'ie_key': 'LimelightMedia',
}


@ -96,7 +96,7 @@ class ThePlatformBaseIE(OnceIE):
class ThePlatformIE(ThePlatformBaseIE, AdobePassIE):
_VALID_URL = r'''(?x)
(?:https?://(?:link|player)\.theplatform\.com/[sp]/(?P<provider_id>[^/]+)/
(?:(?:(?:[^/]+/)+select/)?(?P<media>media/(?:guid/\d+/)?)|(?P<config>(?:[^/\?]+/(?:swf|config)|onsite)/select/))?
(?:(?:(?:[^/]+/)+select/)?(?P<media>media/(?:guid/\d+/)?)?|(?P<config>(?:[^/\?]+/(?:swf|config)|onsite)/select/))?
|theplatform:)(?P<id>[^/\?&]+)'''
_TESTS = [{
@ -116,6 +116,7 @@ class ThePlatformIE(ThePlatformBaseIE, AdobePassIE):
# rtmp download
'skip_download': True,
},
'skip': '404 Not Found',
}, {
# from http://www.cnet.com/videos/tesla-model-s-a-second-step-towards-a-cleaner-motoring-future/
'url': 'http://link.theplatform.com/s/kYEXFC/22d_qsQ6MIRT',

View File

@ -2,8 +2,6 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from .brightcove import BrightcoveLegacyIE
from ..compat import compat_parse_qs
class TheStarIE(InfoExtractor):
@ -30,6 +28,9 @@ class TheStarIE(InfoExtractor):
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
brightcove_legacy_url = BrightcoveLegacyIE._extract_brightcove_url(webpage)
brightcove_id = compat_parse_qs(brightcove_legacy_url)['@videoPlayer'][0]
return self.url_result(self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, 'BrightcoveNew', brightcove_id)
brightcove_id = self._search_regex(
r'mainartBrightcoveVideoId["\']?\s*:\s*["\']?(\d+)',
webpage, 'brightcove id')
return self.url_result(
self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id,
'BrightcoveNew', brightcove_id)
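
Note on the new lookup above: the pattern tolerates optional quotes around both the key and the numeric value. A small sketch with plain `re`, assuming a page fragment shaped like the one below (the fragment is hypothetical and the real markup may differ):

```python
import re

# Hypothetical page fragment, shaped to fit the pattern.
webpage = '<script>var cfg = {"mainartBrightcoveVideoId": "4732393888001"};</script>'

match = re.search(
    r'mainartBrightcoveVideoId["\']?\s*:\s*["\']?(\d+)', webpage)
brightcove_id = match.group(1) if match else None
```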

View File

@ -1,84 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
unified_strdate
)
class THVideoIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?thvideo\.tv/(?:v/th|mobile\.php\?cid=)(?P<id>[0-9]+)'
_TEST = {
'url': 'http://thvideo.tv/v/th1987/',
'md5': 'fa107b1f73817e325e9433505a70db50',
'info_dict': {
'id': '1987',
'ext': 'mp4',
'title': '【动画】秘封活动记录 The Sealed Esoteric History.分镜稿预览',
'display_id': 'th1987',
'thumbnail': 'http://thvideo.tv/uploadfile/2014/0722/20140722013459856.jpg',
'description': '社团京都幻想剧团的第一个东方二次同人动画作品「秘封活动记录 The Sealed Esoteric History.」 本视频是该动画第一期的分镜草稿...',
'upload_date': '20140722'
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
# extract download link from mobile player page
webpage_player = self._download_webpage(
'http://thvideo.tv/mobile.php?cid=%s-0' % (video_id),
video_id, note='Downloading video source page')
video_url = self._html_search_regex(
r'<source src="(.*?)" type', webpage_player, 'video url')
# extract video info from main page
webpage = self._download_webpage(
'http://thvideo.tv/v/th%s' % (video_id), video_id)
title = self._og_search_title(webpage)
display_id = 'th%s' % video_id
thumbnail = self._og_search_thumbnail(webpage)
description = self._og_search_description(webpage)
upload_date = unified_strdate(self._html_search_regex(
r'span itemprop="datePublished" content="(.*?)">', webpage,
'upload date', fatal=False))
return {
'id': video_id,
'ext': 'mp4',
'url': video_url,
'title': title,
'display_id': display_id,
'thumbnail': thumbnail,
'description': description,
'upload_date': upload_date
}
class THVideoPlaylistIE(InfoExtractor):
_VALID_URL = r'http?://(?:www\.)?thvideo\.tv/mylist(?P<id>[0-9]+)'
_TEST = {
'url': 'http://thvideo.tv/mylist2',
'info_dict': {
'id': '2',
'title': '幻想万華鏡',
},
'playlist_mincount': 23,
}
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
list_title = self._html_search_regex(
r'<h1 class="show_title">(.*?)<b id', webpage, 'playlist title',
fatal=False)
entries = [
self.url_result('http://thvideo.tv/v/th' + id, 'THVideo')
for id in re.findall(r'<dd><a href="http://thvideo.tv/v/th(\d+)/" target=', webpage)]
return self.playlist_result(entries, playlist_id, list_title)

View File

@ -1,10 +1,14 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from .brightcove import BrightcoveLegacyIE
from ..compat import compat_parse_qs
from ..compat import (
compat_parse_qs,
compat_urlparse,
)
class TlcDeIE(InfoExtractor):
@ -35,5 +39,5 @@ class TlcDeIE(InfoExtractor):
title = mobj.group('title')
webpage = self._download_webpage(url, title)
brightcove_legacy_url = BrightcoveLegacyIE._extract_brightcove_url(webpage)
brightcove_id = compat_parse_qs(brightcove_legacy_url)['@videoPlayer'][0]
brightcove_id = compat_parse_qs(compat_urlparse.urlparse(brightcove_legacy_url).query)['@videoPlayer'][0]
return self.url_result(self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, 'BrightcoveNew', brightcove_id)
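
Note on the `compat_urlparse` change above: parsing a whole URL as a query string folds the scheme, host and path into the first key, so `@videoPlayer` is not found when it is the first parameter. The sketch below uses the Python 3 standard-library equivalents (`urllib.parse`) rather than the `compat_*` wrappers, and the embed URL is a hypothetical example:

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical legacy embed URL with '@videoPlayer' as the first parameter.
legacy_url = ('http://c.brightcove.com/services/viewer/htmlFederated'
              '?%40videoPlayer=4732393888001&playerID=123')

# Parsing the full URL: the first key absorbs everything before the '='.
whole_url_keys = parse_qs(legacy_url)

# Parsing only the query component gives the expected keys.
query_only = parse_qs(urlparse(legacy_url).query)
brightcove_id = query_only['@videoPlayer'][0]
```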

View File

@ -0,0 +1,35 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .turner import TurnerBaseIE
class TruTVIE(TurnerBaseIE):
_VALID_URL = r'https?://(?:www\.)?trutv\.com(?:(?P<path>/shows/[^/]+/videos/[^/?#]+?)\.html|/full-episodes/[^/]+/(?P<id>\d+))'
_TEST = {
'url': 'http://www.trutv.com/shows/10-things/videos/you-wont-believe-these-sports-bets.html',
'md5': '2cdc844f317579fed1a7251b087ff417',
'info_dict': {
'id': '/shows/10-things/videos/you-wont-believe-these-sports-bets',
'ext': 'mp4',
'title': 'You Won\'t Believe These Sports Bets',
'description': 'Jamie Lee sits down with a bookie to discuss the bizarre world of illegal sports betting.',
'upload_date': '20130305',
}
}
def _real_extract(self, url):
path, video_id = re.match(self._VALID_URL, url).groups()
if path:
data_src = 'http://www.trutv.com/video/cvp/v2/xml/content.xml?id=%s.xml' % path
else:
data_src = 'http://www.trutv.com/tveverywhere/services/cvpXML.do?titleId=' + video_id
return self._extract_cvp_info(
data_src, path, {
'secure': {
'media_src': 'http://androidhls-secure.cdn.turner.com/trutv/big',
'tokenizer_src': 'http://www.trutv.com/tveverywhere/processors/services/token_ipadAdobe.do',
},
})

View File

@ -12,7 +12,7 @@ from ..utils import (
parse_duration,
xpath_attr,
update_url_query,
compat_urlparse,
ExtractorError,
)
@ -24,6 +24,7 @@ class TurnerBaseIE(InfoExtractor):
video_data = self._download_xml(data_src, video_id)
video_id = video_data.attrib['id']
title = xpath_text(video_data, 'headline', fatal=True)
content_id = xpath_text(video_data, 'contentId') or video_id
# rtmp_src = xpath_text(video_data, 'akamai/src')
# if rtmp_src:
# splited_rtmp_src = rtmp_src.split(',')
@ -54,7 +55,7 @@ class TurnerBaseIE(InfoExtractor):
# auth = self._download_webpage(
# protected_path_data['tokenizer_src'], query={
# 'path': protected_path,
# 'videoId': video_id,
# 'videoId': content_id,
# 'aifp': aifp,
# })
# token = xpath_text(auth, 'token')
@ -72,8 +73,11 @@ class TurnerBaseIE(InfoExtractor):
auth = self._download_xml(
secure_path_data['tokenizer_src'], video_id, query={
'path': secure_path,
'videoId': video_id,
'videoId': content_id,
})
error_msg = xpath_text(auth, 'error/msg')
if error_msg:
raise ExtractorError(error_msg, expected=True)
token = xpath_text(auth, 'token')
if not token:
continue
@ -93,19 +97,9 @@ class TurnerBaseIE(InfoExtractor):
formats.extend(self._extract_smil_formats(
video_url, video_id, fatal=False))
elif ext == 'm3u8':
m3u8_formats = self._extract_m3u8_formats(
video_url, video_id, 'mp4', m3u8_id=format_id or 'hls',
fatal=False)
if m3u8_formats:
# Sometimes final URLs inside m3u8 are unsigned, let's fix this
# ourselves
qs = compat_urlparse.urlparse(video_url).query
if qs:
query = compat_urlparse.parse_qs(qs)
for m3u8_format in m3u8_formats:
m3u8_format['url'] = update_url_query(m3u8_format['url'], query)
m3u8_format['extra_param_to_segment_url'] = qs
formats.extend(m3u8_formats)
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4',
m3u8_id=format_id or 'hls', fatal=False))
elif ext == 'f4m':
formats.extend(self._extract_f4m_formats(
update_url_query(video_url, {'hdcore': '3.7.0'}),
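
Note on the tokenizer hunk above: the response is now checked for `error/msg` before the token is read, so a server-side refusal surfaces as a readable message. A minimal sketch with `xml.etree.ElementTree`, assuming response shapes like the two strings below (both are invented, and `TokenError` is a hypothetical stand-in for the extractor's own exception):

```python
import xml.etree.ElementTree as ET


class TokenError(Exception):
    """Hypothetical stand-in for an expected extraction error."""


def read_token(auth_xml):
    auth = ET.fromstring(auth_xml)
    # Report the server's message first, if it sent one.
    error_msg = auth.findtext('error/msg')
    if error_msg:
        raise TokenError(error_msg)
    return auth.findtext('token')


token = read_token('<auth><token>abc123</token></auth>')
try:
    read_token('<auth><error><msg>Not authorized</msg></error></auth>')
    failure = None
except TokenError as exc:
    failure = str(exc)
```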

View File

@ -0,0 +1,49 @@
# coding: utf-8
from __future__ import unicode_literals
from .jwplatform import JWPlatformBaseIE
from ..utils import (
clean_html,
get_element_by_class,
js_to_json,
)
class TVNoeIE(JWPlatformBaseIE):
_VALID_URL = r'https?://(?:www\.)?tvnoe\.cz/video/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://www.tvnoe.cz/video/10362',
'md5': 'aee983f279aab96ec45ab6e2abb3c2ca',
'info_dict': {
'id': '10362',
'ext': 'mp4',
'series': 'Noční univerzita',
'title': 'prof. Tomáš Halík, Th.D. - Návrat náboženství a střet civilizací',
'description': 'md5:f337bae384e1a531a52c55ebc50fff41',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
iframe_url = self._search_regex(
r'<iframe[^>]+src="([^"]+)"', webpage, 'iframe URL')
ifs_page = self._download_webpage(iframe_url, video_id)
jwplayer_data = self._parse_json(
self._find_jwplayer_data(ifs_page),
video_id, transform_source=js_to_json)
info_dict = self._parse_jwplayer_data(
jwplayer_data, video_id, require_title=False, base_url=iframe_url)
info_dict.update({
'id': video_id,
'title': clean_html(get_element_by_class(
'field-name-field-podnazev', webpage)),
'description': clean_html(get_element_by_class(
'field-name-body', webpage)),
'series': clean_html(get_element_by_class('title', webpage))
})
return info_dict

View File

@ -348,6 +348,25 @@ class ViafreeIE(InfoExtractor):
'skip_download': True,
},
'add_ie': [TVPlayIE.ie_key()],
}, {
# with relatedClips
'url': 'http://www.viafree.se/program/reality/sommaren-med-youtube-stjarnorna/sasong-1/avsnitt-1',
'info_dict': {
'id': '758770',
'ext': 'mp4',
'title': 'Sommaren med YouTube-stjärnorna S01E01',
'description': 'md5:2bc69dce2c4bb48391e858539bbb0e3f',
'series': 'Sommaren med YouTube-stjärnorna',
'season': 'Säsong 1',
'season_number': 1,
'duration': 1326,
'timestamp': 1470905572,
'upload_date': '20160811',
},
'params': {
'skip_download': True,
},
'add_ie': [TVPlayIE.ie_key()],
}, {
'url': 'http://www.viafree.no/programmer/underholdning/det-beste-vorspielet/sesong-2/episode-1',
'only_matching': True,
@ -365,8 +384,17 @@ class ViafreeIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
video_id = self._search_regex(
r'currentVideo["\']\s*:\s*.+?["\']id["\']\s*:\s*["\'](?P<id>\d{6,})',
webpage, 'video id')
video_id = None
thumbnail = self._og_search_thumbnail(webpage, default=None)
if thumbnail:
video_id = self._search_regex(
r'https?://[^/]+/imagecache/(?:[^/]+/)+seasons/\d+/(\d{6,})/',
thumbnail, 'video id', default=None)
if not video_id:
video_id = self._search_regex(
r'currentVideo["\']\s*:\s*.+?["\']id["\']\s*:\s*["\'](\d{6,})',
webpage, 'video id')
return self.url_result('mtg:%s' % video_id, TVPlayIE.ie_key())
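
Note on the lookup order above: the id is taken from the thumbnail URL when possible, and the `currentVideo` page data is used only as a fallback. A sketch of that order with plain `re`; `find_video_id` is a hypothetical helper, both sample inputs are invented, and unlike the extractor it returns `None` rather than raising when nothing matches:

```python
import re

THUMBNAIL_RE = r'https?://[^/]+/imagecache/(?:[^/]+/)+seasons/\d+/(\d{6,})/'
PAGE_RE = r'currentVideo["\']\s*:\s*.+?["\']id["\']\s*:\s*["\'](\d{6,})'


def find_video_id(thumbnail, webpage):
    """Prefer the id in the thumbnail URL, then fall back to the page data."""
    if thumbnail:
        match = re.search(THUMBNAIL_RE, thumbnail)
        if match:
            return match.group(1)
    match = re.search(PAGE_RE, webpage)
    return match.group(1) if match else None


# Hypothetical samples shaped to fit the two patterns.
thumb = ('http://cdn.example.com/imagecache/600x315/cloud/'
         'content-images/seasons/10373/758770/image.jpg')
page = '"currentVideo": {"title": "x", "id": "758770"}'
```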

View File

@ -342,7 +342,7 @@ class TwitterIE(InfoExtractor):
class TwitterAmplifyIE(TwitterBaseIE):
IE_NAME = 'twitter:amplify'
_VALID_URL = 'https?://amp\.twimg\.com/v/(?P<id>[0-9a-f\-]{36})'
_VALID_URL = r'https?://amp\.twimg\.com/v/(?P<id>[0-9a-f\-]{36})'
_TEST = {
'url': 'https://amp.twimg.com/v/0ba0c3c7-0af3-4c0a-bed5-7efd1ffa2951',

View File

@ -6,8 +6,7 @@ import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_age_limit,
parse_iso8601,
xpath_element,
xpath_text,
)
@ -17,38 +16,32 @@ class VideomoreIE(InfoExtractor):
_VALID_URL = r'videomore:(?P<sid>\d+)$|https?://videomore\.ru/(?:(?:embed|[^/]+/[^/]+)/|[^/]+\?.*\btrack_id=)(?P<id>\d+)(?:[/?#&]|\.(?:xml|json)|$)'
_TESTS = [{
'url': 'http://videomore.ru/kino_v_detalayah/5_sezon/367617',
'md5': '70875fbf57a1cd004709920381587185',
'md5': '44455a346edc0d509ac5b5a5b531dc35',
'info_dict': {
'id': '367617',
'ext': 'flv',
'title': 'В гостях Алексей Чумаков и Юлия Ковальчук',
'description': 'В гостях лучшие романтические комедии года, «Выживший» Иньярриту и «Стив Джобс» Дэнни Бойла.',
'title': 'Кино в деталях 5 сезон В гостях Алексей Чумаков и Юлия Ковальчук',
'series': 'Кино в деталях',
'episode': 'В гостях Алексей Чумаков и Юлия Ковальчук',
'episode_number': None,
'season': 'Сезон 2015',
'season_number': 5,
'thumbnail': 're:^https?://.*\.jpg',
'duration': 2910,
'age_limit': 16,
'view_count': int,
'comment_count': int,
'age_limit': 16,
},
}, {
'url': 'http://videomore.ru/embed/259974',
'info_dict': {
'id': '259974',
'ext': 'flv',
'title': '80 серия',
'description': '«Медведей» ждет решающий матч. Макеев выясняет отношения со Стрельцовым. Парни узнают подробности прошлого Макеева.',
'title': 'Молодежка 2 сезон 40 серия',
'series': 'Молодежка',
'episode': '80 серия',
'episode_number': 40,
'season': '2 сезон',
'season_number': 2,
'episode': '40 серия',
'thumbnail': 're:^https?://.*\.jpg',
'duration': 2809,
'age_limit': 16,
'view_count': int,
'comment_count': int,
'age_limit': 16,
},
'params': {
'skip_download': True,
@ -58,13 +51,8 @@ class VideomoreIE(InfoExtractor):
'info_dict': {
'id': '341073',
'ext': 'flv',
'title': 'Команда проиграла из-за Бакина?',
'description': 'Молодежка 3 сезон скоро',
'series': 'Молодежка',
'title': 'Промо Команда проиграла из-за Бакина?',
'episode': 'Команда проиграла из-за Бакина?',
'episode_number': None,
'season': 'Промо',
'season_number': 99,
'thumbnail': 're:^https?://.*\.jpg',
'duration': 29,
'age_limit': 16,
@ -109,43 +97,33 @@ class VideomoreIE(InfoExtractor):
'http://videomore.ru/video/tracks/%s.xml' % video_id,
video_id, 'Downloading video XML')
video_url = xpath_text(video, './/video_url', 'video url', fatal=True)
item = xpath_element(video, './/playlist/item', fatal=True)
title = xpath_text(
item, ('./title', './episode_name'), 'title', fatal=True)
video_url = xpath_text(item, './video_url', 'video url', fatal=True)
formats = self._extract_f4m_formats(video_url, video_id, f4m_id='hds')
self._sort_formats(formats)
data = self._download_json(
'http://videomore.ru/video/tracks/%s.json' % video_id,
video_id, 'Downloading video JSON')
thumbnail = xpath_text(item, './thumbnail_url')
duration = int_or_none(xpath_text(item, './duration'))
view_count = int_or_none(xpath_text(item, './views'))
comment_count = int_or_none(xpath_text(item, './count_comments'))
age_limit = int_or_none(xpath_text(item, './min_age'))
title = data.get('title') or data['project_title']
description = data.get('description') or data.get('description_raw')
timestamp = parse_iso8601(data.get('published_at'))
duration = int_or_none(data.get('duration'))
view_count = int_or_none(data.get('views'))
age_limit = parse_age_limit(data.get('min_age'))
thumbnails = [{
'url': thumbnail,
} for thumbnail in data.get('big_thumbnail_urls', [])]
series = data.get('project_title')
episode = data.get('title')
episode_number = int_or_none(data.get('episode_of_season') or None)
season = data.get('season_title')
season_number = int_or_none(data.get('season_pos') or None)
series = xpath_text(item, './project_name')
episode = xpath_text(item, './episode_name')
return {
'id': video_id,
'title': title,
'description': description,
'series': series,
'episode': episode,
'episode_number': episode_number,
'season': season,
'season_number': season_number,
'thumbnails': thumbnails,
'timestamp': timestamp,
'thumbnail': thumbnail,
'duration': duration,
'view_count': view_count,
'comment_count': comment_count,
'age_limit': age_limit,
'formats': formats,
}

View File

@ -28,23 +28,24 @@ class SprutoBaseIE(InfoExtractor):
class VimpleIE(SprutoBaseIE):
IE_DESC = 'Vimple - one-click video hosting'
_VALID_URL = r'https?://(?:player\.vimple\.ru/iframe|vimple\.ru)/(?P<id>[\da-f-]{32,36})'
_TESTS = [
{
'url': 'http://vimple.ru/c0f6b1687dcd4000a97ebe70068039cf',
'md5': '2e750a330ed211d3fd41821c6ad9a279',
'info_dict': {
'id': 'c0f6b168-7dcd-4000-a97e-be70068039cf',
'ext': 'mp4',
'title': 'Sunset',
'duration': 20,
'thumbnail': 're:https?://.*?\.jpg',
},
}, {
'url': 'http://player.vimple.ru/iframe/52e1beec-1314-4a83-aeac-c61562eadbf9',
'only_matching': True,
}
]
_VALID_URL = r'https?://(?:player\.vimple\.(?:ru|co)/iframe|vimple\.(?:ru|co))/(?P<id>[\da-f-]{32,36})'
_TESTS = [{
'url': 'http://vimple.ru/c0f6b1687dcd4000a97ebe70068039cf',
'md5': '2e750a330ed211d3fd41821c6ad9a279',
'info_dict': {
'id': 'c0f6b168-7dcd-4000-a97e-be70068039cf',
'ext': 'mp4',
'title': 'Sunset',
'duration': 20,
'thumbnail': 're:https?://.*?\.jpg',
},
}, {
'url': 'http://player.vimple.ru/iframe/52e1beec-1314-4a83-aeac-c61562eadbf9',
'only_matching': True,
}, {
'url': 'http://vimple.co/04506a053f124483b8fb05ed73899f19',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)

View File

@ -86,38 +86,50 @@ class WatIE(InfoExtractor):
def extract_url(path_template, url_type):
req_url = 'http://www.wat.tv/get/%s' % (path_template % video_id)
head = self._request_webpage(HEADRequest(req_url), video_id, 'Extracting %s url' % url_type)
red_url = head.geturl()
if req_url == red_url:
raise ExtractorError(
'%s said: Sorry, this video is not available from your country.' % self.IE_NAME,
expected=True)
return red_url
head = self._request_webpage(HEADRequest(req_url), video_id, 'Extracting %s url' % url_type, fatal=False)
if head:
red_url = head.geturl()
if req_url != red_url:
return red_url
return None
def remove_bitrate_limit(manifest_url):
return re.sub(r'(?:max|min)_bitrate=\d+&?', '', manifest_url)
formats = []
try:
http_url = extract_url('android5/%s.mp4', 'http')
m3u8_url = extract_url('ipad/%s.m3u8', 'm3u8')
m3u8_formats = self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls')
formats.extend(m3u8_formats)
formats.extend(self._extract_f4m_formats(
m3u8_url.replace('ios.', 'web.').replace('.m3u8', '.f4m'),
video_id, f4m_id='hds', fatal=False))
for m3u8_format in m3u8_formats:
vbr, abr = m3u8_format.get('vbr'), m3u8_format.get('abr')
if not vbr or not abr:
continue
format_id = m3u8_format['format_id'].replace('hls', 'http')
fmt_url = re.sub(r'%s-\d+00-\d+' % video_id, '%s-%d00-%d' % (video_id, round(vbr / 100), round(abr)), http_url)
if self._is_valid_url(fmt_url, video_id, format_id):
f = m3u8_format.copy()
f.update({
'url': fmt_url,
'format_id': format_id,
'protocol': 'http',
})
formats.append(f)
manifest_urls = self._download_json(
'http://www.wat.tv/get/webhtml/' + video_id, video_id)
m3u8_url = manifest_urls.get('hls')
if m3u8_url:
m3u8_url = remove_bitrate_limit(m3u8_url)
m3u8_formats = self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False)
if m3u8_formats:
formats.extend(m3u8_formats)
formats.extend(self._extract_f4m_formats(
m3u8_url.replace('ios', 'web').replace('.m3u8', '.f4m'),
video_id, f4m_id='hds', fatal=False))
http_url = extract_url('android5/%s.mp4', 'http')
if http_url:
for m3u8_format in m3u8_formats:
vbr, abr = m3u8_format.get('vbr'), m3u8_format.get('abr')
if not vbr or not abr:
continue
format_id = m3u8_format['format_id'].replace('hls', 'http')
fmt_url = re.sub(r'%s-\d+00-\d+' % video_id, '%s-%d00-%d' % (video_id, round(vbr / 100), round(abr)), http_url)
if self._is_valid_url(fmt_url, video_id, format_id):
f = m3u8_format.copy()
f.update({
'url': fmt_url,
'format_id': format_id,
'protocol': 'http',
})
formats.append(f)
mpd_url = manifest_urls.get('mpd')
if mpd_url:
formats.extend(self._extract_mpd_formats(remove_bitrate_limit(
mpd_url), video_id, mpd_id='dash', fatal=False))
self._sort_formats(formats)
except ExtractorError:
abr = 64
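
Note on `remove_bitrate_limit` above: it drops `max_bitrate` and `min_bitrate` parameters, together with a following `&`, from a manifest URL. A standalone sketch, assuming a query layout like the invented URL below:

```python
import re


def remove_bitrate_limit(manifest_url):
    return re.sub(r'(?:max|min)_bitrate=\d+&?', '', manifest_url)


# Hypothetical manifest URL; the real query layout may differ.
limited = ('http://example.com/video.m3u8'
           '?max_bitrate=1500000&min_bitrate=100000&token=abc')
unlimited = remove_bitrate_limit(limited)
```

When the removed parameter is the last one in the query, a trailing `&` or `?` can remain in the result.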

View File

@ -19,7 +19,10 @@ from ..utils import (
determine_ext,
)
from .brightcove import BrightcoveNewIE
from .brightcove import (
BrightcoveLegacyIE,
BrightcoveNewIE,
)
from .nbc import NBCSportsVPlayerIE
@ -223,6 +226,11 @@ class YahooIE(InfoExtractor):
if nbc_sports_url:
return self.url_result(nbc_sports_url, NBCSportsVPlayerIE.ie_key())
# Look for Brightcove Legacy Studio embeds
bc_url = BrightcoveLegacyIE._extract_brightcove_url(webpage)
if bc_url:
return self.url_result(bc_url, BrightcoveLegacyIE.ie_key())
# Look for Brightcove New Studio embeds
bc_url = BrightcoveNewIE._extract_url(webpage)
if bc_url:

View File

@ -1,21 +1,16 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
)
class YouJizzIE(InfoExtractor):
_VALID_URL = r'https?://(?:\w+\.)?youjizz\.com/videos/(?:[^/#?]+)?-(?P<id>[0-9]+)\.html(?:$|[?#])'
_TESTS = [{
'url': 'http://www.youjizz.com/videos/zeichentrick-1-2189178.html',
'md5': '07e15fa469ba384c7693fd246905547c',
'md5': '78fc1901148284c69af12640e01c6310',
'info_dict': {
'id': '2189178',
'ext': 'flv',
'ext': 'mp4',
'title': 'Zeichentrick 1',
'age_limit': 18,
}
@ -27,38 +22,18 @@ class YouJizzIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
# YouJizz's HTML5 player has invalid HTML
webpage = webpage.replace('"controls', '" controls')
age_limit = self._rta_search(webpage)
video_title = self._html_search_regex(
r'<title>\s*(.*)\s*</title>', webpage, 'title')
embed_page_url = self._search_regex(
r'(https?://www.youjizz.com/videos/embed/[0-9]+)',
webpage, 'embed page')
webpage = self._download_webpage(
embed_page_url, video_id, note='downloading embed page')
info_dict = self._parse_html5_media_entries(url, webpage, video_id)[0]
# Get the video URL
m_playlist = re.search(r'so.addVariable\("playlist", ?"(?P<playlist>.+?)"\);', webpage)
if m_playlist is not None:
playlist_url = m_playlist.group('playlist')
playlist_page = self._download_webpage(playlist_url, video_id,
'Downloading playlist page')
m_levels = list(re.finditer(r'<level bitrate="(\d+?)" file="(.*?)"', playlist_page))
if len(m_levels) == 0:
raise ExtractorError('Unable to extract video url')
videos = [(int(m.group(1)), m.group(2)) for m in m_levels]
(_, video_url) = sorted(videos)[0]
video_url = video_url.replace('%252F', '%2F')
else:
video_url = self._search_regex(r'so.addVariable\("file",encodeURIComponent\("(?P<source>[^"]+)"\)\);',
webpage, 'video URL')
return {
info_dict.update({
'id': video_id,
'url': video_url,
'title': video_title,
'ext': 'flv',
'format': 'flv',
'player_url': embed_page_url,
'age_limit': age_limit,
}
})
return info_dict

View File

@ -35,7 +35,7 @@ class YouPornIE(InfoExtractor):
'age_limit': 18,
},
}, {
# Anonymous User uploader
# Unknown uploader
'url': 'http://www.youporn.com/watch/561726/big-tits-awesome-brunette-on-amazing-webcam-show/?from=related3&al=2&from_id=561726&pos=4',
'info_dict': {
'id': '561726',
@ -44,7 +44,7 @@ class YouPornIE(InfoExtractor):
'title': 'Big Tits Awesome Brunette On amazing webcam show',
'description': 'http://sweetlivegirls.com Big Tits Awesome Brunette On amazing webcam show.mp4',
'thumbnail': 're:^https?://.*\.jpg$',
'uploader': 'Anonymous User',
'uploader': 'Unknown',
'upload_date': '20111125',
'average_rating': int,
'view_count': int,
@ -140,17 +140,17 @@ class YouPornIE(InfoExtractor):
r'>All [Cc]omments? \(([\d,.]+)\)',
webpage, 'comment count', fatal=False))
def extract_tag_box(title):
tag_box = self._search_regex(
(r'<div[^>]+class=["\']tagBoxTitle["\'][^>]*>\s*%s\b.*?</div>\s*'
'<div[^>]+class=["\']tagBoxContent["\']>(.+?)</div>') % re.escape(title),
webpage, '%s tag box' % title, default=None)
def extract_tag_box(regex, title):
tag_box = self._search_regex(regex, webpage, title, default=None)
if not tag_box:
return []
return re.findall(r'<a[^>]+href=[^>]+>([^<]+)', tag_box)
categories = extract_tag_box('Category')
tags = extract_tag_box('Tags')
categories = extract_tag_box(
r'(?s)Categories:.*?</[^>]+>(.+?)</div>', 'categories')
tags = extract_tag_box(
r'(?s)Tags:.*?</div>\s*<div[^>]+class=["\']tagBoxContent["\'][^>]*>(.+?)</div>',
'tags')
return {
'id': video_id,

View File

@ -264,7 +264,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
)
)? # all until now is optional -> you can pass the naked ID
([0-9A-Za-z_-]{11}) # here is it! the YouTube video ID
(?!.*?&list=) # combined list/video URLs are handled by the playlist IE
(?!.*?\blist=) # combined list/video URLs are handled by the playlist IE
(?(1).+)? # if we found the ID, everything can follow
$"""
_NEXT_URL_RE = r'[\?&]next_url=([^&]+)'
@ -844,6 +844,24 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
# YouTube Red paid video (https://github.com/rg3/youtube-dl/issues/10059)
'url': 'https://www.youtube.com/watch?v=i1Ko8UG-Tdo',
'only_matching': True,
},
{
# Rental video preview
'url': 'https://www.youtube.com/watch?v=yYr8q0y5Jfg',
'info_dict': {
'id': 'uGpuVWrhIzE',
'ext': 'mp4',
'title': 'Piku - Trailer',
'description': 'md5:c36bd60c3fd6f1954086c083c72092eb',
'upload_date': '20150811',
'uploader': 'FlixMatrix',
'uploader_id': 'FlixMatrixKaravan',
'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/FlixMatrixKaravan',
'license': 'Standard YouTube License',
},
'params': {
'skip_download': True,
},
}
]
@ -1254,6 +1272,12 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
# Convert to the same format returned by compat_parse_qs
video_info = dict((k, [v]) for k, v in args.items())
add_dash_mpd(video_info)
# Rental video is not rented but preview is available (e.g.
# https://www.youtube.com/watch?v=yYr8q0y5Jfg,
# https://github.com/rg3/youtube-dl/issues/10532)
if not video_info and args.get('ypc_vid'):
return self.url_result(
args['ypc_vid'], YoutubeIE.ie_key(), video_id=args['ypc_vid'])
if args.get('livestream') == '1' or args.get('live_playback') == 1:
is_live = True
if not video_info or self._downloader.params.get('youtube_include_dash_manifest', True):
@ -1754,11 +1778,14 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
_VALID_URL = r"""(?x)(?:
(?:https?://)?
(?:\w+\.)?
youtube\.com/
(?:
(?:course|view_play_list|my_playlists|artist|playlist|watch|embed/videoseries)
\? (?:.*?[&;])*? (?:p|a|list)=
| p/
youtube\.com/
(?:
(?:course|view_play_list|my_playlists|artist|playlist|watch|embed/videoseries)
\? (?:.*?[&;])*? (?:p|a|list)=
| p/
)|
youtu\.be/[0-9A-Za-z_-]{11}\?.*?\blist=
)
(
(?:PL|LL|EC|UU|FL|RD|UL)?[0-9A-Za-z-_]{10,}
@ -1841,6 +1868,31 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
'id': 'UUXw-G3eDE9trcvY2sBMM_aA',
},
'playlist_mincout': 21,
}, {
# Playlist URL that does not actually serve a playlist
'url': 'https://www.youtube.com/watch?v=FqZTN594JQw&list=PLMYEtVRpaqY00V9W81Cwmzp6N6vZqfUKD4',
'info_dict': {
'id': 'FqZTN594JQw',
'ext': 'webm',
'title': "Smiley's People 01 detective, Adventure Series, Action",
'uploader': 'STREEM',
'uploader_id': 'UCyPhqAZgwYWZfxElWVbVJng',
'uploader_url': 're:https?://(?:www\.)?youtube\.com/channel/UCyPhqAZgwYWZfxElWVbVJng',
'upload_date': '20150526',
'license': 'Standard YouTube License',
'description': 'md5:507cdcb5a49ac0da37a920ece610be80',
'categories': ['People & Blogs'],
'tags': list,
'like_count': int,
'dislike_count': int,
},
'params': {
'skip_download': True,
},
'add_ie': [YoutubeIE.ie_key()],
}, {
'url': 'https://youtu.be/uWyaPkt-VOI?list=PL9D9FC436B881BA21',
'only_matching': True,
}]
def _real_initialize(self):
@ -1901,9 +1953,20 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
playlist_title = self._html_search_regex(
r'(?s)<h1 class="pl-header-title[^"]*"[^>]*>\s*(.*?)\s*</h1>',
page, 'title')
page, 'title', default=None)
return self.playlist_result(self._entries(page, playlist_id), playlist_id, playlist_title)
has_videos = True
if not playlist_title:
try:
# Some playlist URLs don't actually serve a playlist (e.g.
# https://www.youtube.com/watch?v=FqZTN594JQw&list=PLMYEtVRpaqY00V9W81Cwmzp6N6vZqfUKD4)
next(self._entries(page, playlist_id))
except StopIteration:
has_videos = False
return has_videos, self.playlist_result(
self._entries(page, playlist_id), playlist_id, playlist_title)
def _check_download_just_video(self, url, playlist_id):
# Check if it's a video-specific URL
@ -1912,9 +1975,11 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
video_id = query_dict['v'][0]
if self._downloader.params.get('noplaylist'):
self.to_screen('Downloading just video %s because of --no-playlist' % video_id)
return self.url_result(video_id, 'Youtube', video_id=video_id)
return video_id, self.url_result(video_id, 'Youtube', video_id=video_id)
else:
self.to_screen('Downloading playlist %s - add --no-playlist to just download video %s' % (playlist_id, video_id))
return video_id, None
return None, None
def _real_extract(self, url):
# Extract playlist id
@ -1923,7 +1988,7 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
raise ExtractorError('Invalid URL: %s' % url)
playlist_id = mobj.group(1) or mobj.group(2)
video = self._check_download_just_video(url, playlist_id)
video_id, video = self._check_download_just_video(url, playlist_id)
if video:
return video
@ -1931,7 +1996,15 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
# Mixes require a custom extraction process
return self._extract_mix(playlist_id)
return self._extract_playlist(playlist_id)
has_videos, playlist = self._extract_playlist(playlist_id)
if has_videos or not video_id:
return playlist
# Some playlist URLs don't actually serve a playlist (see
# https://github.com/rg3/youtube-dl/issues/10537).
# Fallback to plain video extraction if there is a video id
# along with playlist id.
return self.url_result(video_id, 'Youtube', video_id=video_id)
class YoutubeChannelIE(YoutubePlaylistBaseInfoExtractor):
@ -2309,10 +2382,11 @@ class YoutubeWatchLaterIE(YoutubePlaylistIE):
}]
def _real_extract(self, url):
video = self._check_download_just_video(url, 'WL')
_, video = self._check_download_just_video(url, 'WL')
if video:
return video
return self._extract_playlist('WL')
_, playlist = self._extract_playlist('WL')
return playlist
class YoutubeFavouritesIE(YoutubeBaseInfoExtractor):
@ -2343,7 +2417,7 @@ class YoutubeSubscriptionsIE(YoutubeFeedsInfoExtractor):
class YoutubeHistoryIE(YoutubeFeedsInfoExtractor):
IE_DESC = 'Youtube watch history, ":ythistory" for short (requires authentication)'
_VALID_URL = 'https?://www\.youtube\.com/feed/history|:ythistory'
_VALID_URL = r'https?://www\.youtube\.com/feed/history|:ythistory'
_FEED_NAME = 'history'
_PLAYLIST_TITLE = 'Youtube History'
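
Note on the playlist fallback in this file: when no playlist title is found, the entries generator is probed with `next()`, and a plain video result is returned only if the playlist is empty and the URL also carried a video id. A reduced sketch of that decision, with hypothetical names (`has_entries`, `choose_result`) and tuples in place of the extractor's result dicts:

```python
def has_entries(make_entries):
    """Report whether a generator factory yields at least one item."""
    try:
        next(make_entries())
    except StopIteration:
        return False
    return True


def choose_result(playlist_title, make_entries, video_id):
    has_videos = True
    if not playlist_title:
        # Only probe the entries when the title lookup came back empty.
        has_videos = has_entries(make_entries)
    if has_videos or not video_id:
        return ('playlist', list(make_entries()))
    # Empty playlist but a video id is present: fall back to the video.
    return ('video', video_id)
```

The factory is called twice on purpose: probing consumes the first item of a generator, so a fresh one is built for the actual result.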

View File

@ -423,7 +423,15 @@ def parseOpts(overrideArguments=None):
downloader.add_option(
'--fragment-retries',
dest='fragment_retries', metavar='RETRIES', default=10,
help='Number of retries for a fragment (default is %default), or "infinite" (DASH only)')
help='Number of retries for a fragment (default is %default), or "infinite" (DASH and hlsnative only)')
downloader.add_option(
'--skip-unavailable-fragments',
action='store_true', dest='skip_unavailable_fragments', default=True,
help='Skip unavailable fragments (DASH and hlsnative only)')
general.add_option(
'--abort-on-unavailable-fragment',
action='store_false', dest='skip_unavailable_fragments',
help='Abort downloading when some fragment is not available')
downloader.add_option(
'--buffer-size',
dest='buffersize', metavar='SIZE', default='1024',
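
Note on the two new fragment options above: both write to the same `dest`, one with `store_true` and a default of `True`, the other with `store_false`. A self-contained `optparse` sketch of that pairing (help texts omitted, and both options attached directly to the parser rather than to option groups):

```python
import optparse

parser = optparse.OptionParser()
parser.add_option(
    '--skip-unavailable-fragments',
    action='store_true', dest='skip_unavailable_fragments', default=True)
parser.add_option(
    '--abort-on-unavailable-fragment',
    action='store_false', dest='skip_unavailable_fragments')

# Passing an explicit list keeps the parser away from sys.argv.
default_opts, _ = parser.parse_args([])
abort_opts, _ = parser.parse_args(['--abort-on-unavailable-fragment'])
```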

View File

@ -2148,7 +2148,7 @@ def mimetype2ext(mt):
return ext
_, _, res = mt.rpartition('/')
res = res.lower()
res = res.split(';')[0].strip().lower()
return {
'3gpp': '3gp',
@ -2168,6 +2168,7 @@ def mimetype2ext(mt):
'f4m+xml': 'f4m',
'hds+xml': 'f4m',
'vnd.ms-sstr+xml': 'ism',
'quicktime': 'mov',
}.get(res, res)
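
Note on the `mimetype2ext` change above: parameters after `;` are now stripped before the subtype is looked up, and `quicktime` maps to `mov`. A reduced sketch with a two-entry excerpt of the mapping (`subtype_of`, `ext_for` and `EXT_MAP` are hypothetical names for illustration):

```python
def subtype_of(mimetype):
    """Return the lowercased subtype with any parameters stripped."""
    _, _, res = mimetype.rpartition('/')
    return res.split(';')[0].strip().lower()


EXT_MAP = {'quicktime': 'mov', '3gpp': '3gp'}  # small excerpt only


def ext_for(mimetype):
    res = subtype_of(mimetype)
    # Unknown subtypes are returned unchanged.
    return EXT_MAP.get(res, res)
```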

View File

@ -1,3 +1,3 @@
from __future__ import unicode_literals
__version__ = '2016.08.31'
__version__ = '2016.09.08'