Merge branch 'master' of github.com:rg3/youtube-dl into youtube-dl

This commit is contained in:
Varun Verma 2016-09-25 13:20:51 +05:30
commit 9ece2259c1
34 changed files with 1877 additions and 150 deletions

View File

@ -6,8 +6,8 @@
---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.09.15*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.09.15**
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.09.24*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.09.24**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2016.09.15
[debug] youtube-dl version 2016.09.24
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

View File

@ -1,8 +1,63 @@
version <unreleased>
vesion <unreleased>
Core
+ Improved support for HTML5 subtitles
Extractors
+ [twitter] Support Periscope embeds (#10737)
+ [openload] Support subtitles (#10625)
version 2016.09.24
Core
+ Add support for watchTVeverywhere.com authentication provider based MSOs for
Adobe Pass authentication (#10709)
Extractors
+ [soundcloud:playlist] Provide video id for early playlist entries (#10733)
+ [prosiebensat1] Add support for kabeleinsdoku (#10732)
* [cbs] Extract info from thunder videoPlayerService (#10728)
* [openload] Fix extraction (#10408)
+ [ustream] Support the new HLS streams (#10698)
+ [ooyala] Extract all HLS formats
+ [cartoonnetwork] Add support for Adobe Pass authentication
+ [soundcloud] Extract license metadata
+ [fox] Add support for Adobe Pass authentication (#8584)
+ [tbs] Add support for Adobe Pass authentication (#10642, #10222)
+ [trutv] Add support for Adobe Pass authentication (#10519)
+ [turner] Add support for Adobe Pass authentication
version 2016.09.19
Extractors
+ [crunchyroll] Check if already authenticated (#10700)
- [twitch:stream] Remove fallback to profile extraction when stream is offline
* [thisav] Improve title extraction (#10682)
* [vyborymos] Improve station info extraction
version 2016.09.18
Core
+ Introduce manifest_url and fragments fields in formats dictionary for
fragmented media
+ Provide manifest_url field for DASH segments, HLS and HDS
+ Provide fragments field for DASH segments
* Rework DASH segments downloader to use fragments field
+ Add helper method for Wowza Streaming Engine formats extraction
Extractors
+ [vyborymos] Add extractor for vybory.mos.ru (#10692)
+ [xfileshare] Add title regular expression for streamin.to (#10646)
+ [globo:article] Add support for multiple videos (#10653)
+ [thisav] Recognize HTML5 videos (#10447)
* [jwplatform] Improve JWPlayer detection
+ [mangomolo] Add support for Mangomolo embeds
+ [toutv] Add support for authentication (#10669)
* [franceinter] Fix upload date extraction
* [tv4] Fix HLS and HDS formats extraction (#10659)
version 2016.09.15

View File

@ -1,7 +1,7 @@
all: youtube-dl README.md CONTRIBUTING.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish supportedsites
clean:
rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish youtube_dl/extractor/lazy_extractors.py *.dump *.part *.info.json *.mp4 *.m4a *.flv *.mp3 *.avi *.mkv *.webm *.jpg *.png CONTRIBUTING.md.tmp ISSUE_TEMPLATE.md.tmp youtube-dl youtube-dl.exe
rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish youtube_dl/extractor/lazy_extractors.py *.dump *.part* *.info.json *.mp4 *.m4a *.flv *.mp3 *.avi *.mkv *.webm *.jpg *.png CONTRIBUTING.md.tmp ISSUE_TEMPLATE.md.tmp youtube-dl youtube-dl.exe
find . -name "*.pyc" -delete
find . -name "*.class" -delete

View File

@ -388,6 +388,8 @@
- **mailru**: Видео@Mail.Ru
- **MakersChannel**
- **MakerTV**
- **mangomolo:live**
- **mangomolo:video**
- **MatchTV**
- **MDR**: MDR.DE and KiKA
- **media.ccc.de**
@ -849,6 +851,7 @@
- **VRT**
- **vube**: Vube.com
- **VuClip**
- **VyboryMos**
- **Walla**
- **washingtonpost**
- **washingtonpost:article**

View File

@ -31,7 +31,7 @@ class HlsFD(FragmentFD):
FD_NAME = 'hlsnative'
@staticmethod
def can_download(manifest):
def can_download(manifest, info_dict):
UNSUPPORTED_FEATURES = (
r'#EXT-X-KEY:METHOD=(?!NONE|AES-128)', # encrypted streams [1]
r'#EXT-X-BYTERANGE', # playlists composed of byte ranges of media files [2]
@ -53,6 +53,7 @@ class HlsFD(FragmentFD):
)
check_results = [not re.search(feature, manifest) for feature in UNSUPPORTED_FEATURES]
check_results.append(can_decrypt_frag or '#EXT-X-KEY:METHOD=AES-128' not in manifest)
check_results.append(not info_dict.get('is_live'))
return all(check_results)
def real_download(self, filename, info_dict):
@ -62,7 +63,7 @@ class HlsFD(FragmentFD):
s = manifest.decode('utf-8', 'ignore')
if not self.can_download(s):
if not self.can_download(s, info_dict):
self.report_warning(
'hlsnative has detected features it does not support, '
'extraction will be delegated to ffmpeg')

File diff suppressed because it is too large Load Diff

View File

@ -621,15 +621,21 @@ class BrightcoveNewIE(InfoExtractor):
'url': text_track['src'],
})
is_live = False
duration = float_or_none(json_data.get('duration'), 1000)
if duration and duration < 0:
is_live = True
return {
'id': video_id,
'title': title,
'title': self._live_title(title) if is_live else title,
'description': clean_html(json_data.get('description')),
'thumbnail': json_data.get('thumbnail') or json_data.get('poster'),
'duration': float_or_none(json_data.get('duration'), 1000),
'duration': duration,
'timestamp': parse_iso8601(json_data.get('published_at')),
'uploader_id': account_id,
'formats': formats,
'subtitles': subtitles,
'tags': json_data.get('tags', []),
'is_live': is_live,
}

View File

@ -33,4 +33,10 @@ class CartoonNetworkIE(TurnerBaseIE):
'media_src': 'http://androidhls-secure.cdn.turner.com/toon/big',
'tokenizer_src': 'http://www.cartoonnetwork.com/cntv/mvpd/processors/services/token_ipadAdobe.do',
},
}, {
'url': url,
'site_name': 'CartoonNetwork',
'auth_required': self._search_regex(
r'_cnglobal\.cvpFullOrPreviewAuth\s*=\s*(true|false);',
webpage, 'auth required', default='false') == 'true',
})

View File

@ -4,7 +4,9 @@ from .theplatform import ThePlatformFeedIE
from ..utils import (
int_or_none,
find_xpath_attr,
ExtractorError,
xpath_element,
xpath_text,
update_url_query,
)
@ -47,27 +49,49 @@ class CBSIE(CBSBaseIE):
'only_matching': True,
}]
def _extract_video_info(self, guid):
path = 'dJ5BDC/media/guid/2198311517/' + guid
smil_url = 'http://link.theplatform.com/s/%s?mbr=true' % path
formats, subtitles = self._extract_theplatform_smil(smil_url + '&manifest=m3u', guid)
for r in ('OnceURL&formats=M3U', 'HLS&formats=M3U', 'RTMP', 'WIFI', '3G'):
try:
tp_formats, _ = self._extract_theplatform_smil(smil_url + '&assetTypes=' + r, guid, 'Downloading %s SMIL data' % r.split('&')[0])
formats.extend(tp_formats)
except ExtractorError:
def _extract_video_info(self, content_id):
items_data = self._download_xml(
'http://can.cbs.com/thunder/player/videoPlayerService.php',
content_id, query={'partner': 'cbs', 'contentId': content_id})
video_data = xpath_element(items_data, './/item')
title = xpath_text(video_data, 'videoTitle', 'title', True)
tp_path = 'dJ5BDC/media/guid/2198311517/%s' % content_id
tp_release_url = 'http://link.theplatform.com/s/' + tp_path
asset_types = []
subtitles = {}
formats = []
for item in items_data.findall('.//item'):
asset_type = xpath_text(item, 'assetType')
if not asset_type or asset_type in asset_types:
continue
asset_types.append(asset_type)
query = {
'mbr': 'true',
'assetTypes': asset_type,
}
if asset_type.startswith('HLS') or asset_type in ('OnceURL', 'StreamPack'):
query['formats'] = 'MPEG4,M3U'
elif asset_type in ('RTMP', 'WIFI', '3G'):
query['formats'] = 'MPEG4,FLV'
tp_formats, tp_subtitles = self._extract_theplatform_smil(
update_url_query(tp_release_url, query), content_id,
'Downloading %s SMIL data' % asset_type)
formats.extend(tp_formats)
subtitles = self._merge_subtitles(subtitles, tp_subtitles)
self._sort_formats(formats)
metadata = self._download_theplatform_metadata(path, guid)
info = self._parse_theplatform_metadata(metadata)
info = self._extract_theplatform_metadata(tp_path, content_id)
info.update({
'id': guid,
'id': content_id,
'title': title,
'series': xpath_text(video_data, 'seriesTitle'),
'season_number': int_or_none(xpath_text(video_data, 'seasonNumber')),
'episode_number': int_or_none(xpath_text(video_data, 'episodeNumber')),
'duration': int_or_none(xpath_text(video_data, 'videoLength'), 1000),
'thumbnail': xpath_text(video_data, 'previewImageURL'),
'formats': formats,
'subtitles': subtitles,
'series': metadata.get('cbs$SeriesTitle'),
'season_number': int_or_none(metadata.get('cbs$SeasonNumber')),
'episode': metadata.get('cbs$EpisodeTitle'),
'episode_number': int_or_none(metadata.get('cbs$EpisodeNumber')),
})
return info

View File

@ -9,6 +9,7 @@ from ..utils import (
class CBSNewsIE(CBSIE):
IE_NAME = 'cbsnews'
IE_DESC = 'CBS News'
_VALID_URL = r'https?://(?:www\.)?cbsnews\.com/(?:news|videos)/(?P<id>[\da-z_-]+)'
@ -68,15 +69,16 @@ class CBSNewsIE(CBSIE):
class CBSNewsLiveVideoIE(InfoExtractor):
IE_NAME = 'cbsnews:livevideo'
IE_DESC = 'CBS News Live Videos'
_VALID_URL = r'https?://(?:www\.)?cbsnews\.com/live/video/(?P<id>[\da-z_-]+)'
_VALID_URL = r'https?://(?:www\.)?cbsnews\.com/live/video/(?P<id>[^/?#]+)'
# Live videos get deleted soon. See http://www.cbsnews.com/live/ for the latest examples
_TEST = {
'url': 'http://www.cbsnews.com/live/video/clinton-sanders-prepare-to-face-off-in-nh/',
'info_dict': {
'id': 'clinton-sanders-prepare-to-face-off-in-nh',
'ext': 'flv',
'ext': 'mp4',
'title': 'Clinton, Sanders Prepare To Face Off In NH',
'duration': 334,
},
@ -84,25 +86,22 @@ class CBSNewsLiveVideoIE(InfoExtractor):
}
def _real_extract(self, url):
video_id = self._match_id(url)
display_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_info = self._download_json(
'http://feeds.cbsn.cbsnews.com/rundown/story', display_id, query={
'device': 'desktop',
'dvr_slug': display_id,
})
video_info = self._parse_json(self._html_search_regex(
r'data-story-obj=\'({.+?})\'', webpage, 'video JSON info'), video_id)['story']
hdcore_sign = 'hdcore=3.3.1'
f4m_formats = self._extract_f4m_formats(video_info['url'] + '&' + hdcore_sign, video_id)
if f4m_formats:
for entry in f4m_formats:
# URLs without the extra param induce an 404 error
entry.update({'extra_param_to_segment_url': hdcore_sign})
self._sort_formats(f4m_formats)
formats = self._extract_akamai_formats(video_info['url'], display_id)
self._sort_formats(formats)
return {
'id': video_id,
'id': display_id,
'display_id': display_id,
'title': video_info['headline'],
'thumbnail': video_info.get('thumbnail_url_hd') or video_info.get('thumbnail_url_sd'),
'duration': parse_duration(video_info.get('segmentDur')),
'formats': f4m_formats,
'formats': formats,
}

View File

@ -1828,7 +1828,7 @@ class InfoExtractor(object):
for track_tag in re.findall(r'<track[^>]+>', media_content):
track_attributes = extract_attributes(track_tag)
kind = track_attributes.get('kind')
if not kind or kind == 'subtitles':
if not kind or kind in ('subtitles', 'captions'):
src = track_attributes.get('src')
if not src:
continue
@ -1836,16 +1836,21 @@ class InfoExtractor(object):
media_info['subtitles'].setdefault(lang, []).append({
'url': absolute_url(src),
})
if media_info['formats']:
if media_info['formats'] or media_info['subtitles']:
entries.append(media_info)
return entries
def _extract_akamai_formats(self, manifest_url, video_id):
formats = []
hdcore_sign = 'hdcore=3.7.0'
f4m_url = re.sub(r'(https?://.+?)/i/', r'\1/z/', manifest_url).replace('/master.m3u8', '/manifest.f4m')
formats.extend(self._extract_f4m_formats(
update_url_query(f4m_url, {'hdcore': '3.7.0'}),
video_id, f4m_id='hds', fatal=False))
if 'hdcore=' not in f4m_url:
f4m_url += ('&' if '?' in f4m_url else '?') + hdcore_sign
f4m_formats = self._extract_f4m_formats(
f4m_url, video_id, f4m_id='hds', fatal=False)
for entry in f4m_formats:
entry.update({'extra_param_to_segment_url': hdcore_sign})
formats.extend(f4m_formats)
m3u8_url = re.sub(r'(https?://.+?)/z/', r'\1/i/', manifest_url).replace('/manifest.f4m', '/master.m3u8')
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native',

View File

@ -46,6 +46,13 @@ class CrunchyrollBaseIE(InfoExtractor):
login_page = self._download_webpage(
self._LOGIN_URL, None, 'Downloading login page')
def is_logged(webpage):
return '<title>Redirecting' in webpage
# Already logged in
if is_logged(login_page):
return
login_form_str = self._search_regex(
r'(?P<form><form[^>]+?id=(["\'])%s\2[^>]*>)' % self._LOGIN_FORM,
login_page, 'login form', group='form')
@ -69,7 +76,7 @@ class CrunchyrollBaseIE(InfoExtractor):
headers={'Content-Type': 'application/x-www-form-urlencoded'})
# Successful login
if '<title>Redirecting' in response:
if is_logged(response):
return
error = self._html_search_regex(

View File

@ -516,6 +516,7 @@ from .movingimage import MovingImageIE
from .msn import MSNIE
from .mtv import (
MTVIE,
MTVVideoIE,
MTVServicesEmbeddedIE,
MTVDEIE,
)
@ -1069,6 +1070,7 @@ from .vporn import VpornIE
from .vrt import VRTIE
from .vube import VubeIE
from .vuclip import VuClipIE
from .vyborymos import VyboryMosIE
from .walla import WallaIE
from .washingtonpost import (
WashingtonPostIE,

View File

@ -1,14 +1,14 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from .adobepass import AdobePassIE
from ..utils import (
smuggle_url,
update_url_query,
)
class FOXIE(InfoExtractor):
class FOXIE(AdobePassIE):
_VALID_URL = r'https?://(?:www\.)?fox\.com/watch/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://www.fox.com/watch/255180355939/7684182528',
@ -30,14 +30,26 @@ class FOXIE(InfoExtractor):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
release_url = self._parse_json(self._search_regex(
r'"fox_pdk_player"\s*:\s*({[^}]+?})', webpage, 'fox_pdk_player'),
video_id)['release_url']
settings = self._parse_json(self._search_regex(
r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);',
webpage, 'drupal settings'), video_id)
fox_pdk_player = settings['fox_pdk_player']
release_url = fox_pdk_player['release_url']
query = {
'mbr': 'true',
'switch': 'http'
}
if fox_pdk_player.get('access') == 'locked':
ap_p = settings['foxAdobePassProvider']
rating = ap_p.get('videoRating')
if rating == 'n/a':
rating = None
resource = self._get_mvpd_resource('fbc-fox', None, ap_p['videoGUID'], rating)
query['auth'] = self._extract_mvpd_auth(url, video_id, 'fbc-fox', resource)
return {
'_type': 'url_transparent',
'ie_key': 'ThePlatform',
'url': smuggle_url(update_url_query(
release_url, {'switch': 'http'}), {'force_smil_url': True}),
'url': smuggle_url(update_url_query(release_url, query), {'force_smil_url': True}),
'id': video_id,
}

View File

@ -2,6 +2,7 @@
from __future__ import unicode_literals
import random
import re
import math
from .common import InfoExtractor
@ -14,6 +15,7 @@ from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
orderedSet,
str_or_none,
)
@ -63,6 +65,9 @@ class GloboIE(InfoExtractor):
}, {
'url': 'http://canaloff.globo.com/programas/desejar-profundo/videos/4518560.html',
'only_matching': True,
}, {
'url': 'globo:3607726',
'only_matching': True,
}]
class MD5(object):
@ -396,7 +401,7 @@ class GloboIE(InfoExtractor):
class GloboArticleIE(InfoExtractor):
_VALID_URL = r'https?://.+?\.globo\.com/(?:[^/]+/)*(?P<id>[^/]+)(?:\.html)?'
_VALID_URL = r'https?://.+?\.globo\.com/(?:[^/]+/)*(?P<id>[^/.]+)(?:\.html)?'
_VIDEOID_REGEXES = [
r'\bdata-video-id=["\'](\d{7,})',
@ -408,15 +413,20 @@ class GloboArticleIE(InfoExtractor):
_TESTS = [{
'url': 'http://g1.globo.com/jornal-nacional/noticia/2014/09/novidade-na-fiscalizacao-de-bagagem-pela-receita-provoca-discussoes.html',
'md5': '307fdeae4390ccfe6ba1aa198cf6e72b',
'info_dict': {
'id': '3652183',
'ext': 'mp4',
'title': 'Receita Federal explica como vai fiscalizar bagagens de quem retorna ao Brasil de avião',
'duration': 110.711,
'uploader': 'Rede Globo',
'uploader_id': '196',
}
'id': 'novidade-na-fiscalizacao-de-bagagem-pela-receita-provoca-discussoes',
'title': 'Novidade na fiscalização de bagagem pela Receita provoca discussões',
'description': 'md5:c3c4b4d4c30c32fce460040b1ac46b12',
},
'playlist_count': 1,
}, {
'url': 'http://g1.globo.com/pr/parana/noticia/2016/09/mpf-denuncia-lula-marisa-e-mais-seis-na-operacao-lava-jato.html',
'info_dict': {
'id': 'mpf-denuncia-lula-marisa-e-mais-seis-na-operacao-lava-jato',
'title': "Lula era o 'comandante máximo' do esquema da Lava Jato, diz MPF",
'description': 'md5:8aa7cc8beda4dc71cc8553e00b77c54c',
},
'playlist_count': 6,
}, {
'url': 'http://gq.globo.com/Prazeres/Poder/noticia/2015/10/all-o-desafio-assista-ao-segundo-capitulo-da-serie.html',
'only_matching': True,
@ -435,5 +445,12 @@ class GloboArticleIE(InfoExtractor):
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(self._VIDEOID_REGEXES, webpage, 'video id')
return self.url_result('globo:%s' % video_id, 'Globo')
video_ids = []
for video_regex in self._VIDEOID_REGEXES:
video_ids.extend(re.findall(video_regex, webpage))
entries = [
self.url_result('globo:%s' % video_id, GloboIE.ie_key())
for video_id in orderedSet(video_ids)]
title = self._og_search_title(webpage, fatal=False)
description = self._html_search_meta('description', webpage)
return self.playlist_result(entries, display_id, title, description)

View File

@ -270,6 +270,29 @@ class MTVServicesEmbeddedIE(MTVServicesInfoExtractor):
class MTVIE(MTVServicesInfoExtractor):
IE_NAME = 'mtv'
_VALID_URL = r'https?://(?:www\.)?mtv\.com/(video-clips|full-episodes)/(?P<id>[^/?#.]+)'
_FEED_URL = 'http://www.mtv.com/feeds/mrss/'
_TESTS = [{
'url': 'http://www.mtv.com/video-clips/vl8qof/unlocking-the-truth-trailer',
'md5': '1edbcdf1e7628e414a8c5dcebca3d32b',
'info_dict': {
'id': '5e14040d-18a4-47c4-a582-43ff602de88e',
'ext': 'mp4',
'title': 'Unlocking The Truth|July 18, 2016|1|101|Trailer',
'description': '"Unlocking the Truth" premieres August 17th at 11/10c.',
'timestamp': 1468846800,
'upload_date': '20160718',
},
}, {
'url': 'http://www.mtv.com/full-episodes/94tujl/unlocking-the-truth-gates-of-hell-season-1-ep-101',
'only_matching': True,
}]
class MTVVideoIE(MTVServicesInfoExtractor):
IE_NAME = 'mtv:video'
_VALID_URL = r'''(?x)^https?://
(?:(?:www\.)?mtv\.com/videos/.+?/(?P<videoid>[0-9]+)/[^/]+$|
m\.mtv\.com/videos/video\.rbml\?.*?id=(?P<mgid>[^&]+))'''

View File

@ -9,9 +9,9 @@ from ..utils import (
class MwaveIE(InfoExtractor):
_VALID_URL = r'https?://mwave\.interest\.me/mnettv/videodetail\.m\?searchVideoDetailVO\.clip_id=(?P<id>[0-9]+)'
_VALID_URL = r'https?://mwave\.interest\.me/(?:[^/]+/)?mnettv/videodetail\.m\?searchVideoDetailVO\.clip_id=(?P<id>[0-9]+)'
_URL_TEMPLATE = 'http://mwave.interest.me/mnettv/videodetail.m?searchVideoDetailVO.clip_id=%s'
_TEST = {
_TESTS = [{
'url': 'http://mwave.interest.me/mnettv/videodetail.m?searchVideoDetailVO.clip_id=168859',
# md5 is unstable
'info_dict': {
@ -23,7 +23,10 @@ class MwaveIE(InfoExtractor):
'duration': 206,
'view_count': int,
}
}
}, {
'url': 'http://mwave.interest.me/en/mnettv/videodetail.m?searchVideoDetailVO.clip_id=176199',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
@ -60,8 +63,8 @@ class MwaveIE(InfoExtractor):
class MwaveMeetGreetIE(InfoExtractor):
_VALID_URL = r'https?://mwave\.interest\.me/meetgreet/view/(?P<id>\d+)'
_TEST = {
_VALID_URL = r'https?://mwave\.interest\.me/(?:[^/]+/)?meetgreet/view/(?P<id>\d+)'
_TESTS = [{
'url': 'http://mwave.interest.me/meetgreet/view/256',
'info_dict': {
'id': '173294',
@ -72,7 +75,10 @@ class MwaveMeetGreetIE(InfoExtractor):
'duration': 3634,
'view_count': int,
}
}
}, {
'url': 'http://mwave.interest.me/en/meetgreet/view/256',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)

View File

@ -47,7 +47,7 @@ class OoyalaBaseIE(InfoExtractor):
delivery_type = stream['delivery_type']
if delivery_type == 'hls' or ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
s_url, embed_code, 'mp4', 'm3u8_native',
re.sub(r'/ip(?:ad|hone)/', '/all/', s_url), embed_code, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
elif delivery_type == 'hds' or ext == 'f4m':
formats.extend(self._extract_f4m_formats(

View File

@ -24,6 +24,22 @@ class OpenloadIE(InfoExtractor):
'title': 'skyrim_no-audio_1080.mp4',
'thumbnail': 're:^https?://.*\.jpg$',
},
}, {
'url': 'https://openload.co/embed/rjC09fkPLYs',
'info_dict': {
'id': 'rjC09fkPLYs',
'ext': 'mp4',
'title': 'movie.mp4',
'thumbnail': 're:^https?://.*\.jpg$',
'subtitles': {
'en': [{
'ext': 'vtt',
}],
},
},
'params': {
'skip_download': True, # test subtitles only
},
}, {
'url': 'https://openload.co/embed/kUEfGclsU9o/skyrim_no-audio_1080.mp4',
'only_matching': True,
@ -51,7 +67,8 @@ class OpenloadIE(InfoExtractor):
# declared to be freely used in youtube-dl
# See https://github.com/rg3/youtube-dl/issues/10408
enc_data = self._html_search_regex(
r'<span[^>]+id="hiddenurl"[^>]*>([^<]+)</span>', webpage, 'encrypted data')
r'<span[^>]*>([^<]+)</span>\s*<span[^>]*>[^<]+</span>\s*<span[^>]+id="streamurl"',
webpage, 'encrypted data')
video_url_chars = []
@ -60,7 +77,7 @@ class OpenloadIE(InfoExtractor):
if j >= 33 and j <= 126:
j = ((j + 14) % 94) + 33
if idx == len(enc_data) - 1:
j += 3
j += 2
video_url_chars += compat_chr(j)
video_url = 'https://openload.co/stream/%s?mime=true' % ''.join(video_url_chars)
@ -70,11 +87,17 @@ class OpenloadIE(InfoExtractor):
'title', default=None) or self._html_search_meta(
'description', webpage, 'title', fatal=True)
return {
entries = self._parse_html5_media_entries(url, webpage, video_id)
subtitles = entries[0]['subtitles'] if entries else None
info_dict = {
'id': video_id,
'title': title,
'thumbnail': self._og_search_thumbnail(webpage, default=None),
'url': video_url,
# Seems all videos have extensions in their titles
'ext': determine_ext(title),
'subtitles': subtitles,
}
return info_dict

View File

@ -1,6 +1,8 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
parse_iso8601,
@ -41,6 +43,13 @@ class PeriscopeIE(PeriscopeBaseIE):
'only_matching': True,
}]
@staticmethod
def _extract_url(webpage):
mobj = re.search(
r'<iframe[^>]+src=([\'"])(?P<url>(?:https?:)?//(?:www\.)?periscope\.tv/(?:(?!\1).)+)\1', webpage)
if mobj:
return mobj.group('url')
def _real_extract(self, url):
token = self._match_id(url)

View File

@ -122,7 +122,17 @@ class ProSiebenSat1BaseIE(InfoExtractor):
class ProSiebenSat1IE(ProSiebenSat1BaseIE):
IE_NAME = 'prosiebensat1'
IE_DESC = 'ProSiebenSat.1 Digital'
_VALID_URL = r'https?://(?:www\.)?(?:(?:prosieben|prosiebenmaxx|sixx|sat1|kabeleins|the-voice-of-germany|7tv)\.(?:de|at|ch)|ran\.de|fem\.com)/(?P<id>.+)'
_VALID_URL = r'''(?x)
https?://
(?:www\.)?
(?:
(?:
prosieben(?:maxx)?|sixx|sat1(?:gold)?|kabeleins(?:doku)?|the-voice-of-germany|7tv|advopedia
)\.(?:de|at|ch)|
ran\.de|fem\.com|advopedia\.de
)
/(?P<id>.+)
'''
_TESTS = [
{
@ -290,6 +300,24 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
'skip_download': True,
},
},
{
# geo restricted to Germany
'url': 'http://www.kabeleinsdoku.de/tv/mayday-alarm-im-cockpit/video/102-notlandung-im-hudson-river-ganze-folge',
'only_matching': True,
},
{
# geo restricted to Germany
'url': 'http://www.sat1gold.de/tv/edel-starck/video/11-staffel-1-episode-1-partner-wider-willen-ganze-folge',
'only_matching': True,
},
{
'url': 'http://www.sat1gold.de/tv/edel-starck/playlist/die-gesamte-1-staffel',
'only_matching': True,
},
{
'url': 'http://www.advopedia.de/videos/lenssen-klaert-auf/lenssen-klaert-auf-folge-8-staffel-3-feiertage-und-freie-tage',
'only_matching': True,
},
]
_TOKEN = 'prosieben'
@ -361,19 +389,28 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
def _extract_playlist(self, url, webpage):
playlist_id = self._html_search_regex(
self._PLAYLIST_ID_REGEXES, webpage, 'playlist id')
for regex in self._PLAYLIST_CLIP_REGEXES:
playlist_clips = re.findall(regex, webpage)
if playlist_clips:
title = self._html_search_regex(
self._TITLE_REGEXES, webpage, 'title')
description = self._html_search_regex(
self._DESCRIPTION_REGEXES, webpage, 'description', fatal=False)
entries = [
self.url_result(
re.match('(.+?//.+?)/', url).group(1) + clip_path,
'ProSiebenSat1')
for clip_path in playlist_clips]
return self.playlist_result(entries, playlist_id, title, description)
playlist = self._parse_json(
self._search_regex(
'var\s+contentResources\s*=\s*(\[.+?\]);\s*</script',
webpage, 'playlist'),
playlist_id)
entries = []
for item in playlist:
clip_id = item.get('id') or item.get('upc')
if not clip_id:
continue
info = self._extract_video_info(url, clip_id)
info.update({
'id': clip_id,
'title': item.get('title') or item.get('teaser', {}).get('headline'),
'description': item.get('teaser', {}).get('description'),
'thumbnail': item.get('poster'),
'duration': float_or_none(item.get('duration')),
'series': item.get('tvShowTitle'),
'uploader': item.get('broadcastPublisher'),
})
entries.append(info)
return self.playlist_result(entries, playlist_id)
def _real_extract(self, url):
video_id = self._match_id(url)

View File

@ -53,6 +53,7 @@ class SoundcloudIE(InfoExtractor):
'uploader': 'E.T. ExTerrestrial Music',
'title': 'Lostin Powers - She so Heavy (SneakPreview) Adrian Ackers Blueprint 1',
'duration': 143,
'license': 'all-rights-reserved',
}
},
# not streamable song
@ -66,6 +67,7 @@ class SoundcloudIE(InfoExtractor):
'uploader': 'The Royal Concept',
'upload_date': '20120521',
'duration': 227,
'license': 'all-rights-reserved',
},
'params': {
# rtmp
@ -84,6 +86,7 @@ class SoundcloudIE(InfoExtractor):
'description': 'test chars: \"\'/\\ä↭',
'upload_date': '20131209',
'duration': 9,
'license': 'all-rights-reserved',
},
},
# private link (alt format)
@ -98,6 +101,7 @@ class SoundcloudIE(InfoExtractor):
'description': 'test chars: \"\'/\\ä↭',
'upload_date': '20131209',
'duration': 9,
'license': 'all-rights-reserved',
},
},
# downloadable song
@ -112,6 +116,7 @@ class SoundcloudIE(InfoExtractor):
'uploader': 'oddsamples',
'upload_date': '20140109',
'duration': 17,
'license': 'cc-by-sa',
},
},
]
@ -138,20 +143,20 @@ class SoundcloudIE(InfoExtractor):
name = full_title or track_id
if quiet:
self.report_extraction(name)
thumbnail = info['artwork_url']
if thumbnail is not None:
thumbnail = info.get('artwork_url')
if isinstance(thumbnail, compat_str):
thumbnail = thumbnail.replace('-large', '-t500x500')
ext = 'mp3'
result = {
'id': track_id,
'uploader': info['user']['username'],
'upload_date': unified_strdate(info['created_at']),
'uploader': info.get('user', {}).get('username'),
'upload_date': unified_strdate(info.get('created_at')),
'title': info['title'],
'description': info['description'],
'description': info.get('description'),
'thumbnail': thumbnail,
'duration': int_or_none(info.get('duration'), 1000),
'webpage_url': info.get('permalink_url'),
'license': info.get('license'),
}
formats = []
if info.get('downloadable', False):
@ -221,7 +226,7 @@ class SoundcloudIE(InfoExtractor):
raise ExtractorError('Invalid URL: %s' % url)
track_id = mobj.group('track_id')
token = None
if track_id is not None:
info_json_url = 'http://api.soundcloud.com/tracks/' + track_id + '.json?client_id=' + self._CLIENT_ID
full_title = track_id
@ -255,7 +260,20 @@ class SoundcloudIE(InfoExtractor):
return self._extract_info_dict(info, full_title, secret_token=token)
class SoundcloudSetIE(SoundcloudIE):
class SoundcloudPlaylistBaseIE(SoundcloudIE):
@staticmethod
def _extract_id(e):
return compat_str(e['id']) if e.get('id') else None
def _extract_track_entries(self, tracks):
return [
self.url_result(
track['permalink_url'], SoundcloudIE.ie_key(),
video_id=self._extract_id(track))
for track in tracks if track.get('permalink_url')]
class SoundcloudSetIE(SoundcloudPlaylistBaseIE):
_VALID_URL = r'https?://(?:(?:www|m)\.)?soundcloud\.com/(?P<uploader>[\w\d-]+)/sets/(?P<slug_title>[\w\d-]+)(?:/(?P<token>[^?/]+))?'
IE_NAME = 'soundcloud:set'
_TESTS = [{
@ -294,7 +312,7 @@ class SoundcloudSetIE(SoundcloudIE):
msgs = (compat_str(err['error_message']) for err in info['errors'])
raise ExtractorError('unable to download video webpage: %s' % ','.join(msgs))
entries = [self.url_result(track['permalink_url'], 'Soundcloud') for track in info['tracks']]
entries = self._extract_track_entries(info['tracks'])
return {
'_type': 'playlist',
@ -304,7 +322,7 @@ class SoundcloudSetIE(SoundcloudIE):
}
class SoundcloudUserIE(SoundcloudIE):
class SoundcloudUserIE(SoundcloudPlaylistBaseIE):
_VALID_URL = r'''(?x)
https?://
(?:(?:www|m)\.)?soundcloud\.com/
@ -321,21 +339,21 @@ class SoundcloudUserIE(SoundcloudIE):
'id': '114582580',
'title': 'The Akashic Chronicler (All)',
},
'playlist_mincount': 111,
'playlist_mincount': 74,
}, {
'url': 'https://soundcloud.com/the-akashic-chronicler/tracks',
'info_dict': {
'id': '114582580',
'title': 'The Akashic Chronicler (Tracks)',
},
'playlist_mincount': 50,
'playlist_mincount': 37,
}, {
'url': 'https://soundcloud.com/the-akashic-chronicler/sets',
'info_dict': {
'id': '114582580',
'title': 'The Akashic Chronicler (Playlists)',
},
'playlist_mincount': 3,
'playlist_mincount': 2,
}, {
'url': 'https://soundcloud.com/the-akashic-chronicler/reposts',
'info_dict': {
@ -354,7 +372,7 @@ class SoundcloudUserIE(SoundcloudIE):
'url': 'https://soundcloud.com/grynpyret/spotlight',
'info_dict': {
'id': '7098329',
'title': 'Grynpyret (Spotlight)',
'title': 'GRYNPYRET (Spotlight)',
},
'playlist_mincount': 1,
}]
@ -416,13 +434,14 @@ class SoundcloudUserIE(SoundcloudIE):
for cand in candidates:
if isinstance(cand, dict):
permalink_url = cand.get('permalink_url')
entry_id = self._extract_id(cand)
if permalink_url and permalink_url.startswith('http'):
return permalink_url
return permalink_url, entry_id
for e in collection:
permalink_url = resolve_permalink_url((e, e.get('track'), e.get('playlist')))
permalink_url, entry_id = resolve_permalink_url((e, e.get('track'), e.get('playlist')))
if permalink_url:
entries.append(self.url_result(permalink_url))
entries.append(self.url_result(permalink_url, video_id=entry_id))
next_href = response.get('next_href')
if not next_href:
@ -442,7 +461,7 @@ class SoundcloudUserIE(SoundcloudIE):
}
class SoundcloudPlaylistIE(SoundcloudIE):
class SoundcloudPlaylistIE(SoundcloudPlaylistBaseIE):
_VALID_URL = r'https?://api\.soundcloud\.com/playlists/(?P<id>[0-9]+)(?:/?\?secret_token=(?P<token>[^&]+?))?$'
IE_NAME = 'soundcloud:playlist'
_TESTS = [{
@ -472,7 +491,7 @@ class SoundcloudPlaylistIE(SoundcloudIE):
data = self._download_json(
base_url + data, playlist_id, 'Downloading playlist')
entries = [self.url_result(track['permalink_url'], 'Soundcloud') for track in data['tracks']]
entries = self._extract_track_entries(data['tracks'])
return {
'_type': 'playlist',

View File

@ -4,10 +4,7 @@ from __future__ import unicode_literals
import re
from .turner import TurnerBaseIE
from ..utils import (
extract_attributes,
ExtractorError,
)
from ..utils import extract_attributes
class TBSIE(TurnerBaseIE):
@ -37,10 +34,6 @@ class TBSIE(TurnerBaseIE):
site = domain[:3]
webpage = self._download_webpage(url, display_id)
video_params = extract_attributes(self._search_regex(r'(<[^>]+id="page-video"[^>]*>)', webpage, 'video params'))
if video_params.get('isAuthRequired') == 'true':
raise ExtractorError(
'This video is only available via cable service provider subscription that'
' is not currently supported.', expected=True)
query = None
clip_id = video_params.get('clipid')
if clip_id:
@ -56,4 +49,8 @@ class TBSIE(TurnerBaseIE):
'media_src': 'http://androidhls-secure.cdn.turner.com/%s/big' % site,
'tokenizer_src': 'http://www.%s.com/video/processors/services/token_ipadAdobe.do' % domain,
},
}, {
'url': url,
'site_name': site.upper(),
'auth_required': video_params.get('isAuthRequired') != 'false',
})

View File

@ -4,6 +4,7 @@ from __future__ import unicode_literals
import re
from .jwplatform import JWPlatformBaseIE
from ..utils import remove_end
class ThisAVIE(JWPlatformBaseIE):
@ -35,7 +36,9 @@ class ThisAVIE(JWPlatformBaseIE):
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
title = self._html_search_regex(r'<h1>([^<]*)</h1>', webpage, 'title')
title = remove_end(self._html_search_regex(
r'<title>([^<]+)</title>', webpage, 'title'),
' - 視頻 - ThisAV.com-世界第一中文成人娛樂網站')
video_url = self._html_search_regex(
r"addVariable\('file','([^']+)'\);", webpage, 'video url', default=None)
if video_url:

View File

@ -22,9 +22,17 @@ class TruTVIE(TurnerBaseIE):
def _real_extract(self, url):
path, video_id = re.match(self._VALID_URL, url).groups()
auth_required = False
if path:
data_src = 'http://www.trutv.com/video/cvp/v2/xml/content.xml?id=%s.xml' % path
else:
webpage = self._download_webpage(url, video_id)
video_id = self._search_regex(
r"TTV\.TVE\.episodeId\s*=\s*'([^']+)';",
webpage, 'video id', default=video_id)
auth_required = self._search_regex(
r'TTV\.TVE\.authRequired\s*=\s*(true|false);',
webpage, 'auth required', default='false') == 'true'
data_src = 'http://www.trutv.com/tveverywhere/services/cvpXML.do?titleId=' + video_id
return self._extract_cvp_info(
data_src, path, {
@ -32,4 +40,8 @@ class TruTVIE(TurnerBaseIE):
'media_src': 'http://androidhls-secure.cdn.turner.com/trutv/big',
'tokenizer_src': 'http://www.trutv.com/tveverywhere/processors/services/token_ipadAdobe.do',
},
}, {
'url': url,
'site_name': 'truTV',
'auth_required': auth_required,
})

View File

@ -3,7 +3,7 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from .adobepass import AdobePassIE
from ..compat import compat_str
from ..utils import (
xpath_text,
@ -16,11 +16,11 @@ from ..utils import (
)
class TurnerBaseIE(InfoExtractor):
class TurnerBaseIE(AdobePassIE):
def _extract_timestamp(self, video_data):
return int_or_none(xpath_attr(video_data, 'dateCreated', 'uts'))
def _extract_cvp_info(self, data_src, video_id, path_data={}):
def _extract_cvp_info(self, data_src, video_id, path_data={}, ap_data={}):
video_data = self._download_xml(data_src, video_id)
video_id = video_data.attrib['id']
title = xpath_text(video_data, 'headline', fatal=True)
@ -70,11 +70,14 @@ class TurnerBaseIE(InfoExtractor):
secure_path = self._search_regex(r'https?://[^/]+(.+/)', video_url, 'secure path') + '*'
token = tokens.get(secure_path)
if not token:
auth = self._download_xml(
secure_path_data['tokenizer_src'], video_id, query={
query = {
'path': secure_path,
'videoId': content_id,
})
}
if ap_data.get('auth_required'):
query['accessToken'] = self._extract_mvpd_auth(ap_data['url'], video_id, ap_data['site_name'], ap_data['site_name'])
auth = self._download_xml(
secure_path_data['tokenizer_src'], video_id, query=query)
error_msg = xpath_text(auth, 'error/msg')
if error_msg:
raise ExtractorError(error_msg, expected=True)

View File

@ -400,11 +400,8 @@ class TwitchStreamIE(TwitchBaseIE):
'kraken/streams/%s' % channel_id, channel_id,
'Downloading stream JSON').get('stream')
# Fallback on profile extraction if stream is offline
if not stream:
return self.url_result(
'http://www.twitch.tv/%s/profile' % channel_id,
'TwitchProfile', channel_id)
raise ExtractorError('%s is offline' % channel_id, expected=True)
# Channel name may be typed if different case than the original channel name
# (e.g. http://www.twitch.tv/TWITCHPLAYSPOKEMON) that will lead to constructing

View File

@ -4,6 +4,7 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import (
determine_ext,
float_or_none,
@ -13,6 +14,8 @@ from ..utils import (
ExtractorError,
)
from .periscope import PeriscopeIE
class TwitterBaseIE(InfoExtractor):
def _get_vmap_video_url(self, vmap_url, video_id):
@ -48,12 +51,12 @@ class TwitterCardIE(TwitterBaseIE):
},
{
'url': 'https://twitter.com/i/cards/tfw/v1/654001591733886977',
'md5': 'd4724ffe6d2437886d004fa5de1043b3',
'md5': 'b6d9683dd3f48e340ded81c0e917ad46',
'info_dict': {
'id': 'dq4Oj5quskI',
'ext': 'mp4',
'title': 'Ubuntu 11.10 Overview',
'description': 'Take a quick peek at what\'s new and improved in Ubuntu 11.10.\n\nOnce installed take a look at 10 Things to Do After Installing: http://www.omgubuntu.co.uk/2011/10/10...',
'description': 'md5:a831e97fa384863d6e26ce48d1c43376',
'upload_date': '20111013',
'uploader': 'OMG! Ubuntu!',
'uploader_id': 'omgubuntu',
@ -100,12 +103,17 @@ class TwitterCardIE(TwitterBaseIE):
return self.url_result(iframe_url)
config = self._parse_json(self._html_search_regex(
r'data-(?:player-)?config="([^"]+)"', webpage, 'data player config'),
r'data-(?:player-)?config="([^"]+)"', webpage,
'data player config', default='{}'),
video_id)
if config.get('source_type') == 'vine':
return self.url_result(config['player_url'], 'Vine')
periscope_url = PeriscopeIE._extract_url(webpage)
if periscope_url:
return self.url_result(periscope_url, PeriscopeIE.ie_key())
def _search_dimensions_in_video_url(a_format, video_url):
m = re.search(r'/(?P<width>\d+)x(?P<height>\d+)/', video_url)
if m:
@ -244,10 +252,10 @@ class TwitterIE(InfoExtractor):
'info_dict': {
'id': '700207533655363584',
'ext': 'mp4',
'title': 'Donte The Dumbass - BEAT PROD: @suhmeduh #Damndaniel',
'description': 'Donte The Dumbass on Twitter: "BEAT PROD: @suhmeduh https://t.co/HBrQ4AfpvZ #Damndaniel https://t.co/byBooq2ejZ"',
'title': 'JG - BEAT PROD: @suhmeduh #Damndaniel',
'description': 'JG on Twitter: "BEAT PROD: @suhmeduh https://t.co/HBrQ4AfpvZ #Damndaniel https://t.co/byBooq2ejZ"',
'thumbnail': 're:^https?://.*\.jpg',
'uploader': 'Donte The Dumbass',
'uploader': 'JG',
'uploader_id': 'jaydingeer',
},
'params': {
@ -278,6 +286,18 @@ class TwitterIE(InfoExtractor):
'params': {
'skip_download': True, # requires ffmpeg
},
}, {
'url': 'https://twitter.com/OPP_HSD/status/779210622571536384',
'info_dict': {
'id': '1zqKVVlkqLaKB',
'ext': 'mp4',
'title': 'Sgt Kerry Schmidt - Ontario Provincial Police - Road rage, mischief, assault, rollover and fire in one occurrence',
'upload_date': '20160923',
'uploader_id': 'OPP_HSD',
'uploader': 'Sgt Kerry Schmidt - Ontario Provincial Police',
'timestamp': 1474613214,
},
'add_ie': ['Periscope'],
}]
def _real_extract(self, url):
@ -328,13 +348,22 @@ class TwitterIE(InfoExtractor):
})
return info
twitter_card_url = None
if 'class="PlayableMedia' in webpage:
twitter_card_url = '%s//twitter.com/i/videos/tweet/%s' % (self.http_scheme(), twid)
else:
twitter_card_iframe_url = self._search_regex(
r'data-full-card-iframe-url=([\'"])(?P<url>(?:(?!\1).)+)\1',
webpage, 'Twitter card iframe URL', default=None, group='url')
if twitter_card_iframe_url:
twitter_card_url = compat_urlparse.urljoin(url, twitter_card_iframe_url)
if twitter_card_url:
info.update({
'_type': 'url_transparent',
'ie_key': 'TwitterCard',
'url': '%s//twitter.com/i/videos/tweet/%s' % (self.http_scheme(), twid),
'url': twitter_card_url,
})
return info
raise ExtractorError('There\'s no video in this tweet.')

View File

@ -5,6 +5,7 @@ import re
from .common import InfoExtractor
from ..compat import (
compat_HTTPError,
compat_str,
compat_urllib_request,
compat_urlparse,
)
@ -207,7 +208,7 @@ class UdemyIE(InfoExtractor):
if youtube_url:
return self.url_result(youtube_url, 'Youtube')
video_id = asset['id']
video_id = compat_str(asset['id'])
thumbnail = asset.get('thumbnail_url') or asset.get('thumbnailUrl')
duration = float_or_none(asset.get('data', {}).get('duration'))

View File

@ -1,15 +1,20 @@
from __future__ import unicode_literals
import random
import re
from .common import InfoExtractor
from ..compat import (
compat_str,
compat_urlparse,
)
from ..utils import (
encode_data_uri,
ExtractorError,
int_or_none,
float_or_none,
mimetype2ext,
str_or_none,
)
@ -47,8 +52,108 @@ class UstreamIE(InfoExtractor):
'id': '10299409',
},
'playlist_count': 3,
}, {
'url': 'http://www.ustream.tv/recorded/91343263',
'info_dict': {
'id': '91343263',
'ext': 'mp4',
'title': 'GitHub Universe - General Session - Day 1',
'upload_date': '20160914',
'description': 'GitHub Universe - General Session - Day 1',
'timestamp': 1473872730,
'uploader': 'wa0dnskeqkr',
'uploader_id': '38977840',
},
'params': {
'skip_download': True, # m3u8 download
},
}]
def _get_stream_info(self, url, video_id, app_id_ver, extra_note=None):
def num_to_hex(n):
return hex(n)[2:]
rnd = random.randrange
if not extra_note:
extra_note = ''
conn_info = self._download_json(
'http://r%d-1-%s-recorded-lp-live.ums.ustream.tv/1/ustream' % (rnd(1e8), video_id),
video_id, note='Downloading connection info' + extra_note,
query={
'type': 'viewer',
'appId': app_id_ver[0],
'appVersion': app_id_ver[1],
'rsid': '%s:%s' % (num_to_hex(rnd(1e8)), num_to_hex(rnd(1e8))),
'rpin': '_rpin.%d' % rnd(1e15),
'referrer': url,
'media': video_id,
'application': 'recorded',
})
host = conn_info[0]['args'][0]['host']
connection_id = conn_info[0]['args'][0]['connectionId']
return self._download_json(
'http://%s/1/ustream?connectionId=%s' % (host, connection_id),
video_id, note='Downloading stream info' + extra_note)
def _get_streams(self, url, video_id, app_id_ver):
# Sometimes the return dict does not have 'stream'
for trial_count in range(3):
stream_info = self._get_stream_info(
url, video_id, app_id_ver,
extra_note=' (try %d)' % (trial_count + 1) if trial_count > 0 else '')
if 'stream' in stream_info[0]['args'][0]:
return stream_info[0]['args'][0]['stream']
return []
def _parse_segmented_mp4(self, dash_stream_info):
def resolve_dash_template(template, idx, chunk_hash):
return template.replace('%', compat_str(idx), 1).replace('%', chunk_hash)
formats = []
for stream in dash_stream_info['streams']:
# Use only one provider to avoid too many formats
provider = dash_stream_info['providers'][0]
fragments = [{
'url': resolve_dash_template(
provider['url'] + stream['initUrl'], 0, dash_stream_info['hashes']['0'])
}]
for idx in range(dash_stream_info['videoLength'] // dash_stream_info['chunkTime']):
fragments.append({
'url': resolve_dash_template(
provider['url'] + stream['segmentUrl'], idx,
dash_stream_info['hashes'][compat_str(idx // 10 * 10)])
})
content_type = stream['contentType']
kind = content_type.split('/')[0]
f = {
'format_id': '-'.join(filter(None, [
'dash', kind, str_or_none(stream.get('bitrate'))])),
'protocol': 'http_dash_segments',
# TODO: generate a MPD doc for external players?
'url': encode_data_uri(b'<MPD/>', 'text/xml'),
'ext': mimetype2ext(content_type),
'height': stream.get('height'),
'width': stream.get('width'),
'fragments': fragments,
}
if kind == 'video':
f.update({
'vcodec': stream.get('codec'),
'acodec': 'none',
'vbr': stream.get('bitrate'),
})
else:
f.update({
'vcodec': 'none',
'acodec': stream.get('codec'),
'abr': stream.get('bitrate'),
})
formats.append(f)
return formats
def _real_extract(self, url):
m = re.match(self._VALID_URL, url)
video_id = m.group('id')
@ -86,7 +191,22 @@ class UstreamIE(InfoExtractor):
'url': video_url,
'ext': format_id,
'filesize': filesize,
} for format_id, video_url in video['media_urls'].items()]
} for format_id, video_url in video['media_urls'].items() if video_url]
if not formats:
hls_streams = self._get_streams(url, video_id, app_id_ver=(11, 2))
if hls_streams:
# m3u8_native leads to intermittent ContentTooShortError
formats.extend(self._extract_m3u8_formats(
hls_streams[0]['url'], video_id, ext='mp4', m3u8_id='hls'))
'''
# DASH streams handling is incomplete as 'url' is missing
dash_streams = self._get_streams(url, video_id, app_id_ver=(3, 1))
if dash_streams:
formats.extend(self._parse_segmented_mp4(dash_streams))
'''
self._sort_formats(formats)
description = video.get('description')

View File

@ -84,7 +84,7 @@ class VideomoreIE(InfoExtractor):
@staticmethod
def _extract_url(webpage):
mobj = re.search(
r'<object[^>]+data=(["\'])https?://videomore.ru/player\.swf\?.*config=(?P<url>https?://videomore\.ru/(?:[^/]+/)+\d+\.xml).*\1',
r'<object[^>]+data=(["\'])https?://videomore\.ru/player\.swf\?.*config=(?P<url>https?://videomore\.ru/(?:[^/]+/)+\d+\.xml).*\1',
webpage)
if mobj:
return mobj.group('url')

View File

@ -0,0 +1,55 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_str
class VyboryMosIE(InfoExtractor):
_VALID_URL = r'https?://vybory\.mos\.ru/(?:#precinct/|account/channels\?.*?\bstation_id=)(?P<id>\d+)'
_TESTS = [{
'url': 'http://vybory.mos.ru/#precinct/13636',
'info_dict': {
'id': '13636',
'ext': 'mp4',
'title': 're:^Участковая избирательная комиссия №2231 [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'description': 'Россия, Москва, улица Введенского, 32А',
'is_live': True,
},
'params': {
'skip_download': True,
}
}, {
'url': 'http://vybory.mos.ru/account/channels?station_id=13636',
'only_matching': True,
}]
def _real_extract(self, url):
station_id = self._match_id(url)
channels = self._download_json(
'http://vybory.mos.ru/account/channels?station_id=%s' % station_id,
station_id, 'Downloading channels JSON')
formats = []
for cam_num, (sid, hosts, name, _) in enumerate(channels, 1):
for num, host in enumerate(hosts, 1):
formats.append({
'url': 'http://%s/master.m3u8?sid=%s' % (host, sid),
'ext': 'mp4',
'format_id': 'camera%d-host%d' % (cam_num, num),
'format_note': '%s, %s' % (name, host),
})
info = self._download_json(
'http://vybory.mos.ru/json/voting_stations/%s/%s.json'
% (compat_str(station_id)[:3], station_id),
station_id, 'Downloading station JSON', fatal=False)
return {
'id': station_id,
'title': self._live_title(info['name'] if info else station_id),
'description': info.get('address'),
'is_live': True,
'formats': formats,
}

View File

@ -124,12 +124,14 @@ class XFileShareIE(InfoExtractor):
webpage = self._download_webpage(req, video_id, 'Downloading video page')
title = (self._search_regex(
[r'style="z-index: [0-9]+;">([^<]+)</span>',
(r'style="z-index: [0-9]+;">([^<]+)</span>',
r'<td nowrap>([^<]+)</td>',
r'h4-fine[^>]*>([^<]+)<',
r'>Watch (.+) ',
r'<h2 class="video-page-head">([^<]+)</h2>'],
webpage, 'title', default=None) or self._og_search_title(webpage)).strip()
r'<h2 class="video-page-head">([^<]+)</h2>',
r'<h2 style="[^"]*color:#403f3d[^"]*"[^>]*>([^<]+)<'), # streamin.to
webpage, 'title', default=None) or self._og_search_title(
webpage, default=None) or video_id).strip()
def extract_video_url(default=NO_DEFAULT):
return self._search_regex(

View File

@ -1,3 +1,3 @@
from __future__ import unicode_literals
__version__ = '2016.09.15'
__version__ = '2016.09.24'