Merge remote-tracking branch 'upstream/master' into myversion

Andrew Udvare, 2018-04-28 01:51:59 -04:00
commit 6d4f27533d
GPG Key ID: 1AFD9AFC120C26DD (no known key found for this signature in database)
42 changed files with 705 additions and 332 deletions

.github/ISSUE_TEMPLATE.md

@@ -6,8 +6,8 @@
 ---
 
-### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2018.04.09*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
-- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2018.04.09**
+### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2018.04.25*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
+- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2018.04.25**
 
 ### Before submitting an *issue* make sure you have:
 - [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -36,7 +36,7 @@ Add the `-v` flag to **your command line** you run youtube-dl with (`youtube-dl
 [debug] User config: []
 [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2018.04.09
+[debug] youtube-dl version 2018.04.25
 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
 [debug] Proxy map: {}

AUTHORS

@@ -236,3 +236,6 @@ Lei Wang
 Petr Novák
 Leonardo Taccari
 Martin Weinelt
+Surya Oktafendri
+TingPing
+Alexandre Macabies

ChangeLog

@@ -1,3 +1,45 @@
+version 2018.04.25
+
+Core
+* [utils] Fix match_str for boolean meta fields
++ [Makefile] Add support for pandoc 2 and disable smart extension (#16251)
+* [YoutubeDL] Fix typo in media extension compatibility checker (#16215)
+
+Extractors
++ [openload] Recognize IPv6 stream URLs (#16136, #16137, #16205, #16246,
+  #16250)
++ [twitch] Extract is_live according to status (#16259)
+* [pornflip] Relax URL regular expression (#16258)
+- [etonline] Remove extractor (#16256)
+* [breakcom] Fix extraction (#16254)
++ [youtube] Add ability to authenticate with cookies
+* [youtube:feed] Implement lazy playlist extraction (#10184)
++ [svt] Add support for TV channel live streams (#15279, #15809)
+* [ccma] Fix video extraction (#15931)
+* [rentv] Fix extraction (#15227)
++ [nick] Add support for nickjr.nl (#16230)
+* [extremetube] Fix metadata extraction
++ [keezmovies] Add support for generic embeds (#16134, #16154)
+* [nexx] Extract new azure URLs (#16223)
+* [cbssports] Fix extraction (#16217)
+* [kaltura] Improve embeds detection (#16201)
+* [instagram:user] Fix extraction (#16119)
+* [cbs] Skip DRM asset types (#16104)
+
+
+version 2018.04.16
+
+Extractors
+* [smotri:broadcast] Fix extraction (#16180)
++ [picarto] Add support for picarto.tv (#6205, #12514, #15276, #15551)
+* [vine:user] Fix extraction (#15514, #16190)
+* [pornhub] Relax URL regular expression (#16165)
+* [cbc:watch] Re-acquire device token when expired (#16160)
++ [fxnetworks] Add support for https theplatform URLs (#16125, #16157)
++ [instagram:user] Add request signing (#16119)
++ [twitch] Add support for mobile URLs (#16146)
+
+
 version 2018.04.09
 
 Core

Makefile

@@ -14,6 +14,9 @@ PYTHON ?= /usr/bin/env python
 # set SYSCONFDIR to /etc if PREFIX=/usr or PREFIX=/usr/local
 SYSCONFDIR = $(shell if [ $(PREFIX) = /usr -o $(PREFIX) = /usr/local ]; then echo /etc; else echo $(PREFIX)/etc; fi)
 
+# set markdown input format to "markdown-smart" for pandoc version 2 and to "markdown" for pandoc prior to version 2
+MARKDOWN = $(shell if [ `pandoc -v | head -n1 | cut -d" " -f2 | head -c1` = "2" ]; then echo markdown-smart; else echo markdown; fi)
+
 install: youtube-dl youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish
 	install -d $(DESTDIR)$(BINDIR)
 	install -m 755 youtube-dl $(DESTDIR)$(BINDIR)
@@ -82,11 +85,11 @@ supportedsites:
 	$(PYTHON) devscripts/make_supportedsites.py docs/supportedsites.md
 
 README.txt: README.md
-	pandoc -f markdown -t plain README.md -o README.txt
+	pandoc -f $(MARKDOWN) -t plain README.md -o README.txt
 
 youtube-dl.1: README.md
 	$(PYTHON) devscripts/prepare_manpage.py youtube-dl.1.temp.md
-	pandoc -s -f markdown -t man youtube-dl.1.temp.md -o youtube-dl.1
+	pandoc -s -f $(MARKDOWN) -t man youtube-dl.1.temp.md -o youtube-dl.1
 	rm -f youtube-dl.1.temp.md
 
 youtube-dl.bash-completion: youtube_dl/*.py youtube_dl/*/*.py devscripts/bash-completion.in
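Note: the new MARKDOWN variable probes the installed pandoc's major version; pandoc 2 folded the old --smart behaviour into the markdown reader as a "smart" extension, so the Makefile passes "markdown-smart" there to disable it and plain "markdown" on pandoc 1. A sketch of the same probe in Python, for illustration only (the helper name is hypothetical, not part of the repo):

import subprocess

def pandoc_markdown_format():
    # `pandoc -v` prints e.g. "pandoc 2.1.2" on its first line.
    first_line = subprocess.check_output(['pandoc', '-v']).decode().splitlines()[0]
    major = first_line.split()[1].split('.')[0]
    # pandoc 2 enables the "smart" extension by default; "-smart" turns it off.
    return 'markdown-smart' if major == '2' else 'markdown'

print(pandoc_markdown_format())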

devscripts/gh-pages/generate-download.py

@@ -1,27 +1,22 @@
 #!/usr/bin/env python3
 from __future__ import unicode_literals
 
-import hashlib
-import urllib.request
 import json
 
 versions_info = json.load(open('update/versions.json'))
 version = versions_info['latest']
-URL = versions_info['versions'][version]['bin'][0]
-
-data = urllib.request.urlopen(URL).read()
+version_dict = versions_info['versions'][version]
 
 # Read template page
 with open('download.html.in', 'r', encoding='utf-8') as tmplf:
     template = tmplf.read()
 
-sha256sum = hashlib.sha256(data).hexdigest()
 template = template.replace('@PROGRAM_VERSION@', version)
-template = template.replace('@PROGRAM_URL@', URL)
-template = template.replace('@PROGRAM_SHA256SUM@', sha256sum)
-template = template.replace('@EXE_URL@', versions_info['versions'][version]['exe'][0])
-template = template.replace('@EXE_SHA256SUM@', versions_info['versions'][version]['exe'][1])
-template = template.replace('@TAR_URL@', versions_info['versions'][version]['tar'][0])
-template = template.replace('@TAR_SHA256SUM@', versions_info['versions'][version]['tar'][1])
+template = template.replace('@PROGRAM_URL@', version_dict['bin'][0])
+template = template.replace('@PROGRAM_SHA256SUM@', version_dict['bin'][1])
+template = template.replace('@EXE_URL@', version_dict['exe'][0])
+template = template.replace('@EXE_SHA256SUM@', version_dict['exe'][1])
+template = template.replace('@TAR_URL@', version_dict['tar'][0])
+template = template.replace('@TAR_SHA256SUM@', version_dict['tar'][1])
 
 with open('download.html', 'w', encoding='utf-8') as dlf:
     dlf.write(template)
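Note: the script no longer downloads the binary just to hash it; it now trusts the checksum already recorded in update/versions.json, where each artifact entry is a (URL, SHA-256) pair. A sketch of the layout the rewritten script assumes (URLs and hashes are placeholders, not real values):

versions_info = {
    'latest': '2018.04.25',
    'versions': {
        '2018.04.25': {
            'bin': ['https://yt-dl.org/downloads/2018.04.25/youtube-dl', '<sha256>'],
            'exe': ['https://yt-dl.org/downloads/2018.04.25/youtube-dl.exe', '<sha256>'],
            'tar': ['https://yt-dl.org/downloads/2018.04.25/youtube-dl-2018.04.25.tar.gz', '<sha256>'],
        },
    },
}
version_dict = versions_info['versions'][versions_info['latest']]
assert version_dict['bin'][1] == '<sha256>'  # index 0 is the URL, index 1 the checksum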

docs/supportedsites.md

@@ -257,7 +257,6 @@
  - **ESPN**
  - **ESPNArticle**
  - **EsriVideo**
- - **ETOnline**
  - **Europa**
  - **EveryonesMixtape**
  - **ExpoTV**
@@ -628,6 +627,8 @@
  - **PhilharmonieDeParis**: Philharmonie de Paris
  - **phoenix.de**
  - **Photobucket**
+ - **Picarto**
+ - **PicartoVod**
  - **Piksel**
  - **Pinkbike**
  - **Pladform**

test/test_subtitles.py

@@ -232,7 +232,7 @@ class TestNPOSubtitles(BaseTestSubtitles):
 
 class TestMTVSubtitles(BaseTestSubtitles):
-    url = 'http://www.cc.com/video-clips/kllhuv/stand-up-greg-fitzsimmons--uncensored---too-good-of-a-mother'
+    url = 'http://www.cc.com/video-clips/p63lk0/adam-devine-s-house-party-chasing-white-swans'
     IE = ComedyCentralIE
 
     def getInfoDict(self):
@@ -243,7 +243,7 @@ class TestMTVSubtitles(BaseTestSubtitles):
         self.DL.params['allsubtitles'] = True
         subtitles = self.getSubtitles()
         self.assertEqual(set(subtitles.keys()), set(['en']))
-        self.assertEqual(md5(subtitles['en']), 'b9f6ca22a6acf597ec76f61749765e65')
+        self.assertEqual(md5(subtitles['en']), '78206b8d8a0cfa9da64dc026eea48961')
 
 class TestNRKSubtitles(BaseTestSubtitles):

test/test_utils.py

@@ -42,6 +42,7 @@ from youtube_dl.utils import (
     is_html,
     js_to_json,
     limit_length,
+    merge_dicts,
     mimetype2ext,
     month_by_name,
     multipart_encode,
@@ -669,6 +670,17 @@ class TestUtil(unittest.TestCase):
         self.assertEqual(dict_get(d, ('b', 'c', key, )), None)
         self.assertEqual(dict_get(d, ('b', 'c', key, ), skip_false_values=False), false_value)
 
+    def test_merge_dicts(self):
+        self.assertEqual(merge_dicts({'a': 1}, {'b': 2}), {'a': 1, 'b': 2})
+        self.assertEqual(merge_dicts({'a': 1}, {'a': 2}), {'a': 1})
+        self.assertEqual(merge_dicts({'a': 1}, {'a': None}), {'a': 1})
+        self.assertEqual(merge_dicts({'a': 1}, {'a': ''}), {'a': 1})
+        self.assertEqual(merge_dicts({'a': 1}, {}), {'a': 1})
+        self.assertEqual(merge_dicts({'a': None}, {'a': 1}), {'a': 1})
+        self.assertEqual(merge_dicts({'a': ''}, {'a': 1}), {'a': ''})
+        self.assertEqual(merge_dicts({'a': ''}, {'a': 'abc'}), {'a': 'abc'})
+        self.assertEqual(merge_dicts({'a': None}, {'a': ''}, {'a': 'abc'}), {'a': 'abc'})
+
     def test_encode_compat_str(self):
         self.assertEqual(encode_compat_str(b'\xd1\x82\xd0\xb5\xd1\x81\xd1\x82', 'utf-8'), 'тест')
         self.assertEqual(encode_compat_str('тест', 'utf-8'), 'тест')
@@ -1072,6 +1084,18 @@ ffmpeg version 2.4.4 Copyright (c) 2000-2014 the FFmpeg ...'''), '2.4.4')
         self.assertFalse(match_str(
             'like_count > 100 & dislike_count <? 50 & description',
             {'like_count': 190, 'dislike_count': 10}))
+        self.assertTrue(match_str('is_live', {'is_live': True}))
+        self.assertFalse(match_str('is_live', {'is_live': False}))
+        self.assertFalse(match_str('is_live', {'is_live': None}))
+        self.assertFalse(match_str('is_live', {}))
+        self.assertFalse(match_str('!is_live', {'is_live': True}))
+        self.assertTrue(match_str('!is_live', {'is_live': False}))
+        self.assertTrue(match_str('!is_live', {'is_live': None}))
+        self.assertTrue(match_str('!is_live', {}))
+        self.assertTrue(match_str('title', {'title': 'abc'}))
+        self.assertTrue(match_str('title', {'title': ''}))
+        self.assertFalse(match_str('!title', {'title': 'abc'}))
+        self.assertFalse(match_str('!title', {'title': ''}))
 
     def test_parse_dfxp_time_expr(self):
         self.assertEqual(parse_dfxp_time_expr(None), None)
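Note: the new tests pin down the merge_dicts semantics — earlier dictionaries win, None never overwrites, and an empty string is only upgraded by a non-empty one. The helper was promoted from a local function in generic.py (removed further down in this diff) to youtube_dl.utils; a sketch consistent with both the removed code and the tests above (str stands in for youtube-dl's compat_str):

def merge_dicts(*dicts):
    merged = {}
    for a_dict in dicts:
        for k, v in a_dict.items():
            if v is None:
                continue  # None never overwrites, nor even fills a key
            if (k not in merged or
                    (isinstance(v, str) and v and
                     isinstance(merged[k], str) and not merged[k])):
                merged[k] = v  # fill missing keys; upgrade empty strings only
    return merged

assert merge_dicts({'a': 1}, {'a': 2}) == {'a': 1}
assert merge_dicts({'a': None}, {'a': ''}, {'a': 'abc'}) == {'a': 'abc'}

The match_str tests encode the "boolean meta fields" fix from the ChangeLog: a bare field name now requires a true-ish value rather than mere presence, so live-ness can be filtered on even though is_live is a boolean. match_str is the engine behind youtube-dl's --match-filter option; a minimal demonstration against the released API:

from youtube_dl.utils import match_str

assert match_str('is_live', {'is_live': True})
assert not match_str('is_live', {'is_live': None})
assert match_str('!is_live', {})  # absent counts as not live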

test/test_youtube_lists.py

@@ -61,7 +61,7 @@ class TestYoutubeLists(unittest.TestCase):
         dl = FakeYDL()
         dl.params['extract_flat'] = True
         ie = YoutubePlaylistIE(dl)
-        result = ie.extract('https://www.youtube.com/playlist?list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re')
+        result = ie.extract('https://www.youtube.com/playlist?list=PL-KKIb8rvtMSrAO9YFbeM6UQrAqoFTUWv')
         self.assertIsPlaylist(result)
         for entry in result['entries']:
             self.assertTrue(entry.get('title'))

youtube_dl/YoutubeDL.py

@@ -1853,7 +1853,7 @@ class YoutubeDL(object):
         def compatible_formats(formats):
             video, audio = formats
             # Check extension
-            video_ext, audio_ext = audio.get('ext'), video.get('ext')
+            video_ext, audio_ext = video.get('ext'), audio.get('ext')
             if video_ext and audio_ext:
                 COMPATIBLE_EXTS = (
                     ('mp3', 'mp4', 'm4a', 'm4p', 'm4b', 'm4r', 'm4v', 'ismv', 'isma'),

youtube_dl/extractor/americastestkitchen.py: Executable file → Normal file (mode change only, no content changes)

youtube_dl/extractor/break.py

@@ -3,15 +3,13 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
+from .youtube import YoutubeIE
 from ..compat import compat_str
-from ..utils import (
-    int_or_none,
-    parse_age_limit,
-)
+from ..utils import int_or_none
 
 
 class BreakIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?(?P<site>break|screenjunkies)\.com/video/(?P<display_id>[^/]+?)(?:-(?P<id>\d+))?(?:[/?#&]|$)'
+    _VALID_URL = r'https?://(?:www\.)?break\.com/video/(?P<display_id>[^/]+?)(?:-(?P<id>\d+))?(?:[/?#&]|$)'
     _TESTS = [{
         'url': 'http://www.break.com/video/when-girls-act-like-guys-2468056',
         'info_dict': {
@@ -19,125 +17,73 @@ class BreakIE(InfoExtractor):
             'ext': 'mp4',
             'title': 'When Girls Act Like D-Bags',
             'age_limit': 13,
+        },
+    }, {
+        # youtube embed
+        'url': 'http://www.break.com/video/someone-forgot-boat-brakes-work',
+        'info_dict': {
+            'id': 'RrrDLdeL2HQ',
+            'ext': 'mp4',
+            'title': 'Whale Watching Boat Crashing Into San Diego Dock',
+            'description': 'md5:afc1b2772f0a8468be51dd80eb021069',
+            'upload_date': '20160331',
+            'uploader': 'Steve Holden',
+            'uploader_id': 'sdholden07',
+        },
+        'params': {
+            'skip_download': True,
         }
-    }, {
-        'url': 'http://www.screenjunkies.com/video/best-quentin-tarantino-movie-2841915',
-        'md5': '5c2b686bec3d43de42bde9ec047536b0',
-        'info_dict': {
-            'id': '2841915',
-            'display_id': 'best-quentin-tarantino-movie',
-            'ext': 'mp4',
-            'title': 'Best Quentin Tarantino Movie',
-            'thumbnail': r're:^https?://.*\.jpg',
-            'duration': 3671,
-            'age_limit': 13,
-            'tags': list,
-        },
-    }, {
-        'url': 'http://www.screenjunkies.com/video/honest-trailers-the-dark-knight',
-        'info_dict': {
-            'id': '2348808',
-            'display_id': 'honest-trailers-the-dark-knight',
-            'ext': 'mp4',
-            'title': 'Honest Trailers - The Dark Knight',
-            'thumbnail': r're:^https?://.*\.(?:jpg|png)',
-            'age_limit': 10,
-            'tags': list,
-        },
-    }, {
-        # requires subscription but worked around
-        'url': 'http://www.screenjunkies.com/video/knocking-dead-ep-1-the-show-so-far-3003285',
-        'info_dict': {
-            'id': '3003285',
-            'display_id': 'knocking-dead-ep-1-the-show-so-far',
-            'ext': 'mp4',
-            'title': 'State of The Dead Recap: Knocking Dead Pilot',
-            'thumbnail': r're:^https?://.*\.jpg',
-            'duration': 3307,
-            'age_limit': 13,
-            'tags': list,
-        },
     }, {
         'url': 'http://www.break.com/video/ugc/baby-flex-2773063',
         'only_matching': True,
     }]
 
-    _DEFAULT_BITRATES = (48, 150, 320, 496, 864, 2240, 3264)
-
     def _real_extract(self, url):
-        site, display_id, video_id = re.match(self._VALID_URL, url).groups()
+        display_id, video_id = re.match(self._VALID_URL, url).groups()
 
-        if not video_id:
-            webpage = self._download_webpage(url, display_id)
-            video_id = self._search_regex(
-                (r'src=["\']/embed/(\d+)', r'data-video-content-id=["\'](\d+)'),
-                webpage, 'video id')
+        webpage = self._download_webpage(url, display_id)
 
-        webpage = self._download_webpage(
-            'http://www.%s.com/embed/%s' % (site, video_id),
-            display_id, 'Downloading video embed page')
-        embed_vars = self._parse_json(
+        youtube_url = YoutubeIE._extract_url(webpage)
+        if youtube_url:
+            return self.url_result(youtube_url, ie=YoutubeIE.ie_key())
+
+        content = self._parse_json(
             self._search_regex(
-                r'(?s)embedVars\s*=\s*({.+?})\s*</script>', webpage, 'embed vars'),
+                r'(?s)content["\']\s*:\s*(\[.+?\])\s*[,\n]', webpage,
+                'content'),
             display_id)
 
-        youtube_id = embed_vars.get('youtubeId')
-        if youtube_id:
-            return self.url_result(youtube_id, 'Youtube')
-
-        title = embed_vars['contentName']
-
         formats = []
-        bitrates = []
-        for f in embed_vars.get('media', []):
-            if not f.get('uri') or f.get('mediaPurpose') != 'play':
+        for video in content:
+            video_url = video.get('url')
+            if not video_url or not isinstance(video_url, compat_str):
                 continue
-            bitrate = int_or_none(f.get('bitRate'))
-            if bitrate:
-                bitrates.append(bitrate)
+            bitrate = int_or_none(self._search_regex(
+                r'(\d+)_kbps', video_url, 'tbr', default=None))
             formats.append({
-                'url': f['uri'],
+                'url': video_url,
                 'format_id': 'http-%d' % bitrate if bitrate else 'http',
-                'width': int_or_none(f.get('width')),
-                'height': int_or_none(f.get('height')),
                 'tbr': bitrate,
-                'format': 'mp4',
             })
-
-        if not bitrates:
-            # When subscriptionLevel > 0, i.e. plus subscription is required
-            # media list will be empty. However, hds and hls uris are still
-            # available. We can grab them assuming bitrates to be default.
-            bitrates = self._DEFAULT_BITRATES
-
-        auth_token = embed_vars.get('AuthToken')
-
-        def construct_manifest_url(base_url, ext):
-            pieces = [base_url]
-            pieces.extend([compat_str(b) for b in bitrates])
-            pieces.append('_kbps.mp4.%s?%s' % (ext, auth_token))
-            return ','.join(pieces)
-
-        if bitrates and auth_token:
-            hds_url = embed_vars.get('hdsUri')
-            if hds_url:
-                formats.extend(self._extract_f4m_formats(
-                    construct_manifest_url(hds_url, 'f4m'),
-                    display_id, f4m_id='hds', fatal=False))
-            hls_url = embed_vars.get('hlsUri')
-            if hls_url:
-                formats.extend(self._extract_m3u8_formats(
-                    construct_manifest_url(hls_url, 'm3u8'),
-                    display_id, 'mp4', entry_protocol='m3u8_native', m3u8_id='hls', fatal=False))
-
         self._sort_formats(formats)
 
+        title = self._search_regex(
+            (r'title["\']\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1',
+             r'<h1[^>]*>(?P<value>[^<]+)'), webpage, 'title', group='value')
+
+        def get(key, name):
+            return int_or_none(self._search_regex(
+                r'%s["\']\s*:\s*["\'](\d+)' % key, webpage, name,
+                default=None))
+
+        age_limit = get('ratings', 'age limit')
+        video_id = video_id or get('pid', 'video id') or display_id
+
         return {
             'id': video_id,
             'display_id': display_id,
             'title': title,
-            'thumbnail': embed_vars.get('thumbUri'),
-            'duration': int_or_none(embed_vars.get('videoLengthInSeconds')) or None,
-            'age_limit': parse_age_limit(embed_vars.get('audienceRating')),
-            'tags': embed_vars.get('tags', '').split(','),
+            'thumbnail': self._og_search_thumbnail(webpage),
+            'age_limit': age_limit,
             'formats': formats,
         }

youtube_dl/extractor/cbs.py

@@ -65,7 +65,7 @@ class CBSIE(CBSBaseIE):
         last_e = None
         for item in items_data.findall('.//item'):
             asset_type = xpath_text(item, 'assetType')
-            if not asset_type or asset_type in asset_types:
+            if not asset_type or asset_type in asset_types or asset_type in ('HLS_FPS', 'DASH_CENC'):
                 continue
             asset_types.append(asset_type)
             query = {

youtube_dl/extractor/cbssports.py

@@ -4,28 +4,35 @@ from .cbs import CBSBaseIE
 
 class CBSSportsIE(CBSBaseIE):
-    _VALID_URL = r'https?://(?:www\.)?cbssports\.com/video/player/[^/]+/(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?cbssports\.com/[^/]+/(?:video|news)/(?P<id>[^/?#&]+)'
     _TESTS = [{
-        'url': 'http://www.cbssports.com/video/player/videos/708337219968/0/ben-simmons-the-next-lebron?-not-so-fast',
+        'url': 'https://www.cbssports.com/nba/video/donovan-mitchell-flashes-star-potential-in-game-2-victory-over-thunder/',
         'info_dict': {
-            'id': '708337219968',
+            'id': '1214315075735',
             'ext': 'mp4',
-            'title': 'Ben Simmons the next LeBron? Not so fast',
-            'description': 'md5:854294f627921baba1f4b9a990d87197',
-            'timestamp': 1466293740,
-            'upload_date': '20160618',
+            'title': 'Donovan Mitchell flashes star potential in Game 2 victory over Thunder',
+            'description': 'md5:df6f48622612c2d6bd2e295ddef58def',
+            'timestamp': 1524111457,
+            'upload_date': '20180419',
             'uploader': 'CBSI-NEW',
         },
         'params': {
             # m3u8 download
             'skip_download': True,
         }
+    }, {
+        'url': 'https://www.cbssports.com/nba/news/nba-playoffs-2018-watch-76ers-vs-heat-game-3-series-schedule-tv-channel-online-stream/',
+        'only_matching': True,
     }]
 
     def _extract_video_info(self, filter_query, video_id):
         return self._extract_feed_info('dJ5BDC', 'VxxJg8Ymh8sE', filter_query, video_id)
 
     def _real_extract(self, url):
-        video_id = self._match_id(url)
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+        video_id = self._search_regex(
+            [r'(?:=|%26)pcid%3D(\d+)', r'embedVideo(?:Container)?_(\d+)'],
+            webpage, 'video id')
         return self._extract_video_info('byId=%s' % video_id, video_id)

youtube_dl/extractor/ccma.py

@@ -4,11 +4,13 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
+from ..compat import compat_str
 from ..utils import (
+    clean_html,
     int_or_none,
     parse_duration,
     parse_iso8601,
-    clean_html,
+    parse_resolution,
 )
 
@@ -40,34 +42,42 @@ class CCMAIE(InfoExtractor):
     def _real_extract(self, url):
         media_type, media_id = re.match(self._VALID_URL, url).groups()
-        media_data = {}
-        formats = []
-        profiles = ['pc'] if media_type == 'audio' else ['mobil', 'pc']
-        for i, profile in enumerate(profiles):
-            md = self._download_json('http://dinamics.ccma.cat/pvideo/media.jsp', media_id, query={
+
+        media = self._download_json(
+            'http://dinamics.ccma.cat/pvideo/media.jsp', media_id, query={
                 'media': media_type,
                 'idint': media_id,
-                'profile': profile,
-            }, fatal=False)
-            if md:
-                media_data = md
-        media_url = media_data.get('media', {}).get('url')
-        if media_url:
-            formats.append({
-                'format_id': profile,
-                'url': media_url,
-                'quality': i,
-            })
+            })
+
+        formats = []
+        media_url = media['media']['url']
+        if isinstance(media_url, list):
+            for format_ in media_url:
+                format_url = format_.get('file')
+                if not format_url or not isinstance(format_url, compat_str):
+                    continue
+                label = format_.get('label')
+                f = parse_resolution(label)
+                f.update({
+                    'url': format_url,
+                    'format_id': label,
+                })
+                formats.append(f)
+        else:
+            formats.append({
+                'url': media_url,
+                'vcodec': 'none' if media_type == 'audio' else None,
+            })
         self._sort_formats(formats)
 
-        informacio = media_data['informacio']
+        informacio = media['informacio']
         title = informacio['titol']
         durada = informacio.get('durada', {})
         duration = int_or_none(durada.get('milisegons'), 1000) or parse_duration(durada.get('text'))
         timestamp = parse_iso8601(informacio.get('data_emissio', {}).get('utc'))
 
         subtitles = {}
-        subtitols = media_data.get('subtitols', {})
+        subtitols = media.get('subtitols', {})
         if subtitols:
             sub_url = subtitols.get('url')
             if sub_url:
@@ -77,7 +87,7 @@ class CCMAIE(InfoExtractor):
             })
 
         thumbnails = []
-        imatges = media_data.get('imatges', {})
+        imatges = media.get('imatges', {})
         if imatges:
             thumbnail_url = imatges.get('url')
             if thumbnail_url:

youtube_dl/extractor/cda.py: Executable file → Normal file (mode change only, no content changes)

youtube_dl/extractor/common.py

@@ -682,18 +682,30 @@ class InfoExtractor(object):
             else:
                 self.report_warning(errmsg + str(ve))
 
-    def _download_json(self, url_or_request, video_id,
-                       note='Downloading JSON metadata',
-                       errnote='Unable to download JSON metadata',
-                       transform_source=None,
-                       fatal=True, encoding=None, data=None, headers={}, query={}):
-        json_string = self._download_webpage(
+    def _download_json_handle(
+            self, url_or_request, video_id, note='Downloading JSON metadata',
+            errnote='Unable to download JSON metadata', transform_source=None,
+            fatal=True, encoding=None, data=None, headers={}, query={}):
+        """Return a tuple (JSON object, URL handle)"""
+        res = self._download_webpage_handle(
             url_or_request, video_id, note, errnote, fatal=fatal,
             encoding=encoding, data=data, headers=headers, query=query)
-        if (not fatal) and json_string is False:
-            return None
+        if res is False:
+            return res
+        json_string, urlh = res
         return self._parse_json(
-            json_string, video_id, transform_source=transform_source, fatal=fatal)
+            json_string, video_id, transform_source=transform_source,
+            fatal=fatal), urlh
+
+    def _download_json(
+            self, url_or_request, video_id, note='Downloading JSON metadata',
+            errnote='Unable to download JSON metadata', transform_source=None,
+            fatal=True, encoding=None, data=None, headers={}, query={}):
+        res = self._download_json_handle(
+            url_or_request, video_id, note=note, errnote=errnote,
+            transform_source=transform_source, fatal=fatal, encoding=encoding,
+            data=data, headers=headers, query=query)
+        return res if res is False else res[0]
 
     def _parse_json(self, json_string, video_id, transform_source=None, fatal=True):
         if transform_source:
@@ -1008,6 +1020,40 @@ class InfoExtractor(object):
         if isinstance(json_ld, dict):
             json_ld = [json_ld]
 
+        INTERACTION_TYPE_MAP = {
+            'CommentAction': 'comment',
+            'AgreeAction': 'like',
+            'DisagreeAction': 'dislike',
+            'LikeAction': 'like',
+            'DislikeAction': 'dislike',
+            'ListenAction': 'view',
+            'WatchAction': 'view',
+            'ViewAction': 'view',
+        }
+
+        def extract_interaction_statistic(e):
+            interaction_statistic = e.get('interactionStatistic')
+            if not isinstance(interaction_statistic, list):
+                return
+            for is_e in interaction_statistic:
+                if not isinstance(is_e, dict):
+                    continue
+                if is_e.get('@type') != 'InteractionCounter':
+                    continue
+                interaction_type = is_e.get('interactionType')
+                if not isinstance(interaction_type, compat_str):
+                    continue
+                interaction_count = int_or_none(is_e.get('userInteractionCount'))
+                if interaction_count is None:
+                    continue
+                count_kind = INTERACTION_TYPE_MAP.get(interaction_type.split('/')[-1])
+                if not count_kind:
+                    continue
+                count_key = '%s_count' % count_kind
+                if info.get(count_key) is not None:
+                    continue
+                info[count_key] = interaction_count
+
         def extract_video_object(e):
             assert e['@type'] == 'VideoObject'
             info.update({
@@ -1023,6 +1069,7 @@ class InfoExtractor(object):
                 'height': int_or_none(e.get('height')),
                 'view_count': int_or_none(e.get('interactionCount')),
             })
+            extract_interaction_statistic(e)
 
         for e in json_ld:
             if isinstance(e.get('@context'), compat_str) and re.match(r'^https?://schema.org/?$', e.get('@context')):
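Note: _download_json_handle mirrors the existing _download_webpage_handle — it returns the parsed JSON together with the URL handle, so an extractor can inspect the final URL after redirects or response headers, while _download_json keeps its old contract by discarding the handle. A hedged usage sketch inside a hypothetical extractor (the site, endpoint and fields are made up for illustration):

from .common import InfoExtractor

class ExampleIE(InfoExtractor):
    _VALID_URL = r'https?://example\.com/video/(?P<id>\d+)'

    def _real_extract(self, url):
        video_id = self._match_id(url)
        data, urlh = self._download_json_handle(
            'https://example.com/api/video/%s' % video_id, video_id)
        return {
            'id': video_id,
            'title': data['title'],
            'url': urlh.geturl(),  # the handle exposes the post-redirect URL
        }

The new extract_interaction_statistic reads schema.org InteractionCounter entries and maps them onto comment_count, like_count, dislike_count and view_count without clobbering counts that are already set. The kind of JSON-LD it consumes looks like this (values illustrative):

video_object = {
    '@type': 'VideoObject',
    'interactionStatistic': [{
        '@type': 'InteractionCounter',
        'interactionType': 'https://schema.org/WatchAction',
        'userInteractionCount': 12345,
    }],
}
# 'WatchAction'.split('/')[-1] maps to 'view' in INTERACTION_TYPE_MAP,
# so this fragment yields info['view_count'] = 12345.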

youtube_dl/extractor/etonline.py (deleted)

@@ -1,39 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-
-
-class ETOnlineIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?etonline\.com/(?:[^/]+/)*(?P<id>[^/?#&]+)'
-    _TESTS = [{
-        'url': 'http://www.etonline.com/tv/211130_dove_cameron_liv_and_maddie_emotional_episode_series_finale/',
-        'info_dict': {
-            'id': '211130_dove_cameron_liv_and_maddie_emotional_episode_series_finale',
-            'title': 'md5:a21ec7d3872ed98335cbd2a046f34ee6',
-            'description': 'md5:8b94484063f463cca709617c79618ccd',
-        },
-        'playlist_count': 2,
-    }, {
-        'url': 'http://www.etonline.com/media/video/here_are_the_stars_who_love_bringing_their_moms_as_dates_to_the_oscars-211359/',
-        'only_matching': True,
-    }]
-    BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/1242911076001/default_default/index.html?videoId=ref:%s'
-
-    def _real_extract(self, url):
-        playlist_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, playlist_id)
-
-        entries = [
-            self.url_result(
-                self.BRIGHTCOVE_URL_TEMPLATE % video_id, 'BrightcoveNew', video_id)
-            for video_id in re.findall(
-                r'site\.brightcove\s*\([^,]+,\s*["\'](title_\d+)', webpage)]
-
-        return self.playlist_result(
-            entries, playlist_id,
-            self._og_search_title(webpage, fatal=False),
-            self._og_search_description(webpage))

youtube_dl/extractor/extractors.py

@@ -326,7 +326,6 @@ from .espn import (
     FiveThirtyEightIE,
 )
 from .esri import EsriVideoIE
-from .etonline import ETOnlineIE
 from .europa import EuropaIE
 from .everyonesmixtape import EveryonesMixtapeIE
 from .expotv import ExpoTVIE
@@ -815,6 +814,10 @@ from .periscope import (
 from .philharmoniedeparis import PhilharmonieDeParisIE
 from .phoenix import PhoenixIE
 from .photobucket import PhotobucketIE
+from .picarto import (
+    PicartoIE,
+    PicartoVodIE,
+)
 from .piksel import PikselIE
 from .pinkbike import PinkbikeIE
 from .pladform import PladformIE

youtube_dl/extractor/extremetube.py

@@ -8,12 +8,12 @@ class ExtremeTubeIE(KeezMoviesIE):
     _VALID_URL = r'https?://(?:www\.)?extremetube\.com/(?:[^/]+/)?video/(?P<id>[^/#?&]+)'
     _TESTS = [{
         'url': 'http://www.extremetube.com/video/music-video-14-british-euro-brit-european-cumshots-swallow-652431',
-        'md5': '1fb9228f5e3332ec8c057d6ac36f33e0',
+        'md5': '92feaafa4b58e82f261e5419f39c60cb',
         'info_dict': {
             'id': 'music-video-14-british-euro-brit-european-cumshots-swallow-652431',
             'ext': 'mp4',
             'title': 'Music Video 14 british euro brit european cumshots swallow',
-            'uploader': 'unknown',
+            'uploader': 'anonim',
             'view_count': int,
             'age_limit': 18,
         }
@@ -36,10 +36,10 @@ class ExtremeTubeIE(KeezMoviesIE):
             r'<h1[^>]+title="([^"]+)"[^>]*>', webpage, 'title')
 
         uploader = self._html_search_regex(
-            r'Uploaded by:\s*</strong>\s*(.+?)\s*</div>',
+            r'Uploaded by:\s*</[^>]+>\s*<a[^>]+>(.+?)</a>',
             webpage, 'uploader', fatal=False)
         view_count = str_to_int(self._search_regex(
-            r'Views:\s*</strong>\s*<span>([\d,\.]+)</span>',
+            r'Views:\s*</[^>]+>\s*<[^>]+>([\d,\.]+)</',
            webpage, 'view count', fatal=False))
 
         info.update({

youtube_dl/extractor/funk.py

@@ -5,7 +5,10 @@ import re
 
 from .common import InfoExtractor
 from .nexx import NexxIE
-from ..utils import int_or_none
+from ..utils import (
+    int_or_none,
+    try_get,
+)
 
 
 class FunkBaseIE(InfoExtractor):
@@ -77,6 +80,20 @@ class FunkChannelIE(FunkBaseIE):
         'params': {
             'skip_download': True,
         },
+    }, {
+        # only available via byIdList API
+        'url': 'https://www.funk.net/channel/informr/martin-sonneborn-erklaert-die-eu',
+        'info_dict': {
+            'id': '205067',
+            'ext': 'mp4',
+            'title': 'Martin Sonneborn erklärt die EU',
+            'description': 'md5:050f74626e4ed87edf4626d2024210c0',
+            'timestamp': 1494424042,
+            'upload_date': '20170510',
+        },
+        'params': {
+            'skip_download': True,
+        },
     }, {
         'url': 'https://www.funk.net/channel/59d5149841dca100012511e3/mein-erster-job-lovemilla-folge-1/lovemilla/',
         'only_matching': True,
@@ -87,16 +104,28 @@ class FunkChannelIE(FunkBaseIE):
         channel_id = mobj.group('id')
         alias = mobj.group('alias')
 
-        results = self._download_json(
-            'https://www.funk.net/api/v3.0/content/videos/filter', channel_id,
-            headers={
-                'authorization': 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbGllbnROYW1lIjoiY3VyYXRpb24tdG9vbCIsInNjb3BlIjoic3RhdGljLWNvbnRlbnQtYXBpLGN1cmF0aW9uLWFwaSxzZWFyY2gtYXBpIn0.q4Y2xZG8PFHai24-4Pjx2gym9RmJejtmK6lMXP5wAgc',
-                'Referer': url,
-            }, query={
-                'channelId': channel_id,
-                'size': 100,
-            })['result']
-        video = next(r for r in results if r.get('alias') == alias)
+        headers = {
+            'authorization': 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbGllbnROYW1lIjoiY3VyYXRpb24tdG9vbCIsInNjb3BlIjoic3RhdGljLWNvbnRlbnQtYXBpLGN1cmF0aW9uLWFwaSxzZWFyY2gtYXBpIn0.q4Y2xZG8PFHai24-4Pjx2gym9RmJejtmK6lMXP5wAgc',
+            'Referer': url,
+        }
+
+        video = None
+
+        by_id_list = self._download_json(
+            'https://www.funk.net/api/v3.0/content/videos/byIdList', channel_id,
+            headers=headers, query={
+                'ids': alias,
+            }, fatal=False)
+        if by_id_list:
+            video = try_get(by_id_list, lambda x: x['result'][0], dict)
+
+        if not video:
+            results = self._download_json(
+                'https://www.funk.net/api/v3.0/content/videos/filter', channel_id,
+                headers=headers, query={
+                    'channelId': channel_id,
+                    'size': 100,
+                })['result']
+            video = next(r for r in results if r.get('alias') == alias)
 
         return self._make_url_result(video)

youtube_dl/extractor/generic.py

@@ -23,6 +23,7 @@ from ..utils import (
     is_html,
     js_to_json,
     KNOWN_EXTENSIONS,
+    merge_dicts,
     mimetype2ext,
     orderedSet,
     sanitized_Request,
@@ -1220,7 +1221,7 @@ class GenericIE(InfoExtractor):
                 'title': '35871',
                 'timestamp': 1355743100,
                 'upload_date': '20121217',
-                'uploader_id': 'batchUser',
+                'uploader_id': 'cplapp@learn360.com',
             },
             'add_ie': ['Kaltura'],
         },
@@ -1271,6 +1272,22 @@ class GenericIE(InfoExtractor):
             },
             'add_ie': ['Kaltura'],
         },
+        {
+            # meta twitter:player
+            'url': 'http://thechive.com/2017/12/08/all-i-want-for-christmas-is-more-twerk/',
+            'info_dict': {
+                'id': '0_01b42zps',
+                'ext': 'mp4',
+                'title': 'Main Twerk (Video)',
+                'upload_date': '20171208',
+                'uploader_id': 'sebastian.salinas@thechive.com',
+                'timestamp': 1512713057,
+            },
+            'params': {
+                'skip_download': True,
+            },
+            'add_ie': ['Kaltura'],
+        },
         # referrer protected EaglePlatform embed
         {
             'url': 'https://tvrain.ru/lite/teleshow/kak_vse_nachinalos/namin-418921/',
@@ -2986,21 +3003,6 @@ class GenericIE(InfoExtractor):
             return self.playlist_from_matches(
                 sharevideos_urls, video_id, video_title)
 
-        def merge_dicts(dict1, dict2):
-            merged = {}
-            for k, v in dict1.items():
-                if v is not None:
-                    merged[k] = v
-            for k, v in dict2.items():
-                if v is None:
-                    continue
-                if (k not in merged or
-                        (isinstance(v, compat_str) and v and
-                         isinstance(merged[k], compat_str) and
-                         not merged[k])):
-                    merged[k] = v
-            return merged
-
         # Look for HTML5 media
         entries = self._parse_html5_media_entries(url, webpage, video_id, m3u8_id='hls')
         if entries:

youtube_dl/extractor/imdb.py

@@ -3,7 +3,9 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
+from ..compat import compat_str
 from ..utils import (
+    determine_ext,
     mimetype2ext,
     qualities,
     remove_end,
@@ -73,19 +75,25 @@ class ImdbIE(InfoExtractor):
             video_info_list = format_info.get('videoInfoList')
             if not video_info_list or not isinstance(video_info_list, list):
                 continue
-            video_info = video_info_list[0]
-            if not video_info or not isinstance(video_info, dict):
-                continue
-            video_url = video_info.get('videoUrl')
-            if not video_url:
-                continue
-            format_id = format_info.get('ffname')
-            formats.append({
-                'format_id': format_id,
-                'url': video_url,
-                'ext': mimetype2ext(video_info.get('videoMimeType')),
-                'quality': quality(format_id),
-            })
+            for video_info in video_info_list:
+                if not video_info or not isinstance(video_info, dict):
+                    continue
+                video_url = video_info.get('videoUrl')
+                if not video_url or not isinstance(video_url, compat_str):
+                    continue
+                if (video_info.get('videoMimeType') == 'application/x-mpegURL' or
+                        determine_ext(video_url) == 'm3u8'):
+                    formats.extend(self._extract_m3u8_formats(
+                        video_url, video_id, 'mp4', entry_protocol='m3u8_native',
+                        m3u8_id='hls', fatal=False))
+                    continue
+                format_id = format_info.get('ffname')
+                formats.append({
+                    'format_id': format_id,
+                    'url': video_url,
+                    'ext': mimetype2ext(video_info.get('videoMimeType')),
+                    'quality': quality(format_id),
+                })
         self._sort_formats(formats)
 
         return {

youtube_dl/extractor/instagram.py

@@ -6,11 +6,16 @@ import json
 import re
 
 from .common import InfoExtractor
-from ..compat import compat_str
+from ..compat import (
+    compat_str,
+    compat_HTTPError,
+)
 from ..utils import (
+    ExtractorError,
     get_element_by_attribute,
     int_or_none,
     lowercase_escape,
+    std_headers,
     try_get,
 )
 
@@ -239,6 +244,8 @@ class InstagramUserIE(InfoExtractor):
         }
     }
 
+    _gis_tmpl = None
+
     def _entries(self, data):
         def get_count(suffix):
             return int_or_none(try_get(
@@ -254,19 +261,39 @@ class InstagramUserIE(InfoExtractor):
         for page_num in itertools.count(1):
             variables = json.dumps({
                 'id': uploader_id,
-                'first': 100,
+                'first': 12,
                 'after': cursor,
             })
-            s = '%s:%s:%s' % (rhx_gis, csrf_token, variables)
-            media = self._download_json(
-                'https://www.instagram.com/graphql/query/', uploader_id,
-                'Downloading JSON page %d' % page_num, headers={
-                    'X-Requested-With': 'XMLHttpRequest',
-                    'X-Instagram-GIS': hashlib.md5(s.encode('utf-8')).hexdigest(),
-                }, query={
-                    'query_hash': '472f257a40c653c64c666ce877d59d2b',
-                    'variables': variables,
-                })['data']['user']['edge_owner_to_timeline_media']
+
+            if self._gis_tmpl:
+                gis_tmpls = [self._gis_tmpl]
+            else:
+                gis_tmpls = [
+                    '%s' % rhx_gis,
+                    '',
+                    '%s:%s' % (rhx_gis, csrf_token),
+                    '%s:%s:%s' % (rhx_gis, csrf_token, std_headers['User-Agent']),
+                ]
+
+            for gis_tmpl in gis_tmpls:
+                try:
+                    media = self._download_json(
+                        'https://www.instagram.com/graphql/query/', uploader_id,
+                        'Downloading JSON page %d' % page_num, headers={
+                            'X-Requested-With': 'XMLHttpRequest',
+                            'X-Instagram-GIS': hashlib.md5(
+                                ('%s:%s' % (gis_tmpl, variables)).encode('utf-8')).hexdigest(),
+                        }, query={
+                            'query_hash': '42323d64886122307be10013ad2dcc44',
+                            'variables': variables,
+                        })['data']['user']['edge_owner_to_timeline_media']
+                    self._gis_tmpl = gis_tmpl
+                    break
+                except ExtractorError as e:
+                    if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
+                        if gis_tmpl != gis_tmpls[-1]:
+                            continue
+                    raise
 
             edges = media.get('edges')
             if not edges or not isinstance(edges, list):
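Note: Instagram started rejecting the old fixed X-Instagram-GIS recipe, so the extractor now tries several candidate signature templates (rhx_gis alone, empty, with the CSRF token, with the User-Agent), remembers the first one that is not answered with HTTP 403, and reuses it for later pages. The signature itself stays an MD5 over "template:variables"; a standalone illustration with placeholder values:

import hashlib
import json

rhx_gis = '<rhx_gis from the page shared-data blob>'
variables = json.dumps({'id': '<uploader_id>', 'first': 12, 'after': '<cursor>'})
gis_tmpl = rhx_gis  # first candidate; '', token and User-Agent variants follow on 403
signature = hashlib.md5(('%s:%s' % (gis_tmpl, variables)).encode('utf-8')).hexdigest()
# `signature` is sent as the X-Instagram-GIS request header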

youtube_dl/extractor/joj.py: Executable file → Normal file (mode change only, no content changes)

youtube_dl/extractor/kaltura.py

@@ -135,10 +135,10 @@ class KalturaIE(InfoExtractor):
                 ''', webpage) or
             re.search(
                 r'''(?xs)
-                    <iframe[^>]+src=(?P<q1>["'])
-                    (?:https?:)?//(?:www\.)?kaltura\.com/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+)
+                    <(?:iframe[^>]+src|meta[^>]+\bcontent)=(?P<q1>["'])
+                    (?:https?:)?//(?:(?:www|cdnapi)\.)?kaltura\.com/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+)
                     (?:(?!(?P=q1)).)*
-                    [?&]entry_id=(?P<id>(?:(?!(?P=q1))[^&])+)
+                    [?&;]entry_id=(?P<id>(?:(?!(?P=q1))[^&])+)
                     (?P=q1)
                 ''', webpage)
         )

youtube_dl/extractor/keezmovies.py

@@ -20,23 +20,23 @@ from ..utils import (
 class KeezMoviesIE(InfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?keezmovies\.com/video/(?:(?P<display_id>[^/]+)-)?(?P<id>\d+)'
     _TESTS = [{
-        'url': 'http://www.keezmovies.com/video/petite-asian-lady-mai-playing-in-bathtub-1214711',
-        'md5': '1c1e75d22ffa53320f45eeb07bc4cdc0',
+        'url': 'https://www.keezmovies.com/video/arab-wife-want-it-so-bad-i-see-she-thirsty-and-has-tiny-money-18070681',
+        'md5': '2ac69cdb882055f71d82db4311732a1a',
         'info_dict': {
-            'id': '1214711',
-            'display_id': 'petite-asian-lady-mai-playing-in-bathtub',
+            'id': '18070681',
+            'display_id': 'arab-wife-want-it-so-bad-i-see-she-thirsty-and-has-tiny-money',
             'ext': 'mp4',
-            'title': 'Petite Asian Lady Mai Playing In Bathtub',
-            'thumbnail': r're:^https?://.*\.jpg$',
+            'title': 'Arab wife want it so bad I see she thirsty and has tiny money.',
+            'thumbnail': None,
             'view_count': int,
             'age_limit': 18,
         }
     }, {
-        'url': 'http://www.keezmovies.com/video/1214711',
+        'url': 'http://www.keezmovies.com/video/18070681',
         'only_matching': True,
     }]
 
-    def _extract_info(self, url):
+    def _extract_info(self, url, fatal=True):
         mobj = re.match(self._VALID_URL, url)
         video_id = mobj.group('id')
         display_id = (mobj.group('display_id')
@@ -55,7 +55,7 @@ class KeezMoviesIE(InfoExtractor):
         encrypted = False
 
         def extract_format(format_url, height=None):
-            if not isinstance(format_url, compat_str) or not format_url.startswith('http'):
+            if not isinstance(format_url, compat_str) or not format_url.startswith(('http', '//')):
                 return
             if format_url in format_urls:
                 return
@@ -105,7 +105,11 @@ class KeezMoviesIE(InfoExtractor):
             raise ExtractorError(
                 'Video %s is no longer available' % video_id, expected=True)
 
-        self._sort_formats(formats)
+        try:
+            self._sort_formats(formats)
+        except ExtractorError:
+            if fatal:
+                raise
 
         if not title:
             title = self._html_search_regex(
@@ -122,7 +126,9 @@ class KeezMoviesIE(InfoExtractor):
         }
 
     def _real_extract(self, url):
-        webpage, info = self._extract_info(url)
+        webpage, info = self._extract_info(url, fatal=False)
+        if not info['formats']:
+            return self.url_result(url, 'Generic')
         info['view_count'] = str_to_int(self._search_regex(
             r'<b>([\d,.]+)</b> Views?', webpage, 'view count', fatal=False))
         return info

youtube_dl/extractor/mofosex.py

@@ -12,7 +12,7 @@ class MofosexIE(KeezMoviesIE):
     _VALID_URL = r'https?://(?:www\.)?mofosex\.com/videos/(?P<id>\d+)/(?P<display_id>[^/?#&.]+)\.html'
     _TESTS = [{
         'url': 'http://www.mofosex.com/videos/318131/amateur-teen-playing-and-masturbating-318131.html',
-        'md5': '39a15853632b7b2e5679f92f69b78e91',
+        'md5': '558fcdafbb63a87c019218d6e49daf8a',
         'info_dict': {
             'id': '318131',
             'display_id': 'amateur-teen-playing-and-masturbating-318131',

youtube_dl/extractor/nexx.py

@@ -230,15 +230,18 @@ class NexxIE(InfoExtractor):
         azure_locator = stream_data['azureLocator']
 
-        AZURE_URL = 'http://nx%s%02d.akamaized.net/'
-
-        def get_cdn_shield_base(shield_type='', prefix='-p'):
+        def get_cdn_shield_base(shield_type='', static=False):
             for secure in ('', 's'):
                 cdn_shield = stream_data.get('cdnShield%sHTTP%s' % (shield_type, secure.upper()))
                 if cdn_shield:
                     return 'http%s://%s' % (secure, cdn_shield)
             else:
-                return AZURE_URL % (prefix, int(stream_data['azureAccount'].replace('nexxplayplus', '')))
+                if 'fb' in stream_data['azureAccount']:
+                    prefix = 'df' if static else 'f'
+                else:
+                    prefix = 'd' if static else 'p'
+                account = int(stream_data['azureAccount'].replace('nexxplayplus', '').replace('nexxplayfb', ''))
+                return 'http://nx-%s%02d.akamaized.net/' % (prefix, account)
 
         azure_stream_base = get_cdn_shield_base()
         is_ml = ',' in language
@@ -260,7 +263,7 @@ class NexxIE(InfoExtractor):
             formats.extend(self._extract_ism_formats(
                 azure_manifest_url % '', video_id, ism_id='%s-mss' % cdn, fatal=False))
 
-        azure_progressive_base = get_cdn_shield_base('Prog', '-d')
+        azure_progressive_base = get_cdn_shield_base('Prog', True)
         azure_file_distribution = stream_data.get('azureFileDistribution')
         if azure_file_distribution:
             fds = azure_file_distribution.split(',')
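Note: when no CDN shield is configured, the Azure host is now derived from the account name instead of a single template: "nexxplayplus" accounts map to nx-p (streaming) or nx-d (static), "nexxplayfb" accounts to nx-f / nx-df, each followed by the two-digit account number. A standalone sketch of just that mapping (account names are examples):

def azure_base(azure_account, static=False):
    if 'fb' in azure_account:
        prefix = 'df' if static else 'f'
    else:
        prefix = 'd' if static else 'p'
    account = int(azure_account.replace('nexxplayplus', '').replace('nexxplayfb', ''))
    return 'http://nx-%s%02d.akamaized.net/' % (prefix, account)

assert azure_base('nexxplayplus7') == 'http://nx-p07.akamaized.net/'
assert azure_base('nexxplayfb3', static=True) == 'http://nx-df03.akamaized.net/'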

youtube_dl/extractor/nick.py

@@ -81,13 +81,23 @@ class NickIE(MTVServicesInfoExtractor):
 class NickBrIE(MTVServicesInfoExtractor):
     IE_NAME = 'nickelodeon:br'
-    _VALID_URL = r'https?://(?P<domain>(?:www\.)?nickjr|mundonick\.uol)\.com\.br/(?:programas/)?[^/]+/videos/(?:episodios/)?(?P<id>[^/?#.]+)'
+    _VALID_URL = r'''(?x)
+                    https?://
+                        (?:
+                            (?P<domain>(?:www\.)?nickjr|mundonick\.uol)\.com\.br|
+                            (?:www\.)?nickjr\.nl
+                        )
+                        /(?:programas/)?[^/]+/videos/(?:episodios/)?(?P<id>[^/?\#.]+)
+                    '''
     _TESTS = [{
         'url': 'http://www.nickjr.com.br/patrulha-canina/videos/210-labirinto-de-pipoca/',
         'only_matching': True,
     }, {
         'url': 'http://mundonick.uol.com.br/programas/the-loud-house/videos/muitas-irmas/7ljo9j',
         'only_matching': True,
+    }, {
+        'url': 'http://www.nickjr.nl/paw-patrol/videos/311-ge-wol-dig-om-terug-te-zijn/',
+        'only_matching': True,
     }]
 
     def _real_extract(self, url):

youtube_dl/extractor/openload.py

@@ -340,7 +340,10 @@ class OpenloadIE(InfoExtractor):
             get_element_by_id('streamurj', webpage) or
             self._search_regex(
                 (r'>\s*([\w-]+~\d{10,}~\d+\.\d+\.0\.0~[\w-]+)\s*<',
-                 r'>\s*([\w~-]+~\d+\.\d+\.\d+\.\d+~[\w~-]+)'), webpage,
+                 r'>\s*([\w~-]+~\d+\.\d+\.\d+\.\d+~[\w~-]+)',
+                 r'>\s*([\w-]+~\d{10,}~(?:[a-f\d]+:){2}:~[\w-]+)\s*<',
+                 r'>\s*([\w~-]+~[a-f0-9:]+~[\w~-]+)\s*<',
+                 r'>\s*([\w~-]+~[a-f0-9:]+~[\w~-]+)'), webpage,
                 'stream URL'))
 
         video_url = 'https://openload.co/stream/%s?mime=true' % decoded_id
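Note: the added alternates accept decoded IDs whose middle segment is an IPv6 address (hex groups separated by colons, including the "::" shorthand) instead of the dotted IPv4 form the old pattern required. A quick demonstration against one of the new patterns (the ID is made up):

import re

decoded_id = re.search(
    r'>\s*([\w~-]+~[a-f0-9:]+~[\w~-]+)\s*<',
    '<span>abcDEF-123~2001:db8::1~xYz-9</span>').group(1)
assert decoded_id == 'abcDEF-123~2001:db8::1~xYz-9'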

youtube_dl/extractor/picarto.py (new file)

@@ -0,0 +1,165 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import time
+
+from .common import InfoExtractor
+from ..compat import compat_str
+from ..utils import (
+    ExtractorError,
+    js_to_json,
+    try_get,
+    update_url_query,
+    urlencode_postdata,
+)
+
+
+class PicartoIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www.)?picarto\.tv/(?P<id>[a-zA-Z0-9]+)'
+    _TEST = {
+        'url': 'https://picarto.tv/Setz',
+        'info_dict': {
+            'id': 'Setz',
+            'ext': 'mp4',
+            'title': 're:^Setz [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
+            'timestamp': int,
+            'is_live': True
+        },
+        'skip': 'Stream is offline',
+    }
+
+    @classmethod
+    def suitable(cls, url):
+        return False if PicartoVodIE.suitable(url) else super(PicartoIE, cls).suitable(url)
+
+    def _real_extract(self, url):
+        channel_id = self._match_id(url)
+        stream_page = self._download_webpage(url, channel_id)
+
+        if '>This channel does not exist' in stream_page:
+            raise ExtractorError(
+                'Channel %s does not exist' % channel_id, expected=True)
+
+        player = self._parse_json(
+            self._search_regex(
+                r'(?s)playerSettings\[\d+\]\s*=\s*(\{.+?\}\s*\n)', stream_page,
+                'player settings'),
+            channel_id, transform_source=js_to_json)
+
+        if player.get('online') is False:
+            raise ExtractorError('Stream is offline', expected=True)
+
+        cdn_data = self._download_json(
+            'https://picarto.tv/process/channel', channel_id,
+            data=urlencode_postdata({'loadbalancinginfo': channel_id}),
+            note='Downloading load balancing info')
+
+        def get_event(key):
+            return try_get(player, lambda x: x['event'][key], compat_str) or ''
+
+        params = {
+            'token': player.get('token') or '',
+            'ticket': get_event('ticket'),
+            'con': int(time.time() * 1000),
+            'type': get_event('ticket'),
+            'scope': get_event('scope'),
+        }
+
+        prefered_edge = cdn_data.get('preferedEdge')
+        default_tech = player.get('defaultTech')
+
+        formats = []
+
+        for edge in cdn_data['edges']:
+            edge_ep = edge.get('ep')
+            if not edge_ep or not isinstance(edge_ep, compat_str):
+                continue
+            edge_id = edge.get('id')
+            for tech in cdn_data['techs']:
+                tech_label = tech.get('label')
+                tech_type = tech.get('type')
+                preference = 0
+                if edge_id == prefered_edge:
+                    preference += 1
+                if tech_type == default_tech:
+                    preference += 1
+                format_id = []
+                if edge_id:
+                    format_id.append(edge_id)
+                if tech_type == 'application/x-mpegurl' or tech_label == 'HLS':
+                    format_id.append('hls')
+                    formats.extend(self._extract_m3u8_formats(
+                        update_url_query(
+                            'https://%s/hls/%s/index.m3u8'
+                            % (edge_ep, channel_id), params),
+                        channel_id, 'mp4', preference=preference,
+                        m3u8_id='-'.join(format_id), fatal=False))
+                    continue
+                elif tech_type == 'video/mp4' or tech_label == 'MP4':
+                    format_id.append('mp4')
+                    formats.append({
+                        'url': update_url_query(
+                            'https://%s/mp4/%s.mp4' % (edge_ep, channel_id),
+                            params),
+                        'format_id': '-'.join(format_id),
+                        'preference': preference,
+                    })
+                else:
+                    # rtmp format does not seem to work
+                    continue
+        self._sort_formats(formats)
+
+        mature = player.get('mature')
+        if mature is None:
+            age_limit = None
+        else:
+            age_limit = 18 if mature is True else 0
+
+        return {
+            'id': channel_id,
+            'title': self._live_title(channel_id),
+            'is_live': True,
+            'thumbnail': player.get('vodThumb'),
+            'age_limit': age_limit,
+            'formats': formats,
+        }
+
+
+class PicartoVodIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www.)?picarto\.tv/videopopout/(?P<id>[^/?#&]+)'
+    _TESTS = [{
+        'url': 'https://picarto.tv/videopopout/ArtofZod_2017.12.12.00.13.23.flv',
+        'md5': '3ab45ba4352c52ee841a28fb73f2d9ca',
+        'info_dict': {
+            'id': 'ArtofZod_2017.12.12.00.13.23.flv',
+            'ext': 'mp4',
+            'title': 'ArtofZod_2017.12.12.00.13.23.flv',
+            'thumbnail': r're:^https?://.*\.jpg'
+        },
+    }, {
+        'url': 'https://picarto.tv/videopopout/Plague',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        vod_info = self._parse_json(
+            self._search_regex(
+                r'(?s)#vod-player["\']\s*,\s*(\{.+?\})\s*\)', webpage,
+                video_id),
+            video_id, transform_source=js_to_json)
+
+        formats = self._extract_m3u8_formats(
+            vod_info['vod'], video_id, 'mp4', entry_protocol='m3u8_native',
+            m3u8_id='hls')
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': video_id,
+            'thumbnail': vod_info.get('vodThumb'),
+            'formats': formats,
+        }
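Note: with the extractor registered in extractors.py (see the addition above), both live channels and videopopout VODs resolve through the usual entry points. A minimal programmatic check via the public API (the channel URL comes from the test case; it only resolves while the stream is live):

import youtube_dl

with youtube_dl.YoutubeDL({'quiet': True}) as ydl:
    info = ydl.extract_info('https://picarto.tv/Setz', download=False)
    print(info['id'], info.get('is_live'), len(info['formats']))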

youtube_dl/extractor/pornflip.py

@@ -14,7 +14,7 @@ from ..utils import (
 
 class PornFlipIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?pornflip\.com/(?:v|embed)/(?P<id>[0-9A-Za-z-]{11})'
+    _VALID_URL = r'https?://(?:www\.)?pornflip\.com/(?:v|embed)/(?P<id>[^/?#&]+)'
     _TESTS = [{
         'url': 'https://www.pornflip.com/v/wz7DfNhMmep',
         'md5': '98c46639849145ae1fd77af532a9278c',
@@ -40,6 +40,9 @@ class PornFlipIE(InfoExtractor):
     }, {
         'url': 'https://www.pornflip.com/embed/EkRD6-vS2-s',
         'only_matching': True,
+    }, {
+        'url': 'https://www.pornflip.com/v/NG9q6Pb_iK8',
+        'only_matching': True,
     }]
 
     def _real_extract(self, url):

youtube_dl/extractor/rentv.py

@@ -3,6 +3,10 @@ from __future__ import unicode_literals
 
 from .common import InfoExtractor
 from ..compat import compat_str
+from ..utils import (
+    determine_ext,
+    int_or_none,
+)
 
 
 class RENTVIE(InfoExtractor):
@@ -13,7 +17,9 @@ class RENTVIE(InfoExtractor):
         'info_dict': {
             'id': '118577',
             'ext': 'mp4',
-            'title': 'Документальный спецпроект: "Промывка мозгов. Технологии XXI века"'
+            'title': 'Документальный спецпроект: "Промывка мозгов. Технологии XXI века"',
+            'timestamp': 1472230800,
+            'upload_date': '20160826',
         }
     }, {
         'url': 'http://ren.tv/player/118577',
@@ -26,9 +32,33 @@ class RENTVIE(InfoExtractor):
     def _real_extract(self, url):
         video_id = self._match_id(url)
         webpage = self._download_webpage('http://ren.tv/player/' + video_id, video_id)
-        jw_config = self._parse_json(self._search_regex(
-            r'config\s*=\s*({.+});', webpage, 'jw config'), video_id)
-        return self._parse_jwplayer_data(jw_config, video_id, m3u8_id='hls')
+        config = self._parse_json(self._search_regex(
+            r'config\s*=\s*({.+})\s*;', webpage, 'config'), video_id)
+        title = config['title']
+        formats = []
+        for video in config['src']:
+            src = video.get('src')
+            if not src or not isinstance(src, compat_str):
+                continue
+            ext = determine_ext(src)
+            if ext == 'm3u8':
+                formats.extend(self._extract_m3u8_formats(
+                    src, video_id, 'mp4', entry_protocol='m3u8_native',
+                    m3u8_id='hls', fatal=False))
+            else:
+                formats.append({
+                    'url': src,
+                })
+        self._sort_formats(formats)
+        return {
+            'id': video_id,
+            'title': title,
+            'description': config.get('description'),
+            'thumbnail': config.get('image'),
+            'duration': int_or_none(config.get('duration')),
+            'timestamp': int_or_none(config.get('date')),
+            'formats': formats,
+        }
 
 
 class RENTVArticleIE(InfoExtractor):

View File

@@ -310,6 +310,7 @@ class SmotriBroadcastIE(InfoExtractor):
     IE_DESC = 'Smotri.com broadcasts'
     IE_NAME = 'smotri:broadcast'
     _VALID_URL = r'https?://(?:www\.)?(?P<url>smotri\.com/live/(?P<id>[^/]+))/?.*'
+    _NETRC_MACHINE = 'smotri'

     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
@@ -352,17 +353,18 @@ class SmotriBroadcastIE(InfoExtractor):
             adult_content = False

         ticket = self._html_search_regex(
-            r"window\.broadcast_control\.addFlashVar\('file'\s*,\s*'([^']+)'\)",
-            broadcast_page, 'broadcast ticket')
+            (r'data-user-file=(["\'])(?P<ticket>(?!\1).+)\1',
+             r"window\.broadcast_control\.addFlashVar\('file'\s*,\s*'(?P<ticket>[^']+)'\)"),
+            broadcast_page, 'broadcast ticket', group='ticket')

-        url = 'http://smotri.com/broadcast/view/url/?ticket=%s' % ticket
+        broadcast_url = 'http://smotri.com/broadcast/view/url/?ticket=%s' % ticket

         broadcast_password = self._downloader.params.get('videopassword')
         if broadcast_password:
-            url += '&pass=%s' % hashlib.md5(broadcast_password.encode('utf-8')).hexdigest()
+            broadcast_url += '&pass=%s' % hashlib.md5(broadcast_password.encode('utf-8')).hexdigest()

         broadcast_json_page = self._download_webpage(
-            url, broadcast_id, 'Downloading broadcast JSON')
+            broadcast_url, broadcast_id, 'Downloading broadcast JSON')

         try:
             broadcast_json = json.loads(broadcast_json_page)
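Note: the password branch sends an md5 hex digest of the --video-password value as the pass query parameter, never the password itself. What actually gets appended:

    import hashlib

    broadcast_password = 'hunter2'  # hypothetical --video-password value
    ticket = 'TICKET'               # placeholder ticket
    broadcast_url = 'http://smotri.com/broadcast/view/url/?ticket=%s' % ticket
    broadcast_url += '&pass=%s' % hashlib.md5(
        broadcast_password.encode('utf-8')).hexdigest()
    print(broadcast_url)
    # http://smotri.com/broadcast/view/url/?ticket=TICKET&pass=2ab96390c7dbe3439de74d0c9b0b1767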

youtube_dl/extractor/svt.py

@@ -22,6 +22,8 @@ class SVTBaseIE(InfoExtractor):
     _GEO_COUNTRIES = ['SE']

     def _extract_video(self, video_info, video_id):
+        is_live = dict_get(video_info, ('live', 'simulcast'), default=False)
+        m3u8_protocol = 'm3u8' if is_live else 'm3u8_native'
         formats = []
         for vr in video_info['videoReferences']:
             player_type = vr.get('playerType') or vr.get('format')
@@ -30,7 +32,7 @@ class SVTBaseIE(InfoExtractor):
             if ext == 'm3u8':
                 formats.extend(self._extract_m3u8_formats(
                     vurl, video_id,
-                    ext='mp4', entry_protocol='m3u8_native',
+                    ext='mp4', entry_protocol=m3u8_protocol,
                     m3u8_id=player_type, fatal=False))
             elif ext == 'f4m':
                 formats.extend(self._extract_f4m_formats(
@@ -90,6 +92,7 @@ class SVTBaseIE(InfoExtractor):
             'season_number': season_number,
             'episode': episode,
             'episode_number': episode_number,
+            'is_live': is_live,
         }
@@ -134,7 +137,7 @@ class SVTPlayBaseIE(SVTBaseIE):

 class SVTPlayIE(SVTPlayBaseIE):
     IE_DESC = 'SVT Play and Öppet arkiv'
-    _VALID_URL = r'https?://(?:www\.)?(?:svtplay|oppetarkiv)\.se/(?:video|klipp)/(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://(?:www\.)?(?:svtplay|oppetarkiv)\.se/(?:video|klipp|kanaler)/(?P<id>[^/?#&]+)'
     _TESTS = [{
         'url': 'http://www.svtplay.se/video/5996901/flygplan-till-haile-selassie/flygplan-till-haile-selassie-2',
         'md5': '2b6704fe4a28801e1a098bbf3c5ac611',
@@ -158,6 +161,9 @@ class SVTPlayIE(SVTPlayBaseIE):
     }, {
         'url': 'http://www.svtplay.se/klipp/9023742/stopptid-om-bjorn-borg',
         'only_matching': True,
+    }, {
+        'url': 'https://www.svtplay.se/kanaler/svt1',
+        'only_matching': True,
     }]

     def _real_extract(self, url):
@@ -173,6 +179,10 @@ class SVTPlayIE(SVTPlayBaseIE):

         thumbnail = self._og_search_thumbnail(webpage)

+        def adjust_title(info):
+            if info['is_live']:
+                info['title'] = self._live_title(info['title'])
+
         if data:
             video_info = try_get(
                 data, lambda x: x['context']['dispatcher']['stores']['VideoTitlePageStore']['data']['video'],
@@ -183,6 +193,7 @@ class SVTPlayIE(SVTPlayBaseIE):
                 'title': data['context']['dispatcher']['stores']['MetaStore']['title'],
                 'thumbnail': thumbnail,
             })
+            adjust_title(info_dict)
             return info_dict

         video_id = self._search_regex(
@@ -198,6 +209,7 @@ class SVTPlayIE(SVTPlayBaseIE):
             info_dict['title'] = re.sub(
                 r'\s*\|\s*.+?$', '',
                 info_dict.get('episode') or self._og_search_title(webpage))
+        adjust_title(info_dict)
         return info_dict
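Note: liveness is read from either the 'live' or the 'simulcast' field via dict_get, then drives both the HLS entry protocol and the title decoration. A small sketch (dict_get is the real helper; the payloads are hypothetical):

    from youtube_dl.utils import dict_get

    for video_info in ({'live': True}, {'simulcast': True}, {}):
        is_live = dict_get(video_info, ('live', 'simulcast'), default=False)
        # live streams use the ffmpeg-backed 'm3u8' protocol, VOD the native downloader
        print(is_live, 'm3u8' if is_live else 'm3u8_native')
    # True m3u8 / True m3u8 / False m3u8_native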

youtube_dl/extractor/twitch.py

@@ -168,6 +168,13 @@ class TwitchItemBaseIE(TwitchBaseIE):
         return self.playlist_result(entries, info['id'], info['title'])

     def _extract_info(self, info):
+        status = info.get('status')
+        if status == 'recording':
+            is_live = True
+        elif status == 'recorded':
+            is_live = False
+        else:
+            is_live = None
         return {
             'id': info['_id'],
             'title': info.get('title') or 'Untitled Broadcast',
@@ -178,6 +185,7 @@ class TwitchItemBaseIE(TwitchBaseIE):
             'uploader_id': info.get('channel', {}).get('name'),
             'timestamp': parse_iso8601(info.get('recorded_at')),
             'view_count': int_or_none(info.get('views')),
+            'is_live': is_live,
         }

     def _real_extract(self, url):
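Note: the status mapping is deliberately tri-state: an unknown status leaves is_live as None rather than guessing either way. An equivalent compact form, for illustration only (the patch itself uses the if/elif chain above):

    def status_to_is_live(status):
        # 'recording' = still streaming, 'recorded' = finished VOD,
        # anything else (including a missing status) = unknown
        return {'recording': True, 'recorded': False}.get(status)

    for status in ('recording', 'recorded', None):
        print(status, status_to_is_live(status))
    # recording True / recorded False / None None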

youtube_dl/extractor/vimeo.py

@@ -16,6 +16,7 @@ from ..utils import (
     ExtractorError,
     InAdvancePagedList,
     int_or_none,
+    merge_dicts,
     NO_DEFAULT,
     RegexNotFoundError,
     sanitized_Request,
@@ -639,16 +640,18 @@ class VimeoIE(VimeoBaseInfoExtractor):
                     'preference': 1,
                 })

-        info_dict = self._parse_config(config, video_id)
-        formats.extend(info_dict['formats'])
+        info_dict_config = self._parse_config(config, video_id)
+        formats.extend(info_dict_config['formats'])
         self._vimeo_sort_formats(formats)

+        json_ld = self._search_json_ld(webpage, video_id, default={})
+
         if not cc_license:
             cc_license = self._search_regex(
                 r'<link[^>]+rel=["\']license["\'][^>]+href=(["\'])(?P<license>(?:(?!\1).)+)\1',
                 webpage, 'license', default=None, group='license')

-        info_dict.update({
+        info_dict = {
             'id': video_id,
             'formats': formats,
             'timestamp': unified_timestamp(timestamp),
@@ -658,7 +661,9 @@ class VimeoIE(VimeoBaseInfoExtractor):
             'like_count': like_count,
             'comment_count': comment_count,
             'license': cc_license,
-        })
+        }
+
+        info_dict = merge_dicts(info_dict, info_dict_config, json_ld)

         return info_dict

youtube_dl/extractor/vine.py

@@ -2,9 +2,9 @@
 from __future__ import unicode_literals

 import re
-import itertools

 from .common import InfoExtractor
+from ..compat import compat_str
 from ..utils import (
     determine_ext,
     int_or_none,
@@ -112,21 +112,24 @@ class VineIE(InfoExtractor):

 class VineUserIE(InfoExtractor):
     IE_NAME = 'vine:user'
-    _VALID_URL = r'(?:https?://)?vine\.co/(?P<u>u/)?(?P<user>[^/]+)/?(\?.*)?$'
+    _VALID_URL = r'https?://vine\.co/(?P<u>u/)?(?P<user>[^/]+)'
     _VINE_BASE_URL = 'https://vine.co/'
-    _TESTS = [
-        {
-            'url': 'https://vine.co/Visa',
-            'info_dict': {
-                'id': 'Visa',
-            },
-            'playlist_mincount': 46,
-        },
-        {
-            'url': 'https://vine.co/u/941705360593584128',
-            'only_matching': True,
-        },
-    ]
+    _TESTS = [{
+        'url': 'https://vine.co/itsruthb',
+        'info_dict': {
+            'id': 'itsruthb',
+            'title': 'Ruth B',
+            'description': '| Instagram/Twitter: itsruthb | still a lost boy from neverland',
+        },
+        'playlist_mincount': 611,
+    }, {
+        'url': 'https://vine.co/u/942914934646415360',
+        'only_matching': True,
+    }]
+
+    @classmethod
+    def suitable(cls, url):
+        return False if VineIE.suitable(url) else super(VineUserIE, cls).suitable(url)

     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
@@ -138,17 +141,14 @@ class VineUserIE(InfoExtractor):
         profile_data = self._download_json(
             profile_url, user, note='Downloading user profile data')

-        user_id = profile_data['data']['userId']
-        timeline_data = []
-        for pagenum in itertools.count(1):
-            timeline_url = '%sapi/timelines/users/%s?page=%s&size=100' % (
-                self._VINE_BASE_URL, user_id, pagenum)
-            timeline_page = self._download_json(
-                timeline_url, user, note='Downloading page %d' % pagenum)
-            timeline_data.extend(timeline_page['data']['records'])
-            if timeline_page['data']['nextPage'] is None:
-                break
+        data = profile_data['data']
+        user_id = data.get('userId') or data['userIdStr']
+        profile = self._download_json(
+            'https://archive.vine.co/profiles/%s.json' % user_id, user_id)

         entries = [
-            self.url_result(e['permalinkUrl'], 'Vine') for e in timeline_data]
-        return self.playlist_result(entries, user)
+            self.url_result(
+                'https://vine.co/v/%s' % post_id, ie='Vine', video_id=post_id)
+            for post_id in profile['posts']
+            if post_id and isinstance(post_id, compat_str)]
+        return self.playlist_result(
+            entries, user, profile.get('username'), profile.get('description'))
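Note: the user extractor now reads a single static profile document from archive.vine.co instead of paging the old timeline API, presumably because Vine itself was discontinued. The comprehension keeps only non-empty string post ids; a sketch against a hypothetical payload (only the posts/username/description keys are assumed, since those are all the code reads):

    profile = {  # hypothetical archive.vine.co/profiles/<user_id>.json payload
        'username': 'itsruthb',
        'description': '...',
        'posts': ['5wlpBLBqdwW', None, 12345, 'ib1LqgOLOQl'],
    }
    entries = [
        'https://vine.co/v/%s' % post_id
        for post_id in profile['posts']
        if post_id and isinstance(post_id, str)]  # compat_str in the real code
    print(entries)  # the None and the integer id are filtered out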

youtube_dl/extractor/youtube.py

@@ -87,7 +87,7 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
         (username, password) = self._get_login_info()
         # No authentication to be performed
         if username is None:
-            if self._LOGIN_REQUIRED:
+            if self._LOGIN_REQUIRED and self._downloader.params.get('cookiefile') is None:
                 raise ExtractorError('No login info available, needed for using %s.' % self.IE_NAME, expected=True)
             return True
@@ -2699,10 +2699,7 @@ class YoutubeFeedsInfoExtractor(YoutubeBaseInfoExtractor):
     def _real_initialize(self):
         self._login()

-    def _real_extract(self, url):
-        page = self._download_webpage(
-            'https://www.youtube.com/feed/%s' % self._FEED_NAME, self._PLAYLIST_TITLE)
-
+    def _entries(self, page):
         # The extraction process is the same as for playlists, but the regex
         # for the video ids doesn't contain an index
         ids = []
@@ -2713,12 +2710,15 @@ class YoutubeFeedsInfoExtractor(YoutubeBaseInfoExtractor):
             # 'recommended' feed has infinite 'load more' and each new portion spins
             # the same videos in (sometimes) slightly different order, so we'll check
             # for unicity and break when portion has no new videos
-            new_ids = filter(lambda video_id: video_id not in ids, orderedSet(matches))
+            new_ids = list(filter(lambda video_id: video_id not in ids, orderedSet(matches)))
            if not new_ids:
                 break

             ids.extend(new_ids)

+            for entry in self._ids_to_results(new_ids):
+                yield entry
+
             mobj = re.search(r'data-uix-load-more-href="/?(?P<more>[^"]+)"', more_widget_html)
             if not mobj:
                 break
@@ -2730,8 +2730,12 @@ class YoutubeFeedsInfoExtractor(YoutubeBaseInfoExtractor):
             content_html = more['content_html']
             more_widget_html = more['load_more_widget_html']

+    def _real_extract(self, url):
+        page = self._download_webpage(
+            'https://www.youtube.com/feed/%s' % self._FEED_NAME,
+            self._PLAYLIST_TITLE)
+
         return self.playlist_result(
-            self._ids_to_results(ids), playlist_title=self._PLAYLIST_TITLE)
+            self._entries(page), playlist_title=self._PLAYLIST_TITLE)


 class YoutubeWatchLaterIE(YoutubePlaylistIE):
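Note: turning the body into a _entries generator is what makes feed extraction lazy: entries are yielded per "load more" page, so a consumer that stops early need not trigger the remaining page fetches. A minimal model of the pattern, with the network calls faked as a list of pages:

    def entries(pages):
        ids = []
        for page in pages:  # each iteration stands in for one page fetch
            new_ids = [v for v in page if v not in ids]
            if not new_ids:
                break
            ids.extend(new_ids)
            for video_id in new_ids:
                yield video_id

    pages = iter([['a', 'b'], ['b', 'c'], ['c']])
    gen = entries(pages)
    print(next(gen), next(gen))  # a b; the second page is never pulled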

youtube_dl/utils.py

@@ -2260,6 +2260,20 @@ def try_get(src, getter, expected_type=None):
     return v


+def merge_dicts(*dicts):
+    merged = {}
+    for a_dict in dicts:
+        for k, v in a_dict.items():
+            if v is None:
+                continue
+            if (k not in merged or
+                    (isinstance(v, compat_str) and v and
+                        isinstance(merged[k], compat_str) and
+                        not merged[k])):
+                merged[k] = v
+    return merged
+
+
 def encode_compat_str(string, encoding=preferredencoding(), errors='strict'):
     return string if isinstance(string, compat_str) else compat_str(string, encoding, errors)
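Note on merge_dicts semantics: earlier dicts win, None values never land, and the only overwrite allowed is a non-empty string replacing an earlier empty one. For example:

    info_dict = {'title': '', 'uploader': 'alice', 'duration': None}
    json_ld = {'title': 'Real title', 'uploader': 'bob', 'duration': 120}
    print(merge_dicts(info_dict, json_ld))
    # {'title': 'Real title', 'uploader': 'alice', 'duration': 120}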
@@ -2609,8 +2623,8 @@ def _match_one(filter_part, dct):
         return op(actual_value, comparison_value)

     UNARY_OPERATORS = {
-        '': lambda v: v is not None,
-        '!': lambda v: v is None,
+        '': lambda v: (v is True) if isinstance(v, bool) else (v is not None),
+        '!': lambda v: (v is False) if isinstance(v, bool) else (v is None),
     }
     operator_rex = re.compile(r'''(?x)\s*
         (?P<op>%s)\s*(?P<key>[a-z_]+)
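Note: this is the boolean-meta-field fix. Previously the bare-field test only checked "is not None", so a filter like is_live matched videos whose is_live was explicitly False. With the isinstance check, match_str (the public wrapper around _match_one) behaves as expected:

    from youtube_dl.utils import match_str

    print(match_str('is_live', {'is_live': True}))    # True
    print(match_str('is_live', {'is_live': False}))   # False (True before the fix)
    print(match_str('!is_live', {'is_live': False}))  # True
    print(match_str('is_live', {}))                   # False: field missing entirely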

youtube_dl/version.py

@@ -1,3 +1,3 @@
 from __future__ import unicode_literals

-__version__ = '2018.04.09'
+__version__ = '2018.04.25'