Merge remote-tracking branch 'upstream/master' into BlenderCloud-issue-13282

Parmjit Virk 2017-06-14 14:14:57 -05:00
commit 4f0545b171
16 changed files with 266 additions and 49 deletions

View File

@ -6,8 +6,8 @@
--- ---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.06.05*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected. ### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.06.12*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.06.05** - [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.06.12**
### Before submitting an *issue* make sure you have: ### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections - [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: [] [debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj'] [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2017.06.05 [debug] youtube-dl version 2017.06.12
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {} [debug] Proxy map: {}

View File

@ -1,3 +1,36 @@
version 2017.06.12
Core
* [utils] Handle compat_HTMLParseError in extract_attributes (#13349)
+ [compat] Introduce compat_HTMLParseError
* [utils] Improve unified_timestamp
* [extractor/generic] Ensure format id is unicode string
* [extractor/common] Return unicode string from _match_id
+ [YoutubeDL] Sanitize more fields (#13313)
Extractors
+ [xfileshare] Add support for rapidvideo.tv (#13348)
* [xfileshare] Modernize and pass Referer
+ [rutv] Add support for testplayer.vgtrk.com (#13347)
+ [newgrounds] Extract more metadata (#13232)
+ [newgrounds:playlist] Add support for playlists (#10611)
* [newgrounds] Improve formats and uploader extraction (#13346)
* [msn] Fix formats extraction
* [turbo] Ensure format id is string
* [sexu] Ensure height is int
* [jove] Ensure comment count is int
* [golem] Ensure format id is string
* [gfycat] Ensure filesize is int
* [foxgay] Ensure height is int
* [flickr] Ensure format id is string
* [sohu] Fix numeric fields
* [safari] Improve authentication detection (#13319)
* [liveleak] Ensure height is int (#13313)
* [streamango] Make title optional (#13292)
* [rtlnl] Improve URL regular expression (#13295)
* [tvplayer] Fix extraction (#13291)
version 2017.06.05 version 2017.06.05
Core Core

View File

@ -145,18 +145,18 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
--max-views COUNT Do not download any videos with more than --max-views COUNT Do not download any videos with more than
COUNT views COUNT views
--match-filter FILTER Generic video filter. Specify any key (see --match-filter FILTER Generic video filter. Specify any key (see
help for -o for a list of available keys) the "OUTPUT TEMPLATE" for a list of
to match if the key is present, !key to available keys) to match if the key is
check if the key is not present, key > present, !key to check if the key is not
NUMBER (like "comment_count > 12", also present, key > NUMBER (like "comment_count
works with >=, <, <=, !=, =) to compare > 12", also works with >=, <, <=, !=, =) to
against a number, key = 'LITERAL' (like compare against a number, key = 'LITERAL'
"uploader = 'Mike Smith'", also works with (like "uploader = 'Mike Smith'", also works
!=) to match against a string literal and & with !=) to match against a string literal
to require multiple matches. Values which and & to require multiple matches. Values
are not known are excluded unless you put a which are not known are excluded unless you
question mark (?) after the operator. For put a question mark (?) after the operator.
example, to only match videos that have For example, to only match videos that have
been liked more than 100 times and disliked been liked more than 100 times and disliked
less than 50 times (or the dislike less than 50 times (or the dislike
functionality is not available at the given functionality is not available at the given
@ -277,8 +277,8 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
--get-filename Simulate, quiet but print output filename --get-filename Simulate, quiet but print output filename
--get-format Simulate, quiet but print output format --get-format Simulate, quiet but print output format
-j, --dump-json Simulate, quiet but print JSON information. -j, --dump-json Simulate, quiet but print JSON information.
See --output for a description of available See the "OUTPUT TEMPLATE" for a description
keys. of available keys.
-J, --dump-single-json Simulate, quiet but print JSON information -J, --dump-single-json Simulate, quiet but print JSON information
for each command-line argument. If the URL for each command-line argument. If the URL
refers to a playlist, dump the whole refers to a playlist, dump the whole

View File

@ -512,6 +512,7 @@
- **netease:song**: 网易云音乐 - **netease:song**: 网易云音乐
- **Netzkino** - **Netzkino**
- **Newgrounds** - **Newgrounds**
- **NewgroundsPlaylist**
- **Newstube** - **Newstube**
- **NextMedia**: 蘋果日報 - **NextMedia**: 蘋果日報
- **NextMediaActionNews**: 蘋果日報 - 動新聞 - **NextMediaActionNews**: 蘋果日報 - 動新聞
@ -974,7 +975,7 @@
- **WSJArticle** - **WSJArticle**
- **XBef** - **XBef**
- **XboxClips** - **XboxClips**
- **XFileShare**: XFileShare based sites: DaClips, FileHoot, GorillaVid, MovPod, PowerWatch, Rapidvideo.ws, TheVideoBee, Vidto, Streamin.To, XVIDSTAGE, Vid ABC, VidBom, vidlo - **XFileShare**: XFileShare based sites: DaClips, FileHoot, GorillaVid, MovPod, PowerWatch, Rapidvideo.ws, TheVideoBee, Vidto, Streamin.To, XVIDSTAGE, Vid ABC, VidBom, vidlo, RapidVideo.TV
- **XHamster** - **XHamster**
- **XHamsterEmbed** - **XHamsterEmbed**
- **xiami:album**: 虾米音乐 - 专辑 - **xiami:album**: 虾米音乐 - 专辑

View File

@ -340,6 +340,7 @@ class TestUtil(unittest.TestCase):
self.assertEqual(unified_timestamp('May 16, 2016 11:15 PM'), 1463440500) self.assertEqual(unified_timestamp('May 16, 2016 11:15 PM'), 1463440500)
self.assertEqual(unified_timestamp('Feb 7, 2016 at 6:35 pm'), 1454870100) self.assertEqual(unified_timestamp('Feb 7, 2016 at 6:35 pm'), 1454870100)
self.assertEqual(unified_timestamp('2017-03-30T17:52:41Q'), 1490896361) self.assertEqual(unified_timestamp('2017-03-30T17:52:41Q'), 1490896361)
self.assertEqual(unified_timestamp('Sep 11, 2013 | 5:49 AM'), 1378878540)
def test_determine_ext(self): def test_determine_ext(self):
self.assertEqual(determine_ext('http://example.com/foo/bar.mp4/?download'), 'mp4') self.assertEqual(determine_ext('http://example.com/foo/bar.mp4/?download'), 'mp4')
@ -915,6 +916,8 @@ class TestUtil(unittest.TestCase):
supports_outside_bmp = False supports_outside_bmp = False
if supports_outside_bmp: if supports_outside_bmp:
self.assertEqual(extract_attributes('<e x="Smile &#128512;!">'), {'x': 'Smile \U0001f600!'}) self.assertEqual(extract_attributes('<e x="Smile &#128512;!">'), {'x': 'Smile \U0001f600!'})
# Malformed HTML should not break attributes extraction on older Python
self.assertEqual(extract_attributes('<mal"formed/>'), {})
def test_clean_html(self): def test_clean_html(self):
self.assertEqual(clean_html('a:\nb'), 'a: b') self.assertEqual(clean_html('a:\nb'), 'a: b')

View File

@ -2322,6 +2322,19 @@ try:
except ImportError: # Python 2 except ImportError: # Python 2
from HTMLParser import HTMLParser as compat_HTMLParser from HTMLParser import HTMLParser as compat_HTMLParser
try: # Python 2
from HTMLParser import HTMLParseError as compat_HTMLParseError
except ImportError: # Python <3.4
try:
from html.parser import HTMLParseError as compat_HTMLParseError
except ImportError: # Python >3.4
# HTMLParseError was deprecated in Python 3.3 and removed in
# Python 3.5. Introduce a dummy exception for Python >=3.5 for compatible
# and uniform cross-version exception handling
class compat_HTMLParseError(Exception):
pass
try: try:
from subprocess import DEVNULL from subprocess import DEVNULL
compat_subprocess_get_DEVNULL = lambda: DEVNULL compat_subprocess_get_DEVNULL = lambda: DEVNULL
@ -2882,6 +2895,7 @@ else:
__all__ = [ __all__ = [
'compat_HTMLParseError',
'compat_HTMLParser', 'compat_HTMLParser',
'compat_HTTPError', 'compat_HTTPError',
'compat_basestring', 'compat_basestring',
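
For readers following the compat hunk above, here is a minimal standalone sketch of the same import fallback, together with one way a caller might use it. The names `AttributeCollector` and `collect_attributes` are hypothetical stand-ins for illustration; they are not the identifiers used in youtube-dl, and the flattened diff does not show the original indentation.

```python
# Sketch only: mirrors the import fallback in the hunk above,
# not the full compat module.
try:  # Python 2
    from HTMLParser import HTMLParseError as compat_HTMLParseError
except ImportError:
    try:  # Python 3 before 3.5
        from html.parser import HTMLParseError as compat_HTMLParseError
    except ImportError:  # Python 3.5+, where HTMLParseError no longer exists
        class compat_HTMLParseError(Exception):
            pass

try:  # Python 3
    from html.parser import HTMLParser
except ImportError:  # Python 2
    from HTMLParser import HTMLParser


class AttributeCollector(HTMLParser):  # hypothetical stand-in for HTMLAttributeParser
    def __init__(self):
        HTMLParser.__init__(self)
        self.attrs = {}

    def handle_starttag(self, tag, attrs):
        # Remember the attributes of the most recent start tag.
        self.attrs = dict(attrs)


def collect_attributes(html_element):  # hypothetical name
    parser = AttributeCollector()
    try:
        parser.feed(html_element)
        parser.close()
    except compat_HTMLParseError:
        # Older Pythons may raise on malformed input; keep what was parsed.
        pass
    return parser.attrs
```

Because the dummy class is only defined where the real exception is missing, the `except` clause is valid on every interpreter version and simply never fires where the parser is already lenient.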

View File

@ -2328,6 +2328,8 @@ class InfoExtractor(object):
urls = [] urls = []
formats = [] formats = []
for source in jwplayer_sources_data: for source in jwplayer_sources_data:
if not isinstance(source, dict):
continue
source_url = self._proto_relative_url(source.get('file')) source_url = self._proto_relative_url(source.get('file'))
if not source_url: if not source_url:
continue continue

View File

@ -8,7 +8,16 @@ from ..utils import int_or_none
class CorusIE(ThePlatformFeedIE): class CorusIE(ThePlatformFeedIE):
_VALID_URL = r'https?://(?:www\.)?(?P<domain>(?:globaltv|etcanada)\.com|(?:hgtv|foodnetwork|slice)\.ca)/(?:video/|(?:[^/]+/)+(?:videos/[a-z0-9-]+-|video\.html\?.*?\bv=))(?P<id>\d+)' _VALID_URL = r'''(?x)
https?://
(?:www\.)?
(?P<domain>
(?:globaltv|etcanada)\.com|
(?:hgtv|foodnetwork|slice|history|showcase)\.ca
)
/(?:video/|(?:[^/]+/)+(?:videos/[a-z0-9-]+-|video\.html\?.*?\bv=))
(?P<id>\d+)
'''
_TESTS = [{ _TESTS = [{
'url': 'http://www.hgtv.ca/shows/bryan-inc/videos/movie-night-popcorn-with-bryan-870923331648/', 'url': 'http://www.hgtv.ca/shows/bryan-inc/videos/movie-night-popcorn-with-bryan-870923331648/',
'md5': '05dcbca777bf1e58c2acbb57168ad3a6', 'md5': '05dcbca777bf1e58c2acbb57168ad3a6',
@ -27,6 +36,12 @@ class CorusIE(ThePlatformFeedIE):
}, { }, {
'url': 'http://etcanada.com/video/873675331955/meet-the-survivor-game-changers-castaways-part-2/', 'url': 'http://etcanada.com/video/873675331955/meet-the-survivor-game-changers-castaways-part-2/',
'only_matching': True, 'only_matching': True,
}, {
'url': 'http://www.history.ca/the-world-without-canada/video/full-episodes/natural-resources/video.html?v=955054659646#video',
'only_matching': True,
}, {
'url': 'http://www.showcase.ca/eyewitness/video/eyewitness++106/video.html?v=955070531919&p=1&s=da#video',
'only_matching': True,
}] }]
_TP_FEEDS = { _TP_FEEDS = {
@ -50,6 +65,14 @@ class CorusIE(ThePlatformFeedIE):
'feed_id': '5tUJLgV2YNJ5', 'feed_id': '5tUJLgV2YNJ5',
'account_id': 2414427935, 'account_id': 2414427935,
}, },
'history': {
'feed_id': 'tQFx_TyyEq4J',
'account_id': 2369613659,
},
'showcase': {
'feed_id': '9H6qyshBZU3E',
'account_id': 2414426607,
},
} }
def _real_extract(self, url): def _real_extract(self, url):
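
To see what the rewritten `_VALID_URL` in the Corus hunk accepts, the verbose pattern can be exercised on its own. This is a sketch for checking the regex only; `match_corus` is a hypothetical helper, and the real extractor does more with the match than is shown here.

```python
import re

# Copied from the new _VALID_URL in the hunk above.
VALID_URL = r'''(?x)
    https?://
    (?:www\.)?
    (?P<domain>
        (?:globaltv|etcanada)\.com|
        (?:hgtv|foodnetwork|slice|history|showcase)\.ca
    )
    /(?:video/|(?:[^/]+/)+(?:videos/[a-z0-9-]+-|video\.html\?.*?\bv=))
    (?P<id>\d+)
'''


def match_corus(url):  # hypothetical helper, for illustration only
    mobj = re.match(VALID_URL, url)
    if mobj is None:
        return None
    return mobj.group('domain'), mobj.group('id')


# One of the new only_matching test URLs from the hunk.
print(match_corus(
    'http://www.history.ca/the-world-without-canada/video/'
    'full-episodes/natural-resources/video.html?v=955054659646#video'))
```

Splitting the pattern across lines with `(?x)` leaves its meaning unchanged; the only functional difference from the old one-line pattern is the addition of `history` and `showcase` to the `.ca` alternatives.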

View File

@ -637,7 +637,10 @@ from .neteasemusic import (
NetEaseMusicProgramIE, NetEaseMusicProgramIE,
NetEaseMusicDjRadioIE, NetEaseMusicDjRadioIE,
) )
from .newgrounds import NewgroundsIE from .newgrounds import (
NewgroundsIE,
NewgroundsPlaylistIE,
)
from .newstube import NewstubeIE from .newstube import NewstubeIE
from .nextmedia import ( from .nextmedia import (
NextMediaIE, NextMediaIE,

View File

@ -68,10 +68,6 @@ class MSNIE(InfoExtractor):
format_url = file_.get('url') format_url = file_.get('url')
if not format_url: if not format_url:
continue continue
ext = determine_ext(format_url)
if ext == 'ism':
formats.extend(self._extract_ism_formats(
format_url + '/Manifest', display_id, 'mss', fatal=False))
if 'm3u8' in format_url: if 'm3u8' in format_url:
# m3u8_native should not be used here until # m3u8_native should not be used here until
# https://github.com/rg3/youtube-dl/issues/9913 is fixed # https://github.com/rg3/youtube-dl/issues/9913 is fixed
@ -79,6 +75,9 @@ class MSNIE(InfoExtractor):
format_url, display_id, 'mp4', format_url, display_id, 'mp4',
m3u8_id='hls', fatal=False) m3u8_id='hls', fatal=False)
formats.extend(m3u8_formats) formats.extend(m3u8_formats)
elif determine_ext(format_url) == 'ism':
formats.extend(self._extract_ism_formats(
format_url + '/Manifest', display_id, 'mss', fatal=False))
else: else:
formats.append({ formats.append({
'url': format_url, 'url': format_url,

View File

@ -1,6 +1,15 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import (
extract_attributes,
int_or_none,
parse_duration,
parse_filesize,
unified_timestamp,
)
class NewgroundsIE(InfoExtractor): class NewgroundsIE(InfoExtractor):
@ -13,7 +22,10 @@ class NewgroundsIE(InfoExtractor):
'ext': 'mp3', 'ext': 'mp3',
'title': 'B7 - BusMode', 'title': 'B7 - BusMode',
'uploader': 'Burn7', 'uploader': 'Burn7',
} 'timestamp': 1378878540,
'upload_date': '20130911',
'duration': 143,
},
}, { }, {
'url': 'https://www.newgrounds.com/portal/view/673111', 'url': 'https://www.newgrounds.com/portal/view/673111',
'md5': '3394735822aab2478c31b1004fe5e5bc', 'md5': '3394735822aab2478c31b1004fe5e5bc',
@ -22,25 +34,133 @@ class NewgroundsIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 'Dancin', 'title': 'Dancin',
'uploader': 'Squirrelman82', 'uploader': 'Squirrelman82',
'timestamp': 1460256780,
'upload_date': '20160410',
},
}, {
# source format unavailable, additional mp4 formats
'url': 'http://www.newgrounds.com/portal/view/689400',
'info_dict': {
'id': '689400',
'ext': 'mp4',
'title': 'ZTV News Episode 8',
'uploader': 'BennettTheSage',
'timestamp': 1487965140,
'upload_date': '20170224',
},
'params': {
'skip_download': True,
}, },
}] }]
def _real_extract(self, url): def _real_extract(self, url):
media_id = self._match_id(url) media_id = self._match_id(url)
webpage = self._download_webpage(url, media_id) webpage = self._download_webpage(url, media_id)
title = self._html_search_regex( title = self._html_search_regex(
r'<title>([^>]+)</title>', webpage, 'title') r'<title>([^>]+)</title>', webpage, 'title')
uploader = self._html_search_regex( media_url = self._parse_json(self._search_regex(
r'Author\s*<a[^>]+>([^<]+)', webpage, 'uploader', fatal=False) r'"url"\s*:\s*("[^"]+"),', webpage, ''), media_id)
music_url = self._parse_json(self._search_regex( formats = [{
r'"url":("[^"]+"),', webpage, ''), media_id) 'url': media_url,
'format_id': 'source',
'quality': 1,
}]
max_resolution = int_or_none(self._search_regex(
r'max_resolution["\']\s*:\s*(\d+)', webpage, 'max resolution',
default=None))
if max_resolution:
url_base = media_url.rpartition('.')[0]
for resolution in (360, 720, 1080):
if resolution > max_resolution:
break
formats.append({
'url': '%s.%dp.mp4' % (url_base, resolution),
'format_id': '%dp' % resolution,
'height': resolution,
})
self._check_formats(formats, media_id)
self._sort_formats(formats)
uploader = self._search_regex(
r'(?:Author|Writer)\s*<a[^>]+>([^<]+)', webpage, 'uploader',
fatal=False)
timestamp = unified_timestamp(self._search_regex(
r'<dt>Uploaded</dt>\s*<dd>([^<]+)', webpage, 'timestamp',
default=None))
duration = parse_duration(self._search_regex(
r'<dd>Song\s*</dd><dd>.+?</dd><dd>([^<]+)', webpage, 'duration',
default=None))
filesize_approx = parse_filesize(self._html_search_regex(
r'<dd>Song\s*</dd><dd>(.+?)</dd>', webpage, 'filesize',
default=None))
if len(formats) == 1:
formats[0]['filesize_approx'] = filesize_approx
if '<dd>Song' in webpage:
formats[0]['vcodec'] = 'none'
return { return {
'id': media_id, 'id': media_id,
'title': title, 'title': title,
'url': music_url,
'uploader': uploader, 'uploader': uploader,
'timestamp': timestamp,
'duration': duration,
'formats': formats,
} }
class NewgroundsPlaylistIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?newgrounds\.com/(?:collection|[^/]+/search/[^/]+)/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://www.newgrounds.com/collection/cats',
'info_dict': {
'id': 'cats',
'title': 'Cats',
},
'playlist_mincount': 46,
}, {
'url': 'http://www.newgrounds.com/portal/search/author/ZONE-SAMA',
'info_dict': {
'id': 'ZONE-SAMA',
'title': 'Portal Search: ZONE-SAMA',
},
'playlist_mincount': 47,
}, {
'url': 'http://www.newgrounds.com/audio/search/title/cats',
'only_matching': True,
}]
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
title = self._search_regex(
r'<title>([^>]+)</title>', webpage, 'title', default=None)
# cut left menu
webpage = self._search_regex(
r'(?s)<div[^>]+\bclass=["\']column wide(.+)',
webpage, 'wide column', default=webpage)
entries = []
for a, path, media_id in re.findall(
r'(<a[^>]+\bhref=["\']/?((?:portal/view|audio/listen)/(\d+))[^>]+>)',
webpage):
a_class = extract_attributes(a).get('class')
if a_class not in ('item-portalsubmission', 'item-audiosubmission'):
continue
entries.append(
self.url_result(
'https://www.newgrounds.com/%s' % path,
ie=NewgroundsIE.ie_key(), video_id=media_id))
return self.playlist_result(entries, playlist_id, title)
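
The `max_resolution` loop added to `NewgroundsIE._real_extract` in this hunk derives extra format URLs from the source URL. A rough standalone sketch of that derivation follows; `build_formats` and the example URL are hypothetical, and the real extractor additionally runs `_check_formats` and `_sort_formats` on the result.

```python
def build_formats(media_url, max_resolution):  # hypothetical helper
    # The original upload is always offered as the 'source' format.
    formats = [{
        'url': media_url,
        'format_id': 'source',
        'quality': 1,
    }]
    if max_resolution:
        # Strip the extension, then append '<height>p.mp4' per resolution.
        url_base = media_url.rpartition('.')[0]
        for resolution in (360, 720, 1080):
            if resolution > max_resolution:
                break
            formats.append({
                'url': '%s.%dp.mp4' % (url_base, resolution),
                'format_id': '%dp' % resolution,
                'height': resolution,
            })
    return formats


# Made-up URL, used only to show the shape of the derived entries.
for f in build_formats('https://example.com/media/689400_clip.mp4', 720):
    print(f['format_id'], f['url'])
```

The derived URLs are guesses based on a naming convention, which is presumably why the extractor verifies them before sorting instead of trusting them outright.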

View File

@ -35,7 +35,7 @@ class NPOIE(NPOBaseIE):
https?:// https?://
(?:www\.)? (?:www\.)?
(?: (?:
npo\.nl/(?!live|radio)(?:[^/]+/){2}| npo\.nl/(?!(?:live|radio)/)(?:[^/]+/){2}|
ntr\.nl/(?:[^/]+/){2,}| ntr\.nl/(?:[^/]+/){2,}|
omroepwnl\.nl/video/fragment/[^/]+__| omroepwnl\.nl/video/fragment/[^/]+__|
zapp\.nl/[^/]+/[^/]+/ zapp\.nl/[^/]+/[^/]+/
@ -150,6 +150,9 @@ class NPOIE(NPOBaseIE):
# live stream # live stream
'url': 'npo:LI_NL1_4188102', 'url': 'npo:LI_NL1_4188102',
'only_matching': True, 'only_matching': True,
}, {
'url': 'http://www.npo.nl/radio-gaga/13-06-2017/BNN_101383373',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
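
The one-line change to the NPO pattern tightens a negative lookahead so that it rejects only the path segments `live/` and `radio/`, not every path that merely starts with those words. The sketch below uses simplified versions of the old and new patterns; the trailing `(?P<id>...)` group is an assumption added so that the fragments can be matched on their own.

```python
import re

# Simplified fragments of the old and new patterns; the id group is assumed.
OLD = r'https?://(?:www\.)?npo\.nl/(?!live|radio)(?:[^/]+/){2}(?P<id>[^/?#]+)'
NEW = r'https?://(?:www\.)?npo\.nl/(?!(?:live|radio)/)(?:[^/]+/){2}(?P<id>[^/?#]+)'

# The new only_matching test URL from the hunk.
url = 'http://www.npo.nl/radio-gaga/13-06-2017/BNN_101383373'

# 'radio-gaga' begins with 'radio', so the old lookahead rejects it.
print(re.match(OLD, url))
# The new lookahead requires a slash right after 'radio', so this matches.
print(re.match(NEW, url).group('id'))
# A genuine /radio/ path is still rejected by the new pattern.
print(re.match(NEW, 'http://www.npo.nl/radio/some/page'))
```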

View File

@ -13,11 +13,15 @@ from ..utils import (
class RUTVIE(InfoExtractor): class RUTVIE(InfoExtractor):
IE_DESC = 'RUTV.RU' IE_DESC = 'RUTV.RU'
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?://player\.(?:rutv\.ru|vgtrk\.com)/ https?://
(?P<path>flash\d+v/container\.swf\?id= (?:test)?player\.(?:rutv\.ru|vgtrk\.com)/
|iframe/(?P<type>swf|video|live)/id/ (?P<path>
|index/iframe/cast_id/) flash\d+v/container\.swf\?id=|
(?P<id>\d+)''' iframe/(?P<type>swf|video|live)/id/|
index/iframe/cast_id/
)
(?P<id>\d+)
'''
_TESTS = [ _TESTS = [
{ {
@ -99,17 +103,21 @@ class RUTVIE(InfoExtractor):
'skip_download': True, 'skip_download': True,
}, },
}, },
{
'url': 'https://testplayer.vgtrk.com/iframe/live/id/19201/showZoomBtn/false/isPlay/true/',
'only_matching': True,
},
] ]
@classmethod @classmethod
def _extract_url(cls, webpage): def _extract_url(cls, webpage):
mobj = re.search( mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>https?://player\.(?:rutv\.ru|vgtrk\.com)/(?:iframe/(?:swf|video|live)/id|index/iframe/cast_id)/.+?)\1', webpage) r'<iframe[^>]+?src=(["\'])(?P<url>https?://(?:test)?player\.(?:rutv\.ru|vgtrk\.com)/(?:iframe/(?:swf|video|live)/id|index/iframe/cast_id)/.+?)\1', webpage)
if mobj: if mobj:
return mobj.group('url') return mobj.group('url')
mobj = re.search( mobj = re.search(
r'<meta[^>]+?property=(["\'])og:video\1[^>]+?content=(["\'])(?P<url>https?://player\.(?:rutv\.ru|vgtrk\.com)/flash\d+v/container\.swf\?id=.+?\2)', r'<meta[^>]+?property=(["\'])og:video\1[^>]+?content=(["\'])(?P<url>https?://(?:test)?player\.(?:rutv\.ru|vgtrk\.com)/flash\d+v/container\.swf\?id=.+?\2)',
webpage) webpage)
if mobj: if mobj:
return mobj.group('url') return mobj.group('url')

View File

@ -10,7 +10,6 @@ from ..utils import (
ExtractorError, ExtractorError,
int_or_none, int_or_none,
NO_DEFAULT, NO_DEFAULT,
sanitized_Request,
urlencode_postdata, urlencode_postdata,
) )
@ -30,6 +29,7 @@ class XFileShareIE(InfoExtractor):
(r'vidabc\.com', 'Vid ABC'), (r'vidabc\.com', 'Vid ABC'),
(r'vidbom\.com', 'VidBom'), (r'vidbom\.com', 'VidBom'),
(r'vidlo\.us', 'vidlo'), (r'vidlo\.us', 'vidlo'),
(r'rapidvideo\.(?:cool|org)', 'RapidVideo.TV'),
) )
IE_DESC = 'XFileShare based sites: %s' % ', '.join(list(zip(*_SITES))[1]) IE_DESC = 'XFileShare based sites: %s' % ', '.join(list(zip(*_SITES))[1])
@ -109,6 +109,9 @@ class XFileShareIE(InfoExtractor):
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
}, {
'url': 'http://www.rapidvideo.cool/b667kprndr8w',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -130,12 +133,12 @@ class XFileShareIE(InfoExtractor):
if countdown: if countdown:
self._sleep(countdown, video_id) self._sleep(countdown, video_id)
post = urlencode_postdata(fields) webpage = self._download_webpage(
url, video_id, 'Downloading video page',
req = sanitized_Request(url, post) data=urlencode_postdata(fields), headers={
req.add_header('Content-type', 'application/x-www-form-urlencoded') 'Referer': url,
'Content-type': 'application/x-www-form-urlencoded',
webpage = self._download_webpage(req, video_id, 'Downloading video page') })
title = (self._search_regex( title = (self._search_regex(
(r'style="z-index: [0-9]+;">([^<]+)</span>', (r'style="z-index: [0-9]+;">([^<]+)</span>',

View File

@ -36,6 +36,7 @@ import xml.etree.ElementTree
import zlib import zlib
from .compat import ( from .compat import (
compat_HTMLParseError,
compat_HTMLParser, compat_HTMLParser,
compat_basestring, compat_basestring,
compat_chr, compat_chr,
@ -409,8 +410,12 @@ def extract_attributes(html_element):
but the cases in the unit test will work for all of 2.6, 2.7, 3.2-3.5. but the cases in the unit test will work for all of 2.6, 2.7, 3.2-3.5.
""" """
parser = HTMLAttributeParser() parser = HTMLAttributeParser()
try:
parser.feed(html_element) parser.feed(html_element)
parser.close() parser.close()
# Older Python may throw HTMLParseError in case of malformed HTML
except compat_HTMLParseError:
pass
return parser.attrs return parser.attrs
@ -1179,7 +1184,7 @@ def unified_timestamp(date_str, day_first=True):
if date_str is None: if date_str is None:
return None return None
date_str = date_str.replace(',', ' ') date_str = re.sub(r'[,|]', '', date_str)
pm_delta = 12 if re.search(r'(?i)PM', date_str) else 0 pm_delta = 12 if re.search(r'(?i)PM', date_str) else 0
timezone, date_str = extract_timezone(date_str) timezone, date_str = extract_timezone(date_str)
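
The `unified_timestamp` change strips pipe characters as well as commas before parsing, which is what lets the Newgrounds date string in the new unit test go through. The following is a minimal sketch of that idea using only the standard library; `parse_loose_timestamp` is a hypothetical helper with a single hard-coded format and is not how `unified_timestamp` itself is implemented.

```python
import calendar
import datetime
import re


def parse_loose_timestamp(date_str):  # hypothetical helper, not youtube-dl's API
    # Drop commas and pipes, as the updated re.sub in the hunk does.
    cleaned = re.sub(r'[,|]', '', date_str)
    # A single space in a strptime format matches any run of whitespace,
    # so the double space left behind by the removed pipe is harmless.
    parsed = datetime.datetime.strptime(cleaned, '%b %d %Y %I:%M %p')
    # No timezone is given, so the value is treated as UTC.
    return calendar.timegm(parsed.timetuple())


# Same input as the assertion added to the test_utils hunk.
print(parse_loose_timestamp('Sep 11, 2013 | 5:49 AM'))  # 1378878540
```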

View File

@ -1,3 +1,3 @@
from __future__ import unicode_literals from __future__ import unicode_literals
__version__ = '2017.06.05' __version__ = '2017.06.12'