commit 23edc3f8dc

.github/ISSUE_TEMPLATE.md (6 lines changed)
@@ -6,8 +6,8 @@

---

### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.04.06*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.04.06**
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.04.24*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.04.24**

### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2016.04.06
[debug] youtube-dl version 2016.04.24
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

AUTHORS (1 line changed)
@@ -167,3 +167,4 @@ Kacper Michajłow
José Joaquín Atria
Viťas Strádal
Kagami Hiiragi
Philip Huppert

@@ -140,14 +140,14 @@ After you have ensured this site is distributing it's content legally, you can f
# TODO more properties (see youtube_dl/extractor/common.py)
}
```
5. Add an import in [`youtube_dl/extractor/__init__.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/__init__.py).
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc.
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L68-L226). Add tests and code for as many as you want.
8. Keep in mind that the only mandatory fields in info dict for successful extraction process are `id`, `title` and either `url` or `formats`, i.e. these are the critical data the extraction does not make any sense without. This means that [any field](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L138-L226) apart from aforementioned mandatory ones should be treated **as optional** and extraction should be **tolerate** to situations when sources for these fields can potentially be unavailable (even if they always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields. For example, if you have some intermediate dict `meta` that is a source of metadata and it has a key `summary` that you want to extract and put into resulting info dict as `description`, you should be ready that this key may be missing from the `meta` dict, i.e. you should extract it as `meta.get('summary')` and not `meta['summary']`. Similarly, you should pass `fatal=False` when extracting data from a webpage with `_search_regex/_html_search_regex`.
9. Check the code with [flake8](https://pypi.python.org/pypi/flake8).
10. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:

$ git add youtube_dl/extractor/__init__.py
$ git add youtube_dl/extractor/extractors.py
$ git add youtube_dl/extractor/yourextractor.py
$ git commit -m '[yourextractor] Add new extractor'
$ git push origin yourextractor

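Item 8 of the guide can be illustrated with a minimal sketch. The `meta` dict, its keys and the `build_info_dict` helper are hypothetical; the point is only the difference between indexing the mandatory fields and using `dict.get()` for an optional one.

```python
def build_info_dict(meta):
    """Build an info dict from a hypothetical metadata dict `meta`."""
    # Mandatory fields: indexing raises KeyError if they are absent,
    # because the extraction makes no sense without them.
    info = {
        'id': meta['id'],
        'title': meta['title'],
        'url': meta['url'],
    }
    # Optional field: dict.get() returns None when 'summary' is missing,
    # so its absence does not break extraction of the mandatory fields.
    info['description'] = meta.get('summary')
    return info


meta = {'id': '42', 'title': 'Example', 'url': 'http://example.com/video.mp4'}
print(build_info_dict(meta)['description'])  # None
```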
README.md (16 lines changed)
@@ -176,7 +176,9 @@ which means you can modify it, redistribute it or use it however you like.
--xattr-set-filesize Set file xattribute ytdl.filesize with
expected filesize (experimental)
--hls-prefer-native Use the native HLS downloader instead of
ffmpeg (experimental)
ffmpeg
--hls-prefer-ffmpeg Use ffmpeg instead of the native HLS
downloader
--hls-use-mpegts Use the mpegts container for HLS videos,
allowing to play the video while
downloading (some players may not be able
@@ -515,6 +517,18 @@ Available for the video that is an episode of some series or programme:
- `episode_number`: Number of the video episode within a season
- `episode_id`: Id of the video episode

Available for the media that is a track or a part of a music album:
- `track`: Title of the track
- `track_number`: Number of the track within an album or a disc
- `track_id`: Id of the track
- `artist`: Artist(s) of the track
- `genre`: Genre(s) of the track
- `album`: Title of the album the track belongs to
- `album_type`: Type of the album
- `album_artist`: List of all artists appeared on the album
- `disc_number`: Number of the disc or other physical medium the track belongs to
- `release_year`: Year (YYYY) when the album was released

Each aforementioned sequence when referenced in output template will be replaced by the actual value corresponding to the sequence name. Note that some of the sequences are not guaranteed to be present since they depend on the metadata obtained by particular extractor, such sequences will be replaced with `NA`.

For example for `-o %(title)s-%(id)s.%(ext)s` and mp4 video with title `youtube-dl test video` and id `BaW_jenozKcj` this will result in a `youtube-dl test video-BaW_jenozKcj.mp4` file created in the current directory.

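The `NA` substitution described in the README hunk can be sketched with Python's `%`-formatting and a `dict` subclass whose `__missing__` hook supplies the placeholder. This is an illustrative stand-in, not youtube-dl's own template code, and it only handles `%(...)s` fields (a numeric conversion such as `%(track_number)02d` would fail on the string `NA`).

```python
class TemplateDict(dict):
    """Mapping that returns 'NA' for fields the extractor did not provide."""

    def __missing__(self, key):
        return 'NA'


def render(template, info):
    # %-formatting looks up each %(name)s field with __getitem__; on a dict
    # subclass, a missing key falls back to __missing__.
    return template % TemplateDict(info)


info = {'title': 'youtube-dl test video', 'id': 'BaW_jenozKcj', 'ext': 'mp4'}
print(render('%(title)s-%(id)s.%(ext)s', info))  # youtube-dl test video-BaW_jenozKcj.mp4
print(render('%(track)s-%(id)s', info))          # NA-BaW_jenozKcj
```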
@@ -50,6 +50,7 @@
- **arte.tv:ddc**
- **arte.tv:embed**
- **arte.tv:future**
- **arte.tv:info**
- **arte.tv:magazine**
- **AtresPlayer**
- **ATTTechChannel**
@@ -115,6 +116,7 @@
- **Cinemassacre**
- **Clipfish**
- **cliphunter**
- **ClipRs**
- **Clipsyndicate**
- **cloudtime**: CloudTime
- **Cloudy**
@@ -161,6 +163,7 @@
- **defense.gouv.fr**
- **democracynow**
- **DHM**: Filmarchiv - Deutsches Historisches Museum
- **DigitallySpeaking**
- **Digiteka**
- **Discovery**
- **Dotsub**
@@ -172,7 +175,6 @@
- **Dropbox**
- **DrTuber**
- **DRTV**
- **Dump**
- **Dumpert**
- **dvtv**: http://video.aktualne.cz/
- **dw**
@@ -286,7 +288,6 @@
- **ivi:compilation**: ivi.ru compilations
- **ivideon**: Ivideon TV
- **Izlesene**
- **JadoreCettePub**
- **JeuxVideo**
- **Jove**
- **jpopsuki.tv**
@@ -344,19 +345,22 @@
- **metacafe**
- **Metacritic**
- **Mgoon**
- **MGTV**: 芒果TV
- **Minhateca**
- **MinistryGrid**
- **Minoto**
- **miomio.tv**
- **MiTele**: mitele.es
- **mixcloud**
- **mixcloud:playlist**
- **mixcloud:stream**
- **mixcloud:user**
- **MLB**
- **Mnet**
- **MoeVideo**: LetitBit video services: moevideo.net, playreplay.net and videochart.net
- **Mofosex**
- **Mojvideo**
- **Moniker**: allmyvideos.net and vidspot.net
- **mooshare**: Mooshare.biz
- **Morningstar**: morningstar.com
- **Motherless**
- **Motorsport**: motorsport.com
@@ -393,7 +397,6 @@
- **ndr:embed:base**
- **NDTV**
- **NerdCubedFeed**
- **Nerdist**
- **netease:album**: 网易云音乐 - 专辑
- **netease:djradio**: 网易云音乐 - 电台
- **netease:mv**: 网易云音乐 - MV
@@ -411,7 +414,8 @@
- **nfl.com**
- **nhl.com**
- **nhl.com:news**: NHL news
- **nhl.com:videocenter**: NHL videocenter category
- **nhl.com:videocenter**
- **nhl.com:videocenter:category**: NHL videocenter category
- **nick.com**
- **niconico**: ニコニコ動画
- **NiconicoPlaylist**
@@ -459,13 +463,13 @@
- **Patreon**
- **pbs**: Public Broadcasting Service (PBS) and member stations: PBS: Public Broadcasting Service, APT - Alabama Public Television (WBIQ), GPB/Georgia Public Broadcasting (WGTV), Mississippi Public Broadcasting (WMPN), Nashville Public Television (WNPT), WFSU-TV (WFSU), WSRE (WSRE), WTCI (WTCI), WPBA/Channel 30 (WPBA), Alaska Public Media (KAKM), Arizona PBS (KAET), KNME-TV/Channel 5 (KNME), Vegas PBS (KLVX), AETN/ARKANSAS ETV NETWORK (KETS), KET (WKLE), WKNO/Channel 10 (WKNO), LPB/LOUISIANA PUBLIC BROADCASTING (WLPB), OETA (KETA), Ozarks Public Television (KOZK), WSIU Public Broadcasting (WSIU), KEET TV (KEET), KIXE/Channel 9 (KIXE), KPBS San Diego (KPBS), KQED (KQED), KVIE Public Television (KVIE), PBS SoCal/KOCE (KOCE), ValleyPBS (KVPT), CONNECTICUT PUBLIC TELEVISION (WEDH), KNPB Channel 5 (KNPB), SOPTV (KSYS), Rocky Mountain PBS (KRMA), KENW-TV3 (KENW), KUED Channel 7 (KUED), Wyoming PBS (KCWC), Colorado Public Television / KBDI 12 (KBDI), KBYU-TV (KBYU), Thirteen/WNET New York (WNET), WGBH/Channel 2 (WGBH), WGBY (WGBY), NJTV Public Media NJ (WNJT), WLIW21 (WLIW), mpt/Maryland Public Television (WMPB), WETA Television and Radio (WETA), WHYY (WHYY), PBS 39 (WLVT), WVPT - Your Source for PBS and More! (WVPT), Howard University Television (WHUT), WEDU PBS (WEDU), WGCU Public Media (WGCU), WPBT2 (WPBT), WUCF TV (WUCF), WUFT/Channel 5 (WUFT), WXEL/Channel 42 (WXEL), WLRN/Channel 17 (WLRN), WUSF Public Broadcasting (WUSF), ETV (WRLK), UNC-TV (WUNC), PBS Hawaii - Oceanic Cable Channel 10 (KHET), Idaho Public Television (KAID), KSPS (KSPS), OPB (KOPB), KWSU/Channel 10 & KTNW/Channel 31 (KWSU), WILL-TV (WILL), Network Knowledge - WSEC/Springfield (WSEC), WTTW11 (WTTW), Iowa Public Television/IPTV (KDIN), Nine Network (KETC), PBS39 Fort Wayne (WFWA), WFYI Indianapolis (WFYI), Milwaukee Public Television (WMVS), WNIN (WNIN), WNIT Public Television (WNIT), WPT (WPNE), WVUT/Channel 22 (WVUT), WEIU/Channel 51 (WEIU), WQPT-TV (WQPT), WYCC PBS Chicago (WYCC), WIPB-TV (WIPB), WTIU (WTIU), CET (WCET), ThinkTVNetwork (WPTD), WBGU-TV (WBGU), WGVU TV (WGVU), NET1 (KUON), Pioneer Public Television (KWCM), SDPB Television (KUSD), TPT (KTCA), KSMQ (KSMQ), KPTS/Channel 8 (KPTS), KTWU/Channel 11 (KTWU), East Tennessee PBS (WSJK), WCTE-TV (WCTE), WLJT, Channel 11 (WLJT), WOSU TV (WOSU), WOUB/WOUC (WOUB), WVPB (WVPB), WKYU-PBS (WKYU), KERA 13 (KERA), MPBN (WCBB), Mountain Lake PBS (WCFE), NHPTV (WENH), Vermont PBS (WETK), witf (WITF), WQED Multimedia (WQED), WMHT Educational Telecommunications (WMHT), Q-TV (WDCQ), WTVS Detroit Public TV (WTVS), CMU Public Television (WCMU), WKAR-TV (WKAR), WNMU-TV Public TV 13 (WNMU), WDSE - WRPT (WDSE), WGTE TV (WGTE), Lakeland Public Television (KAWE), KMOS-TV - Channels 6.1, 6.2 and 6.3 (KMOS), MontanaPBS (KUSM), KRWG/Channel 22 (KRWG), KACV (KACV), KCOS/Channel 13 (KCOS), WCNY/Channel 24 (WCNY), WNED (WNED), WPBS (WPBS), WSKG Public TV (WSKG), WXXI (WXXI), WPSU (WPSU), WVIA Public Media Studios (WVIA), WTVI (WTVI), Western Reserve PBS (WNEO), WVIZ/PBS ideastream (WVIZ), KCTS 9 (KCTS), Basin PBS (KPBT), KUHT / Channel 8 (KUHT), KLRN (KLRN), KLRU (KLRU), WTJX Channel 12 (WTJX), WCVE PBS (WCVE), KBTC Public Television (KBTC)
- **pcmag**
- **People**
- **Periscope**: Periscope
- **PhilharmonieDeParis**: Philharmonie de Paris
- **phoenix.de**
- **Photobucket**
- **Pinkbike**
- **Pladform**
- **PlanetaPlay**
- **play.fm**
- **played.to**
- **PlaysTV**
@@ -484,6 +488,7 @@
- **Pornotube**
- **PornoVoisines**
- **PornoXO**
- **PressTV**
- **PrimeShareTV**
- **PromptFile**
- **prosiebensat1**: ProSiebenSat.1 Digital
@@ -494,7 +499,6 @@
- **qqmusic:playlist**: QQ音乐 - 歌单
- **qqmusic:singer**: QQ音乐 - 歌手
- **qqmusic:toplist**: QQ音乐 - 排行榜
- **QuickVid**
- **R7**
- **radio.de**
- **radiobremen**
@@ -608,6 +612,7 @@
- **Tagesschau**
- **Tapely**
- **Tass**
- **TDSLifeway**
- **teachertube**: teachertube.com videos
- **teachertube:user:collection**: teachertube.com user and collection videos
- **TeachingChannel**
@@ -624,7 +629,6 @@
- **TeleTask**
- **TF1**
- **TheIntercept**
- **TheOnion**
- **ThePlatform**
- **ThePlatformFeed**
- **TheScene**
@@ -683,7 +687,6 @@
- **twitter**
- **twitter:amplify**
- **twitter:card**
- **Ubu**
- **udemy**
- **udemy:course**
- **UDNEmbed**: 聯合影音
@@ -753,7 +756,6 @@
- **Walla**
- **WashingtonPost**
- **wat.tv**
- **WayOfTheMaster**
- **WDR**
- **wdr:mobile**
- **WDRMaus**: Sendung mit der Maus

@@ -413,6 +413,7 @@ class TestUtil(unittest.TestCase):
self.assertEqual(parse_duration('01:02:03:04'), 93784)
self.assertEqual(parse_duration('1 hour 3 minutes'), 3780)
self.assertEqual(parse_duration('87 Min.'), 5220)
self.assertEqual(parse_duration('PT1H0.040S'), 3600.04)

def test_fix_xml_ampersands(self):
self.assertEqual(

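The new `parse_duration` test case uses an ISO 8601 duration with fractional seconds. A minimal sketch that handles only the `PT#H#M#S` subset; it is not youtube-dl's `parse_duration`, which accepts many more formats (see the other assertions in the hunk).

```python
import re

# Only the time part of an ISO 8601 duration: PT[nH][nM][nS], n may be fractional.
_ISO_TIME_DURATION = re.compile(
    r'^PT(?:(?P<hours>\d+(?:\.\d+)?)H)?'
    r'(?:(?P<minutes>\d+(?:\.\d+)?)M)?'
    r'(?:(?P<seconds>\d+(?:\.\d+)?)S)?$')


def parse_iso_time_duration(s):
    """Return the duration in seconds as a float, or None if `s` does not match."""
    m = _ISO_TIME_DURATION.match(s)
    # A bare 'PT' matches the regex but carries no components.
    if not m or not any(m.groups()):
        return None
    hours, minutes, seconds = (float(g) if g else 0.0 for g in m.groups())
    return hours * 3600 + minutes * 60 + seconds
```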
@@ -44,7 +44,7 @@ class TestYoutubeLists(unittest.TestCase):
ie = YoutubePlaylistIE(dl)
result = ie.extract('https://www.youtube.com/watch?v=W01L70IGBgE&index=2&list=RDOQpdSVF_k_w')
entries = result['entries']
self.assertTrue(len(entries) >= 20)
self.assertTrue(len(entries) >= 50)
original_video = entries[0]
self.assertEqual(original_video['id'], 'OQpdSVF_k_w')

@@ -260,7 +260,9 @@ class YoutubeDL(object):
The following options determine which downloader is picked:
external_downloader: Executable of the external downloader to call.
None or unset for standard (built-in) downloader.
hls_prefer_native: Use the native HLS downloader instead of ffmpeg/avconv.
hls_prefer_native: Use the native HLS downloader instead of ffmpeg/avconv
if True, otherwise use ffmpeg/avconv if False, otherwise
use downloader suggested by extractor if None.

The following parameters are not used by YoutubeDL itself, they are used by
the downloader (see youtube_dl/downloader/common.py):

@@ -41,9 +41,12 @@ def get_suitable_downloader(info_dict, params={}):
if ed.can_download(info_dict):
return ed

if protocol == 'm3u8' and params.get('hls_prefer_native'):
if protocol == 'm3u8' and params.get('hls_prefer_native') is True:
return HlsFD

if protocol == 'm3u8_native' and params.get('hls_prefer_native') is False:
return FFmpegFD

return PROTOCOL_MAP.get(protocol, HttpFD)

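The `hls_prefer_native` change makes the option tri-state, as the updated `YoutubeDL` docstring describes. A self-contained sketch of the selection logic, assuming plain strings as stand-ins for the `HlsFD`, `FFmpegFD` and `HttpFD` classes and a reduced, hypothetical `PROTOCOL_MAP`; the external-downloader branch is left out.

```python
# Simplified stand-in for PROTOCOL_MAP: strings instead of downloader classes.
PROTOCOL_MAP = {
    'm3u8': 'FFmpegFD',
    'm3u8_native': 'HlsFD',
    'rtmp': 'RtmpFD',
}


def pick_downloader(protocol, params):
    """Tri-state hls_prefer_native check (sketch)."""
    prefer_native = params.get('hls_prefer_native')
    # True: use the native HLS downloader even if the extractor suggested ffmpeg.
    if protocol == 'm3u8' and prefer_native is True:
        return 'HlsFD'
    # False: use ffmpeg even if the extractor suggested the native downloader.
    if protocol == 'm3u8_native' and prefer_native is False:
        return 'FFmpegFD'
    # None (unset): keep whatever the extractor's protocol maps to.
    return PROTOCOL_MAP.get(protocol, 'HttpFD')
```

Comparing with `is True` / `is False` instead of relying on truthiness is what keeps an unset option (`None`) distinct from an explicit `False`; only the explicit value overrides the protocol suggested by the extractor.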
@@ -225,7 +225,7 @@ class FFmpegFD(ExternalFD):

args += ['-i', url, '-c', 'copy']
if protocol == 'm3u8':
if self.params.get('hls_use_mpegts', False):
if self.params.get('hls_use_mpegts', False) or tmpfilename == '-':
args += ['-f', 'mpegts']
else:
args += ['-f', 'mp4', '-bsf:a', 'aac_adtstoasc']

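The `FFmpegFD` change also falls back to MPEG-TS when ffmpeg writes to standard output (`tmpfilename == '-'`). A sketch of the argument building with a hypothetical helper `hls_container_args`; the likely rationale is that ffmpeg's mp4 muxer needs a seekable output to finalize the file, which a pipe does not provide, whereas MPEG-TS can be streamed.

```python
def hls_container_args(url, tmpfilename, params):
    """Hypothetical helper: ffmpeg arguments for copying an HLS stream."""
    args = ['-i', url, '-c', 'copy']
    # MPEG-TS can be written to a pipe; the mp4 muxer wants a seekable file.
    if params.get('hls_use_mpegts', False) or tmpfilename == '-':
        args += ['-f', 'mpegts']
    else:
        # aac_adtstoasc converts the ADTS-framed AAC found in TS segments
        # into the form the mp4 muxer expects.
        args += ['-f', 'mp4', '-bsf:a', 'aac_adtstoasc']
    # The output target ('-' means stdout) goes last.
    args.append(tmpfilename)
    return args
```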
@@ -27,6 +27,8 @@ class RtspFD(FileDownloader):
self.report_error('MMS or RTSP download detected but neither "mplayer" nor "mpv" could be run. Please install any.')
return False

self._debug_cmd(args)

retval = subprocess.call(args)
if retval == 0:
fsize = os.path.getsize(encodeFilename(tmpfilename))

@@ -12,9 +12,10 @@ from ..utils import (

class AolIE(InfoExtractor):
IE_NAME = 'on.aol.com'
_VALID_URL = r'(?:aol-video:|https?://on\.aol\.com/video/.*-)(?P<id>[^/?-]+)'
_VALID_URL = r'(?:aol-video:|https?://on\.aol\.com/.*-)(?P<id>[^/?-]+)'

_TESTS = [{
# video with 5min ID
'url': 'http://on.aol.com/video/u-s--official-warns-of-largest-ever-irs-phone-scam-518167793?icid=OnHomepageC2Wide_MustSee_Img',
'md5': '18ef68f48740e86ae94b98da815eec42',
'info_dict': {
@@ -31,6 +32,7 @@ class AolIE(InfoExtractor):
'skip_download': True,
}
}, {
# video with vidible ID
'url': 'http://on.aol.com/video/netflix-is-raising-rates-5707d6b8e4b090497b04f706?context=PC:homepage:PL1944:1460189336183',
'info_dict': {
'id': '5707d6b8e4b090497b04f706',
@@ -45,6 +47,12 @@ class AolIE(InfoExtractor):
# m3u8 download
'skip_download': True,
}
}, {
'url': 'http://on.aol.com/partners/abc-551438d309eab105804dbfe8/sneak-peek-was-haley-really-framed-570eaebee4b0448640a5c944',
'only_matching': True,
}, {
'url': 'http://on.aol.com/shows/park-bench-shw518173474-559a1b9be4b0c3bfad3357a7?context=SH:SHW518173474:PL4327:1460619712763',
'only_matching': True,
}]

def _real_extract(self, url):

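The relaxed `AolIE._VALID_URL` no longer requires a `/video/` path segment, which is what lets the new `partners/` and `shows/` test URLs match. A quick check with the standard `re` module; `match_id` is a hypothetical stand-in for `InfoExtractor._match_id`.

```python
import re

OLD_VALID_URL = r'(?:aol-video:|https?://on\.aol\.com/video/.*-)(?P<id>[^/?-]+)'
NEW_VALID_URL = r'(?:aol-video:|https?://on\.aol\.com/.*-)(?P<id>[^/?-]+)'

PARTNER_URL = ('http://on.aol.com/partners/abc-551438d309eab105804dbfe8/'
               'sneak-peek-was-haley-really-framed-570eaebee4b0448640a5c944')
SHOW_URL = ('http://on.aol.com/shows/park-bench-shw518173474-559a1b9be4b0c3bfad3357a7'
            '?context=SH:SHW518173474:PL4327:1460619712763')


def match_id(pattern, url):
    """Return the 'id' group, or None if the URL does not match (hypothetical helper)."""
    m = re.match(pattern, url)
    return m.group('id') if m else None
```

Both URLs are rejected by the old pattern. With the new one, the greedy `.*-` backtracks to the last hyphen that is followed by an ID character, so the captured ID is the trailing hex string, stopping before any `?` query.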
@@ -83,7 +83,7 @@ class ARDMediathekIE(InfoExtractor):
subtitle_url = media_info.get('_subtitleUrl')
if subtitle_url:
subtitles['de'] = [{
'ext': 'srt',
'ext': 'ttml',
'url': subtitle_url,
}]

@@ -210,7 +210,7 @@ class ArteTVPlus7IE(InfoExtractor):
# It also uses the arte_vp_url url from the webpage to extract the information
class ArteTVCreativeIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:creative'
_VALID_URL = r'https?://creative\.arte\.tv/(?P<lang>fr|de|en|es)/(?:magazine?/)?(?P<id>[^/?#&]+)'
_VALID_URL = r'https?://creative\.arte\.tv/(?P<lang>fr|de|en|es)/(?:[^/]+/)*(?P<id>[^/?#&]+)'

_TESTS = [{
'url': 'http://creative.arte.tv/de/magazin/agentur-amateur-corporate-design',
@@ -229,9 +229,27 @@ class ArteTVCreativeIE(ArteTVPlus7IE):
'description': 'Événement ! Quarante-cinq ans après leurs premiers succès, les légendaires Monty Python remontent sur scène.\n',
'upload_date': '20140805',
}
}, {
'url': 'http://creative.arte.tv/de/episode/agentur-amateur-4-der-erste-kunde',
'only_matching': True,
}]


class ArteTVInfoIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:info'
_VALID_URL = r'https?://info\.arte\.tv/(?P<lang>fr|de|en|es)/(?:[^/]+/)*(?P<id>[^/?#&]+)'

_TEST = {
'url': 'http://info.arte.tv/fr/service-civique-un-cache-misere',
'info_dict': {
'id': '067528-000-A',
'ext': 'mp4',
'title': 'Service civique, un cache misère ?',
'upload_date': '20160403',
},
}


class ArteTVFutureIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:future'
_VALID_URL = r'https?://future\.arte\.tv/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)'
@@ -337,7 +355,7 @@ class ArteTVEmbedIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:embed'
_VALID_URL = r'''(?x)
http://www\.arte\.tv
/playerv2/embed\.php\?json_url=
/(?:playerv2/embed|arte_vp/index)\.php\?json_url=
(?P<json_url>
http://arte\.tv/papi/tvguide/videos/stream/player/
(?P<lang>[^/]+)/(?P<id>[^/]+)[^&]*

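The old `ArteTVCreativeIE` pattern only allowed an optional `magazin/` or `magazine/` segment before the ID, so on the new `episode/` test URL `re.match` still succeeded but captured `episode` as the ID. The new `(?:[^/]+/)*` skips any number of leading path segments. A check with the standard `re` module (`match_id` is a hypothetical helper):

```python
import re

OLD_VALID_URL = r'https?://creative\.arte\.tv/(?P<lang>fr|de|en|es)/(?:magazine?/)?(?P<id>[^/?#&]+)'
NEW_VALID_URL = r'https?://creative\.arte\.tv/(?P<lang>fr|de|en|es)/(?:[^/]+/)*(?P<id>[^/?#&]+)'

MAGAZIN_URL = 'http://creative.arte.tv/de/magazin/agentur-amateur-corporate-design'
EPISODE_URL = 'http://creative.arte.tv/de/episode/agentur-amateur-4-der-erste-kunde'


def match_id(pattern, url):
    """Return the 'id' group, or None if the URL does not match (hypothetical helper)."""
    m = re.match(pattern, url)
    return m.group('id') if m else None
```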
@@ -30,14 +30,14 @@ class AudiomackIE(InfoExtractor):
# audiomack wrapper around soundcloud song
{
'add_ie': ['Soundcloud'],
'url': 'http://www.audiomack.com/song/xclusiveszone/take-kare',
'url': 'http://www.audiomack.com/song/hip-hop-daily/black-mamba-freestyle',
'info_dict': {
'id': '172419696',
'id': '258901379',
'ext': 'mp3',
'description': 'md5:1fc3272ed7a635cce5be1568c2822997',
'title': 'Young Thug ft Lil Wayne - Take Kare',
'uploader': 'Young Thug World',
'upload_date': '20141016',
'description': 'mamba day freestyle for the legend Kobe Bryant ',
'title': 'Black Mamba Freestyle [Prod. By Danny Wolf]',
'uploader': 'ILOVEMAKONNEN',
'upload_date': '20160414',
}
},
]

@@ -671,6 +671,7 @@ class BBCIE(BBCCoUkIE):
'info_dict': {
'id': '34475836',
'title': 'Jurgen Klopp: Furious football from a witty and winning coach',
'description': 'Fast-paced football, wit, wisdom and a ready smile - why Liverpool fans should come to love new boss Jurgen Klopp.',
},
'playlist_count': 3,
}, {

@@ -340,7 +340,7 @@ class BrightcoveLegacyIE(InfoExtractor):
ext = 'flv'
if ext is None:
ext = determine_ext(url)
tbr = int_or_none(rend.get('encodingRate'), 1000),
tbr = int_or_none(rend.get('encodingRate'), 1000)
a_format = {
'format_id': 'http%s' % ('-%s' % tbr if tbr else ''),
'url': url,

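The `BrightcoveLegacyIE` fix removes a trailing comma that turned `tbr` into a one-element tuple. A sketch of the effect, with a simplified, hypothetical stand-in for `int_or_none`: a non-empty tuple is always truthy and is unpacked by `%`, so a missing `encodingRate` produced the format ID `http-None`, and `tbr` itself was stored as a tuple rather than a number.

```python
def int_or_none(v, scale=1):
    """Simplified stand-in for youtube_dl.utils.int_or_none."""
    return None if v is None else int(v) // scale


def make_format(rend, with_trailing_comma=False):
    """Hypothetical helper reproducing the tbr/format_id lines of the hunk."""
    if with_trailing_comma:
        # The stray comma turns the right-hand side into a one-element tuple.
        tbr = int_or_none(rend.get('encodingRate'), 1000),
    else:
        tbr = int_or_none(rend.get('encodingRate'), 1000)
    return {'tbr': tbr, 'format_id': 'http%s' % ('-%s' % tbr if tbr else '')}
```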
@@ -33,6 +33,7 @@ class CBCIE(InfoExtractor):
'title': 'Robin Williams freestyles on 90 Minutes Live',
'description': 'Wacky American comedian Robin Williams shows off his infamous "freestyle" comedic talents while being interviewed on CBC\'s 90 Minutes Live.',
'upload_date': '19700101',
'uploader': 'CBCC-NEW',
},
'params': {
# rtmp download

@@ -5,7 +5,6 @@ from ..utils import (
xpath_text,
xpath_element,
int_or_none,
ExtractorError,
find_xpath_attr,
)

@@ -64,7 +63,7 @@ class CBSIE(CBSBaseIE):
'url': 'http://www.colbertlateshow.com/podcasts/dYSwjqPs_X1tvbV_P2FcPWRa_qT6akTC/in-the-bad-room-with-stephen/',
'only_matching': True,
}]
TP_RELEASE_URL_TEMPLATE = 'http://link.theplatform.com/s/dJ5BDC/%s?manifest=m3u&mbr=true'
TP_RELEASE_URL_TEMPLATE = 'http://link.theplatform.com/s/dJ5BDC/%s?mbr=true'

def _real_extract(self, url):
display_id = self._match_id(url)
@@ -84,11 +83,11 @@ class CBSIE(CBSBaseIE):
pid = xpath_text(item, 'pid')
if not pid:
continue
try:
tp_release_url = self.TP_RELEASE_URL_TEMPLATE % pid
if '.m3u8' in xpath_text(item, 'contentUrl', default=''):
tp_release_url += '&manifest=m3u'
tp_formats, tp_subtitles = self._extract_theplatform_smil(
self.TP_RELEASE_URL_TEMPLATE % pid, content_id, 'Downloading %s SMIL data' % pid)
except ExtractorError:
continue
tp_release_url, content_id, 'Downloading %s SMIL data' % pid)
formats.extend(tp_formats)
subtitles = self._merge_subtitles(subtitles, tp_subtitles)
self._sort_formats(formats)

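The `CBSIE` change stops requesting `manifest=m3u` for every release and appends it only when the item's `contentUrl` points at an `.m3u8` stream. A sketch with a hypothetical `tp_release_url` helper; `content_url` stands for the text of the `contentUrl` element, or `None` when it is absent.

```python
TP_RELEASE_URL_TEMPLATE = 'http://link.theplatform.com/s/dJ5BDC/%s?mbr=true'


def tp_release_url(pid, content_url):
    """Hypothetical helper: thePlatform release URL for one item."""
    url = TP_RELEASE_URL_TEMPLATE % pid
    # Ask for an m3u manifest only for items that are served over HLS.
    if '.m3u8' in (content_url or ''):
        url += '&manifest=m3u'
    return url
```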
@@ -382,7 +382,7 @@ class InfoExtractor(object):
else:
if query:
url_or_request = update_url_query(url_or_request, query)
if data or headers:
if data is not None or headers:
url_or_request = sanitized_Request(url_or_request, data, headers)
try:
return self._downloader.urlopen(url_or_request)

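The `InfoExtractor` change matters when an extractor posts an empty body: `b''` is falsy, so `if data or headers` skipped building a request object and the URL was fetched with a plain GET. A sketch using the standard library's `urllib.request.Request` in place of youtube-dl's `sanitized_Request`; the `prepare` helper is hypothetical.

```python
from urllib.request import Request


def prepare(url, data=None, headers=None):
    """Wrap url in a Request only when a body or headers must be sent (sketch)."""
    headers = headers or {}
    # `if data or headers` would treat an empty body (b'') like no body at
    # all. Testing `is not None` keeps the explicitly empty body, and urllib
    # sends a request that has a body (even an empty one) as a POST.
    if data is not None or headers:
        return Request(url, data, headers)
    return url
```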
youtube_dl/extractor/dispeak.py (new file, 114 lines)
@@ -0,0 +1,114 @@
from __future__ import unicode_literals

import re

from .common import InfoExtractor
from ..utils import (
    int_or_none,
    parse_duration,
    remove_end,
    xpath_element,
    xpath_text,
)


class DigitallySpeakingIE(InfoExtractor):
    _VALID_URL = r'https?://(?:evt\.dispeak|events\.digitallyspeaking)\.com/(?:[^/]+/)+xml/(?P<id>[^.]+)\.xml'

    _TESTS = [{
        # From http://gdcvault.com/play/1023460/Tenacious-Design-and-The-Interface
        'url': 'http://evt.dispeak.com/ubm/gdc/sf16/xml/840376_BQRC.xml',
        'md5': 'a8efb6c31ed06ca8739294960b2dbabd',
        'info_dict': {
            'id': '840376_BQRC',
            'ext': 'mp4',
            'title': 'Tenacious Design and The Interface of \'Destiny\'',
        },
    }, {
        # From http://www.gdcvault.com/play/1014631/Classic-Game-Postmortem-PAC
        'url': 'http://events.digitallyspeaking.com/gdc/sf11/xml/12396_1299111843500GMPX.xml',
        'only_matching': True,
    }]

    def _parse_mp4(self, metadata):
        video_formats = []
        video_root = None

        mp4_video = xpath_text(metadata, './mp4video', default=None)
        if mp4_video is not None:
            mobj = re.match(r'(?P<root>https?://.*?/).*', mp4_video)
            video_root = mobj.group('root')
        if video_root is None:
            http_host = xpath_text(metadata, 'httpHost', default=None)
            if http_host:
                video_root = 'http://%s/' % http_host
        if video_root is None:
            # Hard-coded in http://evt.dispeak.com/ubm/gdc/sf16/custom/player2.js
            # Works for GPUTechConf, too
            video_root = 'http://s3-2u.digitallyspeaking.com/'

        formats = metadata.findall('./MBRVideos/MBRVideo')
        if not formats:
            return None
        for a_format in formats:
            stream_name = xpath_text(a_format, 'streamName', fatal=True)
            video_path = re.match(r'mp4\:(?P<path>.*)', stream_name).group('path')
            url = video_root + video_path
            vbr = xpath_text(a_format, 'bitrate')
            video_formats.append({
                'url': url,
                'vbr': int_or_none(vbr),
            })
        return video_formats

    def _parse_flv(self, metadata):
        formats = []
        akamai_url = xpath_text(metadata, './akamaiHost', fatal=True)
        audios = metadata.findall('./audios/audio')
        for audio in audios:
            formats.append({
                'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
                'play_path': remove_end(audio.get('url'), '.flv'),
                'ext': 'flv',
                'vcodec': 'none',
                'format_id': audio.get('code'),
            })
        slide_video_path = xpath_text(metadata, './slideVideo', fatal=True)
        formats.append({
            'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
            'play_path': remove_end(slide_video_path, '.flv'),
            'ext': 'flv',
            'format_note': 'slide deck video',
            'quality': -2,
            'preference': -2,
            'format_id': 'slides',
        })
        speaker_video_path = xpath_text(metadata, './speakerVideo', fatal=True)
        formats.append({
            'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
            'play_path': remove_end(speaker_video_path, '.flv'),
            'ext': 'flv',
            'format_note': 'speaker video',
            'quality': -1,
            'preference': -1,
            'format_id': 'speaker',
        })
        return formats

    def _real_extract(self, url):
        video_id = self._match_id(url)

        xml_description = self._download_xml(url, video_id)
        metadata = xpath_element(xml_description, 'metadata')

        video_formats = self._parse_mp4(metadata)
        if video_formats is None:
            video_formats = self._parse_flv(metadata)

        return {
            'id': video_id,
            'formats': video_formats,
            'title': xpath_text(metadata, 'title', fatal=True),
            'duration': parse_duration(xpath_text(metadata, 'endTime')),
            'creator': xpath_text(metadata, 'speaker'),
        }

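`_parse_mp4` in the new extractor picks a video root in three steps: the scheme and host of `<mp4video>`, then `<httpHost>`, then a hard-coded host. A standalone sketch of that fallback chain with hypothetical function names; unlike the diffed code, it checks that the regex matched before calling `.group()`, so a malformed `<mp4video>` value falls through to the next step instead of raising `AttributeError`.

```python
import re

# Fallback host hard-coded in the new extractor.
DEFAULT_VIDEO_ROOT = 'http://s3-2u.digitallyspeaking.com/'


def pick_video_root(mp4_video=None, http_host=None):
    """Hypothetical helper reproducing the three-step fallback of _parse_mp4."""
    # 1. Scheme and host (with trailing slash) of the <mp4video> URL.
    if mp4_video is not None:
        mobj = re.match(r'(?P<root>https?://.*?/).*', mp4_video)
        if mobj:
            return mobj.group('root')
    # 2. A root built from <httpHost>.
    if http_host:
        return 'http://%s/' % http_host
    # 3. The hard-coded default.
    return DEFAULT_VIDEO_ROOT


def mp4_url(video_root, stream_name):
    """Turn a streamName such as 'mp4:path/to/file.mp4' into a full URL."""
    return video_root + re.match(r'mp4:(?P<path>.*)', stream_name).group('path')
```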
@@ -18,7 +18,7 @@ class DouyuTVIE(InfoExtractor):
'display_id': 'iseven',
'ext': 'flv',
'title': 're:^清晨醒脑!T-ara根本停不下来! [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'description': 'md5:f34981259a03e980a3c6404190a3ed61',
'description': 're:.*m7show@163\.com.*',
'thumbnail': 're:^https?://.*\.jpg$',
'uploader': '7师傅',
'uploader_id': '431925',
@@ -43,7 +43,7 @@ class DouyuTVIE(InfoExtractor):
'params': {
'skip_download': True,
},
'skip': 'Romm not found',
'skip': 'Room not found',
}, {
'url': 'http://www.douyutv.com/17732',
'info_dict': {
@@ -51,7 +51,7 @@ class DouyuTVIE(InfoExtractor):
'display_id': '17732',
'ext': 'flv',
'title': 're:^清晨醒脑!T-ara根本停不下来! [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'description': 'md5:f34981259a03e980a3c6404190a3ed61',
'description': 're:.*m7show@163\.com.*',
'thumbnail': 're:^https?://.*\.jpg$',
'uploader': '7师傅',
'uploader_id': '431925',
@@ -75,13 +75,28 @@ class DouyuTVIE(InfoExtractor):
room_id = self._html_search_regex(
r'"room_id"\s*:\s*(\d+),', page, 'room id')

config = None
# Douyu API sometimes returns error "Unable to load the requested class: eticket_redis_cache"
# Retry with different parameters - same parameters cause same errors
for i in range(5):
prefix = 'room/%s?aid=android&client_sys=android&time=%d' % (
room_id, int(time.time()))

auth = hashlib.md5((prefix + '1231').encode('ascii')).hexdigest()
config = self._download_json(

config_page = self._download_webpage(
'http://www.douyutv.com/api/v1/%s&auth=%s' % (prefix, auth),
video_id)
try:
config = self._parse_json(config_page, video_id, fatal=False)
except ExtractorError:
# Wait some time before retrying to get a different time() value
self._sleep(1, video_id, msg_template='%(video_id)s: Error occurs. '
'Waiting for %(timeout)s seconds before retrying')
continue
else:
break
if config is None:
raise ExtractorError('Unable to fetch API result')

data = config['data']

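The DouyuTV change signs each API request with an MD5 digest of the request path plus a fixed suffix and retries up to five times, sleeping between attempts so that the next request carries a new timestamp. A sketch of that retry-with-fresh-signature pattern; the URL layout and the `'1231'` suffix are copied from the hunk, while `signed_api_url`, `fetch_config` and the injected `download`/`sleep` callables are hypothetical. In this sketch, a response that is not valid JSON is what triggers the retry.

```python
import hashlib
import json
import time


def signed_api_url(room_id, now=None):
    """Build a signed API URL; layout and '1231' suffix copied from the hunk."""
    if now is None:
        now = time.time()
    prefix = 'room/%s?aid=android&client_sys=android&time=%d' % (room_id, int(now))
    # The signature is the MD5 hex digest of the path plus a fixed suffix.
    auth = hashlib.md5((prefix + '1231').encode('ascii')).hexdigest()
    return 'http://www.douyutv.com/api/v1/%s&auth=%s' % (prefix, auth)


def fetch_config(download, room_id, attempts=5, delay=1, sleep=time.sleep):
    """Retry until the API returns valid JSON (hypothetical helper).

    `download` takes a URL and returns the response body as text.
    """
    for _ in range(attempts):
        page = download(signed_api_url(room_id))
        try:
            return json.loads(page)
        except ValueError:
            # Wait so that the next URL carries a different timestamp and signature.
            sleep(delay)
    raise RuntimeError('Unable to fetch API result')
```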
@@ -6,13 +6,18 @@ import re
import time

from .common import InfoExtractor
from ..utils import int_or_none
from ..compat import compat_urlparse
from ..utils import (
int_or_none,
update_url_query,
)


class DPlayIE(InfoExtractor):
_VALID_URL = r'https?://(?P<domain>it\.dplay\.com|www\.dplay\.(?:dk|se|no))/[^/]+/(?P<id>[^/?#]+)'

_TESTS = [{
# geo restricted, via direct unsigned hls URL
'url': 'http://it.dplay.com/take-me-out/stagione-1-episodio-25/',
'info_dict': {
'id': '1255600',
@@ -31,11 +36,12 @@ class DPlayIE(InfoExtractor):
},
'expected_warnings': ['Unable to download f4m manifest'],
}, {
# non geo restricted, via secure api, unsigned download hls URL
'url': 'http://www.dplay.se/nugammalt-77-handelser-som-format-sverige/season-1-svensken-lar-sig-njuta-av-livet/',
'info_dict': {
'id': '3172',
'display_id': 'season-1-svensken-lar-sig-njuta-av-livet',
'ext': 'flv',
'ext': 'mp4',
'title': 'Svensken lär sig njuta av livet',
'description': 'md5:d3819c9bccffd0fe458ca42451dd50d8',
'duration': 2650,
@@ -48,23 +54,25 @@ class DPlayIE(InfoExtractor):
'age_limit': 0,
},
}, {
# geo restricted, via secure api, unsigned download hls URL
'url': 'http://www.dplay.dk/mig-og-min-mor/season-6-episode-12/',
'info_dict': {
'id': '70816',
'display_id': 'season-6-episode-12',
'ext': 'flv',
'ext': 'mp4',
'title': 'Episode 12',
'description': 'md5:9c86e51a93f8a4401fc9641ef9894c90',
'duration': 2563,
'timestamp': 1429696800,
'upload_date': '20150422',
'creator': 'Kanal 4',
'creator': 'Kanal 4 (Home)',
'series': 'Mig og min mor',
'season_number': 6,
'episode_number': 12,
'age_limit': 0,
},
}, {
# geo restricted, via direct unsigned hls URL
'url': 'http://www.dplay.no/pga-tour/season-1-hoydepunkter-18-21-februar/',
'only_matching': True,
}]
@ -90,17 +98,24 @@ class DPlayIE(InfoExtractor):
def extract_formats(protocol, manifest_url):
if protocol == 'hls':
formats.extend(self._extract_m3u8_formats(
m3u8_formats = self._extract_m3u8_formats(
manifest_url, video_id, ext='mp4',
entry_protocol='m3u8_native', m3u8_id=protocol, fatal=False))
entry_protocol='m3u8_native', m3u8_id=protocol, fatal=False)
# Sometimes final URLs inside m3u8 are unsigned, let's fix this
# ourselves
query = compat_urlparse.parse_qs(compat_urlparse.urlparse(manifest_url).query)
for m3u8_format in m3u8_formats:
m3u8_format['url'] = update_url_query(m3u8_format['url'], query)
formats.extend(m3u8_formats)
elif protocol == 'hds':
formats.extend(self._extract_f4m_formats(
manifest_url + '&hdcore=3.8.0&plugin=flowplayer-3.8.0.0',
video_id, f4m_id=protocol, fatal=False))

domain_tld = domain.split('.')[-1]
if domain_tld in ('se', 'dk'):
if domain_tld in ('se', 'dk', 'no'):
for protocol in PROTOCOLS:
# Providing dsc-geo allows to bypass geo restriction in some cases
self._set_cookie(
'secure.dplay.%s' % domain_tld, 'dsc-geo',
json.dumps({
@ -113,13 +128,24 @@ class DPlayIE(InfoExtractor):
'Downloading %s stream JSON' % protocol, fatal=False)
if stream and stream.get(protocol):
extract_formats(protocol, stream[protocol])
else:

# The last resort is to try direct unsigned hls/hds URLs from info dictionary.
# Sometimes this does work even when secure API with dsc-geo has failed (e.g.
# http://www.dplay.no/pga-tour/season-1-hoydepunkter-18-21-februar/).
if not formats:
for protocol in PROTOCOLS:
if info.get(protocol):
extract_formats(protocol, info[protocol])

self._sort_formats(formats)

subtitles = {}
for lang in ('se', 'sv', 'da', 'nl', 'no'):
for format_id in ('web_vtt', 'vtt', 'srt'):
subtitle_url = info.get('subtitles_%s_%s' % (lang, format_id))
if subtitle_url:
subtitles.setdefault(lang, []).append({'url': subtitle_url})

return {
'id': video_id,
'display_id': display_id,
@ -133,4 +159,5 @@ class DPlayIE(InfoExtractor):
'episode_number': int_or_none(info.get('episode')),
'age_limit': int_or_none(info.get('minimum_age')),
'formats': formats,
'subtitles': subtitles,
}
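The DPlay change works around variant playlists whose URLs come back unsigned: it parses the query string of the signed master manifest URL and merges it into every variant URL with `update_url_query`. The sketch below reproduces that idea with the standard library only; `merge_url_query` is a hypothetical stand-in for youtube-dl's helper, written to behave the same way for this case (keys taken from the manifest override keys already present), and the example URLs are made up.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def merge_url_query(url, query):
    """Return url with `query` (a dict of lists, as produced by parse_qs)
    merged into its existing query string."""
    parsed = urlparse(url)
    merged = parse_qs(parsed.query)
    merged.update(query)
    return urlunparse(parsed._replace(query=urlencode(merged, doseq=True)))


def sign_variant_urls(manifest_url, variant_urls):
    # Copy the master manifest's query parameters (e.g. an auth token)
    # onto each variant URL.
    query = parse_qs(urlparse(manifest_url).query)
    return [merge_url_query(u, query) for u in variant_urls]
```

`doseq=True` is needed because `parse_qs` yields lists of values; without it the lists would be serialized as their string representation.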
@ -1,39 +0,0 @@
# encoding: utf-8
from __future__ import unicode_literals

import re

from .common import InfoExtractor


class DumpIE(InfoExtractor):
_VALID_URL = r'^https?://(?:www\.)?dump\.com/(?P<id>[a-zA-Z0-9]+)/'

_TEST = {
'url': 'http://www.dump.com/oneus/',
'md5': 'ad71704d1e67dfd9e81e3e8b42d69d99',
'info_dict': {
'id': 'oneus',
'ext': 'flv',
'title': "He's one of us.",
'thumbnail': 're:^https?://.*\.jpg$',
},
}

def _real_extract(self, url):
m = re.match(self._VALID_URL, url)
video_id = m.group('id')

webpage = self._download_webpage(url, video_id)
video_url = self._search_regex(
r's1.addVariable\("file",\s*"([^"]+)"', webpage, 'video URL')

title = self._og_search_title(webpage)
thumbnail = self._og_search_thumbnail(webpage)

return {
'id': video_id,
'title': title,
'url': video_url,
'thumbnail': thumbnail,
}
@ -4,9 +4,11 @@ from __future__ import unicode_literals
import re

from .common import InfoExtractor
from ..compat import compat_HTTPError
from ..utils import (
ExtractorError,
int_or_none,
url_basename,
)
@ -21,7 +23,7 @@ class EaglePlatformIE(InfoExtractor):
_TESTS = [{
# http://lenta.ru/news/2015/03/06/navalny/
'url': 'http://lentaru.media.eagleplatform.com/index/player?player=new&record_id=227304&player_template_id=5201',
'md5': '70f5187fb620f2c1d503b3b22fd4efe3',
'md5': '881ee8460e1b7735a8be938e2ffb362b',
'info_dict': {
'id': '227304',
'ext': 'mp4',
@ -36,7 +38,7 @@ class EaglePlatformIE(InfoExtractor):
# http://muz-tv.ru/play/7129/
# http://media.clipyou.ru/index/player?record_id=12820&width=730&height=415&autoplay=true
'url': 'eagleplatform:media.clipyou.ru:12820',
'md5': '90b26344ba442c8e44aa4cf8f301164a',
'md5': '358597369cf8ba56675c1df15e7af624',
'info_dict': {
'id': '12820',
'ext': 'mp4',
@ -55,8 +57,13 @@ class EaglePlatformIE(InfoExtractor):
raise ExtractorError(' '.join(response['errors']), expected=True)

def _download_json(self, url_or_request, video_id, note='Downloading JSON metadata'):
try:
response = super(EaglePlatformIE, self)._download_json(url_or_request, video_id, note)
except ExtractorError as ee:
if isinstance(ee.cause, compat_HTTPError):
response = self._parse_json(ee.cause.read().decode('utf-8'), video_id)
self._handle_error(response)
raise
return response

def _get_video_url(self, url_or_request, video_id, note='Downloading JSON metadata'):
@ -84,17 +91,30 @@ class EaglePlatformIE(InfoExtractor):
secure_m3u8 = self._proto_relative_url(media['sources']['secure_m3u8']['auto'], 'http:')

formats = []

m3u8_url = self._get_video_url(secure_m3u8, video_id, 'Downloading m3u8 JSON')
formats = self._extract_m3u8_formats(
m3u8_formats = self._extract_m3u8_formats(
m3u8_url, video_id,
'mp4', entry_protocol='m3u8_native', m3u8_id='hls')
formats.extend(m3u8_formats)

mp4_url = self._get_video_url(
# Secure mp4 URL is constructed according to Player.prototype.mp4 from
# http://lentaru.media.eagleplatform.com/player/player.js
re.sub(r'm3u8|hlsvod|hls|f4m', 'mp4', secure_m3u8),
video_id, 'Downloading mp4 JSON')
formats.append({'url': mp4_url, 'format_id': 'mp4'})
mp4_url_basename = url_basename(mp4_url)
for m3u8_format in m3u8_formats:
mobj = re.search('/([^/]+)/index\.m3u8', m3u8_format['url'])
if mobj:
http_format = m3u8_format.copy()
http_format.update({
'url': mp4_url.replace(mp4_url_basename, mobj.group(1)),
'format_id': m3u8_format['format_id'].replace('hls', 'http'),
'protocol': 'http',
})
formats.append(http_format)
self._sort_formats(formats)
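The EaglePlatform change derives progressive HTTP formats from the HLS variants: each variant URL of the form `.../<name>/index.m3u8` yields an HTTP URL built by swapping the basename of the secure mp4 URL for `<name>`. A self-contained sketch of that mapping, assuming plain dicts for formats and a simplified `url_basename` (youtube-dl ships its own helper of that name); the URLs in the usage check are invented.

```python
import re
from urllib.parse import urlparse


def url_basename(url):
    # Last path component of the URL, ignoring query and fragment
    return urlparse(url).path.strip('/').split('/')[-1]


def http_formats_from_hls(mp4_url, m3u8_formats):
    """Build HTTP format dicts from HLS variants named .../<name>/index.m3u8."""
    mp4_basename = url_basename(mp4_url)
    http_formats = []
    for m3u8_format in m3u8_formats:
        mobj = re.search(r'/([^/]+)/index\.m3u8', m3u8_format['url'])
        if not mobj:
            continue  # variant URL does not follow the expected layout
        http_format = dict(m3u8_format)  # keep height, tbr, etc.
        http_format.update({
            'url': mp4_url.replace(mp4_basename, mobj.group(1)),
            'format_id': m3u8_format['format_id'].replace('hls', 'http'),
            'protocol': 'http',
        })
        http_formats.append(http_format)
    return http_formats
```

Copying the HLS format dict keeps its resolution and bitrate metadata, so the HTTP twin sorts next to its HLS counterpart.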
@ -46,6 +46,7 @@ from .arte import (
ArteTVPlus7IE,
ArteTVCreativeIE,
ArteTVConcertIE,
ArteTVInfoIE,
ArteTVFutureIE,
ArteTVCinemaIE,
ArteTVDDCIE,
@ -192,10 +193,10 @@ from .drbonanza import DRBonanzaIE
from .drtuber import DrTuberIE
from .drtv import DRTVIE
from .dvtv import DVTVIE
from .dump import DumpIE
from .dumpert import DumpertIE
from .defense import DefenseGouvFrIE
from .discovery import DiscoveryIE
from .dispeak import DigitallySpeakingIE
from .dropbox import DropboxIE
from .dw import (
DWIE,
@ -336,7 +337,6 @@ from .ivi import (
)
from .ivideon import IvideonIE
from .izlesene import IzleseneIE
from .jadorecettepub import JadoreCettePubIE
from .jeuxvideo import JeuxVideoIE
from .jove import JoveIE
from .jwplatform import JWPlatformIE
@ -406,13 +406,19 @@ from .mdr import MDRIE
from .metacafe import MetacafeIE
from .metacritic import MetacriticIE
from .mgoon import MgoonIE
from .mgtv import MGTVIE
from .minhateca import MinhatecaIE
from .ministrygrid import MinistryGridIE
from .minoto import MinotoIE
from .miomio import MioMioIE
from .mit import TechTVMITIE, MITIE, OCWMITIE
from .mitele import MiTeleIE
from .mixcloud import MixcloudIE
from .mixcloud import (
MixcloudIE,
MixcloudUserIE,
MixcloudPlaylistIE,
MixcloudStreamIE,
)
from .mlb import MLBIE
from .mnet import MnetIE
from .mpora import MporaIE
@ -420,7 +426,6 @@ from .moevideo import MoeVideoIE
from .mofosex import MofosexIE
from .mojvideo import MojvideoIE
from .moniker import MonikerIE
from .mooshare import MooshareIE
from .morningstar import MorningstarIE
from .motherless import MotherlessIE
from .motorsport import MotorsportIE
@ -465,7 +470,6 @@ from .ndr import (
from .ndtv import NDTVIE
from .netzkino import NetzkinoIE
from .nerdcubed import NerdCubedFeedIE
from .nerdist import NerdistIE
from .neteasemusic import (
NetEaseMusicIE,
NetEaseMusicAlbumIE,
@ -486,9 +490,10 @@ from .nextmovie import NextMovieIE
from .nfb import NFBIE
from .nfl import NFLIE
from .nhl import (
NHLIE,
NHLNewsIE,
NHLVideocenterIE,
NHLNewsIE,
NHLVideocenterCategoryIE,
NHLIE,
)
from .nick import NickIE
from .niconico import NiconicoIE, NiconicoPlaylistIE
@ -556,12 +561,12 @@ from .pandoratv import PandoraTVIE
from .parliamentliveuk import ParliamentLiveUKIE
from .patreon import PatreonIE
from .pbs import PBSIE
from .people import PeopleIE
from .periscope import PeriscopeIE
from .philharmoniedeparis import PhilharmonieDeParisIE
from .phoenix import PhoenixIE
from .photobucket import PhotobucketIE
from .pinkbike import PinkbikeIE
from .planetaplay import PlanetaPlayIE
from .pladform import PladformIE
from .played import PlayedIE
from .playfm import PlayFMIE
@ -597,7 +602,6 @@ from .qqmusic import (
QQMusicToplistIE,
QQMusicPlaylistIE,
)
from .quickvid import QuickVidIE
from .r7 import R7IE
from .radiode import RadioDeIE
from .radiojavan import RadioJavanIE
@ -730,6 +734,7 @@ from .sztvhu import SztvHuIE
from .tagesschau import TagesschauIE
from .tapely import TapelyIE
from .tass import TassIE
from .tdslifeway import TDSLifewayIE
from .teachertube import (
TeacherTubeIE,
TeacherTubeUserIE,
@ -747,7 +752,6 @@ from .teletask import TeleTaskIE
from .testurl import TestURLIE
from .tf1 import TF1IE
from .theintercept import TheInterceptIE
from .theonion import TheOnionIE
from .theplatform import (
ThePlatformIE,
ThePlatformFeedIE,
@ -832,7 +836,6 @@ from .twitter import (
TwitterIE,
TwitterAmplifyIE,
)
from .ubu import UbuIE
from .udemy import (
UdemyIE,
UdemyCourseIE
@ -917,7 +920,6 @@ from .vulture import VultureIE
from .walla import WallaIE
from .washingtonpost import WashingtonPostIE
from .wat import WatIE
from .wayofthemaster import WayOfTheMasterIE
from .wdr import (
WDRIE,
WDRMobileIE,
@ -7,7 +7,7 @@ from .common import InfoExtractor
class GazetaIE(InfoExtractor):
_VALID_URL = r'(?P<url>https?://(?:www\.)?gazeta\.ru/(?:[^/]+/)?video/(?:(?:main|\d{4}/\d{2}/\d{2})/)?(?P<id>[A-Za-z0-9-_.]+)\.s?html)'
_VALID_URL = r'(?P<url>https?://(?:www\.)?gazeta\.ru/(?:[^/]+/)?video/(?:main/)*(?:\d{4}/\d{2}/\d{2}/)?(?P<id>[A-Za-z0-9-_.]+)\.s?html)'
_TESTS = [{
'url': 'http://www.gazeta.ru/video/main/zadaite_vopros_vladislavu_yurevichu.shtml',
'md5': 'd49c9bdc6e5a7888f27475dc215ee789',
@ -18,9 +18,19 @@ class GazetaIE(InfoExtractor):
'description': 'md5:38617526050bd17b234728e7f9620a71',
'thumbnail': 're:^https?://.*\.jpg',
},
'skip': 'video not found',
}, {
'url': 'http://www.gazeta.ru/lifestyle/video/2015/03/08/master-klass_krasivoi_byt._delaem_vesennii_makiyazh.shtml',
'only_matching': True,
}, {
'url': 'http://www.gazeta.ru/video/main/main/2015/06/22/platit_ili_ne_platit_po_isku_yukosa.shtml',
'md5': '37f19f78355eb2f4256ee1688359f24c',
'info_dict': {
'id': '252048',
'ext': 'mp4',
'title': '"Если по иску ЮКОСа придется платить, это будет большой удар по бюджету"',
},
'add_ie': ['EaglePlatform'],
}]
def _real_extract(self, url):
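The reworked Gazeta `_VALID_URL` accepts any number of `main/` segments before the optional date path, which the previous pattern rejected. A quick check of both patterns (copied verbatim from the hunk) against two URLs from the tests above; the `video_id` helper is only for illustration.

```python
import re

OLD_VALID_URL = r'(?P<url>https?://(?:www\.)?gazeta\.ru/(?:[^/]+/)?video/(?:(?:main|\d{4}/\d{2}/\d{2})/)?(?P<id>[A-Za-z0-9-_.]+)\.s?html)'
NEW_VALID_URL = r'(?P<url>https?://(?:www\.)?gazeta\.ru/(?:[^/]+/)?video/(?:main/)*(?:\d{4}/\d{2}/\d{2}/)?(?P<id>[A-Za-z0-9-_.]+)\.s?html)'

NESTED_MAIN = 'http://www.gazeta.ru/video/main/main/2015/06/22/platit_ili_ne_platit_po_isku_yukosa.shtml'
LIFESTYLE = 'http://www.gazeta.ru/lifestyle/video/2015/03/08/master-klass_krasivoi_byt._delaem_vesennii_makiyazh.shtml'


def video_id(pattern, url):
    # youtube-dl applies _VALID_URL with re.match, anchored at the start only
    mobj = re.match(pattern, url)
    return mobj.group('id') if mobj else None
```

With the old pattern the `id` group cannot span the second `main/` (it excludes `/`), so the nested URL fails to match at all.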
@ -4,7 +4,6 @@ import re
from .common import InfoExtractor
from ..utils import (
remove_end,
HEADRequest,
sanitized_Request,
urlencode_postdata,
@ -51,63 +50,33 @@ class GDCVaultIE(InfoExtractor):
{
'url': 'http://gdcvault.com/play/1020791/',
'only_matching': True,
},
{
# Hard-coded hostname
'url': 'http://gdcvault.com/play/1023460/Tenacious-Design-and-The-Interface',
'md5': 'a8efb6c31ed06ca8739294960b2dbabd',
'info_dict': {
'id': '1023460',
'ext': 'mp4',
'display_id': 'Tenacious-Design-and-The-Interface',
'title': 'Tenacious Design and The Interface of \'Destiny\'',
},
},
{
# Multiple audios
'url': 'http://www.gdcvault.com/play/1014631/Classic-Game-Postmortem-PAC',
'info_dict': {
'id': '1014631',
'ext': 'flv',
'title': 'How to Create a Good Game - From My Experience of Designing Pac-Man',
},
'params': {
'skip_download': True, # Requires rtmpdump
'format': 'jp', # The japanese audio
}
},
]

def _parse_mp4(self, xml_description):
video_formats = []
mp4_video = xml_description.find('./metadata/mp4video')
if mp4_video is None:
return None

mobj = re.match(r'(?P<root>https?://.*?/).*', mp4_video.text)
video_root = mobj.group('root')
formats = xml_description.findall('./metadata/MBRVideos/MBRVideo')
for format in formats:
mobj = re.match(r'mp4\:(?P<path>.*)', format.find('streamName').text)
url = video_root + mobj.group('path')
vbr = format.find('bitrate').text
video_formats.append({
'url': url,
'vbr': int(vbr),
})
return video_formats

def _parse_flv(self, xml_description):
formats = []
akamai_url = xml_description.find('./metadata/akamaiHost').text
audios = xml_description.find('./metadata/audios')
if audios is not None:
for audio in audios:
formats.append({
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
'play_path': remove_end(audio.get('url'), '.flv'),
'ext': 'flv',
'vcodec': 'none',
'format_id': audio.get('code'),
})
slide_video_path = xml_description.find('./metadata/slideVideo').text
formats.append({
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
'play_path': remove_end(slide_video_path, '.flv'),
'ext': 'flv',
'format_note': 'slide deck video',
'quality': -2,
'preference': -2,
'format_id': 'slides',
})
speaker_video_path = xml_description.find('./metadata/speakerVideo').text
formats.append({
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
'play_path': remove_end(speaker_video_path, '.flv'),
'ext': 'flv',
'format_note': 'speaker video',
'quality': -1,
'preference': -1,
'format_id': 'speaker',
})
return formats

def _login(self, webpage_url, display_id):
(username, password) = self._get_login_info()
if username is None or password is None:
@ -183,17 +152,10 @@ class GDCVaultIE(InfoExtractor):
r'<iframe src=".*?\?xmlURL=xml/(?P<xml_file>.+?\.xml).*?".*?</iframe>',
start_page, 'xml filename')

xml_description = self._download_xml(
'%s/xml/%s' % (xml_root, xml_name), display_id)

video_title = xml_description.find('./metadata/title').text
video_formats = self._parse_mp4(xml_description)
if video_formats is None:
video_formats = self._parse_flv(xml_description)

return {
'_type': 'url_transparent',
'id': video_id,
'display_id': display_id,
'title': video_title,
'formats': video_formats,
'url': '%s/xml/%s' % (xml_root, xml_name),
'ie_key': 'DigitallySpeaking',
}
@ -60,6 +60,7 @@ from .googledrive import GoogleDriveIE
from .jwplatform import JWPlatformIE
from .digiteka import DigitekaIE
from .instagram import InstagramIE
from .liveleak import LiveLeakIE


class GenericIE(InfoExtractor):
@ -104,7 +105,8 @@ class GenericIE(InfoExtractor):
'skip_download': True, # infinite live stream
},
'expected_warnings': [
r'501.*Not Implemented'
r'501.*Not Implemented',
r'400.*Bad Request',
],
},
# Direct link with incorrect MIME type
@ -235,6 +237,7 @@ class GenericIE(InfoExtractor):
'ext': 'mp4',
'title': 'car-20120827-manifest',
'formats': 'mincount:9',
'upload_date': '20130904',
},
'params': {
'format': 'bestvideo',
@ -594,7 +597,11 @@ class GenericIE(InfoExtractor):
'id': 'k2mm4bCdJ6CQ2i7c8o2',
'ext': 'mp4',
'title': 'Le Zap de Spi0n n°216 - Zapping du Web',
'description': 'md5:faf028e48a461b8b7fad38f1e104b119',
'uploader': 'Spi0n',
'uploader_id': 'xgditw',
'upload_date': '20140425',
'timestamp': 1398441542,
},
'add_ie': ['Dailymotion'],
},
@ -727,8 +734,11 @@ class GenericIE(InfoExtractor):
'id': 'uxjb0lwrcz',
'ext': 'mp4',
'title': 'Conversation about Hexagonal Rails Part 1 - ThoughtWorks',
'description': 'a Martin Fowler video from ThoughtWorks',
'duration': 1715.0,
'uploader': 'thoughtworks.wistia.com',
'upload_date': '20140603',
'timestamp': 1401832161,
},
},
# Soundcloud embed
@ -979,6 +989,9 @@ class GenericIE(InfoExtractor):
'ext': 'flv',
'title': "PFT Live: New leader in the 'new-look' defense",
'description': 'md5:65a19b4bbfb3b0c0c5768bed1dfad74e',
'uploader': 'NBCU-SPORTS',
'upload_date': '20140107',
'timestamp': 1389118457,
},
},
# UDN embed
@ -1031,6 +1044,9 @@ class GenericIE(InfoExtractor):
'title': 'SN Presents: Russell Martin, World Citizen',
'description': 'To understand why he was the Toronto Blue Jays’ top off-season priority is to appreciate his background and upbringing in Montreal, where he first developed his baseball skills. Written and narrated by Stephen Brunt.',
'uploader': 'Rogers Sportsnet',
'uploader_id': '1704050871',
'upload_date': '20150525',
'timestamp': 1432570283,
},
},
# Dailymotion Cloud video
@ -1122,12 +1138,39 @@ class GenericIE(InfoExtractor):
'title': 'The Cardinal Pell Interview',
'description': 'Sky News Contributor Andrew Bolt interviews George Pell in Rome, following the Cardinal\'s evidence before the Royal Commission into Child Abuse. ',
'uploader': 'GlobeCast Australia - GlobeStream',
'uploader_id': '2733773828001',
'upload_date': '20160304',
'timestamp': 1457083087,
},
'params': {
# m3u8 downloads
'skip_download': True,
},
},
# Another form of arte.tv embed
{
'url': 'http://www.tv-replay.fr/redirection/09-04-16/arte-reportage-arte-11508975.html',
'md5': '850bfe45417ddf221288c88a0cffe2e2',
'info_dict': {
'id': '030273-562_PLUS7-F',
'ext': 'mp4',
'title': 'ARTE Reportage - Nulle part, en France',
'description': 'md5:e3a0e8868ed7303ed509b9e3af2b870d',
'upload_date': '20160409',
},
},
# LiveLeak embed
{
'url': 'http://www.wykop.pl/link/3088787/',
'md5': 'ace83b9ed19b21f68e1b50e844fdf95d',
'info_dict': {
'id': '874_1459135191',
'ext': 'mp4',
'title': 'Man shows poor quality of new apartment building',
'description': 'The wall is like a sand pile.',
'uploader': 'Lake8737',
}
},
]

def report_following_redirect(self, new_url):
@ -1702,7 +1745,7 @@ class GenericIE(InfoExtractor):
# Look for embedded arte.tv player
mobj = re.search(
r'<script [^>]*?src="(?P<url>http://www\.arte\.tv/playerv2/embed[^"]+)"',
r'<(?:script|iframe) [^>]*?src="(?P<url>http://www\.arte\.tv/(?:playerv2/embed|arte_vp/index)[^"]+)"',
webpage)
if mobj is not None:
return self.url_result(mobj.group('url'), 'ArteTVEmbed')
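The widened arte.tv pattern recognises both the older `<script>` embed under `playerv2/embed` and an `<iframe>` pointing at `arte_vp/index`. A small check of the new pattern, copied from the hunk; the HTML snippets in the usage check are made-up examples rather than captured pages.

```python
import re

ARTE_EMBED_RE = r'<(?:script|iframe) [^>]*?src="(?P<url>http://www\.arte\.tv/(?:playerv2/embed|arte_vp/index)[^"]+)"'


def find_arte_embed(webpage):
    # Return the first arte.tv embed URL found in the page, or None
    mobj = re.search(ARTE_EMBED_RE, webpage)
    return mobj.group('url') if mobj else None
```

The lazy `[^>]*?` lets other attributes (such as `type` or `width`) precede `src` without the match escaping the opening tag.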
@ -1930,7 +1973,13 @@ class GenericIE(InfoExtractor):
# Look for Instagram embeds
instagram_embed_url = InstagramIE._extract_embed_url(webpage)
if instagram_embed_url is not None:
return self.url_result(instagram_embed_url, InstagramIE.ie_key())
return self.url_result(
self._proto_relative_url(instagram_embed_url), InstagramIE.ie_key())

# Look for LiveLeak embeds
liveleak_url = LiveLeakIE._extract_url(webpage)
if liveleak_url:
return self.url_result(liveleak_url, 'LiveLeak')

def check_video(vurl):
if YoutubeIE.suitable(vurl):
@ -2013,6 +2062,7 @@ class GenericIE(InfoExtractor):
entries = []
for video_url in found:
video_url = unescapeHTML(video_url)
video_url = video_url.replace('\\/', '/')
video_url = compat_urlparse.urljoin(url, video_url)
video_id = compat_urllib_parse_unquote(os.path.basename(video_url))
@ -14,13 +14,13 @@ class GoshgayIE(InfoExtractor):
_VALID_URL = r'https?://www\.goshgay\.com/video(?P<id>\d+?)($|/)'
_TEST = {
'url': 'http://www.goshgay.com/video299069/diesel_sfw_xxx_video',
'md5': '027fcc54459dff0feb0bc06a7aeda680',
'md5': '4b6db9a0a333142eb9f15913142b0ed1',
'info_dict': {
'id': '299069',
'ext': 'flv',
'title': 'DIESEL SFW XXX Video',
'thumbnail': 're:^http://.*\.jpg$',
'duration': 79,
'duration': 80,
'age_limit': 18,
}
}
@ -47,5 +47,5 @@ class GoshgayIE(InfoExtractor):
'title': title,
'thumbnail': thumbnail,
'duration': duration,
'age_limit': self._family_friendly_search(webpage),
'age_limit': 18,
}
@ -2,12 +2,6 @@
from __future__ import unicode_literals

from .common import InfoExtractor
from ..utils import (
xpath_element,
xpath_text,
int_or_none,
parse_duration,
)


class GPUTechConfIE(InfoExtractor):
@ -27,29 +21,15 @@ class GPUTechConfIE(InfoExtractor):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)

root_path = self._search_regex(r'var\s+rootPath\s*=\s*"([^"]+)', webpage, 'root path', 'http://evt.dispeak.com/nvidia/events/gtc15/')
xml_file_id = self._search_regex(r'var\s+xmlFileId\s*=\s*"([^"]+)', webpage, 'xml file id')

doc = self._download_xml('%sxml/%s.xml' % (root_path, xml_file_id), video_id)

metadata = xpath_element(doc, 'metadata')
http_host = xpath_text(metadata, 'httpHost', 'http host', True)
mbr_videos = xpath_element(metadata, 'MBRVideos')

formats = []
for mbr_video in mbr_videos.findall('MBRVideo'):
stream_name = xpath_text(mbr_video, 'streamName')
if stream_name:
formats.append({
'url': 'http://%s/%s' % (http_host, stream_name.replace('mp4:', '')),
'tbr': int_or_none(xpath_text(mbr_video, 'bitrate')),
})
self._sort_formats(formats)
root_path = self._search_regex(
r'var\s+rootPath\s*=\s*"([^"]+)', webpage, 'root path',
default='http://evt.dispeak.com/nvidia/events/gtc15/')
xml_file_id = self._search_regex(
r'var\s+xmlFileId\s*=\s*"([^"]+)', webpage, 'xml file id')

return {
'_type': 'url_transparent',
'id': video_id,
'title': xpath_text(metadata, 'title'),
'duration': parse_duration(xpath_text(metadata, 'endTime')),
'creator': xpath_text(metadata, 'speaker'),
'formats': formats,
'url': '%sxml/%s.xml' % (root_path, xml_file_id),
'ie_key': 'DigitallySpeaking',
}
@ -16,14 +16,14 @@ class GrouponIE(InfoExtractor):
'playlist': [{
'info_dict': {
'id': 'tubGNycTo_9Uxg82uESj4i61EYX8nyuf',
'ext': 'mp4',
'ext': 'flv',
'title': 'Bikram Yoga Huntington Beach | Orange County',
'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
'duration': 44.961,
},
}],
'params': {
'skip_download': 'HLS',
'skip_download': 'HDS',
}
}
@ -32,7 +32,7 @@ class GrouponIE(InfoExtractor):
webpage = self._download_webpage(url, playlist_id)

payload = self._parse_json(self._search_regex(
r'var\s+payload\s*=\s*(.*?);\n', webpage, 'payload'), playlist_id)
r'(?:var\s+|window\.)payload\s*=\s*(.*?);\n', webpage, 'payload'), playlist_id)
videos = payload['carousel'].get('dealVideos', [])
entries = []
for v in videos:
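Groupon pages may assign the deal data either as `var payload = ...` or as `window.payload = ...`, so the updated regex accepts both prefixes. A sketch of the lookup with the standard library, reusing the pattern from the hunk; `extract_payload` is a hypothetical helper and the page snippets in the usage check are invented.

```python
import json
import re

PAYLOAD_RE = r'(?:var\s+|window\.)payload\s*=\s*(.*?);\n'


def extract_payload(webpage):
    # The lazy group stops at the first ';' that ends a line
    mobj = re.search(PAYLOAD_RE, webpage)
    if mobj is None:
        raise ValueError('payload not found')
    return json.loads(mobj.group(1))
```

This relies on the assignment sitting on a single line, since `.` does not match newlines by default.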
@ -24,6 +24,7 @@ class HowStuffWorksIE(InfoExtractor):
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 161,
},
'skip': 'Video broken',
},
{
'url': 'http://adventure.howstuffworks.com/7199-survival-zone-food-and-water-in-the-savanna-video.htm',
@ -4,6 +4,7 @@ import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
parse_duration,
unified_strdate,
)
@ -29,7 +30,12 @@ class HuffPostIE(InfoExtractor):
'description': 'This week on Legalese It, Mike talks to David Bosco about his new book on the ICC, "Rough Justice," he also discusses the Virginia AG\'s historic stance on gay marriage, the execution of Edgar Tamayo, the ICC\'s delay of Kenya\'s President and more. ',
'duration': 1549,
'upload_date': '20140124',
}
},
'params': {
# m3u8 download
'skip_download': True,
},
'expected_warnings': ['HTTP Error 404: Not Found'],
}

def _real_extract(self, url):
@ -45,7 +51,7 @@ class HuffPostIE(InfoExtractor):
description = data.get('description')

thumbnails = []
for url in data['images'].values():
for url in filter(None, data['images'].values()):
m = re.match('.*-([0-9]+x[0-9]+)\.', url)
if not m:
continue
@ -54,13 +60,25 @@ class HuffPostIE(InfoExtractor):
'resolution': m.group(1),
})

formats = [{
formats = []
sources = data.get('sources', {})
live_sources = list(sources.get('live', {}).items()) + list(sources.get('live_again', {}).items())
for key, url in live_sources:
ext = determine_ext(url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
url, video_id, ext='mp4', m3u8_id='hls', fatal=False))
elif ext == 'f4m':
formats.extend(self._extract_f4m_formats(
url + '?hdcore=2.9.5', video_id, f4m_id='hds', fatal=False))
else:
formats.append({
'format': key,
'format_id': key.replace('/', '.'),
'ext': 'mp4',
'url': url,
'vcodec': 'none' if key.startswith('audio/') else None,
} for key, url in data.get('sources', {}).get('live', {}).items()]
})

if not formats and data.get('fivemin_id'):
return self.url_result('5min:%s' % data['fivemin_id'])
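HuffPost formats are now gathered from both the `live` and `live_again` source maps and dispatched on the URL extension: HLS and HDS manifests are expanded, anything else becomes a single progressive format. The sketch below only classifies sources instead of downloading manifests; `determine_ext` here is a simplified stand-in for youtube-dl's helper of the same name, and `classify_sources` is hypothetical.

```python
from urllib.parse import urlparse


def determine_ext(url, default='unknown_video'):
    # Simplified: extension of the last path component, query string ignored
    last = urlparse(url).path.rsplit('/', 1)[-1]
    if '.' not in last:
        return default
    return last.rsplit('.', 1)[1]


def classify_sources(sources):
    """Map HuffPost-style sources to (kind, url) pairs.

    `sources` mirrors data['sources']: optional 'live' and 'live_again'
    dicts mapping a key such as 'video/mp4' to a URL.
    """
    live_sources = (list(sources.get('live', {}).items())
                    + list(sources.get('live_again', {}).items()))
    plan = []
    for key, url in live_sources:
        ext = determine_ext(url)
        if ext == 'm3u8':
            plan.append(('hls', url))
        elif ext == 'f4m':
            # The extractor appends an hdcore parameter to HDS manifests
            plan.append(('hds', url + '?hdcore=2.9.5'))
        else:
            plan.append((key.replace('/', '.'), url))
    return plan
```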
@ -12,7 +12,7 @@ from ..utils import (
class InstagramIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?instagram\.com/p/(?P<id>[^/?#&]+)'
_VALID_URL = r'(?P<url>https?://(?:www\.)?instagram\.com/p/(?P<id>[^/?#&]+))'
_TESTS = [{
'url': 'https://instagram.com/p/aye83DjauH/?foo=bar#abc',
'md5': '0d2da106a9d2631273e192b372806516',
@ -38,10 +38,19 @@ class InstagramIE(InfoExtractor):
}, {
'url': 'https://instagram.com/p/-Cmh1cukG2/',
'only_matching': True,
}, {
'url': 'http://instagram.com/p/9o6LshA7zy/embed/',
'only_matching': True,
}]

@staticmethod
def _extract_embed_url(webpage):
mobj = re.search(
r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?instagram\.com/p/[^/]+/embed.*?)\1',
webpage)
if mobj:
return mobj.group('url')

blockquote_el = get_element_by_attribute(
'class', 'instagram-media', webpage)
if blockquote_el is None:
@ -53,7 +62,9 @@ class InstagramIE(InfoExtractor):
return mobj.group('link')

def _real_extract(self, url):
video_id = self._match_id(url)
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
url = mobj.group('url')

webpage = self._download_webpage(url, video_id)
uploader_id = self._search_regex(r'"owner":{"username":"(.+?)"',
@ -165,7 +165,7 @@ class IqiyiIE(InfoExtractor):
IE_NAME = 'iqiyi'
IE_DESC = '爱奇艺'

_VALID_URL = r'https?://(?:[^.]+\.)?iqiyi\.com/.+\.html'
_VALID_URL = r'https?://(?:(?:[^.]+\.)?iqiyi\.com|www\.pps\.tv)/.+\.html'

_NETRC_MACHINE = 'iqiyi'
@ -273,6 +273,9 @@ class IqiyiIE(InfoExtractor):
'title': '灌篮高手 国语版',
},
'playlist_count': 101,
}, {
'url': 'http://www.pps.tv/w_19rrbav0ph.html',
'only_matching': True,
}]

_FORMATS_MAP = [
@ -284,6 +287,13 @@ class IqiyiIE(InfoExtractor):
('10', 'h1'),
]

AUTH_API_ERRORS = {
# No preview available (不允许试看鉴权失败)
'Q00505': 'This video requires a VIP account',
# End of preview time (试看结束鉴权失败)
'Q00506': 'Needs a VIP account for full video',
}

def _real_initialize(self):
self._login()
@ -369,14 +379,18 @@ class IqiyiIE(InfoExtractor):
note='Downloading video authentication JSON',
errnote='Unable to download video authentication JSON')

if auth_result['code'] == 'Q00505': # No preview available (不允许试看鉴权失败)
raise ExtractorError('This video requires a VIP account', expected=True)
if auth_result['code'] == 'Q00506': # End of preview time (试看结束鉴权失败)
code = auth_result.get('code')
msg = self.AUTH_API_ERRORS.get(code) or auth_result.get('msg') or code
if code == 'Q00506':
if do_report_warning:
self.report_warning('Needs a VIP account for full video')
self.report_warning(msg)
return False
if 'data' not in auth_result:
if msg is not None:
raise ExtractorError('%s said: %s' % (self.IE_NAME, msg), expected=True)
raise ExtractorError('Unexpected error from Iqiyi auth API')

return auth_result
return auth_result['data']

def construct_video_urls(self, data, video_id, _uuid, tvid):
def do_xor(x, y):
@ -452,11 +466,11 @@ class IqiyiIE(InfoExtractor):
|
||||
need_vip_warning_report = False
|
||||
break
|
||||
param.update({
|
||||
't': auth_result['data']['t'],
|
||||
't': auth_result['t'],
|
||||
# cid is hard-coded in com/qiyi/player/core/player/RuntimeData.as
|
||||
'cid': 'afbe8fd3d73448c9',
|
||||
'vid': video_id,
|
||||
'QY00001': auth_result['data']['u'],
|
||||
'QY00001': auth_result['u'],
|
||||
})
|
||||
api_video_url += '?' if '?' not in api_video_url else '&'
|
||||
api_video_url += compat_urllib_parse_urlencode(param)
|
||||
|
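The reworked iqiyi auth check resolves an error message in three steps: the extractor's own `AUTH_API_ERRORS` table, then the API's `msg` field, then the raw code. A minimal standalone sketch of that fallback chain, assuming the auth response has already been parsed into a dict; `describe_auth_error` is a hypothetical name, not part of youtube-dl:

```python
# Known iqiyi auth error codes, as listed in AUTH_API_ERRORS.
AUTH_API_ERRORS = {
    'Q00505': 'This video requires a VIP account',   # no preview available
    'Q00506': 'Needs a VIP account for full video',  # end of preview time
}


def describe_auth_error(auth_result):
    """Hypothetical helper: pick the most descriptive message for an auth response.

    Preference order: our own table, the API's 'msg' field, the raw code.
    Returns None only when all three are missing.
    """
    code = auth_result.get('code')
    return AUTH_API_ERRORS.get(code) or auth_result.get('msg') or code
```

Because `or` returns the first truthy operand, an empty `msg` string also falls through to the raw code.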
@@ -29,7 +29,7 @@ class IzleseneIE(InfoExtractor):
'ext': 'mp4',
'title': 'Sevinçten Çıldırtan Doğum Günü Hediyesi',
'description': 'md5:253753e2655dde93f59f74b572454f6d',
'thumbnail': 're:^http://.*\.jpg',
'thumbnail': 're:^https?://.*\.jpg',
'uploader_id': 'pelikzzle',
'timestamp': int,
'upload_date': '20140702',
@@ -44,8 +44,7 @@ class IzleseneIE(InfoExtractor):
'id': '17997',
'ext': 'mp4',
'title': 'Tarkan Dortmund 2006 Konseri',
'description': 'Tarkan Dortmund 2006 Konseri',
'thumbnail': 're:^http://.*\.jpg',
'thumbnail': 're:^https://.*\.jpg',
'uploader_id': 'parlayankiz',
'timestamp': int,
'upload_date': '20061112',
@@ -62,7 +61,7 @@ class IzleseneIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)

title = self._og_search_title(webpage)
description = self._og_search_description(webpage)
description = self._og_search_description(webpage, default=None)
thumbnail = self._proto_relative_url(
self._og_search_thumbnail(webpage), scheme='http:')

@@ -1,47 +0,0 @@
# coding: utf-8

from __future__ import unicode_literals

import re

from .common import InfoExtractor
from .youtube import YoutubeIE


class JadoreCettePubIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?jadorecettepub\.com/[0-9]{4}/[0-9]{2}/(?P<id>.*?)\.html'

_TEST = {
'url': 'http://www.jadorecettepub.com/2010/12/star-wars-massacre-par-les-japonais.html',
'md5': '401286a06067c70b44076044b66515de',
'info_dict': {
'id': 'jLMja3tr7a4',
'ext': 'mp4',
'title': 'La pire utilisation de Star Wars',
'description': "Jadorecettepub.com vous a gratifié de plusieurs pubs géniales utilisant Star Wars et Dark Vador plus particulièrement... Mais l'heure est venue de vous proposer une version totalement massacrée, venue du Japon. Quand les Japonais détruisent l'image de Star Wars pour vendre du thon en boite, ça promet...",
},
}

def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('id')

webpage = self._download_webpage(url, display_id)

title = self._html_search_regex(
r'<span style="font-size: x-large;"><b>(.*?)</b></span>',
webpage, 'title')
description = self._html_search_regex(
r'(?s)<div id="fb-root">(.*?)<script>', webpage, 'description',
fatal=False)
real_url = self._search_regex(
r'\[/postlink\](.*)endofvid', webpage, 'video URL')
video_id = YoutubeIE.extract_id(real_url)

return {
'_type': 'url_transparent',
'url': real_url,
'id': video_id,
'title': title,
'description': description,
}
@@ -2,39 +2,63 @@
from __future__ import unicode_literals

from .common import InfoExtractor
from ..compat import compat_urllib_parse_unquote_plus
from ..utils import (
js_to_json,
)


class KaraoketvIE(InfoExtractor):
_VALID_URL = r'https?://karaoketv\.co\.il/\?container=songs&id=(?P<id>[0-9]+)'
_VALID_URL = r'http://www.karaoketv.co.il/[^/]+/(?P<id>\d+)'
_TEST = {
'url': 'http://karaoketv.co.il/?container=songs&id=171568',
'url': 'http://www.karaoketv.co.il/%D7%A9%D7%99%D7%A8%D7%99_%D7%A7%D7%A8%D7%99%D7%95%D7%A7%D7%99/58356/%D7%90%D7%99%D7%96%D7%95%D7%9F',
'info_dict': {
'id': '171568',
'ext': 'mp4',
'title': 'אל העולם שלך - רותם כהן - שרים קריוקי',
'id': '58356',
'ext': 'flv',
'title': 'קריוקי של איזון',
},
'params': {
# rtmp download
'skip_download': True,
}
}

def _real_extract(self, url):
video_id = self._match_id(url)

webpage = self._download_webpage(url, video_id)
api_page_url = self._search_regex(
r'<iframe[^>]+src=(["\'])(?P<url>https?://www\.karaoke\.co\.il/api_play\.php\?.+?)\1',
webpage, 'API play URL', group='url')

page_video_url = self._og_search_video_url(webpage, video_id)
config_json = compat_urllib_parse_unquote_plus(self._search_regex(
r'config=(.*)', page_video_url, 'configuration'))
api_page = self._download_webpage(api_page_url, video_id)
video_cdn_url = self._search_regex(
r'<iframe[^>]+src=(["\'])(?P<url>https?://www\.video-cdn\.com/embed/iframe/.+?)\1',
api_page, 'video cdn URL', group='url')

urls_info_json = self._download_json(
config_json, video_id, 'Downloading configuration',
transform_source=js_to_json)
video_cdn = self._download_webpage(video_cdn_url, video_id)
play_path = self._parse_json(
self._search_regex(
r'var\s+options\s*=\s*({.+?});', video_cdn, 'options'),
video_id)['clip']['url']

url = urls_info_json['playlist'][0]['url']
settings = self._parse_json(
self._search_regex(
r'var\s+settings\s*=\s*({.+?});', video_cdn, 'servers', default='{}'),
video_id, fatal=False) or {}

servers = settings.get('servers')
if not servers or not isinstance(servers, list):
servers = ('wowzail.video-cdn.com:80/vodcdn', )

formats = [{
'url': 'rtmp://%s' % server if not server.startswith('rtmp') else server,
'play_path': play_path,
'app': 'vodcdn',
'page_url': video_cdn_url,
'player_url': 'http://www.video-cdn.com/assets/flowplayer/flowplayer.commercial-3.2.18.swf',
'rtmp_real_time': True,
'ext': 'flv',
} for server in servers]

return {
'id': video_id,
'title': self._og_search_title(webpage),
'url': url,
'formats': formats,
}
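The rewritten Karaoketv extractor reads a `servers` list from the `var settings = {...}` blob on the video-cdn page, falls back to one hard-coded Wowza host, and adds the `rtmp://` scheme only when a server entry lacks it. A standalone sketch of that step, assuming `settings` is the already-parsed dict; `build_rtmp_formats` is a hypothetical helper and its format keys mirror a subset of those in the extractor:

```python
# Fallback host used when the player settings carry no usable server list.
DEFAULT_SERVERS = ('wowzail.video-cdn.com:80/vodcdn',)


def build_rtmp_formats(settings, play_path):
    """Hypothetical helper mirroring the Karaoketv format list."""
    servers = settings.get('servers')
    # A missing, empty or non-list value means "use the default host".
    if not servers or not isinstance(servers, list):
        servers = DEFAULT_SERVERS
    return [{
        # Add the scheme only when the server entry does not already have it.
        'url': server if server.startswith('rtmp') else 'rtmp://%s' % server,
        'play_path': play_path,
        'app': 'vodcdn',
        'rtmp_real_time': True,
        'ext': 'flv',
    } for server in servers]
```

The original expression `'rtmp://%s' % server if not server.startswith('rtmp') else server` is already correct, since `%` binds tighter than the conditional expression; the sketch only flips the condition for readability.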
@@ -52,9 +52,12 @@ class KarriereVideosIE(InfoExtractor):

video_id = self._search_regex(
r'/config/video/(.+?)\.xml', webpage, 'video id')
# Server returns malformed headers
# Force Accept-Encoding: * to prevent gzipped results
playlist = self._download_xml(
'http://www.karrierevideos.at/player-playlist.xml.php?p=%s' % video_id,
video_id, transform_source=fix_xml_ampersands)
video_id, transform_source=fix_xml_ampersands,
headers={'Accept-Encoding': '*'})

NS_MAP = {
'jwplayer': 'http://developer.longtailvideo.com/trac/wiki/FlashFormats'
@@ -81,7 +81,7 @@ class KuwoIE(KuwoBaseIE):
'id': '6446136',
'ext': 'mp3',
'title': '心',
'description': 'md5:b2ab6295d014005bfc607525bfc1e38a',
'description': 'md5:5d0e947b242c35dc0eb1d2fce9fbf02c',
'creator': 'IU',
'upload_date': '20150518',
},
@@ -102,10 +102,10 @@ class KuwoIE(KuwoBaseIE):
raise ExtractorError('this song has been offline because of copyright issues', expected=True)

song_name = self._html_search_regex(
r'(?s)class="(?:[^"\s]+\s+)*title(?:\s+[^"\s]+)*".*?<h1[^>]+title="([^"]+)"', webpage, 'song name')
singer_name = self._html_search_regex(
r'<div[^>]+class="s_img">\s*<a[^>]+title="([^>]+)"',
webpage, 'singer name', fatal=False)
r'<p[^>]+id="lrcName">([^<]+)</p>', webpage, 'song name')
singer_name = remove_start(self._html_search_regex(
r'<a[^>]+href="http://www\.kuwo\.cn/artist/content\?name=([^"]+)">',
webpage, 'singer name', fatal=False), '歌手')
lrc_content = clean_html(get_element_by_id('lrcContent', webpage))
if lrc_content == '暂无': # indicates no lyrics
lrc_content = None
@@ -114,7 +114,7 @@ class KuwoIE(KuwoBaseIE):
self._sort_formats(formats)

album_id = self._html_search_regex(
r'<p[^>]+class="album"[^<]+<a[^>]+href="http://www\.kuwo\.cn/album/(\d+)/"',
r'<a[^>]+href="http://www\.kuwo\.cn/album/(\d+)/"',
webpage, 'album id', fatal=False)

publish_time = None
@@ -268,7 +268,7 @@ class KuwoCategoryIE(InfoExtractor):
'title': '八十年代精选',
'description': '这些都是属于八十年代的回忆!',
},
'playlist_count': 30,
'playlist_mincount': 24,
}

def _real_extract(self, url):
@@ -63,6 +63,7 @@ class Laola1TvIE(InfoExtractor):
'params': {
'skip_download': True,
},
'skip': 'This live stream has already finished.',
}]

def _real_extract(self, url):
@@ -74,6 +75,9 @@ class Laola1TvIE(InfoExtractor):

webpage = self._download_webpage(url, display_id)

if 'Dieser Livestream ist bereits beendet.' in webpage:
raise ExtractorError('This live stream has already finished.', expected=True)

iframe_url = self._search_regex(
r'<iframe[^>]*?id="videoplayer"[^>]*?src="([^"]+)"',
webpage, 'iframe url')
@@ -6,6 +6,7 @@ import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
determine_protocol,
parse_duration,
int_or_none,
)
@@ -18,10 +19,14 @@ class Lecture2GoIE(InfoExtractor):
'md5': 'ac02b570883020d208d405d5a3fd2f7f',
'info_dict': {
'id': '17473',
'ext': 'flv',
'ext': 'mp4',
'title': '2 - Endliche Automaten und reguläre Sprachen',
'creator': 'Frank Heitmann',
'duration': 5220,
},
'params': {
# m3u8 download
'skip_download': True,
}
}

@@ -32,14 +37,18 @@ class Lecture2GoIE(InfoExtractor):
title = self._html_search_regex(r'<em[^>]+class="title">(.+)</em>', webpage, 'title')

formats = []
for url in set(re.findall(r'"src","([^"]+)"', webpage)):
for url in set(re.findall(r'var\s+playerUri\d+\s*=\s*"([^"]+)"', webpage)):
ext = determine_ext(url)
protocol = determine_protocol({'url': url})
if ext == 'f4m':
formats.extend(self._extract_f4m_formats(url, video_id))
formats.extend(self._extract_f4m_formats(url, video_id, f4m_id='hds'))
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(url, video_id))
formats.extend(self._extract_m3u8_formats(url, video_id, ext='mp4', m3u8_id='hls'))
else:
if protocol == 'rtmp':
continue # XXX: currently broken
formats.append({
'format_id': protocol,
'url': url,
})

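Lecture2Go now collects every `playerUri` variable and routes it by extension: `.f4m` manifests are expanded as HDS, `.m3u8` playlists as HLS, RTMP URLs are skipped as broken, and anything else becomes a direct format named after its protocol. A simplified sketch of that classification, assuming a plain path-suffix check in place of youtube-dl's `determine_ext` and `determine_protocol` (which handle more edge cases); `classify_stream` is hypothetical:

```python
import posixpath
from urllib.parse import urlparse


def classify_stream(url):
    """Hypothetical helper: return 'hds', 'hls', 'skip' or the URL scheme."""
    parsed = urlparse(url)
    # Simplified extension guess: suffix of the URL path, query string ignored.
    ext = posixpath.splitext(parsed.path)[1][1:].lower()
    if ext == 'f4m':
        return 'hds'  # the extractor expands these with _extract_f4m_formats
    if ext == 'm3u8':
        return 'hls'  # the extractor expands these with _extract_m3u8_formats
    if parsed.scheme == 'rtmp':
        return 'skip'  # RTMP is skipped as "currently broken"
    return parsed.scheme  # direct URL; the protocol doubles as format_id
```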
@@ -53,6 +53,14 @@ class LiveLeakIE(InfoExtractor):
}
}]

@staticmethod
def _extract_url(webpage):
mobj = re.search(
r'<iframe[^>]+src="https?://(?:\w+\.)?liveleak\.com/ll_embed\?(?:.*?)i=(?P<id>[\w_]+)(?:.*)',
webpage)
if mobj:
return 'http://www.liveleak.com/view?i=%s' % mobj.group('id')

def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
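The new `LiveLeakIE._extract_url` static method lets other extractors turn an embedded LiveLeak player into a canonical view URL. A standalone sketch using the same regex, assuming the embed uses a double-quoted `src` attribute as the pattern requires; `extract_liveleak_url` is a hypothetical name:

```python
import re

# Same pattern as LiveLeakIE._extract_url.
_EMBED_RE = re.compile(
    r'<iframe[^>]+src="https?://(?:\w+\.)?liveleak\.com/ll_embed\?(?:.*?)i=(?P<id>[\w_]+)(?:.*)')


def extract_liveleak_url(webpage):
    """Return the view URL for the first LiveLeak embed in webpage, or None."""
    mobj = _EMBED_RE.search(webpage)
    if mobj:
        return 'http://www.liveleak.com/view?i=%s' % mobj.group('id')
    return None
```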
@@ -49,8 +49,8 @@ class MDRIE(InfoExtractor):
'ext': 'mp4',
'title': 'Beutolomäus und der geheime Weihnachtswunsch',
'description': 'md5:b69d32d7b2c55cbe86945ab309d39bbd',
'timestamp': 1419047100,
'upload_date': '20141220',
'timestamp': 1450950000,
'upload_date': '20151224',
'duration': 4628,
'uploader': 'KIKA',
},
@@ -71,8 +71,8 @@ class MDRIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)

data_url = self._search_regex(
r'(?:dataURL|playerXml(?:["\'])?)\s*:\s*(["\'])(?P<url>\\?/.+/(?:video|audio)-?[0-9]+-avCustom\.xml)\1',
webpage, 'data url', default=None, group='url').replace('\/', '/')
r'(?:dataURL|playerXml(?:["\'])?)\s*:\s*(["\'])(?P<url>.+/(?:video|audio)-?[0-9]+-avCustom\.xml)\1',
webpage, 'data url', group='url').replace('\/', '/')

doc = self._download_xml(
compat_urlparse.urljoin(url, data_url), video_id)
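The MDR change drops `default=None` from the data-URL search. With a `None` default the chained `.replace()` would raise `AttributeError` instead of a clear extraction error, so making the search fatal is the safer choice. A standalone sketch of the lookup, assuming a page that embeds the URL with JSON-escaped slashes; `find_data_url` is hypothetical and raises `ValueError` where youtube-dl's `_search_regex` would report an extraction error. The original `'\/'` is a non-raw literal whose `\/` is an unrecognised escape (kept as backslash plus slash, but flagged with a warning by newer Python 3 releases), so the sketch spells it `'\\/'`:

```python
import re

# Pattern from the new MDRIE code (no default, no leading '\\?/').
_DATA_URL_RE = re.compile(
    r'(?:dataURL|playerXml(?:["\'])?)\s*:\s*(["\'])(?P<url>.+/(?:video|audio)-?[0-9]+-avCustom\.xml)\1')


def find_data_url(webpage):
    """Return the avCustom XML path with JSON-escaped slashes undone."""
    mobj = _DATA_URL_RE.search(webpage)
    if mobj is None:
        # Fail with a clear message instead of calling .replace() on None.
        raise ValueError('Unable to extract data url')
    return mobj.group('url').replace('\\/', '/')
```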
@@ -81,6 +81,9 @@ class MetacafeIE(InfoExtractor):
'title': 'Open: This is Face the Nation, February 9',
'description': 'md5:8a9ceec26d1f7ed6eab610834cc1a476',
'duration': 96,
'uploader': 'CBSI-NEW',
'upload_date': '20140209',
'timestamp': 1391959800,
},
'params': {
# rtmp download
@@ -11,7 +11,7 @@ from ..utils import (
class MetacriticIE(InfoExtractor):
_VALID_URL = r'https?://www\.metacritic\.com/.+?/trailers/(?P<id>\d+)'

_TEST = {
_TESTS = [{
'url': 'http://www.metacritic.com/game/playstation-4/infamous-second-son/trailers/3698222',
'info_dict': {
'id': '3698222',
@@ -20,7 +20,17 @@ class MetacriticIE(InfoExtractor):
'description': 'Take a peak behind-the-scenes to see how Sucker Punch brings smoke into the universe of inFAMOUS Second Son on the PS4.',
'duration': 221,
},
}
'skip': 'Not providing trailers anymore',
}, {
'url': 'http://www.metacritic.com/game/playstation-4/tales-from-the-borderlands-a-telltale-game-series/trailers/5740315',
'info_dict': {
'id': '5740315',
'ext': 'mp4',
'title': 'Tales from the Borderlands - Finale: The Vault of the Traveler',
'description': 'In the final episode of the season, all hell breaks loose. Jack is now in control of Helios\' systems, and he\'s ready to reclaim his rightful place as king of Hyperion (with or without you).',
'duration': 114,
},
}]

def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
63
youtube_dl/extractor/mgtv.py
Normal file
@@ -0,0 +1,63 @@
# coding: utf-8
from __future__ import unicode_literals

from .common import InfoExtractor
from ..utils import int_or_none


class MGTVIE(InfoExtractor):
_VALID_URL = r'https?://www\.mgtv\.com/v/(?:[^/]+/)*(?P<id>\d+)\.html'
IE_DESC = '芒果TV'

_TEST = {
'url': 'http://www.mgtv.com/v/1/290525/f/3116640.html',
'md5': '',
'info_dict': {
'id': '3116640',
'ext': 'mp4',
'title': '我是歌手第四季双年巅峰会:韩红李玟“双王”领军对抗',
'description': '我是歌手第四季双年巅峰会',
'duration': 7461,
'thumbnail': 're:^https?://.*\.jpg$',
},
'params': {
'skip_download': True, # m3u8 download
},
}

_FORMAT_MAP = {
'标清': ('Standard', 0),
'高清': ('High', 1),
'超清': ('SuperHigh', 2),
}

def _real_extract(self, url):
video_id = self._match_id(url)
api_data = self._download_json(
'http://v.api.mgtv.com/player/video', video_id,
query={'video_id': video_id})['data']
info = api_data['info']

formats = []
for idx, stream in enumerate(api_data['stream']):
format_name = stream.get('name')
format_id, preference = self._FORMAT_MAP.get(format_name, (None, None))
format_info = self._download_json(
stream['url'], video_id,
note='Download video info for format %s' % (format_id or '#%d' % idx))
formats.append({
'format_id': format_id,
'url': format_info['info'],
'ext': 'mp4', # These are m3u8 playlists
'preference': preference,
})
self._sort_formats(formats)

return {
'id': video_id,
'title': info['title'].strip(),
'formats': formats,
'description': info.get('desc'),
'duration': int_or_none(info.get('duration')),
'thumbnail': info.get('thumb'),
}
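MGTV lists one stream per quality label and maps each label to a `format_id` and sort `preference` through `_FORMAT_MAP`, using the stream index as a fallback label in the progress note. A standalone sketch of that mapping, assuming the API returns the stream labels in order; `describe_streams` is hypothetical:

```python
# Quality labels from MGTVIE._FORMAT_MAP: label -> (format_id, preference).
FORMAT_MAP = {
    '标清': ('Standard', 0),   # standard definition
    '高清': ('High', 1),       # high definition
    '超清': ('SuperHigh', 2),  # super-high definition
}


def describe_streams(stream_names):
    """Hypothetical helper: map stream labels to (format_id, preference, note)."""
    described = []
    for idx, name in enumerate(stream_names):
        format_id, preference = FORMAT_MAP.get(name, (None, None))
        # Parenthesise the fallback: '%' binds tighter than 'or'.
        note = 'format %s' % (format_id or '#%d' % idx)
        described.append((format_id, preference, note))
    return described
```

Without the parentheses, `'format %s' % format_id or '#%d' % idx` evaluates to `'format None'` for unknown labels, because the formatted string is always truthy.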
@@ -1,8 +1,5 @@
from __future__ import unicode_literals

import json
import re

from .common import InfoExtractor
from ..utils import (
ExtractorError,
@@ -20,21 +17,28 @@ class MinistryGridIE(InfoExtractor):
'id': '3453494717001',
'ext': 'mp4',
'title': 'The Gospel by Numbers',
'thumbnail': 're:^https?://.*\.jpg',
'upload_date': '20140410',
'description': 'Coming soon from T4G 2014!',
'uploader': 'LifeWay Christian Resources (MG)',
'uploader_id': '2034960640001',
'timestamp': 1397145591,
},
'params': {
# m3u8 download
'skip_download': True,
},
'add_ie': ['TDSLifeway'],
}

def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = self._match_id(url)

webpage = self._download_webpage(url, video_id)
portlets_json = self._search_regex(
r'Liferay\.Portlet\.list=(\[.+?\])', webpage, 'portlet list')
portlets = json.loads(portlets_json)
portlets = self._parse_json(self._search_regex(
r'Liferay\.Portlet\.list=(\[.+?\])', webpage, 'portlet list'),
video_id)
pl_id = self._search_regex(
r'<!--\s*p_l_id - ([0-9]+)<br>', webpage, 'p_l_id')
r'getPlid:function\(\){return"(\d+)"}', webpage, 'p_l_id')

for i, portlet in enumerate(portlets):
portlet_url = 'http://www.ministrygrid.com/c/portal/render_portlet?p_l_id=%s&p_p_id=%s' % (pl_id, portlet)
@@ -46,12 +50,8 @@ class MinistryGridIE(InfoExtractor):
r'<iframe.*?src="([^"]+)"', portlet_code, 'video iframe',
default=None)
if video_iframe_url:
surl = smuggle_url(
video_iframe_url, {'force_videoid': video_id})
return {
'_type': 'url',
'id': video_id,
'url': surl,
}
return self.url_result(
smuggle_url(video_iframe_url, {'force_videoid': video_id}),
video_id=video_id)

raise ExtractorError('Could not find video iframe in any portlets')
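MinistryGrid now reads the portlet IDs from the inline `Liferay.Portlet.list=[...]` array and the page-layout ID from a `getPlid:function(){return"..."}` snippet, then requests each portlet separately. A standalone sketch of building those render URLs, assuming the page text matches the extractor's regexes; `portlet_render_urls` is a hypothetical name:

```python
import json
import re

RENDER_URL = 'http://www.ministrygrid.com/c/portal/render_portlet?p_l_id=%s&p_p_id=%s'


def portlet_render_urls(webpage):
    """Return one render_portlet URL per portlet ID found in the page.

    Raises AttributeError if either pattern is missing; the extractor's
    _search_regex reports a proper extraction error instead.
    """
    portlets = json.loads(
        re.search(r'Liferay\.Portlet\.list=(\[.+?\])', webpage).group(1))
    pl_id = re.search(r'getPlid:function\(\){return"(\d+)"}', webpage).group(1)
    return [RENDER_URL % (pl_id, portlet) for portlet in portlets]
```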
@@ -1,26 +1,35 @@
from __future__ import unicode_literals

import base64
import functools
import itertools
import re

from .common import InfoExtractor
from ..compat import compat_urllib_parse_unquote
from ..compat import (
compat_chr,
compat_ord,
compat_urllib_parse_unquote,
compat_urlparse,
)
from ..utils import (
clean_html,
ExtractorError,
HEADRequest,
OnDemandPagedList,
parse_count,
str_to_int,
)


class MixcloudIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:www\.)?mixcloud\.com/([^/]+)/([^/]+)'
_VALID_URL = r'^(?:https?://)?(?:www\.)?mixcloud\.com/([^/]+)/(?!stream|uploads|favorites|listens|playlists)([^/]+)'
IE_NAME = 'mixcloud'

_TESTS = [{
'url': 'http://www.mixcloud.com/dholbach/cryptkeeper/',
'info_dict': {
'id': 'dholbach-cryptkeeper',
'ext': 'mp3',
'ext': 'm4a',
'title': 'Cryptkeeper',
'description': 'After quite a long silence from myself, finally another Drum\'n\'Bass mix with my favourite current dance floor bangers.',
'uploader': 'Daniel Holbach',
@@ -38,22 +47,22 @@ class MixcloudIE(InfoExtractor):
'description': 'md5:2b8aec6adce69f9d41724647c65875e8',
'uploader': 'Gilles Peterson Worldwide',
'uploader_id': 'gillespeterson',
'thumbnail': 're:https?://.*/images/',
'thumbnail': 're:https?://.*',
'view_count': int,
'like_count': int,
},
}]

def _check_url(self, url, track_id, ext):
try:
# We only want to know if the request succeeded
# don't download the whole file
self._request_webpage(
HEADRequest(url), track_id,
'Trying %s URL' % ext)
return True
except ExtractorError:
return False
# See https://www.mixcloud.com/media/js2/www_js_2.9e23256562c080482435196ca3975ab5.js
@staticmethod
def _decrypt_play_info(play_info):
KEY = 'pleasedontdownloadourmusictheartistswontgetpaid'

play_info = base64.b64decode(play_info.encode('ascii'))

return ''.join([
compat_chr(compat_ord(ch) ^ compat_ord(KEY[idx % len(KEY)]))
for idx, ch in enumerate(play_info)])

def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
@@ -63,14 +72,19 @@ class MixcloudIE(InfoExtractor):

webpage = self._download_webpage(url, track_id)

preview_url = self._search_regex(
r'\s(?:data-preview-url|m-preview)="([^"]+)"', webpage, 'preview url')
song_url = re.sub(r'audiocdn(\d+)', r'stream\1', preview_url)
song_url = song_url.replace('/previews/', '/c/originals/')
if not self._check_url(song_url, track_id, 'mp3'):
song_url = song_url.replace('.mp3', '.m4a').replace('originals/', 'm4a/64/')
if not self._check_url(song_url, track_id, 'm4a'):
raise ExtractorError('Unable to extract track url')
message = self._html_search_regex(
r'(?s)<div[^>]+class="global-message cloudcast-disabled-notice-light"[^>]*>(.+?)<(?:a|/div)',
webpage, 'error message', default=None)

encrypted_play_info = self._search_regex(
r'm-play-info="([^"]+)"', webpage, 'play info')
play_info = self._parse_json(
self._decrypt_play_info(encrypted_play_info), track_id)

if message and 'stream_url' not in play_info:
raise ExtractorError('%s said: %s' % (self.IE_NAME, message), expected=True)

song_url = play_info['stream_url']

PREFIX = (
r'm-play-on-spacebar[^>]+'
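Mixcloud now embeds the track URL in a base64-encoded `m-play-info` attribute whose bytes are XOR-ed with a fixed repeating key, which `_decrypt_play_info` reverses. A Python 3 sketch of the same repeating-key XOR, assuming a UTF-8 payload (the original builds the string byte by byte with `compat_chr`, which gives the same result whenever the payload is ASCII JSON); the helper names are hypothetical:

```python
import base64

# Key as hard-coded in MixcloudIE._decrypt_play_info.
KEY = 'pleasedontdownloadourmusictheartistswontgetpaid'


def xor_with_key(data, key=KEY):
    """XOR each byte of data with the repeating key; applying it twice is a no-op."""
    key_bytes = key.encode('ascii')
    return bytes(b ^ key_bytes[i % len(key_bytes)] for i, b in enumerate(data))


def decrypt_play_info(encoded):
    """Python 3 counterpart of _decrypt_play_info: base64-decode, then XOR."""
    return xor_with_key(base64.b64decode(encoded.encode('ascii'))).decode('utf-8')


def encrypt_play_info(plaintext):
    """Inverse of decrypt_play_info, useful for round-trip tests."""
    return base64.b64encode(xor_with_key(plaintext.encode('utf-8'))).decode('ascii')
```

Because XOR is its own inverse, the same `xor_with_key` function both obfuscates and recovers the payload.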
@@ -105,3 +119,201 @@ class MixcloudIE(InfoExtractor):
'view_count': view_count,
'like_count': like_count,
}


class MixcloudPlaylistBaseIE(InfoExtractor):
_PAGE_SIZE = 24

def _find_urls_in_page(self, page):
for url in re.findall(r'm-play-button m-url="(?P<url>[^"]+)"', page):
yield self.url_result(
compat_urlparse.urljoin('https://www.mixcloud.com', clean_html(url)),
MixcloudIE.ie_key())

def _fetch_tracks_page(self, path, video_id, page_name, current_page, real_page_number=None):
real_page_number = real_page_number or current_page + 1
return self._download_webpage(
'https://www.mixcloud.com/%s/' % path, video_id,
note='Download %s (page %d)' % (page_name, current_page + 1),
errnote='Unable to download %s' % page_name,
query={'page': real_page_number, 'list': 'main', '_ajax': '1'},
headers={'X-Requested-With': 'XMLHttpRequest'})

def _tracks_page_func(self, page, video_id, page_name, current_page):
resp = self._fetch_tracks_page(page, video_id, page_name, current_page)

for item in self._find_urls_in_page(resp):
yield item

def _get_user_description(self, page_content):
return self._html_search_regex(
r'<div[^>]+class="description-text"[^>]*>(.+?)</div>',
page_content, 'user description', fatal=False)


class MixcloudUserIE(MixcloudPlaylistBaseIE):
_VALID_URL = r'^(?:https?://)?(?:www\.)?mixcloud\.com/(?P<user>[^/]+)/(?P<type>uploads|favorites|listens)?/?$'
IE_NAME = 'mixcloud:user'

_TESTS = [{
'url': 'http://www.mixcloud.com/dholbach/',
'info_dict': {
'id': 'dholbach_uploads',
'title': 'Daniel Holbach (uploads)',
'description': 'md5:327af72d1efeb404a8216c27240d1370',
},
'playlist_mincount': 11,
}, {
'url': 'http://www.mixcloud.com/dholbach/uploads/',
'info_dict': {
'id': 'dholbach_uploads',
'title': 'Daniel Holbach (uploads)',
'description': 'md5:327af72d1efeb404a8216c27240d1370',
},
'playlist_mincount': 11,
}, {
'url': 'http://www.mixcloud.com/dholbach/favorites/',
'info_dict': {
'id': 'dholbach_favorites',
'title': 'Daniel Holbach (favorites)',
'description': 'md5:327af72d1efeb404a8216c27240d1370',
},
'params': {
'playlist_items': '1-100',
},
'playlist_mincount': 100,
}, {
'url': 'http://www.mixcloud.com/dholbach/listens/',
'info_dict': {
'id': 'dholbach_listens',
'title': 'Daniel Holbach (listens)',
'description': 'md5:327af72d1efeb404a8216c27240d1370',
},
'params': {
'playlist_items': '1-100',
},
'playlist_mincount': 100,
}]

def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
user_id = mobj.group('user')
list_type = mobj.group('type')

# if only a profile URL was supplied, default to downloading all uploads
if list_type is None:
list_type = 'uploads'

video_id = '%s_%s' % (user_id, list_type)

profile = self._download_webpage(
'https://www.mixcloud.com/%s/' % user_id, video_id,
note='Downloading user profile',
errnote='Unable to download user profile')

username = self._og_search_title(profile)
description = self._get_user_description(profile)

entries = OnDemandPagedList(
functools.partial(
self._tracks_page_func,
'%s/%s' % (user_id, list_type), video_id, 'list of %s' % list_type),
self._PAGE_SIZE, use_cache=True)

return self.playlist_result(
entries, video_id, '%s (%s)' % (username, list_type), description)


class MixcloudPlaylistIE(MixcloudPlaylistBaseIE):
_VALID_URL = r'^(?:https?://)?(?:www\.)?mixcloud\.com/(?P<user>[^/]+)/playlists/(?P<playlist>[^/]+)/?$'
IE_NAME = 'mixcloud:playlist'

_TESTS = [{
'url': 'https://www.mixcloud.com/RedBullThre3style/playlists/tokyo-finalists-2015/',
'info_dict': {
'id': 'RedBullThre3style_tokyo-finalists-2015',
'title': 'National Champions 2015',
'description': 'md5:6ff5fb01ac76a31abc9b3939c16243a3',
},
'playlist_mincount': 16,
}, {
'url': 'https://www.mixcloud.com/maxvibes/playlists/jazzcat-on-ness-radio/',
'info_dict': {
'id': 'maxvibes_jazzcat-on-ness-radio',
'title': 'Jazzcat on Ness Radio',
'description': 'md5:7bbbf0d6359a0b8cda85224be0f8f263',
},
'playlist_mincount': 23
}]

def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
user_id = mobj.group('user')
playlist_id = mobj.group('playlist')
video_id = '%s_%s' % (user_id, playlist_id)

profile = self._download_webpage(
url, user_id,
note='Downloading playlist page',
errnote='Unable to download playlist page')

description = self._get_user_description(profile)
playlist_title = self._html_search_regex(
r'<span[^>]+class="[^"]*list-playlist-title[^"]*"[^>]*>(.*?)</span>',
profile, 'playlist title')

entries = OnDemandPagedList(
functools.partial(
self._tracks_page_func,
'%s/playlists/%s' % (user_id, playlist_id), video_id, 'tracklist'),
self._PAGE_SIZE)

return self.playlist_result(entries, video_id, playlist_title, description)


class MixcloudStreamIE(MixcloudPlaylistBaseIE):
_VALID_URL = r'^(?:https?://)?(?:www\.)?mixcloud\.com/(?P<id>[^/]+)/stream/?$'
IE_NAME = 'mixcloud:stream'

_TEST = {
'url': 'https://www.mixcloud.com/FirstEar/stream/',
'info_dict': {
'id': 'FirstEar',
'title': 'First Ear',
'description': 'Curators of good music\nfirstearmusic.com',
},
'playlist_mincount': 192,
}

def _real_extract(self, url):
user_id = self._match_id(url)

webpage = self._download_webpage(url, user_id)

entries = []
prev_page_url = None

def _handle_page(page):
entries.extend(self._find_urls_in_page(page))
return self._search_regex(
r'm-next-page-url="([^"]+)"', page,
'next page URL', default=None)

next_page_url = _handle_page(webpage)

for idx in itertools.count(0):
if not next_page_url or prev_page_url == next_page_url:
break

prev_page_url = next_page_url
current_page = int(self._search_regex(
r'\?page=(\d+)', next_page_url, 'next page number'))

next_page_url = _handle_page(self._fetch_tracks_page(
'%s/stream' % user_id, user_id, 'stream', idx,
real_page_number=current_page))

username = self._og_search_title(webpage)
description = self._get_user_description(webpage)

return self.playlist_result(entries, user_id, username, description)
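`MixcloudStreamIE` follows `m-next-page-url` links instead of using `OnDemandPagedList`, and stops when the link disappears or repeats, which guards against an endless loop if the server keeps returning the same page. A standalone sketch of that loop, assuming `fetch_page` returns the HTML for a given page number; `collect_stream` is a hypothetical name and the regexes are the ones used by the extractor:

```python
import re


def collect_stream(first_page, fetch_page):
    """Hypothetical sketch of MixcloudStreamIE's pagination loop.

    first_page: HTML of the initial stream page.
    fetch_page: callable taking a page number and returning that page's HTML.
    """
    entries = []

    def handle_page(page):
        # Collect track links, then return the next-page link (or None).
        entries.extend(re.findall(r'm-play-button m-url="([^"]+)"', page))
        mobj = re.search(r'm-next-page-url="([^"]+)"', page)
        return mobj.group(1) if mobj else None

    prev_page_url = None
    next_page_url = handle_page(first_page)
    # Stop when there is no next link or the server repeats the previous one.
    while next_page_url and next_page_url != prev_page_url:
        prev_page_url = next_page_url
        page_number = int(re.search(r'\?page=(\d+)', next_page_url).group(1))
        next_page_url = handle_page(fetch_page(page_number))
    return entries
```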
@@ -1,110 +0,0 @@
from __future__ import unicode_literals

import re

from .common import InfoExtractor
from ..utils import (
ExtractorError,
sanitized_Request,
urlencode_postdata,
)


class MooshareIE(InfoExtractor):
IE_NAME = 'mooshare'
IE_DESC = 'Mooshare.biz'
_VALID_URL = r'https?://(?:www\.)?mooshare\.biz/(?P<id>[\da-z]{12})'

_TESTS = [
{
'url': 'http://mooshare.biz/8dqtk4bjbp8g',
'md5': '4e14f9562928aecd2e42c6f341c8feba',
'info_dict': {
'id': '8dqtk4bjbp8g',
'ext': 'mp4',
'title': 'Comedy Football 2011 - (part 1-2)',
'duration': 893,
},
},
{
'url': 'http://mooshare.biz/aipjtoc4g95j',
'info_dict': {
'id': 'aipjtoc4g95j',
'ext': 'mp4',
'title': 'Orange Caramel Dashing Through the Snow',
'duration': 212,
},
'params': {
# rtmp download
'skip_download': True,
}
}
]

def _real_extract(self, url):
video_id = self._match_id(url)
page = self._download_webpage(url, video_id, 'Downloading page')

if re.search(r'>Video Not Found or Deleted<', page) is not None:
raise ExtractorError('Video %s does not exist' % video_id, expected=True)

hash_key = self._html_search_regex(r'<input type="hidden" name="hash" value="([^"]+)">', page, 'hash')
title = self._html_search_regex(r'(?m)<div class="blockTitle">\s*<h2>Watch ([^<]+)</h2>', page, 'title')

download_form = {
'op': 'download1',
'id': video_id,
'hash': hash_key,
}

request = sanitized_Request(
'http://mooshare.biz/%s' % video_id, urlencode_postdata(download_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')

self._sleep(5, video_id)

video_page = self._download_webpage(request, video_id, 'Downloading video page')

thumbnail = self._html_search_regex(r'image:\s*"([^"]+)",', video_page, 'thumbnail', fatal=False)
duration_str = self._html_search_regex(r'duration:\s*"(\d+)",', video_page, 'duration', fatal=False)
duration = int(duration_str) if duration_str is not None else None

formats = []

# SD video
mobj = re.search(r'(?m)file:\s*"(?P<url>[^"]+)",\s*provider:', video_page)
if mobj is not None:
formats.append({
'url': mobj.group('url'),
'format_id': 'sd',
'format': 'SD',
})

# HD video
mobj = re.search(r'\'hd-2\': { file: \'(?P<url>[^\']+)\' },', video_page)
if mobj is not None:
formats.append({
'url': mobj.group('url'),
'format_id': 'hd',
'format': 'HD',
})

# rtmp video
mobj = re.search(r'(?m)file: "(?P<playpath>[^"]+)",\s*streamer: "(?P<rtmpurl>rtmp://[^"]+)",', video_page)
if mobj is not None:
formats.append({
'url': mobj.group('rtmpurl'),
'play_path': mobj.group('playpath'),
'rtmp_live': False,
'ext': 'mp4',
'format_id': 'rtmp',
'format': 'HD',
})

return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats,
}
@ -1,17 +1,21 @@
|
||||
# encoding: utf-8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
import re
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..utils import int_or_none
|
||||
from ..compat import compat_urlparse
|
||||
from ..utils import (
|
||||
int_or_none,
|
||||
js_to_json,
|
||||
mimetype2ext,
|
||||
)
|
||||
|
||||
|
||||
class MusicPlayOnIE(InfoExtractor):
|
||||
_VALID_URL = r'https?://(?:.+?\.)?musicplayon\.com/play(?:-touch)?\?(?:v|pl=100&play)=(?P<id>\d+)'
|
||||
_VALID_URL = r'https?://(?:.+?\.)?musicplayon\.com/play(?:-touch)?\?(?:v|pl=\d+&play)=(?P<id>\d+)'
|
||||
|
||||
_TEST = {
|
||||
_TESTS = [{
|
||||
'url': 'http://en.musicplayon.com/play?v=433377',
|
||||
'md5': '00cdcdea1726abdf500d1e7fd6dd59bb',
|
||||
'info_dict': {
|
||||
'id': '433377',
|
||||
'ext': 'mp4',
|
||||
@ -20,15 +24,16 @@ class MusicPlayOnIE(InfoExtractor):
|
||||
'duration': 342,
|
||||
'uploader': 'ultrafish',
|
||||
},
|
||||
'params': {
|
||||
# m3u8 download
|
||||
'skip_download': True,
|
||||
},
|
||||
}
|
||||
}, {
|
||||
'url': 'http://en.musicplayon.com/play?pl=102&play=442629',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
_URL_TEMPLATE = 'http://en.musicplayon.com/play?v=%s'
|
||||
|
||||
def _real_extract(self, url):
|
||||
mobj = re.match(self._VALID_URL, url)
|
||||
video_id = mobj.group('id')
|
||||
video_id = self._match_id(url)
|
||||
url = self._URL_TEMPLATE % video_id
|
||||
|
||||
page = self._download_webpage(url, video_id)
|
||||
|
||||
@ -40,28 +45,14 @@ class MusicPlayOnIE(InfoExtractor):
|
||||
uploader = self._html_search_regex(
|
||||
r'<div>by <a href="[^"]+" class="purple">([^<]+)</a></div>', page, 'uploader', fatal=False)
|
||||
|
||||
formats = [
|
||||
{
|
||||
'url': 'http://media0-eu-nl.musicplayon.com/stream-mobile?id=%s&type=.mp4' % video_id,
|
||||
'ext': 'mp4',
|
||||
}
|
||||
]
|
||||
|
||||
manifest = self._download_webpage(
|
||||
'http://en.musicplayon.com/manifest.m3u8?v=%s' % video_id, video_id, 'Downloading manifest')
|
||||
|
||||
for entry in manifest.split('#')[1:]:
|
||||
if entry.startswith('EXT-X-STREAM-INF:'):
|
||||
meta, url, _ = entry.split('\n')
|
||||
params = dict(param.split('=') for param in meta.split(',')[1:])
|
||||
formats.append({
|
||||
'url': url,
|
||||
'ext': 'mp4',
|
||||
'tbr': int(params['BANDWIDTH']),
|
||||
'width': int(params['RESOLUTION'].split('x')[1]),
|
||||
'height': int(params['RESOLUTION'].split('x')[-1]),
|
||||
'format_note': params['NAME'].replace('"', '').strip(),
|
||||
})
|
||||
sources = self._parse_json(
|
||||
self._search_regex(r'setup\[\'_sources\'\]\s*=\s*([^;]+);', page, 'video sources'),
|
||||
video_id, transform_source=js_to_json)
|
||||
formats = [{
|
||||
'url': compat_urlparse.urljoin(url, source['src']),
|
||||
'ext': mimetype2ext(source.get('type')),
|
||||
'format_note': source.get('data-res'),
|
||||
} for source in sources]
|
||||
|
||||
return {
|
||||
'id': video_id,
|
||||
|
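The MusicPlayOn hunk widens `_VALID_URL` from a fixed `pl=100` to `pl=\d+`, so playlist URLs with other playlist numbers are accepted. The sketch below copies both patterns from the hunk and checks them against the `only_matching` test URL. `match_id` is a hypothetical helper standing in for `InfoExtractor._match_id`.

```python
import re

# Old and new MusicPlayOn URL patterns, copied from the hunk above.
OLD_VALID_URL = r'https?://(?:.+?\.)?musicplayon\.com/play(?:-touch)?\?(?:v|pl=100&play)=(?P<id>\d+)'
NEW_VALID_URL = r'https?://(?:.+?\.)?musicplayon\.com/play(?:-touch)?\?(?:v|pl=\d+&play)=(?P<id>\d+)'


def match_id(pattern, url):
    """Return the 'id' group, or None if the URL does not match (hypothetical helper)."""
    mobj = re.match(pattern, url)
    return mobj.group('id') if mobj else None


playlist_url = 'http://en.musicplayon.com/play?pl=102&play=442629'
print(match_id(OLD_VALID_URL, playlist_url))  # None: only pl=100 was accepted
print(match_id(NEW_VALID_URL, playlist_url))  # 442629
```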
@ -12,7 +12,7 @@ class MwaveIE(InfoExtractor):
|
||||
_VALID_URL = r'https?://mwave\.interest\.me/mnettv/videodetail\.m\?searchVideoDetailVO\.clip_id=(?P<id>[0-9]+)'
|
||||
_TEST = {
|
||||
'url': 'http://mwave.interest.me/mnettv/videodetail.m?searchVideoDetailVO.clip_id=168859',
|
||||
'md5': 'c930e27b7720aaa3c9d0018dfc8ff6cc',
|
||||
# md5 is unstable
|
||||
'info_dict': {
|
||||
'id': '168859',
|
||||
'ext': 'flv',
|
||||
|
@ -134,6 +134,9 @@ class NBCSportsIE(InfoExtractor):
|
||||
'ext': 'flv',
|
||||
'title': 'Tom Izzo, Michigan St. has \'so much respect\' for Duke',
|
||||
'description': 'md5:ecb459c9d59e0766ac9c7d5d0eda8113',
|
||||
'uploader': 'NBCU-SPORTS',
|
||||
'upload_date': '20150330',
|
||||
'timestamp': 1427726529,
|
||||
}
|
||||
}
|
||||
|
||||
@ -172,7 +175,7 @@ class CSNNEIE(InfoExtractor):
|
||||
|
||||
|
||||
class NBCNewsIE(ThePlatformIE):
|
||||
_VALID_URL = r'''(?x)https?://(?:www\.)?nbcnews\.com/
|
||||
_VALID_URL = r'''(?x)https?://(?:www\.)?(?:nbcnews|today)\.com/
|
||||
(?:video/.+?/(?P<id>\d+)|
|
||||
([^/]+/)*(?P<display_id>[^/?]+))
|
||||
'''
|
||||
@ -230,6 +233,18 @@ class NBCNewsIE(ThePlatformIE):
|
||||
},
|
||||
'expected_warnings': ['http-6000 is not available']
|
||||
},
|
||||
{
|
||||
'url': 'http://www.today.com/video/see-the-aurora-borealis-from-space-in-stunning-new-nasa-video-669831235788',
|
||||
'md5': '118d7ca3f0bea6534f119c68ef539f71',
|
||||
'info_dict': {
|
||||
'id': '669831235788',
|
||||
'ext': 'mp4',
|
||||
'title': 'See the aurora borealis from space in stunning new NASA video',
|
||||
'description': 'md5:74752b7358afb99939c5f8bb2d1d04b1',
|
||||
'upload_date': '20160420',
|
||||
'timestamp': 1461152093,
|
||||
},
|
||||
},
|
||||
{
|
||||
'url': 'http://www.nbcnews.com/watch/dateline/full-episode--deadly-betrayal-386250819952',
|
||||
'only_matching': True,
|
||||
@ -264,7 +279,10 @@ class NBCNewsIE(ThePlatformIE):
|
||||
info = bootstrap['results'][0]['video']
|
||||
else:
|
||||
player_instance_json = self._search_regex(
|
||||
r'videoObj\s*:\s*({.+})', webpage, 'player instance')
|
||||
r'videoObj\s*:\s*({.+})', webpage, 'player instance', default=None)
|
||||
if not player_instance_json:
|
||||
player_instance_json = self._html_search_regex(
|
||||
r'data-video="([^"]+)"', webpage, 'video json')
|
||||
info = self._parse_json(player_instance_json, display_id)
|
||||
video_id = info['mpxId']
|
||||
title = info['title']
|
||||
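In the NBCNews hunk the `videoObj` lookup becomes optional (`default=None`) and, when it is missing, the player JSON is read from a `data-video` attribute via `_html_search_regex`, which unescapes HTML entities. The standalone sketch below approximates that two-step lookup with the standard library; `html.unescape` stands in for youtube-dl's own cleanup, and `find_player_json` is a hypothetical name.

```python
import html
import json
import re


def find_player_json(webpage):
    """Try an inline `videoObj: {...}` literal first, then fall back to an
    HTML-escaped `data-video="..."` attribute (approximation of the hunk above)."""
    mobj = re.search(r'videoObj\s*:\s*({.+})', webpage)
    if mobj:
        return json.loads(mobj.group(1))
    mobj = re.search(r'data-video="([^"]+)"', webpage)
    if mobj:
        # Attribute values arrive entity-encoded (&quot; etc.), so decode before parsing.
        return json.loads(html.unescape(mobj.group(1)))
    return None


page = '<div data-video="{&quot;mpxId&quot;: &quot;669831235788&quot;}"></div>'
print(find_player_json(page)['mpxId'])  # 669831235788
```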
@ -295,7 +313,7 @@ class NBCNewsIE(ThePlatformIE):
|
||||
formats.extend(tp_formats)
|
||||
subtitles = self._merge_subtitles(subtitles, tp_subtitles)
|
||||
else:
|
||||
tbr = int_or_none(video_asset.get('bitRate'), 1000)
|
||||
tbr = int_or_none(video_asset.get('bitRate') or video_asset.get('bitrate'), 1000)
|
||||
format_id = 'http%s' % ('-%d' % tbr if tbr else '')
|
||||
video_url = update_url_query(
|
||||
video_url, {'format': 'redirect'})
|
||||
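The NBCNews change reads the bitrate from either `bitRate` or `bitrate` and builds an `http-<kbps>` format id. The sketch below reproduces that logic with a simplified stand-in for `youtube_dl.utils.int_or_none` (it only models the `scale` divisor and is not the real implementation); the sample asset dicts are invented for illustration.

```python
def int_or_none(v, scale=1, default=None):
    """Simplified stand-in for youtube_dl.utils.int_or_none (scale divisor only)."""
    if v is None or v == '':
        return default
    try:
        return int(v) // scale
    except (TypeError, ValueError):
        return default


def http_format_id(video_asset):
    # Accept either key spelling, as the new hunk does, and convert bit/s to kbit/s.
    tbr = int_or_none(video_asset.get('bitRate') or video_asset.get('bitrate'), 1000)
    return 'http%s' % ('-%d' % tbr if tbr else '')


print(http_format_id({'bitrate': '1500000'}))  # http-1500
print(http_format_id({}))                      # http
```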
@ -321,10 +339,9 @@ class NBCNewsIE(ThePlatformIE):
|
||||
'id': video_id,
|
||||
'title': title,
|
||||
'description': info.get('description'),
|
||||
'thumbnail': info.get('description'),
|
||||
'thumbnail': info.get('thumbnail'),
|
||||
'duration': int_or_none(info.get('duration')),
|
||||
'timestamp': parse_iso8601(info.get('pubDate')),
|
||||
'timestamp': parse_iso8601(info.get('pubDate') or info.get('pub_date')),
|
||||
'formats': formats,
|
||||
'subtitles': subtitles,
|
||||
}
|
||||
|
@ -1,80 +0,0 @@
|
||||
# encoding: utf-8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
from .common import InfoExtractor
|
||||
|
||||
from ..utils import (
|
||||
determine_ext,
|
||||
parse_iso8601,
|
||||
xpath_text,
|
||||
)
|
||||
|
||||
|
||||
class NerdistIE(InfoExtractor):
|
||||
_VALID_URL = r'https?://(?:www\.)?nerdist\.com/vepisode/(?P<id>[^/?#]+)'
|
||||
_TEST = {
|
||||
'url': 'http://www.nerdist.com/vepisode/exclusive-which-dc-characters-w',
|
||||
'md5': '3698ed582931b90d9e81e02e26e89f23',
|
||||
'info_dict': {
|
||||
'display_id': 'exclusive-which-dc-characters-w',
|
||||
'id': 'RPHpvJyr',
|
||||
'ext': 'mp4',
|
||||
'title': 'Your TEEN TITANS Revealed! Who\'s on the show?',
|
||||
'thumbnail': 're:^https?://.*/thumbs/.*\.jpg$',
|
||||
'description': 'Exclusive: Find out which DC Comics superheroes will star in TEEN TITANS Live-Action TV Show on Nerdist News with Jessica Chobot!',
|
||||
'uploader': 'Eric Diaz',
|
||||
'upload_date': '20150202',
|
||||
'timestamp': 1422892808,
|
||||
}
|
||||
}
|
||||
|
||||
def _real_extract(self, url):
|
||||
display_id = self._match_id(url)
|
||||
webpage = self._download_webpage(url, display_id)
|
||||
|
||||
video_id = self._search_regex(
|
||||
r'''(?x)<script\s+(?:type="text/javascript"\s+)?
|
||||
src="https?://content\.nerdist\.com/players/([a-zA-Z0-9_]+)-''',
|
||||
webpage, 'video ID')
|
||||
timestamp = parse_iso8601(self._html_search_meta(
|
||||
'shareaholic:article_published_time', webpage, 'upload date'))
|
||||
uploader = self._html_search_meta(
|
||||
'shareaholic:article_author_name', webpage, 'article author')
|
||||
|
||||
doc = self._download_xml(
|
||||
'http://content.nerdist.com/jw6/%s.xml' % video_id, video_id)
|
||||
video_info = doc.find('.//item')
|
||||
title = xpath_text(video_info, './title', fatal=True)
|
||||
description = xpath_text(video_info, './description')
|
||||
thumbnail = xpath_text(
|
||||
video_info, './{http://rss.jwpcdn.com/}image', 'thumbnail')
|
||||
|
||||
formats = []
|
||||
for source in video_info.findall('./{http://rss.jwpcdn.com/}source'):
|
||||
vurl = source.attrib['file']
|
||||
ext = determine_ext(vurl)
|
||||
if ext == 'm3u8':
|
||||
formats.extend(self._extract_m3u8_formats(
|
||||
vurl, video_id, entry_protocol='m3u8_native', ext='mp4',
|
||||
preference=0))
|
||||
elif ext == 'smil':
|
||||
formats.extend(self._extract_smil_formats(
|
||||
vurl, video_id, fatal=False
|
||||
))
|
||||
else:
|
||||
formats.append({
|
||||
'format_id': ext,
|
||||
'url': vurl,
|
||||
})
|
||||
self._sort_formats(formats)
|
||||
|
||||
return {
|
||||
'id': video_id,
|
||||
'display_id': display_id,
|
||||
'title': title,
|
||||
'description': description,
|
||||
'thumbnail': thumbnail,
|
||||
'timestamp': timestamp,
|
||||
'formats': formats,
|
||||
'uploader': uploader,
|
||||
}
|
@ -89,6 +89,7 @@ class NetEaseMusicIE(NetEaseMusicBaseIE):
|
||||
'timestamp': 1431878400,
|
||||
'description': 'md5:a10a54589c2860300d02e1de821eb2ef',
|
||||
},
|
||||
'skip': 'Blocked outside Mainland China',
|
||||
}, {
|
||||
'note': 'No lyrics translation.',
|
||||
'url': 'http://music.163.com/#/song?id=29822014',
|
||||
@ -101,6 +102,7 @@ class NetEaseMusicIE(NetEaseMusicBaseIE):
|
||||
'timestamp': 1419523200,
|
||||
'description': 'md5:a4d8d89f44656af206b7b2555c0bce6c',
|
||||
},
|
||||
'skip': 'Blocked outside Mainland China',
|
||||
}, {
|
||||
'note': 'No lyrics.',
|
||||
'url': 'http://music.163.com/song?id=17241424',
|
||||
@ -112,6 +114,7 @@ class NetEaseMusicIE(NetEaseMusicBaseIE):
|
||||
'upload_date': '20080211',
|
||||
'timestamp': 1202745600,
|
||||
},
|
||||
'skip': 'Blocked outside Mainland China',
|
||||
}, {
|
||||
'note': 'Has translated name.',
|
||||
'url': 'http://music.163.com/#/song?id=22735043',
|
||||
@ -124,7 +127,8 @@ class NetEaseMusicIE(NetEaseMusicBaseIE):
|
||||
'upload_date': '20100127',
|
||||
'timestamp': 1264608000,
|
||||
'alt_title': '说出愿望吧(Genie)',
|
||||
}
|
||||
},
|
||||
'skip': 'Blocked outside Mainland China',
|
||||
}]
|
||||
|
||||
def _process_lyrics(self, lyrics_info):
|
||||
@ -192,6 +196,7 @@ class NetEaseMusicAlbumIE(NetEaseMusicBaseIE):
|
||||
'title': 'B\'day',
|
||||
},
|
||||
'playlist_count': 23,
|
||||
'skip': 'Blocked outside Mainland China',
|
||||
}
|
||||
|
||||
def _real_extract(self, url):
|
||||
@ -223,6 +228,7 @@ class NetEaseMusicSingerIE(NetEaseMusicBaseIE):
|
||||
'title': '张惠妹 - aMEI;阿密特',
|
||||
},
|
||||
'playlist_count': 50,
|
||||
'skip': 'Blocked outside Mainland China',
|
||||
}, {
|
||||
'note': 'Singer has translated name.',
|
||||
'url': 'http://music.163.com/#/artist?id=124098',
|
||||
@ -231,6 +237,7 @@ class NetEaseMusicSingerIE(NetEaseMusicBaseIE):
|
||||
'title': '李昇基 - 이승기',
|
||||
},
|
||||
'playlist_count': 50,
|
||||
'skip': 'Blocked outside Mainland China',
|
||||
}]
|
||||
|
||||
def _real_extract(self, url):
|
||||
@ -266,6 +273,7 @@ class NetEaseMusicListIE(NetEaseMusicBaseIE):
|
||||
'description': 'md5:12fd0819cab2965b9583ace0f8b7b022'
|
||||
},
|
||||
'playlist_count': 99,
|
||||
'skip': 'Blocked outside Mainland China',
|
||||
}, {
|
||||
'note': 'Toplist/Charts sample',
|
||||
'url': 'http://music.163.com/#/discover/toplist?id=3733003',
|
||||
@ -275,6 +283,7 @@ class NetEaseMusicListIE(NetEaseMusicBaseIE):
|
||||
'description': 'md5:73ec782a612711cadc7872d9c1e134fc',
|
||||
},
|
||||
'playlist_count': 50,
|
||||
'skip': 'Blocked outside Mainland China',
|
||||
}]
|
||||
|
||||
def _real_extract(self, url):
|
||||
@ -314,6 +323,7 @@ class NetEaseMusicMvIE(NetEaseMusicBaseIE):
|
||||
'creator': '白雅言',
|
||||
'upload_date': '20150520',
|
||||
},
|
||||
'skip': 'Blocked outside Mainland China',
|
||||
}
|
||||
|
||||
def _real_extract(self, url):
|
||||
@ -357,6 +367,7 @@ class NetEaseMusicProgramIE(NetEaseMusicBaseIE):
|
||||
'upload_date': '20150613',
|
||||
'duration': 900,
|
||||
},
|
||||
'skip': 'Blocked outside Mainland China',
|
||||
}, {
|
||||
'note': 'This program has accompanying songs.',
|
||||
'url': 'http://music.163.com/#/program?id=10141022',
|
||||
@ -366,6 +377,7 @@ class NetEaseMusicProgramIE(NetEaseMusicBaseIE):
|
||||
'description': 'md5:8d594db46cc3e6509107ede70a4aaa3b',
|
||||
},
|
||||
'playlist_count': 4,
|
||||
'skip': 'Blocked outside Mainland China',
|
||||
}, {
|
||||
'note': 'This program has accompanying songs.',
|
||||
'url': 'http://music.163.com/#/program?id=10141022',
|
||||
@ -379,7 +391,8 @@ class NetEaseMusicProgramIE(NetEaseMusicBaseIE):
|
||||
},
|
||||
'params': {
|
||||
'noplaylist': True
|
||||
}
|
||||
},
|
||||
'skip': 'Blocked outside Mainland China',
|
||||
}]
|
||||
|
||||
def _real_extract(self, url):
|
||||
@ -438,6 +451,7 @@ class NetEaseMusicDjRadioIE(NetEaseMusicBaseIE):
|
||||
'description': 'md5:766220985cbd16fdd552f64c578a6b15'
|
||||
},
|
||||
'playlist_mincount': 40,
|
||||
'skip': 'Blocked outside Mainland China',
|
||||
}
|
||||
_PAGE_SIZE = 1000
|
||||
|
||||
|
@ -7,8 +7,8 @@ from .common import InfoExtractor
|
||||
|
||||
|
||||
class NewgroundsIE(InfoExtractor):
|
||||
_VALID_URL = r'https?://(?:www\.)?newgrounds\.com/audio/listen/(?P<id>[0-9]+)'
|
||||
_TEST = {
|
||||
_VALID_URL = r'https?://(?:www\.)?newgrounds\.com/(?:audio/listen|portal/view)/(?P<id>[0-9]+)'
|
||||
_TESTS = [{
|
||||
'url': 'http://www.newgrounds.com/audio/listen/549479',
|
||||
'md5': 'fe6033d297591288fa1c1f780386f07a',
|
||||
'info_dict': {
|
||||
@ -17,7 +17,16 @@ class NewgroundsIE(InfoExtractor):
|
||||
'title': 'B7 - BusMode',
|
||||
'uploader': 'Burn7',
|
||||
}
|
||||
}
|
||||
}, {
|
||||
'url': 'http://www.newgrounds.com/portal/view/673111',
|
||||
'md5': '3394735822aab2478c31b1004fe5e5bc',
|
||||
'info_dict': {
|
||||
'id': '673111',
|
||||
'ext': 'mp4',
|
||||
'title': 'Dancin',
|
||||
'uploader': 'Squirrelman82',
|
||||
},
|
||||
}]
|
||||
|
||||
def _real_extract(self, url):
|
||||
mobj = re.match(self._VALID_URL, url)
|
||||
@ -25,9 +34,11 @@ class NewgroundsIE(InfoExtractor):
|
||||
webpage = self._download_webpage(url, music_id)
|
||||
|
||||
title = self._html_search_regex(
|
||||
r',"name":"([^"]+)",', webpage, 'music title')
|
||||
r'<title>([^>]+)</title>', webpage, 'title')
|
||||
|
||||
uploader = self._html_search_regex(
|
||||
r',"artist":"([^"]+)",', webpage, 'music uploader')
|
||||
[r',"artist":"([^"]+)",', r'[\'"]owner[\'"]\s*:\s*[\'"]([^\'"]+)[\'"],'],
|
||||
webpage, 'uploader')
|
||||
|
||||
music_url_json_string = self._html_search_regex(
|
||||
r'({"url":"[^"]+"),', webpage, 'music url') + '}'
|
||||
|
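The Newgrounds hunk turns the uploader lookup into a list of patterns: the audio-page `"artist"` key first, then an `owner` key assumed to appear on `portal/view` pages. youtube-dl's `_search_regex` tries list entries in order and uses the first match; the sketch mimics that with a hypothetical standalone helper and made-up page snippets.

```python
import re

# The two uploader patterns from the NewgroundsIE hunk above.
UPLOADER_PATTERNS = [
    r',"artist":"([^"]+)",',
    r'[\'"]owner[\'"]\s*:\s*[\'"]([^\'"]+)[\'"],',
]


def search_first(patterns, text):
    """Return group 1 of the first pattern that matches, else None (hypothetical helper)."""
    for pattern in patterns:
        mobj = re.search(pattern, text)
        if mobj:
            return mobj.group(1)
    return None


print(search_first(UPLOADER_PATTERNS, '{"id":1,"artist":"Burn7","x":2}'))           # Burn7
print(search_first(UPLOADER_PATTERNS, "var p = {'owner': 'Squirrelman82', 'id': 1};"))  # Squirrelman82
```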
@ -4,24 +4,24 @@ from __future__ import unicode_literals
|
||||
import re
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..utils import ExtractorError
|
||||
from ..utils import (
|
||||
ExtractorError,
|
||||
int_or_none,
|
||||
)
|
||||
|
||||
|
||||
class NewstubeIE(InfoExtractor):
|
||||
_VALID_URL = r'https?://(?:www\.)?newstube\.ru/media/(?P<id>.+)'
|
||||
_TEST = {
|
||||
'url': 'http://www.newstube.ru/media/telekanal-cnn-peremestil-gorod-slavyansk-v-krym',
|
||||
'md5': '801eef0c2a9f4089fa04e4fe3533abdc',
|
||||
'info_dict': {
|
||||
'id': '728e0ef2-e187-4012-bac0-5a081fdcb1f6',
|
||||
'ext': 'flv',
|
||||
'ext': 'mp4',
|
||||
'title': 'Телеканал CNN переместил город Славянск в Крым',
|
||||
'description': 'md5:419a8c9f03442bc0b0a794d689360335',
|
||||
'duration': 31.05,
|
||||
},
|
||||
'params': {
|
||||
# rtmp download
|
||||
'skip_download': True,
|
||||
},
|
||||
}
|
||||
|
||||
def _real_extract(self, url):
|
||||
@ -62,7 +62,6 @@ class NewstubeIE(InfoExtractor):
|
||||
server = media_location.find(ns('./Server')).text
|
||||
app = media_location.find(ns('./App')).text
|
||||
media_id = stream_info.find(ns('./Id')).text
|
||||
quality_id = stream_info.find(ns('./QualityId')).text
|
||||
name = stream_info.find(ns('./Name')).text
|
||||
width = int(stream_info.find(ns('./Width')).text)
|
||||
height = int(stream_info.find(ns('./Height')).text)
|
||||
@ -74,12 +73,38 @@ class NewstubeIE(InfoExtractor):
|
||||
'rtmp_conn': ['S:%s' % session_id, 'S:%s' % media_id, 'S:n2'],
|
||||
'page_url': url,
|
||||
'ext': 'flv',
|
||||
'format_id': quality_id,
|
||||
'format_note': name,
|
||||
'format_id': 'rtmp' + ('-%s' % name if name else ''),
|
||||
'width': width,
|
||||
'height': height,
|
||||
})
|
||||
|
||||
sources_data = self._download_json(
|
||||
'http://www.newstube.ru/player2/getsources?guid=%s' % video_guid,
|
||||
video_guid, fatal=False)
|
||||
if sources_data:
|
||||
for source in sources_data.get('Sources', []):
|
||||
source_url = source.get('Src')
|
||||
if not source_url:
|
||||
continue
|
||||
height = int_or_none(source.get('Height'))
|
||||
f = {
|
||||
'format_id': 'http' + ('-%dp' % height if height else ''),
|
||||
'url': source_url,
|
||||
'width': int_or_none(source.get('Width')),
|
||||
'height': height,
|
||||
}
|
||||
source_type = source.get('Type')
|
||||
if source_type:
|
||||
mobj = re.search(r'codecs="([^,]+),\s*([^"]+)"', source_type)
|
||||
if mobj:
|
||||
vcodec, acodec = mobj.groups()
|
||||
f.update({
|
||||
'vcodec': vcodec,
|
||||
'acodec': acodec,
|
||||
})
|
||||
formats.append(f)
|
||||
|
||||
self._check_formats(formats, video_guid)
|
||||
self._sort_formats(formats)
|
||||
|
||||
return {
|
||||
|
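The new Newstube code pulls `vcodec`/`acodec` out of each source's `Type` field with `codecs="([^,]+),\s*([^"]+)"`. The sketch applies that regex to a MIME string in the usual `codecs` parameter form; it assumes, as the hunk does, that the video codec is listed before the audio codec. The sample MIME string is illustrative, not taken from newstube.ru.

```python
import re


def parse_codecs(source_type):
    """Split 'video/mp4; codecs="<video>, <audio>"' into (vcodec, acodec)."""
    mobj = re.search(r'codecs="([^,]+),\s*([^"]+)"', source_type or '')
    return mobj.groups() if mobj else (None, None)


print(parse_codecs('video/mp4; codecs="avc1.42E01E, mp4a.40.2"'))  # ('avc1.42E01E', 'mp4a.40.2')
```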
@ -8,10 +8,15 @@ from .common import InfoExtractor
|
||||
from ..compat import (
|
||||
compat_urlparse,
|
||||
compat_urllib_parse_urlencode,
|
||||
compat_urllib_parse_urlparse
|
||||
compat_urllib_parse_urlparse,
|
||||
compat_str,
|
||||
)
|
||||
from ..utils import (
|
||||
unified_strdate,
|
||||
determine_ext,
|
||||
int_or_none,
|
||||
parse_iso8601,
|
||||
parse_duration,
|
||||
)
|
||||
|
||||
|
||||
@ -70,8 +75,8 @@ class NHLBaseInfoExtractor(InfoExtractor):
|
||||
return ret
|
||||
|
||||
|
||||
class NHLIE(NHLBaseInfoExtractor):
|
||||
IE_NAME = 'nhl.com'
|
||||
class NHLVideocenterIE(NHLBaseInfoExtractor):
|
||||
IE_NAME = 'nhl.com:videocenter'
|
||||
_VALID_URL = r'https?://video(?P<team>\.[^.]*)?\.nhl\.com/videocenter/(?:console|embed)?(?:\?(?:.*?[?&])?)(?:id|hlg|playlist)=(?P<id>[-0-9a-zA-Z,]+)'
|
||||
|
||||
_TESTS = [{
|
||||
@ -186,8 +191,8 @@ class NHLNewsIE(NHLBaseInfoExtractor):
|
||||
return self._real_extract_video(video_id)
|
||||
|
||||
|
||||
class NHLVideocenterIE(NHLBaseInfoExtractor):
|
||||
IE_NAME = 'nhl.com:videocenter'
|
||||
class NHLVideocenterCategoryIE(NHLBaseInfoExtractor):
|
||||
IE_NAME = 'nhl.com:videocenter:category'
|
||||
IE_DESC = 'NHL videocenter category'
|
||||
_VALID_URL = r'https?://video\.(?P<team>[^.]*)\.nhl\.com/videocenter/(console\?[^(id=)]*catid=(?P<catid>[0-9]+)(?![&?]id=).*?)?$'
|
||||
_TEST = {
|
||||
@ -236,3 +241,86 @@ class NHLVideocenterIE(NHLBaseInfoExtractor):
|
||||
'id': cat_id,
|
||||
'entries': [self._extract_video(v) for v in videos],
|
||||
}
|
||||
|
||||
|
||||
class NHLIE(InfoExtractor):
|
||||
IE_NAME = 'nhl.com'
|
||||
_VALID_URL = r'https?://(?:www\.)?nhl\.com/([^/]+/)*c-(?P<id>\d+)'
|
||||
_TESTS = [{
|
||||
# type=video
|
||||
'url': 'https://www.nhl.com/video/anisimov-cleans-up-mess/t-277752844/c-43663503',
|
||||
'md5': '0f7b9a8f986fb4b4eeeece9a56416eaf',
|
||||
'info_dict': {
|
||||
'id': '43663503',
|
||||
'ext': 'mp4',
|
||||
'title': 'Anisimov cleans up mess',
|
||||
'description': 'md5:a02354acdfe900e940ce40706939ca63',
|
||||
'timestamp': 1461288600,
|
||||
'upload_date': '20160422',
|
||||
},
|
||||
}, {
|
||||
# type=article
|
||||
'url': 'https://www.nhl.com/news/dennis-wideman-suspended/c-278258934',
|
||||
'md5': '1f39f4ea74c1394dea110699a25b366c',
|
||||
'info_dict': {
|
||||
'id': '40784403',
|
||||
'ext': 'mp4',
|
||||
'title': 'Wideman suspended by NHL',
|
||||
'description': 'Flames defenseman Dennis Wideman was banned 20 games for violation of Rule 40 (Physical Abuse of Officials)',
|
||||
'upload_date': '20160204',
|
||||
'timestamp': 1454544904,
|
||||
},
|
||||
}]
|
||||
|
||||
def _real_extract(self, url):
|
||||
tmp_id = self._match_id(url)
|
||||
video_data = self._download_json(
|
||||
'https://nhl.bamcontent.com/nhl/id/v1/%s/details/web-v1.json' % tmp_id,
|
||||
tmp_id)
|
||||
if video_data.get('type') == 'article':
|
||||
video_data = video_data['media']
|
||||
|
||||
video_id = compat_str(video_data['id'])
|
||||
title = video_data['title']
|
||||
|
||||
formats = []
|
||||
for playback in video_data.get('playbacks', []):
|
||||
playback_url = playback.get('url')
|
||||
if not playback_url:
|
||||
continue
|
||||
ext = determine_ext(playback_url)
|
||||
if ext == 'm3u8':
|
||||
formats.extend(self._extract_m3u8_formats(
|
||||
playback_url, video_id, 'mp4', 'm3u8_native',
|
||||
m3u8_id=playback.get('name', 'hls'), fatal=False))
|
||||
else:
|
||||
height = int_or_none(playback.get('height'))
|
||||
formats.append({
|
||||
'format_id': playback.get('name', 'http' + ('-%dp' % height if height else '')),
|
||||
'url': playback_url,
|
||||
'width': int_or_none(playback.get('width')),
|
||||
'height': height,
|
||||
})
|
||||
self._sort_formats(formats, ('preference', 'width', 'height', 'tbr', 'format_id'))
|
||||
|
||||
thumbnails = []
|
||||
for thumbnail_id, thumbnail_data in video_data.get('image', {}).get('cuts', {}).items():
|
||||
thumbnail_url = thumbnail_data.get('src')
|
||||
if not thumbnail_url:
|
||||
continue
|
||||
thumbnails.append({
|
||||
'id': thumbnail_id,
|
||||
'url': thumbnail_url,
|
||||
'width': int_or_none(thumbnail_data.get('width')),
|
||||
'height': int_or_none(thumbnail_data.get('height')),
|
||||
})
|
||||
|
||||
return {
|
||||
'id': video_id,
|
||||
'title': title,
|
||||
'description': video_data.get('description'),
|
||||
'timestamp': parse_iso8601(video_data.get('date')),
|
||||
'duration': parse_duration(video_data.get('duration')),
|
||||
'thumbnails': thumbnails,
|
||||
'formats': formats,
|
||||
}
|
||||
|
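The new `NHLIE` builds its thumbnail list from the `image.cuts` mapping of the bamcontent JSON, skipping cuts that have no `src`. The sketch below mirrors that loop on a hand-made dict; the JSON shape is inferred from the extractor code rather than from any API documentation, and plain `int()` replaces `int_or_none`.

```python
def collect_thumbnails(video_data):
    """Turn video_data['image']['cuts'] into youtube-dl style thumbnail dicts."""
    thumbnails = []
    for thumbnail_id, cut in video_data.get('image', {}).get('cuts', {}).items():
        url = cut.get('src')
        if not url:
            continue  # a cut without an image URL is useless, skip it
        thumbnails.append({
            'id': thumbnail_id,
            'url': url,
            'width': int(cut['width']) if cut.get('width') else None,
            'height': int(cut['height']) if cut.get('height') else None,
        })
    return thumbnails


data = {'image': {'cuts': {
    '640x360': {'src': 'https://example.com/a.jpg', 'width': 640, 'height': 360},
    'empty': {},
}}}
print(collect_thumbnails(data))
```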
@ -4,7 +4,10 @@ from __future__ import unicode_literals
|
||||
import re
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..utils import determine_ext
|
||||
from ..utils import (
|
||||
determine_ext,
|
||||
int_or_none,
|
||||
)
|
||||
|
||||
|
||||
class OnionStudiosIE(InfoExtractor):
|
||||
@ -17,7 +20,7 @@ class OnionStudiosIE(InfoExtractor):
|
||||
'id': '2937',
|
||||
'ext': 'mp4',
|
||||
'title': 'Hannibal charges forward, stops for a cocktail',
|
||||
'description': 'md5:545299bda6abf87e5ec666548c6a9448',
|
||||
'description': 'md5:e786add7f280b7f0fe237b64cc73df76',
|
||||
'thumbnail': 're:^https?://.*\.jpg$',
|
||||
'uploader': 'The A.V. Club',
|
||||
'uploader_id': 'TheAVClub',
|
||||
@ -42,9 +45,19 @@ class OnionStudiosIE(InfoExtractor):
|
||||
|
||||
formats = []
|
||||
for src in re.findall(r'<source[^>]+src="([^"]+)"', webpage):
|
||||
if determine_ext(src) != 'm3u8': # m3u8 always results in 403
|
||||
ext = determine_ext(src)
|
||||
if ext == 'm3u8':
|
||||
formats.extend(self._extract_m3u8_formats(
|
||||
src, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
|
||||
else:
|
||||
height = int_or_none(self._search_regex(
|
||||
r'/(\d+)\.%s' % ext, src, 'height', default=None))
|
||||
formats.append({
|
||||
'format_id': ext + ('-%sp' % height if height else ''),
|
||||
'url': src,
|
||||
'height': height,
|
||||
'ext': ext,
|
||||
'preference': 1,
|
||||
})
|
||||
self._sort_formats(formats)
|
||||
|
||||
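For non-HLS sources the OnionStudios hunk derives the height from a `/<digits>.<ext>` suffix in the URL and encodes it in the format id. The sketch reproduces that branch; unlike the hunk it passes `ext` through `re.escape`, a small defensive change, and the sample URLs are invented.

```python
import re


def http_format(src, ext):
    """Build a format dict, reading the height from a '/<digits>.<ext>' URL suffix."""
    mobj = re.search(r'/(\d+)\.%s' % re.escape(ext), src)
    height = int(mobj.group(1)) if mobj else None
    return {
        'format_id': ext + ('-%sp' % height if height else ''),
        'url': src,
        'height': height,
        'ext': ext,
    }


print(http_format('https://example.com/videos/2937/720.mp4', 'mp4')['format_id'])  # mp4-720p
```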
@ -52,7 +65,7 @@ class OnionStudiosIE(InfoExtractor):
|
||||
r'share_title\s*=\s*(["\'])(?P<title>[^\1]+?)\1',
|
||||
webpage, 'title', group='title')
|
||||
description = self._search_regex(
|
||||
r'share_description\s*=\s*(["\'])(?P<description>[^\1]+?)\1',
|
||||
r'share_description\s*=\s*(["\'])(?P<description>[^\'"]+?)\1',
|
||||
webpage, 'description', default=None, group='description')
|
||||
thumbnail = self._search_regex(
|
||||
r'poster\s*=\s*(["\'])(?P<thumbnail>[^\1]+?)\1',
|
||||
|
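The OnionStudios description regex drops `[^\1]` in favour of `[^\'"]`. Inside a character class, `\1` is an octal escape for `chr(1)`, not a backreference, so `[^\1]` does not exclude the opening quote. The sketch shows the practical difference on an invented page fragment with an empty description. Note that the tighter class also rejects descriptions containing an apostrophe, which falls back to `default=None`.

```python
import re

# Description patterns before and after the OnionStudios change above.
OLD = r'share_description\s*=\s*(["\'])(?P<description>[^\1]+?)\1'
NEW = r'share_description\s*=\s*(["\'])(?P<description>[^\'"]+?)\1'


def description(pattern, page):
    mobj = re.search(pattern, page)
    return mobj.group('description') if mobj else None


# [^\1] means "any character except chr(1)", so a quote character is allowed.
print(re.match(r'[^\1]', '"') is not None)  # True

page = 'share_description = ""; share_url = "x"'
print(description(OLD, page))  # runs past the empty value into the next assignment
print(description(NEW, page))  # None
```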
@ -6,8 +6,10 @@ import re
|
||||
from .common import InfoExtractor
|
||||
from ..compat import compat_chr
|
||||
from ..utils import (
|
||||
determine_ext,
|
||||
encode_base_n,
|
||||
ExtractorError,
|
||||
mimetype2ext,
|
||||
)
|
||||
|
||||
|
||||
@ -29,6 +31,11 @@ class OpenloadIE(InfoExtractor):
|
||||
}, {
|
||||
'url': 'https://openload.io/f/ZAn6oz-VZGE/',
|
||||
'only_matching': True,
|
||||
}, {
|
||||
# unavailable via https://openload.co/f/Sxz5sADo82g/, different layout
|
||||
# for title and ext
|
||||
'url': 'https://openload.co/embed/Sxz5sADo82g/',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
@staticmethod
|
||||
@ -96,12 +103,25 @@ class OpenloadIE(InfoExtractor):
|
||||
r'<video[^>]+>\s*<script[^>]+>([^<]+)</script>',
|
||||
webpage, 'JS code')
|
||||
|
||||
decoded = self.openload_decode(code)
|
||||
|
||||
video_url = self._search_regex(
|
||||
r'return\s+"(https?://[^"]+)"', self.openload_decode(code), 'video URL')
|
||||
r'return\s+"(https?://[^"]+)"', decoded, 'video URL')
|
||||
|
||||
title = self._og_search_title(webpage, default=None) or self._search_regex(
|
||||
r'<span[^>]+class=["\']title["\'][^>]*>([^<]+)', webpage,
|
||||
'title', default=None) or self._html_search_meta(
|
||||
'description', webpage, 'title', fatal=True)
|
||||
|
||||
ext = mimetype2ext(self._search_regex(
|
||||
r'window\.vt\s*=\s*(["\'])(?P<mimetype>.+?)\1', decoded,
|
||||
'mimetype', default=None, group='mimetype')) or determine_ext(
|
||||
video_url, 'mp4')
|
||||
|
||||
return {
|
||||
'id': video_id,
|
||||
'title': self._og_search_title(webpage),
|
||||
'thumbnail': self._og_search_thumbnail(webpage),
|
||||
'title': title,
|
||||
'ext': ext,
|
||||
'thumbnail': self._og_search_thumbnail(webpage, default=None),
|
||||
'url': video_url,
|
||||
}
|
||||
|
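The Openload hunk now prefers the mimetype announced in the decoded script (`window.vt = '...'`) and only falls back to the URL extension. The sketch models that order with a tiny, hypothetical MIME table and a simplified suffix check; the real `mimetype2ext` and `determine_ext` in `youtube_dl.utils` cover far more cases and behave differently for unknown types.

```python
import re
from urllib.parse import urlparse

# Hypothetical subset of a MIME-type table (youtube-dl's mimetype2ext knows many more).
MIME_TO_EXT = {'video/mp4': 'mp4', 'video/webm': 'webm', 'video/x-flv': 'flv'}


def guess_ext(decoded_js, video_url, default='mp4'):
    """Prefer `window.vt = '<mimetype>'` from the decoded JS, else the URL suffix."""
    mobj = re.search(r'window\.vt\s*=\s*(["\'])(?P<mimetype>.+?)\1', decoded_js)
    if mobj and mobj.group('mimetype') in MIME_TO_EXT:
        return MIME_TO_EXT[mobj.group('mimetype')]
    _, dot, suffix = urlparse(video_url).path.rpartition('.')
    return suffix if dot and suffix.isalnum() else default


print(guess_ext("window.vt = 'video/webm';", 'https://example.com/v'))  # webm
```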
32
youtube_dl/extractor/people.py
Normal file
@ -0,0 +1,32 @@
|
||||
# coding: utf-8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
from .common import InfoExtractor
|
||||
|
||||
|
||||
class PeopleIE(InfoExtractor):
|
||||
_VALID_URL = r'https?://(?:www\.)?people\.com/people/videos/0,,(?P<id>\d+),00\.html'
|
||||
|
||||
_TEST = {
|
||||
'url': 'http://www.people.com/people/videos/0,,20995451,00.html',
|
||||
'info_dict': {
|
||||
'id': 'ref:20995451',
|
||||
'ext': 'mp4',
|
||||
'title': 'Astronaut Love Triangle Victim Speaks Out: “The Crime in 2007 Hasn’t Defined Us”',
|
||||
'description': 'Colleen Shipman speaks to PEOPLE for the first time about life after the attack',
|
||||
'thumbnail': 're:^https?://.*\.jpg',
|
||||
'duration': 246.318,
|
||||
'timestamp': 1458720585,
|
||||
'upload_date': '20160323',
|
||||
'uploader_id': '416418724',
|
||||
},
|
||||
'params': {
|
||||
'skip_download': True,
|
||||
},
|
||||
'add_ie': ['BrightcoveNew'],
|
||||
}
|
||||
|
||||
def _real_extract(self, url):
|
||||
return self.url_result(
|
||||
'http://players.brightcove.net/416418724/default_default/index.html?videoId=ref:%s'
|
||||
% self._match_id(url), 'BrightcoveNew')
|
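The new `PeopleIE` does no extraction of its own: it maps the numeric id from the people.com URL onto a Brightcove player URL with a `ref:` prefix and delegates to `BrightcoveNew`. The sketch reproduces that mapping with the pattern and template copied from the hunk; `brightcove_url` is a hypothetical name.

```python
import re

VALID_URL = r'https?://(?:www\.)?people\.com/people/videos/0,,(?P<id>\d+),00\.html'
# Template copied from the PeopleIE hunk; 416418724 is the account id used there.
BRIGHTCOVE_TEMPLATE = ('http://players.brightcove.net/416418724/default_default/'
                       'index.html?videoId=ref:%s')


def brightcove_url(page_url):
    """Map a people.com video page to the Brightcove player URL PeopleIE delegates to."""
    video_id = re.match(VALID_URL, page_url).group('id')
    return BRIGHTCOVE_TEMPLATE % video_id


print(brightcove_url('http://www.people.com/people/videos/0,,20995451,00.html'))
```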
@ -1,61 +0,0 @@
|
||||
# coding: utf-8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
import re
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..utils import ExtractorError
|
||||
|
||||
|
||||
class PlanetaPlayIE(InfoExtractor):
|
||||
_VALID_URL = r'https?://(?:www\.)?planetaplay\.com/\?sng=(?P<id>[0-9]+)'
|
||||
_API_URL = 'http://planetaplay.com/action/playlist/?sng={0:}'
|
||||
_THUMBNAIL_URL = 'http://planetaplay.com/img/thumb/{thumb:}'
|
||||
_TEST = {
|
||||
'url': 'http://planetaplay.com/?sng=3586',
|
||||
'md5': '9d569dceb7251a4e01355d5aea60f9db',
|
||||
'info_dict': {
|
||||
'id': '3586',
|
||||
'ext': 'flv',
|
||||
'title': 'md5:e829428ee28b1deed00de90de49d1da1',
|
||||
},
|
||||
'skip': 'Not accessible from Travis CI server',
|
||||
}
|
||||
|
||||
_SONG_FORMATS = {
|
||||
'lq': (0, 'http://www.planetaplay.com/videoplayback/{med_hash:}'),
|
||||
'hq': (1, 'http://www.planetaplay.com/videoplayback/hi/{med_hash:}'),
|
||||
}
|
||||
|
||||
def _real_extract(self, url):
|
||||
mobj = re.match(self._VALID_URL, url)
|
||||
video_id = mobj.group('id')
|
||||
|
||||
response = self._download_json(
|
||||
self._API_URL.format(video_id), video_id)['response']
|
||||
try:
|
||||
data = response.get('data')[0]
|
||||
except IndexError:
|
||||
raise ExtractorError(
|
||||
'%s: failed to get the playlist' % self.IE_NAME, expected=True)
|
||||
|
||||
title = '{song_artists:} - {sng_name:}'.format(**data)
|
||||
thumbnail = self._THUMBNAIL_URL.format(**data)
|
||||
|
||||
formats = []
|
||||
for format_id, (quality, url_template) in self._SONG_FORMATS.items():
|
||||
formats.append({
|
||||
'format_id': format_id,
|
||||
'url': url_template.format(**data),
|
||||
'quality': quality,
|
||||
'ext': 'flv',
|
||||
})
|
||||
|
||||
self._sort_formats(formats)
|
||||
|
||||
return {
|
||||
'id': video_id,
|
||||
'title': title,
|
||||
'formats': formats,
|
||||
'thumbnail': thumbnail,
|
||||
}
|
@ -40,7 +40,7 @@ class Puls4IE(InfoExtractor):
|
||||
webpage = self._download_webpage(url, video_id)
|
||||
|
||||
error_message = self._html_search_regex(
|
||||
r'<div class="message-error">(.+?)</div>',
|
||||
r'<div[^>]+class="message-error"[^>]*>(.+?)</div>',
|
||||
webpage, 'error message', default=None)
|
||||
if error_message:
|
||||
raise ExtractorError(
|
||||
|
@ -1,54 +0,0 @@
from __future__ import unicode_literals

import re

from .common import InfoExtractor
from ..compat import (
    compat_urlparse,
)
from ..utils import (
    determine_ext,
    int_or_none,
)


class QuickVidIE(InfoExtractor):
    _VALID_URL = r'https?://(www\.)?quickvid\.org/watch\.php\?v=(?P<id>[a-zA-Z_0-9-]+)'
    _TEST = {
        'url': 'http://quickvid.org/watch.php?v=sUQT3RCG8dx',
        'md5': 'c0c72dd473f260c06c808a05d19acdc5',
        'info_dict': {
            'id': 'sUQT3RCG8dx',
            'ext': 'mp4',
            'title': 'Nick Offerman\'s Summer Reading Recap',
            'thumbnail': 're:^https?://.*\.(?:png|jpg|gif)$',
            'view_count': int,
        },
        'skip': 'Not accessible from Travis CI server',
    }

    def _real_extract(self, url):
        video_id = self._match_id(url)
        webpage = self._download_webpage(url, video_id)

        title = self._html_search_regex(r'<h2>(.*?)</h2>', webpage, 'title')
        view_count = int_or_none(self._html_search_regex(
            r'(?s)<div id="views">(.*?)</div>',
            webpage, 'view count', fatal=False))
        video_code = self._search_regex(
            r'(?s)<video id="video"[^>]*>(.*?)</video>', webpage, 'video code')
        formats = [
            {
                'url': compat_urlparse.urljoin(url, src),
                'format_id': determine_ext(src, None),
            } for src in re.findall('<source\s+src="([^"]+)"', video_code)
        ]
        self._sort_formats(formats)

        return {
            'id': video_id,
            'title': title,
            'formats': formats,
            'thumbnail': self._og_search_thumbnail(webpage),
            'view_count': view_count,
        }
@ -4,12 +4,18 @@ from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
    int_or_none,
    unescapeHTML,
    ExtractorError,
)


class RTBFIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?rtbf\.be/(?:video/[^?]+\?.*\bid=|ouftivi/(?:[^/]+/)*[^?]+\?.*\bvideoId=)(?P<id>\d+)'
    _VALID_URL = r'''(?x)
        https?://(?:www\.)?rtbf\.be/
        (?:
            video/[^?]+\?.*\bid=|
            ouftivi/(?:[^/]+/)*[^?]+\?.*\bvideoId=|
            auvio/[^/]+\?.*id=
        )(?P<id>\d+)'''
    _TESTS = [{
        'url': 'https://www.rtbf.be/video/detail_les-diables-au-coeur-episode-2?id=1921274',
        'md5': '799f334ddf2c0a582ba80c44655be570',
@ -17,7 +23,11 @@ class RTBFIE(InfoExtractor):
            'id': '1921274',
            'ext': 'mp4',
            'title': 'Les Diables au coeur (épisode 2)',
            'description': 'Football - Diables Rouges',
            'duration': 3099,
            'upload_date': '20140425',
            'timestamp': 1398456336,
            'uploader': 'rtbfsport',
        }
    }, {
        # geo restricted
@ -26,45 +36,63 @@ class RTBFIE(InfoExtractor):
    }, {
        'url': 'http://www.rtbf.be/ouftivi/niouzz?videoId=2055858',
        'only_matching': True,
    }, {
        'url': 'http://www.rtbf.be/auvio/detail_jeudi-en-prime-siegfried-bracke?id=2102996',
        'only_matching': True,
    }]

    _IMAGE_HOST = 'http://ds1.ds.static.rtbf.be'
    _PROVIDERS = {
        'YOUTUBE': 'Youtube',
        'DAILYMOTION': 'Dailymotion',
        'VIMEO': 'Vimeo',
    }
    _QUALITIES = [
        ('mobile', 'mobile'),
        ('web', 'SD'),
        ('url', 'MD'),
        ('mobile', 'SD'),
        ('web', 'MD'),
        ('high', 'HD'),
    ]

    def _real_extract(self, url):
        video_id = self._match_id(url)
        data = self._download_json(
            'http://www.rtbf.be/api/media/video?method=getVideoDetail&args[]=%s' % video_id, video_id)

        webpage = self._download_webpage(
            'http://www.rtbf.be/video/embed?id=%s' % video_id, video_id)
        error = data.get('error')
        if error:
            raise ExtractorError('%s said: %s' % (self.IE_NAME, error), expected=True)

        data = self._parse_json(
            unescapeHTML(self._search_regex(
                r'data-media="([^"]+)"', webpage, 'data video')),
            video_id)
        data = data['data']

        provider = data.get('provider')
        if provider in self._PROVIDERS:
            return self.url_result(data['url'], self._PROVIDERS[provider])

        if data.get('provider').lower() == 'youtube':
            video_url = data.get('downloadUrl') or data.get('url')
            return self.url_result(video_url, 'Youtube')
        formats = []
        for key, format_id in self._QUALITIES:
            format_url = data['sources'].get(key)
            format_url = data.get(key + 'Url')
            if format_url:
                formats.append({
                    'format_id': format_id,
                    'url': format_url,
                })

        thumbnails = []
        for thumbnail_id, thumbnail_url in data.get('thumbnail', {}).items():
            if thumbnail_id != 'default':
                thumbnails.append({
                    'url': self._IMAGE_HOST + thumbnail_url,
                    'id': thumbnail_id,
                })

        return {
            'id': video_id,
            'formats': formats,
            'title': data['title'],
            'description': data.get('description') or data.get('subtitle'),
            'thumbnail': data.get('thumbnail'),
            'thumbnails': thumbnails,
            'duration': data.get('duration') or data.get('realDuration'),
            'timestamp': int_or_none(data.get('created')),
            'view_count': int_or_none(data.get('viewCount')),
            'uploader': data.get('channel'),
            'tags': data.get('tags'),
        }
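This is a quick standalone check, not part of this commit, of the new verbose `_VALID_URL`. It copies the old and new patterns and the three test URLs from the RTBF hunk above. It confirms that `auvio/` pages now yield an id, while the previous single-line pattern rejects them. The helper name `rtbf_id` is hypothetical.

```python
import re

# Pattern removed by the commit (single line, no auvio/ branch).
OLD_RTBF_VALID_URL = r'https?://(?:www\.)?rtbf\.be/(?:video/[^?]+\?.*\bid=|ouftivi/(?:[^/]+/)*[^?]+\?.*\bvideoId=)(?P<id>\d+)'

# Pattern added by the commit, copied verbatim; (?x) makes whitespace insignificant.
RTBF_VALID_URL = r'''(?x)
    https?://(?:www\.)?rtbf\.be/
    (?:
        video/[^?]+\?.*\bid=|
        ouftivi/(?:[^/]+/)*[^?]+\?.*\bvideoId=|
        auvio/[^/]+\?.*id=
    )(?P<id>\d+)'''


def rtbf_id(url, pattern=RTBF_VALID_URL):
    """Return the captured numeric id, or None when the URL does not match."""
    m = re.match(pattern, url)
    return m.group('id') if m else None


for url in (
        'https://www.rtbf.be/video/detail_les-diables-au-coeur-episode-2?id=1921274',
        'http://www.rtbf.be/ouftivi/niouzz?videoId=2055858',
        'http://www.rtbf.be/auvio/detail_jeudi-en-prime-siegfried-bracke?id=2102996'):
    # New pattern first, then the old one for comparison.
    print(url, rtbf_id(url), rtbf_id(url, OLD_RTBF_VALID_URL))
```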
@ -6,6 +6,7 @@ import re
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import (
    js_to_json,
    unified_strdate,
)
@ -94,19 +95,32 @@ class SportBoxEmbedIE(InfoExtractor):
        webpage = self._download_webpage(url, video_id)

        hls = self._search_regex(
            r"sportboxPlayer\.jwplayer_common_params\.file\s*=\s*['\"]([^'\"]+)['\"]",
            webpage, 'hls file')
        formats = []

        def cleanup_js(code):
            # desktop_advert_config contains complex Javascripts and we don't need it
            return js_to_json(re.sub(r'desktop_advert_config.*', '', code))

        jwplayer_data = self._parse_json(self._search_regex(
            r'(?s)player\.setup\(({.+?})\);', webpage, 'jwplayer settings'), video_id,
            transform_source=cleanup_js)

        hls_url = jwplayer_data.get('hls_url')
        if hls_url:
            formats.extend(self._extract_m3u8_formats(
                hls_url, video_id, ext='mp4', m3u8_id='hls'))

        rtsp_url = jwplayer_data.get('rtsp_url')
        if rtsp_url:
            formats.append({
                'url': rtsp_url,
                'format_id': 'rtsp',
            })

        formats = self._extract_m3u8_formats(hls, video_id, 'mp4')
        self._sort_formats(formats)

        title = self._search_regex(
            r'sportboxPlayer\.node_title\s*=\s*"([^"]+)"', webpage, 'title')

        thumbnail = self._search_regex(
            r'sportboxPlayer\.jwplayer_common_params\.image\s*=\s*"([^"]+)"',
            webpage, 'thumbnail', default=None)
        title = jwplayer_data['node_title']
        thumbnail = jwplayer_data.get('image_url')

        return {
            'id': video_id,
@ -14,7 +14,6 @@ class StreetVoiceIE(InfoExtractor):
        'info_dict': {
            'id': '94440',
            'ext': 'mp3',
            'filesize': 4167053,
            'title': '輸',
            'description': 'Crispy脆樂團 - 輸',
            'thumbnail': 're:^https?://.*\.jpg$',
@ -32,20 +31,19 @@ class StreetVoiceIE(InfoExtractor):
        song_id = self._match_id(url)

        song = self._download_json(
            'http://streetvoice.com/music/api/song/%s' % song_id, song_id)
            'https://streetvoice.com/api/v1/public/song/%s/' % song_id, song_id, data=b'')

        title = song['name']
        author = song['musician']['name']
        author = song['user']['nickname']

        return {
            'id': song_id,
            'url': song['file'],
            'filesize': song.get('size'),
            'title': title,
            'description': '%s - %s' % (author, title),
            'thumbnail': self._proto_relative_url(song.get('image'), 'http:'),
            'duration': song.get('length'),
            'upload_date': unified_strdate(song.get('created_at')),
            'uploader': author,
            'uploader_id': compat_str(song['musician']['id']),
            'uploader_id': compat_str(song['user']['id']),
        }
33
youtube_dl/extractor/tdslifeway.py
Normal file
@ -0,0 +1,33 @@
from __future__ import unicode_literals

from .common import InfoExtractor


class TDSLifewayIE(InfoExtractor):
    _VALID_URL = r'https?://tds\.lifeway\.com/v1/trainingdeliverysystem/courses/(?P<id>\d+)/index\.html'

    _TEST = {
        # From http://www.ministrygrid.com/training-viewer/-/training/t4g-2014-conference/the-gospel-by-numbers-4/the-gospel-by-numbers
        'url': 'http://tds.lifeway.com/v1/trainingdeliverysystem/courses/3453494717001/index.html?externalRegistration=AssetId%7C34F466F1-78F3-4619-B2AB-A8EFFA55E9E9%21InstanceId%7C0%21UserId%7Caaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa&grouping=http%3A%2F%2Flifeway.com%2Fvideo%2F3453494717001&activity_id=http%3A%2F%2Flifeway.com%2Fvideo%2F3453494717001&content_endpoint=http%3A%2F%2Ftds.lifeway.com%2Fv1%2Ftrainingdeliverysystem%2FScormEngineInterface%2FTCAPI%2Fcontent%2F&actor=%7B%22name%22%3A%5B%22Guest%20Guest%22%5D%2C%22account%22%3A%5B%7B%22accountServiceHomePage%22%3A%22http%3A%2F%2Fscorm.lifeway.com%2F%22%2C%22accountName%22%3A%22aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa%22%7D%5D%2C%22objectType%22%3A%22Agent%22%7D&content_token=462a50b2-b6f9-4970-99b1-930882c499fb&registration=93d6ec8e-7f7b-4ed3-bbc8-a857913c0b2a&externalConfiguration=access%7CFREE%21adLength%7C-1%21assignOrgId%7C4AE36F78-299A-425D-91EF-E14A899B725F%21assignOrgParentId%7C%21courseId%7C%21isAnonymous%7Cfalse%21previewAsset%7Cfalse%21previewLength%7C-1%21previewMode%7Cfalse%21royalty%7CFREE%21sessionId%7C671422F9-8E79-48D4-9C2C-4EE6111EA1CD%21trackId%7C&auth=Basic%20OjhmZjk5MDBmLTBlYTMtNDJhYS04YjFlLWE4MWQ3NGNkOGRjYw%3D%3D&endpoint=http%3A%2F%2Ftds.lifeway.com%2Fv1%2Ftrainingdeliverysystem%2FScormEngineInterface%2FTCAPI%2F',
        'info_dict': {
            'id': '3453494717001',
            'ext': 'mp4',
            'title': 'The Gospel by Numbers',
            'thumbnail': 're:^https?://.*\.jpg',
            'upload_date': '20140410',
            'description': 'Coming soon from T4G 2014!',
            'uploader_id': '2034960640001',
            'timestamp': 1397145591,
        },
        'params': {
            # m3u8 download
            'skip_download': True,
        },
        'add_ie': ['BrightcoveNew'],
    }

    BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/2034960640001/default_default/index.html?videoId=%s'

    def _real_extract(self, url):
        brightcove_id = self._match_id(url)
        return self.url_result(self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, 'BrightcoveNew', brightcove_id)
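The new TDSLifeway extractor's `_real_extract` does two things: it matches the course id with `_VALID_URL` and substitutes it into `BRIGHTCOVE_URL_TEMPLATE`. The sketch below reproduces those two steps. It assumes the course id doubles as the Brightcove video id, which is what the extractor's `url_result(..., 'BrightcoveNew', brightcove_id)` call implies. The function name `brightcove_url` is hypothetical.

```python
import re

# Both constants copied from the new tdslifeway.py.
VALID_URL = r'https?://tds\.lifeway\.com/v1/trainingdeliverysystem/courses/(?P<id>\d+)/index\.html'
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/2034960640001/default_default/index.html?videoId=%s'


def brightcove_url(url):
    """Map a TDS course page URL to the Brightcove player URL, or None if it does not match."""
    m = re.match(VALID_URL, url)  # re.match ignores anything after index.html (the query string)
    if m is None:
        return None
    return BRIGHTCOVE_URL_TEMPLATE % m.group('id')


print(brightcove_url(
    'http://tds.lifeway.com/v1/trainingdeliverysystem/courses/3453494717001/index.html?externalRegistration=x'))
```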
@ -1,63 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals

import re

from .common import InfoExtractor


class TheOnionIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?theonion\.com/video/[^,]+,(?P<id>[0-9]+)/?'
    _TEST = {
        'url': 'http://www.theonion.com/video/man-wearing-mm-jacket-gods-image,36918/',
        'md5': '19eaa9a39cf9b9804d982e654dc791ee',
        'info_dict': {
            'id': '2133',
            'ext': 'mp4',
            'title': 'Man Wearing M&M Jacket Apparently Made In God\'s Image',
            'description': 'md5:cc12448686b5600baae9261d3e180910',
            'thumbnail': 're:^https?://.*\.jpg\?\d+$',
        }
    }

    def _real_extract(self, url):
        display_id = self._match_id(url)
        webpage = self._download_webpage(url, display_id)

        video_id = self._search_regex(
            r'"videoId":\s(\d+),', webpage, 'video ID')
        title = self._og_search_title(webpage)
        description = self._og_search_description(webpage)
        thumbnail = self._og_search_thumbnail(webpage)

        sources = re.findall(r'<source src="([^"]+)" type="([^"]+)"', webpage)
        formats = []
        for src, type_ in sources:
            if type_ == 'video/mp4':
                formats.append({
                    'format_id': 'mp4_sd',
                    'preference': 1,
                    'url': src,
                })
            elif type_ == 'video/webm':
                formats.append({
                    'format_id': 'webm_sd',
                    'preference': 0,
                    'url': src,
                })
            elif type_ == 'application/x-mpegURL':
                formats.extend(
                    self._extract_m3u8_formats(src, display_id, preference=-1))
            else:
                self.report_warning(
                    'Encountered unexpected format: %s' % type_)
        self._sort_formats(formats)

        return {
            'id': video_id,
            'display_id': display_id,
            'title': title,
            'formats': formats,
            'thumbnail': thumbnail,
            'description': description,
        }
@ -50,8 +50,6 @@ class ThePlatformBaseIE(OnceIE):
            else:
                formats.append(_format)

        self._sort_formats(formats)

        subtitles = self._parse_smil_subtitles(meta, default_ns)

        return formats, subtitles
@ -241,6 +239,7 @@ class ThePlatformIE(ThePlatformBaseIE):
            smil_url = self._sign_url(smil_url, sig['key'], sig['secret'])

        formats, subtitles = self._extract_theplatform_smil(smil_url, video_id)
        self._sort_formats(formats)

        ret = self.get_metadata(path, video_id)
        combined_subtitles = self._merge_subtitles(ret.get('subtitles', {}), subtitles)
@ -270,6 +269,7 @@ class ThePlatformFeedIE(ThePlatformBaseIE):
            'timestamp': 1391824260,
            'duration': 467.0,
            'categories': ['MSNBC/Issues/Democrats', 'MSNBC/Issues/Elections/Election 2016'],
            'uploader': 'NBCU-NEWS',
        },
    }
@ -1,7 +1,6 @@
# coding: utf-8
from __future__ import unicode_literals

import codecs
import re

from .common import InfoExtractor
@ -10,22 +9,24 @@ from ..utils import (
    int_or_none,
    sanitized_Request,
    urlencode_postdata,
    parse_iso8601,
)


class TubiTvIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?tubitv\.com/video\?id=(?P<id>[0-9]+)'
    _VALID_URL = r'https?://(?:www\.)?tubitv\.com/video/(?P<id>[0-9]+)'
    _LOGIN_URL = 'http://tubitv.com/login'
    _NETRC_MACHINE = 'tubitv'
    _TEST = {
        'url': 'http://tubitv.com/video?id=54411&title=The_Kitchen_Musical_-_EP01',
        'url': 'http://tubitv.com/video/283829/the_comedian_at_the_friday',
        'info_dict': {
            'id': '54411',
            'id': '283829',
            'ext': 'mp4',
            'title': 'The Kitchen Musical - EP01',
            'thumbnail': 're:^https?://.*\.png$',
            'description': 'md5:37532716166069b353e8866e71fefae7',
            'duration': 2407,
            'title': 'The Comedian at The Friday',
            'description': 'A stand up comedian is forced to look at the decisions in his life while on a one week trip to the west coast.',
            'uploader': 'Indie Rights Films',
            'upload_date': '20160111',
            'timestamp': 1452555979,
        },
        'params': {
            'skip_download': 'HLS download',
@ -55,27 +56,31 @@ class TubiTvIE(InfoExtractor):
    def _real_extract(self, url):
        video_id = self._match_id(url)
        video_data = self._download_json(
            'http://tubitv.com/oz/videos/%s/content' % video_id, video_id)
        title = video_data['n']

        webpage = self._download_webpage(url, video_id)
        if re.search(r"<(?:DIV|div) class='login-required-screen'>", webpage):
            self.raise_login_required('This video requires login')

        title = self._og_search_title(webpage)
        description = self._og_search_description(webpage)
        thumbnail = self._og_search_thumbnail(webpage)
        duration = int_or_none(self._html_search_meta(
            'video:duration', webpage, 'duration'))

        apu = self._search_regex(r"apu='([^']+)'", webpage, 'apu')
        m3u8_url = codecs.decode(apu, 'rot_13')[::-1]
        formats = self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4')
        formats = self._extract_m3u8_formats(
            video_data['mh'], video_id, 'mp4', 'm3u8_native')
        self._sort_formats(formats)

        subtitles = {}
        for sub in video_data.get('sb', []):
            sub_url = sub.get('u')
            if not sub_url:
                continue
            subtitles.setdefault(sub.get('l', 'en'), []).append({
                'url': sub_url,
            })

        return {
            'id': video_id,
            'title': title,
            'formats': formats,
            'thumbnail': thumbnail,
            'description': description,
            'duration': duration,
            'subtitles': subtitles,
            'thumbnail': video_data.get('ph'),
            'description': video_data.get('d'),
            'duration': int_or_none(video_data.get('s')),
            'timestamp': parse_iso8601(video_data.get('u')),
            'uploader': video_data.get('on'),
        }
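The removed TubiTv code recovered the m3u8 URL from the page's `apu` value in two steps: it ROT13-decoded the value, then reversed the string. This sketch reproduces those steps with `codecs`. `encode_apu` is a hypothetical inverse used only to build sample input, and the sample URL is made up.

```python
import codecs


def decode_apu(apu):
    # The two steps from the removed line: undo ROT13, then reverse the string.
    return codecs.decode(apu, 'rot_13')[::-1]


def encode_apu(url):
    # Hypothetical inverse, used only to build sample input for decode_apu.
    return codecs.encode(url[::-1], 'rot_13')


apu = encode_apu('http://example.com/playlist.m3u8')
print(apu)              # 8h3z.gfvylnyc/zbp.rycznkr//:cggu
print(decode_apu(apu))  # http://example.com/playlist.m3u8
```

ROT13 only rotates ASCII letters, so digits and URL punctuation pass through unchanged and only the reversal affects them.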
@ -65,6 +65,9 @@ class TudouIE(InfoExtractor):
        if quality:
            info_url += '&hd' + quality
        xml_data = self._download_xml(info_url, video_id, 'Opening the info XML page')
        error = xml_data.attrib.get('error')
        if error is not None:
            raise ExtractorError('Tudou said: %s' % error, expected=True)
        final_url = xml_data.text
        return final_url
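The Tudou change raises early when the info XML carries an `error` attribute on its root element. Otherwise it returns the root's text as before. Below is a sketch of the same check using `xml.etree.ElementTree`. The XML payloads are hypothetical, and `ExtractorError` is a local stand-in for youtube-dl's class.

```python
import xml.etree.ElementTree as ET


class ExtractorError(Exception):
    """Local stand-in for youtube-dl's ExtractorError."""


def final_url_from_info_xml(xml_text):
    root = ET.fromstring(xml_text)
    # Mirrors the added check: any 'error' attribute on the root aborts extraction.
    error = root.attrib.get('error')
    if error is not None:
        raise ExtractorError('Tudou said: %s' % error)
    # Otherwise the root element's text is the media URL (final_url = xml_data.text).
    return root.text


print(final_url_from_info_xml('<v>http://example.com/video.f4v</v>'))
try:
    final_url_from_info_xml('<v error="video not found"/>')
except ExtractorError as e:
    print(e)
```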
@ -58,7 +58,9 @@ class TvigleIE(InfoExtractor):
        if not video_id:
            webpage = self._download_webpage(url, display_id)
            video_id = self._html_search_regex(
                r'class="video-preview current_playing" id="(\d+)">',
                (r'<div[^>]+class=["\']player["\'][^>]+id=["\'](\d+)',
                 r'var\s+cloudId\s*=\s*["\'](\d+)',
                 r'class="video-preview current_playing" id="(\d+)"'),
                webpage, 'video id')

        video_data = self._download_json(
@ -81,10 +83,10 @@ class TvigleIE(InfoExtractor):
        formats = []
        for vcodec, fmts in item['videos'].items():
            if vcodec == 'hls':
                continue
            for format_id, video_url in fmts.items():
                if format_id == 'm3u8':
                    formats.extend(self._extract_m3u8_formats(
                        video_url, video_id, 'mp4', m3u8_id=vcodec))
                    continue
                height = self._search_regex(
                    r'^(\d+)[pP]$', format_id, 'height', default=None)
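The Tvigle change passes a tuple of regexes so the video id can be found in several page layouts. As I understand youtube-dl's `_search_regex`, the patterns in such a tuple are tried in order and the first match wins. Here is a stdlib sketch of that lookup. The `search_first` helper and the sample markup are hypothetical.

```python
import re

# The three patterns from the Tvigle hunk, in the same order.
VIDEO_ID_PATTERNS = (
    r'<div[^>]+class=["\']player["\'][^>]+id=["\'](\d+)',
    r'var\s+cloudId\s*=\s*["\'](\d+)',
    r'class="video-preview current_playing" id="(\d+)"',
)


def search_first(patterns, text):
    """Return group 1 of the first pattern that matches, trying them in order."""
    for pattern in patterns:
        m = re.search(pattern, text)
        if m:
            return m.group(1)
    return None


print(search_first(VIDEO_ID_PATTERNS, '<script>var cloudId = "5080193";</script>'))  # 5080193
```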
@ -260,6 +260,17 @@ class TwitterIE(InfoExtractor):
            'upload_date': '20140615',
        },
        'add_ie': ['Vine'],
    }, {
        'url': 'https://twitter.com/captainamerica/status/719944021058060289',
        # md5 constantly changes
        'info_dict': {
            'id': '719944021058060289',
            'ext': 'mp4',
            'title': 'Captain America - @King0fNerd Are you sure you made the right choice? Find out in theaters.',
            'description': 'Captain America on Twitter: "@King0fNerd Are you sure you made the right choice? Find out in theaters. https://t.co/GpgYi9xMJI"',
            'uploader_id': 'captainamerica',
            'uploader': 'Captain America',
        },
    }]

    def _real_extract(self, url):
@ -284,17 +295,6 @@ class TwitterIE(InfoExtractor):
            'title': username + ' - ' + title,
        }

        card_id = self._search_regex(
            r'["\']/i/cards/tfw/v1/(\d+)', webpage, 'twitter card url', default=None)
        if card_id:
            card_url = 'https://twitter.com/i/cards/tfw/v1/' + card_id
            info.update({
                '_type': 'url_transparent',
                'ie_key': 'TwitterCard',
                'url': card_url,
            })
            return info

        mobj = re.search(r'''(?x)
            <video[^>]+class="animated-gif"(?P<more_info>[^>]+)>\s*
                <source[^>]+video-src="(?P<url>[^"]+)"
@ -1,57 +0,0 @@
from __future__ import unicode_literals

import re

from .common import InfoExtractor
from ..utils import (
    int_or_none,
    qualities,
)


class UbuIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?ubu\.com/film/(?P<id>[\da-z_-]+)\.html'
    _TEST = {
        'url': 'http://ubu.com/film/her_noise.html',
        'md5': '138d5652618bf0f03878978db9bef1ee',
        'info_dict': {
            'id': 'her_noise',
            'ext': 'm4v',
            'title': 'Her Noise - The Making Of (2007)',
            'duration': 3600,
        },
    }

    def _real_extract(self, url):
        video_id = self._match_id(url)
        webpage = self._download_webpage(url, video_id)

        title = self._html_search_regex(
            r'<title>.+?Film & Video: ([^<]+)</title>', webpage, 'title')

        duration = int_or_none(self._html_search_regex(
            r'Duration: (\d+) minutes', webpage, 'duration', fatal=False),
            invscale=60)

        formats = []
        FORMAT_REGEXES = [
            ('sq', r"'flashvars'\s*,\s*'file=([^']+)'"),
            ('hq', r'href="(http://ubumexico\.centro\.org\.mx/video/[^"]+)"'),
        ]
        preference = qualities([fid for fid, _ in FORMAT_REGEXES])
        for format_id, format_regex in FORMAT_REGEXES:
            m = re.search(format_regex, webpage)
            if m:
                formats.append({
                    'url': m.group(1),
                    'format_id': format_id,
                    'preference': preference(format_id),
                })
        self._sort_formats(formats)

        return {
            'id': video_id,
            'title': title,
            'duration': duration,
            'formats': formats,
        }
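The removed Ubu extractor ranked its two formats with `qualities([...])`. This is a sketch of that helper's behaviour as I understand it: a format id's position in the list becomes its preference, and unknown ids get -1. It is a standalone approximation, not an import of youtube-dl's `utils`.

```python
def qualities(quality_ids):
    """Return a function mapping a format id to its index in quality_ids (-1 if absent)."""
    def q(qid):
        try:
            return quality_ids.index(qid)
        except ValueError:
            return -1
    return q


preference = qualities(['sq', 'hq'])
print(preference('sq'), preference('hq'), preference('other'))  # 0 1 -1
```

Because later entries get higher numbers, listing formats from worst to best (as `FORMAT_REGEXES` does with `sq` before `hq`) makes the better format preferred.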
@ -41,6 +41,12 @@ class UstreamIE(InfoExtractor):
            'uploader': 'sportscanadatv',
        },
        'skip': 'This Pro Broadcaster has chosen to remove this video from the ustream.tv site.',
    }, {
        'url': 'http://www.ustream.tv/embed/10299409',
        'info_dict': {
            'id': '10299409',
        },
        'playlist_count': 3,
    }]

    def _real_extract(self, url):
@ -55,10 +61,12 @@ class UstreamIE(InfoExtractor):
        if m.group('type') == 'embed':
            video_id = m.group('id')
            webpage = self._download_webpage(url, video_id)
            desktop_video_id = self._html_search_regex(
                r'ContentVideoIds=\["([^"]*?)"\]', webpage, 'desktop_video_id')
            desktop_url = 'http://www.ustream.tv/recorded/' + desktop_video_id
            return self.url_result(desktop_url, 'Ustream')
            content_video_ids = self._parse_json(self._search_regex(
                r'ustream\.vars\.offAirContentVideoIds=([^;]+);', webpage,
                'content video IDs'), video_id)
            return self.playlist_result(
                map(lambda u: self.url_result('http://www.ustream.tv/recorded/' + u, 'Ustream'), content_video_ids),
                video_id)

        params = self._download_json(
            'https://api.ustream.tv/videos/%s.json' % video_id, video_id)
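For `/embed/` URLs, the new Ustream code reads the `ustream.vars.offAirContentVideoIds` array and turns each id into a `/recorded/` URL for a playlist. This is a stdlib sketch of that mapping. The page fragment and ids are hypothetical. A list comprehension stands in for `map` so the result is a concrete list on Python 3.

```python
import json
import re

RECORDED_URL = 'http://www.ustream.tv/recorded/'


def embed_playlist_urls(webpage):
    """Turn the offAirContentVideoIds array on an embed page into /recorded/ URLs."""
    m = re.search(r'ustream\.vars\.offAirContentVideoIds=([^;]+);', webpage)
    if not m:
        return []
    # str() guards against the ids being JSON numbers (an assumption; the hunk
    # concatenates them directly, which implies they are strings).
    return [RECORDED_URL + str(video_id) for video_id in json.loads(m.group(1))]


page = '<script>ustream.vars.offAirContentVideoIds=["67419","67420","67421"];</script>'
print(embed_playlist_urls(page))
```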
@ -2,11 +2,19 @@
from __future__ import unicode_literals

from .common import InfoExtractor
from ..compat import (
    compat_urllib_parse_urlparse,
    compat_parse_qs,
)
from ..utils import (
    clean_html,
    remove_start,
)


class Varzesh3IE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?video\.varzesh3\.com/(?:[^/]+/)+(?P<id>[^/]+)/?'
    _TEST = {
    _TESTS = [{
        'url': 'http://video.varzesh3.com/germany/bundesliga/5-%D9%88%D8%A7%DA%A9%D9%86%D8%B4-%D8%A8%D8%B1%D8%AA%D8%B1-%D8%AF%D8%B1%D9%88%D8%A7%D8%B2%D9%87%E2%80%8C%D8%A8%D8%A7%D9%86%D8%A7%D9%86%D8%9B%D9%87%D9%81%D8%AA%D9%87-26-%D8%A8%D9%88%D9%86%D8%AF%D8%B3/',
        'md5': '2a933874cb7dce4366075281eb49e855',
        'info_dict': {
@ -15,8 +23,19 @@ class Varzesh3IE(InfoExtractor):
            'title': '۵ واکنش برتر دروازهبانان؛هفته ۲۶ بوندسلیگا',
            'description': 'فصل ۲۰۱۵-۲۰۱۴',
            'thumbnail': 're:^https?://.*\.jpg$',
        }
    }
        },
        'skip': 'HTTP 404 Error',
    }, {
        'url': 'http://video.varzesh3.com/video/112785/%D8%AF%D9%84%D9%87-%D8%B9%D9%84%DB%8C%D8%9B-%D8%B3%D8%AA%D8%A7%D8%B1%D9%87-%D9%86%D9%88%D8%B8%D9%87%D9%88%D8%B1-%D9%84%DB%8C%DA%AF-%D8%A8%D8%B1%D8%AA%D8%B1-%D8%AC%D8%B2%DB%8C%D8%B1%D9%87',
        'md5': '841b7cd3afbc76e61708d94e53a4a4e7',
        'info_dict': {
            'id': '112785',
            'ext': 'mp4',
            'title': 'دله علی؛ ستاره نوظهور لیگ برتر جزیره',
            'description': 'فوتبال 120',
        },
        'expected_warnings': ['description'],
    }]

    def _real_extract(self, url):
        display_id = self._match_id(url)
@ -26,15 +45,30 @@ class Varzesh3IE(InfoExtractor):
        video_url = self._search_regex(
            r'<source[^>]+src="([^"]+)"', webpage, 'video url')

        title = self._og_search_title(webpage)
        title = remove_start(self._html_search_regex(
            r'<title>([^<]+)</title>', webpage, 'title'), 'ویدیو ورزش 3 | ')

        description = self._html_search_regex(
            r'(?s)<div class="matn">(.+?)</div>',
            webpage, 'description', fatal=False)
        thumbnail = self._og_search_thumbnail(webpage)
            webpage, 'description', default=None)
        if description is None:
            description = clean_html(self._html_search_meta('description', webpage))

        thumbnail = self._og_search_thumbnail(webpage, default=None)
        if thumbnail is None:
            fb_sharer_url = self._search_regex(
                r'<a[^>]+href="(https?://www\.facebook\.com/sharer/sharer\.php?[^"]+)"',
                webpage, 'facebook sharer URL', fatal=False)
            sharer_params = compat_parse_qs(compat_urllib_parse_urlparse(fb_sharer_url).query)
            thumbnail = sharer_params.get('p[images][0]', [None])[0]

        video_id = self._search_regex(
            r"<link[^>]+rel='(?:canonical|shortlink)'[^>]+href='/\?p=([^']+)'",
            webpage, display_id, default=display_id)
            webpage, display_id, default=None)
        if video_id is None:
            video_id = self._search_regex(
                'var\s+VideoId\s*=\s*(\d+);', webpage, 'video id',
                default=display_id)

        return {
            'url': video_url,
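When `og:image` is missing, the new Varzesh3 code falls back to the `p[images][0]` parameter of the page's Facebook sharer link. This sketch uses `urllib.parse.urlparse` and `urllib.parse.parse_qs`, the Python 3 functions that `compat_urllib_parse_urlparse` and `compat_parse_qs` resolve to. The sharer URL is made up.

```python
from urllib.parse import parse_qs, urlparse


def thumbnail_from_sharer(sharer_url):
    """Return the p[images][0] query parameter of a Facebook sharer URL, or None."""
    # parse_qs percent-decodes both keys and values; the brackets are part of the key.
    params = parse_qs(urlparse(sharer_url).query)
    return params.get('p[images][0]', [None])[0]


sharer_url = ('https://www.facebook.com/sharer/sharer.php?s=100'
              '&p[url]=http%3A%2F%2Fvideo.varzesh3.com%2Fvideo%2F112785'
              '&p[images][0]=http%3A%2F%2Fexample.com%2Fthumb.jpg')
print(thumbnail_from_sharer(sharer_url))  # http://example.com/thumb.jpg
```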
@ -3,7 +3,6 @@ from __future__ import unicode_literals
import re

from .common import InfoExtractor
from .ooyala import OoyalaIE
from ..utils import ExtractorError
@ -14,13 +13,21 @@ class ViceIE(InfoExtractor):
        'url': 'http://www.vice.com/video/cowboy-capitalists-part-1',
        'info_dict': {
            'id': '43cW1mYzpia9IlestBjVpd23Yu3afAfp',
            'ext': 'mp4',
            'ext': 'flv',
            'title': 'VICE_COWBOYCAPITALISTS_PART01_v1_VICE_WM_1080p.mov',
            'duration': 725.983,
        },
        'params': {
            # Requires ffmpeg (m3u8 manifest)
            'skip_download': True,
    }, {
        'url': 'http://www.vice.com/video/how-to-hack-a-car',
        'md5': '6fb2989a3fed069fb8eab3401fc2d3c9',
        'info_dict': {
            'id': '3jstaBeXgAs',
            'ext': 'mp4',
            'title': 'How to Hack a Car: Phreaked Out (Episode 2)',
            'description': 'md5:ee95453f7ff495db8efe14ae8bf56f30',
            'uploader_id': 'MotherboardTV',
            'uploader': 'Motherboard',
            'upload_date': '20140529',
        },
    }, {
        'url': 'https://news.vice.com/video/experimenting-on-animals-inside-the-monkey-lab',
|
        try:
            embed_code = self._search_regex(
                r'embedCode=([^&\'"]+)', webpage,
                'ooyala embed code')
            ooyala_url = OoyalaIE._url_for_embed_code(embed_code)
                'ooyala embed code', default=None)
            if embed_code:
                return self.url_result('ooyala:%s' % embed_code, 'Ooyala')
            youtube_id = self._search_regex(
                r'data-youtube-id="([^"]+)"', webpage, 'youtube id')
            return self.url_result(youtube_id, 'Youtube')
        except ExtractorError:
            raise ExtractorError('The page doesn\'t contain a video', expected=True)
        return self.url_result(ooyala_url, ie='Ooyala')


class ViceShowIE(InfoExtractor):
@ -1,6 +1,8 @@
# coding: utf-8
from __future__ import unicode_literals

import re

from .common import InfoExtractor
from ..compat import (
    compat_HTTPError,
@ -14,6 +16,7 @@ from ..utils import (
    parse_iso8601,
    sanitized_Request,
    HEADRequest,
    url_basename,
)
@ -114,6 +117,7 @@ class ViewsterIE(InfoExtractor):
                return self.playlist_result(entries, video_id, title, description)

        formats = []
        manifest_url = None
        for media_type in ('application/f4m+xml', 'application/x-mpegURL', 'video/mp4'):
            media = self._download_json(
                'https://public-api.viewster.com/movies/%s/video?mediaType=%s'
@ -126,29 +130,42 @@ class ViewsterIE(InfoExtractor):
                continue
            ext = determine_ext(video_url)
            if ext == 'f4m':
                manifest_url = video_url
                video_url += '&' if '?' in video_url else '?'
                video_url += 'hdcore=3.2.0&plugin=flowplayer-3.2.0.1'
                formats.extend(self._extract_f4m_formats(
                    video_url, video_id, f4m_id='hds'))
            elif ext == 'm3u8':
                manifest_url = video_url
                m3u8_formats = self._extract_m3u8_formats(
                    video_url, video_id, 'mp4', m3u8_id='hls',
                    fatal=False) # m3u8 sometimes fail
                if m3u8_formats:
                    formats.extend(m3u8_formats)
            else:
                format_id = media.get('Bitrate')
                f = {
                    'url': video_url,
                    'format_id': 'mp4-%s' % format_id,
                    'height': int_or_none(media.get('Height')),
                    'width': int_or_none(media.get('Width')),
                    'preference': 1,
                }
                if format_id and not f['height']:
                    f['height'] = int_or_none(self._search_regex(
                        r'^(\d+)[pP]$', format_id, 'height', default=None))
                formats.append(f)
                qualities_basename = self._search_regex(
                    '/([^/]+)\.csmil/',
                    manifest_url, 'qualities basename', default=None)
                if not qualities_basename:
                    continue
                QUALITIES_RE = r'((,\d+k)+,?)'
                qualities = self._search_regex(
                    QUALITIES_RE, qualities_basename,
                    'qualities', default=None)
                if not qualities:
                    continue
                qualities = qualities.strip(',').split(',')
                http_template = re.sub(QUALITIES_RE, r'%s', qualities_basename)
                http_url_basename = url_basename(video_url)
                for q in qualities:
                    tbr = int_or_none(self._search_regex(
                        r'(\d+)k', q, 'bitrate', default=None))
                    formats.append({
                        'url': video_url.replace(http_url_basename, http_template % q),
                        'ext': 'mp4',
                        'format_id': 'http' + ('-%d' % tbr if tbr else ''),
                        'tbr': tbr,
                    })

        if not formats and not info.get('LanguageSets') and not info.get('VODSettings'):
            self.raise_geo_restricted()
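The new Viewster code derives progressive HTTP formats in two steps. It reads the bitrate list embedded in the `.csmil` manifest name (a run such as `,300k,600k,` in the path), then swaps each bitrate into the basename of the MP4 URL. Below is a self-contained sketch of that derivation. Both URLs are hypothetical, and `url_basename` is a local approximation of the youtube-dl helper of the same name.

```python
import re
from urllib.parse import urlparse

# Same pattern as QUALITIES_RE in the hunk: a run of ",<digits>k" groups.
QUALITIES_RE = r'((,\d+k)+,?)'


def url_basename(url):
    # Last non-empty path component (local approximation of youtube-dl's helper).
    return urlparse(url).path.strip('/').split('/')[-1]


def http_formats(manifest_url, video_url):
    """Derive one HTTP format per bitrate listed in the .csmil manifest name."""
    m = re.search(r'/([^/]+)\.csmil/', manifest_url)
    if not m:
        return []
    qualities_basename = m.group(1)  # e.g. 'movie_,300k,600k,1200k,.mp4'
    q = re.search(QUALITIES_RE, qualities_basename)
    if not q:
        return []
    qualities = q.group(1).strip(',').split(',')  # e.g. ['300k', '600k', '1200k']
    http_template = re.sub(QUALITIES_RE, '%s', qualities_basename)  # e.g. 'movie_%s.mp4'
    http_url_basename = url_basename(video_url)
    formats = []
    for quality in qualities:
        tbr_match = re.search(r'(\d+)k', quality)
        tbr = int(tbr_match.group(1)) if tbr_match else None
        formats.append({
            'url': video_url.replace(http_url_basename, http_template % quality),
            'format_id': 'http' + ('-%d' % tbr if tbr else ''),
            'tbr': tbr,
        })
    return formats


for f in http_formats(
        'http://example.com/hls/movie_,300k,600k,1200k,.mp4.csmil/master.m3u8',
        'http://example.com/mp4/movie_600k.mp4'):
    print(f['format_id'], f['url'])
```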
@ -81,7 +81,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
                            \.
                        )?
                        vimeo(?P<pro>pro)?\.com/
                        (?!channels/[^/?#]+/?(?:$|[?#])|(?:album|ondemand)/)
                        (?!channels/[^/?#]+/?(?:$|[?#])|[^/]+/review/|(?:album|ondemand)/)
                        (?:.*?/)?
                        (?:
                            (?:
@ -90,6 +90,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
                        )?
                        (?:videos?/)?
                        (?P<id>[0-9]+)
                        (?:/[\da-f]+)?
                        /?(?:[?&].*)?(?:[#].*)?$
                    '''
    IE_NAME = 'vimeo'
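The Vimeo hunks make two changes to `_VALID_URL`. They add `[^/]+/review/` to the negative lookahead, and they add an optional `(?:/[\da-f]+)?` after the numeric id. As a result, unlisted links such as `https://vimeo.com/160743502/abd0e13fb4` match, while review pages are excluded from this extractor. The full `_VALID_URL` is only partly visible here. The pattern below is therefore a reduced, hypothetical version: the host part is simplified to `vimeo.com` and the player/redirect branches are dropped. It is meant for checking that behaviour in isolation. The review URL is made up.

```python
import re

# Reduced, hypothetical pattern: only the lines visible in the hunks above,
# with the host part simplified to (www.)vimeo.com. In (?x) mode, '#' inside
# a character class is literal, so [?#] and [#] are not comments.
VIMEO_RE = r'''(?x)
    https?://(?:www\.)?vimeo\.com/
    (?!channels/[^/?#]+/?(?:$|[?#])|[^/]+/review/|(?:album|ondemand)/)
    (?:.*?/)?
    (?:videos?/)?
    (?P<id>[0-9]+)
    (?:/[\da-f]+)?
    /?(?:[?&].*)?(?:[#].*)?$
'''


def vimeo_id(url):
    m = re.match(VIMEO_RE, url)
    return m.group('id') if m else None


for url in ('https://vimeo.com/7809605',
            'https://vimeo.com/160743502/abd0e13fb4',
            'https://vimeo.com/user21297594/review/75524534/3c257a1b5d'):
    print(url, vimeo_id(url))
```

Without the negative lookahead, the lazy `(?:.*?/)?` would let the review URL match with the later numeric segment as its id. The `review/` exclusion is what keeps such pages out.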
@ -232,6 +233,10 @@ class VimeoIE(VimeoBaseInfoExtractor):
            'url': 'https://vimeo.com/7809605',
            'only_matching': True,
        },
        {
            'url': 'https://vimeo.com/160743502/abd0e13fb4',
            'only_matching': True,
        }
    ]

    @staticmethod
@ -277,10 +282,10 @@ class VimeoIE(VimeoBaseInfoExtractor):
        pass_url = url + '/check-password'
        password_request = sanitized_Request(pass_url, data)
        password_request.add_header('Content-Type', 'application/x-www-form-urlencoded')
        password_request.add_header('Referer', url)
        return self._download_json(
            password_request, video_id,
            'Verifying the password',
            'Wrong password')
            'Verifying the password', 'Wrong password')

    def _real_initialize(self):
        self._login()
@ -1,52 +0,0 @@
from __future__ import unicode_literals

import re

from .common import InfoExtractor


class WayOfTheMasterIE(InfoExtractor):
    _VALID_URL = r'https?://www\.wayofthemaster\.com/([^/?#]*/)*(?P<id>[^/?#]+)\.s?html(?:$|[?#])'

    _TEST = {
        'url': 'http://www.wayofthemaster.com/hbks.shtml',
        'md5': '5316b57487ada8480606a93cb3d18d24',
        'info_dict': {
            'id': 'hbks',
            'ext': 'mp4',
            'title': 'Intelligent Design vs. Evolution',
        },
    }

    def _real_extract(self, url):
        mobj = re.match(self._VALID_URL, url)
        video_id = mobj.group('id')

        webpage = self._download_webpage(url, video_id)

        title = self._search_regex(
            r'<img src="images/title_[^"]+".*?alt="([^"]+)"',
            webpage, 'title', default=None)
        if title is None:
            title = self._html_search_regex(
                r'<title>(.*?)</title>', webpage, 'page title')

        url_base = self._search_regex(
            r'<param\s+name="?movie"?\s+value=".*?/wotm_videoplayer_highlow[0-9]*\.swf\?vid=([^"]+)"',
            webpage, 'URL base')
        formats = [{
            'format_id': 'low',
            'quality': 1,
            'url': url_base + '_low.mp4',
        }, {
            'format_id': 'high',
            'quality': 2,
            'url': url_base + '_high.mp4',
        }]
        self._sort_formats(formats)

        return {
            'id': video_id,
            'title': title,
            'formats': formats,
        }
@ -12,7 +12,7 @@ from ..utils import (
class XboxClipsIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?xboxclips\.com/(?:video\.php\?.*vid=|[^/]+/)(?P<id>[\w-]{36})'
_TEST = {
'url': 'https://xboxclips.com/video.php?uid=2533274823424419&gamertag=Iabdulelah&vid=074a69a9-5faf-46aa-b93b-9909c1720325',
'url': 'http://xboxclips.com/video.php?uid=2533274823424419&gamertag=Iabdulelah&vid=074a69a9-5faf-46aa-b93b-9909c1720325',
'md5': 'fbe1ec805e920aeb8eced3c3e657df5d',
'info_dict': {
'id': '074a69a9-5faf-46aa-b93b-9909c1720325',

@ -2,15 +2,15 @@
from __future__ import unicode_literals

import re
import time

from .common import InfoExtractor
from ..compat import (
compat_chr,
compat_ord,
)
from ..utils import (
int_or_none,
parse_filesize,
parse_duration,
)


@ -22,7 +22,7 @@ class XMinusIE(InfoExtractor):
'info_dict': {
'id': '4542',
'ext': 'mp3',
'title': 'Леонид Агутин-Песенка шофера',
'title': 'Леонид Агутин-Песенка шофёра',
'duration': 156,
'tbr': 320,
'filesize_approx': 5900000,
@ -36,38 +36,41 @@ class XMinusIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)

artist = self._html_search_regex(
r'minus_track\.artist="(.+?)"', webpage, 'artist')
r'<a[^>]+href="/artist/\d+">([^<]+)</a>', webpage, 'artist')
title = artist + '-' + self._html_search_regex(
r'minus_track\.title="(.+?)"', webpage, 'title')
duration = int_or_none(self._html_search_regex(
r'minus_track\.dur_sec=\'([0-9]*?)\'',
r'<span[^>]+class="minustrack-full-title(?:\s+[^"]+)?"[^>]*>([^<]+)', webpage, 'title')
duration = parse_duration(self._html_search_regex(
r'<span[^>]+class="player-duration(?:\s+[^"]+)?"[^>]*>([^<]+)',
webpage, 'duration', fatal=False))
filesize_approx = parse_filesize(self._html_search_regex(
r'<div id="finfo"[^>]*>\s*↓\s*([0-9.]+\s*[a-zA-Z][bB])',
webpage, 'approximate filesize', fatal=False))
tbr = int_or_none(self._html_search_regex(
r'<div class="quality[^"]*"></div>\s*([0-9]+)\s*kbps',
webpage, 'bitrate', fatal=False))
mobj = re.search(
r'<div[^>]+class="dw-info(?:\s+[^"]+)?"[^>]*>(?P<tbr>\d+)\s*кбит/c\s+(?P<filesize>[0-9.]+)\s*мб</div>',
webpage)
tbr = filesize_approx = None
if mobj:
filesize_approx = float(mobj.group('filesize')) * 1000000
tbr = float(mobj.group('tbr'))
view_count = int_or_none(self._html_search_regex(
r'<div class="quality.*?► ([0-9]+)',
r'<span><[^>]+class="icon-chart-bar".*?>(\d+)</span>',
webpage, 'view count', fatal=False))
description = self._html_search_regex(
r'(?s)<div id="song_texts">(.*?)</div><br',
r'(?s)<pre[^>]+id="lyrics-original"[^>]*>(.*?)</pre>',
webpage, 'song lyrics', fatal=False)
if description:
description = re.sub(' *\r *', '\n', description)

enc_token = self._html_search_regex(
r'minus_track\.s?tkn="(.+?)"', webpage, 'enc_token')
token = ''.join(
c if pos == 3 else compat_chr(compat_ord(c) - 1)
for pos, c in enumerate(reversed(enc_token)))
video_url = 'http://x-minus.org/dwlf/%s/%s.mp3' % (video_id, token)
k = self._search_regex(
r'<div[^>]+id="player-bottom"[^>]+data-k="([^"]+)">', webpage,
'encoded data')
h = time.time() / 3600
a = sum(map(int, [compat_ord(c) for c in k])) + int(video_id) + h
video_url = 'http://x-minus.me/dl/minus?id=%s&tkn2=%df%d' % (video_id, a, h)

return {
'id': video_id,
'title': title,
'url': video_url,
# The extension is unknown until actual downloading
'ext': 'mp3',
'duration': duration,
'filesize_approx': filesize_approx,
'tbr': tbr,

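The new x-minus code derives the download URL from the page's `data-k` attribute, the numeric track id and the current time in hours. The sketch below restates only that arithmetic as a pure function, so the token can be reproduced. The name `xminus_download_url` and the `now` parameter (standing in for `time.time()`) are illustrative assumptions, and nothing here checks whether x-minus.me still accepts such URLs.

```python
import time


def xminus_download_url(video_id, k, now=None):
    """Rebuild a tkn2 download URL using the same arithmetic as the extractor.

    Hypothetical helper: `now` replaces the direct time.time() call so the
    result is reproducible.
    """
    if now is None:
        now = time.time()
    # Hours since the Unix epoch, kept as a float as in the extractor.
    h = now / 3600
    # Sum of the code points of data-k, plus the numeric id, plus h.
    a = sum(ord(c) for c in k) + int(video_id) + h
    # '%d' truncates both floats to integers when formatting.
    return 'http://x-minus.me/dl/minus?id=%s&tkn2=%df%d' % (video_id, a, h)


print(xminus_download_url('4542', 'ab', now=7200))
# http://x-minus.me/dl/minus?id=4542&tkn2=4739f2
```

The code-point sum plus the id is an integer, so truncating `a` with `%d` adds only the whole-hour part of `h`. In effect the token changes once per hour.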
@ -24,7 +24,7 @@ from .nbc import NBCSportsVPlayerIE

class YahooIE(InfoExtractor):
IE_DESC = 'Yahoo screen and movies'
_VALID_URL = r'(?P<url>(?P<host>https?://(?:[a-zA-Z]{2}\.)?[\da-zA-Z_-]+\.yahoo\.com)/(?:[^/]+/)*(?P<display_id>.+)?-(?P<id>[0-9]+)(?:-[a-z]+)?\.html)'
_VALID_URL = r'(?P<url>(?P<host>https?://(?:[a-zA-Z]{2}\.)?[\da-zA-Z_-]+\.yahoo\.com)/(?:[^/]+/)*(?P<display_id>.+)?-(?P<id>[0-9]+)(?:-[a-z]+)?(?:\.html)?)'
_TESTS = [
{
'url': 'http://screen.yahoo.com/julian-smith-travis-legg-watch-214727115.html',
@ -38,7 +38,7 @@ class YahooIE(InfoExtractor):
},
{
'url': 'http://screen.yahoo.com/wired/codefellas-s1-ep12-cougar-lies-103000935.html',
'md5': 'd6e6fc6e1313c608f316ddad7b82b306',
'md5': 'c3466d2b6d5dd6b9f41ba9ed04c24b23',
'info_dict': {
'id': 'd1dedf8c-d58c-38c3-8963-e899929ae0a9',
'ext': 'mp4',
@ -49,7 +49,7 @@ class YahooIE(InfoExtractor):
},
{
'url': 'https://screen.yahoo.com/community/community-sizzle-reel-203225340.html?format=embed',
'md5': '60e8ac193d8fb71997caa8fce54c6460',
'md5': '75ffabdb87c16d4ffe8c036dc4d1c136',
'info_dict': {
'id': '4fe78544-8d48-39d8-97cd-13f205d9fcdb',
'ext': 'mp4',
@ -59,15 +59,15 @@ class YahooIE(InfoExtractor):
}
},
{
'url': 'https://tw.screen.yahoo.com/election-2014-askmayor/敢問市長-黃秀霜批賴清德-非常高傲-033009720.html',
'md5': '3a09cf59349cfaddae1797acc3c087fc',
'url': 'https://tw.news.yahoo.com/%E6%95%A2%E5%95%8F%E5%B8%82%E9%95%B7%20%E9%BB%83%E7%A7%80%E9%9C%9C%E6%89%B9%E8%B3%B4%E6%B8%85%E5%BE%B7%20%E9%9D%9E%E5%B8%B8%E9%AB%98%E5%82%B2-034024051.html',
'md5': '9035d38f88b1782682a3e89f985be5bb',
'info_dict': {
'id': 'cac903b3-fcf4-3c14-b632-643ab541712f',
'ext': 'mp4',
'title': '敢問市長/黃秀霜批賴清德「非常高傲」',
'description': '直言台南沒捷運 交通居五都之末',
'duration': 396,
}
},
},
{
'url': 'https://uk.screen.yahoo.com/editor-picks/cute-raccoon-freed-drain-using-091756545.html',
@ -89,17 +89,32 @@ class YahooIE(InfoExtractor):
'title': 'Program that makes hockey more affordable not offered in Manitoba',
'description': 'md5:c54a609f4c078d92b74ffb9bf1f496f4',
'duration': 121,
}
},
'skip': 'Video gone',
}, {
'url': 'https://ca.finance.yahoo.com/news/hackers-sony-more-trouble-well-154609075.html',
'md5': '226a895aae7e21b0129e2a2006fe9690',
'info_dict': {
'id': '154609075',
},
'playlist': [{
'md5': 'f8e336c6b66f503282e5f719641d6565',
'info_dict': {
'id': 'e624c4bc-3389-34de-9dfc-025f74943409',
'ext': 'mp4',
'title': '\'The Interview\' TV Spot: War',
'description': 'The Interview',
'duration': 30,
}
},
}, {
'md5': '958bcb90b4d6df71c56312137ee1cd5a',
'info_dict': {
'id': '1fc8ada0-718e-3abe-a450-bf31f246d1a9',
'ext': 'mp4',
'title': '\'The Interview\' TV Spot: Guys',
'description': 'The Interview',
'duration': 30,
},
}],
}, {
'url': 'http://news.yahoo.com/video/china-moses-crazy-blues-104538833.html',
'md5': '88e209b417f173d86186bef6e4d1f160',
@ -119,10 +134,11 @@ class YahooIE(InfoExtractor):
'title': 'Connect the Dots: Dark Side of Virgo',
'description': 'md5:1428185051cfd1949807ad4ff6d3686a',
'duration': 201,
}
},
'skip': 'Domain name in.lifestyle.yahoo.com gone',
}, {
'url': 'https://www.yahoo.com/movies/v/true-story-trailer-173000497.html',
'md5': '989396ae73d20c6f057746fb226aa215',
'md5': 'b17ac378b1134fa44370fb27db09a744',
'info_dict': {
'id': '071c4013-ce30-3a93-a5b2-e0413cd4a9d1',
'ext': 'mp4',
@ -141,6 +157,9 @@ class YahooIE(InfoExtractor):
'ext': 'flv',
'description': 'md5:df390f70a9ba7c95ff1daace988f0d8d',
'title': 'Tyler Kalinoski hits buzzer-beater to lift Davidson',
'upload_date': '20150313',
'uploader': 'NBCU-SPORTS',
'timestamp': 1426270238,
}
}, {
'url': 'https://tw.news.yahoo.com/-100120367.html',
@ -148,7 +167,7 @@ class YahooIE(InfoExtractor):
}, {
# Query result is embedded in webpage, but explicit request to video API fails with geo restriction
'url': 'https://screen.yahoo.com/community/communitary-community-episode-1-ladders-154501237.html',
'md5': '4fbafb9c9b6f07aa8f870629f6671b35',
'md5': '1ddbf7c850777548438e5c4f147c7b8c',
'info_dict': {
'id': '1f32853c-a271-3eef-8cb6-f6d6872cb504',
'ext': 'mp4',
@ -166,6 +185,17 @@ class YahooIE(InfoExtractor):
'description': 'While they play feuding fathers in \'Daddy\'s Home,\' star Will Ferrell & Mark Wahlberg share their true feelings on parenthood.',
},
},
{
# config['models']['applet_model']['data']['sapi'] has no query
'url': 'https://www.yahoo.com/music/livenation/event/galactic-2016',
'md5': 'dac0c72d502bc5facda80c9e6d5c98db',
'info_dict': {
'id': 'a6015640-e9e5-3efb-bb60-05589a183919',
'ext': 'mp4',
'description': 'Galactic',
'title': 'Dolla Diva (feat. Maggie Koerner)',
},
},
]

def _real_extract(self, url):
@ -174,19 +204,26 @@ class YahooIE(InfoExtractor):
page_id = mobj.group('id')
url = mobj.group('url')
host = mobj.group('host')
webpage = self._download_webpage(url, display_id)
webpage, urlh = self._download_webpage_handle(url, display_id)
if 'err=404' in urlh.geturl():
raise ExtractorError('Video gone', expected=True)

# Look for iframed media first
iframe_m = re.search(r'<iframe[^>]+src="(/video/.+?-\d+\.html\?format=embed.*?)"', webpage)
if iframe_m:
entries = []
iframe_urls = re.findall(r'<iframe[^>]+src="(/video/.+?-\d+\.html\?format=embed.*?)"', webpage)
for idx, iframe_url in enumerate(iframe_urls):
iframepage = self._download_webpage(
host + iframe_m.group(1), display_id, 'Downloading iframe webpage')
host + iframe_url, display_id,
note='Downloading iframe webpage for video #%d' % idx)
items_json = self._search_regex(
r'mediaItems: (\[.+?\])$', iframepage, 'items', flags=re.MULTILINE, default=None)
if items_json:
items = json.loads(items_json)
video_id = items[0]['id']
return self._get_info(video_id, display_id, webpage)
entries.append(self._get_info(video_id, display_id, webpage))
if entries:
return self.playlist_result(entries, page_id)

# Look for NBCSports iframes
nbc_sports_url = NBCSportsVPlayerIE._extract_url(webpage)
if nbc_sports_url:
@ -202,7 +239,7 @@ class YahooIE(InfoExtractor):
config = self._parse_json(config_json, display_id, fatal=False)
if config:
sapi = config.get('models', {}).get('applet_model', {}).get('data', {}).get('sapi')
if sapi:
if sapi and 'query' in sapi:
return self._extract_info(display_id, sapi, webpage)

items_json = self._search_regex(

@ -64,6 +64,14 @@ class YoukuIE(InfoExtractor):
'params': {
'videopassword': '100600',
},
}, {
# /play/get.json contains streams with "channel_type":"tail"
'url': 'http://v.youku.com/v_show/id_XOTUxMzg4NDMy.html',
'info_dict': {
'id': 'XOTUxMzg4NDMy',
'title': '我的世界☆明月庄主☆车震猎杀☆杀人艺术Minecraft',
},
'playlist_count': 6,
}]

def construct_video_urls(self, data):
@ -92,6 +100,8 @@ class YoukuIE(InfoExtractor):

fileid_dict = {}
for stream in data['stream']:
if stream.get('channel_type') == 'tail':
continue
format = stream.get('stream_type')
fileid = stream['stream_fileid']
fileid_dict[format] = fileid
@ -117,6 +127,8 @@ class YoukuIE(InfoExtractor):
# generate video_urls
video_urls_dict = {}
for stream in data['stream']:
if stream.get('channel_type') == 'tail':
continue
format = stream.get('stream_type')
video_urls = []
for dt in stream['segs']:
@ -253,6 +265,8 @@ class YoukuIE(InfoExtractor):
# which one has all
} for i in range(max(len(v.get('segs')) for v in data['stream']))]
for stream in data['stream']:
if stream.get('channel_type') == 'tail':
continue
fm = stream.get('stream_type')
video_urls = video_urls_dict[fm]
for video_url, seg, entry in zip(video_urls, stream['segs'], entries):

@ -125,6 +125,12 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
if login_results is False:
return False

error_msg = self._html_search_regex(
r'<[^>]+id="errormsg_0_Passwd"[^>]*>([^<]+)<',
login_results, 'error message', default=None)
if error_msg:
raise ExtractorError('Unable to login: %s' % error_msg, expected=True)

if re.search(r'id="errormsg_0_Passwd"', login_results) is not None:
raise ExtractorError('Please use your account password and a two-factor code instead of an application-specific password.', expected=True)

@ -1818,20 +1824,32 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
def _extract_mix(self, playlist_id):
# The mixes are generated from a single video
# the id of the playlist is just 'RD' + video_id
url = 'https://youtube.com/watch?v=%s&list=%s' % (playlist_id[-11:], playlist_id)
ids = []
last_id = playlist_id[-11:]
for n in itertools.count(1):
url = 'https://youtube.com/watch?v=%s&list=%s' % (last_id, playlist_id)
webpage = self._download_webpage(
url, playlist_id, 'Downloading Youtube mix')
url, playlist_id, 'Downloading page {0} of Youtube mix'.format(n))
new_ids = orderedSet(re.findall(
r'''(?xs)data-video-username=".*?".*?
href="/watch\?v=([0-9A-Za-z_-]{11})&[^"]*?list=%s''' % re.escape(playlist_id),
webpage))
# Fetch new pages until all the videos are repeated, it seems that
# there are always 51 unique videos.
new_ids = [_id for _id in new_ids if _id not in ids]
if not new_ids:
break
ids.extend(new_ids)
last_id = ids[-1]

url_results = self._ids_to_results(ids)

search_title = lambda class_name: get_element_by_attribute('class', class_name, webpage)
title_span = (
search_title('playlist-title') or
search_title('title long-title') or
search_title('title'))
title = clean_html(title_span)
ids = orderedSet(re.findall(
r'''(?xs)data-video-username=".*?".*?
href="/watch\?v=([0-9A-Za-z_-]{11})&[^"]*?list=%s''' % re.escape(playlist_id),
webpage))
url_results = self._ids_to_results(ids)

return self.playlist_result(url_results, playlist_id, title)

@ -1884,7 +1902,7 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
if video:
return video

if playlist_id.startswith('RD') or playlist_id.startswith('UL'):
if playlist_id.startswith(('RD', 'UL', 'PU')):
# Mixes require a custom extraction process
return self._extract_mix(playlist_id)

@ -1987,8 +2005,8 @@ class YoutubeUserIE(YoutubeChannelIE):
def suitable(cls, url):
# Don't return True if the url can be extracted with other youtube
# extractor, the regex would is too permissive and it would match.
other_ies = iter(klass for (name, klass) in globals().items() if name.endswith('IE') and klass is not cls)
if any(ie.suitable(url) for ie in other_ies):
other_yt_ies = iter(klass for (name, klass) in globals().items() if name.startswith('Youtube') and name.endswith('IE') and klass is not cls)
if any(ie.suitable(url) for ie in other_yt_ies):
return False
else:
return super(YoutubeUserIE, cls).suitable(url)

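The rewritten `_extract_mix` keeps requesting watch pages, each seeded with the last video id collected so far. It stops when a page yields no unseen ids. The loop can be sketched apart from YouTube by passing in a page-fetching function. Here `collect_mix_ids` and `fetch_ids` are hypothetical names, and the in-memory `pages` dict stands in for downloading and regex-scraping real pages.

```python
def collect_mix_ids(fetch_ids, first_id):
    """Collect ids page by page until a page adds nothing new.

    fetch_ids(seed_id) is a hypothetical callable returning the ids listed
    on the page opened at seed_id, in page order.
    """
    ids = []
    last_id = first_id
    while True:
        # Deduplicate within the page (like orderedSet) and drop ids
        # already collected from earlier pages.
        new_ids = []
        for video_id in fetch_ids(last_id):
            if video_id not in ids and video_id not in new_ids:
                new_ids.append(video_id)
        if not new_ids:
            break
        ids.extend(new_ids)
        # Seed the next request with the last id collected so far.
        last_id = ids[-1]
    return ids


pages = {
    'a': ['a', 'b', 'c'],
    'c': ['b', 'c', 'd', 'e'],
    'e': ['d', 'e', 'a'],
}
print(collect_mix_ids(lambda seed: pages.get(seed, []), 'a'))
# ['a', 'b', 'c', 'd', 'e']
```

As in the diff, termination depends on the site eventually returning only ids that were already seen.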
@ -425,8 +425,12 @@ def parseOpts(overrideArguments=None):
help='Set file xattribute ytdl.filesize with expected filesize (experimental)')
downloader.add_option(
'--hls-prefer-native',
dest='hls_prefer_native', action='store_true',
help='Use the native HLS downloader instead of ffmpeg (experimental)')
dest='hls_prefer_native', action='store_true', default=None,
help='Use the native HLS downloader instead of ffmpeg')
downloader.add_option(
'--hls-prefer-ffmpeg',
dest='hls_prefer_native', action='store_false', default=None,
help='Use ffmpeg instead of the native HLS downloader')
downloader.add_option(
'--hls-use-mpegts',
dest='hls_use_mpegts', action='store_true',

@ -175,7 +175,8 @@ class FFmpegPostProcessor(PostProcessor):
# Always use 'file:' because the filename may contain ':' (ffmpeg
# interprets that as a protocol) or can start with '-' (-- is broken in
# ffmpeg, see https://ffmpeg.org/trac/ffmpeg/ticket/2127 for details)
return 'file:' + fn
# Also leave '-' intact in order not to break streaming to stdout.
return 'file:' + fn if fn != '-' else fn


class FFmpegExtractAudioPP(FFmpegPostProcessor):

@ -1540,44 +1540,46 @@ def parse_duration(s):

s = s.strip()

days, hours, mins, secs, ms = [None] * 5
m = re.match(r'(?:(?:(?:(?P<days>[0-9]+):)?(?P<hours>[0-9]+):)?(?P<mins>[0-9]+):)?(?P<secs>[0-9]+)(?P<ms>\.[0-9]+)?$', s)
if m:
days, hours, mins, secs, ms = m.groups()
else:
m = re.match(
r'''(?ix)(?:P?T)?
(?:
(?P<only_mins>[0-9.]+)\s*(?:mins?\.?|minutes?)\s*|
(?P<only_hours>[0-9.]+)\s*(?:hours?)|

\s*(?P<hours_reversed>[0-9]+)\s*(?:[:h]|hours?)\s*(?P<mins_reversed>[0-9]+)\s*(?:[:m]|mins?\.?|minutes?)\s*|
(?:
(?:
(?:(?P<days>[0-9]+)\s*(?:[:d]|days?)\s*)?
(?P<hours>[0-9]+)\s*(?:[:h]|hours?)\s*
(?P<days>[0-9]+)\s*d(?:ays?)?\s*
)?
(?P<mins>[0-9]+)\s*(?:[:m]|mins?|minutes?)\s*
(?:
(?P<hours>[0-9]+)\s*h(?:ours?)?\s*
)?
(?P<secs>[0-9]+)(?P<ms>\.[0-9]+)?\s*(?:s|secs?|seconds?)?
)$''', s)
if not m:
(?:
(?P<mins>[0-9]+)\s*m(?:in(?:ute)?s?)?\s*
)?
(?:
(?P<secs>[0-9]+)(?P<ms>\.[0-9]+)?\s*s(?:ec(?:ond)?s?)?\s*
)?$''', s)
if m:
days, hours, mins, secs, ms = m.groups()
else:
m = re.match(r'(?i)(?:(?P<hours>[0-9.]+)\s*(?:hours?)|(?P<mins>[0-9.]+)\s*(?:mins?\.?|minutes?)\s*)$', s)
if m:
hours, mins = m.groups()
else:
return None
res = 0
if m.group('only_mins'):
return float_or_none(m.group('only_mins'), invscale=60)
if m.group('only_hours'):
return float_or_none(m.group('only_hours'), invscale=60 * 60)
if m.group('secs'):
res += int(m.group('secs'))
if m.group('mins_reversed'):
res += int(m.group('mins_reversed')) * 60
if m.group('mins'):
res += int(m.group('mins')) * 60
if m.group('hours'):
res += int(m.group('hours')) * 60 * 60
if m.group('hours_reversed'):
res += int(m.group('hours_reversed')) * 60 * 60
if m.group('days'):
res += int(m.group('days')) * 24 * 60 * 60
if m.group('ms'):
res += float(m.group('ms'))
return res

duration = 0
if secs:
duration += float(secs)
if mins:
duration += float(mins) * 60
if hours:
duration += float(hours) * 60 * 60
if days:
duration += float(days) * 24 * 60 * 60
if ms:
duration += float(ms)
return duration


def prepend_extension(filename, ext, expected_real_ext=None):
@ -1933,6 +1935,9 @@ def error_to_compat_str(err):


def mimetype2ext(mt):
if mt is None:
return None

ext = {
'audio/mp4': 'm4a',
}.get(mt)

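The first branch of the reworked `parse_duration` in the `utils` diff above matches clock-style strings (`[[[D:]H:]M:]S[.fraction]`) and sums the captured fields. The sketch below covers only that branch and reuses the same regular expression. `clock_duration` is a hypothetical name, and the function deliberately omits the unit-suffixed and decimal hours/minutes fallbacks of the real function.

```python
import re

# Same clock-style pattern as the first re.match in the new parse_duration.
CLOCK_RE = re.compile(
    r'(?:(?:(?:(?P<days>[0-9]+):)?(?P<hours>[0-9]+):)?(?P<mins>[0-9]+):)?'
    r'(?P<secs>[0-9]+)(?P<ms>\.[0-9]+)?$')


def clock_duration(s):
    """Return seconds for a clock-style duration string, or None if it does not match."""
    m = CLOCK_RE.match(s.strip())
    if not m:
        return None
    # Named groups come back in pattern order: days, hours, mins, secs, ms.
    days, hours, mins, secs, ms = m.groups()
    duration = 0.0
    if secs:
        duration += float(secs)
    if mins:
        duration += float(mins) * 60
    if hours:
        duration += float(hours) * 60 * 60
    if days:
        duration += float(days) * 24 * 60 * 60
    if ms:
        duration += float(ms)
    return duration


print(clock_duration('2:36'))       # 156.0
print(clock_duration('1:02:03.5'))  # 3723.5
print(clock_duration('3 min'))      # None
```

Strings such as `'3 min'` fail this branch, which is why the real function falls back to the verbose unit-suffixed pattern.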
@ -1,3 +1,3 @@
from __future__ import unicode_literals

__version__ = '2016.04.06'
__version__ = '2016.04.24'