diff --git a/.github/ISSUE_TEMPLATE.md b/.github/ISSUE_TEMPLATE.md index bf9494646..c208eb689 100644 --- a/.github/ISSUE_TEMPLATE.md +++ b/.github/ISSUE_TEMPLATE.md @@ -6,8 +6,8 @@ --- -### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.04.06*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected. -- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.04.06** +### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.04.24*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected. +- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.04.24** ### Before submitting an *issue* make sure you have: - [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections @@ -35,7 +35,7 @@ $ youtube-dl -v [debug] User config: [] [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj'] [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251 -[debug] youtube-dl version 2016.04.06 +[debug] youtube-dl version 2016.04.24 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4 [debug] Proxy map: {} diff --git a/AUTHORS b/AUTHORS index ea8d39978..07cade723 100644 --- a/AUTHORS +++ b/AUTHORS @@ -167,3 +167,4 @@ Kacper Michajłow José Joaquín Atria Viťas Strádal Kagami Hiiragi +Philip Huppert diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 0df6193fb..c83b8655a 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md 
@@ -140,14 +140,14 @@ After you have ensured this site is distributing it's content legally, you can f # TODO more properties (see youtube_dl/extractor/common.py) } ``` -5. Add an import in [`youtube_dl/extractor/__init__.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/__init__.py). +5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py). 6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. 7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L68-L226). Add tests and code for as many as you want. 8. Keep in mind that the only mandatory fields in info dict for successful extraction process are `id`, `title` and either `url` or `formats`, i.e. these are the critical data the extraction does not make any sense without. This means that [any field](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L138-L226) apart from aforementioned mandatory ones should be treated **as optional** and extraction should be **tolerate** to situations when sources for these fields can potentially be unavailable (even if they always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields. 
For example, if you have some intermediate dict `meta` that is a source of metadata and it has a key `summary` that you want to extract and put into resulting info dict as `description`, you should be ready that this key may be missing from the `meta` dict, i.e. you should extract it as `meta.get('summary')` and not `meta['summary']`. Similarly, you should pass `fatal=False` when extracting data from a webpage with `_search_regex/_html_search_regex`.
 9. Check the code with [flake8](https://pypi.python.org/pypi/flake8).
 10. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
 
-        $ git add youtube_dl/extractor/__init__.py
+        $ git add youtube_dl/extractor/extractors.py
         $ git add youtube_dl/extractor/yourextractor.py
         $ git commit -m '[yourextractor] Add new extractor'
         $ git push origin yourextractor
diff --git a/README.md b/README.md
index cd18edd87..e062444b3 100644
--- a/README.md
+++ b/README.md
@@ -176,7 +176,9 @@ which means you can modify it, redistribute it or use it however you like.
--xattr-set-filesize Set file xattribute ytdl.filesize with expected filesize (experimental) --hls-prefer-native Use the native HLS downloader instead of - ffmpeg (experimental) + ffmpeg + --hls-prefer-ffmpeg Use ffmpeg instead of the native HLS + downloader --hls-use-mpegts Use the mpegts container for HLS videos, allowing to play the video while downloading (some players may not be able @@ -515,6 +517,18 @@ Available for the video that is an episode of some series or programme: - `episode_number`: Number of the video episode within a season - `episode_id`: Id of the video episode +Available for the media that is a track or a part of a music album: + - `track`: Title of the track + - `track_number`: Number of the track within an album or a disc + - `track_id`: Id of the track + - `artist`: Artist(s) of the track + - `genre`: Genre(s) of the track + - `album`: Title of the album the track belongs to + - `album_type`: Type of the album + - `album_artist`: List of all artists appeared on the album + - `disc_number`: Number of the disc or other physical medium the track belongs to + - `release_year`: Year (YYYY) when the album was released + Each aforementioned sequence when referenced in output template will be replaced by the actual value corresponding to the sequence name. Note that some of the sequences are not guaranteed to be present since they depend on the metadata obtained by particular extractor, such sequences will be replaced with `NA`. For example for `-o %(title)s-%(id)s.%(ext)s` and mp4 video with title `youtube-dl test video` and id `BaW_jenozKcj` this will result in a `youtube-dl test video-BaW_jenozKcj.mp4` file created in the current directory. 
diff --git a/docs/supportedsites.md b/docs/supportedsites.md
index d6ee8476b..03875b8db 100644
--- a/docs/supportedsites.md
+++ b/docs/supportedsites.md
@@ -50,6 +50,7 @@
   - **arte.tv:ddc**
   - **arte.tv:embed**
   - **arte.tv:future**
+ - **arte.tv:info**
   - **arte.tv:magazine**
   - **AtresPlayer**
   - **ATTTechChannel**
@@ -115,6 +116,7 @@
   - **Cinemassacre**
   - **Clipfish**
   - **cliphunter**
+ - **ClipRs**
   - **Clipsyndicate**
   - **cloudtime**: CloudTime
   - **Cloudy**
@@ -161,6 +163,7 @@
   - **defense.gouv.fr**
   - **democracynow**
   - **DHM**: Filmarchiv - Deutsches Historisches Museum
+ - **DigitallySpeaking**
   - **Digiteka**
   - **Discovery**
   - **Dotsub**
@@ -172,7 +175,6 @@
   - **Dropbox**
   - **DrTuber**
   - **DRTV**
- - **Dump**
   - **Dumpert**
   - **dvtv**: http://video.aktualne.cz/
   - **dw**
@@ -286,7 +288,6 @@
   - **ivi:compilation**: ivi.ru compilations
   - **ivideon**: Ivideon TV
   - **Izlesene**
- - **JadoreCettePub**
   - **JeuxVideo**
   - **Jove**
   - **jpopsuki.tv**
@@ -344,19 +345,22 @@
   - **metacafe**
   - **Metacritic**
   - **Mgoon**
+ - **MGTV**: 芒果TV
   - **Minhateca**
   - **MinistryGrid**
   - **Minoto**
   - **miomio.tv**
   - **MiTele**: mitele.es
   - **mixcloud**
+ - **mixcloud:playlist**
+ - **mixcloud:stream**
+ - **mixcloud:user**
   - **MLB**
   - **Mnet**
   - **MoeVideo**: LetitBit video services: moevideo.net, playreplay.net and videochart.net
   - **Mofosex**
   - **Mojvideo**
   - **Moniker**: allmyvideos.net and vidspot.net
- - **mooshare**: Mooshare.biz
   - **Morningstar**: morningstar.com
   - **Motherless**
   - **Motorsport**: motorsport.com
@@ -393,7 +397,6 @@
   - **ndr:embed:base**
   - **NDTV**
   - **NerdCubedFeed**
- - **Nerdist**
   - **netease:album**: 网易云音乐 - 专辑
   - **netease:djradio**: 网易云音乐 - 电台
   - **netease:mv**: 网易云音乐 - MV
@@ -411,7 +414,8 @@
   - **nfl.com**
   - **nhl.com**
   - **nhl.com:news**: NHL news
- - **nhl.com:videocenter**: NHL videocenter category
+ - **nhl.com:videocenter**
+ - **nhl.com:videocenter:category**: NHL videocenter category
   - **nick.com**
   - **niconico**: ニコニコ動画
   - **NiconicoPlaylist**
@@ -459,13 +463,13 @@
- **Patreon** - **pbs**: Public Broadcasting Service (PBS) and member stations: PBS: Public Broadcasting Service, APT - Alabama Public Television (WBIQ), GPB/Georgia Public Broadcasting (WGTV), Mississippi Public Broadcasting (WMPN), Nashville Public Television (WNPT), WFSU-TV (WFSU), WSRE (WSRE), WTCI (WTCI), WPBA/Channel 30 (WPBA), Alaska Public Media (KAKM), Arizona PBS (KAET), KNME-TV/Channel 5 (KNME), Vegas PBS (KLVX), AETN/ARKANSAS ETV NETWORK (KETS), KET (WKLE), WKNO/Channel 10 (WKNO), LPB/LOUISIANA PUBLIC BROADCASTING (WLPB), OETA (KETA), Ozarks Public Television (KOZK), WSIU Public Broadcasting (WSIU), KEET TV (KEET), KIXE/Channel 9 (KIXE), KPBS San Diego (KPBS), KQED (KQED), KVIE Public Television (KVIE), PBS SoCal/KOCE (KOCE), ValleyPBS (KVPT), CONNECTICUT PUBLIC TELEVISION (WEDH), KNPB Channel 5 (KNPB), SOPTV (KSYS), Rocky Mountain PBS (KRMA), KENW-TV3 (KENW), KUED Channel 7 (KUED), Wyoming PBS (KCWC), Colorado Public Television / KBDI 12 (KBDI), KBYU-TV (KBYU), Thirteen/WNET New York (WNET), WGBH/Channel 2 (WGBH), WGBY (WGBY), NJTV Public Media NJ (WNJT), WLIW21 (WLIW), mpt/Maryland Public Television (WMPB), WETA Television and Radio (WETA), WHYY (WHYY), PBS 39 (WLVT), WVPT - Your Source for PBS and More! 
(WVPT), Howard University Television (WHUT), WEDU PBS (WEDU), WGCU Public Media (WGCU), WPBT2 (WPBT), WUCF TV (WUCF), WUFT/Channel 5 (WUFT), WXEL/Channel 42 (WXEL), WLRN/Channel 17 (WLRN), WUSF Public Broadcasting (WUSF), ETV (WRLK), UNC-TV (WUNC), PBS Hawaii - Oceanic Cable Channel 10 (KHET), Idaho Public Television (KAID), KSPS (KSPS), OPB (KOPB), KWSU/Channel 10 & KTNW/Channel 31 (KWSU), WILL-TV (WILL), Network Knowledge - WSEC/Springfield (WSEC), WTTW11 (WTTW), Iowa Public Television/IPTV (KDIN), Nine Network (KETC), PBS39 Fort Wayne (WFWA), WFYI Indianapolis (WFYI), Milwaukee Public Television (WMVS), WNIN (WNIN), WNIT Public Television (WNIT), WPT (WPNE), WVUT/Channel 22 (WVUT), WEIU/Channel 51 (WEIU), WQPT-TV (WQPT), WYCC PBS Chicago (WYCC), WIPB-TV (WIPB), WTIU (WTIU), CET (WCET), ThinkTVNetwork (WPTD), WBGU-TV (WBGU), WGVU TV (WGVU), NET1 (KUON), Pioneer Public Television (KWCM), SDPB Television (KUSD), TPT (KTCA), KSMQ (KSMQ), KPTS/Channel 8 (KPTS), KTWU/Channel 11 (KTWU), East Tennessee PBS (WSJK), WCTE-TV (WCTE), WLJT, Channel 11 (WLJT), WOSU TV (WOSU), WOUB/WOUC (WOUB), WVPB (WVPB), WKYU-PBS (WKYU), KERA 13 (KERA), MPBN (WCBB), Mountain Lake PBS (WCFE), NHPTV (WENH), Vermont PBS (WETK), witf (WITF), WQED Multimedia (WQED), WMHT Educational Telecommunications (WMHT), Q-TV (WDCQ), WTVS Detroit Public TV (WTVS), CMU Public Television (WCMU), WKAR-TV (WKAR), WNMU-TV Public TV 13 (WNMU), WDSE - WRPT (WDSE), WGTE TV (WGTE), Lakeland Public Television (KAWE), KMOS-TV - Channels 6.1, 6.2 and 6.3 (KMOS), MontanaPBS (KUSM), KRWG/Channel 22 (KRWG), KACV (KACV), KCOS/Channel 13 (KCOS), WCNY/Channel 24 (WCNY), WNED (WNED), WPBS (WPBS), WSKG Public TV (WSKG), WXXI (WXXI), WPSU (WPSU), WVIA Public Media Studios (WVIA), WTVI (WTVI), Western Reserve PBS (WNEO), WVIZ/PBS ideastream (WVIZ), KCTS 9 (KCTS), Basin PBS (KPBT), KUHT / Channel 8 (KUHT), KLRN (KLRN), KLRU (KLRU), WTJX Channel 12 (WTJX), WCVE PBS (WCVE), KBTC Public Television (KBTC) - **pcmag** + - **People** - 
**Periscope**: Periscope - **PhilharmonieDeParis**: Philharmonie de Paris - **phoenix.de** - **Photobucket** - **Pinkbike** - **Pladform** - - **PlanetaPlay** - **play.fm** - **played.to** - **PlaysTV** @@ -484,6 +488,7 @@ - **Pornotube** - **PornoVoisines** - **PornoXO** + - **PressTV** - **PrimeShareTV** - **PromptFile** - **prosiebensat1**: ProSiebenSat.1 Digital @@ -494,7 +499,6 @@ - **qqmusic:playlist**: QQ音乐 - 歌单 - **qqmusic:singer**: QQ音乐 - 歌手 - **qqmusic:toplist**: QQ音乐 - 排行榜 - - **QuickVid** - **R7** - **radio.de** - **radiobremen** @@ -608,6 +612,7 @@ - **Tagesschau** - **Tapely** - **Tass** + - **TDSLifeway** - **teachertube**: teachertube.com videos - **teachertube:user:collection**: teachertube.com user and collection videos - **TeachingChannel** @@ -624,7 +629,6 @@ - **TeleTask** - **TF1** - **TheIntercept** - - **TheOnion** - **ThePlatform** - **ThePlatformFeed** - **TheScene** @@ -683,7 +687,6 @@ - **twitter** - **twitter:amplify** - **twitter:card** - - **Ubu** - **udemy** - **udemy:course** - **UDNEmbed**: 聯合影音 @@ -753,7 +756,6 @@ - **Walla** - **WashingtonPost** - **wat.tv** - - **WayOfTheMaster** - **WDR** - **wdr:mobile** - **WDRMaus**: Sendung mit der Maus diff --git a/test/test_utils.py b/test/test_utils.py index 0f36bb9f0..e16a6761b 100644 --- a/test/test_utils.py +++ b/test/test_utils.py @@ -413,6 +413,7 @@ class TestUtil(unittest.TestCase): self.assertEqual(parse_duration('01:02:03:04'), 93784) self.assertEqual(parse_duration('1 hour 3 minutes'), 3780) self.assertEqual(parse_duration('87 Min.'), 5220) + self.assertEqual(parse_duration('PT1H0.040S'), 3600.04) def test_fix_xml_ampersands(self): self.assertEqual( diff --git a/test/test_youtube_lists.py b/test/test_youtube_lists.py index 47df0f348..af1c45421 100644 --- a/test/test_youtube_lists.py +++ b/test/test_youtube_lists.py @@ -44,7 +44,7 @@ class TestYoutubeLists(unittest.TestCase): ie = YoutubePlaylistIE(dl) result = 
ie.extract('https://www.youtube.com/watch?v=W01L70IGBgE&index=2&list=RDOQpdSVF_k_w') entries = result['entries'] - self.assertTrue(len(entries) >= 20) + self.assertTrue(len(entries) >= 50) original_video = entries[0] self.assertEqual(original_video['id'], 'OQpdSVF_k_w') diff --git a/youtube_dl/YoutubeDL.py b/youtube_dl/YoutubeDL.py index a89a71a25..055433362 100755 --- a/youtube_dl/YoutubeDL.py +++ b/youtube_dl/YoutubeDL.py @@ -260,7 +260,9 @@ class YoutubeDL(object): The following options determine which downloader is picked: external_downloader: Executable of the external downloader to call. None or unset for standard (built-in) downloader. - hls_prefer_native: Use the native HLS downloader instead of ffmpeg/avconv. + hls_prefer_native: Use the native HLS downloader instead of ffmpeg/avconv + if True, otherwise use ffmpeg/avconv if False, otherwise + use downloader suggested by extractor if None. The following parameters are not used by YoutubeDL itself, they are used by the downloader (see youtube_dl/downloader/common.py): diff --git a/youtube_dl/downloader/__init__.py b/youtube_dl/downloader/__init__.py index 73b34fdae..817591d97 100644 --- a/youtube_dl/downloader/__init__.py +++ b/youtube_dl/downloader/__init__.py @@ -41,9 +41,12 @@ def get_suitable_downloader(info_dict, params={}): if ed.can_download(info_dict): return ed - if protocol == 'm3u8' and params.get('hls_prefer_native'): + if protocol == 'm3u8' and params.get('hls_prefer_native') is True: return HlsFD + if protocol == 'm3u8_native' and params.get('hls_prefer_native') is False: + return FFmpegFD + return PROTOCOL_MAP.get(protocol, HttpFD) diff --git a/youtube_dl/downloader/external.py b/youtube_dl/downloader/external.py index 30277dc20..8d642fc3e 100644 --- a/youtube_dl/downloader/external.py +++ b/youtube_dl/downloader/external.py @@ -225,7 +225,7 @@ class FFmpegFD(ExternalFD): args += ['-i', url, '-c', 'copy'] if protocol == 'm3u8': - if self.params.get('hls_use_mpegts', False): + if 
self.params.get('hls_use_mpegts', False) or tmpfilename == '-': args += ['-f', 'mpegts'] else: args += ['-f', 'mp4', '-bsf:a', 'aac_adtstoasc'] diff --git a/youtube_dl/downloader/rtsp.py b/youtube_dl/downloader/rtsp.py index 3eb29526c..939358b2a 100644 --- a/youtube_dl/downloader/rtsp.py +++ b/youtube_dl/downloader/rtsp.py @@ -27,6 +27,8 @@ class RtspFD(FileDownloader): self.report_error('MMS or RTSP download detected but neither "mplayer" nor "mpv" could be run. Please install any.') return False + self._debug_cmd(args) + retval = subprocess.call(args) if retval == 0: fsize = os.path.getsize(encodeFilename(tmpfilename)) diff --git a/youtube_dl/extractor/aol.py b/youtube_dl/extractor/aol.py index d4801a25b..24df8fe93 100644 --- a/youtube_dl/extractor/aol.py +++ b/youtube_dl/extractor/aol.py @@ -12,9 +12,10 @@ from ..utils import ( class AolIE(InfoExtractor): IE_NAME = 'on.aol.com' - _VALID_URL = r'(?:aol-video:|https?://on\.aol\.com/video/.*-)(?P[^/?-]+)' + _VALID_URL = r'(?:aol-video:|https?://on\.aol\.com/.*-)(?P[^/?-]+)' _TESTS = [{ + # video with 5min ID 'url': 'http://on.aol.com/video/u-s--official-warns-of-largest-ever-irs-phone-scam-518167793?icid=OnHomepageC2Wide_MustSee_Img', 'md5': '18ef68f48740e86ae94b98da815eec42', 'info_dict': { @@ -31,6 +32,7 @@ class AolIE(InfoExtractor): 'skip_download': True, } }, { + # video with vidible ID 'url': 'http://on.aol.com/video/netflix-is-raising-rates-5707d6b8e4b090497b04f706?context=PC:homepage:PL1944:1460189336183', 'info_dict': { 'id': '5707d6b8e4b090497b04f706', @@ -45,6 +47,12 @@ class AolIE(InfoExtractor): # m3u8 download 'skip_download': True, } + }, { + 'url': 'http://on.aol.com/partners/abc-551438d309eab105804dbfe8/sneak-peek-was-haley-really-framed-570eaebee4b0448640a5c944', + 'only_matching': True, + }, { + 'url': 'http://on.aol.com/shows/park-bench-shw518173474-559a1b9be4b0c3bfad3357a7?context=SH:SHW518173474:PL4327:1460619712763', + 'only_matching': True, }] def _real_extract(self, url): diff --git 
a/youtube_dl/extractor/ard.py b/youtube_dl/extractor/ard.py index 9fb84911a..26446c2fe 100644 --- a/youtube_dl/extractor/ard.py +++ b/youtube_dl/extractor/ard.py @@ -83,7 +83,7 @@ class ARDMediathekIE(InfoExtractor): subtitle_url = media_info.get('_subtitleUrl') if subtitle_url: subtitles['de'] = [{ - 'ext': 'srt', + 'ext': 'ttml', 'url': subtitle_url, }] diff --git a/youtube_dl/extractor/arte.py b/youtube_dl/extractor/arte.py index ae0f27dcb..a9e3266dc 100644 --- a/youtube_dl/extractor/arte.py +++ b/youtube_dl/extractor/arte.py @@ -210,7 +210,7 @@ class ArteTVPlus7IE(InfoExtractor): # It also uses the arte_vp_url url from the webpage to extract the information class ArteTVCreativeIE(ArteTVPlus7IE): IE_NAME = 'arte.tv:creative' - _VALID_URL = r'https?://creative\.arte\.tv/(?Pfr|de|en|es)/(?:magazine?/)?(?P[^/?#&]+)' + _VALID_URL = r'https?://creative\.arte\.tv/(?Pfr|de|en|es)/(?:[^/]+/)*(?P[^/?#&]+)' _TESTS = [{ 'url': 'http://creative.arte.tv/de/magazin/agentur-amateur-corporate-design', @@ -229,9 +229,27 @@ class ArteTVCreativeIE(ArteTVPlus7IE): 'description': 'Événement ! 
Quarante-cinq ans après leurs premiers succès, les légendaires Monty Python remontent sur scène.\n', 'upload_date': '20140805', } + }, { + 'url': 'http://creative.arte.tv/de/episode/agentur-amateur-4-der-erste-kunde', + 'only_matching': True, }] +class ArteTVInfoIE(ArteTVPlus7IE): + IE_NAME = 'arte.tv:info' + _VALID_URL = r'https?://info\.arte\.tv/(?Pfr|de|en|es)/(?:[^/]+/)*(?P[^/?#&]+)' + + _TEST = { + 'url': 'http://info.arte.tv/fr/service-civique-un-cache-misere', + 'info_dict': { + 'id': '067528-000-A', + 'ext': 'mp4', + 'title': 'Service civique, un cache misère ?', + 'upload_date': '20160403', + }, + } + + class ArteTVFutureIE(ArteTVPlus7IE): IE_NAME = 'arte.tv:future' _VALID_URL = r'https?://future\.arte\.tv/(?Pfr|de|en|es)/(?P[^/?#&]+)' @@ -337,7 +355,7 @@ class ArteTVEmbedIE(ArteTVPlus7IE): IE_NAME = 'arte.tv:embed' _VALID_URL = r'''(?x) http://www\.arte\.tv - /playerv2/embed\.php\?json_url= + /(?:playerv2/embed|arte_vp/index)\.php\?json_url= (?P http://arte\.tv/papi/tvguide/videos/stream/player/ (?P[^/]+)/(?P[^/]+)[^&]* diff --git a/youtube_dl/extractor/audiomack.py b/youtube_dl/extractor/audiomack.py index 3eed91279..a52d26cec 100644 --- a/youtube_dl/extractor/audiomack.py +++ b/youtube_dl/extractor/audiomack.py @@ -30,14 +30,14 @@ class AudiomackIE(InfoExtractor): # audiomack wrapper around soundcloud song { 'add_ie': ['Soundcloud'], - 'url': 'http://www.audiomack.com/song/xclusiveszone/take-kare', + 'url': 'http://www.audiomack.com/song/hip-hop-daily/black-mamba-freestyle', 'info_dict': { - 'id': '172419696', + 'id': '258901379', 'ext': 'mp3', - 'description': 'md5:1fc3272ed7a635cce5be1568c2822997', - 'title': 'Young Thug ft Lil Wayne - Take Kare', - 'uploader': 'Young Thug World', - 'upload_date': '20141016', + 'description': 'mamba day freestyle for the legend Kobe Bryant ', + 'title': 'Black Mamba Freestyle [Prod. 
By Danny Wolf]', + 'uploader': 'ILOVEMAKONNEN', + 'upload_date': '20160414', } }, ] diff --git a/youtube_dl/extractor/bbc.py b/youtube_dl/extractor/bbc.py index 425f08f2b..74c4510f9 100644 --- a/youtube_dl/extractor/bbc.py +++ b/youtube_dl/extractor/bbc.py @@ -671,6 +671,7 @@ class BBCIE(BBCCoUkIE): 'info_dict': { 'id': '34475836', 'title': 'Jurgen Klopp: Furious football from a witty and winning coach', + 'description': 'Fast-paced football, wit, wisdom and a ready smile - why Liverpool fans should come to love new boss Jurgen Klopp.', }, 'playlist_count': 3, }, { diff --git a/youtube_dl/extractor/brightcove.py b/youtube_dl/extractor/brightcove.py index c718cf385..f0781fc27 100644 --- a/youtube_dl/extractor/brightcove.py +++ b/youtube_dl/extractor/brightcove.py @@ -340,7 +340,7 @@ class BrightcoveLegacyIE(InfoExtractor): ext = 'flv' if ext is None: ext = determine_ext(url) - tbr = int_or_none(rend.get('encodingRate'), 1000), + tbr = int_or_none(rend.get('encodingRate'), 1000) a_format = { 'format_id': 'http%s' % ('-%s' % tbr if tbr else ''), 'url': url, diff --git a/youtube_dl/extractor/cbc.py b/youtube_dl/extractor/cbc.py index d8aa31038..68a0633b6 100644 --- a/youtube_dl/extractor/cbc.py +++ b/youtube_dl/extractor/cbc.py @@ -33,6 +33,7 @@ class CBCIE(InfoExtractor): 'title': 'Robin Williams freestyles on 90 Minutes Live', 'description': 'Wacky American comedian Robin Williams shows off his infamous "freestyle" comedic talents while being interviewed on CBC\'s 90 Minutes Live.', 'upload_date': '19700101', + 'uploader': 'CBCC-NEW', }, 'params': { # rtmp download diff --git a/youtube_dl/extractor/cbs.py b/youtube_dl/extractor/cbs.py index c621a08d5..051d783a2 100644 --- a/youtube_dl/extractor/cbs.py +++ b/youtube_dl/extractor/cbs.py @@ -5,7 +5,6 @@ from ..utils import ( xpath_text, xpath_element, int_or_none, - ExtractorError, find_xpath_attr, ) @@ -64,7 +63,7 @@ class CBSIE(CBSBaseIE): 'url': 
'http://www.colbertlateshow.com/podcasts/dYSwjqPs_X1tvbV_P2FcPWRa_qT6akTC/in-the-bad-room-with-stephen/', 'only_matching': True, }] - TP_RELEASE_URL_TEMPLATE = 'http://link.theplatform.com/s/dJ5BDC/%s?manifest=m3u&mbr=true' + TP_RELEASE_URL_TEMPLATE = 'http://link.theplatform.com/s/dJ5BDC/%s?mbr=true' def _real_extract(self, url): display_id = self._match_id(url) @@ -84,11 +83,11 @@ class CBSIE(CBSBaseIE): pid = xpath_text(item, 'pid') if not pid: continue - try: - tp_formats, tp_subtitles = self._extract_theplatform_smil( - self.TP_RELEASE_URL_TEMPLATE % pid, content_id, 'Downloading %s SMIL data' % pid) - except ExtractorError: - continue + tp_release_url = self.TP_RELEASE_URL_TEMPLATE % pid + if '.m3u8' in xpath_text(item, 'contentUrl', default=''): + tp_release_url += '&manifest=m3u' + tp_formats, tp_subtitles = self._extract_theplatform_smil( + tp_release_url, content_id, 'Downloading %s SMIL data' % pid) formats.extend(tp_formats) subtitles = self._merge_subtitles(subtitles, tp_subtitles) self._sort_formats(formats) diff --git a/youtube_dl/extractor/common.py b/youtube_dl/extractor/common.py index 5269059d0..02cd2c003 100644 --- a/youtube_dl/extractor/common.py +++ b/youtube_dl/extractor/common.py @@ -382,7 +382,7 @@ class InfoExtractor(object): else: if query: url_or_request = update_url_query(url_or_request, query) - if data or headers: + if data is not None or headers: url_or_request = sanitized_Request(url_or_request, data, headers) try: return self._downloader.urlopen(url_or_request) diff --git a/youtube_dl/extractor/dispeak.py b/youtube_dl/extractor/dispeak.py new file mode 100644 index 000000000..a78cb8a2a --- /dev/null +++ b/youtube_dl/extractor/dispeak.py @@ -0,0 +1,114 @@ +from __future__ import unicode_literals + +import re + +from .common import InfoExtractor +from ..utils import ( + int_or_none, + parse_duration, + remove_end, + xpath_element, + xpath_text, +) + + +class DigitallySpeakingIE(InfoExtractor): + _VALID_URL = 
r'https?://(?:evt\.dispeak|events\.digitallyspeaking)\.com/(?:[^/]+/)+xml/(?P[^.]+)\.xml' + + _TESTS = [{ + # From http://gdcvault.com/play/1023460/Tenacious-Design-and-The-Interface + 'url': 'http://evt.dispeak.com/ubm/gdc/sf16/xml/840376_BQRC.xml', + 'md5': 'a8efb6c31ed06ca8739294960b2dbabd', + 'info_dict': { + 'id': '840376_BQRC', + 'ext': 'mp4', + 'title': 'Tenacious Design and The Interface of \'Destiny\'', + }, + }, { + # From http://www.gdcvault.com/play/1014631/Classic-Game-Postmortem-PAC + 'url': 'http://events.digitallyspeaking.com/gdc/sf11/xml/12396_1299111843500GMPX.xml', + 'only_matching': True, + }] + + def _parse_mp4(self, metadata): + video_formats = [] + video_root = None + + mp4_video = xpath_text(metadata, './mp4video', default=None) + if mp4_video is not None: + mobj = re.match(r'(?Phttps?://.*?/).*', mp4_video) + video_root = mobj.group('root') + if video_root is None: + http_host = xpath_text(metadata, 'httpHost', default=None) + if http_host: + video_root = 'http://%s/' % http_host + if video_root is None: + # Hard-coded in http://evt.dispeak.com/ubm/gdc/sf16/custom/player2.js + # Works for GPUTechConf, too + video_root = 'http://s3-2u.digitallyspeaking.com/' + + formats = metadata.findall('./MBRVideos/MBRVideo') + if not formats: + return None + for a_format in formats: + stream_name = xpath_text(a_format, 'streamName', fatal=True) + video_path = re.match(r'mp4\:(?P.*)', stream_name).group('path') + url = video_root + video_path + vbr = xpath_text(a_format, 'bitrate') + video_formats.append({ + 'url': url, + 'vbr': int_or_none(vbr), + }) + return video_formats + + def _parse_flv(self, metadata): + formats = [] + akamai_url = xpath_text(metadata, './akamaiHost', fatal=True) + audios = metadata.findall('./audios/audio') + for audio in audios: + formats.append({ + 'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url, + 'play_path': remove_end(audio.get('url'), '.flv'), + 'ext': 'flv', + 'vcodec': 'none', + 'format_id': audio.get('code'), + }) + 
slide_video_path = xpath_text(metadata, './slideVideo', fatal=True) + formats.append({ + 'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url, + 'play_path': remove_end(slide_video_path, '.flv'), + 'ext': 'flv', + 'format_note': 'slide deck video', + 'quality': -2, + 'preference': -2, + 'format_id': 'slides', + }) + speaker_video_path = xpath_text(metadata, './speakerVideo', fatal=True) + formats.append({ + 'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url, + 'play_path': remove_end(speaker_video_path, '.flv'), + 'ext': 'flv', + 'format_note': 'speaker video', + 'quality': -1, + 'preference': -1, + 'format_id': 'speaker', + }) + return formats + + def _real_extract(self, url): + video_id = self._match_id(url) + + xml_description = self._download_xml(url, video_id) + metadata = xpath_element(xml_description, 'metadata') + + video_formats = self._parse_mp4(metadata) + if video_formats is None: + video_formats = self._parse_flv(metadata) + + return { + 'id': video_id, + 'formats': video_formats, + 'title': xpath_text(metadata, 'title', fatal=True), + 'duration': parse_duration(xpath_text(metadata, 'endTime')), + 'creator': xpath_text(metadata, 'speaker'), + } diff --git a/youtube_dl/extractor/douyutv.py b/youtube_dl/extractor/douyutv.py index 3915cb182..ce6962755 100644 --- a/youtube_dl/extractor/douyutv.py +++ b/youtube_dl/extractor/douyutv.py @@ -18,7 +18,7 @@ class DouyuTVIE(InfoExtractor): 'display_id': 'iseven', 'ext': 'flv', 'title': 're:^清晨醒脑!T-ara根本停不下来! 
[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$', - 'description': 'md5:f34981259a03e980a3c6404190a3ed61', + 'description': 're:.*m7show@163\.com.*', 'thumbnail': 're:^https?://.*\.jpg$', 'uploader': '7师傅', 'uploader_id': '431925', @@ -43,7 +43,7 @@ class DouyuTVIE(InfoExtractor): 'params': { 'skip_download': True, }, - 'skip': 'Romm not found', + 'skip': 'Room not found', }, { 'url': 'http://www.douyutv.com/17732', 'info_dict': { @@ -51,7 +51,7 @@ class DouyuTVIE(InfoExtractor): 'display_id': '17732', 'ext': 'flv', 'title': 're:^清晨醒脑!T-ara根本停不下来! [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$', - 'description': 'md5:f34981259a03e980a3c6404190a3ed61', + 'description': 're:.*m7show@163\.com.*', 'thumbnail': 're:^https?://.*\.jpg$', 'uploader': '7师傅', 'uploader_id': '431925', @@ -75,13 +75,28 @@ class DouyuTVIE(InfoExtractor): room_id = self._html_search_regex( r'"room_id"\s*:\s*(\d+),', page, 'room id') - prefix = 'room/%s?aid=android&client_sys=android&time=%d' % ( - room_id, int(time.time())) + config = None + # Douyu API sometimes returns error "Unable to load the requested class: eticket_redis_cache" + # Retry with different parameters - same parameters cause same errors + for i in range(5): + prefix = 'room/%s?aid=android&client_sys=android&time=%d' % ( + room_id, int(time.time())) + auth = hashlib.md5((prefix + '1231').encode('ascii')).hexdigest() - auth = hashlib.md5((prefix + '1231').encode('ascii')).hexdigest() - config = self._download_json( - 'http://www.douyutv.com/api/v1/%s&auth=%s' % (prefix, auth), - video_id) + config_page = self._download_webpage( + 'http://www.douyutv.com/api/v1/%s&auth=%s' % (prefix, auth), + video_id) + try: + config = self._parse_json(config_page, video_id, fatal=False) + except ExtractorError: + # Wait some time before retrying to get a different time() value + self._sleep(1, video_id, msg_template='%(video_id)s: Error occurs. 
' + 'Waiting for %(timeout)s seconds before retrying') + continue + else: + break + if config is None: + raise ExtractorError('Unable to fetch API result') data = config['data'] diff --git a/youtube_dl/extractor/dplay.py b/youtube_dl/extractor/dplay.py index 66bbfc6ca..5790553f3 100644 --- a/youtube_dl/extractor/dplay.py +++ b/youtube_dl/extractor/dplay.py @@ -6,13 +6,18 @@ import re import time from .common import InfoExtractor -from ..utils import int_or_none +from ..compat import compat_urlparse +from ..utils import ( + int_or_none, + update_url_query, +) class DPlayIE(InfoExtractor): _VALID_URL = r'https?://(?Pit\.dplay\.com|www\.dplay\.(?:dk|se|no))/[^/]+/(?P[^/?#]+)' _TESTS = [{ + # geo restricted, via direct unsigned hls URL 'url': 'http://it.dplay.com/take-me-out/stagione-1-episodio-25/', 'info_dict': { 'id': '1255600', @@ -31,11 +36,12 @@ class DPlayIE(InfoExtractor): }, 'expected_warnings': ['Unable to download f4m manifest'], }, { + # non geo restricted, via secure api, unsigned download hls URL 'url': 'http://www.dplay.se/nugammalt-77-handelser-som-format-sverige/season-1-svensken-lar-sig-njuta-av-livet/', 'info_dict': { 'id': '3172', 'display_id': 'season-1-svensken-lar-sig-njuta-av-livet', - 'ext': 'flv', + 'ext': 'mp4', 'title': 'Svensken lär sig njuta av livet', 'description': 'md5:d3819c9bccffd0fe458ca42451dd50d8', 'duration': 2650, @@ -48,23 +54,25 @@ class DPlayIE(InfoExtractor): 'age_limit': 0, }, }, { + # geo restricted, via secure api, unsigned download hls URL 'url': 'http://www.dplay.dk/mig-og-min-mor/season-6-episode-12/', 'info_dict': { 'id': '70816', 'display_id': 'season-6-episode-12', - 'ext': 'flv', + 'ext': 'mp4', 'title': 'Episode 12', 'description': 'md5:9c86e51a93f8a4401fc9641ef9894c90', 'duration': 2563, 'timestamp': 1429696800, 'upload_date': '20150422', - 'creator': 'Kanal 4', + 'creator': 'Kanal 4 (Home)', 'series': 'Mig og min mor', 'season_number': 6, 'episode_number': 12, 'age_limit': 0, }, }, { + # geo restricted, via 
direct unsigned hls URL 'url': 'http://www.dplay.no/pga-tour/season-1-hoydepunkter-18-21-februar/', 'only_matching': True, }] @@ -90,17 +98,24 @@ class DPlayIE(InfoExtractor): def extract_formats(protocol, manifest_url): if protocol == 'hls': - formats.extend(self._extract_m3u8_formats( + m3u8_formats = self._extract_m3u8_formats( manifest_url, video_id, ext='mp4', - entry_protocol='m3u8_native', m3u8_id=protocol, fatal=False)) + entry_protocol='m3u8_native', m3u8_id=protocol, fatal=False) + # Sometimes final URLs inside m3u8 are unsigned, let's fix this + # ourselves + query = compat_urlparse.parse_qs(compat_urlparse.urlparse(manifest_url).query) + for m3u8_format in m3u8_formats: + m3u8_format['url'] = update_url_query(m3u8_format['url'], query) + formats.extend(m3u8_formats) elif protocol == 'hds': formats.extend(self._extract_f4m_formats( manifest_url + '&hdcore=3.8.0&plugin=flowplayer-3.8.0.0', video_id, f4m_id=protocol, fatal=False)) domain_tld = domain.split('.')[-1] - if domain_tld in ('se', 'dk'): + if domain_tld in ('se', 'dk', 'no'): for protocol in PROTOCOLS: + # Providing dsc-geo allows to bypass geo restriction in some cases self._set_cookie( 'secure.dplay.%s' % domain_tld, 'dsc-geo', json.dumps({ @@ -113,13 +128,24 @@ class DPlayIE(InfoExtractor): 'Downloading %s stream JSON' % protocol, fatal=False) if stream and stream.get(protocol): extract_formats(protocol, stream[protocol]) - else: + + # The last resort is to try direct unsigned hls/hds URLs from info dictionary. + # Sometimes this does work even when secure API with dsc-geo has failed (e.g. + # http://www.dplay.no/pga-tour/season-1-hoydepunkter-18-21-februar/). 
+ if not formats: for protocol in PROTOCOLS: if info.get(protocol): extract_formats(protocol, info[protocol]) self._sort_formats(formats) + subtitles = {} + for lang in ('se', 'sv', 'da', 'nl', 'no'): + for format_id in ('web_vtt', 'vtt', 'srt'): + subtitle_url = info.get('subtitles_%s_%s' % (lang, format_id)) + if subtitle_url: + subtitles.setdefault(lang, []).append({'url': subtitle_url}) + return { 'id': video_id, 'display_id': display_id, @@ -133,4 +159,5 @@ class DPlayIE(InfoExtractor): 'episode_number': int_or_none(info.get('episode')), 'age_limit': int_or_none(info.get('minimum_age')), 'formats': formats, + 'subtitles': subtitles, } diff --git a/youtube_dl/extractor/dump.py b/youtube_dl/extractor/dump.py deleted file mode 100644 index ff78d4fd2..000000000 --- a/youtube_dl/extractor/dump.py +++ /dev/null @@ -1,39 +0,0 @@ -# encoding: utf-8 -from __future__ import unicode_literals - -import re - -from .common import InfoExtractor - - -class DumpIE(InfoExtractor): - _VALID_URL = r'^https?://(?:www\.)?dump\.com/(?P<id>[a-zA-Z0-9]+)/' - - _TEST = { - 'url': 'http://www.dump.com/oneus/', - 'md5': 'ad71704d1e67dfd9e81e3e8b42d69d99', - 'info_dict': { - 'id': 'oneus', - 'ext': 'flv', - 'title': "He's one of us.", - 'thumbnail': 're:^https?://.*\.jpg$', - }, - } - - def _real_extract(self, url): - m = re.match(self._VALID_URL, url) - video_id = m.group('id') - - webpage = self._download_webpage(url, video_id) - video_url = self._search_regex( - r's1.addVariable\("file",\s*"([^"]+)"', webpage, 'video URL') - - title = self._og_search_title(webpage) - thumbnail = self._og_search_thumbnail(webpage) - - return { - 'id': video_id, - 'title': title, - 'url': video_url, - 'thumbnail': thumbnail, - } diff --git a/youtube_dl/extractor/eagleplatform.py b/youtube_dl/extractor/eagleplatform.py index 7bbf617d4..0f8c73fd7 100644 --- a/youtube_dl/extractor/eagleplatform.py +++ b/youtube_dl/extractor/eagleplatform.py @@ -4,9 +4,11 @@ from __future__ import unicode_literals import re from 
.common import InfoExtractor +from ..compat import compat_HTTPError from ..utils import ( ExtractorError, int_or_none, + url_basename, ) @@ -21,7 +23,7 @@ class EaglePlatformIE(InfoExtractor): _TESTS = [{ # http://lenta.ru/news/2015/03/06/navalny/ 'url': 'http://lentaru.media.eagleplatform.com/index/player?player=new&record_id=227304&player_template_id=5201', - 'md5': '70f5187fb620f2c1d503b3b22fd4efe3', + 'md5': '881ee8460e1b7735a8be938e2ffb362b', 'info_dict': { 'id': '227304', 'ext': 'mp4', @@ -36,7 +38,7 @@ class EaglePlatformIE(InfoExtractor): # http://muz-tv.ru/play/7129/ # http://media.clipyou.ru/index/player?record_id=12820&width=730&height=415&autoplay=true 'url': 'eagleplatform:media.clipyou.ru:12820', - 'md5': '90b26344ba442c8e44aa4cf8f301164a', + 'md5': '358597369cf8ba56675c1df15e7af624', 'info_dict': { 'id': '12820', 'ext': 'mp4', @@ -55,8 +57,13 @@ class EaglePlatformIE(InfoExtractor): raise ExtractorError(' '.join(response['errors']), expected=True) def _download_json(self, url_or_request, video_id, note='Downloading JSON metadata'): - response = super(EaglePlatformIE, self)._download_json(url_or_request, video_id, note) - self._handle_error(response) + try: + response = super(EaglePlatformIE, self)._download_json(url_or_request, video_id, note) + except ExtractorError as ee: + if isinstance(ee.cause, compat_HTTPError): + response = self._parse_json(ee.cause.read().decode('utf-8'), video_id) + self._handle_error(response) + raise return response def _get_video_url(self, url_or_request, video_id, note='Downloading JSON metadata'): @@ -84,17 +91,30 @@ class EaglePlatformIE(InfoExtractor): secure_m3u8 = self._proto_relative_url(media['sources']['secure_m3u8']['auto'], 'http:') + formats = [] + m3u8_url = self._get_video_url(secure_m3u8, video_id, 'Downloading m3u8 JSON') - formats = self._extract_m3u8_formats( + m3u8_formats = self._extract_m3u8_formats( m3u8_url, video_id, 'mp4', entry_protocol='m3u8_native', m3u8_id='hls') + formats.extend(m3u8_formats) 
mp4_url = self._get_video_url( # Secure mp4 URL is constructed according to Player.prototype.mp4 from # http://lentaru.media.eagleplatform.com/player/player.js re.sub(r'm3u8|hlsvod|hls|f4m', 'mp4', secure_m3u8), video_id, 'Downloading mp4 JSON') - formats.append({'url': mp4_url, 'format_id': 'mp4'}) + mp4_url_basename = url_basename(mp4_url) + for m3u8_format in m3u8_formats: + mobj = re.search('/([^/]+)/index\.m3u8', m3u8_format['url']) + if mobj: + http_format = m3u8_format.copy() + http_format.update({ + 'url': mp4_url.replace(mp4_url_basename, mobj.group(1)), + 'format_id': m3u8_format['format_id'].replace('hls', 'http'), + 'protocol': 'http', + }) + formats.append(http_format) self._sort_formats(formats) diff --git a/youtube_dl/extractor/extractors.py b/youtube_dl/extractor/extractors.py index c234ff127..6de3438fc 100644 --- a/youtube_dl/extractor/extractors.py +++ b/youtube_dl/extractor/extractors.py @@ -46,6 +46,7 @@ from .arte import ( ArteTVPlus7IE, ArteTVCreativeIE, ArteTVConcertIE, + ArteTVInfoIE, ArteTVFutureIE, ArteTVCinemaIE, ArteTVDDCIE, @@ -192,10 +193,10 @@ from .drbonanza import DRBonanzaIE from .drtuber import DrTuberIE from .drtv import DRTVIE from .dvtv import DVTVIE -from .dump import DumpIE from .dumpert import DumpertIE from .defense import DefenseGouvFrIE from .discovery import DiscoveryIE +from .dispeak import DigitallySpeakingIE from .dropbox import DropboxIE from .dw import ( DWIE, @@ -336,7 +337,6 @@ from .ivi import ( ) from .ivideon import IvideonIE from .izlesene import IzleseneIE -from .jadorecettepub import JadoreCettePubIE from .jeuxvideo import JeuxVideoIE from .jove import JoveIE from .jwplatform import JWPlatformIE @@ -406,13 +406,19 @@ from .mdr import MDRIE from .metacafe import MetacafeIE from .metacritic import MetacriticIE from .mgoon import MgoonIE +from .mgtv import MGTVIE from .minhateca import MinhatecaIE from .ministrygrid import MinistryGridIE from .minoto import MinotoIE from .miomio import MioMioIE from .mit import 
TechTVMITIE, MITIE, OCWMITIE from .mitele import MiTeleIE -from .mixcloud import MixcloudIE +from .mixcloud import ( + MixcloudIE, + MixcloudUserIE, + MixcloudPlaylistIE, + MixcloudStreamIE, +) from .mlb import MLBIE from .mnet import MnetIE from .mpora import MporaIE @@ -420,7 +426,6 @@ from .moevideo import MoeVideoIE from .mofosex import MofosexIE from .mojvideo import MojvideoIE from .moniker import MonikerIE -from .mooshare import MooshareIE from .morningstar import MorningstarIE from .motherless import MotherlessIE from .motorsport import MotorsportIE @@ -465,7 +470,6 @@ from .ndr import ( from .ndtv import NDTVIE from .netzkino import NetzkinoIE from .nerdcubed import NerdCubedFeedIE -from .nerdist import NerdistIE from .neteasemusic import ( NetEaseMusicIE, NetEaseMusicAlbumIE, @@ -486,9 +490,10 @@ from .nextmovie import NextMovieIE from .nfb import NFBIE from .nfl import NFLIE from .nhl import ( - NHLIE, - NHLNewsIE, NHLVideocenterIE, + NHLNewsIE, + NHLVideocenterCategoryIE, + NHLIE, ) from .nick import NickIE from .niconico import NiconicoIE, NiconicoPlaylistIE @@ -556,12 +561,12 @@ from .pandoratv import PandoraTVIE from .parliamentliveuk import ParliamentLiveUKIE from .patreon import PatreonIE from .pbs import PBSIE +from .people import PeopleIE from .periscope import PeriscopeIE from .philharmoniedeparis import PhilharmonieDeParisIE from .phoenix import PhoenixIE from .photobucket import PhotobucketIE from .pinkbike import PinkbikeIE -from .planetaplay import PlanetaPlayIE from .pladform import PladformIE from .played import PlayedIE from .playfm import PlayFMIE @@ -597,7 +602,6 @@ from .qqmusic import ( QQMusicToplistIE, QQMusicPlaylistIE, ) -from .quickvid import QuickVidIE from .r7 import R7IE from .radiode import RadioDeIE from .radiojavan import RadioJavanIE @@ -730,6 +734,7 @@ from .sztvhu import SztvHuIE from .tagesschau import TagesschauIE from .tapely import TapelyIE from .tass import TassIE +from .tdslifeway import TDSLifewayIE from 
.teachertube import ( TeacherTubeIE, TeacherTubeUserIE, @@ -747,7 +752,6 @@ from .teletask import TeleTaskIE from .testurl import TestURLIE from .tf1 import TF1IE from .theintercept import TheInterceptIE -from .theonion import TheOnionIE from .theplatform import ( ThePlatformIE, ThePlatformFeedIE, @@ -832,7 +836,6 @@ from .twitter import ( TwitterIE, TwitterAmplifyIE, ) -from .ubu import UbuIE from .udemy import ( UdemyIE, UdemyCourseIE @@ -917,7 +920,6 @@ from .vulture import VultureIE from .walla import WallaIE from .washingtonpost import WashingtonPostIE from .wat import WatIE -from .wayofthemaster import WayOfTheMasterIE from .wdr import ( WDRIE, WDRMobileIE, diff --git a/youtube_dl/extractor/gazeta.py b/youtube_dl/extractor/gazeta.py index ea32b621c..18ef5c252 100644 --- a/youtube_dl/extractor/gazeta.py +++ b/youtube_dl/extractor/gazeta.py @@ -7,7 +7,7 @@ from .common import InfoExtractor class GazetaIE(InfoExtractor): - _VALID_URL = r'(?P<url>https?://(?:www\.)?gazeta\.ru/(?:[^/]+/)?video/(?:(?:main|\d{4}/\d{2}/\d{2})/)?(?P<id>[A-Za-z0-9-_.]+)\.s?html)' + _VALID_URL = r'(?P<url>https?://(?:www\.)?gazeta\.ru/(?:[^/]+/)?video/(?:main/)*(?:\d{4}/\d{2}/\d{2}/)?(?P<id>[A-Za-z0-9-_.]+)\.s?html)' _TESTS = [{ 'url': 'http://www.gazeta.ru/video/main/zadaite_vopros_vladislavu_yurevichu.shtml', 'md5': 'd49c9bdc6e5a7888f27475dc215ee789', @@ -18,9 +18,19 @@ class GazetaIE(InfoExtractor): 'description': 'md5:38617526050bd17b234728e7f9620a71', 'thumbnail': 're:^https?://.*\.jpg', }, + 'skip': 'video not found', }, { 'url': 'http://www.gazeta.ru/lifestyle/video/2015/03/08/master-klass_krasivoi_byt._delaem_vesennii_makiyazh.shtml', 'only_matching': True, + }, { + 'url': 'http://www.gazeta.ru/video/main/main/2015/06/22/platit_ili_ne_platit_po_isku_yukosa.shtml', + 'md5': '37f19f78355eb2f4256ee1688359f24c', + 'info_dict': { + 'id': '252048', + 'ext': 'mp4', + 'title': '"Если по иску ЮКОСа придется платить, это будет большой удар по бюджету"', + }, + 'add_ie': ['EaglePlatform'], }] def 
_real_extract(self, url): diff --git a/youtube_dl/extractor/gdcvault.py b/youtube_dl/extractor/gdcvault.py index 25e93c9a4..3136427db 100644 --- a/youtube_dl/extractor/gdcvault.py +++ b/youtube_dl/extractor/gdcvault.py @@ -4,7 +4,6 @@ import re from .common import InfoExtractor from ..utils import ( - remove_end, HEADRequest, sanitized_Request, urlencode_postdata, @@ -51,63 +50,33 @@ class GDCVaultIE(InfoExtractor): { 'url': 'http://gdcvault.com/play/1020791/', 'only_matching': True, - } + }, + { + # Hard-coded hostname + 'url': 'http://gdcvault.com/play/1023460/Tenacious-Design-and-The-Interface', + 'md5': 'a8efb6c31ed06ca8739294960b2dbabd', + 'info_dict': { + 'id': '1023460', + 'ext': 'mp4', + 'display_id': 'Tenacious-Design-and-The-Interface', + 'title': 'Tenacious Design and The Interface of \'Destiny\'', + }, + }, + { + # Multiple audios + 'url': 'http://www.gdcvault.com/play/1014631/Classic-Game-Postmortem-PAC', + 'info_dict': { + 'id': '1014631', + 'ext': 'flv', + 'title': 'How to Create a Good Game - From My Experience of Designing Pac-Man', + }, + 'params': { + 'skip_download': True, # Requires rtmpdump + 'format': 'jp', # The japanese audio + } + }, ] - def _parse_mp4(self, xml_description): - video_formats = [] - mp4_video = xml_description.find('./metadata/mp4video') - if mp4_video is None: - return None - - mobj = re.match(r'(?P<root>https?://.*?/).*', mp4_video.text) - video_root = mobj.group('root') - formats = xml_description.findall('./metadata/MBRVideos/MBRVideo') - for format in formats: - mobj = re.match(r'mp4\:(?P<path>.*)', format.find('streamName').text) - url = video_root + mobj.group('path') - vbr = format.find('bitrate').text - video_formats.append({ - 'url': url, - 'vbr': int(vbr), - }) - return video_formats - - def _parse_flv(self, xml_description): - formats = [] - akamai_url = xml_description.find('./metadata/akamaiHost').text - audios = xml_description.find('./metadata/audios') - if audios is not None: - for audio in audios: - formats.append({ - 
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url, - 'play_path': remove_end(audio.get('url'), '.flv'), - 'ext': 'flv', - 'vcodec': 'none', - 'format_id': audio.get('code'), - }) - slide_video_path = xml_description.find('./metadata/slideVideo').text - formats.append({ - 'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url, - 'play_path': remove_end(slide_video_path, '.flv'), - 'ext': 'flv', - 'format_note': 'slide deck video', - 'quality': -2, - 'preference': -2, - 'format_id': 'slides', - }) - speaker_video_path = xml_description.find('./metadata/speakerVideo').text - formats.append({ - 'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url, - 'play_path': remove_end(speaker_video_path, '.flv'), - 'ext': 'flv', - 'format_note': 'speaker video', - 'quality': -1, - 'preference': -1, - 'format_id': 'speaker', - }) - return formats - def _login(self, webpage_url, display_id): (username, password) = self._get_login_info() if username is None or password is None: @@ -183,17 +152,10 @@ class GDCVaultIE(InfoExtractor): r'