Merge branch 'master' into wdr_update

Jürn Brodersen 2015-08-08 16:51:41 +02:00
commit 44124f0268
134 changed files with 5453 additions and 1767 deletions

@ -129,3 +129,11 @@ Mister Hat
Peter Ding Peter Ding
jackyzy823 jackyzy823
George Brighton George Brighton
Remita Amine
Aurélio A. Heckert
Bernhard Minks
sceext
Zach Bruggeman
Tjark Saul
slangangular
Behrouz Abbasi

@ -75,7 +75,7 @@ which means you can modify it, redistribute it or use it however you like.
## Video Selection: ## Video Selection:
--playlist-start NUMBER Playlist video to start at (default is 1) --playlist-start NUMBER Playlist video to start at (default is 1)
--playlist-end NUMBER Playlist video to end at (default is last) --playlist-end NUMBER Playlist video to end at (default is last)
--playlist-items ITEM_SPEC Playlist video items to download. Specify indices of the videos in the playlist seperated by commas like: "--playlist-items 1,2,5,8" --playlist-items ITEM_SPEC Playlist video items to download. Specify indices of the videos in the playlist separated by commas like: "--playlist-items 1,2,5,8"
if you want to download videos indexed 1, 2, 5, 8 in the playlist. You can specify range: "--playlist-items 1-3,7,10-13", it will if you want to download videos indexed 1, 2, 5, 8 in the playlist. You can specify range: "--playlist-items 1-3,7,10-13", it will
download the videos at index 1, 2, 3, 7, 10, 11, 12 and 13. download the videos at index 1, 2, 3, 7, 10, 11, 12 and 13.
--match-title REGEX Download only matching titles (regex or caseless sub-string) --match-title REGEX Download only matching titles (regex or caseless sub-string)
@ -108,7 +108,7 @@ which means you can modify it, redistribute it or use it however you like.
--playlist-reverse Download playlist videos in reverse order --playlist-reverse Download playlist videos in reverse order
--xattr-set-filesize Set file xattribute ytdl.filesize with expected filesize (experimental) --xattr-set-filesize Set file xattribute ytdl.filesize with expected filesize (experimental)
--hls-prefer-native Use the native HLS downloader instead of ffmpeg (experimental) --hls-prefer-native Use the native HLS downloader instead of ffmpeg (experimental)
--external-downloader COMMAND Use the specified external downloader. Currently supports aria2c,curl,wget --external-downloader COMMAND Use the specified external downloader. Currently supports aria2c,curl,httpie,wget
--external-downloader-args ARGS Give these arguments to the external downloader --external-downloader-args ARGS Give these arguments to the external downloader
## Filesystem Options: ## Filesystem Options:
@ -190,8 +190,8 @@ which means you can modify it, redistribute it or use it however you like.
--all-formats Download all available video formats --all-formats Download all available video formats
--prefer-free-formats Prefer free video formats unless a specific one is requested --prefer-free-formats Prefer free video formats unless a specific one is requested
-F, --list-formats List all available formats -F, --list-formats List all available formats
--youtube-skip-dash-manifest Do not download the DASH manifest on YouTube videos --youtube-skip-dash-manifest Do not download the DASH manifests and related data on YouTube videos
--merge-output-format FORMAT If a merge is required (e.g. bestvideo+bestaudio), output to given container format. One of mkv, mp4, ogg, webm, flv.Ignored if no --merge-output-format FORMAT If a merge is required (e.g. bestvideo+bestaudio), output to given container format. One of mkv, mp4, ogg, webm, flv. Ignored if no
merge is required merge is required
## Subtitle Options: ## Subtitle Options:
@ -214,7 +214,8 @@ which means you can modify it, redistribute it or use it however you like.
--audio-format FORMAT Specify audio format: "best", "aac", "vorbis", "mp3", "m4a", "opus", or "wav"; "best" by default --audio-format FORMAT Specify audio format: "best", "aac", "vorbis", "mp3", "m4a", "opus", or "wav"; "best" by default
--audio-quality QUALITY Specify ffmpeg/avconv audio quality, insert a value between 0 (better) and 9 (worse) for VBR or a specific bitrate like 128K (default --audio-quality QUALITY Specify ffmpeg/avconv audio quality, insert a value between 0 (better) and 9 (worse) for VBR or a specific bitrate like 128K (default
5) 5)
--recode-video FORMAT Encode the video to another format if necessary (currently supported: mp4|flv|ogg|webm|mkv) --recode-video FORMAT Encode the video to another format if necessary (currently supported: mp4|flv|ogg|webm|mkv|avi)
--postprocessor-args ARGS Give these arguments to the postprocessor
-k, --keep-video Keep the video file on disk after the post-processing; the video is erased by default -k, --keep-video Keep the video file on disk after the post-processing; the video is erased by default
--no-post-overwrites Do not overwrite post-processed files; the post-processed files are overwritten by default --no-post-overwrites Do not overwrite post-processed files; the post-processed files are overwritten by default
--embed-subs Embed subtitles in the video (only for mkv and mp4 videos) --embed-subs Embed subtitles in the video (only for mkv and mp4 videos)
@ -237,6 +238,26 @@ which means you can modify it, redistribute it or use it however you like.
You can configure youtube-dl by placing default arguments (such as `--extract-audio --no-mtime` to always extract the audio and not copy the mtime) into `/etc/youtube-dl.conf` and/or `~/.config/youtube-dl/config`. On Windows, the configuration file locations are `%APPDATA%\youtube-dl\config.txt` and `C:\Users\<user name>\youtube-dl.conf`. You can configure youtube-dl by placing default arguments (such as `--extract-audio --no-mtime` to always extract the audio and not copy the mtime) into `/etc/youtube-dl.conf` and/or `~/.config/youtube-dl/config`. On Windows, the configuration file locations are `%APPDATA%\youtube-dl\config.txt` and `C:\Users\<user name>\youtube-dl.conf`.
### Authentication with `.netrc` file ###
You may also want to configure automatic credentials storage for extractors that support authentication (by providing login and password with `--username` and `--password`) in order not to pass credentials as command line arguments on every youtube-dl execution and prevent tracking plain text passwords in shell command history. You can achieve this using [`.netrc` file](http://stackoverflow.com/tags/.netrc/info) on per extractor basis. For that you will need to create `.netrc` file in your `$HOME` and restrict permissions to read/write by you only:
```
touch $HOME/.netrc
chmod a-rwx,u+rw $HOME/.netrc
```
After that you can add credentials for extractor in the following format, where *extractor* is the name of extractor in lowercase:
```
machine <extractor> login <login> password <password>
```
For example:
```
machine youtube login myaccount@gmail.com password my_youtube_password
machine twitch login my_twitch_account_name password my_twitch_password
```
To activate authentication with `.netrc` file you should pass `--netrc` to youtube-dl or to place it in [configuration file](#configuration).
On Windows you may also need to setup `%HOME%` environment variable manually.
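
The `.netrc` entry format described above can be checked with Python's standard `netrc` module. This is a minimal sketch and not youtube-dl's own lookup code: the helper names `lookup_credentials` and `demo` are hypothetical, the credentials are the placeholder values from the example, and the sample is written to a temporary file so the real `$HOME/.netrc` is never touched.

```python
import netrc
import os
import tempfile

# Sample entries in the format described above (placeholder credentials).
SAMPLE = (
    "machine youtube login myaccount@gmail.com password my_youtube_password\n"
    "machine twitch login my_twitch_account_name password my_twitch_password\n"
)


def lookup_credentials(netrc_path, extractor):
    """Return (login, password) for the extractor name, or None if absent."""
    entry = netrc.netrc(netrc_path).authenticators(extractor)
    if entry is None:
        return None
    login, _account, password = entry
    return login, password


def demo():
    # Write the sample to a temporary file instead of the user's ~/.netrc.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(SAMPLE)
        return lookup_credentials(path, "twitch")
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

The extractor name is used as the `machine` field, which is why it has to be written in lowercase exactly as the extractor is named.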
# OUTPUT TEMPLATE # OUTPUT TEMPLATE
The `-o` option allows users to indicate a template for the output file names. The basic usage is not to set any template arguments when downloading a single file, like in `youtube-dl -o funny_video.flv "http://some/video"`. However, it may contain special sequences that will be replaced when downloading each video. The special sequences have the format `%(NAME)s`. To clarify, that is a percent symbol followed by a name in parenthesis, followed by a lowercase S. Allowed names are: The `-o` option allows users to indicate a template for the output file names. The basic usage is not to set any template arguments when downloading a single file, like in `youtube-dl -o funny_video.flv "http://some/video"`. However, it may contain special sequences that will be replaced when downloading each video. The special sequences have the format `%(NAME)s`. To clarify, that is a percent symbol followed by a name in parenthesis, followed by a lowercase S. Allowed names are:
@ -268,7 +289,7 @@ youtube-dl_test_video_.mp4 # A simple file name
By default youtube-dl tries to download the best quality, but sometimes you may want to download other format. By default youtube-dl tries to download the best quality, but sometimes you may want to download other format.
The simplest case is requesting a specific format, for example `-f 22`. You can get the list of available formats using `--list-formats`, you can also use a file extension (currently it supports aac, m4a, mp3, mp4, ogg, wav, webm) or the special names `best`, `bestvideo`, `bestaudio` and `worst`. The simplest case is requesting a specific format, for example `-f 22`. You can get the list of available formats using `--list-formats`, you can also use a file extension (currently it supports aac, m4a, mp3, mp4, ogg, wav, webm) or the special names `best`, `bestvideo`, `bestaudio` and `worst`.
If you want to download multiple videos and they don't have the same formats available, you can specify the order of preference using slashes, as in `-f 22/17/18`. You can also filter the video results by putting a condition in brackets, as in `-f "best[height=720]"` (or `-f "[filesize>10M]"`). This works for filesize, height, width, tbr, abr, vbr, asr, and fps and the comparisons <, <=, >, >=, =, != and for ext, acodec, vcodec, container, and protocol and the comparisons =, != . Formats for which the value is not known are excluded unless you put a question mark (?) after the operator. You can combine format filters, so `-f "[height <=? 720][tbr>500]"` selects up to 720p videos (or videos where the height is not known) with a bitrate of at least 500 KBit/s. Use commas to download multiple formats, such as `-f 136/137/mp4/bestvideo,140/m4a/bestaudio`. You can merge the video and audio of two formats into a single file using `-f <video-format>+<audio-format>` (requires ffmpeg or avconv), for example `-f bestvideo+bestaudio`. If you want to download multiple videos and they don't have the same formats available, you can specify the order of preference using slashes, as in `-f 22/17/18`. You can also filter the video results by putting a condition in brackets, as in `-f "best[height=720]"` (or `-f "[filesize>10M]"`). This works for filesize, height, width, tbr, abr, vbr, asr, and fps and the comparisons <, <=, >, >=, =, != and for ext, acodec, vcodec, container, and protocol and the comparisons =, != . Formats for which the value is not known are excluded unless you put a question mark (?) after the operator. You can combine format filters, so `-f "[height <=? 720][tbr>500]"` selects up to 720p videos (or videos where the height is not known) with a bitrate of at least 500 KBit/s. Use commas to download multiple formats, such as `-f 136/137/mp4/bestvideo,140/m4a/bestaudio`. 
You can merge the video and audio of two formats into a single file using `-f <video-format>+<audio-format>` (requires ffmpeg or avconv), for example `-f bestvideo+bestaudio`. Format selectors can also be grouped using parentheses, for example if you want to download the best mp4 and webm formats with a height lower than 480 you can use `-f '(mp4,webm)[height<480]'`.
Since the end of April 2015 and version 2015.04.26 youtube-dl uses `-f bestvideo+bestaudio/best` as default format selection (see #5447, #5456). If ffmpeg or avconv are installed this results in downloading `bestvideo` and `bestaudio` separately and muxing them together into a single file giving the best overall quality available. Otherwise it falls back to `best` and results in downloading best available quality served as a single file. `best` is also needed for videos that don't come from YouTube because they don't provide the audio and video in two different files. If you want to only download some dash formats (for example if you are not interested in getting videos with a resolution higher than 1080p), you can add `-f bestvideo[height<=?1080]+bestaudio/best` to your configuration file. Note that if you use youtube-dl to stream to `stdout` (and most likely to pipe it to your media player then), i.e. you explicitly specify output template as `-o -`, youtube-dl still uses `-f best` format selection in order to start content delivery immediately to your player and not to wait until `bestvideo` and `bestaudio` are downloaded and muxed. Since the end of April 2015 and version 2015.04.26 youtube-dl uses `-f bestvideo+bestaudio/best` as default format selection (see #5447, #5456). If ffmpeg or avconv are installed this results in downloading `bestvideo` and `bestaudio` separately and muxing them together into a single file giving the best overall quality available. Otherwise it falls back to `best` and results in downloading best available quality served as a single file. `best` is also needed for videos that don't come from YouTube because they don't provide the audio and video in two different files. If you want to only download some dash formats (for example if you are not interested in getting videos with a resolution higher than 1080p), you can add `-f bestvideo[height<=?1080]+bestaudio/best` to your configuration file. 
Note that if you use youtube-dl to stream to `stdout` (and most likely to pipe it to your media player then), i.e. you explicitly specify output template as `-o -`, youtube-dl still uses `-f best` format selection in order to start content delivery immediately to your player and not to wait until `bestvideo` and `bestaudio` are downloaded and muxed.
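
To illustrate how bracketed filters and the question mark interact, here is a small standalone sketch. It is not youtube-dl's parser: `FILTER_RE`, `matches` and `filter_formats` are hypothetical names, only plain integer values are handled (no suffixes such as `10M`), and the sample format dictionaries are made up for the example.

```python
import operator
import re

OPERATORS = {
    "<=": operator.le, "<": operator.lt,
    ">=": operator.ge, ">": operator.gt,
    "=": operator.eq, "!=": operator.ne,
}

# key, comparison operator, optional "?" (accept unknown values), integer.
FILTER_RE = re.compile(
    r"\s*(?P<key>\w+)\s*(?P<op><=|>=|!=|<|>|=)(?P<unknown_ok>\?)?"
    r"\s*(?P<value>\d+)\s*$"
)


def matches(fmt, condition):
    """Check one condition such as 'height <=? 720' against a format dict."""
    m = FILTER_RE.match(condition)
    if m is None:
        raise ValueError("unsupported condition: %r" % condition)
    actual = fmt.get(m.group("key"))
    if actual is None:
        # Unknown values pass only when the operator is followed by "?".
        return m.group("unknown_ok") is not None
    return OPERATORS[m.group("op")](actual, int(m.group("value")))


def filter_formats(formats, conditions):
    """Keep the formats that satisfy every condition."""
    return [f for f in formats if all(matches(f, c) for c in conditions)]


FORMATS = [
    {"format_id": "A", "height": 1080, "tbr": 4000},
    {"format_id": "B", "height": 720, "tbr": 2500},
    {"format_id": "C", "tbr": 800},        # height unknown
    {"format_id": "D", "height": 360, "tbr": 400},
    {"format_id": "E", "height": 480},     # tbr unknown
]

if __name__ == "__main__":
    kept = filter_formats(FORMATS, ["height <=? 720", "tbr>500"])
    print([f["format_id"] for f in kept])
```

With these sample formats, `C` is kept because its unknown height is accepted by `<=?`, while `E` is dropped because `tbr>500` has no question mark.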
@ -418,6 +439,12 @@ Either prepend `http://www.youtube.com/watch?v=` or separate the ID from the opt
youtube-dl -- -wNyEUrxzFU youtube-dl -- -wNyEUrxzFU
youtube-dl "http://www.youtube.com/watch?v=-wNyEUrxzFU" youtube-dl "http://www.youtube.com/watch?v=-wNyEUrxzFU"
### How do I pass cookies to youtube-dl?
Use the `--cookies` option, for example `--cookies /path/to/cookies/file.txt`. Note that cookies file must be in Mozilla/Netscape format and the first line of cookies file must be either `# HTTP Cookie File` or `# Netscape HTTP Cookie File`. Make sure you have correct [newline format](https://en.wikipedia.org/wiki/Newline) in cookies file and convert newlines if necessary to correspond your OS, namely `CRLF` (`\r\n`) for Windows, `LF` (`\n`) for Linux and `CR` (`\r`) for Mac OS. `HTTP Error 400: Bad Request` when using `--cookies` is a good sign of invalid newline format.
Passing cookies to youtube-dl is a good way to workaround login when particular extractor does not implement it explicitly.
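
Both requirements mentioned above (the first line and the newline format) can be checked before passing a file to `--cookies`. The following is only a sketch based on the wording of this answer, not youtube-dl's actual validation: `inspect_cookie_file` and `demo` are hypothetical helpers, and the cookie row is made-up sample data.

```python
import os
import tempfile

ACCEPTED_HEADERS = (b"# HTTP Cookie File", b"# Netscape HTTP Cookie File")


def inspect_cookie_file(path):
    """Return (header_ok, newline_style) for a cookies file."""
    with open(path, "rb") as handle:
        data = handle.read()
    # bytes.splitlines() splits on LF, CR and CRLF alike.
    first_line = data.splitlines()[0] if data else b""
    header_ok = first_line in ACCEPTED_HEADERS
    if b"\r\n" in data:
        newline_style = "CRLF"
    elif b"\r" in data:
        newline_style = "CR"
    elif b"\n" in data:
        newline_style = "LF"
    else:
        newline_style = "none"
    return header_ok, newline_style


def demo():
    # A made-up cookies file with Windows newlines, stored in a temp file.
    sample = (
        b"# Netscape HTTP Cookie File\r\n"
        b".example.com\tTRUE\t/\tFALSE\t0\tname\tvalue\r\n"
    )
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(sample)
        return inspect_cookie_file(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```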
### Can you add support for this anime video site, or site which shows current movies for free? ### Can you add support for this anime video site, or site which shows current movies for free?
As a matter of policy (as well as legality), youtube-dl does not include support for services that specialize in infringing copyright. As a rule of thumb, if you cannot easily find a video that the service is quite obviously allowed to distribute (i.e. that has been uploaded by the creator, the creator's distributor, or is published under a free license), the service is probably unfit for inclusion to youtube-dl. As a matter of policy (as well as legality), youtube-dl does not include support for services that specialize in infringing copyright. As a rule of thumb, if you cannot easily find a video that the service is quite obviously allowed to distribute (i.e. that has been uploaded by the creator, the creator's distributor, or is published under a free license), the service is probably unfit for inclusion to youtube-dl.

@ -28,7 +28,8 @@
- **anitube.se** - **anitube.se**
- **AnySex** - **AnySex**
- **Aparat** - **Aparat**
- **AppleDaily** - **AppleConnect**
- **AppleDaily**: 臺灣蘋果日報
- **AppleTrailers** - **AppleTrailers**
- **archive.org**: archive.org videos - **archive.org**: archive.org videos
- **ARD** - **ARD**
@ -45,11 +46,12 @@
- **audiomack** - **audiomack**
- **audiomack:album** - **audiomack:album**
- **Azubu** - **Azubu**
- **BaiduVideo** - **BaiduVideo**: 百度视频
- **bambuser** - **bambuser**
- **bambuser:channel** - **bambuser:channel**
- **Bandcamp** - **Bandcamp**
- **Bandcamp:album** - **Bandcamp:album**
- **bbc**: BBC
- **bbc.co.uk**: BBC iPlayer - **bbc.co.uk**: BBC iPlayer
- **BeatportPro** - **BeatportPro**
- **Beeg** - **Beeg**
@ -106,7 +108,7 @@
- **Crunchyroll** - **Crunchyroll**
- **crunchyroll:playlist** - **crunchyroll:playlist**
- **CSpan**: C-SPAN - **CSpan**: C-SPAN
- **CtsNews** - **CtsNews**: 華視新聞
- **culturebox.francetvinfo.fr** - **culturebox.francetvinfo.fr**
- **dailymotion** - **dailymotion**
- **dailymotion:playlist** - **dailymotion:playlist**
@ -121,7 +123,7 @@
- **Discovery** - **Discovery**
- **divxstage**: DivxStage - **divxstage**: DivxStage
- **Dotsub** - **Dotsub**
- **DouyuTV** - **DouyuTV**: 斗鱼
- **dramafever** - **dramafever**
- **dramafever:series** - **dramafever:series**
- **DRBonanza** - **DRBonanza**
@ -222,7 +224,8 @@
- **instagram:user**: Instagram user profile - **instagram:user**: Instagram user profile
- **InternetVideoArchive** - **InternetVideoArchive**
- **IPrima** - **IPrima**
- **iqiyi** - **iqiyi**: 爱奇艺
- **Ir90Tv**
- **ivi**: ivi.ru - **ivi**: ivi.ru
- **ivi:compilation**: ivi.ru compilations - **ivi:compilation**: ivi.ru compilations
- **Izlesene** - **Izlesene**
@ -243,9 +246,16 @@
- **kontrtube**: KontrTube.ru - Труба зовёт - **kontrtube**: KontrTube.ru - Труба зовёт
- **KrasView**: Красвью - **KrasView**: Красвью
- **Ku6** - **Ku6**
- **kuwo:album**: 酷我音乐 - 专辑
- **kuwo:category**: 酷我音乐 - 分类
- **kuwo:chart**: 酷我音乐 - 排行榜
- **kuwo:mv**: 酷我音乐 - MV
- **kuwo:singer**: 酷我音乐 - 歌手
- **kuwo:song**: 酷我音乐
- **la7.tv** - **la7.tv**
- **Laola1Tv** - **Laola1Tv**
- **Letv** - **Lecture2Go**
- **Letv**: 乐视网
- **LetvPlaylist** - **LetvPlaylist**
- **LetvTv** - **LetvTv**
- **Libsyn** - **Libsyn**
@ -283,6 +293,7 @@
- **Motherless** - **Motherless**
- **Motorsport**: motorsport.com - **Motorsport**: motorsport.com
- **MovieClips** - **MovieClips**
- **MovieFap**
- **Moviezine** - **Moviezine**
- **movshare**: MovShare - **movshare**: MovShare
- **MPORA** - **MPORA**
@ -296,6 +307,7 @@
- **MySpace** - **MySpace**
- **MySpace:album** - **MySpace:album**
- **MySpass** - **MySpass**
- **Myvi**
- **myvideo** - **myvideo**
- **MyVidster** - **MyVidster**
- **N-JOY** - **N-JOY**
@ -311,11 +323,18 @@
- **NDTV** - **NDTV**
- **NerdCubedFeed** - **NerdCubedFeed**
- **Nerdist** - **Nerdist**
- **netease:album**: 网易云音乐 - 专辑
- **netease:djradio**: 网易云音乐 - 电台
- **netease:mv**: 网易云音乐 - MV
- **netease:playlist**: 网易云音乐 - 歌单
- **netease:program**: 网易云音乐 - 电台节目
- **netease:singer**: 网易云音乐 - 歌手
- **netease:song**: 网易云音乐
- **Netzkino** - **Netzkino**
- **Newgrounds** - **Newgrounds**
- **Newstube** - **Newstube**
- **NextMedia** - **NextMedia**: 蘋果日報
- **NextMediaActionNews** - **NextMediaActionNews**: 蘋果日報 - 動新聞
- **nfb**: National Film Board of Canada - **nfb**: National Film Board of Canada
- **nfl.com** - **nfl.com**
- **nhl.com** - **nhl.com**
@ -331,13 +350,14 @@
- **Nowness** - **Nowness**
- **NowTV** - **NowTV**
- **nowvideo**: NowVideo - **nowvideo**: NowVideo
- **npo.nl** - **npo**: npo.nl and ntr.nl
- **npo**: npo.nl and ntr.nl
- **npo.nl:live** - **npo.nl:live**
- **npo.nl:radio** - **npo.nl:radio**
- **npo.nl:radio:fragment** - **npo.nl:radio:fragment**
- **NRK** - **NRK**
- **NRKPlaylist** - **NRKPlaylist**
- **NRKTV** - **NRKTV**: NRK TV and NRK Radio
- **ntv.ru** - **ntv.ru**
- **Nuvid** - **Nuvid**
- **NYTimes** - **NYTimes**
@ -381,10 +401,11 @@
- **prosiebensat1**: ProSiebenSat.1 Digital - **prosiebensat1**: ProSiebenSat.1 Digital
- **Puls4** - **Puls4**
- **Pyvideo** - **Pyvideo**
- **qqmusic** - **qqmusic**: QQ音乐
- **qqmusic:album** - **qqmusic:album**: QQ音乐 - 专辑
- **qqmusic:singer** - **qqmusic:playlist**: QQ音乐 - 歌单
- **qqmusic:toplist** - **qqmusic:singer**: QQ音乐 - 歌手
- **qqmusic:toplist**: QQ音乐 - 排行榜
- **QuickVid** - **QuickVid**
- **R7** - **R7**
- **radio.de** - **radio.de**
@ -393,6 +414,7 @@
- **RadioJavan** - **RadioJavan**
- **Rai** - **Rai**
- **RBMARadio** - **RBMARadio**
- **RDS**: RDS.ca
- **RedTube** - **RedTube**
- **Restudy** - **Restudy**
- **ReverbNation** - **ReverbNation**
@ -440,6 +462,8 @@
- **smotri:broadcast**: Smotri.com broadcasts - **smotri:broadcast**: Smotri.com broadcasts
- **smotri:community**: Smotri.com community videos - **smotri:community**: Smotri.com community videos
- **smotri:user**: Smotri.com user videos - **smotri:user**: Smotri.com user videos
- **SnagFilms**
- **SnagFilmsEmbed**
- **Snotr** - **Snotr**
- **Sohu** - **Sohu**
- **soompi** - **soompi**
@ -466,6 +490,7 @@
- **SportBox** - **SportBox**
- **SportBoxEmbed** - **SportBoxEmbed**
- **SportDeutschland** - **SportDeutschland**
- **Sportschau**
- **Srf** - **Srf**
- **SRMediathek**: Saarländischer Rundfunk - **SRMediathek**: Saarländischer Rundfunk
- **SSA** - **SSA**
@ -491,7 +516,6 @@
- **TechTalks** - **TechTalks**
- **techtv.mit.edu** - **techtv.mit.edu**
- **ted** - **ted**
- **tegenlicht.vpro.nl**
- **TeleBruxelles** - **TeleBruxelles**
- **telecinco.es** - **telecinco.es**
- **TeleMB** - **TeleMB**
@ -502,6 +526,7 @@
- **TheOnion** - **TheOnion**
- **ThePlatform** - **ThePlatform**
- **TheSixtyOne** - **TheSixtyOne**
- **ThisAmericanLife**
- **ThisAV** - **ThisAV**
- **THVideo** - **THVideo**
- **THVideoPlaylist** - **THVideoPlaylist**
@ -542,10 +567,11 @@
- **twitch:stream** - **twitch:stream**
- **twitch:video** - **twitch:video**
- **twitch:vod** - **twitch:vod**
- **TwitterCard**
- **Ubu** - **Ubu**
- **udemy** - **udemy**
- **udemy:course** - **udemy:course**
- **UDNEmbed** - **UDNEmbed**: 聯合影音
- **Ultimedia** - **Ultimedia**
- **Unistra** - **Unistra**
- **Urort**: NRK P3 Urørt - **Urort**: NRK P3 Urørt
@ -590,8 +616,8 @@
- **Vimple**: Vimple - one-click video hosting - **Vimple**: Vimple - one-click video hosting
- **Vine** - **Vine**
- **vine:user** - **vine:user**
- **vk.com** - **vk**: VK
- **vk.com:user-videos**: vk.com:All of a user's videos - **vk:uservideos**: VK - User's Videos
- **Vodlocker** - **Vodlocker**
- **VoiceRepublic** - **VoiceRepublic**
- **Vporn** - **Vporn**
@ -607,9 +633,11 @@
- **wdr:mobile** - **wdr:mobile**
- **WDRMaus**: Sendung mit der Maus - **WDRMaus**: Sendung mit der Maus
- **WebOfStories** - **WebOfStories**
- **WebOfStoriesPlaylist**
- **Weibo** - **Weibo**
- **Wimp** - **Wimp**
- **Wistia** - **Wistia**
- **WNL**
- **WorldStarHipHop** - **WorldStarHipHop**
- **wrzuta.pl** - **wrzuta.pl**
- **WSJ**: Wall Street Journal - **WSJ**: Wall Street Journal
@ -622,18 +650,19 @@
- **Xstream** - **Xstream**
- **XTube** - **XTube**
- **XTubeUser**: XTube user profile - **XTubeUser**: XTube user profile
- **Xuite** - **Xuite**: 隨意窩Xuite影音
- **XVideos** - **XVideos**
- **XXXYMovies** - **XXXYMovies**
- **Yahoo**: Yahoo screen and movies - **Yahoo**: Yahoo screen and movies
- **Yam** - **Yam**: 蕃薯藤yam天空部落
- **yandexmusic:album**: Яндекс.Музыка - Альбом - **yandexmusic:album**: Яндекс.Музыка - Альбом
- **yandexmusic:playlist**: Яндекс.Музыка - Плейлист - **yandexmusic:playlist**: Яндекс.Музыка - Плейлист
- **yandexmusic:track**: Яндекс.Музыка - Трек - **yandexmusic:track**: Яндекс.Музыка - Трек
- **YesJapan** - **YesJapan**
- **yinyuetai:video**: 音悦Tai
- **Ynet** - **Ynet**
- **YouJizz** - **YouJizz**
- **youku** - **youku**: 优酷
- **YouPorn** - **YouPorn**
- **YourUpload** - **YourUpload**
- **youtube**: YouTube.com - **youtube**: YouTube.com

@ -133,8 +133,8 @@ def expect_info_dict(self, got_dict, expected_dict):
elif isinstance(expected, compat_str) and expected.startswith('mincount:'): elif isinstance(expected, compat_str) and expected.startswith('mincount:'):
got = got_dict.get(info_field) got = got_dict.get(info_field)
self.assertTrue( self.assertTrue(
isinstance(got, list), isinstance(got, (list, dict)),
'Expected field %s to be a list, but it is of type %s' % ( 'Expected field %s to be a list or a dict, but it is of type %s' % (
info_field, type(got).__name__)) info_field, type(got).__name__))
expected_num = int(expected.partition(':')[2]) expected_num = int(expected.partition(':')[2])
assertGreaterEqual( assertGreaterEqual(
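
The effect of the change in this hunk (a `mincount:` expectation now accepts a dict as well as a list) can be sketched as a standalone function. `check_mincount` is a hypothetical name used only for illustration; the real helper uses unittest assertions instead of returning a boolean.

```python
def check_mincount(got, expected):
    """Return True if `got` holds at least the number of items in `expected`.

    `expected` has the form 'mincount:N'; `got` may be a list or a dict.
    """
    prefix, _, number = expected.partition(":")
    if prefix != "mincount":
        raise ValueError("not a mincount expectation: %r" % expected)
    if not isinstance(got, (list, dict)):
        raise TypeError(
            "Expected a list or a dict, but got %s" % type(got).__name__)
    return len(got) >= int(number)


if __name__ == "__main__":
    print(check_mincount(["a", "b", "c"], "mincount:2"))
    print(check_mincount({"en": [], "de": []}, "mincount:3"))
```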

@ -15,7 +15,7 @@ from youtube_dl import YoutubeDL
from youtube_dl.compat import compat_str from youtube_dl.compat import compat_str
from youtube_dl.extractor import YoutubeIE from youtube_dl.extractor import YoutubeIE
from youtube_dl.postprocessor.common import PostProcessor from youtube_dl.postprocessor.common import PostProcessor
from youtube_dl.utils import match_filter_func from youtube_dl.utils import ExtractorError, match_filter_func
TEST_URL = 'http://localhost/sample.mp4' TEST_URL = 'http://localhost/sample.mp4'
@ -105,6 +105,7 @@ class TestFormatSelection(unittest.TestCase):
def test_format_selection(self): def test_format_selection(self):
formats = [ formats = [
{'format_id': '35', 'ext': 'mp4', 'preference': 1, 'url': TEST_URL}, {'format_id': '35', 'ext': 'mp4', 'preference': 1, 'url': TEST_URL},
{'format_id': 'example-with-dashes', 'ext': 'webm', 'preference': 1, 'url': TEST_URL},
{'format_id': '45', 'ext': 'webm', 'preference': 2, 'url': TEST_URL}, {'format_id': '45', 'ext': 'webm', 'preference': 2, 'url': TEST_URL},
{'format_id': '47', 'ext': 'webm', 'preference': 3, 'url': TEST_URL}, {'format_id': '47', 'ext': 'webm', 'preference': 3, 'url': TEST_URL},
{'format_id': '2', 'ext': 'flv', 'preference': 4, 'url': TEST_URL}, {'format_id': '2', 'ext': 'flv', 'preference': 4, 'url': TEST_URL},
@ -136,6 +137,11 @@ class TestFormatSelection(unittest.TestCase):
downloaded = ydl.downloaded_info_dicts[0] downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], '35') self.assertEqual(downloaded['format_id'], '35')
ydl = YDL({'format': 'example-with-dashes'})
ydl.process_ie_result(info_dict.copy())
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'example-with-dashes')
def test_format_selection_audio(self): def test_format_selection_audio(self):
formats = [ formats = [
{'format_id': 'audio-low', 'ext': 'webm', 'preference': 1, 'vcodec': 'none', 'url': TEST_URL}, {'format_id': 'audio-low', 'ext': 'webm', 'preference': 1, 'vcodec': 'none', 'url': TEST_URL},
@ -229,21 +235,70 @@ class TestFormatSelection(unittest.TestCase):
'141', '172', '140', '171', '139', '141', '172', '140', '171', '139',
] ]
for f1id, f2id in zip(order, order[1:]): def format_info(f_id):
f1 = YoutubeIE._formats[f1id].copy() info = YoutubeIE._formats[f_id].copy()
f1['format_id'] = f1id info['format_id'] = f_id
f1['url'] = 'url:' + f1id info['url'] = 'url:' + f_id
f2 = YoutubeIE._formats[f2id].copy() return info
f2['format_id'] = f2id formats_order = [format_info(f_id) for f_id in order]
f2['url'] = 'url:' + f2id
info_dict = _make_result(list(formats_order), extractor='youtube')
ydl = YDL({'format': 'bestvideo+bestaudio'})
yie = YoutubeIE(ydl)
yie._sort_formats(info_dict['formats'])
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], '137+141')
self.assertEqual(downloaded['ext'], 'mp4')
info_dict = _make_result(list(formats_order), extractor='youtube')
ydl = YDL({'format': 'bestvideo[height>=999999]+bestaudio/best'})
yie = YoutubeIE(ydl)
yie._sort_formats(info_dict['formats'])
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], '38')
info_dict = _make_result(list(formats_order), extractor='youtube')
ydl = YDL({'format': 'bestvideo/best,bestaudio'})
yie = YoutubeIE(ydl)
yie._sort_formats(info_dict['formats'])
ydl.process_ie_result(info_dict)
downloaded_ids = [info['format_id'] for info in ydl.downloaded_info_dicts]
self.assertEqual(downloaded_ids, ['137', '141'])
info_dict = _make_result(list(formats_order), extractor='youtube')
ydl = YDL({'format': '(bestvideo[ext=mp4],bestvideo[ext=webm])+bestaudio'})
yie = YoutubeIE(ydl)
yie._sort_formats(info_dict['formats'])
ydl.process_ie_result(info_dict)
downloaded_ids = [info['format_id'] for info in ydl.downloaded_info_dicts]
self.assertEqual(downloaded_ids, ['137+141', '248+141'])
info_dict = _make_result(list(formats_order), extractor='youtube')
ydl = YDL({'format': '(bestvideo[ext=mp4],bestvideo[ext=webm])[height<=720]+bestaudio'})
yie = YoutubeIE(ydl)
yie._sort_formats(info_dict['formats'])
ydl.process_ie_result(info_dict)
downloaded_ids = [info['format_id'] for info in ydl.downloaded_info_dicts]
self.assertEqual(downloaded_ids, ['136+141', '247+141'])
info_dict = _make_result(list(formats_order), extractor='youtube')
ydl = YDL({'format': '(bestvideo[ext=none]/bestvideo[ext=webm])+bestaudio'})
yie = YoutubeIE(ydl)
yie._sort_formats(info_dict['formats'])
ydl.process_ie_result(info_dict)
downloaded_ids = [info['format_id'] for info in ydl.downloaded_info_dicts]
self.assertEqual(downloaded_ids, ['248+141'])
for f1, f2 in zip(formats_order, formats_order[1:]):
info_dict = _make_result([f1, f2], extractor='youtube') info_dict = _make_result([f1, f2], extractor='youtube')
ydl = YDL({'format': 'best/bestvideo'}) ydl = YDL({'format': 'best/bestvideo'})
yie = YoutubeIE(ydl) yie = YoutubeIE(ydl)
yie._sort_formats(info_dict['formats']) yie._sort_formats(info_dict['formats'])
ydl.process_ie_result(info_dict) ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0] downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], f1id) self.assertEqual(downloaded['format_id'], f1['format_id'])
info_dict = _make_result([f2, f1], extractor='youtube') info_dict = _make_result([f2, f1], extractor='youtube')
ydl = YDL({'format': 'best/bestvideo'}) ydl = YDL({'format': 'best/bestvideo'})
@ -251,7 +306,18 @@ class TestFormatSelection(unittest.TestCase):
yie._sort_formats(info_dict['formats']) yie._sort_formats(info_dict['formats'])
ydl.process_ie_result(info_dict) ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0] downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], f1id) self.assertEqual(downloaded['format_id'], f1['format_id'])
def test_invalid_format_specs(self):
def assert_syntax_error(format_spec):
ydl = YDL({'format': format_spec})
info_dict = _make_result([{'format_id': 'foo', 'url': TEST_URL}])
self.assertRaises(SyntaxError, ydl.process_ie_result, info_dict)
assert_syntax_error('bestvideo,,best')
assert_syntax_error('+bestaudio')
assert_syntax_error('bestvideo+')
assert_syntax_error('/')
def test_format_filtering(self): def test_format_filtering(self):
formats = [ formats = [
@ -308,6 +374,18 @@ class TestFormatSelection(unittest.TestCase):
downloaded = ydl.downloaded_info_dicts[0] downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'G') self.assertEqual(downloaded['format_id'], 'G')
ydl = YDL({'format': 'all[width>=400][width<=600]'})
ydl.process_ie_result(info_dict)
downloaded_ids = [info['format_id'] for info in ydl.downloaded_info_dicts]
self.assertEqual(downloaded_ids, ['B', 'C', 'D'])
ydl = YDL({'format': 'best[height<40]'})
try:
ydl.process_ie_result(info_dict)
except ExtractorError:
pass
self.assertEqual(ydl.downloaded_info_dicts, [])
class TestYoutubeDL(unittest.TestCase): class TestYoutubeDL(unittest.TestCase):
def test_subtitles(self): def test_subtitles(self):

@ -14,6 +14,8 @@ from youtube_dl.utils import get_filesystem_encoding
from youtube_dl.compat import ( from youtube_dl.compat import (
compat_getenv, compat_getenv,
compat_expanduser, compat_expanduser,
compat_urllib_parse_unquote,
compat_urllib_parse_unquote_plus,
) )
@ -42,5 +44,28 @@ class TestCompat(unittest.TestCase):
dir(youtube_dl.compat))) - set(['unicode_literals']) dir(youtube_dl.compat))) - set(['unicode_literals'])
self.assertEqual(all_names, sorted(present_names)) self.assertEqual(all_names, sorted(present_names))
def test_compat_urllib_parse_unquote(self):
self.assertEqual(compat_urllib_parse_unquote('abc%20def'), 'abc def')
self.assertEqual(compat_urllib_parse_unquote('%7e/abc+def'), '~/abc+def')
self.assertEqual(compat_urllib_parse_unquote(''), '')
self.assertEqual(compat_urllib_parse_unquote('%'), '%')
self.assertEqual(compat_urllib_parse_unquote('%%'), '%%')
self.assertEqual(compat_urllib_parse_unquote('%%%'), '%%%')
self.assertEqual(compat_urllib_parse_unquote('%2F'), '/')
self.assertEqual(compat_urllib_parse_unquote('%2f'), '/')
self.assertEqual(compat_urllib_parse_unquote('%E6%B4%A5%E6%B3%A2'), '津波')
self.assertEqual(
compat_urllib_parse_unquote('''<meta property="og:description" content="%E2%96%81%E2%96%82%E2%96%83%E2%96%84%25%E2%96%85%E2%96%86%E2%96%87%E2%96%88" />
%<a href="https://ar.wikipedia.org/wiki/%D8%AA%D8%B3%D9%88%D9%86%D8%A7%D9%85%D9%8A">%a'''),
'''<meta property="og:description" content="▁▂▃▄%▅▆▇█" />
%<a href="https://ar.wikipedia.org/wiki/تسونامي">%a''')
self.assertEqual(
compat_urllib_parse_unquote('''%28%5E%E2%97%A3_%E2%97%A2%5E%29%E3%81%A3%EF%B8%BB%E3%83%87%E2%95%90%E4%B8%80 %E2%87%80 %E2%87%80 %E2%87%80 %E2%87%80 %E2%87%80 %E2%86%B6%I%Break%25Things%'''),
'''(^◣_◢^)っ︻デ═一 ⇀ ⇀ ⇀ ⇀ ⇀ ↶%I%Break%Things%''')
def test_compat_urllib_parse_unquote_plus(self):
self.assertEqual(compat_urllib_parse_unquote_plus('abc%20def'), 'abc def')
self.assertEqual(compat_urllib_parse_unquote_plus('%7e/abc+def'), '~/abc def')
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()
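
The new `test_compat_urllib_parse_unquote*` cases above pin down how percent-decoding should behave. As a rough illustration only, the same expectations can be checked against Python 3's own `urllib.parse` functions, which the compat names simply alias on Python 3. The variable names below are illustrative and not part of the commit.

```python
from urllib.parse import unquote, unquote_plus

# A percent-encoded space is decoded by both functions.
plain = unquote('abc%20def')

# unquote() leaves '+' alone; only unquote_plus() turns it into a space.
keeps_plus = unquote('%7e/abc+def')
plus_to_space = unquote_plus('%7e/abc+def')

# A stray '%' that is not followed by two hex digits is passed through.
stray_percent = unquote('%')

# Consecutive escapes are decoded together as one UTF-8 byte sequence.
multibyte = unquote('%E6%B4%A5%E6%B3%A2')

print(plain, keeps_plus, plus_to_space, stray_percent, multibyte)
```

The Python 2 fallback in `compat.py` is meant to reproduce this behaviour; this snippet does not exercise that fallback.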

@ -136,7 +136,9 @@ def generator(test_case):
# We're not using .download here since that is just a shim # We're not using .download here since that is just a shim
# for outside error handling, and returns the exit code # for outside error handling, and returns the exit code
# instead of the result dict. # instead of the result dict.
res_dict = ydl.extract_info(test_case['url']) res_dict = ydl.extract_info(
test_case['url'],
force_generic_extractor=params.get('force_generic_extractor', False))
except (DownloadError, ExtractorError) as err: except (DownloadError, ExtractorError) as err:
# Check if the exception is not a network related one # Check if the exception is not a network related one
if not err.exc_info[0] in (compat_urllib_error.URLError, socket.timeout, UnavailableVideoError, compat_http_client.BadStatusLine) or (err.exc_info[0] == compat_HTTPError and err.exc_info[1].code == 503): if not err.exc_info[0] in (compat_urllib_error.URLError, socket.timeout, UnavailableVideoError, compat_http_client.BadStatusLine) or (err.exc_info[0] == compat_HTTPError and err.exc_info[1].code == 503):

@ -235,12 +235,21 @@ class TestUtil(unittest.TestCase):
<node x="a"/> <node x="a"/>
<node x="a" y="c" /> <node x="a" y="c" />
<node x="b" y="d" /> <node x="b" y="d" />
<node x="" />
</root>''' </root>'''
doc = xml.etree.ElementTree.fromstring(testxml) doc = xml.etree.ElementTree.fromstring(testxml)
self.assertEqual(find_xpath_attr(doc, './/fourohfour', 'n'), None)
self.assertEqual(find_xpath_attr(doc, './/fourohfour', 'n', 'v'), None) self.assertEqual(find_xpath_attr(doc, './/fourohfour', 'n', 'v'), None)
self.assertEqual(find_xpath_attr(doc, './/node', 'n'), None)
self.assertEqual(find_xpath_attr(doc, './/node', 'n', 'v'), None)
self.assertEqual(find_xpath_attr(doc, './/node', 'x'), doc[1])
self.assertEqual(find_xpath_attr(doc, './/node', 'x', 'a'), doc[1]) self.assertEqual(find_xpath_attr(doc, './/node', 'x', 'a'), doc[1])
self.assertEqual(find_xpath_attr(doc, './/node', 'x', 'b'), doc[3])
self.assertEqual(find_xpath_attr(doc, './/node', 'y'), doc[2])
self.assertEqual(find_xpath_attr(doc, './/node', 'y', 'c'), doc[2]) self.assertEqual(find_xpath_attr(doc, './/node', 'y', 'c'), doc[2])
self.assertEqual(find_xpath_attr(doc, './/node', 'y', 'd'), doc[3])
self.assertEqual(find_xpath_attr(doc, './/node', 'x', ''), doc[4])
def test_xpath_with_ns(self): def test_xpath_with_ns(self):
testxml = '''<root xmlns:media="http://example.com/"> testxml = '''<root xmlns:media="http://example.com/">
@ -324,6 +333,7 @@ class TestUtil(unittest.TestCase):
self.assertEqual(parse_duration('02:03:04'), 7384) self.assertEqual(parse_duration('02:03:04'), 7384)
self.assertEqual(parse_duration('01:02:03:04'), 93784) self.assertEqual(parse_duration('01:02:03:04'), 93784)
self.assertEqual(parse_duration('1 hour 3 minutes'), 3780) self.assertEqual(parse_duration('1 hour 3 minutes'), 3780)
self.assertEqual(parse_duration('87 Min.'), 5220)
def test_fix_xml_ampersands(self): def test_fix_xml_ampersands(self):
self.assertEqual( self.assertEqual(
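
The extended `find_xpath_attr` tests above suggest that the value argument is now optional: with only a key, the first element carrying that attribute is returned, and an empty string is treated as a real value. The following is a minimal sketch of that behaviour, assuming nothing about the actual implementation in `youtube_dl.utils`; `find_xpath_attr_sketch` is a hypothetical name used for illustration.

```python
import xml.etree.ElementTree as ET


def find_xpath_attr_sketch(node, xpath, key, val=None):
    """Return the first element matching xpath that has attribute `key`.

    If `val` is given (including the empty string), the attribute must
    also equal it. Hypothetical helper, not the library function.
    """
    for element in node.findall(xpath):
        if key not in element.attrib:
            continue
        if val is None or element.attrib[key] == val:
            return element
    return None


doc = ET.fromstring('''<root>
    <node/>
    <node x="a"/>
    <node x="a" y="c" />
    <node x="b" y="d" />
    <node x="" />
</root>''')

first_with_x = find_xpath_attr_sketch(doc, './/node', 'x')
first_with_y = find_xpath_attr_sketch(doc, './/node', 'y')
empty_x = find_xpath_attr_sketch(doc, './/node', 'x', '')
missing = find_xpath_attr_sketch(doc, './/node', 'n')
```

Comparing `val is None` rather than testing truthiness is what lets `x=""` match the last node instead of being treated as "no value given".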

@ -21,24 +21,24 @@ import subprocess
import socket import socket
import sys import sys
import time import time
import tokenize
import traceback import traceback
if os.name == 'nt': if os.name == 'nt':
import ctypes import ctypes
from .compat import ( from .compat import (
compat_basestring,
compat_cookiejar, compat_cookiejar,
compat_expanduser, compat_expanduser,
compat_get_terminal_size, compat_get_terminal_size,
compat_http_client, compat_http_client,
compat_kwargs, compat_kwargs,
compat_str, compat_str,
compat_tokenize_tokenize,
compat_urllib_error, compat_urllib_error,
compat_urllib_request, compat_urllib_request,
) )
from .utils import ( from .utils import (
escape_url,
ContentTooShortError, ContentTooShortError,
date_from_str, date_from_str,
DateRange, DateRange,
@ -49,7 +49,6 @@ from .utils import (
ExtractorError, ExtractorError,
format_bytes, format_bytes,
formatSeconds, formatSeconds,
HEADRequest,
locked_file, locked_file,
make_HTTPS_handler, make_HTTPS_handler,
MaxDownloadsReached, MaxDownloadsReached,
@ -262,6 +261,8 @@ class YoutubeDL(object):
The following options are used by the post processors: The following options are used by the post processors:
prefer_ffmpeg: If True, use ffmpeg instead of avconv if both are available, prefer_ffmpeg: If True, use ffmpeg instead of avconv if both are available,
otherwise prefer avconv. otherwise prefer avconv.
postprocessor_args: A list of additional command-line arguments for the
postprocessor.
""" """
params = None params = None
@ -851,8 +852,8 @@ class YoutubeDL(object):
else: else:
raise Exception('Invalid result type: %s' % result_type) raise Exception('Invalid result type: %s' % result_type)
def _apply_format_filter(self, format_spec, available_formats): def _build_format_filter(self, filter_spec):
" Returns a tuple of the remaining format_spec and filtered formats " " Returns a function to filter the formats according to the filter_spec "
OPERATORS = { OPERATORS = {
'<': operator.lt, '<': operator.lt,
@ -862,13 +863,13 @@ class YoutubeDL(object):
'=': operator.eq, '=': operator.eq,
'!=': operator.ne, '!=': operator.ne,
} }
operator_rex = re.compile(r'''(?x)\s*\[ operator_rex = re.compile(r'''(?x)\s*
(?P<key>width|height|tbr|abr|vbr|asr|filesize|fps) (?P<key>width|height|tbr|abr|vbr|asr|filesize|fps)
\s*(?P<op>%s)(?P<none_inclusive>\s*\?)?\s* \s*(?P<op>%s)(?P<none_inclusive>\s*\?)?\s*
(?P<value>[0-9.]+(?:[kKmMgGtTpPeEzZyY]i?[Bb]?)?) (?P<value>[0-9.]+(?:[kKmMgGtTpPeEzZyY]i?[Bb]?)?)
\]$ $
''' % '|'.join(map(re.escape, OPERATORS.keys()))) ''' % '|'.join(map(re.escape, OPERATORS.keys())))
m = operator_rex.search(format_spec) m = operator_rex.search(filter_spec)
if m: if m:
try: try:
comparison_value = int(m.group('value')) comparison_value = int(m.group('value'))
@ -879,7 +880,7 @@ class YoutubeDL(object):
if comparison_value is None: if comparison_value is None:
raise ValueError( raise ValueError(
'Invalid value %r in format specification %r' % ( 'Invalid value %r in format specification %r' % (
m.group('value'), format_spec)) m.group('value'), filter_spec))
op = OPERATORS[m.group('op')] op = OPERATORS[m.group('op')]
if not m: if not m:
@ -887,85 +888,283 @@ class YoutubeDL(object):
'=': operator.eq, '=': operator.eq,
'!=': operator.ne, '!=': operator.ne,
} }
str_operator_rex = re.compile(r'''(?x)\s*\[ str_operator_rex = re.compile(r'''(?x)
\s*(?P<key>ext|acodec|vcodec|container|protocol) \s*(?P<key>ext|acodec|vcodec|container|protocol)
\s*(?P<op>%s)(?P<none_inclusive>\s*\?)? \s*(?P<op>%s)(?P<none_inclusive>\s*\?)?
\s*(?P<value>[a-zA-Z0-9_-]+) \s*(?P<value>[a-zA-Z0-9_-]+)
\s*\]$ \s*$
''' % '|'.join(map(re.escape, STR_OPERATORS.keys()))) ''' % '|'.join(map(re.escape, STR_OPERATORS.keys())))
m = str_operator_rex.search(format_spec) m = str_operator_rex.search(filter_spec)
if m: if m:
comparison_value = m.group('value') comparison_value = m.group('value')
op = STR_OPERATORS[m.group('op')] op = STR_OPERATORS[m.group('op')]
if not m: if not m:
raise ValueError('Invalid format specification %r' % format_spec) raise ValueError('Invalid filter specification %r' % filter_spec)
def _filter(f): def _filter(f):
actual_value = f.get(m.group('key')) actual_value = f.get(m.group('key'))
if actual_value is None: if actual_value is None:
return m.group('none_inclusive') return m.group('none_inclusive')
return op(actual_value, comparison_value) return op(actual_value, comparison_value)
new_formats = [f for f in available_formats if _filter(f)] return _filter
new_format_spec = format_spec[:-len(m.group(0))] def build_format_selector(self, format_spec):
if not new_format_spec: def syntax_error(note, start):
new_format_spec = 'best' message = (
'Invalid format specification: '
'{0}\n\t{1}\n\t{2}^'.format(note, format_spec, ' ' * start[1]))
return SyntaxError(message)
return (new_format_spec, new_formats) PICKFIRST = 'PICKFIRST'
MERGE = 'MERGE'
SINGLE = 'SINGLE'
GROUP = 'GROUP'
FormatSelector = collections.namedtuple('FormatSelector', ['type', 'selector', 'filters'])
def select_format(self, format_spec, available_formats): def _parse_filter(tokens):
while format_spec.endswith(']'): filter_parts = []
format_spec, available_formats = self._apply_format_filter( for type, string, start, _, _ in tokens:
format_spec, available_formats) if type == tokenize.OP and string == ']':
if not available_formats: return ''.join(filter_parts)
return None else:
filter_parts.append(string)
if format_spec in ['best', 'worst', None]: def _remove_unused_ops(tokens):
# Remove operators that we don't use and join them with the surrounding strings
# for example: 'mp4' '-' 'baseline' '-' '16x9' is converted to 'mp4-baseline-16x9'
ALLOWED_OPS = ('/', '+', ',', '(', ')')
last_string, last_start, last_end, last_line = None, None, None, None
for type, string, start, end, line in tokens:
if type == tokenize.OP and string == '[':
if last_string:
yield tokenize.NAME, last_string, last_start, last_end, last_line
last_string = None
yield type, string, start, end, line
# everything inside brackets will be handled by _parse_filter
for type, string, start, end, line in tokens:
yield type, string, start, end, line
if type == tokenize.OP and string == ']':
break
elif type == tokenize.OP and string in ALLOWED_OPS:
if last_string:
yield tokenize.NAME, last_string, last_start, last_end, last_line
last_string = None
yield type, string, start, end, line
elif type in [tokenize.NAME, tokenize.NUMBER, tokenize.OP]:
if not last_string:
last_string = string
last_start = start
last_end = end
else:
last_string += string
if last_string:
yield tokenize.NAME, last_string, last_start, last_end, last_line
def _parse_format_selection(tokens, inside_merge=False, inside_choice=False, inside_group=False):
selectors = []
current_selector = None
for type, string, start, _, _ in tokens:
# ENCODING is only defined in python 3.x
if type == getattr(tokenize, 'ENCODING', None):
continue
elif type in [tokenize.NAME, tokenize.NUMBER]:
current_selector = FormatSelector(SINGLE, string, [])
elif type == tokenize.OP:
if string == ')':
if not inside_group:
# ')' will be handled by the parentheses group
tokens.restore_last_token()
break
elif inside_merge and string in ['/', ',']:
tokens.restore_last_token()
break
elif inside_choice and string == ',':
tokens.restore_last_token()
break
elif string == ',':
if not current_selector:
raise syntax_error('"," must follow a format selector', start)
selectors.append(current_selector)
current_selector = None
elif string == '/':
if not current_selector:
raise syntax_error('"/" must follow a format selector', start)
first_choice = current_selector
second_choice = _parse_format_selection(tokens, inside_choice=True)
current_selector = FormatSelector(PICKFIRST, (first_choice, second_choice), [])
elif string == '[':
if not current_selector:
current_selector = FormatSelector(SINGLE, 'best', [])
format_filter = _parse_filter(tokens)
current_selector.filters.append(format_filter)
elif string == '(':
if current_selector:
raise syntax_error('Unexpected "("', start)
group = _parse_format_selection(tokens, inside_group=True)
current_selector = FormatSelector(GROUP, group, [])
elif string == '+':
video_selector = current_selector
audio_selector = _parse_format_selection(tokens, inside_merge=True)
if not video_selector or not audio_selector:
raise syntax_error('"+" must be between two format selectors', start)
current_selector = FormatSelector(MERGE, (video_selector, audio_selector), [])
else:
raise syntax_error('Operator not recognized: "{0}"'.format(string), start)
elif type == tokenize.ENDMARKER:
break
if current_selector:
selectors.append(current_selector)
return selectors
def _build_selector_function(selector):
if isinstance(selector, list):
fs = [_build_selector_function(s) for s in selector]
def selector_function(formats):
for f in fs:
for format in f(formats):
yield format
return selector_function
elif selector.type == GROUP:
selector_function = _build_selector_function(selector.selector)
elif selector.type == PICKFIRST:
fs = [_build_selector_function(s) for s in selector.selector]
def selector_function(formats):
for f in fs:
picked_formats = list(f(formats))
if picked_formats:
return picked_formats
return []
elif selector.type == SINGLE:
format_spec = selector.selector
def selector_function(formats):
formats = list(formats)
if not formats:
return
if format_spec == 'all':
for f in formats:
yield f
elif format_spec in ['best', 'worst', None]:
format_idx = 0 if format_spec == 'worst' else -1 format_idx = 0 if format_spec == 'worst' else -1
audiovideo_formats = [ audiovideo_formats = [
f for f in available_formats f for f in formats
if f.get('vcodec') != 'none' and f.get('acodec') != 'none'] if f.get('vcodec') != 'none' and f.get('acodec') != 'none']
if audiovideo_formats: if audiovideo_formats:
return audiovideo_formats[format_idx] yield audiovideo_formats[format_idx]
# for audio only (soundcloud) or video only (imgur) urls, select the best/worst audio format # for audio only (soundcloud) or video only (imgur) urls, select the best/worst audio format
elif (all(f.get('acodec') != 'none' for f in available_formats) or elif (all(f.get('acodec') != 'none' for f in formats) or
all(f.get('vcodec') != 'none' for f in available_formats)): all(f.get('vcodec') != 'none' for f in formats)):
return available_formats[format_idx] yield formats[format_idx]
elif format_spec == 'bestaudio': elif format_spec == 'bestaudio':
audio_formats = [ audio_formats = [
f for f in available_formats f for f in formats
if f.get('vcodec') == 'none'] if f.get('vcodec') == 'none']
if audio_formats: if audio_formats:
return audio_formats[-1] yield audio_formats[-1]
elif format_spec == 'worstaudio': elif format_spec == 'worstaudio':
audio_formats = [ audio_formats = [
f for f in available_formats f for f in formats
if f.get('vcodec') == 'none'] if f.get('vcodec') == 'none']
if audio_formats: if audio_formats:
return audio_formats[0] yield audio_formats[0]
elif format_spec == 'bestvideo': elif format_spec == 'bestvideo':
video_formats = [ video_formats = [
f for f in available_formats f for f in formats
if f.get('acodec') == 'none'] if f.get('acodec') == 'none']
if video_formats: if video_formats:
return video_formats[-1] yield video_formats[-1]
elif format_spec == 'worstvideo': elif format_spec == 'worstvideo':
video_formats = [ video_formats = [
f for f in available_formats f for f in formats
if f.get('acodec') == 'none'] if f.get('acodec') == 'none']
if video_formats: if video_formats:
return video_formats[0] yield video_formats[0]
else: else:
extensions = ['mp4', 'flv', 'webm', '3gp', 'm4a', 'mp3', 'ogg', 'aac', 'wav'] extensions = ['mp4', 'flv', 'webm', '3gp', 'm4a', 'mp3', 'ogg', 'aac', 'wav']
if format_spec in extensions: if format_spec in extensions:
filter_f = lambda f: f['ext'] == format_spec filter_f = lambda f: f['ext'] == format_spec
else: else:
filter_f = lambda f: f['format_id'] == format_spec filter_f = lambda f: f['format_id'] == format_spec
matches = list(filter(filter_f, available_formats)) matches = list(filter(filter_f, formats))
if matches: if matches:
return matches[-1] yield matches[-1]
return None elif selector.type == MERGE:
def _merge(formats_info):
format_1, format_2 = [f['format_id'] for f in formats_info]
# The first format must contain the video and the
# second the audio
if formats_info[0].get('vcodec') == 'none':
self.report_error('The first format must '
'contain the video, try using '
'"-f %s+%s"' % (format_2, format_1))
return
output_ext = (
formats_info[0]['ext']
if self.params.get('merge_output_format') is None
else self.params['merge_output_format'])
return {
'requested_formats': formats_info,
'format': '%s+%s' % (formats_info[0].get('format'),
formats_info[1].get('format')),
'format_id': '%s+%s' % (formats_info[0].get('format_id'),
formats_info[1].get('format_id')),
'width': formats_info[0].get('width'),
'height': formats_info[0].get('height'),
'resolution': formats_info[0].get('resolution'),
'fps': formats_info[0].get('fps'),
'vcodec': formats_info[0].get('vcodec'),
'vbr': formats_info[0].get('vbr'),
'stretched_ratio': formats_info[0].get('stretched_ratio'),
'acodec': formats_info[1].get('acodec'),
'abr': formats_info[1].get('abr'),
'ext': output_ext,
}
video_selector, audio_selector = map(_build_selector_function, selector.selector)
def selector_function(formats):
formats = list(formats)
for pair in itertools.product(video_selector(formats), audio_selector(formats)):
yield _merge(pair)
filters = [self._build_format_filter(f) for f in selector.filters]
def final_selector(formats):
for _filter in filters:
formats = list(filter(_filter, formats))
return selector_function(formats)
return final_selector
stream = io.BytesIO(format_spec.encode('utf-8'))
try:
tokens = list(_remove_unused_ops(compat_tokenize_tokenize(stream.readline)))
except tokenize.TokenError:
raise syntax_error('Missing closing/opening brackets or parenthesis', (0, len(format_spec)))
class TokenIterator(object):
def __init__(self, tokens):
self.tokens = tokens
self.counter = 0
def __iter__(self):
return self
def __next__(self):
if self.counter >= len(self.tokens):
raise StopIteration()
value = self.tokens[self.counter]
self.counter += 1
return value
next = __next__
def restore_last_token(self):
self.counter -= 1
parsed_selector = _parse_format_selection(iter(TokenIterator(tokens)))
return _build_selector_function(parsed_selector)
def _calc_headers(self, info_dict): def _calc_headers(self, info_dict):
res = std_headers.copy() res = std_headers.copy()
@ -1102,62 +1301,15 @@ class YoutubeDL(object):
if req_format is None: if req_format is None:
req_format_list = [] req_format_list = []
if (self.params.get('outtmpl', DEFAULT_OUTTMPL) != '-' and if (self.params.get('outtmpl', DEFAULT_OUTTMPL) != '-' and
info_dict['extractor'] in ['youtube', 'ted']): info_dict['extractor'] in ['youtube', 'ted'] and
not info_dict.get('is_live')):
merger = FFmpegMergerPP(self) merger = FFmpegMergerPP(self)
if merger.available and merger.can_merge(): if merger.available and merger.can_merge():
req_format_list.append('bestvideo+bestaudio') req_format_list.append('bestvideo+bestaudio')
req_format_list.append('best') req_format_list.append('best')
req_format = '/'.join(req_format_list) req_format = '/'.join(req_format_list)
formats_to_download = [] format_selector = self.build_format_selector(req_format)
if req_format == 'all': formats_to_download = list(format_selector(formats))
formats_to_download = formats
else:
for rfstr in req_format.split(','):
# We can accept formats requested in the format: 34/5/best, we pick
# the first that is available, starting from left
req_formats = rfstr.split('/')
for rf in req_formats:
if re.match(r'.+?\+.+?', rf) is not None:
# Two formats have been requested like '137+139'
format_1, format_2 = rf.split('+')
formats_info = (self.select_format(format_1, formats),
self.select_format(format_2, formats))
if all(formats_info):
# The first format must contain the video and the
# second the audio
if formats_info[0].get('vcodec') == 'none':
self.report_error('The first format must '
'contain the video, try using '
'"-f %s+%s"' % (format_2, format_1))
return
output_ext = (
formats_info[0]['ext']
if self.params.get('merge_output_format') is None
else self.params['merge_output_format'])
selected_format = {
'requested_formats': formats_info,
'format': '%s+%s' % (formats_info[0].get('format'),
formats_info[1].get('format')),
'format_id': '%s+%s' % (formats_info[0].get('format_id'),
formats_info[1].get('format_id')),
'width': formats_info[0].get('width'),
'height': formats_info[0].get('height'),
'resolution': formats_info[0].get('resolution'),
'fps': formats_info[0].get('fps'),
'vcodec': formats_info[0].get('vcodec'),
'vbr': formats_info[0].get('vbr'),
'stretched_ratio': formats_info[0].get('stretched_ratio'),
'acodec': formats_info[1].get('acodec'),
'abr': formats_info[1].get('abr'),
'ext': output_ext,
}
else:
selected_format = None
else:
selected_format = self.select_format(rf, formats)
if selected_format is not None:
formats_to_download.append(selected_format)
break
if not formats_to_download: if not formats_to_download:
raise ExtractorError('requested format not available', raise ExtractorError('requested format not available',
expected=True) expected=True)
@ -1705,27 +1857,6 @@ class YoutubeDL(object):
def urlopen(self, req): def urlopen(self, req):
""" Start an HTTP download """ """ Start an HTTP download """
# According to RFC 3986, URLs can not contain non-ASCII characters, however this is not
# always respected by websites, some tend to give out URLs with non percent-encoded
# non-ASCII characters (see telemb.py, ard.py [#3412])
# urllib chokes on URLs with non-ASCII characters (see http://bugs.python.org/issue3991)
# To work around aforementioned issue we will replace request's original URL with
# percent-encoded one
req_is_string = isinstance(req, compat_basestring)
url = req if req_is_string else req.get_full_url()
url_escaped = escape_url(url)
# Substitute URL if any change after escaping
if url != url_escaped:
if req_is_string:
req = url_escaped
else:
req_type = HEADRequest if req.get_method() == 'HEAD' else compat_urllib_request.Request
req = req_type(
url_escaped, data=req.data, headers=req.headers,
origin_req_host=req.origin_req_host, unverifiable=req.unverifiable)
return self._opener.open(req, timeout=self._socket_timeout) return self._opener.open(req, timeout=self._socket_timeout)
def print_debug_header(self): def print_debug_header(self):
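
In the hunks above, `_build_format_filter` no longer strips a trailing `[...]` from the format spec; it receives the bare text between the brackets and returns a predicate. A simplified sketch of the numeric branch follows, assuming integer values only (the real method also accepts file-size suffixes and string comparisons). `build_numeric_filter` and the sample `formats` list are hypothetical names for illustration.

```python
import operator
import re

OPERATORS = {
    '<': operator.lt,
    '<=': operator.le,
    '>': operator.gt,
    '>=': operator.ge,
    '=': operator.eq,
    '!=': operator.ne,
}

# Longest operators first so '<=' is tried before '<' (a choice made for
# this sketch; the regex would also backtrack correctly without it).
_OP_PATTERN = '|'.join(
    re.escape(op) for op in sorted(OPERATORS, key=len, reverse=True))
_FILTER_RE = re.compile(
    r'\s*(?P<key>width|height|tbr|abr|vbr|asr|filesize|fps)'
    r'\s*(?P<op>%s)(?P<none_inclusive>\s*\?)?\s*'
    r'(?P<value>[0-9]+)\s*$' % _OP_PATTERN)


def build_numeric_filter(filter_spec):
    """Turn e.g. 'width>=400' into a predicate over format dicts."""
    m = _FILTER_RE.match(filter_spec)
    if not m:
        raise ValueError('Invalid filter specification %r' % filter_spec)
    key = m.group('key')
    op = OPERATORS[m.group('op')]
    value = int(m.group('value'))
    # A '?' after the operator keeps formats where the field is unknown.
    none_inclusive = bool(m.group('none_inclusive'))

    def _filter(fmt):
        actual = fmt.get(key)
        if actual is None:
            return none_inclusive
        return op(actual, value)
    return _filter


formats = [
    {'format_id': 'A', 'width': 200},
    {'format_id': 'B', 'width': 400},
    {'format_id': 'C', 'width': 600},
    {'format_id': 'D', 'width': 800},
    {'format_id': 'E'},  # width unknown
]
# Apply the filters one after another, as final_selector does.
for spec in ['width>=400', 'width<=600']:
    formats = [f for f in formats if build_numeric_filter(spec)(f)]
selected_ids = [f['format_id'] for f in formats]
```

Returning a closure instead of a filtered list is what allows several bracketed filters to be attached to one selector and applied in sequence before the selector itself runs.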

@ -169,7 +169,7 @@ def _real_main(argv=None):
if not opts.audioquality.isdigit(): if not opts.audioquality.isdigit():
parser.error('invalid audio quality specified') parser.error('invalid audio quality specified')
if opts.recodevideo is not None: if opts.recodevideo is not None:
if opts.recodevideo not in ['mp4', 'flv', 'webm', 'ogg', 'mkv']: if opts.recodevideo not in ['mp4', 'flv', 'webm', 'ogg', 'mkv', 'avi']:
parser.error('invalid video recode format specified') parser.error('invalid video recode format specified')
if opts.convertsubtitles is not None: if opts.convertsubtitles is not None:
if opts.convertsubtitles not in ['srt', 'vtt', 'ass']: if opts.convertsubtitles not in ['srt', 'vtt', 'ass']:
@ -263,6 +263,9 @@ def _real_main(argv=None):
external_downloader_args = None external_downloader_args = None
if opts.external_downloader_args: if opts.external_downloader_args:
external_downloader_args = shlex.split(opts.external_downloader_args) external_downloader_args = shlex.split(opts.external_downloader_args)
postprocessor_args = None
if opts.postprocessor_args:
postprocessor_args = shlex.split(opts.postprocessor_args)
match_filter = ( match_filter = (
None if opts.match_filter is None None if opts.match_filter is None
else match_filter_func(opts.match_filter)) else match_filter_func(opts.match_filter))
@ -367,6 +370,7 @@ def _real_main(argv=None):
'ffmpeg_location': opts.ffmpeg_location, 'ffmpeg_location': opts.ffmpeg_location,
'hls_prefer_native': opts.hls_prefer_native, 'hls_prefer_native': opts.hls_prefer_native,
'external_downloader_args': external_downloader_args, 'external_downloader_args': external_downloader_args,
'postprocessor_args': postprocessor_args,
'cn_verification_proxy': opts.cn_verification_proxy, 'cn_verification_proxy': opts.cn_verification_proxy,
} }
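
The new `postprocessor_args` option is split with `shlex.split`, in the same way as `external_downloader_args`. As a small illustration of what that means for quoting, here is the standard-library call on a made-up argument string (the string itself is an assumption, not taken from the commit):

```python
import shlex

# Hypothetical value a user might pass on the command line.
raw_args = '-metadata title="My Video" -threads 2'

# shlex.split follows POSIX shell quoting rules: the quoted value stays
# one argument and the quotes themselves are dropped.
postprocessor_args = shlex.split(raw_args)
print(postprocessor_args)
```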

@ -9,6 +9,7 @@ import shutil
import socket import socket
import subprocess import subprocess
import sys import sys
import itertools
try: try:
@ -41,6 +42,11 @@ try:
except ImportError: # Python 2 except ImportError: # Python 2
import cookielib as compat_cookiejar import cookielib as compat_cookiejar
try:
import http.cookies as compat_cookies
except ImportError: # Python 2
import Cookie as compat_cookies
try: try:
import html.entities as compat_html_entities import html.entities as compat_html_entities
except ImportError: # Python 2 except ImportError: # Python 2
@ -74,42 +80,74 @@ except ImportError:
import BaseHTTPServer as compat_http_server import BaseHTTPServer as compat_http_server
try: try:
from urllib.parse import unquote_to_bytes as compat_urllib_parse_unquote_to_bytes
from urllib.parse import unquote as compat_urllib_parse_unquote from urllib.parse import unquote as compat_urllib_parse_unquote
except ImportError: from urllib.parse import unquote_plus as compat_urllib_parse_unquote_plus
def compat_urllib_parse_unquote(string, encoding='utf-8', errors='replace'): except ImportError: # Python 2
if string == '': _asciire = (compat_urllib_parse._asciire if hasattr(compat_urllib_parse, '_asciire')
else re.compile('([\x00-\x7f]+)'))
# HACK: The following are the correct unquote_to_bytes, unquote and unquote_plus
# implementations from cpython 3.4.3's stdlib. Python 2's version
# is apparently broken (see https://github.com/rg3/youtube-dl/pull/6244)
def compat_urllib_parse_unquote_to_bytes(string):
"""unquote_to_bytes('abc%20def') -> b'abc def'."""
# Note: strings are encoded as UTF-8. This is only an issue if it contains
# unescaped non-ASCII characters, which URIs should not.
if not string:
# Is it a string-like object?
string.split
return b''
if isinstance(string, unicode):
string = string.encode('utf-8')
bits = string.split(b'%')
if len(bits) == 1:
return string return string
res = string.split('%') res = [bits[0]]
if len(res) == 1: append = res.append
for item in bits[1:]:
try:
append(compat_urllib_parse._hextochr[item[:2]])
append(item[2:])
except KeyError:
append(b'%')
append(item)
return b''.join(res)
def compat_urllib_parse_unquote(string, encoding='utf-8', errors='replace'):
"""Replace %xx escapes by their single-character equivalent. The optional
encoding and errors parameters specify how to decode percent-encoded
sequences into Unicode characters, as accepted by the bytes.decode()
method.
By default, percent-encoded sequences are decoded with UTF-8, and invalid
sequences are replaced by a placeholder character.
unquote('abc%20def') -> 'abc def'.
"""
if '%' not in string:
string.split
return string return string
if encoding is None: if encoding is None:
encoding = 'utf-8' encoding = 'utf-8'
if errors is None: if errors is None:
errors = 'replace' errors = 'replace'
# pct_sequence: contiguous sequence of percent-encoded bytes, decoded bits = _asciire.split(string)
pct_sequence = b'' res = [bits[0]]
string = res[0] append = res.append
for item in res[1:]: for i in range(1, len(bits), 2):
try: append(compat_urllib_parse_unquote_to_bytes(bits[i]).decode(encoding, errors))
if not item: append(bits[i + 1])
raise ValueError return ''.join(res)
pct_sequence += item[:2].decode('hex')
rest = item[2:] def compat_urllib_parse_unquote_plus(string, encoding='utf-8', errors='replace'):
if not rest: """Like unquote(), but also replace plus signs by spaces, as required for
# This segment was just a single percent-encoded character. unquoting HTML form values.
# May be part of a sequence of code units, so delay decoding.
# (Stored in pct_sequence). unquote_plus('%7e/abc+def') -> '~/abc def'
continue """
except ValueError: string = string.replace('+', ' ')
rest = '%' + item return compat_urllib_parse_unquote(string, encoding, errors)
# Encountered non-percent-encoded characters. Flush the current
# pct_sequence.
string += pct_sequence.decode(encoding, errors) + rest
pct_sequence = b''
if pct_sequence:
# Flush the final pct_sequence
string += pct_sequence.decode(encoding, errors)
return string
try: try:
compat_str = unicode # Python 2 compat_str = unicode # Python 2
@ -388,12 +426,27 @@ else:
pass pass
return _terminal_size(columns, lines) return _terminal_size(columns, lines)
try:
itertools.count(start=0, step=1)
compat_itertools_count = itertools.count
except TypeError: # Python 2.6
def compat_itertools_count(start=0, step=1):
n = start
while True:
yield n
n += step
if sys.version_info >= (3, 0):
from tokenize import tokenize as compat_tokenize_tokenize
else:
from tokenize import generate_tokens as compat_tokenize_tokenize
__all__ = [ __all__ = [
'compat_HTTPError', 'compat_HTTPError',
'compat_basestring', 'compat_basestring',
'compat_chr', 'compat_chr',
'compat_cookiejar', 'compat_cookiejar',
'compat_cookies',
'compat_expanduser', 'compat_expanduser',
'compat_get_terminal_size', 'compat_get_terminal_size',
'compat_getenv', 'compat_getenv',
@ -401,6 +454,7 @@ __all__ = [
'compat_html_entities', 'compat_html_entities',
'compat_http_client', 'compat_http_client',
'compat_http_server', 'compat_http_server',
'compat_itertools_count',
'compat_kwargs', 'compat_kwargs',
'compat_ord', 'compat_ord',
'compat_parse_qs', 'compat_parse_qs',
@ -408,9 +462,12 @@ __all__ = [
'compat_socket_create_connection', 'compat_socket_create_connection',
'compat_str', 'compat_str',
'compat_subprocess_get_DEVNULL', 'compat_subprocess_get_DEVNULL',
'compat_tokenize_tokenize',
'compat_urllib_error', 'compat_urllib_error',
'compat_urllib_parse', 'compat_urllib_parse',
'compat_urllib_parse_unquote', 'compat_urllib_parse_unquote',
'compat_urllib_parse_unquote_plus',
'compat_urllib_parse_unquote_to_bytes',
'compat_urllib_parse_urlparse', 'compat_urllib_parse_urlparse',
'compat_urllib_request', 'compat_urllib_request',
'compat_urlparse', 'compat_urlparse',
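
`compat_tokenize_tokenize` exists because the new format-selection parser feeds the format spec through Python's own tokenizer. The sketch below shows, for Python 3 only, what that tokenizer produces for a typical spec. `format_spec_tokens` is a hypothetical helper; the real code additionally merges unused operators such as `-` back into neighbouring names.

```python
import io
import tokenize


def format_spec_tokens(format_spec):
    """Return the NAME, NUMBER and OP token strings of a format spec."""
    # tokenize.tokenize() expects a readline callable that yields bytes.
    stream = io.BytesIO(format_spec.encode('utf-8'))
    return [
        tok.string
        for tok in tokenize.tokenize(stream.readline)
        # Skip ENCODING, NEWLINE and ENDMARKER bookkeeping tokens.
        if tok.type in (tokenize.NAME, tokenize.NUMBER, tokenize.OP)
    ]


merge_or_best = format_spec_tokens('bestvideo+bestaudio/best')
numeric_ids = format_spec_tokens('137+139')
with_filter = format_spec_tokens('best[height<40]')
```

On Python 2 the compat name maps to `tokenize.generate_tokens`, which takes a text `readline` and never emits an `ENCODING` token; this is why the parser guards that token type with `getattr`.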

@ -8,6 +8,7 @@ from .hls import NativeHlsFD
from .http import HttpFD from .http import HttpFD
from .rtsp import RtspFD from .rtsp import RtspFD
from .rtmp import RtmpFD from .rtmp import RtmpFD
from .dash import DashSegmentsFD
from ..utils import ( from ..utils import (
determine_protocol, determine_protocol,
@ -20,6 +21,7 @@ PROTOCOL_MAP = {
'mms': RtspFD, 'mms': RtspFD,
'rtsp': RtspFD, 'rtsp': RtspFD,
'f4m': F4mFD, 'f4m': F4mFD,
'http_dash_segments': DashSegmentsFD,
} }

@ -0,0 +1,66 @@
from __future__ import unicode_literals
import re
from .common import FileDownloader
from ..compat import compat_urllib_request
class DashSegmentsFD(FileDownloader):
"""
Download segments in a DASH manifest
"""
def real_download(self, filename, info_dict):
self.report_destination(filename)
tmpfilename = self.temp_name(filename)
base_url = info_dict['url']
segment_urls = info_dict['segment_urls']
is_test = self.params.get('test', False)
remaining_bytes = self._TEST_FILE_SIZE if is_test else None
byte_counter = 0
def append_url_to_file(outf, target_url, target_name, remaining_bytes=None):
self.to_screen('[DashSegments] %s: Downloading %s' % (info_dict['id'], target_name))
req = compat_urllib_request.Request(target_url)
if remaining_bytes is not None:
req.add_header('Range', 'bytes=0-%d' % (remaining_bytes - 1))
data = self.ydl.urlopen(req).read()
if remaining_bytes is not None:
data = data[:remaining_bytes]
outf.write(data)
return len(data)
def combine_url(base_url, target_url):
if re.match(r'^https?://', target_url):
return target_url
return '%s%s%s' % (base_url, '' if base_url.endswith('/') else '/', target_url)
with open(tmpfilename, 'wb') as outf:
append_url_to_file(
outf, combine_url(base_url, info_dict['initialization_url']),
'initialization segment')
for i, segment_url in enumerate(segment_urls):
segment_len = append_url_to_file(
outf, combine_url(base_url, segment_url),
'segment %d / %d' % (i + 1, len(segment_urls)),
remaining_bytes)
byte_counter += segment_len
if remaining_bytes is not None:
remaining_bytes -= segment_len
if remaining_bytes <= 0:
break
self.try_rename(tmpfilename, filename)
self._hook_progress({
'downloaded_bytes': byte_counter,
'total_bytes': byte_counter,
'filename': filename,
'status': 'finished',
})
return True
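
Note on `combine_url` above: the helper passes absolute `http(s)` segment URLs through unchanged and otherwise appends the segment to the base URL with exactly one `/` between them. A minimal standalone sketch of that rule follows; the example URLs are hypothetical and not taken from any real manifest.

```python
import re


def combine_url(base_url, target_url):
    """Return target_url if it is absolute, else append it to base_url."""
    if re.match(r'^https?://', target_url):
        return target_url
    # Insert a slash only when base_url does not already end with one.
    separator = '' if base_url.endswith('/') else '/'
    return '%s%s%s' % (base_url, separator, target_url)


# Hypothetical inputs, for illustration only.
relative = combine_url('http://example.com/dash', 'seg-1.m4s')
trailing = combine_url('http://example.com/dash/', 'init.mp4')
absolute = combine_url('http://example.com/dash', 'https://cdn.example.com/seg-2.m4s')
```

Plain concatenation differs from `urljoin`: the last path component of the base URL is kept rather than replaced, which matters when the base URL has no trailing slash.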

View File

@ -51,6 +51,9 @@ class ExternalFD(FileDownloader):
return [] return []
return [command_option, source_address] return [command_option, source_address]
def _no_check_certificate(self, command_option):
return [command_option] if self.params.get('nocheckcertificate', False) else []
def _configuration_args(self, default=[]): def _configuration_args(self, default=[]):
ex_args = self.params.get('external_downloader_args') ex_args = self.params.get('external_downloader_args')
if ex_args is None: if ex_args is None:
@ -83,12 +86,23 @@ class CurlFD(ExternalFD):
return cmd return cmd
class AxelFD(ExternalFD):
def _make_cmd(self, tmpfilename, info_dict):
cmd = [self.exe, '-o', tmpfilename]
for key, val in info_dict['http_headers'].items():
cmd += ['-H', '%s: %s' % (key, val)]
cmd += self._configuration_args()
cmd += ['--', info_dict['url']]
return cmd
class WgetFD(ExternalFD): class WgetFD(ExternalFD):
def _make_cmd(self, tmpfilename, info_dict): def _make_cmd(self, tmpfilename, info_dict):
cmd = [self.exe, '-O', tmpfilename, '-nv', '--no-cookies'] cmd = [self.exe, '-O', tmpfilename, '-nv', '--no-cookies']
for key, val in info_dict['http_headers'].items(): for key, val in info_dict['http_headers'].items():
cmd += ['--header', '%s: %s' % (key, val)] cmd += ['--header', '%s: %s' % (key, val)]
cmd += self._source_address('--bind-address') cmd += self._source_address('--bind-address')
cmd += self._no_check_certificate('--no-check-certificate')
cmd += self._configuration_args() cmd += self._configuration_args()
cmd += ['--', info_dict['url']] cmd += ['--', info_dict['url']]
return cmd return cmd
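
Note on `_no_check_certificate` above: it contributes either a one-element list or an empty list, so callers can always extend the command with `+=`. A minimal sketch outside the class, assuming a plain dict stands in for `self.params` (the standalone function name and the output filename are hypothetical):

```python
def no_check_certificate(params, command_option):
    """Return [command_option] when certificate checks are disabled, else []."""
    return [command_option] if params.get('nocheckcertificate', False) else []


# Hypothetical parameter dicts standing in for self.params.
cmd_insecure = ['wget', '-O', 'out.tmp', '-nv', '--no-cookies']
cmd_insecure += no_check_certificate(
    {'nocheckcertificate': True}, '--no-check-certificate')

cmd_default = ['wget', '-O', 'out.tmp', '-nv', '--no-cookies']
cmd_default += no_check_certificate({}, '--no-check-certificate')
```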

View File

@ -7,8 +7,7 @@ import os
import time import time
import xml.etree.ElementTree as etree import xml.etree.ElementTree as etree
from .common import FileDownloader from .fragment import FragmentFD
from .http import HttpFD
from ..compat import ( from ..compat import (
compat_urlparse, compat_urlparse,
compat_urllib_error, compat_urllib_error,
@ -16,8 +15,6 @@ from ..compat import (
from ..utils import ( from ..utils import (
struct_pack, struct_pack,
struct_unpack, struct_unpack,
encodeFilename,
sanitize_open,
xpath_text, xpath_text,
) )
@ -226,16 +223,13 @@ def _add_ns(prop):
return '{http://ns.adobe.com/f4m/1.0}%s' % prop return '{http://ns.adobe.com/f4m/1.0}%s' % prop
class HttpQuietDownloader(HttpFD): class F4mFD(FragmentFD):
def to_screen(self, *args, **kargs):
pass
class F4mFD(FileDownloader):
""" """
A downloader for f4m manifests or AdobeHDS. A downloader for f4m manifests or AdobeHDS.
""" """
FD_NAME = 'f4m'
def _get_unencrypted_media(self, doc): def _get_unencrypted_media(self, doc):
media = doc.findall(_add_ns('media')) media = doc.findall(_add_ns('media'))
if not media: if not media:
@ -288,7 +282,7 @@ class F4mFD(FileDownloader):
def real_download(self, filename, info_dict): def real_download(self, filename, info_dict):
man_url = info_dict['url'] man_url = info_dict['url']
requested_bitrate = info_dict.get('tbr') requested_bitrate = info_dict.get('tbr')
self.to_screen('[download] Downloading f4m manifest') self.to_screen('[%s] Downloading f4m manifest' % self.FD_NAME)
manifest = self.ydl.urlopen(man_url).read() manifest = self.ydl.urlopen(man_url).read()
doc = etree.fromstring(manifest) doc = etree.fromstring(manifest)
@ -320,67 +314,20 @@ class F4mFD(FileDownloader):
# For some akamai manifests we'll need to add a query to the fragment url # For some akamai manifests we'll need to add a query to the fragment url
akamai_pv = xpath_text(doc, _add_ns('pv-2.0')) akamai_pv = xpath_text(doc, _add_ns('pv-2.0'))
self.report_destination(filename) ctx = {
http_dl = HttpQuietDownloader( 'filename': filename,
self.ydl, 'total_frags': total_frags,
{
'continuedl': True,
'quiet': True,
'noprogress': True,
'ratelimit': self.params.get('ratelimit', None),
'test': self.params.get('test', False),
} }
)
tmpfilename = self.temp_name(filename) self._prepare_frag_download(ctx)
(dest_stream, tmpfilename) = sanitize_open(tmpfilename, 'wb')
dest_stream = ctx['dest_stream']
write_flv_header(dest_stream) write_flv_header(dest_stream)
if not live: if not live:
write_metadata_tag(dest_stream, metadata) write_metadata_tag(dest_stream, metadata)
# This dict stores the download progress, it's updated by the progress self._start_frag_download(ctx)
# hook
state = {
'status': 'downloading',
'downloaded_bytes': 0,
'frag_index': 0,
'frag_count': total_frags,
'filename': filename,
'tmpfilename': tmpfilename,
}
start = time.time()
def frag_progress_hook(s):
if s['status'] not in ('downloading', 'finished'):
return
frag_total_bytes = s.get('total_bytes', 0)
if s['status'] == 'finished':
state['downloaded_bytes'] += frag_total_bytes
state['frag_index'] += 1
estimated_size = (
(state['downloaded_bytes'] + frag_total_bytes) /
(state['frag_index'] + 1) * total_frags)
time_now = time.time()
state['total_bytes_estimate'] = estimated_size
state['elapsed'] = time_now - start
if s['status'] == 'finished':
progress = self.calc_percent(state['frag_index'], total_frags)
else:
frag_downloaded_bytes = s['downloaded_bytes']
frag_progress = self.calc_percent(frag_downloaded_bytes,
frag_total_bytes)
progress = self.calc_percent(state['frag_index'], total_frags)
progress += frag_progress / float(total_frags)
state['eta'] = self.calc_eta(
start, time_now, estimated_size, state['downloaded_bytes'] + frag_downloaded_bytes)
state['speed'] = s.get('speed')
self._hook_progress(state)
http_dl.add_progress_hook(frag_progress_hook)
frags_filenames = [] frags_filenames = []
while fragments_list: while fragments_list:
@ -391,9 +338,9 @@ class F4mFD(FileDownloader):
url += '?' + akamai_pv.strip(';') url += '?' + akamai_pv.strip(';')
if info_dict.get('extra_param_to_segment_url'): if info_dict.get('extra_param_to_segment_url'):
url += info_dict.get('extra_param_to_segment_url') url += info_dict.get('extra_param_to_segment_url')
frag_filename = '%s-%s' % (tmpfilename, name) frag_filename = '%s-%s' % (ctx['tmpfilename'], name)
try: try:
success = http_dl.download(frag_filename, {'url': url}) success = ctx['dl'].download(frag_filename, {'url': url})
if not success: if not success:
return False return False
with open(frag_filename, 'rb') as down: with open(frag_filename, 'rb') as down:
@ -425,20 +372,9 @@ class F4mFD(FileDownloader):
msg = 'Missed %d fragments' % (fragments_list[0][1] - (frag_i + 1)) msg = 'Missed %d fragments' % (fragments_list[0][1] - (frag_i + 1))
self.report_warning(msg) self.report_warning(msg)
dest_stream.close() self._finish_frag_download(ctx)
elapsed = time.time() - start
self.try_rename(tmpfilename, filename)
for frag_file in frags_filenames: for frag_file in frags_filenames:
os.remove(frag_file) os.remove(frag_file)
fsize = os.path.getsize(encodeFilename(filename))
self._hook_progress({
'downloaded_bytes': fsize,
'total_bytes': fsize,
'filename': filename,
'status': 'finished',
'elapsed': elapsed,
})
return True return True

View File

@ -0,0 +1,110 @@
from __future__ import division, unicode_literals
import os
import time
from .common import FileDownloader
from .http import HttpFD
from ..utils import (
encodeFilename,
sanitize_open,
)
class HttpQuietDownloader(HttpFD):
def to_screen(self, *args, **kargs):
pass
class FragmentFD(FileDownloader):
"""
A base file downloader class for fragmented media (e.g. f4m/m3u8 manifests).
"""
def _prepare_and_start_frag_download(self, ctx):
self._prepare_frag_download(ctx)
self._start_frag_download(ctx)
def _prepare_frag_download(self, ctx):
self.to_screen('[%s] Total fragments: %d' % (self.FD_NAME, ctx['total_frags']))
self.report_destination(ctx['filename'])
dl = HttpQuietDownloader(
self.ydl,
{
'continuedl': True,
'quiet': True,
'noprogress': True,
'ratelimit': self.params.get('ratelimit', None),
'test': self.params.get('test', False),
}
)
tmpfilename = self.temp_name(ctx['filename'])
dest_stream, tmpfilename = sanitize_open(tmpfilename, 'wb')
ctx.update({
'dl': dl,
'dest_stream': dest_stream,
'tmpfilename': tmpfilename,
})
def _start_frag_download(self, ctx):
total_frags = ctx['total_frags']
# This dict stores the download progress; it is updated by the
# progress hook
state = {
'status': 'downloading',
'downloaded_bytes': 0,
'frag_index': 0,
'frag_count': total_frags,
'filename': ctx['filename'],
'tmpfilename': ctx['tmpfilename'],
}
start = time.time()
ctx['started'] = start
def frag_progress_hook(s):
if s['status'] not in ('downloading', 'finished'):
return
frag_total_bytes = s.get('total_bytes', 0)
if s['status'] == 'finished':
state['downloaded_bytes'] += frag_total_bytes
state['frag_index'] += 1
estimated_size = (
(state['downloaded_bytes'] + frag_total_bytes) /
(state['frag_index'] + 1) * total_frags)
time_now = time.time()
state['total_bytes_estimate'] = estimated_size
state['elapsed'] = time_now - start
if s['status'] == 'finished':
progress = self.calc_percent(state['frag_index'], total_frags)
else:
frag_downloaded_bytes = s['downloaded_bytes']
frag_progress = self.calc_percent(frag_downloaded_bytes,
frag_total_bytes)
progress = self.calc_percent(state['frag_index'], total_frags)
progress += frag_progress / float(total_frags)
state['eta'] = self.calc_eta(
start, time_now, estimated_size, state['downloaded_bytes'] + frag_downloaded_bytes)
state['speed'] = s.get('speed')
self._hook_progress(state)
ctx['dl'].add_progress_hook(frag_progress_hook)
return start
def _finish_frag_download(self, ctx):
ctx['dest_stream'].close()
elapsed = time.time() - ctx['started']
self.try_rename(ctx['tmpfilename'], ctx['filename'])
fsize = os.path.getsize(encodeFilename(ctx['filename']))
self._hook_progress({
'downloaded_bytes': fsize,
'total_bytes': fsize,
'filename': ctx['filename'],
'status': 'finished',
'elapsed': elapsed,
})
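
Note on the size estimate in `frag_progress_hook` above: the total is extrapolated from the average fragment size seen so far. The sketch below isolates only that formula; the function name and the numbers are hypothetical, and the arithmetic mirrors the expression in the hook rather than any other youtube-dl API.

```python
from __future__ import division


def estimate_total_bytes(downloaded_bytes, frag_total_bytes, frag_index, total_frags):
    """Extrapolate the full download size from the fragments seen so far.

    downloaded_bytes: bytes of all finished fragments
    frag_total_bytes: size of the fragment currently reported by the hook
    frag_index:       number of finished fragments
    total_frags:      number of fragments in the manifest
    """
    return (downloaded_bytes + frag_total_bytes) / (frag_index + 1) * total_frags


# Hypothetical numbers: three finished fragments of 100 bytes each, a fourth
# 100-byte fragment in progress, ten fragments overall.
estimate = estimate_total_bytes(300, 100, 3, 10)
```

Because the estimate assumes roughly equal fragment sizes, it can drift for variable-bitrate streams, which is presumably why the hook reports it as `total_bytes_estimate` rather than `total_bytes`.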

View File

@ -4,12 +4,11 @@ import os
import re import re
import subprocess import subprocess
from ..postprocessor.ffmpeg import FFmpegPostProcessor
from .common import FileDownloader from .common import FileDownloader
from ..compat import ( from .fragment import FragmentFD
compat_urlparse,
compat_urllib_request, from ..compat import compat_urlparse
) from ..postprocessor.ffmpeg import FFmpegPostProcessor
from ..utils import ( from ..utils import (
encodeArgument, encodeArgument,
encodeFilename, encodeFilename,
@ -51,54 +50,50 @@ class HlsFD(FileDownloader):
return False return False
class NativeHlsFD(FileDownloader): class NativeHlsFD(FragmentFD):
""" A more limited implementation that does not require ffmpeg """ """ A more limited implementation that does not require ffmpeg """
def real_download(self, filename, info_dict): FD_NAME = 'hlsnative'
url = info_dict['url']
self.report_destination(filename)
tmpfilename = self.temp_name(filename)
self.to_screen( def real_download(self, filename, info_dict):
'[hlsnative] %s: Downloading m3u8 manifest' % info_dict['id']) man_url = info_dict['url']
data = self.ydl.urlopen(url).read() self.to_screen('[%s] Downloading m3u8 manifest' % self.FD_NAME)
s = data.decode('utf-8', 'ignore') manifest = self.ydl.urlopen(man_url).read()
segment_urls = []
s = manifest.decode('utf-8', 'ignore')
fragment_urls = []
for line in s.splitlines(): for line in s.splitlines():
line = line.strip() line = line.strip()
if line and not line.startswith('#'): if line and not line.startswith('#'):
segment_url = ( segment_url = (
line line
if re.match(r'^https?://', line) if re.match(r'^https?://', line)
else compat_urlparse.urljoin(url, line)) else compat_urlparse.urljoin(man_url, line))
segment_urls.append(segment_url) fragment_urls.append(segment_url)
# We only download the first fragment during the test
is_test = self.params.get('test', False) if self.params.get('test', False):
remaining_bytes = self._TEST_FILE_SIZE if is_test else None
byte_counter = 0
with open(tmpfilename, 'wb') as outf:
for i, segurl in enumerate(segment_urls):
self.to_screen(
'[hlsnative] %s: Downloading segment %d / %d' %
(info_dict['id'], i + 1, len(segment_urls)))
seg_req = compat_urllib_request.Request(segurl)
if remaining_bytes is not None:
seg_req.add_header('Range', 'bytes=0-%d' % (remaining_bytes - 1))
segment = self.ydl.urlopen(seg_req).read()
if remaining_bytes is not None:
segment = segment[:remaining_bytes]
remaining_bytes -= len(segment)
outf.write(segment)
byte_counter += len(segment)
if remaining_bytes is not None and remaining_bytes <= 0:
break break
self._hook_progress({ ctx = {
'downloaded_bytes': byte_counter,
'total_bytes': byte_counter,
'filename': filename, 'filename': filename,
'status': 'finished', 'total_frags': len(fragment_urls),
}) }
self.try_rename(tmpfilename, filename)
self._prepare_and_start_frag_download(ctx)
frags_filenames = []
for i, frag_url in enumerate(fragment_urls):
frag_filename = '%s-Frag%d' % (ctx['tmpfilename'], i)
success = ctx['dl'].download(frag_filename, {'url': frag_url})
if not success:
return False
with open(frag_filename, 'rb') as down:
ctx['dest_stream'].write(down.read())
frags_filenames.append(frag_filename)
self._finish_frag_download(ctx)
for frag_file in frags_filenames:
os.remove(frag_file)
return True return True
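
Note on the manifest loop in `NativeHlsFD.real_download` above: every non-empty line that is not a `#` tag is treated as a fragment URL, and relative entries are resolved against the manifest URL. A minimal standalone sketch, using the standard library `urljoin` in place of `compat_urlparse.urljoin`; the function name, manifest text and hosts are hypothetical.

```python
import re
from urllib.parse import urljoin


def parse_fragment_urls(manifest_text, man_url):
    """Collect fragment URLs from an m3u8 playlist, resolving relative ones."""
    fragment_urls = []
    for line in manifest_text.splitlines():
        line = line.strip()
        # Skip blank lines and playlist tags/comments.
        if not line or line.startswith('#'):
            continue
        if re.match(r'^https?://', line):
            fragment_urls.append(line)
        else:
            fragment_urls.append(urljoin(man_url, line))
    return fragment_urls


# Hypothetical playlist, for illustration only.
manifest = '#EXTM3U\n#EXTINF:10,\nseg0.ts\n\n#EXTINF:10,\nhttp://cdn.example.com/seg1.ts\n'
urls = parse_fragment_urls(manifest, 'http://host.example.com/path/index.m3u8')
```

This deliberately ignores everything a full HLS client would handle (variant playlists, encryption keys, byte ranges), in line with the "more limited implementation" docstring.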

View File

@ -4,6 +4,7 @@ import errno
import os import os
import socket import socket
import time import time
import re
from .common import FileDownloader from .common import FileDownloader
from ..compat import ( from ..compat import (
@ -57,6 +58,24 @@ class HttpFD(FileDownloader):
# Establish connection # Establish connection
try: try:
data = self.ydl.urlopen(request) data = self.ydl.urlopen(request)
# When trying to resume, the Content-Range HTTP header of the response has to be checked
# to match the value of the requested Range HTTP header. This is due to webservers
# that don't support resuming and serve the whole file with no Content-Range
# set in the response despite the requested Range (see
# https://github.com/rg3/youtube-dl/issues/6057#issuecomment-126129799)
if resume_len > 0:
content_range = data.headers.get('Content-Range')
if content_range:
content_range_m = re.search(r'bytes (\d+)-', content_range)
# Content-Range is present and matches requested Range, resume is possible
if content_range_m and resume_len == int(content_range_m.group(1)):
break
# Content-Range is either not present or invalid. Assume the remote webserver is
# sending the whole file, so resuming is not possible: wipe the local file
# and redownload it entirely
self.report_unable_to_resume()
resume_len = 0
open_mode = 'wb'
break break
except (compat_urllib_error.HTTPError, ) as err: except (compat_urllib_error.HTTPError, ) as err:
if (err.code < 500 or err.code >= 600) and err.code != 416: if (err.code < 500 or err.code >= 600) and err.code != 416:

View File

@ -19,9 +19,14 @@ from .anysex import AnySexIE
from .aol import AolIE from .aol import AolIE
from .allocine import AllocineIE from .allocine import AllocineIE
from .aparat import AparatIE from .aparat import AparatIE
from .appleconnect import AppleConnectIE
from .appletrailers import AppleTrailersIE from .appletrailers import AppleTrailersIE
from .archiveorg import ArchiveOrgIE from .archiveorg import ArchiveOrgIE
from .ard import ARDIE, ARDMediathekIE from .ard import (
ARDIE,
ARDMediathekIE,
SportschauIE,
)
from .arte import ( from .arte import (
ArteTvIE, ArteTvIE,
ArteTVPlus7IE, ArteTVPlus7IE,
@ -38,7 +43,10 @@ from .azubu import AzubuIE
from .baidu import BaiduVideoIE from .baidu import BaiduVideoIE
from .bambuser import BambuserIE, BambuserChannelIE from .bambuser import BambuserIE, BambuserChannelIE
from .bandcamp import BandcampIE, BandcampAlbumIE from .bandcamp import BandcampIE, BandcampAlbumIE
from .bbccouk import BBCCoUkIE from .bbc import (
BBCCoUkIE,
BBCIE,
)
from .beeg import BeegIE from .beeg import BeegIE
from .behindkink import BehindKinkIE from .behindkink import BehindKinkIE
from .beatportpro import BeatportProIE from .beatportpro import BeatportProIE
@ -110,6 +118,7 @@ from .dailymotion import (
) )
from .daum import DaumIE from .daum import DaumIE
from .dbtv import DBTVIE from .dbtv import DBTVIE
from .dcn import DCNIE
from .dctp import DctpTvIE from .dctp import DctpTvIE
from .deezer import DeezerPlaylistIE from .deezer import DeezerPlaylistIE
from .dfb import DFBIE from .dfb import DFBIE
@ -238,6 +247,7 @@ from .instagram import InstagramIE, InstagramUserIE
from .internetvideoarchive import InternetVideoArchiveIE from .internetvideoarchive import InternetVideoArchiveIE
from .iprima import IPrimaIE from .iprima import IPrimaIE
from .iqiyi import IqiyiIE from .iqiyi import IqiyiIE
from .ir90tv import Ir90TvIE
from .ivi import ( from .ivi import (
IviIE, IviIE,
IviCompilationIE IviCompilationIE
@ -260,8 +270,17 @@ from .keek import KeekIE
from .kontrtube import KontrTubeIE from .kontrtube import KontrTubeIE
from .krasview import KrasViewIE from .krasview import KrasViewIE
from .ku6 import Ku6IE from .ku6 import Ku6IE
from .kuwo import (
KuwoIE,
KuwoAlbumIE,
KuwoChartIE,
KuwoSingerIE,
KuwoCategoryIE,
KuwoMvIE,
)
from .la7 import LA7IE from .la7 import LA7IE
from .laola1tv import Laola1TvIE from .laola1tv import Laola1TvIE
from .lecture2go import Lecture2GoIE
from .letv import ( from .letv import (
LetvIE, LetvIE,
LetvTvIE, LetvTvIE,
@ -323,6 +342,7 @@ from .musicvault import MusicVaultIE
from .muzu import MuzuTVIE from .muzu import MuzuTVIE
from .myspace import MySpaceIE, MySpaceAlbumIE from .myspace import MySpaceIE, MySpaceAlbumIE
from .myspass import MySpassIE from .myspass import MySpassIE
from .myvi import MyviIE
from .myvideo import MyVideoIE from .myvideo import MyVideoIE
from .myvidster import MyVidsterIE from .myvidster import MyVidsterIE
from .nationalgeographic import NationalGeographicIE from .nationalgeographic import NationalGeographicIE
@ -342,6 +362,15 @@ from .ndtv import NDTVIE
from .netzkino import NetzkinoIE from .netzkino import NetzkinoIE
from .nerdcubed import NerdCubedFeedIE from .nerdcubed import NerdCubedFeedIE
from .nerdist import NerdistIE from .nerdist import NerdistIE
from .neteasemusic import (
NetEaseMusicIE,
NetEaseMusicAlbumIE,
NetEaseMusicSingerIE,
NetEaseMusicListIE,
NetEaseMusicMvIE,
NetEaseMusicProgramIE,
NetEaseMusicDjRadioIE,
)
from .newgrounds import NewgroundsIE from .newgrounds import NewgroundsIE
from .newstube import NewstubeIE from .newstube import NewstubeIE
from .nextmedia import ( from .nextmedia import (
@ -371,7 +400,8 @@ from .npo import (
NPOLiveIE, NPOLiveIE,
NPORadioIE, NPORadioIE,
NPORadioFragmentIE, NPORadioFragmentIE,
TegenlichtVproIE, VPROIE,
WNLIE
) )
from .nrk import ( from .nrk import (
NRKIE, NRKIE,
@ -402,6 +432,10 @@ from .orf import (
from .parliamentliveuk import ParliamentLiveUKIE from .parliamentliveuk import ParliamentLiveUKIE
from .patreon import PatreonIE from .patreon import PatreonIE
from .pbs import PBSIE from .pbs import PBSIE
from .periscope import (
PeriscopeIE,
QuickscopeIE,
)
from .philharmoniedeparis import PhilharmonieDeParisIE from .philharmoniedeparis import PhilharmonieDeParisIE
from .phoenix import PhoenixIE from .phoenix import PhoenixIE
from .photobucket import PhotobucketIE from .photobucket import PhotobucketIE
@ -432,6 +466,7 @@ from .qqmusic import (
QQMusicSingerIE, QQMusicSingerIE,
QQMusicAlbumIE, QQMusicAlbumIE,
QQMusicToplistIE, QQMusicToplistIE,
QQMusicPlaylistIE,
) )
from .quickvid import QuickVidIE from .quickvid import QuickVidIE
from .r7 import R7IE from .r7 import R7IE
@ -441,6 +476,7 @@ from .radiobremen import RadioBremenIE
from .radiofrance import RadioFranceIE from .radiofrance import RadioFranceIE
from .rai import RaiIE from .rai import RaiIE
from .rbmaradio import RBMARadioIE from .rbmaradio import RBMARadioIE
from .rds import RDSIE
from .redtube import RedTubeIE from .redtube import RedTubeIE
from .restudy import RestudyIE from .restudy import RestudyIE
from .reverbnation import ReverbNationIE from .reverbnation import ReverbNationIE
@ -702,7 +738,10 @@ from .wdr import (
WDRMobileIE, WDRMobileIE,
WDRMausIE, WDRMausIE,
) )
from .webofstories import WebOfStoriesIE from .webofstories import (
WebOfStoriesIE,
WebOfStoriesPlaylistIE,
)
from .weibo import WeiboIE from .weibo import WeiboIE
from .wimp import WimpIE from .wimp import WimpIE
from .wistia import WistiaIE from .wistia import WistiaIE
@ -733,6 +772,7 @@ from .yandexmusic import (
YandexMusicPlaylistIE, YandexMusicPlaylistIE,
) )
from .yesjapan import YesJapanIE from .yesjapan import YesJapanIE
from .yinyuetai import YinYueTaiIE
from .ynet import YnetIE from .ynet import YnetIE
from .youjizz import YouJizzIE from .youjizz import YouJizzIE
from .youku import YoukuIE from .youku import YoukuIE

View File

@ -0,0 +1,50 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
str_to_int,
ExtractorError
)
class AppleConnectIE(InfoExtractor):
_VALID_URL = r'https?://itunes\.apple\.com/\w{0,2}/?post/idsa\.(?P<id>[\w-]+)'
_TEST = {
'url': 'https://itunes.apple.com/us/post/idsa.4ab17a39-2720-11e5-96c5-a5b38f6c42d3',
'md5': '10d0f2799111df4cb1c924520ca78f98',
'info_dict': {
'id': '4ab17a39-2720-11e5-96c5-a5b38f6c42d3',
'ext': 'm4v',
'title': 'Energy',
'uploader': 'Drake',
'thumbnail': 'http://is5.mzstatic.com/image/thumb/Video5/v4/78/61/c5/7861c5fa-ad6d-294b-1464-cf7605b911d6/source/1920x1080sr.jpg',
'upload_date': '20150710',
'timestamp': 1436545535,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
try:
video_json = self._html_search_regex(
r'class="auc-video-data">(\{.*?\})', webpage, 'json')
except ExtractorError:
raise ExtractorError('This post doesn\'t contain a video', expected=True)
video_data = self._parse_json(video_json, video_id)
timestamp = str_to_int(self._html_search_regex(r'data-timestamp="(\d+)"', webpage, 'timestamp'))
like_count = str_to_int(self._html_search_regex(r'(\d+) Loves', webpage, 'like count'))
return {
'id': video_id,
'url': video_data['sslSrc'],
'title': video_data['title'],
'description': video_data['description'],
'uploader': video_data['artistName'],
'thumbnail': video_data['artworkUrl'],
'timestamp': timestamp,
'like_count': like_count,
}
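
Note on `_VALID_URL` above: the pattern allows an optional country segment of up to two word characters before `post/idsa.` and captures the rest as the video id. A quick standalone check against the test URL from this file (the second URL, without a country segment, is a hypothetical variant):

```python
import re

VALID_URL = r'https?://itunes\.apple\.com/\w{0,2}/?post/idsa\.(?P<id>[\w-]+)'

# URL taken from the _TEST entry above.
with_country = re.match(
    VALID_URL,
    'https://itunes.apple.com/us/post/idsa.4ab17a39-2720-11e5-96c5-a5b38f6c42d3')

# Hypothetical variant without the country segment.
without_country = re.match(
    VALID_URL,
    'https://itunes.apple.com/post/idsa.4ab17a39-2720-11e5-96c5-a5b38f6c42d3')
```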

View File

@ -8,6 +8,7 @@ from .generic import GenericIE
from ..utils import ( from ..utils import (
determine_ext, determine_ext,
ExtractorError, ExtractorError,
get_element_by_attribute,
qualities, qualities,
int_or_none, int_or_none,
parse_duration, parse_duration,
@ -22,19 +23,125 @@ class ARDMediathekIE(InfoExtractor):
_VALID_URL = r'^https?://(?:(?:www\.)?ardmediathek\.de|mediathek\.daserste\.de)/(?:.*/)(?P<video_id>[0-9]+|[^0-9][^/\?]+)[^/\?]*(?:\?.*)?' _VALID_URL = r'^https?://(?:(?:www\.)?ardmediathek\.de|mediathek\.daserste\.de)/(?:.*/)(?P<video_id>[0-9]+|[^0-9][^/\?]+)[^/\?]*(?:\?.*)?'
_TESTS = [{ _TESTS = [{
'url': 'http://www.ardmediathek.de/tv/Dokumentation-und-Reportage/Ich-liebe-das-Leben-trotzdem/rbb-Fernsehen/Video?documentId=29582122&bcastId=3822114',
'info_dict': {
'id': '29582122',
'ext': 'mp4',
'title': 'Ich liebe das Leben trotzdem',
'description': 'md5:45e4c225c72b27993314b31a84a5261c',
'duration': 4557,
},
'params': {
# m3u8 download
'skip_download': True,
},
}, {
'url': 'http://www.ardmediathek.de/tv/Tatort/Tatort-Scheinwelten-H%C3%B6rfassung-Video/Das-Erste/Video?documentId=29522730&bcastId=602916',
'md5': 'f4d98b10759ac06c0072bbcd1f0b9e3e',
'info_dict': {
'id': '29522730',
'ext': 'mp4',
'title': 'Tatort: Scheinwelten - Hörfassung (Video tgl. ab 20 Uhr)',
'description': 'md5:196392e79876d0ac94c94e8cdb2875f1',
'duration': 5252,
},
}, {
# audio
'url': 'http://www.ardmediathek.de/tv/WDR-H%C3%B6rspiel-Speicher/Tod-eines-Fu%C3%9Fballers/WDR-3/Audio-Podcast?documentId=28488308&bcastId=23074086',
'md5': '219d94d8980b4f538c7fcb0865eb7f2c',
'info_dict': {
'id': '28488308',
'ext': 'mp3',
'title': 'Tod eines Fußballers',
'description': 'md5:f6e39f3461f0e1f54bfa48c8875c86ef',
'duration': 3240,
},
}, {
'url': 'http://mediathek.daserste.de/sendungen_a-z/328454_anne-will/22429276_vertrauen-ist-gut-spionieren-ist-besser-geht', 'url': 'http://mediathek.daserste.de/sendungen_a-z/328454_anne-will/22429276_vertrauen-ist-gut-spionieren-ist-besser-geht',
'only_matching': True, 'only_matching': True,
}, {
'url': 'http://www.ardmediathek.de/tv/Tatort/Das-Wunder-von-Wolbeck-Video-tgl-ab-20/Das-Erste/Video?documentId=22490580&bcastId=602916',
'info_dict': {
'id': '22490580',
'ext': 'mp4',
'title': 'Das Wunder von Wolbeck (Video tgl. ab 20 Uhr)',
'description': 'Auf einem restaurierten Hof bei Wolbeck wird der Heilpraktiker Raffael Lembeck eines morgens von seiner Frau Stella tot aufgefunden. Das Opfer war offensichtlich in seiner Praxis zu Fall gekommen und ist dann verblutet, erklärt Prof. Boerne am Tatort.',
},
'skip': 'Blocked outside of Germany',
}] }]
def _extract_media_info(self, media_info_url, webpage, video_id):
media_info = self._download_json(
media_info_url, video_id, 'Downloading media JSON')
formats = self._extract_formats(media_info, video_id)
if not formats:
if '"fsk"' in webpage:
raise ExtractorError(
'This video is only available after 20:00', expected=True)
elif media_info.get('_geoblocked'):
raise ExtractorError('This video is not available due to geo restriction', expected=True)
self._sort_formats(formats)
duration = int_or_none(media_info.get('_duration'))
thumbnail = media_info.get('_previewImage')
subtitles = {}
subtitle_url = media_info.get('_subtitleUrl')
if subtitle_url:
subtitles['de'] = [{
'ext': 'srt',
'url': subtitle_url,
}]
return {
'id': video_id,
'duration': duration,
'thumbnail': thumbnail,
'formats': formats,
'subtitles': subtitles,
}
def _extract_formats(self, media_info, video_id):
type_ = media_info.get('_type')
media_array = media_info.get('_mediaArray', [])
formats = []
for num, media in enumerate(media_array):
for stream in media.get('_mediaStreamArray', []):
stream_urls = stream.get('_stream')
if not stream_urls:
continue
if not isinstance(stream_urls, list):
stream_urls = [stream_urls]
quality = stream.get('_quality')
server = stream.get('_server')
for stream_url in stream_urls:
ext = determine_ext(stream_url)
if ext == 'f4m':
formats.extend(self._extract_f4m_formats(
stream_url + '?hdcore=3.1.1&plugin=aasp-3.1.1.69.124',
video_id, preference=-1, f4m_id='hds'))
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
stream_url, video_id, 'mp4', preference=1, m3u8_id='hls'))
else:
if server and server.startswith('rtmp'):
f = {
'url': server,
'play_path': stream_url,
'format_id': 'a%s-rtmp-%s' % (num, quality),
}
elif stream_url.startswith('http'):
f = {
'url': stream_url,
'format_id': 'a%s-%s-%s' % (num, ext, quality)
}
else:
continue
m = re.search(r'_(?P<width>\d+)x(?P<height>\d+)\.mp4$', stream_url)
if m:
f.update({
'width': int(m.group('width')),
'height': int(m.group('height')),
})
if type_ == 'audio':
f['vcodec'] = 'none'
formats.append(f)
return formats
def _real_extract(self, url): def _real_extract(self, url):
# determine video id from url # determine video id from url
m = re.match(self._VALID_URL, url) m = re.match(self._VALID_URL, url)
@ -92,46 +199,22 @@ class ARDMediathekIE(InfoExtractor):
'format_id': fid, 'format_id': fid,
'url': furl, 'url': furl,
}) })
else: # request JSON file
media_info = self._download_json(
'http://www.ardmediathek.de/play/media/%s' % video_id, video_id)
# The second element of the _mediaArray contains the standard http urls
streams = media_info['_mediaArray'][1]['_mediaStreamArray']
if not streams:
if '"fsk"' in webpage:
raise ExtractorError('This video is only available after 20:00')
formats = []
for s in streams:
if type(s['_stream']) == list:
for index, url in enumerate(s['_stream'][::-1]):
quality = s['_quality'] + index
formats.append({
'quality': quality,
'url': url,
'format_id': '%s-%s' % (determine_ext(url), quality)
})
continue
format = {
'quality': s['_quality'],
'url': s['_stream'],
}
format['format_id'] = '%s-%s' % (
determine_ext(format['url']), format['quality'])
formats.append(format)
self._sort_formats(formats) self._sort_formats(formats)
info = {
'formats': formats,
}
else: # request JSON file
info = self._extract_media_info(
'http://www.ardmediathek.de/play/media/%s' % video_id, webpage, video_id)
return { info.update({
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'description': description, 'description': description,
'formats': formats,
'thumbnail': thumbnail, 'thumbnail': thumbnail,
} })
return info
class ARDIE(InfoExtractor): class ARDIE(InfoExtractor):
@ -189,3 +272,41 @@ class ARDIE(InfoExtractor):
'upload_date': upload_date, 'upload_date': upload_date,
'thumbnail': thumbnail, 'thumbnail': thumbnail,
} }
class SportschauIE(ARDMediathekIE):
IE_NAME = 'Sportschau'
_VALID_URL = r'(?P<baseurl>https?://(?:www\.)?sportschau\.de/(?:[^/]+/)+video(?P<id>[^/#?]+))\.html'
_TESTS = [{
'url': 'http://www.sportschau.de/tourdefrance/videoseppeltkokainhatnichtsmitklassischemdopingzutun100.html',
'info_dict': {
'id': 'seppeltkokainhatnichtsmitklassischemdopingzutun100',
'ext': 'mp4',
'title': 'Seppelt: "Kokain hat nichts mit klassischem Doping zu tun"',
'thumbnail': 're:^https?://.*\.jpg$',
'description': 'Der ARD-Doping Experte Hajo Seppelt gibt seine Einschätzung zum ersten Dopingfall der diesjährigen Tour de France um den Italiener Luca Paolini ab.',
},
'params': {
# m3u8 download
'skip_download': True,
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
base_url = mobj.group('baseurl')
webpage = self._download_webpage(url, video_id)
title = get_element_by_attribute('class', 'headline', webpage)
description = self._html_search_meta('description', webpage, 'description')
info = self._extract_media_info(
base_url + '-mc_defaultQuality-h.json', webpage, video_id)
info.update({
'title': title,
'description': description,
})
return info
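
Note on the dimension parsing in `_extract_formats` above: width and height are read from a `_<width>x<height>.mp4` suffix of the stream URL when it is present. A minimal sketch of that step in isolation; the function name and the stream URLs are hypothetical, while the regular expression is the one from the diff.

```python
import re


def dimensions_from_url(stream_url):
    """Return {'width': ..., 'height': ...} parsed from the URL, or {} if absent."""
    m = re.search(r'_(?P<width>\d+)x(?P<height>\d+)\.mp4$', stream_url)
    if not m:
        return {}
    return {
        'width': int(m.group('width')),
        'height': int(m.group('height')),
    }


# Hypothetical stream URLs, for illustration only.
with_dims = dimensions_from_url('http://media.example.com/clip_960x544.mp4')
without_dims = dimensions_from_url('http://media.example.com/clip.mp4')
```

Returning an empty dict keeps the caller's `f.update(...)` pattern usable without an extra branch.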

View File

@ -8,6 +8,7 @@ from ..compat import compat_urlparse
class BaiduVideoIE(InfoExtractor): class BaiduVideoIE(InfoExtractor):
IE_DESC = '百度视频'
_VALID_URL = r'http://v\.baidu\.com/(?P<type>[a-z]+)/(?P<id>\d+)\.htm' _VALID_URL = r'http://v\.baidu\.com/(?P<type>[a-z]+)/(?P<id>\d+)\.htm'
_TESTS = [{ _TESTS = [{
'url': 'http://v.baidu.com/comic/1069.htm?frp=bdbrand&q=%E4%B8%AD%E5%8D%8E%E5%B0%8F%E5%BD%93%E5%AE%B6', 'url': 'http://v.baidu.com/comic/1069.htm?frp=bdbrand&q=%E4%B8%AD%E5%8D%8E%E5%B0%8F%E5%BD%93%E5%AE%B6',

780
youtube_dl/extractor/bbc.py Normal file
View File

@ -0,0 +1,780 @@
# coding: utf-8
from __future__ import unicode_literals
import re
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
parse_duration,
parse_iso8601,
)
from ..compat import compat_HTTPError
class BBCCoUkIE(InfoExtractor):
IE_NAME = 'bbc.co.uk'
IE_DESC = 'BBC iPlayer'
_VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/(?:(?:(?:programmes|iplayer(?:/[^/]+)?/(?:episode|playlist))/)|music/clips[/#])(?P<id>[\da-z]{8})'
_MEDIASELECTOR_URLS = [
'http://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/pc/vpid/%s',
]
_TESTS = [
{
'url': 'http://www.bbc.co.uk/programmes/b039g8p7',
'info_dict': {
'id': 'b039d07m',
'ext': 'flv',
'title': 'Kaleidoscope, Leonard Cohen',
'description': 'The Canadian poet and songwriter reflects on his musical career.',
'duration': 1740,
},
'params': {
# rtmp download
'skip_download': True,
}
},
{
'url': 'http://www.bbc.co.uk/iplayer/episode/b00yng5w/The_Man_in_Black_Series_3_The_Printed_Name/',
'info_dict': {
'id': 'b00yng1d',
'ext': 'flv',
'title': 'The Man in Black: Series 3: The Printed Name',
'description': "Mark Gatiss introduces Nicholas Pierpan's chilling tale of a writer's devilish pact with a mysterious man. Stars Ewan Bailey.",
'duration': 1800,
},
'params': {
# rtmp download
'skip_download': True,
},
'skip': 'Episode is no longer available on BBC iPlayer Radio',
},
{
'url': 'http://www.bbc.co.uk/iplayer/episode/b03vhd1f/The_Voice_UK_Series_3_Blind_Auditions_5/',
'info_dict': {
'id': 'b00yng1d',
'ext': 'flv',
'title': 'The Voice UK: Series 3: Blind Auditions 5',
'description': "Emma Willis and Marvin Humes present the fifth set of blind auditions in the singing competition, as the coaches continue to build their teams based on voice alone.",
'duration': 5100,
},
'params': {
# rtmp download
'skip_download': True,
},
'skip': 'Currently BBC iPlayer TV programmes are available to play in the UK only',
},
{
'url': 'http://www.bbc.co.uk/iplayer/episode/p026c7jt/tomorrows-worlds-the-unearthly-history-of-science-fiction-2-invasion',
'info_dict': {
'id': 'b03k3pb7',
'ext': 'flv',
'title': "Tomorrow's Worlds: The Unearthly History of Science Fiction",
'description': '2. Invasion',
'duration': 3600,
},
'params': {
# rtmp download
'skip_download': True,
},
'skip': 'Currently BBC iPlayer TV programmes are available to play in the UK only',
}, {
'url': 'http://www.bbc.co.uk/programmes/b04v20dw',
'info_dict': {
'id': 'b04v209v',
'ext': 'flv',
'title': 'Pete Tong, The Essential New Tune Special',
'description': "Pete has a very special mix - all of 2014's Essential New Tunes!",
'duration': 10800,
},
'params': {
# rtmp download
'skip_download': True,
}
}, {
'url': 'http://www.bbc.co.uk/music/clips/p02frcc3',
'note': 'Audio',
'info_dict': {
'id': 'p02frcch',
'ext': 'flv',
'title': 'Pete Tong, Past, Present and Future Special, Madeon - After Hours mix',
'description': 'French house superstar Madeon takes us out of the club and onto the after party.',
'duration': 3507,
},
'params': {
# rtmp download
'skip_download': True,
}
}, {
'url': 'http://www.bbc.co.uk/music/clips/p025c0zz',
'note': 'Video',
'info_dict': {
'id': 'p025c103',
'ext': 'flv',
'title': 'Reading and Leeds Festival, 2014, Rae Morris - Closer (Live on BBC Three)',
'description': 'Rae Morris performs Closer for BBC Three at Reading 2014',
'duration': 226,
},
'params': {
# rtmp download
'skip_download': True,
}
}, {
'url': 'http://www.bbc.co.uk/iplayer/episode/b054fn09/ad/natural-world-20152016-2-super-powered-owls',
'info_dict': {
'id': 'p02n76xf',
'ext': 'flv',
'title': 'Natural World, 2015-2016: 2. Super Powered Owls',
'description': 'md5:e4db5c937d0e95a7c6b5e654d429183d',
'duration': 3540,
},
'params': {
# rtmp download
'skip_download': True,
},
'skip': 'geolocation',
}, {
'url': 'http://www.bbc.co.uk/iplayer/episode/b05zmgwn/royal-academy-summer-exhibition',
'info_dict': {
'id': 'b05zmgw1',
'ext': 'flv',
'description': 'Kirsty Wark and Morgan Quaintance visit the Royal Academy as it prepares for its annual artistic extravaganza, meeting people who have come together to make the show unique.',
'title': 'Royal Academy Summer Exhibition',
'duration': 3540,
},
'params': {
# rtmp download
'skip_download': True,
},
'skip': 'geolocation',
}, {
'url': 'http://www.bbc.co.uk/iplayer/playlist/p01dvks4',
'only_matching': True,
}, {
'url': 'http://www.bbc.co.uk/music/clips#p02frcc3',
'only_matching': True,
}, {
'url': 'http://www.bbc.co.uk/iplayer/cbeebies/episode/b0480276/bing-14-atchoo',
'only_matching': True,
}
]
class MediaSelectionError(Exception):
def __init__(self, id):
self.id = id
def _extract_asx_playlist(self, connection, programme_id):
asx = self._download_xml(connection.get('href'), programme_id, 'Downloading ASX playlist')
return [ref.get('href') for ref in asx.findall('./Entry/ref')]
def _extract_connection(self, connection, programme_id):
formats = []
protocol = connection.get('protocol')
supplier = connection.get('supplier')
if protocol == 'http':
href = connection.get('href')
transfer_format = connection.get('transferFormat')
# ASX playlist
if supplier == 'asx':
for i, ref in enumerate(self._extract_asx_playlist(connection, programme_id)):
formats.append({
'url': ref,
'format_id': 'ref%s_%s' % (i, supplier),
})
# Skip DASH until supported
elif transfer_format == 'dash':
pass
# Direct link
else:
formats.append({
'url': href,
'format_id': supplier,
})
elif protocol == 'rtmp':
application = connection.get('application', 'ondemand')
auth_string = connection.get('authString')
identifier = connection.get('identifier')
server = connection.get('server')
formats.append({
'url': '%s://%s/%s?%s' % (protocol, server, application, auth_string),
'play_path': identifier,
'app': '%s?%s' % (application, auth_string),
'page_url': 'http://www.bbc.co.uk',
'player_url': 'http://www.bbc.co.uk/emp/releases/iplayer/revisions/617463_618125_4/617463_618125_4_emp.swf',
'rtmp_live': False,
'ext': 'flv',
'format_id': supplier,
})
return formats
def _extract_items(self, playlist):
return playlist.findall('./{http://bbc.co.uk/2008/emp/playlist}item')
def _extract_medias(self, media_selection):
error = media_selection.find('./{http://bbc.co.uk/2008/mp/mediaselection}error')
if error is not None:
raise BBCCoUkIE.MediaSelectionError(error.get('id'))
return media_selection.findall('./{http://bbc.co.uk/2008/mp/mediaselection}media')
def _extract_connections(self, media):
return media.findall('./{http://bbc.co.uk/2008/mp/mediaselection}connection')
def _extract_video(self, media, programme_id):
formats = []
vbr = int_or_none(media.get('bitrate'))
vcodec = media.get('encoding')
service = media.get('service')
width = int_or_none(media.get('width'))
height = int_or_none(media.get('height'))
file_size = int_or_none(media.get('media_file_size'))
for connection in self._extract_connections(media):
conn_formats = self._extract_connection(connection, programme_id)
for format in conn_formats:
format.update({
'format_id': '%s_%s' % (service, format['format_id']),
'width': width,
'height': height,
'vbr': vbr,
'vcodec': vcodec,
'filesize': file_size,
})
formats.extend(conn_formats)
return formats
def _extract_audio(self, media, programme_id):
formats = []
abr = int_or_none(media.get('bitrate'))
acodec = media.get('encoding')
service = media.get('service')
for connection in self._extract_connections(media):
conn_formats = self._extract_connection(connection, programme_id)
for format in conn_formats:
format.update({
'format_id': '%s_%s' % (service, format['format_id']),
'abr': abr,
'acodec': acodec,
})
formats.extend(conn_formats)
return formats
def _get_subtitles(self, media, programme_id):
subtitles = {}
for connection in self._extract_connections(media):
captions = self._download_xml(connection.get('href'), programme_id, 'Downloading captions')
lang = captions.get('{http://www.w3.org/XML/1998/namespace}lang', 'en')
subtitles[lang] = [
{
'url': connection.get('href'),
'ext': 'ttml',
},
]
return subtitles
def _raise_extractor_error(self, media_selection_error):
raise ExtractorError(
'%s returned error: %s' % (self.IE_NAME, media_selection_error.id),
expected=True)
def _download_media_selector(self, programme_id):
last_exception = None
for mediaselector_url in self._MEDIASELECTOR_URLS:
try:
return self._download_media_selector_url(
mediaselector_url % programme_id, programme_id)
except BBCCoUkIE.MediaSelectionError as e:
if e.id == 'notukerror':
last_exception = e
continue
self._raise_extractor_error(e)
self._raise_extractor_error(last_exception)
def _download_media_selector_url(self, url, programme_id=None):
try:
media_selection = self._download_xml(
url, programme_id, 'Downloading media selection XML')
except ExtractorError as ee:
if isinstance(ee.cause, compat_HTTPError) and ee.cause.code == 403:
media_selection = xml.etree.ElementTree.fromstring(ee.cause.read().decode('utf-8'))
else:
raise
return self._process_media_selector(media_selection, programme_id)
def _process_media_selector(self, media_selection, programme_id):
formats = []
subtitles = None
for media in self._extract_medias(media_selection):
kind = media.get('kind')
if kind == 'audio':
formats.extend(self._extract_audio(media, programme_id))
elif kind == 'video':
formats.extend(self._extract_video(media, programme_id))
elif kind == 'captions':
subtitles = self.extract_subtitles(media, programme_id)
return formats, subtitles
def _download_playlist(self, playlist_id):
try:
playlist = self._download_json(
'http://www.bbc.co.uk/programmes/%s/playlist.json' % playlist_id,
playlist_id, 'Downloading playlist JSON')
version = playlist.get('defaultAvailableVersion')
if version:
smp_config = version['smpConfig']
title = smp_config['title']
description = smp_config['summary']
for item in smp_config['items']:
kind = item['kind']
if kind != 'programme' and kind != 'radioProgramme':
continue
programme_id = item.get('vpid')
duration = int_or_none(item.get('duration'))
formats, subtitles = self._download_media_selector(programme_id)
return programme_id, title, description, duration, formats, subtitles
except ExtractorError as ee:
if not (isinstance(ee.cause, compat_HTTPError) and ee.cause.code == 404):
raise
# fallback to legacy playlist
return self._process_legacy_playlist(playlist_id)
def _process_legacy_playlist_url(self, url, display_id):
playlist = self._download_legacy_playlist_url(url, display_id)
return self._extract_from_legacy_playlist(playlist, display_id)
def _process_legacy_playlist(self, playlist_id):
return self._process_legacy_playlist_url(
'http://www.bbc.co.uk/iplayer/playlist/%s' % playlist_id, playlist_id)
def _download_legacy_playlist_url(self, url, playlist_id=None):
return self._download_xml(
url, playlist_id, 'Downloading legacy playlist XML')
def _extract_from_legacy_playlist(self, playlist, playlist_id):
no_items = playlist.find('./{http://bbc.co.uk/2008/emp/playlist}noItems')
if no_items is not None:
reason = no_items.get('reason')
if reason == 'preAvailability':
msg = 'Episode %s is not yet available' % playlist_id
elif reason == 'postAvailability':
msg = 'Episode %s is no longer available' % playlist_id
elif reason == 'noMedia':
msg = 'Episode %s is not currently available' % playlist_id
else:
msg = 'Episode %s is not available: %s' % (playlist_id, reason)
raise ExtractorError(msg, expected=True)
for item in self._extract_items(playlist):
kind = item.get('kind')
if kind != 'programme' and kind != 'radioProgramme':
continue
title = playlist.find('./{http://bbc.co.uk/2008/emp/playlist}title').text
description = playlist.find('./{http://bbc.co.uk/2008/emp/playlist}summary').text
def get_programme_id(item):
def get_from_attributes(item):
for p in('identifier', 'group'):
value = item.get(p)
if value and re.match(r'^[pb][\da-z]{7}$', value):
return value
mediator = item.find('./{http://bbc.co.uk/2008/emp/playlist}mediator')
# Prefer the item's own attributes; fall back to the mediator element
return get_from_attributes(item) or (
get_from_attributes(mediator) if mediator is not None else None)
programme_id = get_programme_id(item)
duration = int_or_none(item.get('duration'))
# TODO: programme_id can be None and media items can be incorporated right inside
# playlist's item (e.g. http://www.bbc.com/turkce/haberler/2015/06/150615_telabyad_kentin_cogu)
# as f4m and m3u8
formats, subtitles = self._download_media_selector(programme_id)
return programme_id, title, description, duration, formats, subtitles
def _real_extract(self, url):
group_id = self._match_id(url)
webpage = self._download_webpage(url, group_id, 'Downloading video page')
programme_id = None
tviplayer = self._search_regex(
r'mediator\.bind\(({.+?})\s*,\s*document\.getElementById',
webpage, 'player', default=None)
if tviplayer:
player = self._parse_json(tviplayer, group_id).get('player', {})
duration = int_or_none(player.get('duration'))
programme_id = player.get('vpid')
if not programme_id:
programme_id = self._search_regex(
r'"vpid"\s*:\s*"([\da-z]{8})"', webpage, 'vpid', fatal=False, default=None)
if programme_id:
formats, subtitles = self._download_media_selector(programme_id)
title = self._og_search_title(webpage)
description = self._search_regex(
r'<p class="[^"]*medium-description[^"]*">([^<]+)</p>',
webpage, 'description', fatal=False)
else:
programme_id, title, description, duration, formats, subtitles = self._download_playlist(group_id)
self._sort_formats(formats)
return {
'id': programme_id,
'title': title,
'description': description,
'thumbnail': self._og_search_thumbnail(webpage, default=None),
'duration': duration,
'formats': formats,
'subtitles': subtitles,
}
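
The fallback loop in `_download_media_selector` above can be read in isolation. What follows is a minimal standalone sketch, not the extractor's own code: `select_media` and `fetch` are hypothetical names, with `fetch` standing in for `_download_media_selector_url`. Unlike the commit, the sketch re-raises `MediaSelectionError` instead of wrapping it in `ExtractorError`, and the guard for an empty URL list is an addition of the sketch. The two URL templates are the ones `BBCIE` lists in `_MEDIASELECTOR_URLS`.

```python
class MediaSelectionError(Exception):
    """Mirrors BBCCoUkIE.MediaSelectionError: carries the selector's error id."""

    def __init__(self, error_id):
        super(MediaSelectionError, self).__init__(error_id)
        self.id = error_id


def select_media(urls, programme_id, fetch):
    """Try each media selector URL template in order (hypothetical helper).

    `fetch` returns (formats, subtitles) or raises MediaSelectionError.
    """
    last_error = None
    for template in urls:
        try:
            return fetch(template % programme_id)
        except MediaSelectionError as e:
            if e.id == 'notukerror':
                # Geo-restricted on this endpoint: remember it, try the next one.
                last_error = e
                continue
            # Any other error id is treated as final.
            raise
    # Every endpoint answered notukerror, or the URL list was empty
    # (the empty-list guard is an assumption of this sketch).
    raise last_error if last_error is not None else MediaSelectionError('no selector URLs')


URLS = [
    'http://open.live.bbc.co.uk/mediaselector/4/mtis/stream/%s',
    'http://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/journalism-pc/vpid/%s',
]


def fake_fetch(url):
    # Pretend the first endpoint is geo-blocked and the second one works.
    if '/mediaselector/4/' in url:
        raise MediaSelectionError('notukerror')
    return ['fmt from ' + url], None


formats, subtitles = select_media(URLS, 'p02q6gc4', fake_fetch)
```

Ordering the richer endpoint first means it is used whenever it answers, while the more permissive one is only consulted after a `notukerror`.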
class BBCIE(BBCCoUkIE):
IE_NAME = 'bbc'
IE_DESC = 'BBC'
_VALID_URL = r'https?://(?:www\.)?bbc\.(?:com|co\.uk)/(?:[^/]+/)+(?P<id>[^/#?]+)'
_MEDIASELECTOR_URLS = [
# Provides more formats, namely direct mp4 links, but fails on some videos with
# notukerror for non UK (?) users (e.g.
# http://www.bbc.com/travel/story/20150625-sri-lankas-spicy-secret)
'http://open.live.bbc.co.uk/mediaselector/4/mtis/stream/%s',
# Provides fewer formats, but works everywhere for everybody (hopefully)
'http://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/journalism-pc/vpid/%s',
]
_TESTS = [{
# article with multiple videos embedded with data-media-meta containing
# playlist.sxml, externalId and no direct video links
'url': 'http://www.bbc.com/news/world-europe-32668511',
'info_dict': {
'id': 'world-europe-32668511',
'title': 'Russia stages massive WW2 parade despite Western boycott',
'description': 'md5:00ff61976f6081841f759a08bf78cc9c',
},
'playlist_count': 2,
}, {
# article with multiple videos embedded with data-media-meta (more videos)
'url': 'http://www.bbc.com/news/business-28299555',
'info_dict': {
'id': 'business-28299555',
'title': 'Farnborough Airshow: Video highlights',
'description': 'BBC reports and video highlights at the Farnborough Airshow.',
},
'playlist_count': 9,
'skip': 'Save time',
}, {
# article with multiple videos embedded with `new SMP()`
'url': 'http://www.bbc.co.uk/blogs/adamcurtis/entries/3662a707-0af9-3149-963f-47bea720b460',
'info_dict': {
'id': '3662a707-0af9-3149-963f-47bea720b460',
'title': 'BBC Blogs - Adam Curtis - BUGGER',
},
'playlist_count': 18,
}, {
# single video embedded with mediaAssetPage.init()
'url': 'http://www.bbc.com/news/world-europe-32041533',
'info_dict': {
'id': 'p02mprgb',
'ext': 'mp4',
'title': 'Aerial footage showed the site of the crash in the Alps - courtesy BFM TV',
'duration': 47,
'timestamp': 1427219242,
'upload_date': '20150324',
},
'params': {
# rtmp download
'skip_download': True,
}
}, {
# article with single video embedded with data-media-meta containing
# direct video links (for now these are extracted) and playlist.xml (with
# media items as f4m and m3u8 - currently unsupported)
'url': 'http://www.bbc.com/turkce/haberler/2015/06/150615_telabyad_kentin_cogu',
'info_dict': {
'id': '150615_telabyad_kentin_cogu',
'ext': 'mp4',
'title': "YPG: Tel Abyad'ın tamamı kontrolümüzde",
'duration': 47,
'timestamp': 1434397334,
'upload_date': '20150615',
},
'params': {
'skip_download': True,
}
}, {
# single video embedded with mediaAssetPage.init() (regional section)
'url': 'http://www.bbc.com/mundo/video_fotos/2015/06/150619_video_honduras_militares_hospitales_corrupcion_aw',
'info_dict': {
'id': '150619_video_honduras_militares_hospitales_corrupcion_aw',
'ext': 'mp4',
'title': 'Honduras militariza sus hospitales por nuevo escándalo de corrupción',
'duration': 87,
'timestamp': 1434713142,
'upload_date': '20150619',
},
'params': {
'skip_download': True,
}
}, {
# single video from video playlist embedded with vxp-playlist-data JSON
'url': 'http://www.bbc.com/news/video_and_audio/must_see/33376376',
'info_dict': {
'id': 'p02w6qjc',
'ext': 'mp4',
'title': '''Judge Mindy Glazer: "I'm sorry to see you here... I always wondered what happened to you"''',
'duration': 56,
},
'params': {
'skip_download': True,
}
}, {
# single video story with digitalData
'url': 'http://www.bbc.com/travel/story/20150625-sri-lankas-spicy-secret',
'info_dict': {
'id': 'p02q6gc4',
'ext': 'flv',
'title': 'Sri Lanka’s spicy secret',
'description': 'As a new train line to Jaffna opens up the country’s north, travellers can experience a truly distinct slice of Tamil culture.',
'timestamp': 1437674293,
'upload_date': '20150723',
},
'params': {
# rtmp download
'skip_download': True,
}
}, {
# single video story without digitalData
'url': 'http://www.bbc.com/autos/story/20130513-hyundais-rock-star',
'info_dict': {
'id': 'p018zqqg',
'ext': 'mp4',
'title': 'Hyundai Santa Fe Sport: Rock star',
'description': 'md5:b042a26142c4154a6e472933cf20793d',
'timestamp': 1368473503,
'upload_date': '20130513',
},
'params': {
# rtmp download
'skip_download': True,
}
}, {
# single video with playlist.sxml URL
'url': 'http://www.bbc.com/sport/0/football/33653409',
'info_dict': {
'id': 'p02xycnp',
'ext': 'mp4',
'title': 'Transfers: Cristiano Ronaldo to Man Utd, Arsenal to spend?',
'description': 'md5:398fca0e2e701c609d726e034fa1fc89',
'duration': 140,
},
'params': {
# rtmp download
'skip_download': True,
}
}, {
# single video with playlist URL from weather section
'url': 'http://www.bbc.com/weather/features/33601775',
'only_matching': True,
}, {
# custom redirection to www.bbc.com
'url': 'http://www.bbc.co.uk/news/science-environment-33661876',
'only_matching': True,
}]
@classmethod
def suitable(cls, url):
return False if BBCCoUkIE.suitable(url) else super(BBCIE, cls).suitable(url)
def _extract_from_media_meta(self, media_meta, video_id):
# Direct links to media in media metadata (e.g.
# http://www.bbc.com/turkce/haberler/2015/06/150615_telabyad_kentin_cogu)
# TODO: there are also f4m and m3u8 streams incorporated in playlist.sxml
source_files = media_meta.get('sourceFiles')
if source_files:
return [{
'url': f['url'],
'format_id': format_id,
'ext': f.get('encoding'),
'tbr': float_or_none(f.get('bitrate'), 1000),
'filesize': int_or_none(f.get('filesize')),
} for format_id, f in source_files.items() if f.get('url')], []
programme_id = media_meta.get('externalId')
if programme_id:
return self._download_media_selector(programme_id)
# Process playlist.sxml as legacy playlist
href = media_meta.get('href')
if href:
playlist = self._download_legacy_playlist_url(href)
_, _, _, _, formats, subtitles = self._extract_from_legacy_playlist(playlist, video_id)
return formats, subtitles
return [], []
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
timestamp = parse_iso8601(self._search_regex(
[r'"datePublished":\s*"([^"]+)',
r'<meta[^>]+property="article:published_time"[^>]+content="([^"]+)"',
r'itemprop="datePublished"[^>]+datetime="([^"]+)"'],
webpage, 'date', default=None))
# single video with playlist.sxml URL (e.g. http://www.bbc.com/sport/0/football/33653409)
playlist = self._search_regex(
r'<param[^>]+name="playlist"[^>]+value="([^"]+)"',
webpage, 'playlist', default=None)
if playlist:
programme_id, title, description, duration, formats, subtitles = \
self._process_legacy_playlist_url(playlist, playlist_id)
self._sort_formats(formats)
return {
'id': programme_id,
'title': title,
'description': description,
'duration': duration,
'timestamp': timestamp,
'formats': formats,
'subtitles': subtitles,
}
# single video story (e.g. http://www.bbc.com/travel/story/20150625-sri-lankas-spicy-secret)
programme_id = self._search_regex(
[r'data-video-player-vpid="([\da-z]{8})"',
r'<param[^>]+name="externalIdentifier"[^>]+value="([\da-z]{8})"'],
webpage, 'vpid', default=None)
if programme_id:
formats, subtitles = self._download_media_selector(programme_id)
self._sort_formats(formats)
# digitalData may be missing (e.g. http://www.bbc.com/autos/story/20130513-hyundais-rock-star)
digital_data = self._parse_json(
self._search_regex(
r'var\s+digitalData\s*=\s*({.+?});?\n', webpage, 'digital data', default='{}'),
programme_id, fatal=False)
page_info = digital_data.get('page', {}).get('pageInfo', {})
title = page_info.get('pageName') or self._og_search_title(webpage)
description = page_info.get('description') or self._og_search_description(webpage)
timestamp = parse_iso8601(page_info.get('publicationDate')) or timestamp
return {
'id': programme_id,
'title': title,
'description': description,
'timestamp': timestamp,
'formats': formats,
'subtitles': subtitles,
}
playlist_title = self._html_search_regex(
r'<title>(.*?)(?:\s*-\s*BBC [^ ]+)?</title>', webpage, 'playlist title')
playlist_description = self._og_search_description(webpage, default=None)
def extract_all(pattern):
return list(filter(None, map(
lambda s: self._parse_json(s, playlist_id, fatal=False),
re.findall(pattern, webpage))))
# Multiple video article (e.g.
# http://www.bbc.co.uk/blogs/adamcurtis/entries/3662a707-0af9-3149-963f-47bea720b460)
EMBED_URL = r'https?://(?:www\.)?bbc\.co\.uk/(?:[^/]+/)+[\da-z]{8}(?:\b[^"]+)?'
entries = []
for match in extract_all(r'new\s+SMP\(({.+?})\)'):
embed_url = match.get('playerSettings', {}).get('externalEmbedUrl')
if embed_url and re.match(EMBED_URL, embed_url):
entries.append(embed_url)
entries.extend(re.findall(
r'setPlaylist\("(%s)"\)' % EMBED_URL, webpage))
if entries:
return self.playlist_result(
[self.url_result(entry, 'BBCCoUk') for entry in entries],
playlist_id, playlist_title, playlist_description)
# Multiple video article (e.g. http://www.bbc.com/news/world-europe-32668511)
medias = extract_all(r"data-media-meta='({[^']+})'")
if not medias:
# Single video article (e.g. http://www.bbc.com/news/video_and_audio/international)
media_asset = self._search_regex(
r'mediaAssetPage\.init\(\s*({.+?}), "/',
webpage, 'media asset', default=None)
if media_asset:
media_asset_page = self._parse_json(media_asset, playlist_id, fatal=False)
medias = []
for video in media_asset_page.get('videos', {}).values():
medias.extend(video.values())
if not medias:
# Multiple video playlist with single `now playing` entry (e.g.
# http://www.bbc.com/news/video_and_audio/must_see/33767813)
vxp_playlist = self._parse_json(
self._search_regex(
r'<script[^>]+class="vxp-playlist-data"[^>]+type="application/json"[^>]*>([^<]+)</script>',
webpage, 'playlist data'),
playlist_id)
playlist_medias = []
for item in vxp_playlist:
media = item.get('media')
if not media:
continue
playlist_medias.append(media)
# Download a single video if a media item's asset id matches the video id from the URL
if item.get('advert', {}).get('assetId') == playlist_id:
medias = [media]
break
# Fallback to the whole playlist
if not medias:
medias = playlist_medias
entries = []
for num, media_meta in enumerate(medias, start=1):
formats, subtitles = self._extract_from_media_meta(media_meta, playlist_id)
if not formats:
continue
self._sort_formats(formats)
video_id = media_meta.get('externalId')
if not video_id:
video_id = playlist_id if len(medias) == 1 else '%s-%s' % (playlist_id, num)
title = media_meta.get('caption')
if not title:
title = playlist_title if len(medias) == 1 else '%s - Video %s' % (playlist_title, num)
duration = int_or_none(media_meta.get('durationInSeconds')) or parse_duration(media_meta.get('duration'))
images = []
for image in media_meta.get('images', {}).values():
images.extend(image.values())
if 'image' in media_meta:
images.append(media_meta['image'])
thumbnails = [{
'url': image.get('href'),
'width': int_or_none(image.get('width')),
'height': int_or_none(image.get('height')),
} for image in images]
entries.append({
'id': video_id,
'title': title,
'thumbnails': thumbnails,
'duration': duration,
'timestamp': timestamp,
'formats': formats,
'subtitles': subtitles,
})
return self.playlist_result(entries, playlist_id, playlist_title, playlist_description)
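
The `suitable` override in `BBCIE` matters because the generic `_VALID_URL` also matches most iPlayer and programme URLs. The sketch below illustrates that precedence with plain `re.match` calls; `pick_extractor` is a hypothetical helper, not part of youtube-dl, and the two patterns are copied from the extractors in this diff.

```python
import re

# Patterns copied from BBCCoUkIE._VALID_URL and BBCIE._VALID_URL in this diff.
BBC_CO_UK_VALID_URL = (
    r'https?://(?:www\.)?bbc\.co\.uk/'
    r'(?:(?:(?:programmes|iplayer(?:/[^/]+)?/(?:episode|playlist))/)|music/clips[/#])'
    r'(?P<id>[\da-z]{8})')
BBC_VALID_URL = r'https?://(?:www\.)?bbc\.(?:com|co\.uk)/(?:[^/]+/)+(?P<id>[^/#?]+)'


def pick_extractor(url):
    """Return the IE_NAME that would claim `url`, or None (hypothetical helper)."""
    # The specific extractor is asked first, as BBCIE.suitable() does.
    if re.match(BBC_CO_UK_VALID_URL, url):
        return 'bbc.co.uk'
    if re.match(BBC_VALID_URL, url):
        return 'bbc'
    return None


iplayer_url = 'http://www.bbc.co.uk/iplayer/episode/b03vhd1f/The_Voice_UK_Series_3_Blind_Auditions_5/'
news_url = 'http://www.bbc.co.uk/news/science-environment-33661876'

# Both patterns match the iPlayer URL, so the order of the checks decides.
both_match = bool(re.match(BBC_CO_UK_VALID_URL, iplayer_url)) and bool(re.match(BBC_VALID_URL, iplayer_url))
iplayer_owner = pick_extractor(iplayer_url)
news_owner = pick_extractor(news_url)
```

Without the check against the specific pattern, which extractor handles an iPlayer URL would depend on registration order alone.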


@ -1,379 +0,0 @@
from __future__ import unicode_literals
import xml.etree.ElementTree
from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none,
)
from ..compat import compat_HTTPError
class BBCCoUkIE(InfoExtractor):
IE_NAME = 'bbc.co.uk'
IE_DESC = 'BBC iPlayer'
_VALID_URL = r'https?://(?:www\.)?bbc\.co\.uk/(?:(?:(?:programmes|iplayer(?:/[^/]+)?/(?:episode|playlist))/)|music/clips[/#])(?P<id>[\da-z]{8})'
_TESTS = [
{
'url': 'http://www.bbc.co.uk/programmes/b039g8p7',
'info_dict': {
'id': 'b039d07m',
'ext': 'flv',
'title': 'Kaleidoscope, Leonard Cohen',
'description': 'The Canadian poet and songwriter reflects on his musical career.',
'duration': 1740,
},
'params': {
# rtmp download
'skip_download': True,
}
},
{
'url': 'http://www.bbc.co.uk/iplayer/episode/b00yng5w/The_Man_in_Black_Series_3_The_Printed_Name/',
'info_dict': {
'id': 'b00yng1d',
'ext': 'flv',
'title': 'The Man in Black: Series 3: The Printed Name',
'description': "Mark Gatiss introduces Nicholas Pierpan's chilling tale of a writer's devilish pact with a mysterious man. Stars Ewan Bailey.",
'duration': 1800,
},
'params': {
# rtmp download
'skip_download': True,
},
'skip': 'Episode is no longer available on BBC iPlayer Radio',
},
{
'url': 'http://www.bbc.co.uk/iplayer/episode/b03vhd1f/The_Voice_UK_Series_3_Blind_Auditions_5/',
'info_dict': {
'id': 'b00yng1d',
'ext': 'flv',
'title': 'The Voice UK: Series 3: Blind Auditions 5',
'description': "Emma Willis and Marvin Humes present the fifth set of blind auditions in the singing competition, as the coaches continue to build their teams based on voice alone.",
'duration': 5100,
},
'params': {
# rtmp download
'skip_download': True,
},
'skip': 'Currently BBC iPlayer TV programmes are available to play in the UK only',
},
{
'url': 'http://www.bbc.co.uk/iplayer/episode/p026c7jt/tomorrows-worlds-the-unearthly-history-of-science-fiction-2-invasion',
'info_dict': {
'id': 'b03k3pb7',
'ext': 'flv',
'title': "Tomorrow's Worlds: The Unearthly History of Science Fiction",
'description': '2. Invasion',
'duration': 3600,
},
'params': {
# rtmp download
'skip_download': True,
},
'skip': 'Currently BBC iPlayer TV programmes are available to play in the UK only',
}, {
'url': 'http://www.bbc.co.uk/programmes/b04v20dw',
'info_dict': {
'id': 'b04v209v',
'ext': 'flv',
'title': 'Pete Tong, The Essential New Tune Special',
'description': "Pete has a very special mix - all of 2014's Essential New Tunes!",
'duration': 10800,
},
'params': {
# rtmp download
'skip_download': True,
}
}, {
'url': 'http://www.bbc.co.uk/music/clips/p02frcc3',
'note': 'Audio',
'info_dict': {
'id': 'p02frcch',
'ext': 'flv',
'title': 'Pete Tong, Past, Present and Future Special, Madeon - After Hours mix',
'description': 'French house superstar Madeon takes us out of the club and onto the after party.',
'duration': 3507,
},
'params': {
# rtmp download
'skip_download': True,
}
}, {
'url': 'http://www.bbc.co.uk/music/clips/p025c0zz',
'note': 'Video',
'info_dict': {
'id': 'p025c103',
'ext': 'flv',
'title': 'Reading and Leeds Festival, 2014, Rae Morris - Closer (Live on BBC Three)',
'description': 'Rae Morris performs Closer for BBC Three at Reading 2014',
'duration': 226,
},
'params': {
# rtmp download
'skip_download': True,
}
}, {
'url': 'http://www.bbc.co.uk/iplayer/episode/b054fn09/ad/natural-world-20152016-2-super-powered-owls',
'info_dict': {
'id': 'p02n76xf',
'ext': 'flv',
'title': 'Natural World, 2015-2016: 2. Super Powered Owls',
'description': 'md5:e4db5c937d0e95a7c6b5e654d429183d',
'duration': 3540,
},
'params': {
# rtmp download
'skip_download': True,
},
'skip': 'geolocation',
}, {
'url': 'http://www.bbc.co.uk/iplayer/episode/b05zmgwn/royal-academy-summer-exhibition',
'info_dict': {
'id': 'b05zmgw1',
'ext': 'flv',
'description': 'Kirsty Wark and Morgan Quaintance visit the Royal Academy as it prepares for its annual artistic extravaganza, meeting people who have come together to make the show unique.',
'title': 'Royal Academy Summer Exhibition',
'duration': 3540,
},
'params': {
# rtmp download
'skip_download': True,
},
'skip': 'geolocation',
}, {
'url': 'http://www.bbc.co.uk/iplayer/playlist/p01dvks4',
'only_matching': True,
}, {
'url': 'http://www.bbc.co.uk/music/clips#p02frcc3',
'only_matching': True,
}, {
'url': 'http://www.bbc.co.uk/iplayer/cbeebies/episode/b0480276/bing-14-atchoo',
'only_matching': True,
}
]
def _extract_asx_playlist(self, connection, programme_id):
asx = self._download_xml(connection.get('href'), programme_id, 'Downloading ASX playlist')
return [ref.get('href') for ref in asx.findall('./Entry/ref')]
def _extract_connection(self, connection, programme_id):
formats = []
protocol = connection.get('protocol')
supplier = connection.get('supplier')
if protocol == 'http':
href = connection.get('href')
# ASX playlist
if supplier == 'asx':
for i, ref in enumerate(self._extract_asx_playlist(connection, programme_id)):
formats.append({
'url': ref,
'format_id': 'ref%s_%s' % (i, supplier),
})
# Direct link
else:
formats.append({
'url': href,
'format_id': supplier,
})
elif protocol == 'rtmp':
application = connection.get('application', 'ondemand')
auth_string = connection.get('authString')
identifier = connection.get('identifier')
server = connection.get('server')
formats.append({
'url': '%s://%s/%s?%s' % (protocol, server, application, auth_string),
'play_path': identifier,
'app': '%s?%s' % (application, auth_string),
'page_url': 'http://www.bbc.co.uk',
'player_url': 'http://www.bbc.co.uk/emp/releases/iplayer/revisions/617463_618125_4/617463_618125_4_emp.swf',
'rtmp_live': False,
'ext': 'flv',
'format_id': supplier,
})
return formats
def _extract_items(self, playlist):
return playlist.findall('./{http://bbc.co.uk/2008/emp/playlist}item')
def _extract_medias(self, media_selection):
error = media_selection.find('./{http://bbc.co.uk/2008/mp/mediaselection}error')
if error is not None:
raise ExtractorError(
'%s returned error: %s' % (self.IE_NAME, error.get('id')), expected=True)
return media_selection.findall('./{http://bbc.co.uk/2008/mp/mediaselection}media')
def _extract_connections(self, media):
return media.findall('./{http://bbc.co.uk/2008/mp/mediaselection}connection')
def _extract_video(self, media, programme_id):
formats = []
vbr = int(media.get('bitrate'))
vcodec = media.get('encoding')
service = media.get('service')
width = int(media.get('width'))
height = int(media.get('height'))
file_size = int(media.get('media_file_size'))
for connection in self._extract_connections(media):
conn_formats = self._extract_connection(connection, programme_id)
for format in conn_formats:
format.update({
'format_id': '%s_%s' % (service, format['format_id']),
'width': width,
'height': height,
'vbr': vbr,
'vcodec': vcodec,
'filesize': file_size,
})
formats.extend(conn_formats)
return formats
def _extract_audio(self, media, programme_id):
formats = []
abr = int(media.get('bitrate'))
acodec = media.get('encoding')
service = media.get('service')
for connection in self._extract_connections(media):
conn_formats = self._extract_connection(connection, programme_id)
for format in conn_formats:
format.update({
'format_id': '%s_%s' % (service, format['format_id']),
'abr': abr,
'acodec': acodec,
})
formats.extend(conn_formats)
return formats
def _get_subtitles(self, media, programme_id):
subtitles = {}
for connection in self._extract_connections(media):
captions = self._download_xml(connection.get('href'), programme_id, 'Downloading captions')
lang = captions.get('{http://www.w3.org/XML/1998/namespace}lang', 'en')
subtitles[lang] = [
{
'url': connection.get('href'),
'ext': 'ttml',
},
]
return subtitles

    def _download_media_selector(self, programme_id):
        try:
            media_selection = self._download_xml(
                'http://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/pc/vpid/%s' % programme_id,
                programme_id, 'Downloading media selection XML')
        except ExtractorError as ee:
            if isinstance(ee.cause, compat_HTTPError) and ee.cause.code == 403:
                media_selection = xml.etree.ElementTree.fromstring(ee.cause.read().decode('utf-8'))
            else:
                raise

        formats = []
        subtitles = None

        for media in self._extract_medias(media_selection):
            kind = media.get('kind')
            if kind == 'audio':
                formats.extend(self._extract_audio(media, programme_id))
            elif kind == 'video':
                formats.extend(self._extract_video(media, programme_id))
            elif kind == 'captions':
                subtitles = self.extract_subtitles(media, programme_id)

        return formats, subtitles

    def _download_playlist(self, playlist_id):
        try:
            playlist = self._download_json(
                'http://www.bbc.co.uk/programmes/%s/playlist.json' % playlist_id,
                playlist_id, 'Downloading playlist JSON')

            version = playlist.get('defaultAvailableVersion')
            if version:
                smp_config = version['smpConfig']
                title = smp_config['title']
                description = smp_config['summary']
                for item in smp_config['items']:
                    kind = item['kind']
                    if kind != 'programme' and kind != 'radioProgramme':
                        continue
                    programme_id = item.get('vpid')
                    duration = int(item.get('duration'))
                    formats, subtitles = self._download_media_selector(programme_id)
                return programme_id, title, description, duration, formats, subtitles
        except ExtractorError as ee:
            if not (isinstance(ee.cause, compat_HTTPError) and ee.cause.code == 404):
                raise

        # fallback to legacy playlist
        playlist = self._download_xml(
            'http://www.bbc.co.uk/iplayer/playlist/%s' % playlist_id,
            playlist_id, 'Downloading legacy playlist XML')

        no_items = playlist.find('./{http://bbc.co.uk/2008/emp/playlist}noItems')
        if no_items is not None:
            reason = no_items.get('reason')
            if reason == 'preAvailability':
                msg = 'Episode %s is not yet available' % playlist_id
            elif reason == 'postAvailability':
                msg = 'Episode %s is no longer available' % playlist_id
            elif reason == 'noMedia':
                msg = 'Episode %s is not currently available' % playlist_id
            else:
                msg = 'Episode %s is not available: %s' % (playlist_id, reason)
            raise ExtractorError(msg, expected=True)

        for item in self._extract_items(playlist):
            kind = item.get('kind')
            if kind != 'programme' and kind != 'radioProgramme':
                continue
            title = playlist.find('./{http://bbc.co.uk/2008/emp/playlist}title').text
            description = playlist.find('./{http://bbc.co.uk/2008/emp/playlist}summary').text

            programme_id = item.get('identifier')
            duration = int(item.get('duration'))
            formats, subtitles = self._download_media_selector(programme_id)

        return programme_id, title, description, duration, formats, subtitles

    def _real_extract(self, url):
        group_id = self._match_id(url)

        webpage = self._download_webpage(url, group_id, 'Downloading video page')

        programme_id = None
        # stays None when the vpid is only found by the fallback regex below
        duration = None

        tviplayer = self._search_regex(
            r'mediator\.bind\(({.+?})\s*,\s*document\.getElementById',
            webpage, 'player', default=None)

        if tviplayer:
            player = self._parse_json(tviplayer, group_id).get('player', {})
            duration = int_or_none(player.get('duration'))
            programme_id = player.get('vpid')

        if not programme_id:
            programme_id = self._search_regex(
                r'"vpid"\s*:\s*"([\da-z]{8})"', webpage, 'vpid', fatal=False, default=None)

        if programme_id:
            formats, subtitles = self._download_media_selector(programme_id)
            title = self._og_search_title(webpage)
            description = self._search_regex(
                r'<p class="[^"]*medium-description[^"]*">([^<]+)</p>',
                webpage, 'description', fatal=False)
        else:
            programme_id, title, description, duration, formats, subtitles = self._download_playlist(group_id)

        self._sort_formats(formats)

        return {
            'id': programme_id,
            'title': title,
            'description': description,
            'thumbnail': self._og_search_thumbnail(webpage, default=None),
            'duration': duration,
            'formats': formats,
            'subtitles': subtitles,
        }
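
The `_get_subtitles` method above takes the caption language from the `xml:lang` attribute on the root of the downloaded captions document and falls back to `en`. A minimal standalone sketch of that lookup follows; the `caption_language` helper and both TTML snippets are made up for illustration and are not part of the extractor.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the reserved "xml:" prefix under this fixed namespace URI.
XML_LANG = '{http://www.w3.org/XML/1998/namespace}lang'


def caption_language(ttml_text, default='en'):
    """Return the xml:lang of the root element, or `default` if it is absent."""
    root = ET.fromstring(ttml_text)
    return root.get(XML_LANG, default)


# Hypothetical caption documents, one tagged and one untagged.
french = '<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="fr"><body/></tt>'
untagged = '<tt xmlns="http://www.w3.org/ns/ttml"><body/></tt>'

print(caption_language(french))
print(caption_language(untagged))
```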


@ -1,7 +1,7 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urllib_parse from ..compat import compat_urllib_parse_unquote
from ..utils import ( from ..utils import (
xpath_text, xpath_text,
xpath_with_ns, xpath_with_ns,
@ -57,7 +57,7 @@ class BetIE(InfoExtractor):
display_id = self._match_id(url) display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
media_url = compat_urllib_parse.unquote(self._search_regex( media_url = compat_urllib_parse_unquote(self._search_regex(
[r'mediaURL\s*:\s*"([^"]+)"', r"var\s+mrssMediaUrl\s*=\s*'([^']+)'"], [r'mediaURL\s*:\s*"([^"]+)"', r"var\s+mrssMediaUrl\s*=\s*'([^']+)'"],
webpage, 'media URL')) webpage, 'media URL'))


@ -41,8 +41,15 @@ class BiliBiliIE(InfoExtractor):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
if self._search_regex(r'(此视频不存在或被删除)', webpage, 'error message', default=None): if '(此视频不存在或被删除)' in webpage:
raise ExtractorError('The video does not exist or was deleted', expected=True) raise ExtractorError(
'The video does not exist or was deleted', expected=True)
if '>你没有权限浏览! 由于版权相关问题 我们不对您所在的地区提供服务<' in webpage:
raise ExtractorError(
'The video is not available in your region due to copyright reasons',
expected=True)
video_code = self._search_regex( video_code = self._search_regex(
r'(?s)<div itemprop="video".*?>(.*?)</div>', webpage, 'video code') r'(?s)<div itemprop="video".*?>(.*?)</div>', webpage, 'video code')


@ -5,7 +5,6 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_str,
compat_urllib_request, compat_urllib_request,
compat_urlparse, compat_urlparse,
) )
@ -14,6 +13,8 @@ from ..utils import (
int_or_none, int_or_none,
parse_iso8601, parse_iso8601,
unescapeHTML, unescapeHTML,
xpath_text,
xpath_with_ns,
) )
@ -23,10 +24,10 @@ class BlipTVIE(InfoExtractor):
_TESTS = [ _TESTS = [
{ {
'url': 'http://blip.tv/cbr/cbr-exclusive-gotham-city-imposters-bats-vs-jokerz-short-3-5796352', 'url': 'http://blip.tv/cbr/cbr-exclusive-gotham-city-imposters-bats-vs-jokerz-short-3-5796352',
'md5': 'c6934ad0b6acf2bd920720ec888eb812', 'md5': '80baf1ec5c3d2019037c1c707d676b9f',
'info_dict': { 'info_dict': {
'id': '5779306', 'id': '5779306',
'ext': 'mov', 'ext': 'm4v',
'title': 'CBR EXCLUSIVE: "Gotham City Imposters" Bats VS Jokerz Short 3', 'title': 'CBR EXCLUSIVE: "Gotham City Imposters" Bats VS Jokerz Short 3',
'description': 'md5:9bc31f227219cde65e47eeec8d2dc596', 'description': 'md5:9bc31f227219cde65e47eeec8d2dc596',
'timestamp': 1323138843, 'timestamp': 1323138843,
@ -100,6 +101,20 @@ class BlipTVIE(InfoExtractor):
'vcodec': 'none', 'vcodec': 'none',
} }
}, },
{
# missing duration
'url': 'http://blip.tv/rss/flash/6700880',
'info_dict': {
'id': '6684191',
'ext': 'm4v',
'title': 'Cowboy Bebop: Gateway Shuffle Review',
'description': 'md5:3acc480c0f9ae157f5fe88547ecaf3f8',
'timestamp': 1386639757,
'upload_date': '20131210',
'uploader': 'sfdebris',
'uploader_id': '706520',
}
}
] ]
@staticmethod @staticmethod
@ -128,35 +143,34 @@ class BlipTVIE(InfoExtractor):
rss = self._download_xml('http://blip.tv/rss/flash/%s' % video_id, video_id, 'Downloading video RSS') rss = self._download_xml('http://blip.tv/rss/flash/%s' % video_id, video_id, 'Downloading video RSS')
def blip(s): def _x(p):
return '{http://blip.tv/dtd/blip/1.0}%s' % s return xpath_with_ns(p, {
'blip': 'http://blip.tv/dtd/blip/1.0',
def media(s): 'media': 'http://search.yahoo.com/mrss/',
return '{http://search.yahoo.com/mrss/}%s' % s 'itunes': 'http://www.itunes.com/dtds/podcast-1.0.dtd',
})
def itunes(s):
return '{http://www.itunes.com/dtds/podcast-1.0.dtd}%s' % s
item = rss.find('channel/item') item = rss.find('channel/item')
video_id = item.find(blip('item_id')).text video_id = xpath_text(item, _x('blip:item_id'), 'video id') or lookup_id
title = item.find('./title').text title = xpath_text(item, 'title', 'title', fatal=True)
description = clean_html(compat_str(item.find(blip('puredescription')).text)) description = clean_html(xpath_text(item, _x('blip:puredescription'), 'description'))
timestamp = parse_iso8601(item.find(blip('datestamp')).text) timestamp = parse_iso8601(xpath_text(item, _x('blip:datestamp'), 'timestamp'))
uploader = item.find(blip('user')).text uploader = xpath_text(item, _x('blip:user'), 'uploader')
uploader_id = item.find(blip('userid')).text uploader_id = xpath_text(item, _x('blip:userid'), 'uploader id')
duration = int(item.find(blip('runtime')).text) duration = int_or_none(xpath_text(item, _x('blip:runtime'), 'duration'))
media_thumbnail = item.find(media('thumbnail')) media_thumbnail = item.find(_x('media:thumbnail'))
thumbnail = media_thumbnail.get('url') if media_thumbnail is not None else item.find(itunes('image')).text thumbnail = (media_thumbnail.get('url') if media_thumbnail is not None
categories = [category.text for category in item.findall('category')] else xpath_text(item, 'image', 'thumbnail'))
categories = [category.text for category in item.findall('category') if category is not None]
formats = [] formats = []
subtitles_urls = {} subtitles_urls = {}
media_group = item.find(media('group')) media_group = item.find(_x('media:group'))
for media_content in media_group.findall(media('content')): for media_content in media_group.findall(_x('media:content')):
url = media_content.get('url') url = media_content.get('url')
role = media_content.get(blip('role')) role = media_content.get(_x('blip:role'))
msg = self._download_webpage( msg = self._download_webpage(
url + '?showplayer=20140425131715&referrer=http://blip.tv&mask=7&skin=flashvars&view=url', url + '?showplayer=20140425131715&referrer=http://blip.tv&mask=7&skin=flashvars&view=url',
video_id, 'Resolving URL for %s' % role) video_id, 'Resolving URL for %s' % role)
@ -175,8 +189,8 @@ class BlipTVIE(InfoExtractor):
'url': real_url, 'url': real_url,
'format_id': role, 'format_id': role,
'format_note': media_type, 'format_note': media_type,
'vcodec': media_content.get(blip('vcodec')) or 'none', 'vcodec': media_content.get(_x('blip:vcodec')) or 'none',
'acodec': media_content.get(blip('acodec')), 'acodec': media_content.get(_x('blip:acodec')),
'filesize': media_content.get('filesize'), 'filesize': media_content.get('filesize'),
'width': int_or_none(media_content.get('width')), 'width': int_or_none(media_content.get('width')),
'height': int_or_none(media_content.get('height')), 'height': int_or_none(media_content.get('height')),


@ -106,15 +106,11 @@ class CanalplusIE(InfoExtractor):
continue continue
format_id = fmt.tag format_id = fmt.tag
if format_id == 'HLS': if format_id == 'HLS':
hls_formats = self._extract_m3u8_formats(format_url, video_id, 'flv') formats.extend(self._extract_m3u8_formats(
for fmt in hls_formats: format_url, video_id, 'mp4', preference=preference(format_id)))
fmt['preference'] = preference(format_id)
formats.extend(hls_formats)
elif format_id == 'HDS': elif format_id == 'HDS':
hds_formats = self._extract_f4m_formats(format_url + '?hdcore=2.11.3', video_id) formats.extend(self._extract_f4m_formats(
for fmt in hds_formats: format_url + '?hdcore=2.11.3', video_id, preference=preference(format_id)))
fmt['preference'] = preference(format_id)
formats.extend(hds_formats)
else: else:
formats.append({ formats.append({
'url': format_url, 'url': format_url,


@ -7,6 +7,7 @@ from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_urllib_request, compat_urllib_request,
compat_urllib_parse, compat_urllib_parse,
compat_urllib_parse_unquote,
compat_urllib_parse_urlparse, compat_urllib_parse_urlparse,
) )
from ..utils import ( from ..utils import (
@ -88,7 +89,7 @@ class CeskaTelevizeIE(InfoExtractor):
if playlist_url == 'error_region': if playlist_url == 'error_region':
raise ExtractorError(NOT_AVAILABLE_STRING, expected=True) raise ExtractorError(NOT_AVAILABLE_STRING, expected=True)
req = compat_urllib_request.Request(compat_urllib_parse.unquote(playlist_url)) req = compat_urllib_request.Request(compat_urllib_parse_unquote(playlist_url))
req.add_header('Referer', url) req.add_header('Referer', url)
playlist = self._download_json(req, video_id) playlist = self._download_json(req, video_id)


@ -1,53 +1,68 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re import re
import time
import xml.etree.ElementTree
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
ExtractorError, determine_ext,
parse_duration, int_or_none,
js_to_json,
parse_iso8601,
remove_end,
) )
class ClipfishIE(InfoExtractor): class ClipfishIE(InfoExtractor):
IE_NAME = 'clipfish' _VALID_URL = r'https?://(?:www\.)?clipfish\.de/(?:[^/]+/)+video/(?P<id>[0-9]+)'
_VALID_URL = r'^https?://(?:www\.)?clipfish\.de/.*?/video/(?P<id>[0-9]+)/'
_TEST = { _TEST = {
'url': 'http://www.clipfish.de/special/game-trailer/video/3966754/fifa-14-e3-2013-trailer/', 'url': 'http://www.clipfish.de/special/game-trailer/video/3966754/fifa-14-e3-2013-trailer/',
'md5': '2521cd644e862936cf2e698206e47385', 'md5': '79bc922f3e8a9097b3d68a93780fd475',
'info_dict': { 'info_dict': {
'id': '3966754', 'id': '3966754',
'ext': 'mp4', 'ext': 'mp4',
'title': 'FIFA 14 - E3 2013 Trailer', 'title': 'FIFA 14 - E3 2013 Trailer',
'timestamp': 1370938118,
'upload_date': '20130611',
'duration': 82, 'duration': 82,
}, }
'skip': 'Blocked in the US'
} }
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) video_id = self._match_id(url)
video_id = mobj.group(1)
info_url = ('http://www.clipfish.de/devxml/videoinfo/%s?ts=%d' % webpage = self._download_webpage(url, video_id)
(video_id, int(time.time())))
doc = self._download_xml( video_info = self._parse_json(
info_url, video_id, note='Downloading info page') js_to_json(self._html_search_regex(
title = doc.find('title').text r'(?s)videoObject\s*=\s*({.+?});', webpage, 'video object')),
video_url = doc.find('filename').text video_id)
if video_url is None:
xml_bytes = xml.etree.ElementTree.tostring(doc) formats = []
raise ExtractorError('Cannot find video URL in document %r' % for video_url in re.findall(r'var\s+videourl\s*=\s*"([^"]+)"', webpage):
xml_bytes) ext = determine_ext(video_url)
thumbnail = doc.find('imageurl').text if ext == 'm3u8':
duration = parse_duration(doc.find('duration').text) formats.append({
'url': video_url.replace('de.hls.fra.clipfish.de', 'hls.fra.clipfish.de'),
'ext': 'mp4',
'format_id': 'hls',
})
else:
formats.append({
'url': video_url,
'format_id': ext,
})
self._sort_formats(formats)
title = remove_end(self._og_search_title(webpage), ' - Video')
thumbnail = self._og_search_thumbnail(webpage)
duration = int_or_none(video_info.get('length'))
timestamp = parse_iso8601(self._html_search_meta('uploadDate', webpage, 'upload date'))
return { return {
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'url': video_url, 'formats': formats,
'thumbnail': thumbnail, 'thumbnail': thumbnail,
'duration': duration, 'duration': duration,
'timestamp': timestamp,
} }
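
The rewritten Clipfish extractor builds one format entry per `videourl` found in the page and labels HLS playlists separately from direct files. A simplified sketch of that branching follows, assuming a hypothetical `guess_ext` helper in place of `determine_ext` and leaving out the host rewrite that the diff applies to HLS URLs.

```python
import os.path
from urllib.parse import urlparse


def guess_ext(url):
    """Hypothetical helper: extension of the URL path, without the dot."""
    return os.path.splitext(urlparse(url).path)[1].lstrip('.')


def build_formats(video_urls):
    """Map each URL to a format dict; HLS playlists are labelled as mp4 streams."""
    formats = []
    for video_url in video_urls:
        ext = guess_ext(video_url)
        if ext == 'm3u8':
            formats.append({'url': video_url, 'ext': 'mp4', 'format_id': 'hls'})
        else:
            formats.append({'url': video_url, 'format_id': ext})
    return formats


# Placeholder URLs, not real Clipfish endpoints.
formats = build_formats([
    'http://example.com/v/playlist.m3u8?token=1',
    'http://example.com/v/clip.mp4',
])
print([f['format_id'] for f in formats])
```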


@ -1,7 +1,5 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
find_xpath_attr, find_xpath_attr,
@ -10,9 +8,9 @@ from ..utils import (
class ClipsyndicateIE(InfoExtractor): class ClipsyndicateIE(InfoExtractor):
_VALID_URL = r'http://www\.clipsyndicate\.com/video/play(list/\d+)?/(?P<id>\d+)' _VALID_URL = r'http://(?:chic|www)\.clipsyndicate\.com/video/play(list/\d+)?/(?P<id>\d+)'
_TEST = { _TESTS = [{
'url': 'http://www.clipsyndicate.com/video/play/4629301/brick_briscoe', 'url': 'http://www.clipsyndicate.com/video/play/4629301/brick_briscoe',
'md5': '4d7d549451bad625e0ff3d7bd56d776c', 'md5': '4d7d549451bad625e0ff3d7bd56d776c',
'info_dict': { 'info_dict': {
@ -22,11 +20,13 @@ class ClipsyndicateIE(InfoExtractor):
'duration': 612, 'duration': 612,
'thumbnail': 're:^https?://.+\.jpg', 'thumbnail': 're:^https?://.+\.jpg',
}, },
} }, {
'url': 'http://chic.clipsyndicate.com/video/play/5844117/shark_attack',
'only_matching': True,
}]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) video_id = self._match_id(url)
video_id = mobj.group('id')
js_player = self._download_webpage( js_player = self._download_webpage(
'http://eplayer.clipsyndicate.com/embed/player.js?va_id=%s' % video_id, 'http://eplayer.clipsyndicate.com/embed/player.js?va_id=%s' % video_id,
video_id, 'Downloading player') video_id, 'Downloading player')


@ -36,7 +36,7 @@ class ComCarCoffIE(InfoExtractor):
webpage, 'full data json')) webpage, 'full data json'))
video_id = full_data['activeVideo']['video'] video_id = full_data['activeVideo']['video']
video_data = full_data['videos'][video_id] video_data = full_data.get('videos', {}).get(video_id) or full_data['singleshots'][video_id]
thumbnails = [{ thumbnails = [{
'url': video_data['images']['thumb'], 'url': video_data['images']['thumb'],
}, { }, {
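
The ComCarCoff change looks the active video up under `videos` first and falls back to `singleshots`. A small sketch of that lookup on plain dictionaries follows; `pick_video_data` and the sample payload are illustrative only.

```python
def pick_video_data(full_data, video_id):
    """Prefer full_data['videos'][video_id]; otherwise use 'singleshots'."""
    return full_data.get('videos', {}).get(video_id) or full_data['singleshots'][video_id]


# Invented payload: one regular video and one single shot.
full_data = {
    'videos': {'a1': {'title': 'Episode'}},
    'singleshots': {'s9': {'title': 'Single Shot'}},
}

print(pick_video_data(full_data, 'a1')['title'])
print(pick_video_data(full_data, 's9')['title'])
```

An ID present in neither mapping still raises `KeyError`, so a genuinely unknown video is not silently ignored.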


@ -14,10 +14,13 @@ import xml.etree.ElementTree
from ..compat import ( from ..compat import (
compat_cookiejar, compat_cookiejar,
compat_cookies,
compat_HTTPError, compat_HTTPError,
compat_http_client, compat_http_client,
compat_urllib_error, compat_urllib_error,
compat_urllib_parse,
compat_urllib_parse_urlparse, compat_urllib_parse_urlparse,
compat_urllib_request,
compat_urlparse, compat_urlparse,
compat_str, compat_str,
) )
@ -27,12 +30,15 @@ from ..utils import (
bug_reports_message, bug_reports_message,
clean_html, clean_html,
compiled_regex_type, compiled_regex_type,
determine_ext,
ExtractorError, ExtractorError,
fix_xml_ampersands,
float_or_none, float_or_none,
int_or_none, int_or_none,
RegexNotFoundError, RegexNotFoundError,
sanitize_filename, sanitize_filename,
unescapeHTML, unescapeHTML,
url_basename,
) )
@ -63,7 +69,7 @@ class InfoExtractor(object):
Potential fields: Potential fields:
* url Mandatory. The URL of the video file * url Mandatory. The URL of the video file
* ext Will be calculated from url if missing * ext Will be calculated from URL if missing
* format A human-readable description of the format * format A human-readable description of the format
("mp4 container with h264/opus"). ("mp4 container with h264/opus").
Calculated from the format_id, width, height. Calculated from the format_id, width, height.
@ -153,7 +159,7 @@ class InfoExtractor(object):
lower to higher preference, each element is a dictionary lower to higher preference, each element is a dictionary
with the "ext" entry and one of: with the "ext" entry and one of:
* "data": The subtitles file contents * "data": The subtitles file contents
* "url": A url pointing to the subtitles file * "url": A URL pointing to the subtitles file
automatic_captions: Like 'subtitles', used by the YoutubeIE for automatic_captions: Like 'subtitles', used by the YoutubeIE for
automatically generated captions automatically generated captions
duration: Length of the video in seconds, as an integer. duration: Length of the video in seconds, as an integer.
@ -174,13 +180,18 @@ class InfoExtractor(object):
Set to "root" to indicate that this is a Set to "root" to indicate that this is a
comment to the original video. comment to the original video.
age_limit: Age restriction for the video, as an integer (years) age_limit: Age restriction for the video, as an integer (years)
webpage_url: The url to the video webpage, if given to youtube-dl it webpage_url: The URL to the video webpage, if given to youtube-dl it
should allow to get the same result again. (It will be set should allow to get the same result again. (It will be set
by YoutubeDL if it's missing) by YoutubeDL if it's missing)
categories: A list of categories that the video falls in, for example categories: A list of categories that the video falls in, for example
["Sports", "Berlin"] ["Sports", "Berlin"]
tags: A list of tags assigned to the video, e.g. ["sweden", "pop music"]
is_live: True, False, or None (=unknown). Whether this video is a is_live: True, False, or None (=unknown). Whether this video is a
live stream that goes on instead of a fixed-length video. live stream that goes on instead of a fixed-length video.
start_time: Time in seconds where the reproduction should start, as
specified in the URL.
end_time: Time in seconds where the reproduction should end, as
specified in the URL.
Unless mentioned otherwise, the fields should be Unicode strings. Unless mentioned otherwise, the fields should be Unicode strings.
@ -499,7 +510,7 @@ class InfoExtractor(object):
# Methods for following #608 # Methods for following #608
@staticmethod @staticmethod
def url_result(url, ie=None, video_id=None, video_title=None): def url_result(url, ie=None, video_id=None, video_title=None):
"""Returns a url that points to a page that should be processed""" """Returns a URL that points to a page that should be processed"""
# TODO: ie should be the class used for getting the info # TODO: ie should be the class used for getting the info
video_info = {'_type': 'url', video_info = {'_type': 'url',
'url': url, 'url': url,
@ -624,6 +635,12 @@ class InfoExtractor(object):
template % (content_re, property_re), template % (content_re, property_re),
] ]
@staticmethod
def _meta_regex(prop):
return r'''(?isx)<meta
(?=[^>]+(?:itemprop|name|property|id)=(["\']?)%s\1)
[^>]+?content=(["\'])(?P<content>.*?)\2''' % re.escape(prop)
def _og_search_property(self, prop, html, name=None, **kargs): def _og_search_property(self, prop, html, name=None, **kargs):
if name is None: if name is None:
name = 'OpenGraph %s' % prop name = 'OpenGraph %s' % prop
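
The new `_meta_regex` helper in this hunk matches a `<meta>` tag whose `itemprop`, `name`, `property` or `id` equals the requested value and captures its `content`, in either attribute order. The sketch below copies the pattern from the diff into a standalone function; the `search_meta` wrapper and the sample HTML are assumptions added for the demonstration.

```python
import re


def meta_regex(prop):
    # Pattern as added in the diff; the lookahead finds the identifying
    # attribute anywhere in the tag before content= is captured.
    return r'''(?isx)<meta
        (?=[^>]+(?:itemprop|name|property|id)=(["\']?)%s\1)
        [^>]+?content=(["\'])(?P<content>.*?)\2''' % re.escape(prop)


def search_meta(prop, html):
    """Hypothetical wrapper: return the content value, or None if no tag matches."""
    match = re.search(meta_regex(prop), html)
    return match.group('content') if match else None


html = (
    '<meta name="uploadDate" content="2013-06-11T10:08:38+0200">'
    '<meta content="x" id="late">'
)

print(search_meta('uploadDate', html))
print(search_meta('late', html))
print(search_meta('missing', html))
```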
@ -633,7 +650,7 @@ class InfoExtractor(object):
return unescapeHTML(escaped) return unescapeHTML(escaped)
def _og_search_thumbnail(self, html, **kargs): def _og_search_thumbnail(self, html, **kargs):
return self._og_search_property('image', html, 'thumbnail url', fatal=False, **kargs) return self._og_search_property('image', html, 'thumbnail URL', fatal=False, **kargs)
def _og_search_description(self, html, **kargs): def _og_search_description(self, html, **kargs):
return self._og_search_property('description', html, fatal=False, **kargs) return self._og_search_property('description', html, fatal=False, **kargs)
@ -654,9 +671,7 @@ class InfoExtractor(object):
if display_name is None: if display_name is None:
display_name = name display_name = name
return self._html_search_regex( return self._html_search_regex(
r'''(?isx)<meta self._meta_regex(name),
(?=[^>]+(?:itemprop|name|property)=(["\']?)%s\1)
[^>]+?content=(["\'])(?P<content>.*?)\2''' % re.escape(name),
html, display_name, fatal=fatal, group='content', **kwargs) html, display_name, fatal=fatal, group='content', **kwargs)
def _dc_search_uploader(self, html): def _dc_search_uploader(self, html):
@ -705,6 +720,25 @@ class InfoExtractor(object):
return self._html_search_meta('twitter:player', html, return self._html_search_meta('twitter:player', html,
'twitter card player') 'twitter card player')
@staticmethod
def _hidden_inputs(html):
return dict([
(input.group('name'), input.group('value')) for input in re.finditer(
r'''(?x)
<input\s+
type=(?P<q_hidden>["\'])hidden(?P=q_hidden)\s+
name=(?P<q_name>["\'])(?P<name>.+?)(?P=q_name)\s+
(?:id=(?P<q_id>["\']).+?(?P=q_id)\s+)?
value=(?P<q_value>["\'])(?P<value>.*?)(?P=q_value)
''', html)
])
def _form_hidden_inputs(self, form_id, html):
form = self._search_regex(
r'(?s)<form[^>]+?id=(["\'])%s\1[^>]*>(?P<form>.+?)</form>' % form_id,
html, '%s form' % form_id, group='form')
return self._hidden_inputs(form)
def _sort_formats(self, formats, field_preference=None): def _sort_formats(self, formats, field_preference=None):
if not formats: if not formats:
raise ExtractorError('No video formats found') raise ExtractorError('No video formats found')
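
The `_hidden_inputs` helper added in this hunk collects `name`/`value` pairs from hidden `<input>` tags. Its pattern expects the attributes in the order `type`, `name`, optional `id`, `value`, so tags written in another order are skipped. The sketch below reuses the pattern from the diff in a standalone function; the form markup is invented for illustration.

```python
import re

HIDDEN_INPUT_RE = re.compile(r'''(?x)
    <input\s+
    type=(?P<q_hidden>["\'])hidden(?P=q_hidden)\s+
    name=(?P<q_name>["\'])(?P<name>.+?)(?P=q_name)\s+
    (?:id=(?P<q_id>["\']).+?(?P=q_id)\s+)?
    value=(?P<q_value>["\'])(?P<value>.*?)(?P=q_value)
    ''')


def hidden_inputs(html):
    """Return {name: value} for every hidden input the pattern recognises."""
    return dict(
        (m.group('name'), m.group('value'))
        for m in HIDDEN_INPUT_RE.finditer(html))


form = (
    '<form id="login">'
    '<input type="hidden" name="csrf" value="abc123">'
    "<input type='hidden' name='next' id='n' value=''>"
    '<input name="order" type="hidden" value="ignored">'  # different attribute order
    '<input type="text" name="user" value="me">'          # not hidden
    '</form>'
)

print(hidden_inputs(form))
```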
@ -815,10 +849,14 @@ class InfoExtractor(object):
self.to_screen(msg) self.to_screen(msg)
time.sleep(timeout) time.sleep(timeout)
def _extract_f4m_formats(self, manifest_url, video_id, preference=None, f4m_id=None): def _extract_f4m_formats(self, manifest_url, video_id, preference=None, f4m_id=None,
transform_source=lambda s: fix_xml_ampersands(s).strip()):
manifest = self._download_xml( manifest = self._download_xml(
manifest_url, video_id, 'Downloading f4m manifest', manifest_url, video_id, 'Downloading f4m manifest',
'Unable to download f4m manifest') 'Unable to download f4m manifest',
# Some manifests may be malformed, e.g. prosiebensat1 generated manifests
# (see https://github.com/rg3/youtube-dl/issues/6215#issuecomment-121704244)
transform_source=transform_source)
formats = [] formats = []
manifest_version = '1.0' manifest_version = '1.0'
@ -828,8 +866,19 @@ class InfoExtractor(object):
media_nodes = manifest.findall('{http://ns.adobe.com/f4m/2.0}media') media_nodes = manifest.findall('{http://ns.adobe.com/f4m/2.0}media')
for i, media_el in enumerate(media_nodes): for i, media_el in enumerate(media_nodes):
if manifest_version == '2.0': if manifest_version == '2.0':
manifest_url = ('/'.join(manifest_url.split('/')[:-1]) + '/' + media_url = media_el.attrib.get('href') or media_el.attrib.get('url')
(media_el.attrib.get('href') or media_el.attrib.get('url'))) if not media_url:
continue
manifest_url = (
media_url if media_url.startswith('http://') or media_url.startswith('https://')
else ('/'.join(manifest_url.split('/')[:-1]) + '/' + media_url))
# If media_url is itself a f4m manifest do the recursive extraction
# since bitrates in parent manifest (this one) and media_url manifest
# may differ leading to inability to resolve the format by requested
# bitrate in f4m downloader
if determine_ext(manifest_url) == 'f4m':
formats.extend(self._extract_f4m_formats(manifest_url, video_id, preference, f4m_id))
continue
tbr = int_or_none(media_el.attrib.get('bitrate')) tbr = int_or_none(media_el.attrib.get('bitrate'))
formats.append({ formats.append({
'format_id': '-'.join(filter(None, [f4m_id, compat_str(i if tbr is None else tbr)])), 'format_id': '-'.join(filter(None, [f4m_id, compat_str(i if tbr is None else tbr)])),
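
For f4m 2.0 manifests, this hunk stops assuming that every media `href` is relative: an absolute `http://` or `https://` URL is used as is, and anything else is appended to the directory of the parent manifest. A minimal sketch of that rule follows; `resolve_media_url` is a name chosen for this example and the URLs are placeholders.

```python
def resolve_media_url(manifest_url, media_url):
    """Return media_url unchanged if absolute, else join it to the manifest's directory."""
    if media_url.startswith(('http://', 'https://')):
        return media_url
    return '/'.join(manifest_url.split('/')[:-1]) + '/' + media_url


parent = 'http://example.com/a/b/manifest.f4m'
print(resolve_media_url(parent, 'hd/stream.f4m'))
print(resolve_media_url(parent, 'https://cdn.example.com/stream.f4m'))
```

In the diff, a resolved URL that itself ends in `.f4m` is then extracted recursively, because bitrates in the parent and child manifests may differ.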
@ -846,7 +895,8 @@ class InfoExtractor(object):
def _extract_m3u8_formats(self, m3u8_url, video_id, ext=None, def _extract_m3u8_formats(self, m3u8_url, video_id, ext=None,
entry_protocol='m3u8', preference=None, entry_protocol='m3u8', preference=None,
m3u8_id=None, note=None, errnote=None): m3u8_id=None, note=None, errnote=None,
fatal=True):
formats = [{ formats = [{
'format_id': '-'.join(filter(None, [m3u8_id, 'meta'])), 'format_id': '-'.join(filter(None, [m3u8_id, 'meta'])),
@ -866,7 +916,10 @@ class InfoExtractor(object):
m3u8_doc = self._download_webpage( m3u8_doc = self._download_webpage(
m3u8_url, video_id, m3u8_url, video_id,
note=note or 'Downloading m3u8 information', note=note or 'Downloading m3u8 information',
errnote=errnote or 'Failed to download m3u8 information') errnote=errnote or 'Failed to download m3u8 information',
fatal=fatal)
if m3u8_doc is False:
return m3u8_doc
last_info = None last_info = None
last_media = None last_media = None
kv_rex = re.compile( kv_rex = re.compile(
@ -927,69 +980,167 @@ class InfoExtractor(object):
self._sort_formats(formats) self._sort_formats(formats)
return formats return formats
# TODO: improve extraction @staticmethod
def _extract_smil_formats(self, smil_url, video_id, fatal=True): def _xpath_ns(path, namespace=None):
smil = self._download_xml( if not namespace:
smil_url, video_id, 'Downloading SMIL file', return path
'Unable to download SMIL file', fatal=fatal) out = []
for c in path.split('/'):
if not c or c == '.':
out.append(c)
else:
out.append('{%s}%s' % (namespace, c))
return '/'.join(out)
def _extract_smil_formats(self, smil_url, video_id, fatal=True, f4m_params=None):
smil = self._download_smil(smil_url, video_id, fatal=fatal)
if smil is False: if smil is False:
assert not fatal assert not fatal
return [] return []
base = smil.find('./head/meta').get('base') namespace = self._parse_smil_namespace(smil)
return self._parse_smil_formats(
smil, smil_url, video_id, namespace=namespace, f4m_params=f4m_params)
def _extract_smil_info(self, smil_url, video_id, fatal=True, f4m_params=None):
smil = self._download_smil(smil_url, video_id, fatal=fatal)
if smil is False:
return {}
return self._parse_smil(smil, smil_url, video_id, f4m_params=f4m_params)
def _download_smil(self, smil_url, video_id, fatal=True):
return self._download_xml(
smil_url, video_id, 'Downloading SMIL file',
'Unable to download SMIL file', fatal=fatal)
def _parse_smil(self, smil, smil_url, video_id, f4m_params=None):
namespace = self._parse_smil_namespace(smil)
formats = self._parse_smil_formats(
smil, smil_url, video_id, namespace=namespace, f4m_params=f4m_params)
subtitles = self._parse_smil_subtitles(smil, namespace=namespace)
video_id = os.path.splitext(url_basename(smil_url))[0]
title = None
description = None
for meta in smil.findall(self._xpath_ns('./head/meta', namespace)):
name = meta.attrib.get('name')
content = meta.attrib.get('content')
if not name or not content:
continue
if not title and name == 'title':
title = content
elif not description and name in ('description', 'abstract'):
description = content
return {
'id': video_id,
'title': title or video_id,
'description': description,
'formats': formats,
'subtitles': subtitles,
}
def _parse_smil_namespace(self, smil):
return self._search_regex(
r'(?i)^{([^}]+)?}smil$', smil.tag, 'namespace', default=None)
def _parse_smil_formats(self, smil, smil_url, video_id, namespace=None, f4m_params=None):
base = smil_url
for meta in smil.findall(self._xpath_ns('./head/meta', namespace)):
b = meta.get('base') or meta.get('httpBase')
if b:
base = b
break
formats = [] formats = []
rtmp_count = 0 rtmp_count = 0
if smil.findall('./body/seq/video'): http_count = 0
video = smil.findall('./body/seq/video')[0]
fmts, rtmp_count = self._parse_smil_video(video, video_id, base, rtmp_count)
formats.extend(fmts)
else:
for video in smil.findall('./body/switch/video'):
fmts, rtmp_count = self._parse_smil_video(video, video_id, base, rtmp_count)
formats.extend(fmts)
self._sort_formats(formats) videos = smil.findall(self._xpath_ns('.//video', namespace))
for video in videos:
return formats
def _parse_smil_video(self, video, video_id, base, rtmp_count):
src = video.get('src') src = video.get('src')
if not src: if not src:
return ([], rtmp_count) continue
bitrate = int_or_none(video.get('system-bitrate') or video.get('systemBitrate'), 1000) bitrate = int_or_none(video.get('system-bitrate') or video.get('systemBitrate'), 1000)
filesize = int_or_none(video.get('size') or video.get('fileSize'))
width = int_or_none(video.get('width')) width = int_or_none(video.get('width'))
height = int_or_none(video.get('height')) height = int_or_none(video.get('height'))
proto = video.get('proto') proto = video.get('proto')
if not proto:
if base:
if base.startswith('rtmp'):
proto = 'rtmp'
elif base.startswith('http'):
proto = 'http'
ext = video.get('ext') ext = video.get('ext')
if proto == 'm3u8': src_ext = determine_ext(src)
return (self._extract_m3u8_formats(src, video_id, ext), rtmp_count)
elif proto == 'rtmp':
rtmp_count += 1
streamer = video.get('streamer') or base streamer = video.get('streamer') or base
return ([{
if proto == 'rtmp' or streamer.startswith('rtmp'):
rtmp_count += 1
formats.append({
'url': streamer, 'url': streamer,
'play_path': src, 'play_path': src,
'ext': 'flv', 'ext': 'flv',
'format_id': 'rtmp-%d' % (rtmp_count if bitrate is None else bitrate), 'format_id': 'rtmp-%d' % (rtmp_count if bitrate is None else bitrate),
'tbr': bitrate, 'tbr': bitrate,
'filesize': filesize,
'width': width, 'width': width,
'height': height, 'height': height,
}], rtmp_count) })
elif proto.startswith('http'): continue
return ([{
'url': base + src, src_url = src if src.startswith('http') else compat_urlparse.urljoin(base, src)
'ext': ext or 'flv',
if proto == 'm3u8' or src_ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
src_url, video_id, ext or 'mp4', m3u8_id='hls'))
continue
if src_ext == 'f4m':
f4m_url = src_url
if not f4m_params:
f4m_params = {
'hdcore': '3.2.0',
'plugin': 'flowplayer-3.2.0.1',
}
f4m_url += '&' if '?' in f4m_url else '?'
f4m_url += compat_urllib_parse.urlencode(f4m_params)
formats.extend(self._extract_f4m_formats(f4m_url, video_id, f4m_id='hds'))
continue
if src_url.startswith('http'):
http_count += 1
formats.append({
'url': src_url,
'ext': ext or src_ext or 'flv',
'format_id': 'http-%d' % (bitrate or http_count),
'tbr': bitrate, 'tbr': bitrate,
'filesize': filesize,
'width': width, 'width': width,
'height': height, 'height': height,
}], rtmp_count) })
continue
self._sort_formats(formats)
return formats
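
Note on the f4m branch above: it appends player parameters to the manifest URL, choosing `&` or `?` depending on whether a query string is already present. A minimal standalone sketch of that step, assuming Python 3's `urllib.parse` in place of the `compat_urllib_parse` shim; the helper name `append_query` and the example URLs are hypothetical and not part of the commit:

```python
from urllib.parse import urlencode


def append_query(url, params):
    """Append params to url, reusing an existing query string if present."""
    # Hypothetical helper for illustration only
    separator = '&' if '?' in url else '?'
    return url + separator + urlencode(params)


# Defaults mirrored from the diff above
DEFAULT_F4M_PARAMS = {
    'hdcore': '3.2.0',
    'plugin': 'flowplayer-3.2.0.1',
}

print(append_query('http://example.com/manifest.f4m', DEFAULT_F4M_PARAMS))
print(append_query('http://example.com/manifest.f4m?token=abc', DEFAULT_F4M_PARAMS))
```

Checking for an existing `?` keeps the result a single well-formed query string whether or not the source URL already carried parameters.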
def _parse_smil_subtitles(self, smil, namespace=None):
subtitles = {}
for num, textstream in enumerate(smil.findall(self._xpath_ns('.//textstream', namespace))):
src = textstream.get('src')
if not src:
continue
ext = textstream.get('ext') or determine_ext(src)
if not ext:
type_ = textstream.get('type')
if type_ == 'text/srt':
ext = 'srt'
lang = textstream.get('systemLanguage') or textstream.get('systemLanguageName')
subtitles.setdefault(lang, []).append({
'url': src,
'ext': ext,
})
return subtitles
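
Note on `_parse_smil_subtitles`: the grouping logic can be tried outside the extractor. The sketch below is an approximation under stated assumptions: it ignores XML namespaces, and it uses `os.path.splitext` as a rough stand-in for `determine_ext` (which also handles query strings). The function name and sample document are hypothetical:

```python
import os.path
import xml.etree.ElementTree as ET


def parse_smil_subtitles(smil_xml):
    """Collect <textstream> entries from a SMIL document, grouped by language."""
    smil = ET.fromstring(smil_xml)
    subtitles = {}
    for textstream in smil.findall('.//textstream'):
        src = textstream.get('src')
        if not src:
            continue
        # Prefer an explicit ext attribute, then the file extension of src
        ext = textstream.get('ext') or os.path.splitext(src)[1].lstrip('.')
        # Fall back to the MIME type when no extension could be derived
        if not ext and textstream.get('type') == 'text/srt':
            ext = 'srt'
        lang = textstream.get('systemLanguage') or textstream.get('systemLanguageName')
        subtitles.setdefault(lang, []).append({'url': src, 'ext': ext})
    return subtitles


SAMPLE = '''<smil><body>
<textstream src="http://example.com/subs/en.srt" systemLanguage="en"/>
<textstream src="http://example.com/subs/de" type="text/srt" systemLanguage="de"/>
<textstream systemLanguage="fr"/>
</body></smil>'''

print(parse_smil_subtitles(SAMPLE))
```

Entries without a `src` are skipped, so the `fr` stream in the sample does not appear in the result.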
def _live_title(self, name): def _live_title(self, name):
""" Generate the title for a live video """ """ Generate the title for a live video """
@ -1025,6 +1176,12 @@ class InfoExtractor(object):
None, '/', True, False, expire_time, '', None, None, None) None, '/', True, False, expire_time, '', None, None, None)
self._downloader.cookiejar.set_cookie(cookie) self._downloader.cookiejar.set_cookie(cookie)
def _get_cookies(self, url):
""" Return a compat_cookies.SimpleCookie with the cookies for the url """
req = compat_urllib_request.Request(url)
self._downloader.cookiejar.add_cookie_header(req)
return compat_cookies.SimpleCookie(req.get_header('Cookie'))
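
Note on `_get_cookies`: it lets the cookie jar decide which cookies apply to a URL by having it fill in the `Cookie` header of a throwaway request, then parses that header. A rough Python 3 sketch of the same idea, assuming the standard `http.cookiejar` and `http.cookies` modules rather than the compat wrappers; the jar contents and URLs are made up for illustration:

```python
from http.cookiejar import Cookie, CookieJar
from http.cookies import SimpleCookie
from urllib.request import Request


def get_cookies(cookiejar, url):
    """Return a SimpleCookie holding the jar's cookies that apply to url."""
    req = Request(url)
    # Let the jar apply its own domain/path/expiry matching rules
    cookiejar.add_cookie_header(req)
    # get_header returns None when nothing matched; SimpleCookie(None) is empty
    return SimpleCookie(req.get_header('Cookie'))


jar = CookieJar()
jar.set_cookie(Cookie(
    version=0, name='family_filter', value='off', port=None, port_specified=False,
    domain='example.com', domain_specified=False, domain_initial_dot=False,
    path='/', path_specified=True, secure=False, expires=None, discard=True,
    comment=None, comment_url=None, rest={}))

cookies = get_cookies(jar, 'http://example.com/video/1')
print(cookies['family_filter'].value)
```

Going through a request object avoids reimplementing the jar's matching policy by hand.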
def get_testcases(self, include_onlymatching=False): def get_testcases(self, include_onlymatching=False):
t = getattr(self, '_TEST', None) t = getattr(self, '_TEST', None)
if t: if t:
@ -1076,7 +1233,7 @@ class InfoExtractor(object):
class SearchInfoExtractor(InfoExtractor): class SearchInfoExtractor(InfoExtractor):
""" """
Base class for paged search queries extractors. Base class for paged search queries extractors.
They accept urls in the format _SEARCH_KEY(|all|[0-9]):{query} They accept URLs in the format _SEARCH_KEY(|all|[0-9]):{query}
Instances should define _SEARCH_KEY and _MAX_RESULTS. Instances should define _SEARCH_KEY and _MAX_RESULTS.
""" """

View File

@ -12,6 +12,7 @@ from math import pow, sqrt, floor
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_urllib_parse, compat_urllib_parse,
compat_urllib_parse_unquote,
compat_urllib_request, compat_urllib_request,
) )
from ..utils import ( from ..utils import (
@ -27,7 +28,7 @@ from ..aes import (
class CrunchyrollIE(InfoExtractor): class CrunchyrollIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?P<prefix>www|m)\.)?(?P<url>crunchyroll\.(?:com|fr)/(?:[^/]*/[^/?&]*?|media/\?id=)(?P<video_id>[0-9]+))(?:[/?&]|$)' _VALID_URL = r'https?://(?:(?P<prefix>www|m)\.)?(?P<url>crunchyroll\.(?:com|fr)/(?:media(?:-|/\?id=)|[^/]*/[^/?&]*?)(?P<video_id>[0-9]+))(?:[/?&]|$)'
_NETRC_MACHINE = 'crunchyroll' _NETRC_MACHINE = 'crunchyroll'
_TESTS = [{ _TESTS = [{
'url': 'http://www.crunchyroll.com/wanna-be-the-strongest-in-the-world/episode-1-an-idol-wrestler-is-born-645513', 'url': 'http://www.crunchyroll.com/wanna-be-the-strongest-in-the-world/episode-1-an-idol-wrestler-is-born-645513',
@ -45,6 +46,22 @@ class CrunchyrollIE(InfoExtractor):
# rtmp # rtmp
'skip_download': True, 'skip_download': True,
}, },
}, {
'url': 'http://www.crunchyroll.com/media-589804/culture-japan-1',
'info_dict': {
'id': '589804',
'ext': 'flv',
'title': 'Culture Japan Episode 1 Rebuilding Japan after the 3.11',
'description': 'md5:fe2743efedb49d279552926d0bd0cd9e',
'thumbnail': 're:^https?://.*\.jpg$',
'uploader': 'Danny Choo Network',
'upload_date': '20120213',
},
'params': {
# rtmp
'skip_download': True,
},
}, { }, {
'url': 'http://www.crunchyroll.fr/girl-friend-beta/episode-11-goodbye-la-mode-661697', 'url': 'http://www.crunchyroll.fr/girl-friend-beta/episode-11-goodbye-la-mode-661697',
'only_matching': True, 'only_matching': True,
@ -238,7 +255,7 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
video_upload_date = unified_strdate(video_upload_date) video_upload_date = unified_strdate(video_upload_date)
video_uploader = self._html_search_regex(r'<div>\s*Publisher:(.+?)</div>', webpage, 'video_uploader', fatal=False, flags=re.DOTALL) video_uploader = self._html_search_regex(r'<div>\s*Publisher:(.+?)</div>', webpage, 'video_uploader', fatal=False, flags=re.DOTALL)
playerdata_url = compat_urllib_parse.unquote(self._html_search_regex(r'"config_url":"([^"]+)', webpage, 'playerdata_url')) playerdata_url = compat_urllib_parse_unquote(self._html_search_regex(r'"config_url":"([^"]+)', webpage, 'playerdata_url'))
playerdata_req = compat_urllib_request.Request(playerdata_url) playerdata_req = compat_urllib_request.Request(playerdata_url)
playerdata_req.data = compat_urllib_parse.urlencode({'current_page': webpage_url}) playerdata_req.data = compat_urllib_parse.urlencode({'current_page': webpage_url})
playerdata_req.add_header('Content-Type', 'application/x-www-form-urlencoded') playerdata_req.add_header('Content-Type', 'application/x-www-form-urlencoded')
@ -251,16 +268,17 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
for fmt in re.findall(r'showmedia\.([0-9]{3,4})p', webpage): for fmt in re.findall(r'showmedia\.([0-9]{3,4})p', webpage):
stream_quality, stream_format = self._FORMAT_IDS[fmt] stream_quality, stream_format = self._FORMAT_IDS[fmt]
video_format = fmt + 'p' video_format = fmt + 'p'
streamdata_req = compat_urllib_request.Request('http://www.crunchyroll.com/xml/') streamdata_req = compat_urllib_request.Request(
# urlencode doesn't work! 'http://www.crunchyroll.com/xml/?req=RpcApiVideoPlayer_GetStandardConfig&media_id=%s&video_format=%s&video_quality=%s'
streamdata_req.data = 'req=RpcApiVideoEncode%5FGetStreamInfo&video%5Fencode%5Fquality=' + stream_quality + '&media%5Fid=' + stream_id + '&video%5Fformat=' + stream_format % (stream_id, stream_format, stream_quality),
compat_urllib_parse.urlencode({'current_page': url}).encode('utf-8'))
streamdata_req.add_header('Content-Type', 'application/x-www-form-urlencoded') streamdata_req.add_header('Content-Type', 'application/x-www-form-urlencoded')
streamdata_req.add_header('Content-Length', str(len(streamdata_req.data)))
streamdata = self._download_xml( streamdata = self._download_xml(
streamdata_req, video_id, streamdata_req, video_id,
note='Downloading media info for %s' % video_format) note='Downloading media info for %s' % video_format)
video_url = streamdata.find('./host').text stream_info = streamdata.find('./{default}preload/stream_info')
video_play_path = streamdata.find('./file').text video_url = stream_info.find('./host').text
video_play_path = stream_info.find('./file').text
formats.append({ formats.append({
'url': video_url, 'url': video_url,
'play_path': video_play_path, 'play_path': video_play_path,

View File

@ -6,6 +6,7 @@ from ..utils import parse_iso8601, ExtractorError
class CtsNewsIE(InfoExtractor): class CtsNewsIE(InfoExtractor):
IE_DESC = '華視新聞'
# https connection failed (Connection reset) # https connection failed (Connection reset)
_VALID_URL = r'http://news\.cts\.com\.tw/[a-z]+/[a-z]+/\d+/(?P<id>\d+)\.html' _VALID_URL = r'http://news\.cts\.com\.tw/[a-z]+/[a-z]+/\d+/(?P<id>\d+)\.html'
_TESTS = [{ _TESTS = [{

View File

@ -13,8 +13,9 @@ from ..compat import (
) )
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
determine_ext,
int_or_none, int_or_none,
orderedSet, parse_iso8601,
str_to_int, str_to_int,
unescapeHTML, unescapeHTML,
) )
@ -28,10 +29,16 @@ class DailymotionBaseInfoExtractor(InfoExtractor):
request.add_header('Cookie', 'family_filter=off; ff=off') request.add_header('Cookie', 'family_filter=off; ff=off')
return request return request
def _download_webpage_handle_no_ff(self, url, *args, **kwargs):
request = self._build_request(url)
return self._download_webpage_handle(request, *args, **kwargs)
def _download_webpage_no_ff(self, url, *args, **kwargs):
request = self._build_request(url)
return self._download_webpage(request, *args, **kwargs)
class DailymotionIE(DailymotionBaseInfoExtractor): class DailymotionIE(DailymotionBaseInfoExtractor):
"""Information Extractor for Dailymotion"""
_VALID_URL = r'(?i)(?:https?://)?(?:(www|touch)\.)?dailymotion\.[a-z]{2,3}/(?:(embed|#)/)?video/(?P<id>[^/?_]+)' _VALID_URL = r'(?i)(?:https?://)?(?:(www|touch)\.)?dailymotion\.[a-z]{2,3}/(?:(embed|#)/)?video/(?P<id>[^/?_]+)'
IE_NAME = 'dailymotion' IE_NAME = 'dailymotion'
@ -50,9 +57,17 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
'info_dict': { 'info_dict': {
'id': 'x2iuewm', 'id': 'x2iuewm',
'ext': 'mp4', 'ext': 'mp4',
'uploader': 'IGN',
'title': 'Steam Machine Models, Pricing Listed on Steam Store - IGN News', 'title': 'Steam Machine Models, Pricing Listed on Steam Store - IGN News',
'description': 'Several come bundled with the Steam Controller.',
'thumbnail': 're:^https?:.*\.(?:jpg|png)$',
'duration': 74,
'timestamp': 1425657362,
'upload_date': '20150306', 'upload_date': '20150306',
'uploader': 'IGN',
'uploader_id': 'xijv66',
'age_limit': 0,
'view_count': int,
'comment_count': int,
} }
}, },
# Vevo video # Vevo video
@ -86,38 +101,106 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
url = 'https://www.dailymotion.com/video/%s' % video_id
# Retrieve video webpage to extract further information webpage = self._download_webpage_no_ff(
request = self._build_request(url) 'https://www.dailymotion.com/video/%s' % video_id, video_id)
webpage = self._download_webpage(request, video_id)
# Extract URL, uploader and title from webpage
self.report_extraction(video_id)
# It may just embed a vevo video:
m_vevo = re.search(
r'<link rel="video_src" href="[^"]*?vevo.com[^"]*?video=(?P<id>[\w]*)',
webpage)
if m_vevo is not None:
vevo_id = m_vevo.group('id')
self.to_screen('Vevo video detected: %s' % vevo_id)
return self.url_result('vevo:%s' % vevo_id, ie='Vevo')
age_limit = self._rta_search(webpage) age_limit = self._rta_search(webpage)
video_upload_date = None description = self._og_search_description(webpage) or self._html_search_meta(
mobj = re.search(r'<meta property="video:release_date" content="([0-9]{4})-([0-9]{2})-([0-9]{2}).+?"/>', webpage) 'description', webpage, 'description')
if mobj is not None:
video_upload_date = mobj.group(1) + mobj.group(2) + mobj.group(3) view_count = str_to_int(self._search_regex(
[r'<meta[^>]+itemprop="interactionCount"[^>]+content="UserPlays:(\d+)"',
r'video_views_count[^>]+>\s+([\d\.,]+)'],
webpage, 'view count', fatal=False))
comment_count = int_or_none(self._search_regex(
r'<meta[^>]+itemprop="interactionCount"[^>]+content="UserComments:(\d+)"',
webpage, 'comment count', fatal=False))
player_v5 = self._search_regex(
r'playerV5\s*=\s*dmp\.create\([^,]+?,\s*({.+?})\);',
webpage, 'player v5', default=None)
if player_v5:
player = self._parse_json(player_v5, video_id)
metadata = player['metadata']
formats = []
for quality, media_list in metadata['qualities'].items():
for media in media_list:
media_url = media.get('url')
if not media_url:
continue
type_ = media.get('type')
if type_ == 'application/vnd.lumberjack.manifest':
continue
if type_ == 'application/x-mpegURL' or determine_ext(media_url) == 'm3u8':
formats.extend(self._extract_m3u8_formats(
media_url, video_id, 'mp4', m3u8_id='hls'))
else:
f = {
'url': media_url,
'format_id': quality,
}
m = re.search(r'H264-(?P<width>\d+)x(?P<height>\d+)', media_url)
if m:
f.update({
'width': int(m.group('width')),
'height': int(m.group('height')),
})
formats.append(f)
self._sort_formats(formats)
title = metadata['title']
duration = int_or_none(metadata.get('duration'))
timestamp = int_or_none(metadata.get('created_time'))
thumbnail = metadata.get('poster_url')
uploader = metadata.get('owner', {}).get('screenname')
uploader_id = metadata.get('owner', {}).get('id')
subtitles = {}
for subtitle_lang, subtitle in metadata.get('subtitles', {}).get('data', {}).items():
subtitles[subtitle_lang] = [{
'ext': determine_ext(subtitle_url),
'url': subtitle_url,
} for subtitle_url in subtitle.get('urls', [])]
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'duration': duration,
'timestamp': timestamp,
'uploader': uploader,
'uploader_id': uploader_id,
'age_limit': age_limit,
'view_count': view_count,
'comment_count': comment_count,
'formats': formats,
'subtitles': subtitles,
}
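
Note on the progressive formats in the player v5 branch: width and height are read from an `H264-<width>x<height>` token in the media URL. A small sketch of that lookup; the function name and the example URLs are hypothetical, and real media URLs may be shaped differently:

```python
import re


def dimensions_from_url(media_url):
    """Return width/height parsed from an H264-WxH token, or {} if absent."""
    m = re.search(r'H264-(?P<width>\d+)x(?P<height>\d+)', media_url)
    if not m:
        return {}
    return {
        'width': int(m.group('width')),
        'height': int(m.group('height')),
    }


print(dimensions_from_url('http://example.com/video/H264-1280x720/clip.mp4'))
print(dimensions_from_url('http://example.com/video/clip.mp4'))
```

Returning an empty dict when the token is missing lets the caller `update()` a format entry unconditionally.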
# vevo embed
vevo_id = self._search_regex(
r'<link rel="video_src" href="[^"]*?vevo.com[^"]*?video=(?P<id>[\w]*)',
webpage, 'vevo embed', default=None)
if vevo_id:
return self.url_result('vevo:%s' % vevo_id, 'Vevo')
# fallback old player
embed_page = self._download_webpage_no_ff(
'https://www.dailymotion.com/embed/video/%s' % video_id,
video_id, 'Downloading embed page')
timestamp = parse_iso8601(self._html_search_meta(
'video:release_date', webpage, 'upload date'))
info = self._parse_json(
self._search_regex(
r'var info = ({.*?}),$', embed_page,
'video info', flags=re.MULTILINE),
video_id)
embed_url = 'https://www.dailymotion.com/embed/video/%s' % video_id
embed_request = self._build_request(embed_url)
embed_page = self._download_webpage(
embed_request, video_id, 'Downloading embed page')
info = self._search_regex(r'var info = ({.*?}),$', embed_page,
'video info', flags=re.MULTILINE)
info = json.loads(info)
if info.get('error') is not None: if info.get('error') is not None:
msg = 'Couldn\'t get video, Dailymotion says: %s' % info['error']['title'] msg = 'Couldn\'t get video, Dailymotion says: %s' % info['error']['title']
raise ExtractorError(msg, expected=True) raise ExtractorError(msg, expected=True)
@ -138,16 +221,11 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
'width': width, 'width': width,
'height': height, 'height': height,
}) })
if not formats: self._sort_formats(formats)
raise ExtractorError('Unable to extract video URL')
# subtitles # subtitles
video_subtitles = self.extract_subtitles(video_id, webpage) video_subtitles = self.extract_subtitles(video_id, webpage)
view_count = str_to_int(self._search_regex(
r'video_views_count[^>]+>\s+([\d\.,]+)',
webpage, 'view count', fatal=False))
title = self._og_search_title(webpage, default=None) title = self._og_search_title(webpage, default=None)
if title is None: if title is None:
title = self._html_search_regex( title = self._html_search_regex(
@ -158,12 +236,14 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
'id': video_id, 'id': video_id,
'formats': formats, 'formats': formats,
'uploader': info['owner.screenname'], 'uploader': info['owner.screenname'],
'upload_date': video_upload_date, 'timestamp': timestamp,
'title': title, 'title': title,
'description': description,
'subtitles': video_subtitles, 'subtitles': video_subtitles,
'thumbnail': info['thumbnail_url'], 'thumbnail': info['thumbnail_url'],
'age_limit': age_limit, 'age_limit': age_limit,
'view_count': view_count, 'view_count': view_count,
'duration': info['duration']
} }
def _get_subtitles(self, video_id, webpage): def _get_subtitles(self, video_id, webpage):
@ -197,18 +277,26 @@ class DailymotionPlaylistIE(DailymotionBaseInfoExtractor):
}] }]
def _extract_entries(self, id): def _extract_entries(self, id):
video_ids = [] video_ids = set()
processed_urls = set()
for pagenum in itertools.count(1): for pagenum in itertools.count(1):
request = self._build_request(self._PAGE_TEMPLATE % (id, pagenum)) page_url = self._PAGE_TEMPLATE % (id, pagenum)
webpage = self._download_webpage(request, webpage, urlh = self._download_webpage_handle_no_ff(
id, 'Downloading page %s' % pagenum) page_url, id, 'Downloading page %s' % pagenum)
if urlh.geturl() in processed_urls:
self.report_warning('Stopped at duplicated page %s, which is the same as %s' % (
page_url, urlh.geturl()), id)
break
video_ids.extend(re.findall(r'data-xid="(.+?)"', webpage)) processed_urls.add(urlh.geturl())
for video_id in re.findall(r'data-xid="(.+?)"', webpage):
if video_id not in video_ids:
yield self.url_result('http://www.dailymotion.com/video/%s' % video_id, 'Dailymotion')
video_ids.add(video_id)
if re.search(self._MORE_PAGES_INDICATOR, webpage) is None: if re.search(self._MORE_PAGES_INDICATOR, webpage) is None:
break break
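
Note on the new `_extract_entries`: it stops paging when the final URL of a page has been seen before, and yields each video id only once. The control flow can be sketched without any network access; `fetch_page` below is a hypothetical callable returning `(html, final_url)`, and the fake pages are invented for illustration:

```python
import itertools
import re


def iter_video_ids(fetch_page, more_pages_indicator):
    """Yield unique video ids page by page, stopping on a repeated final URL."""
    seen_ids = set()
    processed_urls = set()
    for pagenum in itertools.count(1):
        # fetch_page is a hypothetical callable returning (html, final_url)
        webpage, final_url = fetch_page(pagenum)
        if final_url in processed_urls:
            # The site redirected us back to a page we already handled
            break
        processed_urls.add(final_url)
        for video_id in re.findall(r'data-xid="(.+?)"', webpage):
            if video_id not in seen_ids:
                seen_ids.add(video_id)
                yield video_id
        if re.search(more_pages_indicator, webpage) is None:
            break


# Fake three-page site: page 3 redirects back to page 2
PAGES = {
    1: ('<a data-xid="x1"></a><a data-xid="x2"></a><div class="next"></div>', '/user/demo/1'),
    2: ('<a data-xid="x2"></a><a data-xid="x3"></a><div class="next"></div>', '/user/demo/2'),
    3: ('<a data-xid="x3"></a><div class="next"></div>', '/user/demo/2'),
}

print(list(iter_video_ids(PAGES.get, r'class="next"')))
```

Using a generator means entries become available as soon as each page is parsed, instead of only after the last page has been fetched.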
return [self.url_result('http://www.dailymotion.com/video/%s' % video_id, 'Dailymotion')
for video_id in orderedSet(video_ids)]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
@ -225,7 +313,7 @@ class DailymotionPlaylistIE(DailymotionBaseInfoExtractor):
class DailymotionUserIE(DailymotionPlaylistIE): class DailymotionUserIE(DailymotionPlaylistIE):
IE_NAME = 'dailymotion:user' IE_NAME = 'dailymotion:user'
_VALID_URL = r'https?://(?:www\.)?dailymotion\.[a-z]{2,3}/(?:(?:old/)?user/)?(?P<user>[^/]+)$' _VALID_URL = r'https?://(?:www\.)?dailymotion\.[a-z]{2,3}/(?!(?:embed|#|video|playlist)/)(?:(?:old/)?user/)?(?P<user>[^/]+)'
_PAGE_TEMPLATE = 'http://www.dailymotion.com/user/%s/%s' _PAGE_TEMPLATE = 'http://www.dailymotion.com/user/%s/%s'
_TESTS = [{ _TESTS = [{
'url': 'https://www.dailymotion.com/user/nqtv', 'url': 'https://www.dailymotion.com/user/nqtv',
@ -234,6 +322,17 @@ class DailymotionUserIE(DailymotionPlaylistIE):
'title': 'Rémi Gaillard', 'title': 'Rémi Gaillard',
}, },
'playlist_mincount': 100, 'playlist_mincount': 100,
}, {
'url': 'http://www.dailymotion.com/user/UnderProject',
'info_dict': {
'id': 'UnderProject',
'title': 'UnderProject',
},
'playlist_mincount': 1800,
'expected_warnings': [
'Stopped at duplicated page',
],
'skip': 'Takes too long',
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -284,8 +383,7 @@ class DailymotionCloudIE(DailymotionBaseInfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
request = self._build_request(url) webpage = self._download_webpage_no_ff(url, video_id)
webpage = self._download_webpage(request, video_id)
title = self._html_search_regex(r'<title>([^>]+)</title>', webpage, 'title') title = self._html_search_regex(r'<title>([^>]+)</title>', webpage, 'title')

View File

@ -0,0 +1,84 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
compat_urllib_request,
)
from ..utils import (
int_or_none,
parse_iso8601,
)
class DCNIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?dcndigital\.ae/(?:#/)?(?:video/.+|show/\d+/.+?)/(?P<id>\d+)'
_TEST = {
'url': 'http://www.dcndigital.ae/#/show/199074/%D8%B1%D8%AD%D9%84%D8%A9-%D8%A7%D9%84%D8%B9%D9%85%D8%B1-%D8%A7%D9%84%D8%AD%D9%84%D9%82%D8%A9-1/17375/6887',
'info_dict':
{
'id': '17375',
'ext': 'mp4',
'title': 'رحلة العمر : الحلقة 1',
'description': 'md5:0156e935d870acb8ef0a66d24070c6d6',
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 2041,
'timestamp': 1227504126,
'upload_date': '20081124',
},
'params': {
# m3u8 download
'skip_download': True,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
request = compat_urllib_request.Request(
'http://admin.mangomolo.com/analytics/index.php/plus/video?id=%s' % video_id,
headers={'Origin': 'http://www.dcndigital.ae'})
video = self._download_json(request, video_id)
title = video.get('title_en') or video['title_ar']
webpage = self._download_webpage(
'http://admin.mangomolo.com/analytics/index.php/customers/embed/video?'
+ compat_urllib_parse.urlencode({
'id': video['id'],
'user_id': video['user_id'],
'signature': video['signature'],
'countries': 'Q0M=',
'filter': 'DENY',
}), video_id)
m3u8_url = self._html_search_regex(r'file:\s*"([^"]+)', webpage, 'm3u8 url')
formats = self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', entry_protocol='m3u8_native', m3u8_id='hls')
rtsp_url = self._search_regex(
r'<a[^>]+href="(rtsp://[^"]+)"', webpage, 'rtsp url', fatal=False)
if rtsp_url:
formats.append({
'url': rtsp_url,
'format_id': 'rtsp',
})
self._sort_formats(formats)
img = video.get('img')
thumbnail = 'http://admin.mangomolo.com/analytics/%s' % img if img else None
duration = int_or_none(video.get('duration'))
description = video.get('description_en') or video.get('description_ar')
timestamp = parse_iso8601(video.get('create_time') or video.get('update_time'), ' ')
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'duration': duration,
'timestamp': timestamp,
'formats': formats,
}

View File

@ -3,42 +3,47 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import unified_strdate
class DFBIE(InfoExtractor): class DFBIE(InfoExtractor):
IE_NAME = 'tv.dfb.de' IE_NAME = 'tv.dfb.de'
_VALID_URL = r'https?://tv\.dfb\.de/video/[^/]+/(?P<id>\d+)' _VALID_URL = r'https?://tv\.dfb\.de/video/(?P<display_id>[^/]+)/(?P<id>\d+)'
_TEST = { _TEST = {
'url': 'http://tv.dfb.de/video/highlights-des-empfangs-in-berlin/9070/', 'url': 'http://tv.dfb.de/video/u-19-em-stimmen-zum-spiel-gegen-russland/11633/',
# The md5 is different each time # The md5 is different each time
'info_dict': { 'info_dict': {
'id': '9070', 'id': '11633',
'display_id': 'u-19-em-stimmen-zum-spiel-gegen-russland',
'ext': 'flv', 'ext': 'flv',
'title': 'Highlights des Empfangs in Berlin', 'title': 'U 19-EM: Stimmen zum Spiel gegen Russland',
'upload_date': '20140716', 'upload_date': '20150714',
}, },
} }
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id') video_id = mobj.group('id')
display_id = mobj.group('display_id')
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, display_id)
player_info = self._download_xml( player_info = self._download_xml(
'http://tv.dfb.de/server/hd_video.php?play=%s' % video_id, 'http://tv.dfb.de/server/hd_video.php?play=%s' % video_id,
video_id) display_id)
video_info = player_info.find('video') video_info = player_info.find('video')
f4m_info = self._download_xml(self._proto_relative_url(video_info.find('url').text.strip()), video_id) f4m_info = self._download_xml(
self._proto_relative_url(video_info.find('url').text.strip()), display_id)
token_el = f4m_info.find('token') token_el = f4m_info.find('token')
manifest_url = token_el.attrib['url'] + '?' + 'hdnea=' + token_el.attrib['auth'] + '&hdcore=3.2.0' manifest_url = token_el.attrib['url'] + '?' + 'hdnea=' + token_el.attrib['auth'] + '&hdcore=3.2.0'
formats = self._extract_f4m_formats(manifest_url, display_id)
return { return {
'id': video_id, 'id': video_id,
'display_id': display_id,
'title': video_info.find('title').text, 'title': video_info.find('title').text,
'url': manifest_url,
'ext': 'flv',
'thumbnail': self._og_search_thumbnail(webpage), 'thumbnail': self._og_search_thumbnail(webpage),
'upload_date': ''.join(video_info.find('time_date').text.split('.')[::-1]), 'upload_date': unified_strdate(video_info.find('time_date').text),
'formats': formats,
} }
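
Note on the `upload_date` change: the replaced split-and-reverse code suggests that `time_date` arrives as `DD.MM.YYYY`. Assuming that format, the conversion can be sketched with the standard library alone; `unified_strdate` itself tries many more formats, and the helper name here is hypothetical:

```python
from datetime import datetime


def upload_date_from_german(date_str):
    """Convert 'DD.MM.YYYY' to 'YYYYMMDD', or None if it does not parse."""
    try:
        return datetime.strptime(date_str.strip(), '%d.%m.%Y').strftime('%Y%m%d')
    except ValueError:
        # Unlike a plain split/reverse, malformed input yields None
        return None


print(upload_date_from_german('14.07.2015'))
print(upload_date_from_german('not a date'))
```

Validating the date instead of just reordering its parts avoids emitting a malformed `upload_date` when the field changes shape.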

View File

@ -9,6 +9,7 @@ from ..compat import (compat_str, compat_basestring)
class DouyuTVIE(InfoExtractor): class DouyuTVIE(InfoExtractor):
IE_DESC = '斗鱼'
_VALID_URL = r'http://(?:www\.)?douyutv\.com/(?P<id>[A-Za-z0-9]+)' _VALID_URL = r'http://(?:www\.)?douyutv\.com/(?P<id>[A-Za-z0-9]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.douyutv.com/iseven', 'url': 'http://www.douyutv.com/iseven',

View File

@ -23,8 +23,23 @@ class DramaFeverBaseIE(InfoExtractor):
_LOGIN_URL = 'https://www.dramafever.com/accounts/login/' _LOGIN_URL = 'https://www.dramafever.com/accounts/login/'
_NETRC_MACHINE = 'dramafever' _NETRC_MACHINE = 'dramafever'
_CONSUMER_SECRET = 'DA59dtVXYLxajktV'
_consumer_secret = None
def _get_consumer_secret(self):
mainjs = self._download_webpage(
'http://www.dramafever.com/static/51afe95/df2014/scripts/main.js',
None, 'Downloading main.js', fatal=False)
if not mainjs:
return self._CONSUMER_SECRET
return self._search_regex(
r"var\s+cs\s*=\s*'([^']+)'", mainjs,
'consumer secret', default=self._CONSUMER_SECRET)
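
Note on `_get_consumer_secret`: it prefers the value found in `main.js` and falls back to the hardcoded constant when the download or the regex fails. The same fallback chain as a standalone sketch; the function name, the sample script text, and the placeholder secret are hypothetical:

```python
import re

FALLBACK_SECRET = 'fallback-secret'  # placeholder, not a real key


def find_consumer_secret(mainjs, default=FALLBACK_SECRET):
    """Return the secret assigned to `cs` in mainjs, or default if unavailable."""
    if not mainjs:
        # Download failed or returned nothing
        return default
    m = re.search(r"var\s+cs\s*=\s*'([^']+)'", mainjs)
    return m.group(1) if m else default


print(find_consumer_secret("var cs = 'abc123';"))
print(find_consumer_secret(None))
```

Both failure modes end at the same default, so callers never have to handle a missing secret.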
def _real_initialize(self): def _real_initialize(self):
self._login() self._login()
self._consumer_secret = self._get_consumer_secret()
def _login(self): def _login(self):
(username, password) = self._get_login_info() (username, password) = self._get_login_info()
@ -119,6 +134,23 @@ class DramaFeverIE(DramaFeverBaseIE):
'url': href, 'url': href,
}] }]
series_id, episode_number = video_id.split('.')
episode_info = self._download_json(
# We only need info for a single episode, so restrict the page size to one
# episode and treat the page number as the episode number
r'http://www.dramafever.com/api/4/episode/series/?cs=%s&series_id=%s&page_number=%s&page_size=1'
% (self._consumer_secret, series_id, episode_number),
video_id, 'Downloading episode info JSON', fatal=False)
if episode_info:
value = episode_info.get('value')
if value:
subfile = value[0].get('subfile') or value[0].get('new_subfile')
if subfile and subfile != 'http://www.dramafever.com/st/':
subtitles.setdefault('English', []).append({
'ext': 'srt',
'url': subfile,
})
return { return {
'id': video_id, 'id': video_id,
'title': title, 'title': title,
@ -152,27 +184,14 @@ class DramaFeverSeriesIE(DramaFeverBaseIE):
'playlist_count': 20, 'playlist_count': 20,
}] }]
_CONSUMER_SECRET = 'DA59dtVXYLxajktV'
_PAGE_SIZE = 60 # max is 60 (see http://api.drama9.com/#get--api-4-episode-series-) _PAGE_SIZE = 60 # max is 60 (see http://api.drama9.com/#get--api-4-episode-series-)
def _get_consumer_secret(self, video_id):
mainjs = self._download_webpage(
'http://www.dramafever.com/static/51afe95/df2014/scripts/main.js',
video_id, 'Downloading main.js', fatal=False)
if not mainjs:
return self._CONSUMER_SECRET
return self._search_regex(
r"var\s+cs\s*=\s*'([^']+)'", mainjs,
'consumer secret', default=self._CONSUMER_SECRET)
def _real_extract(self, url): def _real_extract(self, url):
series_id = self._match_id(url) series_id = self._match_id(url)
consumer_secret = self._get_consumer_secret(series_id)
series = self._download_json( series = self._download_json(
'http://www.dramafever.com/api/4/series/query/?cs=%s&series_id=%s' 'http://www.dramafever.com/api/4/series/query/?cs=%s&series_id=%s'
% (consumer_secret, series_id), % (self._consumer_secret, series_id),
series_id, 'Downloading series JSON')['series'][series_id] series_id, 'Downloading series JSON')['series'][series_id]
title = clean_html(series['name']) title = clean_html(series['name'])
@ -182,7 +201,7 @@ class DramaFeverSeriesIE(DramaFeverBaseIE):
for page_num in itertools.count(1): for page_num in itertools.count(1):
episodes = self._download_json( episodes = self._download_json(
'http://www.dramafever.com/api/4/episode/series/?cs=%s&series_id=%s&page_size=%d&page_number=%d' 'http://www.dramafever.com/api/4/episode/series/?cs=%s&series_id=%s&page_size=%d&page_number=%d'
% (consumer_secret, series_id, self._PAGE_SIZE, page_num), % (self._consumer_secret, series_id, self._PAGE_SIZE, page_num),
series_id, 'Downloading episodes JSON page #%d' % page_num) series_id, 'Downloading episodes JSON page #%d' % page_num)
for episode in episodes.get('value', []): for episode in episodes.get('value', []):
episode_url = episode.get('episode_url') episode_url = episode.get('episode_url')

View File

@ -1,9 +1,7 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from ..compat import (
compat_urllib_parse,
)
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urllib_parse_unquote
class EHowIE(InfoExtractor): class EHowIE(InfoExtractor):
@ -26,7 +24,7 @@ class EHowIE(InfoExtractor):
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
video_url = self._search_regex( video_url = self._search_regex(
r'(?:file|source)=(http[^\'"&]*)', webpage, 'video URL') r'(?:file|source)=(http[^\'"&]*)', webpage, 'video URL')
final_url = compat_urllib_parse.unquote(video_url) final_url = compat_urllib_parse_unquote(video_url)
uploader = self._html_search_meta('uploader', webpage) uploader = self._html_search_meta('uploader', webpage)
title = self._og_search_title(webpage).replace(' | eHow', '') title = self._og_search_title(webpage).replace(' | eHow', '')

View File

@ -9,7 +9,7 @@ from ..compat import (
compat_http_client, compat_http_client,
compat_str, compat_str,
compat_urllib_error, compat_urllib_error,
compat_urllib_parse, compat_urllib_parse_unquote,
compat_urllib_request, compat_urllib_request,
) )
from ..utils import ( from ..utils import (
@ -17,6 +17,8 @@ from ..utils import (
int_or_none, int_or_none,
limit_length, limit_length,
urlencode_postdata, urlencode_postdata,
get_element_by_id,
clean_html,
) )
@ -42,6 +44,7 @@ class FacebookIE(InfoExtractor):
'id': '637842556329505', 'id': '637842556329505',
'ext': 'mp4', 'ext': 'mp4',
'title': 're:Did you know Kei Nishikori is the first Asian man to ever reach a Grand Slam', 'title': 're:Did you know Kei Nishikori is the first Asian man to ever reach a Grand Slam',
'uploader': 'Tennis on Facebook',
} }
}, { }, {
'note': 'Video without discernible title', 'note': 'Video without discernible title',
@ -50,6 +53,7 @@ class FacebookIE(InfoExtractor):
'id': '274175099429670', 'id': '274175099429670',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Facebook video #274175099429670', 'title': 'Facebook video #274175099429670',
'uploader': 'Asif Nawab Butt',
}, },
'expected_warnings': [ 'expected_warnings': [
'title' 'title'
@ -136,7 +140,7 @@ class FacebookIE(InfoExtractor):
else: else:
raise ExtractorError('Cannot parse data') raise ExtractorError('Cannot parse data')
data = dict(json.loads(m.group(1))) data = dict(json.loads(m.group(1)))
params_raw = compat_urllib_parse.unquote(data['params']) params_raw = compat_urllib_parse_unquote(data['params'])
params = json.loads(params_raw) params = json.loads(params_raw)
video_data = params['video_data'][0] video_data = params['video_data'][0]
@ -161,6 +165,7 @@ class FacebookIE(InfoExtractor):
video_title = limit_length(video_title, 80) video_title = limit_length(video_title, 80)
if not video_title: if not video_title:
video_title = 'Facebook video #%s' % video_id video_title = 'Facebook video #%s' % video_id
uploader = clean_html(get_element_by_id('fbPhotoPageAuthorName', webpage))
return { return {
'id': video_id, 'id': video_id,
@ -168,4 +173,5 @@ class FacebookIE(InfoExtractor):
'formats': formats, 'formats': formats,
'duration': int_or_none(video_data.get('video_duration')), 'duration': int_or_none(video_data.get('video_duration')),
'thumbnail': video_data.get('thumbnail_src'), 'thumbnail': video_data.get('thumbnail_src'),
'uploader': uploader,
} }

View File

@ -6,15 +6,11 @@ import re
import json import json
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import compat_urlparse
compat_urllib_parse_urlparse,
compat_urlparse,
)
from ..utils import ( from ..utils import (
clean_html, clean_html,
ExtractorError, ExtractorError,
int_or_none, int_or_none,
float_or_none,
parse_duration, parse_duration,
determine_ext, determine_ext,
) )
@ -59,12 +55,12 @@ class FranceTVBaseInfoExtractor(InfoExtractor):
# See https://github.com/rg3/youtube-dl/issues/3963 # See https://github.com/rg3/youtube-dl/issues/3963
# m3u8 urls work fine # m3u8 urls work fine
continue continue
video_url_parsed = compat_urllib_parse_urlparse(video_url)
f4m_url = self._download_webpage( f4m_url = self._download_webpage(
'http://hdfauth.francetv.fr/esi/TA?url=%s' % video_url_parsed.path, 'http://hdfauth.francetv.fr/esi/TA?url=%s' % video_url,
video_id, 'Downloading f4m manifest token', fatal=False) video_id, 'Downloading f4m manifest token', fatal=False)
if f4m_url: if f4m_url:
formats.extend(self._extract_f4m_formats(f4m_url, video_id, 1, format_id)) formats.extend(self._extract_f4m_formats(
f4m_url + '&hdcore=3.7.0&plugin=aasp-3.7.0.39.44', video_id, 1, format_id))
elif ext == 'm3u8': elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(video_url, video_id, 'mp4', m3u8_id=format_id)) formats.extend(self._extract_m3u8_formats(video_url, video_id, 'mp4', m3u8_id=format_id))
elif video_url.startswith('rtmp'): elif video_url.startswith('rtmp'):
@ -87,7 +83,7 @@ class FranceTVBaseInfoExtractor(InfoExtractor):
'title': info['titre'], 'title': info['titre'],
'description': clean_html(info['synopsis']), 'description': clean_html(info['synopsis']),
'thumbnail': compat_urlparse.urljoin('http://pluzz.francetv.fr', info['image']), 'thumbnail': compat_urlparse.urljoin('http://pluzz.francetv.fr', info['image']),
'duration': float_or_none(info.get('real_duration'), 1000) or parse_duration(info['duree']), 'duration': int_or_none(info.get('real_duration')) or parse_duration(info['duree']),
'timestamp': int_or_none(info['diffusion']['timestamp']), 'timestamp': int_or_none(info['diffusion']['timestamp']),
'formats': formats, 'formats': formats,
} }
@ -160,11 +156,21 @@ class FranceTvInfoIE(FranceTVBaseInfoExtractor):
class FranceTVIE(FranceTVBaseInfoExtractor): class FranceTVIE(FranceTVBaseInfoExtractor):
IE_NAME = 'francetv' IE_NAME = 'francetv'
IE_DESC = 'France 2, 3, 4, 5 and Ô' IE_DESC = 'France 2, 3, 4, 5 and Ô'
_VALID_URL = r'''(?x)https?://www\.france[2345o]\.fr/ _VALID_URL = r'''(?x)
https?://
(?: (?:
emissions/.*?/(videos|emissions)/(?P<id>[^/?]+) (?:www\.)?france[2345o]\.fr/
| (emissions?|jt)/(?P<key>[^/?]+) (?:
)''' emissions/[^/]+/(?:videos|diffusions)|
emission/[^/]+|
videos|
jt
)
/|
embed\.francetv\.fr/\?ue=
)
(?P<id>[^/?]+)
'''
_TESTS = [ _TESTS = [
# france2 # france2
@ -221,24 +227,46 @@ class FranceTVIE(FranceTVBaseInfoExtractor):
}, },
# franceo # franceo
{ {
'url': 'http://www.franceo.fr/jt/info-afrique/04-12-2013', 'url': 'http://www.franceo.fr/jt/info-soir/18-07-2015',
'md5': '52f0bfe202848b15915a2f39aaa8981b', 'md5': '47d5816d3b24351cdce512ad7ab31da8',
'info_dict': { 'info_dict': {
'id': '108634970', 'id': '125377621',
'ext': 'flv', 'ext': 'flv',
'title': 'Infô Afrique', 'title': 'Infô soir',
'description': 'md5:ebf346da789428841bee0fd2a935ea55', 'description': 'md5:01b8c6915a3d93d8bbbd692651714309',
'upload_date': '20140915', 'upload_date': '20150718',
'timestamp': 1410822000, 'timestamp': 1437241200,
'duration': 414,
}, },
}, },
{
# francetv embed
'url': 'http://embed.francetv.fr/?ue=8d7d3da1e3047c42ade5a5d7dfd3fc87',
'info_dict': {
'id': 'EV_30231',
'ext': 'flv',
'title': 'Alcaline, le concert avec Calogero',
'description': 'md5:61f08036dcc8f47e9cfc33aed08ffaff',
'upload_date': '20150226',
'timestamp': 1424989860,
'duration': 5400,
},
},
{
'url': 'http://www.france4.fr/emission/highlander/diffusion-du-17-07-2015-04h05',
'only_matching': True,
},
{
'url': 'http://www.franceo.fr/videos/125377617',
'only_matching': True,
}
] ]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) video_id = self._match_id(url)
webpage = self._download_webpage(url, mobj.group('key') or mobj.group('id')) webpage = self._download_webpage(url, video_id)
video_id, catalogue = self._html_search_regex( video_id, catalogue = self._html_search_regex(
r'href="http://videos\.francetv\.fr/video/([^@]+@[^"]+)"', r'href="http://videos?\.francetv\.fr/video/([^@]+@[^"]+)"',
webpage, 'video ID').split('@') webpage, 'video ID').split('@')
return self._extract_video(video_id, catalogue) return self._extract_video(video_id, catalogue)
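
Note on the `_VALID_URL` change above: a minimal sketch, not part of the commit, for checking that the rewritten pattern yields a single `id` group for each URL shape in `_TESTS`. The pattern is copied from the new side of the hunk; `match_id` is a hypothetical stand-in for the extractor's `_match_id` helper and only assumes it returns the `id` group.

```python
import re

# New-side pattern, copied from the hunk above (verbose mode ignores the layout).
VALID_URL = r'''(?x)
    https?://
    (?:
        (?:www\.)?france[2345o]\.fr/
        (?:
            emissions/[^/]+/(?:videos|diffusions)|
            emission/[^/]+|
            videos|
            jt
        )
        /|
        embed\.francetv\.fr/\?ue=
    )
    (?P<id>[^/?]+)
'''


def match_id(url):
    """Hypothetical stand-in for _match_id: return the 'id' group or None."""
    mobj = re.match(VALID_URL, url)
    return mobj.group('id') if mobj else None


if __name__ == '__main__':
    for url in (
        'http://www.franceo.fr/jt/info-soir/18-07-2015',
        'http://embed.francetv.fr/?ue=8d7d3da1e3047c42ade5a5d7dfd3fc87',
        'http://www.france4.fr/emission/highlander/diffusion-du-17-07-2015-04h05',
        'http://www.franceo.fr/videos/125377617',
    ):
        print(url, '->', match_id(url))
```

Because the old `key` group is gone, the value is only used as a display ID for the page download; the real video ID still comes from the `videos?.francetv.fr` link found in the page.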


@ -5,7 +5,7 @@ import json
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_urllib_parse, compat_urllib_parse_unquote,
compat_urlparse, compat_urlparse,
) )
from ..utils import ( from ..utils import (
@ -75,7 +75,7 @@ class GameSpotIE(InfoExtractor):
return { return {
'id': data_video['guid'], 'id': data_video['guid'],
'display_id': page_id, 'display_id': page_id,
'title': compat_urllib_parse.unquote(data_video['title']), 'title': compat_urllib_parse_unquote(data_video['title']),
'formats': formats, 'formats': formats,
'description': self._html_search_meta('description', webpage), 'description': self._html_search_meta('description', webpage),
'thumbnail': self._og_search_thumbnail(webpage), 'thumbnail': self._og_search_thumbnail(webpage),


@ -7,7 +7,10 @@ from ..compat import (
compat_urllib_parse, compat_urllib_parse,
compat_urllib_request, compat_urllib_request,
) )
from ..utils import remove_end from ..utils import (
remove_end,
HEADRequest,
)
class GDCVaultIE(InfoExtractor): class GDCVaultIE(InfoExtractor):
@ -73,10 +76,20 @@ class GDCVaultIE(InfoExtractor):
return video_formats return video_formats
def _parse_flv(self, xml_description): def _parse_flv(self, xml_description):
video_formats = [] formats = []
akamai_url = xml_description.find('./metadata/akamaiHost').text akamai_url = xml_description.find('./metadata/akamaiHost').text
audios = xml_description.find('./metadata/audios')
if audios is not None:
for audio in audios:
formats.append({
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
'play_path': remove_end(audio.get('url'), '.flv'),
'ext': 'flv',
'vcodec': 'none',
'format_id': audio.get('code'),
})
slide_video_path = xml_description.find('./metadata/slideVideo').text slide_video_path = xml_description.find('./metadata/slideVideo').text
video_formats.append({ formats.append({
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url, 'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
'play_path': remove_end(slide_video_path, '.flv'), 'play_path': remove_end(slide_video_path, '.flv'),
'ext': 'flv', 'ext': 'flv',
@ -86,7 +99,7 @@ class GDCVaultIE(InfoExtractor):
'format_id': 'slides', 'format_id': 'slides',
}) })
speaker_video_path = xml_description.find('./metadata/speakerVideo').text speaker_video_path = xml_description.find('./metadata/speakerVideo').text
video_formats.append({ formats.append({
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url, 'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
'play_path': remove_end(speaker_video_path, '.flv'), 'play_path': remove_end(speaker_video_path, '.flv'),
'ext': 'flv', 'ext': 'flv',
@ -95,7 +108,7 @@ class GDCVaultIE(InfoExtractor):
'preference': -1, 'preference': -1,
'format_id': 'speaker', 'format_id': 'speaker',
}) })
return video_formats return formats
def _login(self, webpage_url, display_id): def _login(self, webpage_url, display_id):
(username, password) = self._get_login_info() (username, password) = self._get_login_info()
@ -133,16 +146,18 @@ class GDCVaultIE(InfoExtractor):
r's1\.addVariable\("file",\s*encodeURIComponent\("(/[^"]+)"\)\);', r's1\.addVariable\("file",\s*encodeURIComponent\("(/[^"]+)"\)\);',
start_page, 'url', default=None) start_page, 'url', default=None)
if direct_url: if direct_url:
video_url = 'http://www.gdcvault.com/' + direct_url
title = self._html_search_regex( title = self._html_search_regex(
r'<td><strong>Session Name</strong></td>\s*<td>(.*?)</td>', r'<td><strong>Session Name</strong></td>\s*<td>(.*?)</td>',
start_page, 'title') start_page, 'title')
video_url = 'http://www.gdcvault.com' + direct_url
# resolve the url so that we can detect the correct extension
head = self._request_webpage(HEADRequest(video_url), video_id)
video_url = head.geturl()
return { return {
'id': video_id, 'id': video_id,
'display_id': display_id, 'display_id': display_id,
'url': video_url, 'url': video_url,
'ext': 'flv',
'title': title, 'title': title,
} }
@ -168,8 +183,8 @@ class GDCVaultIE(InfoExtractor):
# Fallback to the older format # Fallback to the older format
xml_name = self._html_search_regex(r'<iframe src=".*?\?xmlURL=xml/(?P<xml_file>.+?\.xml).*?".*?</iframe>', start_page, 'xml filename') xml_name = self._html_search_regex(r'<iframe src=".*?\?xmlURL=xml/(?P<xml_file>.+?\.xml).*?".*?</iframe>', start_page, 'xml filename')
xml_decription_url = xml_root + 'xml/' + xml_name xml_description_url = xml_root + 'xml/' + xml_name
xml_description = self._download_xml(xml_decription_url, display_id) xml_description = self._download_xml(xml_description_url, display_id)
video_title = xml_description.find('./metadata/title').text video_title = xml_description.find('./metadata/title').text
video_formats = self._parse_mp4(xml_description) video_formats = self._parse_mp4(xml_description)


@ -8,7 +8,6 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from .youtube import YoutubeIE from .youtube import YoutubeIE
from ..compat import ( from ..compat import (
compat_urllib_parse,
compat_urllib_parse_unquote, compat_urllib_parse_unquote,
compat_urllib_request, compat_urllib_request,
compat_urlparse, compat_urlparse,
@ -37,6 +36,7 @@ from .rutv import RUTVIE
from .tvc import TVCIE from .tvc import TVCIE
from .sportbox import SportBoxEmbedIE from .sportbox import SportBoxEmbedIE
from .smotri import SmotriIE from .smotri import SmotriIE
from .myvi import MyviIE
from .condenast import CondeNastIE from .condenast import CondeNastIE
from .udn import UDNEmbedIE from .udn import UDNEmbedIE
from .senateisvp import SenateISVPIE from .senateisvp import SenateISVPIE
@ -130,6 +130,74 @@ class GenericIE(InfoExtractor):
'title': 'pdv_maddow_netcast_m4v-02-27-2015-201624', 'title': 'pdv_maddow_netcast_m4v-02-27-2015-201624',
} }
}, },
# SMIL from http://videolectures.net/promogram_igor_mekjavic_eng
{
'url': 'http://videolectures.net/promogram_igor_mekjavic_eng/video/1/smil.xml',
'info_dict': {
'id': 'smil',
'ext': 'mp4',
'title': 'Automatics, robotics and biocybernetics',
'description': 'md5:815fc1deb6b3a2bff99de2d5325be482',
'formats': 'mincount:16',
'subtitles': 'mincount:1',
},
'params': {
'force_generic_extractor': True,
'skip_download': True,
},
},
# SMIL from http://www1.wdr.de/mediathek/video/livestream/index.html
{
'url': 'http://metafilegenerator.de/WDR/WDR_FS/hds/hds.smil',
'info_dict': {
'id': 'hds',
'ext': 'flv',
'title': 'hds',
'formats': 'mincount:1',
},
'params': {
'skip_download': True,
},
},
# SMIL from https://www.restudy.dk/video/play/id/1637
{
'url': 'https://www.restudy.dk/awsmedia/SmilDirectory/video_1637.xml',
'info_dict': {
'id': 'video_1637',
'ext': 'flv',
'title': 'video_1637',
'formats': 'mincount:3',
},
'params': {
'skip_download': True,
},
},
# SMIL from http://adventure.howstuffworks.com/5266-cool-jobs-iditarod-musher-video.htm
{
'url': 'http://services.media.howstuffworks.com/videos/450221/smil-service.smil',
'info_dict': {
'id': 'smil-service',
'ext': 'flv',
'title': 'smil-service',
'formats': 'mincount:1',
},
'params': {
'skip_download': True,
},
},
# SMIL from http://new.livestream.com/CoheedandCambria/WebsterHall/videos/4719370
{
'url': 'http://api.new.livestream.com/accounts/1570303/events/1585861/videos/4719370.smil',
'info_dict': {
'id': '4719370',
'ext': 'mp4',
'title': '571de1fd-47bc-48db-abf9-238872a58d1f',
'formats': 'mincount:3',
},
'params': {
'skip_download': True,
},
},
# google redirect # google redirect
{ {
'url': 'http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CCUQtwIwAA&url=http%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DcmQHVoWB5FY&ei=F-sNU-LLCaXk4QT52ICQBQ&usg=AFQjCNEw4hL29zgOohLXvpJ-Bdh2bils1Q&bvm=bv.61965928,d.bGE', 'url': 'http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CCUQtwIwAA&url=http%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DcmQHVoWB5FY&ei=F-sNU-LLCaXk4QT52ICQBQ&usg=AFQjCNEw4hL29zgOohLXvpJ-Bdh2bils1Q&bvm=bv.61965928,d.bGE',
@ -236,6 +304,19 @@ class GenericIE(InfoExtractor):
}, },
'add_ie': ['Ooyala'], 'add_ie': ['Ooyala'],
}, },
{
# ooyala video embedded with http://player.ooyala.com/iframe.js
'url': 'http://www.macrumors.com/2015/07/24/steve-jobs-the-man-in-the-machine-first-trailer/',
'info_dict': {
'id': 'p0MGJndjoG5SOKqO_hZJuZFPB-Tr5VgB',
'ext': 'mp4',
'title': '"Steve Jobs: Man in the Machine" trailer',
'description': 'The first trailer for the Alex Gibney documentary "Steve Jobs: Man in the Machine."',
},
'params': {
'skip_download': True,
},
},
# multiple ooyala embeds on SBN network websites # multiple ooyala embeds on SBN network websites
{ {
'url': 'http://www.sbnation.com/college-football-recruiting/2015/2/3/7970291/national-signing-day-rationalizations-itll-be-ok-itll-be-ok', 'url': 'http://www.sbnation.com/college-football-recruiting/2015/2/3/7970291/national-signing-day-rationalizations-itll-be-ok-itll-be-ok',
@ -276,14 +357,6 @@ class GenericIE(InfoExtractor):
'description': 'Episode 18: President Barack Obama sits down with Zach Galifianakis for his most memorable interview yet.', 'description': 'Episode 18: President Barack Obama sits down with Zach Galifianakis for his most memorable interview yet.',
}, },
}, },
# BBC iPlayer embeds
{
'url': 'http://www.bbc.co.uk/blogs/adamcurtis/posts/BUGGER',
'info_dict': {
'title': 'BBC - Blogs - Adam Curtis - BUGGER',
},
'playlist_mincount': 18,
},
# RUTV embed # RUTV embed
{ {
'url': 'http://www.rg.ru/2014/03/15/reg-dfo/anklav-anons.html', 'url': 'http://www.rg.ru/2014/03/15/reg-dfo/anklav-anons.html',
@ -338,6 +411,17 @@ class GenericIE(InfoExtractor):
'skip_download': True, 'skip_download': True,
}, },
}, },
# Myvi.ru embed
{
'url': 'http://www.kinomyvi.tv/news/detail/Pervij-dublirovannij-trejler--Uzhastikov-_nOw1',
'info_dict': {
'id': 'f4dafcad-ff21-423d-89b5-146cfd89fa1e',
'ext': 'mp4',
'title': 'Ужастики, русский трейлер (2015)',
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 153,
}
},
# XHamster embed # XHamster embed
{ {
'url': 'http://www.numisc.com/forum/showthread.php?11696-FM15-which-pumiscer-was-this-%28-vid-%29-%28-alfa-as-fuck-srx-%29&s=711f5db534502e22260dec8c5e2d66d8', 'url': 'http://www.numisc.com/forum/showthread.php?11696-FM15-which-pumiscer-was-this-%28-vid-%29-%28-alfa-as-fuck-srx-%29&s=711f5db534502e22260dec8c5e2d66d8',
@ -396,6 +480,26 @@ class GenericIE(InfoExtractor):
'skip_download': 'Requires rtmpdump' 'skip_download': 'Requires rtmpdump'
} }
}, },
# francetv embed
{
'url': 'http://www.tsprod.com/replay-du-concert-alcaline-de-calogero',
'info_dict': {
'id': 'EV_30231',
'ext': 'mp4',
'title': 'Alcaline, le concert avec Calogero',
'description': 'md5:61f08036dcc8f47e9cfc33aed08ffaff',
'upload_date': '20150226',
'timestamp': 1424989860,
'duration': 5400,
},
'params': {
# m3u8 downloads
'skip_download': True,
},
'expected_warnings': [
'Forbidden'
]
},
# Condé Nast embed # Condé Nast embed
{ {
'url': 'http://www.wired.com/2014/04/honda-asimo/', 'url': 'http://www.wired.com/2014/04/honda-asimo/',
@ -1087,11 +1191,13 @@ class GenericIE(InfoExtractor):
self.report_extraction(video_id) self.report_extraction(video_id)
# Is it an RSS feed? # Is it an RSS feed or a SMIL file?
try: try:
doc = parse_xml(webpage) doc = parse_xml(webpage)
if doc.tag == 'rss': if doc.tag == 'rss':
return self._extract_rss(url, video_id, doc) return self._extract_rss(url, video_id, doc)
elif re.match(r'^(?:{[^}]+})?smil$', doc.tag):
return self._parse_smil(doc, url, video_id)
except compat_xml_parse_error: except compat_xml_parse_error:
pass pass
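
Note on the SMIL check above: the root tag test has to accept both a bare `smil` element and a namespace-qualified one, which ElementTree reports as `{namespace}smil`. A minimal sketch of that check, not part of the commit, using only the standard library; `is_smil` is a hypothetical helper name, and the regex is the one from the hunk.

```python
import re
import xml.etree.ElementTree as ET

# Same expression as in the hunk: an optional '{namespace}' prefix, then 'smil'.
SMIL_TAG_RE = r'^(?:{[^}]+})?smil$'


def is_smil(xml_text):
    """Return True if the document's root element is <smil>, namespaced or not."""
    root = ET.fromstring(xml_text)
    return re.match(SMIL_TAG_RE, root.tag) is not None


if __name__ == '__main__':
    plain = '<smil><body/></smil>'
    namespaced = '<smil xmlns="http://www.w3.org/2001/SMIL20/Language"><body/></smil>'
    feed = '<rss version="2.0"><channel/></rss>'
    print(is_smil(plain), is_smil(namespaced), is_smil(feed))
```

Comparing `doc.tag == 'smil'` directly would miss documents that declare a default namespace, which is presumably why the commit uses a regex here.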
@ -1103,7 +1209,7 @@ class GenericIE(InfoExtractor):
# Sometimes embedded video player is hidden behind percent encoding # Sometimes embedded video player is hidden behind percent encoding
# (e.g. https://github.com/rg3/youtube-dl/issues/2448) # (e.g. https://github.com/rg3/youtube-dl/issues/2448)
# Unescaping the whole page allows to handle those cases in a generic way # Unescaping the whole page allows to handle those cases in a generic way
webpage = compat_urllib_parse.unquote(webpage) webpage = compat_urllib_parse_unquote(webpage)
# it's tempting to parse this further, but you would # it's tempting to parse this further, but you would
# have to take into account all the variations like # have to take into account all the variations like
@ -1165,6 +1271,12 @@ class GenericIE(InfoExtractor):
if vimeo_url is not None: if vimeo_url is not None:
return self.url_result(vimeo_url) return self.url_result(vimeo_url)
vid_me_embed_url = self._search_regex(
r'src=[\'"](https?://vid\.me/[^\'"]+)[\'"]',
webpage, 'vid.me embed', default=None)
if vid_me_embed_url is not None:
return self.url_result(vid_me_embed_url, 'Vidme')
# Look for embedded YouTube player # Look for embedded YouTube player
matches = re.findall(r'''(?x) matches = re.findall(r'''(?x)
(?: (?:
@ -1291,7 +1403,7 @@ class GenericIE(InfoExtractor):
return self.url_result(mobj.group('url')) return self.url_result(mobj.group('url'))
# Look for Ooyala videos # Look for Ooyala videos
mobj = (re.search(r'player\.ooyala\.com/[^"?]+\?[^"]*?(?:embedCode|ec)=(?P<ec>[^"&]+)', webpage) or mobj = (re.search(r'player\.ooyala\.com/[^"?]+[?#][^"]*?(?:embedCode|ec)=(?P<ec>[^"&]+)', webpage) or
re.search(r'OO\.Player\.create\([\'"].*?[\'"],\s*[\'"](?P<ec>.{32})[\'"]', webpage) or re.search(r'OO\.Player\.create\([\'"].*?[\'"],\s*[\'"](?P<ec>.{32})[\'"]', webpage) or
re.search(r'SBN\.VideoLinkset\.ooyala\([\'"](?P<ec>.{32})[\'"]\)', webpage) or re.search(r'SBN\.VideoLinkset\.ooyala\([\'"](?P<ec>.{32})[\'"]\)', webpage) or
re.search(r'data-ooyala-video-id\s*=\s*[\'"](?P<ec>.{32})[\'"]', webpage)) re.search(r'data-ooyala-video-id\s*=\s*[\'"](?P<ec>.{32})[\'"]', webpage))
@ -1357,7 +1469,7 @@ class GenericIE(InfoExtractor):
return self.url_result(mobj.group('url')) return self.url_result(mobj.group('url'))
mobj = re.search(r'class=["\']embedly-embed["\'][^>]src=["\'][^"\']*url=(?P<url>[^&]+)', webpage) mobj = re.search(r'class=["\']embedly-embed["\'][^>]src=["\'][^"\']*url=(?P<url>[^&]+)', webpage)
if mobj is not None: if mobj is not None:
return self.url_result(compat_urllib_parse.unquote(mobj.group('url'))) return self.url_result(compat_urllib_parse_unquote(mobj.group('url')))
# Look for funnyordie embed # Look for funnyordie embed
matches = re.findall(r'<iframe[^>]+?src="(https?://(?:www\.)?funnyordie\.com/embed/[^"]+)"', webpage) matches = re.findall(r'<iframe[^>]+?src="(https?://(?:www\.)?funnyordie\.com/embed/[^"]+)"', webpage)
@ -1420,11 +1532,23 @@ class GenericIE(InfoExtractor):
if mobj is not None: if mobj is not None:
return self.url_result(mobj.group('url'), 'ArteTVEmbed') return self.url_result(mobj.group('url'), 'ArteTVEmbed')
# Look for embedded francetv player
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?://)?embed\.francetv\.fr/\?ue=.+?)\1',
webpage)
if mobj is not None:
return self.url_result(mobj.group('url'))
# Look for embedded smotri.com player # Look for embedded smotri.com player
smotri_url = SmotriIE._extract_url(webpage) smotri_url = SmotriIE._extract_url(webpage)
if smotri_url: if smotri_url:
return self.url_result(smotri_url, 'Smotri') return self.url_result(smotri_url, 'Smotri')
# Look for embedded Myvi.ru player
myvi_url = MyviIE._extract_url(webpage)
if myvi_url:
return self.url_result(myvi_url)
# Look for embedded soundcloud player
mobj = re.search( mobj = re.search(
r'<iframe\s+(?:[a-zA-Z0-9_-]+="[^"]+"\s+)*src="(?P<url>https?://(?:w\.)?soundcloud\.com/player[^"]+)"', r'<iframe\s+(?:[a-zA-Z0-9_-]+="[^"]+"\s+)*src="(?P<url>https?://(?:w\.)?soundcloud\.com/player[^"]+)"',
@ -1614,7 +1738,7 @@ class GenericIE(InfoExtractor):
if not found: if not found:
# Broaden the findall a little bit: JWPlayer JS loader # Broaden the findall a little bit: JWPlayer JS loader
found = filter_video(re.findall( found = filter_video(re.findall(
r'[^A-Za-z0-9]?file["\']?:\s*["\'](http(?![^\'"]+\.[0-9]+[\'"])[^\'"]+)["\']', webpage)) r'[^A-Za-z0-9]?(?:file|video_url)["\']?:\s*["\'](http(?![^\'"]+\.[0-9]+[\'"])[^\'"]+)["\']', webpage))
if not found: if not found:
# Flow player # Flow player
found = filter_video(re.findall(r'''(?xs) found = filter_video(re.findall(r'''(?xs)
@ -1653,7 +1777,7 @@ class GenericIE(InfoExtractor):
if refresh_header: if refresh_header:
found = re.search(REDIRECT_REGEX, refresh_header) found = re.search(REDIRECT_REGEX, refresh_header)
if found: if found:
new_url = compat_urlparse.urljoin(url, found.group(1)) new_url = compat_urlparse.urljoin(url, unescapeHTML(found.group(1)))
self.report_following_redirect(new_url) self.report_following_redirect(new_url)
return { return {
'_type': 'url', '_type': 'url',
@ -1665,7 +1789,7 @@ class GenericIE(InfoExtractor):
entries = [] entries = []
for video_url in found: for video_url in found:
video_url = compat_urlparse.urljoin(url, video_url) video_url = compat_urlparse.urljoin(url, video_url)
video_id = compat_urllib_parse.unquote(os.path.basename(video_url)) video_id = compat_urllib_parse_unquote(os.path.basename(video_url))
# Sometimes, jwplayer extraction will result in a YouTube URL # Sometimes, jwplayer extraction will result in a YouTube URL
if YoutubeIE.suitable(video_url): if YoutubeIE.suitable(video_url):


@ -6,12 +6,13 @@ from ..utils import (
int_or_none, int_or_none,
float_or_none, float_or_none,
qualities, qualities,
ExtractorError,
) )
class GfycatIE(InfoExtractor): class GfycatIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?gfycat\.com/(?P<id>[^/?#]+)' _VALID_URL = r'https?://(?:www\.)?gfycat\.com/(?:ifr/)?(?P<id>[^/?#]+)'
_TEST = { _TESTS = [{
'url': 'http://gfycat.com/DeadlyDecisiveGermanpinscher', 'url': 'http://gfycat.com/DeadlyDecisiveGermanpinscher',
'info_dict': { 'info_dict': {
'id': 'DeadlyDecisiveGermanpinscher', 'id': 'DeadlyDecisiveGermanpinscher',
@ -27,14 +28,33 @@ class GfycatIE(InfoExtractor):
'categories': list, 'categories': list,
'age_limit': 0, 'age_limit': 0,
} }
}, {
'url': 'http://gfycat.com/ifr/JauntyTimelyAmazontreeboa',
'info_dict': {
'id': 'JauntyTimelyAmazontreeboa',
'ext': 'mp4',
'title': 'JauntyTimelyAmazontreeboa',
'timestamp': 1411720126,
'upload_date': '20140926',
'uploader': 'anonymous',
'duration': 3.52,
'view_count': int,
'like_count': int,
'dislike_count': int,
'categories': list,
'age_limit': 0,
} }
}]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
gfy = self._download_json( gfy = self._download_json(
'http://gfycat.com/cajax/get/%s' % video_id, 'http://gfycat.com/cajax/get/%s' % video_id,
video_id, 'Downloading video info')['gfyItem'] video_id, 'Downloading video info')
if 'error' in gfy:
raise ExtractorError('Gfycat said: ' + gfy['error'], expected=True)
gfy = gfy['gfyItem']
title = gfy.get('title') or gfy['gfyName'] title = gfy.get('title') or gfy['gfyName']
description = gfy.get('description') description = gfy.get('description')
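
Note on the error handling above: the response is now inspected for an `error` key before `gfyItem` is read, so a missing video surfaces as a readable message instead of a `KeyError`. A minimal sketch of that pattern, not part of the commit; the `ExtractorError` class here is a simplified stand-in for the one in `..utils`, `unwrap_gfy_item` is a hypothetical name, and the payload shapes are assumptions inferred from the hunk.

```python
class ExtractorError(Exception):
    """Simplified stand-in for youtube_dl.utils.ExtractorError (assumption)."""

    def __init__(self, msg, expected=False):
        super().__init__(msg)
        # expected=True marks a user-facing failure rather than an extractor bug.
        self.expected = expected


def unwrap_gfy_item(response):
    """Return the 'gfyItem' payload, raising a readable error if the API reported one."""
    if 'error' in response:
        raise ExtractorError('Gfycat said: ' + response['error'], expected=True)
    return response['gfyItem']


if __name__ == '__main__':
    print(unwrap_gfy_item({'gfyItem': {'gfyName': 'JauntyTimelyAmazontreeboa'}}))
    try:
        unwrap_gfy_item({'error': 'not found'})  # hypothetical error payload
    except ExtractorError as err:
        print(err, err.expected)
```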


@ -78,12 +78,7 @@ class GorillaVidIE(InfoExtractor):
if re.search(self._FILE_NOT_FOUND_REGEX, webpage) is not None: if re.search(self._FILE_NOT_FOUND_REGEX, webpage) is not None:
raise ExtractorError('Video %s does not exist' % video_id, expected=True) raise ExtractorError('Video %s does not exist' % video_id, expected=True)
fields = dict(re.findall(r'''(?x)<input\s+ fields = self._hidden_inputs(webpage)
type="hidden"\s+
name="([^"]+)"\s+
(?:id="[^"]+"\s+)?
value="([^"]*)"
''', webpage))
if fields['op'] == 'download1': if fields['op'] == 'download1':
countdown = int_or_none(self._search_regex( countdown = int_or_none(self._search_regex(


@ -58,11 +58,7 @@ class HostingBulkIE(InfoExtractor):
r'<img src="([^"]+)".+?class="pic"', r'<img src="([^"]+)".+?class="pic"',
webpage, 'thumbnail', fatal=False) webpage, 'thumbnail', fatal=False)
fields = dict(re.findall(r'''(?x)<input\s+ fields = self._hidden_inputs(webpage)
type="hidden"\s+
name="([^"]+)"\s+
value="([^"]*)"
''', webpage))
request = compat_urllib_request.Request(url, urlencode_postdata(fields)) request = compat_urllib_request.Request(url, urlencode_postdata(fields))
request.add_header('Content-type', 'application/x-www-form-urlencoded') request.add_header('Content-type', 'application/x-www-form-urlencoded')


@ -10,7 +10,7 @@ from ..utils import (
class HowStuffWorksIE(InfoExtractor): class HowStuffWorksIE(InfoExtractor):
_VALID_URL = r'https?://[\da-z-]+\.howstuffworks\.com/(?:[^/]+/)*\d+-(?P<id>.+?)-video\.htm' _VALID_URL = r'https?://[\da-z-]+\.howstuffworks\.com/(?:[^/]+/)*(?:\d+-)?(?P<id>.+?)-video\.htm'
_TESTS = [ _TESTS = [
{ {
'url': 'http://adventure.howstuffworks.com/5266-cool-jobs-iditarod-musher-video.htm', 'url': 'http://adventure.howstuffworks.com/5266-cool-jobs-iditarod-musher-video.htm',
@ -46,6 +46,10 @@ class HowStuffWorksIE(InfoExtractor):
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
}, },
}, },
{
'url': 'http://shows.howstuffworks.com/stuff-to-blow-your-mind/optical-illusions-video.htm',
'only_matching': True,
}
] ]
def _real_extract(self, url): def _real_extract(self, url):
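
Note on the `_VALID_URL` change above: the numeric prefix before the slug becomes optional. A minimal sketch, not part of the commit, comparing the old and new patterns on the two URL shapes from `_TESTS`; `extract_id` is a hypothetical helper.

```python
import re

# Old and new patterns, copied from the two sides of the hunk.
OLD = r'https?://[\da-z-]+\.howstuffworks\.com/(?:[^/]+/)*\d+-(?P<id>.+?)-video\.htm'
NEW = r'https?://[\da-z-]+\.howstuffworks\.com/(?:[^/]+/)*(?:\d+-)?(?P<id>.+?)-video\.htm'


def extract_id(pattern, url):
    """Return the 'id' group for url under pattern, or None if it does not match."""
    mobj = re.match(pattern, url)
    return mobj.group('id') if mobj else None


if __name__ == '__main__':
    numbered = 'http://adventure.howstuffworks.com/5266-cool-jobs-iditarod-musher-video.htm'
    unnumbered = 'http://shows.howstuffworks.com/stuff-to-blow-your-mind/optical-illusions-video.htm'
    for url in (numbered, unnumbered):
        print(extract_id(OLD, url), extract_id(NEW, url))
```

The numbered URL yields the same ID under both patterns, while the unnumbered one only matches the new pattern.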


@ -4,7 +4,7 @@ import base64
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_urllib_parse, compat_urllib_parse_unquote,
compat_urlparse, compat_urlparse,
) )
@ -39,7 +39,7 @@ class InfoQIE(InfoExtractor):
# Extract video URL # Extract video URL
encoded_id = self._search_regex( encoded_id = self._search_regex(
r"jsclassref\s*=\s*'([^']*)'", webpage, 'encoded id') r"jsclassref\s*=\s*'([^']*)'", webpage, 'encoded id')
real_id = compat_urllib_parse.unquote(base64.b64decode(encoded_id.encode('ascii')).decode('utf-8')) real_id = compat_urllib_parse_unquote(base64.b64decode(encoded_id.encode('ascii')).decode('utf-8'))
playpath = 'mp4:' + real_id playpath = 'mp4:' + real_id
video_filename = playpath.split('/')[-1] video_filename = playpath.split('/')[-1]
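
Note on the decoding step above: the page value is base64-decoded first and percent-decoded second. A minimal sketch of that order, not part of the commit, assuming `compat_urllib_parse_unquote` behaves like Python 3's `urllib.parse.unquote`; `decode_real_id` is a hypothetical helper and the sample path is invented for illustration.

```python
import base64
from urllib.parse import quote, unquote


def decode_real_id(encoded_id):
    """Base64-decode the page value, then undo percent-encoding."""
    raw = base64.b64decode(encoded_id.encode('ascii')).decode('utf-8')
    return unquote(raw)


if __name__ == '__main__':
    # Invented sample: percent-encode a path, then base64-encode it as the page would.
    sample = base64.b64encode(quote('presentations/my talk.mp4').encode('ascii')).decode('ascii')
    real_id = decode_real_id(sample)
    playpath = 'mp4:' + real_id
    print(playpath, playpath.split('/')[-1])
```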


@ -3,23 +3,18 @@ from __future__ import unicode_literals
import hashlib import hashlib
import math import math
import os.path
import random import random
import re
import time import time
import uuid import uuid
import zlib
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urllib_parse from ..compat import compat_urllib_parse
from ..utils import ( from ..utils import ExtractorError
ExtractorError,
url_basename,
)
class IqiyiIE(InfoExtractor): class IqiyiIE(InfoExtractor):
IE_NAME = 'iqiyi' IE_NAME = 'iqiyi'
IE_DESC = '爱奇艺'
_VALID_URL = r'http://(?:www\.)iqiyi.com/v_.+?\.html' _VALID_URL = r'http://(?:www\.)iqiyi.com/v_.+?\.html'
@ -38,62 +33,57 @@ class IqiyiIE(InfoExtractor):
'title': '名侦探柯南第752集', 'title': '名侦探柯南第752集',
}, },
'playlist': [{ 'playlist': [{
'md5': '7e49376fecaffa115d951634917fe105',
'info_dict': { 'info_dict': {
'id': 'e3f585b550a280af23c98b6cb2be19fb_part1', 'id': 'e3f585b550a280af23c98b6cb2be19fb_part1',
'ext': 'f4v', 'ext': 'f4v',
'title': '名侦探柯南第752集', 'title': '名侦探柯南第752集',
}, },
}, { }, {
'md5': '41b75ba13bb7ac0e411131f92bc4f6ca',
'info_dict': { 'info_dict': {
'id': 'e3f585b550a280af23c98b6cb2be19fb_part2', 'id': 'e3f585b550a280af23c98b6cb2be19fb_part2',
'ext': 'f4v', 'ext': 'f4v',
'title': '名侦探柯南第752集', 'title': '名侦探柯南第752集',
}, },
}, { }, {
'md5': '0cee1dd0a3d46a83e71e2badeae2aab0',
'info_dict': { 'info_dict': {
'id': 'e3f585b550a280af23c98b6cb2be19fb_part3', 'id': 'e3f585b550a280af23c98b6cb2be19fb_part3',
'ext': 'f4v', 'ext': 'f4v',
'title': '名侦探柯南第752集', 'title': '名侦探柯南第752集',
}, },
}, { }, {
'md5': '4f8ad72373b0c491b582e7c196b0b1f9',
'info_dict': { 'info_dict': {
'id': 'e3f585b550a280af23c98b6cb2be19fb_part4', 'id': 'e3f585b550a280af23c98b6cb2be19fb_part4',
'ext': 'f4v', 'ext': 'f4v',
'title': '名侦探柯南第752集', 'title': '名侦探柯南第752集',
}, },
}, { }, {
'md5': 'd89ad028bcfad282918e8098e811711d',
'info_dict': { 'info_dict': {
'id': 'e3f585b550a280af23c98b6cb2be19fb_part5', 'id': 'e3f585b550a280af23c98b6cb2be19fb_part5',
'ext': 'f4v', 'ext': 'f4v',
'title': '名侦探柯南第752集', 'title': '名侦探柯南第752集',
}, },
}, { }, {
'md5': '9cb1e5c95da25dff0660c32ae50903b7',
'info_dict': { 'info_dict': {
'id': 'e3f585b550a280af23c98b6cb2be19fb_part6', 'id': 'e3f585b550a280af23c98b6cb2be19fb_part6',
'ext': 'f4v', 'ext': 'f4v',
'title': '名侦探柯南第752集', 'title': '名侦探柯南第752集',
}, },
}, { }, {
'md5': '155116e0ff1867bbc9b98df294faabc9',
'info_dict': { 'info_dict': {
'id': 'e3f585b550a280af23c98b6cb2be19fb_part7', 'id': 'e3f585b550a280af23c98b6cb2be19fb_part7',
'ext': 'f4v', 'ext': 'f4v',
'title': '名侦探柯南第752集', 'title': '名侦探柯南第752集',
}, },
}, { }, {
'md5': '53f5db77622ae14fa493ed2a278a082b',
'info_dict': { 'info_dict': {
'id': 'e3f585b550a280af23c98b6cb2be19fb_part8', 'id': 'e3f585b550a280af23c98b6cb2be19fb_part8',
'ext': 'f4v', 'ext': 'f4v',
'title': '名侦探柯南第752集', 'title': '名侦探柯南第752集',
}, },
}], }],
'params': {
'skip_download': True,
},
}] }]
_FORMATS_MAP = [ _FORMATS_MAP = [
@ -211,20 +201,7 @@ class IqiyiIE(InfoExtractor):
return raw_data return raw_data
def get_enc_key(self, swf_url, video_id): def get_enc_key(self, swf_url, video_id):
filename, _ = os.path.splitext(url_basename(swf_url)) enc_key = '8e29ab5666d041c3a1ea76e06dabdffb'
enc_key_json = self._downloader.cache.load('iqiyi-enc-key', filename)
if enc_key_json is not None:
return enc_key_json[0]
req = self._request_webpage(
swf_url, video_id, note='download swf content')
cn = req.read()
cn = zlib.decompress(cn[8:])
pt = re.compile(b'MixerRemote\x08(?P<enc_key>.+?)\$&vv')
enc_key = self._search_regex(pt, cn, 'enc_key').decode('utf8')
self._downloader.cache.store('iqiyi-enc-key', filename, [enc_key])
return enc_key return enc_key
def _real_extract(self, url): def _real_extract(self, url):


@ -0,0 +1,42 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import remove_start
class Ir90TvIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?90tv\.ir/video/(?P<id>[0-9]+)/.*'
_TESTS = [{
'url': 'http://90tv.ir/video/95719/%D8%B4%D8%A7%DB%8C%D8%B9%D8%A7%D8%AA-%D9%86%D9%82%D9%84-%D9%88-%D8%A7%D9%86%D8%AA%D9%82%D8%A7%D9%84%D8%A7%D8%AA-%D9%85%D9%87%D9%85-%D9%81%D9%88%D8%AA%D8%A8%D8%A7%D9%84-%D8%A7%D8%B1%D9%88%D9%BE%D8%A7-940218',
'md5': '411dbd94891381960cb9e13daa47a869',
'info_dict': {
'id': '95719',
'ext': 'mp4',
'title': 'شایعات نقل و انتقالات مهم فوتبال اروپا 94/02/18',
'thumbnail': 're:^https?://.*\.jpg$',
}
}, {
'url': 'http://www.90tv.ir/video/95719/%D8%B4%D8%A7%DB%8C%D8%B9%D8%A7%D8%AA-%D9%86%D9%82%D9%84-%D9%88-%D8%A7%D9%86%D8%AA%D9%82%D8%A7%D9%84%D8%A7%D8%AA-%D9%85%D9%87%D9%85-%D9%81%D9%88%D8%AA%D8%A8%D8%A7%D9%84-%D8%A7%D8%B1%D9%88%D9%BE%D8%A7-940218',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = remove_start(self._html_search_regex(
r'<title>([^<]+)</title>', webpage, 'title'), '90tv.ir :: ')
video_url = self._search_regex(
r'<source[^>]+src="([^"]+)"', webpage, 'video url')
thumbnail = self._search_regex(r'poster="([^"]+)"', webpage, 'thumbnail url', fatal=False)
return {
'url': video_url,
'id': video_id,
'title': title,
'video_url': video_url,
'thumbnail': thumbnail,
}

View File

@ -8,9 +8,9 @@ from .common import InfoExtractor
class JeuxVideoIE(InfoExtractor): class JeuxVideoIE(InfoExtractor):
_VALID_URL = r'http://.*?\.jeuxvideo\.com/.*/(.*?)-\d+\.htm' _VALID_URL = r'http://.*?\.jeuxvideo\.com/.*/(.*?)\.htm'
_TEST = { _TESTS = [{
'url': 'http://www.jeuxvideo.com/reportages-videos-jeux/0004/00046170/tearaway-playstation-vita-gc-2013-tearaway-nous-presente-ses-papiers-d-identite-00115182.htm', 'url': 'http://www.jeuxvideo.com/reportages-videos-jeux/0004/00046170/tearaway-playstation-vita-gc-2013-tearaway-nous-presente-ses-papiers-d-identite-00115182.htm',
'md5': '046e491afb32a8aaac1f44dd4ddd54ee', 'md5': '046e491afb32a8aaac1f44dd4ddd54ee',
'info_dict': { 'info_dict': {
@ -19,7 +19,10 @@ class JeuxVideoIE(InfoExtractor):
'title': 'Tearaway : GC 2013 : Tearaway nous présente ses papiers d\'identité', 'title': 'Tearaway : GC 2013 : Tearaway nous présente ses papiers d\'identité',
'description': 'Lorsque les développeurs de LittleBigPlanet proposent un nouveau titre, on ne peut que s\'attendre à un résultat original et fort attrayant.', 'description': 'Lorsque les développeurs de LittleBigPlanet proposent un nouveau titre, on ne peut que s\'attendre à un résultat original et fort attrayant.',
}, },
} }, {
'url': 'http://www.jeuxvideo.com/videos/chroniques/434220/l-histoire-du-jeu-video-la-saturn.htm',
'only_matching': True,
}]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
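
Note on the `_VALID_URL` change above: dropping `-\d+` lets URLs without a trailing number match, but it also changes what group 1 captures for URLs that do have one. A minimal sketch, not part of the commit, that makes the difference visible; `first_group` is a hypothetical helper.

```python
import re

# Old and new patterns, copied from the two sides of the hunk.
OLD = r'http://.*?\.jeuxvideo\.com/.*/(.*?)-\d+\.htm'
NEW = r'http://.*?\.jeuxvideo\.com/.*/(.*?)\.htm'


def first_group(pattern, url):
    """Return group 1 for url under pattern, or None if it does not match."""
    mobj = re.match(pattern, url)
    return mobj.group(1) if mobj else None


if __name__ == '__main__':
    with_number = ('http://www.jeuxvideo.com/reportages-videos-jeux/0004/00046170/'
                   'tearaway-playstation-vita-gc-2013-tearaway-nous-presente-ses-papiers-d-identite-00115182.htm')
    without_number = 'http://www.jeuxvideo.com/videos/chroniques/434220/l-histoire-du-jeu-video-la-saturn.htm'
    for url in (with_number, without_number):
        print(first_group(OLD, url), first_group(NEW, url))
```

Under the new pattern the captured slug for the first URL keeps its `-00115182` suffix, which only matters if the group is used for more than a display name.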


@ -2,7 +2,7 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urllib_parse from ..compat import compat_urllib_parse_unquote_plus
from ..utils import ( from ..utils import (
js_to_json, js_to_json,
) )
@ -24,7 +24,7 @@ class KaraoketvIE(InfoExtractor):
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
page_video_url = self._og_search_video_url(webpage, video_id) page_video_url = self._og_search_video_url(webpage, video_id)
config_json = compat_urllib_parse.unquote_plus(self._search_regex( config_json = compat_urllib_parse_unquote_plus(self._search_regex(
r'config=(.*)', page_video_url, 'configuration')) r'config=(.*)', page_video_url, 'configuration'))
urls_info_json = self._download_json( urls_info_json = self._download_json(

@ -0,0 +1,314 @@
# coding: utf-8
from __future__ import unicode_literals
import re
import itertools
from .common import InfoExtractor
from ..utils import (
get_element_by_id,
clean_html,
ExtractorError,
remove_start,
)
class KuwoBaseIE(InfoExtractor):
_FORMATS = [
{'format': 'ape', 'ext': 'ape', 'preference': 100},
{'format': 'mp3-320', 'ext': 'mp3', 'br': '320kmp3', 'abr': 320, 'preference': 80},
{'format': 'mp3-192', 'ext': 'mp3', 'br': '192kmp3', 'abr': 192, 'preference': 70},
{'format': 'mp3-128', 'ext': 'mp3', 'br': '128kmp3', 'abr': 128, 'preference': 60},
{'format': 'wma', 'ext': 'wma', 'preference': 20},
{'format': 'aac', 'ext': 'aac', 'abr': 48, 'preference': 10}
]
def _get_formats(self, song_id):
formats = []
for file_format in self._FORMATS:
song_url = self._download_webpage(
'http://antiserver.kuwo.cn/anti.s?format=%s&br=%s&rid=MUSIC_%s&type=convert_url&response=url' %
(file_format['ext'], file_format.get('br', ''), song_id),
song_id, note='Download %s url info' % file_format['format'],
)
if song_url.startswith('http://') or song_url.startswith('https://'):
formats.append({
'url': song_url,
'format_id': file_format['format'],
'format': file_format['format'],
'preference': file_format['preference'],
'abr': file_format.get('abr'),
})
self._sort_formats(formats)
return formats
class KuwoIE(KuwoBaseIE):
IE_NAME = 'kuwo:song'
IE_DESC = '酷我音乐'
_VALID_URL = r'http://www\.kuwo\.cn/yinyue/(?P<id>\d+?)/'
_TESTS = [{
'url': 'http://www.kuwo.cn/yinyue/635632/',
'info_dict': {
'id': '635632',
'ext': 'ape',
'title': '爱我别走',
'creator': '张震岳',
'upload_date': '20080122',
'description': 'md5:ed13f58e3c3bf3f7fd9fbc4e5a7aa75c'
},
}, {
'url': 'http://www.kuwo.cn/yinyue/6446136/',
'info_dict': {
'id': '6446136',
'ext': 'mp3',
'title': '',
'creator': 'IU',
'upload_date': '20150518',
},
'params': {
'format': 'mp3-320'
},
}]
def _real_extract(self, url):
song_id = self._match_id(url)
webpage = self._download_webpage(
url, song_id, note='Download song detail info',
errnote='Unable to get song detail info')
song_name = self._html_search_regex(
r'<h1[^>]+title="([^"]+)">', webpage, 'song name')
singer_name = self._html_search_regex(
r'<div[^>]+class="s_img">\s*<a[^>]+title="([^>]+)"',
webpage, 'singer name', fatal=False)
lrc_content = clean_html(get_element_by_id('lrcContent', webpage))
if lrc_content == '暂无': # indicates no lyrics
lrc_content = None
formats = self._get_formats(song_id)
album_id = self._html_search_regex(
r'<p[^>]+class="album"[^<]+<a[^>]+href="http://www\.kuwo\.cn/album/(\d+)/"',
webpage, 'album id', fatal=False)
publish_time = None
if album_id is not None:
album_info_page = self._download_webpage(
'http://www.kuwo.cn/album/%s/' % album_id, song_id,
note='Download album detail info',
errnote='Unable to get album detail info')
publish_time = self._html_search_regex(
r'发行时间:(\d{4}-\d{2}-\d{2})', album_info_page,
'publish time', fatal=False)
if publish_time:
publish_time = publish_time.replace('-', '')
return {
'id': song_id,
'title': song_name,
'creator': singer_name,
'upload_date': publish_time,
'description': lrc_content,
'formats': formats,
}
class KuwoAlbumIE(InfoExtractor):
IE_NAME = 'kuwo:album'
IE_DESC = '酷我音乐 - 专辑'
_VALID_URL = r'http://www\.kuwo\.cn/album/(?P<id>\d+?)/'
_TEST = {
'url': 'http://www.kuwo.cn/album/502294/',
'info_dict': {
'id': '502294',
'title': 'M',
'description': 'md5:6a7235a84cc6400ec3b38a7bdaf1d60c',
},
'playlist_count': 2,
}
def _real_extract(self, url):
album_id = self._match_id(url)
webpage = self._download_webpage(
url, album_id, note='Download album info',
errnote='Unable to get album info')
album_name = self._html_search_regex(
r'<div[^>]+class="comm"[^<]+<h1[^>]+title="([^"]+)"', webpage,
'album name')
album_intro = remove_start(
clean_html(get_element_by_id('intro', webpage)),
'%s简介:' % album_name)
entries = [
self.url_result(song_url, 'Kuwo') for song_url in re.findall(
r'<p[^>]+class="listen"><a[^>]+href="(http://www\.kuwo\.cn/yinyue/\d+/)"',
webpage)
]
return self.playlist_result(entries, album_id, album_name, album_intro)
class KuwoChartIE(InfoExtractor):
IE_NAME = 'kuwo:chart'
IE_DESC = '酷我音乐 - 排行榜'
_VALID_URL = r'http://yinyue\.kuwo\.cn/billboard_(?P<id>[^.]+)\.htm'
_TEST = {
'url': 'http://yinyue.kuwo.cn/billboard_香港中文龙虎榜.htm',
'info_dict': {
'id': '香港中文龙虎榜',
'title': '香港中文龙虎榜',
'description': r're:\d{4}\d{2}',
},
'playlist_mincount': 10,
}
def _real_extract(self, url):
chart_id = self._match_id(url)
webpage = self._download_webpage(
url, chart_id, note='Download chart info',
errnote='Unable to get chart info')
chart_name = self._html_search_regex(
r'<h1[^>]+class="unDis">([^<]+)</h1>', webpage, 'chart name')
chart_desc = self._html_search_regex(
r'<p[^>]+class="tabDef">(\d{4}\d{2}期)</p>', webpage, 'chart desc')
entries = [
self.url_result(song_url, 'Kuwo') for song_url in re.findall(
r'<a[^>]+href="(http://www\.kuwo\.cn/yinyue/\d+)/"', webpage)
]
return self.playlist_result(entries, chart_id, chart_name, chart_desc)
class KuwoSingerIE(InfoExtractor):
IE_NAME = 'kuwo:singer'
IE_DESC = '酷我音乐 - 歌手'
_VALID_URL = r'http://www\.kuwo\.cn/mingxing/(?P<id>[^/]+)'
_TESTS = [{
'url': 'http://www.kuwo.cn/mingxing/bruno+mars/',
'info_dict': {
'id': 'bruno+mars',
'title': 'Bruno Mars',
},
'playlist_count': 10,
}, {
'url': 'http://www.kuwo.cn/mingxing/Ali/music.htm',
'info_dict': {
'id': 'Ali',
'title': 'Ali',
},
'playlist_mincount': 95,
}]
def _real_extract(self, url):
singer_id = self._match_id(url)
webpage = self._download_webpage(
url, singer_id, note='Download singer info',
errnote='Unable to get singer info')
singer_name = self._html_search_regex(
r'<div class="title clearfix">\s*<h1>([^<]+)<span', webpage, 'singer name'
)
entries = []
first_page_only = not re.search(r'/music(?:_\d+)?\.htm', url)
for page_num in itertools.count(1):
webpage = self._download_webpage(
'http://www.kuwo.cn/mingxing/%s/music_%d.htm' % (singer_id, page_num),
singer_id, note='Download song list page #%d' % page_num,
errnote='Unable to get song list page #%d' % page_num)
entries.extend([
self.url_result(song_url, 'Kuwo') for song_url in re.findall(
r'<p[^>]+class="m_name"><a[^>]+href="(http://www\.kuwo\.cn/yinyue/\d+)/',
webpage)
][:10 if first_page_only else None])
if first_page_only or not re.search(r'<a[^>]+href="[^"]+">下一页</a>', webpage):
break
return self.playlist_result(entries, singer_id, singer_name)
class KuwoCategoryIE(InfoExtractor):
IE_NAME = 'kuwo:category'
IE_DESC = '酷我音乐 - 分类'
_VALID_URL = r'http://yinyue\.kuwo\.cn/yy/cinfo_(?P<id>\d+?)\.htm'
_TEST = {
'url': 'http://yinyue.kuwo.cn/yy/cinfo_86375.htm',
'info_dict': {
'id': '86375',
'title': '八十年代精选',
'description': '这些都是属于八十年代的回忆!',
},
'playlist_count': 30,
}
def _real_extract(self, url):
category_id = self._match_id(url)
webpage = self._download_webpage(
url, category_id, note='Download category info',
errnote='Unable to get category info')
category_name = self._html_search_regex(
r'<h1[^>]+title="([^<>]+?)">[^<>]+?</h1>', webpage, 'category name')
category_desc = remove_start(
get_element_by_id('intro', webpage).strip(),
'%s简介:' % category_name)
jsonm = self._parse_json(self._html_search_regex(
r'var\s+jsonm\s*=\s*([^;]+);', webpage, 'category songs'), category_id)
entries = [
self.url_result('http://www.kuwo.cn/yinyue/%s/' % song['musicrid'], 'Kuwo')
for song in jsonm['musiclist']
]
return self.playlist_result(entries, category_id, category_name, category_desc)
class KuwoMvIE(KuwoBaseIE):
IE_NAME = 'kuwo:mv'
IE_DESC = '酷我音乐 - MV'
_VALID_URL = r'http://www\.kuwo\.cn/mv/(?P<id>\d+?)/'
_TEST = {
'url': 'http://www.kuwo.cn/mv/6480076/',
'info_dict': {
'id': '6480076',
'ext': 'mkv',
'title': '我们家MV',
'creator': '2PM',
},
}
_FORMATS = KuwoBaseIE._FORMATS + [
{'format': 'mkv', 'ext': 'mkv', 'preference': 250},
{'format': 'mp4', 'ext': 'mp4', 'preference': 200},
]
def _real_extract(self, url):
song_id = self._match_id(url)
webpage = self._download_webpage(
url, song_id, note='Download mv detail info: %s' % song_id,
errnote='Unable to get mv detail info: %s' % song_id)
mobj = re.search(
r'<h1[^>]+title="(?P<song>[^"]+)">[^<]+<span[^>]+title="(?P<singer>[^"]+)"',
webpage)
if mobj:
song_name = mobj.group('song')
singer_name = mobj.group('singer')
else:
raise ExtractorError('Unable to find song or singer names')
formats = self._get_formats(song_id)
return {
'id': song_id,
'title': song_name,
'creator': singer_name,
'formats': formats,
}
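Review note on the singer pagination in KuwoSingerIE above: a minimal standalone sketch of the same loop, assuming a hypothetical `fetch_page` callable in place of `_download_webpage` and two made-up HTML pages. It is only meant to illustrate the stop condition (no next-page link, or first page only), not the extractor's actual behaviour against the live site.

```python
import itertools
import re

# Both patterns are copied from the extractor above.
NEXT_PAGE_RE = r'<a[^>]+href="[^"]+">下一页</a>'
SONG_RE = r'<p[^>]+class="m_name"><a[^>]+href="(http://www\.kuwo\.cn/yinyue/\d+)/'


def collect_song_urls(fetch_page, first_page_only=False):
    """Collect song URLs page by page until no next-page link is found.

    fetch_page is a hypothetical callable mapping a 1-based page number
    to that page's HTML; it stands in for _download_webpage.
    """
    urls = []
    for page_num in itertools.count(1):
        html = fetch_page(page_num)
        found = re.findall(SONG_RE, html)
        # Only the first ten songs are kept when a plain singer URL was given.
        urls.extend(found[:10] if first_page_only else found)
        if first_page_only or not re.search(NEXT_PAGE_RE, html):
            break
    return urls


# Made-up pages: the first links to a next page, the second does not.
PAGES = {
    1: '<p class="m_name"><a href="http://www.kuwo.cn/yinyue/1/">a</a></p>'
       '<a href="music_2.htm">下一页</a>',
    2: '<p class="m_name"><a href="http://www.kuwo.cn/yinyue/2/">b</a></p>',
}

all_urls = collect_song_urls(PAGES.__getitem__)
first_only = collect_song_urls(PAGES.__getitem__, first_page_only=True)
```

Passing the page fetcher in as a callable keeps the loop testable without network access.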

@ -0,0 +1,62 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
parse_duration,
int_or_none,
)
class Lecture2GoIE(InfoExtractor):
_VALID_URL = r'https?://lecture2go\.uni-hamburg\.de/veranstaltungen/-/v/(?P<id>\d+)'
_TEST = {
'url': 'https://lecture2go.uni-hamburg.de/veranstaltungen/-/v/17473',
'md5': 'ac02b570883020d208d405d5a3fd2f7f',
'info_dict': {
'id': '17473',
'ext': 'flv',
'title': '2 - Endliche Automaten und reguläre Sprachen',
'creator': 'Frank Heitmann',
'duration': 5220,
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = self._html_search_regex(r'<em[^>]+class="title">(.+)</em>', webpage, 'title')
formats = []
for format_url in set(re.findall(r'"src","([^"]+)"', webpage)):
ext = determine_ext(format_url)
if ext == 'f4m':
formats.extend(self._extract_f4m_formats(format_url, video_id))
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(format_url, video_id))
else:
formats.append({
'url': format_url,
})
self._sort_formats(formats)
creator = self._html_search_regex(
r'<div[^>]+id="description">([^<]+)</div>', webpage, 'creator', fatal=False)
duration = parse_duration(self._html_search_regex(
r'Duration:\s*</em>\s*<em[^>]*>([^<]+)</em>', webpage, 'duration', fatal=False))
view_count = int_or_none(self._html_search_regex(
r'Views:\s*</em>\s*<em[^>]+>(\d+)</em>', webpage, 'view count', fatal=False))
return {
'id': video_id,
'title': title,
'formats': formats,
'creator': creator,
'duration': duration,
'view_count': view_count,
}
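Review note on the format loop in Lecture2GoIE above: a rough sketch of the same dispatch-by-extension idea, outside the extractor. `guess_ext` and `classify_sources` are hypothetical helpers written for this note; `guess_ext` is a simplified stand-in for `determine_ext`, not its real implementation, and the URLs are invented.

```python
import os.path
from urllib.parse import urlparse


def guess_ext(url):
    """Return the lowercase extension of the URL path, without the dot."""
    return os.path.splitext(urlparse(url).path)[1][1:].lower()


def classify_sources(urls):
    """Group de-duplicated source URLs by the handling they would need."""
    groups = {'f4m': [], 'm3u8': [], 'direct': []}
    for src in sorted(set(urls)):
        ext = guess_ext(src)
        # Manifests need dedicated parsing; anything else is a direct link.
        groups[ext if ext in ('f4m', 'm3u8') else 'direct'].append(src)
    return groups


groups = classify_sources([
    'http://x/a.f4m',
    'http://x/b.m3u8?t=1',
    'http://x/c.mp4',
    'http://x/c.mp4',  # duplicate, dropped by set()
])
```

Sorting the set makes the result order deterministic, which the original loop over a bare `set` does not guarantee.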

@ -15,10 +15,12 @@ from ..utils import (
determine_ext, determine_ext,
ExtractorError, ExtractorError,
parse_iso8601, parse_iso8601,
int_or_none,
) )
class LetvIE(InfoExtractor): class LetvIE(InfoExtractor):
IE_DESC = '乐视网'
_VALID_URL = r'http://www\.letv\.com/ptv/vplay/(?P<id>\d+).html' _VALID_URL = r'http://www\.letv\.com/ptv/vplay/(?P<id>\d+).html'
_TESTS = [{ _TESTS = [{
@ -133,7 +135,7 @@ class LetvIE(InfoExtractor):
} }
if format_id[-1:] == 'p': if format_id[-1:] == 'p':
url_info_dict['height'] = format_id[:-1] url_info_dict['height'] = int_or_none(format_id[:-1])
urls.append(url_info_dict) urls.append(url_info_dict)

@ -17,7 +17,6 @@ from ..utils import (
class LyndaBaseIE(InfoExtractor): class LyndaBaseIE(InfoExtractor):
_LOGIN_URL = 'https://www.lynda.com/login/login.aspx' _LOGIN_URL = 'https://www.lynda.com/login/login.aspx'
_SUCCESSFUL_LOGIN_REGEX = r'isLoggedIn: true'
_ACCOUNT_CREDENTIALS_HINT = 'Use --username and --password options to provide lynda.com account credentials.' _ACCOUNT_CREDENTIALS_HINT = 'Use --username and --password options to provide lynda.com account credentials.'
_NETRC_MACHINE = 'lynda' _NETRC_MACHINE = 'lynda'
@ -41,7 +40,7 @@ class LyndaBaseIE(InfoExtractor):
request, None, 'Logging in as %s' % username) request, None, 'Logging in as %s' % username)
# Not (yet) logged in # Not (yet) logged in
m = re.search(r'loginResultJson = \'(?P<json>[^\']+)\';', login_page) m = re.search(r'loginResultJson\s*=\s*\'(?P<json>[^\']+)\';', login_page)
if m is not None: if m is not None:
response = m.group('json') response = m.group('json')
response_json = json.loads(response) response_json = json.loads(response)
@ -70,7 +69,7 @@ class LyndaBaseIE(InfoExtractor):
request, None, request, None,
'Confirming log in and log out from another device') 'Confirming log in and log out from another device')
if re.search(self._SUCCESSFUL_LOGIN_REGEX, login_page) is None: if all(not re.search(p, login_page) for p in (r'isLoggedIn\s*:\s*true', r'logout\.aspx', r'>Log out<')):
raise ExtractorError('Unable to log in') raise ExtractorError('Unable to log in')

@ -2,9 +2,7 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import compat_urllib_parse_unquote
compat_urllib_parse,
)
class MalemotionIE(InfoExtractor): class MalemotionIE(InfoExtractor):
@ -24,7 +22,7 @@ class MalemotionIE(InfoExtractor):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
video_url = compat_urllib_parse.unquote(self._search_regex( video_url = compat_urllib_parse_unquote(self._search_regex(
r'<source type="video/mp4" src="(.+?)"', webpage, 'video URL')) r'<source type="video/mp4" src="(.+?)"', webpage, 'video URL'))
video_title = self._html_search_regex( video_title = self._html_search_regex(
r'<title>(.*?)</title', webpage, 'title') r'<title>(.*?)</title', webpage, 'title')

@ -29,7 +29,7 @@ class MDRIE(InfoExtractor):
doc = self._download_xml(domain + xmlurl, video_id) doc = self._download_xml(domain + xmlurl, video_id)
formats = [] formats = []
for a in doc.findall('./assets/asset'): for a in doc.findall('./assets/asset'):
url_el = a.find('.//progressiveDownloadUrl') url_el = a.find('./progressiveDownloadUrl')
if url_el is None: if url_el is None:
continue continue
abr = int(a.find('bitrateAudio').text) // 1000 abr = int(a.find('bitrateAudio').text) // 1000

@ -6,6 +6,7 @@ from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_parse_qs, compat_parse_qs,
compat_urllib_parse, compat_urllib_parse,
compat_urllib_parse_unquote,
compat_urllib_request, compat_urllib_request,
) )
from ..utils import ( from ..utils import (
@ -155,7 +156,7 @@ class MetacafeIE(InfoExtractor):
video_url = None video_url = None
mobj = re.search(r'(?m)&mediaURL=([^&]+)', webpage) mobj = re.search(r'(?m)&mediaURL=([^&]+)', webpage)
if mobj is not None: if mobj is not None:
mediaURL = compat_urllib_parse.unquote(mobj.group(1)) mediaURL = compat_urllib_parse_unquote(mobj.group(1))
video_ext = mediaURL[-3:] video_ext = mediaURL[-3:]
# Extract gdaKey if available # Extract gdaKey if available

@ -5,6 +5,7 @@ import json
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_urllib_parse, compat_urllib_parse,
compat_urllib_parse_unquote,
compat_urlparse, compat_urlparse,
) )
from ..utils import ( from ..utils import (
@ -48,7 +49,7 @@ class MiTeleIE(InfoExtractor):
domain = 'http://' + domain domain = 'http://' + domain
info_url = compat_urlparse.urljoin( info_url = compat_urlparse.urljoin(
domain, domain,
compat_urllib_parse.unquote(embed_data['flashvars']['host']) compat_urllib_parse_unquote(embed_data['flashvars']['host'])
) )
info_el = self._download_xml(info_url, episode).find('./video/info') info_el = self._download_xml(info_url, episode).find('./video/info')

@ -3,9 +3,7 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import compat_urllib_parse_unquote
compat_urllib_parse,
)
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
HEADRequest, HEADRequest,
@ -60,7 +58,7 @@ class MixcloudIE(InfoExtractor):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
uploader = mobj.group(1) uploader = mobj.group(1)
cloudcast_name = mobj.group(2) cloudcast_name = mobj.group(2)
track_id = compat_urllib_parse.unquote('-'.join((uploader, cloudcast_name))) track_id = compat_urllib_parse_unquote('-'.join((uploader, cloudcast_name)))
webpage = self._download_webpage(url, track_id) webpage = self._download_webpage(url, track_id)

@ -5,9 +5,9 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_urllib_parse_unquote,
compat_urllib_parse_urlparse, compat_urllib_parse_urlparse,
compat_urllib_request, compat_urllib_request,
compat_urllib_parse,
) )
@ -34,7 +34,7 @@ class MofosexIE(InfoExtractor):
webpage = self._download_webpage(req, video_id) webpage = self._download_webpage(req, video_id)
video_title = self._html_search_regex(r'<h1>(.+?)<', webpage, 'title') video_title = self._html_search_regex(r'<h1>(.+?)<', webpage, 'title')
video_url = compat_urllib_parse.unquote(self._html_search_regex(r'flashvars.video_url = \'([^\']+)', webpage, 'video_url')) video_url = compat_urllib_parse_unquote(self._html_search_regex(r'flashvars.video_url = \'([^\']+)', webpage, 'video_url'))
path = compat_urllib_parse_urlparse(video_url).path path = compat_urllib_parse_urlparse(video_url).path
extension = os.path.splitext(path)[1][1:] extension = os.path.splitext(path)[1][1:]
format = path.split('/')[5].split('_')[:2] format = path.split('/')[5].split('_')[:2]

@ -35,7 +35,8 @@ class MySpassIE(InfoExtractor):
# get metadata # get metadata
metadata_url = META_DATA_URL_TEMPLATE % video_id metadata_url = META_DATA_URL_TEMPLATE % video_id
metadata = self._download_xml(metadata_url, video_id) metadata = self._download_xml(
metadata_url, video_id, transform_source=lambda s: s.strip())
# extract values from metadata # extract values from metadata
url_flv_el = metadata.find('url_flv') url_flv_el = metadata.find('url_flv')

@ -0,0 +1,60 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .vimple import SprutoBaseIE
class MyviIE(SprutoBaseIE):
_VALID_URL = r'''(?x)
https?://
myvi\.(?:ru/player|tv)/
(?:
(?:
embed/html|
flash|
api/Video/Get
)/|
content/preloader\.swf\?.*\bid=
)
(?P<id>[\da-zA-Z_-]+)
'''
_TESTS = [{
'url': 'http://myvi.ru/player/embed/html/oOy4euHA6LVwNNAjhD9_Jq5Ha2Qf0rtVMVFMAZav8wObeRTZaCATzucDQIDph8hQU0',
'md5': '571bbdfba9f9ed229dc6d34cc0f335bf',
'info_dict': {
'id': 'f16b2bbd-cde8-481c-a981-7cd48605df43',
'ext': 'mp4',
'title': 'хозяин жизни',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 25,
},
}, {
'url': 'http://myvi.ru/player/content/preloader.swf?id=oOy4euHA6LVwNNAjhD9_Jq5Ha2Qf0rtVMVFMAZav8wOYf1WFpPfc_bWTKGVf_Zafr0',
'only_matching': True,
}, {
'url': 'http://myvi.ru/player/api/Video/Get/oOy4euHA6LVwNNAjhD9_Jq5Ha2Qf0rtVMVFMAZav8wObeRTZaCATzucDQIDph8hQU0',
'only_matching': True,
}, {
'url': 'http://myvi.tv/embed/html/oTGTNWdyz4Zwy_u1nraolwZ1odenTd9WkTnRfIL9y8VOgHYqOHApE575x4_xxS9Vn0?ap=0',
'only_matching': True,
}, {
'url': 'http://myvi.ru/player/flash/ocp2qZrHI-eZnHKQBK4cZV60hslH8LALnk0uBfKsB-Q4WnY26SeGoYPi8HWHxu0O30',
'only_matching': True,
}]
@classmethod
def _extract_url(cls, webpage):
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//myvi\.(?:ru/player|tv)/(?:embed/html|flash)/[^"]+)\1', webpage)
if mobj:
return mobj.group('url')
def _real_extract(self, url):
video_id = self._match_id(url)
spruto = self._download_json(
'http://myvi.ru/player/api/Video/Get/%s?sig' % video_id, video_id)['sprutoData']
return self._extract_spruto(spruto, video_id)
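Review note on `_extract_url` in MyviIE above: a small standalone check of the embed pattern, assuming made-up iframe markup. The regular expression is copied from the classmethod; `find_myvi_embed` is a hypothetical wrapper used only for this illustration.

```python
import re

# Pattern copied from MyviIE._extract_url above.
EMBED_RE = (
    r'<iframe[^>]+?src=(["\'])'
    r'(?P<url>(?:https?:)?//myvi\.(?:ru/player|tv)/(?:embed/html|flash)/[^"]+)\1'
)


def find_myvi_embed(webpage):
    """Return the first myvi embed URL in the page, or None if absent."""
    mobj = re.search(EMBED_RE, webpage)
    return mobj.group('url') if mobj else None


# Invented markup: a protocol-relative embed and an unrelated iframe.
embed = find_myvi_embed(
    '<iframe width="640" src="//myvi.ru/player/embed/html/abc123"></iframe>')
other = find_myvi_embed('<iframe src="http://example.com/"></iframe>')
```

The backreference `\1` requires the closing quote to match the opening one, so the URL stops at the end of the `src` attribute.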

@ -10,6 +10,7 @@ from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_ord, compat_ord,
compat_urllib_parse, compat_urllib_parse,
compat_urllib_parse_unquote,
compat_urllib_request, compat_urllib_request,
) )
from ..utils import ( from ..utils import (
@ -107,7 +108,7 @@ class MyVideoIE(InfoExtractor):
if not a == '_encxml': if not a == '_encxml':
params[a] = b params[a] = b
else: else:
encxml = compat_urllib_parse.unquote(b) encxml = compat_urllib_parse_unquote(b)
if not params.get('domain'): if not params.get('domain'):
params['domain'] = 'www.myvideo.de' params['domain'] = 'www.myvideo.de'
xmldata_url = '%s?%s' % (encxml, compat_urllib_parse.urlencode(params)) xmldata_url = '%s?%s' % (encxml, compat_urllib_parse.urlencode(params))
@ -135,7 +136,7 @@ class MyVideoIE(InfoExtractor):
video_url = None video_url = None
mobj = re.search('connectionurl=\'(.*?)\'', dec_data) mobj = re.search('connectionurl=\'(.*?)\'', dec_data)
if mobj: if mobj:
video_url = compat_urllib_parse.unquote(mobj.group(1)) video_url = compat_urllib_parse_unquote(mobj.group(1))
if 'myvideo2flash' in video_url: if 'myvideo2flash' in video_url:
self.report_warning( self.report_warning(
'Rewriting URL to use unencrypted rtmp:// ...', 'Rewriting URL to use unencrypted rtmp:// ...',
@ -147,10 +148,10 @@ class MyVideoIE(InfoExtractor):
mobj = re.search('path=\'(http.*?)\' source=\'(.*?)\'', dec_data) mobj = re.search('path=\'(http.*?)\' source=\'(.*?)\'', dec_data)
if mobj is None: if mobj is None:
raise ExtractorError('unable to extract url') raise ExtractorError('unable to extract url')
video_url = compat_urllib_parse.unquote(mobj.group(1)) + compat_urllib_parse.unquote(mobj.group(2)) video_url = compat_urllib_parse_unquote(mobj.group(1)) + compat_urllib_parse_unquote(mobj.group(2))
video_file = self._search_regex('source=\'(.*?)\'', dec_data, 'video file') video_file = self._search_regex('source=\'(.*?)\'', dec_data, 'video file')
video_file = compat_urllib_parse.unquote(video_file) video_file = compat_urllib_parse_unquote(video_file)
if not video_file.endswith('f4m'): if not video_file.endswith('f4m'):
ppath, prefix = video_file.split('.') ppath, prefix = video_file.split('.')
@ -159,7 +160,7 @@ class MyVideoIE(InfoExtractor):
video_playpath = '' video_playpath = ''
video_swfobj = self._search_regex('swfobject.embedSWF\(\'(.+?)\'', webpage, 'swfobj') video_swfobj = self._search_regex('swfobject.embedSWF\(\'(.+?)\'', webpage, 'swfobj')
video_swfobj = compat_urllib_parse.unquote(video_swfobj) video_swfobj = compat_urllib_parse_unquote(video_swfobj)
video_title = self._html_search_regex("<h1(?: class='globalHd')?>(.*?)</h1>", video_title = self._html_search_regex("<h1(?: class='globalHd')?>(.*?)</h1>",
webpage, 'title') webpage, 'title')

@ -8,9 +8,10 @@ from ..utils import (
class NationalGeographicIE(InfoExtractor): class NationalGeographicIE(InfoExtractor):
_VALID_URL = r'http://video\.nationalgeographic\.com/video/.*?' _VALID_URL = r'http://video\.nationalgeographic\.com/.*?'
_TEST = { _TESTS = [
{
'url': 'http://video.nationalgeographic.com/video/news/150210-news-crab-mating-vin?source=featuredvideo', 'url': 'http://video.nationalgeographic.com/video/news/150210-news-crab-mating-vin?source=featuredvideo',
'info_dict': { 'info_dict': {
'id': '4DmDACA6Qtk_', 'id': '4DmDACA6Qtk_',
@ -19,14 +20,28 @@ class NationalGeographicIE(InfoExtractor):
'description': 'md5:16f25aeffdeba55aaa8ec37e093ad8b3', 'description': 'md5:16f25aeffdeba55aaa8ec37e093ad8b3',
}, },
'add_ie': ['ThePlatform'], 'add_ie': ['ThePlatform'],
} },
{
'url': 'http://video.nationalgeographic.com/wild/when-sharks-attack/the-real-jaws',
'info_dict': {
'id': '_JeBD_D7PlS5',
'ext': 'flv',
'title': 'The Real Jaws',
'description': 'md5:8d3e09d9d53a85cd397b4b21b2c77be6',
},
'add_ie': ['ThePlatform'],
},
]
def _real_extract(self, url): def _real_extract(self, url):
name = url_basename(url) name = url_basename(url)
webpage = self._download_webpage(url, name) webpage = self._download_webpage(url, name)
feed_url = self._search_regex(r'data-feed-url="([^"]+)"', webpage, 'feed url') feed_url = self._search_regex(
guid = self._search_regex(r'data-video-guid="([^"]+)"', webpage, 'guid') r'data-feed-url="([^"]+)"', webpage, 'feed url')
guid = self._search_regex(
r'id="(?:videoPlayer|player-container)"[^>]+data-guid="([^"]+)"',
webpage, 'guid')
feed = self._download_xml('%s?byGuid=%s' % (feed_url, guid), name) feed = self._download_xml('%s?byGuid=%s' % (feed_url, guid), name)
content = feed.find('.//{http://search.yahoo.com/mrss/}content') content = feed.find('.//{http://search.yahoo.com/mrss/}content')
@ -34,5 +49,6 @@ class NationalGeographicIE(InfoExtractor):
return self.url_result(smuggle_url( return self.url_result(smuggle_url(
'http://link.theplatform.com/s/ngs/%s?format=SMIL&formats=MPEG4&manifest=f4m' % theplatform_id, 'http://link.theplatform.com/s/ngs/%s?format=SMIL&formats=MPEG4&manifest=f4m' % theplatform_id,
# For some reason, the normal links don't work and we must force the use of f4m # For some reason, the normal links don't work and we must force
# the use of f4m
{'force_smil_url': True})) {'force_smil_url': True}))

@ -124,7 +124,7 @@ class NBCSportsIE(InfoExtractor):
class NBCNewsIE(InfoExtractor): class NBCNewsIE(InfoExtractor):
_VALID_URL = r'''(?x)https?://(?:www\.)?nbcnews\.com/ _VALID_URL = r'''(?x)https?://(?:www\.)?nbcnews\.com/
(?:video/.+?/(?P<id>\d+)| (?:video/.+?/(?P<id>\d+)|
(?:feature|nightly-news)/[^/]+/(?P<title>.+)) (?:watch|feature|nightly-news)/[^/]+/(?P<title>.+))
''' '''
_TESTS = [ _TESTS = [
@ -169,6 +169,10 @@ class NBCNewsIE(InfoExtractor):
'description': 'md5:1c10c1eccbe84a26e5debb4381e2d3c5', 'description': 'md5:1c10c1eccbe84a26e5debb4381e2d3c5',
}, },
}, },
{
'url': 'http://www.nbcnews.com/watch/dateline/full-episode--deadly-betrayal-386250819952',
'only_matching': True,
},
] ]
def _real_extract(self, url): def _real_extract(self, url):

@ -0,0 +1,459 @@
# coding: utf-8
from __future__ import unicode_literals
from hashlib import md5
from base64 import b64encode
from datetime import datetime
import re
from .common import InfoExtractor
from ..compat import (
compat_urllib_request,
compat_urllib_parse,
compat_str,
compat_itertools_count,
)
class NetEaseMusicBaseIE(InfoExtractor):
_FORMATS = ['bMusic', 'mMusic', 'hMusic']
_NETEASE_SALT = '3go8&$8*3*3h0k(2)2'
_API_BASE = 'http://music.163.com/api/'
@classmethod
def _encrypt(cls, dfsid):
salt_bytes = bytearray(cls._NETEASE_SALT.encode('utf-8'))
string_bytes = bytearray(compat_str(dfsid).encode('ascii'))
salt_len = len(salt_bytes)
for i in range(len(string_bytes)):
string_bytes[i] = string_bytes[i] ^ salt_bytes[i % salt_len]
m = md5()
m.update(bytes(string_bytes))
result = b64encode(m.digest()).decode('ascii')
return result.replace('/', '_').replace('+', '-')
@classmethod
def extract_formats(cls, info):
formats = []
for song_format in cls._FORMATS:
details = info.get(song_format)
if not details:
continue
formats.append({
'url': 'http://m1.music.126.net/%s/%s.%s' %
(cls._encrypt(details['dfsId']), details['dfsId'],
details['extension']),
'ext': details.get('extension'),
'abr': details.get('bitrate', 0) / 1000,
'format_id': song_format,
'filesize': details.get('size'),
'asr': details.get('sr')
})
return formats
@classmethod
def convert_milliseconds(cls, ms):
return int(round(ms / 1000.0))
def query_api(self, endpoint, video_id, note):
req = compat_urllib_request.Request('%s%s' % (self._API_BASE, endpoint))
req.add_header('Referer', self._API_BASE)
return self._download_json(req, video_id, note)
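Review note on `_encrypt` in NetEaseMusicBaseIE above: a standalone sketch of the same three steps (XOR the ID with a repeating salt, MD5 the result, Base64-encode with `/` and `+` swapped out). `encrypt_id` is a hypothetical free function for illustration; the salt is copied from `_NETEASE_SALT`, and the sample ID is arbitrary.

```python
from base64 import b64encode
from hashlib import md5

SALT = '3go8&$8*3*3h0k(2)2'  # copied from _NETEASE_SALT above


def encrypt_id(dfsid, salt=SALT):
    """Derive the URL path token for a dfsId, mirroring _encrypt above."""
    salt_bytes = bytearray(salt.encode('utf-8'))
    data = bytearray(str(dfsid).encode('ascii'))
    # XOR every byte of the ID with the salt, repeating the salt as needed.
    for i in range(len(data)):
        data[i] ^= salt_bytes[i % len(salt_bytes)]
    digest = md5(bytes(data)).digest()
    # Base64 of the 16-byte digest, made URL-safe by swapping '/' and '+'.
    return b64encode(digest).decode('ascii').replace('/', '_').replace('+', '-')


token = encrypt_id(1234567890)  # arbitrary sample ID, not a real dfsId
```

The two `replace` calls produce the same alphabet as `base64.urlsafe_b64encode`, which could be used instead if behaviour parity is confirmed.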
class NetEaseMusicIE(NetEaseMusicBaseIE):
IE_NAME = 'netease:song'
IE_DESC = '网易云音乐'
_VALID_URL = r'https?://music\.163\.com/(#/)?song\?id=(?P<id>[0-9]+)'
_TESTS = [{
'url': 'http://music.163.com/#/song?id=32102397',
'md5': 'f2e97280e6345c74ba9d5677dd5dcb45',
'info_dict': {
'id': '32102397',
'ext': 'mp3',
'title': 'Bad Blood (feat. Kendrick Lamar)',
'creator': 'Taylor Swift / Kendrick Lamar',
'upload_date': '20150517',
'timestamp': 1431878400,
'description': 'md5:a10a54589c2860300d02e1de821eb2ef',
},
}, {
'note': 'No lyrics translation.',
'url': 'http://music.163.com/#/song?id=29822014',
'info_dict': {
'id': '29822014',
'ext': 'mp3',
'title': '听见下雨的声音',
'creator': '周杰伦',
'upload_date': '20141225',
'timestamp': 1419523200,
'description': 'md5:a4d8d89f44656af206b7b2555c0bce6c',
},
}, {
'note': 'No lyrics.',
'url': 'http://music.163.com/song?id=17241424',
'info_dict': {
'id': '17241424',
'ext': 'mp3',
'title': 'Opus 28',
'creator': 'Dustin O\'Halloran',
'upload_date': '20080211',
'timestamp': 1202745600,
},
}, {
'note': 'Has translated name.',
'url': 'http://music.163.com/#/song?id=22735043',
'info_dict': {
'id': '22735043',
'ext': 'mp3',
'title': '소원을 말해봐 (Genie)',
'creator': '少女时代',
'description': 'md5:79d99cc560e4ca97e0c4d86800ee4184',
'upload_date': '20100127',
'timestamp': 1264608000,
'alt_title': '说出愿望吧(Genie)',
}
}]
def _process_lyrics(self, lyrics_info):
original = lyrics_info.get('lrc', {}).get('lyric')
translated = lyrics_info.get('tlyric', {}).get('lyric')
if not translated:
return original
lyrics_expr = r'(\[[0-9]{2}:[0-9]{2}\.[0-9]{2,}\])([^\n]+)'
original_ts_texts = re.findall(lyrics_expr, original)
translation_ts_dict = dict(
(time_stamp, text) for time_stamp, text in re.findall(lyrics_expr, translated)
)
lyrics = '\n'.join([
'%s%s / %s' % (time_stamp, text, translation_ts_dict.get(time_stamp, ''))
for time_stamp, text in original_ts_texts
])
return lyrics
def _real_extract(self, url):
song_id = self._match_id(url)
params = {
'id': song_id,
'ids': '[%s]' % song_id
}
info = self.query_api(
'song/detail?' + compat_urllib_parse.urlencode(params),
song_id, 'Downloading song info')['songs'][0]
formats = self.extract_formats(info)
self._sort_formats(formats)
lyrics_info = self.query_api(
'song/lyric?id=%s&lv=-1&tv=-1' % song_id,
song_id, 'Downloading lyrics data')
lyrics = self._process_lyrics(lyrics_info)
alt_title = None
if info.get('transNames'):
alt_title = '/'.join(info.get('transNames'))
return {
'id': song_id,
'title': info['name'],
'alt_title': alt_title,
'creator': ' / '.join([artist['name'] for artist in info.get('artists', [])]),
'timestamp': self.convert_milliseconds(info.get('album', {}).get('publishTime')),
'thumbnail': info.get('album', {}).get('picUrl'),
'duration': self.convert_milliseconds(info.get('duration', 0)),
'description': lyrics,
'formats': formats,
}
class NetEaseMusicAlbumIE(NetEaseMusicBaseIE):
IE_NAME = 'netease:album'
IE_DESC = '网易云音乐 - 专辑'
_VALID_URL = r'https?://music\.163\.com/(#/)?album\?id=(?P<id>[0-9]+)'
_TEST = {
'url': 'http://music.163.com/#/album?id=220780',
'info_dict': {
'id': '220780',
'title': 'B\'day',
},
'playlist_count': 23,
}
def _real_extract(self, url):
album_id = self._match_id(url)
info = self.query_api(
'album/%s?id=%s' % (album_id, album_id),
album_id, 'Downloading album data')['album']
name = info['name']
desc = info.get('description')
entries = [
self.url_result('http://music.163.com/#/song?id=%s' % song['id'],
'NetEaseMusic', song['id'])
for song in info['songs']
]
return self.playlist_result(entries, album_id, name, desc)
class NetEaseMusicSingerIE(NetEaseMusicBaseIE):
IE_NAME = 'netease:singer'
IE_DESC = '网易云音乐 - 歌手'
_VALID_URL = r'https?://music\.163\.com/(#/)?artist\?id=(?P<id>[0-9]+)'
_TESTS = [{
'note': 'Singer has aliases.',
'url': 'http://music.163.com/#/artist?id=10559',
'info_dict': {
'id': '10559',
'title': '张惠妹 - aMEI;阿密特',
},
'playlist_count': 50,
}, {
'note': 'Singer has translated name.',
'url': 'http://music.163.com/#/artist?id=124098',
'info_dict': {
'id': '124098',
'title': '李昇基 - 이승기',
},
'playlist_count': 50,
}]
def _real_extract(self, url):
singer_id = self._match_id(url)
info = self.query_api(
'artist/%s?id=%s' % (singer_id, singer_id),
singer_id, 'Downloading singer data')
name = info['artist']['name']
if info['artist']['trans']:
name = '%s - %s' % (name, info['artist']['trans'])
if info['artist']['alias']:
name = '%s - %s' % (name, ';'.join(info['artist']['alias']))
entries = [
self.url_result('http://music.163.com/#/song?id=%s' % song['id'],
'NetEaseMusic', song['id'])
for song in info['hotSongs']
]
return self.playlist_result(entries, singer_id, name)
class NetEaseMusicListIE(NetEaseMusicBaseIE):
IE_NAME = 'netease:playlist'
IE_DESC = '网易云音乐 - 歌单'
_VALID_URL = r'https?://music\.163\.com/(#/)?(playlist|discover/toplist)\?id=(?P<id>[0-9]+)'
_TESTS = [{
'url': 'http://music.163.com/#/playlist?id=79177352',
'info_dict': {
'id': '79177352',
'title': 'Billboard 2007 Top 100',
'description': 'md5:12fd0819cab2965b9583ace0f8b7b022'
},
'playlist_count': 99,
}, {
'note': 'Toplist/Charts sample',
'url': 'http://music.163.com/#/discover/toplist?id=3733003',
'info_dict': {
'id': '3733003',
'title': 're:韩国Melon排行榜周榜 [0-9]{4}-[0-9]{2}-[0-9]{2}',
'description': 'md5:73ec782a612711cadc7872d9c1e134fc',
},
'playlist_count': 50,
}]
def _real_extract(self, url):
list_id = self._match_id(url)
info = self.query_api(
'playlist/detail?id=%s&lv=-1&tv=-1' % list_id,
list_id, 'Downloading playlist data')['result']
name = info['name']
desc = info.get('description')
if info.get('specialType') == 10: # is a chart/toplist
datestamp = datetime.fromtimestamp(
self.convert_milliseconds(info['updateTime'])).strftime('%Y-%m-%d')
name = '%s %s' % (name, datestamp)
entries = [
self.url_result('http://music.163.com/#/song?id=%s' % song['id'],
'NetEaseMusic', song['id'])
for song in info['tracks']
]
return self.playlist_result(entries, list_id, name, desc)
class NetEaseMusicMvIE(NetEaseMusicBaseIE):
IE_NAME = 'netease:mv'
IE_DESC = '网易云音乐 - MV'
_VALID_URL = r'https?://music\.163\.com/(#/)?mv\?id=(?P<id>[0-9]+)'
_TEST = {
'url': 'http://music.163.com/#/mv?id=415350',
'info_dict': {
'id': '415350',
'ext': 'mp4',
'title': '이럴거면 그러지말지',
'description': '白雅言自作曲唱甜蜜爱情',
'creator': '白雅言',
'upload_date': '20150520',
},
}
def _real_extract(self, url):
mv_id = self._match_id(url)
info = self.query_api(
'mv/detail?id=%s&type=mp4' % mv_id,
mv_id, 'Downloading mv info')['data']
formats = [
{'url': mv_url, 'ext': 'mp4', 'format_id': '%sp' % brs, 'height': int(brs)}
for brs, mv_url in info['brs'].items()
]
self._sort_formats(formats)
return {
'id': mv_id,
'title': info['name'],
'description': info.get('desc') or info.get('briefDesc'),
'creator': info['artistName'],
'upload_date': info['publishTime'].replace('-', ''),
'formats': formats,
'thumbnail': info.get('cover'),
'duration': self.convert_milliseconds(info.get('duration', 0)),
}
class NetEaseMusicProgramIE(NetEaseMusicBaseIE):
IE_NAME = 'netease:program'
IE_DESC = '网易云音乐 - 电台节目'
_VALID_URL = r'https?://music\.163\.com/(#/)?program\?id=(?P<id>[0-9]+)'
_TESTS = [{
'url': 'http://music.163.com/#/program?id=10109055',
'info_dict': {
'id': '10109055',
'ext': 'mp3',
'title': '不丹足球背后的故事',
'description': '喜马拉雅人的足球梦 ...',
'creator': '大话西藏',
'timestamp': 1434179342,
'upload_date': '20150613',
'duration': 900,
},
}, {
'note': 'This program has accompanying songs.',
'url': 'http://music.163.com/#/program?id=10141022',
'info_dict': {
'id': '10141022',
'title': '25岁你是自在如风的少年<27°C>',
'description': 'md5:8d594db46cc3e6509107ede70a4aaa3b',
},
'playlist_count': 4,
}, {
'note': 'This program has accompanying songs.',
'url': 'http://music.163.com/#/program?id=10141022',
'info_dict': {
'id': '10141022',
'ext': 'mp3',
'title': '25岁你是自在如风的少年<27°C>',
'description': 'md5:8d594db46cc3e6509107ede70a4aaa3b',
'timestamp': 1434450841,
'upload_date': '20150616',
},
'params': {
'noplaylist': True
}
}]
def _real_extract(self, url):
program_id = self._match_id(url)
info = self.query_api(
'dj/program/detail?id=%s' % program_id,
program_id, 'Downloading program info')['program']
name = info['name']
description = info['description']
if not info['songs'] or self._downloader.params.get('noplaylist'):
if info['songs']:
self.to_screen(
'Downloading just the main audio %s because of --no-playlist'
% info['mainSong']['id'])
formats = self.extract_formats(info['mainSong'])
self._sort_formats(formats)
return {
'id': program_id,
'title': name,
'description': description,
'creator': info['dj']['brand'],
'timestamp': self.convert_milliseconds(info['createTime']),
'thumbnail': info['coverUrl'],
'duration': self.convert_milliseconds(info.get('duration', 0)),
'formats': formats,
}
self.to_screen(
'Downloading playlist %s - add --no-playlist to just download the main audio %s'
% (program_id, info['mainSong']['id']))
song_ids = [info['mainSong']['id']]
song_ids.extend([song['id'] for song in info['songs']])
entries = [
self.url_result('http://music.163.com/#/song?id=%s' % song_id,
'NetEaseMusic', song_id)
for song_id in song_ids
]
return self.playlist_result(entries, program_id, name, description)
class NetEaseMusicDjRadioIE(NetEaseMusicBaseIE):
IE_NAME = 'netease:djradio'
IE_DESC = '网易云音乐 - 电台'
_VALID_URL = r'https?://music\.163\.com/(#/)?djradio\?id=(?P<id>[0-9]+)'
_TEST = {
'url': 'http://music.163.com/#/djradio?id=42',
'info_dict': {
'id': '42',
'title': '声音蔓延',
'description': 'md5:766220985cbd16fdd552f64c578a6b15'
},
'playlist_mincount': 40,
}
_PAGE_SIZE = 1000
def _real_extract(self, url):
dj_id = self._match_id(url)
name = None
desc = None
entries = []
for offset in compat_itertools_count(start=0, step=self._PAGE_SIZE):
info = self.query_api(
'dj/program/byradio?asc=false&limit=%d&radioId=%s&offset=%d'
% (self._PAGE_SIZE, dj_id, offset),
dj_id, 'Downloading dj programs - %d' % offset)
entries.extend([
self.url_result(
'http://music.163.com/#/program?id=%s' % program['id'],
'NetEaseMusicProgram', program['id'])
for program in info['programs']
])
if name is None:
radio = info['programs'][0]['radio']
name = radio['name']
desc = radio['desc']
if not info['more']:
break
return self.playlist_result(entries, dj_id, name, desc)
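Review note on the paging loop in NetEaseMusicDjRadioIE: a minimal standalone sketch of the same offset-based pagination, assuming a response shaped like the one above (a `programs` list plus a `more` flag). `collect_paged` and `fake_fetch` are hypothetical names used only for illustration; they are not part of youtube-dl.

```python
import itertools


def collect_paged(fetch_page, page_size):
    """Collect items from an offset-paginated API until it reports no more pages.

    fetch_page is a hypothetical callable (offset, limit) -> dict with the keys
    'programs' (list) and 'more' (bool), mirroring the response used above.
    """
    items = []
    for offset in itertools.count(start=0, step=page_size):
        page = fetch_page(offset, page_size)
        items.extend(page.get('programs', []))
        # Stop as soon as the server says there is nothing left.
        if not page.get('more'):
            break
    return items


def fake_fetch(offset, limit, total=5):
    """Stand-in for the remote API: serves integers 0..total-1 in pages."""
    ids = list(range(total))[offset:offset + limit]
    return {'programs': ids, 'more': offset + limit < total}
```

Calling `collect_paged(fake_fetch, 2)` walks offsets 0, 2 and 4 before the `more` flag turns false. Using `.get('programs', [])` also keeps an empty first page from raising, which the indexed access `info['programs'][0]` in the extractor would do.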


@ -6,6 +6,7 @@ from ..utils import parse_iso8601
class NextMediaIE(InfoExtractor): class NextMediaIE(InfoExtractor):
IE_DESC = '蘋果日報'
_VALID_URL = r'http://hk.apple.nextmedia.com/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)' _VALID_URL = r'http://hk.apple.nextmedia.com/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'http://hk.apple.nextmedia.com/realtime/news/20141108/53109199', 'url': 'http://hk.apple.nextmedia.com/realtime/news/20141108/53109199',
@ -66,6 +67,7 @@ class NextMediaIE(InfoExtractor):
class NextMediaActionNewsIE(NextMediaIE): class NextMediaActionNewsIE(NextMediaIE):
IE_DESC = '蘋果日報 - 動新聞'
_VALID_URL = r'http://hk.dv.nextmedia.com/actionnews/[^/]+/(?P<date>\d+)/(?P<id>\d+)/\d+' _VALID_URL = r'http://hk.dv.nextmedia.com/actionnews/[^/]+/(?P<date>\d+)/(?P<id>\d+)/\d+'
_TESTS = [{ _TESTS = [{
'url': 'http://hk.dv.nextmedia.com/actionnews/hit/20150121/19009428/20061460', 'url': 'http://hk.dv.nextmedia.com/actionnews/hit/20150121/19009428/20061460',
@ -90,6 +92,7 @@ class NextMediaActionNewsIE(NextMediaIE):
class AppleDailyIE(NextMediaIE): class AppleDailyIE(NextMediaIE):
IE_DESC = '臺灣蘋果日報'
_VALID_URL = r'http://(www|ent).appledaily.com.tw/(?:animation|appledaily|enews|realtimenews)/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)(/.*)?' _VALID_URL = r'http://(www|ent).appledaily.com.tw/(?:animation|appledaily|enews|realtimenews)/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)(/.*)?'
_TESTS = [{ _TESTS = [{
'url': 'http://ent.appledaily.com.tw/enews/article/entertainment/20150128/36354694', 'url': 'http://ent.appledaily.com.tw/enews/article/entertainment/20150128/36354694',


@ -1,12 +1,11 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str from ..compat import compat_str
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
determine_ext,
int_or_none, int_or_none,
parse_iso8601, parse_iso8601,
parse_duration, parse_duration,
@ -15,7 +14,7 @@ from ..utils import (
class NowTVIE(InfoExtractor): class NowTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?nowtv\.de/(?P<station>rtl|rtl2|rtlnitro|superrtl|ntv|vox)/(?P<id>.+?)/player' _VALID_URL = r'https?://(?:www\.)?nowtv\.(?:de|at|ch)/(?:rtl|rtl2|rtlnitro|superrtl|ntv|vox)/(?P<id>.+?)/(?:player|preview)'
_TESTS = [{ _TESTS = [{
# rtl # rtl
@ -23,7 +22,7 @@ class NowTVIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': '203519', 'id': '203519',
'display_id': 'bauer-sucht-frau/die-neuen-bauern-und-eine-hochzeit', 'display_id': 'bauer-sucht-frau/die-neuen-bauern-und-eine-hochzeit',
'ext': 'mp4', 'ext': 'flv',
'title': 'Die neuen Bauern und eine Hochzeit', 'title': 'Die neuen Bauern und eine Hochzeit',
'description': 'md5:e234e1ed6d63cf06be5c070442612e7e', 'description': 'md5:e234e1ed6d63cf06be5c070442612e7e',
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
@ -32,7 +31,7 @@ class NowTVIE(InfoExtractor):
'duration': 2786, 'duration': 2786,
}, },
'params': { 'params': {
# m3u8 download # rtmp download
'skip_download': True, 'skip_download': True,
}, },
}, { }, {
@ -41,7 +40,7 @@ class NowTVIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': '203481', 'id': '203481',
'display_id': 'berlin-tag-nacht/berlin-tag-nacht-folge-934', 'display_id': 'berlin-tag-nacht/berlin-tag-nacht-folge-934',
'ext': 'mp4', 'ext': 'flv',
'title': 'Berlin - Tag & Nacht (Folge 934)', 'title': 'Berlin - Tag & Nacht (Folge 934)',
'description': 'md5:c85e88c2e36c552dfe63433bc9506dd0', 'description': 'md5:c85e88c2e36c552dfe63433bc9506dd0',
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
@ -50,7 +49,7 @@ class NowTVIE(InfoExtractor):
'duration': 2641, 'duration': 2641,
}, },
'params': { 'params': {
# m3u8 download # rtmp download
'skip_download': True, 'skip_download': True,
}, },
}, { }, {
@ -59,7 +58,7 @@ class NowTVIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': '165780', 'id': '165780',
'display_id': 'alarm-fuer-cobra-11-die-autobahnpolizei/hals-und-beinbruch-2014-08-23-21-10-00', 'display_id': 'alarm-fuer-cobra-11-die-autobahnpolizei/hals-und-beinbruch-2014-08-23-21-10-00',
'ext': 'mp4', 'ext': 'flv',
'title': 'Hals- und Beinbruch', 'title': 'Hals- und Beinbruch',
'description': 'md5:b50d248efffe244e6f56737f0911ca57', 'description': 'md5:b50d248efffe244e6f56737f0911ca57',
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
@ -68,7 +67,7 @@ class NowTVIE(InfoExtractor):
'duration': 2742, 'duration': 2742,
}, },
'params': { 'params': {
# m3u8 download # rtmp download
'skip_download': True, 'skip_download': True,
}, },
}, { }, {
@ -77,7 +76,7 @@ class NowTVIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': '99205', 'id': '99205',
'display_id': 'medicopter-117/angst', 'display_id': 'medicopter-117/angst',
'ext': 'mp4', 'ext': 'flv',
'title': 'Angst!', 'title': 'Angst!',
'description': 'md5:30cbc4c0b73ec98bcd73c9f2a8c17c4e', 'description': 'md5:30cbc4c0b73ec98bcd73c9f2a8c17c4e',
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
@ -86,7 +85,7 @@ class NowTVIE(InfoExtractor):
'duration': 3025, 'duration': 3025,
}, },
'params': { 'params': {
# m3u8 download # rtmp download
'skip_download': True, 'skip_download': True,
}, },
}, { }, {
@ -95,7 +94,7 @@ class NowTVIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': '203521', 'id': '203521',
'display_id': 'ratgeber-geld/thema-ua-der-erste-blick-die-apple-watch', 'display_id': 'ratgeber-geld/thema-ua-der-erste-blick-die-apple-watch',
'ext': 'mp4', 'ext': 'flv',
'title': 'Thema u.a.: Der erste Blick: Die Apple Watch', 'title': 'Thema u.a.: Der erste Blick: Die Apple Watch',
'description': 'md5:4312b6c9d839ffe7d8caf03865a531af', 'description': 'md5:4312b6c9d839ffe7d8caf03865a531af',
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
@ -104,7 +103,7 @@ class NowTVIE(InfoExtractor):
'duration': 1083, 'duration': 1083,
}, },
'params': { 'params': {
# m3u8 download # rtmp download
'skip_download': True, 'skip_download': True,
}, },
}, { }, {
@ -113,7 +112,7 @@ class NowTVIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': '128953', 'id': '128953',
'display_id': 'der-hundeprofi/buero-fall-chihuahua-joel', 'display_id': 'der-hundeprofi/buero-fall-chihuahua-joel',
'ext': 'mp4', 'ext': 'flv',
'title': "Büro-Fall / Chihuahua 'Joel'", 'title': "Büro-Fall / Chihuahua 'Joel'",
'description': 'md5:e62cb6bf7c3cc669179d4f1eb279ad8d', 'description': 'md5:e62cb6bf7c3cc669179d4f1eb279ad8d',
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
@ -122,18 +121,22 @@ class NowTVIE(InfoExtractor):
'duration': 3092, 'duration': 3092,
}, },
'params': { 'params': {
# m3u8 download # rtmp download
'skip_download': True, 'skip_download': True,
}, },
}, {
'url': 'http://www.nowtv.de/rtl/bauer-sucht-frau/die-neuen-bauern-und-eine-hochzeit/preview',
'only_matching': True,
}, {
'url': 'http://www.nowtv.at/rtl/bauer-sucht-frau/die-neuen-bauern-und-eine-hochzeit/preview?return=/rtl/bauer-sucht-frau/die-neuen-bauern-und-eine-hochzeit',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) display_id = self._match_id(url)
display_id = mobj.group('id')
station = mobj.group('station')
info = self._download_json( info = self._download_json(
'https://api.nowtv.de/v3/movies/%s?fields=*,format,files' % display_id, 'https://api.nowtv.de/v3/movies/%s?fields=id,title,free,geoblocked,articleLong,articleShort,broadcastStartDate,seoUrl,duration,format,files' % display_id,
display_id) display_id)
video_id = compat_str(info['id']) video_id = compat_str(info['id'])
@ -148,29 +151,19 @@ class NowTVIE(InfoExtractor):
raise ExtractorError( raise ExtractorError(
'Video %s is not available for free' % video_id, expected=True) 'Video %s is not available for free' % video_id, expected=True)
f = info.get('format', {})
station = f.get('station') or station
STATIONS = {
'rtl': 'rtlnow',
'rtl2': 'rtl2now',
'vox': 'voxnow',
'nitro': 'rtlnitronow',
'ntv': 'n-tvnow',
'superrtl': 'superrtlnow'
}
formats = [] formats = []
for item in files['items']: for item in files['items']:
item_path = remove_start(item['path'], '/') if determine_ext(item['path']) != 'f4v':
tbr = int_or_none(item['bitrate']) continue
m3u8_url = 'http://hls.fra.%s.de/hls-vod-enc/%s.m3u8' % (STATIONS[station], item_path) app, play_path = remove_start(item['path'], '/').split('/', 1)
m3u8_url = m3u8_url.replace('now/', 'now/videos/')
formats.append({ formats.append({
'url': m3u8_url, 'url': 'rtmpe://fms.rtl.de',
'format_id': '%s-%sk' % (item['id'], tbr), 'app': app,
'ext': 'mp4', 'play_path': 'mp4:%s' % play_path,
'tbr': tbr, 'ext': 'flv',
'page_url': url,
'player_url': 'http://rtl-now.rtl.de/includes/nc_player.swf',
'tbr': int_or_none(item.get('bitrate')),
}) })
self._sort_formats(formats) self._sort_formats(formats)
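Review note on the new RTMP format loop: a small sketch of how one file path is split into the `app` and `play_path` values, assuming paths look like `/<app>/<rest>.f4v`. `rtmp_parts` is a hypothetical helper and the sample path in the usage note is made up; the extension check is a simplification of `determine_ext`, which also handles query strings.

```python
def rtmp_parts(path):
    """Return (app, play_path) for an .f4v path, or None for other file types."""
    # Simplified extension check; only .f4v entries are turned into formats.
    if not path.lower().endswith('.f4v'):
        return None
    # Drop a single leading slash, then split off the first path component.
    if path.startswith('/'):
        path = path[1:]
    app, play_path = path.split('/', 1)
    # The play path is prefixed with "mp4:" as in the hunk above.
    return app, 'mp4:%s' % play_path
```

For example, `rtmp_parts('/rtlnow/17/sample.f4v')` yields `('rtlnow', 'mp4:17/sample.f4v')`, while a non-f4v path is skipped.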
@ -178,6 +171,8 @@ class NowTVIE(InfoExtractor):
description = info.get('articleLong') or info.get('articleShort') description = info.get('articleLong') or info.get('articleShort')
timestamp = parse_iso8601(info.get('broadcastStartDate'), ' ') timestamp = parse_iso8601(info.get('broadcastStartDate'), ' ')
duration = parse_duration(info.get('duration')) duration = parse_duration(info.get('duration'))
f = info.get('format', {})
thumbnail = f.get('defaultImage169Format') or f.get('defaultImage169Logo') thumbnail = f.get('defaultImage169Format') or f.get('defaultImage169Logo')
return { return {


@ -1,5 +1,7 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
fix_xml_ampersands, fix_xml_ampersands,
@ -7,7 +9,6 @@ from ..utils import (
qualities, qualities,
strip_jsonp, strip_jsonp,
unified_strdate, unified_strdate,
url_basename,
) )
@ -37,8 +38,21 @@ class NPOBaseIE(InfoExtractor):
class NPOIE(NPOBaseIE): class NPOIE(NPOBaseIE):
IE_NAME = 'npo.nl' IE_NAME = 'npo'
_VALID_URL = r'https?://(?:www\.)?npo\.nl/(?!live|radio)[^/]+/[^/]+/(?P<id>[^/?]+)' IE_DESC = 'npo.nl and ntr.nl'
_VALID_URL = r'''(?x)
(?:
npo:|
https?://
(?:www\.)?
(?:
npo\.nl/(?!live|radio)(?:[^/]+/){2}|
ntr\.nl/(?:[^/]+/){2,}|
omroepwnl\.nl/video/fragment/[^/]+__
)
)
(?P<id>[^/?#]+)
'''
_TESTS = [ _TESTS = [
{ {
@ -58,7 +72,7 @@ class NPOIE(NPOBaseIE):
'info_dict': { 'info_dict': {
'id': 'VARA_101191800', 'id': 'VARA_101191800',
'ext': 'm4v', 'ext': 'm4v',
'title': 'De Mega Mike & Mega Thomas show', 'title': 'De Mega Mike & Mega Thomas show: The best of.',
'description': 'md5:3b74c97fc9d6901d5a665aac0e5400f4', 'description': 'md5:3b74c97fc9d6901d5a665aac0e5400f4',
'upload_date': '20090227', 'upload_date': '20090227',
'duration': 2400, 'duration': 2400,
@ -70,8 +84,8 @@ class NPOIE(NPOBaseIE):
'info_dict': { 'info_dict': {
'id': 'VPWON_1169289', 'id': 'VPWON_1169289',
'ext': 'm4v', 'ext': 'm4v',
'title': 'Tegenlicht', 'title': 'Tegenlicht: De toekomst komt uit Afrika',
'description': 'md5:d6476bceb17a8c103c76c3b708f05dd1', 'description': 'md5:52cf4eefbc96fffcbdc06d024147abea',
'upload_date': '20130225', 'upload_date': '20130225',
'duration': 3000, 'duration': 3000,
}, },
@ -100,6 +114,30 @@ class NPOIE(NPOBaseIE):
'title': 'Hoe gaat Europa verder na Parijs?', 'title': 'Hoe gaat Europa verder na Parijs?',
}, },
}, },
{
'url': 'http://www.ntr.nl/Aap-Poot-Pies/27/detail/Aap-poot-pies/VPWON_1233944#content',
'md5': '01c6a2841675995da1f0cf776f03a9c3',
'info_dict': {
'id': 'VPWON_1233944',
'ext': 'm4v',
'title': 'Aap, poot, pies',
'description': 'md5:c9c8005d1869ae65b858e82c01a91fde',
'upload_date': '20150508',
'duration': 599,
},
},
{
'url': 'http://www.omroepwnl.nl/video/fragment/vandaag-de-dag-verkiezingen__POMS_WNL_853698',
'md5': 'd30cd8417b8b9bca1fdff27428860d08',
'info_dict': {
'id': 'POW_00996502',
'ext': 'm4v',
'title': '''"Dit is wel een 'landslide'..."''',
'description': 'md5:f8d66d537dfb641380226e31ca57b8e8',
'upload_date': '20150508',
'duration': 462,
},
}
] ]
def _real_extract(self, url): def _real_extract(self, url):
@ -114,6 +152,18 @@ class NPOIE(NPOBaseIE):
transform_source=strip_jsonp, transform_source=strip_jsonp,
) )
# For some videos actual video id (prid) is different (e.g. for
# http://www.omroepwnl.nl/video/fragment/vandaag-de-dag-verkiezingen__POMS_WNL_853698
# video id is POMS_WNL_853698 but prid is POW_00996502)
video_id = metadata.get('prid') or video_id
# titel is too generic in some cases so utilize aflevering_titel as well
# when available (e.g. http://tegenlicht.vpro.nl/afleveringen/2014-2015/access-to-africa.html)
title = metadata['titel']
sub_title = metadata.get('aflevering_titel')
if sub_title and sub_title != title:
title += ': %s' % sub_title
token = self._get_token(video_id) token = self._get_token(video_id)
formats = [] formats = []
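Review note on the prid and title handling in this hunk: a standalone sketch of the same fallback logic, so it can be exercised without network access. `resolve_id_and_title` is a hypothetical name; the metadata keys (`prid`, `titel`, `aflevering_titel`) are the ones used in the hunk above.

```python
def resolve_id_and_title(video_id, metadata):
    """Prefer the API's 'prid' as the id and append the episode title if distinct."""
    # Some pages expose an id that differs from the real programme id (prid).
    resolved_id = metadata.get('prid') or video_id
    title = metadata['titel']
    sub_title = metadata.get('aflevering_titel')
    # Only append the episode title when it adds information.
    if sub_title and sub_title != title:
        title += ': %s' % sub_title
    return resolved_id, title
```

With the values from the tests above, a `titel` of 'Tegenlicht' and an `aflevering_titel` of 'De toekomst komt uit Afrika' combine into 'Tegenlicht: De toekomst komt uit Afrika'.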
@ -186,8 +236,8 @@ class NPOIE(NPOBaseIE):
return { return {
'id': video_id, 'id': video_id,
'title': metadata['titel'], 'title': title,
'description': metadata['info'], 'description': metadata.get('info'),
'thumbnail': metadata.get('images', [{'url': None}])[-1]['url'], 'thumbnail': metadata.get('images', [{'url': None}])[-1]['url'],
'upload_date': unified_strdate(metadata.get('gidsdatum')), 'upload_date': unified_strdate(metadata.get('gidsdatum')),
'duration': parse_duration(metadata.get('tijdsduur')), 'duration': parse_duration(metadata.get('tijdsduur')),
@ -356,9 +406,8 @@ class NPORadioFragmentIE(InfoExtractor):
} }
class TegenlichtVproIE(NPOIE): class VPROIE(NPOIE):
IE_NAME = 'tegenlicht.vpro.nl' _VALID_URL = r'https?://(?:www\.)?(?:tegenlicht\.)?vpro\.nl/(?:[^/]+/){2,}(?P<id>[^/]+)\.html'
_VALID_URL = r'https?://tegenlicht\.vpro\.nl/afleveringen/.*?'
_TESTS = [ _TESTS = [
{ {
@ -367,17 +416,72 @@ class TegenlichtVproIE(NPOIE):
'info_dict': { 'info_dict': {
'id': 'VPWON_1169289', 'id': 'VPWON_1169289',
'ext': 'm4v', 'ext': 'm4v',
'title': 'Tegenlicht', 'title': 'De toekomst komt uit Afrika',
'description': 'md5:d6476bceb17a8c103c76c3b708f05dd1', 'description': 'md5:52cf4eefbc96fffcbdc06d024147abea',
'upload_date': '20130225', 'upload_date': '20130225',
}, },
}, },
{
'url': 'http://www.vpro.nl/programmas/2doc/2015/sergio-herman.html',
'info_dict': {
'id': 'sergio-herman',
'title': 'Sergio Herman: Fucking perfect',
},
'playlist_count': 2,
},
{
# playlist with youtube embed
'url': 'http://www.vpro.nl/programmas/2doc/2015/education-education.html',
'info_dict': {
'id': 'education-education',
'title': '2Doc',
},
'playlist_count': 2,
}
] ]
def _real_extract(self, url): def _real_extract(self, url):
name = url_basename(url) playlist_id = self._match_id(url)
webpage = self._download_webpage(url, name)
urn = self._html_search_meta('mediaurn', webpage) webpage = self._download_webpage(url, playlist_id)
info_page = self._download_json(
'http://rs.vpro.nl/v2/api/media/%s.json' % urn, name) entries = [
return self._get_info(info_page['mid']) self.url_result('npo:%s' % video_id if not video_id.startswith('http') else video_id)
for video_id in re.findall(r'data-media-id="([^"]+)"', webpage)
]
playlist_title = self._search_regex(
r'<title>\s*([^>]+?)\s*-\s*Teledoc\s*-\s*VPRO\s*</title>',
webpage, 'playlist title', default=None) or self._og_search_title(webpage)
return self.playlist_result(entries, playlist_id, playlist_title)
class WNLIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?omroepwnl\.nl/video/detail/(?P<id>[^/]+)__\d+'
_TEST = {
'url': 'http://www.omroepwnl.nl/video/detail/vandaag-de-dag-6-mei__060515',
'info_dict': {
'id': 'vandaag-de-dag-6-mei',
'title': 'Vandaag de Dag 6 mei',
},
'playlist_count': 4,
}
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
entries = [
self.url_result('npo:%s' % video_id, 'NPO')
for video_id, part in re.findall(
r'<a[^>]+href="([^"]+)"[^>]+class="js-mid"[^>]*>(Deel \d+)', webpage)
]
playlist_title = self._html_search_regex(
r'(?s)<h1[^>]+class="subject"[^>]*>(.+?)</h1>',
webpage, 'playlist title')
return self.playlist_result(entries, playlist_id, playlist_title)


@ -116,7 +116,8 @@ class NRKPlaylistIE(InfoExtractor):
class NRKTVIE(InfoExtractor): class NRKTVIE(InfoExtractor):
_VALID_URL = r'(?P<baseurl>https?://tv\.nrk(?:super)?\.no/)(?:serie/[^/]+|program)/(?P<id>[a-zA-Z]{4}\d{8})(?:/\d{2}-\d{2}-\d{4})?(?:#del=(?P<part_id>\d+))?' IE_DESC = 'NRK TV and NRK Radio'
_VALID_URL = r'(?P<baseurl>https?://(?:tv|radio)\.nrk(?:super)?\.no/)(?:serie/[^/]+|program)/(?P<id>[a-zA-Z]{4}\d{8})(?:/\d{2}-\d{2}-\d{4})?(?:#del=(?P<part_id>\d+))?'
_TESTS = [ _TESTS = [
{ {
@ -188,6 +189,10 @@ class NRKTVIE(InfoExtractor):
'duration': 6947.5199999999995, 'duration': 6947.5199999999995,
}, },
'skip': 'Only works from Norway', 'skip': 'Only works from Norway',
},
{
'url': 'https://radio.nrk.no/serie/dagsnytt/NPUB21019315/12-07-2015#',
'only_matching': True,
} }
] ]
@ -206,7 +211,8 @@ class NRKTVIE(InfoExtractor):
]} ]}
def _extract_f4m(self, manifest_url, video_id): def _extract_f4m(self, manifest_url, video_id):
return self._extract_f4m_formats(manifest_url + '?hdcore=3.1.1&plugin=aasp-3.1.1.69.124', video_id) return self._extract_f4m_formats(
manifest_url + '?hdcore=3.1.1&plugin=aasp-3.1.1.69.124', video_id, f4m_id='hds')
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
@ -268,7 +274,7 @@ class NRKTVIE(InfoExtractor):
m3u8_url = re.search(r'data-hls-media="([^"]+)"', webpage) m3u8_url = re.search(r'data-hls-media="([^"]+)"', webpage)
if m3u8_url: if m3u8_url:
formats.extend(self._extract_m3u8_formats(m3u8_url.group(1), video_id, 'mp4')) formats.extend(self._extract_m3u8_formats(m3u8_url.group(1), video_id, 'mp4', m3u8_id='hls'))
self._sort_formats(formats) self._sort_formats(formats)
subtitles_url = self._html_search_regex( subtitles_url = self._html_search_regex(


@ -2,7 +2,7 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urllib_parse from ..compat import compat_urllib_parse_unquote
from ..utils import ( from ..utils import (
unified_strdate, unified_strdate,
int_or_none, int_or_none,
@ -62,7 +62,7 @@ class OdnoklassnikiIE(InfoExtractor):
metadata = self._parse_json(metadata, video_id) metadata = self._parse_json(metadata, video_id)
else: else:
metadata = self._download_json( metadata = self._download_json(
compat_urllib_parse.unquote(flashvars['metadataUrl']), compat_urllib_parse_unquote(flashvars['metadataUrl']),
video_id, 'Downloading metadata JSON') video_id, 'Downloading metadata JSON')
movie = metadata['movie'] movie = metadata['movie']


@ -49,19 +49,21 @@ class OnionStudiosIE(InfoExtractor):
self._sort_formats(formats) self._sort_formats(formats)
title = self._search_regex( title = self._search_regex(
r'share_title\s*=\s*"([^"]+)"', webpage, 'title') r'share_title\s*=\s*(["\'])(?P<title>[^\1]+?)\1',
webpage, 'title', group='title')
description = self._search_regex( description = self._search_regex(
r'share_description\s*=\s*"([^"]+)"', webpage, r'share_description\s*=\s*(["\'])(?P<description>[^\1]+?)\1',
'description', default=None) webpage, 'description', default=None, group='description')
thumbnail = self._search_regex( thumbnail = self._search_regex(
r'poster="([^"]+)"', webpage, 'thumbnail', default=False) r'poster\s*=\s*(["\'])(?P<thumbnail>[^\1]+?)\1',
webpage, 'thumbnail', default=False, group='thumbnail')
uploader_id = self._search_regex( uploader_id = self._search_regex(
r'twitter_handle\s*=\s*"([^"]+)"', r'twitter_handle\s*=\s*(["\'])(?P<uploader_id>[^\1]+?)\1',
webpage, 'uploader id', fatal=False) webpage, 'uploader id', fatal=False, group='uploader_id')
uploader = self._search_regex( uploader = self._search_regex(
r'window\.channelName\s*=\s*"Embedded:([^"]+)"', r'window\.channelName\s*=\s*(["\'])Embedded:(?P<uploader>[^\1]+?)\1',
webpage, 'uploader', default=False) webpage, 'uploader', default=False, group='uploader')
return { return {
'id': video_id, 'id': video_id,
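Review note on the new quote-aware patterns: inside a character class, Python's `re` treats `\1` as an octal escape (the character `chr(1)`), not as a backreference, so `[^\1]` matches almost any character. The patterns still work because the lazy `+?` stops at the first closing `\1`. The sketch below compares that style with an explicit negative lookahead; `LAZY`, `TEMPERED`, `extract_title` and the sample string are illustrative only.

```python
import re

# Style used above: [^\1] means "any character except chr(1)"; the lazy
# quantifier plus the trailing \1 backreference does the real work.
LAZY = re.compile(r'share_title\s*=\s*(["\'])(?P<title>[^\1]+?)\1')

# Alternative: the lookahead states "not the opening quote" explicitly.
TEMPERED = re.compile(r'share_title\s*=\s*(["\'])(?P<title>(?:(?!\1).)+?)\1')


def extract_title(pattern, text):
    """Return the captured title, or None when the pattern does not match."""
    match = pattern.search(text)
    return match.group('title') if match else None


sample = 'var share_title = "It\'s a title"; var other = "x";'
```

Both patterns return the same title for `sample`. One behavioural difference to keep in mind: `[^\1]` also matches newlines, while `.` does not unless `re.DOTALL` is set.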


@ -3,9 +3,9 @@ from __future__ import unicode_literals
import json import json
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urllib_parse_unquote_plus
from ..utils import ( from ..utils import (
parse_iso8601, parse_iso8601,
compat_urllib_parse,
parse_age_limit, parse_age_limit,
int_or_none, int_or_none,
) )
@ -37,7 +37,7 @@ class OpenFilmIE(InfoExtractor):
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
player = compat_urllib_parse.unquote_plus( player = compat_urllib_parse_unquote_plus(
self._og_search_video_url(webpage)) self._og_search_video_url(webpage))
video = json.loads(self._search_regex( video = json.loads(self._search_regex(


@ -32,7 +32,7 @@ class PBSIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': '2365006249', 'id': '2365006249',
'ext': 'mp4', 'ext': 'mp4',
'title': 'A More Perfect Union', 'title': 'Constitution USA with Peter Sagal - A More Perfect Union',
'description': 'md5:ba0c207295339c8d6eced00b7c363c6a', 'description': 'md5:ba0c207295339c8d6eced00b7c363c6a',
'duration': 3190, 'duration': 3190,
}, },
@ -46,7 +46,7 @@ class PBSIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': '2365297690', 'id': '2365297690',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Losing Iraq', 'title': 'FRONTLINE - Losing Iraq',
'description': 'md5:f5bfbefadf421e8bb8647602011caf8e', 'description': 'md5:f5bfbefadf421e8bb8647602011caf8e',
'duration': 5050, 'duration': 5050,
}, },
@ -60,7 +60,7 @@ class PBSIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': '2201174722', 'id': '2201174722',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Cyber Schools Gain Popularity, but Quality Questions Persist', 'title': 'PBS NewsHour - Cyber Schools Gain Popularity, but Quality Questions Persist',
'description': 'md5:5871c15cba347c1b3d28ac47a73c7c28', 'description': 'md5:5871c15cba347c1b3d28ac47a73c7c28',
'duration': 801, 'duration': 801,
}, },
@ -72,7 +72,7 @@ class PBSIE(InfoExtractor):
'id': '2365297708', 'id': '2365297708',
'ext': 'mp4', 'ext': 'mp4',
'description': 'md5:68d87ef760660eb564455eb30ca464fe', 'description': 'md5:68d87ef760660eb564455eb30ca464fe',
'title': 'Dudamel Conducts Verdi Requiem at the Hollywood Bowl - Full', 'title': 'Great Performances - Dudamel Conducts Verdi Requiem at the Hollywood Bowl - Full',
'duration': 6559, 'duration': 6559,
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
}, },
@ -88,7 +88,7 @@ class PBSIE(InfoExtractor):
'display_id': 'killer-typhoon', 'display_id': 'killer-typhoon',
'ext': 'mp4', 'ext': 'mp4',
'description': 'md5:c741d14e979fc53228c575894094f157', 'description': 'md5:c741d14e979fc53228c575894094f157',
'title': 'Killer Typhoon', 'title': 'NOVA - Killer Typhoon',
'duration': 3172, 'duration': 3172,
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
'upload_date': '20140122', 'upload_date': '20140122',
@ -110,7 +110,7 @@ class PBSIE(InfoExtractor):
'id': '2280706814', 'id': '2280706814',
'display_id': 'player', 'display_id': 'player',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Death and the Civil War', 'title': 'American Experience - Death and the Civil War',
'description': 'American Experience, TVs most-watched history series, brings to life the compelling stories from our past that inform our understanding of the world today.', 'description': 'American Experience, TVs most-watched history series, brings to life the compelling stories from our past that inform our understanding of the world today.',
'duration': 6705, 'duration': 6705,
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
@ -118,6 +118,21 @@ class PBSIE(InfoExtractor):
'params': { 'params': {
'skip_download': True, # requires ffmpeg 'skip_download': True, # requires ffmpeg
}, },
},
{
'url': 'http://video.pbs.org/video/2365367186/',
'info_dict': {
'id': '2365367186',
'display_id': '2365367186',
'ext': 'mp4',
'title': 'To Catch A Comet - Full Episode',
'description': 'On November 12, 2014, billions of kilometers from Earth, spacecraft orbiter Rosetta and lander Philae did what no other had dared to attempt \u2014 land on the volatile surface of a comet as it zooms around the sun at 67,000 km/hr. The European Space Agency hopes this mission can help peer into our past and unlock secrets of our origins.',
'duration': 3342,
'thumbnail': 're:^https?://.*\.jpg$',
},
'params': {
'skip_download': True, # requires ffmpeg
},
} }
] ]
@ -224,6 +239,20 @@ class PBSIE(InfoExtractor):
rating_str = rating_str.rpartition('-')[2] rating_str = rating_str.rpartition('-')[2]
age_limit = US_RATINGS.get(rating_str) age_limit = US_RATINGS.get(rating_str)
subtitles = {}
closed_captions_url = info.get('closed_captions_url')
if closed_captions_url:
subtitles['en'] = [{
'ext': 'ttml',
'url': closed_captions_url,
}]
# info['title'] is often incomplete (e.g. 'Full Episode', 'Episode 5', etc)
# Try turning it to 'program - title' naming scheme if possible
alt_title = info.get('program', {}).get('title')
if alt_title:
info['title'] = alt_title + ' - ' + re.sub(r'^' + re.escape(alt_title) + r'[\s\-:]+', '', info['title'])
return { return {
'id': video_id, 'id': video_id,
'display_id': display_id, 'display_id': display_id,
@ -234,4 +263,5 @@ class PBSIE(InfoExtractor):
'age_limit': age_limit, 'age_limit': age_limit,
'upload_date': upload_date, 'upload_date': upload_date,
'formats': formats, 'formats': formats,
'subtitles': subtitles,
} }
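Review note on the 'program - title' naming scheme added to PBSIE: a minimal sketch of the normalization as a pure function. It assumes program names may contain regex metacharacters and therefore escapes the name before building the pattern. `program_title` is a hypothetical helper, not part of the extractor.

```python
import re


def program_title(program, title):
    """Prefix title with the program name, without repeating the name."""
    if not program:
        return title
    # Strip a leading copy of the program name plus any separator characters,
    # escaping the name so characters such as '(' or '&' are taken literally.
    stripped = re.sub(r'^' + re.escape(program) + r'[\s\-:]+', '', title)
    return program + ' - ' + stripped
```

For instance, both 'Killer Typhoon' and 'NOVA: Killer Typhoon' come out as 'NOVA - Killer Typhoon', matching the updated test expectations above.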


@ -0,0 +1,99 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import (
compat_urllib_parse,
compat_urllib_request,
)
from ..utils import parse_iso8601
class PeriscopeIE(InfoExtractor):
IE_DESC = 'Periscope'
_VALID_URL = r'https?://(?:www\.)?periscope\.tv/w/(?P<id>[^/?#]+)'
_TEST = {
'url': 'https://www.periscope.tv/w/aJUQnjY3MjA3ODF8NTYxMDIyMDl2zCg2pECBgwTqRpQuQD352EMPTKQjT4uqlM3cgWFA-g==',
'md5': '65b57957972e503fcbbaeed8f4fa04ca',
'info_dict': {
'id': '56102209',
'ext': 'mp4',
'title': 'Bec Boop - 🚠✈️🇬🇧 Fly above #London in Emirates Air Line cable car at night 🇬🇧✈️🚠 #BoopScope 🎀💗',
'timestamp': 1438978559,
'upload_date': '20150807',
'uploader': 'Bec Boop',
'uploader_id': '1465763',
},
'skip': 'Expires in 24 hours',
}
def _call_api(self, method, token):
return self._download_json(
'https://api.periscope.tv/api/v2/%s?token=%s' % (method, token), token)
def _real_extract(self, url):
token = self._match_id(url)
broadcast_data = self._call_api('getBroadcastPublic', token)
broadcast = broadcast_data['broadcast']
status = broadcast['status']
uploader = broadcast.get('user_display_name') or broadcast_data.get('user', {}).get('display_name')
uploader_id = broadcast.get('user_id') or broadcast_data.get('user', {}).get('id')
title = '%s - %s' % (uploader, status) if uploader else status
state = (broadcast.get('state') or '').lower()
if state == 'running':
title = self._live_title(title)
timestamp = parse_iso8601(broadcast.get('created_at'))
thumbnails = [{
'url': broadcast[image],
} for image in ('image_url', 'image_url_small') if broadcast.get(image)]
stream = self._call_api('getAccessPublic', token)
formats = []
for format_id in ('replay', 'rtmp', 'hls', 'https_hls'):
video_url = stream.get(format_id + '_url')
if not video_url:
continue
f = {
'url': video_url,
'ext': 'flv' if format_id == 'rtmp' else 'mp4',
}
if format_id != 'rtmp':
f['protocol'] = 'm3u8_native' if state == 'ended' else 'm3u8'
formats.append(f)
self._sort_formats(formats)
return {
'id': broadcast.get('id') or token,
'title': title,
'timestamp': timestamp,
'uploader': uploader,
'uploader_id': uploader_id,
'thumbnails': thumbnails,
'formats': formats,
}
class QuickscopeIE(InfoExtractor):
IE_DESC = 'Quick Scope'
_VALID_URL = r'https?://watchonperiscope\.com/broadcast/(?P<id>\d+)'
_TEST = {
'url': 'https://watchonperiscope.com/broadcast/56180087',
'only_matching': True,
}
def _real_extract(self, url):
broadcast_id = self._match_id(url)
request = compat_urllib_request.Request(
'https://watchonperiscope.com/api/accessChannel', compat_urllib_parse.urlencode({
'broadcast_id': broadcast_id,
'entry_ticket': '',
'from_push': 'false',
'uses_sessions': 'true',
}).encode('utf-8'))
return self.url_result(
self._download_json(request, broadcast_id)['share_url'], 'Periscope')
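The format selection loop in `PeriscopeIE._real_extract` above can be read in isolation as the sketch below. `build_formats` and the example URLs are hypothetical, and the shape of the access response (optional `<format_id>_url` keys) is assumed only from the code shown in this diff.

```python
def build_formats(stream, state):
    """Build a list of format dicts from an access response.

    `stream` is assumed to be a dict with optional '<format_id>_url' keys;
    `state` is the lower-cased broadcast state. Hypothetical helper.
    """
    formats = []
    for format_id in ('replay', 'rtmp', 'hls', 'https_hls'):
        video_url = stream.get(format_id + '_url')
        if not video_url:
            # Missing or empty URL: this variant is not offered.
            continue
        f = {
            'url': video_url,
            'ext': 'flv' if format_id == 'rtmp' else 'mp4',
        }
        if format_id != 'rtmp':
            # Finished broadcasts use the native HLS downloader,
            # anything else falls back to the generic m3u8 protocol.
            f['protocol'] = 'm3u8_native' if state == 'ended' else 'm3u8'
        formats.append(f)
    return formats


example = {
    'replay_url': 'https://example.invalid/replay.m3u8',
    'rtmp_url': 'rtmp://example.invalid/live',
}
for fmt in build_formats(example, 'ended'):
    print(fmt)
```

Keeping the loop free of extractor state makes the rtmp versus HLS branching easy to check without network access.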

View File

@ -4,7 +4,7 @@ import json
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urllib_parse from ..compat import compat_urllib_parse_unquote
class PhotobucketIE(InfoExtractor): class PhotobucketIE(InfoExtractor):
@ -34,7 +34,7 @@ class PhotobucketIE(InfoExtractor):
info_json = self._search_regex(r'Pb\.Data\.Shared\.put\(Pb\.Data\.Shared\.MEDIA, (.*?)\);', info_json = self._search_regex(r'Pb\.Data\.Shared\.put\(Pb\.Data\.Shared\.MEDIA, (.*?)\);',
webpage, 'info json') webpage, 'info json')
info = json.loads(info_json) info = json.loads(info_json)
url = compat_urllib_parse.unquote(self._html_search_regex(r'file=(.+\.mp4)', info['linkcodes']['html'], 'url')) url = compat_urllib_parse_unquote(self._html_search_regex(r'file=(.+\.mp4)', info['linkcodes']['html'], 'url'))
return { return {
'id': video_id, 'id': video_id,
'url': url, 'url': url,

View File

@ -38,9 +38,7 @@ class PlayedIE(InfoExtractor):
if m_error: if m_error:
raise ExtractorError(m_error.group('msg'), expected=True) raise ExtractorError(m_error.group('msg'), expected=True)
fields = re.findall( data = self._hidden_inputs(orig_webpage)
r'type="hidden" name="([^"]+)"\s+value="([^"]+)">', orig_webpage)
data = dict(fields)
self._sleep(2, video_id) self._sleep(2, video_id)

View File

@ -4,7 +4,8 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_urllib_parse, compat_urllib_parse_unquote,
compat_urllib_parse_unquote_plus,
) )
from ..utils import ( from ..utils import (
clean_html, clean_html,
@ -44,7 +45,7 @@ class PlayvidIE(InfoExtractor):
flashvars = self._html_search_regex( flashvars = self._html_search_regex(
r'flashvars="(.+?)"', webpage, 'flashvars') r'flashvars="(.+?)"', webpage, 'flashvars')
infos = compat_urllib_parse.unquote(flashvars).split(r'&') infos = compat_urllib_parse_unquote(flashvars).split(r'&')
for info in infos: for info in infos:
videovars_match = re.match(r'^video_vars\[(.+?)\]=(.+?)$', info) videovars_match = re.match(r'^video_vars\[(.+?)\]=(.+?)$', info)
if videovars_match: if videovars_match:
@ -52,7 +53,7 @@ class PlayvidIE(InfoExtractor):
val = videovars_match.group(2) val = videovars_match.group(2)
if key == 'title': if key == 'title':
video_title = compat_urllib_parse.unquote_plus(val) video_title = compat_urllib_parse_unquote_plus(val)
if key == 'duration': if key == 'duration':
try: try:
duration = int(val) duration = int(val)

View File

@ -5,7 +5,8 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_urllib_parse, compat_urllib_parse_unquote,
compat_urllib_parse_unquote_plus,
compat_urllib_parse_urlparse, compat_urllib_parse_urlparse,
compat_urllib_request, compat_urllib_request,
) )
@ -69,7 +70,7 @@ class PornHubIE(InfoExtractor):
webpage, 'uploader', fatal=False) webpage, 'uploader', fatal=False)
thumbnail = self._html_search_regex(r'"image_url":"([^"]+)', webpage, 'thumbnail', fatal=False) thumbnail = self._html_search_regex(r'"image_url":"([^"]+)', webpage, 'thumbnail', fatal=False)
if thumbnail: if thumbnail:
thumbnail = compat_urllib_parse.unquote(thumbnail) thumbnail = compat_urllib_parse_unquote(thumbnail)
view_count = self._extract_count( view_count = self._extract_count(
r'<span class="count">([\d,\.]+)</span> views', webpage, 'view') r'<span class="count">([\d,\.]+)</span> views', webpage, 'view')
@ -80,9 +81,9 @@ class PornHubIE(InfoExtractor):
comment_count = self._extract_count( comment_count = self._extract_count(
r'All Comments\s*<span>\(([\d,.]+)\)', webpage, 'comment') r'All Comments\s*<span>\(([\d,.]+)\)', webpage, 'comment')
video_urls = list(map(compat_urllib_parse.unquote, re.findall(r'"quality_[0-9]{3}p":"([^"]+)', webpage))) video_urls = list(map(compat_urllib_parse_unquote, re.findall(r"player_quality_[0-9]{3}p\s*=\s*'([^']+)'", webpage)))
if webpage.find('"encrypted":true') != -1: if webpage.find('"encrypted":true') != -1:
password = compat_urllib_parse.unquote_plus( password = compat_urllib_parse_unquote_plus(
self._search_regex(r'"video_title":"([^"]+)', webpage, 'password')) self._search_regex(r'"video_title":"([^"]+)', webpage, 'password'))
video_urls = list(map(lambda s: aes_decrypt_text(s, password, 32).decode('utf-8'), video_urls)) video_urls = list(map(lambda s: aes_decrypt_text(s, password, 32).decode('utf-8'), video_urls))
@ -93,7 +94,7 @@ class PornHubIE(InfoExtractor):
format = path.split('/')[5].split('_')[:2] format = path.split('/')[5].split('_')[:2]
format = "-".join(format) format = "-".join(format)
m = re.match(r'^(?P<height>[0-9]+)P-(?P<tbr>[0-9]+)K$', format) m = re.match(r'^(?P<height>[0-9]+)[pP]-(?P<tbr>[0-9]+)[kK]$', format)
if m is None: if m is None:
height = None height = None
tbr = None tbr = None
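The relaxed format pattern in the last PornHub hunk accepts both upper- and lower-case suffixes. A minimal sketch of that parsing step follows; `parse_format` is a hypothetical name and the sample strings are made up for illustration.

```python
import re


def parse_format(fmt):
    """Split a 'HEIGHTp-TBRk' style string into (height, tbr) integers.

    Returns (None, None) when the string does not match.
    Hypothetical helper for illustration only.
    """
    m = re.match(r'^(?P<height>[0-9]+)[pP]-(?P<tbr>[0-9]+)[kK]$', fmt)
    if m is None:
        return None, None
    return int(m.group('height')), int(m.group('tbr'))


print(parse_format('720P-1500K'))
print(parse_format('480p-600k'))
print(parse_format('unknown'))
```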

View File

@ -1,7 +1,5 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_urllib_parse, compat_urllib_parse,
@ -31,12 +29,7 @@ class PrimeShareTVIE(InfoExtractor):
if '>File not exist<' in webpage: if '>File not exist<' in webpage:
raise ExtractorError('Video %s does not exist' % video_id, expected=True) raise ExtractorError('Video %s does not exist' % video_id, expected=True)
fields = dict(re.findall(r'''(?x)<input\s+ fields = self._hidden_inputs(webpage)
type="hidden"\s+
name="([^"]+)"\s+
(?:id="[^"]+"\s+)?
value="([^"]*)"
''', webpage))
headers = { headers = {
'Referer': url, 'Referer': url,

View File

@ -35,10 +35,7 @@ class PromptFileIE(InfoExtractor):
raise ExtractorError('Video %s does not exist' % video_id, raise ExtractorError('Video %s does not exist' % video_id,
expected=True) expected=True)
fields = dict(re.findall(r'''(?x)type="hidden"\s+ fields = self._hidden_inputs(webpage)
name="(.+?)"\s+
value="(.*?)"
''', webpage))
post = compat_urllib_parse.urlencode(fields) post = compat_urllib_parse.urlencode(fields)
req = compat_urllib_request.Request(url, post) req = compat_urllib_request.Request(url, post)
req.add_header('Content-type', 'application/x-www-form-urlencoded') req.add_header('Content-type', 'application/x-www-form-urlencoded')

View File

@ -9,8 +9,11 @@ from ..compat import (
compat_urllib_parse, compat_urllib_parse,
) )
from ..utils import ( from ..utils import (
unified_strdate, ExtractorError,
determine_ext,
float_or_none,
int_or_none, int_or_none,
unified_strdate,
) )
@ -21,6 +24,11 @@ class ProSiebenSat1IE(InfoExtractor):
_TESTS = [ _TESTS = [
{ {
# Tests changes introduced in https://github.com/rg3/youtube-dl/pull/6242
# in response to fixing https://github.com/rg3/youtube-dl/issues/6215:
# - malformed f4m manifest support
# - proper handling of URLs starting with `https?://` in 2.0 manifests
# - recursive child f4m manifests extraction
'url': 'http://www.prosieben.de/tv/circus-halligalli/videos/218-staffel-2-episode-18-jahresrueckblick-ganze-folge', 'url': 'http://www.prosieben.de/tv/circus-halligalli/videos/218-staffel-2-episode-18-jahresrueckblick-ganze-folge',
'info_dict': { 'info_dict': {
'id': '2104602', 'id': '2104602',
@ -208,7 +216,7 @@ class ProSiebenSat1IE(InfoExtractor):
clip_id = self._html_search_regex(self._CLIPID_REGEXES, webpage, 'clip id') clip_id = self._html_search_regex(self._CLIPID_REGEXES, webpage, 'clip id')
access_token = 'prosieben' access_token = 'prosieben'
client_name = 'kolibri-1.12.6' client_name = 'kolibri-2.0.19-splec4'
client_location = url client_location = url
videos_api_url = 'http://vas.sim-technik.de/vas/live/v2/videos?%s' % compat_urllib_parse.urlencode({ videos_api_url = 'http://vas.sim-technik.de/vas/live/v2/videos?%s' % compat_urllib_parse.urlencode({
@ -218,10 +226,13 @@ class ProSiebenSat1IE(InfoExtractor):
'ids': clip_id, 'ids': clip_id,
}) })
videos = self._download_json(videos_api_url, clip_id, 'Downloading videos JSON') video = self._download_json(videos_api_url, clip_id, 'Downloading videos JSON')[0]
duration = float(videos[0]['duration']) if video.get('is_protected') is True:
source_ids = [source['id'] for source in videos[0]['sources']] raise ExtractorError('This video is DRM protected.', expected=True)
duration = float_or_none(video.get('duration'))
source_ids = [source['id'] for source in video['sources']]
source_ids_str = ','.join(map(str, source_ids)) source_ids_str = ','.join(map(str, source_ids))
g = '01!8d8F_)r9]4s[qeuXfP%' g = '01!8d8F_)r9]4s[qeuXfP%'
@ -275,8 +286,9 @@ class ProSiebenSat1IE(InfoExtractor):
for source in urls_sources: for source in urls_sources:
protocol = source['protocol'] protocol = source['protocol']
source_url = source['url']
if protocol == 'rtmp' or protocol == 'rtmpe': if protocol == 'rtmp' or protocol == 'rtmpe':
mobj = re.search(r'^(?P<url>rtmpe?://[^/]+)/(?P<path>.+)$', source['url']) mobj = re.search(r'^(?P<url>rtmpe?://[^/]+)/(?P<path>.+)$', source_url)
if not mobj: if not mobj:
continue continue
path = mobj.group('path') path = mobj.group('path')
@ -293,9 +305,11 @@ class ProSiebenSat1IE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'format_id': '%s_%s' % (source['cdn'], source['bitrate']), 'format_id': '%s_%s' % (source['cdn'], source['bitrate']),
}) })
elif 'f4mgenerator' in source_url or determine_ext(source_url) == 'f4m':
formats.extend(self._extract_f4m_formats(source_url, clip_id))
else: else:
formats.append({ formats.append({
'url': source['url'], 'url': source_url,
'vbr': fix_bitrate(source['bitrate']), 'vbr': fix_bitrate(source['bitrate']),
}) })

View File

@ -9,12 +9,14 @@ from .common import InfoExtractor
from ..utils import ( from ..utils import (
strip_jsonp, strip_jsonp,
unescapeHTML, unescapeHTML,
clean_html,
) )
from ..compat import compat_urllib_request from ..compat import compat_urllib_request
class QQMusicIE(InfoExtractor): class QQMusicIE(InfoExtractor):
IE_NAME = 'qqmusic' IE_NAME = 'qqmusic'
IE_DESC = 'QQ音乐'
_VALID_URL = r'http://y.qq.com/#type=song&mid=(?P<id>[0-9A-Za-z]+)' _VALID_URL = r'http://y.qq.com/#type=song&mid=(?P<id>[0-9A-Za-z]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://y.qq.com/#type=song&mid=004295Et37taLD', 'url': 'http://y.qq.com/#type=song&mid=004295Et37taLD',
@ -26,6 +28,20 @@ class QQMusicIE(InfoExtractor):
'upload_date': '20141227', 'upload_date': '20141227',
'creator': '林俊杰', 'creator': '林俊杰',
'description': 'md5:d327722d0361576fde558f1ac68a7065', 'description': 'md5:d327722d0361576fde558f1ac68a7065',
'thumbnail': 're:^https?://.*\.jpg$',
}
}, {
'note': 'There is no mp3-320 version of this song.',
'url': 'http://y.qq.com/#type=song&mid=004MsGEo3DdNxV',
'md5': 'fa3926f0c585cda0af8fa4f796482e3e',
'info_dict': {
'id': '004MsGEo3DdNxV',
'ext': 'mp3',
'title': '如果',
'upload_date': '20050626',
'creator': '李季美',
'description': 'md5:46857d5ed62bc4ba84607a805dccf437',
'thumbnail': 're:^https?://.*\.jpg$',
} }
}] }]
@ -68,6 +84,14 @@ class QQMusicIE(InfoExtractor):
if lrc_content: if lrc_content:
lrc_content = lrc_content.replace('\\n', '\n') lrc_content = lrc_content.replace('\\n', '\n')
thumbnail_url = None
albummid = self._search_regex(
[r'albummid:\'([0-9a-zA-Z]+)\'', r'"albummid":"([0-9a-zA-Z]+)"'],
detail_info_page, 'album mid', default=None)
if albummid:
thumbnail_url = "http://i.gtimg.cn/music/photo/mid_album_500/%s/%s/%s.jpg" \
% (albummid[-2:-1], albummid[-1], albummid)
guid = self.m_r_get_ruin() guid = self.m_r_get_ruin()
vkey = self._download_json( vkey = self._download_json(
@ -85,6 +109,7 @@ class QQMusicIE(InfoExtractor):
'preference': details['preference'], 'preference': details['preference'],
'abr': details.get('abr'), 'abr': details.get('abr'),
}) })
self._check_formats(formats, mid)
self._sort_formats(formats) self._sort_formats(formats)
return { return {
@ -94,6 +119,7 @@ class QQMusicIE(InfoExtractor):
'upload_date': publish_time, 'upload_date': publish_time,
'creator': singer, 'creator': singer,
'description': lrc_content, 'description': lrc_content,
'thumbnail': thumbnail_url,
} }
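The thumbnail URL added to `QQMusicIE` above is derived purely from the album mid: the second-to-last character, the last character, then the full mid. A minimal sketch of that string construction, with `album_thumbnail_url` as a hypothetical name; whether the resulting URL actually serves an image is not verified here.

```python
def album_thumbnail_url(albummid):
    """Build the album cover URL from an album mid.

    Mirrors the string formatting in the diff above; hypothetical helper.
    """
    return 'http://i.gtimg.cn/music/photo/mid_album_500/%s/%s/%s.jpg' % (
        albummid[-2:-1],  # second-to-last character
        albummid[-1],     # last character
        albummid)


print(album_thumbnail_url('000gXCTb2AhRR1'))
```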
@ -117,6 +143,7 @@ class QQPlaylistBaseIE(InfoExtractor):
class QQMusicSingerIE(QQPlaylistBaseIE): class QQMusicSingerIE(QQPlaylistBaseIE):
IE_NAME = 'qqmusic:singer' IE_NAME = 'qqmusic:singer'
IE_DESC = 'QQ音乐 - 歌手'
_VALID_URL = r'http://y.qq.com/#type=singer&mid=(?P<id>[0-9A-Za-z]+)' _VALID_URL = r'http://y.qq.com/#type=singer&mid=(?P<id>[0-9A-Za-z]+)'
_TEST = { _TEST = {
'url': 'http://y.qq.com/#type=singer&mid=001BLpXF2DyJe2', 'url': 'http://y.qq.com/#type=singer&mid=001BLpXF2DyJe2',
@ -161,39 +188,50 @@ class QQMusicSingerIE(QQPlaylistBaseIE):
class QQMusicAlbumIE(QQPlaylistBaseIE): class QQMusicAlbumIE(QQPlaylistBaseIE):
IE_NAME = 'qqmusic:album' IE_NAME = 'qqmusic:album'
IE_DESC = 'QQ音乐 - 专辑'
_VALID_URL = r'http://y.qq.com/#type=album&mid=(?P<id>[0-9A-Za-z]+)' _VALID_URL = r'http://y.qq.com/#type=album&mid=(?P<id>[0-9A-Za-z]+)'
_TEST = { _TESTS = [{
'url': 'http://y.qq.com/#type=album&mid=000gXCTb2AhRR1&play=0', 'url': 'http://y.qq.com/#type=album&mid=000gXCTb2AhRR1',
'info_dict': { 'info_dict': {
'id': '000gXCTb2AhRR1', 'id': '000gXCTb2AhRR1',
'title': '我们都是这样长大的', 'title': '我们都是这样长大的',
'description': 'md5:d216c55a2d4b3537fe4415b8767d74d6', 'description': 'md5:179c5dce203a5931970d306aa9607ea6',
}, },
'playlist_count': 4, 'playlist_count': 4,
} }, {
'url': 'http://y.qq.com/#type=album&mid=002Y5a3b3AlCu3',
'info_dict': {
'id': '002Y5a3b3AlCu3',
'title': '그리고...',
'description': 'md5:a48823755615508a95080e81b51ba729',
},
'playlist_count': 8,
}]
def _real_extract(self, url): def _real_extract(self, url):
mid = self._match_id(url) mid = self._match_id(url)
album_page = self._download_webpage( album = self._download_json(
self.qq_static_url('album', mid), mid, 'Download album page') 'http://i.y.qq.com/v8/fcg-bin/fcg_v8_album_info_cp.fcg?albummid=%s&format=json' % mid,
mid, 'Download album page')['data']
entries = self.get_entries_from_page(album_page) entries = [
self.url_result(
album_name = self._html_search_regex( 'http://y.qq.com/#type=song&mid=' + song['songmid'], 'QQMusic', song['songmid']
r"albumname\s*:\s*'([^']+)',", album_page, 'album name', ) for song in album['list']
default=None) ]
album_name = album.get('name')
album_detail = self._html_search_regex( album_detail = album.get('desc')
r'<div class="album_detail close_detail">\s*<p>((?:[^<>]+(?:<br />)?)+)</p>', if album_detail is not None:
album_page, 'album details', default=None) album_detail = album_detail.strip()
return self.playlist_result(entries, mid, album_name, album_detail) return self.playlist_result(entries, mid, album_name, album_detail)
class QQMusicToplistIE(QQPlaylistBaseIE): class QQMusicToplistIE(QQPlaylistBaseIE):
IE_NAME = 'qqmusic:toplist' IE_NAME = 'qqmusic:toplist'
IE_DESC = 'QQ音乐 - 排行榜'
_VALID_URL = r'http://y\.qq\.com/#type=toplist&p=(?P<id>(top|global)_[0-9]+)' _VALID_URL = r'http://y\.qq\.com/#type=toplist&p=(?P<id>(top|global)_[0-9]+)'
_TESTS = [{ _TESTS = [{
@ -243,3 +281,37 @@ class QQMusicToplistIE(QQPlaylistBaseIE):
list_name = topinfo.get('ListName') list_name = topinfo.get('ListName')
list_description = topinfo.get('info') list_description = topinfo.get('info')
return self.playlist_result(entries, list_id, list_name, list_description) return self.playlist_result(entries, list_id, list_name, list_description)
class QQMusicPlaylistIE(QQPlaylistBaseIE):
IE_NAME = 'qqmusic:playlist'
IE_DESC = 'QQ音乐 - 歌单'
_VALID_URL = r'http://y\.qq\.com/#type=taoge&id=(?P<id>[0-9]+)'
_TEST = {
'url': 'http://y.qq.com/#type=taoge&id=3462654915',
'info_dict': {
'id': '3462654915',
'title': '韩国5月新歌精选下旬',
'description': 'md5:d2c9d758a96b9888cf4fe82f603121d4',
},
'playlist_count': 40,
}
def _real_extract(self, url):
list_id = self._match_id(url)
list_json = self._download_json(
'http://i.y.qq.com/qzone-music/fcg-bin/fcg_ucc_getcdinfo_byids_cp.fcg?type=1&json=1&utf8=1&onlysong=0&disstid=%s'
% list_id, list_id, 'Download list page',
transform_source=strip_jsonp)['cdlist'][0]
entries = [
self.url_result(
'http://y.qq.com/#type=song&mid=' + song['songmid'], 'QQMusic', song['songmid']
) for song in list_json['songlist']
]
list_name = list_json.get('dissname')
list_description = clean_html(unescapeHTML(list_json.get('desc')))
return self.playlist_result(entries, list_id, list_name, list_description)

View File

@ -0,0 +1,73 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
parse_duration,
parse_iso8601,
)
class RDSIE(InfoExtractor):
IE_DESC = 'RDS.ca'
_VALID_URL = r'https?://(?:www\.)?rds\.ca/vid(?:[eé]|%C3%A9)os/(?:[^/]+/)*(?P<display_id>[^/]+)-(?P<id>\d+\.\d+)'
_TESTS = [{
'url': 'http://www.rds.ca/videos/football/nfl/fowler-jr-prend-la-direction-de-jacksonville-3.1132799',
'info_dict': {
'id': '3.1132799',
'display_id': 'fowler-jr-prend-la-direction-de-jacksonville',
'ext': 'mp4',
'title': 'Fowler Jr. prend la direction de Jacksonville',
'description': 'Dante Fowler Jr. est le troisième choix du repêchage 2015 de la NFL. ',
'timestamp': 1430397346,
'upload_date': '20150430',
'duration': 154.354,
'age_limit': 0,
}
}, {
'url': 'http://www.rds.ca/vid%C3%A9os/un-voyage-positif-3.877934',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
display_id = mobj.group('display_id')
webpage = self._download_webpage(url, display_id)
# TODO: extract f4m from 9c9media.com
video_url = self._search_regex(
r'<span[^>]+itemprop="contentURL"[^>]+content="([^"]+)"',
webpage, 'video url')
title = self._og_search_title(webpage) or self._html_search_meta(
'title', webpage, 'title', fatal=True)
description = self._og_search_description(webpage) or self._html_search_meta(
'description', webpage, 'description')
thumbnail = self._og_search_thumbnail(webpage) or self._search_regex(
[r'<link[^>]+itemprop="thumbnailUrl"[^>]+href="([^"]+)"',
r'<span[^>]+itemprop="thumbnailUrl"[^>]+content="([^"]+)"'],
webpage, 'thumbnail', fatal=False)
timestamp = parse_iso8601(self._search_regex(
r'<span[^>]+itemprop="uploadDate"[^>]+content="([^"]+)"',
webpage, 'upload date', fatal=False))
duration = parse_duration(self._search_regex(
r'<span[^>]+itemprop="duration"[^>]+content="([^"]+)"',
webpage, 'duration', fatal=False))
age_limit = self._family_friendly_search(webpage)
return {
'id': video_id,
'display_id': display_id,
'url': video_url,
'title': title,
'description': description,
'thumbnail': thumbnail,
'timestamp': timestamp,
'duration': duration,
'age_limit': age_limit,
}
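The `_VALID_URL` pattern of `RDSIE` above splits the last path segment into a display id and a dotted numeric id. The sketch below applies the same pattern to the two test URLs from the diff; `split_rds_url` is a hypothetical name.

```python
# coding: utf-8
import re

# Pattern copied from RDSIE._VALID_URL in the diff above.
VALID_URL = (
    r'https?://(?:www\.)?rds\.ca/vid(?:[eé]|%C3%A9)os/'
    r'(?:[^/]+/)*(?P<display_id>[^/]+)-(?P<id>\d+\.\d+)')


def split_rds_url(url):
    """Return (video_id, display_id) or None if the URL does not match.

    Hypothetical helper for illustration only.
    """
    m = re.match(VALID_URL, url)
    if m is None:
        return None
    return m.group('id'), m.group('display_id')


print(split_rds_url(
    'http://www.rds.ca/videos/football/nfl/'
    'fowler-jr-prend-la-direction-de-jacksonville-3.1132799'))
print(split_rds_url('http://www.rds.ca/vid%C3%A9os/un-voyage-positif-3.877934'))
```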

View File

@ -43,6 +43,25 @@ class RtlNlIE(InfoExtractor):
'upload_date': '20150215', 'upload_date': '20150215',
'description': 'Er zijn nieuwe beelden vrijgegeven die vlak na de aanslag in Kopenhagen zijn gemaakt. Op de video is goed te zien hoe omstanders zich bekommeren om één van de slachtoffers, terwijl de eerste agenten ter plaatse komen.', 'description': 'Er zijn nieuwe beelden vrijgegeven die vlak na de aanslag in Kopenhagen zijn gemaakt. Op de video is goed te zien hoe omstanders zich bekommeren om één van de slachtoffers, terwijl de eerste agenten ter plaatse komen.',
} }
}, {
# empty synopsis and missing episodes (see https://github.com/rg3/youtube-dl/issues/6275)
'url': 'http://www.rtl.nl/system/videoplayer/derden/rtlnieuws/video_embed.html#uuid=f536aac0-1dc3-4314-920e-3bd1c5b3811a/autoplay=false',
'info_dict': {
'id': 'f536aac0-1dc3-4314-920e-3bd1c5b3811a',
'ext': 'mp4',
'title': 'RTL Nieuws - Meer beelden van overval juwelier',
'thumbnail': 're:^https?://screenshots\.rtl\.nl/system/thumb/sz=[0-9]+x[0-9]+/uuid=f536aac0-1dc3-4314-920e-3bd1c5b3811a$',
'timestamp': 1437233400,
'upload_date': '20150718',
'duration': 30.474,
},
'params': {
'skip_download': True,
},
}, {
# encrypted m3u8 streams, georestricted
'url': 'http://www.rtlxl.nl/#!/afl-2-257632/52a74543-c504-4cde-8aa8-ec66fe8d68a7',
'only_matching': True,
}, { }, {
'url': 'http://www.rtl.nl/system/videoplayer/derden/embed.html#!/uuid=bb0353b0-d6a4-1dad-90e9-18fe75b8d1f0', 'url': 'http://www.rtl.nl/system/videoplayer/derden/embed.html#!/uuid=bb0353b0-d6a4-1dad-90e9-18fe75b8d1f0',
'only_matching': True, 'only_matching': True,
@ -51,21 +70,33 @@ class RtlNlIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
uuid = self._match_id(url) uuid = self._match_id(url)
info = self._download_json( info = self._download_json(
'http://www.rtl.nl/system/s4m/vfd/version=2/uuid=%s/fmt=flash/' % uuid, 'http://www.rtl.nl/system/s4m/vfd/version=2/uuid=%s/fmt=adaptive/' % uuid,
uuid) uuid)
material = info['material'][0] material = info['material'][0]
progname = info['abstracts'][0]['name'] title = info['abstracts'][0]['name']
subtitle = material['title'] or info['episodes'][0]['name'] subtitle = material.get('title')
description = material.get('synopsis') or info['episodes'][0]['synopsis'] if subtitle:
title += ' - %s' % subtitle
description = material.get('synopsis')
# Use unencrypted m3u8 streams (See https://github.com/rg3/youtube-dl/issues/4118) meta = info.get('meta', {})
videopath = material['videopath'].replace('.f4m', '.m3u8')
m3u8_url = 'http://manifest.us.rtl.nl' + videopath # m3u8 streams are encrypted and may not be handled properly by older ffmpeg/avconv.
# To work around this, the adaptive -> flash trick was previously used to obtain
# unencrypted m3u8 streams (see https://github.com/rg3/youtube-dl/issues/4118)
# and to bypass georestrictions as well.
# Currently, unencrypted m3u8 playlists are (intentionally?) invalid and therefore
# unusable, although they can be fixed by simple string replacement (see
# https://github.com/rg3/youtube-dl/pull/6337).
# Since recent ffmpeg and avconv handle encrypted streams just fine, encrypted
# streams are used now.
videopath = material['videopath']
m3u8_url = meta.get('videohost', 'http://manifest.us.rtl.nl') + videopath
formats = self._extract_m3u8_formats(m3u8_url, uuid, ext='mp4') formats = self._extract_m3u8_formats(m3u8_url, uuid, ext='mp4')
video_urlpart = videopath.split('/flash/')[1][:-5] video_urlpart = videopath.split('/adaptive/')[1][:-5]
PG_URL_TEMPLATE = 'http://pg.us.rtl.nl/rtlxl/network/%s/progressive/%s.mp4' PG_URL_TEMPLATE = 'http://pg.us.rtl.nl/rtlxl/network/%s/progressive/%s.mp4'
formats.extend([ formats.extend([
@ -82,7 +113,7 @@ class RtlNlIE(InfoExtractor):
self._sort_formats(formats) self._sort_formats(formats)
thumbnails = [] thumbnails = []
meta = info.get('meta', {})
for p in ('poster_base_url', '"thumb_base_url"'): for p in ('poster_base_url', '"thumb_base_url"'):
if not meta.get(p): if not meta.get(p):
continue continue
@ -98,7 +129,7 @@ class RtlNlIE(InfoExtractor):
return { return {
'id': uuid, 'id': uuid,
'title': '%s - %s' % (progname, subtitle), 'title': title,
'formats': formats, 'formats': formats,
'timestamp': material['original_date'], 'timestamp': material['original_date'],
'description': description, 'description': description,
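The new title logic in `RtlNlIE` above no longer requires an `episodes` entry: the program name is always used and the material title is appended only when present. A minimal sketch under that reading, with `build_title` as a hypothetical name and hand-written input dicts standing in for the JSON response:

```python
def build_title(info):
    """Compose 'program - subtitle', or just 'program' without a subtitle.

    `info` is assumed to have the 'abstracts' and 'material' lists used
    in the diff above. Hypothetical helper for illustration only.
    """
    material = info['material'][0]
    title = info['abstracts'][0]['name']
    subtitle = material.get('title')
    if subtitle:
        title += ' - %s' % subtitle
    return title


print(build_title({
    'abstracts': [{'name': 'RTL Nieuws'}],
    'material': [{'title': 'Meer beelden van overval juwelier'}],
}))
print(build_title({
    'abstracts': [{'name': 'RTL Nieuws'}],
    'material': [{}],
}))
```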

View File

@ -19,7 +19,16 @@ from ..utils import (
class RTSIE(InfoExtractor): class RTSIE(InfoExtractor):
IE_DESC = 'RTS.ch' IE_DESC = 'RTS.ch'
_VALID_URL = r'https?://(?:www\.)?rts\.ch/(?:(?:[^/]+/){2,}(?P<id>[0-9]+)-(?P<display_id>.+?)\.html|play/tv/[^/]+/video/(?P<display_id_new>.+?)\?id=(?P<id_new>[0-9]+))' _VALID_URL = r'''(?x)
(?:
rts:(?P<rts_id>\d+)|
https?://
(?:www\.)?rts\.ch/
(?:
(?:[^/]+/){2,}(?P<id>[0-9]+)-(?P<display_id>.+?)\.html|
play/tv/[^/]+/video/(?P<display_id_new>.+?)\?id=(?P<id_new>[0-9]+)
)
)'''
_TESTS = [ _TESTS = [
{ {
@ -122,6 +131,15 @@ class RTSIE(InfoExtractor):
'view_count': int, 'view_count': int,
}, },
}, },
{
# article with videos on rhs
'url': 'http://www.rts.ch/sport/hockey/6693917-hockey-davos-decroche-son-31e-titre-de-champion-de-suisse.html',
'info_dict': {
'id': '6693917',
'title': 'Hockey: Davos décroche son 31e titre de champion de Suisse',
},
'playlist_mincount': 5,
},
{ {
'url': 'http://www.rts.ch/play/tv/le-19h30/video/le-chantier-du-nouveau-parlement-vaudois-a-permis-une-trouvaille-historique?id=6348280', 'url': 'http://www.rts.ch/play/tv/le-19h30/video/le-chantier-du-nouveau-parlement-vaudois-a-permis-une-trouvaille-historique?id=6348280',
'only_matching': True, 'only_matching': True,
@ -130,7 +148,7 @@ class RTSIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
m = re.match(self._VALID_URL, url) m = re.match(self._VALID_URL, url)
video_id = m.group('id') or m.group('id_new') video_id = m.group('rts_id') or m.group('id') or m.group('id_new')
display_id = m.group('display_id') or m.group('display_id_new') display_id = m.group('display_id') or m.group('display_id_new')
def download_json(internal_id): def download_json(internal_id):
@ -143,6 +161,15 @@ class RTSIE(InfoExtractor):
# video_id extracted out of URL is not always a real id # video_id extracted out of URL is not always a real id
if 'video' not in all_info and 'audio' not in all_info: if 'video' not in all_info and 'audio' not in all_info:
page = self._download_webpage(url, display_id) page = self._download_webpage(url, display_id)
# article with videos on rhs
videos = re.findall(
r'<article[^>]+class="content-item"[^>]*>\s*<a[^>]+data-video-urn="urn:rts:video:(\d+)"',
page)
if videos:
entries = [self.url_result('rts:%s' % video_urn, 'RTS') for video_urn in videos]
return self.playlist_result(entries, video_id, self._og_search_title(page))
internal_id = self._html_search_regex( internal_id = self._html_search_regex(
r'<(?:video|audio) data-id="([0-9]+)"', page, r'<(?:video|audio) data-id="([0-9]+)"', page,
'internal video id') 'internal video id')
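The verbose `_VALID_URL` of `RTSIE` above now accepts an internal `rts:<id>` form next to the two web URL shapes, which is what lets the playlist entries be routed back to the same extractor. The sketch below copies the pattern from the diff and shows which groups each shape fills; `split_rts_url` is a hypothetical name.

```python
import re

# Pattern copied from RTSIE._VALID_URL in the diff above.
# (?x) enables verbose mode, so the whitespace inside is ignored.
VALID_URL = r'''(?x)
    (?:
        rts:(?P<rts_id>\d+)|
        https?://
            (?:www\.)?rts\.ch/
            (?:
                (?:[^/]+/){2,}(?P<id>[0-9]+)-(?P<display_id>.+?)\.html|
                play/tv/[^/]+/video/(?P<display_id_new>.+?)\?id=(?P<id_new>[0-9]+)
            )
    )'''


def split_rts_url(url):
    """Return (video_id, display_id) or None if the URL does not match.

    display_id is None for the 'rts:<id>' form. Hypothetical helper.
    """
    m = re.match(VALID_URL, url)
    if m is None:
        return None
    video_id = m.group('rts_id') or m.group('id') or m.group('id_new')
    display_id = m.group('display_id') or m.group('display_id_new')
    return video_id, display_id


print(split_rts_url('rts:6693917'))
print(split_rts_url(
    'http://www.rts.ch/sport/hockey/'
    '6693917-hockey-davos-decroche-son-31e-titre-de-champion-de-suisse.html'))
```

Note that the `rts:` form yields no display id, so any code path that relies on `display_id` has to tolerate `None` for such URLs.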

View File

@ -1,17 +1,12 @@
# -*- coding: utf-8 -*- # -*- coding: utf-8 -*-
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import (
js_to_json,
remove_end,
)
class SBSIE(InfoExtractor): class SBSIE(InfoExtractor):
IE_DESC = 'sbs.com.au' IE_DESC = 'sbs.com.au'
_VALID_URL = r'https?://(?:www\.)?sbs\.com\.au/ondemand/video/(?:single/)?(?P<id>[0-9]+)' _VALID_URL = r'https?://(?:www\.)?sbs\.com\.au/(?:ondemand|news)/video/(?:single/)?(?P<id>[0-9]+)'
_TESTS = [{ _TESTS = [{
# Original URL is handled by the generic IE which finds the iframe: # Original URL is handled by the generic IE which finds the iframe:
@ -21,39 +16,36 @@ class SBSIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': '320403011771', 'id': '320403011771',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Dingo Conservation', 'title': 'Dingo Conservation (The Feed)',
'description': 'Dingoes are on the brink of extinction; most of the animals we think are dingoes are in fact crossbred with wild dogs. This family run a dingo conservation park to prevent their extinction', 'description': 'md5:f250a9856fca50d22dec0b5b8015f8a5',
'thumbnail': 're:http://.*\.jpg', 'thumbnail': 're:http://.*\.jpg',
'duration': 308,
}, },
'add_ies': ['generic'],
}, { }, {
'url': 'http://www.sbs.com.au/ondemand/video/320403011771/Dingo-Conservation-The-Feed', 'url': 'http://www.sbs.com.au/ondemand/video/320403011771/Dingo-Conservation-The-Feed',
'only_matching': True, 'only_matching': True,
}, {
'url': 'http://www.sbs.com.au/news/video/471395907773/The-Feed-July-9',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(
'http://www.sbs.com.au/ondemand/video/single/%s?context=web' % video_id, video_id)
player = self._search_regex( player_params = self._parse_json(
r'(?s)playerParams\.releaseUrls\s*=\s*(\{.*?\n\});\n', self._search_regex(
webpage, 'player') r'(?s)var\s+playerParams\s*=\s*({.+?});', webpage, 'playerParams'),
player = re.sub(r"'\s*\+\s*[\da-zA-Z_]+\s*\+\s*'", '', player) video_id)
release_urls = self._parse_json(js_to_json(player), video_id) urls = player_params['releaseUrls']
theplatform_url = (urls.get('progressive') or urls.get('standard') or
theplatform_url = release_urls.get('progressive') or release_urls['standard'] urls.get('html') or player_params['relatedItemsURL'])
title = remove_end(self._og_search_title(webpage), ' (The Feed)')
description = self._html_search_meta('description', webpage)
thumbnail = self._og_search_thumbnail(webpage)
return { return {
'_type': 'url_transparent', '_type': 'url_transparent',
'id': video_id, 'id': video_id,
'url': theplatform_url, 'url': theplatform_url,
'title': title,
'description': description,
'thumbnail': thumbnail,
} }

View File

@ -1,12 +1,11 @@
# encoding: utf-8 # encoding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
unified_strdate, unified_strdate,
js_to_json,
) )
@ -22,59 +21,48 @@ class ScreenwaveMediaIE(InfoExtractor):
video_id = self._match_id(url) video_id = self._match_id(url)
playerdata = self._download_webpage( playerdata = self._download_webpage(
'http://player.screenwavemedia.com/play/player.php?id=%s' % video_id, 'http://player.screenwavemedia.com/player.php?id=%s' % video_id,
video_id, 'Downloading player webpage') video_id, 'Downloading player webpage')
vidtitle = self._search_regex( vidtitle = self._search_regex(
r'\'vidtitle\'\s*:\s*"([^"]+)"', playerdata, 'vidtitle').replace('\\/', '/') r'\'vidtitle\'\s*:\s*"([^"]+)"', playerdata, 'vidtitle').replace('\\/', '/')
vidurl = self._search_regex(
r'\'vidurl\'\s*:\s*"([^"]+)"', playerdata, 'vidurl').replace('\\/', '/')
videolist_url = None playerconfig = self._download_webpage(
'http://player.screenwavemedia.com/player.js',
video_id, 'Downloading playerconfig webpage')
mobj = re.search(r"'videoserver'\s*:\s*'(?P<videoserver>[^']+)'", playerdata) videoserver = self._search_regex(r"\[ipaddress\]\s*=>\s*([\d\.]+)", playerdata, 'videoserver')
if mobj:
videoserver = mobj.group('videoserver') sources = self._parse_json(
mobj = re.search(r'\'vidid\'\s*:\s*"(?P<vidid>[^\']+)"', playerdata) js_to_json(
vidid = mobj.group('vidid') if mobj else video_id self._search_regex(
videolist_url = 'http://%s/vod/smil:%s.smil/jwplayer.smil' % (videoserver, vidid) r"sources\s*:\s*(\[[^\]]+?\])", playerconfig,
else: 'sources',
mobj = re.search(r"file\s*:\s*'(?P<smil>http.+?/jwplayer\.smil)'", playerdata) ).replace(
if mobj: "' + thisObj.options.videoserver + '",
videolist_url = mobj.group('smil') videoserver
).replace(
"' + playerVidId + '",
video_id
)
),
video_id
)
if videolist_url:
videolist = self._download_xml(videolist_url, video_id, 'Downloading videolist XML')
formats = [] formats = []
baseurl = vidurl[:vidurl.rfind('/') + 1] for source in sources:
for video in videolist.findall('.//video'): if source['type'] == 'hls':
src = video.get('src') formats.extend(self._extract_m3u8_formats(source['file'], video_id))
if not src: else:
continue format_label = source.get('label')
file_ = src.partition(':')[-1] height = int_or_none(self._search_regex(
width = int_or_none(video.get('width')) r'^(\d+)[pP]', format_label, 'height', default=None))
height = int_or_none(video.get('height')) formats.append({
bitrate = int_or_none(video.get('system-bitrate'), scale=1000) 'url': source['file'],
format = { 'format': format_label,
'url': baseurl + file_, 'ext': source.get('type'),
'format_id': src.rpartition('.')[0].rpartition('_')[-1],
}
if width or height:
format.update({
'tbr': bitrate,
'width': width,
'height': height, 'height': height,
}) })
else:
format.update({
'abr': bitrate,
'vcodec': 'none',
})
formats.append(format)
else:
formats = [{
'url': vidurl,
}]
self._sort_formats(formats) self._sort_formats(formats)
return { return {

View File

@ -1,6 +1,5 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re
import base64 import base64
from .common import InfoExtractor from .common import InfoExtractor
@ -35,8 +34,7 @@ class SharedIE(InfoExtractor):
raise ExtractorError( raise ExtractorError(
'Video %s does not exist' % video_id, expected=True) 'Video %s does not exist' % video_id, expected=True)
download_form = dict(re.findall( download_form = self._hidden_inputs(webpage)
r'<input type="hidden" name="([^"]+)" value="([^"]*)"', webpage))
request = compat_urllib_request.Request( request = compat_urllib_request.Request(
url, compat_urllib_parse.urlencode(download_form)) url, compat_urllib_parse.urlencode(download_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded') request.add_header('Content-Type', 'application/x-www-form-urlencoded')

View File

@ -23,6 +23,15 @@ class SnagFilmsEmbedIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': '#whilewewatch', 'title': '#whilewewatch',
} }
}, {
# invalid labels, 360p is better than 480p
'url': 'http://www.snagfilms.com/embed/player?filmId=17ca0950-a74a-11e0-a92a-0026bb61d036',
'md5': '882fca19b9eb27ef865efeeaed376a48',
'info_dict': {
'id': '17ca0950-a74a-11e0-a92a-0026bb61d036',
'ext': 'mp4',
'title': 'Life in Limbo',
}
}, { }, {
'url': 'http://www.snagfilms.com/embed/player?filmId=0000014c-de2f-d5d6-abcf-ffef58af0017', 'url': 'http://www.snagfilms.com/embed/player?filmId=0000014c-de2f-d5d6-abcf-ffef58af0017',
'only_matching': True, 'only_matching': True,
@ -52,14 +61,15 @@ class SnagFilmsEmbedIE(InfoExtractor):
if not file_: if not file_:
continue continue
type_ = source.get('type') type_ = source.get('type')
format_id = source.get('label')
ext = determine_ext(file_) ext = determine_ext(file_)
if any(_ == 'm3u8' for _ in (type_, ext)): format_id = source.get('label') or ext
if all(v == 'm3u8' for v in (type_, ext)):
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
file_, video_id, 'mp4', m3u8_id='hls')) file_, video_id, 'mp4', m3u8_id='hls'))
else: else:
bitrate = int_or_none(self._search_regex( bitrate = int_or_none(self._search_regex(
r'(\d+)kbps', file_, 'bitrate', default=None)) [r'(\d+)kbps', r'_\d{1,2}x\d{1,2}_(\d{3,})\.%s' % ext],
file_, 'bitrate', default=None))
height = int_or_none(self._search_regex( height = int_or_none(self._search_regex(
r'^(\d+)[pP]$', format_id, 'height', default=None)) r'^(\d+)[pP]$', format_id, 'height', default=None))
formats.append({ formats.append({

View File

@ -29,7 +29,7 @@ class SoundcloudIE(InfoExtractor):
_VALID_URL = r'''(?x)^(?:https?://)? _VALID_URL = r'''(?x)^(?:https?://)?
(?:(?:(?:www\.|m\.)?soundcloud\.com/ (?:(?:(?:www\.|m\.)?soundcloud\.com/
(?P<uploader>[\w\d-]+)/ (?P<uploader>[\w\d-]+)/
(?!sets/|(?:likes|tracks)/?(?:$|[?#])) (?!(?:tracks|sets(?:/[^/?#]+)?|reposts|likes|spotlight)/?(?:$|[?#]))
(?P<title>[\w\d-]+)/? (?P<title>[\w\d-]+)/?
(?P<token>[^?]+?)?(?:[?].*)?$) (?P<token>[^?]+?)?(?:[?].*)?$)
|(?:api\.soundcloud\.com/tracks/(?P<track_id>\d+) |(?:api\.soundcloud\.com/tracks/(?P<track_id>\d+)
@ -282,69 +282,150 @@ class SoundcloudSetIE(SoundcloudIE):
msgs = (compat_str(err['error_message']) for err in info['errors']) msgs = (compat_str(err['error_message']) for err in info['errors'])
raise ExtractorError('unable to download video webpage: %s' % ','.join(msgs)) raise ExtractorError('unable to download video webpage: %s' % ','.join(msgs))
entries = [self.url_result(track['permalink_url'], 'Soundcloud') for track in info['tracks']]
return { return {
'_type': 'playlist', '_type': 'playlist',
'entries': [self._extract_info_dict(track, secret_token=token) for track in info['tracks']], 'entries': entries,
'id': '%s' % info['id'], 'id': '%s' % info['id'],
'title': info['title'], 'title': info['title'],
} }
class SoundcloudUserIE(SoundcloudIE): class SoundcloudUserIE(SoundcloudIE):
_VALID_URL = r'https?://(?:(?:www|m)\.)?soundcloud\.com/(?P<user>[^/]+)/?((?P<rsrc>tracks|likes)/?)?(\?.*)?$' _VALID_URL = r'''(?x)
https?://
(?:(?:www|m)\.)?soundcloud\.com/
(?P<user>[^/]+)
(?:/
(?P<rsrc>tracks|sets|reposts|likes|spotlight)
)?
/?(?:[?#].*)?$
'''
IE_NAME = 'soundcloud:user' IE_NAME = 'soundcloud:user'
_TESTS = [{ _TESTS = [{
'url': 'https://soundcloud.com/the-concept-band', 'url': 'https://soundcloud.com/the-akashic-chronicler',
'info_dict': { 'info_dict': {
'id': '9615865', 'id': '114582580',
'title': 'The Royal Concept', 'title': 'The Akashic Chronicler (All)',
}, },
'playlist_mincount': 12 'playlist_mincount': 112,
}, {
'url': 'https://soundcloud.com/the-concept-band/likes',
'info_dict': {
'id': '9615865',
'title': 'The Royal Concept',
},
'playlist_mincount': 1,
}, { }, {
'url': 'https://soundcloud.com/the-akashic-chronicler/tracks', 'url': 'https://soundcloud.com/the-akashic-chronicler/tracks',
'only_matching': True, 'info_dict': {
'id': '114582580',
'title': 'The Akashic Chronicler (Tracks)',
},
'playlist_mincount': 50,
}, {
'url': 'https://soundcloud.com/the-akashic-chronicler/sets',
'info_dict': {
'id': '114582580',
'title': 'The Akashic Chronicler (Playlists)',
},
'playlist_mincount': 3,
}, {
'url': 'https://soundcloud.com/the-akashic-chronicler/reposts',
'info_dict': {
'id': '114582580',
'title': 'The Akashic Chronicler (Reposts)',
},
'playlist_mincount': 9,
}, {
'url': 'https://soundcloud.com/the-akashic-chronicler/likes',
'info_dict': {
'id': '114582580',
'title': 'The Akashic Chronicler (Likes)',
},
'playlist_mincount': 333,
}, {
'url': 'https://soundcloud.com/grynpyret/spotlight',
'info_dict': {
'id': '7098329',
'title': 'Grynpyret (Spotlight)',
},
'playlist_mincount': 1,
}] }]
_API_BASE = 'https://api.soundcloud.com'
_API_V2_BASE = 'https://api-v2.soundcloud.com'
_BASE_URL_MAP = {
'all': '%s/profile/soundcloud:users:%%s' % _API_V2_BASE,
'tracks': '%s/users/%%s/tracks' % _API_BASE,
'sets': '%s/users/%%s/playlists' % _API_V2_BASE,
'reposts': '%s/profile/soundcloud:users:%%s/reposts' % _API_V2_BASE,
'likes': '%s/users/%%s/likes' % _API_V2_BASE,
'spotlight': '%s/users/%%s/spotlight' % _API_V2_BASE,
}
_TITLE_MAP = {
'all': 'All',
'tracks': 'Tracks',
'sets': 'Playlists',
'reposts': 'Reposts',
'likes': 'Likes',
'spotlight': 'Spotlight',
}
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
uploader = mobj.group('user') uploader = mobj.group('user')
resource = mobj.group('rsrc')
if resource is None:
resource = 'tracks'
elif resource == 'likes':
resource = 'favorites'
url = 'http://soundcloud.com/%s/' % uploader url = 'http://soundcloud.com/%s/' % uploader
resolv_url = self._resolv_url(url) resolv_url = self._resolv_url(url)
user = self._download_json( user = self._download_json(
resolv_url, uploader, 'Downloading user info') resolv_url, uploader, 'Downloading user info')
base_url = 'http://api.soundcloud.com/users/%s/%s.json?' % (uploader, resource)
resource = mobj.group('rsrc') or 'all'
base_url = self._BASE_URL_MAP[resource] % user['id']
next_href = None
entries = [] entries = []
for i in itertools.count(): for i in itertools.count():
if not next_href:
data = compat_urllib_parse.urlencode({ data = compat_urllib_parse.urlencode({
'offset': i * 50, 'offset': i * 50,
'limit': 50, 'limit': 50,
'client_id': self._CLIENT_ID, 'client_id': self._CLIENT_ID,
'linked_partitioning': '1',
'representation': 'speedy',
}) })
new_entries = self._download_json( next_href = base_url + '?' + data
base_url + data, uploader, 'Downloading track page %s' % (i + 1))
if len(new_entries) == 0: response = self._download_json(
next_href, uploader, 'Downloading track page %s' % (i + 1))
collection = response['collection']
if not collection:
self.to_screen('%s: End page received' % uploader) self.to_screen('%s: End page received' % uploader)
break break
entries.extend(self.url_result(e['permalink_url'], 'Soundcloud') for e in new_entries)
def resolve_permalink_url(candidates):
for cand in candidates:
if isinstance(cand, dict):
permalink_url = cand.get('permalink_url')
if permalink_url and permalink_url.startswith('http'):
return permalink_url
for e in collection:
permalink_url = resolve_permalink_url((e, e.get('track'), e.get('playlist')))
if permalink_url:
entries.append(self.url_result(permalink_url))
if 'next_href' in response:
next_href = response['next_href']
if not next_href:
break
else:
next_href = None
return { return {
'_type': 'playlist', '_type': 'playlist',
'id': compat_str(user['id']), 'id': compat_str(user['id']),
'title': user['username'], 'title': '%s (%s)' % (user['username'], self._TITLE_MAP[resource]),
'entries': entries, 'entries': entries,
} }
@ -379,9 +460,7 @@ class SoundcloudPlaylistIE(SoundcloudIE):
data = self._download_json( data = self._download_json(
base_url + data, playlist_id, 'Downloading playlist') base_url + data, playlist_id, 'Downloading playlist')
entries = [ entries = [self.url_result(track['permalink_url'], 'Soundcloud') for track in data['tracks']]
self._extract_info_dict(t, quiet=True, secret_token=token)
for t in data['tracks']]
return { return {
'_type': 'playlist', '_type': 'playlist',

View File

@ -45,6 +45,14 @@ class SouthParkDeIE(SouthParkIE):
'title': 'The Government Won\'t Respect My Privacy', 'title': 'The Government Won\'t Respect My Privacy',
'description': 'Cartman explains the benefits of "Shitter" to Stan, Kyle and Craig.', 'description': 'Cartman explains the benefits of "Shitter" to Stan, Kyle and Craig.',
}, },
}, {
# non-ASCII characters in initial URL
'url': 'http://www.southpark.de/alle-episoden/s18e09-hashtag-aufwärmen',
'playlist_count': 4,
}, {
# non-ASCII characters in redirect URL
'url': 'http://www.southpark.de/alle-episoden/s18e09',
'playlist_count': 4,
}] }]

View File

@ -4,7 +4,7 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_urllib_parse, compat_urllib_parse_unquote,
compat_urllib_parse_urlparse, compat_urllib_parse_urlparse,
compat_urllib_request, compat_urllib_request,
) )
@ -68,7 +68,7 @@ class SpankwireIE(InfoExtractor):
webpage, 'comment count', fatal=False)) webpage, 'comment count', fatal=False))
video_urls = list(map( video_urls = list(map(
compat_urllib_parse.unquote, compat_urllib_parse_unquote,
re.findall(r'playerData\.cdnPath[0-9]{3,}\s*=\s*(?:encodeURIComponent\()?["\']([^"\']+)["\']', webpage))) re.findall(r'playerData\.cdnPath[0-9]{3,}\s*=\s*(?:encodeURIComponent\()?["\']([^"\']+)["\']', webpage)))
if webpage.find('flashvars\.encrypted = "true"') != -1: if webpage.find('flashvars\.encrypted = "true"') != -1:
password = self._search_regex( password = self._search_regex(

View File

@ -9,7 +9,7 @@ from .spiegeltv import SpiegeltvIE
class SpiegelIE(InfoExtractor): class SpiegelIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?spiegel\.de/video/[^/]*-(?P<id>[0-9]+)(?:-embed)?(?:\.html)?(?:#.*)?$' _VALID_URL = r'https?://(?:www\.)?spiegel\.de/video/[^/]*-(?P<id>[0-9]+)(?:-embed|-iframe)?(?:\.html)?(?:#.*)?$'
_TESTS = [{ _TESTS = [{
'url': 'http://www.spiegel.de/video/vulkan-tungurahua-in-ecuador-ist-wieder-aktiv-video-1259285.html', 'url': 'http://www.spiegel.de/video/vulkan-tungurahua-in-ecuador-ist-wieder-aktiv-video-1259285.html',
'md5': '2c2754212136f35fb4b19767d242f66e', 'md5': '2c2754212136f35fb4b19767d242f66e',
@ -39,6 +39,9 @@ class SpiegelIE(InfoExtractor):
'description': 'SPIEGEL ONLINE-Nutzer durften den deutschen Astronauten Alexander Gerst über sein Leben auf der ISS-Station befragen. Hier kommen seine Antworten auf die besten sechs Fragen.', 'description': 'SPIEGEL ONLINE-Nutzer durften den deutschen Astronauten Alexander Gerst über sein Leben auf der ISS-Station befragen. Hier kommen seine Antworten auf die besten sechs Fragen.',
'title': 'Fragen an Astronaut Alexander Gerst: "Bekommen Sie die Tageszeiten mit?"', 'title': 'Fragen an Astronaut Alexander Gerst: "Bekommen Sie die Tageszeiten mit?"',
} }
}, {
'url': 'http://www.spiegel.de/video/astronaut-alexander-gerst-von-der-iss-station-beantwortet-fragen-video-1519126-iframe.html',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):

View File

@ -77,11 +77,13 @@ class SpiegeltvIE(InfoExtractor):
'rtmp_live': True, 'rtmp_live': True,
}) })
elif determine_ext(endpoint) == 'm3u8': elif determine_ext(endpoint) == 'm3u8':
formats.extend(self._extract_m3u8_formats( m3u8_formats = self._extract_m3u8_formats(
endpoint.replace('[video]', play_path), endpoint.replace('[video]', play_path),
video_id, 'm4v', video_id, 'm4v',
preference=1, # Prefer hls since it allows working around georestriction preference=1, # Prefer hls since it allows working around georestriction
m3u8_id='hls')) m3u8_id='hls', fatal=False)
if m3u8_formats is not False:
formats.extend(m3u8_formats)
else: else:
formats.append({ formats.append({
'url': endpoint, 'url': endpoint,

Some files were not shown because too many files have changed in this diff