Merge pull request #6 from rg3/master

update
This commit is contained in:
siddht1 2016-04-25 10:33:44 +05:30
commit 23edc3f8dc
91 changed files with 1608 additions and 1110 deletions


@@ -6,8 +6,8 @@
---
-### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.04.06*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated versions will be rejected.
+### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.04.24*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated versions will be rejected.
-- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.04.06**
+- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.04.24**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2016.04.06
+[debug] youtube-dl version 2016.04.24
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}


@@ -167,3 +167,4 @@ Kacper Michajłow
José Joaquín Atria
Viťas Strádal
Kagami Hiiragi
+Philip Huppert


@@ -140,14 +140,14 @@ After you have ensured this site is distributing its content legally, you can f
        # TODO more properties (see youtube_dl/extractor/common.py)
    }
```
-5. Add an import in [`youtube_dl/extractor/__init__.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/__init__.py).
+5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc.
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L68-L226). Add tests and code for as many as you want.
8. Keep in mind that the only mandatory fields in the info dict for a successful extraction are `id`, `title` and either `url` or `formats`; these are the critical data without which extraction makes no sense. This means that [any field](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L138-L226) apart from the aforementioned mandatory ones should be treated **as optional**, and extraction should be **tolerant** of situations where the sources for these fields are unavailable (even if they are always available at the moment) and **future-proof** so as not to break the extraction of the general-purpose mandatory fields. For example, if you have some intermediate dict `meta` that is a source of metadata and it has a key `summary` that you want to extract and put into the resulting info dict as `description`, you should be prepared for this key to be missing from the `meta` dict, i.e. you should extract it as `meta.get('summary')` and not `meta['summary']`. Similarly, you should pass `fatal=False` when extracting data from a webpage with `_search_regex`/`_html_search_regex`.
9. Check the code with [flake8](https://pypi.python.org/pypi/flake8).
10. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files, [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
-$ git add youtube_dl/extractor/__init__.py
+$ git add youtube_dl/extractor/extractors.py
$ git add youtube_dl/extractor/yourextractor.py
$ git commit -m '[yourextractor] Add new extractor'
$ git push origin yourextractor
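The tolerant-extraction pattern described in step 8 can be sketched as follows (the `build_info` helper, the shape of the `meta` dict and the regex are hypothetical illustrations, not youtube-dl's API):

```python
import re

def build_info(meta, webpage):
    # Mandatory fields -- fail loudly if these are truly missing.
    info = {
        'id': meta['id'],
        'title': meta['title'],
        'url': meta['url'],
    }
    # Optional field: .get() yields None instead of raising KeyError.
    info['description'] = meta.get('summary')
    # Optional field scraped from the page: tolerate a failed match
    # (this mirrors passing fatal=False to _search_regex).
    m = re.search(r'"uploader":\s*"([^"]+)"', webpage)
    info['uploader'] = m.group(1) if m else None
    return info
```

This way a site dropping its `summary` key or `uploader` markup degrades gracefully instead of breaking the whole extraction.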


@@ -176,7 +176,9 @@ which means you can modify it, redistribute it or use it however you like.
    --xattr-set-filesize             Set file xattribute ytdl.filesize with
                                     expected filesize (experimental)
-   --hls-prefer-native              Use the native HLS downloader instead of
-                                    ffmpeg (experimental)
+   --hls-prefer-native              Use the native HLS downloader instead of
+                                    ffmpeg
+   --hls-prefer-ffmpeg              Use ffmpeg instead of the native HLS
+                                    downloader
    --hls-use-mpegts                 Use the mpegts container for HLS videos,
                                     allowing to play the video while
                                     downloading (some players may not be able
@@ -515,6 +517,18 @@ Available for the video that is an episode of some series or programme:
- `episode_number`: Number of the video episode within a season
- `episode_id`: Id of the video episode
+Available for the media that is a track or a part of a music album:
+- `track`: Title of the track
+- `track_number`: Number of the track within an album or a disc
+- `track_id`: Id of the track
+- `artist`: Artist(s) of the track
+- `genre`: Genre(s) of the track
+- `album`: Title of the album the track belongs to
+- `album_type`: Type of the album
+- `album_artist`: List of all artists that appeared on the album
+- `disc_number`: Number of the disc or other physical medium the track belongs to
+- `release_year`: Year (YYYY) when the album was released
Each aforementioned sequence, when referenced in the output template, will be replaced by the actual value corresponding to the sequence name. Note that some of the sequences are not guaranteed to be present since they depend on the metadata obtained by a particular extractor; such sequences will be replaced with `NA`.
For example, for `-o %(title)s-%(id)s.%(ext)s` and an mp4 video with title `youtube-dl test video` and id `BaW_jenozKcj`, this will result in a `youtube-dl test video-BaW_jenozKcj.mp4` file created in the current directory.
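Under the hood this is ordinary Python `%`-style mapping formatting; a minimal sketch of the substitution described above (the `render_template` helper is illustrative, not youtube-dl's actual code):

```python
from collections import defaultdict

def render_template(template, info):
    # Fields missing from the metadata are replaced with the literal
    # string 'NA', as described above; defaultdict supplies the fallback
    # when %-formatting looks up an absent key.
    return template % defaultdict(lambda: 'NA', info)

name = render_template('%(title)s-%(id)s.%(ext)s',
                       {'title': 'youtube-dl test video',
                        'id': 'BaW_jenozKcj',
                        'ext': 'mp4'})
# name == 'youtube-dl test video-BaW_jenozKcj.mp4'
```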


@@ -50,6 +50,7 @@
- **arte.tv:ddc**
- **arte.tv:embed**
- **arte.tv:future**
+- **arte.tv:info**
- **arte.tv:magazine**
- **AtresPlayer**
- **ATTTechChannel**
@@ -115,6 +116,7 @@
- **Cinemassacre**
- **Clipfish**
- **cliphunter**
+- **ClipRs**
- **Clipsyndicate**
- **cloudtime**: CloudTime
- **Cloudy**
@@ -161,6 +163,7 @@
- **defense.gouv.fr**
- **democracynow**
- **DHM**: Filmarchiv - Deutsches Historisches Museum
+- **DigitallySpeaking**
- **Digiteka**
- **Discovery**
- **Dotsub**
@@ -172,7 +175,6 @@
- **Dropbox**
- **DrTuber**
- **DRTV**
-- **Dump**
- **Dumpert**
- **dvtv**: http://video.aktualne.cz/
- **dw**
@@ -286,7 +288,6 @@
- **ivi:compilation**: ivi.ru compilations
- **ivideon**: Ivideon TV
- **Izlesene**
-- **JadoreCettePub**
- **JeuxVideo**
- **Jove**
- **jpopsuki.tv**
@@ -344,19 +345,22 @@
- **metacafe**
- **Metacritic**
- **Mgoon**
+- **MGTV**: 芒果TV
- **Minhateca**
- **MinistryGrid**
- **Minoto**
- **miomio.tv**
- **MiTele**: mitele.es
- **mixcloud**
+- **mixcloud:playlist**
+- **mixcloud:stream**
+- **mixcloud:user**
- **MLB**
- **Mnet**
- **MoeVideo**: LetitBit video services: moevideo.net, playreplay.net and videochart.net
- **Mofosex**
- **Mojvideo**
- **Moniker**: allmyvideos.net and vidspot.net
-- **mooshare**: Mooshare.biz
- **Morningstar**: morningstar.com
- **Motherless**
- **Motorsport**: motorsport.com
@@ -393,7 +397,6 @@
- **ndr:embed:base**
- **NDTV**
- **NerdCubedFeed**
-- **Nerdist**
- **netease:album**: 网易云音乐 - 专辑
- **netease:djradio**: 网易云音乐 - 电台
- **netease:mv**: 网易云音乐 - MV
@@ -411,7 +414,8 @@
- **nfl.com**
- **nhl.com**
- **nhl.com:news**: NHL news
-- **nhl.com:videocenter**: NHL videocenter category
+- **nhl.com:videocenter**
+- **nhl.com:videocenter:category**: NHL videocenter category
- **nick.com**
- **niconico**: ニコニコ動画
- **NiconicoPlaylist**
@@ -459,13 +463,13 @@
- **Patreon**
- **pbs**: Public Broadcasting Service (PBS) and member stations: PBS: Public Broadcasting Service, APT - Alabama Public Television (WBIQ), GPB/Georgia Public Broadcasting (WGTV), Mississippi Public Broadcasting (WMPN), Nashville Public Television (WNPT), WFSU-TV (WFSU), WSRE (WSRE), WTCI (WTCI), WPBA/Channel 30 (WPBA), Alaska Public Media (KAKM), Arizona PBS (KAET), KNME-TV/Channel 5 (KNME), Vegas PBS (KLVX), AETN/ARKANSAS ETV NETWORK (KETS), KET (WKLE), WKNO/Channel 10 (WKNO), LPB/LOUISIANA PUBLIC BROADCASTING (WLPB), OETA (KETA), Ozarks Public Television (KOZK), WSIU Public Broadcasting (WSIU), KEET TV (KEET), KIXE/Channel 9 (KIXE), KPBS San Diego (KPBS), KQED (KQED), KVIE Public Television (KVIE), PBS SoCal/KOCE (KOCE), ValleyPBS (KVPT), CONNECTICUT PUBLIC TELEVISION (WEDH), KNPB Channel 5 (KNPB), SOPTV (KSYS), Rocky Mountain PBS (KRMA), KENW-TV3 (KENW), KUED Channel 7 (KUED), Wyoming PBS (KCWC), Colorado Public Television / KBDI 12 (KBDI), KBYU-TV (KBYU), Thirteen/WNET New York (WNET), WGBH/Channel 2 (WGBH), WGBY (WGBY), NJTV Public Media NJ (WNJT), WLIW21 (WLIW), mpt/Maryland Public Television (WMPB), WETA Television and Radio (WETA), WHYY (WHYY), PBS 39 (WLVT), WVPT - Your Source for PBS and More! (WVPT), Howard University Television (WHUT), WEDU PBS (WEDU), WGCU Public Media (WGCU), WPBT2 (WPBT), WUCF TV (WUCF), WUFT/Channel 5 (WUFT), WXEL/Channel 42 (WXEL), WLRN/Channel 17 (WLRN), WUSF Public Broadcasting (WUSF), ETV (WRLK), UNC-TV (WUNC), PBS Hawaii - Oceanic Cable Channel 10 (KHET), Idaho Public Television (KAID), KSPS (KSPS), OPB (KOPB), KWSU/Channel 10 & KTNW/Channel 31 (KWSU), WILL-TV (WILL), Network Knowledge - WSEC/Springfield (WSEC), WTTW11 (WTTW), Iowa Public Television/IPTV (KDIN), Nine Network (KETC), PBS39 Fort Wayne (WFWA), WFYI Indianapolis (WFYI), Milwaukee Public Television (WMVS), WNIN (WNIN), WNIT Public Television (WNIT), WPT (WPNE), WVUT/Channel 22 (WVUT), WEIU/Channel 51 (WEIU), WQPT-TV (WQPT), WYCC PBS Chicago (WYCC), WIPB-TV (WIPB), WTIU (WTIU), CET (WCET), ThinkTVNetwork (WPTD), WBGU-TV (WBGU), WGVU TV (WGVU), NET1 (KUON), Pioneer Public Television (KWCM), SDPB Television (KUSD), TPT (KTCA), KSMQ (KSMQ), KPTS/Channel 8 (KPTS), KTWU/Channel 11 (KTWU), East Tennessee PBS (WSJK), WCTE-TV (WCTE), WLJT, Channel 11 (WLJT), WOSU TV (WOSU), WOUB/WOUC (WOUB), WVPB (WVPB), WKYU-PBS (WKYU), KERA 13 (KERA), MPBN (WCBB), Mountain Lake PBS (WCFE), NHPTV (WENH), Vermont PBS (WETK), witf (WITF), WQED Multimedia (WQED), WMHT Educational Telecommunications (WMHT), Q-TV (WDCQ), WTVS Detroit Public TV (WTVS), CMU Public Television (WCMU), WKAR-TV (WKAR), WNMU-TV Public TV 13 (WNMU), WDSE - WRPT (WDSE), WGTE TV (WGTE), Lakeland Public Television (KAWE), KMOS-TV - Channels 6.1, 6.2 and 6.3 (KMOS), MontanaPBS (KUSM), KRWG/Channel 22 (KRWG), KACV (KACV), KCOS/Channel 13 (KCOS), WCNY/Channel 24 (WCNY), WNED (WNED), WPBS (WPBS), WSKG Public TV (WSKG), WXXI (WXXI), WPSU (WPSU), WVIA Public Media Studios (WVIA), WTVI (WTVI), Western Reserve PBS (WNEO), WVIZ/PBS ideastream (WVIZ), KCTS 9 (KCTS), Basin PBS (KPBT), KUHT / Channel 8 (KUHT), KLRN (KLRN), KLRU (KLRU), WTJX Channel 12 (WTJX), WCVE PBS (WCVE), KBTC Public Television (KBTC)
- **pcmag**
+- **People**
- **Periscope**: Periscope
- **PhilharmonieDeParis**: Philharmonie de Paris
- **phoenix.de**
- **Photobucket**
- **Pinkbike**
- **Pladform**
-- **PlanetaPlay**
- **play.fm**
- **played.to**
- **PlaysTV**
@@ -484,6 +488,7 @@
- **Pornotube**
- **PornoVoisines**
- **PornoXO**
+- **PressTV**
- **PrimeShareTV**
- **PromptFile**
- **prosiebensat1**: ProSiebenSat.1 Digital
@@ -494,7 +499,6 @@
- **qqmusic:playlist**: QQ音乐 - 歌单
- **qqmusic:singer**: QQ音乐 - 歌手
- **qqmusic:toplist**: QQ音乐 - 排行榜
-- **QuickVid**
- **R7**
- **radio.de**
- **radiobremen**
@@ -608,6 +612,7 @@
- **Tagesschau**
- **Tapely**
- **Tass**
+- **TDSLifeway**
- **teachertube**: teachertube.com videos
- **teachertube:user:collection**: teachertube.com user and collection videos
- **TeachingChannel**
@@ -624,7 +629,6 @@
- **TeleTask**
- **TF1**
- **TheIntercept**
-- **TheOnion**
- **ThePlatform**
- **ThePlatformFeed**
- **TheScene**
@@ -683,7 +687,6 @@
- **twitter**
- **twitter:amplify**
- **twitter:card**
-- **Ubu**
- **udemy**
- **udemy:course**
- **UDNEmbed**: 聯合影音
@@ -753,7 +756,6 @@
- **Walla**
- **WashingtonPost**
- **wat.tv**
-- **WayOfTheMaster**
- **WDR**
- **wdr:mobile**
- **WDRMaus**: Sendung mit der Maus


@@ -413,6 +413,7 @@ class TestUtil(unittest.TestCase):
        self.assertEqual(parse_duration('01:02:03:04'), 93784)
        self.assertEqual(parse_duration('1 hour 3 minutes'), 3780)
        self.assertEqual(parse_duration('87 Min.'), 5220)
+       self.assertEqual(parse_duration('PT1H0.040S'), 3600.04)

    def test_fix_xml_ampersands(self):
        self.assertEqual(
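The new `PT1H0.040S` case exercises ISO 8601 durations. A minimal sketch of such parsing (an illustration of the format, not youtube-dl's actual `parse_duration` implementation):

```python
import re

def parse_iso8601_duration(s):
    # Minimal parser for the time portion of ISO 8601 durations,
    # e.g. 'PT1H0.040S' -> 3600.04 seconds.
    m = re.match(r'PT(?:(\d+)H)?(?:(\d+)M)?(?:([\d.]+)S)?$', s)
    if not m:
        return None
    hours, minutes, seconds = m.groups()
    # Each component is optional; missing groups default to zero.
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds or 0)
```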


@@ -44,7 +44,7 @@ class TestYoutubeLists(unittest.TestCase):
        ie = YoutubePlaylistIE(dl)
        result = ie.extract('https://www.youtube.com/watch?v=W01L70IGBgE&index=2&list=RDOQpdSVF_k_w')
        entries = result['entries']
-       self.assertTrue(len(entries) >= 20)
+       self.assertTrue(len(entries) >= 50)
        original_video = entries[0]
        self.assertEqual(original_video['id'], 'OQpdSVF_k_w')


@@ -260,7 +260,9 @@ class YoutubeDL(object):
    The following options determine which downloader is picked:
    external_downloader: Executable of the external downloader to call.
                         None or unset for standard (built-in) downloader.
-   hls_prefer_native:   Use the native HLS downloader instead of ffmpeg/avconv.
+   hls_prefer_native:   Use the native HLS downloader instead of ffmpeg/avconv
+                        if True; use ffmpeg/avconv if False; if None, use the
+                        downloader suggested by the extractor.

    The following parameters are not used by YoutubeDL itself, they are used by
    the downloader (see youtube_dl/downloader/common.py):


@@ -41,9 +41,12 @@ def get_suitable_downloader(info_dict, params={}):
        if ed.can_download(info_dict):
            return ed

-   if protocol == 'm3u8' and params.get('hls_prefer_native'):
+   if protocol == 'm3u8' and params.get('hls_prefer_native') is True:
        return HlsFD

+   if protocol == 'm3u8_native' and params.get('hls_prefer_native') is False:
+       return FFmpegFD
+
    return PROTOCOL_MAP.get(protocol, HttpFD)
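The `is True`/`is False` checks make `hls_prefer_native` a tri-state switch rather than a plain boolean. The behaviour can be sketched in isolation (names such as `pick_hls_downloader` are illustrative, not the real API):

```python
def pick_hls_downloader(protocol, hls_prefer_native):
    # True  -> force the native downloader for 'm3u8' URLs
    # False -> force ffmpeg even when the extractor asked for native
    # None  -> keep whatever the extractor suggested via the protocol
    if protocol == 'm3u8' and hls_prefer_native is True:
        return 'native'
    if protocol == 'm3u8_native' and hls_prefer_native is False:
        return 'ffmpeg'
    # Default mapping: plain m3u8 goes to ffmpeg, m3u8_native to native.
    return {'m3u8': 'ffmpeg', 'm3u8_native': 'native'}[protocol]
```

With a plain truthiness test, `False` and `None` (unset) would be indistinguishable, which is why the identity comparisons matter here.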


@@ -225,7 +225,7 @@ class FFmpegFD(ExternalFD):
        args += ['-i', url, '-c', 'copy']
        if protocol == 'm3u8':
-           if self.params.get('hls_use_mpegts', False):
+           if self.params.get('hls_use_mpegts', False) or tmpfilename == '-':
                args += ['-f', 'mpegts']
            else:
                args += ['-f', 'mp4', '-bsf:a', 'aac_adtstoasc']
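The rationale for the new `tmpfilename == '-'` check: mp4 muxing needs a seekable output to finalize the file, while mpegts can be written to a non-seekable pipe such as stdout. A standalone sketch of the container decision (the helper name is illustrative):

```python
def ffmpeg_container_args(hls_use_mpegts, tmpfilename):
    # Writing to stdout ('-') is not seekable, so fall back to mpegts,
    # which can be streamed; otherwise remux to mp4 and convert the
    # AAC bitstream for the mp4 container.
    if hls_use_mpegts or tmpfilename == '-':
        return ['-f', 'mpegts']
    return ['-f', 'mp4', '-bsf:a', 'aac_adtstoasc']
```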


@@ -27,6 +27,8 @@ class RtspFD(FileDownloader):
            self.report_error('MMS or RTSP download detected but neither "mplayer" nor "mpv" could be run. Please install any.')
            return False

+       self._debug_cmd(args)
+
        retval = subprocess.call(args)
        if retval == 0:
            fsize = os.path.getsize(encodeFilename(tmpfilename))


@@ -12,9 +12,10 @@ from ..utils import (

class AolIE(InfoExtractor):
    IE_NAME = 'on.aol.com'
-   _VALID_URL = r'(?:aol-video:|https?://on\.aol\.com/video/.*-)(?P<id>[^/?-]+)'
+   _VALID_URL = r'(?:aol-video:|https?://on\.aol\.com/.*-)(?P<id>[^/?-]+)'

    _TESTS = [{
+       # video with 5min ID
        'url': 'http://on.aol.com/video/u-s--official-warns-of-largest-ever-irs-phone-scam-518167793?icid=OnHomepageC2Wide_MustSee_Img',
        'md5': '18ef68f48740e86ae94b98da815eec42',
        'info_dict': {
@@ -31,6 +32,7 @@ class AolIE(InfoExtractor):
            'skip_download': True,
        }
    }, {
+       # video with vidible ID
        'url': 'http://on.aol.com/video/netflix-is-raising-rates-5707d6b8e4b090497b04f706?context=PC:homepage:PL1944:1460189336183',
        'info_dict': {
            'id': '5707d6b8e4b090497b04f706',
@@ -45,6 +47,12 @@ class AolIE(InfoExtractor):
            # m3u8 download
            'skip_download': True,
        }
+   }, {
+       'url': 'http://on.aol.com/partners/abc-551438d309eab105804dbfe8/sneak-peek-was-haley-really-framed-570eaebee4b0448640a5c944',
+       'only_matching': True,
+   }, {
+       'url': 'http://on.aol.com/shows/park-bench-shw518173474-559a1b9be4b0c3bfad3357a7?context=SH:SHW518173474:PL4327:1460619712763',
+       'only_matching': True,
    }]

    def _real_extract(self, url):


@@ -83,7 +83,7 @@ class ARDMediathekIE(InfoExtractor):
        subtitle_url = media_info.get('_subtitleUrl')
        if subtitle_url:
            subtitles['de'] = [{
-               'ext': 'srt',
+               'ext': 'ttml',
                'url': subtitle_url,
            }]


@@ -210,7 +210,7 @@ class ArteTVPlus7IE(InfoExtractor):

# It also uses the arte_vp_url url from the webpage to extract the information
class ArteTVCreativeIE(ArteTVPlus7IE):
    IE_NAME = 'arte.tv:creative'
-   _VALID_URL = r'https?://creative\.arte\.tv/(?P<lang>fr|de|en|es)/(?:magazine?/)?(?P<id>[^/?#&]+)'
+   _VALID_URL = r'https?://creative\.arte\.tv/(?P<lang>fr|de|en|es)/(?:[^/]+/)*(?P<id>[^/?#&]+)'

    _TESTS = [{
        'url': 'http://creative.arte.tv/de/magazin/agentur-amateur-corporate-design',
@@ -229,9 +229,27 @@ class ArteTVCreativeIE(ArteTVPlus7IE):
            'description': 'Événement ! Quarante-cinq ans après leurs premiers succès, les légendaires Monty Python remontent sur scène.\n',
            'upload_date': '20140805',
        }
+   }, {
+       'url': 'http://creative.arte.tv/de/episode/agentur-amateur-4-der-erste-kunde',
+       'only_matching': True,
    }]

+
+class ArteTVInfoIE(ArteTVPlus7IE):
+   IE_NAME = 'arte.tv:info'
+   _VALID_URL = r'https?://info\.arte\.tv/(?P<lang>fr|de|en|es)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
+
+   _TEST = {
+       'url': 'http://info.arte.tv/fr/service-civique-un-cache-misere',
+       'info_dict': {
+           'id': '067528-000-A',
+           'ext': 'mp4',
+           'title': 'Service civique, un cache misère ?',
+           'upload_date': '20160403',
+       },
+   }
+

class ArteTVFutureIE(ArteTVPlus7IE):
    IE_NAME = 'arte.tv:future'
    _VALID_URL = r'https?://future\.arte\.tv/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)'
@@ -337,7 +355,7 @@ class ArteTVEmbedIE(ArteTVPlus7IE):
    IE_NAME = 'arte.tv:embed'
    _VALID_URL = r'''(?x)
        http://www\.arte\.tv
-       /playerv2/embed\.php\?json_url=
+       /(?:playerv2/embed|arte_vp/index)\.php\?json_url=
        (?P<json_url>
            http://arte\.tv/papi/tvguide/videos/stream/player/
            (?P<lang>[^/]+)/(?P<id>[^/]+)[^&]*


@@ -30,14 +30,14 @@ class AudiomackIE(InfoExtractor):
        # audiomack wrapper around soundcloud song
        {
            'add_ie': ['Soundcloud'],
-           'url': 'http://www.audiomack.com/song/xclusiveszone/take-kare',
+           'url': 'http://www.audiomack.com/song/hip-hop-daily/black-mamba-freestyle',
            'info_dict': {
-               'id': '172419696',
+               'id': '258901379',
                'ext': 'mp3',
-               'description': 'md5:1fc3272ed7a635cce5be1568c2822997',
-               'title': 'Young Thug ft Lil Wayne - Take Kare',
-               'uploader': 'Young Thug World',
-               'upload_date': '20141016',
+               'description': 'mamba day freestyle for the legend Kobe Bryant ',
+               'title': 'Black Mamba Freestyle [Prod. By Danny Wolf]',
+               'uploader': 'ILOVEMAKONNEN',
+               'upload_date': '20160414',
            }
        },
    ]


@@ -671,6 +671,7 @@ class BBCIE(BBCCoUkIE):
        'info_dict': {
            'id': '34475836',
            'title': 'Jurgen Klopp: Furious football from a witty and winning coach',
+           'description': 'Fast-paced football, wit, wisdom and a ready smile - why Liverpool fans should come to love new boss Jurgen Klopp.',
        },
        'playlist_count': 3,
    }, {


@@ -340,7 +340,7 @@ class BrightcoveLegacyIE(InfoExtractor):
            ext = 'flv'
        if ext is None:
            ext = determine_ext(url)
-       tbr = int_or_none(rend.get('encodingRate'), 1000),
+       tbr = int_or_none(rend.get('encodingRate'), 1000)
        a_format = {
            'format_id': 'http%s' % ('-%s' % tbr if tbr else ''),
            'url': url,
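The fix above removes a stray trailing comma, a classic Python pitfall: a bare trailing comma silently turns an assignment into a one-element tuple, so `tbr` was always truthy and formatted as `(500,)` instead of `500`:

```python
tbr = 500,   # stray trailing comma: tbr is now the tuple (500,)
assert tbr == (500,)

tbr = 500    # without the comma, tbr is the int 500
assert tbr == 500
```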


@@ -33,6 +33,7 @@ class CBCIE(InfoExtractor):
        'title': 'Robin Williams freestyles on 90 Minutes Live',
        'description': 'Wacky American comedian Robin Williams shows off his infamous "freestyle" comedic talents while being interviewed on CBC\'s 90 Minutes Live.',
        'upload_date': '19700101',
+       'uploader': 'CBCC-NEW',
    },
    'params': {
        # rtmp download


@ -5,7 +5,6 @@ from ..utils import (
xpath_text, xpath_text,
xpath_element, xpath_element,
int_or_none, int_or_none,
ExtractorError,
find_xpath_attr, find_xpath_attr,
) )
@ -64,7 +63,7 @@ class CBSIE(CBSBaseIE):
'url': 'http://www.colbertlateshow.com/podcasts/dYSwjqPs_X1tvbV_P2FcPWRa_qT6akTC/in-the-bad-room-with-stephen/', 'url': 'http://www.colbertlateshow.com/podcasts/dYSwjqPs_X1tvbV_P2FcPWRa_qT6akTC/in-the-bad-room-with-stephen/',
'only_matching': True, 'only_matching': True,
}] }]
TP_RELEASE_URL_TEMPLATE = 'http://link.theplatform.com/s/dJ5BDC/%s?manifest=m3u&mbr=true' TP_RELEASE_URL_TEMPLATE = 'http://link.theplatform.com/s/dJ5BDC/%s?mbr=true'
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) display_id = self._match_id(url)
@ -84,11 +83,11 @@ class CBSIE(CBSBaseIE):
pid = xpath_text(item, 'pid') pid = xpath_text(item, 'pid')
if not pid: if not pid:
continue continue
try: tp_release_url = self.TP_RELEASE_URL_TEMPLATE % pid
tp_formats, tp_subtitles = self._extract_theplatform_smil( if '.m3u8' in xpath_text(item, 'contentUrl', default=''):
self.TP_RELEASE_URL_TEMPLATE % pid, content_id, 'Downloading %s SMIL data' % pid) tp_release_url += '&manifest=m3u'
except ExtractorError: tp_formats, tp_subtitles = self._extract_theplatform_smil(
continue tp_release_url, content_id, 'Downloading %s SMIL data' % pid)
formats.extend(tp_formats) formats.extend(tp_formats)
subtitles = self._merge_subtitles(subtitles, tp_subtitles) subtitles = self._merge_subtitles(subtitles, tp_subtitles)
self._sort_formats(formats) self._sort_formats(formats)


@ -382,7 +382,7 @@ class InfoExtractor(object):
else: else:
if query: if query:
url_or_request = update_url_query(url_or_request, query) url_or_request = update_url_query(url_or_request, query)
if data or headers: if data is not None or headers:
url_or_request = sanitized_Request(url_or_request, data, headers) url_or_request = sanitized_Request(url_or_request, data, headers)
try: try:
return self._downloader.urlopen(url_or_request) return self._downloader.urlopen(url_or_request)
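The `common.py` hunk above changes `if data or headers:` to `if data is not None or headers:` so that an explicitly empty POST body still gets wrapped into a request object. A minimal sketch of the distinction (function and return values here are illustrative, not youtube-dl's API):

```python
def build_request(url, data=None, headers=None):
    """Return (old_check, new_check) showing whether a request wrapper
    would be built under each condition from the diff above."""
    headers = headers or {}
    # Truthiness test: an empty bytestring body (b'') is falsy, so an
    # intended empty-body POST would not get wrapped into a Request.
    old_check = bool(data or headers)
    # Identity test against None: only a truly absent body is skipped;
    # b'' still counts as "a body was supplied".
    new_check = data is not None or bool(headers)
    return old_check, new_check

# An empty POST body with no headers: the old check drops it,
# the new check keeps it.
print(build_request('http://example.com', data=b''))
```

This matters because urllib decides GET vs POST based on whether `data` is set, so silently dropping `b''` changes the HTTP method.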


@ -0,0 +1,114 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_duration,
remove_end,
xpath_element,
xpath_text,
)
class DigitallySpeakingIE(InfoExtractor):
_VALID_URL = r'https?://(?:evt\.dispeak|events\.digitallyspeaking)\.com/(?:[^/]+/)+xml/(?P<id>[^.]+)\.xml'
_TESTS = [{
# From http://gdcvault.com/play/1023460/Tenacious-Design-and-The-Interface
'url': 'http://evt.dispeak.com/ubm/gdc/sf16/xml/840376_BQRC.xml',
'md5': 'a8efb6c31ed06ca8739294960b2dbabd',
'info_dict': {
'id': '840376_BQRC',
'ext': 'mp4',
'title': 'Tenacious Design and The Interface of \'Destiny\'',
},
}, {
# From http://www.gdcvault.com/play/1014631/Classic-Game-Postmortem-PAC
'url': 'http://events.digitallyspeaking.com/gdc/sf11/xml/12396_1299111843500GMPX.xml',
'only_matching': True,
}]
def _parse_mp4(self, metadata):
video_formats = []
video_root = None
mp4_video = xpath_text(metadata, './mp4video', default=None)
if mp4_video is not None:
mobj = re.match(r'(?P<root>https?://.*?/).*', mp4_video)
video_root = mobj.group('root')
if video_root is None:
http_host = xpath_text(metadata, 'httpHost', default=None)
if http_host:
video_root = 'http://%s/' % http_host
if video_root is None:
# Hard-coded in http://evt.dispeak.com/ubm/gdc/sf16/custom/player2.js
# Works for GPUTechConf, too
video_root = 'http://s3-2u.digitallyspeaking.com/'
formats = metadata.findall('./MBRVideos/MBRVideo')
if not formats:
return None
for a_format in formats:
stream_name = xpath_text(a_format, 'streamName', fatal=True)
video_path = re.match(r'mp4\:(?P<path>.*)', stream_name).group('path')
url = video_root + video_path
vbr = xpath_text(a_format, 'bitrate')
video_formats.append({
'url': url,
'vbr': int_or_none(vbr),
})
return video_formats
def _parse_flv(self, metadata):
formats = []
akamai_url = xpath_text(metadata, './akamaiHost', fatal=True)
audios = metadata.findall('./audios/audio')
for audio in audios:
formats.append({
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
'play_path': remove_end(audio.get('url'), '.flv'),
'ext': 'flv',
'vcodec': 'none',
'format_id': audio.get('code'),
})
slide_video_path = xpath_text(metadata, './slideVideo', fatal=True)
formats.append({
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
'play_path': remove_end(slide_video_path, '.flv'),
'ext': 'flv',
'format_note': 'slide deck video',
'quality': -2,
'preference': -2,
'format_id': 'slides',
})
speaker_video_path = xpath_text(metadata, './speakerVideo', fatal=True)
formats.append({
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
'play_path': remove_end(speaker_video_path, '.flv'),
'ext': 'flv',
'format_note': 'speaker video',
'quality': -1,
'preference': -1,
'format_id': 'speaker',
})
return formats
def _real_extract(self, url):
video_id = self._match_id(url)
xml_description = self._download_xml(url, video_id)
metadata = xpath_element(xml_description, 'metadata')
video_formats = self._parse_mp4(metadata)
if video_formats is None:
video_formats = self._parse_flv(metadata)
return {
'id': video_id,
'formats': video_formats,
'title': xpath_text(metadata, 'title', fatal=True),
'duration': parse_duration(xpath_text(metadata, 'endTime')),
'creator': xpath_text(metadata, 'speaker'),
}
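The new `DigitallySpeakingIE` accepts both hostnames through one `_VALID_URL`. A quick standalone check of that pattern against the two test URLs from the extractor:

```python
import re

# Pattern copied from DigitallySpeakingIE._VALID_URL above.
VALID_URL = r'https?://(?:evt\.dispeak|events\.digitallyspeaking)\.com/(?:[^/]+/)+xml/(?P<id>[^.]+)\.xml'

for url in (
    'http://evt.dispeak.com/ubm/gdc/sf16/xml/840376_BQRC.xml',
    'http://events.digitallyspeaking.com/gdc/sf11/xml/12396_1299111843500GMPX.xml',
):
    mobj = re.match(VALID_URL, url)
    # The <id> group becomes the video_id used by _real_extract.
    print(mobj.group('id'))
```

Note that `(?:[^/]+/)+` swallows the variable path components (`ubm/gdc/sf16/`) while the literal `xml/` anchors where the id starts.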


@ -18,7 +18,7 @@ class DouyuTVIE(InfoExtractor):
'display_id': 'iseven', 'display_id': 'iseven',
'ext': 'flv', 'ext': 'flv',
'title': 're:^清晨醒脑T-ara根本停不下来 [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$', 'title': 're:^清晨醒脑T-ara根本停不下来 [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'description': 'md5:f34981259a03e980a3c6404190a3ed61', 'description': 're:.*m7show@163\.com.*',
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
'uploader': '7师傅', 'uploader': '7师傅',
'uploader_id': '431925', 'uploader_id': '431925',
@ -43,7 +43,7 @@ class DouyuTVIE(InfoExtractor):
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
'skip': 'Romm not found', 'skip': 'Room not found',
}, { }, {
'url': 'http://www.douyutv.com/17732', 'url': 'http://www.douyutv.com/17732',
'info_dict': { 'info_dict': {
@ -51,7 +51,7 @@ class DouyuTVIE(InfoExtractor):
'display_id': '17732', 'display_id': '17732',
'ext': 'flv', 'ext': 'flv',
'title': 're:^清晨醒脑T-ara根本停不下来 [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$', 'title': 're:^清晨醒脑T-ara根本停不下来 [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'description': 'md5:f34981259a03e980a3c6404190a3ed61', 'description': 're:.*m7show@163\.com.*',
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
'uploader': '7师傅', 'uploader': '7师傅',
'uploader_id': '431925', 'uploader_id': '431925',
@ -75,13 +75,28 @@ class DouyuTVIE(InfoExtractor):
room_id = self._html_search_regex( room_id = self._html_search_regex(
r'"room_id"\s*:\s*(\d+),', page, 'room id') r'"room_id"\s*:\s*(\d+),', page, 'room id')
prefix = 'room/%s?aid=android&client_sys=android&time=%d' % ( config = None
room_id, int(time.time())) # Douyu API sometimes returns error "Unable to load the requested class: eticket_redis_cache"
# Retry with different parameters - same parameters cause same errors
for i in range(5):
prefix = 'room/%s?aid=android&client_sys=android&time=%d' % (
room_id, int(time.time()))
auth = hashlib.md5((prefix + '1231').encode('ascii')).hexdigest()
auth = hashlib.md5((prefix + '1231').encode('ascii')).hexdigest() config_page = self._download_webpage(
config = self._download_json( 'http://www.douyutv.com/api/v1/%s&auth=%s' % (prefix, auth),
'http://www.douyutv.com/api/v1/%s&auth=%s' % (prefix, auth), video_id)
video_id) try:
config = self._parse_json(config_page, video_id, fatal=False)
except ExtractorError:
# Wait some time before retrying to get a different time() value
self._sleep(1, video_id, msg_template='%(video_id)s: Error occurs. '
'Waiting for %(timeout)s seconds before retrying')
continue
else:
break
if config is None:
raise ExtractorError('Unable to fetch API result')
data = config['data'] data = config['data']
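The Douyu retry loop above works because the signed `prefix` embeds `time.time()`, so every retry sends a different request and signature rather than replaying the one that failed. A generic sketch of the same pattern (the `download` callback and `RuntimeError` are illustrative stand-ins, not Douyu's API):

```python
import hashlib
import json
import time

def fetch_config(download, room_id, attempts=5, secret='1231', delay=1):
    """Retry a signed API call; each attempt re-signs with a fresh
    timestamp, so the retried request is not byte-identical."""
    for _ in range(attempts):
        prefix = 'room/%s?aid=android&client_sys=android&time=%d' % (
            room_id, int(time.time()))
        auth = hashlib.md5((prefix + secret).encode('ascii')).hexdigest()
        page = download(prefix, auth)
        try:
            return json.loads(page)
        except ValueError:
            # Wait so the next time() value (and thus signature) differs.
            time.sleep(delay)
    raise RuntimeError('Unable to fetch API result')
```

The `for`/`else` in the actual diff achieves the same thing; here the early `return` on success plays the role of the `break`.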


@ -6,13 +6,18 @@ import re
import time import time
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import int_or_none from ..compat import compat_urlparse
from ..utils import (
int_or_none,
update_url_query,
)
class DPlayIE(InfoExtractor): class DPlayIE(InfoExtractor):
_VALID_URL = r'https?://(?P<domain>it\.dplay\.com|www\.dplay\.(?:dk|se|no))/[^/]+/(?P<id>[^/?#]+)' _VALID_URL = r'https?://(?P<domain>it\.dplay\.com|www\.dplay\.(?:dk|se|no))/[^/]+/(?P<id>[^/?#]+)'
_TESTS = [{ _TESTS = [{
# geo restricted, via direct unsigned hls URL
'url': 'http://it.dplay.com/take-me-out/stagione-1-episodio-25/', 'url': 'http://it.dplay.com/take-me-out/stagione-1-episodio-25/',
'info_dict': { 'info_dict': {
'id': '1255600', 'id': '1255600',
@ -31,11 +36,12 @@ class DPlayIE(InfoExtractor):
}, },
'expected_warnings': ['Unable to download f4m manifest'], 'expected_warnings': ['Unable to download f4m manifest'],
}, { }, {
# non geo restricted, via secure api, unsigned download hls URL
'url': 'http://www.dplay.se/nugammalt-77-handelser-som-format-sverige/season-1-svensken-lar-sig-njuta-av-livet/', 'url': 'http://www.dplay.se/nugammalt-77-handelser-som-format-sverige/season-1-svensken-lar-sig-njuta-av-livet/',
'info_dict': { 'info_dict': {
'id': '3172', 'id': '3172',
'display_id': 'season-1-svensken-lar-sig-njuta-av-livet', 'display_id': 'season-1-svensken-lar-sig-njuta-av-livet',
'ext': 'flv', 'ext': 'mp4',
'title': 'Svensken lär sig njuta av livet', 'title': 'Svensken lär sig njuta av livet',
'description': 'md5:d3819c9bccffd0fe458ca42451dd50d8', 'description': 'md5:d3819c9bccffd0fe458ca42451dd50d8',
'duration': 2650, 'duration': 2650,
@ -48,23 +54,25 @@ class DPlayIE(InfoExtractor):
'age_limit': 0, 'age_limit': 0,
}, },
}, { }, {
# geo restricted, via secure api, unsigned download hls URL
'url': 'http://www.dplay.dk/mig-og-min-mor/season-6-episode-12/', 'url': 'http://www.dplay.dk/mig-og-min-mor/season-6-episode-12/',
'info_dict': { 'info_dict': {
'id': '70816', 'id': '70816',
'display_id': 'season-6-episode-12', 'display_id': 'season-6-episode-12',
'ext': 'flv', 'ext': 'mp4',
'title': 'Episode 12', 'title': 'Episode 12',
'description': 'md5:9c86e51a93f8a4401fc9641ef9894c90', 'description': 'md5:9c86e51a93f8a4401fc9641ef9894c90',
'duration': 2563, 'duration': 2563,
'timestamp': 1429696800, 'timestamp': 1429696800,
'upload_date': '20150422', 'upload_date': '20150422',
'creator': 'Kanal 4', 'creator': 'Kanal 4 (Home)',
'series': 'Mig og min mor', 'series': 'Mig og min mor',
'season_number': 6, 'season_number': 6,
'episode_number': 12, 'episode_number': 12,
'age_limit': 0, 'age_limit': 0,
}, },
}, { }, {
# geo restricted, via direct unsigned hls URL
'url': 'http://www.dplay.no/pga-tour/season-1-hoydepunkter-18-21-februar/', 'url': 'http://www.dplay.no/pga-tour/season-1-hoydepunkter-18-21-februar/',
'only_matching': True, 'only_matching': True,
}] }]
@ -90,17 +98,24 @@ class DPlayIE(InfoExtractor):
def extract_formats(protocol, manifest_url): def extract_formats(protocol, manifest_url):
if protocol == 'hls': if protocol == 'hls':
formats.extend(self._extract_m3u8_formats( m3u8_formats = self._extract_m3u8_formats(
manifest_url, video_id, ext='mp4', manifest_url, video_id, ext='mp4',
entry_protocol='m3u8_native', m3u8_id=protocol, fatal=False)) entry_protocol='m3u8_native', m3u8_id=protocol, fatal=False)
# Sometimes final URLs inside m3u8 are unsigned, let's fix this
# ourselves
query = compat_urlparse.parse_qs(compat_urlparse.urlparse(manifest_url).query)
for m3u8_format in m3u8_formats:
m3u8_format['url'] = update_url_query(m3u8_format['url'], query)
formats.extend(m3u8_formats)
elif protocol == 'hds': elif protocol == 'hds':
formats.extend(self._extract_f4m_formats( formats.extend(self._extract_f4m_formats(
manifest_url + '&hdcore=3.8.0&plugin=flowplayer-3.8.0.0', manifest_url + '&hdcore=3.8.0&plugin=flowplayer-3.8.0.0',
video_id, f4m_id=protocol, fatal=False)) video_id, f4m_id=protocol, fatal=False))
domain_tld = domain.split('.')[-1] domain_tld = domain.split('.')[-1]
if domain_tld in ('se', 'dk'): if domain_tld in ('se', 'dk', 'no'):
for protocol in PROTOCOLS: for protocol in PROTOCOLS:
# Providing dsc-geo allows to bypass geo restriction in some cases
self._set_cookie( self._set_cookie(
'secure.dplay.%s' % domain_tld, 'dsc-geo', 'secure.dplay.%s' % domain_tld, 'dsc-geo',
json.dumps({ json.dumps({
@ -113,13 +128,24 @@ class DPlayIE(InfoExtractor):
'Downloading %s stream JSON' % protocol, fatal=False) 'Downloading %s stream JSON' % protocol, fatal=False)
if stream and stream.get(protocol): if stream and stream.get(protocol):
extract_formats(protocol, stream[protocol]) extract_formats(protocol, stream[protocol])
else:
# The last resort is to try direct unsigned hls/hds URLs from info dictionary.
# Sometimes this does work even when secure API with dsc-geo has failed (e.g.
# http://www.dplay.no/pga-tour/season-1-hoydepunkter-18-21-februar/).
if not formats:
for protocol in PROTOCOLS: for protocol in PROTOCOLS:
if info.get(protocol): if info.get(protocol):
extract_formats(protocol, info[protocol]) extract_formats(protocol, info[protocol])
self._sort_formats(formats) self._sort_formats(formats)
subtitles = {}
for lang in ('se', 'sv', 'da', 'nl', 'no'):
for format_id in ('web_vtt', 'vtt', 'srt'):
subtitle_url = info.get('subtitles_%s_%s' % (lang, format_id))
if subtitle_url:
subtitles.setdefault(lang, []).append({'url': subtitle_url})
return { return {
'id': video_id, 'id': video_id,
'display_id': display_id, 'display_id': display_id,
@ -133,4 +159,5 @@ class DPlayIE(InfoExtractor):
'episode_number': int_or_none(info.get('episode')), 'episode_number': int_or_none(info.get('episode')),
'age_limit': int_or_none(info.get('minimum_age')), 'age_limit': int_or_none(info.get('minimum_age')),
'formats': formats, 'formats': formats,
'subtitles': subtitles,
} }
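The DPlay m3u8 fix above copies the signing query from the manifest URL onto each variant URL, because the server sometimes emits unsigned final URLs inside a signed playlist. A standalone sketch with the stdlib (youtube-dl's `update_url_query` is approximated here; the token name is illustrative):

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse

def sign_variant(variant_url, manifest_url):
    """Copy the manifest URL's query string (e.g. auth tokens) onto a
    variant URL whose own query the server left unsigned."""
    manifest_query = parse_qs(urlparse(manifest_url).query)
    parts = urlparse(variant_url)
    merged = parse_qs(parts.query)
    merged.update(manifest_query)
    return urlunparse(parts._replace(query=urlencode(merged, doseq=True)))

signed = sign_variant(
    'http://cdn.example.com/video_1500.m3u8',
    'http://cdn.example.com/master.m3u8?hdnts=token123')
print(signed)
```

Existing query parameters on the variant survive the merge; only the manifest's parameters are layered on top, matching what `update_url_query` does in the diff.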


@ -1,39 +0,0 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class DumpIE(InfoExtractor):
_VALID_URL = r'^https?://(?:www\.)?dump\.com/(?P<id>[a-zA-Z0-9]+)/'
_TEST = {
'url': 'http://www.dump.com/oneus/',
'md5': 'ad71704d1e67dfd9e81e3e8b42d69d99',
'info_dict': {
'id': 'oneus',
'ext': 'flv',
'title': "He's one of us.",
'thumbnail': 're:^https?://.*\.jpg$',
},
}
def _real_extract(self, url):
m = re.match(self._VALID_URL, url)
video_id = m.group('id')
webpage = self._download_webpage(url, video_id)
video_url = self._search_regex(
r's1.addVariable\("file",\s*"([^"]+)"', webpage, 'video URL')
title = self._og_search_title(webpage)
thumbnail = self._og_search_thumbnail(webpage)
return {
'id': video_id,
'title': title,
'url': video_url,
'thumbnail': thumbnail,
}


@ -4,9 +4,11 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_HTTPError
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
int_or_none, int_or_none,
url_basename,
) )
@ -21,7 +23,7 @@ class EaglePlatformIE(InfoExtractor):
_TESTS = [{ _TESTS = [{
# http://lenta.ru/news/2015/03/06/navalny/ # http://lenta.ru/news/2015/03/06/navalny/
'url': 'http://lentaru.media.eagleplatform.com/index/player?player=new&record_id=227304&player_template_id=5201', 'url': 'http://lentaru.media.eagleplatform.com/index/player?player=new&record_id=227304&player_template_id=5201',
'md5': '70f5187fb620f2c1d503b3b22fd4efe3', 'md5': '881ee8460e1b7735a8be938e2ffb362b',
'info_dict': { 'info_dict': {
'id': '227304', 'id': '227304',
'ext': 'mp4', 'ext': 'mp4',
@ -36,7 +38,7 @@ class EaglePlatformIE(InfoExtractor):
# http://muz-tv.ru/play/7129/ # http://muz-tv.ru/play/7129/
# http://media.clipyou.ru/index/player?record_id=12820&width=730&height=415&autoplay=true # http://media.clipyou.ru/index/player?record_id=12820&width=730&height=415&autoplay=true
'url': 'eagleplatform:media.clipyou.ru:12820', 'url': 'eagleplatform:media.clipyou.ru:12820',
'md5': '90b26344ba442c8e44aa4cf8f301164a', 'md5': '358597369cf8ba56675c1df15e7af624',
'info_dict': { 'info_dict': {
'id': '12820', 'id': '12820',
'ext': 'mp4', 'ext': 'mp4',
@ -55,8 +57,13 @@ class EaglePlatformIE(InfoExtractor):
raise ExtractorError(' '.join(response['errors']), expected=True) raise ExtractorError(' '.join(response['errors']), expected=True)
def _download_json(self, url_or_request, video_id, note='Downloading JSON metadata'): def _download_json(self, url_or_request, video_id, note='Downloading JSON metadata'):
response = super(EaglePlatformIE, self)._download_json(url_or_request, video_id, note) try:
self._handle_error(response) response = super(EaglePlatformIE, self)._download_json(url_or_request, video_id, note)
except ExtractorError as ee:
if isinstance(ee.cause, compat_HTTPError):
response = self._parse_json(ee.cause.read().decode('utf-8'), video_id)
self._handle_error(response)
raise
return response return response
def _get_video_url(self, url_or_request, video_id, note='Downloading JSON metadata'): def _get_video_url(self, url_or_request, video_id, note='Downloading JSON metadata'):
@ -84,17 +91,30 @@ class EaglePlatformIE(InfoExtractor):
secure_m3u8 = self._proto_relative_url(media['sources']['secure_m3u8']['auto'], 'http:') secure_m3u8 = self._proto_relative_url(media['sources']['secure_m3u8']['auto'], 'http:')
formats = []
m3u8_url = self._get_video_url(secure_m3u8, video_id, 'Downloading m3u8 JSON') m3u8_url = self._get_video_url(secure_m3u8, video_id, 'Downloading m3u8 JSON')
formats = self._extract_m3u8_formats( m3u8_formats = self._extract_m3u8_formats(
m3u8_url, video_id, m3u8_url, video_id,
'mp4', entry_protocol='m3u8_native', m3u8_id='hls') 'mp4', entry_protocol='m3u8_native', m3u8_id='hls')
formats.extend(m3u8_formats)
mp4_url = self._get_video_url( mp4_url = self._get_video_url(
# Secure mp4 URL is constructed according to Player.prototype.mp4 from # Secure mp4 URL is constructed according to Player.prototype.mp4 from
# http://lentaru.media.eagleplatform.com/player/player.js # http://lentaru.media.eagleplatform.com/player/player.js
re.sub(r'm3u8|hlsvod|hls|f4m', 'mp4', secure_m3u8), re.sub(r'm3u8|hlsvod|hls|f4m', 'mp4', secure_m3u8),
video_id, 'Downloading mp4 JSON') video_id, 'Downloading mp4 JSON')
formats.append({'url': mp4_url, 'format_id': 'mp4'}) mp4_url_basename = url_basename(mp4_url)
for m3u8_format in m3u8_formats:
mobj = re.search('/([^/]+)/index\.m3u8', m3u8_format['url'])
if mobj:
http_format = m3u8_format.copy()
http_format.update({
'url': mp4_url.replace(mp4_url_basename, mobj.group(1)),
'format_id': m3u8_format['format_id'].replace('hls', 'http'),
'protocol': 'http',
})
formats.append(http_format)
self._sort_formats(formats) self._sort_formats(formats)
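The EaglePlatform hunk above derives one progressive HTTP format per HLS variant by swapping each variant's directory name into the mp4 URL, mirroring `Player.prototype.mp4` in the site's player. A self-contained sketch of that mapping (the URLs are illustrative):

```python
import re

def http_formats_from_hls(m3u8_formats, mp4_url):
    """For each HLS variant .../<quality>/index.m3u8, build a matching
    progressive format by replacing the mp4 URL's basename with <quality>."""
    mp4_url_basename = mp4_url.rsplit('/', 1)[-1]
    http_formats = []
    for m3u8_format in m3u8_formats:
        mobj = re.search(r'/([^/]+)/index\.m3u8', m3u8_format['url'])
        if not mobj:
            continue
        http_format = dict(m3u8_format)
        http_format.update({
            'url': mp4_url.replace(mp4_url_basename, mobj.group(1)),
            'format_id': m3u8_format['format_id'].replace('hls', 'http'),
            'protocol': 'http',
        })
        http_formats.append(http_format)
    return http_formats

fmts = [{'url': 'http://cdn.example.com/720p/index.m3u8',
         'format_id': 'hls-720'}]
derived = http_formats_from_hls(fmts, 'http://cdn.example.com/video.mp4')
```

Because each HTTP format is a copy of its HLS sibling, resolution and bitrate metadata carry over for free; only the URL, id, and protocol change.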


@ -46,6 +46,7 @@ from .arte import (
ArteTVPlus7IE, ArteTVPlus7IE,
ArteTVCreativeIE, ArteTVCreativeIE,
ArteTVConcertIE, ArteTVConcertIE,
ArteTVInfoIE,
ArteTVFutureIE, ArteTVFutureIE,
ArteTVCinemaIE, ArteTVCinemaIE,
ArteTVDDCIE, ArteTVDDCIE,
@ -192,10 +193,10 @@ from .drbonanza import DRBonanzaIE
from .drtuber import DrTuberIE from .drtuber import DrTuberIE
from .drtv import DRTVIE from .drtv import DRTVIE
from .dvtv import DVTVIE from .dvtv import DVTVIE
from .dump import DumpIE
from .dumpert import DumpertIE from .dumpert import DumpertIE
from .defense import DefenseGouvFrIE from .defense import DefenseGouvFrIE
from .discovery import DiscoveryIE from .discovery import DiscoveryIE
from .dispeak import DigitallySpeakingIE
from .dropbox import DropboxIE from .dropbox import DropboxIE
from .dw import ( from .dw import (
DWIE, DWIE,
@ -336,7 +337,6 @@ from .ivi import (
) )
from .ivideon import IvideonIE from .ivideon import IvideonIE
from .izlesene import IzleseneIE from .izlesene import IzleseneIE
from .jadorecettepub import JadoreCettePubIE
from .jeuxvideo import JeuxVideoIE from .jeuxvideo import JeuxVideoIE
from .jove import JoveIE from .jove import JoveIE
from .jwplatform import JWPlatformIE from .jwplatform import JWPlatformIE
@ -406,13 +406,19 @@ from .mdr import MDRIE
from .metacafe import MetacafeIE from .metacafe import MetacafeIE
from .metacritic import MetacriticIE from .metacritic import MetacriticIE
from .mgoon import MgoonIE from .mgoon import MgoonIE
from .mgtv import MGTVIE
from .minhateca import MinhatecaIE from .minhateca import MinhatecaIE
from .ministrygrid import MinistryGridIE from .ministrygrid import MinistryGridIE
from .minoto import MinotoIE from .minoto import MinotoIE
from .miomio import MioMioIE from .miomio import MioMioIE
from .mit import TechTVMITIE, MITIE, OCWMITIE from .mit import TechTVMITIE, MITIE, OCWMITIE
from .mitele import MiTeleIE from .mitele import MiTeleIE
from .mixcloud import MixcloudIE from .mixcloud import (
MixcloudIE,
MixcloudUserIE,
MixcloudPlaylistIE,
MixcloudStreamIE,
)
from .mlb import MLBIE from .mlb import MLBIE
from .mnet import MnetIE from .mnet import MnetIE
from .mpora import MporaIE from .mpora import MporaIE
@ -420,7 +426,6 @@ from .moevideo import MoeVideoIE
from .mofosex import MofosexIE from .mofosex import MofosexIE
from .mojvideo import MojvideoIE from .mojvideo import MojvideoIE
from .moniker import MonikerIE from .moniker import MonikerIE
from .mooshare import MooshareIE
from .morningstar import MorningstarIE from .morningstar import MorningstarIE
from .motherless import MotherlessIE from .motherless import MotherlessIE
from .motorsport import MotorsportIE from .motorsport import MotorsportIE
@ -465,7 +470,6 @@ from .ndr import (
from .ndtv import NDTVIE from .ndtv import NDTVIE
from .netzkino import NetzkinoIE from .netzkino import NetzkinoIE
from .nerdcubed import NerdCubedFeedIE from .nerdcubed import NerdCubedFeedIE
from .nerdist import NerdistIE
from .neteasemusic import ( from .neteasemusic import (
NetEaseMusicIE, NetEaseMusicIE,
NetEaseMusicAlbumIE, NetEaseMusicAlbumIE,
@ -486,9 +490,10 @@ from .nextmovie import NextMovieIE
from .nfb import NFBIE from .nfb import NFBIE
from .nfl import NFLIE from .nfl import NFLIE
from .nhl import ( from .nhl import (
NHLIE,
NHLNewsIE,
NHLVideocenterIE, NHLVideocenterIE,
NHLNewsIE,
NHLVideocenterCategoryIE,
NHLIE,
) )
from .nick import NickIE from .nick import NickIE
from .niconico import NiconicoIE, NiconicoPlaylistIE from .niconico import NiconicoIE, NiconicoPlaylistIE
@ -556,12 +561,12 @@ from .pandoratv import PandoraTVIE
from .parliamentliveuk import ParliamentLiveUKIE from .parliamentliveuk import ParliamentLiveUKIE
from .patreon import PatreonIE from .patreon import PatreonIE
from .pbs import PBSIE from .pbs import PBSIE
from .people import PeopleIE
from .periscope import PeriscopeIE from .periscope import PeriscopeIE
from .philharmoniedeparis import PhilharmonieDeParisIE from .philharmoniedeparis import PhilharmonieDeParisIE
from .phoenix import PhoenixIE from .phoenix import PhoenixIE
from .photobucket import PhotobucketIE from .photobucket import PhotobucketIE
from .pinkbike import PinkbikeIE from .pinkbike import PinkbikeIE
from .planetaplay import PlanetaPlayIE
from .pladform import PladformIE from .pladform import PladformIE
from .played import PlayedIE from .played import PlayedIE
from .playfm import PlayFMIE from .playfm import PlayFMIE
@ -597,7 +602,6 @@ from .qqmusic import (
QQMusicToplistIE, QQMusicToplistIE,
QQMusicPlaylistIE, QQMusicPlaylistIE,
) )
from .quickvid import QuickVidIE
from .r7 import R7IE from .r7 import R7IE
from .radiode import RadioDeIE from .radiode import RadioDeIE
from .radiojavan import RadioJavanIE from .radiojavan import RadioJavanIE
@ -730,6 +734,7 @@ from .sztvhu import SztvHuIE
from .tagesschau import TagesschauIE from .tagesschau import TagesschauIE
from .tapely import TapelyIE from .tapely import TapelyIE
from .tass import TassIE from .tass import TassIE
from .tdslifeway import TDSLifewayIE
from .teachertube import ( from .teachertube import (
TeacherTubeIE, TeacherTubeIE,
TeacherTubeUserIE, TeacherTubeUserIE,
@ -747,7 +752,6 @@ from .teletask import TeleTaskIE
from .testurl import TestURLIE from .testurl import TestURLIE
from .tf1 import TF1IE from .tf1 import TF1IE
from .theintercept import TheInterceptIE from .theintercept import TheInterceptIE
from .theonion import TheOnionIE
from .theplatform import ( from .theplatform import (
ThePlatformIE, ThePlatformIE,
ThePlatformFeedIE, ThePlatformFeedIE,
@ -832,7 +836,6 @@ from .twitter import (
TwitterIE, TwitterIE,
TwitterAmplifyIE, TwitterAmplifyIE,
) )
from .ubu import UbuIE
from .udemy import ( from .udemy import (
UdemyIE, UdemyIE,
UdemyCourseIE UdemyCourseIE
@ -917,7 +920,6 @@ from .vulture import VultureIE
from .walla import WallaIE from .walla import WallaIE
from .washingtonpost import WashingtonPostIE from .washingtonpost import WashingtonPostIE
from .wat import WatIE from .wat import WatIE
from .wayofthemaster import WayOfTheMasterIE
from .wdr import ( from .wdr import (
WDRIE, WDRIE,
WDRMobileIE, WDRMobileIE,


@ -7,7 +7,7 @@ from .common import InfoExtractor
class GazetaIE(InfoExtractor): class GazetaIE(InfoExtractor):
_VALID_URL = r'(?P<url>https?://(?:www\.)?gazeta\.ru/(?:[^/]+/)?video/(?:(?:main|\d{4}/\d{2}/\d{2})/)?(?P<id>[A-Za-z0-9-_.]+)\.s?html)' _VALID_URL = r'(?P<url>https?://(?:www\.)?gazeta\.ru/(?:[^/]+/)?video/(?:main/)*(?:\d{4}/\d{2}/\d{2}/)?(?P<id>[A-Za-z0-9-_.]+)\.s?html)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.gazeta.ru/video/main/zadaite_vopros_vladislavu_yurevichu.shtml', 'url': 'http://www.gazeta.ru/video/main/zadaite_vopros_vladislavu_yurevichu.shtml',
'md5': 'd49c9bdc6e5a7888f27475dc215ee789', 'md5': 'd49c9bdc6e5a7888f27475dc215ee789',
@ -18,9 +18,19 @@ class GazetaIE(InfoExtractor):
'description': 'md5:38617526050bd17b234728e7f9620a71', 'description': 'md5:38617526050bd17b234728e7f9620a71',
'thumbnail': 're:^https?://.*\.jpg', 'thumbnail': 're:^https?://.*\.jpg',
}, },
'skip': 'video not found',
}, { }, {
'url': 'http://www.gazeta.ru/lifestyle/video/2015/03/08/master-klass_krasivoi_byt._delaem_vesennii_makiyazh.shtml', 'url': 'http://www.gazeta.ru/lifestyle/video/2015/03/08/master-klass_krasivoi_byt._delaem_vesennii_makiyazh.shtml',
'only_matching': True, 'only_matching': True,
}, {
'url': 'http://www.gazeta.ru/video/main/main/2015/06/22/platit_ili_ne_platit_po_isku_yukosa.shtml',
'md5': '37f19f78355eb2f4256ee1688359f24c',
'info_dict': {
'id': '252048',
'ext': 'mp4',
'title': '"Если по иску ЮКОСа придется платить, это будет большой удар по бюджету"',
},
'add_ie': ['EaglePlatform'],
}] }]
def _real_extract(self, url): def _real_extract(self, url):


@ -4,7 +4,6 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
remove_end,
HEADRequest, HEADRequest,
sanitized_Request, sanitized_Request,
urlencode_postdata, urlencode_postdata,
@ -51,63 +50,33 @@ class GDCVaultIE(InfoExtractor):
{ {
'url': 'http://gdcvault.com/play/1020791/', 'url': 'http://gdcvault.com/play/1020791/',
'only_matching': True, 'only_matching': True,
} },
{
# Hard-coded hostname
'url': 'http://gdcvault.com/play/1023460/Tenacious-Design-and-The-Interface',
'md5': 'a8efb6c31ed06ca8739294960b2dbabd',
'info_dict': {
'id': '1023460',
'ext': 'mp4',
'display_id': 'Tenacious-Design-and-The-Interface',
'title': 'Tenacious Design and The Interface of \'Destiny\'',
},
},
{
# Multiple audios
'url': 'http://www.gdcvault.com/play/1014631/Classic-Game-Postmortem-PAC',
'info_dict': {
'id': '1014631',
'ext': 'flv',
'title': 'How to Create a Good Game - From My Experience of Designing Pac-Man',
},
'params': {
'skip_download': True, # Requires rtmpdump
'format': 'jp', # The japanese audio
}
},
] ]
def _parse_mp4(self, xml_description):
video_formats = []
mp4_video = xml_description.find('./metadata/mp4video')
if mp4_video is None:
return None
mobj = re.match(r'(?P<root>https?://.*?/).*', mp4_video.text)
video_root = mobj.group('root')
formats = xml_description.findall('./metadata/MBRVideos/MBRVideo')
for format in formats:
mobj = re.match(r'mp4\:(?P<path>.*)', format.find('streamName').text)
url = video_root + mobj.group('path')
vbr = format.find('bitrate').text
video_formats.append({
'url': url,
'vbr': int(vbr),
})
return video_formats
def _parse_flv(self, xml_description):
formats = []
akamai_url = xml_description.find('./metadata/akamaiHost').text
audios = xml_description.find('./metadata/audios')
if audios is not None:
for audio in audios:
formats.append({
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
'play_path': remove_end(audio.get('url'), '.flv'),
'ext': 'flv',
'vcodec': 'none',
'format_id': audio.get('code'),
})
slide_video_path = xml_description.find('./metadata/slideVideo').text
formats.append({
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
'play_path': remove_end(slide_video_path, '.flv'),
'ext': 'flv',
'format_note': 'slide deck video',
'quality': -2,
'preference': -2,
'format_id': 'slides',
})
speaker_video_path = xml_description.find('./metadata/speakerVideo').text
formats.append({
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
'play_path': remove_end(speaker_video_path, '.flv'),
'ext': 'flv',
'format_note': 'speaker video',
'quality': -1,
'preference': -1,
'format_id': 'speaker',
})
return formats
def _login(self, webpage_url, display_id): def _login(self, webpage_url, display_id):
(username, password) = self._get_login_info() (username, password) = self._get_login_info()
if username is None or password is None: if username is None or password is None:
@ -183,17 +152,10 @@ class GDCVaultIE(InfoExtractor):
r'<iframe src=".*?\?xmlURL=xml/(?P<xml_file>.+?\.xml).*?".*?</iframe>', r'<iframe src=".*?\?xmlURL=xml/(?P<xml_file>.+?\.xml).*?".*?</iframe>',
start_page, 'xml filename') start_page, 'xml filename')
xml_description = self._download_xml(
'%s/xml/%s' % (xml_root, xml_name), display_id)
video_title = xml_description.find('./metadata/title').text
video_formats = self._parse_mp4(xml_description)
if video_formats is None:
video_formats = self._parse_flv(xml_description)
return { return {
'_type': 'url_transparent',
'id': video_id, 'id': video_id,
'display_id': display_id, 'display_id': display_id,
'title': video_title, 'url': '%s/xml/%s' % (xml_root, xml_name),
'formats': video_formats, 'ie_key': 'DigitallySpeaking',
} }


@ -60,6 +60,7 @@ from .googledrive import GoogleDriveIE
from .jwplatform import JWPlatformIE from .jwplatform import JWPlatformIE
from .digiteka import DigitekaIE from .digiteka import DigitekaIE
from .instagram import InstagramIE from .instagram import InstagramIE
from .liveleak import LiveLeakIE
class GenericIE(InfoExtractor): class GenericIE(InfoExtractor):
@ -104,7 +105,8 @@ class GenericIE(InfoExtractor):
'skip_download': True, # infinite live stream 'skip_download': True, # infinite live stream
}, },
'expected_warnings': [ 'expected_warnings': [
r'501.*Not Implemented' r'501.*Not Implemented',
r'400.*Bad Request',
], ],
}, },
# Direct link with incorrect MIME type # Direct link with incorrect MIME type
@ -235,6 +237,7 @@ class GenericIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 'car-20120827-manifest', 'title': 'car-20120827-manifest',
'formats': 'mincount:9', 'formats': 'mincount:9',
'upload_date': '20130904',
}, },
'params': { 'params': {
'format': 'bestvideo', 'format': 'bestvideo',
@ -594,7 +597,11 @@ class GenericIE(InfoExtractor):
'id': 'k2mm4bCdJ6CQ2i7c8o2', 'id': 'k2mm4bCdJ6CQ2i7c8o2',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Le Zap de Spi0n n°216 - Zapping du Web', 'title': 'Le Zap de Spi0n n°216 - Zapping du Web',
'description': 'md5:faf028e48a461b8b7fad38f1e104b119',
'uploader': 'Spi0n', 'uploader': 'Spi0n',
'uploader_id': 'xgditw',
'upload_date': '20140425',
'timestamp': 1398441542,
}, },
'add_ie': ['Dailymotion'], 'add_ie': ['Dailymotion'],
}, },
@@ -727,8 +734,11 @@ class GenericIE(InfoExtractor):
'id': 'uxjb0lwrcz', 'id': 'uxjb0lwrcz',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Conversation about Hexagonal Rails Part 1 - ThoughtWorks', 'title': 'Conversation about Hexagonal Rails Part 1 - ThoughtWorks',
'description': 'a Martin Fowler video from ThoughtWorks',
'duration': 1715.0, 'duration': 1715.0,
'uploader': 'thoughtworks.wistia.com', 'uploader': 'thoughtworks.wistia.com',
'upload_date': '20140603',
'timestamp': 1401832161,
}, },
}, },
# Soundcloud embed # Soundcloud embed
@@ -979,6 +989,9 @@ class GenericIE(InfoExtractor):
'ext': 'flv', 'ext': 'flv',
'title': "PFT Live: New leader in the 'new-look' defense", 'title': "PFT Live: New leader in the 'new-look' defense",
'description': 'md5:65a19b4bbfb3b0c0c5768bed1dfad74e', 'description': 'md5:65a19b4bbfb3b0c0c5768bed1dfad74e',
'uploader': 'NBCU-SPORTS',
'upload_date': '20140107',
'timestamp': 1389118457,
}, },
}, },
# UDN embed # UDN embed
@@ -1031,6 +1044,9 @@ class GenericIE(InfoExtractor):
'title': 'SN Presents: Russell Martin, World Citizen', 'title': 'SN Presents: Russell Martin, World Citizen',
'description': 'To understand why he was the Toronto Blue Jays top off-season priority is to appreciate his background and upbringing in Montreal, where he first developed his baseball skills. Written and narrated by Stephen Brunt.', 'description': 'To understand why he was the Toronto Blue Jays top off-season priority is to appreciate his background and upbringing in Montreal, where he first developed his baseball skills. Written and narrated by Stephen Brunt.',
'uploader': 'Rogers Sportsnet', 'uploader': 'Rogers Sportsnet',
'uploader_id': '1704050871',
'upload_date': '20150525',
'timestamp': 1432570283,
}, },
}, },
# Dailymotion Cloud video # Dailymotion Cloud video
@@ -1122,12 +1138,39 @@ class GenericIE(InfoExtractor):
'title': 'The Cardinal Pell Interview', 'title': 'The Cardinal Pell Interview',
'description': 'Sky News Contributor Andrew Bolt interviews George Pell in Rome, following the Cardinal\'s evidence before the Royal Commission into Child Abuse. ', 'description': 'Sky News Contributor Andrew Bolt interviews George Pell in Rome, following the Cardinal\'s evidence before the Royal Commission into Child Abuse. ',
'uploader': 'GlobeCast Australia - GlobeStream', 'uploader': 'GlobeCast Australia - GlobeStream',
'uploader_id': '2733773828001',
'upload_date': '20160304',
'timestamp': 1457083087,
}, },
'params': { 'params': {
# m3u8 downloads # m3u8 downloads
'skip_download': True, 'skip_download': True,
}, },
}, },
# Another form of arte.tv embed
{
'url': 'http://www.tv-replay.fr/redirection/09-04-16/arte-reportage-arte-11508975.html',
'md5': '850bfe45417ddf221288c88a0cffe2e2',
'info_dict': {
'id': '030273-562_PLUS7-F',
'ext': 'mp4',
'title': 'ARTE Reportage - Nulle part, en France',
'description': 'md5:e3a0e8868ed7303ed509b9e3af2b870d',
'upload_date': '20160409',
},
},
# LiveLeak embed
{
'url': 'http://www.wykop.pl/link/3088787/',
'md5': 'ace83b9ed19b21f68e1b50e844fdf95d',
'info_dict': {
'id': '874_1459135191',
'ext': 'mp4',
'title': 'Man shows poor quality of new apartment building',
'description': 'The wall is like a sand pile.',
'uploader': 'Lake8737',
}
},
] ]
def report_following_redirect(self, new_url): def report_following_redirect(self, new_url):
@@ -1702,7 +1745,7 @@ class GenericIE(InfoExtractor):
# Look for embedded arte.tv player # Look for embedded arte.tv player
mobj = re.search( mobj = re.search(
r'<script [^>]*?src="(?P<url>http://www\.arte\.tv/playerv2/embed[^"]+)"', r'<(?:script|iframe) [^>]*?src="(?P<url>http://www\.arte\.tv/(?:playerv2/embed|arte_vp/index)[^"]+)"',
webpage) webpage)
if mobj is not None: if mobj is not None:
return self.url_result(mobj.group('url'), 'ArteTVEmbed') return self.url_result(mobj.group('url'), 'ArteTVEmbed')
@@ -1930,7 +1973,13 @@ class GenericIE(InfoExtractor):
# Look for Instagram embeds # Look for Instagram embeds
instagram_embed_url = InstagramIE._extract_embed_url(webpage) instagram_embed_url = InstagramIE._extract_embed_url(webpage)
if instagram_embed_url is not None: if instagram_embed_url is not None:
return self.url_result(instagram_embed_url, InstagramIE.ie_key()) return self.url_result(
self._proto_relative_url(instagram_embed_url), InstagramIE.ie_key())
# Look for LiveLeak embeds
liveleak_url = LiveLeakIE._extract_url(webpage)
if liveleak_url:
return self.url_result(liveleak_url, 'LiveLeak')
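Instagram embed URLs are often scheme-relative (`//instagram.com/…`) and cannot be fetched directly, so the change wraps them in `_proto_relative_url` before delegating. A rough standalone equivalent of that helper:

```python
def proto_relative_url(url, scheme='http:'):
    # Prefix scheme-relative URLs ("//host/path") with a default scheme;
    # leave absolute and empty URLs untouched.
    if url and url.startswith('//'):
        return scheme + url
    return url
```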
def check_video(vurl): def check_video(vurl):
if YoutubeIE.suitable(vurl): if YoutubeIE.suitable(vurl):
@@ -2013,6 +2062,7 @@ class GenericIE(InfoExtractor):
entries = [] entries = []
for video_url in found: for video_url in found:
video_url = unescapeHTML(video_url)
video_url = video_url.replace('\\/', '/') video_url = video_url.replace('\\/', '/')
video_url = compat_urlparse.urljoin(url, video_url) video_url = compat_urlparse.urljoin(url, video_url)
video_id = compat_urllib_parse_unquote(os.path.basename(video_url)) video_id = compat_urllib_parse_unquote(os.path.basename(video_url))
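The added `unescapeHTML` call completes a small normalization chain for URLs scraped out of markup: entity decoding, JSON-style `\/` unescaping, and resolution against the page URL. A sketch using only the standard library (the function name is illustrative):

```python
import os.path
from html import unescape
from urllib.parse import unquote, urljoin

def clean_found_url(page_url, raw_url):
    # Undo HTML entities and "\/" escaping, resolve relative paths,
    # then derive a video id from the percent-decoded basename.
    video_url = unescape(raw_url)
    video_url = video_url.replace('\\/', '/')
    video_url = urljoin(page_url, video_url)
    video_id = unquote(os.path.basename(video_url))
    return video_url, video_id
```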

View File

@@ -14,13 +14,13 @@ class GoshgayIE(InfoExtractor):
_VALID_URL = r'https?://www\.goshgay\.com/video(?P<id>\d+?)($|/)' _VALID_URL = r'https?://www\.goshgay\.com/video(?P<id>\d+?)($|/)'
_TEST = { _TEST = {
'url': 'http://www.goshgay.com/video299069/diesel_sfw_xxx_video', 'url': 'http://www.goshgay.com/video299069/diesel_sfw_xxx_video',
'md5': '027fcc54459dff0feb0bc06a7aeda680', 'md5': '4b6db9a0a333142eb9f15913142b0ed1',
'info_dict': { 'info_dict': {
'id': '299069', 'id': '299069',
'ext': 'flv', 'ext': 'flv',
'title': 'DIESEL SFW XXX Video', 'title': 'DIESEL SFW XXX Video',
'thumbnail': 're:^http://.*\.jpg$', 'thumbnail': 're:^http://.*\.jpg$',
'duration': 79, 'duration': 80,
'age_limit': 18, 'age_limit': 18,
} }
} }
@@ -47,5 +47,5 @@ class GoshgayIE(InfoExtractor):
'title': title, 'title': title,
'thumbnail': thumbnail, 'thumbnail': thumbnail,
'duration': duration, 'duration': duration,
'age_limit': self._family_friendly_search(webpage), 'age_limit': 18,
} }

View File

@@ -2,12 +2,6 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import (
xpath_element,
xpath_text,
int_or_none,
parse_duration,
)
class GPUTechConfIE(InfoExtractor): class GPUTechConfIE(InfoExtractor):
@@ -27,29 +21,15 @@ class GPUTechConfIE(InfoExtractor):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
root_path = self._search_regex(r'var\s+rootPath\s*=\s*"([^"]+)', webpage, 'root path', 'http://evt.dispeak.com/nvidia/events/gtc15/') root_path = self._search_regex(
xml_file_id = self._search_regex(r'var\s+xmlFileId\s*=\s*"([^"]+)', webpage, 'xml file id') r'var\s+rootPath\s*=\s*"([^"]+)', webpage, 'root path',
default='http://evt.dispeak.com/nvidia/events/gtc15/')
doc = self._download_xml('%sxml/%s.xml' % (root_path, xml_file_id), video_id) xml_file_id = self._search_regex(
r'var\s+xmlFileId\s*=\s*"([^"]+)', webpage, 'xml file id')
metadata = xpath_element(doc, 'metadata')
http_host = xpath_text(metadata, 'httpHost', 'http host', True)
mbr_videos = xpath_element(metadata, 'MBRVideos')
formats = []
for mbr_video in mbr_videos.findall('MBRVideo'):
stream_name = xpath_text(mbr_video, 'streamName')
if stream_name:
formats.append({
'url': 'http://%s/%s' % (http_host, stream_name.replace('mp4:', '')),
'tbr': int_or_none(xpath_text(mbr_video, 'bitrate')),
})
self._sort_formats(formats)
return { return {
'_type': 'url_transparent',
'id': video_id, 'id': video_id,
'title': xpath_text(metadata, 'title'), 'url': '%sxml/%s.xml' % (root_path, xml_file_id),
'duration': parse_duration(xpath_text(metadata, 'endTime')), 'ie_key': 'DigitallySpeaking',
'creator': xpath_text(metadata, 'speaker'),
'formats': formats,
} }

View File

@@ -16,14 +16,14 @@ class GrouponIE(InfoExtractor):
'playlist': [{ 'playlist': [{
'info_dict': { 'info_dict': {
'id': 'tubGNycTo_9Uxg82uESj4i61EYX8nyuf', 'id': 'tubGNycTo_9Uxg82uESj4i61EYX8nyuf',
'ext': 'mp4', 'ext': 'flv',
'title': 'Bikram Yoga Huntington Beach | Orange County', 'title': 'Bikram Yoga Huntington Beach | Orange County',
'description': 'md5:d41d8cd98f00b204e9800998ecf8427e', 'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
'duration': 44.961, 'duration': 44.961,
}, },
}], }],
'params': { 'params': {
'skip_download': 'HLS', 'skip_download': 'HDS',
} }
} }
@@ -32,7 +32,7 @@ class GrouponIE(InfoExtractor):
webpage = self._download_webpage(url, playlist_id) webpage = self._download_webpage(url, playlist_id)
payload = self._parse_json(self._search_regex( payload = self._parse_json(self._search_regex(
r'var\s+payload\s*=\s*(.*?);\n', webpage, 'payload'), playlist_id) r'(?:var\s+|window\.)payload\s*=\s*(.*?);\n', webpage, 'payload'), playlist_id)
videos = payload['carousel'].get('dealVideos', []) videos = payload['carousel'].get('dealVideos', [])
entries = [] entries = []
for v in videos: for v in videos:
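The widened regex also matches pages that assign the data as `window.payload = …` instead of `var payload = …`. A self-contained sketch of that lookup:

```python
import json
import re

PAYLOAD_RE = re.compile(r'(?:var\s+|window\.)payload\s*=\s*(.*?);\n')

def extract_payload(webpage):
    # Pull the inline JSON blob out of either declaration style.
    mobj = PAYLOAD_RE.search(webpage)
    return json.loads(mobj.group(1)) if mobj else None
```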

View File

@@ -24,6 +24,7 @@ class HowStuffWorksIE(InfoExtractor):
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
'duration': 161, 'duration': 161,
}, },
'skip': 'Video broken',
}, },
{ {
'url': 'http://adventure.howstuffworks.com/7199-survival-zone-food-and-water-in-the-savanna-video.htm', 'url': 'http://adventure.howstuffworks.com/7199-survival-zone-food-and-water-in-the-savanna-video.htm',

View File

@@ -4,6 +4,7 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
determine_ext,
parse_duration, parse_duration,
unified_strdate, unified_strdate,
) )
@@ -29,7 +30,12 @@ class HuffPostIE(InfoExtractor):
'description': 'This week on Legalese It, Mike talks to David Bosco about his new book on the ICC, "Rough Justice," he also discusses the Virginia AG\'s historic stance on gay marriage, the execution of Edgar Tamayo, the ICC\'s delay of Kenya\'s President and more. ', 'description': 'This week on Legalese It, Mike talks to David Bosco about his new book on the ICC, "Rough Justice," he also discusses the Virginia AG\'s historic stance on gay marriage, the execution of Edgar Tamayo, the ICC\'s delay of Kenya\'s President and more. ',
'duration': 1549, 'duration': 1549,
'upload_date': '20140124', 'upload_date': '20140124',
} },
'params': {
# m3u8 download
'skip_download': True,
},
'expected_warnings': ['HTTP Error 404: Not Found'],
} }
def _real_extract(self, url): def _real_extract(self, url):
@@ -45,7 +51,7 @@ class HuffPostIE(InfoExtractor):
description = data.get('description') description = data.get('description')
thumbnails = [] thumbnails = []
for url in data['images'].values(): for url in filter(None, data['images'].values()):
m = re.match('.*-([0-9]+x[0-9]+)\.', url) m = re.match('.*-([0-9]+x[0-9]+)\.', url)
if not m: if not m:
continue continue
@@ -54,13 +60,25 @@ class HuffPostIE(InfoExtractor):
'resolution': m.group(1), 'resolution': m.group(1),
}) })
formats = [{ formats = []
'format': key, sources = data.get('sources', {})
'format_id': key.replace('/', '.'), live_sources = list(sources.get('live', {}).items()) + list(sources.get('live_again', {}).items())
'ext': 'mp4', for key, url in live_sources:
'url': url, ext = determine_ext(url)
'vcodec': 'none' if key.startswith('audio/') else None, if ext == 'm3u8':
} for key, url in data.get('sources', {}).get('live', {}).items()] formats.extend(self._extract_m3u8_formats(
url, video_id, ext='mp4', m3u8_id='hls', fatal=False))
elif ext == 'f4m':
formats.extend(self._extract_f4m_formats(
url + '?hdcore=2.9.5', video_id, f4m_id='hds', fatal=False))
else:
formats.append({
'format': key,
'format_id': key.replace('/', '.'),
'ext': 'mp4',
'url': url,
'vcodec': 'none' if key.startswith('audio/') else None,
})
if not formats and data.get('fivemin_id'): if not formats and data.get('fivemin_id'):
return self.url_result('5min:%s' % data['fivemin_id']) return self.url_result('5min:%s' % data['fivemin_id'])
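The reworked loop dispatches on the source URL's extension: `.m3u8` sources go through the HLS helper, `.f4m` through HDS, and everything else stays a plain progressive format. A simplified, standalone version of that dispatch (`determine_ext` here is a cut-down stand-in for youtube-dl's helper):

```python
from urllib.parse import urlparse

def determine_ext(url):
    # Cut-down stand-in: extension of the URL path, ignoring the query.
    path = urlparse(url).path
    return path.rpartition('.')[2] if '.' in path else ''

def classify_source(url):
    ext = determine_ext(url)
    if ext == 'm3u8':
        return 'hls'   # would call _extract_m3u8_formats
    if ext == 'f4m':
        return 'hds'   # would call _extract_f4m_formats
    return 'http'      # plain progressive download
```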

View File

@@ -12,7 +12,7 @@ from ..utils import (
class InstagramIE(InfoExtractor): class InstagramIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?instagram\.com/p/(?P<id>[^/?#&]+)' _VALID_URL = r'(?P<url>https?://(?:www\.)?instagram\.com/p/(?P<id>[^/?#&]+))'
_TESTS = [{ _TESTS = [{
'url': 'https://instagram.com/p/aye83DjauH/?foo=bar#abc', 'url': 'https://instagram.com/p/aye83DjauH/?foo=bar#abc',
'md5': '0d2da106a9d2631273e192b372806516', 'md5': '0d2da106a9d2631273e192b372806516',
@@ -38,10 +38,19 @@ class InstagramIE(InfoExtractor):
}, { }, {
'url': 'https://instagram.com/p/-Cmh1cukG2/', 'url': 'https://instagram.com/p/-Cmh1cukG2/',
'only_matching': True, 'only_matching': True,
}, {
'url': 'http://instagram.com/p/9o6LshA7zy/embed/',
'only_matching': True,
}] }]
@staticmethod @staticmethod
def _extract_embed_url(webpage): def _extract_embed_url(webpage):
mobj = re.search(
r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?instagram\.com/p/[^/]+/embed.*?)\1',
webpage)
if mobj:
return mobj.group('url')
blockquote_el = get_element_by_attribute( blockquote_el = get_element_by_attribute(
'class', 'instagram-media', webpage) 'class', 'instagram-media', webpage)
if blockquote_el is None: if blockquote_el is None:
@@ -53,7 +62,9 @@ class InstagramIE(InfoExtractor):
return mobj.group('link') return mobj.group('link')
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
url = mobj.group('url')
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
uploader_id = self._search_regex(r'"owner":{"username":"(.+?)"', uploader_id = self._search_regex(r'"owner":{"username":"(.+?)"',
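The new `_extract_embed_url` branch first looks for an `<iframe>` embed before falling back to the blockquote form. A standalone sketch of the iframe lookup:

```python
import re

def extract_instagram_embed(webpage):
    # The (["\']) group with the \1 backreference ensures the match
    # ends at the same quote character that opened the src attribute.
    mobj = re.search(
        r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?instagram\.com/p/[^/]+/embed.*?)\1',
        webpage)
    if mobj:
        return mobj.group('url')
```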

View File

@@ -165,7 +165,7 @@ class IqiyiIE(InfoExtractor):
IE_NAME = 'iqiyi' IE_NAME = 'iqiyi'
IE_DESC = '爱奇艺' IE_DESC = '爱奇艺'
_VALID_URL = r'https?://(?:[^.]+\.)?iqiyi\.com/.+\.html' _VALID_URL = r'https?://(?:(?:[^.]+\.)?iqiyi\.com|www\.pps\.tv)/.+\.html'
_NETRC_MACHINE = 'iqiyi' _NETRC_MACHINE = 'iqiyi'
@@ -273,6 +273,9 @@ class IqiyiIE(InfoExtractor):
'title': '灌篮高手 国语版', 'title': '灌篮高手 国语版',
}, },
'playlist_count': 101, 'playlist_count': 101,
}, {
'url': 'http://www.pps.tv/w_19rrbav0ph.html',
'only_matching': True,
}] }]
_FORMATS_MAP = [ _FORMATS_MAP = [
@@ -284,6 +287,13 @@ class IqiyiIE(InfoExtractor):
('10', 'h1'), ('10', 'h1'),
] ]
AUTH_API_ERRORS = {
# No preview available (不允许试看鉴权失败)
'Q00505': 'This video requires a VIP account',
# End of preview time (试看结束鉴权失败)
'Q00506': 'Needs a VIP account for full video',
}
def _real_initialize(self): def _real_initialize(self):
self._login() self._login()
@@ -369,14 +379,18 @@ class IqiyiIE(InfoExtractor):
note='Downloading video authentication JSON', note='Downloading video authentication JSON',
errnote='Unable to download video authentication JSON') errnote='Unable to download video authentication JSON')
if auth_result['code'] == 'Q00505': # No preview available (不允许试看鉴权失败) code = auth_result.get('code')
raise ExtractorError('This video requires a VIP account', expected=True) msg = self.AUTH_API_ERRORS.get(code) or auth_result.get('msg') or code
if auth_result['code'] == 'Q00506': # End of preview time (试看结束鉴权失败) if code == 'Q00506':
if do_report_warning: if do_report_warning:
self.report_warning('Needs a VIP account for full video') self.report_warning(msg)
return False return False
if 'data' not in auth_result:
if msg is not None:
raise ExtractorError('%s said: %s' % (self.IE_NAME, msg), expected=True)
raise ExtractorError('Unexpected error from Iqiyi auth API')
return auth_result return auth_result['data']
def construct_video_urls(self, data, video_id, _uuid, tvid): def construct_video_urls(self, data, video_id, _uuid, tvid):
def do_xor(x, y): def do_xor(x, y):
@@ -452,11 +466,11 @@ class IqiyiIE(InfoExtractor):
need_vip_warning_report = False need_vip_warning_report = False
break break
param.update({ param.update({
't': auth_result['data']['t'], 't': auth_result['t'],
# cid is hard-coded in com/qiyi/player/core/player/RuntimeData.as # cid is hard-coded in com/qiyi/player/core/player/RuntimeData.as
'cid': 'afbe8fd3d73448c9', 'cid': 'afbe8fd3d73448c9',
'vid': video_id, 'vid': video_id,
'QY00001': auth_result['data']['u'], 'QY00001': auth_result['u'],
}) })
api_video_url += '?' if '?' not in api_video_url else '&' api_video_url += '?' if '?' not in api_video_url else '&'
api_video_url += compat_urllib_parse_urlencode(param) api_video_url += compat_urllib_parse_urlencode(param)
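The refactored auth handling centralizes error codes in `AUTH_API_ERRORS` and distinguishes "warn and fall back" (Q00506, preview ended) from hard failures. A condensed re-creation of that control flow (`ExtractorError` is a stand-in for youtube-dl's exception class):

```python
class ExtractorError(Exception):
    pass

AUTH_API_ERRORS = {
    'Q00505': 'This video requires a VIP account',
    'Q00506': 'Needs a VIP account for full video',
}

def handle_auth_result(auth_result):
    code = auth_result.get('code')
    msg = AUTH_API_ERRORS.get(code) or auth_result.get('msg') or code
    if code == 'Q00506':
        # Preview time ended: caller warns and falls back, no abort.
        return None
    if 'data' not in auth_result:
        raise ExtractorError(msg or 'Unexpected error from auth API')
    return auth_result['data']
```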

View File

@@ -29,7 +29,7 @@ class IzleseneIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 'Sevinçten Çıldırtan Doğum Günü Hediyesi', 'title': 'Sevinçten Çıldırtan Doğum Günü Hediyesi',
'description': 'md5:253753e2655dde93f59f74b572454f6d', 'description': 'md5:253753e2655dde93f59f74b572454f6d',
'thumbnail': 're:^http://.*\.jpg', 'thumbnail': 're:^https?://.*\.jpg',
'uploader_id': 'pelikzzle', 'uploader_id': 'pelikzzle',
'timestamp': int, 'timestamp': int,
'upload_date': '20140702', 'upload_date': '20140702',
@@ -44,8 +44,7 @@ class IzleseneIE(InfoExtractor):
'id': '17997', 'id': '17997',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Tarkan Dortmund 2006 Konseri', 'title': 'Tarkan Dortmund 2006 Konseri',
'description': 'Tarkan Dortmund 2006 Konseri', 'thumbnail': 're:^https://.*\.jpg',
'thumbnail': 're:^http://.*\.jpg',
'uploader_id': 'parlayankiz', 'uploader_id': 'parlayankiz',
'timestamp': int, 'timestamp': int,
'upload_date': '20061112', 'upload_date': '20061112',
@@ -62,7 +61,7 @@ class IzleseneIE(InfoExtractor):
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
title = self._og_search_title(webpage) title = self._og_search_title(webpage)
description = self._og_search_description(webpage) description = self._og_search_description(webpage, default=None)
thumbnail = self._proto_relative_url( thumbnail = self._proto_relative_url(
self._og_search_thumbnail(webpage), scheme='http:') self._og_search_thumbnail(webpage), scheme='http:')

View File

@@ -1,47 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from .youtube import YoutubeIE
class JadoreCettePubIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?jadorecettepub\.com/[0-9]{4}/[0-9]{2}/(?P<id>.*?)\.html'
_TEST = {
'url': 'http://www.jadorecettepub.com/2010/12/star-wars-massacre-par-les-japonais.html',
'md5': '401286a06067c70b44076044b66515de',
'info_dict': {
'id': 'jLMja3tr7a4',
'ext': 'mp4',
'title': 'La pire utilisation de Star Wars',
'description': "Jadorecettepub.com vous a gratifié de plusieurs pubs géniales utilisant Star Wars et Dark Vador plus particulièrement... Mais l'heure est venue de vous proposer une version totalement massacrée, venue du Japon. Quand les Japonais détruisent l'image de Star Wars pour vendre du thon en boite, ça promet...",
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('id')
webpage = self._download_webpage(url, display_id)
title = self._html_search_regex(
r'<span style="font-size: x-large;"><b>(.*?)</b></span>',
webpage, 'title')
description = self._html_search_regex(
r'(?s)<div id="fb-root">(.*?)<script>', webpage, 'description',
fatal=False)
real_url = self._search_regex(
r'\[/postlink\](.*)endofvid', webpage, 'video URL')
video_id = YoutubeIE.extract_id(real_url)
return {
'_type': 'url_transparent',
'url': real_url,
'id': video_id,
'title': title,
'description': description,
}

View File

@@ -2,39 +2,63 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urllib_parse_unquote_plus
from ..utils import (
js_to_json,
)
class KaraoketvIE(InfoExtractor): class KaraoketvIE(InfoExtractor):
_VALID_URL = r'https?://karaoketv\.co\.il/\?container=songs&id=(?P<id>[0-9]+)' _VALID_URL = r'http://www.karaoketv.co.il/[^/]+/(?P<id>\d+)'
_TEST = { _TEST = {
'url': 'http://karaoketv.co.il/?container=songs&id=171568', 'url': 'http://www.karaoketv.co.il/%D7%A9%D7%99%D7%A8%D7%99_%D7%A7%D7%A8%D7%99%D7%95%D7%A7%D7%99/58356/%D7%90%D7%99%D7%96%D7%95%D7%9F',
'info_dict': { 'info_dict': {
'id': '171568', 'id': '58356',
'ext': 'mp4', 'ext': 'flv',
'title': 'אל העולם שלך - רותם כהן - שרים קריוקי', 'title': 'קריוקי של איזון',
},
'params': {
# rtmp download
'skip_download': True,
} }
} }
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
api_page_url = self._search_regex(
r'<iframe[^>]+src=(["\'])(?P<url>https?://www\.karaoke\.co\.il/api_play\.php\?.+?)\1',
webpage, 'API play URL', group='url')
page_video_url = self._og_search_video_url(webpage, video_id) api_page = self._download_webpage(api_page_url, video_id)
config_json = compat_urllib_parse_unquote_plus(self._search_regex( video_cdn_url = self._search_regex(
r'config=(.*)', page_video_url, 'configuration')) r'<iframe[^>]+src=(["\'])(?P<url>https?://www\.video-cdn\.com/embed/iframe/.+?)\1',
api_page, 'video cdn URL', group='url')
urls_info_json = self._download_json( video_cdn = self._download_webpage(video_cdn_url, video_id)
config_json, video_id, 'Downloading configuration', play_path = self._parse_json(
transform_source=js_to_json) self._search_regex(
r'var\s+options\s*=\s*({.+?});', video_cdn, 'options'),
video_id)['clip']['url']
url = urls_info_json['playlist'][0]['url'] settings = self._parse_json(
self._search_regex(
r'var\s+settings\s*=\s*({.+?});', video_cdn, 'servers', default='{}'),
video_id, fatal=False) or {}
servers = settings.get('servers')
if not servers or not isinstance(servers, list):
servers = ('wowzail.video-cdn.com:80/vodcdn', )
formats = [{
'url': 'rtmp://%s' % server if not server.startswith('rtmp') else server,
'play_path': play_path,
'app': 'vodcdn',
'page_url': video_cdn_url,
'player_url': 'http://www.video-cdn.com/assets/flowplayer/flowplayer.commercial-3.2.18.swf',
'rtmp_real_time': True,
'ext': 'flv',
} for server in servers]
return { return {
'id': video_id, 'id': video_id,
'title': self._og_search_title(webpage), 'title': self._og_search_title(webpage),
'url': url, 'formats': formats,
} }
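When the page's `settings` blob lists no servers, the code falls back to a single hard-coded CDN host; each server then becomes one RTMP format. A trimmed sketch of that list construction:

```python
def build_rtmp_formats(servers, play_path, page_url):
    # Fall back to the default host from the diff when the settings
    # blob had no usable server list.
    if not servers or not isinstance(servers, list):
        servers = ['wowzail.video-cdn.com:80/vodcdn']
    return [{
        'url': server if server.startswith('rtmp') else 'rtmp://%s' % server,
        'play_path': play_path,
        'app': 'vodcdn',
        'page_url': page_url,
        'ext': 'flv',
    } for server in servers]
```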

View File

@@ -52,9 +52,12 @@ class KarriereVideosIE(InfoExtractor):
video_id = self._search_regex( video_id = self._search_regex(
r'/config/video/(.+?)\.xml', webpage, 'video id') r'/config/video/(.+?)\.xml', webpage, 'video id')
# Server returns malformed headers
# Force Accept-Encoding: * to prevent gzipped results
playlist = self._download_xml( playlist = self._download_xml(
'http://www.karrierevideos.at/player-playlist.xml.php?p=%s' % video_id, 'http://www.karrierevideos.at/player-playlist.xml.php?p=%s' % video_id,
video_id, transform_source=fix_xml_ampersands) video_id, transform_source=fix_xml_ampersands,
headers={'Accept-Encoding': '*'})
NS_MAP = { NS_MAP = {
'jwplayer': 'http://developer.longtailvideo.com/trac/wiki/FlashFormats' 'jwplayer': 'http://developer.longtailvideo.com/trac/wiki/FlashFormats'
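Because the server sends malformed headers for gzipped responses, the request forces `Accept-Encoding: *` so a plain (identity) response is acceptable. With only the standard library, the same workaround looks roughly like:

```python
from urllib.request import Request

def make_playlist_request(url):
    # Advertise "*" instead of gzip so the server may answer with an
    # uncompressed body and its broken gzip headers never matter.
    return Request(url, headers={'Accept-Encoding': '*'})
```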

View File

@@ -81,7 +81,7 @@ class KuwoIE(KuwoBaseIE):
'id': '6446136', 'id': '6446136',
'ext': 'mp3', 'ext': 'mp3',
'title': '', 'title': '',
'description': 'md5:b2ab6295d014005bfc607525bfc1e38a', 'description': 'md5:5d0e947b242c35dc0eb1d2fce9fbf02c',
'creator': 'IU', 'creator': 'IU',
'upload_date': '20150518', 'upload_date': '20150518',
}, },
@@ -102,10 +102,10 @@ class KuwoIE(KuwoBaseIE):
raise ExtractorError('this song has been offline because of copyright issues', expected=True) raise ExtractorError('this song has been offline because of copyright issues', expected=True)
song_name = self._html_search_regex( song_name = self._html_search_regex(
r'(?s)class="(?:[^"\s]+\s+)*title(?:\s+[^"\s]+)*".*?<h1[^>]+title="([^"]+)"', webpage, 'song name') r'<p[^>]+id="lrcName">([^<]+)</p>', webpage, 'song name')
singer_name = self._html_search_regex( singer_name = remove_start(self._html_search_regex(
r'<div[^>]+class="s_img">\s*<a[^>]+title="([^>]+)"', r'<a[^>]+href="http://www\.kuwo\.cn/artist/content\?name=([^"]+)">',
webpage, 'singer name', fatal=False) webpage, 'singer name', fatal=False), '歌手')
lrc_content = clean_html(get_element_by_id('lrcContent', webpage)) lrc_content = clean_html(get_element_by_id('lrcContent', webpage))
if lrc_content == '暂无': # indicates no lyrics if lrc_content == '暂无': # indicates no lyrics
lrc_content = None lrc_content = None
@@ -114,7 +114,7 @@ class KuwoIE(KuwoBaseIE):
self._sort_formats(formats) self._sort_formats(formats)
album_id = self._html_search_regex( album_id = self._html_search_regex(
r'<p[^>]+class="album"[^<]+<a[^>]+href="http://www\.kuwo\.cn/album/(\d+)/"', r'<a[^>]+href="http://www\.kuwo\.cn/album/(\d+)/"',
webpage, 'album id', fatal=False) webpage, 'album id', fatal=False)
publish_time = None publish_time = None
@@ -268,7 +268,7 @@ class KuwoCategoryIE(InfoExtractor):
'title': '八十年代精选', 'title': '八十年代精选',
'description': '这些都是属于八十年代的回忆!', 'description': '这些都是属于八十年代的回忆!',
}, },
'playlist_count': 30, 'playlist_mincount': 24,
} }
def _real_extract(self, url): def _real_extract(self, url):

View File

@@ -63,6 +63,7 @@ class Laola1TvIE(InfoExtractor):
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
'skip': 'This live stream has already finished.',
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@@ -74,6 +75,9 @@ class Laola1TvIE(InfoExtractor):
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
if 'Dieser Livestream ist bereits beendet.' in webpage:
raise ExtractorError('This live stream has already finished.', expected=True)
iframe_url = self._search_regex( iframe_url = self._search_regex(
r'<iframe[^>]*?id="videoplayer"[^>]*?src="([^"]+)"', r'<iframe[^>]*?id="videoplayer"[^>]*?src="([^"]+)"',
webpage, 'iframe url') webpage, 'iframe url')

View File

@@ -6,6 +6,7 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
determine_ext, determine_ext,
determine_protocol,
parse_duration, parse_duration,
int_or_none, int_or_none,
) )
@@ -18,10 +19,14 @@ class Lecture2GoIE(InfoExtractor):
'md5': 'ac02b570883020d208d405d5a3fd2f7f', 'md5': 'ac02b570883020d208d405d5a3fd2f7f',
'info_dict': { 'info_dict': {
'id': '17473', 'id': '17473',
'ext': 'flv', 'ext': 'mp4',
'title': '2 - Endliche Automaten und reguläre Sprachen', 'title': '2 - Endliche Automaten und reguläre Sprachen',
'creator': 'Frank Heitmann', 'creator': 'Frank Heitmann',
'duration': 5220, 'duration': 5220,
},
'params': {
# m3u8 download
'skip_download': True,
} }
} }
@@ -32,14 +37,18 @@ class Lecture2GoIE(InfoExtractor):
title = self._html_search_regex(r'<em[^>]+class="title">(.+)</em>', webpage, 'title') title = self._html_search_regex(r'<em[^>]+class="title">(.+)</em>', webpage, 'title')
formats = [] formats = []
for url in set(re.findall(r'"src","([^"]+)"', webpage)): for url in set(re.findall(r'var\s+playerUri\d+\s*=\s*"([^"]+)"', webpage)):
ext = determine_ext(url) ext = determine_ext(url)
protocol = determine_protocol({'url': url})
if ext == 'f4m': if ext == 'f4m':
formats.extend(self._extract_f4m_formats(url, video_id)) formats.extend(self._extract_f4m_formats(url, video_id, f4m_id='hds'))
elif ext == 'm3u8': elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(url, video_id)) formats.extend(self._extract_m3u8_formats(url, video_id, ext='mp4', m3u8_id='hls'))
else: else:
if protocol == 'rtmp':
continue # XXX: currently broken
formats.append({ formats.append({
'format_id': protocol,
'url': url, 'url': url,
}) })
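`determine_protocol` lets the loop tag each source and skip the RTMP ones, which the new comment marks as currently broken. A simplified scheme-based classifier in the same spirit (youtube-dl's real helper inspects a format dict, not a bare URL):

```python
from urllib.parse import urlparse

def determine_protocol(url):
    # Classify by URL scheme; all rtmp* variants count as RTMP.
    scheme = urlparse(url).scheme
    if scheme.startswith('rtmp'):
        return 'rtmp'
    return scheme or 'http'
```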

View File

@@ -53,6 +53,14 @@ class LiveLeakIE(InfoExtractor):
} }
}] }]
@staticmethod
def _extract_url(webpage):
mobj = re.search(
r'<iframe[^>]+src="https?://(?:\w+\.)?liveleak\.com/ll_embed\?(?:.*?)i=(?P<id>[\w_]+)(?:.*)',
webpage)
if mobj:
return 'http://www.liveleak.com/view?i=%s' % mobj.group('id')
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
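The new static `_extract_url` is what GenericIE calls above to spot LiveLeak embeds. A standalone version of the same lookup:

```python
import re

def extract_liveleak_url(webpage):
    # Find an ll_embed iframe and rebuild the canonical view URL from
    # its "i" parameter.
    mobj = re.search(
        r'<iframe[^>]+src="https?://(?:\w+\.)?liveleak\.com/ll_embed\?(?:.*?)i=(?P<id>[\w_]+)',
        webpage)
    if mobj:
        return 'http://www.liveleak.com/view?i=%s' % mobj.group('id')
```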

View File

@@ -49,8 +49,8 @@ class MDRIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 'Beutolomäus und der geheime Weihnachtswunsch', 'title': 'Beutolomäus und der geheime Weihnachtswunsch',
'description': 'md5:b69d32d7b2c55cbe86945ab309d39bbd', 'description': 'md5:b69d32d7b2c55cbe86945ab309d39bbd',
'timestamp': 1419047100, 'timestamp': 1450950000,
'upload_date': '20141220', 'upload_date': '20151224',
'duration': 4628, 'duration': 4628,
'uploader': 'KIKA', 'uploader': 'KIKA',
}, },
@@ -71,8 +71,8 @@ class MDRIE(InfoExtractor):
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
data_url = self._search_regex( data_url = self._search_regex(
r'(?:dataURL|playerXml(?:["\'])?)\s*:\s*(["\'])(?P<url>\\?/.+/(?:video|audio)-?[0-9]+-avCustom\.xml)\1', r'(?:dataURL|playerXml(?:["\'])?)\s*:\s*(["\'])(?P<url>.+/(?:video|audio)-?[0-9]+-avCustom\.xml)\1',
webpage, 'data url', default=None, group='url').replace('\/', '/') webpage, 'data url', group='url').replace('\/', '/')
doc = self._download_xml( doc = self._download_xml(
compat_urlparse.urljoin(url, data_url), video_id) compat_urlparse.urljoin(url, data_url), video_id)

View File

@@ -81,6 +81,9 @@ class MetacafeIE(InfoExtractor):
'title': 'Open: This is Face the Nation, February 9', 'title': 'Open: This is Face the Nation, February 9',
'description': 'md5:8a9ceec26d1f7ed6eab610834cc1a476', 'description': 'md5:8a9ceec26d1f7ed6eab610834cc1a476',
'duration': 96, 'duration': 96,
'uploader': 'CBSI-NEW',
'upload_date': '20140209',
'timestamp': 1391959800,
}, },
'params': { 'params': {
# rtmp download # rtmp download

View File

@@ -11,7 +11,7 @@ from ..utils import (
class MetacriticIE(InfoExtractor): class MetacriticIE(InfoExtractor):
_VALID_URL = r'https?://www\.metacritic\.com/.+?/trailers/(?P<id>\d+)' _VALID_URL = r'https?://www\.metacritic\.com/.+?/trailers/(?P<id>\d+)'
_TEST = { _TESTS = [{
'url': 'http://www.metacritic.com/game/playstation-4/infamous-second-son/trailers/3698222', 'url': 'http://www.metacritic.com/game/playstation-4/infamous-second-son/trailers/3698222',
'info_dict': { 'info_dict': {
'id': '3698222', 'id': '3698222',
@@ -20,7 +20,17 @@ class MetacriticIE(InfoExtractor):
'description': 'Take a peak behind-the-scenes to see how Sucker Punch brings smoke into the universe of inFAMOUS Second Son on the PS4.', 'description': 'Take a peak behind-the-scenes to see how Sucker Punch brings smoke into the universe of inFAMOUS Second Son on the PS4.',
'duration': 221, 'duration': 221,
}, },
} 'skip': 'Not providing trailers anymore',
}, {
'url': 'http://www.metacritic.com/game/playstation-4/tales-from-the-borderlands-a-telltale-game-series/trailers/5740315',
'info_dict': {
'id': '5740315',
'ext': 'mp4',
'title': 'Tales from the Borderlands - Finale: The Vault of the Traveler',
'description': 'In the final episode of the season, all hell breaks loose. Jack is now in control of Helios\' systems, and he\'s ready to reclaim his rightful place as king of Hyperion (with or without you).',
'duration': 114,
},
}]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)

View File

@@ -0,0 +1,63 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import int_or_none
class MGTVIE(InfoExtractor):
_VALID_URL = r'https?://www\.mgtv\.com/v/(?:[^/]+/)*(?P<id>\d+)\.html'
IE_DESC = '芒果TV'
_TEST = {
'url': 'http://www.mgtv.com/v/1/290525/f/3116640.html',
'md5': '',
'info_dict': {
'id': '3116640',
'ext': 'mp4',
'title': '我是歌手第四季双年巅峰会:韩红李玟“双王”领军对抗',
'description': '我是歌手第四季双年巅峰会',
'duration': 7461,
'thumbnail': 're:^https?://.*\.jpg$',
},
'params': {
'skip_download': True, # m3u8 download
},
}
_FORMAT_MAP = {
'标清': ('Standard', 0),
'高清': ('High', 1),
'超清': ('SuperHigh', 2),
}
def _real_extract(self, url):
video_id = self._match_id(url)
api_data = self._download_json(
'http://v.api.mgtv.com/player/video', video_id,
query={'video_id': video_id})['data']
info = api_data['info']
formats = []
for idx, stream in enumerate(api_data['stream']):
format_name = stream.get('name')
format_id, preference = self._FORMAT_MAP.get(format_name, (None, None))
format_info = self._download_json(
stream['url'], video_id,
note='Download video info for format %s' % (format_id or '#%d' % idx))
formats.append({
'format_id': format_id,
'url': format_info['info'],
'ext': 'mp4', # These are m3u8 playlists
'preference': preference,
})
self._sort_formats(formats)
return {
'id': video_id,
'title': info['title'].strip(),
'formats': formats,
'description': info.get('desc'),
'duration': int_or_none(info.get('duration')),
'thumbnail': info.get('thumb'),
}
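The new MGTV extractor maps Chinese quality labels to English format ids and numeric preferences via `_FORMAT_MAP`. A minimal sketch of how that mapping drives format ordering (labels and values are taken from the extractor above; the `pick_best` helper is hypothetical):

```python
# Quality mapping as declared in the MGTV extractor above:
# each Chinese label pairs with an English format id and a
# numeric preference used for sorting.
FORMAT_MAP = {
    '标清': ('Standard', 0),
    '高清': ('High', 1),
    '超清': ('SuperHigh', 2),
}

def sort_by_preference(stream_names):
    """Return stream names sorted worst-to-best by preference.

    Unknown labels get preference -1, so they sort before any
    recognized quality (mirroring the (None, None) fallback above).
    """
    def pref(name):
        return FORMAT_MAP.get(name, (None, -1))[1]
    return sorted(stream_names, key=pref)

print(sort_by_preference(['超清', '标清', '高清']))  # -> ['标清', '高清', '超清']
```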

View File

@ -1,8 +1,5 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
@ -20,21 +17,28 @@ class MinistryGridIE(InfoExtractor):
'id': '3453494717001', 'id': '3453494717001',
'ext': 'mp4', 'ext': 'mp4',
'title': 'The Gospel by Numbers', 'title': 'The Gospel by Numbers',
'thumbnail': 're:^https?://.*\.jpg',
'upload_date': '20140410',
'description': 'Coming soon from T4G 2014!', 'description': 'Coming soon from T4G 2014!',
'uploader': 'LifeWay Christian Resources (MG)', 'uploader_id': '2034960640001',
'timestamp': 1397145591,
}, },
'params': {
# m3u8 download
'skip_download': True,
},
'add_ie': ['TDSLifeway'],
} }
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) video_id = self._match_id(url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
portlets_json = self._search_regex( portlets = self._parse_json(self._search_regex(
r'Liferay\.Portlet\.list=(\[.+?\])', webpage, 'portlet list') r'Liferay\.Portlet\.list=(\[.+?\])', webpage, 'portlet list'),
portlets = json.loads(portlets_json) video_id)
pl_id = self._search_regex( pl_id = self._search_regex(
r'<!--\s*p_l_id - ([0-9]+)<br>', webpage, 'p_l_id') r'getPlid:function\(\){return"(\d+)"}', webpage, 'p_l_id')
for i, portlet in enumerate(portlets): for i, portlet in enumerate(portlets):
portlet_url = 'http://www.ministrygrid.com/c/portal/render_portlet?p_l_id=%s&p_p_id=%s' % (pl_id, portlet) portlet_url = 'http://www.ministrygrid.com/c/portal/render_portlet?p_l_id=%s&p_p_id=%s' % (pl_id, portlet)
@ -46,12 +50,8 @@ class MinistryGridIE(InfoExtractor):
r'<iframe.*?src="([^"]+)"', portlet_code, 'video iframe', r'<iframe.*?src="([^"]+)"', portlet_code, 'video iframe',
default=None) default=None)
if video_iframe_url: if video_iframe_url:
surl = smuggle_url( return self.url_result(
video_iframe_url, {'force_videoid': video_id}) smuggle_url(video_iframe_url, {'force_videoid': video_id}),
return { video_id=video_id)
'_type': 'url',
'id': video_id,
'url': surl,
}
raise ExtractorError('Could not find video iframe in any portlets') raise ExtractorError('Could not find video iframe in any portlets')

View File

@ -1,26 +1,35 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import base64
import functools
import itertools
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urllib_parse_unquote from ..compat import (
compat_chr,
compat_ord,
compat_urllib_parse_unquote,
compat_urlparse,
)
from ..utils import ( from ..utils import (
clean_html,
ExtractorError, ExtractorError,
HEADRequest, OnDemandPagedList,
parse_count, parse_count,
str_to_int, str_to_int,
) )
class MixcloudIE(InfoExtractor): class MixcloudIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:www\.)?mixcloud\.com/([^/]+)/([^/]+)' _VALID_URL = r'^(?:https?://)?(?:www\.)?mixcloud\.com/([^/]+)/(?!stream|uploads|favorites|listens|playlists)([^/]+)'
IE_NAME = 'mixcloud' IE_NAME = 'mixcloud'
_TESTS = [{ _TESTS = [{
'url': 'http://www.mixcloud.com/dholbach/cryptkeeper/', 'url': 'http://www.mixcloud.com/dholbach/cryptkeeper/',
'info_dict': { 'info_dict': {
'id': 'dholbach-cryptkeeper', 'id': 'dholbach-cryptkeeper',
'ext': 'mp3', 'ext': 'm4a',
'title': 'Cryptkeeper', 'title': 'Cryptkeeper',
'description': 'After quite a long silence from myself, finally another Drum\'n\'Bass mix with my favourite current dance floor bangers.', 'description': 'After quite a long silence from myself, finally another Drum\'n\'Bass mix with my favourite current dance floor bangers.',
'uploader': 'Daniel Holbach', 'uploader': 'Daniel Holbach',
@ -38,22 +47,22 @@ class MixcloudIE(InfoExtractor):
'description': 'md5:2b8aec6adce69f9d41724647c65875e8', 'description': 'md5:2b8aec6adce69f9d41724647c65875e8',
'uploader': 'Gilles Peterson Worldwide', 'uploader': 'Gilles Peterson Worldwide',
'uploader_id': 'gillespeterson', 'uploader_id': 'gillespeterson',
'thumbnail': 're:https?://.*/images/', 'thumbnail': 're:https?://.*',
'view_count': int, 'view_count': int,
'like_count': int, 'like_count': int,
}, },
}] }]
def _check_url(self, url, track_id, ext): # See https://www.mixcloud.com/media/js2/www_js_2.9e23256562c080482435196ca3975ab5.js
try: @staticmethod
# We only want to know if the request succeeded def _decrypt_play_info(play_info):
# don't download the whole file KEY = 'pleasedontdownloadourmusictheartistswontgetpaid'
self._request_webpage(
HEADRequest(url), track_id, play_info = base64.b64decode(play_info.encode('ascii'))
'Trying %s URL' % ext)
return True return ''.join([
except ExtractorError: compat_chr(compat_ord(ch) ^ compat_ord(KEY[idx % len(KEY)]))
return False for idx, ch in enumerate(play_info)])
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
@ -63,14 +72,19 @@ class MixcloudIE(InfoExtractor):
webpage = self._download_webpage(url, track_id) webpage = self._download_webpage(url, track_id)
preview_url = self._search_regex( message = self._html_search_regex(
r'\s(?:data-preview-url|m-preview)="([^"]+)"', webpage, 'preview url') r'(?s)<div[^>]+class="global-message cloudcast-disabled-notice-light"[^>]*>(.+?)<(?:a|/div)',
song_url = re.sub(r'audiocdn(\d+)', r'stream\1', preview_url) webpage, 'error message', default=None)
song_url = song_url.replace('/previews/', '/c/originals/')
if not self._check_url(song_url, track_id, 'mp3'): encrypted_play_info = self._search_regex(
song_url = song_url.replace('.mp3', '.m4a').replace('originals/', 'm4a/64/') r'm-play-info="([^"]+)"', webpage, 'play info')
if not self._check_url(song_url, track_id, 'm4a'): play_info = self._parse_json(
raise ExtractorError('Unable to extract track url') self._decrypt_play_info(encrypted_play_info), track_id)
if message and 'stream_url' not in play_info:
raise ExtractorError('%s said: %s' % (self.IE_NAME, message), expected=True)
song_url = play_info['stream_url']
PREFIX = ( PREFIX = (
r'm-play-on-spacebar[^>]+' r'm-play-on-spacebar[^>]+'
@ -105,3 +119,201 @@ class MixcloudIE(InfoExtractor):
'view_count': view_count, 'view_count': view_count,
'like_count': like_count, 'like_count': like_count,
} }
class MixcloudPlaylistBaseIE(InfoExtractor):
_PAGE_SIZE = 24
def _find_urls_in_page(self, page):
for url in re.findall(r'm-play-button m-url="(?P<url>[^"]+)"', page):
yield self.url_result(
compat_urlparse.urljoin('https://www.mixcloud.com', clean_html(url)),
MixcloudIE.ie_key())
def _fetch_tracks_page(self, path, video_id, page_name, current_page, real_page_number=None):
real_page_number = real_page_number or current_page + 1
return self._download_webpage(
'https://www.mixcloud.com/%s/' % path, video_id,
note='Download %s (page %d)' % (page_name, current_page + 1),
errnote='Unable to download %s' % page_name,
query={'page': real_page_number, 'list': 'main', '_ajax': '1'},
headers={'X-Requested-With': 'XMLHttpRequest'})
def _tracks_page_func(self, page, video_id, page_name, current_page):
resp = self._fetch_tracks_page(page, video_id, page_name, current_page)
for item in self._find_urls_in_page(resp):
yield item
def _get_user_description(self, page_content):
return self._html_search_regex(
r'<div[^>]+class="description-text"[^>]*>(.+?)</div>',
page_content, 'user description', fatal=False)
class MixcloudUserIE(MixcloudPlaylistBaseIE):
_VALID_URL = r'^(?:https?://)?(?:www\.)?mixcloud\.com/(?P<user>[^/]+)/(?P<type>uploads|favorites|listens)?/?$'
IE_NAME = 'mixcloud:user'
_TESTS = [{
'url': 'http://www.mixcloud.com/dholbach/',
'info_dict': {
'id': 'dholbach_uploads',
'title': 'Daniel Holbach (uploads)',
'description': 'md5:327af72d1efeb404a8216c27240d1370',
},
'playlist_mincount': 11,
}, {
'url': 'http://www.mixcloud.com/dholbach/uploads/',
'info_dict': {
'id': 'dholbach_uploads',
'title': 'Daniel Holbach (uploads)',
'description': 'md5:327af72d1efeb404a8216c27240d1370',
},
'playlist_mincount': 11,
}, {
'url': 'http://www.mixcloud.com/dholbach/favorites/',
'info_dict': {
'id': 'dholbach_favorites',
'title': 'Daniel Holbach (favorites)',
'description': 'md5:327af72d1efeb404a8216c27240d1370',
},
'params': {
'playlist_items': '1-100',
},
'playlist_mincount': 100,
}, {
'url': 'http://www.mixcloud.com/dholbach/listens/',
'info_dict': {
'id': 'dholbach_listens',
'title': 'Daniel Holbach (listens)',
'description': 'md5:327af72d1efeb404a8216c27240d1370',
},
'params': {
'playlist_items': '1-100',
},
'playlist_mincount': 100,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
user_id = mobj.group('user')
list_type = mobj.group('type')
# if only a profile URL was supplied, default to downloading all uploads
if list_type is None:
list_type = 'uploads'
video_id = '%s_%s' % (user_id, list_type)
profile = self._download_webpage(
'https://www.mixcloud.com/%s/' % user_id, video_id,
note='Downloading user profile',
errnote='Unable to download user profile')
username = self._og_search_title(profile)
description = self._get_user_description(profile)
entries = OnDemandPagedList(
functools.partial(
self._tracks_page_func,
'%s/%s' % (user_id, list_type), video_id, 'list of %s' % list_type),
self._PAGE_SIZE, use_cache=True)
return self.playlist_result(
entries, video_id, '%s (%s)' % (username, list_type), description)
class MixcloudPlaylistIE(MixcloudPlaylistBaseIE):
_VALID_URL = r'^(?:https?://)?(?:www\.)?mixcloud\.com/(?P<user>[^/]+)/playlists/(?P<playlist>[^/]+)/?$'
IE_NAME = 'mixcloud:playlist'
_TESTS = [{
'url': 'https://www.mixcloud.com/RedBullThre3style/playlists/tokyo-finalists-2015/',
'info_dict': {
'id': 'RedBullThre3style_tokyo-finalists-2015',
'title': 'National Champions 2015',
'description': 'md5:6ff5fb01ac76a31abc9b3939c16243a3',
},
'playlist_mincount': 16,
}, {
'url': 'https://www.mixcloud.com/maxvibes/playlists/jazzcat-on-ness-radio/',
'info_dict': {
'id': 'maxvibes_jazzcat-on-ness-radio',
'title': 'Jazzcat on Ness Radio',
'description': 'md5:7bbbf0d6359a0b8cda85224be0f8f263',
},
'playlist_mincount': 23
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
user_id = mobj.group('user')
playlist_id = mobj.group('playlist')
video_id = '%s_%s' % (user_id, playlist_id)
profile = self._download_webpage(
url, user_id,
note='Downloading playlist page',
errnote='Unable to download playlist page')
description = self._get_user_description(profile)
playlist_title = self._html_search_regex(
r'<span[^>]+class="[^"]*list-playlist-title[^"]*"[^>]*>(.*?)</span>',
profile, 'playlist title')
entries = OnDemandPagedList(
functools.partial(
self._tracks_page_func,
'%s/playlists/%s' % (user_id, playlist_id), video_id, 'tracklist'),
self._PAGE_SIZE)
return self.playlist_result(entries, video_id, playlist_title, description)
class MixcloudStreamIE(MixcloudPlaylistBaseIE):
_VALID_URL = r'^(?:https?://)?(?:www\.)?mixcloud\.com/(?P<id>[^/]+)/stream/?$'
IE_NAME = 'mixcloud:stream'
_TEST = {
'url': 'https://www.mixcloud.com/FirstEar/stream/',
'info_dict': {
'id': 'FirstEar',
'title': 'First Ear',
'description': 'Curators of good music\nfirstearmusic.com',
},
'playlist_mincount': 192,
}
def _real_extract(self, url):
user_id = self._match_id(url)
webpage = self._download_webpage(url, user_id)
entries = []
prev_page_url = None
def _handle_page(page):
entries.extend(self._find_urls_in_page(page))
return self._search_regex(
r'm-next-page-url="([^"]+)"', page,
'next page URL', default=None)
next_page_url = _handle_page(webpage)
for idx in itertools.count(0):
if not next_page_url or prev_page_url == next_page_url:
break
prev_page_url = next_page_url
current_page = int(self._search_regex(
r'\?page=(\d+)', next_page_url, 'next page number'))
next_page_url = _handle_page(self._fetch_tracks_page(
'%s/stream' % user_id, user_id, 'stream', idx,
real_page_number=current_page))
username = self._og_search_title(webpage)
description = self._get_user_description(webpage)
return self.playlist_result(entries, user_id, username, description)
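The stream extractor above walks pages by following the `m-next-page-url` marker and stops when the link disappears or stops changing. A minimal sketch of that follow-the-next-link loop with a stubbed fetcher (the page contents and URLs are hypothetical):

```python
import itertools
import re

def collect_pages(fetch_page, first_page):
    # Follow m-next-page-url markers, as the stream extractor does.
    # Stop when there is no next URL or it repeats, which guards
    # against a server that keeps returning the same "next" link.
    def next_url(page):
        m = re.search(r'm-next-page-url="([^"]+)"', page)
        return m.group(1) if m else None

    pages = [first_page]
    prev, nxt = None, next_url(first_page)
    for _ in itertools.count():
        if not nxt or nxt == prev:
            break
        prev = nxt
        page = fetch_page(nxt)
        pages.append(page)
        nxt = next_url(page)
    return pages

# Stubbed three-page "stream":
site = {
    '/stream?page=2': 'track-b m-next-page-url="/stream?page=3"',
    '/stream?page=3': 'track-c',
}
first = 'track-a m-next-page-url="/stream?page=2"'
print(len(collect_pages(site.get, first)))  # -> 3
```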

View File

@ -1,110 +0,0 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
sanitized_Request,
urlencode_postdata,
)
class MooshareIE(InfoExtractor):
IE_NAME = 'mooshare'
IE_DESC = 'Mooshare.biz'
_VALID_URL = r'https?://(?:www\.)?mooshare\.biz/(?P<id>[\da-z]{12})'
_TESTS = [
{
'url': 'http://mooshare.biz/8dqtk4bjbp8g',
'md5': '4e14f9562928aecd2e42c6f341c8feba',
'info_dict': {
'id': '8dqtk4bjbp8g',
'ext': 'mp4',
'title': 'Comedy Football 2011 - (part 1-2)',
'duration': 893,
},
},
{
'url': 'http://mooshare.biz/aipjtoc4g95j',
'info_dict': {
'id': 'aipjtoc4g95j',
'ext': 'mp4',
'title': 'Orange Caramel Dashing Through the Snow',
'duration': 212,
},
'params': {
# rtmp download
'skip_download': True,
}
}
]
def _real_extract(self, url):
video_id = self._match_id(url)
page = self._download_webpage(url, video_id, 'Downloading page')
if re.search(r'>Video Not Found or Deleted<', page) is not None:
raise ExtractorError('Video %s does not exist' % video_id, expected=True)
hash_key = self._html_search_regex(r'<input type="hidden" name="hash" value="([^"]+)">', page, 'hash')
title = self._html_search_regex(r'(?m)<div class="blockTitle">\s*<h2>Watch ([^<]+)</h2>', page, 'title')
download_form = {
'op': 'download1',
'id': video_id,
'hash': hash_key,
}
request = sanitized_Request(
'http://mooshare.biz/%s' % video_id, urlencode_postdata(download_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
self._sleep(5, video_id)
video_page = self._download_webpage(request, video_id, 'Downloading video page')
thumbnail = self._html_search_regex(r'image:\s*"([^"]+)",', video_page, 'thumbnail', fatal=False)
duration_str = self._html_search_regex(r'duration:\s*"(\d+)",', video_page, 'duration', fatal=False)
duration = int(duration_str) if duration_str is not None else None
formats = []
# SD video
mobj = re.search(r'(?m)file:\s*"(?P<url>[^"]+)",\s*provider:', video_page)
if mobj is not None:
formats.append({
'url': mobj.group('url'),
'format_id': 'sd',
'format': 'SD',
})
# HD video
mobj = re.search(r'\'hd-2\': { file: \'(?P<url>[^\']+)\' },', video_page)
if mobj is not None:
formats.append({
'url': mobj.group('url'),
'format_id': 'hd',
'format': 'HD',
})
# rtmp video
mobj = re.search(r'(?m)file: "(?P<playpath>[^"]+)",\s*streamer: "(?P<rtmpurl>rtmp://[^"]+)",', video_page)
if mobj is not None:
formats.append({
'url': mobj.group('rtmpurl'),
'play_path': mobj.group('playpath'),
'rtmp_live': False,
'ext': 'mp4',
'format_id': 'rtmp',
'format': 'HD',
})
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats,
}

View File

@ -1,17 +1,21 @@
# encoding: utf-8 # encoding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import int_or_none from ..compat import compat_urlparse
from ..utils import (
int_or_none,
js_to_json,
mimetype2ext,
)
class MusicPlayOnIE(InfoExtractor): class MusicPlayOnIE(InfoExtractor):
_VALID_URL = r'https?://(?:.+?\.)?musicplayon\.com/play(?:-touch)?\?(?:v|pl=100&play)=(?P<id>\d+)' _VALID_URL = r'https?://(?:.+?\.)?musicplayon\.com/play(?:-touch)?\?(?:v|pl=\d+&play)=(?P<id>\d+)'
_TEST = { _TESTS = [{
'url': 'http://en.musicplayon.com/play?v=433377', 'url': 'http://en.musicplayon.com/play?v=433377',
'md5': '00cdcdea1726abdf500d1e7fd6dd59bb',
'info_dict': { 'info_dict': {
'id': '433377', 'id': '433377',
'ext': 'mp4', 'ext': 'mp4',
@ -20,15 +24,16 @@ class MusicPlayOnIE(InfoExtractor):
'duration': 342, 'duration': 342,
'uploader': 'ultrafish', 'uploader': 'ultrafish',
}, },
'params': { }, {
# m3u8 download 'url': 'http://en.musicplayon.com/play?pl=102&play=442629',
'skip_download': True, 'only_matching': True,
}, }]
}
_URL_TEMPLATE = 'http://en.musicplayon.com/play?v=%s'
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) video_id = self._match_id(url)
video_id = mobj.group('id') url = self._URL_TEMPLATE % video_id
page = self._download_webpage(url, video_id) page = self._download_webpage(url, video_id)
@ -40,28 +45,14 @@ class MusicPlayOnIE(InfoExtractor):
uploader = self._html_search_regex( uploader = self._html_search_regex(
r'<div>by&nbsp;<a href="[^"]+" class="purple">([^<]+)</a></div>', page, 'uploader', fatal=False) r'<div>by&nbsp;<a href="[^"]+" class="purple">([^<]+)</a></div>', page, 'uploader', fatal=False)
formats = [ sources = self._parse_json(
{ self._search_regex(r'setup\[\'_sources\'\]\s*=\s*([^;]+);', page, 'video sources'),
'url': 'http://media0-eu-nl.musicplayon.com/stream-mobile?id=%s&type=.mp4' % video_id, video_id, transform_source=js_to_json)
'ext': 'mp4', formats = [{
} 'url': compat_urlparse.urljoin(url, source['src']),
] 'ext': mimetype2ext(source.get('type')),
'format_note': source.get('data-res'),
manifest = self._download_webpage( } for source in sources]
'http://en.musicplayon.com/manifest.m3u8?v=%s' % video_id, video_id, 'Downloading manifest')
for entry in manifest.split('#')[1:]:
if entry.startswith('EXT-X-STREAM-INF:'):
meta, url, _ = entry.split('\n')
params = dict(param.split('=') for param in meta.split(',')[1:])
formats.append({
'url': url,
'ext': 'mp4',
'tbr': int(params['BANDWIDTH']),
'width': int(params['RESOLUTION'].split('x')[1]),
'height': int(params['RESOLUTION'].split('x')[-1]),
'format_note': params['NAME'].replace('"', '').strip(),
})
return { return {
'id': video_id, 'id': video_id,

View File

@ -12,7 +12,7 @@ class MwaveIE(InfoExtractor):
_VALID_URL = r'https?://mwave\.interest\.me/mnettv/videodetail\.m\?searchVideoDetailVO\.clip_id=(?P<id>[0-9]+)' _VALID_URL = r'https?://mwave\.interest\.me/mnettv/videodetail\.m\?searchVideoDetailVO\.clip_id=(?P<id>[0-9]+)'
_TEST = { _TEST = {
'url': 'http://mwave.interest.me/mnettv/videodetail.m?searchVideoDetailVO.clip_id=168859', 'url': 'http://mwave.interest.me/mnettv/videodetail.m?searchVideoDetailVO.clip_id=168859',
'md5': 'c930e27b7720aaa3c9d0018dfc8ff6cc', # md5 is unstable
'info_dict': { 'info_dict': {
'id': '168859', 'id': '168859',
'ext': 'flv', 'ext': 'flv',

View File

@ -134,6 +134,9 @@ class NBCSportsIE(InfoExtractor):
'ext': 'flv', 'ext': 'flv',
'title': 'Tom Izzo, Michigan St. has \'so much respect\' for Duke', 'title': 'Tom Izzo, Michigan St. has \'so much respect\' for Duke',
'description': 'md5:ecb459c9d59e0766ac9c7d5d0eda8113', 'description': 'md5:ecb459c9d59e0766ac9c7d5d0eda8113',
'uploader': 'NBCU-SPORTS',
'upload_date': '20150330',
'timestamp': 1427726529,
} }
} }
@ -172,7 +175,7 @@ class CSNNEIE(InfoExtractor):
class NBCNewsIE(ThePlatformIE): class NBCNewsIE(ThePlatformIE):
_VALID_URL = r'''(?x)https?://(?:www\.)?nbcnews\.com/ _VALID_URL = r'''(?x)https?://(?:www\.)?(?:nbcnews|today)\.com/
(?:video/.+?/(?P<id>\d+)| (?:video/.+?/(?P<id>\d+)|
([^/]+/)*(?P<display_id>[^/?]+)) ([^/]+/)*(?P<display_id>[^/?]+))
''' '''
@ -230,6 +233,18 @@ class NBCNewsIE(ThePlatformIE):
}, },
'expected_warnings': ['http-6000 is not available'] 'expected_warnings': ['http-6000 is not available']
}, },
{
'url': 'http://www.today.com/video/see-the-aurora-borealis-from-space-in-stunning-new-nasa-video-669831235788',
'md5': '118d7ca3f0bea6534f119c68ef539f71',
'info_dict': {
'id': '669831235788',
'ext': 'mp4',
'title': 'See the aurora borealis from space in stunning new NASA video',
'description': 'md5:74752b7358afb99939c5f8bb2d1d04b1',
'upload_date': '20160420',
'timestamp': 1461152093,
},
},
{ {
'url': 'http://www.nbcnews.com/watch/dateline/full-episode--deadly-betrayal-386250819952', 'url': 'http://www.nbcnews.com/watch/dateline/full-episode--deadly-betrayal-386250819952',
'only_matching': True, 'only_matching': True,
@ -264,7 +279,10 @@ class NBCNewsIE(ThePlatformIE):
info = bootstrap['results'][0]['video'] info = bootstrap['results'][0]['video']
else: else:
player_instance_json = self._search_regex( player_instance_json = self._search_regex(
r'videoObj\s*:\s*({.+})', webpage, 'player instance') r'videoObj\s*:\s*({.+})', webpage, 'player instance', default=None)
if not player_instance_json:
player_instance_json = self._html_search_regex(
r'data-video="([^"]+)"', webpage, 'video json')
info = self._parse_json(player_instance_json, display_id) info = self._parse_json(player_instance_json, display_id)
video_id = info['mpxId'] video_id = info['mpxId']
title = info['title'] title = info['title']
@ -295,7 +313,7 @@ class NBCNewsIE(ThePlatformIE):
formats.extend(tp_formats) formats.extend(tp_formats)
subtitles = self._merge_subtitles(subtitles, tp_subtitles) subtitles = self._merge_subtitles(subtitles, tp_subtitles)
else: else:
tbr = int_or_none(video_asset.get('bitRate'), 1000) tbr = int_or_none(video_asset.get('bitRate') or video_asset.get('bitrate'), 1000)
format_id = 'http%s' % ('-%d' % tbr if tbr else '') format_id = 'http%s' % ('-%d' % tbr if tbr else '')
video_url = update_url_query( video_url = update_url_query(
video_url, {'format': 'redirect'}) video_url, {'format': 'redirect'})
@ -321,10 +339,9 @@ class NBCNewsIE(ThePlatformIE):
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'description': info.get('description'), 'description': info.get('description'),
'thumbnail': info.get('description'),
'thumbnail': info.get('thumbnail'), 'thumbnail': info.get('thumbnail'),
'duration': int_or_none(info.get('duration')), 'duration': int_or_none(info.get('duration')),
'timestamp': parse_iso8601(info.get('pubDate')), 'timestamp': parse_iso8601(info.get('pubDate') or info.get('pub_date')),
'formats': formats, 'formats': formats,
'subtitles': subtitles, 'subtitles': subtitles,
} }
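The `bitRate` fix in the NBCNews hunk above passes the raw value through youtube-dl's `int_or_none` helper with a scale of 1000, turning bits per second into kbit/s while tolerating a missing field. A minimal reimplementation of that behavior for illustration (the real helper lives in `youtube_dl.utils` and takes additional options):

```python
def int_or_none(v, scale=1):
    # Sketch of the utils helper as used above: a missing or
    # unparsable value yields None instead of raising, and a valid
    # value is divided by scale (1000 turns bit/s into kbit/s).
    if v is None:
        return None
    try:
        return int(v) // scale
    except (ValueError, TypeError):
        return None

# Hypothetical asset dict covering both key spellings handled above:
video_asset = {'bitrate': '2500000'}
tbr = int_or_none(video_asset.get('bitRate') or video_asset.get('bitrate'), 1000)
print(tbr)  # -> 2500
```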

View File

@ -1,80 +0,0 @@
# encoding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
determine_ext,
parse_iso8601,
xpath_text,
)
class NerdistIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?nerdist\.com/vepisode/(?P<id>[^/?#]+)'
_TEST = {
'url': 'http://www.nerdist.com/vepisode/exclusive-which-dc-characters-w',
'md5': '3698ed582931b90d9e81e02e26e89f23',
'info_dict': {
'display_id': 'exclusive-which-dc-characters-w',
'id': 'RPHpvJyr',
'ext': 'mp4',
'title': 'Your TEEN TITANS Revealed! Who\'s on the show?',
'thumbnail': 're:^https?://.*/thumbs/.*\.jpg$',
'description': 'Exclusive: Find out which DC Comics superheroes will star in TEEN TITANS Live-Action TV Show on Nerdist News with Jessica Chobot!',
'uploader': 'Eric Diaz',
'upload_date': '20150202',
'timestamp': 1422892808,
}
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
r'''(?x)<script\s+(?:type="text/javascript"\s+)?
src="https?://content\.nerdist\.com/players/([a-zA-Z0-9_]+)-''',
webpage, 'video ID')
timestamp = parse_iso8601(self._html_search_meta(
'shareaholic:article_published_time', webpage, 'upload date'))
uploader = self._html_search_meta(
'shareaholic:article_author_name', webpage, 'article author')
doc = self._download_xml(
'http://content.nerdist.com/jw6/%s.xml' % video_id, video_id)
video_info = doc.find('.//item')
title = xpath_text(video_info, './title', fatal=True)
description = xpath_text(video_info, './description')
thumbnail = xpath_text(
video_info, './{http://rss.jwpcdn.com/}image', 'thumbnail')
formats = []
for source in video_info.findall('./{http://rss.jwpcdn.com/}source'):
vurl = source.attrib['file']
ext = determine_ext(vurl)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
vurl, video_id, entry_protocol='m3u8_native', ext='mp4',
preference=0))
elif ext == 'smil':
formats.extend(self._extract_smil_formats(
vurl, video_id, fatal=False
))
else:
formats.append({
'format_id': ext,
'url': vurl,
})
self._sort_formats(formats)
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'timestamp': timestamp,
'formats': formats,
'uploader': uploader,
}

View File

@ -89,6 +89,7 @@ class NetEaseMusicIE(NetEaseMusicBaseIE):
'timestamp': 1431878400, 'timestamp': 1431878400,
'description': 'md5:a10a54589c2860300d02e1de821eb2ef', 'description': 'md5:a10a54589c2860300d02e1de821eb2ef',
}, },
'skip': 'Blocked outside Mainland China',
}, { }, {
'note': 'No lyrics translation.', 'note': 'No lyrics translation.',
'url': 'http://music.163.com/#/song?id=29822014', 'url': 'http://music.163.com/#/song?id=29822014',
@ -101,6 +102,7 @@ class NetEaseMusicIE(NetEaseMusicBaseIE):
'timestamp': 1419523200, 'timestamp': 1419523200,
'description': 'md5:a4d8d89f44656af206b7b2555c0bce6c', 'description': 'md5:a4d8d89f44656af206b7b2555c0bce6c',
}, },
'skip': 'Blocked outside Mainland China',
}, { }, {
'note': 'No lyrics.', 'note': 'No lyrics.',
'url': 'http://music.163.com/song?id=17241424', 'url': 'http://music.163.com/song?id=17241424',
@ -112,6 +114,7 @@ class NetEaseMusicIE(NetEaseMusicBaseIE):
'upload_date': '20080211', 'upload_date': '20080211',
'timestamp': 1202745600, 'timestamp': 1202745600,
}, },
'skip': 'Blocked outside Mainland China',
}, { }, {
'note': 'Has translated name.', 'note': 'Has translated name.',
'url': 'http://music.163.com/#/song?id=22735043', 'url': 'http://music.163.com/#/song?id=22735043',
@ -124,7 +127,8 @@ class NetEaseMusicIE(NetEaseMusicBaseIE):
'upload_date': '20100127', 'upload_date': '20100127',
'timestamp': 1264608000, 'timestamp': 1264608000,
'alt_title': '说出愿望吧(Genie)', 'alt_title': '说出愿望吧(Genie)',
} },
'skip': 'Blocked outside Mainland China',
}] }]
def _process_lyrics(self, lyrics_info): def _process_lyrics(self, lyrics_info):
@ -192,6 +196,7 @@ class NetEaseMusicAlbumIE(NetEaseMusicBaseIE):
'title': 'B\'day', 'title': 'B\'day',
}, },
'playlist_count': 23, 'playlist_count': 23,
'skip': 'Blocked outside Mainland China',
} }
def _real_extract(self, url): def _real_extract(self, url):
@ -223,6 +228,7 @@ class NetEaseMusicSingerIE(NetEaseMusicBaseIE):
'title': '张惠妹 - aMEI;阿密特', 'title': '张惠妹 - aMEI;阿密特',
}, },
'playlist_count': 50, 'playlist_count': 50,
'skip': 'Blocked outside Mainland China',
}, { }, {
'note': 'Singer has translated name.', 'note': 'Singer has translated name.',
'url': 'http://music.163.com/#/artist?id=124098', 'url': 'http://music.163.com/#/artist?id=124098',
@ -231,6 +237,7 @@ class NetEaseMusicSingerIE(NetEaseMusicBaseIE):
'title': '李昇基 - 이승기', 'title': '李昇基 - 이승기',
}, },
'playlist_count': 50, 'playlist_count': 50,
'skip': 'Blocked outside Mainland China',
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -266,6 +273,7 @@ class NetEaseMusicListIE(NetEaseMusicBaseIE):
'description': 'md5:12fd0819cab2965b9583ace0f8b7b022' 'description': 'md5:12fd0819cab2965b9583ace0f8b7b022'
}, },
'playlist_count': 99, 'playlist_count': 99,
'skip': 'Blocked outside Mainland China',
}, { }, {
'note': 'Toplist/Charts sample', 'note': 'Toplist/Charts sample',
'url': 'http://music.163.com/#/discover/toplist?id=3733003', 'url': 'http://music.163.com/#/discover/toplist?id=3733003',
@ -275,6 +283,7 @@ class NetEaseMusicListIE(NetEaseMusicBaseIE):
'description': 'md5:73ec782a612711cadc7872d9c1e134fc', 'description': 'md5:73ec782a612711cadc7872d9c1e134fc',
}, },
'playlist_count': 50, 'playlist_count': 50,
'skip': 'Blocked outside Mainland China',
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -314,6 +323,7 @@ class NetEaseMusicMvIE(NetEaseMusicBaseIE):
'creator': '白雅言', 'creator': '白雅言',
'upload_date': '20150520', 'upload_date': '20150520',
}, },
'skip': 'Blocked outside Mainland China',
} }
def _real_extract(self, url): def _real_extract(self, url):
@ -357,6 +367,7 @@ class NetEaseMusicProgramIE(NetEaseMusicBaseIE):
'upload_date': '20150613', 'upload_date': '20150613',
'duration': 900, 'duration': 900,
}, },
'skip': 'Blocked outside Mainland China',
}, { }, {
'note': 'This program has accompanying songs.', 'note': 'This program has accompanying songs.',
'url': 'http://music.163.com/#/program?id=10141022', 'url': 'http://music.163.com/#/program?id=10141022',
@ -366,6 +377,7 @@ class NetEaseMusicProgramIE(NetEaseMusicBaseIE):
'description': 'md5:8d594db46cc3e6509107ede70a4aaa3b', 'description': 'md5:8d594db46cc3e6509107ede70a4aaa3b',
}, },
'playlist_count': 4, 'playlist_count': 4,
'skip': 'Blocked outside Mainland China',
}, { }, {
'note': 'This program has accompanying songs.', 'note': 'This program has accompanying songs.',
'url': 'http://music.163.com/#/program?id=10141022', 'url': 'http://music.163.com/#/program?id=10141022',
@ -379,7 +391,8 @@ class NetEaseMusicProgramIE(NetEaseMusicBaseIE):
}, },
'params': { 'params': {
'noplaylist': True 'noplaylist': True
} },
'skip': 'Blocked outside Mainland China',
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -438,6 +451,7 @@ class NetEaseMusicDjRadioIE(NetEaseMusicBaseIE):
'description': 'md5:766220985cbd16fdd552f64c578a6b15' 'description': 'md5:766220985cbd16fdd552f64c578a6b15'
}, },
'playlist_mincount': 40, 'playlist_mincount': 40,
'skip': 'Blocked outside Mainland China',
} }
_PAGE_SIZE = 1000 _PAGE_SIZE = 1000

View File

@ -7,8 +7,8 @@ from .common import InfoExtractor
class NewgroundsIE(InfoExtractor): class NewgroundsIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?newgrounds\.com/audio/listen/(?P<id>[0-9]+)' _VALID_URL = r'https?://(?:www\.)?newgrounds\.com/(?:audio/listen|portal/view)/(?P<id>[0-9]+)'
_TEST = { _TESTS = [{
'url': 'http://www.newgrounds.com/audio/listen/549479', 'url': 'http://www.newgrounds.com/audio/listen/549479',
'md5': 'fe6033d297591288fa1c1f780386f07a', 'md5': 'fe6033d297591288fa1c1f780386f07a',
'info_dict': { 'info_dict': {
@ -17,7 +17,16 @@ class NewgroundsIE(InfoExtractor):
'title': 'B7 - BusMode', 'title': 'B7 - BusMode',
'uploader': 'Burn7', 'uploader': 'Burn7',
} }
} }, {
'url': 'http://www.newgrounds.com/portal/view/673111',
'md5': '3394735822aab2478c31b1004fe5e5bc',
'info_dict': {
'id': '673111',
'ext': 'mp4',
'title': 'Dancin',
'uploader': 'Squirrelman82',
},
}]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
@ -25,9 +34,11 @@ class NewgroundsIE(InfoExtractor):
webpage = self._download_webpage(url, music_id) webpage = self._download_webpage(url, music_id)
title = self._html_search_regex( title = self._html_search_regex(
r',"name":"([^"]+)",', webpage, 'music title') r'<title>([^>]+)</title>', webpage, 'title')
uploader = self._html_search_regex( uploader = self._html_search_regex(
r',"artist":"([^"]+)",', webpage, 'music uploader') [r',"artist":"([^"]+)",', r'[\'"]owner[\'"]\s*:\s*[\'"]([^\'"]+)[\'"],'],
webpage, 'uploader')
music_url_json_string = self._html_search_regex( music_url_json_string = self._html_search_regex(
r'({"url":"[^"]+"),', webpage, 'music url') + '}' r'({"url":"[^"]+"),', webpage, 'music url') + '}'
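The widened uploader lookup in this hunk falls back across the two Newgrounds page layouts (audio vs. portal/video). A minimal standalone sketch of that first-match-wins behavior; the page fragments below are fabricated samples, and `search_first` is a simplified stand-in for `_html_search_regex` called with a list of patterns:

```python
import re

# Try each pattern in order and return the first capture, like
# _html_search_regex does when given a list of regexes.
def search_first(patterns, text):
    for pattern in patterns:
        mobj = re.search(pattern, text)
        if mobj:
            return mobj.group(1)
    return None

UPLOADER_RES = [
    r',"artist":"([^"]+)",',                          # audio pages
    r'[\'"]owner[\'"]\s*:\s*[\'"]([^\'"]+)[\'"],',    # portal/video pages
]

audio_page = ',"artist":"Burn7",'
video_page = '"owner":"Squirrelman82",'
print(search_first(UPLOADER_RES, audio_page))   # Burn7
print(search_first(UPLOADER_RES, video_page))   # Squirrelman82
```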


@ -4,24 +4,24 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ExtractorError from ..utils import (
ExtractorError,
int_or_none,
)
class NewstubeIE(InfoExtractor): class NewstubeIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?newstube\.ru/media/(?P<id>.+)' _VALID_URL = r'https?://(?:www\.)?newstube\.ru/media/(?P<id>.+)'
_TEST = { _TEST = {
'url': 'http://www.newstube.ru/media/telekanal-cnn-peremestil-gorod-slavyansk-v-krym', 'url': 'http://www.newstube.ru/media/telekanal-cnn-peremestil-gorod-slavyansk-v-krym',
'md5': '801eef0c2a9f4089fa04e4fe3533abdc',
'info_dict': { 'info_dict': {
'id': '728e0ef2-e187-4012-bac0-5a081fdcb1f6', 'id': '728e0ef2-e187-4012-bac0-5a081fdcb1f6',
'ext': 'flv', 'ext': 'mp4',
'title': 'Телеканал CNN переместил город Славянск в Крым', 'title': 'Телеканал CNN переместил город Славянск в Крым',
'description': 'md5:419a8c9f03442bc0b0a794d689360335', 'description': 'md5:419a8c9f03442bc0b0a794d689360335',
'duration': 31.05, 'duration': 31.05,
}, },
'params': {
# rtmp download
'skip_download': True,
},
} }
def _real_extract(self, url): def _real_extract(self, url):
@ -62,7 +62,6 @@ class NewstubeIE(InfoExtractor):
server = media_location.find(ns('./Server')).text server = media_location.find(ns('./Server')).text
app = media_location.find(ns('./App')).text app = media_location.find(ns('./App')).text
media_id = stream_info.find(ns('./Id')).text media_id = stream_info.find(ns('./Id')).text
quality_id = stream_info.find(ns('./QualityId')).text
name = stream_info.find(ns('./Name')).text name = stream_info.find(ns('./Name')).text
width = int(stream_info.find(ns('./Width')).text) width = int(stream_info.find(ns('./Width')).text)
height = int(stream_info.find(ns('./Height')).text) height = int(stream_info.find(ns('./Height')).text)
@ -74,12 +73,38 @@ class NewstubeIE(InfoExtractor):
'rtmp_conn': ['S:%s' % session_id, 'S:%s' % media_id, 'S:n2'], 'rtmp_conn': ['S:%s' % session_id, 'S:%s' % media_id, 'S:n2'],
'page_url': url, 'page_url': url,
'ext': 'flv', 'ext': 'flv',
'format_id': quality_id, 'format_id': 'rtmp' + ('-%s' % name if name else ''),
'format_note': name,
'width': width, 'width': width,
'height': height, 'height': height,
}) })
sources_data = self._download_json(
'http://www.newstube.ru/player2/getsources?guid=%s' % video_guid,
video_guid, fatal=False)
if sources_data:
for source in sources_data.get('Sources', []):
source_url = source.get('Src')
if not source_url:
continue
height = int_or_none(source.get('Height'))
f = {
'format_id': 'http' + ('-%dp' % height if height else ''),
'url': source_url,
'width': int_or_none(source.get('Width')),
'height': height,
}
source_type = source.get('Type')
if source_type:
mobj = re.search(r'codecs="([^,]+),\s*([^"]+)"', source_type)
if mobj:
vcodec, acodec = mobj.groups()
f.update({
'vcodec': vcodec,
'acodec': acodec,
})
formats.append(f)
self._check_formats(formats, video_guid)
self._sort_formats(formats) self._sort_formats(formats)
return { return {
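The `codecs="..."` split introduced in this hunk can be exercised on its own. The MIME string below is an assumed sample shape, not a value taken from the Newstube API:

```python
import re

# Split a source Type attribute like 'video/mp4; codecs="avc1.42E01E, mp4a.40.2"'
# into (vcodec, acodec), mirroring the regex added in the hunk above.
def split_codecs(source_type):
    mobj = re.search(r'codecs="([^,]+),\s*([^"]+)"', source_type)
    if not mobj:
        return None, None
    return mobj.groups()

vcodec, acodec = split_codecs('video/mp4; codecs="avc1.42E01E, mp4a.40.2"')
print(vcodec, acodec)  # avc1.42E01E mp4a.40.2
```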


@ -8,10 +8,15 @@ from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_urlparse, compat_urlparse,
compat_urllib_parse_urlencode, compat_urllib_parse_urlencode,
compat_urllib_parse_urlparse compat_urllib_parse_urlparse,
compat_str,
) )
from ..utils import ( from ..utils import (
unified_strdate, unified_strdate,
determine_ext,
int_or_none,
parse_iso8601,
parse_duration,
) )
@ -70,8 +75,8 @@ class NHLBaseInfoExtractor(InfoExtractor):
return ret return ret
class NHLIE(NHLBaseInfoExtractor): class NHLVideocenterIE(NHLBaseInfoExtractor):
IE_NAME = 'nhl.com' IE_NAME = 'nhl.com:videocenter'
_VALID_URL = r'https?://video(?P<team>\.[^.]*)?\.nhl\.com/videocenter/(?:console|embed)?(?:\?(?:.*?[?&])?)(?:id|hlg|playlist)=(?P<id>[-0-9a-zA-Z,]+)' _VALID_URL = r'https?://video(?P<team>\.[^.]*)?\.nhl\.com/videocenter/(?:console|embed)?(?:\?(?:.*?[?&])?)(?:id|hlg|playlist)=(?P<id>[-0-9a-zA-Z,]+)'
_TESTS = [{ _TESTS = [{
@ -186,8 +191,8 @@ class NHLNewsIE(NHLBaseInfoExtractor):
return self._real_extract_video(video_id) return self._real_extract_video(video_id)
class NHLVideocenterIE(NHLBaseInfoExtractor): class NHLVideocenterCategoryIE(NHLBaseInfoExtractor):
IE_NAME = 'nhl.com:videocenter' IE_NAME = 'nhl.com:videocenter:category'
IE_DESC = 'NHL videocenter category' IE_DESC = 'NHL videocenter category'
_VALID_URL = r'https?://video\.(?P<team>[^.]*)\.nhl\.com/videocenter/(console\?[^(id=)]*catid=(?P<catid>[0-9]+)(?![&?]id=).*?)?$' _VALID_URL = r'https?://video\.(?P<team>[^.]*)\.nhl\.com/videocenter/(console\?[^(id=)]*catid=(?P<catid>[0-9]+)(?![&?]id=).*?)?$'
_TEST = { _TEST = {
@ -236,3 +241,86 @@ class NHLVideocenterIE(NHLBaseInfoExtractor):
'id': cat_id, 'id': cat_id,
'entries': [self._extract_video(v) for v in videos], 'entries': [self._extract_video(v) for v in videos],
} }
class NHLIE(InfoExtractor):
IE_NAME = 'nhl.com'
_VALID_URL = r'https?://(?:www\.)?nhl\.com/([^/]+/)*c-(?P<id>\d+)'
_TESTS = [{
# type=video
'url': 'https://www.nhl.com/video/anisimov-cleans-up-mess/t-277752844/c-43663503',
'md5': '0f7b9a8f986fb4b4eeeece9a56416eaf',
'info_dict': {
'id': '43663503',
'ext': 'mp4',
'title': 'Anisimov cleans up mess',
'description': 'md5:a02354acdfe900e940ce40706939ca63',
'timestamp': 1461288600,
'upload_date': '20160422',
},
}, {
# type=article
'url': 'https://www.nhl.com/news/dennis-wideman-suspended/c-278258934',
'md5': '1f39f4ea74c1394dea110699a25b366c',
'info_dict': {
'id': '40784403',
'ext': 'mp4',
'title': 'Wideman suspended by NHL',
'description': 'Flames defenseman Dennis Wideman was banned 20 games for violation of Rule 40 (Physical Abuse of Officials)',
'upload_date': '20160204',
'timestamp': 1454544904,
},
}]
def _real_extract(self, url):
tmp_id = self._match_id(url)
video_data = self._download_json(
'https://nhl.bamcontent.com/nhl/id/v1/%s/details/web-v1.json' % tmp_id,
tmp_id)
if video_data.get('type') == 'article':
video_data = video_data['media']
video_id = compat_str(video_data['id'])
title = video_data['title']
formats = []
for playback in video_data.get('playbacks', []):
playback_url = playback.get('url')
if not playback_url:
continue
ext = determine_ext(playback_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
playback_url, video_id, 'mp4', 'm3u8_native',
m3u8_id=playback.get('name', 'hls'), fatal=False))
else:
height = int_or_none(playback.get('height'))
formats.append({
'format_id': playback.get('name', 'http' + ('-%dp' % height if height else '')),
'url': playback_url,
'width': int_or_none(playback.get('width')),
'height': height,
})
self._sort_formats(formats, ('preference', 'width', 'height', 'tbr', 'format_id'))
thumbnails = []
for thumbnail_id, thumbnail_data in video_data.get('image', {}).get('cuts', {}).items():
thumbnail_url = thumbnail_data.get('src')
if not thumbnail_url:
continue
thumbnails.append({
'id': thumbnail_id,
'url': thumbnail_url,
'width': int_or_none(thumbnail_data.get('width')),
'height': int_or_none(thumbnail_data.get('height')),
})
return {
'id': video_id,
'title': title,
'description': video_data.get('description'),
'timestamp': parse_iso8601(video_data.get('date')),
'duration': parse_duration(video_data.get('duration')),
'thumbnails': thumbnails,
'formats': formats,
}
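The thumbnail handling added for the new `NHLIE` flattens the `image.cuts` mapping into youtube-dl style thumbnail dicts. A rough sketch with a fabricated response; `int_or_none` here is a simplified stand-in for the helper in `youtube_dl.utils`:

```python
def int_or_none(v):
    # Simplified version of youtube_dl.utils.int_or_none.
    try:
        return int(v)
    except (TypeError, ValueError):
        return None

def extract_thumbnails(video_data):
    thumbnails = []
    for thumb_id, thumb in video_data.get('image', {}).get('cuts', {}).items():
        src = thumb.get('src')
        if not src:
            continue  # skip cuts without a usable URL
        thumbnails.append({
            'id': thumb_id,
            'url': src,
            'width': int_or_none(thumb.get('width')),
            'height': int_or_none(thumb.get('height')),
        })
    return thumbnails

sample = {'image': {'cuts': {'640x360': {
    'src': 'http://example.com/t.jpg', 'width': '640', 'height': 360}}}}
print(extract_thumbnails(sample))
```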


@ -4,7 +4,10 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import determine_ext from ..utils import (
determine_ext,
int_or_none,
)
class OnionStudiosIE(InfoExtractor): class OnionStudiosIE(InfoExtractor):
@ -17,7 +20,7 @@ class OnionStudiosIE(InfoExtractor):
'id': '2937', 'id': '2937',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Hannibal charges forward, stops for a cocktail', 'title': 'Hannibal charges forward, stops for a cocktail',
'description': 'md5:545299bda6abf87e5ec666548c6a9448', 'description': 'md5:e786add7f280b7f0fe237b64cc73df76',
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
'uploader': 'The A.V. Club', 'uploader': 'The A.V. Club',
'uploader_id': 'TheAVClub', 'uploader_id': 'TheAVClub',
@ -42,9 +45,19 @@ class OnionStudiosIE(InfoExtractor):
formats = [] formats = []
for src in re.findall(r'<source[^>]+src="([^"]+)"', webpage): for src in re.findall(r'<source[^>]+src="([^"]+)"', webpage):
if determine_ext(src) != 'm3u8': # m3u8 always results in 403 ext = determine_ext(src)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
src, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
else:
height = int_or_none(self._search_regex(
r'/(\d+)\.%s' % ext, src, 'height', default=None))
formats.append({ formats.append({
'format_id': ext + ('-%sp' % height if height else ''),
'url': src, 'url': src,
'height': height,
'ext': ext,
'preference': 1,
}) })
self._sort_formats(formats) self._sort_formats(formats)
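The new progressive-format branch above derives the height from the file name. A hedged sketch of that logic, assuming source URLs shaped like `/2937/720.mp4` (the example URL is fabricated):

```python
import re

# Build a format dict from a <source> src, reading the height out of the
# file name the same way the regex in the hunk above does.
def format_from_src(src):
    ext = src.rsplit('.', 1)[-1]
    mobj = re.search(r'/(\d+)\.%s' % re.escape(ext), src)
    height = int(mobj.group(1)) if mobj else None
    return {
        'format_id': ext + ('-%sp' % height if height else ''),
        'url': src,
        'height': height,
        'ext': ext,
    }

print(format_from_src('http://example.com/video/2937/720.mp4')['format_id'])  # mp4-720p
```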
@ -52,7 +65,7 @@ class OnionStudiosIE(InfoExtractor):
r'share_title\s*=\s*(["\'])(?P<title>[^\1]+?)\1', r'share_title\s*=\s*(["\'])(?P<title>[^\1]+?)\1',
webpage, 'title', group='title') webpage, 'title', group='title')
description = self._search_regex( description = self._search_regex(
r'share_description\s*=\s*(["\'])(?P<description>[^\1]+?)\1', r'share_description\s*=\s*(["\'])(?P<description>[^\'"]+?)\1',
webpage, 'description', default=None, group='description') webpage, 'description', default=None, group='description')
thumbnail = self._search_regex( thumbnail = self._search_regex(
r'poster\s*=\s*(["\'])(?P<thumbnail>[^\1]+?)\1', r'poster\s*=\s*(["\'])(?P<thumbnail>[^\1]+?)\1',


@ -6,8 +6,10 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_chr from ..compat import compat_chr
from ..utils import ( from ..utils import (
determine_ext,
encode_base_n, encode_base_n,
ExtractorError, ExtractorError,
mimetype2ext,
) )
@ -29,6 +31,11 @@ class OpenloadIE(InfoExtractor):
}, { }, {
'url': 'https://openload.io/f/ZAn6oz-VZGE/', 'url': 'https://openload.io/f/ZAn6oz-VZGE/',
'only_matching': True, 'only_matching': True,
}, {
# unavailable via https://openload.co/f/Sxz5sADo82g/, different layout
# for title and ext
'url': 'https://openload.co/embed/Sxz5sADo82g/',
'only_matching': True,
}] }]
@staticmethod @staticmethod
@ -96,12 +103,25 @@ class OpenloadIE(InfoExtractor):
r'<video[^>]+>\s*<script[^>]+>([^<]+)</script>', r'<video[^>]+>\s*<script[^>]+>([^<]+)</script>',
webpage, 'JS code') webpage, 'JS code')
decoded = self.openload_decode(code)
video_url = self._search_regex( video_url = self._search_regex(
r'return\s+"(https?://[^"]+)"', self.openload_decode(code), 'video URL') r'return\s+"(https?://[^"]+)"', decoded, 'video URL')
title = self._og_search_title(webpage, default=None) or self._search_regex(
r'<span[^>]+class=["\']title["\'][^>]*>([^<]+)', webpage,
'title', default=None) or self._html_search_meta(
'description', webpage, 'title', fatal=True)
ext = mimetype2ext(self._search_regex(
r'window\.vt\s*=\s*(["\'])(?P<mimetype>.+?)\1', decoded,
'mimetype', default=None, group='mimetype')) or determine_ext(
video_url, 'mp4')
return { return {
'id': video_id, 'id': video_id,
'title': self._og_search_title(webpage), 'title': title,
'thumbnail': self._og_search_thumbnail(webpage), 'ext': ext,
'thumbnail': self._og_search_thumbnail(webpage, default=None),
'url': video_url, 'url': video_url,
} }


@ -0,0 +1,32 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class PeopleIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?people\.com/people/videos/0,,(?P<id>\d+),00\.html'
_TEST = {
'url': 'http://www.people.com/people/videos/0,,20995451,00.html',
'info_dict': {
'id': 'ref:20995451',
'ext': 'mp4',
'title': 'Astronaut Love Triangle Victim Speaks Out: “The Crime in 2007 Hasnt Defined Us”',
'description': 'Colleen Shipman speaks to PEOPLE for the first time about life after the attack',
'thumbnail': 're:^https?://.*\.jpg',
'duration': 246.318,
'timestamp': 1458720585,
'upload_date': '20160323',
'uploader_id': '416418724',
},
'params': {
'skip_download': True,
},
'add_ie': ['BrightcoveNew'],
}
def _real_extract(self, url):
return self.url_result(
'http://players.brightcove.net/416418724/default_default/index.html?videoId=ref:%s'
% self._match_id(url), 'BrightcoveNew')


@ -1,61 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import ExtractorError
class PlanetaPlayIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?planetaplay\.com/\?sng=(?P<id>[0-9]+)'
_API_URL = 'http://planetaplay.com/action/playlist/?sng={0:}'
_THUMBNAIL_URL = 'http://planetaplay.com/img/thumb/{thumb:}'
_TEST = {
'url': 'http://planetaplay.com/?sng=3586',
'md5': '9d569dceb7251a4e01355d5aea60f9db',
'info_dict': {
'id': '3586',
'ext': 'flv',
'title': 'md5:e829428ee28b1deed00de90de49d1da1',
},
'skip': 'Not accessible from Travis CI server',
}
_SONG_FORMATS = {
'lq': (0, 'http://www.planetaplay.com/videoplayback/{med_hash:}'),
'hq': (1, 'http://www.planetaplay.com/videoplayback/hi/{med_hash:}'),
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
response = self._download_json(
self._API_URL.format(video_id), video_id)['response']
try:
data = response.get('data')[0]
except IndexError:
raise ExtractorError(
'%s: failed to get the playlist' % self.IE_NAME, expected=True)
title = '{song_artists:} - {sng_name:}'.format(**data)
thumbnail = self._THUMBNAIL_URL.format(**data)
formats = []
for format_id, (quality, url_template) in self._SONG_FORMATS.items():
formats.append({
'format_id': format_id,
'url': url_template.format(**data),
'quality': quality,
'ext': 'flv',
})
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'formats': formats,
'thumbnail': thumbnail,
}


@ -40,7 +40,7 @@ class Puls4IE(InfoExtractor):
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
error_message = self._html_search_regex( error_message = self._html_search_regex(
r'<div class="message-error">(.+?)</div>', r'<div[^>]+class="message-error"[^>]*>(.+?)</div>',
webpage, 'error message', default=None) webpage, 'error message', default=None)
if error_message: if error_message:
raise ExtractorError( raise ExtractorError(


@ -1,54 +0,0 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_urlparse,
)
from ..utils import (
determine_ext,
int_or_none,
)
class QuickVidIE(InfoExtractor):
_VALID_URL = r'https?://(www\.)?quickvid\.org/watch\.php\?v=(?P<id>[a-zA-Z_0-9-]+)'
_TEST = {
'url': 'http://quickvid.org/watch.php?v=sUQT3RCG8dx',
'md5': 'c0c72dd473f260c06c808a05d19acdc5',
'info_dict': {
'id': 'sUQT3RCG8dx',
'ext': 'mp4',
'title': 'Nick Offerman\'s Summer Reading Recap',
'thumbnail': 're:^https?://.*\.(?:png|jpg|gif)$',
'view_count': int,
},
'skip': 'Not accessible from Travis CI server',
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = self._html_search_regex(r'<h2>(.*?)</h2>', webpage, 'title')
view_count = int_or_none(self._html_search_regex(
r'(?s)<div id="views">(.*?)</div>',
webpage, 'view count', fatal=False))
video_code = self._search_regex(
r'(?s)<video id="video"[^>]*>(.*?)</video>', webpage, 'video code')
formats = [
{
'url': compat_urlparse.urljoin(url, src),
'format_id': determine_ext(src, None),
} for src in re.findall('<source\s+src="([^"]+)"', video_code)
]
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'formats': formats,
'thumbnail': self._og_search_thumbnail(webpage),
'view_count': view_count,
}


@ -4,12 +4,18 @@ from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
unescapeHTML, ExtractorError,
) )
class RTBFIE(InfoExtractor): class RTBFIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?rtbf\.be/(?:video/[^?]+\?.*\bid=|ouftivi/(?:[^/]+/)*[^?]+\?.*\bvideoId=)(?P<id>\d+)' _VALID_URL = r'''(?x)
https?://(?:www\.)?rtbf\.be/
(?:
video/[^?]+\?.*\bid=|
ouftivi/(?:[^/]+/)*[^?]+\?.*\bvideoId=|
auvio/[^/]+\?.*id=
)(?P<id>\d+)'''
_TESTS = [{ _TESTS = [{
'url': 'https://www.rtbf.be/video/detail_les-diables-au-coeur-episode-2?id=1921274', 'url': 'https://www.rtbf.be/video/detail_les-diables-au-coeur-episode-2?id=1921274',
'md5': '799f334ddf2c0a582ba80c44655be570', 'md5': '799f334ddf2c0a582ba80c44655be570',
@ -17,7 +23,11 @@ class RTBFIE(InfoExtractor):
'id': '1921274', 'id': '1921274',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Les Diables au coeur (épisode 2)', 'title': 'Les Diables au coeur (épisode 2)',
'description': 'Football - Diables Rouges',
'duration': 3099, 'duration': 3099,
'upload_date': '20140425',
'timestamp': 1398456336,
'uploader': 'rtbfsport',
} }
}, { }, {
# geo restricted # geo restricted
@ -26,45 +36,63 @@ class RTBFIE(InfoExtractor):
}, { }, {
'url': 'http://www.rtbf.be/ouftivi/niouzz?videoId=2055858', 'url': 'http://www.rtbf.be/ouftivi/niouzz?videoId=2055858',
'only_matching': True, 'only_matching': True,
}, {
'url': 'http://www.rtbf.be/auvio/detail_jeudi-en-prime-siegfried-bracke?id=2102996',
'only_matching': True,
}] }]
_IMAGE_HOST = 'http://ds1.ds.static.rtbf.be'
_PROVIDERS = {
'YOUTUBE': 'Youtube',
'DAILYMOTION': 'Dailymotion',
'VIMEO': 'Vimeo',
}
_QUALITIES = [ _QUALITIES = [
('mobile', 'mobile'), ('mobile', 'SD'),
('web', 'SD'), ('web', 'MD'),
('url', 'MD'),
('high', 'HD'), ('high', 'HD'),
] ]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
data = self._download_json(
'http://www.rtbf.be/api/media/video?method=getVideoDetail&args[]=%s' % video_id, video_id)
webpage = self._download_webpage( error = data.get('error')
'http://www.rtbf.be/video/embed?id=%s' % video_id, video_id) if error:
raise ExtractorError('%s said: %s' % (self.IE_NAME, error), expected=True)
data = self._parse_json( data = data['data']
unescapeHTML(self._search_regex(
r'data-media="([^"]+)"', webpage, 'data video')), provider = data.get('provider')
video_id) if provider in self._PROVIDERS:
return self.url_result(data['url'], self._PROVIDERS[provider])
if data.get('provider').lower() == 'youtube':
video_url = data.get('downloadUrl') or data.get('url')
return self.url_result(video_url, 'Youtube')
formats = [] formats = []
for key, format_id in self._QUALITIES: for key, format_id in self._QUALITIES:
format_url = data['sources'].get(key) format_url = data.get(key + 'Url')
if format_url: if format_url:
formats.append({ formats.append({
'format_id': format_id, 'format_id': format_id,
'url': format_url, 'url': format_url,
}) })
thumbnails = []
for thumbnail_id, thumbnail_url in data.get('thumbnail', {}).items():
if thumbnail_id != 'default':
thumbnails.append({
'url': self._IMAGE_HOST + thumbnail_url,
'id': thumbnail_id,
})
return { return {
'id': video_id, 'id': video_id,
'formats': formats, 'formats': formats,
'title': data['title'], 'title': data['title'],
'description': data.get('description') or data.get('subtitle'), 'description': data.get('description') or data.get('subtitle'),
'thumbnail': data.get('thumbnail'), 'thumbnails': thumbnails,
'duration': data.get('duration') or data.get('realDuration'), 'duration': data.get('duration') or data.get('realDuration'),
'timestamp': int_or_none(data.get('created')), 'timestamp': int_or_none(data.get('created')),
'view_count': int_or_none(data.get('viewCount')), 'view_count': int_or_none(data.get('viewCount')),
'uploader': data.get('channel'),
'tags': data.get('tags'),
} }
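The RTBF format loop now reads flat `<key>Url` fields instead of the nested `sources` dict. A self-contained sketch with a fabricated API response:

```python
# Quality keys and their format ids, as in the updated _QUALITIES table.
QUALITIES = [('mobile', 'SD'), ('web', 'MD'), ('url', 'MD'), ('high', 'HD')]

def extract_formats(data):
    formats = []
    for key, format_id in QUALITIES:
        format_url = data.get(key + 'Url')  # e.g. webUrl, highUrl
        if format_url:
            formats.append({'format_id': format_id, 'url': format_url})
    return formats

sample = {'webUrl': 'http://example.com/md.mp4',
          'highUrl': 'http://example.com/hd.mp4'}
print([f['format_id'] for f in extract_formats(sample)])  # ['MD', 'HD']
```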


@ -6,6 +6,7 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urlparse from ..compat import compat_urlparse
from ..utils import ( from ..utils import (
js_to_json,
unified_strdate, unified_strdate,
) )
@ -94,19 +95,32 @@ class SportBoxEmbedIE(InfoExtractor):
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
hls = self._search_regex( formats = []
r"sportboxPlayer\.jwplayer_common_params\.file\s*=\s*['\"]([^'\"]+)['\"]",
webpage, 'hls file') def cleanup_js(code):
# desktop_advert_config contains complex Javascripts and we don't need it
return js_to_json(re.sub(r'desktop_advert_config.*', '', code))
jwplayer_data = self._parse_json(self._search_regex(
r'(?s)player\.setup\(({.+?})\);', webpage, 'jwplayer settings'), video_id,
transform_source=cleanup_js)
hls_url = jwplayer_data.get('hls_url')
if hls_url:
formats.extend(self._extract_m3u8_formats(
hls_url, video_id, ext='mp4', m3u8_id='hls'))
rtsp_url = jwplayer_data.get('rtsp_url')
if rtsp_url:
formats.append({
'url': rtsp_url,
'format_id': 'rtsp',
})
formats = self._extract_m3u8_formats(hls, video_id, 'mp4')
self._sort_formats(formats) self._sort_formats(formats)
title = self._search_regex( title = jwplayer_data['node_title']
r'sportboxPlayer\.node_title\s*=\s*"([^"]+)"', webpage, 'title') thumbnail = jwplayer_data.get('image_url')
thumbnail = self._search_regex(
r'sportboxPlayer\.jwplayer_common_params\.image\s*=\s*"([^"]+)"',
webpage, 'thumbnail', default=None)
return { return {
'id': video_id, 'id': video_id,


@ -14,7 +14,6 @@ class StreetVoiceIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': '94440', 'id': '94440',
'ext': 'mp3', 'ext': 'mp3',
'filesize': 4167053,
'title': '', 'title': '',
'description': 'Crispy脆樂團 - 輸', 'description': 'Crispy脆樂團 - 輸',
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
@ -32,20 +31,19 @@ class StreetVoiceIE(InfoExtractor):
song_id = self._match_id(url) song_id = self._match_id(url)
song = self._download_json( song = self._download_json(
'http://streetvoice.com/music/api/song/%s' % song_id, song_id) 'https://streetvoice.com/api/v1/public/song/%s/' % song_id, song_id, data=b'')
title = song['name'] title = song['name']
author = song['musician']['name'] author = song['user']['nickname']
return { return {
'id': song_id, 'id': song_id,
'url': song['file'], 'url': song['file'],
'filesize': song.get('size'),
'title': title, 'title': title,
'description': '%s - %s' % (author, title), 'description': '%s - %s' % (author, title),
'thumbnail': self._proto_relative_url(song.get('image'), 'http:'), 'thumbnail': self._proto_relative_url(song.get('image'), 'http:'),
'duration': song.get('length'), 'duration': song.get('length'),
'upload_date': unified_strdate(song.get('created_at')), 'upload_date': unified_strdate(song.get('created_at')),
'uploader': author, 'uploader': author,
'uploader_id': compat_str(song['musician']['id']), 'uploader_id': compat_str(song['user']['id']),
} }


@ -0,0 +1,33 @@
from __future__ import unicode_literals
from .common import InfoExtractor
class TDSLifewayIE(InfoExtractor):
_VALID_URL = r'https?://tds\.lifeway\.com/v1/trainingdeliverysystem/courses/(?P<id>\d+)/index\.html'
_TEST = {
# From http://www.ministrygrid.com/training-viewer/-/training/t4g-2014-conference/the-gospel-by-numbers-4/the-gospel-by-numbers
'url': 'http://tds.lifeway.com/v1/trainingdeliverysystem/courses/3453494717001/index.html?externalRegistration=AssetId%7C34F466F1-78F3-4619-B2AB-A8EFFA55E9E9%21InstanceId%7C0%21UserId%7Caaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa&grouping=http%3A%2F%2Flifeway.com%2Fvideo%2F3453494717001&activity_id=http%3A%2F%2Flifeway.com%2Fvideo%2F3453494717001&content_endpoint=http%3A%2F%2Ftds.lifeway.com%2Fv1%2Ftrainingdeliverysystem%2FScormEngineInterface%2FTCAPI%2Fcontent%2F&actor=%7B%22name%22%3A%5B%22Guest%20Guest%22%5D%2C%22account%22%3A%5B%7B%22accountServiceHomePage%22%3A%22http%3A%2F%2Fscorm.lifeway.com%2F%22%2C%22accountName%22%3A%22aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa%22%7D%5D%2C%22objectType%22%3A%22Agent%22%7D&content_token=462a50b2-b6f9-4970-99b1-930882c499fb&registration=93d6ec8e-7f7b-4ed3-bbc8-a857913c0b2a&externalConfiguration=access%7CFREE%21adLength%7C-1%21assignOrgId%7C4AE36F78-299A-425D-91EF-E14A899B725F%21assignOrgParentId%7C%21courseId%7C%21isAnonymous%7Cfalse%21previewAsset%7Cfalse%21previewLength%7C-1%21previewMode%7Cfalse%21royalty%7CFREE%21sessionId%7C671422F9-8E79-48D4-9C2C-4EE6111EA1CD%21trackId%7C&auth=Basic%20OjhmZjk5MDBmLTBlYTMtNDJhYS04YjFlLWE4MWQ3NGNkOGRjYw%3D%3D&endpoint=http%3A%2F%2Ftds.lifeway.com%2Fv1%2Ftrainingdeliverysystem%2FScormEngineInterface%2FTCAPI%2F',
'info_dict': {
'id': '3453494717001',
'ext': 'mp4',
'title': 'The Gospel by Numbers',
'thumbnail': 're:^https?://.*\.jpg',
'upload_date': '20140410',
'description': 'Coming soon from T4G 2014!',
'uploader_id': '2034960640001',
'timestamp': 1397145591,
},
'params': {
# m3u8 download
'skip_download': True,
},
'add_ie': ['BrightcoveNew'],
}
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/2034960640001/default_default/index.html?videoId=%s'
def _real_extract(self, url):
brightcove_id = self._match_id(url)
return self.url_result(self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, 'BrightcoveNew', brightcove_id)


@ -1,63 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class TheOnionIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?theonion\.com/video/[^,]+,(?P<id>[0-9]+)/?'
_TEST = {
'url': 'http://www.theonion.com/video/man-wearing-mm-jacket-gods-image,36918/',
'md5': '19eaa9a39cf9b9804d982e654dc791ee',
'info_dict': {
'id': '2133',
'ext': 'mp4',
'title': 'Man Wearing M&M Jacket Apparently Made In God\'s Image',
'description': 'md5:cc12448686b5600baae9261d3e180910',
'thumbnail': 're:^https?://.*\.jpg\?\d+$',
}
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
r'"videoId":\s(\d+),', webpage, 'video ID')
title = self._og_search_title(webpage)
description = self._og_search_description(webpage)
thumbnail = self._og_search_thumbnail(webpage)
sources = re.findall(r'<source src="([^"]+)" type="([^"]+)"', webpage)
formats = []
for src, type_ in sources:
if type_ == 'video/mp4':
formats.append({
'format_id': 'mp4_sd',
'preference': 1,
'url': src,
})
elif type_ == 'video/webm':
formats.append({
'format_id': 'webm_sd',
'preference': 0,
'url': src,
})
elif type_ == 'application/x-mpegURL':
formats.extend(
self._extract_m3u8_formats(src, display_id, preference=-1))
else:
self.report_warning(
'Encountered unexpected format: %s' % type_)
self._sort_formats(formats)
return {
'id': video_id,
'display_id': display_id,
'title': title,
'formats': formats,
'thumbnail': thumbnail,
'description': description,
}


@ -50,8 +50,6 @@ class ThePlatformBaseIE(OnceIE):
else: else:
formats.append(_format) formats.append(_format)
self._sort_formats(formats)
subtitles = self._parse_smil_subtitles(meta, default_ns) subtitles = self._parse_smil_subtitles(meta, default_ns)
return formats, subtitles return formats, subtitles
@ -241,6 +239,7 @@ class ThePlatformIE(ThePlatformBaseIE):
smil_url = self._sign_url(smil_url, sig['key'], sig['secret']) smil_url = self._sign_url(smil_url, sig['key'], sig['secret'])
formats, subtitles = self._extract_theplatform_smil(smil_url, video_id) formats, subtitles = self._extract_theplatform_smil(smil_url, video_id)
self._sort_formats(formats)
ret = self.get_metadata(path, video_id) ret = self.get_metadata(path, video_id)
combined_subtitles = self._merge_subtitles(ret.get('subtitles', {}), subtitles) combined_subtitles = self._merge_subtitles(ret.get('subtitles', {}), subtitles)
@ -270,6 +269,7 @@ class ThePlatformFeedIE(ThePlatformBaseIE):
'timestamp': 1391824260, 'timestamp': 1391824260,
'duration': 467.0, 'duration': 467.0,
'categories': ['MSNBC/Issues/Democrats', 'MSNBC/Issues/Elections/Election 2016'], 'categories': ['MSNBC/Issues/Democrats', 'MSNBC/Issues/Elections/Election 2016'],
'uploader': 'NBCU-NEWS',
}, },
} }


@ -1,7 +1,6 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import codecs
import re import re
from .common import InfoExtractor from .common import InfoExtractor
@ -10,22 +9,24 @@ from ..utils import (
int_or_none, int_or_none,
sanitized_Request, sanitized_Request,
urlencode_postdata, urlencode_postdata,
parse_iso8601,
) )
class TubiTvIE(InfoExtractor): class TubiTvIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?tubitv\.com/video\?id=(?P<id>[0-9]+)' _VALID_URL = r'https?://(?:www\.)?tubitv\.com/video/(?P<id>[0-9]+)'
_LOGIN_URL = 'http://tubitv.com/login' _LOGIN_URL = 'http://tubitv.com/login'
_NETRC_MACHINE = 'tubitv' _NETRC_MACHINE = 'tubitv'
_TEST = { _TEST = {
'url': 'http://tubitv.com/video?id=54411&title=The_Kitchen_Musical_-_EP01', 'url': 'http://tubitv.com/video/283829/the_comedian_at_the_friday',
'info_dict': { 'info_dict': {
'id': '54411', 'id': '283829',
'ext': 'mp4', 'ext': 'mp4',
'title': 'The Kitchen Musical - EP01', 'title': 'The Comedian at The Friday',
'thumbnail': 're:^https?://.*\.png$', 'description': 'A stand up comedian is forced to look at the decisions in his life while on a one week trip to the west coast.',
'description': 'md5:37532716166069b353e8866e71fefae7', 'uploader': 'Indie Rights Films',
'duration': 2407, 'upload_date': '20160111',
'timestamp': 1452555979,
}, },
'params': { 'params': {
'skip_download': 'HLS download', 'skip_download': 'HLS download',
@ -55,27 +56,31 @@ class TubiTvIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
video_data = self._download_json(
'http://tubitv.com/oz/videos/%s/content' % video_id, video_id)
title = video_data['n']
webpage = self._download_webpage(url, video_id) formats = self._extract_m3u8_formats(
if re.search(r"<(?:DIV|div) class='login-required-screen'>", webpage): video_data['mh'], video_id, 'mp4', 'm3u8_native')
self.raise_login_required('This video requires login')
title = self._og_search_title(webpage)
description = self._og_search_description(webpage)
thumbnail = self._og_search_thumbnail(webpage)
duration = int_or_none(self._html_search_meta(
'video:duration', webpage, 'duration'))
apu = self._search_regex(r"apu='([^']+)'", webpage, 'apu')
m3u8_url = codecs.decode(apu, 'rot_13')[::-1]
formats = self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4')
self._sort_formats(formats) self._sort_formats(formats)
subtitles = {}
for sub in video_data.get('sb', []):
sub_url = sub.get('u')
if not sub_url:
continue
subtitles.setdefault(sub.get('l', 'en'), []).append({
'url': sub_url,
})
return { return {
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'formats': formats, 'formats': formats,
'thumbnail': thumbnail, 'subtitles': subtitles,
'description': description, 'thumbnail': video_data.get('ph'),
'duration': duration, 'description': video_data.get('d'),
'duration': int_or_none(video_data.get('s')),
'timestamp': parse_iso8601(video_data.get('u')),
'uploader': video_data.get('on'),
} }
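For reference, the scraping path removed in this hunk recovered the HLS URL by rot13-decoding the page's `apu` value and reversing it; the new `/oz/videos/.../content` endpoint returns the URL directly. The secret below is fabricated just to round-trip the old transform:

```python
import codecs

# The old TubiTv code ran: codecs.decode(apu, 'rot_13')[::-1]
def decode_apu(apu):
    return codecs.decode(apu, 'rot_13')[::-1]

# Build an obfuscated value the same way the site presumably did (reverse,
# then rot13), then undo it.
secret = codecs.encode('http://example.com/master.m3u8'[::-1], 'rot_13')
print(decode_apu(secret))  # http://example.com/master.m3u8
```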


@ -65,6 +65,9 @@ class TudouIE(InfoExtractor):
if quality: if quality:
info_url += '&hd' + quality info_url += '&hd' + quality
xml_data = self._download_xml(info_url, video_id, 'Opening the info XML page') xml_data = self._download_xml(info_url, video_id, 'Opening the info XML page')
error = xml_data.attrib.get('error')
if error is not None:
raise ExtractorError('Tudou said: %s' % error, expected=True)
final_url = xml_data.text final_url = xml_data.text
return final_url return final_url


@@ -58,7 +58,9 @@ class TvigleIE(InfoExtractor):
         if not video_id:
             webpage = self._download_webpage(url, display_id)
             video_id = self._html_search_regex(
-                r'class="video-preview current_playing" id="(\d+)">',
+                (r'<div[^>]+class=["\']player["\'][^>]+id=["\'](\d+)',
+                 r'var\s+cloudId\s*=\s*["\'](\d+)',
+                 r'class="video-preview current_playing" id="(\d+)"'),
                 webpage, 'video id')
         video_data = self._download_json(
@@ -81,10 +83,10 @@ class TvigleIE(InfoExtractor):
         formats = []
         for vcodec, fmts in item['videos'].items():
-            if vcodec == 'hls':
-                continue
             for format_id, video_url in fmts.items():
                 if format_id == 'm3u8':
+                    formats.extend(self._extract_m3u8_formats(
+                        video_url, video_id, 'mp4', m3u8_id=vcodec))
                     continue
                 height = self._search_regex(
                     r'^(\d+)[pP]$', format_id, 'height', default=None)


@@ -260,6 +260,17 @@ class TwitterIE(InfoExtractor):
             'upload_date': '20140615',
         },
         'add_ie': ['Vine'],
+    }, {
+        'url': 'https://twitter.com/captainamerica/status/719944021058060289',
+        # md5 constantly changes
+        'info_dict': {
+            'id': '719944021058060289',
+            'ext': 'mp4',
+            'title': 'Captain America - @King0fNerd Are you sure you made the right choice? Find out in theaters.',
+            'description': 'Captain America on Twitter: "@King0fNerd Are you sure you made the right choice? Find out in theaters. https://t.co/GpgYi9xMJI"',
+            'uploader_id': 'captainamerica',
+            'uploader': 'Captain America',
+        },
     }]

     def _real_extract(self, url):
@@ -284,17 +295,6 @@ class TwitterIE(InfoExtractor):
                 'title': username + ' - ' + title,
             }

-        card_id = self._search_regex(
-            r'["\']/i/cards/tfw/v1/(\d+)', webpage, 'twitter card url', default=None)
-        if card_id:
-            card_url = 'https://twitter.com/i/cards/tfw/v1/' + card_id
-            info.update({
-                '_type': 'url_transparent',
-                'ie_key': 'TwitterCard',
-                'url': card_url,
-            })
-            return info
-
         mobj = re.search(r'''(?x)
             <video[^>]+class="animated-gif"(?P<more_info>[^>]+)>\s*
                 <source[^>]+video-src="(?P<url>[^"]+)"


@@ -1,57 +0,0 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
qualities,
)
class UbuIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?ubu\.com/film/(?P<id>[\da-z_-]+)\.html'
_TEST = {
'url': 'http://ubu.com/film/her_noise.html',
'md5': '138d5652618bf0f03878978db9bef1ee',
'info_dict': {
'id': 'her_noise',
'ext': 'm4v',
'title': 'Her Noise - The Making Of (2007)',
'duration': 3600,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = self._html_search_regex(
r'<title>.+?Film &amp; Video: ([^<]+)</title>', webpage, 'title')
duration = int_or_none(self._html_search_regex(
r'Duration: (\d+) minutes', webpage, 'duration', fatal=False),
invscale=60)
formats = []
FORMAT_REGEXES = [
('sq', r"'flashvars'\s*,\s*'file=([^']+)'"),
('hq', r'href="(http://ubumexico\.centro\.org\.mx/video/[^"]+)"'),
]
preference = qualities([fid for fid, _ in FORMAT_REGEXES])
for format_id, format_regex in FORMAT_REGEXES:
m = re.search(format_regex, webpage)
if m:
formats.append({
'url': m.group(1),
'format_id': format_id,
'preference': preference(format_id),
})
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'duration': duration,
'formats': formats,
}


@@ -41,6 +41,12 @@ class UstreamIE(InfoExtractor):
             'uploader': 'sportscanadatv',
         },
         'skip': 'This Pro Broadcaster has chosen to remove this video from the ustream.tv site.',
+    }, {
+        'url': 'http://www.ustream.tv/embed/10299409',
+        'info_dict': {
+            'id': '10299409',
+        },
+        'playlist_count': 3,
     }]

     def _real_extract(self, url):
@@ -55,10 +61,12 @@ class UstreamIE(InfoExtractor):
         if m.group('type') == 'embed':
             video_id = m.group('id')
             webpage = self._download_webpage(url, video_id)
-            desktop_video_id = self._html_search_regex(
-                r'ContentVideoIds=\["([^"]*?)"\]', webpage, 'desktop_video_id')
-            desktop_url = 'http://www.ustream.tv/recorded/' + desktop_video_id
-            return self.url_result(desktop_url, 'Ustream')
+            content_video_ids = self._parse_json(self._search_regex(
+                r'ustream\.vars\.offAirContentVideoIds=([^;]+);', webpage,
+                'content video IDs'), video_id)
+            return self.playlist_result(
+                map(lambda u: self.url_result('http://www.ustream.tv/recorded/' + u, 'Ustream'), content_video_ids),
+                video_id)

         params = self._download_json(
             'https://api.ustream.tv/videos/%s.json' % video_id, video_id)


@@ -2,11 +2,19 @@
 from __future__ import unicode_literals

 from .common import InfoExtractor
+from ..compat import (
+    compat_urllib_parse_urlparse,
+    compat_parse_qs,
+)
+from ..utils import (
+    clean_html,
+    remove_start,
+)


 class Varzesh3IE(InfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?video\.varzesh3\.com/(?:[^/]+/)+(?P<id>[^/]+)/?'
-    _TEST = {
+    _TESTS = [{
         'url': 'http://video.varzesh3.com/germany/bundesliga/5-%D9%88%D8%A7%DA%A9%D9%86%D8%B4-%D8%A8%D8%B1%D8%AA%D8%B1-%D8%AF%D8%B1%D9%88%D8%A7%D8%B2%D9%87%E2%80%8C%D8%A8%D8%A7%D9%86%D8%A7%D9%86%D8%9B%D9%87%D9%81%D8%AA%D9%87-26-%D8%A8%D9%88%D9%86%D8%AF%D8%B3/',
         'md5': '2a933874cb7dce4366075281eb49e855',
         'info_dict': {
@@ -15,8 +23,19 @@ class Varzesh3IE(InfoExtractor):
             'title': '۵ واکنش برتر دروازه‌بانان؛هفته ۲۶ بوندسلیگا',
             'description': 'فصل ۲۰۱۵-۲۰۱۴',
             'thumbnail': 're:^https?://.*\.jpg$',
-        }
-    }
+        },
+        'skip': 'HTTP 404 Error',
+    }, {
+        'url': 'http://video.varzesh3.com/video/112785/%D8%AF%D9%84%D9%87-%D8%B9%D9%84%DB%8C%D8%9B-%D8%B3%D8%AA%D8%A7%D8%B1%D9%87-%D9%86%D9%88%D8%B8%D9%87%D9%88%D8%B1-%D9%84%DB%8C%DA%AF-%D8%A8%D8%B1%D8%AA%D8%B1-%D8%AC%D8%B2%DB%8C%D8%B1%D9%87',
+        'md5': '841b7cd3afbc76e61708d94e53a4a4e7',
+        'info_dict': {
+            'id': '112785',
+            'ext': 'mp4',
+            'title': 'دله علی؛ ستاره نوظهور لیگ برتر جزیره',
+            'description': 'فوتبال 120',
+        },
+        'expected_warnings': ['description'],
+    }]

     def _real_extract(self, url):
         display_id = self._match_id(url)
@@ -26,15 +45,30 @@ class Varzesh3IE(InfoExtractor):
         video_url = self._search_regex(
             r'<source[^>]+src="([^"]+)"', webpage, 'video url')

-        title = self._og_search_title(webpage)
+        title = remove_start(self._html_search_regex(
+            r'<title>([^<]+)</title>', webpage, 'title'), 'ویدیو ورزش 3 | ')
         description = self._html_search_regex(
             r'(?s)<div class="matn">(.+?)</div>',
-            webpage, 'description', fatal=False)
-        thumbnail = self._og_search_thumbnail(webpage)
+            webpage, 'description', default=None)
+        if description is None:
+            description = clean_html(self._html_search_meta('description', webpage))
+
+        thumbnail = self._og_search_thumbnail(webpage, default=None)
+        if thumbnail is None:
+            fb_sharer_url = self._search_regex(
+                r'<a[^>]+href="(https?://www\.facebook\.com/sharer/sharer\.php?[^"]+)"',
+                webpage, 'facebook sharer URL', fatal=False)
+            sharer_params = compat_parse_qs(compat_urllib_parse_urlparse(fb_sharer_url).query)
+            thumbnail = sharer_params.get('p[images][0]', [None])[0]

         video_id = self._search_regex(
             r"<link[^>]+rel='(?:canonical|shortlink)'[^>]+href='/\?p=([^']+)'",
-            webpage, display_id, default=display_id)
+            webpage, display_id, default=None)
+        if video_id is None:
+            video_id = self._search_regex(
+                'var\s+VideoId\s*=\s*(\d+);', webpage, 'video id',
+                default=display_id)

         return {
             'url': video_url,
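The thumbnail fallback above pulls the image out of the page's Facebook sharer link. A standalone sketch of that query-string parsing, with a made-up sharer URL:

```python
try:
    from urllib.parse import urlparse, parse_qs  # Python 3
except ImportError:
    from urlparse import urlparse, parse_qs  # Python 2

# Hypothetical sharer link of the shape the fallback parses; parse_qs
# percent-decodes both keys and values, so the bracketed key survives.
sharer = ('https://www.facebook.com/sharer/sharer.php?'
          'u=http%3A%2F%2Fexample.com%2Fclip&'
          'p%5Bimages%5D%5B0%5D=http%3A%2F%2Fexample.com%2Fthumb.jpg')

params = parse_qs(urlparse(sharer).query)
thumbnail = params.get('p[images][0]', [None])[0]
assert thumbnail == 'http://example.com/thumb.jpg'
```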


@@ -3,7 +3,6 @@ from __future__ import unicode_literals
 import re

 from .common import InfoExtractor
-from .ooyala import OoyalaIE
 from ..utils import ExtractorError
@@ -14,13 +13,21 @@ class ViceIE(InfoExtractor):
         'url': 'http://www.vice.com/video/cowboy-capitalists-part-1',
         'info_dict': {
             'id': '43cW1mYzpia9IlestBjVpd23Yu3afAfp',
-            'ext': 'mp4',
+            'ext': 'flv',
             'title': 'VICE_COWBOYCAPITALISTS_PART01_v1_VICE_WM_1080p.mov',
             'duration': 725.983,
         },
-        'params': {
-            # Requires ffmpeg (m3u8 manifest)
-            'skip_download': True,
+    }, {
+        'url': 'http://www.vice.com/video/how-to-hack-a-car',
+        'md5': '6fb2989a3fed069fb8eab3401fc2d3c9',
+        'info_dict': {
+            'id': '3jstaBeXgAs',
+            'ext': 'mp4',
+            'title': 'How to Hack a Car: Phreaked Out (Episode 2)',
+            'description': 'md5:ee95453f7ff495db8efe14ae8bf56f30',
+            'uploader_id': 'MotherboardTV',
+            'uploader': 'Motherboard',
+            'upload_date': '20140529',
         },
     }, {
         'url': 'https://news.vice.com/video/experimenting-on-animals-inside-the-monkey-lab',
@@ -39,11 +46,14 @@ class ViceIE(InfoExtractor):
         try:
             embed_code = self._search_regex(
                 r'embedCode=([^&\'"]+)', webpage,
-                'ooyala embed code')
-            ooyala_url = OoyalaIE._url_for_embed_code(embed_code)
+                'ooyala embed code', default=None)
+            if embed_code:
+                return self.url_result('ooyala:%s' % embed_code, 'Ooyala')
+            youtube_id = self._search_regex(
+                r'data-youtube-id="([^"]+)"', webpage, 'youtube id')
+            return self.url_result(youtube_id, 'Youtube')
         except ExtractorError:
             raise ExtractorError('The page doesn\'t contain a video', expected=True)
-        return self.url_result(ooyala_url, ie='Ooyala')


 class ViceShowIE(InfoExtractor):
class ViceShowIE(InfoExtractor): class ViceShowIE(InfoExtractor):


@@ -1,6 +1,8 @@
 # coding: utf-8
 from __future__ import unicode_literals

+import re
+
 from .common import InfoExtractor
 from ..compat import (
     compat_HTTPError,
@@ -14,6 +16,7 @@ from ..utils import (
     parse_iso8601,
     sanitized_Request,
     HEADRequest,
+    url_basename,
 )
@@ -114,6 +117,7 @@ class ViewsterIE(InfoExtractor):
             return self.playlist_result(entries, video_id, title, description)

         formats = []
+        manifest_url = None
         for media_type in ('application/f4m+xml', 'application/x-mpegURL', 'video/mp4'):
             media = self._download_json(
                 'https://public-api.viewster.com/movies/%s/video?mediaType=%s'
@@ -126,29 +130,42 @@ class ViewsterIE(InfoExtractor):
                 continue
             ext = determine_ext(video_url)
             if ext == 'f4m':
+                manifest_url = video_url
                 video_url += '&' if '?' in video_url else '?'
                 video_url += 'hdcore=3.2.0&plugin=flowplayer-3.2.0.1'
                 formats.extend(self._extract_f4m_formats(
                     video_url, video_id, f4m_id='hds'))
             elif ext == 'm3u8':
+                manifest_url = video_url
                 m3u8_formats = self._extract_m3u8_formats(
                     video_url, video_id, 'mp4', m3u8_id='hls',
                     fatal=False)  # m3u8 sometimes fail
                 if m3u8_formats:
                     formats.extend(m3u8_formats)
             else:
-                format_id = media.get('Bitrate')
-                f = {
-                    'url': video_url,
-                    'format_id': 'mp4-%s' % format_id,
-                    'height': int_or_none(media.get('Height')),
-                    'width': int_or_none(media.get('Width')),
-                    'preference': 1,
-                }
-                if format_id and not f['height']:
-                    f['height'] = int_or_none(self._search_regex(
-                        r'^(\d+)[pP]$', format_id, 'height', default=None))
-                formats.append(f)
+                qualities_basename = self._search_regex(
+                    '/([^/]+)\.csmil/',
+                    manifest_url, 'qualities basename', default=None)
+                if not qualities_basename:
+                    continue
+                QUALITIES_RE = r'((,\d+k)+,?)'
+                qualities = self._search_regex(
+                    QUALITIES_RE, qualities_basename,
+                    'qualities', default=None)
+                if not qualities:
+                    continue
+                qualities = qualities.strip(',').split(',')
+                http_template = re.sub(QUALITIES_RE, r'%s', qualities_basename)
+                http_url_basename = url_basename(video_url)
+                for q in qualities:
+                    tbr = int_or_none(self._search_regex(
+                        r'(\d+)k', q, 'bitrate', default=None))
+                    formats.append({
+                        'url': video_url.replace(http_url_basename, http_template % q),
+                        'ext': 'mp4',
+                        'format_id': 'http' + ('-%d' % tbr if tbr else ''),
+                        'tbr': tbr,
+                    })

         if not formats and not info.get('LanguageSets') and not info.get('VODSettings'):
             self.raise_geo_restricted()
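The new HTTP-format branch derives per-bitrate progressive URLs from the bitrate list embedded in a `.csmil` manifest path. A standalone sketch of that template substitution, with made-up URLs:

```python
import re

# Made-up Akamai-style manifest path listing the available bitrates
manifest_url = 'http://cdn.example/z/clip_,600k,1200k,.csmil/manifest.f4m'

basename = re.search(r'/([^/]+)\.csmil/', manifest_url).group(1)
QUALITIES_RE = r'((,\d+k)+,?)'
# ',600k,1200k,' -> ['600k', '1200k']
qualities = re.search(QUALITIES_RE, basename).group(1).strip(',').split(',')
# Replace the whole bitrate list with %s, giving a per-quality template
http_template = re.sub(QUALITIES_RE, '%s', basename)

urls = ['http://cdn.example/mp4/%s.mp4' % (http_template % q) for q in qualities]
assert urls == ['http://cdn.example/mp4/clip_600k.mp4',
                'http://cdn.example/mp4/clip_1200k.mp4']
```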


@@ -81,7 +81,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
                             \.
                         )?
                         vimeo(?P<pro>pro)?\.com/
-                        (?!channels/[^/?#]+/?(?:$|[?#])|(?:album|ondemand)/)
+                        (?!channels/[^/?#]+/?(?:$|[?#])|[^/]+/review/|(?:album|ondemand)/)
                         (?:.*?/)?
                         (?:
                             (?:
@@ -90,6 +90,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
                             )?
                             (?:videos?/)?
                             (?P<id>[0-9]+)
+                            (?:/[\da-f]+)?
                             /?(?:[?&].*)?(?:[#].*)?$
                     '''
     IE_NAME = 'vimeo'
@@ -232,6 +233,10 @@ class VimeoIE(VimeoBaseInfoExtractor):
             'url': 'https://vimeo.com/7809605',
             'only_matching': True,
         },
+        {
+            'url': 'https://vimeo.com/160743502/abd0e13fb4',
+            'only_matching': True,
+        }
     ]

     @staticmethod
@@ -277,10 +282,10 @@ class VimeoIE(VimeoBaseInfoExtractor):
         pass_url = url + '/check-password'
         password_request = sanitized_Request(pass_url, data)
         password_request.add_header('Content-Type', 'application/x-www-form-urlencoded')
+        password_request.add_header('Referer', url)
         return self._download_json(
             password_request, video_id,
-            'Verifying the password',
-            'Wrong password')
+            'Verifying the password', 'Wrong password')

     def _real_initialize(self):
         self._login()


@@ -1,52 +0,0 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class WayOfTheMasterIE(InfoExtractor):
_VALID_URL = r'https?://www\.wayofthemaster\.com/([^/?#]*/)*(?P<id>[^/?#]+)\.s?html(?:$|[?#])'
_TEST = {
'url': 'http://www.wayofthemaster.com/hbks.shtml',
'md5': '5316b57487ada8480606a93cb3d18d24',
'info_dict': {
'id': 'hbks',
'ext': 'mp4',
'title': 'Intelligent Design vs. Evolution',
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
title = self._search_regex(
r'<img src="images/title_[^"]+".*?alt="([^"]+)"',
webpage, 'title', default=None)
if title is None:
title = self._html_search_regex(
r'<title>(.*?)</title>', webpage, 'page title')
url_base = self._search_regex(
r'<param\s+name="?movie"?\s+value=".*?/wotm_videoplayer_highlow[0-9]*\.swf\?vid=([^"]+)"',
webpage, 'URL base')
formats = [{
'format_id': 'low',
'quality': 1,
'url': url_base + '_low.mp4',
}, {
'format_id': 'high',
'quality': 2,
'url': url_base + '_high.mp4',
}]
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'formats': formats,
}


@@ -12,7 +12,7 @@ from ..utils import (
 class XboxClipsIE(InfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?xboxclips\.com/(?:video\.php\?.*vid=|[^/]+/)(?P<id>[\w-]{36})'
     _TEST = {
-        'url': 'https://xboxclips.com/video.php?uid=2533274823424419&gamertag=Iabdulelah&vid=074a69a9-5faf-46aa-b93b-9909c1720325',
+        'url': 'http://xboxclips.com/video.php?uid=2533274823424419&gamertag=Iabdulelah&vid=074a69a9-5faf-46aa-b93b-9909c1720325',
         'md5': 'fbe1ec805e920aeb8eced3c3e657df5d',
         'info_dict': {
             'id': '074a69a9-5faf-46aa-b93b-9909c1720325',


@@ -2,15 +2,15 @@
 from __future__ import unicode_literals

 import re
+import time

 from .common import InfoExtractor
 from ..compat import (
-    compat_chr,
     compat_ord,
 )
 from ..utils import (
     int_or_none,
-    parse_filesize,
+    parse_duration,
 )
@@ -22,7 +22,7 @@ class XMinusIE(InfoExtractor):
         'info_dict': {
             'id': '4542',
             'ext': 'mp3',
-            'title': 'Леонид Агутин-Песенка шофера',
+            'title': 'Леонид Агутин-Песенка шофёра',
             'duration': 156,
             'tbr': 320,
             'filesize_approx': 5900000,
@@ -36,38 +36,41 @@ class XMinusIE(InfoExtractor):
         webpage = self._download_webpage(url, video_id)

         artist = self._html_search_regex(
-            r'minus_track\.artist="(.+?)"', webpage, 'artist')
+            r'<a[^>]+href="/artist/\d+">([^<]+)</a>', webpage, 'artist')
         title = artist + '-' + self._html_search_regex(
-            r'minus_track\.title="(.+?)"', webpage, 'title')
-        duration = int_or_none(self._html_search_regex(
-            r'minus_track\.dur_sec=\'([0-9]*?)\'',
+            r'<span[^>]+class="minustrack-full-title(?:\s+[^"]+)?"[^>]*>([^<]+)', webpage, 'title')
+        duration = parse_duration(self._html_search_regex(
+            r'<span[^>]+class="player-duration(?:\s+[^"]+)?"[^>]*>([^<]+)',
             webpage, 'duration', fatal=False))
-        filesize_approx = parse_filesize(self._html_search_regex(
-            r'<div id="finfo"[^>]*>\s*↓\s*([0-9.]+\s*[a-zA-Z][bB])',
-            webpage, 'approximate filesize', fatal=False))
-        tbr = int_or_none(self._html_search_regex(
-            r'<div class="quality[^"]*"></div>\s*([0-9]+)\s*kbps',
-            webpage, 'bitrate', fatal=False))
+        mobj = re.search(
+            r'<div[^>]+class="dw-info(?:\s+[^"]+)?"[^>]*>(?P<tbr>\d+)\s*кбит/c\s+(?P<filesize>[0-9.]+)\s*мб</div>',
+            webpage)
+        tbr = filesize_approx = None
+        if mobj:
+            filesize_approx = float(mobj.group('filesize')) * 1000000
+            tbr = float(mobj.group('tbr'))
         view_count = int_or_none(self._html_search_regex(
-            r'<div class="quality.*?► ([0-9]+)',
+            r'<span><[^>]+class="icon-chart-bar".*?>(\d+)</span>',
             webpage, 'view count', fatal=False))
         description = self._html_search_regex(
-            r'(?s)<div id="song_texts">(.*?)</div><br',
+            r'(?s)<pre[^>]+id="lyrics-original"[^>]*>(.*?)</pre>',
             webpage, 'song lyrics', fatal=False)
         if description:
             description = re.sub(' *\r *', '\n', description)
-        enc_token = self._html_search_regex(
-            r'minus_track\.s?tkn="(.+?)"', webpage, 'enc_token')
-        token = ''.join(
-            c if pos == 3 else compat_chr(compat_ord(c) - 1)
-            for pos, c in enumerate(reversed(enc_token)))
-        video_url = 'http://x-minus.org/dwlf/%s/%s.mp3' % (video_id, token)
+        k = self._search_regex(
+            r'<div[^>]+id="player-bottom"[^>]+data-k="([^"]+)">', webpage,
+            'encoded data')
+        h = time.time() / 3600
+        a = sum(map(int, [compat_ord(c) for c in k])) + int(video_id) + h
+        video_url = 'http://x-minus.me/dl/minus?id=%s&tkn2=%df%d' % (video_id, a, h)

         return {
             'id': video_id,
             'title': title,
             'url': video_url,
+            # The extension is unknown until actual downloading
+            'ext': 'mp3',
             'duration': duration,
             'filesize_approx': filesize_approx,
             'tbr': tbr,
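The new download URL is built from a small checksum over the page's `data-k` attribute. A standalone sketch of the scheme as it appears in the diff (the `data-k` value here is hypothetical):

```python
import time

def build_download_url(video_id, k):
    # Sketch of the token scheme in the new code: sum the character codes
    # of the data-k attribute, add the numeric track id and the current
    # hour-since-epoch, then embed both (truncated by %d) in the URL.
    h = time.time() / 3600
    a = sum(ord(c) for c in k) + int(video_id) + h
    return 'http://x-minus.me/dl/minus?id=%s&tkn2=%df%d' % (video_id, a, h)

url = build_download_url('4542', 'abc123')  # hypothetical data-k value
assert url.startswith('http://x-minus.me/dl/minus?id=4542&tkn2=')
```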


@@ -24,7 +24,7 @@ from .nbc import NBCSportsVPlayerIE

 class YahooIE(InfoExtractor):
     IE_DESC = 'Yahoo screen and movies'
-    _VALID_URL = r'(?P<url>(?P<host>https?://(?:[a-zA-Z]{2}\.)?[\da-zA-Z_-]+\.yahoo\.com)/(?:[^/]+/)*(?P<display_id>.+)?-(?P<id>[0-9]+)(?:-[a-z]+)?\.html)'
+    _VALID_URL = r'(?P<url>(?P<host>https?://(?:[a-zA-Z]{2}\.)?[\da-zA-Z_-]+\.yahoo\.com)/(?:[^/]+/)*(?P<display_id>.+)?-(?P<id>[0-9]+)(?:-[a-z]+)?(?:\.html)?)'
     _TESTS = [
         {
             'url': 'http://screen.yahoo.com/julian-smith-travis-legg-watch-214727115.html',
@@ -38,7 +38,7 @@ class YahooIE(InfoExtractor):
         },
         {
             'url': 'http://screen.yahoo.com/wired/codefellas-s1-ep12-cougar-lies-103000935.html',
-            'md5': 'd6e6fc6e1313c608f316ddad7b82b306',
+            'md5': 'c3466d2b6d5dd6b9f41ba9ed04c24b23',
             'info_dict': {
                 'id': 'd1dedf8c-d58c-38c3-8963-e899929ae0a9',
                 'ext': 'mp4',
@@ -49,7 +49,7 @@ class YahooIE(InfoExtractor):
         },
         {
             'url': 'https://screen.yahoo.com/community/community-sizzle-reel-203225340.html?format=embed',
-            'md5': '60e8ac193d8fb71997caa8fce54c6460',
+            'md5': '75ffabdb87c16d4ffe8c036dc4d1c136',
             'info_dict': {
                 'id': '4fe78544-8d48-39d8-97cd-13f205d9fcdb',
                 'ext': 'mp4',
@@ -59,15 +59,15 @@ class YahooIE(InfoExtractor):
             }
         },
         {
-            'url': 'https://tw.screen.yahoo.com/election-2014-askmayor/敢問市長-黃秀霜批賴清德-非常高傲-033009720.html',
-            'md5': '3a09cf59349cfaddae1797acc3c087fc',
+            'url': 'https://tw.news.yahoo.com/%E6%95%A2%E5%95%8F%E5%B8%82%E9%95%B7%20%E9%BB%83%E7%A7%80%E9%9C%9C%E6%89%B9%E8%B3%B4%E6%B8%85%E5%BE%B7%20%E9%9D%9E%E5%B8%B8%E9%AB%98%E5%82%B2-034024051.html',
+            'md5': '9035d38f88b1782682a3e89f985be5bb',
             'info_dict': {
                 'id': 'cac903b3-fcf4-3c14-b632-643ab541712f',
                 'ext': 'mp4',
                 'title': '敢問市長/黃秀霜批賴清德「非常高傲」',
                 'description': '直言台南沒捷運 交通居五都之末',
                 'duration': 396,
-            }
+            },
         },
         {
             'url': 'https://uk.screen.yahoo.com/editor-picks/cute-raccoon-freed-drain-using-091756545.html',
@@ -89,17 +89,32 @@ class YahooIE(InfoExtractor):
             'title': 'Program that makes hockey more affordable not offered in Manitoba',
             'description': 'md5:c54a609f4c078d92b74ffb9bf1f496f4',
             'duration': 121,
-            }
+            },
+            'skip': 'Video gone',
         }, {
             'url': 'https://ca.finance.yahoo.com/news/hackers-sony-more-trouble-well-154609075.html',
-            'md5': '226a895aae7e21b0129e2a2006fe9690',
             'info_dict': {
-                'id': 'e624c4bc-3389-34de-9dfc-025f74943409',
-                'ext': 'mp4',
-                'title': '\'The Interview\' TV Spot: War',
-                'description': 'The Interview',
-                'duration': 30,
-            }
+                'id': '154609075',
+            },
+            'playlist': [{
+                'md5': 'f8e336c6b66f503282e5f719641d6565',
+                'info_dict': {
+                    'id': 'e624c4bc-3389-34de-9dfc-025f74943409',
+                    'ext': 'mp4',
+                    'title': '\'The Interview\' TV Spot: War',
+                    'description': 'The Interview',
+                    'duration': 30,
+                },
+            }, {
+                'md5': '958bcb90b4d6df71c56312137ee1cd5a',
+                'info_dict': {
+                    'id': '1fc8ada0-718e-3abe-a450-bf31f246d1a9',
+                    'ext': 'mp4',
+                    'title': '\'The Interview\' TV Spot: Guys',
+                    'description': 'The Interview',
+                    'duration': 30,
+                },
+            }],
         }, {
             'url': 'http://news.yahoo.com/video/china-moses-crazy-blues-104538833.html',
             'md5': '88e209b417f173d86186bef6e4d1f160',
@@ -119,10 +134,11 @@ class YahooIE(InfoExtractor):
             'title': 'Connect the Dots: Dark Side of Virgo',
             'description': 'md5:1428185051cfd1949807ad4ff6d3686a',
             'duration': 201,
-            }
+            },
+            'skip': 'Domain name in.lifestyle.yahoo.com gone',
         }, {
             'url': 'https://www.yahoo.com/movies/v/true-story-trailer-173000497.html',
-            'md5': '989396ae73d20c6f057746fb226aa215',
+            'md5': 'b17ac378b1134fa44370fb27db09a744',
             'info_dict': {
                 'id': '071c4013-ce30-3a93-a5b2-e0413cd4a9d1',
                 'ext': 'mp4',
@@ -141,6 +157,9 @@ class YahooIE(InfoExtractor):
                 'ext': 'flv',
                 'description': 'md5:df390f70a9ba7c95ff1daace988f0d8d',
                 'title': 'Tyler Kalinoski hits buzzer-beater to lift Davidson',
+                'upload_date': '20150313',
+                'uploader': 'NBCU-SPORTS',
+                'timestamp': 1426270238,
             }
         }, {
             'url': 'https://tw.news.yahoo.com/-100120367.html',
@@ -148,7 +167,7 @@ class YahooIE(InfoExtractor):
         }, {
             # Query result is embedded in webpage, but explicit request to video API fails with geo restriction
             'url': 'https://screen.yahoo.com/community/communitary-community-episode-1-ladders-154501237.html',
-            'md5': '4fbafb9c9b6f07aa8f870629f6671b35',
+            'md5': '1ddbf7c850777548438e5c4f147c7b8c',
             'info_dict': {
                 'id': '1f32853c-a271-3eef-8cb6-f6d6872cb504',
                 'ext': 'mp4',
@@ -166,6 +185,17 @@ class YahooIE(InfoExtractor):
             'description': 'While they play feuding fathers in \'Daddy\'s Home,\' star Will Ferrell & Mark Wahlberg share their true feelings on parenthood.',
         },
     },
+    {
+        # config['models']['applet_model']['data']['sapi'] has no query
+        'url': 'https://www.yahoo.com/music/livenation/event/galactic-2016',
+        'md5': 'dac0c72d502bc5facda80c9e6d5c98db',
+        'info_dict': {
+            'id': 'a6015640-e9e5-3efb-bb60-05589a183919',
+            'ext': 'mp4',
+            'description': 'Galactic',
+            'title': 'Dolla Diva (feat. Maggie Koerner)',
+        },
+    },
     ]

     def _real_extract(self, url):
@@ -174,19 +204,26 @@ class YahooIE(InfoExtractor):
         page_id = mobj.group('id')
         url = mobj.group('url')
         host = mobj.group('host')
-        webpage = self._download_webpage(url, display_id)
+        webpage, urlh = self._download_webpage_handle(url, display_id)
+        if 'err=404' in urlh.geturl():
+            raise ExtractorError('Video gone', expected=True)

         # Look for iframed media first
-        iframe_m = re.search(r'<iframe[^>]+src="(/video/.+?-\d+\.html\?format=embed.*?)"', webpage)
-        if iframe_m:
+        entries = []
+        iframe_urls = re.findall(r'<iframe[^>]+src="(/video/.+?-\d+\.html\?format=embed.*?)"', webpage)
+        for idx, iframe_url in enumerate(iframe_urls):
             iframepage = self._download_webpage(
-                host + iframe_m.group(1), display_id, 'Downloading iframe webpage')
+                host + iframe_url, display_id,
+                note='Downloading iframe webpage for video #%d' % idx)
             items_json = self._search_regex(
                 r'mediaItems: (\[.+?\])$', iframepage, 'items', flags=re.MULTILINE, default=None)
             if items_json:
                 items = json.loads(items_json)
                 video_id = items[0]['id']
-                return self._get_info(video_id, display_id, webpage)
+                entries.append(self._get_info(video_id, display_id, webpage))
+        if entries:
+            return self.playlist_result(entries, page_id)

         # Look for NBCSports iframes
         nbc_sports_url = NBCSportsVPlayerIE._extract_url(webpage)
         if nbc_sports_url:
@@ -202,7 +239,7 @@ class YahooIE(InfoExtractor):
         config = self._parse_json(config_json, display_id, fatal=False)
         if config:
             sapi = config.get('models', {}).get('applet_model', {}).get('data', {}).get('sapi')
-            if sapi:
+            if sapi and 'query' in sapi:
                 return self._extract_info(display_id, sapi, webpage)
         items_json = self._search_regex(


@@ -64,6 +64,14 @@ class YoukuIE(InfoExtractor):
         'params': {
             'videopassword': '100600',
         },
+    }, {
+        # /play/get.json contains streams with "channel_type":"tail"
+        'url': 'http://v.youku.com/v_show/id_XOTUxMzg4NDMy.html',
+        'info_dict': {
+            'id': 'XOTUxMzg4NDMy',
+            'title': '我的世界☆明月庄主☆车震猎杀☆杀人艺术Minecraft',
+        },
+        'playlist_count': 6,
     }]

     def construct_video_urls(self, data):
@@ -92,6 +100,8 @@ class YoukuIE(InfoExtractor):
         fileid_dict = {}
         for stream in data['stream']:
+            if stream.get('channel_type') == 'tail':
+                continue
             format = stream.get('stream_type')
             fileid = stream['stream_fileid']
             fileid_dict[format] = fileid
@@ -117,6 +127,8 @@ class YoukuIE(InfoExtractor):
         # generate video_urls
         video_urls_dict = {}
         for stream in data['stream']:
+            if stream.get('channel_type') == 'tail':
+                continue
             format = stream.get('stream_type')
             video_urls = []
             for dt in stream['segs']:
@@ -253,6 +265,8 @@ class YoukuIE(InfoExtractor):
             # which one has all
         } for i in range(max(len(v.get('segs')) for v in data['stream']))]
         for stream in data['stream']:
+            if stream.get('channel_type') == 'tail':
+                continue
             fm = stream.get('stream_type')
             video_urls = video_urls_dict[fm]
             for video_url, seg, entry in zip(video_urls, stream['segs'], entries):
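The guard repeated in each loop above skips `"channel_type":"tail"` streams. A minimal standalone sketch of that filter (the stream dicts below are made up, their shape inferred from the test comment):

```python
# Streams as they might appear in /play/get.json; "tail" entries are
# trailing clips rather than real formats, so they are filtered out.
streams = [
    {'stream_type': 'mp4hd', 'stream_fileid': 'A1'},
    {'stream_type': 'flvhd', 'stream_fileid': 'B2', 'channel_type': 'tail'},
    {'stream_type': 'hd2', 'stream_fileid': 'C3'},
]

usable = [s for s in streams if s.get('channel_type') != 'tail']
assert [s['stream_type'] for s in usable] == ['mp4hd', 'hd2']
```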

View File

@@ -125,6 +125,12 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
         if login_results is False:
             return False

+        error_msg = self._html_search_regex(
+            r'<[^>]+id="errormsg_0_Passwd"[^>]*>([^<]+)<',
+            login_results, 'error message', default=None)
+        if error_msg:
+            raise ExtractorError('Unable to login: %s' % error_msg, expected=True)
+
         if re.search(r'id="errormsg_0_Passwd"', login_results) is not None:
             raise ExtractorError('Please use your account password and a two-factor code instead of an application-specific password.', expected=True)
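The added lookup can be exercised in isolation with plain `re`; the HTML snippet below is a fabricated login response, and the pattern is copied from the hunk (note that `_html_search_regex` additionally strips HTML from the capture, which plain `re` does not):

```python
import re

# Fabricated Google login response; only the error element matters here.
login_results = ('<div><span id="errormsg_0_Passwd" class="error-msg">'
                 'Wrong password. Try again.</span></div>')

m = re.search(r'<[^>]+id="errormsg_0_Passwd"[^>]*>([^<]+)<', login_results)
error_msg = m.group(1) if m else None
print(error_msg)  # Wrong password. Try again.
```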
@@ -1818,20 +1824,32 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
     def _extract_mix(self, playlist_id):
         # The mixes are generated from a single video
         # the id of the playlist is just 'RD' + video_id
-        url = 'https://youtube.com/watch?v=%s&list=%s' % (playlist_id[-11:], playlist_id)
-        webpage = self._download_webpage(
-            url, playlist_id, 'Downloading Youtube mix')
+        ids = []
+        last_id = playlist_id[-11:]
+        for n in itertools.count(1):
+            url = 'https://youtube.com/watch?v=%s&list=%s' % (last_id, playlist_id)
+            webpage = self._download_webpage(
+                url, playlist_id, 'Downloading page {0} of Youtube mix'.format(n))
+            new_ids = orderedSet(re.findall(
+                r'''(?xs)data-video-username=".*?".*?
+                        href="/watch\?v=([0-9A-Za-z_-]{11})&amp;[^"]*?list=%s''' % re.escape(playlist_id),
+                webpage))
+            # Fetch new pages until all the videos are repeated, it seems that
+            # there are always 51 unique videos.
+            new_ids = [_id for _id in new_ids if _id not in ids]
+            if not new_ids:
+                break
+            ids.extend(new_ids)
+            last_id = ids[-1]
+
+        url_results = self._ids_to_results(ids)
+
         search_title = lambda class_name: get_element_by_attribute('class', class_name, webpage)
         title_span = (
             search_title('playlist-title') or
             search_title('title long-title') or
             search_title('title'))
         title = clean_html(title_span)
-
-        ids = orderedSet(re.findall(
-            r'''(?xs)data-video-username=".*?".*?
-                    href="/watch\?v=([0-9A-Za-z_-]{11})&amp;[^"]*?list=%s''' % re.escape(playlist_id),
-            webpage))
-        url_results = self._ids_to_results(ids)
+
         return self.playlist_result(url_results, playlist_id, title)
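The new loop's termination logic ("stop once a page yields nothing new") is easy to get wrong, so here is a minimal, self-contained sketch of the same strategy. `fake_fetch` is a made-up stand-in for `_download_webpage` plus `orderedSet(re.findall(...))`:

```python
import itertools

def fake_fetch(last_id):
    # Hypothetical mix pages: each page lists the last video plus its
    # neighbourhood; from 'e' onward only already-seen ids come back.
    pages = {
        'a': ['a', 'b', 'c'],
        'c': ['c', 'd', 'e'],
        'e': ['e', 'a', 'b'],
    }
    return pages[last_id]

def collect_mix_ids(first_id):
    ids = []
    last_id = first_id
    for _ in itertools.count(1):
        # Keep only ids we have not seen; an empty batch ends the crawl.
        new_ids = [i for i in fake_fetch(last_id) if i not in ids]
        if not new_ids:
            break
        ids.extend(new_ids)
        last_id = ids[-1]
    return ids

print(collect_mix_ids('a'))  # ['a', 'b', 'c', 'd', 'e']
```

Because each request is keyed on `ids[-1]`, the crawl always terminates: either a page contributes at least one new id, or the loop breaks.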
@@ -1884,7 +1902,7 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
         if video:
             return video

-        if playlist_id.startswith('RD') or playlist_id.startswith('UL'):
+        if playlist_id.startswith(('RD', 'UL', 'PU')):
             # Mixes require a custom extraction process
             return self._extract_mix(playlist_id)
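The one-line change works because `str.startswith` accepts a tuple of prefixes, so the `or` chain collapses into a single call while also admitting the new `PU` prefix:

```python
# Made-up playlist ids; only the prefixes matter.
playlist_ids = ['RDabc123', 'ULdef456', 'PUghi789', 'PLnormal0']

# Tuple form replaces: p.startswith('RD') or p.startswith('UL') or p.startswith('PU')
mixes = [p for p in playlist_ids if p.startswith(('RD', 'UL', 'PU'))]
print(mixes)  # ['RDabc123', 'ULdef456', 'PUghi789']
```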
@@ -1987,8 +2005,8 @@ class YoutubeUserIE(YoutubeChannelIE):
     def suitable(cls, url):
         # Don't return True if the url can be extracted with other youtube
         # extractor, the regex is too permissive and it would match.
-        other_ies = iter(klass for (name, klass) in globals().items() if name.endswith('IE') and klass is not cls)
-        if any(ie.suitable(url) for ie in other_ies):
+        other_yt_ies = iter(klass for (name, klass) in globals().items() if name.startswith('Youtube') and name.endswith('IE') and klass is not cls)
+        if any(ie.suitable(url) for ie in other_yt_ies):
             return False
         else:
             return super(YoutubeUserIE, cls).suitable(url)

View File

@@ -425,8 +425,12 @@ def parseOpts(overrideArguments=None):
         help='Set file xattribute ytdl.filesize with expected filesize (experimental)')
     downloader.add_option(
         '--hls-prefer-native',
-        dest='hls_prefer_native', action='store_true',
-        help='Use the native HLS downloader instead of ffmpeg (experimental)')
+        dest='hls_prefer_native', action='store_true', default=None,
+        help='Use the native HLS downloader instead of ffmpeg')
+    downloader.add_option(
+        '--hls-prefer-ffmpeg',
+        dest='hls_prefer_native', action='store_false', default=None,
+        help='Use ffmpeg instead of the native HLS downloader')
     downloader.add_option(
         '--hls-use-mpegts',
         dest='hls_use_mpegts', action='store_true',
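Together the two options implement a three-state switch: because both write to the same `dest` with `default=None`, `hls_prefer_native` ends up `True`, `False`, or `None` ("no preference, let youtube-dl decide"). A minimal optparse sketch of the pattern, outside youtube-dl:

```python
import optparse

parser = optparse.OptionParser()
# Two flags sharing one dest: whichever appears last on the command line wins.
parser.add_option(
    '--hls-prefer-native',
    dest='hls_prefer_native', action='store_true', default=None)
parser.add_option(
    '--hls-prefer-ffmpeg',
    dest='hls_prefer_native', action='store_false', default=None)

opts, _ = parser.parse_args(['--hls-prefer-ffmpeg'])
print(opts.hls_prefer_native)  # False

opts, _ = parser.parse_args([])
print(opts.hls_prefer_native)  # None
```

The `default=None` is the crucial part of the hunk: with the old `store_true`-only option the value could never express "unset", so the downloader had no way to distinguish "user wants ffmpeg" from "user said nothing".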

View File

@@ -175,7 +175,8 @@ class FFmpegPostProcessor(PostProcessor):
         # Always use 'file:' because the filename may contain ':' (ffmpeg
         # interprets that as a protocol) or can start with '-' (-- is broken in
         # ffmpeg, see https://ffmpeg.org/trac/ffmpeg/ticket/2127 for details)
-        return 'file:' + fn
+        # Also leave '-' intact in order not to break streaming to stdout.
+        return 'file:' + fn if fn != '-' else fn


 class FFmpegExtractAudioPP(FFmpegPostProcessor):
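Worth noting that the conditional expression binds as `('file:' + fn) if fn != '-' else fn`, i.e. the prefix applies to everything except the literal `-`. A standalone check (the function here is a simplified stand-in for the method in the hunk):

```python
def ffmpeg_filename_argument(fn):
    # 'file:' forces ffmpeg to treat fn as a plain path even if it contains
    # ':' or starts with '-'; a bare '-' must survive untouched so that
    # streaming to stdout keeps working.
    return 'file:' + fn if fn != '-' else fn

print(ffmpeg_filename_argument('a:b.mp4'))  # file:a:b.mp4
print(ffmpeg_filename_argument('-'))        # -
```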

View File

@@ -1540,44 +1540,46 @@ def parse_duration(s):
     s = s.strip()

-    m = re.match(
-        r'''(?ix)(?:P?T)?
-        (?:
-            (?P<only_mins>[0-9.]+)\s*(?:mins?\.?|minutes?)\s*|
-            (?P<only_hours>[0-9.]+)\s*(?:hours?)|
-
-            \s*(?P<hours_reversed>[0-9]+)\s*(?:[:h]|hours?)\s*(?P<mins_reversed>[0-9]+)\s*(?:[:m]|mins?\.?|minutes?)\s*|
-            (?:
-                (?:
-                    (?:(?P<days>[0-9]+)\s*(?:[:d]|days?)\s*)?
-                    (?P<hours>[0-9]+)\s*(?:[:h]|hours?)\s*
-                )?
-                (?P<mins>[0-9]+)\s*(?:[:m]|mins?|minutes?)\s*
-            )?
-            (?P<secs>[0-9]+)(?P<ms>\.[0-9]+)?\s*(?:s|secs?|seconds?)?
-        )$''', s)
-    if not m:
-        return None
-    res = 0
-    if m.group('only_mins'):
-        return float_or_none(m.group('only_mins'), invscale=60)
-    if m.group('only_hours'):
-        return float_or_none(m.group('only_hours'), invscale=60 * 60)
-    if m.group('secs'):
-        res += int(m.group('secs'))
-    if m.group('mins_reversed'):
-        res += int(m.group('mins_reversed')) * 60
-    if m.group('mins'):
-        res += int(m.group('mins')) * 60
-    if m.group('hours'):
-        res += int(m.group('hours')) * 60 * 60
-    if m.group('hours_reversed'):
-        res += int(m.group('hours_reversed')) * 60 * 60
-    if m.group('days'):
-        res += int(m.group('days')) * 24 * 60 * 60
-    if m.group('ms'):
-        res += float(m.group('ms'))
-    return res
+    days, hours, mins, secs, ms = [None] * 5
+    m = re.match(r'(?:(?:(?:(?P<days>[0-9]+):)?(?P<hours>[0-9]+):)?(?P<mins>[0-9]+):)?(?P<secs>[0-9]+)(?P<ms>\.[0-9]+)?$', s)
+    if m:
+        days, hours, mins, secs, ms = m.groups()
+    else:
+        m = re.match(
+            r'''(?ix)(?:P?T)?
+                (?:
+                    (?P<days>[0-9]+)\s*d(?:ays?)?\s*
+                )?
+                (?:
+                    (?P<hours>[0-9]+)\s*h(?:ours?)?\s*
+                )?
+                (?:
+                    (?P<mins>[0-9]+)\s*m(?:in(?:ute)?s?)?\s*
+                )?
+                (?:
+                    (?P<secs>[0-9]+)(?P<ms>\.[0-9]+)?\s*s(?:ec(?:ond)?s?)?\s*
+                )?$''', s)
+        if m:
+            days, hours, mins, secs, ms = m.groups()
+        else:
+            m = re.match(r'(?i)(?:(?P<hours>[0-9.]+)\s*(?:hours?)|(?P<mins>[0-9.]+)\s*(?:mins?\.?|minutes?)\s*)$', s)
+            if m:
+                hours, mins = m.groups()
+            else:
+                return None
+
+    duration = 0
+    if secs:
+        duration += float(secs)
+    if mins:
+        duration += float(mins) * 60
+    if hours:
+        duration += float(hours) * 60 * 60
+    if days:
+        duration += float(days) * 24 * 60 * 60
+    if ms:
+        duration += float(ms)
+    return duration


 def prepend_extension(filename, ext, expected_real_ext=None):
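The rewrite's first branch handles all `[[[DD:]HH:]MM:]SS[.ms]` input with one regex instead of the old monolithic pattern. A trimmed, self-contained sketch of just that branch (the `1d 2h 3m 4s` and `N mins`/`N hours` fallbacks from the hunk are omitted):

```python
import re

def parse_colon_duration(s):
    # Same colon regex as the patch: each optional layer peels off one
    # more unit, so '03:02' binds mins/secs and '1:02:03' hours/mins/secs.
    m = re.match(
        r'(?:(?:(?:(?P<days>[0-9]+):)?(?P<hours>[0-9]+):)?'
        r'(?P<mins>[0-9]+):)?(?P<secs>[0-9]+)(?P<ms>\.[0-9]+)?$', s)
    if not m:
        return None
    days, hours, mins, secs, ms = m.groups()
    duration = float(secs)
    if mins:
        duration += float(mins) * 60
    if hours:
        duration += float(hours) * 60 * 60
    if days:
        duration += float(days) * 24 * 60 * 60
    if ms:
        duration += float(ms)
    return duration

print(parse_colon_duration('1:02:03'))  # 3723.0
print(parse_colon_duration('03:02'))    # 182.0
print(parse_colon_duration('5.5'))      # 5.5
```

Accumulating floats into a single `duration` also removes the old version's mix of `int(...)`, `float(...)`, and early `return float_or_none(...)` exits.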
@@ -1933,6 +1935,9 @@ def error_to_compat_str(err):


 def mimetype2ext(mt):
+    if mt is None:
+        return None
+
     ext = {
         'audio/mp4': 'm4a',
     }.get(mt)
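The early return lets callers pass a missing Content-Type straight through instead of crashing on the string operations further down the function. A trimmed sketch of the guarded lookup (the real `mimetype2ext` also falls back to splitting the mime type, omitted here):

```python
def mimetype2ext(mt):
    # Guard from the patch: without it, the later string handling in the
    # real function would raise AttributeError on None.
    if mt is None:
        return None
    return {
        'audio/mp4': 'm4a',
    }.get(mt)

print(mimetype2ext(None))         # None
print(mimetype2ext('audio/mp4'))  # m4a
```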

View File

@@ -1,3 +1,3 @@
 from __future__ import unicode_literals

-__version__ = '2016.04.06'
+__version__ = '2016.04.24'