Merge branch 'master' into tvpleextractor

kjy00302 2016-05-14 17:32:32 +09:00
commit 0b692229f7
169 changed files with 5301 additions and 2643 deletions

@ -6,8 +6,8 @@
--- ---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.04.06*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected. ### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.05.10*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.04.06** - [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.05.10**
### Before submitting an *issue* make sure you have: ### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections - [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: [] [debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj'] [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2016.04.06 [debug] youtube-dl version 2016.05.10
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {} [debug] Proxy map: {}

2
.gitignore vendored
@ -31,7 +31,9 @@ updates_key.pem
*.part *.part
*.swp *.swp
test/testdata test/testdata
test/local_parameters.json
.tox .tox
youtube-dl.zsh youtube-dl.zsh
.idea .idea
.idea/* .idea/*
tmp/

@ -7,6 +7,9 @@ python:
- "3.4" - "3.4"
- "3.5" - "3.5"
sudo: false sudo: false
install:
- bash ./devscripts/install_srelay.sh
- export PATH=$PATH:$(pwd)/tmp/srelay-0.4.8b6
script: nosetests test --verbose script: nosetests test --verbose
notifications: notifications:
email: email:

@ -167,3 +167,8 @@ Kacper Michajłow
José Joaquín Atria José Joaquín Atria
Viťas Strádal Viťas Strádal
Kagami Hiiragi Kagami Hiiragi
Philip Huppert
blahgeek
Kevin Deldycke
inondle
Tomáš Čech

@ -1,7 +1,7 @@
all: youtube-dl README.md CONTRIBUTING.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish supportedsites all: youtube-dl README.md CONTRIBUTING.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish supportedsites
clean: clean:
rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish youtube_dl/extractor/lazy_extractors.py *.dump *.part *.info.json *.mp4 *.flv *.mp3 *.avi CONTRIBUTING.md.tmp ISSUE_TEMPLATE.md.tmp youtube-dl youtube-dl.exe rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish youtube_dl/extractor/lazy_extractors.py *.dump *.part *.info.json *.mp4 *.flv *.mp3 *.avi *.mkv *.webm CONTRIBUTING.md.tmp ISSUE_TEMPLATE.md.tmp youtube-dl youtube-dl.exe
find . -name "*.pyc" -delete find . -name "*.pyc" -delete
find . -name "*.class" -delete find . -name "*.class" -delete
@ -37,7 +37,7 @@ test:
ot: offlinetest ot: offlinetest
offlinetest: codetest offlinetest: codetest
$(PYTHON) -m nose --verbose test --exclude test_download.py --exclude test_age_restriction.py --exclude test_subtitles.py --exclude test_write_annotations.py --exclude test_youtube_lists.py --exclude test_iqiyi_sdk_interpreter.py $(PYTHON) -m nose --verbose test --exclude test_download.py --exclude test_age_restriction.py --exclude test_subtitles.py --exclude test_write_annotations.py --exclude test_youtube_lists.py --exclude test_iqiyi_sdk_interpreter.py --exclude test_socks.py
tar: youtube-dl.tar.gz tar: youtube-dl.tar.gz

@ -85,9 +85,11 @@ which means you can modify it, redistribute it or use it however you like.
--no-color Do not emit color codes in output --no-color Do not emit color codes in output
## Network Options: ## Network Options:
--proxy URL Use the specified HTTP/HTTPS proxy. Pass in --proxy URL Use the specified HTTP/HTTPS/SOCKS proxy.
an empty string (--proxy "") for direct To enable experimental SOCKS proxy, specify
connection a proper scheme. For example
socks5://127.0.0.1:1080/. Pass in an empty
string (--proxy "") for direct connection
--socket-timeout SECONDS Time to wait before giving up, in seconds --socket-timeout SECONDS Time to wait before giving up, in seconds
--source-address IP Client-side IP address to bind to --source-address IP Client-side IP address to bind to
(experimental) (experimental)
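
Note on the `--proxy` help text above: the scheme of the proxy URL is what selects the proxy type. The sketch below only illustrates the shape of such a URL using the standard library. `describe_proxy` is a hypothetical helper written for this note, not youtube-dl's implementation, and it does not open any connection.

```python
from urllib.parse import urlparse


def describe_proxy(proxy_url):
    """Split a --proxy value into (kind, host, port).

    Hypothetical helper for illustration only; not part of youtube-dl.
    """
    # An empty string is documented above as "direct connection".
    if proxy_url == '':
        return ('direct', None, None)
    parsed = urlparse(proxy_url)
    # The scheme (http, https, socks5, ...) names the proxy type.
    return (parsed.scheme.lower(), parsed.hostname, parsed.port)


print(describe_proxy('socks5://127.0.0.1:1080/'))
print(describe_proxy(''))
```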
@ -176,7 +178,9 @@ which means you can modify it, redistribute it or use it however you like.
--xattr-set-filesize Set file xattribute ytdl.filesize with --xattr-set-filesize Set file xattribute ytdl.filesize with
expected filesize (experimental) expected filesize (experimental)
--hls-prefer-native Use the native HLS downloader instead of --hls-prefer-native Use the native HLS downloader instead of
ffmpeg (experimental) ffmpeg
--hls-prefer-ffmpeg Use ffmpeg instead of the native HLS
downloader
--hls-use-mpegts Use the mpegts container for HLS videos, --hls-use-mpegts Use the mpegts container for HLS videos,
allowing to play the video while allowing to play the video while
downloading (some players may not be able downloading (some players may not be able
@ -463,7 +467,7 @@ The basic usage is not to set any template arguments when downloading a single f
- `display_id`: An alternative identifier for the video - `display_id`: An alternative identifier for the video
- `uploader`: Full name of the video uploader - `uploader`: Full name of the video uploader
- `license`: License name the video is licensed under - `license`: License name the video is licensed under
- `creator`: The main artist who created the video - `creator`: The creator of the video
- `release_date`: The date (YYYYMMDD) when the video was released - `release_date`: The date (YYYYMMDD) when the video was released
- `timestamp`: UNIX timestamp of the moment the video became available - `timestamp`: UNIX timestamp of the moment the video became available
- `upload_date`: Video upload date (YYYYMMDD) - `upload_date`: Video upload date (YYYYMMDD)
@ -515,6 +519,18 @@ Available for the video that is an episode of some series or programme:
- `episode_number`: Number of the video episode within a season - `episode_number`: Number of the video episode within a season
- `episode_id`: Id of the video episode - `episode_id`: Id of the video episode
Available for the media that is a track or a part of a music album:
- `track`: Title of the track
- `track_number`: Number of the track within an album or a disc
- `track_id`: Id of the track
- `artist`: Artist(s) of the track
- `genre`: Genre(s) of the track
- `album`: Title of the album the track belongs to
- `album_type`: Type of the album
- `album_artist`: List of all artists appeared on the album
- `disc_number`: Number of the disc or other physical medium the track belongs to
- `release_year`: Year (YYYY) when the album was released
Each aforementioned sequence when referenced in output template will be replaced by the actual value corresponding to the sequence name. Note that some of the sequences are not guaranteed to be present since they depend on the metadata obtained by particular extractor, such sequences will be replaced with `NA`. Each aforementioned sequence when referenced in output template will be replaced by the actual value corresponding to the sequence name. Note that some of the sequences are not guaranteed to be present since they depend on the metadata obtained by particular extractor, such sequences will be replaced with `NA`.
For example for `-o %(title)s-%(id)s.%(ext)s` and mp4 video with title `youtube-dl test video` and id `BaW_jenozKcj` this will result in a `youtube-dl test video-BaW_jenozKcj.mp4` file created in the current directory. For example for `-o %(title)s-%(id)s.%(ext)s` and mp4 video with title `youtube-dl test video` and id `BaW_jenozKcj` this will result in a `youtube-dl test video-BaW_jenozKcj.mp4` file created in the current directory.
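
Note on the output template paragraph above: the `%(name)s` sequences follow Python's mapping-based string formatting, and missing fields come out as `NA`. A minimal sketch of that behaviour, assuming a plain dict of metadata; `render_template` is a hypothetical name used only here and is not claimed to match youtube-dl's internals.

```python
import collections


def render_template(template, info):
    """Expand %(name)s sequences, using 'NA' for fields the metadata lacks."""
    # defaultdict returns 'NA' for any key that is not present in info.
    fields = collections.defaultdict(lambda: 'NA', info)
    return template % fields


info = {'title': 'youtube-dl test video', 'id': 'BaW_jenozKcj', 'ext': 'mp4'}

# All fields present.
print(render_template('%(title)s-%(id)s.%(ext)s', info))
# track_number is absent from info, so it is rendered as NA.
print(render_template('%(track_number)s-%(title)s.%(ext)s', info))
```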
@ -683,6 +699,10 @@ YouTube changed their playlist format in March 2014 and later on, so you'll need
If you have installed youtube-dl with a package manager, pip, setup.py or a tarball, please use that to update. Note that Ubuntu packages do not seem to get updated anymore. Since we are not affiliated with Ubuntu, there is little we can do. Feel free to [report bugs](https://bugs.launchpad.net/ubuntu/+source/youtube-dl/+filebug) to the [Ubuntu packaging guys](mailto:ubuntu-motu@lists.ubuntu.com?subject=outdated%20version%20of%20youtube-dl) - all they have to do is update the package to a somewhat recent version. See above for a way to update. If you have installed youtube-dl with a package manager, pip, setup.py or a tarball, please use that to update. Note that Ubuntu packages do not seem to get updated anymore. Since we are not affiliated with Ubuntu, there is little we can do. Feel free to [report bugs](https://bugs.launchpad.net/ubuntu/+source/youtube-dl/+filebug) to the [Ubuntu packaging guys](mailto:ubuntu-motu@lists.ubuntu.com?subject=outdated%20version%20of%20youtube-dl) - all they have to do is update the package to a somewhat recent version. See above for a way to update.
### I'm getting an error when trying to use output template: `error: using output template conflicts with using title, video ID or auto number`
Make sure you are not using `-o` with any of these options `-t`, `--title`, `--id`, `-A` or `--auto-number` set in command line or in a configuration file. Remove the latter if any.
### Do I always have to pass `-citw`? ### Do I always have to pass `-citw`?
By default, youtube-dl intends to have the best options (incidentally, if you have a convincing case that these should be different, [please file an issue where you explain that](https://yt-dl.org/bug)). Therefore, it is unnecessary and sometimes harmful to copy long option strings from webpages. In particular, the only option out of `-citw` that is regularly useful is `-i`. By default, youtube-dl intends to have the best options (incidentally, if you have a convincing case that these should be different, [please file an issue where you explain that](https://yt-dl.org/bug)). Therefore, it is unnecessary and sometimes harmful to copy long option strings from webpages. In particular, the only option out of `-citw` that is regularly useful is `-i`.
@ -703,7 +723,7 @@ Videos or video formats streamed via RTMP protocol can only be downloaded when [
### I have downloaded a video but how can I play it? ### I have downloaded a video but how can I play it?
Once the video is fully downloaded, use any video player, such as [vlc](http://www.videolan.org) or [mplayer](http://www.mplayerhq.hu/). Once the video is fully downloaded, use any video player, such as [mpv](https://mpv.io/), [vlc](http://www.videolan.org) or [mplayer](http://www.mplayerhq.hu/).
### I extracted a video URL with `-g`, but it does not play on another machine / in my webbrowser. ### I extracted a video URL with `-g`, but it does not play on another machine / in my webbrowser.

8
devscripts/install_srelay.sh Executable file
@ -0,0 +1,8 @@
#!/bin/bash
mkdir -p tmp && cd tmp
wget -N http://downloads.sourceforge.net/project/socks-relay/socks-relay/srelay-0.4.8/srelay-0.4.8b6.tar.gz
tar zxvf srelay-0.4.8b6.tar.gz
cd srelay-0.4.8b6
./configure
make

@ -50,6 +50,7 @@
- **arte.tv:ddc** - **arte.tv:ddc**
- **arte.tv:embed** - **arte.tv:embed**
- **arte.tv:future** - **arte.tv:future**
- **arte.tv:info**
- **arte.tv:magazine** - **arte.tv:magazine**
- **AtresPlayer** - **AtresPlayer**
- **ATTTechChannel** - **ATTTechChannel**
@ -76,6 +77,7 @@
- **Bild**: Bild.de - **Bild**: Bild.de
- **BiliBili** - **BiliBili**
- **BioBioChileTV** - **BioBioChileTV**
- **BIQLE**
- **BleacherReport** - **BleacherReport**
- **BleacherReportCMS** - **BleacherReportCMS**
- **blinkx** - **blinkx**
@ -115,6 +117,7 @@
- **Cinemassacre** - **Cinemassacre**
- **Clipfish** - **Clipfish**
- **cliphunter** - **cliphunter**
- **ClipRs**
- **Clipsyndicate** - **Clipsyndicate**
- **cloudtime**: CloudTime - **cloudtime**: CloudTime
- **Cloudy** - **Cloudy**
@ -143,6 +146,7 @@
- **culturebox.francetvinfo.fr** - **culturebox.francetvinfo.fr**
- **CultureUnplugged** - **CultureUnplugged**
- **CWTV** - **CWTV**
- **DailyMail**
- **dailymotion** - **dailymotion**
- **dailymotion:playlist** - **dailymotion:playlist**
- **dailymotion:user** - **dailymotion:user**
@ -161,6 +165,7 @@
- **defense.gouv.fr** - **defense.gouv.fr**
- **democracynow** - **democracynow**
- **DHM**: Filmarchiv - Deutsches Historisches Museum - **DHM**: Filmarchiv - Deutsches Historisches Museum
- **DigitallySpeaking**
- **Digiteka** - **Digiteka**
- **Discovery** - **Discovery**
- **Dotsub** - **Dotsub**
@ -172,7 +177,6 @@
- **Dropbox** - **Dropbox**
- **DrTuber** - **DrTuber**
- **DRTV** - **DRTV**
- **Dump**
- **Dumpert** - **Dumpert**
- **dvtv**: http://video.aktualne.cz/ - **dvtv**: http://video.aktualne.cz/
- **dw** - **dw**
@ -286,7 +290,6 @@
- **ivi:compilation**: ivi.ru compilations - **ivi:compilation**: ivi.ru compilations
- **ivideon**: Ivideon TV - **ivideon**: Ivideon TV
- **Izlesene** - **Izlesene**
- **JadoreCettePub**
- **JeuxVideo** - **JeuxVideo**
- **Jove** - **Jove**
- **jpopsuki.tv** - **jpopsuki.tv**
@ -324,6 +327,7 @@
- **limelight** - **limelight**
- **limelight:channel** - **limelight:channel**
- **limelight:channel_list** - **limelight:channel_list**
- **LiTV**
- **LiveLeak** - **LiveLeak**
- **livestream** - **livestream**
- **livestream:original** - **livestream:original**
@ -337,26 +341,28 @@
- **mailru**: Видео@Mail.Ru - **mailru**: Видео@Mail.Ru
- **MakersChannel** - **MakersChannel**
- **MakerTV** - **MakerTV**
- **Malemotion**
- **MatchTV** - **MatchTV**
- **MDR**: MDR.DE and KiKA - **MDR**: MDR.DE and KiKA
- **media.ccc.de** - **media.ccc.de**
- **metacafe** - **metacafe**
- **Metacritic** - **Metacritic**
- **Mgoon** - **Mgoon**
- **MGTV**: 芒果TV
- **Minhateca** - **Minhateca**
- **MinistryGrid** - **MinistryGrid**
- **Minoto** - **Minoto**
- **miomio.tv** - **miomio.tv**
- **MiTele**: mitele.es - **MiTele**: mitele.es
- **mixcloud** - **mixcloud**
- **mixcloud:playlist**
- **mixcloud:stream**
- **mixcloud:user**
- **MLB** - **MLB**
- **Mnet** - **Mnet**
- **MoeVideo**: LetitBit video services: moevideo.net, playreplay.net and videochart.net - **MoeVideo**: LetitBit video services: moevideo.net, playreplay.net and videochart.net
- **Mofosex** - **Mofosex**
- **Mojvideo** - **Mojvideo**
- **Moniker**: allmyvideos.net and vidspot.net - **Moniker**: allmyvideos.net and vidspot.net
- **mooshare**: Mooshare.biz
- **Morningstar**: morningstar.com - **Morningstar**: morningstar.com
- **Motherless** - **Motherless**
- **Motorsport**: motorsport.com - **Motorsport**: motorsport.com
@ -371,8 +377,10 @@
- **mtvservices:embedded** - **mtvservices:embedded**
- **MuenchenTV**: münchen.tv - **MuenchenTV**: münchen.tv
- **MusicPlayOn** - **MusicPlayOn**
- **muzu.tv** - **mva**: Microsoft Virtual Academy videos
- **mva:course**: Microsoft Virtual Academy courses
- **Mwave** - **Mwave**
- **MwaveMeetGreet**
- **MySpace** - **MySpace**
- **MySpace:album** - **MySpace:album**
- **MySpass** - **MySpass**
@ -393,7 +401,6 @@
- **ndr:embed:base** - **ndr:embed:base**
- **NDTV** - **NDTV**
- **NerdCubedFeed** - **NerdCubedFeed**
- **Nerdist**
- **netease:album**: 网易云音乐 - 专辑 - **netease:album**: 网易云音乐 - 专辑
- **netease:djradio**: 网易云音乐 - 电台 - **netease:djradio**: 网易云音乐 - 电台
- **netease:mv**: 网易云音乐 - MV - **netease:mv**: 网易云音乐 - MV
@ -411,7 +418,8 @@
- **nfl.com** - **nfl.com**
- **nhl.com** - **nhl.com**
- **nhl.com:news**: NHL news - **nhl.com:news**: NHL news
- **nhl.com:videocenter**: NHL videocenter category - **nhl.com:videocenter**
- **nhl.com:videocenter:category**: NHL videocenter category
- **nick.com** - **nick.com**
- **niconico**: ニコニコ動画 - **niconico**: ニコニコ動画
- **NiconicoPlaylist** - **NiconicoPlaylist**
@ -459,13 +467,14 @@
- **Patreon** - **Patreon**
- **pbs**: Public Broadcasting Service (PBS) and member stations: PBS: Public Broadcasting Service, APT - Alabama Public Television (WBIQ), GPB/Georgia Public Broadcasting (WGTV), Mississippi Public Broadcasting (WMPN), Nashville Public Television (WNPT), WFSU-TV (WFSU), WSRE (WSRE), WTCI (WTCI), WPBA/Channel 30 (WPBA), Alaska Public Media (KAKM), Arizona PBS (KAET), KNME-TV/Channel 5 (KNME), Vegas PBS (KLVX), AETN/ARKANSAS ETV NETWORK (KETS), KET (WKLE), WKNO/Channel 10 (WKNO), LPB/LOUISIANA PUBLIC BROADCASTING (WLPB), OETA (KETA), Ozarks Public Television (KOZK), WSIU Public Broadcasting (WSIU), KEET TV (KEET), KIXE/Channel 9 (KIXE), KPBS San Diego (KPBS), KQED (KQED), KVIE Public Television (KVIE), PBS SoCal/KOCE (KOCE), ValleyPBS (KVPT), CONNECTICUT PUBLIC TELEVISION (WEDH), KNPB Channel 5 (KNPB), SOPTV (KSYS), Rocky Mountain PBS (KRMA), KENW-TV3 (KENW), KUED Channel 7 (KUED), Wyoming PBS (KCWC), Colorado Public Television / KBDI 12 (KBDI), KBYU-TV (KBYU), Thirteen/WNET New York (WNET), WGBH/Channel 2 (WGBH), WGBY (WGBY), NJTV Public Media NJ (WNJT), WLIW21 (WLIW), mpt/Maryland Public Television (WMPB), WETA Television and Radio (WETA), WHYY (WHYY), PBS 39 (WLVT), WVPT - Your Source for PBS and More! 
(WVPT), Howard University Television (WHUT), WEDU PBS (WEDU), WGCU Public Media (WGCU), WPBT2 (WPBT), WUCF TV (WUCF), WUFT/Channel 5 (WUFT), WXEL/Channel 42 (WXEL), WLRN/Channel 17 (WLRN), WUSF Public Broadcasting (WUSF), ETV (WRLK), UNC-TV (WUNC), PBS Hawaii - Oceanic Cable Channel 10 (KHET), Idaho Public Television (KAID), KSPS (KSPS), OPB (KOPB), KWSU/Channel 10 & KTNW/Channel 31 (KWSU), WILL-TV (WILL), Network Knowledge - WSEC/Springfield (WSEC), WTTW11 (WTTW), Iowa Public Television/IPTV (KDIN), Nine Network (KETC), PBS39 Fort Wayne (WFWA), WFYI Indianapolis (WFYI), Milwaukee Public Television (WMVS), WNIN (WNIN), WNIT Public Television (WNIT), WPT (WPNE), WVUT/Channel 22 (WVUT), WEIU/Channel 51 (WEIU), WQPT-TV (WQPT), WYCC PBS Chicago (WYCC), WIPB-TV (WIPB), WTIU (WTIU), CET (WCET), ThinkTVNetwork (WPTD), WBGU-TV (WBGU), WGVU TV (WGVU), NET1 (KUON), Pioneer Public Television (KWCM), SDPB Television (KUSD), TPT (KTCA), KSMQ (KSMQ), KPTS/Channel 8 (KPTS), KTWU/Channel 11 (KTWU), East Tennessee PBS (WSJK), WCTE-TV (WCTE), WLJT, Channel 11 (WLJT), WOSU TV (WOSU), WOUB/WOUC (WOUB), WVPB (WVPB), WKYU-PBS (WKYU), KERA 13 (KERA), MPBN (WCBB), Mountain Lake PBS (WCFE), NHPTV (WENH), Vermont PBS (WETK), witf (WITF), WQED Multimedia (WQED), WMHT Educational Telecommunications (WMHT), Q-TV (WDCQ), WTVS Detroit Public TV (WTVS), CMU Public Television (WCMU), WKAR-TV (WKAR), WNMU-TV Public TV 13 (WNMU), WDSE - WRPT (WDSE), WGTE TV (WGTE), Lakeland Public Television (KAWE), KMOS-TV - Channels 6.1, 6.2 and 6.3 (KMOS), MontanaPBS (KUSM), KRWG/Channel 22 (KRWG), KACV (KACV), KCOS/Channel 13 (KCOS), WCNY/Channel 24 (WCNY), WNED (WNED), WPBS (WPBS), WSKG Public TV (WSKG), WXXI (WXXI), WPSU (WPSU), WVIA Public Media Studios (WVIA), WTVI (WTVI), Western Reserve PBS (WNEO), WVIZ/PBS ideastream (WVIZ), KCTS 9 (KCTS), Basin PBS (KPBT), KUHT / Channel 8 (KUHT), KLRN (KLRN), KLRU (KLRU), WTJX Channel 12 (WTJX), WCVE PBS (WCVE), KBTC Public Television (KBTC) - **pbs**: Public 
Broadcasting Service (PBS) and member stations: PBS: Public Broadcasting Service, APT - Alabama Public Television (WBIQ), GPB/Georgia Public Broadcasting (WGTV), Mississippi Public Broadcasting (WMPN), Nashville Public Television (WNPT), WFSU-TV (WFSU), WSRE (WSRE), WTCI (WTCI), WPBA/Channel 30 (WPBA), Alaska Public Media (KAKM), Arizona PBS (KAET), KNME-TV/Channel 5 (KNME), Vegas PBS (KLVX), AETN/ARKANSAS ETV NETWORK (KETS), KET (WKLE), WKNO/Channel 10 (WKNO), LPB/LOUISIANA PUBLIC BROADCASTING (WLPB), OETA (KETA), Ozarks Public Television (KOZK), WSIU Public Broadcasting (WSIU), KEET TV (KEET), KIXE/Channel 9 (KIXE), KPBS San Diego (KPBS), KQED (KQED), KVIE Public Television (KVIE), PBS SoCal/KOCE (KOCE), ValleyPBS (KVPT), CONNECTICUT PUBLIC TELEVISION (WEDH), KNPB Channel 5 (KNPB), SOPTV (KSYS), Rocky Mountain PBS (KRMA), KENW-TV3 (KENW), KUED Channel 7 (KUED), Wyoming PBS (KCWC), Colorado Public Television / KBDI 12 (KBDI), KBYU-TV (KBYU), Thirteen/WNET New York (WNET), WGBH/Channel 2 (WGBH), WGBY (WGBY), NJTV Public Media NJ (WNJT), WLIW21 (WLIW), mpt/Maryland Public Television (WMPB), WETA Television and Radio (WETA), WHYY (WHYY), PBS 39 (WLVT), WVPT - Your Source for PBS and More! 
(WVPT), Howard University Television (WHUT), WEDU PBS (WEDU), WGCU Public Media (WGCU), WPBT2 (WPBT), WUCF TV (WUCF), WUFT/Channel 5 (WUFT), WXEL/Channel 42 (WXEL), WLRN/Channel 17 (WLRN), WUSF Public Broadcasting (WUSF), ETV (WRLK), UNC-TV (WUNC), PBS Hawaii - Oceanic Cable Channel 10 (KHET), Idaho Public Television (KAID), KSPS (KSPS), OPB (KOPB), KWSU/Channel 10 & KTNW/Channel 31 (KWSU), WILL-TV (WILL), Network Knowledge - WSEC/Springfield (WSEC), WTTW11 (WTTW), Iowa Public Television/IPTV (KDIN), Nine Network (KETC), PBS39 Fort Wayne (WFWA), WFYI Indianapolis (WFYI), Milwaukee Public Television (WMVS), WNIN (WNIN), WNIT Public Television (WNIT), WPT (WPNE), WVUT/Channel 22 (WVUT), WEIU/Channel 51 (WEIU), WQPT-TV (WQPT), WYCC PBS Chicago (WYCC), WIPB-TV (WIPB), WTIU (WTIU), CET (WCET), ThinkTVNetwork (WPTD), WBGU-TV (WBGU), WGVU TV (WGVU), NET1 (KUON), Pioneer Public Television (KWCM), SDPB Television (KUSD), TPT (KTCA), KSMQ (KSMQ), KPTS/Channel 8 (KPTS), KTWU/Channel 11 (KTWU), East Tennessee PBS (WSJK), WCTE-TV (WCTE), WLJT, Channel 11 (WLJT), WOSU TV (WOSU), WOUB/WOUC (WOUB), WVPB (WVPB), WKYU-PBS (WKYU), KERA 13 (KERA), MPBN (WCBB), Mountain Lake PBS (WCFE), NHPTV (WENH), Vermont PBS (WETK), witf (WITF), WQED Multimedia (WQED), WMHT Educational Telecommunications (WMHT), Q-TV (WDCQ), WTVS Detroit Public TV (WTVS), CMU Public Television (WCMU), WKAR-TV (WKAR), WNMU-TV Public TV 13 (WNMU), WDSE - WRPT (WDSE), WGTE TV (WGTE), Lakeland Public Television (KAWE), KMOS-TV - Channels 6.1, 6.2 and 6.3 (KMOS), MontanaPBS (KUSM), KRWG/Channel 22 (KRWG), KACV (KACV), KCOS/Channel 13 (KCOS), WCNY/Channel 24 (WCNY), WNED (WNED), WPBS (WPBS), WSKG Public TV (WSKG), WXXI (WXXI), WPSU (WPSU), WVIA Public Media Studios (WVIA), WTVI (WTVI), Western Reserve PBS (WNEO), WVIZ/PBS ideastream (WVIZ), KCTS 9 (KCTS), Basin PBS (KPBT), KUHT / Channel 8 (KUHT), KLRN (KLRN), KLRU (KLRU), WTJX Channel 12 (WTJX), WCVE PBS (WCVE), KBTC Public Television (KBTC)
- **pcmag** - **pcmag**
- **Periscope**: Periscope - **People**
- **periscope**: Periscope
- **periscope:user**: Periscope user videos
- **PhilharmonieDeParis**: Philharmonie de Paris - **PhilharmonieDeParis**: Philharmonie de Paris
- **phoenix.de** - **phoenix.de**
- **Photobucket** - **Photobucket**
- **Pinkbike** - **Pinkbike**
- **Pladform** - **Pladform**
- **PlanetaPlay**
- **play.fm** - **play.fm**
- **played.to** - **played.to**
- **PlaysTV** - **PlaysTV**
@ -484,6 +493,7 @@
- **Pornotube** - **Pornotube**
- **PornoVoisines** - **PornoVoisines**
- **PornoXO** - **PornoXO**
- **PressTV**
- **PrimeShareTV** - **PrimeShareTV**
- **PromptFile** - **PromptFile**
- **prosiebensat1**: ProSiebenSat.1 Digital - **prosiebensat1**: ProSiebenSat.1 Digital
@ -494,7 +504,6 @@
- **qqmusic:playlist**: QQ音乐 - 歌单 - **qqmusic:playlist**: QQ音乐 - 歌单
- **qqmusic:singer**: QQ音乐 - 歌手 - **qqmusic:singer**: QQ音乐 - 歌手
- **qqmusic:toplist**: QQ音乐 - 排行榜 - **qqmusic:toplist**: QQ音乐 - 排行榜
- **QuickVid**
- **R7** - **R7**
- **radio.de** - **radio.de**
- **radiobremen** - **radiobremen**
@ -550,7 +559,6 @@
- **SenateISVP** - **SenateISVP**
- **ServingSys** - **ServingSys**
- **Sexu** - **Sexu**
- **SexyKarma**: Sexy Karma and Watch Indian Porn
- **Shahid** - **Shahid**
- **Shared**: shared.sx and vivo.sx - **Shared**: shared.sx and vivo.sx
- **ShareSix** - **ShareSix**
@ -563,8 +571,6 @@
- **smotri:broadcast**: Smotri.com broadcasts - **smotri:broadcast**: Smotri.com broadcasts
- **smotri:community**: Smotri.com community videos - **smotri:community**: Smotri.com community videos
- **smotri:user**: Smotri.com user videos - **smotri:user**: Smotri.com user videos
- **SnagFilms**
- **SnagFilmsEmbed**
- **Snotr** - **Snotr**
- **Sohu** - **Sohu**
- **soundcloud** - **soundcloud**
@ -606,8 +612,10 @@
- **Syfy** - **Syfy**
- **SztvHu** - **SztvHu**
- **Tagesschau** - **Tagesschau**
- **tagesschau:player**
- **Tapely** - **Tapely**
- **Tass** - **Tass**
- **TDSLifeway**
- **teachertube**: teachertube.com videos - **teachertube**: teachertube.com videos
- **teachertube:user:collection**: teachertube.com user and collection videos - **teachertube:user:collection**: teachertube.com user and collection videos
- **TeachingChannel** - **TeachingChannel**
@ -624,7 +632,6 @@
- **TeleTask** - **TeleTask**
- **TF1** - **TF1**
- **TheIntercept** - **TheIntercept**
- **TheOnion**
- **ThePlatform** - **ThePlatform**
- **ThePlatformFeed** - **ThePlatformFeed**
- **TheScene** - **TheScene**
@ -684,7 +691,6 @@
- **twitter** - **twitter**
- **twitter:amplify** - **twitter:amplify**
- **twitter:card** - **twitter:card**
- **Ubu**
- **udemy** - **udemy**
- **udemy:course** - **udemy:course**
- **UDNEmbed**: 聯合影音 - **UDNEmbed**: 聯合影音
@ -701,6 +707,7 @@
- **Vessel** - **Vessel**
- **Vesti**: Вести.Ru - **Vesti**: Вести.Ru
- **Vevo** - **Vevo**
- **VevoPlaylist**
- **VGTV**: VGTV, BTTV, FTV, Aftenposten and Aftonbladet - **VGTV**: VGTV, BTTV, FTV, Aftenposten and Aftonbladet
- **vh1.com** - **vh1.com**
- **Vice** - **Vice**
@ -723,6 +730,8 @@
- **Vidzi** - **Vidzi**
- **vier** - **vier**
- **vier:videos** - **vier:videos**
- **ViewLift**
- **ViewLiftEmbed**
- **Viewster** - **Viewster**
- **Viidea** - **Viidea**
- **viki** - **viki**
@ -754,7 +763,7 @@
- **Walla** - **Walla**
- **WashingtonPost** - **WashingtonPost**
- **wat.tv** - **wat.tv**
- **WayOfTheMaster** - **WatchIndianPorn**: Watch Indian Porn
- **WDR** - **WDR**
- **wdr:mobile** - **wdr:mobile**
- **WDRMaus**: Sendung mit der Maus - **WDRMaus**: Sendung mit der Maus
@ -771,9 +780,13 @@
- **WSJ**: Wall Street Journal - **WSJ**: Wall Street Journal
- **XBef** - **XBef**
- **XboxClips** - **XboxClips**
- **XFileShare**: XFileShare based sites: GorillaVid.in, daclips.in, movpod.in, fastvideo.in, realvid.net, filehoot.com and vidto.me - **XFileShare**: XFileShare based sites: DaClips, FileHoot, GorillaVid, MovPod, PowerWatch, Rapidvideo.ws, TheVideoBee, Vidto, Streamin.To
- **XHamster** - **XHamster**
- **XHamsterEmbed** - **XHamsterEmbed**
- **xiami:album**: 虾米音乐 - 专辑
- **xiami:artist**: 虾米音乐 - 歌手
- **xiami:collection**: 虾米音乐 - 精选集
- **xiami:song**: 虾米音乐
- **XMinus** - **XMinus**
- **XNXX** - **XNXX**
- **Xstream** - **Xstream**

@ -24,8 +24,13 @@ from youtube_dl.utils import (
def get_params(override=None): def get_params(override=None):
PARAMETERS_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)), PARAMETERS_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)),
"parameters.json") "parameters.json")
LOCAL_PARAMETERS_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)),
"local_parameters.json")
with io.open(PARAMETERS_FILE, encoding='utf-8') as pf: with io.open(PARAMETERS_FILE, encoding='utf-8') as pf:
parameters = json.load(pf) parameters = json.load(pf)
if os.path.exists(LOCAL_PARAMETERS_FILE):
with io.open(LOCAL_PARAMETERS_FILE, encoding='utf-8') as pf:
parameters.update(json.load(pf))
if override: if override:
parameters.update(override) parameters.update(override)
return parameters return parameters
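
Note on the `get_params` change above: parameters are now layered, with `parameters.json` as the base, an optional `local_parameters.json` on top, and the caller's `override` last. The standalone sketch below reproduces that ordering against temporary files. `load_params`, `demo` and the keys `quiet`, `retries` and `primary_proxy` are illustrative assumptions, not values taken from the repository.

```python
import io
import json
import os
import tempfile


def load_params(base_path, local_path, override=None):
    """Merge base, optional local, then override parameters (later wins)."""
    with io.open(base_path, encoding='utf-8') as f:
        params = json.load(f)
    # The local file is optional; it is skipped silently when absent.
    if os.path.exists(local_path):
        with io.open(local_path, encoding='utf-8') as f:
            params.update(json.load(f))
    if override:
        params.update(override)
    return params


def demo():
    tmp = tempfile.mkdtemp()
    base = os.path.join(tmp, 'parameters.json')
    local = os.path.join(tmp, 'local_parameters.json')
    with io.open(base, 'w', encoding='utf-8') as f:
        f.write(json.dumps({'quiet': True, 'retries': 10}))
    without_local = load_params(base, local)
    with io.open(local, 'w', encoding='utf-8') as f:
        f.write(json.dumps({'retries': 3, 'primary_proxy': '127.0.0.1:1080'}))
    with_local = load_params(base, local)
    with_override = load_params(base, local, {'quiet': False})
    return without_local, with_local, with_override


for result in demo():
    print(result)
```

Keeping machine-specific settings in a separate, git-ignored file means the checked-in defaults never need to be edited locally.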
@ -143,6 +148,9 @@ def expect_value(self, got, expected, field):
expect_value(self, item_got, item_expected, field) expect_value(self, item_got, item_expected, field)
else: else:
if isinstance(expected, compat_str) and expected.startswith('md5:'): if isinstance(expected, compat_str) and expected.startswith('md5:'):
self.assertTrue(
isinstance(got, compat_str),
'Expected field %s to be a unicode object, but got value %r of type %r' % (field, got, type(got)))
got = 'md5:' + md5(got) got = 'md5:' + md5(got)
elif isinstance(expected, compat_str) and expected.startswith('mincount:'): elif isinstance(expected, compat_str) and expected.startswith('mincount:'):
self.assertTrue( self.assertTrue(

@ -11,6 +11,7 @@ sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import FakeYDL from test.helper import FakeYDL
from youtube_dl.extractor.common import InfoExtractor from youtube_dl.extractor.common import InfoExtractor
from youtube_dl.extractor import YoutubeIE, get_info_extractor from youtube_dl.extractor import YoutubeIE, get_info_extractor
from youtube_dl.utils import encode_data_uri, strip_jsonp, ExtractorError
class TestIE(InfoExtractor): class TestIE(InfoExtractor):
@ -66,5 +67,14 @@ class TestInfoExtractor(unittest.TestCase):
self.assertEqual(ie._html_search_meta('e', html), '5') self.assertEqual(ie._html_search_meta('e', html), '5')
self.assertEqual(ie._html_search_meta('f', html), '6') self.assertEqual(ie._html_search_meta('f', html), '6')
def test_download_json(self):
uri = encode_data_uri(b'{"foo": "blah"}', 'application/json')
self.assertEqual(self.ie._download_json(uri, None), {'foo': 'blah'})
uri = encode_data_uri(b'callback({"foo": "blah"})', 'application/javascript')
self.assertEqual(self.ie._download_json(uri, None, transform_source=strip_jsonp), {'foo': 'blah'})
uri = encode_data_uri(b'{"foo": invalid}', 'application/json')
self.assertRaises(ExtractorError, self.ie._download_json, uri, None)
self.assertEqual(self.ie._download_json(uri, None, fatal=False), None)
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()
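
Note on `test_download_json` above: the test feeds JSON through `data:` URIs so that no network access is needed, and uses `strip_jsonp` to unwrap a callback. The sketch below mimics that idea with the standard library only. `make_data_uri`, `strip_jsonp_sketch` and `fetch_json` are hypothetical stand-ins, not the youtube-dl functions, and unlike the real `_download_json`, which the test expects to raise `ExtractorError`, this sketch raises a plain `ValueError`.

```python
import base64
import json
import re
from urllib.request import urlopen


def make_data_uri(data, mime_type):
    """Build a base64 data: URI from bytes (stand-in for encode_data_uri)."""
    return 'data:%s;base64,%s' % (mime_type, base64.b64encode(data).decode('ascii'))


def strip_jsonp_sketch(code):
    """Remove a single wrapping call such as callback({...}) if present."""
    return re.sub(r'^\s*[\w.$]+\s*\((.*)\)\s*;?\s*$', r'\1', code, flags=re.DOTALL)


def fetch_json(uri, transform_source=None, fatal=True):
    """Fetch uri, optionally transform the text, then parse it as JSON."""
    text = urlopen(uri).read().decode('utf-8')
    if transform_source:
        text = transform_source(text)
    try:
        return json.loads(text)
    except ValueError:
        # With fatal=False a parse failure yields None instead of raising.
        if fatal:
            raise
        return None


plain = make_data_uri(b'{"foo": "blah"}', 'application/json')
jsonp = make_data_uri(b'callback({"foo": "blah"})', 'application/javascript')
broken = make_data_uri(b'{"foo": invalid}', 'application/json')

print(fetch_json(plain))
print(fetch_json(jsonp, transform_source=strip_jsonp_sketch))
print(fetch_json(broken, fatal=False))
```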

@ -10,13 +10,14 @@ import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__)))) sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from youtube_dl.utils import get_filesystem_encoding
from youtube_dl.compat import ( from youtube_dl.compat import (
compat_getenv, compat_getenv,
compat_setenv,
compat_etree_fromstring, compat_etree_fromstring,
compat_expanduser, compat_expanduser,
compat_shlex_split, compat_shlex_split,
compat_str, compat_str,
compat_struct_unpack,
compat_urllib_parse_unquote, compat_urllib_parse_unquote,
compat_urllib_parse_unquote_plus, compat_urllib_parse_unquote_plus,
compat_urllib_parse_urlencode, compat_urllib_parse_urlencode,
@ -26,19 +27,22 @@ from youtube_dl.compat import (
class TestCompat(unittest.TestCase): class TestCompat(unittest.TestCase):
def test_compat_getenv(self): def test_compat_getenv(self):
test_str = 'тест' test_str = 'тест'
os.environ['YOUTUBE-DL-TEST'] = ( compat_setenv('YOUTUBE-DL-TEST', test_str)
test_str if sys.version_info >= (3, 0)
else test_str.encode(get_filesystem_encoding()))
self.assertEqual(compat_getenv('YOUTUBE-DL-TEST'), test_str) self.assertEqual(compat_getenv('YOUTUBE-DL-TEST'), test_str)
def test_compat_setenv(self):
test_var = 'YOUTUBE-DL-TEST'
test_str = 'тест'
compat_setenv(test_var, test_str)
compat_getenv(test_var)
self.assertEqual(compat_getenv(test_var), test_str)
def test_compat_expanduser(self): def test_compat_expanduser(self):
old_home = os.environ.get('HOME') old_home = os.environ.get('HOME')
test_str = 'C:\Documents and Settings\тест\Application Data' test_str = 'C:\Documents and Settings\тест\Application Data'
os.environ['HOME'] = ( compat_setenv('HOME', test_str)
test_str if sys.version_info >= (3, 0)
else test_str.encode(get_filesystem_encoding()))
self.assertEqual(compat_expanduser('~'), test_str) self.assertEqual(compat_expanduser('~'), test_str)
os.environ['HOME'] = old_home compat_setenv('HOME', old_home or '')
def test_all_present(self): def test_all_present(self):
import youtube_dl.compat import youtube_dl.compat
@ -99,5 +103,9 @@ class TestCompat(unittest.TestCase):
self.assertTrue(isinstance(doc.find('chinese').text, compat_str)) self.assertTrue(isinstance(doc.find('chinese').text, compat_str))
self.assertTrue(isinstance(doc.find('foo/bar').text, compat_str)) self.assertTrue(isinstance(doc.find('foo/bar').text, compat_str))
def test_struct_unpack(self):
self.assertEqual(compat_struct_unpack('!B', b'\x00'), (0,))
if __name__ == '__main__': if __name__ == '__main__':
unittest.main() unittest.main()

test/test_socks.py Normal file

@ -0,0 +1,107 @@
#!/usr/bin/env python
# coding: utf-8
from __future__ import unicode_literals
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import random
import subprocess
from test.helper import (
FakeYDL,
get_params,
)
from youtube_dl.compat import (
compat_str,
compat_urllib_request,
)
class TestMultipleSocks(unittest.TestCase):
@staticmethod
def _check_params(attrs):
params = get_params()
for attr in attrs:
if attr not in params:
print('Missing %s. Skipping.' % attr)
return
return params
def test_proxy_http(self):
params = self._check_params(['primary_proxy', 'primary_server_ip'])
if params is None:
return
ydl = FakeYDL({
'proxy': params['primary_proxy']
})
self.assertEqual(
ydl.urlopen('http://yt-dl.org/ip').read().decode('utf-8'),
params['primary_server_ip'])
def test_proxy_https(self):
params = self._check_params(['primary_proxy', 'primary_server_ip'])
if params is None:
return
ydl = FakeYDL({
'proxy': params['primary_proxy']
})
self.assertEqual(
ydl.urlopen('https://yt-dl.org/ip').read().decode('utf-8'),
params['primary_server_ip'])
def test_secondary_proxy_http(self):
params = self._check_params(['secondary_proxy', 'secondary_server_ip'])
if params is None:
return
ydl = FakeYDL()
req = compat_urllib_request.Request('http://yt-dl.org/ip')
req.add_header('Ytdl-request-proxy', params['secondary_proxy'])
self.assertEqual(
ydl.urlopen(req).read().decode('utf-8'),
params['secondary_server_ip'])
def test_secondary_proxy_https(self):
params = self._check_params(['secondary_proxy', 'secondary_server_ip'])
if params is None:
return
ydl = FakeYDL()
req = compat_urllib_request.Request('https://yt-dl.org/ip')
req.add_header('Ytdl-request-proxy', params['secondary_proxy'])
self.assertEqual(
ydl.urlopen(req).read().decode('utf-8'),
params['secondary_server_ip'])
class TestSocks(unittest.TestCase):
def setUp(self):
self.port = random.randint(20000, 30000)
self.server_process = subprocess.Popen([
'srelay', '-f', '-i', '127.0.0.1:%d' % self.port],
stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
def tearDown(self):
self.server_process.terminate()
self.server_process.communicate()
def _get_ip(self, protocol):
ydl = FakeYDL({
'proxy': '%s://127.0.0.1:%d' % (protocol, self.port),
})
return ydl.urlopen('http://yt-dl.org/ip').read().decode('utf-8')
def test_socks4(self):
self.assertTrue(isinstance(self._get_ip('socks4'), compat_str))
def test_socks4a(self):
self.assertTrue(isinstance(self._get_ip('socks4a'), compat_str))
def test_socks5(self):
self.assertTrue(isinstance(self._get_ip('socks5'), compat_str))
if __name__ == '__main__':
unittest.main()


@ -20,6 +20,7 @@ from youtube_dl.utils import (
args_to_str, args_to_str,
encode_base_n, encode_base_n,
clean_html, clean_html,
date_from_str,
DateRange, DateRange,
detect_exe_version, detect_exe_version,
determine_ext, determine_ext,
@ -54,7 +55,6 @@ from youtube_dl.utils import (
smuggle_url, smuggle_url,
str_to_int, str_to_int,
strip_jsonp, strip_jsonp,
struct_unpack,
timeconvert, timeconvert,
unescapeHTML, unescapeHTML,
unified_strdate, unified_strdate,
@ -138,8 +138,8 @@ class TestUtil(unittest.TestCase):
self.assertEqual('yes_no', sanitize_filename('yes? no', restricted=True)) self.assertEqual('yes_no', sanitize_filename('yes? no', restricted=True))
self.assertEqual('this_-_that', sanitize_filename('this: that', restricted=True)) self.assertEqual('this_-_that', sanitize_filename('this: that', restricted=True))
tests = 'a\xe4b\u4e2d\u56fd\u7684c' tests = 'aäb\u4e2d\u56fd\u7684c'
self.assertEqual(sanitize_filename(tests, restricted=True), 'a_b_c') self.assertEqual(sanitize_filename(tests, restricted=True), 'aab_c')
self.assertTrue(sanitize_filename('\xf6', restricted=True) != '') # No empty filename self.assertTrue(sanitize_filename('\xf6', restricted=True) != '') # No empty filename
forbidden = '"\0\\/&!: \'\t\n()[]{}$;`^,#' forbidden = '"\0\\/&!: \'\t\n()[]{}$;`^,#'
@ -154,6 +154,10 @@ class TestUtil(unittest.TestCase):
self.assertTrue(sanitize_filename('-', restricted=True) != '') self.assertTrue(sanitize_filename('-', restricted=True) != '')
self.assertTrue(sanitize_filename(':', restricted=True) != '') self.assertTrue(sanitize_filename(':', restricted=True) != '')
self.assertEqual(sanitize_filename(
'ÂÃÄÀÁÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØŒÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøœùúûüýþÿ', restricted=True),
'AAAAAAAECEEEEIIIIDNOOOOOOOEUUUUYPssaaaaaaaeceeeeiiiionoooooooeuuuuypy')
def test_sanitize_ids(self): def test_sanitize_ids(self):
self.assertEqual(sanitize_filename('_n_cd26wFpw', is_id=True), '_n_cd26wFpw') self.assertEqual(sanitize_filename('_n_cd26wFpw', is_id=True), '_n_cd26wFpw')
self.assertEqual(sanitize_filename('_BD_eEpuzXw', is_id=True), '_BD_eEpuzXw') self.assertEqual(sanitize_filename('_BD_eEpuzXw', is_id=True), '_BD_eEpuzXw')
@ -234,6 +238,13 @@ class TestUtil(unittest.TestCase):
self.assertEqual(unescapeHTML('&eacute;'), 'é') self.assertEqual(unescapeHTML('&eacute;'), 'é')
self.assertEqual(unescapeHTML('&#2013266066;'), '&#2013266066;') self.assertEqual(unescapeHTML('&#2013266066;'), '&#2013266066;')
def test_date_from_str(self):
self.assertEqual(date_from_str('yesterday'), date_from_str('now-1day'))
self.assertEqual(date_from_str('now+7day'), date_from_str('now+1week'))
self.assertEqual(date_from_str('now+14day'), date_from_str('now+2week'))
self.assertEqual(date_from_str('now+365day'), date_from_str('now+1year'))
self.assertEqual(date_from_str('now+30day'), date_from_str('now+1month'))
def test_daterange(self): def test_daterange(self):
_20century = DateRange("19000101", "20000101") _20century = DateRange("19000101", "20000101")
self.assertFalse("17890714" in _20century) self.assertFalse("17890714" in _20century)
@ -405,6 +416,7 @@ class TestUtil(unittest.TestCase):
self.assertEqual(parse_duration('01:02:03:04'), 93784) self.assertEqual(parse_duration('01:02:03:04'), 93784)
self.assertEqual(parse_duration('1 hour 3 minutes'), 3780) self.assertEqual(parse_duration('1 hour 3 minutes'), 3780)
self.assertEqual(parse_duration('87 Min.'), 5220) self.assertEqual(parse_duration('87 Min.'), 5220)
self.assertEqual(parse_duration('PT1H0.040S'), 3600.04)
def test_fix_xml_ampersands(self): def test_fix_xml_ampersands(self):
self.assertEqual( self.assertEqual(
@ -444,9 +456,6 @@ class TestUtil(unittest.TestCase):
testPL(5, 2, (2, 99), [2, 3, 4]) testPL(5, 2, (2, 99), [2, 3, 4])
testPL(5, 2, (20, 99), []) testPL(5, 2, (20, 99), [])
def test_struct_unpack(self):
self.assertEqual(struct_unpack('!B', b'\x00'), (0,))
def test_read_batch_urls(self): def test_read_batch_urls(self):
f = io.StringIO('''\xef\xbb\xbf foo f = io.StringIO('''\xef\xbb\xbf foo
bar\r bar\r


@ -44,7 +44,7 @@ class TestYoutubeLists(unittest.TestCase):
ie = YoutubePlaylistIE(dl) ie = YoutubePlaylistIE(dl)
result = ie.extract('https://www.youtube.com/watch?v=W01L70IGBgE&index=2&list=RDOQpdSVF_k_w') result = ie.extract('https://www.youtube.com/watch?v=W01L70IGBgE&index=2&list=RDOQpdSVF_k_w')
entries = result['entries'] entries = result['entries']
self.assertTrue(len(entries) >= 20) self.assertTrue(len(entries) >= 50)
original_video = entries[0] original_video = entries[0]
self.assertEqual(original_video['id'], 'OQpdSVF_k_w') self.assertEqual(original_video['id'], 'OQpdSVF_k_w')


@ -9,5 +9,6 @@ passenv = HOME
defaultargs = test --exclude test_download.py --exclude test_age_restriction.py defaultargs = test --exclude test_download.py --exclude test_age_restriction.py
--exclude test_subtitles.py --exclude test_write_annotations.py --exclude test_subtitles.py --exclude test_write_annotations.py
--exclude test_youtube_lists.py --exclude test_iqiyi_sdk_interpreter.py --exclude test_youtube_lists.py --exclude test_iqiyi_sdk_interpreter.py
--exclude test_socks.py
commands = nosetests --verbose {posargs:{[testenv]defaultargs}} # --with-coverage --cover-package=youtube_dl --cover-html commands = nosetests --verbose {posargs:{[testenv]defaultargs}} # --with-coverage --cover-package=youtube_dl --cover-html
# test.test_download:TestDownload.test_NowVideo # test.test_download:TestDownload.test_NowVideo


@ -64,6 +64,7 @@ from .utils import (
PostProcessingError, PostProcessingError,
preferredencoding, preferredencoding,
prepend_extension, prepend_extension,
register_socks_protocols,
render_table, render_table,
replace_extension, replace_extension,
SameFileError, SameFileError,
@ -260,7 +261,9 @@ class YoutubeDL(object):
The following options determine which downloader is picked: The following options determine which downloader is picked:
external_downloader: Executable of the external downloader to call. external_downloader: Executable of the external downloader to call.
None or unset for standard (built-in) downloader. None or unset for standard (built-in) downloader.
hls_prefer_native: Use the native HLS downloader instead of ffmpeg/avconv. hls_prefer_native: Use the native HLS downloader instead of ffmpeg/avconv
if True, otherwise use ffmpeg/avconv if False, otherwise
use downloader suggested by extractor if None.
The following parameters are not used by YoutubeDL itself, they are used by The following parameters are not used by YoutubeDL itself, they are used by
the downloader (see youtube_dl/downloader/common.py): the downloader (see youtube_dl/downloader/common.py):
@ -359,6 +362,8 @@ class YoutubeDL(object):
for ph in self.params.get('progress_hooks', []): for ph in self.params.get('progress_hooks', []):
self.add_progress_hook(ph) self.add_progress_hook(ph)
register_socks_protocols()
def warn_if_short_id(self, argv): def warn_if_short_id(self, argv):
# short YouTube ID starting with dash? # short YouTube ID starting with dash?
idxs = [ idxs = [
@ -578,7 +583,7 @@ class YoutubeDL(object):
is_id=(k == 'id')) is_id=(k == 'id'))
template_dict = dict((k, sanitize(k, v)) template_dict = dict((k, sanitize(k, v))
for k, v in template_dict.items() for k, v in template_dict.items()
if v is not None) if v is not None and not isinstance(v, (list, tuple, dict)))
template_dict = collections.defaultdict(lambda: 'NA', template_dict) template_dict = collections.defaultdict(lambda: 'NA', template_dict)
outtmpl = self.params.get('outtmpl', DEFAULT_OUTTMPL) outtmpl = self.params.get('outtmpl', DEFAULT_OUTTMPL)
@ -715,6 +720,7 @@ class YoutubeDL(object):
result_type = ie_result.get('_type', 'video') result_type = ie_result.get('_type', 'video')
if result_type in ('url', 'url_transparent'): if result_type in ('url', 'url_transparent'):
ie_result['url'] = sanitize_url(ie_result['url'])
extract_flat = self.params.get('extract_flat', False) extract_flat = self.params.get('extract_flat', False)
if ((extract_flat == 'in_playlist' and 'playlist' in extra_info) or if ((extract_flat == 'in_playlist' and 'playlist' in extra_info) or
extract_flat is True): extract_flat is True):
@ -1637,7 +1643,7 @@ class YoutubeDL(object):
# Just a single file # Just a single file
success = dl(filename, info_dict) success = dl(filename, info_dict)
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err: except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
self.report_error('unable to download video data: %s' % str(err)) self.report_error('unable to download video data: %s' % error_to_compat_str(err))
return return
except (OSError, IOError) as err: except (OSError, IOError) as err:
raise UnavailableVideoError(err) raise UnavailableVideoError(err)
@ -2016,6 +2022,7 @@ class YoutubeDL(object):
if opts_cookiefile is None: if opts_cookiefile is None:
self.cookiejar = compat_cookiejar.CookieJar() self.cookiejar = compat_cookiejar.CookieJar()
else: else:
opts_cookiefile = compat_expanduser(opts_cookiefile)
self.cookiejar = compat_cookiejar.MozillaCookieJar( self.cookiejar = compat_cookiejar.MozillaCookieJar(
opts_cookiefile) opts_cookiefile)
if os.access(opts_cookiefile, os.R_OK): if os.access(opts_cookiefile, os.R_OK):
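
Note on the output-template hunk in this file: the new condition drops list, tuple and dict values before the template dictionary is wrapped in a `defaultdict`, so such fields render as `NA` rather than as a Python repr. A minimal sketch of that filtering, assuming a hypothetical helper name `build_template_dict` and leaving out the per-value `sanitize` call that the real code applies:

```python
import collections


def build_template_dict(info):
    # Hypothetical helper: keep only scalar, non-None values, as the new
    # condition in the hunk does. The real code also sanitizes each value.
    filtered = dict(
        (k, v) for k, v in info.items()
        if v is not None and not isinstance(v, (list, tuple, dict)))
    # Missing or filtered-out keys fall back to 'NA' during %-formatting.
    return collections.defaultdict(lambda: 'NA', filtered)


template_dict = build_template_dict(
    {'title': 'clip', 'tags': ['a', 'b'], 'uploader': None})
print('%(title)s-%(tags)s-%(uploader)s' % template_dict)
```

The `defaultdict` matters here because `%`-style formatting looks keys up with item access, so a missing key yields the default instead of raising `KeyError`.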


@ -67,9 +67,9 @@ def _real_main(argv=None):
# Custom HTTP headers # Custom HTTP headers
if opts.headers is not None: if opts.headers is not None:
for h in opts.headers: for h in opts.headers:
if h.find(':', 1) < 0: if ':' not in h:
parser.error('wrong header formatting, it should be key:value, not "%s"' % h) parser.error('wrong header formatting, it should be key:value, not "%s"' % h)
key, value = h.split(':', 2) key, value = h.split(':', 1)
if opts.verbose: if opts.verbose:
write_string('[debug] Adding header from command line option %s:%s\n' % (key, value)) write_string('[debug] Adding header from command line option %s:%s\n' % (key, value))
std_headers[key] = value std_headers[key] = value
@ -86,7 +86,9 @@ def _real_main(argv=None):
if opts.batchfile == '-': if opts.batchfile == '-':
batchfd = sys.stdin batchfd = sys.stdin
else: else:
batchfd = io.open(opts.batchfile, 'r', encoding='utf-8', errors='ignore') batchfd = io.open(
compat_expanduser(opts.batchfile),
'r', encoding='utf-8', errors='ignore')
batch_urls = read_batch_urls(batchfd) batch_urls = read_batch_urls(batchfd)
if opts.verbose: if opts.verbose:
write_string('[debug] Batch file urls: ' + repr(batch_urls) + '\n') write_string('[debug] Batch file urls: ' + repr(batch_urls) + '\n')
@ -404,7 +406,7 @@ def _real_main(argv=None):
try: try:
if opts.load_info_filename is not None: if opts.load_info_filename is not None:
retcode = ydl.download_with_info_file(opts.load_info_filename) retcode = ydl.download_with_info_file(compat_expanduser(opts.load_info_filename))
else: else:
retcode = ydl.download(all_urls) retcode = ydl.download(all_urls)
except MaxDownloadsReached: except MaxDownloadsReached:
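
Note on the custom-header hunk in this file: `h.split(':', 2)` can return three parts when the header value itself contains a colon (a URL, for instance), which makes the two-name unpacking fail. Splitting at most once avoids that. A minimal sketch, assuming a hypothetical helper `parse_header` that raises `ValueError` where the real code calls `parser.error`:

```python
def parse_header(h):
    # Hypothetical helper mirroring the new logic in the hunk.
    if ':' not in h:
        raise ValueError(
            'wrong header formatting, it should be key:value, not "%s"' % h)
    # Split on the first colon only, so colons in the value survive.
    key, value = h.split(':', 1)
    return key, value


header = 'Referer:http://example.com/'
print(len(header.split(':', 2)))  # three parts: the old unpacking would fail
print(parse_header(header))
```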


@ -11,6 +11,7 @@ import re
import shlex import shlex
import shutil import shutil
import socket import socket
import struct
import subprocess import subprocess
import sys import sys
import itertools import itertools
@ -340,9 +341,9 @@ except ImportError: # Python 2
return parsed_result return parsed_result
try: try:
from shlex import quote as shlex_quote from shlex import quote as compat_shlex_quote
except ImportError: # Python < 3.3 except ImportError: # Python < 3.3
def shlex_quote(s): def compat_shlex_quote(s):
if re.match(r'^[-_\w./]+$', s): if re.match(r'^[-_\w./]+$', s):
return s return s
else: else:
@ -373,6 +374,9 @@ compat_os_name = os._name if os.name == 'java' else os.name
if sys.version_info >= (3, 0): if sys.version_info >= (3, 0):
compat_getenv = os.getenv compat_getenv = os.getenv
compat_expanduser = os.path.expanduser compat_expanduser = os.path.expanduser
def compat_setenv(key, value, env=os.environ):
env[key] = value
else: else:
# Environment variables should be decoded with filesystem encoding. # Environment variables should be decoded with filesystem encoding.
# Otherwise it will fail if any non-ASCII characters present (see #3854 #3217 #2918) # Otherwise it will fail if any non-ASCII characters present (see #3854 #3217 #2918)
@ -384,6 +388,12 @@ else:
env = env.decode(get_filesystem_encoding()) env = env.decode(get_filesystem_encoding())
return env return env
def compat_setenv(key, value, env=os.environ):
def encode(v):
from .utils import get_filesystem_encoding
return v.encode(get_filesystem_encoding()) if isinstance(v, compat_str) else v
env[encode(key)] = encode(value)
# HACK: The default implementations of os.path.expanduser from cpython do not decode # HACK: The default implementations of os.path.expanduser from cpython do not decode
# environment variables with filesystem encoding. We will work around this by # environment variables with filesystem encoding. We will work around this by
# providing adjusted implementations. # providing adjusted implementations.
@ -456,18 +466,6 @@ else:
print(s) print(s)
try:
subprocess_check_output = subprocess.check_output
except AttributeError:
def subprocess_check_output(*args, **kwargs):
assert 'input' not in kwargs
p = subprocess.Popen(*args, stdout=subprocess.PIPE, **kwargs)
output, _ = p.communicate()
ret = p.poll()
if ret:
raise subprocess.CalledProcessError(ret, p.args, output=output)
return output
if sys.version_info < (3, 0) and sys.platform == 'win32': if sys.version_info < (3, 0) and sys.platform == 'win32':
def compat_getpass(prompt, *args, **kwargs): def compat_getpass(prompt, *args, **kwargs):
if isinstance(prompt, compat_str): if isinstance(prompt, compat_str):
@ -583,6 +581,26 @@ if sys.version_info >= (3, 0):
else: else:
from tokenize import generate_tokens as compat_tokenize_tokenize from tokenize import generate_tokens as compat_tokenize_tokenize
try:
struct.pack('!I', 0)
except TypeError:
# In Python 2.6 and 2.7.x < 2.7.7, struct requires a bytes argument
# See https://bugs.python.org/issue19099
def compat_struct_pack(spec, *args):
if isinstance(spec, compat_str):
spec = spec.encode('ascii')
return struct.pack(spec, *args)
def compat_struct_unpack(spec, *args):
if isinstance(spec, compat_str):
spec = spec.encode('ascii')
return struct.unpack(spec, *args)
else:
compat_struct_pack = struct.pack
compat_struct_unpack = struct.unpack
__all__ = [ __all__ = [
'compat_HTMLParser', 'compat_HTMLParser',
'compat_HTTPError', 'compat_HTTPError',
@ -604,9 +622,13 @@ __all__ = [
'compat_os_name', 'compat_os_name',
'compat_parse_qs', 'compat_parse_qs',
'compat_print', 'compat_print',
'compat_setenv',
'compat_shlex_quote',
'compat_shlex_split', 'compat_shlex_split',
'compat_socket_create_connection', 'compat_socket_create_connection',
'compat_str', 'compat_str',
'compat_struct_pack',
'compat_struct_unpack',
'compat_subprocess_get_DEVNULL', 'compat_subprocess_get_DEVNULL',
'compat_tokenize_tokenize', 'compat_tokenize_tokenize',
'compat_urllib_error', 'compat_urllib_error',
@ -623,7 +645,5 @@ __all__ = [
'compat_urlretrieve', 'compat_urlretrieve',
'compat_xml_parse_error', 'compat_xml_parse_error',
'compat_xpath', 'compat_xpath',
'shlex_quote',
'subprocess_check_output',
'workaround_optparse_bug9161', 'workaround_optparse_bug9161',
] ]
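
Note on the `compat_struct_pack` / `compat_struct_unpack` hunk in this file: under `unicode_literals`, a format spec such as `'!I'` is a text string, which older Python 2 releases reject in `struct`. The fallback branch encodes the spec to ASCII bytes first. A minimal sketch of that fallback, using hypothetical `*_sketch` names and `type(u'')` as a stand-in for `compat_str`; unlike the real code, it applies the wrapper unconditionally rather than probing with `struct.pack('!I', 0)`:

```python
import struct

text_type = type(u'')  # stand-in for compat_str (an assumption of this sketch)


def struct_pack_sketch(spec, *args):
    # Encode a text spec to bytes before handing it to struct.
    if isinstance(spec, text_type):
        spec = spec.encode('ascii')
    return struct.pack(spec, *args)


def struct_unpack_sketch(spec, *args):
    if isinstance(spec, text_type):
        spec = spec.encode('ascii')
    return struct.unpack(spec, *args)


print(struct_unpack_sketch('!B', b'\x00'))
print(struct_pack_sketch('!I', 1))
```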


@ -41,9 +41,12 @@ def get_suitable_downloader(info_dict, params={}):
if ed.can_download(info_dict): if ed.can_download(info_dict):
return ed return ed
if protocol == 'm3u8' and params.get('hls_prefer_native'): if protocol == 'm3u8' and params.get('hls_prefer_native') is True:
return HlsFD return HlsFD
if protocol == 'm3u8_native' and params.get('hls_prefer_native') is False:
return FFmpegFD
return PROTOCOL_MAP.get(protocol, HttpFD) return PROTOCOL_MAP.get(protocol, HttpFD)
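
Note on this hunk: `hls_prefer_native` becomes a three-state option. `True` forces the native HLS downloader for `m3u8`, `False` forces ffmpeg for `m3u8_native`, and `None` leaves the choice to the protocol the extractor reported. A minimal sketch of that decision, using strings in place of the downloader classes; the contents of `PROTOCOL_MAP` below are an assumption made for illustration:

```python
# Assumed default mapping for this sketch; strings stand in for classes.
PROTOCOL_MAP = {
    'm3u8': 'ffmpeg',
    'm3u8_native': 'hlsnative',
}


def pick_downloader(protocol, hls_prefer_native=None):
    # Explicit True overrides an extractor that asked for ffmpeg.
    if protocol == 'm3u8' and hls_prefer_native is True:
        return 'hlsnative'
    # Explicit False overrides an extractor that asked for the native downloader.
    if protocol == 'm3u8_native' and hls_prefer_native is False:
        return 'ffmpeg'
    # None (or any other protocol): follow the protocol map.
    return PROTOCOL_MAP.get(protocol, 'http')


for pref in (True, False, None):
    print(pref, pick_downloader('m3u8', pref), pick_downloader('m3u8_native', pref))
```

The identity checks (`is True`, `is False`) are what keep `None` distinct from `False`; a plain truthiness test would merge the two.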


@ -6,6 +6,7 @@ import sys
import re import re
from .common import FileDownloader from .common import FileDownloader
from ..compat import compat_setenv
from ..postprocessor.ffmpeg import FFmpegPostProcessor, EXT_TO_OUT_FORMATS from ..postprocessor.ffmpeg import FFmpegPostProcessor, EXT_TO_OUT_FORMATS
from ..utils import ( from ..utils import (
cli_option, cli_option,
@ -198,6 +199,18 @@ class FFmpegFD(ExternalFD):
'-headers', '-headers',
''.join('%s: %s\r\n' % (key, val) for key, val in headers.items())] ''.join('%s: %s\r\n' % (key, val) for key, val in headers.items())]
env = None
proxy = self.params.get('proxy')
if proxy:
if not re.match(r'^[\da-zA-Z]+://', proxy):
proxy = 'http://%s' % proxy
# Since December 2015 ffmpeg supports -http_proxy option (see
# http://git.videolan.org/?p=ffmpeg.git;a=commit;h=b4eb1f29ebddd60c41a2eb39f5af701e38e0d3fd)
# We could switch to the following code if we are able to detect version properly
# args += ['-http_proxy', proxy]
env = os.environ.copy()
compat_setenv('HTTP_PROXY', proxy, env=env)
protocol = info_dict.get('protocol') protocol = info_dict.get('protocol')
if protocol == 'rtmp': if protocol == 'rtmp':
@ -224,8 +237,8 @@ class FFmpegFD(ExternalFD):
args += ['-rtmp_live', 'live'] args += ['-rtmp_live', 'live']
args += ['-i', url, '-c', 'copy'] args += ['-i', url, '-c', 'copy']
if protocol == 'm3u8': if protocol in ('m3u8', 'm3u8_native'):
if self.params.get('hls_use_mpegts', False): if self.params.get('hls_use_mpegts', False) or tmpfilename == '-':
args += ['-f', 'mpegts'] args += ['-f', 'mpegts']
else: else:
args += ['-f', 'mp4', '-bsf:a', 'aac_adtstoasc'] args += ['-f', 'mp4', '-bsf:a', 'aac_adtstoasc']
@ -239,7 +252,7 @@ class FFmpegFD(ExternalFD):
self._debug_cmd(args) self._debug_cmd(args)
proc = subprocess.Popen(args, stdin=subprocess.PIPE) proc = subprocess.Popen(args, stdin=subprocess.PIPE, env=env)
try: try:
retval = proc.wait() retval = proc.wait()
except KeyboardInterrupt: except KeyboardInterrupt:
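
Note on the proxy hunk in this file: the proxy is passed to ffmpeg through an `HTTP_PROXY` variable set on a copy of the environment, so the parent process is left untouched. A minimal sketch of that step, assuming a hypothetical helper `ffmpeg_proxy_env` and a plain dict assignment where the real code calls `compat_setenv`:

```python
import os
import re


def ffmpeg_proxy_env(proxy, base_env=None):
    # Hypothetical helper; returns None when no proxy is configured,
    # which subprocess.Popen treats as "inherit the current environment".
    if not proxy:
        return None
    # A bare host:port is assumed to be an HTTP proxy.
    if not re.match(r'^[\da-zA-Z]+://', proxy):
        proxy = 'http://%s' % proxy
    # Work on a copy so the caller's environment is not modified.
    env = dict(os.environ if base_env is None else base_env)
    env['HTTP_PROXY'] = proxy
    return env


print(ffmpeg_proxy_env('127.0.0.1:8080', {})['HTTP_PROXY'])
```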


@ -12,37 +12,49 @@ from ..compat import (
compat_urlparse, compat_urlparse,
compat_urllib_error, compat_urllib_error,
compat_urllib_parse_urlparse, compat_urllib_parse_urlparse,
compat_struct_pack,
compat_struct_unpack,
) )
from ..utils import ( from ..utils import (
encodeFilename, encodeFilename,
fix_xml_ampersands, fix_xml_ampersands,
sanitize_open, sanitize_open,
struct_pack,
struct_unpack,
xpath_text, xpath_text,
) )
class DataTruncatedError(Exception):
pass
class FlvReader(io.BytesIO): class FlvReader(io.BytesIO):
""" """
Reader for Flv files Reader for Flv files
The file format is documented in https://www.adobe.com/devnet/f4v.html The file format is documented in https://www.adobe.com/devnet/f4v.html
""" """
def read_bytes(self, n):
data = self.read(n)
if len(data) < n:
raise DataTruncatedError(
'FlvReader error: need %d bytes while only %d bytes got' % (
n, len(data)))
return data
# Utility functions for reading numbers and strings # Utility functions for reading numbers and strings
def read_unsigned_long_long(self): def read_unsigned_long_long(self):
return struct_unpack('!Q', self.read(8))[0] return compat_struct_unpack('!Q', self.read_bytes(8))[0]
def read_unsigned_int(self): def read_unsigned_int(self):
return struct_unpack('!I', self.read(4))[0] return compat_struct_unpack('!I', self.read_bytes(4))[0]
def read_unsigned_char(self): def read_unsigned_char(self):
return struct_unpack('!B', self.read(1))[0] return compat_struct_unpack('!B', self.read_bytes(1))[0]
def read_string(self): def read_string(self):
res = b'' res = b''
while True: while True:
char = self.read(1) char = self.read_bytes(1)
if char == b'\x00': if char == b'\x00':
break break
res += char res += char
@ -53,18 +65,18 @@ class FlvReader(io.BytesIO):
Read a box and return the info as a tuple: (box_size, box_type, box_data) Read a box and return the info as a tuple: (box_size, box_type, box_data)
""" """
real_size = size = self.read_unsigned_int() real_size = size = self.read_unsigned_int()
box_type = self.read(4) box_type = self.read_bytes(4)
header_end = 8 header_end = 8
if size == 1: if size == 1:
real_size = self.read_unsigned_long_long() real_size = self.read_unsigned_long_long()
header_end = 16 header_end = 16
return real_size, box_type, self.read(real_size - header_end) return real_size, box_type, self.read_bytes(real_size - header_end)
def read_asrt(self): def read_asrt(self):
# version # version
self.read_unsigned_char() self.read_unsigned_char()
# flags # flags
self.read(3) self.read_bytes(3)
quality_entry_count = self.read_unsigned_char() quality_entry_count = self.read_unsigned_char()
# QualityEntryCount # QualityEntryCount
for i in range(quality_entry_count): for i in range(quality_entry_count):
@ -85,7 +97,7 @@ class FlvReader(io.BytesIO):
# version # version
self.read_unsigned_char() self.read_unsigned_char()
# flags # flags
self.read(3) self.read_bytes(3)
# time scale # time scale
self.read_unsigned_int() self.read_unsigned_int()
@ -119,7 +131,7 @@ class FlvReader(io.BytesIO):
# version # version
self.read_unsigned_char() self.read_unsigned_char()
# flags # flags
self.read(3) self.read_bytes(3)
self.read_unsigned_int() # BootstrapinfoVersion self.read_unsigned_int() # BootstrapinfoVersion
# Profile,Live,Update,Reserved # Profile,Live,Update,Reserved
@ -194,11 +206,11 @@ def build_fragments_list(boot_info):
def write_unsigned_int(stream, val): def write_unsigned_int(stream, val):
stream.write(struct_pack('!I', val)) stream.write(compat_struct_pack('!I', val))
def write_unsigned_int_24(stream, val): def write_unsigned_int_24(stream, val):
stream.write(struct_pack('!I', val)[1:]) stream.write(compat_struct_pack('!I', val)[1:])
def write_flv_header(stream): def write_flv_header(stream):
@ -374,7 +386,17 @@ class F4mFD(FragmentFD):
down.close() down.close()
reader = FlvReader(down_data) reader = FlvReader(down_data)
while True: while True:
try:
_, box_type, box_data = reader.read_box_info() _, box_type, box_data = reader.read_box_info()
except DataTruncatedError:
if test:
# In tests, segments may be truncated, and thus
# FlvReader may not be able to parse the whole
# chunk. If so, write the segment as is
# See https://github.com/rg3/youtube-dl/issues/9214
dest_stream.write(down_data)
break
raise
if box_type == b'mdat': if box_type == b'mdat':
dest_stream.write(box_data) dest_stream.write(box_data)
break break
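
Note on the `read_bytes` change in this file: `io.BytesIO.read(n)` returns fewer than `n` bytes at end of data without complaining, so a truncated segment would previously surface later as an opaque `struct` error. The new method turns a short read into a dedicated exception the caller can catch. A minimal sketch with hypothetical names `TruncatedError` and `StrictReader`:

```python
import io


class TruncatedError(Exception):
    pass


class StrictReader(io.BytesIO):
    def read_bytes(self, n):
        # Fail loudly if the buffer cannot supply all n bytes.
        data = self.read(n)
        if len(data) < n:
            raise TruncatedError(
                'need %d bytes while only %d bytes got' % (n, len(data)))
        return data


reader = StrictReader(b'\x00\x01\x02')
print(reader.read_bytes(2))
try:
    reader.read_bytes(4)
except TruncatedError as err:
    print('truncated:', err)
```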


@ -4,6 +4,7 @@ import os.path
import re import re
from .fragment import FragmentFD from .fragment import FragmentFD
from .external import FFmpegFD
from ..compat import compat_urlparse from ..compat import compat_urlparse
from ..utils import ( from ..utils import (
@ -17,12 +18,39 @@ class HlsFD(FragmentFD):
FD_NAME = 'hlsnative' FD_NAME = 'hlsnative'
@staticmethod
def can_download(manifest):
UNSUPPORTED_FEATURES = (
r'#EXT-X-KEY:METHOD=(?!NONE)', # encrypted streams [1]
r'#EXT-X-BYTERANGE', # playlists composed of byte ranges of media files [2]
# Live streams heuristic does not always work (e.g. geo restricted to Germany
# http://hls-geo.daserste.de/i/videoportal/Film/c_620000/622873/format,716451,716457,716450,716458,716459,.mp4.csmil/index_4_av.m3u8?null=0)
# r'#EXT-X-MEDIA-SEQUENCE:(?!0$)', # live streams [3]
r'#EXT-X-PLAYLIST-TYPE:EVENT', # media segments may be appended to the end of
# event media playlists [4]
# 1. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.2.4
# 2. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.2.2
# 3. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3.2
# 4. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3.5
)
return all(not re.search(feature, manifest) for feature in UNSUPPORTED_FEATURES)
def real_download(self, filename, info_dict): def real_download(self, filename, info_dict):
man_url = info_dict['url'] man_url = info_dict['url']
self.to_screen('[%s] Downloading m3u8 manifest' % self.FD_NAME) self.to_screen('[%s] Downloading m3u8 manifest' % self.FD_NAME)
manifest = self.ydl.urlopen(man_url).read() manifest = self.ydl.urlopen(man_url).read()
s = manifest.decode('utf-8', 'ignore') s = manifest.decode('utf-8', 'ignore')
if not self.can_download(s):
self.report_warning(
'hlsnative has detected features it does not support, '
'extraction will be delegated to ffmpeg')
fd = FFmpegFD(self.ydl, self.params)
for ph in self._progress_hooks:
fd.add_progress_hook(ph)
return fd.real_download(filename, info_dict)
fragment_urls = [] fragment_urls = []
for line in s.splitlines(): for line in s.splitlines():
line = line.strip() line = line.strip()
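
Note on `can_download` in this file: the check is a set of regular expressions run against the manifest text, and a single match is enough to hand the download over to ffmpeg. A minimal standalone sketch using the three patterns that are active in the hunk; the sample manifests are made up for illustration:

```python
import re

UNSUPPORTED_FEATURES = (
    r'#EXT-X-KEY:METHOD=(?!NONE)',   # encrypted streams
    r'#EXT-X-BYTERANGE',             # byte-range playlists
    r'#EXT-X-PLAYLIST-TYPE:EVENT',   # event playlists that may still grow
)


def can_download(manifest):
    # True only if none of the unsupported features appear in the manifest.
    return all(
        not re.search(feature, manifest) for feature in UNSUPPORTED_FEATURES)


plain = '#EXTM3U\n#EXTINF:10,\nseg0.ts\n'
encrypted = '#EXTM3U\n#EXT-X-KEY:METHOD=AES-128,URI="key.bin"\nseg0.ts\n'
unencrypted_key = '#EXTM3U\n#EXT-X-KEY:METHOD=NONE\nseg0.ts\n'
print(can_download(plain), can_download(encrypted), can_download(unencrypted_key))
```

The negative lookahead `(?!NONE)` is what lets a manifest declaring `METHOD=NONE` pass while any real encryption method is rejected.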


@ -27,6 +27,8 @@ class RtspFD(FileDownloader):
self.report_error('MMS or RTSP download detected but neither "mplayer" nor "mpv" could be run. Please install any.') self.report_error('MMS or RTSP download detected but neither "mplayer" nor "mpv" could be run. Please install any.')
return False return False
self._debug_cmd(args)
retval = subprocess.call(args) retval = subprocess.call(args)
if retval == 0: if retval == 0:
fsize = os.path.getsize(encodeFilename(tmpfilename)) fsize = os.path.getsize(encodeFilename(tmpfilename))


@ -1,26 +1,113 @@
# coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none,
)
class AolIE(InfoExtractor): class AolIE(InfoExtractor):
IE_NAME = 'on.aol.com' IE_NAME = 'on.aol.com'
_VALID_URL = r'(?:aol-video:|https?://on\.aol\.com/video/.*-)(?P<id>[0-9]+)(?:$|\?)' _VALID_URL = r'(?:aol-video:|https?://on\.aol\.com/(?:[^/]+/)*(?:[^/?#&]+-)?)(?P<id>[^/?#&]+)'
_TESTS = [{ _TESTS = [{
# video with 5min ID
'url': 'http://on.aol.com/video/u-s--official-warns-of-largest-ever-irs-phone-scam-518167793?icid=OnHomepageC2Wide_MustSee_Img', 'url': 'http://on.aol.com/video/u-s--official-warns-of-largest-ever-irs-phone-scam-518167793?icid=OnHomepageC2Wide_MustSee_Img',
'md5': '18ef68f48740e86ae94b98da815eec42', 'md5': '18ef68f48740e86ae94b98da815eec42',
'info_dict': { 'info_dict': {
'id': '518167793', 'id': '518167793',
'ext': 'mp4', 'ext': 'mp4',
'title': 'U.S. Official Warns Of \'Largest Ever\' IRS Phone Scam', 'title': 'U.S. Official Warns Of \'Largest Ever\' IRS Phone Scam',
'description': 'A major phone scam has cost thousands of taxpayers more than $1 million, with less than a month until income tax returns are due to the IRS.',
'timestamp': 1395405060,
'upload_date': '20140321',
'uploader': 'Newsy Studio',
}, },
'add_ie': ['FiveMin'], 'params': {
# m3u8 download
'skip_download': True,
}
}, {
# video with vidible ID
'url': 'http://on.aol.com/video/netflix-is-raising-rates-5707d6b8e4b090497b04f706?context=PC:homepage:PL1944:1460189336183',
'info_dict': {
'id': '5707d6b8e4b090497b04f706',
'ext': 'mp4',
'title': 'Netflix is Raising Rates',
'description': 'Netflix is rewarding millions of its long-standing members with an increase in cost. Veuers Carly Figueroa has more.',
'upload_date': '20160408',
'timestamp': 1460123280,
'uploader': 'Veuer',
},
'params': {
# m3u8 download
'skip_download': True,
}
}, {
'url': 'http://on.aol.com/partners/abc-551438d309eab105804dbfe8/sneak-peek-was-haley-really-framed-570eaebee4b0448640a5c944',
'only_matching': True,
}, {
'url': 'http://on.aol.com/shows/park-bench-shw518173474-559a1b9be4b0c3bfad3357a7?context=SH:SHW518173474:PL4327:1460619712763',
'only_matching': True,
}, {
'url': 'http://on.aol.com/video/519442220',
'only_matching': True,
}, {
'url': 'aol-video:5707d6b8e4b090497b04f706',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
return self.url_result('5min:%s' % video_id)
response = self._download_json(
'https://feedapi.b2c.on.aol.com/v1.0/app/videos/aolon/%s/details' % video_id,
video_id)['response']
if response['statusText'] != 'Ok':
raise ExtractorError('%s said: %s' % (self.IE_NAME, response['statusText']), expected=True)
video_data = response['data']
formats = []
m3u8_url = video_data.get('videoMasterPlaylist')
if m3u8_url:
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
for rendition in video_data.get('renditions', []):
video_url = rendition.get('url')
if not video_url:
continue
ext = rendition.get('format')
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
else:
f = {
'url': video_url,
'format_id': rendition.get('quality'),
}
mobj = re.search(r'(\d+)x(\d+)', video_url)
if mobj:
f.update({
'width': int(mobj.group(1)),
'height': int(mobj.group(2)),
})
formats.append(f)
self._sort_formats(formats, ('width', 'height', 'tbr', 'format_id'))
return {
'id': video_id,
'title': video_data['title'],
'duration': int_or_none(video_data.get('duration')),
'timestamp': int_or_none(video_data.get('publishDate')),
'view_count': int_or_none(video_data.get('views')),
'description': video_data.get('description'),
'uploader': video_data.get('videoOwner'),
'formats': formats,
}
class AolFeaturesIE(InfoExtractor): class AolFeaturesIE(InfoExtractor):
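
Note on the new `_real_extract` above: for non-HLS renditions the width and height are not read from the API response but recovered from a `WIDTHxHEIGHT` token in the rendition URL. A minimal sketch of that step in isolation, assuming a hypothetical helper name (`rendition_to_format`) and a made-up example URL:

```python
import re


def rendition_to_format(rendition):
    """Build a format dict from one rendition, or None if it has no URL."""
    video_url = rendition.get('url')
    if not video_url:
        return None
    f = {
        'url': video_url,
        'format_id': rendition.get('quality'),
    }
    # Same pattern as in the diff: the first "<digits>x<digits>" in the URL.
    mobj = re.search(r'(\d+)x(\d+)', video_url)
    if mobj:
        f.update({
            'width': int(mobj.group(1)),
            'height': int(mobj.group(2)),
        })
    return f


# Hypothetical rendition; this URL is invented for illustration only.
example = rendition_to_format({
    'url': 'http://example.com/video_640x360.mp4',
    'quality': 'sd',
})
```

Because the pattern is unanchored, any earlier `NxM` token in the URL would be picked up first; whether that can occur on this site is not something the diff shows.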


@ -83,7 +83,7 @@ class ARDMediathekIE(InfoExtractor):
subtitle_url = media_info.get('_subtitleUrl') subtitle_url = media_info.get('_subtitleUrl')
if subtitle_url: if subtitle_url:
subtitles['de'] = [{ subtitles['de'] = [{
'ext': 'srt', 'ext': 'ttml',
'url': subtitle_url, 'url': subtitle_url,
}] }]


@ -63,7 +63,7 @@ class ArteTvIE(InfoExtractor):
class ArteTVPlus7IE(InfoExtractor): class ArteTVPlus7IE(InfoExtractor):
IE_NAME = 'arte.tv:+7' IE_NAME = 'arte.tv:+7'
_VALID_URL = r'https?://(?:www\.)?arte\.tv/guide/(?P<lang>fr|de|en|es)/(?:(?:sendungen|emissions|embed)/)?(?P<id>[^/]+)/(?P<name>[^/?#&+])' _VALID_URL = r'https?://(?:www\.)?arte\.tv/guide/(?P<lang>fr|de|en|es)/(?:(?:sendungen|emissions|embed)/)?(?P<id>[^/]+)/(?P<name>[^/?#&]+)'
@classmethod @classmethod
def _extract_url_info(cls, url): def _extract_url_info(cls, url):
@ -161,24 +161,53 @@ class ArteTVPlus7IE(InfoExtractor):
'es': 'E[ESP]', 'es': 'E[ESP]',
} }
langcode = LANGS.get(lang, lang)
formats = [] formats = []
for format_id, format_dict in player_info['VSR'].items(): for format_id, format_dict in player_info['VSR'].items():
f = dict(format_dict) f = dict(format_dict)
versionCode = f.get('versionCode') versionCode = f.get('versionCode')
langcode = LANGS.get(lang, lang) l = re.escape(langcode)
lang_rexs = [r'VO?%s-' % re.escape(langcode), r'VO?.-ST%s$' % re.escape(langcode)]
lang_pref = None # Language preference from most to least priority
if versionCode: # Reference: section 5.6.3 of
matched_lang_rexs = [r for r in lang_rexs if re.match(r, versionCode)] # http://www.arte.tv/sites/en/corporate/files/complete-technical-guidelines-arte-geie-v1-05.pdf
lang_pref = -10 if not matched_lang_rexs else 10 * len(matched_lang_rexs) PREFERENCES = (
source_pref = 0 # original version in requested language, without subtitles
if versionCode is not None: r'VO{0}$'.format(l),
# The original version with subtitles has lower relevance # original version in requested language, with partial subtitles in requested language
if re.match(r'VO-ST(F|A|E)', versionCode): r'VO{0}-ST{0}$'.format(l),
source_pref -= 10 # original version in requested language, with subtitles for the deaf and hard-of-hearing in requested language
# The version with sourds/mal subtitles has also lower relevance r'VO{0}-STM{0}$'.format(l),
elif re.match(r'VO?(F|A|E)-STM\1', versionCode): # non-original (dubbed) version in requested language, without subtitles
source_pref -= 9 r'V{0}$'.format(l),
# non-original (dubbed) version in requested language, with partial subtitles in requested language
r'V{0}-ST{0}$'.format(l),
# non-original (dubbed) version in requested language, with subtitles for the deaf and hard-of-hearing in requested language
r'V{0}-STM{0}$'.format(l),
# original version in requested language, with partial subtitles in different language
r'VO{0}-ST(?!{0}).+?$'.format(l),
# original version in requested language, with subtitles for the deaf and hard-of-hearing in different language
r'VO{0}-STM(?!{0}).+?$'.format(l),
# original version in different language, with partial subtitles in requested language
r'VO(?:(?!{0}).+?)?-ST{0}$'.format(l),
# original version in different language, with subtitles for the deaf and hard-of-hearing in requested language
r'VO(?:(?!{0}).+?)?-STM{0}$'.format(l),
# original version in different language, without subtitles
r'VO(?:(?!{0}))?$'.format(l),
# original version in different language, with partial subtitles in different language
r'VO(?:(?!{0}).+?)?-ST(?!{0}).+?$'.format(l),
# original version in different language, with subtitles for the deaf and hard-of-hearing in different language
r'VO(?:(?!{0}).+?)?-STM(?!{0}).+?$'.format(l),
)
for pref, p in enumerate(PREFERENCES):
if re.match(p, versionCode):
lang_pref = len(PREFERENCES) - pref
break
else:
lang_pref = -1
format = { format = {
'format_id': format_id, 'format_id': format_id,
'preference': -10 if f.get('videoFormat') == 'M3U8' else None, 'preference': -10 if f.get('videoFormat') == 'M3U8' else None,
@ -188,7 +217,6 @@ class ArteTVPlus7IE(InfoExtractor):
'height': int_or_none(f.get('height')), 'height': int_or_none(f.get('height')),
'tbr': int_or_none(f.get('bitrate')), 'tbr': int_or_none(f.get('bitrate')),
'quality': qfunc(f.get('quality')), 'quality': qfunc(f.get('quality')),
'source_preference': source_pref,
} }
if f.get('mediaType') == 'rtmp': if f.get('mediaType') == 'rtmp':
@ -210,7 +238,7 @@ class ArteTVPlus7IE(InfoExtractor):
# It also uses the arte_vp_url url from the webpage to extract the information # It also uses the arte_vp_url url from the webpage to extract the information
class ArteTVCreativeIE(ArteTVPlus7IE): class ArteTVCreativeIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:creative' IE_NAME = 'arte.tv:creative'
_VALID_URL = r'https?://creative\.arte\.tv/(?P<lang>fr|de|en|es)/(?:magazine?/)?(?P<id>[^/?#&]+)' _VALID_URL = r'https?://creative\.arte\.tv/(?P<lang>fr|de|en|es)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://creative.arte.tv/de/magazin/agentur-amateur-corporate-design', 'url': 'http://creative.arte.tv/de/magazin/agentur-amateur-corporate-design',
@ -229,9 +257,27 @@ class ArteTVCreativeIE(ArteTVPlus7IE):
'description': 'Événement ! Quarante-cinq ans après leurs premiers succès, les légendaires Monty Python remontent sur scène.\n', 'description': 'Événement ! Quarante-cinq ans après leurs premiers succès, les légendaires Monty Python remontent sur scène.\n',
'upload_date': '20140805', 'upload_date': '20140805',
} }
}, {
'url': 'http://creative.arte.tv/de/episode/agentur-amateur-4-der-erste-kunde',
'only_matching': True,
}] }]
class ArteTVInfoIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:info'
_VALID_URL = r'https?://info\.arte\.tv/(?P<lang>fr|de|en|es)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TEST = {
'url': 'http://info.arte.tv/fr/service-civique-un-cache-misere',
'info_dict': {
'id': '067528-000-A',
'ext': 'mp4',
'title': 'Service civique, un cache misère ?',
'upload_date': '20160403',
},
}
class ArteTVFutureIE(ArteTVPlus7IE): class ArteTVFutureIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:future' IE_NAME = 'arte.tv:future'
_VALID_URL = r'https?://future\.arte\.tv/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)' _VALID_URL = r'https?://future\.arte\.tv/(?P<lang>fr|de|en|es)/(?P<id>[^/?#&]+)'
@ -337,7 +383,7 @@ class ArteTVEmbedIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:embed' IE_NAME = 'arte.tv:embed'
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
http://www\.arte\.tv http://www\.arte\.tv
/playerv2/embed\.php\?json_url= /(?:playerv2/embed|arte_vp/index)\.php\?json_url=
(?P<json_url> (?P<json_url>
http://arte\.tv/papi/tvguide/videos/stream/player/ http://arte\.tv/papi/tvguide/videos/stream/player/
(?P<lang>[^/]+)/(?P<id>[^/]+)[^&]* (?P<lang>[^/]+)/(?P<id>[^/]+)[^&]*
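
Note on the `PREFERENCES` tuple added to `ArteTVPlus7IE` above: the ranking is "first matching pattern wins", and earlier patterns receive a higher `lang_pref`. A reduced sketch with only four of the thirteen patterns, assuming a hypothetical wrapper function `lang_preference`; the early return for a missing version code is an addition of this sketch, not something the diff contains:

```python
import re


def lang_preference(version_code, langcode):
    """Rank a versionCode for the requested language; higher is better."""
    if not version_code:
        # Guard added in this sketch only.
        return -1
    l = re.escape(langcode)
    preferences = (
        # original version in requested language, without subtitles
        r'VO{0}$'.format(l),
        # original version in requested language, partial subtitles in requested language
        r'VO{0}-ST{0}$'.format(l),
        # dubbed version in requested language, without subtitles
        r'V{0}$'.format(l),
        # original version in different language, partial subtitles in requested language
        r'VO(?:(?!{0}).+?)?-ST{0}$'.format(l),
    )
    for pref, p in enumerate(preferences):
        if re.match(p, version_code):
            return len(preferences) - pref
    return -1
```

With `langcode='F'`, the code `VOF` ranks above `VOF-STF`, which ranks above `VF`, which ranks above `VOA-STF`; a code such as `VA` matches none of these four patterns and falls through to `-1`.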


@ -30,14 +30,14 @@ class AudiomackIE(InfoExtractor):
# audiomack wrapper around soundcloud song # audiomack wrapper around soundcloud song
{ {
'add_ie': ['Soundcloud'], 'add_ie': ['Soundcloud'],
'url': 'http://www.audiomack.com/song/xclusiveszone/take-kare', 'url': 'http://www.audiomack.com/song/hip-hop-daily/black-mamba-freestyle',
'info_dict': { 'info_dict': {
'id': '172419696', 'id': '258901379',
'ext': 'mp3', 'ext': 'mp3',
'description': 'md5:1fc3272ed7a635cce5be1568c2822997', 'description': 'mamba day freestyle for the legend Kobe Bryant ',
'title': 'Young Thug ft Lil Wayne - Take Kare', 'title': 'Black Mamba Freestyle [Prod. By Danny Wolf]',
'uploader': 'Young Thug World', 'uploader': 'ILOVEMAKONNEN',
'upload_date': '20141016', 'upload_date': '20160414',
} }
}, },
] ]


@ -671,6 +671,7 @@ class BBCIE(BBCCoUkIE):
'info_dict': { 'info_dict': {
'id': '34475836', 'id': '34475836',
'title': 'Jurgen Klopp: Furious football from a witty and winning coach', 'title': 'Jurgen Klopp: Furious football from a witty and winning coach',
'description': 'Fast-paced football, wit, wisdom and a ready smile - why Liverpool fans should come to love new boss Jurgen Klopp.',
}, },
'playlist_count': 3, 'playlist_count': 3,
}, { }, {


@ -0,0 +1,39 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class BIQLEIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?biqle\.(?:com|org|ru)/watch/(?P<id>-?\d+_\d+)'
_TESTS = [{
'url': 'http://www.biqle.ru/watch/847655_160197695',
'md5': 'ad5f746a874ccded7b8f211aeea96637',
'info_dict': {
'id': '160197695',
'ext': 'mp4',
'title': 'Foo Fighters - The Pretender (Live at Wembley Stadium)',
'uploader': 'Andrey Rogozin',
'upload_date': '20110605',
}
}, {
'url': 'https://biqle.org/watch/-44781847_168547604',
'md5': '7f24e72af1db0edf7c1aaba513174f97',
'info_dict': {
'id': '168547604',
'ext': 'mp4',
'title': 'Ребенок в шоке от автоматической мойки',
'uploader': 'Dmitry Kotov',
}
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
embed_url = self._proto_relative_url(self._search_regex(
r'<iframe.+?src="((?:http:)?//daxab\.com/[^"]+)".*?></iframe>', webpage, 'embed url'))
return {
'_type': 'url_transparent',
'url': embed_url,
}


@ -17,6 +17,9 @@ class BloombergIE(InfoExtractor):
'title': 'Shah\'s Presentation on Foreign-Exchange Strategies', 'title': 'Shah\'s Presentation on Foreign-Exchange Strategies',
'description': 'md5:a8ba0302912d03d246979735c17d2761', 'description': 'md5:a8ba0302912d03d246979735c17d2761',
}, },
'params': {
'format': 'best[format_id^=hds]',
},
}, { }, {
'url': 'http://www.bloomberg.com/news/articles/2015-11-12/five-strange-things-that-have-been-happening-in-financial-markets', 'url': 'http://www.bloomberg.com/news/articles/2015-11-12/five-strange-things-that-have-been-happening-in-financial-markets',
'only_matching': True, 'only_matching': True,


@ -307,9 +307,10 @@ class BrightcoveLegacyIE(InfoExtractor):
playlist_title=playlist_info['mediaCollectionDTO']['displayName']) playlist_title=playlist_info['mediaCollectionDTO']['displayName'])
def _extract_video_info(self, video_info): def _extract_video_info(self, video_info):
video_id = compat_str(video_info['id'])
publisher_id = video_info.get('publisherId') publisher_id = video_info.get('publisherId')
info = { info = {
'id': compat_str(video_info['id']), 'id': video_id,
'title': video_info['displayName'].strip(), 'title': video_info['displayName'].strip(),
'description': video_info.get('shortDescription'), 'description': video_info.get('shortDescription'),
'thumbnail': video_info.get('videoStillURL') or video_info.get('thumbnailURL'), 'thumbnail': video_info.get('videoStillURL') or video_info.get('thumbnailURL'),
@ -331,7 +332,8 @@ class BrightcoveLegacyIE(InfoExtractor):
url_comp = compat_urllib_parse_urlparse(url) url_comp = compat_urllib_parse_urlparse(url)
if url_comp.path.endswith('.m3u8'): if url_comp.path.endswith('.m3u8'):
formats.extend( formats.extend(
self._extract_m3u8_formats(url, info['id'], 'mp4')) self._extract_m3u8_formats(
url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
continue continue
elif 'akamaihd.net' in url_comp.netloc: elif 'akamaihd.net' in url_comp.netloc:
# This type of renditions are served through # This type of renditions are served through
@ -340,7 +342,7 @@ class BrightcoveLegacyIE(InfoExtractor):
ext = 'flv' ext = 'flv'
if ext is None: if ext is None:
ext = determine_ext(url) ext = determine_ext(url)
tbr = int_or_none(rend.get('encodingRate'), 1000), tbr = int_or_none(rend.get('encodingRate'), 1000)
a_format = { a_format = {
'format_id': 'http%s' % ('-%s' % tbr if tbr else ''), 'format_id': 'http%s' % ('-%s' % tbr if tbr else ''),
'url': url, 'url': url,
@ -365,7 +367,7 @@ class BrightcoveLegacyIE(InfoExtractor):
a_format.update({ a_format.update({
'format_id': 'hls%s' % ('-%s' % tbr if tbr else ''), 'format_id': 'hls%s' % ('-%s' % tbr if tbr else ''),
'ext': 'mp4', 'ext': 'mp4',
'protocol': 'm3u8', 'protocol': 'm3u8_native',
}) })
formats.append(a_format) formats.append(a_format)
@ -395,7 +397,7 @@ class BrightcoveLegacyIE(InfoExtractor):
return ad_info return ad_info
if 'url' not in info and not info.get('formats'): if 'url' not in info and not info.get('formats'):
raise ExtractorError('Unable to extract video url for %s' % info['id']) raise ExtractorError('Unable to extract video url for %s' % video_id)
return info return info
@ -527,7 +529,7 @@ class BrightcoveNewIE(InfoExtractor):
if not src: if not src:
continue continue
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
src, video_id, 'mp4', m3u8_id='hls', fatal=False)) src, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
elif source_type == 'application/dash+xml': elif source_type == 'application/dash+xml':
if not src: if not src:
continue continue
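
Note on the one-character `tbr` fix in `BrightcoveLegacyIE` above: the removed line ended in a comma, which makes the right-hand side a 1-tuple rather than an integer. A short sketch of the consequence, assuming a simplified stand-in for youtube-dl's `int_or_none` and an invented `encodingRate` value:

```python
def int_or_none(v, scale=1):
    # Simplified stand-in for the youtube-dl helper of the same name.
    if v is None:
        return None
    return int(v) // scale


rend = {'encodingRate': 250000}  # invented value for illustration

tbr_old = int_or_none(rend.get('encodingRate'), 1000),  # trailing comma: a 1-tuple
tbr_new = int_or_none(rend.get('encodingRate'), 1000)   # plain int

# With a missing rate, the old form yields (None,), which is truthy,
# so the "-%s" suffix is still appended.
missing_old = int_or_none(None, 1000),
format_id_old = 'http%s' % ('-%s' % missing_old if missing_old else '')

missing_new = int_or_none(None, 1000)
format_id_new = 'http%s' % ('-%s' % missing_new if missing_new else '')
```

In the old form the `tbr` field of the format dict would also hold a tuple instead of a number, which is presumably what prompted the fix.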


@ -33,6 +33,7 @@ class CBCIE(InfoExtractor):
'title': 'Robin Williams freestyles on 90 Minutes Live', 'title': 'Robin Williams freestyles on 90 Minutes Live',
'description': 'Wacky American comedian Robin Williams shows off his infamous "freestyle" comedic talents while being interviewed on CBC\'s 90 Minutes Live.', 'description': 'Wacky American comedian Robin Williams shows off his infamous "freestyle" comedic talents while being interviewed on CBC\'s 90 Minutes Live.',
'upload_date': '19700101', 'upload_date': '19700101',
'uploader': 'CBCC-NEW',
}, },
'params': { 'params': {
# rtmp download # rtmp download


@ -5,7 +5,6 @@ from ..utils import (
xpath_text, xpath_text,
xpath_element, xpath_element,
int_or_none, int_or_none,
ExtractorError,
find_xpath_attr, find_xpath_attr,
) )
@ -64,7 +63,7 @@ class CBSIE(CBSBaseIE):
'url': 'http://www.colbertlateshow.com/podcasts/dYSwjqPs_X1tvbV_P2FcPWRa_qT6akTC/in-the-bad-room-with-stephen/', 'url': 'http://www.colbertlateshow.com/podcasts/dYSwjqPs_X1tvbV_P2FcPWRa_qT6akTC/in-the-bad-room-with-stephen/',
'only_matching': True, 'only_matching': True,
}] }]
TP_RELEASE_URL_TEMPLATE = 'http://link.theplatform.com/s/dJ5BDC/%s?manifest=m3u&mbr=true' TP_RELEASE_URL_TEMPLATE = 'http://link.theplatform.com/s/dJ5BDC/%s?mbr=true'
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) display_id = self._match_id(url)
@ -84,11 +83,11 @@ class CBSIE(CBSBaseIE):
pid = xpath_text(item, 'pid') pid = xpath_text(item, 'pid')
if not pid: if not pid:
continue continue
try: tp_release_url = self.TP_RELEASE_URL_TEMPLATE % pid
if '.m3u8' in xpath_text(item, 'contentUrl', default=''):
tp_release_url += '&manifest=m3u'
tp_formats, tp_subtitles = self._extract_theplatform_smil( tp_formats, tp_subtitles = self._extract_theplatform_smil(
self.TP_RELEASE_URL_TEMPLATE % pid, content_id, 'Downloading %s SMIL data' % pid) tp_release_url, content_id, 'Downloading %s SMIL data' % pid)
except ExtractorError:
continue
formats.extend(tp_formats) formats.extend(tp_formats)
subtitles = self._merge_subtitles(subtitles, tp_subtitles) subtitles = self._merge_subtitles(subtitles, tp_subtitles)
self._sort_formats(formats) self._sort_formats(formats)


@ -1,13 +1,9 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
parse_duration, parse_iso8601,
qualities,
unified_strdate,
) )
@ -19,14 +15,14 @@ class CCCIE(InfoExtractor):
'url': 'https://media.ccc.de/v/30C3_-_5443_-_en_-_saal_g_-_201312281830_-_introduction_to_processor_design_-_byterazor#video', 'url': 'https://media.ccc.de/v/30C3_-_5443_-_en_-_saal_g_-_201312281830_-_introduction_to_processor_design_-_byterazor#video',
'md5': '3a1eda8f3a29515d27f5adb967d7e740', 'md5': '3a1eda8f3a29515d27f5adb967d7e740',
'info_dict': { 'info_dict': {
'id': '30C3_-_5443_-_en_-_saal_g_-_201312281830_-_introduction_to_processor_design_-_byterazor', 'id': '1839',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Introduction to Processor Design', 'title': 'Introduction to Processor Design',
'description': 'md5:80be298773966f66d56cb11260b879af', 'description': 'md5:df55f6d073d4ceae55aae6f2fd98a0ac',
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
'view_count': int,
'upload_date': '20131228', 'upload_date': '20131228',
'duration': 3660, 'timestamp': 1388188800,
'duration': 3710,
} }
}, { }, {
'url': 'https://media.ccc.de/v/32c3-7368-shopshifting#download', 'url': 'https://media.ccc.de/v/32c3-7368-shopshifting#download',
@ -34,79 +30,48 @@ class CCCIE(InfoExtractor):
}] }]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) display_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, display_id)
event_id = self._search_regex("data-id='(\d+)'", webpage, 'event id')
event_data = self._download_json('https://media.ccc.de/public/events/%s' % event_id, event_id)
if self._downloader.params.get('prefer_free_formats'):
preference = qualities(['mp3', 'opus', 'mp4-lq', 'webm-lq', 'h264-sd', 'mp4-sd', 'webm-sd', 'mp4', 'webm', 'mp4-hd', 'h264-hd', 'webm-hd'])
else:
preference = qualities(['opus', 'mp3', 'webm-lq', 'mp4-lq', 'webm-sd', 'h264-sd', 'mp4-sd', 'webm', 'mp4', 'webm-hd', 'mp4-hd', 'h264-hd'])
title = self._html_search_regex(
r'(?s)<h1>(.*?)</h1>', webpage, 'title')
description = self._html_search_regex(
r'(?s)<h3>About</h3>(.+?)<h3>',
webpage, 'description', fatal=False)
upload_date = unified_strdate(self._html_search_regex(
r"(?s)<span[^>]+class='[^']*fa-calendar-o'[^>]*>(.+?)</span>",
webpage, 'upload date', fatal=False))
view_count = int_or_none(self._html_search_regex(
r"(?s)<span class='[^']*fa-eye'></span>(.*?)</li>",
webpage, 'view count', fatal=False))
duration = parse_duration(self._html_search_regex(
r'(?s)<span[^>]+class=(["\']).*?fa-clock-o.*?\1[^>]*></span>(?P<duration>.+?)</li',
webpage, 'duration', fatal=False, group='duration'))
matches = re.finditer(r'''(?xs)
<(?:span|div)\s+class='label\s+filetype'>(?P<format>[^<]*)</(?:span|div)>\s*
<(?:span|div)\s+class='label\s+filetype'>(?P<lang>[^<]*)</(?:span|div)>\s*
<a\s+download\s+href='(?P<http_url>[^']+)'>\s*
(?:
.*?
<a\s+(?:download\s+)?href='(?P<torrent_url>[^']+\.torrent)'
)?''', webpage)
formats = [] formats = []
for m in matches: for recording in event_data.get('recordings', []):
format = m.group('format') recording_url = recording.get('recording_url')
format_id = self._search_regex( if not recording_url:
r'.*/([a-z0-9_-]+)/[^/]*$', continue
m.group('http_url'), 'format id', default=None) language = recording.get('language')
if format_id: folder = recording.get('folder')
format_id = m.group('lang') + '-' + format_id format_id = None
vcodec = 'h264' if 'h264' in format_id else ( if language:
'none' if format_id in ('mp3', 'opus') else None format_id = language
if folder:
if language:
format_id += '-' + folder
else:
format_id = folder
vcodec = 'h264' if 'h264' in folder else (
'none' if folder in ('mp3', 'opus') else None
) )
formats.append({ formats.append({
'format_id': format_id, 'format_id': format_id,
'format': format, 'url': recording_url,
'language': m.group('lang'), 'width': int_or_none(recording.get('width')),
'url': m.group('http_url'), 'height': int_or_none(recording.get('height')),
'filesize': int_or_none(recording.get('size'), invscale=1024 * 1024),
'language': language,
'vcodec': vcodec, 'vcodec': vcodec,
'preference': preference(format_id),
})
if m.group('torrent_url'):
formats.append({
'format_id': 'torrent-%s' % (format if format_id is None else format_id),
'format': '%s (torrent)' % format,
'proto': 'torrent',
'format_note': '(unsupported; will just download the .torrent file)',
'vcodec': vcodec,
'preference': -100 + preference(format_id),
'url': m.group('torrent_url'),
}) })
self._sort_formats(formats) self._sort_formats(formats)
thumbnail = self._html_search_regex(
r"<video.*?poster='([^']+)'", webpage, 'thumbnail', fatal=False)
return { return {
'id': video_id, 'id': event_id,
'title': title, 'display_id': display_id,
'description': description, 'title': event_data['title'],
'thumbnail': thumbnail, 'description': event_data.get('description'),
'view_count': view_count, 'thumbnail': event_data.get('thumb_url'),
'upload_date': upload_date, 'timestamp': parse_iso8601(event_data.get('date')),
'duration': duration, 'duration': int_or_none(event_data.get('length')),
'tags': event_data.get('tags'),
'formats': formats, 'formats': formats,
} }
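
Note on the rewritten `CCCIE._real_extract` above: each recording's `format_id` is assembled from its optional `language` and `folder` fields, and the codec is guessed from `folder`. A sketch of both steps as standalone functions, assuming hypothetical names (`build_format_id`, `guess_vcodec`). The check for a missing `folder` in `guess_vcodec` is an addition of this sketch; the diff applies `in` to `folder` directly, which would raise a `TypeError` if the API omitted that field:

```python
def build_format_id(language, folder):
    """Combine language and folder into a format id; either may be missing."""
    format_id = None
    if language:
        format_id = language
    if folder:
        if language:
            format_id += '-' + folder
        else:
            format_id = folder
    return format_id


def guess_vcodec(folder):
    """Guess the video codec from the folder name; None means unknown."""
    if not folder:
        # Guard added in this sketch only.
        return None
    if 'h264' in folder:
        return 'h264'
    if folder in ('mp3', 'opus'):
        return 'none'  # audio-only recording
    return None
```

For example, `build_format_id('eng', 'h264-hd')` gives `'eng-h264-hd'`, and an audio folder such as `'mp3'` maps to the string `'none'`, which youtube-dl uses to mark formats without video.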


@ -33,19 +33,33 @@ class CeskaTelevizeIE(InfoExtractor):
'skip_download': True, 'skip_download': True,
}, },
}, { }, {
'url': 'http://www.ceskatelevize.cz/ivysilani/10532695142-prvni-republika/bonus/14716-zpevacka-z-duparny-bobina', 'url': 'http://www.ceskatelevize.cz/ivysilani/10441294653-hyde-park-civilizace/215411058090502/bonus/20641-bonus-01-en',
'info_dict': { 'info_dict': {
'id': '61924494876844374', 'id': '61924494877028507',
'ext': 'mp4', 'ext': 'mp4',
'title': 'První republika: Zpěvačka z Dupárny Bobina', 'title': 'Hyde Park Civilizace: Bonus 01 - En',
'description': 'Sága mapující atmosféru první republiky od r. 1918 do r. 1945.', 'description': 'English Subtittles',
'thumbnail': 're:^https?://.*\.jpg', 'thumbnail': 're:^https?://.*\.jpg',
'duration': 88.4, 'duration': 81.3,
}, },
'params': { 'params': {
# m3u8 download # m3u8 download
'skip_download': True, 'skip_download': True,
}, },
}, {
# live stream
'url': 'http://www.ceskatelevize.cz/ivysilani/zive/ct4/',
'info_dict': {
'id': 402,
'ext': 'mp4',
'title': 're:^ČT Sport \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
'is_live': True,
},
'params': {
# m3u8 download
'skip_download': True,
},
'skip': 'Georestricted to Czech Republic',
}, { }, {
# video with 18+ caution trailer # video with 18+ caution trailer
'url': 'http://www.ceskatelevize.cz/porady/10520528904-queer/215562210900007-bogotart/', 'url': 'http://www.ceskatelevize.cz/porady/10520528904-queer/215562210900007-bogotart/',
@ -118,19 +132,21 @@ class CeskaTelevizeIE(InfoExtractor):
req = sanitized_Request(compat_urllib_parse_unquote(playlist_url)) req = sanitized_Request(compat_urllib_parse_unquote(playlist_url))
req.add_header('Referer', url) req.add_header('Referer', url)
playlist_title = self._og_search_title(webpage) playlist_title = self._og_search_title(webpage, default=None)
playlist_description = self._og_search_description(webpage) playlist_description = self._og_search_description(webpage, default=None)
playlist = self._download_json(req, playlist_id)['playlist'] playlist = self._download_json(req, playlist_id)['playlist']
playlist_len = len(playlist) playlist_len = len(playlist)
entries = [] entries = []
for item in playlist: for item in playlist:
is_live = item.get('type') == 'LIVE'
formats = [] formats = []
for format_id, stream_url in item['streamUrls'].items(): for format_id, stream_url in item['streamUrls'].items():
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
stream_url, playlist_id, 'mp4', stream_url, playlist_id, 'mp4',
entry_protocol='m3u8_native', fatal=False)) entry_protocol='m3u8' if is_live else 'm3u8_native',
fatal=False))
self._sort_formats(formats) self._sort_formats(formats)
item_id = item.get('id') or item['assetId'] item_id = item.get('id') or item['assetId']
@ -145,14 +161,22 @@ class CeskaTelevizeIE(InfoExtractor):
if subs: if subs:
subtitles = self.extract_subtitles(episode_id, subs) subtitles = self.extract_subtitles(episode_id, subs)
if playlist_len == 1:
final_title = playlist_title or title
if is_live:
final_title = self._live_title(final_title)
else:
final_title = '%s (%s)' % (playlist_title, title)
entries.append({ entries.append({
'id': item_id, 'id': item_id,
'title': playlist_title if playlist_len == 1 else '%s (%s)' % (playlist_title, title), 'title': final_title,
'description': playlist_description if playlist_len == 1 else None, 'description': playlist_description if playlist_len == 1 else None,
'thumbnail': thumbnail, 'thumbnail': thumbnail,
'duration': duration, 'duration': duration,
'formats': formats, 'formats': formats,
'subtitles': subtitles, 'subtitles': subtitles,
'is_live': is_live,
}) })
return self.playlist_result(entries, playlist_id, playlist_title, playlist_description) return self.playlist_result(entries, playlist_id, playlist_title, playlist_description)


@ -1,119 +0,0 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import ExtractorError
from .screenwavemedia import ScreenwaveMediaIE
class CinemassacreIE(InfoExtractor):
_VALID_URL = 'https?://(?:www\.)?cinemassacre\.com/(?P<date_y>[0-9]{4})/(?P<date_m>[0-9]{2})/(?P<date_d>[0-9]{2})/(?P<display_id>[^?#/]+)'
_TESTS = [
{
'url': 'http://cinemassacre.com/2012/11/10/avgn-the-movie-trailer/',
'md5': 'fde81fbafaee331785f58cd6c0d46190',
'info_dict': {
'id': 'Cinemassacre-19911',
'ext': 'mp4',
'upload_date': '20121110',
'title': '“Angry Video Game Nerd: The Movie” Trailer',
'description': 'md5:fb87405fcb42a331742a0dce2708560b',
},
'params': {
# m3u8 download
'skip_download': True,
},
},
{
'url': 'http://cinemassacre.com/2013/10/02/the-mummys-hand-1940',
'md5': 'd72f10cd39eac4215048f62ab477a511',
'info_dict': {
'id': 'Cinemassacre-521be8ef82b16',
'ext': 'mp4',
'upload_date': '20131002',
'title': 'The Mummys Hand (1940)',
},
'params': {
# m3u8 download
'skip_download': True,
},
},
{
# Youtube embedded video
'url': 'http://cinemassacre.com/2006/12/07/chronologically-confused-about-bad-movie-and-video-game-sequel-titles/',
'md5': 'ec9838a5520ef5409b3e4e42fcb0a3b9',
'info_dict': {
'id': 'OEVzPCY2T-g',
'ext': 'webm',
'title': 'AVGN: Chronologically Confused about Bad Movie and Video Game Sequel Titles',
'upload_date': '20061207',
'uploader': 'Cinemassacre',
'uploader_id': 'JamesNintendoNerd',
'description': 'md5:784734696c2b8b7f4b8625cc799e07f6',
}
},
{
# Youtube embedded video
'url': 'http://cinemassacre.com/2006/09/01/mckids/',
'md5': '7393c4e0f54602ad110c793eb7a6513a',
'info_dict': {
'id': 'FnxsNhuikpo',
'ext': 'webm',
'upload_date': '20060901',
'uploader': 'Cinemassacre Extra',
'description': 'md5:de9b751efa9e45fbaafd9c8a1123ed53',
'uploader_id': 'Cinemassacre',
'title': 'AVGN: McKids',
}
},
{
'url': 'http://cinemassacre.com/2015/05/25/mario-kart-64-nintendo-64-james-mike-mondays/',
'md5': '1376908e49572389e7b06251a53cdd08',
'info_dict': {
'id': 'Cinemassacre-555779690c440',
'ext': 'mp4',
'description': 'Lets Play Mario Kart 64 !! Mario Kart 64 is a classic go-kart racing game released for the Nintendo 64 (N64). Today James & Mike do 4 player Battle Mode with Kyle and Bootsy!',
'title': 'Mario Kart 64 (Nintendo 64) James & Mike Mondays',
'upload_date': '20150525',
},
'params': {
# m3u8 download
'skip_download': True,
},
}
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('display_id')
video_date = mobj.group('date_y') + mobj.group('date_m') + mobj.group('date_d')
webpage = self._download_webpage(url, display_id)
playerdata_url = self._search_regex(
[
ScreenwaveMediaIE.EMBED_PATTERN,
r'<iframe[^>]+src="(?P<url>(?:https?:)?//(?:[^.]+\.)?youtube\.com/.+?)"',
],
webpage, 'player data URL', default=None, group='url')
if not playerdata_url:
raise ExtractorError('Unable to find player data')
video_title = self._html_search_regex(
r'<title>(?P<title>.+?)\|', webpage, 'title')
video_description = self._html_search_regex(
r'<div class="entry-content">(?P<description>.+?)</div>',
webpage, 'description', flags=re.DOTALL, fatal=False)
video_thumbnail = self._og_search_thumbnail(webpage)
return {
'_type': 'url_transparent',
'display_id': display_id,
'title': video_title,
'description': video_description,
'upload_date': video_date,
'thumbnail': video_thumbnail,
'url': playerdata_url,
}

View File

@ -0,0 +1,90 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
parse_iso8601,
)
class ClipRsIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?clip\.rs/(?P<id>[^/]+)/\d+'
_TEST = {
'url': 'http://www.clip.rs/premijera-frajle-predstavljaju-novi-spot-za-pesmu-moli-me-moli/3732',
'md5': 'c412d57815ba07b56f9edc7b5d6a14e5',
'info_dict': {
'id': '1488842.1399140381',
'ext': 'mp4',
'title': 'PREMIJERA Frajle predstavljaju novi spot za pesmu Moli me, moli',
'description': 'md5:56ce2c3b4ab31c5a2e0b17cb9a453026',
'duration': 229,
'timestamp': 1459850243,
'upload_date': '20160405',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_id = self._search_regex(
r'id=(["\'])mvp:(?P<id>.+?)\1', webpage, 'mvp id', group='id')
response = self._download_json(
'http://qi.ckm.onetapi.pl/', video_id,
query={
'body[id]': video_id,
'body[jsonrpc]': '2.0',
'body[method]': 'get_asset_detail',
'body[params][ID_Publikacji]': video_id,
'body[params][Service]': 'www.onet.pl',
'content-type': 'application/jsonp',
'x-onet-app': 'player.front.onetapi.pl',
})
error = response.get('error')
if error:
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, error['message']), expected=True)
video = response['result'].get('0')
formats = []
for _, formats_dict in video['formats'].items():
if not isinstance(formats_dict, dict):
continue
for format_id, format_list in formats_dict.items():
if not isinstance(format_list, list):
continue
for f in format_list:
if not f.get('url'):
continue
formats.append({
'url': f['url'],
'format_id': format_id,
'height': int_or_none(f.get('vertical_resolution')),
'width': int_or_none(f.get('horizontal_resolution')),
'abr': float_or_none(f.get('audio_bitrate')),
'vbr': float_or_none(f.get('video_bitrate')),
})
self._sort_formats(formats)
meta = video.get('meta', {})
title = self._og_search_title(webpage, default=None) or meta['title']
description = self._og_search_description(webpage, default=None) or meta.get('description')
duration = meta.get('length') or meta.get('lenght')
timestamp = parse_iso8601(meta.get('addDate'), ' ')
return {
'id': video_id,
'title': title,
'description': description,
'duration': duration,
'timestamp': timestamp,
'formats': formats,
}
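
Note on the format loop in `ClipRsIE` above: the `formats` object is walked two levels deep, and anything that is not of the expected type is skipped rather than treated as an error. A reduced sketch of that traversal, assuming a hypothetical function name (`collect_formats`) and sample data invented for illustration; the real response layout is only known from what the diff reads out of it:

```python
def collect_formats(formats_by_kind):
    """Flatten {kind: {format_id: [entries]}} into a list of format dicts."""
    formats = []
    for _, formats_dict in formats_by_kind.items():
        if not isinstance(formats_dict, dict):
            continue
        for format_id, format_list in formats_dict.items():
            if not isinstance(format_list, list):
                continue
            for f in format_list:
                if not f.get('url'):
                    continue
                formats.append({
                    'url': f['url'],
                    'format_id': format_id,
                })
    return formats


# Invented sample: one usable entry, one entry without a URL,
# and one top-level value of the wrong type.
sample = {
    'video': {'mp4': [{'url': 'http://example.com/a.mp4'}, {'url': None}]},
    'other': [],
}
result = collect_formats(sample)
```

Only the first `mp4` entry survives; the entry without a URL and the non-dict `other` value are dropped silently.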


@ -19,7 +19,7 @@ from ..utils import (
class CloudyIE(InfoExtractor): class CloudyIE(InfoExtractor):
_IE_DESC = 'cloudy.ec and videoraj.ch' _IE_DESC = 'cloudy.ec and videoraj.ch'
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?://(?:www\.)?(?P<host>cloudy\.ec|videoraj\.ch)/ https?://(?:www\.)?(?P<host>cloudy\.ec|videoraj\.(?:ch|to))/
(?:v/|embed\.php\?id=) (?:v/|embed\.php\?id=)
(?P<id>[A-Za-z0-9]+) (?P<id>[A-Za-z0-9]+)
''' '''
@ -37,7 +37,7 @@ class CloudyIE(InfoExtractor):
} }
}, },
{ {
'url': 'http://www.videoraj.ch/v/47f399fd8bb60', 'url': 'http://www.videoraj.to/v/47f399fd8bb60',
'md5': '7d0f8799d91efd4eda26587421c3c3b0', 'md5': '7d0f8799d91efd4eda26587421c3c3b0',
'info_dict': { 'info_dict': {
'id': '47f399fd8bb60', 'id': '47f399fd8bb60',


@ -163,7 +163,7 @@ class InfoExtractor(object):
description: Full video description. description: Full video description.
uploader: Full name of the video uploader. uploader: Full name of the video uploader.
license: License name the video is licensed under. license: License name the video is licensed under.
creator: The main artist who created the video. creator: The creator of the video.
release_date: The date (YYYYMMDD) when the video was released. release_date: The date (YYYYMMDD) when the video was released.
timestamp: UNIX timestamp of the moment the video became available. timestamp: UNIX timestamp of the moment the video became available.
upload_date: Video upload date (YYYYMMDD). upload_date: Video upload date (YYYYMMDD).
@ -376,14 +376,13 @@ class InfoExtractor(object):
self.to_screen('%s' % (note,)) self.to_screen('%s' % (note,))
else: else:
self.to_screen('%s: %s' % (video_id, note)) self.to_screen('%s: %s' % (video_id, note))
# data, headers and query params will be ignored for `Request` objects
if isinstance(url_or_request, compat_urllib_request.Request): if isinstance(url_or_request, compat_urllib_request.Request):
url_or_request = update_Request( url_or_request = update_Request(
url_or_request, data=data, headers=headers, query=query) url_or_request, data=data, headers=headers, query=query)
else: else:
if query: if query:
url_or_request = update_url_query(url_or_request, query) url_or_request = update_url_query(url_or_request, query)
if data or headers: if data is not None or headers:
url_or_request = sanitized_Request(url_or_request, data, headers) url_or_request = sanitized_Request(url_or_request, data, headers)
try: try:
return self._downloader.urlopen(url_or_request) return self._downloader.urlopen(url_or_request)
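
The switch from `if data or headers` to `if data is not None or headers` matters for empty request bodies. The sketch below assumes Python 3 and uses the standard library `Request` class rather than youtube-dl's `sanitized_Request`; `old_check` and `new_check` are hypothetical names that only mirror the two conditions.

```python
from urllib.request import Request


def old_check(data=None, headers=None):
    # Condition before the change: an empty body is falsy and gets ignored.
    return bool(data or headers)


def new_check(data=None, headers=None):
    # Condition after the change: only a missing body (None) is ignored.
    return data is not None or bool(headers)


empty_body = b''
old_result = old_check(data=empty_body)
new_result = new_check(data=empty_body)

# With the standard library, any non-None body turns the request into a POST.
method = Request('http://example.com/', data=empty_body).get_method()
```

Under the old condition an empty payload would not have been wrapped in a request object at all, so it would presumably have gone out as a plain GET.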
@ -1007,6 +1006,13 @@ class InfoExtractor(object):
def _parse_f4m_formats(self, manifest, manifest_url, video_id, preference=None, f4m_id=None, def _parse_f4m_formats(self, manifest, manifest_url, video_id, preference=None, f4m_id=None,
transform_source=lambda s: fix_xml_ampersands(s).strip(), transform_source=lambda s: fix_xml_ampersands(s).strip(),
fatal=True): fatal=True):
# currently youtube-dl cannot decode the playerVerificationChallenge as Akamai uses Adobe Alchemy
akamai_pv = manifest.find('{http://ns.adobe.com/f4m/1.0}pv-2.0')
if akamai_pv is not None and ';' in akamai_pv.text:
playerVerificationChallenge = akamai_pv.text.split(';')[0]
if playerVerificationChallenge.strip() != '':
return []
formats = [] formats = []
manifest_version = '1.0' manifest_version = '1.0'
media_nodes = manifest.findall('{http://ns.adobe.com/f4m/1.0}media') media_nodes = manifest.findall('{http://ns.adobe.com/f4m/1.0}media')
@ -1055,7 +1061,7 @@ class InfoExtractor(object):
def _extract_m3u8_formats(self, m3u8_url, video_id, ext=None, def _extract_m3u8_formats(self, m3u8_url, video_id, ext=None,
entry_protocol='m3u8', preference=None, entry_protocol='m3u8', preference=None,
m3u8_id=None, note=None, errnote=None, m3u8_id=None, note=None, errnote=None,
fatal=True): fatal=True, live=False):
formats = [{ formats = [{
'format_id': '-'.join(filter(None, [m3u8_id, 'meta'])), 'format_id': '-'.join(filter(None, [m3u8_id, 'meta'])),
@ -1133,6 +1139,10 @@ class InfoExtractor(object):
if m3u8_id: if m3u8_id:
format_id.append(m3u8_id) format_id.append(m3u8_id)
last_media_name = last_media.get('NAME') if last_media and last_media.get('TYPE') != 'SUBTITLES' else None last_media_name = last_media.get('NAME') if last_media and last_media.get('TYPE') != 'SUBTITLES' else None
# Bandwidth of live streams may differ over time thus making
# format_id unpredictable. So it's better to keep provided
# format_id intact.
if not live:
format_id.append(last_media_name if last_media_name else '%d' % (tbr if tbr else len(formats))) format_id.append(last_media_name if last_media_name else '%d' % (tbr if tbr else len(formats)))
f = { f = {
'format_id': '-'.join(format_id), 'format_id': '-'.join(format_id),
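
The effect of the new `live` flag on `format_id` can be sketched as a standalone function. `build_format_id` is a hypothetical helper that only mirrors the lines in this hunk; it is not part of youtube-dl.

```python
def build_format_id(m3u8_id, last_media_name, tbr, n_formats, live=False):
    # Hypothetical helper mirroring the format_id logic in the hunk above.
    format_id = []
    if m3u8_id:
        format_id.append(m3u8_id)
    # For live streams the bandwidth-derived suffix is left out, because the
    # bandwidth may change between requests.
    if not live:
        format_id.append(
            last_media_name if last_media_name
            else '%d' % (tbr if tbr else n_formats))
    return '-'.join(format_id)


vod_id = build_format_id('hls', None, 1200, 0)
named_id = build_format_id('hls', 'English', 1200, 0)
live_id = build_format_id('hls', None, 1200, 0, live=True)
```

With `live=True` the identifier stays at the caller-supplied `m3u8_id`, so it remains stable across runs.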

View File

@ -11,7 +11,6 @@ from math import pow, sqrt, floor
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_etree_fromstring, compat_etree_fromstring,
compat_urllib_parse_unquote,
compat_urllib_parse_urlencode, compat_urllib_parse_urlencode,
compat_urllib_request, compat_urllib_request,
compat_urlparse, compat_urlparse,
@ -27,6 +26,7 @@ from ..utils import (
unified_strdate, unified_strdate,
urlencode_postdata, urlencode_postdata,
xpath_text, xpath_text,
extract_attributes,
) )
from ..aes import ( from ..aes import (
aes_cbc_decrypt, aes_cbc_decrypt,
@ -306,28 +306,36 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
r'<a[^>]+href="/publisher/[^"]+"[^>]*>([^<]+)</a>', webpage, r'<a[^>]+href="/publisher/[^"]+"[^>]*>([^<]+)</a>', webpage,
'video_uploader', fatal=False) 'video_uploader', fatal=False)
playerdata_url = compat_urllib_parse_unquote(self._html_search_regex(r'"config_url":"([^"]+)', webpage, 'playerdata_url')) available_fmts = []
playerdata_req = sanitized_Request(playerdata_url) for a, fmt in re.findall(r'(<a[^>]+token=["\']showmedia\.([0-9]{3,4})p["\'][^>]+>)', webpage):
playerdata_req.data = urlencode_postdata({'current_page': webpage_url}) attrs = extract_attributes(a)
playerdata_req.add_header('Content-Type', 'application/x-www-form-urlencoded') href = attrs.get('href')
playerdata = self._download_webpage(playerdata_req, video_id, note='Downloading media info') if href and '/freetrial' in href:
continue
stream_id = self._search_regex(r'<media_id>([^<]+)', playerdata, 'stream_id') available_fmts.append(fmt)
video_thumbnail = self._search_regex(r'<episode_image_url>([^<]+)', playerdata, 'thumbnail', fatal=False) if not available_fmts:
for p in (r'token=["\']showmedia\.([0-9]{3,4})p"', r'showmedia\.([0-9]{3,4})p'):
available_fmts = re.findall(p, webpage)
if available_fmts:
break
video_encode_ids = []
formats = [] formats = []
for fmt in re.findall(r'showmedia\.([0-9]{3,4})p', webpage): for fmt in available_fmts:
stream_quality, stream_format = self._FORMAT_IDS[fmt] stream_quality, stream_format = self._FORMAT_IDS[fmt]
video_format = fmt + 'p' video_format = fmt + 'p'
streamdata_req = sanitized_Request( streamdata_req = sanitized_Request(
'http://www.crunchyroll.com/xml/?req=RpcApiVideoPlayer_GetStandardConfig&media_id=%s&video_format=%s&video_quality=%s' 'http://www.crunchyroll.com/xml/?req=RpcApiVideoPlayer_GetStandardConfig&media_id=%s&video_format=%s&video_quality=%s'
% (stream_id, stream_format, stream_quality), % (video_id, stream_format, stream_quality),
compat_urllib_parse_urlencode({'current_page': url}).encode('utf-8')) compat_urllib_parse_urlencode({'current_page': url}).encode('utf-8'))
streamdata_req.add_header('Content-Type', 'application/x-www-form-urlencoded') streamdata_req.add_header('Content-Type', 'application/x-www-form-urlencoded')
streamdata = self._download_xml( streamdata = self._download_xml(
streamdata_req, video_id, streamdata_req, video_id,
note='Downloading media info for %s' % video_format) note='Downloading media info for %s' % video_format)
stream_info = streamdata.find('./{default}preload/stream_info') stream_info = streamdata.find('./{default}preload/stream_info')
video_encode_id = xpath_text(stream_info, './video_encode_id')
if video_encode_id in video_encode_ids:
continue
video_encode_ids.append(video_encode_id)
video_url = xpath_text(stream_info, './host') video_url = xpath_text(stream_info, './host')
video_play_path = xpath_text(stream_info, './file') video_play_path = xpath_text(stream_info, './file')
if not video_url or not video_play_path: if not video_url or not video_play_path:
@ -359,6 +367,14 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
'ext': 'flv', 'ext': 'flv',
}) })
formats.append(format_info) formats.append(format_info)
self._sort_formats(formats)
metadata = self._download_xml(
'http://www.crunchyroll.com/xml', video_id,
note='Downloading media info', query={
'req': 'RpcApiVideoPlayer_GetMediaMetadata',
'media_id': video_id,
})
subtitles = self.extract_subtitles(video_id, webpage) subtitles = self.extract_subtitles(video_id, webpage)
@ -366,9 +382,12 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
'id': video_id, 'id': video_id,
'title': video_title, 'title': video_title,
'description': video_description, 'description': video_description,
'thumbnail': video_thumbnail, 'thumbnail': xpath_text(metadata, 'episode_image_url'),
'uploader': video_uploader, 'uploader': video_uploader,
'upload_date': video_upload_date, 'upload_date': video_upload_date,
'series': xpath_text(metadata, 'series_title'),
'episode': xpath_text(metadata, 'episode_title'),
'episode_number': int_or_none(xpath_text(metadata, 'episode_number')),
'subtitles': subtitles, 'subtitles': subtitles,
'formats': formats, 'formats': formats,
} }

View File

@ -9,7 +9,7 @@ from ..utils import (
class CWTVIE(InfoExtractor): class CWTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?cw(?:tv|seed)\.com/shows/(?:[^/]+/){2}\?play=(?P<id>[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12})' _VALID_URL = r'https?://(?:www\.)?cw(?:tv|seed)\.com/(?:shows/)?(?:[^/]+/){2}\?.*\bplay=(?P<id>[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12})'
_TESTS = [{ _TESTS = [{
'url': 'http://cwtv.com/shows/arrow/legends-of-yesterday/?play=6b15e985-9345-4f60-baf8-56e96be57c63', 'url': 'http://cwtv.com/shows/arrow/legends-of-yesterday/?play=6b15e985-9345-4f60-baf8-56e96be57c63',
'info_dict': { 'info_dict': {
@ -48,6 +48,9 @@ class CWTVIE(InfoExtractor):
# m3u8 download # m3u8 download
'skip_download': True, 'skip_download': True,
} }
}, {
'url': 'http://cwtv.com/thecw/chroniclesofcisco/?play=8adebe35-f447-465f-ab52-e863506ff6d6',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
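
The relaxed `_VALID_URL` can be checked against the test URLs from this hunk with plain `re`. The two patterns are copied from the diff; the third URL with an extra query parameter is an invented example.

```python
import re

UUID = (r'(?P<id>[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}'
        r'-[a-z0-9]{4}-[a-z0-9]{12})')
OLD_VALID_URL = (r'https?://(?:www\.)?cw(?:tv|seed)\.com/shows/'
                 r'(?:[^/]+/){2}\?play=' + UUID)
NEW_VALID_URL = (r'https?://(?:www\.)?cw(?:tv|seed)\.com/(?:shows/)?'
                 r'(?:[^/]+/){2}\?.*\bplay=' + UUID)

shows_url = ('http://cwtv.com/shows/arrow/legends-of-yesterday/'
             '?play=6b15e985-9345-4f60-baf8-56e96be57c63')
thecw_url = ('http://cwtv.com/thecw/chroniclesofcisco/'
             '?play=8adebe35-f447-465f-ab52-e863506ff6d6')
# Invented URL: `play` is not the first query parameter.
extra_param_url = ('http://cwtv.com/shows/arrow/legends-of-yesterday/'
                   '?foo=1&play=6b15e985-9345-4f60-baf8-56e96be57c63')

old_matches = [bool(re.match(OLD_VALID_URL, u))
               for u in (shows_url, thecw_url, extra_param_url)]
new_matches = [bool(re.match(NEW_VALID_URL, u))
               for u in (shows_url, thecw_url, extra_param_url)]
thecw_id = re.match(NEW_VALID_URL, thecw_url).group('id')
```

The optional `shows/` segment and the `.*\bplay=` part are what let the second and third URLs through.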

View File

@ -0,0 +1,61 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
determine_protocol,
)
class DailyMailIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?dailymail\.co\.uk/video/[^/]+/video-(?P<id>[0-9]+)'
_TEST = {
'url': 'http://www.dailymail.co.uk/video/sciencetech/video-1288527/Turn-video-impressionist-masterpiece.html',
'md5': '2f639d446394f53f3a33658b518b6615',
'info_dict': {
'id': '1288527',
'ext': 'mp4',
'title': 'Turn any video into an impressionist masterpiece',
'description': 'md5:88ddbcb504367987b2708bb38677c9d2',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_data = self._parse_json(self._search_regex(
r"data-opts='({.+?})'", webpage, 'video data'), video_id)
title = video_data['title']
video_sources = self._download_json(video_data.get(
'sources', {}).get('url') or 'http://www.dailymail.co.uk/api/player/%s/video-sources.json' % video_id, video_id)
formats = []
for rendition in video_sources['renditions']:
rendition_url = rendition.get('url')
if not rendition_url:
continue
tbr = int_or_none(rendition.get('encodingRate'), 1000)
container = rendition.get('videoContainer')
is_hls = container == 'M2TS'
protocol = 'm3u8_native' if is_hls else determine_protocol({'url': rendition_url})
formats.append({
'format_id': ('hls' if is_hls else protocol) + ('-%d' % tbr if tbr else ''),
'url': rendition_url,
'width': int_or_none(rendition.get('frameWidth')),
'height': int_or_none(rendition.get('frameHeight')),
'tbr': tbr,
'vcodec': rendition.get('videoCodec'),
'container': container,
'protocol': protocol,
'ext': 'mp4' if is_hls else None,
})
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': video_data.get('descr'),
'thumbnail': video_data.get('poster') or video_data.get('thumbnail'),
'formats': formats,
}
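
How a rendition is turned into a `format_id` and protocol can be sketched without the extractor. Assumptions: Python 3, `int_or_none` is a simplified stand-in for the youtube-dl helper (only the `scale` argument is modelled), the URL scheme stands in for `determine_protocol`, and the rendition dictionaries are invented.

```python
from urllib.parse import urlparse


def int_or_none(v, scale=1):
    # Simplified stand-in for youtube_dl.utils.int_or_none (assumption).
    if v is None or v == '':
        return None
    return int(v) // scale


def describe_rendition(rendition):
    # Hypothetical helper mirroring the loop body in DailyMailIE.
    url = rendition['url']
    tbr = int_or_none(rendition.get('encodingRate'), 1000)  # bit/s to kbit/s
    is_hls = rendition.get('videoContainer') == 'M2TS'
    # The real code calls determine_protocol(); the URL scheme is used here.
    protocol = 'm3u8_native' if is_hls else urlparse(url).scheme
    format_id = ('hls' if is_hls else protocol) + ('-%d' % tbr if tbr else '')
    return format_id, protocol


http_result = describe_rendition({
    'url': 'http://example.com/a.mp4',
    'encodingRate': 1500000,
    'videoContainer': 'MP4',
})
hls_result = describe_rendition({
    'url': 'http://example.com/a.m3u8',
    'videoContainer': 'M2TS',
})
```

When no encoding rate is present the suffix is dropped, so the identifier falls back to the bare protocol name.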

View File

@ -12,39 +12,46 @@ class DFBIE(InfoExtractor):
_TEST = { _TEST = {
'url': 'http://tv.dfb.de/video/u-19-em-stimmen-zum-spiel-gegen-russland/11633/', 'url': 'http://tv.dfb.de/video/u-19-em-stimmen-zum-spiel-gegen-russland/11633/',
# The md5 is different each time 'md5': 'ac0f98a52a330f700b4b3034ad240649',
'info_dict': { 'info_dict': {
'id': '11633', 'id': '11633',
'display_id': 'u-19-em-stimmen-zum-spiel-gegen-russland', 'display_id': 'u-19-em-stimmen-zum-spiel-gegen-russland',
'ext': 'flv', 'ext': 'mp4',
'title': 'U 19-EM: Stimmen zum Spiel gegen Russland', 'title': 'U 19-EM: Stimmen zum Spiel gegen Russland',
'upload_date': '20150714', 'upload_date': '20150714',
}, },
} }
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) display_id, video_id = re.match(self._VALID_URL, url).groups()
video_id = mobj.group('id')
display_id = mobj.group('display_id')
webpage = self._download_webpage(url, display_id)
player_info = self._download_xml( player_info = self._download_xml(
'http://tv.dfb.de/server/hd_video.php?play=%s' % video_id, 'http://tv.dfb.de/server/hd_video.php?play=%s' % video_id,
display_id) display_id)
video_info = player_info.find('video') video_info = player_info.find('video')
stream_access_url = self._proto_relative_url(video_info.find('url').text.strip())
f4m_info = self._download_xml( formats = []
self._proto_relative_url(video_info.find('url').text.strip()), display_id) # see http://tv.dfb.de/player/js/ajax.js for the method to extract m3u8 formats
token_el = f4m_info.find('token') for sa_url in (stream_access_url, stream_access_url + '&area=&format=iphone'):
manifest_url = token_el.attrib['url'] + '?' + 'hdnea=' + token_el.attrib['auth'] + '&hdcore=3.2.0' stream_access_info = self._download_xml(sa_url, display_id)
formats = self._extract_f4m_formats(manifest_url, display_id) token_el = stream_access_info.find('token')
manifest_url = token_el.attrib['url'] + '?' + 'hdnea=' + token_el.attrib['auth']
if '.f4m' in manifest_url:
formats.extend(self._extract_f4m_formats(
manifest_url + '&hdcore=3.2.0',
display_id, f4m_id='hds', fatal=False))
else:
formats.extend(self._extract_m3u8_formats(
manifest_url, display_id, 'mp4',
'm3u8_native', m3u8_id='hls', fatal=False))
self._sort_formats(formats) self._sort_formats(formats)
return { return {
'id': video_id, 'id': video_id,
'display_id': display_id, 'display_id': display_id,
'title': video_info.find('title').text, 'title': video_info.find('title').text,
'thumbnail': self._og_search_thumbnail(webpage), 'thumbnail': 'http://tv.dfb.de/images/%s_640x360.jpg' % video_id,
'upload_date': unified_strdate(video_info.find('time_date').text), 'upload_date': unified_strdate(video_info.find('time_date').text),
'formats': formats, 'formats': formats,
} }
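
The manifest URL construction in the new DFB loop can be illustrated with the standard library XML parser. The XML snippets below are invented; only the `token` element with its `url` and `auth` attributes is taken from the diff, and `manifest_from_stream_access` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def manifest_from_stream_access(xml_text):
    # Hypothetical helper: returns (kind, manifest_url) for one response.
    token_el = ET.fromstring(xml_text).find('token')
    manifest_url = (token_el.attrib['url'] + '?' + 'hdnea='
                    + token_el.attrib['auth'])
    if '.f4m' in manifest_url:
        # HDS manifests additionally get the hdcore parameter.
        return 'hds', manifest_url + '&hdcore=3.2.0'
    return 'hls', manifest_url


# Invented responses; the root element name is an assumption.
hds = manifest_from_stream_access(
    '<data><token url="http://example.com/video.f4m" auth="abc"/></data>')
hls = manifest_from_stream_access(
    '<data><token url="http://example.com/master.m3u8" auth="xyz"/></data>')
```

Branching on `.f4m` lets a single loop serve both stream access URLs, with and without `&area=&format=iphone`.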

View File

@ -33,6 +33,7 @@ class DiscoveryIE(InfoExtractor):
'duration': 156, 'duration': 156,
'timestamp': 1302032462, 'timestamp': 1302032462,
'upload_date': '20110405', 'upload_date': '20110405',
'uploader_id': '103207',
}, },
'params': { 'params': {
'skip_download': True, # requires ffmpeg 'skip_download': True, # requires ffmpeg
@ -54,7 +55,11 @@ class DiscoveryIE(InfoExtractor):
'upload_date': '20140725', 'upload_date': '20140725',
'timestamp': 1406246400, 'timestamp': 1406246400,
'duration': 116, 'duration': 116,
'uploader_id': '103207',
}, },
'params': {
'skip_download': True, # requires ffmpeg
}
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -66,13 +71,19 @@ class DiscoveryIE(InfoExtractor):
entries = [] entries = []
for idx, video_info in enumerate(info['playlist']): for idx, video_info in enumerate(info['playlist']):
formats = self._extract_m3u8_formats( subtitles = {}
video_info['src'], display_id, 'mp4', 'm3u8_native', m3u8_id='hls', caption_url = video_info.get('captionsUrl')
note='Download m3u8 information for video %d' % (idx + 1)) if caption_url:
self._sort_formats(formats) subtitles = {
'en': [{
'url': caption_url,
}]
}
entries.append({ entries.append({
'_type': 'url_transparent',
'url': 'http://players.brightcove.net/103207/default_default/index.html?videoId=ref:%s' % video_info['referenceId'],
'id': compat_str(video_info['id']), 'id': compat_str(video_info['id']),
'formats': formats,
'title': video_info['title'], 'title': video_info['title'],
'description': video_info.get('description'), 'description': video_info.get('description'),
'duration': parse_duration(video_info.get('video_length')), 'duration': parse_duration(video_info.get('video_length')),
@ -80,6 +91,7 @@ class DiscoveryIE(InfoExtractor):
'thumbnail': video_info.get('thumbnailURL'), 'thumbnail': video_info.get('thumbnailURL'),
'alt_title': video_info.get('secondary_title'), 'alt_title': video_info.get('secondary_title'),
'timestamp': parse_iso8601(video_info.get('publishedDate')), 'timestamp': parse_iso8601(video_info.get('publishedDate')),
'subtitles': subtitles,
}) })
return self.playlist_result(entries, display_id, video_title) return self.playlist_result(entries, display_id, video_title)

View File

@ -0,0 +1,114 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_duration,
remove_end,
xpath_element,
xpath_text,
)
class DigitallySpeakingIE(InfoExtractor):
_VALID_URL = r'https?://(?:evt\.dispeak|events\.digitallyspeaking)\.com/(?:[^/]+/)+xml/(?P<id>[^.]+)\.xml'
_TESTS = [{
# From http://gdcvault.com/play/1023460/Tenacious-Design-and-The-Interface
'url': 'http://evt.dispeak.com/ubm/gdc/sf16/xml/840376_BQRC.xml',
'md5': 'a8efb6c31ed06ca8739294960b2dbabd',
'info_dict': {
'id': '840376_BQRC',
'ext': 'mp4',
'title': 'Tenacious Design and The Interface of \'Destiny\'',
},
}, {
# From http://www.gdcvault.com/play/1014631/Classic-Game-Postmortem-PAC
'url': 'http://events.digitallyspeaking.com/gdc/sf11/xml/12396_1299111843500GMPX.xml',
'only_matching': True,
}]
def _parse_mp4(self, metadata):
video_formats = []
video_root = None
mp4_video = xpath_text(metadata, './mp4video', default=None)
if mp4_video is not None:
mobj = re.match(r'(?P<root>https?://.*?/).*', mp4_video)
video_root = mobj.group('root')
if video_root is None:
http_host = xpath_text(metadata, 'httpHost', default=None)
if http_host:
video_root = 'http://%s/' % http_host
if video_root is None:
# Hard-coded in http://evt.dispeak.com/ubm/gdc/sf16/custom/player2.js
# Works for GPUTechConf, too
video_root = 'http://s3-2u.digitallyspeaking.com/'
formats = metadata.findall('./MBRVideos/MBRVideo')
if not formats:
return None
for a_format in formats:
stream_name = xpath_text(a_format, 'streamName', fatal=True)
video_path = re.match(r'mp4\:(?P<path>.*)', stream_name).group('path')
url = video_root + video_path
vbr = xpath_text(a_format, 'bitrate')
video_formats.append({
'url': url,
'vbr': int_or_none(vbr),
})
return video_formats
def _parse_flv(self, metadata):
formats = []
akamai_url = xpath_text(metadata, './akamaiHost', fatal=True)
audios = metadata.findall('./audios/audio')
for audio in audios:
formats.append({
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
'play_path': remove_end(audio.get('url'), '.flv'),
'ext': 'flv',
'vcodec': 'none',
'format_id': audio.get('code'),
})
slide_video_path = xpath_text(metadata, './slideVideo', fatal=True)
formats.append({
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
'play_path': remove_end(slide_video_path, '.flv'),
'ext': 'flv',
'format_note': 'slide deck video',
'quality': -2,
'preference': -2,
'format_id': 'slides',
})
speaker_video_path = xpath_text(metadata, './speakerVideo', fatal=True)
formats.append({
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
'play_path': remove_end(speaker_video_path, '.flv'),
'ext': 'flv',
'format_note': 'speaker video',
'quality': -1,
'preference': -1,
'format_id': 'speaker',
})
return formats
def _real_extract(self, url):
video_id = self._match_id(url)
xml_description = self._download_xml(url, video_id)
metadata = xpath_element(xml_description, 'metadata')
video_formats = self._parse_mp4(metadata)
if video_formats is None:
video_formats = self._parse_flv(metadata)
return {
'id': video_id,
'formats': video_formats,
'title': xpath_text(metadata, 'title', fatal=True),
'duration': parse_duration(xpath_text(metadata, 'endTime')),
'creator': xpath_text(metadata, 'speaker'),
}
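
The fallback chain for the video root in `_parse_mp4` can be sketched as follows. Both regular expressions and the hard-coded default host come from the diff; the function names and sample values are hypothetical.

```python
import re

# Default taken from the diff (hard-coded in the site's player2.js).
DEFAULT_ROOT = 'http://s3-2u.digitallyspeaking.com/'


def resolve_video_root(mp4_video=None, http_host=None):
    # 1. Prefer the scheme and host of an explicit mp4 URL.
    if mp4_video is not None:
        return re.match(r'(?P<root>https?://.*?/).*', mp4_video).group('root')
    # 2. Otherwise build the root from the advertised HTTP host.
    if http_host:
        return 'http://%s/' % http_host
    # 3. Fall back to the hard-coded host.
    return DEFAULT_ROOT


def stream_url(video_root, stream_name):
    # Stream names look like "mp4:<path>"; the prefix is dropped.
    path = re.match(r'mp4\:(?P<path>.*)', stream_name).group('path')
    return video_root + path


root_from_mp4 = resolve_video_root(
    mp4_video='http://cdn.example.com/talks/a.mp4')
root_from_host = resolve_video_root(http_host='media.example.com')
root_default = resolve_video_root()
full_url = stream_url(root_default, 'mp4:assets/talk_500.mp4')
```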

View File

@ -18,7 +18,7 @@ class DouyuTVIE(InfoExtractor):
'display_id': 'iseven', 'display_id': 'iseven',
'ext': 'flv', 'ext': 'flv',
'title': 're:^清晨醒脑T-ara根本停不下来 [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$', 'title': 're:^清晨醒脑T-ara根本停不下来 [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'description': 'md5:f34981259a03e980a3c6404190a3ed61', 'description': 're:.*m7show@163\.com.*',
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
'uploader': '7师傅', 'uploader': '7师傅',
'uploader_id': '431925', 'uploader_id': '431925',
@ -43,7 +43,7 @@ class DouyuTVIE(InfoExtractor):
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
'skip': 'Romm not found', 'skip': 'Room not found',
}, { }, {
'url': 'http://www.douyutv.com/17732', 'url': 'http://www.douyutv.com/17732',
'info_dict': { 'info_dict': {
@ -51,7 +51,7 @@ class DouyuTVIE(InfoExtractor):
'display_id': '17732', 'display_id': '17732',
'ext': 'flv', 'ext': 'flv',
'title': 're:^清晨醒脑T-ara根本停不下来 [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$', 'title': 're:^清晨醒脑T-ara根本停不下来 [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'description': 'md5:f34981259a03e980a3c6404190a3ed61', 'description': 're:.*m7show@163\.com.*',
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
'uploader': '7师傅', 'uploader': '7师傅',
'uploader_id': '431925', 'uploader_id': '431925',
@ -75,13 +75,28 @@ class DouyuTVIE(InfoExtractor):
room_id = self._html_search_regex( room_id = self._html_search_regex(
r'"room_id"\s*:\s*(\d+),', page, 'room id') r'"room_id"\s*:\s*(\d+),', page, 'room id')
config = None
# Douyu API sometimes returns error "Unable to load the requested class: eticket_redis_cache"
# Retry with different parameters - same parameters cause same errors
for i in range(5):
prefix = 'room/%s?aid=android&client_sys=android&time=%d' % ( prefix = 'room/%s?aid=android&client_sys=android&time=%d' % (
room_id, int(time.time())) room_id, int(time.time()))
auth = hashlib.md5((prefix + '1231').encode('ascii')).hexdigest() auth = hashlib.md5((prefix + '1231').encode('ascii')).hexdigest()
config = self._download_json(
config_page = self._download_webpage(
'http://www.douyutv.com/api/v1/%s&auth=%s' % (prefix, auth), 'http://www.douyutv.com/api/v1/%s&auth=%s' % (prefix, auth),
video_id) video_id)
try:
config = self._parse_json(config_page, video_id, fatal=False)
except ExtractorError:
# Wait some time before retrying to get a different time() value
self._sleep(1, video_id, msg_template='%(video_id)s: Error occurs. '
'Waiting for %(timeout)s seconds before retrying')
continue
else:
break
if config is None:
raise ExtractorError('Unable to fetch API result')
data = config['data'] data = config['data']
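
The signed API URL built on each retry can be reproduced outside the extractor. The URL template and the `'1231'` suffix are taken from the diff; `build_api_url` and its `now` parameter are hypothetical and exist only to make the result repeatable.

```python
import hashlib
import time


def build_api_url(room_id, now=None):
    # `now` is an illustrative parameter; the extractor always uses time.time().
    ts = int(time.time()) if now is None else int(now)
    prefix = 'room/%s?aid=android&client_sys=android&time=%d' % (room_id, ts)
    auth = hashlib.md5((prefix + '1231').encode('ascii')).hexdigest()
    return 'http://www.douyutv.com/api/v1/%s&auth=%s' % (prefix, auth)


first = build_api_url('17732', now=1463214752)
second = build_api_url('17732', now=1463214753)
```

Because the timestamp is part of the signed prefix, waiting a second between attempts produces a different request, which is what the retry loop relies on.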

View File

@ -6,13 +6,18 @@ import re
import time import time
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import int_or_none from ..compat import compat_urlparse
from ..utils import (
int_or_none,
update_url_query,
)
class DPlayIE(InfoExtractor): class DPlayIE(InfoExtractor):
_VALID_URL = r'https?://(?P<domain>it\.dplay\.com|www\.dplay\.(?:dk|se|no))/[^/]+/(?P<id>[^/?#]+)' _VALID_URL = r'https?://(?P<domain>it\.dplay\.com|www\.dplay\.(?:dk|se|no))/[^/]+/(?P<id>[^/?#]+)'
_TESTS = [{ _TESTS = [{
# geo restricted, via direct unsigned hls URL
'url': 'http://it.dplay.com/take-me-out/stagione-1-episodio-25/', 'url': 'http://it.dplay.com/take-me-out/stagione-1-episodio-25/',
'info_dict': { 'info_dict': {
'id': '1255600', 'id': '1255600',
@ -31,11 +36,12 @@ class DPlayIE(InfoExtractor):
}, },
'expected_warnings': ['Unable to download f4m manifest'], 'expected_warnings': ['Unable to download f4m manifest'],
}, { }, {
# non geo restricted, via secure api, unsigned download hls URL
'url': 'http://www.dplay.se/nugammalt-77-handelser-som-format-sverige/season-1-svensken-lar-sig-njuta-av-livet/', 'url': 'http://www.dplay.se/nugammalt-77-handelser-som-format-sverige/season-1-svensken-lar-sig-njuta-av-livet/',
'info_dict': { 'info_dict': {
'id': '3172', 'id': '3172',
'display_id': 'season-1-svensken-lar-sig-njuta-av-livet', 'display_id': 'season-1-svensken-lar-sig-njuta-av-livet',
'ext': 'flv', 'ext': 'mp4',
'title': 'Svensken lär sig njuta av livet', 'title': 'Svensken lär sig njuta av livet',
'description': 'md5:d3819c9bccffd0fe458ca42451dd50d8', 'description': 'md5:d3819c9bccffd0fe458ca42451dd50d8',
'duration': 2650, 'duration': 2650,
@ -48,23 +54,25 @@ class DPlayIE(InfoExtractor):
'age_limit': 0, 'age_limit': 0,
}, },
}, { }, {
# geo restricted, via secure api, unsigned download hls URL
'url': 'http://www.dplay.dk/mig-og-min-mor/season-6-episode-12/', 'url': 'http://www.dplay.dk/mig-og-min-mor/season-6-episode-12/',
'info_dict': { 'info_dict': {
'id': '70816', 'id': '70816',
'display_id': 'season-6-episode-12', 'display_id': 'season-6-episode-12',
'ext': 'flv', 'ext': 'mp4',
'title': 'Episode 12', 'title': 'Episode 12',
'description': 'md5:9c86e51a93f8a4401fc9641ef9894c90', 'description': 'md5:9c86e51a93f8a4401fc9641ef9894c90',
'duration': 2563, 'duration': 2563,
'timestamp': 1429696800, 'timestamp': 1429696800,
'upload_date': '20150422', 'upload_date': '20150422',
'creator': 'Kanal 4', 'creator': 'Kanal 4 (Home)',
'series': 'Mig og min mor', 'series': 'Mig og min mor',
'season_number': 6, 'season_number': 6,
'episode_number': 12, 'episode_number': 12,
'age_limit': 0, 'age_limit': 0,
}, },
}, { }, {
# geo restricted, via direct unsigned hls URL
'url': 'http://www.dplay.no/pga-tour/season-1-hoydepunkter-18-21-februar/', 'url': 'http://www.dplay.no/pga-tour/season-1-hoydepunkter-18-21-februar/',
'only_matching': True, 'only_matching': True,
}] }]
@ -90,17 +98,24 @@ class DPlayIE(InfoExtractor):
def extract_formats(protocol, manifest_url): def extract_formats(protocol, manifest_url):
if protocol == 'hls': if protocol == 'hls':
formats.extend(self._extract_m3u8_formats( m3u8_formats = self._extract_m3u8_formats(
manifest_url, video_id, ext='mp4', manifest_url, video_id, ext='mp4',
entry_protocol='m3u8_native', m3u8_id=protocol, fatal=False)) entry_protocol='m3u8_native', m3u8_id=protocol, fatal=False)
# Sometimes final URLs inside m3u8 are unsigned, let's fix this
# ourselves
query = compat_urlparse.parse_qs(compat_urlparse.urlparse(manifest_url).query)
for m3u8_format in m3u8_formats:
m3u8_format['url'] = update_url_query(m3u8_format['url'], query)
formats.extend(m3u8_formats)
elif protocol == 'hds': elif protocol == 'hds':
formats.extend(self._extract_f4m_formats( formats.extend(self._extract_f4m_formats(
manifest_url + '&hdcore=3.8.0&plugin=flowplayer-3.8.0.0', manifest_url + '&hdcore=3.8.0&plugin=flowplayer-3.8.0.0',
video_id, f4m_id=protocol, fatal=False)) video_id, f4m_id=protocol, fatal=False))
domain_tld = domain.split('.')[-1] domain_tld = domain.split('.')[-1]
if domain_tld in ('se', 'dk'): if domain_tld in ('se', 'dk', 'no'):
for protocol in PROTOCOLS: for protocol in PROTOCOLS:
# Providing dsc-geo makes it possible to bypass geo restriction in some cases
self._set_cookie( self._set_cookie(
'secure.dplay.%s' % domain_tld, 'dsc-geo', 'secure.dplay.%s' % domain_tld, 'dsc-geo',
json.dumps({ json.dumps({
@ -113,13 +128,24 @@ class DPlayIE(InfoExtractor):
'Downloading %s stream JSON' % protocol, fatal=False) 'Downloading %s stream JSON' % protocol, fatal=False)
if stream and stream.get(protocol): if stream and stream.get(protocol):
extract_formats(protocol, stream[protocol]) extract_formats(protocol, stream[protocol])
else:
# The last resort is to try direct unsigned hls/hds URLs from info dictionary.
# Sometimes this does work even when secure API with dsc-geo has failed (e.g.
# http://www.dplay.no/pga-tour/season-1-hoydepunkter-18-21-februar/).
if not formats:
for protocol in PROTOCOLS: for protocol in PROTOCOLS:
if info.get(protocol): if info.get(protocol):
extract_formats(protocol, info[protocol]) extract_formats(protocol, info[protocol])
self._sort_formats(formats) self._sort_formats(formats)
subtitles = {}
for lang in ('se', 'sv', 'da', 'nl', 'no'):
for format_id in ('web_vtt', 'vtt', 'srt'):
subtitle_url = info.get('subtitles_%s_%s' % (lang, format_id))
if subtitle_url:
subtitles.setdefault(lang, []).append({'url': subtitle_url})
return { return {
'id': video_id, 'id': video_id,
'display_id': display_id, 'display_id': display_id,
@ -133,4 +159,5 @@ class DPlayIE(InfoExtractor):
'episode_number': int_or_none(info.get('episode')), 'episode_number': int_or_none(info.get('episode')),
'age_limit': int_or_none(info.get('minimum_age')), 'age_limit': int_or_none(info.get('minimum_age')),
'formats': formats, 'formats': formats,
'subtitles': subtitles,
} }
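
Copying the manifest's query string onto each variant URL, as the HLS branch of `extract_formats` does, can be sketched with the standard library alone. Assumptions: Python 3, `update_url_query` is a simplified stand-in for the youtube-dl helper of the same name, and the URLs and the `hdnea` value are invented.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def update_url_query(url, query):
    # Simplified stand-in for youtube_dl.utils.update_url_query (assumption).
    parsed = urlparse(url)
    qs = parse_qs(parsed.query)
    qs.update(query)
    return urlunparse(parsed._replace(query=urlencode(qs, doseq=True)))


manifest_url = 'http://example.com/master.m3u8?hdnea=token123'
variant_url = 'http://example.com/variant/index.m3u8'

# Take the signature parameters from the manifest URL...
query = parse_qs(urlparse(manifest_url).query)
# ...and attach them to the unsigned variant URL.
signed_variant = update_url_query(variant_url, query)
```

`parse_qs` returns lists of values, which is why the stand-in encodes with `doseq=True`.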

View File

@ -1,39 +0,0 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class DumpIE(InfoExtractor):
_VALID_URL = r'^https?://(?:www\.)?dump\.com/(?P<id>[a-zA-Z0-9]+)/'
_TEST = {
'url': 'http://www.dump.com/oneus/',
'md5': 'ad71704d1e67dfd9e81e3e8b42d69d99',
'info_dict': {
'id': 'oneus',
'ext': 'flv',
'title': "He's one of us.",
'thumbnail': 're:^https?://.*\.jpg$',
},
}
def _real_extract(self, url):
m = re.match(self._VALID_URL, url)
video_id = m.group('id')
webpage = self._download_webpage(url, video_id)
video_url = self._search_regex(
r's1.addVariable\("file",\s*"([^"]+)"', webpage, 'video URL')
title = self._og_search_title(webpage)
thumbnail = self._og_search_thumbnail(webpage)
return {
'id': video_id,
'title': title,
'url': video_url,
'thumbnail': thumbnail,
}

View File

@ -4,9 +4,11 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_HTTPError
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
int_or_none, int_or_none,
url_basename,
) )
@ -21,7 +23,7 @@ class EaglePlatformIE(InfoExtractor):
_TESTS = [{ _TESTS = [{
# http://lenta.ru/news/2015/03/06/navalny/ # http://lenta.ru/news/2015/03/06/navalny/
'url': 'http://lentaru.media.eagleplatform.com/index/player?player=new&record_id=227304&player_template_id=5201', 'url': 'http://lentaru.media.eagleplatform.com/index/player?player=new&record_id=227304&player_template_id=5201',
'md5': '70f5187fb620f2c1d503b3b22fd4efe3', # Not checking MD5 as sometimes the direct HTTP link results in 404 and HLS is used
'info_dict': { 'info_dict': {
'id': '227304', 'id': '227304',
'ext': 'mp4', 'ext': 'mp4',
@ -36,7 +38,7 @@ class EaglePlatformIE(InfoExtractor):
# http://muz-tv.ru/play/7129/ # http://muz-tv.ru/play/7129/
# http://media.clipyou.ru/index/player?record_id=12820&width=730&height=415&autoplay=true # http://media.clipyou.ru/index/player?record_id=12820&width=730&height=415&autoplay=true
'url': 'eagleplatform:media.clipyou.ru:12820', 'url': 'eagleplatform:media.clipyou.ru:12820',
'md5': '90b26344ba442c8e44aa4cf8f301164a', 'md5': '358597369cf8ba56675c1df15e7af624',
'info_dict': { 'info_dict': {
'id': '12820', 'id': '12820',
'ext': 'mp4', 'ext': 'mp4',
@ -55,8 +57,13 @@ class EaglePlatformIE(InfoExtractor):
raise ExtractorError(' '.join(response['errors']), expected=True) raise ExtractorError(' '.join(response['errors']), expected=True)
def _download_json(self, url_or_request, video_id, note='Downloading JSON metadata'): def _download_json(self, url_or_request, video_id, note='Downloading JSON metadata'):
try:
response = super(EaglePlatformIE, self)._download_json(url_or_request, video_id, note) response = super(EaglePlatformIE, self)._download_json(url_or_request, video_id, note)
except ExtractorError as ee:
if isinstance(ee.cause, compat_HTTPError):
response = self._parse_json(ee.cause.read().decode('utf-8'), video_id)
self._handle_error(response) self._handle_error(response)
raise
return response return response
def _get_video_url(self, url_or_request, video_id, note='Downloading JSON metadata'): def _get_video_url(self, url_or_request, video_id, note='Downloading JSON metadata'):
@ -84,17 +91,33 @@ class EaglePlatformIE(InfoExtractor):
secure_m3u8 = self._proto_relative_url(media['sources']['secure_m3u8']['auto'], 'http:') secure_m3u8 = self._proto_relative_url(media['sources']['secure_m3u8']['auto'], 'http:')
formats = []
m3u8_url = self._get_video_url(secure_m3u8, video_id, 'Downloading m3u8 JSON') m3u8_url = self._get_video_url(secure_m3u8, video_id, 'Downloading m3u8 JSON')
formats = self._extract_m3u8_formats( m3u8_formats = self._extract_m3u8_formats(
m3u8_url, video_id, m3u8_url, video_id,
'mp4', entry_protocol='m3u8_native', m3u8_id='hls') 'mp4', entry_protocol='m3u8_native', m3u8_id='hls')
formats.extend(m3u8_formats)
mp4_url = self._get_video_url( mp4_url = self._get_video_url(
# Secure mp4 URL is constructed according to Player.prototype.mp4 from # Secure mp4 URL is constructed according to Player.prototype.mp4 from
# http://lentaru.media.eagleplatform.com/player/player.js # http://lentaru.media.eagleplatform.com/player/player.js
re.sub(r'm3u8|hlsvod|hls|f4m', 'mp4', secure_m3u8), re.sub(r'm3u8|hlsvod|hls|f4m', 'mp4', secure_m3u8),
video_id, 'Downloading mp4 JSON') video_id, 'Downloading mp4 JSON')
formats.append({'url': mp4_url, 'format_id': 'mp4'}) mp4_url_basename = url_basename(mp4_url)
for m3u8_format in m3u8_formats:
mobj = re.search('/([^/]+)/index\.m3u8', m3u8_format['url'])
if mobj:
http_format = m3u8_format.copy()
video_url = mp4_url.replace(mp4_url_basename, mobj.group(1))
if not self._is_valid_url(video_url, video_id):
continue
http_format.update({
'url': video_url,
'format_id': m3u8_format['format_id'].replace('hls', 'http'),
'protocol': 'http',
})
formats.append(http_format)
self._sort_formats(formats) self._sort_formats(formats)
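
Note on the hunk above: the new HTTP formats are derived from the HLS variants by swapping the basename of the secure mp4 URL for the directory name that precedes `index.m3u8`. A minimal standalone sketch of that substitution follows. The URLs are hypothetical, `url_basename` is a simplified stand-in for the helper in `youtube_dl.utils`, and the `_is_valid_url` probe is left out because it needs network access.

```python
import re


def url_basename(url):
    # Simplified stand-in (assumption) for youtube_dl.utils.url_basename:
    # returns the last path component, ignoring any query string.
    return url.split('?')[0].rstrip('/').rpartition('/')[2]


def derive_http_formats(mp4_url, m3u8_formats):
    """Build progressive HTTP formats from HLS variant formats."""
    basename = url_basename(mp4_url)
    http_formats = []
    for m3u8_format in m3u8_formats:
        mobj = re.search(r'/([^/]+)/index\.m3u8', m3u8_format['url'])
        if not mobj:
            continue
        # Copy the HLS format so fields such as bitrate carry over.
        http_format = dict(m3u8_format)
        http_format.update({
            'url': mp4_url.replace(basename, mobj.group(1)),
            'format_id': m3u8_format['format_id'].replace('hls', 'http'),
            'protocol': 'http',
        })
        http_formats.append(http_format)
    return http_formats


# Hypothetical URLs, for illustration only.
example = derive_http_formats(
    'http://cdn.example.com/video/12820.mp4',
    [{'url': 'http://cdn.example.com/hls/12820_720.mp4/index.m3u8',
      'format_id': 'hls-1500'}])
```

In the extractor itself each candidate URL is additionally checked with `_is_valid_url` before it is appended, so variants without a matching progressive file are skipped.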


@ -4,10 +4,10 @@ from .common import InfoExtractor
class EbaumsWorldIE(InfoExtractor): class EbaumsWorldIE(InfoExtractor):
_VALID_URL = r'https?://www\.ebaumsworld\.com/video/watch/(?P<id>\d+)' _VALID_URL = r'https?://(?:www\.)?ebaumsworld\.com/videos/[^/]+/(?P<id>\d+)'
_TEST = { _TEST = {
'url': 'http://www.ebaumsworld.com/video/watch/83367677/', 'url': 'http://www.ebaumsworld.com/videos/a-giant-python-opens-the-door/83367677/',
'info_dict': { 'info_dict': {
'id': '83367677', 'id': '83367677',
'ext': 'mp4', 'ext': 'mp4',


@ -46,6 +46,7 @@ from .arte import (
ArteTVPlus7IE, ArteTVPlus7IE,
ArteTVCreativeIE, ArteTVCreativeIE,
ArteTVConcertIE, ArteTVConcertIE,
ArteTVInfoIE,
ArteTVFutureIE, ArteTVFutureIE,
ArteTVCinemaIE, ArteTVCinemaIE,
ArteTVDDCIE, ArteTVDDCIE,
@ -74,6 +75,7 @@ from .bigflix import BigflixIE
from .bild import BildIE from .bild import BildIE
from .bilibili import BiliBiliIE from .bilibili import BiliBiliIE
from .biobiochiletv import BioBioChileTVIE from .biobiochiletv import BioBioChileTVIE
from .biqle import BIQLEIE
from .bleacherreport import ( from .bleacherreport import (
BleacherReportIE, BleacherReportIE,
BleacherReportCMSIE, BleacherReportCMSIE,
@ -122,7 +124,7 @@ from .chirbit import (
ChirbitProfileIE, ChirbitProfileIE,
) )
from .cinchcast import CinchcastIE from .cinchcast import CinchcastIE
from .cinemassacre import CinemassacreIE from .cliprs import ClipRsIE
from .clipfish import ClipfishIE from .clipfish import ClipfishIE
from .cliphunter import CliphunterIE from .cliphunter import CliphunterIE
from .clipsyndicate import ClipsyndicateIE from .clipsyndicate import ClipsyndicateIE
@ -155,6 +157,7 @@ from .cspan import CSpanIE
from .ctsnews import CtsNewsIE from .ctsnews import CtsNewsIE
from .cultureunplugged import CultureUnpluggedIE from .cultureunplugged import CultureUnpluggedIE
from .cwtv import CWTVIE from .cwtv import CWTVIE
from .dailymail import DailyMailIE
from .dailymotion import ( from .dailymotion import (
DailymotionIE, DailymotionIE,
DailymotionPlaylistIE, DailymotionPlaylistIE,
@ -191,10 +194,10 @@ from .drbonanza import DRBonanzaIE
from .drtuber import DrTuberIE from .drtuber import DrTuberIE
from .drtv import DRTVIE from .drtv import DRTVIE
from .dvtv import DVTVIE from .dvtv import DVTVIE
from .dump import DumpIE
from .dumpert import DumpertIE from .dumpert import DumpertIE
from .defense import DefenseGouvFrIE from .defense import DefenseGouvFrIE
from .discovery import DiscoveryIE from .discovery import DiscoveryIE
from .dispeak import DigitallySpeakingIE
from .dropbox import DropboxIE from .dropbox import DropboxIE
from .dw import ( from .dw import (
DWIE, DWIE,
@ -335,7 +338,6 @@ from .ivi import (
) )
from .ivideon import IvideonIE from .ivideon import IvideonIE
from .izlesene import IzleseneIE from .izlesene import IzleseneIE
from .jadorecettepub import JadoreCettePubIE
from .jeuxvideo import JeuxVideoIE from .jeuxvideo import JeuxVideoIE
from .jove import JoveIE from .jove import JoveIE
from .jwplatform import JWPlatformIE from .jwplatform import JWPlatformIE
@ -381,6 +383,7 @@ from .limelight import (
LimelightChannelIE, LimelightChannelIE,
LimelightChannelListIE, LimelightChannelListIE,
) )
from .litv import LiTVIE
from .liveleak import LiveLeakIE from .liveleak import LiveLeakIE
from .livestream import ( from .livestream import (
LivestreamIE, LivestreamIE,
@ -399,19 +402,28 @@ from .macgamestore import MacGameStoreIE
from .mailru import MailRuIE from .mailru import MailRuIE
from .makerschannel import MakersChannelIE from .makerschannel import MakersChannelIE
from .makertv import MakerTVIE from .makertv import MakerTVIE
from .malemotion import MalemotionIE
from .matchtv import MatchTVIE from .matchtv import MatchTVIE
from .mdr import MDRIE from .mdr import MDRIE
from .metacafe import MetacafeIE from .metacafe import MetacafeIE
from .metacritic import MetacriticIE from .metacritic import MetacriticIE
from .mgoon import MgoonIE from .mgoon import MgoonIE
from .mgtv import MGTVIE
from .microsoftvirtualacademy import (
MicrosoftVirtualAcademyIE,
MicrosoftVirtualAcademyCourseIE,
)
from .minhateca import MinhatecaIE from .minhateca import MinhatecaIE
from .ministrygrid import MinistryGridIE from .ministrygrid import MinistryGridIE
from .minoto import MinotoIE from .minoto import MinotoIE
from .miomio import MioMioIE from .miomio import MioMioIE
from .mit import TechTVMITIE, MITIE, OCWMITIE from .mit import TechTVMITIE, MITIE, OCWMITIE
from .mitele import MiTeleIE from .mitele import MiTeleIE
from .mixcloud import MixcloudIE from .mixcloud import (
MixcloudIE,
MixcloudUserIE,
MixcloudPlaylistIE,
MixcloudStreamIE,
)
from .mlb import MLBIE from .mlb import MLBIE
from .mnet import MnetIE from .mnet import MnetIE
from .mpora import MporaIE from .mpora import MporaIE
@ -419,7 +431,6 @@ from .moevideo import MoeVideoIE
from .mofosex import MofosexIE from .mofosex import MofosexIE
from .mojvideo import MojvideoIE from .mojvideo import MojvideoIE
from .moniker import MonikerIE from .moniker import MonikerIE
from .mooshare import MooshareIE
from .morningstar import MorningstarIE from .morningstar import MorningstarIE
from .motherless import MotherlessIE from .motherless import MotherlessIE
from .motorsport import MotorsportIE from .motorsport import MotorsportIE
@ -433,8 +444,7 @@ from .mtv import (
) )
from .muenchentv import MuenchenTVIE from .muenchentv import MuenchenTVIE
from .musicplayon import MusicPlayOnIE from .musicplayon import MusicPlayOnIE
from .muzu import MuzuTVIE from .mwave import MwaveIE, MwaveMeetGreetIE
from .mwave import MwaveIE
from .myspace import MySpaceIE, MySpaceAlbumIE from .myspace import MySpaceIE, MySpaceAlbumIE
from .myspass import MySpassIE from .myspass import MySpassIE
from .myvi import MyviIE from .myvi import MyviIE
@ -464,7 +474,6 @@ from .ndr import (
from .ndtv import NDTVIE from .ndtv import NDTVIE
from .netzkino import NetzkinoIE from .netzkino import NetzkinoIE
from .nerdcubed import NerdCubedFeedIE from .nerdcubed import NerdCubedFeedIE
from .nerdist import NerdistIE
from .neteasemusic import ( from .neteasemusic import (
NetEaseMusicIE, NetEaseMusicIE,
NetEaseMusicAlbumIE, NetEaseMusicAlbumIE,
@ -485,9 +494,10 @@ from .nextmovie import NextMovieIE
from .nfb import NFBIE from .nfb import NFBIE
from .nfl import NFLIE from .nfl import NFLIE
from .nhl import ( from .nhl import (
NHLIE,
NHLNewsIE,
NHLVideocenterIE, NHLVideocenterIE,
NHLNewsIE,
NHLVideocenterCategoryIE,
NHLIE,
) )
from .nick import NickIE from .nick import NickIE
from .niconico import NiconicoIE, NiconicoPlaylistIE from .niconico import NiconicoIE, NiconicoPlaylistIE
@ -555,12 +565,15 @@ from .pandoratv import PandoraTVIE
from .parliamentliveuk import ParliamentLiveUKIE from .parliamentliveuk import ParliamentLiveUKIE
from .patreon import PatreonIE from .patreon import PatreonIE
from .pbs import PBSIE from .pbs import PBSIE
from .periscope import PeriscopeIE from .people import PeopleIE
from .periscope import (
PeriscopeIE,
PeriscopeUserIE,
)
from .philharmoniedeparis import PhilharmonieDeParisIE from .philharmoniedeparis import PhilharmonieDeParisIE
from .phoenix import PhoenixIE from .phoenix import PhoenixIE
from .photobucket import PhotobucketIE from .photobucket import PhotobucketIE
from .pinkbike import PinkbikeIE from .pinkbike import PinkbikeIE
from .planetaplay import PlanetaPlayIE
from .pladform import PladformIE from .pladform import PladformIE
from .played import PlayedIE from .played import PlayedIE
from .playfm import PlayFMIE from .playfm import PlayFMIE
@ -583,6 +596,7 @@ from .pornhub import (
from .pornotube import PornotubeIE from .pornotube import PornotubeIE
from .pornovoisines import PornoVoisinesIE from .pornovoisines import PornoVoisinesIE
from .pornoxo import PornoXOIE from .pornoxo import PornoXOIE
from .presstv import PressTVIE
from .primesharetv import PrimeShareTVIE from .primesharetv import PrimeShareTVIE
from .promptfile import PromptFileIE from .promptfile import PromptFileIE
from .prosiebensat1 import ProSiebenSat1IE from .prosiebensat1 import ProSiebenSat1IE
@ -595,7 +609,6 @@ from .qqmusic import (
QQMusicToplistIE, QQMusicToplistIE,
QQMusicPlaylistIE, QQMusicPlaylistIE,
) )
from .quickvid import QuickVidIE
from .r7 import R7IE from .r7 import R7IE
from .radiode import RadioDeIE from .radiode import RadioDeIE
from .radiojavan import RadioJavanIE from .radiojavan import RadioJavanIE
@ -653,7 +666,6 @@ from .screenwavemedia import ScreenwaveMediaIE, TeamFourIE
from .senateisvp import SenateISVPIE from .senateisvp import SenateISVPIE
from .servingsys import ServingSysIE from .servingsys import ServingSysIE
from .sexu import SexuIE from .sexu import SexuIE
from .sexykarma import SexyKarmaIE
from .shahid import ShahidIE from .shahid import ShahidIE
from .shared import SharedIE from .shared import SharedIE
from .sharesix import ShareSixIE from .sharesix import ShareSixIE
@ -670,10 +682,6 @@ from .smotri import (
SmotriUserIE, SmotriUserIE,
SmotriBroadcastIE, SmotriBroadcastIE,
) )
from .snagfilms import (
SnagFilmsIE,
SnagFilmsEmbedIE,
)
from .snotr import SnotrIE from .snotr import SnotrIE
from .sohu import SohuIE from .sohu import SohuIE
from .soundcloud import ( from .soundcloud import (
@ -725,9 +733,13 @@ from .svt import (
from .swrmediathek import SWRMediathekIE from .swrmediathek import SWRMediathekIE
from .syfy import SyfyIE from .syfy import SyfyIE
from .sztvhu import SztvHuIE from .sztvhu import SztvHuIE
from .tagesschau import TagesschauIE from .tagesschau import (
TagesschauPlayerIE,
TagesschauIE,
)
from .tapely import TapelyIE from .tapely import TapelyIE
from .tass import TassIE from .tass import TassIE
from .tdslifeway import TDSLifewayIE
from .teachertube import ( from .teachertube import (
TeacherTubeIE, TeacherTubeIE,
TeacherTubeUserIE, TeacherTubeUserIE,
@ -745,7 +757,6 @@ from .teletask import TeleTaskIE
from .testurl import TestURLIE from .testurl import TestURLIE
from .tf1 import TF1IE from .tf1 import TF1IE
from .theintercept import TheInterceptIE from .theintercept import TheInterceptIE
from .theonion import TheOnionIE
from .theplatform import ( from .theplatform import (
ThePlatformIE, ThePlatformIE,
ThePlatformFeedIE, ThePlatformFeedIE,
@ -823,7 +834,6 @@ from .twitch import (
TwitchVodIE, TwitchVodIE,
TwitchProfileIE, TwitchProfileIE,
TwitchPastBroadcastsIE, TwitchPastBroadcastsIE,
TwitchBookmarksIE,
TwitchStreamIE, TwitchStreamIE,
) )
from .twitter import ( from .twitter import (
@ -831,7 +841,6 @@ from .twitter import (
TwitterIE, TwitterIE,
TwitterAmplifyIE, TwitterAmplifyIE,
) )
from .ubu import UbuIE
from .udemy import ( from .udemy import (
UdemyIE, UdemyIE,
UdemyCourseIE UdemyCourseIE
@ -842,14 +851,20 @@ from .unistra import UnistraIE
from .urort import UrortIE from .urort import UrortIE
from .usatoday import USATodayIE from .usatoday import USATodayIE
from .ustream import UstreamIE, UstreamChannelIE from .ustream import UstreamIE, UstreamChannelIE
from .ustudio import UstudioIE from .ustudio import (
UstudioIE,
UstudioEmbedIE,
)
from .varzesh3 import Varzesh3IE from .varzesh3 import Varzesh3IE
from .vbox7 import Vbox7IE from .vbox7 import Vbox7IE
from .veehd import VeeHDIE from .veehd import VeeHDIE
from .veoh import VeohIE from .veoh import VeohIE
from .vessel import VesselIE from .vessel import VesselIE
from .vesti import VestiIE from .vesti import VestiIE
from .vevo import VevoIE from .vevo import (
VevoIE,
VevoPlaylistIE,
)
from .vgtv import ( from .vgtv import (
BTArticleIE, BTArticleIE,
BTVestlendingenIE, BTVestlendingenIE,
@ -878,6 +893,10 @@ from .vidme import (
) )
from .vidzi import VidziIE from .vidzi import VidziIE
from .vier import VierIE, VierVideosIE from .vier import VierIE, VierVideosIE
from .viewlift import (
ViewLiftIE,
ViewLiftEmbedIE,
)
from .viewster import ViewsterIE from .viewster import ViewsterIE
from .viidea import ViideaIE from .viidea import ViideaIE
from .vimeo import ( from .vimeo import (
@ -916,7 +935,7 @@ from .vulture import VultureIE
from .walla import WallaIE from .walla import WallaIE
from .washingtonpost import WashingtonPostIE from .washingtonpost import WashingtonPostIE
from .wat import WatIE from .wat import WatIE
from .wayofthemaster import WayOfTheMasterIE from .watchindianporn import WatchIndianPornIE
from .wdr import ( from .wdr import (
WDRIE, WDRIE,
WDRMobileIE, WDRMobileIE,
@ -940,6 +959,12 @@ from .xhamster import (
XHamsterIE, XHamsterIE,
XHamsterEmbedIE, XHamsterEmbedIE,
) )
from .xiami import (
XiamiSongIE,
XiamiAlbumIE,
XiamiArtistIE,
XiamiCollectionIE
)
from .xminus import XMinusIE from .xminus import XMinusIE
from .xnxx import XNXXIE from .xnxx import XNXXIE
from .xstream import XstreamIE from .xstream import XstreamIE


@ -1,20 +1,19 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urlparse
class FczenitIE(InfoExtractor): class FczenitIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?fc-zenit\.ru/video/gl(?P<id>[0-9]+)' _VALID_URL = r'https?://(?:www\.)?fc-zenit\.ru/video/(?P<id>[0-9]+)'
_TEST = { _TEST = {
'url': 'http://fc-zenit.ru/video/gl6785/', 'url': 'http://fc-zenit.ru/video/41044/',
'md5': '458bacc24549173fe5a5aa29174a5606', 'md5': '0e3fab421b455e970fa1aa3891e57df0',
'info_dict': { 'info_dict': {
'id': '6785', 'id': '41044',
'ext': 'mp4', 'ext': 'mp4',
'title': '«Зенит-ТВ»: как Олег Шатов играл против «Урала»', 'title': 'Так пишется история: казанский разгром ЦСКА на «Зенит-ТВ»',
}, },
} }
@ -22,15 +21,23 @@ class FczenitIE(InfoExtractor):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
video_title = self._html_search_regex(r'<div class=\"photoalbum__title\">([^<]+)', webpage, 'title') video_title = self._html_search_regex(
r'<[^>]+class=\"photoalbum__title\">([^<]+)', webpage, 'title')
bitrates_raw = self._html_search_regex(r'bitrates:.*\n(.*)\]', webpage, 'video URL') video_items = self._parse_json(self._search_regex(
bitrates = re.findall(r'url:.?\'(.+?)\'.*?bitrate:.?([0-9]{3}?)', bitrates_raw) r'arrPath\s*=\s*JSON\.parse\(\'(.+)\'\)', webpage, 'video items'),
video_id)
def merge_dicts(*dicts):
ret = {}
for a_dict in dicts:
ret.update(a_dict)
return ret
formats = [{ formats = [{
'url': furl, 'url': compat_urlparse.urljoin(url, video_url),
'tbr': tbr, 'tbr': int(tbr),
} for furl, tbr in bitrates] } for tbr, video_url in merge_dicts(*video_items).items()]
self._sort_formats(formats) self._sort_formats(formats)
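
Note on the hunk above: the new code reads the `arrPath` JSON from the page, merges its dictionaries, and turns each bitrate/path pair into a format. The sketch below reproduces that flow outside the extractor. The payload shape (a list of single-entry dictionaries mapping bitrate to a relative path) is an assumption made for illustration, not confirmed by the diff, and `sorted` stands in for `_sort_formats`.

```python
import json
from urllib.parse import urljoin


def merge_dicts(*dicts):
    # Later dictionaries override earlier ones on key collisions.
    ret = {}
    for a_dict in dicts:
        ret.update(a_dict)
    return ret


# Hypothetical payload and page URL, for illustration only.
video_items = json.loads(
    '[{"720": "/video/a_720.mp4"}, {"480": "/video/a_480.mp4"}]')
page_url = 'http://fc-zenit.ru/video/41044/'

formats = sorted(
    ({'url': urljoin(page_url, video_url), 'tbr': int(tbr)}
     for tbr, video_url in merge_dicts(*video_items).items()),
    key=lambda f: f['tbr'])
```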


@ -2,78 +2,133 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import int_or_none from ..compat import compat_xpath
from ..utils import (
int_or_none,
qualities,
unified_strdate,
xpath_attr,
xpath_element,
xpath_text,
xpath_with_ns,
)
class FirstTVIE(InfoExtractor): class FirstTVIE(InfoExtractor):
IE_NAME = '1tv' IE_NAME = '1tv'
IE_DESC = 'Первый канал' IE_DESC = 'Первый канал'
_VALID_URL = r'https?://(?:www\.)?1tv\.ru/(?:[^/]+/)+(?P<id>.+)' _VALID_URL = r'https?://(?:www\.)?1tv\.ru/(?:[^/]+/)+p?(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.1tv.ru/videoarchive/73390', # single format via video_materials.json API
'md5': '777f525feeec4806130f4f764bc18a4f',
'info_dict': {
'id': '73390',
'ext': 'mp4',
'title': 'Олимпийские канатные дороги',
'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
'thumbnail': 're:^https?://.*\.(?:jpg|JPG)$',
'duration': 149,
'like_count': int,
'dislike_count': int,
},
'skip': 'Only works from Russia',
}, {
'url': 'http://www.1tv.ru/prj/inprivate/vypusk/35930', 'url': 'http://www.1tv.ru/prj/inprivate/vypusk/35930',
'md5': 'a1b6b60d530ebcf8daacf4565762bbaf', 'md5': '82a2777648acae812d58b3f5bd42882b',
'info_dict': { 'info_dict': {
'id': '35930', 'id': '35930',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Наедине со всеми. Людмила Сенчина', 'title': 'Гость Людмила Сенчина. Наедине со всеми. Выпуск от 12.02.2015',
'description': 'md5:89553aed1d641416001fe8d450f06cb9', 'description': 'md5:357933adeede13b202c7c21f91b871b2',
'thumbnail': 're:^https?://.*\.(?:jpg|JPG)$', 'thumbnail': 're:^https?://.*\.(?:jpg|JPG)$',
'upload_date': '20150212',
'duration': 2694, 'duration': 2694,
}, },
'skip': 'Only works from Russia', }, {
# multiple formats via video_materials.json API
'url': 'http://www.1tv.ru/video_archive/projects/dobroeutro/p113641',
'info_dict': {
'id': '113641',
'ext': 'mp4',
'title': 'Весенняя аллергия. Доброе утро. Фрагмент выпуска от 07.04.2016',
'description': 'md5:8dcebb3dded0ff20fade39087fd1fee2',
'thumbnail': 're:^https?://.*\.(?:jpg|JPG)$',
'upload_date': '20160407',
'duration': 179,
'formats': 'mincount:3',
},
'params': {
'skip_download': True,
},
}, {
# single format only available via ONE_ONLINE_VIDEOS.archive_single_xml API
'url': 'http://www.1tv.ru/video_archive/series/f7552/p47038',
'md5': '519d306c5b5669761fd8906c39dbee23',
'info_dict': {
'id': '47038',
'ext': 'mp4',
'title': '"Побег". Второй сезон. 3 серия',
'description': 'md5:3abf8f6b9bce88201c33e9a3d794a00b',
'thumbnail': 're:^https?://.*\.(?:jpg|JPG)$',
'upload_date': '20120516',
'duration': 3080,
},
}, {
'url': 'http://www.1tv.ru/videoarchive/9967',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id, 'Downloading page') # Videos with multiple formats only available via this API
video = self._download_json(
'http://www.1tv.ru/video_materials.json?legacy_id=%s' % video_id,
video_id, fatal=False)
video_url = self._html_search_regex( description, thumbnail, upload_date, duration = [None] * 4
r'''(?s)(?:jwplayer\('flashvideoportal_1'\)\.setup\({|var\s+playlistObj\s*=).*?'file'\s*:\s*'([^']+)'.*?}\);''',
webpage, 'video URL')
if video:
item = video[0]
title = item['title']
quality = qualities(('ld', 'sd', 'hd', ))
formats = [{
'url': f['src'],
'format_id': f.get('name'),
'quality': quality(f.get('name')),
} for f in item['mbr'] if f.get('src')]
thumbnail = item.get('poster')
else:
# Some videos are not available via video_materials.json
video = self._download_xml(
'http://www.1tv.ru/owa/win/ONE_ONLINE_VIDEOS.archive_single_xml?pid=%s' % video_id,
video_id)
NS_MAP = {
'media': 'http://search.yahoo.com/mrss/',
}
item = xpath_element(video, './channel/item', fatal=True)
title = xpath_text(item, './title', fatal=True)
formats = [{
'url': content.attrib['url'],
} for content in item.findall(
compat_xpath(xpath_with_ns('./media:content', NS_MAP))) if content.attrib.get('url')]
thumbnail = xpath_attr(
item, xpath_with_ns('./media:thumbnail', NS_MAP), 'url')
self._sort_formats(formats)
webpage = self._download_webpage(url, video_id, 'Downloading page', fatal=False)
if webpage:
title = self._html_search_regex( title = self._html_search_regex(
[r'<div class="tv_translation">\s*<h1><a href="[^"]+">([^<]*)</a>', (r'<div class="tv_translation">\s*<h1><a href="[^"]+">([^<]*)</a>',
r"'title'\s*:\s*'([^']+)'"], webpage, 'title') r"'title'\s*:\s*'([^']+)'"),
webpage, 'title', default=None) or title
description = self._html_search_regex( description = self._html_search_regex(
r'<div class="descr">\s*<div>&nbsp;</div>\s*<p>([^<]*)</p></div>', r'<div class="descr">\s*<div>&nbsp;</div>\s*<p>([^<]*)</p></div>',
webpage, 'description', default=None) or self._html_search_meta( webpage, 'description', default=None) or self._html_search_meta(
'description', webpage, 'description') 'description', webpage, 'description')
thumbnail = thumbnail or self._og_search_thumbnail(webpage)
thumbnail = self._og_search_thumbnail(webpage) duration = int_or_none(self._html_search_meta(
duration = self._og_search_property( 'video:duration', webpage, 'video duration', fatal=False))
'video:duration', webpage, upload_date = unified_strdate(self._html_search_meta(
'video duration', fatal=False) 'ya:ovs:upload_date', webpage, 'upload date', fatal=False))
like_count = self._html_search_regex(
r'title="Понравилось".*?/></label> \[(\d+)\]',
webpage, 'like count', default=None)
dislike_count = self._html_search_regex(
r'title="Не понравилось".*?/></label> \[(\d+)\]',
webpage, 'dislike count', default=None)
return { return {
'id': video_id, 'id': video_id,
'url': video_url,
'thumbnail': thumbnail, 'thumbnail': thumbnail,
'title': title, 'title': title,
'description': description, 'description': description,
'upload_date': upload_date,
'duration': int_or_none(duration), 'duration': int_or_none(duration),
'like_count': int_or_none(like_count), 'formats': formats
'dislike_count': int_or_none(dislike_count),
} }
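
Note on the hunk above: formats from `video_materials.json` are ranked with `qualities(('ld', 'sd', 'hd', ))`, and entries without a `src` are dropped. A small sketch of that ranking follows. The `qualities` function here is a stand-in written to mirror how the `youtube_dl.utils` helper is understood to behave (index in the tuple, `-1` for unknown names), and the `mbr` entries are hypothetical.

```python
def qualities(quality_ids):
    # Stand-in (assumption): rank a quality name by its position in
    # quality_ids; unknown or missing names rank lowest.
    def q(qid):
        try:
            return quality_ids.index(qid)
        except ValueError:
            return -1
    return q


quality = qualities(('ld', 'sd', 'hd'))

# Hypothetical 'mbr' list; the entry without 'src' is skipped.
mbr = [
    {'name': 'hd', 'src': 'http://example.com/video_hd.mp4'},
    {'name': 'ld', 'src': 'http://example.com/video_ld.mp4'},
    {'name': 'sd'},
]

formats = [{
    'url': f['src'],
    'format_id': f.get('name'),
    'quality': quality(f.get('name')),
} for f in mbr if f.get('src')]

best = max(formats, key=lambda f: f['quality'])
```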


@ -24,13 +24,28 @@ class FlickrIE(InfoExtractor):
'upload_date': '20110423', 'upload_date': '20110423',
'uploader_id': '10922353@N03', 'uploader_id': '10922353@N03',
'uploader': 'Forest Wander', 'uploader': 'Forest Wander',
'uploader_url': 'https://www.flickr.com/photos/forestwander-nature-pictures/',
'comment_count': int, 'comment_count': int,
'view_count': int, 'view_count': int,
'tags': list, 'tags': list,
'license': 'Attribution-ShareAlike',
} }
} }
_API_BASE_URL = 'https://api.flickr.com/services/rest?' _API_BASE_URL = 'https://api.flickr.com/services/rest?'
# https://help.yahoo.com/kb/flickr/SLN25525.html
_LICENSES = {
'0': 'All Rights Reserved',
'1': 'Attribution-NonCommercial-ShareAlike',
'2': 'Attribution-NonCommercial',
'3': 'Attribution-NonCommercial-NoDerivs',
'4': 'Attribution',
'5': 'Attribution-ShareAlike',
'6': 'Attribution-NoDerivs',
'7': 'No known copyright restrictions',
'8': 'United States government work',
'9': 'Public Domain Dedication (CC0)',
'10': 'Public Domain Work',
}
def _call_api(self, method, video_id, api_key, note, secret=None): def _call_api(self, method, video_id, api_key, note, secret=None):
query = { query = {
@ -75,6 +90,9 @@ class FlickrIE(InfoExtractor):
self._sort_formats(formats) self._sort_formats(formats)
owner = video_info.get('owner', {}) owner = video_info.get('owner', {})
uploader_id = owner.get('nsid')
uploader_path = owner.get('path_alias') or uploader_id
uploader_url = 'https://www.flickr.com/photos/%s/' % uploader_path if uploader_path else None
return { return {
'id': video_id, 'id': video_id,
@ -83,11 +101,13 @@ class FlickrIE(InfoExtractor):
'formats': formats, 'formats': formats,
'timestamp': int_or_none(video_info.get('dateuploaded')), 'timestamp': int_or_none(video_info.get('dateuploaded')),
'duration': int_or_none(video_info.get('video', {}).get('duration')), 'duration': int_or_none(video_info.get('video', {}).get('duration')),
'uploader_id': owner.get('nsid'), 'uploader_id': uploader_id,
'uploader': owner.get('realname'), 'uploader': owner.get('realname'),
'uploader_url': uploader_url,
'comment_count': int_or_none(video_info.get('comments', {}).get('_content')), 'comment_count': int_or_none(video_info.get('comments', {}).get('_content')),
'view_count': int_or_none(video_info.get('views')), 'view_count': int_or_none(video_info.get('views')),
'tags': [tag.get('_content') for tag in video_info.get('tags', {}).get('tag', [])] 'tags': [tag.get('_content') for tag in video_info.get('tags', {}).get('tag', [])],
'license': self._LICENSES.get(video_info.get('license')),
} }
else: else:
raise ExtractorError('not a video', expected=True) raise ExtractorError('not a video', expected=True)
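
Note on the hunk above: `uploader_url` prefers the owner's `path_alias` and falls back to the `nsid`, while `license` is a plain dictionary lookup that yields `None` for unknown codes. A standalone sketch of both follows, using only a subset of the `_LICENSES` table and a hypothetical helper name `uploader_fields`.

```python
# Subset of the _LICENSES table from the diff, for illustration.
LICENSES = {
    '0': 'All Rights Reserved',
    '4': 'Attribution',
    '5': 'Attribution-ShareAlike',
}


def uploader_fields(video_info):
    # Hypothetical helper; the extractor computes these inline.
    owner = video_info.get('owner', {})
    uploader_id = owner.get('nsid')
    uploader_path = owner.get('path_alias') or uploader_id
    uploader_url = (
        'https://www.flickr.com/photos/%s/' % uploader_path
        if uploader_path else None)
    return {
        'uploader_id': uploader_id,
        'uploader_url': uploader_url,
        'license': LICENSES.get(video_info.get('license')),
    }


with_alias = uploader_fields({
    'owner': {'nsid': '10922353@N03',
              'path_alias': 'forestwander-nature-pictures'},
    'license': '5',
})
without_alias = uploader_fields({'owner': {'nsid': '10922353@N03'}})
```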


@ -2,6 +2,10 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import (
compat_HTTPError,
compat_urllib_parse_unquote_plus,
)
from ..utils import ( from ..utils import (
clean_html, clean_html,
determine_ext, determine_ext,
@ -27,6 +31,7 @@ class FunimationIE(InfoExtractor):
'description': 'md5:1769f43cd5fc130ace8fd87232207892', 'description': 'md5:1769f43cd5fc130ace8fd87232207892',
'thumbnail': 're:https?://.*\.jpg', 'thumbnail': 're:https?://.*\.jpg',
}, },
'skip': 'Access without user interaction is forbidden by CloudFlare, and video removed',
}, { }, {
'url': 'http://www.funimation.com/shows/hacksign/videos/official/role-play', 'url': 'http://www.funimation.com/shows/hacksign/videos/official/role-play',
'info_dict': { 'info_dict': {
@ -37,6 +42,7 @@ class FunimationIE(InfoExtractor):
'description': 'md5:b602bdc15eef4c9bbb201bb6e6a4a2dd', 'description': 'md5:b602bdc15eef4c9bbb201bb6e6a4a2dd',
'thumbnail': 're:https?://.*\.jpg', 'thumbnail': 're:https?://.*\.jpg',
}, },
'skip': 'Access without user interaction is forbidden by CloudFlare',
}, { }, {
'url': 'http://www.funimation.com/shows/attack-on-titan-junior-high/videos/promotional/broadcast-dub-preview', 'url': 'http://www.funimation.com/shows/attack-on-titan-junior-high/videos/promotional/broadcast-dub-preview',
'info_dict': { 'info_dict': {
@ -47,8 +53,36 @@ class FunimationIE(InfoExtractor):
'description': 'md5:f8ec49c0aff702a7832cd81b8a44f803', 'description': 'md5:f8ec49c0aff702a7832cd81b8a44f803',
'thumbnail': 're:https?://.*\.(?:jpg|png)', 'thumbnail': 're:https?://.*\.(?:jpg|png)',
}, },
'skip': 'Access without user interaction is forbidden by CloudFlare',
}] }]
_LOGIN_URL = 'http://www.funimation.com/login'
def _download_webpage(self, *args, **kwargs):
try:
return super(FunimationIE, self)._download_webpage(*args, **kwargs)
except ExtractorError as ee:
if isinstance(ee.cause, compat_HTTPError) and ee.cause.code == 403:
response = ee.cause.read()
if b'>Please complete the security check to access<' in response:
raise ExtractorError(
'Access to funimation.com is blocked by CloudFlare. '
'Please browse to http://www.funimation.com/, solve '
'the reCAPTCHA, export browser cookies to a text file,'
' and then try again with --cookies YOUR_COOKIE_FILE.',
expected=True)
raise
def _extract_cloudflare_session_ua(self, url):
ci_session_cookie = self._get_cookies(url).get('ci_session')
if ci_session_cookie:
ci_session = compat_urllib_parse_unquote_plus(ci_session_cookie.value)
# ci_session is a string serialized by PHP function serialize()
# This case is simple enough to use regular expressions only
return self._search_regex(
r'"user_agent";s:\d+:"([^"]+)"', ci_session, 'user agent',
default=None)
def _login(self): def _login(self):
(username, password) = self._get_login_info() (username, password) = self._get_login_info()
if username is None: if username is None:
@ -57,8 +91,11 @@ class FunimationIE(InfoExtractor):
'email_field': username, 'email_field': username,
'password_field': password, 'password_field': password,
}) })
login_request = sanitized_Request('http://www.funimation.com/login', data, headers={ user_agent = self._extract_cloudflare_session_ua(self._LOGIN_URL)
'User-Agent': 'Mozilla/5.0 (Windows NT 5.2; WOW64; rv:42.0) Gecko/20100101 Firefox/42.0', if not user_agent:
user_agent = 'Mozilla/5.0 (Windows NT 5.2; WOW64; rv:42.0) Gecko/20100101 Firefox/42.0'
login_request = sanitized_Request(self._LOGIN_URL, data, headers={
'User-Agent': user_agent,
'Content-Type': 'application/x-www-form-urlencoded' 'Content-Type': 'application/x-www-form-urlencoded'
}) })
login_page = self._download_webpage( login_page = self._download_webpage(
@ -103,11 +140,16 @@ class FunimationIE(InfoExtractor):
('mobile', 'Mozilla/5.0 (Linux; Android 4.4.2; Nexus 4 Build/KOT49H) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.114 Mobile Safari/537.36'), ('mobile', 'Mozilla/5.0 (Linux; Android 4.4.2; Nexus 4 Build/KOT49H) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.114 Mobile Safari/537.36'),
) )
user_agent = self._extract_cloudflare_session_ua(url)
if user_agent:
USER_AGENTS = ((None, user_agent),)
for kind, user_agent in USER_AGENTS: for kind, user_agent in USER_AGENTS:
request = sanitized_Request(url) request = sanitized_Request(url)
request.add_header('User-Agent', user_agent) request.add_header('User-Agent', user_agent)
webpage = self._download_webpage( webpage = self._download_webpage(
request, display_id, 'Downloading %s webpage' % kind) request, display_id,
'Downloading %s webpage' % kind if kind else 'Downloading webpage')
playlist = self._parse_json( playlist = self._parse_json(
self._search_regex( self._search_regex(
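
Note on the Funimation hunks above: `_extract_cloudflare_session_ua` URL-decodes the `ci_session` cookie and pulls the `user_agent` field out of the PHP-serialized string with a regular expression, so later requests can reuse the User-Agent that passed the CloudFlare check. The sketch below shows the same extraction on a hand-built cookie value. The serialized string and the function name are hypothetical, and a real `ci_session` cookie is likely to carry more fields.

```python
import re
from urllib.parse import quote_plus, unquote_plus


def extract_session_user_agent(cookie_value):
    # Hypothetical standalone version of the extractor method.
    ci_session = unquote_plus(cookie_value)
    # PHP serialize() writes strings as s:<length>:"<value>";
    mobj = re.search(r'"user_agent";s:\d+:"([^"]+)"', ci_session)
    return mobj.group(1) if mobj else None


# Hand-built, hypothetical cookie value for illustration.
serialized = 'a:1:{s:10:"user_agent";s:11:"Mozilla/5.0";}'
cookie_value = quote_plus(serialized)

user_agent = extract_session_user_agent(cookie_value)
missing = extract_session_user_agent(quote_plus('a:0:{}'))
```

As the comment in the diff says, a regular expression is enough here because only one flat string field is needed; a value containing a double quote would need a real parser for the serialized format.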


@ -46,8 +46,8 @@ class FunnyOrDieIE(InfoExtractor):
links.sort(key=lambda link: 1 if link[1] == 'mp4' else 0) links.sort(key=lambda link: 1 if link[1] == 'mp4' else 0)
m3u8_url = self._search_regex( m3u8_url = self._search_regex(
r'<source[^>]+src=(["\'])(?P<url>.+?/master\.m3u8)\1', r'<source[^>]+src=(["\'])(?P<url>.+?/master\.m3u8[^"\']*)\1',
webpage, 'm3u8 url', default=None, group='url') webpage, 'm3u8 url', group='url')
formats = [] formats = []
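
Note on the hunk above: the added `[^"\']*` lets the pattern accept a query string after `master.m3u8`, which the old pattern rejected because it required the closing quote immediately after the file name. A quick check on a hypothetical `<source>` tag:

```python
import re

OLD_PATTERN = r'<source[^>]+src=(["\'])(?P<url>.+?/master\.m3u8)\1'
NEW_PATTERN = r'<source[^>]+src=(["\'])(?P<url>.+?/master\.m3u8[^"\']*)\1'

# Hypothetical markup, for illustration only.
html = ('<source src="http://example.com/v/master.m3u8?token=abc" '
        'type="application/x-mpegURL">')

old_match = re.search(OLD_PATTERN, html)
new_match = re.search(NEW_PATTERN, html)
m3u8_url = new_match.group('url') if new_match else None
```

The same hunk also drops `default=None`, so a page without a matching `<source>` tag now raises an error instead of silently yielding no HLS URL.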


@ -7,7 +7,7 @@ from .common import InfoExtractor
class GazetaIE(InfoExtractor): class GazetaIE(InfoExtractor):
_VALID_URL = r'(?P<url>https?://(?:www\.)?gazeta\.ru/(?:[^/]+/)?video/(?:(?:main|\d{4}/\d{2}/\d{2})/)?(?P<id>[A-Za-z0-9-_.]+)\.s?html)' _VALID_URL = r'(?P<url>https?://(?:www\.)?gazeta\.ru/(?:[^/]+/)?video/(?:main/)*(?:\d{4}/\d{2}/\d{2}/)?(?P<id>[A-Za-z0-9-_.]+)\.s?html)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.gazeta.ru/video/main/zadaite_vopros_vladislavu_yurevichu.shtml', 'url': 'http://www.gazeta.ru/video/main/zadaite_vopros_vladislavu_yurevichu.shtml',
'md5': 'd49c9bdc6e5a7888f27475dc215ee789', 'md5': 'd49c9bdc6e5a7888f27475dc215ee789',
@ -18,9 +18,19 @@ class GazetaIE(InfoExtractor):
'description': 'md5:38617526050bd17b234728e7f9620a71', 'description': 'md5:38617526050bd17b234728e7f9620a71',
'thumbnail': 're:^https?://.*\.jpg', 'thumbnail': 're:^https?://.*\.jpg',
}, },
'skip': 'video not found',
}, { }, {
'url': 'http://www.gazeta.ru/lifestyle/video/2015/03/08/master-klass_krasivoi_byt._delaem_vesennii_makiyazh.shtml', 'url': 'http://www.gazeta.ru/lifestyle/video/2015/03/08/master-klass_krasivoi_byt._delaem_vesennii_makiyazh.shtml',
'only_matching': True, 'only_matching': True,
}, {
'url': 'http://www.gazeta.ru/video/main/main/2015/06/22/platit_ili_ne_platit_po_isku_yukosa.shtml',
'md5': '37f19f78355eb2f4256ee1688359f24c',
'info_dict': {
'id': '252048',
'ext': 'mp4',
'title': '"Если по иску ЮКОСа придется платить, это будет большой удар по бюджету"',
},
'add_ie': ['EaglePlatform'],
}] }]
def _real_extract(self, url): def _real_extract(self, url):
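
Note on the `_VALID_URL` change above: the new pattern allows `main/` to repeat and makes the date segment independent of it, which is what the added `video/main/main/2015/06/22/...` test needs. The sketch below runs both patterns against the three test URLs from the diff.

```python
import re

OLD_VALID_URL = r'(?P<url>https?://(?:www\.)?gazeta\.ru/(?:[^/]+/)?video/(?:(?:main|\d{4}/\d{2}/\d{2})/)?(?P<id>[A-Za-z0-9-_.]+)\.s?html)'
NEW_VALID_URL = r'(?P<url>https?://(?:www\.)?gazeta\.ru/(?:[^/]+/)?video/(?:main/)*(?:\d{4}/\d{2}/\d{2}/)?(?P<id>[A-Za-z0-9-_.]+)\.s?html)'

urls = [
    'http://www.gazeta.ru/video/main/zadaite_vopros_vladislavu_yurevichu.shtml',
    'http://www.gazeta.ru/lifestyle/video/2015/03/08/master-klass_krasivoi_byt._delaem_vesennii_makiyazh.shtml',
    'http://www.gazeta.ru/video/main/main/2015/06/22/platit_ili_ne_platit_po_isku_yukosa.shtml',
]

# Display id captured by the new pattern for each URL.
new_ids = [re.match(NEW_VALID_URL, u).group('id') for u in urls]
# Whether the old pattern matched each URL at all.
old_matches = [re.match(OLD_VALID_URL, u) is not None for u in urls]
```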


@ -4,7 +4,6 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
remove_end,
HEADRequest, HEADRequest,
sanitized_Request, sanitized_Request,
urlencode_postdata, urlencode_postdata,
@ -51,63 +50,33 @@ class GDCVaultIE(InfoExtractor):
{ {
'url': 'http://gdcvault.com/play/1020791/', 'url': 'http://gdcvault.com/play/1020791/',
'only_matching': True, 'only_matching': True,
},
{
# Hard-coded hostname
'url': 'http://gdcvault.com/play/1023460/Tenacious-Design-and-The-Interface',
'md5': 'a8efb6c31ed06ca8739294960b2dbabd',
'info_dict': {
'id': '1023460',
'ext': 'mp4',
'display_id': 'Tenacious-Design-and-The-Interface',
'title': 'Tenacious Design and The Interface of \'Destiny\'',
},
},
{
# Multiple audios
'url': 'http://www.gdcvault.com/play/1014631/Classic-Game-Postmortem-PAC',
'info_dict': {
'id': '1014631',
'ext': 'flv',
'title': 'How to Create a Good Game - From My Experience of Designing Pac-Man',
},
'params': {
'skip_download': True, # Requires rtmpdump
'format': 'jp', # The japanese audio
} }
},
] ]
def _parse_mp4(self, xml_description):
video_formats = []
mp4_video = xml_description.find('./metadata/mp4video')
if mp4_video is None:
return None
mobj = re.match(r'(?P<root>https?://.*?/).*', mp4_video.text)
video_root = mobj.group('root')
formats = xml_description.findall('./metadata/MBRVideos/MBRVideo')
for format in formats:
mobj = re.match(r'mp4\:(?P<path>.*)', format.find('streamName').text)
url = video_root + mobj.group('path')
vbr = format.find('bitrate').text
video_formats.append({
'url': url,
'vbr': int(vbr),
})
return video_formats
def _parse_flv(self, xml_description):
formats = []
akamai_url = xml_description.find('./metadata/akamaiHost').text
audios = xml_description.find('./metadata/audios')
if audios is not None:
for audio in audios:
formats.append({
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
'play_path': remove_end(audio.get('url'), '.flv'),
'ext': 'flv',
'vcodec': 'none',
'format_id': audio.get('code'),
})
slide_video_path = xml_description.find('./metadata/slideVideo').text
formats.append({
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
'play_path': remove_end(slide_video_path, '.flv'),
'ext': 'flv',
'format_note': 'slide deck video',
'quality': -2,
'preference': -2,
'format_id': 'slides',
})
speaker_video_path = xml_description.find('./metadata/speakerVideo').text
formats.append({
'url': 'rtmp://%s/ondemand?ovpfv=1.1' % akamai_url,
'play_path': remove_end(speaker_video_path, '.flv'),
'ext': 'flv',
'format_note': 'speaker video',
'quality': -1,
'preference': -1,
'format_id': 'speaker',
})
return formats
def _login(self, webpage_url, display_id): def _login(self, webpage_url, display_id):
(username, password) = self._get_login_info() (username, password) = self._get_login_info()
if username is None or password is None: if username is None or password is None:
@ -183,17 +152,10 @@ class GDCVaultIE(InfoExtractor):
r'<iframe src=".*?\?xmlURL=xml/(?P<xml_file>.+?\.xml).*?".*?</iframe>', r'<iframe src=".*?\?xmlURL=xml/(?P<xml_file>.+?\.xml).*?".*?</iframe>',
start_page, 'xml filename') start_page, 'xml filename')
xml_description = self._download_xml(
'%s/xml/%s' % (xml_root, xml_name), display_id)
video_title = xml_description.find('./metadata/title').text
video_formats = self._parse_mp4(xml_description)
if video_formats is None:
video_formats = self._parse_flv(xml_description)
return { return {
'_type': 'url_transparent',
'id': video_id, 'id': video_id,
'display_id': display_id, 'display_id': display_id,
'title': video_title, 'url': '%s/xml/%s' % (xml_root, xml_name),
'formats': video_formats, 'ie_key': 'DigitallySpeaking',
} }
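
The new return value hands the XML URL to the `DigitallySpeaking` extractor through a `url_transparent` result. The sketch below is a simplified model of how such a result is usually combined with the delegated extractor's output, assuming that non-empty outer fields take precedence. The function `merge_url_transparent` and the sample values are hypothetical; this is not youtube-dl's actual merging code.

```python
def merge_url_transparent(outer, inner):
    """Simplified model: non-None outer fields override the inner result.

    Routing keys ('_type', 'url', 'ie_key') only tell the framework where to
    delegate, so they are not copied into the merged result.
    """
    forced = {
        key: value for key, value in outer.items()
        if value is not None and key not in ('_type', 'url', 'ie_key')
    }
    merged = dict(inner)
    merged.update(forced)
    return merged


# Outer result shaped like the one returned in the hunk above
# (the XML URL is a placeholder, not a real endpoint).
outer_result = {
    '_type': 'url_transparent',
    'id': '1023460',
    'display_id': 'Tenacious-Design-and-The-Interface',
    'url': 'http://example.invalid/xml/sample.xml',
    'ie_key': 'DigitallySpeaking',
}

# Hypothetical output of the delegated extractor.
inner_result = {'id': 'sample', 'title': 'Sample talk', 'formats': []}

merged_result = merge_url_transparent(outer_result, inner_result)
```

Under this model the page-level `id` and `display_id` survive, while the title and formats come from the delegated extractor.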

View File

@ -51,7 +51,7 @@ from .tnaflix import TNAFlixNetworkEmbedIE
from .vimeo import VimeoIE from .vimeo import VimeoIE
from .dailymotion import DailymotionCloudIE from .dailymotion import DailymotionCloudIE
from .onionstudios import OnionStudiosIE from .onionstudios import OnionStudiosIE
from .snagfilms import SnagFilmsEmbedIE from .viewlift import ViewLiftEmbedIE
from .screenwavemedia import ScreenwaveMediaIE from .screenwavemedia import ScreenwaveMediaIE
from .mtv import MTVServicesEmbeddedIE from .mtv import MTVServicesEmbeddedIE
from .pladform import PladformIE from .pladform import PladformIE
@ -60,6 +60,7 @@ from .googledrive import GoogleDriveIE
from .jwplatform import JWPlatformIE from .jwplatform import JWPlatformIE
from .digiteka import DigitekaIE from .digiteka import DigitekaIE
from .instagram import InstagramIE from .instagram import InstagramIE
from .liveleak import LiveLeakIE
class GenericIE(InfoExtractor): class GenericIE(InfoExtractor):
@ -104,7 +105,8 @@ class GenericIE(InfoExtractor):
'skip_download': True, # infinite live stream 'skip_download': True, # infinite live stream
}, },
'expected_warnings': [ 'expected_warnings': [
r'501.*Not Implemented' r'501.*Not Implemented',
r'400.*Bad Request',
], ],
}, },
# Direct link with incorrect MIME type # Direct link with incorrect MIME type
@ -235,6 +237,7 @@ class GenericIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 'car-20120827-manifest', 'title': 'car-20120827-manifest',
'formats': 'mincount:9', 'formats': 'mincount:9',
'upload_date': '20130904',
}, },
'params': { 'params': {
'format': 'bestvideo', 'format': 'bestvideo',
@ -594,7 +597,11 @@ class GenericIE(InfoExtractor):
'id': 'k2mm4bCdJ6CQ2i7c8o2', 'id': 'k2mm4bCdJ6CQ2i7c8o2',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Le Zap de Spi0n n°216 - Zapping du Web', 'title': 'Le Zap de Spi0n n°216 - Zapping du Web',
'description': 'md5:faf028e48a461b8b7fad38f1e104b119',
'uploader': 'Spi0n', 'uploader': 'Spi0n',
'uploader_id': 'xgditw',
'upload_date': '20140425',
'timestamp': 1398441542,
}, },
'add_ie': ['Dailymotion'], 'add_ie': ['Dailymotion'],
}, },
@ -727,8 +734,11 @@ class GenericIE(InfoExtractor):
'id': 'uxjb0lwrcz', 'id': 'uxjb0lwrcz',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Conversation about Hexagonal Rails Part 1 - ThoughtWorks', 'title': 'Conversation about Hexagonal Rails Part 1 - ThoughtWorks',
'description': 'a Martin Fowler video from ThoughtWorks',
'duration': 1715.0, 'duration': 1715.0,
'uploader': 'thoughtworks.wistia.com', 'uploader': 'thoughtworks.wistia.com',
'upload_date': '20140603',
'timestamp': 1401832161,
}, },
}, },
# Soundcloud embed # Soundcloud embed
@ -877,6 +887,7 @@ class GenericIE(InfoExtractor):
# Eagle.Platform embed (generic URL) # Eagle.Platform embed (generic URL)
{ {
'url': 'http://lenta.ru/news/2015/03/06/navalny/', 'url': 'http://lenta.ru/news/2015/03/06/navalny/',
# Not checking MD5 as sometimes the direct HTTP link results in 404 and HLS is used
'info_dict': { 'info_dict': {
'id': '227304', 'id': '227304',
'ext': 'mp4', 'ext': 'mp4',
@ -891,6 +902,7 @@ class GenericIE(InfoExtractor):
# ClipYou (Eagle.Platform) embed (custom URL) # ClipYou (Eagle.Platform) embed (custom URL)
{ {
'url': 'http://muz-tv.ru/play/7129/', 'url': 'http://muz-tv.ru/play/7129/',
# Not checking MD5 as sometimes the direct HTTP link results in 404 and HLS is used
'info_dict': { 'info_dict': {
'id': '12820', 'id': '12820',
'ext': 'mp4', 'ext': 'mp4',
@ -979,6 +991,9 @@ class GenericIE(InfoExtractor):
'ext': 'flv', 'ext': 'flv',
'title': "PFT Live: New leader in the 'new-look' defense", 'title': "PFT Live: New leader in the 'new-look' defense",
'description': 'md5:65a19b4bbfb3b0c0c5768bed1dfad74e', 'description': 'md5:65a19b4bbfb3b0c0c5768bed1dfad74e',
'uploader': 'NBCU-SPORTS',
'upload_date': '20140107',
'timestamp': 1389118457,
}, },
}, },
# UDN embed # UDN embed
@ -1031,6 +1046,9 @@ class GenericIE(InfoExtractor):
'title': 'SN Presents: Russell Martin, World Citizen', 'title': 'SN Presents: Russell Martin, World Citizen',
'description': 'To understand why he was the Toronto Blue Jays top off-season priority is to appreciate his background and upbringing in Montreal, where he first developed his baseball skills. Written and narrated by Stephen Brunt.', 'description': 'To understand why he was the Toronto Blue Jays top off-season priority is to appreciate his background and upbringing in Montreal, where he first developed his baseball skills. Written and narrated by Stephen Brunt.',
'uploader': 'Rogers Sportsnet', 'uploader': 'Rogers Sportsnet',
'uploader_id': '1704050871',
'upload_date': '20150525',
'timestamp': 1432570283,
}, },
}, },
# Dailymotion Cloud video # Dailymotion Cloud video
@ -1122,12 +1140,39 @@ class GenericIE(InfoExtractor):
'title': 'The Cardinal Pell Interview', 'title': 'The Cardinal Pell Interview',
'description': 'Sky News Contributor Andrew Bolt interviews George Pell in Rome, following the Cardinal\'s evidence before the Royal Commission into Child Abuse. ', 'description': 'Sky News Contributor Andrew Bolt interviews George Pell in Rome, following the Cardinal\'s evidence before the Royal Commission into Child Abuse. ',
'uploader': 'GlobeCast Australia - GlobeStream', 'uploader': 'GlobeCast Australia - GlobeStream',
'uploader_id': '2733773828001',
'upload_date': '20160304',
'timestamp': 1457083087,
}, },
'params': { 'params': {
# m3u8 downloads # m3u8 downloads
'skip_download': True, 'skip_download': True,
}, },
}, },
# Another form of arte.tv embed
{
'url': 'http://www.tv-replay.fr/redirection/09-04-16/arte-reportage-arte-11508975.html',
'md5': '850bfe45417ddf221288c88a0cffe2e2',
'info_dict': {
'id': '030273-562_PLUS7-F',
'ext': 'mp4',
'title': 'ARTE Reportage - Nulle part, en France',
'description': 'md5:e3a0e8868ed7303ed509b9e3af2b870d',
'upload_date': '20160409',
},
},
# LiveLeak embed
{
'url': 'http://www.wykop.pl/link/3088787/',
'md5': 'ace83b9ed19b21f68e1b50e844fdf95d',
'info_dict': {
'id': '874_1459135191',
'ext': 'mp4',
'title': 'Man shows poor quality of new apartment building',
'description': 'The wall is like a sand pile.',
'uploader': 'Lake8737',
}
},
] ]
def report_following_redirect(self, new_url): def report_following_redirect(self, new_url):
@ -1702,7 +1747,7 @@ class GenericIE(InfoExtractor):
# Look for embedded arte.tv player # Look for embedded arte.tv player
mobj = re.search( mobj = re.search(
r'<script [^>]*?src="(?P<url>http://www\.arte\.tv/playerv2/embed[^"]+)"', r'<(?:script|iframe) [^>]*?src="(?P<url>http://www\.arte\.tv/(?:playerv2/embed|arte_vp/index)[^"]+)"',
webpage) webpage)
if mobj is not None: if mobj is not None:
return self.url_result(mobj.group('url'), 'ArteTVEmbed') return self.url_result(mobj.group('url'), 'ArteTVEmbed')
@ -1879,10 +1924,10 @@ class GenericIE(InfoExtractor):
if onionstudios_url: if onionstudios_url:
return self.url_result(onionstudios_url) return self.url_result(onionstudios_url)
# Look for SnagFilms embeds # Look for ViewLift embeds
snagfilms_url = SnagFilmsEmbedIE._extract_url(webpage) viewlift_url = ViewLiftEmbedIE._extract_url(webpage)
if snagfilms_url: if viewlift_url:
return self.url_result(snagfilms_url) return self.url_result(viewlift_url)
# Look for JWPlatform embeds # Look for JWPlatform embeds
jwplatform_url = JWPlatformIE._extract_url(webpage) jwplatform_url = JWPlatformIE._extract_url(webpage)
@ -1930,7 +1975,13 @@ class GenericIE(InfoExtractor):
# Look for Instagram embeds # Look for Instagram embeds
instagram_embed_url = InstagramIE._extract_embed_url(webpage) instagram_embed_url = InstagramIE._extract_embed_url(webpage)
if instagram_embed_url is not None: if instagram_embed_url is not None:
return self.url_result(instagram_embed_url, InstagramIE.ie_key()) return self.url_result(
self._proto_relative_url(instagram_embed_url), InstagramIE.ie_key())
# Look for LiveLeak embeds
liveleak_url = LiveLeakIE._extract_url(webpage)
if liveleak_url:
return self.url_result(liveleak_url, 'LiveLeak')
def check_video(vurl): def check_video(vurl):
if YoutubeIE.suitable(vurl): if YoutubeIE.suitable(vurl):
@ -2013,6 +2064,7 @@ class GenericIE(InfoExtractor):
entries = [] entries = []
for video_url in found: for video_url in found:
video_url = unescapeHTML(video_url)
video_url = video_url.replace('\\/', '/') video_url = video_url.replace('\\/', '/')
video_url = compat_urlparse.urljoin(url, video_url) video_url = compat_urlparse.urljoin(url, video_url)
video_id = compat_urllib_parse_unquote(os.path.basename(video_url)) video_id = compat_urllib_parse_unquote(os.path.basename(video_url))
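
The last hunk adds an HTML-unescape step before the existing slash cleanup and URL join. A minimal sketch of that three-step normalisation follows, using the standard library's `html.unescape` and `urllib.parse.urljoin` as stand-ins for youtube-dl's `unescapeHTML` and `compat_urlparse.urljoin`. The function name `normalize_candidate_url` and the example URLs are assumptions made for illustration.

```python
from html import unescape
from urllib.parse import urljoin


def normalize_candidate_url(page_url, raw_url):
    """Clean up a URL scraped from page markup or inline JavaScript."""
    # 1. Decode HTML entities such as '&amp;' (the step added by this hunk).
    raw_url = unescape(raw_url)
    # 2. Undo JSON/JavaScript style escaped slashes ('\/' becomes '/').
    raw_url = raw_url.replace('\\/', '/')
    # 3. Resolve relative URLs against the page they were found on.
    return urljoin(page_url, raw_url)


if __name__ == '__main__':
    page = 'http://example.com/page/1'
    print(normalize_candidate_url(page, 'http:\\/\\/cdn.example.com\\/v.mp4?a=1&amp;b=2'))
    print(normalize_candidate_url(page, '/media/clip.mp4'))
```

Without the unescape step, a query string copied from an HTML attribute would keep its literal `&amp;` and the request would carry the wrong parameters.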

View File

@ -2,6 +2,7 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import unified_strdate
class GlideIE(InfoExtractor): class GlideIE(InfoExtractor):
@ -15,26 +16,38 @@ class GlideIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 'Damon Timm\'s Glide message', 'title': 'Damon Timm\'s Glide message',
'thumbnail': 're:^https?://.*?\.cloudfront\.net/.*\.jpg$', 'thumbnail': 're:^https?://.*?\.cloudfront\.net/.*\.jpg$',
'uploader': 'Damon Timm',
'upload_date': '20140919',
} }
} }
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
title = self._html_search_regex( title = self._html_search_regex(
r'<title>(.*?)</title>', webpage, 'title') r'<title>(.+?)</title>', webpage, 'title')
video_url = self.http_scheme() + self._search_regex( video_url = self._proto_relative_url(self._search_regex(
r'<source src="(.*?)" type="video/mp4">', webpage, 'video URL') r'<source[^>]+src=(["\'])(?P<url>.+?)\1',
thumbnail_url = self._search_regex( webpage, 'video URL', default=None,
r'<img id="video-thumbnail" src="(.*?)"', group='url')) or self._og_search_video_url(webpage)
webpage, 'thumbnail url', fatal=False) thumbnail = self._proto_relative_url(self._search_regex(
thumbnail = ( r'<img[^>]+id=["\']video-thumbnail["\'][^>]+src=(["\'])(?P<url>.+?)\1',
thumbnail_url if thumbnail_url is None webpage, 'thumbnail url', default=None,
else self.http_scheme() + thumbnail_url) group='url')) or self._og_search_thumbnail(webpage)
uploader = self._search_regex(
r'<div[^>]+class=["\']info-name["\'][^>]*>([^<]+)',
webpage, 'uploader', fatal=False)
upload_date = unified_strdate(self._search_regex(
r'<div[^>]+class="info-date"[^>]*>([^<]+)',
webpage, 'upload date', fatal=False))
return { return {
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'url': video_url, 'url': video_url,
'thumbnail': thumbnail, 'thumbnail': thumbnail,
'uploader': uploader,
'upload_date': upload_date,
} }
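
The Glide changes replace manual `http_scheme()` concatenation with a quote-agnostic regex plus protocol-relative URL handling. The sketch below shows the idea with plain `re`. The `<source>` pattern is copied from the diff, while `proto_relative_url` and `find_source_url` are hypothetical stand-ins for the extractor's helper methods, with the scheme fixed to `http:` as an assumption.

```python
import re


def proto_relative_url(url, scheme='http:'):
    """Prefix a scheme onto '//host/path' style URLs; pass others through."""
    if url is None:
        return None
    if url.startswith('//'):
        return scheme + url
    return url


def find_source_url(webpage):
    """Return the first <source> URL, accepting single or double quotes."""
    mobj = re.search(r'<source[^>]+src=(["\'])(?P<url>.+?)\1', webpage)
    return proto_relative_url(mobj.group('url') if mobj else None)


if __name__ == '__main__':
    print(find_source_url('<source src="//d.example.net/v.mp4" type="video/mp4">'))
```

Returning `None` when nothing matches is what lets the extractor fall back to the Open Graph value with `or`, as the new code does.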

View File

@ -14,13 +14,13 @@ class GoshgayIE(InfoExtractor):
_VALID_URL = r'https?://www\.goshgay\.com/video(?P<id>\d+?)($|/)' _VALID_URL = r'https?://www\.goshgay\.com/video(?P<id>\d+?)($|/)'
_TEST = { _TEST = {
'url': 'http://www.goshgay.com/video299069/diesel_sfw_xxx_video', 'url': 'http://www.goshgay.com/video299069/diesel_sfw_xxx_video',
'md5': '027fcc54459dff0feb0bc06a7aeda680', 'md5': '4b6db9a0a333142eb9f15913142b0ed1',
'info_dict': { 'info_dict': {
'id': '299069', 'id': '299069',
'ext': 'flv', 'ext': 'flv',
'title': 'DIESEL SFW XXX Video', 'title': 'DIESEL SFW XXX Video',
'thumbnail': 're:^http://.*\.jpg$', 'thumbnail': 're:^http://.*\.jpg$',
'duration': 79, 'duration': 80,
'age_limit': 18, 'age_limit': 18,
} }
} }
@ -47,5 +47,5 @@ class GoshgayIE(InfoExtractor):
'title': title, 'title': title,
'thumbnail': thumbnail, 'thumbnail': thumbnail,
'duration': duration, 'duration': duration,
'age_limit': self._family_friendly_search(webpage), 'age_limit': 18,
} }

View File

@ -2,12 +2,6 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import (
xpath_element,
xpath_text,
int_or_none,
parse_duration,
)
class GPUTechConfIE(InfoExtractor): class GPUTechConfIE(InfoExtractor):
@ -27,29 +21,15 @@ class GPUTechConfIE(InfoExtractor):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
root_path = self._search_regex(r'var\s+rootPath\s*=\s*"([^"]+)', webpage, 'root path', 'http://evt.dispeak.com/nvidia/events/gtc15/') root_path = self._search_regex(
xml_file_id = self._search_regex(r'var\s+xmlFileId\s*=\s*"([^"]+)', webpage, 'xml file id') r'var\s+rootPath\s*=\s*"([^"]+)', webpage, 'root path',
default='http://evt.dispeak.com/nvidia/events/gtc15/')
doc = self._download_xml('%sxml/%s.xml' % (root_path, xml_file_id), video_id) xml_file_id = self._search_regex(
r'var\s+xmlFileId\s*=\s*"([^"]+)', webpage, 'xml file id')
metadata = xpath_element(doc, 'metadata')
http_host = xpath_text(metadata, 'httpHost', 'http host', True)
mbr_videos = xpath_element(metadata, 'MBRVideos')
formats = []
for mbr_video in mbr_videos.findall('MBRVideo'):
stream_name = xpath_text(mbr_video, 'streamName')
if stream_name:
formats.append({
'url': 'http://%s/%s' % (http_host, stream_name.replace('mp4:', '')),
'tbr': int_or_none(xpath_text(mbr_video, 'bitrate')),
})
self._sort_formats(formats)
return { return {
'_type': 'url_transparent',
'id': video_id, 'id': video_id,
'title': xpath_text(metadata, 'title'), 'url': '%sxml/%s.xml' % (root_path, xml_file_id),
'duration': parse_duration(xpath_text(metadata, 'endTime')), 'ie_key': 'DigitallySpeaking',
'creator': xpath_text(metadata, 'speaker'),
'formats': formats,
} }

View File

@ -16,14 +16,14 @@ class GrouponIE(InfoExtractor):
'playlist': [{ 'playlist': [{
'info_dict': { 'info_dict': {
'id': 'tubGNycTo_9Uxg82uESj4i61EYX8nyuf', 'id': 'tubGNycTo_9Uxg82uESj4i61EYX8nyuf',
'ext': 'mp4', 'ext': 'flv',
'title': 'Bikram Yoga Huntington Beach | Orange County', 'title': 'Bikram Yoga Huntington Beach | Orange County',
'description': 'md5:d41d8cd98f00b204e9800998ecf8427e', 'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
'duration': 44.961, 'duration': 44.961,
}, },
}], }],
'params': { 'params': {
'skip_download': 'HLS', 'skip_download': 'HDS',
} }
} }
@ -32,7 +32,7 @@ class GrouponIE(InfoExtractor):
webpage = self._download_webpage(url, playlist_id) webpage = self._download_webpage(url, playlist_id)
payload = self._parse_json(self._search_regex( payload = self._parse_json(self._search_regex(
r'var\s+payload\s*=\s*(.*?);\n', webpage, 'payload'), playlist_id) r'(?:var\s+|window\.)payload\s*=\s*(.*?);\n', webpage, 'payload'), playlist_id)
videos = payload['carousel'].get('dealVideos', []) videos = payload['carousel'].get('dealVideos', [])
entries = [] entries = []
for v in videos: for v in videos:

View File

@ -24,6 +24,7 @@ class HowStuffWorksIE(InfoExtractor):
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
'duration': 161, 'duration': 161,
}, },
'skip': 'Video broken',
}, },
{ {
'url': 'http://adventure.howstuffworks.com/7199-survival-zone-food-and-water-in-the-savanna-video.htm', 'url': 'http://adventure.howstuffworks.com/7199-survival-zone-food-and-water-in-the-savanna-video.htm',

View File

@ -4,6 +4,7 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
determine_ext,
parse_duration, parse_duration,
unified_strdate, unified_strdate,
) )
@ -29,7 +30,12 @@ class HuffPostIE(InfoExtractor):
'description': 'This week on Legalese It, Mike talks to David Bosco about his new book on the ICC, "Rough Justice," he also discusses the Virginia AG\'s historic stance on gay marriage, the execution of Edgar Tamayo, the ICC\'s delay of Kenya\'s President and more. ', 'description': 'This week on Legalese It, Mike talks to David Bosco about his new book on the ICC, "Rough Justice," he also discusses the Virginia AG\'s historic stance on gay marriage, the execution of Edgar Tamayo, the ICC\'s delay of Kenya\'s President and more. ',
'duration': 1549, 'duration': 1549,
'upload_date': '20140124', 'upload_date': '20140124',
} },
'params': {
# m3u8 download
'skip_download': True,
},
'expected_warnings': ['HTTP Error 404: Not Found'],
} }
def _real_extract(self, url): def _real_extract(self, url):
@ -45,7 +51,7 @@ class HuffPostIE(InfoExtractor):
description = data.get('description') description = data.get('description')
thumbnails = [] thumbnails = []
for url in data['images'].values(): for url in filter(None, data['images'].values()):
m = re.match('.*-([0-9]+x[0-9]+)\.', url) m = re.match('.*-([0-9]+x[0-9]+)\.', url)
if not m: if not m:
continue continue
@ -54,13 +60,25 @@ class HuffPostIE(InfoExtractor):
'resolution': m.group(1), 'resolution': m.group(1),
}) })
formats = [{ formats = []
sources = data.get('sources', {})
live_sources = list(sources.get('live', {}).items()) + list(sources.get('live_again', {}).items())
for key, url in live_sources:
ext = determine_ext(url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
url, video_id, ext='mp4', m3u8_id='hls', fatal=False))
elif ext == 'f4m':
formats.extend(self._extract_f4m_formats(
url + '?hdcore=2.9.5', video_id, f4m_id='hds', fatal=False))
else:
formats.append({
'format': key, 'format': key,
'format_id': key.replace('/', '.'), 'format_id': key.replace('/', '.'),
'ext': 'mp4', 'ext': 'mp4',
'url': url, 'url': url,
'vcodec': 'none' if key.startswith('audio/') else None, 'vcodec': 'none' if key.startswith('audio/') else None,
} for key, url in data.get('sources', {}).get('live', {}).items()] })
if not formats and data.get('fivemin_id'): if not formats and data.get('fivemin_id'):
return self.url_result('5min:%s' % data['fivemin_id']) return self.url_result('5min:%s' % data['fivemin_id'])
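
The rewritten format loop merges the `live` and `live_again` source maps and then dispatches on the file extension. A minimal sketch of that dispatch is shown here; `guess_ext` is a rough standard-library approximation of youtube-dl's `determine_ext`, and `classify_source` and `merge_live_sources` are hypothetical names for this illustration.

```python
import os.path
from urllib.parse import urlparse


def guess_ext(url):
    """Approximate the extension of a URL, ignoring its query string."""
    path = urlparse(url).path
    return os.path.splitext(path)[1].lstrip('.').lower()


def classify_source(url):
    """Mirror the branch structure of the new loop: HLS, HDS or plain HTTP."""
    ext = guess_ext(url)
    if ext == 'm3u8':
        return 'hls'
    if ext == 'f4m':
        return 'hds'
    return 'http'


def merge_live_sources(sources):
    """Concatenate the 'live' and 'live_again' (key, url) pairs."""
    return (list(sources.get('live', {}).items()) +
            list(sources.get('live_again', {}).items()))
```

Manifest URLs need further expansion into individual formats, which is why only the final branch appends a format entry directly.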

View File

@ -1,10 +1,10 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re import re
import json
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
mimetype2ext,
qualities, qualities,
) )
@ -12,9 +12,9 @@ from ..utils import (
class ImdbIE(InfoExtractor): class ImdbIE(InfoExtractor):
IE_NAME = 'imdb' IE_NAME = 'imdb'
IE_DESC = 'Internet Movie Database trailers' IE_DESC = 'Internet Movie Database trailers'
_VALID_URL = r'https?://(?:www|m)\.imdb\.com/video/imdb/vi(?P<id>\d+)' _VALID_URL = r'https?://(?:www|m)\.imdb\.com/video/[^/]+/vi(?P<id>\d+)'
_TEST = { _TESTS = [{
'url': 'http://www.imdb.com/video/imdb/vi2524815897', 'url': 'http://www.imdb.com/video/imdb/vi2524815897',
'info_dict': { 'info_dict': {
'id': '2524815897', 'id': '2524815897',
@ -22,7 +22,10 @@ class ImdbIE(InfoExtractor):
'title': 'Ice Age: Continental Drift Trailer (No. 2) - IMDb', 'title': 'Ice Age: Continental Drift Trailer (No. 2) - IMDb',
'description': 'md5:9061c2219254e5d14e03c25c98e96a81', 'description': 'md5:9061c2219254e5d14e03c25c98e96a81',
} }
} }, {
'url': 'http://www.imdb.com/video/_/vi2524815897',
'only_matching': True,
}]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
@ -48,13 +51,27 @@ class ImdbIE(InfoExtractor):
json_data = self._search_regex( json_data = self._search_regex(
r'<script[^>]+class="imdb-player-data"[^>]*?>(.*?)</script>', r'<script[^>]+class="imdb-player-data"[^>]*?>(.*?)</script>',
format_page, 'json data', flags=re.DOTALL) format_page, 'json data', flags=re.DOTALL)
info = json.loads(json_data) info = self._parse_json(json_data, video_id, fatal=False)
format_info = info['videoPlayerObject']['video'] if not info:
f_id = format_info['ffname'] continue
format_info = info.get('videoPlayerObject', {}).get('video', {})
if not format_info:
continue
video_info_list = format_info.get('videoInfoList')
if not video_info_list or not isinstance(video_info_list, list):
continue
video_info = video_info_list[0]
if not video_info or not isinstance(video_info, dict):
continue
video_url = video_info.get('videoUrl')
if not video_url:
continue
format_id = format_info.get('ffname')
formats.append({ formats.append({
'format_id': f_id, 'format_id': format_id,
'url': format_info['videoInfoList'][0]['videoUrl'], 'url': video_url,
'quality': quality(f_id), 'ext': mimetype2ext(video_info.get('videoMimeType')),
'quality': quality(format_id),
}) })
self._sort_formats(formats) self._sort_formats(formats)
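
The IMDb hunk swaps direct indexing for a chain of defensive checks so that one malformed format page is skipped instead of aborting extraction. The sketch below condenses the same checks into one function. The name `first_video_url` is hypothetical, and the sketch adds `isinstance` guards on the outer objects that the diff itself does not have.

```python
def first_video_url(info):
    """Return info['videoPlayerObject']['video']['videoInfoList'][0]['videoUrl'],
    or None if any level is missing or has an unexpected type."""
    if not isinstance(info, dict):
        return None
    player = info.get('videoPlayerObject')
    if not isinstance(player, dict):
        return None
    format_info = player.get('video')
    if not isinstance(format_info, dict):
        return None
    video_info_list = format_info.get('videoInfoList')
    if not video_info_list or not isinstance(video_info_list, list):
        return None
    video_info = video_info_list[0]
    if not video_info or not isinstance(video_info, dict):
        return None
    return video_info.get('videoUrl') or None
```

In the extractor each failed check is a `continue`, so the remaining format pages are still tried.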

View File

@ -12,7 +12,7 @@ from ..utils import (
class InstagramIE(InfoExtractor): class InstagramIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?instagram\.com/p/(?P<id>[^/?#&]+)' _VALID_URL = r'(?P<url>https?://(?:www\.)?instagram\.com/p/(?P<id>[^/?#&]+))'
_TESTS = [{ _TESTS = [{
'url': 'https://instagram.com/p/aye83DjauH/?foo=bar#abc', 'url': 'https://instagram.com/p/aye83DjauH/?foo=bar#abc',
'md5': '0d2da106a9d2631273e192b372806516', 'md5': '0d2da106a9d2631273e192b372806516',
@ -38,10 +38,19 @@ class InstagramIE(InfoExtractor):
}, { }, {
'url': 'https://instagram.com/p/-Cmh1cukG2/', 'url': 'https://instagram.com/p/-Cmh1cukG2/',
'only_matching': True, 'only_matching': True,
}, {
'url': 'http://instagram.com/p/9o6LshA7zy/embed/',
'only_matching': True,
}] }]
@staticmethod @staticmethod
def _extract_embed_url(webpage): def _extract_embed_url(webpage):
mobj = re.search(
r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?instagram\.com/p/[^/]+/embed.*?)\1',
webpage)
if mobj:
return mobj.group('url')
blockquote_el = get_element_by_attribute( blockquote_el = get_element_by_attribute(
'class', 'instagram-media', webpage) 'class', 'instagram-media', webpage)
if blockquote_el is None: if blockquote_el is None:
@ -53,7 +62,9 @@ class InstagramIE(InfoExtractor):
return mobj.group('link') return mobj.group('link')
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
url = mobj.group('url')
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
uploader_id = self._search_regex(r'"owner":{"username":"(.+?)"', uploader_id = self._search_regex(r'"owner":{"username":"(.+?)"',
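
Two regex changes work together in the Instagram hunks: the new iframe pattern finds `/embed` URLs, and the `url` group added to `_VALID_URL` trims them back to the canonical post URL before download. A minimal sketch, with both patterns copied from the diff and the sample HTML, the `http:` scheme prefix, and the helper names being assumptions for illustration:

```python
import re

EMBED_RE = r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?instagram\.com/p/[^/]+/embed.*?)\1'
VALID_URL = r'(?P<url>https?://(?:www\.)?instagram\.com/p/(?P<id>[^/?#&]+))'


def extract_embed_url(webpage):
    """Return the iframe embed URL from the page markup, or None."""
    mobj = re.search(EMBED_RE, webpage)
    return mobj.group('url') if mobj else None


def canonicalize(url):
    """Return (canonical_url, video_id) with any '/embed/...' suffix dropped."""
    mobj = re.match(VALID_URL, url)
    return mobj.group('url'), mobj.group('id')


# Sample markup with a protocol-relative embed URL (illustrative only).
SAMPLE_HTML = '<iframe src="//instagram.com/p/9o6LshA7zy/embed/" width="400"></iframe>'

embed_url = extract_embed_url(SAMPLE_HTML)
# The generic extractor resolves protocol-relative URLs first; 'http:' is assumed here.
absolute_url = 'http:' + embed_url if embed_url.startswith('//') else embed_url
canonical_url, video_id = canonicalize(absolute_url)
```

This also suggests why the generic extractor now wraps the embed URL in `_proto_relative_url`: the iframe pattern accepts `//instagram.com/...` while `_VALID_URL` requires an explicit scheme.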

View File

@ -1,93 +1,91 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_parse_qs,
compat_urlparse, compat_urlparse,
compat_urllib_parse_urlencode,
) )
from ..utils import ( from ..utils import (
xpath_with_ns, determine_ext,
int_or_none,
xpath_text,
) )
class InternetVideoArchiveIE(InfoExtractor): class InternetVideoArchiveIE(InfoExtractor):
_VALID_URL = r'https?://video\.internetvideoarchive\.net/flash/players/.*?\?.*?publishedid.*?' _VALID_URL = r'https?://video\.internetvideoarchive\.net/(?:player|flash/players)/.*?\?.*?publishedid.*?'
_TEST = { _TEST = {
'url': 'http://video.internetvideoarchive.net/flash/players/flashconfiguration.aspx?customerid=69249&publishedid=452693&playerid=247', 'url': 'http://video.internetvideoarchive.net/player/6/configuration.ashx?customerid=69249&publishedid=194487&reporttag=vdbetatitle&playerid=641&autolist=0&domain=www.videodetective.com&maxrate=high&minrate=low&socialplayer=false',
'info_dict': { 'info_dict': {
'id': '452693', 'id': '194487',
'ext': 'mp4', 'ext': 'mp4',
'title': 'SKYFALL', 'title': 'KICK-ASS 2',
'description': 'In SKYFALL, Bond\'s loyalty to M is tested as her past comes back to haunt her. As MI6 comes under attack, 007 must track down and destroy the threat, no matter how personal the cost.', 'description': 'md5:c189d5b7280400630a1d3dd17eaa8d8a',
'duration': 152, },
'params': {
# m3u8 download
'skip_download': True,
}, },
} }
@staticmethod @staticmethod
def _build_url(query): def _build_json_url(query):
return 'http://video.internetvideoarchive.net/flash/players/flashconfiguration.aspx?' + query return 'http://video.internetvideoarchive.net/player/6/configuration.ashx?' + query
@staticmethod @staticmethod
def _clean_query(query): def _build_xml_url(query):
NEEDED_ARGS = ['publishedid', 'customerid'] return 'http://video.internetvideoarchive.net/flash/players/flashconfiguration.aspx?' + query
query_dic = compat_urlparse.parse_qs(query)
cleaned_dic = dict((k, v[0]) for (k, v) in query_dic.items() if k in NEEDED_ARGS)
# Other player ids return m3u8 urls
cleaned_dic['playerid'] = '247'
cleaned_dic['videokbrate'] = '100000'
return compat_urllib_parse_urlencode(cleaned_dic)
def _real_extract(self, url): def _real_extract(self, url):
query = compat_urlparse.urlparse(url).query query = compat_urlparse.urlparse(url).query
query_dic = compat_urlparse.parse_qs(query) query_dic = compat_parse_qs(query)
video_id = query_dic['publishedid'][0] video_id = query_dic['publishedid'][0]
url = self._build_url(query)
flashconfiguration = self._download_xml(url, video_id, if '/player/' in url:
'Downloading flash configuration') configuration = self._download_json(url, video_id)
file_url = flashconfiguration.find('file').text
file_url = file_url.replace('/playlist.aspx', '/mrssplaylist.aspx') # There are multiple videos in the playlist while only the first one
# Replace some of the parameters in the query to get the best quality # matches the video played in browsers
# and http links (no m3u8 manifests) video_info = configuration['playlist'][0]
file_url = re.sub(r'(?<=\?)(.+)$',
lambda m: self._clean_query(m.group()),
file_url)
info = self._download_xml(file_url, video_id,
'Downloading video info')
item = info.find('channel/item')
def _bp(p):
return xpath_with_ns(
p,
{
'media': 'http://search.yahoo.com/mrss/',
'jwplayer': 'http://developer.longtailvideo.com/trac/wiki/FlashFormats',
}
)
formats = [] formats = []
for content in item.findall(_bp('media:group/media:content')): for source in video_info['sources']:
attr = content.attrib file_url = source['file']
f_url = attr['url'] if determine_ext(file_url) == 'm3u8':
width = int(attr['width']) formats.extend(self._extract_m3u8_formats(
bitrate = int(attr['bitrate']) file_url, video_id, ext='mp4', m3u8_id='hls'))
format_id = '%d-%dk' % (width, bitrate) else:
formats.append({ a_format = {
'format_id': format_id, 'url': file_url,
'url': f_url, }
'width': width,
'tbr': bitrate, if source.get('label') and source['label'][-4:] == ' kbs':
tbr = int_or_none(source['label'][:-4])
a_format.update({
'tbr': tbr,
'format_id': 'http-%d' % tbr,
}) })
formats.append(a_format)
self._sort_formats(formats) self._sort_formats(formats)
title = video_info['title']
description = video_info.get('description')
thumbnail = video_info.get('image')
else:
configuration = self._download_xml(url, video_id)
formats = [{
'url': xpath_text(configuration, './file', 'file URL', fatal=True),
}]
thumbnail = xpath_text(configuration, './image', 'thumbnail')
title = 'InternetVideoArchive video %s' % video_id
description = None
return { return {
'id': video_id, 'id': video_id,
'title': item.find('title').text, 'title': title,
'formats': formats, 'formats': formats,
'thumbnail': item.find(_bp('media:thumbnail')).attrib['url'], 'thumbnail': thumbnail,
'description': item.find('description').text, 'description': description,
'duration': int(attr['duration']),
} }
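
In the JSON branch above, the bitrate is recovered from source labels ending in ` kbs`. The sketch below isolates that parsing step. `int_or_none` is a simplified local stand-in for the youtube-dl utility of the same name, and `format_from_source` is a hypothetical helper. Unlike the diff, the sketch only sets `format_id` when the number parses, since `'http-%d' % None` would raise a `TypeError`.

```python
def int_or_none(value):
    """Simplified stand-in: convert to int, returning None on failure."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def format_from_source(source):
    """Build a format dict from one player source entry."""
    a_format = {'url': source['file']}
    label = source.get('label')
    if label and label[-4:] == ' kbs':
        tbr = int_or_none(label[:-4])
        # Guard added in this sketch: skip bitrate fields if parsing failed.
        if tbr is not None:
            a_format.update({
                'tbr': tbr,
                'format_id': 'http-%d' % tbr,
            })
    return a_format
```

Sources without a recognisable label still yield a usable format, just without bitrate metadata.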

View File

@ -165,7 +165,7 @@ class IqiyiIE(InfoExtractor):
IE_NAME = 'iqiyi' IE_NAME = 'iqiyi'
IE_DESC = '爱奇艺' IE_DESC = '爱奇艺'
_VALID_URL = r'https?://(?:[^.]+\.)?iqiyi\.com/.+\.html' _VALID_URL = r'https?://(?:(?:[^.]+\.)?iqiyi\.com|www\.pps\.tv)/.+\.html'
_NETRC_MACHINE = 'iqiyi' _NETRC_MACHINE = 'iqiyi'
@ -273,6 +273,9 @@ class IqiyiIE(InfoExtractor):
'title': '灌篮高手 国语版', 'title': '灌篮高手 国语版',
}, },
'playlist_count': 101, 'playlist_count': 101,
}, {
'url': 'http://www.pps.tv/w_19rrbav0ph.html',
'only_matching': True,
}] }]
_FORMATS_MAP = [ _FORMATS_MAP = [
@ -284,6 +287,13 @@ class IqiyiIE(InfoExtractor):
('10', 'h1'), ('10', 'h1'),
] ]
AUTH_API_ERRORS = {
# No preview available (不允许试看鉴权失败)
'Q00505': 'This video requires a VIP account',
# End of preview time (试看结束鉴权失败)
'Q00506': 'Needs a VIP account for full video',
}
def _real_initialize(self): def _real_initialize(self):
self._login() self._login()
@ -369,14 +379,18 @@ class IqiyiIE(InfoExtractor):
note='Downloading video authentication JSON', note='Downloading video authentication JSON',
errnote='Unable to download video authentication JSON') errnote='Unable to download video authentication JSON')
if auth_result['code'] == 'Q00505': # No preview available (不允许试看鉴权失败) code = auth_result.get('code')
raise ExtractorError('This video requires a VIP account', expected=True) msg = self.AUTH_API_ERRORS.get(code) or auth_result.get('msg') or code
if auth_result['code'] == 'Q00506': # End of preview time (试看结束鉴权失败) if code == 'Q00506':
if do_report_warning: if do_report_warning:
self.report_warning('Needs a VIP account for full video') self.report_warning(msg)
return False return False
if 'data' not in auth_result:
if msg is not None:
raise ExtractorError('%s said: %s' % (self.IE_NAME, msg), expected=True)
raise ExtractorError('Unexpected error from Iqiyi auth API')
return auth_result return auth_result['data']
def construct_video_urls(self, data, video_id, _uuid, tvid): def construct_video_urls(self, data, video_id, _uuid, tvid):
def do_xor(x, y): def do_xor(x, y):
@ -452,11 +466,11 @@ class IqiyiIE(InfoExtractor):
need_vip_warning_report = False need_vip_warning_report = False
break break
param.update({ param.update({
't': auth_result['data']['t'], 't': auth_result['t'],
# cid is hard-coded in com/qiyi/player/core/player/RuntimeData.as # cid is hard-coded in com/qiyi/player/core/player/RuntimeData.as
'cid': 'afbe8fd3d73448c9', 'cid': 'afbe8fd3d73448c9',
'vid': video_id, 'vid': video_id,
'QY00001': auth_result['data']['u'], 'QY00001': auth_result['u'],
}) })
api_video_url += '?' if '?' not in api_video_url else '&' api_video_url += '?' if '?' not in api_video_url else '&'
api_video_url += compat_urllib_parse_urlencode(param) api_video_url += compat_urllib_parse_urlencode(param)
@ -491,7 +505,10 @@ class IqiyiIE(InfoExtractor):
'enc': md5_text(enc_key + tail), 'enc': md5_text(enc_key + tail),
'qyid': _uuid, 'qyid': _uuid,
'tn': random.random(), 'tn': random.random(),
'um': 0, # In iQiyi's flash player, um is set to 1 if there's a logged-in user.
# Some 1080P formats are only available to a logged-in user,
# so force um=1 here to trick the iQiyi server
'um': 1,
'authkey': md5_text(md5_text('') + tail), 'authkey': md5_text(md5_text('') + tail),
'k_tag': 1, 'k_tag': 1,
} }
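Review note: the auth handling in this hunk resolves the message in three steps: the local error table, then the server-supplied `msg`, then the raw code. A minimal standalone sketch of that lookup, assuming the response is a plain dict; `resolve_auth_message` is a hypothetical helper and not part of the extractor.

```python
# Error table copied from the hunk above.
AUTH_API_ERRORS = {
    'Q00505': 'This video requires a VIP account',
    'Q00506': 'Needs a VIP account for full video',
}


def resolve_auth_message(auth_result):
    """Hypothetical helper: return (code, message) for an auth response dict."""
    code = auth_result.get('code')
    # Prefer the local table, then the server's own message, then the bare code.
    msg = AUTH_API_ERRORS.get(code) or auth_result.get('msg') or code
    return code, msg
```

With this ordering an unknown code still yields a readable message whenever the server sends one, and the message is `None` only when the response carries neither a code nor a `msg`.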

View File

@ -29,7 +29,7 @@ class IzleseneIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 'Sevinçten Çıldırtan Doğum Günü Hediyesi', 'title': 'Sevinçten Çıldırtan Doğum Günü Hediyesi',
'description': 'md5:253753e2655dde93f59f74b572454f6d', 'description': 'md5:253753e2655dde93f59f74b572454f6d',
'thumbnail': 're:^http://.*\.jpg', 'thumbnail': 're:^https?://.*\.jpg',
'uploader_id': 'pelikzzle', 'uploader_id': 'pelikzzle',
'timestamp': int, 'timestamp': int,
'upload_date': '20140702', 'upload_date': '20140702',
@ -44,8 +44,7 @@ class IzleseneIE(InfoExtractor):
'id': '17997', 'id': '17997',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Tarkan Dortmund 2006 Konseri', 'title': 'Tarkan Dortmund 2006 Konseri',
'description': 'Tarkan Dortmund 2006 Konseri', 'thumbnail': 're:^https://.*\.jpg',
'thumbnail': 're:^http://.*\.jpg',
'uploader_id': 'parlayankiz', 'uploader_id': 'parlayankiz',
'timestamp': int, 'timestamp': int,
'upload_date': '20061112', 'upload_date': '20061112',
@ -62,7 +61,7 @@ class IzleseneIE(InfoExtractor):
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
title = self._og_search_title(webpage) title = self._og_search_title(webpage)
description = self._og_search_description(webpage) description = self._og_search_description(webpage, default=None)
thumbnail = self._proto_relative_url( thumbnail = self._proto_relative_url(
self._og_search_thumbnail(webpage), scheme='http:') self._og_search_thumbnail(webpage), scheme='http:')

View File

@ -1,47 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from .youtube import YoutubeIE
class JadoreCettePubIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?jadorecettepub\.com/[0-9]{4}/[0-9]{2}/(?P<id>.*?)\.html'
_TEST = {
'url': 'http://www.jadorecettepub.com/2010/12/star-wars-massacre-par-les-japonais.html',
'md5': '401286a06067c70b44076044b66515de',
'info_dict': {
'id': 'jLMja3tr7a4',
'ext': 'mp4',
'title': 'La pire utilisation de Star Wars',
'description': "Jadorecettepub.com vous a gratifié de plusieurs pubs géniales utilisant Star Wars et Dark Vador plus particulièrement... Mais l'heure est venue de vous proposer une version totalement massacrée, venue du Japon. Quand les Japonais détruisent l'image de Star Wars pour vendre du thon en boite, ça promet...",
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('id')
webpage = self._download_webpage(url, display_id)
title = self._html_search_regex(
r'<span style="font-size: x-large;"><b>(.*?)</b></span>',
webpage, 'title')
description = self._html_search_regex(
r'(?s)<div id="fb-root">(.*?)<script>', webpage, 'description',
fatal=False)
real_url = self._search_regex(
r'\[/postlink\](.*)endofvid', webpage, 'video URL')
video_id = YoutubeIE.extract_id(real_url)
return {
'_type': 'url_transparent',
'url': real_url,
'id': video_id,
'title': title,
'description': description,
}

View File

@ -4,16 +4,15 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import int_or_none from ..utils import (
float_or_none,
int_or_none,
)
class JWPlatformBaseIE(InfoExtractor): class JWPlatformBaseIE(InfoExtractor):
def _parse_jwplayer_data(self, jwplayer_data, video_id, require_title=True): def _parse_jwplayer_data(self, jwplayer_data, video_id, require_title=True):
video_data = jwplayer_data['playlist'][0] video_data = jwplayer_data['playlist'][0]
subtitles = {}
for track in video_data['tracks']:
if track['kind'] == 'captions':
subtitles[track['label']] = [{'url': self._proto_relative_url(track['file'])}]
formats = [] formats = []
for source in video_data['sources']: for source in video_data['sources']:
@ -35,12 +34,22 @@ class JWPlatformBaseIE(InfoExtractor):
}) })
self._sort_formats(formats) self._sort_formats(formats)
subtitles = {}
tracks = video_data.get('tracks')
if tracks and isinstance(tracks, list):
for track in tracks:
if track.get('file') and track.get('kind') == 'captions':
subtitles.setdefault(track.get('label') or 'en', []).append({
'url': self._proto_relative_url(track['file'])
})
return { return {
'id': video_id, 'id': video_id,
'title': video_data['title'] if require_title else video_data.get('title'), 'title': video_data['title'] if require_title else video_data.get('title'),
'description': video_data.get('description'), 'description': video_data.get('description'),
'thumbnail': self._proto_relative_url(video_data.get('image')), 'thumbnail': self._proto_relative_url(video_data.get('image')),
'timestamp': int_or_none(video_data.get('pubdate')), 'timestamp': int_or_none(video_data.get('pubdate')),
'duration': float_or_none(jwplayer_data.get('duration')),
'subtitles': subtitles, 'subtitles': subtitles,
'formats': formats, 'formats': formats,
} }
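Review note: the reworked subtitle loop tolerates a missing or malformed `tracks` value and groups several caption files under one label. A minimal sketch of the same grouping, assuming plain dict input; `collect_captions` is a hypothetical name, and the sketch leaves out the `_proto_relative_url` call that the extractor applies to each URL.

```python
def collect_captions(tracks, default_lang='en'):
    """Hypothetical helper mirroring the subtitle loop in the hunk above."""
    subtitles = {}
    if not tracks or not isinstance(tracks, list):
        return subtitles
    for track in tracks:
        # Skip tracks without a file, and non-caption kinds such as thumbnails.
        if track.get('file') and track.get('kind') == 'captions':
            label = track.get('label') or default_lang
            # setdefault keeps earlier entries for the same label
            # instead of overwriting them.
            subtitles.setdefault(label, []).append({'url': track['file']})
    return subtitles
```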

View File

@ -2,39 +2,63 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urllib_parse_unquote_plus
from ..utils import (
js_to_json,
)
class KaraoketvIE(InfoExtractor): class KaraoketvIE(InfoExtractor):
_VALID_URL = r'https?://karaoketv\.co\.il/\?container=songs&id=(?P<id>[0-9]+)' _VALID_URL = r'http://www\.karaoketv\.co\.il/[^/]+/(?P<id>\d+)'
_TEST = { _TEST = {
'url': 'http://karaoketv.co.il/?container=songs&id=171568', 'url': 'http://www.karaoketv.co.il/%D7%A9%D7%99%D7%A8%D7%99_%D7%A7%D7%A8%D7%99%D7%95%D7%A7%D7%99/58356/%D7%90%D7%99%D7%96%D7%95%D7%9F',
'info_dict': { 'info_dict': {
'id': '171568', 'id': '58356',
'ext': 'mp4', 'ext': 'flv',
'title': 'אל העולם שלך - רותם כהן - שרים קריוקי', 'title': 'קריוקי של איזון',
},
'params': {
# rtmp download
'skip_download': True,
} }
} }
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
api_page_url = self._search_regex(
r'<iframe[^>]+src=(["\'])(?P<url>https?://www\.karaoke\.co\.il/api_play\.php\?.+?)\1',
webpage, 'API play URL', group='url')
page_video_url = self._og_search_video_url(webpage, video_id) api_page = self._download_webpage(api_page_url, video_id)
config_json = compat_urllib_parse_unquote_plus(self._search_regex( video_cdn_url = self._search_regex(
r'config=(.*)', page_video_url, 'configuration')) r'<iframe[^>]+src=(["\'])(?P<url>https?://www\.video-cdn\.com/embed/iframe/.+?)\1',
api_page, 'video cdn URL', group='url')
urls_info_json = self._download_json( video_cdn = self._download_webpage(video_cdn_url, video_id)
config_json, video_id, 'Downloading configuration', play_path = self._parse_json(
transform_source=js_to_json) self._search_regex(
r'var\s+options\s*=\s*({.+?});', video_cdn, 'options'),
video_id)['clip']['url']
url = urls_info_json['playlist'][0]['url'] settings = self._parse_json(
self._search_regex(
r'var\s+settings\s*=\s*({.+?});', video_cdn, 'servers', default='{}'),
video_id, fatal=False) or {}
servers = settings.get('servers')
if not servers or not isinstance(servers, list):
servers = ('wowzail.video-cdn.com:80/vodcdn', )
formats = [{
'url': 'rtmp://%s' % server if not server.startswith('rtmp') else server,
'play_path': play_path,
'app': 'vodcdn',
'page_url': video_cdn_url,
'player_url': 'http://www.video-cdn.com/assets/flowplayer/flowplayer.commercial-3.2.18.swf',
'rtmp_real_time': True,
'ext': 'flv',
} for server in servers]
return { return {
'id': video_id, 'id': video_id,
'title': self._og_search_title(webpage), 'title': self._og_search_title(webpage),
'url': url, 'formats': formats,
} }
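Review note: the new extractor falls back to a hard-coded server when the page settings give no usable list, and prefixes `rtmp://` only where it is missing. A minimal sketch of that normalisation, assuming `settings` is an already-parsed dict; `build_rtmp_urls` and `DEFAULT_SERVERS` are hypothetical names.

```python
# Default server copied from the hunk above.
DEFAULT_SERVERS = ('wowzail.video-cdn.com:80/vodcdn',)


def build_rtmp_urls(settings):
    """Hypothetical helper: normalise the server list to full rtmp:// URLs."""
    servers = settings.get('servers')
    # Anything other than a non-empty list falls back to the default.
    if not servers or not isinstance(servers, list):
        servers = DEFAULT_SERVERS
    return [
        server if server.startswith('rtmp') else 'rtmp://%s' % server
        for server in servers]
```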

View File

@ -52,9 +52,12 @@ class KarriereVideosIE(InfoExtractor):
video_id = self._search_regex( video_id = self._search_regex(
r'/config/video/(.+?)\.xml', webpage, 'video id') r'/config/video/(.+?)\.xml', webpage, 'video id')
# Server returns malformed headers
# Force Accept-Encoding: * to prevent gzipped results
playlist = self._download_xml( playlist = self._download_xml(
'http://www.karrierevideos.at/player-playlist.xml.php?p=%s' % video_id, 'http://www.karrierevideos.at/player-playlist.xml.php?p=%s' % video_id,
video_id, transform_source=fix_xml_ampersands) video_id, transform_source=fix_xml_ampersands,
headers={'Accept-Encoding': '*'})
NS_MAP = { NS_MAP = {
'jwplayer': 'http://developer.longtailvideo.com/trac/wiki/FlashFormats' 'jwplayer': 'http://developer.longtailvideo.com/trac/wiki/FlashFormats'

View File

@ -81,7 +81,7 @@ class KuwoIE(KuwoBaseIE):
'id': '6446136', 'id': '6446136',
'ext': 'mp3', 'ext': 'mp3',
'title': '', 'title': '',
'description': 'md5:b2ab6295d014005bfc607525bfc1e38a', 'description': 'md5:5d0e947b242c35dc0eb1d2fce9fbf02c',
'creator': 'IU', 'creator': 'IU',
'upload_date': '20150518', 'upload_date': '20150518',
}, },
@ -102,10 +102,10 @@ class KuwoIE(KuwoBaseIE):
raise ExtractorError('this song has been offline because of copyright issues', expected=True) raise ExtractorError('this song has been offline because of copyright issues', expected=True)
song_name = self._html_search_regex( song_name = self._html_search_regex(
r'(?s)class="(?:[^"\s]+\s+)*title(?:\s+[^"\s]+)*".*?<h1[^>]+title="([^"]+)"', webpage, 'song name') r'<p[^>]+id="lrcName">([^<]+)</p>', webpage, 'song name')
singer_name = self._html_search_regex( singer_name = remove_start(self._html_search_regex(
r'<div[^>]+class="s_img">\s*<a[^>]+title="([^>]+)"', r'<a[^>]+href="http://www\.kuwo\.cn/artist/content\?name=([^"]+)">',
webpage, 'singer name', fatal=False) webpage, 'singer name', fatal=False), '歌手')
lrc_content = clean_html(get_element_by_id('lrcContent', webpage)) lrc_content = clean_html(get_element_by_id('lrcContent', webpage))
if lrc_content == '暂无': # indicates no lyrics if lrc_content == '暂无': # indicates no lyrics
lrc_content = None lrc_content = None
@ -114,7 +114,7 @@ class KuwoIE(KuwoBaseIE):
self._sort_formats(formats) self._sort_formats(formats)
album_id = self._html_search_regex( album_id = self._html_search_regex(
r'<p[^>]+class="album"[^<]+<a[^>]+href="http://www\.kuwo\.cn/album/(\d+)/"', r'<a[^>]+href="http://www\.kuwo\.cn/album/(\d+)/"',
webpage, 'album id', fatal=False) webpage, 'album id', fatal=False)
publish_time = None publish_time = None
@ -268,7 +268,7 @@ class KuwoCategoryIE(InfoExtractor):
'title': '八十年代精选', 'title': '八十年代精选',
'description': '这些都是属于八十年代的回忆!', 'description': '这些都是属于八十年代的回忆!',
}, },
'playlist_count': 30, 'playlist_mincount': 24,
} }
def _real_extract(self, url): def _real_extract(self, url):
@ -283,6 +283,8 @@ class KuwoCategoryIE(InfoExtractor):
category_desc = remove_start( category_desc = remove_start(
get_element_by_id('intro', webpage).strip(), get_element_by_id('intro', webpage).strip(),
'%s简介:' % category_name) '%s简介:' % category_name)
if category_desc == '暂无':
category_desc = None
jsonm = self._parse_json(self._html_search_regex( jsonm = self._parse_json(self._html_search_regex(
r'var\s+jsonm\s*=\s*([^;]+);', webpage, 'category songs'), category_id) r'var\s+jsonm\s*=\s*([^;]+);', webpage, 'category songs'), category_id)

View File

@ -63,6 +63,7 @@ class Laola1TvIE(InfoExtractor):
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
'skip': 'This live stream has already finished.',
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -74,6 +75,9 @@ class Laola1TvIE(InfoExtractor):
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
if 'Dieser Livestream ist bereits beendet.' in webpage:
raise ExtractorError('This live stream has already finished.', expected=True)
iframe_url = self._search_regex( iframe_url = self._search_regex(
r'<iframe[^>]*?id="videoplayer"[^>]*?src="([^"]+)"', r'<iframe[^>]*?id="videoplayer"[^>]*?src="([^"]+)"',
webpage, 'iframe url') webpage, 'iframe url')

View File

@ -6,6 +6,7 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
determine_ext, determine_ext,
determine_protocol,
parse_duration, parse_duration,
int_or_none, int_or_none,
) )
@ -18,10 +19,14 @@ class Lecture2GoIE(InfoExtractor):
'md5': 'ac02b570883020d208d405d5a3fd2f7f', 'md5': 'ac02b570883020d208d405d5a3fd2f7f',
'info_dict': { 'info_dict': {
'id': '17473', 'id': '17473',
'ext': 'flv', 'ext': 'mp4',
'title': '2 - Endliche Automaten und reguläre Sprachen', 'title': '2 - Endliche Automaten und reguläre Sprachen',
'creator': 'Frank Heitmann', 'creator': 'Frank Heitmann',
'duration': 5220, 'duration': 5220,
},
'params': {
# m3u8 download
'skip_download': True,
} }
} }
@ -32,14 +37,18 @@ class Lecture2GoIE(InfoExtractor):
title = self._html_search_regex(r'<em[^>]+class="title">(.+)</em>', webpage, 'title') title = self._html_search_regex(r'<em[^>]+class="title">(.+)</em>', webpage, 'title')
formats = [] formats = []
for url in set(re.findall(r'"src","([^"]+)"', webpage)): for url in set(re.findall(r'var\s+playerUri\d+\s*=\s*"([^"]+)"', webpage)):
ext = determine_ext(url) ext = determine_ext(url)
protocol = determine_protocol({'url': url})
if ext == 'f4m': if ext == 'f4m':
formats.extend(self._extract_f4m_formats(url, video_id)) formats.extend(self._extract_f4m_formats(url, video_id, f4m_id='hds'))
elif ext == 'm3u8': elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(url, video_id)) formats.extend(self._extract_m3u8_formats(url, video_id, ext='mp4', m3u8_id='hls'))
else: else:
if protocol == 'rtmp':
continue # XXX: currently broken
formats.append({ formats.append({
'format_id': protocol,
'url': url, 'url': url,
}) })
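Review note: the loop above dispatches on file extension first and on protocol second. The sketch below is a simplified stand-in that assumes Python 3 and uses only the standard library rather than `determine_ext` and `determine_protocol`; `classify_source` is a hypothetical name, and its return values mirror the format ids used in the hunk.

```python
import os.path
from urllib.parse import urlparse


def classify_source(url):
    """Hypothetical, simplified stand-in for the ext/protocol dispatch above."""
    parsed = urlparse(url)
    # Take the extension from the path only, so query strings are ignored.
    ext = os.path.splitext(parsed.path)[1].lstrip('.').lower()
    if ext == 'f4m':
        return 'hds'
    if ext == 'm3u8':
        return 'hls'
    if parsed.scheme.startswith('rtmp'):
        return None  # skipped, as in the hunk (marked "currently broken")
    return parsed.scheme
```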

View File

@ -0,0 +1,137 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none,
smuggle_url,
unsmuggle_url,
)
class LiTVIE(InfoExtractor):
_VALID_URL = r'https?://www\.litv\.tv/vod/[^/]+/content\.do\?.*?\bid=(?P<id>[^&]+)'
_URL_TEMPLATE = 'https://www.litv.tv/vod/%s/content.do?id=%s'
_TESTS = [{
'url': 'https://www.litv.tv/vod/drama/content.do?brc_id=root&id=VOD00041610&isUHEnabled=true&autoPlay=1',
'info_dict': {
'id': 'VOD00041606',
'title': '花千骨',
},
'playlist_count': 50,
}, {
'url': 'https://www.litv.tv/vod/drama/content.do?brc_id=root&id=VOD00041610&isUHEnabled=true&autoPlay=1',
'info_dict': {
'id': 'VOD00041610',
'ext': 'mp4',
'title': '花千骨第1集',
'thumbnail': 're:https?://.*\.jpg$',
'description': 'md5:c7017aa144c87467c4fb2909c4b05d6f',
'episode_number': 1,
},
'params': {
'noplaylist': True,
'skip_download': True, # m3u8 download
},
'skip': 'Georestricted to Taiwan',
}]
def _extract_playlist(self, season_list, video_id, vod_data, view_data, prompt=True):
episode_title = view_data['title']
content_id = season_list['contentId']
if prompt:
self.to_screen('Downloading playlist %s - add --no-playlist to just download video %s' % (content_id, video_id))
all_episodes = [
self.url_result(smuggle_url(
self._URL_TEMPLATE % (view_data['contentType'], episode['contentId']),
{'force_noplaylist': True})) # To prevent infinite recursion
for episode in season_list['episode']]
return self.playlist_result(all_episodes, content_id, episode_title)
def _real_extract(self, url):
url, data = unsmuggle_url(url, {})
video_id = self._match_id(url)
noplaylist = self._downloader.params.get('noplaylist')
noplaylist_prompt = True
if 'force_noplaylist' in data:
noplaylist = data['force_noplaylist']
noplaylist_prompt = False
webpage = self._download_webpage(url, video_id)
view_data = dict(map(lambda t: (t[0], t[2]), re.findall(
r'viewData\.([a-zA-Z]+)\s*=\s*(["\'])([^"\']+)\2',
webpage)))
vod_data = self._parse_json(self._search_regex(
r'var\s+vod\s*=\s*([^;]+)', webpage, 'VOD data', default='{}'),
video_id)
season_list = list(vod_data.get('seasonList', {}).values())
if season_list:
if not noplaylist:
return self._extract_playlist(
season_list[0], video_id, vod_data, view_data,
prompt=noplaylist_prompt)
if noplaylist_prompt:
self.to_screen('Downloading just video %s because of --no-playlist' % video_id)
# In browsers the `getMainUrl` request is always issued. Usually this
# endpoint gives the same result as the data embedded in the webpage.
# If georestricted, there is no embedded data, so an extra request is
# necessary to get the error code
video_data = self._parse_json(self._search_regex(
r'uiHlsUrl\s*=\s*testBackendData\(([^;]+)\);',
webpage, 'video data', default='{}'), video_id)
if not video_data:
payload = {
'assetId': view_data['assetId'],
'watchDevices': vod_data['watchDevices'],
'contentType': view_data['contentType'],
}
video_data = self._download_json(
'https://www.litv.tv/vod/getMainUrl', video_id,
data=json.dumps(payload).encode('utf-8'),
headers={'Content-Type': 'application/json'})
if not video_data.get('fullpath'):
error_msg = video_data.get('errorMessage')
if error_msg == 'vod.error.outsideregionerror':
self.raise_geo_restricted('This video is available in Taiwan only')
if error_msg:
raise ExtractorError('%s said: %s' % (self.IE_NAME, error_msg), expected=True)
raise ExtractorError('Unexpected result from %s' % self.IE_NAME)
formats = self._extract_m3u8_formats(
video_data['fullpath'], video_id, ext='mp4', m3u8_id='hls')
for a_format in formats:
# LiTV HLS segments don't like compression
a_format.setdefault('http_headers', {})['Youtubedl-no-compression'] = True
title = view_data['title'] + view_data.get('secondaryMark', '')
description = view_data.get('description')
thumbnail = view_data.get('imageFile')
categories = [item['name'] for item in vod_data.get('category', [])]
episode = int_or_none(view_data.get('episode'))
return {
'id': video_id,
'formats': formats,
'title': title,
'description': description,
'thumbnail': thumbnail,
'categories': categories,
'episode_number': episode,
}
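Review note: `view_data` is built by scanning the page for `viewData.key = "value"` assignments, with a backreference ensuring the closing quote matches the opening one. A minimal standalone sketch of that parse; `parse_view_data` is a hypothetical name and the sample page in the usage note is invented for illustration.

```python
import re

# Pattern copied from the hunk above.
VIEW_DATA_RE = r'viewData\.([a-zA-Z]+)\s*=\s*(["\'])([^"\']+)\2'


def parse_view_data(webpage):
    """Collect `viewData.key = "value"` assignments into a dict."""
    # Each match is (key, quote, value); the quote group only exists so that
    # the \2 backreference can require a matching closing quote.
    return dict(
        (key, value) for key, _quote, value in re.findall(VIEW_DATA_RE, webpage))
```

For a page containing `viewData.title = "abc";` the result maps `title` to `abc`. An assignment whose quotes do not match is skipped, and so is an empty value, because the value group requires at least one character.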

View File

@ -17,7 +17,8 @@ class LiveLeakIE(InfoExtractor):
'ext': 'flv', 'ext': 'flv',
'description': 'extremely bad day for this guy..!', 'description': 'extremely bad day for this guy..!',
'uploader': 'ljfriel2', 'uploader': 'ljfriel2',
'title': 'Most unlucky car accident' 'title': 'Most unlucky car accident',
'thumbnail': 're:^https?://.*\.jpg$'
} }
}, { }, {
'url': 'http://www.liveleak.com/view?i=f93_1390833151', 'url': 'http://www.liveleak.com/view?i=f93_1390833151',
@ -28,6 +29,7 @@ class LiveLeakIE(InfoExtractor):
'description': 'German Television Channel NDR does an exclusive interview with Edward Snowden.\r\nUploaded on LiveLeak cause German Television thinks the rest of the world isn\'t intereseted in Edward Snowden.', 'description': 'German Television Channel NDR does an exclusive interview with Edward Snowden.\r\nUploaded on LiveLeak cause German Television thinks the rest of the world isn\'t intereseted in Edward Snowden.',
'uploader': 'ARD_Stinkt', 'uploader': 'ARD_Stinkt',
'title': 'German Television does first Edward Snowden Interview (ENGLISH)', 'title': 'German Television does first Edward Snowden Interview (ENGLISH)',
'thumbnail': 're:^https?://.*\.jpg$'
} }
}, { }, {
'url': 'http://www.liveleak.com/view?i=4f7_1392687779', 'url': 'http://www.liveleak.com/view?i=4f7_1392687779',
@ -49,10 +51,19 @@ class LiveLeakIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'description': 'Happened on 27.7.2014. \r\nAt 0:53 you can see people still swimming at near beach.', 'description': 'Happened on 27.7.2014. \r\nAt 0:53 you can see people still swimming at near beach.',
'uploader': 'bony333', 'uploader': 'bony333',
'title': 'Crazy Hungarian tourist films close call waterspout in Croatia' 'title': 'Crazy Hungarian tourist films close call waterspout in Croatia',
'thumbnail': 're:^https?://.*\.jpg$'
} }
}] }]
@staticmethod
def _extract_url(webpage):
mobj = re.search(
r'<iframe[^>]+src="https?://(?:\w+\.)?liveleak\.com/ll_embed\?(?:.*?)i=(?P<id>[\w_]+)(?:.*)',
webpage)
if mobj:
return 'http://www.liveleak.com/view?i=%s' % mobj.group('id')
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
@ -64,6 +75,7 @@ class LiveLeakIE(InfoExtractor):
age_limit = int_or_none(self._search_regex( age_limit = int_or_none(self._search_regex(
r'you confirm that you are ([0-9]+) years and over.', r'you confirm that you are ([0-9]+) years and over.',
webpage, 'age limit', default=None)) webpage, 'age limit', default=None))
video_thumbnail = self._og_search_thumbnail(webpage)
sources_raw = self._search_regex( sources_raw = self._search_regex(
r'(?s)sources:\s*(\[.*?\]),', webpage, 'video URLs', default=None) r'(?s)sources:\s*(\[.*?\]),', webpage, 'video URLs', default=None)
@ -116,4 +128,5 @@ class LiveLeakIE(InfoExtractor):
'uploader': video_uploader, 'uploader': video_uploader,
'formats': formats, 'formats': formats,
'age_limit': age_limit, 'age_limit': age_limit,
'thumbnail': video_thumbnail,
} }
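Review note: the new `_extract_url` static method turns an embedded `ll_embed` iframe into a canonical view URL. A minimal standalone sketch using the same pattern; `extract_liveleak_url` is a hypothetical free function and the iframe markup in the usage note is invented for illustration.

```python
import re

# Pattern copied from the hunk above.
EMBED_RE = (
    r'<iframe[^>]+src="https?://(?:\w+\.)?liveleak\.com/ll_embed\?'
    r'(?:.*?)i=(?P<id>[\w_]+)(?:.*)')


def extract_liveleak_url(webpage):
    """Return the canonical view URL for the first embed found, else None."""
    mobj = re.search(EMBED_RE, webpage)
    if mobj:
        return 'http://www.liveleak.com/view?i=%s' % mobj.group('id')
    return None
```

Given `<iframe width="640" src="http://www.liveleak.com/ll_embed?f=abc&i=f93_1390833151"></iframe>`, the lazy `.*?` skips the `f=abc&` prefix and the id group captures `f93_1390833151`.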

View File

@ -1,46 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_urllib_parse_unquote
class MalemotionIE(InfoExtractor):
_VALID_URL = r'https?://malemotion\.com/video/(.+?)\.(?P<id>.+?)(#|$)'
_TEST = {
'url': 'http://malemotion.com/video/bete-de-concours.ltc',
'md5': '3013e53a0afbde2878bc39998c33e8a5',
'info_dict': {
'id': 'ltc',
'ext': 'mp4',
'title': 'Bête de Concours',
'age_limit': 18,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_url = compat_urllib_parse_unquote(self._search_regex(
r'<source type="video/mp4" src="(.+?)"', webpage, 'video URL'))
video_title = self._html_search_regex(
r'<title>(.*?)</title', webpage, 'title')
video_thumbnail = self._search_regex(
r'<video .+?poster="(.+?)"', webpage, 'thumbnail', fatal=False)
formats = [{
'url': video_url,
'ext': 'mp4',
'format_id': 'mp4',
'preference': 1,
}]
self._sort_formats(formats)
return {
'id': video_id,
'formats': formats,
'title': video_title,
'thumbnail': video_thumbnail,
'age_limit': 18,
}

View File

@ -49,8 +49,8 @@ class MDRIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 'Beutolomäus und der geheime Weihnachtswunsch', 'title': 'Beutolomäus und der geheime Weihnachtswunsch',
'description': 'md5:b69d32d7b2c55cbe86945ab309d39bbd', 'description': 'md5:b69d32d7b2c55cbe86945ab309d39bbd',
'timestamp': 1419047100, 'timestamp': 1450950000,
'upload_date': '20141220', 'upload_date': '20151224',
'duration': 4628, 'duration': 4628,
'uploader': 'KIKA', 'uploader': 'KIKA',
}, },
@ -71,8 +71,8 @@ class MDRIE(InfoExtractor):
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
data_url = self._search_regex( data_url = self._search_regex(
r'(?:dataURL|playerXml(?:["\'])?)\s*:\s*(["\'])(?P<url>\\?/.+/(?:video|audio)-?[0-9]+-avCustom\.xml)\1', r'(?:dataURL|playerXml(?:["\'])?)\s*:\s*(["\'])(?P<url>.+/(?:video|audio)-?[0-9]+-avCustom\.xml)\1',
webpage, 'data url', default=None, group='url').replace('\/', '/') webpage, 'data url', group='url').replace('\/', '/')
doc = self._download_xml( doc = self._download_xml(
compat_urlparse.urljoin(url, data_url), video_id) compat_urlparse.urljoin(url, data_url), video_id)

View File

@ -81,6 +81,9 @@ class MetacafeIE(InfoExtractor):
'title': 'Open: This is Face the Nation, February 9', 'title': 'Open: This is Face the Nation, February 9',
'description': 'md5:8a9ceec26d1f7ed6eab610834cc1a476', 'description': 'md5:8a9ceec26d1f7ed6eab610834cc1a476',
'duration': 96, 'duration': 96,
'uploader': 'CBSI-NEW',
'upload_date': '20140209',
'timestamp': 1391959800,
}, },
'params': { 'params': {
# rtmp download # rtmp download

View File

@ -11,7 +11,7 @@ from ..utils import (
class MetacriticIE(InfoExtractor): class MetacriticIE(InfoExtractor):
_VALID_URL = r'https?://www\.metacritic\.com/.+?/trailers/(?P<id>\d+)' _VALID_URL = r'https?://www\.metacritic\.com/.+?/trailers/(?P<id>\d+)'
_TEST = { _TESTS = [{
'url': 'http://www.metacritic.com/game/playstation-4/infamous-second-son/trailers/3698222', 'url': 'http://www.metacritic.com/game/playstation-4/infamous-second-son/trailers/3698222',
'info_dict': { 'info_dict': {
'id': '3698222', 'id': '3698222',
@ -20,7 +20,17 @@ class MetacriticIE(InfoExtractor):
'description': 'Take a peak behind-the-scenes to see how Sucker Punch brings smoke into the universe of inFAMOUS Second Son on the PS4.', 'description': 'Take a peak behind-the-scenes to see how Sucker Punch brings smoke into the universe of inFAMOUS Second Son on the PS4.',
'duration': 221, 'duration': 221,
}, },
} 'skip': 'Not providing trailers anymore',
}, {
'url': 'http://www.metacritic.com/game/playstation-4/tales-from-the-borderlands-a-telltale-game-series/trailers/5740315',
'info_dict': {
'id': '5740315',
'ext': 'mp4',
'title': 'Tales from the Borderlands - Finale: The Vault of the Traveler',
'description': 'In the final episode of the season, all hell breaks loose. Jack is now in control of Helios\' systems, and he\'s ready to reclaim his rightful place as king of Hyperion (with or without you).',
'duration': 114,
},
}]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)

View File

@ -0,0 +1,64 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import int_or_none
class MGTVIE(InfoExtractor):
_VALID_URL = r'https?://www\.mgtv\.com/v/(?:[^/]+/)*(?P<id>\d+)\.html'
IE_DESC = '芒果TV'
_TEST = {
'url': 'http://www.mgtv.com/v/1/290525/f/3116640.html',
'md5': '1bdadcf760a0b90946ca68ee9a2db41a',
'info_dict': {
'id': '3116640',
'ext': 'mp4',
'title': '我是歌手第四季双年巅峰会:韩红李玟“双王”领军对抗',
'description': '我是歌手第四季双年巅峰会',
'duration': 7461,
'thumbnail': 're:^https?://.*\.jpg$',
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
api_data = self._download_json(
'http://v.api.mgtv.com/player/video', video_id,
query={'video_id': video_id})['data']
info = api_data['info']
formats = []
for idx, stream in enumerate(api_data['stream']):
stream_url = stream.get('url')
if not stream_url:
continue
tbr = int_or_none(self._search_regex(
r'(\d+)\.mp4', stream_url, 'tbr', default=None))
def extract_format(stream_url, format_id, idx, query={}):
format_info = self._download_json(
stream_url, video_id,
note='Download video info for format %s' % (format_id or '#%d' % idx), query=query)
return {
'format_id': format_id,
'url': format_info['info'],
'ext': 'mp4',
'tbr': tbr,
}
formats.append(extract_format(
stream_url, 'hls-%d' % tbr if tbr else None, idx * 2))
formats.append(extract_format(stream_url.replace(
'/playlist.m3u8', ''), 'http-%d' % tbr if tbr else None, idx * 2 + 1, {'pno': 1031}))
self._sort_formats(formats)
return {
'id': video_id,
'title': info['title'].strip(),
'formats': formats,
'description': info.get('desc'),
'duration': int_or_none(info.get('duration')),
'thumbnail': info.get('thumb'),
}
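Review note: the `note=` argument in `extract_format` combines `%` formatting with `or`. In Python `%` binds more tightly than `or`, so the index fallback only takes effect when the grouping is explicit. A minimal sketch of the intended behaviour; `format_note` is a hypothetical helper.

```python
def format_note(format_id, idx):
    """Hypothetical helper: label a format by id, or by index if id is None."""
    # Parentheses matter: without them the expression parses as
    # ('... %s' % format_id) or ('#%d' % idx), and the left side is always
    # a non-empty string, so the index fallback would never be used.
    return 'Download video info for format %s' % (format_id or '#%d' % idx)
```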

View File

@ -0,0 +1,192 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_xpath,
)
from ..utils import (
int_or_none,
parse_duration,
smuggle_url,
unsmuggle_url,
xpath_text,
)
class MicrosoftVirtualAcademyBaseIE(InfoExtractor):
def _extract_base_url(self, course_id, display_id):
return self._download_json(
'https://api-mlxprod.microsoft.com/services/products/anonymous/%s' % course_id,
display_id, 'Downloading course base URL')
def _extract_chapter_and_title(self, title):
if not title:
return None, None
m = re.search(r'(?P<chapter>\d+)\s*\|\s*(?P<title>.+)', title)
return (int(m.group('chapter')), m.group('title')) if m else (None, title)
class MicrosoftVirtualAcademyIE(MicrosoftVirtualAcademyBaseIE):
IE_NAME = 'mva'
IE_DESC = 'Microsoft Virtual Academy videos'
_VALID_URL = r'(?:%s:|https?://(?:mva\.microsoft|(?:www\.)?microsoftvirtualacademy)\.com/[^/]+/training-courses/[^/?#&]+-)(?P<course_id>\d+)(?::|\?l=)(?P<id>[\da-zA-Z]+_\d+)' % IE_NAME
_TESTS = [{
'url': 'https://mva.microsoft.com/en-US/training-courses/microsoft-azure-fundamentals-virtual-machines-11788?l=gfVXISmEB_6804984382',
'md5': '7826c44fc31678b12ad8db11f6b5abb9',
'info_dict': {
'id': 'gfVXISmEB_6804984382',
'ext': 'mp4',
'title': 'Course Introduction',
'formats': 'mincount:3',
'subtitles': {
'en': [{
'ext': 'ttml',
}],
},
}
}, {
'url': 'mva:11788:gfVXISmEB_6804984382',
'only_matching': True,
}]
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
mobj = re.match(self._VALID_URL, url)
course_id = mobj.group('course_id')
video_id = mobj.group('id')
base_url = smuggled_data.get('base_url') or self._extract_base_url(course_id, video_id)
settings = self._download_xml(
'%s/content/content_%s/videosettings.xml?v=1' % (base_url, video_id),
video_id, 'Downloading video settings XML')
_, title = self._extract_chapter_and_title(xpath_text(
settings, './/Title', 'title', fatal=True))
formats = []
for sources in settings.findall(compat_xpath('.//MediaSources')):
if sources.get('videoType') == 'smoothstreaming':
continue
for source in sources.findall(compat_xpath('./MediaSource')):
video_url = source.text
if not video_url or not video_url.startswith('http'):
continue
video_mode = source.get('videoMode')
height = int_or_none(self._search_regex(
r'^(\d+)[pP]$', video_mode or '', 'height', default=None))
codec = source.get('codec')
acodec, vcodec = [None] * 2
if codec:
codecs = codec.split(',')
if len(codecs) == 2:
acodec, vcodec = codecs
elif len(codecs) == 1:
vcodec = codecs[0]
formats.append({
'url': video_url,
'format_id': video_mode,
'height': height,
'acodec': acodec,
'vcodec': vcodec,
})
self._sort_formats(formats)
subtitles = {}
for source in settings.findall(compat_xpath('.//MarkerResourceSource')):
subtitle_url = source.text
if not subtitle_url:
continue
subtitles.setdefault('en', []).append({
'url': '%s/%s' % (base_url, subtitle_url),
'ext': source.get('type'),
})
return {
'id': video_id,
'title': title,
'subtitles': subtitles,
'formats': formats
}
class MicrosoftVirtualAcademyCourseIE(MicrosoftVirtualAcademyBaseIE):
IE_NAME = 'mva:course'
IE_DESC = 'Microsoft Virtual Academy courses'
_VALID_URL = r'(?:%s:|https?://(?:mva\.microsoft|(?:www\.)?microsoftvirtualacademy)\.com/[^/]+/training-courses/(?P<display_id>[^/?#&]+)-)(?P<id>\d+)' % IE_NAME
_TESTS = [{
'url': 'https://mva.microsoft.com/en-US/training-courses/microsoft-azure-fundamentals-virtual-machines-11788',
'info_dict': {
'id': '11788',
'title': 'Microsoft Azure Fundamentals: Virtual Machines',
},
'playlist_count': 36,
}, {
# with emphasized chapters
'url': 'https://mva.microsoft.com/en-US/training-courses/developing-windows-10-games-with-construct-2-16335',
'info_dict': {
'id': '16335',
'title': 'Developing Windows 10 Games with Construct 2',
},
'playlist_count': 10,
}, {
'url': 'https://www.microsoftvirtualacademy.com/en-US/training-courses/microsoft-azure-fundamentals-virtual-machines-11788',
'only_matching': True,
}, {
'url': 'mva:course:11788',
'only_matching': True,
}]
@classmethod
def suitable(cls, url):
return False if MicrosoftVirtualAcademyIE.suitable(url) else super(
MicrosoftVirtualAcademyCourseIE, cls).suitable(url)
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
course_id = mobj.group('id')
display_id = mobj.group('display_id')
base_url = self._extract_base_url(course_id, display_id)
manifest = self._download_json(
'%s/imsmanifestlite.json' % base_url,
display_id, 'Downloading course manifest JSON')['manifest']
organization = manifest['organizations']['organization'][0]
entries = []
for chapter in organization['item']:
chapter_number, chapter_title = self._extract_chapter_and_title(chapter.get('title'))
chapter_id = chapter.get('@identifier')
for item in chapter.get('item', []):
item_id = item.get('@identifier')
if not item_id:
continue
metadata = item.get('resource', {}).get('metadata') or {}
if metadata.get('learningresourcetype') != 'Video':
continue
_, title = self._extract_chapter_and_title(item.get('title'))
duration = parse_duration(metadata.get('duration'))
description = metadata.get('description')
entries.append({
'_type': 'url_transparent',
'url': smuggle_url(
'mva:%s:%s' % (course_id, item_id), {'base_url': base_url}),
'title': title,
'description': description,
'duration': duration,
'chapter': chapter_title,
'chapter_number': chapter_number,
'chapter_id': chapter_id,
})
title = organization.get('title') or manifest.get('metadata', {}).get('title')
return self.playlist_result(entries, course_id, title)
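
Note on the format loop above: the height and codec handling can be tried in isolation. The following is a minimal sketch, not the extractor's code. The helper names `parse_height` and `split_codecs` are hypothetical, and the sample attribute values used with them ('720p', 'AACL,H264') are assumptions about what the settings XML carries.

```python
import re


def parse_height(video_mode):
    """Return the height from a mode label such as '720p', else None."""
    match = re.match(r'^(\d+)[pP]$', video_mode or '')
    return int(match.group(1)) if match else None


def split_codecs(codec):
    """Split a 'codec' attribute into (acodec, vcodec).

    Mirrors the branching in the diff: two comma-separated entries are
    read as audio then video, a single entry as video only, and anything
    else leaves both values as None.
    """
    acodec, vcodec = None, None
    if codec:
        codecs = codec.split(',')
        if len(codecs) == 2:
            acodec, vcodec = codecs
        elif len(codecs) == 1:
            vcodec = codecs[0]
    return acodec, vcodec
```

A label that is not of the form digits plus 'p' yields no height, and a codec string with three or more entries is ignored rather than guessed at.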

View File

@ -1,8 +1,5 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
@ -20,21 +17,28 @@ class MinistryGridIE(InfoExtractor):
'id': '3453494717001', 'id': '3453494717001',
'ext': 'mp4', 'ext': 'mp4',
'title': 'The Gospel by Numbers', 'title': 'The Gospel by Numbers',
'thumbnail': 're:^https?://.*\.jpg',
'upload_date': '20140410',
'description': 'Coming soon from T4G 2014!', 'description': 'Coming soon from T4G 2014!',
'uploader': 'LifeWay Christian Resources (MG)', 'uploader_id': '2034960640001',
'timestamp': 1397145591,
}, },
'params': {
# m3u8 download
'skip_download': True,
},
'add_ie': ['TDSLifeway'],
} }
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) video_id = self._match_id(url)
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
portlets_json = self._search_regex( portlets = self._parse_json(self._search_regex(
r'Liferay\.Portlet\.list=(\[.+?\])', webpage, 'portlet list') r'Liferay\.Portlet\.list=(\[.+?\])', webpage, 'portlet list'),
portlets = json.loads(portlets_json) video_id)
pl_id = self._search_regex( pl_id = self._search_regex(
r'<!--\s*p_l_id - ([0-9]+)<br>', webpage, 'p_l_id') r'getPlid:function\(\){return"(\d+)"}', webpage, 'p_l_id')
for i, portlet in enumerate(portlets): for i, portlet in enumerate(portlets):
portlet_url = 'http://www.ministrygrid.com/c/portal/render_portlet?p_l_id=%s&p_p_id=%s' % (pl_id, portlet) portlet_url = 'http://www.ministrygrid.com/c/portal/render_portlet?p_l_id=%s&p_p_id=%s' % (pl_id, portlet)
@ -46,12 +50,8 @@ class MinistryGridIE(InfoExtractor):
r'<iframe.*?src="([^"]+)"', portlet_code, 'video iframe', r'<iframe.*?src="([^"]+)"', portlet_code, 'video iframe',
default=None) default=None)
if video_iframe_url: if video_iframe_url:
surl = smuggle_url( return self.url_result(
video_iframe_url, {'force_videoid': video_id}) smuggle_url(video_iframe_url, {'force_videoid': video_id}),
return { video_id=video_id)
'_type': 'url',
'id': video_id,
'url': surl,
}
raise ExtractorError('Could not find video iframe in any portlets') raise ExtractorError('Could not find video iframe in any portlets')

View File

@ -15,9 +15,9 @@ class MiTeleIE(InfoExtractor):
IE_DESC = 'mitele.es' IE_DESC = 'mitele.es'
_VALID_URL = r'https?://www\.mitele\.es/[^/]+/[^/]+/[^/]+/(?P<id>[^/]+)/' _VALID_URL = r'https?://www\.mitele\.es/[^/]+/[^/]+/[^/]+/(?P<id>[^/]+)/'
_TESTS = [{ _TEST = {
'url': 'http://www.mitele.es/programas-tv/diario-de/la-redaccion/programa-144/', 'url': 'http://www.mitele.es/programas-tv/diario-de/la-redaccion/programa-144/',
'md5': '0ff1a13aebb35d9bc14081ff633dd324', # MD5 is unstable
'info_dict': { 'info_dict': {
'id': '0NF1jJnxS1Wu3pHrmvFyw2', 'id': '0NF1jJnxS1Wu3pHrmvFyw2',
'display_id': 'programa-144', 'display_id': 'programa-144',
@ -27,7 +27,7 @@ class MiTeleIE(InfoExtractor):
'thumbnail': 're:(?i)^https?://.*\.jpg$', 'thumbnail': 're:(?i)^https?://.*\.jpg$',
'duration': 2913, 'duration': 2913,
}, },
}] }
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) display_id = self._match_id(url)

View File

@ -1,26 +1,35 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import base64
import functools
import itertools
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urllib_parse_unquote from ..compat import (
compat_chr,
compat_ord,
compat_urllib_parse_unquote,
compat_urlparse,
)
from ..utils import ( from ..utils import (
clean_html,
ExtractorError, ExtractorError,
HEADRequest, OnDemandPagedList,
parse_count, parse_count,
str_to_int, str_to_int,
) )
class MixcloudIE(InfoExtractor): class MixcloudIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:www\.)?mixcloud\.com/([^/]+)/([^/]+)' _VALID_URL = r'^(?:https?://)?(?:www\.)?mixcloud\.com/([^/]+)/(?!stream|uploads|favorites|listens|playlists)([^/]+)'
IE_NAME = 'mixcloud' IE_NAME = 'mixcloud'
_TESTS = [{ _TESTS = [{
'url': 'http://www.mixcloud.com/dholbach/cryptkeeper/', 'url': 'http://www.mixcloud.com/dholbach/cryptkeeper/',
'info_dict': { 'info_dict': {
'id': 'dholbach-cryptkeeper', 'id': 'dholbach-cryptkeeper',
'ext': 'mp3', 'ext': 'm4a',
'title': 'Cryptkeeper', 'title': 'Cryptkeeper',
'description': 'After quite a long silence from myself, finally another Drum\'n\'Bass mix with my favourite current dance floor bangers.', 'description': 'After quite a long silence from myself, finally another Drum\'n\'Bass mix with my favourite current dance floor bangers.',
'uploader': 'Daniel Holbach', 'uploader': 'Daniel Holbach',
@ -38,22 +47,22 @@ class MixcloudIE(InfoExtractor):
'description': 'md5:2b8aec6adce69f9d41724647c65875e8', 'description': 'md5:2b8aec6adce69f9d41724647c65875e8',
'uploader': 'Gilles Peterson Worldwide', 'uploader': 'Gilles Peterson Worldwide',
'uploader_id': 'gillespeterson', 'uploader_id': 'gillespeterson',
'thumbnail': 're:https?://.*/images/', 'thumbnail': 're:https?://.*',
'view_count': int, 'view_count': int,
'like_count': int, 'like_count': int,
}, },
}] }]
def _check_url(self, url, track_id, ext): # See https://www.mixcloud.com/media/js2/www_js_2.9e23256562c080482435196ca3975ab5.js
try: @staticmethod
# We only want to know if the request succeed def _decrypt_play_info(play_info):
# don't download the whole file KEY = 'pleasedontdownloadourmusictheartistswontgetpaid'
self._request_webpage(
HEADRequest(url), track_id, play_info = base64.b64decode(play_info.encode('ascii'))
'Trying %s URL' % ext)
return True return ''.join([
except ExtractorError: compat_chr(compat_ord(ch) ^ compat_ord(KEY[idx % len(KEY)]))
return False for idx, ch in enumerate(play_info)])
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
@ -63,14 +72,19 @@ class MixcloudIE(InfoExtractor):
webpage = self._download_webpage(url, track_id) webpage = self._download_webpage(url, track_id)
preview_url = self._search_regex( message = self._html_search_regex(
r'\s(?:data-preview-url|m-preview)="([^"]+)"', webpage, 'preview url') r'(?s)<div[^>]+class="global-message cloudcast-disabled-notice-light"[^>]*>(.+?)<(?:a|/div)',
song_url = re.sub(r'audiocdn(\d+)', r'stream\1', preview_url) webpage, 'error message', default=None)
song_url = song_url.replace('/previews/', '/c/originals/')
if not self._check_url(song_url, track_id, 'mp3'): encrypted_play_info = self._search_regex(
song_url = song_url.replace('.mp3', '.m4a').replace('originals/', 'm4a/64/') r'm-play-info="([^"]+)"', webpage, 'play info')
if not self._check_url(song_url, track_id, 'm4a'): play_info = self._parse_json(
raise ExtractorError('Unable to extract track url') self._decrypt_play_info(encrypted_play_info), track_id)
if message and 'stream_url' not in play_info:
raise ExtractorError('%s said: %s' % (self.IE_NAME, message), expected=True)
song_url = play_info['stream_url']
PREFIX = ( PREFIX = (
r'm-play-on-spacebar[^>]+' r'm-play-on-spacebar[^>]+'
@ -105,3 +119,201 @@ class MixcloudIE(InfoExtractor):
'view_count': view_count, 'view_count': view_count,
'like_count': like_count, 'like_count': like_count,
} }
class MixcloudPlaylistBaseIE(InfoExtractor):
_PAGE_SIZE = 24
def _find_urls_in_page(self, page):
for url in re.findall(r'm-play-button m-url="(?P<url>[^"]+)"', page):
yield self.url_result(
compat_urlparse.urljoin('https://www.mixcloud.com', clean_html(url)),
MixcloudIE.ie_key())
def _fetch_tracks_page(self, path, video_id, page_name, current_page, real_page_number=None):
real_page_number = real_page_number or current_page + 1
return self._download_webpage(
'https://www.mixcloud.com/%s/' % path, video_id,
note='Download %s (page %d)' % (page_name, current_page + 1),
errnote='Unable to download %s' % page_name,
query={'page': real_page_number, 'list': 'main', '_ajax': '1'},
headers={'X-Requested-With': 'XMLHttpRequest'})
def _tracks_page_func(self, page, video_id, page_name, current_page):
resp = self._fetch_tracks_page(page, video_id, page_name, current_page)
for item in self._find_urls_in_page(resp):
yield item
def _get_user_description(self, page_content):
return self._html_search_regex(
r'<div[^>]+class="description-text"[^>]*>(.+?)</div>',
page_content, 'user description', fatal=False)
class MixcloudUserIE(MixcloudPlaylistBaseIE):
_VALID_URL = r'^(?:https?://)?(?:www\.)?mixcloud\.com/(?P<user>[^/]+)/(?P<type>uploads|favorites|listens)?/?$'
IE_NAME = 'mixcloud:user'
_TESTS = [{
'url': 'http://www.mixcloud.com/dholbach/',
'info_dict': {
'id': 'dholbach_uploads',
'title': 'Daniel Holbach (uploads)',
'description': 'md5:327af72d1efeb404a8216c27240d1370',
},
'playlist_mincount': 11,
}, {
'url': 'http://www.mixcloud.com/dholbach/uploads/',
'info_dict': {
'id': 'dholbach_uploads',
'title': 'Daniel Holbach (uploads)',
'description': 'md5:327af72d1efeb404a8216c27240d1370',
},
'playlist_mincount': 11,
}, {
'url': 'http://www.mixcloud.com/dholbach/favorites/',
'info_dict': {
'id': 'dholbach_favorites',
'title': 'Daniel Holbach (favorites)',
'description': 'md5:327af72d1efeb404a8216c27240d1370',
},
'params': {
'playlist_items': '1-100',
},
'playlist_mincount': 100,
}, {
'url': 'http://www.mixcloud.com/dholbach/listens/',
'info_dict': {
'id': 'dholbach_listens',
'title': 'Daniel Holbach (listens)',
'description': 'md5:327af72d1efeb404a8216c27240d1370',
},
'params': {
'playlist_items': '1-100',
},
'playlist_mincount': 100,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
user_id = mobj.group('user')
list_type = mobj.group('type')
# if only a profile URL was supplied, default to download all uploads
if list_type is None:
list_type = 'uploads'
video_id = '%s_%s' % (user_id, list_type)
profile = self._download_webpage(
'https://www.mixcloud.com/%s/' % user_id, video_id,
note='Downloading user profile',
errnote='Unable to download user profile')
username = self._og_search_title(profile)
description = self._get_user_description(profile)
entries = OnDemandPagedList(
functools.partial(
self._tracks_page_func,
'%s/%s' % (user_id, list_type), video_id, 'list of %s' % list_type),
self._PAGE_SIZE, use_cache=True)
return self.playlist_result(
entries, video_id, '%s (%s)' % (username, list_type), description)
class MixcloudPlaylistIE(MixcloudPlaylistBaseIE):
_VALID_URL = r'^(?:https?://)?(?:www\.)?mixcloud\.com/(?P<user>[^/]+)/playlists/(?P<playlist>[^/]+)/?$'
IE_NAME = 'mixcloud:playlist'
_TESTS = [{
'url': 'https://www.mixcloud.com/RedBullThre3style/playlists/tokyo-finalists-2015/',
'info_dict': {
'id': 'RedBullThre3style_tokyo-finalists-2015',
'title': 'National Champions 2015',
'description': 'md5:6ff5fb01ac76a31abc9b3939c16243a3',
},
'playlist_mincount': 16,
}, {
'url': 'https://www.mixcloud.com/maxvibes/playlists/jazzcat-on-ness-radio/',
'info_dict': {
'id': 'maxvibes_jazzcat-on-ness-radio',
'title': 'Jazzcat on Ness Radio',
'description': 'md5:7bbbf0d6359a0b8cda85224be0f8f263',
},
'playlist_mincount': 23
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
user_id = mobj.group('user')
playlist_id = mobj.group('playlist')
video_id = '%s_%s' % (user_id, playlist_id)
profile = self._download_webpage(
url, user_id,
note='Downloading playlist page',
errnote='Unable to download playlist page')
description = self._get_user_description(profile)
playlist_title = self._html_search_regex(
r'<span[^>]+class="[^"]*list-playlist-title[^"]*"[^>]*>(.*?)</span>',
profile, 'playlist title')
entries = OnDemandPagedList(
functools.partial(
self._tracks_page_func,
'%s/playlists/%s' % (user_id, playlist_id), video_id, 'tracklist'),
self._PAGE_SIZE)
return self.playlist_result(entries, video_id, playlist_title, description)
class MixcloudStreamIE(MixcloudPlaylistBaseIE):
_VALID_URL = r'^(?:https?://)?(?:www\.)?mixcloud\.com/(?P<id>[^/]+)/stream/?$'
IE_NAME = 'mixcloud:stream'
_TEST = {
'url': 'https://www.mixcloud.com/FirstEar/stream/',
'info_dict': {
'id': 'FirstEar',
'title': 'First Ear',
'description': 'Curators of good music\nfirstearmusic.com',
},
'playlist_mincount': 192,
}
def _real_extract(self, url):
user_id = self._match_id(url)
webpage = self._download_webpage(url, user_id)
entries = []
prev_page_url = None
def _handle_page(page):
entries.extend(self._find_urls_in_page(page))
return self._search_regex(
r'm-next-page-url="([^"]+)"', page,
'next page URL', default=None)
next_page_url = _handle_page(webpage)
for idx in itertools.count(0):
if not next_page_url or prev_page_url == next_page_url:
break
prev_page_url = next_page_url
current_page = int(self._search_regex(
r'\?page=(\d+)', next_page_url, 'next page number'))
next_page_url = _handle_page(self._fetch_tracks_page(
'%s/stream' % user_id, user_id, 'stream', idx,
real_page_number=current_page))
username = self._og_search_title(webpage)
description = self._get_user_description(webpage)
return self.playlist_result(entries, user_id, username, description)
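
Note on `_decrypt_play_info` above: it base64-decodes the `m-play-info` attribute and XORs the result with a repeating key. The following is a standalone Python 3 sketch of that scheme, under stated assumptions. It works on bytes and decodes the result as UTF-8, whereas the diff builds the string character by character with `compat_chr`. The `encrypt_play_info` helper and the sample payload in the usage note are hypothetical and exist only to produce test input.

```python
import base64
import itertools
import json

# Key string as it appears in the diff.
KEY = 'pleasedontdownloadourmusictheartistswontgetpaid'


def xor_with_key(data, key):
    """XOR each byte of data with the key bytes, repeating the key as needed."""
    return bytes(b ^ k for b, k in zip(data, itertools.cycle(key)))


def decrypt_play_info(encoded, key=KEY):
    """Base64-decode the attribute value, then undo the repeating-key XOR."""
    raw = base64.b64decode(encoded.encode('ascii'))
    return xor_with_key(raw, key.encode('ascii')).decode('utf-8')


def encrypt_play_info(text, key=KEY):
    """Inverse of decrypt_play_info; hypothetical, for building sample input."""
    raw = xor_with_key(text.encode('utf-8'), key.encode('ascii'))
    return base64.b64encode(raw).decode('ascii')
```

Because XOR is its own inverse, a round trip such as `decrypt_play_info(encrypt_play_info(json.dumps({'stream_url': 'https://example.com/a.m4a'})))` returns the original JSON text, which the extractor would then parse to read `stream_url`.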

View File

@ -1,110 +0,0 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
sanitized_Request,
urlencode_postdata,
)
class MooshareIE(InfoExtractor):
IE_NAME = 'mooshare'
IE_DESC = 'Mooshare.biz'
_VALID_URL = r'https?://(?:www\.)?mooshare\.biz/(?P<id>[\da-z]{12})'
_TESTS = [
{
'url': 'http://mooshare.biz/8dqtk4bjbp8g',
'md5': '4e14f9562928aecd2e42c6f341c8feba',
'info_dict': {
'id': '8dqtk4bjbp8g',
'ext': 'mp4',
'title': 'Comedy Football 2011 - (part 1-2)',
'duration': 893,
},
},
{
'url': 'http://mooshare.biz/aipjtoc4g95j',
'info_dict': {
'id': 'aipjtoc4g95j',
'ext': 'mp4',
'title': 'Orange Caramel Dashing Through the Snow',
'duration': 212,
},
'params': {
# rtmp download
'skip_download': True,
}
}
]
def _real_extract(self, url):
video_id = self._match_id(url)
page = self._download_webpage(url, video_id, 'Downloading page')
if re.search(r'>Video Not Found or Deleted<', page) is not None:
raise ExtractorError('Video %s does not exist' % video_id, expected=True)
hash_key = self._html_search_regex(r'<input type="hidden" name="hash" value="([^"]+)">', page, 'hash')
title = self._html_search_regex(r'(?m)<div class="blockTitle">\s*<h2>Watch ([^<]+)</h2>', page, 'title')
download_form = {
'op': 'download1',
'id': video_id,
'hash': hash_key,
}
request = sanitized_Request(
'http://mooshare.biz/%s' % video_id, urlencode_postdata(download_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
self._sleep(5, video_id)
video_page = self._download_webpage(request, video_id, 'Downloading video page')
thumbnail = self._html_search_regex(r'image:\s*"([^"]+)",', video_page, 'thumbnail', fatal=False)
duration_str = self._html_search_regex(r'duration:\s*"(\d+)",', video_page, 'duration', fatal=False)
duration = int(duration_str) if duration_str is not None else None
formats = []
# SD video
mobj = re.search(r'(?m)file:\s*"(?P<url>[^"]+)",\s*provider:', video_page)
if mobj is not None:
formats.append({
'url': mobj.group('url'),
'format_id': 'sd',
'format': 'SD',
})
# HD video
mobj = re.search(r'\'hd-2\': { file: \'(?P<url>[^\']+)\' },', video_page)
if mobj is not None:
formats.append({
'url': mobj.group('url'),
'format_id': 'hd',
'format': 'HD',
})
# rtmp video
mobj = re.search(r'(?m)file: "(?P<playpath>[^"]+)",\s*streamer: "(?P<rtmpurl>rtmp://[^"]+)",', video_page)
if mobj is not None:
formats.append({
'url': mobj.group('rtmpurl'),
'play_path': mobj.group('playpath'),
'rtmp_live': False,
'ext': 'mp4',
'format_id': 'rtmp',
'format': 'HD',
})
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'duration': duration,
'formats': formats,
}

View File

@ -1,17 +1,21 @@
# encoding: utf-8 # encoding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import int_or_none from ..compat import compat_urlparse
from ..utils import (
int_or_none,
js_to_json,
mimetype2ext,
)
class MusicPlayOnIE(InfoExtractor): class MusicPlayOnIE(InfoExtractor):
_VALID_URL = r'https?://(?:.+?\.)?musicplayon\.com/play(?:-touch)?\?(?:v|pl=100&play)=(?P<id>\d+)' _VALID_URL = r'https?://(?:.+?\.)?musicplayon\.com/play(?:-touch)?\?(?:v|pl=\d+&play)=(?P<id>\d+)'
_TEST = { _TESTS = [{
'url': 'http://en.musicplayon.com/play?v=433377', 'url': 'http://en.musicplayon.com/play?v=433377',
'md5': '00cdcdea1726abdf500d1e7fd6dd59bb',
'info_dict': { 'info_dict': {
'id': '433377', 'id': '433377',
'ext': 'mp4', 'ext': 'mp4',
@ -20,15 +24,16 @@ class MusicPlayOnIE(InfoExtractor):
'duration': 342, 'duration': 342,
'uploader': 'ultrafish', 'uploader': 'ultrafish',
}, },
'params': { }, {
# m3u8 download 'url': 'http://en.musicplayon.com/play?pl=102&play=442629',
'skip_download': True, 'only_matching': True,
}, }]
}
_URL_TEMPLATE = 'http://en.musicplayon.com/play?v=%s'
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) video_id = self._match_id(url)
video_id = mobj.group('id') url = self._URL_TEMPLATE % video_id
page = self._download_webpage(url, video_id) page = self._download_webpage(url, video_id)
@ -40,28 +45,14 @@ class MusicPlayOnIE(InfoExtractor):
uploader = self._html_search_regex( uploader = self._html_search_regex(
r'<div>by&nbsp;<a href="[^"]+" class="purple">([^<]+)</a></div>', page, 'uploader', fatal=False) r'<div>by&nbsp;<a href="[^"]+" class="purple">([^<]+)</a></div>', page, 'uploader', fatal=False)
formats = [ sources = self._parse_json(
{ self._search_regex(r'setup\[\'_sources\'\]\s*=\s*([^;]+);', page, 'video sources'),
'url': 'http://media0-eu-nl.musicplayon.com/stream-mobile?id=%s&type=.mp4' % video_id, video_id, transform_source=js_to_json)
'ext': 'mp4', formats = [{
} 'url': compat_urlparse.urljoin(url, source['src']),
] 'ext': mimetype2ext(source.get('type')),
'format_note': source.get('data-res'),
manifest = self._download_webpage( } for source in sources]
'http://en.musicplayon.com/manifest.m3u8?v=%s' % video_id, video_id, 'Downloading manifest')
for entry in manifest.split('#')[1:]:
if entry.startswith('EXT-X-STREAM-INF:'):
meta, url, _ = entry.split('\n')
params = dict(param.split('=') for param in meta.split(',')[1:])
formats.append({
'url': url,
'ext': 'mp4',
'tbr': int(params['BANDWIDTH']),
'width': int(params['RESOLUTION'].split('x')[1]),
'height': int(params['RESOLUTION'].split('x')[-1]),
'format_note': params['NAME'].replace('"', '').strip(),
})
return { return {
'id': video_id, 'id': video_id,
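
Note on the new `formats` list above: each source entry is resolved against the page URL and its MIME type is mapped to an extension. The following is a rough sketch using only the standard library. `simple_mimetype2ext` is a hypothetical, much simplified stand-in for `mimetype2ext` from youtube-dl's utils, and the shape of the source entries in the usage note is an assumption about what `setup['_sources']` contains.

```python
from urllib.parse import urljoin


def simple_mimetype2ext(mimetype):
    """Rough stand-in: take the subtype after the slash, 'video/mp4' -> 'mp4'."""
    if not mimetype:
        return None
    return mimetype.split('/')[-1].split(';')[0].strip()


def build_formats(page_url, sources):
    """Build format dicts from source entries, resolving relative URLs."""
    return [{
        'url': urljoin(page_url, source['src']),
        'ext': simple_mimetype2ext(source.get('type')),
        'format_note': source.get('data-res'),
    } for source in sources]
```

For example, `build_formats('http://en.musicplayon.com/play?v=433377', [{'src': '/stream?id=1', 'type': 'video/mp4', 'data-res': '360p'}])` resolves the relative `src` against the site root, while absolute `src` values pass through unchanged.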

View File

@ -1,63 +0,0 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_urllib_parse_urlencode
class MuzuTVIE(InfoExtractor):
_VALID_URL = r'https?://www\.muzu\.tv/(.+?)/(.+?)/(?P<id>\d+)'
IE_NAME = 'muzu.tv'
_TEST = {
'url': 'http://www.muzu.tv/defected/marcashken-featuring-sos-cat-walk-original-mix-music-video/1981454/',
'md5': '98f8b2c7bc50578d6a0364fff2bfb000',
'info_dict': {
'id': '1981454',
'ext': 'mp4',
'title': 'Cat Walk (Original Mix)',
'description': 'md5:90e868994de201b2570e4e5854e19420',
'uploader': 'MarcAshken featuring SOS',
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
info_data = compat_urllib_parse_urlencode({
'format': 'json',
'url': url,
})
info = self._download_json(
'http://www.muzu.tv/api/oembed/?%s' % info_data,
video_id, 'Downloading video info')
player_info = self._download_json(
'http://player.muzu.tv/player/playerInit?ai=%s' % video_id,
video_id, 'Downloading player info')
video_info = player_info['videos'][0]
for quality in ['1080', '720', '480', '360']:
if video_info.get('v%s' % quality):
break
data = compat_urllib_parse_urlencode({
'ai': video_id,
# Even if each time you watch a video the hash changes,
# it seems to work for different videos, and it will work
# even if you use any non empty string as a hash
'viewhash': 'VBNff6djeV4HV5TRPW5kOHub2k',
'device': 'web',
'qv': quality,
})
video_url_info = self._download_json(
'http://player.muzu.tv/player/requestVideo?%s' % data,
video_id, 'Downloading video url')
video_url = video_url_info['url']
return {
'id': video_id,
'title': info['title'],
'url': video_url,
'thumbnail': info['thumbnail_url'],
'description': info['description'],
'uploader': info['author_name'],
}

View File

@ -10,9 +10,10 @@ from ..utils import (
class MwaveIE(InfoExtractor): class MwaveIE(InfoExtractor):
_VALID_URL = r'https?://mwave\.interest\.me/mnettv/videodetail\.m\?searchVideoDetailVO\.clip_id=(?P<id>[0-9]+)' _VALID_URL = r'https?://mwave\.interest\.me/mnettv/videodetail\.m\?searchVideoDetailVO\.clip_id=(?P<id>[0-9]+)'
_URL_TEMPLATE = 'http://mwave.interest.me/mnettv/videodetail.m?searchVideoDetailVO.clip_id=%s'
_TEST = { _TEST = {
'url': 'http://mwave.interest.me/mnettv/videodetail.m?searchVideoDetailVO.clip_id=168859', 'url': 'http://mwave.interest.me/mnettv/videodetail.m?searchVideoDetailVO.clip_id=168859',
'md5': 'c930e27b7720aaa3c9d0018dfc8ff6cc', # md5 is unstable
'info_dict': { 'info_dict': {
'id': '168859', 'id': '168859',
'ext': 'flv', 'ext': 'flv',
@ -56,3 +57,28 @@ class MwaveIE(InfoExtractor):
'view_count': int_or_none(vod_info.get('hit')), 'view_count': int_or_none(vod_info.get('hit')),
'formats': formats, 'formats': formats,
} }
class MwaveMeetGreetIE(InfoExtractor):
_VALID_URL = r'https?://mwave\.interest\.me/meetgreet/view/(?P<id>\d+)'
_TEST = {
'url': 'http://mwave.interest.me/meetgreet/view/256',
'info_dict': {
'id': '173294',
'ext': 'flv',
'title': '[MEET&GREET] Park BoRam',
'thumbnail': 're:^https?://.*\.jpg$',
'uploader': 'Mwave',
'duration': 3634,
'view_count': int,
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
clip_id = self._html_search_regex(
r'<iframe[^>]+src="/mnettv/ifr_clip\.m\?searchVideoDetailVO\.clip_id=(\d+)',
webpage, 'clip ID')
clip_url = MwaveIE._URL_TEMPLATE % clip_id
return self.url_result(clip_url, 'Mwave', clip_id)
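
Note on `MwaveMeetGreetIE` above: the extractor reads the clip ID from the embedded iframe and hands the canonical clip URL to `MwaveIE`. The lookup can be sketched on its own as follows. The function name `clip_url_from_page` and the sample HTML in the usage note are hypothetical. This sketch returns `None` when nothing matches, which differs from `_html_search_regex`, which raises by default.

```python
import re

# Both patterns are copied from the diff.
URL_TEMPLATE = 'http://mwave.interest.me/mnettv/videodetail.m?searchVideoDetailVO.clip_id=%s'
IFRAME_RE = r'<iframe[^>]+src="/mnettv/ifr_clip\.m\?searchVideoDetailVO\.clip_id=(\d+)'


def clip_url_from_page(webpage):
    """Return the clip page URL for the first matching iframe, else None."""
    match = re.search(IFRAME_RE, webpage)
    if match is None:
        return None
    return URL_TEMPLATE % match.group(1)
```

Given a page containing `<iframe width="640" src="/mnettv/ifr_clip.m?searchVideoDetailVO.clip_id=173294&autoplay=1">`, the function returns the template filled with `173294`, the ID the test case expects.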

View File

@ -134,6 +134,9 @@ class NBCSportsIE(InfoExtractor):
'ext': 'flv', 'ext': 'flv',
'title': 'Tom Izzo, Michigan St. has \'so much respect\' for Duke', 'title': 'Tom Izzo, Michigan St. has \'so much respect\' for Duke',
'description': 'md5:ecb459c9d59e0766ac9c7d5d0eda8113', 'description': 'md5:ecb459c9d59e0766ac9c7d5d0eda8113',
'uploader': 'NBCU-SPORTS',
'upload_date': '20150330',
'timestamp': 1427726529,
} }
} }
@ -172,7 +175,7 @@ class CSNNEIE(InfoExtractor):
class NBCNewsIE(ThePlatformIE): class NBCNewsIE(ThePlatformIE):
_VALID_URL = r'''(?x)https?://(?:www\.)?nbcnews\.com/ _VALID_URL = r'''(?x)https?://(?:www\.)?(?:nbcnews|today)\.com/
(?:video/.+?/(?P<id>\d+)| (?:video/.+?/(?P<id>\d+)|
([^/]+/)*(?P<display_id>[^/?]+)) ([^/]+/)*(?P<display_id>[^/?]+))
''' '''
@ -230,6 +233,18 @@ class NBCNewsIE(ThePlatformIE):
}, },
'expected_warnings': ['http-6000 is not available'] 'expected_warnings': ['http-6000 is not available']
}, },
{
'url': 'http://www.today.com/video/see-the-aurora-borealis-from-space-in-stunning-new-nasa-video-669831235788',
'md5': '118d7ca3f0bea6534f119c68ef539f71',
'info_dict': {
'id': '669831235788',
'ext': 'mp4',
'title': 'See the aurora borealis from space in stunning new NASA video',
'description': 'md5:74752b7358afb99939c5f8bb2d1d04b1',
'upload_date': '20160420',
'timestamp': 1461152093,
},
},
{ {
'url': 'http://www.nbcnews.com/watch/dateline/full-episode--deadly-betrayal-386250819952', 'url': 'http://www.nbcnews.com/watch/dateline/full-episode--deadly-betrayal-386250819952',
'only_matching': True, 'only_matching': True,
@ -264,7 +279,10 @@ class NBCNewsIE(ThePlatformIE):
info = bootstrap['results'][0]['video'] info = bootstrap['results'][0]['video']
else: else:
player_instance_json = self._search_regex( player_instance_json = self._search_regex(
r'videoObj\s*:\s*({.+})', webpage, 'player instance') r'videoObj\s*:\s*({.+})', webpage, 'player instance', default=None)
if not player_instance_json:
player_instance_json = self._html_search_regex(
r'data-video="([^"]+)"', webpage, 'video json')
info = self._parse_json(player_instance_json, display_id) info = self._parse_json(player_instance_json, display_id)
video_id = info['mpxId'] video_id = info['mpxId']
title = info['title'] title = info['title']
@ -295,7 +313,7 @@ class NBCNewsIE(ThePlatformIE):
formats.extend(tp_formats) formats.extend(tp_formats)
subtitles = self._merge_subtitles(subtitles, tp_subtitles) subtitles = self._merge_subtitles(subtitles, tp_subtitles)
else: else:
tbr = int_or_none(video_asset.get('bitRate'), 1000) tbr = int_or_none(video_asset.get('bitRate') or video_asset.get('bitrate'), 1000)
format_id = 'http%s' % ('-%d' % tbr if tbr else '') format_id = 'http%s' % ('-%d' % tbr if tbr else '')
video_url = update_url_query( video_url = update_url_query(
video_url, {'format': 'redirect'}) video_url, {'format': 'redirect'})
@ -321,10 +339,9 @@ class NBCNewsIE(ThePlatformIE):
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'description': info.get('description'), 'description': info.get('description'),
'thumbnail': info.get('description'),
'thumbnail': info.get('thumbnail'), 'thumbnail': info.get('thumbnail'),
'duration': int_or_none(info.get('duration')), 'duration': int_or_none(info.get('duration')),
'timestamp': parse_iso8601(info.get('pubDate')), 'timestamp': parse_iso8601(info.get('pubDate') or info.get('pub_date')),
'formats': formats, 'formats': formats,
'subtitles': subtitles, 'subtitles': subtitles,
} }
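
Note on the `bitRate`/`bitrate` change above: the asset key is now read under either spelling before the value is scaled to kbit/s and used in the format ID. The following is a small sketch of that fallback. The `int_or_none` here is a simplified stand-in for the helper of the same name in youtube-dl's utils, not the real implementation, and `format_id_for` is a hypothetical wrapper around the two lines from the diff.

```python
def int_or_none(value, scale=1):
    """Simplified stand-in: convert to int and divide by scale, else None."""
    if value is None or value == '':
        return None
    try:
        return int(value) // scale
    except (TypeError, ValueError):
        return None


def format_id_for(video_asset):
    """Build the 'http-<tbr>' format ID, accepting either key spelling."""
    tbr = int_or_none(
        video_asset.get('bitRate') or video_asset.get('bitrate'), 1000)
    return 'http%s' % ('-%d' % tbr if tbr else '')
```

With `{'bitRate': 6000000}` this produces `http-6000`, the ID named in the test's expected warning. An asset with neither key falls back to plain `http`.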

View File

@ -1,80 +0,0 @@
# encoding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
determine_ext,
parse_iso8601,
xpath_text,
)
class NerdistIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?nerdist\.com/vepisode/(?P<id>[^/?#]+)'
_TEST = {
'url': 'http://www.nerdist.com/vepisode/exclusive-which-dc-characters-w',
'md5': '3698ed582931b90d9e81e02e26e89f23',
'info_dict': {
'display_id': 'exclusive-which-dc-characters-w',
'id': 'RPHpvJyr',
'ext': 'mp4',
'title': 'Your TEEN TITANS Revealed! Who\'s on the show?',
'thumbnail': 're:^https?://.*/thumbs/.*\.jpg$',
'description': 'Exclusive: Find out which DC Comics superheroes will star in TEEN TITANS Live-Action TV Show on Nerdist News with Jessica Chobot!',
'uploader': 'Eric Diaz',
'upload_date': '20150202',
'timestamp': 1422892808,
}
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
r'''(?x)<script\s+(?:type="text/javascript"\s+)?
src="https?://content\.nerdist\.com/players/([a-zA-Z0-9_]+)-''',
webpage, 'video ID')
timestamp = parse_iso8601(self._html_search_meta(
'shareaholic:article_published_time', webpage, 'upload date'))
uploader = self._html_search_meta(
'shareaholic:article_author_name', webpage, 'article author')
doc = self._download_xml(
'http://content.nerdist.com/jw6/%s.xml' % video_id, video_id)
video_info = doc.find('.//item')
title = xpath_text(video_info, './title', fatal=True)
description = xpath_text(video_info, './description')
thumbnail = xpath_text(
video_info, './{http://rss.jwpcdn.com/}image', 'thumbnail')
formats = []
for source in video_info.findall('./{http://rss.jwpcdn.com/}source'):
vurl = source.attrib['file']
ext = determine_ext(vurl)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
vurl, video_id, entry_protocol='m3u8_native', ext='mp4',
preference=0))
elif ext == 'smil':
formats.extend(self._extract_smil_formats(
vurl, video_id, fatal=False
))
else:
formats.append({
'format_id': ext,
'url': vurl,
})
self._sort_formats(formats)
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'timestamp': timestamp,
'formats': formats,
'uploader': uploader,
}

View File

@ -89,6 +89,7 @@ class NetEaseMusicIE(NetEaseMusicBaseIE):
'timestamp': 1431878400, 'timestamp': 1431878400,
'description': 'md5:a10a54589c2860300d02e1de821eb2ef', 'description': 'md5:a10a54589c2860300d02e1de821eb2ef',
}, },
'skip': 'Blocked outside Mainland China',
}, { }, {
'note': 'No lyrics translation.', 'note': 'No lyrics translation.',
'url': 'http://music.163.com/#/song?id=29822014', 'url': 'http://music.163.com/#/song?id=29822014',
@ -101,6 +102,7 @@ class NetEaseMusicIE(NetEaseMusicBaseIE):
'timestamp': 1419523200, 'timestamp': 1419523200,
'description': 'md5:a4d8d89f44656af206b7b2555c0bce6c', 'description': 'md5:a4d8d89f44656af206b7b2555c0bce6c',
}, },
'skip': 'Blocked outside Mainland China',
}, { }, {
'note': 'No lyrics.', 'note': 'No lyrics.',
'url': 'http://music.163.com/song?id=17241424', 'url': 'http://music.163.com/song?id=17241424',
@ -112,6 +114,7 @@ class NetEaseMusicIE(NetEaseMusicBaseIE):
'upload_date': '20080211', 'upload_date': '20080211',
'timestamp': 1202745600, 'timestamp': 1202745600,
}, },
'skip': 'Blocked outside Mainland China',
}, { }, {
'note': 'Has translated name.', 'note': 'Has translated name.',
'url': 'http://music.163.com/#/song?id=22735043', 'url': 'http://music.163.com/#/song?id=22735043',
@ -124,7 +127,8 @@ class NetEaseMusicIE(NetEaseMusicBaseIE):
'upload_date': '20100127', 'upload_date': '20100127',
'timestamp': 1264608000, 'timestamp': 1264608000,
'alt_title': '说出愿望吧(Genie)', 'alt_title': '说出愿望吧(Genie)',
} },
'skip': 'Blocked outside Mainland China',
}] }]
def _process_lyrics(self, lyrics_info): def _process_lyrics(self, lyrics_info):
@ -192,6 +196,7 @@ class NetEaseMusicAlbumIE(NetEaseMusicBaseIE):
'title': 'B\'day', 'title': 'B\'day',
}, },
'playlist_count': 23, 'playlist_count': 23,
'skip': 'Blocked outside Mainland China',
} }
def _real_extract(self, url): def _real_extract(self, url):
@ -223,6 +228,7 @@ class NetEaseMusicSingerIE(NetEaseMusicBaseIE):
'title': '张惠妹 - aMEI;阿密特', 'title': '张惠妹 - aMEI;阿密特',
}, },
'playlist_count': 50, 'playlist_count': 50,
'skip': 'Blocked outside Mainland China',
}, { }, {
'note': 'Singer has translated name.', 'note': 'Singer has translated name.',
'url': 'http://music.163.com/#/artist?id=124098', 'url': 'http://music.163.com/#/artist?id=124098',
@ -231,6 +237,7 @@ class NetEaseMusicSingerIE(NetEaseMusicBaseIE):
'title': '李昇基 - 이승기', 'title': '李昇基 - 이승기',
}, },
'playlist_count': 50, 'playlist_count': 50,
'skip': 'Blocked outside Mainland China',
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -266,6 +273,7 @@ class NetEaseMusicListIE(NetEaseMusicBaseIE):
'description': 'md5:12fd0819cab2965b9583ace0f8b7b022' 'description': 'md5:12fd0819cab2965b9583ace0f8b7b022'
}, },
'playlist_count': 99, 'playlist_count': 99,
'skip': 'Blocked outside Mainland China',
}, { }, {
'note': 'Toplist/Charts sample', 'note': 'Toplist/Charts sample',
'url': 'http://music.163.com/#/discover/toplist?id=3733003', 'url': 'http://music.163.com/#/discover/toplist?id=3733003',
@ -275,6 +283,7 @@ class NetEaseMusicListIE(NetEaseMusicBaseIE):
'description': 'md5:73ec782a612711cadc7872d9c1e134fc', 'description': 'md5:73ec782a612711cadc7872d9c1e134fc',
}, },
'playlist_count': 50, 'playlist_count': 50,
'skip': 'Blocked outside Mainland China',
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -314,6 +323,7 @@ class NetEaseMusicMvIE(NetEaseMusicBaseIE):
'creator': '白雅言', 'creator': '白雅言',
'upload_date': '20150520', 'upload_date': '20150520',
}, },
'skip': 'Blocked outside Mainland China',
} }
def _real_extract(self, url): def _real_extract(self, url):
@ -357,6 +367,7 @@ class NetEaseMusicProgramIE(NetEaseMusicBaseIE):
'upload_date': '20150613', 'upload_date': '20150613',
'duration': 900, 'duration': 900,
}, },
'skip': 'Blocked outside Mainland China',
}, { }, {
'note': 'This program has accompanying songs.', 'note': 'This program has accompanying songs.',
'url': 'http://music.163.com/#/program?id=10141022', 'url': 'http://music.163.com/#/program?id=10141022',
@ -366,6 +377,7 @@ class NetEaseMusicProgramIE(NetEaseMusicBaseIE):
'description': 'md5:8d594db46cc3e6509107ede70a4aaa3b', 'description': 'md5:8d594db46cc3e6509107ede70a4aaa3b',
}, },
'playlist_count': 4, 'playlist_count': 4,
'skip': 'Blocked outside Mainland China',
}, { }, {
'note': 'This program has accompanying songs.', 'note': 'This program has accompanying songs.',
'url': 'http://music.163.com/#/program?id=10141022', 'url': 'http://music.163.com/#/program?id=10141022',
@ -379,7 +391,8 @@ class NetEaseMusicProgramIE(NetEaseMusicBaseIE):
}, },
'params': { 'params': {
'noplaylist': True 'noplaylist': True
} },
'skip': 'Blocked outside Mainland China',
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -438,6 +451,7 @@ class NetEaseMusicDjRadioIE(NetEaseMusicBaseIE):
'description': 'md5:766220985cbd16fdd552f64c578a6b15' 'description': 'md5:766220985cbd16fdd552f64c578a6b15'
}, },
'playlist_mincount': 40, 'playlist_mincount': 40,
'skip': 'Blocked outside Mainland China',
} }
_PAGE_SIZE = 1000 _PAGE_SIZE = 1000

@ -7,8 +7,8 @@ from .common import InfoExtractor
class NewgroundsIE(InfoExtractor): class NewgroundsIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?newgrounds\.com/audio/listen/(?P<id>[0-9]+)' _VALID_URL = r'https?://(?:www\.)?newgrounds\.com/(?:audio/listen|portal/view)/(?P<id>[0-9]+)'
_TEST = { _TESTS = [{
'url': 'http://www.newgrounds.com/audio/listen/549479', 'url': 'http://www.newgrounds.com/audio/listen/549479',
'md5': 'fe6033d297591288fa1c1f780386f07a', 'md5': 'fe6033d297591288fa1c1f780386f07a',
'info_dict': { 'info_dict': {
@ -17,7 +17,16 @@ class NewgroundsIE(InfoExtractor):
'title': 'B7 - BusMode', 'title': 'B7 - BusMode',
'uploader': 'Burn7', 'uploader': 'Burn7',
} }
} }, {
'url': 'http://www.newgrounds.com/portal/view/673111',
'md5': '3394735822aab2478c31b1004fe5e5bc',
'info_dict': {
'id': '673111',
'ext': 'mp4',
'title': 'Dancin',
'uploader': 'Squirrelman82',
},
}]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
@ -25,9 +34,11 @@ class NewgroundsIE(InfoExtractor):
webpage = self._download_webpage(url, music_id) webpage = self._download_webpage(url, music_id)
title = self._html_search_regex( title = self._html_search_regex(
r',"name":"([^"]+)",', webpage, 'music title') r'<title>([^>]+)</title>', webpage, 'title')
uploader = self._html_search_regex( uploader = self._html_search_regex(
r',"artist":"([^"]+)",', webpage, 'music uploader') [r',"artist":"([^"]+)",', r'[\'"]owner[\'"]\s*:\s*[\'"]([^\'"]+)[\'"],'],
webpage, 'uploader')
music_url_json_string = self._html_search_regex( music_url_json_string = self._html_search_regex(
r'({"url":"[^"]+"),', webpage, 'music url') + '}' r'({"url":"[^"]+"),', webpage, 'music url') + '}'

@ -4,24 +4,24 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ExtractorError from ..utils import (
ExtractorError,
int_or_none,
)
class NewstubeIE(InfoExtractor): class NewstubeIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?newstube\.ru/media/(?P<id>.+)' _VALID_URL = r'https?://(?:www\.)?newstube\.ru/media/(?P<id>.+)'
_TEST = { _TEST = {
'url': 'http://www.newstube.ru/media/telekanal-cnn-peremestil-gorod-slavyansk-v-krym', 'url': 'http://www.newstube.ru/media/telekanal-cnn-peremestil-gorod-slavyansk-v-krym',
'md5': '801eef0c2a9f4089fa04e4fe3533abdc',
'info_dict': { 'info_dict': {
'id': '728e0ef2-e187-4012-bac0-5a081fdcb1f6', 'id': '728e0ef2-e187-4012-bac0-5a081fdcb1f6',
'ext': 'flv', 'ext': 'mp4',
'title': 'Телеканал CNN переместил город Славянск в Крым', 'title': 'Телеканал CNN переместил город Славянск в Крым',
'description': 'md5:419a8c9f03442bc0b0a794d689360335', 'description': 'md5:419a8c9f03442bc0b0a794d689360335',
'duration': 31.05, 'duration': 31.05,
}, },
'params': {
# rtmp download
'skip_download': True,
},
} }
def _real_extract(self, url): def _real_extract(self, url):
@ -62,7 +62,6 @@ class NewstubeIE(InfoExtractor):
server = media_location.find(ns('./Server')).text server = media_location.find(ns('./Server')).text
app = media_location.find(ns('./App')).text app = media_location.find(ns('./App')).text
media_id = stream_info.find(ns('./Id')).text media_id = stream_info.find(ns('./Id')).text
quality_id = stream_info.find(ns('./QualityId')).text
name = stream_info.find(ns('./Name')).text name = stream_info.find(ns('./Name')).text
width = int(stream_info.find(ns('./Width')).text) width = int(stream_info.find(ns('./Width')).text)
height = int(stream_info.find(ns('./Height')).text) height = int(stream_info.find(ns('./Height')).text)
@ -74,12 +73,38 @@ class NewstubeIE(InfoExtractor):
'rtmp_conn': ['S:%s' % session_id, 'S:%s' % media_id, 'S:n2'], 'rtmp_conn': ['S:%s' % session_id, 'S:%s' % media_id, 'S:n2'],
'page_url': url, 'page_url': url,
'ext': 'flv', 'ext': 'flv',
'format_id': quality_id, 'format_id': 'rtmp' + ('-%s' % name if name else ''),
'format_note': name,
'width': width, 'width': width,
'height': height, 'height': height,
}) })
sources_data = self._download_json(
'http://www.newstube.ru/player2/getsources?guid=%s' % video_guid,
video_guid, fatal=False)
if sources_data:
for source in sources_data.get('Sources', []):
source_url = source.get('Src')
if not source_url:
continue
height = int_or_none(source.get('Height'))
f = {
'format_id': 'http' + ('-%dp' % height if height else ''),
'url': source_url,
'width': int_or_none(source.get('Width')),
'height': height,
}
source_type = source.get('Type')
if source_type:
mobj = re.search(r'codecs="([^,]+),\s*([^"]+)"', source_type)
if mobj:
vcodec, acodec = mobj.groups()
f.update({
'vcodec': vcodec,
'acodec': acodec,
})
formats.append(f)
self._check_formats(formats, video_guid)
self._sort_formats(formats) self._sort_formats(formats)
return { return {
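
The `codecs` handling added to `NewstubeIE` above can be exercised on its own. The following is a minimal sketch, not the extractor's code: it reuses the regular expression from the hunk, while the helper name `split_codecs` and the sample MIME strings are assumptions made for illustration.

```python
import re

# Same pattern as in the NewstubeIE hunk; it expects exactly two
# comma-separated codec names inside codecs="...".
CODECS_RE = re.compile(r'codecs="([^,]+),\s*([^"]+)"')


def split_codecs(source_type):
    """Return (vcodec, acodec) from a MIME type string, or (None, None).

    split_codecs is a hypothetical helper name, not part of youtube-dl.
    """
    if not source_type:
        return None, None
    mobj = CODECS_RE.search(source_type)
    if not mobj:
        # No codecs attribute, or only a single codec listed.
        return None, None
    return mobj.groups()


if __name__ == '__main__':
    # Hypothetical sample value for a source's 'Type' field.
    print(split_codecs('video/mp4; codecs="avc1.42E01E, mp4a.40.2"'))
```

Because the pattern requires a comma, a `Type` value that lists a single codec leaves both fields unset, which matches the hunk's behaviour of skipping `f.update(...)` when the search fails.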

@ -8,10 +8,15 @@ from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_urlparse, compat_urlparse,
compat_urllib_parse_urlencode, compat_urllib_parse_urlencode,
compat_urllib_parse_urlparse compat_urllib_parse_urlparse,
compat_str,
) )
from ..utils import ( from ..utils import (
unified_strdate, unified_strdate,
determine_ext,
int_or_none,
parse_iso8601,
parse_duration,
) )
@ -70,8 +75,8 @@ class NHLBaseInfoExtractor(InfoExtractor):
return ret return ret
class NHLIE(NHLBaseInfoExtractor): class NHLVideocenterIE(NHLBaseInfoExtractor):
IE_NAME = 'nhl.com' IE_NAME = 'nhl.com:videocenter'
_VALID_URL = r'https?://video(?P<team>\.[^.]*)?\.nhl\.com/videocenter/(?:console|embed)?(?:\?(?:.*?[?&])?)(?:id|hlg|playlist)=(?P<id>[-0-9a-zA-Z,]+)' _VALID_URL = r'https?://video(?P<team>\.[^.]*)?\.nhl\.com/videocenter/(?:console|embed)?(?:\?(?:.*?[?&])?)(?:id|hlg|playlist)=(?P<id>[-0-9a-zA-Z,]+)'
_TESTS = [{ _TESTS = [{
@ -186,8 +191,8 @@ class NHLNewsIE(NHLBaseInfoExtractor):
return self._real_extract_video(video_id) return self._real_extract_video(video_id)
class NHLVideocenterIE(NHLBaseInfoExtractor): class NHLVideocenterCategoryIE(NHLBaseInfoExtractor):
IE_NAME = 'nhl.com:videocenter' IE_NAME = 'nhl.com:videocenter:category'
IE_DESC = 'NHL videocenter category' IE_DESC = 'NHL videocenter category'
_VALID_URL = r'https?://video\.(?P<team>[^.]*)\.nhl\.com/videocenter/(console\?[^(id=)]*catid=(?P<catid>[0-9]+)(?![&?]id=).*?)?$' _VALID_URL = r'https?://video\.(?P<team>[^.]*)\.nhl\.com/videocenter/(console\?[^(id=)]*catid=(?P<catid>[0-9]+)(?![&?]id=).*?)?$'
_TEST = { _TEST = {
@ -236,3 +241,86 @@ class NHLVideocenterIE(NHLBaseInfoExtractor):
'id': cat_id, 'id': cat_id,
'entries': [self._extract_video(v) for v in videos], 'entries': [self._extract_video(v) for v in videos],
} }
class NHLIE(InfoExtractor):
IE_NAME = 'nhl.com'
_VALID_URL = r'https?://(?:www\.)?nhl\.com/([^/]+/)*c-(?P<id>\d+)'
_TESTS = [{
# type=video
'url': 'https://www.nhl.com/video/anisimov-cleans-up-mess/t-277752844/c-43663503',
'md5': '0f7b9a8f986fb4b4eeeece9a56416eaf',
'info_dict': {
'id': '43663503',
'ext': 'mp4',
'title': 'Anisimov cleans up mess',
'description': 'md5:a02354acdfe900e940ce40706939ca63',
'timestamp': 1461288600,
'upload_date': '20160422',
},
}, {
# type=article
'url': 'https://www.nhl.com/news/dennis-wideman-suspended/c-278258934',
'md5': '1f39f4ea74c1394dea110699a25b366c',
'info_dict': {
'id': '40784403',
'ext': 'mp4',
'title': 'Wideman suspended by NHL',
'description': 'Flames defenseman Dennis Wideman was banned 20 games for violation of Rule 40 (Physical Abuse of Officials)',
'upload_date': '20160204',
'timestamp': 1454544904,
},
}]
def _real_extract(self, url):
tmp_id = self._match_id(url)
video_data = self._download_json(
'https://nhl.bamcontent.com/nhl/id/v1/%s/details/web-v1.json' % tmp_id,
tmp_id)
if video_data.get('type') == 'article':
video_data = video_data['media']
video_id = compat_str(video_data['id'])
title = video_data['title']
formats = []
for playback in video_data.get('playbacks', []):
playback_url = playback.get('url')
if not playback_url:
continue
ext = determine_ext(playback_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
playback_url, video_id, 'mp4', 'm3u8_native',
m3u8_id=playback.get('name', 'hls'), fatal=False))
else:
height = int_or_none(playback.get('height'))
formats.append({
'format_id': playback.get('name', 'http' + ('-%dp' % height if height else '')),
'url': playback_url,
'width': int_or_none(playback.get('width')),
'height': height,
})
self._sort_formats(formats, ('preference', 'width', 'height', 'tbr', 'format_id'))
thumbnails = []
for thumbnail_id, thumbnail_data in video_data.get('image', {}).get('cuts', {}).items():
thumbnail_url = thumbnail_data.get('src')
if not thumbnail_url:
continue
thumbnails.append({
'id': thumbnail_id,
'url': thumbnail_url,
'width': int_or_none(thumbnail_data.get('width')),
'height': int_or_none(thumbnail_data.get('height')),
})
return {
'id': video_id,
'title': title,
'description': video_data.get('description'),
'timestamp': parse_iso8601(video_data.get('date')),
'duration': parse_duration(video_data.get('duration')),
'thumbnails': thumbnails,
'formats': formats,
}
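
The `format_id` fallback used for non-HLS playbacks in the new `NHLIE` can be tried in isolation. This is a sketch under stated assumptions: `int_or_none` here is a simplified local stand-in for the youtube-dl utility of the same name, `playback_format_id` is a hypothetical helper, and the sample playback dictionaries are invented.

```python
def int_or_none(value):
    """Simplified local stand-in for youtube-dl's int_or_none (assumption)."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def playback_format_id(playback):
    """Mirror the format_id expression from the NHLIE hunk.

    playback_format_id is a hypothetical name used only in this sketch.
    """
    height = int_or_none(playback.get('height'))
    # dict.get() only uses the default when the 'name' key is absent.
    return playback.get('name', 'http' + ('-%dp' % height if height else ''))


if __name__ == '__main__':
    # Hypothetical playback entries.
    print(playback_format_id({'name': 'FLASH_1800K_960X540', 'height': '540'}))
    print(playback_format_id({'height': '720'}))
    print(playback_format_id({}))
```

One consequence of using the `dict.get` default is that a playback whose `name` key is present but `None` yields a `format_id` of `None` rather than the `http-…` fallback.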

@ -2,6 +2,7 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from .screenwavemedia import ScreenwaveMediaIE
from ..utils import ( from ..utils import (
unified_strdate, unified_strdate,
@ -12,7 +13,6 @@ class NormalbootsIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?normalboots\.com/video/(?P<id>[0-9a-z-]*)/?$' _VALID_URL = r'https?://(?:www\.)?normalboots\.com/video/(?P<id>[0-9a-z-]*)/?$'
_TEST = { _TEST = {
'url': 'http://normalboots.com/video/home-alone-games-jontron/', 'url': 'http://normalboots.com/video/home-alone-games-jontron/',
'md5': '8bf6de238915dd501105b44ef5f1e0f6',
'info_dict': { 'info_dict': {
'id': 'home-alone-games-jontron', 'id': 'home-alone-games-jontron',
'ext': 'mp4', 'ext': 'mp4',
@ -22,9 +22,10 @@ class NormalbootsIE(InfoExtractor):
'upload_date': '20140125', 'upload_date': '20140125',
}, },
'params': { 'params': {
# rtmp download # m3u8 download
'skip_download': True, 'skip_download': True,
}, },
'add_ie': ['ScreenwaveMedia'],
} }
def _real_extract(self, url): def _real_extract(self, url):
@ -38,16 +39,15 @@ class NormalbootsIE(InfoExtractor):
r'<span style="text-transform:uppercase; font-size:inherit;">[A-Za-z]+, (?P<date>.*)</span>', r'<span style="text-transform:uppercase; font-size:inherit;">[A-Za-z]+, (?P<date>.*)</span>',
webpage, 'date', fatal=False)) webpage, 'date', fatal=False))
player_url = self._html_search_regex( screenwavemedia_url = self._html_search_regex(
r'<iframe\swidth="[0-9]+"\sheight="[0-9]+"\ssrc="(?P<url>[\S]+)"', ScreenwaveMediaIE.EMBED_PATTERN, webpage, 'screenwave URL',
webpage, 'player url') group='url')
player_page = self._download_webpage(player_url, video_id)
video_url = self._html_search_regex(
r"file:\s'(?P<file>[^']+\.mp4)'", player_page, 'file')
return { return {
'_type': 'url_transparent',
'id': video_id, 'id': video_id,
'url': video_url, 'url': screenwavemedia_url,
'ie_key': ScreenwaveMediaIE.ie_key(),
'title': self._og_search_title(webpage), 'title': self._og_search_title(webpage),
'description': self._og_search_description(webpage), 'description': self._og_search_description(webpage),
'thumbnail': self._og_search_thumbnail(webpage), 'thumbnail': self._og_search_thumbnail(webpage),

@ -4,37 +4,140 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import compat_urllib_parse_unquote
compat_urlparse,
compat_urllib_parse_unquote,
)
from ..utils import ( from ..utils import (
determine_ext,
ExtractorError, ExtractorError,
float_or_none, int_or_none,
parse_age_limit,
parse_duration, parse_duration,
unified_strdate,
) )
class NRKIE(InfoExtractor): class NRKBaseIE(InfoExtractor):
_VALID_URL = r'(?:nrk:|https?://(?:www\.)?nrk\.no/video/PS\*)(?P<id>\d+)' def _extract_formats(self, manifest_url, video_id, fatal=True):
formats = []
formats.extend(self._extract_f4m_formats(
manifest_url + '?hdcore=3.5.0&plugin=aasp-3.5.0.151.81',
video_id, f4m_id='hds', fatal=fatal))
formats.extend(self._extract_m3u8_formats(manifest_url.replace(
'akamaihd.net/z/', 'akamaihd.net/i/').replace('/manifest.f4m', '/master.m3u8'),
video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=fatal))
return formats
_TESTS = [ def _real_extract(self, url):
{ video_id = self._match_id(url)
data = self._download_json(
'http://%s/mediaelement/%s' % (self._API_HOST, video_id),
video_id, 'Downloading mediaelement JSON')
title = data.get('fullTitle') or data.get('mainTitle') or data['title']
video_id = data.get('id') or video_id
entries = []
media_assets = data.get('mediaAssets')
if media_assets and isinstance(media_assets, list):
def video_id_and_title(idx):
return ((video_id, title) if len(media_assets) == 1
else ('%s-%d' % (video_id, idx), '%s (Part %d)' % (title, idx)))
for num, asset in enumerate(media_assets, 1):
asset_url = asset.get('url')
if not asset_url:
continue
formats = self._extract_formats(asset_url, video_id, fatal=False)
if not formats:
continue
self._sort_formats(formats)
entry_id, entry_title = video_id_and_title(num)
duration = parse_duration(asset.get('duration'))
subtitles = {}
for subtitle in ('webVtt', 'timedText'):
subtitle_url = asset.get('%sSubtitlesUrl' % subtitle)
if subtitle_url:
subtitles.setdefault('no', []).append({'url': subtitle_url})
entries.append({
'id': asset.get('carrierId') or entry_id,
'title': entry_title,
'duration': duration,
'subtitles': subtitles,
'formats': formats,
})
if not entries:
media_url = data.get('mediaUrl')
if media_url:
formats = self._extract_formats(media_url, video_id)
self._sort_formats(formats)
duration = parse_duration(data.get('duration'))
entries = [{
'id': video_id,
'title': title,
'duration': duration,
'formats': formats,
}]
if not entries:
if data.get('usageRights', {}).get('isGeoBlocked'):
raise ExtractorError(
'NRK har ikke rettigheter til å vise dette programmet utenfor Norge',
expected=True)
conviva = data.get('convivaStatistics') or {}
series = conviva.get('seriesName') or data.get('seriesTitle')
episode = conviva.get('episodeName') or data.get('episodeNumberOrDate')
thumbnails = None
images = data.get('images')
if images and isinstance(images, dict):
web_images = images.get('webImages')
if isinstance(web_images, list):
thumbnails = [{
'url': image['imageUrl'],
'width': int_or_none(image.get('width')),
'height': int_or_none(image.get('height')),
} for image in web_images if image.get('imageUrl')]
description = data.get('description')
common_info = {
'description': description,
'series': series,
'episode': episode,
'age_limit': parse_age_limit(data.get('legalAge')),
'thumbnails': thumbnails,
}
vcodec = 'none' if data.get('mediaType') == 'Audio' else None
# TODO: extract chapters when https://github.com/rg3/youtube-dl/pull/9409 is merged
for entry in entries:
entry.update(common_info)
for f in entry['formats']:
f['vcodec'] = vcodec
return self.playlist_result(entries, video_id, title, description)
class NRKIE(NRKBaseIE):
_VALID_URL = r'(?:nrk:|https?://(?:www\.)?nrk\.no/video/PS\*)(?P<id>\d+)'
_API_HOST = 'v8.psapi.nrk.no'
_TESTS = [{
# video
'url': 'http://www.nrk.no/video/PS*150533', 'url': 'http://www.nrk.no/video/PS*150533',
'md5': 'bccd850baebefe23b56d708a113229c2', 'md5': '2f7f6eeb2aacdd99885f355428715cfa',
'info_dict': { 'info_dict': {
'id': '150533', 'id': '150533',
'ext': 'flv', 'ext': 'mp4',
'title': 'Dompap og andre fugler i Piip-Show', 'title': 'Dompap og andre fugler i Piip-Show',
'description': 'md5:d9261ba34c43b61c812cb6b0269a5c8f', 'description': 'md5:d9261ba34c43b61c812cb6b0269a5c8f',
'duration': 263, 'duration': 263,
} }
}, }, {
{ # audio
'url': 'http://www.nrk.no/video/PS*154915', 'url': 'http://www.nrk.no/video/PS*154915',
'md5': '0b1493ba1aae7d9579a5ad5531bc395a', # MD5 is unstable
'info_dict': { 'info_dict': {
'id': '154915', 'id': '154915',
'ext': 'flv', 'ext': 'flv',
@ -42,52 +145,75 @@ class NRKIE(InfoExtractor):
'description': 'md5:a621f5cc1bd75c8d5104cb048c6b8568', 'description': 'md5:a621f5cc1bd75c8d5104cb048c6b8568',
'duration': 20, 'duration': 20,
} }
},
]
def _real_extract(self, url):
video_id = self._match_id(url)
data = self._download_json(
'http://v8.psapi.nrk.no/mediaelement/%s' % video_id,
video_id, 'Downloading media JSON')
media_url = data.get('mediaUrl')
if not media_url:
if data['usageRights']['isGeoBlocked']:
raise ExtractorError(
'NRK har ikke rettigheter til å vise dette programmet utenfor Norge',
expected=True)
if determine_ext(media_url) == 'f4m':
formats = self._extract_f4m_formats(
media_url + '?hdcore=3.5.0&plugin=aasp-3.5.0.151.81', video_id, f4m_id='hds')
self._sort_formats(formats)
else:
formats = [{
'url': media_url,
'ext': 'flv',
}] }]
duration = parse_duration(data.get('duration'))
images = data.get('images') class NRKTVIE(NRKBaseIE):
if images: IE_DESC = 'NRK TV and NRK Radio'
thumbnails = images['webImages'] _VALID_URL = r'https?://(?:tv|radio)\.nrk(?:super)?\.no/(?:serie/[^/]+|program)/(?P<id>[a-zA-Z]{4}\d{8})(?:/\d{2}-\d{2}-\d{4})?(?:#del=(?P<part_id>\d+))?'
thumbnails.sort(key=lambda image: image['pixelWidth']) _API_HOST = 'psapi-we.nrk.no'
thumbnail = thumbnails[-1]['imageUrl']
else:
thumbnail = None
return { _TESTS = [{
'id': video_id, 'url': 'https://tv.nrk.no/serie/20-spoersmaal-tv/MUHH48000314/23-05-2014',
'title': data['title'], 'md5': '4e9ca6629f09e588ed240fb11619922a',
'description': data['description'], 'info_dict': {
'duration': duration, 'id': 'MUHH48000314AA',
'thumbnail': thumbnail, 'ext': 'mp4',
'formats': formats, 'title': '20 spørsmål 23.05.2014',
} 'description': 'md5:bdea103bc35494c143c6a9acdd84887a',
'duration': 1741.52,
},
}, {
'url': 'https://tv.nrk.no/program/mdfp15000514',
'md5': '43d0be26663d380603a9cf0c24366531',
'info_dict': {
'id': 'MDFP15000514CA',
'ext': 'mp4',
'title': 'Grunnlovsjubiléet - Stor ståhei for ingenting 24.05.2014',
'description': 'md5:89290c5ccde1b3a24bb8050ab67fe1db',
'duration': 4605.08,
},
}, {
# single playlist video
'url': 'https://tv.nrk.no/serie/tour-de-ski/MSPO40010515/06-01-2015#del=2',
'md5': 'adbd1dbd813edaf532b0a253780719c2',
'info_dict': {
'id': 'MSPO40010515-part2',
'ext': 'flv',
'title': 'Tour de Ski: Sprint fri teknikk, kvinner og menn 06.01.2015 (del 2:2)',
'description': 'md5:238b67b97a4ac7d7b4bf0edf8cc57d26',
},
'skip': 'Only works from Norway',
}, {
'url': 'https://tv.nrk.no/serie/tour-de-ski/MSPO40010515/06-01-2015',
'playlist': [{
'md5': '9480285eff92d64f06e02a5367970a7a',
'info_dict': {
'id': 'MSPO40010515-part1',
'ext': 'flv',
'title': 'Tour de Ski: Sprint fri teknikk, kvinner og menn 06.01.2015 (del 1:2)',
'description': 'md5:238b67b97a4ac7d7b4bf0edf8cc57d26',
},
}, {
'md5': 'adbd1dbd813edaf532b0a253780719c2',
'info_dict': {
'id': 'MSPO40010515-part2',
'ext': 'flv',
'title': 'Tour de Ski: Sprint fri teknikk, kvinner og menn 06.01.2015 (del 2:2)',
'description': 'md5:238b67b97a4ac7d7b4bf0edf8cc57d26',
},
}],
'info_dict': {
'id': 'MSPO40010515',
'title': 'Tour de Ski: Sprint fri teknikk, kvinner og menn',
'description': 'md5:238b67b97a4ac7d7b4bf0edf8cc57d26',
'duration': 6947.52,
},
'skip': 'Only works from Norway',
}, {
'url': 'https://radio.nrk.no/serie/dagsnytt/NPUB21019315/12-07-2015#',
'only_matching': True,
}]
class NRKPlaylistIE(InfoExtractor): class NRKPlaylistIE(InfoExtractor):
@ -159,179 +285,3 @@ class NRKSkoleIE(InfoExtractor):
nrk_id = self._search_regex(r'data-nrk-id=["\'](\d+)', webpage, 'nrk id') nrk_id = self._search_regex(r'data-nrk-id=["\'](\d+)', webpage, 'nrk id')
return self.url_result('nrk:%s' % nrk_id) return self.url_result('nrk:%s' % nrk_id)
class NRKTVIE(InfoExtractor):
IE_DESC = 'NRK TV and NRK Radio'
_VALID_URL = r'(?P<baseurl>https?://(?:tv|radio)\.nrk(?:super)?\.no/)(?:serie/[^/]+|program)/(?P<id>[a-zA-Z]{4}\d{8})(?:/\d{2}-\d{2}-\d{4})?(?:#del=(?P<part_id>\d+))?'
_TESTS = [
{
'url': 'https://tv.nrk.no/serie/20-spoersmaal-tv/MUHH48000314/23-05-2014',
'info_dict': {
'id': 'MUHH48000314',
'ext': 'mp4',
'title': '20 spørsmål',
'description': 'md5:bdea103bc35494c143c6a9acdd84887a',
'upload_date': '20140523',
'duration': 1741.52,
},
'params': {
# m3u8 download
'skip_download': True,
},
},
{
'url': 'https://tv.nrk.no/program/mdfp15000514',
'info_dict': {
'id': 'mdfp15000514',
'ext': 'mp4',
'title': 'Grunnlovsjubiléet - Stor ståhei for ingenting',
'description': 'md5:654c12511f035aed1e42bdf5db3b206a',
'upload_date': '20140524',
'duration': 4605.08,
},
'params': {
# m3u8 download
'skip_download': True,
},
},
{
# single playlist video
'url': 'https://tv.nrk.no/serie/tour-de-ski/MSPO40010515/06-01-2015#del=2',
'md5': 'adbd1dbd813edaf532b0a253780719c2',
'info_dict': {
'id': 'MSPO40010515-part2',
'ext': 'flv',
'title': 'Tour de Ski: Sprint fri teknikk, kvinner og menn 06.01.2015 (del 2:2)',
'description': 'md5:238b67b97a4ac7d7b4bf0edf8cc57d26',
'upload_date': '20150106',
},
'skip': 'Only works from Norway',
},
{
'url': 'https://tv.nrk.no/serie/tour-de-ski/MSPO40010515/06-01-2015',
'playlist': [
{
'md5': '9480285eff92d64f06e02a5367970a7a',
'info_dict': {
'id': 'MSPO40010515-part1',
'ext': 'flv',
'title': 'Tour de Ski: Sprint fri teknikk, kvinner og menn 06.01.2015 (del 1:2)',
'description': 'md5:238b67b97a4ac7d7b4bf0edf8cc57d26',
'upload_date': '20150106',
},
},
{
'md5': 'adbd1dbd813edaf532b0a253780719c2',
'info_dict': {
'id': 'MSPO40010515-part2',
'ext': 'flv',
'title': 'Tour de Ski: Sprint fri teknikk, kvinner og menn 06.01.2015 (del 2:2)',
'description': 'md5:238b67b97a4ac7d7b4bf0edf8cc57d26',
'upload_date': '20150106',
},
},
],
'info_dict': {
'id': 'MSPO40010515',
'title': 'Tour de Ski: Sprint fri teknikk, kvinner og menn',
'description': 'md5:238b67b97a4ac7d7b4bf0edf8cc57d26',
'upload_date': '20150106',
'duration': 6947.5199999999995,
},
'skip': 'Only works from Norway',
},
{
'url': 'https://radio.nrk.no/serie/dagsnytt/NPUB21019315/12-07-2015#',
'only_matching': True,
}
]
def _extract_f4m(self, manifest_url, video_id):
return self._extract_f4m_formats(
manifest_url + '?hdcore=3.1.1&plugin=aasp-3.1.1.69.124', video_id, f4m_id='hds')
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
part_id = mobj.group('part_id')
base_url = mobj.group('baseurl')
webpage = self._download_webpage(url, video_id)
title = self._html_search_meta(
'title', webpage, 'title')
description = self._html_search_meta(
'description', webpage, 'description')
thumbnail = self._html_search_regex(
r'data-posterimage="([^"]+)"',
webpage, 'thumbnail', fatal=False)
upload_date = unified_strdate(self._html_search_meta(
'rightsfrom', webpage, 'upload date', fatal=False))
duration = float_or_none(self._html_search_regex(
r'data-duration="([^"]+)"',
webpage, 'duration', fatal=False))
# playlist
parts = re.findall(
r'<a href="#del=(\d+)"[^>]+data-argument="([^"]+)">([^<]+)</a>', webpage)
if parts:
entries = []
for current_part_id, stream_url, part_title in parts:
if part_id and current_part_id != part_id:
continue
video_part_id = '%s-part%s' % (video_id, current_part_id)
formats = self._extract_f4m(stream_url, video_part_id)
entries.append({
'id': video_part_id,
'title': part_title,
'description': description,
'thumbnail': thumbnail,
'upload_date': upload_date,
'formats': formats,
})
if part_id:
if entries:
return entries[0]
else:
playlist = self.playlist_result(entries, video_id, title, description)
playlist.update({
'thumbnail': thumbnail,
'upload_date': upload_date,
'duration': duration,
})
return playlist
formats = []
f4m_url = re.search(r'data-media="([^"]+)"', webpage)
if f4m_url:
formats.extend(self._extract_f4m(f4m_url.group(1), video_id))
m3u8_url = re.search(r'data-hls-media="([^"]+)"', webpage)
if m3u8_url:
formats.extend(self._extract_m3u8_formats(m3u8_url.group(1), video_id, 'mp4', m3u8_id='hls'))
self._sort_formats(formats)
subtitles_url = self._html_search_regex(
r'data-subtitlesurl\s*=\s*(["\'])(?P<url>.+?)\1',
webpage, 'subtitle URL', default=None, group='url')
subtitles = {}
if subtitles_url:
subtitles['no'] = [{
'ext': 'ttml',
'url': compat_urlparse.urljoin(base_url, subtitles_url),
}]
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'upload_date': upload_date,
'duration': duration,
'formats': formats,
'subtitles': subtitles,
}
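
Two small pieces of the new `NRKBaseIE` shown earlier in this file's diff lend themselves to a standalone check: the rewrite of an HDS manifest URL into its HLS counterpart, and the naming of multi-part entries. The sketch below assumes nothing beyond the string operations visible in the hunk; the function names `hds_to_hls` and `part_id_and_title` and the example URL are hypothetical.

```python
def hds_to_hls(manifest_url):
    """Derive the HLS master playlist URL from an Akamai HDS manifest URL.

    Uses the same two replacements as NRKBaseIE._extract_formats.
    """
    return manifest_url.replace(
        'akamaihd.net/z/', 'akamaihd.net/i/').replace(
        '/manifest.f4m', '/master.m3u8')


def part_id_and_title(video_id, title, asset_count, idx):
    """Mirror the nested video_id_and_title helper from the hunk.

    A single asset keeps the original id and title; several assets get
    an index suffix and a "(Part N)" title.
    """
    if asset_count == 1:
        return video_id, title
    return '%s-%d' % (video_id, idx), '%s (Part %d)' % (title, idx)


if __name__ == '__main__':
    # Hypothetical manifest URL, for illustration only.
    print(hds_to_hls('http://example-vh.akamaihd.net/z/path/clip.smil/manifest.f4m'))
    print(part_id_and_title('MSPO40010515', 'Tour de Ski', 2, 2))
```

In the extractor itself the generated id is only a fallback, since each entry prefers `asset.get('carrierId')` when it is available.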
