Merge remote-tracking branch 'upstream/master'

This commit is contained in:
Redwid 2018-08-20 09:55:43 +01:00
commit c3f284639e
182 changed files with 6250 additions and 3042 deletions

View File

@ -6,8 +6,8 @@
--- ---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2018.05.09*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected. ### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2018.08.04*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2018.05.09** - [ ] I've **verified** and **I assure** that I'm running youtube-dl **2018.08.04**
### Before submitting an *issue* make sure you have: ### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections - [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@ -36,7 +36,7 @@ Add the `-v` flag to **your command line** you run youtube-dl with (`youtube-dl
[debug] User config: [] [debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj'] [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2018.05.09 [debug] youtube-dl version 2018.08.04
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {} [debug] Proxy map: {}

4
.gitignore vendored
View File

@ -47,3 +47,7 @@ youtube-dl.zsh
*.iml *.iml
tmp/ tmp/
venv/
# VS Code related files
.vscode

View File

@ -239,3 +239,10 @@ Martin Weinelt
Surya Oktafendri Surya Oktafendri
TingPing TingPing
Alexandre Macabies Alexandre Macabies
Bastian de Groot
Niklas Haas
András Veres-Szentkirályi
Enes Solak
Nathan Rossi
Thomas van der Berg
Luca Cherubin

280
ChangeLog
View File

@ -1,3 +1,283 @@
version 2018.08.04
Extractors
* [funk:channel] Improve byChannelAlias extraction (#17142)
* [twitch] Fix authentication (#17024, #17126)
* [twitch:vod] Improve URL regular expression (#17135)
* [watchbox] Fix extraction (#17107)
* [pbs] Fix extraction (#17109)
* [theplatform] Relax URL regular expression (#16181, #17097)
+ [viqeo] Add support for viqeo.tv (#17066)
version 2018.07.29
Extractors
* [crunchyroll:playlist] Restrict URL regular expression (#17069, #17076)
+ [pornhub] Add support for subtitles (#16924, #17088)
* [ceskatelevize] Use https for API call (#16997, #16999)
* [dailymotion:playlist] Fix extraction (#16894)
* [ted] Improve extraction
* [ted] Fix extraction for videos without nativeDownloads (#16756, #17085)
* [telecinco] Fix extraction (#17080)
* [mitele] Reduce number of requests
* [rai] Return non HTTP relinker URL intact (#17055)
* [vk] Fix extraction for inline only videos (#16923)
* [streamcloud] Fix extraction (#17054)
* [facebook] Fix tahoe player extraction with authentication (#16655)
+ [puhutv] Add support for puhutv.com (#12712, #16010, #16269)
version 2018.07.21
Core
+ [utils] Introduce url_or_none
* [utils] Allow JSONP without function name (#17028)
+ [extractor/common] Extract DASH and MSS formats from SMIL manifests
Extractors
+ [bbc] Add support for BBC Radio Play pages (#17022)
* [iwara] Fix download URLs (#17026)
* [vrtnu] Relax title extraction and extract JSON-LD (#17018)
+ [viu] Pass Referer and Origin headers and area id (#16992)
+ [vimeo] Add another config regular expression (#17013)
+ [facebook] Extract view count (#16942)
* [dailymotion] Improve description extraction (#16984)
* [slutload] Fix and improve extraction (#17001)
* [mediaset] Fix extraction (#16977)
+ [theplatform] Add support for theplatform TLD customization (#16977)
* [imgur] Relax URL regular expression (#16987)
* [pornhub] Improve extraction and extract all formats (#12166, #15891, #16262,
#16959)
version 2018.07.10
Core
* [utils] Share JSON-LD regular expression
* [downloader/dash] Improve error handling (#16927)
Extractors
+ [nrktv] Add support for new season and serie URL schema
+ [nrktv] Add support for new episode URL schema (#16909)
+ [frontendmasters] Add support for frontendmasters.com (#3661, #16328)
* [funk] Fix extraction (#16918)
* [watchbox] Fix extraction (#16904)
* [dplayit] Sort formats
* [dplayit] Fix extraction (#16901)
* [youtube] Improve login error handling (#13822)
version 2018.07.04
Core
* [extractor/common] Properly escape % in MPD templates (#16867)
* [extractor/common] Use source URL as Referer for HTML5 entries (16849)
* Prefer ffmpeg over avconv by default (#8622)
Extractors
* [pluralsight] Switch to graphql (#16889, #16895, #16896, #16899)
* [lynda] Simplify login and improve error capturing (#16891)
+ [go90] Add support for embed URLs (#16873)
* [go90] Detect geo restriction error and pass geo verification headers
(#16874)
* [vlive] Fix live streams extraction (#16871)
* [npo] Fix typo (#16872)
+ [mediaset] Add support for new videos and extract all formats (#16568)
* [dctptv] Restore extraction based on REST API (#16850)
* [svt] Improve extraction and add support for pages (#16802)
* [porncom] Fix extraction (#16808)
version 2018.06.25
Extractors
* [joj] Relax URL regular expression (#16771)
* [brightcove] Workaround sonyliv DRM protected videos (#16807)
* [motherless] Fix extraction (#16786)
* [itv] Make SOAP request non fatal and extract metadata from webpage (#16780)
- [foxnews:insider] Remove extractor (#15810)
+ [foxnews] Add support for iframe embeds (#15810, #16711)
version 2018.06.19
Core
+ [extractor/common] Introduce expected_status in _download_* methods
for convenient accept of HTTP requests failed with non 2xx status codes
+ [compat] Introduce compat_integer_types
Extractors
* [peertube] Improve generic support (#16733)
+ [6play] Use geo verification headers
* [rtbf] Fix extraction for python 3.2
* [vgtv] Improve HLS formats extraction
+ [vgtv] Add support for www.aftonbladet.se/tv URLs
* [bbccouk] Use expected_status
* [markiza] Expect 500 HTTP status code
* [tvnow] Try all clear manifest URLs (#15361)
version 2018.06.18
Core
* [downloader/rtmp] Fix downloading in verbose mode (#16736)
Extractors
+ [markiza] Add support for markiza.sk (#16750)
* [wat] Try all supported adaptive URLs
+ [6play] Add support for rtlplay.be and extract hd usp formats
+ [rtbf] Add support for audio and live streams (#9638, #11923)
+ [rtbf] Extract HLS, DASH and all HTTP formats
+ [rtbf] Extract subtitles
+ [rtbf] Fixup specific HTTP URLs (#16101)
+ [expressen] Add support for expressen.se
* [vidzi] Fix extraction (#16678)
* [pbs] Improve extraction (#16623, #16684)
* [bilibili] Restrict cid regular expression (#16638, #16734)
version 2018.06.14
Core
* [downloader/http] Fix retry on error when streaming to stdout (#16699)
Extractors
+ [discoverynetworks] Add support for disco-api videos (#16724)
+ [dailymotion] Add support for password protected videos (#9789)
+ [abc:iview] Add support for livestreams (#12354)
* [abc:iview] Fix extraction (#16704)
+ [crackle] Add support for sonycrackle.com (#16698)
+ [tvnet] Add support for tvnet.gov.vn (#15462)
* [nrk] Update API hosts and try all previously known ones (#16690)
* [wimp] Fix Youtube embeds extraction
version 2018.06.11
Extractors
* [npo] Extend URL regular expression and add support for npostart.nl (#16682)
+ [inc] Add support for another embed schema (#16666)
* [tv4] Fix format extraction (#16650)
+ [nexx] Add support for free cdn (#16538)
+ [pbs] Add another cove id pattern (#15373)
+ [rbmaradio] Add support for 192k format (#16631)
version 2018.06.04
Extractors
+ [camtube] Add support for camtube.co
+ [twitter:card] Extract guest token (#16609)
+ [chaturbate] Use geo verification headers
+ [bbc] Add support for bbcthree (#16612)
* [youtube] Move metadata extraction after video availability check
+ [youtube] Extract track and artist
+ [safari] Add support for new URL schema (#16614)
* [adn] Fix extraction
version 2018.06.02
Core
* [utils] Improve determine_ext
Extractors
+ [facebook] Add support for tahoe player videos (#15441, #16554)
* [cbc] Improve extraction (#16583, #16593)
* [openload] Improve ext extraction (#16595)
+ [twitter:card] Add support for another endpoint (#16586)
+ [openload] Add support for oload.win and oload.download (#16592)
* [audimedia] Fix extraction (#15309)
+ [francetv] Add support for sport.francetvinfo.fr (#15645)
* [mlb] Improve extraction (#16587)
- [nhl] Remove old extractors
* [rbmaradio] Check formats availability (#16585)
version 2018.05.30
Core
* [downloader/rtmp] Generalize download messages and report time elapsed
on finish
* [downloader/rtmp] Gracefully handle live streams interrupted by user
Extractors
* [teamcoco] Fix extraction for full episodes (#16573)
* [spiegel] Fix info extraction (#16538)
+ [apa] Add support for apa.at (#15041, #15672)
+ [bellmedia] Add support for bnnbloomberg.ca (#16560)
+ [9c9media] Extract MPD formats and subtitles
* [cammodels] Use geo verification headers
+ [ufctv] Add support for authentication (#16542)
+ [cammodels] Add support for cammodels.com (#14499)
* [utils] Fix style id extraction for namespaced id attribute in dfxp2srt
(#16551)
* [soundcloud] Detect format extension (#16549)
* [cbc] Fix playlist title extraction (#16502)
+ [tumblr] Detect and report sensitive media (#13829)
+ [tumblr] Add support for authentication (#15133)
version 2018.05.26
Core
* [utils] Improve parse_age_limit
Extractors
* [audiomack] Stringify video id (#15310)
* [izlesene] Fix extraction (#16233, #16271, #16407)
+ [indavideo] Add support for generic embeds (#11989)
* [indavideo] Fix extraction (#11221)
* [indavideo] Sign download URLs (#16174)
+ [peertube] Add support for PeerTube based sites (#16301, #16329)
* [imgur] Fix extraction (#16537)
+ [hidive] Add support for authentication (#16534)
+ [nbc] Add support for stream.nbcsports.com (#13911)
+ [viewlift] Add support for hoichoi.tv (#16536)
* [go90] Extract age limit and detect DRM protection(#10127)
* [viewlift] fix extraction for snagfilms.com (#15766)
* [globo] Improve extraction (#4189)
* Add support for authentication
* Simplify URL signing
* Extract DASH and MSS formats
* [leeco] Fix extraction (#16464)
* [teamcoco] Add fallback for format extraction (#16484)
* [teamcoco] Improve URL regular expression (#16484)
* [imdb] Improve extraction (#4085, #14557)
version 2018.05.18
Extractors
* [vimeo:likes] Relax URL regular expression and fix single page likes
extraction (#16475)
* [pluralsight] Fix clip id extraction (#16460)
+ [mychannels] Add support for mychannels.com (#15334)
- [moniker] Remove extractor (#15336)
* [pbs] Fix embed data extraction (#16474)
+ [mtv] Add support for paramountnetwork.com and bellator.com (#15418)
* [youtube] Fix hd720 format position
* [dailymotion] Remove fragment part from m3u8 URLs (#8915)
* [3sat] Improve extraction (#15350)
* Extract all formats
* Extract more format metadata
* Improve format sorting
* Use hls native downloader
* Detect and bypass geo-restriction
+ [dtube] Add support for d.tube (#15201)
* [options] Fix typo (#16450)
* [youtube] Improve format filesize extraction (#16453)
* [youtube] Make uploader extraction non fatal (#16444)
* [youtube] Fix extraction for embed restricted live streams (#16433)
* [nbc] Improve info extraction (#16440)
* [twitch:clips] Fix extraction (#16429)
* [redditr] Relax URL regular expression (#16426, #16427)
* [mixcloud] Bypass throttling for HTTP formats (#12579, #16424)
+ [nick] Add support for nickjr.de (#13230)
* [teamcoco] Fix extraction (#16374)
version 2018.05.09 version 2018.05.09
Core Core

View File

@ -18,7 +18,7 @@ Forked version that could work on android
# INSTALLATION # INSTALLATION
To install it right away for all UNIX users (Linux, OS X, etc.), type: To install it right away for all UNIX users (Linux, macOS, etc.), type:
sudo curl -L https://yt-dl.org/downloads/latest/youtube-dl -o /usr/local/bin/youtube-dl sudo curl -L https://yt-dl.org/downloads/latest/youtube-dl -o /usr/local/bin/youtube-dl
sudo chmod a+rx /usr/local/bin/youtube-dl sudo chmod a+rx /usr/local/bin/youtube-dl
@ -36,7 +36,7 @@ You can also use pip:
This command will update youtube-dl if you have already installed it. See the [pypi page](https://pypi.python.org/pypi/youtube_dl) for more information. This command will update youtube-dl if you have already installed it. See the [pypi page](https://pypi.python.org/pypi/youtube_dl) for more information.
OS X users can install youtube-dl with [Homebrew](https://brew.sh/): macOS users can install youtube-dl with [Homebrew](https://brew.sh/):
brew install youtube-dl brew install youtube-dl
@ -94,8 +94,8 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
## Network Options: ## Network Options:
--proxy URL Use the specified HTTP/HTTPS/SOCKS proxy. --proxy URL Use the specified HTTP/HTTPS/SOCKS proxy.
To enable experimental SOCKS proxy, specify To enable SOCKS proxy, specify a proper
a proper scheme. For example scheme. For example
socks5://127.0.0.1:1080/. Pass in an empty socks5://127.0.0.1:1080/. Pass in an empty
string (--proxy "") for direct connection string (--proxy "") for direct connection
--socket-timeout SECONDS Time to wait before giving up, in seconds --socket-timeout SECONDS Time to wait before giving up, in seconds
@ -107,19 +107,18 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
--geo-verification-proxy URL Use this proxy to verify the IP address for --geo-verification-proxy URL Use this proxy to verify the IP address for
some geo-restricted sites. The default some geo-restricted sites. The default
proxy specified by --proxy (or none, if the proxy specified by --proxy (or none, if the
options is not present) is used for the option is not present) is used for the
actual downloading. actual downloading.
--geo-bypass Bypass geographic restriction via faking --geo-bypass Bypass geographic restriction via faking
X-Forwarded-For HTTP header (experimental) X-Forwarded-For HTTP header
--no-geo-bypass Do not bypass geographic restriction via --no-geo-bypass Do not bypass geographic restriction via
faking X-Forwarded-For HTTP header faking X-Forwarded-For HTTP header
(experimental)
--geo-bypass-country CODE Force bypass geographic restriction with --geo-bypass-country CODE Force bypass geographic restriction with
explicitly provided two-letter ISO 3166-2 explicitly provided two-letter ISO 3166-2
country code (experimental) country code
--geo-bypass-ip-block IP_BLOCK Force bypass geographic restriction with --geo-bypass-ip-block IP_BLOCK Force bypass geographic restriction with
explicitly provided IP block in CIDR explicitly provided IP block in CIDR
notation (experimental) notation
## Video Selection: ## Video Selection:
--playlist-start NUMBER Playlist video to start at (default is 1) --playlist-start NUMBER Playlist video to start at (default is 1)
@ -210,7 +209,7 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
--playlist-reverse Download playlist videos in reverse order --playlist-reverse Download playlist videos in reverse order
--playlist-random Download playlist videos in random order --playlist-random Download playlist videos in random order
--xattr-set-filesize Set file xattribute ytdl.filesize with --xattr-set-filesize Set file xattribute ytdl.filesize with
expected file size (experimental) expected file size
--hls-prefer-native Use the native HLS downloader instead of --hls-prefer-native Use the native HLS downloader instead of
ffmpeg ffmpeg
--hls-prefer-ffmpeg Use ffmpeg instead of the native HLS --hls-prefer-ffmpeg Use ffmpeg instead of the native HLS
@ -429,9 +428,9 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
default; fix file if we can, warn default; fix file if we can, warn
otherwise) otherwise)
--prefer-avconv Prefer avconv over ffmpeg for running the --prefer-avconv Prefer avconv over ffmpeg for running the
postprocessors (default)
--prefer-ffmpeg Prefer ffmpeg over avconv for running the
postprocessors postprocessors
--prefer-ffmpeg Prefer ffmpeg over avconv for running the
postprocessors (default)
--ffmpeg-location PATH Location of the ffmpeg/avconv binary; --ffmpeg-location PATH Location of the ffmpeg/avconv binary;
either the path to the binary or its either the path to the binary or its
containing directory. containing directory.
@ -444,7 +443,7 @@ Alternatively, refer to the [developer instructions](#developer-instructions) fo
# CONFIGURATION # CONFIGURATION
You can configure youtube-dl by placing any supported command line option to a configuration file. On Linux and OS X, the system wide configuration file is located at `/etc/youtube-dl.conf` and the user wide configuration file at `~/.config/youtube-dl/config`. On Windows, the user wide configuration file locations are `%APPDATA%\youtube-dl\config.txt` or `C:\Users\<user name>\youtube-dl.conf`. Note that by default configuration file may not exist so you may need to create it yourself. You can configure youtube-dl by placing any supported command line option to a configuration file. On Linux and macOS, the system wide configuration file is located at `/etc/youtube-dl.conf` and the user wide configuration file at `~/.config/youtube-dl/config`. On Windows, the user wide configuration file locations are `%APPDATA%\youtube-dl\config.txt` or `C:\Users\<user name>\youtube-dl.conf`. Note that by default configuration file may not exist so you may need to create it yourself.
For example, with the following configuration file youtube-dl will always extract the audio, not copy the mtime, use a proxy and save all videos under `Movies` directory in your home directory: For example, with the following configuration file youtube-dl will always extract the audio, not copy the mtime, use a proxy and save all videos under `Movies` directory in your home directory:
``` ```
@ -872,7 +871,7 @@ Either prepend `https://www.youtube.com/watch?v=` or separate the ID from the op
Use the `--cookies` option, for example `--cookies /path/to/cookies/file.txt`. Use the `--cookies` option, for example `--cookies /path/to/cookies/file.txt`.
In order to extract cookies from browser use any conforming browser extension for exporting cookies. For example, [cookies.txt](https://chrome.google.com/webstore/detail/cookiestxt/njabckikapfpffapmjgojcnbfjonfjfg) (for Chrome) or [Export Cookies](https://addons.mozilla.org/en-US/firefox/addon/export-cookies/) (for Firefox). In order to extract cookies from browser use any conforming browser extension for exporting cookies. For example, [cookies.txt](https://chrome.google.com/webstore/detail/cookiestxt/njabckikapfpffapmjgojcnbfjonfjfg) (for Chrome) or [cookies.txt](https://addons.mozilla.org/en-US/firefox/addon/cookies-txt/) (for Firefox).
Note that the cookies file must be in Mozilla/Netscape format and the first line of the cookies file must be either `# HTTP Cookie File` or `# Netscape HTTP Cookie File`. Make sure you have correct [newline format](https://en.wikipedia.org/wiki/Newline) in the cookies file and convert newlines if necessary to correspond with your OS, namely `CRLF` (`\r\n`) for Windows and `LF` (`\n`) for Unix and Unix-like systems (Linux, macOS, etc.). `HTTP Error 400: Bad Request` when using `--cookies` is a good sign of invalid newline format. Note that the cookies file must be in Mozilla/Netscape format and the first line of the cookies file must be either `# HTTP Cookie File` or `# Netscape HTTP Cookie File`. Make sure you have correct [newline format](https://en.wikipedia.org/wiki/Newline) in the cookies file and convert newlines if necessary to correspond with your OS, namely `CRLF` (`\r\n`) for Windows and `LF` (`\n`) for Unix and Unix-like systems (Linux, macOS, etc.). `HTTP Error 400: Bad Request` when using `--cookies` is a good sign of invalid newline format.

View File

@ -13,7 +13,7 @@ year = str(datetime.datetime.now().year)
for fn in glob.glob('*.html*'): for fn in glob.glob('*.html*'):
with io.open(fn, encoding='utf-8') as f: with io.open(fn, encoding='utf-8') as f:
content = f.read() content = f.read()
newc = re.sub(r'(?P<copyright>Copyright © 2006-)(?P<year>[0-9]{4})', 'Copyright © 2006-' + year, content) newc = re.sub(r'(?P<copyright>Copyright © 2011-)(?P<year>[0-9]{4})', 'Copyright © 2011-' + year, content)
if content != newc: if content != newc:
tmpFn = fn + '.part' tmpFn = fn + '.part'
with io.open(tmpFn, 'wt', encoding='utf-8') as outf: with io.open(tmpFn, 'wt', encoding='utf-8') as outf:

View File

@ -15,7 +15,6 @@
- **8tracks** - **8tracks**
- **91porn** - **91porn**
- **9c9media** - **9c9media**
- **9c9media:stack**
- **9gag** - **9gag**
- **9now.com.au** - **9now.com.au**
- **abc.net.au** - **abc.net.au**
@ -48,6 +47,7 @@
- **anitube.se** - **anitube.se**
- **Anvato** - **Anvato**
- **AnySex** - **AnySex**
- **APA**
- **Aparat** - **Aparat**
- **AppleConnect** - **AppleConnect**
- **AppleDaily**: 臺灣蘋果日報 - **AppleDaily**: 臺灣蘋果日報
@ -100,6 +100,7 @@
- **Beatport** - **Beatport**
- **Beeg** - **Beeg**
- **BehindKink** - **BehindKink**
- **Bellator**
- **BellMedia** - **BellMedia**
- **Bet** - **Bet**
- **Bigflix** - **Bigflix**
@ -127,6 +128,8 @@
- **BYUtv** - **BYUtv**
- **Camdemy** - **Camdemy**
- **CamdemyFolder** - **CamdemyFolder**
- **CamModels**
- **CamTube**
- **CamWithHer** - **CamWithHer**
- **canalc2.tv** - **canalc2.tv**
- **Canalplus**: mycanal.fr and piwiplus.fr - **Canalplus**: mycanal.fr and piwiplus.fr
@ -234,6 +237,7 @@
- **DrTuber** - **DrTuber**
- **drtv** - **drtv**
- **drtv:live** - **drtv:live**
- **DTube**
- **Dumpert** - **Dumpert**
- **dvtv**: http://video.aktualne.cz/ - **dvtv**: http://video.aktualne.cz/
- **dw** - **dw**
@ -262,6 +266,7 @@
- **Europa** - **Europa**
- **EveryonesMixtape** - **EveryonesMixtape**
- **ExpoTV** - **ExpoTV**
- **Expressen**
- **ExtremeTube** - **ExtremeTube**
- **EyedoTV** - **EyedoTV**
- **facebook** - **facebook**
@ -285,7 +290,6 @@
- **Foxgay** - **Foxgay**
- **foxnews**: Fox News and Fox Business Video - **foxnews**: Fox News and Fox Business Video
- **foxnews:article** - **foxnews:article**
- **foxnews:insider**
- **FoxSports** - **FoxSports**
- **france2.fr:generation-what** - **france2.fr:generation-what**
- **FranceCulture** - **FranceCulture**
@ -298,6 +302,9 @@
- **Freesound** - **Freesound**
- **freespeech.org** - **freespeech.org**
- **FreshLive** - **FreshLive**
- **FrontendMasters**
- **FrontendMastersCourse**
- **FrontendMastersLesson**
- **Funimation** - **Funimation**
- **FunkChannel** - **FunkChannel**
- **FunkMix** - **FunkMix**
@ -363,7 +370,6 @@
- **ImgurAlbum** - **ImgurAlbum**
- **Ina** - **Ina**
- **Inc** - **Inc**
- **Indavideo**
- **IndavideoEmbed** - **IndavideoEmbed**
- **InfoQ** - **InfoQ**
- **Instagram** - **Instagram**
@ -448,11 +454,12 @@
- **mailru**: Видео@Mail.Ru - **mailru**: Видео@Mail.Ru
- **mailru:music**: Музыка@Mail.Ru - **mailru:music**: Музыка@Mail.Ru
- **mailru:music:search**: Музыка@Mail.Ru - **mailru:music:search**: Музыка@Mail.Ru
- **MakersChannel**
- **MakerTV** - **MakerTV**
- **mangomolo:live** - **mangomolo:live**
- **mangomolo:video** - **mangomolo:video**
- **ManyVids** - **ManyVids**
- **Markiza**
- **MarkizaPage**
- **massengeschmack.tv** - **massengeschmack.tv**
- **MatchTV** - **MatchTV**
- **MDR**: MDR.DE and KiKA - **MDR**: MDR.DE and KiKA
@ -486,7 +493,6 @@
- **MoeVideo**: LetitBit video services: moevideo.net, playreplay.net and videochart.net - **MoeVideo**: LetitBit video services: moevideo.net, playreplay.net and videochart.net
- **Mofosex** - **Mofosex**
- **Mojvideo** - **Mojvideo**
- **Moniker**: allmyvideos.net and vidspot.net
- **Morningstar**: morningstar.com - **Morningstar**: morningstar.com
- **Motherless** - **Motherless**
- **MotherlessGroup** - **MotherlessGroup**
@ -508,6 +514,7 @@
- **mva:course**: Microsoft Virtual Academy courses - **mva:course**: Microsoft Virtual Academy courses
- **Mwave** - **Mwave**
- **MwaveMeetGreet** - **MwaveMeetGreet**
- **MyChannels**
- **MySpace** - **MySpace**
- **MySpace:album** - **MySpace:album**
- **MySpass** - **MySpass**
@ -525,6 +532,7 @@
- **nbcolympics** - **nbcolympics**
- **nbcolympics:stream** - **nbcolympics:stream**
- **NBCSports** - **NBCSports**
- **NBCSportsStream**
- **NBCSportsVPlayer** - **NBCSportsVPlayer**
- **ndr**: NDR.de - Norddeutscher Rundfunk - **ndr**: NDR.de - Norddeutscher Rundfunk
- **ndr:embed** - **ndr:embed**
@ -551,9 +559,6 @@
- **nfl.com** - **nfl.com**
- **NhkVod** - **NhkVod**
- **nhl.com** - **nhl.com**
- **nhl.com:news**: NHL news
- **nhl.com:videocenter**
- **nhl.com:videocenter:category**: NHL videocenter category
- **nick.com** - **nick.com**
- **nick.de** - **nick.de**
- **nickelodeon:br** - **nickelodeon:br**
@ -587,7 +592,9 @@
- **NRKSkole**: NRK Skole - **NRKSkole**: NRK Skole
- **NRKTV**: NRK TV and NRK Radio - **NRKTV**: NRK TV and NRK Radio
- **NRKTVDirekte**: NRK TV Direkte and NRK Radio Direkte - **NRKTVDirekte**: NRK TV Direkte and NRK Radio Direkte
- **NRKTVEpisode**
- **NRKTVEpisodes** - **NRKTVEpisodes**
- **NRKTVSeason**
- **NRKTVSeries** - **NRKTVSeries**
- **ntv.ru** - **ntv.ru**
- **Nuvid** - **Nuvid**
@ -618,11 +625,13 @@
- **PacktPubCourse** - **PacktPubCourse**
- **PandaTV**: 熊猫TV - **PandaTV**: 熊猫TV
- **pandora.tv**: 판도라TV - **pandora.tv**: 판도라TV
- **ParamountNetwork**
- **parliamentlive.tv**: UK parliament videos - **parliamentlive.tv**: UK parliament videos
- **Patreon** - **Patreon**
- **pbs**: Public Broadcasting Service (PBS) and member stations: PBS: Public Broadcasting Service, APT - Alabama Public Television (WBIQ), GPB/Georgia Public Broadcasting (WGTV), Mississippi Public Broadcasting (WMPN), Nashville Public Television (WNPT), WFSU-TV (WFSU), WSRE (WSRE), WTCI (WTCI), WPBA/Channel 30 (WPBA), Alaska Public Media (KAKM), Arizona PBS (KAET), KNME-TV/Channel 5 (KNME), Vegas PBS (KLVX), AETN/ARKANSAS ETV NETWORK (KETS), KET (WKLE), WKNO/Channel 10 (WKNO), LPB/LOUISIANA PUBLIC BROADCASTING (WLPB), OETA (KETA), Ozarks Public Television (KOZK), WSIU Public Broadcasting (WSIU), KEET TV (KEET), KIXE/Channel 9 (KIXE), KPBS San Diego (KPBS), KQED (KQED), KVIE Public Television (KVIE), PBS SoCal/KOCE (KOCE), ValleyPBS (KVPT), CONNECTICUT PUBLIC TELEVISION (WEDH), KNPB Channel 5 (KNPB), SOPTV (KSYS), Rocky Mountain PBS (KRMA), KENW-TV3 (KENW), KUED Channel 7 (KUED), Wyoming PBS (KCWC), Colorado Public Television / KBDI 12 (KBDI), KBYU-TV (KBYU), Thirteen/WNET New York (WNET), WGBH/Channel 2 (WGBH), WGBY (WGBY), NJTV Public Media NJ (WNJT), WLIW21 (WLIW), mpt/Maryland Public Television (WMPB), WETA Television and Radio (WETA), WHYY (WHYY), PBS 39 (WLVT), WVPT - Your Source for PBS and More! (WVPT), Howard University Television (WHUT), WEDU PBS (WEDU), WGCU Public Media (WGCU), WPBT2 (WPBT), WUCF TV (WUCF), WUFT/Channel 5 (WUFT), WXEL/Channel 42 (WXEL), WLRN/Channel 17 (WLRN), WUSF Public Broadcasting (WUSF), ETV (WRLK), UNC-TV (WUNC), PBS Hawaii - Oceanic Cable Channel 10 (KHET), Idaho Public Television (KAID), KSPS (KSPS), OPB (KOPB), KWSU/Channel 10 & KTNW/Channel 31 (KWSU), WILL-TV (WILL), Network Knowledge - WSEC/Springfield (WSEC), WTTW11 (WTTW), Iowa Public Television/IPTV (KDIN), Nine Network (KETC), PBS39 Fort Wayne (WFWA), WFYI Indianapolis (WFYI), Milwaukee Public Television (WMVS), WNIN (WNIN), WNIT Public Television (WNIT), WPT (WPNE), WVUT/Channel 22 (WVUT), WEIU/Channel 51 (WEIU), WQPT-TV (WQPT), WYCC PBS Chicago (WYCC), WIPB-TV (WIPB), WTIU (WTIU), CET (WCET), ThinkTVNetwork (WPTD), WBGU-TV (WBGU), WGVU TV (WGVU), NET1 (KUON), Pioneer Public Television (KWCM), SDPB Television (KUSD), TPT (KTCA), KSMQ (KSMQ), KPTS/Channel 8 (KPTS), KTWU/Channel 11 (KTWU), East Tennessee PBS (WSJK), WCTE-TV (WCTE), WLJT, Channel 11 (WLJT), WOSU TV (WOSU), WOUB/WOUC (WOUB), WVPB (WVPB), WKYU-PBS (WKYU), KERA 13 (KERA), MPBN (WCBB), Mountain Lake PBS (WCFE), NHPTV (WENH), Vermont PBS (WETK), witf (WITF), WQED Multimedia (WQED), WMHT Educational Telecommunications (WMHT), Q-TV (WDCQ), WTVS Detroit Public TV (WTVS), CMU Public Television (WCMU), WKAR-TV (WKAR), WNMU-TV Public TV 13 (WNMU), WDSE - WRPT (WDSE), WGTE TV (WGTE), Lakeland Public Television (KAWE), KMOS-TV - Channels 6.1, 6.2 and 6.3 (KMOS), MontanaPBS (KUSM), KRWG/Channel 22 (KRWG), KACV (KACV), KCOS/Channel 13 (KCOS), WCNY/Channel 24 (WCNY), WNED (WNED), WPBS (WPBS), WSKG Public TV (WSKG), WXXI (WXXI), WPSU (WPSU), WVIA Public Media Studios (WVIA), WTVI (WTVI), Western Reserve PBS (WNEO), WVIZ/PBS ideastream (WVIZ), KCTS 9 (KCTS), Basin PBS (KPBT), KUHT / Channel 8 (KUHT), KLRN (KLRN), KLRU (KLRU), WTJX Channel 12 (WTJX), WCVE PBS (WCVE), KBTC Public Television (KBTC) - **pbs**: Public Broadcasting Service (PBS) and member stations: PBS: Public Broadcasting Service, APT - Alabama Public Television (WBIQ), GPB/Georgia Public Broadcasting (WGTV), Mississippi Public Broadcasting (WMPN), Nashville Public Television (WNPT), WFSU-TV (WFSU), WSRE (WSRE), WTCI (WTCI), WPBA/Channel 30 (WPBA), Alaska Public Media (KAKM), Arizona PBS (KAET), KNME-TV/Channel 5 (KNME), Vegas PBS (KLVX), AETN/ARKANSAS ETV NETWORK (KETS), KET (WKLE), WKNO/Channel 10 (WKNO), LPB/LOUISIANA PUBLIC BROADCASTING (WLPB), OETA (KETA), Ozarks Public Television (KOZK), WSIU Public Broadcasting (WSIU), KEET TV (KEET), KIXE/Channel 9 (KIXE), KPBS San Diego (KPBS), KQED (KQED), KVIE Public Television (KVIE), PBS SoCal/KOCE (KOCE), ValleyPBS (KVPT), CONNECTICUT PUBLIC TELEVISION (WEDH), KNPB Channel 5 (KNPB), SOPTV (KSYS), Rocky Mountain PBS (KRMA), KENW-TV3 (KENW), KUED Channel 7 (KUED), Wyoming PBS (KCWC), Colorado Public Television / KBDI 12 (KBDI), KBYU-TV (KBYU), Thirteen/WNET New York (WNET), WGBH/Channel 2 (WGBH), WGBY (WGBY), NJTV Public Media NJ (WNJT), WLIW21 (WLIW), mpt/Maryland Public Television (WMPB), WETA Television and Radio (WETA), WHYY (WHYY), PBS 39 (WLVT), WVPT - Your Source for PBS and More! (WVPT), Howard University Television (WHUT), WEDU PBS (WEDU), WGCU Public Media (WGCU), WPBT2 (WPBT), WUCF TV (WUCF), WUFT/Channel 5 (WUFT), WXEL/Channel 42 (WXEL), WLRN/Channel 17 (WLRN), WUSF Public Broadcasting (WUSF), ETV (WRLK), UNC-TV (WUNC), PBS Hawaii - Oceanic Cable Channel 10 (KHET), Idaho Public Television (KAID), KSPS (KSPS), OPB (KOPB), KWSU/Channel 10 & KTNW/Channel 31 (KWSU), WILL-TV (WILL), Network Knowledge - WSEC/Springfield (WSEC), WTTW11 (WTTW), Iowa Public Television/IPTV (KDIN), Nine Network (KETC), PBS39 Fort Wayne (WFWA), WFYI Indianapolis (WFYI), Milwaukee Public Television (WMVS), WNIN (WNIN), WNIT Public Television (WNIT), WPT (WPNE), WVUT/Channel 22 (WVUT), WEIU/Channel 51 (WEIU), WQPT-TV (WQPT), WYCC PBS Chicago (WYCC), WIPB-TV (WIPB), WTIU (WTIU), CET (WCET), ThinkTVNetwork (WPTD), WBGU-TV (WBGU), WGVU TV (WGVU), NET1 (KUON), Pioneer Public Television (KWCM), SDPB Television (KUSD), TPT (KTCA), KSMQ (KSMQ), KPTS/Channel 8 (KPTS), KTWU/Channel 11 (KTWU), East Tennessee PBS (WSJK), WCTE-TV (WCTE), WLJT, Channel 11 (WLJT), WOSU TV (WOSU), WOUB/WOUC (WOUB), WVPB (WVPB), WKYU-PBS (WKYU), KERA 13 (KERA), MPBN (WCBB), Mountain Lake PBS (WCFE), NHPTV (WENH), Vermont PBS (WETK), witf (WITF), WQED Multimedia (WQED), WMHT Educational Telecommunications (WMHT), Q-TV (WDCQ), WTVS Detroit Public TV (WTVS), CMU Public Television (WCMU), WKAR-TV (WKAR), WNMU-TV Public TV 13 (WNMU), WDSE - WRPT (WDSE), WGTE TV (WGTE), Lakeland Public Television (KAWE), KMOS-TV - Channels 6.1, 6.2 and 6.3 (KMOS), MontanaPBS (KUSM), KRWG/Channel 22 (KRWG), KACV (KACV), KCOS/Channel 13 (KCOS), WCNY/Channel 24 (WCNY), WNED (WNED), WPBS (WPBS), WSKG Public TV (WSKG), WXXI (WXXI), WPSU (WPSU), WVIA Public Media Studios (WVIA), WTVI (WTVI), Western Reserve PBS (WNEO), WVIZ/PBS ideastream (WVIZ), KCTS 9 (KCTS), Basin PBS (KPBT), KUHT / Channel 8 (KUHT), KLRN (KLRN), KLRU (KLRU), WTJX Channel 12 (WTJX), WCVE PBS (WCVE), KBTC Public Television (KBTC)
- **pcmag** - **pcmag**
- **PearVideo** - **PearVideo**
- **PeerTube**
- **People** - **People**
- **PerformGroup** - **PerformGroup**
- **periscope**: Periscope - **periscope**: Periscope
@ -663,6 +672,8 @@
- **PrimeShareTV** - **PrimeShareTV**
- **PromptFile** - **PromptFile**
- **prosiebensat1**: ProSiebenSat.1 Digital - **prosiebensat1**: ProSiebenSat.1 Digital
- **puhutv**
- **puhutv:serie**
- **Puls4** - **Puls4**
- **Pyvideo** - **Pyvideo**
- **qqmusic**: QQ音乐 - **qqmusic**: QQ音乐
@ -789,7 +800,7 @@
- **Spiegel** - **Spiegel**
- **Spiegel:Article**: Articles on spiegel.de - **Spiegel:Article**: Articles on spiegel.de
- **Spiegeltv** - **Spiegeltv**
- **Spike** - **sport.francetvinfo.fr**
- **Sport5** - **Sport5**
- **SportBoxEmbed** - **SportBoxEmbed**
- **SportDeutschland** - **SportDeutschland**
@ -809,6 +820,7 @@
- **StretchInternet** - **StretchInternet**
- **SunPorno** - **SunPorno**
- **SVT** - **SVT**
- **SVTPage**
- **SVTPlay**: SVT Play and Öppet arkiv - **SVTPlay**: SVT Play and Öppet arkiv
- **SVTSeries** - **SVTSeries**
- **SWRMediathek** - **SWRMediathek**
@ -891,6 +903,7 @@
- **tvigle**: Интернет-телевидение Tvigle.ru - **tvigle**: Интернет-телевидение Tvigle.ru
- **tvland.com** - **tvland.com**
- **TVN24** - **TVN24**
- **TVNet**
- **TVNoe** - **TVNoe**
- **TVNow** - **TVNow**
- **TVNowList** - **TVNowList**
@ -988,6 +1001,7 @@
- **Vimple**: Vimple - one-click video hosting - **Vimple**: Vimple - one-click video hosting
- **Vine** - **Vine**
- **vine:user** - **vine:user**
- **Viqeo**
- **Viu** - **Viu**
- **viu:ott** - **viu:ott**
- **viu:playlist** - **viu:playlist**

View File

@ -2,5 +2,5 @@
universal = True universal = True
[flake8] [flake8]
exclude = youtube_dl/extractor/__init__.py,devscripts/buildserver.py,devscripts/lazy_load_template.py,devscripts/make_issue_template.py,setup.py,build,.git exclude = youtube_dl/extractor/__init__.py,devscripts/buildserver.py,devscripts/lazy_load_template.py,devscripts/make_issue_template.py,setup.py,build,.git,venv
ignore = E402,E501,E731,E741 ignore = E402,E501,E731,E741

View File

@ -78,6 +78,7 @@ from youtube_dl.utils import (
uppercase_escape, uppercase_escape,
lowercase_escape, lowercase_escape,
url_basename, url_basename,
url_or_none,
base_url, base_url,
urljoin, urljoin,
urlencode_postdata, urlencode_postdata,
@ -361,6 +362,7 @@ class TestUtil(unittest.TestCase):
self.assertEqual(determine_ext('http://example.com/foo/bar.nonext/?download', None), None) self.assertEqual(determine_ext('http://example.com/foo/bar.nonext/?download', None), None)
self.assertEqual(determine_ext('http://example.com/foo/bar/mp4?download', None), None) self.assertEqual(determine_ext('http://example.com/foo/bar/mp4?download', None), None)
self.assertEqual(determine_ext('http://example.com/foo/bar.m3u8//?download'), 'm3u8') self.assertEqual(determine_ext('http://example.com/foo/bar.m3u8//?download'), 'm3u8')
self.assertEqual(determine_ext('foobar', None), None)
def test_find_xpath_attr(self): def test_find_xpath_attr(self):
testxml = '''<root> testxml = '''<root>
@ -506,6 +508,16 @@ class TestUtil(unittest.TestCase):
self.assertEqual(urljoin('http://foo.de/', ['foobar']), None) self.assertEqual(urljoin('http://foo.de/', ['foobar']), None)
self.assertEqual(urljoin('http://foo.de/a/b/c.txt', '.././../d.txt'), 'http://foo.de/d.txt') self.assertEqual(urljoin('http://foo.de/a/b/c.txt', '.././../d.txt'), 'http://foo.de/d.txt')
def test_url_or_none(self):
self.assertEqual(url_or_none(None), None)
self.assertEqual(url_or_none(''), None)
self.assertEqual(url_or_none('foo'), None)
self.assertEqual(url_or_none('http://foo.de'), 'http://foo.de')
self.assertEqual(url_or_none('https://foo.de'), 'https://foo.de')
self.assertEqual(url_or_none('http$://foo.de'), None)
self.assertEqual(url_or_none('http://foo.de'), 'http://foo.de')
self.assertEqual(url_or_none('//foo.de'), '//foo.de')
def test_parse_age_limit(self): def test_parse_age_limit(self):
self.assertEqual(parse_age_limit(None), None) self.assertEqual(parse_age_limit(None), None)
self.assertEqual(parse_age_limit(False), None) self.assertEqual(parse_age_limit(False), None)
@ -519,6 +531,8 @@ class TestUtil(unittest.TestCase):
self.assertEqual(parse_age_limit('PG-13'), 13) self.assertEqual(parse_age_limit('PG-13'), 13)
self.assertEqual(parse_age_limit('TV-14'), 14) self.assertEqual(parse_age_limit('TV-14'), 14)
self.assertEqual(parse_age_limit('TV-MA'), 17) self.assertEqual(parse_age_limit('TV-MA'), 17)
self.assertEqual(parse_age_limit('TV14'), 14)
self.assertEqual(parse_age_limit('TV_G'), 0)
def test_parse_duration(self): def test_parse_duration(self):
self.assertEqual(parse_duration(None), None) self.assertEqual(parse_duration(None), None)
@ -714,6 +728,10 @@ class TestUtil(unittest.TestCase):
d = json.loads(stripped) d = json.loads(stripped)
self.assertEqual(d, {'status': 'success'}) self.assertEqual(d, {'status': 'success'})
stripped = strip_jsonp('({"status": "success"});')
d = json.loads(stripped)
self.assertEqual(d, {'status': 'success'})
def test_uppercase_escape(self): def test_uppercase_escape(self):
self.assertEqual(uppercase_escape(''), '') self.assertEqual(uppercase_escape(''), '')
self.assertEqual(uppercase_escape('\\U0001d550'), '𝕐') self.assertEqual(uppercase_escape('\\U0001d550'), '𝕐')

View File

@ -211,7 +211,7 @@ class YoutubeDL(object):
At the moment, this is only supported by YouTube. At the moment, this is only supported by YouTube.
proxy: URL of the proxy server to use proxy: URL of the proxy server to use
geo_verification_proxy: URL of the proxy to use for IP address verification geo_verification_proxy: URL of the proxy to use for IP address verification
on geo-restricted sites. (Experimental) on geo-restricted sites.
socket_timeout: Time to wait for unresponsive hosts, in seconds socket_timeout: Time to wait for unresponsive hosts, in seconds
bidi_workaround: Work around buggy terminals without bidirectional text bidi_workaround: Work around buggy terminals without bidirectional text
support, using fridibi support, using fridibi
@ -259,7 +259,7 @@ class YoutubeDL(object):
- "warn": only emit a warning - "warn": only emit a warning
- "detect_or_warn": check whether we can do anything - "detect_or_warn": check whether we can do anything
about it, warn otherwise (default) about it, warn otherwise (default)
source_address: (Experimental) Client-side IP address to bind to. source_address: Client-side IP address to bind to.
call_home: Boolean, true iff we are allowed to contact the call_home: Boolean, true iff we are allowed to contact the
youtube-dl servers for debugging. youtube-dl servers for debugging.
sleep_interval: Number of seconds to sleep before each download when sleep_interval: Number of seconds to sleep before each download when
@ -281,14 +281,14 @@ class YoutubeDL(object):
match_filter_func in utils.py is one example for this. match_filter_func in utils.py is one example for this.
no_color: Do not emit color codes in output. no_color: Do not emit color codes in output.
geo_bypass: Bypass geographic restriction via faking X-Forwarded-For geo_bypass: Bypass geographic restriction via faking X-Forwarded-For
HTTP header (experimental) HTTP header
geo_bypass_country: geo_bypass_country:
Two-letter ISO 3166-2 country code that will be used for Two-letter ISO 3166-2 country code that will be used for
explicit geographic restriction bypassing via faking explicit geographic restriction bypassing via faking
X-Forwarded-For HTTP header (experimental) X-Forwarded-For HTTP header
geo_bypass_ip_block: geo_bypass_ip_block:
IP range in CIDR notation that will be used similarly to IP range in CIDR notation that will be used similarly to
geo_bypass_country (experimental) geo_bypass_country
The following options determine which downloader is picked: The following options determine which downloader is picked:
external_downloader: Executable of the external downloader to call. external_downloader: Executable of the external downloader to call.
@ -305,8 +305,8 @@ class YoutubeDL(object):
http_chunk_size. http_chunk_size.
The following options are used by the post processors: The following options are used by the post processors:
prefer_ffmpeg: If True, use ffmpeg instead of avconv if both are available, prefer_ffmpeg: If False, use avconv instead of ffmpeg if both are available,
otherwise prefer avconv. otherwise prefer ffmpeg.
postprocessor_args: A list of additional command-line arguments for the postprocessor_args: A list of additional command-line arguments for the
postprocessor. postprocessor.

View File

@ -2787,6 +2787,12 @@ except NameError: # Python 3
compat_numeric_types = (int, float, complex) compat_numeric_types = (int, float, complex)
try:
compat_integer_types = (int, long)
except NameError: # Python 3
compat_integer_types = (int, )
if sys.version_info < (2, 7): if sys.version_info < (2, 7):
def compat_socket_create_connection(address, timeout, source_address=None): def compat_socket_create_connection(address, timeout, source_address=None):
host, port = address host, port = address
@ -2974,6 +2980,7 @@ __all__ = [
'compat_http_client', 'compat_http_client',
'compat_http_server', 'compat_http_server',
'compat_input', 'compat_input',
'compat_integer_types',
'compat_itertools_count', 'compat_itertools_count',
'compat_kwargs', 'compat_kwargs',
'compat_numeric_types', 'compat_numeric_types',

View File

@ -45,7 +45,6 @@ class FileDownloader(object):
min_filesize: Skip files smaller than this size min_filesize: Skip files smaller than this size
max_filesize: Skip files larger than this size max_filesize: Skip files larger than this size
xattr_set_filesize: Set ytdl.filesize user xattribute with expected size. xattr_set_filesize: Set ytdl.filesize user xattribute with expected size.
(experimental)
external_downloader_args: A list of additional command-line arguments for the external_downloader_args: A list of additional command-line arguments for the
external downloader. external downloader.
hls_use_mpegts: Use the mpegts container for HLS videos. hls_use_mpegts: Use the mpegts container for HLS videos.

View File

@ -2,7 +2,10 @@ from __future__ import unicode_literals
from .fragment import FragmentFD from .fragment import FragmentFD
from ..compat import compat_urllib_error from ..compat import compat_urllib_error
from ..utils import urljoin from ..utils import (
DownloadError,
urljoin,
)
class DashSegmentsFD(FragmentFD): class DashSegmentsFD(FragmentFD):
@ -57,6 +60,14 @@ class DashSegmentsFD(FragmentFD):
count += 1 count += 1
if count <= fragment_retries: if count <= fragment_retries:
self.report_retry_fragment(err, frag_index, count, fragment_retries) self.report_retry_fragment(err, frag_index, count, fragment_retries)
except DownloadError:
# Don't retry fragment if error occurred during HTTP downloading
# itself since it has own retry settings
if not fatal:
self.report_skip_fragment(frag_index)
break
raise
if count > fragment_retries: if count > fragment_retries:
if not fatal: if not fatal:
self.report_skip_fragment(frag_index) self.report_skip_fragment(frag_index)

View File

@ -217,10 +217,11 @@ class HttpFD(FileDownloader):
before = start # start measuring before = start # start measuring
def retry(e): def retry(e):
if ctx.tmpfilename != '-': to_stdout = ctx.tmpfilename == '-'
if not to_stdout:
ctx.stream.close() ctx.stream.close()
ctx.stream = None ctx.stream = None
ctx.resume_len = os.path.getsize(encodeFilename(ctx.tmpfilename)) ctx.resume_len = byte_counter if to_stdout else os.path.getsize(encodeFilename(ctx.tmpfilename))
raise RetryDownload(e) raise RetryDownload(e)
while True: while True:

View File

@ -29,66 +29,68 @@ class RtmpFD(FileDownloader):
proc = subprocess.Popen(args, stderr=subprocess.PIPE) proc = subprocess.Popen(args, stderr=subprocess.PIPE)
cursor_in_new_line = True cursor_in_new_line = True
proc_stderr_closed = False proc_stderr_closed = False
while not proc_stderr_closed: try:
# read line from stderr while not proc_stderr_closed:
line = '' # read line from stderr
while True: line = ''
char = proc.stderr.read(1) while True:
if not char: char = proc.stderr.read(1)
proc_stderr_closed = True if not char:
break proc_stderr_closed = True
if char in [b'\r', b'\n']: break
break if char in [b'\r', b'\n']:
line += char.decode('ascii', 'replace') break
if not line: line += char.decode('ascii', 'replace')
# proc_stderr_closed is True if not line:
continue # proc_stderr_closed is True
mobj = re.search(r'([0-9]+\.[0-9]{3}) kB / [0-9]+\.[0-9]{2} sec \(([0-9]{1,2}\.[0-9])%\)', line) continue
if mobj: mobj = re.search(r'([0-9]+\.[0-9]{3}) kB / [0-9]+\.[0-9]{2} sec \(([0-9]{1,2}\.[0-9])%\)', line)
downloaded_data_len = int(float(mobj.group(1)) * 1024)
percent = float(mobj.group(2))
if not resume_percent:
resume_percent = percent
resume_downloaded_data_len = downloaded_data_len
time_now = time.time()
eta = self.calc_eta(start, time_now, 100 - resume_percent, percent - resume_percent)
speed = self.calc_speed(start, time_now, downloaded_data_len - resume_downloaded_data_len)
data_len = None
if percent > 0:
data_len = int(downloaded_data_len * 100 / percent)
self._hook_progress({
'status': 'downloading',
'downloaded_bytes': downloaded_data_len,
'total_bytes_estimate': data_len,
'tmpfilename': tmpfilename,
'filename': filename,
'eta': eta,
'elapsed': time_now - start,
'speed': speed,
})
cursor_in_new_line = False
else:
# no percent for live streams
mobj = re.search(r'([0-9]+\.[0-9]{3}) kB / [0-9]+\.[0-9]{2} sec', line)
if mobj: if mobj:
downloaded_data_len = int(float(mobj.group(1)) * 1024) downloaded_data_len = int(float(mobj.group(1)) * 1024)
percent = float(mobj.group(2))
if not resume_percent:
resume_percent = percent
resume_downloaded_data_len = downloaded_data_len
time_now = time.time() time_now = time.time()
speed = self.calc_speed(start, time_now, downloaded_data_len) eta = self.calc_eta(start, time_now, 100 - resume_percent, percent - resume_percent)
speed = self.calc_speed(start, time_now, downloaded_data_len - resume_downloaded_data_len)
data_len = None
if percent > 0:
data_len = int(downloaded_data_len * 100 / percent)
self._hook_progress({ self._hook_progress({
'status': 'downloading',
'downloaded_bytes': downloaded_data_len, 'downloaded_bytes': downloaded_data_len,
'total_bytes_estimate': data_len,
'tmpfilename': tmpfilename, 'tmpfilename': tmpfilename,
'filename': filename, 'filename': filename,
'status': 'downloading', 'eta': eta,
'elapsed': time_now - start, 'elapsed': time_now - start,
'speed': speed, 'speed': speed,
}) })
cursor_in_new_line = False cursor_in_new_line = False
elif self.params.get('verbose', False): else:
if not cursor_in_new_line: # no percent for live streams
self.to_screen('') mobj = re.search(r'([0-9]+\.[0-9]{3}) kB / [0-9]+\.[0-9]{2} sec', line)
cursor_in_new_line = True if mobj:
self.to_screen('[rtmpdump] ' + line) downloaded_data_len = int(float(mobj.group(1)) * 1024)
proc.wait() time_now = time.time()
speed = self.calc_speed(start, time_now, downloaded_data_len)
self._hook_progress({
'downloaded_bytes': downloaded_data_len,
'tmpfilename': tmpfilename,
'filename': filename,
'status': 'downloading',
'elapsed': time_now - start,
'speed': speed,
})
cursor_in_new_line = False
elif self.params.get('verbose', False):
if not cursor_in_new_line:
self.to_screen('')
cursor_in_new_line = True
self.to_screen('[rtmpdump] ' + line)
finally:
proc.wait()
if not cursor_in_new_line: if not cursor_in_new_line:
self.to_screen('') self.to_screen('')
return proc.returncode return proc.returncode
@ -163,7 +165,15 @@ class RtmpFD(FileDownloader):
RD_INCOMPLETE = 2 RD_INCOMPLETE = 2
RD_NO_CONNECT = 3 RD_NO_CONNECT = 3
retval = run_rtmpdump(args) started = time.time()
try:
retval = run_rtmpdump(args)
except KeyboardInterrupt:
if not info_dict.get('is_live'):
raise
retval = RD_SUCCESS
self.to_screen('\n[rtmpdump] Interrupted by user')
if retval == RD_NO_CONNECT: if retval == RD_NO_CONNECT:
self.report_error('[rtmpdump] Could not connect to RTMP server.') self.report_error('[rtmpdump] Could not connect to RTMP server.')
@ -171,7 +181,7 @@ class RtmpFD(FileDownloader):
while retval in (RD_INCOMPLETE, RD_FAILED) and not test and not live: while retval in (RD_INCOMPLETE, RD_FAILED) and not test and not live:
prevsize = os.path.getsize(encodeFilename(tmpfilename)) prevsize = os.path.getsize(encodeFilename(tmpfilename))
self.to_screen('[rtmpdump] %s bytes' % prevsize) self.to_screen('[rtmpdump] Downloaded %s bytes' % prevsize)
time.sleep(5.0) # This seems to be needed time.sleep(5.0) # This seems to be needed
args = basic_args + ['--resume'] args = basic_args + ['--resume']
if retval == RD_FAILED: if retval == RD_FAILED:
@ -188,13 +198,14 @@ class RtmpFD(FileDownloader):
break break
if retval == RD_SUCCESS or (test and retval == RD_INCOMPLETE): if retval == RD_SUCCESS or (test and retval == RD_INCOMPLETE):
fsize = os.path.getsize(encodeFilename(tmpfilename)) fsize = os.path.getsize(encodeFilename(tmpfilename))
self.to_screen('[rtmpdump] %s bytes' % fsize) self.to_screen('[rtmpdump] Downloaded %s bytes' % fsize)
self.try_rename(tmpfilename, filename) self.try_rename(tmpfilename, filename)
self._hook_progress({ self._hook_progress({
'downloaded_bytes': fsize, 'downloaded_bytes': fsize,
'total_bytes': fsize, 'total_bytes': fsize,
'filename': filename, 'filename': filename,
'status': 'finished', 'status': 'finished',
'elapsed': time.time() - started,
}) })
return True return True
else: else:

View File

@ -105,22 +105,22 @@ class ABCIE(InfoExtractor):
class ABCIViewIE(InfoExtractor): class ABCIViewIE(InfoExtractor):
IE_NAME = 'abc.net.au:iview' IE_NAME = 'abc.net.au:iview'
_VALID_URL = r'https?://iview\.abc\.net\.au/programs/[^/]+/(?P<id>[^/?#]+)' _VALID_URL = r'https?://iview\.abc\.net\.au/(?:[^/]+/)*video/(?P<id>[^/?#]+)'
_GEO_COUNTRIES = ['AU'] _GEO_COUNTRIES = ['AU']
# ABC iview programs are normally available for 14 days only. # ABC iview programs are normally available for 14 days only.
_TESTS = [{ _TESTS = [{
'url': 'https://iview.abc.net.au/programs/ben-and-hollys-little-kingdom/ZY9247A021S00', 'url': 'https://iview.abc.net.au/show/ben-and-hollys-little-kingdom/series/0/video/ZX9371A050S00',
'md5': 'cde42d728b3b7c2b32b1b94b4a548afc', 'md5': 'cde42d728b3b7c2b32b1b94b4a548afc',
'info_dict': { 'info_dict': {
'id': 'ZY9247A021S00', 'id': 'ZX9371A050S00',
'ext': 'mp4', 'ext': 'mp4',
'title': "Gaston's Visit", 'title': "Gaston's Birthday",
'series': "Ben And Holly's Little Kingdom", 'series': "Ben And Holly's Little Kingdom",
'description': 'md5:18db170ad71cf161e006a4c688e33155', 'description': 'md5:f9de914d02f226968f598ac76f105bcf',
'upload_date': '20180318', 'upload_date': '20180604',
'uploader_id': 'abc4kids', 'uploader_id': 'abc4kids',
'timestamp': 1521400959, 'timestamp': 1528140219,
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
@ -129,17 +129,16 @@ class ABCIViewIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) video_params = self._download_json(
video_params = self._parse_json(self._search_regex( 'https://iview.abc.net.au/api/programs/' + video_id, video_id)
r'videoParams\s*=\s*({.+?});', webpage, 'video params'), video_id) title = unescapeHTML(video_params.get('title') or video_params['seriesTitle'])
title = video_params.get('title') or video_params['seriesTitle'] stream = next(s for s in video_params['playlist'] if s.get('type') in ('program', 'livestream'))
stream = next(s for s in video_params['playlist'] if s.get('type') == 'program')
house_number = video_params.get('episodeHouseNumber') house_number = video_params.get('episodeHouseNumber') or video_id
path = '/auth/hls/sign?ts={0}&hn={1}&d=android-mobile'.format( path = '/auth/hls/sign?ts={0}&hn={1}&d=android-tablet'.format(
int(time.time()), house_number) int(time.time()), house_number)
sig = hmac.new( sig = hmac.new(
'android.content.res.Resources'.encode('utf-8'), b'android.content.res.Resources',
path.encode('utf-8'), hashlib.sha256).hexdigest() path.encode('utf-8'), hashlib.sha256).hexdigest()
token = self._download_webpage( token = self._download_webpage(
'http://iview.abc.net.au{0}&sig={1}'.format(path, sig), video_id) 'http://iview.abc.net.au{0}&sig={1}'.format(path, sig), video_id)
@ -169,18 +168,26 @@ class ABCIViewIE(InfoExtractor):
'ext': 'vtt', 'ext': 'vtt',
}] }]
is_live = video_params.get('livestream') == '1'
if is_live:
title = self._live_title(title)
return { return {
'id': video_id, 'id': video_id,
'title': unescapeHTML(title), 'title': title,
'description': self._html_search_meta(['og:description', 'twitter:description'], webpage), 'description': video_params.get('description'),
'thumbnail': self._html_search_meta(['og:image', 'twitter:image:src'], webpage), 'thumbnail': video_params.get('thumbnail'),
'duration': int_or_none(video_params.get('eventDuration')), 'duration': int_or_none(video_params.get('eventDuration')),
'timestamp': parse_iso8601(video_params.get('pubDate'), ' '), 'timestamp': parse_iso8601(video_params.get('pubDate'), ' '),
'series': unescapeHTML(video_params.get('seriesTitle')), 'series': unescapeHTML(video_params.get('seriesTitle')),
'series_id': video_params.get('seriesHouseNumber') or video_id[:7], 'series_id': video_params.get('seriesHouseNumber') or video_id[:7],
'episode_number': int_or_none(self._html_search_meta('episodeNumber', webpage, default=None)), 'season_number': int_or_none(self._search_regex(
'episode': self._html_search_meta('episode_title', webpage, default=None), r'\bSeries\s+(\d+)\b', title, 'season number', default=None)),
'episode_number': int_or_none(self._search_regex(
r'\bEp\s+(\d+)\b', title, 'episode number', default=None)),
'episode_id': house_number,
'uploader_id': video_params.get('channel'), 'uploader_id': video_params.get('channel'),
'formats': formats, 'formats': formats,
'subtitles': subtitles, 'subtitles': subtitles,
'is_live': is_live,
} }

View File

@ -1,8 +1,11 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import base64
import binascii
import json import json
import os import os
import random
from .common import InfoExtractor from .common import InfoExtractor
from ..aes import aes_cbc_decrypt from ..aes import aes_cbc_decrypt
@ -12,9 +15,12 @@ from ..compat import (
) )
from ..utils import ( from ..utils import (
bytes_to_intlist, bytes_to_intlist,
bytes_to_long,
ExtractorError, ExtractorError,
float_or_none, float_or_none,
intlist_to_bytes, intlist_to_bytes,
long_to_bytes,
pkcs1pad,
srt_subtitles_timecode, srt_subtitles_timecode,
strip_or_none, strip_or_none,
urljoin, urljoin,
@ -35,6 +41,7 @@ class ADNIE(InfoExtractor):
} }
} }
_BASE_URL = 'http://animedigitalnetwork.fr' _BASE_URL = 'http://animedigitalnetwork.fr'
_RSA_KEY = (0xc35ae1e4356b65a73b551493da94b8cb443491c0aa092a357a5aee57ffc14dda85326f42d716e539a34542a0d3f363adf16c5ec222d713d5997194030ee2e4f0d1fb328c01a81cf6868c090d50de8e169c6b13d1675b9eeed1cbc51e1fffca9b38af07f37abd790924cd3bee59d0257cfda4fe5f3f0534877e21ce5821447d1b, 65537)
def _get_subtitles(self, sub_path, video_id): def _get_subtitles(self, sub_path, video_id):
if not sub_path: if not sub_path:
@ -42,16 +49,14 @@ class ADNIE(InfoExtractor):
enc_subtitles = self._download_webpage( enc_subtitles = self._download_webpage(
urljoin(self._BASE_URL, sub_path), urljoin(self._BASE_URL, sub_path),
video_id, fatal=False, headers={ video_id, fatal=False)
'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:53.0) Gecko/20100101 Firefox/53.0',
})
if not enc_subtitles: if not enc_subtitles:
return None return None
# http://animedigitalnetwork.fr/components/com_vodvideo/videojs/adn-vjs.min.js # http://animedigitalnetwork.fr/components/com_vodvideo/videojs/adn-vjs.min.js
dec_subtitles = intlist_to_bytes(aes_cbc_decrypt( dec_subtitles = intlist_to_bytes(aes_cbc_decrypt(
bytes_to_intlist(compat_b64decode(enc_subtitles[24:])), bytes_to_intlist(compat_b64decode(enc_subtitles[24:])),
bytes_to_intlist(b'\xc8\x6e\x06\xbc\xbe\xc6\x49\xf5\x88\x0d\xc8\x47\xc4\x27\x0c\x60'), bytes_to_intlist(binascii.unhexlify(self._K + '9032ad7083106400')),
bytes_to_intlist(compat_b64decode(enc_subtitles[:24])) bytes_to_intlist(compat_b64decode(enc_subtitles[:24]))
)) ))
subtitles_json = self._parse_json( subtitles_json = self._parse_json(
@ -112,11 +117,24 @@ class ADNIE(InfoExtractor):
error = None error = None
if not links: if not links:
links_url = player_config.get('linksurl') or options['videoUrl'] links_url = player_config.get('linksurl') or options['videoUrl']
links_data = self._download_json(urljoin( token = options['token']
self._BASE_URL, links_url), video_id) self._K = ''.join([random.choice('0123456789abcdef') for _ in range(16)])
message = bytes_to_intlist(json.dumps({
'k': self._K,
'e': 60,
't': token,
}))
padded_message = intlist_to_bytes(pkcs1pad(message, 128))
n, e = self._RSA_KEY
encrypted_message = long_to_bytes(pow(bytes_to_long(padded_message), e, n))
authorization = base64.b64encode(encrypted_message).decode()
links_data = self._download_json(
urljoin(self._BASE_URL, links_url), video_id, headers={
'Authorization': 'Bearer ' + authorization,
})
links = links_data.get('links') or {} links = links_data.get('links') or {}
metas = metas or links_data.get('meta') or {} metas = metas or links_data.get('meta') or {}
sub_path = sub_path or links_data.get('subtitles') sub_path = (sub_path or links_data.get('subtitles')) + '&token=' + token
error = links_data.get('error') error = links_data.get('error')
title = metas.get('title') or video_info['title'] title = metas.get('title') or video_info['title']

View File

@ -7,6 +7,7 @@ from .turner import TurnerBaseIE
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
strip_or_none, strip_or_none,
url_or_none,
) )
@ -98,7 +99,7 @@ class AdultSwimIE(TurnerBaseIE):
if not video_id: if not video_id:
entries = [] entries = []
for episode in video_data.get('archiveEpisodes', []): for episode in video_data.get('archiveEpisodes', []):
episode_url = episode.get('url') episode_url = url_or_none(episode.get('url'))
if not episode_url: if not episode_url:
continue continue
entries.append(self.url_result( entries.append(self.url_result(

View File

@ -9,6 +9,7 @@ from ..utils import (
determine_ext, determine_ext,
ExtractorError, ExtractorError,
int_or_none, int_or_none,
url_or_none,
urlencode_postdata, urlencode_postdata,
xpath_text, xpath_text,
) )
@ -304,7 +305,7 @@ class AfreecaTVIE(InfoExtractor):
file_elements = video_element.findall(compat_xpath('./file')) file_elements = video_element.findall(compat_xpath('./file'))
one = len(file_elements) == 1 one = len(file_elements) == 1
for file_num, file_element in enumerate(file_elements, start=1): for file_num, file_element in enumerate(file_elements, start=1):
file_url = file_element.text file_url = url_or_none(file_element.text)
if not file_url: if not file_url:
continue continue
key = file_element.get('key', '') key = file_element.get('key', '')

View File

@ -3,11 +3,12 @@ from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
int_or_none,
parse_iso8601,
mimetype2ext,
determine_ext, determine_ext,
ExtractorError, ExtractorError,
int_or_none,
mimetype2ext,
parse_iso8601,
url_or_none,
) )
@ -35,7 +36,7 @@ class AMPIE(InfoExtractor):
media_thumbnail = [media_thumbnail] media_thumbnail = [media_thumbnail]
for thumbnail_data in media_thumbnail: for thumbnail_data in media_thumbnail:
thumbnail = thumbnail_data.get('@attributes', {}) thumbnail = thumbnail_data.get('@attributes', {})
thumbnail_url = thumbnail.get('url') thumbnail_url = url_or_none(thumbnail.get('url'))
if not thumbnail_url: if not thumbnail_url:
continue continue
thumbnails.append({ thumbnails.append({
@ -51,7 +52,7 @@ class AMPIE(InfoExtractor):
media_subtitle = [media_subtitle] media_subtitle = [media_subtitle]
for subtitle_data in media_subtitle: for subtitle_data in media_subtitle:
subtitle = subtitle_data.get('@attributes', {}) subtitle = subtitle_data.get('@attributes', {})
subtitle_href = subtitle.get('href') subtitle_href = url_or_none(subtitle.get('href'))
if not subtitle_href: if not subtitle_href:
continue continue
subtitles.setdefault(subtitle.get('lang') or 'en', []).append({ subtitles.setdefault(subtitle.get('lang') or 'en', []).append({
@ -65,7 +66,7 @@ class AMPIE(InfoExtractor):
media_content = [media_content] media_content = [media_content]
for media_data in media_content: for media_data in media_content:
media = media_data.get('@attributes', {}) media = media_data.get('@attributes', {})
media_url = media.get('url') media_url = url_or_none(media.get('url'))
if not media_url: if not media_url:
continue continue
ext = mimetype2ext(media.get('type')) or determine_ext(media_url) ext = mimetype2ext(media.get('type')) or determine_ext(media_url)
@ -79,7 +80,7 @@ class AMPIE(InfoExtractor):
else: else:
formats.append({ formats.append({
'format_id': media_data.get('media-category', {}).get('@attributes', {}).get('label'), 'format_id': media_data.get('media-category', {}).get('@attributes', {}).get('label'),
'url': media['url'], 'url': media_url,
'tbr': int_or_none(media.get('bitrate')), 'tbr': int_or_none(media.get('bitrate')),
'filesize': int_or_none(media.get('fileSize')), 'filesize': int_or_none(media.get('fileSize')),
'ext': ext, 'ext': ext,

View File

@ -8,6 +8,7 @@ from ..utils import (
determine_ext, determine_ext,
extract_attributes, extract_attributes,
ExtractorError, ExtractorError,
url_or_none,
urlencode_postdata, urlencode_postdata,
urljoin, urljoin,
) )
@ -52,7 +53,7 @@ class AnimeOnDemandIE(InfoExtractor):
}] }]
def _login(self): def _login(self):
(username, password) = self._get_login_info() username, password = self._get_login_info()
if username is None: if username is None:
return return
@ -165,7 +166,7 @@ class AnimeOnDemandIE(InfoExtractor):
}, fatal=False) }, fatal=False)
if not playlist: if not playlist:
continue continue
stream_url = playlist.get('streamurl') stream_url = url_or_none(playlist.get('streamurl'))
if stream_url: if stream_url:
rtmp = re.search( rtmp = re.search(
r'^(?P<url>rtmpe?://(?P<host>[^/]+)/(?P<app>.+/))(?P<playpath>mp[34]:.+)', r'^(?P<url>rtmpe?://(?P<host>[^/]+)/(?P<app>.+/))(?P<playpath>mp[34]:.+)',

View File

@ -7,6 +7,7 @@ from .common import InfoExtractor
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
int_or_none, int_or_none,
url_or_none,
) )
@ -77,7 +78,7 @@ class AolIE(InfoExtractor):
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', m3u8_id='hls', fatal=False)) m3u8_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
for rendition in video_data.get('renditions', []): for rendition in video_data.get('renditions', []):
video_url = rendition.get('url') video_url = url_or_none(rendition.get('url'))
if not video_url: if not video_url:
continue continue
ext = rendition.get('format') ext = rendition.get('format')

View File

@ -0,0 +1,94 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
js_to_json,
url_or_none,
)
class APAIE(InfoExtractor):
_VALID_URL = r'https?://[^/]+\.apa\.at/embed/(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
_TESTS = [{
'url': 'http://uvp.apa.at/embed/293f6d17-692a-44e3-9fd5-7b178f3a1029',
'md5': '2b12292faeb0a7d930c778c7a5b4759b',
'info_dict': {
'id': 'jjv85FdZ',
'ext': 'mp4',
'title': '"Blau ist mysteriös": Die Blue Man Group im Interview',
'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 254,
'timestamp': 1519211149,
'upload_date': '20180221',
},
}, {
'url': 'https://uvp-apapublisher.sf.apa.at/embed/2f94e9e6-d945-4db2-9548-f9a41ebf7b78',
'only_matching': True,
}, {
'url': 'http://uvp-rma.sf.apa.at/embed/70404cca-2f47-4855-bbb8-20b1fae58f76',
'only_matching': True,
}, {
'url': 'http://uvp-kleinezeitung.sf.apa.at/embed/f1c44979-dba2-4ebf-b021-e4cf2cac3c81',
'only_matching': True,
}]
@staticmethod
def _extract_urls(webpage):
return [
mobj.group('url')
for mobj in re.finditer(
r'<iframe[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//[^/]+\.apa\.at/embed/[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12}.*?)\1',
webpage)]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
jwplatform_id = self._search_regex(
r'media[iI]d\s*:\s*["\'](?P<id>[a-zA-Z0-9]{8})', webpage,
'jwplatform id', default=None)
if jwplatform_id:
return self.url_result(
'jwplatform:' + jwplatform_id, ie='JWPlatform',
video_id=video_id)
sources = self._parse_json(
self._search_regex(
r'sources\s*=\s*(\[.+?\])\s*;', webpage, 'sources'),
video_id, transform_source=js_to_json)
formats = []
for source in sources:
if not isinstance(source, dict):
continue
source_url = url_or_none(source.get('file'))
if not source_url:
continue
ext = determine_ext(source_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
source_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
else:
formats.append({
'url': source_url,
})
self._sort_formats(formats)
thumbnail = self._search_regex(
r'image\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
'thumbnail', fatal=False, group='url')
return {
'id': video_id,
'title': video_id,
'thumbnail': thumbnail,
'formats': formats,
}

View File

@ -5,6 +5,7 @@ from .common import InfoExtractor
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
mimetype2ext, mimetype2ext,
url_or_none,
) )
@ -43,7 +44,7 @@ class AparatIE(InfoExtractor):
formats = [] formats = []
for item in file_list[0]: for item in file_list[0]:
file_url = item.get('file') file_url = url_or_none(item.get('file'))
if not file_url: if not file_url:
continue continue
ext = mimetype2ext(item.get('type')) ext = mimetype2ext(item.get('type'))

View File

@ -5,7 +5,6 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from .generic import GenericIE from .generic import GenericIE
from ..compat import compat_str
from ..utils import ( from ..utils import (
determine_ext, determine_ext,
ExtractorError, ExtractorError,
@ -15,6 +14,7 @@ from ..utils import (
unified_strdate, unified_strdate,
xpath_text, xpath_text,
update_url_query, update_url_query,
url_or_none,
) )
from ..compat import compat_etree_fromstring from ..compat import compat_etree_fromstring
@ -100,7 +100,7 @@ class ARDMediathekIE(InfoExtractor):
quality = stream.get('_quality') quality = stream.get('_quality')
server = stream.get('_server') server = stream.get('_server')
for stream_url in stream_urls: for stream_url in stream_urls:
if not isinstance(stream_url, compat_str) or '//' not in stream_url: if not url_or_none(stream_url):
continue continue
ext = determine_ext(stream_url) ext = determine_ext(stream_url)
if quality != 'auto' and ext in ('f4m', 'm3u8'): if quality != 'auto' and ext in ('f4m', 'm3u8'):

View File

@ -74,7 +74,7 @@ class AtresPlayerIE(InfoExtractor):
self._login() self._login()
def _login(self): def _login(self):
(username, password) = self._get_login_info() username, password = self._get_login_info()
if username is None: if username is None:
return return

View File

@ -5,13 +5,12 @@ from .common import InfoExtractor
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
parse_iso8601, parse_iso8601,
sanitized_Request,
) )
class AudiMediaIE(InfoExtractor): class AudiMediaIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?audi-mediacenter\.com/(?:en|de)/audimediatv/(?P<id>[^/?#]+)' _VALID_URL = r'https?://(?:www\.)?audi-mediacenter\.com/(?:en|de)/audimediatv/(?:video/)?(?P<id>[^/?#]+)'
_TEST = { _TESTS = [{
'url': 'https://www.audi-mediacenter.com/en/audimediatv/60-seconds-of-audi-sport-104-2015-wec-bahrain-rookie-test-1467', 'url': 'https://www.audi-mediacenter.com/en/audimediatv/60-seconds-of-audi-sport-104-2015-wec-bahrain-rookie-test-1467',
'md5': '79a8b71c46d49042609795ab59779b66', 'md5': '79a8b71c46d49042609795ab59779b66',
'info_dict': { 'info_dict': {
@ -24,41 +23,46 @@ class AudiMediaIE(InfoExtractor):
'duration': 74022, 'duration': 74022,
'view_count': int, 'view_count': int,
} }
} }, {
# extracted from https://audimedia.tv/assets/embed/embedded-player.js (dataSourceAuthToken) 'url': 'https://www.audi-mediacenter.com/en/audimediatv/video/60-seconds-of-audi-sport-104-2015-wec-bahrain-rookie-test-2991',
_AUTH_TOKEN = 'e25b42847dba18c6c8816d5d8ce94c326e06823ebf0859ed164b3ba169be97f2' 'only_matching': True,
}]
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
raw_payload = self._search_regex([ raw_payload = self._search_regex([
r'class="amtv-embed"[^>]+id="([^"]+)"', r'class="amtv-embed"[^>]+id="([0-9a-z-]+)"',
r'class=\\"amtv-embed\\"[^>]+id=\\"([^"]+)\\"', r'id="([0-9a-z-]+)"[^>]+class="amtv-embed"',
r'class=\\"amtv-embed\\"[^>]+id=\\"([0-9a-z-]+)\\"',
r'id=\\"([0-9a-z-]+)\\"[^>]+class=\\"amtv-embed\\"',
r'id=(?:\\)?"(amtve-[a-z]-\d+-[a-z]{2})',
], webpage, 'raw payload') ], webpage, 'raw payload')
_, stage_mode, video_id, lang = raw_payload.split('-') _, stage_mode, video_id, _ = raw_payload.split('-')
# TODO: handle s and e stage_mode (live streams and ended live streams) # TODO: handle s and e stage_mode (live streams and ended live streams)
if stage_mode not in ('s', 'e'): if stage_mode not in ('s', 'e'):
request = sanitized_Request( video_data = self._download_json(
'https://audimedia.tv/api/video/v1/videos/%s?embed[]=video_versions&embed[]=thumbnail_image&where[content_language_iso]=%s' % (video_id, lang), 'https://www.audimedia.tv/api/video/v1/videos/' + video_id,
headers={'X-Auth-Token': self._AUTH_TOKEN}) video_id, query={
json_data = self._download_json(request, video_id)['results'] 'embed[]': ['video_versions', 'thumbnail_image'],
})['results']
formats = [] formats = []
stream_url_hls = json_data.get('stream_url_hls') stream_url_hls = video_data.get('stream_url_hls')
if stream_url_hls: if stream_url_hls:
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
stream_url_hls, video_id, 'mp4', stream_url_hls, video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls', fatal=False)) entry_protocol='m3u8_native', m3u8_id='hls', fatal=False))
stream_url_hds = json_data.get('stream_url_hds') stream_url_hds = video_data.get('stream_url_hds')
if stream_url_hds: if stream_url_hds:
formats.extend(self._extract_f4m_formats( formats.extend(self._extract_f4m_formats(
stream_url_hds + '?hdcore=3.4.0', stream_url_hds + '?hdcore=3.4.0',
video_id, f4m_id='hds', fatal=False)) video_id, f4m_id='hds', fatal=False))
for video_version in json_data.get('video_versions'): for video_version in video_data.get('video_versions', []):
video_version_url = video_version.get('download_url') or video_version.get('stream_url') video_version_url = video_version.get('download_url') or video_version.get('stream_url')
if not video_version_url: if not video_version_url:
continue continue
@ -79,11 +83,11 @@ class AudiMediaIE(InfoExtractor):
return { return {
'id': video_id, 'id': video_id,
'title': json_data['title'], 'title': video_data['title'],
'description': json_data.get('subtitle'), 'description': video_data.get('subtitle'),
'thumbnail': json_data.get('thumbnail_image', {}).get('file'), 'thumbnail': video_data.get('thumbnail_image', {}).get('file'),
'timestamp': parse_iso8601(json_data.get('publication_date')), 'timestamp': parse_iso8601(video_data.get('publication_date')),
'duration': int_or_none(json_data.get('duration')), 'duration': int_or_none(video_data.get('duration')),
'view_count': int_or_none(json_data.get('view_count')), 'view_count': int_or_none(video_data.get('view_count')),
'formats': formats, 'formats': formats,
} }

View File

@ -65,7 +65,7 @@ class AudiomackIE(InfoExtractor):
return {'_type': 'url', 'url': api_response['url'], 'ie_key': 'Soundcloud'} return {'_type': 'url', 'url': api_response['url'], 'ie_key': 'Soundcloud'}
return { return {
'id': api_response.get('id', album_url_tag), 'id': compat_str(api_response.get('id', album_url_tag)),
'uploader': api_response.get('artist'), 'uploader': api_response.get('artist'),
'title': api_response.get('title'), 'title': api_response.get('title'),
'url': api_response['url'], 'url': api_response['url'],

View File

@ -44,7 +44,7 @@ class BambuserIE(InfoExtractor):
} }
def _login(self): def _login(self):
(username, password) = self._get_login_info() username, password = self._get_login_info()
if username is None: if username is None:
return return

View File

@ -19,6 +19,7 @@ from ..utils import (
unescapeHTML, unescapeHTML,
update_url_query, update_url_query,
unified_strdate, unified_strdate,
url_or_none,
) )
@ -131,8 +132,8 @@ class BandcampIE(InfoExtractor):
fatal=False) fatal=False)
if not stat: if not stat:
continue continue
retry_url = stat.get('retry_url') retry_url = url_or_none(stat.get('retry_url'))
if not isinstance(retry_url, compat_str): if not retry_url:
continue continue
formats.append({ formats.append({
'url': self._proto_relative_url(retry_url, 'http:'), 'url': self._proto_relative_url(retry_url, 'http:'),
@ -306,7 +307,7 @@ class BandcampWeeklyIE(InfoExtractor):
formats = [] formats = []
for format_id, format_url in show['audio_stream'].items(): for format_id, format_url in show['audio_stream'].items():
if not isinstance(format_url, compat_str): if not url_or_none(format_url):
continue continue
for known_ext in KNOWN_EXTENSIONS: for known_ext in KNOWN_EXTENSIONS:
if known_ext in format_id: if known_ext in format_id:

View File

@ -12,6 +12,7 @@ from ..utils import (
float_or_none, float_or_none,
get_element_by_class, get_element_by_class,
int_or_none, int_or_none,
js_to_json,
parse_duration, parse_duration,
parse_iso8601, parse_iso8601,
try_get, try_get,
@ -20,7 +21,6 @@ from ..utils import (
urljoin, urljoin,
) )
from ..compat import ( from ..compat import (
compat_etree_fromstring,
compat_HTTPError, compat_HTTPError,
compat_urlparse, compat_urlparse,
) )
@ -29,7 +29,7 @@ from ..compat import (
class BBCCoUkIE(InfoExtractor): class BBCCoUkIE(InfoExtractor):
IE_NAME = 'bbc.co.uk' IE_NAME = 'bbc.co.uk'
IE_DESC = 'BBC iPlayer' IE_DESC = 'BBC iPlayer'
_ID_REGEX = r'[pbw][\da-z]{7}' _ID_REGEX = r'(?:[pbm][\da-z]{7}|w[\da-z]{7,14})'
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?:// https?://
(?:www\.)?bbc\.co\.uk/ (?:www\.)?bbc\.co\.uk/
@ -236,6 +236,12 @@ class BBCCoUkIE(InfoExtractor):
}, { }, {
'url': 'http://www.bbc.co.uk/programmes/w3csv1y9', 'url': 'http://www.bbc.co.uk/programmes/w3csv1y9',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.bbc.co.uk/programmes/m00005xn',
'only_matching': True,
}, {
'url': 'https://www.bbc.co.uk/programmes/w172w4dww1jqt5s',
'only_matching': True,
}] }]
_USP_RE = r'/([^/]+?)\.ism(?:\.hlsv2\.ism)?/[^/]+\.m3u8' _USP_RE = r'/([^/]+?)\.ism(?:\.hlsv2\.ism)?/[^/]+\.m3u8'
@ -333,14 +339,9 @@ class BBCCoUkIE(InfoExtractor):
self._raise_extractor_error(last_exception) self._raise_extractor_error(last_exception)
def _download_media_selector_url(self, url, programme_id=None): def _download_media_selector_url(self, url, programme_id=None):
try: media_selection = self._download_xml(
media_selection = self._download_xml( url, programme_id, 'Downloading media selection XML',
url, programme_id, 'Downloading media selection XML') expected_status=(403, 404))
except ExtractorError as ee:
if isinstance(ee.cause, compat_HTTPError) and ee.cause.code in (403, 404):
media_selection = compat_etree_fromstring(ee.cause.read().decode('utf-8'))
else:
raise
return self._process_media_selector(media_selection, programme_id) return self._process_media_selector(media_selection, programme_id)
def _process_media_selector(self, media_selection, programme_id): def _process_media_selector(self, media_selection, programme_id):
@ -772,6 +773,28 @@ class BBCIE(BBCCoUkIE):
# single video article embedded with data-media-vpid # single video article embedded with data-media-vpid
'url': 'http://www.bbc.co.uk/sport/rowing/35908187', 'url': 'http://www.bbc.co.uk/sport/rowing/35908187',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.bbc.co.uk/bbcthree/clip/73d0bbd0-abc3-4cea-b3c0-cdae21905eb1',
'info_dict': {
'id': 'p06556y7',
'ext': 'mp4',
'title': 'Transfers: Cristiano Ronaldo to Man Utd, Arsenal to spend?',
'description': 'md5:4b7dfd063d5a789a1512e99662be3ddd',
},
'params': {
'skip_download': True,
}
}, {
# window.__PRELOADED_STATE__
'url': 'https://www.bbc.co.uk/radio/play/b0b9z4yl',
'info_dict': {
'id': 'b0b9z4vz',
'ext': 'mp4',
'title': 'Prom 6: An American in Paris and Turangalila',
'description': 'md5:51cf7d6f5c8553f197e58203bc78dff8',
'uploader': 'Radio 3',
'uploader_id': 'bbc_radio_three',
},
}] }]
@classmethod @classmethod
@ -994,6 +1017,66 @@ class BBCIE(BBCCoUkIE):
'subtitles': subtitles, 'subtitles': subtitles,
} }
preload_state = self._parse_json(self._search_regex(
r'window\.__PRELOADED_STATE__\s*=\s*({.+?});', webpage,
'preload state', default='{}'), playlist_id, fatal=False)
if preload_state:
current_programme = preload_state.get('programmes', {}).get('current') or {}
programme_id = current_programme.get('id')
if current_programme and programme_id and current_programme.get('type') == 'playable_item':
title = current_programme.get('titles', {}).get('tertiary') or playlist_title
formats, subtitles = self._download_media_selector(programme_id)
self._sort_formats(formats)
synopses = current_programme.get('synopses') or {}
network = current_programme.get('network') or {}
duration = int_or_none(
current_programme.get('duration', {}).get('value'))
thumbnail = None
image_url = current_programme.get('image_url')
if image_url:
thumbnail = image_url.replace('{recipe}', '1920x1920')
return {
'id': programme_id,
'title': title,
'description': dict_get(synopses, ('long', 'medium', 'short')),
'thumbnail': thumbnail,
'duration': duration,
'uploader': network.get('short_title'),
'uploader_id': network.get('id'),
'formats': formats,
'subtitles': subtitles,
}
bbc3_config = self._parse_json(
self._search_regex(
r'(?s)bbcthreeConfig\s*=\s*({.+?})\s*;\s*<', webpage,
'bbcthree config', default='{}'),
playlist_id, transform_source=js_to_json, fatal=False)
if bbc3_config:
bbc3_playlist = try_get(
bbc3_config, lambda x: x['payload']['content']['bbcMedia']['playlist'],
dict)
if bbc3_playlist:
playlist_title = bbc3_playlist.get('title') or playlist_title
thumbnail = bbc3_playlist.get('holdingImageURL')
entries = []
for bbc3_item in bbc3_playlist['items']:
programme_id = bbc3_item.get('versionID')
if not programme_id:
continue
formats, subtitles = self._download_media_selector(programme_id)
self._sort_formats(formats)
entries.append({
'id': programme_id,
'title': playlist_title,
'thumbnail': thumbnail,
'timestamp': timestamp,
'formats': formats,
'subtitles': subtitles,
})
return self.playlist_result(
entries, playlist_id, playlist_title, playlist_description)
def extract_all(pattern): def extract_all(pattern):
return list(filter(None, map( return list(filter(None, map(
lambda s: self._parse_json(s, playlist_id, fatal=False), lambda s: self._parse_json(s, playlist_id, fatal=False),

View File

@ -12,7 +12,7 @@ class BellMediaIE(InfoExtractor):
(?: (?:
ctv| ctv|
tsn| tsn|
bnn| bnn(?:bloomberg)?|
thecomedynetwork| thecomedynetwork|
discovery| discovery|
discoveryvelocity| discoveryvelocity|
@ -27,17 +27,16 @@ class BellMediaIE(InfoExtractor):
much\.com much\.com
)/.*?(?:\bvid(?:eoid)?=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6,})''' )/.*?(?:\bvid(?:eoid)?=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6,})'''
_TESTS = [{ _TESTS = [{
'url': 'http://www.ctv.ca/video/player?vid=706966', 'url': 'https://www.bnnbloomberg.ca/video/david-cockfield-s-top-picks~1403070',
'md5': 'ff2ebbeae0aa2dcc32a830c3fd69b7b0', 'md5': '36d3ef559cfe8af8efe15922cd3ce950',
'info_dict': { 'info_dict': {
'id': '706966', 'id': '1403070',
'ext': 'mp4', 'ext': 'flv',
'title': 'Larry Day and Richard Jutras on the TIFF red carpet of \'Stonewall\'', 'title': 'David Cockfield\'s Top Picks',
'description': 'etalk catches up with Larry Day and Richard Jutras on the TIFF red carpet of "Stonewall”.', 'description': 'md5:810f7f8c6a83ad5b48677c3f8e5bb2c3',
'upload_date': '20150919', 'upload_date': '20180525',
'timestamp': 1442624700, 'timestamp': 1527288600,
}, },
'expected_warnings': ['HTTP Error 404'],
}, { }, {
'url': 'http://www.thecomedynetwork.ca/video/player?vid=923582', 'url': 'http://www.thecomedynetwork.ca/video/player?vid=923582',
'only_matching': True, 'only_matching': True,
@ -70,6 +69,7 @@ class BellMediaIE(InfoExtractor):
'investigationdiscovery': 'invdisc', 'investigationdiscovery': 'invdisc',
'animalplanet': 'aniplan', 'animalplanet': 'aniplan',
'etalk': 'ctv', 'etalk': 'ctv',
'bnnbloomberg': 'bnn',
} }
def _real_extract(self, url): def _real_extract(self, url):

View File

@ -114,7 +114,7 @@ class BiliBiliIE(InfoExtractor):
if 'anime/' not in url: if 'anime/' not in url:
cid = self._search_regex( cid = self._search_regex(
r'cid(?:["\']:|=)(\d+)', webpage, 'cid', r'\bcid(?:["\']:|=)(\d+)', webpage, 'cid',
default=None default=None
) or compat_parse_qs(self._search_regex( ) or compat_parse_qs(self._search_regex(
[r'EmbedPlayer\([^)]+,\s*"([^"]+)"\)', [r'EmbedPlayer\([^)]+,\s*"([^"]+)"\)',

View File

@ -0,0 +1,118 @@
# coding: utf-8
from __future__ import unicode_literals
import itertools
import re
from .common import InfoExtractor
from ..utils import urlencode_postdata
class BitChuteIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?bitchute\.com/(?:video|embed|torrent/[^/]+)/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://www.bitchute.com/video/szoMrox2JEI/',
'md5': '66c4a70e6bfc40dcb6be3eb1d74939eb',
'info_dict': {
'id': 'szoMrox2JEI',
'ext': 'mp4',
'title': 'Fuck bitches get money',
'description': 'md5:3f21f6fb5b1d17c3dee9cf6b5fe60b3a',
'thumbnail': r're:^https?://.*\.jpg$',
'uploader': 'Victoria X Rave',
},
}, {
'url': 'https://www.bitchute.com/embed/lbb5G1hjPhw/',
'only_matching': True,
}, {
'url': 'https://www.bitchute.com/torrent/Zee5BE49045h/szoMrox2JEI.webtorrent',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(
'https://www.bitchute.com/video/%s' % video_id, video_id)
title = self._search_regex(
(r'<[^>]+\bid=["\']video-title[^>]+>([^<]+)', r'<title>([^<]+)'),
webpage, 'title', default=None) or self._html_search_meta(
'description', webpage, 'title',
default=None) or self._og_search_description(webpage)
formats = [
{'url': mobj.group('url')}
for mobj in re.finditer(
r'addWebSeed\s*\(\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage)]
self._sort_formats(formats)
description = self._html_search_regex(
r'(?s)<div\b[^>]+\bclass=["\']full hidden[^>]+>(.+?)</div>',
webpage, 'description', fatal=False)
thumbnail = self._og_search_thumbnail(
webpage, default=None) or self._html_search_meta(
'twitter:image:src', webpage, 'thumbnail')
uploader = self._html_search_regex(
r'(?s)<p\b[^>]+\bclass=["\']video-author[^>]+>(.+?)</p>', webpage,
'uploader', fatal=False)
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'uploader': uploader,
'formats': formats,
}
class BitChuteChannelIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?bitchute\.com/channel/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'https://www.bitchute.com/channel/victoriaxrave/',
'playlist_mincount': 185,
'info_dict': {
'id': 'victoriaxrave',
},
}
_TOKEN = 'zyG6tQcGPE5swyAEFLqKUwMuMMuF6IO2DZ6ZDQjGfsL0e4dcTLwqkTTul05Jdve7'
def _entries(self, channel_id):
channel_url = 'https://www.bitchute.com/channel/%s/' % channel_id
offset = 0
for page_num in itertools.count(1):
data = self._download_json(
'%sextend/' % channel_url, channel_id,
'Downloading channel page %d' % page_num,
data=urlencode_postdata({
'csrfmiddlewaretoken': self._TOKEN,
'name': '',
'offset': offset,
}), headers={
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
'Referer': channel_url,
'X-Requested-With': 'XMLHttpRequest',
'Cookie': 'csrftoken=%s' % self._TOKEN,
})
if data.get('success') is False:
break
html = data.get('html')
if not html:
break
video_ids = re.findall(
r'class=["\']channel-videos-image-container[^>]+>\s*<a\b[^>]+\bhref=["\']/video/([^"\'/]+)',
html)
if not video_ids:
break
offset += len(video_ids)
for video_id in video_ids:
yield self.url_result(
'https://www.bitchute.com/video/%s' % video_id,
ie=BitChuteIE.ie_key(), video_id=video_id)
def _real_extract(self, url):
channel_id = self._match_id(url)
return self.playlist_result(
self._entries(channel_id), playlist_id=channel_id)

View File

@ -4,8 +4,10 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from .youtube import YoutubeIE from .youtube import YoutubeIE
from ..compat import compat_str from ..utils import (
from ..utils import int_or_none int_or_none,
url_or_none,
)
class BreakIE(InfoExtractor): class BreakIE(InfoExtractor):
@ -55,8 +57,8 @@ class BreakIE(InfoExtractor):
formats = [] formats = []
for video in content: for video in content:
video_url = video.get('url') video_url = url_or_none(video.get('url'))
if not video_url or not isinstance(video_url, compat_str): if not video_url:
continue continue
bitrate = int_or_none(self._search_regex( bitrate = int_or_none(self._search_regex(
r'(\d+)_kbps', video_url, 'tbr', default=None)) r'(\d+)_kbps', video_url, 'tbr', default=None))

View File

@ -572,7 +572,8 @@ class BrightcoveNewIE(AdobePassIE):
container = source.get('container') container = source.get('container')
ext = mimetype2ext(source.get('type')) ext = mimetype2ext(source.get('type'))
src = source.get('src') src = source.get('src')
if ext == 'ism' or container == 'WVM': # https://support.brightcove.com/playback-api-video-fields-reference#key_systems_object
if ext == 'ism' or container == 'WVM' or source.get('key_systems'):
continue continue
elif ext == 'm3u8' or container == 'M2TS': elif ext == 'm3u8' or container == 'M2TS':
if not src: if not src:
@ -629,6 +630,14 @@ class BrightcoveNewIE(AdobePassIE):
'format_id': build_format_id('rtmp'), 'format_id': build_format_id('rtmp'),
}) })
formats.append(f) formats.append(f)
if not formats:
# for sonyliv.com DRM protected videos
s3_source_url = json_data.get('custom_fields', {}).get('s3sourceurl')
if s3_source_url:
formats.append({
'url': s3_source_url,
'format_id': 'source',
})
errors = json_data.get('errors') errors = json_data.get('errors')
if not formats and errors: if not formats and errors:

View File

@ -0,0 +1,96 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none,
url_or_none,
)
class CamModelsIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?cammodels\.com/cam/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://www.cammodels.com/cam/AutumnKnight/',
'only_matching': True,
}]
def _real_extract(self, url):
user_id = self._match_id(url)
webpage = self._download_webpage(
url, user_id, headers=self.geo_verification_headers())
manifest_root = self._html_search_regex(
r'manifestUrlRoot=([^&\']+)', webpage, 'manifest', default=None)
if not manifest_root:
ERRORS = (
("I'm offline, but let's stay connected", 'This user is currently offline'),
('in a private show', 'This user is in a private show'),
('is currently performing LIVE', 'This model is currently performing live'),
)
for pattern, message in ERRORS:
if pattern in webpage:
error = message
expected = True
break
else:
error = 'Unable to find manifest URL root'
expected = False
raise ExtractorError(error, expected=expected)
manifest = self._download_json(
'%s%s.json' % (manifest_root, user_id), user_id)
formats = []
for format_id, format_dict in manifest['formats'].items():
if not isinstance(format_dict, dict):
continue
encodings = format_dict.get('encodings')
if not isinstance(encodings, list):
continue
vcodec = format_dict.get('videoCodec')
acodec = format_dict.get('audioCodec')
for media in encodings:
if not isinstance(media, dict):
continue
media_url = url_or_none(media.get('location'))
if not media_url:
continue
format_id_list = [format_id]
height = int_or_none(media.get('videoHeight'))
if height is not None:
format_id_list.append('%dp' % height)
f = {
'url': media_url,
'format_id': '-'.join(format_id_list),
'width': int_or_none(media.get('videoWidth')),
'height': height,
'vbr': int_or_none(media.get('videoKbps')),
'abr': int_or_none(media.get('audioKbps')),
'fps': int_or_none(media.get('fps')),
'vcodec': vcodec,
'acodec': acodec,
}
if 'rtmp' in format_id:
f['ext'] = 'flv'
elif 'hls' in format_id:
f.update({
'ext': 'mp4',
# hls skips fragments, preferring rtmp
'preference': -1,
})
else:
continue
formats.append(f)
self._sort_formats(formats)
return {
'id': user_id,
'title': self._live_title(user_id),
'is_live': True,
'formats': formats,
}

View File

@ -0,0 +1,69 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
int_or_none,
unified_timestamp,
)
class CamTubeIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:www|api)\.)?camtube\.co/recordings?/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://camtube.co/recording/minafay-030618-1136-chaturbate-female',
'info_dict': {
'id': '42ad3956-dd5b-445a-8313-803ea6079fac',
'display_id': 'minafay-030618-1136-chaturbate-female',
'ext': 'mp4',
'title': 'minafay-030618-1136-chaturbate-female',
'duration': 1274,
'timestamp': 1528018608,
'upload_date': '20180603',
},
'params': {
'skip_download': True,
},
}]
_API_BASE = 'https://api.camtube.co'
def _real_extract(self, url):
display_id = self._match_id(url)
token = self._download_json(
'%s/rpc/session/new' % self._API_BASE, display_id,
'Downloading session token')['token']
self._set_cookie('api.camtube.co', 'session', token)
video = self._download_json(
'%s/recordings/%s' % (self._API_BASE, display_id), display_id,
headers={'Referer': url})
video_id = video['uuid']
timestamp = unified_timestamp(video.get('createdAt'))
duration = int_or_none(video.get('duration'))
view_count = int_or_none(video.get('viewCount'))
like_count = int_or_none(video.get('likeCount'))
creator = video.get('stageName')
formats = [{
'url': '%s/recordings/%s/manifest.m3u8'
% (self._API_BASE, video_id),
'format_id': 'hls',
'ext': 'mp4',
'protocol': 'm3u8_native',
}]
return {
'id': video_id,
'display_id': display_id,
'title': display_id,
'timestamp': timestamp,
'duration': duration,
'view_count': view_count,
'like_count': like_count,
'creator': creator,
'formats': formats,
}

View File

@ -11,6 +11,7 @@ from ..utils import (
strip_or_none, strip_or_none,
float_or_none, float_or_none,
int_or_none, int_or_none,
merge_dicts,
parse_iso8601, parse_iso8601,
) )
@ -248,9 +249,13 @@ class VrtNUIE(GigyaBaseIE):
webpage, urlh = self._download_webpage_handle(url, display_id) webpage, urlh = self._download_webpage_handle(url, display_id)
title = self._html_search_regex( info = self._search_json_ld(webpage, display_id, default={})
# title is optional here since it may be extracted by extractor
# that is delegated from here
title = strip_or_none(self._html_search_regex(
r'(?ms)<h1 class="content__heading">(.+?)</h1>', r'(?ms)<h1 class="content__heading">(.+?)</h1>',
webpage, 'title').strip() webpage, 'title', default=None))
description = self._html_search_regex( description = self._html_search_regex(
r'(?ms)<div class="content__description">(.+?)</div>', r'(?ms)<div class="content__description">(.+?)</div>',
@ -295,7 +300,7 @@ class VrtNUIE(GigyaBaseIE):
# the first one # the first one
video_id = list(video.values())[0].get('videoid') video_id = list(video.values())[0].get('videoid')
return { return merge_dicts(info, {
'_type': 'url_transparent', '_type': 'url_transparent',
'url': 'https://mediazone.vrt.be/api/v1/vrtvideo/assets/%s' % video_id, 'url': 'https://mediazone.vrt.be/api/v1/vrtvideo/assets/%s' % video_id,
'ie_key': CanvasIE.ie_key(), 'ie_key': CanvasIE.ie_key(),
@ -307,4 +312,4 @@ class VrtNUIE(GigyaBaseIE):
'season_number': season_number, 'season_number': season_number,
'episode_number': episode_number, 'episode_number': episode_number,
'release_date': release_date, 'release_date': release_date,
} })

View File

@ -17,9 +17,11 @@ from ..utils import (
xpath_element, xpath_element,
xpath_with_ns, xpath_with_ns,
find_xpath_attr, find_xpath_attr,
orderedSet,
parse_duration, parse_duration,
parse_iso8601, parse_iso8601,
parse_age_limit, parse_age_limit,
strip_or_none,
int_or_none, int_or_none,
ExtractorError, ExtractorError,
) )
@ -129,15 +131,23 @@ class CBCIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
title = self._og_search_title(webpage, default=None) or self._html_search_meta(
'twitter:title', webpage, 'title', default=None) or self._html_search_regex(
r'<title>([^<]+)</title>', webpage, 'title', fatal=False)
entries = [ entries = [
self._extract_player_init(player_init, display_id) self._extract_player_init(player_init, display_id)
for player_init in re.findall(r'CBC\.APP\.Caffeine\.initInstance\(({.+?})\);', webpage)] for player_init in re.findall(r'CBC\.APP\.Caffeine\.initInstance\(({.+?})\);', webpage)]
media_ids = []
for media_id_re in (
r'<iframe[^>]+src="[^"]+?mediaId=(\d+)"',
r'<div[^>]+\bid=["\']player-(\d+)',
r'guid["\']\s*:\s*["\'](\d+)'):
media_ids.extend(re.findall(media_id_re, webpage))
entries.extend([ entries.extend([
self.url_result('cbcplayer:%s' % media_id, 'CBCPlayer', media_id) self.url_result('cbcplayer:%s' % media_id, 'CBCPlayer', media_id)
for media_id in re.findall(r'<iframe[^>]+src="[^"]+?mediaId=(\d+)"', webpage)]) for media_id in orderedSet(media_ids)])
return self.playlist_result( return self.playlist_result(
entries, display_id, entries, display_id, strip_or_none(title),
self._og_search_title(webpage, fatal=False),
self._og_search_description(webpage)) self._og_search_description(webpage))

View File

@ -4,13 +4,13 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str
from ..utils import ( from ..utils import (
clean_html, clean_html,
int_or_none, int_or_none,
parse_duration, parse_duration,
parse_iso8601, parse_iso8601,
parse_resolution, parse_resolution,
url_or_none,
) )
@ -53,8 +53,8 @@ class CCMAIE(InfoExtractor):
media_url = media['media']['url'] media_url = media['media']['url']
if isinstance(media_url, list): if isinstance(media_url, list):
for format_ in media_url: for format_ in media_url:
format_url = format_.get('file') format_url = url_or_none(format_.get('file'))
if not format_url or not isinstance(format_url, compat_str): if not format_url:
continue continue
label = format_.get('label') label = format_.get('label')
f = parse_resolution(label) f = parse_resolution(label)

View File

@ -108,7 +108,7 @@ class CeskaTelevizeIE(InfoExtractor):
for user_agent in (None, USER_AGENTS['Safari']): for user_agent in (None, USER_AGENTS['Safari']):
req = sanitized_Request( req = sanitized_Request(
'http://www.ceskatelevize.cz/ivysilani/ajax/get-client-playlist', 'https://www.ceskatelevize.cz/ivysilani/ajax/get-client-playlist',
data=urlencode_postdata(data)) data=urlencode_postdata(data))
req.add_header('Content-type', 'application/x-www-form-urlencoded') req.add_header('Content-type', 'application/x-www-form-urlencoded')

View File

@ -31,7 +31,8 @@ class ChaturbateIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(
url, video_id, headers=self.geo_verification_headers())
m3u8_urls = [] m3u8_urls = []

View File

@ -1,15 +1,19 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_urllib_parse_urlparse,
)
from ..utils import ( from ..utils import (
float_or_none, float_or_none,
parse_iso8601, unified_timestamp,
) )
class ClypIE(InfoExtractor): class ClypIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?clyp\.it/(?P<id>[a-z0-9]+)' _VALID_URL = r'https?://(?:www\.)?clyp\.it/(?P<id>[a-z0-9]+)'
_TEST = { _TESTS = [{
'url': 'https://clyp.it/ojz2wfah', 'url': 'https://clyp.it/ojz2wfah',
'md5': '1d4961036c41247ecfdcc439c0cddcbb', 'md5': '1d4961036c41247ecfdcc439c0cddcbb',
'info_dict': { 'info_dict': {
@ -21,13 +25,34 @@ class ClypIE(InfoExtractor):
'timestamp': 1443515251, 'timestamp': 1443515251,
'upload_date': '20150929', 'upload_date': '20150929',
}, },
} }, {
'url': 'https://clyp.it/b04p1odi?token=b0078e077e15835845c528a44417719d',
'info_dict': {
'id': 'b04p1odi',
'ext': 'mp3',
'title': 'GJ! (Reward Edit)',
'description': 'Metal Resistance (THE ONE edition)',
'duration': 177.789,
'timestamp': 1528241278,
'upload_date': '20180605',
},
'params': {
'skip_download': True,
},
}]
def _real_extract(self, url): def _real_extract(self, url):
audio_id = self._match_id(url) audio_id = self._match_id(url)
qs = compat_parse_qs(compat_urllib_parse_urlparse(url).query)
token = qs.get('token', [None])[0]
query = {}
if token:
query['token'] = token
metadata = self._download_json( metadata = self._download_json(
'https://api.clyp.it/%s' % audio_id, audio_id) 'https://api.clyp.it/%s' % audio_id, audio_id, query=query)
formats = [] formats = []
for secure in ('', 'Secure'): for secure in ('', 'Secure'):
@ -45,7 +70,7 @@ class ClypIE(InfoExtractor):
title = metadata['Title'] title = metadata['Title']
description = metadata.get('Description') description = metadata.get('Description')
duration = float_or_none(metadata.get('Duration')) duration = float_or_none(metadata.get('Duration'))
timestamp = parse_iso8601(metadata.get('DateCreated')) timestamp = unified_timestamp(metadata.get('DateCreated'))
return { return {
'id': audio_id, 'id': audio_id,

View File

@ -19,6 +19,7 @@ from ..compat import (
compat_cookies, compat_cookies,
compat_etree_fromstring, compat_etree_fromstring,
compat_getpass, compat_getpass,
compat_integer_types,
compat_http_client, compat_http_client,
compat_os_name, compat_os_name,
compat_str, compat_str,
@ -51,6 +52,7 @@ from ..utils import (
GeoUtils, GeoUtils,
int_or_none, int_or_none,
js_to_json, js_to_json,
JSON_LD_RE,
mimetype2ext, mimetype2ext,
orderedSet, orderedSet,
parse_codecs, parse_codecs,
@ -339,20 +341,17 @@ class InfoExtractor(object):
_GEO_BYPASS attribute may be set to False in order to disable _GEO_BYPASS attribute may be set to False in order to disable
geo restriction bypass mechanisms for a particular extractor. geo restriction bypass mechanisms for a particular extractor.
Though it won't disable explicit geo restriction bypass based on Though it won't disable explicit geo restriction bypass based on
country code provided with geo_bypass_country. (experimental) country code provided with geo_bypass_country.
_GEO_COUNTRIES attribute may contain a list of presumably geo unrestricted _GEO_COUNTRIES attribute may contain a list of presumably geo unrestricted
countries for this extractor. One of these countries will be used by countries for this extractor. One of these countries will be used by
geo restriction bypass mechanism right away in order to bypass geo restriction bypass mechanism right away in order to bypass
geo restriction, of course, if the mechanism is not disabled. (experimental) geo restriction, of course, if the mechanism is not disabled.
_GEO_IP_BLOCKS attribute may contain a list of presumably geo unrestricted _GEO_IP_BLOCKS attribute may contain a list of presumably geo unrestricted
IP blocks in CIDR notation for this extractor. One of these IP blocks IP blocks in CIDR notation for this extractor. One of these IP blocks
will be used by geo restriction bypass mechanism similarly will be used by geo restriction bypass mechanism similarly
to _GEO_COUNTRIES. (experimental) to _GEO_COUNTRIES.
NB: both these geo attributes are experimental and may change in future
or be completely removed.
Finally, the _WORKING attribute should be set to False for broken IEs Finally, the _WORKING attribute should be set to False for broken IEs
in order to warn the users and skip the tests. in order to warn the users and skip the tests.
@ -551,8 +550,26 @@ class InfoExtractor(object):
def IE_NAME(self): def IE_NAME(self):
return compat_str(type(self).__name__[:-2]) return compat_str(type(self).__name__[:-2])
def _request_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True, data=None, headers={}, query={}): @staticmethod
""" Returns the response handle """ def __can_accept_status_code(err, expected_status):
assert isinstance(err, compat_urllib_error.HTTPError)
if expected_status is None:
return False
if isinstance(expected_status, compat_integer_types):
return err.code == expected_status
elif isinstance(expected_status, (list, tuple)):
return err.code in expected_status
elif callable(expected_status):
return expected_status(err.code) is True
else:
assert False
def _request_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True, data=None, headers={}, query={}, expected_status=None):
"""
Return the response handle.
See _download_webpage docstring for arguments specification.
"""
if note is None: if note is None:
self.report_download_webpage(video_id) self.report_download_webpage(video_id)
elif note is not False: elif note is not False:
@ -581,6 +598,10 @@ class InfoExtractor(object):
try: try:
return self._downloader.urlopen(url_or_request) return self._downloader.urlopen(url_or_request)
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err: except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
if isinstance(err, compat_urllib_error.HTTPError):
if self.__can_accept_status_code(err, expected_status):
return err.fp
if errnote is False: if errnote is False:
return False return False
if errnote is None: if errnote is None:
@ -593,13 +614,17 @@ class InfoExtractor(object):
self._downloader.report_warning(errmsg) self._downloader.report_warning(errmsg)
return False return False
def _download_webpage_handle(self, url_or_request, video_id, note=None, errnote=None, fatal=True, encoding=None, data=None, headers={}, query={}): def _download_webpage_handle(self, url_or_request, video_id, note=None, errnote=None, fatal=True, encoding=None, data=None, headers={}, query={}, expected_status=None):
""" Returns a tuple (page content as string, URL handle) """ """
Return a tuple (page content as string, URL handle).
See _download_webpage docstring for arguments specification.
"""
# Strip hashes from the URL (#1038) # Strip hashes from the URL (#1038)
if isinstance(url_or_request, (compat_str, str)): if isinstance(url_or_request, (compat_str, str)):
url_or_request = url_or_request.partition('#')[0] url_or_request = url_or_request.partition('#')[0]
urlh = self._request_webpage(url_or_request, video_id, note, errnote, fatal, data=data, headers=headers, query=query) urlh = self._request_webpage(url_or_request, video_id, note, errnote, fatal, data=data, headers=headers, query=query, expected_status=expected_status)
if urlh is False: if urlh is False:
assert not fatal assert not fatal
return False return False
@ -688,13 +713,52 @@ class InfoExtractor(object):
return content return content
def _download_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True, tries=1, timeout=5, encoding=None, data=None, headers={}, query={}): def _download_webpage(
""" Returns the data of the page as a string """ self, url_or_request, video_id, note=None, errnote=None,
fatal=True, tries=1, timeout=5, encoding=None, data=None,
headers={}, query={}, expected_status=None):
"""
Return the data of the page as a string.
Arguments:
url_or_request -- plain text URL as a string or
a compat_urllib_request.Requestobject
video_id -- Video/playlist/item identifier (string)
Keyword arguments:
note -- note printed before downloading (string)
errnote -- note printed in case of an error (string)
fatal -- flag denoting whether error should be considered fatal,
i.e. whether it should cause ExtractionError to be raised,
otherwise a warning will be reported and extraction continued
tries -- number of tries
timeout -- sleep interval between tries
encoding -- encoding for a page content decoding, guessed automatically
when not explicitly specified
data -- POST data (bytes)
headers -- HTTP headers (dict)
query -- URL query (dict)
expected_status -- allows to accept failed HTTP requests (non 2xx
status code) by explicitly specifying a set of accepted status
codes. Can be any of the following entities:
- an integer type specifying an exact failed status code to
accept
- a list or a tuple of integer types specifying a list of
failed status codes to accept
- a callable accepting an actual failed status code and
returning True if it should be accepted
Note that this argument does not affect success status codes (2xx)
which are always accepted.
"""
success = False success = False
try_count = 0 try_count = 0
while success is False: while success is False:
try: try:
res = self._download_webpage_handle(url_or_request, video_id, note, errnote, fatal, encoding=encoding, data=data, headers=headers, query=query) res = self._download_webpage_handle(
url_or_request, video_id, note, errnote, fatal,
encoding=encoding, data=data, headers=headers, query=query,
expected_status=expected_status)
success = True success = True
except compat_http_client.IncompleteRead as e: except compat_http_client.IncompleteRead as e:
try_count += 1 try_count += 1
@ -710,11 +774,17 @@ class InfoExtractor(object):
def _download_xml_handle( def _download_xml_handle(
self, url_or_request, video_id, note='Downloading XML', self, url_or_request, video_id, note='Downloading XML',
errnote='Unable to download XML', transform_source=None, errnote='Unable to download XML', transform_source=None,
fatal=True, encoding=None, data=None, headers={}, query={}): fatal=True, encoding=None, data=None, headers={}, query={},
"""Return a tuple (xml as an xml.etree.ElementTree.Element, URL handle)""" expected_status=None):
"""
Return a tuple (xml as an xml.etree.ElementTree.Element, URL handle).
See _download_webpage docstring for arguments specification.
"""
res = self._download_webpage_handle( res = self._download_webpage_handle(
url_or_request, video_id, note, errnote, fatal=fatal, url_or_request, video_id, note, errnote, fatal=fatal,
encoding=encoding, data=data, headers=headers, query=query) encoding=encoding, data=data, headers=headers, query=query,
expected_status=expected_status)
if res is False: if res is False:
return res return res
xml_string, urlh = res xml_string, urlh = res
@ -722,15 +792,21 @@ class InfoExtractor(object):
xml_string, video_id, transform_source=transform_source, xml_string, video_id, transform_source=transform_source,
fatal=fatal), urlh fatal=fatal), urlh
def _download_xml(self, url_or_request, video_id, def _download_xml(
note='Downloading XML', errnote='Unable to download XML', self, url_or_request, video_id,
transform_source=None, fatal=True, encoding=None, note='Downloading XML', errnote='Unable to download XML',
data=None, headers={}, query={}): transform_source=None, fatal=True, encoding=None,
"""Return the xml as an xml.etree.ElementTree.Element""" data=None, headers={}, query={}, expected_status=None):
"""
Return the xml as an xml.etree.ElementTree.Element.
See _download_webpage docstring for arguments specification.
"""
res = self._download_xml_handle( res = self._download_xml_handle(
url_or_request, video_id, note=note, errnote=errnote, url_or_request, video_id, note=note, errnote=errnote,
transform_source=transform_source, fatal=fatal, encoding=encoding, transform_source=transform_source, fatal=fatal, encoding=encoding,
data=data, headers=headers, query=query) data=data, headers=headers, query=query,
expected_status=expected_status)
return res if res is False else res[0] return res if res is False else res[0]
def _parse_xml(self, xml_string, video_id, transform_source=None, fatal=True): def _parse_xml(self, xml_string, video_id, transform_source=None, fatal=True):
@ -748,11 +824,17 @@ class InfoExtractor(object):
def _download_json_handle( def _download_json_handle(
self, url_or_request, video_id, note='Downloading JSON metadata', self, url_or_request, video_id, note='Downloading JSON metadata',
errnote='Unable to download JSON metadata', transform_source=None, errnote='Unable to download JSON metadata', transform_source=None,
fatal=True, encoding=None, data=None, headers={}, query={}): fatal=True, encoding=None, data=None, headers={}, query={},
"""Return a tuple (JSON object, URL handle)""" expected_status=None):
"""
Return a tuple (JSON object, URL handle).
See _download_webpage docstring for arguments specification.
"""
res = self._download_webpage_handle( res = self._download_webpage_handle(
url_or_request, video_id, note, errnote, fatal=fatal, url_or_request, video_id, note, errnote, fatal=fatal,
encoding=encoding, data=data, headers=headers, query=query) encoding=encoding, data=data, headers=headers, query=query,
expected_status=expected_status)
if res is False: if res is False:
return res return res
json_string, urlh = res json_string, urlh = res
@ -763,11 +845,18 @@ class InfoExtractor(object):
def _download_json( def _download_json(
self, url_or_request, video_id, note='Downloading JSON metadata', self, url_or_request, video_id, note='Downloading JSON metadata',
errnote='Unable to download JSON metadata', transform_source=None, errnote='Unable to download JSON metadata', transform_source=None,
fatal=True, encoding=None, data=None, headers={}, query={}): fatal=True, encoding=None, data=None, headers={}, query={},
expected_status=None):
"""
Return the JSON object as a dict.
See _download_webpage docstring for arguments specification.
"""
res = self._download_json_handle( res = self._download_json_handle(
url_or_request, video_id, note=note, errnote=errnote, url_or_request, video_id, note=note, errnote=errnote,
transform_source=transform_source, fatal=fatal, encoding=encoding, transform_source=transform_source, fatal=fatal, encoding=encoding,
data=data, headers=headers, query=query) data=data, headers=headers, query=query,
expected_status=expected_status)
return res if res is False else res[0] return res if res is False else res[0]
def _parse_json(self, json_string, video_id, transform_source=None, fatal=True): def _parse_json(self, json_string, video_id, transform_source=None, fatal=True):
@ -1058,8 +1147,7 @@ class InfoExtractor(object):
def _search_json_ld(self, html, video_id, expected_type=None, **kwargs): def _search_json_ld(self, html, video_id, expected_type=None, **kwargs):
json_ld = self._search_regex( json_ld = self._search_regex(
r'(?s)<script[^>]+type=(["\'])application/ld\+json\1[^>]*>(?P<json_ld>.+?)</script>', JSON_LD_RE, html, 'JSON-LD', group='json_ld', **kwargs)
html, 'JSON-LD', group='json_ld', **kwargs)
default = kwargs.get('default', NO_DEFAULT) default = kwargs.get('default', NO_DEFAULT)
if not json_ld: if not json_ld:
return default if default is not NO_DEFAULT else {} return default if default is not NO_DEFAULT else {}
@ -1768,9 +1856,7 @@ class InfoExtractor(object):
'height': height, 'height': height,
}) })
formats.extend(m3u8_formats) formats.extend(m3u8_formats)
continue elif src_ext == 'f4m':
if src_ext == 'f4m':
f4m_url = src_url f4m_url = src_url
if not f4m_params: if not f4m_params:
f4m_params = { f4m_params = {
@ -1780,9 +1866,13 @@ class InfoExtractor(object):
f4m_url += '&' if '?' in f4m_url else '?' f4m_url += '&' if '?' in f4m_url else '?'
f4m_url += compat_urllib_parse_urlencode(f4m_params) f4m_url += compat_urllib_parse_urlencode(f4m_params)
formats.extend(self._extract_f4m_formats(f4m_url, video_id, f4m_id='hds', fatal=False)) formats.extend(self._extract_f4m_formats(f4m_url, video_id, f4m_id='hds', fatal=False))
continue elif src_ext == 'mpd':
formats.extend(self._extract_mpd_formats(
if src_url.startswith('http') and self._is_valid_url(src, video_id): src_url, video_id, mpd_id='dash', fatal=False))
elif re.search(r'\.ism/[Mm]anifest', src_url):
formats.extend(self._extract_ism_formats(
src_url, video_id, ism_id='mss', fatal=False))
elif src_url.startswith('http') and self._is_valid_url(src, video_id):
http_count += 1 http_count += 1
formats.append({ formats.append({
'url': src_url, 'url': src_url,
@ -1793,7 +1883,6 @@ class InfoExtractor(object):
'width': width, 'width': width,
'height': height, 'height': height,
}) })
continue
return formats return formats
@ -2015,7 +2104,21 @@ class InfoExtractor(object):
representation_ms_info = extract_multisegment_info(representation, adaption_set_ms_info) representation_ms_info = extract_multisegment_info(representation, adaption_set_ms_info)
def prepare_template(template_name, identifiers): def prepare_template(template_name, identifiers):
t = representation_ms_info[template_name] tmpl = representation_ms_info[template_name]
# First of, % characters outside $...$ templates
# must be escaped by doubling for proper processing
# by % operator string formatting used further (see
# https://github.com/rg3/youtube-dl/issues/16867).
t = ''
in_template = False
for c in tmpl:
t += c
if c == '$':
in_template = not in_template
elif c == '%' and not in_template:
t += c
# Next, $...$ templates are translated to their
# %(...) counterparts to be used with % operator
t = t.replace('$RepresentationID$', representation_id) t = t.replace('$RepresentationID$', representation_id)
t = re.sub(r'\$(%s)\$' % '|'.join(identifiers), r'%(\1)d', t) t = re.sub(r'\$(%s)\$' % '|'.join(identifiers), r'%(\1)d', t)
t = re.sub(r'\$(%s)%%([^$]+)\$' % '|'.join(identifiers), r'%(\1)\2', t) t = re.sub(r'\$(%s)%%([^$]+)\$' % '|'.join(identifiers), r'%(\1)\2', t)
@ -2346,6 +2449,8 @@ class InfoExtractor(object):
media_info['subtitles'].setdefault(lang, []).append({ media_info['subtitles'].setdefault(lang, []).append({
'url': absolute_url(src), 'url': absolute_url(src),
}) })
for f in media_info['formats']:
f.setdefault('http_headers', {})['Referer'] = base_url
if media_info['formats'] or media_info['subtitles']: if media_info['formats'] or media_info['subtitles']:
entries.append(media_info) entries.append(media_info)
return entries return entries

View File

@ -4,23 +4,21 @@ from __future__ import unicode_literals, division
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import compat_HTTPError
compat_str,
compat_HTTPError,
)
from ..utils import ( from ..utils import (
determine_ext, determine_ext,
float_or_none, float_or_none,
int_or_none, int_or_none,
parse_age_limit, parse_age_limit,
parse_duration, parse_duration,
url_or_none,
ExtractorError ExtractorError
) )
class CrackleIE(InfoExtractor): class CrackleIE(InfoExtractor):
_VALID_URL = r'(?:crackle:|https?://(?:(?:www|m)\.)?crackle\.com/(?:playlist/\d+/|(?:[^/]+/)+))(?P<id>\d+)' _VALID_URL = r'(?:crackle:|https?://(?:(?:www|m)\.)?(?:sony)?crackle\.com/(?:playlist/\d+/|(?:[^/]+/)+))(?P<id>\d+)'
_TEST = { _TESTS = [{
# geo restricted to CA # geo restricted to CA
'url': 'https://www.crackle.com/andromeda/2502343', 'url': 'https://www.crackle.com/andromeda/2502343',
'info_dict': { 'info_dict': {
@ -45,7 +43,10 @@ class CrackleIE(InfoExtractor):
# m3u8 download # m3u8 download
'skip_download': True, 'skip_download': True,
} }
} }, {
'url': 'https://www.sonycrackle.com/andromeda/2502343',
'only_matching': True,
}]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
@ -83,8 +84,8 @@ class CrackleIE(InfoExtractor):
for e in media['MediaURLs']: for e in media['MediaURLs']:
if e.get('UseDRM') is True: if e.get('UseDRM') is True:
continue continue
format_url = e.get('Path') format_url = url_or_none(e.get('Path'))
if not format_url or not isinstance(format_url, compat_str): if not format_url:
continue continue
ext = determine_ext(format_url) ext = determine_ext(format_url)
if ext == 'm3u8': if ext == 'm3u8':
@ -121,8 +122,8 @@ class CrackleIE(InfoExtractor):
for cc_file in cc_files: for cc_file in cc_files:
if not isinstance(cc_file, dict): if not isinstance(cc_file, dict):
continue continue
cc_url = cc_file.get('Path') cc_url = url_or_none(cc_file.get('Path'))
if not cc_url or not isinstance(cc_url, compat_str): if not cc_url:
continue continue
lang = cc_file.get('Locale') or 'en' lang = cc_file.get('Locale') or 'en'
subtitles.setdefault(lang, []).append({'url': cc_url}) subtitles.setdefault(lang, []).append({'url': cc_url})

View File

@ -49,7 +49,7 @@ class CrunchyrollBaseIE(InfoExtractor):
}) })
def _login(self): def _login(self):
(username, password) = self._get_login_info() username, password = self._get_login_info()
if username is None: if username is None:
return return
@ -262,6 +262,9 @@ class CrunchyrollIE(CrunchyrollBaseIE):
# Just test metadata extraction # Just test metadata extraction
'skip_download': True, 'skip_download': True,
}, },
}, {
'url': 'http://www.crunchyroll.com/media-723735',
'only_matching': True,
}] }]
_FORMAT_IDS = { _FORMAT_IDS = {
@ -580,7 +583,7 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
class CrunchyrollShowPlaylistIE(CrunchyrollBaseIE): class CrunchyrollShowPlaylistIE(CrunchyrollBaseIE):
IE_NAME = 'crunchyroll:playlist' IE_NAME = 'crunchyroll:playlist'
_VALID_URL = r'https?://(?:(?P<prefix>www|m)\.)?(?P<url>crunchyroll\.com/(?!(?:news|anime-news|library|forum|launchcalendar|lineup|store|comics|freetrial|login))(?P<id>[\w\-]+))/?(?:\?|$)' _VALID_URL = r'https?://(?:(?P<prefix>www|m)\.)?(?P<url>crunchyroll\.com/(?!(?:news|anime-news|library|forum|launchcalendar|lineup|store|comics|freetrial|login|media-\d+))(?P<id>[\w\-]+))/?(?:\?|$)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.crunchyroll.com/a-bridge-to-the-starry-skies-hoshizora-e-kakaru-hashi', 'url': 'http://www.crunchyroll.com/a-bridge-to-the-starry-skies-hoshizora-e-kakaru-hashi',

View File

@ -11,10 +11,10 @@ class CTVNewsIE(InfoExtractor):
_VALID_URL = r'https?://(?:.+?\.)?ctvnews\.ca/(?:video\?(?:clip|playlist|bin)Id=|.*?)(?P<id>[0-9.]+)' _VALID_URL = r'https?://(?:.+?\.)?ctvnews\.ca/(?:video\?(?:clip|playlist|bin)Id=|.*?)(?P<id>[0-9.]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.ctvnews.ca/video?clipId=901995', 'url': 'http://www.ctvnews.ca/video?clipId=901995',
'md5': '10deb320dc0ccb8d01d34d12fc2ea672', 'md5': '9b8624ba66351a23e0b6e1391971f9af',
'info_dict': { 'info_dict': {
'id': '901995', 'id': '901995',
'ext': 'mp4', 'ext': 'flv',
'title': 'Extended: \'That person cannot be me\' Johnson says', 'title': 'Extended: \'That person cannot be me\' Johnson says',
'description': 'md5:958dd3b4f5bbbf0ed4d045c790d89285', 'description': 'md5:958dd3b4f5bbbf0ed4d045c790d89285',
'timestamp': 1467286284, 'timestamp': 1467286284,

View File

@ -35,7 +35,7 @@ class CuriosityStreamBaseIE(InfoExtractor):
return result['data'] return result['data']
def _real_initialize(self): def _real_initialize(self):
(email, password) = self._get_login_info() email, password = self._get_login_info()
if email is None: if email is None:
return return
result = self._download_json( result = self._download_json(

View File

@ -4,7 +4,10 @@ from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
parse_age_limit,
parse_iso8601, parse_iso8601,
smuggle_url,
str_or_none,
) )
@ -40,10 +43,15 @@ class CWTVIE(InfoExtractor):
'duration': 1263, 'duration': 1263,
'series': 'Whose Line Is It Anyway?', 'series': 'Whose Line Is It Anyway?',
'season_number': 11, 'season_number': 11,
'season': '11',
'episode_number': 20, 'episode_number': 20,
'upload_date': '20151006', 'upload_date': '20151006',
'timestamp': 1444107300, 'timestamp': 1444107300,
'age_limit': 14,
'uploader': 'CWTV',
},
'params': {
# m3u8 download
'skip_download': True,
}, },
}, { }, {
'url': 'http://cwtv.com/thecw/chroniclesofcisco/?play=8adebe35-f447-465f-ab52-e863506ff6d6', 'url': 'http://cwtv.com/thecw/chroniclesofcisco/?play=8adebe35-f447-465f-ab52-e863506ff6d6',
@ -58,60 +66,28 @@ class CWTVIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
video_data = None video_data = self._download_json(
formats = [] 'http://images.cwtv.com/feed/mobileapp/video-meta/apiversion_8/guid_' + video_id,
for partner in (154, 213): video_id)['video']
vdata = self._download_json( title = video_data['title']
'http://metaframe.digitalsmiths.tv/v2/CWtv/assets/%s/partner/%d?format=json' % (video_id, partner), video_id, fatal=False) mpx_url = video_data.get('mpx_url') or 'http://link.theplatform.com/s/cwtv/media/guid/2703454149/%s?formats=M3U' % video_id
if not vdata:
continue
video_data = vdata
for quality, quality_data in vdata.get('videos', {}).items():
quality_url = quality_data.get('uri')
if not quality_url:
continue
if quality == 'variantplaylist':
formats.extend(self._extract_m3u8_formats(
quality_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
else:
tbr = int_or_none(quality_data.get('bitrate'))
format_id = 'http' + ('-%d' % tbr if tbr else '')
if self._is_valid_url(quality_url, video_id, format_id):
formats.append({
'format_id': format_id,
'url': quality_url,
'tbr': tbr,
})
video_metadata = video_data['assetFields']
ism_url = video_metadata.get('smoothStreamingUrl')
if ism_url:
formats.extend(self._extract_ism_formats(
ism_url, video_id, ism_id='mss', fatal=False))
self._sort_formats(formats)
thumbnails = [{ season = str_or_none(video_data.get('season'))
'url': image['uri'], episode = str_or_none(video_data.get('episode'))
'width': image.get('width'), if episode and season:
'height': image.get('height'), episode = episode.lstrip(season)
} for image_id, image in video_data['images'].items() if image.get('uri')] if video_data.get('images') else None
subtitles = {
'en': [{
'url': video_metadata['UnicornCcUrl'],
}],
} if video_metadata.get('UnicornCcUrl') else None
return { return {
'_type': 'url_transparent',
'id': video_id, 'id': video_id,
'title': video_metadata['title'], 'title': title,
'description': video_metadata.get('description'), 'url': smuggle_url(mpx_url, {'force_smil_url': True}),
'duration': int_or_none(video_metadata.get('duration')), 'description': video_data.get('description_long'),
'series': video_metadata.get('seriesName'), 'duration': int_or_none(video_data.get('duration_secs')),
'season_number': int_or_none(video_metadata.get('seasonNumber')), 'series': video_data.get('series_name'),
'season': video_metadata.get('seasonName'), 'season_number': int_or_none(season),
'episode_number': int_or_none(video_metadata.get('episodeNumber')), 'episode_number': int_or_none(episode),
'timestamp': parse_iso8601(video_data.get('startTime')), 'timestamp': parse_iso8601(video_data.get('start_time')),
'thumbnails': thumbnails, 'age_limit': parse_age_limit(video_data.get('rating')),
'formats': formats, 'ie_key': 'ThePlatform',
'subtitles': subtitles,
} }

View File

@ -1,22 +1,29 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re import base64
import json import functools
import hashlib
import itertools import itertools
import json
import random
import re
import string
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_struct_pack
from ..utils import ( from ..utils import (
determine_ext, determine_ext,
error_to_compat_str, error_to_compat_str,
ExtractorError, ExtractorError,
int_or_none, int_or_none,
mimetype2ext,
OnDemandPagedList,
parse_iso8601, parse_iso8601,
sanitized_Request, sanitized_Request,
str_to_int, str_to_int,
unescapeHTML, unescapeHTML,
mimetype2ext, urlencode_postdata,
) )
@ -64,7 +71,6 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
'uploader': 'Deadline', 'uploader': 'Deadline',
'uploader_id': 'x1xm8ri', 'uploader_id': 'x1xm8ri',
'age_limit': 0, 'age_limit': 0,
'view_count': int,
}, },
}, { }, {
'url': 'https://www.dailymotion.com/video/x2iuewm_steam-machine-models-pricing-listed-on-steam-store-ign-news_videogames', 'url': 'https://www.dailymotion.com/video/x2iuewm_steam-machine-models-pricing-listed-on-steam-store-ign-news_videogames',
@ -141,7 +147,8 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
age_limit = self._rta_search(webpage) age_limit = self._rta_search(webpage)
description = self._og_search_description(webpage) or self._html_search_meta( description = self._og_search_description(
webpage, default=None) or self._html_search_meta(
'description', webpage, 'description') 'description', webpage, 'description')
view_count_str = self._search_regex( view_count_str = self._search_regex(
@ -167,6 +174,17 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
player = self._parse_json(player_v5, video_id) player = self._parse_json(player_v5, video_id)
metadata = player['metadata'] metadata = player['metadata']
if metadata.get('error', {}).get('type') == 'password_protected':
password = self._downloader.params.get('videopassword')
if password:
r = int(metadata['id'][1:], 36)
us64e = lambda x: base64.urlsafe_b64encode(x).decode().strip('=')
t = ''.join(random.choice(string.ascii_letters) for i in range(10))
n = us64e(compat_struct_pack('I', r))
i = us64e(hashlib.md5(('%s%d%s' % (password, r, t)).encode()).digest())
metadata = self._download_json(
'http://www.dailymotion.com/player/metadata/video/p' + i + t + n, video_id)
self._check_error(metadata) self._check_error(metadata)
formats = [] formats = []
@ -180,9 +198,12 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
continue continue
ext = mimetype2ext(type_) or determine_ext(media_url) ext = mimetype2ext(type_) or determine_ext(media_url)
if ext == 'm3u8': if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats( m3u8_formats = self._extract_m3u8_formats(
media_url, video_id, 'mp4', preference=-1, media_url, video_id, 'mp4', preference=-1,
m3u8_id='hls', fatal=False)) m3u8_id='hls', fatal=False)
for f in m3u8_formats:
f['url'] = f['url'].split('#')[0]
formats.append(f)
elif ext == 'f4m': elif ext == 'f4m':
formats.extend(self._extract_f4m_formats( formats.extend(self._extract_f4m_formats(
media_url, video_id, preference=-1, f4m_id='hds', fatal=False)) media_url, video_id, preference=-1, f4m_id='hds', fatal=False))
@ -299,8 +320,8 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
def _check_error(self, info): def _check_error(self, info):
error = info.get('error') error = info.get('error')
if info.get('error') is not None: if error:
title = error['title'] title = error.get('title') or error['message']
# See https://developer.dailymotion.com/api#access-error # See https://developer.dailymotion.com/api#access-error
if error.get('code') == 'DM007': if error.get('code') == 'DM007':
self.raise_geo_restricted(msg=title) self.raise_geo_restricted(msg=title)
@ -325,17 +346,93 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
class DailymotionPlaylistIE(DailymotionBaseInfoExtractor): class DailymotionPlaylistIE(DailymotionBaseInfoExtractor):
IE_NAME = 'dailymotion:playlist' IE_NAME = 'dailymotion:playlist'
_VALID_URL = r'(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/playlist/(?P<id>[^/?#&]+)' _VALID_URL = r'(?:https?://)?(?:www\.)?dailymotion\.[a-z]{2,3}/playlist/(?P<id>x[0-9a-z]+)'
_MORE_PAGES_INDICATOR = r'(?s)<div class="pages[^"]*">.*?<a\s+class="[^"]*?icon-arrow_right[^"]*?"'
_PAGE_TEMPLATE = 'https://www.dailymotion.com/playlist/%s/%s'
_TESTS = [{ _TESTS = [{
'url': 'http://www.dailymotion.com/playlist/xv4bw_nqtv_sport/1#video=xl8v3q', 'url': 'http://www.dailymotion.com/playlist/xv4bw_nqtv_sport/1#video=xl8v3q',
'info_dict': { 'info_dict': {
'title': 'SPORT', 'title': 'SPORT',
'id': 'xv4bw_nqtv_sport', 'id': 'xv4bw',
}, },
'playlist_mincount': 20, 'playlist_mincount': 20,
}] }]
_PAGE_SIZE = 100
def _fetch_page(self, playlist_id, authorizaion, page):
page += 1
videos = self._download_json(
'https://graphql.api.dailymotion.com',
playlist_id, 'Downloading page %d' % page,
data=json.dumps({
'query': '''{
collection(xid: "%s") {
videos(first: %d, page: %d) {
pageInfo {
hasNextPage
nextPage
}
edges {
node {
xid
url
}
}
}
}
}''' % (playlist_id, self._PAGE_SIZE, page)
}).encode(), headers={
'Authorization': authorizaion,
'Origin': 'https://www.dailymotion.com',
})['data']['collection']['videos']
for edge in videos['edges']:
node = edge['node']
yield self.url_result(
node['url'], DailymotionIE.ie_key(), node['xid'])
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
api = self._parse_json(self._search_regex(
r'__PLAYER_CONFIG__\s*=\s*({.+?});',
webpage, 'player config'), playlist_id)['context']['api']
auth = self._download_json(
api.get('auth_url', 'https://graphql.api.dailymotion.com/oauth/token'),
playlist_id, data=urlencode_postdata({
'client_id': api.get('client_id', 'f1a362d288c1b98099c7'),
'client_secret': api.get('client_secret', 'eea605b96e01c796ff369935357eca920c5da4c5'),
'grant_type': 'client_credentials',
}))
authorizaion = '%s %s' % (auth.get('token_type', 'Bearer'), auth['access_token'])
entries = OnDemandPagedList(functools.partial(
self._fetch_page, playlist_id, authorizaion), self._PAGE_SIZE)
return self.playlist_result(
entries, playlist_id,
self._og_search_title(webpage))
class DailymotionUserIE(DailymotionBaseInfoExtractor):
IE_NAME = 'dailymotion:user'
_VALID_URL = r'https?://(?:www\.)?dailymotion\.[a-z]{2,3}/(?!(?:embed|swf|#|video|playlist)/)(?:(?:old/)?user/)?(?P<user>[^/]+)'
_MORE_PAGES_INDICATOR = r'(?s)<div class="pages[^"]*">.*?<a\s+class="[^"]*?icon-arrow_right[^"]*?"'
_PAGE_TEMPLATE = 'http://www.dailymotion.com/user/%s/%s'
_TESTS = [{
'url': 'https://www.dailymotion.com/user/nqtv',
'info_dict': {
'id': 'nqtv',
'title': 'Rémi Gaillard',
},
'playlist_mincount': 100,
}, {
'url': 'http://www.dailymotion.com/user/UnderProject',
'info_dict': {
'id': 'UnderProject',
'title': 'UnderProject',
},
'playlist_mincount': 1800,
'expected_warnings': [
'Stopped at duplicated page',
],
'skip': 'Takes too long time',
}]
def _extract_entries(self, id): def _extract_entries(self, id):
video_ids = set() video_ids = set()
@ -361,43 +458,6 @@ class DailymotionPlaylistIE(DailymotionBaseInfoExtractor):
if re.search(self._MORE_PAGES_INDICATOR, webpage) is None: if re.search(self._MORE_PAGES_INDICATOR, webpage) is None:
break break
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
playlist_id = mobj.group('id')
webpage = self._download_webpage(url, playlist_id)
return {
'_type': 'playlist',
'id': playlist_id,
'title': self._og_search_title(webpage),
'entries': self._extract_entries(playlist_id),
}
class DailymotionUserIE(DailymotionPlaylistIE):
IE_NAME = 'dailymotion:user'
_VALID_URL = r'https?://(?:www\.)?dailymotion\.[a-z]{2,3}/(?!(?:embed|swf|#|video|playlist)/)(?:(?:old/)?user/)?(?P<user>[^/]+)'
_PAGE_TEMPLATE = 'http://www.dailymotion.com/user/%s/%s'
_TESTS = [{
'url': 'https://www.dailymotion.com/user/nqtv',
'info_dict': {
'id': 'nqtv',
'title': 'Rémi Gaillard',
},
'playlist_mincount': 100,
}, {
'url': 'http://www.dailymotion.com/user/UnderProject',
'info_dict': {
'id': 'UnderProject',
'title': 'UnderProject',
},
'playlist_mincount': 1800,
'expected_warnings': [
'Stopped at duplicated page',
],
'skip': 'Takes too long time',
}]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
user = mobj.group('user') user = mobj.group('user')

View File

@ -5,13 +5,16 @@ from .common import InfoExtractor
from ..compat import compat_str from ..compat import compat_str
from ..utils import ( from ..utils import (
float_or_none, float_or_none,
unified_strdate, int_or_none,
unified_timestamp,
url_or_none,
) )
class DctpTvIE(InfoExtractor): class DctpTvIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?dctp\.tv/(?:#/)?filme/(?P<id>[^/?#&]+)' _VALID_URL = r'https?://(?:www\.)?dctp\.tv/(?:#/)?filme/(?P<id>[^/?#&]+)'
_TEST = { _TESTS = [{
# 4x3
'url': 'http://www.dctp.tv/filme/videoinstallation-fuer-eine-kaufhausfassade/', 'url': 'http://www.dctp.tv/filme/videoinstallation-fuer-eine-kaufhausfassade/',
'info_dict': { 'info_dict': {
'id': '95eaa4f33dad413aa17b4ee613cccc6c', 'id': '95eaa4f33dad413aa17b4ee613cccc6c',
@ -19,37 +22,55 @@ class DctpTvIE(InfoExtractor):
'ext': 'flv', 'ext': 'flv',
'title': 'Videoinstallation für eine Kaufhausfassade', 'title': 'Videoinstallation für eine Kaufhausfassade',
'description': 'Kurzfilm', 'description': 'Kurzfilm',
'upload_date': '20110407',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
'duration': 71.24, 'duration': 71.24,
'timestamp': 1302172322,
'upload_date': '20110407',
}, },
'params': { 'params': {
# rtmp download # rtmp download
'skip_download': True, 'skip_download': True,
}, },
} }, {
# 16x9
'url': 'http://www.dctp.tv/filme/sind-youtuber-die-besseren-lehrer/',
'only_matching': True,
}]
_BASE_URL = 'http://dctp-ivms2-restapi.s3.amazonaws.com'
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id) version = self._download_json(
'%s/version.json' % self._BASE_URL, display_id,
'Downloading version JSON')
video_id = self._html_search_meta( restapi_base = '%s/%s/restapi' % (
'DC.identifier', webpage, 'video id', self._BASE_URL, version['version_name'])
default=None) or self._search_regex(
r'id=["\']uuid[^>]+>([^<]+)<', webpage, 'video id')
title = self._og_search_title(webpage) info = self._download_json(
'%s/slugs/%s.json' % (restapi_base, display_id), display_id,
'Downloading video info JSON')
media = self._download_json(
'%s/media/%s.json' % (restapi_base, compat_str(info['object_id'])),
display_id, 'Downloading media JSON')
uuid = media['uuid']
title = media['title']
ratio = '16x9' if media.get('is_wide') else '4x3'
play_path = 'mp4:%s_dctp_0500_%s.m4v' % (uuid, ratio)
servers = self._download_json( servers = self._download_json(
'http://www.dctp.tv/streaming_servers/', display_id, 'http://www.dctp.tv/streaming_servers/', display_id,
note='Downloading server list', fatal=False) note='Downloading server list JSON', fatal=False)
if servers: if servers:
endpoint = next( endpoint = next(
server['endpoint'] server['endpoint']
for server in servers for server in servers
if isinstance(server.get('endpoint'), compat_str) and if url_or_none(server.get('endpoint')) and
'cloudfront' in server['endpoint']) 'cloudfront' in server['endpoint'])
else: else:
endpoint = 'rtmpe://s2pqqn4u96e4j8.cloudfront.net/cfx/st/' endpoint = 'rtmpe://s2pqqn4u96e4j8.cloudfront.net/cfx/st/'
@ -60,27 +81,35 @@ class DctpTvIE(InfoExtractor):
formats = [{ formats = [{
'url': endpoint, 'url': endpoint,
'app': app, 'app': app,
'play_path': 'mp4:%s_dctp_0500_4x3.m4v' % video_id, 'play_path': play_path,
'page_url': url, 'page_url': url,
'player_url': 'http://svm-prod-dctptv-static.s3.amazonaws.com/dctptv-relaunch2012-109.swf', 'player_url': 'http://svm-prod-dctptv-static.s3.amazonaws.com/dctptv-relaunch2012-110.swf',
'ext': 'flv', 'ext': 'flv',
}] }]
description = self._html_search_meta('DC.description', webpage) thumbnails = []
upload_date = unified_strdate( images = media.get('images')
self._html_search_meta('DC.date.created', webpage)) if isinstance(images, list):
thumbnail = self._og_search_thumbnail(webpage) for image in images:
duration = float_or_none(self._search_regex( if not isinstance(image, dict):
r'id=["\']duration_in_ms[^+]>(\d+)', webpage, 'duration', continue
default=None), scale=1000) image_url = url_or_none(image.get('url'))
if not image_url:
continue
thumbnails.append({
'url': image_url,
'width': int_or_none(image.get('width')),
'height': int_or_none(image.get('height')),
})
return { return {
'id': video_id, 'id': uuid,
'title': title,
'formats': formats,
'display_id': display_id, 'display_id': display_id,
'description': description, 'title': title,
'upload_date': upload_date, 'alt_title': media.get('subtitle'),
'thumbnail': thumbnail, 'description': media.get('description') or media.get('teaser'),
'duration': duration, 'timestamp': unified_timestamp(media.get('created')),
'duration': float_or_none(media.get('duration_in_ms'), scale=1000),
'thumbnails': thumbnails,
'formats': formats,
} }

View File

@ -3,7 +3,6 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str
from ..utils import ( from ..utils import (
determine_ext, determine_ext,
extract_attributes, extract_attributes,
@ -12,6 +11,7 @@ from ..utils import (
parse_age_limit, parse_age_limit,
remove_end, remove_end,
unescapeHTML, unescapeHTML,
url_or_none,
) )
@ -69,9 +69,8 @@ class DiscoveryGoBaseIE(InfoExtractor):
captions = stream.get('captions') captions = stream.get('captions')
if isinstance(captions, list): if isinstance(captions, list):
for caption in captions: for caption in captions:
subtitle_url = caption.get('fileUrl') subtitle_url = url_or_none(caption.get('fileUrl'))
if (not subtitle_url or not isinstance(subtitle_url, compat_str) or if not subtitle_url or not subtitle_url.startswith('http'):
not subtitle_url.startswith('http')):
continue continue
lang = caption.get('fileLang', 'en') lang = caption.get('fileLang', 'en')
ext = determine_ext(subtitle_url) ext = determine_ext(subtitle_url)

View File

@ -3,8 +3,8 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor
from .brightcove import BrightcoveLegacyIE from .brightcove import BrightcoveLegacyIE
from .dplay import DPlayIE
from ..compat import ( from ..compat import (
compat_parse_qs, compat_parse_qs,
compat_urlparse, compat_urlparse,
@ -12,8 +12,13 @@ from ..compat import (
from ..utils import smuggle_url from ..utils import smuggle_url
class DiscoveryNetworksDeIE(InfoExtractor): class DiscoveryNetworksDeIE(DPlayIE):
_VALID_URL = r'https?://(?:www\.)?(?:discovery|tlc|animalplanet|dmax)\.de/(?:.*#(?P<id>\d+)|(?:[^/]+/)*videos/(?P<title>[^/?#]+))' _VALID_URL = r'''(?x)https?://(?:www\.)?(?P<site>discovery|tlc|animalplanet|dmax)\.de/
(?:
.*\#(?P<id>\d+)|
(?:[^/]+/)*videos/(?P<display_id>[^/?#]+)|
programme/(?P<programme>[^/]+)/video/(?P<alternate_id>[^/]+)
)'''
_TESTS = [{ _TESTS = [{
'url': 'http://www.tlc.de/sendungen/breaking-amish/videos/#3235167922001', 'url': 'http://www.tlc.de/sendungen/breaking-amish/videos/#3235167922001',
@ -40,6 +45,14 @@ class DiscoveryNetworksDeIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
alternate_id = mobj.group('alternate_id')
if alternate_id:
self._initialize_geo_bypass({
'countries': ['DE'],
})
return self._get_disco_api_info(
url, '%s/%s' % (mobj.group('programme'), alternate_id),
'sonic-eu1-prod.disco-api.com', mobj.group('site') + 'de')
brightcove_id = mobj.group('id') brightcove_id = mobj.group('id')
if not brightcove_id: if not brightcove_id:
title = mobj.group('title') title = mobj.group('title')

View File

@ -21,6 +21,7 @@ from ..utils import (
unified_strdate, unified_strdate,
unified_timestamp, unified_timestamp,
update_url_query, update_url_query,
urljoin,
USER_AGENTS, USER_AGENTS,
) )
@ -97,6 +98,75 @@ class DPlayIE(InfoExtractor):
'only_matching': True, 'only_matching': True,
}] }]
def _get_disco_api_info(self, url, display_id, disco_host, realm):
disco_base = 'https://' + disco_host
token = self._download_json(
'%s/token' % disco_base, display_id, 'Downloading token',
query={
'realm': realm,
})['data']['attributes']['token']
headers = {
'Referer': url,
'Authorization': 'Bearer ' + token,
}
video = self._download_json(
'%s/content/videos/%s' % (disco_base, display_id), display_id,
headers=headers, query={
'include': 'show'
})
video_id = video['data']['id']
info = video['data']['attributes']
title = info['name']
formats = []
for format_id, format_dict in self._download_json(
'%s/playback/videoPlaybackInfo/%s' % (disco_base, video_id),
display_id, headers=headers)['data']['attributes']['streaming'].items():
if not isinstance(format_dict, dict):
continue
format_url = format_dict.get('url')
if not format_url:
continue
ext = determine_ext(format_url)
if format_id == 'dash' or ext == 'mpd':
formats.extend(self._extract_mpd_formats(
format_url, display_id, mpd_id='dash', fatal=False))
elif format_id == 'hls' or ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
format_url, display_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls',
fatal=False))
else:
formats.append({
'url': format_url,
'format_id': format_id,
})
self._sort_formats(formats)
series = None
try:
included = video.get('included')
if isinstance(included, list):
show = next(e for e in included if e.get('type') == 'show')
series = try_get(
show, lambda x: x['attributes']['name'], compat_str)
except StopIteration:
pass
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': info.get('description'),
'duration': float_or_none(
info.get('videoDuration'), scale=1000),
'timestamp': unified_timestamp(info.get('publishStart')),
'series': series,
'season_number': int_or_none(info.get('seasonNumber')),
'episode_number': int_or_none(info.get('episodeNumber')),
'age_limit': int_or_none(info.get('minimum_age')),
'formats': formats,
}
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('id') display_id = mobj.group('id')
@ -113,72 +183,8 @@ class DPlayIE(InfoExtractor):
if not video_id: if not video_id:
host = mobj.group('host') host = mobj.group('host')
disco_base = 'https://disco-api.%s' % host return self._get_disco_api_info(
self._download_json( url, display_id, 'disco-api.' + host, host.replace('.', ''))
'%s/token' % disco_base, display_id, 'Downloading token',
query={
'realm': host.replace('.', ''),
})
video = self._download_json(
'%s/content/videos/%s' % (disco_base, display_id), display_id,
headers={
'Referer': url,
'x-disco-client': 'WEB:UNKNOWN:dplay-client:0.0.1',
}, query={
'include': 'show'
})
video_id = video['data']['id']
info = video['data']['attributes']
title = info['name']
formats = []
for format_id, format_dict in self._download_json(
'%s/playback/videoPlaybackInfo/%s' % (disco_base, video_id),
display_id)['data']['attributes']['streaming'].items():
if not isinstance(format_dict, dict):
continue
format_url = format_dict.get('url')
if not format_url:
continue
ext = determine_ext(format_url)
if format_id == 'dash' or ext == 'mpd':
formats.extend(self._extract_mpd_formats(
format_url, display_id, mpd_id='dash', fatal=False))
elif format_id == 'hls' or ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
format_url, display_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls',
fatal=False))
else:
formats.append({
'url': format_url,
'format_id': format_id,
})
self._sort_formats(formats)
series = None
try:
included = video.get('included')
if isinstance(included, list):
show = next(e for e in included if e.get('type') == 'show')
series = try_get(
show, lambda x: x['attributes']['name'], compat_str)
except StopIteration:
pass
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': info.get('description'),
'duration': float_or_none(
info.get('videoDuration'), scale=1000),
'timestamp': unified_timestamp(info.get('publishStart')),
'series': series,
'season_number': int_or_none(info.get('seasonNumber')),
'episode_number': int_or_none(info.get('episodeNumber')),
'age_limit': int_or_none(info.get('minimum_age')),
'formats': formats,
}
info = self._download_json( info = self._download_json(
'http://%s/api/v2/ajax/videos?video_id=%s' % (domain, video_id), 'http://%s/api/v2/ajax/videos?video_id=%s' % (domain, video_id),
@ -305,9 +311,11 @@ class DPlayItIE(InfoExtractor):
if not info: if not info:
info_url = self._search_regex( info_url = self._search_regex(
r'url\s*[:=]\s*["\']((?:https?:)?//[^/]+/playback/videoPlaybackInfo/\d+)', (r'playback_json_url\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1',
webpage, 'info url') r'url\s*[:=]\s*["\'](?P<url>(?:https?:)?//[^/]+/playback/videoPlaybackInfo/\d+)'),
webpage, 'info url', group='url')
info_url = urljoin(url, info_url)
video_id = info_url.rpartition('/')[-1] video_id = info_url.rpartition('/')[-1]
try: try:
@ -317,6 +325,8 @@ class DPlayItIE(InfoExtractor):
'dplayit_token').value, 'dplayit_token').value,
'Referer': url, 'Referer': url,
}) })
if isinstance(info, compat_str):
info = self._parse_json(info, display_id)
except ExtractorError as e: except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code in (400, 403): if isinstance(e.cause, compat_HTTPError) and e.cause.code in (400, 403):
info = self._parse_json(e.cause.read().decode('utf-8'), display_id) info = self._parse_json(e.cause.read().decode('utf-8'), display_id)
@ -332,6 +342,7 @@ class DPlayItIE(InfoExtractor):
formats = self._extract_m3u8_formats( formats = self._extract_m3u8_formats(
hls_url, display_id, ext='mp4', entry_protocol='m3u8_native', hls_url, display_id, ext='mp4', entry_protocol='m3u8_native',
m3u8_id='hls') m3u8_id='hls')
self._sort_formats(formats)
series = self._html_search_regex( series = self._html_search_regex(
r'(?s)<h1[^>]+class=["\'].*?\bshow_title\b.*?["\'][^>]*>(.+?)</h1>', r'(?s)<h1[^>]+class=["\'].*?\bshow_title\b.*?["\'][^>]*>(.+?)</h1>',

View File

@ -7,7 +7,6 @@ import json
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_HTTPError, compat_HTTPError,
compat_str,
compat_urlparse, compat_urlparse,
) )
from ..utils import ( from ..utils import (
@ -17,6 +16,7 @@ from ..utils import (
parse_age_limit, parse_age_limit,
parse_duration, parse_duration,
unified_timestamp, unified_timestamp,
url_or_none,
) )
@ -42,7 +42,7 @@ class DramaFeverBaseIE(InfoExtractor):
self._login() self._login()
def _login(self): def _login(self):
(username, password) = self._get_login_info() username, password = self._get_login_info()
if username is None: if username is None:
return return
@ -139,8 +139,8 @@ class DramaFeverIE(DramaFeverBaseIE):
for sub in subs: for sub in subs:
if not isinstance(sub, dict): if not isinstance(sub, dict):
continue continue
sub_url = sub.get('url') sub_url = url_or_none(sub.get('url'))
if not sub_url or not isinstance(sub_url, compat_str): if not sub_url:
continue continue
subtitles.setdefault( subtitles.setdefault(
sub.get('code') or sub.get('language') or 'en', []).append({ sub.get('code') or sub.get('language') or 'en', []).append({
@ -163,8 +163,8 @@ class DramaFeverIE(DramaFeverBaseIE):
for format_id, format_dict in download_assets.items(): for format_id, format_dict in download_assets.items():
if not isinstance(format_dict, dict): if not isinstance(format_dict, dict):
continue continue
format_url = format_dict.get('url') format_url = url_or_none(format_dict.get('url'))
if not format_url or not isinstance(format_url, compat_str): if not format_url:
continue continue
formats.append({ formats.append({
'url': format_url, 'url': format_url,

View File

@ -8,7 +8,6 @@ from ..utils import (
unified_strdate, unified_strdate,
xpath_text, xpath_text,
determine_ext, determine_ext,
qualities,
float_or_none, float_or_none,
ExtractorError, ExtractorError,
) )
@ -16,7 +15,8 @@ from ..utils import (
class DreiSatIE(InfoExtractor): class DreiSatIE(InfoExtractor):
IE_NAME = '3sat' IE_NAME = '3sat'
_VALID_URL = r'(?:https?://)?(?:www\.)?3sat\.de/mediathek/(?:index\.php|mediathek\.php)?\?(?:(?:mode|display)=[^&]+&)*obj=(?P<id>[0-9]+)$' _GEO_COUNTRIES = ['DE']
_VALID_URL = r'https?://(?:www\.)?3sat\.de/mediathek/(?:(?:index|mediathek)\.php)?\?(?:(?:mode|display)=[^&]+&)*obj=(?P<id>[0-9]+)'
_TESTS = [ _TESTS = [
{ {
'url': 'http://www.3sat.de/mediathek/index.php?mode=play&obj=45918', 'url': 'http://www.3sat.de/mediathek/index.php?mode=play&obj=45918',
@ -43,7 +43,8 @@ class DreiSatIE(InfoExtractor):
def _parse_smil_formats(self, smil, smil_url, video_id, namespace=None, f4m_params=None, transform_rtmp_url=None): def _parse_smil_formats(self, smil, smil_url, video_id, namespace=None, f4m_params=None, transform_rtmp_url=None):
param_groups = {} param_groups = {}
for param_group in smil.findall(self._xpath_ns('./head/paramGroup', namespace)): for param_group in smil.findall(self._xpath_ns('./head/paramGroup', namespace)):
group_id = param_group.attrib.get(self._xpath_ns('id', 'http://www.w3.org/XML/1998/namespace')) group_id = param_group.get(self._xpath_ns(
'id', 'http://www.w3.org/XML/1998/namespace'))
params = {} params = {}
for param in param_group: for param in param_group:
params[param.get('name')] = param.get('value') params[param.get('name')] = param.get('value')
@ -54,7 +55,7 @@ class DreiSatIE(InfoExtractor):
src = video.get('src') src = video.get('src')
if not src: if not src:
continue continue
bitrate = float_or_none(video.get('system-bitrate') or video.get('systemBitrate'), 1000) bitrate = int_or_none(self._search_regex(r'_(\d+)k', src, 'bitrate', None)) or float_or_none(video.get('system-bitrate') or video.get('systemBitrate'), 1000)
group_id = video.get('paramGroup') group_id = video.get('paramGroup')
param_group = param_groups[group_id] param_group = param_groups[group_id]
for proto in param_group['protocols'].split(','): for proto in param_group['protocols'].split(','):
@ -75,66 +76,36 @@ class DreiSatIE(InfoExtractor):
note='Downloading video info', note='Downloading video info',
errnote='Failed to download video info') errnote='Failed to download video info')
status_code = doc.find('./status/statuscode') status_code = xpath_text(doc, './status/statuscode')
if status_code is not None and status_code.text != 'ok': if status_code and status_code != 'ok':
code = status_code.text if status_code == 'notVisibleAnymore':
if code == 'notVisibleAnymore':
message = 'Video %s is not available' % video_id message = 'Video %s is not available' % video_id
else: else:
message = '%s returned error: %s' % (self.IE_NAME, code) message = '%s returned error: %s' % (self.IE_NAME, status_code)
raise ExtractorError(message, expected=True) raise ExtractorError(message, expected=True)
title = doc.find('.//information/title').text title = xpath_text(doc, './/information/title', 'title', True)
description = xpath_text(doc, './/information/detail', 'description')
duration = int_or_none(xpath_text(doc, './/details/lengthSec', 'duration'))
uploader = xpath_text(doc, './/details/originChannelTitle', 'uploader')
uploader_id = xpath_text(doc, './/details/originChannelId', 'uploader id')
upload_date = unified_strdate(xpath_text(doc, './/details/airtime', 'upload date'))
def xml_to_thumbnails(fnode): urls = []
thumbnails = []
for node in fnode:
thumbnail_url = node.text
if not thumbnail_url:
continue
thumbnail = {
'url': thumbnail_url,
}
if 'key' in node.attrib:
m = re.match('^([0-9]+)x([0-9]+)$', node.attrib['key'])
if m:
thumbnail['width'] = int(m.group(1))
thumbnail['height'] = int(m.group(2))
thumbnails.append(thumbnail)
return thumbnails
thumbnails = xml_to_thumbnails(doc.findall('.//teaserimages/teaserimage'))
format_nodes = doc.findall('.//formitaeten/formitaet')
quality = qualities(['veryhigh', 'high', 'med', 'low'])
def get_quality(elem):
return quality(xpath_text(elem, 'quality'))
format_nodes.sort(key=get_quality)
format_ids = []
formats = [] formats = []
for fnode in format_nodes: for fnode in doc.findall('.//formitaeten/formitaet'):
video_url = fnode.find('url').text video_url = xpath_text(fnode, 'url')
is_available = 'http://www.metafilegenerator' not in video_url if not video_url or video_url in urls:
if not is_available:
continue continue
urls.append(video_url)
is_available = 'http://www.metafilegenerator' not in video_url
geoloced = 'static_geoloced_online' in video_url
if not is_available or geoloced:
continue
format_id = fnode.attrib['basetype'] format_id = fnode.attrib['basetype']
quality = xpath_text(fnode, './quality', 'quality')
format_m = re.match(r'''(?x) format_m = re.match(r'''(?x)
(?P<vcodec>[^_]+)_(?P<acodec>[^_]+)_(?P<container>[^_]+)_ (?P<vcodec>[^_]+)_(?P<acodec>[^_]+)_(?P<container>[^_]+)_
(?P<proto>[^_]+)_(?P<index>[^_]+)_(?P<indexproto>[^_]+) (?P<proto>[^_]+)_(?P<index>[^_]+)_(?P<indexproto>[^_]+)
''', format_id) ''', format_id)
ext = determine_ext(video_url, None) or format_m.group('container') ext = determine_ext(video_url, None) or format_m.group('container')
if ext not in ('smil', 'f4m', 'm3u8'):
format_id = format_id + '-' + quality
if format_id in format_ids:
continue
if ext == 'meta': if ext == 'meta':
continue continue
@ -147,24 +118,23 @@ class DreiSatIE(InfoExtractor):
if video_url.startswith('https://'): if video_url.startswith('https://'):
continue continue
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', m3u8_id=format_id, fatal=False)) video_url, video_id, 'mp4', 'm3u8_native',
m3u8_id=format_id, fatal=False))
elif ext == 'f4m': elif ext == 'f4m':
formats.extend(self._extract_f4m_formats( formats.extend(self._extract_f4m_formats(
video_url, video_id, f4m_id=format_id, fatal=False)) video_url, video_id, f4m_id=format_id, fatal=False))
else: else:
proto = format_m.group('proto').lower() quality = xpath_text(fnode, './quality')
if quality:
format_id += '-' + quality
abr = int_or_none(xpath_text(fnode, './audioBitrate', 'abr'), 1000) abr = int_or_none(xpath_text(fnode, './audioBitrate'), 1000)
vbr = int_or_none(xpath_text(fnode, './videoBitrate', 'vbr'), 1000) vbr = int_or_none(xpath_text(fnode, './videoBitrate'), 1000)
width = int_or_none(xpath_text(fnode, './width', 'width')) tbr = int_or_none(self._search_regex(
height = int_or_none(xpath_text(fnode, './height', 'height')) r'_(\d+)k', video_url, 'bitrate', None))
if tbr and vbr and not abr:
filesize = int_or_none(xpath_text(fnode, './filesize', 'filesize')) abr = tbr - vbr
format_note = ''
if not format_note:
format_note = None
formats.append({ formats.append({
'format_id': format_id, 'format_id': format_id,
@ -174,31 +144,50 @@ class DreiSatIE(InfoExtractor):
'vcodec': format_m.group('vcodec'), 'vcodec': format_m.group('vcodec'),
'abr': abr, 'abr': abr,
'vbr': vbr, 'vbr': vbr,
'width': width, 'tbr': tbr,
'height': height, 'width': int_or_none(xpath_text(fnode, './width')),
'filesize': filesize, 'height': int_or_none(xpath_text(fnode, './height')),
'format_note': format_note, 'filesize': int_or_none(xpath_text(fnode, './filesize')),
'protocol': proto, 'protocol': format_m.group('proto').lower(),
'_available': is_available,
}) })
format_ids.append(format_id)
geolocation = xpath_text(doc, './/details/geolocation')
if not formats and geolocation and geolocation != 'none':
self.raise_geo_restricted(countries=self._GEO_COUNTRIES)
self._sort_formats(formats) self._sort_formats(formats)
thumbnails = []
for node in doc.findall('.//teaserimages/teaserimage'):
thumbnail_url = node.text
if not thumbnail_url:
continue
thumbnail = {
'url': thumbnail_url,
}
thumbnail_key = node.get('key')
if thumbnail_key:
m = re.match('^([0-9]+)x([0-9]+)$', thumbnail_key)
if m:
thumbnail['width'] = int(m.group(1))
thumbnail['height'] = int(m.group(2))
thumbnails.append(thumbnail)
upload_date = unified_strdate(xpath_text(doc, './/details/airtime'))
return { return {
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'description': description, 'description': xpath_text(doc, './/information/detail'),
'duration': duration, 'duration': int_or_none(xpath_text(doc, './/details/lengthSec')),
'thumbnails': thumbnails, 'thumbnails': thumbnails,
'uploader': uploader, 'uploader': xpath_text(doc, './/details/originChannelTitle'),
'uploader_id': uploader_id, 'uploader_id': xpath_text(doc, './/details/originChannelId'),
'upload_date': upload_date, 'upload_date': upload_date,
'formats': formats, 'formats': formats,
} }
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) video_id = self._match_id(url)
video_id = mobj.group('id') details_url = 'http://www.3sat.de/mediathek/xmlservice/web/beitragsDetails?id=%s' % video_id
details_url = 'http://www.3sat.de/mediathek/xmlservice/web/beitragsDetails?ak=web&id=%s' % video_id
return self.extract_from_xml_url(video_id, details_url) return self.extract_from_xml_url(video_id, details_url)

View File

@ -0,0 +1,83 @@
# coding: utf-8
from __future__ import unicode_literals
import json
import re
from socket import timeout
from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_iso8601,
)
class DTubeIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?d\.tube/(?:#!/)?v/(?P<uploader_id>[0-9a-z.-]+)/(?P<id>[0-9a-z]{8})'
_TEST = {
'url': 'https://d.tube/#!/v/benswann/zqd630em',
'md5': 'a03eaa186618ffa7a3145945543a251e',
'info_dict': {
'id': 'zqd630em',
'ext': 'mp4',
'title': 'Reality Check: FDA\'s Disinformation Campaign on Kratom',
'description': 'md5:700d164e066b87f9eac057949e4227c2',
'uploader_id': 'benswann',
'upload_date': '20180222',
'timestamp': 1519328958,
},
'params': {
'format': '480p',
},
}
def _real_extract(self, url):
uploader_id, video_id = re.match(self._VALID_URL, url).groups()
result = self._download_json('https://api.steemit.com/', video_id, data=json.dumps({
'jsonrpc': '2.0',
'method': 'get_content',
'params': [uploader_id, video_id],
}).encode())['result']
metadata = json.loads(result['json_metadata'])
video = metadata['video']
content = video['content']
info = video.get('info', {})
title = info.get('title') or result['title']
def canonical_url(h):
if not h:
return None
return 'https://ipfs.io/ipfs/' + h
formats = []
for q in ('240', '480', '720', '1080', ''):
video_url = canonical_url(content.get('video%shash' % q))
if not video_url:
continue
format_id = (q + 'p') if q else 'Source'
try:
self.to_screen('%s: Checking %s video format URL' % (video_id, format_id))
self._downloader._opener.open(video_url, timeout=5).close()
except timeout as e:
self.to_screen(
'%s: %s URL is invalid, skipping' % (video_id, format_id))
continue
formats.append({
'format_id': format_id,
'url': video_url,
'height': int_or_none(q),
'ext': 'mp4',
})
return {
'id': video_id,
'title': title,
'description': content.get('description'),
'thumbnail': canonical_url(info.get('snaphash')),
'tags': content.get('tags') or metadata.get('tags'),
'duration': info.get('duration'),
'formats': formats,
'timestamp': parse_iso8601(result.get('created')),
'uploader_id': uploader_id,
}

View File

@ -91,17 +91,6 @@ class DVTVIE(InfoExtractor):
}, { }, {
'url': 'http://video.aktualne.cz/v-cechach-poprve-zazni-zelenkova-zrestaurovana-mse/r~45b4b00483ec11e4883b002590604f2e/', 'url': 'http://video.aktualne.cz/v-cechach-poprve-zazni-zelenkova-zrestaurovana-mse/r~45b4b00483ec11e4883b002590604f2e/',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://video.aktualne.cz/dvtv/babis-a-zeman-nesou-vinu-za-to-ze-nemame-jasno-v-tom-kdo-bud/r~026afb54fad711e79704ac1f6b220ee8/',
'md5': '87defe16681b1429c91f7a74809823c6',
'info_dict': {
'id': 'f5ae72f6fad611e794dbac1f6b220ee8',
'ext': 'mp4',
'title': 'Babiš a Zeman nesou vinu za to, že nemáme jasno v tom, kdo bude vládnout, říká Pekarová Adamová',
},
'params': {
'skip_download': True,
},
}] }]
def _parse_video_metadata(self, js, video_id, live_js=None): def _parse_video_metadata(self, js, video_id, live_js=None):

View File

@ -4,14 +4,12 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import compat_HTTPError
compat_HTTPError,
compat_str,
)
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
int_or_none, int_or_none,
unsmuggle_url, unsmuggle_url,
url_or_none,
) )
@ -177,7 +175,7 @@ class EaglePlatformIE(InfoExtractor):
video_id, 'Downloading mp4 JSON', fatal=False) video_id, 'Downloading mp4 JSON', fatal=False)
if mp4_data: if mp4_data:
for format_id, format_url in mp4_data.get('data', {}).items(): for format_id, format_url in mp4_data.get('data', {}).items():
if not isinstance(format_url, compat_str): if not url_or_none(format_url):
continue continue
height = int_or_none(format_id) height = int_or_none(format_id)
if height is not None and m3u8_formats_dict.get(height): if height is not None and m3u8_formats_dict.get(height):

View File

@ -8,6 +8,7 @@ from ..utils import (
int_or_none, int_or_none,
try_get, try_get,
unified_timestamp, unified_timestamp,
url_or_none,
) )
@ -34,8 +35,8 @@ class EggheadCourseIE(InfoExtractor):
entries = [] entries = []
for lesson in lessons: for lesson in lessons:
lesson_url = lesson.get('http_url') lesson_url = url_or_none(lesson.get('http_url'))
if not lesson_url or not isinstance(lesson_url, compat_str): if not lesson_url:
continue continue
lesson_id = lesson.get('id') lesson_id = lesson.get('id')
if lesson_id: if lesson_id:
@ -95,7 +96,8 @@ class EggheadLessonIE(InfoExtractor):
formats = [] formats = []
for _, format_url in lesson['media_urls'].items(): for _, format_url in lesson['media_urls'].items():
if not format_url or not isinstance(format_url, compat_str): format_url = url_or_none(format_url)
if not format_url:
continue continue
ext = determine_ext(format_url) ext = determine_ext(format_url)
if ext == 'm3u8': if ext == 'm3u8':

View File

@ -11,6 +11,7 @@ from ..utils import (
int_or_none, int_or_none,
parse_duration, parse_duration,
str_to_int, str_to_int,
url_or_none,
) )
@ -82,8 +83,8 @@ class EpornerIE(InfoExtractor):
for format_id, format_dict in formats_dict.items(): for format_id, format_dict in formats_dict.items():
if not isinstance(format_dict, dict): if not isinstance(format_dict, dict):
continue continue
src = format_dict.get('src') src = url_or_none(format_dict.get('src'))
if not isinstance(src, compat_str) or not src.startswith('http'): if not src or not src.startswith('http'):
continue continue
if kind == 'hls': if kind == 'hls':
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(

View File

@ -0,0 +1,98 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
int_or_none,
unescapeHTML,
unified_timestamp,
)
class ExpressenIE(InfoExtractor):
_VALID_URL = r'''(?x)
https?://
(?:www\.)?expressen\.se/
(?:(?:tvspelare/video|videoplayer/embed)/)?
tv/(?:[^/]+/)*
(?P<id>[^/?#&]+)
'''
_TESTS = [{
'url': 'https://www.expressen.se/tv/ledare/ledarsnack/ledarsnack-om-arbetslosheten-bland-kvinnor-i-speciellt-utsatta-omraden/',
'md5': '2fbbe3ca14392a6b1b36941858d33a45',
'info_dict': {
'id': '8690962',
'ext': 'mp4',
'title': 'Ledarsnack: Om arbetslösheten bland kvinnor i speciellt utsatta områden',
'description': 'md5:f38c81ff69f3de4d269bbda012fcbbba',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 788,
'timestamp': 1526639109,
'upload_date': '20180518',
},
}, {
'url': 'https://www.expressen.se/tv/kultur/kulturdebatt-med-expressens-karin-olsson/',
'only_matching': True,
}, {
'url': 'https://www.expressen.se/tvspelare/video/tv/ditv/ekonomistudion/experterna-har-ar-fragorna-som-avgor-valet/?embed=true&external=true&autoplay=true&startVolume=0&partnerId=di',
'only_matching': True,
}, {
'url': 'https://www.expressen.se/videoplayer/embed/tv/ditv/ekonomistudion/experterna-har-ar-fragorna-som-avgor-valet/?embed=true&external=true&autoplay=true&startVolume=0&partnerId=di',
'only_matching': True,
}]
@staticmethod
def _extract_urls(webpage):
return [
mobj.group('url') for mobj in re.finditer(
r'<iframe[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//(?:www\.)?expressen\.se/(?:tvspelare/video|videoplayer/embed)/tv/.+?)\1',
webpage)]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
def extract_data(name):
return self._parse_json(
self._search_regex(
r'data-%s=(["\'])(?P<value>(?:(?!\1).)+)\1' % name,
webpage, 'info', group='value'),
display_id, transform_source=unescapeHTML)
info = extract_data('video-tracking-info')
video_id = info['videoId']
data = extract_data('article-data')
stream = data['stream']
if determine_ext(stream) == 'm3u8':
formats = self._extract_m3u8_formats(
stream, display_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls')
else:
formats = [{
'url': stream,
}]
self._sort_formats(formats)
title = info.get('titleRaw') or data['title']
description = info.get('descriptionRaw')
thumbnail = info.get('socialMediaImage') or data.get('image')
duration = int_or_none(info.get('videoTotalSecondsDuration') or
data.get('totalSecondsDuration'))
timestamp = unified_timestamp(info.get('publishDate'))
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'duration': duration,
'timestamp': timestamp,
'formats': formats,
}

View File

@ -44,6 +44,7 @@ from .anysex import AnySexIE
from .aol import AolIE from .aol import AolIE
from .allocine import AllocineIE from .allocine import AllocineIE
from .aliexpress import AliExpressLiveIE from .aliexpress import AliExpressLiveIE
from .apa import APAIE
from .aparat import AparatIE from .aparat import AparatIE
from .appleconnect import AppleConnectIE from .appleconnect import AppleConnectIE
from .appletrailers import ( from .appletrailers import (
@ -117,6 +118,10 @@ from .bilibili import (
BiliBiliBangumiIE, BiliBiliBangumiIE,
) )
from .biobiochiletv import BioBioChileTVIE from .biobiochiletv import BioBioChileTVIE
from .bitchute import (
BitChuteIE,
BitChuteChannelIE,
)
from .biqle import BIQLEIE from .biqle import BIQLEIE
from .bleacherreport import ( from .bleacherreport import (
BleacherReportIE, BleacherReportIE,
@ -145,6 +150,8 @@ from .camdemy import (
CamdemyIE, CamdemyIE,
CamdemyFolderIE CamdemyFolderIE
) )
from .cammodels import CamModelsIE
from .camtube import CamTubeIE
from .camwithher import CamWithHerIE from .camwithher import CamWithHerIE
from .canalplus import CanalplusIE from .canalplus import CanalplusIE
from .canalc2 import Canalc2IE from .canalc2 import Canalc2IE
@ -283,6 +290,7 @@ from .drtv import (
DRTVIE, DRTVIE,
DRTVLiveIE, DRTVLiveIE,
) )
from .dtube import DTubeIE
from .dvtv import DVTVIE from .dvtv import DVTVIE
from .dumpert import DumpertIE from .dumpert import DumpertIE
from .defense import DefenseGouvFrIE from .defense import DefenseGouvFrIE
@ -331,6 +339,7 @@ from .esri import EsriVideoIE
from .europa import EuropaIE from .europa import EuropaIE
from .everyonesmixtape import EveryonesMixtapeIE from .everyonesmixtape import EveryonesMixtapeIE
from .expotv import ExpoTVIE from .expotv import ExpoTVIE
from .expressen import ExpressenIE
from .extremetube import ExtremeTubeIE from .extremetube import ExtremeTubeIE
from .eyedotv import EyedoTVIE from .eyedotv import EyedoTVIE
from .facebook import ( from .facebook import (
@ -368,7 +377,6 @@ from .foxgay import FoxgayIE
from .foxnews import ( from .foxnews import (
FoxNewsIE, FoxNewsIE,
FoxNewsArticleIE, FoxNewsArticleIE,
FoxNewsInsiderIE,
) )
from .foxsports import FoxSportsIE from .foxsports import FoxSportsIE
from .franceculture import FranceCultureIE from .franceculture import FranceCultureIE
@ -378,6 +386,7 @@ from .francetv import (
FranceTVSiteIE, FranceTVSiteIE,
FranceTVEmbedIE, FranceTVEmbedIE,
FranceTVInfoIE, FranceTVInfoIE,
FranceTVInfoSportIE,
FranceTVJeunesseIE, FranceTVJeunesseIE,
GenerationWhatIE, GenerationWhatIE,
CultureboxIE, CultureboxIE,
@ -385,6 +394,11 @@ from .francetv import (
from .freesound import FreesoundIE from .freesound import FreesoundIE
from .freespeech import FreespeechIE from .freespeech import FreespeechIE
from .freshlive import FreshLiveIE from .freshlive import FreshLiveIE
from .frontendmasters import (
FrontendMastersIE,
FrontendMastersLessonIE,
FrontendMastersCourseIE
)
from .funimation import FunimationIE from .funimation import FunimationIE
from .funk import ( from .funk import (
FunkMixIE, FunkMixIE,
@ -468,10 +482,7 @@ from .imgur import (
) )
from .ina import InaIE from .ina import InaIE
from .inc import IncIE from .inc import IncIE
from .indavideo import ( from .indavideo import IndavideoEmbedIE
IndavideoIE,
IndavideoEmbedIE,
)
from .infoq import InfoQIE from .infoq import InfoQIE
from .instagram import InstagramIE, InstagramUserIE from .instagram import InstagramIE, InstagramUserIE
from .internazionale import InternazionaleIE from .internazionale import InternazionaleIE
@ -581,13 +592,16 @@ from .mailru import (
MailRuMusicIE, MailRuMusicIE,
MailRuMusicSearchIE, MailRuMusicSearchIE,
) )
from .makerschannel import MakersChannelIE
from .makertv import MakerTVIE from .makertv import MakerTVIE
from .mangomolo import ( from .mangomolo import (
MangomoloVideoIE, MangomoloVideoIE,
MangomoloLiveIE, MangomoloLiveIE,
) )
from .manyvids import ManyVidsIE from .manyvids import ManyVidsIE
from .markiza import (
MarkizaIE,
MarkizaPageIE,
)
from .massengeschmacktv import MassengeschmackTVIE from .massengeschmacktv import MassengeschmackTVIE
from .matchtv import MatchTVIE from .matchtv import MatchTVIE
from .mdr import MDRIE from .mdr import MDRIE
@ -624,7 +638,6 @@ from .mnet import MnetIE
from .moevideo import MoeVideoIE from .moevideo import MoeVideoIE
from .mofosex import MofosexIE from .mofosex import MofosexIE
from .mojvideo import MojvideoIE from .mojvideo import MojvideoIE
from .moniker import MonikerIE
from .morningstar import MorningstarIE from .morningstar import MorningstarIE
from .motherless import ( from .motherless import (
MotherlessIE, MotherlessIE,
@ -645,6 +658,7 @@ from .mtv import (
from .muenchentv import MuenchenTVIE from .muenchentv import MuenchenTVIE
from .musicplayon import MusicPlayOnIE from .musicplayon import MusicPlayOnIE
from .mwave import MwaveIE, MwaveMeetGreetIE from .mwave import MwaveIE, MwaveMeetGreetIE
from .mychannels import MyChannelsIE
from .myspace import MySpaceIE, MySpaceAlbumIE from .myspace import MySpaceIE, MySpaceAlbumIE
from .myspass import MySpassIE from .myspass import MySpassIE
from .myvi import ( from .myvi import (
@ -666,6 +680,7 @@ from .nbc import (
NBCOlympicsIE, NBCOlympicsIE,
NBCOlympicsStreamIE, NBCOlympicsStreamIE,
NBCSportsIE, NBCSportsIE,
NBCSportsStreamIE,
NBCSportsVPlayerIE, NBCSportsVPlayerIE,
) )
from .ndr import ( from .ndr import (
@ -705,12 +720,7 @@ from .nexx import (
from .nfb import NFBIE from .nfb import NFBIE
from .nfl import NFLIE from .nfl import NFLIE
from .nhk import NhkVodIE from .nhk import NhkVodIE
from .nhl import ( from .nhl import NHLIE
NHLVideocenterIE,
NHLNewsIE,
NHLVideocenterCategoryIE,
NHLIE,
)
from .nick import ( from .nick import (
NickIE, NickIE,
NickBrIE, NickBrIE,
@ -719,10 +729,7 @@ from .nick import (
NickRuIE, NickRuIE,
) )
from .niconico import NiconicoIE, NiconicoPlaylistIE from .niconico import NiconicoIE, NiconicoPlaylistIE
from .ninecninemedia import ( from .ninecninemedia import NineCNineMediaIE
NineCNineMediaStackIE,
NineCNineMediaIE,
)
from .ninegag import NineGagIE from .ninegag import NineGagIE
from .ninenow import NineNowIE from .ninenow import NineNowIE
from .nintendo import NintendoIE from .nintendo import NintendoIE
@ -765,7 +772,9 @@ from .nrk import (
NRKSkoleIE, NRKSkoleIE,
NRKTVIE, NRKTVIE,
NRKTVDirekteIE, NRKTVDirekteIE,
NRKTVEpisodeIE,
NRKTVEpisodesIE, NRKTVEpisodesIE,
NRKTVSeasonIE,
NRKTVSeriesIE, NRKTVSeriesIE,
) )
from .ntvde import NTVDeIE from .ntvde import NTVDeIE
@ -810,6 +819,7 @@ from .parliamentliveuk import ParliamentLiveUKIE
from .patreon import PatreonIE from .patreon import PatreonIE
from .pbs import PBSIE from .pbs import PBSIE
from .pearvideo import PearVideoIE from .pearvideo import PearVideoIE
from .peertube import PeerTubeIE
from .people import PeopleIE from .people import PeopleIE
from .performgroup import PerformGroupIE from .performgroup import PerformGroupIE
from .periscope import ( from .periscope import (
@ -854,6 +864,10 @@ from .pornhub import (
from .pornotube import PornotubeIE from .pornotube import PornotubeIE
from .pornovoisines import PornoVoisinesIE from .pornovoisines import PornoVoisinesIE
from .pornoxo import PornoXOIE from .pornoxo import PornoXOIE
from .puhutv import (
PuhuTVIE,
PuhuTVSerieIE,
)
from .presstv import PressTVIE from .presstv import PressTVIE
from .primesharetv import PrimeShareTVIE from .primesharetv import PrimeShareTVIE
from .promptfile import PromptFileIE from .promptfile import PromptFileIE
@ -885,7 +899,10 @@ from .rai import (
RaiPlayPlaylistIE, RaiPlayPlaylistIE,
RaiIE, RaiIE,
) )
from .raywenderlich import RayWenderlichIE from .raywenderlich import (
RayWenderlichIE,
RayWenderlichCourseIE,
)
from .rbmaradio import RBMARadioIE from .rbmaradio import RBMARadioIE
from .rds import RDSIE from .rds import RDSIE
from .redbulltv import RedBullTVIE from .redbulltv import RedBullTVIE
@ -1015,7 +1032,10 @@ from .spankbang import SpankBangIE
from .spankwire import SpankwireIE from .spankwire import SpankwireIE
from .spiegel import SpiegelIE, SpiegelArticleIE from .spiegel import SpiegelIE, SpiegelArticleIE
from .spiegeltv import SpiegeltvIE from .spiegeltv import SpiegeltvIE
from .spike import SpikeIE from .spike import (
BellatorIE,
ParamountNetworkIE,
)
from .stitcher import StitcherIE from .stitcher import StitcherIE
from .sport5 import Sport5IE from .sport5 import Sport5IE
from .sportbox import SportBoxEmbedIE from .sportbox import SportBoxEmbedIE
@ -1038,6 +1058,7 @@ from .stretchinternet import StretchInternetIE
from .sunporno import SunPornoIE from .sunporno import SunPornoIE
from .svt import ( from .svt import (
SVTIE, SVTIE,
SVTPageIE,
SVTPlayIE, SVTPlayIE,
SVTSeriesIE, SVTSeriesIE,
) )
@ -1141,6 +1162,7 @@ from .tvc import (
from .tvigle import TvigleIE from .tvigle import TvigleIE
from .tvland import TVLandIE from .tvland import TVLandIE
from .tvn24 import TVN24IE from .tvn24 import TVN24IE
from .tvnet import TVNetIE
from .tvnoe import TVNoeIE from .tvnoe import TVNoeIE
from .tvnow import ( from .tvnow import (
TVNowIE, TVNowIE,
@ -1276,6 +1298,7 @@ from .viki import (
VikiIE, VikiIE,
VikiChannelIE, VikiChannelIE,
) )
from .viqeo import ViqeoIE
from .viu import ( from .viu import (
ViuIE, ViuIE,
ViuPlaylistIE, ViuPlaylistIE,

View File

@ -20,6 +20,7 @@ from ..utils import (
int_or_none, int_or_none,
js_to_json, js_to_json,
limit_length, limit_length,
parse_count,
sanitized_Request, sanitized_Request,
try_get, try_get,
urlencode_postdata, urlencode_postdata,
@ -56,6 +57,7 @@ class FacebookIE(InfoExtractor):
_CHROME_USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.97 Safari/537.36' _CHROME_USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.97 Safari/537.36'
_VIDEO_PAGE_TEMPLATE = 'https://www.facebook.com/video/video.php?v=%s' _VIDEO_PAGE_TEMPLATE = 'https://www.facebook.com/video/video.php?v=%s'
_VIDEO_PAGE_TAHOE_TEMPLATE = 'https://www.facebook.com/video/tahoe/async/%s/?chain=true&isvideo=true'
_TESTS = [{ _TESTS = [{
'url': 'https://www.facebook.com/video.php?v=637842556329505&fref=nf', 'url': 'https://www.facebook.com/video.php?v=637842556329505&fref=nf',
@ -74,7 +76,7 @@ class FacebookIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': '274175099429670', 'id': '274175099429670',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Asif Nawab Butt posted a video to his Timeline.', 'title': 're:^Asif Nawab Butt posted a video',
'uploader': 'Asif Nawab Butt', 'uploader': 'Asif Nawab Butt',
'upload_date': '20140506', 'upload_date': '20140506',
'timestamp': 1399398998, 'timestamp': 1399398998,
@ -132,7 +134,7 @@ class FacebookIE(InfoExtractor):
}, { }, {
# have 1080P, but only up to 720p in swf params # have 1080P, but only up to 720p in swf params
'url': 'https://www.facebook.com/cnn/videos/10155529876156509/', 'url': 'https://www.facebook.com/cnn/videos/10155529876156509/',
'md5': '0d9813160b146b3bc8744e006027fcc6', 'md5': '9571fae53d4165bbbadb17a94651dcdc',
'info_dict': { 'info_dict': {
'id': '10155529876156509', 'id': '10155529876156509',
'ext': 'mp4', 'ext': 'mp4',
@ -141,6 +143,7 @@ class FacebookIE(InfoExtractor):
'upload_date': '20161030', 'upload_date': '20161030',
'uploader': 'CNN', 'uploader': 'CNN',
'thumbnail': r're:^https?://.*', 'thumbnail': r're:^https?://.*',
'view_count': int,
}, },
}, { }, {
# bigPipe.onPageletArrive ... onPageletArrive pagelet_group_mall # bigPipe.onPageletArrive ... onPageletArrive pagelet_group_mall
@ -148,7 +151,7 @@ class FacebookIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': '1417995061575415', 'id': '1417995061575415',
'ext': 'mp4', 'ext': 'mp4',
'title': 'md5:a7b86ca673f51800cd54687b7f4012fe', 'title': 'md5:1db063d6a8c13faa8da727817339c857',
'timestamp': 1486648217, 'timestamp': 1486648217,
'upload_date': '20170209', 'upload_date': '20170209',
'uploader': 'Yaroslav Korpan', 'uploader': 'Yaroslav Korpan',
@ -175,7 +178,7 @@ class FacebookIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': '1396382447100162', 'id': '1396382447100162',
'ext': 'mp4', 'ext': 'mp4',
'title': 'md5:e2d2700afdf84e121f5d0f999bad13a3', 'title': 'md5:19a428bbde91364e3de815383b54a235',
'timestamp': 1486035494, 'timestamp': 1486035494,
'upload_date': '20170202', 'upload_date': '20170202',
'uploader': 'Elisabeth Ahtn', 'uploader': 'Elisabeth Ahtn',
@ -208,6 +211,17 @@ class FacebookIE(InfoExtractor):
# no title # no title
'url': 'https://www.facebook.com/onlycleverentertainment/videos/1947995502095005/', 'url': 'https://www.facebook.com/onlycleverentertainment/videos/1947995502095005/',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.facebook.com/WatchESLOne/videos/359649331226507/',
'info_dict': {
'id': '359649331226507',
'ext': 'mp4',
'title': '#ESLOne VoD - Birmingham Finals Day#1 Fnatic vs. @Evil Geniuses',
'uploader': 'ESL One Dota 2',
},
'params': {
'skip_download': True,
},
}] }]
@staticmethod @staticmethod
@ -226,7 +240,7 @@ class FacebookIE(InfoExtractor):
return urls return urls
def _login(self): def _login(self):
(useremail, password) = self._get_login_info() useremail, password = self._get_login_info()
if useremail is None: if useremail is None:
return return
@ -312,16 +326,18 @@ class FacebookIE(InfoExtractor):
if server_js_data: if server_js_data:
video_data = extract_video_data(server_js_data.get('instances', [])) video_data = extract_video_data(server_js_data.get('instances', []))
def extract_from_jsmods_instances(js_data):
if js_data:
return extract_video_data(try_get(
js_data, lambda x: x['jsmods']['instances'], list) or [])
if not video_data: if not video_data:
server_js_data = self._parse_json( server_js_data = self._parse_json(
self._search_regex( self._search_regex(
r'bigPipe\.onPageletArrive\(({.+?})\)\s*;\s*}\s*\)\s*,\s*["\']onPageletArrive\s+(?:stream_pagelet|pagelet_group_mall|permalink_video_pagelet)', r'bigPipe\.onPageletArrive\(({.+?})\)\s*;\s*}\s*\)\s*,\s*["\']onPageletArrive\s+(?:stream_pagelet|pagelet_group_mall|permalink_video_pagelet)',
webpage, 'js data', default='{}'), webpage, 'js data', default='{}'),
video_id, transform_source=js_to_json, fatal=False) video_id, transform_source=js_to_json, fatal=False)
if server_js_data: video_data = extract_from_jsmods_instances(server_js_data)
video_data = extract_video_data(try_get(
server_js_data, lambda x: x['jsmods']['instances'],
list) or [])
if not video_data: if not video_data:
if not fatal_if_no_video: if not fatal_if_no_video:
@ -333,8 +349,35 @@ class FacebookIE(InfoExtractor):
expected=True) expected=True)
elif '>You must log in to continue' in webpage: elif '>You must log in to continue' in webpage:
self.raise_login_required() self.raise_login_required()
else:
raise ExtractorError('Cannot parse data') # Video info not in first request, do a secondary request using
# tahoe player specific URL
tahoe_data = self._download_webpage(
self._VIDEO_PAGE_TAHOE_TEMPLATE % video_id, video_id,
data=urlencode_postdata({
'__a': 1,
'__pc': self._search_regex(
r'pkg_cohort["\']\s*:\s*["\'](.+?)["\']', webpage,
'pkg cohort', default='PHASED:DEFAULT'),
'__rev': self._search_regex(
r'client_revision["\']\s*:\s*(\d+),', webpage,
'client revision', default='3944515'),
'fb_dtsg': self._search_regex(
r'"DTSGInitialData"\s*,\s*\[\]\s*,\s*{\s*"token"\s*:\s*"([^"]+)"',
webpage, 'dtsg token', default=''),
}),
headers={
'Content-Type': 'application/x-www-form-urlencoded',
})
tahoe_js_data = self._parse_json(
self._search_regex(
r'for\s+\(\s*;\s*;\s*\)\s*;(.+)', tahoe_data,
'tahoe js data', default='{}'),
video_id, fatal=False)
video_data = extract_from_jsmods_instances(tahoe_js_data)
if not video_data:
raise ExtractorError('Cannot parse data')
formats = [] formats = []
for f in video_data: for f in video_data:
@ -380,12 +423,17 @@ class FacebookIE(InfoExtractor):
video_title = 'Facebook video #%s' % video_id video_title = 'Facebook video #%s' % video_id
uploader = clean_html(get_element_by_id( uploader = clean_html(get_element_by_id(
'fbPhotoPageAuthorName', webpage)) or self._search_regex( 'fbPhotoPageAuthorName', webpage)) or self._search_regex(
r'ownerName\s*:\s*"([^"]+)"', webpage, 'uploader', fatal=False) r'ownerName\s*:\s*"([^"]+)"', webpage, 'uploader',
fatal=False) or self._og_search_title(webpage, fatal=False)
timestamp = int_or_none(self._search_regex( timestamp = int_or_none(self._search_regex(
r'<abbr[^>]+data-utime=["\'](\d+)', webpage, r'<abbr[^>]+data-utime=["\'](\d+)', webpage,
'timestamp', default=None)) 'timestamp', default=None))
thumbnail = self._og_search_thumbnail(webpage) thumbnail = self._og_search_thumbnail(webpage)
view_count = parse_count(self._search_regex(
r'\bviewCount\s*:\s*["\']([\d,.]+)', webpage, 'view count',
default=None))
info_dict = { info_dict = {
'id': video_id, 'id': video_id,
'title': video_title, 'title': video_title,
@ -393,6 +441,7 @@ class FacebookIE(InfoExtractor):
'uploader': uploader, 'uploader': uploader,
'timestamp': timestamp, 'timestamp': timestamp,
'thumbnail': thumbnail, 'thumbnail': thumbnail,
'view_count': view_count,
} }
return webpage, info_dict return webpage, info_dict

View File

@ -46,7 +46,7 @@ class FC2IE(InfoExtractor):
}] }]
def _login(self): def _login(self):
(username, password) = self._get_login_info() username, password = self._get_login_info()
if username is None or password is None: if username is None or password is None:
return False return False

View File

@ -10,6 +10,7 @@ from ..utils import (
int_or_none, int_or_none,
qualities, qualities,
unified_strdate, unified_strdate,
url_or_none,
) )
@ -88,8 +89,8 @@ class FirstTVIE(InfoExtractor):
formats = [] formats = []
path = None path = None
for f in item.get('mbr', []): for f in item.get('mbr', []):
src = f.get('src') src = url_or_none(f.get('src'))
if not src or not isinstance(src, compat_str): if not src:
continue continue
tbr = int_or_none(self._search_regex( tbr = int_or_none(self._search_regex(
r'_(\d{3,})\.mp4', src, 'tbr', default=None)) r'_(\d{3,})\.mp4', src, 'tbr', default=None))

View File

@ -58,6 +58,14 @@ class FoxNewsIE(AMPIE):
}, },
] ]
@staticmethod
def _extract_urls(webpage):
return [
mobj.group('url')
for mobj in re.finditer(
r'<(?:amp-)?iframe[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//video\.foxnews\.com/v/video-embed\.html?.*?\bvideo_id=\d+.*?)\1',
webpage)]
def _real_extract(self, url): def _real_extract(self, url):
host, video_id = re.match(self._VALID_URL, url).groups() host, video_id = re.match(self._VALID_URL, url).groups()
@ -68,21 +76,41 @@ class FoxNewsIE(AMPIE):
class FoxNewsArticleIE(InfoExtractor): class FoxNewsArticleIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?foxnews\.com/(?!v)([^/]+/)+(?P<id>[a-z-]+)' _VALID_URL = r'https?://(?:www\.)?(?:insider\.)?foxnews\.com/(?!v)([^/]+/)+(?P<id>[a-z-]+)'
IE_NAME = 'foxnews:article' IE_NAME = 'foxnews:article'
_TEST = { _TESTS = [{
# data-video-id
'url': 'http://www.foxnews.com/politics/2016/09/08/buzz-about-bud-clinton-camp-denies-claims-wore-earpiece-at-forum.html', 'url': 'http://www.foxnews.com/politics/2016/09/08/buzz-about-bud-clinton-camp-denies-claims-wore-earpiece-at-forum.html',
'md5': '62aa5a781b308fdee212ebb6f33ae7ef', 'md5': '83d44e1aff1433e7a29a7b537d1700b5',
'info_dict': { 'info_dict': {
'id': '5116295019001', 'id': '5116295019001',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Trump and Clinton asked to defend positions on Iraq War', 'title': 'Trump and Clinton asked to defend positions on Iraq War',
'description': 'Veterans react on \'The Kelly File\'', 'description': 'Veterans react on \'The Kelly File\'',
'timestamp': 1473299755, 'timestamp': 1473301045,
'upload_date': '20160908', 'upload_date': '20160908',
}, },
} }, {
# iframe embed
'url': 'http://www.foxnews.com/us/2018/03/09/parkland-survivor-kyle-kashuv-on-meeting-trump-his-app-to-prevent-another-school-shooting.amp.html?__twitter_impression=true',
'info_dict': {
'id': '5748266721001',
'ext': 'flv',
'title': 'Kyle Kashuv has a positive message for the Trump White House',
'description': 'Marjory Stoneman Douglas student disagrees with classmates.',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 229,
'timestamp': 1520594670,
'upload_date': '20180309',
},
'params': {
'skip_download': True,
},
}, {
'url': 'http://insider.foxnews.com/2016/08/25/univ-wisconsin-student-group-pushing-silence-certain-words',
'only_matching': True,
}]
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) display_id = self._match_id(url)
@ -90,51 +118,10 @@ class FoxNewsArticleIE(InfoExtractor):
video_id = self._html_search_regex( video_id = self._html_search_regex(
r'data-video-id=([\'"])(?P<id>[^\'"]+)\1', r'data-video-id=([\'"])(?P<id>[^\'"]+)\1',
webpage, 'video ID', group='id') webpage, 'video ID', group='id', default=None)
if video_id:
return self.url_result(
'http://video.foxnews.com/v/' + video_id, FoxNewsIE.ie_key())
return self.url_result( return self.url_result(
'http://video.foxnews.com/v/' + video_id, FoxNewsIE._extract_urls(webpage)[0], FoxNewsIE.ie_key())
FoxNewsIE.ie_key())
class FoxNewsInsiderIE(InfoExtractor):
_VALID_URL = r'https?://insider\.foxnews\.com/([^/]+/)+(?P<id>[a-z-]+)'
IE_NAME = 'foxnews:insider'
_TEST = {
'url': 'http://insider.foxnews.com/2016/08/25/univ-wisconsin-student-group-pushing-silence-certain-words',
'md5': 'a10c755e582d28120c62749b4feb4c0c',
'info_dict': {
'id': '5099377331001',
'display_id': 'univ-wisconsin-student-group-pushing-silence-certain-words',
'ext': 'mp4',
'title': 'Student Group: Saying \'Politically Correct,\' \'Trash\' and \'Lame\' Is Offensive',
'description': 'Is campus censorship getting out of control?',
'timestamp': 1472168725,
'upload_date': '20160825',
'thumbnail': r're:^https?://.*\.jpg$',
},
'params': {
# m3u8 download
'skip_download': True,
},
'add_ie': [FoxNewsIE.ie_key()],
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
embed_url = self._html_search_meta('embedUrl', webpage, 'embed URL')
title = self._og_search_title(webpage)
description = self._og_search_description(webpage)
return {
'_type': 'url_transparent',
'ie_key': FoxNewsIE.ie_key(),
'url': embed_url,
'display_id': display_id,
'title': title,
'description': description,
}

View File

@ -16,6 +16,7 @@ from ..utils import (
int_or_none, int_or_none,
parse_duration, parse_duration,
try_get, try_get,
url_or_none,
) )
from .dailymotion import DailymotionIE from .dailymotion import DailymotionIE
@ -115,14 +116,13 @@ class FranceTVIE(InfoExtractor):
def sign(manifest_url, manifest_id): def sign(manifest_url, manifest_id):
for host in ('hdfauthftv-a.akamaihd.net', 'hdfauth.francetv.fr'): for host in ('hdfauthftv-a.akamaihd.net', 'hdfauth.francetv.fr'):
signed_url = self._download_webpage( signed_url = url_or_none(self._download_webpage(
'https://%s/esi/TA' % host, video_id, 'https://%s/esi/TA' % host, video_id,
'Downloading signed %s manifest URL' % manifest_id, 'Downloading signed %s manifest URL' % manifest_id,
fatal=False, query={ fatal=False, query={
'url': manifest_url, 'url': manifest_url,
}) }))
if (signed_url and isinstance(signed_url, compat_str) and if signed_url:
re.search(r'^(?:https?:)?//', signed_url)):
return signed_url return signed_url
return manifest_url return manifest_url
@ -379,6 +379,31 @@ class FranceTVInfoIE(FranceTVBaseInfoExtractor):
return self._make_url_result(video_id, catalogue) return self._make_url_result(video_id, catalogue)
class FranceTVInfoSportIE(FranceTVBaseInfoExtractor):
IE_NAME = 'sport.francetvinfo.fr'
_VALID_URL = r'https?://sport\.francetvinfo\.fr/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://sport.francetvinfo.fr/les-jeux-olympiques/retour-sur-les-meilleurs-moments-de-pyeongchang-2018',
'info_dict': {
'id': '6e49080e-3f45-11e8-b459-000d3a2439ea',
'ext': 'mp4',
'title': 'Retour sur les meilleurs moments de Pyeongchang 2018',
'timestamp': 1523639962,
'upload_date': '20180413',
},
'params': {
'skip_download': True,
},
'add_ie': [FranceTVIE.ie_key()],
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(r'data-video="([^"]+)"', webpage, 'video_id')
return self._make_url_result(video_id, 'Sport-web')
class GenerationWhatIE(InfoExtractor): class GenerationWhatIE(InfoExtractor):
IE_NAME = 'france2.fr:generation-what' IE_NAME = 'france2.fr:generation-what'
_VALID_URL = r'https?://generation-what\.francetv\.fr/[^/]+/video/(?P<id>[^/?#&]+)' _VALID_URL = r'https?://generation-what\.francetv\.fr/[^/]+/video/(?P<id>[^/?#&]+)'

View File

@ -0,0 +1,263 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_str,
compat_urlparse,
)
from ..utils import (
ExtractorError,
parse_duration,
url_or_none,
urlencode_postdata,
)
class FrontendMastersBaseIE(InfoExtractor):
_API_BASE = 'https://api.frontendmasters.com/v1/kabuki'
_LOGIN_URL = 'https://frontendmasters.com/login/'
_NETRC_MACHINE = 'frontendmasters'
_QUALITIES = {
'low': {'width': 480, 'height': 360},
'mid': {'width': 1280, 'height': 720},
'high': {'width': 1920, 'height': 1080}
}
def _real_initialize(self):
self._login()
def _login(self):
(username, password) = self._get_login_info()
if username is None:
return
login_page = self._download_webpage(
self._LOGIN_URL, None, 'Downloading login page')
login_form = self._hidden_inputs(login_page)
login_form.update({
'username': username,
'password': password
})
post_url = self._search_regex(
r'<form[^>]+action=(["\'])(?P<url>.+?)\1', login_page,
'post_url', default=self._LOGIN_URL, group='url')
if not post_url.startswith('http'):
post_url = compat_urlparse.urljoin(self._LOGIN_URL, post_url)
response = self._download_webpage(
post_url, None, 'Logging in', data=urlencode_postdata(login_form),
headers={'Content-Type': 'application/x-www-form-urlencoded'})
# Successful login
if any(p in response for p in (
'wp-login.php?action=logout', '>Logout')):
return
error = self._html_search_regex(
r'class=(["\'])(?:(?!\1).)*\bMessageAlert\b(?:(?!\1).)*\1[^>]*>(?P<error>[^<]+)<',
response, 'error message', default=None, group='error')
if error:
raise ExtractorError('Unable to login: %s' % error, expected=True)
raise ExtractorError('Unable to log in')
class FrontendMastersPageBaseIE(FrontendMastersBaseIE):
def _download_course(self, course_name, url):
return self._download_json(
'%s/courses/%s' % (self._API_BASE, course_name), course_name,
'Downloading course JSON', headers={'Referer': url})
@staticmethod
def _extract_chapters(course):
chapters = []
lesson_elements = course.get('lessonElements')
if isinstance(lesson_elements, list):
chapters = [url_or_none(e) for e in lesson_elements if url_or_none(e)]
return chapters
@staticmethod
def _extract_lesson(chapters, lesson_id, lesson):
title = lesson.get('title') or lesson_id
display_id = lesson.get('slug')
description = lesson.get('description')
thumbnail = lesson.get('thumbnail')
chapter_number = None
index = lesson.get('index')
element_index = lesson.get('elementIndex')
if (isinstance(index, int) and isinstance(element_index, int) and
index < element_index):
chapter_number = element_index - index
chapter = (chapters[chapter_number - 1]
if chapter_number - 1 < len(chapters) else None)
duration = None
timestamp = lesson.get('timestamp')
if isinstance(timestamp, compat_str):
mobj = re.search(
r'(?P<start>\d{1,2}:\d{1,2}:\d{1,2})\s*-(?P<end>\s*\d{1,2}:\d{1,2}:\d{1,2})',
timestamp)
if mobj:
duration = parse_duration(mobj.group('end')) - parse_duration(
mobj.group('start'))
return {
'_type': 'url_transparent',
'url': 'frontendmasters:%s' % lesson_id,
'ie_key': FrontendMastersIE.ie_key(),
'id': lesson_id,
'display_id': display_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'duration': duration,
'chapter': chapter,
'chapter_number': chapter_number,
}
class FrontendMastersIE(FrontendMastersBaseIE):
_VALID_URL = r'(?:frontendmasters:|https?://api\.frontendmasters\.com/v\d+/kabuki/video/)(?P<id>[^/]+)'
_TESTS = [{
'url': 'https://api.frontendmasters.com/v1/kabuki/video/a2qogef6ba',
'md5': '7f161159710d6b7016a4f4af6fcb05e2',
'info_dict': {
'id': 'a2qogef6ba',
'ext': 'mp4',
'title': 'a2qogef6ba',
},
'skip': 'Requires FrontendMasters account credentials',
}, {
'url': 'frontendmasters:a2qogef6ba',
'only_matching': True,
}]
def _real_extract(self, url):
lesson_id = self._match_id(url)
source_url = '%s/video/%s/source' % (self._API_BASE, lesson_id)
formats = []
for ext in ('webm', 'mp4'):
for quality in ('low', 'mid', 'high'):
resolution = self._QUALITIES[quality].copy()
format_id = '%s-%s' % (ext, quality)
format_url = self._download_json(
source_url, lesson_id,
'Downloading %s source JSON' % format_id, query={
'f': ext,
'r': resolution['height'],
}, headers={
'Referer': url,
}, fatal=False)['url']
if not format_url:
continue
f = resolution.copy()
f.update({
'url': format_url,
'ext': ext,
'format_id': format_id,
})
formats.append(f)
self._sort_formats(formats)
subtitles = {
'en': [{
'url': '%s/transcripts/%s.vtt' % (self._API_BASE, lesson_id),
}]
}
return {
'id': lesson_id,
'title': lesson_id,
'formats': formats,
'subtitles': subtitles
}
class FrontendMastersLessonIE(FrontendMastersPageBaseIE):
_VALID_URL = r'https?://(?:www\.)?frontendmasters\.com/courses/(?P<course_name>[^/]+)/(?P<lesson_name>[^/]+)'
_TEST = {
'url': 'https://frontendmasters.com/courses/web-development/tools',
'info_dict': {
'id': 'a2qogef6ba',
'display_id': 'tools',
'ext': 'mp4',
'title': 'Tools',
'description': 'md5:82c1ea6472e88ed5acd1829fe992e4f7',
'thumbnail': r're:^https?://.*\.jpg$',
'chapter': 'Introduction',
'chapter_number': 1,
},
'params': {
'skip_download': True,
},
'skip': 'Requires FrontendMasters account credentials',
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
course_name, lesson_name = mobj.group('course_name', 'lesson_name')
course = self._download_course(course_name, url)
lesson_id, lesson = next(
(video_id, data)
for video_id, data in course['lessonData'].items()
if data.get('slug') == lesson_name)
chapters = self._extract_chapters(course)
return self._extract_lesson(chapters, lesson_id, lesson)
class FrontendMastersCourseIE(FrontendMastersPageBaseIE):
_VALID_URL = r'https?://(?:www\.)?frontendmasters\.com/courses/(?P<id>[^/]+)'
_TEST = {
'url': 'https://frontendmasters.com/courses/web-development/',
'info_dict': {
'id': 'web-development',
'title': 'Introduction to Web Development',
'description': 'md5:9317e6e842098bf725d62360e52d49a6',
},
'playlist_count': 81,
'skip': 'Requires FrontendMasters account credentials',
}
@classmethod
def suitable(cls, url):
return False if FrontendMastersLessonIE.suitable(url) else super(
FrontendMastersBaseIE, cls).suitable(url)
def _real_extract(self, url):
course_name = self._match_id(url)
course = self._download_course(course_name, url)
chapters = self._extract_chapters(course)
lessons = sorted(
course['lessonData'].values(), key=lambda data: data['index'])
entries = []
for lesson in lessons:
lesson_name = lesson.get('slug')
if not lesson_name:
continue
lesson_id = lesson.get('hash') or lesson.get('statsId')
entries.append(self._extract_lesson(chapters, lesson_id, lesson))
title = course.get('title')
description = course.get('description')
return self.playlist_result(entries, course_name, title, description)

View File

@ -51,7 +51,7 @@ class FunimationIE(InfoExtractor):
}] }]
def _login(self): def _login(self):
(username, password) = self._get_login_info() username, password = self._get_login_info()
if username is None: if username is None:
return return
try: try:

View File

@ -1,10 +1,12 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import itertools
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from .nexx import NexxIE from .nexx import NexxIE
from ..compat import compat_str
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
try_get, try_get,
@ -12,6 +14,19 @@ from ..utils import (
class FunkBaseIE(InfoExtractor): class FunkBaseIE(InfoExtractor):
_HEADERS = {
'Accept': '*/*',
'Accept-Language': 'en-US,en;q=0.9,ru;q=0.8',
'authorization': 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbGllbnROYW1lIjoid2ViYXBwLXYzMSIsInNjb3BlIjoic3RhdGljLWNvbnRlbnQtYXBpLGN1cmF0aW9uLWFwaSxuZXh4LWNvbnRlbnQtYXBpLXYzMSx3ZWJhcHAtYXBpIn0.mbuG9wS9Yf5q6PqgR4fiaRFIagiHk9JhwoKES7ksVX4',
}
_AUTH = 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbGllbnROYW1lIjoid2ViYXBwLXYzMSIsInNjb3BlIjoic3RhdGljLWNvbnRlbnQtYXBpLGN1cmF0aW9uLWFwaSxuZXh4LWNvbnRlbnQtYXBpLXYzMSx3ZWJhcHAtYXBpIn0.mbuG9wS9Yf5q6PqgR4fiaRFIagiHk9JhwoKES7ksVX4'
@staticmethod
def _make_headers(referer):
headers = FunkBaseIE._HEADERS.copy()
headers['Referer'] = referer
return headers
def _make_url_result(self, video): def _make_url_result(self, video):
return { return {
'_type': 'url_transparent', '_type': 'url_transparent',
@ -48,19 +63,19 @@ class FunkMixIE(FunkBaseIE):
lists = self._download_json( lists = self._download_json(
'https://www.funk.net/api/v3.1/curation/curatedLists/', 'https://www.funk.net/api/v3.1/curation/curatedLists/',
mix_id, headers={ mix_id, headers=self._make_headers(url), query={
'authorization': 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbGllbnROYW1lIjoiY3VyYXRpb24tdG9vbC12Mi4wIiwic2NvcGUiOiJzdGF0aWMtY29udGVudC1hcGksY3VyYXRpb24tc2VydmljZSxzZWFyY2gtYXBpIn0.SGCC1IXHLtZYoo8PvRKlU2gXH1su8YSu47sB3S4iXBI',
'Referer': url,
}, query={
'size': 100, 'size': 100,
})['result']['lists'] })['_embedded']['curatedListList']
metas = next( metas = next(
l for l in lists l for l in lists
if mix_id in (l.get('entityId'), l.get('alias')))['videoMetas'] if mix_id in (l.get('entityId'), l.get('alias')))['videoMetas']
video = next( video = next(
meta['videoDataDelegate'] meta['videoDataDelegate']
for meta in metas if meta.get('alias') == alias) for meta in metas
if try_get(
meta, lambda x: x['videoDataDelegate']['alias'],
compat_str) == alias)
return self._make_url_result(video) return self._make_url_result(video)
@ -104,25 +119,53 @@ class FunkChannelIE(FunkBaseIE):
channel_id = mobj.group('id') channel_id = mobj.group('id')
alias = mobj.group('alias') alias = mobj.group('alias')
headers = { headers = self._make_headers(url)
'authorization': 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbGllbnROYW1lIjoiY3VyYXRpb24tdG9vbCIsInNjb3BlIjoic3RhdGljLWNvbnRlbnQtYXBpLGN1cmF0aW9uLWFwaSxzZWFyY2gtYXBpIn0.q4Y2xZG8PFHai24-4Pjx2gym9RmJejtmK6lMXP5wAgc',
'Referer': url,
}
video = None video = None
by_id_list = self._download_json( # Id-based channels are currently broken on their side: webplayer
'https://www.funk.net/api/v3.0/content/videos/byIdList', channel_id, # tries to process them via byChannelAlias endpoint and fails
headers=headers, query={ # predictably.
'ids': alias, for page_num in itertools.count():
}, fatal=False) by_channel_alias = self._download_json(
if by_id_list: 'https://www.funk.net/api/v3.1/webapp/videos/byChannelAlias/%s'
video = try_get(by_id_list, lambda x: x['result'][0], dict) % channel_id,
'Downloading byChannelAlias JSON page %d' % (page_num + 1),
headers=headers, query={
'filterFsk': 'false',
'sort': 'creationDate,desc',
'size': 100,
'page': page_num,
}, fatal=False)
if not by_channel_alias:
break
video_list = try_get(
by_channel_alias, lambda x: x['_embedded']['videoList'], list)
if not video_list:
break
try:
video = next(r for r in video_list if r.get('alias') == alias)
break
except StopIteration:
pass
if not try_get(
by_channel_alias, lambda x: x['_links']['next']):
break
if not video:
by_id_list = self._download_json(
'https://www.funk.net/api/v3.0/content/videos/byIdList',
channel_id, 'Downloading byIdList JSON', headers=headers,
query={
'ids': alias,
}, fatal=False)
if by_id_list:
video = try_get(by_id_list, lambda x: x['result'][0], dict)
if not video: if not video:
results = self._download_json( results = self._download_json(
'https://www.funk.net/api/v3.0/content/videos/filter', channel_id, 'https://www.funk.net/api/v3.0/content/videos/filter',
headers=headers, query={ channel_id, 'Downloading filter JSON', headers=headers, query={
'channelId': channel_id, 'channelId': channel_id,
'size': 100, 'size': 100,
})['result'] })['result']

View File

@ -91,7 +91,7 @@ class GDCVaultIE(InfoExtractor):
] ]
def _login(self, webpage_url, display_id): def _login(self, webpage_url, display_id):
(username, password) = self._get_login_info() username, password = self._get_login_info()
if username is None or password is None: if username is None or password is None:
self.report_warning('It looks like ' + webpage_url + ' requires a login. Try specifying a username and password and try again.') self.report_warning('It looks like ' + webpage_url + ' requires a login. Try specifying a username and password and try again.')
return None return None

View File

@ -32,6 +32,7 @@ from ..utils import (
unified_strdate, unified_strdate,
unsmuggle_url, unsmuggle_url,
UnsupportedError, UnsupportedError,
url_or_none,
xpath_text, xpath_text,
) )
from .commonprotocols import RtmpIE from .commonprotocols import RtmpIE
@ -108,6 +109,12 @@ from .yapfiles import YapFilesIE
from .vice import ViceIE from .vice import ViceIE
from .xfileshare import XFileShareIE from .xfileshare import XFileShareIE
from .cloudflarestream import CloudflareStreamIE from .cloudflarestream import CloudflareStreamIE
from .peertube import PeerTubeIE
from .indavideo import IndavideoEmbedIE
from .apa import APAIE
from .foxnews import FoxNewsIE
from .viqeo import ViqeoIE
from .expressen import ExpressenIE
class GenericIE(InfoExtractor): class GenericIE(InfoExtractor):
@ -1391,17 +1398,6 @@ class GenericIE(InfoExtractor):
'skip_download': True, 'skip_download': True,
}, },
}, },
# SVT embed
{
'url': 'http://www.svt.se/sport/ishockey/jagr-tacklar-giroux-under-intervjun',
'info_dict': {
'id': '2900353',
'ext': 'flv',
'title': 'Här trycker Jagr till Giroux (under SVT-intervjun)',
'duration': 27,
'age_limit': 0,
},
},
# Crooks and Liars embed # Crooks and Liars embed
{ {
'url': 'http://crooksandliars.com/2015/04/fox-friends-says-protecting-atheists', 'url': 'http://crooksandliars.com/2015/04/fox-friends-says-protecting-atheists',
@ -2012,6 +2008,50 @@ class GenericIE(InfoExtractor):
'skip_download': True, 'skip_download': True,
}, },
}, },
{
# PeerTube embed
'url': 'https://joinpeertube.org/fr/home/',
'info_dict': {
'id': 'home',
'title': 'Reprenez le contrôle de vos vidéos ! #JoinPeertube',
},
'playlist_count': 2,
},
{
# Indavideo embed
'url': 'https://streetkitchen.hu/receptek/igy_kell_otthon_hamburgert_sutni/',
'info_dict': {
'id': '1693903',
'ext': 'mp4',
'title': 'Így kell otthon hamburgert sütni',
'description': 'md5:f5a730ecf900a5c852e1e00540bbb0f7',
'timestamp': 1426330212,
'upload_date': '20150314',
'uploader': 'StreetKitchen',
'uploader_id': '546363',
},
'add_ie': [IndavideoEmbedIE.ie_key()],
'params': {
'skip_download': True,
},
},
{
# APA embed via JWPlatform embed
'url': 'http://www.vol.at/blue-man-group/5593454',
'info_dict': {
'id': 'jjv85FdZ',
'ext': 'mp4',
'title': '"Blau ist mysteriös": Die Blue Man Group im Interview',
'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 254,
'timestamp': 1519211149,
'upload_date': '20180221',
},
'params': {
'skip_download': True,
},
},
{ {
'url': 'http://share-videos.se/auto/video/83645793?uid=13', 'url': 'http://share-videos.se/auto/video/83645793?uid=13',
'md5': 'b68d276de422ab07ee1d49388103f457', 'md5': 'b68d276de422ab07ee1d49388103f457',
@ -2022,6 +2062,15 @@ class GenericIE(InfoExtractor):
}, },
'skip': 'TODO: fix nested playlists processing in tests', 'skip': 'TODO: fix nested playlists processing in tests',
}, },
{
# Viqeo embeds
'url': 'https://viqeo.tv/',
'info_dict': {
'id': 'viqeo',
'title': 'All-new video platform',
},
'playlist_count': 6,
},
# { # {
# # TODO: find another test # # TODO: find another test
# # http://schema.org/VideoObject # # http://schema.org/VideoObject
@ -3029,6 +3078,26 @@ class GenericIE(InfoExtractor):
return self.playlist_from_matches( return self.playlist_from_matches(
cloudflarestream_urls, video_id, video_title, ie=CloudflareStreamIE.ie_key()) cloudflarestream_urls, video_id, video_title, ie=CloudflareStreamIE.ie_key())
peertube_urls = PeerTubeIE._extract_urls(webpage, url)
if peertube_urls:
return self.playlist_from_matches(
peertube_urls, video_id, video_title, ie=PeerTubeIE.ie_key())
indavideo_urls = IndavideoEmbedIE._extract_urls(webpage)
if indavideo_urls:
return self.playlist_from_matches(
indavideo_urls, video_id, video_title, ie=IndavideoEmbedIE.ie_key())
apa_urls = APAIE._extract_urls(webpage)
if apa_urls:
return self.playlist_from_matches(
apa_urls, video_id, video_title, ie=APAIE.ie_key())
foxnews_urls = FoxNewsIE._extract_urls(webpage)
if foxnews_urls:
return self.playlist_from_matches(
foxnews_urls, video_id, video_title, ie=FoxNewsIE.ie_key())
sharevideos_urls = [mobj.group('url') for mobj in re.finditer( sharevideos_urls = [mobj.group('url') for mobj in re.finditer(
r'<iframe[^>]+?\bsrc\s*=\s*(["\'])(?P<url>(?:https?:)?//embed\.share-videos\.se/auto/embed/\d+\?.*?\buid=\d+.*?)\1', r'<iframe[^>]+?\bsrc\s*=\s*(["\'])(?P<url>(?:https?:)?//embed\.share-videos\.se/auto/embed/\d+\?.*?\buid=\d+.*?)\1',
webpage)] webpage)]
@ -3036,6 +3105,16 @@ class GenericIE(InfoExtractor):
return self.playlist_from_matches( return self.playlist_from_matches(
sharevideos_urls, video_id, video_title) sharevideos_urls, video_id, video_title)
viqeo_urls = ViqeoIE._extract_urls(webpage)
if viqeo_urls:
return self.playlist_from_matches(
viqeo_urls, video_id, video_title, ie=ViqeoIE.ie_key())
expressen_urls = ExpressenIE._extract_urls(webpage)
if expressen_urls:
return self.playlist_from_matches(
expressen_urls, video_id, video_title, ie=ExpressenIE.ie_key())
# Look for HTML5 media # Look for HTML5 media
entries = self._parse_html5_media_entries(url, webpage, video_id, m3u8_id='hls') entries = self._parse_html5_media_entries(url, webpage, video_id, m3u8_id='hls')
if entries: if entries:
@ -3073,8 +3152,8 @@ class GenericIE(InfoExtractor):
sources = [sources] sources = [sources]
formats = [] formats = []
for source in sources: for source in sources:
src = source.get('src') src = url_or_none(source.get('src'))
if not src or not isinstance(src, compat_str): if not src:
continue continue
src = compat_urlparse.urljoin(url, src) src = compat_urlparse.urljoin(url, src)
src_type = source.get('type') src_type = source.get('type')

View File

@ -1,15 +1,16 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import base64
import hashlib
import json
import random import random
import re import re
import math
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_HTTPError,
compat_str, compat_str,
compat_chr,
compat_ord,
) )
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
@ -22,12 +23,7 @@ from ..utils import (
class GloboIE(InfoExtractor): class GloboIE(InfoExtractor):
_VALID_URL = r'(?:globo:|https?://.+?\.globo\.com/(?:[^/]+/)*(?:v/(?:[^/]+/)?|videos/))(?P<id>\d{7,})' _VALID_URL = r'(?:globo:|https?://.+?\.globo\.com/(?:[^/]+/)*(?:v/(?:[^/]+/)?|videos/))(?P<id>\d{7,})'
_NETRC_MACHINE = 'globo'
_API_URL_TEMPLATE = 'http://api.globovideos.com/videos/%s/playlist'
_SECURITY_URL_TEMPLATE = 'http://security.video.globo.com/videos/%s/hash?player=flash&version=17.0.0.132&resource_id=%s'
_RESIGN_EXPIRATION = 86400
_TESTS = [{ _TESTS = [{
'url': 'http://g1.globo.com/carros/autoesporte/videos/t/exclusivos-do-g1/v/mercedes-benz-gla-passa-por-teste-de-colisao-na-europa/3607726/', 'url': 'http://g1.globo.com/carros/autoesporte/videos/t/exclusivos-do-g1/v/mercedes-benz-gla-passa-por-teste-de-colisao-na-europa/3607726/',
'md5': 'b3ccc801f75cd04a914d51dadb83a78d', 'md5': 'b3ccc801f75cd04a914d51dadb83a78d',
@ -70,287 +66,51 @@ class GloboIE(InfoExtractor):
'only_matching': True, 'only_matching': True,
}] }]
class MD5(object): def _real_initialize(self):
HEX_FORMAT_LOWERCASE = 0 email, password = self._get_login_info()
HEX_FORMAT_UPPERCASE = 1 if email is None:
BASE64_PAD_CHARACTER_DEFAULT_COMPLIANCE = '' return
BASE64_PAD_CHARACTER_RFC_COMPLIANCE = '='
PADDING = '=0xFF01DD'
hexcase = 0
b64pad = ''
def __init__(self): try:
pass self._download_json(
'https://login.globo.com/api/authentication', None, data=json.dumps({
class JSArray(list): 'payload': {
def __getitem__(self, y): 'email': email,
try: 'password': password,
return list.__getitem__(self, y) 'serviceId': 4654,
except IndexError: },
return 0 }).encode(), headers={
'Content-Type': 'application/json; charset=utf-8',
def __setitem__(self, i, y): })
try: except ExtractorError as e:
return list.__setitem__(self, i, y) if isinstance(e.cause, compat_HTTPError) and e.cause.code == 401:
except IndexError: resp = self._parse_json(e.cause.read(), None)
self.extend([0] * (i - len(self) + 1)) raise ExtractorError(resp.get('userMessage') or resp['id'], expected=True)
self[-1] = y raise
@classmethod
def hex_md5(cls, param1):
return cls.rstr2hex(cls.rstr_md5(cls.str2rstr_utf8(param1)))
@classmethod
def b64_md5(cls, param1, param2=None):
return cls.rstr2b64(cls.rstr_md5(cls.str2rstr_utf8(param1, param2)))
@classmethod
def any_md5(cls, param1, param2):
return cls.rstr2any(cls.rstr_md5(cls.str2rstr_utf8(param1)), param2)
@classmethod
def rstr_md5(cls, param1):
return cls.binl2rstr(cls.binl_md5(cls.rstr2binl(param1), len(param1) * 8))
@classmethod
def rstr2hex(cls, param1):
_loc_2 = '0123456789ABCDEF' if cls.hexcase else '0123456789abcdef'
_loc_3 = ''
for _loc_5 in range(0, len(param1)):
_loc_4 = compat_ord(param1[_loc_5])
_loc_3 += _loc_2[_loc_4 >> 4 & 15] + _loc_2[_loc_4 & 15]
return _loc_3
@classmethod
def rstr2b64(cls, param1):
_loc_2 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_'
_loc_3 = ''
_loc_4 = len(param1)
for _loc_5 in range(0, _loc_4, 3):
_loc_6_1 = compat_ord(param1[_loc_5]) << 16
_loc_6_2 = compat_ord(param1[_loc_5 + 1]) << 8 if _loc_5 + 1 < _loc_4 else 0
_loc_6_3 = compat_ord(param1[_loc_5 + 2]) if _loc_5 + 2 < _loc_4 else 0
_loc_6 = _loc_6_1 | _loc_6_2 | _loc_6_3
for _loc_7 in range(0, 4):
if _loc_5 * 8 + _loc_7 * 6 > len(param1) * 8:
_loc_3 += cls.b64pad
else:
_loc_3 += _loc_2[_loc_6 >> 6 * (3 - _loc_7) & 63]
return _loc_3
@staticmethod
def rstr2any(param1, param2):
_loc_3 = len(param2)
_loc_4 = []
_loc_9 = [0] * ((len(param1) >> 2) + 1)
for _loc_5 in range(0, len(_loc_9)):
_loc_9[_loc_5] = compat_ord(param1[_loc_5 * 2]) << 8 | compat_ord(param1[_loc_5 * 2 + 1])
while len(_loc_9) > 0:
_loc_8 = []
_loc_7 = 0
for _loc_5 in range(0, len(_loc_9)):
_loc_7 = (_loc_7 << 16) + _loc_9[_loc_5]
_loc_6 = math.floor(_loc_7 / _loc_3)
_loc_7 -= _loc_6 * _loc_3
if len(_loc_8) > 0 or _loc_6 > 0:
_loc_8[len(_loc_8)] = _loc_6
_loc_4[len(_loc_4)] = _loc_7
_loc_9 = _loc_8
_loc_10 = ''
_loc_5 = len(_loc_4) - 1
while _loc_5 >= 0:
_loc_10 += param2[_loc_4[_loc_5]]
_loc_5 -= 1
return _loc_10
@classmethod
def str2rstr_utf8(cls, param1, param2=None):
_loc_3 = ''
_loc_4 = -1
if not param2:
param2 = cls.PADDING
param1 = param1 + param2[1:9]
while True:
_loc_4 += 1
if _loc_4 >= len(param1):
break
_loc_5 = compat_ord(param1[_loc_4])
_loc_6 = compat_ord(param1[_loc_4 + 1]) if _loc_4 + 1 < len(param1) else 0
if 55296 <= _loc_5 <= 56319 and 56320 <= _loc_6 <= 57343:
_loc_5 = 65536 + ((_loc_5 & 1023) << 10) + (_loc_6 & 1023)
_loc_4 += 1
if _loc_5 <= 127:
_loc_3 += compat_chr(_loc_5)
continue
if _loc_5 <= 2047:
_loc_3 += compat_chr(192 | _loc_5 >> 6 & 31) + compat_chr(128 | _loc_5 & 63)
continue
if _loc_5 <= 65535:
_loc_3 += compat_chr(224 | _loc_5 >> 12 & 15) + compat_chr(128 | _loc_5 >> 6 & 63) + compat_chr(
128 | _loc_5 & 63)
continue
if _loc_5 <= 2097151:
_loc_3 += compat_chr(240 | _loc_5 >> 18 & 7) + compat_chr(128 | _loc_5 >> 12 & 63) + compat_chr(
128 | _loc_5 >> 6 & 63) + compat_chr(128 | _loc_5 & 63)
return _loc_3
@staticmethod
def rstr2binl(param1):
_loc_2 = [0] * ((len(param1) >> 2) + 1)
for _loc_3 in range(0, len(_loc_2)):
_loc_2[_loc_3] = 0
for _loc_3 in range(0, len(param1) * 8, 8):
_loc_2[_loc_3 >> 5] |= (compat_ord(param1[_loc_3 // 8]) & 255) << _loc_3 % 32
return _loc_2
@staticmethod
def binl2rstr(param1):
_loc_2 = ''
for _loc_3 in range(0, len(param1) * 32, 8):
_loc_2 += compat_chr(param1[_loc_3 >> 5] >> _loc_3 % 32 & 255)
return _loc_2
@classmethod
def binl_md5(cls, param1, param2):
param1 = cls.JSArray(param1)
param1[param2 >> 5] |= 128 << param2 % 32
param1[(param2 + 64 >> 9 << 4) + 14] = param2
_loc_3 = 1732584193
_loc_4 = -271733879
_loc_5 = -1732584194
_loc_6 = 271733878
for _loc_7 in range(0, len(param1), 16):
_loc_8 = _loc_3
_loc_9 = _loc_4
_loc_10 = _loc_5
_loc_11 = _loc_6
_loc_3 = cls.md5_ff(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 0], 7, -680876936)
_loc_6 = cls.md5_ff(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 1], 12, -389564586)
_loc_5 = cls.md5_ff(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 2], 17, 606105819)
_loc_4 = cls.md5_ff(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 3], 22, -1044525330)
_loc_3 = cls.md5_ff(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 4], 7, -176418897)
_loc_6 = cls.md5_ff(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 5], 12, 1200080426)
_loc_5 = cls.md5_ff(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 6], 17, -1473231341)
_loc_4 = cls.md5_ff(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 7], 22, -45705983)
_loc_3 = cls.md5_ff(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 8], 7, 1770035416)
_loc_6 = cls.md5_ff(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 9], 12, -1958414417)
_loc_5 = cls.md5_ff(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 10], 17, -42063)
_loc_4 = cls.md5_ff(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 11], 22, -1990404162)
_loc_3 = cls.md5_ff(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 12], 7, 1804603682)
_loc_6 = cls.md5_ff(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 13], 12, -40341101)
_loc_5 = cls.md5_ff(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 14], 17, -1502002290)
_loc_4 = cls.md5_ff(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 15], 22, 1236535329)
_loc_3 = cls.md5_gg(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 1], 5, -165796510)
_loc_6 = cls.md5_gg(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 6], 9, -1069501632)
_loc_5 = cls.md5_gg(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 11], 14, 643717713)
_loc_4 = cls.md5_gg(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 0], 20, -373897302)
_loc_3 = cls.md5_gg(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 5], 5, -701558691)
_loc_6 = cls.md5_gg(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 10], 9, 38016083)
_loc_5 = cls.md5_gg(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 15], 14, -660478335)
_loc_4 = cls.md5_gg(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 4], 20, -405537848)
_loc_3 = cls.md5_gg(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 9], 5, 568446438)
_loc_6 = cls.md5_gg(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 14], 9, -1019803690)
_loc_5 = cls.md5_gg(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 3], 14, -187363961)
_loc_4 = cls.md5_gg(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 8], 20, 1163531501)
_loc_3 = cls.md5_gg(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 13], 5, -1444681467)
_loc_6 = cls.md5_gg(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 2], 9, -51403784)
_loc_5 = cls.md5_gg(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 7], 14, 1735328473)
_loc_4 = cls.md5_gg(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 12], 20, -1926607734)
_loc_3 = cls.md5_hh(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 5], 4, -378558)
_loc_6 = cls.md5_hh(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 8], 11, -2022574463)
_loc_5 = cls.md5_hh(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 11], 16, 1839030562)
_loc_4 = cls.md5_hh(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 14], 23, -35309556)
_loc_3 = cls.md5_hh(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 1], 4, -1530992060)
_loc_6 = cls.md5_hh(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 4], 11, 1272893353)
_loc_5 = cls.md5_hh(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 7], 16, -155497632)
_loc_4 = cls.md5_hh(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 10], 23, -1094730640)
_loc_3 = cls.md5_hh(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 13], 4, 681279174)
_loc_6 = cls.md5_hh(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 0], 11, -358537222)
_loc_5 = cls.md5_hh(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 3], 16, -722521979)
_loc_4 = cls.md5_hh(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 6], 23, 76029189)
_loc_3 = cls.md5_hh(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 9], 4, -640364487)
_loc_6 = cls.md5_hh(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 12], 11, -421815835)
_loc_5 = cls.md5_hh(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 15], 16, 530742520)
_loc_4 = cls.md5_hh(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 2], 23, -995338651)
_loc_3 = cls.md5_ii(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 0], 6, -198630844)
_loc_6 = cls.md5_ii(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 7], 10, 1126891415)
_loc_5 = cls.md5_ii(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 14], 15, -1416354905)
_loc_4 = cls.md5_ii(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 5], 21, -57434055)
_loc_3 = cls.md5_ii(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 12], 6, 1700485571)
_loc_6 = cls.md5_ii(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 3], 10, -1894986606)
_loc_5 = cls.md5_ii(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 10], 15, -1051523)
_loc_4 = cls.md5_ii(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 1], 21, -2054922799)
_loc_3 = cls.md5_ii(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 8], 6, 1873313359)
_loc_6 = cls.md5_ii(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 15], 10, -30611744)
_loc_5 = cls.md5_ii(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 6], 15, -1560198380)
_loc_4 = cls.md5_ii(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 13], 21, 1309151649)
_loc_3 = cls.md5_ii(_loc_3, _loc_4, _loc_5, _loc_6, param1[_loc_7 + 4], 6, -145523070)
_loc_6 = cls.md5_ii(_loc_6, _loc_3, _loc_4, _loc_5, param1[_loc_7 + 11], 10, -1120210379)
_loc_5 = cls.md5_ii(_loc_5, _loc_6, _loc_3, _loc_4, param1[_loc_7 + 2], 15, 718787259)
_loc_4 = cls.md5_ii(_loc_4, _loc_5, _loc_6, _loc_3, param1[_loc_7 + 9], 21, -343485551)
_loc_3 = cls.safe_add(_loc_3, _loc_8)
_loc_4 = cls.safe_add(_loc_4, _loc_9)
_loc_5 = cls.safe_add(_loc_5, _loc_10)
_loc_6 = cls.safe_add(_loc_6, _loc_11)
return [_loc_3, _loc_4, _loc_5, _loc_6]
@classmethod
def md5_cmn(cls, param1, param2, param3, param4, param5, param6):
return cls.safe_add(
cls.bit_rol(cls.safe_add(cls.safe_add(param2, param1), cls.safe_add(param4, param6)), param5), param3)
@classmethod
def md5_ff(cls, param1, param2, param3, param4, param5, param6, param7):
return cls.md5_cmn(param2 & param3 | ~param2 & param4, param1, param2, param5, param6, param7)
@classmethod
def md5_gg(cls, param1, param2, param3, param4, param5, param6, param7):
return cls.md5_cmn(param2 & param4 | param3 & ~param4, param1, param2, param5, param6, param7)
@classmethod
def md5_hh(cls, param1, param2, param3, param4, param5, param6, param7):
return cls.md5_cmn(param2 ^ param3 ^ param4, param1, param2, param5, param6, param7)
@classmethod
def md5_ii(cls, param1, param2, param3, param4, param5, param6, param7):
return cls.md5_cmn(param3 ^ (param2 | ~param4), param1, param2, param5, param6, param7)
@classmethod
def safe_add(cls, param1, param2):
_loc_3 = (param1 & 65535) + (param2 & 65535)
_loc_4 = (param1 >> 16) + (param2 >> 16) + (_loc_3 >> 16)
return cls.lshift(_loc_4, 16) | _loc_3 & 65535
@classmethod
def bit_rol(cls, param1, param2):
return cls.lshift(param1, param2) | (param1 & 0xFFFFFFFF) >> (32 - param2)
@staticmethod
def lshift(value, count):
r = (0xFFFFFFFF & value) << count
return -(~(r - 1) & 0xFFFFFFFF) if r > 0x7FFFFFFF else r
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
video = self._download_json( video = self._download_json(
self._API_URL_TEMPLATE % video_id, video_id)['videos'][0] 'http://api.globovideos.com/videos/%s/playlist' % video_id,
video_id)['videos'][0]
title = video['title'] title = video['title']
formats = [] formats = []
for resource in video['resources']: for resource in video['resources']:
resource_id = resource.get('_id') resource_id = resource.get('_id')
if not resource_id or resource_id.endswith('manifest'): resource_url = resource.get('url')
if not resource_id or not resource_url:
continue continue
security = self._download_json( security = self._download_json(
self._SECURITY_URL_TEMPLATE % (video_id, resource_id), 'http://security.video.globo.com/videos/%s/hash' % video_id,
video_id, 'Downloading security hash for %s' % resource_id) video_id, 'Downloading security hash for %s' % resource_id, query={
'player': 'flash',
'version': '17.0.0.132',
'resource_id': resource_id,
})
security_hash = security.get('hash') security_hash = security.get('hash')
if not security_hash: if not security_hash:
@ -361,22 +121,28 @@ class GloboIE(InfoExtractor):
continue continue
hash_code = security_hash[:2] hash_code = security_hash[:2]
received_time = int(security_hash[2:12]) received_time = security_hash[2:12]
received_random = security_hash[12:22] received_random = security_hash[12:22]
received_md5 = security_hash[22:] received_md5 = security_hash[22:]
sign_time = received_time + self._RESIGN_EXPIRATION sign_time = compat_str(int(received_time) + 86400)
padding = '%010d' % random.randint(1, 10000000000) padding = '%010d' % random.randint(1, 10000000000)
signed_md5 = self.MD5.b64_md5(received_md5 + compat_str(sign_time) + padding) md5_data = (received_md5 + sign_time + padding + '0xFF01DD').encode()
signed_hash = hash_code + compat_str(received_time) + received_random + compat_str(sign_time) + padding + signed_md5 signed_md5 = base64.urlsafe_b64encode(hashlib.md5(md5_data).digest()).decode().strip('=')
signed_hash = hash_code + received_time + received_random + sign_time + padding + signed_md5
resource_url = resource['url']
signed_url = '%s?h=%s&k=%s' % (resource_url, signed_hash, 'flash') signed_url = '%s?h=%s&k=%s' % (resource_url, signed_hash, 'flash')
if resource_id.endswith('m3u8') or resource_url.endswith('.m3u8'): if resource_id.endswith('m3u8') or resource_url.endswith('.m3u8'):
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
signed_url, resource_id, 'mp4', entry_protocol='m3u8_native', signed_url, resource_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False)) m3u8_id='hls', fatal=False))
elif resource_id.endswith('mpd') or resource_url.endswith('.mpd'):
formats.extend(self._extract_mpd_formats(
signed_url, resource_id, mpd_id='dash', fatal=False))
elif resource_id.endswith('manifest') or resource_url.endswith('/manifest'):
formats.extend(self._extract_ism_formats(
signed_url, resource_id, ism_id='mss', fatal=False))
else: else:
formats.append({ formats.append({
'url': signed_url, 'url': signed_url,

View File

@ -4,16 +4,19 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_HTTPError
from ..utils import ( from ..utils import (
determine_ext, determine_ext,
ExtractorError,
int_or_none, int_or_none,
parse_age_limit,
parse_iso8601, parse_iso8601,
) )
class Go90IE(InfoExtractor): class Go90IE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?go90\.com/videos/(?P<id>[0-9a-zA-Z]+)' _VALID_URL = r'https?://(?:www\.)?go90\.com/(?:videos|embed)/(?P<id>[0-9a-zA-Z]+)'
_TEST = { _TESTS = [{
'url': 'https://www.go90.com/videos/84BUqjLpf9D', 'url': 'https://www.go90.com/videos/84BUqjLpf9D',
'md5': 'efa7670dbbbf21a7b07b360652b24a32', 'md5': 'efa7670dbbbf21a7b07b360652b24a32',
'info_dict': { 'info_dict': {
@ -23,16 +26,35 @@ class Go90IE(InfoExtractor):
'description': 'VICE\'s Karley Sciortino meets with activists who discuss the state\'s strong anti-porn stance. Then, VICE Sports explains NFL contracts.', 'description': 'VICE\'s Karley Sciortino meets with activists who discuss the state\'s strong anti-porn stance. Then, VICE Sports explains NFL contracts.',
'timestamp': 1491868800, 'timestamp': 1491868800,
'upload_date': '20170411', 'upload_date': '20170411',
'age_limit': 14,
} }
} }, {
'url': 'https://www.go90.com/embed/261MflWkD3N',
'only_matching': True,
}]
_GEO_BYPASS = False
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
video_data = self._download_json(
'https://www.go90.com/api/view/items/' + video_id, try:
video_id, headers={ headers = self.geo_verification_headers()
headers.update({
'Content-Type': 'application/json; charset=utf-8', 'Content-Type': 'application/json; charset=utf-8',
}, data=b'{"client":"web","device_type":"pc"}') })
video_data = self._download_json(
'https://www.go90.com/api/view/items/' + video_id, video_id,
headers=headers, data=b'{"client":"web","device_type":"pc"}')
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 400:
message = self._parse_json(e.cause.read().decode(), None)['error']['message']
if 'region unavailable' in message:
self.raise_geo_restricted(countries=['US'])
raise ExtractorError(message, expected=True)
raise
if video_data.get('requires_drm'):
raise ExtractorError('This video is DRM protected.', expected=True)
main_video_asset = video_data['main_video_asset'] main_video_asset = video_data['main_video_asset']
episode_number = int_or_none(video_data.get('episode_number')) episode_number = int_or_none(video_data.get('episode_number'))
@ -123,4 +145,5 @@ class Go90IE(InfoExtractor):
'season_number': season_number, 'season_number': season_number,
'episode_number': episode_number, 'episode_number': episode_number,
'subtitles': subtitles, 'subtitles': subtitles,
'age_limit': parse_age_limit(video_data.get('rating')),
} }

View File

@ -8,6 +8,7 @@ from ..compat import compat_str
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
int_or_none, int_or_none,
url_or_none,
urlencode_postdata, urlencode_postdata,
) )
@ -17,6 +18,8 @@ class HiDiveIE(InfoExtractor):
# Using X-Forwarded-For results in 403 HTTP error for HLS fragments, # Using X-Forwarded-For results in 403 HTTP error for HLS fragments,
# so disabling geo bypass completely # so disabling geo bypass completely
_GEO_BYPASS = False _GEO_BYPASS = False
_NETRC_MACHINE = 'hidive'
_LOGIN_URL = 'https://www.hidive.com/account/login'
_TESTS = [{ _TESTS = [{
'url': 'https://www.hidive.com/stream/the-comic-artist-and-his-assistants/s01e001', 'url': 'https://www.hidive.com/stream/the-comic-artist-and-his-assistants/s01e001',
@ -31,8 +34,26 @@ class HiDiveIE(InfoExtractor):
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
'skip': 'Requires Authentication',
}] }]
def _real_initialize(self):
email, password = self._get_login_info()
if email is None:
return
webpage = self._download_webpage(self._LOGIN_URL, None)
form = self._search_regex(
r'(?s)<form[^>]+action="/account/login"[^>]*>(.+?)</form>',
webpage, 'login form')
data = self._hidden_inputs(form)
data.update({
'Email': email,
'Password': password,
})
self._download_webpage(
self._LOGIN_URL, None, 'Logging in', data=urlencode_postdata(data))
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
title, key = mobj.group('title', 'key') title, key = mobj.group('title', 'key')
@ -43,6 +64,7 @@ class HiDiveIE(InfoExtractor):
data=urlencode_postdata({ data=urlencode_postdata({
'Title': title, 'Title': title,
'Key': key, 'Key': key,
'PlayerId': 'f4f895ce1ca713ba263b91caeb1daa2d08904783',
})) }))
restriction = settings.get('restrictionReason') restriction = settings.get('restrictionReason')
@ -59,8 +81,8 @@ class HiDiveIE(InfoExtractor):
bitrates = rendition.get('bitrates') bitrates = rendition.get('bitrates')
if not isinstance(bitrates, dict): if not isinstance(bitrates, dict):
continue continue
m3u8_url = bitrates.get('hls') m3u8_url = url_or_none(bitrates.get('hls'))
if not isinstance(m3u8_url, compat_str): if not m3u8_url:
continue continue
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', entry_protocol='m3u8_native', m3u8_url, video_id, 'mp4', entry_protocol='m3u8_native',
@ -72,13 +94,13 @@ class HiDiveIE(InfoExtractor):
if not isinstance(cc_file, list) or len(cc_file) < 3: if not isinstance(cc_file, list) or len(cc_file) < 3:
continue continue
cc_lang = cc_file[0] cc_lang = cc_file[0]
cc_url = cc_file[2] cc_url = url_or_none(cc_file[2])
if not isinstance(cc_lang, compat_str) or not isinstance( if not isinstance(cc_lang, compat_str) or not cc_url:
cc_url, compat_str):
continue continue
subtitles.setdefault(cc_lang, []).append({ subtitles.setdefault(cc_lang, []).append({
'url': cc_url, 'url': cc_url,
}) })
self._sort_formats(formats)
season_number = int_or_none(self._search_regex( season_number = int_or_none(self._search_regex(
r's(\d+)', key, 'season number', default=None)) r's(\d+)', key, 'season number', default=None))

View File

@ -66,7 +66,7 @@ class HRTiBaseIE(InfoExtractor):
self._logout_url = modules['user']['resources']['logout']['uri'] self._logout_url = modules['user']['resources']['logout']['uri']
def _login(self): def _login(self):
(username, password) = self._get_login_info() username, password = self._get_login_info()
# TODO: figure out authentication with cookies # TODO: figure out authentication with cookies
if username is None or password is None: if username is None or password is None:
self.raise_login_required() self.raise_login_required()

View File

@ -3,27 +3,27 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str
from ..utils import ( from ..utils import (
determine_ext, determine_ext,
mimetype2ext, mimetype2ext,
parse_duration,
qualities, qualities,
remove_end, url_or_none,
) )
class ImdbIE(InfoExtractor): class ImdbIE(InfoExtractor):
IE_NAME = 'imdb' IE_NAME = 'imdb'
IE_DESC = 'Internet Movie Database trailers' IE_DESC = 'Internet Movie Database trailers'
_VALID_URL = r'https?://(?:www|m)\.imdb\.com/(?:video|title).+?[/-]vi(?P<id>\d+)' _VALID_URL = r'https?://(?:www|m)\.imdb\.com/(?:video|title|list).+?[/-]vi(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.imdb.com/video/imdb/vi2524815897', 'url': 'http://www.imdb.com/video/imdb/vi2524815897',
'info_dict': { 'info_dict': {
'id': '2524815897', 'id': '2524815897',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Ice Age: Continental Drift Trailer (No. 2)', 'title': 'No. 2 from Ice Age: Continental Drift (2012)',
'description': 'md5:9061c2219254e5d14e03c25c98e96a81', 'description': 'md5:87bd0bdc61e351f21f20d2d7441cb4e7',
} }
}, { }, {
'url': 'http://www.imdb.com/video/_/vi2524815897', 'url': 'http://www.imdb.com/video/_/vi2524815897',
@ -40,82 +40,68 @@ class ImdbIE(InfoExtractor):
}, { }, {
'url': 'http://www.imdb.com/title/tt4218696/videoplayer/vi2608641561', 'url': 'http://www.imdb.com/title/tt4218696/videoplayer/vi2608641561',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.imdb.com/list/ls009921623/videoplayer/vi260482329',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage('http://www.imdb.com/video/imdb/vi%s' % video_id, video_id) webpage = self._download_webpage(
descr = self._html_search_regex( 'https://www.imdb.com/videoplayer/vi' + video_id, video_id)
r'(?s)<span itemprop="description">(.*?)</span>', video_metadata = self._parse_json(self._search_regex(
webpage, 'description', fatal=False) r'window\.IMDbReactInitialState\.push\(({.+?})\);', webpage,
player_url = 'http://www.imdb.com/video/imdb/vi%s/imdb/single' % video_id 'video metadata'), video_id)['videos']['videoMetadata']['vi' + video_id]
player_page = self._download_webpage( title = self._html_search_meta(
player_url, video_id, 'Downloading player page') ['og:title', 'twitter:title'], webpage) or self._html_search_regex(
# the player page contains the info for the default format, we have to r'<title>(.+?)</title>', webpage, 'title', fatal=False) or video_metadata['title']
# fetch other pages for the rest of the formats
extra_formats = re.findall(r'href="(?P<url>%s.*?)".*?>(?P<name>.*?)<' % re.escape(player_url), player_page)
format_pages = [
self._download_webpage(
f_url, video_id, 'Downloading info for %s format' % f_name)
for f_url, f_name in extra_formats]
format_pages.append(player_page)
quality = qualities(('SD', '480p', '720p', '1080p')) quality = qualities(('SD', '480p', '720p', '1080p'))
formats = [] formats = []
for format_page in format_pages: for encoding in video_metadata.get('encodings', []):
json_data = self._search_regex( if not encoding or not isinstance(encoding, dict):
r'<script[^>]+class="imdb-player-data"[^>]*?>(.*?)</script>',
format_page, 'json data', flags=re.DOTALL)
info = self._parse_json(json_data, video_id, fatal=False)
if not info:
continue continue
format_info = info.get('videoPlayerObject', {}).get('video', {}) video_url = url_or_none(encoding.get('videoUrl'))
if not format_info: if not video_url:
continue continue
video_info_list = format_info.get('videoInfoList') ext = mimetype2ext(encoding.get(
if not video_info_list or not isinstance(video_info_list, list): 'mimeType')) or determine_ext(video_url)
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
continue continue
for video_info in video_info_list: format_id = encoding.get('definition')
if not video_info or not isinstance(video_info, dict): formats.append({
continue 'format_id': format_id,
video_url = video_info.get('videoUrl') 'url': video_url,
if not video_url or not isinstance(video_url, compat_str): 'ext': ext,
continue 'quality': quality(format_id),
if (video_info.get('videoMimeType') == 'application/x-mpegURL' or })
determine_ext(video_url) == 'm3u8'):
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
continue
format_id = format_info.get('ffname')
formats.append({
'format_id': format_id,
'url': video_url,
'ext': mimetype2ext(video_info.get('videoMimeType')),
'quality': quality(format_id),
})
self._sort_formats(formats) self._sort_formats(formats)
return { return {
'id': video_id, 'id': video_id,
'title': remove_end(self._og_search_title(webpage), ' - IMDb'), 'title': title,
'formats': formats, 'formats': formats,
'description': descr, 'description': video_metadata.get('description'),
'thumbnail': format_info.get('slate'), 'thumbnail': video_metadata.get('slate', {}).get('url'),
'duration': parse_duration(video_metadata.get('duration')),
} }
class ImdbListIE(InfoExtractor): class ImdbListIE(InfoExtractor):
IE_NAME = 'imdb:list' IE_NAME = 'imdb:list'
IE_DESC = 'Internet Movie Database lists' IE_DESC = 'Internet Movie Database lists'
_VALID_URL = r'https?://(?:www\.)?imdb\.com/list/(?P<id>[\da-zA-Z_-]{11})' _VALID_URL = r'https?://(?:www\.)?imdb\.com/list/ls(?P<id>\d{9})(?!/videoplayer/vi\d+)'
_TEST = { _TEST = {
'url': 'http://www.imdb.com/list/JFs9NWw6XI0', 'url': 'https://www.imdb.com/list/ls009921623/',
'info_dict': { 'info_dict': {
'id': 'JFs9NWw6XI0', 'id': '009921623',
'title': 'March 23, 2012 Releases', 'title': 'The Bourne Legacy',
'description': 'A list of trailers, clips, and more from The Bourne Legacy, starring Jeremy Renner and Rachel Weisz.',
}, },
'playlist_count': 7, 'playlist_count': 8,
} }
def _real_extract(self, url): def _real_extract(self, url):
@ -123,9 +109,13 @@ class ImdbListIE(InfoExtractor):
webpage = self._download_webpage(url, list_id) webpage = self._download_webpage(url, list_id)
entries = [ entries = [
self.url_result('http://www.imdb.com' + m, 'Imdb') self.url_result('http://www.imdb.com' + m, 'Imdb')
for m in re.findall(r'href="(/video/imdb/vi[^"]+)"\s+data-type="playlist"', webpage)] for m in re.findall(r'href="(/list/ls%s/videoplayer/vi[^"]+)"' % list_id, webpage)]
list_title = self._html_search_regex( list_title = self._html_search_regex(
r'<h1 class="header">(.*?)</h1>', webpage, 'list title') r'<h1[^>]+class="[^"]*header[^"]*"[^>]*>(.*?)</h1>',
webpage, 'list title')
list_description = self._html_search_regex(
r'<div[^>]+class="[^"]*list-description[^"]*"[^>]*><p>(.*?)</p>',
webpage, 'list description')
return self.playlist_result(entries, list_id, list_title) return self.playlist_result(entries, list_id, list_title, list_description)

View File

@ -3,7 +3,6 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
js_to_json, js_to_json,
@ -13,7 +12,7 @@ from ..utils import (
class ImgurIE(InfoExtractor): class ImgurIE(InfoExtractor):
_VALID_URL = r'https?://(?:i\.)?imgur\.com/(?:(?:gallery|(?:topic|r)/[^/]+)/)?(?P<id>[a-zA-Z0-9]{6,})(?:[/?#&]+|\.[a-z]+)?$' _VALID_URL = r'https?://(?:i\.)?imgur\.com/(?:(?:gallery|(?:topic|r)/[^/]+)/)?(?P<id>[a-zA-Z0-9]{6,})(?:[/?#&]+|\.[a-z0-9]+)?$'
_TESTS = [{ _TESTS = [{
'url': 'https://i.imgur.com/A61SaA1.gifv', 'url': 'https://i.imgur.com/A61SaA1.gifv',
@ -21,7 +20,7 @@ class ImgurIE(InfoExtractor):
'id': 'A61SaA1', 'id': 'A61SaA1',
'ext': 'mp4', 'ext': 'mp4',
'title': 're:Imgur GIF$|MRW gifv is up and running without any bugs$', 'title': 're:Imgur GIF$|MRW gifv is up and running without any bugs$',
'description': 'Imgur: The most awesome images on the Internet.', 'description': 'Imgur: The magic of the Internet',
}, },
}, { }, {
'url': 'https://imgur.com/A61SaA1', 'url': 'https://imgur.com/A61SaA1',
@ -29,7 +28,7 @@ class ImgurIE(InfoExtractor):
'id': 'A61SaA1', 'id': 'A61SaA1',
'ext': 'mp4', 'ext': 'mp4',
'title': 're:Imgur GIF$|MRW gifv is up and running without any bugs$', 'title': 're:Imgur GIF$|MRW gifv is up and running without any bugs$',
'description': 'Imgur: The most awesome images on the Internet.', 'description': 'Imgur: The magic of the Internet',
}, },
}, { }, {
'url': 'https://imgur.com/gallery/YcAQlkx', 'url': 'https://imgur.com/gallery/YcAQlkx',
@ -37,8 +36,6 @@ class ImgurIE(InfoExtractor):
'id': 'YcAQlkx', 'id': 'YcAQlkx',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Classic Steve Carell gif...cracks me up everytime....damn the repost downvotes....', 'title': 'Classic Steve Carell gif...cracks me up everytime....damn the repost downvotes....',
'description': 'Imgur: The most awesome images on the Internet.'
} }
}, { }, {
'url': 'http://imgur.com/topic/Funny/N8rOudd', 'url': 'http://imgur.com/topic/Funny/N8rOudd',
@ -46,12 +43,15 @@ class ImgurIE(InfoExtractor):
}, { }, {
'url': 'http://imgur.com/r/aww/VQcQPhM', 'url': 'http://imgur.com/r/aww/VQcQPhM',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://i.imgur.com/crGpqCV.mp4',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage( gifv_url = 'https://i.imgur.com/{id}.gifv'.format(id=video_id)
compat_urlparse.urljoin(url, video_id), video_id) webpage = self._download_webpage(gifv_url, video_id)
width = int_or_none(self._og_search_property( width = int_or_none(self._og_search_property(
'video:width', webpage, default=None)) 'video:width', webpage, default=None))
@ -107,7 +107,7 @@ class ImgurIE(InfoExtractor):
return { return {
'id': video_id, 'id': video_id,
'formats': formats, 'formats': formats,
'description': self._og_search_description(webpage), 'description': self._og_search_description(webpage, default=None),
'title': self._og_search_title(webpage), 'title': self._og_search_title(webpage),
} }

View File

@ -21,6 +21,21 @@ class IncIE(InfoExtractor):
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
}, {
# div with id=kaltura_player_1_kqs38cgm
'url': 'https://www.inc.com/oscar-raymundo/richard-branson-young-entrepeneurs.html',
'info_dict': {
'id': '1_kqs38cgm',
'ext': 'mp4',
'title': 'Branson: "In the end, you have to say, Screw it. Just do it."',
'description': 'md5:21b832d034f9af5191ca5959da5e9cb6',
'timestamp': 1364403232,
'upload_date': '20130327',
'uploader_id': 'incdigital@inc.com',
},
'params': {
'skip_download': True,
},
}, { }, {
'url': 'http://www.inc.com/video/david-whitford/founders-forum-tripadvisor-steve-kaufer-most-enjoyable-moment-for-entrepreneur.html', 'url': 'http://www.inc.com/video/david-whitford/founders-forum-tripadvisor-steve-kaufer-most-enjoyable-moment-for-entrepreneur.html',
'only_matching': True, 'only_matching': True,
@ -31,10 +46,13 @@ class IncIE(InfoExtractor):
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
partner_id = self._search_regex( partner_id = self._search_regex(
r'var\s+_?bizo_data_partner_id\s*=\s*["\'](\d+)', webpage, 'partner id') r'var\s+_?bizo_data_partner_id\s*=\s*["\'](\d+)', webpage,
'partner id', default='1034971')
kaltura_id = self._parse_json(self._search_regex( kaltura_id = self._search_regex(
r'pageInfo\.videos\s*=\s*\[(.+)\];', webpage, 'kaltura id'), r'id=(["\'])kaltura_player_(?P<id>.+?)\1', webpage, 'kaltura id',
default=None, group='id') or self._parse_json(self._search_regex(
r'pageInfo\.videos\s*=\s*\[(.+)\];', webpage, 'kaltura id'),
display_id)['vid_kaltura_id'] display_id)['vid_kaltura_id']
return self.url_result( return self.url_result(

View File

@ -1,11 +1,15 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
parse_age_limit, parse_age_limit,
parse_iso8601, parse_iso8601,
update_url_query,
) )
@ -13,7 +17,7 @@ class IndavideoEmbedIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:embed\.)?indavideo\.hu/player/video/|assets\.indavideo\.hu/swf/player\.swf\?.*\b(?:v(?:ID|id))=)(?P<id>[\da-f]+)' _VALID_URL = r'https?://(?:(?:embed\.)?indavideo\.hu/player/video/|assets\.indavideo\.hu/swf/player\.swf\?.*\b(?:v(?:ID|id))=)(?P<id>[\da-f]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://indavideo.hu/player/video/1bdc3c6d80/', 'url': 'http://indavideo.hu/player/video/1bdc3c6d80/',
'md5': 'f79b009c66194acacd40712a6778acfa', 'md5': 'c8a507a1c7410685f83a06eaeeaafeab',
'info_dict': { 'info_dict': {
'id': '1837039', 'id': '1837039',
'ext': 'mp4', 'ext': 'mp4',
@ -36,6 +40,20 @@ class IndavideoEmbedIE(InfoExtractor):
'only_matching': True, 'only_matching': True,
}] }]
# Some example URLs covered by generic extractor:
# http://indavideo.hu/video/Vicces_cica_1
# http://index.indavideo.hu/video/2015_0728_beregszasz
# http://auto.indavideo.hu/video/Sajat_utanfutoban_a_kis_tacsko
# http://erotika.indavideo.hu/video/Amator_tini_punci
# http://film.indavideo.hu/video/f_hrom_nagymamm_volt
# http://palyazat.indavideo.hu/video/Embertelen_dal_Dodgem_egyuttes
@staticmethod
def _extract_urls(webpage):
return re.findall(
r'<iframe[^>]+\bsrc=["\'](?P<url>(?:https?:)?//embed\.indavideo\.hu/player/video/[\da-f]+)',
webpage)
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
@ -45,7 +63,14 @@ class IndavideoEmbedIE(InfoExtractor):
title = video['title'] title = video['title']
video_urls = video.get('video_files', []) video_urls = []
video_files = video.get('video_files')
if isinstance(video_files, list):
video_urls.extend(video_files)
elif isinstance(video_files, dict):
video_urls.extend(video_files.values())
video_file = video.get('video_file') video_file = video.get('video_file')
if video: if video:
video_urls.append(video_file) video_urls.append(video_file)
@ -58,11 +83,23 @@ class IndavideoEmbedIE(InfoExtractor):
if flv_url not in video_urls: if flv_url not in video_urls:
video_urls.append(flv_url) video_urls.append(flv_url)
formats = [{ filesh = video.get('filesh')
'url': video_url,
'height': int_or_none(self._search_regex( formats = []
r'\.(\d{3,4})\.mp4(?:\?|$)', video_url, 'height', default=None)), for video_url in video_urls:
} for video_url in video_urls] height = int_or_none(self._search_regex(
r'\.(\d{3,4})\.mp4(?:\?|$)', video_url, 'height', default=None))
if filesh:
if not height:
continue
token = filesh.get(compat_str(height))
if token is None:
continue
video_url = update_url_query(video_url, {'token': token})
formats.append({
'url': video_url,
'height': height,
})
self._sort_formats(formats) self._sort_formats(formats)
timestamp = video.get('date') timestamp = video.get('date')
@ -89,55 +126,3 @@ class IndavideoEmbedIE(InfoExtractor):
'tags': tags, 'tags': tags,
'formats': formats, 'formats': formats,
} }
class IndavideoIE(InfoExtractor):
_VALID_URL = r'https?://(?:.+?\.)?indavideo\.hu/video/(?P<id>[^/#?]+)'
_TESTS = [{
'url': 'http://indavideo.hu/video/Vicces_cica_1',
'md5': '8c82244ba85d2a2310275b318eb51eac',
'info_dict': {
'id': '1335611',
'display_id': 'Vicces_cica_1',
'ext': 'mp4',
'title': 'Vicces cica',
'description': 'Játszik a tablettel. :D',
'thumbnail': r're:^https?://.*\.jpg$',
'uploader': 'Jet_Pack',
'uploader_id': '491217',
'timestamp': 1390821212,
'upload_date': '20140127',
'duration': 7,
'age_limit': 0,
'tags': ['vicces', 'macska', 'cica', 'ügyes', 'nevetés', 'játszik', 'Cukiság', 'Jet_Pack'],
},
}, {
'url': 'http://index.indavideo.hu/video/2015_0728_beregszasz',
'only_matching': True,
}, {
'url': 'http://auto.indavideo.hu/video/Sajat_utanfutoban_a_kis_tacsko',
'only_matching': True,
}, {
'url': 'http://erotika.indavideo.hu/video/Amator_tini_punci',
'only_matching': True,
}, {
'url': 'http://film.indavideo.hu/video/f_hrom_nagymamm_volt',
'only_matching': True,
}, {
'url': 'http://palyazat.indavideo.hu/video/Embertelen_dal_Dodgem_egyuttes',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
embed_url = self._search_regex(
r'<link[^>]+rel="video_src"[^>]+href="(.+?)"', webpage, 'embed url')
return {
'_type': 'url_transparent',
'ie_key': 'IndavideoEmbed',
'url': embed_url,
'display_id': display_id,
}

View File

@ -17,6 +17,7 @@ from ..utils import (
lowercase_escape, lowercase_escape,
std_headers, std_headers,
try_get, try_get,
url_or_none,
) )
@ -170,7 +171,7 @@ class InstagramIE(InfoExtractor):
node = try_get(edge, lambda x: x['node'], dict) node = try_get(edge, lambda x: x['node'], dict)
if not node: if not node:
continue continue
node_video_url = try_get(node, lambda x: x['video_url'], compat_str) node_video_url = url_or_none(node.get('video_url'))
if not node_video_url: if not node_video_url:
continue continue
entries.append({ entries.append({

View File

@ -239,7 +239,7 @@ class IqiyiIE(InfoExtractor):
return ohdave_rsa_encrypt(data, e, N) return ohdave_rsa_encrypt(data, e, N)
def _login(self): def _login(self):
(username, password) = self._get_login_info() username, password = self._get_login_info()
# No authentication to be performed # No authentication to be performed
if not username: if not username:

View File

@ -13,15 +13,17 @@ from ..compat import (
compat_etree_register_namespace, compat_etree_register_namespace,
) )
from ..utils import ( from ..utils import (
determine_ext,
ExtractorError,
extract_attributes, extract_attributes,
int_or_none,
merge_dicts,
parse_duration,
smuggle_url,
url_or_none,
xpath_with_ns, xpath_with_ns,
xpath_element, xpath_element,
xpath_text, xpath_text,
int_or_none,
parse_duration,
smuggle_url,
ExtractorError,
determine_ext,
) )
@ -129,64 +131,65 @@ class ITVIE(InfoExtractor):
resp_env = self._download_xml( resp_env = self._download_xml(
params['data-playlist-url'], video_id, params['data-playlist-url'], video_id,
headers=headers, data=etree.tostring(req_env)) headers=headers, data=etree.tostring(req_env), fatal=False)
playlist = xpath_element(resp_env, './/Playlist') if resp_env:
if playlist is None: playlist = xpath_element(resp_env, './/Playlist')
fault_code = xpath_text(resp_env, './/faultcode') if playlist is None:
fault_string = xpath_text(resp_env, './/faultstring') fault_code = xpath_text(resp_env, './/faultcode')
if fault_code == 'InvalidGeoRegion': fault_string = xpath_text(resp_env, './/faultstring')
self.raise_geo_restricted( if fault_code == 'InvalidGeoRegion':
msg=fault_string, countries=self._GEO_COUNTRIES) self.raise_geo_restricted(
elif fault_code not in ( msg=fault_string, countries=self._GEO_COUNTRIES)
'InvalidEntity', 'InvalidVodcrid', 'ContentUnavailable'): elif fault_code not in (
raise ExtractorError( 'InvalidEntity', 'InvalidVodcrid', 'ContentUnavailable'):
'%s said: %s' % (self.IE_NAME, fault_string), expected=True) raise ExtractorError(
info.update({ '%s said: %s' % (self.IE_NAME, fault_string), expected=True)
'title': self._og_search_title(webpage), info.update({
'episode_title': params.get('data-video-episode'), 'title': self._og_search_title(webpage),
'series': params.get('data-video-title'), 'episode_title': params.get('data-video-episode'),
}) 'series': params.get('data-video-title'),
else: })
title = xpath_text(playlist, 'EpisodeTitle', default=None) else:
info.update({ title = xpath_text(playlist, 'EpisodeTitle', default=None)
'title': title, info.update({
'episode_title': title, 'title': title,
'episode_number': int_or_none(xpath_text(playlist, 'EpisodeNumber')), 'episode_title': title,
'series': xpath_text(playlist, 'ProgrammeTitle'), 'episode_number': int_or_none(xpath_text(playlist, 'EpisodeNumber')),
'duration': parse_duration(xpath_text(playlist, 'Duration')), 'series': xpath_text(playlist, 'ProgrammeTitle'),
}) 'duration': parse_duration(xpath_text(playlist, 'Duration')),
video_element = xpath_element(playlist, 'VideoEntries/Video', fatal=True) })
media_files = xpath_element(video_element, 'MediaFiles', fatal=True) video_element = xpath_element(playlist, 'VideoEntries/Video', fatal=True)
rtmp_url = media_files.attrib['base'] media_files = xpath_element(video_element, 'MediaFiles', fatal=True)
rtmp_url = media_files.attrib['base']
for media_file in media_files.findall('MediaFile'): for media_file in media_files.findall('MediaFile'):
play_path = xpath_text(media_file, 'URL') play_path = xpath_text(media_file, 'URL')
if not play_path: if not play_path:
continue continue
tbr = int_or_none(media_file.get('bitrate'), 1000) tbr = int_or_none(media_file.get('bitrate'), 1000)
f = { f = {
'format_id': 'rtmp' + ('-%d' % tbr if tbr else ''), 'format_id': 'rtmp' + ('-%d' % tbr if tbr else ''),
'play_path': play_path, 'play_path': play_path,
# Providing this swfVfy allows to avoid truncated downloads # Providing this swfVfy allows to avoid truncated downloads
'player_url': 'http://www.itv.com/mercury/Mercury_VideoPlayer.swf', 'player_url': 'http://www.itv.com/mercury/Mercury_VideoPlayer.swf',
'page_url': url, 'page_url': url,
'tbr': tbr, 'tbr': tbr,
'ext': 'flv', 'ext': 'flv',
} }
app = self._search_regex( app = self._search_regex(
'rtmpe?://[^/]+/(.+)$', rtmp_url, 'app', default=None) 'rtmpe?://[^/]+/(.+)$', rtmp_url, 'app', default=None)
if app: if app:
f.update({ f.update({
'url': rtmp_url.split('?', 1)[0], 'url': rtmp_url.split('?', 1)[0],
'app': app, 'app': app,
}) })
else: else:
f['url'] = rtmp_url f['url'] = rtmp_url
formats.append(f) formats.append(f)
for caption_url in video_element.findall('ClosedCaptioningURIs/URL'): for caption_url in video_element.findall('ClosedCaptioningURIs/URL'):
if caption_url.text: if caption_url.text:
extract_subtitle(caption_url.text) extract_subtitle(caption_url.text)
ios_playlist_url = params.get('data-video-playlist') or params.get('data-video-id') ios_playlist_url = params.get('data-video-playlist') or params.get('data-video-id')
hmac = params.get('data-video-hmac') hmac = params.get('data-video-hmac')
@ -248,8 +251,8 @@ class ITVIE(InfoExtractor):
for sub in subs: for sub in subs:
if not isinstance(sub, dict): if not isinstance(sub, dict):
continue continue
href = sub.get('Href') href = url_or_none(sub.get('Href'))
if isinstance(href, compat_str): if href:
extract_subtitle(href) extract_subtitle(href)
if not info.get('duration'): if not info.get('duration'):
info['duration'] = parse_duration(video_data.get('Duration')) info['duration'] = parse_duration(video_data.get('Duration'))
@ -261,7 +264,17 @@ class ITVIE(InfoExtractor):
'formats': formats, 'formats': formats,
'subtitles': subtitles, 'subtitles': subtitles,
}) })
return info
webpage_info = self._search_json_ld(webpage, video_id, default={})
if not webpage_info.get('title'):
webpage_info['title'] = self._html_search_regex(
r'(?s)<h\d+[^>]+\bclass=["\'][^>]*episode-title["\'][^>]*>([^<]+)<',
webpage, 'title', default=None) or self._og_search_title(
webpage, default=None) or self._html_search_meta(
'twitter:title', webpage, 'title',
default=None) or webpage_info['episode']
return merge_dicts(info, webpage_info)
class ITVBTCCIE(InfoExtractor): class ITVBTCCIE(InfoExtractor):

View File

@ -7,6 +7,7 @@ from ..utils import (
int_or_none, int_or_none,
mimetype2ext, mimetype2ext,
remove_end, remove_end,
url_or_none,
) )
@ -73,11 +74,14 @@ class IwaraIE(InfoExtractor):
formats = [] formats = []
for a_format in video_data: for a_format in video_data:
format_uri = url_or_none(a_format.get('uri'))
if not format_uri:
continue
format_id = a_format.get('resolution') format_id = a_format.get('resolution')
height = int_or_none(self._search_regex( height = int_or_none(self._search_regex(
r'(\d+)p', format_id, 'height', default=None)) r'(\d+)p', format_id, 'height', default=None))
formats.append({ formats.append({
'url': a_format['uri'], 'url': self._proto_relative_url(format_uri, 'https:'),
'format_id': format_id, 'format_id': format_id,
'ext': mimetype2ext(a_format.get('mime')) or 'mp4', 'ext': mimetype2ext(a_format.get('mime')) or 'mp4',
'height': height, 'height': height,

View File

@ -1,10 +1,11 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urllib_parse_unquote from ..compat import (
compat_str,
compat_urllib_parse_unquote,
)
from ..utils import ( from ..utils import (
determine_ext, determine_ext,
float_or_none, float_or_none,
@ -57,12 +58,33 @@ class IzleseneIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
url = 'http://www.izlesene.com/video/%s' % video_id webpage = self._download_webpage('http://www.izlesene.com/video/%s' % video_id, video_id)
webpage = self._download_webpage(url, video_id)
video = self._parse_json(
self._search_regex(
r'videoObj\s*=\s*({.+?})\s*;\s*\n', webpage, 'streams'),
video_id)
title = video.get('videoTitle') or self._og_search_title(webpage)
formats = []
for stream in video['media']['level']:
source_url = stream.get('source')
if not source_url or not isinstance(source_url, compat_str):
continue
ext = determine_ext(url, 'mp4')
quality = stream.get('value')
height = int_or_none(quality)
formats.append({
'format_id': '%sp' % quality if quality else 'sd',
'url': compat_urllib_parse_unquote(source_url),
'ext': ext,
'height': height,
})
self._sort_formats(formats)
title = self._og_search_title(webpage)
description = self._og_search_description(webpage, default=None) description = self._og_search_description(webpage, default=None)
thumbnail = self._proto_relative_url( thumbnail = video.get('posterURL') or self._proto_relative_url(
self._og_search_thumbnail(webpage), scheme='http:') self._og_search_thumbnail(webpage), scheme='http:')
uploader = self._html_search_regex( uploader = self._html_search_regex(
@ -71,41 +93,15 @@ class IzleseneIE(InfoExtractor):
timestamp = parse_iso8601(self._html_search_meta( timestamp = parse_iso8601(self._html_search_meta(
'uploadDate', webpage, 'upload date')) 'uploadDate', webpage, 'upload date'))
duration = float_or_none(self._html_search_regex( duration = float_or_none(video.get('duration') or self._html_search_regex(
r'"videoduration"\s*:\s*"([^"]+)"', r'videoduration["\']?\s*=\s*(["\'])(?P<value>(?:(?!\1).)+)\1',
webpage, 'duration', fatal=False), scale=1000) webpage, 'duration', fatal=False, group='value'), scale=1000)
view_count = str_to_int(get_element_by_id('videoViewCount', webpage)) view_count = str_to_int(get_element_by_id('videoViewCount', webpage))
comment_count = self._html_search_regex( comment_count = self._html_search_regex(
r'comment_count\s*=\s*\'([^\']+)\';', r'comment_count\s*=\s*\'([^\']+)\';',
webpage, 'comment_count', fatal=False) webpage, 'comment_count', fatal=False)
content_url = self._html_search_meta(
'contentURL', webpage, 'content URL', fatal=False)
ext = determine_ext(content_url, 'mp4')
# Might be empty for some videos.
streams = self._html_search_regex(
r'"qualitylevel"\s*:\s*"([^"]+)"', webpage, 'streams', default='')
formats = []
if streams:
for stream in streams.split('|'):
quality, url = re.search(r'\[(\w+)\](.+)', stream).groups()
formats.append({
'format_id': '%sp' % quality if quality else 'sd',
'url': compat_urllib_parse_unquote(url),
'ext': ext,
})
else:
stream_url = self._search_regex(
r'"streamurl"\s*:\s*"([^"]+)"', webpage, 'stream URL')
formats.append({
'format_id': 'sd',
'url': compat_urllib_parse_unquote(stream_url),
'ext': ext,
})
return { return {
'id': video_id, 'id': video_id,
'title': title, 'title': title,

View File

@ -18,7 +18,7 @@ class JojIE(InfoExtractor):
joj:| joj:|
https?://media\.joj\.sk/embed/ https?://media\.joj\.sk/embed/
) )
(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12}) (?P<id>[^/?#^]+)
''' '''
_TESTS = [{ _TESTS = [{
'url': 'https://media.joj.sk/embed/a388ec4c-6019-4a4a-9312-b1bee194e932', 'url': 'https://media.joj.sk/embed/a388ec4c-6019-4a4a-9312-b1bee194e932',
@ -29,16 +29,24 @@ class JojIE(InfoExtractor):
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
'duration': 3118, 'duration': 3118,
} }
}, {
'url': 'https://media.joj.sk/embed/9i1cxv',
'only_matching': True,
}, { }, {
'url': 'joj:a388ec4c-6019-4a4a-9312-b1bee194e932', 'url': 'joj:a388ec4c-6019-4a4a-9312-b1bee194e932',
'only_matching': True, 'only_matching': True,
}, {
'url': 'joj:9i1cxv',
'only_matching': True,
}] }]
@staticmethod @staticmethod
def _extract_urls(webpage): def _extract_urls(webpage):
return re.findall( return [
r'<iframe\b[^>]+\bsrc=["\'](?P<url>(?:https?:)?//media\.joj\.sk/embed/[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})', mobj.group('url')
webpage) for mobj in re.finditer(
r'<iframe\b[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//media\.joj\.sk/embed/(?:(?!\1).)+)\1',
webpage)]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)

View File

@ -4,16 +4,14 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..aes import aes_decrypt_text from ..aes import aes_decrypt_text
from ..compat import ( from ..compat import compat_urllib_parse_unquote
compat_str,
compat_urllib_parse_unquote,
)
from ..utils import ( from ..utils import (
determine_ext, determine_ext,
ExtractorError, ExtractorError,
int_or_none, int_or_none,
str_to_int, str_to_int,
strip_or_none, strip_or_none,
url_or_none,
) )
@ -55,7 +53,8 @@ class KeezMoviesIE(InfoExtractor):
encrypted = False encrypted = False
def extract_format(format_url, height=None): def extract_format(format_url, height=None):
if not isinstance(format_url, compat_str) or not format_url.startswith(('http', '//')): format_url = url_or_none(format_url)
if not format_url or not format_url.startswith(('http', '//')):
return return
if format_url in format_urls: if format_url in format_urls:
return return

View File

@ -2,11 +2,11 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str
from ..utils import ( from ..utils import (
determine_ext, determine_ext,
float_or_none, float_or_none,
int_or_none, int_or_none,
url_or_none,
) )
@ -109,7 +109,8 @@ class KonserthusetPlayIE(InfoExtractor):
captions = source.get('captionsAvailableLanguages') captions = source.get('captionsAvailableLanguages')
if isinstance(captions, dict): if isinstance(captions, dict):
for lang, subtitle_url in captions.items(): for lang, subtitle_url in captions.items():
if lang != 'none' and isinstance(subtitle_url, compat_str): subtitle_url = url_or_none(subtitle_url)
if lang != 'none' and subtitle_url:
subtitles.setdefault(lang, []).append({'url': subtitle_url}) subtitles.setdefault(lang, []).append({'url': subtitle_url})
return { return {

View File

@ -20,5 +20,7 @@ class LCIIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
wat_id = self._search_regex(r'data-watid=[\'"](\d+)', webpage, 'wat id') wat_id = self._search_regex(
(r'data-watid=[\'"](\d+)', r'idwat["\']?\s*:\s*["\']?(\d+)'),
webpage, 'wat id')
return self.url_result('wat:' + wat_id, 'Wat', wat_id) return self.url_result('wat:' + wat_id, 'Wat', wat_id)

View File

@ -130,7 +130,7 @@ class LeIE(InfoExtractor):
media_id, 'Downloading flash playJson data', query={ media_id, 'Downloading flash playJson data', query={
'id': media_id, 'id': media_id,
'platid': 1, 'platid': 1,
'splatid': 101, 'splatid': 105,
'format': 1, 'format': 1,
'source': 1000, 'source': 1000,
'tkey': self.calc_time_key(int(time.time())), 'tkey': self.calc_time_key(int(time.time())),

View File

@ -4,7 +4,6 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_HTTPError,
compat_str, compat_str,
compat_urlparse, compat_urlparse,
) )
@ -44,21 +43,15 @@ class LyndaBaseIE(InfoExtractor):
form_data = self._hidden_inputs(form_html) form_data = self._hidden_inputs(form_html)
form_data.update(extra_form_data) form_data.update(extra_form_data)
try: response = self._download_json(
response = self._download_json( action_url, None, note,
action_url, None, note, data=urlencode_postdata(form_data),
data=urlencode_postdata(form_data), headers={
headers={ 'Referer': referrer_url,
'Referer': referrer_url, 'X-Requested-With': 'XMLHttpRequest',
'X-Requested-With': 'XMLHttpRequest', }, expected_status=(418, 500, ))
})
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 500:
response = self._parse_json(e.cause.read().decode('utf-8'), None)
self._check_error(response, ('email', 'password'))
raise
self._check_error(response, 'ErrorMessage') self._check_error(response, ('email', 'password', 'ErrorMessage'))
return response, action_url return response, action_url

View File

@ -0,0 +1,125 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
orderedSet,
parse_duration,
try_get,
)
class MarkizaIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?videoarchiv\.markiza\.sk/(?:video/(?:[^/]+/)*|embed/)(?P<id>\d+)(?:[_/]|$)'
_TESTS = [{
'url': 'http://videoarchiv.markiza.sk/video/oteckovia/84723_oteckovia-109',
'md5': 'ada4e9fad038abeed971843aa028c7b0',
'info_dict': {
'id': '139078',
'ext': 'mp4',
'title': 'Oteckovia 109',
'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 2760,
},
}, {
'url': 'http://videoarchiv.markiza.sk/video/televizne-noviny/televizne-noviny/85430_televizne-noviny',
'info_dict': {
'id': '85430',
'title': 'Televízne noviny',
},
'playlist_count': 23,
}, {
'url': 'http://videoarchiv.markiza.sk/video/oteckovia/84723',
'only_matching': True,
}, {
'url': 'http://videoarchiv.markiza.sk/video/84723',
'only_matching': True,
}, {
'url': 'http://videoarchiv.markiza.sk/video/filmy/85190_kamenak',
'only_matching': True,
}, {
'url': 'http://videoarchiv.markiza.sk/video/reflex/zo-zakulisia/84651_pribeh-alzbetky',
'only_matching': True,
}, {
'url': 'http://videoarchiv.markiza.sk/embed/85295',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
data = self._download_json(
'http://videoarchiv.markiza.sk/json/video_jwplayer7.json',
video_id, query={'id': video_id})
info = self._parse_jwplayer_data(data, m3u8_id='hls', mpd_id='dash')
if info.get('_type') == 'playlist':
info.update({
'id': video_id,
'title': try_get(
data, lambda x: x['details']['name'], compat_str),
})
else:
info['duration'] = parse_duration(
try_get(data, lambda x: x['details']['duration'], compat_str))
return info
class MarkizaPageIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?:(?:[^/]+\.)?markiza|tvnoviny)\.sk/(?:[^/]+/)*(?P<id>\d+)_'
_TESTS = [{
'url': 'http://www.markiza.sk/soubiz/zahranicny/1923705_oteckovia-maju-svoj-den-ti-slavni-nie-su-o-nic-menej-rozkosni',
'md5': 'ada4e9fad038abeed971843aa028c7b0',
'info_dict': {
'id': '139355',
'ext': 'mp4',
'title': 'Oteckovia 110',
'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 2604,
},
'params': {
'skip_download': True,
},
}, {
'url': 'http://dajto.markiza.sk/filmy-a-serialy/1774695_frajeri-vo-vegas',
'only_matching': True,
}, {
'url': 'http://superstar.markiza.sk/aktualne/1923870_to-je-ale-telo-spevacka-ukazala-sexy-postavicku-v-bikinach',
'only_matching': True,
}, {
'url': 'http://hybsa.markiza.sk/aktualne/1923790_uzasna-atmosfera-na-hybsa-v-poprade-superstaristi-si-prve-koncerty-pred-davom-ludi-poriadne-uzili',
'only_matching': True,
}, {
'url': 'http://doma.markiza.sk/filmy/1885250_moja-vysnivana-svadba',
'only_matching': True,
}, {
'url': 'http://www.tvnoviny.sk/domace/1923887_po-smrti-manzela-ju-cakalo-poriadne-prekvapenie',
'only_matching': True,
}]
@classmethod
def suitable(cls, url):
return False if MarkizaIE.suitable(url) else super(MarkizaPageIE, cls).suitable(url)
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(
# Downloading for some hosts (e.g. dajto, doma) fails with 500
# although everything seems to be OK, so considering 500
# status code to be expected.
url, playlist_id, expected_status=500)
entries = [
self.url_result('http://videoarchiv.markiza.sk/video/%s' % video_id)
for video_id in orderedSet(re.findall(
r'(?:initPlayer_|data-entity=["\']|id=["\']player_)(\d+)',
webpage))]
return self.playlist_result(entries, playlist_id)

View File

@ -3,59 +3,75 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .theplatform import ThePlatformBaseIE
from ..compat import compat_str
from ..utils import ( from ..utils import (
determine_ext, ExtractorError,
parse_duration, int_or_none,
try_get, update_url_query,
unified_strdate,
) )
class MediasetIE(InfoExtractor): class MediasetIE(ThePlatformBaseIE):
_TP_TLD = 'eu'
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
(?: (?:
mediaset:| mediaset:|
https?:// https?://
(?:www\.)?video\.mediaset\.it/ (?:(?:www|static3)\.)?mediasetplay\.mediaset\.it/
(?: (?:
(?:video|on-demand)/(?:[^/]+/)+[^/]+_| (?:video|on-demand)/(?:[^/]+/)+[^/]+_|
player/playerIFrame(?:Twitter)?\.shtml\?.*?\bid= player/index\.html\?.*?\bprogramGuid=
) )
)(?P<id>[0-9]+) )(?P<id>[0-9A-Z]{16})
''' '''
_TESTS = [{ _TESTS = [{
# full episode # full episode
'url': 'http://www.video.mediaset.it/video/hello_goodbye/full/quarta-puntata_661824.html', 'url': 'https://www.mediasetplay.mediaset.it/video/hellogoodbye/quarta-puntata_FAFU000000661824',
'md5': '9b75534d42c44ecef7bf1ffeacb7f85d', 'md5': '9b75534d42c44ecef7bf1ffeacb7f85d',
'info_dict': { 'info_dict': {
'id': '661824', 'id': 'FAFU000000661824',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Quarta puntata', 'title': 'Quarta puntata',
'description': 'md5:7183696d6df570e3412a5ef74b27c5e2', 'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
'duration': 1414, 'duration': 1414.26,
'creator': 'mediaset',
'upload_date': '20161107', 'upload_date': '20161107',
'series': 'Hello Goodbye', 'series': 'Hello Goodbye',
'categories': ['reality'], 'timestamp': 1478532900,
'uploader': 'Rete 4',
'uploader_id': 'R4',
}, },
'expected_warnings': ['is not a supported codec'], }, {
'url': 'https://www.mediasetplay.mediaset.it/video/matrix/puntata-del-25-maggio_F309013801000501',
'md5': '288532f0ad18307705b01e581304cd7b',
'info_dict': {
'id': 'F309013801000501',
'ext': 'mp4',
'title': 'Puntata del 25 maggio',
'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 6565.007,
'upload_date': '20180526',
'series': 'Matrix',
'timestamp': 1527326245,
'uploader': 'Canale 5',
'uploader_id': 'C5',
},
'expected_warnings': ['HTTP Error 403: Forbidden'],
}, { }, {
# clip # clip
'url': 'http://www.video.mediaset.it/video/gogglebox/clip/un-grande-classico-della-commedia-sexy_661680.html', 'url': 'https://www.mediasetplay.mediaset.it/video/gogglebox/un-grande-classico-della-commedia-sexy_FAFU000000661680',
'only_matching': True, 'only_matching': True,
}, { }, {
# iframe simple # iframe simple
'url': 'http://www.video.mediaset.it/player/playerIFrame.shtml?id=665924&autoplay=true', 'url': 'https://static3.mediasetplay.mediaset.it/player/index.html?appKey=5ad3966b1de1c4000d5cec48&programGuid=FAFU000000665924&id=665924',
'only_matching': True, 'only_matching': True,
}, { }, {
# iframe twitter (from http://www.wittytv.it/se-prima-mi-fidavo-zero/) # iframe twitter (from http://www.wittytv.it/se-prima-mi-fidavo-zero/)
'url': 'https://www.video.mediaset.it/player/playerIFrameTwitter.shtml?id=665104&playrelated=false&autoplay=false&related=true&hidesocial=true', 'url': 'https://static3.mediasetplay.mediaset.it/player/index.html?appKey=5ad3966b1de1c4000d5cec48&programGuid=FAFU000000665104&id=665104',
'only_matching': True, 'only_matching': True,
}, { }, {
'url': 'mediaset:661824', 'url': 'mediaset:FAFU000000665924',
'only_matching': True, 'only_matching': True,
}] }]
@ -68,51 +84,54 @@ class MediasetIE(InfoExtractor):
webpage)] webpage)]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) guid = self._match_id(url)
tp_path = 'PR1GhC/media/guid/2702976343/' + guid
video_list = self._download_json( info = self._extract_theplatform_metadata(tp_path, guid)
'http://cdnsel01.mediaset.net/GetCdn.aspx',
video_id, 'Downloading video CDN JSON', query={
'streamid': video_id,
'format': 'json',
})['videoList']
formats = [] formats = []
for format_url in video_list: subtitles = {}
if '.ism' in format_url: first_e = None
formats.extend(self._extract_ism_formats( for asset_type in ('SD', 'HD'):
format_url, video_id, ism_id='mss', fatal=False)) for f in ('MPEG4', 'MPEG-DASH', 'M3U', 'ISM'):
else: try:
formats.append({ tp_formats, tp_subtitles = self._extract_theplatform_smil(
'url': format_url, update_url_query('http://link.theplatform.%s/s/%s' % (self._TP_TLD, tp_path), {
'format_id': determine_ext(format_url), 'mbr': 'true',
}) 'formats': f,
'assetTypes': asset_type,
}), guid, 'Downloading %s %s SMIL data' % (f, asset_type))
except ExtractorError as e:
if not first_e:
first_e = e
break
for tp_f in tp_formats:
tp_f['quality'] = 1 if asset_type == 'HD' else 0
formats.extend(tp_formats)
subtitles = self._merge_subtitles(subtitles, tp_subtitles)
if first_e and not formats:
raise first_e
self._sort_formats(formats) self._sort_formats(formats)
mediainfo = self._download_json( fields = []
'http://plr.video.mediaset.it/html/metainfo.sjson', for templ, repls in (('tvSeason%sNumber', ('', 'Episode')), ('mediasetprogram$%s', ('brandTitle', 'numberOfViews', 'publishInfo'))):
video_id, 'Downloading video info JSON', query={ fields.extend(templ % repl for repl in repls)
'id': video_id, feed_data = self._download_json(
})['video'] 'https://feed.entertainment.tv.theplatform.eu/f/PR1GhC/mediaset-prod-all-programs/guid/-/' + guid,
guid, fatal=False, query={'fields': ','.join(fields)})
if feed_data:
publish_info = feed_data.get('mediasetprogram$publishInfo') or {}
info.update({
'episode_number': int_or_none(feed_data.get('tvSeasonEpisodeNumber')),
'season_number': int_or_none(feed_data.get('tvSeasonNumber')),
'series': feed_data.get('mediasetprogram$brandTitle'),
'uploader': publish_info.get('description'),
'uploader_id': publish_info.get('channel'),
'view_count': int_or_none(feed_data.get('mediasetprogram$numberOfViews')),
})
title = mediainfo['title'] info.update({
'id': guid,
creator = try_get(
mediainfo, lambda x: x['brand-info']['publisher'], compat_str)
category = try_get(
mediainfo, lambda x: x['brand-info']['category'], compat_str)
categories = [category] if category else None
return {
'id': video_id,
'title': title,
'description': mediainfo.get('short-description'),
'thumbnail': mediainfo.get('thumbnail'),
'duration': parse_duration(mediainfo.get('duration')),
'creator': creator,
'upload_date': unified_strdate(mediainfo.get('production-date')),
'webpage_url': mediainfo.get('url'),
'series': mediainfo.get('brand-value'),
'categories': categories,
'formats': formats, 'formats': formats,
} 'subtitles': subtitles,
})
return info

View File

@ -15,6 +15,7 @@ from ..utils import (
mimetype2ext, mimetype2ext,
unescapeHTML, unescapeHTML,
unsmuggle_url, unsmuggle_url,
url_or_none,
urljoin, urljoin,
) )
@ -156,8 +157,8 @@ class MediasiteIE(InfoExtractor):
stream_formats = [] stream_formats = []
for unum, VideoUrl in enumerate(video_urls): for unum, VideoUrl in enumerate(video_urls):
video_url = VideoUrl.get('Location') video_url = url_or_none(VideoUrl.get('Location'))
if not video_url or not isinstance(video_url, compat_str): if not video_url:
continue continue
# XXX: if Stream.get('CanChangeScheme', False), switch scheme to HTTP/HTTPS # XXX: if Stream.get('CanChangeScheme', False), switch scheme to HTTP/HTTPS

View File

@ -4,7 +4,10 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import int_or_none from ..utils import (
int_or_none,
parse_codecs,
)
class MinotoIE(InfoExtractor): class MinotoIE(InfoExtractor):
@ -26,7 +29,7 @@ class MinotoIE(InfoExtractor):
formats.extend(fmt_url, video_id, 'mp4', m3u8_id='hls', fatal=False) formats.extend(fmt_url, video_id, 'mp4', m3u8_id='hls', fatal=False)
else: else:
fmt_profile = fmt.get('profile') or {} fmt_profile = fmt.get('profile') or {}
f = { formats.append({
'format_id': fmt_profile.get('name-short'), 'format_id': fmt_profile.get('name-short'),
'format_note': fmt_profile.get('name'), 'format_note': fmt_profile.get('name'),
'url': fmt_url, 'url': fmt_url,
@ -35,16 +38,8 @@ class MinotoIE(InfoExtractor):
'filesize': int_or_none(fmt.get('filesize')), 'filesize': int_or_none(fmt.get('filesize')),
'width': int_or_none(fmt.get('width')), 'width': int_or_none(fmt.get('width')),
'height': int_or_none(fmt.get('height')), 'height': int_or_none(fmt.get('height')),
} 'codecs': parse_codecs(fmt.get('codecs')),
codecs = fmt.get('codecs') })
if codecs:
codecs = codecs.split(',')
if len(codecs) == 2:
f.update({
'vcodec': codecs[0],
'acodec': codecs[1],
})
formats.append(f)
self._sort_formats(formats) self._sort_formats(formats)
return { return {

View File

@ -1,84 +1,14 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import json
import uuid
from .common import InfoExtractor from .common import InfoExtractor
from .ooyala import OoyalaIE
from ..compat import (
compat_str,
compat_urlparse,
)
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
extract_attributes,
determine_ext,
smuggle_url, smuggle_url,
parse_duration, parse_duration,
) )
class MiTeleBaseIE(InfoExtractor):
def _get_player_info(self, url, webpage):
player_data = extract_attributes(self._search_regex(
r'(?s)(<ms-video-player.+?</ms-video-player>)',
webpage, 'ms video player'))
video_id = player_data['data-media-id']
if player_data.get('data-cms-id') == 'ooyala':
return self.url_result(
'ooyala:%s' % video_id, ie=OoyalaIE.ie_key(), video_id=video_id)
config_url = compat_urlparse.urljoin(url, player_data['data-config'])
config = self._download_json(
config_url, video_id, 'Downloading config JSON')
mmc_url = config['services']['mmc']
duration = None
formats = []
for m_url in (mmc_url, mmc_url.replace('/flash.json', '/html5.json')):
mmc = self._download_json(
m_url, video_id, 'Downloading mmc JSON')
if not duration:
duration = int_or_none(mmc.get('duration'))
for location in mmc['locations']:
gat = self._proto_relative_url(location.get('gat'), 'http:')
gcp = location.get('gcp')
ogn = location.get('ogn')
if None in (gat, gcp, ogn):
continue
token_data = {
'gcp': gcp,
'ogn': ogn,
'sta': 0,
}
media = self._download_json(
gat, video_id, data=json.dumps(token_data).encode('utf-8'),
headers={
'Content-Type': 'application/json;charset=utf-8',
'Referer': url,
})
stream = media.get('stream') or media.get('file')
if not stream:
continue
ext = determine_ext(stream)
if ext == 'f4m':
formats.extend(self._extract_f4m_formats(
stream + '&hdcore=3.2.0&plugin=aasp-3.2.0.77.18',
video_id, f4m_id='hds', fatal=False))
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
stream, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
self._sort_formats(formats)
return {
'id': video_id,
'formats': formats,
'thumbnail': player_data.get('data-poster') or config.get('poster', {}).get('imageUrl'),
'duration': duration,
}
class MiTeleIE(InfoExtractor): class MiTeleIE(InfoExtractor):
IE_DESC = 'mitele.es' IE_DESC = 'mitele.es'
_VALID_URL = r'https?://(?:www\.)?mitele\.es/(?:[^/]+/)+(?P<id>[^/]+)/player' _VALID_URL = r'https?://(?:www\.)?mitele\.es/(?:[^/]+/)+(?P<id>[^/]+)/player'
@ -86,7 +16,7 @@ class MiTeleIE(InfoExtractor):
_TESTS = [{ _TESTS = [{
'url': 'http://www.mitele.es/programas-tv/diario-de/57b0dfb9c715da65618b4afa/player', 'url': 'http://www.mitele.es/programas-tv/diario-de/57b0dfb9c715da65618b4afa/player',
'info_dict': { 'info_dict': {
'id': '57b0dfb9c715da65618b4afa', 'id': 'FhYW1iNTE6J6H7NkQRIEzfne6t2quqPg',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Tor, la web invisible', 'title': 'Tor, la web invisible',
'description': 'md5:3b6fce7eaa41b2d97358726378d9369f', 'description': 'md5:3b6fce7eaa41b2d97358726378d9369f',
@ -104,7 +34,7 @@ class MiTeleIE(InfoExtractor):
# no explicit title # no explicit title
'url': 'http://www.mitele.es/programas-tv/cuarto-milenio/57b0de3dc915da14058b4876/player', 'url': 'http://www.mitele.es/programas-tv/cuarto-milenio/57b0de3dc915da14058b4876/player',
'info_dict': { 'info_dict': {
'id': '57b0de3dc915da14058b4876', 'id': 'oyNG1iNTE6TAPP-JmCjbwfwJqqMMX3Vq',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Cuarto Milenio Temporada 6 Programa 226', 'title': 'Cuarto Milenio Temporada 6 Programa 226',
'description': 'md5:5ff132013f0cd968ffbf1f5f3538a65f', 'description': 'md5:5ff132013f0cd968ffbf1f5f3538a65f',
@ -128,40 +58,21 @@ class MiTeleIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
gigya_url = self._search_regex(
r'<gigya-api>[^>]*</gigya-api>[^>]*<script\s+src="([^"]*)">[^>]*</script>',
webpage, 'gigya', default=None)
gigya_sc = self._download_webpage(
compat_urlparse.urljoin('http://www.mitele.es/', gigya_url),
video_id, 'Downloading gigya script')
# Get a appKey/uuid for getting the session key
appKey = self._search_regex(
r'constant\s*\(\s*["\']_appGridApplicationKey["\']\s*,\s*["\']([0-9a-f]+)',
gigya_sc, 'appKey')
session_json = self._download_json(
'https://appgrid-api.cloud.accedo.tv/session',
video_id, 'Downloading session keys', query={
'appKey': appKey,
'uuid': compat_str(uuid.uuid4()),
})
paths = self._download_json( paths = self._download_json(
'https://appgrid-api.cloud.accedo.tv/metadata/general_configuration,%20web_configuration', 'https://www.mitele.es/amd/agp/web/metadata/general_configuration',
video_id, 'Downloading paths JSON', video_id, 'Downloading paths JSON')
query={'sessionKey': compat_str(session_json['sessionKey'])})
ooyala_s = paths['general_configuration']['api_configuration']['ooyala_search'] ooyala_s = paths['general_configuration']['api_configuration']['ooyala_search']
base_url = ooyala_s.get('base_url', 'cdn-search-mediaset.carbyne.ps.ooyala.com')
full_path = ooyala_s.get('full_path', '/search/v1/full/providers/')
source = self._download_json( source = self._download_json(
'http://%s%s%s/docs/%s' % ( '%s://%s%s%s/docs/%s' % (
ooyala_s['base_url'], ooyala_s['full_path'], ooyala_s.get('protocol', 'https'), base_url, full_path,
ooyala_s['provider_id'], video_id), ooyala_s.get('provider_id', '104951'), video_id),
video_id, 'Downloading data JSON', query={ video_id, 'Downloading data JSON', query={
'include_titles': 'Series,Season', 'include_titles': 'Series,Season',
'product_name': 'test', 'product_name': ooyala_s.get('product_name', 'test'),
'format': 'full', 'format': 'full',
})['hits']['hits'][0]['_source'] })['hits']['hits'][0]['_source']

View File

@ -1,96 +1,90 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re from .nhl import NHLBaseIE
from .common import InfoExtractor
from ..utils import (
parse_duration,
parse_iso8601,
)
class MLBIE(InfoExtractor): class MLBIE(NHLBaseIE):
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?:// https?://
(?:[\da-z_-]+\.)*mlb\.com/ (?:[\da-z_-]+\.)*(?P<site>mlb)\.com/
(?: (?:
(?: (?:
(?:.*?/)?video/(?:topic/[\da-z_-]+/)?(?:v|.*?/c-)| (?:[^/]+/)*c-|
(?: (?:
shared/video/embed/(?:embed|m-internal-embed)\.html| shared/video/embed/(?:embed|m-internal-embed)\.html|
(?:[^/]+/)+(?:play|index)\.jsp| (?:[^/]+/)+(?:play|index)\.jsp|
)\?.*?\bcontent_id= )\?.*?\bcontent_id=
) )
(?P<id>n?\d+)| (?P<id>\d+)
(?:[^/]+/)*(?P<path>[^/]+)
) )
''' '''
_CONTENT_DOMAIN = 'content.mlb.com'
_TESTS = [ _TESTS = [
{ {
'url': 'http://m.mlb.com/sea/video/topic/51231442/v34698933/nymsea-ackley-robs-a-home-run-with-an-amazing-catch/?c_id=sea', 'url': 'https://www.mlb.com/mariners/video/ackleys-spectacular-catch/c-34698933',
'md5': 'ff56a598c2cf411a9a38a69709e97079', 'md5': '632358dacfceec06bad823b83d21df2d',
'info_dict': { 'info_dict': {
'id': '34698933', 'id': '34698933',
'ext': 'mp4', 'ext': 'mp4',
'title': "Ackley's spectacular catch", 'title': "Ackley's spectacular catch",
'description': 'md5:7f5a981eb4f3cbc8daf2aeffa2215bf0', 'description': 'md5:7f5a981eb4f3cbc8daf2aeffa2215bf0',
'duration': 66, 'duration': 66,
'timestamp': 1405980600, 'timestamp': 1405995000,
'upload_date': '20140721', 'upload_date': '20140722',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
}, },
}, },
{ {
'url': 'http://m.mlb.com/video/topic/81536970/v34496663/mianym-stanton-practices-for-the-home-run-derby', 'url': 'https://www.mlb.com/video/stanton-prepares-for-derby/c-34496663',
'md5': 'd9c022c10d21f849f49c05ae12a8a7e9', 'md5': 'bf2619bf9cacc0a564fc35e6aeb9219f',
'info_dict': { 'info_dict': {
'id': '34496663', 'id': '34496663',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Stanton prepares for Derby', 'title': 'Stanton prepares for Derby',
'description': 'md5:d00ce1e5fd9c9069e9c13ab4faedfa57', 'description': 'md5:d00ce1e5fd9c9069e9c13ab4faedfa57',
'duration': 46, 'duration': 46,
'timestamp': 1405105800, 'timestamp': 1405120200,
'upload_date': '20140711', 'upload_date': '20140711',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
}, },
}, },
{ {
'url': 'http://m.mlb.com/video/topic/vtp_hrd_sponsor/v34578115/hrd-cespedes-wins-2014-gillette-home-run-derby', 'url': 'https://www.mlb.com/video/cespedes-repeats-as-derby-champ/c-34578115',
'md5': '0e6e73d509321e142409b695eadd541f', 'md5': '99bb9176531adc600b90880fb8be9328',
'info_dict': { 'info_dict': {
'id': '34578115', 'id': '34578115',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Cespedes repeats as Derby champ', 'title': 'Cespedes repeats as Derby champ',
'description': 'md5:08df253ce265d4cf6fb09f581fafad07', 'description': 'md5:08df253ce265d4cf6fb09f581fafad07',
'duration': 488, 'duration': 488,
'timestamp': 1405399936, 'timestamp': 1405414336,
'upload_date': '20140715', 'upload_date': '20140715',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
}, },
}, },
{ {
'url': 'http://m.mlb.com/video/v34577915/bautista-on-derby-captaining-duties-his-performance', 'url': 'https://www.mlb.com/video/bautista-on-home-run-derby/c-34577915',
'md5': 'b8fd237347b844365d74ea61d4245967', 'md5': 'da8b57a12b060e7663ee1eebd6f330ec',
'info_dict': { 'info_dict': {
'id': '34577915', 'id': '34577915',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Bautista on Home Run Derby', 'title': 'Bautista on Home Run Derby',
'description': 'md5:b80b34031143d0986dddc64a8839f0fb', 'description': 'md5:b80b34031143d0986dddc64a8839f0fb',
'duration': 52, 'duration': 52,
'timestamp': 1405390722, 'timestamp': 1405405122,
'upload_date': '20140715', 'upload_date': '20140715',
'thumbnail': r're:^https?://.*\.jpg$', 'thumbnail': r're:^https?://.*\.jpg$',
}, },
}, },
{ {
'url': 'http://m.mlb.com/news/article/118550098/blue-jays-kevin-pillar-goes-spidey-up-the-wall-to-rob-tim-beckham-of-a-homer', 'url': 'https://www.mlb.com/news/blue-jays-kevin-pillar-goes-spidey-up-the-wall-to-rob-tim-beckham-of-a-homer/c-118550098',
'md5': 'aafaf5b0186fee8f32f20508092f8111', 'md5': 'e09e37b552351fddbf4d9e699c924d68',
'info_dict': { 'info_dict': {
'id': '75609783', 'id': '75609783',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Must C: Pillar climbs for catch', 'title': 'Must C: Pillar climbs for catch',
'description': '4/15/15: Blue Jays outfielder Kevin Pillar continues his defensive dominance by climbing the wall in left to rob Tim Beckham of a home run', 'description': '4/15/15: Blue Jays outfielder Kevin Pillar continues his defensive dominance by climbing the wall in left to rob Tim Beckham of a home run',
'timestamp': 1429124820, 'timestamp': 1429139220,
'upload_date': '20150415', 'upload_date': '20150415',
} }
}, },
@ -111,7 +105,7 @@ class MLBIE(InfoExtractor):
'only_matching': True, 'only_matching': True,
}, },
{ {
'url': 'http://m.cardinals.mlb.com/stl/video/v51175783/atlstl-piscotty-makes-great-sliding-catch-on-line/?partnerId=as_mlb_20150321_42500876&adbid=579409712979910656&adbpl=tw&adbpr=52847728', 'url': 'https://www.mlb.com/cardinals/video/piscottys-great-sliding-catch/c-51175783',
'only_matching': True, 'only_matching': True,
}, },
{ {
@ -120,58 +114,7 @@ class MLBIE(InfoExtractor):
'only_matching': True, 'only_matching': True,
}, },
{ {
'url': 'http://washington.nationals.mlb.com/mlb/gameday/index.jsp?c_id=was&gid=2015_05_09_atlmlb_wasmlb_1&lang=en&content_id=108309983&mode=video#', 'url': 'https://www.mlb.com/cut4/carlos-gomez-borrowed-sunglasses-from-an-as-fan/c-278912842',
'only_matching': True, 'only_matching': True,
} }
] ]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
if not video_id:
video_path = mobj.group('path')
webpage = self._download_webpage(url, video_path)
video_id = self._search_regex(
[r'data-video-?id="(\d+)"', r'content_id=(\d+)'], webpage, 'video id')
detail = self._download_xml(
'http://m.mlb.com/gen/multimedia/detail/%s/%s/%s/%s.xml'
% (video_id[-3], video_id[-2], video_id[-1], video_id), video_id)
title = detail.find('./headline').text
description = detail.find('./big-blurb').text
duration = parse_duration(detail.find('./duration').text)
timestamp = parse_iso8601(detail.attrib['date'][:-5])
thumbnails = [{
'url': thumbnail.text,
} for thumbnail in detail.findall('./thumbnailScenarios/thumbnailScenario')]
formats = []
for media_url in detail.findall('./url'):
playback_scenario = media_url.attrib['playback_scenario']
fmt = {
'url': media_url.text,
'format_id': playback_scenario,
}
m = re.search(r'(?P<vbr>\d+)K_(?P<width>\d+)X(?P<height>\d+)', playback_scenario)
if m:
fmt.update({
'vbr': int(m.group('vbr')) * 1000,
'width': int(m.group('width')),
'height': int(m.group('height')),
})
formats.append(fmt)
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': description,
'duration': duration,
'timestamp': timestamp,
'formats': formats,
'thumbnails': thumbnails,
}

View File

@ -1,116 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import os.path
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
remove_start,
sanitized_Request,
urlencode_postdata,
)
class MonikerIE(InfoExtractor):
IE_DESC = 'allmyvideos.net and vidspot.net'
_VALID_URL = r'https?://(?:www\.)?(?:allmyvideos|vidspot)\.net/(?:(?:2|v)/v-)?(?P<id>[a-zA-Z0-9_-]+)'
_TESTS = [{
'url': 'http://allmyvideos.net/jih3nce3x6wn',
'md5': '710883dee1bfc370ecf9fa6a89307c88',
'info_dict': {
'id': 'jih3nce3x6wn',
'ext': 'mp4',
'title': 'youtube-dl test video',
},
}, {
'url': 'http://allmyvideos.net/embed-jih3nce3x6wn',
'md5': '710883dee1bfc370ecf9fa6a89307c88',
'info_dict': {
'id': 'jih3nce3x6wn',
'ext': 'mp4',
'title': 'youtube-dl test video',
},
}, {
'url': 'http://vidspot.net/l2ngsmhs8ci5',
'md5': '710883dee1bfc370ecf9fa6a89307c88',
'info_dict': {
'id': 'l2ngsmhs8ci5',
'ext': 'mp4',
'title': 'youtube-dl test video',
},
}, {
'url': 'https://www.vidspot.net/l2ngsmhs8ci5',
'only_matching': True,
}, {
'url': 'http://vidspot.net/2/v-ywDf99',
'md5': '5f8254ce12df30479428b0152fb8e7ba',
'info_dict': {
'id': 'ywDf99',
'ext': 'mp4',
'title': 'IL FAIT LE MALIN EN PORSHE CAYENNE ( mais pas pour longtemps)',
'description': 'IL FAIT LE MALIN EN PORSHE CAYENNE.',
},
}, {
'url': 'http://allmyvideos.net/v/v-HXZm5t',
'only_matching': True,
}]
def _real_extract(self, url):
orig_video_id = self._match_id(url)
video_id = remove_start(orig_video_id, 'embed-')
url = url.replace(orig_video_id, video_id)
assert re.match(self._VALID_URL, url) is not None
orig_webpage = self._download_webpage(url, video_id)
if '>File Not Found<' in orig_webpage:
raise ExtractorError('Video %s does not exist' % video_id, expected=True)
error = self._search_regex(
r'class="err">([^<]+)<', orig_webpage, 'error', default=None)
if error:
raise ExtractorError(
'%s returned error: %s' % (self.IE_NAME, error), expected=True)
builtin_url = self._search_regex(
r'<iframe[^>]+src=(["\'])(?P<url>.+?/builtin-.+?)\1',
orig_webpage, 'builtin URL', default=None, group='url')
if builtin_url:
req = sanitized_Request(builtin_url)
req.add_header('Referer', url)
webpage = self._download_webpage(req, video_id, 'Downloading builtin page')
title = self._og_search_title(orig_webpage).strip()
description = self._og_search_description(orig_webpage).strip()
else:
fields = re.findall(r'type="hidden" name="(.+?)"\s* value="?(.+?)">', orig_webpage)
data = dict(fields)
post = urlencode_postdata(data)
headers = {
b'Content-Type': b'application/x-www-form-urlencoded',
}
req = sanitized_Request(url, post, headers)
webpage = self._download_webpage(
req, video_id, note='Downloading video page ...')
title = os.path.splitext(data['fname'])[0]
description = None
# Could be several links with different quality
links = re.findall(r'"file" : "?(.+?)",', webpage)
# Assume the links are ordered in quality
formats = [{
'url': l,
'quality': i,
} for i, l in enumerate(links)]
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': description,
'formats': formats,
}

Some files were not shown because too many files have changed in this diff Show More