Merge branch 'master' into tvpleextractor

kjy00302 committed 2016-04-03 03:54:02 +09:00, commit 25b4f17841
282 changed files with 3429 additions and 1202 deletions

--- /dev/null
+++ b/.github/ISSUE_TEMPLATE.md

@@ -0,0 +1,58 @@
+## Please follow the guide below
+
+- You will be asked some questions and requested to provide some information; please read them **carefully** and answer honestly
+- Put an `x` into all the boxes [ ] relevant to your *issue* (like this: [x])
+- Use the *Preview* tab to see how your issue will actually look
+
+---
+
+### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.04.01*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues reported with an outdated version will be rejected.
+- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.04.01**
+
+### Before submitting an *issue* make sure you have:
+- [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
+- [ ] [Searched](https://github.com/rg3/youtube-dl/search?type=Issues) the bugtracker for similar issues, including closed ones
+
+### What is the purpose of your *issue*?
+- [ ] Bug report (encountered problems with youtube-dl)
+- [ ] Site support request (request for adding support for a new site)
+- [ ] Feature request (request for new functionality)
+- [ ] Question
+- [ ] Other
+
+---
+
+### The following sections ask for information relevant to particular issue types; you can erase any section (the contents between triple ---) not applicable to your *issue*
+
+---
+
+### If the purpose of this *issue* is a *bug report*, a *site support request*, or you are not completely sure, provide the full verbose output as follows:
+Add the `-v` flag to the **command line** you run youtube-dl with, copy the **whole** output and insert it here. It should look similar to the one below (replace it with **your** log, inserted between triple ```):
+```
+$ youtube-dl -v <your command line>
+[debug] System config: []
+[debug] User config: []
+[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
+[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
+[debug] youtube-dl version 2016.04.01
+[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
+[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
+[debug] Proxy map: {}
+...
+<end of log>
+```
+
+---
+
+### If the purpose of this *issue* is a *site support request*, please provide all kinds of example URLs for which support should be included (replace the following example URLs with **yours**):
+- Single video: https://www.youtube.com/watch?v=BaW_jenozKc
+- Single video: https://youtu.be/BaW_jenozKc
+- Playlist: https://www.youtube.com/playlist?list=PL4lCao7KL_QFVb7Iudeipvc2BCavECqzc
+
+---
+
+### Description of your *issue*, suggested solution and other information
+
+Explanation of your *issue* in arbitrary form goes here. Please make sure the [description is worded well enough to be understood](https://github.com/rg3/youtube-dl#is-the-description-of-the-issue-itself-sufficient). Provide as much context and as many examples as possible.
+If work on your *issue* requires account credentials, please provide them or explain how one can obtain them.

--- /dev/null
+++ b/.github/ISSUE_TEMPLATE_tmpl.md

@@ -0,0 +1,58 @@
+## Please follow the guide below
+
+- You will be asked some questions and requested to provide some information; please read them **carefully** and answer honestly
+- Put an `x` into all the boxes [ ] relevant to your *issue* (like this: [x])
+- Use the *Preview* tab to see how your issue will actually look
+
+---
+
+### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *%(version)s*. If it's not, read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues reported with an outdated version will be rejected.
+- [ ] I've **verified** and **I assure** that I'm running youtube-dl **%(version)s**
+
+### Before submitting an *issue* make sure you have:
+- [ ] At least skimmed through the [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** the [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
+- [ ] [Searched](https://github.com/rg3/youtube-dl/search?type=Issues) the bugtracker for similar issues, including closed ones
+
+### What is the purpose of your *issue*?
+- [ ] Bug report (encountered problems with youtube-dl)
+- [ ] Site support request (request for adding support for a new site)
+- [ ] Feature request (request for new functionality)
+- [ ] Question
+- [ ] Other
+
+---
+
+### The following sections ask for information relevant to particular issue types; you can erase any section (the contents between triple ---) not applicable to your *issue*
+
+---
+
+### If the purpose of this *issue* is a *bug report*, a *site support request*, or you are not completely sure, provide the full verbose output as follows:
+Add the `-v` flag to the **command line** you run youtube-dl with, copy the **whole** output and insert it here. It should look similar to the one below (replace it with **your** log, inserted between triple ```):
+```
+$ youtube-dl -v <your command line>
+[debug] System config: []
+[debug] User config: []
+[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
+[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
+[debug] youtube-dl version %(version)s
+[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
+[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
+[debug] Proxy map: {}
+...
+<end of log>
+```
+
+---
+
+### If the purpose of this *issue* is a *site support request*, please provide all kinds of example URLs for which support should be included (replace the following example URLs with **yours**):
+- Single video: https://www.youtube.com/watch?v=BaW_jenozKc
+- Single video: https://youtu.be/BaW_jenozKc
+- Playlist: https://www.youtube.com/playlist?list=PL4lCao7KL_QFVb7Iudeipvc2BCavECqzc
+
+---
+
+### Description of your *issue*, suggested solution and other information
+
+Explanation of your *issue* in arbitrary form goes here. Please make sure the [description is worded well enough to be understood](https://github.com/rg3/youtube-dl#is-the-description-of-the-issue-itself-sufficient). Provide as much context and as many examples as possible.
+If work on your *issue* requires account credentials, please provide them or explain how one can obtain them.

--- a/AUTHORS
+++ b/AUTHORS

@@ -163,3 +163,7 @@ Patrick Griffis
 Aidan Rowe
 mutantmonkey
 Ben Congdon
+Kacper Michajłow
+José Joaquín Atria
+Viťas Strádal
+Kagami Hiiragi

--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md

@@ -85,7 +85,7 @@ To run the test, simply invoke your favorite test runner, or execute a test file
 If you want to create a build of youtube-dl yourself, you'll need
 * python
-* make
+* make (both GNU make and BSD make are supported)
 * pandoc
 * zip
 * nosetests

--- a/Makefile
+++ b/Makefile

@@ -1,7 +1,7 @@
-all: youtube-dl README.md CONTRIBUTING.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish supportedsites
+all: youtube-dl README.md CONTRIBUTING.md ISSUE_TEMPLATE.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish supportedsites
 
 clean:
-	rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish *.dump *.part *.info.json *.mp4 *.flv *.mp3 *.avi CONTRIBUTING.md.tmp youtube-dl youtube-dl.exe
+	rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish *.dump *.part *.info.json *.mp4 *.flv *.mp3 *.avi CONTRIBUTING.md.tmp ISSUE_TEMPLATE.md.tmp youtube-dl youtube-dl.exe
 	find . -name "*.pyc" -delete
 	find . -name "*.class" -delete

@@ -12,15 +12,7 @@ SHAREDIR ?= $(PREFIX)/share
 PYTHON ?= /usr/bin/env python
 
 # set SYSCONFDIR to /etc if PREFIX=/usr or PREFIX=/usr/local
-ifeq ($(PREFIX),/usr)
-	SYSCONFDIR=/etc
-else
-	ifeq ($(PREFIX),/usr/local)
-		SYSCONFDIR=/etc
-	else
-		SYSCONFDIR=$(PREFIX)/etc
-	endif
-endif
+SYSCONFDIR != if [ $(PREFIX) = /usr -o $(PREFIX) = /usr/local ]; then echo /etc; else echo $(PREFIX)/etc; fi
 
 install: youtube-dl youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish
 	install -d $(DESTDIR)$(BINDIR)

@@ -67,6 +59,9 @@ README.md: youtube_dl/*.py youtube_dl/*/*.py
 CONTRIBUTING.md: README.md
 	$(PYTHON) devscripts/make_contributing.py README.md CONTRIBUTING.md
 
+ISSUE_TEMPLATE.md:
+	$(PYTHON) devscripts/make_issue_template.py .github/ISSUE_TEMPLATE_tmpl.md .github/ISSUE_TEMPLATE.md
+
 supportedsites:
 	$(PYTHON) devscripts/make_supportedsites.py docs/supportedsites.md

--- a/README.md
+++ b/README.md

@@ -164,6 +164,8 @@ which means you can modify it, redistribute it or use it however you like.
                                      (e.g. 50K or 4.2M)
     -R, --retries RETRIES            Number of retries (default is 10), or
                                      "infinite".
+    --fragment-retries RETRIES       Number of retries for a fragment (default
+                                     is 10), or "infinite" (DASH only)
     --buffer-size SIZE               Size of download buffer (e.g. 1024 or 16K)
                                      (default is 1024)
     --no-resize-buffer               Do not automatically adjust the buffer

@@ -376,8 +378,8 @@ which means you can modify it, redistribute it or use it however you like.
     --no-post-overwrites             Do not overwrite post-processed files; the
                                      post-processed files are overwritten by
                                      default
-    --embed-subs                     Embed subtitles in the video (only for mkv
-                                     and mp4 videos)
+    --embed-subs                     Embed subtitles in the video (only for mp4,
+                                     webm and mkv videos)
     --embed-thumbnail                Embed thumbnail in the audio as cover art
     --add-metadata                   Write metadata to the video file
     --metadata-from-title FORMAT     Parse additional metadata like song title /
@@ -598,6 +600,7 @@ Also filtering work for comparisons `=` (equals), `!=` (not equals), `^=` (begin
  - `vcodec`: Name of the video codec in use
  - `container`: Name of the container format
  - `protocol`: The protocol that will be used for the actual download, lower-case. `http`, `https`, `rtsp`, `rtmp`, `rtmpe`, `m3u8`, or `m3u8_native`
+ - `format_id`: A short description of the format
 
 Note that none of the aforementioned meta fields are guaranteed to be present since this solely depends on the metadata obtained by particular extractor, i.e. the metadata offered by video hoster.
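A brief illustration of this hunk: the new `format_id` field joins the string-matching format filters, so a selector can target formats by id. A minimal sketch using the embedded Python API (the format string mirrors the CLI's `-f` option; the URL is the README's own example video):

```python
# Hedged sketch: select the format whose format_id begins with 'dash',
# equivalent to: youtube-dl -f 'best[format_id^=dash]' URL
from youtube_dl import YoutubeDL

ydl = YoutubeDL({'format': 'best[format_id^=dash]'})
ydl.download(['https://www.youtube.com/watch?v=BaW_jenozKc'])
```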
@@ -831,7 +834,7 @@ To run the test, simply invoke your favorite test runner, or execute a test file
 If you want to create a build of youtube-dl yourself, you'll need
 * python
-* make
+* make (both GNU make and BSD make are supported)
 * pandoc
 * zip
 * nosetests

--- /dev/null
+++ b/devscripts/make_issue_template.py

@@ -0,0 +1,29 @@
+#!/usr/bin/env python
+from __future__ import unicode_literals
+
+import io
+import optparse
+
+
+def main():
+    parser = optparse.OptionParser(usage='%prog INFILE OUTFILE')
+    options, args = parser.parse_args()
+    if len(args) != 2:
+        parser.error('Expected an input and an output filename')
+
+    infile, outfile = args
+
+    with io.open(infile, encoding='utf-8') as inf:
+        issue_template_tmpl = inf.read()
+
+    # Get the version from youtube_dl/version.py without importing the package
+    exec(compile(open('youtube_dl/version.py').read(),
+                 'youtube_dl/version.py', 'exec'))
+
+    out = issue_template_tmpl % {'version': locals()['__version__']}
+
+    with io.open(outfile, 'w', encoding='utf-8') as outf:
+        outf.write(out)
+
+if __name__ == '__main__':
+    main()

--- a/devscripts/release.sh
+++ b/devscripts/release.sh

@@ -45,9 +45,9 @@ fi
 /bin/echo -e "\n### Changing version in version.py..."
 sed -i "s/__version__ = '.*'/__version__ = '$version'/" youtube_dl/version.py
 
-/bin/echo -e "\n### Committing documentation and youtube_dl/version.py..."
-make README.md CONTRIBUTING.md supportedsites
-git add README.md CONTRIBUTING.md docs/supportedsites.md youtube_dl/version.py
+/bin/echo -e "\n### Committing documentation, templates and youtube_dl/version.py..."
+make README.md CONTRIBUTING.md ISSUE_TEMPLATE.md supportedsites
+git add README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE.md docs/supportedsites.md youtube_dl/version.py
 git commit -m "release $version"
 
 /bin/echo -e "\n### Now tagging, signing and pushing..."

--- a/docs/supportedsites.md
+++ b/docs/supportedsites.md

@@ -74,6 +74,7 @@
  - **Bigflix**
  - **Bild**: Bild.de
  - **BiliBili**
+ - **BioBioChileTV**
  - **BleacherReport**
  - **BleacherReportCMS**
  - **blinkx**

@@ -81,6 +82,7 @@
  - **BokeCC**
  - **Bpb**: Bundeszentrale für politische Bildung
  - **BR**: Bayerischer Rundfunk Mediathek
+ - **BravoTV**
  - **Break**
  - **brightcove:legacy**
  - **brightcove:new**

@@ -99,6 +101,7 @@
  - **CBSNews**: CBS News
  - **CBSNewsLiveVideo**: CBS News Live Videos
  - **CBSSports**
+ - **CDA**
  - **CeskaTelevize**
  - **channel9**: Channel 9
  - **Chaturbate**

@@ -115,6 +118,7 @@
  - **Clubic**
  - **Clyp**
  - **cmt.com**
+ - **CNBC**
  - **CNET**
  - **CNN**
  - **CNNArticle**

@@ -131,6 +135,7 @@
  - **CrooksAndLiars**
  - **Crunchyroll**
  - **crunchyroll:playlist**
+ - **CSNNE**
  - **CSpan**: C-SPAN
  - **CtsNews**: 華視新聞
  - **culturebox.francetvinfo.fr**

@@ -243,6 +248,7 @@
  - **GPUTechConf**
  - **Groupon**
  - **Hark**
+ - **HBO**
  - **HearThisAt**
  - **Heise**
  - **HellPorno**

@@ -343,6 +349,7 @@
  - **MiTele**: mitele.es
  - **mixcloud**
  - **MLB**
+ - **Mnet**
  - **MoeVideo**: LetitBit video services: moevideo.net, playreplay.net and videochart.net
  - **Mofosex**
  - **Mojvideo**

@@ -371,7 +378,8 @@
  - **myvideo** (Currently broken)
  - **MyVidster**
  - **n-tv.de**
- - **NationalGeographic**
+ - **natgeo**
+ - **natgeo:channel**
  - **Naver**
  - **NBA**
  - **NBC**

@@ -439,6 +447,7 @@
  - **OnionStudios**
  - **Ooyala**
  - **OoyalaExternal**
+ - **Openload**
  - **OraTV**
  - **orf:fm4**: radio FM4
  - **orf:iptv**: iptv.ORF.at

@@ -499,6 +508,7 @@
  - **Restudy**
  - **ReverbNation**
  - **Revision3**
+ - **RICE**
  - **RingTV**
  - **RottenTomatoes**
  - **Roxwel**

@@ -523,6 +533,7 @@
  - **RUTV**: RUTV.RU
  - **Ruutu**
  - **safari**: safaribooksonline.com online video
+ - **safari:api**
  - **safari:course**: safaribooksonline.com online courses
  - **Sandia**: Sandia National Laboratories
  - **Sapo**: SAPO Vídeos

@@ -610,13 +621,14 @@
  - **Telegraaf**
  - **TeleMB**
  - **TeleTask**
- - **TenPlay**
  - **TF1**
  - **TheIntercept**
  - **TheOnion**
  - **ThePlatform**
  - **ThePlatformFeed**
+ - **TheScene**
  - **TheSixtyOne**
+ - **TheStar**
  - **ThisAmericanLife**
  - **ThisAV**
  - **THVideo**

@@ -650,6 +662,7 @@
  - **tv.dfb.de**
  - **TV2**
  - **TV2Article**
+ - **TV3**
  - **TV4**: tv4.se and tv4play.se
  - **TVC**
  - **TVCArticle**

@@ -730,6 +743,7 @@
  - **vlive**
  - **Vodlocker**
  - **VoiceRepublic**
+ - **VoxMedia**
  - **Vporn**
  - **vpro**: npo.nl and ntr.nl
  - **VRT**

@@ -783,6 +797,7 @@
  - **youtube:channel**: YouTube.com channels
  - **youtube:favorites**: YouTube.com favourite videos, ":ytfav" for short (requires authentication)
  - **youtube:history**: Youtube watch history, ":ythistory" for short (requires authentication)
+ - **youtube:live**: YouTube.com live streams
  - **youtube:playlist**: YouTube.com playlists
  - **youtube:playlists**: YouTube.com user/channel playlists
  - **youtube:recommended**: YouTube.com recommended videos, ":ytrec" for short (requires authentication)

--- a/setup.cfg
+++ b/setup.cfg

@@ -2,5 +2,5 @@
 universal = True
 
 [flake8]
-exclude = youtube_dl/extractor/__init__.py,devscripts/buildserver.py,setup.py,build,.git
+exclude = youtube_dl/extractor/__init__.py,devscripts/buildserver.py,devscripts/make_issue_template.py,setup.py,build,.git
 ignore = E402,E501,E731

--- a/test/test_YoutubeDL.py
+++ b/test/test_YoutubeDL.py

@@ -222,6 +222,11 @@ class TestFormatSelection(unittest.TestCase):
         downloaded = ydl.downloaded_info_dicts[0]
         self.assertEqual(downloaded['format_id'], 'dash-video-low')
 
+        ydl = YDL({'format': 'bestvideo[format_id^=dash][format_id$=low]'})
+        ydl.process_ie_result(info_dict.copy())
+        downloaded = ydl.downloaded_info_dicts[0]
+        self.assertEqual(downloaded['format_id'], 'dash-video-low')
+
         formats = [
             {'format_id': 'vid-vcodec-dot', 'ext': 'mp4', 'preference': 1, 'vcodec': 'avc1.123456', 'acodec': 'none', 'url': TEST_URL},
         ]

--- a/test/test_compat.py
+++ b/test/test_compat.py

@@ -19,6 +19,7 @@ from youtube_dl.compat import (
     compat_str,
     compat_urllib_parse_unquote,
     compat_urllib_parse_unquote_plus,
+    compat_urllib_parse_urlencode,
 )

@@ -70,6 +71,12 @@ class TestCompat(unittest.TestCase):
         self.assertEqual(compat_urllib_parse_unquote_plus('abc%20def'), 'abc def')
         self.assertEqual(compat_urllib_parse_unquote_plus('%7e/abc+def'), '~/abc def')
 
+    def test_compat_urllib_parse_urlencode(self):
+        self.assertEqual(compat_urllib_parse_urlencode({'abc': 'def'}), 'abc=def')
+        self.assertEqual(compat_urllib_parse_urlencode({'abc': b'def'}), 'abc=def')
+        self.assertEqual(compat_urllib_parse_urlencode({b'abc': 'def'}), 'abc=def')
+        self.assertEqual(compat_urllib_parse_urlencode({b'abc': b'def'}), 'abc=def')
+
     def test_compat_shlex_split(self):
         self.assertEqual(compat_shlex_split('-option "one two"'), ['-option', 'one two'])

--- a/test/test_http.py
+++ b/test/test_http.py

@@ -1,4 +1,5 @@
 #!/usr/bin/env python
+# coding: utf-8
 from __future__ import unicode_literals
 
 # Allow direct execution

@@ -120,5 +121,14 @@ class TestProxy(unittest.TestCase):
             response = ydl.urlopen(req).read().decode('utf-8')
             self.assertEqual(response, 'cn: {0}'.format(url))
 
+    def test_proxy_with_idn(self):
+        ydl = YoutubeDL({
+            'proxy': 'localhost:{0}'.format(self.port),
+        })
+        url = 'http://中文.tw/'
+        response = ydl.urlopen(url).read().decode('utf-8')
+        # b'xn--fiq228c' is '中文'.encode('idna')
+        self.assertEqual(response, 'normal: http://xn--fiq228c.tw/')
+
 if __name__ == '__main__':
     unittest.main()
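The expected hostname in the new test comes straight from Python's IDNA codec; a quick standalone check:

```python
# IDNA (RFC 3490) encoding of the hostname label, which is what the
# proxy test above asserts the downloader applies before the request.
print(u'\u4e2d\u6587'.encode('idna'))  # '中文' -> b'xn--fiq228c'
```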

--- a/test/test_utils.py
+++ b/test/test_utils.py

@@ -28,6 +28,7 @@ from youtube_dl.utils import (
     encodeFilename,
     escape_rfc3986,
     escape_url,
+    extract_attributes,
     ExtractorError,
     find_xpath_attr,
     fix_xml_ampersands,

@@ -77,6 +78,7 @@ from youtube_dl.utils import (
     cli_bool_option,
 )
 from youtube_dl.compat import (
+    compat_chr,
     compat_etree_fromstring,
     compat_urlparse,
     compat_parse_qs,

@@ -575,11 +577,11 @@ class TestUtil(unittest.TestCase):
         )
         self.assertEqual(
             escape_url('http://тест.рф/фрагмент'),
-            'http://тест.рф/%D1%84%D1%80%D0%B0%D0%B3%D0%BC%D0%B5%D0%BD%D1%82'
+            'http://xn--e1aybc.xn--p1ai/%D1%84%D1%80%D0%B0%D0%B3%D0%BC%D0%B5%D0%BD%D1%82'
         )
         self.assertEqual(
             escape_url('http://тест.рф/абв?абв=абв#абв'),
-            'http://тест.рф/%D0%B0%D0%B1%D0%B2?%D0%B0%D0%B1%D0%B2=%D0%B0%D0%B1%D0%B2#%D0%B0%D0%B1%D0%B2'
+            'http://xn--e1aybc.xn--p1ai/%D0%B0%D0%B1%D0%B2?%D0%B0%D0%B1%D0%B2=%D0%B0%D0%B1%D0%B2#%D0%B0%D0%B1%D0%B2'
         )
         self.assertEqual(escape_url('http://vimeo.com/56015672#at=0'), 'http://vimeo.com/56015672#at=0')

@@ -629,6 +631,44 @@ class TestUtil(unittest.TestCase):
         on = js_to_json('{"abc": "def",}')
         self.assertEqual(json.loads(on), {'abc': 'def'})
 
+    def test_extract_attributes(self):
+        self.assertEqual(extract_attributes('<e x="y">'), {'x': 'y'})
+        self.assertEqual(extract_attributes("<e x='y'>"), {'x': 'y'})
+        self.assertEqual(extract_attributes('<e x=y>'), {'x': 'y'})
+        self.assertEqual(extract_attributes('<e x="a \'b\' c">'), {'x': "a 'b' c"})
+        self.assertEqual(extract_attributes('<e x=\'a "b" c\'>'), {'x': 'a "b" c'})
+        self.assertEqual(extract_attributes('<e x="&#121;">'), {'x': 'y'})
+        self.assertEqual(extract_attributes('<e x="&#x79;">'), {'x': 'y'})
+        self.assertEqual(extract_attributes('<e x="&amp;">'), {'x': '&'})  # XML
+        self.assertEqual(extract_attributes('<e x="&quot;">'), {'x': '"'})
+        self.assertEqual(extract_attributes('<e x="&pound;">'), {'x': '£'})  # HTML 3.2
+        self.assertEqual(extract_attributes('<e x="&lambda;">'), {'x': 'λ'})  # HTML 4.0
+        self.assertEqual(extract_attributes('<e x="&foo">'), {'x': '&foo'})
+        self.assertEqual(extract_attributes('<e x="\'">'), {'x': "'"})
+        self.assertEqual(extract_attributes('<e x=\'"\'>'), {'x': '"'})
+        self.assertEqual(extract_attributes('<e x >'), {'x': None})
+        self.assertEqual(extract_attributes('<e x=y a>'), {'x': 'y', 'a': None})
+        self.assertEqual(extract_attributes('<e x= y>'), {'x': 'y'})
+        self.assertEqual(extract_attributes('<e x=1 y=2 x=3>'), {'y': '2', 'x': '3'})
+        self.assertEqual(extract_attributes('<e \nx=\ny\n>'), {'x': 'y'})
+        self.assertEqual(extract_attributes('<e \nx=\n"y"\n>'), {'x': 'y'})
+        self.assertEqual(extract_attributes("<e \nx=\n'y'\n>"), {'x': 'y'})
+        self.assertEqual(extract_attributes('<e \nx="\ny\n">'), {'x': '\ny\n'})
+        self.assertEqual(extract_attributes('<e CAPS=x>'), {'caps': 'x'})  # Names lowercased
+        self.assertEqual(extract_attributes('<e x=1 X=2>'), {'x': '2'})
+        self.assertEqual(extract_attributes('<e X=1 x=2>'), {'x': '2'})
+        self.assertEqual(extract_attributes('<e _:funny-name1=1>'), {'_:funny-name1': '1'})
+        self.assertEqual(extract_attributes('<e x="Fáilte 世界 \U0001f600">'), {'x': 'Fáilte 世界 \U0001f600'})
+        self.assertEqual(extract_attributes('<e x="décompose&#769;">'), {'x': 'décompose\u0301'})
+        # "Narrow" Python builds don't support unicode code points outside BMP.
+        try:
+            compat_chr(0x10000)
+            supports_outside_bmp = True
+        except ValueError:
+            supports_outside_bmp = False
+        if supports_outside_bmp:
+            self.assertEqual(extract_attributes('<e x="Smile &#128512;!">'), {'x': 'Smile \U0001f600!'})
+
     def test_clean_html(self):
         self.assertEqual(clean_html('a:\nb'), 'a: b')
         self.assertEqual(clean_html('a:\n "b"'), 'a: "b"')

@@ -662,6 +702,8 @@ class TestUtil(unittest.TestCase):
         self.assertEqual(parse_count('1.000'), 1000)
         self.assertEqual(parse_count('1.1k'), 1100)
         self.assertEqual(parse_count('1.1kk'), 1100000)
+        self.assertEqual(parse_count('1.1kk '), 1100000)
+        self.assertEqual(parse_count('1.1kk views'), 1100000)
 
     def test_version_tuple(self):
         self.assertEqual(version_tuple('1'), (1,))
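The behaviour pinned down by test_extract_attributes can be summarized in a small usage sketch (assuming the merged youtube_dl checkout is importable; the tag is illustrative):

```python
# extract_attributes parses the attributes of a single HTML start tag into a
# dict: quoted and unquoted values, entity references, valueless attributes.
from youtube_dl.utils import extract_attributes

print(extract_attributes('<video src="v.mp4" controls data-id=42>'))
# {'src': 'v.mp4', 'controls': None, 'data-id': '42'}
```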

--- a/tox.ini
+++ b/tox.ini

@@ -8,6 +8,6 @@ deps =
 passenv = HOME
 defaultargs = test --exclude test_download.py --exclude test_age_restriction.py
     --exclude test_subtitles.py --exclude test_write_annotations.py
-    --exclude test_youtube_lists.py
+    --exclude test_youtube_lists.py --exclude test_iqiyi_sdk_interpreter.py
 commands = nosetests --verbose {posargs:{[testenv]defaultargs}}  # --with-coverage --cover-package=youtube_dl --cover-html
 #     test.test_download:TestDownload.test_NowVideo

--- a/youtube_dl/YoutubeDL.py
+++ b/youtube_dl/YoutubeDL.py

@@ -39,6 +39,8 @@ from .compat import (
     compat_urllib_request_DataHandler,
 )
 from .utils import (
+    age_restricted,
+    args_to_str,
     ContentTooShortError,
     date_from_str,
     DateRange,

@@ -58,13 +60,16 @@ from .utils import (
     PagedList,
     parse_filesize,
     PerRequestProxyHandler,
-    PostProcessingError,
     platform_name,
+    PostProcessingError,
     preferredencoding,
+    prepend_extension,
     render_table,
+    replace_extension,
     SameFileError,
     sanitize_filename,
     sanitize_path,
+    sanitize_url,
     sanitized_Request,
     std_headers,
     subtitles_filename,

@@ -75,10 +80,6 @@ from .utils import (
     write_string,
     YoutubeDLCookieProcessor,
     YoutubeDLHandler,
-    prepend_extension,
-    replace_extension,
-    args_to_str,
-    age_restricted,
 )
 from .cache import Cache
 from .extractor import get_info_extractor, gen_extractors

@@ -905,7 +906,7 @@ class YoutubeDL(object):
         '*=': lambda attr, value: value in attr,
     }
     str_operator_rex = re.compile(r'''(?x)
-        \s*(?P<key>ext|acodec|vcodec|container|protocol)
+        \s*(?P<key>ext|acodec|vcodec|container|protocol|format_id)
         \s*(?P<op>%s)(?P<none_inclusive>\s*\?)?
         \s*(?P<value>[a-zA-Z0-9._-]+)
         \s*$

@@ -1229,6 +1230,7 @@ class YoutubeDL(object):
             t.get('preference'), t.get('width'), t.get('height'),
             t.get('id'), t.get('url')))
         for i, t in enumerate(thumbnails):
+            t['url'] = sanitize_url(t['url'])
             if t.get('width') and t.get('height'):
                 t['resolution'] = '%dx%d' % (t['width'], t['height'])
             if t.get('id') is None:

@@ -1263,6 +1265,8 @@ class YoutubeDL(object):
         if subtitles:
             for _, subtitle in subtitles.items():
                 for subtitle_format in subtitle:
+                    if subtitle_format.get('url'):
+                        subtitle_format['url'] = sanitize_url(subtitle_format['url'])
                     if 'ext' not in subtitle_format:
                         subtitle_format['ext'] = determine_ext(subtitle_format['url']).lower()

@@ -1292,6 +1296,8 @@ class YoutubeDL(object):
             if 'url' not in format:
                 raise ExtractorError('Missing "url" key in result (index %d)' % i)
 
+            format['url'] = sanitize_url(format['url'])
+
             if format.get('format_id') is None:
                 format['format_id'] = compat_str(i)
             else:
--- a/youtube_dl/__init__.py
+++ b/youtube_dl/__init__.py

@@ -144,14 +144,20 @@ def _real_main(argv=None):
         if numeric_limit is None:
             parser.error('invalid max_filesize specified')
         opts.max_filesize = numeric_limit
-    if opts.retries is not None:
-        if opts.retries in ('inf', 'infinite'):
-            opts_retries = float('inf')
-        else:
-            try:
-                opts_retries = int(opts.retries)
-            except (TypeError, ValueError):
-                parser.error('invalid retry count specified')
+
+    def parse_retries(retries):
+        if retries in ('inf', 'infinite'):
+            parsed_retries = float('inf')
+        else:
+            try:
+                parsed_retries = int(retries)
+            except (TypeError, ValueError):
+                parser.error('invalid retry count specified')
+        return parsed_retries
+    if opts.retries is not None:
+        opts.retries = parse_retries(opts.retries)
+    if opts.fragment_retries is not None:
+        opts.fragment_retries = parse_retries(opts.fragment_retries)
     if opts.buffersize is not None:
         numeric_buffersize = FileDownloader.parse_bytes(opts.buffersize)
         if numeric_buffersize is None:
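The refactoring above extracts retry parsing into a helper so that `--retries` and the new `--fragment-retries` share one code path. A standalone sketch, with `parser.error` replaced by a plain exception so it runs outside optparse:

```python
def parse_retries(retries):
    # 'inf'/'infinite' map to float('inf'); anything else must parse as int.
    if retries in ('inf', 'infinite'):
        return float('inf')
    try:
        return int(retries)
    except (TypeError, ValueError):
        raise ValueError('invalid retry count specified')

assert parse_retries('10') == 10
assert parse_retries('infinite') == float('inf')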
@@ -299,7 +305,8 @@ def _real_main(argv=None):
     'force_generic_extractor': opts.force_generic_extractor,
     'ratelimit': opts.ratelimit,
     'nooverwrites': opts.nooverwrites,
-    'retries': opts_retries,
+    'retries': opts.retries,
+    'fragment_retries': opts.fragment_retries,
     'buffersize': opts.buffersize,
     'noresizebuffer': opts.noresizebuffer,
     'continuedl': opts.continue_dl,

--- a/youtube_dl/compat.py
+++ b/youtube_dl/compat.py

@@ -77,6 +77,11 @@ try:
 except ImportError:  # Python 2
     from urllib import urlretrieve as compat_urlretrieve
 
+try:
+    from html.parser import HTMLParser as compat_HTMLParser
+except ImportError:  # Python 2
+    from HTMLParser import HTMLParser as compat_HTMLParser
+
 try:
     from subprocess import DEVNULL
@@ -164,6 +169,31 @@ except ImportError:  # Python 2
         string = string.replace('+', ' ')
         return compat_urllib_parse_unquote(string, encoding, errors)
 
+try:
+    from urllib.parse import urlencode as compat_urllib_parse_urlencode
+except ImportError:  # Python 2
+    # Python 2 will choke in urlencode on a mixture of byte and unicode strings.
+    # Possible solutions are to either port it from Python 3 with all
+    # its dependencies or manually ensure the input query contains only
+    # byte strings. We stick with the latter, thus recursively encoding
+    # the whole query.
+    def compat_urllib_parse_urlencode(query, doseq=0, encoding='utf-8'):
+        def encode_elem(e):
+            if isinstance(e, dict):
+                e = encode_dict(e)
+            elif isinstance(e, (list, tuple,)):
+                e = encode_list(e)
+            elif isinstance(e, compat_str):
+                e = e.encode(encoding)
+            return e
+
+        def encode_dict(d):
+            return dict((encode_elem(k), encode_elem(v)) for k, v in d.items())
+
+        def encode_list(l):
+            return [encode_elem(e) for e in l]
+
+        return compat_urllib_parse.urlencode(encode_elem(query), doseq=doseq)
+
 try:
     from urllib.request import DataHandler as compat_urllib_request_DataHandler
 except ImportError:  # Python < 3.4
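Why the shim matters: on Python 2, `urllib.urlencode` raises `UnicodeEncodeError` when a query mixes byte and unicode strings containing non-ASCII data, so the wrapper recursively encodes everything to UTF-8 bytes first. A usage sketch (key order in the output may vary, since dicts are unordered):

```python
from youtube_dl.compat import compat_urllib_parse_urlencode

# Mixed bytes/unicode keys and values are all accepted and UTF-8 encoded.
print(compat_urllib_parse_urlencode({b'page': '1', 'q': u'\u4e2d\u6587'}))
# e.g. 'q=%E4%B8%AD%E6%96%87&page=1'
```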
@@ -251,6 +281,16 @@ else:
             el.text = el.text.decode('utf-8')
     return doc
 
+if sys.version_info < (2, 7):
+    # Here comes the crazy part: in 2.6, if the xpath is a unicode string,
+    # .//node does not match if a node is a direct child of . !
+    def compat_xpath(xpath):
+        if isinstance(xpath, compat_str):
+            xpath = xpath.encode('ascii')
+        return xpath
+else:
+    compat_xpath = lambda xpath: xpath
+
 try:
     from urllib.parse import parse_qs as compat_parse_qs
 except ImportError:  # Python 2
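A usage sketch for `compat_xpath`: on 2.7+ it is the identity function, while on 2.6 it byte-encodes the expression so `.//node` matches direct children. Assuming the merged checkout is importable:

```python
import xml.etree.ElementTree as etree
from youtube_dl.compat import compat_xpath

doc = etree.fromstring('<root><node>x</node></root>')
print(doc.findall(compat_xpath('.//node'))[0].text)  # -> 'x'
```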
@@ -543,6 +583,7 @@ else:
     from tokenize import generate_tokens as compat_tokenize_tokenize
 
 __all__ = [
+    'compat_HTMLParser',
     'compat_HTTPError',
     'compat_basestring',
     'compat_chr',

@@ -572,6 +613,7 @@ __all__ = [
     'compat_urllib_parse_unquote',
     'compat_urllib_parse_unquote_plus',
     'compat_urllib_parse_unquote_to_bytes',
+    'compat_urllib_parse_urlencode',
     'compat_urllib_parse_urlparse',
     'compat_urllib_request',
     'compat_urllib_request_DataHandler',

@@ -579,6 +621,7 @@ __all__ = [
     'compat_urlparse',
     'compat_urlretrieve',
     'compat_xml_parse_error',
+    'compat_xpath',
     'shlex_quote',
     'subprocess_check_output',
     'workaround_optparse_bug9161',

--- a/youtube_dl/downloader/common.py
+++ b/youtube_dl/downloader/common.py

@@ -115,6 +115,10 @@ class FileDownloader(object):
             return '%10s' % '---b/s'
         return '%10s' % ('%s/s' % format_bytes(speed))
 
+    @staticmethod
+    def format_retries(retries):
+        return 'inf' if retries == float('inf') else '%.0f' % retries
+
     @staticmethod
     def best_block_size(elapsed_time, bytes):
         new_min = max(bytes / 2.0, 1.0)

@@ -297,7 +301,9 @@ class FileDownloader(object):
     def report_retry(self, count, retries):
         """Report retry in case of HTTP error 5xx"""
-        self.to_screen('[download] Got server HTTP error. Retrying (attempt %d of %.0f)...' % (count, retries))
+        self.to_screen(
+            '[download] Got server HTTP error. Retrying (attempt %d of %s)...'
+            % (count, self.format_retries(retries)))
 
     def report_file_already_downloaded(self, file_name):
        """Report file has already been fully downloaded."""

--- a/youtube_dl/downloader/dash.py
+++ b/youtube_dl/downloader/dash.py

@@ -4,6 +4,7 @@ import os
 import re
 
 from .fragment import FragmentFD
+from ..compat import compat_urllib_error
 from ..utils import (
     sanitize_open,
     encodeFilename,

@@ -36,7 +37,13 @@ class DashSegmentsFD(FragmentFD):
         segments_filenames = []
 
-        def append_url_to_file(target_url, target_filename):
-            success = ctx['dl'].download(target_filename, {'url': combine_url(base_url, target_url)})
-            if not success:
-                return False
+        fragment_retries = self.params.get('fragment_retries', 0)
+
+        def append_url_to_file(target_url, tmp_filename, segment_name):
+            target_filename = '%s-%s' % (tmp_filename, segment_name)
+            count = 0
+            while count <= fragment_retries:
+                try:
+                    success = ctx['dl'].download(target_filename, {'url': combine_url(base_url, target_url)})
+                    if not success:
+                        return False

@@ -44,12 +51,27 @@ class DashSegmentsFD(FragmentFD):
                     ctx['dest_stream'].write(down.read())
                     down.close()
                     segments_filenames.append(target_sanitized)
+                    break
+                except (compat_urllib_error.HTTPError, ) as err:
+                    # YouTube may often return 404 HTTP error for a fragment, causing the
+                    # whole download to fail. However, if the same fragment is immediately
+                    # retried with the same request data, this usually succeeds (1-2 attempts
+                    # is usually enough), thus allowing the whole file to download successfully.
+                    # So, we will retry all fragments that fail with 404 HTTP error for now.
+                    if err.code != 404:
+                        raise
+                    # Retry fragment
+                    count += 1
+                    if count <= fragment_retries:
+                        self.report_retry_fragment(segment_name, count, fragment_retries)
+            if count > fragment_retries:
+                self.report_error('giving up after %s fragment retries' % fragment_retries)
+                return False
 
         if initialization_url:
-            append_url_to_file(initialization_url, ctx['tmpfilename'] + '-Init')
+            append_url_to_file(initialization_url, ctx['tmpfilename'], 'Init')
         for i, segment_url in enumerate(segment_urls):
-            segment_filename = '%s-Seg%d' % (ctx['tmpfilename'], i)
-            append_url_to_file(segment_url, segment_filename)
+            append_url_to_file(segment_url, ctx['tmpfilename'], 'Seg%d' % i)
 
         self._finish_frag_download(ctx)
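The control flow added to `append_url_to_file` distills to a simple retry skeleton: retry only on HTTP 404 (which YouTube serves transiently for otherwise valid fragments), re-raise anything else, and give up after `fragment_retries` attempts. A self-contained sketch with stand-in types:

```python
class HTTPError(Exception):  # stand-in for compat_urllib_error.HTTPError
    def __init__(self, code):
        self.code = code

def download_fragment(download, fragment_retries=0):
    count = 0
    while count <= fragment_retries:
        try:
            return download()  # success: stop retrying
        except HTTPError as err:
            if err.code != 404:
                raise  # non-404 errors are not retried
            count += 1  # transient 404: retry the same fragment
    return False  # gave up after fragment_retries retries
```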

--- a/youtube_dl/downloader/f4m.py
+++ b/youtube_dl/downloader/f4m.py

@@ -223,6 +223,12 @@ def write_metadata_tag(stream, metadata):
     write_unsigned_int(stream, FLV_TAG_HEADER_LEN + len(metadata))
 
+def remove_encrypted_media(media):
+    return list(filter(lambda e: 'drmAdditionalHeaderId' not in e.attrib and
+                                 'drmAdditionalHeaderSetId' not in e.attrib,
+                       media))
+
 def _add_ns(prop):
     return '{http://ns.adobe.com/f4m/1.0}%s' % prop

@@ -244,9 +250,7 @@ class F4mFD(FragmentFD):
         # without drmAdditionalHeaderId or drmAdditionalHeaderSetId attribute
         if 'id' not in e.attrib:
             self.report_error('Missing ID in f4m DRM')
-        media = list(filter(lambda e: 'drmAdditionalHeaderId' not in e.attrib and
-                            'drmAdditionalHeaderSetId' not in e.attrib,
-                            media))
+        media = remove_encrypted_media(media)
         if not media:
             self.report_error('Unsupported DRM')
         return media

--- a/youtube_dl/downloader/fragment.py
+++ b/youtube_dl/downloader/fragment.py

@@ -19,8 +19,17 @@ class HttpQuietDownloader(HttpFD):
 class FragmentFD(FileDownloader):
     """
     A base file downloader class for fragmented media (e.g. f4m/m3u8 manifests).
+
+    Available options:
+
+    fragment_retries:   Number of times to retry a fragment for HTTP error (DASH only)
     """
 
+    def report_retry_fragment(self, fragment_name, count, retries):
+        self.to_screen(
+            '[download] Got server HTTP error. Retrying fragment %s (attempt %d of %s)...'
+            % (fragment_name, count, self.format_retries(retries)))
+
     def _prepare_and_start_frag_download(self, ctx):
         self._prepare_frag_download(ctx)
         self._start_frag_download(ctx)

--- a/youtube_dl/extractor/__init__.py
+++ b/youtube_dl/extractor/__init__.py

@@ -72,6 +72,7 @@ from .bet import BetIE
 from .bigflix import BigflixIE
 from .bild import BildIE
 from .bilibili import BiliBiliIE
+from .biobiochiletv import BioBioChileTVIE
 from .bleacherreport import (
     BleacherReportIE,
     BleacherReportCMSIE,

@@ -81,6 +82,7 @@ from .bloomberg import BloombergIE
 from .bokecc import BokeCCIE
 from .bpb import BpbIE
 from .br import BRIE
+from .bravotv import BravoTVIE
 from .breakcom import BreakIE
 from .brightcove import (
     BrightcoveLegacyIE,

@@ -93,6 +95,7 @@ from .camdemy import (
     CamdemyIE,
     CamdemyFolderIE
 )
+from .camwithher import CamWithHerIE
 from .canalplus import CanalplusIE
 from .canalc2 import Canalc2IE
 from .canvas import CanvasIE

@@ -101,12 +104,14 @@ from .cbc import (
     CBCPlayerIE,
 )
 from .cbs import CBSIE
+from .cbsinteractive import CBSInteractiveIE
 from .cbsnews import (
     CBSNewsIE,
     CBSNewsLiveVideoIE,
 )
 from .cbssports import CBSSportsIE
 from .ccc import CCCIE
+from .cda import CDAIE
 from .ceskatelevize import CeskaTelevizeIE
 from .channel9 import Channel9IE
 from .chaturbate import ChaturbateIE

@@ -124,7 +129,7 @@ from .cloudy import CloudyIE
 from .clubic import ClubicIE
 from .clyp import ClypIE
 from .cmt import CMTIE
-from .cnet import CNETIE
+from .cnbc import CNBCIE
 from .cnn import (
     CNNIE,
     CNNBlogsIE,

@@ -135,6 +140,7 @@ from .collegerama import CollegeRamaIE
 from .comedycentral import ComedyCentralIE, ComedyCentralShowsIE
 from .comcarcoff import ComCarCoffIE
 from .commonmistakes import CommonMistakesIE, UnicodeBOMIE
+from .commonprotocols import RtmpIE
 from .condenast import CondeNastIE
 from .cracked import CrackedIE
 from .crackle import CrackleIE

@@ -282,6 +288,7 @@ from .goshgay import GoshgayIE
 from .gputechconf import GPUTechConfIE
 from .groupon import GrouponIE
 from .hark import HarkIE
+from .hbo import HBOIE
 from .hearthisat import HearThisAtIE
 from .heise import HeiseIE
 from .hellporno import HellPornoIE

@@ -405,6 +412,7 @@ from .mit import TechTVMITIE, MITIE, OCWMITIE
 from .mitele import MiTeleIE
 from .mixcloud import MixcloudIE
 from .mlb import MLBIE
+from .mnet import MnetIE
 from .mpora import MporaIE
 from .moevideo import MoeVideoIE
 from .mofosex import MofosexIE

@@ -431,10 +439,14 @@ from .myspass import MySpassIE
 from .myvi import MyviIE
 from .myvideo import MyVideoIE
 from .myvidster import MyVidsterIE
-from .nationalgeographic import NationalGeographicIE
+from .nationalgeographic import (
+    NationalGeographicIE,
+    NationalGeographicChannelIE,
+)
 from .naver import NaverIE
 from .nba import NBAIE
 from .nbc import (
+    CSNNEIE,
     NBCIE,
     NBCNewsIE,
     NBCSportsIE,

@@ -530,6 +542,7 @@ from .ooyala import (
     OoyalaIE,
     OoyalaExternalIE,
 )
+from .openload import OpenloadIE
 from .ora import OraTVIE
 from .orf import (
     ORFTVthekIE,

@@ -625,6 +638,7 @@ from .ruutu import RuutuIE
 from .sandia import SandiaIE
 from .safari import (
     SafariIE,
+    SafariApiIE,
     SafariCourseIE,
 )
 from .sapo import SapoIE

@@ -727,7 +741,6 @@ from .telecinco import TelecincoIE
 from .telegraaf import TelegraafIE
 from .telemb import TeleMBIE
 from .teletask import TeleTaskIE
-from .tenplay import TenPlayIE
 from .testurl import TestURLIE
 from .tf1 import TF1IE
 from .theintercept import TheInterceptIE

@@ -736,7 +749,9 @@ from .theplatform import (
     ThePlatformIE,
     ThePlatformFeedIE,
 )
+from .thescene import TheSceneIE
 from .thesixtyone import TheSixtyOneIE
+from .thestar import TheStarIE
 from .thisamericanlife import ThisAmericanLifeIE
 from .thisav import ThisAVIE
 from .tinypic import TinyPicIE

@@ -783,6 +798,7 @@ from .tv2 import (
     TV2IE,
     TV2ArticleIE,
 )
+from .tv3 import TV3IE
 from .tv4 import TV4IE
 from .tvc import (
     TVCIE,

@@ -890,6 +906,7 @@ from .vk import (
 from .vlive import VLiveIE
 from .vodlocker import VodlockerIE
 from .voicerepublic import VoiceRepublicIE
+from .voxmedia import VoxMediaIE
 from .vporn import VpornIE
 from .vrt import VRTIE
 from .vube import VubeIE

@@ -951,7 +968,9 @@ from .youtube import (
     YoutubeChannelIE,
     YoutubeFavouritesIE,
     YoutubeHistoryIE,
+    YoutubeLiveIE,
     YoutubePlaylistIE,
+    YoutubePlaylistsIE,
     YoutubeRecommendedIE,
     YoutubeSearchDateIE,
     YoutubeSearchIE,

@@ -961,7 +980,6 @@ from .youtube import (
     YoutubeTruncatedIDIE,
     YoutubeTruncatedURLIE,
     YoutubeUserIE,
-    YoutubePlaylistsIE,
     YoutubeWatchLaterIE,
 )
 from .zapiks import ZapiksIE

--- a/youtube_dl/extractor/abc.py
+++ b/youtube_dl/extractor/abc.py

@@ -12,7 +12,7 @@ from ..utils import (
 class ABCIE(InfoExtractor):
     IE_NAME = 'abc.net.au'
-    _VALID_URL = r'http://www\.abc\.net\.au/news/(?:[^/]+/){1,2}(?P<id>\d+)'
+    _VALID_URL = r'https?://www\.abc\.net\.au/news/(?:[^/]+/){1,2}(?P<id>\d+)'
 
     _TESTS = [{
         'url': 'http://www.abc.net.au/news/2014-11-05/australia-to-staff-ebola-treatment-centre-in-sierra-leone/5868334',

--- a/youtube_dl/extractor/abc7news.py
+++ b/youtube_dl/extractor/abc7news.py

@@ -44,6 +44,7 @@ class Abc7NewsIE(InfoExtractor):
             'contentURL', webpage, 'm3u8 url', fatal=True)
 
         formats = self._extract_m3u8_formats(m3u8, display_id, 'mp4')
+        self._sort_formats(formats)
 
         title = self._og_search_title(webpage).strip()
         description = self._og_search_description(webpage).strip()

--- a/youtube_dl/extractor/addanime.py
+++ b/youtube_dl/extractor/addanime.py

@@ -6,7 +6,7 @@ from .common import InfoExtractor
 from ..compat import (
     compat_HTTPError,
     compat_str,
-    compat_urllib_parse,
+    compat_urllib_parse_urlencode,
     compat_urllib_parse_urlparse,
 )
 from ..utils import (

@@ -16,7 +16,7 @@ from ..utils import (
 class AddAnimeIE(InfoExtractor):
-    _VALID_URL = r'http://(?:\w+\.)?add-anime\.net/(?:watch_video\.php\?(?:.*?)v=|video/)(?P<id>[\w_]+)'
+    _VALID_URL = r'https?://(?:\w+\.)?add-anime\.net/(?:watch_video\.php\?(?:.*?)v=|video/)(?P<id>[\w_]+)'
     _TESTS = [{
         'url': 'http://www.add-anime.net/watch_video.php?v=24MR3YO5SAS9',
         'md5': '72954ea10bc979ab5e2eb288b21425a0',

@@ -60,7 +60,7 @@ class AddAnimeIE(InfoExtractor):
             confirm_url = (
                 parsed_url.scheme + '://' + parsed_url.netloc +
                 action + '?' +
-                compat_urllib_parse.urlencode({
+                compat_urllib_parse_urlencode({
                     'jschl_vc': vc, 'jschl_answer': compat_str(av_val)}))
             self._download_webpage(
                 confirm_url, video_id,

--- a/youtube_dl/extractor/aenetworks.py
+++ b/youtube_dl/extractor/aenetworks.py

@@ -1,13 +1,19 @@
 from __future__ import unicode_literals
 
+import re
+
 from .common import InfoExtractor
-from ..utils import smuggle_url
+from ..utils import (
+    smuggle_url,
+    update_url_query,
+    unescapeHTML,
+)
 
 class AENetworksIE(InfoExtractor):
     IE_NAME = 'aenetworks'
     IE_DESC = 'A+E Networks: A&E, Lifetime, History.com, FYI Network'
-    _VALID_URL = r'https?://(?:www\.)?(?:(?:history|aetv|mylifetime)\.com|fyi\.tv)/(?:[^/]+/)+(?P<id>[^/]+?)(?:$|[?#])'
+    _VALID_URL = r'https?://(?:www\.)?(?:(?:history|aetv|mylifetime)\.com|fyi\.tv)/(?P<type>[^/]+)/(?:[^/]+/)+(?P<id>[^/]+?)(?:$|[?#])'
 
     _TESTS = [{
         'url': 'http://www.history.com/topics/valentines-day/history-of-valentines-day/videos/bet-you-didnt-know-valentines-day?m=528e394da93ae&s=undefined&f=1&free=false',

@@ -16,6 +22,9 @@ class AENetworksIE(InfoExtractor):
             'ext': 'mp4',
             'title': "Bet You Didn't Know: Valentine's Day",
             'description': 'md5:7b57ea4829b391995b405fa60bd7b5f7',
+            'timestamp': 1375819729,
+            'upload_date': '20130806',
+            'uploader': 'AENE-NEW',
         },
         'params': {
             # m3u8 download

@@ -25,15 +34,15 @@ class AENetworksIE(InfoExtractor):
         'expected_warnings': ['JSON-LD'],
     }, {
         'url': 'http://www.history.com/shows/mountain-men/season-1/episode-1',
+        'md5': '8ff93eb073449f151d6b90c0ae1ef0c7',
         'info_dict': {
             'id': 'eg47EERs_JsZ',
             'ext': 'mp4',
             'title': 'Winter Is Coming',
             'description': 'md5:641f424b7a19d8e24f26dea22cf59d74',
-        },
-        'params': {
-            # m3u8 download
-            'skip_download': True,
+            'timestamp': 1338306241,
+            'upload_date': '20120529',
+            'uploader': 'AENE-NEW',
         },
         'add_ie': ['ThePlatform'],
     }, {

@@ -48,7 +57,7 @@ class AENetworksIE(InfoExtractor):
     }]
 
     def _real_extract(self, url):
-        video_id = self._match_id(url)
+        page_type, video_id = re.match(self._VALID_URL, url).groups()
 
         webpage = self._download_webpage(url, video_id)

@@ -56,11 +65,23 @@ class AENetworksIE(InfoExtractor):
             r'data-href="[^"]*/%s"[^>]+data-release-url="([^"]+)"' % video_id,
             r"media_url\s*=\s*'([^']+)'"
         ]
-        video_url = self._search_regex(video_url_re, webpage, 'video url')
+        video_url = unescapeHTML(self._search_regex(video_url_re, webpage, 'video url'))
+        query = {'mbr': 'true'}
+        if page_type == 'shows':
+            query['assetTypes'] = 'medium_video_s3'
+        if 'switch=hds' in video_url:
+            query['switch'] = 'hls'
 
         info = self._search_json_ld(webpage, video_id, fatal=False)
         info.update({
             '_type': 'url_transparent',
-            'url': smuggle_url(video_url, {'sig': {'key': 'crazyjava', 'secret': 's3cr3t'}}),
+            'url': smuggle_url(
+                update_url_query(video_url, query),
+                {
+                    'sig': {
+                        'key': 'crazyjava',
+                        'secret': 's3cr3t'},
+                    'force_smil_url': True
+                }),
         })
 
         return info
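The rewritten extractor builds the final URL by merging parameters into whatever query the page's media URL already carries; `update_url_query` (from youtube_dl/utils.py) is assumed here to merge a dict into a URL's query string, e.g.:

```python
from youtube_dl.utils import update_url_query

# New parameters are merged in; existing ones (like switch=hds) are replaced.
print(update_url_query(
    'http://link.theplatform.com/s/path/media.smil?switch=hds',
    {'mbr': 'true', 'switch': 'hls'}))
# e.g. 'http://link.theplatform.com/s/path/media.smil?switch=hls&mbr=true'
```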

--- a/youtube_dl/extractor/aftonbladet.py
+++ b/youtube_dl/extractor/aftonbladet.py

@@ -6,7 +6,7 @@ from ..utils import int_or_none
 class AftonbladetIE(InfoExtractor):
-    _VALID_URL = r'http://tv\.aftonbladet\.se/abtv/articles/(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://tv\.aftonbladet\.se/abtv/articles/(?P<id>[0-9]+)'
     _TEST = {
         'url': 'http://tv.aftonbladet.se/abtv/articles/36015',
         'info_dict': {

--- a/youtube_dl/extractor/aljazeera.py
+++ b/youtube_dl/extractor/aljazeera.py

@@ -4,7 +4,7 @@ from .common import InfoExtractor
 class AlJazeeraIE(InfoExtractor):
-    _VALID_URL = r'http://www\.aljazeera\.com/programmes/.*?/(?P<id>[^/]+)\.html'
+    _VALID_URL = r'https?://www\.aljazeera\.com/programmes/.*?/(?P<id>[^/]+)\.html'
 
     _TEST = {
         'url': 'http://www.aljazeera.com/programmes/the-slum/2014/08/deliverance-201482883754237240.html',

@@ -13,24 +13,18 @@ class AlJazeeraIE(InfoExtractor):
             'ext': 'mp4',
             'title': 'The Slum - Episode 1: Deliverance',
             'description': 'As a birth attendant advocating for family planning, Remy is on the frontline of Tondo\'s battle with overcrowding.',
-            'uploader': 'Al Jazeera English',
+            'uploader_id': '665003303001',
+            'timestamp': 1411116829,
+            'upload_date': '20140919',
         },
-        'add_ie': ['BrightcoveLegacy'],
+        'add_ie': ['BrightcoveNew'],
         'skip': 'Not accessible from Travis CI server',
     }
+    BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/665003303001/default_default/index.html?videoId=%s'
 
     def _real_extract(self, url):
         program_name = self._match_id(url)
         webpage = self._download_webpage(url, program_name)
         brightcove_id = self._search_regex(
             r'RenderPagesVideo\(\'(.+?)\'', webpage, 'brightcove id')
-
-        return {
-            '_type': 'url',
-            'url': (
-                'brightcove:'
-                'playerKey=AQ~~%2CAAAAmtVJIFk~%2CTVGOQ5ZTwJbeMWnq5d_H4MOM57xfzApc'
-                '&%40videoPlayer={0}'.format(brightcove_id)
-            ),
-            'ie_key': 'BrightcoveLegacy',
-        }
+        return self.url_result(self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, 'BrightcoveNew', brightcove_id)


@ -69,12 +69,14 @@ class AMPIE(InfoExtractor):
self._sort_formats(formats) self._sort_formats(formats)
timestamp = parse_iso8601(item.get('pubDate'), ' ') or parse_iso8601(item.get('dc-date'))
return { return {
'id': video_id, 'id': video_id,
'title': get_media_node('title'), 'title': get_media_node('title'),
'description': get_media_node('description'), 'description': get_media_node('description'),
'thumbnails': thumbnails, 'thumbnails': thumbnails,
'timestamp': parse_iso8601(item.get('pubDate'), ' '), 'timestamp': timestamp,
'duration': int_or_none(media_content[0].get('@attributes', {}).get('duration')), 'duration': int_or_none(media_content[0].get('@attributes', {}).get('duration')),
'subtitles': subtitles, 'subtitles': subtitles,
'formats': formats, 'formats': formats,
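
The timestamp fallback above leans on parse_iso8601's delimiter argument, since pubDate separates date and time with a space rather than the ISO 'T'. A rough equivalent for that space-delimited case, assuming UTC and ignoring the offset suffixes the real helper also handles:

import calendar
import datetime

def parse_iso8601_space(date_str):
    if not date_str:
        return None
    dt = datetime.datetime.strptime(date_str[:19], '%Y-%m-%d %H:%M:%S')
    return calendar.timegm(dt.timetuple())  # epoch seconds, treated as UTC

print(parse_iso8601_space('2015-04-01 13:17:27'))  # 1427894247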


@ -3,10 +3,13 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urlparse from ..compat import (
compat_urlparse,
compat_str,
)
from ..utils import ( from ..utils import (
determine_ext, determine_ext,
encode_dict, extract_attributes,
ExtractorError, ExtractorError,
sanitized_Request, sanitized_Request,
urlencode_postdata, urlencode_postdata,
@ -18,7 +21,7 @@ class AnimeOnDemandIE(InfoExtractor):
_LOGIN_URL = 'https://www.anime-on-demand.de/users/sign_in' _LOGIN_URL = 'https://www.anime-on-demand.de/users/sign_in'
_APPLY_HTML5_URL = 'https://www.anime-on-demand.de/html5apply' _APPLY_HTML5_URL = 'https://www.anime-on-demand.de/html5apply'
_NETRC_MACHINE = 'animeondemand' _NETRC_MACHINE = 'animeondemand'
_TEST = { _TESTS = [{
'url': 'https://www.anime-on-demand.de/anime/161', 'url': 'https://www.anime-on-demand.de/anime/161',
'info_dict': { 'info_dict': {
'id': '161', 'id': '161',
@ -26,7 +29,19 @@ class AnimeOnDemandIE(InfoExtractor):
'description': 'md5:6681ce3c07c7189d255ac6ab23812d31', 'description': 'md5:6681ce3c07c7189d255ac6ab23812d31',
}, },
'playlist_mincount': 4, 'playlist_mincount': 4,
} }, {
# Film wording is used instead of Episode
'url': 'https://www.anime-on-demand.de/anime/39',
'only_matching': True,
}, {
# Episodes without titles
'url': 'https://www.anime-on-demand.de/anime/162',
'only_matching': True,
}, {
# ger/jap, Dub/OmU, account required
'url': 'https://www.anime-on-demand.de/anime/169',
'only_matching': True,
}]
def _login(self): def _login(self):
(username, password) = self._get_login_info() (username, password) = self._get_login_info()
@ -36,6 +51,10 @@ class AnimeOnDemandIE(InfoExtractor):
login_page = self._download_webpage( login_page = self._download_webpage(
self._LOGIN_URL, None, 'Downloading login page') self._LOGIN_URL, None, 'Downloading login page')
if '>Our licensing terms allow the distribution of animes only to German-speaking countries of Europe' in login_page:
self.raise_geo_restricted(
'%s is only available in German-speaking countries of Europe' % self.IE_NAME)
login_form = self._form_hidden_inputs('new_user', login_page) login_form = self._form_hidden_inputs('new_user', login_page)
login_form.update({ login_form.update({
@ -51,7 +70,7 @@ class AnimeOnDemandIE(InfoExtractor):
post_url = compat_urlparse.urljoin(self._LOGIN_URL, post_url) post_url = compat_urlparse.urljoin(self._LOGIN_URL, post_url)
request = sanitized_Request( request = sanitized_Request(
post_url, urlencode_postdata(encode_dict(login_form))) post_url, urlencode_postdata(login_form))
request.add_header('Referer', self._LOGIN_URL) request.add_header('Referer', self._LOGIN_URL)
response = self._download_webpage( response = self._download_webpage(
@ -91,14 +110,22 @@ class AnimeOnDemandIE(InfoExtractor):
entries = [] entries = []
for episode_html in re.findall(r'(?s)<h3[^>]+class="episodebox-title".+?>Episodeninhalt<', webpage): for num, episode_html in enumerate(re.findall(
m = re.search( r'(?s)<h3[^>]+class="episodebox-title".+?>Episodeninhalt<', webpage), 1):
r'class="episodebox-title"[^>]+title="Episode (?P<number>\d+) - (?P<title>.+?)"', episode_html) episodebox_title = self._search_regex(
if not m: (r'class="episodebox-title"[^>]+title=(["\'])(?P<title>.+?)\1',
r'class="episodebox-title"[^>]+>(?P<title>.+?)<'),
episode_html, 'episodebox title', default=None, group='title')
if not episodebox_title:
continue continue
episode_number = int(m.group('number')) episode_number = int(self._search_regex(
episode_title = m.group('title') r'(?:Episode|Film)\s*(\d+)',
episodebox_title, 'episode number', default=num))
episode_title = self._search_regex(
r'(?:Episode|Film)\s*\d+\s*-\s*(.+)',
episodebox_title, 'episode title', default=None)
video_id = 'episode-%d' % episode_number video_id = 'episode-%d' % episode_number
common_info = { common_info = {
@ -110,10 +137,34 @@ class AnimeOnDemandIE(InfoExtractor):
formats = [] formats = []
playlist_url = self._search_regex( for input_ in re.findall(
r'data-playlist=(["\'])(?P<url>.+?)\1', r'<input[^>]+class=["\'].*?streamstarter_html5[^>]+>', episode_html):
episode_html, 'data playlist', default=None, group='url') attributes = extract_attributes(input_)
if playlist_url: playlist_urls = []
for playlist_key in ('data-playlist', 'data-otherplaylist'):
playlist_url = attributes.get(playlist_key)
if isinstance(playlist_url, compat_str) and re.match(
r'/?[\da-zA-Z]+', playlist_url):
playlist_urls.append(attributes[playlist_key])
if not playlist_urls:
continue
lang = attributes.get('data-lang')
lang_note = attributes.get('value')
for playlist_url in playlist_urls:
kind = self._search_regex(
r'videomaterialurl/\d+/([^/]+)/',
playlist_url, 'media kind', default=None)
format_id_list = []
if lang:
format_id_list.append(lang)
if kind:
format_id_list.append(kind)
if not format_id_list:
format_id_list.append(compat_str(num))
format_id = '-'.join(format_id_list)
format_note = ', '.join(filter(None, (kind, lang_note)))
request = sanitized_Request( request = sanitized_Request(
compat_urlparse.urljoin(url, playlist_url), compat_urlparse.urljoin(url, playlist_url),
headers={ headers={
@ -122,21 +173,50 @@ class AnimeOnDemandIE(InfoExtractor):
'Referer': url, 'Referer': url,
'Accept': 'application/json, text/javascript, */*; q=0.01', 'Accept': 'application/json, text/javascript, */*; q=0.01',
}) })
playlist = self._download_json( playlist = self._download_json(
request, video_id, 'Downloading playlist JSON', fatal=False) request, video_id, 'Downloading %s playlist JSON' % format_id,
if playlist: fatal=False)
playlist = playlist['playlist'][0] if not playlist:
title = playlist['title'] continue
start_video = playlist.get('startvideo', 0)
playlist = playlist.get('playlist')
if not playlist or not isinstance(playlist, list):
continue
playlist = playlist[start_video]
title = playlist.get('title')
if not title:
continue
description = playlist.get('description') description = playlist.get('description')
for source in playlist.get('sources', []): for source in playlist.get('sources', []):
file_ = source.get('file') file_ = source.get('file')
if file_ and determine_ext(file_) == 'm3u8': if not file_:
formats = self._extract_m3u8_formats( continue
ext = determine_ext(file_)
format_id_list = [lang, kind]
if ext == 'm3u8':
format_id_list.append('hls')
elif source.get('type') == 'video/dash' or ext == 'mpd':
format_id_list.append('dash')
format_id = '-'.join(filter(None, format_id_list))
if ext == 'm3u8':
file_formats = self._extract_m3u8_formats(
file_, video_id, 'mp4', file_, video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls') entry_protocol='m3u8_native', m3u8_id=format_id, fatal=False)
elif source.get('type') == 'video/dash' or ext == 'mpd':
continue
file_formats = self._extract_mpd_formats(
file_, video_id, mpd_id=format_id, fatal=False)
else:
continue
for f in file_formats:
f.update({
'language': lang,
'format_note': format_note,
})
formats.extend(file_formats)
if formats: if formats:
self._sort_formats(formats)
f = common_info.copy() f = common_info.copy()
f.update({ f.update({
'title': title, 'title': title,
@ -145,6 +225,8 @@ class AnimeOnDemandIE(InfoExtractor):
}) })
entries.append(f) entries.append(f)
# Extract teaser only when full episode is not available
if not formats:
m = re.search( m = re.search(
r'data-dialog-header=(["\'])(?P<title>.+?)\1[^>]+href=(["\'])(?P<href>.+?)\3[^>]*>Teaser<', r'data-dialog-header=(["\'])(?P<title>.+?)\1[^>]+href=(["\'])(?P<href>.+?)\3[^>]*>Teaser<',
episode_html) episode_html)
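
A worked example of the format_id / format_note assembly introduced above (the values are hypothetical, shaped like the data-lang attribute, the videomaterialurl kind segment and the input's value attribute that the code reads):

lang = 'ger'                     # from the input's data-lang attribute
kind = 'dub'                     # from .../videomaterialurl/<id>/<kind>/...
lang_note = 'Deutsche Fassung'   # from the input's value attribute

format_id = '-'.join(filter(None, [lang, kind]))
format_note = ', '.join(filter(None, (kind, lang_note)))
print('%s | %s' % (format_id, format_note))  # ger-dub | dub, Deutsche Fassung
print('-'.join(filter(None, [lang, kind, 'hls'])))  # ger-dub-hls, the per-source id for m3u8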


@ -5,7 +5,7 @@ from .common import InfoExtractor
class AolIE(InfoExtractor): class AolIE(InfoExtractor):
IE_NAME = 'on.aol.com' IE_NAME = 'on.aol.com'
_VALID_URL = r'(?:aol-video:|http://on\.aol\.com/video/.*-)(?P<id>[0-9]+)(?:$|\?)' _VALID_URL = r'(?:aol-video:|https?://on\.aol\.com/video/.*-)(?P<id>[0-9]+)(?:$|\?)'
_TESTS = [{ _TESTS = [{
'url': 'http://on.aol.com/video/u-s--official-warns-of-largest-ever-irs-phone-scam-518167793?icid=OnHomepageC2Wide_MustSee_Img', 'url': 'http://on.aol.com/video/u-s--official-warns-of-largest-ever-irs-phone-scam-518167793?icid=OnHomepageC2Wide_MustSee_Img',
@ -25,7 +25,7 @@ class AolIE(InfoExtractor):
class AolFeaturesIE(InfoExtractor): class AolFeaturesIE(InfoExtractor):
IE_NAME = 'features.aol.com' IE_NAME = 'features.aol.com'
_VALID_URL = r'http://features\.aol\.com/video/(?P<id>[^/?#]+)' _VALID_URL = r'https?://features\.aol\.com/video/(?P<id>[^/?#]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://features.aol.com/video/behind-secret-second-careers-late-night-talk-show-hosts', 'url': 'http://features.aol.com/video/behind-secret-second-careers-late-night-talk-show-hosts',


@ -23,7 +23,7 @@ from ..utils import (
class ArteTvIE(InfoExtractor): class ArteTvIE(InfoExtractor):
_VALID_URL = r'http://videos\.arte\.tv/(?P<lang>fr|de|en|es)/.*-(?P<id>.*?)\.html' _VALID_URL = r'https?://videos\.arte\.tv/(?P<lang>fr|de|en|es)/.*-(?P<id>.*?)\.html'
IE_NAME = 'arte.tv' IE_NAME = 'arte.tv'
def _real_extract(self, url): def _real_extract(self, url):


@ -6,16 +6,14 @@ import hashlib
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import compat_str
compat_str,
compat_urllib_parse,
)
from ..utils import ( from ..utils import (
int_or_none,
float_or_none,
sanitized_Request,
xpath_text,
ExtractorError, ExtractorError,
float_or_none,
int_or_none,
sanitized_Request,
urlencode_postdata,
xpath_text,
) )
@ -86,7 +84,7 @@ class AtresPlayerIE(InfoExtractor):
} }
request = sanitized_Request( request = sanitized_Request(
self._LOGIN_URL, compat_urllib_parse.urlencode(login_form).encode('utf-8')) self._LOGIN_URL, urlencode_postdata(login_form))
request.add_header('Content-Type', 'application/x-www-form-urlencoded') request.add_header('Content-Type', 'application/x-www-form-urlencoded')
response = self._download_webpage( response = self._download_webpage(
request, None, 'Logging in as %s' % username) request, None, 'Logging in as %s' % username)
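
urlencode_postdata, which replaces the open-coded compat_urllib_parse.urlencode(...).encode('utf-8') calls here and in the extractors below, is essentially a byte-encoding wrapper so the result can be passed as a POST body. A rough model with stdlib names (the form fields are hypothetical):

try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

def urlencode_postdata(fields):
    return urlencode(fields).encode('ascii')  # Request data= expects bytes

print(urlencode_postdata({'username': 'user@example.com', 'password': 'hunter2'}))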


@ -98,7 +98,7 @@ class AzubuIE(InfoExtractor):
class AzubuLiveIE(InfoExtractor): class AzubuLiveIE(InfoExtractor):
_VALID_URL = r'http://www.azubu.tv/(?P<id>[^/]+)$' _VALID_URL = r'https?://www.azubu.tv/(?P<id>[^/]+)$'
_TEST = { _TEST = {
'url': 'http://www.azubu.tv/MarsTVMDLen', 'url': 'http://www.azubu.tv/MarsTVMDLen',
@ -120,6 +120,7 @@ class AzubuLiveIE(InfoExtractor):
bc_info = self._download_json(req, user) bc_info = self._download_json(req, user)
m3u8_url = next(source['src'] for source in bc_info['sources'] if source['container'] == 'M2TS') m3u8_url = next(source['src'] for source in bc_info['sources'] if source['container'] == 'M2TS')
formats = self._extract_m3u8_formats(m3u8_url, user, ext='mp4') formats = self._extract_m3u8_formats(m3u8_url, user, ext='mp4')
self._sort_formats(formats)
return { return {
'id': info['id'], 'id': info['id'],


@ -9,7 +9,7 @@ from ..utils import unescapeHTML
class BaiduVideoIE(InfoExtractor): class BaiduVideoIE(InfoExtractor):
IE_DESC = '百度视频' IE_DESC = '百度视频'
_VALID_URL = r'http://v\.baidu\.com/(?P<type>[a-z]+)/(?P<id>\d+)\.htm' _VALID_URL = r'https?://v\.baidu\.com/(?P<type>[a-z]+)/(?P<id>\d+)\.htm'
_TESTS = [{ _TESTS = [{
'url': 'http://v.baidu.com/comic/1069.htm?frp=bdbrand&q=%E4%B8%AD%E5%8D%8E%E5%B0%8F%E5%BD%93%E5%AE%B6', 'url': 'http://v.baidu.com/comic/1069.htm?frp=bdbrand&q=%E4%B8%AD%E5%8D%8E%E5%B0%8F%E5%BD%93%E5%AE%B6',
'info_dict': { 'info_dict': {


@ -4,15 +4,13 @@ import re
import itertools import itertools
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import compat_str
compat_urllib_parse,
compat_str,
)
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
int_or_none,
float_or_none, float_or_none,
int_or_none,
sanitized_Request, sanitized_Request,
urlencode_postdata,
) )
@ -58,7 +56,7 @@ class BambuserIE(InfoExtractor):
} }
request = sanitized_Request( request = sanitized_Request(
self._LOGIN_URL, compat_urllib_parse.urlencode(login_form).encode('utf-8')) self._LOGIN_URL, urlencode_postdata(login_form))
request.add_header('Referer', self._LOGIN_URL) request.add_header('Referer', self._LOGIN_URL)
response = self._download_webpage( response = self._download_webpage(
request, None, 'Logging in as %s' % username) request, None, 'Logging in as %s' % username)


@ -328,6 +328,7 @@ class BBCCoUkIE(InfoExtractor):
'format_id': '%s_%s' % (service, format['format_id']), 'format_id': '%s_%s' % (service, format['format_id']),
'abr': abr, 'abr': abr,
'acodec': acodec, 'acodec': acodec,
'vcodec': 'none',
}) })
formats.extend(conn_formats) formats.extend(conn_formats)
return formats return formats
@ -688,6 +689,10 @@ class BBCIE(BBCCoUkIE):
# custom redirection to www.bbc.com # custom redirection to www.bbc.com
'url': 'http://www.bbc.co.uk/news/science-environment-33661876', 'url': 'http://www.bbc.co.uk/news/science-environment-33661876',
'only_matching': True, 'only_matching': True,
}, {
# single video article embedded with data-media-vpid
'url': 'http://www.bbc.co.uk/sport/rowing/35908187',
'only_matching': True,
}] }]
@classmethod @classmethod
@ -817,7 +822,7 @@ class BBCIE(BBCCoUkIE):
# single video story (e.g. http://www.bbc.com/travel/story/20150625-sri-lankas-spicy-secret) # single video story (e.g. http://www.bbc.com/travel/story/20150625-sri-lankas-spicy-secret)
programme_id = self._search_regex( programme_id = self._search_regex(
[r'data-video-player-vpid="(%s)"' % self._ID_REGEX, [r'data-(?:video-player|media)-vpid="(%s)"' % self._ID_REGEX,
r'<param[^>]+name="externalIdentifier"[^>]+value="(%s)"' % self._ID_REGEX, r'<param[^>]+name="externalIdentifier"[^>]+value="(%s)"' % self._ID_REGEX,
r'videoId\s*:\s*["\'](%s)["\']' % self._ID_REGEX], r'videoId\s*:\s*["\'](%s)["\']' % self._ID_REGEX],
webpage, 'vpid', default=None) webpage, 'vpid', default=None)
@ -942,7 +947,7 @@ class BBCIE(BBCCoUkIE):
class BBCCoUkArticleIE(InfoExtractor): class BBCCoUkArticleIE(InfoExtractor):
_VALID_URL = 'http://www.bbc.co.uk/programmes/articles/(?P<id>[a-zA-Z0-9]+)' _VALID_URL = r'https?://www.bbc.co.uk/programmes/articles/(?P<id>[a-zA-Z0-9]+)'
IE_NAME = 'bbc.co.uk:article' IE_NAME = 'bbc.co.uk:article'
IE_DESC = 'BBC articles' IE_DESC = 'BBC articles'


@ -34,7 +34,7 @@ class BeegIE(InfoExtractor):
video_id = self._match_id(url) video_id = self._match_id(url)
video = self._download_json( video = self._download_json(
'https://api.beeg.com/api/v5/video/%s' % video_id, video_id) 'https://api.beeg.com/api/v6/1738/video/%s' % video_id, video_id)
def split(o, e): def split(o, e):
def cut(s, x): def cut(s, x):
@ -50,8 +50,8 @@ class BeegIE(InfoExtractor):
return n return n
def decrypt_key(key): def decrypt_key(key):
# Reverse engineered from http://static.beeg.com/cpl/1105.js # Reverse engineered from http://static.beeg.com/cpl/1738.js
a = '5ShMcIQlssOd7zChAIOlmeTZDaUxULbJRnywYaiB' a = 'GUuyodcfS8FW8gQp4OKLMsZBcX0T7B'
e = compat_urllib_parse_unquote(key) e = compat_urllib_parse_unquote(key)
o = ''.join([ o = ''.join([
compat_chr(compat_ord(e[n]) - compat_ord(a[n % len(a)]) % 21) compat_chr(compat_ord(e[n]) - compat_ord(a[n % len(a)]) % 21)
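
A standalone sketch of the character-shift stage shown in this hunk (the surrounding split/cut helpers that reassemble the result are unchanged). Note the operator precedence: % binds tighter than -, so each character is shifted down by ord(secret_char) % 21, and the secret rotates with the cpl/*.js revision:

try:
    from urllib.parse import unquote  # stands in for compat_urllib_parse_unquote
except ImportError:  # Python 2
    from urllib import unquote

SECRET = 'GUuyodcfS8FW8gQp4OKLMsZBcX0T7B'  # from static.beeg.com/cpl/1738.js

def shift_key(key):
    e = unquote(key)
    return ''.join(
        chr(ord(e[n]) - ord(SECRET[n % len(SECRET)]) % 21)  # - binds after %
        for n in range(len(e)))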


@ -8,7 +8,7 @@ from ..utils import url_basename
class BehindKinkIE(InfoExtractor): class BehindKinkIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?behindkink\.com/(?P<year>[0-9]{4})/(?P<month>[0-9]{2})/(?P<day>[0-9]{2})/(?P<id>[^/#?_]+)' _VALID_URL = r'https?://(?:www\.)?behindkink\.com/(?P<year>[0-9]{4})/(?P<month>[0-9]{2})/(?P<day>[0-9]{2})/(?P<id>[^/#?_]+)'
_TEST = { _TEST = {
'url': 'http://www.behindkink.com/2014/12/05/what-are-you-passionate-about-marley-blaze/', 'url': 'http://www.behindkink.com/2014/12/05/what-are-you-passionate-about-marley-blaze/',
'md5': '507b57d8fdcd75a41a9a7bdb7989c762', 'md5': '507b57d8fdcd75a41a9a7bdb7989c762',


@ -94,6 +94,7 @@ class BetIE(InfoExtractor):
xpath_with_ns('./media:thumbnail', NS_MAP)).get('url') xpath_with_ns('./media:thumbnail', NS_MAP)).get('url')
formats = self._extract_smil_formats(smil_url, display_id) formats = self._extract_smil_formats(smil_url, display_id)
self._sort_formats(formats)
return { return {
'id': video_id, 'id': video_id,


@ -14,7 +14,7 @@ from ..utils import (
class BiliBiliIE(InfoExtractor): class BiliBiliIE(InfoExtractor):
_VALID_URL = r'http://www\.bilibili\.(?:tv|com)/video/av(?P<id>\d+)(?:/index_(?P<page_num>\d+).html)?' _VALID_URL = r'https?://www\.bilibili\.(?:tv|com)/video/av(?P<id>\d+)(?:/index_(?P<page_num>\d+).html)?'
_TESTS = [{ _TESTS = [{
'url': 'http://www.bilibili.tv/video/av1074402/', 'url': 'http://www.bilibili.tv/video/av1074402/',


@ -0,0 +1,86 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import remove_end
class BioBioChileTVIE(InfoExtractor):
_VALID_URL = r'https?://tv\.biobiochile\.cl/notas/(?:[^/]+/)+(?P<id>[^/]+)\.shtml'
_TESTS = [{
'url': 'http://tv.biobiochile.cl/notas/2015/10/21/sobre-camaras-y-camarillas-parlamentarias.shtml',
'md5': '26f51f03cf580265defefb4518faec09',
'info_dict': {
'id': 'sobre-camaras-y-camarillas-parlamentarias',
'ext': 'mp4',
'title': 'Sobre Cámaras y camarillas parlamentarias',
'thumbnail': 're:^https?://.*\.jpg$',
'uploader': 'Fernando Atria',
},
}, {
# different uploader layout
'url': 'http://tv.biobiochile.cl/notas/2016/03/18/natalia-valdebenito-repasa-a-diputado-hasbun-paso-a-la-categoria-de-hablar-brutalidades.shtml',
'md5': 'edc2e6b58974c46d5b047dea3c539ff3',
'info_dict': {
'id': 'natalia-valdebenito-repasa-a-diputado-hasbun-paso-a-la-categoria-de-hablar-brutalidades',
'ext': 'mp4',
'title': 'Natalia Valdebenito repasa a diputado Hasbún: Pasó a la categoría de hablar brutalidades',
'thumbnail': 're:^https?://.*\.jpg$',
'uploader': 'Piangella Obrador',
},
'params': {
'skip_download': True,
},
}, {
'url': 'http://tv.biobiochile.cl/notas/2015/10/22/ninos-transexuales-de-quien-es-la-decision.shtml',
'only_matching': True,
}, {
'url': 'http://tv.biobiochile.cl/notas/2015/10/21/exclusivo-hector-pinto-formador-de-chupete-revela-version-del-ex-delantero-albo.shtml',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = remove_end(self._og_search_title(webpage), ' - BioBioChile TV')
file_url = self._search_regex(
r'loadFWPlayerVideo\([^,]+,\s*(["\'])(?P<url>.+?)\1',
webpage, 'file url', group='url')
base_url = self._search_regex(
r'file\s*:\s*(["\'])(?P<url>.+?)\1\s*\+\s*fileURL', webpage,
'base url', default='http://unlimited2-cl.digitalproserver.com/bbtv/',
group='url')
formats = self._extract_m3u8_formats(
'%s%s/playlist.m3u8' % (base_url, file_url), video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls', fatal=False)
f = {
'url': '%s%s' % (base_url, file_url),
'format_id': 'http',
'protocol': 'http',
'preference': 1,
}
if formats:
f_copy = formats[-1].copy()
f_copy.update(f)
f = f_copy
formats.append(f)
self._sort_formats(formats)
thumbnail = self._og_search_thumbnail(webpage)
uploader = self._html_search_regex(
r'<a[^>]+href=["\']https?://busca\.biobiochile\.cl/author[^>]+>(.+?)</a>',
webpage, 'uploader', fatal=False)
return {
'id': video_id,
'title': title,
'thumbnail': thumbnail,
'uploader': uploader,
'formats': formats,
}
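
The HTTP fallback above clones metadata from the last HLS variant (typically the highest bitrate), so the progressive file sorts with a sensible height/tbr instead of as an unknown-quality format. The pattern in isolation, with toy data:

def add_http_fallback(formats, base_url, file_url):
    f = {'url': base_url + file_url, 'format_id': 'http',
         'protocol': 'http', 'preference': 1}
    if formats:  # _extract_m3u8_formats output
        best = formats[-1].copy()  # inherit height/tbr of the last variant
        best.update(f)
        f = best
    formats.append(f)
    return formats

print(add_http_fallback(
    [{'format_id': 'hls-1200', 'height': 720, 'tbr': 1200, 'url': 'http://example.com/pl.m3u8'}],
    'http://unlimited2-cl.digitalproserver.com/bbtv/', 'clip.mp4'))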


@ -33,7 +33,7 @@ class BokeCCBaseIE(InfoExtractor):
class BokeCCIE(BokeCCBaseIE): class BokeCCIE(BokeCCBaseIE):
_IE_DESC = 'CC视频' _IE_DESC = 'CC视频'
_VALID_URL = r'http://union\.bokecc\.com/playvideo\.bo\?(?P<query>.*)' _VALID_URL = r'https?://union\.bokecc\.com/playvideo\.bo\?(?P<query>.*)'
_TESTS = [{ _TESTS = [{
'url': 'http://union.bokecc.com/playvideo.bo?vid=E44D40C15E65EA30&uid=CD0C5D3C8614B28B', 'url': 'http://union.bokecc.com/playvideo.bo?vid=E44D40C15E65EA30&uid=CD0C5D3C8614B28B',


@ -12,7 +12,7 @@ from ..utils import (
class BpbIE(InfoExtractor): class BpbIE(InfoExtractor):
IE_DESC = 'Bundeszentrale für politische Bildung' IE_DESC = 'Bundeszentrale für politische Bildung'
_VALID_URL = r'http://www\.bpb\.de/mediathek/(?P<id>[0-9]+)/' _VALID_URL = r'https?://www\.bpb\.de/mediathek/(?P<id>[0-9]+)/'
_TEST = { _TEST = {
'url': 'http://www.bpb.de/mediathek/297/joachim-gauck-zu-1989-und-die-erinnerung-an-die-ddr', 'url': 'http://www.bpb.de/mediathek/297/joachim-gauck-zu-1989-und-die-erinnerung-an-die-ddr',


@ -0,0 +1,31 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import smuggle_url
class BravoTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?bravotv\.com/(?:[^/]+/)+videos/(?P<id>[^/?]+)'
_TEST = {
'url': 'http://www.bravotv.com/last-chance-kitchen/season-5/videos/lck-ep-12-fishy-finale',
'md5': 'd60cdf68904e854fac669bd26cccf801',
'info_dict': {
'id': 'LitrBdX64qLn',
'ext': 'mp4',
'title': 'Last Chance Kitchen Returns',
'description': 'S13: Last Chance Kitchen Returns for Top Chef Season 13',
'timestamp': 1448926740,
'upload_date': '20151130',
'uploader': 'NBCU-BRAV',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
account_pid = self._search_regex(r'"account_pid"\s*:\s*"([^"]+)"', webpage, 'account pid')
release_pid = self._search_regex(r'"release_pid"\s*:\s*"([^"]+)"', webpage, 'release pid')
return self.url_result(smuggle_url(
'http://link.theplatform.com/s/%s/%s?mbr=true&switch=progressive' % (account_pid, release_pid),
{'force_smil_url': True}), 'ThePlatform', release_pid)
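
smuggle_url, used above with force_smil_url, piggybacks extractor hints on the URL fragment; the long #__youtubedl_smuggle=... fragments visible in the Brightcove test URLs below are exactly this mechanism. A rough round-trip model:

import json
try:
    from urllib.parse import urlencode, parse_qs  # Python 3
except ImportError:  # Python 2
    from urllib import urlencode
    from urlparse import parse_qs

def smuggle_url(url, data):
    return url + '#' + urlencode({'__youtubedl_smuggle': json.dumps(data)})

def unsmuggle_url(smug_url, default=None):
    if '#__youtubedl_smuggle' not in smug_url:
        return smug_url, default
    url, _, frag = smug_url.partition('#')
    return url, json.loads(parse_qs(frag)['__youtubedl_smuggle'][0])

print(unsmuggle_url(smuggle_url(
    'http://link.theplatform.com/s/%s/%s' % ('account_pid', 'release_pid'),
    {'force_smil_url': True})))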


@ -11,7 +11,7 @@ from ..utils import (
class BreakIE(InfoExtractor): class BreakIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?break\.com/video/(?:[^/]+/)*.+-(?P<id>\d+)' _VALID_URL = r'https?://(?:www\.)?break\.com/video/(?:[^/]+/)*.+-(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.break.com/video/when-girls-act-like-guys-2468056', 'url': 'http://www.break.com/video/when-girls-act-like-guys-2468056',
'info_dict': { 'info_dict': {


@ -9,7 +9,6 @@ from ..compat import (
compat_etree_fromstring, compat_etree_fromstring,
compat_parse_qs, compat_parse_qs,
compat_str, compat_str,
compat_urllib_parse,
compat_urllib_parse_urlparse, compat_urllib_parse_urlparse,
compat_urlparse, compat_urlparse,
compat_xml_parse_error, compat_xml_parse_error,
@ -24,16 +23,16 @@ from ..utils import (
js_to_json, js_to_json,
int_or_none, int_or_none,
parse_iso8601, parse_iso8601,
sanitized_Request,
unescapeHTML, unescapeHTML,
unsmuggle_url, unsmuggle_url,
update_url_query,
) )
class BrightcoveLegacyIE(InfoExtractor): class BrightcoveLegacyIE(InfoExtractor):
IE_NAME = 'brightcove:legacy' IE_NAME = 'brightcove:legacy'
_VALID_URL = r'(?:https?://.*brightcove\.com/(services|viewer).*?\?|brightcove:)(?P<query>.*)' _VALID_URL = r'(?:https?://.*brightcove\.com/(services|viewer).*?\?|brightcove:)(?P<query>.*)'
_FEDERATED_URL_TEMPLATE = 'http://c.brightcove.com/services/viewer/htmlFederated?%s' _FEDERATED_URL = 'http://c.brightcove.com/services/viewer/htmlFederated'
_TESTS = [ _TESTS = [
{ {
@ -47,6 +46,9 @@ class BrightcoveLegacyIE(InfoExtractor):
'title': 'Xavier Sala i Martín: “Un banc que no presta és un banc zombi que no serveix per a res”', 'title': 'Xavier Sala i Martín: “Un banc que no presta és un banc zombi que no serveix per a res”',
'uploader': '8TV', 'uploader': '8TV',
'description': 'md5:a950cc4285c43e44d763d036710cd9cd', 'description': 'md5:a950cc4285c43e44d763d036710cd9cd',
'timestamp': 1368213670,
'upload_date': '20130510',
'uploader_id': 1589608506001,
} }
}, },
{ {
@ -58,6 +60,9 @@ class BrightcoveLegacyIE(InfoExtractor):
'title': 'JVMLS 2012: Arrays 2.0 - Opportunities and Challenges', 'title': 'JVMLS 2012: Arrays 2.0 - Opportunities and Challenges',
'description': 'John Rose speaks at the JVM Language Summit, August 1, 2012.', 'description': 'John Rose speaks at the JVM Language Summit, August 1, 2012.',
'uploader': 'Oracle', 'uploader': 'Oracle',
'timestamp': 1344975024,
'upload_date': '20120814',
'uploader_id': 1460825906,
}, },
}, },
{ {
@ -69,6 +74,9 @@ class BrightcoveLegacyIE(InfoExtractor):
'title': 'This Bracelet Acts as a Personal Thermostat', 'title': 'This Bracelet Acts as a Personal Thermostat',
'description': 'md5:547b78c64f4112766ccf4e151c20b6a0', 'description': 'md5:547b78c64f4112766ccf4e151c20b6a0',
'uploader': 'Mashable', 'uploader': 'Mashable',
'timestamp': 1382041798,
'upload_date': '20131017',
'uploader_id': 1130468786001,
}, },
}, },
{ {
@ -86,14 +94,17 @@ class BrightcoveLegacyIE(InfoExtractor):
{ {
# test flv videos served by akamaihd.net # test flv videos served by akamaihd.net
# From http://www.redbull.com/en/bike/stories/1331655643987/replay-uci-dh-world-cup-2014-from-fort-william # From http://www.redbull.com/en/bike/stories/1331655643987/replay-uci-dh-world-cup-2014-from-fort-william
'url': 'http://c.brightcove.com/services/viewer/htmlFederated?%40videoPlayer=ref%3ABC2996102916001&linkBaseURL=http%3A%2F%2Fwww.redbull.com%2Fen%2Fbike%2Fvideos%2F1331655630249%2Freplay-uci-fort-william-2014-dh&playerKey=AQ%7E%7E%2CAAAApYJ7UqE%7E%2Cxqr_zXk0I-zzNndy8NlHogrCb5QdyZRf&playerID=1398061561001#__youtubedl_smuggle=%7B%22Referer%22%3A+%22http%3A%2F%2Fwww.redbull.com%2Fen%2Fbike%2Fstories%2F1331655643987%2Freplay-uci-dh-world-cup-2014-from-fort-william%22%7D', 'url': 'http://c.brightcove.com/services/viewer/htmlFederated?%40videoPlayer=ref%3Aevent-stream-356&linkBaseURL=http%3A%2F%2Fwww.redbull.com%2Fen%2Fbike%2Fvideos%2F1331655630249%2Freplay-uci-fort-william-2014-dh&playerKey=AQ%7E%7E%2CAAAApYJ7UqE%7E%2Cxqr_zXk0I-zzNndy8NlHogrCb5QdyZRf&playerID=1398061561001#__youtubedl_smuggle=%7B%22Referer%22%3A+%22http%3A%2F%2Fwww.redbull.com%2Fen%2Fbike%2Fstories%2F1331655643987%2Freplay-uci-dh-world-cup-2014-from-fort-william%22%7D',
# The md5 checksum changes on each download # The md5 checksum changes on each download
'info_dict': { 'info_dict': {
'id': '2996102916001', 'id': '3750436379001',
'ext': 'flv', 'ext': 'flv',
'title': 'UCI MTB World Cup 2014: Fort William, UK - Downhill Finals', 'title': 'UCI MTB World Cup 2014: Fort William, UK - Downhill Finals',
'uploader': 'Red Bull TV', 'uploader': 'RBTV Old (do not use)',
'description': 'UCI MTB World Cup 2014: Fort William, UK - Downhill Finals', 'description': 'UCI MTB World Cup 2014: Fort William, UK - Downhill Finals',
'timestamp': 1409122195,
'upload_date': '20140827',
'uploader_id': 710858724001,
}, },
}, },
{ {
@ -107,6 +118,12 @@ class BrightcoveLegacyIE(InfoExtractor):
'playlist_mincount': 7, 'playlist_mincount': 7,
}, },
] ]
FLV_VCODECS = {
1: 'SORENSON',
2: 'ON2',
3: 'H264',
4: 'VP8',
}
@classmethod @classmethod
def _build_brighcove_url(cls, object_str): def _build_brighcove_url(cls, object_str):
@ -137,13 +154,16 @@ class BrightcoveLegacyIE(InfoExtractor):
else: else:
flashvars = {} flashvars = {}
data_url = object_doc.attrib.get('data', '')
data_url_params = compat_parse_qs(compat_urllib_parse_urlparse(data_url).query)
def find_param(name): def find_param(name):
if name in flashvars: if name in flashvars:
return flashvars[name] return flashvars[name]
node = find_xpath_attr(object_doc, './param', 'name', name) node = find_xpath_attr(object_doc, './param', 'name', name)
if node is not None: if node is not None:
return node.attrib['value'] return node.attrib['value']
return None return data_url_params.get(name)
params = {} params = {}
@ -156,8 +176,8 @@ class BrightcoveLegacyIE(InfoExtractor):
# Not all pages define this value # Not all pages define this value
if playerKey is not None: if playerKey is not None:
params['playerKey'] = playerKey params['playerKey'] = playerKey
# The three fields hold the id of the video # These fields hold the id of the video
videoPlayer = find_param('@videoPlayer') or find_param('videoId') or find_param('videoID') videoPlayer = find_param('@videoPlayer') or find_param('videoId') or find_param('videoID') or find_param('@videoList')
if videoPlayer is not None: if videoPlayer is not None:
params['@videoPlayer'] = videoPlayer params['@videoPlayer'] = videoPlayer
linkBase = find_param('linkBaseURL') linkBase = find_param('linkBaseURL')
@ -185,8 +205,7 @@ class BrightcoveLegacyIE(InfoExtractor):
@classmethod @classmethod
def _make_brightcove_url(cls, params): def _make_brightcove_url(cls, params):
data = compat_urllib_parse.urlencode(params) return update_url_query(cls._FEDERATED_URL, params)
return cls._FEDERATED_URL_TEMPLATE % data
@classmethod @classmethod
def _extract_brightcove_url(cls, webpage): def _extract_brightcove_url(cls, webpage):
@ -240,7 +259,7 @@ class BrightcoveLegacyIE(InfoExtractor):
# We set the original url as the default 'Referer' header # We set the original url as the default 'Referer' header
referer = smuggled_data.get('Referer', url) referer = smuggled_data.get('Referer', url)
return self._get_video_info( return self._get_video_info(
videoPlayer[0], query_str, query, referer=referer) videoPlayer[0], query, referer=referer)
elif 'playerKey' in query: elif 'playerKey' in query:
player_key = query['playerKey'] player_key = query['playerKey']
return self._get_playlist_info(player_key[0]) return self._get_playlist_info(player_key[0])
@ -249,15 +268,14 @@ class BrightcoveLegacyIE(InfoExtractor):
'Cannot find playerKey= variable. Did you forget quotes in a shell invocation?', 'Cannot find playerKey= variable. Did you forget quotes in a shell invocation?',
expected=True) expected=True)
def _get_video_info(self, video_id, query_str, query, referer=None): def _get_video_info(self, video_id, query, referer=None):
request_url = self._FEDERATED_URL_TEMPLATE % query_str headers = {}
req = sanitized_Request(request_url)
linkBase = query.get('linkBaseURL') linkBase = query.get('linkBaseURL')
if linkBase is not None: if linkBase is not None:
referer = linkBase[0] referer = linkBase[0]
if referer is not None: if referer is not None:
req.add_header('Referer', referer) headers['Referer'] = referer
webpage = self._download_webpage(req, video_id) webpage = self._download_webpage(self._FEDERATED_URL, video_id, headers=headers, query=query)
error_msg = self._html_search_regex( error_msg = self._html_search_regex(
r"<h1>We're sorry.</h1>([\s\n]*<p>.*?</p>)+", webpage, r"<h1>We're sorry.</h1>([\s\n]*<p>.*?</p>)+", webpage,
@ -295,9 +313,12 @@ class BrightcoveLegacyIE(InfoExtractor):
'description': video_info.get('shortDescription'), 'description': video_info.get('shortDescription'),
'thumbnail': video_info.get('videoStillURL') or video_info.get('thumbnailURL'), 'thumbnail': video_info.get('videoStillURL') or video_info.get('thumbnailURL'),
'uploader': video_info.get('publisherName'), 'uploader': video_info.get('publisherName'),
'uploader_id': video_info.get('publisherId'),
'duration': float_or_none(video_info.get('length'), 1000),
'timestamp': int_or_none(video_info.get('creationDate'), 1000),
} }
renditions = video_info.get('renditions') renditions = video_info.get('renditions', []) + video_info.get('IOSRenditions', [])
if renditions: if renditions:
formats = [] formats = []
for rend in renditions: for rend in renditions:
@ -318,19 +339,42 @@ class BrightcoveLegacyIE(InfoExtractor):
ext = 'flv' ext = 'flv'
if ext is None: if ext is None:
ext = determine_ext(url) ext = determine_ext(url)
size = rend.get('size') tbr = int_or_none(rend.get('encodingRate'), 1000)
formats.append({ a_format = {
'format_id': 'http%s' % ('-%s' % tbr if tbr else ''),
'url': url, 'url': url,
'ext': ext, 'ext': ext,
'height': rend.get('frameHeight'), 'filesize': int_or_none(rend.get('size')) or None,
'width': rend.get('frameWidth'), 'tbr': tbr,
'filesize': size if size != 0 else None, }
if rend.get('audioOnly'):
a_format.update({
'vcodec': 'none',
}) })
else:
a_format.update({
'height': int_or_none(rend.get('frameHeight')),
'width': int_or_none(rend.get('frameWidth')),
'vcodec': rend.get('videoCodec'),
})
# m3u8 manifests with remote == false are media playlists
# Not calling _extract_m3u8_formats here to save network traffic
if ext == 'm3u8':
a_format.update({
'format_id': 'hls%s' % ('-%s' % tbr if tbr else ''),
'ext': 'mp4',
'protocol': 'm3u8',
})
formats.append(a_format)
self._sort_formats(formats) self._sort_formats(formats)
info['formats'] = formats info['formats'] = formats
elif video_info.get('FLVFullLengthURL') is not None: elif video_info.get('FLVFullLengthURL') is not None:
info.update({ info.update({
'url': video_info['FLVFullLengthURL'], 'url': video_info['FLVFullLengthURL'],
'vcodec': self.FLV_VCODECS.get(video_info.get('FLVFullCodec')),
'filesize': int_or_none(video_info.get('FLVFullSize')),
}) })
if self._downloader.params.get('include_ads', False): if self._downloader.params.get('include_ads', False):
@ -386,6 +430,7 @@ class BrightcoveNewIE(InfoExtractor):
'formats': 'mincount:41', 'formats': 'mincount:41',
}, },
'params': { 'params': {
# m3u8 download
'skip_download': True, 'skip_download': True,
} }
}, { }, {
@ -415,8 +460,8 @@ class BrightcoveNewIE(InfoExtractor):
# Look for iframe embeds [1] # Look for iframe embeds [1]
for _, url in re.findall( for _, url in re.findall(
r'<iframe[^>]+src=(["\'])((?:https?:)//players\.brightcove\.net/\d+/[^/]+/index\.html.+?)\1', webpage): r'<iframe[^>]+src=(["\'])((?:https?:)?//players\.brightcove\.net/\d+/[^/]+/index\.html.+?)\1', webpage):
entries.append(url) entries.append(url if url.startswith('http') else 'http:' + url)
# Look for embed_in_page embeds [2] # Look for embed_in_page embeds [2]
for video_id, account_id, player_id, embed in re.findall( for video_id, account_id, player_id, embed in re.findall(
@ -425,11 +470,11 @@ class BrightcoveNewIE(InfoExtractor):
# According to [4] data-video-id may be prefixed with ref: # According to [4] data-video-id may be prefixed with ref:
r'''(?sx) r'''(?sx)
<video[^>]+ <video[^>]+
data-video-id=["\']((?:ref:)?\d+)["\'][^>]*>.*? data-video-id=["\'](\d+|ref:[^"\']+)["\'][^>]*>.*?
</video>.*? </video>.*?
<script[^>]+ <script[^>]+
src=["\'](?:https?:)?//players\.brightcove\.net/ src=["\'](?:https?:)?//players\.brightcove\.net/
(\d+)/([\da-f-]+)_([^/]+)/index(?:\.min)?\.js (\d+)/([^/]+)_([^/]+)/index(?:\.min)?\.js
''', webpage): ''', webpage):
entries.append( entries.append(
'http://players.brightcove.net/%s/%s_%s/index.html?videoId=%s' 'http://players.brightcove.net/%s/%s_%s/index.html?videoId=%s'
@ -459,19 +504,18 @@ class BrightcoveNewIE(InfoExtractor):
r'policyKey\s*:\s*(["\'])(?P<pk>.+?)\1', r'policyKey\s*:\s*(["\'])(?P<pk>.+?)\1',
webpage, 'policy key', group='pk') webpage, 'policy key', group='pk')
req = sanitized_Request( api_url = 'https://edge.api.brightcove.com/playback/v1/accounts/%s/videos/%s' % (account_id, video_id)
'https://edge.api.brightcove.com/playback/v1/accounts/%s/videos/%s'
% (account_id, video_id),
headers={'Accept': 'application/json;pk=%s' % policy_key})
try: try:
json_data = self._download_json(req, video_id) json_data = self._download_json(api_url, video_id, headers={
'Accept': 'application/json;pk=%s' % policy_key
})
except ExtractorError as e: except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403: if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
json_data = self._parse_json(e.cause.read().decode(), video_id) json_data = self._parse_json(e.cause.read().decode(), video_id)
raise ExtractorError(json_data[0]['message'], expected=True) raise ExtractorError(json_data[0]['message'], expected=True)
raise raise
title = json_data['name'] title = json_data['name'].strip()
formats = [] formats = []
for source in json_data.get('sources', []): for source in json_data.get('sources', []):
@ -482,8 +526,11 @@ class BrightcoveNewIE(InfoExtractor):
if not src: if not src:
continue continue
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
src, video_id, 'mp4', entry_protocol='m3u8_native', src, video_id, 'mp4', m3u8_id='hls', fatal=False))
m3u8_id='hls', fatal=False)) elif source_type == 'application/dash+xml':
if not src:
continue
formats.extend(self._extract_mpd_formats(src, video_id, 'dash', fatal=False))
else: else:
streaming_src = source.get('streaming_src') streaming_src = source.get('streaming_src')
stream_name, app_name = source.get('stream_name'), source.get('app_name') stream_name, app_name = source.get('stream_name'), source.get('app_name')
@ -491,15 +538,23 @@ class BrightcoveNewIE(InfoExtractor):
continue continue
tbr = float_or_none(source.get('avg_bitrate'), 1000) tbr = float_or_none(source.get('avg_bitrate'), 1000)
height = int_or_none(source.get('height')) height = int_or_none(source.get('height'))
width = int_or_none(source.get('width'))
f = { f = {
'tbr': tbr, 'tbr': tbr,
'width': int_or_none(source.get('width')),
'height': height,
'filesize': int_or_none(source.get('size')), 'filesize': int_or_none(source.get('size')),
'container': container, 'container': container,
'vcodec': source.get('codec'), 'ext': container.lower(),
'ext': source.get('container').lower(),
} }
if width == 0 and height == 0:
f.update({
'vcodec': 'none',
})
else:
f.update({
'width': width,
'height': height,
'vcodec': source.get('codec'),
})
def build_format_id(kind): def build_format_id(kind):
format_id = kind format_id = kind
@ -513,7 +568,7 @@ class BrightcoveNewIE(InfoExtractor):
f.update({ f.update({
'url': src or streaming_src, 'url': src or streaming_src,
'format_id': build_format_id('http' if src else 'http-streaming'), 'format_id': build_format_id('http' if src else 'http-streaming'),
'preference': 2 if src else 1, 'source_preference': 0 if src else -1,
}) })
else: else:
f.update({ f.update({
@ -524,20 +579,22 @@ class BrightcoveNewIE(InfoExtractor):
formats.append(f) formats.append(f)
self._sort_formats(formats) self._sort_formats(formats)
description = json_data.get('description') subtitles = {}
thumbnail = json_data.get('thumbnail') for text_track in json_data.get('text_tracks', []):
timestamp = parse_iso8601(json_data.get('published_at')) if text_track.get('src'):
duration = float_or_none(json_data.get('duration'), 1000) subtitles.setdefault(text_track.get('srclang'), []).append({
tags = json_data.get('tags', []) 'url': text_track['src'],
})
return { return {
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'description': description, 'description': json_data.get('description'),
'thumbnail': thumbnail, 'thumbnail': json_data.get('thumbnail') or json_data.get('poster'),
'duration': duration, 'duration': float_or_none(json_data.get('duration'), 1000),
'timestamp': timestamp, 'timestamp': parse_iso8601(json_data.get('published_at')),
'uploader_id': account_id, 'uploader_id': account_id,
'formats': formats, 'formats': formats,
'tags': tags, 'subtitles': subtitles,
'tags': json_data.get('tags', []),
} }
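
The reworked find_param above now also falls back to query parameters carried on the <object> tag's data attribute. parse_qs yields list values, like the flashvars branch; an illustrative run (the URL is made up):

try:
    from urllib.parse import urlparse, parse_qs  # Python 3
except ImportError:  # Python 2
    from urlparse import urlparse, parse_qs

data_url = ('http://c.brightcove.com/services/viewer/federated_f9'
            '?%40videoPlayer=3750436379001&playerID=1398061561001')
params = parse_qs(urlparse(data_url).query)
print(params.get('@videoPlayer'))  # ['3750436379001'] (parse_qs decodes %40 to @)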


@ -6,7 +6,7 @@ import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_urllib_parse, compat_urllib_parse_urlencode,
compat_urlparse, compat_urlparse,
) )
from ..utils import ( from ..utils import (
@ -16,7 +16,7 @@ from ..utils import (
class CamdemyIE(InfoExtractor): class CamdemyIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?camdemy\.com/media/(?P<id>\d+)' _VALID_URL = r'https?://(?:www\.)?camdemy\.com/media/(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
# single file # single file
'url': 'http://www.camdemy.com/media/5181/', 'url': 'http://www.camdemy.com/media/5181/',
@ -104,7 +104,7 @@ class CamdemyIE(InfoExtractor):
class CamdemyFolderIE(InfoExtractor): class CamdemyFolderIE(InfoExtractor):
_VALID_URL = r'http://www.camdemy.com/folder/(?P<id>\d+)' _VALID_URL = r'https?://www.camdemy.com/folder/(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
# links with trailing slash # links with trailing slash
'url': 'http://www.camdemy.com/folder/450', 'url': 'http://www.camdemy.com/folder/450',
@ -139,7 +139,7 @@ class CamdemyFolderIE(InfoExtractor):
parsed_url = list(compat_urlparse.urlparse(url)) parsed_url = list(compat_urlparse.urlparse(url))
query = dict(compat_urlparse.parse_qsl(parsed_url[4])) query = dict(compat_urlparse.parse_qsl(parsed_url[4]))
query.update({'displayMode': 'list'}) query.update({'displayMode': 'list'})
parsed_url[4] = compat_urllib_parse.urlencode(query) parsed_url[4] = compat_urllib_parse_urlencode(query)
final_url = compat_urlparse.urlunparse(parsed_url) final_url = compat_urlparse.urlunparse(parsed_url)
page = self._download_webpage(final_url, folder_id) page = self._download_webpage(final_url, folder_id)
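
What the displayMode override above does to a folder URL, using stdlib stand-ins for the compat_* helpers (the sort parameter is made up; key order can differ across Python versions):

try:
    from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode  # Python 3
except ImportError:  # Python 2
    from urlparse import urlparse, urlunparse, parse_qsl
    from urllib import urlencode

parsed = list(urlparse('http://www.camdemy.com/folder/450?sort=date'))
query = dict(parse_qsl(parsed[4]))
query.update({'displayMode': 'list'})
parsed[4] = urlencode(query)
print(urlunparse(parsed))  # http://www.camdemy.com/folder/450?sort=date&displayMode=list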


@ -0,0 +1,87 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_duration,
unified_strdate,
)
class CamWithHerIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?camwithher\.tv/view_video\.php\?.*\bviewkey=(?P<id>\w+)'
_TESTS = [{
'url': 'http://camwithher.tv/view_video.php?viewkey=6e9a24e2c0e842e1f177&page=&viewtype=&category=',
'info_dict': {
'id': '5644',
'ext': 'flv',
'title': 'Periscope Tease',
'description': 'In the clouds teasing on periscope to my favorite song',
'duration': 240,
'view_count': int,
'comment_count': int,
'uploader': 'MileenaK',
'upload_date': '20160322',
},
'params': {
'skip_download': True,
}
}, {
'url': 'http://camwithher.tv/view_video.php?viewkey=6dfd8b7c97531a459937',
'only_matching': True,
}, {
'url': 'http://camwithher.tv/view_video.php?page=&viewkey=6e9a24e2c0e842e1f177&viewtype=&category=',
'only_matching': True,
}, {
'url': 'http://camwithher.tv/view_video.php?viewkey=b6c3b5bea9515d1a1fc4&page=&viewtype=&category=mv',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
flv_id = self._html_search_regex(
r'<a[^>]+href=["\']/download/\?v=(\d+)', webpage, 'video id')
# Video URL construction algorithm is reverse-engineered from cwhplayer.swf
rtmp_url = 'rtmp://camwithher.tv/clipshare/%s' % (
('mp4:%s.mp4' % flv_id) if int(flv_id) > 2010 else flv_id)
title = self._html_search_regex(
r'<div[^>]+style="float:left"[^>]*>\s*<h2>(.+?)</h2>', webpage, 'title')
description = self._html_search_regex(
r'>Description:</span>(.+?)</div>', webpage, 'description', default=None)
runtime = self._search_regex(
r'Runtime\s*:\s*(.+?) \|', webpage, 'duration', default=None)
if runtime:
runtime = re.sub(r'[\s-]', '', runtime)
duration = parse_duration(runtime)
view_count = int_or_none(self._search_regex(
r'Views\s*:\s*(\d+)', webpage, 'view count', default=None))
comment_count = int_or_none(self._search_regex(
r'Comments\s*:\s*(\d+)', webpage, 'comment count', default=None))
uploader = self._search_regex(
r'Added by\s*:\s*<a[^>]+>([^<]+)</a>', webpage, 'uploader', default=None)
upload_date = unified_strdate(self._search_regex(
r'Added on\s*:\s*([\d-]+)', webpage, 'upload date', default=None))
return {
'id': flv_id,
'url': rtmp_url,
'ext': 'flv',
'no_resume': True,
'title': title,
'description': description,
'duration': duration,
'view_count': view_count,
'comment_count': comment_count,
'uploader': uploader,
'upload_date': upload_date,
}
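
A worked example of the RTMP URL construction above: clip ids of 2010 and below are legacy FLV clips addressed by their bare id, newer ids get the mp4: playpath prefix (the threshold comes from the reverse-engineered player, per the comment in the code):

def rtmp_url(flv_id):
    playpath = 'mp4:%s.mp4' % flv_id if int(flv_id) > 2010 else flv_id
    return 'rtmp://camwithher.tv/clipshare/%s' % playpath

assert rtmp_url('5644') == 'rtmp://camwithher.tv/clipshare/mp4:5644.mp4'
assert rtmp_url('1337') == 'rtmp://camwithher.tv/clipshare/1337'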


@ -1,24 +1,41 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .theplatform import ThePlatformIE
from ..utils import ( from ..utils import (
sanitized_Request, xpath_text,
smuggle_url, xpath_element,
int_or_none,
ExtractorError,
find_xpath_attr,
) )
class CBSIE(InfoExtractor): class CBSBaseIE(ThePlatformIE):
def _parse_smil_subtitles(self, smil, namespace=None, subtitles_lang='en'):
closed_caption_e = find_xpath_attr(smil, self._xpath_ns('.//param', namespace), 'name', 'ClosedCaptionURL')
return {
'en': [{
'ext': 'ttml',
'url': closed_caption_e.attrib['value'],
}]
} if closed_caption_e is not None and closed_caption_e.attrib.get('value') else []
class CBSIE(CBSBaseIE):
_VALID_URL = r'https?://(?:www\.)?(?:cbs\.com/shows/[^/]+/(?:video|artist)|colbertlateshow\.com/(?:video|podcasts))/[^/]+/(?P<id>[^/]+)' _VALID_URL = r'https?://(?:www\.)?(?:cbs\.com/shows/[^/]+/(?:video|artist)|colbertlateshow\.com/(?:video|podcasts))/[^/]+/(?P<id>[^/]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.cbs.com/shows/garth-brooks/video/_u7W953k6la293J7EPTd9oHkSPs6Xn6_/connect-chat-feat-garth-brooks/', 'url': 'http://www.cbs.com/shows/garth-brooks/video/_u7W953k6la293J7EPTd9oHkSPs6Xn6_/connect-chat-feat-garth-brooks/',
'info_dict': { 'info_dict': {
'id': '4JUVEwq3wUT7', 'id': '_u7W953k6la293J7EPTd9oHkSPs6Xn6_',
'display_id': 'connect-chat-feat-garth-brooks', 'display_id': 'connect-chat-feat-garth-brooks',
'ext': 'flv', 'ext': 'mp4',
'title': 'Connect Chat feat. Garth Brooks', 'title': 'Connect Chat feat. Garth Brooks',
'description': 'Connect with country music singer Garth Brooks, as he chats with fans on Wednesday November 27, 2013. Be sure to tune in to Garth Brooks: Live from Las Vegas, Friday November 29, at 9/8c on CBS!', 'description': 'Connect with country music singer Garth Brooks, as he chats with fans on Wednesday November 27, 2013. Be sure to tune in to Garth Brooks: Live from Las Vegas, Friday November 29, at 9/8c on CBS!',
'duration': 1495, 'duration': 1495,
'timestamp': 1385585425,
'upload_date': '20131127',
'uploader': 'CBSI-NEW',
}, },
'params': { 'params': {
# rtmp download # rtmp download
@ -47,22 +64,46 @@ class CBSIE(InfoExtractor):
'url': 'http://www.colbertlateshow.com/podcasts/dYSwjqPs_X1tvbV_P2FcPWRa_qT6akTC/in-the-bad-room-with-stephen/', 'url': 'http://www.colbertlateshow.com/podcasts/dYSwjqPs_X1tvbV_P2FcPWRa_qT6akTC/in-the-bad-room-with-stephen/',
'only_matching': True, 'only_matching': True,
}] }]
TP_RELEASE_URL_TEMPLATE = 'http://link.theplatform.com/s/dJ5BDC/%s?manifest=m3u&mbr=true'
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) display_id = self._match_id(url)
request = sanitized_Request(url) webpage = self._download_webpage(url, display_id)
# Android UA is served with higher quality (720p) streams (see content_id = self._search_regex(
# https://github.com/rg3/youtube-dl/issues/7490) [r"video\.settings\.content_id\s*=\s*'([^']+)';", r"cbsplayer\.contentId\s*=\s*'([^']+)';"],
request.add_header('User-Agent', 'Mozilla/5.0 (Linux; Android 4.4; Nexus 5)') webpage, 'content id')
webpage = self._download_webpage(request, display_id) items_data = self._download_xml(
real_id = self._search_regex( 'http://can.cbs.com/thunder/player/videoPlayerService.php',
[r"video\.settings\.pid\s*=\s*'([^']+)';", r"cbsplayer\.pid\s*=\s*'([^']+)';"], content_id, query={'partner': 'cbs', 'contentId': content_id})
webpage, 'real video ID') video_data = xpath_element(items_data, './/item')
return { title = xpath_text(video_data, 'videoTitle', 'title', True)
'_type': 'url_transparent',
'ie_key': 'ThePlatform', subtitles = {}
'url': smuggle_url( formats = []
'http://link.theplatform.com/s/dJ5BDC/%s?mbr=true&manifest=m3u' % real_id, for item in items_data.findall('.//item'):
{'force_smil_url': True}), pid = xpath_text(item, 'pid')
if not pid:
continue
try:
tp_formats, tp_subtitles = self._extract_theplatform_smil(
self.TP_RELEASE_URL_TEMPLATE % pid, content_id, 'Downloading %s SMIL data' % pid)
except ExtractorError:
continue
formats.extend(tp_formats)
subtitles = self._merge_subtitles(subtitles, tp_subtitles)
self._sort_formats(formats)
info = self.get_metadata('dJ5BDC/media/guid/2198311517/%s' % content_id, content_id)
info.update({
'id': content_id,
'display_id': display_id, 'display_id': display_id,
} 'title': title,
'series': xpath_text(video_data, 'seriesTitle'),
'season_number': int_or_none(xpath_text(video_data, 'seasonNumber')),
'episode_number': int_or_none(xpath_text(video_data, 'episodeNumber')),
'duration': int_or_none(xpath_text(video_data, 'videoLength'), 1000),
'thumbnail': xpath_text(video_data, 'previewImageURL'),
'formats': formats,
'subtitles': subtitles,
})
return info
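
_merge_subtitles, used to accumulate captions across the per-pid SMIL downloads above, merges per-language track lists rather than overwriting them; its assumed behaviour in miniature:

def merge_subtitles(subs1, subs2):
    merged = dict(subs1)
    for lang, tracks in (subs2 or {}).items():
        merged[lang] = merged.get(lang, []) + tracks
    return merged

print(merge_subtitles(
    {'en': [{'ext': 'ttml', 'url': 'http://example.com/cc1.ttml'}]},
    {'en': [{'ext': 'ttml', 'url': 'http://example.com/cc2.ttml'}]}))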


@ -1,12 +1,14 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .theplatform import ThePlatformIE from .theplatform import ThePlatformIE
from ..utils import int_or_none from ..utils import int_or_none
class CNETIE(ThePlatformIE): class CBSInteractiveIE(ThePlatformIE):
_VALID_URL = r'https?://(?:www\.)?cnet\.com/videos/(?P<id>[^/]+)/' _VALID_URL = r'https?://(?:www\.)?(?P<site>cnet|zdnet)\.com/(?:videos|video/share)/(?P<id>[^/?]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.cnet.com/videos/hands-on-with-microsofts-windows-8-1-update/', 'url': 'http://www.cnet.com/videos/hands-on-with-microsofts-windows-8-1-update/',
'info_dict': { 'info_dict': {
@ -17,6 +19,8 @@ class CNETIE(ThePlatformIE):
'uploader_id': '6085384d-619e-11e3-b231-14feb5ca9861', 'uploader_id': '6085384d-619e-11e3-b231-14feb5ca9861',
'uploader': 'Sarah Mitroff', 'uploader': 'Sarah Mitroff',
'duration': 70, 'duration': 70,
'timestamp': 1396479627,
'upload_date': '20140402',
}, },
}, { }, {
'url': 'http://www.cnet.com/videos/whiny-pothole-tweets-at-local-government-when-hit-by-cars-tomorrow-daily-187/', 'url': 'http://www.cnet.com/videos/whiny-pothole-tweets-at-local-government-when-hit-by-cars-tomorrow-daily-187/',
@ -28,15 +32,38 @@ class CNETIE(ThePlatformIE):
'uploader_id': 'b163284d-6b73-44fc-b3e6-3da66c392d40', 'uploader_id': 'b163284d-6b73-44fc-b3e6-3da66c392d40',
'uploader': 'Ashley Esqueda', 'uploader': 'Ashley Esqueda',
'duration': 1482, 'duration': 1482,
'timestamp': 1433289889,
'upload_date': '20150603',
}, },
}, {
'url': 'http://www.zdnet.com/video/share/video-keeping-android-smartphones-and-tablets-secure/',
'info_dict': {
'id': 'bc1af9f0-a2b5-4e54-880d-0d95525781c0',
'ext': 'mp4',
'title': 'Video: Keeping Android smartphones and tablets secure',
'description': 'Here\'s the best way to keep Android devices secure, and what you do when they\'ve come to the end of their lives.',
'uploader_id': 'f2d97ea2-8175-11e2-9d12-0018fe8a00b0',
'uploader': 'Adrian Kingsley-Hughes',
'timestamp': 1448961720,
'upload_date': '20151201',
},
'params': {
# m3u8 download
'skip_download': True,
}
}] }]
TP_RELEASE_URL_TEMPLATE = 'http://link.theplatform.com/s/kYEXFC/%s?mbr=true'
MPX_ACCOUNTS = {
'cnet': 2288573011,
'zdnet': 2387448114,
}
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) site, display_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
data_json = self._html_search_regex( data_json = self._html_search_regex(
r"data-cnet-video(?:-uvp)?-options='([^']+)'", r"data-(?:cnet|zdnet)-video(?:-uvp)?-options='([^']+)'",
webpage, 'data json') webpage, 'data json')
data = self._parse_json(data_json, display_id) data = self._parse_json(data_json, display_id)
vdata = data.get('video') or data['videos'][0] vdata = data.get('video') or data['videos'][0]
@ -51,16 +78,15 @@ class CNETIE(ThePlatformIE):
uploader = None uploader = None
uploader_id = None uploader_id = None
metadata = self.get_metadata('kYEXFC/%s' % list(vdata['files'].values())[0], video_id) media_guid_path = 'media/guid/%d/%s' % (self.MPX_ACCOUNTS[site], vdata['mpxRefId'])
description = vdata.get('description') or metadata.get('description') formats, subtitles = [], {}
duration = int_or_none(vdata.get('duration')) or metadata.get('duration') if site == 'cnet':
formats, subtitles = self._extract_theplatform_smil(
formats = [] self.TP_RELEASE_URL_TEMPLATE % media_guid_path, video_id)
subtitles = {}
for (fkey, vid) in vdata['files'].items(): for (fkey, vid) in vdata['files'].items():
if fkey == 'hls_phone' and 'hls_tablet' in vdata['files']: if fkey == 'hls_phone' and 'hls_tablet' in vdata['files']:
continue continue
release_url = 'http://link.theplatform.com/s/kYEXFC/%s?format=SMIL&mbr=true' % vid release_url = self.TP_RELEASE_URL_TEMPLATE % vid
if fkey == 'hds': if fkey == 'hds':
release_url += '&manifest=f4m' release_url += '&manifest=f4m'
tp_formats, tp_subtitles = self._extract_theplatform_smil(release_url, video_id, 'Downloading %s SMIL data' % fkey) tp_formats, tp_subtitles = self._extract_theplatform_smil(release_url, video_id, 'Downloading %s SMIL data' % fkey)
@ -68,15 +94,15 @@ class CNETIE(ThePlatformIE):
subtitles = self._merge_subtitles(subtitles, tp_subtitles) subtitles = self._merge_subtitles(subtitles, tp_subtitles)
self._sort_formats(formats) self._sort_formats(formats)
return { info = self.get_metadata('kYEXFC/%s' % media_guid_path, video_id)
info.update({
'id': video_id, 'id': video_id,
'display_id': display_id, 'display_id': display_id,
'title': title, 'title': title,
'description': description, 'duration': int_or_none(vdata.get('duration')),
'thumbnail': metadata.get('thumbnail'),
'duration': duration,
'uploader': uploader, 'uploader': uploader,
'uploader_id': uploader_id, 'uploader_id': uploader_id,
'subtitles': subtitles, 'subtitles': subtitles,
'formats': formats, 'formats': formats,
} })
return info
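
How the per-site MPX account table above turns a page's mpxRefId into a ThePlatform release URL (the ref id here is hypothetical, borrowed from the zdnet test's video id):

MPX_ACCOUNTS = {'cnet': 2288573011, 'zdnet': 2387448114}
site, mpx_ref_id = 'zdnet', 'bc1af9f0-a2b5-4e54-880d-0d95525781c0'
media_guid_path = 'media/guid/%d/%s' % (MPX_ACCOUNTS[site], mpx_ref_id)
print('http://link.theplatform.com/s/kYEXFC/%s?mbr=true' % media_guid_path)
# http://link.theplatform.com/s/kYEXFC/media/guid/2387448114/bc1af9f0-a2b5-4e54-880d-0d95525781c0?mbr=true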


@ -2,16 +2,15 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from .theplatform import ThePlatformIE from .cbs import CBSBaseIE
from ..utils import ( from ..utils import (
parse_duration, parse_duration,
find_xpath_attr,
) )
class CBSNewsIE(ThePlatformIE): class CBSNewsIE(CBSBaseIE):
IE_DESC = 'CBS News' IE_DESC = 'CBS News'
_VALID_URL = r'http://(?:www\.)?cbsnews\.com/(?:news|videos)/(?P<id>[\da-z_-]+)' _VALID_URL = r'https?://(?:www\.)?cbsnews\.com/(?:news|videos)/(?P<id>[\da-z_-]+)'
_TESTS = [ _TESTS = [
{ {
@ -49,15 +48,6 @@ class CBSNewsIE(ThePlatformIE):
}, },
] ]
def _parse_smil_subtitles(self, smil, namespace=None, subtitles_lang='en'):
closed_caption_e = find_xpath_attr(smil, self._xpath_ns('.//param', namespace), 'name', 'ClosedCaptionURL')
return {
'en': [{
'ext': 'ttml',
'url': closed_caption_e.attrib['value'],
}]
} if closed_caption_e is not None and closed_caption_e.attrib.get('value') else []
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
@ -78,7 +68,7 @@ class CBSNewsIE(ThePlatformIE):
pid = item.get('media' + format_id) pid = item.get('media' + format_id)
if not pid: if not pid:
continue continue
release_url = 'http://link.theplatform.com/s/dJ5BDC/%s?format=SMIL&mbr=true' % pid release_url = 'http://link.theplatform.com/s/dJ5BDC/%s?mbr=true' % pid
tp_formats, tp_subtitles = self._extract_theplatform_smil(release_url, video_id, 'Downloading %s SMIL data' % pid) tp_formats, tp_subtitles = self._extract_theplatform_smil(release_url, video_id, 'Downloading %s SMIL data' % pid)
formats.extend(tp_formats) formats.extend(tp_formats)
subtitles = self._merge_subtitles(subtitles, tp_subtitles) subtitles = self._merge_subtitles(subtitles, tp_subtitles)
@ -96,7 +86,7 @@ class CBSNewsIE(ThePlatformIE):
class CBSNewsLiveVideoIE(InfoExtractor): class CBSNewsLiveVideoIE(InfoExtractor):
IE_DESC = 'CBS News Live Videos' IE_DESC = 'CBS News Live Videos'
_VALID_URL = r'http://(?:www\.)?cbsnews\.com/live/video/(?P<id>[\da-z_-]+)' _VALID_URL = r'https?://(?:www\.)?cbsnews\.com/live/video/(?P<id>[\da-z_-]+)'
_TEST = { _TEST = {
'url': 'http://www.cbsnews.com/live/video/clinton-sanders-prepare-to-face-off-in-nh/', 'url': 'http://www.cbsnews.com/live/video/clinton-sanders-prepare-to-face-off-in-nh/',
@ -122,6 +112,7 @@ class CBSNewsLiveVideoIE(InfoExtractor):
for entry in f4m_formats: for entry in f4m_formats:
# URLs without the extra param induce an 404 error # URLs without the extra param induce an 404 error
entry.update({'extra_param_to_segment_url': hdcore_sign}) entry.update({'extra_param_to_segment_url': hdcore_sign})
self._sort_formats(f4m_formats)
return { return {
'id': video_id, 'id': video_id,


@ -6,7 +6,7 @@ from .common import InfoExtractor
class CBSSportsIE(InfoExtractor): class CBSSportsIE(InfoExtractor):
_VALID_URL = r'http://www\.cbssports\.com/video/player/(?P<section>[^/]+)/(?P<id>[^/]+)' _VALID_URL = r'https?://www\.cbssports\.com/video/player/(?P<section>[^/]+)/(?P<id>[^/]+)'
_TEST = { _TEST = {
'url': 'http://www.cbssports.com/video/player/tennis/318462531970/0/us-open-flashbacks-1990s', 'url': 'http://www.cbssports.com/video/player/tennis/318462531970/0/us-open-flashbacks-1990s',

youtube_dl/extractor/cda.py (new executable file, 96 lines)

@@ -0,0 +1,96 @@
# coding: utf-8
from __future__ import unicode_literals

import re

from .common import InfoExtractor
from ..utils import (
    decode_packed_codes,
    ExtractorError,
    parse_duration
)


class CDAIE(InfoExtractor):
    _VALID_URL = r'https?://(?:(?:www\.)?cda\.pl/video|ebd\.cda\.pl/[0-9]+x[0-9]+)/(?P<id>[0-9a-z]+)'

    _TESTS = [{
        'url': 'http://www.cda.pl/video/5749950c',
        'md5': '6f844bf51b15f31fae165365707ae970',
        'info_dict': {
            'id': '5749950c',
            'ext': 'mp4',
            'height': 720,
            'title': 'Oto dlaczego przed zakrętem należy zwolnić.',
            'duration': 39
        }
    }, {
        'url': 'http://www.cda.pl/video/57413289',
        'md5': 'a88828770a8310fc00be6c95faf7f4d5',
        'info_dict': {
            'id': '57413289',
            'ext': 'mp4',
            'title': 'Lądowanie na lotnisku na Maderze',
            'duration': 137
        }
    }, {
        'url': 'http://ebd.cda.pl/0x0/5749950c',
        'only_matching': True,
    }]

    def _real_extract(self, url):
        video_id = self._match_id(url)
        webpage = self._download_webpage('http://ebd.cda.pl/0x0/' + video_id, video_id)

        if 'Ten film jest dostępny dla użytkowników premium' in webpage:
            raise ExtractorError('This video is only available for premium users.', expected=True)

        title = self._html_search_regex(r'<title>(.+?)</title>', webpage, 'title')

        formats = []

        info_dict = {
            'id': video_id,
            'title': title,
            'formats': formats,
            'duration': None,
        }

        def extract_format(page, version):
            unpacked = decode_packed_codes(page)
            format_url = self._search_regex(
                r"url:\\'(.+?)\\'", unpacked, '%s url' % version, fatal=False)
            if not format_url:
                return
            f = {
                'url': format_url,
            }
            m = re.search(
                r'<a[^>]+data-quality="(?P<format_id>[^"]+)"[^>]+href="[^"]+"[^>]+class="[^"]*quality-btn-active[^"]*">(?P<height>[0-9]+)p',
                page)
            if m:
                f.update({
                    'format_id': m.group('format_id'),
                    'height': int(m.group('height')),
                })
            info_dict['formats'].append(f)
            if not info_dict['duration']:
                info_dict['duration'] = parse_duration(self._search_regex(
                    r"duration:\\'(.+?)\\'", unpacked, 'duration', fatal=False))

        extract_format(webpage, 'default')

        for href, resolution in re.findall(
                r'<a[^>]+data-quality="[^"]+"[^>]+href="([^"]+)"[^>]+class="quality-btn"[^>]*>([0-9]+p)',
                webpage):
            webpage = self._download_webpage(
                href, video_id, 'Downloading %s version information' % resolution, fatal=False)
            if not webpage:
                # Manually report warning because empty page is returned when
                # invalid version is requested.
                self.report_warning('Unable to download %s version information' % resolution)
                continue

            extract_format(webpage, resolution)

        self._sort_formats(formats)

        return info_dict
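
Everything here hinges on decode_packed_codes: cda.pl hides the file URL inside Dean Edwards style eval(function(p,a,c,k,e,d){...}) packed JavaScript, and the url:\'...\' string only exists after unpacking. youtube-dl ships the real unpacker in youtube_dl/utils.py; the following self-contained sketch shows the idea (the regex and the demo input are illustrative):

import re

def decode_packed_codes(code):
    # Pull the arguments out of: eval(function(p,a,c,k,e,d){...}('payload',radix,count,'sym|bols'.split('|'),...))
    mobj = re.search(
        r"}\('(.+)',(\d+),(\d+),'([^']+)'\.split\('\|'\)", code)
    obfuscated, base, count, symbols = mobj.groups()
    base, count = int(base), int(count)
    symbols = symbols.split('|')

    def encode_base_n(num, n, table='0123456789abcdefghijklmnopqrstuvwxyz'):
        # minimal int -> base-n string, enough for radix <= 36
        return table[num] if num < n else encode_base_n(num // n, n, table) + table[num % n]

    # word i of the payload stands for symbols[i] (or for itself if that entry is empty)
    symbol_table = dict(
        (encode_base_n(i, base), symbols[i] or encode_base_n(i, base))
        for i in range(count))
    return re.sub(r'\b(\w+)\b', lambda m: symbol_table[m.group(0)], obfuscated)

packed = "eval(function(p,a,c,k,e,d){}('0.1(2)',36,3,'player|load|video.mp4'.split('|'),0,{}))"
print(decode_packed_codes(packed))  # player.load(video.mp4)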

youtube_dl/extractor/ceskatelevize.py

@@ -5,7 +5,6 @@ import re

 from .common import InfoExtractor
 from ..compat import (
-    compat_urllib_parse,
     compat_urllib_parse_unquote,
     compat_urllib_parse_urlparse,
 )
@@ -13,6 +12,7 @@ from ..utils import (
     ExtractorError,
     float_or_none,
     sanitized_Request,
+    urlencode_postdata,
 )
@@ -102,7 +102,7 @@ class CeskaTelevizeIE(InfoExtractor):
         req = sanitized_Request(
             'http://www.ceskatelevize.cz/ivysilani/ajax/get-client-playlist',
-            data=compat_urllib_parse.urlencode(data))
+            data=urlencode_postdata(data))

         req.add_header('Content-type', 'application/x-www-form-urlencoded')
         req.add_header('x-addr', '127.0.0.1')
@@ -129,7 +129,8 @@ class CeskaTelevizeIE(InfoExtractor):
             formats = []
             for format_id, stream_url in item['streamUrls'].items():
                 formats.extend(self._extract_m3u8_formats(
-                    stream_url, playlist_id, 'mp4', entry_protocol='m3u8_native'))
+                    stream_url, playlist_id, 'mp4',
+                    entry_protocol='m3u8_native', fatal=False))
             self._sort_formats(formats)

             item_id = item.get('id') or item['assetId']
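
This urlencode split is the pattern repeated across most files in this diff: the ambiguous compat_urllib_parse.urlencode is replaced by compat_urllib_parse_urlencode for building query strings and by urlencode_postdata for POST bodies, which must be bytes. A rough stdlib equivalent of the distinction (the real helpers also utf-8 encode dict contents first; details here are simplified):

try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

def urlencode_postdata(data):
    # POST bodies go on the wire, so return bytes rather than text
    return urlencode(data).encode('ascii')

print(urlencode({'vid': 'abc123'}))       # 'vid=abc123' (text, for URLs)
print(urlencode_postdata({'page': '1'}))  # b'page=1' (bytes, for data=...)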

youtube_dl/extractor/chaturbate.py

@@ -48,6 +48,7 @@ class ChaturbateIE(InfoExtractor):
             raise ExtractorError('Unable to find stream URL')

         formats = self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4')
+        self._sort_formats(formats)

         return {
             'id': video_id,

youtube_dl/extractor/cliphunter.py

@@ -19,7 +19,7 @@ def _decode(s):

 class CliphunterIE(InfoExtractor):
     IE_NAME = 'cliphunter'

-    _VALID_URL = r'''(?x)http://(?:www\.)?cliphunter\.com/w/
+    _VALID_URL = r'''(?x)https?://(?:www\.)?cliphunter\.com/w/
         (?P<id>[0-9]+)/
         (?P<seo>.+?)(?:$|[#\?])
     '''

youtube_dl/extractor/clipsyndicate.py

@@ -8,7 +8,7 @@ from ..utils import (

 class ClipsyndicateIE(InfoExtractor):
-    _VALID_URL = r'http://(?:chic|www)\.clipsyndicate\.com/video/play(list/\d+)?/(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:chic|www)\.clipsyndicate\.com/video/play(list/\d+)?/(?P<id>\d+)'

     _TESTS = [{
         'url': 'http://www.clipsyndicate.com/video/play/4629301/brick_briscoe',

youtube_dl/extractor/cloudy.py

@@ -6,7 +6,7 @@ import re
 from .common import InfoExtractor
 from ..compat import (
     compat_parse_qs,
-    compat_urllib_parse,
+    compat_urllib_parse_urlencode,
     compat_HTTPError,
 )
 from ..utils import (
@@ -64,7 +64,7 @@ class CloudyIE(InfoExtractor):
                 'errorUrl': error_url,
             })

-        data_url = self._API_URL % (video_host, compat_urllib_parse.urlencode(form))
+        data_url = self._API_URL % (video_host, compat_urllib_parse_urlencode(form))
        player_data = self._download_webpage(
            data_url, video_id, 'Downloading player data')
        data = compat_parse_qs(player_data)

youtube_dl/extractor/clubic.py

@@ -12,7 +12,7 @@ from ..utils import (

 class ClubicIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?clubic\.com/video/(?:[^/]+/)*video.*-(?P<id>[0-9]+)\.html'
+    _VALID_URL = r'https?://(?:www\.)?clubic\.com/video/(?:[^/]+/)*video.*-(?P<id>[0-9]+)\.html'

     _TESTS = [{
         'url': 'http://www.clubic.com/video/clubic-week/video-clubic-week-2-0-le-fbi-se-lance-dans-la-photo-d-identite-448474.html',

youtube_dl/extractor/cnbc.py (new file, 36 lines)

@@ -0,0 +1,36 @@
# coding: utf-8
from __future__ import unicode_literals

from .common import InfoExtractor
from ..utils import smuggle_url


class CNBCIE(InfoExtractor):
    _VALID_URL = r'https?://video\.cnbc\.com/gallery/\?video=(?P<id>[0-9]+)'
    _TEST = {
        'url': 'http://video.cnbc.com/gallery/?video=3000503714',
        'info_dict': {
            'id': '3000503714',
            'ext': 'mp4',
            'title': 'Fighting zombies is big business',
            'description': 'md5:0c100d8e1a7947bd2feec9a5550e519e',
            'timestamp': 1459332000,
            'upload_date': '20160330',
            'uploader': 'NBCU-CNBC',
        },
        'params': {
            # m3u8 download
            'skip_download': True,
        },
    }

    def _real_extract(self, url):
        video_id = self._match_id(url)
        return {
            '_type': 'url_transparent',
            'ie_key': 'ThePlatform',
            'url': smuggle_url(
                'http://link.theplatform.com/s/gZWlPC/media/guid/2408950221/%s?mbr=true&manifest=m3u' % video_id,
                {'force_smil_url': True}),
            'id': video_id,
        }
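
The new extractor is a thin url_transparent shim: it defers everything to ThePlatform and smuggles force_smil_url along in the URL fragment. The round trip through the real helpers in youtube_dl.utils (the GUID is the one from the test above):

from youtube_dl.utils import smuggle_url, unsmuggle_url

url = smuggle_url(
    'http://link.theplatform.com/s/gZWlPC/media/guid/2408950221/3000503714?mbr=true&manifest=m3u',
    {'force_smil_url': True})
print(url)  # original URL plus a '#__youtubedl_smuggle=...' fragment

real_url, data = unsmuggle_url(url)  # ThePlatformIE recovers both pieces
print(data)  # {'force_smil_url': True}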

youtube_dl/extractor/comcarcoff.py

@@ -11,7 +11,7 @@ from ..utils import (

 class ComCarCoffIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?comediansincarsgettingcoffee\.com/(?P<id>[a-z0-9\-]*)'
+    _VALID_URL = r'https?://(?:www\.)?comediansincarsgettingcoffee\.com/(?P<id>[a-z0-9\-]*)'
     _TESTS = [{
         'url': 'http://comediansincarsgettingcoffee.com/miranda-sings-happy-thanksgiving-miranda/',
         'info_dict': {
@@ -41,7 +41,13 @@ class ComCarCoffIE(InfoExtractor):
         display_id = full_data['activeVideo']['video']
         video_data = full_data.get('videos', {}).get(display_id) or full_data['singleshots'][display_id]

         video_id = compat_str(video_data['mediaId'])
+        title = video_data['title']
+
+        formats = self._extract_m3u8_formats(
+            video_data['mediaUrl'], video_id, 'mp4')
+        self._sort_formats(formats)

         thumbnails = [{
             'url': video_data['images']['thumb'],
         }, {
@@ -54,15 +60,14 @@ class ComCarCoffIE(InfoExtractor):
             video_data.get('duration'))

         return {
-            '_type': 'url_transparent',
-            'url': 'crackle:%s' % video_id,
             'id': video_id,
             'display_id': display_id,
-            'title': video_data['title'],
+            'title': title,
             'description': video_data.get('description'),
             'timestamp': timestamp,
             'duration': duration,
             'thumbnails': thumbnails,
+            'formats': formats,
             'season_number': int_or_none(video_data.get('season')),
             'episode_number': int_or_none(video_data.get('episode')),
             'webpage_url': 'http://comediansincarsgettingcoffee.com/%s' % (video_data.get('urlSlug', video_data.get('slug'))),

youtube_dl/extractor/comedycentral.py

@@ -5,7 +5,7 @@ import re
 from .mtv import MTVServicesInfoExtractor
 from ..compat import (
     compat_str,
-    compat_urllib_parse,
+    compat_urllib_parse_urlencode,
 )
 from ..utils import (
     ExtractorError,
@@ -201,7 +201,7 @@ class ComedyCentralShowsIE(MTVServicesInfoExtractor):
         # Correct cc.com in uri
         uri = re.sub(r'(episode:[^.]+)(\.cc)?\.com', r'\1.com', uri)

-        index_url = 'http://%s.cc.com/feeds/mrss?%s' % (show_name, compat_urllib_parse.urlencode({'uri': uri}))
+        index_url = 'http://%s.cc.com/feeds/mrss?%s' % (show_name, compat_urllib_parse_urlencode({'uri': uri}))
         idoc = self._download_xml(
             index_url, epTitle,
             'Downloading show index', 'Unable to download episode index')

youtube_dl/extractor/common.py

@@ -21,9 +21,11 @@ from ..compat import (
     compat_os_name,
     compat_str,
     compat_urllib_error,
-    compat_urllib_parse,
+    compat_urllib_parse_urlencode,
+    compat_urllib_request,
     compat_urlparse,
 )
+from ..downloader.f4m import remove_encrypted_media
 from ..utils import (
     NO_DEFAULT,
     age_restricted,
@@ -48,6 +50,7 @@ from ..utils import (
     determine_protocol,
     parse_duration,
     mimetype2ext,
+    update_Request,
     update_url_query,
 )
@@ -346,7 +349,7 @@ class InfoExtractor(object):
     def IE_NAME(self):
         return compat_str(type(self).__name__[:-2])

-    def _request_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True, data=None, headers=None, query=None):
+    def _request_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True, data=None, headers={}, query={}):
         """ Returns the response handle """
         if note is None:
             self.report_download_webpage(video_id)
@@ -356,11 +359,14 @@ class InfoExtractor(object):
         else:
             self.to_screen('%s: %s' % (video_id, note))

-        # data, headers and query params will be ignored for `Request` objects
-        if isinstance(url_or_request, compat_str):
-            if query:
-                url_or_request = update_url_query(url_or_request, query)
-            if data or headers:
-                url_or_request = sanitized_Request(url_or_request, data, headers or {})
+        if isinstance(url_or_request, compat_urllib_request.Request):
+            url_or_request = update_Request(
+                url_or_request, data=data, headers=headers, query=query)
+        else:
+            if query:
+                url_or_request = update_url_query(url_or_request, query)
+            if data or headers:
+                url_or_request = sanitized_Request(url_or_request, data, headers)
         try:
             return self._downloader.urlopen(url_or_request)
         except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
@@ -376,7 +382,7 @@ class InfoExtractor(object):
             self._downloader.report_warning(errmsg)
             return False

-    def _download_webpage_handle(self, url_or_request, video_id, note=None, errnote=None, fatal=True, encoding=None, data=None, headers=None, query=None):
+    def _download_webpage_handle(self, url_or_request, video_id, note=None, errnote=None, fatal=True, encoding=None, data=None, headers={}, query={}):
         """ Returns a tuple (page content as string, URL handle) """

         # Strip hashes from the URL (#1038)
         if isinstance(url_or_request, (compat_str, str)):
@@ -469,7 +475,7 @@ class InfoExtractor(object):
         return content

-    def _download_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True, tries=1, timeout=5, encoding=None, data=None, headers=None, query=None):
+    def _download_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True, tries=1, timeout=5, encoding=None, data=None, headers={}, query={}):
         """ Returns the data of the page as a string """

         success = False
         try_count = 0
@@ -490,7 +496,7 @@ class InfoExtractor(object):
     def _download_xml(self, url_or_request, video_id,
                       note='Downloading XML', errnote='Unable to download XML',
-                      transform_source=None, fatal=True, encoding=None, data=None, headers=None, query=None):
+                      transform_source=None, fatal=True, encoding=None, data=None, headers={}, query={}):
         """Return the xml as an xml.etree.ElementTree.Element"""
         xml_string = self._download_webpage(
             url_or_request, video_id, note, errnote, fatal=fatal, encoding=encoding, data=data, headers=headers, query=query)
@@ -504,7 +510,7 @@ class InfoExtractor(object):
                        note='Downloading JSON metadata',
                        errnote='Unable to download JSON metadata',
                        transform_source=None,
-                       fatal=True, encoding=None, data=None, headers=None, query=None):
+                       fatal=True, encoding=None, data=None, headers={}, query={}):
         json_string = self._download_webpage(
             url_or_request, video_id, note, errnote, fatal=fatal,
             encoding=encoding, data=data, headers=headers, query=query)
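
The removed comment described the old behaviour: data, headers and query were silently dropped whenever the caller passed a prebuilt Request. With update_Request (imported above) the extras are now folded into a rebuilt request. A small sketch, assuming the helper behaves as its call site here suggests:

from youtube_dl.compat import compat_urllib_request
from youtube_dl.utils import update_Request

req = compat_urllib_request.Request('http://example.com/api')
req = update_Request(req, headers={'X-Addr': '127.0.0.1'}, query={'page': '2'})
print(req.get_full_url())  # http://example.com/api?page=2
print(req.header_items())  # the extra header survives (urllib normalizes its case)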
@@ -862,6 +868,7 @@ class InfoExtractor(object):
             proto_preference = 0 if determine_protocol(f) in ['http', 'https'] else -0.1

             if f.get('vcodec') == 'none':  # audio only
+                preference -= 50
                 if self._downloader.params.get('prefer_free_formats'):
                     ORDER = ['aac', 'mp3', 'm4a', 'webm', 'ogg', 'opus']
                 else:
@@ -872,6 +879,8 @@ class InfoExtractor(object):
                 except ValueError:
                     audio_ext_preference = -1
             else:
+                if f.get('acodec') == 'none':  # video only
+                    preference -= 40
                 if self._downloader.params.get('prefer_free_formats'):
                     ORDER = ['flv', 'mp4', 'webm']
                 else:
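
The two new penalties make otherwise equal formats sort as muxed > video-only > audio-only. A standalone toy showing just the effect of the constants (the field names mirror youtube-dl's format dicts):

formats = [
    {'format_id': 'audio-only', 'vcodec': 'none', 'acodec': 'mp4a'},
    {'format_id': 'video-only', 'vcodec': 'avc1', 'acodec': 'none'},
    {'format_id': 'muxed', 'vcodec': 'avc1', 'acodec': 'mp4a'},
]

def preference(f):
    p = 0
    if f.get('vcodec') == 'none':    # audio only
        p -= 50
    elif f.get('acodec') == 'none':  # video only
        p -= 40
    return p

# _sort_formats orders worst first, so the preferred format ends up last
for f in sorted(formats, key=preference):
    print(f['format_id'])  # audio-only, then video-only, then muxed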
@@ -986,6 +995,11 @@ class InfoExtractor(object):
         if not media_nodes:
             manifest_version = '2.0'
             media_nodes = manifest.findall('{http://ns.adobe.com/f4m/2.0}media')
+        # Remove unsupported DRM protected media from final formats
+        # rendition (see https://github.com/rg3/youtube-dl/issues/8573).
+        media_nodes = remove_encrypted_media(media_nodes)
+        if not media_nodes:
+            return formats
         base_url = xpath_text(
             manifest, ['{http://ns.adobe.com/f4m/1.0}baseURL', '{http://ns.adobe.com/f4m/2.0}baseURL'],
             'base URL', default=None)
@@ -1018,8 +1032,6 @@ class InfoExtractor(object):
                 'height': int_or_none(media_el.attrib.get('height')),
                 'preference': preference,
             })
-        self._sort_formats(formats)
-
         return formats

     def _extract_m3u8_formats(self, m3u8_url, video_id, ext=None,
@@ -1140,7 +1152,6 @@ class InfoExtractor(object):
                     last_media = None
                 formats.append(f)
                 last_info = {}
-        self._sort_formats(formats)
         return formats

     @staticmethod
@@ -1297,7 +1308,7 @@ class InfoExtractor(object):
                         'plugin': 'flowplayer-3.2.0.1',
                     }
                 f4m_url += '&' if '?' in f4m_url else '?'
-                f4m_url += compat_urllib_parse.urlencode(f4m_params)
+                f4m_url += compat_urllib_parse_urlencode(f4m_params)
                 formats.extend(self._extract_f4m_formats(f4m_url, video_id, f4m_id='hds', fatal=False))
                 continue
@@ -1314,8 +1325,6 @@ class InfoExtractor(object):
                 })
                 continue

-        self._sort_formats(formats)
-
         return formats

     def _parse_smil_subtitles(self, smil, namespace=None, subtitles_lang='en'):
@@ -1326,7 +1335,7 @@ class InfoExtractor(object):
             if not src or src in urls:
                 continue
             urls.append(src)
-            ext = textstream.get('ext') or determine_ext(src) or mimetype2ext(textstream.get('type'))
+            ext = textstream.get('ext') or mimetype2ext(textstream.get('type')) or determine_ext(src)
             lang = textstream.get('systemLanguage') or textstream.get('systemLanguageName') or textstream.get('lang') or subtitles_lang
             subtitles.setdefault(lang, []).append({
                 'url': src,
@@ -1506,9 +1515,16 @@ class InfoExtractor(object):
                             representation_ms_info['total_number'] = int(math.ceil(float(period_duration) / segment_duration))
                         media_template = representation_ms_info['media_template']
                         media_template = media_template.replace('$RepresentationID$', representation_id)
-                        media_template = re.sub(r'\$(Number|Bandwidth)(?:%(0\d+)d)?\$', r'%(\1)\2d', media_template)
+                        media_template = re.sub(r'\$(Number|Bandwidth)\$', r'%(\1)d', media_template)
+                        media_template = re.sub(r'\$(Number|Bandwidth)%(\d+)\$', r'%(\1)\2d', media_template)
                         media_template.replace('$$', '$')
-                        representation_ms_info['segment_urls'] = [media_template % {'Number': segment_number, 'Bandwidth': representation_attrib.get('bandwidth')} for segment_number in range(representation_ms_info['start_number'], representation_ms_info['total_number'] + representation_ms_info['start_number'])]
+                        representation_ms_info['segment_urls'] = [
+                            media_template % {
+                                'Number': segment_number,
+                                'Bandwidth': representation_attrib.get('bandwidth')}
+                            for segment_number in range(
+                                representation_ms_info['start_number'],
+                                representation_ms_info['total_number'] + representation_ms_info['start_number'])]
                     if 'segment_urls' in representation_ms_info:
                         f.update({
                             'segment_urls': representation_ms_info['segment_urls'],
@@ -1533,7 +1549,6 @@ class InfoExtractor(object):
                     existing_format.update(f)
             else:
                 self.report_warning('Unknown MIME type %s in DASH manifest' % mime_type)
-        self._sort_formats(formats)
         return formats

     def _live_title(self, name):
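
Splitting the single $Number$ regex in two fixes substitution when no printf-style padding is present, since the old pattern's optional padding group could leave the \2 backreference unmatched. A worked example of the unpadded path, with made-up MPD values:

import re

media_template = 'chunk_$RepresentationID$_$Number$.m4s'  # hypothetical SegmentTemplate@media
media_template = media_template.replace('$RepresentationID$', 'video=4000000')
media_template = re.sub(r'\$(Number|Bandwidth)\$', r'%(\1)d', media_template)
# -> 'chunk_video=4000000_%(Number)d.m4s'

segment_urls = [
    media_template % {'Number': n, 'Bandwidth': 4000000}
    for n in range(1, 4)]  # start_number=1, total_number=3
print(segment_urls)
# ['chunk_video=4000000_1.m4s', 'chunk_video=4000000_2.m4s', 'chunk_video=4000000_3.m4s']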

youtube_dl/extractor/commonprotocols.py (new file, 36 lines)

@@ -0,0 +1,36 @@
from __future__ import unicode_literals

import os

from .common import InfoExtractor
from ..compat import (
    compat_urllib_parse_unquote,
    compat_urlparse,
)
from ..utils import url_basename


class RtmpIE(InfoExtractor):
    IE_DESC = False  # Do not list
    _VALID_URL = r'(?i)rtmp[est]?://.+'

    _TESTS = [{
        'url': 'rtmp://cp44293.edgefcs.net/ondemand?auth=daEcTdydfdqcsb8cZcDbAaCbhamacbbawaS-bw7dBb-bWG-GqpGFqCpNCnGoyL&aifp=v001&slist=public/unsecure/audio/2c97899446428e4301471a8cb72b4b97--audio--pmg-20110908-0900a_flv_aac_med_int.mp4',
        'only_matching': True,
    }, {
        'url': 'rtmp://edge.live.hitbox.tv/live/dimak',
        'only_matching': True,
    }]

    def _real_extract(self, url):
        video_id = compat_urllib_parse_unquote(os.path.splitext(url.rstrip('/').split('/')[-1])[0])
        title = compat_urllib_parse_unquote(os.path.splitext(url_basename(url))[0])
        return {
            'id': video_id,
            'title': title,
            'formats': [{
                'url': url,
                'ext': 'flv',
                'format_id': compat_urlparse.urlparse(url).scheme,
            }],
        }
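
RtmpIE never contacts the server: id, title and format are all derived from the URL itself, and the rtmpdump-based downloader does the actual work later. The derivation, using stdlib equivalents of the compat imports:

import os
try:
    from urllib.parse import unquote, urlparse  # Python 3
except ImportError:
    from urllib import unquote  # Python 2
    from urlparse import urlparse

url = 'rtmp://edge.live.hitbox.tv/live/dimak'
video_id = unquote(os.path.splitext(url.rstrip('/').split('/')[-1])[0])
format_id = urlparse(url).scheme
print(video_id, format_id)  # dimak rtmp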

youtube_dl/extractor/condenast.py

@@ -5,7 +5,7 @@ import re

 from .common import InfoExtractor
 from ..compat import (
-    compat_urllib_parse,
+    compat_urllib_parse_urlencode,
     compat_urllib_parse_urlparse,
     compat_urlparse,
 )
@@ -45,7 +45,7 @@ class CondeNastIE(InfoExtractor):
         'wmagazine': 'W Magazine',
     }

-    _VALID_URL = r'http://(?:video|www|player)\.(?P<site>%s)\.com/(?P<type>watch|series|video|embed(?:js)?)/(?P<id>[^/?#]+)' % '|'.join(_SITES.keys())
+    _VALID_URL = r'https?://(?:video|www|player)\.(?P<site>%s)\.com/(?P<type>watch|series|video|embed(?:js)?)/(?P<id>[^/?#]+)' % '|'.join(_SITES.keys())
     IE_DESC = 'Condé Nast media group: %s' % ', '.join(sorted(_SITES.values()))

     EMBED_URL = r'(?:https?:)?//player\.(?P<site>%s)\.com/(?P<type>embed(?:js)?)/.+?' % '|'.join(_SITES.keys())
@@ -97,7 +97,7 @@ class CondeNastIE(InfoExtractor):
         video_id = self._search_regex(r'videoId: [\'"](.+?)[\'"]', params, 'video id')
         player_id = self._search_regex(r'playerId: [\'"](.+?)[\'"]', params, 'player id')
         target = self._search_regex(r'target: [\'"](.+?)[\'"]', params, 'target')
-        data = compat_urllib_parse.urlencode({'videoId': video_id,
+        data = compat_urllib_parse_urlencode({'videoId': video_id,
                                               'playerId': player_id,
                                               'target': target,
                                               })

youtube_dl/extractor/crunchyroll.py

@@ -11,8 +11,8 @@ from math import pow, sqrt, floor
 from .common import InfoExtractor
 from ..compat import (
     compat_etree_fromstring,
-    compat_urllib_parse,
     compat_urllib_parse_unquote,
+    compat_urllib_parse_urlencode,
     compat_urllib_request,
     compat_urlparse,
 )
@@ -54,7 +54,7 @@ class CrunchyrollBaseIE(InfoExtractor):
     def _real_initialize(self):
         self._login()

-    def _download_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True, tries=1, timeout=5, encoding=None):
+    def _download_webpage(self, url_or_request, *args, **kwargs):
         request = (url_or_request if isinstance(url_or_request, compat_urllib_request.Request)
                    else sanitized_Request(url_or_request))
         # Accept-Language must be set explicitly to accept any language to avoid issues
@@ -65,8 +65,7 @@ class CrunchyrollBaseIE(InfoExtractor):
         # Crunchyroll to not work in georestriction cases in some browsers that don't place
         # the locale lang first in header. However allowing any language seems to workaround the issue.
         request.add_header('Accept-Language', '*')
-        return super(CrunchyrollBaseIE, self)._download_webpage(
-            request, video_id, note, errnote, fatal, tries, timeout, encoding)
+        return super(CrunchyrollBaseIE, self)._download_webpage(request, *args, **kwargs)

     @staticmethod
     def _add_skip_wall(url):
@@ -79,7 +78,7 @@ class CrunchyrollBaseIE(InfoExtractor):
         # See https://github.com/rg3/youtube-dl/issues/7202.
         qs['skip_wall'] = ['1']
         return compat_urlparse.urlunparse(
-            parsed_url._replace(query=compat_urllib_parse.urlencode(qs, True)))
+            parsed_url._replace(query=compat_urllib_parse_urlencode(qs, True)))


 class CrunchyrollIE(CrunchyrollBaseIE):
@@ -309,7 +308,7 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
         playerdata_url = compat_urllib_parse_unquote(self._html_search_regex(r'"config_url":"([^"]+)', webpage, 'playerdata_url'))
         playerdata_req = sanitized_Request(playerdata_url)
-        playerdata_req.data = compat_urllib_parse.urlencode({'current_page': webpage_url})
+        playerdata_req.data = urlencode_postdata({'current_page': webpage_url})
         playerdata_req.add_header('Content-Type', 'application/x-www-form-urlencoded')
         playerdata = self._download_webpage(playerdata_req, video_id, note='Downloading media info')
@@ -323,7 +322,7 @@ Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
             streamdata_req = sanitized_Request(
                 'http://www.crunchyroll.com/xml/?req=RpcApiVideoPlayer_GetStandardConfig&media_id=%s&video_format=%s&video_quality=%s'
                 % (stream_id, stream_format, stream_quality),
-                compat_urllib_parse.urlencode({'current_page': url}).encode('utf-8'))
+                compat_urllib_parse_urlencode({'current_page': url}).encode('utf-8'))
             streamdata_req.add_header('Content-Type', 'application/x-www-form-urlencoded')
             streamdata = self._download_xml(
                 streamdata_req, video_id,
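
Replacing the copied-out parameter list with *args, **kwargs keeps this override from breaking whenever the base _download_webpage signature grows (this merge alone added data, headers and query to it). The shape of the pattern, reduced to a self-contained sketch:

class Base(object):
    # stands in for InfoExtractor._download_webpage and its growing signature
    def fetch(self, request, note=None, fatal=True, query=None):
        return '%s (note=%s, fatal=%s, query=%s)' % (request, note, fatal, query)

class AcceptAnyLanguage(Base):
    def fetch(self, request, *args, **kwargs):
        request += ' [Accept-Language: *]'  # tweak the request...
        # ...then forward every other argument untouched
        return super(AcceptAnyLanguage, self).fetch(request, *args, **kwargs)

print(AcceptAnyLanguage().fetch('GET /', note='Downloading webpage'))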

youtube_dl/extractor/cspan.py

@@ -15,7 +15,7 @@ from .senateisvp import SenateISVPIE

 class CSpanIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?c-span\.org/video/\?(?P<id>[0-9a-f]+)'
+    _VALID_URL = r'https?://(?:www\.)?c-span\.org/video/\?(?P<id>[0-9a-f]+)'
     IE_DESC = 'C-SPAN'
     _TESTS = [{
         'url': 'http://www.c-span.org/video/?313572-1/HolderonV',

youtube_dl/extractor/ctsnews.py

@@ -8,7 +8,7 @@ from ..utils import parse_iso8601, ExtractorError

 class CtsNewsIE(InfoExtractor):
     IE_DESC = '華視新聞'
     # https connection failed (Connection reset)
-    _VALID_URL = r'http://news\.cts\.com\.tw/[a-z]+/[a-z]+/\d+/(?P<id>\d+)\.html'
+    _VALID_URL = r'https?://news\.cts\.com\.tw/[a-z]+/[a-z]+/\d+/(?P<id>\d+)\.html'
     _TESTS = [{
         'url': 'http://news.cts.com.tw/cts/international/201501/201501291578109.html',
         'md5': 'a9875cb790252b08431186d741beaabe',

youtube_dl/extractor/cwtv.py

@@ -57,6 +57,7 @@ class CWTVIE(InfoExtractor):
         formats = self._extract_m3u8_formats(
             video_data['videos']['variantplaylist']['uri'], video_id, 'mp4')
+        self._sort_formats(formats)

         thumbnails = [{
             'url': image['uri'],

youtube_dl/extractor/daum.py

@@ -8,8 +8,8 @@ import itertools
 from .common import InfoExtractor
 from ..compat import (
     compat_parse_qs,
-    compat_urllib_parse,
     compat_urllib_parse_unquote,
+    compat_urllib_parse_urlencode,
     compat_urlparse,
 )
 from ..utils import (
@@ -70,7 +70,7 @@ class DaumIE(InfoExtractor):
     def _real_extract(self, url):
         video_id = compat_urllib_parse_unquote(self._match_id(url))
-        query = compat_urllib_parse.urlencode({'vid': video_id})
+        query = compat_urllib_parse_urlencode({'vid': video_id})
         movie_data = self._download_json(
             'http://videofarm.daum.net/controller/api/closed/v1_2/IntegratedMovieData.json?' + query,
             video_id, 'Downloading video formats info')
@@ -86,7 +86,7 @@ class DaumIE(InfoExtractor):
         formats = []
         for format_el in movie_data['output_list']['output_list']:
             profile = format_el['profile']
-            format_query = compat_urllib_parse.urlencode({
+            format_query = compat_urllib_parse_urlencode({
                 'vid': video_id,
                 'profile': profile,
             })

youtube_dl/extractor/dcn.py

@@ -6,7 +6,7 @@ import base64

 from .common import InfoExtractor
 from ..compat import (
-    compat_urllib_parse,
+    compat_urllib_parse_urlencode,
     compat_str,
 )
 from ..utils import (
@@ -15,6 +15,7 @@ from ..utils import (
     sanitized_Request,
     smuggle_url,
     unsmuggle_url,
+    urlencode_postdata,
 )
@@ -106,7 +107,7 @@ class DCNVideoIE(DCNBaseIE):
         webpage = self._download_webpage(
             'http://admin.mangomolo.com/analytics/index.php/customers/embed/video?' +
-            compat_urllib_parse.urlencode({
+            compat_urllib_parse_urlencode({
                 'id': video_data['id'],
                 'user_id': video_data['user_id'],
                 'signature': video_data['signature'],
@@ -133,7 +134,7 @@ class DCNLiveIE(DCNBaseIE):
         webpage = self._download_webpage(
             'http://admin.mangomolo.com/analytics/index.php/customers/embed/index?' +
-            compat_urllib_parse.urlencode({
+            compat_urllib_parse_urlencode({
                 'id': base64.b64encode(channel_data['user_id'].encode()).decode(),
                 'channelid': base64.b64encode(channel_data['id'].encode()).decode(),
                 'signature': channel_data['signature'],
@@ -174,7 +175,7 @@ class DCNSeasonIE(InfoExtractor):
         data['show_id'] = show_id
         request = sanitized_Request(
             'http://admin.mangomolo.com/analytics/index.php/plus/show',
-            compat_urllib_parse.urlencode(data),
+            urlencode_postdata(data),
             {
                 'Origin': 'http://www.dcndigital.ae',
                 'Content-Type': 'application/x-www-form-urlencoded'

youtube_dl/extractor/dctp.py

@@ -6,7 +6,7 @@ from ..compat import compat_str

 class DctpTvIE(InfoExtractor):
-    _VALID_URL = r'http://www.dctp.tv/(#/)?filme/(?P<id>.+?)/$'
+    _VALID_URL = r'https?://www.dctp.tv/(#/)?filme/(?P<id>.+?)/$'
     _TEST = {
         'url': 'http://www.dctp.tv/filme/videoinstallation-fuer-eine-kaufhausfassade/',
         'info_dict': {

youtube_dl/extractor/defense.py

@@ -5,7 +5,7 @@ from .common import InfoExtractor

 class DefenseGouvFrIE(InfoExtractor):
     IE_NAME = 'defense.gouv.fr'
-    _VALID_URL = r'http://.*?\.defense\.gouv\.fr/layout/set/ligthboxvideo/base-de-medias/webtv/(?P<id>[^/?#]*)'
+    _VALID_URL = r'https?://.*?\.defense\.gouv\.fr/layout/set/ligthboxvideo/base-de-medias/webtv/(?P<id>[^/?#]*)'

     _TEST = {
         'url': 'http://www.defense.gouv.fr/layout/set/ligthboxvideo/base-de-medias/webtv/attaque-chimique-syrienne-du-21-aout-2013-1',

youtube_dl/extractor/dfb.py

@@ -38,6 +38,7 @@ class DFBIE(InfoExtractor):
         token_el = f4m_info.find('token')
         manifest_url = token_el.attrib['url'] + '?' + 'hdnea=' + token_el.attrib['auth'] + '&hdcore=3.2.0'
         formats = self._extract_f4m_formats(manifest_url, display_id)
+        self._sort_formats(formats)

         return {
             'id': video_id,

youtube_dl/extractor/discovery.py

@@ -9,7 +9,7 @@ from ..compat import compat_str

 class DiscoveryIE(InfoExtractor):
-    _VALID_URL = r'''(?x)http://(?:www\.)?(?:
+    _VALID_URL = r'''(?x)https?://(?:www\.)?(?:
             discovery|
             investigationdiscovery|
             discoverylife|
@@ -63,11 +63,16 @@ class DiscoveryIE(InfoExtractor):

         video_title = info.get('playlist_title') or info.get('video_title')

-        entries = [{
-            'id': compat_str(video_info['id']),
-            'formats': self._extract_m3u8_formats(
-                video_info['src'], display_id, 'mp4', 'm3u8_native', m3u8_id='hls',
-                note='Download m3u8 information for video %d' % (idx + 1)),
+        entries = []
+
+        for idx, video_info in enumerate(info['playlist']):
+            formats = self._extract_m3u8_formats(
+                video_info['src'], display_id, 'mp4', 'm3u8_native', m3u8_id='hls',
+                note='Download m3u8 information for video %d' % (idx + 1))
+            self._sort_formats(formats)
+            entries.append({
+                'id': compat_str(video_info['id']),
+                'formats': formats,
                 'title': video_info['title'],
                 'description': video_info.get('description'),
                 'duration': parse_duration(video_info.get('video_length')),
@@ -75,6 +80,6 @@ class DiscoveryIE(InfoExtractor):
                 'thumbnail': video_info.get('thumbnailURL'),
                 'alt_title': video_info.get('secondary_title'),
                 'timestamp': parse_iso8601(video_info.get('publishedDate')),
-        } for idx, video_info in enumerate(info['playlist'])]
+            })

         return self.playlist_result(entries, display_id, video_title)

youtube_dl/extractor/douyutv.py

@@ -10,7 +10,7 @@ from ..compat import (compat_str, compat_basestring)

 class DouyuTVIE(InfoExtractor):
     IE_DESC = '斗鱼'
-    _VALID_URL = r'http://(?:www\.)?douyutv\.com/(?P<id>[A-Za-z0-9]+)'
+    _VALID_URL = r'https?://(?:www\.)?douyu(?:tv)?\.com/(?P<id>[A-Za-z0-9]+)'
     _TESTS = [{
         'url': 'http://www.douyutv.com/iseven',
         'info_dict': {
@@ -60,6 +60,9 @@ class DouyuTVIE(InfoExtractor):
         'params': {
             'skip_download': True,
         },
+    }, {
+        'url': 'http://www.douyu.com/xiaocang',
+        'only_matching': True,
     }]

     def _real_extract(self, url):

youtube_dl/extractor/dplay.py

@@ -10,7 +10,7 @@ from ..utils import int_or_none

 class DPlayIE(InfoExtractor):
-    _VALID_URL = r'http://(?P<domain>it\.dplay\.com|www\.dplay\.(?:dk|se|no))/[^/]+/(?P<id>[^/?#]+)'
+    _VALID_URL = r'https?://(?P<domain>it\.dplay\.com|www\.dplay\.(?:dk|se|no))/[^/]+/(?P<id>[^/?#]+)'

     _TESTS = [{
         'url': 'http://it.dplay.com/take-me-out/stagione-1-episodio-25/',
@@ -118,6 +118,8 @@ class DPlayIE(InfoExtractor):
             if info.get(protocol):
                 extract_formats(protocol, info[protocol])

+        self._sort_formats(formats)
+
         return {
             'id': video_id,
             'display_id': display_id,

youtube_dl/extractor/dramafever.py

@@ -6,7 +6,6 @@ import itertools
 from .amp import AMPIE
 from ..compat import (
     compat_HTTPError,
-    compat_urllib_parse,
     compat_urlparse,
 )
 from ..utils import (
@@ -14,6 +13,7 @@ from ..utils import (
     clean_html,
     int_or_none,
     sanitized_Request,
+    urlencode_postdata
 )
@@ -50,7 +50,7 @@ class DramaFeverBaseIE(AMPIE):
         }

         request = sanitized_Request(
-            self._LOGIN_URL, compat_urllib_parse.urlencode(login_form).encode('utf-8'))
+            self._LOGIN_URL, urlencode_postdata(login_form))
         response = self._download_webpage(
             request, None, 'Logging in as %s' % username)

youtube_dl/extractor/dreisat.py

@@ -7,7 +7,7 @@ from .zdf import ZDFIE

 class DreiSatIE(ZDFIE):
     IE_NAME = '3sat'
-    _VALID_URL = r'(?:http://)?(?:www\.)?3sat\.de/mediathek/(?:index\.php|mediathek\.php)?\?(?:(?:mode|display)=[^&]+&)*obj=(?P<id>[0-9]+)$'
+    _VALID_URL = r'(?:https?://)?(?:www\.)?3sat\.de/mediathek/(?:index\.php|mediathek\.php)?\?(?:(?:mode|display)=[^&]+&)*obj=(?P<id>[0-9]+)$'
     _TESTS = [
         {
             'url': 'http://www.3sat.de/mediathek/index.php?mode=play&obj=45918',

youtube_dl/extractor/dvtv.py

@@ -15,7 +15,7 @@ class DVTVIE(InfoExtractor):
     IE_NAME = 'dvtv'
     IE_DESC = 'http://video.aktualne.cz/'

-    _VALID_URL = r'http://video\.aktualne\.cz/(?:[^/]+/)+r~(?P<id>[0-9a-f]{32})'
+    _VALID_URL = r'https?://video\.aktualne\.cz/(?:[^/]+/)+r~(?P<id>[0-9a-f]{32})'

     _TESTS = [{
         'url': 'http://video.aktualne.cz/dvtv/vondra-o-ceskem-stoleti-pri-pohledu-na-havla-mi-bylo-trapne/r~e5efe9ca855511e4833a0025900fea04/',

youtube_dl/extractor/dw.py

@@ -39,13 +39,13 @@ class DWIE(InfoExtractor):
         hidden_inputs = self._hidden_inputs(webpage)
         title = hidden_inputs['media_title']

-        formats = []
         if hidden_inputs.get('player_type') == 'video' and hidden_inputs.get('stream_file') == '1':
             formats = self._extract_smil_formats(
                 'http://www.dw.com/smil/v-%s' % media_id, media_id,
                 transform_source=lambda s: s.replace(
                     'rtmp://tv-od.dw.de/flash/',
                     'http://tv-download.dw.de/dwtv_video/flv/'))
+            self._sort_formats(formats)
         else:
             formats = [{'url': hidden_inputs['file_name']}]
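
transform_source lets an extractor patch the SMIL document before it is parsed; here it maps DW's rtmp streams onto their plain-http download mirrors. The rewrite is just a substring substitution:

def transform_source(s):
    # same substitution the extractor passes to _extract_smil_formats
    return s.replace('rtmp://tv-od.dw.de/flash/',
                     'http://tv-download.dw.de/dwtv_video/flv/')

smil = '<video src="rtmp://tv-od.dw.de/flash/md/clip42.flv"/>'
print(transform_source(smil))
# <video src="http://tv-download.dw.de/dwtv_video/flv/md/clip42.flv"/>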

youtube_dl/extractor/echomsk.py

@@ -7,7 +7,7 @@ from .common import InfoExtractor

 class EchoMskIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?echo\.msk\.ru/sounds/(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?echo\.msk\.ru/sounds/(?P<id>\d+)'
     _TEST = {
         'url': 'http://www.echo.msk.ru/sounds/1464134.html',
         'md5': '2e44b3b78daff5b458e4dbc37f191f7c',

youtube_dl/extractor/eroprofile.py

@@ -3,7 +3,7 @@ from __future__ import unicode_literals
 import re

 from .common import InfoExtractor
-from ..compat import compat_urllib_parse
+from ..compat import compat_urllib_parse_urlencode
 from ..utils import (
     ExtractorError,
     unescapeHTML
@@ -43,7 +43,7 @@ class EroProfileIE(InfoExtractor):
         if username is None:
             return

-        query = compat_urllib_parse.urlencode({
+        query = compat_urllib_parse_urlencode({
             'username': username,
             'password': password,
             'url': 'http://www.eroprofile.com/',

youtube_dl/extractor/exfm.py

@@ -8,7 +8,7 @@ from .common import InfoExtractor
 class ExfmIE(InfoExtractor):
     IE_NAME = 'exfm'
     IE_DESC = 'ex.fm'
-    _VALID_URL = r'http://(?:www\.)?ex\.fm/song/(?P<id>[^/]+)'
+    _VALID_URL = r'https?://(?:www\.)?ex\.fm/song/(?P<id>[^/]+)'
     _SOUNDCLOUD_URL = r'http://(?:www\.)?api\.soundcloud\.com/tracks/([^/]+)/stream'
     _TESTS = [
         {

youtube_dl/extractor/fc2.py

@@ -5,19 +5,18 @@ import hashlib

 from .common import InfoExtractor
 from ..compat import (
-    compat_urllib_parse,
     compat_urllib_request,
     compat_urlparse,
 )
 from ..utils import (
-    encode_dict,
     ExtractorError,
     sanitized_Request,
+    urlencode_postdata,
 )


 class FC2IE(InfoExtractor):
-    _VALID_URL = r'^http://video\.fc2\.com/(?:[^/]+/)*content/(?P<id>[^/]+)'
+    _VALID_URL = r'^https?://video\.fc2\.com/(?:[^/]+/)*content/(?P<id>[^/]+)'
     IE_NAME = 'fc2'
     _NETRC_MACHINE = 'fc2'

     _TESTS = [{
@@ -57,7 +56,7 @@ class FC2IE(InfoExtractor):
             'Submit': ' Login ',
         }

-        login_data = compat_urllib_parse.urlencode(encode_dict(login_form_strs)).encode('utf-8')
+        login_data = urlencode_postdata(login_form_strs)
         request = sanitized_Request(
             'https://secure.id.fc2.com/index.php?mode=login&switch_language=en', login_data)

youtube_dl/extractor/firstpost.py

@@ -4,7 +4,7 @@ from .common import InfoExtractor

 class FirstpostIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?firstpost\.com/[^/]+/.*-(?P<id>[0-9]+)\.html'
+    _VALID_URL = r'https?://(?:www\.)?firstpost\.com/[^/]+/.*-(?P<id>[0-9]+)\.html'

     _TEST = {
         'url': 'http://www.firstpost.com/india/india-to-launch-indigenous-aircraft-carrier-monday-1025403.html',

youtube_dl/extractor/firsttv.py

@@ -8,7 +8,7 @@ from ..utils import int_or_none
 class FirstTVIE(InfoExtractor):
     IE_NAME = '1tv'
     IE_DESC = 'Первый канал'
-    _VALID_URL = r'http://(?:www\.)?1tv\.ru/(?:[^/]+/)+(?P<id>.+)'
+    _VALID_URL = r'https?://(?:www\.)?1tv\.ru/(?:[^/]+/)+(?P<id>.+)'

     _TESTS = [{
         'url': 'http://www.1tv.ru/videoarchive/73390',

youtube_dl/extractor/fivemin.py

@@ -4,8 +4,8 @@ import re

 from .common import InfoExtractor
 from ..compat import (
-    compat_urllib_parse,
     compat_parse_qs,
+    compat_urllib_parse_urlencode,
     compat_urllib_parse_urlparse,
     compat_urlparse,
 )
@@ -109,7 +109,7 @@ class FiveMinIE(InfoExtractor):
         response = self._download_json(
             'https://syn.5min.com/handlers/SenseHandler.ashx?' +
-            compat_urllib_parse.urlencode({
+            compat_urllib_parse_urlencode({
                 'func': 'GetResults',
                 'playlist': video_id,
                 'sid': sid,

youtube_dl/extractor/fktv.py

@@ -10,7 +10,7 @@ from ..utils import (

 class FKTVIE(InfoExtractor):
     IE_NAME = 'fernsehkritik.tv'
-    _VALID_URL = r'http://(?:www\.)?fernsehkritik\.tv/folge-(?P<id>[0-9]+)(?:/.*)?'
+    _VALID_URL = r'https?://(?:www\.)?fernsehkritik\.tv/folge-(?P<id>[0-9]+)(?:/.*)?'

     _TEST = {
         'url': 'http://fernsehkritik.tv/folge-1',

youtube_dl/extractor/flickr.py

@@ -1,7 +1,7 @@
 from __future__ import unicode_literals

 from .common import InfoExtractor
-from ..compat import compat_urllib_parse
+from ..compat import compat_urllib_parse_urlencode
 from ..utils import (
     ExtractorError,
     int_or_none,
@@ -42,7 +42,7 @@ class FlickrIE(InfoExtractor):
         }
         if secret:
             query['secret'] = secret
-        data = self._download_json(self._API_BASE_URL + compat_urllib_parse.urlencode(query), video_id, note)
+        data = self._download_json(self._API_BASE_URL + compat_urllib_parse_urlencode(query), video_id, note)
         if data['stat'] != 'ok':
             raise ExtractorError(data['message'])
         return data

youtube_dl/extractor/footyroom.py

@@ -5,7 +5,7 @@ from .common import InfoExtractor

 class FootyRoomIE(InfoExtractor):
-    _VALID_URL = r'http://footyroom\.com/(?P<id>[^/]+)'
+    _VALID_URL = r'https?://footyroom\.com/(?P<id>[^/]+)'
     _TESTS = [{
         'url': 'http://footyroom.com/schalke-04-0-2-real-madrid-2015-02/',
         'info_dict': {

youtube_dl/extractor/fox.py

@@ -16,6 +16,9 @@ class FOXIE(InfoExtractor):
             'title': 'Official Trailer: Gotham',
             'description': 'Tracing the rise of the great DC Comics Super-Villains and vigilantes, Gotham reveals an entirely new chapter that has never been told.',
             'duration': 129,
+            'timestamp': 1400020798,
+            'upload_date': '20140513',
+            'uploader': 'NEWA-FNG-FOXCOM',
         },
         'add_ie': ['ThePlatform'],
     }

youtube_dl/extractor/foxgay.py

@@ -4,7 +4,7 @@ from .common import InfoExtractor

 class FoxgayIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?foxgay\.com/videos/(?:\S+-)?(?P<id>\d+)\.shtml'
+    _VALID_URL = r'https?://(?:www\.)?foxgay\.com/videos/(?:\S+-)?(?P<id>\d+)\.shtml'
     _TEST = {
         'url': 'http://foxgay.com/videos/fuck-turkish-style-2582.shtml',
         'md5': '80d72beab5d04e1655a56ad37afe6841',

youtube_dl/extractor/foxnews.py

@@ -18,8 +18,8 @@ class FoxNewsIE(AMPIE):
                 'title': 'Frozen in Time',
                 'description': '16-year-old girl is size of toddler',
                 'duration': 265,
-                # 'timestamp': 1304411491,
-                # 'upload_date': '20110503',
+                'timestamp': 1304411491,
+                'upload_date': '20110503',
                 'thumbnail': 're:^https?://.*\.jpg$',
             },
         },
@@ -32,8 +32,8 @@ class FoxNewsIE(AMPIE):
                 'title': "Rep. Luis Gutierrez on if Obama's immigration plan is legal",
                 'description': "Congressman discusses president's plan",
                 'duration': 292,
-                # 'timestamp': 1417662047,
-                # 'upload_date': '20141204',
+                'timestamp': 1417662047,
+                'upload_date': '20141204',
                 'thumbnail': 're:^https?://.*\.jpg$',
             },
             'params': {

youtube_dl/extractor/franceinter.py

@@ -6,7 +6,7 @@ from ..utils import int_or_none

 class FranceInterIE(InfoExtractor):
-    _VALID_URL = r'http://(?:www\.)?franceinter\.fr/player/reecouter\?play=(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://(?:www\.)?franceinter\.fr/player/reecouter\?play=(?P<id>[0-9]+)'
     _TEST = {
         'url': 'http://www.franceinter.fr/player/reecouter?play=793962',
         'md5': '4764932e466e6f6c79c317d2e74f6884',

youtube_dl/extractor/francetv.py

@@ -60,21 +60,23 @@ class FranceTVBaseInfoExtractor(InfoExtractor):
                     video_id, 'Downloading f4m manifest token', fatal=False)
                 if f4m_url:
                     formats.extend(self._extract_f4m_formats(
-                        f4m_url + '&hdcore=3.7.0&plugin=aasp-3.7.0.39.44', video_id, 1, format_id))
+                        f4m_url + '&hdcore=3.7.0&plugin=aasp-3.7.0.39.44',
+                        video_id, f4m_id=format_id, fatal=False))
             elif ext == 'm3u8':
-                formats.extend(self._extract_m3u8_formats(video_url, video_id, 'mp4', m3u8_id=format_id))
+                formats.extend(self._extract_m3u8_formats(
+                    video_url, video_id, 'mp4', entry_protocol='m3u8_native',
+                    m3u8_id=format_id, fatal=False))
             elif video_url.startswith('rtmp'):
                 formats.append({
                     'url': video_url,
                     'format_id': 'rtmp-%s' % format_id,
                     'ext': 'flv',
-                    'preference': 1,
                 })
             else:
-                formats.append({
-                    'url': video_url,
-                    'format_id': format_id,
-                    'preference': -1,
-                })
+                if self._is_valid_url(video_url, video_id, format_id):
+                    formats.append({
+                        'url': video_url,
+                        'format_id': format_id,
+                    })
         self._sort_formats(formats)
@@ -82,6 +84,7 @@ class FranceTVBaseInfoExtractor(InfoExtractor):
         subtitle = info.get('sous_titre')
         if subtitle:
             title += ' - %s' % subtitle
+        title = title.strip()

         subtitles = {}
         subtitles_list = [{
@@ -125,13 +128,13 @@ class FranceTvInfoIE(FranceTVBaseInfoExtractor):
 class FranceTvInfoIE(FranceTVBaseInfoExtractor):
     IE_NAME = 'francetvinfo.fr'

-    _VALID_URL = r'https?://(?:www|mobile)\.francetvinfo\.fr/.*/(?P<title>.+)\.html'
+    _VALID_URL = r'https?://(?:www|mobile|france3-regions)\.francetvinfo\.fr/.*/(?P<title>.+)\.html'

     _TESTS = [{
         'url': 'http://www.francetvinfo.fr/replay-jt/france-3/soir-3/jt-grand-soir-3-lundi-26-aout-2013_393427.html',
         'info_dict': {
             'id': '84981923',
-            'ext': 'flv',
+            'ext': 'mp4',
             'title': 'Soir 3',
             'upload_date': '20130826',
             'timestamp': 1377548400,
@@ -139,6 +142,10 @@ class FranceTvInfoIE(FranceTVBaseInfoExtractor):
                 'fr': 'mincount:2',
             },
         },
+        'params': {
+            # m3u8 downloads
+            'skip_download': True,
+        },
     }, {
         'url': 'http://www.francetvinfo.fr/elections/europeennes/direct-europeennes-regardez-le-debat-entre-les-candidats-a-la-presidence-de-la-commission_600639.html',
         'info_dict': {
@@ -155,11 +162,32 @@ class FranceTvInfoIE(FranceTVBaseInfoExtractor):
         'url': 'http://www.francetvinfo.fr/economie/entreprises/les-entreprises-familiales-le-secret-de-la-reussite_933271.html',
         'md5': 'f485bda6e185e7d15dbc69b72bae993e',
         'info_dict': {
-            'id': '556e03339473995ee145930c',
+            'id': 'NI_173343',
             'ext': 'mp4',
             'title': 'Les entreprises familiales : le secret de la réussite',
             'thumbnail': 're:^https?://.*\.jpe?g$',
-        }
+            'timestamp': 1433273139,
+            'upload_date': '20150602',
+        },
+        'params': {
+            # m3u8 downloads
+            'skip_download': True,
+        },
+    }, {
+        'url': 'http://france3-regions.francetvinfo.fr/bretagne/cotes-d-armor/thalassa-echappee-breizh-ce-venredi-dans-les-cotes-d-armor-954961.html',
+        'md5': 'f485bda6e185e7d15dbc69b72bae993e',
+        'info_dict': {
+            'id': 'NI_657393',
+            'ext': 'mp4',
+            'title': 'Olivier Monthus, réalisateur de "Bretagne, le choix de l’Armor"',
+            'description': 'md5:a3264114c9d29aeca11ced113c37b16c',
+            'thumbnail': 're:^https?://.*\.jpe?g$',
+            'timestamp': 1458300695,
+            'upload_date': '20160318',
+        },
+        'params': {
+            'skip_download': True,
+        },
     }]

     def _real_extract(self, url):
@@ -172,7 +200,9 @@ class FranceTvInfoIE(FranceTVBaseInfoExtractor):
             return self.url_result(dmcloud_url, 'DailymotionCloud')

         video_id, catalogue = self._search_regex(
-            r'id-video=([^@]+@[^"]+)', webpage, 'video id').split('@')
+            (r'id-video=([^@]+@[^"]+)',
+             r'<a[^>]+href="(?:https?:)?//videos\.francetv\.fr/video/([^@]+@[^"]+)"'),
+            webpage, 'video id').split('@')

         return self._extract_video(video_id, catalogue)
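
The video id lookup gained a second pattern for the france3-regions pages, relying on _search_regex accepting a tuple of patterns and trying them in order. A simplified stand-in run against a made-up snippet of a regional page:

import re

def search_regex(patterns, text):
    # reduced model of _search_regex's tuple handling: first match wins
    for p in (patterns if isinstance(patterns, (list, tuple)) else [patterns]):
        m = re.search(p, text)
        if m:
            return m.group(1)

webpage = '<a href="//videos.francetv.fr/video/NI_657393@Regions">watch</a>'
video_id, catalogue = search_regex(
    (r'id-video=([^@]+@[^"]+)',
     r'<a[^>]+href="(?:https?:)?//videos\.francetv\.fr/video/([^@]+@[^"]+)"'),
    webpage).split('@')
print(video_id, catalogue)  # NI_657393 Regions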

youtube_dl/extractor/freevideo.py

@@ -5,7 +5,7 @@ from ..utils import ExtractorError

 class FreeVideoIE(InfoExtractor):
-    _VALID_URL = r'^http://www.freevideo.cz/vase-videa/(?P<id>[^.]+)\.html(?:$|[?#])'
+    _VALID_URL = r'^https?://www.freevideo.cz/vase-videa/(?P<id>[^.]+)\.html(?:$|[?#])'

     _TEST = {
         'url': 'http://www.freevideo.cz/vase-videa/vysukany-zadecek-22033.html',

youtube_dl/extractor/funimation.py

@@ -5,7 +5,6 @@ from .common import InfoExtractor
 from ..utils import (
     clean_html,
     determine_ext,
-    encode_dict,
     int_or_none,
     sanitized_Request,
     ExtractorError,
@@ -54,10 +53,10 @@ class FunimationIE(InfoExtractor):
         (username, password) = self._get_login_info()
         if username is None:
             return
-        data = urlencode_postdata(encode_dict({
+        data = urlencode_postdata({
             'email_field': username,
             'password_field': password,
-        }))
+        })
         login_request = sanitized_Request('http://www.funimation.com/login', data, headers={
             'User-Agent': 'Mozilla/5.0 (Windows NT 5.2; WOW64; rv:42.0) Gecko/20100101 Firefox/42.0',
             'Content-Type': 'application/x-www-form-urlencoded'

Some files were not shown because too many files have changed in this diff.