Gilles Habran 2016-06-08 08:04:25 +02:00
commit f8c7bee0c2
23 changed files with 658 additions and 348 deletions

@ -6,8 +6,8 @@
--- ---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.06.02*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected. ### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.06.03*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.06.02** - [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.06.03**
### Before submitting an *issue* make sure you have: ### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections - [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: [] [debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj'] [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2016.06.02 [debug] youtube-dl version 2016.06.03
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {} [debug] Proxy map: {}

@ -17,7 +17,7 @@ youtube-dl - download videos from youtube.com or other video platforms
To install it right away for all UNIX users (Linux, OS X, etc.), type: To install it right away for all UNIX users (Linux, OS X, etc.), type:
sudo curl https://yt-dl.org/latest/youtube-dl -o /usr/local/bin/youtube-dl sudo curl -L https://yt-dl.org/latest/youtube-dl -o /usr/local/bin/youtube-dl
sudo chmod a+rx /usr/local/bin/youtube-dl sudo chmod a+rx /usr/local/bin/youtube-dl
If you do not have curl, you can alternatively use a recent wget: If you do not have curl, you can alternatively use a recent wget:
@ -27,13 +27,19 @@ If you do not have curl, you can alternatively use a recent wget:
Windows users can [download an .exe file](https://yt-dl.org/latest/youtube-dl.exe) and place it in any location on their [PATH](http://en.wikipedia.org/wiki/PATH_%28variable%29) except for `%SYSTEMROOT%\System32` (e.g. **do not** put in `C:\Windows\System32`). Windows users can [download an .exe file](https://yt-dl.org/latest/youtube-dl.exe) and place it in any location on their [PATH](http://en.wikipedia.org/wiki/PATH_%28variable%29) except for `%SYSTEMROOT%\System32` (e.g. **do not** put in `C:\Windows\System32`).
OS X users can install **youtube-dl** with [Homebrew](http://brew.sh/). You can also use pip:
sudo pip install --upgrade youtube-dl
This command will update youtube-dl if you have already installed it. See the [pypi page](https://pypi.python.org/pypi/youtube_dl) for more information.
OS X users can install youtube-dl with [Homebrew](http://brew.sh/):
brew install youtube-dl brew install youtube-dl
You can also use pip: Or with [MacPorts](https://www.macports.org/):
sudo pip install youtube-dl sudo port install youtube-dl
Alternatively, refer to the [developer instructions](#developer-instructions) for how to check out and work with the git repository. For further options, including PGP signatures, see the [youtube-dl Download Page](https://rg3.github.io/youtube-dl/download.html). Alternatively, refer to the [developer instructions](#developer-instructions) for how to check out and work with the git repository. For further options, including PGP signatures, see the [youtube-dl Download Page](https://rg3.github.io/youtube-dl/download.html).
@ -842,6 +848,12 @@ It is *not* possible to detect whether a URL is supported or not. That's because
If you want to find out whether a given URL is supported, simply call youtube-dl with it. If you get no videos back, chances are the URL is either not referring to a video or unsupported. You can find out which by examining the output (if you run youtube-dl on the console) or catching an `UnsupportedError` exception if you run it from a Python program. If you want to find out whether a given URL is supported, simply call youtube-dl with it. If you get no videos back, chances are the URL is either not referring to a video or unsupported. You can find out which by examining the output (if you run youtube-dl on the console) or catching an `UnsupportedError` exception if you run it from a Python program.
# Why do I need to go through that much red tape when filing bugs?
Before we had the issue template, despite our extensive [bug reporting instructions](#bugs), about 80% of the issue reports we got were useless, for instance because people used ancient versions hundreds of releases old, because of simple syntactic errors (not in youtube-dl but in general shell usage), because the problem was already reported multiple times before, because people did not actually read an error message, even if it said "please install ffmpeg", because people did not mention the URL they were trying to download, and many more simple, easy-to-avoid problems, many of which were totally unrelated to youtube-dl.
youtube-dl is an open-source project manned by too few volunteers, so we'd rather spend time fixing bugs where we are certain none of those simple problems apply, and where we can be reasonably confident to be able to reproduce the issue without asking the reporter repeatedly. As such, the output of `youtube-dl -v YOUR_URL_HERE` is really all that's required to file an issue. The issue template also guides you through some basic steps you can do, such as checking that your version of youtube-dl is current.
# DEVELOPER INSTRUCTIONS # DEVELOPER INSTRUCTIONS
Most users do not need to build youtube-dl and can [download the builds](http://rg3.github.io/youtube-dl/download.html) or get them from their distribution. Most users do not need to build youtube-dl and can [download the builds](http://rg3.github.io/youtube-dl/download.html) or get them from their distribution.

@ -13,6 +13,7 @@ import os.path
sys.path.insert(0, os.path.dirname(os.path.dirname((os.path.abspath(__file__))))) sys.path.insert(0, os.path.dirname(os.path.dirname((os.path.abspath(__file__)))))
from youtube_dl.compat import ( from youtube_dl.compat import (
compat_input,
compat_http_server, compat_http_server,
compat_str, compat_str,
compat_urlparse, compat_urlparse,
@ -30,11 +31,6 @@ try:
except ImportError: # Python 2 except ImportError: # Python 2
import SocketServer as compat_socketserver import SocketServer as compat_socketserver
try:
compat_input = raw_input
except NameError: # Python 3
compat_input = input
class BuildHTTPServer(compat_socketserver.ThreadingMixIn, compat_http_server.HTTPServer): class BuildHTTPServer(compat_socketserver.ThreadingMixIn, compat_http_server.HTTPServer):
allow_reuse_address = True allow_reuse_address = True

@ -0,0 +1,111 @@
#!/usr/bin/env python
from __future__ import unicode_literals

import base64
import json
import mimetypes
import netrc
import optparse
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from youtube_dl.compat import (
    compat_basestring,
    compat_input,
    compat_getpass,
    compat_print,
    compat_urllib_request,
)
from youtube_dl.utils import (
    make_HTTPS_handler,
    sanitized_Request,
)


class GitHubReleaser(object):
    _API_URL = 'https://api.github.com/repos/rg3/youtube-dl/releases'
    _UPLOADS_URL = 'https://uploads.github.com/repos/rg3/youtube-dl/releases/%s/assets?name=%s'
    _NETRC_MACHINE = 'github.com'

    def __init__(self, debuglevel=0):
        self._init_github_account()
        https_handler = make_HTTPS_handler({}, debuglevel=debuglevel)
        self._opener = compat_urllib_request.build_opener(https_handler)

    def _init_github_account(self):
        try:
            info = netrc.netrc().authenticators(self._NETRC_MACHINE)
            if info is not None:
                self._username = info[0]
                self._password = info[2]
                compat_print('Using GitHub credentials found in .netrc...')
                return
            else:
                compat_print('No GitHub credentials found in .netrc')
        except (IOError, netrc.NetrcParseError):
            compat_print('Unable to parse .netrc')
        self._username = compat_input(
            'Type your GitHub username or email address and press [Return]: ')
        self._password = compat_getpass(
            'Type your GitHub password and press [Return]: ')

    def _call(self, req):
        if isinstance(req, compat_basestring):
            req = sanitized_Request(req)
        # Authorizing manually since GitHub does not respond with 401 with
        # WWW-Authenticate header set (see
        # https://developer.github.com/v3/#basic-authentication)
        b64 = base64.b64encode(
            ('%s:%s' % (self._username, self._password)).encode('utf-8')).decode('ascii')
        req.add_header('Authorization', 'Basic %s' % b64)
        response = self._opener.open(req).read().decode('utf-8')
        return json.loads(response)

    def list_releases(self):
        return self._call(self._API_URL)

    def create_release(self, tag_name, name=None, body='', draft=False, prerelease=False):
        data = {
            'tag_name': tag_name,
            'target_commitish': 'master',
            'name': name,
            'body': body,
            'draft': draft,
            'prerelease': prerelease,
        }
        req = sanitized_Request(self._API_URL, json.dumps(data).encode('utf-8'))
        return self._call(req)

    def create_asset(self, release_id, asset):
        asset_name = os.path.basename(asset)
        url = self._UPLOADS_URL % (release_id, asset_name)
        # Our files are small enough to be loaded directly into memory.
        data = open(asset, 'rb').read()
        req = sanitized_Request(url, data)
        mime_type, _ = mimetypes.guess_type(asset_name)
        req.add_header('Content-Type', mime_type or 'application/octet-stream')
        return self._call(req)


def main():
    parser = optparse.OptionParser(usage='%prog VERSION BUILDPATH')
    options, args = parser.parse_args()
    if len(args) != 2:
        parser.error('Expected a version and a build directory')

    version, build_path = args

    releaser = GitHubReleaser()

    new_release = releaser.create_release(version, name='youtube-dl %s' % version)
    release_id = new_release['id']

    for asset in os.listdir(build_path):
        compat_print('Uploading %s...' % asset)
        releaser.create_asset(release_id, os.path.join(build_path, asset))


if __name__ == '__main__':
    main()
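
Reviewer note: `_call` in the new release script builds the `Authorization` header by hand instead of waiting for a 401 challenge. The sketch below isolates that one step so it can be checked without network access. It assumes Python 3, and the helper name `basic_auth_header` is hypothetical; the commit inlines this logic rather than defining such a function.

```python
import base64


def basic_auth_header(username, password):
    """Return the value for an HTTP Basic ``Authorization`` header.

    Hypothetical helper for illustration only; it mirrors the
    encode -> base64 -> decode chain used inside ``_call``.
    """
    raw = ('%s:%s' % (username, password)).encode('utf-8')
    return 'Basic %s' % base64.b64encode(raw).decode('ascii')


# Example use: the result would be passed to req.add_header('Authorization', ...)
header_value = basic_auth_header('user', 'pass')
```

Keeping the encoding in one small function makes it easy to verify that the credentials are UTF-8 encoded before base64 and that the header value is a text string, which is what `add_header` expects.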

@ -95,15 +95,16 @@ RELEASE_FILES="youtube-dl youtube-dl.exe youtube-dl-$version.tar.gz"
(cd build/$version/ && sha256sum $RELEASE_FILES > SHA2-256SUMS) (cd build/$version/ && sha256sum $RELEASE_FILES > SHA2-256SUMS)
(cd build/$version/ && sha512sum $RELEASE_FILES > SHA2-512SUMS) (cd build/$version/ && sha512sum $RELEASE_FILES > SHA2-512SUMS)
/bin/echo -e "\n### Signing and uploading the new binaries to yt-dl.org ..." /bin/echo -e "\n### Signing and uploading the new binaries to GitHub..."
for f in $RELEASE_FILES; do gpg --passphrase-repeat 5 --detach-sig "build/$version/$f"; done for f in $RELEASE_FILES; do gpg --passphrase-repeat 5 --detach-sig "build/$version/$f"; done
scp -r "build/$version" ytdl@yt-dl.org:html/tmp/
ssh ytdl@yt-dl.org "mv html/tmp/$version html/downloads/" ROOT=$(pwd)
python devscripts/create-github-release.py $version "$ROOT/build/$version"
ssh ytdl@yt-dl.org "sh html/update_latest.sh $version" ssh ytdl@yt-dl.org "sh html/update_latest.sh $version"
/bin/echo -e "\n### Now switching to gh-pages..." /bin/echo -e "\n### Now switching to gh-pages..."
git clone --branch gh-pages --single-branch . build/gh-pages git clone --branch gh-pages --single-branch . build/gh-pages
ROOT=$(pwd)
( (
set -e set -e
ORIGIN_URL=$(git config --get remote.origin.url) ORIGIN_URL=$(git config --get remote.origin.url)

@ -43,8 +43,8 @@
- **appletrailers:section** - **appletrailers:section**
- **archive.org**: archive.org videos - **archive.org**: archive.org videos
- **ARD** - **ARD**
- **ARD:mediathek**
- **ARD:mediathek**: Saarländischer Rundfunk - **ARD:mediathek**: Saarländischer Rundfunk
- **ARD:mediathek**
- **arte.tv** - **arte.tv**
- **arte.tv:+7** - **arte.tv:+7**
- **arte.tv:cinema** - **arte.tv:cinema**
@ -339,6 +339,7 @@
- **livestream** - **livestream**
- **livestream:original** - **livestream:original**
- **LnkGo** - **LnkGo**
- **loc**: Library of Congress
- **LocalNews8** - **LocalNews8**
- **LoveHomePorn** - **LoveHomePorn**
- **lrt.lt** - **lrt.lt**
@ -528,7 +529,8 @@
- **Restudy** - **Restudy**
- **Reuters** - **Reuters**
- **ReverbNation** - **ReverbNation**
- **Revision3** - **revision**
- **revision3:embed**
- **RICE** - **RICE**
- **RingTV** - **RingTV**
- **RottenTomatoes** - **RottenTomatoes**
@ -567,6 +569,7 @@
- **ScreencastOMatic** - **ScreencastOMatic**
- **ScreenJunkies** - **ScreenJunkies**
- **ScreenwaveMedia** - **ScreenwaveMedia**
- **Seeker**
- **SenateISVP** - **SenateISVP**
- **SendtoNews** - **SendtoNews**
- **ServingSys** - **ServingSys**

@ -18,7 +18,6 @@ from .options import (
from .compat import ( from .compat import (
compat_expanduser, compat_expanduser,
compat_getpass, compat_getpass,
compat_print,
compat_shlex_split, compat_shlex_split,
workaround_optparse_bug9161, workaround_optparse_bug9161,
) )
@ -76,7 +75,7 @@ def _real_main(argv=None):
# Dump user agent # Dump user agent
if opts.dump_user_agent: if opts.dump_user_agent:
compat_print(std_headers['User-Agent']) write_string(std_headers['User-Agent'] + '\n', out=sys.stdout)
sys.exit(0) sys.exit(0)
# Batch file verification # Batch file verification
@ -101,10 +100,10 @@ def _real_main(argv=None):
if opts.list_extractors: if opts.list_extractors:
for ie in list_extractors(opts.age_limit): for ie in list_extractors(opts.age_limit):
compat_print(ie.IE_NAME + (' (CURRENTLY BROKEN)' if not ie._WORKING else '')) write_string(ie.IE_NAME + (' (CURRENTLY BROKEN)' if not ie._WORKING else '') + '\n', out=sys.stdout)
matchedUrls = [url for url in all_urls if ie.suitable(url)] matchedUrls = [url for url in all_urls if ie.suitable(url)]
for mu in matchedUrls: for mu in matchedUrls:
compat_print(' ' + mu) write_string(' ' + mu + '\n', out=sys.stdout)
sys.exit(0) sys.exit(0)
if opts.list_extractor_descriptions: if opts.list_extractor_descriptions:
for ie in list_extractors(opts.age_limit): for ie in list_extractors(opts.age_limit):
@ -117,7 +116,7 @@ def _real_main(argv=None):
_SEARCHES = ('cute kittens', 'slithering pythons', 'falling cat', 'angry poodle', 'purple fish', 'running tortoise', 'sleeping bunny', 'burping cow') _SEARCHES = ('cute kittens', 'slithering pythons', 'falling cat', 'angry poodle', 'purple fish', 'running tortoise', 'sleeping bunny', 'burping cow')
_COUNTS = ('', '5', '10', 'all') _COUNTS = ('', '5', '10', 'all')
desc += ' (Example: "%s%s:%s" )' % (ie.SEARCH_KEY, random.choice(_COUNTS), random.choice(_SEARCHES)) desc += ' (Example: "%s%s:%s" )' % (ie.SEARCH_KEY, random.choice(_COUNTS), random.choice(_SEARCHES))
compat_print(desc) write_string(desc + '\n', out=sys.stdout)
sys.exit(0) sys.exit(0)
# Conflicting, missing and erroneous options # Conflicting, missing and erroneous options

@ -482,6 +482,11 @@ if sys.version_info < (3, 0) and sys.platform == 'win32':
else: else:
compat_getpass = getpass.getpass compat_getpass = getpass.getpass
try:
compat_input = raw_input
except NameError: # Python 3
compat_input = input
# Python < 2.6.5 require kwargs to be bytes # Python < 2.6.5 require kwargs to be bytes
try: try:
def _testfunc(x): def _testfunc(x):
@ -623,6 +628,7 @@ __all__ = [
'compat_html_entities', 'compat_html_entities',
'compat_http_client', 'compat_http_client',
'compat_http_server', 'compat_http_server',
'compat_input',
'compat_itertools_count', 'compat_itertools_count',
'compat_kwargs', 'compat_kwargs',
'compat_ord', 'compat_ord',
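
Reviewer note: the `compat_input` shim moved into `compat.py` relies on `raw_input` raising `NameError` on Python 3. A minimal standalone sketch of the same pattern follows. The function `ask_username` and its `read` parameter are hypothetical, added only so the behaviour can be exercised without a terminal.

```python
try:
    # Python 2: raw_input returns the typed line as text without evaluating it
    compat_input = raw_input
except NameError:
    # Python 3: raw_input is gone and input has the same behaviour
    compat_input = input


def ask_username(read=compat_input):
    """Prompt for a username and strip surrounding whitespace.

    ``read`` is an illustrative injection point, not part of the commit.
    """
    return read('Type your GitHub username and press [Return]: ').strip()
```

Injecting the reader keeps the prompt logic testable, while callers that pass nothing get the interpreter-appropriate `input` function.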

@ -23,11 +23,17 @@ class HlsFD(FragmentFD):
UNSUPPORTED_FEATURES = ( UNSUPPORTED_FEATURES = (
r'#EXT-X-KEY:METHOD=(?!NONE)', # encrypted streams [1] r'#EXT-X-KEY:METHOD=(?!NONE)', # encrypted streams [1]
r'#EXT-X-BYTERANGE', # playlists composed of byte ranges of media files [2] r'#EXT-X-BYTERANGE', # playlists composed of byte ranges of media files [2]
# Live streams heuristic does not always work (e.g. geo restricted to Germany # Live streams heuristic does not always work (e.g. geo restricted to Germany
# http://hls-geo.daserste.de/i/videoportal/Film/c_620000/622873/format,716451,716457,716450,716458,716459,.mp4.csmil/index_4_av.m3u8?null=0) # http://hls-geo.daserste.de/i/videoportal/Film/c_620000/622873/format,716451,716457,716450,716458,716459,.mp4.csmil/index_4_av.m3u8?null=0)
# r'#EXT-X-MEDIA-SEQUENCE:(?!0$)', # live streams [3] # r'#EXT-X-MEDIA-SEQUENCE:(?!0$)', # live streams [3]
r'#EXT-X-PLAYLIST-TYPE:EVENT', # media segments may be appended to the end of
# event media playlists [4] # This heuristic also is not correct since segments may not be appended as well.
# Twitch VODs of finished streams have EXT-X-PLAYLIST-TYPE:EVENT even though
# no more segments will be appended to the end of the playlist.
# r'#EXT-X-PLAYLIST-TYPE:EVENT', # media segments may be appended to the end of
# # event media playlists [4]
# 1. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.2.4 # 1. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.2.4
# 2. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.2.2 # 2. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.2.2
# 3. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3.2 # 3. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3.2
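
Reviewer note: the effect of commenting out the `EVENT` pattern can be sketched with a small check over the remaining patterns. This is an illustration under assumptions, not the downloader's actual code: the function name `can_download` and the sample manifests are made up for the example, and only the two patterns still active after this hunk are included.

```python
import re

# The two patterns that remain active after this change
UNSUPPORTED_FEATURES = (
    r'#EXT-X-KEY:METHOD=(?!NONE)',  # encrypted streams
    r'#EXT-X-BYTERANGE',            # playlists composed of byte ranges
)


def can_download(manifest):
    """Return True if no unsupported feature pattern occurs in the manifest."""
    return all(re.search(pattern, manifest) is None
               for pattern in UNSUPPORTED_FEATURES)


# Illustrative manifests (not taken from a real stream)
event_playlist = '#EXTM3U\n#EXT-X-PLAYLIST-TYPE:EVENT\n#EXTINF:10,\nseg0.ts\n'
encrypted_playlist = '#EXTM3U\n#EXT-X-KEY:METHOD=AES-128,URI="key.bin"\n'
unencrypted_playlist = '#EXTM3U\n#EXT-X-KEY:METHOD=NONE\n#EXTINF:10,\nseg0.ts\n'
```

With the `EVENT` pattern disabled, an event playlist such as a finished Twitch VOD passes the check, while an encrypted playlist is still rejected.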

@ -4,11 +4,11 @@ from __future__ import unicode_literals
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_urllib_parse_urlparse
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
HEADRequest, HEADRequest,
unified_strdate, unified_strdate,
url_basename,
qualities, qualities,
int_or_none, int_or_none,
) )
@ -16,24 +16,38 @@ from ..utils import (
class CanalplusIE(InfoExtractor): class CanalplusIE(InfoExtractor):
IE_DESC = 'canalplus.fr, piwiplus.fr and d8.tv' IE_DESC = 'canalplus.fr, piwiplus.fr and d8.tv'
_VALID_URL = r'https?://(?:www\.(?P<site>canalplus\.fr|piwiplus\.fr|d8\.tv|itele\.fr)/.*?/(?P<path>.*)|player\.canalplus\.fr/#/(?P<id>[0-9]+))' _VALID_URL = r'''(?x)
https?://
(?:
(?:
(?:(?:www|m)\.)?canalplus\.fr|
(?:www\.)?piwiplus\.fr|
(?:www\.)?d8\.tv|
(?:www\.)?d17\.tv|
(?:www\.)?itele\.fr
)/(?:(?:[^/]+/)*(?P<display_id>[^/?#&]+))?(?:\?.*\bvid=(?P<vid>\d+))?|
player\.canalplus\.fr/#/(?P<id>\d+)
)
'''
_VIDEO_INFO_TEMPLATE = 'http://service.canal-plus.com/video/rest/getVideosLiees/%s/%s?format=json' _VIDEO_INFO_TEMPLATE = 'http://service.canal-plus.com/video/rest/getVideosLiees/%s/%s?format=json'
_SITE_ID_MAP = { _SITE_ID_MAP = {
'canalplus.fr': 'cplus', 'canalplus': 'cplus',
'piwiplus.fr': 'teletoon', 'piwiplus': 'teletoon',
'd8.tv': 'd8', 'd8': 'd8',
'itele.fr': 'itele', 'd17': 'd17',
'itele': 'itele',
} }
_TESTS = [{ _TESTS = [{
'url': 'http://www.canalplus.fr/c-emissions/pid1830-c-zapping.html?vid=1263092', 'url': 'http://www.canalplus.fr/c-emissions/pid1830-c-zapping.html?vid=1192814',
'md5': '12164a6f14ff6df8bd628e8ba9b10b78', 'md5': '41f438a4904f7664b91b4ed0dec969dc',
'info_dict': { 'info_dict': {
'id': '1263092', 'id': '1192814',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Le Zapping - 13/05/15', 'title': "L'Année du Zapping 2014 - L'Année du Zapping 2014",
'description': 'md5:09738c0d06be4b5d06a0940edb0da73f', 'description': "Toute l'année 2014 dans un Zapping exceptionnel !",
'upload_date': '20150513', 'upload_date': '20150105',
}, },
}, { }, {
'url': 'http://www.piwiplus.fr/videos-piwi/pid1405-le-labyrinthe-boing-super-ranger.html?vid=1108190', 'url': 'http://www.piwiplus.fr/videos-piwi/pid1405-le-labyrinthe-boing-super-ranger.html?vid=1108190',
@ -46,35 +60,45 @@ class CanalplusIE(InfoExtractor):
}, },
'skip': 'Only works from France', 'skip': 'Only works from France',
}, { }, {
'url': 'http://www.d8.tv/d8-docs-mags/pid6589-d8-campagne-intime.html', 'url': 'http://www.d8.tv/d8-docs-mags/pid5198-d8-en-quete-d-actualite.html?vid=1390231',
'info_dict': { 'info_dict': {
'id': '966289', 'id': '1390231',
'ext': 'flv',
'title': 'Campagne intime - Documentaire exceptionnel',
'description': 'md5:d2643b799fb190846ae09c61e59a859f',
'upload_date': '20131108',
},
'skip': 'videos get deleted after a while',
}, {
'url': 'http://www.itele.fr/france/video/aubervilliers-un-lycee-en-colere-111559',
'md5': '38b8f7934def74f0d6f3ba6c036a5f82',
'info_dict': {
'id': '1213714',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Aubervilliers : un lycée en colère - Le 11/02/2015 à 06h45', 'title': "Vacances pas chères : prix discount ou grosses dépenses ? - En quête d'actualité",
'description': 'md5:8216206ec53426ea6321321f3b3c16db', 'description': 'md5:edb6cf1cb4a1e807b5dd089e1ac8bfc6',
'upload_date': '20150211', 'upload_date': '20160512',
}, },
'params': {
'skip_download': True,
},
}, {
'url': 'http://www.itele.fr/chroniques/invite-bruce-toussaint/thierry-solere-nicolas-sarkozy-officialisera-sa-candidature-a-la-primaire-quand-il-le-voudra-167224',
'info_dict': {
'id': '1398334',
'ext': 'mp4',
'title': "L'invité de Bruce Toussaint du 07/06/2016 - ",
'description': 'md5:40ac7c9ad0feaeb6f605bad986f61324',
'upload_date': '20160607',
},
'params': {
'skip_download': True,
},
}, {
'url': 'http://m.canalplus.fr/?vid=1398231',
'only_matching': True,
}, {
'url': 'http://www.d17.tv/emissions/pid8303-lolywood.html?vid=1397061',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
video_id = mobj.groupdict().get('id') video_id = mobj.groupdict().get('id') or mobj.groupdict().get('vid')
site_id = self._SITE_ID_MAP[mobj.group('site') or 'canal'] site_id = self._SITE_ID_MAP[compat_urllib_parse_urlparse(url).netloc.rsplit('.', 2)[-2]]
# Beware, some subclasses do not define an id group # Beware, some subclasses do not define an id group
display_id = url_basename(mobj.group('path')) display_id = mobj.group('display_id') or video_id
if video_id is None: if video_id is None:
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
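
Reviewer note: the new `site_id` lookup takes the second-to-last label of the host name instead of a named regex group. The sketch below shows how that expression behaves on the URL shapes in the tests. It assumes Python 3's `urllib.parse.urlparse` in place of `compat_urllib_parse_urlparse`, and the function name `site_id_for` is hypothetical.

```python
from urllib.parse import urlparse

# Same mapping as the updated _SITE_ID_MAP
SITE_ID_MAP = {
    'canalplus': 'cplus',
    'piwiplus': 'teletoon',
    'd8': 'd8',
    'd17': 'd17',
    'itele': 'itele',
}


def site_id_for(url):
    """Map a URL to its site id via the second-to-last host label."""
    # 'www.d8.tv'.rsplit('.', 2) -> ['www', 'd8', 'tv'], so [-2] is 'd8';
    # a bare 'canalplus.fr' splits into two parts and [-2] is still the name.
    site = urlparse(url).netloc.rsplit('.', 2)[-2]
    return SITE_ID_MAP[site]
```

Because only the label before the top-level domain is used, the `www.`, `m.` and `player.` subdomains all resolve to the same site id without extra regex groups.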

@ -20,54 +20,64 @@ class Channel9IE(InfoExtractor):
''' '''
IE_DESC = 'Channel 9' IE_DESC = 'Channel 9'
IE_NAME = 'channel9' IE_NAME = 'channel9'
_VALID_URL = r'https?://(?:www\.)?channel9\.msdn\.com/(?P<contentpath>.+)/?' _VALID_URL = r'https?://(?:www\.)?channel9\.msdn\.com/(?P<contentpath>.+?)(?P<rss>/RSS)?/?(?:[?#&]|$)'
_TESTS = [ _TESTS = [{
{ 'url': 'http://channel9.msdn.com/Events/TechEd/Australia/2013/KOS002',
'url': 'http://channel9.msdn.com/Events/TechEd/Australia/2013/KOS002', 'md5': 'bbd75296ba47916b754e73c3a4bbdf10',
'md5': 'bbd75296ba47916b754e73c3a4bbdf10', 'info_dict': {
'info_dict': { 'id': 'Events/TechEd/Australia/2013/KOS002',
'id': 'Events/TechEd/Australia/2013/KOS002', 'ext': 'mp4',
'ext': 'mp4', 'title': 'Developer Kick-Off Session: Stuff We Love',
'title': 'Developer Kick-Off Session: Stuff We Love', 'description': 'md5:c08d72240b7c87fcecafe2692f80e35f',
'description': 'md5:c08d72240b7c87fcecafe2692f80e35f', 'duration': 4576,
'duration': 4576, 'thumbnail': 're:http://.*\.jpg',
'thumbnail': 're:http://.*\.jpg', 'session_code': 'KOS002',
'session_code': 'KOS002', 'session_day': 'Day 1',
'session_day': 'Day 1', 'session_room': 'Arena 1A',
'session_room': 'Arena 1A', 'session_speakers': ['Ed Blankenship', 'Andrew Coates', 'Brady Gaster', 'Patrick Klug',
'session_speakers': ['Ed Blankenship', 'Andrew Coates', 'Brady Gaster', 'Patrick Klug', 'Mads Kristensen'], 'Mads Kristensen'],
},
}, },
{ }, {
'url': 'http://channel9.msdn.com/posts/Self-service-BI-with-Power-BI-nuclear-testing', 'url': 'http://channel9.msdn.com/posts/Self-service-BI-with-Power-BI-nuclear-testing',
'md5': 'b43ee4529d111bc37ba7ee4f34813e68', 'md5': 'b43ee4529d111bc37ba7ee4f34813e68',
'info_dict': { 'info_dict': {
'id': 'posts/Self-service-BI-with-Power-BI-nuclear-testing', 'id': 'posts/Self-service-BI-with-Power-BI-nuclear-testing',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Self-service BI with Power BI - nuclear testing', 'title': 'Self-service BI with Power BI - nuclear testing',
'description': 'md5:d1e6ecaafa7fb52a2cacdf9599829f5b', 'description': 'md5:d1e6ecaafa7fb52a2cacdf9599829f5b',
'duration': 1540, 'duration': 1540,
'thumbnail': 're:http://.*\.jpg', 'thumbnail': 're:http://.*\.jpg',
'authors': ['Mike Wilmot'], 'authors': ['Mike Wilmot'],
},
}, },
{ }, {
# low quality mp4 is best # low quality mp4 is best
'url': 'https://channel9.msdn.com/Events/CPP/CppCon-2015/Ranges-for-the-Standard-Library', 'url': 'https://channel9.msdn.com/Events/CPP/CppCon-2015/Ranges-for-the-Standard-Library',
'info_dict': { 'info_dict': {
'id': 'Events/CPP/CppCon-2015/Ranges-for-the-Standard-Library', 'id': 'Events/CPP/CppCon-2015/Ranges-for-the-Standard-Library',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Ranges for the Standard Library', 'title': 'Ranges for the Standard Library',
'description': 'md5:2e6b4917677af3728c5f6d63784c4c5d', 'description': 'md5:2e6b4917677af3728c5f6d63784c4c5d',
'duration': 5646, 'duration': 5646,
'thumbnail': 're:http://.*\.jpg', 'thumbnail': 're:http://.*\.jpg',
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
}, },
} }, {
] 'url': 'https://channel9.msdn.com/Niners/Splendid22/Queue/76acff796e8f411184b008028e0d492b/RSS',
'info_dict': {
'id': 'Niners/Splendid22/Queue/76acff796e8f411184b008028e0d492b',
'title': 'Channel 9',
},
'playlist_count': 2,
}, {
'url': 'https://channel9.msdn.com/Events/DEVintersection/DEVintersection-2016/RSS',
'only_matching': True,
}, {
'url': 'https://channel9.msdn.com/Events/Speakers/scott-hanselman/RSS?UrlSafeName=scott-hanselman',
'only_matching': True,
}]
_RSS_URL = 'http://channel9.msdn.com/%s/RSS' _RSS_URL = 'http://channel9.msdn.com/%s/RSS'
@ -254,22 +264,30 @@ class Channel9IE(InfoExtractor):
return self.playlist_result(contents) return self.playlist_result(contents)
def _extract_list(self, content_path): def _extract_list(self, video_id, rss_url=None):
rss = self._download_xml(self._RSS_URL % content_path, content_path, 'Downloading RSS') if not rss_url:
rss_url = self._RSS_URL % video_id
rss = self._download_xml(rss_url, video_id, 'Downloading RSS')
entries = [self.url_result(session_url.text, 'Channel9') entries = [self.url_result(session_url.text, 'Channel9')
for session_url in rss.findall('./channel/item/link')] for session_url in rss.findall('./channel/item/link')]
title_text = rss.find('./channel/title').text title_text = rss.find('./channel/title').text
return self.playlist_result(entries, content_path, title_text) return self.playlist_result(entries, video_id, title_text)
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
content_path = mobj.group('contentpath') content_path = mobj.group('contentpath')
rss = mobj.group('rss')
webpage = self._download_webpage(url, content_path, 'Downloading web page') if rss:
return self._extract_list(content_path, url)
page_type_m = re.search(r'<meta name="WT.entryid" content="(?P<pagetype>[^:]+)[^"]+"/>', webpage) webpage = self._download_webpage(
if page_type_m is not None: url, content_path, 'Downloading web page')
page_type = page_type_m.group('pagetype')
page_type = self._search_regex(
r'<meta[^>]+name=(["\'])WT\.entryid\1[^>]+content=(["\'])(?P<pagetype>[^:]+).+?\2',
webpage, 'page type', default=None, group='pagetype')
if page_type:
if page_type == 'Entry': # Any 'item'-like page, may contain downloadable content if page_type == 'Entry': # Any 'item'-like page, may contain downloadable content
return self._extract_entry_item(webpage, content_path) return self._extract_entry_item(webpage, content_path)
elif page_type == 'Session': # Event session page, may contain downloadable content elif page_type == 'Session': # Event session page, may contain downloadable content
@ -278,6 +296,5 @@ class Channel9IE(InfoExtractor):
return self._extract_list(content_path) return self._extract_list(content_path)
else: else:
raise ExtractorError('Unexpected WT.entryid %s' % page_type, expected=True) raise ExtractorError('Unexpected WT.entryid %s' % page_type, expected=True)
else: # Assuming list else: # Assuming list
return self._extract_list(content_path) return self._extract_list(content_path)

@ -45,6 +45,7 @@ from ..utils import (
unescapeHTML, unescapeHTML,
unified_strdate, unified_strdate,
url_basename, url_basename,
xpath_element,
xpath_text, xpath_text,
xpath_with_ns, xpath_with_ns,
determine_protocol, determine_protocol,
@ -1030,7 +1031,7 @@ class InfoExtractor(object):
if base_url: if base_url:
base_url = base_url.strip() base_url = base_url.strip()
bootstrap_info = xpath_text( bootstrap_info = xpath_element(
manifest, ['{http://ns.adobe.com/f4m/1.0}bootstrapInfo', '{http://ns.adobe.com/f4m/2.0}bootstrapInfo'], manifest, ['{http://ns.adobe.com/f4m/1.0}bootstrapInfo', '{http://ns.adobe.com/f4m/2.0}bootstrapInfo'],
'bootstrap info', default=None) 'bootstrap info', default=None)
@ -1085,7 +1086,7 @@ class InfoExtractor(object):
formats.append({ formats.append({
'format_id': format_id, 'format_id': format_id,
'url': manifest_url, 'url': manifest_url,
'ext': 'flv' if bootstrap_info else None, 'ext': 'flv' if bootstrap_info is not None else None,
'tbr': tbr, 'tbr': tbr,
'width': width, 'width': width,
'height': height, 'height': height,
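
Reviewer note: switching to `xpath_element` means `bootstrap_info` is now an element rather than text, which is presumably why the condition became `is not None` (this reading is an interpretation, not something the commit states). With ElementTree, an element's truth value has historically depended on whether it has children, so a childless element is unsafe to test with a bare `if`. The sketch below uses the standard library directly; the helper names are hypothetical and the f4m namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET


def find_bootstrap(manifest_xml):
    """Return the <bootstrapInfo> element of a manifest, or None if absent."""
    return ET.fromstring(manifest_xml).find('bootstrapInfo')


def guess_ext(bootstrap_info):
    """Mirror the updated condition: compare against None explicitly."""
    return 'flv' if bootstrap_info is not None else None


# Illustrative manifests, not real f4m documents
with_bootstrap = '<manifest><bootstrapInfo>AAAA</bootstrapInfo></manifest>'
without_bootstrap = '<manifest><media url="x"/></manifest>'
```

An element found this way has text but no child elements, so `len(element)` is 0; the explicit `is not None` comparison avoids depending on that.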

@ -382,6 +382,7 @@ from .leeco import (
LePlaylistIE, LePlaylistIE,
LetvCloudIE, LetvCloudIE,
) )
from .libraryofcongress import LibraryOfCongressIE
from .libsyn import LibsynIE from .libsyn import LibsynIE
from .lifenews import ( from .lifenews import (
LifeNewsIE, LifeNewsIE,
@ -909,6 +910,7 @@ from .videomore import (
) )
from .videopremium import VideoPremiumIE from .videopremium import VideoPremiumIE
from .videott import VideoTtIE from .videott import VideoTtIE
from .vidio import VidioIE
from .vidme import ( from .vidme import (
VidmeIE, VidmeIE,
VidmeUserIE, VidmeUserIE,
@ -965,7 +967,6 @@ from .watchindianporn import WatchIndianPornIE
from .wdr import ( from .wdr import (
WDRIE, WDRIE,
WDRMobileIE, WDRMobileIE,
WDRMausIE,
) )
from .webofstories import ( from .webofstories import (
WebOfStoriesIE, WebOfStoriesIE,

@ -0,0 +1,143 @@
# coding: utf-8
from __future__ import unicode_literals

import re

from .common import InfoExtractor
from ..utils import (
    determine_ext,
    float_or_none,
    int_or_none,
    parse_filesize,
)


class LibraryOfCongressIE(InfoExtractor):
    IE_NAME = 'loc'
    IE_DESC = 'Library of Congress'
    _VALID_URL = r'https?://(?:www\.)?loc\.gov/(?:item/|today/cyberlc/feature_wdesc\.php\?.*\brec=)(?P<id>[0-9]+)'
    _TESTS = [{
        # embedded via <div class="media-player"
        'url': 'http://loc.gov/item/90716351/',
        'md5': '353917ff7f0255aa6d4b80a034833de8',
        'info_dict': {
            'id': '90716351',
            'ext': 'mp4',
            'title': "Pa's trip to Mars",
            'thumbnail': r're:^https?://.*\.jpg$',
            'duration': 0,
            'view_count': int,
        },
    }, {
        # webcast embedded via mediaObjectId
        'url': 'https://www.loc.gov/today/cyberlc/feature_wdesc.php?rec=5578',
        'info_dict': {
            'id': '5578',
            'ext': 'mp4',
            'title': 'Help! Preservation Training Needs Here, There & Everywhere',
            'duration': 3765,
            'view_count': int,
            'subtitles': 'mincount:1',
        },
        'params': {
            'skip_download': True,
        },
    }, {
        # with direct download links
        'url': 'https://www.loc.gov/item/78710669/',
        'info_dict': {
            'id': '78710669',
            'ext': 'mp4',
            'title': 'La vie et la passion de Jesus-Christ',
            'duration': 0,
            'view_count': int,
            'formats': 'mincount:4',
        },
        'params': {
            'skip_download': True,
        },
    }]

    def _real_extract(self, url):
        video_id = self._match_id(url)
        webpage = self._download_webpage(url, video_id)

        media_id = self._search_regex(
            (r'id=(["\'])media-player-(?P<id>.+?)\1',
             r'<video[^>]+id=(["\'])uuid-(?P<id>.+?)\1',
             r'<video[^>]+data-uuid=(["\'])(?P<id>.+?)\1',
             r'mediaObjectId\s*:\s*(["\'])(?P<id>.+?)\1'),
            webpage, 'media id', group='id')

        data = self._download_json(
            'https://media.loc.gov/services/v1/media?id=%s&context=json' % media_id,
            video_id)['mediaObject']

        derivative = data['derivatives'][0]
        media_url = derivative['derivativeUrl']

        title = derivative.get('shortName') or data.get('shortName') or self._og_search_title(
            webpage)

        # The following algorithm was extracted from the setAVSource JS
        # function found in the webpage
        media_url = media_url.replace('rtmp', 'https')

        is_video = data.get('mediaType', 'v').lower() == 'v'
        ext = determine_ext(media_url)
        if ext not in ('mp4', 'mp3'):
            media_url += '.mp4' if is_video else '.mp3'

        # Start from an empty list so the download-link loop below still
        # works when neither streaming pattern matches
        formats = []
        if 'vod/mp4:' in media_url:
            formats.append({
                'url': media_url.replace('vod/mp4:', 'hls-vod/media/') + '.m3u8',
                'format_id': 'hls',
                'ext': 'mp4',
                'protocol': 'm3u8_native',
                'quality': 1,
            })
        elif 'vod/mp3:' in media_url:
            formats.append({
                'url': media_url.replace('vod/mp3:', ''),
                'vcodec': 'none',
            })

        download_urls = set()
        for m in re.finditer(
                r'<option[^>]+value=(["\'])(?P<url>.+?)\1[^>]+data-file-download=[^>]+>\s*(?P<id>.+?)(?:(?:&nbsp;|\s+)\((?P<size>.+?)\))?\s*<', webpage):
            format_id = m.group('id').lower()
            if format_id == 'gif':
                continue
            download_url = m.group('url')
            if download_url in download_urls:
                continue
            download_urls.add(download_url)
            formats.append({
                'url': download_url,
                'format_id': format_id,
                'filesize_approx': parse_filesize(m.group('size')),
            })

        self._sort_formats(formats)

        duration = float_or_none(data.get('duration'))
        view_count = int_or_none(data.get('viewCount'))

        subtitles = {}
        cc_url = data.get('ccUrl')
        if cc_url:
            subtitles.setdefault('en', []).append({
                'url': cc_url,
                'ext': 'ttml',
            })

        return {
            'id': video_id,
            'title': title,
            'thumbnail': self._og_search_thumbnail(webpage, default=None),
            'duration': duration,
            'view_count': view_count,
            'formats': formats,
            'subtitles': subtitles,
        }
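
The URL rewriting in `_real_extract` above can be tried in isolation. The following is a minimal sketch, not the extractor itself: `rewrite_media_url` is a hypothetical helper, the `example.invalid` host is made up, and the extension check uses a plain `rpartition` as a simplified stand-in for `determine_ext`.

```python
def rewrite_media_url(media_url, is_video=True):
    """Return (url, kind) for a derivative URL, mirroring the steps above.

    Hypothetical helper for illustration; not part of youtube-dl.
    """
    # The service hands out rtmp:// URLs; the page script swaps in https.
    media_url = media_url.replace('rtmp', 'https')

    # Simplified extension check (the extractor uses determine_ext).
    ext = media_url.rpartition('.')[2]
    if ext not in ('mp4', 'mp3'):
        media_url += '.mp4' if is_video else '.mp3'

    if 'vod/mp4:' in media_url:
        # Video derivatives are served as HLS under hls-vod/media/.
        return media_url.replace('vod/mp4:', 'hls-vod/media/') + '.m3u8', 'hls'
    if 'vod/mp3:' in media_url:
        # Audio derivatives are plain files once the prefix is dropped.
        return media_url.replace('vod/mp3:', ''), 'audio'
    # Neither pattern matched: the caller has no streaming format to add.
    return media_url, 'unknown'


video = rewrite_media_url('rtmp://example.invalid/vod/mp4:clips/a')
audio = rewrite_media_url('rtmp://example.invalid/vod/mp3:talks/b', is_video=False)
```

The third return value covers the case where a derivative URL matches neither prefix, which is the situation in which only the direct download links from the page would be available.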


@ -203,9 +203,10 @@ class LivestreamIE(InfoExtractor):
if not videos_info: if not videos_info:
break break
for v in videos_info: for v in videos_info:
v_id = compat_str(v['id'])
entries.append(self.url_result( entries.append(self.url_result(
'http://livestream.com/accounts/%s/events/%s/videos/%s' % (account_id, event_id, v['id']), 'http://livestream.com/accounts/%s/events/%s/videos/%s' % (account_id, event_id, v_id),
'Livestream', v['id'], v['caption'])) 'Livestream', v_id, v.get('caption')))
last_video = videos_info[-1]['id'] last_video = videos_info[-1]['id']
return self.playlist_result(entries, event_id, event_data['full_name']) return self.playlist_result(entries, event_id, event_data['full_name'])


@ -12,7 +12,7 @@ class TheSixtyOneIE(InfoExtractor):
s| s|
song/comments/list| song/comments/list|
song song
)/(?P<id>[A-Za-z0-9]+)/?$''' )/(?:[^/]+/)?(?P<id>[A-Za-z0-9]+)/?$'''
_SONG_URL_TEMPLATE = 'http://thesixtyone.com/s/{0:}' _SONG_URL_TEMPLATE = 'http://thesixtyone.com/s/{0:}'
_SONG_FILE_URL_TEMPLATE = 'http://{audio_server:}/thesixtyone_production/audio/{0:}_stream' _SONG_FILE_URL_TEMPLATE = 'http://{audio_server:}/thesixtyone_production/audio/{0:}_stream'
_THUMBNAIL_URL_TEMPLATE = '{photo_base_url:}_desktop' _THUMBNAIL_URL_TEMPLATE = '{photo_base_url:}_desktop'
@ -45,6 +45,10 @@ class TheSixtyOneIE(InfoExtractor):
'url': 'http://www.thesixtyone.com/song/SrE3zD7s1jt/', 'url': 'http://www.thesixtyone.com/song/SrE3zD7s1jt/',
'only_matching': True, 'only_matching': True,
}, },
{
'url': 'http://www.thesixtyone.com/maryatmidnight/song/StrawberriesandCream/yvWtLp0c4GQ/',
'only_matching': True,
},
] ]
_DECODE_MAP = { _DECODE_MAP = {


@ -260,7 +260,7 @@ class TwitchVodIE(TwitchItemBaseIE):
'nauth': access_token['token'], 'nauth': access_token['token'],
'nauthsig': access_token['sig'], 'nauthsig': access_token['sig'],
})), })),
item_id, 'mp4') item_id, 'mp4', entry_protocol='m3u8_native')
self._prefer_source(formats) self._prefer_source(formats)
info['formats'] = formats info['formats'] = formats


@ -0,0 +1,73 @@
# coding: utf-8
from __future__ import unicode_literals

import re

from .common import InfoExtractor
from ..utils import int_or_none


class VidioIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?vidio\.com/watch/(?P<id>\d+)-(?P<display_id>[^/?#&]+)'
    _TESTS = [{
        'url': 'http://www.vidio.com/watch/165683-dj_ambred-booyah-live-2015',
        'md5': 'cd2801394afc164e9775db6a140b91fe',
        'info_dict': {
            'id': '165683',
            'display_id': 'dj_ambred-booyah-live-2015',
            'ext': 'mp4',
            'title': 'DJ_AMBRED - Booyah (Live 2015)',
            'description': 'md5:27dc15f819b6a78a626490881adbadf8',
            'thumbnail': r're:^https?://.*\.jpg$',
            'duration': 149,
            'like_count': int,
        },
    }, {
        'url': 'https://www.vidio.com/watch/77949-south-korea-test-fires-missile-that-can-strike-all-of-the-north',
        'only_matching': True,
    }]

    def _real_extract(self, url):
        mobj = re.match(self._VALID_URL, url)
        video_id, display_id = mobj.group('id', 'display_id')

        webpage = self._download_webpage(url, display_id)

        title = self._og_search_title(webpage)

        m3u8_url, duration, thumbnail = [None] * 3

        clips = self._parse_json(
            self._html_search_regex(
                r'data-json-clips\s*=\s*(["\'])(?P<data>\[.+?\])\1',
                webpage, 'video data', default='[]', group='data'),
            display_id, fatal=False)
        if clips:
            clip = clips[0]
            m3u8_url = clip.get('sources', [{}])[0].get('file')
            duration = clip.get('clip_duration')
            thumbnail = clip.get('image')

        # Group 1 of these patterns is the quote character, so the named
        # group has to be requested explicitly
        m3u8_url = m3u8_url or self._search_regex(
            r'data(?:-vjs)?-clip-hls-url=(["\'])(?P<url>.+?)\1',
            webpage, 'hls url', group='url')
        formats = self._extract_m3u8_formats(m3u8_url, display_id, 'mp4', entry_protocol='m3u8_native')

        duration = int_or_none(duration or self._search_regex(
            r'data-video-duration=(["\'])(?P<duration>\d+)\1',
            webpage, 'duration', group='duration'))
        thumbnail = thumbnail or self._og_search_thumbnail(webpage)

        like_count = int_or_none(self._search_regex(
            (r'<span[^>]+data-comment-vote-count=["\'](\d+)',
             r'<span[^>]+class=["\'].*?\blike(?:__|-)count\b.*?["\'][^>]*>\s*(\d+)'),
            webpage, 'like count', fatal=False))

        return {
            'id': video_id,
            'display_id': display_id,
            'title': title,
            'description': self._og_search_description(webpage),
            'thumbnail': thumbnail,
            'duration': duration,
            'like_count': like_count,
            'formats': formats,
        }
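
The `data-json-clips` lookup above can be illustrated outside the extractor. This is a rough sketch under stated assumptions: `first_clip_fields` is a hypothetical helper, the sample markup and `example.invalid` URLs are invented, and plain `re`/`json` stand in for `_html_search_regex` and `_parse_json` (so HTML entities in the attribute are not unescaped here).

```python
import json
import re


def first_clip_fields(webpage):
    """Return (m3u8_url, duration, thumbnail) from the first clip, if any.

    Hypothetical helper for illustration; not part of youtube-dl.
    """
    # The backreference \1 makes the closing quote match the opening one.
    m = re.search(r'data-json-clips\s*=\s*(["\'])(?P<data>\[.+?\])\1', webpage)
    clips = json.loads(m.group('data')) if m else []
    if not clips:
        # Nothing embedded: the caller falls back to other page attributes.
        return None, None, None
    clip = clips[0]
    sources = clip.get('sources') or [{}]
    return sources[0].get('file'), clip.get('clip_duration'), clip.get('image')


sample_page = (
    "<div data-json-clips='[{\"sources\": [{\"file\": "
    "\"https://example.invalid/a.m3u8\"}], \"clip_duration\": 149, "
    "\"image\": \"https://example.invalid/a.jpg\"}]'></div>")
fields = first_clip_fields(sample_page)
missing = first_clip_fields('<div></div>')
```

Returning three `None` values when the attribute is absent keeps the fallback chain (`value or other_source`) in the calling code straightforward.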


@ -9,6 +9,7 @@ from ..utils import (
ExtractorError, ExtractorError,
float_or_none, float_or_none,
int_or_none, int_or_none,
remove_start,
) )
from ..compat import compat_urllib_parse_urlencode from ..compat import compat_urllib_parse_urlencode
@ -39,6 +40,7 @@ class VLiveIE(InfoExtractor):
webpage, 'video params') webpage, 'video params')
status, _, _, live_params, long_video_id, key = re.split( status, _, _, live_params, long_video_id, key = re.split(
r'"\s*,\s*"', video_params)[2:8] r'"\s*,\s*"', video_params)[2:8]
status = remove_start(status, 'PRODUCT_')
if status == 'LIVE_ON_AIR' or status == 'BIG_EVENT_ON_AIR': if status == 'LIVE_ON_AIR' or status == 'BIG_EVENT_ON_AIR':
live_params = self._parse_json('"%s"' % live_params, video_id) live_params = self._parse_json('"%s"' % live_params, video_id)
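
The effect of normalising the status with `remove_start` can be sketched as follows. The `remove_start` below is a local stand-in written for this example (assumed to behave like the helper imported from `..utils`), and `is_on_air` is a hypothetical name.

```python
def remove_start(s, start):
    # Local stand-in: drop the prefix when present, otherwise return s as is.
    return s[len(start):] if s is not None and s.startswith(start) else s


def is_on_air(status):
    # Strip the optional PRODUCT_ prefix before comparing, so prefixed and
    # unprefixed status values are treated the same way.
    status = remove_start(status, 'PRODUCT_')
    return status in ('LIVE_ON_AIR', 'BIG_EVENT_ON_AIR')


results = [is_on_air(s) for s in (
    'LIVE_ON_AIR', 'PRODUCT_LIVE_ON_AIR', 'PRODUCT_BIG_EVENT_ON_AIR', 'VOD_ON_AIR')]
```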


@ -1,214 +1,200 @@
# -*- coding: utf-8 -*- # -*- coding: utf-8 -*-
from __future__ import unicode_literals from __future__ import unicode_literals
import itertools
import re import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_urlparse,
)
from ..utils import ( from ..utils import (
determine_ext,
js_to_json,
strip_jsonp,
unified_strdate, unified_strdate,
qualities, ExtractorError,
) )
class WDRIE(InfoExtractor): class WDRIE(InfoExtractor):
_PLAYER_REGEX = '-(?:video|audio)player(?:_size-[LMS])?' _CURRENT_MAUS_URL = r'https?://(?:www\.)wdrmaus\.de/(?:[^/]+/){1,2}[^/?#]+\.php5'
_VALID_URL = r'(?P<url>https?://www\d?\.(?:wdr\d?|funkhauseuropa)\.de/)(?P<id>.+?)(?P<player>%s)?\.html' % _PLAYER_REGEX _PAGE_REGEX = r'/mediathek/(?P<media_type>[^/]+)/(?P<type>[^/]+)/(?P<display_id>.+)\.html'
_VALID_URL = r'(?P<page_url>https?://(?:www\d\.)?wdr\d?\.de)' + _PAGE_REGEX + '|' + _CURRENT_MAUS_URL
_TESTS = [ _TESTS = [
{ {
'url': 'http://www1.wdr.de/mediathek/video/sendungen/servicezeit/videoservicezeit560-videoplayer_size-L.html', 'url': 'http://www1.wdr.de/mediathek/video/sendungen/doku-am-freitag/video-geheimnis-aachener-dom-100.html',
# HDS download, MD5 is unstable
'info_dict': { 'info_dict': {
'id': 'mdb-362427', 'id': 'mdb-1058683',
'ext': 'flv', 'ext': 'flv',
'title': 'Servicezeit', 'display_id': 'doku-am-freitag/video-geheimnis-aachener-dom-100',
'description': 'md5:c8f43e5e815eeb54d0b96df2fba906cb', 'title': 'Geheimnis Aachener Dom',
'upload_date': '20140310', 'alt_title': 'Doku am Freitag',
'is_live': False 'upload_date': '20160304',
'description': 'md5:87be8ff14d8dfd7a7ee46f0299b52318',
'is_live': False,
'subtitles': {'de': [{
'url': 'http://ondemand-ww.wdr.de/medp/fsk0/105/1058683/1058683_12220974.xml'
}]},
}, },
'params': {
'skip_download': True,
},
'skip': 'Page Not Found',
}, },
{ {
'url': 'http://www1.wdr.de/themen/av/videomargaspiegelisttot101-videoplayer.html', 'url': 'http://www1.wdr.de/mediathek/audio/wdr3/wdr3-gespraech-am-samstag/audio-schriftstellerin-juli-zeh-100.html',
'md5': 'f4c1f96d01cf285240f53ea4309663d8',
'info_dict': { 'info_dict': {
'id': 'mdb-363194', 'id': 'mdb-1072000',
'ext': 'flv',
'title': 'Marga Spiegel ist tot',
'description': 'md5:2309992a6716c347891c045be50992e4',
'upload_date': '20140311',
'is_live': False
},
'params': {
'skip_download': True,
},
'skip': 'Page Not Found',
},
{
'url': 'http://www1.wdr.de/themen/kultur/audioerlebtegeschichtenmargaspiegel100-audioplayer.html',
'md5': '83e9e8fefad36f357278759870805898',
'info_dict': {
'id': 'mdb-194332',
'ext': 'mp3', 'ext': 'mp3',
'title': 'Erlebte Geschichten: Marga Spiegel (29.11.2009)', 'display_id': 'wdr3-gespraech-am-samstag/audio-schriftstellerin-juli-zeh-100',
'description': 'md5:2309992a6716c347891c045be50992e4', 'title': 'Schriftstellerin Juli Zeh',
'upload_date': '20091129', 'alt_title': 'WDR 3 Gespräch am Samstag',
'is_live': False 'upload_date': '20160312',
'description': 'md5:e127d320bc2b1f149be697ce044a3dd7',
'is_live': False,
'subtitles': {}
}, },
}, },
{ {
'url': 'http://www.funkhauseuropa.de/av/audioflaviacoelhoamaramar100-audioplayer.html', 'url': 'http://www1.wdr.de/mediathek/video/live/index.html',
'md5': '99a1443ff29af19f6c52cf6f4dc1f4aa',
'info_dict': {
'id': 'mdb-478135',
'ext': 'mp3',
'title': 'Flavia Coelho: Amar é Amar',
'description': 'md5:7b29e97e10dfb6e265238b32fa35b23a',
'upload_date': '20140717',
'is_live': False
},
'skip': 'Page Not Found',
},
{
'url': 'http://www1.wdr.de/mediathek/video/sendungen/quarks_und_co/filterseite-quarks-und-co100.html',
'playlist_mincount': 146,
'info_dict': {
'id': 'mediathek/video/sendungen/quarks_und_co/filterseite-quarks-und-co100',
}
},
{
'url': 'http://www1.wdr.de/mediathek/video/livestream/index.html',
'info_dict': { 'info_dict': {
'id': 'mdb-103364', 'id': 'mdb-103364',
'title': 're:^WDR Fernsehen Live [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$', 'ext': 'mp4',
'display_id': 'index',
'title': r're:^WDR Fernsehen im Livestream [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'alt_title': 'WDR Fernsehen Live',
'upload_date': None,
'description': 'md5:ae2ff888510623bf8d4b115f95a9b7c9', 'description': 'md5:ae2ff888510623bf8d4b115f95a9b7c9',
'ext': 'flv', 'is_live': True,
'upload_date': '20150101', 'subtitles': {}
'is_live': True
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True, # m3u8 download
}, },
} },
{
'url': 'http://www1.wdr.de/mediathek/video/sendungen/aktuelle-stunde/aktuelle-stunde-120.html',
'playlist_mincount': 8,
'info_dict': {
'id': 'aktuelle-stunde/aktuelle-stunde-120',
},
},
{
'url': 'http://www.wdrmaus.de/aktuelle-sendung/index.php5',
'info_dict': {
'id': 'mdb-1096487',
'ext': 'flv',
'upload_date': 're:^[0-9]{8}$',
'title': 're:^Die Sendung mit der Maus vom [0-9.]{10}$',
'description': '- Die Sendung mit der Maus -',
},
'skip': 'The id changes from week to week because of the new episode'
},
{
'url': 'http://www.wdrmaus.de/sachgeschichten/sachgeschichten/achterbahn.php5',
# HDS download, MD5 is unstable
'info_dict': {
'id': 'mdb-186083',
'ext': 'flv',
'upload_date': '20130919',
'title': 'Sachgeschichte - Achterbahn ',
'description': '- Die Sendung mit der Maus -',
},
},
] ]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
page_url = mobj.group('url') url_type = mobj.group('type')
page_id = mobj.group('id') page_url = mobj.group('page_url')
display_id = mobj.group('display_id')
webpage = self._download_webpage(url, display_id)
webpage = self._download_webpage(url, page_id) # for wdr.de the data-extension is in a tag with the class "mediaLink"
# for wdrmaus its in a link to the page in a multiline "videoLink"-tag
json_metadata = self._html_search_regex(
r'class=(?:"mediaLink\b[^"]*"[^>]+|"videoLink\b[^"]*"[\s]*>\n[^\n]*)data-extension="([^"]+)"',
webpage, 'media link', default=None, flags=re.MULTILINE)
if mobj.group('player') is None: if not json_metadata:
entries = [ entries = [
self.url_result(page_url + href, 'WDR') self.url_result(page_url + href[0], 'WDR')
for href in re.findall( for href in re.findall(
r'<a href="/?(.+?%s\.html)" rel="nofollow"' % self._PLAYER_REGEX, r'<a href="(%s)"[^>]+data-extension=' % self._PAGE_REGEX,
webpage) webpage)
] ]
if entries: # Playlist page if entries: # Playlist page
return self.playlist_result(entries, page_id) return self.playlist_result(entries, playlist_id=display_id)
# Overview page raise ExtractorError('No downloadable streams found', expected=True)
entries = []
for page_num in itertools.count(2):
hrefs = re.findall(
r'<li class="mediathekvideo"\s*>\s*<img[^>]*>\s*<a href="(/mediathek/video/[^"]+)"',
webpage)
entries.extend(
self.url_result(page_url + href, 'WDR')
for href in hrefs)
next_url_m = re.search(
r'<li class="nextToLast">\s*<a href="([^"]+)"', webpage)
if not next_url_m:
break
next_url = page_url + next_url_m.group(1)
webpage = self._download_webpage(
next_url, page_id,
note='Downloading playlist page %d' % page_num)
return self.playlist_result(entries, page_id)
flashvars = compat_parse_qs(self._html_search_regex( media_link_obj = self._parse_json(json_metadata, display_id,
r'<param name="flashvars" value="([^"]+)"', webpage, 'flashvars')) transform_source=js_to_json)
jsonp_url = media_link_obj['mediaObj']['url']
page_id = flashvars['trackerClipId'][0] metadata = self._download_json(
video_url = flashvars['dslSrc'][0] jsonp_url, 'metadata', transform_source=strip_jsonp)
title = flashvars['trackerClipTitle'][0]
thumbnail = flashvars['startPicture'][0] if 'startPicture' in flashvars else None metadata_tracker_data = metadata['trackerData']
is_live = flashvars.get('isLive', ['0'])[0] == '1' metadata_media_resource = metadata['mediaResource']
formats = []
# check if the metadata contains a direct URL to a file
metadata_media_alt = metadata_media_resource.get('alt')
if metadata_media_alt:
for tag_name in ['videoURL', 'audioURL']:
if tag_name in metadata_media_alt:
alt_url = metadata_media_alt[tag_name]
if determine_ext(alt_url) == 'm3u8':
m3u_fmt = self._extract_m3u8_formats(
alt_url, display_id, 'mp4', 'm3u8_native',
m3u8_id='hls')
formats.extend(m3u_fmt)
else:
formats.append({
'url': alt_url
})
# check if there are flash-streams for this video
if 'dflt' in metadata_media_resource and 'videoURL' in metadata_media_resource['dflt']:
video_url = metadata_media_resource['dflt']['videoURL']
if video_url.endswith('.f4m'):
full_video_url = video_url + '?hdcore=3.2.0&plugin=aasp-3.2.0.77.18'
formats.extend(self._extract_f4m_formats(full_video_url, display_id, f4m_id='hds', fatal=False))
elif video_url.endswith('.smil'):
formats.extend(self._extract_smil_formats(video_url, 'stream', fatal=False))
subtitles = {}
caption_url = metadata_media_resource.get('captionURL')
if caption_url:
subtitles['de'] = [{
'url': caption_url
}]
title = metadata_tracker_data.get('trackerClipTitle')
is_live = url_type == 'live'
if is_live: if is_live:
title = self._live_title(title) title = self._live_title(title)
upload_date = None
if 'trackerClipAirTime' in flashvars: elif 'trackerClipAirTime' in metadata_tracker_data:
upload_date = flashvars['trackerClipAirTime'][0] upload_date = metadata_tracker_data['trackerClipAirTime']
else: else:
upload_date = self._html_search_meta( upload_date = self._html_search_meta('DC.Date', webpage, 'upload date')
'DC.Date', webpage, 'upload date')
if upload_date: if upload_date:
upload_date = unified_strdate(upload_date) upload_date = unified_strdate(upload_date)
formats = []
preference = qualities(['S', 'M', 'L', 'XL'])
if video_url.endswith('.f4m'):
formats.extend(self._extract_f4m_formats(
video_url + '?hdcore=3.2.0&plugin=aasp-3.2.0.77.18', page_id,
f4m_id='hds', fatal=False))
elif video_url.endswith('.smil'):
formats.extend(self._extract_smil_formats(
video_url, page_id, False, {
'hdcore': '3.3.0',
'plugin': 'aasp-3.3.0.99.43',
}))
else:
formats.append({
'url': video_url,
'http_headers': {
'User-Agent': 'mobile',
},
})
m3u8_url = self._search_regex(
r'rel="adaptiv"[^>]+href="([^"]+)"',
webpage, 'm3u8 url', default=None)
if m3u8_url:
formats.extend(self._extract_m3u8_formats(
m3u8_url, page_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
direct_urls = re.findall(
r'rel="web(S|M|L|XL)"[^>]+href="([^"]+)"', webpage)
if direct_urls:
for quality, video_url in direct_urls:
formats.append({
'url': video_url,
'preference': preference(quality),
'http_headers': {
'User-Agent': 'mobile',
},
})
self._sort_formats(formats) self._sort_formats(formats)
description = self._html_search_meta('Description', webpage, 'description')
return { return {
'id': page_id, 'id': metadata_tracker_data.get('trackerClipId', display_id),
'formats': formats, 'display_id': display_id,
'title': title, 'title': title,
'description': description, 'alt_title': metadata_tracker_data.get('trackerClipSubcategory'),
'thumbnail': thumbnail, 'formats': formats,
'upload_date': upload_date, 'upload_date': upload_date,
'is_live': is_live 'description': self._html_search_meta('Description', webpage),
'is_live': is_live,
'subtitles': subtitles,
} }
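
The new metadata flow (JSONP response, then `mediaResource` and `trackerData`) can be sketched on its own. Everything here is an assumption made for illustration: `strip_jsonp_sketch` is a simplified stand-in for the `strip_jsonp` helper, `pick_urls` is a hypothetical name, and the `callback(...)` wrapper and `example.invalid` URLs are invented sample data.

```python
import json
import re


def strip_jsonp_sketch(code):
    # Simplified stand-in: drop a leading "name(" and a trailing ")" or ");"
    # and keep the JSON text in between.
    return re.sub(r'(?s)^[\w.$]+\s*\(\s*(.*)\)\s*;?\s*$', r'\1', code)


def pick_urls(jsonp_text):
    """Return (clip_id, urls): direct "alt" URLs first, then the "dflt" stream."""
    metadata = json.loads(strip_jsonp_sketch(jsonp_text))
    resource = metadata['mediaResource']
    alt = resource.get('alt') or {}
    urls = [alt[key] for key in ('videoURL', 'audioURL') if key in alt]
    dflt_url = (resource.get('dflt') or {}).get('videoURL')
    if dflt_url:
        urls.append(dflt_url)
    return metadata['trackerData'].get('trackerClipId'), urls


sample = (
    'callback({"trackerData": {"trackerClipId": "mdb-1"}, '
    '"mediaResource": {"alt": {"videoURL": "https://example.invalid/a.m3u8"}, '
    '"dflt": {"videoURL": "https://example.invalid/a.f4m"}}});')
clip_id, urls = pick_urls(sample)
```

In the extractor each collected URL is then dispatched by type (m3u8, f4m, smil or a plain file); the sketch stops at collecting them.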
@ -241,81 +227,3 @@ class WDRMobileIE(InfoExtractor):
'User-Agent': 'mobile', 'User-Agent': 'mobile',
}, },
} }
class WDRMausIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?wdrmaus\.de/(?:[^/]+/){,2}(?P<id>[^/?#]+)(?:/index\.php5|(?<!index)\.php5|/(?:$|[?#]))'
IE_DESC = 'Sendung mit der Maus'
_TESTS = [{
'url': 'http://www.wdrmaus.de/aktuelle-sendung/index.php5',
'info_dict': {
'id': 'aktuelle-sendung',
'ext': 'mp4',
'thumbnail': 're:^http://.+\.jpg',
'upload_date': 're:^[0-9]{8}$',
'title': 're:^[0-9.]{10} - Aktuelle Sendung$',
}
}, {
'url': 'http://www.wdrmaus.de/sachgeschichten/sachgeschichten/40_jahre_maus.php5',
'md5': '3b1227ca3ed28d73ec5737c65743b2a3',
'info_dict': {
'id': '40_jahre_maus',
'ext': 'mp4',
'thumbnail': 're:^http://.+\.jpg',
'upload_date': '20131007',
'title': '12.03.2011 - 40 Jahre Maus',
}
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
param_code = self._html_search_regex(
r'<a href="\?startVideo=1&amp;([^"]+)"', webpage, 'parameters')
title_date = self._search_regex(
r'<div class="sendedatum"><p>Sendedatum:\s*([0-9\.]+)</p>',
webpage, 'air date')
title_str = self._html_search_regex(
r'<h1>(.*?)</h1>', webpage, 'title')
title = '%s - %s' % (title_date, title_str)
upload_date = unified_strdate(
self._html_search_meta('dc.date', webpage))
fields = compat_parse_qs(param_code)
video_url = fields['firstVideo'][0]
thumbnail = compat_urlparse.urljoin(url, fields['startPicture'][0])
formats = [{
'format_id': 'rtmp',
'url': video_url,
}]
jscode = self._download_webpage(
'http://www.wdrmaus.de/codebase/js/extended-medien.min.js',
video_id, fatal=False,
note='Downloading URL translation table',
errnote='Could not download URL translation table')
if jscode:
for m in re.finditer(
r"stream:\s*'dslSrc=(?P<stream>[^']+)',\s*download:\s*'(?P<dl>[^']+)'\s*\}",
jscode):
if video_url.startswith(m.group('stream')):
http_url = video_url.replace(
m.group('stream'), m.group('dl'))
formats.append({
'format_id': 'http',
'url': http_url,
})
break
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'formats': formats,
'thumbnail': thumbnail,
'upload_date': upload_date,
}


@ -344,6 +344,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'139': {'ext': 'm4a', 'format_note': 'DASH audio', 'acodec': 'aac', 'abr': 48, 'preference': -50, 'container': 'm4a_dash'}, '139': {'ext': 'm4a', 'format_note': 'DASH audio', 'acodec': 'aac', 'abr': 48, 'preference': -50, 'container': 'm4a_dash'},
'140': {'ext': 'm4a', 'format_note': 'DASH audio', 'acodec': 'aac', 'abr': 128, 'preference': -50, 'container': 'm4a_dash'}, '140': {'ext': 'm4a', 'format_note': 'DASH audio', 'acodec': 'aac', 'abr': 128, 'preference': -50, 'container': 'm4a_dash'},
'141': {'ext': 'm4a', 'format_note': 'DASH audio', 'acodec': 'aac', 'abr': 256, 'preference': -50, 'container': 'm4a_dash'}, '141': {'ext': 'm4a', 'format_note': 'DASH audio', 'acodec': 'aac', 'abr': 256, 'preference': -50, 'container': 'm4a_dash'},
'256': {'ext': 'm4a', 'format_note': 'DASH audio', 'acodec': 'aac', 'preference': -50, 'container': 'm4a_dash'},
'258': {'ext': 'm4a', 'format_note': 'DASH audio', 'acodec': 'aac', 'preference': -50, 'container': 'm4a_dash'},
# Dash webm # Dash webm
'167': {'ext': 'webm', 'height': 360, 'width': 640, 'format_note': 'DASH video', 'container': 'webm', 'vcodec': 'vp8', 'preference': -40}, '167': {'ext': 'webm', 'height': 360, 'width': 640, 'format_note': 'DASH video', 'container': 'webm', 'vcodec': 'vp8', 'preference': -40},


@ -668,7 +668,7 @@ def parseOpts(overrideArguments=None):
action='store_true', dest='writeannotations', default=False, action='store_true', dest='writeannotations', default=False,
help='Write video annotations to a .annotations.xml file') help='Write video annotations to a .annotations.xml file')
filesystem.add_option( filesystem.add_option(
'--load-info', '--load-info-json', '--load-info',
dest='load_info_filename', metavar='FILE', dest='load_info_filename', metavar='FILE',
help='JSON file containing the video information (created with the "--write-info-json" option)') help='JSON file containing the video information (created with the "--write-info-json" option)')
filesystem.add_option( filesystem.add_option(


@ -1,3 +1,3 @@
from __future__ import unicode_literals from __future__ import unicode_literals
__version__ = '2016.06.02' __version__ = '2016.06.03'