diff --git a/.github/ISSUE_TEMPLATE.md b/.github/ISSUE_TEMPLATE.md index 693d787e3..a78413518 100644 --- a/.github/ISSUE_TEMPLATE.md +++ b/.github/ISSUE_TEMPLATE.md @@ -6,8 +6,8 @@ --- -### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.12.22*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected. -- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.12.22** +### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.01.05*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected. +- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.01.05** ### Before submitting an *issue* make sure you have: - [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections @@ -35,7 +35,7 @@ $ youtube-dl -v [debug] User config: [] [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj'] [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251 -[debug] youtube-dl version 2016.12.22 +[debug] youtube-dl version 2017.01.05 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4 [debug] Proxy map: {} @@ -50,6 +50,8 @@ $ youtube-dl -v - Single video: https://youtu.be/BaW_jenozKc - Playlist: https://www.youtube.com/playlist?list=PL4lCao7KL_QFVb7Iudeipvc2BCavECqzc +Note that **youtube-dl does not support sites dedicated to [copyright 
infringement](https://github.com/rg3/youtube-dl#can-you-add-support-for-this-anime-video-site-or-site-which-shows-current-movies-for-free)**. In order for site support request to be accepted all provided example URLs should not violate any copyrights. + --- ### Description of your *issue*, suggested solution and other information diff --git a/.github/ISSUE_TEMPLATE_tmpl.md b/.github/ISSUE_TEMPLATE_tmpl.md index ab9968129..df79503d3 100644 --- a/.github/ISSUE_TEMPLATE_tmpl.md +++ b/.github/ISSUE_TEMPLATE_tmpl.md @@ -50,6 +50,8 @@ $ youtube-dl -v - Single video: https://youtu.be/BaW_jenozKc - Playlist: https://www.youtube.com/playlist?list=PL4lCao7KL_QFVb7Iudeipvc2BCavECqzc +Note that **youtube-dl does not support sites dedicated to [copyright infringement](https://github.com/rg3/youtube-dl#can-you-add-support-for-this-anime-video-site-or-site-which-shows-current-movies-for-free)**. In order for site support request to be accepted all provided example URLs should not violate any copyrights. + --- ### Description of your *issue*, suggested solution and other information diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 495955bb5..f50f52841 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -58,7 +58,7 @@ We are then presented with a very complicated request when the original problem Some of our users seem to think there is a limit of issues they can or should open. There is no limit of issues they can or should open. While it may seem appealing to be able to dump all your issues into one ticket, that means that someone who solves one of your issues cannot mark the issue as closed. Typically, reporting a bunch of issues leads to the ticket lingering since nobody wants to attack that behemoth, until someone mercifully splits the issue into multiple ones. -In particular, every site support request issue should only pertain to services at one site (generally under a common domain, but always using the same backend technology). 
Do not request support for vimeo user videos, Whitehouse podcasts, and Google Plus pages in the same issue. Also, make sure that you don't post bug reports alongside feature requests. As a rule of thumb, a feature request does not include outputs of youtube-dl that are not immediately related to the feature at hand. Do not post reports of a network error alongside the request for a new video service. +In particular, every site support request issue should only pertain to services at one site (generally under a common domain, but always using the same backend technology). Do not request support for vimeo user videos, White house podcasts, and Google Plus pages in the same issue. Also, make sure that you don't post bug reports alongside feature requests. As a rule of thumb, a feature request does not include outputs of youtube-dl that are not immediately related to the feature at hand. Do not post reports of a network error alongside the request for a new video service. ### Is anyone going to need the feature? @@ -94,7 +94,7 @@ If you want to create a build of youtube-dl yourself, you'll need If you want to add support for a new site, first of all **make sure** this site is **not dedicated to [copyright infringement](README.md#can-you-add-support-for-this-anime-video-site-or-site-which-shows-current-movies-for-free)**. youtube-dl does **not support** such sites thus pull requests adding support for them **will be rejected**. -After you have ensured this site is distributing it's content legally, you can follow this quick list (assuming your service is called `yourextractor`): +After you have ensured this site is distributing its content legally, you can follow this quick list (assuming your service is called `yourextractor`): 1. [Fork this repository](https://github.com/rg3/youtube-dl/fork) 2. 
Check out the source code with: @@ -199,7 +199,7 @@ Assume at this point `meta`'s layout is: } ``` -Assume you want to extract `summary` and put it into the resulting info dict as `description`. Since `description` is an optional metafield you should be ready that this key may be missing from the `meta` dict, so that you should extract it like: +Assume you want to extract `summary` and put it into the resulting info dict as `description`. Since `description` is an optional meta field you should be ready that this key may be missing from the `meta` dict, so that you should extract it like: ```python description = meta.get('summary') # correct diff --git a/ChangeLog b/ChangeLog index c45441345..2d2e22af9 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,49 @@ +version + +Core +* Fix "invalid escape sequence" errors under Python 3.6 (#11581) + +Extractors +* [discoverygo] Fix JSON data parsing (#11219, #11522) + + +version 2017.01.05 + +Extractors ++ [zdf] Fix extraction (#11055, #11063) +* [pornhub:playlist] Improve extraction (#11594) ++ [cctv] Add support for ncpa-classic.com (#11591) ++ [tunein] Add support for embeds (#11579) + + +version 2017.01.02 + +Extractors +* [cctv] Improve extraction (#879, #6753, #8541) ++ [nrktv:episodes] Add support for episodes (#11571) ++ [arkena] Add support for video.arkena.com (#11568) + + +version 2016.12.31 + +Core ++ Introduce --config-location option for custom configuration files (#6745, + #10648) + +Extractors ++ [twitch] Add support for player.twitch.tv (#11535, #11537) ++ [videa] Add support for videa.hu (#8181, #11133) +* [vk] Fix postlive videos extraction +* [vk] Extract from playerParams (#11555) +- [freevideo] Remove extractor (#11515) ++ [showroomlive] Add support for showroom-live.com (#11458) +* [xhamster] Fix duration extraction (#11549) +* [rtve:live] Fix extraction (#11529) +* [brightcove:legacy] Improve embeds detection (#11523) ++ [twitch] Add support for rechat messages (#11524) +* [acast] Fix audio and 
timestamp extraction (#11521) + + version 2016.12.22 Core diff --git a/README.md b/README.md index 71d37e8b0..905c1b73f 100644 --- a/README.md +++ b/README.md @@ -29,7 +29,7 @@ Windows users can [download an .exe file](https://yt-dl.org/latest/youtube-dl.ex You can also use pip: - sudo pip install --upgrade youtube-dl + sudo -H pip install --upgrade youtube-dl This command will update youtube-dl if you have already installed it. See the [pypi page](https://pypi.python.org/pypi/youtube_dl) for more information. @@ -44,11 +44,7 @@ Or with [MacPorts](https://www.macports.org/): Alternatively, refer to the [developer instructions](#developer-instructions) for how to check out and work with the git repository. For further options, including PGP signatures, see the [youtube-dl Download Page](https://rg3.github.io/youtube-dl/download.html). # DESCRIPTION -**youtube-dl** is a command-line program to download videos from -YouTube.com and a few more sites. It requires the Python interpreter, version -2.6, 2.7, or 3.2+, and it is not platform specific. It should work on -your Unix box, on Windows or on Mac OS X. It is released to the public domain, -which means you can modify it, redistribute it or use it however you like. +**youtube-dl** is a command-line program to download videos from YouTube.com and a few more sites. It requires the Python interpreter, version 2.6, 2.7, or 3.2+, and it is not platform specific. It should work on your Unix box, on Windows or on Mac OS X. It is released to the public domain, which means you can modify it, redistribute it or use it however you like. youtube-dl [OPTIONS] URL [URL...] @@ -84,6 +80,9 @@ which means you can modify it, redistribute it or use it however you like. configuration in ~/.config/youtube- dl/config (%APPDATA%/youtube-dl/config.txt on Windows) + --config-location PATH Location of the configuration file; either + the path to the config or its containing + directory. 
--flat-playlist Do not extract the videos of a playlist, only list them. --mark-watched Mark videos watched (YouTube only) @@ -187,7 +186,7 @@ which means you can modify it, redistribute it or use it however you like. of SIZE. --playlist-reverse Download playlist videos in reverse order --xattr-set-filesize Set file xattribute ytdl.filesize with - expected filesize (experimental) + expected file size (experimental) --hls-prefer-native Use the native HLS downloader instead of ffmpeg --hls-prefer-ffmpeg Use ffmpeg instead of the native HLS @@ -354,7 +353,7 @@ which means you can modify it, redistribute it or use it however you like. -u, --username USERNAME Login with this account ID -p, --password PASSWORD Account password. If this option is left out, youtube-dl will ask interactively. - -2, --twofactor TWOFACTOR Two-factor auth code + -2, --twofactor TWOFACTOR Two-factor authentication code -n, --netrc Use .netrc authentication data --video-password PASSWORD Video password (vimeo, smotri, youku) @@ -447,6 +446,8 @@ Note that options in configuration file are just the same options aka switches u You can use `--ignore-config` if you want to disable the configuration file for a particular youtube-dl run. +You can also use `--config-location` if you want to use custom configuration file for a particular youtube-dl run. + ### Authentication with `.netrc` file You may also want to configure automatic credentials storage for extractors that support authentication (by providing login and password with `--username` and `--password`) in order not to pass credentials as command line arguments on every youtube-dl execution and prevent tracking plain text passwords in the shell command history. You can achieve this using a [`.netrc` file](http://stackoverflow.com/tags/.netrc/info) on a per extractor basis. 
For that you will need to create a `.netrc` file in your `$HOME` and restrict permissions to read/write by only you: @@ -744,7 +745,7 @@ Most people asking this question are not aware that youtube-dl now defaults to d ### I get HTTP error 402 when trying to download a video. What's this? -Apparently YouTube requires you to pass a CAPTCHA test if you download too much. We're [considering to provide a way to let you solve the CAPTCHA](https://github.com/rg3/youtube-dl/issues/154), but at the moment, your best course of action is pointing a webbrowser to the youtube URL, solving the CAPTCHA, and restart youtube-dl. +Apparently YouTube requires you to pass a CAPTCHA test if you download too much. We're [considering to provide a way to let you solve the CAPTCHA](https://github.com/rg3/youtube-dl/issues/154), but at the moment, your best course of action is pointing a web browser to the youtube URL, solving the CAPTCHA, and restart youtube-dl. ### Do I need any other programs? @@ -756,7 +757,7 @@ Videos or video formats streamed via RTMP protocol can only be downloaded when [ Once the video is fully downloaded, use any video player, such as [mpv](https://mpv.io/), [vlc](http://www.videolan.org/) or [mplayer](http://www.mplayerhq.hu/). -### I extracted a video URL with `-g`, but it does not play on another machine / in my webbrowser. +### I extracted a video URL with `-g`, but it does not play on another machine / in my web browser. It depends a lot on the service. In many cases, requests for the video (to download/play it) must come from the same IP address and with the same cookies and/or HTTP headers. Use the `--cookies` option to write the required cookies into a file, and advise your downloader to read cookies from that file. Some sites also require a common user agent to be used, use `--dump-user-agent` to see the one in use by youtube-dl. You can also get necessary cookies and HTTP headers from JSON output obtained with `--dump-json`. 
@@ -962,7 +963,7 @@ After you have ensured this site is distributing its content legally, you can fo 'id': '42', 'ext': 'mp4', 'title': 'Video title goes here', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', # TODO more properties, either as: # * A value # * MD5 checksum; start the string with md5: @@ -1037,7 +1038,7 @@ Assume at this point `meta`'s layout is: } ``` -Assume you want to extract `summary` and put it into the resulting info dict as `description`. Since `description` is an optional metafield you should be ready that this key may be missing from the `meta` dict, so that you should extract it like: +Assume you want to extract `summary` and put it into the resulting info dict as `description`. Since `description` is an optional meta field you should be ready that this key may be missing from the `meta` dict, so that you should extract it like: ```python description = meta.get('summary') # correct @@ -1149,7 +1150,7 @@ with youtube_dl.YoutubeDL(ydl_opts) as ydl: ydl.download(['http://www.youtube.com/watch?v=BaW_jenozKc']) ``` -Most likely, you'll want to use various options. For a list of options available, have a look at [`youtube_dl/YoutubeDL.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/YoutubeDL.py#L128-L278). For a start, if you want to intercept youtube-dl's output, set a `logger` object. +Most likely, you'll want to use various options. For a list of options available, have a look at [`youtube_dl/YoutubeDL.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/YoutubeDL.py#L129-L279). For a start, if you want to intercept youtube-dl's output, set a `logger` object. 
Here's a more complete example of a program that outputs only errors (and a short message after the download is finished), and downloads/converts the video to an mp3 file: @@ -1252,7 +1253,7 @@ We are then presented with a very complicated request when the original problem Some of our users seem to think there is a limit of issues they can or should open. There is no limit of issues they can or should open. While it may seem appealing to be able to dump all your issues into one ticket, that means that someone who solves one of your issues cannot mark the issue as closed. Typically, reporting a bunch of issues leads to the ticket lingering since nobody wants to attack that behemoth, until someone mercifully splits the issue into multiple ones. -In particular, every site support request issue should only pertain to services at one site (generally under a common domain, but always using the same backend technology). Do not request support for vimeo user videos, Whitehouse podcasts, and Google Plus pages in the same issue. Also, make sure that you don't post bug reports alongside feature requests. As a rule of thumb, a feature request does not include outputs of youtube-dl that are not immediately related to the feature at hand. Do not post reports of a network error alongside the request for a new video service. +In particular, every site support request issue should only pertain to services at one site (generally under a common domain, but always using the same backend technology). Do not request support for vimeo user videos, White house podcasts, and Google Plus pages in the same issue. Also, make sure that you don't post bug reports alongside feature requests. As a rule of thumb, a feature request does not include outputs of youtube-dl that are not immediately related to the feature at hand. Do not post reports of a network error alongside the request for a new video service. ### Is anyone going to need the feature? 
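Reviewer note, not part of the patch: the `thumbnail` changes in the hunks above, and in the extractor files that follow, only add an `r` prefix to existing pattern literals. The sketch below uses only the Python standard library to illustrate why that should be behavior-preserving: the plain and raw literals evaluate to the same string, but only the plain one triggers the "invalid escape sequence" warning that the ChangeLog entry refers to. It assumes Python 3.6 or newer; the helper names `warns_on_compile` and `literal_value` and the sample URL are hypothetical and do not come from youtube-dl.

```python
import re
import warnings

# Source text of two assignments: a plain literal and a raw literal.
# Both contain the two characters backslash + dot inside the quotes.
PLAIN_SRC = "pattern = '^https?://.*\\.jpg$'"
RAW_SRC = "pattern = r'^https?://.*\\.jpg$'"


def warns_on_compile(source):
    """Return True if compiling `source` emits any warning (hypothetical helper)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')
        compile(source, '<example>', 'exec')
    return len(caught) > 0


def literal_value(source):
    """Execute `source` with warnings silenced and return its `pattern` value."""
    namespace = {}
    with warnings.catch_warnings():
        warnings.simplefilter('ignore')
        exec(compile(source, '<example>', 'exec'), namespace)
    return namespace['pattern']


# The plain literal warns on Python 3.6+, the raw literal does not.
plain_warns = warns_on_compile(PLAIN_SRC)
raw_warns = warns_on_compile(RAW_SRC)

# Both literals evaluate to the same string, so the regex is unchanged.
same_value = literal_value(PLAIN_SRC) == literal_value(RAW_SRC)

# The pattern still matches a typical thumbnail URL (illustrative URL).
matches = re.match(literal_value(RAW_SRC), 'http://example.com/thumb.jpg') is not None
```

Under these assumptions the `r` prefix changes how the interpreter reads the literal, not the resulting pattern, which is consistent with the patch leaving the surrounding test data untouched.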
diff --git a/devscripts/buildserver.py b/devscripts/buildserver.py index fc99c3213..1344b4d87 100644 --- a/devscripts/buildserver.py +++ b/devscripts/buildserver.py @@ -424,8 +424,6 @@ class BuildHTTPRequestHandler(compat_http_server.BaseHTTPRequestHandler): self.send_header('Content-Length', len(msg)) self.end_headers() self.wfile.write(msg) - except HTTPError as e: - self.send_response(e.code, str(e)) else: self.send_response(500, 'Unknown build method "%s"' % action) else: diff --git a/docs/supportedsites.md b/docs/supportedsites.md index 0b3d794c6..0e301e8f3 100644 --- a/docs/supportedsites.md +++ b/docs/supportedsites.md @@ -132,7 +132,7 @@ - **cbsnews:livevideo**: CBS News Live Videos - **CBSSports** - **CCMA** - - **CCTV** + - **CCTV**: 央视网 - **CDA** - **CeskaTelevize** - **channel9**: Channel 9 @@ -263,7 +263,6 @@ - **francetvinfo.fr** - **Freesound** - **freespeech.org** - - **FreeVideo** - **Funimation** - **FunnyOrDie** - **Fusion** @@ -518,6 +517,7 @@ - **NRKSkole**: NRK Skole - **NRKTV**: NRK TV and NRK Radio - **NRKTVDirekte**: NRK TV Direkte and NRK Radio Direkte + - **NRKTVEpisodes** - **ntv.ru** - **Nuvid** - **NYTimes** @@ -659,6 +659,7 @@ - **Shahid** - **Shared**: shared.sx - **ShareSix** + - **ShowRoomLive** - **Sina** - **SixPlay** - **skynewsarabia:article** @@ -834,6 +835,7 @@ - **ViceShow** - **Vidbit** - **Viddler** + - **Videa** - **video.google:search**: Google Video search - **video.mit.edu** - **VideoDetective** diff --git a/youtube_dl/YoutubeDL.py b/youtube_dl/YoutubeDL.py index 53f20ac2c..5d654f55f 100755 --- a/youtube_dl/YoutubeDL.py +++ b/youtube_dl/YoutubeDL.py @@ -1339,7 +1339,7 @@ class YoutubeDL(object): format['format_id'] = compat_str(i) else: # Sanitize format_id from characters used in format selector expression - format['format_id'] = re.sub('[\s,/+\[\]()]', '_', format['format_id']) + format['format_id'] = re.sub(r'[\s,/+\[\]()]', '_', format['format_id']) format_id = format['format_id'] if format_id not in formats_dict: 
formats_dict[format_id] = [] diff --git a/youtube_dl/__init__.py b/youtube_dl/__init__.py index 6850d95e1..dfa4ae839 100644 --- a/youtube_dl/__init__.py +++ b/youtube_dl/__init__.py @@ -405,7 +405,7 @@ def _real_main(argv=None): 'postprocessor_args': postprocessor_args, 'cn_verification_proxy': opts.cn_verification_proxy, 'geo_verification_proxy': opts.geo_verification_proxy, - + 'config_location': opts.config_location, } with YoutubeDL(ydl_opts) as ydl: diff --git a/youtube_dl/compat.py b/youtube_dl/compat.py index 83ee7e257..02abf8c1e 100644 --- a/youtube_dl/compat.py +++ b/youtube_dl/compat.py @@ -2344,7 +2344,7 @@ try: from urllib.parse import unquote_plus as compat_urllib_parse_unquote_plus except ImportError: # Python 2 _asciire = (compat_urllib_parse._asciire if hasattr(compat_urllib_parse, '_asciire') - else re.compile('([\x00-\x7f]+)')) + else re.compile(r'([\x00-\x7f]+)')) # HACK: The following are the correct unquote_to_bytes, unquote and unquote_plus # implementations from cpython 3.4.3's stdlib. Python 2's version diff --git a/youtube_dl/extractor/abcnews.py b/youtube_dl/extractor/abcnews.py index 6ae5d9a96..4f56c4c11 100644 --- a/youtube_dl/extractor/abcnews.py +++ b/youtube_dl/extractor/abcnews.py @@ -23,7 +23,7 @@ class AbcNewsVideoIE(AMPIE): 'title': '\'This Week\' Exclusive: Iran\'s Foreign Minister Zarif', 'description': 'George Stephanopoulos goes one-on-one with Iranian Foreign Minister Dr. 
Javad Zarif.', 'duration': 180, - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', }, 'params': { # m3u8 download @@ -59,7 +59,7 @@ class AbcNewsIE(InfoExtractor): 'display_id': 'dramatic-video-rare-death-job-america', 'title': 'Occupational Hazards', 'description': 'Nightline investigates the dangers that lurk at various jobs.', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'upload_date': '20100428', 'timestamp': 1272412800, }, diff --git a/youtube_dl/extractor/abcotvs.py b/youtube_dl/extractor/abcotvs.py index 054bb0596..76e98132b 100644 --- a/youtube_dl/extractor/abcotvs.py +++ b/youtube_dl/extractor/abcotvs.py @@ -23,7 +23,7 @@ class ABCOTVSIE(InfoExtractor): 'ext': 'mp4', 'title': 'East Bay museum celebrates vintage synthesizers', 'description': 'md5:a4f10fb2f2a02565c1749d4adbab4b10', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'timestamp': 1421123075, 'upload_date': '20150113', 'uploader': 'Jonathan Bloom', diff --git a/youtube_dl/extractor/adobetv.py b/youtube_dl/extractor/adobetv.py index 5ae16fa16..008c98e51 100644 --- a/youtube_dl/extractor/adobetv.py +++ b/youtube_dl/extractor/adobetv.py @@ -30,7 +30,7 @@ class AdobeTVIE(AdobeTVBaseIE): 'ext': 'mp4', 'title': 'Quick Tip - How to Draw a Circle Around an Object in Photoshop', 'description': 'md5:99ec318dc909d7ba2a1f2b038f7d2311', - 'thumbnail': 're:https?://.*\.jpg$', + 'thumbnail': r're:https?://.*\.jpg$', 'upload_date': '20110914', 'duration': 60, 'view_count': int, diff --git a/youtube_dl/extractor/airmozilla.py b/youtube_dl/extractor/airmozilla.py index f8e70f4e5..0e0691879 100644 --- a/youtube_dl/extractor/airmozilla.py +++ b/youtube_dl/extractor/airmozilla.py @@ -20,7 +20,7 @@ class AirMozillaIE(InfoExtractor): 'id': '6x4q2w', 'ext': 'mp4', 'title': 'Privacy Lab - a meetup for privacy minded people in San Francisco', - 'thumbnail': 're:https?://vid\.ly/(?P[0-9a-z-]+)/poster', + 'thumbnail': 
r're:https?://vid\.ly/(?P[0-9a-z-]+)/poster', 'description': 'Brings together privacy professionals and others interested in privacy at for-profits, non-profits, and NGOs in an effort to contribute to the state of the ecosystem...', 'timestamp': 1422487800, 'upload_date': '20150128', diff --git a/youtube_dl/extractor/allocine.py b/youtube_dl/extractor/allocine.py index 517b06def..90f11d39f 100644 --- a/youtube_dl/extractor/allocine.py +++ b/youtube_dl/extractor/allocine.py @@ -21,7 +21,7 @@ class AllocineIE(InfoExtractor): 'ext': 'mp4', 'title': 'Astérix - Le Domaine des Dieux Teaser VF', 'description': 'md5:4a754271d9c6f16c72629a8a993ee884', - 'thumbnail': 're:http://.*\.jpg', + 'thumbnail': r're:http://.*\.jpg', }, }, { 'url': 'http://www.allocine.fr/video/player_gen_cmedia=19540403&cfilm=222257.html', @@ -32,7 +32,7 @@ class AllocineIE(InfoExtractor): 'ext': 'mp4', 'title': 'Planes 2 Bande-annonce VF', 'description': 'Regardez la bande annonce du film Planes 2 (Planes 2 Bande-annonce VF). 
Planes 2, un film de Roberts Gannaway', - 'thumbnail': 're:http://.*\.jpg', + 'thumbnail': r're:http://.*\.jpg', }, }, { 'url': 'http://www.allocine.fr/video/player_gen_cmedia=19544709&cfilm=181290.html', @@ -43,7 +43,7 @@ class AllocineIE(InfoExtractor): 'ext': 'mp4', 'title': 'Dragons 2 - Bande annonce finale VF', 'description': 'md5:6cdd2d7c2687d4c6aafe80a35e17267a', - 'thumbnail': 're:http://.*\.jpg', + 'thumbnail': r're:http://.*\.jpg', }, }, { 'url': 'http://www.allocine.fr/video/video-19550147/', @@ -53,7 +53,7 @@ class AllocineIE(InfoExtractor): 'ext': 'mp4', 'title': 'Faux Raccord N°123 - Les gaffes de Cliffhanger', 'description': 'md5:bc734b83ffa2d8a12188d9eb48bb6354', - 'thumbnail': 're:http://.*\.jpg', + 'thumbnail': r're:http://.*\.jpg', }, }] diff --git a/youtube_dl/extractor/alphaporno.py b/youtube_dl/extractor/alphaporno.py index c34719d1f..3a6d99f6b 100644 --- a/youtube_dl/extractor/alphaporno.py +++ b/youtube_dl/extractor/alphaporno.py @@ -19,7 +19,7 @@ class AlphaPornoIE(InfoExtractor): 'display_id': 'sensual-striptease-porn-with-samantha-alexandra', 'ext': 'mp4', 'title': 'Sensual striptease porn with Samantha Alexandra', - 'thumbnail': 're:https?://.*\.jpg$', + 'thumbnail': r're:https?://.*\.jpg$', 'timestamp': 1418694611, 'upload_date': '20141216', 'duration': 387, diff --git a/youtube_dl/extractor/aol.py b/youtube_dl/extractor/aol.py index 2cdee3320..b50f454ee 100644 --- a/youtube_dl/extractor/aol.py +++ b/youtube_dl/extractor/aol.py @@ -12,7 +12,7 @@ from ..utils import ( class AolIE(InfoExtractor): IE_NAME = 'on.aol.com' - _VALID_URL = r'(?:aol-video:|https?://on\.aol\.com/(?:[^/]+/)*(?:[^/?#&]+-)?)(?P<id>[^/?#&]+)' + _VALID_URL = r'(?:aol-video:|https?://(?:(?:www|on)\.)?aol\.com/(?:[^/]+/)*(?:[^/?#&]+-)?)(?P<id>[^/?#&]+)' _TESTS = [{ # video with 5min ID @@ -33,7 +33,7 @@ class AolIE(InfoExtractor): } }, { # video with vidible ID - 'url': 
'http://on.aol.com/video/netflix-is-raising-rates-5707d6b8e4b090497b04f706?context=PC:homepage:PL1944:1460189336183', + 'url': 'http://www.aol.com/video/view/netflix-is-raising-rates/5707d6b8e4b090497b04f706/', 'info_dict': { 'id': '5707d6b8e4b090497b04f706', 'ext': 'mp4', @@ -108,30 +108,3 @@ class AolIE(InfoExtractor): 'uploader': video_data.get('videoOwner'), 'formats': formats, } - - -class AolFeaturesIE(InfoExtractor): - IE_NAME = 'features.aol.com' - _VALID_URL = r'https?://features\.aol\.com/video/(?P[^/?#]+)' - - _TESTS = [{ - 'url': 'http://features.aol.com/video/behind-secret-second-careers-late-night-talk-show-hosts', - 'md5': '7db483bb0c09c85e241f84a34238cc75', - 'info_dict': { - 'id': '519507715', - 'ext': 'mp4', - 'title': 'What To Watch - February 17, 2016', - }, - 'add_ie': ['FiveMin'], - 'params': { - # encrypted m3u8 download - 'skip_download': True, - }, - }] - - def _real_extract(self, url): - display_id = self._match_id(url) - webpage = self._download_webpage(url, display_id) - return self.url_result(self._search_regex( - r'', - webpage, 'info json', flags=re.DOTALL)) + _DEFAULT_BITRATES = (48, 150, 320, 496, 864, 2240, 3264) - youtube_id = info.get('youtubeId') + def _real_extract(self, url): + site, display_id, video_id = re.match(self._VALID_URL, url).groups() + + if not video_id: + webpage = self._download_webpage(url, display_id) + video_id = self._search_regex( + (r'src=["\']/embed/(\d+)', r'data-video-content-id=["\'](\d+)'), + webpage, 'video id') + + webpage = self._download_webpage( + 'http://www.%s.com/embed/%s' % (site, video_id), + display_id, 'Downloading video embed page') + embed_vars = self._parse_json( + self._search_regex( + r'(?s)embedVars\s*=\s*({.+?})\s*', webpage, 'embed vars'), + display_id) + + youtube_id = embed_vars.get('youtubeId') if youtube_id: return self.url_result(youtube_id, 'Youtube') - formats = [{ - 'url': media['uri'] + '?' 
+ info['AuthToken'], - 'tbr': media['bitRate'], - 'width': media['width'], - 'height': media['height'], - } for media in info['media'] if media.get('mediaPurpose') == 'play'] + title = embed_vars['contentName'] - if not formats: + formats = [] + bitrates = [] + for f in embed_vars.get('media', []): + if not f.get('uri') or f.get('mediaPurpose') != 'play': + continue + bitrate = int_or_none(f.get('bitRate')) + if bitrate: + bitrates.append(bitrate) formats.append({ - 'url': info['videoUri'] + 'url': f['uri'], + 'format_id': 'http-%d' % bitrate if bitrate else 'http', + 'width': int_or_none(f.get('width')), + 'height': int_or_none(f.get('height')), + 'tbr': bitrate, + 'format': 'mp4', }) - self._sort_formats(formats) + if not bitrates: + # When subscriptionLevel > 0, i.e. plus subscription is required + # media list will be empty. However, hds and hls uris are still + # available. We can grab them assuming bitrates to be default. + bitrates = self._DEFAULT_BITRATES - duration = int_or_none(info.get('videoLengthInSeconds')) - age_limit = parse_age_limit(info.get('audienceRating')) + auth_token = embed_vars.get('AuthToken') + + def construct_manifest_url(base_url, ext): + pieces = [base_url] + pieces.extend([compat_str(b) for b in bitrates]) + pieces.append('_kbps.mp4.%s?%s' % (ext, auth_token)) + return ','.join(pieces) + + if bitrates and auth_token: + hds_url = embed_vars.get('hdsUri') + if hds_url: + formats.extend(self._extract_f4m_formats( + construct_manifest_url(hds_url, 'f4m'), + display_id, f4m_id='hds', fatal=False)) + hls_url = embed_vars.get('hlsUri') + if hls_url: + formats.extend(self._extract_m3u8_formats( + construct_manifest_url(hls_url, 'm3u8'), + display_id, 'mp4', entry_protocol='m3u8_native', m3u8_id='hls', fatal=False)) + self._sort_formats(formats) return { 'id': video_id, - 'title': info['contentName'], - 'thumbnail': info['thumbUri'], - 'duration': duration, - 'age_limit': age_limit, + 'display_id': display_id, + 'title': title, + 'thumbnail': 
embed_vars.get('thumbUri'), + 'duration': int_or_none(embed_vars.get('videoLengthInSeconds')) or None, + 'age_limit': parse_age_limit(embed_vars.get('audienceRating')), + 'tags': embed_vars.get('tags', '').split(','), 'formats': formats, } diff --git a/youtube_dl/extractor/byutv.py b/youtube_dl/extractor/byutv.py index 4be175d70..8ef089653 100644 --- a/youtube_dl/extractor/byutv.py +++ b/youtube_dl/extractor/byutv.py @@ -16,7 +16,7 @@ class BYUtvIE(InfoExtractor): 'ext': 'mp4', 'title': 'Season 5 Episode 5', 'description': 'md5:e07269172baff037f8e8bf9956bc9747', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 1486.486, }, 'params': { diff --git a/youtube_dl/extractor/camdemy.py b/youtube_dl/extractor/camdemy.py index d4e6fbdce..8f0c6c545 100644 --- a/youtube_dl/extractor/camdemy.py +++ b/youtube_dl/extractor/camdemy.py @@ -26,7 +26,7 @@ class CamdemyIE(InfoExtractor): 'id': '5181', 'ext': 'mp4', 'title': 'Ch1-1 Introduction, Signals (02-23-2012)', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'creator': 'ss11spring', 'duration': 1591, 'upload_date': '20130114', @@ -41,7 +41,7 @@ class CamdemyIE(InfoExtractor): 'id': '13885', 'ext': 'mp4', 'title': 'EverCam + Camdemy QuickStart', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'description': 'md5:2a9f989c2b153a2342acee579c6e7db6', 'creator': 'evercam', 'duration': 318, diff --git a/youtube_dl/extractor/canvas.py b/youtube_dl/extractor/canvas.py index 2cc539a6c..544c6657c 100644 --- a/youtube_dl/extractor/canvas.py +++ b/youtube_dl/extractor/canvas.py @@ -17,7 +17,7 @@ class CanvasIE(InfoExtractor): 'ext': 'mp4', 'title': 'De afspraak veilt voor de Warmste Week', 'description': 'md5:24cb860c320dc2be7358e0e5aa317ba6', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 49.02, } }, { @@ -29,7 +29,7 @@ class CanvasIE(InfoExtractor): 'ext': 'mp4', 'title': 'Pieter 
0167', 'description': 'md5:943cd30f48a5d29ba02c3a104dc4ec4e', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 2553.08, 'subtitles': { 'nl': [{ @@ -48,7 +48,7 @@ class CanvasIE(InfoExtractor): 'ext': 'mp4', 'title': 'Herbekijk Sorry voor alles', 'description': 'md5:8bb2805df8164e5eb95d6a7a29dc0dd3', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 3788.06, }, 'params': { diff --git a/youtube_dl/extractor/carambatv.py b/youtube_dl/extractor/carambatv.py index 66c0f900a..9ba909a91 100644 --- a/youtube_dl/extractor/carambatv.py +++ b/youtube_dl/extractor/carambatv.py @@ -21,7 +21,7 @@ class CarambaTVIE(InfoExtractor): 'id': '191910501', 'ext': 'mp4', 'title': '[BadComedian] - Разборка в Маниле (Абсолютный обзор)', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'duration': 2678.31, }, }, { @@ -69,7 +69,7 @@ class CarambaTVPageIE(InfoExtractor): 'id': '475222', 'ext': 'flv', 'title': '[BadComedian] - Разборка в Маниле (Абсолютный обзор)', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', # duration reported by videomore is incorrect 'duration': int, }, diff --git a/youtube_dl/extractor/cbsnews.py b/youtube_dl/extractor/cbsnews.py index 91b0f5fa9..17bb9af4f 100644 --- a/youtube_dl/extractor/cbsnews.py +++ b/youtube_dl/extractor/cbsnews.py @@ -39,7 +39,7 @@ class CBSNewsIE(CBSIE): 'upload_date': '20140404', 'timestamp': 1396650660, 'uploader': 'CBSI-NEW', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 205, 'subtitles': { 'en': [{ diff --git a/youtube_dl/extractor/ccc.py b/youtube_dl/extractor/ccc.py index 8f7f09e22..734702144 100644 --- a/youtube_dl/extractor/ccc.py +++ b/youtube_dl/extractor/ccc.py @@ -19,7 +19,7 @@ class CCCIE(InfoExtractor): 'ext': 'mp4', 'title': 'Introduction to Processor Design', 'description': 'md5:df55f6d073d4ceae55aae6f2fd98a0ac', - 'thumbnail': 
're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'upload_date': '20131228', 'timestamp': 1388188800, 'duration': 3710, @@ -32,7 +32,7 @@ class CCCIE(InfoExtractor): def _real_extract(self, url): display_id = self._match_id(url) webpage = self._download_webpage(url, display_id) - event_id = self._search_regex("data-id='(\d+)'", webpage, 'event id') + event_id = self._search_regex(r"data-id='(\d+)'", webpage, 'event id') event_data = self._download_json('https://media.ccc.de/public/events/%s' % event_id, event_id) formats = [] diff --git a/youtube_dl/extractor/cctv.py b/youtube_dl/extractor/cctv.py index 72a72cb73..c76f361c6 100644 --- a/youtube_dl/extractor/cctv.py +++ b/youtube_dl/extractor/cctv.py @@ -4,50 +4,188 @@ from __future__ import unicode_literals import re from .common import InfoExtractor -from ..utils import float_or_none +from ..compat import compat_str +from ..utils import ( + float_or_none, + try_get, + unified_timestamp, +) class CCTVIE(InfoExtractor): - _VALID_URL = r'''(?x)https?://(?:.+?\.)? 
- (?: - cctv\.(?:com|cn)| - cntv\.cn - )/ - (?: - video/[^/]+/(?P<id>[0-9a-f]{32})| - \d{4}/\d{2}/\d{2}/(?P<display_id>VID[0-9A-Za-z]+) - )''' + IE_DESC = '央视网' + _VALID_URL = r'https?://(?:(?:[^/]+)\.(?:cntv|cctv)\.(?:com|cn)|(?:www\.)?ncpa-classic\.com)/(?:[^/]+/)*?(?P<id>[^/?#&]+?)(?:/index)?(?:\.s?html|[?#&]|$)' _TESTS = [{ - 'url': 'http://english.cntv.cn/2016/09/03/VIDEhnkB5y9AgHyIEVphCEz1160903.shtml', - 'md5': '819c7b49fc3927d529fb4cd555621823', + # fo.addVariable("videoCenterId","id") + 'url': 'http://sports.cntv.cn/2016/02/12/ARTIaBRxv4rTT1yWf1frW2wi160212.shtml', + 'md5': 'd61ec00a493e09da810bf406a078f691', 'info_dict': { - 'id': '454368eb19ad44a1925bf1eb96140a61', + 'id': '5ecdbeab623f4973b40ff25f18b174e8', 'ext': 'mp4', - 'title': 'Portrait of Real Current Life 09/03/2016 Modern Inventors Part 1', - } + 'title': '[NBA]二少联手砍下46分 雷霆主场击败鹈鹕(快讯)', + 'description': 'md5:7e14a5328dc5eb3d1cd6afbbe0574e95', + 'duration': 98, + 'uploader': 'songjunjie', + 'timestamp': 1455279956, + 'upload_date': '20160212', + }, + }, { + # var guid = "id" + 'url': 'http://tv.cctv.com/2016/02/05/VIDEUS7apq3lKrHG9Dncm03B160205.shtml', + 'info_dict': { + 'id': 'efc5d49e5b3b4ab2b34f3a502b73d3ae', + 'ext': 'mp4', + 'title': '[赛车]“车王”舒马赫恢复情况成谜(快讯)', + 'description': '2月4日,蒙特泽莫罗透露了关于“车王”舒马赫恢复情况,但情况是否属实遭到了质疑。', + 'duration': 37, + 'uploader': 'shujun', + 'timestamp': 1454677291, + 'upload_date': '20160205', + }, + 'params': { + 'skip_download': True, + }, + }, { + # changePlayer('id') + 'url': 'http://english.cntv.cn/special/four_comprehensives/index.shtml', + 'info_dict': { + 'id': '4bb9bb4db7a6471ba85fdeda5af0381e', + 'ext': 'mp4', + 'title': 'NHnews008 ANNUAL POLITICAL SEASON', + 'description': 'Four Comprehensives', + 'duration': 60, + 'uploader': 'zhangyunlei', + 'timestamp': 1425385521, + 'upload_date': '20150303', + }, + 'params': { + 'skip_download': True, + }, + }, { + # loadvideo('id') + 'url': 'http://cctv.cntv.cn/lm/tvseries_russian/yilugesanghua/index.shtml', + 'info_dict': { + 'id': 
'b15f009ff45c43968b9af583fc2e04b2', + 'ext': 'mp4', + 'title': 'Путь,усыпанный космеями Серия 1', + 'description': 'Путь, усыпанный космеями', + 'duration': 2645, + 'uploader': 'renxue', + 'timestamp': 1477479241, + 'upload_date': '20161026', + }, + 'params': { + 'skip_download': True, + }, + }, { + # var initMyAray = 'id' + 'url': 'http://www.ncpa-classic.com/2013/05/22/VIDE1369219508996867.shtml', + 'info_dict': { + 'id': 'a194cfa7f18c426b823d876668325946', + 'ext': 'mp4', + 'title': '小泽征尔音乐塾 音乐梦想无国界', + 'duration': 2173, + 'timestamp': 1369248264, + 'upload_date': '20130522', + }, + 'params': { + 'skip_download': True, + }, + }, { + # var ids = ["id"] + 'url': 'http://www.ncpa-classic.com/clt/more/416/index.shtml', + 'info_dict': { + 'id': 'a8606119a4884588a79d81c02abecc16', + 'ext': 'mp3', + 'title': '来自维也纳的新年贺礼', + 'description': 'md5:f13764ae8dd484e84dd4b39d5bcba2a7', + 'duration': 1578, + 'uploader': 'djy', + 'timestamp': 1482942419, + 'upload_date': '20161228', + }, + 'params': { + 'skip_download': True, + }, + 'expected_warnings': ['Failed to download m3u8 information'], + }, { + 'url': 'http://ent.cntv.cn/2016/01/18/ARTIjprSSJH8DryTVr5Bx8Wb160118.shtml', + 'only_matching': True, + }, { + 'url': 'http://tv.cntv.cn/video/C39296/e0210d949f113ddfb38d31f00a4e5c44', + 'only_matching': True, + }, { + 'url': 'http://english.cntv.cn/2016/09/03/VIDEhnkB5y9AgHyIEVphCEz1160903.shtml', + 'only_matching': True, }, { 'url': 'http://tv.cctv.com/2016/09/07/VIDE5C1FnlX5bUywlrjhxXOV160907.shtml', 'only_matching': True, }, { 'url': 'http://tv.cntv.cn/video/C39296/95cfac44cabd3ddc4a9438780a4e5c44', - 'only_matching': True + 'only_matching': True, }] def _real_extract(self, url): - video_id, display_id = re.match(self._VALID_URL, url).groups() - if not video_id: - webpage = self._download_webpage(url, display_id) - video_id = self._search_regex( - r'(?:fo\.addVariable\("videoCenterId",\s*|guid\s*=\s*)"([0-9a-f]{32})', - webpage, 'video_id') - api_data = self._download_json( - 
'http://vdn.apps.cntv.cn/api/getHttpVideoInfo.do?pid=' + video_id, video_id) - m3u8_url = re.sub(r'maxbr=\d+&?', '', api_data['hls_url']) + video_id = self._match_id(url) + webpage = self._download_webpage(url, video_id) + + video_id = self._search_regex( + [r'var\s+guid\s*=\s*["\']([\da-fA-F]+)', + r'videoCenterId["\']\s*,\s*["\']([\da-fA-F]+)', + r'changePlayer\s*\(\s*["\']([\da-fA-F]+)', + r'load[Vv]ideo\s*\(\s*["\']([\da-fA-F]+)', + r'var\s+initMyAray\s*=\s*["\']([\da-fA-F]+)', + r'var\s+ids\s*=\s*\[["\']([\da-fA-F]+)'], + webpage, 'video id') + + data = self._download_json( + 'http://vdn.apps.cntv.cn/api/getHttpVideoInfo.do', video_id, + query={ + 'pid': video_id, + 'url': url, + 'idl': 32, + 'idlr': 32, + 'modifyed': 'false', + }) + + title = data['title'] + + formats = [] + + video = data.get('video') + if isinstance(video, dict): + for quality, chapters_key in enumerate(('lowChapters', 'chapters')): + video_url = try_get( + video, lambda x: x[chapters_key][0]['url'], compat_str) + if video_url: + formats.append({ + 'url': video_url, + 'format_id': 'http', + 'quality': quality, + 'preference': -1, + }) + + hls_url = try_get(data, lambda x: x['hls_url'], compat_str) + if hls_url: + hls_url = re.sub(r'maxbr=\d+&?', '', hls_url) + formats.extend(self._extract_m3u8_formats( + hls_url, video_id, 'mp4', entry_protocol='m3u8_native', + m3u8_id='hls', fatal=False)) + + self._sort_formats(formats) + + uploader = data.get('editer_name') + description = self._html_search_meta( + 'description', webpage, default=None) + timestamp = unified_timestamp(data.get('f_pgmtime')) + duration = float_or_none(try_get(video, lambda x: x['totalLength'])) return { 'id': video_id, - 'title': api_data['title'], - 'formats': self._extract_m3u8_formats( - m3u8_url, video_id, 'mp4', 'm3u8_native', fatal=False), - 'duration': float_or_none(api_data.get('video', {}).get('totalLength')), + 'title': title, + 'description': description, + 'uploader': uploader, + 'timestamp': timestamp, + 
'duration': duration, + 'formats': formats, } diff --git a/youtube_dl/extractor/cda.py b/youtube_dl/extractor/cda.py index e00bdaf66..ae7af2f0e 100755 --- a/youtube_dl/extractor/cda.py +++ b/youtube_dl/extractor/cda.py @@ -24,7 +24,7 @@ class CDAIE(InfoExtractor): 'height': 720, 'title': 'Oto dlaczego przed zakrętem należy zwolnić.', 'description': 'md5:269ccd135d550da90d1662651fcb9772', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'average_rating': float, 'duration': 39 } @@ -36,7 +36,7 @@ class CDAIE(InfoExtractor): 'ext': 'mp4', 'title': 'Lądowanie na lotnisku na Maderze', 'description': 'md5:60d76b71186dcce4e0ba6d4bbdb13e1a', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'uploader': 'crash404', 'view_count': int, 'average_rating': float, diff --git a/youtube_dl/extractor/ceskatelevize.py b/youtube_dl/extractor/ceskatelevize.py index 4ec79d19d..4f88c31ad 100644 --- a/youtube_dl/extractor/ceskatelevize.py +++ b/youtube_dl/extractor/ceskatelevize.py @@ -25,7 +25,7 @@ class CeskaTelevizeIE(InfoExtractor): 'ext': 'mp4', 'title': 'Hyde Park Civilizace', 'description': 'md5:fe93f6eda372d150759d11644ebbfb4a', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'duration': 3350, }, 'params': { @@ -39,7 +39,7 @@ class CeskaTelevizeIE(InfoExtractor): 'ext': 'mp4', 'title': 'Hyde Park Civilizace: Bonus 01 - En', 'description': 'English Subtittles', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'duration': 81.3, }, 'params': { @@ -52,7 +52,7 @@ class CeskaTelevizeIE(InfoExtractor): 'info_dict': { 'id': 402, 'ext': 'mp4', - 'title': 're:^ČT Sport \d{4}-\d{2}-\d{2} \d{2}:\d{2}$', + 'title': r're:^ČT Sport \d{4}-\d{2}-\d{2} \d{2}:\d{2}$', 'is_live': True, }, 'params': { @@ -80,7 +80,7 @@ class CeskaTelevizeIE(InfoExtractor): 'id': '61924494877068022', 'ext': 'mp4', 'title': 'Queer: Bogotart (Queer)', - 'thumbnail': 're:^https?://.*\.jpg', + 
'thumbnail': r're:^https?://.*\.jpg', 'duration': 1558.3, }, }], diff --git a/youtube_dl/extractor/channel9.py b/youtube_dl/extractor/channel9.py index 34d4e6156..865dbcaba 100644 --- a/youtube_dl/extractor/channel9.py +++ b/youtube_dl/extractor/channel9.py @@ -31,7 +31,7 @@ class Channel9IE(InfoExtractor): 'title': 'Developer Kick-Off Session: Stuff We Love', 'description': 'md5:c08d72240b7c87fcecafe2692f80e35f', 'duration': 4576, - 'thumbnail': 're:http://.*\.jpg', + 'thumbnail': r're:http://.*\.jpg', 'session_code': 'KOS002', 'session_day': 'Day 1', 'session_room': 'Arena 1A', @@ -47,7 +47,7 @@ class Channel9IE(InfoExtractor): 'title': 'Self-service BI with Power BI - nuclear testing', 'description': 'md5:d1e6ecaafa7fb52a2cacdf9599829f5b', 'duration': 1540, - 'thumbnail': 're:http://.*\.jpg', + 'thumbnail': r're:http://.*\.jpg', 'authors': ['Mike Wilmot'], }, }, { @@ -59,7 +59,7 @@ class Channel9IE(InfoExtractor): 'title': 'Ranges for the Standard Library', 'description': 'md5:2e6b4917677af3728c5f6d63784c4c5d', 'duration': 5646, - 'thumbnail': 're:http://.*\.jpg', + 'thumbnail': r're:http://.*\.jpg', }, 'params': { 'skip_download': True, diff --git a/youtube_dl/extractor/charlierose.py b/youtube_dl/extractor/charlierose.py index 4bf2cf7b0..2d517f231 100644 --- a/youtube_dl/extractor/charlierose.py +++ b/youtube_dl/extractor/charlierose.py @@ -13,7 +13,7 @@ class CharlieRoseIE(InfoExtractor): 'id': '27996', 'ext': 'mp4', 'title': 'Remembering Zaha Hadid', - 'thumbnail': 're:^https?://.*\.jpg\?\d+', + 'thumbnail': r're:^https?://.*\.jpg\?\d+', 'description': 'We revisit past conversations with Zaha Hadid, in memory of the world renowned Iraqi architect.', 'subtitles': { 'en': [{ diff --git a/youtube_dl/extractor/cliphunter.py b/youtube_dl/extractor/cliphunter.py index 252c2e846..ab651d1c8 100644 --- a/youtube_dl/extractor/cliphunter.py +++ b/youtube_dl/extractor/cliphunter.py @@ -30,7 +30,7 @@ class CliphunterIE(InfoExtractor): 'id': '1012420', 'ext': 'flv', 
'title': 'Fun Jynx Maze solo', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'age_limit': 18, }, 'skip': 'Video gone', @@ -41,7 +41,7 @@ class CliphunterIE(InfoExtractor): 'id': '2019449', 'ext': 'mp4', 'title': 'ShesNew - My booty girlfriend, Victoria Paradice\'s pussy filled with jizz', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'age_limit': 18, }, }] diff --git a/youtube_dl/extractor/clipsyndicate.py b/youtube_dl/extractor/clipsyndicate.py index 0b6ad895f..6cdb42f5a 100644 --- a/youtube_dl/extractor/clipsyndicate.py +++ b/youtube_dl/extractor/clipsyndicate.py @@ -18,7 +18,7 @@ class ClipsyndicateIE(InfoExtractor): 'ext': 'mp4', 'title': 'Brick Briscoe', 'duration': 612, - 'thumbnail': 're:^https?://.+\.jpg', + 'thumbnail': r're:^https?://.+\.jpg', }, }, { 'url': 'http://chic.clipsyndicate.com/video/play/5844117/shark_attack', diff --git a/youtube_dl/extractor/clubic.py b/youtube_dl/extractor/clubic.py index f7ee3a8f8..98f9cb596 100644 --- a/youtube_dl/extractor/clubic.py +++ b/youtube_dl/extractor/clubic.py @@ -19,7 +19,7 @@ class ClubicIE(InfoExtractor): 'ext': 'mp4', 'title': 'Clubic Week 2.0 : le FBI se lance dans la photo d\u0092identité', 'description': 're:Gueule de bois chez Nokia. 
Le constructeur a indiqué cette.*', - 'thumbnail': 're:^http://img\.clubic\.com/.*\.jpg$', + 'thumbnail': r're:^http://img\.clubic\.com/.*\.jpg$', } }, { 'url': 'http://www.clubic.com/video/video-clubic-week-2-0-apple-iphone-6s-et-plus-mais-surtout-le-pencil-469792.html', diff --git a/youtube_dl/extractor/collegerama.py b/youtube_dl/extractor/collegerama.py index f9e84193d..18c734766 100644 --- a/youtube_dl/extractor/collegerama.py +++ b/youtube_dl/extractor/collegerama.py @@ -21,7 +21,7 @@ class CollegeRamaIE(InfoExtractor): 'ext': 'mp4', 'title': 'Een nieuwe wereld: waarden, bewustzijn en techniek van de mensheid 2.0.', 'description': '', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 7713.088, 'timestamp': 1413309600, 'upload_date': '20141014', diff --git a/youtube_dl/extractor/comedycentral.py b/youtube_dl/extractor/comedycentral.py index 0239dfd84..8bd589774 100644 --- a/youtube_dl/extractor/comedycentral.py +++ b/youtube_dl/extractor/comedycentral.py @@ -57,7 +57,8 @@ class ComedyCentralFullEpisodesIE(MTVServicesInfoExtractor): feed = self._download_json(video_zone['feed'], playlist_id) mgid = feed['result']['data']['id'] - videos_info = self._get_videos_info(mgid) + videos_info = self._get_videos_info(mgid, use_hls=True) + return videos_info @@ -79,7 +80,7 @@ class ToshIE(MTVServicesInfoExtractor): 'ext': 'mp4', 'title': 'Tosh.0|June 9, 2077|2|211|Twitter Users Share Summer Plans', 'description': 'Tosh asked fans to share their summer plans.', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', # It's really reported to be published on year 2077 'upload_date': '20770610', 'timestamp': 3390510600, diff --git a/youtube_dl/extractor/coub.py b/youtube_dl/extractor/coub.py index a901b8d22..5fa1f006b 100644 --- a/youtube_dl/extractor/coub.py +++ b/youtube_dl/extractor/coub.py @@ -20,7 +20,7 @@ class CoubIE(InfoExtractor): 'id': '5u5n1', 'ext': 'mp4', 'title': 'The Matrix Moonwalk', - 
'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 4.6, 'timestamp': 1428527772, 'upload_date': '20150408', diff --git a/youtube_dl/extractor/crackle.py b/youtube_dl/extractor/crackle.py index cc68f1c00..25c5e7d04 100644 --- a/youtube_dl/extractor/crackle.py +++ b/youtube_dl/extractor/crackle.py @@ -14,7 +14,7 @@ class CrackleIE(InfoExtractor): 'ext': 'mp4', 'title': 'Everybody Respects A Bloody Nose', 'description': 'Jerry is kaffeeklatsching in L.A. with funnyman J.B. Smoove (Saturday Night Live, Real Husbands of Hollywood). They’re headed for brew at 10 Speed Coffee in a 1964 Studebaker Avanti.', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'duration': 906, 'series': 'Comedians In Cars Getting Coffee', 'season_number': 8, diff --git a/youtube_dl/extractor/criterion.py b/youtube_dl/extractor/criterion.py index cf6a5d6cb..f7815b905 100644 --- a/youtube_dl/extractor/criterion.py +++ b/youtube_dl/extractor/criterion.py @@ -14,7 +14,7 @@ class CriterionIE(InfoExtractor): 'ext': 'mp4', 'title': 'Le Samouraï', 'description': 'md5:a2b4b116326558149bef81f76dcbb93f', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', } } diff --git a/youtube_dl/extractor/crooksandliars.py b/youtube_dl/extractor/crooksandliars.py index 443eb7691..7fb782db7 100644 --- a/youtube_dl/extractor/crooksandliars.py +++ b/youtube_dl/extractor/crooksandliars.py @@ -16,7 +16,7 @@ class CrooksAndLiarsIE(InfoExtractor): 'ext': 'mp4', 'title': 'Fox & Friends Says Protecting Atheists From Discrimination Is Anti-Christian!', 'description': 'md5:e1a46ad1650e3a5ec7196d432799127f', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'timestamp': 1428207000, 'upload_date': '20150405', 'uploader': 'Heather', diff --git a/youtube_dl/extractor/crunchyroll.py b/youtube_dl/extractor/crunchyroll.py index 8d5b69f68..559044352 100644 --- a/youtube_dl/extractor/crunchyroll.py +++ 
b/youtube_dl/extractor/crunchyroll.py @@ -142,7 +142,7 @@ class CrunchyrollIE(CrunchyrollBaseIE): 'ext': 'flv', 'title': 'Culture Japan Episode 1 – Rebuilding Japan after the 3.11', 'description': 'md5:2fbc01f90b87e8e9137296f37b461c12', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'uploader': 'Danny Choo Network', 'upload_date': '20120213', }, @@ -158,7 +158,7 @@ class CrunchyrollIE(CrunchyrollBaseIE): 'ext': 'mp4', 'title': 'Re:ZERO -Starting Life in Another World- Episode 5 – The Morning of Our Promise Is Still Distant', 'description': 'md5:97664de1ab24bbf77a9c01918cb7dca9', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'uploader': 'TV TOKYO', 'upload_date': '20160508', }, diff --git a/youtube_dl/extractor/ctsnews.py b/youtube_dl/extractor/ctsnews.py index 83ca90c3b..d565335cf 100644 --- a/youtube_dl/extractor/ctsnews.py +++ b/youtube_dl/extractor/ctsnews.py @@ -28,7 +28,7 @@ class CtsNewsIE(InfoExtractor): 'ext': 'mp4', 'title': '韓國31歲童顏男 貌如十多歲小孩', 'description': '越有年紀的人,越希望看起來年輕一點,而南韓卻有一位31歲的男子,看起來像是11、12歲的小孩,身...', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'timestamp': 1378205880, 'upload_date': '20130903', } @@ -41,7 +41,7 @@ class CtsNewsIE(InfoExtractor): 'ext': 'mp4', 'title': 'iPhone6熱銷 蘋果財報亮眼', 'description': 'md5:f395d4f485487bb0f992ed2c4b07aa7d', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'upload_date': '20150128', 'uploader_id': 'TBSCTS', 'uploader': '中華電視公司', diff --git a/youtube_dl/extractor/cultureunplugged.py b/youtube_dl/extractor/cultureunplugged.py index 9f26fa587..bcdf27323 100644 --- a/youtube_dl/extractor/cultureunplugged.py +++ b/youtube_dl/extractor/cultureunplugged.py @@ -21,7 +21,7 @@ class CultureUnpluggedIE(InfoExtractor): 'ext': 'mp4', 'title': 'The Next, Best West', 'description': 'md5:0423cd00833dea1519cf014e9d0903b1', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': 
r're:^https?://.*\.jpg$', 'creator': 'Coldstream Creative', 'duration': 2203, 'view_count': int, diff --git a/youtube_dl/extractor/dailymotion.py b/youtube_dl/extractor/dailymotion.py index 4a3314ea7..31bf5faf6 100644 --- a/youtube_dl/extractor/dailymotion.py +++ b/youtube_dl/extractor/dailymotion.py @@ -58,7 +58,7 @@ class DailymotionIE(DailymotionBaseInfoExtractor): 'ext': 'mp4', 'title': 'Steam Machine Models, Pricing Listed on Steam Store - IGN News', 'description': 'Several come bundled with the Steam Controller.', - 'thumbnail': 're:^https?:.*\.(?:jpg|png)$', + 'thumbnail': r're:^https?:.*\.(?:jpg|png)$', 'duration': 74, 'timestamp': 1425657362, 'upload_date': '20150306', diff --git a/youtube_dl/extractor/daum.py b/youtube_dl/extractor/daum.py index 732b4362a..76f021892 100644 --- a/youtube_dl/extractor/daum.py +++ b/youtube_dl/extractor/daum.py @@ -32,7 +32,7 @@ class DaumIE(InfoExtractor): 'title': '마크 헌트 vs 안토니오 실바', 'description': 'Mark Hunt vs Antonio Silva', 'upload_date': '20131217', - 'thumbnail': 're:^https?://.*\.(?:jpg|png)', + 'thumbnail': r're:^https?://.*\.(?:jpg|png)', 'duration': 2117, 'view_count': int, 'comment_count': int, @@ -45,7 +45,7 @@ class DaumIE(InfoExtractor): 'title': '1297회, \'아빠 아들로 태어나길 잘 했어\' 민수, 감동의 눈물[아빠 어디가] 20150118', 'description': 'md5:79794514261164ff27e36a21ad229fc5', 'upload_date': '20150604', - 'thumbnail': 're:^https?://.*\.(?:jpg|png)', + 'thumbnail': r're:^https?://.*\.(?:jpg|png)', 'duration': 154, 'view_count': int, 'comment_count': int, @@ -61,7 +61,7 @@ class DaumIE(InfoExtractor): 'title': '01-Korean War ( Trouble on the horizon )', 'description': '\nKorean War 01\nTrouble on the horizon\n전쟁의 먹구름', 'upload_date': '20080223', - 'thumbnail': 're:^https?://.*\.(?:jpg|png)', + 'thumbnail': r're:^https?://.*\.(?:jpg|png)', 'duration': 249, 'view_count': int, 'comment_count': int, @@ -139,7 +139,7 @@ class DaumClipIE(InfoExtractor): 'title': 'DOTA 2GETHER 시즌2 6회 - 2부', 'description': 'DOTA 2GETHER 시즌2 6회 - 2부', 
'upload_date': '20130831', - 'thumbnail': 're:^https?://.*\.(?:jpg|png)', + 'thumbnail': r're:^https?://.*\.(?:jpg|png)', 'duration': 3868, 'view_count': int, }, diff --git a/youtube_dl/extractor/dbtv.py b/youtube_dl/extractor/dbtv.py index 6d880d43d..f232f0dc5 100644 --- a/youtube_dl/extractor/dbtv.py +++ b/youtube_dl/extractor/dbtv.py @@ -17,7 +17,7 @@ class DBTVIE(InfoExtractor): 'ext': 'mp4', 'title': 'Skulle teste ut fornøyelsespark, men kollegaen var bare opptatt av bikinikroppen', 'description': 'md5:1504a54606c4dde3e4e61fc97aa857e0', - 'thumbnail': 're:https?://.*\.jpg', + 'thumbnail': r're:https?://.*\.jpg', 'timestamp': 1404039863, 'upload_date': '20140629', 'duration': 69.544, diff --git a/youtube_dl/extractor/dctp.py b/youtube_dl/extractor/dctp.py index 14ba88715..00fbbff2f 100644 --- a/youtube_dl/extractor/dctp.py +++ b/youtube_dl/extractor/dctp.py @@ -17,7 +17,7 @@ class DctpTvIE(InfoExtractor): 'title': 'Videoinstallation für eine Kaufhausfassade', 'description': 'Kurzfilm', 'upload_date': '20110407', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', }, } diff --git a/youtube_dl/extractor/deezer.py b/youtube_dl/extractor/deezer.py index 7a07f3267..ec87b94db 100644 --- a/youtube_dl/extractor/deezer.py +++ b/youtube_dl/extractor/deezer.py @@ -19,7 +19,7 @@ class DeezerPlaylistIE(InfoExtractor): 'id': '176747451', 'title': 'Best!', 'uploader': 'Anonymous', - 'thumbnail': 're:^https?://cdn-images.deezer.com/images/cover/.*\.jpg$', + 'thumbnail': r're:^https?://cdn-images.deezer.com/images/cover/.*\.jpg$', }, 'playlist_count': 30, 'skip': 'Only available in .de', diff --git a/youtube_dl/extractor/dhm.py b/youtube_dl/extractor/dhm.py index 44e0c5d4d..aee72a6ed 100644 --- a/youtube_dl/extractor/dhm.py +++ b/youtube_dl/extractor/dhm.py @@ -17,7 +17,7 @@ class DHMIE(InfoExtractor): 'title': 'MARSHALL PLAN AT WORK IN WESTERN GERMANY, THE', 'description': 'md5:1fabd480c153f97b07add61c44407c82', 'duration': 660, - 'thumbnail': 
're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', }, }, { 'url': 'http://www.dhm.de/filmarchiv/02-mapping-the-wall/peter-g/rolle-1/', @@ -26,7 +26,7 @@ class DHMIE(InfoExtractor): 'id': 'rolle-1', 'ext': 'flv', 'title': 'ROLLE 1', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', }, }] diff --git a/youtube_dl/extractor/digiteka.py b/youtube_dl/extractor/digiteka.py index 7bb79ffda..3dfde0d8c 100644 --- a/youtube_dl/extractor/digiteka.py +++ b/youtube_dl/extractor/digiteka.py @@ -36,7 +36,7 @@ class DigitekaIE(InfoExtractor): 'id': 's8uk0r', 'ext': 'mp4', 'title': 'Loi sur la fin de vie: le texte prévoit un renforcement des directives anticipées', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'duration': 74, 'upload_date': '20150317', 'timestamp': 1426604939, @@ -50,7 +50,7 @@ class DigitekaIE(InfoExtractor): 'id': 'xvpfp8', 'ext': 'mp4', 'title': 'Two - C\'est La Vie (clip)', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'duration': 233, 'upload_date': '20150224', 'timestamp': 1424760500, diff --git a/youtube_dl/extractor/discoverygo.py b/youtube_dl/extractor/discoverygo.py index c4e83b2c3..2042493a8 100644 --- a/youtube_dl/extractor/discoverygo.py +++ b/youtube_dl/extractor/discoverygo.py @@ -6,7 +6,6 @@ from ..utils import ( extract_attributes, int_or_none, parse_age_limit, - unescapeHTML, ExtractorError, ) @@ -49,7 +48,7 @@ class DiscoveryGoIE(InfoExtractor): webpage, 'video container')) video = self._parse_json( - unescapeHTML(container.get('data-video') or container.get('data-json')), + container.get('data-video') or container.get('data-json'), display_id) title = video['name'] diff --git a/youtube_dl/extractor/douyutv.py b/youtube_dl/extractor/douyutv.py index e366e17e6..2f3c5113e 100644 --- a/youtube_dl/extractor/douyutv.py +++ b/youtube_dl/extractor/douyutv.py @@ -26,8 +26,8 @@ class DouyuTVIE(InfoExtractor): 'display_id': 'iseven', 'ext': 'flv', 
'title': 're:^清晨醒脑!T-ara根本停不下来! [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$', - 'description': 're:.*m7show@163\.com.*', - 'thumbnail': 're:^https?://.*\.jpg$', + 'description': r're:.*m7show@163\.com.*', + 'thumbnail': r're:^https?://.*\.jpg$', 'uploader': '7师傅', 'is_live': True, }, @@ -42,7 +42,7 @@ class DouyuTVIE(InfoExtractor): 'ext': 'flv', 'title': 're:^小漠从零单排记!——CSOL2躲猫猫 [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$', 'description': 'md5:746a2f7a253966a06755a912f0acc0d2', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'uploader': 'douyu小漠', 'is_live': True, }, @@ -57,8 +57,8 @@ class DouyuTVIE(InfoExtractor): 'display_id': '17732', 'ext': 'flv', 'title': 're:^清晨醒脑!T-ara根本停不下来! [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$', - 'description': 're:.*m7show@163\.com.*', - 'thumbnail': 're:^https?://.*\.jpg$', + 'description': r're:.*m7show@163\.com.*', + 'thumbnail': r're:^https?://.*\.jpg$', 'uploader': '7师傅', 'is_live': True, }, diff --git a/youtube_dl/extractor/dramafever.py b/youtube_dl/extractor/dramafever.py index c11595612..1edd8e7bd 100644 --- a/youtube_dl/extractor/dramafever.py +++ b/youtube_dl/extractor/dramafever.py @@ -76,7 +76,7 @@ class DramaFeverIE(DramaFeverBaseIE): 'description': 'md5:a8eec7942e1664a6896fcd5e1287bfd0', 'episode': 'Episode 1', 'episode_number': 1, - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'timestamp': 1404336058, 'upload_date': '20140702', 'duration': 343, @@ -94,7 +94,7 @@ class DramaFeverIE(DramaFeverBaseIE): 'description': 'md5:3ff2ee8fedaef86e076791c909cf2e91', 'episode': 'Mnet Asian Music Awards 2015 - Part 3', 'episode_number': 4, - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'timestamp': 1450213200, 'upload_date': '20151215', 'duration': 5602, diff --git a/youtube_dl/extractor/drbonanza.py b/youtube_dl/extractor/drbonanza.py index 01271f8f0..79ec212c8 100644 --- a/youtube_dl/extractor/drbonanza.py +++ 
b/youtube_dl/extractor/drbonanza.py @@ -20,7 +20,7 @@ class DRBonanzaIE(InfoExtractor): 'ext': 'mp4', 'title': 'Talkshowet - Leonard Cohen', 'description': 'md5:8f34194fb30cd8c8a30ad8b27b70c0ca', - 'thumbnail': 're:^https?://.*\.(?:gif|jpg)$', + 'thumbnail': r're:^https?://.*\.(?:gif|jpg)$', 'timestamp': 1295537932, 'upload_date': '20110120', 'duration': 3664, @@ -36,7 +36,7 @@ class DRBonanzaIE(InfoExtractor): 'ext': 'mp3', 'title': 'EM fodbold 1992 Danmark - Tyskland finale Transmission', 'description': 'md5:501e5a195749480552e214fbbed16c4e', - 'thumbnail': 're:^https?://.*\.(?:gif|jpg)$', + 'thumbnail': r're:^https?://.*\.(?:gif|jpg)$', 'timestamp': 1223274900, 'upload_date': '20081006', 'duration': 7369, diff --git a/youtube_dl/extractor/dreisat.py b/youtube_dl/extractor/dreisat.py index 908c9e514..f138025d5 100644 --- a/youtube_dl/extractor/dreisat.py +++ b/youtube_dl/extractor/dreisat.py @@ -2,10 +2,19 @@ from __future__ import unicode_literals import re -from .zdf import ZDFIE +from .common import InfoExtractor +from ..utils import ( + int_or_none, + unified_strdate, + xpath_text, + determine_ext, + qualities, + float_or_none, + ExtractorError, +) -class DreiSatIE(ZDFIE): +class DreiSatIE(InfoExtractor): IE_NAME = '3sat' _VALID_URL = r'(?:https?://)?(?:www\.)?3sat\.de/mediathek/(?:index\.php|mediathek\.php)?\?(?:(?:mode|display)=[^&]+&)*obj=(?P<id>[0-9]+)$' _TESTS = [ @@ -31,6 +40,163 @@ class DreiSatIE(ZDFIE): }, ] + def _parse_smil_formats(self, smil, smil_url, video_id, namespace=None, f4m_params=None, transform_rtmp_url=None): + param_groups = {} + for param_group in smil.findall(self._xpath_ns('./head/paramGroup', namespace)): + group_id = param_group.attrib.get(self._xpath_ns('id', 'http://www.w3.org/XML/1998/namespace')) + params = {} + for param in param_group: + params[param.get('name')] = param.get('value') + param_groups[group_id] = params + + formats = [] + for video in smil.findall(self._xpath_ns('.//video', namespace)): + src = video.get('src') + 
if not src: + continue + bitrate = float_or_none(video.get('system-bitrate') or video.get('systemBitrate'), 1000) + group_id = video.get('paramGroup') + param_group = param_groups[group_id] + for proto in param_group['protocols'].split(','): + formats.append({ + 'url': '%s://%s' % (proto, param_group['host']), + 'app': param_group['app'], + 'play_path': src, + 'ext': 'flv', + 'format_id': '%s-%d' % (proto, bitrate), + 'tbr': bitrate, + }) + self._sort_formats(formats) + return formats + + def extract_from_xml_url(self, video_id, xml_url): + doc = self._download_xml( + xml_url, video_id, + note='Downloading video info', + errnote='Failed to download video info') + + status_code = doc.find('./status/statuscode') + if status_code is not None and status_code.text != 'ok': + code = status_code.text + if code == 'notVisibleAnymore': + message = 'Video %s is not available' % video_id + else: + message = '%s returned error: %s' % (self.IE_NAME, code) + raise ExtractorError(message, expected=True) + + title = doc.find('.//information/title').text + description = xpath_text(doc, './/information/detail', 'description') + duration = int_or_none(xpath_text(doc, './/details/lengthSec', 'duration')) + uploader = xpath_text(doc, './/details/originChannelTitle', 'uploader') + uploader_id = xpath_text(doc, './/details/originChannelId', 'uploader id') + upload_date = unified_strdate(xpath_text(doc, './/details/airtime', 'upload date')) + + def xml_to_thumbnails(fnode): + thumbnails = [] + for node in fnode: + thumbnail_url = node.text + if not thumbnail_url: + continue + thumbnail = { + 'url': thumbnail_url, + } + if 'key' in node.attrib: + m = re.match('^([0-9]+)x([0-9]+)$', node.attrib['key']) + if m: + thumbnail['width'] = int(m.group(1)) + thumbnail['height'] = int(m.group(2)) + thumbnails.append(thumbnail) + return thumbnails + + thumbnails = xml_to_thumbnails(doc.findall('.//teaserimages/teaserimage')) + + format_nodes = doc.findall('.//formitaeten/formitaet') + quality = 
qualities(['veryhigh', 'high', 'med', 'low']) + + def get_quality(elem): + return quality(xpath_text(elem, 'quality')) + format_nodes.sort(key=get_quality) + format_ids = [] + formats = [] + for fnode in format_nodes: + video_url = fnode.find('url').text + is_available = 'http://www.metafilegenerator' not in video_url + if not is_available: + continue + format_id = fnode.attrib['basetype'] + quality = xpath_text(fnode, './quality', 'quality') + format_m = re.match(r'''(?x) + (?P<vcodec>[^_]+)_(?P<acodec>[^_]+)_(?P<container>[^_]+)_ + (?P<proto>[^_]+)_(?P<index>[^_]+)_(?P<indexproto>[^_]+) + ''', format_id) + + ext = determine_ext(video_url, None) or format_m.group('container') + if ext not in ('smil', 'f4m', 'm3u8'): + format_id = format_id + '-' + quality + if format_id in format_ids: + continue + + if ext == 'meta': + continue + elif ext == 'smil': + formats.extend(self._extract_smil_formats( + video_url, video_id, fatal=False)) + elif ext == 'm3u8': + # the certificates are misconfigured (see + # https://github.com/rg3/youtube-dl/issues/8665) + if video_url.startswith('https://'): + continue + formats.extend(self._extract_m3u8_formats( + video_url, video_id, 'mp4', m3u8_id=format_id, fatal=False)) + elif ext == 'f4m': + formats.extend(self._extract_f4m_formats( + video_url, video_id, f4m_id=format_id, fatal=False)) + else: + proto = format_m.group('proto').lower() + + abr = int_or_none(xpath_text(fnode, './audioBitrate', 'abr'), 1000) + vbr = int_or_none(xpath_text(fnode, './videoBitrate', 'vbr'), 1000) + + width = int_or_none(xpath_text(fnode, './width', 'width')) + height = int_or_none(xpath_text(fnode, './height', 'height')) + + filesize = int_or_none(xpath_text(fnode, './filesize', 'filesize')) + + format_note = '' + if not format_note: + format_note = None + + formats.append({ + 'format_id': format_id, + 'url': video_url, + 'ext': ext, + 'acodec': format_m.group('acodec'), + 'vcodec': format_m.group('vcodec'), + 'abr': abr, + 'vbr': vbr, + 'width': width, + 'height': height, + 'filesize': filesize, + 
'format_note': format_note, + 'protocol': proto, + '_available': is_available, + }) + format_ids.append(format_id) + + self._sort_formats(formats) + + return { + 'id': video_id, + 'title': title, + 'description': description, + 'duration': duration, + 'thumbnails': thumbnails, + 'uploader': uploader, + 'uploader_id': uploader_id, + 'upload_date': upload_date, + 'formats': formats, + } + def _real_extract(self, url): mobj = re.match(self._VALID_URL, url) video_id = mobj.group('id') diff --git a/youtube_dl/extractor/drtuber.py b/youtube_dl/extractor/drtuber.py index 22da8e481..1eca82b3b 100644 --- a/youtube_dl/extractor/drtuber.py +++ b/youtube_dl/extractor/drtuber.py @@ -22,7 +22,7 @@ class DrTuberIE(InfoExtractor): 'like_count': int, 'comment_count': int, 'categories': ['Babe', 'Blonde', 'Erotic', 'Outdoor', 'Softcore', 'Solo'], - 'thumbnail': 're:https?://.*\.jpg$', + 'thumbnail': r're:https?://.*\.jpg$', 'age_limit': 18, } }, { diff --git a/youtube_dl/extractor/dumpert.py b/youtube_dl/extractor/dumpert.py index e5aadcd25..c9fc9b5a9 100644 --- a/youtube_dl/extractor/dumpert.py +++ b/youtube_dl/extractor/dumpert.py @@ -21,7 +21,7 @@ class DumpertIE(InfoExtractor): 'ext': 'mp4', 'title': 'Ik heb nieuws voor je', 'description': 'Niet schrikken hoor', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', } }, { 'url': 'http://www.dumpert.nl/embed/6675421/dc440fe7/', diff --git a/youtube_dl/extractor/eagleplatform.py b/youtube_dl/extractor/eagleplatform.py index c2f593eca..76d39adac 100644 --- a/youtube_dl/extractor/eagleplatform.py +++ b/youtube_dl/extractor/eagleplatform.py @@ -31,7 +31,7 @@ class EaglePlatformIE(InfoExtractor): 'ext': 'mp4', 'title': 'Навальный вышел на свободу', 'description': 'md5:d97861ac9ae77377f3f20eaf9d04b4f5', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 87, 'view_count': int, 'age_limit': 0, @@ -45,7 +45,7 @@ class EaglePlatformIE(InfoExtractor): 'id': '12820', 'ext': 
'mp4', 'title': "'O Sole Mio", - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 216, 'view_count': int, }, diff --git a/youtube_dl/extractor/einthusan.py b/youtube_dl/extractor/einthusan.py index 443865ad2..6ca07a13d 100644 --- a/youtube_dl/extractor/einthusan.py +++ b/youtube_dl/extractor/einthusan.py @@ -19,7 +19,7 @@ class EinthusanIE(InfoExtractor): 'id': '2447', 'ext': 'mp4', 'title': 'Ek Villain', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'description': 'md5:9d29fc91a7abadd4591fb862fa560d93', } }, @@ -30,7 +30,7 @@ class EinthusanIE(InfoExtractor): 'id': '1671', 'ext': 'mp4', 'title': 'Soodhu Kavvuum', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'description': 'md5:b40f2bf7320b4f9414f3780817b2af8c', } }, diff --git a/youtube_dl/extractor/eroprofile.py b/youtube_dl/extractor/eroprofile.py index 297f8a6f5..c08643a17 100644 --- a/youtube_dl/extractor/eroprofile.py +++ b/youtube_dl/extractor/eroprofile.py @@ -22,7 +22,7 @@ class EroProfileIE(InfoExtractor): 'display_id': 'sexy-babe-softcore', 'ext': 'm4v', 'title': 'sexy babe softcore', - 'thumbnail': 're:https?://.*\.jpg', + 'thumbnail': r're:https?://.*\.jpg', 'age_limit': 18, } }, { @@ -32,7 +32,7 @@ class EroProfileIE(InfoExtractor): 'id': '1133519', 'ext': 'm4v', 'title': 'Try It On Pee_cut_2.wmv - 4shared.com - file sharing - download movie file', - 'thumbnail': 're:https?://.*\.jpg', + 'thumbnail': r're:https?://.*\.jpg', 'age_limit': 18, }, 'skip': 'Requires login', diff --git a/youtube_dl/extractor/escapist.py b/youtube_dl/extractor/escapist.py index a3d7bbbcb..4d8a3c134 100644 --- a/youtube_dl/extractor/escapist.py +++ b/youtube_dl/extractor/escapist.py @@ -45,7 +45,7 @@ class EscapistIE(InfoExtractor): 'ext': 'mp4', 'description': "Baldur's Gate: Original, Modded or Enhanced Edition? 
I'll break down what you can expect from the new Baldur's Gate: Enhanced Edition.", 'title': "Breaking Down Baldur's Gate", - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 264, 'uploader': 'The Escapist', } @@ -57,7 +57,7 @@ class EscapistIE(InfoExtractor): 'ext': 'mp4', 'description': 'This week, Zero Punctuation reviews Evolve.', 'title': 'Evolve - One vs Multiplayer', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 304, 'uploader': 'The Escapist', } diff --git a/youtube_dl/extractor/esri.py b/youtube_dl/extractor/esri.py index d4205d7fb..e9dcaeb1d 100644 --- a/youtube_dl/extractor/esri.py +++ b/youtube_dl/extractor/esri.py @@ -22,7 +22,7 @@ class EsriVideoIE(InfoExtractor): 'ext': 'mp4', 'title': 'ArcGIS Online - Developing Applications', 'description': 'Jeremy Bartley demonstrates how to develop applications with ArcGIS Online.', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 185, 'upload_date': '20120419', } diff --git a/youtube_dl/extractor/europa.py b/youtube_dl/extractor/europa.py index adc43919e..1efc0b2ec 100644 --- a/youtube_dl/extractor/europa.py +++ b/youtube_dl/extractor/europa.py @@ -23,7 +23,7 @@ class EuropaIE(InfoExtractor): 'ext': 'mp4', 'title': 'TRADE - Wikileaks on TTIP', 'description': 'NEW LIVE EC Midday press briefing of 11/08/2015', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'upload_date': '20150811', 'duration': 34, 'view_count': int, diff --git a/youtube_dl/extractor/expotv.py b/youtube_dl/extractor/expotv.py index ef11962f3..95a897782 100644 --- a/youtube_dl/extractor/expotv.py +++ b/youtube_dl/extractor/expotv.py @@ -17,7 +17,7 @@ class ExpoTVIE(InfoExtractor): 'ext': 'mp4', 'title': 'NYX Butter Lipstick Little Susie', 'description': 'Goes on like butter, but looks better!', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 
'uploader': 'Stephanie S.', 'upload_date': '20150520', 'view_count': int, diff --git a/youtube_dl/extractor/extractors.py b/youtube_dl/extractor/extractors.py index fcfe87f6f..ed9a133ea 100644 --- a/youtube_dl/extractor/extractors.py +++ b/youtube_dl/extractor/extractors.py @@ -38,10 +38,7 @@ from .amcnetworks import AMCNetworksIE from .animeondemand import AnimeOnDemandIE from .anitube import AnitubeIE from .anysex import AnySexIE -from .aol import ( - AolIE, - AolFeaturesIE, -) +from .aol import AolIE from .allocine import AllocineIE from .aparat import AparatIE from .appleconnect import AppleConnectIE @@ -320,7 +317,6 @@ from .francetv import ( ) from .freesound import FreesoundIE from .freespeech import FreespeechIE -from .freevideo import FreeVideoIE from .funimation import FunimationIE from .funnyordie import FunnyOrDieIE from .fusion import FusionIE @@ -656,6 +652,7 @@ from .nrk import ( NRKSkoleIE, NRKTVIE, NRKTVDirekteIE, + NRKTVEpisodesIE, ) from .ntvde import NTVDeIE from .ntvru import NTVRuIE @@ -813,7 +810,6 @@ from .sbs import SBSIE from .scivee import SciVeeIE from .screencast import ScreencastIE from .screencastomatic import ScreencastOMaticIE -from .screenjunkies import ScreenJunkiesIE from .seeker import SeekerIE from .senateisvp import SenateISVPIE from .sendtonews import SendtoNewsIE @@ -824,7 +820,7 @@ from .shared import ( SharedIE, VivoIE, ) -from .sharesix import ShareSixIE +from .showroomlive import ShowRoomLiveIE from .sina import SinaIE from .sixplay import SixPlayIE from .skynewsarabia import ( @@ -1064,6 +1060,7 @@ from .vice import ( from .viceland import VicelandIE from .vidbit import VidbitIE from .viddler import ViddlerIE +from .videa import VideaIE from .videodetective import VideoDetectiveIE from .videofyme import VideofyMeIE from .videomega import VideoMegaIE @@ -1073,7 +1070,6 @@ from .videomore import ( VideomoreSeasonIE, ) from .videopremium import VideoPremiumIE -from .videott import VideoTtIE from .vidio import VidioIE from 
.vidme import ( VidmeIE, diff --git a/youtube_dl/extractor/fc2.py b/youtube_dl/extractor/fc2.py index c032d4d02..448647d72 100644 --- a/youtube_dl/extractor/fc2.py +++ b/youtube_dl/extractor/fc2.py @@ -133,7 +133,7 @@ class FC2EmbedIE(InfoExtractor): 'id': '201403223kCqB3Ez', 'ext': 'flv', 'title': 'プリズン・ブレイク S1-01 マイケル 【吹替】', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', }, } diff --git a/youtube_dl/extractor/firsttv.py b/youtube_dl/extractor/firsttv.py index 47673e2d4..c6fb67057 100644 --- a/youtube_dl/extractor/firsttv.py +++ b/youtube_dl/extractor/firsttv.py @@ -26,7 +26,7 @@ class FirstTVIE(InfoExtractor): 'id': '40049', 'ext': 'mp4', 'title': 'Гость Людмила Сенчина. Наедине со всеми. Выпуск от 12.02.2015', - 'thumbnail': 're:^https?://.*\.(?:jpg|JPG)$', + 'thumbnail': r're:^https?://.*\.(?:jpg|JPG)$', 'upload_date': '20150212', 'duration': 2694, }, @@ -37,7 +37,7 @@ class FirstTVIE(InfoExtractor): 'id': '364746', 'ext': 'mp4', 'title': 'Весенняя аллергия. Доброе утро. 
Фрагмент выпуска от 07.04.2016', - 'thumbnail': 're:^https?://.*\.(?:jpg|JPG)$', + 'thumbnail': r're:^https?://.*\.(?:jpg|JPG)$', 'upload_date': '20160407', 'duration': 179, 'formats': 'mincount:3', diff --git a/youtube_dl/extractor/fivetv.py b/youtube_dl/extractor/fivetv.py index 13fbc4da2..15736c9fe 100644 --- a/youtube_dl/extractor/fivetv.py +++ b/youtube_dl/extractor/fivetv.py @@ -25,7 +25,7 @@ class FiveTVIE(InfoExtractor): 'ext': 'mp4', 'title': 'Россияне выбрали имя для общенациональной платежной системы', 'description': 'md5:a8aa13e2b7ad36789e9f77a74b6de660', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 180, }, }, { @@ -35,7 +35,7 @@ class FiveTVIE(InfoExtractor): 'ext': 'mp4', 'title': '3D принтер', 'description': 'md5:d76c736d29ef7ec5c0cf7d7c65ffcb41', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 180, }, }, { @@ -44,7 +44,7 @@ class FiveTVIE(InfoExtractor): 'id': 'glavnoe', 'ext': 'mp4', 'title': 'Итоги недели с 8 по 14 июня 2015 года', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', }, }, { 'url': 'http://www.5-tv.ru/glavnoe/broadcasts/508645/', diff --git a/youtube_dl/extractor/fktv.py b/youtube_dl/extractor/fktv.py index a3a291599..2958452f4 100644 --- a/youtube_dl/extractor/fktv.py +++ b/youtube_dl/extractor/fktv.py @@ -19,7 +19,7 @@ class FKTVIE(InfoExtractor): 'id': '1', 'ext': 'mp4', 'title': 'Folge 1 vom 10. 
April 2007', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', }, } diff --git a/youtube_dl/extractor/foxgay.py b/youtube_dl/extractor/foxgay.py index 39174fcec..e887ae488 100644 --- a/youtube_dl/extractor/foxgay.py +++ b/youtube_dl/extractor/foxgay.py @@ -20,7 +20,7 @@ class FoxgayIE(InfoExtractor): 'title': 'Fuck Turkish-style', 'description': 'md5:6ae2d9486921891efe89231ace13ffdf', 'age_limit': 18, - 'thumbnail': 're:https?://.*\.jpg$', + 'thumbnail': r're:https?://.*\.jpg$', }, } diff --git a/youtube_dl/extractor/foxnews.py b/youtube_dl/extractor/foxnews.py index 229bcb175..dc0662f74 100644 --- a/youtube_dl/extractor/foxnews.py +++ b/youtube_dl/extractor/foxnews.py @@ -22,7 +22,7 @@ class FoxNewsIE(AMPIE): 'duration': 265, 'timestamp': 1304411491, 'upload_date': '20110503', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', }, }, { @@ -36,7 +36,7 @@ class FoxNewsIE(AMPIE): 'duration': 292, 'timestamp': 1417662047, 'upload_date': '20141204', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', }, 'params': { # m3u8 download @@ -111,7 +111,7 @@ class FoxNewsInsiderIE(InfoExtractor): 'description': 'Is campus censorship getting out of control?', 'timestamp': 1472168725, 'upload_date': '20160825', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', }, 'params': { # m3u8 download diff --git a/youtube_dl/extractor/franceculture.py b/youtube_dl/extractor/franceculture.py index 56048ffc2..b98da692c 100644 --- a/youtube_dl/extractor/franceculture.py +++ b/youtube_dl/extractor/franceculture.py @@ -17,7 +17,7 @@ class FranceCultureIE(InfoExtractor): 'display_id': 'rendez-vous-au-pays-des-geeks', 'ext': 'mp3', 'title': 'Rendez-vous au pays des geeks', - 'thumbnail': 're:^https?://.*\\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'upload_date': '20140301', 'vcodec': 'none', } diff --git a/youtube_dl/extractor/francetv.py b/youtube_dl/extractor/francetv.py 
index e7068d1ae..48d43ae58 100644 --- a/youtube_dl/extractor/francetv.py +++ b/youtube_dl/extractor/francetv.py @@ -168,7 +168,7 @@ class FranceTvInfoIE(FranceTVBaseInfoExtractor): 'id': 'NI_173343', 'ext': 'mp4', 'title': 'Les entreprises familiales : le secret de la réussite', - 'thumbnail': 're:^https?://.*\.jpe?g$', + 'thumbnail': r're:^https?://.*\.jpe?g$', 'timestamp': 1433273139, 'upload_date': '20150602', }, @@ -184,7 +184,7 @@ class FranceTvInfoIE(FranceTVBaseInfoExtractor): 'ext': 'mp4', 'title': 'Olivier Monthus, réalisateur de "Bretagne, le choix de l’Armor"', 'description': 'md5:a3264114c9d29aeca11ced113c37b16c', - 'thumbnail': 're:^https?://.*\.jpe?g$', + 'thumbnail': r're:^https?://.*\.jpe?g$', 'timestamp': 1458300695, 'upload_date': '20160318', }, diff --git a/youtube_dl/extractor/freevideo.py b/youtube_dl/extractor/freevideo.py deleted file mode 100644 index cd8423a6f..000000000 --- a/youtube_dl/extractor/freevideo.py +++ /dev/null @@ -1,38 +0,0 @@ -from __future__ import unicode_literals - -from .common import InfoExtractor -from ..utils import ExtractorError - - -class FreeVideoIE(InfoExtractor): - _VALID_URL = r'^https?://www.freevideo.cz/vase-videa/(?P<id>[^.]+)\.html(?:$|[?#])' - - _TEST = { - 'url': 'http://www.freevideo.cz/vase-videa/vysukany-zadecek-22033.html', - 'info_dict': { - 'id': 'vysukany-zadecek-22033', - 'ext': 'mp4', - 'title': 'vysukany-zadecek-22033', - 'age_limit': 18, - }, - 'skip': 'Blocked outside .cz', - } - - def _real_extract(self, url): - video_id = self._match_id(url) - webpage, handle = self._download_webpage_handle(url, video_id) - if '//www.czechav.com/' in handle.geturl(): - raise ExtractorError( - 'Access to freevideo is blocked from your location', - expected=True) - - video_url = self._search_regex( - r'\s+url: "(http://[a-z0-9-]+.cdn.freevideo.cz/stream/.*?/video.mp4)"', - webpage, 'video URL') - - return { - 'id': video_id, - 'url': video_url, - 'title': video_id, - 'age_limit': 18, - } diff --git
a/youtube_dl/extractor/funimation.py b/youtube_dl/extractor/funimation.py index 0ad0d9b6a..eba00cd5a 100644 --- a/youtube_dl/extractor/funimation.py +++ b/youtube_dl/extractor/funimation.py @@ -29,7 +29,7 @@ class FunimationIE(InfoExtractor): 'ext': 'mp4', 'title': 'Air - 1 - Breeze', 'description': 'md5:1769f43cd5fc130ace8fd87232207892', - 'thumbnail': 're:https?://.*\.jpg', + 'thumbnail': r're:https?://.*\.jpg', }, 'skip': 'Access without user interaction is forbidden by CloudFlare, and video removed', }, { @@ -40,7 +40,7 @@ class FunimationIE(InfoExtractor): 'ext': 'mp4', 'title': '.hack//SIGN - 1 - Role Play', 'description': 'md5:b602bdc15eef4c9bbb201bb6e6a4a2dd', - 'thumbnail': 're:https?://.*\.jpg', + 'thumbnail': r're:https?://.*\.jpg', }, 'skip': 'Access without user interaction is forbidden by CloudFlare', }, { @@ -51,7 +51,7 @@ class FunimationIE(InfoExtractor): 'ext': 'mp4', 'title': 'Attack on Titan: Junior High - Broadcast Dub Preview', 'description': 'md5:f8ec49c0aff702a7832cd81b8a44f803', - 'thumbnail': 're:https?://.*\.(?:jpg|png)', + 'thumbnail': r're:https?://.*\.(?:jpg|png)', }, 'skip': 'Access without user interaction is forbidden by CloudFlare', }] diff --git a/youtube_dl/extractor/funnyordie.py b/youtube_dl/extractor/funnyordie.py index f2928b5fe..81c0ce9a3 100644 --- a/youtube_dl/extractor/funnyordie.py +++ b/youtube_dl/extractor/funnyordie.py @@ -17,7 +17,7 @@ class FunnyOrDieIE(InfoExtractor): 'ext': 'mp4', 'title': 'Heart-Shaped Box: Literal Video Version', 'description': 'md5:ea09a01bc9a1c46d9ab696c01747c338', - 'thumbnail': 're:^http:.*\.jpg$', + 'thumbnail': r're:^http:.*\.jpg$', }, }, { 'url': 'http://www.funnyordie.com/embed/e402820827', @@ -26,7 +26,7 @@ class FunnyOrDieIE(InfoExtractor): 'ext': 'mp4', 'title': 'Please Use This Song (Jon Lajoie)', 'description': 'Please use this to sell something. 
www.jonlajoie.com', - 'thumbnail': 're:^http:.*\.jpg$', + 'thumbnail': r're:^http:.*\.jpg$', }, 'params': { 'skip_download': True, diff --git a/youtube_dl/extractor/gamersyde.py b/youtube_dl/extractor/gamersyde.py index d545e01bb..a218a6944 100644 --- a/youtube_dl/extractor/gamersyde.py +++ b/youtube_dl/extractor/gamersyde.py @@ -20,7 +20,7 @@ class GamersydeIE(InfoExtractor): 'ext': 'mp4', 'duration': 372, 'title': 'Bloodborne - Birth of a hero', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', } } diff --git a/youtube_dl/extractor/gamespot.py b/youtube_dl/extractor/gamespot.py index 4e859e09a..682c49e79 100644 --- a/youtube_dl/extractor/gamespot.py +++ b/youtube_dl/extractor/gamespot.py @@ -63,7 +63,7 @@ class GameSpotIE(OnceIE): streams, ('progressive_hd', 'progressive_high', 'progressive_low')) if progressive_url and manifest_url: qualities_basename = self._search_regex( - '/([^/]+)\.csmil/', + r'/([^/]+)\.csmil/', manifest_url, 'qualities basename', default=None) if qualities_basename: QUALITIES_RE = r'((,\d+)+,?)' diff --git a/youtube_dl/extractor/gamestar.py b/youtube_dl/extractor/gamestar.py index 55a34604a..e607d6ab8 100644 --- a/youtube_dl/extractor/gamestar.py +++ b/youtube_dl/extractor/gamestar.py @@ -18,7 +18,7 @@ class GameStarIE(InfoExtractor): 'ext': 'mp4', 'title': 'Hobbit 3: Die Schlacht der Fünf Heere - Teaser-Trailer zum dritten Teil', 'description': 'Der Teaser-Trailer zu Hobbit 3: Die Schlacht der Fünf Heere zeigt einige Szenen aus dem dritten Teil der Saga und kündigt den...', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'timestamp': 1406542020, 'upload_date': '20140728', 'duration': 17 diff --git a/youtube_dl/extractor/gazeta.py b/youtube_dl/extractor/gazeta.py index 18ef5c252..57c67a451 100644 --- a/youtube_dl/extractor/gazeta.py +++ b/youtube_dl/extractor/gazeta.py @@ -16,7 +16,7 @@ class GazetaIE(InfoExtractor): 'ext': 'mp4', 'title': '«70–80 процентов гражданских в 
Донецке на грани голода»', 'description': 'md5:38617526050bd17b234728e7f9620a71', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', }, 'skip': 'video not found', }, { diff --git a/youtube_dl/extractor/generic.py b/youtube_dl/extractor/generic.py index 79d10a1d1..86dc79307 100644 --- a/youtube_dl/extractor/generic.py +++ b/youtube_dl/extractor/generic.py @@ -73,9 +73,11 @@ from .kaltura import KalturaIE from .eagleplatform import EaglePlatformIE from .facebook import FacebookIE from .soundcloud import SoundcloudIE +from .tunein import TuneInBaseIE from .vbox7 import Vbox7IE from .dbtv import DBTVIE from .piksel import PikselIE +from .videa import VideaIE class GenericIE(InfoExtractor): @@ -237,7 +239,7 @@ class GenericIE(InfoExtractor): 'ext': 'mp4', 'title': 'Tikibad ontruimd wegens brand', 'description': 'md5:05ca046ff47b931f9b04855015e163a4', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 33, }, 'params': { @@ -298,7 +300,7 @@ class GenericIE(InfoExtractor): 'ext': 'mp4', 'upload_date': '20130224', 'uploader_id': 'TheVerge', - 'description': 're:^Chris Ziegler takes a look at the\.*', + 'description': r're:^Chris Ziegler takes a look at the\.*', 'uploader': 'The Verge', 'title': 'First Firefox OS phones side-by-side', }, @@ -537,7 +539,7 @@ class GenericIE(InfoExtractor): 'id': 'f4dafcad-ff21-423d-89b5-146cfd89fa1e', 'ext': 'mp4', 'title': 'Ужастики, русский трейлер (2015)', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 153, } }, @@ -757,7 +759,7 @@ class GenericIE(InfoExtractor): 'duration': 48, 'timestamp': 1401537900, 'upload_date': '20140531', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', }, }, # Wistia embed @@ -827,6 +829,21 @@ class GenericIE(InfoExtractor): }, 'playlist_mincount': 7, }, + # TuneIn station embed + { + 'url': 'http://radiocnrv.com/promouvoir-radio-cnrv/', + 'info_dict': { + 'id': 
'204146', + 'ext': 'mp3', + 'title': 'CNRV', + 'location': 'Paris, France', + 'is_live': True, + }, + 'params': { + # Live stream + 'skip_download': True, + }, + }, # Livestream embed { 'url': 'http://www.esa.int/Our_Activities/Space_Science/Rosetta/Philae_comet_touch-down_webcast', @@ -1014,7 +1031,7 @@ class GenericIE(InfoExtractor): 'ext': 'mp4', 'title': 'Навальный вышел на свободу', 'description': 'md5:d97861ac9ae77377f3f20eaf9d04b4f5', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 87, 'view_count': int, 'age_limit': 0, @@ -1028,7 +1045,7 @@ class GenericIE(InfoExtractor): 'id': '12820', 'ext': 'mp4', 'title': "'O Sole Mio", - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 216, 'view_count': int, }, @@ -1041,7 +1058,7 @@ class GenericIE(InfoExtractor): 'ext': 'mp4', 'title': 'Тайны перевала Дятлова • 1 серия 2 часть', 'description': 'Документальный сериал-расследование одной из самых жутких тайн ХХ века', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 694, 'age_limit': 0, }, @@ -1053,7 +1070,7 @@ class GenericIE(InfoExtractor): 'id': '3519514', 'ext': 'mp4', 'title': 'Joe Dirt 2 Beautiful Loser Teaser Trailer', - 'thumbnail': 're:^https?://.*\.png$', + 'thumbnail': r're:^https?://.*\.png$', 'duration': 45.115, }, }, @@ -1136,7 +1153,7 @@ class GenericIE(InfoExtractor): 'id': '300346', 'ext': 'mp4', 'title': '中一中男師變性 全校師生力挺', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', }, 'params': { # m3u8 download @@ -1182,7 +1199,7 @@ class GenericIE(InfoExtractor): 'ext': 'mp4', 'title': 'Sauvons les abeilles ! 
- Le débat', 'description': 'md5:d9082128b1c5277987825d684939ca26', - 'thumbnail': 're:^https?://.*\.jpe?g$', + 'thumbnail': r're:^https?://.*\.jpe?g$', 'timestamp': 1434970506, 'upload_date': '20150622', 'uploader': 'Public Sénat', @@ -1196,7 +1213,7 @@ class GenericIE(InfoExtractor): 'id': '2855', 'ext': 'mp4', 'title': 'Don’t Understand Bitcoin? This Man Will Mumble An Explanation At You', - 'thumbnail': 're:^https?://.*\.jpe?g$', + 'thumbnail': r're:^https?://.*\.jpe?g$', 'uploader': 'ClickHole', 'uploader_id': 'clickhole', } @@ -1422,6 +1439,15 @@ class GenericIE(InfoExtractor): }, 'playlist_mincount': 3, }, + { + # Videa embeds + 'url': 'http://forum.dvdtalk.com/movie-talk/623756-deleted-magic-star-wars-ot-deleted-alt-scenes-docu-style.html', + 'info_dict': { + 'id': '623756-deleted-magic-star-wars-ot-deleted-alt-scenes-docu-style', + 'title': 'Deleted Magic - Star Wars: OT Deleted / Alt. Scenes Docu. Style - DVD Talk Forum', + }, + 'playlist_mincount': 2, + }, # { # # TODO: find another test # # http://schema.org/VideoObject @@ -2078,6 +2104,11 @@ class GenericIE(InfoExtractor): if soundcloud_urls: return _playlist_from_matches(soundcloud_urls, getter=unescapeHTML, ie=SoundcloudIE.ie_key()) + # Look for tunein player + tunein_urls = TuneInBaseIE._extract_urls(webpage) + if tunein_urls: + return _playlist_from_matches(tunein_urls) + # Look for embedded mtvservices player mtvservices_url = MTVServicesEmbeddedIE._extract_url(webpage) if mtvservices_url: @@ -2358,6 +2389,11 @@ class GenericIE(InfoExtractor): if dbtv_urls: return _playlist_from_matches(dbtv_urls, ie=DBTVIE.ie_key()) + # Look for Videa embeds + videa_urls = VideaIE._extract_urls(webpage) + if videa_urls: + return _playlist_from_matches(videa_urls, ie=VideaIE.ie_key()) + # Looking for http://schema.org/VideoObject json_ld = self._search_json_ld( webpage, video_id, default={}, expected_type='VideoObject') diff --git a/youtube_dl/extractor/giantbomb.py b/youtube_dl/extractor/giantbomb.py index 
87cd19147..29b684d35 100644 --- a/youtube_dl/extractor/giantbomb.py +++ b/youtube_dl/extractor/giantbomb.py @@ -23,7 +23,7 @@ class GiantBombIE(InfoExtractor): 'title': 'Quick Look: Destiny: The Dark Below', 'description': 'md5:0aa3aaf2772a41b91d44c63f30dfad24', 'duration': 2399, - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', } } diff --git a/youtube_dl/extractor/giga.py b/youtube_dl/extractor/giga.py index 28eb733e2..5a9992a27 100644 --- a/youtube_dl/extractor/giga.py +++ b/youtube_dl/extractor/giga.py @@ -24,7 +24,7 @@ class GigaIE(InfoExtractor): 'ext': 'mp4', 'title': 'Anime Awesome: Chihiros Reise ins Zauberland – Das Beste kommt zum Schluss', 'description': 'md5:afdf5862241aded4718a30dff6a57baf', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 578, 'timestamp': 1414749706, 'upload_date': '20141031', diff --git a/youtube_dl/extractor/glide.py b/youtube_dl/extractor/glide.py index f0d951396..d94dfbf09 100644 --- a/youtube_dl/extractor/glide.py +++ b/youtube_dl/extractor/glide.py @@ -14,7 +14,7 @@ class GlideIE(InfoExtractor): 'id': 'UZF8zlmuQbe4mr+7dCiQ0w==', 'ext': 'mp4', 'title': "Damon's Glide message", - 'thumbnail': 're:^https?://.*?\.cloudfront\.net/.*\.jpg$', + 'thumbnail': r're:^https?://.*?\.cloudfront\.net/.*\.jpg$', } } diff --git a/youtube_dl/extractor/godtube.py b/youtube_dl/extractor/godtube.py index 363dc6608..92efd16b3 100644 --- a/youtube_dl/extractor/godtube.py +++ b/youtube_dl/extractor/godtube.py @@ -23,7 +23,7 @@ class GodTubeIE(InfoExtractor): 'timestamp': 1205712000, 'uploader': 'beverlybmusic', 'upload_date': '20080317', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', }, }, ] diff --git a/youtube_dl/extractor/goshgay.py b/youtube_dl/extractor/goshgay.py index 74e1720ee..377981d3e 100644 --- a/youtube_dl/extractor/goshgay.py +++ b/youtube_dl/extractor/goshgay.py @@ -19,7 +19,7 @@ class GoshgayIE(InfoExtractor): 'id': '299069', 
'ext': 'flv', 'title': 'DIESEL SFW XXX Video', - 'thumbnail': 're:^http://.*\.jpg$', + 'thumbnail': r're:^http://.*\.jpg$', 'duration': 80, 'age_limit': 18, } diff --git a/youtube_dl/extractor/hbo.py b/youtube_dl/extractor/hbo.py index cbf774377..8116ad9bd 100644 --- a/youtube_dl/extractor/hbo.py +++ b/youtube_dl/extractor/hbo.py @@ -120,7 +120,7 @@ class HBOIE(HBOBaseIE): 'id': '1437839', 'ext': 'mp4', 'title': 'Ep. 64 Clip: Encryption', - 'thumbnail': 're:https?://.*\.jpg$', + 'thumbnail': r're:https?://.*\.jpg$', 'duration': 1072, } } @@ -141,7 +141,7 @@ class HBOEpisodeIE(HBOBaseIE): 'display_id': 'ep-52-inside-the-episode', 'ext': 'mp4', 'title': 'Ep. 52: Inside the Episode', - 'thumbnail': 're:https?://.*\.jpg$', + 'thumbnail': r're:https?://.*\.jpg$', 'duration': 240, }, }, { diff --git a/youtube_dl/extractor/hearthisat.py b/youtube_dl/extractor/hearthisat.py index 256453882..18c252012 100644 --- a/youtube_dl/extractor/hearthisat.py +++ b/youtube_dl/extractor/hearthisat.py @@ -25,7 +25,7 @@ class HearThisAtIE(InfoExtractor): 'id': '150939', 'ext': 'wav', 'title': 'Moofi - Dr. Kreep', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'timestamp': 1421564134, 'description': 'Listen to Dr. Kreep by Moofi on hearthis.at - Modular, Eurorack, Mutable Intruments Braids, Valhalla-DSP', 'upload_date': '20150118', @@ -46,7 +46,7 @@ class HearThisAtIE(InfoExtractor): 'description': 'Listen to DJ Jim Hopkins - Totally Bitchin\' 80\'s Dance Mix! 
by TwitchSF on hearthis.at - Dance', 'upload_date': '20160328', 'timestamp': 1459186146, - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'comment_count': int, 'view_count': int, 'like_count': int, diff --git a/youtube_dl/extractor/heise.py b/youtube_dl/extractor/heise.py index 278d9f527..1629cdb8d 100644 --- a/youtube_dl/extractor/heise.py +++ b/youtube_dl/extractor/heise.py @@ -29,7 +29,7 @@ class HeiseIE(InfoExtractor): 'timestamp': 1411812600, 'upload_date': '20140927', 'description': 'In uplink-Episode 3.3 geht es darum, wie man sich von Cloud-Anbietern emanzipieren kann, worauf man beim Kauf einer Tastatur achten sollte und was Smartphones über uns verraten.', - 'thumbnail': 're:^https?://.*\.jpe?g$', + 'thumbnail': r're:^https?://.*\.jpe?g$', } } diff --git a/youtube_dl/extractor/hellporno.py b/youtube_dl/extractor/hellporno.py index 10da14067..0ee8ea712 100644 --- a/youtube_dl/extractor/hellporno.py +++ b/youtube_dl/extractor/hellporno.py @@ -20,7 +20,7 @@ class HellPornoIE(InfoExtractor): 'display_id': 'dixie-is-posing-with-naked-ass-very-erotic', 'ext': 'mp4', 'title': 'Dixie is posing with naked ass very erotic', - 'thumbnail': 're:https?://.*\.jpg$', + 'thumbnail': r're:https?://.*\.jpg$', 'age_limit': 18, } }, { diff --git a/youtube_dl/extractor/historicfilms.py b/youtube_dl/extractor/historicfilms.py index 6a36933ac..56343e98f 100644 --- a/youtube_dl/extractor/historicfilms.py +++ b/youtube_dl/extractor/historicfilms.py @@ -14,7 +14,7 @@ class HistoricFilmsIE(InfoExtractor): 'ext': 'mov', 'title': 'Historic Films: GP-7', 'description': 'md5:1a86a0f3ac54024e419aba97210d959a', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 2096, }, } diff --git a/youtube_dl/extractor/hitbox.py b/youtube_dl/extractor/hitbox.py index ff797438d..e21ebb8fb 100644 --- a/youtube_dl/extractor/hitbox.py +++ b/youtube_dl/extractor/hitbox.py @@ -25,7 +25,7 @@ class HitboxIE(InfoExtractor): 'alt_title': 
'hitboxlive - Aug 9th #6', 'description': '', 'ext': 'mp4', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 215.1666, 'resolution': 'HD 720p', 'uploader': 'hitboxlive', @@ -163,7 +163,7 @@ class HitboxLiveIE(HitboxIE): if cdn.get('rtmpSubscribe') is True: continue base_url = cdn.get('netConnectionUrl') - host = re.search('.+\.([^\.]+\.[^\./]+)/.+', base_url).group(1) + host = re.search(r'.+\.([^\.]+\.[^\./]+)/.+', base_url).group(1) if base_url not in servers: servers.append(base_url) for stream in cdn.get('bitrates'): diff --git a/youtube_dl/extractor/hornbunny.py b/youtube_dl/extractor/hornbunny.py index 0615f06af..c458a959d 100644 --- a/youtube_dl/extractor/hornbunny.py +++ b/youtube_dl/extractor/hornbunny.py @@ -20,7 +20,7 @@ class HornBunnyIE(InfoExtractor): 'duration': 550, 'age_limit': 18, 'view_count': int, - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', } } diff --git a/youtube_dl/extractor/howstuffworks.py b/youtube_dl/extractor/howstuffworks.py index 65ba2a48b..2be68abad 100644 --- a/youtube_dl/extractor/howstuffworks.py +++ b/youtube_dl/extractor/howstuffworks.py @@ -21,7 +21,7 @@ class HowStuffWorksIE(InfoExtractor): 'title': 'Cool Jobs - Iditarod Musher', 'description': 'Cold sleds, freezing temps and warm dog breath... an Iditarod musher\'s dream. Kasey-Dee Gardner jumps on a sled to find out what the big deal is.', 'display_id': 'cool-jobs-iditarod-musher', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 161, }, 'skip': 'Video broken', @@ -34,7 +34,7 @@ class HowStuffWorksIE(InfoExtractor): 'title': 'Survival Zone: Food and Water In the Savanna', 'description': 'Learn how to find both food and water while trekking in the African savannah. 
In this video from the Discovery Channel.', 'display_id': 'survival-zone-food-and-water-in-the-savanna', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', }, }, { @@ -45,7 +45,7 @@ class HowStuffWorksIE(InfoExtractor): 'title': 'Sword Swallowing #1 by Dan Meyer', 'description': 'Video footage (1 of 3) used by permission of the owner Dan Meyer through Sword Swallowers Association International ', 'display_id': 'sword-swallowing-1-by-dan-meyer', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', }, }, { diff --git a/youtube_dl/extractor/huajiao.py b/youtube_dl/extractor/huajiao.py index cec0df09a..4ca275dda 100644 --- a/youtube_dl/extractor/huajiao.py +++ b/youtube_dl/extractor/huajiao.py @@ -20,7 +20,7 @@ class HuajiaoIE(InfoExtractor): 'title': '#新人求关注#', 'description': 're:.*', 'duration': 2424.0, - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'timestamp': 1475866459, 'upload_date': '20161007', 'uploader': 'Penny_余姿昀', diff --git a/youtube_dl/extractor/huffpost.py b/youtube_dl/extractor/huffpost.py index 059073749..97e36f056 100644 --- a/youtube_dl/extractor/huffpost.py +++ b/youtube_dl/extractor/huffpost.py @@ -52,7 +52,7 @@ class HuffPostIE(InfoExtractor): thumbnails = [] for url in filter(None, data['images'].values()): - m = re.match('.*-([0-9]+x[0-9]+)\.', url) + m = re.match(r'.*-([0-9]+x[0-9]+)\.', url) if not m: continue thumbnails.append({ diff --git a/youtube_dl/extractor/indavideo.py b/youtube_dl/extractor/indavideo.py index c6f080484..11cf3c609 100644 --- a/youtube_dl/extractor/indavideo.py +++ b/youtube_dl/extractor/indavideo.py @@ -19,7 +19,7 @@ class IndavideoEmbedIE(InfoExtractor): 'ext': 'mp4', 'title': 'Cicatánc', 'description': '', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'uploader': 'cukiajanlo', 'uploader_id': '83729', 'timestamp': 1439193826, @@ -102,7 +102,7 @@ class IndavideoIE(InfoExtractor): 'ext': 'mp4', 
'title': 'Vicces cica', 'description': 'Játszik a tablettel. :D', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'uploader': 'Jet_Pack', 'uploader_id': '491217', 'timestamp': 1390821212, diff --git a/youtube_dl/extractor/instagram.py b/youtube_dl/extractor/instagram.py index 196407b06..98f408c18 100644 --- a/youtube_dl/extractor/instagram.py +++ b/youtube_dl/extractor/instagram.py @@ -22,7 +22,7 @@ class InstagramIE(InfoExtractor): 'ext': 'mp4', 'title': 'Video by naomipq', 'description': 'md5:1f17f0ab29bd6fe2bfad705f58de3cb8', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'timestamp': 1371748545, 'upload_date': '20130620', 'uploader_id': 'naomipq', @@ -38,7 +38,7 @@ class InstagramIE(InfoExtractor): 'id': 'BA-pQFBG8HZ', 'ext': 'mp4', 'title': 'Video by britneyspears', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'timestamp': 1453760977, 'upload_date': '20160125', 'uploader_id': 'britneyspears', @@ -169,7 +169,7 @@ class InstagramUserIE(InfoExtractor): 'id': '614605558512799803_462752227', 'ext': 'mp4', 'title': '#Porsche Intelligent Performance.', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'uploader': 'Porsche', 'uploader_id': 'porsche', 'timestamp': 1387486713, diff --git a/youtube_dl/extractor/ir90tv.py b/youtube_dl/extractor/ir90tv.py index 214bcd5b5..d5a3f6fa5 100644 --- a/youtube_dl/extractor/ir90tv.py +++ b/youtube_dl/extractor/ir90tv.py @@ -14,7 +14,7 @@ class Ir90TvIE(InfoExtractor): 'id': '95719', 'ext': 'mp4', 'title': 'شایعات نقل و انتقالات مهم فوتبال اروپا 94/02/18', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', } }, { 'url': 'http://www.90tv.ir/video/95719/%D8%B4%D8%A7%DB%8C%D8%B9%D8%A7%D8%AA-%D9%86%D9%82%D9%84-%D9%88-%D8%A7%D9%86%D8%AA%D9%82%D8%A7%D9%84%D8%A7%D8%AA-%D9%85%D9%87%D9%85-%D9%81%D9%88%D8%AA%D8%A8%D8%A7%D9%84-%D8%A7%D8%B1%D9%88%D9%BE%D8%A7-940218', diff --git 
a/youtube_dl/extractor/ivi.py b/youtube_dl/extractor/ivi.py index 7c8cb21c2..3d3c15024 100644 --- a/youtube_dl/extractor/ivi.py +++ b/youtube_dl/extractor/ivi.py @@ -28,7 +28,7 @@ class IviIE(InfoExtractor): 'title': 'Иван Васильевич меняет профессию', 'description': 'md5:b924063ea1677c8fe343d8a72ac2195f', 'duration': 5498, - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', }, 'skip': 'Only works from Russia', }, @@ -46,7 +46,7 @@ class IviIE(InfoExtractor): 'episode': 'Дело Гольдберга (1 часть)', 'episode_number': 1, 'duration': 2655, - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', }, 'skip': 'Only works from Russia', }, @@ -60,7 +60,7 @@ class IviIE(InfoExtractor): 'title': 'Кукла', 'description': 'md5:ffca9372399976a2d260a407cc74cce6', 'duration': 5599, - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', }, 'skip': 'Only works from Russia', } diff --git a/youtube_dl/extractor/izlesene.py b/youtube_dl/extractor/izlesene.py index aa0728abc..b1d72177d 100644 --- a/youtube_dl/extractor/izlesene.py +++ b/youtube_dl/extractor/izlesene.py @@ -29,7 +29,7 @@ class IzleseneIE(InfoExtractor): 'ext': 'mp4', 'title': 'Sevinçten Çıldırtan Doğum Günü Hediyesi', 'description': 'md5:253753e2655dde93f59f74b572454f6d', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'uploader_id': 'pelikzzle', 'timestamp': int, 'upload_date': '20140702', @@ -44,7 +44,7 @@ class IzleseneIE(InfoExtractor): 'id': '17997', 'ext': 'mp4', 'title': 'Tarkan Dortmund 2006 Konseri', - 'thumbnail': 're:^https://.*\.jpg', + 'thumbnail': r're:^https://.*\.jpg', 'uploader_id': 'parlayankiz', 'timestamp': int, 'upload_date': '20061112', diff --git a/youtube_dl/extractor/jamendo.py b/youtube_dl/extractor/jamendo.py index ee9acac09..51d19e67d 100644 --- a/youtube_dl/extractor/jamendo.py +++ b/youtube_dl/extractor/jamendo.py @@ -17,7 +17,7 @@ class JamendoIE(InfoExtractor): 'display_id': 
'stories-from-emona-i', 'ext': 'flac', 'title': 'Stories from Emona I', - 'thumbnail': 're:^https?://.*\.jpg' + 'thumbnail': r're:^https?://.*\.jpg' } } diff --git a/youtube_dl/extractor/jove.py b/youtube_dl/extractor/jove.py index cf73cd753..f9a034b78 100644 --- a/youtube_dl/extractor/jove.py +++ b/youtube_dl/extractor/jove.py @@ -21,7 +21,7 @@ class JoveIE(InfoExtractor): 'ext': 'mp4', 'title': 'Electrode Positioning and Montage in Transcranial Direct Current Stimulation', 'description': 'md5:015dd4509649c0908bc27f049e0262c6', - 'thumbnail': 're:^https?://.*\.png$', + 'thumbnail': r're:^https?://.*\.png$', 'upload_date': '20110523', } }, @@ -33,7 +33,7 @@ class JoveIE(InfoExtractor): 'ext': 'mp4', 'title': 'Culturing Caenorhabditis elegans in Axenic Liquid Media and Creation of Transgenic Worms by Microparticle Bombardment', 'description': 'md5:35ff029261900583970c4023b70f1dc9', - 'thumbnail': 're:^https?://.*\.png$', + 'thumbnail': r're:^https?://.*\.png$', 'upload_date': '20140802', } }, diff --git a/youtube_dl/extractor/karrierevideos.py b/youtube_dl/extractor/karrierevideos.py index c05263e61..4e9eb67bf 100644 --- a/youtube_dl/extractor/karrierevideos.py +++ b/youtube_dl/extractor/karrierevideos.py @@ -20,7 +20,7 @@ class KarriereVideosIE(InfoExtractor): 'ext': 'flv', 'title': 'AltenpflegerIn', 'description': 'md5:dbadd1259fde2159a9b28667cb664ae2', - 'thumbnail': 're:^http://.*\.png', + 'thumbnail': r're:^http://.*\.png', }, 'params': { # rtmp download @@ -34,7 +34,7 @@ class KarriereVideosIE(InfoExtractor): 'ext': 'flv', 'title': 'Väterkarenz und neue Chancen für Mütter - "Baby - was nun?"', 'description': 'md5:97092c6ad1fd7d38e9d6a5fdeb2bcc33', - 'thumbnail': 're:^http://.*\.png', + 'thumbnail': r're:^http://.*\.png', }, 'params': { # rtmp download diff --git a/youtube_dl/extractor/keezmovies.py b/youtube_dl/extractor/keezmovies.py index 588a4d0ec..e83115e2a 100644 --- a/youtube_dl/extractor/keezmovies.py +++ b/youtube_dl/extractor/keezmovies.py @@ -27,7 
+27,7 @@ class KeezMoviesIE(InfoExtractor): 'display_id': 'petite-asian-lady-mai-playing-in-bathtub', 'ext': 'mp4', 'title': 'Petite Asian Lady Mai Playing In Bathtub', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'view_count': int, 'age_limit': 18, } diff --git a/youtube_dl/extractor/ketnet.py b/youtube_dl/extractor/ketnet.py index eb0a16008..fb9c2dbd4 100644 --- a/youtube_dl/extractor/ketnet.py +++ b/youtube_dl/extractor/ketnet.py @@ -13,7 +13,7 @@ class KetnetIE(InfoExtractor): 'ext': 'mp4', 'title': 'Gluur mee op de filmset en op Pennenzakkenrock', 'description': 'Gluur mee met Ghost Rockers op de filmset', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', } }, { 'url': 'https://www.ketnet.be/kijken/karrewiet/uitzending-8-september-2016', diff --git a/youtube_dl/extractor/krasview.py b/youtube_dl/extractor/krasview.py index cf8876fa1..d27d052ff 100644 --- a/youtube_dl/extractor/krasview.py +++ b/youtube_dl/extractor/krasview.py @@ -23,7 +23,7 @@ class KrasViewIE(InfoExtractor): 'title': 'Снег, лёд, заносы', 'description': 'Снято в городе Нягань, в Ханты-Мансийском автономном округе.', 'duration': 27, - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', }, 'params': { 'skip_download': 'Not accessible from Travis CI server', diff --git a/youtube_dl/extractor/kusi.py b/youtube_dl/extractor/kusi.py index 2e66e8cf9..6a7e3baa7 100644 --- a/youtube_dl/extractor/kusi.py +++ b/youtube_dl/extractor/kusi.py @@ -27,7 +27,7 @@ class KUSIIE(InfoExtractor): 'duration': 223.586, 'upload_date': '20160826', 'timestamp': 1472233118, - 'thumbnail': 're:^https?://.*\.jpg$' + 'thumbnail': r're:^https?://.*\.jpg$' }, }, { 'url': 'http://kusi.com/video?clipId=12203019', diff --git a/youtube_dl/extractor/leeco.py b/youtube_dl/extractor/leeco.py index c48a5aad1..4321f90c8 100644 --- a/youtube_dl/extractor/leeco.py +++ b/youtube_dl/extractor/leeco.py @@ -386,8 +386,8 @@ class 
LetvCloudIE(InfoExtractor): return formats def _real_extract(self, url): - uu_mobj = re.search('uu=([\w]+)', url) - vu_mobj = re.search('vu=([\w]+)', url) + uu_mobj = re.search(r'uu=([\w]+)', url) + vu_mobj = re.search(r'vu=([\w]+)', url) if not uu_mobj or not vu_mobj: raise ExtractorError('Invalid URL: %s' % url, expected=True) diff --git a/youtube_dl/extractor/lemonde.py b/youtube_dl/extractor/lemonde.py index be66fff03..42568f315 100644 --- a/youtube_dl/extractor/lemonde.py +++ b/youtube_dl/extractor/lemonde.py @@ -12,7 +12,7 @@ class LemondeIE(InfoExtractor): 'id': 'lqm3kl', 'ext': 'mp4', 'title': "Comprendre l'affaire Bygmalion en 5 minutes", - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'duration': 320, 'upload_date': '20160119', 'timestamp': 1453194778, diff --git a/youtube_dl/extractor/libraryofcongress.py b/youtube_dl/extractor/libraryofcongress.py index 0a94366fd..40295a30b 100644 --- a/youtube_dl/extractor/libraryofcongress.py +++ b/youtube_dl/extractor/libraryofcongress.py @@ -25,7 +25,7 @@ class LibraryOfCongressIE(InfoExtractor): 'id': '90716351', 'ext': 'mp4', 'title': "Pa's trip to Mars", - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 0, 'view_count': int, }, diff --git a/youtube_dl/extractor/libsyn.py b/youtube_dl/extractor/libsyn.py index d375695f5..4750b03a3 100644 --- a/youtube_dl/extractor/libsyn.py +++ b/youtube_dl/extractor/libsyn.py @@ -41,7 +41,7 @@ class LibsynIE(InfoExtractor): formats = [{ 'url': media_url, - } for media_url in set(re.findall('var\s+mediaURL(?:Libsyn)?\s*=\s*"([^"]+)"', webpage))] + } for media_url in set(re.findall(r'var\s+mediaURL(?:Libsyn)?\s*=\s*"([^"]+)"', webpage))] podcast_title = self._search_regex( r'

<h2>([^<]+)</h2>
', webpage, 'podcast title', default=None) diff --git a/youtube_dl/extractor/lifenews.py b/youtube_dl/extractor/lifenews.py index afce2010e..42e263bfa 100644 --- a/youtube_dl/extractor/lifenews.py +++ b/youtube_dl/extractor/lifenews.py @@ -176,7 +176,7 @@ class LifeEmbedIE(InfoExtractor): 'id': 'e50c2dec2867350528e2574c899b8291', 'ext': 'mp4', 'title': 'e50c2dec2867350528e2574c899b8291', - 'thumbnail': 're:http://.*\.jpg', + 'thumbnail': r're:http://.*\.jpg', } }, { # with 1080p diff --git a/youtube_dl/extractor/limelight.py b/youtube_dl/extractor/limelight.py index b7bfa7a6d..905a0e85f 100644 --- a/youtube_dl/extractor/limelight.py +++ b/youtube_dl/extractor/limelight.py @@ -164,7 +164,7 @@ class LimelightMediaIE(LimelightBaseIE): 'ext': 'mp4', 'title': 'HaP and the HB Prince Trailer', 'description': 'md5:8005b944181778e313d95c1237ddb640', - 'thumbnail': 're:^https?://.*\.jpeg$', + 'thumbnail': r're:^https?://.*\.jpeg$', 'duration': 144.23, 'timestamp': 1244136834, 'upload_date': '20090604', @@ -181,7 +181,7 @@ class LimelightMediaIE(LimelightBaseIE): 'id': 'a3e00274d4564ec4a9b29b9466432335', 'ext': 'mp4', 'title': '3Play Media Overview Video', - 'thumbnail': 're:^https?://.*\.jpeg$', + 'thumbnail': r're:^https?://.*\.jpeg$', 'duration': 78.101, 'timestamp': 1338929955, 'upload_date': '20120605', diff --git a/youtube_dl/extractor/litv.py b/youtube_dl/extractor/litv.py index ded717cf2..337b1b15c 100644 --- a/youtube_dl/extractor/litv.py +++ b/youtube_dl/extractor/litv.py @@ -31,7 +31,7 @@ class LiTVIE(InfoExtractor): 'id': 'VOD00041610', 'ext': 'mp4', 'title': '花千骨第1集', - 'thumbnail': 're:https?://.*\.jpg$', + 'thumbnail': r're:https?://.*\.jpg$', 'description': 'md5:c7017aa144c87467c4fb2909c4b05d6f', 'episode_number': 1, }, @@ -80,7 +80,7 @@ class LiTVIE(InfoExtractor): webpage = self._download_webpage(url, video_id) program_info = self._parse_json(self._search_regex( - 'var\s+programInfo\s*=\s*([^;]+)', webpage, 'VOD data', default='{}'), + 
r'var\s+programInfo\s*=\s*([^;]+)', webpage, 'VOD data', default='{}'), video_id) season_list = list(program_info.get('seasonList', {}).values()) diff --git a/youtube_dl/extractor/liveleak.py b/youtube_dl/extractor/liveleak.py index b84e4dd6c..c7de65353 100644 --- a/youtube_dl/extractor/liveleak.py +++ b/youtube_dl/extractor/liveleak.py @@ -18,7 +18,7 @@ class LiveLeakIE(InfoExtractor): 'description': 'extremely bad day for this guy..!', 'uploader': 'ljfriel2', 'title': 'Most unlucky car accident', - 'thumbnail': 're:^https?://.*\.jpg$' + 'thumbnail': r're:^https?://.*\.jpg$' } }, { 'url': 'http://www.liveleak.com/view?i=f93_1390833151', @@ -29,7 +29,7 @@ class LiveLeakIE(InfoExtractor): 'description': 'German Television Channel NDR does an exclusive interview with Edward Snowden.\r\nUploaded on LiveLeak cause German Television thinks the rest of the world isn\'t intereseted in Edward Snowden.', 'uploader': 'ARD_Stinkt', 'title': 'German Television does first Edward Snowden Interview (ENGLISH)', - 'thumbnail': 're:^https?://.*\.jpg$' + 'thumbnail': r're:^https?://.*\.jpg$' } }, { 'url': 'http://www.liveleak.com/view?i=4f7_1392687779', @@ -52,7 +52,7 @@ class LiveLeakIE(InfoExtractor): 'description': 'Happened on 27.7.2014. 
\r\nAt 0:53 you can see people still swimming at near beach.', 'uploader': 'bony333', 'title': 'Crazy Hungarian tourist films close call waterspout in Croatia', - 'thumbnail': 're:^https?://.*\.jpg$' + 'thumbnail': r're:^https?://.*\.jpg$' } }, { # Covers https://github.com/rg3/youtube-dl/pull/10664#issuecomment-247439521 diff --git a/youtube_dl/extractor/livestream.py b/youtube_dl/extractor/livestream.py index bc7894bf1..c863413bf 100644 --- a/youtube_dl/extractor/livestream.py +++ b/youtube_dl/extractor/livestream.py @@ -37,7 +37,7 @@ class LivestreamIE(InfoExtractor): 'duration': 5968.0, 'like_count': int, 'view_count': int, - 'thumbnail': 're:^http://.*\.jpg$' + 'thumbnail': r're:^http://.*\.jpg$' } }, { 'url': 'http://new.livestream.com/tedx/cityenglish', diff --git a/youtube_dl/extractor/lnkgo.py b/youtube_dl/extractor/lnkgo.py index fd23b0b43..068378c9c 100644 --- a/youtube_dl/extractor/lnkgo.py +++ b/youtube_dl/extractor/lnkgo.py @@ -22,7 +22,7 @@ class LnkGoIE(InfoExtractor): 'description': 'md5:d82a5e36b775b7048617f263a0e3475e', 'age_limit': 7, 'duration': 3019, - 'thumbnail': 're:^https?://.*\.jpg$' + 'thumbnail': r're:^https?://.*\.jpg$' }, 'params': { 'skip_download': True, # HLS download @@ -37,7 +37,7 @@ class LnkGoIE(InfoExtractor): 'description': 'md5:7352d113a242a808676ff17e69db6a69', 'age_limit': 18, 'duration': 346, - 'thumbnail': 're:^https?://.*\.jpg$' + 'thumbnail': r're:^https?://.*\.jpg$' }, 'params': { 'skip_download': True, # HLS download diff --git a/youtube_dl/extractor/lynda.py b/youtube_dl/extractor/lynda.py index f4dcfd93f..da94eab56 100644 --- a/youtube_dl/extractor/lynda.py +++ b/youtube_dl/extractor/lynda.py @@ -73,7 +73,7 @@ class LyndaBaseIE(InfoExtractor): # Already logged in if any(re.search(p, signin_page) for p in ( - 'isLoggedIn\s*:\s*true', r'logout\.aspx', r'>Log out<')): + r'isLoggedIn\s*:\s*true', r'logout\.aspx', r'>Log out<')): return # Step 2: submit email diff --git a/youtube_dl/extractor/matchtv.py 
b/youtube_dl/extractor/matchtv.py index 33b0b539f..bc9933a81 100644 --- a/youtube_dl/extractor/matchtv.py +++ b/youtube_dl/extractor/matchtv.py @@ -14,7 +14,7 @@ class MatchTVIE(InfoExtractor): 'info_dict': { 'id': 'matchtv-live', 'ext': 'flv', - 'title': 're:^Матч ТВ - Прямой эфир \d{4}-\d{2}-\d{2} \d{2}:\d{2}$', + 'title': r're:^Матч ТВ - Прямой эфир \d{4}-\d{2}-\d{2} \d{2}:\d{2}$', 'is_live': True, }, 'params': { diff --git a/youtube_dl/extractor/mdr.py b/youtube_dl/extractor/mdr.py index 2100583df..6e4290aad 100644 --- a/youtube_dl/extractor/mdr.py +++ b/youtube_dl/extractor/mdr.py @@ -72,7 +72,7 @@ class MDRIE(InfoExtractor): data_url = self._search_regex( r'(?:dataURL|playerXml(?:["\'])?)\s*:\s*(["\'])(?P<url>.+/(?:video|audio)-?[0-9]+-avCustom\.xml)\1', - webpage, 'data url', group='url').replace('\/', '/') + webpage, 'data url', group='url').replace(r'\/', '/') doc = self._download_xml( compat_urlparse.urljoin(url, data_url), video_id) diff --git a/youtube_dl/extractor/meipai.py b/youtube_dl/extractor/meipai.py index 35914fd4b..c8eacb4f4 100644 --- a/youtube_dl/extractor/meipai.py +++ b/youtube_dl/extractor/meipai.py @@ -21,7 +21,7 @@ class MeipaiIE(InfoExtractor): 'ext': 'mp4', 'title': '#葉子##阿桑##余姿昀##超級女聲#', 'description': '#葉子##阿桑##余姿昀##超級女聲#', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 152, 'timestamp': 1465492420, 'upload_date': '20160609', @@ -38,7 +38,7 @@ class MeipaiIE(InfoExtractor): 'ext': 'mp4', 'title': '姿昀和善願 練歌練琴啦😁😁😁', 'description': '姿昀和善願 練歌練琴啦😁😁😁', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 5975, 'timestamp': 1474311799, 'upload_date': '20160919', diff --git a/youtube_dl/extractor/melonvod.py b/youtube_dl/extractor/melonvod.py index 2c80b3ba8..bd8cf13ab 100644 --- a/youtube_dl/extractor/melonvod.py +++ b/youtube_dl/extractor/melonvod.py @@ -16,7 +16,7 @@ class MelonVODIE(InfoExtractor): 'id': '50158734', 'ext': 'mp4', 'title': "Jessica 
'Wonderland' MV Making Film", - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'artist': 'Jessica (제시카)', 'upload_date': '20161212', 'duration': 203, diff --git a/youtube_dl/extractor/metacafe.py b/youtube_dl/extractor/metacafe.py index e6e7659a1..9880924e6 100644 --- a/youtube_dl/extractor/metacafe.py +++ b/youtube_dl/extractor/metacafe.py @@ -133,7 +133,7 @@ class MetacafeIE(InfoExtractor): video_id, display_id = re.match(self._VALID_URL, url).groups() # the video may come from an external site - m_external = re.match('^(\w{2})-(.*)$', video_id) + m_external = re.match(r'^(\w{2})-(.*)$', video_id) if m_external is not None: prefix, ext_id = m_external.groups() # Check if video comes from YouTube diff --git a/youtube_dl/extractor/mgoon.py b/youtube_dl/extractor/mgoon.py index 94bc87b00..7bb473900 100644 --- a/youtube_dl/extractor/mgoon.py +++ b/youtube_dl/extractor/mgoon.py @@ -27,7 +27,7 @@ class MgoonIE(InfoExtractor): 'upload_date': '20131220', 'ext': 'mp4', 'title': 'md5:543aa4c27a4931d371c3f433e8cebebc', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', } }, { diff --git a/youtube_dl/extractor/mgtv.py b/youtube_dl/extractor/mgtv.py index e0bb5d208..659ede8c2 100644 --- a/youtube_dl/extractor/mgtv.py +++ b/youtube_dl/extractor/mgtv.py @@ -18,7 +18,7 @@ class MGTVIE(InfoExtractor): 'title': '我是歌手第四季双年巅峰会:韩红李玟“双王”领军对抗', 'description': '我是歌手第四季双年巅峰会', 'duration': 7461, - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', }, }, { # no tbr extracted from stream_url diff --git a/youtube_dl/extractor/minhateca.py b/youtube_dl/extractor/minhateca.py index e6730b75a..dccc54249 100644 --- a/youtube_dl/extractor/minhateca.py +++ b/youtube_dl/extractor/minhateca.py @@ -19,7 +19,7 @@ class MinhatecaIE(InfoExtractor): 'id': '125848331', 'ext': 'mp4', 'title': 'youtube-dl test video', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'filesize_approx': 
1530000, 'duration': 9, 'view_count': int, diff --git a/youtube_dl/extractor/ministrygrid.py b/youtube_dl/extractor/ministrygrid.py index 10190d5f6..8ad9239c5 100644 --- a/youtube_dl/extractor/ministrygrid.py +++ b/youtube_dl/extractor/ministrygrid.py @@ -17,7 +17,7 @@ class MinistryGridIE(InfoExtractor): 'id': '3453494717001', 'ext': 'mp4', 'title': 'The Gospel by Numbers', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'upload_date': '20140410', 'description': 'Coming soon from T4G 2014!', 'uploader_id': '2034960640001', diff --git a/youtube_dl/extractor/mitele.py b/youtube_dl/extractor/mitele.py index f577836be..8984d3b8d 100644 --- a/youtube_dl/extractor/mitele.py +++ b/youtube_dl/extractor/mitele.py @@ -90,7 +90,7 @@ class MiTeleIE(InfoExtractor): 'season_id': 'diario_de_t14_11981', 'episode': 'Programa 144', 'episode_number': 3, - 'thumbnail': 're:(?i)^https?://.*\.jpg$', + 'thumbnail': r're:(?i)^https?://.*\.jpg$', 'duration': 2913, }, 'add_ie': ['Ooyala'], @@ -108,7 +108,7 @@ class MiTeleIE(InfoExtractor): 'season_id': 'cuarto_milenio_t06_12715', 'episode': 'Programa 226', 'episode_number': 24, - 'thumbnail': 're:(?i)^https?://.*\.jpg$', + 'thumbnail': r're:(?i)^https?://.*\.jpg$', 'duration': 7313, }, 'params': { diff --git a/youtube_dl/extractor/mixcloud.py b/youtube_dl/extractor/mixcloud.py index 202c05dcb..4ba2310fd 100644 --- a/youtube_dl/extractor/mixcloud.py +++ b/youtube_dl/extractor/mixcloud.py @@ -34,7 +34,7 @@ class MixcloudIE(InfoExtractor): 'description': 'After quite a long silence from myself, finally another Drum\'n\'Bass mix with my favourite current dance floor bangers.', 'uploader': 'Daniel Holbach', 'uploader_id': 'dholbach', - 'thumbnail': 're:https?://.*\.jpg', + 'thumbnail': r're:https?://.*\.jpg', 'view_count': int, 'like_count': int, }, diff --git a/youtube_dl/extractor/mlb.py b/youtube_dl/extractor/mlb.py index e242b897f..59cd4b838 100644 --- a/youtube_dl/extractor/mlb.py +++ 
b/youtube_dl/extractor/mlb.py @@ -37,7 +37,7 @@ class MLBIE(InfoExtractor): 'duration': 66, 'timestamp': 1405980600, 'upload_date': '20140721', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', }, }, { @@ -51,7 +51,7 @@ class MLBIE(InfoExtractor): 'duration': 46, 'timestamp': 1405105800, 'upload_date': '20140711', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', }, }, { @@ -65,7 +65,7 @@ class MLBIE(InfoExtractor): 'duration': 488, 'timestamp': 1405399936, 'upload_date': '20140715', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', }, }, { @@ -79,7 +79,7 @@ class MLBIE(InfoExtractor): 'duration': 52, 'timestamp': 1405390722, 'upload_date': '20140715', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', }, }, { diff --git a/youtube_dl/extractor/mnet.py b/youtube_dl/extractor/mnet.py index e3f42e7bd..6a85dcbd5 100644 --- a/youtube_dl/extractor/mnet.py +++ b/youtube_dl/extractor/mnet.py @@ -22,7 +22,7 @@ class MnetIE(InfoExtractor): 'timestamp': 1451564040, 'age_limit': 0, 'thumbnails': 'mincount:5', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'ext': 'flv', }, 'params': { diff --git a/youtube_dl/extractor/moevideo.py b/youtube_dl/extractor/moevideo.py index 91ee9c4e9..44bcc4982 100644 --- a/youtube_dl/extractor/moevideo.py +++ b/youtube_dl/extractor/moevideo.py @@ -30,7 +30,7 @@ class MoeVideoIE(InfoExtractor): 'ext': 'flv', 'title': 'Sink cut out machine', 'description': 'md5:f29ff97b663aefa760bf7ca63c8ca8a8', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'width': 540, 'height': 360, 'duration': 179, @@ -46,7 +46,7 @@ class MoeVideoIE(InfoExtractor): 'ext': 'flv', 'title': 'Operacion Condor.', 'description': 'md5:7e68cb2fcda66833d5081c542491a9a3', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'width': 480, 'height': 296, 'duration': 6027, 
diff --git a/youtube_dl/extractor/mofosex.py b/youtube_dl/extractor/mofosex.py index e3bbe5aa8..54716f5c7 100644 --- a/youtube_dl/extractor/mofosex.py +++ b/youtube_dl/extractor/mofosex.py @@ -18,7 +18,7 @@ class MofosexIE(KeezMoviesIE): 'display_id': 'amateur-teen-playing-and-masturbating-318131', 'ext': 'mp4', 'title': 'amateur teen playing and masturbating', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'upload_date': '20121114', 'view_count': int, 'like_count': int, diff --git a/youtube_dl/extractor/mojvideo.py b/youtube_dl/extractor/mojvideo.py index 0ba435dc5..165e658c9 100644 --- a/youtube_dl/extractor/mojvideo.py +++ b/youtube_dl/extractor/mojvideo.py @@ -20,7 +20,7 @@ class MojvideoIE(InfoExtractor): 'display_id': 'v-avtu-pred-mano-rdecelaska-alfi-nipic', 'ext': 'mp4', 'title': 'V avtu pred mano rdečelaska - Alfi Nipič', - 'thumbnail': 're:^http://.*\.jpg$', + 'thumbnail': r're:^http://.*\.jpg$', 'duration': 242, } } diff --git a/youtube_dl/extractor/motherless.py b/youtube_dl/extractor/motherless.py index 5e1a8a71a..6fe3b6049 100644 --- a/youtube_dl/extractor/motherless.py +++ b/youtube_dl/extractor/motherless.py @@ -23,7 +23,7 @@ class MotherlessIE(InfoExtractor): 'categories': ['Gaming', 'anal', 'reluctant', 'rough', 'Wife'], 'upload_date': '20100913', 'uploader_id': 'famouslyfuckedup', - 'thumbnail': 're:http://.*\.jpg', + 'thumbnail': r're:http://.*\.jpg', 'age_limit': 18, } }, { @@ -37,7 +37,7 @@ class MotherlessIE(InfoExtractor): 'game', 'hairy'], 'upload_date': '20140622', 'uploader_id': 'Sulivana7x', - 'thumbnail': 're:http://.*\.jpg', + 'thumbnail': r're:http://.*\.jpg', 'age_limit': 18, }, 'skip': '404', @@ -51,7 +51,7 @@ class MotherlessIE(InfoExtractor): 'categories': ['superheroine heroine superher'], 'upload_date': '20140827', 'uploader_id': 'shade0230', - 'thumbnail': 're:http://.*\.jpg', + 'thumbnail': r're:http://.*\.jpg', 'age_limit': 18, } }, { diff --git a/youtube_dl/extractor/movieclips.py 
b/youtube_dl/extractor/movieclips.py index 30c206f9b..5453da1ac 100644 --- a/youtube_dl/extractor/movieclips.py +++ b/youtube_dl/extractor/movieclips.py @@ -20,7 +20,7 @@ class MovieClipsIE(InfoExtractor): 'ext': 'mp4', 'title': 'Warcraft Trailer 1', 'description': 'Watch Trailer 1 from Warcraft (2016). Legendary’s WARCRAFT is a 3D epic adventure of world-colliding conflict based.', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'timestamp': 1446843055, 'upload_date': '20151106', 'uploader': 'Movieclips', diff --git a/youtube_dl/extractor/moviezine.py b/youtube_dl/extractor/moviezine.py index 478e39967..85cc6e22f 100644 --- a/youtube_dl/extractor/moviezine.py +++ b/youtube_dl/extractor/moviezine.py @@ -16,7 +16,7 @@ class MoviezineIE(InfoExtractor): 'ext': 'mp4', 'title': 'Oculus - Trailer 1', 'description': 'md5:40cc6790fc81d931850ca9249b40e8a4', - 'thumbnail': 're:http://.*\.jpg', + 'thumbnail': r're:http://.*\.jpg', }, } diff --git a/youtube_dl/extractor/movingimage.py b/youtube_dl/extractor/movingimage.py index bb789c32e..4f62d628a 100644 --- a/youtube_dl/extractor/movingimage.py +++ b/youtube_dl/extractor/movingimage.py @@ -18,7 +18,7 @@ class MovingImageIE(InfoExtractor): 'title': 'SHETLAND WOOL', 'description': 'md5:c5afca6871ad59b4271e7704fe50ab04', 'duration': 900, - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', }, } diff --git a/youtube_dl/extractor/mtv.py b/youtube_dl/extractor/mtv.py index 03351917e..e1f1f8fa4 100644 --- a/youtube_dl/extractor/mtv.py +++ b/youtube_dl/extractor/mtv.py @@ -77,7 +77,7 @@ class MTVServicesInfoExtractor(InfoExtractor): url = re.sub(r'.+pxE=mp4', 'http://mtvnmobile.vo.llnwd.net/kip0/_pxn=0+_pxK=18639+_pxE=mp4', url, 1) return [{'url': url, 'ext': 'mp4'}] - def _extract_video_formats(self, mdoc, mtvn_id): + def _extract_video_formats(self, mdoc, mtvn_id, video_id): if re.match(r'.*/(error_country_block\.swf|geoblock\.mp4|copyright_error\.flv(?:\?geo\b.+?)?)$', 
mdoc.find('.//src').text) is not None: if mtvn_id is not None and self._MOBILE_TEMPLATE is not None: self.to_screen('The normal version is not available from your ' @@ -88,21 +88,26 @@ class MTVServicesInfoExtractor(InfoExtractor): formats = [] for rendition in mdoc.findall('.//rendition'): - try: - _, _, ext = rendition.attrib['type'].partition('/') - rtmp_video_url = rendition.find('./src').text - if rtmp_video_url.endswith('siteunavail.png'): - continue - new_urls = self._transform_rtmp_url(rtmp_video_url) - formats.extend([{ - 'ext': 'flv' if new_url.startswith('rtmp') else ext, - 'url': new_url, - 'format_id': '-'.join(filter(None, [kind, rendition.get('bitrate')])), - 'width': int(rendition.get('width')), - 'height': int(rendition.get('height')), - } for kind, new_url in new_urls.items()]) - except (KeyError, TypeError): - raise ExtractorError('Invalid rendition field.') + if rendition.attrib['method'] == 'hls': + hls_url = rendition.find('./src').text + formats.extend(self._extract_m3u8_formats(hls_url, video_id, ext='mp4')) + else: + # fms + try: + _, _, ext = rendition.attrib['type'].partition('/') + rtmp_video_url = rendition.find('./src').text + if rtmp_video_url.endswith('siteunavail.png'): + continue + new_urls = self._transform_rtmp_url(rtmp_video_url) + formats.extend([{ + 'ext': 'flv' if new_url.startswith('rtmp') else ext, + 'url': new_url, + 'format_id': '-'.join(filter(None, [kind, rendition.get('bitrate')])), + 'width': int(rendition.get('width')), + 'height': int(rendition.get('height')), + } for kind, new_url in new_urls.items()]) + except (KeyError, TypeError): + raise ExtractorError('Invalid rendition field.') self._sort_formats(formats) return formats @@ -118,15 +123,17 @@ class MTVServicesInfoExtractor(InfoExtractor): } for typographic in transcript.findall('./typographic')] return subtitles - def _get_video_info(self, itemdoc): + def _get_video_info(self, itemdoc, use_hls): uri = itemdoc.find('guid').text video_id = self._id_from_uri(uri) 
self.report_extraction(video_id) content_el = itemdoc.find('%s/%s' % (_media_xml_tag('group'), _media_xml_tag('content'))) mediagen_url = self._remove_template_parameter(content_el.attrib['url']) + mediagen_url = mediagen_url.replace('device={device}', '') if 'acceptMethods' not in mediagen_url: mediagen_url += '&' if '?' in mediagen_url else '?' - mediagen_url += 'acceptMethods=fms' + mediagen_url += 'acceptMethods=' + mediagen_url += 'hls' if use_hls else 'fms' mediagen_doc = self._download_xml(mediagen_url, video_id, 'Downloading video urls') @@ -167,9 +174,11 @@ class MTVServicesInfoExtractor(InfoExtractor): if mtvn_id_node is not None: mtvn_id = mtvn_id_node.text + formats = self._extract_video_formats(mediagen_doc, mtvn_id, video_id) + return { 'title': title, - 'formats': self._extract_video_formats(mediagen_doc, mtvn_id), + 'formats': formats, 'subtitles': self._extract_subtitles(mediagen_doc, mtvn_id), 'id': video_id, 'thumbnail': self._get_thumbnail_url(uri, itemdoc), @@ -184,13 +193,13 @@ class MTVServicesInfoExtractor(InfoExtractor): data['lang'] = self._LANG return data - def _get_videos_info(self, uri): + def _get_videos_info(self, uri, use_hls=False): video_id = self._id_from_uri(uri) feed_url = self._get_feed_url(uri) info_url = update_url_query(feed_url, self._get_feed_query(uri)) - return self._get_videos_info_from_url(info_url, video_id) + return self._get_videos_info_from_url(info_url, video_id, use_hls) - def _get_videos_info_from_url(self, url, video_id): + def _get_videos_info_from_url(self, url, video_id, use_hls): idoc = self._download_xml( url, video_id, 'Downloading info', transform_source=fix_xml_ampersands) @@ -199,7 +208,7 @@ class MTVServicesInfoExtractor(InfoExtractor): description = xpath_text(idoc, './channel/description') return self.playlist_result( - [self._get_video_info(item) for item in idoc.findall('.//item')], + [self._get_video_info(item, use_hls) for item in idoc.findall('.//item')], playlist_title=title, 
playlist_description=description) def _extract_mgid(self, webpage, default=NO_DEFAULT): diff --git a/youtube_dl/extractor/muenchentv.py b/youtube_dl/extractor/muenchentv.py index d9f176136..2cc2bf229 100644 --- a/youtube_dl/extractor/muenchentv.py +++ b/youtube_dl/extractor/muenchentv.py @@ -22,7 +22,7 @@ class MuenchenTVIE(InfoExtractor): 'ext': 'mp4', 'title': 're:^münchen.tv-Livestream [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$', 'is_live': True, - 'thumbnail': 're:^https?://.*\.jpg$' + 'thumbnail': r're:^https?://.*\.jpg$' }, 'params': { 'skip_download': True, diff --git a/youtube_dl/extractor/mwave.py b/youtube_dl/extractor/mwave.py index fea1caf47..a67276596 100644 --- a/youtube_dl/extractor/mwave.py +++ b/youtube_dl/extractor/mwave.py @@ -18,7 +18,7 @@ class MwaveIE(InfoExtractor): 'id': '168859', 'ext': 'flv', 'title': '[M COUNTDOWN] SISTAR - SHAKE IT', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'uploader': 'M COUNTDOWN', 'duration': 206, 'view_count': int, @@ -70,7 +70,7 @@ class MwaveMeetGreetIE(InfoExtractor): 'id': '173294', 'ext': 'flv', 'title': '[MEET&GREET] Park BoRam', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'uploader': 'Mwave', 'duration': 3634, 'view_count': int, diff --git a/youtube_dl/extractor/myvi.py b/youtube_dl/extractor/myvi.py index 4c65be122..621ae74a7 100644 --- a/youtube_dl/extractor/myvi.py +++ b/youtube_dl/extractor/myvi.py @@ -27,7 +27,7 @@ class MyviIE(SprutoBaseIE): 'id': 'f16b2bbd-cde8-481c-a981-7cd48605df43', 'ext': 'mp4', 'title': 'хозяин жизни', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 25, }, }, { diff --git a/youtube_dl/extractor/myvideo.py b/youtube_dl/extractor/myvideo.py index 6d447a493..6bb64eb63 100644 --- a/youtube_dl/extractor/myvideo.py +++ b/youtube_dl/extractor/myvideo.py @@ -160,7 +160,7 @@ class MyVideoIE(InfoExtractor): else: video_playpath = '' - video_swfobj = 
self._search_regex('swfobject.embedSWF\(\'(.+?)\'', webpage, 'swfobj') + video_swfobj = self._search_regex(r'swfobject.embedSWF\(\'(.+?)\'', webpage, 'swfobj') video_swfobj = compat_urllib_parse_unquote(video_swfobj) video_title = self._html_search_regex("(.*?)", diff --git a/youtube_dl/extractor/nbc.py b/youtube_dl/extractor/nbc.py index 4e96e78c3..434a94de4 100644 --- a/youtube_dl/extractor/nbc.py +++ b/youtube_dl/extractor/nbc.py @@ -276,7 +276,7 @@ class NBCNewsIE(ThePlatformIE): 'ext': 'mp4', 'title': 'The chaotic GOP immigration vote', 'description': 'The Republican House votes on a border bill that has no chance of getting through the Senate or signed by the President and is drawing criticism from all sides.', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'timestamp': 1406937606, 'upload_date': '20140802', 'uploader': 'NBCU-NEWS', diff --git a/youtube_dl/extractor/ndr.py b/youtube_dl/extractor/ndr.py index e3b0da2e9..07528d140 100644 --- a/youtube_dl/extractor/ndr.py +++ b/youtube_dl/extractor/ndr.py @@ -302,7 +302,7 @@ class NDREmbedIE(NDREmbedBaseIE): 'info_dict': { 'id': 'livestream217', 'ext': 'flv', - 'title': 're:^NDR Fernsehen Niedersachsen \d{4}-\d{2}-\d{2} \d{2}:\d{2}$', + 'title': r're:^NDR Fernsehen Niedersachsen \d{4}-\d{2}-\d{2} \d{2}:\d{2}$', 'is_live': True, 'upload_date': '20150910', }, @@ -367,7 +367,7 @@ class NJoyEmbedIE(NDREmbedBaseIE): 'info_dict': { 'id': 'webradioweltweit100', 'ext': 'mp3', - 'title': 're:^N-JOY Weltweit \d{4}-\d{2}-\d{2} \d{2}:\d{2}$', + 'title': r're:^N-JOY Weltweit \d{4}-\d{2}-\d{2} \d{2}:\d{2}$', 'is_live': True, 'uploader': 'njoy', 'upload_date': '20150810', diff --git a/youtube_dl/extractor/ndtv.py b/youtube_dl/extractor/ndtv.py index 96528f649..255f60878 100644 --- a/youtube_dl/extractor/ndtv.py +++ b/youtube_dl/extractor/ndtv.py @@ -21,7 +21,7 @@ class NDTVIE(InfoExtractor): 'description': 'md5:ab2d4b4a6056c5cb4caa6d729deabf02', 'upload_date': '20131208', 'duration': 1327, - 
'thumbnail': 're:https?://.*\.jpg', + 'thumbnail': r're:https?://.*\.jpg', }, } diff --git a/youtube_dl/extractor/netzkino.py b/youtube_dl/extractor/netzkino.py index 0d165a82a..aec3026b1 100644 --- a/youtube_dl/extractor/netzkino.py +++ b/youtube_dl/extractor/netzkino.py @@ -25,7 +25,7 @@ class NetzkinoIE(InfoExtractor): 'comments': 'mincount:3', 'description': 'md5:1eddeacc7e62d5a25a2d1a7290c64a28', 'upload_date': '20120813', - 'thumbnail': 're:https?://.*\.jpg$', + 'thumbnail': r're:https?://.*\.jpg$', 'timestamp': 1344858571, 'age_limit': 12, }, diff --git a/youtube_dl/extractor/nextmedia.py b/youtube_dl/extractor/nextmedia.py index dee9056d3..c900f232a 100644 --- a/youtube_dl/extractor/nextmedia.py +++ b/youtube_dl/extractor/nextmedia.py @@ -15,7 +15,7 @@ class NextMediaIE(InfoExtractor): 'id': '53109199', 'ext': 'mp4', 'title': '【佔領金鐘】50外國領事議員撐場 讚學生勇敢香港有希望', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'description': 'md5:28222b9912b6665a21011b034c70fcc7', 'timestamp': 1415456273, 'upload_date': '20141108', @@ -76,7 +76,7 @@ class NextMediaActionNewsIE(NextMediaIE): 'id': '19009428', 'ext': 'mp4', 'title': '【壹週刊】細10年男友偷食 50歲邵美琪再失戀', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'description': 'md5:cd802fad1f40fd9ea178c1e2af02d659', 'timestamp': 1421791200, 'upload_date': '20150120', @@ -101,7 +101,7 @@ class AppleDailyIE(NextMediaIE): 'id': '36354694', 'ext': 'mp4', 'title': '周亭羽走過摩鐵陰霾2男陪吃 九把刀孤寒看醫生', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'description': 'md5:2acd430e59956dc47cd7f67cb3c003f4', 'upload_date': '20150128', } @@ -112,7 +112,7 @@ class AppleDailyIE(NextMediaIE): 'id': '550549', 'ext': 'mp4', 'title': '不滿被踩腳 山東兩大媽一路打下車', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'description': 'md5:175b4260c1d7c085993474217e4ab1b4', 'upload_date': '20150128', } @@ -123,7 +123,7 @@ class AppleDailyIE(NextMediaIE): 
'id': '5003671', 'ext': 'mp4', 'title': '20正妹熱舞 《刀龍傳說Online》火辣上市', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'description': 'md5:23c0aac567dc08c9c16a3161a2c2e3cd', 'upload_date': '20150128', }, @@ -150,7 +150,7 @@ class AppleDailyIE(NextMediaIE): 'id': '35770334', 'ext': 'mp4', 'title': '咖啡占卜測 XU裝熟指數', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'description': 'md5:7b859991a6a4fedbdf3dd3b66545c748', 'upload_date': '20140417', }, diff --git a/youtube_dl/extractor/nfl.py b/youtube_dl/extractor/nfl.py index 3930d16f1..460deb162 100644 --- a/youtube_dl/extractor/nfl.py +++ b/youtube_dl/extractor/nfl.py @@ -72,7 +72,7 @@ class NFLIE(InfoExtractor): 'description': 'md5:56323bfb0ac4ee5ab24bd05fdf3bf478', 'upload_date': '20140921', 'timestamp': 1411337580, - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', } }, { 'url': 'http://prod.www.steelers.clubs.nfl.com/video-and-audio/videos/LIVE_Post_Game_vs_Browns/9d72f26a-9e2b-4718-84d3-09fb4046c266', @@ -84,7 +84,7 @@ class NFLIE(InfoExtractor): 'description': 'md5:6a97f7e5ebeb4c0e69a418a89e0636e8', 'upload_date': '20131229', 'timestamp': 1388354455, - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', } }, { 'url': 'http://www.nfl.com/news/story/0ap3000000467586/article/patriots-seahawks-involved-in-lategame-skirmish', diff --git a/youtube_dl/extractor/nosvideo.py b/youtube_dl/extractor/nosvideo.py index eab816e49..53c500c35 100644 --- a/youtube_dl/extractor/nosvideo.py +++ b/youtube_dl/extractor/nosvideo.py @@ -17,7 +17,7 @@ _x = lambda p: xpath_with_ns(p, {'xspf': 'http://xspf.org/ns/0/'}) class NosVideoIE(InfoExtractor): _VALID_URL = r'https?://(?:www\.)?nosvideo\.com/' + \ - '(?:embed/|\?v=)(?P[A-Za-z0-9]{12})/?' + r'(?:embed/|\?v=)(?P[A-Za-z0-9]{12})/?' 
_PLAYLIST_URL = 'http://nosvideo.com/xml/{xml_id:s}.xml' _FILE_DELETED_REGEX = r'File Not Found' _TEST = { @@ -27,7 +27,7 @@ class NosVideoIE(InfoExtractor): 'id': 'mu8fle7g7rpq', 'ext': 'mp4', 'title': 'big_buck_bunny_480p_surround-fix.avi.mp4', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', } } diff --git a/youtube_dl/extractor/nova.py b/youtube_dl/extractor/nova.py index 103952345..06cb8cb3f 100644 --- a/youtube_dl/extractor/nova.py +++ b/youtube_dl/extractor/nova.py @@ -21,7 +21,7 @@ class NovaIE(InfoExtractor): 'ext': 'flv', 'title': 'Duel: Michal Hrdlička a Petr Suchoň', 'description': 'md5:d0cc509858eee1b1374111c588c6f5d5', - 'thumbnail': 're:^https?://.*\.(?:jpg)', + 'thumbnail': r're:^https?://.*\.(?:jpg)', }, 'params': { # rtmp download @@ -36,7 +36,7 @@ class NovaIE(InfoExtractor): 'ext': 'mp4', 'title': 'Podzemní nemocnice v pražské Krči', 'description': 'md5:f0a42dd239c26f61c28f19e62d20ef53', - 'thumbnail': 're:^https?://.*\.(?:jpg)', + 'thumbnail': r're:^https?://.*\.(?:jpg)', } }, { 'url': 'http://novaplus.nova.cz/porad/policie-modrava/video/5591-policie-modrava-15-dil-blondynka-na-hrbitove', @@ -46,7 +46,7 @@ class NovaIE(InfoExtractor): 'ext': 'flv', 'title': 'Policie Modrava - 15. díl - Blondýnka na hřbitově', 'description': 'md5:dc24e50be5908df83348e50d1431295e', # Make sure this description is clean of html tags - 'thumbnail': 're:^https?://.*\.(?:jpg)', + 'thumbnail': r're:^https?://.*\.(?:jpg)', }, 'params': { # rtmp download @@ -58,7 +58,7 @@ class NovaIE(InfoExtractor): 'id': '1756858', 'ext': 'flv', 'title': 'Televizní noviny - 30. 5. 
2015', - 'thumbnail': 're:^https?://.*\.(?:jpg)', + 'thumbnail': r're:^https?://.*\.(?:jpg)', 'upload_date': '20150530', }, 'params': { @@ -72,7 +72,7 @@ class NovaIE(InfoExtractor): 'ext': 'mp4', 'title': 'Zaklínač 3: Divoký hon', 'description': 're:.*Pokud se stejně jako my nemůžete.*', - 'thumbnail': 're:https?://.*\.jpg(\?.*)?', + 'thumbnail': r're:https?://.*\.jpg(\?.*)?', 'upload_date': '20150521', }, 'params': { diff --git a/youtube_dl/extractor/novamov.py b/youtube_dl/extractor/novamov.py index 3bbd47355..829c71960 100644 --- a/youtube_dl/extractor/novamov.py +++ b/youtube_dl/extractor/novamov.py @@ -24,7 +24,7 @@ class NovaMovIE(InfoExtractor): ) (?P[a-z\d]{13}) ''' - _VALID_URL = _VALID_URL_TEMPLATE % {'host': 'novamov\.com'} + _VALID_URL = _VALID_URL_TEMPLATE % {'host': r'novamov\.com'} _HOST = 'www.novamov.com' @@ -104,7 +104,7 @@ class WholeCloudIE(NovaMovIE): IE_NAME = 'wholecloud' IE_DESC = 'WholeCloud' - _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': '(?:wholecloud\.net|movshare\.(?:net|sx|ag))'} + _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': r'(?:wholecloud\.net|movshare\.(?:net|sx|ag))'} _HOST = 'www.wholecloud.net' @@ -128,7 +128,7 @@ class NowVideoIE(NovaMovIE): IE_NAME = 'nowvideo' IE_DESC = 'NowVideo' - _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': 'nowvideo\.(?:to|ch|ec|sx|eu|at|ag|co|li)'} + _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': r'nowvideo\.(?:to|ch|ec|sx|eu|at|ag|co|li)'} _HOST = 'www.nowvideo.to' @@ -152,7 +152,7 @@ class VideoWeedIE(NovaMovIE): IE_NAME = 'videoweed' IE_DESC = 'VideoWeed' - _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': 'videoweed\.(?:es|com)'} + _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': r'videoweed\.(?:es|com)'} _HOST = 'www.videoweed.es' @@ -176,7 +176,7 @@ class CloudTimeIE(NovaMovIE): IE_NAME = 'cloudtime' IE_DESC = 'CloudTime' - _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': 'cloudtime\.to'} + _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': 
r'cloudtime\.to'} _HOST = 'www.cloudtime.to' @@ -190,7 +190,7 @@ class AuroraVidIE(NovaMovIE): IE_NAME = 'auroravid' IE_DESC = 'AuroraVid' - _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': 'auroravid\.to'} + _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': r'auroravid\.to'} _HOST = 'www.auroravid.to' diff --git a/youtube_dl/extractor/nowness.py b/youtube_dl/extractor/nowness.py index 7e5346316..b6c5ee6e4 100644 --- a/youtube_dl/extractor/nowness.py +++ b/youtube_dl/extractor/nowness.py @@ -62,7 +62,7 @@ class NownessIE(NownessBaseIE): 'ext': 'mp4', 'title': 'Candor: The Art of Gesticulation', 'description': 'Candor: The Art of Gesticulation', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'timestamp': 1446745676, 'upload_date': '20151105', 'uploader_id': '2385340575001', @@ -76,7 +76,7 @@ class NownessIE(NownessBaseIE): 'ext': 'mp4', 'title': 'Kasper Bjørke ft. Jaakko Eino Kalevi: TNR', 'description': 'Kasper Bjørke ft. Jaakko Eino Kalevi: TNR', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'timestamp': 1407315371, 'upload_date': '20140806', 'uploader_id': '2385340575001', @@ -91,7 +91,7 @@ class NownessIE(NownessBaseIE): 'ext': 'mp4', 'title': 'Bleu, Blanc, Rouge - A Godard Supercut', 'description': 'md5:f0ea5f1857dffca02dbd37875d742cec', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'upload_date': '20150607', 'uploader': 'Cinema Sem Lei', 'uploader_id': 'cinemasemlei', diff --git a/youtube_dl/extractor/nowtv.py b/youtube_dl/extractor/nowtv.py index 916a102bf..e43b37136 100644 --- a/youtube_dl/extractor/nowtv.py +++ b/youtube_dl/extractor/nowtv.py @@ -83,7 +83,7 @@ class NowTVIE(NowTVBaseIE): 'ext': 'flv', 'title': 'Inka Bause stellt die neuen Bauern vor', 'description': 'md5:e234e1ed6d63cf06be5c070442612e7e', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'timestamp': 1432580700, 'upload_date': '20150525', 'duration': 
2786, @@ -101,7 +101,7 @@ class NowTVIE(NowTVBaseIE): 'ext': 'flv', 'title': 'Berlin - Tag & Nacht (Folge 934)', 'description': 'md5:c85e88c2e36c552dfe63433bc9506dd0', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'timestamp': 1432666800, 'upload_date': '20150526', 'duration': 2641, @@ -119,7 +119,7 @@ class NowTVIE(NowTVBaseIE): 'ext': 'flv', 'title': 'Hals- und Beinbruch', 'description': 'md5:b50d248efffe244e6f56737f0911ca57', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'timestamp': 1432415400, 'upload_date': '20150523', 'duration': 2742, @@ -137,7 +137,7 @@ class NowTVIE(NowTVBaseIE): 'ext': 'flv', 'title': 'Angst!', 'description': 'md5:30cbc4c0b73ec98bcd73c9f2a8c17c4e', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'timestamp': 1222632900, 'upload_date': '20080928', 'duration': 3025, @@ -155,7 +155,7 @@ class NowTVIE(NowTVBaseIE): 'ext': 'flv', 'title': 'Thema u.a.: Der erste Blick: Die Apple Watch', 'description': 'md5:4312b6c9d839ffe7d8caf03865a531af', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'timestamp': 1432751700, 'upload_date': '20150527', 'duration': 1083, @@ -173,7 +173,7 @@ class NowTVIE(NowTVBaseIE): 'ext': 'flv', 'title': "Büro-Fall / Chihuahua 'Joel'", 'description': 'md5:e62cb6bf7c3cc669179d4f1eb279ad8d', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'timestamp': 1432408200, 'upload_date': '20150523', 'duration': 3092, diff --git a/youtube_dl/extractor/noz.py b/youtube_dl/extractor/noz.py index c47a33d15..ccafd7723 100644 --- a/youtube_dl/extractor/noz.py +++ b/youtube_dl/extractor/noz.py @@ -24,7 +24,7 @@ class NozIE(InfoExtractor): 'duration': 215, 'title': '3:2 - Deutschland gewinnt Badminton-Länderspiel in Melle', 'description': 'Vor rund 370 Zuschauern gewinnt die deutsche Badminton-Nationalmannschaft am Donnerstag ein EM-Vorbereitungsspiel gegen Frankreich in 
Melle. Video Moritz Frankenberg.', - 'thumbnail': 're:^http://.*\.jpg', + 'thumbnail': r're:^http://.*\.jpg', }, }] diff --git a/youtube_dl/extractor/nrk.py b/youtube_dl/extractor/nrk.py index 776c40b94..ea7be005a 100644 --- a/youtube_dl/extractor/nrk.py +++ b/youtube_dl/extractor/nrk.py @@ -207,7 +207,15 @@ class NRKIE(NRKBaseIE): class NRKTVIE(NRKBaseIE): IE_DESC = 'NRK TV and NRK Radio' - _VALID_URL = r'https?://(?:tv|radio)\.nrk(?:super)?\.no/(?:serie/[^/]+|program)/(?P[a-zA-Z]{4}\d{8})(?:/\d{2}-\d{2}-\d{4})?(?:#del=(?P\d+))?' + _EPISODE_RE = r'(?P[a-zA-Z]{4}\d{8})' + _VALID_URL = r'''(?x) + https?:// + (?:tv|radio)\.nrk(?:super)?\.no/ + (?:serie/[^/]+|program)/ + (?![Ee]pisodes)%s + (?:/\d{2}-\d{2}-\d{4})? + (?:\#del=(?P\d+))? + ''' % _EPISODE_RE _API_HOST = 'psapi-we.nrk.no' _TESTS = [{ @@ -286,9 +294,30 @@ class NRKTVDirekteIE(NRKTVIE): }] -class NRKPlaylistIE(InfoExtractor): - _VALID_URL = r'https?://(?:www\.)?nrk\.no/(?!video|skole)(?:[^/]+/)+(?P[^/]+)' +class NRKPlaylistBaseIE(InfoExtractor): + def _extract_description(self, webpage): + pass + def _real_extract(self, url): + playlist_id = self._match_id(url) + + webpage = self._download_webpage(url, playlist_id) + + entries = [ + self.url_result('nrk:%s' % video_id, NRKIE.ie_key()) + for video_id in re.findall(self._ITEM_RE, webpage) + ] + + playlist_title = self. 
_extract_title(webpage) + playlist_description = self._extract_description(webpage) + + return self.playlist_result( + entries, playlist_id, playlist_title, playlist_description) + + +class NRKPlaylistIE(NRKPlaylistBaseIE): + _VALID_URL = r'https?://(?:www\.)?nrk\.no/(?!video|skole)(?:[^/]+/)+(?P[^/]+)' + _ITEM_RE = r'class="[^"]*\brich\b[^"]*"[^>]+data-video-id="([^"]+)"' _TESTS = [{ 'url': 'http://www.nrk.no/troms/gjenopplev-den-historiske-solformorkelsen-1.12270763', 'info_dict': { @@ -307,23 +336,28 @@ class NRKPlaylistIE(InfoExtractor): 'playlist_count': 5, }] - def _real_extract(self, url): - playlist_id = self._match_id(url) + def _extract_title(self, webpage): + return self._og_search_title(webpage, fatal=False) - webpage = self._download_webpage(url, playlist_id) + def _extract_description(self, webpage): + return self._og_search_description(webpage) - entries = [ - self.url_result('nrk:%s' % video_id, 'NRK') - for video_id in re.findall( - r'class="[^"]*\brich\b[^"]*"[^>]+data-video-id="([^"]+)"', - webpage) - ] - playlist_title = self._og_search_title(webpage) - playlist_description = self._og_search_description(webpage) +class NRKTVEpisodesIE(NRKPlaylistBaseIE): + _VALID_URL = r'https?://tv\.nrk\.no/program/[Ee]pisodes/[^/]+/(?P\d+)' + _ITEM_RE = r'data-episode=["\']%s' % NRKTVIE._EPISODE_RE + _TESTS = [{ + 'url': 'https://tv.nrk.no/program/episodes/nytt-paa-nytt/69031', + 'info_dict': { + 'id': '69031', + 'title': 'Nytt på nytt, sesong: 201210', + }, + 'playlist_count': 4, + }] - return self.playlist_result( - entries, playlist_id, playlist_title, playlist_description) + def _extract_title(self, webpage): + return self._html_search_regex( + r'
<h1>([^<]+)</h1>
', webpage, 'title', fatal=False) class NRKSkoleIE(InfoExtractor): diff --git a/youtube_dl/extractor/ntvde.py b/youtube_dl/extractor/ntvde.py index d28a81542..101a5374c 100644 --- a/youtube_dl/extractor/ntvde.py +++ b/youtube_dl/extractor/ntvde.py @@ -22,7 +22,7 @@ class NTVDeIE(InfoExtractor): 'info_dict': { 'id': '14438086', 'ext': 'mp4', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'title': 'Schnee und Glätte führen zu zahlreichen Unfällen und Staus', 'alt_title': 'Winterchaos auf deutschen Straßen', 'description': 'Schnee und Glätte sorgen deutschlandweit für einen chaotischen Start in die Woche: Auf den Straßen kommt es zu kilometerlangen Staus und Dutzenden Glätteunfällen. In Düsseldorf und München wirbelt der Schnee zudem den Flugplan durcheinander. Dutzende Flüge landen zu spät, einige fallen ganz aus.', diff --git a/youtube_dl/extractor/ntvru.py b/youtube_dl/extractor/ntvru.py index 7d7a785ab..4f9cedb84 100644 --- a/youtube_dl/extractor/ntvru.py +++ b/youtube_dl/extractor/ntvru.py @@ -21,7 +21,7 @@ class NTVRuIE(InfoExtractor): 'ext': 'mp4', 'title': 'Командующий Черноморским флотом провел переговоры в штабе ВМС Украины', 'description': 'Командующий Черноморским флотом провел переговоры в штабе ВМС Украины', - 'thumbnail': 're:^http://.*\.jpg', + 'thumbnail': r're:^http://.*\.jpg', 'duration': 136, }, }, { @@ -32,7 +32,7 @@ class NTVRuIE(InfoExtractor): 'ext': 'mp4', 'title': 'Родные пассажиров пропавшего Boeing не верят в трагический исход', 'description': 'Родные пассажиров пропавшего Boeing не верят в трагический исход', - 'thumbnail': 're:^http://.*\.jpg', + 'thumbnail': r're:^http://.*\.jpg', 'duration': 172, }, }, { @@ -43,7 +43,7 @@ class NTVRuIE(InfoExtractor): 'ext': 'mp4', 'title': '«Сегодня». 21 марта 2014 года. 16:00', 'description': '«Сегодня». 21 марта 2014 года. 
16:00', - 'thumbnail': 're:^http://.*\.jpg', + 'thumbnail': r're:^http://.*\.jpg', 'duration': 1496, }, }, { @@ -54,7 +54,7 @@ class NTVRuIE(InfoExtractor): 'ext': 'mp4', 'title': 'Остросюжетный фильм «Кома»', 'description': 'Остросюжетный фильм «Кома»', - 'thumbnail': 're:^http://.*\.jpg', + 'thumbnail': r're:^http://.*\.jpg', 'duration': 5592, }, }, { @@ -65,7 +65,7 @@ class NTVRuIE(InfoExtractor): 'ext': 'mp4', 'title': '«Дело врачей»: «Деревце жизни»', 'description': '«Дело врачей»: «Деревце жизни»', - 'thumbnail': 're:^http://.*\.jpg', + 'thumbnail': r're:^http://.*\.jpg', 'duration': 2590, }, }] diff --git a/youtube_dl/extractor/oktoberfesttv.py b/youtube_dl/extractor/oktoberfesttv.py index 50fbbc79c..a914068f9 100644 --- a/youtube_dl/extractor/oktoberfesttv.py +++ b/youtube_dl/extractor/oktoberfesttv.py @@ -13,7 +13,7 @@ class OktoberfestTVIE(InfoExtractor): 'id': 'hb-zelt', 'ext': 'mp4', 'title': 're:^Live-Kamera: Hofbräuzelt [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'is_live': True, }, 'params': { diff --git a/youtube_dl/extractor/ondemandkorea.py b/youtube_dl/extractor/ondemandkorea.py index c3e830c23..de1d6b08a 100644 --- a/youtube_dl/extractor/ondemandkorea.py +++ b/youtube_dl/extractor/ondemandkorea.py @@ -16,7 +16,7 @@ class OnDemandKoreaIE(JWPlatformBaseIE): 'id': 'ask-us-anything-e43', 'ext': 'mp4', 'title': 'Ask Us Anything : E43', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', }, 'params': { 'skip_download': 'm3u8 download' diff --git a/youtube_dl/extractor/onionstudios.py b/youtube_dl/extractor/onionstudios.py index 6fb1a3fcc..1d336cf30 100644 --- a/youtube_dl/extractor/onionstudios.py +++ b/youtube_dl/extractor/onionstudios.py @@ -22,7 +22,7 @@ class OnionStudiosIE(InfoExtractor): 'id': '2937', 'ext': 'mp4', 'title': 'Hannibal charges forward, stops for a cocktail', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': 
r're:^https?://.*\.jpg$', 'uploader': 'The A.V. Club', 'uploader_id': 'the-av-club', }, diff --git a/youtube_dl/extractor/openload.py b/youtube_dl/extractor/openload.py index 8c5ec72d9..2ce9f3826 100644 --- a/youtube_dl/extractor/openload.py +++ b/youtube_dl/extractor/openload.py @@ -19,7 +19,7 @@ class OpenloadIE(InfoExtractor): 'id': 'kUEfGclsU9o', 'ext': 'mp4', 'title': 'skyrim_no-audio_1080.mp4', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', }, }, { 'url': 'https://openload.co/embed/rjC09fkPLYs', @@ -27,7 +27,7 @@ class OpenloadIE(InfoExtractor): 'id': 'rjC09fkPLYs', 'ext': 'mp4', 'title': 'movie.mp4', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'subtitles': { 'en': [{ 'ext': 'vtt', diff --git a/youtube_dl/extractor/orf.py b/youtube_dl/extractor/orf.py index b4cce7ea9..1e2c54e68 100644 --- a/youtube_dl/extractor/orf.py +++ b/youtube_dl/extractor/orf.py @@ -247,7 +247,7 @@ class ORFIPTVIE(InfoExtractor): 'title': 'Weitere Evakuierungen um Vulkan Calbuco', 'description': 'md5:d689c959bdbcf04efeddedbf2299d633', 'duration': 68.197, - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'upload_date': '20150425', }, } diff --git a/youtube_dl/extractor/pandoratv.py b/youtube_dl/extractor/pandoratv.py index cbb1968d3..89c95fffb 100644 --- a/youtube_dl/extractor/pandoratv.py +++ b/youtube_dl/extractor/pandoratv.py @@ -26,7 +26,7 @@ class PandoraTVIE(InfoExtractor): 'ext': 'flv', 'title': '頭を撫でてくれる?', 'description': '頭を撫でてくれる?', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 39, 'upload_date': '20151218', 'uploader': 'カワイイ動物まとめ', diff --git a/youtube_dl/extractor/pbs.py b/youtube_dl/extractor/pbs.py index f1c0cd068..6baed773f 100644 --- a/youtube_dl/extractor/pbs.py +++ b/youtube_dl/extractor/pbs.py @@ -236,7 +236,7 @@ class PBSIE(InfoExtractor): 'title': 'Great Performances - Dudamel Conducts Verdi Requiem at the Hollywood Bowl 
- Full', 'description': 'md5:657897370e09e2bc6bf0f8d2cd313c6b', 'duration': 6559, - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', }, }, { @@ -249,7 +249,7 @@ class PBSIE(InfoExtractor): 'description': 'md5:c741d14e979fc53228c575894094f157', 'title': 'NOVA - Killer Typhoon', 'duration': 3172, - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'upload_date': '20140122', 'age_limit': 10, }, @@ -270,7 +270,7 @@ class PBSIE(InfoExtractor): 'title': 'American Experience - Death and the Civil War, Chapter 1', 'description': 'md5:67fa89a9402e2ee7d08f53b920674c18', 'duration': 682, - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', }, 'params': { 'skip_download': True, # requires ffmpeg @@ -286,7 +286,7 @@ class PBSIE(InfoExtractor): 'title': 'FRONTLINE - United States of Secrets (Part One)', 'description': 'md5:55756bd5c551519cc4b7703e373e217e', 'duration': 6851, - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', }, }, { @@ -302,7 +302,7 @@ class PBSIE(InfoExtractor): 'title': "A Chef's Life - Season 3, Ep. 
5: Prickly Business", 'description': 'md5:c0ff7475a4b70261c7e58f493c2792a5', 'duration': 1480, - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', }, }, { @@ -315,7 +315,7 @@ class PBSIE(InfoExtractor): 'title': 'FRONTLINE - The Atomic Artists', 'description': 'md5:f677e4520cfacb4a5ce1471e31b57800', 'duration': 723, - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', }, 'params': { 'skip_download': True, # requires ffmpeg @@ -330,7 +330,7 @@ class PBSIE(InfoExtractor): 'ext': 'mp4', 'title': 'FRONTLINE - Netanyahu at War', 'duration': 6852, - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'formats': 'mincount:8', }, }, @@ -568,7 +568,7 @@ class PBSIE(InfoExtractor): # Try turning it to 'program - title' naming scheme if possible alt_title = info.get('program', {}).get('title') if alt_title: - info['title'] = alt_title + ' - ' + re.sub(r'^' + alt_title + '[\s\-:]+', '', info['title']) + info['title'] = alt_title + ' - ' + re.sub(r'^' + alt_title + r'[\s\-:]+', '', info['title']) description = info.get('description') or info.get( 'program', {}).get('description') or description diff --git a/youtube_dl/extractor/people.py b/youtube_dl/extractor/people.py index 9ecdbc13b..6ca95715e 100644 --- a/youtube_dl/extractor/people.py +++ b/youtube_dl/extractor/people.py @@ -14,7 +14,7 @@ class PeopleIE(InfoExtractor): 'ext': 'mp4', 'title': 'Astronaut Love Triangle Victim Speaks Out: “The Crime in 2007 Hasn’t Defined Us”', 'description': 'Colleen Shipman speaks to PEOPLE for the first time about life after the attack', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'duration': 246.318, 'timestamp': 1458720585, 'upload_date': '20160323', diff --git a/youtube_dl/extractor/phoenix.py b/youtube_dl/extractor/phoenix.py index ac009f60f..e435c28e1 100644 --- a/youtube_dl/extractor/phoenix.py +++ b/youtube_dl/extractor/phoenix.py @@ -1,9 +1,9 @@ from __future__ 
import unicode_literals -from .zdf import ZDFIE +from .dreisat import DreiSatIE -class PhoenixIE(ZDFIE): +class PhoenixIE(DreiSatIE): IE_NAME = 'phoenix.de' _VALID_URL = r'''(?x)https?://(?:www\.)?phoenix\.de/content/ (?: diff --git a/youtube_dl/extractor/pinkbike.py b/youtube_dl/extractor/pinkbike.py index a52210fab..6a4580d54 100644 --- a/youtube_dl/extractor/pinkbike.py +++ b/youtube_dl/extractor/pinkbike.py @@ -23,7 +23,7 @@ class PinkbikeIE(InfoExtractor): 'ext': 'mp4', 'title': 'Brandon Semenuk - RAW 100', 'description': 'Official release: www.redbull.ca/rupertwalker', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 100, 'upload_date': '20150406', 'uploader': 'revelco', diff --git a/youtube_dl/extractor/pladform.py b/youtube_dl/extractor/pladform.py index 77e1211d6..e38c7618e 100644 --- a/youtube_dl/extractor/pladform.py +++ b/youtube_dl/extractor/pladform.py @@ -34,7 +34,7 @@ class PladformIE(InfoExtractor): 'ext': 'mp4', 'title': 'Тайны перевала Дятлова • 1 серия 2 часть', 'description': 'Документальный сериал-расследование одной из самых жутких тайн ХХ века', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 694, 'age_limit': 0, }, diff --git a/youtube_dl/extractor/playtvak.py b/youtube_dl/extractor/playtvak.py index 1e8096a25..391e1bd09 100644 --- a/youtube_dl/extractor/playtvak.py +++ b/youtube_dl/extractor/playtvak.py @@ -25,7 +25,7 @@ class PlaytvakIE(InfoExtractor): 'ext': 'mp4', 'title': 'Vyžeňte vosy a sršně ze zahrady', 'description': 'md5:f93d398691044d303bc4a3de62f3e976', - 'thumbnail': 're:(?i)^https?://.*\.(?:jpg|png)$', + 'thumbnail': r're:(?i)^https?://.*\.(?:jpg|png)$', 'duration': 279, 'timestamp': 1438732860, 'upload_date': '20150805', @@ -38,7 +38,7 @@ class PlaytvakIE(InfoExtractor): 'ext': 'flv', 'title': 're:^Přímý přenos iDNES.cz [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$', 'description': 'Sledujte provoz na ranveji Letiště Václava Havla v 
Praze', - 'thumbnail': 're:(?i)^https?://.*\.(?:jpg|png)$', + 'thumbnail': r're:(?i)^https?://.*\.(?:jpg|png)$', 'is_live': True, }, 'params': { @@ -52,7 +52,7 @@ class PlaytvakIE(InfoExtractor): 'ext': 'mp4', 'title': 'Zavřeli jsme mraženou pizzu do auta. Upekla se', 'description': 'md5:01e73f02329e2e5760bd5eed4d42e3c2', - 'thumbnail': 're:(?i)^https?://.*\.(?:jpg|png)$', + 'thumbnail': r're:(?i)^https?://.*\.(?:jpg|png)$', 'duration': 39, 'timestamp': 1438969140, 'upload_date': '20150807', @@ -66,7 +66,7 @@ class PlaytvakIE(InfoExtractor): 'ext': 'mp4', 'title': 'Táhni! Demonstrace proti imigrantům budila emoce', 'description': 'md5:97c81d589a9491fbfa323c9fa3cca72c', - 'thumbnail': 're:(?i)^https?://.*\.(?:jpg|png)$', + 'thumbnail': r're:(?i)^https?://.*\.(?:jpg|png)$', 'timestamp': 1439052180, 'upload_date': '20150808', 'is_live': False, @@ -79,7 +79,7 @@ class PlaytvakIE(InfoExtractor): 'ext': 'mp4', 'title': 'Recesisté udělali z billboardu kolotoč', 'description': 'md5:7369926049588c3989a66c9c1a043c4c', - 'thumbnail': 're:(?i)^https?://.*\.(?:jpg|png)$', + 'thumbnail': r're:(?i)^https?://.*\.(?:jpg|png)$', 'timestamp': 1415725500, 'upload_date': '20141111', 'is_live': False, diff --git a/youtube_dl/extractor/playvid.py b/youtube_dl/extractor/playvid.py index 79c2db085..4aef186ea 100644 --- a/youtube_dl/extractor/playvid.py +++ b/youtube_dl/extractor/playvid.py @@ -34,7 +34,7 @@ class PlayvidIE(InfoExtractor): 'ext': 'mp4', 'title': 'Ellen Euro Cutie Blond Takes a Sexy Survey Get Facial in The Park', 'age_limit': 18, - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', }, }] diff --git a/youtube_dl/extractor/playwire.py b/youtube_dl/extractor/playwire.py index 0bc743118..4d96a10a7 100644 --- a/youtube_dl/extractor/playwire.py +++ b/youtube_dl/extractor/playwire.py @@ -18,7 +18,7 @@ class PlaywireIE(InfoExtractor): 'id': '3353705', 'ext': 'mp4', 'title': 'S04_RM_UCL_Rus', - 'thumbnail': 're:^https?://.*\.png$', + 'thumbnail': 
r're:^https?://.*\.png$', 'duration': 145.94, }, }, { diff --git a/youtube_dl/extractor/polskieradio.py b/youtube_dl/extractor/polskieradio.py index 5ff173774..2ac1fcb0b 100644 --- a/youtube_dl/extractor/polskieradio.py +++ b/youtube_dl/extractor/polskieradio.py @@ -36,7 +36,7 @@ class PolskieRadioIE(InfoExtractor): 'timestamp': 1456594200, 'upload_date': '20160227', 'duration': 2364, - 'thumbnail': 're:^https?://static\.prsa\.pl/images/.*\.jpg$' + 'thumbnail': r're:^https?://static\.prsa\.pl/images/.*\.jpg$' }, }], }, { diff --git a/youtube_dl/extractor/porncom.py b/youtube_dl/extractor/porncom.py index d85e0294d..8218c7d3b 100644 --- a/youtube_dl/extractor/porncom.py +++ b/youtube_dl/extractor/porncom.py @@ -22,7 +22,7 @@ class PornComIE(InfoExtractor): 'display_id': 'teen-grabs-a-dildo-and-fucks-her-pussy-live-on-1hottie-i-rec', 'ext': 'mp4', 'title': 'Teen grabs a dildo and fucks her pussy live on 1hottie, I rec', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 551, 'view_count': int, 'age_limit': 18, diff --git a/youtube_dl/extractor/pornhd.py b/youtube_dl/extractor/pornhd.py index 8df12eec0..842317e6c 100644 --- a/youtube_dl/extractor/pornhd.py +++ b/youtube_dl/extractor/pornhd.py @@ -21,7 +21,7 @@ class PornHdIE(InfoExtractor): 'ext': 'mp4', 'title': 'Restroom selfie masturbation', 'description': 'md5:3748420395e03e31ac96857a8f125b2b', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'view_count': int, 'age_limit': 18, } @@ -35,7 +35,7 @@ class PornHdIE(InfoExtractor): 'ext': 'mp4', 'title': 'Sierra loves doing laundry', 'description': 'md5:8ff0523848ac2b8f9b065ba781ccf294', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'view_count': int, 'age_limit': 18, }, diff --git a/youtube_dl/extractor/pornhub.py b/youtube_dl/extractor/pornhub.py index 40dbe6967..3eaf56973 100644 --- a/youtube_dl/extractor/pornhub.py +++ b/youtube_dl/extractor/pornhub.py @@ 
-229,7 +229,14 @@ class PornHubPlaylistBaseIE(InfoExtractor): webpage = self._download_webpage(url, playlist_id) - entries = self._extract_entries(webpage) + # Only process container div with main playlist content skipping + # drop-down menu that uses similar pattern for videos (see + # https://github.com/rg3/youtube-dl/issues/11594). + container = self._search_regex( + r'(?s)(]+class=["\']container.+)', webpage, + 'container', default=webpage) + + entries = self._extract_entries(container) playlist = self._parse_json( self._search_regex( @@ -243,12 +250,12 @@ class PornHubPlaylistBaseIE(InfoExtractor): class PornHubPlaylistIE(PornHubPlaylistBaseIE): _VALID_URL = r'https?://(?:www\.)?pornhub\.com/playlist/(?P\d+)' _TESTS = [{ - 'url': 'http://www.pornhub.com/playlist/6201671', + 'url': 'http://www.pornhub.com/playlist/4667351', 'info_dict': { - 'id': '6201671', - 'title': 'P0p4', + 'id': '4667351', + 'title': 'Nataly Hot', }, - 'playlist_mincount': 35, + 'playlist_mincount': 2, }] diff --git a/youtube_dl/extractor/pornotube.py b/youtube_dl/extractor/pornotube.py index 63816c358..1b5b9a320 100644 --- a/youtube_dl/extractor/pornotube.py +++ b/youtube_dl/extractor/pornotube.py @@ -19,7 +19,7 @@ class PornotubeIE(InfoExtractor): 'description': 'md5:a8304bef7ef06cb4ab476ca6029b01b0', 'categories': ['Adult Humor', 'Blondes'], 'uploader': 'Alpha Blue Archives', - 'thumbnail': 're:^https?://.*\\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'timestamp': 1417582800, 'age_limit': 18, } diff --git a/youtube_dl/extractor/pornovoisines.py b/youtube_dl/extractor/pornovoisines.py index 58f557e39..b6b71069d 100644 --- a/youtube_dl/extractor/pornovoisines.py +++ b/youtube_dl/extractor/pornovoisines.py @@ -23,7 +23,7 @@ class PornoVoisinesIE(InfoExtractor): 'ext': 'mp4', 'title': 'Recherche appartement', 'description': 'md5:fe10cb92ae2dd3ed94bb4080d11ff493', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'upload_date': '20140925', 'duration': 120, 
'view_count': int, diff --git a/youtube_dl/extractor/pornoxo.py b/youtube_dl/extractor/pornoxo.py index 3c9087f2d..1a0cce7e0 100644 --- a/youtube_dl/extractor/pornoxo.py +++ b/youtube_dl/extractor/pornoxo.py @@ -20,7 +20,7 @@ class PornoXOIE(JWPlatformBaseIE): 'display_id': 'striptease-from-sexy-secretary', 'description': 'md5:0ee35252b685b3883f4a1d38332f9980', 'categories': list, # NSFW - 'thumbnail': 're:https?://.*\.jpg$', + 'thumbnail': r're:https?://.*\.jpg$', 'age_limit': 18, } } diff --git a/youtube_dl/extractor/presstv.py b/youtube_dl/extractor/presstv.py index 2da93ed34..b5c279203 100644 --- a/youtube_dl/extractor/presstv.py +++ b/youtube_dl/extractor/presstv.py @@ -19,7 +19,7 @@ class PressTVIE(InfoExtractor): 'ext': 'mp4', 'title': 'Organic mattresses used to clean waste water', 'upload_date': '20160409', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'description': 'md5:20002e654bbafb6908395a5c0cfcd125' } } diff --git a/youtube_dl/extractor/promptfile.py b/youtube_dl/extractor/promptfile.py index d40cca06f..23ac93d7e 100644 --- a/youtube_dl/extractor/promptfile.py +++ b/youtube_dl/extractor/promptfile.py @@ -20,7 +20,7 @@ class PromptFileIE(InfoExtractor): 'id': '86D1CE8462-576CAAE416', 'ext': 'mp4', 'title': 'oceans.mp4', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', } } diff --git a/youtube_dl/extractor/prosiebensat1.py b/youtube_dl/extractor/prosiebensat1.py index 30478f979..03e1b1f7f 100644 --- a/youtube_dl/extractor/prosiebensat1.py +++ b/youtube_dl/extractor/prosiebensat1.py @@ -394,7 +394,7 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE): self._PLAYLIST_ID_REGEXES, webpage, 'playlist id') playlist = self._parse_json( self._search_regex( - 'var\s+contentResources\s*=\s*(\[.+?\]);\s*]+src=(?P[\'"])(?P(?:https?:)?//rudo\.video/vod/[0-9a-zA-Z]+)(?P=q1)', + r']+src=(?P[\'"])(?P(?:https?:)?//rudo\.video/vod/[0-9a-zA-Z]+)(?P=q1)', webpage) if mobj: return mobj.group('url') diff --git 
a/youtube_dl/extractor/ruhd.py b/youtube_dl/extractor/ruhd.py index ce631b46c..2b830cf47 100644 --- a/youtube_dl/extractor/ruhd.py +++ b/youtube_dl/extractor/ruhd.py @@ -14,7 +14,7 @@ class RUHDIE(InfoExtractor): 'ext': 'divx', 'title': 'КОТ бааааам', 'description': 'классный кот)', - 'thumbnail': 're:^http://.*\.jpg$', + 'thumbnail': r're:^http://.*\.jpg$', } } diff --git a/youtube_dl/extractor/ruutu.py b/youtube_dl/extractor/ruutu.py index 6db3e3e93..f12bc5614 100644 --- a/youtube_dl/extractor/ruutu.py +++ b/youtube_dl/extractor/ruutu.py @@ -23,7 +23,7 @@ class RuutuIE(InfoExtractor): 'ext': 'mp4', 'title': 'Oletko aina halunnut tietää mitä tapahtuu vain hetki ennen lähetystä? - Nyt se selvisi!', 'description': 'md5:cfc6ccf0e57a814360df464a91ff67d6', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 114, 'age_limit': 0, }, @@ -36,7 +36,7 @@ class RuutuIE(InfoExtractor): 'ext': 'mp4', 'title': 'Superpesis: katso koko kausi Ruudussa', 'description': 'md5:bfb7336df2a12dc21d18fa696c9f8f23', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 40, 'age_limit': 0, }, @@ -49,7 +49,7 @@ class RuutuIE(InfoExtractor): 'ext': 'mp4', 'title': 'Osa 1: Mikael Jungner', 'description': 'md5:7d90f358c47542e3072ff65d7b1bcffe', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'age_limit': 0, }, }, diff --git a/youtube_dl/extractor/savefrom.py b/youtube_dl/extractor/savefrom.py index 5b7367b94..30f9cf824 100644 --- a/youtube_dl/extractor/savefrom.py +++ b/youtube_dl/extractor/savefrom.py @@ -20,7 +20,7 @@ class SaveFromIE(InfoExtractor): 'upload_date': '20120816', 'uploader': 'Howcast', 'uploader_id': 'Howcast', - 'description': 're:(?s).* Hi, my name is Rene Dreifuss\. And I\'m here to show you some MMA.*', + 'description': r're:(?s).* Hi, my name is Rene Dreifuss\. 
And I\'m here to show you some MMA.*', }, 'params': { 'skip_download': True diff --git a/youtube_dl/extractor/sbs.py b/youtube_dl/extractor/sbs.py index 43131fb7e..845712a76 100644 --- a/youtube_dl/extractor/sbs.py +++ b/youtube_dl/extractor/sbs.py @@ -22,7 +22,7 @@ class SBSIE(InfoExtractor): 'ext': 'mp4', 'title': 'Dingo Conservation (The Feed)', 'description': 'md5:f250a9856fca50d22dec0b5b8015f8a5', - 'thumbnail': 're:http://.*\.jpg', + 'thumbnail': r're:http://.*\.jpg', 'duration': 308, 'timestamp': 1408613220, 'upload_date': '20140821', diff --git a/youtube_dl/extractor/screencast.py b/youtube_dl/extractor/screencast.py index ed9de9648..62a6a8337 100644 --- a/youtube_dl/extractor/screencast.py +++ b/youtube_dl/extractor/screencast.py @@ -21,7 +21,7 @@ class ScreencastIE(InfoExtractor): 'ext': 'm4v', 'title': 'Color Measurement with Ocean Optics Spectrometers', 'description': 'md5:240369cde69d8bed61349a199c5fb153', - 'thumbnail': 're:^https?://.*\.(?:gif|jpg)$', + 'thumbnail': r're:^https?://.*\.(?:gif|jpg)$', } }, { 'url': 'http://www.screencast.com/t/V2uXehPJa1ZI', @@ -31,7 +31,7 @@ class ScreencastIE(InfoExtractor): 'ext': 'mov', 'title': 'The Amadeus Spectrometer', 'description': 're:^In this video, our friends at.*To learn more about Amadeus, visit', - 'thumbnail': 're:^https?://.*\.(?:gif|jpg)$', + 'thumbnail': r're:^https?://.*\.(?:gif|jpg)$', } }, { 'url': 'http://www.screencast.com/t/aAB3iowa', @@ -41,7 +41,7 @@ class ScreencastIE(InfoExtractor): 'ext': 'mp4', 'title': 'Google Earth Export', 'description': 'Provides a demo of a CommunityViz export to Google Earth, one of the 3D viewing options.', - 'thumbnail': 're:^https?://.*\.(?:gif|jpg)$', + 'thumbnail': r're:^https?://.*\.(?:gif|jpg)$', } }, { 'url': 'http://www.screencast.com/t/X3ddTrYh', @@ -51,7 +51,7 @@ class ScreencastIE(InfoExtractor): 'ext': 'wmv', 'title': 'Toolkit 6 User Group Webinar (2014-03-04) - Default Judgment and First Impression', 'description': 
'md5:7b9f393bc92af02326a5c5889639eab0', - 'thumbnail': 're:^https?://.*\.(?:gif|jpg)$', + 'thumbnail': r're:^https?://.*\.(?:gif|jpg)$', } }, { 'url': 'http://screencast.com/t/aAB3iowa', diff --git a/youtube_dl/extractor/screencastomatic.py b/youtube_dl/extractor/screencastomatic.py index 7a88a42cd..94a2a37d2 100644 --- a/youtube_dl/extractor/screencastomatic.py +++ b/youtube_dl/extractor/screencastomatic.py @@ -14,7 +14,7 @@ class ScreencastOMaticIE(JWPlatformBaseIE): 'id': 'c2lD3BeOPl', 'ext': 'mp4', 'title': 'Welcome to 3-4 Philosophy @ DECV!', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'description': 'as the title says! also: some general info re 1) VCE philosophy and 2) distance learning.', 'duration': 369.163, } diff --git a/youtube_dl/extractor/screenjunkies.py b/youtube_dl/extractor/screenjunkies.py deleted file mode 100644 index 02e574cd8..000000000 --- a/youtube_dl/extractor/screenjunkies.py +++ /dev/null @@ -1,138 +0,0 @@ -from __future__ import unicode_literals - -import re - -from .common import InfoExtractor -from ..compat import compat_str -from ..utils import ( - int_or_none, - parse_age_limit, -) - - -class ScreenJunkiesIE(InfoExtractor): - _VALID_URL = r'https?://(?:www\.)?screenjunkies\.com/video/(?P[^/]+?)(?:-(?P\d+))?(?:[/?#&]|$)' - _TESTS = [{ - 'url': 'http://www.screenjunkies.com/video/best-quentin-tarantino-movie-2841915', - 'md5': '5c2b686bec3d43de42bde9ec047536b0', - 'info_dict': { - 'id': '2841915', - 'display_id': 'best-quentin-tarantino-movie', - 'ext': 'mp4', - 'title': 'Best Quentin Tarantino Movie', - 'thumbnail': 're:^https?://.*\.jpg', - 'duration': 3671, - 'age_limit': 13, - 'tags': list, - }, - }, { - 'url': 'http://www.screenjunkies.com/video/honest-trailers-the-dark-knight', - 'info_dict': { - 'id': '2348808', - 'display_id': 'honest-trailers-the-dark-knight', - 'ext': 'mp4', - 'title': "Honest Trailers: 'The Dark Knight'", - 'thumbnail': 're:^https?://.*\.jpg', - 'age_limit': 10, - 
'tags': list, - }, - }, { - # requires subscription but worked around - 'url': 'http://www.screenjunkies.com/video/knocking-dead-ep-1-the-show-so-far-3003285', - 'info_dict': { - 'id': '3003285', - 'display_id': 'knocking-dead-ep-1-the-show-so-far', - 'ext': 'mp4', - 'title': 'Knocking Dead Ep 1: State of The Dead Recap', - 'thumbnail': 're:^https?://.*\.jpg', - 'duration': 3307, - 'age_limit': 13, - 'tags': list, - }, - }] - - _DEFAULT_BITRATES = (48, 150, 496, 864, 2240) - - def _real_extract(self, url): - mobj = re.match(self._VALID_URL, url) - video_id = mobj.group('id') - display_id = mobj.group('display_id') - - if not video_id: - webpage = self._download_webpage(url, display_id) - video_id = self._search_regex( - (r'src=["\']/embed/(\d+)', r'data-video-content-id=["\'](\d+)'), - webpage, 'video id') - - webpage = self._download_webpage( - 'http://www.screenjunkies.com/embed/%s' % video_id, - display_id, 'Downloading video embed page') - embed_vars = self._parse_json( - self._search_regex( - r'(?s)embedVars\s*=\s*({.+?})\s*', webpage, 'embed vars'), - display_id) - - title = embed_vars['contentName'] - - formats = [] - bitrates = [] - for f in embed_vars.get('media', []): - if not f.get('uri') or f.get('mediaPurpose') != 'play': - continue - bitrate = int_or_none(f.get('bitRate')) - if bitrate: - bitrates.append(bitrate) - formats.append({ - 'url': f['uri'], - 'format_id': 'http-%d' % bitrate if bitrate else 'http', - 'width': int_or_none(f.get('width')), - 'height': int_or_none(f.get('height')), - 'tbr': bitrate, - 'format': 'mp4', - }) - - if not bitrates: - # When subscriptionLevel > 0, i.e. plus subscription is required - # media list will be empty. However, hds and hls uris are still - # available. We can grab them assuming bitrates to be default. 
- bitrates = self._DEFAULT_BITRATES - - auth_token = embed_vars.get('AuthToken') - - def construct_manifest_url(base_url, ext): - pieces = [base_url] - pieces.extend([compat_str(b) for b in bitrates]) - pieces.append('_kbps.mp4.%s?%s' % (ext, auth_token)) - return ','.join(pieces) - - if bitrates and auth_token: - hds_url = embed_vars.get('hdsUri') - if hds_url: - f4m_formats = self._extract_f4m_formats( - construct_manifest_url(hds_url, 'f4m'), - display_id, f4m_id='hds', fatal=False) - if len(f4m_formats) == len(bitrates): - for f, bitrate in zip(f4m_formats, bitrates): - if not f.get('tbr'): - f['format_id'] = 'hds-%d' % bitrate - f['tbr'] = bitrate - # TODO: fix f4m downloader to handle manifests without bitrates if possible - # formats.extend(f4m_formats) - - hls_url = embed_vars.get('hlsUri') - if hls_url: - formats.extend(self._extract_m3u8_formats( - construct_manifest_url(hls_url, 'm3u8'), - display_id, 'mp4', entry_protocol='m3u8_native', m3u8_id='hls', fatal=False)) - self._sort_formats(formats) - - return { - 'id': video_id, - 'display_id': display_id, - 'title': title, - 'thumbnail': embed_vars.get('thumbUri'), - 'duration': int_or_none(embed_vars.get('videoLengthInSeconds')) or None, - 'age_limit': parse_age_limit(embed_vars.get('audienceRating')), - 'tags': embed_vars.get('tags', '').split(','), - 'formats': formats, - } diff --git a/youtube_dl/extractor/senateisvp.py b/youtube_dl/extractor/senateisvp.py index 35540c082..387a4f7f6 100644 --- a/youtube_dl/extractor/senateisvp.py +++ b/youtube_dl/extractor/senateisvp.py @@ -55,7 +55,7 @@ class SenateISVPIE(InfoExtractor): 'id': 'judiciary031715', 'ext': 'mp4', 'title': 'Integrated Senate Video Player', - 'thumbnail': 're:^https?://.*\.(?:jpg|png)$', + 'thumbnail': r're:^https?://.*\.(?:jpg|png)$', }, 'params': { # m3u8 download diff --git a/youtube_dl/extractor/sendtonews.py b/youtube_dl/extractor/sendtonews.py index 2dbe490bb..9880a5a78 100644 --- a/youtube_dl/extractor/sendtonews.py +++ 
b/youtube_dl/extractor/sendtonews.py @@ -8,6 +8,9 @@ from ..utils import ( float_or_none, parse_iso8601, update_url_query, + int_or_none, + determine_protocol, + unescapeHTML, ) @@ -20,18 +23,18 @@ class SendtoNewsIE(JWPlatformBaseIE): 'info_dict': { 'id': 'GxfCe0Zo7D-175909-5588' }, - 'playlist_count': 9, + 'playlist_count': 8, # test the first video only to prevent lengthy tests 'playlist': [{ 'info_dict': { - 'id': '198180', + 'id': '240385', 'ext': 'mp4', - 'title': 'Recap: CLE 5, LAA 4', - 'description': '8/14/16: Naquin, Almonte lead Indians in 5-4 win', - 'duration': 57.343, - 'thumbnail': 're:https?://.*\.jpg$', - 'upload_date': '20160815', - 'timestamp': 1471221961, + 'title': 'Indians introduce Encarnacion', + 'description': 'Indians president of baseball operations Chris Antonetti and Edwin Encarnacion discuss the slugger\'s three-year contract with Cleveland', + 'duration': 137.898, + 'thumbnail': r're:https?://.*\.jpg$', + 'upload_date': '20170105', + 'timestamp': 1483649762, }, }], 'params': { @@ -64,7 +67,20 @@ class SendtoNewsIE(JWPlatformBaseIE): for video in playlist_data['playlistData'][0]: info_dict = self._parse_jwplayer_data( video['jwconfiguration'], - require_title=False, rtmp_params={'no_resume': True}) + require_title=False, m3u8_id='hls', rtmp_params={'no_resume': True}) + + for f in info_dict['formats']: + if f.get('tbr'): + continue + tbr = int_or_none(self._search_regex( + r'/(\d+)k/', f['url'], 'bitrate', default=None)) + if not tbr: + continue + f.update({ + 'format_id': '%s-%d' % (determine_protocol(f), tbr), + 'tbr': tbr, + }) + self._sort_formats(info_dict['formats'], ('tbr', 'height', 'width', 'format_id')) thumbnails = [] if video.get('thumbnailUrl'): @@ -78,8 +94,8 @@ class SendtoNewsIE(JWPlatformBaseIE): 'url': video['smThumbnailUrl'], }) info_dict.update({ - 'title': video['S_headLine'], - 'description': video.get('S_fullStory'), + 'title': video['S_headLine'].strip(), + 'description': unescapeHTML(video.get('S_fullStory')), 
'thumbnails': thumbnails, 'duration': float_or_none(video.get('SM_length')), 'timestamp': parse_iso8601(video.get('S_sysDate'), delimiter=' '), diff --git a/youtube_dl/extractor/sexu.py b/youtube_dl/extractor/sexu.py index a99b2a8e7..5e22ea730 100644 --- a/youtube_dl/extractor/sexu.py +++ b/youtube_dl/extractor/sexu.py @@ -14,7 +14,7 @@ class SexuIE(InfoExtractor): 'title': 'md5:4d05a19a5fc049a63dbbaf05fb71d91b', 'description': 'md5:2b75327061310a3afb3fbd7d09e2e403', 'categories': list, # NSFW - 'thumbnail': 're:https?://.*\.jpg$', + 'thumbnail': r're:https?://.*\.jpg$', 'age_limit': 18, } } diff --git a/youtube_dl/extractor/sharesix.py b/youtube_dl/extractor/sharesix.py deleted file mode 100644 index 9cce5ceb4..000000000 --- a/youtube_dl/extractor/sharesix.py +++ /dev/null @@ -1,91 +0,0 @@ -# coding: utf-8 -from __future__ import unicode_literals - -import re - -from .common import InfoExtractor -from ..utils import ( - parse_duration, - sanitized_Request, - urlencode_postdata, -) - - -class ShareSixIE(InfoExtractor): - _VALID_URL = r'https?://(?:www\.)?sharesix\.com/(?:f/)?(?P[0-9a-zA-Z]+)' - _TESTS = [ - { - 'url': 'http://sharesix.com/f/OXjQ7Y6', - 'md5': '9e8e95d8823942815a7d7c773110cc93', - 'info_dict': { - 'id': 'OXjQ7Y6', - 'ext': 'mp4', - 'title': 'big_buck_bunny_480p_surround-fix.avi', - 'duration': 596, - 'width': 854, - 'height': 480, - }, - }, - { - 'url': 'http://sharesix.com/lfrwoxp35zdd', - 'md5': 'dd19f1435b7cec2d7912c64beeee8185', - 'info_dict': { - 'id': 'lfrwoxp35zdd', - 'ext': 'flv', - 'title': 'WhiteBoard___a_Mac_vs_PC_Parody_Cartoon.mp4.flv', - 'duration': 65, - 'width': 1280, - 'height': 720, - }, - } - ] - - def _real_extract(self, url): - mobj = re.match(self._VALID_URL, url) - video_id = mobj.group('id') - - fields = { - 'method_free': 'Free' - } - post = urlencode_postdata(fields) - req = sanitized_Request(url, post) - req.add_header('Content-type', 'application/x-www-form-urlencoded') - - webpage = self._download_webpage(req, video_id, 
- 'Downloading video page') - - video_url = self._search_regex( - r"var\slnk1\s=\s'([^']+)'", webpage, 'video URL') - title = self._html_search_regex( - r'(?s)<dt>Filename:</dt>.+?<dd>(.+?)</dd>', webpage, 'title') - duration = parse_duration( - self._search_regex( - r'(?s)<dt>Length:</dt>.+?<dd>(.+?)</dd>', - webpage, - 'duration', - fatal=False - ) - ) - - m = re.search( - r'''(?xs)<dt>Width\sx\sHeight</dt>.+? - <dd>(?P<width>\d+)\sx\s(?P<height>\d+)</dd>''', - webpage - ) - width = height = None - if m: - width, height = int(m.group('width')), int(m.group('height')) - - formats = [{ - 'format_id': 'sd', - 'url': video_url, - 'width': width, - 'height': height, - }] - - return { - 'id': video_id, - 'title': title, - 'duration': duration, - 'formats': formats, - } diff --git a/youtube_dl/extractor/showroomlive.py b/youtube_dl/extractor/showroomlive.py new file mode 100644 index 000000000..efd9d561f --- /dev/null +++ b/youtube_dl/extractor/showroomlive.py @@ -0,0 +1,84 @@ +# coding: utf-8 +from __future__ import unicode_literals + +from .common import InfoExtractor +from ..compat import compat_str +from ..utils import ( + ExtractorError, + int_or_none, + urljoin, +) + + +class ShowRoomLiveIE(InfoExtractor): + _VALID_URL = r'https?://(?:www\.)?showroom-live\.com/(?!onlive|timetable|event|campaign|news|ranking|room)(?P<id>[^/?#&]+)' + _TEST = { + 'url': 'https://www.showroom-live.com/48_Nana_Okada', + 'only_matching': True, + } + + def _real_extract(self, url): + broadcaster_id = self._match_id(url) + + webpage = self._download_webpage(url, broadcaster_id) + + room_id = self._search_regex( + (r'SrGlobal\.roomId\s*=\s*(\d+)', + r'(?:profile|room)\?room_id\=(\d+)'), webpage, 'room_id') + + room = self._download_json( + urljoin(url, '/api/room/profile?room_id=%s' % room_id), + broadcaster_id) + + is_live = room.get('is_onlive') + if is_live is not True: + raise ExtractorError('%s is offline' % broadcaster_id, expected=True) + + uploader = room.get('performer_name') or broadcaster_id + title = room.get('room_name') or room.get('main_name') or uploader + + streaming_url_list = self._download_json( + urljoin(url, '/api/live/streaming_url?room_id=%s' % room_id), + broadcaster_id)['streaming_url_list'] + + formats = [] + for stream in streaming_url_list: + stream_url = stream.get('url') + if not stream_url: + continue + stream_type = stream.get('type') + if stream_type == 'hls': + m3u8_formats = self._extract_m3u8_formats( + 
stream_url, broadcaster_id, ext='mp4', m3u8_id='hls', + live=True) + for f in m3u8_formats: + f['quality'] = int_or_none(stream.get('quality', 100)) + formats.extend(m3u8_formats) + elif stream_type == 'rtmp': + stream_name = stream.get('stream_name') + if not stream_name: + continue + formats.append({ + 'url': stream_url, + 'play_path': stream_name, + 'page_url': url, + 'player_url': 'https://www.showroom-live.com/assets/swf/v3/ShowRoomLive.swf', + 'rtmp_live': True, + 'ext': 'flv', + 'format_id': 'rtmp', + 'format_note': stream.get('label'), + 'quality': int_or_none(stream.get('quality', 100)), + }) + self._sort_formats(formats) + + return { + 'id': compat_str(room.get('live_id') or broadcaster_id), + 'title': self._live_title(title), + 'description': room.get('description'), + 'timestamp': int_or_none(room.get('current_live_started_at')), + 'uploader': uploader, + 'uploader_id': broadcaster_id, + 'view_count': int_or_none(room.get('view_num')), + 'formats': formats, + 'is_live': True, + } diff --git a/youtube_dl/extractor/skysports.py b/youtube_dl/extractor/skysports.py index 9dc78c7d2..4ca9f6b3c 100644 --- a/youtube_dl/extractor/skysports.py +++ b/youtube_dl/extractor/skysports.py @@ -2,18 +2,19 @@ from __future__ import unicode_literals from .common import InfoExtractor +from ..utils import strip_or_none class SkySportsIE(InfoExtractor): _VALID_URL = r'https?://(?:www\.)?skysports\.com/watch/video/(?P[0-9]+)' _TEST = { 'url': 'http://www.skysports.com/watch/video/10328419/bale-its-our-time-to-shine', - 'md5': 'c44a1db29f27daf9a0003e010af82100', + 'md5': '77d59166cddc8d3cb7b13e35eaf0f5ec', 'info_dict': { 'id': '10328419', - 'ext': 'flv', - 'title': 'Bale: Its our time to shine', - 'description': 'md5:9fd1de3614d525f5addda32ac3c482c9', + 'ext': 'mp4', + 'title': 'Bale: It\'s our time to shine', + 'description': 'md5:e88bda94ae15f7720c5cb467e777bb6d', }, 'add_ie': ['Ooyala'], } @@ -28,6 +29,6 @@ class SkySportsIE(InfoExtractor): 'url': 'ooyala:%s' % 
self._search_regex( r'data-video-id="([^"]+)"', webpage, 'ooyala id'), 'title': self._og_search_title(webpage), - 'description': self._og_search_description(webpage), + 'description': strip_or_none(self._og_search_description(webpage)), 'ie_key': 'Ooyala', } diff --git a/youtube_dl/extractor/slutload.py b/youtube_dl/extractor/slutload.py index 18cc7721e..7145d285a 100644 --- a/youtube_dl/extractor/slutload.py +++ b/youtube_dl/extractor/slutload.py @@ -13,7 +13,7 @@ class SlutloadIE(InfoExtractor): 'ext': 'mp4', 'title': 'virginie baisee en cam', 'age_limit': 18, - 'thumbnail': 're:https?://.*?\.jpg' + 'thumbnail': r're:https?://.*?\.jpg' } } diff --git a/youtube_dl/extractor/smotri.py b/youtube_dl/extractor/smotri.py index def46abda..370fa8879 100644 --- a/youtube_dl/extractor/smotri.py +++ b/youtube_dl/extractor/smotri.py @@ -81,7 +81,7 @@ class SmotriIE(InfoExtractor): 'uploader': 'psavari1', 'uploader_id': 'psavari1', 'upload_date': '20081103', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', }, 'params': { 'videopassword': '223322', @@ -117,7 +117,7 @@ class SmotriIE(InfoExtractor): 'uploader': 'вАся', 'uploader_id': 'asya_prosto', 'upload_date': '20081218', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'age_limit': 18, }, 'params': { diff --git a/youtube_dl/extractor/snotr.py b/youtube_dl/extractor/snotr.py index 4819fe5b4..f77354748 100644 --- a/youtube_dl/extractor/snotr.py +++ b/youtube_dl/extractor/snotr.py @@ -22,7 +22,7 @@ class SnotrIE(InfoExtractor): 'duration': 248, 'filesize_approx': 40700000, 'description': 'A drone flying through Fourth of July Fireworks', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', }, 'expected_warnings': ['description'], }, { @@ -34,7 +34,7 @@ class SnotrIE(InfoExtractor): 'duration': 126, 'filesize_approx': 8500000, 'description': 'The top 10 George W. 
Bush moments, brought to you by David Letterman!', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', } }] diff --git a/youtube_dl/extractor/soundgasm.py b/youtube_dl/extractor/soundgasm.py index 3a4ddf57e..e004e2c5a 100644 --- a/youtube_dl/extractor/soundgasm.py +++ b/youtube_dl/extractor/soundgasm.py @@ -27,7 +27,7 @@ class SoundgasmIE(InfoExtractor): webpage = self._download_webpage(url, display_id) audio_url = self._html_search_regex( r'(?s)m4a\:\s"([^"]+)"', webpage, 'audio URL') - audio_id = re.split('\/|\.', audio_url)[-2] + audio_id = re.split(r'\/|\.', audio_url)[-2] description = self._html_search_regex( r'(?s)<li>Description:\s(.*?)<\/li>', webpage, 'description', fatal=False) diff --git a/youtube_dl/extractor/spankbang.py b/youtube_dl/extractor/spankbang.py index 186d22b7d..123c33ac3 100644 --- a/youtube_dl/extractor/spankbang.py +++ b/youtube_dl/extractor/spankbang.py @@ -15,7 +15,7 @@ class SpankBangIE(InfoExtractor): 'ext': 'mp4', 'title': 'fantasy solo', 'description': 'Watch fantasy solo free HD porn video - 05 minutes - dillion harper masturbates on a bed free adult movies.', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'uploader': 'silly2587', 'age_limit': 18, } diff --git a/youtube_dl/extractor/spankwire.py b/youtube_dl/extractor/spankwire.py index 92a7120a3..44d8fa52f 100644 --- a/youtube_dl/extractor/spankwire.py +++ b/youtube_dl/extractor/spankwire.py @@ -85,7 +85,7 @@ class SpankwireIE(InfoExtractor): r'playerData\.cdnPath([0-9]{3,})\s*=\s*(?:encodeURIComponent\()?["\']([^"\']+)["\']', webpage) heights = [int(video[0]) for video in videos] video_urls = list(map(compat_urllib_parse_unquote, [video[1] for video in videos])) - if webpage.find('flashvars\.encrypted = "true"') != -1: + if webpage.find(r'flashvars\.encrypted = "true"') != -1: password = self._search_regex( r'flashvars\.video_title = "([^"]+)', webpage, 'password').replace('+', ' ') diff --git a/youtube_dl/extractor/spiegeltv.py b/youtube_dl/extractor/spiegeltv.py index 034bd47ff..e1cfb8698 100644 --- a/youtube_dl/extractor/spiegeltv.py +++ b/youtube_dl/extractor/spiegeltv.py @@ -18,7 +18,7 @@ class SpiegeltvIE(InfoExtractor): 'ext': 'm4v', 'title': 'Flug MH370', 'description': 'Das Rätsel um die Boeing 777 der Malaysia-Airlines', - 'thumbnail': 're:http://.*\.jpg$', + 'thumbnail': r're:http://.*\.jpg$', }, 'params': { # m3u8 download diff --git a/youtube_dl/extractor/sport5.py b/youtube_dl/extractor/sport5.py index 7e6783306..a417b5a4e 100644 --- a/youtube_dl/extractor/sport5.py +++ b/youtube_dl/extractor/sport5.py @@ -41,7 +41,7 @@ class 
Sport5IE(InfoExtractor): webpage = self._download_webpage(url, media_id) - video_id = self._html_search_regex('clipId=([\w-]+)', webpage, 'video id') + video_id = self._html_search_regex(r'clipId=([\w-]+)', webpage, 'video id') metadata = self._download_xml( 'http://sport5-metadata-rr-d.nsacdn.com/vod/vod/%s/HDS/metadata.xml' % video_id, diff --git a/youtube_dl/extractor/sportbox.py b/youtube_dl/extractor/sportbox.py index e5c28ae89..b512cd20f 100644 --- a/youtube_dl/extractor/sportbox.py +++ b/youtube_dl/extractor/sportbox.py @@ -21,7 +21,7 @@ class SportBoxIE(InfoExtractor): 'ext': 'mp4', 'title': 'Гонка 2 заезд ««Объединенный 2000»: классы Туринг и Супер-продакшн', 'description': 'md5:3d72dc4a006ab6805d82f037fdc637ad', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'upload_date': '20140928', }, 'params': { @@ -73,7 +73,7 @@ class SportBoxEmbedIE(InfoExtractor): 'id': '211355', 'ext': 'mp4', 'title': 'В Новороссийске прошел детский турнир «Поле славы боевой»', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', }, 'params': { # m3u8 download diff --git a/youtube_dl/extractor/sportdeutschland.py b/youtube_dl/extractor/sportdeutschland.py index a9927f6e2..a3c35a899 100644 --- a/youtube_dl/extractor/sportdeutschland.py +++ b/youtube_dl/extractor/sportdeutschland.py @@ -20,8 +20,8 @@ class SportDeutschlandIE(InfoExtractor): 'title': 're:Li-Ning Badminton Weltmeisterschaft 2014 Kopenhagen', 'categories': ['Badminton'], 'view_count': int, - 'thumbnail': 're:^https?://.*\.jpg$', - 'description': 're:Die Badminton-WM 2014 aus Kopenhagen bei Sportdeutschland\.TV', + 'thumbnail': r're:^https?://.*\.jpg$', + 'description': r're:Die Badminton-WM 2014 aus Kopenhagen bei Sportdeutschland\.TV', 'timestamp': int, 'upload_date': 're:^201408[23][0-9]$', }, @@ -38,7 +38,7 @@ class SportDeutschlandIE(InfoExtractor): 'timestamp': 1408976060, 'duration': 2732, 'title': 'Li-Ning Badminton Weltmeisterschaft 2014 Kopenhagen: 
Herren Einzel, Wei Lee vs. Keun Lee', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'view_count': int, 'categories': ['Li-Ning Badminton WM 2014'], diff --git a/youtube_dl/extractor/srgssr.py b/youtube_dl/extractor/srgssr.py index 847d3c08f..47aa887cc 100644 --- a/youtube_dl/extractor/srgssr.py +++ b/youtube_dl/extractor/srgssr.py @@ -153,7 +153,7 @@ class SRGSSRPlayIE(InfoExtractor): 'uploader': '19h30', 'upload_date': '20141201', 'timestamp': 1417458600, - 'thumbnail': 're:^https?://.*\.image', + 'thumbnail': r're:^https?://.*\.image', 'view_count': int, }, 'params': { diff --git a/youtube_dl/extractor/srmediathek.py b/youtube_dl/extractor/srmediathek.py index b03272f7a..28baf901c 100644 --- a/youtube_dl/extractor/srmediathek.py +++ b/youtube_dl/extractor/srmediathek.py @@ -20,7 +20,7 @@ class SRMediathekIE(ARDMediathekIE): 'ext': 'mp4', 'title': 'sportarena (26.10.2014)', 'description': 'Ringen: KSV Köllerbach gegen Aachen-Walheim; Frauen-Fußball: 1. 
FC Saarbrücken gegen Sindelfingen; Motorsport: Rallye in Losheim; dazu: Interview mit Timo Bernhard; Turnen: TG Saar; Reitsport: Deutscher Voltigier-Pokal; Badminton: Interview mit Michael Fuchs ', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', }, 'skip': 'no longer available', }, { diff --git a/youtube_dl/extractor/stanfordoc.py b/youtube_dl/extractor/stanfordoc.py index 4a3d8bb8f..cce65fb10 100644 --- a/youtube_dl/extractor/stanfordoc.py +++ b/youtube_dl/extractor/stanfordoc.py @@ -66,7 +66,7 @@ class StanfordOpenClassroomIE(InfoExtractor): r'(?s)([^<]+)', coursepage, 'description', fatal=False) - links = orderedSet(re.findall('', coursepage)) + links = orderedSet(re.findall(r'', coursepage)) info['entries'] = [self.url_result( 'http://openclassroom.stanford.edu/MainFolder/%s' % unescapeHTML(l) ) for l in links] @@ -84,7 +84,7 @@ class StanfordOpenClassroomIE(InfoExtractor): rootpage = self._download_webpage(rootURL, info['id'], errnote='Unable to download course info page') - links = orderedSet(re.findall('', rootpage)) + links = orderedSet(re.findall(r'', rootpage)) info['entries'] = [self.url_result( 'http://openclassroom.stanford.edu/MainFolder/%s' % unescapeHTML(l) ) for l in links] diff --git a/youtube_dl/extractor/stitcher.py b/youtube_dl/extractor/stitcher.py index 0f8782d03..97d1ff681 100644 --- a/youtube_dl/extractor/stitcher.py +++ b/youtube_dl/extractor/stitcher.py @@ -22,7 +22,7 @@ class StitcherIE(InfoExtractor): 'title': 'Machine Learning Mastery and Cancer Clusters', 'description': 'md5:55163197a44e915a14a1ac3a1de0f2d3', 'duration': 1604, - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', }, }, { 'url': 'http://www.stitcher.com/podcast/panoply/vulture-tv/e/the-rare-hourlong-comedy-plus-40846275?autoplay=true', @@ -33,7 +33,7 @@ class StitcherIE(InfoExtractor): 'title': "The CW's 'Crazy Ex-Girlfriend'", 'description': 'md5:04f1e2f98eb3f5cbb094cea0f9e19b17', 'duration': 2235, - 
'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', }, 'params': { 'skip_download': True, diff --git a/youtube_dl/extractor/streamable.py b/youtube_dl/extractor/streamable.py index 2c26fa689..e973c867c 100644 --- a/youtube_dl/extractor/streamable.py +++ b/youtube_dl/extractor/streamable.py @@ -21,7 +21,7 @@ class StreamableIE(InfoExtractor): 'id': 'dnd1', 'ext': 'mp4', 'title': 'Mikel Oiarzabal scores to make it 0-3 for La Real against Espanyol', - 'thumbnail': 're:https?://.*\.jpg$', + 'thumbnail': r're:https?://.*\.jpg$', 'uploader': 'teabaker', 'timestamp': 1454964157.35115, 'upload_date': '20160208', @@ -37,7 +37,7 @@ class StreamableIE(InfoExtractor): 'id': 'moo', 'ext': 'mp4', 'title': '"Please don\'t eat me!"', - 'thumbnail': 're:https?://.*\.jpg$', + 'thumbnail': r're:https?://.*\.jpg$', 'timestamp': 1426115495, 'upload_date': '20150311', 'duration': 12, diff --git a/youtube_dl/extractor/streetvoice.py b/youtube_dl/extractor/streetvoice.py index e529051d1..91612c7f2 100644 --- a/youtube_dl/extractor/streetvoice.py +++ b/youtube_dl/extractor/streetvoice.py @@ -16,7 +16,7 @@ class StreetVoiceIE(InfoExtractor): 'ext': 'mp3', 'title': '輸', 'description': 'Crispy脆樂團 - 輸', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 260, 'upload_date': '20091018', 'uploader': 'Crispy脆樂團', diff --git a/youtube_dl/extractor/sunporno.py b/youtube_dl/extractor/sunporno.py index ef9be7926..68051169b 100644 --- a/youtube_dl/extractor/sunporno.py +++ b/youtube_dl/extractor/sunporno.py @@ -21,7 +21,7 @@ class SunPornoIE(InfoExtractor): 'ext': 'mp4', 'title': 'md5:0a400058e8105d39e35c35e7c5184164', 'description': 'md5:a31241990e1bd3a64e72ae99afb325fb', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 302, 'age_limit': 18, } diff --git a/youtube_dl/extractor/svt.py b/youtube_dl/extractor/svt.py index fb0a4b24e..10cf80885 100644 --- a/youtube_dl/extractor/svt.py +++ 
b/youtube_dl/extractor/svt.py @@ -129,7 +129,7 @@ class SVTPlayIE(SVTBaseIE): 'ext': 'mp4', 'title': 'Flygplan till Haile Selassie', 'duration': 3527, - 'thumbnail': 're:^https?://.*[\.-]jpg$', + 'thumbnail': r're:^https?://.*[\.-]jpg$', 'age_limit': 0, 'subtitles': { 'sv': [{ diff --git a/youtube_dl/extractor/swrmediathek.py b/youtube_dl/extractor/swrmediathek.py index 6d69f7686..0f615979e 100644 --- a/youtube_dl/extractor/swrmediathek.py +++ b/youtube_dl/extractor/swrmediathek.py @@ -1,10 +1,12 @@ # coding: utf-8 from __future__ import unicode_literals -import re - from .common import InfoExtractor -from ..utils import parse_duration +from ..utils import ( + parse_duration, + int_or_none, + determine_protocol, +) class SWRMediathekIE(InfoExtractor): @@ -18,7 +20,7 @@ class SWRMediathekIE(InfoExtractor): 'ext': 'mp4', 'title': 'SWR odysso', 'description': 'md5:2012e31baad36162e97ce9eb3f157b8a', - 'thumbnail': 're:^http:.*\.jpg$', + 'thumbnail': r're:^http:.*\.jpg$', 'duration': 2602, 'upload_date': '20140515', 'uploader': 'SWR Fernsehen', @@ -32,12 +34,13 @@ class SWRMediathekIE(InfoExtractor): 'ext': 'mp4', 'title': 'Nachtcafé - Alltagsdroge Alkohol - zwischen Sektempfang und Komasaufen', 'description': 'md5:e0a3adc17e47db2c23aab9ebc36dbee2', - 'thumbnail': 're:http://.*\.jpg', + 'thumbnail': r're:http://.*\.jpg', 'duration': 5305, 'upload_date': '20140516', 'uploader': 'SWR Fernsehen', 'uploader_id': '990030', }, + 'skip': 'redirect to http://swrmediathek.de/index.htm?hinweis=swrlink', }, { 'url': 'http://swrmediathek.de/player.htm?show=bba23e10-cb93-11e3-bf7f-0026b975f2e6', 'md5': '4382e4ef2c9d7ce6852535fa867a0dd3', @@ -46,59 +49,67 @@ class SWRMediathekIE(InfoExtractor): 'ext': 'mp3', 'title': 'Saša Stanišic: Vor dem Fest', 'description': 'md5:5b792387dc3fbb171eb709060654e8c9', - 'thumbnail': 're:http://.*\.jpg', + 'thumbnail': r're:http://.*\.jpg', 'duration': 3366, 'upload_date': '20140520', 'uploader': 'SWR 2', 'uploader_id': '284670', - } + }, + 'skip': 
'redirect to http://swrmediathek.de/index.htm?hinweis=swrlink', }] def _real_extract(self, url): - mobj = re.match(self._VALID_URL, url) - video_id = mobj.group('id') + video_id = self._match_id(url) video = self._download_json( - 'http://swrmediathek.de/AjaxEntry?ekey=%s' % video_id, video_id, 'Downloading video JSON') + 'http://swrmediathek.de/AjaxEntry?ekey=%s' % video_id, + video_id, 'Downloading video JSON') attr = video['attr'] - media_type = attr['entry_etype'] + title = attr['entry_title'] + media_type = attr.get('entry_etype') formats = [] - for entry in video['sub']: - if entry['name'] != 'entry_media': + for entry in video.get('sub', []): + if entry.get('name') != 'entry_media': continue - entry_attr = entry['attr'] - codec = entry_attr['val0'] - quality = int(entry_attr['val1']) - - fmt = { - 'url': entry_attr['val2'], - 'quality': quality, - } - - if media_type == 'Video': - fmt.update({ - 'format_note': ['144p', '288p', '544p', '720p'][quality - 1], - 'vcodec': codec, + entry_attr = entry.get('attr', {}) + f_url = entry_attr.get('val2') + if not f_url: + continue + codec = entry_attr.get('val0') + if codec == 'm3u8': + formats.extend(self._extract_m3u8_formats( + f_url, video_id, 'mp4', 'm3u8_native', + m3u8_id='hls', fatal=False)) + elif codec == 'f4m': + formats.extend(self._extract_f4m_formats( + f_url + '?hdcore=3.7.0', video_id, + f4m_id='hds', fatal=False)) + else: + formats.append({ + 'format_id': determine_protocol({'url': f_url}), + 'url': f_url, + 'quality': int_or_none(entry_attr.get('val1')), + 'vcodec': codec if media_type == 'Video' else 'none', + 'acodec': codec if media_type == 'Audio' else None, }) - elif media_type == 'Audio': - fmt.update({ - 'acodec': codec, - }) - formats.append(fmt) - self._sort_formats(formats) + upload_date = None + entry_pdatet = attr.get('entry_pdatet') + if entry_pdatet: + upload_date = entry_pdatet[:-4] + return { 'id': video_id, - 'title': attr['entry_title'], - 'description': attr['entry_descl'], - 
'thumbnail': attr['entry_image_16_9'], - 'duration': parse_duration(attr['entry_durat']), - 'upload_date': attr['entry_pdatet'][:-4], - 'uploader': attr['channel_title'], - 'uploader_id': attr['channel_idkey'], + 'title': title, + 'description': attr.get('entry_descl'), + 'thumbnail': attr.get('entry_image_16_9'), + 'duration': parse_duration(attr.get('entry_durat')), + 'upload_date': upload_date, + 'uploader': attr.get('channel_title'), + 'uploader_id': attr.get('channel_idkey'), 'formats': formats, } diff --git a/youtube_dl/extractor/tagesschau.py b/youtube_dl/extractor/tagesschau.py index 8670cee28..c351b7545 100644 --- a/youtube_dl/extractor/tagesschau.py +++ b/youtube_dl/extractor/tagesschau.py @@ -23,7 +23,7 @@ class TagesschauPlayerIE(InfoExtractor): 'id': '179517', 'ext': 'mp4', 'title': 'Marie Kristin Boese, ARD Berlin, über den zukünftigen Kurs der AfD', - 'thumbnail': 're:^https?:.*\.jpg$', + 'thumbnail': r're:^https?:.*\.jpg$', 'formats': 'mincount:6', }, }, { @@ -33,7 +33,7 @@ class TagesschauPlayerIE(InfoExtractor): 'id': '29417', 'ext': 'mp3', 'title': 'Trabi - Bye, bye Rennpappe', - 'thumbnail': 're:^https?:.*\.jpg$', + 'thumbnail': r're:^https?:.*\.jpg$', 'formats': 'mincount:2', }, }, { @@ -135,7 +135,7 @@ class TagesschauIE(InfoExtractor): 'ext': 'mp4', 'title': 'Regierungsumbildung in Athen: Neue Minister in Griechenland vereidigt', 'description': '18.07.2015 20:10 Uhr', - 'thumbnail': 're:^https?:.*\.jpg$', + 'thumbnail': r're:^https?:.*\.jpg$', }, }, { 'url': 'http://www.tagesschau.de/multimedia/sendung/ts-5727.html', @@ -145,7 +145,7 @@ class TagesschauIE(InfoExtractor): 'ext': 'mp4', 'title': 'Sendung: tagesschau \t04.12.2014 20:00 Uhr', 'description': 'md5:695c01bfd98b7e313c501386327aea59', - 'thumbnail': 're:^https?:.*\.jpg$', + 'thumbnail': r're:^https?:.*\.jpg$', }, }, { # exclusive audio @@ -156,7 +156,7 @@ class TagesschauIE(InfoExtractor): 'ext': 'mp3', 'title': 'Trabi - Bye, bye Rennpappe', 'description': 
'md5:8687dda862cbbe2cfb2df09b56341317', - 'thumbnail': 're:^https?:.*\.jpg$', + 'thumbnail': r're:^https?:.*\.jpg$', }, }, { # audio in article @@ -167,7 +167,7 @@ class TagesschauIE(InfoExtractor): 'ext': 'mp3', 'title': 'Viele Baustellen für neuen BND-Chef', 'description': 'md5:1e69a54be3e1255b2b07cdbce5bcd8b4', - 'thumbnail': 're:^https?:.*\.jpg$', + 'thumbnail': r're:^https?:.*\.jpg$', }, }, { 'url': 'http://www.tagesschau.de/inland/afd-parteitag-135.html', diff --git a/youtube_dl/extractor/tass.py b/youtube_dl/extractor/tass.py index 5293393ef..6d336da78 100644 --- a/youtube_dl/extractor/tass.py +++ b/youtube_dl/extractor/tass.py @@ -21,7 +21,7 @@ class TassIE(InfoExtractor): 'ext': 'mp4', 'title': 'Посетителям московского зоопарка показали красную панду', 'description': 'Приехавшую из Дублина Зейну можно увидеть в павильоне "Кошки тропиков"', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', }, }, { diff --git a/youtube_dl/extractor/tdslifeway.py b/youtube_dl/extractor/tdslifeway.py index 4d1f5c801..101c6ee31 100644 --- a/youtube_dl/extractor/tdslifeway.py +++ b/youtube_dl/extractor/tdslifeway.py @@ -13,7 +13,7 @@ class TDSLifewayIE(InfoExtractor): 'id': '3453494717001', 'ext': 'mp4', 'title': 'The Gospel by Numbers', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'upload_date': '20140410', 'description': 'Coming soon from T4G 2014!', 'uploader_id': '2034960640001', diff --git a/youtube_dl/extractor/teachertube.py b/youtube_dl/extractor/teachertube.py index df5d5556f..f14713a78 100644 --- a/youtube_dl/extractor/teachertube.py +++ b/youtube_dl/extractor/teachertube.py @@ -24,7 +24,7 @@ class TeacherTubeIE(InfoExtractor): 'ext': 'mp4', 'title': 'Measures of dispersion from a frequency table', 'description': 'Measures of dispersion from a frequency table', - 'thumbnail': 're:http://.*\.jpg', + 'thumbnail': r're:http://.*\.jpg', }, }, { 'url': 
'http://www.teachertube.com/viewVideo.php?video_id=340064', @@ -34,7 +34,7 @@ class TeacherTubeIE(InfoExtractor): 'ext': 'mp4', 'title': 'How to Make Paper Dolls _ Paper Art Projects', 'description': 'Learn how to make paper dolls in this simple', - 'thumbnail': 're:http://.*\.jpg', + 'thumbnail': r're:http://.*\.jpg', }, }, { 'url': 'http://www.teachertube.com/music.php?music_id=8805', diff --git a/youtube_dl/extractor/ted.py b/youtube_dl/extractor/ted.py index 451cde76d..1b1afab32 100644 --- a/youtube_dl/extractor/ted.py +++ b/youtube_dl/extractor/ted.py @@ -47,7 +47,7 @@ class TEDIE(InfoExtractor): 'id': 'tSVI8ta_P4w', 'ext': 'mp4', 'title': 'Vishal Sikka: The beauty and power of algorithms', - 'thumbnail': 're:^https?://.+\.jpg', + 'thumbnail': r're:^https?://.+\.jpg', 'description': 'md5:6261fdfe3e02f4f579cbbfc00aff73f4', 'upload_date': '20140122', 'uploader_id': 'TEDInstitute', @@ -189,7 +189,7 @@ class TEDIE(InfoExtractor): 'format_id': '%s-%sk' % (format_id, bitrate), 'tbr': bitrate, }) - if re.search('\d+k', h264_url): + if re.search(r'\d+k', h264_url): http_url = h264_url elif format_id == 'rtmp': streamer = talk_info.get('streamer') diff --git a/youtube_dl/extractor/telegraaf.py b/youtube_dl/extractor/telegraaf.py index 58078c531..0f576c1ab 100644 --- a/youtube_dl/extractor/telegraaf.py +++ b/youtube_dl/extractor/telegraaf.py @@ -17,7 +17,7 @@ class TelegraafIE(InfoExtractor): 'ext': 'mp4', 'title': 'Tikibad ontruimd wegens brand', 'description': 'md5:05ca046ff47b931f9b04855015e163a4', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 33, }, 'params': { diff --git a/youtube_dl/extractor/telemb.py b/youtube_dl/extractor/telemb.py index 1bbd0e7bd..9bcac4ec0 100644 --- a/youtube_dl/extractor/telemb.py +++ b/youtube_dl/extractor/telemb.py @@ -19,7 +19,7 @@ class TeleMBIE(InfoExtractor): 'ext': 'mp4', 'title': 'Mons - Cook with Danielle : des cours de cuisine en anglais ! 
- Les reportages', 'description': 'md5:bc5225f47b17c309761c856ad4776265', - 'thumbnail': 're:^http://.*\.(?:jpg|png)$', + 'thumbnail': r're:^http://.*\.(?:jpg|png)$', } }, { @@ -32,7 +32,7 @@ class TeleMBIE(InfoExtractor): 'ext': 'mp4', 'title': 'Havré - Incendie mortel - Les reportages', 'description': 'md5:5e54cb449acb029c2b7734e2d946bd4a', - 'thumbnail': 're:^http://.*\.(?:jpg|png)$', + 'thumbnail': r're:^http://.*\.(?:jpg|png)$', } }, ] diff --git a/youtube_dl/extractor/telewebion.py b/youtube_dl/extractor/telewebion.py index 7786b2813..1207b1a1b 100644 --- a/youtube_dl/extractor/telewebion.py +++ b/youtube_dl/extractor/telewebion.py @@ -13,7 +13,7 @@ class TelewebionIE(InfoExtractor): 'id': '1263668', 'ext': 'mp4', 'title': 'قرعه\u200cکشی لیگ قهرمانان اروپا', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'view_count': int, }, 'params': { diff --git a/youtube_dl/extractor/theplatform.py b/youtube_dl/extractor/theplatform.py index 0405bd6b0..192d8fa29 100644 --- a/youtube_dl/extractor/theplatform.py +++ b/youtube_dl/extractor/theplatform.py @@ -156,7 +156,7 @@ class ThePlatformIE(ThePlatformBaseIE, AdobePassIE): 'title': 'iPhone Siri’s sassy response to a math question has people talking', 'description': 'md5:a565d1deadd5086f3331d57298ec6333', 'duration': 83.0, - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'timestamp': 1435752600, 'upload_date': '20150701', 'uploader': 'NBCU-NEWS', @@ -297,7 +297,7 @@ class ThePlatformFeedIE(ThePlatformBaseIE): 'ext': 'mp4', 'title': 'The Biden factor: will Joe run in 2016?', 'description': 'Could Vice President Joe Biden be preparing a 2016 campaign? 
Mark Halperin and Sam Stein weigh in.', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'upload_date': '20140208', 'timestamp': 1391824260, 'duration': 467.0, diff --git a/youtube_dl/extractor/thisamericanlife.py b/youtube_dl/extractor/thisamericanlife.py index 36493a5de..91e45f2c3 100644 --- a/youtube_dl/extractor/thisamericanlife.py +++ b/youtube_dl/extractor/thisamericanlife.py @@ -13,7 +13,7 @@ class ThisAmericanLifeIE(InfoExtractor): 'ext': 'm4a', 'title': '487: Harper High School, Part One', 'description': 'md5:ee40bdf3fb96174a9027f76dbecea655', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', }, }, { 'url': 'http://www.thisamericanlife.org/play_full.php?play=487', diff --git a/youtube_dl/extractor/tinypic.py b/youtube_dl/extractor/tinypic.py index c43cace24..bc2def508 100644 --- a/youtube_dl/extractor/tinypic.py +++ b/youtube_dl/extractor/tinypic.py @@ -34,7 +34,7 @@ class TinyPicIE(InfoExtractor): webpage = self._download_webpage(url, video_id, 'Downloading page') mobj = re.search(r'(?m)fo\.addVariable\("file",\s"(?P[\da-z]+)"\);\n' - '\s+fo\.addVariable\("s",\s"(?P\d+)"\);', webpage) + r'\s+fo\.addVariable\("s",\s"(?P\d+)"\);', webpage) if mobj is None: raise ExtractorError('Video %s does not exist' % video_id, expected=True) diff --git a/youtube_dl/extractor/tnaflix.py b/youtube_dl/extractor/tnaflix.py index 77d56b8ca..7e6ec3430 100644 --- a/youtube_dl/extractor/tnaflix.py +++ b/youtube_dl/extractor/tnaflix.py @@ -91,7 +91,7 @@ class TNAFlixNetworkBaseIE(InfoExtractor): formats = [] def extract_video_url(vl): - return re.sub('speed=\d+', 'speed=', unescapeHTML(vl.text)) + return re.sub(r'speed=\d+', 'speed=', unescapeHTML(vl.text)) video_link = cfg_xml.find('./videoLink') if video_link is not None: @@ -174,7 +174,7 @@ class TNAFlixNetworkEmbedIE(TNAFlixNetworkBaseIE): 'display_id': '6538', 'ext': 'mp4', 'title': 'Educational xxx video', - 'thumbnail': 're:https?://.*\.jpg$', + 'thumbnail': 
r're:https?://.*\.jpg$', 'age_limit': 18, }, 'params': { @@ -209,7 +209,7 @@ class TNAFlixIE(TNAFlixNetworkBaseIE): 'display_id': 'Carmella-Decesare-striptease', 'ext': 'mp4', 'title': 'Carmella Decesare - striptease', - 'thumbnail': 're:https?://.*\.jpg$', + 'thumbnail': r're:https?://.*\.jpg$', 'duration': 91, 'age_limit': 18, 'categories': ['Porn Stars'], @@ -224,7 +224,7 @@ class TNAFlixIE(TNAFlixNetworkBaseIE): 'ext': 'mp4', 'title': 'Educational xxx video', 'description': 'md5:b4fab8f88a8621c8fabd361a173fe5b8', - 'thumbnail': 're:https?://.*\.jpg$', + 'thumbnail': r're:https?://.*\.jpg$', 'duration': 164, 'age_limit': 18, 'uploader': 'bobwhite39', @@ -250,7 +250,7 @@ class EMPFlixIE(TNAFlixNetworkBaseIE): 'ext': 'mp4', 'title': 'Amateur Finger Fuck', 'description': 'Amateur solo finger fucking.', - 'thumbnail': 're:https?://.*\.jpg$', + 'thumbnail': r're:https?://.*\.jpg$', 'duration': 83, 'age_limit': 18, 'uploader': 'cwbike', @@ -280,7 +280,7 @@ class MovieFapIE(TNAFlixNetworkBaseIE): 'ext': 'mp4', 'title': 'Experienced MILF Amazing Handjob', 'description': 'Experienced MILF giving an Amazing Handjob', - 'thumbnail': 're:https?://.*\.jpg$', + 'thumbnail': r're:https?://.*\.jpg$', 'age_limit': 18, 'uploader': 'darvinfred06', 'view_count': int, @@ -298,7 +298,7 @@ class MovieFapIE(TNAFlixNetworkBaseIE): 'ext': 'flv', 'title': 'Jeune Couple Russe', 'description': 'Amateur', - 'thumbnail': 're:https?://.*\.jpg$', + 'thumbnail': r're:https?://.*\.jpg$', 'age_limit': 18, 'uploader': 'whiskeyjar', 'view_count': int, diff --git a/youtube_dl/extractor/tudou.py b/youtube_dl/extractor/tudou.py index bb8b8e234..2aae55e7e 100644 --- a/youtube_dl/extractor/tudou.py +++ b/youtube_dl/extractor/tudou.py @@ -23,7 +23,7 @@ class TudouIE(InfoExtractor): 'id': '159448201', 'ext': 'f4v', 'title': '卡马乔国足开大脚长传冲吊集锦', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'timestamp': 1372113489000, 'description': '卡马乔卡家军,开大脚先进战术不完全集锦!', 'duration': 289.04, 
@@ -36,7 +36,7 @@ class TudouIE(InfoExtractor): 'id': '117049447', 'ext': 'f4v', 'title': 'La Sylphide-Bolshoi-Ekaterina Krysanova & Vyacheslav Lopatin 2012', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'timestamp': 1349207518000, 'description': 'md5:294612423894260f2dcd5c6c04fe248b', 'duration': 5478.33, diff --git a/youtube_dl/extractor/tumblr.py b/youtube_dl/extractor/tumblr.py index ebe411e12..786143525 100644 --- a/youtube_dl/extractor/tumblr.py +++ b/youtube_dl/extractor/tumblr.py @@ -17,7 +17,7 @@ class TumblrIE(InfoExtractor): 'ext': 'mp4', 'title': 'tatiana maslany news, Orphan Black || DVD extra - behind the scenes ↳...', 'description': 'md5:37db8211e40b50c7c44e95da14f630b7', - 'thumbnail': 're:http://.*\.jpg', + 'thumbnail': r're:http://.*\.jpg', } }, { 'url': 'http://5sostrum.tumblr.com/post/90208453769/yall-forgetting-the-greatest-keek-of-them-all', @@ -27,7 +27,7 @@ class TumblrIE(InfoExtractor): 'ext': 'mp4', 'title': '5SOS STRUM ;]', 'description': 'md5:dba62ac8639482759c8eb10ce474586a', - 'thumbnail': 're:http://.*\.jpg', + 'thumbnail': r're:http://.*\.jpg', } }, { 'url': 'http://hdvideotest.tumblr.com/post/130323439814/test-description-for-my-hd-video', @@ -37,7 +37,7 @@ class TumblrIE(InfoExtractor): 'ext': 'mp4', 'title': 'HD Video Testing \u2014 Test description for my HD video', 'description': 'md5:97cc3ab5fcd27ee4af6356701541319c', - 'thumbnail': 're:http://.*\.jpg', + 'thumbnail': r're:http://.*\.jpg', }, 'params': { 'format': 'hd', @@ -92,7 +92,7 @@ class TumblrIE(InfoExtractor): 'title': 'Video by victoriassecret', 'description': 'Invisibility or flight…which superpower would YOU choose? 
#VSFashionShow #ThisOrThat', 'uploader_id': 'victoriassecret', - 'thumbnail': 're:^https?://.*\.jpg' + 'thumbnail': r're:^https?://.*\.jpg' }, 'add_ie': ['Instagram'], }] diff --git a/youtube_dl/extractor/tunein.py b/youtube_dl/extractor/tunein.py index ae4cfaec2..7e51de89e 100644 --- a/youtube_dl/extractor/tunein.py +++ b/youtube_dl/extractor/tunein.py @@ -11,6 +11,12 @@ from ..compat import compat_urlparse class TuneInBaseIE(InfoExtractor): _API_BASE_URL = 'http://tunein.com/tuner/tune/' + @staticmethod + def _extract_urls(webpage): + return re.findall( + r'<iframe[^>]+src=["\'](?P<url>(?:https?://)?tunein\.com/embed/player/[pst]\d+)', + webpage) + def _real_extract(self, url): content_id = self._match_id(url) @@ -69,82 +75,83 @@ class TuneInClipIE(TuneInBaseIE): _VALID_URL = r'https?://(?:www\.)?tunein\.com/station/.*?audioClipId\=(?P<id>\d+)' _API_URL_QUERY = '?tuneType=AudioClip&audioclipId=%s' - _TESTS = [ - { - 'url': 'http://tunein.com/station/?stationId=246119&audioClipId=816', - 'md5': '99f00d772db70efc804385c6b47f4e77', - 'info_dict': { - 'id': '816', - 'title': '32m', - 'ext': 'mp3', - }, + _TESTS = [{ + 'url': 'http://tunein.com/station/?stationId=246119&audioClipId=816', + 'md5': '99f00d772db70efc804385c6b47f4e77', + 'info_dict': { + 'id': '816', + 'title': '32m', + 'ext': 'mp3', }, - ] + }] class TuneInStationIE(TuneInBaseIE): IE_NAME = 'tunein:station' - _VALID_URL = r'https?://(?:www\.)?tunein\.com/(?:radio/.*?-s|station/.*?StationId\=)(?P<id>\d+)' + _VALID_URL = r'https?://(?:www\.)?tunein\.com/(?:radio/.*?-s|station/.*?StationId=|embed/player/s)(?P<id>\d+)' _API_URL_QUERY = '?tuneType=Station&stationId=%s' @classmethod def suitable(cls, url): return False if TuneInClipIE.suitable(url) else super(TuneInStationIE, cls).suitable(url) - _TESTS = [ - { - 'url': 'http://tunein.com/radio/Jazz24-885-s34682/', - 'info_dict': { - 'id': '34682', - 'title': 'Jazz 24 on 88.5 Jazz24 - KPLU-HD2', - 'ext': 'mp3', - 'location': 'Tacoma, WA', - }, - 'params': { - 'skip_download': True, # 
live stream - }, + _TESTS = [{ + 'url': 'http://tunein.com/radio/Jazz24-885-s34682/', + 'info_dict': { + 'id': '34682', + 'title': 'Jazz 24 on 88.5 Jazz24 - KPLU-HD2', + 'ext': 'mp3', + 'location': 'Tacoma, WA', }, - ] + 'params': { + 'skip_download': True, # live stream + }, + }, { + 'url': 'http://tunein.com/embed/player/s6404/', + 'only_matching': True, + }] class TuneInProgramIE(TuneInBaseIE): IE_NAME = 'tunein:program' - _VALID_URL = r'https?://(?:www\.)?tunein\.com/(?:radio/.*?-p|program/.*?ProgramId\=)(?P<id>\d+)' + _VALID_URL = r'https?://(?:www\.)?tunein\.com/(?:radio/.*?-p|program/.*?ProgramId=|embed/player/p)(?P<id>\d+)' _API_URL_QUERY = '?tuneType=Program&programId=%s' - _TESTS = [ - { - 'url': 'http://tunein.com/radio/Jazz-24-p2506/', - 'info_dict': { - 'id': '2506', - 'title': 'Jazz 24 on 91.3 WUKY-HD3', - 'ext': 'mp3', - 'location': 'Lexington, KY', - }, - 'params': { - 'skip_download': True, # live stream - }, + _TESTS = [{ + 'url': 'http://tunein.com/radio/Jazz-24-p2506/', + 'info_dict': { + 'id': '2506', + 'title': 'Jazz 24 on 91.3 WUKY-HD3', + 'ext': 'mp3', + 'location': 'Lexington, KY', }, - ] + 'params': { + 'skip_download': True, # live stream + }, + }, { + 'url': 'http://tunein.com/embed/player/p191660/', + 'only_matching': True, + }] class TuneInTopicIE(TuneInBaseIE): IE_NAME = 'tunein:topic' - _VALID_URL = r'https?://(?:www\.)?tunein\.com/topic/.*?TopicId\=(?P<id>\d+)' + _VALID_URL = r'https?://(?:www\.)?tunein\.com/(?:topic/.*?TopicId=|embed/player/t)(?P<id>\d+)' _API_URL_QUERY = '?tuneType=Topic&topicId=%s' - _TESTS = [ - { - 'url': 'http://tunein.com/topic/?TopicId=101830576', - 'md5': 'c31a39e6f988d188252eae7af0ef09c9', - 'info_dict': { - 'id': '101830576', - 'title': 'Votez pour moi du 29 octobre 2015 (29/10/15)', - 'ext': 'mp3', - 'location': 'Belgium', - }, + _TESTS = [{ + 'url': 'http://tunein.com/topic/?TopicId=101830576', + 'md5': 'c31a39e6f988d188252eae7af0ef09c9', + 'info_dict': { + 'id': '101830576', + 'title': 'Votez pour moi du 29 octobre 
2015 (29/10/15)', + 'ext': 'mp3', + 'location': 'Belgium', }, - ] + }, { + 'url': 'http://tunein.com/embed/player/t101830576/', + 'only_matching': True, + }] class TuneInShortenerIE(InfoExtractor): diff --git a/youtube_dl/extractor/turbo.py b/youtube_dl/extractor/turbo.py index 7ae63a499..25aa9c58e 100644 --- a/youtube_dl/extractor/turbo.py +++ b/youtube_dl/extractor/turbo.py @@ -24,7 +24,7 @@ class TurboIE(InfoExtractor): 'duration': 3715, 'title': 'Turbo du 07/09/2014 : Renault Twingo 3, Bentley Continental GT Speed, CES, Guide Achat Dacia... ', 'description': 'Turbo du 07/09/2014 : Renault Twingo 3, Bentley Continental GT Speed, CES, Guide Achat Dacia...', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', } } diff --git a/youtube_dl/extractor/tv2.py b/youtube_dl/extractor/tv2.py index bd28267b0..d5071e8a5 100644 --- a/youtube_dl/extractor/tv2.py +++ b/youtube_dl/extractor/tv2.py @@ -126,7 +126,7 @@ class TV2ArticleIE(InfoExtractor): if not assets: # New embed pattern - for v in re.findall('TV2ContentboxVideo\(({.+?})\)', webpage): + for v in re.findall(r'TV2ContentboxVideo\(({.+?})\)', webpage): video = self._parse_json( v, playlist_id, transform_source=js_to_json, fatal=False) if not video: diff --git a/youtube_dl/extractor/tv4.py b/youtube_dl/extractor/tv4.py index 5d2d8f132..29f62b970 100644 --- a/youtube_dl/extractor/tv4.py +++ b/youtube_dl/extractor/tv4.py @@ -33,7 +33,7 @@ class TV4IE(InfoExtractor): 'id': '2491650', 'ext': 'mp4', 'title': 'Kalla Fakta 5 (english subtitles)', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'timestamp': int, 'upload_date': '20131125', }, @@ -45,7 +45,7 @@ class TV4IE(InfoExtractor): 'id': '3054113', 'ext': 'mp4', 'title': 'Så här jobbar ficktjuvarna - se avslöjande bilder', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'description': 'Unika bilder avslöjar hur turisternas fickor vittjas mitt på Stockholms central. 
Två experter på ficktjuvarna avslöjar knepen du ska se upp för.', 'timestamp': int, 'upload_date': '20150130', diff --git a/youtube_dl/extractor/tvc.py b/youtube_dl/extractor/tvc.py index 4065354dd..008f64cc2 100644 --- a/youtube_dl/extractor/tvc.py +++ b/youtube_dl/extractor/tvc.py @@ -19,7 +19,7 @@ class TVCIE(InfoExtractor): 'id': '74622', 'ext': 'mp4', 'title': 'События. "События". Эфир от 22.05.2015 14:30', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 1122, }, } @@ -72,7 +72,7 @@ class TVCArticleIE(InfoExtractor): 'ext': 'mp4', 'title': 'События. "События". Эфир от 22.05.2015 14:30', 'description': 'md5:ad7aa7db22903f983e687b8a3e98c6dd', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 1122, }, }, { @@ -82,7 +82,7 @@ class TVCArticleIE(InfoExtractor): 'ext': 'mp4', 'title': 'Эксперты: в столице встал вопрос о максимально безопасных остановках', 'description': 'md5:f2098f71e21f309e89f69b525fd9846e', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 278, }, }, { @@ -92,7 +92,7 @@ class TVCArticleIE(InfoExtractor): 'ext': 'mp4', 'title': 'Ещё не поздно. 
Эфир от 03.08.2013', 'description': 'md5:51fae9f3f8cfe67abce014e428e5b027', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 3316, }, }] diff --git a/youtube_dl/extractor/tweakers.py b/youtube_dl/extractor/tweakers.py index 7a9386cde..2b10d9bca 100644 --- a/youtube_dl/extractor/tweakers.py +++ b/youtube_dl/extractor/tweakers.py @@ -18,7 +18,7 @@ class TweakersIE(InfoExtractor): 'ext': 'mp4', 'title': 'New Nintendo 3DS XL - Op alle fronten beter', 'description': 'md5:3789b21fed9c0219e9bcaacd43fab280', - 'thumbnail': 're:^https?://.*\.jpe?g$', + 'thumbnail': r're:^https?://.*\.jpe?g$', 'duration': 386, 'uploader_id': 's7JeEm', } diff --git a/youtube_dl/extractor/twentyfourvideo.py b/youtube_dl/extractor/twentyfourvideo.py index af92b713b..1093a3829 100644 --- a/youtube_dl/extractor/twentyfourvideo.py +++ b/youtube_dl/extractor/twentyfourvideo.py @@ -22,7 +22,7 @@ class TwentyFourVideoIE(InfoExtractor): 'ext': 'mp4', 'title': 'Эротика каменного века', 'description': 'Как смотрели порно в каменном веке.', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'uploader': 'SUPERTELO', 'duration': 31, 'timestamp': 1275937857, diff --git a/youtube_dl/extractor/twitch.py b/youtube_dl/extractor/twitch.py index bbf071da3..6d67bda86 100644 --- a/youtube_dl/extractor/twitch.py +++ b/youtube_dl/extractor/twitch.py @@ -206,7 +206,14 @@ class TwitchChapterIE(TwitchItemBaseIE): class TwitchVodIE(TwitchItemBaseIE): IE_NAME = 'twitch:vod' - _VALID_URL = r'%s/[^/]+/v/(?P<id>\d+)' % TwitchBaseIE._VALID_URL_BASE + _VALID_URL = r'''(?x) + https?:// + (?: + (?:www\.)?twitch\.tv/[^/]+/v/| + player\.twitch\.tv/\?.*?\bvideo=v + ) + (?P<id>\d+) + ''' _ITEM_TYPE = 'vod' _ITEM_SHORTCUT = 'v' @@ -216,7 +223,7 @@ class TwitchVodIE(TwitchItemBaseIE): 'id': 'v6528877', 'ext': 'mp4', 'title': 'LCK Summer Split - Week 6 Day 1', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 17208, 'timestamp': 
1435131709, 'upload_date': '20150624', @@ -236,7 +243,7 @@ class TwitchVodIE(TwitchItemBaseIE): 'id': 'v11230755', 'ext': 'mp4', 'title': 'Untitled Broadcast', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 1638, 'timestamp': 1439746708, 'upload_date': '20150816', @@ -249,6 +256,9 @@ class TwitchVodIE(TwitchItemBaseIE): 'skip_download': True, }, 'skip': 'HTTP Error 404: Not Found', + }, { + 'url': 'http://player.twitch.tv/?t=5m10s&video=v6528877', + 'only_matching': True, }] def _real_extract(self, url): @@ -540,7 +550,7 @@ class TwitchClipsIE(InfoExtractor): 'id': 'AggressiveCobraPoooound', 'ext': 'mp4', 'title': 'EA Play 2016 Live from the Novo Theatre', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'creator': 'EA', 'uploader': 'stereotype_', 'uploader_id': 'stereotype_', diff --git a/youtube_dl/extractor/twitter.py b/youtube_dl/extractor/twitter.py index ac0b221b4..37e3bc412 100644 --- a/youtube_dl/extractor/twitter.py +++ b/youtube_dl/extractor/twitter.py @@ -34,7 +34,7 @@ class TwitterCardIE(TwitterBaseIE): 'id': '560070183650213889', 'ext': 'mp4', 'title': 'Twitter Card', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 30.033, } }, @@ -45,7 +45,7 @@ class TwitterCardIE(TwitterBaseIE): 'id': '623160978427936768', 'ext': 'mp4', 'title': 'Twitter Card', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'duration': 80.155, }, }, @@ -82,7 +82,7 @@ class TwitterCardIE(TwitterBaseIE): 'id': '705235433198714880', 'ext': 'mp4', 'title': 'Twitter web player', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', }, }, { 'url': 'https://twitter.com/i/videos/752274308186120192', @@ -201,7 +201,7 @@ class TwitterIE(InfoExtractor): 'id': '643211948184596480', 'ext': 'mp4', 'title': 'FREE THE NIPPLE - FTN supporters on Hollywood Blvd today!', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': 
r're:^https?://.*\.jpg', 'description': 'FREE THE NIPPLE on Twitter: "FTN supporters on Hollywood Blvd today! http://t.co/c7jHH749xJ"', 'uploader': 'FREE THE NIPPLE', 'uploader_id': 'freethenipple', @@ -217,7 +217,7 @@ class TwitterIE(InfoExtractor): 'ext': 'mp4', 'title': 'Gifs - tu vai cai tu vai cai tu nao eh capaz disso tu vai cai', 'description': 'Gifs on Twitter: "tu vai cai tu vai cai tu nao eh capaz disso tu vai cai https://t.co/tM46VHFlO5"', - 'thumbnail': 're:^https?://.*\.png', + 'thumbnail': r're:^https?://.*\.png', 'uploader': 'Gifs', 'uploader_id': 'giphz', }, @@ -257,7 +257,7 @@ class TwitterIE(InfoExtractor): 'ext': 'mp4', 'title': 'JG - BEAT PROD: @suhmeduh #Damndaniel', 'description': 'JG on Twitter: "BEAT PROD: @suhmeduh https://t.co/HBrQ4AfpvZ #Damndaniel https://t.co/byBooq2ejZ"', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'uploader': 'JG', 'uploader_id': 'jaydingeer', }, diff --git a/youtube_dl/extractor/udn.py b/youtube_dl/extractor/udn.py index 57dd73aef..daf45d0b4 100644 --- a/youtube_dl/extractor/udn.py +++ b/youtube_dl/extractor/udn.py @@ -23,7 +23,7 @@ class UDNEmbedIE(InfoExtractor): 'id': '300040', 'ext': 'mp4', 'title': '生物老師男變女 全校挺"做自己"', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', }, 'params': { # m3u8 download diff --git a/youtube_dl/extractor/urort.py b/youtube_dl/extractor/urort.py index 8872cfcb2..8f6edab4b 100644 --- a/youtube_dl/extractor/urort.py +++ b/youtube_dl/extractor/urort.py @@ -21,7 +21,7 @@ class UrortIE(InfoExtractor): 'id': '33124-24', 'ext': 'mp3', 'title': 'The Bomb', - 'thumbnail': 're:^https?://.+\.jpg', + 'thumbnail': r're:^https?://.+\.jpg', 'uploader': 'Gerilja', 'uploader_id': 'Gerilja', 'upload_date': '20100323', diff --git a/youtube_dl/extractor/ustudio.py b/youtube_dl/extractor/ustudio.py index 3484a2046..56509beed 100644 --- a/youtube_dl/extractor/ustudio.py +++ b/youtube_dl/extractor/ustudio.py @@ -22,7 +22,7 @@ class 
UstudioIE(InfoExtractor): 'ext': 'mp4', 'title': 'San Francisco: Golden Gate Bridge', 'description': 'md5:23925500697f2c6d4830e387ba51a9be', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'upload_date': '20111107', 'uploader': 'Tony Farley', } diff --git a/youtube_dl/extractor/varzesh3.py b/youtube_dl/extractor/varzesh3.py index 84698371a..f474ed73f 100644 --- a/youtube_dl/extractor/varzesh3.py +++ b/youtube_dl/extractor/varzesh3.py @@ -22,7 +22,7 @@ class Varzesh3IE(InfoExtractor): 'ext': 'mp4', 'title': '۵ واکنش برتر دروازه‌بانان؛هفته ۲۶ بوندسلیگا', 'description': 'فصل ۲۰۱۵-۲۰۱۴', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', }, 'skip': 'HTTP 404 Error', }, { @@ -67,7 +67,7 @@ class Varzesh3IE(InfoExtractor): webpage, display_id, default=None) if video_id is None: video_id = self._search_regex( - 'var\s+VideoId\s*=\s*(\d+);', webpage, 'video id', + r'var\s+VideoId\s*=\s*(\d+);', webpage, 'video id', default=display_id) return { diff --git a/youtube_dl/extractor/vbox7.py b/youtube_dl/extractor/vbox7.py index 429893e38..bef639462 100644 --- a/youtube_dl/extractor/vbox7.py +++ b/youtube_dl/extractor/vbox7.py @@ -28,7 +28,7 @@ class Vbox7IE(InfoExtractor): 'ext': 'mp4', 'title': 'Борисов: Притеснен съм за бъдещето на България', 'description': 'По думите му е опасно страната ни да бъде обявена за "сигурна"', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'timestamp': 1470982814, 'upload_date': '20160812', 'uploader': 'zdraveibulgaria', @@ -56,7 +56,7 @@ class Vbox7IE(InfoExtractor): @staticmethod def _extract_url(webpage): mobj = re.search( - '<iframe[^>]+src=(?P<q>["\'])(?P<url>(?:https?:)?//vbox7\.com/emb/external\.php.+?)(?P=q)', + r'<iframe[^>]+src=(?P<q>["\'])(?P<url>(?:https?:)?//vbox7\.com/emb/external\.php.+?)(?P=q)', webpage) if mobj: return mobj.group('url') diff --git a/youtube_dl/extractor/vessel.py b/youtube_dl/extractor/vessel.py index 6b9c227db..80a643dfe 100644 --- 
a/youtube_dl/extractor/vessel.py +++ b/youtube_dl/extractor/vessel.py @@ -24,7 +24,7 @@ class VesselIE(InfoExtractor): 'id': 'HDN7G5UMs', 'ext': 'mp4', 'title': 'Nvidia GeForce GTX Titan X - The Best Video Card on the Market?', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'upload_date': '20150317', 'description': 'Did Nvidia pull out all the stops on the Titan X, or does its performance leave something to be desired?', 'timestamp': int, diff --git a/youtube_dl/extractor/vgtv.py b/youtube_dl/extractor/vgtv.py index 3b38ac700..8a574bc26 100644 --- a/youtube_dl/extractor/vgtv.py +++ b/youtube_dl/extractor/vgtv.py @@ -61,7 +61,7 @@ class VGTVIE(XstreamIE): 'ext': 'mp4', 'title': 'Hevnen er søt: Episode 10 - Abu', 'description': 'md5:e25e4badb5f544b04341e14abdc72234', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'duration': 648.000, 'timestamp': 1404626400, 'upload_date': '20140706', @@ -76,7 +76,7 @@ class VGTVIE(XstreamIE): 'ext': 'flv', 'title': 'OPPTAK: VGTV følger EM-kvalifiseringen', 'description': 'md5:3772d9c0dc2dff92a886b60039a7d4d3', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'duration': 9103.0, 'timestamp': 1410113864, 'upload_date': '20140907', @@ -96,7 +96,7 @@ class VGTVIE(XstreamIE): 'ext': 'mp4', 'title': 'V75 fra Solvalla 30.05.15', 'description': 'md5:b3743425765355855f88e096acc93231', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'duration': 25966, 'timestamp': 1432975582, 'upload_date': '20150530', @@ -200,7 +200,7 @@ class VGTVIE(XstreamIE): format_info = { 'url': mp4_url, } - mobj = re.search('(\d+)_(\d+)_(\d+)', mp4_url) + mobj = re.search(r'(\d+)_(\d+)_(\d+)', mp4_url) if mobj: tbr = int(mobj.group(3)) format_info.update({ @@ -246,7 +246,7 @@ class BTArticleIE(InfoExtractor): 'ext': 'mp4', 'title': 'Alrekstad internat', 'description': 'md5:dc81a9056c874fedb62fc48a300dac58', - 'thumbnail': 
're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'duration': 191, 'timestamp': 1289991323, 'upload_date': '20101117', diff --git a/youtube_dl/extractor/vidbit.py b/youtube_dl/extractor/vidbit.py index e7ac5a842..91f45b7cc 100644 --- a/youtube_dl/extractor/vidbit.py +++ b/youtube_dl/extractor/vidbit.py @@ -20,7 +20,7 @@ class VidbitIE(InfoExtractor): 'ext': 'mp4', 'title': 'Intro to VidBit', 'description': 'md5:5e0d6142eec00b766cbf114bfd3d16b7', - 'thumbnail': 're:https?://.*\.jpg$', + 'thumbnail': r're:https?://.*\.jpg$', 'upload_date': '20160618', 'view_count': int, 'comment_count': int, diff --git a/youtube_dl/extractor/viddler.py b/youtube_dl/extractor/viddler.py index 8d92aee87..67808e7e6 100644 --- a/youtube_dl/extractor/viddler.py +++ b/youtube_dl/extractor/viddler.py @@ -26,7 +26,7 @@ class ViddlerIE(InfoExtractor): 'timestamp': 1335371429, 'upload_date': '20120425', 'duration': 100.89, - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'view_count': int, 'comment_count': int, 'categories': ['video content', 'high quality video', 'video made easy', 'how to produce video with limited resources', 'viddler'], diff --git a/youtube_dl/extractor/videa.py b/youtube_dl/extractor/videa.py new file mode 100644 index 000000000..311df58f4 --- /dev/null +++ b/youtube_dl/extractor/videa.py @@ -0,0 +1,97 @@ +# coding: utf-8 +from __future__ import unicode_literals + +import re + +from .common import InfoExtractor +from ..utils import ( + int_or_none, + mimetype2ext, + parse_codecs, + xpath_element, + xpath_text, +) + + +class VideaIE(InfoExtractor): + _VALID_URL = r'''(?x) + https?:// + videa\.hu/ + (?: + videok/(?:[^/]+/)*[^?#&]+-| + player\?.*?\bv=| + player/v/ + ) + (?P<id>[^?#&]+) + ''' + _TESTS = [{ + 'url': 'http://videa.hu/videok/allatok/az-orult-kigyasz-285-kigyot-kigyo-8YfIAjxwWGwT8HVQ', + 'md5': '97a7af41faeaffd9f1fc864a7c7e7603', + 'info_dict': { + 'id': '8YfIAjxwWGwT8HVQ', + 'ext': 'mp4', + 'title': 'Az őrült kígyász 285 
kígyót enged szabadon', + 'thumbnail': 'http://videa.hu/static/still/1.4.1.1007274.1204470.3', + 'duration': 21, + }, + }, { + 'url': 'http://videa.hu/videok/origo/jarmuvek/supercars-elozes-jAHDWfWSJH5XuFhH', + 'only_matching': True, + }, { + 'url': 'http://videa.hu/player?v=8YfIAjxwWGwT8HVQ', + 'only_matching': True, + }, { + 'url': 'http://videa.hu/player/v/8YfIAjxwWGwT8HVQ?autoplay=1', + 'only_matching': True, + }] + + @staticmethod + def _extract_urls(webpage): + return [url for _, url in re.findall( + r']+src=(["\'])(?P(?:https?:)?//videa\.hu/player\?.*?\bv=.+?)\1', + webpage)] + + def _real_extract(self, url): + video_id = self._match_id(url) + + info = self._download_xml( + 'http://videa.hu/videaplayer_get_xml.php', video_id, + query={'v': video_id}) + + video = xpath_element(info, './/video', 'video', fatal=True) + sources = xpath_element(info, './/video_sources', 'sources', fatal=True) + + title = xpath_text(video, './title', fatal=True) + + formats = [] + for source in sources.findall('./video_source'): + source_url = source.text + if not source_url: + continue + f = parse_codecs(source.get('codecs')) + f.update({ + 'url': source_url, + 'ext': mimetype2ext(source.get('mimetype')) or 'mp4', + 'format_id': source.get('name'), + 'width': int_or_none(source.get('width')), + 'height': int_or_none(source.get('height')), + }) + formats.append(f) + self._sort_formats(formats) + + thumbnail = xpath_text(video, './poster_src') + duration = int_or_none(xpath_text(video, './duration')) + + age_limit = None + is_adult = xpath_text(video, './is_adult_content', default=None) + if is_adult: + age_limit = 18 if is_adult == '1' else 0 + + return { + 'id': video_id, + 'title': title, + 'thumbnail': thumbnail, + 'duration': duration, + 'age_limit': age_limit, + 'formats': formats, + } diff --git a/youtube_dl/extractor/videomega.py b/youtube_dl/extractor/videomega.py index 4f0dcd18c..c02830ddd 100644 --- a/youtube_dl/extractor/videomega.py +++ 
b/youtube_dl/extractor/videomega.py @@ -19,7 +19,7 @@ class VideoMegaIE(InfoExtractor): 'id': 'AOSQBJYKIDDIKYJBQSOA', 'ext': 'mp4', 'title': '1254207', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', } }, { 'url': 'http://videomega.tv/cdn.php?ref=AOSQBJYKIDDIKYJBQSOA&width=1070&height=600', diff --git a/youtube_dl/extractor/videomore.py b/youtube_dl/extractor/videomore.py index 7f2566586..9b56630de 100644 --- a/youtube_dl/extractor/videomore.py +++ b/youtube_dl/extractor/videomore.py @@ -23,7 +23,7 @@ class VideomoreIE(InfoExtractor): 'title': 'Кино в деталях 5 сезон В гостях Алексей Чумаков и Юлия Ковальчук', 'series': 'Кино в деталях', 'episode': 'В гостях Алексей Чумаков и Юлия Ковальчук', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'duration': 2910, 'view_count': int, 'comment_count': int, @@ -37,7 +37,7 @@ class VideomoreIE(InfoExtractor): 'title': 'Молодежка 2 сезон 40 серия', 'series': 'Молодежка', 'episode': '40 серия', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'duration': 2809, 'view_count': int, 'comment_count': int, @@ -53,7 +53,7 @@ class VideomoreIE(InfoExtractor): 'ext': 'flv', 'title': 'Промо Команда проиграла из-за Бакина?', 'episode': 'Команда проиграла из-за Бакина?', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'duration': 29, 'age_limit': 16, 'view_count': int, @@ -145,7 +145,7 @@ class VideomoreVideoIE(InfoExtractor): 'ext': 'flv', 'title': 'Ёлки 3', 'description': '', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'duration': 5579, 'age_limit': 6, 'view_count': int, @@ -168,7 +168,7 @@ class VideomoreVideoIE(InfoExtractor): 'ext': 'flv', 'title': '1 серия. 
Здравствуй, Аквавилль!', 'description': 'md5:c6003179538b5d353e7bcd5b1372b2d7', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'duration': 754, 'age_limit': 6, 'view_count': int, diff --git a/youtube_dl/extractor/videott.py b/youtube_dl/extractor/videott.py deleted file mode 100644 index 0f798711b..000000000 --- a/youtube_dl/extractor/videott.py +++ /dev/null @@ -1,65 +0,0 @@ -from __future__ import unicode_literals - -import re -import base64 - -from .common import InfoExtractor -from ..utils import ( - unified_strdate, - int_or_none, -) - - -class VideoTtIE(InfoExtractor): - _WORKING = False - ID_NAME = 'video.tt' - IE_DESC = 'video.tt - Your True Tube' - _VALID_URL = r'https?://(?:www\.)?video\.tt/(?:(?:video|embed)/|watch_video\.php\?v=)(?P<id>[\da-zA-Z]{9})' - - _TESTS = [{ - 'url': 'http://www.video.tt/watch_video.php?v=amd5YujV8', - 'md5': 'b13aa9e2f267effb5d1094443dff65ba', - 'info_dict': { - 'id': 'amd5YujV8', - 'ext': 'flv', - 'title': 'Motivational video Change your mind in just 2.50 mins', - 'description': '', - 'upload_date': '20130827', - 'uploader': 'joseph313', - } - }, { - 'url': 'http://video.tt/embed/amd5YujV8', - 'only_matching': True, - }] - - def _real_extract(self, url): - mobj = re.match(self._VALID_URL, url) - video_id = mobj.group('id') - - settings = self._download_json( - 'http://www.video.tt/player_control/settings.php?v=%s' % video_id, video_id, - 'Downloading video JSON')['settings'] - - video = settings['video_details']['video'] - - formats = [ - { - 'url': base64.b64decode(res['u'].encode('utf-8')).decode('utf-8'), - 'ext': 'flv', - 'format_id': res['l'], - } for res in settings['res'] if res['u'] - ] - - return { - 'id': video_id, - 'title': video['title'], - 'description': video['description'], - 'thumbnail': settings['config']['thumbnail'], - 'upload_date': unified_strdate(video['added']), - 'uploader': video['owner'], - 'view_count': int_or_none(video['view_count']), - 'comment_count': None if 
video.get('comment_count') == '--' else int_or_none(video['comment_count']), - 'like_count': int_or_none(video['liked']), - 'dislike_count': int_or_none(video['disliked']), - 'formats': formats, - } diff --git a/youtube_dl/extractor/vidio.py b/youtube_dl/extractor/vidio.py index 6898042de..4e4b4e38c 100644 --- a/youtube_dl/extractor/vidio.py +++ b/youtube_dl/extractor/vidio.py @@ -18,7 +18,7 @@ class VidioIE(InfoExtractor): 'ext': 'mp4', 'title': 'DJ_AMBRED - Booyah (Live 2015)', 'description': 'md5:27dc15f819b6a78a626490881adbadf8', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 149, 'like_count': int, }, diff --git a/youtube_dl/extractor/vidme.py b/youtube_dl/extractor/vidme.py index b1156d531..e9ff336c4 100644 --- a/youtube_dl/extractor/vidme.py +++ b/youtube_dl/extractor/vidme.py @@ -23,7 +23,7 @@ class VidmeIE(InfoExtractor): 'ext': 'mp4', 'title': 'Fishing for piranha - the easy way', 'description': 'source: https://www.facebook.com/photo.php?v=312276045600871', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'timestamp': 1406313244, 'upload_date': '20140725', 'age_limit': 0, @@ -39,7 +39,7 @@ class VidmeIE(InfoExtractor): 'id': 'Gc6M', 'ext': 'mp4', 'title': 'O Mere Dil ke chain - Arnav and Khushi VM', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'timestamp': 1441211642, 'upload_date': '20150902', 'uploader': 'SunshineM', @@ -61,7 +61,7 @@ class VidmeIE(InfoExtractor): 'ext': 'mp4', 'title': 'The Carver', 'description': 'md5:e9c24870018ae8113be936645b93ba3c', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'timestamp': 1433203629, 'upload_date': '20150602', 'uploader': 'Thomas', @@ -82,7 +82,7 @@ class VidmeIE(InfoExtractor): 'id': 'Wmur', 'ext': 'mp4', 'title': 'naked smoking & stretching', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'timestamp': 1430931613, 'upload_date': 
'20150506', 'uploader': 'naked-yogi', @@ -115,7 +115,7 @@ class VidmeIE(InfoExtractor): 'id': 'e5g', 'ext': 'mp4', 'title': 'Video upload (e5g)', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'timestamp': 1401480195, 'upload_date': '20140530', 'uploader': None, diff --git a/youtube_dl/extractor/viewlift.py b/youtube_dl/extractor/viewlift.py index 19500eba8..18735cfb2 100644 --- a/youtube_dl/extractor/viewlift.py +++ b/youtube_dl/extractor/viewlift.py @@ -14,7 +14,7 @@ from ..utils import ( class ViewLiftBaseIE(InfoExtractor): - _DOMAINS_REGEX = '(?:snagfilms|snagxtreme|funnyforfree|kiddovid|winnersview|monumentalsportsnetwork|vayafilm)\.com|kesari\.tv' + _DOMAINS_REGEX = r'(?:snagfilms|snagxtreme|funnyforfree|kiddovid|winnersview|monumentalsportsnetwork|vayafilm)\.com|kesari\.tv' class ViewLiftEmbedIE(ViewLiftBaseIE): @@ -110,7 +110,7 @@ class ViewLiftIE(ViewLiftBaseIE): 'ext': 'mp4', 'title': 'Lost for Life', 'description': 'md5:fbdacc8bb6b455e464aaf98bc02e1c82', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'duration': 4489, 'categories': ['Documentary', 'Crime', 'Award Winning', 'Festivals'] } @@ -123,7 +123,7 @@ class ViewLiftIE(ViewLiftBaseIE): 'ext': 'mp4', 'title': 'India', 'description': 'md5:5c168c5a8f4719c146aad2e0dfac6f5f', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'duration': 979, 'categories': ['Documentary', 'Sports', 'Politics'] } @@ -160,7 +160,7 @@ class ViewLiftIE(ViewLiftBaseIE): snag = self._parse_json( self._search_regex( - 'Snag\.page\.data\s*=\s*(\[.+?\]);', webpage, 'snag'), + r'Snag\.page\.data\s*=\s*(\[.+?\]);', webpage, 'snag'), display_id) for item in snag: diff --git a/youtube_dl/extractor/viewster.py b/youtube_dl/extractor/viewster.py index a93196a07..52dd95e2f 100644 --- a/youtube_dl/extractor/viewster.py +++ b/youtube_dl/extractor/viewster.py @@ -157,7 +157,7 @@ class ViewsterIE(InfoExtractor): formats.extend(m3u8_formats) else: 
qualities_basename = self._search_regex( - '/([^/]+)\.csmil/', + r'/([^/]+)\.csmil/', manifest_url, 'qualities basename', default=None) if not qualities_basename: continue diff --git a/youtube_dl/extractor/viidea.py b/youtube_dl/extractor/viidea.py index a4f914d14..4adcd1830 100644 --- a/youtube_dl/extractor/viidea.py +++ b/youtube_dl/extractor/viidea.py @@ -40,7 +40,7 @@ class ViideaIE(InfoExtractor): 'ext': 'mp4', 'title': 'Automatics, robotics and biocybernetics', 'description': 'md5:815fc1deb6b3a2bff99de2d5325be482', - 'thumbnail': 're:http://.*\.jpg', + 'thumbnail': r're:http://.*\.jpg', 'timestamp': 1372349289, 'upload_date': '20130627', 'duration': 565, @@ -58,7 +58,7 @@ class ViideaIE(InfoExtractor): 'ext': 'flv', 'title': 'NLP at Google', 'description': 'md5:fc7a6d9bf0302d7cc0e53f7ca23747b3', - 'thumbnail': 're:http://.*\.jpg', + 'thumbnail': r're:http://.*\.jpg', 'timestamp': 1284375600, 'upload_date': '20100913', 'duration': 5352, @@ -74,7 +74,7 @@ class ViideaIE(InfoExtractor): 'id': '23181', 'title': 'Deep Learning Summer School, Montreal 2015', 'description': 'md5:0533a85e4bd918df52a01f0e1ebe87b7', - 'thumbnail': 're:http://.*\.jpg', + 'thumbnail': r're:http://.*\.jpg', 'timestamp': 1438560000, }, 'playlist_count': 30, @@ -85,7 +85,7 @@ class ViideaIE(InfoExtractor): 'id': '9737', 'display_id': 'mlss09uk_bishop_ibi', 'title': 'Introduction To Bayesian Inference', - 'thumbnail': 're:http://.*\.jpg', + 'thumbnail': r're:http://.*\.jpg', 'timestamp': 1251622800, }, 'playlist': [{ @@ -94,7 +94,7 @@ class ViideaIE(InfoExtractor): 'display_id': 'mlss09uk_bishop_ibi_part1', 'ext': 'wmv', 'title': 'Introduction To Bayesian Inference (Part 1)', - 'thumbnail': 're:http://.*\.jpg', + 'thumbnail': r're:http://.*\.jpg', 'duration': 4622, 'timestamp': 1251622800, 'upload_date': '20090830', @@ -105,7 +105,7 @@ class ViideaIE(InfoExtractor): 'display_id': 'mlss09uk_bishop_ibi_part2', 'ext': 'wmv', 'title': 'Introduction To Bayesian Inference (Part 2)', - 'thumbnail': 
're:http://.*\.jpg', + 'thumbnail': r're:http://.*\.jpg', 'duration': 5641, 'timestamp': 1251622800, 'upload_date': '20090830', diff --git a/youtube_dl/extractor/vimeo.py b/youtube_dl/extractor/vimeo.py index c35cafcc6..37e1da70d 100644 --- a/youtube_dl/extractor/vimeo.py +++ b/youtube_dl/extractor/vimeo.py @@ -205,7 +205,7 @@ class VimeoIE(VimeoBaseInfoExtractor): 'title': "youtube-dl test video - \u2605 \" ' \u5e78 / \\ \u00e4 \u21ad \U0001d550", 'description': 'md5:2d3305bad981a06ff79f027f19865021', 'upload_date': '20121220', - 'uploader_url': 're:https?://(?:www\.)?vimeo\.com/user7108434', + 'uploader_url': r're:https?://(?:www\.)?vimeo\.com/user7108434', 'uploader_id': 'user7108434', 'uploader': 'Filippo Valsorda', 'duration': 10, @@ -218,7 +218,7 @@ class VimeoIE(VimeoBaseInfoExtractor): 'info_dict': { 'id': '68093876', 'ext': 'mp4', - 'uploader_url': 're:https?://(?:www\.)?vimeo\.com/openstreetmapus', + 'uploader_url': r're:https?://(?:www\.)?vimeo\.com/openstreetmapus', 'uploader_id': 'openstreetmapus', 'uploader': 'OpenStreetMap US', 'title': 'Andy Allan - Putting the Carto into OpenStreetMap Cartography', @@ -235,7 +235,7 @@ class VimeoIE(VimeoBaseInfoExtractor): 'ext': 'mp4', 'title': 'Kathy Sierra: Building the minimum Badass User, Business of Software 2012', 'uploader': 'The BLN & Business of Software', - 'uploader_url': 're:https?://(?:www\.)?vimeo\.com/theblnbusinessofsoftware', + 'uploader_url': r're:https?://(?:www\.)?vimeo\.com/theblnbusinessofsoftware', 'uploader_id': 'theblnbusinessofsoftware', 'duration': 3610, 'description': None, @@ -250,7 +250,7 @@ class VimeoIE(VimeoBaseInfoExtractor): 'ext': 'mp4', 'title': 'youtube-dl password protected test video', 'upload_date': '20130614', - 'uploader_url': 're:https?://(?:www\.)?vimeo\.com/user18948128', + 'uploader_url': r're:https?://(?:www\.)?vimeo\.com/user18948128', 'uploader_id': 'user18948128', 'uploader': 'Jaime Marquínez Ferrándiz', 'duration': 10, @@ -268,7 +268,7 @@ class 
VimeoIE(VimeoBaseInfoExtractor): 'ext': 'mp4', 'title': 'Key & Peele: Terrorist Interrogation', 'description': 'md5:8678b246399b070816b12313e8b4eb5c', - 'uploader_url': 're:https?://(?:www\.)?vimeo\.com/atencio', + 'uploader_url': r're:https?://(?:www\.)?vimeo\.com/atencio', 'uploader_id': 'atencio', 'uploader': 'Peter Atencio', 'upload_date': '20130927', @@ -284,7 +284,7 @@ class VimeoIE(VimeoBaseInfoExtractor): 'title': 'The New Vimeo Player (You Know, For Videos)', 'description': 'md5:2ec900bf97c3f389378a96aee11260ea', 'upload_date': '20131015', - 'uploader_url': 're:https?://(?:www\.)?vimeo\.com/staff', + 'uploader_url': r're:https?://(?:www\.)?vimeo\.com/staff', 'uploader_id': 'staff', 'uploader': 'Vimeo Staff', 'duration': 62, @@ -299,7 +299,7 @@ class VimeoIE(VimeoBaseInfoExtractor): 'ext': 'mp4', 'title': 'Pier Solar OUYA Official Trailer', 'uploader': 'Tulio Gonçalves', - 'uploader_url': 're:https?://(?:www\.)?vimeo\.com/user28849593', + 'uploader_url': r're:https?://(?:www\.)?vimeo\.com/user28849593', 'uploader_id': 'user28849593', }, }, @@ -312,7 +312,7 @@ class VimeoIE(VimeoBaseInfoExtractor): 'ext': 'mp4', 'title': 'FOX CLASSICS - Forever Classic ID - A Full Minute', 'uploader': 'The DMCI', - 'uploader_url': 're:https?://(?:www\.)?vimeo\.com/dmci', + 'uploader_url': r're:https?://(?:www\.)?vimeo\.com/dmci', 'uploader_id': 'dmci', 'upload_date': '20111220', 'description': 'md5:ae23671e82d05415868f7ad1aec21147', @@ -327,7 +327,7 @@ class VimeoIE(VimeoBaseInfoExtractor): 'ext': 'mp4', 'title': 'Vimeo Tribute: The Shining', 'uploader': 'Casey Donahue', - 'uploader_url': 're:https?://(?:www\.)?vimeo\.com/caseydonahue', + 'uploader_url': r're:https?://(?:www\.)?vimeo\.com/caseydonahue', 'uploader_id': 'caseydonahue', 'upload_date': '20090821', 'description': 'md5:bdbf314014e58713e6e5b66eb252f4a6', @@ -346,7 +346,7 @@ class VimeoIE(VimeoBaseInfoExtractor): 'ext': 'mp4', 'title': 'The Reluctant Revolutionary', 'uploader': '10Ft Films', - 'uploader_url': 
're:https?://(?:www\.)?vimeo\.com/tenfootfilms', + 'uploader_url': r're:https?://(?:www\.)?vimeo\.com/tenfootfilms', 'uploader_id': 'tenfootfilms', }, 'params': { @@ -497,7 +497,7 @@ class VimeoIE(VimeoBaseInfoExtractor): except RegexNotFoundError: # For pro videos or player.vimeo.com urls # We try to find out to which variable is assigned the config dic - m_variable_name = re.search('(\w)\.video\.id', webpage) + m_variable_name = re.search(r'(\w)\.video\.id', webpage) if m_variable_name is not None: config_re = r'%s=({[^}].+?});' % re.escape(m_variable_name.group(1)) else: @@ -626,7 +626,7 @@ class VimeoOndemandIE(VimeoBaseInfoExtractor): 'ext': 'mp4', 'title': 'המעבדה - במאי יותם פלדמן', 'uploader': 'גם סרטים', - 'uploader_url': 're:https?://(?:www\.)?vimeo\.com/gumfilms', + 'uploader_url': r're:https?://(?:www\.)?vimeo\.com/gumfilms', 'uploader_id': 'gumfilms', }, }, { @@ -637,7 +637,7 @@ class VimeoOndemandIE(VimeoBaseInfoExtractor): 'ext': 'mp4', 'title': 'Rävlock, rätt läte på rätt plats', 'uploader': 'Lindroth & Norin', - 'uploader_url': 're:https?://(?:www\.)?vimeo\.com/user14430847', + 'uploader_url': r're:https?://(?:www\.)?vimeo\.com/user14430847', 'uploader_id': 'user14430847', }, 'params': { @@ -857,7 +857,7 @@ class VimeoReviewIE(VimeoBaseInfoExtractor): 'title': 're:(?i)^Death by dogma versus assembling agile . 
Sander Hoogendoorn', 'uploader': 'DevWeek Events', 'duration': 2773, - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'uploader_id': 'user22258446', } }, { diff --git a/youtube_dl/extractor/vimple.py b/youtube_dl/extractor/vimple.py index 7fd9b777b..c74b43766 100644 --- a/youtube_dl/extractor/vimple.py +++ b/youtube_dl/extractor/vimple.py @@ -37,7 +37,7 @@ class VimpleIE(SprutoBaseIE): 'ext': 'mp4', 'title': 'Sunset', 'duration': 20, - 'thumbnail': 're:https?://.*?\.jpg', + 'thumbnail': r're:https?://.*?\.jpg', }, }, { 'url': 'http://player.vimple.ru/iframe/52e1beec-1314-4a83-aeac-c61562eadbf9', diff --git a/youtube_dl/extractor/vk.py b/youtube_dl/extractor/vk.py index 1990e7093..6e6c3a0e1 100644 --- a/youtube_dl/extractor/vk.py +++ b/youtube_dl/extractor/vk.py @@ -245,7 +245,7 @@ class VKIE(VKBaseIE): }, }, { - # finished live stream, live_mp4 + # finished live stream, postlive_mp4 'url': 'https://vk.com/videos-387766?z=video-387766_456242764%2Fpl_-387766_-2', 'md5': '90d22d051fccbbe9becfccc615be6791', 'info_dict': { @@ -258,7 +258,7 @@ class VKIE(VKBaseIE): }, }, { - # live stream, hls and rtmp links,most likely already finished live + # live stream, hls and rtmp links, most likely already finished live # stream by the time you are reading this comment 'url': 'https://vk.com/video-140332_456239111', 'only_matching': True, @@ -378,12 +378,24 @@ class VKIE(VKBaseIE): if not data: data = self._parse_json( self._search_regex( - r'\s*({.+?})\s*', info_page, 'json'), - video_id)['player']['params'][0] + r'\s*({.+?})\s*', info_page, 'json', default='{}'), + video_id) + if data: + data = data['player']['params'][0] + + if not data: + data = self._parse_json( + self._search_regex( + r'var\s+playerParams\s*=\s*({.+?})\s*;\s*\n', info_page, + 'player params'), + video_id)['params'][0] title = unescapeHTML(data['md_title']) - if data.get('live') == 2: + # 2 = live + # 3 = post live (finished live) + is_live = data.get('live') == 2 + if 
is_live: title = self._live_title(title) timestamp = unified_timestamp(self._html_search_regex( @@ -398,7 +410,8 @@ class VKIE(VKBaseIE): for format_id, format_url in data.items(): if not isinstance(format_url, compat_str) or not format_url.startswith(('http', '//', 'rtmp')): continue - if format_id.startswith(('url', 'cache')) or format_id in ('extra_data', 'live_mp4'): + if (format_id.startswith(('url', 'cache')) or + format_id in ('extra_data', 'live_mp4', 'postlive_mp4')): height = int_or_none(self._search_regex( r'^(?:url|cache)(\d+)', format_id, 'height', default=None)) formats.append({ @@ -408,8 +421,9 @@ class VKIE(VKBaseIE): }) elif format_id == 'hls': formats.extend(self._extract_m3u8_formats( - format_url, video_id, 'mp4', m3u8_id=format_id, - fatal=False, live=True)) + format_url, video_id, 'mp4', + entry_protocol='m3u8' if is_live else 'm3u8_native', + m3u8_id=format_id, fatal=False, live=is_live)) elif format_id == 'rtmp': formats.append({ 'format_id': format_id, @@ -427,6 +441,7 @@ class VKIE(VKBaseIE): 'duration': data.get('duration'), 'timestamp': timestamp, 'view_count': view_count, + 'is_live': is_live, } diff --git a/youtube_dl/extractor/vodlocker.py b/youtube_dl/extractor/vodlocker.py index bbfa6e5f2..02c9617d2 100644 --- a/youtube_dl/extractor/vodlocker.py +++ b/youtube_dl/extractor/vodlocker.py @@ -20,7 +20,7 @@ class VodlockerIE(InfoExtractor): 'id': 'e8wvyzz4sl42', 'ext': 'mp4', 'title': 'Germany vs Brazil', - 'thumbnail': 're:http://.*\.jpg', + 'thumbnail': r're:http://.*\.jpg', }, }] diff --git a/youtube_dl/extractor/voicerepublic.py b/youtube_dl/extractor/voicerepublic.py index 4f1a99a89..59e1359c4 100644 --- a/youtube_dl/extractor/voicerepublic.py +++ b/youtube_dl/extractor/voicerepublic.py @@ -26,7 +26,7 @@ class VoiceRepublicIE(InfoExtractor): 'ext': 'm4a', 'title': 'Watching the Watchers: Building a Sousveillance State', 'description': 'Secret surveillance programs have metadata too. 
The people and companies that operate secret surveillance programs can be surveilled.', - 'thumbnail': 're:^https?://.*\.(?:png|jpg)$', + 'thumbnail': r're:^https?://.*\.(?:png|jpg)$', 'duration': 1800, 'view_count': int, } diff --git a/youtube_dl/extractor/vporn.py b/youtube_dl/extractor/vporn.py index e22900f8d..858ac9e71 100644 --- a/youtube_dl/extractor/vporn.py +++ b/youtube_dl/extractor/vporn.py @@ -23,7 +23,7 @@ class VpornIE(InfoExtractor): 'ext': 'mp4', 'title': 'Violet on her 19th birthday', 'description': 'Violet dances in front of the camera which is sure to get you horny.', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'uploader': 'kileyGrope', 'categories': ['Masturbation', 'Teen'], 'duration': 393, @@ -41,7 +41,7 @@ class VpornIE(InfoExtractor): 'ext': 'mp4', 'title': 'Hana Shower', 'description': 'Hana showers at the bathroom.', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'uploader': 'Hmmmmm', 'categories': ['Big Boobs', 'Erotic', 'Teen', 'Female', '720p'], 'duration': 588, diff --git a/youtube_dl/extractor/vube.py b/youtube_dl/extractor/vube.py index 10ca6acb1..8ce3a6b81 100644 --- a/youtube_dl/extractor/vube.py +++ b/youtube_dl/extractor/vube.py @@ -26,7 +26,7 @@ class VubeIE(InfoExtractor): 'ext': 'mp4', 'title': 'Best Drummer Ever [HD]', 'description': 'md5:2d63c4b277b85c2277761c2cf7337d71', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'uploader': 'William', 'timestamp': 1406876915, 'upload_date': '20140801', @@ -45,7 +45,7 @@ class VubeIE(InfoExtractor): 'ext': 'mp4', 'title': 'Chiara Grispo - Price Tag by Jessie J', 'description': 'md5:8ea652a1f36818352428cb5134933313', - 'thumbnail': 're:^http://frame\.thestaticvube\.com/snap/[0-9x]+/102e7e63057-5ebc-4f5c-4065-6ce4ebde131f\.jpg$', + 'thumbnail': r're:^http://frame\.thestaticvube\.com/snap/[0-9x]+/102e7e63057-5ebc-4f5c-4065-6ce4ebde131f\.jpg$', 'uploader': 'Chiara.Grispo', 'timestamp': 
1388743358, 'upload_date': '20140103', @@ -65,7 +65,7 @@ class VubeIE(InfoExtractor): 'ext': 'mp4', 'title': 'My 7 year old Sister and I singing "Alive" by Krewella', 'description': 'md5:40bcacb97796339f1690642c21d56f4a', - 'thumbnail': 're:^http://frame\.thestaticvube\.com/snap/[0-9x]+/102265d5a9f-0f17-4f6b-5753-adf08484ee1e\.jpg$', + 'thumbnail': r're:^http://frame\.thestaticvube\.com/snap/[0-9x]+/102265d5a9f-0f17-4f6b-5753-adf08484ee1e\.jpg$', 'uploader': 'Seraina', 'timestamp': 1396492438, 'upload_date': '20140403', @@ -84,7 +84,7 @@ class VubeIE(InfoExtractor): 'ext': 'mp4', 'title': 'Frozen - Let It Go Cover by Siren Gene', 'description': 'My rendition of "Let It Go" originally sung by Idina Menzel.', - 'thumbnail': 're:^http://frame\.thestaticvube\.com/snap/[0-9x]+/10283ab622a-86c9-4681-51f2-30d1f65774af\.jpg$', + 'thumbnail': r're:^http://frame\.thestaticvube\.com/snap/[0-9x]+/10283ab622a-86c9-4681-51f2-30d1f65774af\.jpg$', 'uploader': 'Siren', 'timestamp': 1395448018, 'upload_date': '20140322', diff --git a/youtube_dl/extractor/walla.py b/youtube_dl/extractor/walla.py index 8b9488340..cbb548672 100644 --- a/youtube_dl/extractor/walla.py +++ b/youtube_dl/extractor/walla.py @@ -20,7 +20,7 @@ class WallaIE(InfoExtractor): 'ext': 'flv', 'title': 'וואן דיירקשן: ההיסטריה', 'description': 'md5:de9e2512a92442574cdb0913c49bc4d8', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 'duration': 3600, }, 'params': { diff --git a/youtube_dl/extractor/watchindianporn.py b/youtube_dl/extractor/watchindianporn.py index 5d3b5bdb4..ed099beea 100644 --- a/youtube_dl/extractor/watchindianporn.py +++ b/youtube_dl/extractor/watchindianporn.py @@ -22,7 +22,7 @@ class WatchIndianPornIE(InfoExtractor): 'display_id': 'hot-milf-from-kerala-shows-off-her-gorgeous-large-breasts-on-camera', 'ext': 'mp4', 'title': 'Hot milf from kerala shows off her gorgeous large breasts on camera', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': 
r're:^https?://.*\.jpg$', 'uploader': 'LoveJay', 'upload_date': '20160428', 'duration': 226, diff --git a/youtube_dl/extractor/webcaster.py b/youtube_dl/extractor/webcaster.py index 7486cb347..e4b65f54f 100644 --- a/youtube_dl/extractor/webcaster.py +++ b/youtube_dl/extractor/webcaster.py @@ -20,7 +20,7 @@ class WebcasterIE(InfoExtractor): 'id': 'c8cefd240aa593681c8d068cff59f407_hd', 'ext': 'mp4', 'title': 'Сибирь - Нефтехимик. Лучшие моменты первого периода', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', }, }, { 'url': 'http://bl.webcaster.pro/media/start/free_6246c7a4453ac4c42b4398f840d13100_hd/2_2991109016/e8d0d82587ef435480118f9f9c41db41/4635726126', diff --git a/youtube_dl/extractor/webofstories.py b/youtube_dl/extractor/webofstories.py index 7aea47ed5..1eb1f6702 100644 --- a/youtube_dl/extractor/webofstories.py +++ b/youtube_dl/extractor/webofstories.py @@ -19,7 +19,7 @@ class WebOfStoriesIE(InfoExtractor): 'id': '4536', 'ext': 'mp4', 'title': 'The temperature of the sun', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'description': 'Hans Bethe talks about calculating the temperature of the sun', 'duration': 238, } @@ -30,7 +30,7 @@ class WebOfStoriesIE(InfoExtractor): 'id': '55908', 'ext': 'mp4', 'title': 'The story of Gemmata obscuriglobus', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'description': 'Planctomycete talks about The story of Gemmata obscuriglobus', 'duration': 169, }, @@ -42,7 +42,7 @@ class WebOfStoriesIE(InfoExtractor): 'id': '54215', 'ext': 'mp4', 'title': '"A Leg to Stand On"', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'description': 'Oliver Sacks talks about the death and resurrection of a limb', 'duration': 97, }, @@ -134,7 +134,7 @@ class WebOfStoriesPlaylistIE(InfoExtractor): entries = [ self.url_result('http://www.webofstories.com/play/%s' % video_number, 'WebOfStories') - for video_number 
in set(re.findall('href="/playAll/%s\?sId=(\d+)"' % playlist_id, webpage)) + for video_number in set(re.findall(r'href="/playAll/%s\?sId=(\d+)"' % playlist_id, webpage)) ] title = self._search_regex( diff --git a/youtube_dl/extractor/weiqitv.py b/youtube_dl/extractor/weiqitv.py index 8e09156c2..7e0befd39 100644 --- a/youtube_dl/extractor/weiqitv.py +++ b/youtube_dl/extractor/weiqitv.py @@ -37,11 +37,11 @@ class WeiqiTVIE(InfoExtractor): page = self._download_webpage(url, media_id) info_json_str = self._search_regex( - 'var\s+video\s*=\s*(.+});', page, 'info json str') + r'var\s+video\s*=\s*(.+});', page, 'info json str') info_json = self._parse_json(info_json_str, media_id) letvcloud_url = self._search_regex( - 'var\s+letvurl\s*=\s*"([^"]+)', page, 'letvcloud url') + r'var\s+letvurl\s*=\s*"([^"]+)', page, 'letvcloud url') return { '_type': 'url_transparent', diff --git a/youtube_dl/extractor/xbef.py b/youtube_dl/extractor/xbef.py index e4a2baad2..4c41e98b2 100644 --- a/youtube_dl/extractor/xbef.py +++ b/youtube_dl/extractor/xbef.py @@ -14,7 +14,7 @@ class XBefIE(InfoExtractor): 'ext': 'mp4', 'title': 'md5:7358a9faef8b7b57acda7c04816f170e', 'age_limit': 18, - 'thumbnail': 're:^http://.*\.jpg', + 'thumbnail': r're:^http://.*\.jpg', } } diff --git a/youtube_dl/extractor/xfileshare.py b/youtube_dl/extractor/xfileshare.py index de344bad2..e616adce3 100644 --- a/youtube_dl/extractor/xfileshare.py +++ b/youtube_dl/extractor/xfileshare.py @@ -44,7 +44,7 @@ class XFileShareIE(InfoExtractor): 'id': '06y9juieqpmi', 'ext': 'mp4', 'title': 'Rebecca Black My Moment Official Music Video Reaction-6GK87Rc8bzQ', - 'thumbnail': 're:http://.*\.jpg', + 'thumbnail': r're:http://.*\.jpg', }, }, { 'url': 'http://gorillavid.in/embed-z08zf8le23c6-960x480.html', @@ -56,7 +56,7 @@ class XFileShareIE(InfoExtractor): 'id': '3rso4kdn6f9m', 'ext': 'mp4', 'title': 'Micro Pig piglets ready on 16th July 2009-bG0PdrCdxUc', - 'thumbnail': 're:http://.*\.jpg', + 'thumbnail': r're:http://.*\.jpg', } }, 
{ 'url': 'http://movpod.in/0wguyyxi1yca', @@ -67,7 +67,7 @@ class XFileShareIE(InfoExtractor): 'id': '3ivfabn7573c', 'ext': 'mp4', 'title': 'youtube-dl test video \'äBaW_jenozKc.mp4.mp4', - 'thumbnail': 're:http://.*\.jpg', + 'thumbnail': r're:http://.*\.jpg', }, 'skip': 'Video removed', }, { diff --git a/youtube_dl/extractor/xhamster.py b/youtube_dl/extractor/xhamster.py index bd8e1af2e..36a8c9840 100644 --- a/youtube_dl/extractor/xhamster.py +++ b/youtube_dl/extractor/xhamster.py @@ -5,8 +5,8 @@ import re from .common import InfoExtractor from ..utils import ( dict_get, - float_or_none, int_or_none, + parse_duration, unified_strdate, ) @@ -22,7 +22,7 @@ class XHamsterIE(InfoExtractor): 'title': 'FemaleAgent Shy beauty takes the bait', 'upload_date': '20121014', 'uploader': 'Ruseful2011', - 'duration': 893.52, + 'duration': 893, 'age_limit': 18, }, }, { @@ -33,7 +33,7 @@ class XHamsterIE(InfoExtractor): 'title': 'Britney Spears Sexy Booty', 'upload_date': '20130914', 'uploader': 'jojo747400', - 'duration': 200.48, + 'duration': 200, 'age_limit': 18, }, 'params': { @@ -48,7 +48,7 @@ class XHamsterIE(InfoExtractor): 'title': '....', 'upload_date': '20160208', 'uploader': 'parejafree', - 'duration': 72.0, + 'duration': 72, 'age_limit': 18, }, 'params': { @@ -101,9 +101,9 @@ class XHamsterIE(InfoExtractor): r''']+poster=(?P["'])(?P.+?)(?P=q)[^>]*>'''], webpage, 'thumbnail', fatal=False, group='thumbnail') - duration = float_or_none(self._search_regex( - r'(["\'])duration\1\s*:\s*(["\'])(?P.+?)\2', - webpage, 'duration', fatal=False, group='duration')) + duration = parse_duration(self._search_regex( + r'Runtime:\s*\s*([\d:]+)', webpage, + 'duration', fatal=False)) view_count = int_or_none(self._search_regex( r'content=["\']User(?:View|Play)s:(\d+)', diff --git a/youtube_dl/extractor/xuite.py b/youtube_dl/extractor/xuite.py index 4b9c1ee9c..e0818201a 100644 --- a/youtube_dl/extractor/xuite.py +++ b/youtube_dl/extractor/xuite.py @@ -24,7 +24,7 @@ class 
XuiteIE(InfoExtractor): 'id': '3860914', 'ext': 'mp3', 'title': '孤單南半球-歐德陽', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 247.246, 'timestamp': 1314932940, 'upload_date': '20110902', @@ -40,7 +40,7 @@ class XuiteIE(InfoExtractor): 'id': '25925099', 'ext': 'mp4', 'title': 'BigBuckBunny_320x180', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 596.458, 'timestamp': 1454242500, 'upload_date': '20160131', @@ -58,7 +58,7 @@ class XuiteIE(InfoExtractor): 'ext': 'mp4', 'title': '暗殺教室 02', 'description': '字幕:【極影字幕社】', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 1384.907, 'timestamp': 1421481240, 'upload_date': '20150117', diff --git a/youtube_dl/extractor/yesjapan.py b/youtube_dl/extractor/yesjapan.py index 112a6c030..681338c96 100644 --- a/youtube_dl/extractor/yesjapan.py +++ b/youtube_dl/extractor/yesjapan.py @@ -21,7 +21,7 @@ class YesJapanIE(InfoExtractor): 'ext': 'mp4', 'timestamp': 1416391590, 'upload_date': '20141119', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', } } diff --git a/youtube_dl/extractor/yinyuetai.py b/youtube_dl/extractor/yinyuetai.py index 834d860af..1fd8d35c6 100644 --- a/youtube_dl/extractor/yinyuetai.py +++ b/youtube_dl/extractor/yinyuetai.py @@ -18,7 +18,7 @@ class YinYueTaiIE(InfoExtractor): 'title': '少女时代_PARTY_Music Video Teaser', 'creator': '少女时代', 'duration': 25, - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', }, }, { 'url': 'http://v.yinyuetai.com/video/h5/2322376', diff --git a/youtube_dl/extractor/ynet.py b/youtube_dl/extractor/ynet.py index 0d943c343..c4ae4d88e 100644 --- a/youtube_dl/extractor/ynet.py +++ b/youtube_dl/extractor/ynet.py @@ -17,7 +17,7 @@ class YnetIE(InfoExtractor): 'id': 'L-11659-99244', 'ext': 'flv', 'title': 'איש לא יודע מאיפה באנו', - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', 
} }, { 'url': 'http://hot.ynet.co.il/home/0,7340,L-8859-84418,00.html', @@ -25,7 +25,7 @@ class YnetIE(InfoExtractor): 'id': 'L-8859-84418', 'ext': 'flv', 'title': "צפו: הנשיקה הלוהטת של תורגי' ויוליה פלוטקין", - 'thumbnail': 're:^https?://.*\.jpg', + 'thumbnail': r're:^https?://.*\.jpg', } } ] diff --git a/youtube_dl/extractor/youporn.py b/youtube_dl/extractor/youporn.py index 0265a64a7..34ab878a4 100644 --- a/youtube_dl/extractor/youporn.py +++ b/youtube_dl/extractor/youporn.py @@ -24,7 +24,7 @@ class YouPornIE(InfoExtractor): 'ext': 'mp4', 'title': 'Sex Ed: Is It Safe To Masturbate Daily?', 'description': 'Love & Sex Answers: http://bit.ly/DanAndJenn -- Is It Unhealthy To Masturbate Daily?', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'uploader': 'Ask Dan And Jennifer', 'upload_date': '20101221', 'average_rating': int, @@ -43,7 +43,7 @@ class YouPornIE(InfoExtractor): 'ext': 'mp4', 'title': 'Big Tits Awesome Brunette On amazing webcam show', 'description': 'http://sweetlivegirls.com Big Tits Awesome Brunette On amazing webcam show.mp4', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'uploader': 'Unknown', 'upload_date': '20111125', 'average_rating': int, diff --git a/youtube_dl/extractor/yourupload.py b/youtube_dl/extractor/yourupload.py index 4e25d6f22..4ce327845 100644 --- a/youtube_dl/extractor/yourupload.py +++ b/youtube_dl/extractor/yourupload.py @@ -19,7 +19,7 @@ class YourUploadIE(InfoExtractor): 'id': '14i14h', 'ext': 'mp4', 'title': 'BigBuckBunny_320x180.mp4', - 'thumbnail': 're:^https?://.*\.jpe?g', + 'thumbnail': r're:^https?://.*\.jpe?g', } }, { diff --git a/youtube_dl/extractor/youtube.py b/youtube_dl/extractor/youtube.py index bd24a2838..335568a10 100644 --- a/youtube_dl/extractor/youtube.py +++ b/youtube_dl/extractor/youtube.py @@ -376,7 +376,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor): 'title': 'youtube-dl test video "\'/\\ä↭𝕐', 'uploader': 'Philipp Hagemeister', 
'uploader_id': 'phihag', - 'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/phihag', + 'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/phihag', 'upload_date': '20121002', 'license': 'Standard YouTube License', 'description': 'test chars: "\'/\\ä↭𝕐\ntest URL: https://github.com/rg3/youtube-dl/issues/1892\n\nThis is a test video for youtube-dl.\n\nFor more information, contact phihag@phihag.de .', @@ -403,7 +403,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor): 'iconic ep', 'iconic', 'love', 'it'], 'uploader': 'Icona Pop', 'uploader_id': 'IconaPop', - 'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/IconaPop', + 'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/IconaPop', 'license': 'Standard YouTube License', 'creator': 'Icona Pop', } @@ -420,7 +420,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor): 'description': 'md5:64249768eec3bc4276236606ea996373', 'uploader': 'justintimberlakeVEVO', 'uploader_id': 'justintimberlakeVEVO', - 'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/justintimberlakeVEVO', + 'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/justintimberlakeVEVO', 'license': 'Standard YouTube License', 'creator': 'Justin Timberlake', 'age_limit': 18, @@ -437,7 +437,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor): 'description': 'md5:09b78bd971f1e3e289601dfba15ca4f7', 'uploader': 'SET India', 'uploader_id': 'setindia', - 'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/setindia', + 'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/setindia', 'license': 'Standard YouTube License', 'age_limit': 18, } @@ -451,7 +451,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor): 'title': 'youtube-dl test video "\'/\\ä↭𝕐', 'uploader': 'Philipp Hagemeister', 'uploader_id': 'phihag', - 'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/phihag', + 'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/phihag', 'upload_date': '20121002', 'license': 'Standard YouTube License', 'description': 'test chars: 
"\'/\\ä↭𝕐\ntest URL: https://github.com/rg3/youtube-dl/issues/1892\n\nThis is a test video for youtube-dl.\n\nFor more information, contact phihag@phihag.de .', @@ -472,7 +472,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor): 'ext': 'm4a', 'upload_date': '20121002', 'uploader_id': '8KVIDEO', - 'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/8KVIDEO', + 'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/8KVIDEO', 'description': '', 'uploader': '8KVIDEO', 'license': 'Standard YouTube License', @@ -531,7 +531,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor): 'upload_date': '20100909', 'uploader': 'The Amazing Atheist', 'uploader_id': 'TheAmazingAtheist', - 'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/TheAmazingAtheist', + 'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/TheAmazingAtheist', 'license': 'Standard YouTube License', 'title': 'Burning Everyone\'s Koran', 'description': 'SUBSCRIBE: http://www.youtube.com/saturninefilms\n\nEven Obama has taken a stand against freedom on this issue: http://www.huffingtonpost.com/2010/09/09/obama-gma-interview-quran_n_710282.html', @@ -544,10 +544,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor): 'id': 'HtVdAasjOgU', 'ext': 'mp4', 'title': 'The Witcher 3: Wild Hunt - The Sword Of Destiny Trailer', - 'description': 're:(?s).{100,}About the Game\n.*?The Witcher 3: Wild Hunt.{100,}', + 'description': r're:(?s).{100,}About the Game\n.*?The Witcher 3: Wild Hunt.{100,}', 'uploader': 'The Witcher', 'uploader_id': 'WitcherGame', - 'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/WitcherGame', + 'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/WitcherGame', 'upload_date': '20140605', 'license': 'Standard YouTube License', 'age_limit': 18, @@ -563,7 +563,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor): 'description': 'md5:33765bb339e1b47e7e72b5490139bb41', 'uploader': 'LloydVEVO', 'uploader_id': 'LloydVEVO', - 'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/LloydVEVO', + 
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/LloydVEVO', 'upload_date': '20110629', 'license': 'Standard YouTube License', 'age_limit': 18, @@ -577,7 +577,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor): 'ext': 'mp4', 'upload_date': '20100430', 'uploader_id': 'deadmau5', - 'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/deadmau5', + 'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/deadmau5', 'creator': 'deadmau5', 'description': 'md5:12c56784b8032162bb936a5f76d55360', 'uploader': 'deadmau5', @@ -597,7 +597,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor): 'ext': 'mp4', 'upload_date': '20150827', 'uploader_id': 'olympic', - 'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/olympic', + 'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/olympic', 'license': 'Standard YouTube License', 'description': 'HO09 - Women - GER-AUS - Hockey - 31 July 2012 - London 2012 Olympic Games', 'uploader': 'Olympic', @@ -616,7 +616,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor): 'stretched_ratio': 16 / 9., 'upload_date': '20110310', 'uploader_id': 'AllenMeow', - 'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/AllenMeow', + 'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/AllenMeow', 'description': 'made by Wacom from Korea | 字幕&加油添醋 by TY\'s Allen | 感謝heylisa00cavey1001同學熱情提供梗及翻譯', 'uploader': '孫艾倫', 'license': 'Standard YouTube License', @@ -650,7 +650,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor): 'description': 'md5:116377fd2963b81ec4ce64b542173306', 'upload_date': '20150625', 'uploader_id': 'dorappi2000', - 'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/dorappi2000', + 'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/dorappi2000', 'uploader': 'dorappi2000', 'license': 'Standard YouTube License', 'formats': 'mincount:32', @@ -693,7 +693,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor): 'upload_date': '20150721', 'uploader': 'Beer Games Beer', 'uploader_id': 'beergamesbeer', - 'uploader_url': 
're:https?://(?:www\.)?youtube\.com/user/beergamesbeer', + 'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/beergamesbeer', 'license': 'Standard YouTube License', }, }, { @@ -705,7 +705,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor): 'upload_date': '20150721', 'uploader': 'Beer Games Beer', 'uploader_id': 'beergamesbeer', - 'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/beergamesbeer', + 'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/beergamesbeer', 'license': 'Standard YouTube License', }, }, { @@ -717,7 +717,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor): 'upload_date': '20150721', 'uploader': 'Beer Games Beer', 'uploader_id': 'beergamesbeer', - 'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/beergamesbeer', + 'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/beergamesbeer', 'license': 'Standard YouTube License', }, }, { @@ -729,7 +729,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor): 'upload_date': '20150721', 'uploader': 'Beer Games Beer', 'uploader_id': 'beergamesbeer', - 'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/beergamesbeer', + 'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/beergamesbeer', 'license': 'Standard YouTube License', }, }], @@ -769,7 +769,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor): 'description': 'md5:8085699c11dc3f597ce0410b0dcbb34a', 'upload_date': '20151119', 'uploader_id': 'IronSoulElf', - 'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/IronSoulElf', + 'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/IronSoulElf', 'uploader': 'IronSoulElf', 'license': 'Standard YouTube License', 'creator': 'Todd Haberman, Daniel Law Heath & Aaron Kaplan', @@ -810,7 +810,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor): 'description': 'md5:a677553cf0840649b731a3024aeff4cc', 'upload_date': '20150127', 'uploader_id': 'BerkmanCenter', - 'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/BerkmanCenter', + 'uploader_url': 
r're:https?://(?:www\.)?youtube\.com/user/BerkmanCenter', 'uploader': 'BerkmanCenter', 'license': 'Creative Commons Attribution license (reuse allowed)', }, @@ -829,7 +829,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor): 'upload_date': '20151119', 'uploader': 'Bernie 2016', 'uploader_id': 'UCH1dpzjCEiGAt8CXkryhkZg', - 'uploader_url': 're:https?://(?:www\.)?youtube\.com/channel/UCH1dpzjCEiGAt8CXkryhkZg', + 'uploader_url': r're:https?://(?:www\.)?youtube\.com/channel/UCH1dpzjCEiGAt8CXkryhkZg', 'license': 'Creative Commons Attribution license (reuse allowed)', }, 'params': { @@ -856,7 +856,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor): 'upload_date': '20150811', 'uploader': 'FlixMatrix', 'uploader_id': 'FlixMatrixKaravan', - 'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/FlixMatrixKaravan', + 'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/FlixMatrixKaravan', 'license': 'Standard YouTube License', }, 'params': { @@ -1877,7 +1877,7 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor): 'title': "Smiley's People 01 detective, Adventure Series, Action", 'uploader': 'STREEM', 'uploader_id': 'UCyPhqAZgwYWZfxElWVbVJng', - 'uploader_url': 're:https?://(?:www\.)?youtube\.com/channel/UCyPhqAZgwYWZfxElWVbVJng', + 'uploader_url': r're:https?://(?:www\.)?youtube\.com/channel/UCyPhqAZgwYWZfxElWVbVJng', 'upload_date': '20150526', 'license': 'Standard YouTube License', 'description': 'md5:507cdcb5a49ac0da37a920ece610be80', @@ -1898,7 +1898,7 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor): 'title': 'Small Scale Baler and Braiding Rugs', 'uploader': 'Backus-Page House Museum', 'uploader_id': 'backuspagemuseum', - 'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/backuspagemuseum', + 'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/backuspagemuseum', 'upload_date': '20161008', 'license': 'Standard YouTube License', 'description': 'md5:800c0c78d5eb128500bffd4f0b4f2e8a', @@ -2186,7 +2186,7 @@ class 
YoutubeLiveIE(YoutubeBaseInfoExtractor): 'title': 'The Young Turks - Live Main Show', 'uploader': 'The Young Turks', 'uploader_id': 'TheYoungTurks', - 'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/TheYoungTurks', + 'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/TheYoungTurks', 'upload_date': '20150715', 'license': 'Standard YouTube License', 'description': 'md5:438179573adcdff3c97ebb1ee632b891', diff --git a/youtube_dl/extractor/zapiks.py b/youtube_dl/extractor/zapiks.py index 22a9a57e8..bacb82eee 100644 --- a/youtube_dl/extractor/zapiks.py +++ b/youtube_dl/extractor/zapiks.py @@ -24,7 +24,7 @@ class ZapiksIE(InfoExtractor): 'ext': 'mp4', 'title': 'EP2S3 - Bon Appétit - Eh bé viva les pyrénées con!', 'description': 'md5:7054d6f6f620c6519be1fe710d4da847', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', 'duration': 528, 'timestamp': 1359044972, 'upload_date': '20130124', diff --git a/youtube_dl/extractor/zdf.py b/youtube_dl/extractor/zdf.py index 2ef177275..a365923fb 100644 --- a/youtube_dl/extractor/zdf.py +++ b/youtube_dl/extractor/zdf.py @@ -1,262 +1,312 @@ # coding: utf-8 from __future__ import unicode_literals -import functools import re from .common import InfoExtractor +from ..compat import compat_str from ..utils import ( - int_or_none, - unified_strdate, - OnDemandPagedList, - xpath_text, determine_ext, + int_or_none, + NO_DEFAULT, + orderedSet, + parse_codecs, qualities, - float_or_none, - ExtractorError, + try_get, + unified_timestamp, + update_url_query, + urljoin, ) -class ZDFIE(InfoExtractor): - _VALID_URL = r'(?:zdf:|zdf:video:|https?://www\.zdf\.de/ZDFmediathek(?:#)?/(.*beitrag/(?:video/)?))(?P[0-9]+)(?:/[^/?]+)?(?:\?.*)?' 
+class ZDFBaseIE(InfoExtractor): + def _call_api(self, url, player, referrer, video_id): + return self._download_json( + url, video_id, 'Downloading JSON content', + headers={ + 'Referer': referrer, + 'Api-Auth': 'Bearer %s' % player['apiToken'], + }) + + def _extract_player(self, webpage, video_id, fatal=True): + return self._parse_json( + self._search_regex( + r'(?s)data-zdfplayer-jsb=(["\'])(?P<json>{.+?})\1', webpage, + 'player JSON', default='{}' if not fatal else NO_DEFAULT, + group='json'), + video_id) + + +class ZDFIE(ZDFBaseIE): + _VALID_URL = r'https?://www\.zdf\.de/(?:[^/]+/)*(?P<id>[^/?]+)\.html' + _QUALITIES = ('auto', 'low', 'med', 'high', 'veryhigh') _TESTS = [{ - 'url': 'http://www.zdf.de/ZDFmediathek/beitrag/video/2037704/ZDFspezial---Ende-des-Machtpokers--?bc=sts;stt', + 'url': 'https://www.zdf.de/service-und-hilfe/die-neue-zdf-mediathek/zdfmediathek-trailer-100.html', 'info_dict': { - 'id': '2037704', - 'ext': 'webm', - 'title': 'ZDFspezial - Ende des Machtpokers', - 'description': 'Union und SPD haben sich auf einen Koalitionsvertrag geeinigt. Aber was bedeutet das für die Bürger?
Sehen Sie hierzu das ZDFspezial "Ende des Machtpokers - Große Koalition für Deutschland".', - 'duration': 1022, - 'uploader': 'spezial', - 'uploader_id': '225948', - 'upload_date': '20131127', - }, - 'skip': 'Videos on ZDF.de are depublicised in short order', + 'id': 'zdfmediathek-trailer-100', + 'ext': 'mp4', + 'title': 'Die neue ZDFmediathek', + 'description': 'md5:3003d36487fb9a5ea2d1ff60beb55e8d', + 'duration': 30, + 'timestamp': 1477627200, + 'upload_date': '20161028', + } + }, { + 'url': 'https://www.zdf.de/filme/taunuskrimi/die-lebenden-und-die-toten-1---ein-taunuskrimi-100.html', + 'only_matching': True, + }, { + 'url': 'https://www.zdf.de/dokumentation/planet-e/planet-e-uebersichtsseite-weitere-dokumentationen-von-planet-e-100.html', + 'only_matching': True, }] - def _parse_smil_formats(self, smil, smil_url, video_id, namespace=None, f4m_params=None, transform_rtmp_url=None): - param_groups = {} - for param_group in smil.findall(self._xpath_ns('./head/paramGroup', namespace)): - group_id = param_group.attrib.get(self._xpath_ns('id', 'http://www.w3.org/XML/1998/namespace')) - params = {} - for param in param_group: - params[param.get('name')] = param.get('value') - param_groups[group_id] = params + @staticmethod + def _extract_subtitles(src): + subtitles = {} + for caption in try_get(src, lambda x: x['captions'], list) or []: + subtitle_url = caption.get('uri') + if subtitle_url and isinstance(subtitle_url, compat_str): + lang = caption.get('language', 'deu') + subtitles.setdefault(lang, []).append({ + 'url': subtitle_url, + }) + return subtitles + + def _extract_format(self, video_id, formats, format_urls, meta): + format_url = meta.get('url') + if not format_url or not isinstance(format_url, compat_str): + return + if format_url in format_urls: + return + format_urls.add(format_url) + mime_type = meta.get('mimeType') + ext = determine_ext(format_url) + if mime_type == 'application/x-mpegURL' or ext == 'm3u8': + formats.extend(self._extract_m3u8_formats( + 
format_url, video_id, 'mp4', m3u8_id='hls', + entry_protocol='m3u8_native', fatal=False)) + elif mime_type == 'application/f4m+xml' or ext == 'f4m': + formats.extend(self._extract_f4m_formats( + update_url_query(format_url, {'hdcore': '3.7.0'}), video_id, f4m_id='hds', fatal=False)) + else: + f = parse_codecs(meta.get('mimeCodec')) + format_id = ['http'] + for p in (meta.get('type'), meta.get('quality')): + if p and isinstance(p, compat_str): + format_id.append(p) + f.update({ + 'url': format_url, + 'format_id': '-'.join(format_id), + 'format_note': meta.get('quality'), + 'language': meta.get('language'), + 'quality': qualities(self._QUALITIES)(meta.get('quality')), + 'preference': -10, + }) + formats.append(f) + + def _extract_entry(self, url, content, video_id): + title = content.get('title') or content['teaserHeadline'] + + t = content['mainVideoContent']['http://zdf.de/rels/target'] + + ptmd_path = t.get('http://zdf.de/rels/streams/ptmd') + + if not ptmd_path: + ptmd_path = t[ + 'http://zdf.de/rels/streams/ptmd-template'].replace( + '{playerId}', 'portal') + + ptmd = self._download_json(urljoin(url, ptmd_path), video_id) formats = [] - for video in smil.findall(self._xpath_ns('.//video', namespace)): - src = video.get('src') - if not src: + track_uris = set() + for p in ptmd['priorityList']: + formitaeten = p.get('formitaeten') + if not isinstance(formitaeten, list): continue - bitrate = float_or_none(video.get('system-bitrate') or video.get('systemBitrate'), 1000) - group_id = video.get('paramGroup') - param_group = param_groups[group_id] - for proto in param_group['protocols'].split(','): - formats.append({ - 'url': '%s://%s' % (proto, param_group['host']), - 'app': param_group['app'], - 'play_path': src, - 'ext': 'flv', - 'format_id': '%s-%d' % (proto, bitrate), - 'tbr': bitrate, - }) + for f in formitaeten: + f_qualities = f.get('qualities') + if not isinstance(f_qualities, list): + continue + for quality in f_qualities: + tracks = try_get(quality, lambda 
x: x['audio']['tracks'], list) + if not tracks: + continue + for track in tracks: + self._extract_format( + video_id, formats, track_uris, { + 'url': track.get('uri'), + 'type': f.get('type'), + 'mimeType': f.get('mimeType'), + 'quality': quality.get('quality'), + 'language': track.get('language'), + }) self._sort_formats(formats) - return formats - def extract_from_xml_url(self, video_id, xml_url): - doc = self._download_xml( - xml_url, video_id, - note='Downloading video info', - errnote='Failed to download video info') - - status_code = doc.find('./status/statuscode') - if status_code is not None and status_code.text != 'ok': - code = status_code.text - if code == 'notVisibleAnymore': - message = 'Video %s is not available' % video_id - else: - message = '%s returned error: %s' % (self.IE_NAME, code) - raise ExtractorError(message, expected=True) - - title = doc.find('.//information/title').text - description = xpath_text(doc, './/information/detail', 'description') - duration = int_or_none(xpath_text(doc, './/details/lengthSec', 'duration')) - uploader = xpath_text(doc, './/details/originChannelTitle', 'uploader') - uploader_id = xpath_text(doc, './/details/originChannelId', 'uploader id') - upload_date = unified_strdate(xpath_text(doc, './/details/airtime', 'upload date')) - subtitles = {} - captions_url = doc.find('.//caption/url') - if captions_url is not None: - subtitles['de'] = [{ - 'url': captions_url.text, - 'ext': 'ttml', - }] - - def xml_to_thumbnails(fnode): - thumbnails = [] - for node in fnode: - thumbnail_url = node.text - if not thumbnail_url: + thumbnails = [] + layouts = try_get( + content, lambda x: x['teaserImageRef']['layouts'], dict) + if layouts: + for layout_key, layout_url in layouts.items(): + if not isinstance(layout_url, compat_str): continue thumbnail = { - 'url': thumbnail_url, + 'url': layout_url, + 'format_id': layout_key, } - if 'key' in node.attrib: - m = re.match('^([0-9]+)x([0-9]+)$', node.attrib['key']) - if m: - 
thumbnail['width'] = int(m.group(1)) - thumbnail['height'] = int(m.group(2)) + mobj = re.search(r'(?P<width>\d+)x(?P<height>\d+)', layout_key) + if mobj: + thumbnail.update({ + 'width': int(mobj.group('width')), + 'height': int(mobj.group('height')), + }) thumbnails.append(thumbnail) - return thumbnails - - thumbnails = xml_to_thumbnails(doc.findall('.//teaserimages/teaserimage')) - - format_nodes = doc.findall('.//formitaeten/formitaet') - quality = qualities(['veryhigh', 'high', 'med', 'low']) - - def get_quality(elem): - return quality(xpath_text(elem, 'quality')) - format_nodes.sort(key=get_quality) - format_ids = [] - formats = [] - for fnode in format_nodes: - video_url = fnode.find('url').text - is_available = 'http://www.metafilegenerator' not in video_url - if not is_available: - continue - format_id = fnode.attrib['basetype'] - quality = xpath_text(fnode, './quality', 'quality') - format_m = re.match(r'''(?x) - (?P<vcodec>[^_]+)_(?P<acodec>[^_]+)_(?P<container>[^_]+)_ - (?P<proto>[^_]+)_(?P<index>[^_]+)_(?P<indexproto>[^_]+) - ''', format_id) - - ext = determine_ext(video_url, None) or format_m.group('container') - if ext not in ('smil', 'f4m', 'm3u8'): - format_id = format_id + '-' + quality - if format_id in format_ids: - continue - - if ext == 'meta': - continue - elif ext == 'smil': - formats.extend(self._extract_smil_formats( - video_url, video_id, fatal=False)) - elif ext == 'm3u8': - # the certificates are misconfigured (see - # https://github.com/rg3/youtube-dl/issues/8665) - if video_url.startswith('https://'): - continue - formats.extend(self._extract_m3u8_formats( - video_url, video_id, 'mp4', m3u8_id=format_id, fatal=False)) - elif ext == 'f4m': - formats.extend(self._extract_f4m_formats( - video_url, video_id, f4m_id=format_id, fatal=False)) - else: - proto = format_m.group('proto').lower() - - abr = int_or_none(xpath_text(fnode, './audioBitrate', 'abr'), 1000) - vbr = int_or_none(xpath_text(fnode, './videoBitrate', 'vbr'), 1000) - - width = int_or_none(xpath_text(fnode, './width', 'width')) - height =
int_or_none(xpath_text(fnode, './height', 'height')) - - filesize = int_or_none(xpath_text(fnode, './filesize', 'filesize')) - - format_note = '' - if not format_note: - format_note = None - - formats.append({ - 'format_id': format_id, - 'url': video_url, - 'ext': ext, - 'acodec': format_m.group('acodec'), - 'vcodec': format_m.group('vcodec'), - 'abr': abr, - 'vbr': vbr, - 'width': width, - 'height': height, - 'filesize': filesize, - 'format_note': format_note, - 'protocol': proto, - '_available': is_available, - }) - format_ids.append(format_id) - - self._sort_formats(formats) return { 'id': video_id, 'title': title, - 'description': description, - 'duration': duration, + 'description': content.get('leadParagraph') or content.get('teasertext'), + 'duration': int_or_none(t.get('duration')), + 'timestamp': unified_timestamp(content.get('editorialDate')), 'thumbnails': thumbnails, - 'uploader': uploader, - 'uploader_id': uploader_id, - 'upload_date': upload_date, + 'subtitles': self._extract_subtitles(ptmd), + 'formats': formats, + } + + def _extract_regular(self, url, player, video_id): + content = self._call_api(player['content'], player, url, video_id) + return self._extract_entry(player['content'], content, video_id) + + def _extract_mobile(self, video_id): + document = self._download_json( + 'https://zdf-cdn.live.cellular.de/mediathekV2/document/%s' % video_id, + video_id)['document'] + + title = document['titel'] + + formats = [] + format_urls = set() + for f in document['formitaeten']: + self._extract_format(video_id, formats, format_urls, f) + self._sort_formats(formats) + + thumbnails = [] + teaser_bild = document.get('teaserBild') + if isinstance(teaser_bild, dict): + for thumbnail_key, thumbnail in teaser_bild.items(): + thumbnail_url = try_get( + thumbnail, lambda x: x['url'], compat_str) + if thumbnail_url: + thumbnails.append({ + 'url': thumbnail_url, + 'id': thumbnail_key, + 'width': int_or_none(thumbnail.get('width')), + 'height': 
int_or_none(thumbnail.get('height')), + }) + + return { + 'id': video_id, + 'title': title, + 'description': document.get('beschreibung'), + 'duration': int_or_none(document.get('length')), + 'timestamp': unified_timestamp(try_get( + document, lambda x: x['meta']['editorialDate'], compat_str)), + 'thumbnails': thumbnails, + 'subtitles': self._extract_subtitles(document), 'formats': formats, - 'subtitles': subtitles, } def _real_extract(self, url): video_id = self._match_id(url) - xml_url = 'http://www.zdf.de/ZDFmediathek/xmlservice/web/beitragsDetails?ak=web&id=%s' % video_id - return self.extract_from_xml_url(video_id, xml_url) + + webpage = self._download_webpage(url, video_id, fatal=False) + if webpage: + player = self._extract_player(webpage, url, fatal=False) + if player: + return self._extract_regular(url, player, video_id) + + return self._extract_mobile(video_id) -class ZDFChannelIE(InfoExtractor): - _VALID_URL = r'(?:zdf:topic:|https?://www\.zdf\.de/ZDFmediathek(?:#)?/.*kanaluebersicht/(?:[^/]+/)?)(?P<id>[0-9]+)' +class ZDFChannelIE(ZDFBaseIE): + _VALID_URL = r'https?://www\.zdf\.de/(?:[^/]+/)*(?P<id>[^/?#&]+)' _TESTS = [{ - 'url': 'http://www.zdf.de/ZDFmediathek#/kanaluebersicht/1586442/sendung/Titanic', + 'url': 'https://www.zdf.de/sport/das-aktuelle-sportstudio', 'info_dict': { - 'id': '1586442', + 'id': 'das-aktuelle-sportstudio', + 'title': 'das aktuelle sportstudio | ZDF', }, - 'playlist_count': 3, + 'playlist_count': 21, }, { - 'url': 'http://www.zdf.de/ZDFmediathek/kanaluebersicht/aktuellste/332', - 'only_matching': True, + 'url': 'https://www.zdf.de/dokumentation/planet-e', + 'info_dict': { + 'id': 'planet-e', + 'title': 'planet e.', + }, + 'playlist_count': 4, }, { - 'url': 'http://www.zdf.de/ZDFmediathek/kanaluebersicht/meist-gesehen/332', - 'only_matching': True, - }, { - 'url': 'http://www.zdf.de/ZDFmediathek/kanaluebersicht/_/1798716?bc=nrt;nrm?flash=off', + 'url': 'https://www.zdf.de/filme/taunuskrimi/', 'only_matching': True, }] - _PAGE_SIZE = 50 -
def _fetch_page(self, channel_id, page): - offset = page * self._PAGE_SIZE - xml_url = ( - 'http://www.zdf.de/ZDFmediathek/xmlservice/web/aktuellste?ak=web&offset=%d&maxLength=%d&id=%s' - % (offset, self._PAGE_SIZE, channel_id)) - doc = self._download_xml( - xml_url, channel_id, - note='Downloading channel info', - errnote='Failed to download channel info') - - title = doc.find('.//information/title').text - description = doc.find('.//information/detail').text - for asset in doc.findall('.//teasers/teaser'): - a_type = asset.find('./type').text - a_id = asset.find('./details/assetId').text - if a_type not in ('video', 'topic'): - continue - yield { - '_type': 'url', - 'playlist_title': title, - 'playlist_description': description, - 'url': 'zdf:%s:%s' % (a_type, a_id), - } + @classmethod + def suitable(cls, url): + return False if ZDFIE.suitable(url) else super(ZDFChannelIE, cls).suitable(url) def _real_extract(self, url): channel_id = self._match_id(url) - entries = OnDemandPagedList( - functools.partial(self._fetch_page, channel_id), self._PAGE_SIZE) - return { - '_type': 'playlist', - 'id': channel_id, - 'entries': entries, - } + webpage = self._download_webpage(url, channel_id) + + entries = [ + self.url_result(item_url, ie=ZDFIE.ie_key()) + for item_url in orderedSet(re.findall( + r'data-plusbar-url=["\'](http.+?\.html)', webpage))] + + return self.playlist_result( + entries, channel_id, self._og_search_title(webpage, fatal=False)) + + r""" + player = self._extract_player(webpage, channel_id) + + channel_id = self._search_regex( + r'docId\s*:\s*(["\'])(?P<id>(?!\1).+?)\1', webpage, + 'channel id', group='id') + + channel = self._call_api( + 'https://api.zdf.de/content/documents/%s.json' % channel_id, + player, url, channel_id) + + items = [] + for module in channel['module']: + for teaser in try_get(module, lambda x: x['teaser'], list) or []: + t = try_get( + teaser, lambda x: x['http://zdf.de/rels/target'], dict) + if not t: + continue + items.extend(try_get( +
t, + lambda x: x['resultsWithVideo']['http://zdf.de/rels/search/results'], + list) or []) + items.extend(try_get( + module, + lambda x: x['filterRef']['resultsWithVideo']['http://zdf.de/rels/search/results'], + list) or []) + + entries = [] + entry_urls = set() + for item in items: + t = try_get(item, lambda x: x['http://zdf.de/rels/target'], dict) + if not t: + continue + sharing_url = t.get('http://zdf.de/rels/sharing-url') + if not sharing_url or not isinstance(sharing_url, compat_str): + continue + if sharing_url in entry_urls: + continue + entry_urls.add(sharing_url) + entries.append(self.url_result( + sharing_url, ie=ZDFIE.ie_key(), video_id=t.get('id'))) + + return self.playlist_result(entries, channel_id, channel.get('title')) + """ diff --git a/youtube_dl/extractor/zingmp3.py b/youtube_dl/extractor/zingmp3.py index 0f0e9d0eb..adfdcaabf 100644 --- a/youtube_dl/extractor/zingmp3.py +++ b/youtube_dl/extractor/zingmp3.py @@ -95,7 +95,7 @@ class ZingMp3IE(ZingMp3BaseInfoExtractor): 'id': 'ZWZB9WAB', 'title': 'Xa Mãi Xa', 'ext': 'mp3', - 'thumbnail': 're:^https?://.*\.jpg$', + 'thumbnail': r're:^https?://.*\.jpg$', }, }, { 'url': 'http://mp3.zing.vn/video-clip/Let-It-Go-Frozen-OST-Sungha-Jung/ZW6BAEA0.html', diff --git a/youtube_dl/options.py b/youtube_dl/options.py index 53497fbc6..0eb4924b6 100644 --- a/youtube_dl/options.py +++ b/youtube_dl/options.py @@ -178,6 +178,10 @@ def parseOpts(overrideArguments=None): 'When given in the global configuration file /etc/youtube-dl.conf: ' 'Do not read the user configuration in ~/.config/youtube-dl/config ' '(%APPDATA%/youtube-dl/config.txt on Windows)') + general.add_option( + '--config-location', + dest='config_location', metavar='PATH', + help='Location of the configuration file; either the path to the config or its containing directory.') general.add_option( '--flat-playlist', action='store_const', dest='extract_flat', const='in_playlist', @@ -341,7 +345,7 @@ def parseOpts(overrideArguments=None): 
     authentication.add_option(
         '-2', '--twofactor',
         dest='twofactor', metavar='TWOFACTOR',
-        help='Two-factor auth code')
+        help='Two-factor authentication code')
     authentication.add_option(
         '-n', '--netrc',
         action='store_true', dest='usenetrc', default=False,
@@ -469,7 +473,7 @@ def parseOpts(overrideArguments=None):
     downloader.add_option(
         '--xattr-set-filesize',
         dest='xattr_set_filesize', action='store_true',
-        help='Set file xattribute ytdl.filesize with expected filesize (experimental)')
+        help='Set file xattribute ytdl.filesize with expected file size (experimental)')
     downloader.add_option(
         '--hls-prefer-native',
         dest='hls_prefer_native', action='store_true', default=None,
@@ -845,22 +849,32 @@ def parseOpts(overrideArguments=None):
             return conf
         command_line_conf = compat_conf(sys.argv[1:])
+        opts, args = parser.parse_args(command_line_conf)
-        if '--ignore-config' in command_line_conf:
-            system_conf = []
-            user_conf = []
+        system_conf = user_conf = custom_conf = []
+
+        if '--config-location' in command_line_conf:
+            location = compat_expanduser(opts.config_location)
+            if os.path.isdir(location):
+                location = os.path.join(location, 'youtube-dl.conf')
+            if not os.path.exists(location):
+                parser.error('config-location %s does not exist.'
% location)
+            custom_conf = _readOptions(location)
+        elif '--ignore-config' in command_line_conf:
+            pass
         else:
             system_conf = _readOptions('/etc/youtube-dl.conf')
-            if '--ignore-config' in system_conf:
-                user_conf = []
-            else:
+            if '--ignore-config' not in system_conf:
                 user_conf = _readUserConf()
         argv = system_conf + user_conf + command_line_conf
         opts, args = parser.parse_args(argv)
         if opts.verbose:
-            write_string('[debug] System config: ' + repr(_hide_login_info(system_conf)) + '\n')
-            write_string('[debug] User config: ' + repr(_hide_login_info(user_conf)) + '\n')
-            write_string('[debug] Command-line args: ' + repr(_hide_login_info(command_line_conf)) + '\n')
+            for conf_label, conf in (
+                    ('System config', system_conf),
+                    ('User config', user_conf),
+                    ('Custom config', custom_conf),
+                    ('Command-line args', command_line_conf)):
+                write_string('[debug] %s: %s\n' % (conf_label, repr(_hide_login_info(conf))))
     return parser, opts, args
diff --git a/youtube_dl/postprocessor/metadatafromtitle.py b/youtube_dl/postprocessor/metadatafromtitle.py
index 920573da9..164edd3a8 100644
--- a/youtube_dl/postprocessor/metadatafromtitle.py
+++ b/youtube_dl/postprocessor/metadatafromtitle.py
@@ -12,7 +12,7 @@ class MetadataFromTitlePP(PostProcessor):
         self._titleregex = self.format_to_regex(titleformat)
     def format_to_regex(self, fmt):
-        """
+        r"""
         Converts a string like
diff --git a/youtube_dl/utils.py b/youtube_dl/utils.py
index 528d87bb9..39dd6c49f 100644
--- a/youtube_dl/utils.py
+++ b/youtube_dl/utils.py
@@ -501,7 +501,7 @@ def sanitize_path(s):
     if drive_or_unc:
         norm_path.pop(0)
     sanitized_path = [
-        path_part if path_part in ['.', '..'] else re.sub('(?:[/<>:"\\|\\\\?\\*]|[\s.]$)', '#', path_part)
+        path_part if path_part in ['.', '..'] else re.sub(r'(?:[/<>:"\|\\?\*]|[\s.]$)', '#', path_part)
         for path_part in norm_path]
     if drive_or_unc:
         sanitized_path.insert(0, drive_or_unc +
os.path.sep)
@@ -1183,7 +1183,7 @@ def date_from_str(date_str):
         return today
     if date_str == 'yesterday':
         return today - datetime.timedelta(days=1)
-    match = re.match('(now|today)(?P[+-])(?P