Merge remote-tracking branch 'rg3/master'

This commit is contained in:
renalid 2016-07-04 09:14:10 +02:00
commit f1ec33355c
19 changed files with 300 additions and 187 deletions

View File

@ -6,8 +6,8 @@
--- ---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.07.03*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected. ### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.07.03.1*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.07.03** - [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.07.03.1**
### Before submitting an *issue* make sure you have: ### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections - [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: [] [debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj'] [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2016.07.03 [debug] youtube-dl version 2016.07.03.1
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {} [debug] Proxy map: {}

View File

@ -97,9 +97,17 @@ If you want to add support for a new site, first of all **make sure** this site
After you have ensured this site is distributing it's content legally, you can follow this quick list (assuming your service is called `yourextractor`): After you have ensured this site is distributing it's content legally, you can follow this quick list (assuming your service is called `yourextractor`):
1. [Fork this repository](https://github.com/rg3/youtube-dl/fork) 1. [Fork this repository](https://github.com/rg3/youtube-dl/fork)
2. Check out the source code with `git clone git@github.com:YOUR_GITHUB_USERNAME/youtube-dl.git` 2. Check out the source code with:
3. Start a new git branch with `cd youtube-dl; git checkout -b yourextractor`
git clone git@github.com:YOUR_GITHUB_USERNAME/youtube-dl.git
3. Start a new git branch with
cd youtube-dl
git checkout -b yourextractor
4. Start with this simple template and save it to `youtube_dl/extractor/yourextractor.py`: 4. Start with this simple template and save it to `youtube_dl/extractor/yourextractor.py`:
```python ```python
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
@ -143,16 +151,148 @@ After you have ensured this site is distributing it's content legally, you can f
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py). 5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. 6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc.
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L74-L252). Add tests and code for as many as you want. 7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L74-L252). Add tests and code for as many as you want.
8. Keep in mind that the only mandatory fields in info dict for successful extraction process are `id`, `title` and either `url` or `formats`, i.e. these are the critical data the extraction does not make any sense without. This means that [any field](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L148-L252) apart from aforementioned mandatory ones should be treated **as optional** and extraction should be **tolerate** to situations when sources for these fields can potentially be unavailable (even if they always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields. For example, if you have some intermediate dict `meta` that is a source of metadata and it has a key `summary` that you want to extract and put into resulting info dict as `description`, you should be ready that this key may be missing from the `meta` dict, i.e. you should extract it as `meta.get('summary')` and not `meta['summary']`. Similarly, you should pass `fatal=False` when extracting data from a webpage with `_search_regex/_html_search_regex`. 8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](http://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+.
9. Check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](http://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+. 9. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
10. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
$ git add youtube_dl/extractor/extractors.py $ git add youtube_dl/extractor/extractors.py
$ git add youtube_dl/extractor/yourextractor.py $ git add youtube_dl/extractor/yourextractor.py
$ git commit -m '[yourextractor] Add new extractor' $ git commit -m '[yourextractor] Add new extractor'
$ git push origin yourextractor $ git push origin yourextractor
11. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it. 10. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it.
In any case, thank you very much for your contributions! In any case, thank you very much for your contributions!
## youtube-dl coding conventions
This section introduces a guide lines for writing idiomatic, robust and future-proof extractor code.
Extractors are very fragile by nature since they depend on the layout of the source data provided by 3rd party media hoster out of your control and this layout tend to change. As an extractor implementer your task is not only to write code that will extract media links and metadata correctly but also to minimize code dependency on source's layout changes and even to make the code foresee potential future changes and be ready for that. This is important because it will allow extractor not to break on minor layout changes thus keeping old youtube-dl versions working. Even though this breakage issue is easily fixed by emitting a new version of youtube-dl with fix incorporated all the previous version become broken in all repositories and distros' packages that may not be so prompt in fetching the update from us. Needless to say some may never receive an update at all that is possible for non rolling release distros.
### Mandatory and optional metafields
For extraction to work youtube-dl relies on metadata your extractor extracts and provides to youtube-dl expressed by [information dictionary](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L75-L257) or simply *info dict*. Only the following meta fields in *info dict* are considered mandatory for successful extraction process by youtube-dl:
- `id` (media identifier)
- `title` (media title)
- `url` (media download URL) or `formats`
In fact only the last option is technically mandatory (i.e. if you can't figure out the download location of the media the extraction does not make any sense). But by convention youtube-dl also treats `id` and `title` to be mandatory. Thus aforementioned metafields are the critical data the extraction does not make any sense without and if any of them fail to be extracted then extractor is considered completely broken.
[Any field](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L149-L257) apart from the aforementioned ones are considered **optional**. That means that extraction should be **tolerate** to situations when sources for these fields can potentially be unavailable (even if they are always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields.
#### Example
Say you have some source dictionary `meta` that you've fetched as JSON with HTTP request and it has a key `summary`:
```python
meta = self._download_json(url, video_id)
```
Assume at this point `meta`'s layout is:
```python
{
...
"summary": "some fancy summary text",
...
}
```
Assume you want to extract `summary` and put into resulting info dict as `description`. Since `description` is optional metafield you should be ready that this key may be missing from the `meta` dict, so that you should extract it like:
```python
description = meta.get('summary') # correct
```
and not like:
```python
description = meta['summary'] # incorrect
```
The latter will break extraction process with `KeyError` if `summary` disappears from `meta` at some time later but with former approach extraction will just go ahead with `description` set to `None` that is perfectly fine (remember `None` is equivalent for absence of data).
Similarly, you should pass `fatal=False` when extracting optional data from a webpage with `_search_regex`, `_html_search_regex` or similar methods, for instance:
```python
description = self._search_regex(
r'<span[^>]+id="title"[^>]*>([^<]+)<',
webpage, 'description', fatal=False)
```
With `fatal` set to `False` if `_search_regex` fails to extract `description` it will emit a warning and continue extraction.
You can also pass `default=<some fallback value>`, for example:
```python
description = self._search_regex(
r'<span[^>]+id="title"[^>]*>([^<]+)<',
webpage, 'description', default=None)
```
On failure this code will silently continue the extraction with `description` set to `None`. That is useful for metafields that are known to may or may not be present.
### Provide fallbacks
When extracting metadata try to provide several scenarios for that. For example if `title` is present in several places/sources try extracting from at least some of them. This would make it more future-proof in case some of the sources became unavailable.
#### Example
Say `meta` from previous example has a `title` and you are about to extract it. Since `title` is mandatory meta field you should end up with something like:
```python
title = meta['title']
```
If `title` disappeares from `meta` in future due to some changes on hoster's side the extraction would fail since `title` is mandatory. That's expected.
Assume that you have some another source you can extract `title` from, for example `og:title` HTML meta of a `webpage`. In this case you can provide a fallback scenario:
```python
title = meta.get('title') or self._og_search_title(webpage)
```
This code will try to extract from `meta` first and if it fails it will try extracting `og:title` from a `webpage`.
### Make regular expressions flexible
When using regular expressions try to write them fuzzy and flexible.
#### Example
Say you need to extract `title` from the following HTML code:
```html
<span style="position: absolute; left: 910px; width: 90px; float: right; z-index: 9999;" class="title">some fancy title</span>
```
The code for that task should look similar to:
```python
title = self._search_regex(
r'<span[^>]+class="title"[^>]*>([^<]+)', webpage, 'title')
```
Or even better:
```python
title = self._search_regex(
r'<span[^>]+class=(["\'])title\1[^>]*>(?P<title>[^<]+)',
webpage, 'title', group='title')
```
Note how you tolerate potential changes in `style` attribute's value or switch from using double quotes to single for `class` attribute:
The code definitely should not look like:
```python
title = self._search_regex(
r'<span style="position: absolute; left: 910px; width: 90px; float: right; z-index: 9999;" class="title">(.*?)</span>',
webpage, 'title', group='title')
```
### Use safe conversion functions
Wrap all extracted numeric data into safe functions from `utils`: `int_or_none`, `float_or_none`. Use them for string to number conversions as well.

View File

@ -424,7 +424,7 @@ which means you can modify it, redistribute it or use it however you like.
# CONFIGURATION # CONFIGURATION
You can configure youtube-dl by placing any supported command line option to a configuration file. On Linux and OS X, the system wide configuration file is located at `/etc/youtube-dl.conf` and the user wide configuration file at `~/.config/youtube-dl/config`. On Windows, the user wide configuration file locations are `%APPDATA%\youtube-dl\config.txt` or `C:\Users\<user name>\youtube-dl.conf`. You can configure youtube-dl by placing any supported command line option to a configuration file. On Linux and OS X, the system wide configuration file is located at `/etc/youtube-dl.conf` and the user wide configuration file at `~/.config/youtube-dl/config`. On Windows, the user wide configuration file locations are `%APPDATA%\youtube-dl\config.txt` or `C:\Users\<user name>\youtube-dl.conf`. Note that by default configuration file may not exist so you may need to create it yourself.
For example, with the following configuration file youtube-dl will always extract the audio, not copy the mtime, use a proxy and save all videos under `Movies` directory in your home directory: For example, with the following configuration file youtube-dl will always extract the audio, not copy the mtime, use a proxy and save all videos under `Movies` directory in your home directory:
``` ```

View File

@ -0,0 +1,41 @@
#!/usr/bin/env python
from __future__ import unicode_literals
import json
import os
import re
import sys
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from youtube_dl.compat import (
compat_print,
compat_urllib_request,
)
from youtube_dl.utils import format_bytes
def format_size(bytes):
return '%s (%d bytes)' % (format_bytes(bytes), bytes)
total_bytes = 0
releases = json.loads(compat_urllib_request.urlopen(
'https://api.github.com/repos/rg3/youtube-dl/releases').read().decode('utf-8'))
for release in releases:
compat_print(release['name'])
for asset in release['assets']:
asset_name = asset['name']
total_bytes += asset['download_count'] * asset['size']
if all(not re.match(p, asset_name) for p in (
r'^youtube-dl$',
r'^youtube-dl-\d{4}\.\d{2}\.\d{2}(?:\.\d+)?\.tar\.gz$',
r'^youtube-dl\.exe$')):
continue
compat_print(
' %s size: %s downloads: %d'
% (asset_name, format_size(asset['size']), asset['download_count']))
compat_print('total downloads traffic: %s' % format_size(total_bytes))

View File

@ -138,27 +138,27 @@ class TestProxy(unittest.TestCase):
self.proxy_thread.daemon = True self.proxy_thread.daemon = True
self.proxy_thread.start() self.proxy_thread.start()
self.cn_proxy = compat_http_server.HTTPServer( self.geo_proxy = compat_http_server.HTTPServer(
('localhost', 0), _build_proxy_handler('cn')) ('localhost', 0), _build_proxy_handler('geo'))
self.cn_port = http_server_port(self.cn_proxy) self.geo_port = http_server_port(self.geo_proxy)
self.cn_proxy_thread = threading.Thread(target=self.cn_proxy.serve_forever) self.geo_proxy_thread = threading.Thread(target=self.geo_proxy.serve_forever)
self.cn_proxy_thread.daemon = True self.geo_proxy_thread.daemon = True
self.cn_proxy_thread.start() self.geo_proxy_thread.start()
def test_proxy(self): def test_proxy(self):
cn_proxy = 'localhost:{0}'.format(self.cn_port) geo_proxy = 'localhost:{0}'.format(self.geo_port)
ydl = YoutubeDL({ ydl = YoutubeDL({
'proxy': 'localhost:{0}'.format(self.port), 'proxy': 'localhost:{0}'.format(self.port),
'cn_verification_proxy': cn_proxy, 'geo_verification_proxy': geo_proxy,
}) })
url = 'http://foo.com/bar' url = 'http://foo.com/bar'
response = ydl.urlopen(url).read().decode('utf-8') response = ydl.urlopen(url).read().decode('utf-8')
self.assertEqual(response, 'normal: {0}'.format(url)) self.assertEqual(response, 'normal: {0}'.format(url))
req = compat_urllib_request.Request(url) req = compat_urllib_request.Request(url)
req.add_header('Ytdl-request-proxy', cn_proxy) req.add_header('Ytdl-request-proxy', geo_proxy)
response = ydl.urlopen(req).read().decode('utf-8') response = ydl.urlopen(req).read().decode('utf-8')
self.assertEqual(response, 'cn: {0}'.format(url)) self.assertEqual(response, 'geo: {0}'.format(url))
def test_proxy_with_idn(self): def test_proxy_with_idn(self):
ydl = YoutubeDL({ ydl = YoutubeDL({

View File

@ -196,8 +196,8 @@ class YoutubeDL(object):
prefer_insecure: Use HTTP instead of HTTPS to retrieve information. prefer_insecure: Use HTTP instead of HTTPS to retrieve information.
At the moment, this is only supported by YouTube. At the moment, this is only supported by YouTube.
proxy: URL of the proxy server to use proxy: URL of the proxy server to use
cn_verification_proxy: URL of the proxy to use for IP address verification geo_verification_proxy: URL of the proxy to use for IP address verification
on Chinese sites. (Experimental) on geo-restricted sites. (Experimental)
socket_timeout: Time to wait for unresponsive hosts, in seconds socket_timeout: Time to wait for unresponsive hosts, in seconds
bidi_workaround: Work around buggy terminals without bidirectional text bidi_workaround: Work around buggy terminals without bidirectional text
support, using fridibi support, using fridibi
@ -304,6 +304,11 @@ class YoutubeDL(object):
self.params.update(params) self.params.update(params)
self.cache = Cache(self) self.cache = Cache(self)
if self.params.get('cn_verification_proxy') is not None:
self.report_warning('--cn-verification-proxy is deprecated. Use --geo-verification-proxy instead.')
if self.params.get('geo_verification_proxy') is None:
self.params['geo_verification_proxy'] = self.params['cn_verification_proxy']
if params.get('bidi_workaround', False): if params.get('bidi_workaround', False):
try: try:
import pty import pty

View File

@ -382,6 +382,8 @@ def _real_main(argv=None):
'external_downloader_args': external_downloader_args, 'external_downloader_args': external_downloader_args,
'postprocessor_args': postprocessor_args, 'postprocessor_args': postprocessor_args,
'cn_verification_proxy': opts.cn_verification_proxy, 'cn_verification_proxy': opts.cn_verification_proxy,
'geo_verification_proxy': opts.geo_verification_proxy,
} }
with YoutubeDL(ydl_opts) as ydl: with YoutubeDL(ydl_opts) as ydl:

View File

@ -585,6 +585,13 @@ class BrightcoveNewIE(InfoExtractor):
'format_id': build_format_id('rtmp'), 'format_id': build_format_id('rtmp'),
}) })
formats.append(f) formats.append(f)
errors = json_data.get('errors')
if not formats and errors:
error = errors[0]
raise ExtractorError(
error.get('message') or error.get('error_subcode') or error['error_code'], expected=True)
self._sort_formats(formats) self._sort_formats(formats)
subtitles = {} subtitles = {}

View File

@ -1729,6 +1729,13 @@ class InfoExtractor(object):
def _mark_watched(self, *args, **kwargs): def _mark_watched(self, *args, **kwargs):
raise NotImplementedError('This method must be implemented by subclasses') raise NotImplementedError('This method must be implemented by subclasses')
def geo_verification_headers(self):
headers = {}
geo_verification_proxy = self._downloader.params.get('geo_verification_proxy')
if geo_verification_proxy:
headers['Ytdl-request-proxy'] = geo_verification_proxy
return headers
class SearchInfoExtractor(InfoExtractor): class SearchInfoExtractor(InfoExtractor):
""" """

View File

@ -1,10 +1,8 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import binascii
import hashlib import hashlib
import itertools import itertools
import math
import re import re
import time import time
@ -14,12 +12,13 @@ from ..compat import (
compat_urllib_parse_urlencode, compat_urllib_parse_urlencode,
) )
from ..utils import ( from ..utils import (
clean_html,
decode_packed_codes, decode_packed_codes,
get_element_by_id,
get_element_by_attribute,
ExtractorError, ExtractorError,
intlist_to_bytes,
ohdave_rsa_encrypt, ohdave_rsa_encrypt,
remove_start, remove_start,
urshift,
) )
@ -166,7 +165,7 @@ class IqiyiIE(InfoExtractor):
_TESTS = [{ _TESTS = [{
'url': 'http://www.iqiyi.com/v_19rrojlavg.html', 'url': 'http://www.iqiyi.com/v_19rrojlavg.html',
'md5': '470a6c160618577166db1a7aac5a3606', # MD5 checksum differs on my machine and Travis CI
'info_dict': { 'info_dict': {
'id': '9c1fb1b99d192b21c559e5a1a2cb3c73', 'id': '9c1fb1b99d192b21c559e5a1a2cb3c73',
'ext': 'mp4', 'ext': 'mp4',
@ -174,11 +173,11 @@ class IqiyiIE(InfoExtractor):
} }
}, { }, {
'url': 'http://www.iqiyi.com/v_19rrhnnclk.html', 'url': 'http://www.iqiyi.com/v_19rrhnnclk.html',
'md5': 'f09f0a6a59b2da66a26bf4eda669a4cc', 'md5': '667171934041350c5de3f5015f7f1152',
'info_dict': { 'info_dict': {
'id': 'e3f585b550a280af23c98b6cb2be19fb', 'id': 'e3f585b550a280af23c98b6cb2be19fb',
'ext': 'mp4', 'ext': 'mp4',
'title': '名侦探柯南 国语版', 'title': '名侦探柯南 国语版第752集 迫近灰原秘密的黑影 下篇',
}, },
'skip': 'Geo-restricted to China', 'skip': 'Geo-restricted to China',
}, { }, {
@ -196,22 +195,10 @@ class IqiyiIE(InfoExtractor):
'url': 'http://www.iqiyi.com/v_19rrny4w8w.html', 'url': 'http://www.iqiyi.com/v_19rrny4w8w.html',
'info_dict': { 'info_dict': {
'id': 'f3cf468b39dddb30d676f89a91200dc1', 'id': 'f3cf468b39dddb30d676f89a91200dc1',
'ext': 'mp4',
'title': '泰坦尼克号', 'title': '泰坦尼克号',
}, },
'playlist': [{ 'skip': 'Geo-restricted to China',
'info_dict': {
'id': 'f3cf468b39dddb30d676f89a91200dc1_part1',
'ext': 'f4v',
'title': '泰坦尼克号',
},
}, {
'info_dict': {
'id': 'f3cf468b39dddb30d676f89a91200dc1_part2',
'ext': 'f4v',
'title': '泰坦尼克号',
},
}],
'expected_warnings': ['Needs a VIP account for full video'],
}, { }, {
'url': 'http://www.iqiyi.com/a_19rrhb8ce1.html', 'url': 'http://www.iqiyi.com/a_19rrhb8ce1.html',
'info_dict': { 'info_dict': {
@ -224,14 +211,16 @@ class IqiyiIE(InfoExtractor):
'only_matching': True, 'only_matching': True,
}] }]
_FORMATS_MAP = [ _FORMATS_MAP = {
('1', 'h6'), '96': 1, # 216p, 240p
('2', 'h5'), '1': 2, # 336p, 360p
('3', 'h4'), '2': 3, # 480p, 504p
('4', 'h3'), '21': 4, # 504p
('5', 'h2'), '4': 5, # 720p
('10', 'h1'), '17': 5, # 720p
] '5': 6, # 1072p, 1080p
'18': 7, # 1080p
}
def _real_initialize(self): def _real_initialize(self):
self._login() self._login()
@ -291,101 +280,23 @@ class IqiyiIE(InfoExtractor):
return True return True
@staticmethod
def _gen_sc(tvid, timestamp):
M = [1732584193, -271733879]
M.extend([~M[0], ~M[1]])
I_table = [7, 12, 17, 22, 5, 9, 14, 20, 4, 11, 16, 23, 6, 10, 15, 21]
C_base = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8388608, 432]
def L(n, t):
if t is None:
t = 0
return trunc(((n >> 1) + (t >> 1) << 1) + (n & 1) + (t & 1))
def trunc(n):
n = n % 0x100000000
if n > 0x7fffffff:
n -= 0x100000000
return n
def transform(string, mod):
num = int(string, 16)
return (num >> 8 * (i % 4) & 255 ^ i % mod) << ((a & 3) << 3)
C = list(C_base)
o = list(M)
k = str(timestamp - 7)
for i in range(13):
a = i
C[a >> 2] |= ord(k[a]) << 8 * (a % 4)
for i in range(16):
a = i + 13
start = (i >> 2) * 8
r = '03967743b643f66763d623d637e30733'
C[a >> 2] |= transform(''.join(reversed(r[start:start + 8])), 7)
for i in range(16):
a = i + 29
start = (i >> 2) * 8
r = '7038766939776a32776a32706b337139'
C[a >> 2] |= transform(r[start:start + 8], 1)
for i in range(9):
a = i + 45
if i < len(tvid):
C[a >> 2] |= ord(tvid[i]) << 8 * (a % 4)
for a in range(64):
i = a
I = i >> 4
C_index = [i, 5 * i + 1, 3 * i + 5, 7 * i][I] % 16 + urshift(a, 6)
m = L(L(o[0], [
trunc(o[1] & o[2]) | trunc(~o[1] & o[3]),
trunc(o[3] & o[1]) | trunc(~o[3] & o[2]),
o[1] ^ o[2] ^ o[3],
o[2] ^ trunc(o[1] | ~o[3])
][I]), L(
trunc(int(abs(math.sin(i + 1)) * 4294967296)),
C[C_index] if C_index < len(C) else None))
I = I_table[4 * I + i % 4]
o = [o[3],
L(o[1], trunc(trunc(m << I) | urshift(m, 32 - I))),
o[1],
o[2]]
new_M = [L(o[0], M[0]), L(o[1], M[1]), L(o[2], M[2]), L(o[3], M[3])]
s = [new_M[a >> 3] >> (1 ^ a & 7) * 4 & 15 for a in range(32)]
return binascii.hexlify(intlist_to_bytes(s))[1::2].decode('ascii')
def get_raw_data(self, tvid, video_id): def get_raw_data(self, tvid, video_id):
tm = int(time.time() * 1000) tm = int(time.time() * 1000)
sc = self._gen_sc(tvid, tm) key = 'd5fb4bd9d50c4be6948c97edd7254b0e'
sc = md5_text(compat_str(tm) + key + tvid)
params = { params = {
'platForm': 'h5',
'rate': 1,
'tvid': tvid, 'tvid': tvid,
'vid': video_id, 'vid': video_id,
'cupid': 'qc_100001_100186', 'src': '76f90cbd92f94a2e925d83e8ccd22cb7',
'type': 'mp4',
'nolimit': 0,
'agenttype': 13,
'src': 'd846d0c32d664d32b6b54ea48997a589',
'sc': sc, 'sc': sc,
't': tm - 7, 't': tm,
'__jsT': None,
} }
headers = {}
cn_verification_proxy = self._downloader.params.get('cn_verification_proxy')
if cn_verification_proxy:
headers['Ytdl-request-proxy'] = cn_verification_proxy
return self._download_json( return self._download_json(
'http://cache.m.iqiyi.com/jp/tmts/%s/%s/' % (tvid, video_id), 'http://cache.m.iqiyi.com/jp/tmts/%s/%s/' % (tvid, video_id),
video_id, transform_source=lambda s: remove_start(s, 'var tvInfoJs='), video_id, transform_source=lambda s: remove_start(s, 'var tvInfoJs='),
query=params, headers=headers) query=params, headers=self.geo_verification_headers())
def _extract_playlist(self, webpage): def _extract_playlist(self, webpage):
PAGE_SIZE = 50 PAGE_SIZE = 50
@ -435,6 +346,7 @@ class IqiyiIE(InfoExtractor):
video_id = self._search_regex( video_id = self._search_regex(
r'data-player-videoid\s*=\s*[\'"]([a-f\d]+)', webpage, 'video_id') r'data-player-videoid\s*=\s*[\'"]([a-f\d]+)', webpage, 'video_id')
formats = []
for _ in range(5): for _ in range(5):
raw_data = self.get_raw_data(tvid, video_id) raw_data = self.get_raw_data(tvid, video_id)
@ -445,16 +357,29 @@ class IqiyiIE(InfoExtractor):
data = raw_data['data'] data = raw_data['data']
# iQiYi sometimes returns Ads for stream in data['vidl']:
if not isinstance(data['playInfo'], dict): if 'm3utx' not in stream:
self._sleep(5, video_id)
continue continue
vd = compat_str(stream['vd'])
formats.append({
'url': stream['m3utx'],
'format_id': vd,
'ext': 'mp4',
'preference': self._FORMATS_MAP.get(vd, -1),
'protocol': 'm3u8_native',
})
title = data['playInfo']['an'] if formats:
break break
self._sleep(5, video_id)
self._sort_formats(formats)
title = (get_element_by_id('widget-videotitle', webpage) or
clean_html(get_element_by_attribute('class', 'mod-play-tit', webpage)))
return { return {
'id': video_id, 'id': video_id,
'title': title, 'title': title,
'url': data['m3u'], 'formats': formats,
} }

View File

@ -26,11 +26,6 @@ class KuwoBaseIE(InfoExtractor):
def _get_formats(self, song_id, tolerate_ip_deny=False): def _get_formats(self, song_id, tolerate_ip_deny=False):
formats = [] formats = []
for file_format in self._FORMATS: for file_format in self._FORMATS:
headers = {}
cn_verification_proxy = self._downloader.params.get('cn_verification_proxy')
if cn_verification_proxy:
headers['Ytdl-request-proxy'] = cn_verification_proxy
query = { query = {
'format': file_format['ext'], 'format': file_format['ext'],
'br': file_format.get('br', ''), 'br': file_format.get('br', ''),
@ -42,7 +37,7 @@ class KuwoBaseIE(InfoExtractor):
song_url = self._download_webpage( song_url = self._download_webpage(
'http://antiserver.kuwo.cn/anti.s', 'http://antiserver.kuwo.cn/anti.s',
song_id, note='Download %s url info' % file_format['format'], song_id, note='Download %s url info' % file_format['format'],
query=query, headers=headers, query=query, headers=self.geo_verification_headers(),
) )
if song_url == 'IPDeny' and not tolerate_ip_deny: if song_url == 'IPDeny' and not tolerate_ip_deny:

View File

@ -20,7 +20,6 @@ from ..utils import (
int_or_none, int_or_none,
orderedSet, orderedSet,
parse_iso8601, parse_iso8601,
sanitized_Request,
str_or_none, str_or_none,
url_basename, url_basename,
urshift, urshift,
@ -121,16 +120,11 @@ class LeIE(InfoExtractor):
'tkey': self.calc_time_key(int(time.time())), 'tkey': self.calc_time_key(int(time.time())),
'domain': 'www.le.com' 'domain': 'www.le.com'
} }
play_json_req = sanitized_Request(
'http://api.le.com/mms/out/video/playJson?' + compat_urllib_parse_urlencode(params)
)
cn_verification_proxy = self._downloader.params.get('cn_verification_proxy')
if cn_verification_proxy:
play_json_req.add_header('Ytdl-request-proxy', cn_verification_proxy)
play_json = self._download_json( play_json = self._download_json(
play_json_req, 'http://api.le.com/mms/out/video/playJson',
media_id, 'Downloading playJson data') media_id, 'Downloading playJson data', query=params,
headers=self.geo_verification_headers())
# Check for errors # Check for errors
playstatus = play_json['playstatus'] playstatus = play_json['playstatus']

View File

@ -82,6 +82,10 @@ class PornHubIE(InfoExtractor):
# removed by uploader # removed by uploader
'url': 'http://www.pornhub.com/view_video.php?viewkey=ph572716d15a111', 'url': 'http://www.pornhub.com/view_video.php?viewkey=ph572716d15a111',
'only_matching': True, 'only_matching': True,
}, {
# private video
'url': 'http://www.pornhub.com/view_video.php?viewkey=ph56fd731fce6b7',
'only_matching': True,
}, { }, {
'url': 'https://www.thumbzilla.com/video/ph56c6114abd99a/horny-girlfriend-sex', 'url': 'https://www.thumbzilla.com/video/ph56c6114abd99a/horny-girlfriend-sex',
'only_matching': True, 'only_matching': True,
@ -107,7 +111,7 @@ class PornHubIE(InfoExtractor):
webpage = self._download_webpage(req, video_id) webpage = self._download_webpage(req, video_id)
error_msg = self._html_search_regex( error_msg = self._html_search_regex(
r'(?s)<div[^>]+class=(["\']).*?\bremoved\b.*?\1[^>]*>(?P<error>.+?)</div>', r'(?s)<div[^>]+class=(["\']).*?\b(?:removed|userMessageSection)\b.*?\1[^>]*>(?P<error>.+?)</div>',
webpage, 'error message', default=None, group='error') webpage, 'error message', default=None, group='error')
if error_msg: if error_msg:
error_msg = re.sub(r'\s+', ' ', error_msg) error_msg = re.sub(r'\s+', ' ', error_msg)

View File

@ -20,17 +20,12 @@ class RaiBaseIE(InfoExtractor):
formats = [] formats = []
for platform in ('mon', 'flash', 'native'): for platform in ('mon', 'flash', 'native'):
headers = {}
# TODO: rename --cn-verification-proxy
cn_verification_proxy = self._downloader.params.get('cn_verification_proxy')
if cn_verification_proxy:
headers['Ytdl-request-proxy'] = cn_verification_proxy
relinker = self._download_xml( relinker = self._download_xml(
relinker_url, video_id, relinker_url, video_id,
note='Downloading XML metadata for platform %s' % platform, note='Downloading XML metadata for platform %s' % platform,
transform_source=fix_xml_ampersands, transform_source=fix_xml_ampersands,
query={'output': 45, 'pl': platform}, headers=headers) query={'output': 45, 'pl': platform},
headers=self.geo_verification_headers())
media_url = find_xpath_attr(relinker, './url', 'type', 'content').text media_url = find_xpath_attr(relinker, './url', 'type', 'content').text
if media_url == 'http://download.rai.it/video_no_available.mp4': if media_url == 'http://download.rai.it/video_no_available.mp4':

View File

@ -8,10 +8,7 @@ from ..compat import (
compat_str, compat_str,
compat_urllib_parse_urlencode, compat_urllib_parse_urlencode,
) )
from ..utils import ( from ..utils import ExtractorError
ExtractorError,
sanitized_Request,
)
class SohuIE(InfoExtractor): class SohuIE(InfoExtractor):
@ -96,15 +93,10 @@ class SohuIE(InfoExtractor):
else: else:
base_data_url = 'http://hot.vrs.sohu.com/vrs_flash.action?vid=' base_data_url = 'http://hot.vrs.sohu.com/vrs_flash.action?vid='
req = sanitized_Request(base_data_url + vid_id)
cn_verification_proxy = self._downloader.params.get('cn_verification_proxy')
if cn_verification_proxy:
req.add_header('Ytdl-request-proxy', cn_verification_proxy)
return self._download_json( return self._download_json(
req, video_id, base_data_url + vid_id, video_id,
'Downloading JSON data for %s' % vid_id) 'Downloading JSON data for %s' % vid_id,
headers=self.geo_verification_headers())
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id') video_id = mobj.group('id')

View File

@ -19,6 +19,7 @@ from ..utils import (
mimetype2ext, mimetype2ext,
) )
from .brightcove import BrightcoveNewIE
from .nbc import NBCSportsVPlayerIE from .nbc import NBCSportsVPlayerIE
@ -227,7 +228,12 @@ class YahooIE(InfoExtractor):
# Look for NBCSports iframes # Look for NBCSports iframes
nbc_sports_url = NBCSportsVPlayerIE._extract_url(webpage) nbc_sports_url = NBCSportsVPlayerIE._extract_url(webpage)
if nbc_sports_url: if nbc_sports_url:
return self.url_result(nbc_sports_url, 'NBCSportsVPlayer') return self.url_result(nbc_sports_url, NBCSportsVPlayerIE.ie_key())
# Look for Brightcove New Studio embeds
bc_url = BrightcoveNewIE._extract_url(webpage)
if bc_url:
return self.url_result(bc_url, BrightcoveNewIE.ie_key())
# Query result is often embedded in webpage as JSON. Sometimes explicit requests # Query result is often embedded in webpage as JSON. Sometimes explicit requests
# to video API results in a failure with geo restriction reason therefore using # to video API results in a failure with geo restriction reason therefore using

View File

@ -16,7 +16,6 @@ from ..compat import (
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
get_element_by_attribute, get_element_by_attribute,
sanitized_Request,
) )
@ -218,14 +217,10 @@ class YoukuIE(InfoExtractor):
headers = { headers = {
'Referer': req_url, 'Referer': req_url,
} }
headers.update(self.geo_verification_headers())
self._set_cookie('youku.com', 'xreferrer', 'http://www.youku.com') self._set_cookie('youku.com', 'xreferrer', 'http://www.youku.com')
req = sanitized_Request(req_url, headers=headers)
cn_verification_proxy = self._downloader.params.get('cn_verification_proxy') raw_data = self._download_json(req_url, video_id, note=note, headers=headers)
if cn_verification_proxy:
req.add_header('Ytdl-request-proxy', cn_verification_proxy)
raw_data = self._download_json(req, video_id, note=note)
return raw_data['data'] return raw_data['data']

View File

@ -209,11 +209,16 @@ def parseOpts(overrideArguments=None):
action='store_const', const='::', dest='source_address', action='store_const', const='::', dest='source_address',
help='Make all connections via IPv6 (experimental)', help='Make all connections via IPv6 (experimental)',
) )
network.add_option(
'--geo-verification-proxy',
dest='geo_verification_proxy', default=None, metavar='URL',
help='Use this proxy to verify the IP address for some geo-restricted sites. '
'The default proxy specified by --proxy (or none, if the options is not present) is used for the actual downloading. (experimental)'
)
network.add_option( network.add_option(
'--cn-verification-proxy', '--cn-verification-proxy',
dest='cn_verification_proxy', default=None, metavar='URL', dest='cn_verification_proxy', default=None, metavar='URL',
help='Use this proxy to verify the IP address for some Chinese sites. ' help=optparse.SUPPRESS_HELP,
'The default proxy specified by --proxy (or none, if the options is not present) is used for the actual downloading. (experimental)'
) )
selection = optparse.OptionGroup(parser, 'Video Selection') selection = optparse.OptionGroup(parser, 'Video Selection')

View File

@ -1,3 +1,3 @@
from __future__ import unicode_literals from __future__ import unicode_literals
__version__ = '2016.07.03' __version__ = '2016.07.03.1'