Merge branch 'master' of https://github.com/rg3/youtube-dl
This commit is contained in:
commit
df152a4dec
6
.github/ISSUE_TEMPLATE.md
vendored
6
.github/ISSUE_TEMPLATE.md
vendored
@ -6,8 +6,8 @@
|
||||
|
||||
---
|
||||
|
||||
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.07.01*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
|
||||
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.07.01**
|
||||
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.07.03.1*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
|
||||
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.07.03.1**
|
||||
|
||||
### Before submitting an *issue* make sure you have:
|
||||
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
|
||||
@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
|
||||
[debug] User config: []
|
||||
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
|
||||
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
|
||||
[debug] youtube-dl version 2016.07.01
|
||||
[debug] youtube-dl version 2016.07.03.1
|
||||
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
|
||||
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
|
||||
[debug] Proxy map: {}
|
||||
|
22
.github/PULL_REQUEST_TEMPLATE.md
vendored
Normal file
22
.github/PULL_REQUEST_TEMPLATE.md
vendored
Normal file
@ -0,0 +1,22 @@
|
||||
## Please follow the guide below
|
||||
|
||||
- You will be asked some questions, please read them **carefully** and answer honestly
|
||||
- Put an `x` into all the boxes [ ] relevant to your *pull request* (like that [x])
|
||||
- Use *Preview* tab to see how your *pull request* will actually look like
|
||||
|
||||
---
|
||||
|
||||
### Before submitting a *pull request* make sure you have:
|
||||
- [ ] At least skimmed through [adding new extractor tutorial](https://github.com/rg3/youtube-dl#adding-support-for-a-new-site) and [youtube-dl coding conventions](https://github.com/rg3/youtube-dl#youtube-dl-coding-conventions) sections
|
||||
- [ ] [Searched](https://github.com/rg3/youtube-dl/search?q=is%3Apr&type=Issues) the bugtracker for similar pull requests
|
||||
|
||||
### What is the purpose of your *pull request*?
|
||||
- [ ] Bug fix
|
||||
- [ ] New extractor
|
||||
- [ ] New feature
|
||||
|
||||
---
|
||||
|
||||
### Description of your *pull request* and other information
|
||||
|
||||
Explanation of your *pull request* in arbitrary form goes here. Please make sure the description explains the purpose and effect of your *pull request* and is worded well enough to be understood. Provide as much context and examples as possible.
|
1
AUTHORS
1
AUTHORS
@ -176,3 +176,4 @@ Déstin Reed
|
||||
Roman Tsiupa
|
||||
Artur Krysiak
|
||||
Jakub Adam Wieczorek
|
||||
Aleksandar Topuzović
|
||||
|
152
CONTRIBUTING.md
152
CONTRIBUTING.md
@ -97,9 +97,17 @@ If you want to add support for a new site, first of all **make sure** this site
|
||||
After you have ensured this site is distributing it's content legally, you can follow this quick list (assuming your service is called `yourextractor`):
|
||||
|
||||
1. [Fork this repository](https://github.com/rg3/youtube-dl/fork)
|
||||
2. Check out the source code with `git clone git@github.com:YOUR_GITHUB_USERNAME/youtube-dl.git`
|
||||
3. Start a new git branch with `cd youtube-dl; git checkout -b yourextractor`
|
||||
2. Check out the source code with:
|
||||
|
||||
git clone git@github.com:YOUR_GITHUB_USERNAME/youtube-dl.git
|
||||
|
||||
3. Start a new git branch with
|
||||
|
||||
cd youtube-dl
|
||||
git checkout -b yourextractor
|
||||
|
||||
4. Start with this simple template and save it to `youtube_dl/extractor/yourextractor.py`:
|
||||
|
||||
```python
|
||||
# coding: utf-8
|
||||
from __future__ import unicode_literals
|
||||
@ -143,16 +151,148 @@ After you have ensured this site is distributing it's content legally, you can f
|
||||
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
|
||||
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc.
|
||||
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L74-L252). Add tests and code for as many as you want.
|
||||
8. Keep in mind that the only mandatory fields in info dict for successful extraction process are `id`, `title` and either `url` or `formats`, i.e. these are the critical data the extraction does not make any sense without. This means that [any field](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L148-L252) apart from aforementioned mandatory ones should be treated **as optional** and extraction should be **tolerate** to situations when sources for these fields can potentially be unavailable (even if they always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields. For example, if you have some intermediate dict `meta` that is a source of metadata and it has a key `summary` that you want to extract and put into resulting info dict as `description`, you should be ready that this key may be missing from the `meta` dict, i.e. you should extract it as `meta.get('summary')` and not `meta['summary']`. Similarly, you should pass `fatal=False` when extracting data from a webpage with `_search_regex/_html_search_regex`.
|
||||
9. Check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](http://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+.
|
||||
10. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
|
||||
8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](http://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+.
|
||||
9. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
|
||||
|
||||
$ git add youtube_dl/extractor/extractors.py
|
||||
$ git add youtube_dl/extractor/yourextractor.py
|
||||
$ git commit -m '[yourextractor] Add new extractor'
|
||||
$ git push origin yourextractor
|
||||
|
||||
11. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it.
|
||||
10. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it.
|
||||
|
||||
In any case, thank you very much for your contributions!
|
||||
|
||||
## youtube-dl coding conventions
|
||||
|
||||
This section introduces a guide lines for writing idiomatic, robust and future-proof extractor code.
|
||||
|
||||
Extractors are very fragile by nature since they depend on the layout of the source data provided by 3rd party media hoster out of your control and this layout tend to change. As an extractor implementer your task is not only to write code that will extract media links and metadata correctly but also to minimize code dependency on source's layout changes and even to make the code foresee potential future changes and be ready for that. This is important because it will allow extractor not to break on minor layout changes thus keeping old youtube-dl versions working. Even though this breakage issue is easily fixed by emitting a new version of youtube-dl with fix incorporated all the previous version become broken in all repositories and distros' packages that may not be so prompt in fetching the update from us. Needless to say some may never receive an update at all that is possible for non rolling release distros.
|
||||
|
||||
### Mandatory and optional metafields
|
||||
|
||||
For extraction to work youtube-dl relies on metadata your extractor extracts and provides to youtube-dl expressed by [information dictionary](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L75-L257) or simply *info dict*. Only the following meta fields in *info dict* are considered mandatory for successful extraction process by youtube-dl:
|
||||
|
||||
- `id` (media identifier)
|
||||
- `title` (media title)
|
||||
- `url` (media download URL) or `formats`
|
||||
|
||||
In fact only the last option is technically mandatory (i.e. if you can't figure out the download location of the media the extraction does not make any sense). But by convention youtube-dl also treats `id` and `title` to be mandatory. Thus aforementioned metafields are the critical data the extraction does not make any sense without and if any of them fail to be extracted then extractor is considered completely broken.
|
||||
|
||||
[Any field](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L149-L257) apart from the aforementioned ones are considered **optional**. That means that extraction should be **tolerate** to situations when sources for these fields can potentially be unavailable (even if they are always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields.
|
||||
|
||||
#### Example
|
||||
|
||||
Say you have some source dictionary `meta` that you've fetched as JSON with HTTP request and it has a key `summary`:
|
||||
|
||||
```python
|
||||
meta = self._download_json(url, video_id)
|
||||
```
|
||||
|
||||
Assume at this point `meta`'s layout is:
|
||||
|
||||
```python
|
||||
{
|
||||
...
|
||||
"summary": "some fancy summary text",
|
||||
...
|
||||
}
|
||||
```
|
||||
|
||||
Assume you want to extract `summary` and put into resulting info dict as `description`. Since `description` is optional metafield you should be ready that this key may be missing from the `meta` dict, so that you should extract it like:
|
||||
|
||||
```python
|
||||
description = meta.get('summary') # correct
|
||||
```
|
||||
|
||||
and not like:
|
||||
|
||||
```python
|
||||
description = meta['summary'] # incorrect
|
||||
```
|
||||
|
||||
The latter will break extraction process with `KeyError` if `summary` disappears from `meta` at some time later but with former approach extraction will just go ahead with `description` set to `None` that is perfectly fine (remember `None` is equivalent for absence of data).
|
||||
|
||||
Similarly, you should pass `fatal=False` when extracting optional data from a webpage with `_search_regex`, `_html_search_regex` or similar methods, for instance:
|
||||
|
||||
```python
|
||||
description = self._search_regex(
|
||||
r'<span[^>]+id="title"[^>]*>([^<]+)<',
|
||||
webpage, 'description', fatal=False)
|
||||
```
|
||||
|
||||
With `fatal` set to `False` if `_search_regex` fails to extract `description` it will emit a warning and continue extraction.
|
||||
|
||||
You can also pass `default=<some fallback value>`, for example:
|
||||
|
||||
```python
|
||||
description = self._search_regex(
|
||||
r'<span[^>]+id="title"[^>]*>([^<]+)<',
|
||||
webpage, 'description', default=None)
|
||||
```
|
||||
|
||||
On failure this code will silently continue the extraction with `description` set to `None`. That is useful for metafields that are known to may or may not be present.
|
||||
|
||||
### Provide fallbacks
|
||||
|
||||
When extracting metadata try to provide several scenarios for that. For example if `title` is present in several places/sources try extracting from at least some of them. This would make it more future-proof in case some of the sources became unavailable.
|
||||
|
||||
#### Example
|
||||
|
||||
Say `meta` from previous example has a `title` and you are about to extract it. Since `title` is mandatory meta field you should end up with something like:
|
||||
|
||||
```python
|
||||
title = meta['title']
|
||||
```
|
||||
|
||||
If `title` disappeares from `meta` in future due to some changes on hoster's side the extraction would fail since `title` is mandatory. That's expected.
|
||||
|
||||
Assume that you have some another source you can extract `title` from, for example `og:title` HTML meta of a `webpage`. In this case you can provide a fallback scenario:
|
||||
|
||||
```python
|
||||
title = meta.get('title') or self._og_search_title(webpage)
|
||||
```
|
||||
|
||||
This code will try to extract from `meta` first and if it fails it will try extracting `og:title` from a `webpage`.
|
||||
|
||||
### Make regular expressions flexible
|
||||
|
||||
When using regular expressions try to write them fuzzy and flexible.
|
||||
|
||||
#### Example
|
||||
|
||||
Say you need to extract `title` from the following HTML code:
|
||||
|
||||
```html
|
||||
<span style="position: absolute; left: 910px; width: 90px; float: right; z-index: 9999;" class="title">some fancy title</span>
|
||||
```
|
||||
|
||||
The code for that task should look similar to:
|
||||
|
||||
```python
|
||||
title = self._search_regex(
|
||||
r'<span[^>]+class="title"[^>]*>([^<]+)', webpage, 'title')
|
||||
```
|
||||
|
||||
Or even better:
|
||||
|
||||
```python
|
||||
title = self._search_regex(
|
||||
r'<span[^>]+class=(["\'])title\1[^>]*>(?P<title>[^<]+)',
|
||||
webpage, 'title', group='title')
|
||||
```
|
||||
|
||||
Note how you tolerate potential changes in `style` attribute's value or switch from using double quotes to single for `class` attribute:
|
||||
|
||||
The code definitely should not look like:
|
||||
|
||||
```python
|
||||
title = self._search_regex(
|
||||
r'<span style="position: absolute; left: 910px; width: 90px; float: right; z-index: 9999;" class="title">(.*?)</span>',
|
||||
webpage, 'title', group='title')
|
||||
```
|
||||
|
||||
### Use safe conversion functions
|
||||
|
||||
Wrap all extracted numeric data into safe functions from `utils`: `int_or_none`, `float_or_none`. Use them for string to number conversions as well.
|
||||
|
||||
|
154
README.md
154
README.md
@ -424,7 +424,7 @@ which means you can modify it, redistribute it or use it however you like.
|
||||
|
||||
# CONFIGURATION
|
||||
|
||||
You can configure youtube-dl by placing any supported command line option to a configuration file. On Linux and OS X, the system wide configuration file is located at `/etc/youtube-dl.conf` and the user wide configuration file at `~/.config/youtube-dl/config`. On Windows, the user wide configuration file locations are `%APPDATA%\youtube-dl\config.txt` or `C:\Users\<user name>\youtube-dl.conf`.
|
||||
You can configure youtube-dl by placing any supported command line option to a configuration file. On Linux and OS X, the system wide configuration file is located at `/etc/youtube-dl.conf` and the user wide configuration file at `~/.config/youtube-dl/config`. On Windows, the user wide configuration file locations are `%APPDATA%\youtube-dl\config.txt` or `C:\Users\<user name>\youtube-dl.conf`. Note that by default configuration file may not exist so you may need to create it yourself.
|
||||
|
||||
For example, with the following configuration file youtube-dl will always extract the audio, not copy the mtime, use a proxy and save all videos under `Movies` directory in your home directory:
|
||||
```
|
||||
@ -890,9 +890,17 @@ If you want to add support for a new site, first of all **make sure** this site
|
||||
After you have ensured this site is distributing it's content legally, you can follow this quick list (assuming your service is called `yourextractor`):
|
||||
|
||||
1. [Fork this repository](https://github.com/rg3/youtube-dl/fork)
|
||||
2. Check out the source code with `git clone git@github.com:YOUR_GITHUB_USERNAME/youtube-dl.git`
|
||||
3. Start a new git branch with `cd youtube-dl; git checkout -b yourextractor`
|
||||
2. Check out the source code with:
|
||||
|
||||
git clone git@github.com:YOUR_GITHUB_USERNAME/youtube-dl.git
|
||||
|
||||
3. Start a new git branch with
|
||||
|
||||
cd youtube-dl
|
||||
git checkout -b yourextractor
|
||||
|
||||
4. Start with this simple template and save it to `youtube_dl/extractor/yourextractor.py`:
|
||||
|
||||
```python
|
||||
# coding: utf-8
|
||||
from __future__ import unicode_literals
|
||||
@ -936,19 +944,151 @@ After you have ensured this site is distributing it's content legally, you can f
|
||||
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
|
||||
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc.
|
||||
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L74-L252). Add tests and code for as many as you want.
|
||||
8. Keep in mind that the only mandatory fields in info dict for successful extraction process are `id`, `title` and either `url` or `formats`, i.e. these are the critical data the extraction does not make any sense without. This means that [any field](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L148-L252) apart from aforementioned mandatory ones should be treated **as optional** and extraction should be **tolerate** to situations when sources for these fields can potentially be unavailable (even if they always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields. For example, if you have some intermediate dict `meta` that is a source of metadata and it has a key `summary` that you want to extract and put into resulting info dict as `description`, you should be ready that this key may be missing from the `meta` dict, i.e. you should extract it as `meta.get('summary')` and not `meta['summary']`. Similarly, you should pass `fatal=False` when extracting data from a webpage with `_search_regex/_html_search_regex`.
|
||||
9. Check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](http://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+.
|
||||
10. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
|
||||
8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://pypi.python.org/pypi/flake8). Also make sure your code works under all [Python](http://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+.
|
||||
9. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
|
||||
|
||||
$ git add youtube_dl/extractor/extractors.py
|
||||
$ git add youtube_dl/extractor/yourextractor.py
|
||||
$ git commit -m '[yourextractor] Add new extractor'
|
||||
$ git push origin yourextractor
|
||||
|
||||
11. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it.
|
||||
10. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it.
|
||||
|
||||
In any case, thank you very much for your contributions!
|
||||
|
||||
## youtube-dl coding conventions
|
||||
|
||||
This section introduces a guide lines for writing idiomatic, robust and future-proof extractor code.
|
||||
|
||||
Extractors are very fragile by nature since they depend on the layout of the source data provided by 3rd party media hoster out of your control and this layout tend to change. As an extractor implementer your task is not only to write code that will extract media links and metadata correctly but also to minimize code dependency on source's layout changes and even to make the code foresee potential future changes and be ready for that. This is important because it will allow extractor not to break on minor layout changes thus keeping old youtube-dl versions working. Even though this breakage issue is easily fixed by emitting a new version of youtube-dl with fix incorporated all the previous version become broken in all repositories and distros' packages that may not be so prompt in fetching the update from us. Needless to say some may never receive an update at all that is possible for non rolling release distros.
|
||||
|
||||
### Mandatory and optional metafields
|
||||
|
||||
For extraction to work youtube-dl relies on metadata your extractor extracts and provides to youtube-dl expressed by [information dictionary](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L75-L257) or simply *info dict*. Only the following meta fields in *info dict* are considered mandatory for successful extraction process by youtube-dl:
|
||||
|
||||
- `id` (media identifier)
|
||||
- `title` (media title)
|
||||
- `url` (media download URL) or `formats`
|
||||
|
||||
In fact only the last option is technically mandatory (i.e. if you can't figure out the download location of the media the extraction does not make any sense). But by convention youtube-dl also treats `id` and `title` to be mandatory. Thus aforementioned metafields are the critical data the extraction does not make any sense without and if any of them fail to be extracted then extractor is considered completely broken.
|
||||
|
||||
[Any field](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L149-L257) apart from the aforementioned ones are considered **optional**. That means that extraction should be **tolerate** to situations when sources for these fields can potentially be unavailable (even if they are always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields.
|
||||
|
||||
#### Example
|
||||
|
||||
Say you have some source dictionary `meta` that you've fetched as JSON with HTTP request and it has a key `summary`:
|
||||
|
||||
```python
|
||||
meta = self._download_json(url, video_id)
|
||||
```
|
||||
|
||||
Assume at this point `meta`'s layout is:
|
||||
|
||||
```python
|
||||
{
|
||||
...
|
||||
"summary": "some fancy summary text",
|
||||
...
|
||||
}
|
||||
```
|
||||
|
||||
Assume you want to extract `summary` and put into resulting info dict as `description`. Since `description` is optional metafield you should be ready that this key may be missing from the `meta` dict, so that you should extract it like:
|
||||
|
||||
```python
|
||||
description = meta.get('summary') # correct
|
||||
```
|
||||
|
||||
and not like:
|
||||
|
||||
```python
|
||||
description = meta['summary'] # incorrect
|
||||
```
|
||||
|
||||
The latter will break extraction process with `KeyError` if `summary` disappears from `meta` at some time later but with former approach extraction will just go ahead with `description` set to `None` that is perfectly fine (remember `None` is equivalent for absence of data).
|
||||
|
||||
Similarly, you should pass `fatal=False` when extracting optional data from a webpage with `_search_regex`, `_html_search_regex` or similar methods, for instance:
|
||||
|
||||
```python
|
||||
description = self._search_regex(
|
||||
r'<span[^>]+id="title"[^>]*>([^<]+)<',
|
||||
webpage, 'description', fatal=False)
|
||||
```
|
||||
|
||||
With `fatal` set to `False` if `_search_regex` fails to extract `description` it will emit a warning and continue extraction.
|
||||
|
||||
You can also pass `default=<some fallback value>`, for example:
|
||||
|
||||
```python
|
||||
description = self._search_regex(
|
||||
r'<span[^>]+id="title"[^>]*>([^<]+)<',
|
||||
webpage, 'description', default=None)
|
||||
```
|
||||
|
||||
On failure this code will silently continue the extraction with `description` set to `None`. That is useful for metafields that are known to may or may not be present.
|
||||
|
||||
### Provide fallbacks
|
||||
|
||||
When extracting metadata try to provide several scenarios for that. For example if `title` is present in several places/sources try extracting from at least some of them. This would make it more future-proof in case some of the sources became unavailable.
|
||||
|
||||
#### Example
|
||||
|
||||
Say `meta` from previous example has a `title` and you are about to extract it. Since `title` is mandatory meta field you should end up with something like:
|
||||
|
||||
```python
|
||||
title = meta['title']
|
||||
```
|
||||
|
||||
If `title` disappeares from `meta` in future due to some changes on hoster's side the extraction would fail since `title` is mandatory. That's expected.
|
||||
|
||||
Assume that you have some another source you can extract `title` from, for example `og:title` HTML meta of a `webpage`. In this case you can provide a fallback scenario:
|
||||
|
||||
```python
|
||||
title = meta.get('title') or self._og_search_title(webpage)
|
||||
```
|
||||
|
||||
This code will try to extract from `meta` first and if it fails it will try extracting `og:title` from a `webpage`.
|
||||
|
||||
### Make regular expressions flexible
|
||||
|
||||
When using regular expressions try to write them fuzzy and flexible.
|
||||
|
||||
#### Example
|
||||
|
||||
Say you need to extract `title` from the following HTML code:
|
||||
|
||||
```html
|
||||
<span style="position: absolute; left: 910px; width: 90px; float: right; z-index: 9999;" class="title">some fancy title</span>
|
||||
```
|
||||
|
||||
The code for that task should look similar to:
|
||||
|
||||
```python
|
||||
title = self._search_regex(
|
||||
r'<span[^>]+class="title"[^>]*>([^<]+)', webpage, 'title')
|
||||
```
|
||||
|
||||
Or even better:
|
||||
|
||||
```python
|
||||
title = self._search_regex(
|
||||
r'<span[^>]+class=(["\'])title\1[^>]*>(?P<title>[^<]+)',
|
||||
webpage, 'title', group='title')
|
||||
```
|
||||
|
||||
Note how you tolerate potential changes in `style` attribute's value or switch from using double quotes to single for `class` attribute:
|
||||
|
||||
The code definitely should not look like:
|
||||
|
||||
```python
|
||||
title = self._search_regex(
|
||||
r'<span style="position: absolute; left: 910px; width: 90px; float: right; z-index: 9999;" class="title">(.*?)</span>',
|
||||
webpage, 'title', group='title')
|
||||
```
|
||||
|
||||
### Use safe conversion functions
|
||||
|
||||
Wrap all extracted numeric data into safe functions from `utils`: `int_or_none`, `float_or_none`. Use them for string to number conversions as well.
|
||||
|
||||
# EMBEDDING YOUTUBE-DL
|
||||
|
||||
youtube-dl makes the best effort to be a good command-line program, and thus should be callable from any programming language. If you encounter any problems parsing its output, feel free to [create a report](https://github.com/rg3/youtube-dl/issues/new).
|
||||
|
41
devscripts/show-downloads-statistics.py
Normal file
41
devscripts/show-downloads-statistics.py
Normal file
@ -0,0 +1,41 @@
|
||||
#!/usr/bin/env python
|
||||
from __future__ import unicode_literals
|
||||
|
||||
import json
|
||||
import os
|
||||
import re
|
||||
import sys
|
||||
|
||||
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
|
||||
|
||||
from youtube_dl.compat import (
|
||||
compat_print,
|
||||
compat_urllib_request,
|
||||
)
|
||||
from youtube_dl.utils import format_bytes
|
||||
|
||||
|
||||
def format_size(bytes):
|
||||
return '%s (%d bytes)' % (format_bytes(bytes), bytes)
|
||||
|
||||
|
||||
total_bytes = 0
|
||||
|
||||
releases = json.loads(compat_urllib_request.urlopen(
|
||||
'https://api.github.com/repos/rg3/youtube-dl/releases').read().decode('utf-8'))
|
||||
|
||||
for release in releases:
|
||||
compat_print(release['name'])
|
||||
for asset in release['assets']:
|
||||
asset_name = asset['name']
|
||||
total_bytes += asset['download_count'] * asset['size']
|
||||
if all(not re.match(p, asset_name) for p in (
|
||||
r'^youtube-dl$',
|
||||
r'^youtube-dl-\d{4}\.\d{2}\.\d{2}(?:\.\d+)?\.tar\.gz$',
|
||||
r'^youtube-dl\.exe$')):
|
||||
continue
|
||||
compat_print(
|
||||
' %s size: %s downloads: %d'
|
||||
% (asset_name, format_size(asset['size']), asset['download_count']))
|
||||
|
||||
compat_print('total downloads traffic: %s' % format_size(total_bytes))
|
@ -242,6 +242,7 @@
|
||||
- **FreeVideo**
|
||||
- **Funimation**
|
||||
- **FunnyOrDie**
|
||||
- **Fusion**
|
||||
- **GameInformer**
|
||||
- **Gamekings**
|
||||
- **GameOne**
|
||||
@ -282,6 +283,8 @@
|
||||
- **HotStar**
|
||||
- **Howcast**
|
||||
- **HowStuffWorks**
|
||||
- **HRTi**
|
||||
- **HRTiPlaylist**
|
||||
- **HuffPost**: Huffington Post
|
||||
- **Hypem**
|
||||
- **Iconosquare**
|
||||
@ -328,7 +331,7 @@
|
||||
- **kuwo:mv**: 酷我音乐 - MV
|
||||
- **kuwo:singer**: 酷我音乐 - 歌手
|
||||
- **kuwo:song**: 酷我音乐
|
||||
- **la7.tv**
|
||||
- **la7.it**
|
||||
- **Laola1Tv**
|
||||
- **Le**: 乐视网
|
||||
- **Learnr**
|
||||
@ -508,7 +511,7 @@
|
||||
- **podomatic**
|
||||
- **PolskieRadio**
|
||||
- **PornHd**
|
||||
- **PornHub**
|
||||
- **PornHub**: PornHub and Thumbzilla
|
||||
- **PornHubPlaylist**
|
||||
- **PornHubUserVideos**
|
||||
- **Pornotube**
|
||||
|
@ -138,27 +138,27 @@ class TestProxy(unittest.TestCase):
|
||||
self.proxy_thread.daemon = True
|
||||
self.proxy_thread.start()
|
||||
|
||||
self.cn_proxy = compat_http_server.HTTPServer(
|
||||
('localhost', 0), _build_proxy_handler('cn'))
|
||||
self.cn_port = http_server_port(self.cn_proxy)
|
||||
self.cn_proxy_thread = threading.Thread(target=self.cn_proxy.serve_forever)
|
||||
self.cn_proxy_thread.daemon = True
|
||||
self.cn_proxy_thread.start()
|
||||
self.geo_proxy = compat_http_server.HTTPServer(
|
||||
('localhost', 0), _build_proxy_handler('geo'))
|
||||
self.geo_port = http_server_port(self.geo_proxy)
|
||||
self.geo_proxy_thread = threading.Thread(target=self.geo_proxy.serve_forever)
|
||||
self.geo_proxy_thread.daemon = True
|
||||
self.geo_proxy_thread.start()
|
||||
|
||||
def test_proxy(self):
|
||||
cn_proxy = 'localhost:{0}'.format(self.cn_port)
|
||||
geo_proxy = 'localhost:{0}'.format(self.geo_port)
|
||||
ydl = YoutubeDL({
|
||||
'proxy': 'localhost:{0}'.format(self.port),
|
||||
'cn_verification_proxy': cn_proxy,
|
||||
'geo_verification_proxy': geo_proxy,
|
||||
})
|
||||
url = 'http://foo.com/bar'
|
||||
response = ydl.urlopen(url).read().decode('utf-8')
|
||||
self.assertEqual(response, 'normal: {0}'.format(url))
|
||||
|
||||
req = compat_urllib_request.Request(url)
|
||||
req.add_header('Ytdl-request-proxy', cn_proxy)
|
||||
req.add_header('Ytdl-request-proxy', geo_proxy)
|
||||
response = ydl.urlopen(req).read().decode('utf-8')
|
||||
self.assertEqual(response, 'cn: {0}'.format(url))
|
||||
self.assertEqual(response, 'geo: {0}'.format(url))
|
||||
|
||||
def test_proxy_with_idn(self):
|
||||
ydl = YoutubeDL({
|
||||
|
@ -196,8 +196,8 @@ class YoutubeDL(object):
|
||||
prefer_insecure: Use HTTP instead of HTTPS to retrieve information.
|
||||
At the moment, this is only supported by YouTube.
|
||||
proxy: URL of the proxy server to use
|
||||
cn_verification_proxy: URL of the proxy to use for IP address verification
|
||||
on Chinese sites. (Experimental)
|
||||
geo_verification_proxy: URL of the proxy to use for IP address verification
|
||||
on geo-restricted sites. (Experimental)
|
||||
socket_timeout: Time to wait for unresponsive hosts, in seconds
|
||||
bidi_workaround: Work around buggy terminals without bidirectional text
|
||||
support, using fridibi
|
||||
@ -304,6 +304,11 @@ class YoutubeDL(object):
|
||||
self.params.update(params)
|
||||
self.cache = Cache(self)
|
||||
|
||||
if self.params.get('cn_verification_proxy') is not None:
|
||||
self.report_warning('--cn-verification-proxy is deprecated. Use --geo-verification-proxy instead.')
|
||||
if self.params.get('geo_verification_proxy') is None:
|
||||
self.params['geo_verification_proxy'] = self.params['cn_verification_proxy']
|
||||
|
||||
if params.get('bidi_workaround', False):
|
||||
try:
|
||||
import pty
|
||||
|
@ -382,6 +382,8 @@ def _real_main(argv=None):
|
||||
'external_downloader_args': external_downloader_args,
|
||||
'postprocessor_args': postprocessor_args,
|
||||
'cn_verification_proxy': opts.cn_verification_proxy,
|
||||
'geo_verification_proxy': opts.geo_verification_proxy,
|
||||
|
||||
}
|
||||
|
||||
with YoutubeDL(ydl_opts) as ydl:
|
||||
|
@ -196,6 +196,11 @@ def build_fragments_list(boot_info):
|
||||
first_frag_number = fragment_run_entry_table[0]['first']
|
||||
fragments_counter = itertools.count(first_frag_number)
|
||||
for segment, fragments_count in segment_run_table['segment_run']:
|
||||
# In some live HDS streams (for example Rai), `fragments_count` is
|
||||
# abnormal and causing out-of-memory errors. It's OK to change the
|
||||
# number of fragments for live streams as they are updated periodically
|
||||
if fragments_count == 4294967295 and boot_info['live']:
|
||||
fragments_count = 2
|
||||
for _ in range(fragments_count):
|
||||
res.append((segment, next(fragments_counter)))
|
||||
|
||||
@ -329,7 +334,11 @@ class F4mFD(FragmentFD):
|
||||
|
||||
base_url = compat_urlparse.urljoin(man_url, media.attrib['url'])
|
||||
bootstrap_node = doc.find(_add_ns('bootstrapInfo'))
|
||||
boot_info, bootstrap_url = self._parse_bootstrap_node(bootstrap_node, base_url)
|
||||
# From Adobe F4M 3.0 spec:
|
||||
# The <baseURL> element SHALL be the base URL for all relative
|
||||
# (HTTP-based) URLs in the manifest. If <baseURL> is not present, said
|
||||
# URLs should be relative to the location of the containing document.
|
||||
boot_info, bootstrap_url = self._parse_bootstrap_node(bootstrap_node, man_url)
|
||||
live = boot_info['live']
|
||||
metadata_node = media.find(_add_ns('metadata'))
|
||||
if metadata_node is not None:
|
||||
|
@ -2,7 +2,7 @@ from __future__ import unicode_literals
|
||||
|
||||
import re
|
||||
|
||||
from .common import InfoExtractor
|
||||
from .theplatform import ThePlatformIE
|
||||
from ..utils import (
|
||||
smuggle_url,
|
||||
update_url_query,
|
||||
@ -15,28 +15,15 @@ from ..compat import (
|
||||
)
|
||||
|
||||
|
||||
class AENetworksBaseIE(InfoExtractor):
|
||||
def theplatform_url_result(self, theplatform_url, video_id, query):
|
||||
return {
|
||||
'_type': 'url_transparent',
|
||||
'id': video_id,
|
||||
'url': smuggle_url(
|
||||
update_url_query(theplatform_url, query),
|
||||
{
|
||||
'sig': {
|
||||
'key': 'crazyjava',
|
||||
'secret': 's3cr3t'
|
||||
},
|
||||
'force_smil_url': True
|
||||
}),
|
||||
'ie_key': 'ThePlatform',
|
||||
}
|
||||
class AENetworksBaseIE(ThePlatformIE):
|
||||
_THEPLATFORM_KEY = 'crazyjava'
|
||||
_THEPLATFORM_SECRET = 's3cr3t'
|
||||
|
||||
|
||||
class AENetworksIE(AENetworksBaseIE):
|
||||
IE_NAME = 'aenetworks'
|
||||
IE_DESC = 'A+E Networks: A&E, Lifetime, History.com, FYI Network'
|
||||
_VALID_URL = r'https?://(?:www\.)?(?:(?:history|aetv|mylifetime)\.com|fyi\.tv)/(?:shows/(?P<show_path>[^/]+(?:/[^/]+){0,2})|movies/(?P<movie_display_id>[^/]+)/full-movie)'
|
||||
_VALID_URL = r'https?://(?:www\.)?(?P<domain>(?:history|aetv|mylifetime)\.com|fyi\.tv)/(?:shows/(?P<show_path>[^/]+(?:/[^/]+){0,2})|movies/(?P<movie_display_id>[^/]+)/full-movie)'
|
||||
_TESTS = [{
|
||||
'url': 'http://www.history.com/shows/mountain-men/season-1/episode-1',
|
||||
'md5': '8ff93eb073449f151d6b90c0ae1ef0c7',
|
||||
@ -76,9 +63,15 @@ class AENetworksIE(AENetworksBaseIE):
|
||||
'url': 'http://www.mylifetime.com/movies/center-stage-on-pointe/full-movie',
|
||||
'only_matching': True
|
||||
}]
|
||||
_DOMAIN_TO_REQUESTOR_ID = {
|
||||
'history.com': 'HISTORY',
|
||||
'aetv.com': 'AETV',
|
||||
'mylifetime.com': 'LIFETIME',
|
||||
'fyi.tv': 'FYI',
|
||||
}
|
||||
|
||||
def _real_extract(self, url):
|
||||
show_path, movie_display_id = re.match(self._VALID_URL, url).groups()
|
||||
domain, show_path, movie_display_id = re.match(self._VALID_URL, url).groups()
|
||||
display_id = show_path or movie_display_id
|
||||
webpage = self._download_webpage(url, display_id)
|
||||
if show_path:
|
||||
@ -103,23 +96,39 @@ class AENetworksIE(AENetworksBaseIE):
|
||||
episode_attributes['data-videoid']))
|
||||
return self.playlist_result(
|
||||
entries, self._html_search_meta('aetn:SeasonId', webpage))
|
||||
|
||||
query = {
|
||||
'mbr': 'true',
|
||||
'assetTypes': 'medium_video_s3'
|
||||
}
|
||||
video_id = self._html_search_meta('aetn:VideoID', webpage)
|
||||
media_url = self._search_regex(
|
||||
r"media_url\s*=\s*'([^']+)'", webpage, 'video url')
|
||||
|
||||
info = self._search_json_ld(webpage, video_id, fatal=False)
|
||||
info.update(self.theplatform_url_result(
|
||||
media_url, video_id, {
|
||||
'mbr': 'true',
|
||||
'assetTypes': 'medium_video_s3'
|
||||
}))
|
||||
theplatform_metadata = self._download_theplatform_metadata(self._search_regex(
|
||||
r'https?://link.theplatform.com/s/([^?]+)', media_url, 'theplatform_path'), video_id)
|
||||
info = self._parse_theplatform_metadata(theplatform_metadata)
|
||||
if theplatform_metadata.get('AETN$isBehindWall'):
|
||||
requestor_id = self._DOMAIN_TO_REQUESTOR_ID[domain]
|
||||
resource = '<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title>%s</title><item><title>%s</title><guid>%s</guid><media:rating scheme="urn:v-chip">%s</media:rating></item></channel></rss>' % (requestor_id, theplatform_metadata['title'], theplatform_metadata['AETN$PPL_pplProgramId'], theplatform_metadata['ratings'][0]['rating'])
|
||||
query['auth'] = self._extract_mvpd_auth(
|
||||
url, video_id, requestor_id, resource)
|
||||
info.update(self._search_json_ld(webpage, video_id, fatal=False))
|
||||
media_url = update_url_query(media_url, query)
|
||||
media_url = self._sign_url(media_url, self._THEPLATFORM_KEY, self._THEPLATFORM_SECRET)
|
||||
formats, subtitles = self._extract_theplatform_smil(media_url, video_id)
|
||||
self._sort_formats(formats)
|
||||
info.update({
|
||||
'id': video_id,
|
||||
'formats': formats,
|
||||
'subtitles': subtitles,
|
||||
})
|
||||
return info
|
||||
|
||||
|
||||
class HistoryTopicIE(AENetworksBaseIE):
|
||||
IE_NAME = 'history:topic'
|
||||
IE_DESC = 'History.com Topic'
|
||||
_VALID_URL = r'https?://(?:www\.)?history\.com/topics/(?:[^/]+/)?(?P<topic_id>[^/]+)/videos(?:/(?P<video_display_id>[^/?#]+))?'
|
||||
_VALID_URL = r'https?://(?:www\.)?history\.com/topics/(?:[^/]+/)?(?P<topic_id>[^/]+)(?:/[^/]+(?:/(?P<video_display_id>[^/?#]+))?)?'
|
||||
_TESTS = [{
|
||||
'url': 'http://www.history.com/topics/valentines-day/history-of-valentines-day/videos/bet-you-didnt-know-valentines-day?m=528e394da93ae&s=undefined&f=1&free=false',
|
||||
'info_dict': {
|
||||
@ -147,8 +156,30 @@ class HistoryTopicIE(AENetworksBaseIE):
|
||||
}, {
|
||||
'url': 'http://www.history.com/topics/world-war-i-history/videos',
|
||||
'only_matching': True,
|
||||
}, {
|
||||
'url': 'http://www.history.com/topics/world-war-i/world-war-i-history',
|
||||
'only_matching': True,
|
||||
}, {
|
||||
'url': 'http://www.history.com/topics/world-war-i/world-war-i-history/speeches',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
def theplatform_url_result(self, theplatform_url, video_id, query):
|
||||
return {
|
||||
'_type': 'url_transparent',
|
||||
'id': video_id,
|
||||
'url': smuggle_url(
|
||||
update_url_query(theplatform_url, query),
|
||||
{
|
||||
'sig': {
|
||||
'key': self._THEPLATFORM_KEY,
|
||||
'secret': self._THEPLATFORM_SECRET,
|
||||
},
|
||||
'force_smil_url': True
|
||||
}),
|
||||
'ie_key': 'ThePlatform',
|
||||
}
|
||||
|
||||
def _real_extract(self, url):
|
||||
topic_id, video_display_id = re.match(self._VALID_URL, url).groups()
|
||||
if video_display_id:
|
||||
|
@ -585,6 +585,13 @@ class BrightcoveNewIE(InfoExtractor):
|
||||
'format_id': build_format_id('rtmp'),
|
||||
})
|
||||
formats.append(f)
|
||||
|
||||
errors = json_data.get('errors')
|
||||
if not formats and errors:
|
||||
error = errors[0]
|
||||
raise ExtractorError(
|
||||
error.get('message') or error.get('error_subcode') or error['error_code'], expected=True)
|
||||
|
||||
self._sort_formats(formats)
|
||||
|
||||
subtitles = {}
|
||||
|
@ -5,6 +5,7 @@ import json
|
||||
import re
|
||||
|
||||
from .common import InfoExtractor
|
||||
from .facebook import FacebookIE
|
||||
|
||||
|
||||
class BuzzFeedIE(InfoExtractor):
|
||||
@ -20,11 +21,11 @@ class BuzzFeedIE(InfoExtractor):
|
||||
'info_dict': {
|
||||
'id': 'aVCR29aE_OQ',
|
||||
'ext': 'mp4',
|
||||
'title': 'Angry Ram destroys a punching bag..',
|
||||
'description': 'md5:c59533190ef23fd4458a5e8c8c872345',
|
||||
'upload_date': '20141024',
|
||||
'uploader_id': 'Buddhanz1',
|
||||
'description': 'He likes to stay in shape with his heavy bag, he wont stop until its on the ground\n\nFollow Angry Ram on Facebook for regular updates -\nhttps://www.facebook.com/pages/Angry-Ram/1436897249899558?ref=hl',
|
||||
'uploader': 'Buddhanz',
|
||||
'title': 'Angry Ram destroys a punching bag',
|
||||
'uploader': 'Angry Ram',
|
||||
}
|
||||
}]
|
||||
}, {
|
||||
@ -41,13 +42,30 @@ class BuzzFeedIE(InfoExtractor):
|
||||
'info_dict': {
|
||||
'id': 'mVmBL8B-In0',
|
||||
'ext': 'mp4',
|
||||
'title': 're:Munchkin the Teddy Bear gets her exercise',
|
||||
'description': 'md5:28faab95cda6e361bcff06ec12fc21d8',
|
||||
'upload_date': '20141124',
|
||||
'uploader_id': 'CindysMunchkin',
|
||||
'description': 're:© 2014 Munchkin the',
|
||||
'uploader': 're:^Munchkin the',
|
||||
'title': 're:Munchkin the Teddy Bear gets her exercise',
|
||||
},
|
||||
}]
|
||||
}, {
|
||||
'url': 'http://www.buzzfeed.com/craigsilverman/the-most-adorable-crash-landing-ever#.eq7pX0BAmK',
|
||||
'info_dict': {
|
||||
'id': 'the-most-adorable-crash-landing-ever',
|
||||
'title': 'Watch This Baby Goose Make The Most Adorable Crash Landing',
|
||||
'description': 'This gosling knows how to stick a landing.',
|
||||
},
|
||||
'playlist': [{
|
||||
'md5': '763ca415512f91ca62e4621086900a23',
|
||||
'info_dict': {
|
||||
'id': '971793786185728',
|
||||
'ext': 'mp4',
|
||||
'title': 'We set up crash pads so that the goslings on our roof would have a safe landi...',
|
||||
'uploader': 'Calgary Outdoor Centre-University of Calgary',
|
||||
},
|
||||
}],
|
||||
'add_ie': ['Facebook'],
|
||||
}]
|
||||
|
||||
def _real_extract(self, url):
|
||||
@ -66,6 +84,10 @@ class BuzzFeedIE(InfoExtractor):
|
||||
continue
|
||||
entries.append(self.url_result(video['url']))
|
||||
|
||||
facebook_url = FacebookIE._extract_url(webpage)
|
||||
if facebook_url:
|
||||
entries.append(self.url_result(facebook_url))
|
||||
|
||||
return {
|
||||
'_type': 'playlist',
|
||||
'id': playlist_id,
|
||||
|
@ -80,9 +80,6 @@ class CBSInteractiveIE(ThePlatformIE):
|
||||
|
||||
media_guid_path = 'media/guid/%d/%s' % (self.MPX_ACCOUNTS[site], vdata['mpxRefId'])
|
||||
formats, subtitles = [], {}
|
||||
if site == 'cnet':
|
||||
formats, subtitles = self._extract_theplatform_smil(
|
||||
self.TP_RELEASE_URL_TEMPLATE % media_guid_path, video_id)
|
||||
for (fkey, vid) in vdata['files'].items():
|
||||
if fkey == 'hls_phone' and 'hls_tablet' in vdata['files']:
|
||||
continue
|
||||
@ -94,7 +91,7 @@ class CBSInteractiveIE(ThePlatformIE):
|
||||
subtitles = self._merge_subtitles(subtitles, tp_subtitles)
|
||||
self._sort_formats(formats)
|
||||
|
||||
info = self.get_metadata('kYEXFC/%s' % media_guid_path, video_id)
|
||||
info = self._extract_theplatform_metadata('kYEXFC/%s' % media_guid_path, video_id)
|
||||
info.update({
|
||||
'id': video_id,
|
||||
'display_id': display_id,
|
||||
|
@ -1729,6 +1729,13 @@ class InfoExtractor(object):
|
||||
def _mark_watched(self, *args, **kwargs):
|
||||
raise NotImplementedError('This method must be implemented by subclasses')
|
||||
|
||||
def geo_verification_headers(self):
|
||||
headers = {}
|
||||
geo_verification_proxy = self._downloader.params.get('geo_verification_proxy')
|
||||
if geo_verification_proxy:
|
||||
headers['Ytdl-request-proxy'] = geo_verification_proxy
|
||||
return headers
|
||||
|
||||
|
||||
class SearchInfoExtractor(InfoExtractor):
|
||||
"""
|
||||
|
@ -281,6 +281,7 @@ from .freespeech import FreespeechIE
|
||||
from .freevideo import FreeVideoIE
|
||||
from .funimation import FunimationIE
|
||||
from .funnyordie import FunnyOrDieIE
|
||||
from .fusion import FusionIE
|
||||
from .gameinformer import GameInformerIE
|
||||
from .gamekings import GamekingsIE
|
||||
from .gameone import (
|
||||
@ -325,6 +326,10 @@ from .hotnewhiphop import HotNewHipHopIE
|
||||
from .hotstar import HotStarIE
|
||||
from .howcast import HowcastIE
|
||||
from .howstuffworks import HowStuffWorksIE
|
||||
from .hrti import (
|
||||
HRTiIE,
|
||||
HRTiPlaylistIE,
|
||||
)
|
||||
from .huffpost import HuffPostIE
|
||||
from .hypem import HypemIE
|
||||
from .iconosquare import IconosquareIE
|
||||
|
@ -129,6 +129,21 @@ class FacebookIE(InfoExtractor):
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
@staticmethod
|
||||
def _extract_url(webpage):
|
||||
mobj = re.search(
|
||||
r'<iframe[^>]+?src=(["\'])(?P<url>https://www\.facebook\.com/video/embed.+?)\1', webpage)
|
||||
if mobj is not None:
|
||||
return mobj.group('url')
|
||||
|
||||
# Facebook API embed
|
||||
# see https://developers.facebook.com/docs/plugins/embedded-video-player
|
||||
mobj = re.search(r'''(?x)<div[^>]+
|
||||
class=(?P<q1>[\'"])[^\'"]*\bfb-(?:video|post)\b[^\'"]*(?P=q1)[^>]+
|
||||
data-href=(?P<q2>[\'"])(?P<url>(?:https?:)?//(?:www\.)?facebook.com/.+?)(?P=q2)''', webpage)
|
||||
if mobj is not None:
|
||||
return mobj.group('url')
|
||||
|
||||
def _login(self):
|
||||
(useremail, password) = self._get_login_info()
|
||||
if useremail is None:
|
||||
|
35
youtube_dl/extractor/fusion.py
Normal file
35
youtube_dl/extractor/fusion.py
Normal file
@ -0,0 +1,35 @@
|
||||
from __future__ import unicode_literals
|
||||
|
||||
from .common import InfoExtractor
|
||||
from .ooyala import OoyalaIE
|
||||
|
||||
|
||||
class FusionIE(InfoExtractor):
|
||||
_VALID_URL = r'https?://(?:www\.)?fusion\.net/video/(?P<id>\d+)'
|
||||
_TESTS = [{
|
||||
'url': 'http://fusion.net/video/201781/u-s-and-panamanian-forces-work-together-to-stop-a-vessel-smuggling-drugs/',
|
||||
'info_dict': {
|
||||
'id': 'ZpcWNoMTE6x6uVIIWYpHh0qQDjxBuq5P',
|
||||
'ext': 'mp4',
|
||||
'title': 'U.S. and Panamanian forces work together to stop a vessel smuggling drugs',
|
||||
'description': 'md5:0cc84a9943c064c0f46b128b41b1b0d7',
|
||||
'duration': 140.0,
|
||||
},
|
||||
'params': {
|
||||
'skip_download': True,
|
||||
},
|
||||
'add_ie': ['Ooyala'],
|
||||
}, {
|
||||
'url': 'http://fusion.net/video/201781',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
def _real_extract(self, url):
|
||||
display_id = self._match_id(url)
|
||||
webpage = self._download_webpage(url, display_id)
|
||||
|
||||
ooyala_code = self._search_regex(
|
||||
r'data-video-id=(["\'])(?P<code>.+?)\1',
|
||||
webpage, 'ooyala code', group='code')
|
||||
|
||||
return OoyalaIE._build_url_result(ooyala_code)
|
@ -66,6 +66,7 @@ from .theplatform import ThePlatformIE
|
||||
from .vessel import VesselIE
|
||||
from .kaltura import KalturaIE
|
||||
from .eagleplatform import EaglePlatformIE
|
||||
from .facebook import FacebookIE
|
||||
|
||||
|
||||
class GenericIE(InfoExtractor):
|
||||
@ -1260,7 +1261,40 @@ class GenericIE(InfoExtractor):
|
||||
'uploader': 'TheAtlantic',
|
||||
},
|
||||
'add_ie': ['BrightcoveLegacy'],
|
||||
}
|
||||
},
|
||||
# Facebook <iframe> embed
|
||||
{
|
||||
'url': 'https://www.hostblogger.de/blog/archives/6181-Auto-jagt-Betonmischer.html',
|
||||
'md5': 'fbcde74f534176ecb015849146dd3aee',
|
||||
'info_dict': {
|
||||
'id': '599637780109885',
|
||||
'ext': 'mp4',
|
||||
'title': 'Facebook video #599637780109885',
|
||||
},
|
||||
},
|
||||
# Facebook API embed
|
||||
{
|
||||
'url': 'http://www.lothype.com/blue-stars-2016-preview-standstill-full-show/',
|
||||
'md5': 'a47372ee61b39a7b90287094d447d94e',
|
||||
'info_dict': {
|
||||
'id': '10153467542406923',
|
||||
'ext': 'mp4',
|
||||
'title': 'Facebook video #10153467542406923',
|
||||
},
|
||||
},
|
||||
# Wordpress "YouTube Video Importer" plugin
|
||||
{
|
||||
'url': 'http://www.lothype.com/blue-devils-drumline-stanford-lot-2016/',
|
||||
'md5': 'd16797741b560b485194eddda8121b48',
|
||||
'info_dict': {
|
||||
'id': 'HNTXWDXV9Is',
|
||||
'ext': 'mp4',
|
||||
'title': 'Blue Devils Drumline Stanford lot 2016',
|
||||
'upload_date': '20160627',
|
||||
'uploader_id': 'GENOCIDE8GENERAL10',
|
||||
'uploader': 'cylus cyrus',
|
||||
},
|
||||
},
|
||||
]
|
||||
|
||||
def report_following_redirect(self, new_url):
|
||||
@ -1617,6 +1651,13 @@ class GenericIE(InfoExtractor):
|
||||
if matches:
|
||||
return _playlist_from_matches(matches, lambda m: unescapeHTML(m))
|
||||
|
||||
# Look for Wordpress "YouTube Video Importer" plugin
|
||||
matches = re.findall(r'''(?x)<div[^>]+
|
||||
class=(?P<q1>[\'"])[^\'"]*\byvii_single_video_player\b[^\'"]*(?P=q1)[^>]+
|
||||
data-video_id=(?P<q2>[\'"])([^\'"]+)(?P=q2)''', webpage)
|
||||
if matches:
|
||||
return _playlist_from_matches(matches, lambda m: m[-1])
|
||||
|
||||
# Look for embedded Dailymotion player
|
||||
matches = re.findall(
|
||||
r'<(?:(?:embed|iframe)[^>]+?src=|input[^>]+id=[\'"]dmcloudUrlEmissionSelect[\'"][^>]+value=)(["\'])(?P<url>(?:https?:)?//(?:www\.)?dailymotion\.com/(?:embed|swf)/video/.+?)\1', webpage)
|
||||
@ -1759,10 +1800,9 @@ class GenericIE(InfoExtractor):
|
||||
return self.url_result(mobj.group('url'))
|
||||
|
||||
# Look for embedded Facebook player
|
||||
mobj = re.search(
|
||||
r'<iframe[^>]+?src=(["\'])(?P<url>https://www\.facebook\.com/video/embed.+?)\1', webpage)
|
||||
if mobj is not None:
|
||||
return self.url_result(mobj.group('url'), 'Facebook')
|
||||
facebook_url = FacebookIE._extract_url(webpage)
|
||||
if facebook_url is not None:
|
||||
return self.url_result(facebook_url, 'Facebook')
|
||||
|
||||
# Look for embedded VK player
|
||||
mobj = re.search(r'<iframe[^>]+?src=(["\'])(?P<url>https?://vk\.com/video_ext\.php.+?)\1', webpage)
|
||||
|
202
youtube_dl/extractor/hrti.py
Normal file
202
youtube_dl/extractor/hrti.py
Normal file
@ -0,0 +1,202 @@
|
||||
# coding: utf-8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
import json
|
||||
import re
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..compat import compat_HTTPError
|
||||
from ..utils import (
|
||||
clean_html,
|
||||
ExtractorError,
|
||||
int_or_none,
|
||||
parse_age_limit,
|
||||
sanitized_Request,
|
||||
try_get,
|
||||
)
|
||||
|
||||
|
||||
class HRTiBaseIE(InfoExtractor):
|
||||
"""
|
||||
Base Information Extractor for Croatian Radiotelevision
|
||||
video on demand site https://hrti.hrt.hr
|
||||
Reverse engineered from the JavaScript app in app.min.js
|
||||
"""
|
||||
_NETRC_MACHINE = 'hrti'
|
||||
|
||||
_APP_LANGUAGE = 'hr'
|
||||
_APP_VERSION = '1.1'
|
||||
_APP_PUBLICATION_ID = 'all_in_one'
|
||||
_API_URL = 'http://clientapi.hrt.hr/client_api.php/config/identify/format/json'
|
||||
|
||||
def _initialize_api(self):
|
||||
init_data = {
|
||||
'application_publication_id': self._APP_PUBLICATION_ID
|
||||
}
|
||||
|
||||
uuid = self._download_json(
|
||||
self._API_URL, None, note='Downloading uuid',
|
||||
errnote='Unable to download uuid',
|
||||
data=json.dumps(init_data).encode('utf-8'))['uuid']
|
||||
|
||||
app_data = {
|
||||
'uuid': uuid,
|
||||
'application_publication_id': self._APP_PUBLICATION_ID,
|
||||
'application_version': self._APP_VERSION
|
||||
}
|
||||
|
||||
req = sanitized_Request(self._API_URL, data=json.dumps(app_data).encode('utf-8'))
|
||||
req.get_method = lambda: 'PUT'
|
||||
|
||||
resources = self._download_json(
|
||||
req, None, note='Downloading session information',
|
||||
errnote='Unable to download session information')
|
||||
|
||||
self._session_id = resources['session_id']
|
||||
|
||||
modules = resources['modules']
|
||||
|
||||
self._search_url = modules['vod_catalog']['resources']['search']['uri'].format(
|
||||
language=self._APP_LANGUAGE,
|
||||
application_id=self._APP_PUBLICATION_ID)
|
||||
|
||||
self._login_url = (modules['user']['resources']['login']['uri'] +
|
||||
'/format/json').format(session_id=self._session_id)
|
||||
|
||||
self._logout_url = modules['user']['resources']['logout']['uri']
|
||||
|
||||
def _login(self):
|
||||
(username, password) = self._get_login_info()
|
||||
# TODO: figure out authentication with cookies
|
||||
if username is None or password is None:
|
||||
self.raise_login_required()
|
||||
|
||||
auth_data = {
|
||||
'username': username,
|
||||
'password': password,
|
||||
}
|
||||
|
||||
try:
|
||||
auth_info = self._download_json(
|
||||
self._login_url, None, note='Logging in', errnote='Unable to log in',
|
||||
data=json.dumps(auth_data).encode('utf-8'))
|
||||
except ExtractorError as e:
|
||||
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 406:
|
||||
auth_info = self._parse_json(e.cause.read().encode('utf-8'), None)
|
||||
else:
|
||||
raise
|
||||
|
||||
error_message = auth_info.get('error', {}).get('message')
|
||||
if error_message:
|
||||
raise ExtractorError(
|
||||
'%s said: %s' % (self.IE_NAME, error_message),
|
||||
expected=True)
|
||||
|
||||
self._token = auth_info['secure_streaming_token']
|
||||
|
||||
def _real_initialize(self):
|
||||
self._initialize_api()
|
||||
self._login()
|
||||
|
||||
|
||||
class HRTiIE(HRTiBaseIE):
|
||||
_VALID_URL = r'''(?x)
|
||||
(?:
|
||||
hrti:(?P<short_id>[0-9]+)|
|
||||
https?://
|
||||
hrti\.hrt\.hr/\#/video/show/(?P<id>[0-9]+)/(?P<display_id>[^/]+)?
|
||||
)
|
||||
'''
|
||||
_TESTS = [{
|
||||
'url': 'https://hrti.hrt.hr/#/video/show/2181385/republika-dokumentarna-serija-16-hd',
|
||||
'info_dict': {
|
||||
'id': '2181385',
|
||||
'display_id': 'republika-dokumentarna-serija-16-hd',
|
||||
'ext': 'mp4',
|
||||
'title': 'REPUBLIKA, dokumentarna serija (1/6) (HD)',
|
||||
'description': 'md5:48af85f620e8e0e1df4096270568544f',
|
||||
'duration': 2922,
|
||||
'view_count': int,
|
||||
'average_rating': int,
|
||||
'episode_number': int,
|
||||
'season_number': int,
|
||||
'age_limit': 12,
|
||||
},
|
||||
'skip': 'Requires account credentials',
|
||||
}, {
|
||||
'url': 'https://hrti.hrt.hr/#/video/show/2181385/',
|
||||
'only_matching': True,
|
||||
}, {
|
||||
'url': 'hrti:2181385',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
def _real_extract(self, url):
|
||||
mobj = re.match(self._VALID_URL, url)
|
||||
video_id = mobj.group('short_id') or mobj.group('id')
|
||||
display_id = mobj.group('display_id') or video_id
|
||||
|
||||
video = self._download_json(
|
||||
'%s/video_id/%s/format/json' % (self._search_url, video_id),
|
||||
display_id, 'Downloading video metadata JSON')['video'][0]
|
||||
|
||||
title_info = video['title']
|
||||
title = title_info['title_long']
|
||||
|
||||
movie = video['video_assets']['movie'][0]
|
||||
m3u8_url = movie['url'].format(TOKEN=self._token)
|
||||
formats = self._extract_m3u8_formats(
|
||||
m3u8_url, display_id, 'mp4', entry_protocol='m3u8_native',
|
||||
m3u8_id='hls')
|
||||
self._sort_formats(formats)
|
||||
|
||||
description = clean_html(title_info.get('summary_long'))
|
||||
age_limit = parse_age_limit(video.get('parental_control', {}).get('rating'))
|
||||
view_count = int_or_none(video.get('views'))
|
||||
average_rating = int_or_none(video.get('user_rating'))
|
||||
duration = int_or_none(movie.get('duration'))
|
||||
|
||||
return {
|
||||
'id': video_id,
|
||||
'display_id': display_id,
|
||||
'title': title,
|
||||
'description': description,
|
||||
'duration': duration,
|
||||
'view_count': view_count,
|
||||
'average_rating': average_rating,
|
||||
'age_limit': age_limit,
|
||||
'formats': formats,
|
||||
}
|
||||
|
||||
|
||||
class HRTiPlaylistIE(HRTiBaseIE):
|
||||
_VALID_URL = r'https?://hrti.hrt.hr/#/video/list/category/(?P<id>[0-9]+)/(?P<display_id>[^/]+)?'
|
||||
_TESTS = [{
|
||||
'url': 'https://hrti.hrt.hr/#/video/list/category/212/ekumena',
|
||||
'info_dict': {
|
||||
'id': '212',
|
||||
'title': 'ekumena',
|
||||
},
|
||||
'playlist_mincount': 8,
|
||||
'skip': 'Requires account credentials',
|
||||
}, {
|
||||
'url': 'https://hrti.hrt.hr/#/video/list/category/212/',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
def _real_extract(self, url):
|
||||
mobj = re.match(self._VALID_URL, url)
|
||||
category_id = mobj.group('id')
|
||||
display_id = mobj.group('display_id') or category_id
|
||||
|
||||
response = self._download_json(
|
||||
'%s/category_id/%s/format/json' % (self._search_url, category_id),
|
||||
display_id, 'Downloading video metadata JSON')
|
||||
|
||||
video_ids = try_get(
|
||||
response, lambda x: x['video_listings'][0]['alternatives'][0]['list'],
|
||||
list) or [video['id'] for video in response.get('videos', []) if video.get('id')]
|
||||
|
||||
entries = [self.url_result('hrti:%s' % video_id) for video_id in video_ids]
|
||||
|
||||
return self.playlist_result(entries, category_id, display_id)
|
@ -1,10 +1,8 @@
|
||||
# coding: utf-8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
import binascii
|
||||
import hashlib
|
||||
import itertools
|
||||
import math
|
||||
import re
|
||||
import time
|
||||
|
||||
@ -14,12 +12,13 @@ from ..compat import (
|
||||
compat_urllib_parse_urlencode,
|
||||
)
|
||||
from ..utils import (
|
||||
clean_html,
|
||||
decode_packed_codes,
|
||||
get_element_by_id,
|
||||
get_element_by_attribute,
|
||||
ExtractorError,
|
||||
intlist_to_bytes,
|
||||
ohdave_rsa_encrypt,
|
||||
remove_start,
|
||||
urshift,
|
||||
)
|
||||
|
||||
|
||||
@ -166,7 +165,7 @@ class IqiyiIE(InfoExtractor):
|
||||
|
||||
_TESTS = [{
|
||||
'url': 'http://www.iqiyi.com/v_19rrojlavg.html',
|
||||
'md5': '470a6c160618577166db1a7aac5a3606',
|
||||
# MD5 checksum differs on my machine and Travis CI
|
||||
'info_dict': {
|
||||
'id': '9c1fb1b99d192b21c559e5a1a2cb3c73',
|
||||
'ext': 'mp4',
|
||||
@ -174,11 +173,11 @@ class IqiyiIE(InfoExtractor):
|
||||
}
|
||||
}, {
|
||||
'url': 'http://www.iqiyi.com/v_19rrhnnclk.html',
|
||||
'md5': 'f09f0a6a59b2da66a26bf4eda669a4cc',
|
||||
'md5': '667171934041350c5de3f5015f7f1152',
|
||||
'info_dict': {
|
||||
'id': 'e3f585b550a280af23c98b6cb2be19fb',
|
||||
'ext': 'mp4',
|
||||
'title': '名侦探柯南 国语版',
|
||||
'title': '名侦探柯南 国语版:第752集 迫近灰原秘密的黑影 下篇',
|
||||
},
|
||||
'skip': 'Geo-restricted to China',
|
||||
}, {
|
||||
@ -196,22 +195,10 @@ class IqiyiIE(InfoExtractor):
|
||||
'url': 'http://www.iqiyi.com/v_19rrny4w8w.html',
|
||||
'info_dict': {
|
||||
'id': 'f3cf468b39dddb30d676f89a91200dc1',
|
||||
'ext': 'mp4',
|
||||
'title': '泰坦尼克号',
|
||||
},
|
||||
'playlist': [{
|
||||
'info_dict': {
|
||||
'id': 'f3cf468b39dddb30d676f89a91200dc1_part1',
|
||||
'ext': 'f4v',
|
||||
'title': '泰坦尼克号',
|
||||
},
|
||||
}, {
|
||||
'info_dict': {
|
||||
'id': 'f3cf468b39dddb30d676f89a91200dc1_part2',
|
||||
'ext': 'f4v',
|
||||
'title': '泰坦尼克号',
|
||||
},
|
||||
}],
|
||||
'expected_warnings': ['Needs a VIP account for full video'],
|
||||
'skip': 'Geo-restricted to China',
|
||||
}, {
|
||||
'url': 'http://www.iqiyi.com/a_19rrhb8ce1.html',
|
||||
'info_dict': {
|
||||
@ -224,14 +211,16 @@ class IqiyiIE(InfoExtractor):
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
_FORMATS_MAP = [
|
||||
('1', 'h6'),
|
||||
('2', 'h5'),
|
||||
('3', 'h4'),
|
||||
('4', 'h3'),
|
||||
('5', 'h2'),
|
||||
('10', 'h1'),
|
||||
]
|
||||
_FORMATS_MAP = {
|
||||
'96': 1, # 216p, 240p
|
||||
'1': 2, # 336p, 360p
|
||||
'2': 3, # 480p, 504p
|
||||
'21': 4, # 504p
|
||||
'4': 5, # 720p
|
||||
'17': 5, # 720p
|
||||
'5': 6, # 1072p, 1080p
|
||||
'18': 7, # 1080p
|
||||
}
|
||||
|
||||
def _real_initialize(self):
|
||||
self._login()
|
||||
@ -291,101 +280,23 @@ class IqiyiIE(InfoExtractor):
|
||||
|
||||
return True
|
||||
|
||||
@staticmethod
|
||||
def _gen_sc(tvid, timestamp):
|
||||
M = [1732584193, -271733879]
|
||||
M.extend([~M[0], ~M[1]])
|
||||
I_table = [7, 12, 17, 22, 5, 9, 14, 20, 4, 11, 16, 23, 6, 10, 15, 21]
|
||||
C_base = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8388608, 432]
|
||||
|
||||
def L(n, t):
|
||||
if t is None:
|
||||
t = 0
|
||||
return trunc(((n >> 1) + (t >> 1) << 1) + (n & 1) + (t & 1))
|
||||
|
||||
def trunc(n):
|
||||
n = n % 0x100000000
|
||||
if n > 0x7fffffff:
|
||||
n -= 0x100000000
|
||||
return n
|
||||
|
||||
def transform(string, mod):
|
||||
num = int(string, 16)
|
||||
return (num >> 8 * (i % 4) & 255 ^ i % mod) << ((a & 3) << 3)
|
||||
|
||||
C = list(C_base)
|
||||
o = list(M)
|
||||
k = str(timestamp - 7)
|
||||
for i in range(13):
|
||||
a = i
|
||||
C[a >> 2] |= ord(k[a]) << 8 * (a % 4)
|
||||
|
||||
for i in range(16):
|
||||
a = i + 13
|
||||
start = (i >> 2) * 8
|
||||
r = '03967743b643f66763d623d637e30733'
|
||||
C[a >> 2] |= transform(''.join(reversed(r[start:start + 8])), 7)
|
||||
|
||||
for i in range(16):
|
||||
a = i + 29
|
||||
start = (i >> 2) * 8
|
||||
r = '7038766939776a32776a32706b337139'
|
||||
C[a >> 2] |= transform(r[start:start + 8], 1)
|
||||
|
||||
for i in range(9):
|
||||
a = i + 45
|
||||
if i < len(tvid):
|
||||
C[a >> 2] |= ord(tvid[i]) << 8 * (a % 4)
|
||||
|
||||
for a in range(64):
|
||||
i = a
|
||||
I = i >> 4
|
||||
C_index = [i, 5 * i + 1, 3 * i + 5, 7 * i][I] % 16 + urshift(a, 6)
|
||||
m = L(L(o[0], [
|
||||
trunc(o[1] & o[2]) | trunc(~o[1] & o[3]),
|
||||
trunc(o[3] & o[1]) | trunc(~o[3] & o[2]),
|
||||
o[1] ^ o[2] ^ o[3],
|
||||
o[2] ^ trunc(o[1] | ~o[3])
|
||||
][I]), L(
|
||||
trunc(int(abs(math.sin(i + 1)) * 4294967296)),
|
||||
C[C_index] if C_index < len(C) else None))
|
||||
I = I_table[4 * I + i % 4]
|
||||
o = [o[3],
|
||||
L(o[1], trunc(trunc(m << I) | urshift(m, 32 - I))),
|
||||
o[1],
|
||||
o[2]]
|
||||
|
||||
new_M = [L(o[0], M[0]), L(o[1], M[1]), L(o[2], M[2]), L(o[3], M[3])]
|
||||
s = [new_M[a >> 3] >> (1 ^ a & 7) * 4 & 15 for a in range(32)]
|
||||
return binascii.hexlify(intlist_to_bytes(s))[1::2].decode('ascii')
|
||||
|
||||
def get_raw_data(self, tvid, video_id):
|
||||
tm = int(time.time() * 1000)
|
||||
|
||||
sc = self._gen_sc(tvid, tm)
|
||||
key = 'd5fb4bd9d50c4be6948c97edd7254b0e'
|
||||
sc = md5_text(compat_str(tm) + key + tvid)
|
||||
params = {
|
||||
'platForm': 'h5',
|
||||
'rate': 1,
|
||||
'tvid': tvid,
|
||||
'vid': video_id,
|
||||
'cupid': 'qc_100001_100186',
|
||||
'type': 'mp4',
|
||||
'nolimit': 0,
|
||||
'agenttype': 13,
|
||||
'src': 'd846d0c32d664d32b6b54ea48997a589',
|
||||
'src': '76f90cbd92f94a2e925d83e8ccd22cb7',
|
||||
'sc': sc,
|
||||
't': tm - 7,
|
||||
'__jsT': None,
|
||||
't': tm,
|
||||
}
|
||||
|
||||
headers = {}
|
||||
cn_verification_proxy = self._downloader.params.get('cn_verification_proxy')
|
||||
if cn_verification_proxy:
|
||||
headers['Ytdl-request-proxy'] = cn_verification_proxy
|
||||
return self._download_json(
|
||||
'http://cache.m.iqiyi.com/jp/tmts/%s/%s/' % (tvid, video_id),
|
||||
video_id, transform_source=lambda s: remove_start(s, 'var tvInfoJs='),
|
||||
query=params, headers=headers)
|
||||
query=params, headers=self.geo_verification_headers())
|
||||
|
||||
def _extract_playlist(self, webpage):
|
||||
PAGE_SIZE = 50
|
||||
@ -435,6 +346,7 @@ class IqiyiIE(InfoExtractor):
|
||||
video_id = self._search_regex(
|
||||
r'data-player-videoid\s*=\s*[\'"]([a-f\d]+)', webpage, 'video_id')
|
||||
|
||||
formats = []
|
||||
for _ in range(5):
|
||||
raw_data = self.get_raw_data(tvid, video_id)
|
||||
|
||||
@ -445,16 +357,29 @@ class IqiyiIE(InfoExtractor):
|
||||
|
||||
data = raw_data['data']
|
||||
|
||||
# iQiYi sometimes returns Ads
|
||||
if not isinstance(data['playInfo'], dict):
|
||||
self._sleep(5, video_id)
|
||||
for stream in data['vidl']:
|
||||
if 'm3utx' not in stream:
|
||||
continue
|
||||
vd = compat_str(stream['vd'])
|
||||
formats.append({
|
||||
'url': stream['m3utx'],
|
||||
'format_id': vd,
|
||||
'ext': 'mp4',
|
||||
'preference': self._FORMATS_MAP.get(vd, -1),
|
||||
'protocol': 'm3u8_native',
|
||||
})
|
||||
|
||||
title = data['playInfo']['an']
|
||||
if formats:
|
||||
break
|
||||
|
||||
self._sleep(5, video_id)
|
||||
|
||||
self._sort_formats(formats)
|
||||
title = (get_element_by_id('widget-videotitle', webpage) or
|
||||
clean_html(get_element_by_attribute('class', 'mod-play-tit', webpage)))
|
||||
|
||||
return {
|
||||
'id': video_id,
|
||||
'title': title,
|
||||
'url': data['m3u'],
|
||||
'formats': formats,
|
||||
}
|
||||
|
@ -26,11 +26,6 @@ class KuwoBaseIE(InfoExtractor):
|
||||
def _get_formats(self, song_id, tolerate_ip_deny=False):
|
||||
formats = []
|
||||
for file_format in self._FORMATS:
|
||||
headers = {}
|
||||
cn_verification_proxy = self._downloader.params.get('cn_verification_proxy')
|
||||
if cn_verification_proxy:
|
||||
headers['Ytdl-request-proxy'] = cn_verification_proxy
|
||||
|
||||
query = {
|
||||
'format': file_format['ext'],
|
||||
'br': file_format.get('br', ''),
|
||||
@ -42,7 +37,7 @@ class KuwoBaseIE(InfoExtractor):
|
||||
song_url = self._download_webpage(
|
||||
'http://antiserver.kuwo.cn/anti.s',
|
||||
song_id, note='Download %s url info' % file_format['format'],
|
||||
query=query, headers=headers,
|
||||
query=query, headers=self.geo_verification_headers(),
|
||||
)
|
||||
|
||||
if song_url == 'IPDeny' and not tolerate_ip_deny:
|
||||
|
@ -1,60 +1,74 @@
|
||||
# coding: utf-8
|
||||
from __future__ import unicode_literals
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..utils import (
|
||||
parse_duration,
|
||||
determine_ext,
|
||||
js_to_json,
|
||||
)
|
||||
|
||||
|
||||
class LA7IE(InfoExtractor):
|
||||
IE_NAME = 'la7.tv'
|
||||
_VALID_URL = r'''(?x)
|
||||
https?://(?:www\.)?la7\.tv/
|
||||
(?:
|
||||
richplayer/\?assetid=|
|
||||
\?contentId=
|
||||
)
|
||||
(?P<id>[0-9]+)'''
|
||||
IE_NAME = 'la7.it'
|
||||
_VALID_URL = r'''(?x)(https?://)?(?:
|
||||
(?:www\.)?la7\.it/([^/]+)/(?:rivedila7|video)/|
|
||||
tg\.la7\.it/repliche-tgla7\?id=
|
||||
)(?P<id>.+)'''
|
||||
|
||||
_TEST = {
|
||||
'url': 'http://www.la7.tv/richplayer/?assetid=50355319',
|
||||
'md5': 'ec7d1f0224d20ba293ab56cf2259651f',
|
||||
_TESTS = [{
|
||||
# 'src' is a plain URL
|
||||
'url': 'http://www.la7.it/crozza/video/inccool8-02-10-2015-163722',
|
||||
'md5': '6054674766e7988d3e02f2148ff92180',
|
||||
'info_dict': {
|
||||
'id': '50355319',
|
||||
'id': 'inccool8-02-10-2015-163722',
|
||||
'ext': 'mp4',
|
||||
'title': 'IL DIVO',
|
||||
'description': 'Un film di Paolo Sorrentino con Toni Servillo, Anna Bonaiuto, Giulio Bosetti e Flavio Bucci',
|
||||
'duration': 6254,
|
||||
'title': 'Inc.Cool8',
|
||||
'description': 'Benvenuti nell\'incredibile mondo della INC. COOL. 8. dove “INC.” sta per “Incorporated” “COOL” sta per “fashion” ed Eight sta per il gesto atletico',
|
||||
'thumbnail': 're:^https?://.*',
|
||||
},
|
||||
'skip': 'Blocked in the US',
|
||||
}
|
||||
}, {
|
||||
# 'src' is a dictionary
|
||||
'url': 'http://tg.la7.it/repliche-tgla7?id=189080',
|
||||
'md5': '6b0d8888d286e39870208dfeceaf456b',
|
||||
'info_dict': {
|
||||
'id': '189080',
|
||||
'ext': 'mp4',
|
||||
'title': 'TG LA7',
|
||||
},
|
||||
}, {
|
||||
'url': 'http://www.la7.it/omnibus/rivedila7/omnibus-news-02-07-2016-189077',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
def _real_extract(self, url):
|
||||
video_id = self._match_id(url)
|
||||
xml_url = 'http://www.la7.tv/repliche/content/index.php?contentId=%s' % video_id
|
||||
doc = self._download_xml(xml_url, video_id)
|
||||
|
||||
video_title = doc.find('title').text
|
||||
description = doc.find('description').text
|
||||
duration = parse_duration(doc.find('duration').text)
|
||||
thumbnail = doc.find('img').text
|
||||
view_count = int(doc.find('views').text)
|
||||
webpage = self._download_webpage(url, video_id)
|
||||
|
||||
prefix = doc.find('.//fqdn').text.strip().replace('auto:', 'http:')
|
||||
player_data = self._parse_json(
|
||||
self._search_regex(r'videoLa7\(({[^;]+})\);', webpage, 'player data'),
|
||||
video_id, transform_source=js_to_json)
|
||||
|
||||
formats = [{
|
||||
'format': vnode.find('quality').text,
|
||||
'tbr': int(vnode.find('quality').text),
|
||||
'url': vnode.find('fms').text.strip().replace('mp4:', prefix),
|
||||
} for vnode in doc.findall('.//videos/video')]
|
||||
source = player_data['src']
|
||||
source_urls = source.values() if isinstance(source, dict) else [source]
|
||||
|
||||
formats = []
|
||||
for source_url in source_urls:
|
||||
ext = determine_ext(source_url)
|
||||
if ext == 'm3u8':
|
||||
formats.extend(self._extract_m3u8_formats(
|
||||
source_url, video_id, ext='mp4',
|
||||
entry_protocol='m3u8_native', m3u8_id='hls'))
|
||||
else:
|
||||
formats.append({
|
||||
'url': source_url,
|
||||
})
|
||||
self._sort_formats(formats)
|
||||
|
||||
return {
|
||||
'id': video_id,
|
||||
'title': video_title,
|
||||
'description': description,
|
||||
'thumbnail': thumbnail,
|
||||
'duration': duration,
|
||||
'title': player_data['title'],
|
||||
'description': self._og_search_description(webpage, default=None),
|
||||
'thumbnail': player_data.get('poster'),
|
||||
'formats': formats,
|
||||
'view_count': view_count,
|
||||
}
|
||||
|
@ -20,7 +20,6 @@ from ..utils import (
|
||||
int_or_none,
|
||||
orderedSet,
|
||||
parse_iso8601,
|
||||
sanitized_Request,
|
||||
str_or_none,
|
||||
url_basename,
|
||||
urshift,
|
||||
@ -121,16 +120,11 @@ class LeIE(InfoExtractor):
|
||||
'tkey': self.calc_time_key(int(time.time())),
|
||||
'domain': 'www.le.com'
|
||||
}
|
||||
play_json_req = sanitized_Request(
|
||||
'http://api.le.com/mms/out/video/playJson?' + compat_urllib_parse_urlencode(params)
|
||||
)
|
||||
cn_verification_proxy = self._downloader.params.get('cn_verification_proxy')
|
||||
if cn_verification_proxy:
|
||||
play_json_req.add_header('Ytdl-request-proxy', cn_verification_proxy)
|
||||
|
||||
play_json = self._download_json(
|
||||
play_json_req,
|
||||
media_id, 'Downloading playJson data')
|
||||
'http://api.le.com/mms/out/video/playJson',
|
||||
media_id, 'Downloading playJson data', query=params,
|
||||
headers=self.geo_verification_headers())
|
||||
|
||||
# Check for errors
|
||||
playstatus = play_json['playstatus']
|
||||
|
@ -1,6 +1,7 @@
|
||||
from __future__ import unicode_literals
|
||||
|
||||
from .common import InfoExtractor
|
||||
from .theplatform import ThePlatformIE
|
||||
from ..utils import (
|
||||
smuggle_url,
|
||||
url_basename,
|
||||
@ -61,7 +62,7 @@ class NationalGeographicIE(InfoExtractor):
|
||||
}
|
||||
|
||||
|
||||
class NationalGeographicChannelIE(InfoExtractor):
|
||||
class NationalGeographicChannelIE(ThePlatformIE):
|
||||
IE_NAME = 'natgeo:channel'
|
||||
_VALID_URL = r'https?://channel\.nationalgeographic\.com/(?:wild/)?[^/]+/videos/(?P<id>[^/?]+)'
|
||||
|
||||
@ -102,12 +103,22 @@ class NationalGeographicChannelIE(InfoExtractor):
|
||||
release_url = self._search_regex(
|
||||
r'video_auth_playlist_url\s*=\s*"([^"]+)"',
|
||||
webpage, 'release url')
|
||||
query = {
|
||||
'mbr': 'true',
|
||||
'switch': 'http',
|
||||
}
|
||||
is_auth = self._search_regex(r'video_is_auth\s*=\s*"([^"]+)"', webpage, 'is auth', fatal=False)
|
||||
if is_auth == 'auth':
|
||||
auth_resource_id = self._search_regex(
|
||||
r"video_auth_resourceId\s*=\s*'([^']+)'",
|
||||
webpage, 'auth resource id')
|
||||
query['auth'] = self._extract_mvpd_auth(url, display_id, 'natgeo', auth_resource_id) or ''
|
||||
|
||||
return {
|
||||
'_type': 'url_transparent',
|
||||
'ie_key': 'ThePlatform',
|
||||
'url': smuggle_url(
|
||||
update_url_query(release_url, {'mbr': 'true', 'switch': 'http'}),
|
||||
update_url_query(release_url, query),
|
||||
{'force_smil_url': True}),
|
||||
'display_id': display_id,
|
||||
}
|
||||
|
@ -120,9 +120,12 @@ class PeriscopeUserIE(InfoExtractor):
|
||||
title = user.get('display_name') or user.get('username')
|
||||
description = user.get('description')
|
||||
|
||||
broadcast_ids = (data_store.get('UserBroadcastHistory', {}).get('broadcastIds') or
|
||||
data_store.get('BroadcastCache', {}).get('broadcastIds', []))
|
||||
|
||||
entries = [
|
||||
self.url_result(
|
||||
'https://www.periscope.tv/%s/%s' % (user_id, broadcast['id']))
|
||||
for broadcast in data_store.get('UserBroadcastHistory', {}).get('broadcasts', [])]
|
||||
'https://www.periscope.tv/%s/%s' % (user_id, broadcast_id))
|
||||
for broadcast_id in broadcast_ids]
|
||||
|
||||
return self.playlist_result(entries, user_id, title, description)
|
||||
|
@ -25,7 +25,15 @@ from ..aes import (
|
||||
|
||||
|
||||
class PornHubIE(InfoExtractor):
|
||||
_VALID_URL = r'https?://(?:[a-z]+\.)?pornhub\.com/(?:view_video\.php\?viewkey=|embed/)(?P<id>[0-9a-z]+)'
|
||||
IE_DESC = 'PornHub and Thumbzilla'
|
||||
_VALID_URL = r'''(?x)
|
||||
https?://
|
||||
(?:
|
||||
(?:[a-z]+\.)?pornhub\.com/(?:view_video\.php\?viewkey=|embed/)|
|
||||
(?:www\.)?thumbzilla\.com/video/
|
||||
)
|
||||
(?P<id>[0-9a-z]+)
|
||||
'''
|
||||
_TESTS = [{
|
||||
'url': 'http://www.pornhub.com/view_video.php?viewkey=648719015',
|
||||
'md5': '1e19b41231a02eba417839222ac9d58e',
|
||||
@ -74,6 +82,13 @@ class PornHubIE(InfoExtractor):
|
||||
# removed by uploader
|
||||
'url': 'http://www.pornhub.com/view_video.php?viewkey=ph572716d15a111',
|
||||
'only_matching': True,
|
||||
}, {
|
||||
# private video
|
||||
'url': 'http://www.pornhub.com/view_video.php?viewkey=ph56fd731fce6b7',
|
||||
'only_matching': True,
|
||||
}, {
|
||||
'url': 'https://www.thumbzilla.com/video/ph56c6114abd99a/horny-girlfriend-sex',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
@classmethod
|
||||
@ -96,7 +111,7 @@ class PornHubIE(InfoExtractor):
|
||||
webpage = self._download_webpage(req, video_id)
|
||||
|
||||
error_msg = self._html_search_regex(
|
||||
r'(?s)<div[^>]+class=(["\']).*?\bremoved\b.*?\1[^>]*>(?P<error>.+?)</div>',
|
||||
r'(?s)<div[^>]+class=(["\']).*?\b(?:removed|userMessageSection)\b.*?\1[^>]*>(?P<error>.+?)</div>',
|
||||
webpage, 'error message', default=None, group='error')
|
||||
if error_msg:
|
||||
error_msg = re.sub(r'\s+', ' ', error_msg)
|
||||
|
@ -1,47 +1,141 @@
|
||||
from __future__ import unicode_literals
|
||||
|
||||
import re
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..compat import (
|
||||
compat_urllib_parse,
|
||||
compat_urlparse,
|
||||
)
|
||||
from ..compat import compat_urlparse
|
||||
from ..utils import (
|
||||
ExtractorError,
|
||||
determine_ext,
|
||||
ExtractorError,
|
||||
find_xpath_attr,
|
||||
fix_xml_ampersands,
|
||||
int_or_none,
|
||||
parse_duration,
|
||||
unified_strdate,
|
||||
int_or_none,
|
||||
update_url_query,
|
||||
xpath_text,
|
||||
)
|
||||
|
||||
|
||||
class RaiTVIE(InfoExtractor):
|
||||
_VALID_URL = r'https?://(?:.+?\.)?(?:rai\.it|rai\.tv|rainews\.it)/dl/(?:[^/]+/)+media/.+?-(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})(?:-.+?)?\.html'
|
||||
class RaiBaseIE(InfoExtractor):
|
||||
def _extract_relinker_formats(self, relinker_url, video_id):
|
||||
formats = []
|
||||
|
||||
for platform in ('mon', 'flash', 'native'):
|
||||
relinker = self._download_xml(
|
||||
relinker_url, video_id,
|
||||
note='Downloading XML metadata for platform %s' % platform,
|
||||
transform_source=fix_xml_ampersands,
|
||||
query={'output': 45, 'pl': platform},
|
||||
headers=self.geo_verification_headers())
|
||||
|
||||
media_url = find_xpath_attr(relinker, './url', 'type', 'content').text
|
||||
if media_url == 'http://download.rai.it/video_no_available.mp4':
|
||||
self.raise_geo_restricted()
|
||||
|
||||
ext = determine_ext(media_url)
|
||||
if (ext == 'm3u8' and platform != 'mon') or (ext == 'f4m' and platform != 'flash'):
|
||||
continue
|
||||
|
||||
if ext == 'm3u8':
|
||||
formats.extend(self._extract_m3u8_formats(
|
||||
media_url, video_id, 'mp4', 'm3u8_native',
|
||||
m3u8_id='hls', fatal=False))
|
||||
elif ext == 'f4m':
|
||||
manifest_url = update_url_query(
|
||||
media_url.replace('manifest#live_hds.f4m', 'manifest.f4m'),
|
||||
{'hdcore': '3.7.0', 'plugin': 'aasp-3.7.0.39.44'})
|
||||
formats.extend(self._extract_f4m_formats(
|
||||
manifest_url, video_id, f4m_id='hds', fatal=False))
|
||||
else:
|
||||
bitrate = int_or_none(xpath_text(relinker, 'bitrate'))
|
||||
formats.append({
|
||||
'url': media_url,
|
||||
'tbr': bitrate if bitrate > 0 else None,
|
||||
'format_id': 'http-%d' % bitrate if bitrate > 0 else 'http',
|
||||
})
|
||||
|
||||
return formats
|
||||
|
||||
def _extract_from_content_id(self, content_id, base_url):
|
||||
media = self._download_json(
|
||||
'http://www.rai.tv/dl/RaiTV/programmi/media/ContentItem-%s.html?json' % content_id,
|
||||
content_id, 'Downloading video JSON')
|
||||
|
||||
thumbnails = []
|
||||
for image_type in ('image', 'image_medium', 'image_300'):
|
||||
thumbnail_url = media.get(image_type)
|
||||
if thumbnail_url:
|
||||
thumbnails.append({
|
||||
'url': compat_urlparse.urljoin(base_url, thumbnail_url),
|
||||
})
|
||||
|
||||
formats = []
|
||||
media_type = media['type']
|
||||
if 'Audio' in media_type:
|
||||
formats.append({
|
||||
'format_id': media.get('formatoAudio'),
|
||||
'url': media['audioUrl'],
|
||||
'ext': media.get('formatoAudio'),
|
||||
})
|
||||
elif 'Video' in media_type:
|
||||
formats.extend(self._extract_relinker_formats(media['mediaUri'], content_id))
|
||||
self._sort_formats(formats)
|
||||
else:
|
||||
raise ExtractorError('not a media file')
|
||||
|
||||
subtitles = {}
|
||||
captions = media.get('subtitlesUrl')
|
||||
if captions:
|
||||
STL_EXT = '.stl'
|
||||
SRT_EXT = '.srt'
|
||||
if captions.endswith(STL_EXT):
|
||||
captions = captions[:-len(STL_EXT)] + SRT_EXT
|
||||
subtitles['it'] = [{
|
||||
'ext': 'srt',
|
||||
'url': captions,
|
||||
}]
|
||||
|
||||
return {
|
||||
'id': content_id,
|
||||
'title': media['name'],
|
||||
'description': media.get('desc'),
|
||||
'thumbnails': thumbnails,
|
||||
'uploader': media.get('author'),
|
||||
'upload_date': unified_strdate(media.get('date')),
|
||||
'duration': parse_duration(media.get('length')),
|
||||
'formats': formats,
|
||||
'subtitles': subtitles,
|
||||
}
|
||||
|
||||
|
||||
class RaiTVIE(RaiBaseIE):
|
||||
_VALID_URL = r'https?://(?:.+?\.)?(?:rai\.it|rai\.tv|rainews\.it)/dl/(?:[^/]+/)+(?:media|ondemand)/.+?-(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})(?:-.+?)?\.html'
|
||||
_TESTS = [
|
||||
{
|
||||
'url': 'http://www.rai.tv/dl/RaiTV/programmi/media/ContentItem-cb27157f-9dd0-4aee-b788-b1f67643a391.html',
|
||||
'md5': '96382709b61dd64a6b88e0f791e6df4c',
|
||||
'md5': '8970abf8caf8aef4696e7b1f2adfc696',
|
||||
'info_dict': {
|
||||
'id': 'cb27157f-9dd0-4aee-b788-b1f67643a391',
|
||||
'ext': 'flv',
|
||||
'ext': 'mp4',
|
||||
'title': 'Report del 07/04/2014',
|
||||
'description': 'md5:f27c544694cacb46a078db84ec35d2d9',
|
||||
'upload_date': '20140407',
|
||||
'duration': 6160,
|
||||
'thumbnail': 're:^https?://.*\.jpg$',
|
||||
}
|
||||
},
|
||||
{
|
||||
# no m3u8 stream
|
||||
'url': 'http://www.raisport.rai.it/dl/raiSport/media/rassegna-stampa-04a9f4bd-b563-40cf-82a6-aad3529cb4a9.html',
|
||||
'md5': 'd9751b78eac9710d62c2447b224dea39',
|
||||
# HDS download, MD5 is unstable
|
||||
'info_dict': {
|
||||
'id': '04a9f4bd-b563-40cf-82a6-aad3529cb4a9',
|
||||
'ext': 'flv',
|
||||
'title': 'TG PRIMO TEMPO',
|
||||
'upload_date': '20140612',
|
||||
'duration': 1758,
|
||||
'thumbnail': 're:^https?://.*\.jpg$',
|
||||
},
|
||||
'skip': 'Geo-restricted to Italy',
|
||||
},
|
||||
{
|
||||
'url': 'http://www.rainews.it/dl/rainews/media/state-of-the-net-Antonella-La-Carpia-regole-virali-7aafdea9-0e5d-49d5-88a6-7e65da67ae13.html',
|
||||
@ -67,127 +161,70 @@ class RaiTVIE(InfoExtractor):
|
||||
},
|
||||
{
|
||||
'url': 'http://www.ilcandidato.rai.it/dl/ray/media/Il-Candidato---Primo-episodio-Le-Primarie-28e5525a-b495-45e8-a7c3-bc48ba45d2b6.html',
|
||||
'md5': '496ab63e420574447f70d02578333437',
|
||||
'md5': 'e57493e1cb8bc7c564663f363b171847',
|
||||
'info_dict': {
|
||||
'id': '28e5525a-b495-45e8-a7c3-bc48ba45d2b6',
|
||||
'ext': 'flv',
|
||||
'ext': 'mp4',
|
||||
'title': 'Il Candidato - Primo episodio: "Le Primarie"',
|
||||
'description': 'md5:364b604f7db50594678f483353164fb8',
|
||||
'upload_date': '20140923',
|
||||
'duration': 386,
|
||||
'thumbnail': 're:^https?://.*\.jpg$',
|
||||
}
|
||||
},
|
||||
]
|
||||
|
||||
def _real_extract(self, url):
|
||||
video_id = self._match_id(url)
|
||||
media = self._download_json(
|
||||
'http://www.rai.tv/dl/RaiTV/programmi/media/ContentItem-%s.html?json' % video_id,
|
||||
video_id, 'Downloading video JSON')
|
||||
|
||||
thumbnails = []
|
||||
for image_type in ('image', 'image_medium', 'image_300'):
|
||||
thumbnail_url = media.get(image_type)
|
||||
if thumbnail_url:
|
||||
thumbnails.append({
|
||||
'url': thumbnail_url,
|
||||
})
|
||||
|
||||
subtitles = []
|
||||
formats = []
|
||||
media_type = media['type']
|
||||
if 'Audio' in media_type:
|
||||
formats.append({
|
||||
'format_id': media.get('formatoAudio'),
|
||||
'url': media['audioUrl'],
|
||||
'ext': media.get('formatoAudio'),
|
||||
})
|
||||
elif 'Video' in media_type:
|
||||
def fix_xml(xml):
|
||||
return xml.replace(' tag elementi', '').replace('>/', '</')
|
||||
|
||||
relinker = self._download_xml(
|
||||
media['mediaUri'] + '&output=43',
|
||||
video_id, transform_source=fix_xml)
|
||||
|
||||
has_subtitle = False
|
||||
|
||||
for element in relinker.findall('element'):
|
||||
media_url = xpath_text(element, 'url')
|
||||
ext = determine_ext(media_url)
|
||||
content_type = xpath_text(element, 'content-type')
|
||||
if ext == 'm3u8':
|
||||
formats.extend(self._extract_m3u8_formats(
|
||||
media_url, video_id, 'mp4', 'm3u8_native',
|
||||
m3u8_id='hls', fatal=False))
|
||||
elif ext == 'f4m':
|
||||
formats.extend(self._extract_f4m_formats(
|
||||
media_url + '?hdcore=3.7.0&plugin=aasp-3.7.0.39.44',
|
||||
video_id, f4m_id='hds', fatal=False))
|
||||
elif ext == 'stl':
|
||||
has_subtitle = True
|
||||
elif content_type.startswith('video/'):
|
||||
bitrate = int_or_none(xpath_text(element, 'bitrate'))
|
||||
formats.append({
|
||||
'url': media_url,
|
||||
'tbr': bitrate if bitrate > 0 else None,
|
||||
'format_id': 'http-%d' % bitrate if bitrate > 0 else 'http',
|
||||
})
|
||||
elif content_type.startswith('image/'):
|
||||
thumbnails.append({
|
||||
'url': media_url,
|
||||
})
|
||||
|
||||
self._sort_formats(formats)
|
||||
|
||||
if has_subtitle:
|
||||
webpage = self._download_webpage(url, video_id)
|
||||
subtitles = self._get_subtitles(video_id, webpage)
|
||||
else:
|
||||
raise ExtractorError('not a media file')
|
||||
|
||||
return {
|
||||
'id': video_id,
|
||||
'title': media['name'],
|
||||
'description': media.get('desc'),
|
||||
'thumbnails': thumbnails,
|
||||
'uploader': media.get('author'),
|
||||
'upload_date': unified_strdate(media.get('date')),
|
||||
'duration': parse_duration(media.get('length')),
|
||||
'formats': formats,
|
||||
'subtitles': subtitles,
|
||||
}
|
||||
|
||||
def _get_subtitles(self, video_id, webpage):
|
||||
subtitles = {}
|
||||
m = re.search(r'<meta name="closedcaption" content="(?P<captions>[^"]+)"', webpage)
|
||||
if m:
|
||||
captions = m.group('captions')
|
||||
STL_EXT = '.stl'
|
||||
SRT_EXT = '.srt'
|
||||
if captions.endswith(STL_EXT):
|
||||
captions = captions[:-len(STL_EXT)] + SRT_EXT
|
||||
subtitles['it'] = [{
|
||||
'ext': 'srt',
|
||||
'url': 'http://www.rai.tv%s' % compat_urllib_parse.quote(captions),
|
||||
}]
|
||||
return subtitles
|
||||
return self._extract_from_content_id(video_id, url)
|
||||
|
||||
|
||||
class RaiIE(InfoExtractor):
|
||||
class RaiIE(RaiBaseIE):
|
||||
_VALID_URL = r'https?://(?:.+?\.)?(?:rai\.it|rai\.tv|rainews\.it)/dl/.+?-(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})(?:-.+?)?\.html'
|
||||
_TESTS = [
|
||||
{
|
||||
'url': 'http://www.report.rai.it/dl/Report/puntata/ContentItem-0c7a664b-d0f4-4b2c-8835-3f82e46f433e.html',
|
||||
'md5': 'e0e7a8a131e249d1aa0ebf270d1d8db7',
|
||||
'md5': '2dd727e61114e1ee9c47f0da6914e178',
|
||||
'info_dict': {
|
||||
'id': '59d69d28-6bb6-409d-a4b5-ed44096560af',
|
||||
'ext': 'flv',
|
||||
'ext': 'mp4',
|
||||
'title': 'Il pacco',
|
||||
'description': 'md5:4b1afae1364115ce5d78ed83cd2e5b3a',
|
||||
'upload_date': '20141221',
|
||||
},
|
||||
}
|
||||
},
|
||||
{
|
||||
# Direct relinker URL
|
||||
'url': 'http://www.rai.tv/dl/RaiTV/dirette/PublishingBlock-1912dbbf-3f96-44c3-b4cf-523681fbacbc.html?channel=EuroNews',
|
||||
# HDS live stream, MD5 is unstable
|
||||
'info_dict': {
|
||||
'id': '1912dbbf-3f96-44c3-b4cf-523681fbacbc',
|
||||
'ext': 'flv',
|
||||
'title': 'EuroNews',
|
||||
},
|
||||
'skip': 'Geo-restricted to Italy',
|
||||
},
|
||||
{
|
||||
# Embedded content item ID
|
||||
'url': 'http://www.tg1.rai.it/dl/tg1/2010/edizioni/ContentSet-9b6e0cba-4bef-4aef-8cf0-9f7f665b7dfb-tg1.html?item=undefined',
|
||||
'md5': '84c1135ce960e8822ae63cec34441d63',
|
||||
'info_dict': {
|
||||
'id': '0960e765-62c8-474a-ac4b-7eb3e2be39c8',
|
||||
'ext': 'mp4',
|
||||
'title': 'TG1 ore 20:00 del 02/07/2016',
|
||||
'upload_date': '20160702',
|
||||
},
|
||||
},
|
||||
{
|
||||
'url': 'http://www.rainews.it/dl/rainews/live/ContentItem-3156f2f2-dc70-4953-8e2f-70d7489d4ce9.html',
|
||||
# HDS live stream, MD5 is unstable
|
||||
'info_dict': {
|
||||
'id': '3156f2f2-dc70-4953-8e2f-70d7489d4ce9',
|
||||
'ext': 'flv',
|
||||
'title': 'La diretta di Rainews24',
|
||||
},
|
||||
},
|
||||
]
|
||||
|
||||
@classmethod
|
||||
@ -201,7 +238,30 @@ class RaiIE(InfoExtractor):
|
||||
iframe_url = self._search_regex(
|
||||
[r'<iframe[^>]+src="([^"]*/dl/[^"]+\?iframe\b[^"]*)"',
|
||||
r'drawMediaRaiTV\(["\'](.+?)["\']'],
|
||||
webpage, 'iframe')
|
||||
webpage, 'iframe', default=None)
|
||||
if iframe_url:
|
||||
if not iframe_url.startswith('http'):
|
||||
iframe_url = compat_urlparse.urljoin(url, iframe_url)
|
||||
return self.url_result(iframe_url)
|
||||
|
||||
content_item_id = self._search_regex(
|
||||
r'initEdizione\((?P<q1>[\'"])ContentItem-(?P<content_id>[^\'"]+)(?P=q1)',
|
||||
webpage, 'content item ID', group='content_id', default=None)
|
||||
if content_item_id:
|
||||
return self._extract_from_content_id(content_item_id, url)
|
||||
|
||||
relinker_url = compat_urlparse.urljoin(url, self._search_regex(
|
||||
r'(?:var\s+videoURL|mediaInfo\.mediaUri)\s*=\s*(?P<q1>[\'"])(?P<url>(https?:)?//mediapolis\.rai\.it/relinker/relinkerServlet\.htm\?cont=\d+)(?P=q1)',
|
||||
webpage, 'relinker URL', group='url'))
|
||||
formats = self._extract_relinker_formats(relinker_url, video_id)
|
||||
self._sort_formats(formats)
|
||||
|
||||
title = self._search_regex(
|
||||
r'var\s+videoTitolo\s*=\s*([\'"])(?P<title>[^\'"]+)\1',
|
||||
webpage, 'title', group='title', default=None) or self._og_search_title(webpage)
|
||||
|
||||
return {
|
||||
'id': video_id,
|
||||
'title': title,
|
||||
'formats': formats,
|
||||
}
|
||||
|
@ -8,10 +8,7 @@ from ..compat import (
|
||||
compat_str,
|
||||
compat_urllib_parse_urlencode,
|
||||
)
|
||||
from ..utils import (
|
||||
ExtractorError,
|
||||
sanitized_Request,
|
||||
)
|
||||
from ..utils import ExtractorError
|
||||
|
||||
|
||||
class SohuIE(InfoExtractor):
|
||||
@ -96,15 +93,10 @@ class SohuIE(InfoExtractor):
|
||||
else:
|
||||
base_data_url = 'http://hot.vrs.sohu.com/vrs_flash.action?vid='
|
||||
|
||||
req = sanitized_Request(base_data_url + vid_id)
|
||||
|
||||
cn_verification_proxy = self._downloader.params.get('cn_verification_proxy')
|
||||
if cn_verification_proxy:
|
||||
req.add_header('Ytdl-request-proxy', cn_verification_proxy)
|
||||
|
||||
return self._download_json(
|
||||
req, video_id,
|
||||
'Downloading JSON data for %s' % vid_id)
|
||||
base_data_url + vid_id, video_id,
|
||||
'Downloading JSON data for %s' % vid_id,
|
||||
headers=self.geo_verification_headers())
|
||||
|
||||
mobj = re.match(self._VALID_URL, url)
|
||||
video_id = mobj.group('id')
|
||||
|
@ -6,6 +6,7 @@ import time
|
||||
import hmac
|
||||
import binascii
|
||||
import hashlib
|
||||
import netrc
|
||||
|
||||
|
||||
from .once import OnceIE
|
||||
@ -24,6 +25,9 @@ from ..utils import (
|
||||
xpath_with_ns,
|
||||
mimetype2ext,
|
||||
find_xpath_attr,
|
||||
unescapeHTML,
|
||||
urlencode_postdata,
|
||||
unified_timestamp,
|
||||
)
|
||||
|
||||
default_ns = 'http://www.w3.org/2005/SMIL21/Language'
|
||||
@ -62,10 +66,11 @@ class ThePlatformBaseIE(OnceIE):
|
||||
|
||||
return formats, subtitles
|
||||
|
||||
def get_metadata(self, path, video_id):
|
||||
def _download_theplatform_metadata(self, path, video_id):
|
||||
info_url = 'http://link.theplatform.com/s/%s?format=preview' % path
|
||||
info = self._download_json(info_url, video_id)
|
||||
return self._download_json(info_url, video_id)
|
||||
|
||||
def _parse_theplatform_metadata(self, info):
|
||||
subtitles = {}
|
||||
captions = info.get('captions')
|
||||
if isinstance(captions, list):
|
||||
@ -86,6 +91,10 @@ class ThePlatformBaseIE(OnceIE):
|
||||
'uploader': info.get('billingCode'),
|
||||
}
|
||||
|
||||
def _extract_theplatform_metadata(self, path, video_id):
|
||||
info = self._download_theplatform_metadata(path, video_id)
|
||||
return self._parse_theplatform_metadata(info)
|
||||
|
||||
|
||||
class ThePlatformIE(ThePlatformBaseIE):
|
||||
_VALID_URL = r'''(?x)
|
||||
@ -158,6 +167,7 @@ class ThePlatformIE(ThePlatformBaseIE):
|
||||
'url': 'http://player.theplatform.com/p/NnzsPC/onsite_universal/select/media/guid/2410887629/2928790?fwsitesection=nbc_the_blacklist_video_library&autoPlay=true&carouselID=137781',
|
||||
'only_matching': True,
|
||||
}]
|
||||
_SERVICE_PROVIDER_TEMPLATE = 'https://sp.auth.adobe.com/adobe-services/%s'
|
||||
|
||||
@classmethod
|
||||
def _extract_urls(cls, webpage):
|
||||
@ -192,6 +202,96 @@ class ThePlatformIE(ThePlatformBaseIE):
|
||||
sig = flags + expiration_date + checksum + str_to_hex(sig_secret)
|
||||
return '%s&sig=%s' % (url, sig)
|
||||
|
||||
def _extract_mvpd_auth(self, url, video_id, requestor_id, resource):
|
||||
def xml_text(xml_str, tag):
|
||||
return self._search_regex(
|
||||
'<%s>(.+?)</%s>' % (tag, tag), xml_str, tag)
|
||||
|
||||
mvpd_headers = {
|
||||
'ap_42': 'anonymous',
|
||||
'ap_11': 'Linux i686',
|
||||
'ap_z': 'Mozilla/5.0 (X11; Linux i686; rv:47.0) Gecko/20100101 Firefox/47.0',
|
||||
'User-Agent': 'Mozilla/5.0 (X11; Linux i686; rv:47.0) Gecko/20100101 Firefox/47.0',
|
||||
}
|
||||
|
||||
guid = xml_text(resource, 'guid')
|
||||
requestor_info = self._downloader.cache.load('mvpd', requestor_id) or {}
|
||||
authn_token = requestor_info.get('authn_token')
|
||||
if authn_token:
|
||||
token_expires = unified_timestamp(xml_text(authn_token, 'simpleTokenExpires').replace('_GMT', ''))
|
||||
if token_expires and token_expires >= time.time():
|
||||
authn_token = None
|
||||
if not authn_token:
|
||||
# TODO add support for other TV Providers
|
||||
mso_id = 'DTV'
|
||||
login_info = netrc.netrc().authenticators(mso_id)
|
||||
if not login_info:
|
||||
return None
|
||||
|
||||
def post_form(form_page, note, data={}):
|
||||
post_url = self._html_search_regex(r'<form[^>]+action=(["\'])(?P<url>.+?)\1', form_page, 'post url', group='url')
|
||||
return self._download_webpage(
|
||||
post_url, video_id, note, data=urlencode_postdata(data or self._hidden_inputs(form_page)), headers={
|
||||
'Content-Type': 'application/x-www-form-urlencoded',
|
||||
})
|
||||
|
||||
provider_redirect_page = self._download_webpage(
|
||||
self._SERVICE_PROVIDER_TEMPLATE % 'authenticate/saml', video_id,
|
||||
'Downloading Provider Redirect Page', query={
|
||||
'noflash': 'true',
|
||||
'mso_id': mso_id,
|
||||
'requestor_id': requestor_id,
|
||||
'no_iframe': 'false',
|
||||
'domain_name': 'adobe.com',
|
||||
'redirect_url': url,
|
||||
})
|
||||
provider_login_page = post_form(
|
||||
provider_redirect_page, 'Downloading Provider Login Page')
|
||||
mvpd_confirm_page = post_form(provider_login_page, 'Logging in', {
|
||||
'username': login_info[0],
|
||||
'password': login_info[2],
|
||||
})
|
||||
post_form(mvpd_confirm_page, 'Confirming Login')
|
||||
|
||||
session = self._download_webpage(
|
||||
self._SERVICE_PROVIDER_TEMPLATE % 'session', video_id,
|
||||
'Retrieving Session', data=urlencode_postdata({
|
||||
'_method': 'GET',
|
||||
'requestor_id': requestor_id,
|
||||
}), headers=mvpd_headers)
|
||||
authn_token = unescapeHTML(xml_text(session, 'authnToken'))
|
||||
requestor_info['authn_token'] = authn_token
|
||||
self._downloader.cache.store('mvpd', requestor_id, requestor_info)
|
||||
|
||||
authz_token = requestor_info.get(guid)
|
||||
if not authz_token:
|
||||
authorize = self._download_webpage(
|
||||
self._SERVICE_PROVIDER_TEMPLATE % 'authorize', video_id,
|
||||
'Retrieving Authorization Token', data=urlencode_postdata({
|
||||
'resource_id': resource,
|
||||
'requestor_id': requestor_id,
|
||||
'authentication_token': authn_token,
|
||||
'mso_id': xml_text(authn_token, 'simpleTokenMsoID'),
|
||||
'userMeta': '1',
|
||||
}), headers=mvpd_headers)
|
||||
authz_token = unescapeHTML(xml_text(authorize, 'authzToken'))
|
||||
requestor_info[guid] = authz_token
|
||||
self._downloader.cache.store('mvpd', requestor_id, requestor_info)
|
||||
|
||||
mvpd_headers.update({
|
||||
'ap_19': xml_text(authn_token, 'simpleSamlNameID'),
|
||||
'ap_23': xml_text(authn_token, 'simpleSamlSessionIndex'),
|
||||
})
|
||||
|
||||
return self._download_webpage(
|
||||
self._SERVICE_PROVIDER_TEMPLATE % 'shortAuthorize',
|
||||
video_id, 'Retrieving Media Token', data=urlencode_postdata({
|
||||
'authz_token': authz_token,
|
||||
'requestor_id': requestor_id,
|
||||
'session_guid': xml_text(authn_token, 'simpleTokenAuthenticationGuid'),
|
||||
'hashed_guid': 'false',
|
||||
}), headers=mvpd_headers)
|
||||
|
||||
def _real_extract(self, url):
|
||||
url, smuggled_data = unsmuggle_url(url, {})
|
||||
|
||||
@ -265,7 +365,7 @@ class ThePlatformIE(ThePlatformBaseIE):
|
||||
formats, subtitles = self._extract_theplatform_smil(smil_url, video_id)
|
||||
self._sort_formats(formats)
|
||||
|
||||
ret = self.get_metadata(path, video_id)
|
||||
ret = self._extract_theplatform_metadata(path, video_id)
|
||||
combined_subtitles = self._merge_subtitles(ret.get('subtitles', {}), subtitles)
|
||||
ret.update({
|
||||
'id': video_id,
|
||||
@ -339,7 +439,7 @@ class ThePlatformFeedIE(ThePlatformBaseIE):
|
||||
timestamp = int_or_none(entry.get('media$availableDate'), scale=1000)
|
||||
categories = [item['media$name'] for item in entry.get('media$categories', [])]
|
||||
|
||||
ret = self.get_metadata('%s/%s' % (provider_id, first_video_id), video_id)
|
||||
ret = self._extract_theplatform_metadata('%s/%s' % (provider_id, first_video_id), video_id)
|
||||
subtitles = self._merge_subtitles(subtitles, ret['subtitles'])
|
||||
ret.update({
|
||||
'id': video_id,
|
||||
|
@ -29,7 +29,7 @@ class TwitchBaseIE(InfoExtractor):
|
||||
_VALID_URL_BASE = r'https?://(?:www\.)?twitch\.tv'
|
||||
|
||||
_API_BASE = 'https://api.twitch.tv'
|
||||
_USHER_BASE = 'http://usher.twitch.tv'
|
||||
_USHER_BASE = 'https://usher.ttvnw.net'
|
||||
_LOGIN_URL = 'http://www.twitch.tv/login'
|
||||
_NETRC_MACHINE = 'twitch'
|
||||
|
||||
|
@ -90,10 +90,12 @@ class VineIE(InfoExtractor):
|
||||
|
||||
data = self._parse_json(
|
||||
self._search_regex(
|
||||
r'window\.POST_DATA\s*=\s*{\s*%s\s*:\s*({.+?})\s*};\s*</script>' % video_id,
|
||||
r'window\.POST_DATA\s*=\s*({.+?});\s*</script>',
|
||||
webpage, 'vine data'),
|
||||
video_id)
|
||||
|
||||
data = data[list(data.keys())[0]]
|
||||
|
||||
formats = [{
|
||||
'format_id': '%(format)s-%(rate)s' % f,
|
||||
'vcodec': f.get('format'),
|
||||
|
@ -27,12 +27,12 @@ class VKIE(InfoExtractor):
|
||||
https?://
|
||||
(?:
|
||||
(?:
|
||||
(?:m\.)?vk\.com/video_|
|
||||
(?:(?:m|new)\.)?vk\.com/video_|
|
||||
(?:www\.)?daxab.com/
|
||||
)
|
||||
ext\.php\?(?P<embed_query>.*?\boid=(?P<oid>-?\d+).*?\bid=(?P<id>\d+).*)|
|
||||
(?:
|
||||
(?:m\.)?vk\.com/(?:.+?\?.*?z=)?video|
|
||||
(?:(?:m|new)\.)?vk\.com/(?:.+?\?.*?z=)?video|
|
||||
(?:www\.)?daxab.com/embed/
|
||||
)
|
||||
(?P<videoid>-?\d+_\d+)(?:.*\blist=(?P<list_id>[\da-f]+))?
|
||||
@ -182,6 +182,10 @@ class VKIE(InfoExtractor):
|
||||
# pladform embed
|
||||
'url': 'https://vk.com/video-76116461_171554880',
|
||||
'only_matching': True,
|
||||
},
|
||||
{
|
||||
'url': 'http://new.vk.com/video205387401_165548505',
|
||||
'only_matching': True,
|
||||
}
|
||||
]
|
||||
|
||||
@ -354,7 +358,7 @@ class VKIE(InfoExtractor):
|
||||
class VKUserVideosIE(InfoExtractor):
|
||||
IE_NAME = 'vk:uservideos'
|
||||
IE_DESC = "VK - User's Videos"
|
||||
_VALID_URL = r'https?://vk\.com/videos(?P<id>-?[0-9]+)(?!\?.*\bz=video)(?:[/?#&]|$)'
|
||||
_VALID_URL = r'https?://(?:(?:m|new)\.)?vk\.com/videos(?P<id>-?[0-9]+)(?!\?.*\bz=video)(?:[/?#&]|$)'
|
||||
_TEMPLATE_URL = 'https://vk.com/videos'
|
||||
_TESTS = [{
|
||||
'url': 'http://vk.com/videos205387401',
|
||||
@ -369,6 +373,12 @@ class VKUserVideosIE(InfoExtractor):
|
||||
}, {
|
||||
'url': 'http://vk.com/videos-97664626?section=all',
|
||||
'only_matching': True,
|
||||
}, {
|
||||
'url': 'http://m.vk.com/videos205387401',
|
||||
'only_matching': True,
|
||||
}, {
|
||||
'url': 'http://new.vk.com/videos205387401',
|
||||
'only_matching': True,
|
||||
}]
|
||||
|
||||
def _real_extract(self, url):
|
||||
|
@ -4,17 +4,23 @@ import itertools
|
||||
import re
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..compat import compat_urllib_parse_unquote
|
||||
from ..utils import (
|
||||
int_or_none,
|
||||
orderedSet,
|
||||
parse_duration,
|
||||
sanitized_Request,
|
||||
str_to_int,
|
||||
)
|
||||
|
||||
|
||||
class XTubeIE(InfoExtractor):
|
||||
_VALID_URL = r'(?:xtube:|https?://(?:www\.)?xtube\.com/(?:watch\.php\?.*\bv=|video-watch/(?P<display_id>[^/]+)-))(?P<id>[^/?&#]+)'
|
||||
_VALID_URL = r'''(?x)
|
||||
(?:
|
||||
xtube:|
|
||||
https?://(?:www\.)?xtube\.com/(?:watch\.php\?.*\bv=|video-watch/(?P<display_id>[^/]+)-)
|
||||
)
|
||||
(?P<id>[^/?&#]+)
|
||||
'''
|
||||
|
||||
_TESTS = [{
|
||||
# old URL schema
|
||||
@ -27,6 +33,8 @@ class XTubeIE(InfoExtractor):
|
||||
'description': 'contains:an ET kind of thing',
|
||||
'uploader': 'greenshowers',
|
||||
'duration': 450,
|
||||
'view_count': int,
|
||||
'comment_count': int,
|
||||
'age_limit': 18,
|
||||
}
|
||||
}, {
|
||||
@ -51,21 +59,30 @@ class XTubeIE(InfoExtractor):
|
||||
req.add_header('Cookie', 'age_verified=1; cookiesAccepted=1')
|
||||
webpage = self._download_webpage(req, display_id)
|
||||
|
||||
flashvars = self._parse_json(
|
||||
self._search_regex(
|
||||
r'xt\.playerOps\s*=\s*({.+?});', webpage, 'player ops'),
|
||||
video_id)['flashvars']
|
||||
sources = self._parse_json(self._search_regex(
|
||||
r'sources\s*:\s*({.+?}),', webpage, 'sources'), video_id)
|
||||
|
||||
title = flashvars.get('title') or self._search_regex(
|
||||
r'<h1>([^<]+)</h1>', webpage, 'title')
|
||||
video_url = compat_urllib_parse_unquote(flashvars['video_url'])
|
||||
duration = int_or_none(flashvars.get('video_duration'))
|
||||
formats = []
|
||||
for format_id, format_url in sources.items():
|
||||
formats.append({
|
||||
'url': format_url,
|
||||
'format_id': format_id,
|
||||
'height': int_or_none(format_id),
|
||||
})
|
||||
self._sort_formats(formats)
|
||||
|
||||
uploader = self._search_regex(
|
||||
r'<input[^>]+name="contentOwnerId"[^>]+value="([^"]+)"',
|
||||
webpage, 'uploader', fatal=False)
|
||||
title = self._search_regex(
|
||||
(r'<h1>(?P<title>[^<]+)</h1>', r'videoTitle\s*:\s*(["\'])(?P<title>.+?)\1'),
|
||||
webpage, 'title', group='title')
|
||||
description = self._search_regex(
|
||||
r'</h1>\s*<p>([^<]+)', webpage, 'description', fatal=False)
|
||||
uploader = self._search_regex(
|
||||
(r'<input[^>]+name="contentOwnerId"[^>]+value="([^"]+)"',
|
||||
r'<span[^>]+class="nickname"[^>]*>([^<]+)'),
|
||||
webpage, 'uploader', fatal=False)
|
||||
duration = parse_duration(self._search_regex(
|
||||
r'<dt>Runtime:</dt>\s*<dd>([^<]+)</dd>',
|
||||
webpage, 'duration', fatal=False))
|
||||
view_count = str_to_int(self._search_regex(
|
||||
r'<dt>Views:</dt>\s*<dd>([\d,\.]+)</dd>',
|
||||
webpage, 'view count', fatal=False))
|
||||
@ -76,7 +93,6 @@ class XTubeIE(InfoExtractor):
|
||||
return {
|
||||
'id': video_id,
|
||||
'display_id': display_id,
|
||||
'url': video_url,
|
||||
'title': title,
|
||||
'description': description,
|
||||
'uploader': uploader,
|
||||
@ -84,6 +100,7 @@ class XTubeIE(InfoExtractor):
|
||||
'view_count': view_count,
|
||||
'comment_count': comment_count,
|
||||
'age_limit': 18,
|
||||
'formats': formats,
|
||||
}
|
||||
|
||||
|
||||
|
@ -19,6 +19,7 @@ from ..utils import (
|
||||
mimetype2ext,
|
||||
)
|
||||
|
||||
from .brightcove import BrightcoveNewIE
|
||||
from .nbc import NBCSportsVPlayerIE
|
||||
|
||||
|
||||
@ -227,7 +228,12 @@ class YahooIE(InfoExtractor):
|
||||
# Look for NBCSports iframes
|
||||
nbc_sports_url = NBCSportsVPlayerIE._extract_url(webpage)
|
||||
if nbc_sports_url:
|
||||
return self.url_result(nbc_sports_url, 'NBCSportsVPlayer')
|
||||
return self.url_result(nbc_sports_url, NBCSportsVPlayerIE.ie_key())
|
||||
|
||||
# Look for Brightcove New Studio embeds
|
||||
bc_url = BrightcoveNewIE._extract_url(webpage)
|
||||
if bc_url:
|
||||
return self.url_result(bc_url, BrightcoveNewIE.ie_key())
|
||||
|
||||
# Query result is often embedded in webpage as JSON. Sometimes explicit requests
|
||||
# to video API results in a failure with geo restriction reason therefore using
|
||||
|
@ -16,7 +16,6 @@ from ..compat import (
|
||||
from ..utils import (
|
||||
ExtractorError,
|
||||
get_element_by_attribute,
|
||||
sanitized_Request,
|
||||
)
|
||||
|
||||
|
||||
@ -218,14 +217,10 @@ class YoukuIE(InfoExtractor):
|
||||
headers = {
|
||||
'Referer': req_url,
|
||||
}
|
||||
headers.update(self.geo_verification_headers())
|
||||
self._set_cookie('youku.com', 'xreferrer', 'http://www.youku.com')
|
||||
req = sanitized_Request(req_url, headers=headers)
|
||||
|
||||
cn_verification_proxy = self._downloader.params.get('cn_verification_proxy')
|
||||
if cn_verification_proxy:
|
||||
req.add_header('Ytdl-request-proxy', cn_verification_proxy)
|
||||
|
||||
raw_data = self._download_json(req, video_id, note=note)
|
||||
raw_data = self._download_json(req_url, video_id, note=note, headers=headers)
|
||||
|
||||
return raw_data['data']
|
||||
|
||||
|
@ -209,11 +209,16 @@ def parseOpts(overrideArguments=None):
|
||||
action='store_const', const='::', dest='source_address',
|
||||
help='Make all connections via IPv6 (experimental)',
|
||||
)
|
||||
network.add_option(
|
||||
'--geo-verification-proxy',
|
||||
dest='geo_verification_proxy', default=None, metavar='URL',
|
||||
help='Use this proxy to verify the IP address for some geo-restricted sites. '
|
||||
'The default proxy specified by --proxy (or none, if the options is not present) is used for the actual downloading. (experimental)'
|
||||
)
|
||||
network.add_option(
|
||||
'--cn-verification-proxy',
|
||||
dest='cn_verification_proxy', default=None, metavar='URL',
|
||||
help='Use this proxy to verify the IP address for some Chinese sites. '
|
||||
'The default proxy specified by --proxy (or none, if the options is not present) is used for the actual downloading. (experimental)'
|
||||
help=optparse.SUPPRESS_HELP,
|
||||
)
|
||||
|
||||
selection = optparse.OptionGroup(parser, 'Video Selection')
|
||||
|
@ -1625,6 +1625,11 @@ class HEADRequest(compat_urllib_request.Request):
|
||||
return 'HEAD'
|
||||
|
||||
|
||||
class PUTRequest(compat_urllib_request.Request):
|
||||
def get_method(self):
|
||||
return 'PUT'
|
||||
|
||||
|
||||
def int_or_none(v, scale=1, default=None, get_attr=None, invscale=1):
|
||||
if get_attr:
|
||||
if v is not None:
|
||||
@ -1920,7 +1925,13 @@ def update_Request(req, url=None, data=None, headers={}, query={}):
|
||||
req_headers.update(headers)
|
||||
req_data = data or req.data
|
||||
req_url = update_url_query(url or req.get_full_url(), query)
|
||||
req_type = HEADRequest if req.get_method() == 'HEAD' else compat_urllib_request.Request
|
||||
req_get_method = req.get_method()
|
||||
if req_get_method == 'HEAD':
|
||||
req_type = HEADRequest
|
||||
elif req_get_method == 'PUT':
|
||||
req_type = PUTRequest
|
||||
else:
|
||||
req_type = compat_urllib_request.Request
|
||||
new_req = req_type(
|
||||
req_url, data=req_data, headers=req_headers,
|
||||
origin_req_host=req.origin_req_host, unverifiable=req.unverifiable)
|
||||
|
@ -1,3 +1,3 @@
|
||||
from __future__ import unicode_literals
|
||||
|
||||
__version__ = '2016.07.01'
|
||||
__version__ = '2016.07.03.1'
|
||||
|
Loading…
x
Reference in New Issue
Block a user