Merge branch 'master' into tvpleextractor

kjy00302 2016-03-15 23:56:03 +09:00
commit 86fd62ce46
99 changed files with 2675 additions and 984 deletions

.gitignore
@@ -1,5 +1,6 @@
 *.pyc
 *.pyo
+*.class
 *~
 *.DS_Store
 wine-py2exe/

AUTHORS
@@ -160,3 +160,6 @@ Erwin de Haan
 Jens Wille
 Robin Houtevelts
 Patrick Griffis
+Aidan Rowe
+mutantmonkey
+Ben Congdon

CONTRIBUTING.md
@@ -92,7 +92,9 @@ If you want to create a build of youtube-dl yourself, you'll need
 
 ### Adding support for a new site
 
-If you want to add support for a new site, you can follow this quick list (assuming your service is called `yourextractor`):
+If you want to add support for a new site, first of all **make sure** this site is **not dedicated to [copyright infringement](#can-you-add-support-for-this-anime-video-site-or-site-which-shows-current-movies-for-free)**. youtube-dl does **not support** such sites, and pull requests adding support for them **will be rejected**.
+
+After you have ensured this site is distributing its content legally, you can follow this quick list (assuming your service is called `yourextractor`):
 
 1. [Fork this repository](https://github.com/rg3/youtube-dl/fork)
 2. Check out the source code with `git clone git@github.com:YOUR_GITHUB_USERNAME/youtube-dl.git`
 3. Start a new git branch with `cd youtube-dl; git checkout -b yourextractor`
@@ -140,16 +142,17 @@ If you want to add support for a new site, you can follow this quick list (assum
     ```
 5. Add an import in [`youtube_dl/extractor/__init__.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/__init__.py).
 6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc.
-7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py#L62-L200). Add tests and code for as many as you want.
-8. If you can, check the code with [flake8](https://pypi.python.org/pypi/flake8).
-9. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
+7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L68-L226). Add tests and code for as many as you want.
+8. Keep in mind that the only mandatory fields in the info dict for a successful extraction are `id`, `title` and either `url` or `formats`, i.e. these are the critical data without which extraction does not make sense. This means that [any field](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L138-L226) apart from the aforementioned mandatory ones should be treated **as optional**, and extraction should be **tolerant** of situations where the sources of these fields can potentially be unavailable (even if they are always available at the moment) and **future-proof** so as not to break the extraction of the general-purpose mandatory fields. For example, if you have some intermediate dict `meta` that is a source of metadata and it has a key `summary` that you want to extract and put into the resulting info dict as `description`, you should be prepared for this key to be missing from the `meta` dict, i.e. you should extract it as `meta.get('summary')` and not `meta['summary']`. Similarly, you should pass `fatal=False` when extracting data from a webpage with `_search_regex`/`_html_search_regex`.
+9. Check the code with [flake8](https://pypi.python.org/pypi/flake8).
+10. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files, [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
 
         $ git add youtube_dl/extractor/__init__.py
         $ git add youtube_dl/extractor/yourextractor.py
         $ git commit -m '[yourextractor] Add new extractor'
         $ git push origin yourextractor
 
-10. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it.
+11. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it.
 
 In any case, thank you very much for your contributions!
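
To make step 8 of the rewritten list concrete, here is a minimal extractor sketch; `YourExtractorIE`, the URLs and the regexes are hypothetical placeholders (not part of this commit), but `_match_id`, `_download_webpage` and `_html_search_regex(..., fatal=False)` are the `common.py` helpers the step refers to:

```python
# Hypothetical sketch: only `id`, `title` and `url`/`formats` are mandatory;
# every other field is extracted tolerantly so a site change cannot break it.
from .common import InfoExtractor


class YourExtractorIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?yourextractor\.com/watch/(?P<id>[0-9]+)'
    _TEST = {
        'url': 'http://yourextractor.com/watch/42',
        'info_dict': {
            'id': '42',
            'ext': 'mp4',
            'title': 'Video title goes here',
        },
    }

    def _real_extract(self, url):
        video_id = self._match_id(url)
        webpage = self._download_webpage(url, video_id)
        return {
            'id': video_id,
            # Mandatory: extraction fails loudly if the title cannot be found.
            'title': self._html_search_regex(
                r'<h1[^>]*>(.+?)</h1>', webpage, 'title'),
            # Mandatory: either `url` or `formats` must be present.
            'url': self._search_regex(
                r'videoUrl\s*:\s*"([^"]+)"', webpage, 'video URL'),
            # Optional: fatal=False turns a failed match into a warning and
            # returns None instead of aborting the whole extraction.
            'description': self._html_search_regex(
                r'<meta name="description" content="([^"]+)"', webpage,
                'description', fatal=False),
        }
```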

Makefile
@@ -3,6 +3,7 @@ all: youtube-dl README.md CONTRIBUTING.md README.txt youtube-dl.1 youtube-dl.bas
 clean:
 	rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish *.dump *.part *.info.json *.mp4 *.flv *.mp3 *.avi CONTRIBUTING.md.tmp youtube-dl youtube-dl.exe
 	find . -name "*.pyc" -delete
+	find . -name "*.class" -delete
 
 PREFIX ?= /usr/local
 BINDIR ?= $(PREFIX)/bin
@@ -44,7 +45,7 @@ test:
 ot: offlinetest
 
 offlinetest: codetest
-	nosetests --verbose test --exclude test_download.py --exclude test_age_restriction.py --exclude test_subtitles.py --exclude test_write_annotations.py --exclude test_youtube_lists.py --exclude test_iqiyi_sdk_interpreter.py
+	$(PYTHON) -m nose --verbose test --exclude test_download.py --exclude test_age_restriction.py --exclude test_subtitles.py --exclude test_write_annotations.py --exclude test_youtube_lists.py --exclude test_iqiyi_sdk_interpreter.py
 
 tar: youtube-dl.tar.gz

README.md
@@ -80,6 +80,8 @@ which means you can modify it, redistribute it or use it however you like.
                                      on Windows)
     --flat-playlist                  Do not extract the videos of a playlist,
                                      only list them.
+    --mark-watched                   Mark videos watched (YouTube only)
+    --no-mark-watched                Do not mark videos watched (YouTube only)
     --no-color                       Do not emit color codes in output
 
 ## Network Options:
@@ -179,7 +181,7 @@ which means you can modify it, redistribute it or use it however you like.
                                      to play it)
     --external-downloader COMMAND    Use the specified external downloader.
                                      Currently supports
-                                     aria2c,axel,curl,httpie,wget
+                                     aria2c,avconv,axel,curl,ffmpeg,httpie,wget
     --external-downloader-args ARGS  Give these arguments to the external
                                      downloader
 
@@ -409,13 +411,18 @@ which means you can modify it, redistribute it or use it however you like.
 
 # CONFIGURATION
 
-You can configure youtube-dl by placing any supported command line option to a configuration file. On Linux, the system wide configuration file is located at `/etc/youtube-dl.conf` and the user wide configuration file at `~/.config/youtube-dl/config`. On Windows, the user wide configuration file locations are `%APPDATA%\youtube-dl\config.txt` or `C:\Users\<user name>\youtube-dl.conf`. For example, with the following configuration file youtube-dl will always extract the audio, not copy the mtime and use a proxy:
+You can configure youtube-dl by placing any supported command line option to a configuration file. On Linux, the system wide configuration file is located at `/etc/youtube-dl.conf` and the user wide configuration file at `~/.config/youtube-dl/config`. On Windows, the user wide configuration file locations are `%APPDATA%\youtube-dl\config.txt` or `C:\Users\<user name>\youtube-dl.conf`.
+
+For example, with the following configuration file youtube-dl will always extract the audio, not copy the mtime, use a proxy and save all videos under the `Movies` directory in your home directory:
 ```
--extract-audio
+-x
 --no-mtime
 --proxy 127.0.0.1:3128
+-o ~/Movies/%(title)s.%(ext)s
 ```
+
+Note that options in the configuration file are just the same options (switches) used in regular command line calls, thus there **must be no whitespace** after `-` or `--`, e.g. `-o` or `--proxy` but not `- o` or `-- proxy`.
 
 You can use `--ignore-config` if you want to disable the configuration file for a particular youtube-dl run.
 
 ### Authentication with `.netrc` file
@@ -453,6 +460,7 @@ The basic usage is not to set any template arguments when downloading a single f
 - `alt_title`: A secondary title of the video
 - `display_id`: An alternative identifier for the video
 - `uploader`: Full name of the video uploader
+- `license`: License name the video is licensed under
 - `creator`: The main artist who created the video
 - `release_date`: The date (YYYYMMDD) when the video was released
 - `timestamp`: UNIX timestamp of the moment the video became available
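
The same output template fields drive the embedded Python API as well; a minimal sketch (the `outtmpl` value and the URL are arbitrary examples, not part of this commit):

```python
# Minimal embedding sketch: route downloads through an output template,
# here grouping each file under the uploader's name.
from youtube_dl import YoutubeDL

ydl_opts = {
    # Equivalent to the command line: -o '%(uploader)s/%(title)s-%(id)s.%(ext)s'
    'outtmpl': '%(uploader)s/%(title)s-%(id)s.%(ext)s',
}
with YoutubeDL(ydl_opts) as ydl:
    ydl.download(['https://www.youtube.com/watch?v=BaW_jenozKc'])
```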

docs/supportedsites.md
@@ -54,6 +54,7 @@
 - **AtresPlayer**
 - **ATTTechChannel**
 - **AudiMedia**
+- **AudioBoom**
 - **audiomack**
 - **audiomack:album**
 - **Azubu**
@@ -77,6 +78,7 @@
 - **BleacherReportCMS**
 - **blinkx**
 - **Bloomberg**
+- **BokeCC**
 - **Bpb**: Bundeszentrale für politische Bildung
 - **BR**: Bayerischer Rundfunk Mediathek
 - **Break**
@@ -166,6 +168,8 @@
 - **Dump**
 - **Dumpert**
 - **dvtv**: http://video.aktualne.cz/
+- **dw**
+- **dw:article**
 - **EaglePlatform**
 - **EbaumsWorld**
 - **EchoMsk**
@@ -189,10 +193,10 @@
 - **ExpoTV**
 - **ExtremeTube**
 - **facebook**
-- **facebook:post**
 - **faz.net**
 - **fc2**
 - **Fczenit**
+- **features.aol.com**
 - **fernsehkritik.tv**
 - **Firstpost**
 - **FiveTV**
@@ -292,6 +296,7 @@
 - **kontrtube**: KontrTube.ru - Труба зовёт
 - **KrasView**: Красвью
 - **Ku6**
+- **KUSI**
 - **kuwo:album**: 酷我音乐 - 专辑
 - **kuwo:category**: 酷我音乐 - 分类
 - **kuwo:chart**: 酷我音乐 - 排行榜
@@ -300,12 +305,11 @@
 - **kuwo:song**: 酷我音乐
 - **la7.tv**
 - **Laola1Tv**
+- **Le**: 乐视网
 - **Lecture2Go**
 - **Lemonde**
-- **Letv**: 乐视网
+- **LePlaylist**
 - **LetvCloud**: 乐视云
-- **LetvPlaylist**
-- **LetvTv**
 - **Libsyn**
 - **life:embed**
 - **lifenews**: LIFE | NEWS
@@ -323,6 +327,7 @@
 - **m6**
 - **macgamestore**: MacGameStore trailers
 - **mailru**: Видео@Mail.Ru
+- **MakersChannel**
 - **MakerTV**
 - **Malemotion**
 - **MatchTV**
@@ -333,6 +338,7 @@
 - **Mgoon**
 - **Minhateca**
 - **MinistryGrid**
+- **Minoto**
 - **miomio.tv**
 - **MiTele**: mitele.es
 - **mixcloud**
@@ -420,6 +426,7 @@
 - **Npr**
 - **NRK**
 - **NRKPlaylist**
+- **NRKSkole**: NRK Skole
 - **NRKTV**: NRK TV and NRK Radio
 - **ntv.ru**
 - **Nuvid**
@@ -560,7 +567,6 @@
 - **southpark.de**
 - **southpark.nl**
 - **southparkstudios.dk**
-- **Space**
 - **SpankBang**
 - **Spankwire**
 - **Spiegel**
@@ -620,6 +626,7 @@
 - **TMZ**
 - **TMZArticle**
 - **TNAFlix**
+- **TNAFlixNetworkEmbed**
 - **toggle**
 - **tou.tv**
 - **Toypics**: Toypics user profile
@@ -669,8 +676,10 @@
 - **UDNEmbed**: 聯合影音
 - **Unistra**
 - **Urort**: NRK P3 Urørt
+- **USAToday**
 - **ustream**
 - **ustream:channel**
+- **Ustudio**
 - **Varzesh3**
 - **Vbox7**
 - **VeeHD**
@@ -681,12 +690,13 @@
 - **VGTV**: VGTV, BTTV, FTV, Aftenposten and Aftonbladet
 - **vh1.com**
 - **Vice**
+- **ViceShow**
 - **Viddler**
 - **video.google:search**: Google Video search
 - **video.mit.edu**
 - **VideoDetective**
 - **videofy.me**
-- **VideoMega** (Currently broken)
+- **VideoMega**
 - **videomore**
 - **videomore:season**
 - **videomore:video**
@@ -708,6 +718,7 @@
 - **vimeo:channel**
 - **vimeo:group**
 - **vimeo:likes**: Vimeo user likes
+- **vimeo:ondemand**
 - **vimeo:review**: Review pages on vimeo
 - **vimeo:user**
 - **vimeo:watchlater**: Vimeo watch later list, "vimeowatchlater" keyword (requires authentication)

test/helper.py
@@ -11,8 +11,11 @@ import sys
 
 import youtube_dl.extractor
 from youtube_dl import YoutubeDL
-from youtube_dl.utils import (
+from youtube_dl.compat import (
+    compat_os_name,
     compat_str,
+)
+from youtube_dl.utils import (
     preferredencoding,
     write_string,
 )
@@ -42,7 +45,7 @@ def report_warning(message):
     Print the message to stderr, it will be prefixed with 'WARNING:'
     If stderr is a tty file the 'WARNING:' will be colored
     '''
-    if sys.stderr.isatty() and os.name != 'nt':
+    if sys.stderr.isatty() and compat_os_name != 'nt':
         _msg_header = '\033[0;33mWARNING:\033[0m'
     else:
         _msg_header = 'WARNING:'

test/test_YoutubeDL.py
@@ -502,6 +502,9 @@ class TestYoutubeDL(unittest.TestCase):
         assertRegexpMatches(self, ydl._format_note({
             'vbr': 10,
         }), '^\s*10k$')
+        assertRegexpMatches(self, ydl._format_note({
+            'fps': 30,
+        }), '^30fps$')
 
     def test_postprocessors(self):
         filename = 'post-processor-testfile.mp4'

test/test_http.py
@@ -52,7 +52,12 @@ class TestHTTP(unittest.TestCase):
             ('localhost', 0), HTTPTestRequestHandler)
         self.httpd.socket = ssl.wrap_socket(
             self.httpd.socket, certfile=certfn, server_side=True)
-        self.port = self.httpd.socket.getsockname()[1]
+        if os.name == 'java':
+            # In Jython SSLSocket is not a subclass of socket.socket
+            sock = self.httpd.socket.sock
+        else:
+            sock = self.httpd.socket
+        self.port = sock.getsockname()[1]
         self.server_thread = threading.Thread(target=self.httpd.serve_forever)
         self.server_thread.daemon = True
         self.server_thread.start()

test/test_utils.py
@@ -18,6 +18,7 @@ import xml.etree.ElementTree
 from youtube_dl.utils import (
     age_restricted,
     args_to_str,
+    encode_base_n,
     clean_html,
     DateRange,
     detect_exe_version,
@@ -40,6 +41,7 @@ from youtube_dl.utils import (
     orderedSet,
     parse_duration,
     parse_filesize,
+    parse_count,
     parse_iso8601,
     read_batch_urls,
     sanitize_filename,
@@ -60,6 +62,7 @@ from youtube_dl.utils import (
     lowercase_escape,
     url_basename,
     urlencode_postdata,
+    update_url_query,
     version_tuple,
     xpath_with_ns,
     xpath_element,
@@ -75,6 +78,8 @@ from youtube_dl.utils import (
 )
 from youtube_dl.compat import (
     compat_etree_fromstring,
+    compat_urlparse,
+    compat_parse_qs,
 )
 
@@ -453,6 +458,40 @@ class TestUtil(unittest.TestCase):
         data = urlencode_postdata({'username': 'foo@bar.com', 'password': '1234'})
         self.assertTrue(isinstance(data, bytes))
 
+    def test_update_url_query(self):
+        def query_dict(url):
+            return compat_parse_qs(compat_urlparse.urlparse(url).query)
+        self.assertEqual(query_dict(update_url_query(
+            'http://example.com/path', {'quality': ['HD'], 'format': ['mp4']})),
+            query_dict('http://example.com/path?quality=HD&format=mp4'))
+        self.assertEqual(query_dict(update_url_query(
+            'http://example.com/path', {'system': ['LINUX', 'WINDOWS']})),
+            query_dict('http://example.com/path?system=LINUX&system=WINDOWS'))
+        self.assertEqual(query_dict(update_url_query(
+            'http://example.com/path', {'fields': 'id,formats,subtitles'})),
+            query_dict('http://example.com/path?fields=id,formats,subtitles'))
+        self.assertEqual(query_dict(update_url_query(
+            'http://example.com/path', {'fields': ('id,formats,subtitles', 'thumbnails')})),
+            query_dict('http://example.com/path?fields=id,formats,subtitles&fields=thumbnails'))
+        self.assertEqual(query_dict(update_url_query(
+            'http://example.com/path?manifest=f4m', {'manifest': []})),
+            query_dict('http://example.com/path'))
+        self.assertEqual(query_dict(update_url_query(
+            'http://example.com/path?system=LINUX&system=WINDOWS', {'system': 'LINUX'})),
+            query_dict('http://example.com/path?system=LINUX'))
+        self.assertEqual(query_dict(update_url_query(
+            'http://example.com/path', {'fields': b'id,formats,subtitles'})),
+            query_dict('http://example.com/path?fields=id,formats,subtitles'))
+        self.assertEqual(query_dict(update_url_query(
+            'http://example.com/path', {'width': 1080, 'height': 720})),
+            query_dict('http://example.com/path?width=1080&height=720'))
+        self.assertEqual(query_dict(update_url_query(
+            'http://example.com/path', {'bitrate': 5020.43})),
+            query_dict('http://example.com/path?bitrate=5020.43'))
+        self.assertEqual(query_dict(update_url_query(
+            'http://example.com/path', {'test': '第二行тест'})),
+            query_dict('http://example.com/path?test=%E7%AC%AC%E4%BA%8C%E8%A1%8C%D1%82%D0%B5%D1%81%D1%82'))
+
     def test_dict_get(self):
         FALSE_VALUES = {
             'none': None,
@@ -615,6 +654,15 @@ class TestUtil(unittest.TestCase):
         self.assertEqual(parse_filesize('1.2Tb'), 1200000000000)
         self.assertEqual(parse_filesize('1,24 KB'), 1240)
 
+    def test_parse_count(self):
+        self.assertEqual(parse_count(None), None)
+        self.assertEqual(parse_count(''), None)
+        self.assertEqual(parse_count('0'), 0)
+        self.assertEqual(parse_count('1000'), 1000)
+        self.assertEqual(parse_count('1.000'), 1000)
+        self.assertEqual(parse_count('1.1k'), 1100)
+        self.assertEqual(parse_count('1.1kk'), 1100000)
+
     def test_version_tuple(self):
         self.assertEqual(version_tuple('1'), (1,))
         self.assertEqual(version_tuple('10.23.344'), (10, 23, 344))
@@ -802,5 +850,16 @@ The first line
             ohdave_rsa_encrypt(b'aa111222', e, N),
             '726664bd9a23fd0c70f9f1b84aab5e3905ce1e45a584e9cbcf9bcc7510338fc1986d6c599ff990d923aa43c51c0d9013cd572e13bc58f4ae48f2ed8c0b0ba881')
 
+    def test_encode_base_n(self):
+        self.assertEqual(encode_base_n(0, 30), '0')
+        self.assertEqual(encode_base_n(80, 30), '2k')
+
+        custom_table = '9876543210ZYXWVUTSRQPONMLKJIHGFEDCBA'
+        self.assertEqual(encode_base_n(0, 30, custom_table), '9')
+        self.assertEqual(encode_base_n(80, 30, custom_table), '7P')
+
+        self.assertRaises(ValueError, encode_base_n, 0, 70)
+        self.assertRaises(ValueError, encode_base_n, 0, 60, custom_table)
+
 
 if __name__ == '__main__':
     unittest.main()

youtube_dl/YoutubeDL.py
@@ -24,9 +24,6 @@ import time
 import tokenize
 import traceback
 
-if os.name == 'nt':
-    import ctypes
-
 from .compat import (
     compat_basestring,
     compat_cookiejar,
@@ -34,6 +31,7 @@ from .compat import (
     compat_get_terminal_size,
     compat_http_client,
     compat_kwargs,
+    compat_os_name,
     compat_str,
     compat_tokenize_tokenize,
     compat_urllib_error,
@@ -87,6 +85,7 @@ from .extractor import get_info_extractor, gen_extractors
 from .downloader import get_suitable_downloader
 from .downloader.rtmp import rtmpdump_version
 from .postprocessor import (
+    FFmpegFixupM3u8PP,
     FFmpegFixupM4aPP,
     FFmpegFixupStretchedPP,
     FFmpegMergerPP,
@@ -95,6 +94,9 @@ from .postprocessor import (
 )
 from .version import __version__
 
+if compat_os_name == 'nt':
+    import ctypes
+
 
 class YoutubeDL(object):
     """YoutubeDL class.
@@ -450,7 +452,7 @@ class YoutubeDL(object):
     def to_console_title(self, message):
         if not self.params.get('consoletitle', False):
             return
-        if os.name == 'nt' and ctypes.windll.kernel32.GetConsoleWindow():
+        if compat_os_name == 'nt' and ctypes.windll.kernel32.GetConsoleWindow():
             # c_wchar_p() might not be necessary if `message` is
             # already of type unicode()
            ctypes.windll.kernel32.SetConsoleTitleW(ctypes.c_wchar_p(message))
@@ -521,7 +523,7 @@ class YoutubeDL(object):
         else:
             if self.params.get('no_warnings'):
                 return
-            if not self.params.get('no_color') and self._err_file.isatty() and os.name != 'nt':
+            if not self.params.get('no_color') and self._err_file.isatty() and compat_os_name != 'nt':
                 _msg_header = '\033[0;33mWARNING:\033[0m'
             else:
                 _msg_header = 'WARNING:'
@@ -533,7 +535,7 @@ class YoutubeDL(object):
         Do the same as trouble, but prefixes the message with 'ERROR:', colored
         in red if stderr is a tty file.
         '''
-        if not self.params.get('no_color') and self._err_file.isatty() and os.name != 'nt':
+        if not self.params.get('no_color') and self._err_file.isatty() and compat_os_name != 'nt':
             _msg_header = '\033[0;31mERROR:\033[0m'
         else:
             _msg_header = 'ERROR:'
@@ -566,7 +568,7 @@ class YoutubeDL(object):
             elif template_dict.get('height'):
                 template_dict['resolution'] = '%sp' % template_dict['height']
             elif template_dict.get('width'):
-                template_dict['resolution'] = '?x%d' % template_dict['width']
+                template_dict['resolution'] = '%dx?' % template_dict['width']
 
         sanitize = lambda k, v: sanitize_filename(
             compat_str(v),
@@ -1232,6 +1234,10 @@ class YoutubeDL(object):
             if t.get('id') is None:
                 t['id'] = '%d' % i
 
+        if self.params.get('list_thumbnails'):
+            self.list_thumbnails(info_dict)
+            return
+
         if thumbnails and 'thumbnail' not in info_dict:
             info_dict['thumbnail'] = thumbnails[-1]['url']
 
@@ -1333,9 +1339,6 @@ class YoutubeDL(object):
         if self.params.get('listformats'):
             self.list_formats(info_dict)
             return
-        if self.params.get('list_thumbnails'):
-            self.list_thumbnails(info_dict)
-            return
 
         req_format = self.params.get('format')
         if req_format is None:
@@ -1631,12 +1634,14 @@ class YoutubeDL(object):
                 self.report_error('content too short (expected %s bytes and served %s)' % (err.expected, err.downloaded))
                 return
 
-            if success:
+            if success and filename != '-':
                 # Fixup content
                 fixup_policy = self.params.get('fixup')
                 if fixup_policy is None:
                     fixup_policy = 'detect_or_warn'
 
+                INSTALL_FFMPEG_MESSAGE = 'Install ffmpeg or avconv to fix this automatically.'
+
                 stretched_ratio = info_dict.get('stretched_ratio')
                 if stretched_ratio is not None and stretched_ratio != 1:
                     if fixup_policy == 'warn':
@@ -1649,15 +1654,18 @@ class YoutubeDL(object):
                             info_dict['__postprocessors'].append(stretched_pp)
                         else:
                             self.report_warning(
-                                '%s: Non-uniform pixel ratio (%s). Install ffmpeg or avconv to fix this automatically.' % (
-                                    info_dict['id'], stretched_ratio))
+                                '%s: Non-uniform pixel ratio (%s). %s'
+                                % (info_dict['id'], stretched_ratio, INSTALL_FFMPEG_MESSAGE))
                     else:
                         assert fixup_policy in ('ignore', 'never')
 
-                if info_dict.get('requested_formats') is None and info_dict.get('container') == 'm4a_dash':
+                if (info_dict.get('requested_formats') is None and
+                        info_dict.get('container') == 'm4a_dash'):
                     if fixup_policy == 'warn':
-                        self.report_warning('%s: writing DASH m4a. Only some players support this container.' % (
-                            info_dict['id']))
+                        self.report_warning(
+                            '%s: writing DASH m4a. '
+                            'Only some players support this container.'
+                            % info_dict['id'])
                     elif fixup_policy == 'detect_or_warn':
                         fixup_pp = FFmpegFixupM4aPP(self)
                         if fixup_pp.available:
@@ -1665,8 +1673,27 @@ class YoutubeDL(object):
                             info_dict['__postprocessors'].append(fixup_pp)
                         else:
                             self.report_warning(
-                                '%s: writing DASH m4a. Only some players support this container. Install ffmpeg or avconv to fix this automatically.' % (
-                                    info_dict['id']))
+                                '%s: writing DASH m4a. '
+                                'Only some players support this container. %s'
+                                % (info_dict['id'], INSTALL_FFMPEG_MESSAGE))
+                    else:
+                        assert fixup_policy in ('ignore', 'never')
+
+                if (info_dict.get('protocol') == 'm3u8_native' or
+                        info_dict.get('protocol') == 'm3u8' and
+                        self.params.get('hls_prefer_native')):
+                    if fixup_policy == 'warn':
+                        self.report_warning('%s: malformed aac bitstream.' % (
+                            info_dict['id']))
+                    elif fixup_policy == 'detect_or_warn':
+                        fixup_pp = FFmpegFixupM3u8PP(self)
+                        if fixup_pp.available:
+                            info_dict.setdefault('__postprocessors', [])
+                            info_dict['__postprocessors'].append(fixup_pp)
+                        else:
+                            self.report_warning(
+                                '%s: malformed aac bitstream. %s'
+                                % (info_dict['id'], INSTALL_FFMPEG_MESSAGE))
                 else:
                     assert fixup_policy in ('ignore', 'never')
 
@@ -1830,7 +1857,9 @@ class YoutubeDL(object):
         if fdict.get('vbr') is not None:
             res += '%4dk' % fdict['vbr']
         if fdict.get('fps') is not None:
-            res += ', %sfps' % fdict['fps']
+            if res:
+                res += ', '
+            res += '%sfps' % fdict['fps']
         if fdict.get('acodec') is not None:
             if res:
                 res += ', '
@@ -1873,13 +1902,8 @@ class YoutubeDL(object):
     def list_thumbnails(self, info_dict):
         thumbnails = info_dict.get('thumbnails')
         if not thumbnails:
-            tn_url = info_dict.get('thumbnail')
-            if tn_url:
-                thumbnails = [{'id': '0', 'url': tn_url}]
-            else:
-                self.to_screen(
-                    '[info] No thumbnails present for %s' % info_dict['id'])
-                return
+            self.to_screen('[info] No thumbnails present for %s' % info_dict['id'])
+            return
 
         self.to_screen(
             '[info] Thumbnails for %s:' % info_dict['id'])

youtube_dl/__init__.py
@@ -355,6 +355,7 @@ def _real_main(argv=None):
         'youtube_include_dash_manifest': opts.youtube_include_dash_manifest,
         'encoding': opts.encoding,
         'extract_flat': opts.extract_flat,
+        'mark_watched': opts.mark_watched,
         'merge_output_format': opts.merge_output_format,
         'postprocessors': postprocessors,
         'fixup': opts.fixup,

youtube_dl/compat.py
@@ -326,6 +326,9 @@ def compat_ord(c):
         return ord(c)
 
 
+compat_os_name = os._name if os.name == 'java' else os.name
+
+
 if sys.version_info >= (3, 0):
     compat_getenv = os.getenv
     compat_expanduser = os.path.expanduser
@@ -346,7 +349,7 @@ else:
 
     # The following are os.path.expanduser implementations from cpython 2.7.8 stdlib
     # for different platforms with correct environment variables decoding.
-    if os.name == 'posix':
+    if compat_os_name == 'posix':
        def compat_expanduser(path):
            """Expand ~ and ~user constructions.  If user or $HOME is unknown,
            do nothing."""
@@ -370,7 +373,7 @@ else:
                 userhome = pwent.pw_dir
             userhome = userhome.rstrip('/')
             return (userhome + path[i:]) or '/'
-    elif os.name == 'nt' or os.name == 'ce':
+    elif compat_os_name == 'nt' or compat_os_name == 'ce':
         def compat_expanduser(path):
             """Expand ~ and ~user constructs.
@@ -556,6 +559,7 @@ __all__ = [
     'compat_itertools_count',
     'compat_kwargs',
     'compat_ord',
+    'compat_os_name',
     'compat_parse_qs',
     'compat_print',
     'compat_shlex_split',

youtube_dl/downloader/__init__.py
@@ -1,14 +1,16 @@
 from __future__ import unicode_literals
 
 from .common import FileDownloader
-from .external import get_external_downloader
 from .f4m import F4mFD
 from .hls import HlsFD
-from .hls import NativeHlsFD
 from .http import HttpFD
-from .rtsp import RtspFD
 from .rtmp import RtmpFD
 from .dash import DashSegmentsFD
+from .rtsp import RtspFD
+from .external import (
+    get_external_downloader,
+    FFmpegFD,
+)
 
 from ..utils import (
     determine_protocol,
@@ -16,8 +18,8 @@ from ..utils import (
 
 PROTOCOL_MAP = {
     'rtmp': RtmpFD,
-    'm3u8_native': NativeHlsFD,
-    'm3u8': HlsFD,
+    'm3u8_native': HlsFD,
+    'm3u8': FFmpegFD,
     'mms': RtspFD,
     'rtsp': RtspFD,
     'f4m': F4mFD,
@@ -30,14 +32,17 @@ def get_suitable_downloader(info_dict, params={}):
     protocol = determine_protocol(info_dict)
     info_dict['protocol'] = protocol
 
+    # if (info_dict.get('start_time') or info_dict.get('end_time')) and not info_dict.get('requested_formats') and FFmpegFD.can_download(info_dict):
+    #     return FFmpegFD
+
     external_downloader = params.get('external_downloader')
     if external_downloader is not None:
         ed = get_external_downloader(external_downloader)
-        if ed.supports(info_dict):
+        if ed.can_download(info_dict):
             return ed
 
     if protocol == 'm3u8' and params.get('hls_prefer_native'):
-        return NativeHlsFD
+        return HlsFD
 
     return PROTOCOL_MAP.get(protocol, HttpFD)

youtube_dl/downloader/common.py
@@ -5,6 +5,7 @@ import re
 import sys
 import time
 
+from ..compat import compat_os_name
 from ..utils import (
     encodeFilename,
     error_to_compat_str,
@@ -219,7 +220,7 @@ class FileDownloader(object):
         if self.params.get('progress_with_newline', False):
             self.to_screen(fullmsg)
         else:
-            if os.name == 'nt':
+            if compat_os_name == 'nt':
                 prev_len = getattr(self, '_report_progress_prev_line_length',
                                    0)
                 if prev_len > len(fullmsg):

youtube_dl/downloader/external.py
@@ -2,8 +2,11 @@ from __future__ import unicode_literals
 
 import os.path
 import subprocess
+import sys
+import re
 
 from .common import FileDownloader
+from ..postprocessor.ffmpeg import FFmpegPostProcessor, EXT_TO_OUT_FORMATS
 from ..utils import (
     cli_option,
     cli_valueless_option,
@@ -11,6 +14,8 @@ from ..utils import (
     cli_configuration_args,
     encodeFilename,
     encodeArgument,
+    handle_youtubedl_headers,
+    check_executable,
 )
 
@@ -45,10 +50,18 @@ class ExternalFD(FileDownloader):
     def exe(self):
         return self.params.get('external_downloader')
 
+    @classmethod
+    def available(cls):
+        return check_executable(cls.get_basename(), [cls.AVAILABLE_OPT])
+
     @classmethod
     def supports(cls, info_dict):
         return info_dict['protocol'] in ('http', 'https', 'ftp', 'ftps')
 
+    @classmethod
+    def can_download(cls, info_dict):
+        return cls.available() and cls.supports(info_dict)
+
     def _option(self, command_option, param):
         return cli_option(self.params, command_option, param)
 
@@ -76,6 +89,8 @@ class ExternalFD(FileDownloader):
 
 
 class CurlFD(ExternalFD):
+    AVAILABLE_OPT = '-V'
+
     def _make_cmd(self, tmpfilename, info_dict):
         cmd = [self.exe, '--location', '-o', tmpfilename]
         for key, val in info_dict['http_headers'].items():
@@ -89,6 +104,8 @@ class CurlFD(ExternalFD):
 
 
 class AxelFD(ExternalFD):
+    AVAILABLE_OPT = '-V'
+
     def _make_cmd(self, tmpfilename, info_dict):
         cmd = [self.exe, '-o', tmpfilename]
         for key, val in info_dict['http_headers'].items():
@@ -99,6 +116,8 @@ class AxelFD(ExternalFD):
 
 
 class WgetFD(ExternalFD):
+    AVAILABLE_OPT = '--version'
+
     def _make_cmd(self, tmpfilename, info_dict):
         cmd = [self.exe, '-O', tmpfilename, '-nv', '--no-cookies']
         for key, val in info_dict['http_headers'].items():
@@ -112,6 +131,8 @@ class WgetFD(ExternalFD):
 
 
 class Aria2cFD(ExternalFD):
+    AVAILABLE_OPT = '-v'
+
     def _make_cmd(self, tmpfilename, info_dict):
         cmd = [self.exe, '-c']
         cmd += self._configuration_args([
@@ -130,12 +151,112 @@ class Aria2cFD(ExternalFD):
 
 
 class HttpieFD(ExternalFD):
+    @classmethod
+    def available(cls):
+        return check_executable('http', ['--version'])
+
     def _make_cmd(self, tmpfilename, info_dict):
         cmd = ['http', '--download', '--output', tmpfilename, info_dict['url']]
         for key, val in info_dict['http_headers'].items():
             cmd += ['%s:%s' % (key, val)]
         return cmd
 
+
+class FFmpegFD(ExternalFD):
+    @classmethod
+    def supports(cls, info_dict):
+        return info_dict['protocol'] in ('http', 'https', 'ftp', 'ftps', 'm3u8', 'rtsp', 'rtmp', 'mms')
+
+    @classmethod
+    def available(cls):
+        return FFmpegPostProcessor().available
+
+    def _call_downloader(self, tmpfilename, info_dict):
+        url = info_dict['url']
+        ffpp = FFmpegPostProcessor(downloader=self)
+        if not ffpp.available:
+            self.report_error('m3u8 download detected but ffmpeg or avconv could not be found. Please install one.')
+            return False
+        ffpp.check_version()
+
+        args = [ffpp.executable, '-y']
+
+        args += self._configuration_args()
+
+        # start_time = info_dict.get('start_time') or 0
+        # if start_time:
+        #     args += ['-ss', compat_str(start_time)]
+        # end_time = info_dict.get('end_time')
+        # if end_time:
+        #     args += ['-t', compat_str(end_time - start_time)]
+
+        if info_dict['http_headers'] and re.match(r'^https?://', url):
+            # Trailing \r\n after each HTTP header is important to prevent warning from ffmpeg/avconv:
+            # [http @ 00000000003d2fa0] No trailing CRLF found in HTTP header.
+            headers = handle_youtubedl_headers(info_dict['http_headers'])
+            args += [
+                '-headers',
+                ''.join('%s: %s\r\n' % (key, val) for key, val in headers.items())]
+
+        protocol = info_dict.get('protocol')
+
+        if protocol == 'rtmp':
+            player_url = info_dict.get('player_url')
+            page_url = info_dict.get('page_url')
+            app = info_dict.get('app')
+            play_path = info_dict.get('play_path')
+            tc_url = info_dict.get('tc_url')
+            flash_version = info_dict.get('flash_version')
+            live = info_dict.get('rtmp_live', False)
+            if player_url is not None:
+                args += ['-rtmp_swfverify', player_url]
+            if page_url is not None:
+                args += ['-rtmp_pageurl', page_url]
+            if app is not None:
+                args += ['-rtmp_app', app]
+            if play_path is not None:
+                args += ['-rtmp_playpath', play_path]
+            if tc_url is not None:
+                args += ['-rtmp_tcurl', tc_url]
+            if flash_version is not None:
+                args += ['-rtmp_flashver', flash_version]
+            if live:
+                args += ['-rtmp_live', 'live']
+
+        args += ['-i', url, '-c', 'copy']
+        if protocol == 'm3u8':
+            if self.params.get('hls_use_mpegts', False):
+                args += ['-f', 'mpegts']
+            else:
+                args += ['-f', 'mp4', '-bsf:a', 'aac_adtstoasc']
+        elif protocol == 'rtmp':
+            args += ['-f', 'flv']
+        else:
+            args += ['-f', EXT_TO_OUT_FORMATS.get(info_dict['ext'], info_dict['ext'])]
+
+        args = [encodeArgument(opt) for opt in args]
+        args.append(encodeFilename(ffpp._ffmpeg_filename_argument(tmpfilename), True))
+
+        self._debug_cmd(args)
+
+        proc = subprocess.Popen(args, stdin=subprocess.PIPE)
+        try:
+            retval = proc.wait()
+        except KeyboardInterrupt:
+            # subprocess.run would send the SIGKILL signal to ffmpeg and the
+            # mp4 file couldn't be played, but if we ask ffmpeg to quit it
+            # produces a file that is playable (this is mostly useful for live
+            # streams). Note that Windows is not affected and produces playable
+            # files (see https://github.com/rg3/youtube-dl/issues/8300).
+            if sys.platform != 'win32':
+                proc.communicate(b'q')
+            raise
+        return retval
+
+
+class AVconvFD(FFmpegFD):
+    pass
+
+
 _BY_NAME = dict(
     (klass.get_basename(), klass)
     for name, klass in globals().items()

youtube_dl/downloader/fragment.py
@@ -99,7 +99,8 @@ class FragmentFD(FileDownloader):
                 state['eta'] = self.calc_eta(
                     start, time_now, estimated_size,
                     state['downloaded_bytes'])
-                state['speed'] = s.get('speed')
+                state['speed'] = s.get('speed') or ctx.get('speed')
+                ctx['speed'] = state['speed']
                 ctx['prev_frag_downloaded_bytes'] = frag_downloaded_bytes
 
             self._hook_progress(state)

youtube_dl/downloader/hls.py
@@ -1,87 +1,19 @@
 from __future__ import unicode_literals
 
-import os
+import os.path
 import re
-import subprocess
-import sys
 
-from .common import FileDownloader
 from .fragment import FragmentFD
 
 from ..compat import compat_urlparse
-from ..postprocessor.ffmpeg import FFmpegPostProcessor
 from ..utils import (
-    encodeArgument,
     encodeFilename,
     sanitize_open,
-    handle_youtubedl_headers,
 )
 
 
-class HlsFD(FileDownloader):
-    def real_download(self, filename, info_dict):
-        url = info_dict['url']
-        self.report_destination(filename)
-        tmpfilename = self.temp_name(filename)
-
-        ffpp = FFmpegPostProcessor(downloader=self)
-        if not ffpp.available:
-            self.report_error('m3u8 download detected but ffmpeg or avconv could not be found. Please install one.')
-            return False
-        ffpp.check_version()
-
-        args = [ffpp.executable, '-y']
-
-        if info_dict['http_headers'] and re.match(r'^https?://', url):
-            # Trailing \r\n after each HTTP header is important to prevent warning from ffmpeg/avconv:
-            # [http @ 00000000003d2fa0] No trailing CRLF found in HTTP header.
-            headers = handle_youtubedl_headers(info_dict['http_headers'])
-            args += [
-                '-headers',
-                ''.join('%s: %s\r\n' % (key, val) for key, val in headers.items())]
-
-        args += ['-i', url, '-c', 'copy']
-        if self.params.get('hls_use_mpegts', False):
-            args += ['-f', 'mpegts']
-        else:
-            args += ['-f', 'mp4', '-bsf:a', 'aac_adtstoasc']
-
-        args = [encodeArgument(opt) for opt in args]
-        args.append(encodeFilename(ffpp._ffmpeg_filename_argument(tmpfilename), True))
-
-        self._debug_cmd(args)
-
-        proc = subprocess.Popen(args, stdin=subprocess.PIPE)
-        try:
-            retval = proc.wait()
-        except KeyboardInterrupt:
-            # subprocess.run would send the SIGKILL signal to ffmpeg and the
-            # mp4 file couldn't be played, but if we ask ffmpeg to quit it
-            # produces a file that is playable (this is mostly useful for live
-            # streams). Note that Windows is not affected and produces playable
-            # files (see https://github.com/rg3/youtube-dl/issues/8300).
-            if sys.platform != 'win32':
-                proc.communicate(b'q')
-            raise
-        if retval == 0:
-            fsize = os.path.getsize(encodeFilename(tmpfilename))
-            self.to_screen('\r[%s] %s bytes' % (args[0], fsize))
-            self.try_rename(tmpfilename, filename)
-            self._hook_progress({
-                'downloaded_bytes': fsize,
-                'total_bytes': fsize,
-                'filename': filename,
-                'status': 'finished',
-            })
-            return True
-        else:
-            self.to_stderr('\n')
-            self.report_error('%s exited with code %d' % (ffpp.basename, retval))
-            return False
-
-
-class NativeHlsFD(FragmentFD):
-    """ A more limited implementation that does not require ffmpeg """
+class HlsFD(FragmentFD):
+    """ A limited implementation that does not require ffmpeg """
 
     FD_NAME = 'hlsnative'

youtube_dl/extractor/__init__.py
@@ -23,7 +23,10 @@ from .alphaporno import AlphaPornoIE
 from .animeondemand import AnimeOnDemandIE
 from .anitube import AnitubeIE
 from .anysex import AnySexIE
-from .aol import AolIE
+from .aol import (
+    AolIE,
+    AolFeaturesIE,
+)
 from .allocine import AllocineIE
 from .aparat import AparatIE
 from .appleconnect import AppleConnectIE
@@ -51,6 +54,7 @@ from .arte import (
 from .atresplayer import AtresPlayerIE
 from .atttechchannel import ATTTechChannelIE
 from .audimedia import AudiMediaIE
+from .audioboom import AudioBoomIE
 from .audiomack import AudiomackIE, AudiomackAlbumIE
 from .azubu import AzubuIE, AzubuLiveIE
 from .baidu import BaiduVideoIE
@@ -74,6 +78,7 @@ from .bleacherreport import (
 )
 from .blinkx import BlinkxIE
 from .bloomberg import BloombergIE
+from .bokecc import BokeCCIE
 from .bpb import BpbIE
 from .br import BRIE
 from .breakcom import BreakIE
@@ -184,6 +189,10 @@ from .dumpert import DumpertIE
 from .defense import DefenseGouvFrIE
 from .discovery import DiscoveryIE
 from .dropbox import DropboxIE
+from .dw import (
+    DWIE,
+    DWArticleIE,
+)
 from .eagleplatform import EaglePlatformIE
 from .ebaumsworld import EbaumsWorldIE
 from .echomsk import EchoMskIE
@@ -208,10 +217,7 @@ from .everyonesmixtape import EveryonesMixtapeIE
 from .exfm import ExfmIE
 from .expotv import ExpoTVIE
 from .extremetube import ExtremeTubeIE
-from .facebook import (
-    FacebookIE,
-    FacebookPostIE,
-)
+from .facebook import FacebookIE
 from .faz import FazIE
 from .fc2 import FC2IE
 from .fczenit import FczenitIE
@@ -339,6 +345,7 @@ from .konserthusetplay import KonserthusetPlayIE
 from .kontrtube import KontrTubeIE
 from .krasview import KrasViewIE
 from .ku6 import Ku6IE
+from .kusi import KUSIIE
 from .kuwo import (
     KuwoIE,
     KuwoAlbumIE,
@@ -351,10 +358,9 @@ from .la7 import LA7IE
 from .laola1tv import Laola1TvIE
 from .lecture2go import Lecture2GoIE
 from .lemonde import LemondeIE
-from .letv import (
-    LetvIE,
-    LetvTvIE,
-    LetvPlaylistIE,
+from .leeco import (
+    LeIE,
+    LePlaylistIE,
     LetvCloudIE,
 )
 from .libsyn import LibsynIE
@@ -383,6 +389,7 @@ from .lynda import (
 from .m6 import M6IE
 from .macgamestore import MacGameStoreIE
 from .mailru import MailRuIE
+from .makerschannel import MakersChannelIE
 from .makertv import MakerTVIE
 from .malemotion import MalemotionIE
 from .matchtv import MatchTVIE
@@ -392,6 +399,7 @@ from .metacritic import MetacriticIE
 from .mgoon import MgoonIE
 from .minhateca import MinhatecaIE
 from .ministrygrid import MinistryGridIE
+from .minoto import MinotoIE
 from .miomio import MioMioIE
 from .mit import TechTVMITIE, MITIE, OCWMITIE
 from .mitele import MiTeleIE
@@ -505,6 +513,7 @@ from .npr import NprIE
 from .nrk import (
     NRKIE,
     NRKPlaylistIE,
+    NRKSkoleIE,
     NRKTVIE,
 )
 from .ntvde import NTVDeIE
@@ -589,6 +598,7 @@ from .regiotv import RegioTVIE
 from .restudy import RestudyIE
 from .reverbnation import ReverbNationIE
 from .revision3 import Revision3IE
+from .rice import RICEIE
 from .ringtv import RingTVIE
 from .ro220 import Ro220IE
 from .rottentomatoes import RottenTomatoesIE
@@ -669,7 +679,6 @@ from .southpark import (
     SouthParkEsIE,
     SouthParkNlIE
 )
-from .space import SpaceIE
 from .spankbang import SpankBangIE
 from .spankwire import SpankwireIE
 from .spiegel import SpiegelIE, SpiegelArticleIE
@@ -737,6 +746,7 @@ from .tmz import (
     TMZArticleIE,
 )
 from .tnaflix import (
+    TNAFlixNetworkEmbedIE,
     TNAFlixIE,
     EMPFlixIE,
     MovieFapIE,
@@ -813,7 +823,9 @@ from .udn import UDNEmbedIE
 from .digiteka import DigitekaIE
 from .unistra import UnistraIE
 from .urort import UrortIE
+from .usatoday import USATodayIE
 from .ustream import UstreamIE, UstreamChannelIE
+from .ustudio import UstudioIE
 from .varzesh3 import Varzesh3IE
 from .vbox7 import Vbox7IE
 from .veehd import VeeHDIE
@@ -827,7 +839,10 @@ from .vgtv import (
     VGTVIE,
 )
 from .vh1 import VH1IE
-from .vice import ViceIE
+from .vice import (
+    ViceIE,
+    ViceShowIE,
+)
 from .viddler import ViddlerIE
 from .videodetective import VideoDetectiveIE
 from .videofyme import VideofyMeIE
@@ -854,6 +869,7 @@ from .vimeo import (
    VimeoChannelIE,
    VimeoGroupsIE,
    VimeoLikesIE,
+   VimeoOndemandIE,
    VimeoReviewIE,
    VimeoUserIE,
    VimeoWatchLaterIE,

youtube_dl/extractor/aol.py
@@ -1,24 +1,11 @@
 from __future__ import unicode_literals
 
-import re
-
 from .common import InfoExtractor
 
 
 class AolIE(InfoExtractor):
     IE_NAME = 'on.aol.com'
-    _VALID_URL = r'''(?x)
-        (?:
-            aol-video:|
-            http://on\.aol\.com/
-            (?:
-                video/.*-|
-                playlist/(?P<playlist_display_id>[^/?#]+?)-(?P<playlist_id>[0-9]+)[?#].*_videoid=
-            )
-        )
-        (?P<id>[0-9]+)
-        (?:$|\?)
-    '''
+    _VALID_URL = r'(?:aol-video:|http://on\.aol\.com/video/.*-)(?P<id>[0-9]+)(?:$|\?)'
 
     _TESTS = [{
         'url': 'http://on.aol.com/video/u-s--official-warns-of-largest-ever-irs-phone-scam-518167793?icid=OnHomepageC2Wide_MustSee_Img',
@@ -29,42 +16,31 @@ class AolIE(InfoExtractor):
             'title': 'U.S. Official Warns Of \'Largest Ever\' IRS Phone Scam',
         },
         'add_ie': ['FiveMin'],
-    }, {
-        'url': 'http://on.aol.com/playlist/brace-yourself---todays-weirdest-news-152147?icid=OnHomepageC4_Omg_Img#_videoid=518184316',
-        'info_dict': {
-            'id': '152147',
-            'title': 'Brace Yourself - Today\'s Weirdest News',
-        },
-        'playlist_mincount': 10,
     }]
 
     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        playlist_id = mobj.group('playlist_id')
-        if not playlist_id or self._downloader.params.get('noplaylist'):
-            return self.url_result('5min:%s' % video_id)
+        video_id = self._match_id(url)
+        return self.url_result('5min:%s' % video_id)
 
-        self.to_screen('Downloading playlist %s - add --no-playlist to just download video %s' % (playlist_id, video_id))
 
-        webpage = self._download_webpage(url, playlist_id)
-        title = self._html_search_regex(
-            r'<h1 class="video-title[^"]*">(.+?)</h1>', webpage, 'title')
-        playlist_html = self._search_regex(
-            r"(?s)<ul\s+class='video-related[^']*'>(.*?)</ul>", webpage,
-            'playlist HTML')
-        entries = [{
-            '_type': 'url',
-            'url': 'aol-video:%s' % m.group('id'),
-            'ie_key': 'Aol',
-        } for m in re.finditer(
-            r"<a\s+href='.*videoid=(?P<id>[0-9]+)'\s+class='video-thumb'>",
-            playlist_html)]
-
-        return {
-            '_type': 'playlist',
-            'id': playlist_id,
-            'display_id': mobj.group('playlist_display_id'),
-            'title': title,
-            'entries': entries,
-        }
+class AolFeaturesIE(InfoExtractor):
+    IE_NAME = 'features.aol.com'
+    _VALID_URL = r'http://features\.aol\.com/video/(?P<id>[^/?#]+)'
+
+    _TESTS = [{
+        'url': 'http://features.aol.com/video/behind-secret-second-careers-late-night-talk-show-hosts',
+        'md5': '7db483bb0c09c85e241f84a34238cc75',
+        'info_dict': {
+            'id': '519507715',
+            'ext': 'mp4',
+            'title': 'What To Watch - February 17, 2016',
+        },
+        'add_ie': ['FiveMin'],
+    }]
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+        return self.url_result(self._search_regex(
+            r'<script type="text/javascript" src="(https?://[^/]*?5min\.com/Scripts/PlayerSeed\.js[^"]+)"',
+            webpage, '5min embed url'), 'FiveMin')

View File

@ -121,15 +121,18 @@ class ArteTVPlus7IE(InfoExtractor):
json_url = compat_parse_qs( json_url = compat_parse_qs(
compat_urllib_parse_urlparse(iframe_url).query)['json_url'][0] compat_urllib_parse_urlparse(iframe_url).query)['json_url'][0]
if json_url: if json_url:
return self._extract_from_json_url(json_url, video_id, lang) title = self._search_regex(
# Differend kind of embed URL (e.g. r'<h3[^>]+title=(["\'])(?P<title>.+?)\1',
webpage, 'title', default=None, group='title')
return self._extract_from_json_url(json_url, video_id, lang, title=title)
# Different kind of embed URL (e.g.
# http://www.arte.tv/magazine/trepalium/fr/episode-0406-replay-trepalium) # http://www.arte.tv/magazine/trepalium/fr/episode-0406-replay-trepalium)
embed_url = self._search_regex( embed_url = self._search_regex(
r'<iframe[^>]+src=(["\'])(?P<url>.+?)\1', r'<iframe[^>]+src=(["\'])(?P<url>.+?)\1',
webpage, 'embed url', group='url') webpage, 'embed url', group='url')
return self.url_result(embed_url) return self.url_result(embed_url)
def _extract_from_json_url(self, json_url, video_id, lang): def _extract_from_json_url(self, json_url, video_id, lang, title=None):
info = self._download_json(json_url, video_id) info = self._download_json(json_url, video_id)
player_info = info['videoJsonPlayer'] player_info = info['videoJsonPlayer']
@ -137,7 +140,7 @@ class ArteTVPlus7IE(InfoExtractor):
if not upload_date_str: if not upload_date_str:
upload_date_str = (player_info.get('VRA') or player_info.get('VDA') or '').split(' ')[0] upload_date_str = (player_info.get('VRA') or player_info.get('VDA') or '').split(' ')[0]
title = player_info['VTI'].strip() title = (player_info.get('VTI') or title or player_info['VID']).strip()
subtitle = player_info.get('VSU', '').strip() subtitle = player_info.get('VSU', '').strip()
if subtitle: if subtitle:
title += ' - %s' % subtitle title += ' - %s' % subtitle
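The new optional `title` parameter lets `_extract_from_json_url` fall back to a title scraped from the embedding page when the player JSON lacks `VTI`. A hypothetical helper distilling that fallback chain (field names as in the diff):

```python
# Fallback order: JSON title (VTI), then page title, then the video id (VID).
def resolve_title(player_info, page_title=None):
    title = (player_info.get('VTI') or page_title or player_info['VID']).strip()
    subtitle = player_info.get('VSU', '').strip()
    if subtitle:
        title += ' - %s' % subtitle
    return title

assert resolve_title({'VID': 'ABC-123'}, 'Trepalium') == 'Trepalium'
```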


@ -10,9 +10,9 @@ from ..utils import (
class AudiMediaIE(InfoExtractor): class AudiMediaIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?audimedia\.tv/(?:en|de)/vid/(?P<id>[^/?#]+)' _VALID_URL = r'https?://(?:www\.)?audi-mediacenter\.com/(?:en|de)/audimediatv/(?P<id>[^/?#]+)'
_TEST = { _TEST = {
'url': 'https://audimedia.tv/en/vid/60-seconds-of-audi-sport-104-2015-wec-bahrain-rookie-test', 'url': 'https://www.audi-mediacenter.com/en/audimediatv/60-seconds-of-audi-sport-104-2015-wec-bahrain-rookie-test-1467',
'md5': '79a8b71c46d49042609795ab59779b66', 'md5': '79a8b71c46d49042609795ab59779b66',
'info_dict': { 'info_dict': {
'id': '1565', 'id': '1565',
@ -32,7 +32,10 @@ class AudiMediaIE(InfoExtractor):
display_id = self._match_id(url) display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
raw_payload = self._search_regex(r'<script[^>]+class="amtv-embed"[^>]+id="([^"]+)"', webpage, 'raw payload') raw_payload = self._search_regex([
r'class="amtv-embed"[^>]+id="([^"]+)"',
r'class=\\"amtv-embed\\"[^>]+id=\\"([^"]+)\\"',
], webpage, 'raw payload')
_, stage_mode, video_id, lang = raw_payload.split('-') _, stage_mode, video_id, lang = raw_payload.split('-')
# TODO: handle s and e stage_mode (live streams and ended live streams) # TODO: handle s and e stage_mode (live streams and ended live streams)
@ -59,13 +62,19 @@ class AudiMediaIE(InfoExtractor):
video_version_url = video_version.get('download_url') or video_version.get('stream_url') video_version_url = video_version.get('download_url') or video_version.get('stream_url')
if not video_version_url: if not video_version_url:
continue continue
formats.append({ f = {
'url': video_version_url, 'url': video_version_url,
'width': int_or_none(video_version.get('width')), 'width': int_or_none(video_version.get('width')),
'height': int_or_none(video_version.get('height')), 'height': int_or_none(video_version.get('height')),
'abr': int_or_none(video_version.get('audio_bitrate')), 'abr': int_or_none(video_version.get('audio_bitrate')),
'vbr': int_or_none(video_version.get('video_bitrate')), 'vbr': int_or_none(video_version.get('video_bitrate')),
}) }
bitrate = self._search_regex(r'(\d+)k', video_version_url, 'bitrate', default=None)
if bitrate:
f.update({
'format_id': 'http-%s' % bitrate,
})
formats.append(f)
self._sort_formats(formats) self._sort_formats(formats)
return { return {
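The `format_id` added here is derived from a bitrate token such as `2500k` embedded in the stream URL. A quick sketch of that derivation (URL invented):

```python
import re

def format_id_from_url(video_url):
    # Mirror the diff: pull a '<digits>k' token out of the URL, if present.
    m = re.search(r'(\d+)k', video_url)
    return 'http-%s' % m.group(1) if m else None

assert format_id_from_url('https://example.invalid/clip_2500k.mp4') == 'http-2500'
```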


@ -0,0 +1,66 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import float_or_none
class AudioBoomIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?audioboom\.com/boos/(?P<id>[0-9]+)'
_TEST = {
'url': 'https://audioboom.com/boos/4279833-3-09-2016-czaban-hour-3?t=0',
'md5': '63a8d73a055c6ed0f1e51921a10a5a76',
'info_dict': {
'id': '4279833',
'ext': 'mp3',
'title': '3/09/2016 Czaban Hour 3',
'description': 'Guest: Nate Davis - NFL free agency, Guest: Stan Gans',
'duration': 2245.72,
'uploader': 'Steve Czaban',
'uploader_url': 're:https?://(?:www\.)?audioboom\.com/channel/steveczabanyahoosportsradio',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
clip = None
clip_store = self._parse_json(
self._search_regex(
r'data-new-clip-store=(["\'])(?P<json>{.*?"clipId"\s*:\s*%s.*?})\1' % video_id,
webpage, 'clip store', default='{}', group='json'),
video_id, fatal=False)
if clip_store:
clips = clip_store.get('clips')
if clips and isinstance(clips, list) and isinstance(clips[0], dict):
clip = clips[0]
def from_clip(field):
if clip:
return clip.get(field)
audio_url = from_clip('clipURLPriorToLoading') or self._og_search_property(
'audio', webpage, 'audio url')
title = from_clip('title') or self._og_search_title(webpage)
description = from_clip('description') or self._og_search_description(webpage)
duration = float_or_none(from_clip('duration') or self._html_search_meta(
'weibo:audio:duration', webpage))
uploader = from_clip('author') or self._og_search_property(
'audio:artist', webpage, 'uploader', fatal=False)
uploader_url = from_clip('author_url') or self._html_search_meta(
'audioboo:channel', webpage, 'uploader url')
return {
'id': video_id,
'url': audio_url,
'title': title,
'description': description,
'duration': duration,
'uploader': uploader,
'uploader_url': uploader_url,
}
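The extractor prefers fields from the embedded `data-new-clip-store` JSON and falls back to Open Graph and `<meta>` values, with `default='{}'` and `fatal=False` keeping a missing store non-fatal. A standalone sketch of that graceful-degradation pattern (plain `json` in place of `_parse_json`, sample blob invented):

```python
import json

def first_clip(raw_json):
    # Tolerate a missing or malformed clip store, as the extractor does.
    try:
        clips = json.loads(raw_json).get('clips')
    except (ValueError, AttributeError):
        return None
    if isinstance(clips, list) and clips and isinstance(clips[0], dict):
        return clips[0]
    return None

clip = first_clip('{"clips": [{"title": "Czaban Hour 3"}]}') or {}
title = clip.get('title') or 'og:title fallback'
assert title == 'Czaban Hour 3'
```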


@ -10,7 +10,6 @@ from ..utils import (
int_or_none, int_or_none,
parse_duration, parse_duration,
parse_iso8601, parse_iso8601,
remove_end,
unescapeHTML, unescapeHTML,
) )
from ..compat import ( from ..compat import (
@ -561,7 +560,7 @@ class BBCIE(BBCCoUkIE):
'url': 'http://www.bbc.co.uk/blogs/adamcurtis/entries/3662a707-0af9-3149-963f-47bea720b460', 'url': 'http://www.bbc.co.uk/blogs/adamcurtis/entries/3662a707-0af9-3149-963f-47bea720b460',
'info_dict': { 'info_dict': {
'id': '3662a707-0af9-3149-963f-47bea720b460', 'id': '3662a707-0af9-3149-963f-47bea720b460',
'title': 'BBC Blogs - Adam Curtis - BUGGER', 'title': 'BUGGER',
}, },
'playlist_count': 18, 'playlist_count': 18,
}, { }, {
@ -670,9 +669,17 @@ class BBCIE(BBCCoUkIE):
'url': 'http://www.bbc.com/sport/0/football/34475836', 'url': 'http://www.bbc.com/sport/0/football/34475836',
'info_dict': { 'info_dict': {
'id': '34475836', 'id': '34475836',
'title': 'What Liverpool can expect from Klopp', 'title': 'Jurgen Klopp: Furious football from a witty and winning coach',
}, },
'playlist_count': 3, 'playlist_count': 3,
}, {
# school report article with single video
'url': 'http://www.bbc.co.uk/schoolreport/35744779',
'info_dict': {
'id': '35744779',
'title': 'School which breaks down barriers in Jerusalem',
},
'playlist_count': 1,
}, { }, {
# single video with playlist URL from weather section # single video with playlist URL from weather section
'url': 'http://www.bbc.com/weather/features/33601775', 'url': 'http://www.bbc.com/weather/features/33601775',
@ -735,8 +742,17 @@ class BBCIE(BBCCoUkIE):
json_ld_info = self._search_json_ld(webpage, playlist_id, default=None) json_ld_info = self._search_json_ld(webpage, playlist_id, default=None)
timestamp = json_ld_info.get('timestamp') timestamp = json_ld_info.get('timestamp')
playlist_title = json_ld_info.get('title') playlist_title = json_ld_info.get('title')
playlist_description = json_ld_info.get('description') if not playlist_title:
playlist_title = self._og_search_title(
webpage, default=None) or self._html_search_regex(
r'<title>(.+?)</title>', webpage, 'playlist title', default=None)
if playlist_title:
playlist_title = re.sub(r'(.+)\s*-\s*BBC.*?$', r'\1', playlist_title).strip()
playlist_description = json_ld_info.get(
'description') or self._og_search_description(webpage, default=None)
if not timestamp: if not timestamp:
timestamp = parse_iso8601(self._search_regex( timestamp = parse_iso8601(self._search_regex(
@ -797,8 +813,6 @@ class BBCIE(BBCCoUkIE):
playlist.get('progressiveDownloadUrl'), playlist_id, timestamp)) playlist.get('progressiveDownloadUrl'), playlist_id, timestamp))
if entries: if entries:
playlist_title = playlist_title or remove_end(self._og_search_title(webpage), ' - BBC News')
playlist_description = playlist_description or self._og_search_description(webpage, default=None)
return self.playlist_result(entries, playlist_id, playlist_title, playlist_description) return self.playlist_result(entries, playlist_id, playlist_title, playlist_description)
# single video story (e.g. http://www.bbc.com/travel/story/20150625-sri-lankas-spicy-secret) # single video story (e.g. http://www.bbc.com/travel/story/20150625-sri-lankas-spicy-secret)
@ -829,10 +843,6 @@ class BBCIE(BBCCoUkIE):
'subtitles': subtitles, 'subtitles': subtitles,
} }
playlist_title = self._html_search_regex(
r'<title>(.*?)(?:\s*-\s*BBC [^ ]+)?</title>', webpage, 'playlist title')
playlist_description = self._og_search_description(webpage, default=None)
def extract_all(pattern): def extract_all(pattern):
return list(filter(None, map( return list(filter(None, map(
lambda s: self._parse_json(s, playlist_id, fatal=False), lambda s: self._parse_json(s, playlist_id, fatal=False),
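The playlist title now cascades from JSON-LD to `og:title` to `<title>`, with a regex stripping the trailing ' - BBC ...' branding. The stripping step in isolation:

```python
import re

def strip_bbc_suffix(page_title):
    # As in the diff: greedy (.+) drops everything from the last ' - BBC' on.
    return re.sub(r'(.+)\s*-\s*BBC.*?$', r'\1', page_title).strip()

assert strip_bbc_suffix('BUGGER - BBC Blogs') == 'BUGGER'
assert strip_bbc_suffix('No branding here') == 'No branding here'
```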


@ -28,10 +28,10 @@ class BleacherReportIE(InfoExtractor):
'add_ie': ['Ooyala'], 'add_ie': ['Ooyala'],
}, { }, {
'url': 'http://bleacherreport.com/articles/2586817-aussie-golfers-get-fright-of-their-lives-after-being-chased-by-angry-kangaroo', 'url': 'http://bleacherreport.com/articles/2586817-aussie-golfers-get-fright-of-their-lives-after-being-chased-by-angry-kangaroo',
'md5': 'af5f90dc9c7ba1c19d0a3eac806bbf50', 'md5': '6a5cd403418c7b01719248ca97fb0692',
'info_dict': { 'info_dict': {
'id': '2586817', 'id': '2586817',
'ext': 'mp4', 'ext': 'webm',
'title': 'Aussie Golfers Get Fright of Their Lives After Being Chased by Angry Kangaroo', 'title': 'Aussie Golfers Get Fright of Their Lives After Being Chased by Angry Kangaroo',
'timestamp': 1446839961, 'timestamp': 1446839961,
'uploader': 'Sean Fay', 'uploader': 'Sean Fay',
@ -93,10 +93,14 @@ class BleacherReportCMSIE(AMPIE):
'md5': '8c2c12e3af7805152675446c905d159b', 'md5': '8c2c12e3af7805152675446c905d159b',
'info_dict': { 'info_dict': {
'id': '8fd44c2f-3dc5-4821-9118-2c825a98c0e1', 'id': '8fd44c2f-3dc5-4821-9118-2c825a98c0e1',
'ext': 'flv', 'ext': 'mp4',
'title': 'Cena vs. Rollins Would Expose the Heavyweight Division', 'title': 'Cena vs. Rollins Would Expose the Heavyweight Division',
'description': 'md5:984afb4ade2f9c0db35f3267ed88b36e', 'description': 'md5:984afb4ade2f9c0db35f3267ed88b36e',
}, },
'params': {
# m3u8 download
'skip_download': True,
},
}] }]
def _real_extract(self, url): def _real_extract(self, url):


@ -0,0 +1,60 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_parse_qs
from ..utils import ExtractorError
class BokeCCBaseIE(InfoExtractor):
def _extract_bokecc_formats(self, webpage, video_id, format_id=None):
player_params_str = self._html_search_regex(
r'<(?:script|embed)[^>]+src="http://p\.bokecc\.com/player\?([^"]+)',
webpage, 'player params')
player_params = compat_parse_qs(player_params_str)
info_xml = self._download_xml(
'http://p.bokecc.com/servlet/playinfo?uid=%s&vid=%s&m=1' % (
player_params['siteid'][0], player_params['vid'][0]), video_id)
formats = [{
'format_id': format_id,
'url': quality.find('./copy').attrib['playurl'],
'preference': int(quality.attrib['value']),
} for quality in info_xml.findall('./video/quality')]
self._sort_formats(formats)
return formats
class BokeCCIE(BokeCCBaseIE):
_IE_DESC = 'CC视频'
_VALID_URL = r'http://union\.bokecc\.com/playvideo\.bo\?(?P<query>.*)'
_TESTS = [{
'url': 'http://union.bokecc.com/playvideo.bo?vid=E44D40C15E65EA30&uid=CD0C5D3C8614B28B',
'info_dict': {
'id': 'CD0C5D3C8614B28B_E44D40C15E65EA30',
'ext': 'flv',
'title': 'BokeCC Video',
},
}]
def _real_extract(self, url):
qs = compat_parse_qs(re.match(self._VALID_URL, url).group('query'))
if not qs.get('vid') or not qs.get('uid'):
raise ExtractorError('Invalid URL', expected=True)
video_id = '%s_%s' % (qs['uid'][0], qs['vid'][0])
webpage = self._download_webpage(url, video_id)
return {
'id': video_id,
'title': 'BokeCC Video', # no title provided in the webpage
'formats': self._extract_bokecc_formats(webpage, video_id),
}
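`_extract_bokecc_formats` maps `<quality>` nodes from the playinfo XML to formats, using each node's `value` attribute as the preference. A standalone sketch against an invented payload shaped like that response:

```python
import xml.etree.ElementTree as etree

PLAYINFO = '''<response><video>
  <quality value="10"><copy playurl="http://example.invalid/low.flv"/></quality>
  <quality value="20"><copy playurl="http://example.invalid/high.flv"/></quality>
</video></response>'''

info_xml = etree.fromstring(PLAYINFO)
formats = [{
    'url': quality.find('./copy').attrib['playurl'],
    'preference': int(quality.attrib['value']),
} for quality in info_xml.findall('./video/quality')]
assert [f['preference'] for f in formats] == [10, 20]
```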


@ -13,6 +13,7 @@ from ..compat import (
compat_urllib_parse_urlparse, compat_urllib_parse_urlparse,
compat_urlparse, compat_urlparse,
compat_xml_parse_error, compat_xml_parse_error,
compat_HTTPError,
) )
from ..utils import ( from ..utils import (
determine_ext, determine_ext,
@ -355,7 +356,7 @@ class BrightcoveLegacyIE(InfoExtractor):
class BrightcoveNewIE(InfoExtractor): class BrightcoveNewIE(InfoExtractor):
IE_NAME = 'brightcove:new' IE_NAME = 'brightcove:new'
_VALID_URL = r'https?://players\.brightcove\.net/(?P<account_id>\d+)/(?P<player_id>[^/]+)_(?P<embed>[^/]+)/index\.html\?.*videoId=(?P<video_id>(?:ref:)?\d+)' _VALID_URL = r'https?://players\.brightcove\.net/(?P<account_id>\d+)/(?P<player_id>[^/]+)_(?P<embed>[^/]+)/index\.html\?.*videoId=(?P<video_id>\d+|ref:[^&]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://players.brightcove.net/929656772001/e41d32dc-ec74-459e-a845-6c69f7b724ea_default/index.html?videoId=4463358922001', 'url': 'http://players.brightcove.net/929656772001/e41d32dc-ec74-459e-a845-6c69f7b724ea_default/index.html?videoId=4463358922001',
'md5': 'c8100925723840d4b0d243f7025703be', 'md5': 'c8100925723840d4b0d243f7025703be',
@ -391,6 +392,10 @@ class BrightcoveNewIE(InfoExtractor):
# ref: prefixed video id # ref: prefixed video id
'url': 'http://players.brightcove.net/3910869709001/21519b5c-4b3b-4363-accb-bdc8f358f823_default/index.html?videoId=ref:7069442', 'url': 'http://players.brightcove.net/3910869709001/21519b5c-4b3b-4363-accb-bdc8f358f823_default/index.html?videoId=ref:7069442',
'only_matching': True, 'only_matching': True,
}, {
# non numeric ref: prefixed video id
'url': 'http://players.brightcove.net/710858724001/default_default/index.html?videoId=ref:event-stream-356',
'only_matching': True,
}] }]
@staticmethod @staticmethod
@ -424,7 +429,7 @@ class BrightcoveNewIE(InfoExtractor):
</video>.*? </video>.*?
<script[^>]+ <script[^>]+
src=["\'](?:https?:)?//players\.brightcove\.net/ src=["\'](?:https?:)?//players\.brightcove\.net/
(\d+)/([\da-f-]+)_([^/]+)/index\.min\.js (\d+)/([\da-f-]+)_([^/]+)/index(?:\.min)?\.js
''', webpage): ''', webpage):
entries.append( entries.append(
'http://players.brightcove.net/%s/%s_%s/index.html?videoId=%s' 'http://players.brightcove.net/%s/%s_%s/index.html?videoId=%s'
@ -458,15 +463,22 @@ class BrightcoveNewIE(InfoExtractor):
'https://edge.api.brightcove.com/playback/v1/accounts/%s/videos/%s' 'https://edge.api.brightcove.com/playback/v1/accounts/%s/videos/%s'
% (account_id, video_id), % (account_id, video_id),
headers={'Accept': 'application/json;pk=%s' % policy_key}) headers={'Accept': 'application/json;pk=%s' % policy_key})
json_data = self._download_json(req, video_id) try:
json_data = self._download_json(req, video_id)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
json_data = self._parse_json(e.cause.read().decode(), video_id)
raise ExtractorError(json_data[0]['message'], expected=True)
raise
title = json_data['name'] title = json_data['name']
formats = [] formats = []
for source in json_data.get('sources', []): for source in json_data.get('sources', []):
container = source.get('container')
source_type = source.get('type') source_type = source.get('type')
src = source.get('src') src = source.get('src')
if source_type == 'application/x-mpegURL': if source_type == 'application/x-mpegURL' or container == 'M2TS':
if not src: if not src:
continue continue
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
@ -484,7 +496,7 @@ class BrightcoveNewIE(InfoExtractor):
'width': int_or_none(source.get('width')), 'width': int_or_none(source.get('width')),
'height': height, 'height': height,
'filesize': int_or_none(source.get('size')), 'filesize': int_or_none(source.get('size')),
'container': source.get('container'), 'container': container,
'vcodec': source.get('codec'), 'vcodec': source.get('codec'),
'ext': source.get('container').lower(), 'ext': source.get('container').lower(),
} }
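Policy-key requests that fail with HTTP 403 now have their JSON body parsed so the user sees Brightcove's own error message rather than a bare status code. A standalone approximation using Python 3's urllib in place of the compat layer (endpoint invented):

```python
import json
from urllib.error import HTTPError
from urllib.request import urlopen

def fetch_playback(api_url):
    try:
        return json.loads(urlopen(api_url).read().decode())
    except HTTPError as e:
        if e.code == 403:
            # Brightcove's 403 body is a JSON array of error objects.
            errors = json.loads(e.read().decode())
            raise RuntimeError(errors[0]['message'])
        raise
```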


@ -21,6 +21,10 @@ class CinemassacreIE(InfoExtractor):
'title': '“Angry Video Game Nerd: The Movie” Trailer', 'title': '“Angry Video Game Nerd: The Movie” Trailer',
'description': 'md5:fb87405fcb42a331742a0dce2708560b', 'description': 'md5:fb87405fcb42a331742a0dce2708560b',
}, },
'params': {
# m3u8 download
'skip_download': True,
},
}, },
{ {
'url': 'http://cinemassacre.com/2013/10/02/the-mummys-hand-1940', 'url': 'http://cinemassacre.com/2013/10/02/the-mummys-hand-1940',
@ -31,14 +35,18 @@ class CinemassacreIE(InfoExtractor):
'upload_date': '20131002', 'upload_date': '20131002',
'title': 'The Mummys Hand (1940)', 'title': 'The Mummys Hand (1940)',
}, },
'params': {
# m3u8 download
'skip_download': True,
},
}, },
{ {
# Youtube embedded video # Youtube embedded video
'url': 'http://cinemassacre.com/2006/12/07/chronologically-confused-about-bad-movie-and-video-game-sequel-titles/', 'url': 'http://cinemassacre.com/2006/12/07/chronologically-confused-about-bad-movie-and-video-game-sequel-titles/',
'md5': 'df4cf8a1dcedaec79a73d96d83b99023', 'md5': 'ec9838a5520ef5409b3e4e42fcb0a3b9',
'info_dict': { 'info_dict': {
'id': 'OEVzPCY2T-g', 'id': 'OEVzPCY2T-g',
'ext': 'mp4', 'ext': 'webm',
'title': 'AVGN: Chronologically Confused about Bad Movie and Video Game Sequel Titles', 'title': 'AVGN: Chronologically Confused about Bad Movie and Video Game Sequel Titles',
'upload_date': '20061207', 'upload_date': '20061207',
'uploader': 'Cinemassacre', 'uploader': 'Cinemassacre',
@ -49,12 +57,12 @@ class CinemassacreIE(InfoExtractor):
{ {
# Youtube embedded video # Youtube embedded video
'url': 'http://cinemassacre.com/2006/09/01/mckids/', 'url': 'http://cinemassacre.com/2006/09/01/mckids/',
'md5': '6eb30961fa795fedc750eac4881ad2e1', 'md5': '7393c4e0f54602ad110c793eb7a6513a',
'info_dict': { 'info_dict': {
'id': 'FnxsNhuikpo', 'id': 'FnxsNhuikpo',
'ext': 'mp4', 'ext': 'webm',
'upload_date': '20060901', 'upload_date': '20060901',
'uploader': 'Cinemassacre Extras', 'uploader': 'Cinemassacre Extra',
'description': 'md5:de9b751efa9e45fbaafd9c8a1123ed53', 'description': 'md5:de9b751efa9e45fbaafd9c8a1123ed53',
'uploader_id': 'Cinemassacre', 'uploader_id': 'Cinemassacre',
'title': 'AVGN: McKids', 'title': 'AVGN: McKids',
@ -69,7 +77,11 @@ class CinemassacreIE(InfoExtractor):
'description': 'Lets Play Mario Kart 64 !! Mario Kart 64 is a classic go-kart racing game released for the Nintendo 64 (N64). Today James & Mike do 4 player Battle Mode with Kyle and Bootsy!', 'description': 'Lets Play Mario Kart 64 !! Mario Kart 64 is a classic go-kart racing game released for the Nintendo 64 (N64). Today James & Mike do 4 player Battle Mode with Kyle and Bootsy!',
'title': 'Mario Kart 64 (Nintendo 64) James & Mike Mondays', 'title': 'Mario Kart 64 (Nintendo 64) James & Mike Mondays',
'upload_date': '20150525', 'upload_date': '20150525',
} },
'params': {
# m3u8 download
'skip_download': True,
},
} }
] ]


@ -51,9 +51,7 @@ class CNETIE(ThePlatformIE):
uploader = None uploader = None
uploader_id = None uploader_id = None
mpx_account = data['config']['uvpConfig']['default']['mpx_account'] metadata = self.get_metadata('kYEXFC/%s' % list(vdata['files'].values())[0], video_id)
metadata = self.get_metadata('%s/%s' % (mpx_account, list(vdata['files'].values())[0]), video_id)
description = vdata.get('description') or metadata.get('description') description = vdata.get('description') or metadata.get('description')
duration = int_or_none(vdata.get('duration')) or metadata.get('duration') duration = int_or_none(vdata.get('duration')) or metadata.get('duration')
@ -62,7 +60,7 @@ class CNETIE(ThePlatformIE):
for (fkey, vid) in vdata['files'].items(): for (fkey, vid) in vdata['files'].items():
if fkey == 'hls_phone' and 'hls_tablet' in vdata['files']: if fkey == 'hls_phone' and 'hls_tablet' in vdata['files']:
continue continue
release_url = 'http://link.theplatform.com/s/%s/%s?format=SMIL&mbr=true' % (mpx_account, vid) release_url = 'http://link.theplatform.com/s/kYEXFC/%s?format=SMIL&mbr=true' % vid
if fkey == 'hds': if fkey == 'hds':
release_url += '&manifest=f4m' release_url += '&manifest=f4m'
tp_formats, tp_subtitles = self._extract_theplatform_smil(release_url, video_id, 'Downloading %s SMIL data' % fkey) tp_formats, tp_subtitles = self._extract_theplatform_smil(release_url, video_id, 'Downloading %s SMIL data' % fkey)


@ -15,13 +15,14 @@ import math
from ..compat import ( from ..compat import (
compat_cookiejar, compat_cookiejar,
compat_cookies, compat_cookies,
compat_etree_fromstring,
compat_getpass, compat_getpass,
compat_http_client, compat_http_client,
compat_os_name,
compat_str,
compat_urllib_error, compat_urllib_error,
compat_urllib_parse, compat_urllib_parse,
compat_urlparse, compat_urlparse,
compat_str,
compat_etree_fromstring,
) )
from ..utils import ( from ..utils import (
NO_DEFAULT, NO_DEFAULT,
@ -47,6 +48,7 @@ from ..utils import (
determine_protocol, determine_protocol,
parse_duration, parse_duration,
mimetype2ext, mimetype2ext,
update_url_query,
) )
@ -104,7 +106,7 @@ class InfoExtractor(object):
* protocol The protocol that will be used for the actual * protocol The protocol that will be used for the actual
download, lower-case. download, lower-case.
"http", "https", "rtsp", "rtmp", "rtmpe", "http", "https", "rtsp", "rtmp", "rtmpe",
"m3u8", or "m3u8_native". "m3u8", "m3u8_native" or "http_dash_segments".
* preference Order number of this format. If this field is * preference Order number of this format. If this field is
present and not None, the formats get sorted present and not None, the formats get sorted
by this field, regardless of all other values. by this field, regardless of all other values.
@ -157,12 +159,14 @@ class InfoExtractor(object):
thumbnail: Full URL to a video thumbnail image. thumbnail: Full URL to a video thumbnail image.
description: Full video description. description: Full video description.
uploader: Full name of the video uploader. uploader: Full name of the video uploader.
license: License name the video is licensed under.
creator: The main artist who created the video. creator: The main artist who created the video.
release_date: The date (YYYYMMDD) when the video was released. release_date: The date (YYYYMMDD) when the video was released.
timestamp: UNIX timestamp of the moment the video became available. timestamp: UNIX timestamp of the moment the video became available.
upload_date: Video upload date (YYYYMMDD). upload_date: Video upload date (YYYYMMDD).
If not explicitly set, calculated from timestamp. If not explicitly set, calculated from timestamp.
uploader_id: Nickname or id of the video uploader. uploader_id: Nickname or id of the video uploader.
uploader_url: Full URL to a personal webpage of the video uploader.
location: Physical location where the video was filmed. location: Physical location where the video was filmed.
subtitles: The available subtitles as a dictionary in the format subtitles: The available subtitles as a dictionary in the format
{language: subformats}. "subformats" is a list sorted from {language: subformats}. "subformats" is a list sorted from
@ -342,7 +346,7 @@ class InfoExtractor(object):
def IE_NAME(self): def IE_NAME(self):
return compat_str(type(self).__name__[:-2]) return compat_str(type(self).__name__[:-2])
def _request_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True): def _request_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True, data=None, headers=None, query=None):
""" Returns the response handle """ """ Returns the response handle """
if note is None: if note is None:
self.report_download_webpage(video_id) self.report_download_webpage(video_id)
@ -351,6 +355,12 @@ class InfoExtractor(object):
self.to_screen('%s' % (note,)) self.to_screen('%s' % (note,))
else: else:
self.to_screen('%s: %s' % (video_id, note)) self.to_screen('%s: %s' % (video_id, note))
# data, headers and query params will be ignored for `Request` objects
if isinstance(url_or_request, compat_str):
if query:
url_or_request = update_url_query(url_or_request, query)
if data or headers:
url_or_request = sanitized_Request(url_or_request, data, headers or {})
try: try:
return self._downloader.urlopen(url_or_request) return self._downloader.urlopen(url_or_request)
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err: except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
@ -366,13 +376,13 @@ class InfoExtractor(object):
self._downloader.report_warning(errmsg) self._downloader.report_warning(errmsg)
return False return False
def _download_webpage_handle(self, url_or_request, video_id, note=None, errnote=None, fatal=True, encoding=None): def _download_webpage_handle(self, url_or_request, video_id, note=None, errnote=None, fatal=True, encoding=None, data=None, headers=None, query=None):
""" Returns a tuple (page content as string, URL handle) """ """ Returns a tuple (page content as string, URL handle) """
# Strip hashes from the URL (#1038) # Strip hashes from the URL (#1038)
if isinstance(url_or_request, (compat_str, str)): if isinstance(url_or_request, (compat_str, str)):
url_or_request = url_or_request.partition('#')[0] url_or_request = url_or_request.partition('#')[0]
urlh = self._request_webpage(url_or_request, video_id, note, errnote, fatal) urlh = self._request_webpage(url_or_request, video_id, note, errnote, fatal, data=data, headers=headers, query=query)
if urlh is False: if urlh is False:
assert not fatal assert not fatal
return False return False
@ -425,7 +435,7 @@ class InfoExtractor(object):
self.to_screen('Saving request to ' + filename) self.to_screen('Saving request to ' + filename)
# Working around MAX_PATH limitation on Windows (see # Working around MAX_PATH limitation on Windows (see
# http://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx) # http://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx)
if os.name == 'nt': if compat_os_name == 'nt':
absfilepath = os.path.abspath(filename) absfilepath = os.path.abspath(filename)
if len(absfilepath) > 259: if len(absfilepath) > 259:
filename = '\\\\?\\' + absfilepath filename = '\\\\?\\' + absfilepath
@ -459,13 +469,13 @@ class InfoExtractor(object):
return content return content
def _download_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True, tries=1, timeout=5, encoding=None): def _download_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True, tries=1, timeout=5, encoding=None, data=None, headers=None, query=None):
""" Returns the data of the page as a string """ """ Returns the data of the page as a string """
success = False success = False
try_count = 0 try_count = 0
while success is False: while success is False:
try: try:
res = self._download_webpage_handle(url_or_request, video_id, note, errnote, fatal, encoding=encoding) res = self._download_webpage_handle(url_or_request, video_id, note, errnote, fatal, encoding=encoding, data=data, headers=headers, query=query)
success = True success = True
except compat_http_client.IncompleteRead as e: except compat_http_client.IncompleteRead as e:
try_count += 1 try_count += 1
@ -480,10 +490,10 @@ class InfoExtractor(object):
def _download_xml(self, url_or_request, video_id, def _download_xml(self, url_or_request, video_id,
note='Downloading XML', errnote='Unable to download XML', note='Downloading XML', errnote='Unable to download XML',
transform_source=None, fatal=True, encoding=None): transform_source=None, fatal=True, encoding=None, data=None, headers=None, query=None):
"""Return the xml as an xml.etree.ElementTree.Element""" """Return the xml as an xml.etree.ElementTree.Element"""
xml_string = self._download_webpage( xml_string = self._download_webpage(
url_or_request, video_id, note, errnote, fatal=fatal, encoding=encoding) url_or_request, video_id, note, errnote, fatal=fatal, encoding=encoding, data=data, headers=headers, query=query)
if xml_string is False: if xml_string is False:
return xml_string return xml_string
if transform_source: if transform_source:
@ -494,10 +504,10 @@ class InfoExtractor(object):
note='Downloading JSON metadata', note='Downloading JSON metadata',
errnote='Unable to download JSON metadata', errnote='Unable to download JSON metadata',
transform_source=None, transform_source=None,
fatal=True, encoding=None): fatal=True, encoding=None, data=None, headers=None, query=None):
json_string = self._download_webpage( json_string = self._download_webpage(
url_or_request, video_id, note, errnote, fatal=fatal, url_or_request, video_id, note, errnote, fatal=fatal,
encoding=encoding) encoding=encoding, data=data, headers=headers, query=query)
if (not fatal) and json_string is False: if (not fatal) and json_string is False:
return None return None
return self._parse_json( return self._parse_json(
@ -594,7 +604,7 @@ class InfoExtractor(object):
if mobj: if mobj:
break break
if not self._downloader.params.get('no_color') and os.name != 'nt' and sys.stderr.isatty(): if not self._downloader.params.get('no_color') and compat_os_name != 'nt' and sys.stderr.isatty():
_name = '\033[0;34m%s\033[0m' % name _name = '\033[0;34m%s\033[0m' % name
else: else:
_name = name _name = name
@ -963,6 +973,13 @@ class InfoExtractor(object):
if manifest is False: if manifest is False:
return [] return []
return self._parse_f4m_formats(
manifest, manifest_url, video_id, preference=preference, f4m_id=f4m_id,
transform_source=transform_source, fatal=fatal)
def _parse_f4m_formats(self, manifest, manifest_url, video_id, preference=None, f4m_id=None,
transform_source=lambda s: fix_xml_ampersands(s).strip(),
fatal=True):
formats = [] formats = []
manifest_version = '1.0' manifest_version = '1.0'
media_nodes = manifest.findall('{http://ns.adobe.com/f4m/1.0}media') media_nodes = manifest.findall('{http://ns.adobe.com/f4m/1.0}media')
@ -988,7 +1005,8 @@ class InfoExtractor(object):
# bitrate in f4m downloader # bitrate in f4m downloader
if determine_ext(manifest_url) == 'f4m': if determine_ext(manifest_url) == 'f4m':
formats.extend(self._extract_f4m_formats( formats.extend(self._extract_f4m_formats(
manifest_url, video_id, preference, f4m_id, fatal=fatal)) manifest_url, video_id, preference=preference, f4m_id=f4m_id,
transform_source=transform_source, fatal=fatal))
continue continue
tbr = int_or_none(media_el.attrib.get('bitrate')) tbr = int_or_none(media_el.attrib.get('bitrate'))
formats.append({ formats.append({
@ -1033,11 +1051,21 @@ class InfoExtractor(object):
return [] return []
m3u8_doc, urlh = res m3u8_doc, urlh = res
m3u8_url = urlh.geturl() m3u8_url = urlh.geturl()
# A Media Playlist Tag MUST NOT appear in a Master Playlist
# https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3 # We should try extracting formats only from master playlists [1], i.e.
# The EXT-X-TARGETDURATION tag is REQUIRED for every M3U8 Media Playlists # playlists that describe available qualities. On the other hand media
# https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3.1 # playlists [2] should be returned as is since they contain just the media
if '#EXT-X-TARGETDURATION' in m3u8_doc: # without qualities renditions.
# Fortunately, master playlist can be easily distinguished from media
# playlist based on particular tags availability. As of [1, 2] master
# playlist tags MUST NOT appear in a media playlist and vice versa.
# As of [3] #EXT-X-TARGETDURATION tag is REQUIRED for every media playlist
# and MUST NOT appear in master playlist thus we can clearly detect media
# playlist with this criterion.
# 1. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.4
# 2. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3
# 3. https://tools.ietf.org/html/draft-pantos-http-live-streaming-17#section-4.3.3.1
if '#EXT-X-TARGETDURATION' in m3u8_doc: # media playlist, return as is
return [{ return [{
'url': m3u8_url, 'url': m3u8_url,
'format_id': m3u8_id, 'format_id': m3u8_id,
@ -1084,19 +1112,29 @@ class InfoExtractor(object):
'protocol': entry_protocol, 'protocol': entry_protocol,
'preference': preference, 'preference': preference,
} }
codecs = last_info.get('CODECS')
if codecs:
# TODO: looks like video codec is not always necessarily goes first
va_codecs = codecs.split(',')
if va_codecs[0]:
f['vcodec'] = va_codecs[0]
if len(va_codecs) > 1 and va_codecs[1]:
f['acodec'] = va_codecs[1]
resolution = last_info.get('RESOLUTION') resolution = last_info.get('RESOLUTION')
if resolution: if resolution:
width_str, height_str = resolution.split('x') width_str, height_str = resolution.split('x')
f['width'] = int(width_str) f['width'] = int(width_str)
f['height'] = int(height_str) f['height'] = int(height_str)
codecs = last_info.get('CODECS')
if codecs:
vcodec, acodec = [None] * 2
va_codecs = codecs.split(',')
if len(va_codecs) == 1:
# Audio only entries usually come with single codec and
# no resolution. For more robustness we also check it to
# be mp4 audio.
if not resolution and va_codecs[0].startswith('mp4a'):
vcodec, acodec = 'none', va_codecs[0]
else:
vcodec = va_codecs[0]
else:
vcodec, acodec = va_codecs[:2]
f.update({
'acodec': acodec,
'vcodec': vcodec,
})
if last_media is not None: if last_media is not None:
f['m3u8_media'] = last_media f['m3u8_media'] = last_media
last_media = None last_media = None
@ -1117,8 +1155,8 @@ class InfoExtractor(object):
out.append('{%s}%s' % (namespace, c)) out.append('{%s}%s' % (namespace, c))
return '/'.join(out) return '/'.join(out)
def _extract_smil_formats(self, smil_url, video_id, fatal=True, f4m_params=None): def _extract_smil_formats(self, smil_url, video_id, fatal=True, f4m_params=None, transform_source=None):
smil = self._download_smil(smil_url, video_id, fatal=fatal) smil = self._download_smil(smil_url, video_id, fatal=fatal, transform_source=transform_source)
if smil is False: if smil is False:
assert not fatal assert not fatal
@ -1135,10 +1173,10 @@ class InfoExtractor(object):
return {} return {}
return self._parse_smil(smil, smil_url, video_id, f4m_params=f4m_params) return self._parse_smil(smil, smil_url, video_id, f4m_params=f4m_params)
def _download_smil(self, smil_url, video_id, fatal=True): def _download_smil(self, smil_url, video_id, fatal=True, transform_source=None):
return self._download_xml( return self._download_xml(
smil_url, video_id, 'Downloading SMIL file', smil_url, video_id, 'Downloading SMIL file',
'Unable to download SMIL file', fatal=fatal) 'Unable to download SMIL file', fatal=fatal, transform_source=transform_source)
def _parse_smil(self, smil, smil_url, video_id, f4m_params=None): def _parse_smil(self, smil, smil_url, video_id, f4m_params=None):
namespace = self._parse_smil_namespace(smil) namespace = self._parse_smil_namespace(smil)
@ -1424,8 +1462,9 @@ class InfoExtractor(object):
continue continue
representation_attrib = adaptation_set.attrib.copy() representation_attrib = adaptation_set.attrib.copy()
representation_attrib.update(representation.attrib) representation_attrib.update(representation.attrib)
mime_type = representation_attrib.get('mimeType') # According to page 41 of ISO/IEC 23009-1:2014, @mimeType is mandatory
content_type = mime_type.split('/')[0] if mime_type else representation_attrib.get('contentType') mime_type = representation_attrib['mimeType']
content_type = mime_type.split('/')[0]
if content_type == 'text': if content_type == 'text':
# TODO implement WebVTT downloading # TODO implement WebVTT downloading
pass pass
@ -1448,6 +1487,7 @@ class InfoExtractor(object):
f = { f = {
'format_id': '%s-%s' % (mpd_id, representation_id) if mpd_id else representation_id, 'format_id': '%s-%s' % (mpd_id, representation_id) if mpd_id else representation_id,
'url': base_url, 'url': base_url,
'ext': mimetype2ext(mime_type),
'width': int_or_none(representation_attrib.get('width')), 'width': int_or_none(representation_attrib.get('width')),
'height': int_or_none(representation_attrib.get('height')), 'height': int_or_none(representation_attrib.get('height')),
'tbr': int_or_none(representation_attrib.get('bandwidth'), 1000), 'tbr': int_or_none(representation_attrib.get('bandwidth'), 1000),
@ -1600,6 +1640,15 @@ class InfoExtractor(object):
def _get_automatic_captions(self, *args, **kwargs): def _get_automatic_captions(self, *args, **kwargs):
raise NotImplementedError('This method must be implemented by subclasses') raise NotImplementedError('This method must be implemented by subclasses')
def mark_watched(self, *args, **kwargs):
if (self._downloader.params.get('mark_watched', False) and
(self._get_login_info()[0] is not None or
self._downloader.params.get('cookiefile') is not None)):
self._mark_watched(*args, **kwargs)
def _mark_watched(self, *args, **kwargs):
raise NotImplementedError('This method must be implemented by subclasses')
class SearchInfoExtractor(InfoExtractor): class SearchInfoExtractor(InfoExtractor):
""" """


@ -18,7 +18,7 @@ class DouyuTVIE(InfoExtractor):
'display_id': 'iseven', 'display_id': 'iseven',
'ext': 'flv', 'ext': 'flv',
'title': 're:^清晨醒脑T-ara根本停不下来 [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$', 'title': 're:^清晨醒脑T-ara根本停不下来 [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'description': 'md5:c93d6692dde6fe33809a46edcbecca44', 'description': 'md5:f34981259a03e980a3c6404190a3ed61',
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
'uploader': '7师傅', 'uploader': '7师傅',
'uploader_id': '431925', 'uploader_id': '431925',
@ -26,7 +26,7 @@ class DouyuTVIE(InfoExtractor):
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
} },
}, { }, {
'url': 'http://www.douyutv.com/85982', 'url': 'http://www.douyutv.com/85982',
'info_dict': { 'info_dict': {
@ -42,7 +42,24 @@ class DouyuTVIE(InfoExtractor):
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
} },
'skip': 'Room not found',
}, {
'url': 'http://www.douyutv.com/17732',
'info_dict': {
'id': '17732',
'display_id': '17732',
'ext': 'flv',
'title': 're:^清晨醒脑T-ara根本停不下来 [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
'description': 'md5:f34981259a03e980a3c6404190a3ed61',
'thumbnail': 're:^https?://.*\.jpg$',
'uploader': '7师傅',
'uploader_id': '431925',
'is_live': True,
},
'params': {
'skip_download': True,
},
}] }]
def _real_extract(self, url): def _real_extract(self, url):


@ -1,6 +1,8 @@
# encoding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import json
import re
import time import time
from .common import InfoExtractor from .common import InfoExtractor
@ -8,44 +10,125 @@ from ..utils import int_or_none
class DPlayIE(InfoExtractor): class DPlayIE(InfoExtractor):
_VALID_URL = r'http://www\.dplay\.se/[^/]+/(?P<id>[^/?#]+)' _VALID_URL = r'http://(?P<domain>it\.dplay\.com|www\.dplay\.(?:dk|se|no))/[^/]+/(?P<id>[^/?#]+)'
_TEST = { _TESTS = [{
'url': 'http://it.dplay.com/take-me-out/stagione-1-episodio-25/',
'info_dict': {
'id': '1255600',
'display_id': 'stagione-1-episodio-25',
'ext': 'mp4',
'title': 'Episodio 25',
'description': 'md5:cae5f40ad988811b197d2d27a53227eb',
'duration': 2761,
'timestamp': 1454701800,
'upload_date': '20160205',
'creator': 'RTIT',
'series': 'Take me out',
'season_number': 1,
'episode_number': 25,
'age_limit': 0,
},
'expected_warnings': ['Unable to download f4m manifest'],
}, {
'url': 'http://www.dplay.se/nugammalt-77-handelser-som-format-sverige/season-1-svensken-lar-sig-njuta-av-livet/', 'url': 'http://www.dplay.se/nugammalt-77-handelser-som-format-sverige/season-1-svensken-lar-sig-njuta-av-livet/',
'info_dict': { 'info_dict': {
'id': '3172', 'id': '3172',
'ext': 'mp4',
'display_id': 'season-1-svensken-lar-sig-njuta-av-livet', 'display_id': 'season-1-svensken-lar-sig-njuta-av-livet',
'ext': 'flv',
'title': 'Svensken lär sig njuta av livet', 'title': 'Svensken lär sig njuta av livet',
'description': 'md5:d3819c9bccffd0fe458ca42451dd50d8',
'duration': 2650, 'duration': 2650,
'timestamp': 1365454320,
'upload_date': '20130408',
'creator': 'Kanal 5 (Home)',
'series': 'Nugammalt - 77 händelser som format Sverige',
'season_number': 1,
'episode_number': 1,
'age_limit': 0,
}, },
} }, {
'url': 'http://www.dplay.dk/mig-og-min-mor/season-6-episode-12/',
'info_dict': {
'id': '70816',
'display_id': 'season-6-episode-12',
'ext': 'flv',
'title': 'Episode 12',
'description': 'md5:9c86e51a93f8a4401fc9641ef9894c90',
'duration': 2563,
'timestamp': 1429696800,
'upload_date': '20150422',
'creator': 'Kanal 4',
'series': 'Mig og min mor',
'season_number': 6,
'episode_number': 12,
'age_limit': 0,
},
}, {
'url': 'http://www.dplay.no/pga-tour/season-1-hoydepunkter-18-21-februar/',
'only_matching': True,
}]
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('id')
domain = mobj.group('domain')
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
video_id = self._search_regex( video_id = self._search_regex(
r'data-video-id="(\d+)"', webpage, 'video id') r'data-video-id=["\'](\d+)', webpage, 'video id')
info = self._download_json( info = self._download_json(
'http://www.dplay.se/api/v2/ajax/videos?video_id=' + video_id, 'http://%s/api/v2/ajax/videos?video_id=%s' % (domain, video_id),
video_id)['data'][0] video_id)['data'][0]
self._set_cookie( title = info['title']
'secure.dplay.se', 'dsc-geo',
'{"countryCode":"NL","expiry":%d}' % ((time.time() + 20 * 60) * 1000)) PROTOCOLS = ('hls', 'hds')
# TODO: consider adding support for 'stream_type=hds', it seems to formats = []
# require setting some cookies
manifest_url = self._download_json( def extract_formats(protocol, manifest_url):
'https://secure.dplay.se/secure/api/v2/user/authorization/stream/%s?stream_type=hls' % video_id, if protocol == 'hls':
video_id, 'Getting manifest url for hls stream')['hls'] formats.extend(self._extract_m3u8_formats(
formats = self._extract_m3u8_formats( manifest_url, video_id, ext='mp4',
manifest_url, video_id, ext='mp4', entry_protocol='m3u8_native') entry_protocol='m3u8_native', m3u8_id=protocol, fatal=False))
elif protocol == 'hds':
formats.extend(self._extract_f4m_formats(
manifest_url + '&hdcore=3.8.0&plugin=flowplayer-3.8.0.0',
video_id, f4m_id=protocol, fatal=False))
domain_tld = domain.split('.')[-1]
if domain_tld in ('se', 'dk'):
for protocol in PROTOCOLS:
self._set_cookie(
'secure.dplay.%s' % domain_tld, 'dsc-geo',
json.dumps({
'countryCode': domain_tld.upper(),
'expiry': (time.time() + 20 * 60) * 1000,
}))
stream = self._download_json(
'https://secure.dplay.%s/secure/api/v2/user/authorization/stream/%s?stream_type=%s'
% (domain_tld, video_id, protocol), video_id,
'Downloading %s stream JSON' % protocol, fatal=False)
if stream and stream.get(protocol):
extract_formats(protocol, stream[protocol])
else:
for protocol in PROTOCOLS:
if info.get(protocol):
extract_formats(protocol, info[protocol])
return { return {
'id': video_id, 'id': video_id,
'display_id': display_id, 'display_id': display_id,
'title': info['title'], 'title': title,
'formats': formats, 'description': info.get('video_metadata_longDescription'),
'duration': int_or_none(info.get('video_metadata_length'), scale=1000), 'duration': int_or_none(info.get('video_metadata_length'), scale=1000),
'timestamp': int_or_none(info.get('video_publish_date')),
'creator': info.get('video_metadata_homeChannel'),
'series': info.get('video_metadata_show'),
'season_number': int_or_none(info.get('season')),
'episode_number': int_or_none(info.get('episode')),
'age_limit': int_or_none(info.get('minimum_age')),
'formats': formats,
} }
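For the .se and .dk sites the extractor forges a `dsc-geo` cookie before requesting the secure stream JSON; the payload carries the domain TLD as country code and an expiry roughly twenty minutes out, expressed in milliseconds. The payload construction in isolation:

```python
import json
import time

def dsc_geo_cookie(domain_tld):
    return json.dumps({
        'countryCode': domain_tld.upper(),
        'expiry': (time.time() + 20 * 60) * 1000,
    })

print(dsc_geo_cookie('se'))  # e.g. {"countryCode": "SE", "expiry": 1457...}
```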


@ -0,0 +1,85 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import int_or_none
from ..compat import compat_urlparse
class DWIE(InfoExtractor):
IE_NAME = 'dw'
_VALID_URL = r'https?://(?:www\.)?dw\.com/(?:[^/]+/)+av-(?P<id>\d+)'
_TESTS = [{
# video
'url': 'http://www.dw.com/en/intelligent-light/av-19112290',
'md5': '7372046e1815c5a534b43f3c3c36e6e9',
'info_dict': {
'id': '19112290',
'ext': 'mp4',
'title': 'Intelligent light',
'description': 'md5:90e00d5881719f2a6a5827cb74985af1',
'upload_date': '20160311',
}
}, {
# audio
'url': 'http://www.dw.com/en/worldlink-my-business/av-19111941',
'md5': '2814c9a1321c3a51f8a7aeb067a360dd',
'info_dict': {
'id': '19111941',
'ext': 'mp3',
'title': 'WorldLink: My business',
'description': 'md5:bc9ca6e4e063361e21c920c53af12405',
'upload_date': '20160311',
}
}]
def _real_extract(self, url):
media_id = self._match_id(url)
webpage = self._download_webpage(url, media_id)
hidden_inputs = self._hidden_inputs(webpage)
title = hidden_inputs['media_title']
formats = []
if hidden_inputs.get('player_type') == 'video' and hidden_inputs.get('stream_file') == '1':
formats = self._extract_smil_formats(
'http://www.dw.com/smil/v-%s' % media_id, media_id,
transform_source=lambda s: s.replace(
'rtmp://tv-od.dw.de/flash/',
'http://tv-download.dw.de/dwtv_video/flv/'))
else:
formats = [{'url': hidden_inputs['file_name']}]
return {
'id': media_id,
'title': title,
'description': self._og_search_description(webpage),
'thumbnail': hidden_inputs.get('preview_image'),
'duration': int_or_none(hidden_inputs.get('file_duration')),
'upload_date': hidden_inputs.get('display_date'),
'formats': formats,
}
class DWArticleIE(InfoExtractor):
IE_NAME = 'dw:article'
_VALID_URL = r'https?://(?:www\.)?dw\.com/(?:[^/]+/)+a-(?P<id>\d+)'
_TEST = {
'url': 'http://www.dw.com/en/no-hope-limited-options-for-refugees-in-idomeni/a-19111009',
'md5': '8ca657f9d068bbef74d6fc38b97fc869',
'info_dict': {
'id': '19105868',
'ext': 'mp4',
'title': 'The harsh life of refugees in Idomeni',
'description': 'md5:196015cc7e48ebf474db9399420043c7',
'upload_date': '20160310',
}
}
def _real_extract(self, url):
article_id = self._match_id(url)
webpage = self._download_webpage(url, article_id)
hidden_inputs = self._hidden_inputs(webpage)
media_id = hidden_inputs['media_id']
media_path = self._search_regex(r'href="([^"]+av-%s)"\s+class="overlayLink"' % media_id, webpage, 'media url')
media_url = compat_urlparse.urljoin(url, media_path)
return self.url_result(media_url, 'DW', media_id)
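The new `transform_source` hook lets DW rewrite the raw SMIL text before XML parsing, turning RTMP stream URLs into direct HTTP download URLs. The rewrite in isolation (SMIL fragment invented):

```python
def rewrite_rtmp(smil_text):
    return smil_text.replace(
        'rtmp://tv-od.dw.de/flash/',
        'http://tv-download.dw.de/dwtv_video/flv/')

smil = '<video src="rtmp://tv-od.dw.de/flash/clip.mp4"/>'
assert 'tv-download.dw.de/dwtv_video/flv/clip.mp4' in rewrite_rtmp(smil)
```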


@ -9,7 +9,7 @@ class ElPaisIE(InfoExtractor):
_VALID_URL = r'https?://(?:[^.]+\.)?elpais\.com/.*/(?P<id>[^/#?]+)\.html(?:$|[?#])' _VALID_URL = r'https?://(?:[^.]+\.)?elpais\.com/.*/(?P<id>[^/#?]+)\.html(?:$|[?#])'
IE_DESC = 'El País' IE_DESC = 'El País'
_TEST = { _TESTS = [{
'url': 'http://blogs.elpais.com/la-voz-de-inaki/2014/02/tiempo-nuevo-recetas-viejas.html', 'url': 'http://blogs.elpais.com/la-voz-de-inaki/2014/02/tiempo-nuevo-recetas-viejas.html',
'md5': '98406f301f19562170ec071b83433d55', 'md5': '98406f301f19562170ec071b83433d55',
'info_dict': { 'info_dict': {
@ -19,30 +19,41 @@ class ElPaisIE(InfoExtractor):
'description': 'De lunes a viernes, a partir de las ocho de la mañana, Iñaki Gabilondo nos cuenta su visión de la actualidad nacional e internacional.', 'description': 'De lunes a viernes, a partir de las ocho de la mañana, Iñaki Gabilondo nos cuenta su visión de la actualidad nacional e internacional.',
'upload_date': '20140206', 'upload_date': '20140206',
} }
} }, {
'url': 'http://elcomidista.elpais.com/elcomidista/2016/02/24/articulo/1456340311_668921.html#?id_externo_nwl=newsletter_diaria20160303t',
'md5': '3bd5b09509f3519d7d9e763179b013de',
'info_dict': {
'id': '1456340311_668921',
'ext': 'mp4',
'title': 'Cómo hacer el mejor café con cafetera italiana',
'description': 'Que sí, que las cápsulas son cómodas. Pero si le pides algo más a la vida, quizá deberías aprender a usar bien la cafetera italiana. No tienes más que ver este vídeo y seguir sus siete normas básicas.',
'upload_date': '20160303',
}
}]
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
prefix = self._html_search_regex( prefix = self._html_search_regex(
r'var url_cache = "([^"]+)";', webpage, 'URL prefix') r'var\s+url_cache\s*=\s*"([^"]+)";', webpage, 'URL prefix')
video_suffix = self._search_regex( video_suffix = self._search_regex(
r"URLMediaFile = url_cache \+ '([^']+)'", webpage, 'video URL') r"(?:URLMediaFile|urlVideo_\d+)\s*=\s*url_cache\s*\+\s*'([^']+)'", webpage, 'video URL')
video_url = prefix + video_suffix video_url = prefix + video_suffix
thumbnail_suffix = self._search_regex( thumbnail_suffix = self._search_regex(
r"URLMediaStill = url_cache \+ '([^']+)'", webpage, 'thumbnail URL', r"(?:URLMediaStill|urlFotogramaFijo_\d+)\s*=\s*url_cache\s*\+\s*'([^']+)'",
fatal=False) webpage, 'thumbnail URL', fatal=False)
thumbnail = ( thumbnail = (
None if thumbnail_suffix is None None if thumbnail_suffix is None
else prefix + thumbnail_suffix) else prefix + thumbnail_suffix)
title = self._html_search_regex( title = self._html_search_regex(
'<h2 class="entry-header entry-title.*?>(.*?)</h2>', (r"tituloVideo\s*=\s*'([^']+)'", webpage, 'title',
r'<h2 class="entry-header entry-title.*?>(.*?)</h2>'),
webpage, 'title') webpage, 'title')
date_str = self._search_regex( upload_date = unified_strdate(self._search_regex(
r'<p class="date-header date-int updated"\s+title="([^"]+)">', r'<p class="date-header date-int updated"\s+title="([^"]+)">',
webpage, 'upload date', fatal=False) webpage, 'upload date', default=None) or self._html_search_meta(
upload_date = (None if date_str is None else unified_strdate(date_str)) 'datePublished', webpage, 'timestamp'))
return { return {
'id': video_id, 'id': video_id,


@ -1,21 +1,13 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import (
url_basename,
)
class EngadgetIE(InfoExtractor): class EngadgetIE(InfoExtractor):
_VALID_URL = r'''(?x)https?://www.engadget.com/ _VALID_URL = r'https?://www.engadget.com/video/(?P<id>\d+)'
(?:video(?:/5min)?/(?P<id>\d+)|
[\d/]+/.*?)
'''
_TEST = { _TEST = {
'url': 'http://www.engadget.com/video/5min/518153925/', 'url': 'http://www.engadget.com/video/518153925/',
'md5': 'c6820d4828a5064447a4d9fc73f312c9', 'md5': 'c6820d4828a5064447a4d9fc73f312c9',
'info_dict': { 'info_dict': {
'id': '518153925', 'id': '518153925',
@ -27,15 +19,4 @@ class EngadgetIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
return self.url_result('5min:%s' % video_id)
if video_id is not None:
return self.url_result('5min:%s' % video_id)
else:
title = url_basename(url)
webpage = self._download_webpage(url, title)
ids = re.findall(r'<iframe[^>]+?playList=(\d+)', webpage)
return {
'_type': 'playlist',
'title': title,
'entries': [self.url_result('5min:%s' % vid) for vid in ids]
}


@ -37,7 +37,9 @@ class FacebookIE(InfoExtractor):
video/embed| video/embed|
story\.php story\.php
)\?(?:.*?)(?:v|video_id|story_fbid)=| )\?(?:.*?)(?:v|video_id|story_fbid)=|
[^/]+/videos/(?:[^/]+/)? [^/]+/videos/(?:[^/]+/)?|
[^/]+/posts/|
groups/[^/]+/permalink/
)| )|
facebook: facebook:
) )
@ -50,6 +52,8 @@ class FacebookIE(InfoExtractor):
_CHROME_USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.97 Safari/537.36' _CHROME_USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.97 Safari/537.36'
_VIDEO_PAGE_TEMPLATE = 'https://www.facebook.com/video/video.php?v=%s'
_TESTS = [{ _TESTS = [{
'url': 'https://www.facebook.com/video.php?v=637842556329505&fref=nf', 'url': 'https://www.facebook.com/video.php?v=637842556329505&fref=nf',
'md5': '6a40d33c0eccbb1af76cf0485a052659', 'md5': '6a40d33c0eccbb1af76cf0485a052659',
@ -81,6 +85,33 @@ class FacebookIE(InfoExtractor):
'title': 'When you post epic content on instagram.com/433 8 million followers, this is ...', 'title': 'When you post epic content on instagram.com/433 8 million followers, this is ...',
'uploader': 'Demy de Zeeuw', 'uploader': 'Demy de Zeeuw',
}, },
}, {
'url': 'https://www.facebook.com/maxlayn/posts/10153807558977570',
'md5': '037b1fa7f3c2d02b7a0d7bc16031ecc6',
'info_dict': {
'id': '544765982287235',
'ext': 'mp4',
'title': '"What are you doing running in the snow?"',
'uploader': 'FailArmy',
}
}, {
'url': 'https://m.facebook.com/story.php?story_fbid=1035862816472149&id=116132035111903',
'md5': '1deb90b6ac27f7efcf6d747c8a27f5e3',
'info_dict': {
'id': '1035862816472149',
'ext': 'mp4',
'title': 'What the Flock Is Going On In New Zealand Credit: ViralHog',
'uploader': 'S. Saint',
},
}, {
'note': 'swf params escaped',
'url': 'https://www.facebook.com/barackobama/posts/10153664894881749',
'md5': '97ba073838964d12c70566e0085c2b91',
'info_dict': {
'id': '10153664894881749',
'ext': 'mp4',
'title': 'Facebook video #10153664894881749',
},
}, { }, {
'url': 'https://www.facebook.com/video.php?v=10204634152394104', 'url': 'https://www.facebook.com/video.php?v=10204634152394104',
'only_matching': True, 'only_matching': True,
@ -94,7 +125,7 @@ class FacebookIE(InfoExtractor):
'url': 'facebook:544765982287235', 'url': 'facebook:544765982287235',
'only_matching': True, 'only_matching': True,
}, { }, {
'url': 'https://m.facebook.com/story.php?story_fbid=1035862816472149&id=116132035111903', 'url': 'https://www.facebook.com/groups/164828000315060/permalink/764967300301124/',
'only_matching': True, 'only_matching': True,
}] }]
@ -164,19 +195,19 @@ class FacebookIE(InfoExtractor):
def _real_initialize(self): def _real_initialize(self):
self._login() self._login()
def _real_extract(self, url): def _extract_from_url(self, url, video_id, fatal_if_no_video=True):
video_id = self._match_id(url) req = sanitized_Request(url)
req = sanitized_Request('https://www.facebook.com/video/video.php?v=%s' % video_id)
req.add_header('User-Agent', self._CHROME_USER_AGENT) req.add_header('User-Agent', self._CHROME_USER_AGENT)
webpage = self._download_webpage(req, video_id) webpage = self._download_webpage(req, video_id)
video_data = None video_data = None
BEFORE = '{swf.addParam(param[0], param[1]);});\n' BEFORE = '{swf.addParam(param[0], param[1]);});'
AFTER = '.forEach(function(variable) {swf.addVariable(variable[0], variable[1]);});' AFTER = '.forEach(function(variable) {swf.addVariable(variable[0], variable[1]);});'
m = re.search(re.escape(BEFORE) + '(.*?)' + re.escape(AFTER), webpage) m = re.search(re.escape(BEFORE) + '(?:\n|\\\\n)(.*?)' + re.escape(AFTER), webpage)
if m: if m:
data = dict(json.loads(m.group(1))) swf_params = m.group(1).replace('\\\\', '\\').replace('\\"', '"')
data = dict(json.loads(swf_params))
params_raw = compat_urllib_parse_unquote(data['params']) params_raw = compat_urllib_parse_unquote(data['params'])
video_data = json.loads(params_raw)['video_data'] video_data = json.loads(params_raw)['video_data']
@ -189,13 +220,15 @@ class FacebookIE(InfoExtractor):
if not video_data: if not video_data:
server_js_data = self._parse_json(self._search_regex( server_js_data = self._parse_json(self._search_regex(
r'handleServerJS\(({.+})\);', webpage, 'server js data'), video_id) r'handleServerJS\(({.+})\);', webpage, 'server js data', default='{}'), video_id)
for item in server_js_data.get('instances', []): for item in server_js_data.get('instances', []):
if item[1][0] == 'VideoConfig': if item[1][0] == 'VideoConfig':
video_data = video_data_list2dict(item[2][0]['videoData']) video_data = video_data_list2dict(item[2][0]['videoData'])
break break
if not video_data: if not video_data:
if not fatal_if_no_video:
return webpage, False
m_msg = re.search(r'class="[^"]*uiInterstitialContent[^"]*"><div>(.*?)</div>', webpage) m_msg = re.search(r'class="[^"]*uiInterstitialContent[^"]*"><div>(.*?)</div>', webpage)
if m_msg is not None: if m_msg is not None:
raise ExtractorError( raise ExtractorError(
@ -241,39 +274,36 @@ class FacebookIE(InfoExtractor):
video_title = 'Facebook video #%s' % video_id video_title = 'Facebook video #%s' % video_id
uploader = clean_html(get_element_by_id('fbPhotoPageAuthorName', webpage)) uploader = clean_html(get_element_by_id('fbPhotoPageAuthorName', webpage))
return { info_dict = {
'id': video_id, 'id': video_id,
'title': video_title, 'title': video_title,
'formats': formats, 'formats': formats,
'uploader': uploader, 'uploader': uploader,
} }
return webpage, info_dict
class FacebookPostIE(InfoExtractor):
IE_NAME = 'facebook:post'
_VALID_URL = r'https?://(?:\w+\.)?facebook\.com/[^/]+/posts/(?P<id>\d+)'
_TEST = {
'url': 'https://www.facebook.com/maxlayn/posts/10153807558977570',
'md5': '037b1fa7f3c2d02b7a0d7bc16031ecc6',
'info_dict': {
'id': '544765982287235',
'ext': 'mp4',
'title': '"What are you doing running in the snow?"',
'uploader': 'FailArmy',
}
}
def _real_extract(self, url): def _real_extract(self, url):
post_id = self._match_id(url) video_id = self._match_id(url)
webpage = self._download_webpage(url, post_id) real_url = self._VIDEO_PAGE_TEMPLATE % video_id if url.startswith('facebook:') else url
webpage, info_dict = self._extract_from_url(real_url, video_id, fatal_if_no_video=False)
entries = [ if info_dict:
self.url_result('facebook:%s' % video_id, FacebookIE.ie_key()) return info_dict
for video_id in self._parse_json(
self._search_regex(
r'(["\'])video_ids\1\s*:\s*(?P<ids>\[.+?\])',
webpage, 'video ids', group='ids'),
post_id)]
return self.playlist_result(entries, post_id) if '/posts/' in url:
entries = [
self.url_result('facebook:%s' % vid, FacebookIE.ie_key())
for vid in self._parse_json(
self._search_regex(
r'(["\'])video_ids\1\s*:\s*(?P<ids>\[.+?\])',
webpage, 'video ids', group='ids'),
video_id)]
return self.playlist_result(entries, video_id)
else:
_, info_dict = self._extract_from_url(
self._VIDEO_PAGE_TEMPLATE % video_id,
video_id, fatal_if_no_video=True)
return info_dict
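For `/posts/` pages that carry several clips, the ids are pulled from the inline `video_ids` array and re-dispatched through the `facebook:` scheme. A sketch of that harvesting on a made-up page fragment:

```python
import json
import re

# Made-up fragment of a post page; real pages embed this inside a script tag
webpage = '..."video_ids":[1035862816472149,1035862823138815]...'
ids = json.loads(re.search(
    r'(["\'])video_ids\1\s*:\s*(?P<ids>\[.+?\])', webpage).group('ids'))
entries = ['facebook:%s' % vid for vid in ids]  # each resolved by FacebookIE
```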
View File
@ -1,5 +1,7 @@
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_urllib_parse, compat_urllib_parse,
@ -16,12 +18,7 @@ from ..utils import (
class FiveMinIE(InfoExtractor): class FiveMinIE(InfoExtractor):
IE_NAME = '5min' IE_NAME = '5min'
_VALID_URL = r'''(?x) _VALID_URL = r'(?:5min:(?P<id>\d+)(?::(?P<sid>\d+))?|https?://[^/]*?5min\.com/Scripts/PlayerSeed\.js\?(?P<query>.*))'
(?:https?://[^/]*?5min\.com/Scripts/PlayerSeed\.js\?(?:.*?&)?playList=|
https?://(?:(?:massively|www)\.)?joystiq\.com/video/|
5min:)
(?P<id>\d+)
'''
_TESTS = [ _TESTS = [
{ {
@ -45,6 +42,7 @@ class FiveMinIE(InfoExtractor):
'title': 'How to Make a Next-Level Fruit Salad', 'title': 'How to Make a Next-Level Fruit Salad',
'duration': 184, 'duration': 184,
}, },
'skip': 'no longer available',
}, },
] ]
_ERRORS = { _ERRORS = {
@ -91,20 +89,33 @@ class FiveMinIE(InfoExtractor):
} }
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
sid = mobj.group('sid')
if mobj.group('query'):
qs = compat_parse_qs(mobj.group('query'))
if not qs.get('playList'):
raise ExtractorError('Invalid URL', expected=True)
video_id = qs['playList'][0]
if qs.get('sid'):
sid = qs['sid'][0]
embed_url = 'https://embed.5min.com/playerseed/?playList=%s' % video_id embed_url = 'https://embed.5min.com/playerseed/?playList=%s' % video_id
embed_page = self._download_webpage(embed_url, video_id, if not sid:
'Downloading embed page') embed_page = self._download_webpage(embed_url, video_id,
sid = self._search_regex(r'sid=(\d+)', embed_page, 'sid') 'Downloading embed page')
query = compat_urllib_parse.urlencode({ sid = self._search_regex(r'sid=(\d+)', embed_page, 'sid')
'func': 'GetResults',
'playlist': video_id,
'sid': sid,
'isPlayerSeed': 'true',
'url': embed_url,
})
response = self._download_json( response = self._download_json(
'https://syn.5min.com/handlers/SenseHandler.ashx?' + query, 'https://syn.5min.com/handlers/SenseHandler.ashx?' +
compat_urllib_parse.urlencode({
'func': 'GetResults',
'playlist': video_id,
'sid': sid,
'isPlayerSeed': 'true',
'url': embed_url,
}),
video_id) video_id)
if not response['success']: if not response['success']:
raise ExtractorError( raise ExtractorError(
@ -118,9 +129,7 @@ class FiveMinIE(InfoExtractor):
parsed_video_url = compat_urllib_parse_urlparse(compat_parse_qs( parsed_video_url = compat_urllib_parse_urlparse(compat_parse_qs(
compat_urllib_parse_urlparse(info['EmbededURL']).query)['videoUrl'][0]) compat_urllib_parse_urlparse(info['EmbededURL']).query)['videoUrl'][0])
for rendition in info['Renditions']: for rendition in info['Renditions']:
if rendition['RenditionType'] == 'm3u8': if rendition['RenditionType'] == 'aac' or rendition['RenditionType'] == 'm3u8':
formats.extend(self._extract_m3u8_formats(rendition['Url'], video_id, m3u8_id='hls'))
elif rendition['RenditionType'] == 'aac':
continue continue
else: else:
rendition_url = compat_urlparse.urlunparse(parsed_video_url._replace(path=replace_extension(parsed_video_url.path.replace('//', '/%s/' % rendition['ID']), rendition['RenditionType']))) rendition_url = compat_urlparse.urlunparse(parsed_video_url._replace(path=replace_extension(parsed_video_url.path.replace('//', '/%s/' % rendition['ID']), rendition['RenditionType'])))
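The flattened `_VALID_URL` defers all PlayerSeed parsing to `compat_parse_qs`, so `playList` no longer has to be the first query parameter, and an inline `sid` skips the extra embed-page request. Roughly, on a hypothetical query string:

```python
try:
    from urllib.parse import parse_qs  # stand-in for compat_parse_qs
except ImportError:  # Python 2
    from urlparse import parse_qs

qs = parse_qs('sid=577&playList=518013791')  # hypothetical PlayerSeed.js query
video_id = qs['playList'][0]    # '518013791', regardless of parameter order
sid = qs.get('sid', [None])[0]  # '577' -> no embed-page download needed
```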
View File
@ -36,6 +36,10 @@ class FoxNewsIE(AMPIE):
# 'upload_date': '20141204', # 'upload_date': '20141204',
'thumbnail': 're:^https?://.*\.jpg$', 'thumbnail': 're:^https?://.*\.jpg$',
}, },
'params': {
# m3u8 download
'skip_download': True,
},
}, },
{ {
'url': 'http://video.foxnews.com/v/video-embed.html?video_id=3937480&d=video.foxnews.com', 'url': 'http://video.foxnews.com/v/video-embed.html?video_id=3937480&d=video.foxnews.com',
View File
@ -14,7 +14,7 @@ class FreespeechIE(InfoExtractor):
'url': 'https://www.freespeech.org/video/obama-romney-campaign-colorado-ahead-debate-0', 'url': 'https://www.freespeech.org/video/obama-romney-campaign-colorado-ahead-debate-0',
'info_dict': { 'info_dict': {
'id': 'poKsVCZ64uU', 'id': 'poKsVCZ64uU',
'ext': 'mp4', 'ext': 'webm',
'title': 'Obama, Romney Campaign in Colorado Ahead of Debate', 'title': 'Obama, Romney Campaign in Colorado Ahead of Debate',
'description': 'Obama, Romney Campaign in Colorado Ahead of Debate', 'description': 'Obama, Romney Campaign in Colorado Ahead of Debate',
'uploader': 'freespeechtv', 'uploader': 'freespeechtv',
View File
@ -2,42 +2,27 @@
from __future__ import unicode_literals from __future__ import unicode_literals
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import compat_str
from ..utils import int_or_none
class GameInformerIE(InfoExtractor): class GameInformerIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?gameinformer\.com/(?:[^/]+/)*(?P<id>.+)\.aspx' _VALID_URL = r'https?://(?:www\.)?gameinformer\.com/(?:[^/]+/)*(?P<id>.+)\.aspx'
_TEST = { _TEST = {
'url': 'http://www.gameinformer.com/b/features/archive/2015/09/26/replay-animal-crossing.aspx', 'url': 'http://www.gameinformer.com/b/features/archive/2015/09/26/replay-animal-crossing.aspx',
'md5': '292f26da1ab4beb4c9099f1304d2b071',
'info_dict': { 'info_dict': {
'id': '4515472681001', 'id': '4515472681001',
'ext': 'm3u8', 'ext': 'mp4',
'title': 'Replay - Animal Crossing', 'title': 'Replay - Animal Crossing',
'description': 'md5:2e211891b215c85d061adc7a4dd2d930', 'description': 'md5:2e211891b215c85d061adc7a4dd2d930',
'timestamp': 1443457610706, 'timestamp': 1443457610,
}, 'upload_date': '20150928',
'params': { 'uploader_id': '694940074001',
# m3u8 download
'skip_download': True,
}, },
} }
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/694940074001/default_default/index.html?videoId=%s'
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._match_id(url) display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
brightcove_id = self._search_regex(r"getVideo\('[^']+video_id=(\d+)", webpage, 'brightcove id')
bc_api_url = self._search_regex(r"getVideo\('([^']+)'", webpage, 'brightcove api url') return self.url_result(self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, 'BrightcoveNew', brightcove_id)
json_data = self._download_json(
bc_api_url + '&video_fields=id,name,shortDescription,publishedDate,videoStillURL,length,IOSRenditions',
display_id)
return {
'id': compat_str(json_data['id']),
'display_id': display_id,
'url': json_data['IOSRenditions'][0]['url'],
'title': json_data['name'],
'description': json_data.get('shortDescription'),
'timestamp': int_or_none(json_data.get('publishedDate')),
'duration': int_or_none(json_data.get('length')),
}
View File
@ -47,6 +47,7 @@ from .senateisvp import SenateISVPIE
from .svt import SVTIE from .svt import SVTIE
from .pornhub import PornHubIE from .pornhub import PornHubIE
from .xhamster import XHamsterEmbedIE from .xhamster import XHamsterEmbedIE
from .tnaflix import TNAFlixNetworkEmbedIE
from .vimeo import VimeoIE from .vimeo import VimeoIE
from .dailymotion import DailymotionCloudIE from .dailymotion import DailymotionCloudIE
from .onionstudios import OnionStudiosIE from .onionstudios import OnionStudiosIE
@ -1241,28 +1242,34 @@ class GenericIE(InfoExtractor):
full_response = self._request_webpage(request, video_id) full_response = self._request_webpage(request, video_id)
head_response = full_response head_response = full_response
info_dict = {
'id': video_id,
'title': compat_urllib_parse_unquote(os.path.splitext(url_basename(url))[0]),
}
# Check for direct link to a video # Check for direct link to a video
content_type = head_response.headers.get('Content-Type', '') content_type = head_response.headers.get('Content-Type', '')
m = re.match(r'^(?P<type>audio|video|application(?=/(?:ogg$|(?:vnd\.apple\.|x-)?mpegurl)))/(?P<format_id>.+)$', content_type) m = re.match(r'^(?P<type>audio|video|application(?=/(?:ogg$|(?:vnd\.apple\.|x-)?mpegurl)))/(?P<format_id>.+)$', content_type)
if m: if m:
upload_date = unified_strdate( upload_date = unified_strdate(
head_response.headers.get('Last-Modified')) head_response.headers.get('Last-Modified'))
formats = [] format_id = m.group('format_id')
if m.group('format_id').endswith('mpegurl'): if format_id.endswith('mpegurl'):
formats = self._extract_m3u8_formats(url, video_id, 'mp4') formats = self._extract_m3u8_formats(url, video_id, 'mp4')
elif format_id == 'f4m':
formats = self._extract_f4m_formats(url, video_id)
else: else:
formats = [{ formats = [{
'format_id': m.group('format_id'), 'format_id': m.group('format_id'),
'url': url, 'url': url,
'vcodec': 'none' if m.group('type') == 'audio' else None 'vcodec': 'none' if m.group('type') == 'audio' else None
}] }]
return { info_dict.update({
'id': video_id,
'title': compat_urllib_parse_unquote(os.path.splitext(url_basename(url))[0]),
'direct': True, 'direct': True,
'formats': formats, 'formats': formats,
'upload_date': upload_date, 'upload_date': upload_date,
} })
return info_dict
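The `info_dict` skeleton is now built once and merely updated by each early-return branch. The Content-Type gate itself is unchanged; as a standalone reference, this is how it classifies a few header values:

```python
import re

CT_RE = r'^(?P<type>audio|video|application(?=/(?:ogg$|(?:vnd\.apple\.|x-)?mpegurl)))/(?P<format_id>.+)$'
for ct in ('video/mp4', 'audio/ogg', 'application/vnd.apple.mpegurl', 'text/html'):
    m = re.match(CT_RE, ct)
    print(ct, '->', m.group('format_id') if m else 'not a direct link')
# format_ids ending in 'mpegurl' route into the m3u8 branch; the new
# 'f4m' branch handles direct HDS manifest links the same way
```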
if not self._downloader.params.get('test', False) and not is_intentional: if not self._downloader.params.get('test', False) and not is_intentional:
force = self._downloader.params.get('force_generic_extractor', False) force = self._downloader.params.get('force_generic_extractor', False)
@ -1290,13 +1297,12 @@ class GenericIE(InfoExtractor):
'URL could be a direct video link, returning it as such.') 'URL could be a direct video link, returning it as such.')
upload_date = unified_strdate( upload_date = unified_strdate(
head_response.headers.get('Last-Modified')) head_response.headers.get('Last-Modified'))
return { info_dict.update({
'id': video_id,
'title': compat_urllib_parse_unquote(os.path.splitext(url_basename(url))[0]),
'direct': True, 'direct': True,
'url': url, 'url': url,
'upload_date': upload_date, 'upload_date': upload_date,
} })
return info_dict
webpage = self._webpage_read_content( webpage = self._webpage_read_content(
full_response, url, video_id, prefix=first_bytes) full_response, url, video_id, prefix=first_bytes)
@ -1313,12 +1319,12 @@ class GenericIE(InfoExtractor):
elif doc.tag == '{http://xspf.org/ns/0/}playlist': elif doc.tag == '{http://xspf.org/ns/0/}playlist':
return self.playlist_result(self._parse_xspf(doc, video_id), video_id) return self.playlist_result(self._parse_xspf(doc, video_id), video_id)
elif re.match(r'(?i)^(?:{[^}]+})?MPD$', doc.tag): elif re.match(r'(?i)^(?:{[^}]+})?MPD$', doc.tag):
return { info_dict['formats'] = self._parse_mpd_formats(
'id': video_id, doc, video_id, mpd_base_url=url.rpartition('/')[0])
'title': compat_urllib_parse_unquote(os.path.splitext(url_basename(url))[0]), return info_dict
'formats': self._parse_mpd_formats( elif re.match(r'^{http://ns\.adobe\.com/f4m/[12]\.0}manifest$', doc.tag):
doc, video_id, mpd_base_url=url.rpartition('/')[0]), info_dict['formats'] = self._parse_f4m_formats(doc, url, video_id)
} return info_dict
except compat_xml_parse_error: except compat_xml_parse_error:
pass pass
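DASH and f4m manifests served as direct links are now recognized by parsing the body as XML and testing the (namespace-qualified) root tag, reusing the shared `info_dict`. A standalone illustration of the tag matching:

```python
import re
import xml.etree.ElementTree as etree

mpd = etree.fromstring('<MPD xmlns="urn:mpeg:dash:schema:mpd:2011"/>')
assert re.match(r'(?i)^(?:{[^}]+})?MPD$', mpd.tag)   # '{urn:...}MPD'

f4m = etree.fromstring('<manifest xmlns="http://ns.adobe.com/f4m/1.0"/>')
assert re.match(r'^{http://ns\.adobe\.com/f4m/[12]\.0}manifest$', f4m.tag)
```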
@ -1633,6 +1639,11 @@ class GenericIE(InfoExtractor):
if xhamster_urls: if xhamster_urls:
return _playlist_from_matches(xhamster_urls, ie='XHamsterEmbed') return _playlist_from_matches(xhamster_urls, ie='XHamsterEmbed')
# Look for embedded TNAFlixNetwork player
tnaflix_urls = TNAFlixNetworkEmbedIE._extract_urls(webpage)
if tnaflix_urls:
return _playlist_from_matches(tnaflix_urls, ie=TNAFlixNetworkEmbedIE.ie_key())
# Look for embedded Tvigle player # Look for embedded Tvigle player
mobj = re.search( mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//cloud\.tvigle\.ru/video/.+?)\1', webpage) r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//cloud\.tvigle\.ru/video/.+?)\1', webpage)
@ -1979,6 +1990,8 @@ class GenericIE(InfoExtractor):
entry_info_dict['formats'] = self._extract_m3u8_formats(video_url, video_id, ext='mp4') entry_info_dict['formats'] = self._extract_m3u8_formats(video_url, video_id, ext='mp4')
elif ext == 'mpd': elif ext == 'mpd':
entry_info_dict['formats'] = self._extract_mpd_formats(video_url, video_id) entry_info_dict['formats'] = self._extract_mpd_formats(video_url, video_id)
elif ext == 'f4m':
entry_info_dict['formats'] = self._extract_f4m_formats(video_url, video_id)
else: else:
entry_info_dict['url'] = video_url entry_info_dict['url'] = video_url
View File
@ -10,8 +10,8 @@ from ..utils import (
class GoogleDriveIE(InfoExtractor): class GoogleDriveIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:docs|drive)\.google\.com/(?:uc\?.*?id=|file/d/)|video\.google\.com/get_player\?.*?docid=)(?P<id>[a-zA-Z0-9_-]{28})' _VALID_URL = r'https?://(?:(?:docs|drive)\.google\.com/(?:uc\?.*?id=|file/d/)|video\.google\.com/get_player\?.*?docid=)(?P<id>[a-zA-Z0-9_-]{28,})'
_TEST = { _TESTS = [{
'url': 'https://drive.google.com/file/d/0ByeS4oOUV-49Zzh4R1J6R09zazQ/edit?pli=1', 'url': 'https://drive.google.com/file/d/0ByeS4oOUV-49Zzh4R1J6R09zazQ/edit?pli=1',
'md5': '881f7700aec4f538571fa1e0eed4a7b6', 'md5': '881f7700aec4f538571fa1e0eed4a7b6',
'info_dict': { 'info_dict': {
@ -20,7 +20,11 @@ class GoogleDriveIE(InfoExtractor):
'title': 'Big Buck Bunny.mp4', 'title': 'Big Buck Bunny.mp4',
'duration': 46, 'duration': 46,
} }
} }, {
# video id is longer than 28 characters
'url': 'https://drive.google.com/file/d/1ENcQ_jeCuj7y19s66_Ou9dRP4GKGsodiDQ/edit',
'only_matching': True,
}]
_FORMATS_EXT = { _FORMATS_EXT = {
'5': 'flv', '5': 'flv',
'6': 'flv', '6': 'flv',
@ -43,7 +47,7 @@ class GoogleDriveIE(InfoExtractor):
@staticmethod @staticmethod
def _extract_url(webpage): def _extract_url(webpage):
mobj = re.search( mobj = re.search(
r'<iframe[^>]+src="https?://(?:video\.google\.com/get_player\?.*?docid=|(?:docs|drive)\.google\.com/file/d/)(?P<id>[a-zA-Z0-9_-]{28})', r'<iframe[^>]+src="https?://(?:video\.google\.com/get_player\?.*?docid=|(?:docs|drive)\.google\.com/file/d/)(?P<id>[a-zA-Z0-9_-]{28,})',
webpage) webpage)
if mobj: if mobj:
return 'https://drive.google.com/file/d/%s' % mobj.group('id') return 'https://drive.google.com/file/d/%s' % mobj.group('id')
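Loosening `{28}` to `{28,}` keeps the classic 28-character ids matching while also accepting the longer ids from the new test:

```python
import re

ID_RE = r'[a-zA-Z0-9_-]{28,}'
assert re.match(ID_RE, '0ByeS4oOUV-49Zzh4R1J6R09zazQ')         # 28 chars
assert re.match(ID_RE, '1ENcQ_jeCuj7y19s66_Ou9dRP4GKGsodiDQ')  # longer id
```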
View File
@ -42,7 +42,7 @@ class ImdbIE(InfoExtractor):
for f_url, f_name in extra_formats] for f_url, f_name in extra_formats]
format_pages.append(player_page) format_pages.append(player_page)
quality = qualities(['SD', '480p', '720p']) quality = qualities(('SD', '480p', '720p', '1080p'))
formats = [] formats = []
for format_page in format_pages: for format_page in format_pages:
json_data = self._search_regex( json_data = self._search_regex(
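`qualities()` turns the preference tuple into a scorer, so appending `'1080p'` automatically ranks it above the rest. A simplified mirror of the helper's semantics (not the library code itself):

```python
def qualities(quality_ids):
    def q(qid):
        try:
            return quality_ids.index(qid)  # position == preference
        except ValueError:
            return -1                      # unknown labels sort last
    return q

quality = qualities(('SD', '480p', '720p', '1080p'))
assert quality('1080p') > quality('720p') > quality('SD') > quality('4K')
```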
View File
@ -73,7 +73,7 @@ class IndavideoEmbedIE(InfoExtractor):
'url': self._proto_relative_url(thumbnail) 'url': self._proto_relative_url(thumbnail)
} for thumbnail in video.get('thumbnails', [])] } for thumbnail in video.get('thumbnails', [])]
tags = [tag['title'] for tag in video.get('tags', [])] tags = [tag['title'] for tag in video.get('tags') or []]
return { return {
'id': video.get('id') or video_id, 'id': video.get('id') or video_id,
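`video.get('tags') or []` guards against the API returning an explicit `null`: a `.get('tags', [])` default only kicks in when the key is absent, not when its value is `None`.

```python
video = {'tags': None}  # API sometimes sends null instead of omitting the key
# video.get('tags', []) would still return None here and crash the loop;
# `or []` normalizes both a null value and a missing key to an empty list
tags = [tag['title'] for tag in video.get('tags') or []]
assert tags == []
```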
View File
@ -4,15 +4,12 @@ from __future__ import unicode_literals
import base64 import base64
from .common import InfoExtractor from ..compat import compat_urllib_parse_unquote
from ..compat import (
compat_urllib_parse_unquote,
compat_parse_qs,
)
from ..utils import determine_ext from ..utils import determine_ext
from .bokecc import BokeCCBaseIE
class InfoQIE(InfoExtractor): class InfoQIE(BokeCCBaseIE):
_VALID_URL = r'https?://(?:www\.)?infoq\.com/(?:[^/]+/)+(?P<id>[^/]+)' _VALID_URL = r'https?://(?:www\.)?infoq\.com/(?:[^/]+/)+(?P<id>[^/]+)'
_TESTS = [{ _TESTS = [{
@ -38,26 +35,6 @@ class InfoQIE(InfoExtractor):
}, },
}] }]
def _extract_bokecc_videos(self, webpage, video_id):
# TODO: bokecc.com is a Chinese video cloud platform
# It should have an independent extractor but I don't have other
# examples using bokecc
player_params_str = self._html_search_regex(
r'<script[^>]+src="http://p\.bokecc\.com/player\?([^"]+)',
webpage, 'player params', default=None)
player_params = compat_parse_qs(player_params_str)
info_xml = self._download_xml(
'http://p.bokecc.com/servlet/playinfo?uid=%s&vid=%s&m=1' % (
player_params['siteid'][0], player_params['vid'][0]), video_id)
return [{
'format_id': 'bokecc',
'url': quality.find('./copy').attrib['playurl'],
'preference': int(quality.attrib['value']),
} for quality in info_xml.findall('./video/quality')]
def _extract_rtmp_videos(self, webpage): def _extract_rtmp_videos(self, webpage):
# The server URL is hardcoded # The server URL is hardcoded
video_url = 'rtmpe://video.infoq.com/cfx/st/' video_url = 'rtmpe://video.infoq.com/cfx/st/'
@ -101,7 +78,7 @@ class InfoQIE(InfoExtractor):
if '/cn/' in url: if '/cn/' in url:
# for China videos, HTTP video URL exists but always fails with 403 # for China videos, HTTP video URL exists but always fails with 403
formats = self._extract_bokecc_videos(webpage, video_id) formats = self._extract_bokecc_formats(webpage, video_id)
else: else:
formats = self._extract_rtmp_videos(webpage) + self._extract_http_videos(webpage) formats = self._extract_rtmp_videos(webpage) + self._extract_http_videos(webpage)
View File
@ -18,6 +18,7 @@ from ..compat import (
compat_urllib_parse_urlparse, compat_urllib_parse_urlparse,
) )
from ..utils import ( from ..utils import (
decode_packed_codes,
ExtractorError, ExtractorError,
ohdave_rsa_encrypt, ohdave_rsa_encrypt,
remove_start, remove_start,
@ -126,43 +127,11 @@ class IqiyiSDK(object):
class IqiyiSDKInterpreter(object): class IqiyiSDKInterpreter(object):
BASE62_TABLE = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
def __init__(self, sdk_code): def __init__(self, sdk_code):
self.sdk_code = sdk_code self.sdk_code = sdk_code
@classmethod
def base62(cls, num):
if num == 0:
return '0'
ret = ''
while num:
ret = cls.BASE62_TABLE[num % 62] + ret
num = num // 62
return ret
def decode_eval_codes(self):
self.sdk_code = self.sdk_code[5:-3]
mobj = re.search(
r"'([^']+)',62,(\d+),'([^']+)'\.split\('\|'\),[^,]+,{}",
self.sdk_code)
obfucasted_code, count, symbols = mobj.groups()
count = int(count)
symbols = symbols.split('|')
symbol_table = {}
while count:
count -= 1
b62count = self.base62(count)
symbol_table[b62count] = symbols[count] or b62count
self.sdk_code = re.sub(
r'\b(\w+)\b', lambda mobj: symbol_table[mobj.group(0)],
obfucasted_code)
def run(self, target, ip, timestamp): def run(self, target, ip, timestamp):
self.decode_eval_codes() self.sdk_code = decode_packed_codes(self.sdk_code)
functions = re.findall(r'input=([a-zA-Z0-9]+)\(input', self.sdk_code) functions = re.findall(r'input=([a-zA-Z0-9]+)\(input', self.sdk_code)
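The hand-rolled unpacker is replaced by `youtube_dl.utils.decode_packed_codes`, which reverses Dean Edwards style `eval(function(p,a,c,k,e,d)...)` obfuscation. For reference, the base62 half deleted above is equivalent to:

```python
BASE62_TABLE = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

def base62(num):
    # render num in base 62 using the packer's digit alphabet
    if num == 0:
        return '0'
    ret = ''
    while num:
        ret = BASE62_TABLE[num % 62] + ret
        num //= 62
    return ret

assert base62(61) == 'Z' and base62(62) == '10'
```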
@ -529,10 +498,10 @@ class IqiyiIE(InfoExtractor):
raw_data = self._download_json(api_url, video_id) raw_data = self._download_json(api_url, video_id)
return raw_data return raw_data
def get_enc_key(self, swf_url, video_id): def get_enc_key(self, video_id):
# TODO: automatic key extraction # TODO: automatic key extraction
# last update at 2016-01-22 for Zombie::bite # last update at 2016-01-22 for Zombie::bite
enc_key = '6ab6d0280511493ba85594779759d4ed' enc_key = '8ed797d224d043e7ac23d95b70227d32'
return enc_key return enc_key
def _extract_playlist(self, webpage): def _extract_playlist(self, webpage):
@ -582,11 +551,9 @@ class IqiyiIE(InfoExtractor):
r'data-player-tvid\s*=\s*[\'"](\d+)', webpage, 'tvid') r'data-player-tvid\s*=\s*[\'"](\d+)', webpage, 'tvid')
video_id = self._search_regex( video_id = self._search_regex(
r'data-player-videoid\s*=\s*[\'"]([a-f\d]+)', webpage, 'video_id') r'data-player-videoid\s*=\s*[\'"]([a-f\d]+)', webpage, 'video_id')
swf_url = self._search_regex(
r'(http://[^\'"]+MainPlayer[^.]+\.swf)', webpage, 'swf player URL')
_uuid = uuid.uuid4().hex _uuid = uuid.uuid4().hex
enc_key = self.get_enc_key(swf_url, video_id) enc_key = self.get_enc_key(video_id)
raw_data = self.get_raw_data(tvid, video_id, enc_key, _uuid) raw_data = self.get_raw_data(tvid, video_id, enc_key, _uuid)
View File
@ -30,7 +30,7 @@ class JeuxVideoIE(InfoExtractor):
webpage = self._download_webpage(url, title) webpage = self._download_webpage(url, title)
title = self._html_search_meta('name', webpage) or self._og_search_title(webpage) title = self._html_search_meta('name', webpage) or self._og_search_title(webpage)
config_url = self._html_search_regex( config_url = self._html_search_regex(
r'data-src="(/contenu/medias/video.php.*?)"', r'data-src(?:set-video)?="(/contenu/medias/video.php.*?)"',
webpage, 'config URL') webpage, 'config URL')
config_url = 'http://www.jeuxvideo.com' + config_url config_url = 'http://www.jeuxvideo.com' + config_url
View File
@ -7,7 +7,46 @@ from .common import InfoExtractor
from ..utils import int_or_none from ..utils import int_or_none
class JWPlatformIE(InfoExtractor): class JWPlatformBaseIE(InfoExtractor):
def _parse_jwplayer_data(self, jwplayer_data, video_id, require_title=True):
video_data = jwplayer_data['playlist'][0]
subtitles = {}
for track in video_data['tracks']:
if track['kind'] == 'captions':
subtitles[track['label']] = [{'url': self._proto_relative_url(track['file'])}]
formats = []
for source in video_data['sources']:
source_url = self._proto_relative_url(source['file'])
source_type = source.get('type') or ''
if source_type in ('application/vnd.apple.mpegurl', 'hls'):
formats.extend(self._extract_m3u8_formats(
source_url, video_id, 'mp4', 'm3u8_native', fatal=False))
elif source_type.startswith('audio'):
formats.append({
'url': source_url,
'vcodec': 'none',
})
else:
formats.append({
'url': source_url,
'width': int_or_none(source.get('width')),
'height': int_or_none(source.get('height')),
})
self._sort_formats(formats)
return {
'id': video_id,
'title': video_data['title'] if require_title else video_data.get('title'),
'description': video_data.get('description'),
'thumbnail': self._proto_relative_url(video_data.get('image')),
'timestamp': int_or_none(video_data.get('pubdate')),
'subtitles': subtitles,
'formats': formats,
}
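With the parsing hoisted into `JWPlatformBaseIE`, any extractor that digs a jwplayer config out of a page can reuse `_parse_jwplayer_data`. A hypothetical minimal payload in the shape it expects:

```python
# Hypothetical jwplayer config (field names follow the code above)
jwplayer_data = {
    'playlist': [{
        'title': 'Sample clip',
        'tracks': [{'kind': 'captions', 'label': 'en',
                    'file': '//cdn.example.com/en.vtt'}],
        'sources': [
            {'type': 'hls', 'file': '//cdn.example.com/master.m3u8'},
            {'type': 'video/mp4', 'file': '//cdn.example.com/480.mp4',
             'width': 854, 'height': 480},
        ],
    }],
}
# inside an extractor: info = self._parse_jwplayer_data(jwplayer_data, video_id)
```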
class JWPlatformIE(JWPlatformBaseIE):
_VALID_URL = r'(?:https?://content\.jwplatform\.com/(?:feeds|players|jw6)/|jwplatform:)(?P<id>[a-zA-Z0-9]{8})' _VALID_URL = r'(?:https?://content\.jwplatform\.com/(?:feeds|players|jw6)/|jwplatform:)(?P<id>[a-zA-Z0-9]{8})'
_TEST = { _TEST = {
'url': 'http://content.jwplatform.com/players/nPripu9l-ALJ3XQCI.js', 'url': 'http://content.jwplatform.com/players/nPripu9l-ALJ3XQCI.js',
@ -33,38 +72,4 @@ class JWPlatformIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
json_data = self._download_json('http://content.jwplatform.com/feeds/%s.json' % video_id, video_id) json_data = self._download_json('http://content.jwplatform.com/feeds/%s.json' % video_id, video_id)
video_data = json_data['playlist'][0] return self._parse_jwplayer_data(json_data, video_id)
subtitles = {}
for track in video_data['tracks']:
if track['kind'] == 'captions':
subtitles[track['label']] = [{'url': self._proto_relative_url(track['file'])}]
formats = []
for source in video_data['sources']:
source_url = self._proto_relative_url(source['file'])
source_type = source.get('type') or ''
if source_type == 'application/vnd.apple.mpegurl':
formats.extend(self._extract_m3u8_formats(
source_url, video_id, 'mp4', 'm3u8_native', fatal=False))
elif source_type.startswith('audio'):
formats.append({
'url': source_url,
'vcodec': 'none',
})
else:
formats.append({
'url': source_url,
'width': int_or_none(source.get('width')),
'height': int_or_none(source.get('height')),
})
self._sort_formats(formats)
return {
'id': video_id,
'title': video_data['title'],
'description': video_data.get('description'),
'thumbnail': self._proto_relative_url(video_data.get('image')),
'timestamp': int_or_none(video_data.get('pubdate')),
'subtitles': subtitles,
'formats': formats,
}
View File
@ -8,6 +8,7 @@ from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_urllib_parse, compat_urllib_parse,
compat_urlparse, compat_urlparse,
compat_parse_qs,
) )
from ..utils import ( from ..utils import (
clean_html, clean_html,
@ -20,21 +21,17 @@ from ..utils import (
class KalturaIE(InfoExtractor): class KalturaIE(InfoExtractor):
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
(?: (?:
kaltura:(?P<partner_id_s>\d+):(?P<id_s>[0-9a-z_]+)| kaltura:(?P<partner_id>\d+):(?P<id>[0-9a-z_]+)|
https?:// https?://
(:?(?:www|cdnapi(?:sec)?)\.)?kaltura\.com/ (:?(?:www|cdnapi(?:sec)?)\.)?kaltura\.com/
(?: (?:
(?: (?:
# flash player # flash player
index\.php/kwidget/ index\.php/kwidget|
(?:[^/]+/)*?wid/_(?P<partner_id>\d+)/
(?:[^/]+/)*?entry_id/(?P<id>[0-9a-z_]+)|
# html5 player # html5 player
html5/html5lib/ html5/html5lib/[^/]+/mwEmbedFrame\.php
(?:[^/]+/)*?entry_id/(?P<id_html5>[0-9a-z_]+)
.*\?.*\bwid=_(?P<partner_id_html5>\d+)
) )
) )(?:/(?P<path>[^?]+))?(?:\?(?P<query>.*))?
) )
''' '''
_API_BASE = 'http://cdnapi.kaltura.com/api_v3/index.php?' _API_BASE = 'http://cdnapi.kaltura.com/api_v3/index.php?'
@ -127,10 +124,41 @@ class KalturaIE(InfoExtractor):
url, smuggled_data = unsmuggle_url(url, {}) url, smuggled_data = unsmuggle_url(url, {})
mobj = re.match(self._VALID_URL, url) mobj = re.match(self._VALID_URL, url)
partner_id = mobj.group('partner_id_s') or mobj.group('partner_id') or mobj.group('partner_id_html5') partner_id, entry_id = mobj.group('partner_id', 'id')
entry_id = mobj.group('id_s') or mobj.group('id') or mobj.group('id_html5') ks = None
if partner_id and entry_id:
info, flavor_assets = self._get_video_info(entry_id, partner_id) info, flavor_assets = self._get_video_info(entry_id, partner_id)
else:
path, query = mobj.group('path', 'query')
if not path and not query:
raise ExtractorError('Invalid URL', expected=True)
params = {}
if query:
params = compat_parse_qs(query)
if path:
splitted_path = path.split('/')
params.update(dict((zip(splitted_path[::2], [[v] for v in splitted_path[1::2]]))))
if 'wid' in params:
partner_id = params['wid'][0][1:]
elif 'p' in params:
partner_id = params['p'][0]
else:
raise ExtractorError('Invalid URL', expected=True)
if 'entry_id' in params:
entry_id = params['entry_id'][0]
info, flavor_assets = self._get_video_info(entry_id, partner_id)
elif 'uiconf_id' in params and 'flashvars[referenceId]' in params:
reference_id = params['flashvars[referenceId]'][0]
webpage = self._download_webpage(url, reference_id)
entry_data = self._parse_json(self._search_regex(
r'window\.kalturaIframePackageData\s*=\s*({.*});',
webpage, 'kalturaIframePackageData'),
reference_id)['entryResult']
info, flavor_assets = entry_data['meta'], entry_data['contextData']['flavorAssets']
entry_id = info['id']
else:
raise ExtractorError('Invalid URL', expected=True)
ks = params.get('flashvars[ks]', [None])[0]
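Path-style player URLs are folded into the same dict as the query string by zipping even-indexed segments (keys) with odd-indexed ones (values), each value wrapped in a list to match `compat_parse_qs` output. On a hypothetical kwidget path:

```python
path = 'wid/_1770401/entry_id/0_abcdefgh'  # hypothetical player path
segments = path.split('/')
params = dict(zip(segments[::2], [[v] for v in segments[1::2]]))
partner_id = params['wid'][0][1:]  # '1770401' (leading underscore stripped)
entry_id = params['entry_id'][0]   # '0_abcdefgh'
```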
source_url = smuggled_data.get('source_url') source_url = smuggled_data.get('source_url')
if source_url: if source_url:
@ -140,14 +168,19 @@ class KalturaIE(InfoExtractor):
else: else:
referrer = None referrer = None
def sign_url(unsigned_url):
if ks:
unsigned_url += '/ks/%s' % ks
if referrer:
unsigned_url += '?referrer=%s' % referrer
return unsigned_url
formats = [] formats = []
for f in flavor_assets: for f in flavor_assets:
# Continue if asset is not ready # Continue if asset is not ready
if f['status'] != 2: if f['status'] != 2:
continue continue
video_url = '%s/flavorId/%s' % (info['dataUrl'], f['id']) video_url = sign_url('%s/flavorId/%s' % (info['dataUrl'], f['id']))
if referrer:
video_url += '?referrer=%s' % referrer
formats.append({ formats.append({
'format_id': '%(fileExt)s-%(bitrate)s' % f, 'format_id': '%(fileExt)s-%(bitrate)s' % f,
'ext': f.get('fileExt'), 'ext': f.get('fileExt'),
@ -160,9 +193,7 @@ class KalturaIE(InfoExtractor):
'width': int_or_none(f.get('width')), 'width': int_or_none(f.get('width')),
'url': video_url, 'url': video_url,
}) })
m3u8_url = info['dataUrl'].replace('format/url', 'format/applehttp') m3u8_url = sign_url(info['dataUrl'].replace('format/url', 'format/applehttp'))
if referrer:
m3u8_url += '?referrer=%s' % referrer
formats.extend(self._extract_m3u8_formats( formats.extend(self._extract_m3u8_formats(
m3u8_url, entry_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False)) m3u8_url, entry_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
View File
@ -14,10 +14,10 @@ class KhanAcademyIE(InfoExtractor):
_TESTS = [{ _TESTS = [{
'url': 'http://www.khanacademy.org/video/one-time-pad', 'url': 'http://www.khanacademy.org/video/one-time-pad',
'md5': '7021db7f2d47d4fff89b13177cb1e8f4', 'md5': '7b391cce85e758fb94f763ddc1bbb979',
'info_dict': { 'info_dict': {
'id': 'one-time-pad', 'id': 'one-time-pad',
'ext': 'mp4', 'ext': 'webm',
'title': 'The one-time pad', 'title': 'The one-time pad',
'description': 'The perfect cipher', 'description': 'The perfect cipher',
'duration': 176, 'duration': 176,
View File
@ -0,0 +1,99 @@
# coding: utf-8
from __future__ import unicode_literals
import random
import re
from .common import InfoExtractor
from ..compat import compat_urllib_parse_unquote_plus
from ..utils import (
int_or_none,
float_or_none,
timeconvert,
update_url_query,
xpath_text,
)
class KUSIIE(InfoExtractor):
_VALID_URL = r'http://(?:www\.)?kusi\.com/(?P<path>story/.+|video\?clipId=(?P<clipId>\d+))'
_TESTS = [{
'url': 'http://www.kusi.com/story/31183873/turko-files-case-closed-put-on-hold',
'md5': 'f926e7684294cf8cb7bdf8858e1b3988',
'info_dict': {
'id': '12203019',
'ext': 'mp4',
'title': 'Turko Files: Case Closed! & Put On Hold!',
'duration': 231.0,
'upload_date': '20160210',
'timestamp': 1455087571,
'thumbnail': 're:^https?://.*\.jpg$'
},
}, {
'url': 'http://kusi.com/video?clipId=12203019',
'info_dict': {
'id': '12203019',
'ext': 'mp4',
'title': 'Turko Files: Case Closed! & Put On Hold!',
'duration': 231.0,
'upload_date': '20160210',
'timestamp': 1455087571,
'thumbnail': 're:^https?://.*\.jpg$'
},
'params': {
'skip_download': True, # Same as previous one
},
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
clip_id = mobj.group('clipId')
video_id = clip_id or mobj.group('path')
webpage = self._download_webpage(url, video_id)
if clip_id is None:
video_id = clip_id = self._html_search_regex(
r'"clipId"\s*,\s*"(\d+)"', webpage, 'clip id')
affiliate_id = self._search_regex(
r'affiliateId\s*:\s*\'([^\']+)\'', webpage, 'affiliate id')
# See __Packages/worldnow/model/GalleryModel.as of WNGallery.swf
xml_url = update_url_query('http://www.kusi.com/build.asp', {
'buildtype': 'buildfeaturexmlrequest',
'featureType': 'Clip',
'featureid': clip_id,
'affiliateno': affiliate_id,
'clientgroupid': '1',
'rnd': int(round(random.random() * 1000000)),
})
doc = self._download_xml(xml_url, video_id)
video_title = xpath_text(doc, 'HEADLINE', fatal=True)
duration = float_or_none(xpath_text(doc, 'DURATION'), scale=1000)
description = xpath_text(doc, 'ABSTRACT')
thumbnail = xpath_text(doc, './THUMBNAILIMAGE/FILENAME')
creation_time = timeconvert(xpath_text(doc, 'rfc822creationdate'))
quality_options = doc.find('{http://search.yahoo.com/mrss/}group').findall('{http://search.yahoo.com/mrss/}content')
formats = []
for quality in quality_options:
formats.append({
'url': compat_urllib_parse_unquote_plus(quality.attrib['url']),
'height': int_or_none(quality.attrib.get('height')),
'width': int_or_none(quality.attrib.get('width')),
'vbr': float_or_none(quality.attrib.get('bitratebits'), scale=1000),
})
self._sort_formats(formats)
return {
'id': video_id,
'title': video_title,
'description': description,
'duration': duration,
'formats': formats,
'thumbnail': thumbnail,
'timestamp': creation_time,
}
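`update_url_query` merges a dict of parameters into an existing URL, encoding as it goes. A simplified stand-in for the utility (the real one lives in `youtube_dl/utils.py`):

```python
try:
    from urllib.parse import urlencode, urlparse, urlunparse, parse_qs
except ImportError:  # Python 2
    from urllib import urlencode
    from urlparse import urlparse, urlunparse, parse_qs

def update_url_query(url, query):
    # simplified: encode new params over whatever the URL already carries
    parsed = urlparse(url)
    qs = parse_qs(parsed.query)
    qs.update((k, [str(v)]) for k, v in query.items())
    return urlunparse(parsed._replace(query=urlencode(qs, doseq=True)))

print(update_url_query('http://www.kusi.com/build.asp', {'featureid': 12203019}))
# http://www.kusi.com/build.asp?featureid=12203019
```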
View File
@ -68,6 +68,7 @@ class KuwoIE(KuwoBaseIE):
'id': '6446136', 'id': '6446136',
'ext': 'mp3', 'ext': 'mp3',
'title': '心', 'title': '心',
'description': 'md5:b2ab6295d014005bfc607525bfc1e38a',
'creator': 'IU', 'creator': 'IU',
'upload_date': '20150518', 'upload_date': '20150518',
}, },
View File
@ -1,36 +1,39 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import base64
import datetime import datetime
import hashlib
import re import re
import time import time
import base64
import hashlib
from .common import InfoExtractor from .common import InfoExtractor
from ..compat import ( from ..compat import (
compat_urllib_parse,
compat_ord, compat_ord,
compat_str, compat_str,
compat_urllib_parse,
) )
from ..utils import ( from ..utils import (
determine_ext, determine_ext,
encode_data_uri,
ExtractorError, ExtractorError,
int_or_none,
orderedSet,
parse_iso8601, parse_iso8601,
sanitized_Request, sanitized_Request,
int_or_none,
str_or_none, str_or_none,
encode_data_uri,
url_basename, url_basename,
) )
class LetvIE(InfoExtractor): class LeIE(InfoExtractor):
IE_DESC = '乐视网' IE_DESC = '乐视网'
_VALID_URL = r'http://www\.letv\.com/ptv/vplay/(?P<id>\d+).html' _VALID_URL = r'http://www\.le\.com/ptv/vplay/(?P<id>\d+)\.html'
_URL_TEMPLATE = 'http://www.le.com/ptv/vplay/%s.html'
_TESTS = [{ _TESTS = [{
'url': 'http://www.letv.com/ptv/vplay/22005890.html', 'url': 'http://www.le.com/ptv/vplay/22005890.html',
'md5': 'edadcfe5406976f42f9f266057ee5e40', 'md5': 'edadcfe5406976f42f9f266057ee5e40',
'info_dict': { 'info_dict': {
'id': '22005890', 'id': '22005890',
@ -42,7 +45,7 @@ class LetvIE(InfoExtractor):
'hls_prefer_native': True, 'hls_prefer_native': True,
}, },
}, { }, {
'url': 'http://www.letv.com/ptv/vplay/1415246.html', 'url': 'http://www.le.com/ptv/vplay/1415246.html',
'info_dict': { 'info_dict': {
'id': '1415246', 'id': '1415246',
'ext': 'mp4', 'ext': 'mp4',
@ -54,7 +57,7 @@ class LetvIE(InfoExtractor):
}, },
}, { }, {
'note': 'This video is available only in Mainland China, thus a proxy is needed', 'note': 'This video is available only in Mainland China, thus a proxy is needed',
'url': 'http://www.letv.com/ptv/vplay/1118082.html', 'url': 'http://www.le.com/ptv/vplay/1118082.html',
'md5': '2424c74948a62e5f31988438979c5ad1', 'md5': '2424c74948a62e5f31988438979c5ad1',
'info_dict': { 'info_dict': {
'id': '1118082', 'id': '1118082',
@ -94,17 +97,16 @@ class LetvIE(InfoExtractor):
return encrypted_data return encrypted_data
encrypted_data = encrypted_data[5:] encrypted_data = encrypted_data[5:]
_loc4_ = bytearray() _loc4_ = bytearray(2 * len(encrypted_data))
while encrypted_data: for idx, val in enumerate(encrypted_data):
b = compat_ord(encrypted_data[0]) b = compat_ord(val)
_loc4_.extend([b // 16, b & 0x0f]) _loc4_[2 * idx] = b // 16
encrypted_data = encrypted_data[1:] _loc4_[2 * idx + 1] = b % 16
idx = len(_loc4_) - 11 idx = len(_loc4_) - 11
_loc4_ = _loc4_[idx:] + _loc4_[:idx] _loc4_ = _loc4_[idx:] + _loc4_[:idx]
_loc7_ = bytearray() _loc7_ = bytearray(len(encrypted_data))
while _loc4_: for i in range(len(encrypted_data)):
_loc7_.append(_loc4_[0] * 16 + _loc4_[1]) _loc7_[i] = _loc4_[2 * i] * 16 + _loc4_[2 * i + 1]
_loc4_ = _loc4_[2:]
return bytes(_loc7_) return bytes(_loc7_)
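The loop rewrite keeps the cipher identical: split each byte into high/low nibbles, rotate the nibble array left so its last 11 entries come first, then re-pack nibble pairs into bytes. A standalone version of the same transform (sample input is arbitrary):

```python
def decrypt_m3u8(data):
    # data: sequence of ints, i.e. the body after the 5-byte header is dropped
    nibbles = bytearray(2 * len(data))
    for idx, b in enumerate(data):
        nibbles[2 * idx], nibbles[2 * idx + 1] = b // 16, b % 16
    split = len(nibbles) - 11                 # rotate last 11 nibbles to front
    nibbles = nibbles[split:] + nibbles[:split]
    out = bytearray(len(data))
    for i in range(len(data)):
        out[i] = nibbles[2 * i] * 16 + nibbles[2 * i + 1]
    return bytes(out)

print(decrypt_m3u8([0x12, 0x34, 0x56, 0x78, 0x9a, 0xbc]))
```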
@ -117,10 +119,10 @@ class LetvIE(InfoExtractor):
'splatid': 101, 'splatid': 101,
'format': 1, 'format': 1,
'tkey': self.calc_time_key(int(time.time())), 'tkey': self.calc_time_key(int(time.time())),
'domain': 'www.letv.com' 'domain': 'www.le.com'
} }
play_json_req = sanitized_Request( play_json_req = sanitized_Request(
'http://api.letv.com/mms/out/video/playJson?' + compat_urllib_parse.urlencode(params) 'http://api.le.com/mms/out/video/playJson?' + compat_urllib_parse.urlencode(params)
) )
cn_verification_proxy = self._downloader.params.get('cn_verification_proxy') cn_verification_proxy = self._downloader.params.get('cn_verification_proxy')
if cn_verification_proxy: if cn_verification_proxy:
@ -193,26 +195,51 @@ class LetvIE(InfoExtractor):
} }
class LetvTvIE(InfoExtractor): class LePlaylistIE(InfoExtractor):
_VALID_URL = r'http://www.letv.com/tv/(?P<id>\d+).html' _VALID_URL = r'http://[a-z]+\.le\.com/[a-z]+/(?P<id>[a-z0-9_]+)'
_TESTS = [{ _TESTS = [{
'url': 'http://www.letv.com/tv/46177.html', 'url': 'http://www.le.com/tv/46177.html',
'info_dict': { 'info_dict': {
'id': '46177', 'id': '46177',
'title': '美人天下', 'title': '美人天下',
'description': 'md5:395666ff41b44080396e59570dbac01c' 'description': 'md5:395666ff41b44080396e59570dbac01c'
}, },
'playlist_count': 35 'playlist_count': 35
}, {
'url': 'http://tv.le.com/izt/wuzetian/index.html',
'info_dict': {
'id': 'wuzetian',
'title': '武媚娘传奇',
'description': 'md5:e12499475ab3d50219e5bba00b3cb248'
},
# This playlist contains some extra videos other than the drama itself
'playlist_mincount': 96
}, {
'url': 'http://tv.le.com/pzt/lswjzzjc/index.shtml',
# This series is moved to http://www.le.com/tv/10005297.html
'only_matching': True,
}, {
'url': 'http://www.le.com/comic/92063.html',
'only_matching': True,
}, {
'url': 'http://list.le.com/listn/c1009_sc532002_d2_p1_o1.html',
'only_matching': True,
}] }]
@classmethod
def suitable(cls, url):
return False if LeIE.suitable(url) else super(LePlaylistIE, cls).suitable(url)
def _real_extract(self, url): def _real_extract(self, url):
playlist_id = self._match_id(url) playlist_id = self._match_id(url)
page = self._download_webpage(url, playlist_id) page = self._download_webpage(url, playlist_id)
media_urls = list(set(re.findall( # Currently old domain names are still used in playlists
r'http://www.letv.com/ptv/vplay/\d+.html', page))) media_ids = orderedSet(re.findall(
entries = [self.url_result(media_url, ie='Letv') r'<a[^>]+href="http://www\.letv\.com/ptv/vplay/(\d+)\.html', page))
for media_url in media_urls] entries = [self.url_result(LeIE._URL_TEMPLATE % media_id, ie='Le')
for media_id in media_ids]
title = self._html_search_meta('keywords', page, title = self._html_search_meta('keywords', page,
fatal=False).split('，')[0] fatal=False).split('，')[0]
@ -222,31 +249,9 @@ class LetvTvIE(InfoExtractor):
playlist_description=description) playlist_description=description)
class LetvPlaylistIE(LetvTvIE):
_VALID_URL = r'http://tv.letv.com/[a-z]+/(?P<id>[a-z]+)/index.s?html'
_TESTS = [{
'url': 'http://tv.letv.com/izt/wuzetian/index.html',
'info_dict': {
'id': 'wuzetian',
'title': '武媚娘传奇',
'description': 'md5:e12499475ab3d50219e5bba00b3cb248'
},
# This playlist contains some extra videos other than the drama itself
'playlist_mincount': 96
}, {
'url': 'http://tv.letv.com/pzt/lswjzzjc/index.shtml',
'info_dict': {
'id': 'lswjzzjc',
# The title should be "劲舞青春", but I can't find a simple way to
# determine the playlist title
'title': '乐视午间自制剧场',
'description': 'md5:b1eef244f45589a7b5b1af9ff25a4489'
},
'playlist_mincount': 7
}]
class LetvCloudIE(InfoExtractor): class LetvCloudIE(InfoExtractor):
# Most of *.letv.com is changed to *.le.com on 2016/01/02
# but yuntv.letv.com is kept, so also keep the extractor name
IE_DESC = '乐视云' IE_DESC = '乐视云'
_VALID_URL = r'https?://yuntv\.letv\.com/bcloud.html\?.+' _VALID_URL = r'https?://yuntv\.letv\.com/bcloud.html\?.+'
@ -327,7 +332,7 @@ class LetvCloudIE(InfoExtractor):
formats.append({ formats.append({
'url': url, 'url': url,
'ext': determine_ext(decoded_url), 'ext': determine_ext(decoded_url),
'format_id': int_or_none(play_url.get('vtype')), 'format_id': str_or_none(play_url.get('vtype')),
'format_note': str_or_none(play_url.get('definition')), 'format_note': str_or_none(play_url.get('definition')),
'width': int_or_none(play_url.get('vwidth')), 'width': int_or_none(play_url.get('vwidth')),
'height': int_or_none(play_url.get('vheight')), 'height': int_or_none(play_url.get('vheight')),
View File
@ -20,18 +20,18 @@ class LifeNewsIE(InfoExtractor):
_VALID_URL = r'http://lifenews\.ru/(?:mobile/)?(?P<section>news|video)/(?P<id>\d+)' _VALID_URL = r'http://lifenews\.ru/(?:mobile/)?(?P<section>news|video)/(?P<id>\d+)'
_TESTS = [{ _TESTS = [{
'url': 'http://lifenews.ru/news/126342', # single video embedded via video/source
'md5': 'e1b50a5c5fb98a6a544250f2e0db570a', 'url': 'http://lifenews.ru/news/98736',
'md5': '77c95eaefaca216e32a76a343ad89d23',
'info_dict': { 'info_dict': {
'id': '126342', 'id': '98736',
'ext': 'mp4', 'ext': 'mp4',
'title': 'МВД разыскивает мужчин, оставивших в IKEA сумку с автоматом', 'title': 'Мужчина нашел дома архив оборонного завода',
'description': 'Камеры наблюдения гипермаркета зафиксировали троих мужчин, спрятавших оружейный арсенал в камере хранения.', 'description': 'md5:3b06b1b39b5e2bea548e403d99b8bf26',
'thumbnail': 're:http://.*\.jpg', 'upload_date': '20120805',
'upload_date': '20140130',
} }
}, { }, {
# video in <iframe> # single video embedded via iframe
'url': 'http://lifenews.ru/news/152125', 'url': 'http://lifenews.ru/news/152125',
'md5': '77d19a6f0886cd76bdbf44b4d971a273', 'md5': '77d19a6f0886cd76bdbf44b4d971a273',
'info_dict': { 'info_dict': {
@ -42,15 +42,33 @@ class LifeNewsIE(InfoExtractor):
'upload_date': '20150402', 'upload_date': '20150402',
} }
}, { }, {
# two videos embedded via iframe
'url': 'http://lifenews.ru/news/153461', 'url': 'http://lifenews.ru/news/153461',
'md5': '9b6ef8bc0ffa25aebc8bdb40d89ab795',
'info_dict': { 'info_dict': {
'id': '153461', 'id': '153461',
'ext': 'mp4',
'title': 'В Москве спасли потерявшегося медвежонка, который спрятался на дереве', 'title': 'В Москве спасли потерявшегося медвежонка, который спрятался на дереве',
'description': 'Маленький хищник не смог найти дорогу домой и обрел временное убежище на тополе недалеко от жилого массива, пока его не нашла соседская собака.', 'description': 'Маленький хищник не смог найти дорогу домой и обрел временное убежище на тополе недалеко от жилого массива, пока его не нашла соседская собака.',
'upload_date': '20150505', 'upload_date': '20150505',
} },
'playlist': [{
'md5': '9b6ef8bc0ffa25aebc8bdb40d89ab795',
'info_dict': {
'id': '153461-video1',
'ext': 'mp4',
'title': 'В Москве спасли потерявшегося медвежонка, который спрятался на дереве (Видео 1)',
'description': 'Маленький хищник не смог найти дорогу домой и обрел временное убежище на тополе недалеко от жилого массива, пока его не нашла соседская собака.',
'upload_date': '20150505',
},
}, {
'md5': 'ebb3bf3b1ce40e878d0d628e93eb0322',
'info_dict': {
'id': '153461-video2',
'ext': 'mp4',
'title': 'В Москве спасли потерявшегося медвежонка, который спрятался на дереве (Видео 2)',
'description': 'Маленький хищник не смог найти дорогу домой и обрел временное убежище на тополе недалеко от жилого массива, пока его не нашла соседская собака.',
'upload_date': '20150505',
},
}],
}, { }, {
'url': 'http://lifenews.ru/video/13035', 'url': 'http://lifenews.ru/video/13035',
'only_matching': True, 'only_matching': True,
@ -65,10 +83,14 @@ class LifeNewsIE(InfoExtractor):
'http://lifenews.ru/%s/%s' % (section, video_id), 'http://lifenews.ru/%s/%s' % (section, video_id),
video_id, 'Downloading page') video_id, 'Downloading page')
videos = re.findall(r'<video.*?poster="(?P<poster>[^"]+)".*?src="(?P<video>[^"]+)".*?></video>', webpage) video_urls = re.findall(
iframe_link = self._html_search_regex( r'<video[^>]+><source[^>]+src=["\'](.+?)["\']', webpage)
'<iframe[^>]+src=["\']([^"\']+)["\']', webpage, 'iframe link', default=None)
if not videos and not iframe_link: iframe_links = re.findall(
r'<iframe[^>]+src=["\']((?:https?:)?//embed\.life\.ru/embed/.+?)["\']',
webpage)
if not video_urls and not iframe_links:
raise ExtractorError('No media links available for %s' % video_id) raise ExtractorError('No media links available for %s' % video_id)
title = remove_end( title = remove_end(
@ -95,31 +117,44 @@ class LifeNewsIE(InfoExtractor):
'upload_date': upload_date, 'upload_date': upload_date,
} }
def make_entry(video_id, media, video_number=None): def make_entry(video_id, video_url, index=None):
cur_info = dict(common_info) cur_info = dict(common_info)
cur_info.update({ cur_info.update({
'id': video_id, 'id': video_id if not index else '%s-video%s' % (video_id, index),
'url': media[1], 'url': video_url,
'thumbnail': media[0], 'title': title if not index else '%s (Видео %s)' % (title, index),
'title': title if video_number is None else '%s-video%s' % (title, video_number),
}) })
return cur_info return cur_info
if iframe_link: def make_video_entry(video_id, video_url, index=None):
iframe_link = self._proto_relative_url(iframe_link, 'http:') video_url = compat_urlparse.urljoin(url, video_url)
cur_info = dict(common_info) return make_entry(video_id, video_url, index)
cur_info.update({
'_type': 'url_transparent', def make_iframe_entry(video_id, video_url, index=None):
'id': video_id, video_url = self._proto_relative_url(video_url, 'http:')
'title': title, cur_info = make_entry(video_id, video_url, index)
'url': iframe_link, cur_info['_type'] = 'url_transparent'
})
return cur_info return cur_info
if len(videos) == 1: if len(video_urls) == 1 and not iframe_links:
return make_entry(video_id, videos[0]) return make_video_entry(video_id, video_urls[0])
else:
return [make_entry(video_id, media, video_number + 1) for video_number, media in enumerate(videos)] if len(iframe_links) == 1 and not video_urls:
return make_iframe_entry(video_id, iframe_links[0])
entries = []
if video_urls:
for num, video_url in enumerate(video_urls, 1):
entries.append(make_video_entry(video_id, video_url, num))
if iframe_links:
for num, iframe_link in enumerate(iframe_links, len(video_urls) + 1):
entries.append(make_iframe_entry(video_id, iframe_link, num))
playlist = common_info.copy()
playlist.update(self.playlist_result(entries, video_id, title, description))
return playlist
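Articles mixing inline `<video>` tags and embed iframes now yield one playlist; iframe entries are `url_transparent`, so the article-level title, description and upload date override whatever `LifeEmbedIE` extracts, while the `-videoN` suffixes keep ids unique. A sketch of the two entry shapes (URLs hypothetical):

```python
entries = [
    {   # direct <video> source: a plain downloadable entry
        'id': '153461-video1',
        'url': 'http://static.life.ru/clip1.mp4',
        'title': 'Article title (Видео 1)',
    },
    {   # iframe embed: formats come from LifeEmbedIE, metadata from here
        '_type': 'url_transparent',
        'id': '153461-video2',
        'url': 'http://embed.life.ru/embed/some-id',
        'title': 'Article title (Видео 2)',
    },
]
```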
class LifeEmbedIE(InfoExtractor): class LifeEmbedIE(InfoExtractor):
View File
@ -14,6 +14,7 @@ from ..utils import (
xpath_with_ns, xpath_with_ns,
xpath_text, xpath_text,
orderedSet, orderedSet,
update_url_query,
int_or_none, int_or_none,
float_or_none, float_or_none,
parse_iso8601, parse_iso8601,
@ -64,7 +65,7 @@ class LivestreamIE(InfoExtractor):
def _parse_smil_formats(self, smil, smil_url, video_id, namespace=None, f4m_params=None, transform_rtmp_url=None): def _parse_smil_formats(self, smil, smil_url, video_id, namespace=None, f4m_params=None, transform_rtmp_url=None):
base_ele = find_xpath_attr( base_ele = find_xpath_attr(
smil, self._xpath_ns('.//meta', namespace), 'name', 'httpBase') smil, self._xpath_ns('.//meta', namespace), 'name', 'httpBase')
base = base_ele.get('content') if base_ele else 'http://livestreamvod-f.akamaihd.net/' base = base_ele.get('content') if base_ele is not None else 'http://livestreamvod-f.akamaihd.net/'
formats = [] formats = []
video_nodes = smil.findall(self._xpath_ns('.//video', namespace)) video_nodes = smil.findall(self._xpath_ns('.//video', namespace))
@ -72,7 +73,10 @@ class LivestreamIE(InfoExtractor):
for vn in video_nodes: for vn in video_nodes:
tbr = int_or_none(vn.attrib.get('system-bitrate'), 1000) tbr = int_or_none(vn.attrib.get('system-bitrate'), 1000)
furl = ( furl = (
'%s%s?v=3.0.3&fp=WIN%%2014,0,0,145' % (base, vn.attrib['src'])) update_url_query(compat_urlparse.urljoin(base, vn.attrib['src']), {
'v': '3.0.3',
'fp': 'WIN% 14,0,0,145',
}))
if 'clipBegin' in vn.attrib: if 'clipBegin' in vn.attrib:
furl += '&ssek=' + vn.attrib['clipBegin'] furl += '&ssek=' + vn.attrib['clipBegin']
formats.append({ formats.append({
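The `is not None` change above matters because `xml.etree` elements define truthiness by child count: an existing but childless `<meta>` is falsy, so the old `if base_ele` silently discarded a perfectly good `httpBase`. For example:

```python
import xml.etree.ElementTree as etree

meta = etree.fromstring('<meta name="httpBase" content="http://example.com/"/>')
assert len(meta) == 0    # no children...
assert meta is not None  # ...but the element certainly exists
# `if meta:` would be False here (and warns on newer Pythons), which is
# exactly the bug the explicit None comparison fixes
```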
View File
@ -0,0 +1,40 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class MakersChannelIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?makerschannel\.com/.*(?P<id_type>video|production)_id=(?P<id>[0-9]+)'
_TEST = {
'url': 'http://makerschannel.com/en/zoomin/community-highlights?video_id=849',
'md5': '624a512c6969236b5967bf9286345ad1',
'info_dict': {
'id': '849',
'ext': 'mp4',
'title': 'Landing a bus on a plane is an epic win',
'uploader': 'ZoomIn',
'description': 'md5:cd9cca2ea7b69b78be81d07020c97139',
}
}
def _real_extract(self, url):
id_type, url_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, url_id)
video_data = self._html_search_regex(r'<div([^>]+data-%s-id="%s"[^>]+)>' % (id_type, url_id), webpage, 'video data')
def extract_data_val(attr, fatal=False):
return self._html_search_regex(r'data-%s\s*=\s*"([^"]+)"' % attr, video_data, attr, fatal=fatal)
minoto_id = self._search_regex(r'/id/([a-zA-Z0-9]+)', extract_data_val('video-src', True), 'minoto id')
return {
'_type': 'url_transparent',
'url': 'minoto:%s' % minoto_id,
'id': extract_data_val('video-id', True),
'title': extract_data_val('title', True),
'description': extract_data_val('description'),
'thumbnail': extract_data_val('image'),
'uploader': extract_data_val('channel'),
}
View File
@ -14,7 +14,7 @@ from ..utils import (
class MDRIE(InfoExtractor): class MDRIE(InfoExtractor):
IE_DESC = 'MDR.DE and KiKA' IE_DESC = 'MDR.DE and KiKA'
_VALID_URL = r'https?://(?:www\.)?(?:mdr|kika)\.de/(?:.*)/[a-z]+(?P<id>\d+)(?:_.+?)?\.html' _VALID_URL = r'https?://(?:www\.)?(?:mdr|kika)\.de/(?:.*)/[a-z]+-?(?P<id>\d+)(?:_.+?)?\.html'
_TESTS = [{ _TESTS = [{
# MDR regularly deletes its videos # MDR regularly deletes its videos
@ -60,6 +60,9 @@ class MDRIE(InfoExtractor):
}, { }, {
'url': 'http://www.kika.de/sendungen/einzelsendungen/weihnachtsprogramm/einzelsendung2534.html', 'url': 'http://www.kika.de/sendungen/einzelsendungen/weihnachtsprogramm/einzelsendung2534.html',
'only_matching': True, 'only_matching': True,
}, {
'url': 'http://www.mdr.de/mediathek/mdr-videos/a/video-1334.html',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -68,8 +71,8 @@ class MDRIE(InfoExtractor):
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
data_url = self._search_regex( data_url = self._search_regex(
r'dataURL\s*:\s*(["\'])(?P<url>/.+/(?:video|audio)[0-9]+-avCustom\.xml)\1', r'(?:dataURL|playerXml(?:["\'])?)\s*:\s*(["\'])(?P<url>\\?/.+/(?:video|audio)-?[0-9]+-avCustom\.xml)\1',
webpage, 'data url', group='url') webpage, 'data url', group='url').replace('\/', '/')
doc = self._download_xml( doc = self._download_xml(
compat_urlparse.urljoin(url, data_url), video_id) compat_urlparse.urljoin(url, data_url), video_id)
View File
@ -0,0 +1,56 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import int_or_none
class MinotoIE(InfoExtractor):
_VALID_URL = r'(?:minoto:|https?://(?:play|iframe|embed)\.minoto-video\.com/(?P<player_id>[0-9]+)/)(?P<id>[a-zA-Z0-9]+)'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
player_id = mobj.group('player_id') or '1'
video_id = mobj.group('id')
video_data = self._download_json('http://play.minoto-video.com/%s/%s.js' % (player_id, video_id), video_id)
video_metadata = video_data['video-metadata']
formats = []
for fmt in video_data['video-files']:
fmt_url = fmt.get('url')
if not fmt_url:
continue
container = fmt.get('container')
if container == 'hls':
formats.extend(self._extract_m3u8_formats(fmt_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
else:
fmt_profile = fmt.get('profile') or {}
f = {
'format_id': fmt_profile.get('name-short'),
'format_note': fmt_profile.get('name'),
'url': fmt_url,
'container': container,
'tbr': int_or_none(fmt.get('bitrate')),
'filesize': int_or_none(fmt.get('filesize')),
'width': int_or_none(fmt.get('width')),
'height': int_or_none(fmt.get('height')),
}
codecs = fmt.get('codecs')
if codecs:
codecs = codecs.split(',')
if len(codecs) == 2:
f.update({
'vcodec': codecs[0],
'acodec': codecs[1],
})
formats.append(f)
self._sort_formats(formats)
return {
'id': video_id,
'title': video_metadata['title'],
'description': video_metadata.get('description'),
'thumbnail': video_metadata.get('video-poster', {}).get('url'),
'formats': formats,
}
View File
@ -99,7 +99,7 @@ class OCWMITIE(InfoExtractor):
'url': 'http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-041-probabilistic-systems-analysis-and-applied-probability-fall-2010/video-lectures/lecture-7-multiple-variables-expectations-independence/', 'url': 'http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-041-probabilistic-systems-analysis-and-applied-probability-fall-2010/video-lectures/lecture-7-multiple-variables-expectations-independence/',
'info_dict': { 'info_dict': {
'id': 'EObHWIEKGjA', 'id': 'EObHWIEKGjA',
'ext': 'mp4', 'ext': 'webm',
'title': 'Lecture 7: Multiple Discrete Random Variables: Expectations, Conditioning, Independence', 'title': 'Lecture 7: Multiple Discrete Random Variables: Expectations, Conditioning, Independence',
'description': 'In this lecture, the professor discussed multiple random variables, expectations, and binomial distribution.', 'description': 'In this lecture, the professor discussed multiple random variables, expectations, and binomial distribution.',
'upload_date': '20121109', 'upload_date': '20121109',
View File
@ -7,6 +7,7 @@ from ..compat import compat_urllib_parse_unquote
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
HEADRequest, HEADRequest,
parse_count,
str_to_int, str_to_int,
) )
@ -85,8 +86,8 @@ class MixcloudIE(InfoExtractor):
uploader_id = self._search_regex( uploader_id = self._search_regex(
r'\s+"profile": "([^"]+)",', webpage, 'uploader id', fatal=False) r'\s+"profile": "([^"]+)",', webpage, 'uploader id', fatal=False)
description = self._og_search_description(webpage) description = self._og_search_description(webpage)
like_count = str_to_int(self._search_regex( like_count = parse_count(self._search_regex(
r'\bbutton-favorite\b[^>]+m-ajax-toggle-count="([^"]+)"', r'\bbutton-favorite[^>]+>.*?<span[^>]+class=["\']toggle-number[^>]+>\s*([^<]+)',
webpage, 'like count', fatal=False)) webpage, 'like count', fatal=False))
view_count = str_to_int(self._search_regex( view_count = str_to_int(self._search_regex(
[r'<meta itemprop="interactionCount" content="UserPlays:([0-9]+)"', [r'<meta itemprop="interactionCount" content="UserPlays:([0-9]+)"',
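`str_to_int` only strips thousands separators, so it fails on Mixcloud's abbreviated counters; `parse_count` understands the suffix notation. The gist of the difference (a simplified stand-in, not the library implementation):

```python
def parse_count(s):
    # simplified: the real helper also knows 'm'/'b' and bare integers
    s = s.strip().lower()
    if s.endswith('k'):
        return int(float(s[:-1]) * 1000)
    return int(s.replace(',', ''))

assert parse_count('2.6k') == 2600
assert parse_count('1,943') == 1943
```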
View File
@ -6,6 +6,7 @@ from ..compat import compat_urllib_parse_unquote
from ..utils import ( from ..utils import (
int_or_none, int_or_none,
xpath_text, xpath_text,
update_url_query,
) )
@ -46,17 +47,29 @@ class NozIE(InfoExtractor):
doc, './/article/movie/file/duration')) doc, './/article/movie/file/duration'))
formats = [] formats = []
for qnode in doc.findall('.//article/movie/file/qualities/qual'): for qnode in doc.findall('.//article/movie/file/qualities/qual'):
video_node = qnode.find('./html_urls/video_url[@format="video/mp4"]') http_url = xpath_text(
if video_node is None: qnode, './html_urls/video_url[@format="video/mp4"]')
continue # auto if http_url:
formats.append({ formats.append({
'url': video_node.text, 'url': http_url,
'format_name': xpath_text(qnode, './name'), 'format_name': xpath_text(qnode, './name'),
'format_id': xpath_text(qnode, './id'), 'format_id': '%s-%s' % ('http', xpath_text(qnode, './id')),
'height': int_or_none(xpath_text(qnode, './height')), 'height': int_or_none(xpath_text(qnode, './height')),
'width': int_or_none(xpath_text(qnode, './width')), 'width': int_or_none(xpath_text(qnode, './width')),
'tbr': int_or_none(xpath_text(qnode, './bitrate'), scale=1000), 'tbr': int_or_none(xpath_text(qnode, './bitrate'), scale=1000),
}) })
else:
f4m_url = xpath_text(qnode, 'url_hd2')
if f4m_url:
formats.extend(self._extract_f4m_formats(
update_url_query(f4m_url, {'hdcore': '3.4.0'}),
video_id, f4m_id='hds', fatal=False))
m3u8_url = xpath_text(
qnode, './html_urls/video_url[@format="application/vnd.apple.mpegurl"]')
if m3u8_url:
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', 'm3u8_native',
m3u8_id='hls', fatal=False))
self._sort_formats(formats) self._sort_formats(formats)
return { return {
View File
@@ -4,7 +4,10 @@ from __future__ import unicode_literals
 import re
 from .common import InfoExtractor
-from ..compat import compat_urlparse
+from ..compat import (
+    compat_urlparse,
+    compat_urllib_parse_unquote,
+)
 from ..utils import (
     determine_ext,
     ExtractorError,
@@ -87,7 +90,7 @@ class NRKIE(InfoExtractor):
 class NRKPlaylistIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?nrk\.no/(?!video)(?:[^/]+/)+(?P<id>[^/]+)'
+    _VALID_URL = r'https?://(?:www\.)?nrk\.no/(?!video|skole)(?:[^/]+/)+(?P<id>[^/]+)'
     _TESTS = [{
         'url': 'http://www.nrk.no/troms/gjenopplev-den-historiske-solformorkelsen-1.12270763',
@@ -126,6 +129,37 @@ class NRKPlaylistIE(InfoExtractor):
             entries, playlist_id, playlist_title, playlist_description)
+
+
+class NRKSkoleIE(InfoExtractor):
+    IE_DESC = 'NRK Skole'
+    _VALID_URL = r'https?://(?:www\.)?nrk\.no/skole/klippdetalj?.*\btopic=(?P<id>[^/?#&]+)'
+
+    _TESTS = [{
+        'url': 'http://nrk.no/skole/klippdetalj?topic=nrk:klipp/616532',
+        'md5': '04cd85877cc1913bce73c5d28a47e00f',
+        'info_dict': {
+            'id': '6021',
+            'ext': 'flv',
+            'title': 'Genetikk og eneggede tvillinger',
+            'description': 'md5:3aca25dcf38ec30f0363428d2b265f8d',
+            'duration': 399,
+        },
+    }, {
+        'url': 'http://www.nrk.no/skole/klippdetalj?topic=nrk%3Aklipp%2F616532#embed',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.nrk.no/skole/klippdetalj?topic=urn:x-mediadb:21379',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = compat_urllib_parse_unquote(self._match_id(url))
+
+        webpage = self._download_webpage(url, video_id)
+
+        nrk_id = self._search_regex(r'data-nrk-id=["\'](\d+)', webpage, 'nrk id')
+        return self.url_result('nrk:%s' % nrk_id)
+
+
 class NRKTVIE(InfoExtractor):
     IE_DESC = 'NRK TV and NRK Radio'
     _VALID_URL = r'(?P<baseurl>https?://(?:tv|radio)\.nrk(?:super)?\.no/)(?:serie/[^/]+|program)/(?P<id>[a-zA-Z]{4}\d{8})(?:/\d{2}-\d{2}-\d{4})?(?:#del=(?P<part_id>\d+))?'
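
Note on NRKSkoleIE above: the topic query parameter may arrive percent-encoded (see the second test URL), so the ID is unquoted before use; the extractor then resolves the page to an internal 'nrk:' URL and lets NRKIE do the actual extraction:

    from youtube_dl.compat import compat_urllib_parse_unquote

    print(compat_urllib_parse_unquote('nrk%3Aklipp%2F616532'))
    # -> nrk:klipp/616532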

View File

@@ -12,14 +12,14 @@ class PyvideoIE(InfoExtractor):
     _TESTS = [
         {
             'url': 'http://pyvideo.org/video/1737/become-a-logging-expert-in-30-minutes',
-            'md5': 'de317418c8bc76b1fd8633e4f32acbc6',
+            'md5': '520915673e53a5c5d487c36e0c4d85b5',
             'info_dict': {
                 'id': '24_4WWkSmNo',
-                'ext': 'mp4',
+                'ext': 'webm',
                 'title': 'Become a logging expert in 30 minutes',
                 'description': 'md5:9665350d466c67fb5b1598de379021f7',
                 'upload_date': '20130320',
-                'uploader': 'NextDayVideo',
+                'uploader': 'Next Day Video',
                 'uploader_id': 'NextDayVideo',
             },
             'add_ie': ['Youtube'],

View File

@@ -19,7 +19,7 @@ class Revision3IE(InfoExtractor):
         'url': 'http://www.revision3.com/technobuffalo/5-google-predictions-for-2016',
         'md5': 'd94a72d85d0a829766de4deb8daaf7df',
         'info_dict': {
-            'id': '73034',
+            'id': '71089',
             'display_id': 'technobuffalo/5-google-predictions-for-2016',
             'ext': 'webm',
             'title': '5 Google Predictions for 2016',
@@ -31,6 +31,7 @@ class Revision3IE(InfoExtractor):
             'uploader_id': 'technobuffalo',
         }
     }, {
+        # Show
         'url': 'http://testtube.com/brainstuff',
         'info_dict': {
             'id': '251',
@@ -41,7 +42,7 @@ class Revision3IE(InfoExtractor):
     }, {
         'url': 'https://testtube.com/dnews/5-weird-ways-plants-can-eat-animals?utm_source=FB&utm_medium=DNews&utm_campaign=DNewsSocial',
         'info_dict': {
-            'id': '60163',
+            'id': '58227',
             'display_id': 'dnews/5-weird-ways-plants-can-eat-animals',
             'duration': 275,
             'ext': 'webm',
@ -52,18 +53,72 @@ class Revision3IE(InfoExtractor):
'uploader': 'DNews', 'uploader': 'DNews',
'uploader_id': 'dnews', 'uploader_id': 'dnews',
}, },
}, {
'url': 'http://testtube.com/tt-editors-picks/the-israel-palestine-conflict-explained-in-ten-min',
'info_dict': {
'id': '71618',
'ext': 'mp4',
'display_id': 'tt-editors-picks/the-israel-palestine-conflict-explained-in-ten-min',
'title': 'The Israel-Palestine Conflict Explained in Ten Minutes',
'description': 'If you\'d like to learn about the struggle between Israelis and Palestinians, this video is a great place to start',
'uploader': 'Editors\' Picks',
'uploader_id': 'tt-editors-picks',
'timestamp': 1453309200,
'upload_date': '20160120',
},
'add_ie': ['Youtube'],
}, {
# Tag
'url': 'http://testtube.com/tech-news',
'info_dict': {
'id': '21018',
'title': 'tech news',
},
'playlist_mincount': 9,
}] }]
_PAGE_DATA_TEMPLATE = 'http://www.%s/apiProxy/ddn/%s?domain=%s' _PAGE_DATA_TEMPLATE = 'http://www.%s/apiProxy/ddn/%s?domain=%s'
_API_KEY = 'ba9c741bce1b9d8e3defcc22193f3651b8867e62' _API_KEY = 'ba9c741bce1b9d8e3defcc22193f3651b8867e62'
def _real_extract(self, url): def _real_extract(self, url):
domain, display_id = re.match(self._VALID_URL, url).groups() domain, display_id = re.match(self._VALID_URL, url).groups()
site = domain.split('.')[0]
page_info = self._download_json( page_info = self._download_json(
self._PAGE_DATA_TEMPLATE % (domain, display_id, domain), display_id) self._PAGE_DATA_TEMPLATE % (domain, display_id, domain), display_id)
if page_info['data']['type'] == 'episode': page_data = page_info['data']
episode_data = page_info['data'] page_type = page_data['type']
video_id = compat_str(episode_data['video']['data']['id']) if page_type in ('episode', 'embed'):
show_data = page_data['show']['data']
page_id = compat_str(page_data['id'])
video_id = compat_str(page_data['video']['data']['id'])
preference = qualities(['mini', 'small', 'medium', 'large'])
thumbnails = [{
'url': image_url,
'id': image_id,
'preference': preference(image_id)
} for image_id, image_url in page_data.get('images', {}).items()]
info = {
'id': page_id,
'display_id': display_id,
'title': unescapeHTML(page_data['name']),
'description': unescapeHTML(page_data.get('summary')),
'timestamp': parse_iso8601(page_data.get('publishTime'), ' '),
'author': page_data.get('author'),
'uploader': show_data.get('name'),
'uploader_id': show_data.get('slug'),
'thumbnails': thumbnails,
'extractor_key': site,
}
if page_type == 'embed':
info.update({
'_type': 'url_transparent',
'url': page_data['video']['data']['embed'],
})
return info
video_data = self._download_json( video_data = self._download_json(
'http://revision3.com/api/getPlaylist.json?api_key=%s&codecs=h264,vp8,theora&video_id=%s' % (self._API_KEY, video_id), 'http://revision3.com/api/getPlaylist.json?api_key=%s&codecs=h264,vp8,theora&video_id=%s' % (self._API_KEY, video_id),
video_id)['items'][0] video_id)['items'][0]
@@ -84,36 +139,30 @@ class Revision3IE(InfoExtractor):
                 })
             self._sort_formats(formats)

-            preference = qualities(['mini', 'small', 'medium', 'large'])
-            thumbnails = [{
-                'url': image_url,
-                'id': image_id,
-                'preference': preference(image_id)
-            } for image_id, image_url in video_data.get('images', {}).items()]
-
-            return {
-                'id': video_id,
-                'display_id': display_id,
+            info.update({
                 'title': unescapeHTML(video_data['title']),
                 'description': unescapeHTML(video_data.get('summary')),
-                'timestamp': parse_iso8601(episode_data.get('publishTime'), ' '),
-                'author': episode_data.get('author'),
                 'uploader': video_data.get('show', {}).get('name'),
                 'uploader_id': video_data.get('show', {}).get('slug'),
                 'duration': int_or_none(video_data.get('duration')),
-                'thumbnails': thumbnails,
                 'formats': formats,
-            }
+            })
+            return info
         else:
-            show_data = page_info['show']['data']
+            list_data = page_info[page_type]['data']
             episodes_data = page_info['episodes']['data']
             num_episodes = page_info['meta']['totalEpisodes']
             processed_episodes = 0
             entries = []
             page_num = 1
             while True:
-                entries.extend([self.url_result(
-                    'http://%s/%s/%s' % (domain, display_id, episode['slug'])) for episode in episodes_data])
+                entries.extend([{
+                    '_type': 'url',
+                    'url': 'http://%s%s' % (domain, episode['path']),
+                    'id': compat_str(episode['id']),
+                    'ie_key': 'Revision3',
+                    'extractor_key': site,
+                } for episode in episodes_data])
                 processed_episodes += len(episodes_data)
                 if processed_episodes == num_episodes:
                     break
@@ -123,5 +172,5 @@ class Revision3IE(InfoExtractor):
                 display_id)['episodes']['data']

         return self.playlist_result(
-            entries, compat_str(show_data['id']),
-            show_data.get('name'), show_data.get('summary'))
+            entries, compat_str(list_data['id']),
+            list_data.get('name'), list_data.get('summary'))
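
Note on the thumbnail handling above: qualities() turns an ordered list of quality IDs into a preference function, so 'large' thumbnails sort ahead of 'mini' ones. A minimal sketch, assuming youtube_dl's utils:

    from youtube_dl.utils import qualities

    preference = qualities(['mini', 'small', 'medium', 'large'])
    print(preference('large'))    # 3 - most preferred
    print(preference('mini'))     # 0
    print(preference('unknown'))  # -1 - unlisted IDs sort last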

View File

@@ -0,0 +1,116 @@
# coding: utf-8
from __future__ import unicode_literals

import re

from .common import InfoExtractor
from ..compat import compat_parse_qs
from ..utils import (
    xpath_text,
    xpath_element,
    int_or_none,
    parse_iso8601,
    ExtractorError,
)


class RICEIE(InfoExtractor):
    _VALID_URL = r'https?://mediahub\.rice\.edu/app/[Pp]ortal/video\.aspx\?(?P<query>.+)'
    _TEST = {
        'url': 'https://mediahub.rice.edu/app/Portal/video.aspx?PortalID=25ffd62c-3d01-4b29-8c70-7c94270efb3e&DestinationID=66bc9434-03bd-4725-b47e-c659d8d809db&ContentID=YEWIvbhb40aqdjMD1ALSqw',
        'md5': '9b83b4a2eead4912dc3b7fac7c449b6a',
        'info_dict': {
            'id': 'YEWIvbhb40aqdjMD1ALSqw',
            'ext': 'mp4',
            'title': 'Active Learning in Archeology',
            'upload_date': '20140616',
            'timestamp': 1402926346,
        }
    }
    _NS = 'http://schemas.datacontract.org/2004/07/ensembleVideo.Data.Service.Contracts.Models.Player.Config'

    def _real_extract(self, url):
        qs = compat_parse_qs(re.match(self._VALID_URL, url).group('query'))
        if not qs.get('PortalID') or not qs.get('DestinationID') or not qs.get('ContentID'):
            raise ExtractorError('Invalid URL', expected=True)

        portal_id = qs['PortalID'][0]
        playlist_id = qs['DestinationID'][0]
        content_id = qs['ContentID'][0]

        content_data = self._download_xml('https://mediahub.rice.edu/api/portal/GetContentTitle', content_id, query={
            'portalId': portal_id,
            'playlistId': playlist_id,
            'contentId': content_id
        })
        metadata = xpath_element(content_data, './/metaData', fatal=True)
        title = xpath_text(metadata, 'primaryTitle', fatal=True)
        encodings = xpath_element(content_data, './/encodings', fatal=True)
        player_data = self._download_xml('https://mediahub.rice.edu/api/player/GetPlayerConfig', content_id, query={
            'temporaryLinkId': xpath_text(encodings, 'temporaryLinkId', fatal=True),
            'contentId': content_id,
        })

        common_fmt = {}
        dimensions = xpath_text(encodings, 'dimensions')
        if dimensions:
            wh = dimensions.split('x')
            if len(wh) == 2:
                common_fmt.update({
                    'width': int_or_none(wh[0]),
                    'height': int_or_none(wh[1]),
                })

        formats = []
        rtsp_path = xpath_text(player_data, self._xpath_ns('RtspPath', self._NS))
        if rtsp_path:
            fmt = {
                'url': rtsp_path,
                'format_id': 'rtsp',
            }
            fmt.update(common_fmt)
            formats.append(fmt)
        for source in player_data.findall(self._xpath_ns('.//Source', self._NS)):
            video_url = xpath_text(source, self._xpath_ns('File', self._NS))
            if not video_url:
                continue
            if '.m3u8' in video_url:
                formats.extend(self._extract_m3u8_formats(video_url, content_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
            else:
                fmt = {
                    'url': video_url,
                    'format_id': video_url.split(':')[0],
                }
                fmt.update(common_fmt)
                rtmp = re.search(r'^(?P<url>rtmp://[^/]+/(?P<app>.+))/(?P<playpath>mp4:.+)$', video_url)
                if rtmp:
                    fmt.update({
                        'url': rtmp.group('url'),
                        'play_path': rtmp.group('playpath'),
                        'app': rtmp.group('app'),
                        'ext': 'flv',
                    })
                formats.append(fmt)
        self._sort_formats(formats)

        thumbnails = []
        for content_asset in content_data.findall('.//contentAssets'):
            asset_type = xpath_text(content_asset, 'type')
            if asset_type == 'image':
                image_url = xpath_text(content_asset, 'httpPath')
                if not image_url:
                    continue
                thumbnails.append({
                    'id': xpath_text(content_asset, 'ID'),
                    'url': image_url,
                })

        return {
            'id': content_id,
            'title': title,
            'description': xpath_text(metadata, 'abstract'),
            'duration': int_or_none(xpath_text(metadata, 'duration')),
            'timestamp': parse_iso8601(xpath_text(metadata, 'dateUpdated')),
            'thumbnails': thumbnails,
            'formats': formats,
        }
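
Note on the RTMP handling above: the regex splits a single RTMP URL into the connection URL, application and play path that the downloader needs. For a hypothetical URL of the expected shape:

    import re

    video_url = 'rtmp://fms.example.edu/ondemand/mp4:videos/active_learning.mp4'
    rtmp = re.search(r'^(?P<url>rtmp://[^/]+/(?P<app>.+))/(?P<playpath>mp4:.+)$', video_url)
    print(rtmp.group('url'))       # rtmp://fms.example.edu/ondemand
    print(rtmp.group('app'))       # ondemand
    print(rtmp.group('playpath'))  # mp4:videos/active_learning.mp4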

View File

@@ -10,6 +10,7 @@ from ..utils import (
     ExtractorError,
     float_or_none,
     remove_end,
+    remove_start,
     sanitized_Request,
     std_headers,
     struct_unpack,
@@ -178,14 +179,14 @@ class RTVEInfantilIE(InfoExtractor):
 class RTVELiveIE(InfoExtractor):
     IE_NAME = 'rtve.es:live'
     IE_DESC = 'RTVE.es live streams'
-    _VALID_URL = r'http://www\.rtve\.es/(?:deportes/directo|noticias|television)/(?P<id>[a-zA-Z0-9-]+)'
+    _VALID_URL = r'http://www\.rtve\.es/directo/(?P<id>[a-zA-Z0-9-]+)'

     _TESTS = [{
-        'url': 'http://www.rtve.es/noticias/directo-la-1/',
+        'url': 'http://www.rtve.es/directo/la-1/',
         'info_dict': {
-            'id': 'directo-la-1',
-            'ext': 'flv',
-            'title': 're:^La 1 de TVE [0-9]{4}-[0-9]{2}-[0-9]{2}Z[0-9]{6}$',
+            'id': 'la-1',
+            'ext': 'mp4',
+            'title': 're:^La 1 [0-9]{4}-[0-9]{2}-[0-9]{2}Z[0-9]{6}$',
         },
         'params': {
             'skip_download': 'live stream',
@@ -198,23 +199,20 @@ class RTVELiveIE(InfoExtractor):
         video_id = mobj.group('id')
         webpage = self._download_webpage(url, video_id)

-        player_url = self._search_regex(
-            r'<param name="movie" value="([^"]+)"/>', webpage, 'player URL')
-        title = remove_end(self._og_search_title(webpage), ' en directo')
+        title = remove_end(self._og_search_title(webpage), ' en directo en RTVE.es')
+        title = remove_start(title, 'Estoy viendo ')
         title += ' ' + time.strftime('%Y-%m-%dZ%H%M%S', start_time)

         vidplayer_id = self._search_regex(
-            r' id="vidplayer([0-9]+)"', webpage, 'internal video ID')
-        png_url = 'http://www.rtve.es/ztnr/movil/thumbnail/default/videos/%s.png' % vidplayer_id
+            r'playerId=player([0-9]+)', webpage, 'internal video ID')
+        png_url = 'http://www.rtve.es/ztnr/movil/thumbnail/amonet/videos/%s.png' % vidplayer_id
         png = self._download_webpage(png_url, video_id, 'Downloading url information')
-        video_url = _decrypt_url(png)
+        m3u8_url = _decrypt_url(png)
+        formats = self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4')

         return {
             'id': video_id,
-            'ext': 'flv',
             'title': title,
-            'url': video_url,
-            'app': 'rtve-live-live?ovpfv=2.1.2',
-            'player_url': player_url,
-            'rtmp_live': True,
+            'formats': formats,
+            'is_live': True,
         }
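
Note on the title cleanup above: the og:title now carries both a prefix and a suffix, which are stripped in two steps (sample title hypothetical):

    from youtube_dl.utils import remove_end, remove_start

    title = 'Estoy viendo La 1 en directo en RTVE.es'
    title = remove_end(title, ' en directo en RTVE.es')
    title = remove_start(title, 'Estoy viendo ')
    print(title)  # La 1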

View File

@@ -4,14 +4,13 @@ from __future__ import unicode_literals
 import re

 from .common import InfoExtractor
-from .brightcove import BrightcoveLegacyIE
 from ..utils import (
     ExtractorError,
     sanitized_Request,
-    smuggle_url,
     std_headers,
     urlencode_postdata,
+    update_url_query,
 )
@@ -20,28 +19,30 @@ class SafariBaseIE(InfoExtractor):
     _SUCCESSFUL_LOGIN_REGEX = r'<a href="/accounts/logout/"[^>]*>Sign Out</a>'
     _NETRC_MACHINE = 'safari'

-    _API_BASE = 'https://www.safaribooksonline.com/api/v1/book'
+    _API_BASE = 'https://www.safaribooksonline.com/api/v1'
     _API_FORMAT = 'json'

     LOGGED_IN = False

     def _real_initialize(self):
-        # We only need to log in once for courses or individual videos
-        if not self.LOGGED_IN:
-            self._login()
-            SafariBaseIE.LOGGED_IN = True
+        self._login()

     def _login(self):
+        # We only need to log in once for courses or individual videos
+        if self.LOGGED_IN:
+            return
+
         (username, password) = self._get_login_info()
         if username is None:
-            self.raise_login_required('safaribooksonline.com account is required')
+            return

-        headers = std_headers
+        headers = std_headers.copy()
         if 'Referer' not in headers:
             headers['Referer'] = self._LOGIN_URL
+        login_page_request = sanitized_Request(self._LOGIN_URL, headers=headers)

         login_page = self._download_webpage(
-            self._LOGIN_URL, None,
+            login_page_request, None,
             'Downloading login form')

         csrf = self._html_search_regex(
@@ -66,6 +67,8 @@ class SafariBaseIE(InfoExtractor):
                 'Login failed; make sure your credentials are correct and try again.',
                 expected=True)

+        SafariBaseIE.LOGGED_IN = True
+
         self.to_screen('Login successful')
@@ -85,13 +88,15 @@ class SafariIE(SafariBaseIE):
     _TESTS = [{
         'url': 'https://www.safaribooksonline.com/library/view/hadoop-fundamentals-livelessons/9780133392838/part00.html',
-        'md5': '5b0c4cc1b3c1ba15dda7344085aa5592',
+        'md5': 'dcc5a425e79f2564148652616af1f2a3',
         'info_dict': {
-            'id': '2842601850001',
+            'id': '0_qbqx90ic',
             'ext': 'mp4',
-            'title': 'Introduction',
+            'title': 'Introduction to Hadoop Fundamentals LiveLessons',
+            'timestamp': 1437758058,
+            'upload_date': '20150724',
+            'uploader_id': 'stork',
         },
-        'skip': 'Requires safaribooksonline account credentials',
     }, {
         'url': 'https://www.safaribooksonline.com/api/v1/book/9780133392838/chapter/part00.html',
         'only_matching': True,
@@ -106,15 +111,30 @@ class SafariIE(SafariBaseIE):
         course_id = mobj.group('course_id')
         part = mobj.group('part')

-        webpage = self._download_webpage(
-            '%s/%s/chapter-content/%s.html' % (self._API_BASE, course_id, part),
-            part)
-
-        bc_url = BrightcoveLegacyIE._extract_brightcove_url(webpage)
-        if not bc_url:
-            raise ExtractorError('Could not extract Brightcove URL from %s' % url, expected=True)
-
-        return self.url_result(smuggle_url(bc_url, {'Referer': url}), 'BrightcoveLegacy')
+        webpage = self._download_webpage(url, '%s/%s' % (course_id, part))
+        reference_id = self._search_regex(r'data-reference-id="([^"]+)"', webpage, 'kaltura reference id')
+        partner_id = self._search_regex(r'data-partner-id="([^"]+)"', webpage, 'kaltura widget id')
+        ui_id = self._search_regex(r'data-ui-id="([^"]+)"', webpage, 'kaltura uiconf id')
+
+        query = {
+            'wid': '_%s' % partner_id,
+            'uiconf_id': ui_id,
+            'flashvars[referenceId]': reference_id,
+        }
+
+        if self.LOGGED_IN:
+            kaltura_session = self._download_json(
+                '%s/player/kaltura_session/?reference_id=%s' % (self._API_BASE, reference_id),
+                course_id, 'Downloading kaltura session JSON',
+                'Unable to download kaltura session JSON', fatal=False)
+            if kaltura_session:
+                session = kaltura_session.get('session')
+                if session:
+                    query['flashvars[ks]'] = session
+
+        return self.url_result(update_url_query(
+            'https://cdnapisec.kaltura.com/html5/html5lib/v2.37.1/mwEmbedFrame.php', query),
+            'Kaltura')


 class SafariCourseIE(SafariBaseIE):
@@ -140,7 +160,7 @@ class SafariCourseIE(SafariBaseIE):
         course_id = self._match_id(url)

         course_json = self._download_json(
-            '%s/%s/?override_format=%s' % (self._API_BASE, course_id, self._API_FORMAT),
+            '%s/book/%s/?override_format=%s' % (self._API_BASE, course_id, self._API_FORMAT),
             course_id, 'Downloading course JSON')

         if 'chapters' not in course_json:
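
Note on the Kaltura hand-off above: the scraped IDs are assembled into an mwEmbedFrame URL that KalturaIE understands, with the session token only added for logged-in users. Roughly, with hypothetical IDs:

    from youtube_dl.utils import update_url_query

    query = {
        'wid': '_1926081',        # partner ID, prefixed with '_' (hypothetical)
        'uiconf_id': '29375172',  # player UI configuration (hypothetical)
        'flashvars[referenceId]': '9780133392838-part00',
    }
    # Produces the query-string form of the embed URL for KalturaIE to handle
    print(update_url_query(
        'https://cdnapisec.kaltura.com/html5/html5lib/v2.37.1/mwEmbedFrame.php',
        query))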

View File

@@ -70,25 +70,27 @@ class ScreenwaveMediaIE(InfoExtractor):
         formats = []
         for source in sources:
-            if source['type'] == 'hls':
-                formats.extend(self._extract_m3u8_formats(source['file'], video_id, ext='mp4'))
+            file_ = source.get('file')
+            if not file_:
+                continue
+            if source.get('type') == 'hls':
+                formats.extend(self._extract_m3u8_formats(file_, video_id, ext='mp4'))
             else:
-                file_ = source.get('file')
-                if not file_:
-                    continue
-                format_label = source.get('label')
                 format_id = self._search_regex(
                     r'_(.+?)\.[^.]+$', file_, 'format id', default=None)
+                if not self._is_valid_url(file_, video_id, format_id or 'video'):
+                    continue
+                format_label = source.get('label')
                 height = int_or_none(self._search_regex(
                     r'^(\d+)[pP]', format_label, 'height', default=None))
                 formats.append({
-                    'url': source['file'],
+                    'url': file_,
                     'format_id': format_id,
                     'format': format_label,
                     'ext': source.get('type'),
                     'height': height,
                 })
-        self._sort_formats(formats)
+        self._sort_formats(formats, field_preference=('height', 'width', 'tbr', 'format_id'))

         return {
             'id': video_id,

View File

@@ -1,7 +1,5 @@
 from __future__ import unicode_literals

-import re
-
 from .common import InfoExtractor
@@ -14,7 +12,7 @@ class SexuIE(InfoExtractor):
             'id': '961791',
             'ext': 'mp4',
             'title': 'md5:4d05a19a5fc049a63dbbaf05fb71d91b',
-            'description': 'md5:c5ed8625eb386855d5a7967bd7b77a54',
+            'description': 'md5:2b75327061310a3afb3fbd7d09e2e403',
             'categories': list,  # NSFW
             'thumbnail': 're:https?://.*\.jpg$',
             'age_limit': 18,
@@ -25,13 +23,18 @@ class SexuIE(InfoExtractor):
         video_id = self._match_id(url)
         webpage = self._download_webpage(url, video_id)

-        quality_arr = self._search_regex(
-            r'sources:\s*\[([^\]]+)\]', webpage, 'forrmat string')
+        jwvideo = self._parse_json(
+            self._search_regex(r'\.setup\(\s*({.+?})\s*\);', webpage, 'jwvideo'),
+            video_id)
+
+        sources = jwvideo['sources']
+
         formats = [{
-            'url': fmt[0].replace('\\', ''),
-            'format_id': fmt[1],
-            'height': int(fmt[1][:3]),
-        } for fmt in re.findall(r'"file":"([^"]+)","label":"([^"]+)"', quality_arr)]
+            'url': source['file'].replace('\\', ''),
+            'format_id': source.get('label'),
+            'height': self._search_regex(
+                r'^(\d+)[pP]', source.get('label', ''), 'height', default=None),
+        } for source in sources if source.get('file')]
         self._sort_formats(formats)

         title = self._html_search_regex(
@@ -40,9 +43,7 @@ class SexuIE(InfoExtractor):
         description = self._html_search_meta(
             'description', webpage, 'description')

-        thumbnail = self._html_search_regex(
-            r'image:\s*"([^"]+)"',
-            webpage, 'thumbnail', fatal=False)
+        thumbnail = jwvideo.get('image')

         categories_str = self._html_search_meta(
             'keywords', webpage, 'categories')
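
Note on the change above: instead of scraping individual fields, the whole jwplayer setup() argument is now parsed as JSON. The same idea with the stdlib (embed markup hypothetical):

    import json
    import re

    webpage = 'jwplayer("player").setup( {"sources": [{"file": "http://cdn.example.com/v.mp4", "label": "720p"}], "image": "http://cdn.example.com/poster.jpg"} );'
    jwvideo = json.loads(re.search(r'\.setup\(\s*({.+?})\s*\);', webpage).group(1))
    print(jwvideo['sources'][0]['label'])  # 720p
    print(jwvideo['image'])                # http://cdn.example.com/poster.jpg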

View File

@@ -1,38 +0,0 @@
from __future__ import unicode_literals

import re

from .common import InfoExtractor
from .brightcove import BrightcoveLegacyIE
from ..utils import RegexNotFoundError, ExtractorError


class SpaceIE(InfoExtractor):
    _VALID_URL = r'https?://(?:(?:www|m)\.)?space\.com/\d+-(?P<title>[^/\.\?]*?)-video\.html'
    _TEST = {
        'add_ie': ['BrightcoveLegacy'],
        'url': 'http://www.space.com/23373-huge-martian-landforms-detail-revealed-by-european-probe-video.html',
        'info_dict': {
            'id': '2780937028001',
            'ext': 'mp4',
            'title': 'Huge Martian Landforms\' Detail Revealed By European Probe | Video',
            'description': 'md5:db81cf7f3122f95ed234b631a6ea1e61',
            'uploader': 'TechMedia Networks',
        },
    }

    def _real_extract(self, url):
        mobj = re.match(self._VALID_URL, url)
        title = mobj.group('title')
        webpage = self._download_webpage(url, title)
        try:
            # Some videos require the playerKey field, which isn't define in
            # the BrightcoveExperience object
            brightcove_url = self._og_search_video_url(webpage)
        except RegexNotFoundError:
            # Other videos works fine with the info from the object
            brightcove_url = BrightcoveLegacyIE._extract_brightcove_url(webpage)
        if brightcove_url is None:
            raise ExtractorError(
                'The webpage does not contain a video', expected=True)
        return self.url_result(brightcove_url, BrightcoveLegacyIE.ie_key())

View File

@@ -73,7 +73,7 @@ class TEDIE(InfoExtractor):
         'add_ie': ['Youtube'],
         'info_dict': {
             'id': '_ZG8HBuDjgc',
-            'ext': 'mp4',
+            'ext': 'webm',
             'title': 'Douglas Adams: Parrots the Universe and Everything',
             'description': 'md5:01ad1e199c49ac640cb1196c0e9016af',
             'uploader': 'University of California Television (UCTV)',

View File

@@ -48,6 +48,6 @@ class TF1IE(InfoExtractor):
         video_id = self._match_id(url)
         webpage = self._download_webpage(url, video_id)
         wat_id = self._html_search_regex(
-            r'(["\'])(?:https?:)?//www\.wat\.tv/embedframe/.*?(?P<id>\d{8})\1',
+            r'(["\'])(?:https?:)?//www\.wat\.tv/embedframe/.*?(?P<id>\d{8})(?:#.*?)?\1',
             webpage, 'wat id', group='id')
         return self.url_result('wat:%s' % wat_id, 'Wat')

View File

@@ -22,6 +22,7 @@ from ..utils import (
     unsmuggle_url,
     xpath_with_ns,
     mimetype2ext,
+    find_xpath_attr,
 )

 default_ns = 'http://www.w3.org/2005/SMIL21/Language'
@@ -31,15 +32,11 @@ _x = lambda p: xpath_with_ns(p, {'smil': default_ns})
 class ThePlatformBaseIE(InfoExtractor):
     def _extract_theplatform_smil(self, smil_url, video_id, note='Downloading SMIL data'):
         meta = self._download_xml(smil_url, video_id, note=note)
-        try:
-            error_msg = next(
-                n.attrib['abstract']
-                for n in meta.findall(_x('.//smil:ref'))
-                if n.attrib.get('title') == 'Geographic Restriction' or n.attrib.get('title') == 'Expired')
-        except StopIteration:
-            pass
-        else:
-            raise ExtractorError(error_msg, expected=True)
+        error_element = find_xpath_attr(
+            meta, _x('.//smil:ref'), 'src',
+            'http://link.theplatform.com/s/errorFiles/Unavailable.mp4')
+        if error_element is not None:
+            raise ExtractorError(error_element.attrib['abstract'], expected=True)

         formats = self._parse_smil_formats(
             meta, smil_url, video_id, namespace=default_ns,
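
Note on the change above: thePlatform signals unavailable videos with a <ref> whose src points at a well-known error file, so a single attribute lookup replaces the old title-based scan. A minimal sketch, namespace-free for brevity and assuming youtube_dl's find_xpath_attr helper:

    import xml.etree.ElementTree as etree
    from youtube_dl.utils import find_xpath_attr

    meta = etree.fromstring(
        '<smil><body><ref abstract="Geo-restricted" '
        'src="http://link.theplatform.com/s/errorFiles/Unavailable.mp4"/></body></smil>')
    error_element = find_xpath_attr(
        meta, './/ref', 'src',
        'http://link.theplatform.com/s/errorFiles/Unavailable.mp4')
    if error_element is not None:
        print(error_element.attrib['abstract'])  # Geo-restricted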

View File

@@ -4,12 +4,12 @@ import re
 from .common import InfoExtractor
 from .brightcove import BrightcoveLegacyIE
-from ..compat import compat_urlparse
+from ..compat import compat_parse_qs


 class TlcDeIE(InfoExtractor):
     IE_NAME = 'tlc.de'
-    _VALID_URL = r'http://www\.tlc\.de/sendungen/[^/]+/videos/(?P<title>[^/?]+)'
+    _VALID_URL = r'http://www\.tlc\.de/(?:[^/]+/)*videos/(?P<title>[^/?#]+)?(?:.*#(?P<id>\d+))?'

     _TEST = {
         'url': 'http://www.tlc.de/sendungen/breaking-amish/videos/#3235167922001',
@@ -17,32 +17,23 @@ class TlcDeIE(InfoExtractor):
             'id': '3235167922001',
             'ext': 'mp4',
             'title': 'Breaking Amish: Die Welt da draußen',
-            'uploader': 'Discovery Networks - Germany',
             'description': (
                 'Vier Amische und eine Mennonitin wagen in New York'
                 ' den Sprung in ein komplett anderes Leben. Begleitet sie auf'
                 ' ihrem spannenden Weg.'),
+            'timestamp': 1396598084,
+            'upload_date': '20140404',
+            'uploader_id': '1659832546',
         },
     }

+    BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/1659832546/default_default/index.html?videoId=%s'
+
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
-        title = mobj.group('title')
-        webpage = self._download_webpage(url, title)
-        iframe_url = self._search_regex(
-            '<iframe src="(http://www\.tlc\.de/wp-content/.+?)"', webpage,
-            'iframe url')
-        # Otherwise we don't get the correct 'BrightcoveExperience' element,
-        # example: http://www.tlc.de/sendungen/cake-boss/videos/cake-boss-cannoli-drama/
-        iframe_url = iframe_url.replace('.htm?', '.php?')
-        url_fragment = compat_urlparse.urlparse(url).fragment
-        if url_fragment:
-            # Since the fragment is not send to the server, we always get the same iframe
-            iframe_url = re.sub(r'playlist=(\d+)', 'playlist=%s' % url_fragment, iframe_url)
-        iframe = self._download_webpage(iframe_url, title)
-
-        return {
-            '_type': 'url',
-            'url': BrightcoveLegacyIE._extract_brightcove_url(iframe),
-            'ie': BrightcoveLegacyIE.ie_key(),
-        }
+        brightcove_id = mobj.group('id')
+        if not brightcove_id:
+            title = mobj.group('title')
+            webpage = self._download_webpage(url, title)
+            brightcove_legacy_url = BrightcoveLegacyIE._extract_brightcove_url(webpage)
+            brightcove_id = compat_parse_qs(brightcove_legacy_url)['@videoPlayer'][0]
+        return self.url_result(self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, 'BrightcoveNew', brightcove_id)
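
Note on the change above: the Brightcove video ID is pulled out of the legacy player URL's query string with compat_parse_qs, mirroring the code's use of the full URL (sample URL hypothetical, IDs from the test above):

    from youtube_dl.compat import compat_parse_qs

    legacy_url = ('http://c.brightcove.com/services/viewer/htmlFederated'
                  '?playerID=1659832546001&@videoPlayer=3235167922001')
    print(compat_parse_qs(legacy_url)['@videoPlayer'][0])  # 3235167922001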

View File

@@ -71,7 +71,7 @@ class TNAFlixNetworkBaseIE(InfoExtractor):
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
         video_id = mobj.group('id')
-        display_id = mobj.group('display_id')
+        display_id = mobj.group('display_id') if 'display_id' in mobj.groupdict() else video_id

         webpage = self._download_webpage(url, display_id)
@@ -117,7 +117,7 @@ class TNAFlixNetworkBaseIE(InfoExtractor):
         title = self._html_search_regex(
             self._TITLE_REGEX, webpage, 'title') if self._TITLE_REGEX else self._og_search_title(webpage)

-        age_limit = self._rta_search(webpage)
+        age_limit = self._rta_search(webpage) or 18

         duration = parse_duration(self._html_search_meta(
             'duration', webpage, 'duration', default=None))
@@ -152,6 +152,36 @@ class TNAFlixNetworkBaseIE(InfoExtractor):
         }


+class TNAFlixNetworkEmbedIE(TNAFlixNetworkBaseIE):
+    _VALID_URL = r'https?://player\.(?:tna|emp)flix\.com/video/(?P<id>\d+)'
+
+    _TITLE_REGEX = r'<title>([^<]+)</title>'
+
+    _TESTS = [{
+        'url': 'https://player.tnaflix.com/video/6538',
+        'info_dict': {
+            'id': '6538',
+            'display_id': '6538',
+            'ext': 'mp4',
+            'title': 'Educational xxx video',
+            'thumbnail': 're:https?://.*\.jpg$',
+            'age_limit': 18,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        'url': 'https://player.empflix.com/video/33051',
+        'only_matching': True,
+    }]
+
+    @staticmethod
+    def _extract_urls(webpage):
+        return [url for _, url in re.findall(
+            r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//player\.(?:tna|emp)flix\.com/video/\d+)\1',
+            webpage)]
+
+
 class TNAFlixIE(TNAFlixNetworkBaseIE):
     _VALID_URL = r'https?://(?:www\.)?tnaflix\.com/[^/]+/(?P<display_id>[^/]+)/video(?P<id>\d+)'
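
Note on _extract_urls above: it lets other extractors (e.g. the generic one) discover embedded players in a host page. For a hypothetical page:

    import re

    webpage = '<iframe src="https://player.tnaflix.com/video/6538" frameborder="0"></iframe>'
    print([url for _, url in re.findall(
        r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//player\.(?:tna|emp)flix\.com/video/\d+)\1',
        webpage)])
    # -> ['https://player.tnaflix.com/video/6538']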

View File

@@ -17,6 +17,7 @@ from ..utils import (
     encode_dict,
     ExtractorError,
     int_or_none,
+    orderedSet,
     parse_duration,
     parse_iso8601,
     sanitized_Request,
@@ -251,6 +252,7 @@ class TwitchVodIE(TwitchItemBaseIE):
             self._USHER_BASE, item_id,
             compat_urllib_parse.urlencode({
                 'allow_source': 'true',
+                'allow_audio_only': 'true',
                 'allow_spectre': 'true',
                 'player': 'twitchweb',
                 'nauth': access_token['token'],
@@ -281,17 +283,36 @@ class TwitchPlaylistBaseIE(TwitchBaseIE):
         entries = []
         offset = 0
         limit = self._PAGE_LIMIT
+        broken_paging_detected = False
+        counter_override = None
         for counter in itertools.count(1):
             response = self._download_json(
                 self._PLAYLIST_URL % (channel_id, offset, limit),
-                channel_id, 'Downloading %s videos JSON page %d' % (self._PLAYLIST_TYPE, counter))
+                channel_id,
+                'Downloading %s videos JSON page %s'
+                % (self._PLAYLIST_TYPE, counter_override or counter))
             page_entries = self._extract_playlist_page(response)
             if not page_entries:
                 break
+            total = int_or_none(response.get('_total'))
+            # Since the beginning of March 2016 twitch's paging mechanism
+            # is completely broken on the twitch side. It simply ignores
+            # a limit and returns the whole offset number of videos.
+            # Working around by just requesting all videos at once.
+            if not broken_paging_detected and total and len(page_entries) > limit:
+                self.report_warning(
+                    'Twitch paging is broken on twitch side, requesting all videos at once',
+                    channel_id)
+                broken_paging_detected = True
+                offset = total
+                counter_override = '(all at once)'
+                continue
             entries.extend(page_entries)
+            if broken_paging_detected or total and len(page_entries) >= total:
+                break
             offset += limit
         return self.playlist_result(
-            [self.url_result(entry) for entry in set(entries)],
+            [self.url_result(entry) for entry in orderedSet(entries)],
             channel_id, channel_name)

     def _extract_playlist_page(self, response):
@@ -411,6 +432,7 @@ class TwitchStreamIE(TwitchBaseIE):
         query = {
             'allow_source': 'true',
+            'allow_audio_only': 'true',
             'p': random.randint(1000000, 10000000),
             'player': 'twitchweb',
             'segment_preference': '4',
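
Note on the paging workaround above, restated as a standalone sketch (fetch_page is a hypothetical stand-in for the playlist API call):

    import itertools

    def fetch_all_videos(fetch_page, limit=100):
        entries = []
        offset = 0
        broken_paging_detected = False
        for counter in itertools.count(1):
            page = fetch_page(offset, limit)
            videos = page.get('videos') or []
            if not videos:
                break
            total = page.get('_total')
            # A page larger than the requested limit means the server
            # ignored the limit: re-request everything at once, then stop.
            if not broken_paging_detected and total and len(videos) > limit:
                broken_paging_detected = True
                offset = total
                continue
            entries.extend(videos)
            if broken_paging_detected or (total and len(entries) >= total):
                break
            offset += limit
        return entries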

View File

@@ -10,7 +10,6 @@ from ..utils import (
     remove_end,
     int_or_none,
     ExtractorError,
-    sanitized_Request,
 )
@@ -22,7 +21,7 @@ class TwitterBaseIE(InfoExtractor):
 class TwitterCardIE(TwitterBaseIE):
     IE_NAME = 'twitter:card'
-    _VALID_URL = r'https?://(?:www\.)?twitter\.com/i/cards/tfw/v1/(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?twitter\.com/i/(?:cards/tfw/v1|videos/tweet)/(?P<id>\d+)'
     _TESTS = [
         {
             'url': 'https://twitter.com/i/cards/tfw/v1/560070183650213889',
@@ -30,7 +29,7 @@ class TwitterCardIE(TwitterBaseIE):
             'info_dict': {
                 'id': '560070183650213889',
                 'ext': 'mp4',
-                'title': 'TwitterCard',
+                'title': 'Twitter Card',
                 'thumbnail': 're:^https?://.*\.jpg$',
                 'duration': 30.033,
             }
@@ -41,7 +40,7 @@ class TwitterCardIE(TwitterBaseIE):
             'info_dict': {
                 'id': '623160978427936768',
                 'ext': 'mp4',
-                'title': 'TwitterCard',
+                'title': 'Twitter Card',
                 'thumbnail': 're:^https?://.*\.jpg',
                 'duration': 80.155,
             },
@@ -72,63 +71,102 @@ class TwitterCardIE(TwitterBaseIE):
                 'title': 'Vine by ArsenalTerje',
             },
             'add_ie': ['Vine'],
-        }
+        }, {
+            'url': 'https://twitter.com/i/videos/tweet/705235433198714880',
+            'md5': '3846d0a07109b5ab622425449b59049d',
+            'info_dict': {
+                'id': '705235433198714880',
+                'ext': 'mp4',
+                'title': 'Twitter web player',
+                'thumbnail': 're:^https?://.*\.jpg',
+            },
+        },
     ]

     def _real_extract(self, url):
         video_id = self._match_id(url)

-        # Different formats served for different User-Agents
-        USER_AGENTS = [
-            'Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20150101 Firefox/20.0 (Chrome)',  # mp4
-            'Mozilla/5.0 (Windows NT 5.2; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0',  # webm
-        ]
-
         config = None
         formats = []
-        for user_agent in USER_AGENTS:
-            request = sanitized_Request(url)
-            request.add_header('User-Agent', user_agent)
-            webpage = self._download_webpage(request, video_id)
-
-            iframe_url = self._html_search_regex(
-                r'<iframe[^>]+src="((?:https?:)?//(?:www.youtube.com/embed/[^"]+|(?:www\.)?vine\.co/v/\w+/card))"',
-                webpage, 'video iframe', default=None)
-            if iframe_url:
-                return self.url_result(iframe_url)
-
-            config = self._parse_json(self._html_search_regex(
-                r'data-player-config="([^"]+)"', webpage, 'data player config'),
-                video_id)
-            if 'playlist' not in config:
-                if 'vmapUrl' in config:
-                    formats.append({
-                        'url': self._get_vmap_video_url(config['vmapUrl'], video_id),
-                    })
-                    break  # same video regardless of UA
-                continue
-
-            video_url = config['playlist'][0]['source']
+        duration = None
+
+        webpage = self._download_webpage(url, video_id)
+
+        iframe_url = self._html_search_regex(
+            r'<iframe[^>]+src="((?:https?:)?//(?:www.youtube.com/embed/[^"]+|(?:www\.)?vine\.co/v/\w+/card))"',
+            webpage, 'video iframe', default=None)
+        if iframe_url:
+            return self.url_result(iframe_url)
+
+        config = self._parse_json(self._html_search_regex(
+            r'data-(?:player-)?config="([^"]+)"', webpage, 'data player config'),
+            video_id)
+
+        def _search_dimensions_in_video_url(a_format, video_url):
+            m = re.search(r'/(?P<width>\d+)x(?P<height>\d+)/', video_url)
+            if m:
+                a_format.update({
+                    'width': int(m.group('width')),
+                    'height': int(m.group('height')),
+                })
+
+        playlist = config.get('playlist')
+        if playlist:
+            video_url = playlist[0]['source']

             f = {
                 'url': video_url,
             }

-            m = re.search(r'/(?P<width>\d+)x(?P<height>\d+)/', video_url)
-            if m:
-                f.update({
-                    'width': int(m.group('width')),
-                    'height': int(m.group('height')),
-                })
+            _search_dimensions_in_video_url(f, video_url)
+
             formats.append(f)
+
+        vmap_url = config.get('vmapUrl') or config.get('vmap_url')
+        if vmap_url:
+            formats.append({
+                'url': self._get_vmap_video_url(vmap_url, video_id),
+            })
+
+        media_info = None
+
+        for entity in config.get('status', {}).get('entities', []):
+            if 'mediaInfo' in entity:
+                media_info = entity['mediaInfo']
+
+        if media_info:
+            for media_variant in media_info['variants']:
+                media_url = media_variant['url']
+                if media_url.endswith('.m3u8'):
+                    formats.extend(self._extract_m3u8_formats(media_url, video_id, ext='mp4', m3u8_id='hls'))
+                elif media_url.endswith('.mpd'):
+                    formats.extend(self._extract_mpd_formats(media_url, video_id, mpd_id='dash'))
+                else:
+                    vbr = int_or_none(media_variant.get('bitRate'), scale=1000)
+                    a_format = {
+                        'url': media_url,
+                        'format_id': 'http-%d' % vbr if vbr else 'http',
+                        'vbr': vbr,
+                    }
+                    # Reported bitRate may be zero
+                    if not a_format['vbr']:
+                        del a_format['vbr']
+
+                    _search_dimensions_in_video_url(a_format, media_url)
+
+                    formats.append(a_format)
+
+            duration = float_or_none(media_info.get('duration', {}).get('nanos'), scale=1e9)
+
         self._sort_formats(formats)

-        thumbnail = config.get('posterImageUrl')
-        duration = float_or_none(config.get('duration'))
+        title = self._search_regex(r'<title>([^<]+)</title>', webpage, 'title')
+        thumbnail = config.get('posterImageUrl') or config.get('image_src')
+        duration = float_or_none(config.get('duration')) or duration

         return {
             'id': video_id,
-            'title': 'TwitterCard',
+            'title': title,
             'thumbnail': thumbnail,
             'duration': duration,
             'formats': formats,
@@ -142,7 +180,6 @@ class TwitterIE(InfoExtractor):
     _TESTS = [{
         'url': 'https://twitter.com/freethenipple/status/643211948184596480',
-        # MD5 checksums are different in different places
         'info_dict': {
             'id': '643211948184596480',
             'ext': 'mp4',
@@ -153,6 +190,9 @@ class TwitterIE(InfoExtractor):
             'uploader': 'FREE THE NIPPLE',
             'uploader_id': 'freethenipple',
         },
+        'params': {
+            'skip_download': True,  # requires ffmpeg
+        },
     }, {
         'url': 'https://twitter.com/giphz/status/657991469417025536/photo/1',
         'md5': 'f36dcd5fb92bf7057f155e7d927eeb42',
@@ -177,6 +217,36 @@ class TwitterIE(InfoExtractor):
             'uploader_id': 'starwars',
             'uploader': 'Star Wars',
         },
+    }, {
+        'url': 'https://twitter.com/BTNBrentYarina/status/705235433198714880',
+        'info_dict': {
+            'id': '705235433198714880',
+            'ext': 'mp4',
+            'title': 'Brent Yarina - Khalil Iverson\'s missed highlight dunk. And made highlight dunk. In one highlight.',
+            'description': 'Brent Yarina on Twitter: "Khalil Iverson\'s missed highlight dunk. And made highlight dunk. In one highlight."',
+            'uploader_id': 'BTNBrentYarina',
+            'uploader': 'Brent Yarina',
+        },
+        'params': {
+            # The same video as https://twitter.com/i/videos/tweet/705235433198714880
+            # Test case of TwitterCardIE
+            'skip_download': True,
+        },
+    }, {
+        'url': 'https://twitter.com/jaydingeer/status/700207533655363584',
+        'md5': '',
+        'info_dict': {
+            'id': '700207533655363584',
+            'ext': 'mp4',
+            'title': 'jay - BEAT PROD: @suhmeduh #Damndaniel',
+            'description': 'jay on Twitter: "BEAT PROD: @suhmeduh https://t.co/HBrQ4AfpvZ #Damndaniel https://t.co/byBooq2ejZ"',
+            'thumbnail': 're:^https?://.*\.jpg',
+            'uploader': 'jay',
+            'uploader_id': 'jaydingeer',
+        },
+        'params': {
+            'skip_download': True,  # requires ffmpeg
+        },
     }]

     def _real_extract(self, url):
@@ -234,6 +304,15 @@ class TwitterIE(InfoExtractor):
             })
             return info

+        if 'class="PlayableMedia' in webpage:
+            info.update({
+                '_type': 'url_transparent',
+                'ie_key': 'TwitterCard',
+                'url': '%s//twitter.com/i/videos/tweet/%s' % (self.http_scheme(), twid),
+            })
+
+            return info
+
         raise ExtractorError('There\'s no video in this tweet.')
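
Note on _search_dimensions_in_video_url above: Twitter encodes the rendition size in the media URL path, so width and height can be recovered with one regex (URL hypothetical):

    import re

    f = {'url': 'https://video.twimg.com/ext_tw_video/123/pu/vid/640x360/clip.mp4'}
    m = re.search(r'/(?P<width>\d+)x(?P<height>\d+)/', f['url'])
    if m:
        f.update({'width': int(m.group('width')), 'height': int(m.group('height'))})
    print(f['width'], f['height'])  # 640 360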

View File

@@ -0,0 +1,48 @@
# coding: utf-8
from __future__ import unicode_literals

from .common import InfoExtractor
from ..utils import (
    get_element_by_attribute,
    parse_duration,
    update_url_query,
    ExtractorError,
)
from ..compat import compat_str


class USATodayIE(InfoExtractor):
    _VALID_URL = r'https?://(?:www\.)?usatoday\.com/(?:[^/]+/)*(?P<id>[^?/#]+)'
    _TEST = {
        'url': 'http://www.usatoday.com/media/cinematic/video/81729424/us-france-warn-syrian-regime-ahead-of-new-peace-talks/',
        'md5': '4d40974481fa3475f8bccfd20c5361f8',
        'info_dict': {
            'id': '81729424',
            'ext': 'mp4',
            'title': 'US, France warn Syrian regime ahead of new peace talks',
            'timestamp': 1457891045,
            'description': 'md5:7e50464fdf2126b0f533748d3c78d58f',
            'uploader_id': '29906170001',
            'upload_date': '20160313',
        }
    }
    BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/29906170001/38a9eecc-bdd8-42a3-ba14-95397e48b3f8_default/index.html?videoId=%s'

    def _real_extract(self, url):
        display_id = self._match_id(url)
        webpage = self._download_webpage(update_url_query(url, {'ajax': 'true'}), display_id)
        ui_video_data = get_element_by_attribute('class', 'ui-video-data', webpage)
        if not ui_video_data:
            raise ExtractorError('no video on the webpage', expected=True)
        video_data = self._parse_json(ui_video_data, display_id)

        return {
            '_type': 'url_transparent',
            'url': self.BRIGHTCOVE_URL_TEMPLATE % video_data['brightcove_id'],
            'id': compat_str(video_data['id']),
            'title': video_data['title'],
            'thumbnail': video_data.get('thumbnail'),
            'description': video_data.get('description'),
            'duration': parse_duration(video_data.get('length')),
            'ie_key': 'BrightcoveNew',
        }
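
Note on the result type above: a 'url_transparent' entry delegates the actual extraction to another extractor (here BrightcoveNew) while letting this one override metadata fields. In sketch form, with values taken from the test above and the template URL elided, the returned dict is just:

    info = {
        '_type': 'url_transparent',  # follow 'url', but keep these fields
        'ie_key': 'BrightcoveNew',
        'url': 'http://players.brightcove.net/29906170001/...',  # elided
        'id': '81729424',
        'title': 'US, France warn Syrian regime ahead of new peace talks',
    }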

View File

@@ -0,0 +1,67 @@
from __future__ import unicode_literals

import re

from .common import InfoExtractor
from ..utils import (
    int_or_none,
    unified_strdate,
)


class UstudioIE(InfoExtractor):
    _VALID_URL = r'https?://(?:(?:www|v1)\.)?ustudio\.com/video/(?P<id>[^/]+)/(?P<display_id>[^/?#&]+)'
    _TEST = {
        'url': 'http://ustudio.com/video/Uxu2my9bgSph/san_francisco_golden_gate_bridge',
        'md5': '58bbfca62125378742df01fc2abbdef6',
        'info_dict': {
            'id': 'Uxu2my9bgSph',
            'display_id': 'san_francisco_golden_gate_bridge',
            'ext': 'mp4',
            'title': 'San Francisco: Golden Gate Bridge',
            'description': 'md5:23925500697f2c6d4830e387ba51a9be',
            'thumbnail': 're:^https?://.*\.jpg$',
            'upload_date': '20111107',
            'uploader': 'Tony Farley',
        }
    }

    def _real_extract(self, url):
        mobj = re.match(self._VALID_URL, url)
        video_id = mobj.group('id')
        display_id = mobj.group('display_id')

        config = self._download_xml(
            'http://v1.ustudio.com/embed/%s/ustudio/config.xml' % video_id,
            display_id)

        def extract(kind):
            return [{
                'url': item.attrib['url'],
                'width': int_or_none(item.get('width')),
                'height': int_or_none(item.get('height')),
            } for item in config.findall('./qualities/quality/%s' % kind) if item.get('url')]

        formats = extract('video')
        self._sort_formats(formats)

        webpage = self._download_webpage(url, display_id)

        title = self._og_search_title(webpage)
        upload_date = unified_strdate(self._search_regex(
            r'(?s)Uploaded by\s*.+?\s*on\s*<span>([^<]+)</span>',
            webpage, 'upload date', fatal=False))
        uploader = self._search_regex(
            r'Uploaded by\s*<a[^>]*>([^<]+)<',
            webpage, 'uploader', fatal=False)

        return {
            'id': video_id,
            'display_id': display_id,
            'title': title,
            'description': self._og_search_description(webpage),
            'thumbnails': extract('image'),
            'upload_date': upload_date,
            'uploader': uploader,
            'formats': formats,
        }

View File

@@ -20,6 +20,7 @@ class VGTVIE(XstreamIE):
         'aftenbladet.no/tv': 'satv',
         'fvn.no/fvntv': 'fvntv',
         'aftenposten.no/webtv': 'aptv',
+        'ap.vgtv.no/webtv': 'aptv',
     }

     _APP_NAME_TO_VENDOR = {
@@ -35,7 +36,7 @@ class VGTVIE(XstreamIE):
                     (?P<host>
                         %s
                     )
-                    /
+                    /?
                     (?:
                         \#!/(?:video|live)/|
                         embed?.*id=
@@ -107,19 +108,27 @@ class VGTVIE(XstreamIE):
             'md5': 'fd828cd29774a729bf4d4425fe192972',
             'info_dict': {
                 'id': '21039',
-                'ext': 'mov',
+                'ext': 'mp4',
                 'title': 'TRAILER: «SWEATSHOP» - I can´t take any more',
                 'description': 'md5:21891f2b0dd7ec2f78d84a50e54f8238',
                 'duration': 66,
                 'timestamp': 1417002452,
                 'upload_date': '20141126',
                 'view_count': int,
-            }
+            },
+            'params': {
+                # m3u8 download
+                'skip_download': True,
+            },
         },
         {
             'url': 'http://www.bt.no/tv/#!/video/100250/norling-dette-er-forskjellen-paa-1-divisjon-og-eliteserien',
             'only_matching': True,
         },
+        {
+            'url': 'http://ap.vgtv.no/webtv#!/video/111084/de-nye-bysyklene-lettere-bedre-gir-stoerre-hjul-og-feste-til-mobil',
+            'only_matching': True,
+        },
     ]

     def _real_extract(self, url):
@@ -144,8 +153,6 @@ class VGTVIE(XstreamIE):
         if len(video_id) == 5:
             if appname == 'bttv':
                 info = self._extract_video_info('btno', video_id)
-            elif appname == 'aptv':
-                info = self._extract_video_info('ap', video_id)

         streams = data['streamUrls']
         stream_type = data.get('streamType')

View File

@@ -1,31 +1,37 @@
 from __future__ import unicode_literals

+import re
+
 from .common import InfoExtractor
 from .ooyala import OoyalaIE
 from ..utils import ExtractorError


 class ViceIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:.+?\.)?vice\.com/(?:[^/]+/)+(?P<id>.+)'
+    _VALID_URL = r'https?://(?:.+?\.)?vice\.com/(?:[^/]+/)?videos?/(?P<id>[^/?#&]+)'

-    _TESTS = [
-        {
-            'url': 'http://www.vice.com/Fringes/cowboy-capitalists-part-1',
-            'info_dict': {
-                'id': '43cW1mYzpia9IlestBjVpd23Yu3afAfp',
-                'ext': 'mp4',
-                'title': 'VICE_COWBOYCAPITALISTS_PART01_v1_VICE_WM_1080p.mov',
-                'duration': 725.983,
-            },
-            'params': {
-                # Requires ffmpeg (m3u8 manifest)
-                'skip_download': True,
-            },
-        }, {
-            'url': 'https://news.vice.com/video/experimenting-on-animals-inside-the-monkey-lab',
-            'only_matching': True,
-        }
-    ]
+    _TESTS = [{
+        'url': 'http://www.vice.com/video/cowboy-capitalists-part-1',
+        'info_dict': {
+            'id': '43cW1mYzpia9IlestBjVpd23Yu3afAfp',
+            'ext': 'mp4',
+            'title': 'VICE_COWBOYCAPITALISTS_PART01_v1_VICE_WM_1080p.mov',
+            'duration': 725.983,
+        },
+        'params': {
+            # Requires ffmpeg (m3u8 manifest)
+            'skip_download': True,
+        },
+    }, {
+        'url': 'https://news.vice.com/video/experimenting-on-animals-inside-the-monkey-lab',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.vice.com/ru/video/big-night-out-ibiza-clive-martin-229',
+        'only_matching': True,
+    }, {
+        'url': 'https://munchies.vice.com/en/videos/watch-the-trailer-for-our-new-series-the-pizza-show',
+        'only_matching': True,
+    }]

     def _real_extract(self, url):
         video_id = self._match_id(url)
@@ -38,3 +44,35 @@ class ViceIE(InfoExtractor):
         except ExtractorError:
             raise ExtractorError('The page doesn\'t contain a video', expected=True)
         return self.url_result(ooyala_url, ie='Ooyala')
+
+
+class ViceShowIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:.+?\.)?vice\.com/(?:[^/]+/)?show/(?P<id>[^/?#&]+)'
+
+    _TEST = {
+        'url': 'https://munchies.vice.com/en/show/fuck-thats-delicious-2',
+        'info_dict': {
+            'id': 'fuck-thats-delicious-2',
+            'title': "Fuck, That's Delicious",
+            'description': 'Follow the culinary adventures of rapper Action Bronson during his ongoing world tour.',
+        },
+        'playlist_count': 17,
+    }
+
+    def _real_extract(self, url):
+        show_id = self._match_id(url)
+        webpage = self._download_webpage(url, show_id)
+
+        entries = [
+            self.url_result(video_url, ViceIE.ie_key())
+            for video_url, _ in re.findall(
+                r'<h2[^>]+class="article-title"[^>]+data-id="\d+"[^>]*>\s*<a[^>]+href="(%s.*?)"'
+                % ViceIE._VALID_URL, webpage)]
+
+        title = self._search_regex(
+            r'<title>(.+?)</title>', webpage, 'title', default=None)
+        if title:
+            title = re.sub(r'(.+)\s*\|\s*.+$', r'\1', title).strip()
+        description = self._html_search_meta('description', webpage, 'description')
+
+        return self.playlist_result(entries, show_id, title, description)

View File

@@ -4,11 +4,13 @@ from __future__ import unicode_literals
 import re

 from .common import InfoExtractor
-from ..utils import sanitized_Request
+from ..utils import (
+    decode_packed_codes,
+    sanitized_Request,
+)


 class VideoMegaIE(InfoExtractor):
-    _WORKING = False
     _VALID_URL = r'(?:videomega:|https?://(?:www\.)?videomega\.tv/(?:(?:view|iframe|cdn)\.php)?\?ref=)(?P<id>[A-Za-z0-9]+)'
     _TESTS = [{
         'url': 'http://videomega.tv/cdn.php?ref=AOSQBJYKIDDIKYJBQSOA',
@@ -42,8 +44,10 @@ class VideoMegaIE(InfoExtractor):
             r'(?:^[Vv]ideo[Mm]ega\.tv\s-\s*|\s*-\svideomega\.tv$)', '', title)
         thumbnail = self._search_regex(
             r'<video[^>]+?poster="([^"]+)"', webpage, 'thumbnail', fatal=False)
+
+        real_codes = decode_packed_codes(webpage)
         video_url = self._search_regex(
-            r'<source[^>]+?src="([^"]+)"', webpage, 'video URL')
+            r'"src"\s*,\s*"([^"]+)"', real_codes, 'video URL')

         return {
             'id': video_id,

View File

@@ -1,11 +1,14 @@
 # coding: utf-8
 from __future__ import unicode_literals

-from .common import InfoExtractor
-from ..utils import smuggle_url
+from .jwplatform import JWPlatformBaseIE
+from ..utils import (
+    decode_packed_codes,
+    js_to_json,
+)


-class VidziIE(InfoExtractor):
+class VidziIE(JWPlatformBaseIE):
     _VALID_URL = r'https?://(?:www\.)?vidzi\.tv/(?P<id>\w+)'
     _TEST = {
         'url': 'http://vidzi.tv/cghql9yq6emu.html',
@@ -14,7 +17,6 @@ class VidziIE(JWPlatformBaseIE):
             'id': 'cghql9yq6emu',
             'ext': 'mp4',
             'title': 'youtube-dl test video 1\\\\2\'3/4<5\\\\6ä7↭',
-            'uploader': 'vidzi.tv',
         },
         'params': {
             # m3u8 download
@@ -29,11 +31,12 @@ class VidziIE(JWPlatformBaseIE):
         title = self._html_search_regex(
             r'(?s)<h2 class="video-title">(.*?)</h2>', webpage, 'title')

-        # Vidzi now uses jwplayer, which can be handled by GenericIE
-        return {
-            '_type': 'url_transparent',
-            'id': video_id,
-            'title': title,
-            'url': smuggle_url(url, {'to_generic': True}),
-            'ie_key': 'Generic',
-        }
+        code = decode_packed_codes(webpage).replace('\\\'', '\'')
+        jwplayer_data = self._parse_json(
+            self._search_regex(r'setup\(([^)]+)\)', code, 'jwplayer data'),
+            video_id, transform_source=js_to_json)
+
+        info_dict = self._parse_jwplayer_data(jwplayer_data, video_id, require_title=False)
+        info_dict['title'] = title
+
+        return info_dict
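
Note on the change above: the jwplayer setup() argument is JavaScript, not JSON (single quotes, unquoted keys), so it is run through js_to_json before parsing. Roughly, with a hypothetical unpacked snippet:

    import json
    import re
    from youtube_dl.utils import js_to_json

    code = "player.setup({file:'http://cdn.example.com/v/master.m3u8',image:'poster.jpg'})"
    js_obj = re.search(r'setup\(([^)]+)\)', code).group(1)
    print(json.loads(js_to_json(js_obj)))
    # -> {'file': 'http://cdn.example.com/v/master.m3u8', 'image': 'poster.jpg'}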

View File

@@ -176,13 +176,13 @@ class VikiIE(VikiBaseIE):
     }, {
         # youtube external
         'url': 'http://www.viki.com/videos/50562v-poor-nastya-complete-episode-1',
-        'md5': '216d1afdc0c64d1febc1e9f2bd4b864b',
+        'md5': '63f8600c1da6f01b7640eee7eca4f1da',
         'info_dict': {
             'id': '50562v',
-            'ext': 'mp4',
+            'ext': 'webm',
             'title': 'Poor Nastya [COMPLETE] - Episode 1',
             'description': '',
-            'duration': 607,
+            'duration': 606,
             'timestamp': 1274949505,
             'upload_date': '20101213',
             'uploader': 'ad14065n',

View File

@@ -73,15 +73,26 @@ class VimeoIE(VimeoBaseInfoExtractor):
     # _VALID_URL matches Vimeo URLs
     _VALID_URL = r'''(?x)
                     https?://
-                    (?:(?:www|(?P<player>player))\.)?
-                    vimeo(?P<pro>pro)?\.com/
-                    (?!channels/[^/?#]+/?(?:$|[?#])|album/)
-                    (?:.*?/)?
-                    (?:(?:play_redirect_hls|moogaloop\.swf)\?clip_id=)?
-                    (?:videos?/)?
-                    (?P<id>[0-9]+)
-                    /?(?:[?&].*)?(?:[#].*)?$'''
+                    (?:
+                        (?:
+                            www|
+                            (?P<player>player)
+                        )
+                        \.
+                    )?
+                    vimeo(?P<pro>pro)?\.com/
+                    (?!channels/[^/?#]+/?(?:$|[?#])|(?:album|ondemand)/)
+                    (?:.*?/)?
+                    (?:
+                        (?:
+                            play_redirect_hls|
+                            moogaloop\.swf)\?clip_id=
+                    )?
+                    (?:videos?/)?
+                    (?P<id>[0-9]+)
+                    /?(?:[?&].*)?(?:[#].*)?$
+                    '''
     IE_NAME = 'vimeo'
     _TESTS = [
         {
@@ -93,6 +104,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
                 'title': "youtube-dl test video - \u2605 \" ' \u5e78 / \\ \u00e4 \u21ad \U0001d550",
                 'description': 'md5:2d3305bad981a06ff79f027f19865021',
                 'upload_date': '20121220',
+                'uploader_url': 're:https?://(?:www\.)?vimeo\.com/user7108434',
                 'uploader_id': 'user7108434',
                 'uploader': 'Filippo Valsorda',
                 'duration': 10,
@@ -105,6 +117,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
             'info_dict': {
                 'id': '68093876',
                 'ext': 'mp4',
+                'uploader_url': 're:https?://(?:www\.)?vimeo\.com/openstreetmapus',
                 'uploader_id': 'openstreetmapus',
                 'uploader': 'OpenStreetMap US',
                 'title': 'Andy Allan - Putting the Carto into OpenStreetMap Cartography',
@@ -121,6 +134,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
                 'ext': 'mp4',
                 'title': 'Kathy Sierra: Building the minimum Badass User, Business of Software 2012',
                 'uploader': 'The BLN & Business of Software',
+                'uploader_url': 're:https?://(?:www\.)?vimeo\.com/theblnbusinessofsoftware',
                 'uploader_id': 'theblnbusinessofsoftware',
                 'duration': 3610,
                 'description': None,
@@ -135,6 +149,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
                 'ext': 'mp4',
                 'title': 'youtube-dl password protected test video',
                 'upload_date': '20130614',
+                'uploader_url': 're:https?://(?:www\.)?vimeo\.com/user18948128',
                 'uploader_id': 'user18948128',
                 'uploader': 'Jaime Marquínez Ferrándiz',
                 'duration': 10,
@@ -154,6 +169,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
                 'ext': 'mp4',
                 'title': 'Key & Peele: Terrorist Interrogation',
                 'description': 'md5:8678b246399b070816b12313e8b4eb5c',
+                'uploader_url': 're:https?://(?:www\.)?vimeo\.com/atencio',
                 'uploader_id': 'atencio',
                 'uploader': 'Peter Atencio',
                 'upload_date': '20130927',
@@ -169,6 +185,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
                 'title': 'The New Vimeo Player (You Know, For Videos)',
                 'description': 'md5:2ec900bf97c3f389378a96aee11260ea',
                 'upload_date': '20131015',
+                'uploader_url': 're:https?://(?:www\.)?vimeo\.com/staff',
                 'uploader_id': 'staff',
                 'uploader': 'Vimeo Staff',
                 'duration': 62,
@@ -183,6 +200,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
                 'ext': 'mp4',
                 'title': 'Pier Solar OUYA Official Trailer',
                 'uploader': 'Tulio Gonçalves',
+                'uploader_url': 're:https?://(?:www\.)?vimeo\.com/user28849593',
                 'uploader_id': 'user28849593',
             },
         },
@@ -195,6 +213,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': 'FOX CLASSICS - Forever Classic ID - A Full Minute', 'title': 'FOX CLASSICS - Forever Classic ID - A Full Minute',
'uploader': 'The DMCI', 'uploader': 'The DMCI',
'uploader_url': 're:https?://(?:www\.)?vimeo\.com/dmci',
'uploader_id': 'dmci', 'uploader_id': 'dmci',
'upload_date': '20111220', 'upload_date': '20111220',
'description': 'md5:ae23671e82d05415868f7ad1aec21147', 'description': 'md5:ae23671e82d05415868f7ad1aec21147',
@ -269,9 +288,8 @@ class VimeoIE(VimeoBaseInfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
url, data = unsmuggle_url(url, {}) url, data = unsmuggle_url(url, {})
headers = std_headers headers = std_headers.copy()
if 'http_headers' in data: if 'http_headers' in data:
headers = headers.copy()
headers.update(data['http_headers']) headers.update(data['http_headers'])
if 'Referer' not in headers: if 'Referer' not in headers:
headers['Referer'] = url headers['Referer'] = url
@ -286,7 +304,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
url = 'https://vimeo.com/' + video_id url = 'https://vimeo.com/' + video_id
# Retrieve video webpage to extract further information # Retrieve video webpage to extract further information
request = sanitized_Request(url, None, headers) request = sanitized_Request(url, headers=headers)
try: try:
webpage = self._download_webpage(request, video_id) webpage = self._download_webpage(request, video_id)
except ExtractorError as ee: except ExtractorError as ee:
@ -370,9 +388,10 @@ class VimeoIE(VimeoBaseInfoExtractor):
# Extract title # Extract title
video_title = config['video']['title'] video_title = config['video']['title']
# Extract uploader and uploader_id # Extract uploader, uploader_url and uploader_id
video_uploader = config['video']['owner']['name'] video_uploader = config['video'].get('owner', {}).get('name')
video_uploader_id = config['video']['owner']['url'].split('/')[-1] if config['video']['owner']['url'] else None video_uploader_url = config['video'].get('owner', {}).get('url')
video_uploader_id = video_uploader_url.split('/')[-1] if video_uploader_url else None
# Extract video thumbnail # Extract video thumbnail
video_thumbnail = config['video'].get('thumbnail') video_thumbnail = config['video'].get('thumbnail')
@ -473,6 +492,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
return { return {
'id': video_id, 'id': video_id,
'uploader': video_uploader, 'uploader': video_uploader,
'uploader_url': video_uploader_url,
'uploader_id': video_uploader_id, 'uploader_id': video_uploader_id,
'upload_date': video_upload_date, 'upload_date': video_upload_date,
'title': video_title, 'title': video_title,
@ -488,6 +508,38 @@ class VimeoIE(VimeoBaseInfoExtractor):
} }
class VimeoOndemandIE(VimeoBaseInfoExtractor):
IE_NAME = 'vimeo:ondemand'
_VALID_URL = r'https?://(?:www\.)?vimeo\.com/ondemand/(?P<id>[^/?#&]+)'
_TESTS = [{
# ondemand video not available via https://vimeo.com/id
'url': 'https://vimeo.com/ondemand/20704',
'md5': 'c424deda8c7f73c1dfb3edd7630e2f35',
'info_dict': {
'id': '105442900',
'ext': 'mp4',
'title': 'המעבדה - במאי יותם פלדמן',
'uploader': 'גם סרטים',
'uploader_url': 're:https?://(?:www\.)?vimeo\.com/gumfilms',
'uploader_id': 'gumfilms',
},
}, {
'url': 'https://vimeo.com/ondemand/nazmaalik',
'only_matching': True,
}, {
'url': 'https://vimeo.com/ondemand/141692381',
'only_matching': True,
}, {
'url': 'https://vimeo.com/ondemand/thelastcolony/150274832',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
return self.url_result(self._og_search_video_url(webpage), VimeoIE.ie_key())
class VimeoChannelIE(VimeoBaseInfoExtractor): class VimeoChannelIE(VimeoBaseInfoExtractor):
IE_NAME = 'vimeo:channel' IE_NAME = 'vimeo:channel'
_VALID_URL = r'https://vimeo\.com/channels/(?P<id>[^/?#]+)/?(?:$|[?#])' _VALID_URL = r'https://vimeo\.com/channels/(?P<id>[^/?#]+)/?(?:$|[?#])'
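A quick sanity check of the reworked `_VALID_URL` (pattern copied verbatim from the hunk above): plain video URLs still match, while `/ondemand/` and `/album/` URLs now fall through so `VimeoOndemandIE` and the album extractor can claim them:

```python
import re

VIMEO_VALID_URL = r'''(?x)
    https?://
    (?:
        (?:
            www|
            (?P<player>player)
        )
        \.
    )?
    vimeo(?P<pro>pro)?\.com/
    (?!channels/[^/?#]+/?(?:$|[?#])|(?:album|ondemand)/)
    (?:.*?/)?
    (?:
        (?:
            play_redirect_hls|
            moogaloop\.swf)\?clip_id=
    )?
    (?:videos?/)?
    (?P<id>[0-9]+)
    /?(?:[?&].*)?(?:[#].*)?$
'''

assert re.match(VIMEO_VALID_URL, 'https://vimeo.com/56015672')
assert re.match(VIMEO_VALID_URL, 'https://vimeo.com/56015672#at=0')
assert not re.match(VIMEO_VALID_URL, 'https://vimeo.com/ondemand/nazmaalik')
assert not re.match(VIMEO_VALID_URL, 'https://vimeo.com/album/2632481')
```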

youtube_dl/extractor/vk.py

@@ -142,10 +142,10 @@ class VKIE(InfoExtractor):
         'url': 'https://vk.com/video276849682_170681728',
         'info_dict': {
             'id': 'V3K4mi0SYkc',
-            'ext': 'mp4',
+            'ext': 'webm',
             'title': "DSWD Awards 'Children's Joy Foundation, Inc.' Certificate of Registration and License to Operate",
             'description': 'md5:bf9c26cfa4acdfb146362682edd3827a',
-            'duration': 179,
+            'duration': 178,
             'upload_date': '20130116',
             'uploader': "Children's Joy Foundation",
             'uploader_id': 'thecjf',

youtube_dl/extractor/webofstories.py

@@ -12,38 +12,52 @@ class WebOfStoriesIE(InfoExtractor):
     _VIDEO_DOMAIN = 'http://eu-mobile.webofstories.com/'
     _GREAT_LIFE_STREAMER = 'rtmp://eu-cdn1.webofstories.com/cfx/st/'
     _USER_STREAMER = 'rtmp://eu-users.webofstories.com/cfx/st/'
-    _TESTS = [
-        {
-            'url': 'http://www.webofstories.com/play/hans.bethe/71',
-            'md5': '373e4dd915f60cfe3116322642ddf364',
-            'info_dict': {
-                'id': '4536',
-                'ext': 'mp4',
-                'title': 'The temperature of the sun',
-                'thumbnail': 're:^https?://.*\.jpg$',
-                'description': 'Hans Bethe talks about calculating the temperature of the sun',
-                'duration': 238,
-            }
-        },
-        {
-            'url': 'http://www.webofstories.com/play/55908',
-            'md5': '2985a698e1fe3211022422c4b5ed962c',
-            'info_dict': {
-                'id': '55908',
-                'ext': 'mp4',
-                'title': 'The story of Gemmata obscuriglobus',
-                'thumbnail': 're:^https?://.*\.jpg$',
-                'description': 'Planctomycete talks about The story of Gemmata obscuriglobus',
-                'duration': 169,
-            }
-        },
-    ]
+    _TESTS = [{
+        'url': 'http://www.webofstories.com/play/hans.bethe/71',
+        'md5': '373e4dd915f60cfe3116322642ddf364',
+        'info_dict': {
+            'id': '4536',
+            'ext': 'mp4',
+            'title': 'The temperature of the sun',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'description': 'Hans Bethe talks about calculating the temperature of the sun',
+            'duration': 238,
+        }
+    }, {
+        'url': 'http://www.webofstories.com/play/55908',
+        'md5': '2985a698e1fe3211022422c4b5ed962c',
+        'info_dict': {
+            'id': '55908',
+            'ext': 'mp4',
+            'title': 'The story of Gemmata obscuriglobus',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'description': 'Planctomycete talks about The story of Gemmata obscuriglobus',
+            'duration': 169,
+        },
+        'skip': 'notfound',
+    }, {
+        # malformed og:title meta
+        'url': 'http://www.webofstories.com/play/54215?o=MS',
+        'info_dict': {
+            'id': '54215',
+            'ext': 'mp4',
+            'title': '"A Leg to Stand On"',
+            'thumbnail': 're:^https?://.*\.jpg$',
+            'description': 'Oliver Sacks talks about the death and resurrection of a limb',
+            'duration': 97,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }]

     def _real_extract(self, url):
         video_id = self._match_id(url)
         webpage = self._download_webpage(url, video_id)
-        title = self._og_search_title(webpage)
+        # Sometimes og:title meta is malformed
+        title = self._og_search_title(webpage, default=None) or self._html_search_regex(
+            r'(?s)<strong>Title:\s*</strong>(.+?)<', webpage, 'title')
         description = self._html_search_meta('description', webpage)
         thumbnail = self._og_search_thumbnail(webpage)

youtube_dl/extractor/wimp.py

@@ -20,7 +20,7 @@ class WimpIE(InfoExtractor):
         'md5': '4e2986c793694b55b37cf92521d12bb4',
         'info_dict': {
             'id': 'clowncar',
-            'ext': 'mp4',
+            'ext': 'webm',
             'title': 'It\'s like a clown car.',
             'description': 'md5:0e56db1370a6e49c5c1d19124c0d2fb2',
         },

youtube_dl/extractor/wistia.py

@@ -35,7 +35,8 @@ class WistiaIE(InfoExtractor):

         formats = []
         thumbnails = []
-        for atype, a in data['assets'].items():
+        for a in data['assets']:
+            atype = a.get('type')
             if atype == 'still':
                 thumbnails.append({
                     'url': a['url'],

youtube_dl/extractor/yandexmusic.py

@@ -10,13 +10,27 @@ from ..compat import (
     compat_urllib_parse,
 )
 from ..utils import (
+    ExtractorError,
     int_or_none,
     float_or_none,
     sanitized_Request,
 )


-class YandexMusicTrackIE(InfoExtractor):
+class YandexMusicBaseIE(InfoExtractor):
+    @staticmethod
+    def _handle_error(response):
+        error = response.get('error')
+        if error:
+            raise ExtractorError(error, expected=True)
+
+    def _download_json(self, *args, **kwargs):
+        response = super(YandexMusicBaseIE, self)._download_json(*args, **kwargs)
+        self._handle_error(response)
+        return response
+
+
+class YandexMusicTrackIE(YandexMusicBaseIE):
     IE_NAME = 'yandexmusic:track'
     IE_DESC = 'Яндекс.Музыка - Трек'
     _VALID_URL = r'https?://music\.yandex\.(?:ru|kz|ua|by)/album/(?P<album_id>\d+)/track/(?P<id>\d+)'
@@ -73,7 +87,7 @@ class YandexMusicTrackIE(InfoExtractor):
         return self._get_track_info(track)


-class YandexMusicPlaylistBaseIE(InfoExtractor):
+class YandexMusicPlaylistBaseIE(YandexMusicBaseIE):
     def _build_playlist(self, tracks):
         return [
             self.url_result(
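The base class funnels every API response through one checkpoint, so the track and playlist extractors never inspect `error` themselves. The same guard pattern in isolation (hypothetical names, not youtube-dl API):

```python
import json

class ApiError(Exception):
    pass

def handle_error(response):
    # Raise as soon as the API signals a failure in-band.
    error = response.get('error')
    if error:
        raise ApiError(error)

def parse_api_response(raw):
    response = json.loads(raw)
    handle_error(response)
    return response

print(parse_api_response('{"track": {"id": 1}}'))   # passes through
try:
    parse_api_response('{"error": "session-expired"}')
except ApiError as e:
    print('API error:', e)
```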

youtube_dl/extractor/youporn.py

@@ -75,7 +75,7 @@ class YouPornIE(InfoExtractor):
         links = []

         sources = self._search_regex(
-            r'sources\s*:\s*({.+?})', webpage, 'sources', default=None)
+            r'(?s)sources\s*:\s*({.+?})', webpage, 'sources', default=None)
         if sources:
             for _, link in re.findall(r'[^:]+\s*:\s*(["\'])(http.+?)\1', sources):
                 links.append(link)
@@ -101,8 +101,9 @@ class YouPornIE(InfoExtractor):
             }
             # Video URL's path looks like this:
             #  /201012/17/505835/720p_1500k_505835/YouPorn%20-%20Sex%20Ed%20Is%20It%20Safe%20To%20Masturbate%20Daily.mp4
+            #  /201012/17/505835/vl_240p_240k_505835/YouPorn%20-%20Sex%20Ed%20Is%20It%20Safe%20To%20Masturbate%20Daily.mp4
             # We will benefit from it by extracting some metadata
-            mobj = re.search(r'/(?P<height>\d{3,4})[pP]_(?P<bitrate>\d+)[kK]_\d+/', video_url)
+            mobj = re.search(r'(?P<height>\d{3,4})[pP]_(?P<bitrate>\d+)[kK]_\d+/', video_url)
             if mobj:
                 height = int(mobj.group('height'))
                 bitrate = int(mobj.group('bitrate'))
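Dropping the leading `/` is what makes the second path layout match, since `vl_` now sits between the slash and the height. A quick check (sample paths trimmed from the comment above):

```python
import re

urls = [
    '/201012/17/505835/720p_1500k_505835/video.mp4',
    '/201012/17/505835/vl_240p_240k_505835/video.mp4',
]
for video_url in urls:
    mobj = re.search(r'(?P<height>\d{3,4})[pP]_(?P<bitrate>\d+)[kK]_\d+/', video_url)
    print(int(mobj.group('height')), int(mobj.group('bitrate')))
# 720 1500
# 240 240  (the old pattern, anchored on '/', missed this layout)
```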

youtube_dl/extractor/youtube.py

@@ -6,6 +6,7 @@ from __future__ import unicode_literals
 import itertools
 import json
 import os.path
+import random
 import re
 import time
 import traceback
@@ -382,7 +383,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'title': 'youtube-dl test video "\'/\\ä↭𝕐',
                 'uploader': 'Philipp Hagemeister',
                 'uploader_id': 'phihag',
+                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/phihag',
                 'upload_date': '20121002',
+                'license': 'Standard YouTube License',
                 'description': 'test chars: "\'/\\ä↭𝕐\ntest URL: https://github.com/rg3/youtube-dl/issues/1892\n\nThis is a test video for youtube-dl.\n\nFor more information, contact phihag@phihag.de .',
                 'categories': ['Science & Technology'],
                 'tags': ['youtube-dl'],
@@ -401,12 +404,14 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'upload_date': '20120506',
                 'title': 'Icona Pop - I Love It (feat. Charli XCX) [OFFICIAL VIDEO]',
                 'alt_title': 'I Love It (feat. Charli XCX)',
-                'description': 'md5:782e8651347686cba06e58f71ab51773',
+                'description': 'md5:f3ceb5ef83a08d95b9d146f973157cc8',
                 'tags': ['Icona Pop i love it', 'sweden', 'pop music', 'big beat records', 'big beat', 'charli',
                          'xcx', 'charli xcx', 'girls', 'hbo', 'i love it', "i don't care", 'icona', 'pop',
                          'iconic ep', 'iconic', 'love', 'it'],
                 'uploader': 'Icona Pop',
                 'uploader_id': 'IconaPop',
+                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/IconaPop',
+                'license': 'Standard YouTube License',
                 'creator': 'Icona Pop',
             }
         },
@@ -422,6 +427,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'description': 'md5:64249768eec3bc4276236606ea996373',
                 'uploader': 'justintimberlakeVEVO',
                 'uploader_id': 'justintimberlakeVEVO',
+                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/justintimberlakeVEVO',
+                'license': 'Standard YouTube License',
                 'creator': 'Justin Timberlake',
                 'age_limit': 18,
             }
@@ -437,6 +444,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'description': 'md5:09b78bd971f1e3e289601dfba15ca4f7',
                 'uploader': 'SET India',
                 'uploader_id': 'setindia',
+                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/setindia',
+                'license': 'Standard YouTube License',
                 'age_limit': 18,
             }
         },
@@ -449,7 +458,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'title': 'youtube-dl test video "\'/\\ä↭𝕐',
                 'uploader': 'Philipp Hagemeister',
                 'uploader_id': 'phihag',
+                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/phihag',
                 'upload_date': '20121002',
+                'license': 'Standard YouTube License',
                 'description': 'test chars: "\'/\\ä↭𝕐\ntest URL: https://github.com/rg3/youtube-dl/issues/1892\n\nThis is a test video for youtube-dl.\n\nFor more information, contact phihag@phihag.de .',
                 'categories': ['Science & Technology'],
                 'tags': ['youtube-dl'],
@@ -468,8 +479,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'ext': 'm4a',
                 'upload_date': '20121002',
                 'uploader_id': '8KVIDEO',
+                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/8KVIDEO',
                 'description': '',
                 'uploader': '8KVIDEO',
+                'license': 'Standard YouTube License',
                 'title': 'UHDTV TEST 8K VIDEO.mp4'
             },
             'params': {
@@ -488,6 +501,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'uploader': 'AfrojackVEVO',
                 'uploader_id': 'AfrojackVEVO',
                 'upload_date': '20131011',
+                'license': 'Standard YouTube License',
             },
             'params': {
                 'youtube_include_dash_manifest': True,
@@ -506,6 +520,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'uploader': 'TaylorSwiftVEVO',
                 'uploader_id': 'TaylorSwiftVEVO',
                 'upload_date': '20140818',
+                'license': 'Standard YouTube License',
                 'creator': 'Taylor Swift',
             },
             'params': {
@@ -522,6 +537,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'upload_date': '20100909',
                 'uploader': 'The Amazing Atheist',
                 'uploader_id': 'TheAmazingAtheist',
+                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/TheAmazingAtheist',
+                'license': 'Standard YouTube License',
                 'title': 'Burning Everyone\'s Koran',
                 'description': 'SUBSCRIBE: http://www.youtube.com/saturninefilms\n\nEven Obama has taken a stand against freedom on this issue: http://www.huffingtonpost.com/2010/09/09/obama-gma-interview-quran_n_710282.html',
             }
@@ -536,7 +553,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'description': 're:(?s).{100,}About the Game\n.*?The Witcher 3: Wild Hunt.{100,}',
                 'uploader': 'The Witcher',
                 'uploader_id': 'WitcherGame',
+                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/WitcherGame',
                 'upload_date': '20140605',
+                'license': 'Standard YouTube License',
                 'age_limit': 18,
             },
         },
@@ -550,7 +569,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'description': 'md5:33765bb339e1b47e7e72b5490139bb41',
                 'uploader': 'LloydVEVO',
                 'uploader_id': 'LloydVEVO',
+                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/LloydVEVO',
                 'upload_date': '20110629',
+                'license': 'Standard YouTube License',
                 'age_limit': 18,
             },
         },
@@ -562,9 +583,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'ext': 'mp4',
                 'upload_date': '20100430',
                 'uploader_id': 'deadmau5',
+                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/deadmau5',
                 'creator': 'deadmau5',
                 'description': 'md5:12c56784b8032162bb936a5f76d55360',
                 'uploader': 'deadmau5',
+                'license': 'Standard YouTube License',
                 'title': 'Deadmau5 - Some Chords (HD)',
                 'alt_title': 'Some Chords',
             },
@@ -580,6 +603,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'ext': 'mp4',
                 'upload_date': '20150827',
                 'uploader_id': 'olympic',
+                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/olympic',
+                'license': 'Standard YouTube License',
                 'description': 'HO09 - Women - GER-AUS - Hockey - 31 July 2012 - London 2012 Olympic Games',
                 'uploader': 'Olympics',
                 'title': 'Hockey - Women - GER-AUS - London 2012 Olympic Games',
@@ -597,8 +622,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'stretched_ratio': 16 / 9.,
                 'upload_date': '20110310',
                 'uploader_id': 'AllenMeow',
+                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/AllenMeow',
                 'description': 'made by Wacom from Korea | 字幕&加油添醋 by TY\'s Allen | 感謝heylisa00cavey1001同學熱情提供梗及翻譯',
                 'uploader': '孫艾倫',
+                'license': 'Standard YouTube License',
                 'title': '[A-made] 變態妍字幕版 太妍 我就是這樣的人',
             },
         },
@@ -629,7 +656,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'description': 'md5:116377fd2963b81ec4ce64b542173306',
                 'upload_date': '20150625',
                 'uploader_id': 'dorappi2000',
+                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/dorappi2000',
                 'uploader': 'dorappi2000',
+                'license': 'Standard YouTube License',
                 'formats': 'mincount:33',
             },
         },
@@ -644,6 +673,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'uploader': 'Airtek',
                 'description': 'Retransmisión en directo de la XVIII media maratón de Zaragoza.',
                 'uploader_id': 'UCzTzUmjXxxacNnL8I3m4LnQ',
+                'license': 'Standard YouTube License',
                 'title': 'Retransmisión XVIII Media maratón Zaragoza 2015',
             },
             'params': {
@@ -668,6 +698,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'upload_date': '20150721',
                 'uploader': 'Beer Games Beer',
                 'uploader_id': 'beergamesbeer',
+                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/beergamesbeer',
+                'license': 'Standard YouTube License',
             },
         }, {
             'info_dict': {
@@ -678,6 +710,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'upload_date': '20150721',
                 'uploader': 'Beer Games Beer',
                 'uploader_id': 'beergamesbeer',
+                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/beergamesbeer',
+                'license': 'Standard YouTube License',
             },
         }, {
             'info_dict': {
@@ -688,6 +722,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'upload_date': '20150721',
                 'uploader': 'Beer Games Beer',
                 'uploader_id': 'beergamesbeer',
+                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/beergamesbeer',
+                'license': 'Standard YouTube License',
             },
         }, {
             'info_dict': {
@@ -698,6 +734,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'upload_date': '20150721',
                 'uploader': 'Beer Games Beer',
                 'uploader_id': 'beergamesbeer',
+                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/beergamesbeer',
+                'license': 'Standard YouTube License',
             },
         }],
         'params': {
@@ -731,7 +769,9 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'description': 'md5:8085699c11dc3f597ce0410b0dcbb34a',
                 'upload_date': '20151119',
                 'uploader_id': 'IronSoulElf',
+                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/IronSoulElf',
                 'uploader': 'IronSoulElf',
+                'license': 'Standard YouTube License',
                 'creator': 'Todd Haberman, Daniel Law Heath & Aaron Kaplan',
             },
             'params': {
@@ -759,6 +799,42 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'skip_download': True,
             },
         },
+        {
+            # Video licensed under Creative Commons
+            'url': 'https://www.youtube.com/watch?v=M4gD1WSo5mA',
+            'info_dict': {
+                'id': 'M4gD1WSo5mA',
+                'ext': 'mp4',
+                'title': 'md5:e41008789470fc2533a3252216f1c1d1',
+                'description': 'md5:a677553cf0840649b731a3024aeff4cc',
+                'upload_date': '20150127',
+                'uploader_id': 'BerkmanCenter',
+                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/BerkmanCenter',
+                'uploader': 'BerkmanCenter',
+                'license': 'Creative Commons Attribution license (reuse allowed)',
+            },
+            'params': {
+                'skip_download': True,
+            },
+        },
+        {
+            # Channel-like uploader_url
+            'url': 'https://www.youtube.com/watch?v=eQcmzGIKrzg',
+            'info_dict': {
+                'id': 'eQcmzGIKrzg',
+                'ext': 'mp4',
+                'title': 'Democratic Socialism and Foreign Policy | Bernie Sanders',
+                'description': 'md5:dda0d780d5a6e120758d1711d062a867',
+                'upload_date': '20151119',
+                'uploader': 'Bernie 2016',
+                'uploader_id': 'UCH1dpzjCEiGAt8CXkryhkZg',
+                'uploader_url': 're:https?://(?:www\.)?youtube\.com/channel/UCH1dpzjCEiGAt8CXkryhkZg',
+                'license': 'Creative Commons Attribution license (reuse allowed)',
+            },
+            'params': {
+                'skip_download': True,
+            },
+        },
         {
             'url': 'https://www.youtube.com/watch?feature=player_embedded&v=V36LpHqtcDY',
             'only_matching': True,
@@ -975,40 +1051,67 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             return {}
         try:
             args = player_config['args']
-            caption_url = args['ttsurl']
-            if not caption_url:
-                self._downloader.report_warning(err_msg)
-                return {}
-            timestamp = args['timestamp']
-            # We get the available subtitles
-            list_params = compat_urllib_parse.urlencode({
-                'type': 'list',
-                'tlangs': 1,
-                'asrs': 1,
-            })
-            list_url = caption_url + '&' + list_params
-            caption_list = self._download_xml(list_url, video_id)
-            original_lang_node = caption_list.find('track')
-            if original_lang_node is None:
-                self._downloader.report_warning('Video doesn\'t have automatic captions')
-                return {}
-            original_lang = original_lang_node.attrib['lang_code']
-            caption_kind = original_lang_node.attrib.get('kind', '')
+            caption_url = args.get('ttsurl')
+            if caption_url:
+                timestamp = args['timestamp']
+                # We get the available subtitles
+                list_params = compat_urllib_parse.urlencode({
+                    'type': 'list',
+                    'tlangs': 1,
+                    'asrs': 1,
+                })
+                list_url = caption_url + '&' + list_params
+                caption_list = self._download_xml(list_url, video_id)
+                original_lang_node = caption_list.find('track')
+                if original_lang_node is None:
+                    self._downloader.report_warning('Video doesn\'t have automatic captions')
+                    return {}
+                original_lang = original_lang_node.attrib['lang_code']
+                caption_kind = original_lang_node.attrib.get('kind', '')
+
+                sub_lang_list = {}
+                for lang_node in caption_list.findall('target'):
+                    sub_lang = lang_node.attrib['lang_code']
+                    sub_formats = []
+                    for ext in self._SUBTITLE_FORMATS:
+                        params = compat_urllib_parse.urlencode({
+                            'lang': original_lang,
+                            'tlang': sub_lang,
+                            'fmt': ext,
+                            'ts': timestamp,
+                            'kind': caption_kind,
+                        })
+                        sub_formats.append({
+                            'url': caption_url + '&' + params,
+                            'ext': ext,
+                        })
+                    sub_lang_list[sub_lang] = sub_formats
+                return sub_lang_list
+
+            # Some videos don't provide ttsurl but rather caption_tracks and
+            # caption_translation_languages (e.g. 20LmZk1hakA)
+            caption_tracks = args['caption_tracks']
+            caption_translation_languages = args['caption_translation_languages']
+            caption_url = compat_parse_qs(caption_tracks.split(',')[0])['u'][0]
+            parsed_caption_url = compat_urlparse.urlparse(caption_url)
+            caption_qs = compat_parse_qs(parsed_caption_url.query)

             sub_lang_list = {}
-            for lang_node in caption_list.findall('target'):
-                sub_lang = lang_node.attrib['lang_code']
+            for lang in caption_translation_languages.split(','):
+                lang_qs = compat_parse_qs(compat_urllib_parse_unquote_plus(lang))
+                sub_lang = lang_qs.get('lc', [None])[0]
+                if not sub_lang:
+                    continue
                 sub_formats = []
                 for ext in self._SUBTITLE_FORMATS:
-                    params = compat_urllib_parse.urlencode({
-                        'lang': original_lang,
-                        'tlang': sub_lang,
-                        'fmt': ext,
-                        'ts': timestamp,
-                        'kind': caption_kind,
+                    caption_qs.update({
+                        'tlang': [sub_lang],
+                        'fmt': [ext],
                     })
+                    sub_url = compat_urlparse.urlunparse(parsed_caption_url._replace(
+                        query=compat_urllib_parse.urlencode(caption_qs, True)))
                     sub_formats.append({
-                        'url': caption_url + '&' + params,
+                        'url': sub_url,
                         'ext': ext,
                     })
                 sub_lang_list[sub_lang] = sub_formats
@@ -1019,6 +1122,29 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             self._downloader.report_warning(err_msg)
             return {}

+    def _mark_watched(self, video_id, video_info):
+        playback_url = video_info.get('videostats_playback_base_url', [None])[0]
+        if not playback_url:
+            return
+        parsed_playback_url = compat_urlparse.urlparse(playback_url)
+        qs = compat_urlparse.parse_qs(parsed_playback_url.query)
+
+        # cpn generation algorithm is reverse engineered from base.js.
+        # In fact it works even with dummy cpn.
+        CPN_ALPHABET = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_'
+        cpn = ''.join((CPN_ALPHABET[random.randint(0, 256) & 63] for _ in range(0, 16)))
+
+        qs.update({
+            'ver': ['2'],
+            'cpn': [cpn],
+        })
+        playback_url = compat_urlparse.urlunparse(
+            parsed_playback_url._replace(query=compat_urllib_parse.urlencode(qs, True)))
+
+        self._download_webpage(
+            playback_url, video_id, 'Marking watched',
+            'Unable to mark watched', fatal=False)
+
     @classmethod
     def extract_id(cls, url):
         mobj = re.match(cls._VALID_URL, url, re.VERBOSE)
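The only moving part in `_mark_watched` is the client playback nonce: 16 characters drawn from a 64-symbol alphabet, generated exactly as above (and, per the comment, even a dummy value is accepted by the endpoint):

```python
import random

CPN_ALPHABET = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_'
# Masking with & 63 maps any randint result onto a valid alphabet index.
cpn = ''.join(CPN_ALPHABET[random.randint(0, 256) & 63] for _ in range(16))
assert len(cpn) == 16
print(cpn)
```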
@ -1245,9 +1371,13 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
# uploader_id # uploader_id
video_uploader_id = None video_uploader_id = None
mobj = re.search(r'<link itemprop="url" href="http://www.youtube.com/(?:user|channel)/([^"]+)">', video_webpage) video_uploader_url = None
mobj = re.search(
r'<link itemprop="url" href="(?P<uploader_url>https?://www.youtube.com/(?:user|channel)/(?P<uploader_id>[^"]+))">',
video_webpage)
if mobj is not None: if mobj is not None:
video_uploader_id = mobj.group(1) video_uploader_id = mobj.group('uploader_id')
video_uploader_url = mobj.group('uploader_url')
else: else:
self._downloader.report_warning('unable to extract uploader nickname') self._downloader.report_warning('unable to extract uploader nickname')
@@ -1275,6 +1405,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             upload_date = ' '.join(re.sub(r'[/,-]', r' ', mobj.group(1)).split())
         upload_date = unified_strdate(upload_date)

+        video_license = self._html_search_regex(
+            r'<h4[^>]+class="title"[^>]*>\s*License\s*</h4>\s*<ul[^>]*>\s*<li>(.+?)</li',
+            video_webpage, 'license', default=None)
+
         m_music = re.search(
             r'<h4[^>]+class="title"[^>]*>\s*Music\s*</h4>\s*<ul[^>]*>\s*<li>(?P<title>.+?) by (?P<creator>.+?)(?:\(.+?\))?</li',
             video_webpage)
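The license sits in the same `<h4 class="title">...<ul><li>` layout as the music credits below it. A spot check of the new regex against a trimmed sample of the watch-page markup (the HTML snippet is an assumption, not a recorded fixture):

```python
import re

video_webpage = (
    '<h4 class="title">\n  License\n</h4>\n'
    '<ul class="content watch-info-tag-list">\n'
    '  <li>Standard YouTube License</li>\n</ul>'
)
m = re.search(
    r'<h4[^>]+class="title"[^>]*>\s*License\s*</h4>\s*<ul[^>]*>\s*<li>(.+?)</li',
    video_webpage)
print(m.group(1))  # Standard YouTube License
```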
@@ -1348,6 +1482,19 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             encoded_url_map = video_info.get('url_encoded_fmt_stream_map', [''])[0] + ',' + video_info.get('adaptive_fmts', [''])[0]
             if 'rtmpe%3Dyes' in encoded_url_map:
                 raise ExtractorError('rtmpe downloads are not supported, see https://github.com/rg3/youtube-dl/issues/343 for more information.', expected=True)
+            formats_spec = {}
+            fmt_list = video_info.get('fmt_list', [''])[0]
+            if fmt_list:
+                for fmt in fmt_list.split(','):
+                    spec = fmt.split('/')
+                    if len(spec) > 1:
+                        width_height = spec[1].split('x')
+                        if len(width_height) == 2:
+                            formats_spec[spec[0]] = {
+                                'resolution': spec[1],
+                                'width': int_or_none(width_height[0]),
+                                'height': int_or_none(width_height[1]),
+                            }
             formats = []
             for url_data_str in encoded_url_map.split(','):
                 url_data = compat_parse_qs(url_data_str)
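`fmt_list` pairs an itag with its `WIDTHxHEIGHT` resolution, `/`-separated from trailing fields the loop above ignores. The same parsing on a made-up sample value:

```python
fmt_list = '22/1280x720/9/0/115,43/640x360/99/0/0'  # sample value (assumption)

formats_spec = {}
for fmt in fmt_list.split(','):
    spec = fmt.split('/')
    if len(spec) > 1:
        width_height = spec[1].split('x')
        if len(width_height) == 2:
            formats_spec[spec[0]] = {
                'resolution': spec[1],
                'width': int(width_height[0]),
                'height': int(width_height[1]),
            }

print(formats_spec['22'])
# {'resolution': '1280x720', 'width': 1280, 'height': 720}
```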
@@ -1416,6 +1563,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 }
                 if format_id in self._formats:
                     dct.update(self._formats[format_id])
+                if format_id in formats_spec:
+                    dct.update(formats_spec[format_id])

                 # Some itags are not included in DASH manifest thus corresponding formats will
                 # lack metadata (see https://github.com/rg3/youtube-dl/pull/5993).
@@ -1528,11 +1677,15 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
         self._sort_formats(formats)

+        self.mark_watched(video_id, video_info)
+
         return {
             'id': video_id,
             'uploader': video_uploader,
             'uploader_id': video_uploader_id,
+            'uploader_url': video_uploader_url,
             'upload_date': upload_date,
+            'license': video_license,
             'creator': video_creator,
             'title': video_title,
             'alt_title': video_alt_title,

youtube_dl/extractor/zdf.py

@@ -137,6 +137,10 @@ class ZDFIE(InfoExtractor):
                 formats.extend(self._extract_smil_formats(
                     video_url, video_id, fatal=False))
             elif ext == 'm3u8':
+                # the certificates are misconfigured (see
+                # https://github.com/rg3/youtube-dl/issues/8665)
+                if video_url.startswith('https://'):
+                    continue
                 formats.extend(self._extract_m3u8_formats(
                     video_url, video_id, 'mp4', m3u8_id=format_id, fatal=False))
             elif ext == 'f4m':

youtube_dl/options.py

@@ -170,6 +170,14 @@ def parseOpts(overrideArguments=None):
         action='store_const', dest='extract_flat', const='in_playlist',
         default=False,
         help='Do not extract the videos of a playlist, only list them.')
+    general.add_option(
+        '--mark-watched',
+        action='store_true', dest='mark_watched', default=False,
+        help='Mark videos watched (YouTube only)')
+    general.add_option(
+        '--no-mark-watched',
+        action='store_false', dest='mark_watched', default=False,
+        help='Do not mark videos watched (YouTube only)')
     general.add_option(
         '--no-color', '--no-colors',
         action='store_true', dest='no_color',

youtube_dl/postprocessor/__init__.py

@@ -6,6 +6,7 @@ from .ffmpeg import (
     FFmpegEmbedSubtitlePP,
     FFmpegExtractAudioPP,
     FFmpegFixupStretchedPP,
+    FFmpegFixupM3u8PP,
     FFmpegFixupM4aPP,
     FFmpegMergerPP,
     FFmpegMetadataPP,
@@ -26,6 +27,7 @@ __all__ = [
     'ExecAfterDownloadPP',
     'FFmpegEmbedSubtitlePP',
     'FFmpegExtractAudioPP',
+    'FFmpegFixupM3u8PP',
     'FFmpegFixupM4aPP',
     'FFmpegFixupStretchedPP',
     'FFmpegMergerPP',

youtube_dl/postprocessor/ffmpeg.py

@@ -25,6 +25,19 @@ from ..utils import (
 )


+EXT_TO_OUT_FORMATS = {
+    "aac": "adts",
+    "m4a": "ipod",
+    "mka": "matroska",
+    "mkv": "matroska",
+    "mpg": "mpeg",
+    "ogv": "ogg",
+    "ts": "mpegts",
+    "wma": "asf",
+    "wmv": "asf",
+}
+
+
 class FFmpegPostProcessorError(PostProcessingError):
     pass
@@ -391,10 +404,6 @@ class FFmpegMetadataPP(FFmpegPostProcessor):
         for (name, value) in metadata.items():
             options.extend(['-metadata', '%s=%s' % (name, value)])

-        # https://github.com/rg3/youtube-dl/issues/8350
-        if info.get('protocol') == 'm3u8_native' or info.get('protocol') == 'm3u8' and self._downloader.params.get('hls_prefer_native', False):
-            options.extend(['-bsf:a', 'aac_adtstoasc'])
-
         self._downloader.to_screen('[ffmpeg] Adding metadata to \'%s\'' % filename)
         self.run_ffmpeg(filename, temp_filename, options)
         os.remove(encodeFilename(filename))
@@ -467,6 +476,21 @@ class FFmpegFixupM4aPP(FFmpegPostProcessor):
         return [], info


+class FFmpegFixupM3u8PP(FFmpegPostProcessor):
+    def run(self, info):
+        filename = info['filepath']
+        temp_filename = prepend_extension(filename, 'temp')
+
+        options = ['-c', 'copy', '-f', 'mp4', '-bsf:a', 'aac_adtstoasc']
+        self._downloader.to_screen('[ffmpeg] Fixing malformed aac bitstream in "%s"' % filename)
+        self.run_ffmpeg(filename, temp_filename, options)
+
+        os.remove(encodeFilename(filename))
+        os.rename(encodeFilename(temp_filename), encodeFilename(filename))
+
+        return [], info
+
+
 class FFmpegSubtitlesConvertorPP(FFmpegPostProcessor):
     def __init__(self, downloader=None, format=None):
         super(FFmpegSubtitlesConvertorPP, self).__init__(downloader)
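The fixup is a pure remux: copy both streams into a fresh MP4 and run the `aac_adtstoasc` bitstream filter so ADTS audio from the native HLS downloader becomes valid inside an MP4 container. Roughly the standalone equivalent (a sketch assuming `ffmpeg` on PATH, not the post-processor's actual invocation path):

```python
import subprocess

def fixup_m3u8(filename, temp_filename):
    # Same options the post-processor passes: no re-encode, just a remux
    # plus the ADTS-to-ASC audio bitstream filter.
    subprocess.check_call([
        'ffmpeg', '-y', '-i', filename,
        '-c', 'copy', '-f', 'mp4', '-bsf:a', 'aac_adtstoasc',
        temp_filename,
    ])
```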

youtube_dl/postprocessor/xattrpp.py

@@ -6,6 +6,7 @@ import sys
 import errno

 from .common import PostProcessor
+from ..compat import compat_os_name
 from ..utils import (
     check_executable,
     hyphenate_date,
@@ -73,7 +74,7 @@ class XAttrMetadataPP(PostProcessor):
                     raise XAttrMetadataError(e.errno, e.strerror)

         except ImportError:
-            if os.name == 'nt':
+            if compat_os_name == 'nt':
                 # Write xattrs to NTFS Alternate Data Streams:
                 # http://en.wikipedia.org/wiki/NTFS#Alternate_data_streams_.28ADS.29
                 def write_xattr(path, key, value):
@@ -168,7 +169,7 @@ class XAttrMetadataPP(PostProcessor):
                         'Unable to write extended attributes due to too long values.')
                 else:
                     msg = 'This filesystem doesn\'t support extended attributes. '
-                    if os.name == 'nt':
+                    if compat_os_name == 'nt':
                         msg += 'You need to use NTFS.'
                     else:
                         msg += '(You may have to enable them in your /etc/fstab)'

youtube_dl/utils.py

@@ -160,8 +160,6 @@ if sys.version_info >= (2, 7):
     def find_xpath_attr(node, xpath, key, val=None):
         """ Find the xpath xpath[@key=val] """
         assert re.match(r'^[a-zA-Z_-]+$', key)
-        if val:
-            assert re.match(r'^[a-zA-Z0-9@\s:._-]*$', val)
         expr = xpath + ('[@%s]' % key if val is None else "[@%s='%s']" % (key, val))
         return node.find(expr)
 else:
@@ -467,6 +465,10 @@ def encodeFilename(s, for_subprocess=False):
     if not for_subprocess and sys.platform == 'win32' and sys.getwindowsversion()[0] >= 5:
         return s

+    # Jython assumes filenames are Unicode strings though reported as Python 2.x compatible
+    if sys.platform.startswith('java'):
+        return s
+
     return s.encode(get_subprocess_encoding(), 'ignore')
@@ -1217,13 +1219,23 @@ if sys.platform == 'win32':
             raise OSError('Unlocking file failed: %r' % ctypes.FormatError())
 else:
-    import fcntl
+    # Some platforms, such as Jython, are missing fcntl
+    try:
+        import fcntl

-    def _lock_file(f, exclusive):
-        fcntl.flock(f, fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH)
+        def _lock_file(f, exclusive):
+            fcntl.flock(f, fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH)

-    def _unlock_file(f):
-        fcntl.flock(f, fcntl.LOCK_UN)
+        def _unlock_file(f):
+            fcntl.flock(f, fcntl.LOCK_UN)
+    except ImportError:
+        UNSUPPORTED_MSG = 'file locking is not supported on this platform'
+
+        def _lock_file(f, exclusive):
+            raise IOError(UNSUPPORTED_MSG)
+
+        def _unlock_file(f):
+            raise IOError(UNSUPPORTED_MSG)


 class locked_file(object):
@@ -1304,6 +1316,17 @@ def format_bytes(bytes):
     return '%.2f%s' % (converted, suffix)


+def lookup_unit_table(unit_table, s):
+    units_re = '|'.join(re.escape(u) for u in unit_table)
+    m = re.match(
+        r'(?P<num>[0-9]+(?:[,.][0-9]*)?)\s*(?P<unit>%s)' % units_re, s)
+    if not m:
+        return None
+    num_str = m.group('num').replace(',', '.')
+    mult = unit_table[m.group('unit')]
+    return int(float(num_str) * mult)
+
+
 def parse_filesize(s):
     if s is None:
         return None
@@ -1347,15 +1370,28 @@ def parse_filesize(s):
         'Yb': 1000 ** 8,
     }

-    units_re = '|'.join(re.escape(u) for u in _UNIT_TABLE)
-    m = re.match(
-        r'(?P<num>[0-9]+(?:[,.][0-9]*)?)\s*(?P<unit>%s)' % units_re, s)
-    if not m:
+    return lookup_unit_table(_UNIT_TABLE, s)
+
+
+def parse_count(s):
+    if s is None:
         return None
-    num_str = m.group('num').replace(',', '.')
-    mult = _UNIT_TABLE[m.group('unit')]
-    return int(float(num_str) * mult)
+
+    s = s.strip()
+
+    if re.match(r'^[\d,.]+$', s):
+        return str_to_int(s)
+
+    _UNIT_TABLE = {
+        'k': 1000,
+        'K': 1000,
+        'm': 1000 ** 2,
+        'M': 1000 ** 2,
+        'kk': 1000 ** 2,
+        'KK': 1000 ** 2,
+    }
+
+    return lookup_unit_table(_UNIT_TABLE, s)


 def month_by_name(name):
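With the shared helper in place, `parse_filesize` keeps its binary/decimal size units and `parse_count` gets the abbreviated-count ones. A quick behaviour check (assumes a youtube-dl of this vintage is importable):

```python
from youtube_dl.utils import parse_count, parse_filesize

assert parse_count('512') == 512          # plain digits: str_to_int path
assert parse_count('1.5M') == 1500000     # unit path via lookup_unit_table
assert parse_filesize('1,5 GiB') == 1610612736  # comma decimal separator
```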
@@ -1387,6 +1423,12 @@ def fix_xml_ampersands(xml_str):

 def setproctitle(title):
     assert isinstance(title, compat_str)
+
+    # ctypes in Jython is not complete
+    # http://bugs.jython.org/issue2148
+    if sys.platform.startswith('java'):
+        return
+
     try:
         libc = ctypes.cdll.LoadLibrary('libc.so.6')
     except OSError:
@@ -1721,6 +1763,15 @@ def urlencode_postdata(*args, **kargs):
     return compat_urllib_parse.urlencode(*args, **kargs).encode('ascii')


+def update_url_query(url, query):
+    parsed_url = compat_urlparse.urlparse(url)
+    qs = compat_parse_qs(parsed_url.query)
+    qs.update(query)
+    qs = encode_dict(qs)
+    return compat_urlparse.urlunparse(parsed_url._replace(
+        query=compat_urllib_parse.urlencode(qs, True)))
+
+
 def encode_dict(d, encoding='utf-8'):
     def encode(v):
         return v.encode(encoding) if isinstance(v, compat_basestring) else v
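`update_url_query` round-trips the query string through `parse_qs`/`urlencode`, so new parameters merge with, and can override, existing ones. A quick check (assumes a youtube-dl of this vintage is importable; parameter order in the output may vary):

```python
from youtube_dl.utils import update_url_query

print(update_url_query('http://example.com/path?a=1', {'b': '2'}))
# http://example.com/path?a=1&b=2
print(update_url_query('http://example.com/path?a=1', {'a': '3'}))
# http://example.com/path?a=3
```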
@@ -2619,3 +2670,41 @@ def ohdave_rsa_encrypt(data, exponent, modulus):
     payload = int(binascii.hexlify(data[::-1]), 16)
     encrypted = pow(payload, exponent, modulus)
     return '%x' % encrypted
+
+
+def encode_base_n(num, n, table=None):
+    FULL_TABLE = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
+    if not table:
+        table = FULL_TABLE[:n]
+
+    if n > len(table):
+        raise ValueError('base %d exceeds table length %d' % (n, len(table)))
+
+    if num == 0:
+        return table[0]
+
+    ret = ''
+    while num:
+        ret = table[num % n] + ret
+        num = num // n
+    return ret
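With the default table, `encode_base_n` is a plain positional-base converter: base 16 lines up with `hex`, and the higher bases produce the short tokens the unpacker below needs:

```python
from youtube_dl.utils import encode_base_n

assert encode_base_n(255, 16) == 'ff'
assert encode_base_n(0, 36) == '0'
print(encode_base_n(123456, 62))  # w7e
```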
+
+
+def decode_packed_codes(code):
+    mobj = re.search(
+        r"}\('(.+)',(\d+),(\d+),'([^']+)'\.split\('\|'\)",
+        code)
+    obfuscated_code, base, count, symbols = mobj.groups()
+    base = int(base)
+    count = int(count)
+    symbols = symbols.split('|')
+    symbol_table = {}
+
+    while count:
+        count -= 1
+        base_n_count = encode_base_n(count, base)
+        symbol_table[base_n_count] = symbols[count] or base_n_count
+
+    return re.sub(
+        r'\b(\w+)\b', lambda mobj: symbol_table[mobj.group(0)],
+        obfuscated_code)
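A round-trip check of the unpacker on a tiny hand-built P.A.C.K.E.R.-style payload (base 10, three symbols; the sample string is an assumption shaped like real packed player JS):

```python
from youtube_dl.utils import decode_packed_codes

packed = ("eval(function(p,a,c,k,e,d){...}"
          "('0 1=\"2\";',10,3,'var|player|test'.split('|')))")
print(decode_packed_codes(packed))  # var player="test";
```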

youtube_dl/version.py

@@ -1,3 +1,3 @@
 from __future__ import unicode_literals

-__version__ = '2016.02.22'
+__version__ = '2016.03.14'