Merge pull request #9 from rg3/master

update 22 may
siddht1 2016-05-22 10:29:46 +05:30
commit db00818fad
64 changed files with 2319 additions and 931 deletions

View File

@ -6,8 +6,8 @@
---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.05.01*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.05.01**
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.05.21.2*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.05.21.2**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2016.05.01
[debug] youtube-dl version 2016.05.21.2
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

2
.gitignore vendored

@ -31,7 +31,9 @@ updates_key.pem
*.part
*.swp
test/testdata
test/local_parameters.json
.tox
youtube-dl.zsh
.idea
.idea/*
tmp/

View File

@ -7,6 +7,9 @@ python:
- "3.4"
- "3.5"
sudo: false
install:
- bash ./devscripts/install_srelay.sh
- export PATH=$PATH:$(pwd)/tmp/srelay-0.4.8b6
script: nosetests test --verbose
notifications:
email:

View File

@ -1,7 +1,7 @@
all: youtube-dl README.md CONTRIBUTING.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish supportedsites
clean:
rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish youtube_dl/extractor/lazy_extractors.py *.dump *.part *.info.json *.mp4 *.flv *.mp3 *.avi *.mkv *.webm CONTRIBUTING.md.tmp ISSUE_TEMPLATE.md.tmp youtube-dl youtube-dl.exe
rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish youtube_dl/extractor/lazy_extractors.py *.dump *.part *.info.json *.mp4 *.flv *.mp3 *.avi *.mkv *.webm *.jpg *.png CONTRIBUTING.md.tmp ISSUE_TEMPLATE.md.tmp youtube-dl youtube-dl.exe
find . -name "*.pyc" -delete
find . -name "*.class" -delete
@ -37,7 +37,7 @@ test:
ot: offlinetest
offlinetest: codetest
$(PYTHON) -m nose --verbose test --exclude test_download.py --exclude test_age_restriction.py --exclude test_subtitles.py --exclude test_write_annotations.py --exclude test_youtube_lists.py --exclude test_iqiyi_sdk_interpreter.py
$(PYTHON) -m nose --verbose test --exclude test_download.py --exclude test_age_restriction.py --exclude test_subtitles.py --exclude test_write_annotations.py --exclude test_youtube_lists.py --exclude test_iqiyi_sdk_interpreter.py --exclude test_socks.py
tar: youtube-dl.tar.gz

View File

@ -25,7 +25,7 @@ If you do not have curl, you can alternatively use a recent wget:
sudo wget https://yt-dl.org/downloads/latest/youtube-dl -O /usr/local/bin/youtube-dl
sudo chmod a+rx /usr/local/bin/youtube-dl
Windows users can [download a .exe file](https://yt-dl.org/latest/youtube-dl.exe) and place it in their home directory or any other location on their [PATH](http://en.wikipedia.org/wiki/PATH_%28variable%29).
Windows users can [download an .exe file](https://yt-dl.org/latest/youtube-dl.exe) and place it in any location on their [PATH](http://en.wikipedia.org/wiki/PATH_%28variable%29) except for `%SYSTEMROOT%\System32` (e.g. **do not** put in `C:\Windows\System32`).
OS X users can install **youtube-dl** with [Homebrew](http://brew.sh/).
@ -85,9 +85,11 @@ which means you can modify it, redistribute it or use it however you like.
--no-color Do not emit color codes in output
## Network Options:
--proxy URL Use the specified HTTP/HTTPS proxy. Pass in
an empty string (--proxy "") for direct
connection
--proxy URL Use the specified HTTP/HTTPS/SOCKS proxy.
To enable experimental SOCKS proxy, specify
a proper scheme. For example
socks5://127.0.0.1:1080/. Pass in an empty
string (--proxy "") for direct connection
--socket-timeout SECONDS Time to wait before giving up, in seconds
--source-address IP Client-side IP address to bind to
(experimental)
@ -415,7 +417,7 @@ which means you can modify it, redistribute it or use it however you like.
# CONFIGURATION
You can configure youtube-dl by placing any supported command line option to a configuration file. On Linux, the system wide configuration file is located at `/etc/youtube-dl.conf` and the user wide configuration file at `~/.config/youtube-dl/config`. On Windows, the user wide configuration file locations are `%APPDATA%\youtube-dl\config.txt` or `C:\Users\<user name>\youtube-dl.conf`.
You can configure youtube-dl by placing any supported command line option to a configuration file. On Linux and OS X, the system wide configuration file is located at `/etc/youtube-dl.conf` and the user wide configuration file at `~/.config/youtube-dl/config`. On Windows, the user wide configuration file locations are `%APPDATA%\youtube-dl\config.txt` or `C:\Users\<user name>\youtube-dl.conf`.
For example, with the following configuration file youtube-dl will always extract the audio, not copy the mtime, use a proxy and save all videos under `Movies` directory in your home directory:
```
@ -431,7 +433,7 @@ You can use `--ignore-config` if you want to disable the configuration file for
### Authentication with `.netrc` file
You may also want to configure automatic credentials storage for extractors that support authentication (by providing login and password with `--username` and `--password`) in order not to pass credentials as command line arguments on every youtube-dl execution and prevent tracking plain text passwords in the shell command history. You can achieve this using a [`.netrc` file](http://stackoverflow.com/tags/.netrc/info) on per extractor basis. For that you will need to create a`.netrc` file in your `$HOME` and restrict permissions to read/write by you only:
You may also want to configure automatic credentials storage for extractors that support authentication (by providing login and password with `--username` and `--password`) in order not to pass credentials as command line arguments on every youtube-dl execution and prevent tracking plain text passwords in the shell command history. You can achieve this using a [`.netrc` file](http://stackoverflow.com/tags/.netrc/info) on per extractor basis. For that you will need to create a `.netrc` file in your `$HOME` and restrict permissions to read/write by you only:
```
touch $HOME/.netrc
chmod a-rwx,u+rw $HOME/.netrc

8
devscripts/install_srelay.sh Executable file

@ -0,0 +1,8 @@
#!/bin/bash
mkdir -p tmp && cd tmp
wget -N http://downloads.sourceforge.net/project/socks-relay/socks-relay/srelay-0.4.8/srelay-0.4.8b6.tar.gz
tar zxvf srelay-0.4.8b6.tar.gz
cd srelay-0.4.8b6
./configure
make

View File

@ -33,6 +33,8 @@ if [ ! -z "`git status --porcelain | grep -v CHANGELOG`" ]; then echo 'ERROR: th
useless_files=$(find youtube_dl -type f -not -name '*.py')
if [ ! -z "$useless_files" ]; then echo "ERROR: Non-.py files in youtube_dl: $useless_files"; exit 1; fi
if [ ! -f "updates_key.pem" ]; then echo 'ERROR: updates_key.pem missing'; exit 1; fi
if ! type pandoc >/dev/null 2>/dev/null; then echo 'ERROR: pandoc is missing'; exit 1; fi
if ! python3 -c 'import rsa' 2>/dev/null; then echo 'ERROR: python3-rsa is missing'; exit 1; fi
/bin/echo -e "\n### First of all, testing..."
make clean

View File

@ -6,6 +6,7 @@
- **22tracks:genre**
- **22tracks:track**
- **24video**
- **3qsdn**: 3Q SDN
- **3sat**
- **4tube**
- **56.com**
@ -15,6 +16,8 @@
- **9gag**
- **abc.net.au**
- **Abc7News**
- **abcnews**
- **abcnews:video**
- **AcademicEarth:Course**
- **acast**
- **acast:channel**
@ -77,6 +80,7 @@
- **Bild**: Bild.de
- **BiliBili**
- **BioBioChileTV**
- **BIQLE**
- **BleacherReport**
- **BleacherReportCMS**
- **blinkx**
@ -102,6 +106,7 @@
- **CBCPlayer**
- **CBS**
- **CBSInteractive**
- **CBSLocal**
- **CBSNews**: CBS News
- **CBSNewsLiveVideo**: CBS News Live Videos
- **CBSSports**
@ -113,7 +118,6 @@
- **chirbit**
- **chirbit:profile**
- **Cinchcast**
- **Cinemassacre**
- **Clipfish**
- **cliphunter**
- **ClipRs**
@ -127,7 +131,6 @@
- **CNN**
- **CNNArticle**
- **CNNBlogs**
- **CollegeHumor**
- **CollegeRama**
- **ComCarCoff**
- **ComedyCentral**
@ -145,6 +148,7 @@
- **culturebox.francetvinfo.fr**
- **CultureUnplugged**
- **CWTV**
- **DailyMail**
- **dailymotion**
- **dailymotion:playlist**
- **dailymotion:user**
@ -212,6 +216,7 @@
- **Flickr**
- **Folketinget**: Folketinget (ft.dk; Danish parliament)
- **FootyRoom**
- **Formula1**
- **FOX**
- **Foxgay**
- **FoxNews**: Fox News and Fox Business Video
@ -315,6 +320,7 @@
- **la7.tv**
- **Laola1Tv**
- **Le**: 乐视网
- **Learnr**
- **Lecture2Go**
- **Lemonde**
- **LePlaylist**
@ -325,10 +331,12 @@
- **limelight**
- **limelight:channel**
- **limelight:channel_list**
- **LiTV**
- **LiveLeak**
- **livestream**
- **livestream:original**
- **LnkGo**
- **LocalNews8**
- **LoveHomePorn**
- **lrt.lt**
- **lynda**: lynda.com videos
@ -374,6 +382,8 @@
- **mtvservices:embedded**
- **MuenchenTV**: münchen.tv
- **MusicPlayOn**
- **mva**: Microsoft Virtual Academy videos
- **mva:course**: Microsoft Virtual Academy courses
- **Mwave**
- **MwaveMeetGreet**
- **MySpace**
@ -463,7 +473,8 @@
- **pbs**: Public Broadcasting Service (PBS) and member stations: PBS: Public Broadcasting Service, APT - Alabama Public Television (WBIQ), GPB/Georgia Public Broadcasting (WGTV), Mississippi Public Broadcasting (WMPN), Nashville Public Television (WNPT), WFSU-TV (WFSU), WSRE (WSRE), WTCI (WTCI), WPBA/Channel 30 (WPBA), Alaska Public Media (KAKM), Arizona PBS (KAET), KNME-TV/Channel 5 (KNME), Vegas PBS (KLVX), AETN/ARKANSAS ETV NETWORK (KETS), KET (WKLE), WKNO/Channel 10 (WKNO), LPB/LOUISIANA PUBLIC BROADCASTING (WLPB), OETA (KETA), Ozarks Public Television (KOZK), WSIU Public Broadcasting (WSIU), KEET TV (KEET), KIXE/Channel 9 (KIXE), KPBS San Diego (KPBS), KQED (KQED), KVIE Public Television (KVIE), PBS SoCal/KOCE (KOCE), ValleyPBS (KVPT), CONNECTICUT PUBLIC TELEVISION (WEDH), KNPB Channel 5 (KNPB), SOPTV (KSYS), Rocky Mountain PBS (KRMA), KENW-TV3 (KENW), KUED Channel 7 (KUED), Wyoming PBS (KCWC), Colorado Public Television / KBDI 12 (KBDI), KBYU-TV (KBYU), Thirteen/WNET New York (WNET), WGBH/Channel 2 (WGBH), WGBY (WGBY), NJTV Public Media NJ (WNJT), WLIW21 (WLIW), mpt/Maryland Public Television (WMPB), WETA Television and Radio (WETA), WHYY (WHYY), PBS 39 (WLVT), WVPT - Your Source for PBS and More! (WVPT), Howard University Television (WHUT), WEDU PBS (WEDU), WGCU Public Media (WGCU), WPBT2 (WPBT), WUCF TV (WUCF), WUFT/Channel 5 (WUFT), WXEL/Channel 42 (WXEL), WLRN/Channel 17 (WLRN), WUSF Public Broadcasting (WUSF), ETV (WRLK), UNC-TV (WUNC), PBS Hawaii - Oceanic Cable Channel 10 (KHET), Idaho Public Television (KAID), KSPS (KSPS), OPB (KOPB), KWSU/Channel 10 & KTNW/Channel 31 (KWSU), WILL-TV (WILL), Network Knowledge - WSEC/Springfield (WSEC), WTTW11 (WTTW), Iowa Public Television/IPTV (KDIN), Nine Network (KETC), PBS39 Fort Wayne (WFWA), WFYI Indianapolis (WFYI), Milwaukee Public Television (WMVS), WNIN (WNIN), WNIT Public Television (WNIT), WPT (WPNE), WVUT/Channel 22 (WVUT), WEIU/Channel 51 (WEIU), WQPT-TV (WQPT), WYCC PBS Chicago (WYCC), WIPB-TV (WIPB), WTIU (WTIU), CET (WCET), ThinkTVNetwork (WPTD), WBGU-TV (WBGU), WGVU TV (WGVU), NET1 (KUON), Pioneer Public Television (KWCM), SDPB Television (KUSD), TPT (KTCA), KSMQ (KSMQ), KPTS/Channel 8 (KPTS), KTWU/Channel 11 (KTWU), East Tennessee PBS (WSJK), WCTE-TV (WCTE), WLJT, Channel 11 (WLJT), WOSU TV (WOSU), WOUB/WOUC (WOUB), WVPB (WVPB), WKYU-PBS (WKYU), KERA 13 (KERA), MPBN (WCBB), Mountain Lake PBS (WCFE), NHPTV (WENH), Vermont PBS (WETK), witf (WITF), WQED Multimedia (WQED), WMHT Educational Telecommunications (WMHT), Q-TV (WDCQ), WTVS Detroit Public TV (WTVS), CMU Public Television (WCMU), WKAR-TV (WKAR), WNMU-TV Public TV 13 (WNMU), WDSE - WRPT (WDSE), WGTE TV (WGTE), Lakeland Public Television (KAWE), KMOS-TV - Channels 6.1, 6.2 and 6.3 (KMOS), MontanaPBS (KUSM), KRWG/Channel 22 (KRWG), KACV (KACV), KCOS/Channel 13 (KCOS), WCNY/Channel 24 (WCNY), WNED (WNED), WPBS (WPBS), WSKG Public TV (WSKG), WXXI (WXXI), WPSU (WPSU), WVIA Public Media Studios (WVIA), WTVI (WTVI), Western Reserve PBS (WNEO), WVIZ/PBS ideastream (WVIZ), KCTS 9 (KCTS), Basin PBS (KPBT), KUHT / Channel 8 (KUHT), KLRN (KLRN), KLRU (KLRU), WTJX Channel 12 (WTJX), WCVE PBS (WCVE), KBTC Public Television (KBTC)
- **pcmag**
- **People**
- **Periscope**: Periscope
- **periscope**: Periscope
- **periscope:user**: Periscope user videos
- **PhilharmonieDeParis**: Philharmonie de Paris
- **phoenix.de**
- **Photobucket**
@ -551,6 +562,7 @@
- **ScreenJunkies**
- **ScreenwaveMedia**
- **SenateISVP**
- **SendtoNews**
- **ServingSys**
- **Sexu**
- **Shahid**
@ -674,7 +686,6 @@
- **tvp.pl:Series**
- **TVPlay**: TV3Play and related services
- **Tweakers**
- **twitch:bookmarks**
- **twitch:chapter**
- **twitch:past_broadcasts**
- **twitch:profile**
@ -692,7 +703,8 @@
- **USAToday**
- **ustream**
- **ustream:channel**
- **Ustudio**
- **ustudio**
- **ustudio:embed**
- **Varzesh3**
- **Vbox7**
- **VeeHD**
@ -700,6 +712,7 @@
- **Vessel**
- **Vesti**: Вести.Ru
- **Vevo**
- **VevoPlaylist**
- **VGTV**: VGTV, BTTV, FTV, Aftenposten and Aftonbladet
- **vh1.com**
- **Vice**
@ -772,7 +785,7 @@
- **WSJ**: Wall Street Journal
- **XBef**
- **XboxClips**
- **XFileShare**: XFileShare based sites: GorillaVid.in, daclips.in, movpod.in, fastvideo.in, realvid.net, filehoot.com and vidto.me
- **XFileShare**: XFileShare based sites: DaClips, FileHoot, GorillaVid, MovPod, PowerWatch, Rapidvideo.ws, TheVideoBee, Vidto, Streamin.To
- **XHamster**
- **XHamsterEmbed**
- **xiami:album**: 虾米音乐 - 专辑

View File

@ -24,8 +24,13 @@ from youtube_dl.utils import (
def get_params(override=None):
PARAMETERS_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)),
"parameters.json")
LOCAL_PARAMETERS_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)),
"local_parameters.json")
with io.open(PARAMETERS_FILE, encoding='utf-8') as pf:
parameters = json.load(pf)
if os.path.exists(LOCAL_PARAMETERS_FILE):
with io.open(LOCAL_PARAMETERS_FILE, encoding='utf-8') as pf:
parameters.update(json.load(pf))
if override:
parameters.update(override)
return parameters
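
The `get_params` hunk above layers three sources of test parameters: `parameters.json`, an optional `local_parameters.json`, and a caller-supplied `override`, with later sources taking precedence. A minimal standalone sketch of that precedence follows. It is not the project's code: `load_layered_params` and the paths passed to it are hypothetical names used only for illustration.

```python
import io
import json
import os


def load_layered_params(base_path, local_path, override=None):
    """Merge parameters so that later sources win: base < local file < override."""
    with io.open(base_path, encoding='utf-8') as f:
        params = json.load(f)
    # The local file is optional, so a missing file is simply skipped.
    if os.path.exists(local_path):
        with io.open(local_path, encoding='utf-8') as f:
            params.update(json.load(f))
    if override:
        params.update(override)
    return params
```

Because the local file is listed in `.gitignore` in this commit, machine-specific values such as proxy addresses can presumably live there without being committed.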

View File

@ -17,6 +17,7 @@ from youtube_dl.compat import (
compat_expanduser,
compat_shlex_split,
compat_str,
compat_struct_unpack,
compat_urllib_parse_unquote,
compat_urllib_parse_unquote_plus,
compat_urllib_parse_urlencode,
@ -102,5 +103,9 @@ class TestCompat(unittest.TestCase):
self.assertTrue(isinstance(doc.find('chinese').text, compat_str))
self.assertTrue(isinstance(doc.find('foo/bar').text, compat_str))
def test_struct_unpack(self):
self.assertEqual(compat_struct_unpack('!B', b'\x00'), (0,))
if __name__ == '__main__':
unittest.main()

118
test/test_socks.py Normal file

@ -0,0 +1,118 @@
#!/usr/bin/env python
# coding: utf-8
from __future__ import unicode_literals
# Allow direct execution
import os
import sys
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import random
import subprocess
from test.helper import (
FakeYDL,
get_params,
)
from youtube_dl.compat import (
compat_str,
compat_urllib_request,
)
class TestMultipleSocks(unittest.TestCase):
@staticmethod
def _check_params(attrs):
params = get_params()
for attr in attrs:
if attr not in params:
print('Missing %s. Skipping.' % attr)
return
return params
def test_proxy_http(self):
params = self._check_params(['primary_proxy', 'primary_server_ip'])
if params is None:
return
ydl = FakeYDL({
'proxy': params['primary_proxy']
})
self.assertEqual(
ydl.urlopen('http://yt-dl.org/ip').read().decode('utf-8'),
params['primary_server_ip'])
def test_proxy_https(self):
params = self._check_params(['primary_proxy', 'primary_server_ip'])
if params is None:
return
ydl = FakeYDL({
'proxy': params['primary_proxy']
})
self.assertEqual(
ydl.urlopen('https://yt-dl.org/ip').read().decode('utf-8'),
params['primary_server_ip'])
def test_secondary_proxy_http(self):
params = self._check_params(['secondary_proxy', 'secondary_server_ip'])
if params is None:
return
ydl = FakeYDL()
req = compat_urllib_request.Request('http://yt-dl.org/ip')
req.add_header('Ytdl-request-proxy', params['secondary_proxy'])
self.assertEqual(
ydl.urlopen(req).read().decode('utf-8'),
params['secondary_server_ip'])
def test_secondary_proxy_https(self):
params = self._check_params(['secondary_proxy', 'secondary_server_ip'])
if params is None:
return
ydl = FakeYDL()
req = compat_urllib_request.Request('https://yt-dl.org/ip')
req.add_header('Ytdl-request-proxy', params['secondary_proxy'])
self.assertEqual(
ydl.urlopen(req).read().decode('utf-8'),
params['secondary_server_ip'])
class TestSocks(unittest.TestCase):
_SKIP_SOCKS_TEST = True
def setUp(self):
if self._SKIP_SOCKS_TEST:
return
self.port = random.randint(20000, 30000)
self.server_process = subprocess.Popen([
'srelay', '-f', '-i', '127.0.0.1:%d' % self.port],
stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
def tearDown(self):
if self._SKIP_SOCKS_TEST:
return
self.server_process.terminate()
self.server_process.communicate()
def _get_ip(self, protocol):
if self._SKIP_SOCKS_TEST:
return '127.0.0.1'
ydl = FakeYDL({
'proxy': '%s://127.0.0.1:%d' % (protocol, self.port),
})
return ydl.urlopen('http://yt-dl.org/ip').read().decode('utf-8')
def test_socks4(self):
self.assertTrue(isinstance(self._get_ip('socks4'), compat_str))
def test_socks4a(self):
self.assertTrue(isinstance(self._get_ip('socks4a'), compat_str))
def test_socks5(self):
self.assertTrue(isinstance(self._get_ip('socks5'), compat_str))
if __name__ == '__main__':
unittest.main()

View File

@ -50,12 +50,13 @@ from youtube_dl.utils import (
sanitize_path,
prepend_extension,
replace_extension,
remove_start,
remove_end,
remove_quotes,
shell_quote,
smuggle_url,
str_to_int,
strip_jsonp,
struct_unpack,
timeconvert,
unescapeHTML,
unified_strdate,
@ -156,8 +157,8 @@ class TestUtil(unittest.TestCase):
self.assertTrue(sanitize_filename(':', restricted=True) != '')
self.assertEqual(sanitize_filename(
'ÂÃÄÀÁÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ', restricted=True),
'AAAAAAAECEEEEIIIIDNOOOOOOUUUUYPssaaaaaaaeceeeeiiiionoooooouuuuypy')
'ÂÃÄÀÁÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØŒÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøœùúûüýþÿ', restricted=True),
'AAAAAAAECEEEEIIIIDNOOOOOOOEUUUUYPssaaaaaaaeceeeeiiiionoooooooeuuuuypy')
def test_sanitize_ids(self):
self.assertEqual(sanitize_filename('_n_cd26wFpw', is_id=True), '_n_cd26wFpw')
@ -216,6 +217,16 @@ class TestUtil(unittest.TestCase):
self.assertEqual(replace_extension('.abc', 'temp'), '.abc.temp')
self.assertEqual(replace_extension('.abc.ext', 'temp'), '.abc.temp')
def test_remove_start(self):
self.assertEqual(remove_start(None, 'A - '), None)
self.assertEqual(remove_start('A - B', 'A - '), 'B')
self.assertEqual(remove_start('B - A', 'A - '), 'B - A')
def test_remove_end(self):
self.assertEqual(remove_end(None, ' - B'), None)
self.assertEqual(remove_end('A - B', ' - B'), 'A')
self.assertEqual(remove_end('B - A', ' - B'), 'B - A')
def test_remove_quotes(self):
self.assertEqual(remove_quotes(None), None)
self.assertEqual(remove_quotes('"'), '"')
@ -457,9 +468,6 @@ class TestUtil(unittest.TestCase):
testPL(5, 2, (2, 99), [2, 3, 4])
testPL(5, 2, (20, 99), [])
def test_struct_unpack(self):
self.assertEqual(struct_unpack('!B', b'\x00'), (0,))
def test_read_batch_urls(self):
f = io.StringIO('''\xef\xbb\xbf foo
bar\r
@ -621,6 +629,15 @@ class TestUtil(unittest.TestCase):
json_code = js_to_json(inp)
self.assertEqual(json.loads(json_code), json.loads(inp))
inp = '''{
0:{src:'skipped', type: 'application/dash+xml'},
1:{src:'skipped', type: 'application/vnd.apple.mpegURL'},
}'''
self.assertEqual(js_to_json(inp), '''{
"0":{"src":"skipped", "type": "application/dash+xml"},
"1":{"src":"skipped", "type": "application/vnd.apple.mpegURL"}
}''')
def test_js_to_json_edgecases(self):
on = js_to_json("{abc_def:'1\\'\\\\2\\\\\\'3\"4'}")
self.assertEqual(json.loads(on), {"abc_def": "1'\\2\\'3\"4"})
@ -644,6 +661,27 @@ class TestUtil(unittest.TestCase):
on = js_to_json('{"abc": "def",}')
self.assertEqual(json.loads(on), {'abc': 'def'})
on = js_to_json('{ 0: /* " \n */ ",]" , }')
self.assertEqual(json.loads(on), {'0': ',]'})
on = js_to_json(r'["<p>x<\/p>"]')
self.assertEqual(json.loads(on), ['<p>x</p>'])
on = js_to_json(r'["\xaa"]')
self.assertEqual(json.loads(on), ['\u00aa'])
on = js_to_json("['a\\\nb']")
self.assertEqual(json.loads(on), ['ab'])
on = js_to_json('{0xff:0xff}')
self.assertEqual(json.loads(on), {'255': 255})
on = js_to_json('{077:077}')
self.assertEqual(json.loads(on), {'63': 63})
on = js_to_json('{42:42}')
self.assertEqual(json.loads(on), {'42': 42})
def test_extract_attributes(self):
self.assertEqual(extract_attributes('<e x="y">'), {'x': 'y'})
self.assertEqual(extract_attributes("<e x='y'>"), {'x': 'y'})

View File

@ -9,5 +9,6 @@ passenv = HOME
defaultargs = test --exclude test_download.py --exclude test_age_restriction.py
--exclude test_subtitles.py --exclude test_write_annotations.py
--exclude test_youtube_lists.py --exclude test_iqiyi_sdk_interpreter.py
--exclude test_socks.py
commands = nosetests --verbose {posargs:{[testenv]defaultargs}} # --with-coverage --cover-package=youtube_dl --cover-html
# test.test_download:TestDownload.test_NowVideo

View File

@ -64,6 +64,7 @@ from .utils import (
PostProcessingError,
preferredencoding,
prepend_extension,
register_socks_protocols,
render_table,
replace_extension,
SameFileError,
@ -325,7 +326,7 @@ class YoutubeDL(object):
['fribidi', '-c', 'UTF-8'] + width_args, **sp_kwargs)
self._output_channel = os.fdopen(master, 'rb')
except OSError as ose:
if ose.errno == 2:
if ose.errno == errno.ENOENT:
self.report_warning('Could not find fribidi executable, ignoring --bidi-workaround . Make sure that fribidi is an executable file in one of the directories in your $PATH.')
else:
raise
@ -361,6 +362,8 @@ class YoutubeDL(object):
for ph in self.params.get('progress_hooks', []):
self.add_progress_hook(ph)
register_socks_protocols()
def warn_if_short_id(self, argv):
# short YouTube ID starting with dash?
idxs = [
@ -717,6 +720,7 @@ class YoutubeDL(object):
result_type = ie_result.get('_type', 'video')
if result_type in ('url', 'url_transparent'):
ie_result['url'] = sanitize_url(ie_result['url'])
extract_flat = self.params.get('extract_flat', False)
if ((extract_flat == 'in_playlist' and 'playlist' in extra_info) or
extract_flat is True):

View File

@ -67,9 +67,9 @@ def _real_main(argv=None):
# Custom HTTP headers
if opts.headers is not None:
for h in opts.headers:
if h.find(':', 1) < 0:
if ':' not in h:
parser.error('wrong header formatting, it should be key:value, not "%s"' % h)
key, value = h.split(':', 2)
key, value = h.split(':', 1)
if opts.verbose:
write_string('[debug] Adding header from command line option %s:%s\n' % (key, value))
std_headers[key] = value
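
The header hunk above changes the split limit from 2 to 1. A small sketch of why that matters, assuming header strings of the form `key:value` where the value may itself contain colons (for example a URL); `parse_header` is a hypothetical helper, not a function in youtube-dl:

```python
def parse_header(h):
    """Split a 'key:value' header string at the first colon only."""
    if ':' not in h:
        raise ValueError(
            'wrong header formatting, it should be key:value, not "%s"' % h)
    # maxsplit=1 always yields exactly two parts, even when the value
    # contains further colons (as in 'Referer:http://example.com/').
    key, value = h.split(':', 1)
    return key, value
```

With a limit of 2, a string such as `Referer:http://example.com/` splits into three parts, and unpacking into `key, value` raises `ValueError`.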

View File

@ -11,6 +11,7 @@ import re
import shlex
import shutil
import socket
import struct
import subprocess
import sys
import itertools
@ -340,9 +341,9 @@ except ImportError: # Python 2
return parsed_result
try:
from shlex import quote as shlex_quote
from shlex import quote as compat_shlex_quote
except ImportError: # Python < 3.3
def shlex_quote(s):
def compat_shlex_quote(s):
if re.match(r'^[-_\w./]+$', s):
return s
else:
@ -465,18 +466,6 @@ else:
print(s)
try:
subprocess_check_output = subprocess.check_output
except AttributeError:
def subprocess_check_output(*args, **kwargs):
assert 'input' not in kwargs
p = subprocess.Popen(*args, stdout=subprocess.PIPE, **kwargs)
output, _ = p.communicate()
ret = p.poll()
if ret:
raise subprocess.CalledProcessError(ret, p.args, output=output)
return output
if sys.version_info < (3, 0) and sys.platform == 'win32':
def compat_getpass(prompt, *args, **kwargs):
if isinstance(prompt, compat_str):
@ -592,6 +581,26 @@ if sys.version_info >= (3, 0):
else:
from tokenize import generate_tokens as compat_tokenize_tokenize
try:
struct.pack('!I', 0)
except TypeError:
# In Python 2.6 and 2.7.x < 2.7.7, struct requires a bytes argument
# See https://bugs.python.org/issue19099
def compat_struct_pack(spec, *args):
if isinstance(spec, compat_str):
spec = spec.encode('ascii')
return struct.pack(spec, *args)
def compat_struct_unpack(spec, *args):
if isinstance(spec, compat_str):
spec = spec.encode('ascii')
return struct.unpack(spec, *args)
else:
compat_struct_pack = struct.pack
compat_struct_unpack = struct.unpack
__all__ = [
'compat_HTMLParser',
'compat_HTTPError',
@ -614,9 +623,12 @@ __all__ = [
'compat_parse_qs',
'compat_print',
'compat_setenv',
'compat_shlex_quote',
'compat_shlex_split',
'compat_socket_create_connection',
'compat_str',
'compat_struct_pack',
'compat_struct_unpack',
'compat_subprocess_get_DEVNULL',
'compat_tokenize_tokenize',
'compat_urllib_error',
@ -633,7 +645,5 @@ __all__ = [
'compat_urlretrieve',
'compat_xml_parse_error',
'compat_xpath',
'shlex_quote',
'subprocess_check_output',
'workaround_optparse_bug9161',
]
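
The `compat_struct_pack` and `compat_struct_unpack` wrappers added above encode a text format string to ASCII bytes before calling `struct`. The sketch below shows the same idea in a form that runs on Python 3, where `struct` accepts both text and bytes format strings. The names `pack_ascii_spec` and `unpack_ascii_spec` are hypothetical; unlike the hunk, which defines the wrappers only when `struct.pack('!I', 0)` raises `TypeError`, this sketch always encodes.

```python
import struct


def pack_ascii_spec(spec, *args):
    # Encode a text format string to bytes before handing it to struct.
    if isinstance(spec, str):
        spec = spec.encode('ascii')
    return struct.pack(spec, *args)


def unpack_ascii_spec(spec, *args):
    # Same normalization for the unpacking direction.
    if isinstance(spec, str):
        spec = spec.encode('ascii')
    return struct.unpack(spec, *args)
```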

View File

@ -12,37 +12,49 @@ from ..compat import (
compat_urlparse,
compat_urllib_error,
compat_urllib_parse_urlparse,
compat_struct_pack,
compat_struct_unpack,
)
from ..utils import (
encodeFilename,
fix_xml_ampersands,
sanitize_open,
struct_pack,
struct_unpack,
xpath_text,
)
class DataTruncatedError(Exception):
pass
class FlvReader(io.BytesIO):
"""
Reader for Flv files
The file format is documented in https://www.adobe.com/devnet/f4v.html
"""
def read_bytes(self, n):
data = self.read(n)
if len(data) < n:
raise DataTruncatedError(
'FlvReader error: need %d bytes while only %d bytes got' % (
n, len(data)))
return data
# Utility functions for reading numbers and strings
def read_unsigned_long_long(self):
return struct_unpack('!Q', self.read(8))[0]
return compat_struct_unpack('!Q', self.read_bytes(8))[0]
def read_unsigned_int(self):
return struct_unpack('!I', self.read(4))[0]
return compat_struct_unpack('!I', self.read_bytes(4))[0]
def read_unsigned_char(self):
return struct_unpack('!B', self.read(1))[0]
return compat_struct_unpack('!B', self.read_bytes(1))[0]
def read_string(self):
res = b''
while True:
char = self.read(1)
char = self.read_bytes(1)
if char == b'\x00':
break
res += char
@ -53,18 +65,18 @@ class FlvReader(io.BytesIO):
Read a box and return the info as a tuple: (box_size, box_type, box_data)
"""
real_size = size = self.read_unsigned_int()
box_type = self.read(4)
box_type = self.read_bytes(4)
header_end = 8
if size == 1:
real_size = self.read_unsigned_long_long()
header_end = 16
return real_size, box_type, self.read(real_size - header_end)
return real_size, box_type, self.read_bytes(real_size - header_end)
def read_asrt(self):
# version
self.read_unsigned_char()
# flags
self.read(3)
self.read_bytes(3)
quality_entry_count = self.read_unsigned_char()
# QualityEntryCount
for i in range(quality_entry_count):
@ -85,7 +97,7 @@ class FlvReader(io.BytesIO):
# version
self.read_unsigned_char()
# flags
self.read(3)
self.read_bytes(3)
# time scale
self.read_unsigned_int()
@ -119,7 +131,7 @@ class FlvReader(io.BytesIO):
# version
self.read_unsigned_char()
# flags
self.read(3)
self.read_bytes(3)
self.read_unsigned_int() # BootstrapinfoVersion
# Profile,Live,Update,Reserved
@ -194,11 +206,11 @@ def build_fragments_list(boot_info):
def write_unsigned_int(stream, val):
stream.write(struct_pack('!I', val))
stream.write(compat_struct_pack('!I', val))
def write_unsigned_int_24(stream, val):
stream.write(struct_pack('!I', val)[1:])
stream.write(compat_struct_pack('!I', val)[1:])
def write_flv_header(stream):
@ -374,7 +386,17 @@ class F4mFD(FragmentFD):
down.close()
reader = FlvReader(down_data)
while True:
try:
_, box_type, box_data = reader.read_box_info()
except DataTruncatedError:
if test:
# In tests, segments may be truncated, and thus
# FlvReader may not be able to parse the whole
# chunk. If so, write the segment as is
# See https://github.com/rg3/youtube-dl/issues/9214
dest_stream.write(down_data)
break
raise
if box_type == b'mdat':
dest_stream.write(box_data)
break
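
The `read_bytes` helper and `DataTruncatedError` introduced above turn a short read into an explicit exception that the caller can catch. A self-contained sketch of that pattern follows; `StrictReader` and `first_uint_or_none` are hypothetical names, and the fallback of returning `None` is chosen for illustration only (the hunk writes the raw segment instead, and only in test mode).

```python
import io
import struct


class DataTruncatedError(Exception):
    pass


class StrictReader(io.BytesIO):
    def read_bytes(self, n):
        # io.BytesIO.read() silently returns fewer bytes at end of data,
        # so check the length and fail loudly instead.
        data = self.read(n)
        if len(data) < n:
            raise DataTruncatedError(
                'need %d bytes while only %d bytes got' % (n, len(data)))
        return data

    def read_unsigned_int(self):
        return struct.unpack('!I', self.read_bytes(4))[0]


def first_uint_or_none(raw):
    # Caller-side handling: treat truncated input as "no value".
    try:
        return StrictReader(raw).read_unsigned_int()
    except DataTruncatedError:
        return None
```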

View File

@ -0,0 +1,135 @@
# coding: utf-8
from __future__ import unicode_literals
import calendar
import re
import time
from .amp import AMPIE
from .common import InfoExtractor
from ..compat import compat_urlparse
class AbcNewsVideoIE(AMPIE):
IE_NAME = 'abcnews:video'
_VALID_URL = 'http://abcnews.go.com/[^/]+/video/(?P<display_id>[0-9a-z-]+)-(?P<id>\d+)'
_TESTS = [{
'url': 'http://abcnews.go.com/ThisWeek/video/week-exclusive-irans-foreign-minister-zarif-20411932',
'info_dict': {
'id': '20411932',
'ext': 'mp4',
'display_id': 'week-exclusive-irans-foreign-minister-zarif',
'title': '\'This Week\' Exclusive: Iran\'s Foreign Minister Zarif',
'description': 'George Stephanopoulos goes one-on-one with Iranian Foreign Minister Dr. Javad Zarif.',
'duration': 180,
'thumbnail': 're:^https?://.*\.jpg$',
},
'params': {
# m3u8 download
'skip_download': True,
},
}, {
'url': 'http://abcnews.go.com/2020/video/2020-husband-stands-teacher-jail-student-affairs-26119478',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('display_id')
video_id = mobj.group('id')
info_dict = self._extract_feed_info(
'http://abcnews.go.com/video/itemfeed?id=%s' % video_id)
info_dict.update({
'id': video_id,
'display_id': display_id,
})
return info_dict
class AbcNewsIE(InfoExtractor):
IE_NAME = 'abcnews'
_VALID_URL = 'https?://abcnews\.go\.com/(?:[^/]+/)+(?P<display_id>[0-9a-z-]+)/story\?id=(?P<id>\d+)'
_TESTS = [{
'url': 'http://abcnews.go.com/Blotter/News/dramatic-video-rare-death-job-america/story?id=10498713#.UIhwosWHLjY',
'info_dict': {
'id': '10498713',
'ext': 'flv',
'display_id': 'dramatic-video-rare-death-job-america',
'title': 'Occupational Hazards',
'description': 'Nightline investigates the dangers that lurk at various jobs.',
'thumbnail': 're:^https?://.*\.jpg$',
'upload_date': '20100428',
'timestamp': 1272412800,
},
'add_ie': ['AbcNewsVideo'],
}, {
'url': 'http://abcnews.go.com/Entertainment/justin-timberlake-performs-stop-feeling-eurovision-2016/story?id=39125818',
'info_dict': {
'id': '39125818',
'ext': 'mp4',
'display_id': 'justin-timberlake-performs-stop-feeling-eurovision-2016',
'title': 'Justin Timberlake Drops Hints For Secret Single',
'description': 'Lara Spencer reports the buzziest stories of the day in "GMA" Pop News.',
'upload_date': '20160515',
'timestamp': 1463329500,
},
'params': {
# m3u8 download
'skip_download': True,
# The embedded YouTube video is blocked due to copyright issues
'playlist_items': '1',
},
'add_ie': ['AbcNewsVideo'],
}, {
'url': 'http://abcnews.go.com/Technology/exclusive-apple-ceo-tim-cook-iphone-cracking-software/story?id=37173343',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('display_id')
video_id = mobj.group('id')
webpage = self._download_webpage(url, video_id)
video_url = self._search_regex(
r'window\.abcnvideo\.url\s*=\s*"([^"]+)"', webpage, 'video URL')
full_video_url = compat_urlparse.urljoin(url, video_url)
youtube_url = self._html_search_regex(
r'<iframe[^>]+src="(https://www\.youtube\.com/embed/[^"]+)"',
webpage, 'YouTube URL', default=None)
timestamp = None
date_str = self._html_search_regex(
r'<span[^>]+class="timestamp">([^<]+)</span>',
webpage, 'timestamp', fatal=False)
if date_str:
tz_offset = 0
if date_str.endswith(' ET'): # Eastern Time
tz_offset = -5
date_str = date_str[:-3]
date_formats = ['%b. %d, %Y', '%b %d, %Y, %I:%M %p']
for date_format in date_formats:
try:
timestamp = calendar.timegm(time.strptime(date_str.strip(), date_format))
except ValueError:
continue
if timestamp is not None:
timestamp -= tz_offset * 3600
entry = {
'_type': 'url_transparent',
'ie_key': AbcNewsVideoIE.ie_key(),
'url': full_video_url,
'id': video_id,
'display_id': display_id,
'timestamp': timestamp,
}
if youtube_url:
entries = [entry, self.url_result(youtube_url, 'Youtube')]
return self.playlist_result(entries)
return entry
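
Review note: the timestamp handling in `_real_extract` above can be exercised on its own. What follows is a minimal sketch, not the extractor itself. `parse_abc_timestamp` is a hypothetical helper name, the sample date strings are assumed inputs, and the early `break` is an addition that does not change the result. As in the diff, a trailing ' ET' is treated as a fixed UTC-5 offset, so daylight saving time is not accounted for.

```python
import calendar
import time


def parse_abc_timestamp(date_str):
    """Hypothetical helper mirroring the date handling in the diff above."""
    tz_offset = 0
    if date_str.endswith(' ET'):  # Eastern Time, fixed at UTC-5 as in the diff
        tz_offset = -5
        date_str = date_str[:-3]
    timestamp = None
    for date_format in ('%b. %d, %Y', '%b %d, %Y, %I:%M %p'):
        try:
            # Interpret the parsed struct_time as UTC
            timestamp = calendar.timegm(
                time.strptime(date_str.strip(), date_format))
            break  # stop at the first format that parses (addition to the diff)
        except ValueError:
            continue
    if timestamp is not None:
        # Shift from the page's local time to UTC
        timestamp -= tz_offset * 3600
    return timestamp


print(parse_abc_timestamp('Apr. 28, 2010'))
print(parse_abc_timestamp('May 15, 2016, 11:25 AM ET'))
```

With these assumed inputs the sketch yields the `timestamp` values listed in the two test cases above (1272412800 and 1463329500). Month-name parsing with `%b` depends on the process locale, so an English or C locale is assumed.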


@ -52,7 +52,7 @@ class AMPIE(InfoExtractor):
for media_data in media_content:
media = media_data['@attributes']
media_type = media['type']
if media_type == 'video/f4m':
if media_type in ('video/f4m', 'application/f4m+xml'):
formats.extend(self._extract_f4m_formats(
media['url'] + '?hdcore=3.4.0&plugin=aasp-3.4.0.132.124',
video_id, f4m_id='hds', fatal=False))
@ -61,7 +61,7 @@ class AMPIE(InfoExtractor):
media['url'], video_id, 'mp4', m3u8_id='hls', fatal=False))
else:
formats.append({
'format_id': media_data['media-category']['@attributes']['label'],
'format_id': media_data.get('media-category', {}).get('@attributes', {}).get('label'),
'url': media['url'],
'tbr': int_or_none(media.get('bitrate')),
'filesize': int_or_none(media.get('fileSize')),
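
Review note: the second hunk above replaces direct indexing with chained `.get()` calls so that a missing `media-category` no longer raises `KeyError`. A minimal sketch of that lookup, with `format_label` as a hypothetical helper name and the sample dictionaries as assumed feed shapes:

```python
def format_label(media_data):
    """Return the media-category label, or None if any level is absent."""
    return (media_data
            .get('media-category', {})
            .get('@attributes', {})
            .get('label'))


# Assumed shapes, for illustration only
with_label = {'media-category': {'@attributes': {'label': 'HD'}}}
without_category = {}

print(format_label(with_label))
print(format_label(without_category))
```

One caveat of this pattern: the `{}` default only applies when a key is absent. If a key is present with the value `None`, the next `.get()` would still fail.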


@ -0,0 +1,224 @@
# coding: utf-8
from __future__ import unicode_literals
import base64
import hashlib
import json
import random
import time
from .common import InfoExtractor
from ..aes import aes_encrypt
from ..compat import compat_str
from ..utils import (
bytes_to_intlist,
determine_ext,
intlist_to_bytes,
int_or_none,
strip_jsonp,
)
def md5_text(s):
if not isinstance(s, compat_str):
s = compat_str(s)
return hashlib.md5(s.encode('utf-8')).hexdigest()
class AnvatoIE(InfoExtractor):
# Copied from anvplayer.min.js
_ANVACK_TABLE = {
'nbcu_nbcd_desktop_web_prod_93d8ead38ce2024f8f544b78306fbd15895ae5e6': 'NNemUkySjxLyPTKvZRiGntBIjEyK8uqicjMakIaQ',
'nbcu_nbcd_desktop_web_qa_1a6f01bdd0dc45a439043b694c8a031d': 'eSxJUbA2UUKBTXryyQ2d6NuM8oEqaPySvaPzfKNA',
'nbcu_nbcd_desktop_web_acc_eb2ff240a5d4ae9a63d4c297c32716b6c523a129': '89JR3RtUGbvKuuJIiKOMK0SoarLb5MUx8v89RcbP',
'nbcu_nbcd_watchvod_web_prod_e61107507180976724ec8e8319fe24ba5b4b60e1': 'Uc7dFt7MJ9GsBWB5T7iPvLaMSOt8BBxv4hAXk5vv',
'nbcu_nbcd_watchvod_web_qa_42afedba88a36203db5a4c09a5ba29d045302232': 'T12oDYVFP2IaFvxkmYMy5dKxswpLHtGZa4ZAXEi7',
'nbcu_nbcd_watchvod_web_acc_9193214448e2e636b0ffb78abacfd9c4f937c6ca': 'MmobcxUxMedUpohNWwXaOnMjlbiyTOBLL6d46ZpR',
'nbcu_local_monitor_web_acc_f998ad54eaf26acd8ee033eb36f39a7b791c6335': 'QvfIoPYrwsjUCcASiw3AIkVtQob2LtJHfidp9iWg',
'nbcu_cable_monitor_web_acc_a413759603e8bedfcd3c61b14767796e17834077': 'uwVPJLShvJWSs6sWEIuVem7MTF8A4IknMMzIlFto',
'nbcu_nbcd_mcpstage_web_qa_4c43a8f6e95a88dbb40276c0630ba9f693a63a4e': 'PxVYZVwjhgd5TeoPRxL3whssb5OUPnM3zyAzq8GY',
'nbcu_comcast_comcast_web_prod_074080762ad4ce956b26b43fb22abf153443a8c4': 'afnaRZfDyg1Z3WZHdupKfy6xrbAG2MHqe3VfuSwh',
'nbcu_comcast_comcast_web_qa_706103bb93ead3ef70b1de12a0e95e3c4481ade0': 'DcjsVbX9b3uoPlhdriIiovgFQZVxpISZwz0cx1ZK',
'nbcu_comcast_comcastcable_web_prod_669f04817536743563d7331c9293e59fbdbe3d07': '0RwMN2cWy10qhAhOscq3eK7aEe0wqnKt3vJ0WS4D',
'nbcu_comcast_comcastcable_web_qa_3d9d2d66219094127f0f6b09cc3c7bb076e3e1ca': '2r8G9DEya7PCqBceKZgrn2XkXgASjwLMuaFE1Aad',
'hearst_hearst_demo_web_stage_960726dfef3337059a01a78816e43b29ec04dfc7': 'cuZBPXTR6kSdoTCVXwk5KGA8rk3NrgGn4H6e9Dsp',
'anvato_mcpqa_demo_web_stage_18b55e00db5a13faa8d03ae6e41f6f5bcb15b922': 'IOaaLQ8ymqVyem14QuAvE5SndQynTcH5CrLkU2Ih',
'anvato_nextmedia_demo_web_stage_9787d56a02ff6b9f43e9a2b0920d8ca88beb5818': 'Pqu9zVzI1ApiIzbVA3VkGBEQHvdKSUuKpD6s2uaR',
'anvato_scripps_app_web_prod_0837996dbe373629133857ae9eb72e740424d80a': 'du1ccmn7RxzgizwbWU7hyUaGodNlJn7HtXI0WgXW',
'anvato_scripps_app_web_stage_360797e00fe2826be142155c4618cc52fce6c26c': '2PMrQ0BRoqCWl7nzphj0GouIMEh2mZYivAT0S1Su',
'fs2go_fs2go_go_all_prod_21934911ccfafc03a075894ead2260d11e2ddd24': 'RcuHlKikW2IJw6HvVoEkqq2UsuEJlbEl11pWXs4Q',
'fs2go_fs2go_go_web_prod_ead4b0eec7460c1a07783808db21b49cf1f2f9a7': '4K0HTT2u1zkQA2MaGaZmkLa1BthGSBdr7jllrhk5',
'fs2go_fs2go_go_web_stage_407585454a4400355d4391691c67f361': 'ftnc37VKRJBmHfoGGi3kT05bHyeJzilEzhKJCyl3',
'fs2go_fs2go_go_android_stage_44b714db6f8477f29afcba15a41e1d30': 'CtxpPvVpo6AbZGomYUhkKs7juHZwNml9b9J0J2gI',
'anvato_cbslocal_app_web_prod_547f3e49241ef0e5d30c79b2efbca5d92c698f67': 'Pw0XX5KBDsyRnPS0R2JrSrXftsy8Jnz5pAjaYC8s',
'anvato_cbslocal_app_web_stage_547a5f096594cd3e00620c6f825cad1096d28c80': '37OBUhX2uwNyKhhrNzSSNHSRPZpApC3trdqDBpuz',
'fs2go_att_att_web_prod_1042dddd089a05438b6a08f972941176f699ffd8': 'JLcF20JwYvpv6uAGcLWIaV12jKwaL1R8us4b6Zkg',
'fs2go_att_att_web_stage_807c5001955fc114a3331fe027ddc76e': 'gbu1oO1y0JiOFh4SUipt86P288JHpyjSqolrrT1x',
'fs2go_fs2go_tudor_web_prod_a7dd8e5a7cdc830cae55eae6f3e9fee5ee49eb9b': 'ipcp87VCEZXPPe868j3orLqzc03oTy7DXsGkAXXH',
'anvato_mhz_app_web_prod_b808218b30de7fdf60340cbd9831512bc1bf6d37': 'Stlm5Gs6BEhJLRTZHcNquyzxGqr23EuFmE5DCgjX',
'fs2go_charter_charter_web_stage_c2c6e5a68375a1bf00fff213d3ff8f61a835a54c': 'Lz4hbJp1fwL6jlcz4M2PMzghM4jp4aAmybtT5dPc',
'fs2go_charter_charter_web_prod_ebfe3b10f1af215a7321cd3d629e0b81dfa6fa8c': 'vUJsK345A1bVmyYDRhZX0lqFIgVXuqhmuyp1EtPK',
'anvato_epfox_app_web_prod_b3373168e12f423f41504f207000188daf88251b': 'GDKq1ixvX3MoBNdU5IOYmYa2DTUXYOozPjrCJnW7',
'anvato_epfox_app_web_stage_a3c2ce60f8f83ef374a88b68ee73a950f8ab87ce': '2jz2NH4BsXMaDsoJ5qkHMbcczAfIReo2eFYuVC1C',
'fs2go_verizon_verizon_web_stage_08e6df0354a4803f1b1f2428b5a9a382e8dbcd62': 'rKTVapNaAcmnUbGL4ZcuOoY4SE7VmZSQsblPFr7e',
'fs2go_verizon_verizon_web_prod_f909564cb606eff1f731b5e22e0928676732c445': 'qLSUuHerM3u9eNPzaHyUK52obai5MvE4XDJfqYe1',
'fs2go_foxcom_synd_web_stage_f7b9091f00ea25a4fdaaae77fca5b54cdc7e7043': '96VKF2vLd24fFiDfwPFpzM5llFN4TiIGAlodE0Re',
'fs2go_foxcom_synd_web_prod_0f2cdd64d87e4ab6a1d54aada0ff7a7c8387a064': 'agiPjbXEyEZUkbuhcnmVPhe9NNVbDjCFq2xkcx51',
'anvato_own_app_web_stage_1214ade5d28422c4dae9d03c1243aba0563c4dba': 'mzhamNac3swG4WsJAiUTacnGIODi6SWeVWk5D7ho',
'anvato_own_app_web_prod_944e162ed927ec3e9ed13eb68ed2f1008ee7565e': '9TSxh6G2TXOLBoYm9ro3LdNjjvnXpKb8UR8KoIP9',
'anvato_scripps_app_ftv_prod_a10a10468edd5afb16fb48171c03b956176afad1': 'COJ2i2UIPK7xZqIWswxe7FaVBOVgRkP1F6O6qGoH',
'anvato_scripps_app_ftv_stage_77d3ad2bdb021ec37ca2e35eb09acd396a974c9a': 'Q7nnopNLe2PPfGLOTYBqxSaRpl209IhqaEuDZi1F',
'anvato_univision_app_web_stage_551236ef07a0e17718c3995c35586b5ed8cb5031': 'D92PoLS6UitwxDRA191HUGT9OYcOjV6mPMa5wNyo',
'anvato_univision_app_web_prod_039a5c0a6009e637ae8ac906718a79911e0e65e1': '5mVS5u4SQjtw6NGw2uhMbKEIONIiLqRKck5RwQLR',
'nbcu_cnbc_springfield_ios_prod_670207fae43d6e9a94c351688851a2ce': 'M7fqCCIP9lW53oJbHs19OlJlpDrVyc2OL8gNeuTa',
'nbcu_cnbc_springfieldvod_ios_prod_7a5f04b1ceceb0e9c9e2264a44aa236e08e034c2': 'Yia6QbJahW0S7K1I0drksimhZb4UFq92xLBmmMvk',
'anvato_cox_app_web_prod_ce45cda237969f93e7130f50ee8bb6280c1484ab': 'cc0miZexpFtdoqZGvdhfXsLy7FXjRAOgb9V0f5fZ',
'anvato_cox_app_web_stage_c23dbe016a8e9d8c7101d10172b92434f6088bf9': 'yivU3MYHd2eDZcOfmLbINVtqxyecKTOp8OjOuoGJ',
'anvato_chnzero_app_web_stage_b1164d1352b579e792e542fddf13ee34c0eeb46b': 'A76QkXMmVH8lTCfU15xva1mZnSVcqeY4Xb22Kp7m',
'anvato_chnzero_app_web_prod_253d358928dc08ec161eda2389d53707288a730c': 'OA5QI3ZWZZkdtUEDqh28AH8GedsF6FqzJI32596b',
'anvato_discovery_vodpoc_web_stage_9fa7077b5e8af1f8355f65d4fb8d2e0e9d54e2b7': 'q3oT191tTQ5g3JCP67PkjLASI9s16DuWZ6fYmry3',
'anvato_discovery_vodpoc_web_prod_688614983167a1af6cdf6d76343fda10a65223c1': 'qRvRQCTVHd0VVOHsMvvfidyWmlYVrTbjby7WqIuK',
'nbcu_cnbc_springfieldvod_ftv_stage_826040aad1925a46ac5dfb4b3c5143e648c6a30d': 'JQaSb5a8Tz0PT4ti329DNmzDO30TnngTHmvX8Vua',
'nbcu_cnbc_springfield_ftv_stage_826040aad1925a46ac5dfb4b3c5143e648c6a30d': 'JQaSb5a8Tz0PT4ti329DNmzDO30TnngTHmvX8Vua',
'nbcu_nbcd_capture_web_stage_4dd9d585bfb984ebf856dee35db027b2465cc4ae': '0j1Ov4Vopyi2HpBZJYdL2m8ERJVGYh3nNpzPiO8F',
'nbcu_nbcd_watch3_android_prod_7712ca5fcf1c22f19ec1870a9650f9c37db22dcf': '3LN2UB3rPUAMu7ZriWkHky9vpLMXYha8JbSnxBlx',
'nbcu_nbcd_watchvod3_android_prod_0910a3a4692d57c0b5ff4316075bc5d096be45b9': 'mJagcQ2II30vUOAauOXne7ERwbf5S9nlB3IP17lQ',
'anvato_scripps_app_atv_prod_790deda22e16e71e83df58f880cd389908a45d52': 'CB6trI1mpoDIM5o54DNTsji90NDBQPZ4z4RqBNSH',
'nbcu_nbcd_watchv4_android_prod_ff67cef9cb409158c6f8c3533edddadd0b750507': 'j8CHQCUWjlYERj4NFRmUYOND85QNbHViH09UwuKm',
'nbcu_nbcd_watchvodv4_android_prod_a814d781609989dea6a629d50ae4c7ad8cc8e907': 'rkVnUXxdA9rawVLUlDQtMue9Y4Q7lFEaIotcUhjt',
'rvVKpA50qlOPLFxMjrCGf5pdkdQDm7qn': '1J7ZkY5Qz5lMLi93QOH9IveE7EYB3rLl',
'nbcu_dtv_local_web_prod_b266cf49defe255fd4426a97e27c09e513e9f82f': 'HuLnJDqzLa4saCzYMJ79zDRSQpEduw1TzjMNQu2b',
'nbcu_att_local_web_prod_4cef038b2d969a6b7d700a56a599040b6a619f67': 'Q0Em5VDc2KpydUrVwzWRXAwoNBulWUxCq2faK0AV',
'nbcu_dish_local_web_prod_c56dcaf2da2e9157a4266c82a78195f1dd570f6b': 'bC1LWmRz9ayj2AlzizeJ1HuhTfIaJGsDBnZNgoRg',
'nbcu_verizon_local_web_prod_88bebd2ce006d4ed980de8133496f9a74cb9b3e1': 'wzhDKJZpgvUSS1EQvpCQP8Q59qVzcPixqDGJefSk',
'nbcu_charter_local_web_prod_9ad90f7fc4023643bb718f0fe0fd5beea2382a50': 'PyNbxNhEWLzy1ZvWEQelRuIQY88Eub7xbSVRMdfT',
'nbcu_suddenlink_local_web_prod_20fb711725cac224baa1c1cb0b1c324d25e97178': '0Rph41lPXZbb3fqeXtHjjbxfSrNbtZp1Ygq7Jypa',
'nbcu_wow_local_web_prod_652d9ce4f552d9c2e7b5b1ed37b8cb48155174ad': 'qayIBZ70w1dItm2zS42AptXnxW15mkjRrwnBjMPv',
'nbcu_centurylink_local_web_prod_2034402b029bf3e837ad46814d9e4b1d1345ccd5': 'StePcPMkjsX51PcizLdLRMzxMEl5k2FlsMLUNV4k',
'nbcu_atlanticbrd_local_web_prod_8d5f5ecbf7f7b2f5e6d908dd75d90ae3565f682e': 'NtYLb4TFUS0pRs3XTkyO5sbVGYjVf17bVbjaGscI',
'nbcu_nbcd_watchvod_web_dev_08bc05699be47c4f31d5080263a8cfadc16d0f7c': 'hwxi2dgDoSWgfmVVXOYZm14uuvku4QfopstXckhr',
'anvato_nextmedia_app_web_prod_a4fa8c7204aa65e71044b57aaf63711980cfe5a0': 'tQN1oGPYY1nM85rJYePWGcIb92TG0gSqoVpQTWOw',
'anvato_mcp_lin_web_prod_4c36fbfd4d8d8ecae6488656e21ac6d1ac972749': 'GUXNf5ZDX2jFUpu4WT2Go4DJ5nhUCzpnwDRRUx1K',
'anvato_mcp_univision_web_prod_37fe34850c99a3b5cdb71dab10a417dd5cdecafa': 'bLDYF8JqfG42b7bwKEgQiU9E2LTIAtnKzSgYpFUH',
'anvato_mcp_fs2go_web_prod_c7b90a93e171469cdca00a931211a2f556370d0a': 'icgGoYGipQMMSEvhplZX1pwbN69srwKYWksz3xWK',
'anvato_mcp_sps_web_prod_54bdc90dd6ba21710e9f7074338365bba28da336': 'fA2iQdI7RDpynqzQYIpXALVS83NTPr8LLFK4LFsu',
'anvato_mcp_anv_web_prod_791407490f4c1ef2a4bcb21103e0cb1bcb3352b3': 'rMOUZqe9lwcGq2mNgG3EDusm6lKgsUnczoOX3mbg',
'anvato_mcp_gray_web_prod_4c10f067c393ed8fc453d3930f8ab2b159973900': 'rMOUZqe9lwcGq2mNgG3EDusm6lKgsUnczoOX3mbg',
'anvato_mcp_hearst_web_prod_5356c3de0fc7c90a3727b4863ca7fec3a4524a99': 'P3uXJ0fXXditBPCGkfvlnVScpPEfKmc64Zv7ZgbK',
'anvato_mcp_cbs_web_prod_02f26581ff80e5bda7aad28226a8d369037f2cbe': 'mGPvo5ZA5SgjOFAPEPXv7AnOpFUICX8hvFQVz69n',
'anvato_mcp_telemundo_web_prod_c5278d51ad46fda4b6ca3d0ea44a7846a054f582': 'qyT6PXXLjVNCrHaRVj0ugAhalNRS7Ee9BP7LUokD',
'nbcu_nbcd_watchvodv4_web_stage_4108362fba2d4ede21f262fea3c4162cbafd66c7': 'DhaU5lj0W2gEdcSSsnxURq8t7KIWtJfD966crVDk',
'anvato_scripps_app_ios_prod_409c41960c60b308db43c3cc1da79cab9f1c3d93': 'WPxj5GraLTkYCyj3M7RozLqIycjrXOEcDGFMIJPn',
'EZqvRyKBJLrgpClDPDF8I7Xpdp40Vx73': '4OxGd2dEakylntVKjKF0UK9PDPYB6A9W',
'M2v78QkpleXm9hPp9jUXI63x5vA6BogR': 'ka6K32k7ZALmpINkjJUGUo0OE42Md1BQ',
'nbcu_nbcd_desktop_web_prod_93d8ead38ce2024f8f544b78306fbd15895ae5e6_secure': 'NNemUkySjxLyPTKvZRiGntBIjEyK8uqicjMakIaQ'
}
_AUTH_KEY = b'\x31\xc2\x42\x84\x9e\x73\xa0\xce'
def __init__(self, *args, **kwargs):
super(AnvatoIE, self).__init__(*args, **kwargs)
self.__server_time = None
def _server_time(self, access_key, video_id):
if self.__server_time is not None:
return self.__server_time
self.__server_time = int(self._download_json(
self._api_prefix(access_key) + 'server_time?anvack=' + access_key, video_id,
note='Fetching server time')['server_time'])
return self.__server_time
def _api_prefix(self, access_key):
return 'https://tkx2-%s.anvato.net/rest/v2/' % ('prod' if 'prod' in access_key else 'stage')
def _get_video_json(self, access_key, video_id):
# See et() in anvplayer.min.js, which is an alias of getVideoJSON()
video_data_url = self._api_prefix(access_key) + 'mcp/video/%s?anvack=%s' % (video_id, access_key)
server_time = self._server_time(access_key, video_id)
input_data = '%d~%s~%s' % (server_time, md5_text(video_data_url), md5_text(server_time))
auth_secret = intlist_to_bytes(aes_encrypt(
bytes_to_intlist(input_data[:64]), bytes_to_intlist(self._AUTH_KEY)))
video_data_url += '&X-Anvato-Adst-Auth=' + base64.b64encode(auth_secret).decode('ascii')
anvrid = md5_text(time.time() * 1000 * random.random())[:30]
payload = {
'api': {
'anvrid': anvrid,
'anvstk': md5_text('%s|%s|%d|%s' % (
access_key, anvrid, server_time, self._ANVACK_TABLE[access_key])),
'anvts': server_time,
},
}
return self._download_json(
video_data_url, video_id, transform_source=strip_jsonp,
data=json.dumps(payload).encode('utf-8'))
def _extract_anvato_videos(self, webpage, video_id):
anvplayer_data = self._parse_json(self._html_search_regex(
r'<script[^>]+data-anvp=\'([^\']+)\'', webpage,
'Anvato player data'), video_id)
video_id = anvplayer_data['video']
access_key = anvplayer_data['accessKey']
video_data = self._get_video_json(access_key, video_id)
formats = []
for published_url in video_data['published_urls']:
video_url = published_url['embed_url']
ext = determine_ext(video_url)
if ext == 'smil':
formats.extend(self._extract_smil_formats(video_url, video_id))
continue
tbr = int_or_none(published_url.get('kbps'))
a_format = {
'url': video_url,
'format_id': ('-'.join(filter(None, ['http', published_url.get('cdn_name')]))).lower(),
'tbr': tbr if tbr != 0 else None,
}
if ext == 'm3u8':
# Not using _extract_m3u8_formats here as individual media
# playlists are also included in published_urls.
if tbr is None:
formats.append(self._m3u8_meta_format(video_url, ext='mp4', m3u8_id='hls'))
continue
else:
a_format.update({
'format_id': '-'.join(filter(None, ['hls', compat_str(tbr)])),
'ext': 'mp4',
})
elif ext == 'mp3':
a_format['vcodec'] = 'none'
else:
a_format.update({
'width': int_or_none(published_url.get('width')),
'height': int_or_none(published_url.get('height')),
})
formats.append(a_format)
self._sort_formats(formats)
subtitles = {}
for caption in video_data.get('captions', []):
a_caption = {
'url': caption['url'],
'ext': 'tt' if caption.get('format') == 'SMPTE-TT' else None
}
subtitles.setdefault(caption['language'], []).append(a_caption)
return {
'id': video_id,
'formats': formats,
'title': video_data.get('def_title'),
'description': video_data.get('def_description'),
'categories': video_data.get('categories'),
'thumbnail': video_data.get('thumbnail'),
'subtitles': subtitles,
}
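
Review note: the string handling behind `_get_video_json` can be sketched with the standard library alone. This is an illustration under stated assumptions, not the extractor: it is Python 3 only (`str` stands in for `compat_str`), the AES step is omitted because `aes_encrypt` is internal to youtube-dl, the helper names `api_prefix`, `build_auth_input` and `build_anvstk` are hypothetical, and the access key and secret below are dummy values rather than entries from `_ANVACK_TABLE`.

```python
import hashlib


def md5_text(s):
    """Hex MD5 of the text form of s, as in the diff above."""
    if not isinstance(s, str):
        s = str(s)
    return hashlib.md5(s.encode('utf-8')).hexdigest()


def api_prefix(access_key):
    # Keys containing 'prod' go to the prod host, everything else to stage
    return 'https://tkx2-%s.anvato.net/rest/v2/' % (
        'prod' if 'prod' in access_key else 'stage')


def build_auth_input(video_data_url, server_time):
    # Plaintext that the diff truncates to 64 characters before encrypting
    return '%d~%s~%s' % (
        server_time, md5_text(video_data_url), md5_text(server_time))


def build_anvstk(access_key, anvrid, server_time, secret):
    # Token placed in the JSON payload under api.anvstk
    return md5_text('%s|%s|%d|%s' % (access_key, anvrid, server_time, secret))


access_key = 'dummy_app_web_prod_0000'  # assumed, not a real key
url = api_prefix(access_key) + 'mcp/video/%s?anvack=%s' % ('123', access_key)
print(url)
print(len(build_auth_input(url, 1463440500)))
print(build_anvstk(access_key, 'a' * 30, 1463440500, 'dummy-secret'))
```

With a ten-digit server time the plaintext is longer than 64 characters, which is why the diff slices it with `input_data[:64]` before passing it to `aes_encrypt`.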


@ -17,6 +17,9 @@ class BloombergIE(InfoExtractor):
'title': 'Shah\'s Presentation on Foreign-Exchange Strategies',
'description': 'md5:a8ba0302912d03d246979735c17d2761',
},
'params': {
'format': 'best[format_id^=hds]',
},
}, {
'url': 'http://www.bloomberg.com/news/articles/2015-11-12/five-strange-things-that-have-been-happening-in-financial-markets',
'only_matching': True,


@ -307,9 +307,10 @@ class BrightcoveLegacyIE(InfoExtractor):
playlist_title=playlist_info['mediaCollectionDTO']['displayName'])
def _extract_video_info(self, video_info):
video_id = compat_str(video_info['id'])
publisher_id = video_info.get('publisherId')
info = {
'id': compat_str(video_info['id']),
'id': video_id,
'title': video_info['displayName'].strip(),
'description': video_info.get('shortDescription'),
'thumbnail': video_info.get('videoStillURL') or video_info.get('thumbnailURL'),
@ -331,7 +332,8 @@ class BrightcoveLegacyIE(InfoExtractor):
url_comp = compat_urllib_parse_urlparse(url)
if url_comp.path.endswith('.m3u8'):
formats.extend(
self._extract_m3u8_formats(url, info['id'], 'mp4'))
self._extract_m3u8_formats(
url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
continue
elif 'akamaihd.net' in url_comp.netloc:
# This type of rendition is served through
@ -365,7 +367,7 @@ class BrightcoveLegacyIE(InfoExtractor):
a_format.update({
'format_id': 'hls%s' % ('-%s' % tbr if tbr else ''),
'ext': 'mp4',
'protocol': 'm3u8',
'protocol': 'm3u8_native',
})
formats.append(a_format)
@ -395,7 +397,7 @@ class BrightcoveLegacyIE(InfoExtractor):
return ad_info
if 'url' not in info and not info.get('formats'):
raise ExtractorError('Unable to extract video url for %s' % info['id'])
raise ExtractorError('Unable to extract video url for %s' % video_id)
return info
@ -442,6 +444,10 @@ class BrightcoveNewIE(InfoExtractor):
# non numeric ref: prefixed video id
'url': 'http://players.brightcove.net/710858724001/default_default/index.html?videoId=ref:event-stream-356',
'only_matching': True,
}, {
# unavailable video without message but with error_code
'url': 'http://players.brightcove.net/1305187701/c832abfb-641b-44eb-9da0-2fe76786505f_default/index.html?videoId=4377407326001',
'only_matching': True,
}]
@staticmethod
@ -512,8 +518,9 @@ class BrightcoveNewIE(InfoExtractor):
})
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
json_data = self._parse_json(e.cause.read().decode(), video_id)
raise ExtractorError(json_data[0]['message'], expected=True)
json_data = self._parse_json(e.cause.read().decode(), video_id)[0]
raise ExtractorError(
json_data.get('message') or json_data['error_code'], expected=True)
raise
title = json_data['name'].strip()
@ -527,7 +534,7 @@ class BrightcoveNewIE(InfoExtractor):
if not src:
continue
formats.extend(self._extract_m3u8_formats(
src, video_id, 'mp4', m3u8_id='hls', fatal=False))
src, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
elif source_type == 'application/dash+xml':
if not src:
continue
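
Review note: the `BrightcoveNewIE` hunk above changes the 403 handling so that an error entry without a `message` falls back to its `error_code`. A minimal sketch of that fallback, with `error_text` as a hypothetical helper name and both payloads as assumed examples of the response shape:

```python
def error_text(error_entries):
    """Pick a human-readable error from the first entry of the error list."""
    first = error_entries[0]
    # Prefer the message; fall back to the machine-readable code
    return first.get('message') or first['error_code']


# Assumed payloads, for illustration only
print(error_text([{'error_code': 'EXAMPLE_CODE', 'message': 'Example text'}]))
print(error_text([{'error_code': 'EXAMPLE_CODE'}]))
```

Using `or` rather than a `.get()` default means an empty `message` string also falls back to the code.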


@ -4,65 +4,66 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import js_to_json
from ..utils import (
js_to_json,
smuggle_url,
)
class CBCIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?cbc\.ca/(?:[^/]+/)+(?P<id>[^/?#]+)'
_VALID_URL = r'https?://(?:www\.)?cbc\.ca/(?!player/)(?:[^/]+/)+(?P<id>[^/?#]+)'
_TESTS = [{
# with mediaId
'url': 'http://www.cbc.ca/22minutes/videos/clips-season-23/don-cherry-play-offs',
'md5': '97e24d09672fc4cf56256d6faa6c25bc',
'info_dict': {
'id': '2682904050',
'ext': 'flv',
'ext': 'mp4',
'title': 'Don Cherry All-Stars',
'description': 'Don Cherry has a bee in his bonnet about AHL player John Scott because that guys got heart.',
'timestamp': 1454475540,
'timestamp': 1454463000,
'upload_date': '20160203',
},
'params': {
# rtmp download
'skip_download': True,
'uploader': 'CBCC-NEW',
},
}, {
# with clipId
'url': 'http://www.cbc.ca/archives/entry/1978-robin-williams-freestyles-on-90-minutes-live',
'md5': '0274a90b51a9b4971fe005c63f592f12',
'info_dict': {
'id': '2487345465',
'ext': 'flv',
'ext': 'mp4',
'title': 'Robin Williams freestyles on 90 Minutes Live',
'description': 'Wacky American comedian Robin Williams shows off his infamous "freestyle" comedic talents while being interviewed on CBC\'s 90 Minutes Live.',
'upload_date': '19700101',
'upload_date': '19780210',
'uploader': 'CBCC-NEW',
},
'params': {
# rtmp download
'skip_download': True,
'timestamp': 255977160,
},
}, {
# multiple iframes
'url': 'http://www.cbc.ca/natureofthings/blog/birds-eye-view-from-vancouvers-burrard-street-bridge-how-we-got-the-shot',
'playlist': [{
'md5': '377572d0b49c4ce0c9ad77470e0b96b4',
'info_dict': {
'id': '2680832926',
'ext': 'flv',
'ext': 'mp4',
'title': 'An Eagle\'s-Eye View Off Burrard Bridge',
'description': 'Hercules the eagle flies from Vancouver\'s Burrard Bridge down to a nearby park with a mini-camera strapped to his back.',
'upload_date': '19700101',
'upload_date': '20160201',
'timestamp': 1454342820,
'uploader': 'CBCC-NEW',
},
}, {
'md5': '415a0e3f586113894174dfb31aa5bb1a',
'info_dict': {
'id': '2658915080',
'ext': 'flv',
'ext': 'mp4',
'title': 'Fly like an eagle!',
'description': 'Eagle equipped with a mini camera flies from the world\'s tallest tower',
'upload_date': '19700101',
'upload_date': '20150315',
'timestamp': 1426443984,
'uploader': 'CBCC-NEW',
},
}],
'params': {
# rtmp download
'skip_download': True,
},
}]
@classmethod
@ -91,24 +92,54 @@ class CBCIE(InfoExtractor):
class CBCPlayerIE(InfoExtractor):
_VALID_URL = r'(?:cbcplayer:|https?://(?:www\.)?cbc\.ca/(?:player/play/|i/caffeine/syndicate/\?mediaId=))(?P<id>\d+)'
_TEST = {
_TESTS = [{
'url': 'http://www.cbc.ca/player/play/2683190193',
'md5': '64d25f841ddf4ddb28a235338af32e2c',
'info_dict': {
'id': '2683190193',
'ext': 'flv',
'ext': 'mp4',
'title': 'Gerry Runs a Sweat Shop',
'description': 'md5:b457e1c01e8ff408d9d801c1c2cd29b0',
'timestamp': 1455067800,
'timestamp': 1455071400,
'upload_date': '20160210',
'uploader': 'CBCC-NEW',
},
'params': {
# rtmp download
'skip_download': True,
}, {
# Redirected from http://www.cbc.ca/player/AudioMobile/All%20in%20a%20Weekend%20Montreal/ID/2657632011/
'url': 'http://www.cbc.ca/player/play/2657631896',
'md5': 'e5e708c34ae6fca156aafe17c43e8b75',
'info_dict': {
'id': '2657631896',
'ext': 'mp3',
'title': 'CBC Montreal is organizing its first ever community hackathon!',
'description': 'The modern technology we tend to depend on so heavily, is never without it\'s share of hiccups and headaches. Next weekend - CBC Montreal will be getting members of the public for its first Hackathon.',
'timestamp': 1425704400,
'upload_date': '20150307',
'uploader': 'CBCC-NEW',
},
}
}, {
# available only when we add `formats=MPEG4,FLV,MP3` to theplatform url
'url': 'http://www.cbc.ca/player/play/2164402062',
'md5': '17a61eb813539abea40618d6323a7f82',
'info_dict': {
'id': '2164402062',
'ext': 'flv',
'title': 'Cancer survivor four times over',
'description': 'Tim Mayer has beaten three different forms of cancer four times in five years.',
'timestamp': 1320410746,
'upload_date': '20111104',
'uploader': 'CBCC-NEW',
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
return self.url_result(
'http://feed.theplatform.com/f/ExhSPC/vms_5akSXx4Ng_Zn?byGuid=%s' % video_id,
'ThePlatformFeed', video_id)
return {
'_type': 'url_transparent',
'ie_key': 'ThePlatform',
'url': smuggle_url(
'http://link.theplatform.com/s/ExhSPC/media/guid/2655402169/%s?mbr=true&formats=MPEG4,FLV,MP3' % video_id, {
'force_smil_url': True
}),
'id': video_id,
}
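
Review note: the `(?!player/)` lookahead added to `CBCIE._VALID_URL` keeps the generic extractor from claiming player URLs, which `CBCPlayerIE` handles. The sketch below compares the old and new patterns copied from the diff; the variable names are made up for the illustration.

```python
import re

OLD_CBC = r'https?://(?:www\.)?cbc\.ca/(?:[^/]+/)+(?P<id>[^/?#]+)'
NEW_CBC = r'https?://(?:www\.)?cbc\.ca/(?!player/)(?:[^/]+/)+(?P<id>[^/?#]+)'
CBC_PLAYER = (r'(?:cbcplayer:|https?://(?:www\.)?cbc\.ca/'
              r'(?:player/play/|i/caffeine/syndicate/\?mediaId=))(?P<id>\d+)')

player_url = 'http://www.cbc.ca/player/play/2683190193'
page_url = 'http://www.cbc.ca/22minutes/videos/clips-season-23/don-cherry-play-offs'

# The old pattern also matched player URLs; the new one rejects them
print(bool(re.match(OLD_CBC, player_url)))
print(bool(re.match(NEW_CBC, player_url)))
# Ordinary pages still match, and the player pattern picks up the numeric id
print(re.match(NEW_CBC, page_url).group('id'))
print(re.match(CBC_PLAYER, player_url).group('id'))
```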


@ -0,0 +1,84 @@
# coding: utf-8
from __future__ import unicode_literals
import calendar
import datetime
from .anvato import AnvatoIE
from .sendtonews import SendtoNewsIE
from ..compat import compat_urlparse
class CBSLocalIE(AnvatoIE):
_VALID_URL = r'https?://[a-z]+\.cbslocal\.com/\d+/\d+/\d+/(?P<id>[0-9a-z-]+)'
_TESTS = [{
# Anvato backend
'url': 'http://losangeles.cbslocal.com/2016/05/16/safety-advocates-say-fatal-car-seat-failures-are-public-health-crisis',
'md5': 'f0ee3081e3843f575fccef901199b212',
'info_dict': {
'id': '3401037',
'ext': 'mp4',
'title': 'Safety Advocates Say Fatal Car Seat Failures Are \'Public Health Crisis\'',
'description': 'Collapsing seats have been the focus of scrutiny for decades, though experts say remarkably little has been done to address the issue. Randy Paige reports.',
'thumbnail': 're:^https?://.*',
'timestamp': 1463440500,
'upload_date': '20160516',
'subtitles': {
'en': 'mincount:5',
},
'categories': [
'Stations\\Spoken Word\\KCBSTV',
'Syndication\\MSN',
'Syndication\\NDN',
'Syndication\\AOL',
'Syndication\\Yahoo',
'Syndication\\Tribune',
'Syndication\\Curb.tv',
'Content\\News'
],
},
}, {
# SendtoNews embed
'url': 'http://cleveland.cbslocal.com/2016/05/16/indians-score-season-high-15-runs-in-blowout-win-over-reds-rapid-reaction/',
'info_dict': {
'id': 'GxfCe0Zo7D-175909-5588',
'ext': 'mp4',
'title': 'Recap: CLE 15, CIN 6',
'description': '5/16/16: Indians\' bats explode for 15 runs in a win',
'upload_date': '20160516',
'timestamp': 1463433840,
'duration': 49,
},
'params': {
# m3u8 download
'skip_download': True,
},
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
sendtonews_url = SendtoNewsIE._extract_url(webpage)
if sendtonews_url:
info_dict = {
'_type': 'url_transparent',
'url': compat_urlparse.urljoin(url, sendtonews_url),
}
else:
info_dict = self._extract_anvato_videos(webpage, display_id)
time_str = self._html_search_regex(
r'class="entry-date">([^<]+)<', webpage, 'released date', fatal=False)
timestamp = None
if time_str:
timestamp = calendar.timegm(datetime.datetime.strptime(
time_str, '%b %d, %Y %I:%M %p').timetuple())
info_dict.update({
'display_id': display_id,
'timestamp': timestamp,
})
return info_dict
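
Review note: `CBSLocalIE` derives `timestamp` from the page's `entry-date` text by parsing it with `datetime.strptime` and treating the result as UTC. A minimal sketch of that conversion follows; `parse_entry_date` is a hypothetical helper name and the date strings are assumed inputs chosen to match the format in the diff.

```python
import calendar
import datetime


def parse_entry_date(time_str):
    """Parse e.g. 'May 16, 2016 11:15 PM' and return a Unix timestamp.

    The naive datetime is interpreted as UTC, as in the diff above.
    """
    return calendar.timegm(datetime.datetime.strptime(
        time_str, '%b %d, %Y %I:%M %p').timetuple())


print(parse_entry_date('May 16, 2016 11:15 PM'))
print(parse_entry_date('May 16, 2016 9:24 PM'))
```

With these assumed inputs the results equal the `timestamp` values in the two test cases above (1463440500 and 1463433840). Note that `strptime` raises `ValueError` for a string in any other format, so a caller may want to guard against that if the page markup varies.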


@ -1,119 +0,0 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import ExtractorError
from .screenwavemedia import ScreenwaveMediaIE
class CinemassacreIE(InfoExtractor):
_VALID_URL = 'https?://(?:www\.)?cinemassacre\.com/(?P<date_y>[0-9]{4})/(?P<date_m>[0-9]{2})/(?P<date_d>[0-9]{2})/(?P<display_id>[^?#/]+)'
_TESTS = [
{
'url': 'http://cinemassacre.com/2012/11/10/avgn-the-movie-trailer/',
'md5': 'fde81fbafaee331785f58cd6c0d46190',
'info_dict': {
'id': 'Cinemassacre-19911',
'ext': 'mp4',
'upload_date': '20121110',
'title': '“Angry Video Game Nerd: The Movie” Trailer',
'description': 'md5:fb87405fcb42a331742a0dce2708560b',
},
'params': {
# m3u8 download
'skip_download': True,
},
},
{
'url': 'http://cinemassacre.com/2013/10/02/the-mummys-hand-1940',
'md5': 'd72f10cd39eac4215048f62ab477a511',
'info_dict': {
'id': 'Cinemassacre-521be8ef82b16',
'ext': 'mp4',
'upload_date': '20131002',
'title': 'The Mummys Hand (1940)',
},
'params': {
# m3u8 download
'skip_download': True,
},
},
{
# Youtube embedded video
'url': 'http://cinemassacre.com/2006/12/07/chronologically-confused-about-bad-movie-and-video-game-sequel-titles/',
'md5': 'ec9838a5520ef5409b3e4e42fcb0a3b9',
'info_dict': {
'id': 'OEVzPCY2T-g',
'ext': 'webm',
'title': 'AVGN: Chronologically Confused about Bad Movie and Video Game Sequel Titles',
'upload_date': '20061207',
'uploader': 'Cinemassacre',
'uploader_id': 'JamesNintendoNerd',
'description': 'md5:784734696c2b8b7f4b8625cc799e07f6',
}
},
{
# Youtube embedded video
'url': 'http://cinemassacre.com/2006/09/01/mckids/',
'md5': '7393c4e0f54602ad110c793eb7a6513a',
'info_dict': {
'id': 'FnxsNhuikpo',
'ext': 'webm',
'upload_date': '20060901',
'uploader': 'Cinemassacre Extra',
'description': 'md5:de9b751efa9e45fbaafd9c8a1123ed53',
'uploader_id': 'Cinemassacre',
'title': 'AVGN: McKids',
}
},
{
'url': 'http://cinemassacre.com/2015/05/25/mario-kart-64-nintendo-64-james-mike-mondays/',
'md5': '1376908e49572389e7b06251a53cdd08',
'info_dict': {
'id': 'Cinemassacre-555779690c440',
'ext': 'mp4',
'description': 'Lets Play Mario Kart 64 !! Mario Kart 64 is a classic go-kart racing game released for the Nintendo 64 (N64). Today James & Mike do 4 player Battle Mode with Kyle and Bootsy!',
'title': 'Mario Kart 64 (Nintendo 64) James & Mike Mondays',
'upload_date': '20150525',
},
'params': {
# m3u8 download
'skip_download': True,
},
}
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('display_id')
video_date = mobj.group('date_y') + mobj.group('date_m') + mobj.group('date_d')
webpage = self._download_webpage(url, display_id)
playerdata_url = self._search_regex(
[
ScreenwaveMediaIE.EMBED_PATTERN,
r'<iframe[^>]+src="(?P<url>(?:https?:)?//(?:[^.]+\.)?youtube\.com/.+?)"',
],
webpage, 'player data URL', default=None, group='url')
if not playerdata_url:
raise ExtractorError('Unable to find player data')
video_title = self._html_search_regex(
r'<title>(?P<title>.+?)\|', webpage, 'title')
video_description = self._html_search_regex(
r'<div class="entry-content">(?P<description>.+?)</div>',
webpage, 'description', flags=re.DOTALL, fatal=False)
video_thumbnail = self._og_search_thumbnail(webpage)
return {
'_type': 'url_transparent',
'display_id': display_id,
'title': video_title,
'description': video_description,
'upload_date': video_date,
'thumbnail': video_thumbnail,
'url': playerdata_url,
}


@ -1,101 +0,0 @@
from __future__ import unicode_literals
import json
import re
from .common import InfoExtractor
from ..utils import int_or_none
class CollegeHumorIE(InfoExtractor):
_VALID_URL = r'^(?:https?://)?(?:www\.)?collegehumor\.com/(video|embed|e)/(?P<videoid>[0-9]+)/?(?P<shorttitle>.*)$'
_TESTS = [
{
'url': 'http://www.collegehumor.com/video/6902724/comic-con-cosplay-catastrophe',
'md5': 'dcc0f5c1c8be98dc33889a191f4c26bd',
'info_dict': {
'id': '6902724',
'ext': 'mp4',
'title': 'Comic-Con Cosplay Catastrophe',
'description': "Fans get creative this year at San Diego. Too creative. And yes, that's really Joss Whedon.",
'age_limit': 13,
'duration': 187,
},
}, {
'url': 'http://www.collegehumor.com/video/3505939/font-conference',
'md5': '72fa701d8ef38664a4dbb9e2ab721816',
'info_dict': {
'id': '3505939',
'ext': 'mp4',
'title': 'Font Conference',
'description': "This video wasn't long enough, so we made it double-spaced.",
'age_limit': 10,
'duration': 179,
},
}, {
# embedded youtube video
'url': 'http://www.collegehumor.com/embed/6950306',
'info_dict': {
'id': 'Z-bao9fg6Yc',
'ext': 'mp4',
'title': 'Young Americans Think President John F. Kennedy Died THIS MORNING IN A CAR ACCIDENT!!!',
'uploader': 'Mark Dice',
'uploader_id': 'MarkDice',
'description': 'md5:62c3dab9351fac7bb44b53b69511d87f',
'upload_date': '20140127',
},
'params': {
'skip_download': True,
},
'add_ie': ['Youtube'],
},
]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('videoid')
jsonUrl = 'http://www.collegehumor.com/moogaloop/video/' + video_id + '.json'
data = json.loads(self._download_webpage(
jsonUrl, video_id, 'Downloading info JSON'))
vdata = data['video']
if vdata.get('youtubeId') is not None:
return {
'_type': 'url',
'url': vdata['youtubeId'],
'ie_key': 'Youtube',
}
AGE_LIMITS = {'nc17': 18, 'r': 18, 'pg13': 13, 'pg': 10, 'g': 0}
rating = vdata.get('rating')
if rating:
age_limit = AGE_LIMITS.get(rating.lower())
else:
age_limit = None # None = No idea
PREFS = {'high_quality': 2, 'low_quality': 0}
formats = []
for format_key in ('mp4', 'webm'):
for qname, qurl in vdata.get(format_key, {}).items():
formats.append({
'format_id': format_key + '_' + qname,
'url': qurl,
'format': format_key,
'preference': PREFS.get(qname),
})
self._sort_formats(formats)
duration = int_or_none(vdata.get('duration'), 1000)
like_count = int_or_none(vdata.get('likes'))
return {
'id': video_id,
'title': vdata['title'],
'description': vdata.get('description'),
'thumbnail': vdata.get('thumbnail'),
'formats': formats,
'age_limit': age_limit,
'duration': duration,
'like_count': like_count,
}


@ -44,10 +44,10 @@ class ComedyCentralShowsIE(MTVServicesInfoExtractor):
# or: http://www.colbertnation.com/the-colbert-report-collections/422008/festival-of-lights/79524
_VALID_URL = r'''(?x)^(:(?P<shortname>tds|thedailyshow)
|https?://(:www\.)?
(?P<showname>thedailyshow|thecolbertreport)\.(?:cc\.)?com/
(?P<showname>thedailyshow|thecolbertreport|tosh)\.(?:cc\.)?com/
((?:full-)?episodes/(?:[0-9a-z]{6}/)?(?P<episode>.*)|
(?P<clip>
(?:(?:guests/[^/]+|videos|video-playlists|special-editions|news-team/[^/]+)/[^/]+/(?P<videotitle>[^/?#]+))
(?:(?:guests/[^/]+|videos|video-(?:clips|playlists)|special-editions|news-team/[^/]+)/[^/]+/(?P<videotitle>[^/?#]+))
|(the-colbert-report-(videos|collections)/(?P<clipID>[0-9]+)/[^/]*/(?P<cntitle>.*?))
|(watch/(?P<date>[^/]*)/(?P<tdstitle>.*))
)|
@ -129,6 +129,9 @@ class ComedyCentralShowsIE(MTVServicesInfoExtractor):
}, {
'url': 'http://thedailyshow.cc.com/news-team/michael-che/7wnfel/we-need-to-talk-about-israel',
'only_matching': True,
}, {
'url': 'http://tosh.cc.com/video-clips/68g93d/twitter-users-share-summer-plans',
'only_matching': True,
}]
_available_formats = ['3500', '2200', '1700', '1200', '750', '400']

View File

@ -1058,12 +1058,8 @@ class InfoExtractor(object):
})
return formats
def _extract_m3u8_formats(self, m3u8_url, video_id, ext=None,
entry_protocol='m3u8', preference=None,
m3u8_id=None, note=None, errnote=None,
fatal=True, live=False):
formats = [{
def _m3u8_meta_format(self, m3u8_url, ext=None, preference=None, m3u8_id=None):
return {
'format_id': '-'.join(filter(None, [m3u8_id, 'meta'])),
'url': m3u8_url,
'ext': ext,
@ -1071,7 +1067,14 @@ class InfoExtractor(object):
'preference': preference - 1 if preference else -1,
'resolution': 'multiple',
'format_note': 'Quality selection URL',
}]
}
def _extract_m3u8_formats(self, m3u8_url, video_id, ext=None,
entry_protocol='m3u8', preference=None,
m3u8_id=None, note=None, errnote=None,
fatal=True, live=False):
formats = [self._m3u8_meta_format(m3u8_url, ext, preference, m3u8_id)]
format_url = lambda u: (
u
@ -1138,12 +1141,15 @@ class InfoExtractor(object):
format_id = []
if m3u8_id:
format_id.append(m3u8_id)
last_media_name = last_media.get('NAME') if last_media and last_media.get('TYPE') != 'SUBTITLES' else None
last_media_name = last_media.get('NAME') if last_media and last_media.get('TYPE') not in ('SUBTITLES', 'CLOSED-CAPTIONS') else None
# Although the specification does not mention a NAME attribute for
# EXT-X-STREAM-INF, it may still be present in some playlists
stream_name = last_info.get('NAME') or last_media_name
# Bandwidth of live streams may differ over time thus making
# format_id unpredictable. So it's better to keep provided
# format_id intact.
if not live:
format_id.append(last_media_name if last_media_name else '%d' % (tbr if tbr else len(formats)))
format_id.append(stream_name if stream_name else '%d' % (tbr if tbr else len(formats)))
f = {
'format_id': '-'.join(format_id),
'url': format_url(line.strip()),
@ -1275,21 +1281,21 @@ class InfoExtractor(object):
m3u8_count = 0
srcs = []
videos = smil.findall(self._xpath_ns('.//video', namespace))
for video in videos:
src = video.get('src')
media = smil.findall(self._xpath_ns('.//video', namespace)) + smil.findall(self._xpath_ns('.//audio', namespace))
for medium in media:
src = medium.get('src')
if not src or src in srcs:
continue
srcs.append(src)
bitrate = float_or_none(video.get('system-bitrate') or video.get('systemBitrate'), 1000)
filesize = int_or_none(video.get('size') or video.get('fileSize'))
width = int_or_none(video.get('width'))
height = int_or_none(video.get('height'))
proto = video.get('proto')
ext = video.get('ext')
bitrate = float_or_none(medium.get('system-bitrate') or medium.get('systemBitrate'), 1000)
filesize = int_or_none(medium.get('size') or medium.get('fileSize'))
width = int_or_none(medium.get('width'))
height = int_or_none(medium.get('height'))
proto = medium.get('proto')
ext = medium.get('ext')
src_ext = determine_ext(src)
streamer = video.get('streamer') or base
streamer = medium.get('streamer') or base
if proto == 'rtmp' or streamer.startswith('rtmp'):
rtmp_count += 1
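
The `format_id` construction in the `_extract_m3u8_formats` hunk above can be looked at in isolation. The following is only a minimal sketch of that logic: `build_format_id` and its parameter names are hypothetical and are not part of youtube-dl.

```python
def build_format_id(m3u8_id, stream_name, tbr, n_formats, live=False):
    """Join the pieces of an HLS format_id the way the hunk above does.

    Hypothetical helper for illustration only; not a youtube-dl API.
    """
    parts = []
    if m3u8_id:
        parts.append(m3u8_id)
    # For live streams the bandwidth may change over time, so only the
    # caller-provided m3u8_id is kept and nothing else is appended.
    if not live:
        # Prefer the stream name, then the bitrate, then a running index.
        parts.append(stream_name if stream_name else '%d' % (tbr if tbr else n_formats))
    return '-'.join(parts)
```

With an `m3u8_id` of `'hls'`, a named stream yields an id such as `hls-720p`, an unnamed one falls back to its bitrate, and a live stream keeps just `hls`.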

View File

@ -3,6 +3,10 @@ from __future__ import unicode_literals
from .abc import ABCIE
from .abc7news import Abc7NewsIE
from .abcnews import (
AbcNewsIE,
AbcNewsVideoIE,
)
from .academicearth import AcademicEarthCourseIE
from .acast import (
ACastIE,
@ -107,6 +111,7 @@ from .cbc import (
CBCPlayerIE,
)
from .cbs import CBSIE
from .cbslocal import CBSLocalIE
from .cbsinteractive import CBSInteractiveIE
from .cbsnews import (
CBSNewsIE,
@ -124,7 +129,6 @@ from .chirbit import (
ChirbitProfileIE,
)
from .cinchcast import CinchcastIE
from .cinemassacre import CinemassacreIE
from .cliprs import ClipRsIE
from .clipfish import ClipfishIE
from .cliphunter import CliphunterIE
@ -139,7 +143,6 @@ from .cnn import (
CNNBlogsIE,
CNNArticleIE,
)
from .collegehumor import CollegeHumorIE
from .collegerama import CollegeRamaIE
from .comedycentral import ComedyCentralIE, ComedyCentralShowsIE
from .comcarcoff import ComCarCoffIE
@ -240,6 +243,7 @@ from .fktv import FKTVIE
from .flickr import FlickrIE
from .folketinget import FolketingetIE
from .footyroom import FootyRoomIE
from .formula1 import Formula1IE
from .fourtube import FourTubeIE
from .fox import FOXIE
from .foxgay import FoxgayIE
@ -367,6 +371,7 @@ from .kuwo import (
)
from .la7 import LA7IE
from .laola1tv import Laola1TvIE
from .learnr import LearnrIE
from .lecture2go import Lecture2GoIE
from .lemonde import LemondeIE
from .leeco import (
@ -392,6 +397,7 @@ from .livestream import (
LivestreamShortenerIE,
)
from .lnkgo import LnkGoIE
from .localnews8 import LocalNews8IE
from .lovehomeporn import LoveHomePornIE
from .lrt import LRTIE
from .lynda import (
@ -665,6 +671,7 @@ from .screencastomatic import ScreencastOMaticIE
from .screenjunkies import ScreenJunkiesIE
from .screenwavemedia import ScreenwaveMediaIE, TeamFourIE
from .senateisvp import SenateISVPIE
from .sendtonews import SendtoNewsIE
from .servingsys import ServingSysIE
from .sexu import SexuIE
from .shahid import ShahidIE
@ -767,6 +774,7 @@ from .thesixtyone import TheSixtyOneIE
from .thestar import TheStarIE
from .thisamericanlife import ThisAmericanLifeIE
from .thisav import ThisAVIE
from .threeqsdn import ThreeQSDNIE
from .tinypic import TinyPicIE
from .tlc import TlcDeIE
from .tmz import (
@ -834,7 +842,6 @@ from .twitch import (
TwitchVodIE,
TwitchProfileIE,
TwitchPastBroadcastsIE,
TwitchBookmarksIE,
TwitchStreamIE,
)
from .twitter import (
@ -852,7 +859,10 @@ from .unistra import UnistraIE
from .urort import UrortIE
from .usatoday import USATodayIE
from .ustream import UstreamIE, UstreamChannelIE
from .ustudio import UstudioIE
from .ustudio import (
UstudioIE,
UstudioEmbedIE,
)
from .varzesh3 import Varzesh3IE
from .vbox7 import Vbox7IE
from .veehd import VeeHDIE

View File

@ -0,0 +1,25 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class Formula1IE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?formula1\.com/content/fom-website/en/video/\d{4}/\d{1,2}/(?P<id>.+?)\.html'
_TEST = {
'url': 'http://www.formula1.com/content/fom-website/en/video/2016/5/Race_highlights_-_Spain_2016.html',
'md5': '8c79e54be72078b26b89e0e111c0502b',
'info_dict': {
'id': 'JvYXJpMzE6pArfHWm5ARp5AiUmD-gibV',
'ext': 'flv',
'title': 'Race highlights - Spain 2016',
}
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
ooyala_embed_code = self._search_regex(
r'data-videoid="([^"]+)"', webpage, 'ooyala embed code')
return self.url_result(
'ooyala:%s' % ooyala_embed_code, 'Ooyala', ooyala_embed_code)

View File

@ -61,6 +61,8 @@ from .jwplatform import JWPlatformIE
from .digiteka import DigitekaIE
from .instagram import InstagramIE
from .liveleak import LiveLeakIE
from .threeqsdn import ThreeQSDNIE
from .theplatform import ThePlatformIE
class GenericIE(InfoExtractor):
@ -716,15 +718,18 @@ class GenericIE(InfoExtractor):
},
# Wistia embed
{
'url': 'http://education-portal.com/academy/lesson/north-american-exploration-failed-colonies-of-spain-france-england.html#lesson',
'md5': '8788b683c777a5cf25621eaf286d0c23',
'url': 'http://study.com/academy/lesson/north-american-exploration-failed-colonies-of-spain-france-england.html#lesson',
'md5': '1953f3a698ab51cfc948ed3992a0b7ff',
'info_dict': {
'id': '1cfaf6b7ea',
'id': '6e2wtrbdaf',
'ext': 'mov',
'title': 'md5:51364a8d3d009997ba99656004b5e20d',
'duration': 643.0,
'filesize': 182808282,
'uploader': 'education-portal.com',
'title': 'paywall_north-american-exploration-failed-colonies-of-spain-france-england',
'description': 'a Paywall Videos video from Remilon',
'duration': 644.072,
'uploader': 'study.com',
'timestamp': 1459678540,
'upload_date': '20160403',
'filesize': 24687186,
},
},
{
@ -733,14 +738,30 @@ class GenericIE(InfoExtractor):
'info_dict': {
'id': 'uxjb0lwrcz',
'ext': 'mp4',
'title': 'Conversation about Hexagonal Rails Part 1 - ThoughtWorks',
'title': 'Conversation about Hexagonal Rails Part 1',
'description': 'a Martin Fowler video from ThoughtWorks',
'duration': 1715.0,
'uploader': 'thoughtworks.wistia.com',
'upload_date': '20140603',
'timestamp': 1401832161,
'upload_date': '20140603',
},
},
# Wistia standard embed (async)
{
'url': 'https://www.getdrip.com/university/brennan-dunn-drip-workshop/',
'info_dict': {
'id': '807fafadvk',
'ext': 'mp4',
'title': 'Drip Brennan Dunn Workshop',
'description': 'a JV Webinars video from getdrip-1',
'duration': 4986.95,
'timestamp': 1463607249,
'upload_date': '20160518',
},
'params': {
'skip_download': True,
}
},
# Soundcloud embed
{
'url': 'http://nakedsecurity.sophos.com/2014/10/29/sscc-171-are-you-sure-that-1234-is-a-bad-password-podcast/',
@ -1427,7 +1448,8 @@ class GenericIE(InfoExtractor):
# Site Name | Video Title
# Video Title - Tagline | Site Name
# and so on and so forth; it's just not practical
video_title = self._html_search_regex(
video_title = self._og_search_title(
webpage, default=None) or self._html_search_regex(
r'(?s)<title>(.*?)</title>', webpage, 'video title',
default='video')
@ -1445,6 +1467,9 @@ class GenericIE(InfoExtractor):
video_uploader = self._search_regex(
r'^(?:https?://)?([^/]*)/.*', url, 'video uploader')
video_description = self._og_search_description(webpage, default=None)
video_thumbnail = self._og_search_thumbnail(webpage, default=None)
# Helper method
def _playlist_from_matches(matches, getter=None, ie=None):
urlrs = orderedSet(
@ -1475,6 +1500,11 @@ class GenericIE(InfoExtractor):
if bc_urls:
return _playlist_from_matches(bc_urls, ie='BrightcoveNew')
# Look for ThePlatform embeds
tp_urls = ThePlatformIE._extract_urls(webpage)
if tp_urls:
return _playlist_from_matches(tp_urls, ie='ThePlatform')
# Look for embedded rtl.nl player
matches = re.findall(
r'<iframe[^>]+?src="((?:https?:)?//(?:www\.)?rtl\.nl/system/videoplayer/[^"]+(?:video_)?embed[^"]+)"',
@ -1543,21 +1573,26 @@ class GenericIE(InfoExtractor):
'url': embed_url,
'ie_key': 'Wistia',
'uploader': video_uploader,
'title': video_title,
'id': video_id,
}
match = re.search(r'(?:id=["\']wistia_|data-wistia-?id=["\']|Wistia\.embed\(["\'])(?P<id>[^"\']+)', webpage)
if match:
return {
'_type': 'url_transparent',
'url': 'http://fast.wistia.net/embed/iframe/{0:}'.format(match.group('id')),
'url': 'wistia:%s' % match.group('id'),
'ie_key': 'Wistia',
'uploader': video_uploader,
'title': video_title,
'id': match.group('id')
}
match = re.search(
r'''(?sx)
<script[^>]+src=(["'])(?:https?:)?//fast\.wistia\.com/assets/external/E-v1\.js\1[^>]*>.*?
<div[^>]+class=(["']).*?\bwistia_async_(?P<id>[a-z0-9]+)\b.*?\2
''', webpage)
if match:
return self.url_result(self._proto_relative_url(
'wistia:%s' % match.group('id')), 'Wistia')
# Look for SVT player
svt_url = SVTIE._extract_url(webpage)
if svt_url:
@ -1983,6 +2018,19 @@ class GenericIE(InfoExtractor):
if liveleak_url:
return self.url_result(liveleak_url, 'LiveLeak')
# Look for 3Q SDN embeds
threeqsdn_url = ThreeQSDNIE._extract_url(webpage)
if threeqsdn_url:
return {
'_type': 'url_transparent',
'ie_key': ThreeQSDNIE.ie_key(),
'url': self._proto_relative_url(threeqsdn_url),
'title': video_title,
'description': video_description,
'thumbnail': video_thumbnail,
'uploader': video_uploader,
}
def check_video(vurl):
if YoutubeIE.suitable(vurl):
return True

View File

@ -4,7 +4,7 @@ from .common import InfoExtractor
class GrouponIE(InfoExtractor):
_VALID_URL = r'https?://www\.groupon\.com/deals/(?P<id>[^?#]+)'
_VALID_URL = r'https?://(?:www\.)?groupon\.com/deals/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'https://www.groupon.com/deals/bikram-yoga-huntington-beach-2#ooid=tubGNycTo_9Uxg82uESj4i61EYX8nyuf',
@ -15,18 +15,26 @@ class GrouponIE(InfoExtractor):
},
'playlist': [{
'info_dict': {
'id': 'tubGNycTo_9Uxg82uESj4i61EYX8nyuf',
'ext': 'flv',
'title': 'Bikram Yoga Huntington Beach | Orange County',
'id': 'fk6OhWpXgIQ',
'ext': 'mp4',
'title': 'Bikram Yoga Huntington Beach | Orange County !tubGNycTo@9Uxg82uESj4i61EYX8nyuf',
'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
'duration': 44.961,
'duration': 45,
'upload_date': '20160405',
'uploader_id': 'groupon',
'uploader': 'Groupon',
},
}],
'params': {
'skip_download': 'HDS',
'skip_download': True,
}
}
_PROVIDERS = {
'ooyala': ('ooyala:%s', 'Ooyala'),
'youtube': ('%s', 'Youtube'),
}
def _real_extract(self, url):
playlist_id = self._match_id(url)
webpage = self._download_webpage(url, playlist_id)
@ -36,12 +44,17 @@ class GrouponIE(InfoExtractor):
videos = payload['carousel'].get('dealVideos', [])
entries = []
for v in videos:
if v.get('provider') != 'OOYALA':
provider = v.get('provider')
video_id = v.get('media') or v.get('id') or v.get('baseURL')
if not provider or not video_id:
continue
url_pattern, ie_key = self._PROVIDERS.get(provider.lower(), (None, None))
if not url_pattern:
self.report_warning(
'%s: Unsupported video provider %s, skipping video' %
(playlist_id, v.get('provider')))
(playlist_id, provider))
continue
entries.append(self.url_result('ooyala:%s' % v['media']))
entries.append(self.url_result(url_pattern % video_id, ie_key))
return {
'_type': 'playlist',
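
The provider lookup in this extractor is a small dispatch table keyed by lowercase provider name. A minimal standalone sketch follows, assuming a hypothetical `resolve_provider` helper that is not part of youtube-dl. The `(None, None)` default is a choice made for this sketch so that unpacking cannot fail on an unknown provider.

```python
# Maps a lowercase provider name to (URL pattern, extractor key),
# mirroring the _PROVIDERS table in the diff above.
PROVIDERS = {
    'ooyala': ('ooyala:%s', 'Ooyala'),
    'youtube': ('%s', 'Youtube'),
}


def resolve_provider(provider, video_id):
    """Return (url, ie_key) for a known provider, or None if unsupported.

    Hypothetical helper for illustration only.
    """
    # Default to a pair so tuple unpacking never raises on unknown providers.
    url_pattern, ie_key = PROVIDERS.get(provider.lower(), (None, None))
    if not url_pattern:
        return None
    return url_pattern % video_id, ie_key
```

A caller would skip the video (and possibly emit a warning) whenever `resolve_provider` returns `None`.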

View File

@ -7,6 +7,7 @@ from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import (
HEADRequest,
KNOWN_EXTENSIONS,
sanitized_Request,
str_to_int,
urlencode_postdata,
@ -17,7 +18,7 @@ from ..utils import (
class HearThisAtIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?hearthis\.at/(?P<artist>[^/]+)/(?P<title>[A-Za-z0-9\-]+)/?$'
_PLAYLIST_URL = 'https://hearthis.at/playlist.php'
_TEST = {
_TESTS = [{
'url': 'https://hearthis.at/moofi/dr-kreep',
'md5': 'ab6ec33c8fed6556029337c7885eb4e0',
'info_dict': {
@ -26,7 +27,7 @@ class HearThisAtIE(InfoExtractor):
'title': 'Moofi - Dr. Kreep',
'thumbnail': 're:^https?://.*\.jpg$',
'timestamp': 1421564134,
'description': 'Creepy Patch. Mutable Instruments Braids Vowel + Formant Mode.',
'description': 'Listen to Dr. Kreep by Moofi on hearthis.at - Modular, Eurorack, Mutable Intruments Braids, Valhalla-DSP',
'upload_date': '20150118',
'comment_count': int,
'view_count': int,
@ -34,7 +35,25 @@ class HearThisAtIE(InfoExtractor):
'duration': 71,
'categories': ['Experimental'],
}
}
}, {
# 'download' link redirects to the original webpage
'url': 'https://hearthis.at/twitchsf/dj-jim-hopkins-totally-bitchin-80s-dance-mix/',
'md5': '5980ceb7c461605d30f1f039df160c6e',
'info_dict': {
'id': '811296',
'ext': 'mp3',
'title': 'TwitchSF - DJ Jim Hopkins - Totally Bitchin\' 80\'s Dance Mix!',
'description': 'Listen to DJ Jim Hopkins - Totally Bitchin\' 80\'s Dance Mix! by TwitchSF on hearthis.at - Dance',
'upload_date': '20160328',
'timestamp': 1459186146,
'thumbnail': 're:^https?://.*\.jpg$',
'comment_count': int,
'view_count': int,
'like_count': int,
'duration': 4360,
'categories': ['Dance'],
},
}]
def _real_extract(self, url):
m = re.match(self._VALID_URL, url)
@ -90,6 +109,7 @@ class HearThisAtIE(InfoExtractor):
ext_handle = self._request_webpage(
ext_req, display_id, note='Determining extension')
ext = urlhandle_detect_ext(ext_handle)
if ext in KNOWN_EXTENSIONS:
formats.append({
'format_id': 'download',
'vcodec': 'none',

View File

@ -1,10 +1,10 @@
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
from ..utils import (
mimetype2ext,
qualities,
)
@ -12,9 +12,9 @@ from ..utils import (
class ImdbIE(InfoExtractor):
IE_NAME = 'imdb'
IE_DESC = 'Internet Movie Database trailers'
_VALID_URL = r'https?://(?:www|m)\.imdb\.com/video/imdb/vi(?P<id>\d+)'
_VALID_URL = r'https?://(?:www|m)\.imdb\.com/video/[^/]+/vi(?P<id>\d+)'
_TEST = {
_TESTS = [{
'url': 'http://www.imdb.com/video/imdb/vi2524815897',
'info_dict': {
'id': '2524815897',
@ -22,7 +22,10 @@ class ImdbIE(InfoExtractor):
'title': 'Ice Age: Continental Drift Trailer (No. 2) - IMDb',
'description': 'md5:9061c2219254e5d14e03c25c98e96a81',
}
}
}, {
'url': 'http://www.imdb.com/video/_/vi2524815897',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
@ -48,13 +51,27 @@ class ImdbIE(InfoExtractor):
json_data = self._search_regex(
r'<script[^>]+class="imdb-player-data"[^>]*?>(.*?)</script>',
format_page, 'json data', flags=re.DOTALL)
info = json.loads(json_data)
format_info = info['videoPlayerObject']['video']
f_id = format_info['ffname']
info = self._parse_json(json_data, video_id, fatal=False)
if not info:
continue
format_info = info.get('videoPlayerObject', {}).get('video', {})
if not format_info:
continue
video_info_list = format_info.get('videoInfoList')
if not video_info_list or not isinstance(video_info_list, list):
continue
video_info = video_info_list[0]
if not video_info or not isinstance(video_info, dict):
continue
video_url = video_info.get('videoUrl')
if not video_url:
continue
format_id = format_info.get('ffname')
formats.append({
'format_id': f_id,
'url': format_info['videoInfoList'][0]['videoUrl'],
'quality': quality(f_id),
'format_id': format_id,
'url': video_url,
'ext': mimetype2ext(video_info.get('videoMimeType')),
'quality': quality(format_id),
})
self._sort_formats(formats)

View File

@ -505,7 +505,10 @@ class IqiyiIE(InfoExtractor):
'enc': md5_text(enc_key + tail),
'qyid': _uuid,
'tn': random.random(),
'um': 0,
# In iQiyi's flash player, um is set to 1 if there's a logged-in user.
# Some 1080P formats are only available to logged-in users.
# Force um=1 here to trick the iQiyi server
'um': 1,
'authkey': md5_text(md5_text('') + tail),
'k_tag': 1,
}

View File

@ -5,33 +5,50 @@ import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
float_or_none,
int_or_none,
)
class JWPlatformBaseIE(InfoExtractor):
def _parse_jwplayer_data(self, jwplayer_data, video_id, require_title=True):
def _parse_jwplayer_data(self, jwplayer_data, video_id, require_title=True, m3u8_id=None, rtmp_params=None):
video_data = jwplayer_data['playlist'][0]
formats = []
for source in video_data['sources']:
source_url = self._proto_relative_url(source['file'])
source_type = source.get('type') or ''
if source_type in ('application/vnd.apple.mpegurl', 'hls'):
if source_type in ('application/vnd.apple.mpegurl', 'hls') or determine_ext(source_url) == 'm3u8':
formats.extend(self._extract_m3u8_formats(
source_url, video_id, 'mp4', 'm3u8_native', fatal=False))
source_url, video_id, 'mp4', 'm3u8_native', m3u8_id=m3u8_id, fatal=False))
elif source_type.startswith('audio'):
formats.append({
'url': source_url,
'vcodec': 'none',
})
else:
formats.append({
a_format = {
'url': source_url,
'width': int_or_none(source.get('width')),
'height': int_or_none(source.get('height')),
}
if source_url.startswith('rtmp'):
a_format['ext'] = 'flv'
# See com/longtailvideo/jwplayer/media/RTMPMediaProvider.as
# of jwplayer.flash.swf
rtmp_url_parts = re.split(
r'((?:mp4|mp3|flv):)', source_url, 1)
if len(rtmp_url_parts) == 3:
rtmp_url, prefix, play_path = rtmp_url_parts
a_format.update({
'url': rtmp_url,
'play_path': prefix + play_path,
})
if rtmp_params:
a_format.update(rtmp_params)
formats.append(a_format)
self._sort_formats(formats)
subtitles = {}
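
The RTMP handling above separates a source URL into a connection URL and a play path at the first `mp4:`, `mp3:` or `flv:` prefix. A minimal sketch of that split follows; `split_rtmp_url` is a hypothetical name and the example URLs in the test are made up.

```python
import re


def split_rtmp_url(source_url):
    """Split an RTMP URL into (base_url, play_path).

    Returns (source_url, None) when no mp4:/mp3:/flv: prefix is found.
    Hypothetical helper for illustration only.
    """
    # The capturing group makes re.split keep the matched prefix in the
    # result, so a successful split always yields exactly three parts.
    parts = re.split(r'((?:mp4|mp3|flv):)', source_url, maxsplit=1)
    if len(parts) == 3:
        base_url, prefix, play_path = parts
        return base_url, prefix + play_path
    return source_url, None
```

Keeping the prefix in the play path matters because RTMP servers commonly use it to select the media container.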

View File

@ -0,0 +1,33 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
class LearnrIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?learnr\.pro/view/video/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://www.learnr.pro/view/video/51624-web-development-tutorial-for-beginners-1-how-to-build-webpages-with-html-css-javascript',
'md5': '3719fdf0a68397f49899e82c308a89de',
'info_dict': {
'id': '51624',
'ext': 'mp4',
'title': 'Web Development Tutorial for Beginners (#1) - How to build webpages with HTML, CSS, Javascript',
'description': 'md5:b36dbfa92350176cdf12b4d388485503',
'uploader': 'LearnCode.academy',
'uploader_id': 'learncodeacademy',
'upload_date': '20131021',
},
'add_ie': ['Youtube'],
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
return {
'_type': 'url_transparent',
'url': self._search_regex(
r"videoId\s*:\s*'([^']+)'", webpage, 'youtube id'),
'id': video_id,
}

View File

@ -0,0 +1,47 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class LocalNews8IE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?localnews8\.com/(?:[^/]+/)*(?P<display_id>[^/]+)/(?P<id>[0-9]+)'
_TEST = {
'url': 'http://www.localnews8.com/news/rexburg-business-turns-carbon-fiber-scraps-into-wedding-rings/35183304',
'md5': 'be4d48aea61aa2bde7be2ee47691ad20',
'info_dict': {
'id': '35183304',
'display_id': 'rexburg-business-turns-carbon-fiber-scraps-into-wedding-rings',
'ext': 'mp4',
'title': 'Rexburg business turns carbon fiber scraps into wedding ring',
'description': 'The process was first invented by Lamborghini and less than a dozen companies around the world use it.',
'duration': 153,
'timestamp': 1441844822,
'upload_date': '20150910',
'uploader_id': 'api',
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
display_id = mobj.group('display_id')
webpage = self._download_webpage(url, display_id)
partner_id = self._search_regex(
r'partnerId\s*[:=]\s*(["\'])(?P<id>\d+)\1',
webpage, 'partner id', group='id')
kaltura_id = self._search_regex(
r'videoIdString\s*[:=]\s*(["\'])kaltura:(?P<id>[0-9a-z_]+)\1',
webpage, 'video id', group='id')
return {
'_type': 'url_transparent',
'url': 'kaltura:%s:%s' % (partner_id, kaltura_id),
'ie_key': 'Kaltura',
'id': video_id,
'display_id': display_id,
}

View File

@ -11,7 +11,7 @@ class MGTVIE(InfoExtractor):
_TEST = {
'url': 'http://www.mgtv.com/v/1/290525/f/3116640.html',
'md5': '',
'md5': '1bdadcf760a0b90946ca68ee9a2db41a',
'info_dict': {
'id': '3116640',
'ext': 'mp4',
@ -20,15 +20,6 @@ class MGTVIE(InfoExtractor):
'duration': 7461,
'thumbnail': 're:^https?://.*\.jpg$',
},
'params': {
'skip_download': True, # m3u8 download
},
}
_FORMAT_MAP = {
'标清': ('Standard', 0),
'高清': ('High', 1),
'超清': ('SuperHigh', 2),
}
def _real_extract(self, url):
@ -40,17 +31,27 @@ class MGTVIE(InfoExtractor):
formats = []
for idx, stream in enumerate(api_data['stream']):
format_name = stream.get('name')
format_id, preference = self._FORMAT_MAP.get(format_name, (None, None))
stream_url = stream.get('url')
if not stream_url:
continue
tbr = int_or_none(self._search_regex(
r'(\d+)\.mp4', stream_url, 'tbr', default=None))
def extract_format(stream_url, format_id, idx, query={}):
format_info = self._download_json(
stream['url'], video_id,
note='Download video info for format %s' % format_id or '#%d' % idx)
formats.append({
stream_url, video_id,
note='Download video info for format %s' % (format_id or '#%d' % idx), query=query)
return {
'format_id': format_id,
'url': format_info['info'],
'ext': 'mp4', # These are m3u8 playlists
'preference': preference,
})
'ext': 'mp4',
'tbr': tbr,
}
formats.append(extract_format(
stream_url, 'hls-%d' % tbr if tbr else None, idx * 2))
formats.append(extract_format(stream_url.replace(
'/playlist.m3u8', ''), 'http-%d' % tbr if tbr else None, idx * 2 + 1, {'pno': 1031}))
self._sort_formats(formats)
return {

View File

@ -1,19 +1,18 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
month_by_name,
int_or_none,
remove_end,
unified_strdate,
)
class NDTVIE(InfoExtractor):
_VALID_URL = r'^https?://(?:www\.)?ndtv\.com/video/player/[^/]*/[^/]*/(?P<id>[a-z0-9]+)'
_VALID_URL = r'https?://(?:www\.)?ndtv\.com/video/(?:[^/]+/)+[^/?#&]+-(?P<id>\d+)'
_TEST = {
'url': 'http://www.ndtv.com/video/player/news/ndtv-exclusive-don-t-need-character-certificate-from-rahul-gandhi-says-arvind-kejriwal/300710',
'url': 'http://www.ndtv.com/video/news/news/ndtv-exclusive-don-t-need-character-certificate-from-rahul-gandhi-says-arvind-kejriwal-300710',
'md5': '39f992dbe5fb531c395d8bbedb1e5e88',
'info_dict': {
'id': '300710',
@ -22,7 +21,7 @@ class NDTVIE(InfoExtractor):
'description': 'md5:ab2d4b4a6056c5cb4caa6d729deabf02',
'upload_date': '20131208',
'duration': 1327,
'thumbnail': 'http://i.ndtvimg.com/video/images/vod/medium/2013-12/big_300710_1386518307.jpg',
'thumbnail': 're:https?://.*\.jpg',
},
}
@ -30,36 +29,19 @@ class NDTVIE(InfoExtractor):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = remove_end(self._og_search_title(webpage), ' - NDTV')
filename = self._search_regex(
r"__filename='([^']+)'", webpage, 'video filename')
video_url = ('http://bitcast-b.bitgravity.com/ndtvod/23372/ndtv/%s' %
filename)
video_url = 'http://bitcast-b.bitgravity.com/ndtvod/23372/ndtv/%s' % filename
duration = int_or_none(self._search_regex(
r"__duration='([^']+)'", webpage, 'duration', fatal=False))
date_m = re.search(r'''(?x)
<p\s+class="vod_dateline">\s*
Published\s+On:\s*
(?P<monthname>[A-Za-z]+)\s+(?P<day>[0-9]+),\s*(?P<year>[0-9]+)
''', webpage)
upload_date = None
upload_date = unified_strdate(self._html_search_meta(
'publish-date', webpage, 'upload date', fatal=False))
if date_m is not None:
month = month_by_name(date_m.group('monthname'))
if month is not None:
upload_date = '%s%02d%02d' % (
date_m.group('year'), month, int(date_m.group('day')))
description = self._og_search_description(webpage)
READ_MORE = ' (Read more)'
if description.endswith(READ_MORE):
description = description[:-len(READ_MORE)]
title = self._og_search_title(webpage)
TITLE_SUFFIX = ' - NDTV'
if title.endswith(TITLE_SUFFIX):
title = title[:-len(TITLE_SUFFIX)]
description = remove_end(self._og_search_description(webpage), ' (Read more)')
return {
'id': video_id,

View File

@ -2,8 +2,12 @@ from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
sanitized_Request,
clean_html,
determine_ext,
int_or_none,
qualities,
urlencode_postdata,
xpath_text,
)
@ -16,12 +20,12 @@ class NFBIE(InfoExtractor):
'url': 'https://www.nfb.ca/film/qallunaat_why_white_people_are_funny',
'info_dict': {
'id': 'qallunaat_why_white_people_are_funny',
'ext': 'mp4',
'ext': 'flv',
'title': 'Qallunaat! Why White People Are Funny ',
'description': 'md5:836d8aff55e087d04d9f6df554d4e038',
'description': 'md5:6b8e32dde3abf91e58857b174916620c',
'duration': 3128,
'creator': 'Mark Sandiford',
'uploader': 'Mark Sandiford',
'uploader_id': 'mark-sandiford',
},
'params': {
# rtmp download
@ -31,65 +35,78 @@ class NFBIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
page = self._download_webpage(
'https://www.nfb.ca/film/%s' % video_id, video_id,
'Downloading film page')
uploader_id = self._html_search_regex(r'<a class="director-link" href="/explore-all-directors/([^/]+)/"',
page, 'director id', fatal=False)
uploader = self._html_search_regex(r'<em class="director-name" itemprop="name">([^<]+)</em>',
page, 'director name', fatal=False)
request = sanitized_Request(
config = self._download_xml(
'https://www.nfb.ca/film/%s/player_config' % video_id,
urlencode_postdata({'getConfig': 'true'}))
request.add_header('Content-Type', 'application/x-www-form-urlencoded')
request.add_header('X-NFB-Referer', 'http://www.nfb.ca/medias/flash/NFBVideoPlayer.swf')
video_id, 'Downloading player config XML',
data=urlencode_postdata({'getConfig': 'true'}),
headers={
'Content-Type': 'application/x-www-form-urlencoded',
'X-NFB-Referer': 'http://www.nfb.ca/medias/flash/NFBVideoPlayer.swf'
})
config = self._download_xml(request, video_id, 'Downloading player config XML')
title = None
description = None
thumbnail = None
duration = None
formats = []
def extract_thumbnail(media):
thumbnails = {}
for asset in media.findall('assets/asset'):
thumbnails[asset.get('quality')] = asset.find('default/url').text
if not thumbnails:
return None
if 'high' in thumbnails:
return thumbnails['high']
return list(thumbnails.values())[0]
title, description, thumbnail, duration, uploader, author = [None] * 6
thumbnails, formats = [], []
subtitles = {}
for media in config.findall('./player/stream/media'):
if media.get('type') == 'posterImage':
thumbnail = extract_thumbnail(media)
elif media.get('type') == 'video':
duration = int(media.get('duration'))
title = media.find('title').text
description = media.find('description').text
# It seems assets always go from lower to better quality, so no need to sort
quality_key = qualities(('low', 'high'))
thumbnails = []
for asset in media.findall('assets/asset'):
for x in asset:
asset_url = xpath_text(asset, 'default/url', default=None)
if not asset_url:
continue
quality = asset.get('quality')
thumbnails.append({
'url': asset_url,
'id': quality,
'preference': quality_key(quality),
})
elif media.get('type') == 'video':
title = xpath_text(media, 'title', fatal=True)
for asset in media.findall('assets/asset'):
quality = asset.get('quality')
height = int_or_none(self._search_regex(
r'^(\d+)[pP]$', quality or '', 'height', default=None))
for node in asset:
streamer = xpath_text(node, 'streamerURI', default=None)
if not streamer:
continue
play_path = xpath_text(node, 'url', default=None)
if not play_path:
continue
formats.append({
'url': x.find('streamerURI').text,
'app': x.find('streamerURI').text.split('/', 3)[3],
'play_path': x.find('url').text,
'url': streamer,
'app': streamer.split('/', 3)[3],
'play_path': play_path,
'rtmp_live': False,
'ext': 'mp4',
'format_id': '%s-%s' % (x.tag, asset.get('quality')),
'ext': 'flv',
'format_id': '%s-%s' % (node.tag, quality) if quality else node.tag,
'height': height,
})
self._sort_formats(formats)
description = clean_html(xpath_text(media, 'description'))
uploader = xpath_text(media, 'author')
duration = int_or_none(media.get('duration'))
for subtitle in media.findall('./subtitles/subtitle'):
subtitle_url = xpath_text(subtitle, 'url', default=None)
if not subtitle_url:
continue
lang = xpath_text(subtitle, 'lang', default='en')
subtitles.setdefault(lang, []).append({
'url': subtitle_url,
'ext': (subtitle.get('format') or determine_ext(subtitle_url)).lower(),
})
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'thumbnails': thumbnails,
'duration': duration,
'creator': uploader,
'uploader': uploader,
'uploader_id': uploader_id,
'formats': formats,
'subtitles': subtitles,
}

View File

@ -4,35 +4,140 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_urlparse,
compat_urllib_parse_unquote,
)
from ..compat import compat_urllib_parse_unquote
from ..utils import (
determine_ext,
ExtractorError,
float_or_none,
int_or_none,
parse_age_limit,
parse_duration,
unified_strdate,
)
class NRKIE(InfoExtractor):
_VALID_URL = r'(?:nrk:|https?://(?:www\.)?nrk\.no/video/PS\*)(?P<id>\d+)'
class NRKBaseIE(InfoExtractor):
def _extract_formats(self, manifest_url, video_id, fatal=True):
formats = []
formats.extend(self._extract_f4m_formats(
manifest_url + '?hdcore=3.5.0&plugin=aasp-3.5.0.151.81',
video_id, f4m_id='hds', fatal=fatal))
formats.extend(self._extract_m3u8_formats(manifest_url.replace(
'akamaihd.net/z/', 'akamaihd.net/i/').replace('/manifest.f4m', '/master.m3u8'),
video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=fatal))
return formats
_TESTS = [
{
def _real_extract(self, url):
video_id = self._match_id(url)
data = self._download_json(
'http://%s/mediaelement/%s' % (self._API_HOST, video_id),
video_id, 'Downloading mediaelement JSON')
title = data.get('fullTitle') or data.get('mainTitle') or data['title']
video_id = data.get('id') or video_id
entries = []
media_assets = data.get('mediaAssets')
if media_assets and isinstance(media_assets, list):
def video_id_and_title(idx):
return ((video_id, title) if len(media_assets) == 1
else ('%s-%d' % (video_id, idx), '%s (Part %d)' % (title, idx)))
for num, asset in enumerate(media_assets, 1):
asset_url = asset.get('url')
if not asset_url:
continue
formats = self._extract_formats(asset_url, video_id, fatal=False)
if not formats:
continue
self._sort_formats(formats)
entry_id, entry_title = video_id_and_title(num)
duration = parse_duration(asset.get('duration'))
subtitles = {}
for subtitle in ('webVtt', 'timedText'):
subtitle_url = asset.get('%sSubtitlesUrl' % subtitle)
if subtitle_url:
subtitles.setdefault('no', []).append({
'url': compat_urllib_parse_unquote(subtitle_url)
})
entries.append({
'id': asset.get('carrierId') or entry_id,
'title': entry_title,
'duration': duration,
'subtitles': subtitles,
'formats': formats,
})
if not entries:
media_url = data.get('mediaUrl')
if media_url:
formats = self._extract_formats(media_url, video_id)
self._sort_formats(formats)
duration = parse_duration(data.get('duration'))
entries = [{
'id': video_id,
'title': title,
'duration': duration,
'formats': formats,
}]
if not entries:
if data.get('usageRights', {}).get('isGeoBlocked'):
raise ExtractorError(
'NRK har ikke rettigheter til å vise dette programmet utenfor Norge',
expected=True)
conviva = data.get('convivaStatistics') or {}
series = conviva.get('seriesName') or data.get('seriesTitle')
episode = conviva.get('episodeName') or data.get('episodeNumberOrDate')
thumbnails = None
images = data.get('images')
if images and isinstance(images, dict):
web_images = images.get('webImages')
if isinstance(web_images, list):
thumbnails = [{
'url': image['imageUrl'],
'width': int_or_none(image.get('width')),
'height': int_or_none(image.get('height')),
} for image in web_images if image.get('imageUrl')]
description = data.get('description')
common_info = {
'description': description,
'series': series,
'episode': episode,
'age_limit': parse_age_limit(data.get('legalAge')),
'thumbnails': thumbnails,
}
vcodec = 'none' if data.get('mediaType') == 'Audio' else None
# TODO: extract chapters when https://github.com/rg3/youtube-dl/pull/9409 is merged
for entry in entries:
entry.update(common_info)
for f in entry['formats']:
f['vcodec'] = vcodec
return self.playlist_result(entries, video_id, title, description)
class NRKIE(NRKBaseIE):
_VALID_URL = r'(?:nrk:|https?://(?:www\.)?nrk\.no/video/PS\*)(?P<id>\d+)'
_API_HOST = 'v8.psapi.nrk.no'
_TESTS = [{
# video
'url': 'http://www.nrk.no/video/PS*150533',
# MD5 is unstable
'md5': '2f7f6eeb2aacdd99885f355428715cfa',
'info_dict': {
'id': '150533',
'ext': 'flv',
'ext': 'mp4',
'title': 'Dompap og andre fugler i Piip-Show',
'description': 'md5:d9261ba34c43b61c812cb6b0269a5c8f',
'duration': 263,
}
},
{
}, {
# audio
'url': 'http://www.nrk.no/video/PS*154915',
# MD5 is unstable
'info_dict': {
@ -42,52 +147,75 @@ class NRKIE(InfoExtractor):
'description': 'md5:a621f5cc1bd75c8d5104cb048c6b8568',
'duration': 20,
}
},
]
def _real_extract(self, url):
video_id = self._match_id(url)
data = self._download_json(
'http://v8.psapi.nrk.no/mediaelement/%s' % video_id,
video_id, 'Downloading media JSON')
media_url = data.get('mediaUrl')
if not media_url:
if data['usageRights']['isGeoBlocked']:
raise ExtractorError(
'NRK har ikke rettigheter til å vise dette programmet utenfor Norge',
expected=True)
if determine_ext(media_url) == 'f4m':
formats = self._extract_f4m_formats(
media_url + '?hdcore=3.5.0&plugin=aasp-3.5.0.151.81', video_id, f4m_id='hds')
self._sort_formats(formats)
else:
formats = [{
'url': media_url,
'ext': 'flv',
}]
duration = parse_duration(data.get('duration'))
images = data.get('images')
if images:
thumbnails = images['webImages']
thumbnails.sort(key=lambda image: image['pixelWidth'])
thumbnail = thumbnails[-1]['imageUrl']
else:
thumbnail = None
class NRKTVIE(NRKBaseIE):
IE_DESC = 'NRK TV and NRK Radio'
_VALID_URL = r'https?://(?:tv|radio)\.nrk(?:super)?\.no/(?:serie/[^/]+|program)/(?P<id>[a-zA-Z]{4}\d{8})(?:/\d{2}-\d{2}-\d{4})?(?:#del=(?P<part_id>\d+))?'
_API_HOST = 'psapi-we.nrk.no'
return {
'id': video_id,
'title': data['title'],
'description': data['description'],
'duration': duration,
'thumbnail': thumbnail,
'formats': formats,
}
_TESTS = [{
'url': 'https://tv.nrk.no/serie/20-spoersmaal-tv/MUHH48000314/23-05-2014',
'md5': '4e9ca6629f09e588ed240fb11619922a',
'info_dict': {
'id': 'MUHH48000314AA',
'ext': 'mp4',
'title': '20 spørsmål 23.05.2014',
'description': 'md5:bdea103bc35494c143c6a9acdd84887a',
'duration': 1741.52,
},
}, {
'url': 'https://tv.nrk.no/program/mdfp15000514',
'md5': '43d0be26663d380603a9cf0c24366531',
'info_dict': {
'id': 'MDFP15000514CA',
'ext': 'mp4',
'title': 'Grunnlovsjubiléet - Stor ståhei for ingenting 24.05.2014',
'description': 'md5:89290c5ccde1b3a24bb8050ab67fe1db',
'duration': 4605.08,
},
}, {
# single playlist video
'url': 'https://tv.nrk.no/serie/tour-de-ski/MSPO40010515/06-01-2015#del=2',
'md5': 'adbd1dbd813edaf532b0a253780719c2',
'info_dict': {
'id': 'MSPO40010515-part2',
'ext': 'flv',
'title': 'Tour de Ski: Sprint fri teknikk, kvinner og menn 06.01.2015 (del 2:2)',
'description': 'md5:238b67b97a4ac7d7b4bf0edf8cc57d26',
},
'skip': 'Only works from Norway',
}, {
'url': 'https://tv.nrk.no/serie/tour-de-ski/MSPO40010515/06-01-2015',
'playlist': [{
'md5': '9480285eff92d64f06e02a5367970a7a',
'info_dict': {
'id': 'MSPO40010515-part1',
'ext': 'flv',
'title': 'Tour de Ski: Sprint fri teknikk, kvinner og menn 06.01.2015 (del 1:2)',
'description': 'md5:238b67b97a4ac7d7b4bf0edf8cc57d26',
},
}, {
'md5': 'adbd1dbd813edaf532b0a253780719c2',
'info_dict': {
'id': 'MSPO40010515-part2',
'ext': 'flv',
'title': 'Tour de Ski: Sprint fri teknikk, kvinner og menn 06.01.2015 (del 2:2)',
'description': 'md5:238b67b97a4ac7d7b4bf0edf8cc57d26',
},
}],
'info_dict': {
'id': 'MSPO40010515',
'title': 'Tour de Ski: Sprint fri teknikk, kvinner og menn',
'description': 'md5:238b67b97a4ac7d7b4bf0edf8cc57d26',
'duration': 6947.52,
},
'skip': 'Only works from Norway',
}, {
'url': 'https://radio.nrk.no/serie/dagsnytt/NPUB21019315/12-07-2015#',
'only_matching': True,
}]
class NRKPlaylistIE(InfoExtractor):
@ -159,179 +287,3 @@ class NRKSkoleIE(InfoExtractor):
nrk_id = self._search_regex(r'data-nrk-id=["\'](\d+)', webpage, 'nrk id')
return self.url_result('nrk:%s' % nrk_id)
class NRKTVIE(InfoExtractor):
IE_DESC = 'NRK TV and NRK Radio'
_VALID_URL = r'(?P<baseurl>https?://(?:tv|radio)\.nrk(?:super)?\.no/)(?:serie/[^/]+|program)/(?P<id>[a-zA-Z]{4}\d{8})(?:/\d{2}-\d{2}-\d{4})?(?:#del=(?P<part_id>\d+))?'
_TESTS = [
{
'url': 'https://tv.nrk.no/serie/20-spoersmaal-tv/MUHH48000314/23-05-2014',
'info_dict': {
'id': 'MUHH48000314',
'ext': 'mp4',
'title': '20 spørsmål',
'description': 'md5:bdea103bc35494c143c6a9acdd84887a',
'upload_date': '20140523',
'duration': 1741.52,
},
'params': {
# m3u8 download
'skip_download': True,
},
},
{
'url': 'https://tv.nrk.no/program/mdfp15000514',
'info_dict': {
'id': 'mdfp15000514',
'ext': 'mp4',
'title': 'Grunnlovsjubiléet - Stor ståhei for ingenting',
'description': 'md5:654c12511f035aed1e42bdf5db3b206a',
'upload_date': '20140524',
'duration': 4605.08,
},
'params': {
# m3u8 download
'skip_download': True,
},
},
{
# single playlist video
'url': 'https://tv.nrk.no/serie/tour-de-ski/MSPO40010515/06-01-2015#del=2',
'md5': 'adbd1dbd813edaf532b0a253780719c2',
'info_dict': {
'id': 'MSPO40010515-part2',
'ext': 'flv',
'title': 'Tour de Ski: Sprint fri teknikk, kvinner og menn 06.01.2015 (del 2:2)',
'description': 'md5:238b67b97a4ac7d7b4bf0edf8cc57d26',
'upload_date': '20150106',
},
'skip': 'Only works from Norway',
},
{
'url': 'https://tv.nrk.no/serie/tour-de-ski/MSPO40010515/06-01-2015',
'playlist': [
{
'md5': '9480285eff92d64f06e02a5367970a7a',
'info_dict': {
'id': 'MSPO40010515-part1',
'ext': 'flv',
'title': 'Tour de Ski: Sprint fri teknikk, kvinner og menn 06.01.2015 (del 1:2)',
'description': 'md5:238b67b97a4ac7d7b4bf0edf8cc57d26',
'upload_date': '20150106',
},
},
{
'md5': 'adbd1dbd813edaf532b0a253780719c2',
'info_dict': {
'id': 'MSPO40010515-part2',
'ext': 'flv',
'title': 'Tour de Ski: Sprint fri teknikk, kvinner og menn 06.01.2015 (del 2:2)',
'description': 'md5:238b67b97a4ac7d7b4bf0edf8cc57d26',
'upload_date': '20150106',
},
},
],
'info_dict': {
'id': 'MSPO40010515',
'title': 'Tour de Ski: Sprint fri teknikk, kvinner og menn',
'description': 'md5:238b67b97a4ac7d7b4bf0edf8cc57d26',
'upload_date': '20150106',
'duration': 6947.5199999999995,
},
'skip': 'Only works from Norway',
},
{
'url': 'https://radio.nrk.no/serie/dagsnytt/NPUB21019315/12-07-2015#',
'only_matching': True,
}
]
def _extract_f4m(self, manifest_url, video_id):
return self._extract_f4m_formats(
manifest_url + '?hdcore=3.1.1&plugin=aasp-3.1.1.69.124', video_id, f4m_id='hds')
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
part_id = mobj.group('part_id')
base_url = mobj.group('baseurl')
webpage = self._download_webpage(url, video_id)
title = self._html_search_meta(
'title', webpage, 'title')
description = self._html_search_meta(
'description', webpage, 'description')
thumbnail = self._html_search_regex(
r'data-posterimage="([^"]+)"',
webpage, 'thumbnail', fatal=False)
upload_date = unified_strdate(self._html_search_meta(
'rightsfrom', webpage, 'upload date', fatal=False))
duration = float_or_none(self._html_search_regex(
r'data-duration="([^"]+)"',
webpage, 'duration', fatal=False))
# playlist
parts = re.findall(
r'<a href="#del=(\d+)"[^>]+data-argument="([^"]+)">([^<]+)</a>', webpage)
if parts:
entries = []
for current_part_id, stream_url, part_title in parts:
if part_id and current_part_id != part_id:
continue
video_part_id = '%s-part%s' % (video_id, current_part_id)
formats = self._extract_f4m(stream_url, video_part_id)
entries.append({
'id': video_part_id,
'title': part_title,
'description': description,
'thumbnail': thumbnail,
'upload_date': upload_date,
'formats': formats,
})
if part_id:
if entries:
return entries[0]
else:
playlist = self.playlist_result(entries, video_id, title, description)
playlist.update({
'thumbnail': thumbnail,
'upload_date': upload_date,
'duration': duration,
})
return playlist
formats = []
f4m_url = re.search(r'data-media="([^"]+)"', webpage)
if f4m_url:
formats.extend(self._extract_f4m(f4m_url.group(1), video_id))
m3u8_url = re.search(r'data-hls-media="([^"]+)"', webpage)
if m3u8_url:
formats.extend(self._extract_m3u8_formats(m3u8_url.group(1), video_id, 'mp4', m3u8_id='hls'))
self._sort_formats(formats)
subtitles_url = self._html_search_regex(
r'data-subtitlesurl\s*=\s*(["\'])(?P<url>.+?)\1',
webpage, 'subtitle URL', default=None, group='url')
subtitles = {}
if subtitles_url:
subtitles['no'] = [{
'ext': 'ttml',
'url': compat_urlparse.urljoin(base_url, subtitles_url),
}]
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'upload_date': upload_date,
'duration': duration,
'formats': formats,
'subtitles': subtitles,
}
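
The new `NRKBaseIE._extract_formats` derives an HLS playlist URL from the HDS manifest URL with two string replacements, and multi-part programs get a per-part id and title. A minimal standalone sketch of both steps follows. The function names and the sample URL are hypothetical and are not part of the extractor.

```python
def hds_to_hls(manifest_url):
    """Rewrite an Akamai HDS manifest URL into the matching HLS URL.

    Applies the same two replacements as the diff; the name is illustrative.
    """
    return manifest_url.replace(
        'akamaihd.net/z/', 'akamaihd.net/i/').replace(
        '/manifest.f4m', '/master.m3u8')


def part_id_and_title(video_id, title, idx, total):
    """Return (id, title) for part number idx out of total media assets."""
    if total == 1:
        # A single asset keeps the plain id and title.
        return video_id, title
    return '%s-%d' % (video_id, idx), '%s (Part %d)' % (title, idx)


# Made-up manifest URL, used only to show the rewrite.
SAMPLE_MANIFEST = 'http://example-f.akamaihd.net/z/path/video.csmil/manifest.f4m'
```

Both helpers are pure string operations, so they can be checked without network access.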

@ -100,7 +100,7 @@ class OpenloadIE(InfoExtractor):
raise ExtractorError('File not found', expected=True)
code = self._search_regex(
r'<video[^>]+>\s*<script[^>]+>([^<]+)</script>',
r'</video>\s*</div>\s*<script[^>]+>([^<]+)</script>',
webpage, 'JS code')
decoded = self.openload_decode(code)

@ -12,8 +12,8 @@ from ..utils import (
class OraTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?ora\.tv/([^/]+/)*(?P<id>[^/\?#]+)'
_TEST = {
_VALID_URL = r'https?://(?:www\.)?(?:ora\.tv|unsafespeech\.com)/([^/]+/)*(?P<id>[^/\?#]+)'
_TESTS = [{
'url': 'https://www.ora.tv/larrykingnow/2015/12/16/vine-youtube-stars-zach-king-king-bach-on-their-viral-videos-0_36jupg6090pq',
'md5': 'fa33717591c631ec93b04b0e330df786',
'info_dict': {
@ -22,7 +22,10 @@ class OraTVIE(InfoExtractor):
'title': 'Vine & YouTube Stars Zach King & King Bach On Their Viral Videos!',
'description': 'md5:ebbc5b1424dd5dba7be7538148287ac1',
}
}
}, {
'url': 'http://www.unsafespeech.com/video/2016/5/10/student-self-censorship-and-the-thought-police-on-university-campuses-0_6622bnkppw4d',
'only_matching': True,
}]
def _real_extract(self, url):
display_id = self._match_id(url)
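
The widened `_VALID_URL` accepts a second host through a non-capturing alternation. A small sketch of how the id is matched, using only the standard `re` module; `match_id` is a hypothetical stand-in for the extractor's `_match_id` and may differ from it in details.

```python
import re

# Pattern copied from the new side of the diff.
VALID_URL = r'https?://(?:www\.)?(?:ora\.tv|unsafespeech\.com)/([^/]+/)*(?P<id>[^/\?#]+)'


def match_id(url):
    """Return the last path segment as the display id, or None if no match."""
    mobj = re.match(VALID_URL, url)
    return mobj.group('id') if mobj else None
```

Because `([^/]+/)*` is greedy, the named group ends up holding the final path segment on either host.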

@ -2,7 +2,10 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import parse_iso8601
from ..utils import (
parse_iso8601,
unescapeHTML,
)
class PeriscopeIE(InfoExtractor):
@ -42,8 +45,11 @@ class PeriscopeIE(InfoExtractor):
broadcast = broadcast_data['broadcast']
status = broadcast['status']
uploader = broadcast.get('user_display_name') or broadcast_data.get('user', {}).get('display_name')
uploader_id = broadcast.get('user_id') or broadcast_data.get('user', {}).get('id')
user = broadcast_data.get('user', {})
uploader = broadcast.get('user_display_name') or user.get('display_name')
uploader_id = (broadcast.get('username') or user.get('username') or
broadcast.get('user_id') or user.get('id'))
title = '%s - %s' % (uploader, status) if uploader else status
state = broadcast.get('state').lower()
@ -92,6 +98,7 @@ class PeriscopeUserIE(InfoExtractor):
'info_dict': {
'id': 'LularoeHusbandMike',
'title': 'LULAROE HUSBAND MIKE',
'description': 'md5:6cf4ec8047768098da58e446e82c82f0',
},
# Periscope only shows videos in the last 24 hours, so it's possible to
# get 0 videos
@ -103,16 +110,19 @@ class PeriscopeUserIE(InfoExtractor):
webpage = self._download_webpage(url, user_id)
broadcast_data = self._parse_json(self._html_search_meta(
'broadcast-data', webpage, default='{}'), user_id)
username = broadcast_data.get('user', {}).get('display_name')
user_broadcasts = self._parse_json(
self._html_search_meta('user-broadcasts', webpage, default='{}'),
data_store = self._parse_json(
unescapeHTML(self._search_regex(
r'data-store=(["\'])(?P<data>.+?)\1',
webpage, 'data store', default='{}', group='data')),
user_id)
user = data_store.get('User', {}).get('user', {})
title = user.get('display_name') or user.get('username')
description = user.get('description')
entries = [
self.url_result(
'https://www.periscope.tv/%s/%s' % (user_id, broadcast['id']))
for broadcast in user_broadcasts.get('broadcasts', [])]
for broadcast in data_store.get('UserBroadcastHistory', {}).get('broadcasts', [])]
return self.playlist_result(entries, user_id, username)
return self.playlist_result(entries, user_id, title, description)
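
The rewritten `PeriscopeUserIE` reads its JSON from an HTML-escaped `data-store` attribute instead of a meta tag. The sketch below reproduces that flow with the standard library only. `html.unescape` is an assumed stand-in for youtube-dl's `unescapeHTML`, and the function names are made up for illustration.

```python
import html
import json
import re

# Same attribute regex as in the diff: quote, lazy payload, same quote.
DATA_STORE_RE = r'data-store=(["\'])(?P<data>.+?)\1'


def parse_data_store(webpage):
    """Extract and decode the data-store JSON; fall back to an empty dict."""
    mobj = re.search(DATA_STORE_RE, webpage)
    raw = mobj.group('data') if mobj else '{}'
    return json.loads(html.unescape(raw))


def broadcast_ids(data_store):
    """Collect broadcast ids from the UserBroadcastHistory section."""
    history = data_store.get('UserBroadcastHistory', {})
    return [b['id'] for b in history.get('broadcasts', [])]
```

The attribute value must be unescaped before JSON decoding, because the quotes inside it arrive as `&quot;` entities.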

@ -6,6 +6,9 @@ import re
import time
from .common import InfoExtractor
from ..compat import (
compat_struct_unpack,
)
from ..utils import (
ExtractorError,
float_or_none,
@ -13,7 +16,6 @@ from ..utils import (
remove_start,
sanitized_Request,
std_headers,
struct_unpack,
)
@ -21,7 +23,7 @@ def _decrypt_url(png):
encrypted_data = base64.b64decode(png.encode('utf-8'))
text_index = encrypted_data.find(b'tEXt')
text_chunk = encrypted_data[text_index - 4:]
length = struct_unpack('!I', text_chunk[:4])[0]
length = compat_struct_unpack('!I', text_chunk[:4])[0]
# Use bytearray to get integers when iterating in both python 2.x and 3.x
data = bytearray(text_chunk[8:8 + length])
data = [chr(b) for b in data if b != 0]
@ -62,7 +64,7 @@ def _decrypt_url(png):
class RTVEALaCartaIE(InfoExtractor):
IE_NAME = 'rtve.es:alacarta'
IE_DESC = 'RTVE a la carta'
_VALID_URL = r'https?://www\.rtve\.es/(m/)?alacarta/videos/[^/]+/[^/]+/(?P<id>\d+)'
_VALID_URL = r'https?://www\.rtve\.es/(m/)?(alacarta/videos|filmoteca)/[^/]+/[^/]+/(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.rtve.es/alacarta/videos/balonmano/o-swiss-cup-masculina-final-espana-suecia/2491869/',
@ -85,6 +87,9 @@ class RTVEALaCartaIE(InfoExtractor):
}, {
'url': 'http://www.rtve.es/m/alacarta/videos/cuentame-como-paso/cuentame-como-paso-t16-ultimo-minuto-nuestra-vida-capitulo-276/2969138/?media=tve',
'only_matching': True,
}, {
'url': 'http://www.rtve.es/filmoteca/no-do/not-1-introduccion-primer-noticiario-espanol/1465256/',
'only_matching': True,
}]
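
`_decrypt_url` locates the PNG `tEXt` chunk and reads the 4-byte big-endian length stored just before the chunk type. A minimal sketch of that length-prefixed read, assuming `compat_struct_unpack` behaves like `struct.unpack` here; the helper name and the sample bytes are invented.

```python
import struct


def text_chunk_payload(png_bytes):
    """Return the payload of the first tEXt chunk, or None if absent."""
    text_index = png_bytes.find(b'tEXt')
    if text_index < 4:
        # Not found (-1) or no room for the 4-byte length field.
        return None
    chunk = png_bytes[text_index - 4:]
    # '!I' reads a network-order (big-endian) unsigned 32-bit integer.
    length = struct.unpack('!I', chunk[:4])[0]
    # Skip 4 length bytes and 4 type bytes, then take `length` bytes.
    return bytes(chunk[8:8 + length])
```

The real function goes on to decode the payload further; this sketch stops at extracting the raw chunk data.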
def _real_initialize(self):

@ -0,0 +1,86 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .jwplatform import JWPlatformBaseIE
from ..compat import compat_parse_qs
from ..utils import (
ExtractorError,
parse_duration,
)
class SendtoNewsIE(JWPlatformBaseIE):
_VALID_URL = r'https?://embed\.sendtonews\.com/player/embed\.php\?(?P<query>[^#]+)'
_TEST = {
# From http://cleveland.cbslocal.com/2016/05/16/indians-score-season-high-15-runs-in-blowout-win-over-reds-rapid-reaction/
'url': 'http://embed.sendtonews.com/player/embed.php?SK=GxfCe0Zo7D&MK=175909&PK=5588&autoplay=on&sound=yes',
'info_dict': {
'id': 'GxfCe0Zo7D-175909-5588',
'ext': 'mp4',
'title': 'Recap: CLE 15, CIN 6',
'description': '5/16/16: Indians\' bats explode for 15 runs in a win',
'duration': 49,
},
'params': {
# m3u8 download
'skip_download': True,
},
}
_URL_TEMPLATE = '//embed.sendtonews.com/player/embed.php?SK=%s&MK=%s&PK=%s'
@classmethod
def _extract_url(cls, webpage):
mobj = re.search(r'''(?x)<script[^>]+src=([\'"])
(?:https?:)?//embed\.sendtonews\.com/player/responsiveembed\.php\?
.*\bSC=(?P<SC>[0-9a-zA-Z-]+).*
\1>''', webpage)
if mobj:
sk, mk, pk = mobj.group('SC').split('-')
return cls._URL_TEMPLATE % (sk, mk, pk)
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
params = compat_parse_qs(mobj.group('query'))
if 'SK' not in params or 'MK' not in params or 'PK' not in params:
raise ExtractorError('Invalid URL', expected=True)
video_id = '-'.join([params['SK'][0], params['MK'][0], params['PK'][0]])
webpage = self._download_webpage(url, video_id)
jwplayer_data_str = self._search_regex(
r'jwplayer\("[^"]+"\)\.setup\((.+?)\);', webpage, 'JWPlayer data')
js_vars = {
'w': 1024,
'h': 768,
'modeVar': 'html5',
}
for name, val in js_vars.items():
js_val = '%d' % val if isinstance(val, int) else '"%s"' % val
jwplayer_data_str = jwplayer_data_str.replace(':%s,' % name, ':%s,' % js_val)
info_dict = self._parse_jwplayer_data(
self._parse_json(jwplayer_data_str, video_id),
video_id, require_title=False, rtmp_params={'no_resume': True})
title = self._html_search_regex(
r'<div[^>]+class="embedTitle">([^<]+)</div>', webpage, 'title')
description = self._html_search_regex(
r'<div[^>]+class="embedSubTitle">([^<]+)</div>', webpage,
'description', fatal=False)
duration = parse_duration(self._html_search_regex(
r'<div[^>]+class="embedDetails">([0-9:]+)', webpage,
'duration', fatal=False))
info_dict.update({
'title': title,
'description': description,
'duration': duration,
})
return info_dict
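
`SendtoNewsIE` substitutes a few bare JavaScript variable names in the JWPlayer setup string so that it can be parsed as JSON. A standalone sketch of that substitution follows; the sample setup string is made up, and the helper name is hypothetical.

```python
import json

# Same placeholder values as in the diff.
JS_VARS = {'w': 1024, 'h': 768, 'modeVar': 'html5'}


def substitute_js_vars(data_str, js_vars=JS_VARS):
    """Replace bare variable references of the form ':name,' with literals."""
    for name, val in js_vars.items():
        # Integers are emitted bare, strings are wrapped in double quotes.
        js_val = '%d' % val if isinstance(val, int) else '"%s"' % val
        data_str = data_str.replace(':%s,' % name, ':%s,' % js_val)
    return data_str


SAMPLE_SETUP = '{"width":w,"height":h,"primary":modeVar,"file":"a.mp4"}'
```

The replacement only matches a variable that sits directly between `:` and `,` with no whitespace, so it depends on the exact formatting of the embedded script.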

@ -4,28 +4,35 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_urllib_parse_urlencode
from ..utils import sanitized_Request
from ..utils import (
HEADRequest,
ExtractorError,
int_or_none,
update_url_query,
qualities,
get_element_by_attribute,
clean_html,
)
class SinaIE(InfoExtractor):
_VALID_URL = r'''(?x)https?://(.*?\.)?video\.sina\.com\.cn/
(
(.+?/(((?P<pseudo_id>\d+).html)|(.*?(\#|(vid=)|b/)(?P<id>\d+?)($|&|\-))))
|
_VALID_URL = r'''(?x)https?://(?:.*?\.)?video\.sina\.com\.cn/
(?:
(?:view/|.*\#)(?P<video_id>\d+)|
.+?/(?P<pseudo_id>[^/?#]+)(?:\.s?html)|
# This is used by external sites like Weibo
(api/sinawebApi/outplay.php/(?P<token>.+?)\.swf)
api/sinawebApi/outplay.php/(?P<token>.+?)\.swf
)
'''
_TESTS = [
{
'url': 'http://video.sina.com.cn/news/vlist/zt/chczlj2013/?opsubject_id=top12#110028898',
'md5': 'd65dd22ddcf44e38ce2bf58a10c3e71f',
'url': 'http://video.sina.com.cn/news/spj/topvideoes20160504/?opsubject_id=top1#250576622',
'md5': 'd38433e2fc886007729735650ae4b3e9',
'info_dict': {
'id': '110028898',
'ext': 'flv',
'title': '《中国新闻》 朝鲜要求巴拿马立即释放被扣船员',
'id': '250576622',
'ext': 'mp4',
'title': '现场:克鲁兹宣布退选 特朗普将稳获提名',
}
},
{
@ -35,37 +42,74 @@ class SinaIE(InfoExtractor):
'ext': 'flv',
'title': '军方提高对朝情报监视级别',
},
'skip': 'the page does not exist or has been deleted',
},
{
'url': 'http://video.sina.com.cn/view/250587748.html',
'md5': '3d1807a25c775092aab3bc157fff49b4',
'info_dict': {
'id': '250587748',
'ext': 'mp4',
'title': '瞬间泪目8年前汶川地震珍贵视频首曝光',
},
},
]
def _extract_video(self, video_id):
data = compat_urllib_parse_urlencode({'vid': video_id})
url_doc = self._download_xml('http://v.iask.com/v_play.php?%s' % data,
video_id, 'Downloading video url')
image_page = self._download_webpage(
'http://interface.video.sina.com.cn/interface/common/getVideoImage.php?%s' % data,
video_id, 'Downloading thumbnail info')
return {'id': video_id,
'url': url_doc.find('./durl/url').text,
'ext': 'flv',
'title': url_doc.find('./vname').text,
'thumbnail': image_page.split('=')[1],
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
video_id = mobj.group('video_id')
if not video_id:
if mobj.group('token') is not None:
# The video id is in the redirected url
self.to_screen('Getting video id')
request = sanitized_Request(url)
request.get_method = lambda: 'HEAD'
request = HEADRequest(url)
(_, urlh) = self._download_webpage_handle(request, 'NA', False)
return self._real_extract(urlh.geturl())
elif video_id is None:
else:
pseudo_id = mobj.group('pseudo_id')
webpage = self._download_webpage(url, pseudo_id)
video_id = self._search_regex(r'vid:\'(\d+?)\'', webpage, 'video id')
error = get_element_by_attribute('class', 'errtitle', webpage)
if error:
raise ExtractorError('%s said: %s' % (
self.IE_NAME, clean_html(error)), expected=True)
video_id = self._search_regex(
r"video_id\s*:\s*'(\d+)'", webpage, 'video id')
return self._extract_video(video_id)
video_data = self._download_json(
'http://s.video.sina.com.cn/video/h5play',
video_id, query={'video_id': video_id})
if video_data['code'] != 1:
raise ExtractorError('%s said: %s' % (
self.IE_NAME, video_data['message']), expected=True)
else:
video_data = video_data['data']
title = video_data['title']
description = video_data.get('description')
if description:
description = description.strip()
preference = qualities(['cif', 'sd', 'hd', 'fhd', 'ffd'])
formats = []
for quality_id, quality in video_data.get('videos', {}).get('mp4', {}).items():
file_api = quality.get('file_api')
file_id = quality.get('file_id')
if not file_api or not file_id:
continue
formats.append({
'format_id': quality_id,
'url': update_url_query(file_api, {'vid': file_id}),
'preference': preference(quality_id),
'ext': 'mp4',
})
self._sort_formats(formats)
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': video_data.get('image'),
'duration': int_or_none(video_data.get('length')),
'timestamp': int_or_none(video_data.get('create_time')),
'formats': formats,
}
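
The new `SinaIE` ranks formats with `qualities([...])` and builds each format URL with `update_url_query`. The sketch below approximates both helpers with the standard library to show the idea. These are illustrative stand-ins written under the assumption that unknown quality ids rank lowest; they are not the code from `youtube_dl.utils`.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def qualities(quality_ids):
    """Return a function mapping a quality id to its rank in quality_ids."""
    def q(qid):
        try:
            return quality_ids.index(qid)
        except ValueError:
            # Assumed behavior: ids outside the list sort below all known ones.
            return -1
    return q


def add_query(url, params):
    """Merge params into the query string of url (hypothetical helper)."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query.update(params)
    return urlunsplit(parts._replace(query=urlencode(query)))
```

With the list used in the diff, a later position means a higher preference, so `'ffd'` outranks `'cif'`.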

@ -88,7 +88,7 @@ class TeamcocoIE(InfoExtractor):
preload_codes = self._html_search_regex(
r'(function.+)setTimeout\(function\(\)\{playlist',
webpage, 'preload codes')
base64_fragments = re.findall(r'"([a-zA-z0-9+/=]+)"', preload_codes)
base64_fragments = re.findall(r'"([a-zA-Z0-9+/=]+)"', preload_codes)
base64_fragments.remove('init')
def _check_sequence(cur_fragments):
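
The one-character change from `A-z` to `A-Z` in the Teamcoco regex matters because the ASCII range `A-z` also covers the six characters between `Z` and `a`, including `[`, `]` and `_`. A short sketch with an invented sample string shows the difference:

```python
import re

BUGGY = r'"([a-zA-z0-9+/=]+)"'  # old pattern: A-z spans '[', '\\', ']', '^', '_', '`'
FIXED = r'"([a-zA-Z0-9+/=]+)"'  # new pattern: only base64 alphabet characters

# Made-up input: one base64-looking string and two that are not base64.
SAMPLE = '"aGVsbG8=" "not_base64" "[x]"'

buggy_hits = re.findall(BUGGY, SAMPLE)
fixed_hits = re.findall(FIXED, SAMPLE)
```

The buggy class accepts all three quoted strings, while the fixed one keeps only the first.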

@ -151,6 +151,22 @@ class ThePlatformIE(ThePlatformBaseIE):
'only_matching': True,
}]
@classmethod
def _extract_urls(cls, webpage):
m = re.search(
r'''(?x)
<meta\s+
property=(["'])(?:og:video(?::(?:secure_)?url)?|twitter:player)\1\s+
content=(["'])(?P<url>https?://player\.theplatform\.com/p/.+?)\2
''', webpage)
if m:
return [m.group('url')]
matches = re.findall(
r'<(?:iframe|script)[^>]+src=(["\'])((?:https?:)?//player\.theplatform\.com/p/.+?)\1', webpage)
if matches:
return list(zip(*matches))[1]
@staticmethod
def _sign_url(url, sig_key, sig_secret, life=600, include_qs=False):
flags = '10' if include_qs else '00'
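
The new `ThePlatformIE._extract_urls` uses `re.findall` with two groups (the quote character and the URL) and then keeps only the URL column via `list(zip(*matches))[1]`. A small sketch of that transposition, with made-up markup and a hypothetical function name:

```python
import re

# Same iframe/script pattern as in the diff.
EMBED_RE = r'<(?:iframe|script)[^>]+src=(["\'])((?:https?:)?//player\.theplatform\.com/p/.+?)\1'


def embedded_urls(webpage):
    """Return all embedded player URLs found in iframe or script tags."""
    matches = re.findall(EMBED_RE, webpage)
    # findall yields (quote, url) tuples; keep the second element of each.
    return [url for _, url in matches]


def embedded_urls_zip(webpage):
    """Equivalent form using zip, as in the diff; returns a tuple or None."""
    matches = re.findall(EMBED_RE, webpage)
    if matches:
        return list(zip(*matches))[1]
```

Note that the `zip` form returns a tuple rather than a list, and returns `None` when nothing matches.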

@ -0,0 +1,139 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
determine_ext,
js_to_json,
mimetype2ext,
)
class ThreeQSDNIE(InfoExtractor):
IE_NAME = '3qsdn'
IE_DESC = '3Q SDN'
_VALID_URL = r'https?://playout\.3qsdn\.com/(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
_TESTS = [{
# ondemand from http://www.philharmonie.tv/veranstaltung/26/
'url': 'http://playout.3qsdn.com/0280d6b9-1215-11e6-b427-0cc47a188158?protocol=http',
'md5': 'ab040e37bcfa2e0c079f92cb1dd7f6cd',
'info_dict': {
'id': '0280d6b9-1215-11e6-b427-0cc47a188158',
'ext': 'mp4',
'title': '0280d6b9-1215-11e6-b427-0cc47a188158',
'is_live': False,
},
'expected_warnings': ['Failed to download MPD manifest'],
}, {
# live video stream
'url': 'https://playout.3qsdn.com/d755d94b-4ab9-11e3-9162-0025907ad44f?js=true',
'info_dict': {
'id': 'd755d94b-4ab9-11e3-9162-0025907ad44f',
'ext': 'mp4',
'title': 'd755d94b-4ab9-11e3-9162-0025907ad44f',
'is_live': False,
},
}, {
# live audio stream
'url': 'http://playout.3qsdn.com/9edf36e0-6bf2-11e2-a16a-9acf09e2db48',
'only_matching': True,
}, {
# live audio stream with some 404 URLs
'url': 'http://playout.3qsdn.com/ac5c3186-777a-11e2-9c30-9acf09e2db48',
'only_matching': True,
}, {
# geo restricted with 'This content is not available in your country'
'url': 'http://playout.3qsdn.com/d63a3ffe-75e8-11e2-9c30-9acf09e2db48',
'only_matching': True,
}, {
# geo restricted with 'playout.3qsdn.com/forbidden'
'url': 'http://playout.3qsdn.com/8e330f26-6ae2-11e2-a16a-9acf09e2db48',
'only_matching': True,
}, {
# live video with rtmp link
'url': 'https://playout.3qsdn.com/6092bb9e-8f72-11e4-a173-002590c750be',
'only_matching': True,
}]
@staticmethod
def _extract_url(webpage):
mobj = re.search(
r'<iframe[^>]+\b(?:data-)?src=(["\'])(?P<url>%s.*?)\1' % ThreeQSDNIE._VALID_URL, webpage)
if mobj:
return mobj.group('url')
def _real_extract(self, url):
video_id = self._match_id(url)
js = self._download_webpage(
'http://playout.3qsdn.com/%s' % video_id, video_id,
query={'js': 'true'})
if any(p in js for p in (
'>This content is not available in your country',
'playout.3qsdn.com/forbidden')):
self.raise_geo_restricted()
stream_content = self._search_regex(
r'streamContent\s*:\s*(["\'])(?P<content>.+?)\1', js,
'stream content', default='demand', group='content')
live = stream_content == 'live'
stream_type = self._search_regex(
r'streamType\s*:\s*(["\'])(?P<type>audio|video)\1', js,
'stream type', default='video', group='type')
formats = []
urls = set()
def extract_formats(item_url, item={}):
if not item_url or item_url in urls:
return
urls.add(item_url)
type_ = item.get('type')
ext = determine_ext(item_url, default_ext=None)
if type_ == 'application/dash+xml' or ext == 'mpd':
formats.extend(self._extract_mpd_formats(
item_url, video_id, mpd_id='mpd', fatal=False))
elif type_ in ('application/vnd.apple.mpegURL', 'application/x-mpegurl') or ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
item_url, video_id, 'mp4',
entry_protocol='m3u8' if live else 'm3u8_native',
m3u8_id='hls', fatal=False))
elif ext == 'f4m':
formats.extend(self._extract_f4m_formats(
item_url, video_id, f4m_id='hds', fatal=False))
else:
if not self._is_valid_url(item_url, video_id):
return
formats.append({
'url': item_url,
'format_id': item.get('quality'),
'ext': 'mp4' if item_url.startswith('rtsp') else mimetype2ext(type_) or ext,
'vcodec': 'none' if stream_type == 'audio' else None,
})
for item_js in re.findall(r'({.*?\b(?:src|source)\s*:\s*["\'].+?})', js):
f = self._parse_json(
item_js, video_id, transform_source=js_to_json, fatal=False)
if not f:
continue
extract_formats(f.get('src'), f)
# More relaxed version that collects additional URLs and acts
# as a future-proof fallback
for _, src in re.findall(r'\b(?:src|source)\s*:\s*(["\'])((?:https?|rtsp)://.+?)\1', js):
extract_formats(src)
self._sort_formats(formats)
title = self._live_title(video_id) if live else video_id
return {
'id': video_id,
'title': title,
'is_live': live,
'formats': formats,
}
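
Inside `ThreeQSDNIE._real_extract`, the nested `extract_formats` skips URLs it has already seen and then picks a manifest parser from the MIME type or the file extension. The sketch below isolates those two decisions. The function names are hypothetical, and the extension parsing is a simplified stand-in for `determine_ext`.

```python
import posixpath
from urllib.parse import urlsplit


def classify_stream(item_url, mime_type=None):
    """Return 'dash', 'hls', 'hds' or 'direct' for a stream URL."""
    path = urlsplit(item_url).path
    ext = posixpath.splitext(path)[1].lstrip('.').lower() or None
    if mime_type == 'application/dash+xml' or ext == 'mpd':
        return 'dash'
    if mime_type in ('application/vnd.apple.mpegURL', 'application/x-mpegurl') or ext == 'm3u8':
        return 'hls'
    if ext == 'f4m':
        return 'hds'
    # Anything else is treated as a directly playable URL.
    return 'direct'


def unique_urls(candidates):
    """Drop empty and repeated URLs while preserving first-seen order."""
    seen = set()
    result = []
    for url in candidates:
        if not url or url in seen:
            continue
        seen.add(url)
        result.append(url)
    return result
```

Deduplication matters here because the extractor scans the same script twice, once strictly and once with a relaxed regex.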

@ -47,7 +47,8 @@ class TwentyFourVideoIE(InfoExtractor):
title = self._og_search_title(webpage)
description = self._html_search_regex(
r'<span itemprop="description">([^<]+)</span>', webpage, 'description', fatal=False)
r'<(p|span)[^>]+itemprop="description"[^>]*>(?P<description>[^<]+)</\1>',
webpage, 'description', fatal=False, group='description')
thumbnail = self._og_search_thumbnail(webpage)
duration = int_or_none(self._og_search_property(
'duration', webpage, 'duration', fatal=False))
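
The new description regex accepts either a `<p>` or a `<span>` element and uses the backreference `\1` so that the closing tag must match the opening one. A minimal sketch with the standard `re` module; `find_description` is a hypothetical helper, and the `.strip()` call is an added assumption rather than what `_html_search_regex` does internally.

```python
import re

# Pattern copied from the new side of the diff.
DESCRIPTION_RE = r'<(p|span)[^>]+itemprop="description"[^>]*>(?P<description>[^<]+)</\1>'


def find_description(webpage):
    """Return the description text, or None when no matching element exists."""
    mobj = re.search(DESCRIPTION_RE, webpage)
    return mobj.group('description').strip() if mobj else None
```

The named group is needed because group 1 is now taken by the tag name.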

@ -171,6 +171,7 @@ class TwitchVideoIE(TwitchItemBaseIE):
'title': 'Worlds Semifinals - Star Horn Royal Club vs. OMG',
},
'playlist_mincount': 12,
'skip': 'HTTP Error 404: Not Found',
}
@ -187,6 +188,7 @@ class TwitchChapterIE(TwitchItemBaseIE):
'title': 'ACRL Off Season - Sports Cars @ Nordschleife',
},
'playlist_mincount': 3,
'skip': 'HTTP Error 404: Not Found',
}, {
'url': 'http://www.twitch.tv/tsm_theoddone/c/2349361',
'only_matching': True,
@ -355,31 +357,6 @@ class TwitchPastBroadcastsIE(TwitchPlaylistBaseIE):
}
class TwitchBookmarksIE(TwitchPlaylistBaseIE):
IE_NAME = 'twitch:bookmarks'
_VALID_URL = r'%s/(?P<id>[^/]+)/profile/bookmarks/?(?:\#.*)?$' % TwitchBaseIE._VALID_URL_BASE
_PLAYLIST_URL = '%s/api/bookmark/?user=%%s&offset=%%d&limit=%%d' % TwitchBaseIE._API_BASE
_PLAYLIST_TYPE = 'bookmarks'
_TEST = {
'url': 'http://www.twitch.tv/ognos/profile/bookmarks',
'info_dict': {
'id': 'ognos',
'title': 'Ognos',
},
'playlist_mincount': 3,
}
def _extract_playlist_page(self, response):
entries = []
for bookmark in response.get('bookmarks', []):
video = bookmark.get('video')
if not video:
continue
entries.append(video['url'])
return entries
class TwitchStreamIE(TwitchBaseIE):
IE_NAME = 'twitch:stream'
_VALID_URL = r'%s/(?P<id>[^/#?]+)/?(?:\#.*)?$' % TwitchBaseIE._VALID_URL_BASE

@ -6,10 +6,12 @@ from .common import InfoExtractor
from ..utils import (
int_or_none,
unified_strdate,
unescapeHTML,
)
class UstudioIE(InfoExtractor):
IE_NAME = 'ustudio'
_VALID_URL = r'https?://(?:(?:www|v1)\.)?ustudio\.com/video/(?P<id>[^/]+)/(?P<display_id>[^/?#&]+)'
_TEST = {
'url': 'http://ustudio.com/video/Uxu2my9bgSph/san_francisco_golden_gate_bridge',
@ -27,9 +29,7 @@ class UstudioIE(InfoExtractor):
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
display_id = mobj.group('display_id')
video_id, display_id = re.match(self._VALID_URL, url).groups()
config = self._download_xml(
'http://v1.ustudio.com/embed/%s/ustudio/config.xml' % video_id,
@ -37,7 +37,7 @@ class UstudioIE(InfoExtractor):
def extract(kind):
return [{
'url': item.attrib['url'],
'url': unescapeHTML(item.attrib['url']),
'width': int_or_none(item.get('width')),
'height': int_or_none(item.get('height')),
} for item in config.findall('./qualities/quality/%s' % kind) if item.get('url')]
@ -65,3 +65,61 @@ class UstudioIE(InfoExtractor):
'uploader': uploader,
'formats': formats,
}
class UstudioEmbedIE(InfoExtractor):
IE_NAME = 'ustudio:embed'
_VALID_URL = r'https?://(?:(?:app|embed)\.)?ustudio\.com/embed/(?P<uid>[^/]+)/(?P<id>[^/]+)'
_TEST = {
'url': 'http://app.ustudio.com/embed/DeN7VdYRDKhP/Uw7G1kMCe65T',
'md5': '47c0be52a09b23a7f40de9469cec58f4',
'info_dict': {
'id': 'Uw7G1kMCe65T',
'ext': 'mp4',
'title': '5 Things IT Should Know About Video',
'description': 'md5:93d32650884b500115e158c5677d25ad',
'uploader_id': 'DeN7VdYRDKhP',
}
}
def _real_extract(self, url):
uploader_id, video_id = re.match(self._VALID_URL, url).groups()
video_data = self._download_json(
'http://app.ustudio.com/embed/%s/%s/config.json' % (uploader_id, video_id),
video_id)['videos'][0]
title = video_data['name']
formats = []
for ext, qualities in video_data.get('transcodes', {}).items():
for quality in qualities:
quality_url = quality.get('url')
if not quality_url:
continue
height = int_or_none(quality.get('height'))
formats.append({
'format_id': '%s-%dp' % (ext, height) if height else ext,
'url': quality_url,
'width': int_or_none(quality.get('width')),
'height': height,
})
self._sort_formats(formats)
thumbnails = []
for image in video_data.get('images', []):
image_url = image.get('url')
if not image_url:
continue
thumbnails.append({
'url': image_url,
})
return {
'id': video_id,
'title': title,
'description': video_data.get('description'),
'duration': int_or_none(video_data.get('duration')),
'uploader_id': uploader_id,
'tags': video_data.get('keywords'),
'thumbnails': thumbnails,
'formats': formats,
}
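
`UstudioEmbedIE` walks a `transcodes` mapping of extension to quality list and names each format `<ext>-<height>p` when a height is known. The following sketch reproduces that loop on plain dictionaries. `to_int` is a simplified stand-in for `int_or_none`, and the function names and sample data are invented.

```python
def to_int(value):
    """Convert value to int, returning None when that is not possible."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def build_formats(transcodes):
    """Flatten {ext: [quality, ...]} into a list of format dicts."""
    formats = []
    for ext, quality_list in transcodes.items():
        for quality in quality_list:
            quality_url = quality.get('url')
            if not quality_url:
                # Entries without a URL cannot be downloaded; skip them.
                continue
            height = to_int(quality.get('height'))
            formats.append({
                'format_id': '%s-%dp' % (ext, height) if height else ext,
                'url': quality_url,
                'width': to_int(quality.get('width')),
                'height': height,
            })
    return formats
```

Falling back to the bare extension keeps `format_id` usable even when the API omits the height.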

@ -213,19 +213,17 @@ class VevoIE(VevoBaseIE):
formats = []
if not video_info:
if response and response.get('statusCode') != 909:
try:
self._initialize_api(video_id)
except ExtractorError:
ytid = response.get('errorInfo', {}).get('ytid')
if ytid:
self.report_warning(
'Video is geoblocked, trying with the YouTube video %s' % ytid)
return self.url_result(ytid, 'Youtube', ytid)
if 'statusMessage' in response:
raise ExtractorError('%s said: %s' % (
self.IE_NAME, response['statusMessage']), expected=True)
raise ExtractorError('Unable to extract videos')
raise
self._initialize_api(video_id)
video_info = self._call_api(
'video/%s' % video_id, video_id, 'Downloading api video info',
'Failed to download video info')

@ -3,16 +3,17 @@ from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
ExtractorError,
sanitized_Request,
int_or_none,
float_or_none,
)
class WistiaIE(InfoExtractor):
_VALID_URL = r'https?://(?:fast\.)?wistia\.net/embed/iframe/(?P<id>[a-z0-9]+)'
_API_URL = 'http://fast.wistia.com/embed/medias/{0:}.json'
_VALID_URL = r'(?:wistia:|https?://(?:fast\.)?wistia\.net/embed/iframe/)(?P<id>[a-z0-9]+)'
_API_URL = 'http://fast.wistia.com/embed/medias/%s.json'
_IFRAME_URL = 'http://fast.wistia.net/embed/iframe/%s'
_TEST = {
_TESTS = [{
'url': 'http://fast.wistia.net/embed/iframe/sh7fpupwlt',
'md5': 'cafeb56ec0c53c18c97405eecb3133df',
'info_dict': {
@ -24,36 +25,54 @@ class WistiaIE(InfoExtractor):
'timestamp': 1386185018,
'duration': 117,
},
}
}, {
'url': 'wistia:sh7fpupwlt',
'only_matching': True,
}, {
# with hls video
'url': 'wistia:807fafadvk',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
request = sanitized_Request(self._API_URL.format(video_id))
request.add_header('Referer', url) # Some videos require this.
data_json = self._download_json(request, video_id)
data_json = self._download_json(
self._API_URL % video_id, video_id,
# Some videos require this.
headers={
'Referer': url if url.startswith('http') else self._IFRAME_URL % video_id,
})
if data_json.get('error'):
raise ExtractorError('Error while getting the playlist',
expected=True)
raise ExtractorError(
'Error while getting the playlist', expected=True)
data = data_json['media']
title = data['name']
formats = []
thumbnails = []
for a in data['assets']:
aurl = a.get('url')
if not aurl:
continue
astatus = a.get('status')
atype = a.get('type')
if (astatus is not None and astatus != 2) or atype == 'preview':
if (astatus is not None and astatus != 2) or atype in ('preview', 'storyboard'):
continue
elif atype in ('still', 'still_image'):
thumbnails.append({
'url': a['url'],
'resolution': '%dx%d' % (a['width'], a['height']),
'url': aurl,
'width': int_or_none(a.get('width')),
'height': int_or_none(a.get('height')),
})
else:
aext = a.get('ext')
is_m3u8 = a.get('container') == 'm3u8' or aext == 'm3u8'
formats.append({
'format_id': atype,
'url': a['url'],
'url': aurl,
'tbr': int_or_none(a.get('bitrate')),
'vbr': int_or_none(a.get('opt_vbitrate')),
'width': int_or_none(a.get('width')),
@ -61,7 +80,8 @@ class WistiaIE(InfoExtractor):
'filesize': int_or_none(a.get('size')),
'vcodec': a.get('codec'),
'container': a.get('container'),
'ext': a.get('ext'),
'ext': 'mp4' if is_m3u8 else aext,
'protocol': 'm3u8' if is_m3u8 else None,
'preference': 1 if atype == 'original' else None,
})
@ -73,6 +93,6 @@ class WistiaIE(InfoExtractor):
'description': data.get('seoDescription'),
'formats': formats,
'thumbnails': thumbnails,
'duration': int_or_none(data.get('duration')),
'duration': float_or_none(data.get('duration')),
'timestamp': int_or_none(data.get('createdAt')),
}
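
The Wistia hunks above report HLS assets with an `mp4` extension and an explicit `m3u8` protocol. A minimal standalone sketch of that mapping follows; the helper name `hls_format_fields` is hypothetical and is not part of the extractor, and only the `ext` and `container` keys seen in the diff are assumed to exist on an asset.

```python
def hls_format_fields(asset):
    """Return the 'ext' and 'protocol' fields for one asset dict.

    Hypothetical helper: it mirrors the is_m3u8 check from the diff,
    nothing more.
    """
    aext = asset.get('ext')
    # An asset counts as HLS if either the container or the extension says so.
    is_m3u8 = asset.get('container') == 'm3u8' or aext == 'm3u8'
    return {
        'ext': 'mp4' if is_m3u8 else aext,
        'protocol': 'm3u8' if is_m3u8 else None,
    }
```

Keeping the check in one place means a playlist URL is never labelled with the `m3u8` extension while the downloader is still told which protocol to use.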

@ -8,7 +8,6 @@ from ..utils import (
clean_html,
ExtractorError,
determine_ext,
sanitized_Request,
)
@ -25,8 +24,6 @@ class XVideosIE(InfoExtractor):
}
}
_ANDROID_USER_AGENT = 'Mozilla/5.0 (Linux; Android 4.0.4; Galaxy Nexus Build/IMM76B) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.133 Mobile Safari/535.19'
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
@ -35,31 +32,34 @@ class XVideosIE(InfoExtractor):
if mobj:
raise ExtractorError('%s said: %s' % (self.IE_NAME, clean_html(mobj.group(1))), expected=True)
video_url = compat_urllib_parse_unquote(
self._search_regex(r'flv_url=(.+?)&', webpage, 'video URL'))
video_title = self._html_search_regex(
r'<title>(.*?)\s+-\s+XVID', webpage, 'title')
video_thumbnail = self._search_regex(
r'url_bigthumb=(.+?)&amp', webpage, 'thumbnail', fatal=False)
formats = [{
'url': video_url,
}]
formats = []
android_req = sanitized_Request(url)
android_req.add_header('User-Agent', self._ANDROID_USER_AGENT)
android_webpage = self._download_webpage(android_req, video_id, fatal=False)
video_url = compat_urllib_parse_unquote(self._search_regex(
r'flv_url=(.+?)&', webpage, 'video URL', default=''))
if video_url:
formats.append({'url': video_url})
if android_webpage is not None:
player_params_str = self._search_regex(
'mobileReplacePlayerDivTwoQual\(([^)]+)\)',
android_webpage, 'player parameters', default='')
player_params = list(map(lambda s: s.strip(' \''), player_params_str.split(',')))
if player_params:
formats.extend([{
'url': param,
'preference': -10,
} for param in player_params if determine_ext(param) == 'mp4'])
player_args = self._search_regex(
r'(?s)new\s+HTML5Player\((.+?)\)', webpage, 'html5 player', default=None)
if player_args:
for arg in player_args.split(','):
format_url = self._search_regex(
r'(["\'])(?P<url>https?://.+?)\1', arg, 'url',
default=None, group='url')
if not format_url:
continue
ext = determine_ext(format_url)
if ext == 'mp4':
formats.append({'url': format_url})
elif ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4',
entry_protocol='m3u8_native', m3u8_id='hls', fatal=False))
self._sort_formats(formats)
@ -67,7 +67,6 @@ class XVideosIE(InfoExtractor):
'id': video_id,
'formats': formats,
'title': video_title,
'ext': 'flv',
'thumbnail': video_thumbnail,
'age_limit': 18,
}

@ -188,7 +188,10 @@ def parseOpts(overrideArguments=None):
network.add_option(
'--proxy', dest='proxy',
default=None, metavar='URL',
help='Use the specified HTTP/HTTPS proxy. Pass in an empty string (--proxy "") for direct connection')
help='Use the specified HTTP/HTTPS/SOCKS proxy. To enable experimental '
'SOCKS proxy, specify a proper scheme. For example '
'socks5://127.0.0.1:1080/. Pass in an empty string (--proxy "") '
'for direct connection')
network.add_option(
'--socket-timeout',
dest='socket_timeout', type=float, default=None, metavar='SECONDS',
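
The `--proxy` help text above says the SOCKS variant is chosen by the URL scheme. A minimal sketch of that dispatch, using only the Python 3 standard library, could look like the following. The function name `classify_proxy` and the string labels are hypothetical; the scheme-to-type mapping and the default port 1080 follow `make_socks_conn_class` in this commit.

```python
from urllib.parse import urlparse, unquote_plus

# Scheme-to-type mapping as used in this commit ('socks' is treated as SOCKS4).
SOCKS_SCHEMES = {
    'socks': 'SOCKS4',
    'socks4': 'SOCKS4',
    'socks4a': 'SOCKS4A',
    'socks5': 'SOCKS5',
}


def classify_proxy(proxy_url):
    """Return a dict describing a SOCKS proxy URL, or None for other schemes."""
    parts = urlparse(proxy_url)
    kind = SOCKS_SCHEMES.get(parts.scheme.lower())
    if kind is None:
        return None  # e.g. plain HTTP/HTTPS proxies are handled elsewhere

    def unquote_if_non_empty(s):
        # Credentials may be percent-encoded in the URL.
        return unquote_plus(s) if s else s

    return {
        'type': kind,
        'host': parts.hostname,
        'port': parts.port or 1080,
        'username': unquote_if_non_empty(parts.username),
        'password': unquote_if_non_empty(parts.password),
    }
```

For example, `classify_proxy('socks5://127.0.0.1:1080/')` describes a SOCKS5 proxy on localhost, while an `http://` URL yields `None`.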

@ -3,7 +3,7 @@ from __future__ import unicode_literals
import subprocess
from .common import PostProcessor
from ..compat import shlex_quote
from ..compat import compat_shlex_quote
from ..utils import PostProcessingError
@ -17,7 +17,7 @@ class ExecAfterDownloadPP(PostProcessor):
if '{}' not in cmd:
cmd += ' {}'
cmd = cmd.replace('{}', shlex_quote(information['filepath']))
cmd = cmd.replace('{}', compat_shlex_quote(information['filepath']))
self._downloader.to_screen('[exec] Executing command: %s' % cmd)
retCode = subprocess.call(cmd, shell=True)

youtube_dl/socks.py Normal file

@ -0,0 +1,271 @@
# Public Domain SOCKS proxy protocol implementation
# Adapted from https://gist.github.com/bluec0re/cafd3764412967417fd3
from __future__ import unicode_literals
# References:
# SOCKS4 protocol http://www.openssh.com/txt/socks4.protocol
# SOCKS4A protocol http://www.openssh.com/txt/socks4a.protocol
# SOCKS5 protocol https://tools.ietf.org/html/rfc1928
# SOCKS5 username/password authentication https://tools.ietf.org/html/rfc1929
import collections
import socket
from .compat import (
compat_ord,
compat_struct_pack,
compat_struct_unpack,
)
__author__ = 'Timo Schmid <coding@timoschmid.de>'
SOCKS4_VERSION = 4
SOCKS4_REPLY_VERSION = 0x00
# Excerpt from SOCKS4A protocol:
# if the client cannot resolve the destination host's domain name to find its
# IP address, it should set the first three bytes of DSTIP to NULL and the last
# byte to a non-zero value.
SOCKS4_DEFAULT_DSTIP = compat_struct_pack('!BBBB', 0, 0, 0, 0xFF)
SOCKS5_VERSION = 5
SOCKS5_USER_AUTH_VERSION = 0x01
SOCKS5_USER_AUTH_SUCCESS = 0x00
class Socks4Command(object):
CMD_CONNECT = 0x01
CMD_BIND = 0x02
class Socks5Command(Socks4Command):
CMD_UDP_ASSOCIATE = 0x03
class Socks5Auth(object):
AUTH_NONE = 0x00
AUTH_GSSAPI = 0x01
AUTH_USER_PASS = 0x02
AUTH_NO_ACCEPTABLE = 0xFF # For server response
class Socks5AddressType(object):
ATYP_IPV4 = 0x01
ATYP_DOMAINNAME = 0x03
ATYP_IPV6 = 0x04
class ProxyError(IOError):
ERR_SUCCESS = 0x00
def __init__(self, code=None, msg=None):
if code is not None and msg is None:
msg = self.CODES.get(code) or 'unknown error'
super(ProxyError, self).__init__(code, msg)
class InvalidVersionError(ProxyError):
def __init__(self, expected_version, got_version):
msg = ('Invalid response version from server. Expected {0:02x} got '
'{1:02x}'.format(expected_version, got_version))
super(InvalidVersionError, self).__init__(0, msg)
class Socks4Error(ProxyError):
ERR_SUCCESS = 90
CODES = {
91: 'request rejected or failed',
92: 'request rejected because SOCKS server cannot connect to identd on the client',
93: 'request rejected because the client program and identd report different user-ids'
}
class Socks5Error(ProxyError):
ERR_GENERAL_FAILURE = 0x01
CODES = {
0x01: 'general SOCKS server failure',
0x02: 'connection not allowed by ruleset',
0x03: 'Network unreachable',
0x04: 'Host unreachable',
0x05: 'Connection refused',
0x06: 'TTL expired',
0x07: 'Command not supported',
0x08: 'Address type not supported',
0xFE: 'unknown username or invalid password',
0xFF: 'all offered authentication methods were rejected'
}
class ProxyType(object):
SOCKS4 = 0
SOCKS4A = 1
SOCKS5 = 2
Proxy = collections.namedtuple('Proxy', (
'type', 'host', 'port', 'username', 'password', 'remote_dns'))
class sockssocket(socket.socket):
def __init__(self, *args, **kwargs):
self._proxy = None
super(sockssocket, self).__init__(*args, **kwargs)
def setproxy(self, proxytype, addr, port, rdns=True, username=None, password=None):
assert proxytype in (ProxyType.SOCKS4, ProxyType.SOCKS4A, ProxyType.SOCKS5)
self._proxy = Proxy(proxytype, addr, port, username, password, rdns)
def recvall(self, cnt):
data = b''
while len(data) < cnt:
cur = self.recv(cnt - len(data))
if not cur:
raise IOError('{0} bytes missing'.format(cnt - len(data)))
data += cur
return data
def _recv_bytes(self, cnt):
data = self.recvall(cnt)
return compat_struct_unpack('!{0}B'.format(cnt), data)
@staticmethod
def _len_and_data(data):
return compat_struct_pack('!B', len(data)) + data
def _check_response_version(self, expected_version, got_version):
if got_version != expected_version:
self.close()
raise InvalidVersionError(expected_version, got_version)
def _resolve_address(self, destaddr, default, use_remote_dns):
try:
return socket.inet_aton(destaddr)
except socket.error:
if use_remote_dns and self._proxy.remote_dns:
return default
else:
return socket.inet_aton(socket.gethostbyname(destaddr))
def _setup_socks4(self, address, is_4a=False):
destaddr, port = address
ipaddr = self._resolve_address(destaddr, SOCKS4_DEFAULT_DSTIP, use_remote_dns=is_4a)
packet = compat_struct_pack('!BBH', SOCKS4_VERSION, Socks4Command.CMD_CONNECT, port) + ipaddr
username = (self._proxy.username or '').encode('utf-8')
packet += username + b'\x00'
if is_4a and self._proxy.remote_dns:
packet += destaddr.encode('utf-8') + b'\x00'
self.sendall(packet)
version, resp_code, dstport, dsthost = compat_struct_unpack('!BBHI', self.recvall(8))
self._check_response_version(SOCKS4_REPLY_VERSION, version)
if resp_code != Socks4Error.ERR_SUCCESS:
self.close()
raise Socks4Error(resp_code)
return (dsthost, dstport)
def _setup_socks4a(self, address):
return self._setup_socks4(address, is_4a=True)
def _socks5_auth(self):
packet = compat_struct_pack('!B', SOCKS5_VERSION)
auth_methods = [Socks5Auth.AUTH_NONE]
if self._proxy.username and self._proxy.password:
auth_methods.append(Socks5Auth.AUTH_USER_PASS)
packet += compat_struct_pack('!B', len(auth_methods))
packet += compat_struct_pack('!{0}B'.format(len(auth_methods)), *auth_methods)
self.sendall(packet)
version, method = self._recv_bytes(2)
self._check_response_version(SOCKS5_VERSION, version)
if method == Socks5Auth.AUTH_NO_ACCEPTABLE:
self.close()
raise Socks5Error(method)
if method == Socks5Auth.AUTH_USER_PASS:
username = self._proxy.username.encode('utf-8')
password = self._proxy.password.encode('utf-8')
packet = compat_struct_pack('!B', SOCKS5_USER_AUTH_VERSION)
packet += self._len_and_data(username) + self._len_and_data(password)
self.sendall(packet)
version, status = self._recv_bytes(2)
self._check_response_version(SOCKS5_USER_AUTH_VERSION, version)
if status != SOCKS5_USER_AUTH_SUCCESS:
self.close()
raise Socks5Error(Socks5Error.ERR_GENERAL_FAILURE)
def _setup_socks5(self, address):
destaddr, port = address
ipaddr = self._resolve_address(destaddr, None, use_remote_dns=True)
self._socks5_auth()
reserved = 0
packet = compat_struct_pack('!BBB', SOCKS5_VERSION, Socks5Command.CMD_CONNECT, reserved)
if ipaddr is None:
destaddr = destaddr.encode('utf-8')
packet += compat_struct_pack('!B', Socks5AddressType.ATYP_DOMAINNAME)
packet += self._len_and_data(destaddr)
else:
packet += compat_struct_pack('!B', Socks5AddressType.ATYP_IPV4) + ipaddr
packet += compat_struct_pack('!H', port)
self.sendall(packet)
version, status, reserved, atype = self._recv_bytes(4)
self._check_response_version(SOCKS5_VERSION, version)
if status != Socks5Error.ERR_SUCCESS:
self.close()
raise Socks5Error(status)
if atype == Socks5AddressType.ATYP_IPV4:
destaddr = self.recvall(4)
elif atype == Socks5AddressType.ATYP_DOMAINNAME:
alen = compat_ord(self.recv(1))
destaddr = self.recvall(alen)
elif atype == Socks5AddressType.ATYP_IPV6:
destaddr = self.recvall(16)
destport = compat_struct_unpack('!H', self.recvall(2))[0]
return (destaddr, destport)
def _make_proxy(self, connect_func, address):
if not self._proxy:
return connect_func(self, address)
result = connect_func(self, (self._proxy.host, self._proxy.port))
if result != 0 and result is not None:
return result
setup_funcs = {
ProxyType.SOCKS4: self._setup_socks4,
ProxyType.SOCKS4A: self._setup_socks4a,
ProxyType.SOCKS5: self._setup_socks5,
}
setup_funcs[self._proxy.type](address)
return result
def connect(self, address):
self._make_proxy(socket.socket.connect, address)
def connect_ex(self, address):
return self._make_proxy(socket.socket.connect_ex, address)
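
To make the wire format built by `_setup_socks5` easier to follow, here is a minimal standalone sketch of a SOCKS5 CONNECT request for a domain name, laid out per RFC 1928. The function name `build_socks5_connect` is hypothetical; it uses the standard `struct` module directly rather than the `compat_struct_pack` wrapper, and it covers only the remote-DNS (ATYP_DOMAINNAME) case.

```python
import struct

SOCKS5_VERSION = 5
CMD_CONNECT = 0x01
ATYP_DOMAINNAME = 0x03


def build_socks5_connect(host, port):
    """Build a CONNECT request that asks the proxy to resolve `host` itself."""
    name = host.encode('utf-8')
    if len(name) > 255:
        # The length prefix is a single byte.
        raise ValueError('host name too long for SOCKS5')
    return (
        struct.pack('!BBB', SOCKS5_VERSION, CMD_CONNECT, 0)  # VER, CMD, RSV
        + struct.pack('!B', ATYP_DOMAINNAME)                 # address type
        + struct.pack('!B', len(name)) + name                # length-prefixed name
        + struct.pack('!H', port)                            # port, network order
    )
```

The request is sent only after the method negotiation (and optional username/password exchange) has succeeded, which is the order `_setup_socks5` follows.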

@ -4,10 +4,12 @@ import collections
import io
import zlib
from .compat import compat_str
from .compat import (
compat_str,
compat_struct_unpack,
)
from .utils import (
ExtractorError,
struct_unpack,
)
@ -23,17 +25,17 @@ def _extract_tags(file_contents):
file_contents[:1])
# Determine number of bits in framesize rectangle
framesize_nbits = struct_unpack('!B', content[:1])[0] >> 3
framesize_nbits = compat_struct_unpack('!B', content[:1])[0] >> 3
framesize_len = (5 + 4 * framesize_nbits + 7) // 8
pos = framesize_len + 2 + 2
while pos < len(content):
header16 = struct_unpack('<H', content[pos:pos + 2])[0]
header16 = compat_struct_unpack('<H', content[pos:pos + 2])[0]
pos += 2
tag_code = header16 >> 6
tag_len = header16 & 0x3f
if tag_len == 0x3f:
tag_len = struct_unpack('<I', content[pos:pos + 4])[0]
tag_len = compat_struct_unpack('<I', content[pos:pos + 4])[0]
pos += 4
assert pos + tag_len <= len(content), \
('Tag %d ends at %d+%d - that\'s longer than the file (%d)'
@ -101,7 +103,7 @@ def _read_int(reader):
for _ in range(5):
buf = reader.read(1)
assert len(buf) == 1
b = struct_unpack('<B', buf)[0]
b = compat_struct_unpack('<B', buf)[0]
res = res | ((b & 0x7f) << shift)
if b & 0x80 == 0:
break
@ -127,7 +129,7 @@ def _s24(reader):
bs = reader.read(3)
assert len(bs) == 3
last_byte = b'\xff' if (ord(bs[2:3]) >= 0x80) else b'\x00'
return struct_unpack('<i', bs + last_byte)[0]
return compat_struct_unpack('<i', bs + last_byte)[0]
def _read_string(reader):
@ -146,7 +148,7 @@ def _read_bytes(count, reader):
def _read_byte(reader):
resb = _read_bytes(1, reader=reader)
res = struct_unpack('<B', resb)[0]
res = compat_struct_unpack('<B', resb)[0]
return res

@ -83,11 +83,8 @@ def update_self(to_screen, verbose, opener):
print_notes(to_screen, versions_info['versions'])
filename = sys.argv[0]
# Py2EXE: Filename could be different
if hasattr(sys, 'frozen') and not os.path.isfile(filename):
if os.path.isfile(filename + '.exe'):
filename += '.exe'
# sys.executable is set to the full pathname of the exe-file for py2exe
filename = sys.executable if hasattr(sys, 'frozen') else sys.argv[0]
if not os.access(filename, os.W_OK):
to_screen('ERROR: no write permissions on %s' % filename)
@ -95,7 +92,7 @@ def update_self(to_screen, verbose, opener):
# Py2EXE
if hasattr(sys, 'frozen'):
exe = os.path.abspath(filename)
exe = filename
directory = os.path.dirname(exe)
if not os.access(directory, os.W_OK):
to_screen('ERROR: no write permissions on %s' % directory)

@ -26,7 +26,6 @@ import platform
import re
import socket
import ssl
import struct
import subprocess
import sys
import tempfile
@ -43,18 +42,34 @@ from .compat import (
compat_http_client,
compat_kwargs,
compat_parse_qs,
compat_shlex_quote,
compat_socket_create_connection,
compat_str,
compat_struct_pack,
compat_urllib_error,
compat_urllib_parse,
compat_urllib_parse_urlencode,
compat_urllib_parse_urlparse,
compat_urllib_parse_unquote_plus,
compat_urllib_request,
compat_urlparse,
compat_xpath,
shlex_quote,
)
from .socks import (
ProxyType,
sockssocket,
)
def register_socks_protocols():
# "Register" SOCKS protocols
# In Python < 2.6.5, urlsplit() suffers from bug https://bugs.python.org/issue7904
# URLs with protocols not in urlparse.uses_netloc are not handled correctly
for scheme in ('socks', 'socks4', 'socks4a', 'socks5'):
if scheme not in compat_urlparse.uses_netloc:
compat_urlparse.uses_netloc.append(scheme)
# This is not clearly defined otherwise
compiled_regex_type = type(re.compile(''))
@ -90,9 +105,9 @@ KNOWN_EXTENSIONS = (
'f4f', 'f4m', 'm3u8', 'smil')
# needed for sanitizing filenames in restricted mode
ACCENT_CHARS = dict(zip('ÂÃÄÀÁÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ',
itertools.chain('AAAAAA', ['AE'], 'CEEEEIIIIDNOOOOOOUUUUYP', ['ss'],
'aaaaaa', ['ae'], 'ceeeeiiiionoooooouuuuypy')))
ACCENT_CHARS = dict(zip('ÂÃÄÀÁÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØŒÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøœùúûüýþÿ',
itertools.chain('AAAAAA', ['AE'], 'CEEEEIIIIDNOOOOOO', ['OE'], 'UUUUYP', ['ss'],
'aaaaaa', ['ae'], 'ceeeeiiiionoooooo', ['oe'], 'uuuuypy')))
def preferredencoding():
@ -752,8 +767,15 @@ class YoutubeDLHandler(compat_urllib_request.HTTPHandler):
self._params = params
def http_open(self, req):
conn_class = compat_http_client.HTTPConnection
socks_proxy = req.headers.get('Ytdl-socks-proxy')
if socks_proxy:
conn_class = make_socks_conn_class(conn_class, socks_proxy)
del req.headers['Ytdl-socks-proxy']
return self.do_open(functools.partial(
_create_http_connection, self, compat_http_client.HTTPConnection, False),
_create_http_connection, self, conn_class, False),
req)
@staticmethod
@ -849,6 +871,49 @@ class YoutubeDLHandler(compat_urllib_request.HTTPHandler):
https_response = http_response
def make_socks_conn_class(base_class, socks_proxy):
assert issubclass(base_class, (
compat_http_client.HTTPConnection, compat_http_client.HTTPSConnection))
url_components = compat_urlparse.urlparse(socks_proxy)
if url_components.scheme.lower() == 'socks5':
socks_type = ProxyType.SOCKS5
elif url_components.scheme.lower() in ('socks', 'socks4'):
socks_type = ProxyType.SOCKS4
elif url_components.scheme.lower() == 'socks4a':
socks_type = ProxyType.SOCKS4A
def unquote_if_non_empty(s):
if not s:
return s
return compat_urllib_parse_unquote_plus(s)
proxy_args = (
socks_type,
url_components.hostname, url_components.port or 1080,
True, # Remote DNS
unquote_if_non_empty(url_components.username),
unquote_if_non_empty(url_components.password),
)
class SocksConnection(base_class):
def connect(self):
self.sock = sockssocket()
self.sock.setproxy(*proxy_args)
if type(self.timeout) in (int, float):
self.sock.settimeout(self.timeout)
self.sock.connect((self.host, self.port))
if isinstance(self, compat_http_client.HTTPSConnection):
if hasattr(self, '_context'): # Python > 2.6
self.sock = self._context.wrap_socket(
self.sock, server_hostname=self.host)
else:
self.sock = ssl.wrap_socket(self.sock)
return SocksConnection
class YoutubeDLHTTPSHandler(compat_urllib_request.HTTPSHandler):
def __init__(self, params, https_conn_class=None, *args, **kwargs):
compat_urllib_request.HTTPSHandler.__init__(self, *args, **kwargs)
@ -857,12 +922,20 @@ class YoutubeDLHTTPSHandler(compat_urllib_request.HTTPSHandler):
def https_open(self, req):
kwargs = {}
conn_class = self._https_conn_class
if hasattr(self, '_context'): # python > 2.6
kwargs['context'] = self._context
if hasattr(self, '_check_hostname'): # python 3.x
kwargs['check_hostname'] = self._check_hostname
socks_proxy = req.headers.get('Ytdl-socks-proxy')
if socks_proxy:
conn_class = make_socks_conn_class(conn_class, socks_proxy)
del req.headers['Ytdl-socks-proxy']
return self.do_open(functools.partial(
_create_http_connection, self, self._https_conn_class, True),
_create_http_connection, self, conn_class, True),
req, **kwargs)
@ -982,7 +1055,10 @@ def unified_strdate(date_str, day_first=True):
if upload_date is None:
timetuple = email.utils.parsedate_tz(date_str)
if timetuple:
try:
upload_date = datetime.datetime(*timetuple[:6]).strftime('%Y%m%d')
except ValueError:
pass
if upload_date is not None:
return compat_str(upload_date)
@ -1193,7 +1269,7 @@ def bytes_to_intlist(bs):
def intlist_to_bytes(xs):
if not xs:
return b''
return struct_pack('%dB' % len(xs), *xs)
return compat_struct_pack('%dB' % len(xs), *xs)
# Cross-platform file locking
@ -1476,15 +1552,11 @@ def setproctitle(title):
def remove_start(s, start):
if s.startswith(start):
return s[len(start):]
return s
return s[len(start):] if s is not None and s.startswith(start) else s
def remove_end(s, end):
if s.endswith(end):
return s[:-len(end)]
return s
return s[:-len(end)] if s is not None and s.endswith(end) else s
def remove_quotes(s):
@ -1761,24 +1833,6 @@ def escape_url(url):
fragment=escape_rfc3986(url_parsed.fragment)
).geturl()
try:
struct.pack('!I', 0)
except TypeError:
# In Python 2.6 and 2.7.x < 2.7.7, struct requires a bytes argument
# See https://bugs.python.org/issue19099
def struct_pack(spec, *args):
if isinstance(spec, compat_str):
spec = spec.encode('ascii')
return struct.pack(spec, *args)
def struct_unpack(spec, *args):
if isinstance(spec, compat_str):
spec = spec.encode('ascii')
return struct.unpack(spec, *args)
else:
struct_pack = struct.pack
struct_unpack = struct.unpack
def read_batch_urls(batch_fd):
def fixup(url):
@ -1864,24 +1918,38 @@ def js_to_json(code):
v = m.group(0)
if v in ('true', 'false', 'null'):
return v
if v.startswith('"'):
v = re.sub(r"\\'", "'", v[1:-1])
elif v.startswith("'"):
v = v[1:-1]
v = re.sub(r"\\\\|\\'|\"", lambda m: {
'\\\\': '\\\\',
"\\'": "'",
elif v.startswith('/*') or v == ',':
return ""
if v[0] in ("'", '"'):
v = re.sub(r'(?s)\\.|"', lambda m: {
'"': '\\"',
}[m.group(0)], v)
"\\'": "'",
'\\\n': '',
'\\x': '\\u00',
}.get(m.group(0), m.group(0)), v[1:-1])
INTEGER_TABLE = (
(r'^0[xX][0-9a-fA-F]+', 16),
(r'^0+[0-7]+', 8),
)
for regex, base in INTEGER_TABLE:
im = re.match(regex, v)
if im:
i = int(im.group(0), base)
return '"%d":' % i if v.endswith(':') else '%d' % i
return '"%s"' % v
res = re.sub(r'''(?x)
"(?:[^"\\]*(?:\\\\|\\['"nu]))*[^"\\]*"|
'(?:[^'\\]*(?:\\\\|\\['"nu]))*[^'\\]*'|
[a-zA-Z_][.a-zA-Z_0-9]*
return re.sub(r'''(?sx)
"(?:[^"\\]*(?:\\\\|\\['"nurtbfx/\n]))*[^"\\]*"|
'(?:[^'\\]*(?:\\\\|\\['"nurtbfx/\n]))*[^'\\]*'|
/\*.*?\*/|,(?=\s*[\]}])|
[a-zA-Z_][.a-zA-Z_0-9]*|
(?:0[xX][0-9a-fA-F]+|0+[0-7]+)(?:\s*:)?|
[0-9]+(?=\s*:)
''', fix_kv, code)
res = re.sub(r',(\s*[\]}])', lambda m: m.group(1), res)
return res
def qualities(quality_ids):
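
The `js_to_json` hunk above adds conversion of JavaScript hexadecimal and octal integer literals to decimal, quoting the result when the token is an object key. A minimal standalone sketch of just that step follows; the function name `convert_js_integer` is hypothetical, while `INTEGER_TABLE` is copied from the diff.

```python
import re

INTEGER_TABLE = (
    (r'^0[xX][0-9a-fA-F]+', 16),  # hexadecimal literal, e.g. 0x1F
    (r'^0+[0-7]+', 8),            # legacy octal literal, e.g. 010
)


def convert_js_integer(token):
    """Rewrite a hex/octal JS integer token as decimal JSON text.

    A token ending in ':' is an object key, so it is emitted as a quoted
    string key. Tokens that match neither pattern are returned unchanged.
    """
    for regex, base in INTEGER_TABLE:
        im = re.match(regex, token)
        if im:
            i = int(im.group(0), base)
            return '"%d":' % i if token.endswith(':') else '%d' % i
    return token
```

JSON has no hexadecimal or octal literals and requires string keys, which is why both the base and the quoting have to change.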
@ -1929,7 +1997,7 @@ def ytdl_is_updateable():
def args_to_str(args):
# Get a short string representation for a subprocess command
return ' '.join(shlex_quote(a) for a in args)
return ' '.join(compat_shlex_quote(a) for a in args)
def error_to_compat_str(err):
@ -1967,11 +2035,7 @@ def mimetype2ext(mt):
def urlhandle_detect_ext(url_handle):
try:
url_handle.headers
getheader = lambda h: url_handle.headers[h]
except AttributeError: # Python < 3
getheader = url_handle.info().getheader
getheader = url_handle.headers.get
cd = getheader('Content-Disposition')
if cd:
@ -2701,6 +2765,10 @@ class PerRequestProxyHandler(compat_urllib_request.ProxyHandler):
if proxy == '__noproxy__':
return None # No Proxy
if compat_urlparse.urlparse(proxy).scheme.lower() in ('socks', 'socks4', 'socks4a', 'socks5'):
req.add_header('Ytdl-socks-proxy', proxy)
# youtube-dl's http/https handlers wrap the socket with SOCKS themselves
return None
return compat_urllib_request.ProxyHandler.proxy_open(
self, req, proxy, type)

@ -1,3 +1,3 @@
from __future__ import unicode_literals
__version__ = '2016.05.01'
__version__ = '2016.05.21.2'