23790 Commits

Author SHA1 Message Date
Sergey M․
d0ba55871e
[youtube] Improve _VALID_URLs (closes #12538) 2017-03-25 01:18:33 +07:00
Sergey M․
54b960f340
[generic] Do not follow redirects to the same URL 2017-03-24 00:45:24 +07:00
Sergey M․
a3ccd6bd11
release 2017.03.24 2017.03.24 2017-03-24 00:24:23 +07:00
Sergey M․
7963b6cba8
[ChangeLog] Actualize 2017-03-24 00:19:58 +07:00
Sergey M․
bea7af6947
[channel9] Remove expired comment and sort imports 2017-03-23 23:58:12 +07:00
Sergey M․
a5d783f525
[channel9] Extract more formats 2017-03-23 23:47:43 +07:00
Remita Amine
d0572557c2 [ninecninemedia] remove mp4 url extraction request 2017-03-23 13:53:07 +01:00
Remita Amine
52d5ecabd5 [bellmedia] add support for etalk.ca(closes #12447) 2017-03-23 13:52:45 +01:00
Remita Amine
b0f7f21cb9 [channel9] fix extraction(closes #11323) 2017-03-23 09:22:37 +01:00
Sergey M․
579c99a284
[cloudy] Fix extraction (closes #12525) 2017-03-22 23:48:06 +07:00
Remita Amine
ca5ed022e9 [hbo] add support for free episode urls and new formats extraction(closes #12519) 2017-03-22 17:28:53 +01:00
Sergey M․
391d076d7c
[condenast] Fix extraction and style (closes #12526) 2017-03-22 23:22:14 +07:00
Sergey M․
c183e14f89
[viu] Relax _VALID_URL (closes #12529) 2017-03-22 22:26:59 +07:00
Sergey M․
093dad9e25
release 2017.03.22 2017.03.22 2017-03-22 02:36:50 +07:00
Sergey M․
e8686e51d7
[ChangeLog] Actualize 2017-03-22 02:35:09 +07:00
Sergey M․
8e5a7c5e67
[pluralsight] Omit module title from video title (closes #12506) 2017-03-22 02:28:04 +07:00
Sergey M․
e1e35d1ac6
[pornhub] Improve extraction and style (closes #12515) 2017-03-22 01:59:27 +07:00
Throaway
21fbf0f955
[pornhub] Decode obfuscated video URL (closes #12470) 2017-03-22 01:51:45 +07:00
John Hawkinson
66361e1e93 [external] Make get_execname() a classmethod per @dstftw 2017-03-21 14:21:28 -04:00
John Hawkinson
095cda627e [external] Print executable name instead of classname
Separate out the concept of the executable used by an ExternalFD
downloader and the name of the class. I'm not entirely sure why we
care about the name of the class at all, but it's used outside of
classes to initialize _BY_NAME() and so forth, so it seems impractical
to just change get_basename() to return the executable name.

So instead, call get_execname() not get_basename().

Default get_execname() to calling get_basename(), but override it in
FFmpegFD, where it returns self.execname, which is set in
_call_downloader().

Perhaps there is a less complicated way to achieve this goal?
2017-03-21 14:13:47 -04:00
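A rough sketch of the arrangement described in 095cda627e above. This is a simplified stand-in, not the actual youtube-dl classes: only the names ExternalFD, FFmpegFD, get_basename(), get_execname() and _call_downloader() come from the commit message. The method bodies, the `executable` parameter, the fallback to get_basename() and the CurlFD subclass are assumptions made for illustration.

```python
class ExternalFD(object):
    """Simplified stand-in for an external downloader base class."""

    @classmethod
    def get_basename(cls):
        # Name derived from the class, e.g. "CurlFD" -> "curl"
        # (assumed naming convention for this sketch).
        return cls.__name__[:-2].lower()

    def get_execname(self):
        # Default: the executable is assumed to be named after the class.
        return self.get_basename()


class CurlFD(ExternalFD):
    """Hypothetical subclass that relies on the default get_execname()."""


class FFmpegFD(ExternalFD):
    execname = None

    def _call_downloader(self, executable):
        # Remember which binary was actually chosen (assumed signature).
        self.execname = executable

    def get_execname(self):
        # Report the real executable once known; the fallback to the
        # class-derived name is an addition of this sketch.
        return self.execname or self.get_basename()
```

With this shape, messages can print the binary that actually ran while lookups keyed on the class name keep using get_basename().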
John Hawkinson
97952bdb78 [generic] Add test for Senate ISVP iframe embed 2017-03-22 01:12:14 +08:00
Jan Hoek
503acf8c87 Add npo:recents extractor
Extractor for npo.nl programs. Retrieves only recent episodes of the program in question (hence the name...). Some programs have so many episodes available that it doesn't make any practical sense to retrieve all.
2017-03-21 17:29:25 +01:00
Mohammed Yaseen Mowzer
fe6f302959 Raise exception if jwplayer doesn't have "sources"
YoutubeDL uses a regexp in common.py::_find_jwplayer_data
to find the jwplayer options. However the options are found in a
javascript function. For example the regexp might match this

    jwplayer('some_string').setup({
        /** Other attributes */
        sources: {
             file: "<url of video>",
             label: "<title of video>",
             type: "mp4"
        }
    });

Since this is ordinary JavaScript, some websites instead write the
options as

    var src = {
        file: "<url of video>",
        label: "<title of video>",
        type: "mp4"
    }
    jwplayer('some_string').setup({
        /** Other attributes */
        sources: src
    });

In this case YoutubeDL won't be able to retrieve sources.file, since
the regexp only matches the ".setup(...)" and ignores the "var src
= ..." assignment.

This commit makes YoutubeDL raise an ExtractorError in the above
case. YoutubeDL will then try alternative methods to retrieve the URL
of the video.
2017-03-21 12:38:51 +02:00
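The check described in fe6f302959 above could look roughly like the following. It is a minimal sketch under stated assumptions: the options are taken to be already parsed into a Python dict, `require_sources` is a hypothetical helper name, and the local ExtractorError class is only a stand-in for youtube-dl's exception of the same name. The real change in common.py may differ.

```python
class ExtractorError(Exception):
    """Stand-in for youtube-dl's ExtractorError (assumption for this sketch)."""


def require_sources(jwplayer_data):
    """Return the jwplayer sources as a list, or raise if none are usable.

    `jwplayer_data` is assumed to be the already-parsed options object
    passed to .setup(). When a page writes `sources: src`, the parsed
    value is not an object or array, so it is rejected here.
    """
    sources = jwplayer_data.get('sources')
    if not isinstance(sources, (dict, list)):
        raise ExtractorError('jwplayer options have no usable "sources"')
    # Normalize a single source object to a one-element list.
    return sources if isinstance(sources, list) else [sources]
```

Raising early lets the caller catch the error and fall back to another extraction method instead of failing later on a missing sources.file.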
Throaway
8636f42428 Fix issue #12470 - Parse out encoded PH video URLs 2017-03-20 16:36:39 -07:00
TheAMM
f2d5dc6bde [picarto] Add new extractor, Picarto.TV 2017-03-21 00:49:31 +02:00
John Hawkinson
046bb59dc9 [generic] Test for Senate ISVP iframe 2017-03-20 11:55:31 -04:00
John Hawkinson
8a8cc339b6 [senateisvp] Allow https URL scheme for embeds 2017-03-20 23:35:13 +08:00
John Hawkinson
17aa397d16 [SenateISVP] iframes can be https 2017-03-20 11:28:53 -04:00
Vijay Singh
957f453429 [Openload.co] Fixed Extraction
They did it again. Just a minor change, though. Here's a quick fix.
2017-03-20 16:15:00 +08:00
Vijay Singh
3a3c383732 [Openload.co] Fixed Extraction
They did it again. Just a minor change, though. Here's a quick fix.
2017-03-20 09:22:32 +05:30
John Hawkinson
a5d5a2c068 [generic] utf8 decode before re.match(), for Python 3
Otherwise we raise
  TypeError: can't use a string pattern on a bytes-like object
This perhaps argues for putting it in is_html(), which already
does this decoding. But of course plain whitespace isn't just
html. So perhaps renaming is_html()? I dunno what is simpler.
Let's start with this.
2017-03-19 21:52:13 -04:00
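The failure mentioned in a5d5a2c068 above is easy to reproduce on Python 3: a str pattern cannot be matched against bytes. The sketch below decodes first, as the commit describes. The helper name `starts_with_tag` and the pattern are assumptions for illustration and are not taken from generic.py.

```python
import re

first_bytes = b'  \n\t<!DOCTYPE html>'

# On Python 3, matching a str pattern against bytes raises TypeError.
try:
    re.match(r'^\s*<', first_bytes)
    raised_type_error = False
except TypeError:
    raised_type_error = True


def starts_with_tag(data):
    """Return True if data starts with optional whitespace, then '<'."""
    if isinstance(data, bytes):
        # Decode before matching; undecodable bytes are replaced, not fatal.
        data = data.decode('utf-8', 'replace')
    return re.match(r'^\s*<', data) is not None
```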
John Hawkinson
00bc75ca01 [generic] Allow parsing when first 512 bytes are whitespace
is_html(first_bytes) will fail if the first 512 bytes of the URL are
all whitespace, for some weird reason. Such a case probably is not a
direct video link, the case we're concerned about downloading
inadvertently, since that wouldn't be a valid video binary file
format.

But it's still peculiar, so don't silently ignore it -- print a
warning and continue on.
2017-03-19 21:01:47 -04:00
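A minimal sketch of the "warn and continue" behaviour from 00bc75ca01 above. The function names are hypothetical, and the standard warnings module stands in for whatever reporting mechanism the extractor actually uses.

```python
import warnings


def is_all_whitespace(first_bytes):
    # bytes.strip() with no argument removes ASCII whitespace, so an
    # empty result means the prefix held nothing else.
    return len(first_bytes) > 0 and not first_bytes.strip()


def check_prefix(first_bytes):
    """Warn (but do not fail) when the sampled prefix is only whitespace."""
    if is_all_whitespace(first_bytes):
        warnings.warn(
            'first %d bytes are all whitespace; continuing' % len(first_bytes))
        return True
    return False
```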
John Hawkinson
6206194c5a [generic] Replace LazyYT test with skiplagged
discourse.ubuntu.com has gone away, replace with skiplagged.com.
It would be nice to have a non-frontpage URL that might be more stable,
though I don't have one. Maybe this should move to html
in test/test_InfoExtractor.py?
2017-03-19 20:52:25 -04:00
Sergey M․
0e9a73e612
release 2017.03.20 2017.03.20 2017-03-20 00:07:57 +07:00
Sergey M․
0ecdd3adbd
[ChangeLog] Actualize 2017-03-20 00:03:58 +07:00
Sergey M․
9487ce03e9
[YoutubeDL] Allow multiple input URLs to be used with stdout as output template 2017-03-19 23:59:40 +07:00
Sergey M․
45e6ad21b4
Credit @mrBliss for vtm (#11912) 2017-03-19 23:48:02 +07:00
motophil
334fdb1922 [Gaskrank] fix for broken site. - requested fix. 2017-03-19 16:45:30 +01:00
Yen Chi Hsuan
68220649fa
[ChangeLog] Update after #12099 2017-03-19 20:42:17 +08:00
John Hawkinson
46b18f2349 [BostonGlobe] New. Nonstandard version of Brightcove.
Has a "data-brightcove-video-id" instead of a "data-video-id," otherwise
pretty much just Brightcove. Except the Globe isn't all Brightcove
videos, so fallback to Generic, too.

Also, abstract playlist_from_matches() from generic.py to common.py, and use
it here.

History of these changes can be found in
51170427d4b1143572a498dedaee61863a5b2c5b.
2017-03-19 20:40:31 +08:00
motophil
9b3cd96034 [Gaskrank] fix for broken site. 2017-03-19 12:54:02 +01:00
Tithen-Firion
fc8e22df8f [tvnplayer] Add extractor 2017-03-19 01:04:24 +01:00
Remita Amine
772b5ff57f [toongoggles] Add new extractor(closes #12171) 2017-03-19 00:45:38 +01:00
Alpesh Valia
f4c968ac5b [hotstar] checked code with flake8 and modified 2017-03-18 22:21:08 +05:30
John Hawkinson
fe6eb793c0 [Brightcove:new] reorder doc refs in code order (again) 2017-03-18 12:36:45 -04:00
Sergey M․
f68ef1e2ab
[medialaan] Remove unrelated test 2017-03-18 23:23:47 +07:00
John Hawkinson
936693bf1b [generic] Skip Test_Generic25, it changes frequently
But it was only added 2 weeks ago by @dstftw in
b68a812ea839e44148516a34a15193189e58ba77, so maybe it should be
handled differently? Remove the test? Mark it only_matching? Not sure.
2017-03-18 12:09:13 -04:00
John Hawkinson
a3915f69fa [generic] remove Brightcove detection messages
Unnecessarily verbose since the extractor itself
will print messages prefixed by IE_NAME.
2017-03-18 11:49:06 -04:00
John Hawkinson
36fc7fb07a [brightcove:new] reorder [1] <video> comment to point of use 2017-03-18 11:47:24 -04:00
John Hawkinson
ca67fc4341 [brightcove:new] don't strip ref: from video_id
Partial revert of 49571c1c2fc0872d8e8cf341cdeacd57d8885236
which was based on a misreading of the regexp extractor.
2017-03-18 11:46:16 -04:00