Separate the concept of the executable used by an ExternalFD
downloader from the name of the class. I'm not entirely sure why we
care about the name of the class at all, but it's used outside of
classes to initialize _BY_NAME() and so forth, so it seems impractical
to just change get_basename() to return the executable name.
So instead, call get_execname() rather than get_basename().
Default get_execname() to calling get_basename(), but override it in
FFmpegFD, where it returns self.execname, which is set in
_call_downloader().
Perhaps there is a less complicated way to achieve this goal?
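A minimal sketch of the split described above, using simplified stand-in classes rather than the real downloader code. The body of get_basename(), the execname default, the fallback in FFmpegFD.get_execname(), and the detected_executable parameter are all assumptions made for illustration.

```python
class ExternalFD:
    """Simplified stand-in for the external downloader base class."""

    @classmethod
    def get_basename(cls):
        # Assumption: the name is derived from the class name,
        # e.g. "FFmpegFD" -> "ffmpeg". This is what by-name lookups rely on.
        return cls.__name__[:-2].lower()

    def get_execname(self):
        # Default: the executable is named after the class.
        return self.get_basename()


class FFmpegFD(ExternalFD):
    execname = None  # hypothetical default until _call_downloader() has run

    def get_execname(self):
        # Falling back to get_basename() is an assumption of this sketch.
        return self.execname or self.get_basename()

    def _call_downloader(self, detected_executable):
        # Assumption: the real method works out which binary to run;
        # here the result is simply passed in.
        self.execname = detected_executable


fd = FFmpegFD()
fd._call_downloader('avconv')
print(fd.get_basename(), fd.get_execname())
```

The class name stays stable for lookups while the executable name can differ per instance.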
Extractor for npo.nl programs. Retrieves only recent episodes of the program in question (hence the name...). Some programs have so many episodes available that it doesn't make any practical sense to retrieve all.
YoutubeDL uses a regexp in common.py::_find_jwplayer_data
to find the jwplayer options. However, the options are passed as an
argument to a JavaScript function call. For example, the regexp might match this:
jwplayer('some_string').setup({
/** Other attributes */
sources: {
file: "<url of video>",
label: "<title of video>",
type: "mp4"
}
});
Since this is ordinary JavaScript, some websites instead write the
options as:
var src = {
file: "<url of video>",
label: "<title of video>",
type: "mp4"
}
jwplayer('some_string').setup({
/** Other attributes */
sources: src
});
In this case YoutubeDL won't be able to retrieve sources.file, since
the regexp only matches the ".setup(...)" and ignores the "var src
= ..." assignment.
This commit makes YoutubeDL raise an ExtractorError in the above
case. YoutubeDL will then try alternative methods to retrieve the URL
of the video.
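A rough sketch of the failure mode and the new behaviour. The regexp below is a simplified stand-in, not the actual pattern from common.py, and ExtractorError, find_jwplayer_file(), and the example.com URLs are hypothetical names used only for illustration.

```python
import re

# Simplified pattern for illustration; NOT the actual regexp in common.py.
SETUP_RE = re.compile(
    r'jwplayer\([^)]*\)\.setup\(\s*(\{.*?\})\s*\)\s*;', re.DOTALL)


class ExtractorError(Exception):
    """Stand-in for youtube-dl's ExtractorError."""


def find_jwplayer_file(webpage):
    m = SETUP_RE.search(webpage)
    if m is None:
        return None
    options = m.group(1)
    # "sources" is followed either by a literal ({...} or [...]) or by
    # an identifier that refers to a variable defined elsewhere.
    src = re.search(r'sources\s*:\s*([{\[]|[A-Za-z_$][\w$]*)', options)
    if src is None or src.group(1) not in ('{', '['):
        # The matched text alone cannot tell us where the sources are.
        raise ExtractorError('jwplayer sources are not an inline literal')
    file_m = re.search(r'file\s*:\s*"([^"]+)"', options)
    return file_m.group(1) if file_m else None


INLINE = '''jwplayer('some_string').setup({
    sources: {
        file: "https://example.com/video.mp4",
        type: "mp4"
    }
});'''

INDIRECT = '''var src = {
    file: "https://example.com/video.mp4",
    type: "mp4"
}
jwplayer('some_string').setup({
    sources: src
});'''

print(find_jwplayer_file(INLINE))
try:
    find_jwplayer_file(INDIRECT)
except ExtractorError as e:
    print('falling back to other methods:', e)
```

Raising instead of returning a partial result lets the caller move on to its other extraction methods.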
Otherwise we raise
TypeError: can't use a string pattern on a bytes-like object
This perhaps argues for putting it in is_html(), which already
does this decoding. But of course plain whitespace isn't just
html. So perhaps renaming is_html()? I dunno what is simpler.
Let's start with this.
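For reference, a small sketch of why the decode step is needed under Python 3. The helper name is_all_whitespace(), the choice of UTF-8, and the pattern are assumptions for illustration and are not taken from the patch.

```python
import re

first_bytes = b'   \n\t  '

# Applying a str pattern directly to bytes raises TypeError on Python 3.
try:
    re.match(r'^\s*$', first_bytes)
    type_error_raised = False
except TypeError:
    type_error_raised = True


def is_all_whitespace(data, encoding='utf-8'):
    # Decode first so the str pattern can be applied; undecodable bytes
    # are replaced rather than raising.
    text = data.decode(encoding, 'replace')
    return re.match(r'^\s*$', text) is not None


print(type_error_raised, is_all_whitespace(first_bytes))
```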
is_html(first_bytes) will fail if the first 512 bytes fetched from the
URL are all whitespace, for some weird reason. Such a response is
probably not a direct video link (the case we're concerned about
downloading inadvertently), since all-whitespace content wouldn't be
a valid video binary file format.
But it's still peculiar, so don't silently ignore it -- print a
warning and continue on.
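The warn-and-continue behaviour might look roughly like the sketch below. classify_first_bytes(), its return values, the warning text, and the looks_like_html and warn callbacks are hypothetical; the real check lives in the generic extractor and is structured differently.

```python
def classify_first_bytes(first_bytes, looks_like_html, warn):
    """Return 'webpage' or 'direct' for the start of a response body."""
    if not first_bytes.strip():
        # All whitespace: peculiar, but not a valid video file either,
        # so warn and carry on with normal webpage extraction.
        warn('first bytes are all whitespace; continuing')
        return 'webpage'
    return 'webpage' if looks_like_html(first_bytes) else 'direct'


warnings_seen = []
result = classify_first_bytes(
    b' \n ' * 100,
    looks_like_html=lambda data: data.lstrip().startswith(b'<'),
    warn=warnings_seen.append)
print(result, warnings_seen)
```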
discourse.ubuntu.com has gone away, replace with skiplagged.com.
It would be nice to have a non-frontpage URL that might be more stable,
though I don't have one. Maybe this should move to html
in test/test_InfoExtractor.py?
Has a "data-brightcove-video-id" instead of a "data-video-id"; otherwise
pretty much just Brightcove. Except the Globe isn't all Brightcove
videos, so fall back to Generic, too.
Also, abstract playlist_from_matches() from generic.py to common.py, and use
it here.
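As a rough illustration of the shared helper, here is a standalone simplification. The real playlist_from_matches() is a method on the extractor base class; the result dictionaries, the sample HTML, the "brightcove:" URL prefix, and the ie value below are assumptions for this sketch.

```python
import re


def playlist_from_matches(matches, playlist_id=None, playlist_title=None,
                          getter=None, ie=None):
    # Turn each regex match into a URL entry, optionally mapping it
    # through `getter` first, and wrap the entries in a playlist result.
    entries = []
    for m in matches:
        url = getter(m) if getter else m
        entries.append({'_type': 'url', 'url': url, 'ie_key': ie})
    return {
        '_type': 'playlist',
        'id': playlist_id,
        'title': playlist_title,
        'entries': entries,
    }


# Hypothetical page fragment using the attribute named above.
webpage = ('<div data-brightcove-video-id="111"></div>'
           '<div data-brightcove-video-id="222"></div>')
video_ids = re.findall(r'data-brightcove-video-id="(\d+)"', webpage)
playlist = playlist_from_matches(
    video_ids, playlist_id='page',
    getter=lambda vid: 'brightcove:' + vid,  # hypothetical URL scheme
    ie='BrightcoveNew')                      # assumed extractor key
print([e['url'] for e in playlist['entries']])
```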
History of these changes can be found in
51170427d4b1143572a498dedaee61863a5b2c5b.
But it was only added 2 weeks ago by @dstftw in
b68a812ea839e44148516a34a15193189e58ba77, so maybe it should be
handled differently? Remove the test? Mark it only_matching? Not sure.