29971 Commits

Author SHA1 Message Date
juandisay
418c64b0eb extracting data audio config youtube 2020-03-01 10:34:18 +07:00
juandisay
38e70fb328 update new 2020-03-01 07:04:58 +07:00
Sergey M․
6d475d01d8 [telecinco] Add support for article opening videos 2020-03-01 03:09:19 +07:00
Sergey M․
f8cbd8c963 [telecinco] Fix extraction (refs #24195) 2020-03-01 01:04:51 +07:00
Martin Nadal
b7ad62a1e1 [telecinco] works again 2020-02-29 18:50:07 +01:00
Sergey M․
838f051c4b [xtube:user] Fix test 2020-02-29 23:51:56 +07:00
Sergey M․
e88b450771 [xtube] Fix metadata extraction (closes #21073, closes #22455) 2020-02-29 23:51:34 +07:00
Sergey M․
278355bae4 [zapiks] Fix test 2020-02-29 23:09:13 +07:00
Sergey M․
b4cbdbd4b3 [zdf:channel] Fix tests 2020-02-29 23:06:36 +07:00
Sergey M․
ea17979d83 [test_subtitles] Remove obsolete test 2020-02-29 22:08:43 +07:00
Sergey M․
886d985959 [youjizz] Fix extraction (closes #24181) 2020-02-29 21:58:22 +07:00
Sergey M․
7947a1f7db Remove no longer needed compat_str around geturl 2020-02-29 19:19:24 +07:00
Sergey M․
fca6dba8b8 [YoutubeDL] Force redirect URL to unicode on python 2 2020-02-29 19:08:44 +07:00
Sergey M․
e2f8bf5888 [extractor/common] Convert ISM manifest to unicode before processing on python 2 (#24152) 2020-02-29 17:29:30 +07:00
The Hatsune Daishi
b76f0e58f7 [options] Remove duplicate short option -v for --version (#24162) 2020-02-29 16:33:09 +07:00
Mark
80f7dfce93 change bitchute channel test to new channel url (old url is 404) 2020-02-28 18:57:27 -06:00
Mark
6dd76fed80 change bitchute test to new video url (old url is 404) 2020-02-28 18:54:58 -06:00
Mark
d6add6313d bitchute - parse timestamp 2020-02-28 18:38:46 -06:00
tsia
620ca40bbb Merge pull request #13 from ytdl-org/master
update
2020-02-28 18:54:22 +01:00
Guillem Vela
b4a7030620 [CCMA] Avoid exception when 'utc' is not found 2020-02-28 15:29:39 +01:00
nao20010128nao
689d621b40 [options] Remove shorthand for --version 2020-02-28 10:06:11 +00:00
rubyist
c1020cf113 Added tests for Matter extractor 2020-02-27 19:42:51 -08:00
rubyist
b5879f6e44 Don't use _html_search_regex when there's no html to filter out 2020-02-27 19:10:31 -08:00
rubyist
8c5c97a0d3 Be a little less specific about what an artist username looks like 2020-02-27 19:06:16 -08:00
rubyist
a7ca0f9303 Add initial extractor for Matter Online 2020-02-27 18:52:48 -08:00
nao20010128nao
e07b5e42d6 [options] assign -V to --version to resolve conflict with --verbose 2020-02-28 01:33:13 +00:00
Guillem Vela
8c60c29d34 [CCMA] Fix multiple subtitles incompatibility
The CCMA extractor used to raise an exception when downloading a URL
whose subtitles are available in multiple languages.

When a single language is available, the field is the expected dict.
When multiple languages are available, a list of dicts is provided.

This commit fixes this issue.
2020-02-27 22:37:35 +01:00
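A minimal Python sketch of the normalization this commit describes, assuming nothing about the CCMA response beyond what the message states (a dict for one language, a list of dicts for several). The function name `normalize_subtitle_entries` and the `iso`/`url` keys in the sample payloads are hypothetical placeholders, not taken from the extractor.

```python
def normalize_subtitle_entries(field):
    """Return the subtitles field as a list of dicts.

    Accepts both shapes described in the commit message: a single dict
    (one language) or a list of dicts (several languages).
    """
    if field is None:
        return []
    if isinstance(field, dict):
        # One language: wrap the dict so callers always get a list.
        return [field]
    if isinstance(field, list):
        # Several languages: keep only dict entries instead of failing.
        return [entry for entry in field if isinstance(entry, dict)]
    raise TypeError('unexpected subtitles field type: %s' % type(field).__name__)


# Hypothetical payloads; the 'iso' and 'url' keys are placeholders.
single = {'iso': 'ca', 'url': 'https://example.com/sub_ca.vtt'}
multiple = [
    {'iso': 'ca', 'url': 'https://example.com/sub_ca.vtt'},
    {'iso': 'es', 'url': 'https://example.com/sub_es.vtt'},
]
for payload in (single, multiple):
    for entry in normalize_subtitle_entries(payload):
        print(entry['iso'], entry['url'])
```

With the field normalized up front, the rest of the subtitle handling can follow a single code path regardless of how many languages a page offers.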
Guillem Vela
69c4e35907 [CCMA] Add test with multiple subtitles
The added test covers one of the cases that broke compatibility:
its subtitles field features multiple languages.
2020-02-27 22:37:35 +01:00
Guillem Vela
8fbefa5cf7 [CCMA] Fix wrong timestamp issue
For some reason, the provided UTC timestamp does not comply with ISO 8601,
as its format is YYYY-DD-MM instead of the expected YYYY-MM-DD.

This can be checked with the also provided "text" field of
emission date object. Example:
"data_emissio": {
    "text": "14/05/2002 21:39",
    "utc": "2002-14-05T21:39:28+0200"
}

This commit fixes this behavior.
2020-02-27 22:37:35 +01:00
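One way to read such a value in Python, sketched here as an assumption rather than as the extractor's actual fix: parse it with a format string that puts the day before the month, and optionally cross-check the result against the `text` field. `parse_ccma_utc` is a hypothetical name.

```python
from datetime import datetime


def parse_ccma_utc(utc, text=None):
    """Parse a CCMA-style 'utc' value whose date part is YYYY-DD-MM.

    If the accompanying 'text' value (DD/MM/YYYY ...) is given, use it
    to confirm that day and month were read in the right order.
    """
    # %d before %m matches the swapped order; %z accepts "+0200".
    dt = datetime.strptime(utc, '%Y-%d-%mT%H:%M:%S%z')
    if text:
        # Only the leading DD/MM/YYYY part of 'text' is compared.
        shown = datetime.strptime(text[:10], '%d/%m/%Y')
        if (shown.year, shown.month, shown.day) != (dt.year, dt.month, dt.day):
            raise ValueError('utc %r disagrees with text %r' % (utc, text))
    return dt


dt = parse_ccma_utc('2002-14-05T21:39:28+0200', '14/05/2002 21:39')
print(dt.isoformat())  # 2002-05-14T21:39:28+02:00
print(int(dt.timestamp()))
```

The cross-check matters because a value whose day is 12 or lower would parse without error in either order, so the format string alone cannot detect a wrong assumption.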
eleph-hub
f04954f46e fixed flake8 findings
sorry.
2020-02-27 20:39:55 +01:00
juandisay
49c24c6682 restore 2020-02-27 17:31:02 +07:00
juandisay
d0f91e2083 new sanitized request url mewatch 2020-02-27 09:53:38 +07:00
eleph-hub
8a28ae7108 fix for url pattern
fixed pattern https://www.servus.com/tv/videos/aa-1t6vbu5pw1w12/
2020-02-25 21:27:01 +01:00
eleph-hub
4e04cbf35e fixed ServusTV support is broken #23475
added proposed fix from PR23583 and added a test for the new URL pattern.
2020-02-25 20:59:01 +01:00
Alexandre Vallières-Lagacé
810dfad7b9 Added Support for HGTV.ca
Since hgtv.ca and hgtv.com use the same CMS, updating the existing
script to support both domains works well.
2020-02-24 15:44:45 -05:00
Anarky
e8d102e1bf [rmcdecouverte] Expand to RMC Story 2020-02-24 19:45:15 +01:00
Sergey M․
bee6451fe8 [pornhd] Fix extraction (closes #24128) 2020-02-24 04:47:56 +07:00
Sergey M․
00d798b7c2 [teachable] Add support for multiple videos per lecture (closes #24101) 2020-02-23 06:49:45 +07:00
Sergey M․
fda6d237a5 [wistia] Add support for multiple generic embeds (closes #8347, closes #11385) 2020-02-23 06:47:11 +07:00
Sergey M․
5d9f6cbc5a [imdb] Fix extraction (closes #23443) 2020-02-23 04:33:29 +07:00
Vytautas Jakutis
5a639d17e5 I wish there was a timestamp in "OUTPUT TEMPLATE" for RSS/Atom 2020-02-22 20:16:33 +02:00
dmsummers
08c47b7af9 [simplecast] Add new extractor 2020-02-21 23:27:34 -06:00
3risian
d1af4dafab [PeerTube] Rename try_get_second_level_data() to data() 2020-02-20 13:27:22 +11:00
tomyang001
653c200e35 Update bilibili.py 2020-02-19 19:30:00 -05:00
tomyang001
4a0a254b41 Delete pythonapp.yml 2020-02-19 19:10:13 -05:00
tomyang001
82ee7c5946 Create pythonapp.yml 2020-02-19 18:47:07 -05:00
tomyang001
d723a946d7 Add files via upload 2020-02-19 18:44:35 -05:00
TinyToweringTree
1326a5aa38 [archiveorg] Make metadata extraction more robust 2020-02-20 00:02:32 +01:00
TinyToweringTree
b98d1c0d5a [archiveorg] Use and fix get_element_by_class()
Use get_element_by_class() from utils to get rid of yet another regex.
This function used to return only the content of the element, and not
the element itself, including its tag and attributes. The whole group
of get_element_by_X() functions are a bit of a misnomer, as they all
return the *content* of the element and not the element itself.

All these functions can now return the whole element when setting
their `include_tag` parameter to `True`. By default it is `False` so
no other code will be affected by this change. Tests have been added
to test/test_utils.py accordingly.

This uncovered a bug that prevented elements whose class name starts
with a hyphen from being found. It has been fixed by correcting the
regex used in get_elements_by_class().
2020-02-19 22:42:00 +01:00
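The commit does not quote the old regex, so the following Python sketch only assumes that it relied on `\b` word boundaries around the class name. It shows why such a pattern cannot match a class name beginning with a hyphen and contrasts it with a whitespace-delimited token match. Both helper names are hypothetical and are not the `utils` functions.

```python
import re


def has_class_word_boundary(class_attr, name):
    # Naive check: \b needs a word character on one side, so it never
    # matches just before a leading '-' at the start or after a space.
    return re.search(r'\b%s\b' % re.escape(name), class_attr) is not None


def has_class_token(class_attr, name):
    # Token check: the name must be delimited by whitespace or by the
    # ends of the attribute value, as class lists are space-separated.
    pattern = r'(?:^|\s)%s(?:\s|$)' % re.escape(name)
    return re.search(pattern, class_attr) is not None


print(has_class_word_boundary('-foo bar', '-foo'))  # False
print(has_class_token('-foo bar', '-foo'))  # True
```

The token form also avoids the opposite mistake: `\bfoo\b` matches inside `foo-bar`, because `-` is not a word character.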
TinyToweringTree
e910f498d3 [archiveorg] Use extract_attributes() 2020-02-19 22:04:47 +01:00