Prevent HTTP 301 for YouTube playlist continuations

When a YouTube playlist or channel listing has more than one page of videos, the continuation URLs specify `youtube.com` instead of `www.youtube.com`. This causes an unnecessary HTTP round-trip for each continuation page the extractor accesses.

**Example**

<code>
youtube-dl  -s --print-traffic  https://www.youtube.com/channel/UCBR8-60-B28hp2BmDPdntcQ
</code>

**Before**

<code>
GET /playlist?list=UUBR8-60-B28hp2BmDPdntcQ&disable_polymer=true
Host: www.youtube.com
HTTP/1.1 200 OK

GET /browse_ajax?action_continuation=1&continuation=4qmFsgIsEhpWTFVVQlI4LTYwLUIyOGhwMkJtRFBkbnRjURoOZWdaUVZEcERSMUUlM0Q%253D&disable_polymer=true
Host: youtube.com
HTTP/1.1 301 Moved Permanently
Location: https://www.youtube.com/browse_ajax?action_continuation=1&continuation=4qmFsgIsEhpWTFVVQlI4LTYwLUIyOGhwMkJtRFBkbnRjURoOZWdaUVZEcERSMUUlM0Q%253D&disable_polymer=true

GET /browse_ajax?action_continuation=1&continuation=4qmFsgIsEhpWTFVVQlI4LTYwLUIyOGhwMkJtRFBkbnRjURoOZWdaUVZEcERSMUUlM0Q%253D&disable_polymer=true
Host: www.youtube.com
HTTP/1.1 200 OK

GET /browse_ajax?action_continuation=1&continuation=4qmFsgIqEhpWTFVVQlI4LTYwLUIyOGhwMkJtRFBkbnRjURoMZWdkUVZEcERUV2RD&disable_polymer=true
Host: youtube.com
HTTP/1.1 301 Moved Permanently
Location: https://www.youtube.com/browse_ajax?action_continuation=1&continuation=4qmFsgIqEhpWTFVVQlI4LTYwLUIyOGhwMkJtRFBkbnRjURoMZWdkUVZEcERUV2RD&disable_polymer=true

GET /browse_ajax?action_continuation=1&continuation=4qmFsgIqEhpWTFVVQlI4LTYwLUIyOGhwMkJtRFBkbnRjURoMZWdkUVZEcERUV2RD&disable_polymer=true
Host: www.youtube.com
HTTP/1.1 200 OK

GET /browse_ajax?action_continuation=1&continuation=4qmFsgIqEhpWTFVVQlI4LTYwLUIyOGhwMkJtRFBkbnRjURoMZWdkUVZEcERTM2RE&disable_polymer=true
Host: youtube.com
HTTP/1.1 301 Moved Permanently
Location: https://www.youtube.com/browse_ajax?action_continuation=1&continuation=4qmFsgIqEhpWTFVVQlI4LTYwLUIyOGhwMkJtRFBkbnRjURoMZWdkUVZEcERTM2RE&disable_polymer=true

GET /browse_ajax?action_continuation=1&continuation=4qmFsgIqEhpWTFVVQlI4LTYwLUIyOGhwMkJtRFBkbnRjURoMZWdkUVZEcERTM2RE&disable_polymer=true
Host: www.youtube.com
HTTP/1.1 200 OK
</code>

**After**

<code>
GET /playlist?list=UUBR8-60-B28hp2BmDPdntcQ&disable_polymer=true
Host: www.youtube.com
HTTP/1.1 200 OK

GET /browse_ajax?action_continuation=1&continuation=4qmFsgIsEhpWTFVVQlI4LTYwLUIyOGhwMkJtRFBkbnRjURoOZWdaUVZEcERSMUUlM0Q%253D&disable_polymer=true
Host: www.youtube.com
HTTP/1.1 200 OK

GET /browse_ajax?action_continuation=1&continuation=4qmFsgIqEhpWTFVVQlI4LTYwLUIyOGhwMkJtRFBkbnRjURoMZWdkUVZEcERUV2RD&disable_polymer=true
Host: www.youtube.com
HTTP/1.1 200 OK
</code>
This commit is contained in:
Glenn Slayden 2020-06-24 23:02:06 -07:00 committed by GitHub
parent 9a7e5cb88a
commit bd1340d294
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

View File

@ -303,7 +303,7 @@ class YoutubeEntryListBaseInfoExtractor(YoutubeBaseInfoExtractor):
# Downloading page may result in intermittent 5xx HTTP error
# that is usually worked around with a retry
more = self._download_json(
'https://youtube.com/%s' % mobj.group('more'), playlist_id,
'https://www.youtube.com/%s' % mobj.group('more'), playlist_id,
'Downloading page #%s%s'
% (page_num, ' (retry #%d)' % count if count else ''),
transform_source=uppercase_escape,