path: root/youtube/yt_data_extract/common.py
Commit message (Author, Date, Files changed, Lines removed/added)
* Fix related vids, like_count, playlist sometimes missing (Jesus, 2023-09-11, 1 file, -9/+13)
  The cause is that some pages have an onResponseReceivedEndpoints key at the top level with useless data in it, and the extract_items function was searching that instead of the 'contents' key. Change extract_items to use independent if blocks instead of elif blocks.
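The effect of switching from elif to if can be illustrated with the simplified sketch below. It is an illustration under assumptions, not the actual extract_items code: the helper names (find_items_elif, find_items_if) and the flat 'items' lists are hypothetical. With elif, a top-level onResponseReceivedEndpoints key shadows 'contents' even when it holds nothing useful; with independent if blocks, 'contents' is still consulted.

```python
def _items_from_endpoints(response):
    """Collect items from onResponseReceivedEndpoints (assumed shape)."""
    items = []
    for endpoint in response.get('onResponseReceivedEndpoints', []):
        items.extend(endpoint.get('items', []))
    return items


def find_items_elif(response):
    # Old behaviour: only the first matching key is ever consulted.
    if 'onResponseReceivedEndpoints' in response:
        return _items_from_endpoints(response)
    elif 'contents' in response:
        return response['contents'].get('items', [])
    return []


def find_items_if(response):
    # New behaviour: each key is checked independently, so useless
    # endpoints no longer hide the items under 'contents'.
    items = []
    if 'onResponseReceivedEndpoints' in response:
        items = _items_from_endpoints(response)
    if not items and 'contents' in response:
        items = response['contents'].get('items', [])
    return items
```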
* Fix minor formatting issues (Jesus E, 2023-06-17, 1 file, -5/+5)
* Merge short and video parsing even further (Jesus E, 2023-06-17, 1 file, -29/+24)
  Use multi_get and multi_deep_get for tag differences.
  Replace the duration check with conservative_update.
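multi_get and multi_deep_get themselves are not shown in this log. Below is a minimal sketch of what helpers with these names plausibly do, assuming they return the value of the first key (or key path) that is present and not None; the signatures and the deep_get helper are assumptions. This kind of lookup lets one parser cover shorts and regular videos whose JSON uses different key names for the same field.

```python
def multi_get(obj, *keys, default=None):
    """Return obj[key] for the first key that is present and not None."""
    for key in keys:
        value = obj.get(key)
        if value is not None:
            return value
    return default


def deep_get(obj, *path, default=None):
    """Follow a path of dict keys / list indices, returning default on failure."""
    for step in path:
        try:
            obj = obj[step]
        except (KeyError, IndexError, TypeError):
            return default
    return obj


def multi_deep_get(obj, *paths, default=None):
    """Try several key paths and return the first value that is not None."""
    for path in paths:
        value = deep_get(obj, *path)
        if value is not None:
            return value
    return default
```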
* Merge short and video parsing (Jesus E, 2023-06-17, 1 file, -43/+25)
* Fix parsing shorts (Jesus E, 2023-06-17, 1 file, -7/+7)
  Add check for extracting duration for shorts.
  Make short duration extraction stricter.
  Fix handling shorts with no views.
* Add functional but preliminary channel tab support (Jesus E, 2023-06-17, 1 file, -0/+47)
  Add channel tabs to the channel template and script.
  Update continuation token to request different tabs.
  Add support for the 'reelItemRenderer' format required to extract shorts.
* Fix music list extraction (Jesus E, 2023-05-28, 1 file, -0/+3)
  Closes #160
* Update channel to new ctoken format (Jesus E, 2023-05-28, 1 file, -2/+6)
  Huge thanks to @michaelweiser.
  Different sort orders still don't work for videos and playlists.
* Revert "Usage hqdefault thumbnail in related videos" (Jesús, 2021-09-14, 1 file, -3/+2)
  This reverts commit a0c3ca0159136d17eefa129176ae1904110238b8.
* Usage hqdefault thumbnail in related videos (Jesús, 2021-09-14, 1 file, -2/+3)
* Support more audio and video qualities (James Taylor, 2021-08-31, 1 file, -2/+5)
  Adds support for AV1-encoded videos, which includes any videos above 1080p. These weren't getting included because they did not have a quality entry in the format table at the top of watch_extraction.py, so the quality is now taken from the format's quality label when it's not in the table.
  Because YouTube often includes BOTH AV1 and H.264 (AVC) for each quality, including these produces far too many quality options, and the code needs to choose which one to use. The choice is somewhat hard: AV1 encodes to fewer bytes than H.264 and is patent-free; however, it has less hardware support, so it might be more difficult to play. For instance, on my system, AV1 does not work at 1080p, but H.264 does. Adds a setting for which codec to prefer, with H.264 as the default.
  Also adds support for the lower-quality mp4 audio, which is now used at 144p to save network bandwidth. For similar reasons, it was not getting included because it did not have an audio_bitrate entry in the table; the bitrate is now used for the quality instead.
  Signed-off-by: Jesús <heckyel@hyperbola.info>
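The codec preference described above could be sketched as follows. The 'quality_label' and 'codec' keys, the codec strings 'av1'/'h264', and the prefer parameter (standing in for the new setting) are hypothetical names, not the actual watch_extraction.py data structures.

```python
import re


def height_from_quality_label(label):
    """Parse a quality label such as '1080p60' or '2160p' into a height."""
    match = re.match(r'(\d+)p', label or '')
    return int(match.group(1)) if match else None


def choose_formats(formats, prefer='h264'):
    """Keep one format per height, preferring the configured codec.

    `formats` is a list of dicts with hypothetical 'quality_label' and
    'codec' keys; `prefer` stands in for the user setting (H.264 default).
    """
    chosen = {}
    for fmt in formats:
        height = height_from_quality_label(fmt.get('quality_label'))
        if height is None:
            continue  # no usable quality label
        current = chosen.get(height)
        # Take the first format seen, but let the preferred codec replace it.
        if current is None or (fmt['codec'] == prefer and current['codec'] != prefer):
            chosen[height] = fmt
    return chosen
```

With the default prefer='h264', a 1080p entry offered in both codecs resolves to H.264, while a height offered only as AV1 still gets AV1.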
* Fix comments extraction due to new response continuation key name (James Taylor, 2021-08-23, 1 file, -2/+6)
  Signed-off-by: Jesús <heckyel@hyperbola.info>
* Fix description extraction in search results (James Taylor, 2021-08-09, 1 file, -1/+5)
  Signed-off-by: Jesús <heckyel@hyperbola.info>
* Fix (dis)like, music list extraction due to YouTube changes (again) (James Taylor, 2021-08-09, 1 file, -5/+29)
  YouTube reverted the changes that prompted f9f5d5ba. In case they change their minds again, this adds support for both formats.
  The liberal_update and conservative_update functions needed to be modified to handle empty lists, so that a successfully extracted 'music_list': [{'Author':...},...] will not be overwritten by 'music_list': [] in the calls to liberal_dict_update.
  Signed-off-by: Jesús <heckyel@hyperbola.info>
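Below is a minimal sketch of the update rule described above. It is reconstructed only from this commit message, so the exact semantics are assumptions: "empty" is taken to mean None or []. liberal_update overwrites unless the new value is empty; conservative_update fills in only missing or empty values.

```python
def _is_empty(value):
    # Treat None and empty lists as "nothing was extracted".
    return value is None or (isinstance(value, list) and not value)


def liberal_update(obj, key, value):
    """Overwrite obj[key] with value unless value is empty."""
    if not _is_empty(value):
        obj[key] = value


def conservative_update(obj, key, value):
    """Set obj[key] only if it is currently missing or empty."""
    if _is_empty(obj.get(key)):
        obj[key] = value


def liberal_dict_update(dict1, dict2):
    """Apply liberal_update for every key/value pair in dict2."""
    for key, value in dict2.items():
        liberal_update(dict1, key, value)
```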
* Switch to new comments api now that old one is being disabled (James Taylor, 2021-08-09, 1 file, -7/+32)
  The watch_comment API periodically returns the error "Top level comments mweb servlet is turned down." The continuation items for the new API are arranged differently in the JSON, so changes to the extract_items function were necessary.
  Signed-off-by: Jesús <heckyel@hyperbola.info>
* Fix missing likes, dislikes, & music list due to Youtube changes (James Taylor, 2021-07-28, 1 file, -7/+23)
  Also moves some microformat extraction from _extract_watch_info_mobile to extract_watch_info, where it belongs. _extract_watch_info_mobile is really only for data visible on the page, and is thus specialized for either mobile or desktop.
  Signed-off-by: Jesús <heckyel@hyperbola.info>
* Capitalize name app (Jesús, 2021-06-10, 1 file, -1/+1)
* Fix videos added to playlist from channel page not having author (James Taylor, 2021-05-17, 1 file, -2/+3)
  Information from additional_info was being overridden with None.
  Signed-off-by: Jesús <heckyel@hyperbola.info>
* Channel: Allow going to next pages of playlists page (James Taylor, 2021-03-15, 1 file, -0/+8)
  Uses previous and next buttons, so more than just the first page of the playlists page can now be viewed.
  Signed-off-by: Jesús <heckyel@hyperbola.info>
* Use new channel api endpoint now that browse_ajax is disabled (James Taylor, 2021-03-03, 1 file, -0/+5)
  Fixes channel pages after the first one (page > 1).
  Signed-off-by: Jesús <heckyel@hyperbola.info>
* yt_data_ext: support richGrid&richItem sometimes used on search (James Taylor, 2021-02-13, 1 file, -1/+3)
  Some searches have these renderers instead of the usual ones.
  Signed-off-by: Jesús <heckyel@hyperbola.info>
* Fix youtube mixes (James Taylor, 2020-12-18, 1 file, -0/+5)
  Mixes cannot be viewed on their own, so change the URL in items to go to the video+playlist instead.
  Signed-off-by: Jesús <heckyel@hyperbola.info>
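Pointing a mix item at "video+playlist" amounts to building a watch URL that carries both IDs. Below is a minimal sketch, assuming the standard youtube.com `v` and `list` query parameters; mix_watch_url is a hypothetical helper, and the real code may use a different origin (for example a local URL_ORIGIN).

```python
from urllib.parse import urlencode


def mix_watch_url(first_video_id, playlist_id, origin='https://www.youtube.com'):
    """Build a watch URL that plays a video inside a mix playlist."""
    # A mix can't be opened on its own, so link to its first video
    # and pass the mix id as the playlist.
    return origin + '/watch?' + urlencode({'v': first_video_id, 'list': playlist_id})
```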
* remove trailing whitespaces (zrose584, 2020-10-21, 1 file, -1/+1)
* yt_data_extract: normalize thumbnail and author urls (James Taylor, 2020-10-19, 1 file, -6/+11)
  For instance, URLs that start with // become https://.
  An adjustment was required in comments.py because the URL had been left relative in yt_data_extract by mistake, and comments.py was prefixing it with URL_ORIGIN as a workaround.
  See #31
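A sketch of the normalization, assuming a single helper (normalize_url, hypothetical) applied to thumbnail and author URLs. Only the // to https:// rule comes from the commit message; resolving '/path' against https://www.youtube.com is an illustrative assumption.

```python
def normalize_url(url, origin='https://www.youtube.com'):
    """Make thumbnail/author URLs absolute.

    '//host/path' becomes 'https://host/path'. Treating '/path' as
    relative to `origin` is an assumption for illustration.
    """
    if url is None:
        return None
    if url.startswith('//'):
        return 'https:' + url
    if url.startswith('/'):
        return origin + url
    return url  # already absolute
```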
* yt_data_extract: Fix time_published picking up 'Streaming' string (James Taylor, 2020-08-12, 1 file, -1/+5)
  This was causing an exception in subscriptions when it tried to estimate the unix timestamp for the upload time.
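Below is a sketch of the kind of guard that prevents this. It assumes (hypothetically) that only relative strings such as "3 days ago" are accepted as time_published and that anything else, like 'Streaming', is rejected before timestamp estimation. The regex, the unit table (months as 30 days, years as 365) and the function names are all assumptions.

```python
import re

_RELATIVE_TIME = re.compile(
    r'^(\d+) (second|minute|hour|day|week|month|year)s? ago$')

# Rough unit lengths in seconds; months and years are approximations.
_UNIT_SECONDS = {
    'second': 1, 'minute': 60, 'hour': 3600, 'day': 86400,
    'week': 7 * 86400, 'month': 30 * 86400, 'year': 365 * 86400,
}


def extract_time_published(text):
    """Return text only if it looks like a relative upload time."""
    if text and _RELATIVE_TIME.match(text.strip()):
        return text.strip()
    return None  # e.g. 'Streaming' is not a publish time


def estimate_upload_timestamp(time_published, now):
    """Estimate a unix timestamp from '3 days ago'-style text."""
    match = _RELATIVE_TIME.match((time_published or '').strip())
    if match is None:
        return None
    amount, unit = int(match.group(1)), match.group(2)
    return now - amount * _UNIT_SECONDS[unit]
```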
* extract_items: Handle case where continuation has multiple (James Taylor, 2020-08-11, 1 file, -10/+21)
  [something]Continuation renderers, all of which are junk except one. Check the items in each one until the one containing the items being sought is found.
  The usage in extract_comments_info needed to be changed to specify the items being sought. It was unspecified before, which is strictly incorrect since extract_items by default looks for video/playlist/channel thumbnail items; it was relying on this special case for continuations, which no longer works.
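The search over multiple continuation renderers can be sketched as follows. The function name, the 'contents' key, and the renderer names in the example are hypothetical; item_types plays the role of the sought items that extract_comments_info now has to pass explicitly.

```python
def find_continuation_items(continuation_contents, item_types):
    """Return the sought items from whichever *Continuation renderer has them.

    `continuation_contents` maps renderer names to renderers; all but one
    are assumed to be junk. `item_types` names the renderers being sought.
    """
    for name, renderer in continuation_contents.items():
        if not name.endswith('Continuation'):
            continue
        items = [item for item in renderer.get('contents', [])
                 if any(item_type in item for item_type in item_types)]
        if items:
            return items  # first renderer that actually holds sought items
    return []
```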
* Fix related video extraction sometimes failing (James Taylor, 2020-04-10, 1 file, -2/+10)
  YouTube added some pointless variation in variable names.
* Add playlist sidebar for videos in playlist, including autoplay (James Taylor, 2020-04-04, 1 file, -0/+26)
* Fix playlist id extraction for radio renderers (James Taylor, 2019-12-31, 1 file, -1/+1)
* Extraction: Correctly extract view_count for vids with 0 views. (James Taylor, 2019-12-30, 1 file, -1/+9)
  Also replace a superfluous use of multi_get nearby with item.get.
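Below is a sketch of view-count parsing that keeps 0 distinct from "missing". The "No views" wording and the function name are assumptions; the point is returning 0 rather than None.

```python
import re


def extract_view_count(text):
    """Parse '1,234 views' or 'No views' into an int; None if absent."""
    if not text:
        return None
    if text.strip().lower() == 'no views':
        return 0
    match = re.search(r'\b(\d+(?:,\d+)*)\b', text)
    return int(match.group(1).replace(',', '')) if match else None
```

Callers should then test `view_count is not None` rather than truthiness, since `if view_count:` would discard a legitimate 0.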
* extract_items: allow extracting items that are normally dug into for more (James Taylor, 2019-12-26, 1 file, -5/+5)
  Does this by first checking whether the item is in item_types, rather than first checking whether it can be dug into. For example, this allows extracting things like sectionListRenderer.
* yt_data_extract: Split up extract_items so renderer extraction works independently (James Taylor, 2019-12-26, 1 file, -47/+48)
  extract_items_from_renderer will extract items given just a renderer rather than a full response.
* yt_data_extract.common: Simplify usage of get functions and remove dead code (James Taylor, 2019-12-26, 1 file, -18/+11)
  Change usage of multi_deep_get to multi_get where possible.
  Remove type checking from calls to get functions (because it's very unlikely YouTube suddenly changes the type without changing the name of the variable or anything, and it takes up unnecessary space).
  Remove all default=None arguments from get functions, since those are superfluous.
  Remove the list_types constant since it's no longer in use.
* yt_data_extract: Simplify extract_items so it needs only 1 while loop (James Taylor, 2019-12-26, 1 file, -32/+31)
* extract_item_info: Don't extract author, author_id, etc. for channel items (James Taylor, 2019-12-24, 1 file, -7/+8)
  Philosophically, a channel doesn't create itself.
* Fix extract_approx_int not working for non-approx ints, make extract_int more robust (James Taylor, 2019-12-24, 1 file, -2/+2)
  For example, "354 subscribers" wasn't being extracted correctly by extract_approx_int.
  Make extract_approx_int and extract_int only extract integers that are whole words, so e.g. 342 will not be extracted from internetuser342.
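Below is a minimal sketch of the whole-word rule, assuming a regex-based extract_int (the real implementation is not shown here). Word boundaries stop digits glued to letters, as in internetuser342, from matching; the comma-grouping part of the pattern is an added assumption for counts like "1,234".

```python
import re


def extract_int(text):
    """Extract an integer that stands alone as a word, e.g. '354 subscribers'."""
    if not isinstance(text, str):
        return None
    # \b on both sides: '342' inside 'internetuser342' has no boundary
    # before it, so it is not matched.
    match = re.search(r'\b(\d+(?:,\d+)*)\b', text)
    if match is None:
        return None
    return int(match.group(1).replace(',', ''))
```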
* Rewrite channel extraction with proper error handling and new extraction names. (James Taylor, 2019-12-21, 1 file, -2/+5)
  Extract subscriber_count correctly. Don't just shove English strings into info['stats']; give the stats actual semantic names.
* Fix extract_approx_int. Fixes incorrect subscriber count on channels. (James Taylor, 2019-12-21, 1 file, -2/+2)
  It wasn't working because decimals such as 15.1M weren't considered, so it was extracting "1M".
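Below is a sketch of decimal-aware parsing, assuming K/M/B suffixes and a regex approach (the names and suffix table are assumptions). Decimal is used instead of float because float multiplication can round just below the intended integer, and int() truncates.

```python
import re
from decimal import Decimal

_MULTIPLIERS = {'K': 10**3, 'M': 10**6, 'B': 10**9}


def extract_approx_int(text):
    """Parse abbreviated counts like '15.1M subscribers' into an int."""
    if not isinstance(text, str):
        return None
    # The optional decimal part keeps '15.1' intact, instead of the match
    # starting at the '1' after the decimal point and yielding '1M'.
    match = re.search(r'\b(\d+(?:\.\d+)?)([KMB])?\b', text)
    if match is None:
        return None
    multiplier = _MULTIPLIERS.get(match.group(2), 1)
    return int(Decimal(match.group(1)) * multiplier)
```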
* Fix regression: date extraction broken. Move constants to correct file in yt_data_extract (James Taylor, 2019-12-20, 1 file, -1/+2)
* Extraction: Move stuff around in files and put underscores in front of internal helper function names (James Taylor, 2019-12-19, 1 file, -9/+8)
  Move get_captions_url in watch_extraction to the bottom, next to the other exported, public functions.
* Extraction: Move html post processing stuff from yt_data_extract to util (James Taylor, 2019-12-19, 1 file, -39/+0)
* Extraction: Split yt_data_extract.py into multiple files (James Taylor, 2019-12-19, 1 file, -0/+455)