path: root/youtube/yt_data_extract
Commit message | Author | Age | Files | Lines
...
* Retrieve base.js url from html watch page when it's missing | James Taylor | 2020-12-09 | 2 | -1/+15
|   Fixes failure mode 3 in #22
* yt_data_ext: watch playlist: Fix missing author_url if no author_id | James Taylor | 2020-11-08 | 1 | -3/+2
|   Embedded playlist info was missing author_url key if author_id was None. This caused KeyError in watch.py when it expected that key.
|   Closes #37
* Redo fix for failure mode 1 in issue #22 | James Taylor | 2020-10-21 | 1 | -4/+4
|   Previous fix didn't work. Should work now.
|   The non-embedded player response can still be present but the urls will be missing.
* remove trailing whitespaces | zrose584 | 2020-10-21 | 3 | -3/+3
|
* Use get_video_info to get video urls if player response missing | James Taylor | 2020-10-19 | 1 | -2/+8
|   Fixes failure mode 1 in #22
* yt_data_extract: normalize thumbnail and author urls | James Taylor | 2020-10-19 | 2 | -12/+17
|   For instance, urls that start with // become https://
|   comments.py needed an adjustment because yt_data_extract had mistakenly left the url as a relative url, and comments.py was working around that by adding the URL_ORIGIN prefix.
|   See #31
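The normalization described above can be sketched roughly as follows. This is an illustration only, not the code from the commit: the function name normalize_url and the https://www.youtube.com origin used for site-relative paths are assumptions.

```python
def normalize_url(url):
    """Return an absolute https url (hypothetical helper, not the project's code)."""
    if url is None:
        return None
    if url.startswith('//'):
        # Protocol-relative url, e.g. //yt3.ggpht.com/...
        return 'https:' + url
    if url.startswith('/'):
        # Site-relative url; the origin used here is an assumption.
        return 'https://www.youtube.com' + url
    if not url.startswith(('http://', 'https://')):
        # Bare host with no scheme at all.
        return 'https://' + url
    return url
```

Doing this once in the extraction layer means callers such as comments.py no longer need to prepend an origin themselves.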
* Specify video height in html so page doesn't shift down after load | James Taylor | 2020-09-24 | 1 | -2/+9
|   Use true video height extracted from youtube to handle videos shorter than their quality size (e.g. widescreen videos).
* yt_data_extract: Fix time_published picking up 'Streaming' string | James Taylor | 2020-08-12 | 1 | -1/+5
|   This was causing an exception in subscriptions when it tried to estimate the unix timestamp for the upload time.
* Switch to mobile api endpoint to fix 'Unknown error' blockage | James Taylor | 2020-08-11 | 1 | -9/+18
|   See https://github.com/iv-org/invidious/issues/1319#issuecomment-671732646
* extract_items: Handle case where continuation has multiple | James Taylor | 2020-08-11 | 2 | -11/+23
|   [something]Continuation renderers, all of which are junk except one. Check the items in each one until the one which contains the items being sought is found.
|   The usage in extract_comments_info needed to be changed to specify the items being sought. It was unspecified before, which is strictly incorrect since extract_items by default looks for video/playlist/channel thumbnail items. It was relying on this special case for continuations, but that would no longer work.
* extract_channel_info: Improve error extraction | James Taylor | 2020-08-11 | 1 | -3/+6
|   Use extract_str function since it's not always 'simpleText'.
|   Make sure we don't output an empty error message if we don't know what it is.
|   channel.py: Don't check if error message is empty, check if it's None.
* Fix hls_manifest_url not included when there's no other formats | James Taylor | 2020-06-28 | 1 | -2/+6
|   Since there are no formats, it was retrying with the non-embedded playerResponse, which resulted in the hls_manifest_urls from the embedded player_response being overwritten with None. So use conservative_update instead.
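The idea behind conservative_update can be sketched as below. The name comes from the commit message, but the signature and body shown here are assumptions for illustration, not a copy of the project's helper.

```python
def conservative_update(obj, key, new_value):
    """Set obj[key] only if it is missing or None (assumed semantics)."""
    if obj.get(key) is None:
        obj[key] = new_value


# Value taken from the embedded player response.
info = {'hls_manifest_url': 'https://example.com/live.m3u8'}

# The retry with the regular playerResponse found nothing; a plain
# assignment would overwrite the good value with None.
conservative_update(info, 'hls_manifest_url', None)
```

With these semantics a later, emptier response can fill gaps but can never erase a value that an earlier response already supplied.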
* Add dialog for copying urls to external player for livestreams | James Taylor | 2020-06-28 | 2 | -11/+53
|   Also for finished livestreams whose other sources aren't present or aren't ready yet.
* Handle case where embedded player response missing | James Taylor | 2020-06-28 | 1 | -2/+10
|   Change so it extracts other stuff from regular playerResponse.
|   Extract formats from embedded player response, but fall back to regular one if that doesn't work. Sometimes there is no 'player' at top_level and the urls are in the regular playerResponse.
* Do not override previous playability error if unknown | James Taylor | 2020-06-28 | 1 | -1/+1
|
* Fix previously live videos labeled as live | James Taylor | 2020-05-29 | 1 | -1/+3
|
* Fix broken signature decryption | James Taylor | 2020-05-27 | 1 | -1/+2
|   The base.js url format changed, so the identifier at the end was no longer unique, and the wrong cached decryption function was being used.
|   Changes the identifier to just be the whole url so this won't happen again.
* Fix urls sometimes not extracted due to youtube changes | James Taylor | 2020-05-27 | 1 | -1/+2
|   The 'cipher' parameter which contains the url is sometimes called 'signatureCipher' instead now.
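A rough sketch of accepting either parameter name is shown below. The function name extract_cipher_fields and the assumption that the value is a query string with url, s and sp fields are illustrative; the commit itself only states that the key may be 'cipher' or 'signatureCipher'.

```python
from urllib.parse import parse_qs


def extract_cipher_fields(fmt):
    """Read the url and signature fields from a format dict (hypothetical helper)."""
    # Accept both the old and the new key name.
    cipher = fmt.get('cipher') or fmt.get('signatureCipher')
    if cipher is None:
        return None
    # Assumed layout: a query string such as "s=...&sp=...&url=...".
    params = parse_qs(cipher)
    return {
        'url': params.get('url', [None])[0],
        's': params.get('s', [None])[0],
        'sp': params.get('sp', [None])[0],
    }
```

parse_qs also percent-decodes the values, so the returned url is ready to use once the signature has been handled.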
* Fix error getting exit node ip if format urls are None | James Taylor | 2020-05-27 | 1 | -1/+1
|
* Fix comment count & disabled extraction not working sometimes | James Taylor | 2020-04-10 | 1 | -3/+14
|   because of A/B test.
* Fix related video extraction sometimes failing | James Taylor | 2020-04-10 | 1 | -2/+10
|   Youtube added some pointless variation in variable names
* Fix exception due to missing 'playlist' key in extracted info | James Taylor | 2020-04-05 | 1 | -0/+3
|   Happens when there's an error on the page and there was no visible stuff on the page. 'playlist' wasn't set to None in that case.
* Fix error when there's a video format with mimetype class of 'text' | James Taylor | 2020-04-04 | 1 | -1/+1
|
* Add playlist sidebar for videos in playlist, including autoplay | James Taylor | 2020-04-04 | 2 | -2/+58
|
* yt_data_extract: fix missing variables in info for unavailable videos | James Taylor | 2020-02-17 | 1 | -2/+3
|   'ip_address' was not set when no formats are available.
|   'allowed_countries' was set to None rather than [] in extract_desktop_info, which it turns out is the function that gets used in these cases.
* Watch page: add info box with allowed countries and tor exit node | James Taylor | 2020-02-01 | 1 | -0/+8
|   Should help with debugging various content blocks
* Check for 403 errors and fallback on Invidious | James Taylor | 2020-02-01 | 1 | -1/+2
|   403 errors on the video urls typically happen when a video has copyrighted content or was livestreamed originally. They appear to not happen (or at least happen less frequently) if the Tor exit node uses ipv6, however.
* yt_data_extract: parse mimeType field for codecs | James Taylor | 2020-02-01 | 1 | -0/+27
|   the youtube-dl formats table doesn't have all the necessary information
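As an illustration of what parsing the mimeType field involves, here is a minimal sketch. It assumes values shaped like 'video/mp4; codecs="avc1.64001F, mp4a.40.2"'; the function name parse_mime_type and the returned keys are hypothetical, not the names used in yt_data_extract.

```python
import re

# class/container, optionally followed by ; codecs="a, b"
_MIME_RE = re.compile(
    r'(?P<kind>\w+)/(?P<container>[\w-]+)'
    r'(?:;\s*codecs="(?P<codecs>[^"]*)")?'
)


def parse_mime_type(mime_type):
    """Split a mimeType string into class, container and codec list."""
    match = _MIME_RE.fullmatch(mime_type.strip())
    if match is None:
        return None
    codecs = [c.strip() for c in (match.group('codecs') or '').split(',')
              if c.strip()]
    return {
        'kind': match.group('kind'),            # 'video', 'audio', 'text', ...
        'container': match.group('container'),  # 'mp4', 'webm', ...
        'codecs': codecs,
    }
```

A format whose class is neither video nor audio (for example text) simply comes back with an empty codec list, so callers can filter on the kind field.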
* Fix signature decryption. | James Taylor | 2020-01-24 | 1 | -1/+1
|   The function body regex was capturing some unrelated new code before the actual function body.
|   Example: `function(a){a=a.split("");var b=[function(c,d){d=(d%c.length+c.length)%c.length;c.splice(-d).reverse().forEach(function(e){return c.unshift(e)}`
|   If you look closely, the closing bracket doesn't match the opening one.
|   I have added `{` to the `[^\}]+` part to make sure it only captures matching brackets. Additionally, I've added `return a\.join\(""\)` to the end for good measure.
* Fix playlist id extraction for radio renderers | James Taylor | 2019-12-31 | 1 | -1/+1
|
* Extraction: Correctly extract view_count for vids with 0 views. | James Taylor | 2019-12-30 | 1 | -1/+9
|   Also change superfluous use of multi_get to item.get nearby
* extract_items: allow extracting items that are normally dug into for more | James Taylor | 2019-12-26 | 1 | -5/+5
|   By checking first if it's in item_types rather than checking if it can be dug into first.
|   For example: this allows extracting things like sectionListRenderer
* yt_data_extract: Split up extract_items so renderer extraction works independently | James Taylor | 2019-12-26 | 1 | -47/+48
|   extract_items_from_renderer will extract given just a renderer rather than a response
* yt_data_extract.common: Simplify usage of get functions and remove dead code | James Taylor | 2019-12-26 | 1 | -18/+11
|   Change usage of multi_deep_get to multi_get where possible.
|   Remove checking of type from calls to get functions (because it's very unlikely Youtube suddenly changes the type without changing the name of the variable or anything, and it takes up unnecessary space).
|   Remove all default=None arguments from get functions, since those are superfluous.
|   Remove list_types constant since it's no longer in use.
* yt_data_extract: Simplify extract_items so it needs only 1 while loop | James Taylor | 2019-12-26 | 1 | -32/+31
|
* extract_item_info: Don't extract author, author_id, etc. for channel items | James Taylor | 2019-12-24 | 1 | -7/+8
|   Philosophically, a channel doesn't create itself.
* Fix extract_approx_int not working for non-approx ints, make extract_int more robust | James Taylor | 2019-12-24 | 1 | -2/+2
|   For example, "354 subscribers" wasn't being extracted correctly by extract_approx_int.
|   Make extract_approx_int and extract_int only extract integers that are words. So e.g. 342 will not be extracted from internetuser342.
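The "integers that are words" rule can be sketched with regex word boundaries, as below. The function name matches the commit message, but the body is an assumption written for illustration, not the project's implementation.

```python
import re


def extract_int(string, default=None):
    """Return the first standalone integer in string, or default."""
    if not isinstance(string, str):
        return default
    # \b on both sides: the digits must form a whole word, so the 342 in
    # "internetuser342" is skipped. Thousands separators are allowed.
    match = re.search(r'\b(\d[\d,]*)\b', string)
    if match is None:
        return default
    return int(match.group(1).replace(',', ''))
```

Under this sketch "354 subscribers" gives 354 and "1,234 views" gives 1234, while a number glued to letters is ignored.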
* Regression: Fix channel extraction 'items' key not present when there's no items. | James Taylor | 2019-12-23 | 1 | -2/+3
|   Examples: Empty channels, no search results
* Channel: Change search results to use next and previous page buttons | James Taylor | 2019-12-23 | 1 | -1/+3
|   Youtube doesn't give the number of search results, so the previous behavior would give an error if an out-of-range page number was selected.
* Rewrite channel extraction with proper error handling and new extraction names. | James Taylor | 2019-12-21 | 2 | -45/+40
|   Extract subscriber_count correctly. Don't just shove english strings into info['stats']. Actually give semantic names for the stats.
* Fix extract_approx_int. Fixes incorrect subscriber count on channels. | James Taylor | 2019-12-21 | 1 | -2/+2
|   It wasn't working because decimals such as 15.1M weren't considered, so it was extracting "1M"
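The decimal handling described above can be illustrated with the following sketch. Only the function name comes from the commit message; the regex, the K/M/B suffix table and the rounding are assumptions.

```python
import re

_SUFFIXES = {'K': 10**3, 'M': 10**6, 'B': 10**9}


def extract_approx_int(string, default=None):
    """Parse strings such as '15.1M subscribers' into an integer."""
    if not isinstance(string, str):
        return default
    # The optional (?:\.\d+)? group keeps "15.1" together; without it a
    # search can latch onto the digits after the dot and yield "1M".
    match = re.search(r'\b(\d+(?:\.\d+)?)\s*([KMB])?\b',
                      string.replace(',', ''))
    if match is None:
        return default
    number, suffix = match.groups()
    # round() absorbs floating point error from the multiplication.
    return round(float(number) * _SUFFIXES.get(suffix, 1))
```

Plain integers without a suffix pass through the same path with a multiplier of 1.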
* Fix regression: date extraction broken. Move constants to correct file in yt_data_extract | James Taylor | 2019-12-20 | 2 | -2/+2
* Extraction: Move non-stateful signature decryption functionality into yt_data_extract | James Taylor | 2019-12-19 | 2 | -1/+98
* Extraction: Move stuff around in files and put underscores in front of internal helper function names | James Taylor | 2019-12-19 | 3 | -38/+37
|   Move get_captions_url in watch_extraction to bottom next to other exported, public functions
* Extraction: Move html post processing stuff from yt_data_extract to util | James Taylor | 2019-12-19 | 2 | -41/+1
|
* Extraction: Split yt_data_extract.py into multiple files | James Taylor | 2019-12-19 | 4 | -0/+1188