| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
| |
Specifically, fix failures when any of the fields from the parsed
comment are None, such as author, author_url, etc.
(the failures were due to string concatenation with None when building URLs).
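A minimal sketch of the guard, using a hypothetical field name (the real code builds several such URLs):

```python
# Minimal sketch, assuming a hypothetical author_id field: concatenating
# None onto a string raises TypeError, so fall back to an empty link.
def make_author_url(author_id):
    if author_id is None:
        return ''
    return 'https://www.youtube.com/channel/' + author_id
```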
|
|
|
|
|
|
| |
It would be 30, since the old method looked for where the latest
video in the database appeared in the new batch of videos. The new
method finds the first video in the new batch that is already in the database.
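A sketch of the two approaches, under assumed data shapes (batch_ids newest-first, db_ids the set of already-stored video ids):

```python
def count_new_videos_old(batch_ids, latest_db_id):
    # old: find where the latest video in the database sits in the batch
    return batch_ids.index(latest_db_id)

def count_new_videos_new(batch_ids, db_ids):
    # new: count everything before the first batch video already stored
    for position, video_id in enumerate(batch_ids):
        if video_id in db_ids:
            return position
    return len(batch_ids)
```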
|
|
|
|
|
|
|
| |
The try statement was missing an except clause, so if there
was an exception, such as the Tor Browser being closed or
a 429 error occurring during the request, the workers would
crash one by one until there were none left to handle checking.
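A minimal sketch of the fix, with a hypothetical worker setup:

```python
# check() stands in for the real request logic and may raise on errors
# such as a closed proxy or a 429 response.
def worker(task_queue):
    while True:
        item = task_queue.get()
        try:
            check(item)
        except Exception as e:
            print('Check failed:', e)   # log and keep the worker alive
        finally:
            task_queue.task_done()
```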
|
|
|
|
|
|
|
|
| |
On Debian, for instance, the default font DejaVu Sans is bigger
than the default Calibri/Times on Windows, messing up the layout
in some places. The font size in video items was adjusted
slightly to accommodate the change to Liberation Sans as the
default.
|
| |
|
|
|
|
|
| |
This is likely not a big deal since it is already assumed that video file server logs are not plugged into
Google's tracking infrastructure, but it doesn't hurt to give less info.
|
|
|
|
| |
Display a descriptive error message instead of a traceback
|
|
|
|
|
|
|
|
| |
The urllib3 max redirect amount was previously set to 3. Change it to 10 and
do not fail if there is a problem with checking for URL access; just print
the error to the console and proceed.
Also add an unrelated remark about the bcptr=9999999999 parameter in watch.py.
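A sketch of the redirect change under an assumed pool setup (the project's actual configuration may differ):

```python
import urllib3

# raise the redirect limit to 10 and treat access-check errors as non-fatal
pool = urllib3.PoolManager(retries=urllib3.util.Retry(redirect=10))

def url_is_accessible(url):
    try:
        response = pool.request('HEAD', url)
        return response.status < 400
    except Exception as e:
        print('Could not check URL access:', e)   # print and proceed
        return True
```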
|
|
|
|
|
| |
'ip_address' was not set when no formats are available.
'allowed_countries' was set to None rather than [] in extract_desktop_info, which, it turns out, is the function that gets used in these cases.
|
|
|
|
| |
The working directory is not the directory of the program.
|
|
|
|
| |
Because the invidious formats don't have all the information
|
|
|
|
| |
Should help with debugging various content blocks
|
|
|
|
| |
The New Identity button suffices to get the SOCKS proxy to use a new exit node.
|
|
|
|
| |
403 errors on the video URLs typically happen when a video has copyrighted content or was originally livestreamed. However, they appear not to happen (or at least happen less frequently) if the Tor exit node uses IPv6.
|
|
|
|
| |
The youtube-dl formats table doesn't have all the necessary information.
|
|
|
|
|
|
| |
These occur when too many requests are coming from a Tor exit node.
Before, there would be an error page with an exception message instructing users to report the issue,
but this is an expected and persistent occurrence.
|
|
|
|
|
|
|
|
| |
The function body regex was capturing some unrelated new code before the actual function body. Example:
`function(a){a=a.split("");var b=[function(c,d){d=(d%c.length+c.length)%c.length;c.splice(-d).reverse().forEach(function(e){return c.unshift(e)}`
If you look closely, the closing bracket doesn't match the opening one. I have added `{` to the `[^\}]+` part (making it `[^\{\}]+`) to make sure it only captures matching brackets. Additionally, I've added `return a\.join\(""\)` to the end for good measure.
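A simplified sketch of the tightened pattern (not the project's exact regex):

```python
import re

# [^\{\}]+ cannot swallow an unmatched '{', and anchoring on
# return a.join("") stops the match at the true end of the function body.
DECIPHER_FUNCTION_RE = re.compile(
    r'function\(a\)\{a=a\.split\(""\);[^\{\}]+return a\.join\(""\)\}')

match = DECIPHER_FUNCTION_RE.search(base_js)   # base_js: player javascript
if match:
    function_body = match.group(0)
```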
|
|
|
|
|
|
| |
playlist is chosen when using "add to playlist"
See #4
|
| |
|
| |
|
|
|
|
| |
Also change superfluous use of multi_get to item.get nearby
|
|
|
|
|
| |
By checking whether it's in item_types before checking whether it can be dug into.
For example, this allows extracting things like sectionListRenderer.
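A hypothetical sketch of the reordered check (item_types is from the commit; the other names are illustrative and the real traversal code differs):

```python
def process_renderer(key, value, items):
    if key in item_types:   # checked first now, e.g. 'sectionListRenderer'
        items.append(extract_item(key, value))           # hypothetical
    elif key in nested_renderer_types:                   # hypothetical
        # only renderers that are not themselves items get dug into
        for child in value.get('contents', []):
            for child_key, child_value in child.items():
                process_renderer(child_key, child_value, items)
```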
|
|
|
|
|
|
| |
independently.
extract_items_from_renderer will extract items given just a renderer rather than a whole response.
|
|
|
|
|
|
|
| |
Change usage of multi_deep_get to multi_get where possible.
Remove type checking from calls to the get functions (it's very unlikely Youtube will suddenly change a value's type without also changing its name, and the checks take up unnecessary space).
Remove all default=None arguments from calls to the get functions, since those are superfluous.
Remove the list_types constant since it's no longer in use.
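A hypothetical sketch of the two helpers, to illustrate the difference (the project's real signatures may vary):

```python
def multi_get(dictionary, *keys, default=None):
    # try several alternative keys at a single level
    for key in keys:
        if key in dictionary:
            return dictionary[key]
    return default

def multi_deep_get(dictionary, *key_paths, default=None):
    # try several alternative key paths, descending through nesting
    for path in key_paths:
        current = dictionary
        for key in path:
            try:
                current = current[key]
            except (TypeError, IndexError, KeyError):
                break
        else:
            return current
    return default
```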
|
| |
|
| |
|
|
|
|
| |
Philosophically, a channel doesn't create itself.
|
|
|
|
|
|
|
|
| |
more robust.
For example, "354 subscribers" wasn't being extracted correctly by extract_approx_int.
Make extract_approx_int and extract_int only extract integers that are standalone words,
so that e.g. 342 will not be extracted from internetuser342.
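A sketch of the word-boundary fix, simplified from the real helpers:

```python
import re

re.search(r'\d+', 'internetuser342')      # matches '342' (too eager)
re.search(r'\b\d+\b', 'internetuser342')  # no match, as desired
re.search(r'\b\d+\b', '354 subscribers')  # matches '354'
```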
|
| |
|
|
|
|
|
|
| |
items.
Examples: Empty channels, no search results
|
|
|
|
| |
Youtube doesn't give the number of search results, so the previous behavior would give an error if an out-of-range page number was selected.
|
|
|
|
| |
Don't display a nasty traceback in that case.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
than just channel id.
It was previously set to a fake value of 1000 in order to ensure there would be enough page buttons.
This was because two sequential requests are necessary (one to get the channel id corresponding to the custom URL, another to get the number of videos from the "all uploaded videos" playlist, the URL for which can be generated from the channel id).
Since Tor has high latency, I thought at the time that this would be too slow, but in practice it's not too big of a deal.
Introduces a cachetools dependency in order to cache the function which gets the number of videos.
The get_channel_id function has also been fixed, since the ajax API seems to have been removed.
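A sketch of the caching, with assumed cache parameters (the project may size these differently):

```python
from cachetools import cached, TTLCache

# repeated page loads for a channel reuse the cached video count instead
# of making another slow round trip over Tor
@cached(cache=TTLCache(maxsize=128, ttl=5*60))
def get_number_of_videos(channel_id):
    # the uploads playlist id is derivable from the channel id: UC... -> UU...
    playlist_url = 'https://www.youtube.com/playlist?list=UU' + channel_id[2:]
    return fetch_video_count(playlist_url)   # hypothetical fetch helper
```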
|
|
|
|
|
| |
Deduplicates the code. The channel_id logic was previously separate because of the need to get the number of videos and handle different page numbers.
Also makes search work for general URLs, not just channel_id URLs.
|
|
|
|
|
|
| |
names. Extract subscriber_count correctly.
Don't just shove English strings into info['stats']; actually give the stats semantic names.
|
|
|
|
| |
It wasn't working because decimals such as 15.1M weren't considered, so it was extracting "1M".
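A sketch of the difference (simplified pattern, not the exact one used):

```python
import re

re.search(r'\d+[KMB]', '15.1M').group(0)            # '1M'  (old behavior)
re.search(r'\d+(?:\.\d+)?[KMB]', '15.1M').group(0)  # '15.1M'
```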
|
| |
|
| |
|
|
|
|
|
| |
- Correctly handle /embed and /watch URLs with no video ids
- Correctly report an error for this and for video ids that are too short
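A hypothetical sketch of the check (video ids are 11 characters; the real routing code differs):

```python
def video_id_error(video_id):
    if not video_id:
        return 'No video id specified'
    if len(video_id) < 11:
        return 'Incomplete video id (too short): ' + video_id
    return None   # id is plausibly valid
```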
|
|
|
|
|
|
| |
page in general.
Also add a link to GitHub for reporting the exception.
|
| |
|
| |
|
|
|
|
| |
Especially for the light theme
|
|
|
|
| |
yt_data_extract
|
|
|
|
| |
Otherwise, it wasn't clear enough that a tag was selected.
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Commits in this branch are prefixed with "Extraction:"
This branch refactors data extraction. All such functionality has been moved to the yt_data_extract module.
Responses from requests are given to the module and it parses them into a consistent, more useful format.
The dependency on youtube-dl has also been dropped and this functionality has been built from scratch for these reasons:
(1) I've noticed youtube-dl breaks more often than invidious (which uses watch page extraction built from scratch) in response to changes from Youtube, so I'm hoping what I wrote will also be less brittle.
(2) Such breakage is inconvenient because I have to manually merge the fixes since I had to make changes to youtube-dl to make it do things such as extracting related videos.
(3) I have no control over error handling and request pooling with youtube-dl, since it does all the requests (these would require intrusive changes I don't want to maintain).
(4) I will now be able to finally display the number of comments and whether comments are disabled without making additional requests.
|
| |
| |
| |
| | |
yt_data_extract
|
| |
| |
| |
| |
| |
| | |
internal helper function names.
Move get_captions_url in watch_extraction to the bottom, next to the other exported, public functions.
|
| | |
|