Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | extract_item_info: Don't extract author, author_id, etc. for channel items | James Taylor | 2019-12-24 | 1 | -7/+8 |
| | | | | Philosophically, a channel doesn't create itself. | ||||
* | Fix extract_approx_int not working for non-approx ints, make extract_int more robust | James Taylor | 2019-12-24 | 1 | -2/+2 |
| | | | | For example, "354 subscribers" wasn't being extracted correctly by extract_approx_int. Make extract_approx_int and extract_int only extract integers that are words, so e.g. 342 will not be extracted from internetuser342. | ||||
* | Rewrite channel extraction with proper error handling and new extraction names. | James Taylor | 2019-12-21 | 1 | -2/+5 |
| | | | | Extract subscriber_count correctly. Don't just shove English strings into info['stats']. Actually give semantic names for the stats. | ||||
* | Fix extract_approx_int. Fixes incorrect subscriber count on channels. | James Taylor | 2019-12-21 | 1 | -2/+2 |
| | | | | It wasn't working because decimals such as 15.1M weren't considered, so it was extracting "1M" | ||||
* | Fix regression: date extraction broken. Move constants to correct file in yt_data_extract | James Taylor | 2019-12-20 | 1 | -1/+2 |
* | Extraction: Move stuff around in files and put underscores in front of internal helper function names | James Taylor | 2019-12-19 | 1 | -9/+8 |
| | | | | Move get_captions_url in watch_extraction to bottom next to other exported, public functions | ||||
* | Extraction: Move html post processing stuff from yt_data_extract to util | James Taylor | 2019-12-19 | 1 | -39/+0 |
* | Extraction: Split yt_data_extract.py into multiple files | James Taylor | 2019-12-19 | 1 | -0/+455 |
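
The behaviour described in the extract_int and extract_approx_int commits above can be illustrated with a minimal sketch. This is not the project's actual code: the function names are borrowed from the commit messages, while the regular expressions, the K/M/B suffix table, and the choice to return None on no match are assumptions made for illustration.

```python
import re

# Hypothetical sketch of the behaviour described in the commit messages;
# the real yt_data_extract implementation may differ.

# An integer that stands alone as a word, optionally with comma separators.
_INT_RE = re.compile(r'\b\d+(?:,\d+)*\b')

# A number with an optional decimal part and an optional K/M/B suffix.
# Including the decimal part avoids reading "15.1M" as "1M".
_APPROX_RE = re.compile(r'\b(\d+(?:\.\d+)?)([KkMmBb])?\b')

# Assumed suffix multipliers.
_MULTIPLIERS = {'k': 10**3, 'm': 10**6, 'b': 10**9}


def extract_int(text):
    """Return the first whole-word integer in text, or None."""
    match = _INT_RE.search(text)
    if match is None:
        return None
    return int(match.group(0).replace(',', ''))


def extract_approx_int(text):
    """Return the first number in text, scaled by its suffix, or None."""
    match = _APPROX_RE.search(text)
    if match is None:
        return None
    number, suffix = match.groups()
    multiplier = _MULTIPLIERS[suffix.lower()] if suffix else 1
    # round() guards against floating point error in e.g. 15.1 * 10**6
    return int(round(float(number) * multiplier))


print(extract_approx_int('354 subscribers'))
print(extract_approx_int('15.1M subscribers'))
print(extract_int('internetuser342'))
```

The word boundaries (`\b`) are what keep digits embedded in a longer token, such as a username, from being picked up as a count.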