Fix extract_approx_int not working for non-approx ints, make extract_int more robust

For example, "354 subscribers" wasn't being extracted correctly by extract_approx_int. Make extract_approx_int and extract_int only extract integers that are standalone words, so e.g. 342 will not be extracted from internetuser342.
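The two examples in the message can be checked directly against the old and new patterns from the diff below. This is a minimal sketch: the pattern strings are copied from the diff, while the helper name `first_match` is hypothetical and only mimics the comma stripping done in common.py.

```python
import re

# Patterns copied from the diff: before and after this commit.
OLD_INT = r'(\d+)'
NEW_INT = r'\b(\d+)\b'
OLD_APPROX = r'(\d+(?:\.\d+)?[KMBTkmbt])'   # suffix letter was mandatory
NEW_APPROX = r'\b(\d+(?:\.\d+)?[KMBTkmbt]?)\b'  # suffix optional, word-bounded

def first_match(pattern, text):
    """Hypothetical helper: strip commas like common.py does, return group 1 or None."""
    match = re.search(pattern, text.replace(',', ''))
    return None if match is None else match.group(1)

for text in ['354 subscribers', '1.2K subscribers', 'internetuser342']:
    print(repr(text),
          'approx:', first_match(OLD_APPROX, text), '->', first_match(NEW_APPROX, text),
          'int:', first_match(OLD_INT, text), '->', first_match(NEW_INT, text))
```

With the old approx pattern, "354 subscribers" yields nothing because no digit is followed by a K/M/B/T suffix; making the suffix optional fixes that. The added `\b` anchors stop both patterns from matching digits glued to letters, as in internetuser342.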
author: James Taylor <user234683@users.noreply.github.com> 2019-12-24 13:07:12 -0800
committer: James Taylor <user234683@users.noreply.github.com> 2019-12-24 13:07:12 -0800
commit: 3200d66d880d72ba2c4e687840d31c9c98c66f6a (patch)
tree: f293fec931ee26e4b18f1f6c16e23f4a88a35469 /youtube
parent: a428d47bde199a3837dcc0208cb240a1dac61992 (diff)
download: yt-local-3200d66d880d72ba2c4e687840d31c9c98c66f6a.tar.lz yt-local-3200d66d880d72ba2c4e687840d31c9c98c66f6a.tar.xz yt-local-3200d66d880d72ba2c4e687840d31c9c98c66f6a.zip
1 file changed, 2 insertions, 2 deletions
diff --git a/youtube/yt_data_extract/common.py b/youtube/yt_data_extract/common.py
index 06f0e95..4af76c2 100644
--- a/youtube/yt_data_extract/common.py
+++ b/youtube/yt_data_extract/common.py
@@ -135,7 +135,7 @@ def extract_int(string, default=None):
         string = extract_str(string)
     if not string:
         return default
-    match = re.search(r'(\d+)', string.replace(',', ''))
+    match = re.search(r'\b(\d+)\b', string.replace(',', ''))
     if match is None:
         return default
     try:
@@ -149,7 +149,7 @@ def extract_approx_int(string):
         string = extract_str(string)
     if not string:
         return None
-    match = re.search(r'(\d+(?:\.\d+)?[KMBTkmbt])', string.replace(',', ''))
+    match = re.search(r'\b(\d+(?:\.\d+)?[KMBTkmbt]?)\b', string.replace(',', ''))
     if match is None:
         return None
     return match.group(1)
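For reference, the patched functions can be approximated as standalone code. This is a sketch under stated assumptions, not the file's actual contents: the real functions first pass non-string input through the project's `extract_str` helper (here, non-`str` input simply returns the default), and the `try:` block in `extract_int` is cut off in the diff, so the `int()` conversion with a `ValueError` fallback is assumed.

```python
import re

def extract_int(string, default=None):
    """Sketch of the patched extract_int for plain str input.

    Assumptions: extract_str normalization is replaced by an isinstance check,
    and the truncated try block is taken to convert the match with int().
    """
    if not isinstance(string, str) or not string:
        return default
    # Commas are stripped so "1,234" is read as one number; \b requires a standalone word.
    match = re.search(r'\b(\d+)\b', string.replace(',', ''))
    if match is None:
        return default
    try:
        return int(match.group(1))
    except ValueError:  # defensive; assumed to mirror the original's fallback
        return default

def extract_approx_int(string):
    """Sketch of the patched extract_approx_int: returns matched text such as '1.2M' or '354'."""
    if not isinstance(string, str) or not string:
        return None
    match = re.search(r'\b(\d+(?:\.\d+)?[KMBTkmbt]?)\b', string.replace(',', ''))
    if match is None:
        return None
    return match.group(1)

print(extract_int('1,234 views'))                   # 1234
print(extract_int('internetuser342', default=-1))   # -1
print(extract_approx_int('354 subscribers'))        # 354
print(extract_approx_int('1.2M views'))             # 1.2M
```

Note that `extract_approx_int` returns the matched string rather than a number, so a plain count like "354" and an abbreviated one like "1.2M" come back in the same textual form.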