Last updated on May 31st, 2025 at 08:09 am
I think all Pythonistas will be able to relate to this hurdle.
You have a large downloaded asset that you have to analyse & draw insights from.
Now, if you're someone who generally uses cloud IDEs like Google Colab or Replit, this is a huge pain: uploading the large asset can take its sweet time, & only then can you start analysing it & generating insights.
At times, the runtime may even get disconnected while the upload is happening, which is yet another pain because you have to start all over again.
This is what I realised this time when I was about to do a Server Access Log File Analysis. I was already aware of the fact that these log files are HUGE & all I wanted to know was the split of crawl requests from Googlebot alone across the different response codes.
I could have used a log file analysis tool, but my use case was minor, so I didn’t want to go through all of that.
Now, in this blog post, I will explain how you can use Visual Studio Code, specify the local file path in the terminal & have the analysis done within seconds.
Step 1 - Build the Python Script (I am sharing it so you don't have to build it)
import re
import argparse


def analyze_googlebot_status_codes(log_file_path):
    """
    Analyze a log file for Googlebot crawls and count HTTP status codes.

    Args:
        log_file_path (str): Path to the log file.

    Returns:
        dict: Dictionary with status categories as keys and counts as values.
    """
    # Initialize counters
    status_counts = {
        '200': 0,
        '404': 0,
        '3xx': 0,
        'other': 0
    }

    # Patterns to match Googlebot and status codes
    googlebot_pattern = re.compile(r'Googlebot', re.IGNORECASE)
    status_code_pattern = re.compile(r'\s(\d{3})\s')  # Captures 3-digit HTTP status code

    try:
        with open(log_file_path, 'r', encoding='utf-8', errors='ignore') as file:
            for line in file:
                if googlebot_pattern.search(line):
                    match = status_code_pattern.search(line)
                    if match:
                        status_code = match.group(1)
                        if status_code == '200':
                            status_counts['200'] += 1
                        elif status_code == '404':
                            status_counts['404'] += 1
                        elif status_code.startswith('30'):
                            status_counts['3xx'] += 1
                        else:
                            status_counts['other'] += 1
    except FileNotFoundError:
        print(f"Error: Log file not found at {log_file_path}")
        return None
    except Exception as e:
        print(f"Error reading log file: {e}")
        return None

    return status_counts


def main():
    parser = argparse.ArgumentParser(description='Analyze Googlebot crawls by HTTP status code')
    parser.add_argument('--logfile', type=str, required=True,
                        help='Path to the log file to analyze')
    args = parser.parse_args()

    log_file_path = args.logfile
    print(f"Analyzing Googlebot status codes in: {log_file_path}")

    results = analyze_googlebot_status_codes(log_file_path)

    if results:
        print("\nGooglebot Crawl Summary by Status Code:")
        print("-" * 50)
        print(f"200 OK                 : {results['200']}")
        print(f"404 Not Found          : {results['404']}")
        print(f"301/302 Redirects (3xx): {results['3xx']}")
        print(f"Other Statuses         : {results['other']}")
        print("-" * 50)
        total = sum(results.values())
        print(f"Total Googlebot Crawl Entries: {total}")


if __name__ == "__main__":
    main()
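To see what the two regexes in the script actually pick out, here is a quick sketch on a single hypothetical log line in Combined Log Format (the IP, URL & user-agent string below are made up purely for illustration):

```python
import re

# Hypothetical log line in Combined Log Format; all values are invented
sample = ('66.249.66.1 - - [31/May/2025:08:09:00 +0000] '
          '"GET /blog/post HTTP/1.1" 200 5120 "-" '
          '"Mozilla/5.0 (compatible; Googlebot/2.1; '
          '+http://www.google.com/bot.html)"')

googlebot_pattern = re.compile(r'Googlebot', re.IGNORECASE)
status_code_pattern = re.compile(r'\s(\d{3})\s')

if googlebot_pattern.search(sample):
    match = status_code_pattern.search(sample)
    if match:
        print(match.group(1))  # → 200
```

The `\s(\d{3})\s` pattern simply grabs the first whitespace-delimited three-digit number, which in Combined Log Format is the status code; if your server writes a different log format, that pattern may need adjusting.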
Step 2 - Save the Python File in the Folder where your Log File is located & open that Folder in VS Code

By opening the folder where both the Python script & the log file are located, VS Code treats that folder as the workspace & the integrated terminal starts at that directory level. This will prevent the forthcoming directory declaration headaches.
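If you want to convince yourself of this, a tiny check (the file name access.log below is hypothetical) shows how a bare filename resolves against whatever folder the terminal opened in:

```python
import os

# A relative name resolves against the current working directory,
# which is the folder VS Code opened as the workspace
relative = "access.log"  # hypothetical log file name
print(os.path.abspath(relative))
```

So when the script & the log file sit in the opened folder, `--logfile access.log` works without any absolute path.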
Step 3 - Open New Terminal & Run this Line
python statuscodesplit.py --logfile "/pathtoyourlogfile"
My Python script was named statuscodesplit.py, which is why the command reads that way; if yours is called something different, then you will specify that name accordingly.
That's it, once you hit Enter, it will analyse the log file at the path you've declared & then print the summary.
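If you ever want to sanity-check the script's logic without touching a real log file, the same filter-and-count idea can be exercised on a few made-up lines (the IPs, paths & user agents below are invented; `collections.Counter` stands in for the script's hand-rolled dict purely for brevity):

```python
import re
from collections import Counter

# Invented sample lines; in the real script these come from the log file
lines = [
    '66.249.66.1 - - [x] "GET /a HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [x] "GET /b HTTP/1.1" 404 0 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [x] "GET /c HTTP/1.1" 301 0 "-" "Googlebot/2.1"',
    '203.0.113.9 - - [x] "GET /d HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

status = re.compile(r'\s(\d{3})\s')
counts = Counter(
    status.search(line).group(1)
    for line in lines
    if 'googlebot' in line.lower() and status.search(line)
)
print(counts)  # the non-Googlebot line is excluded from the tally
```

Only the three Googlebot lines are counted, split by their status codes, which mirrors what the full script does over the real file.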


Kunjal Chawhan, founder of Decode Digital Market, a Digital Marketer by profession, and a Digital Marketing Niche Blogger by passion, here to share my knowledge.