Anchor Text Audit with Python – Find Cannibalization in Seconds

When it comes to internal linking in SEO, the opportunities are limitless.

There are various ways to build internal links, direct the flow of authority across your pages, and help them rank better.

At the same time, internal links can cause cannibalization when one competitive exact-match anchor is used to link to more than one page.

For example, the anchor text is "money plant" and it is used to link to both:

  • https://example.com/category/money-plant
  • https://example.com/blog/random-blog

This is an issue because we are confusing Google about which page we consider to be about money plants.

Which page deserves to rank for the money plant keyword?

It also splits authority between the two pages.

Solution: Run a full site crawl and manually analyse the export in Google Sheets or Excel to find anchors that link to more than one distinct page. This can take its sweet time.

This is where Python comes to save the day.

Python Script to Find Anchors Linking to >1 Distinct Page

[Screenshot: Python script for anchor text]

To use this script (it is easiest to run on Replit), upload a CSV containing your Screaming Frog export of inlinks.

The CSV should contain the following columns: source_url, target_url, anchor_text, follow_status, http_status. The script itself reads only anchor_text and target_url.
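A raw export usually comes with its own column headers, so they need renaming before the script can read them. Below is a minimal sketch of that step. The header names in `COLUMN_MAP` (Source, Destination, Anchor, Follow, Status Code) are assumptions about your export and should be checked against your own file, and `prepare_inlinks` is a hypothetical helper name.

```python
import pandas as pd

# Assumed export headers -> column names the script expects.
# Check these against the header row of your own export.
COLUMN_MAP = {
    "Source": "source_url",
    "Destination": "target_url",
    "Anchor": "anchor_text",
    "Follow": "follow_status",
    "Status Code": "http_status",
}

def prepare_inlinks(df, column_map=COLUMN_MAP):
    """Rename export columns and keep only the ones the script needs."""
    missing = [col for col in column_map if col not in df.columns]
    if missing:
        raise KeyError(f"Columns not found in export: {missing}")
    return df.rename(columns=column_map)[list(column_map.values())]

# Tiny in-memory table standing in for the export (illustration data only)
raw = pd.DataFrame({
    "Type": ["Hyperlink"],
    "Source": ["https://example.com/"],
    "Destination": ["https://example.com/category/money-plant"],
    "Anchor": ["money plant"],
    "Follow": [True],
    "Status Code": [200],
})
clean = prepare_inlinks(raw)
print(list(clean.columns))
```

With a real file you would pass `pd.read_csv(...)` of your export to `prepare_inlinks` and save the result with `to_csv(..., index=False)` before running the audit script.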

Once you run the script, it reads the data from your CSV and outputs the anchors that link to more than one distinct page.

You can see this in the screenshot above. You can also drill down by uploading an export that contains only selected anchors. Instead of scanning every internal link anchor on the website, Python then looks only at the anchors that are important to you, which saves even more time.

How long does the script take to run? Mere seconds. If the upload is big, it can take about a minute.
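If you would rather keep one full export and narrow it down in code, a filter like the following could work. This is a sketch, not part of the original script: `important_anchors` is a hypothetical list you would fill in yourself, and matching that ignores case and surrounding spaces is an assumption about how you want anchors compared.

```python
import pandas as pd

# Hypothetical list of the anchors you care about
important_anchors = ["money plant", "snake plant"]

def filter_anchors(df, anchors):
    """Keep only rows whose anchor text matches one of the given anchors."""
    wanted = {a.strip().lower() for a in anchors}
    normalized = df["anchor_text"].astype(str).str.strip().str.lower()
    return df[normalized.isin(wanted)]

# Tiny stand-in for the inlinks data (illustration only)
df = pd.DataFrame({
    "anchor_text": ["Money Plant", "read more", "money plant "],
    "target_url": [
        "https://example.com/category/money-plant",
        "https://example.com/blog/random-blog",
        "https://example.com/blog/random-blog",
    ],
})
selected = filter_anchors(df, important_anchors)
print(selected)
```

The filtered table keeps the original anchor text, so it can be fed straight into the audit in place of the full data.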

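Here is the script:

import pandas as pd

# Read in the CSV file containing internal linking data
df = pd.read_csv('internal_linking_data.csv')

# Skip rows with missing anchor text or target URL so empty values are not reported as anchors
df = df.dropna(subset=['anchor_text', 'target_url'])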

# Create a dictionary of anchor text and the pages it links to
anchors = {}
for i, row in df.iterrows():
    if row['anchor_text'] in anchors:
        anchors[row['anchor_text']].append(row['target_url'])
    else:
        anchors[row['anchor_text']] = [row['target_url']]

# Find the exact match anchors that are linking to more than one page and output the pages
exact_match_anchors = {}
for anchor in anchors:
    target_urls = anchors[anchor]
    unique_target_urls = list(set(target_urls))
    if len(unique_target_urls) > 1:
        exact_match_anchors[anchor] = unique_target_urls

# Output the results
print("Exact match anchors linking to more than one page: ")
for anchor in exact_match_anchors:
    print("Anchor: " + str(anchor))
    print("Count of links: " + str(len(anchors[anchor])))
    print("Links to pages:")
    for target_url in exact_match_anchors[anchor]:
        print("- " + target_url)
    print()

Replace internal_linking_data.csv with the name of your own file.

That’s it! This is how you can use this Python script to audit your anchor text.
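For larger exports, the same check can also be written with pandas grouping instead of a row-by-row loop. The version below is an alternative sketch, not the script above. `find_conflicting_anchors` is a hypothetical name, and it assumes the `anchor_text` and `target_url` columns described earlier.

```python
import pandas as pd

def find_conflicting_anchors(df):
    """Return anchors that link to more than one distinct target URL."""
    links = df.dropna(subset=["anchor_text", "target_url"])
    summary = links.groupby("anchor_text")["target_url"].agg(
        link_count="count",        # total links using this anchor
        distinct_pages="nunique",  # how many different pages they point to
        pages=lambda urls: " | ".join(sorted(set(urls))),
    )
    return summary[summary["distinct_pages"] > 1]

# Illustration data only, reusing the money plant example
df = pd.DataFrame({
    "anchor_text": ["money plant", "money plant", "money plant", "read more"],
    "target_url": [
        "https://example.com/category/money-plant",
        "https://example.com/blog/random-blog",
        "https://example.com/category/money-plant",
        "https://example.com/blog/random-blog",
    ],
})
conflicts = find_conflicting_anchors(df)
print(conflicts)
```

The result is a table with one row per conflicting anchor, which can be written out with `to_csv` if you want to review it in a spreadsheet.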
