How Google Crawls Websites

I recently watched a YouTube video hosted by Gary Illyes of Google that discusses how websites and posts are crawled.

I thought it might be useful to summarize this - it's not often Google gives us insights into how it works!

Having written your posts, you want them to be indexed and ranked highly.

If your content is not indexed, it won't be in Google's search engine, and nobody will find it.

Before this can happen, Google needs to find your content and know that it exists, which it does using a crawler called Googlebot.

One thing you have to realize is that there are trillions of URLs floating around the internet, and some of them will never be discovered.

Not all posts will be crawled.

Google uses a complex algorithm to decide which sites it will crawl, how often it will crawl them, and how many posts and pages it will look at.

The majority of new posts are found by following a link from a page that Google already knows about.

That is why internal linking is so important.

Once you have written a post, ensure that one of your indexed posts includes a link that leads to the new content.

But it has to be relevant!

Don't try to link two different topics together if it doesn't make sense.
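As a rough way to check the internal-linking step above, here is a minimal Python sketch that tests whether the HTML of an older post contains a link to a new post. The URLs, the sample HTML, and the helper names (LinkCollector, links_to) are made up for illustration; it only tells you whether a link exists, not whether Google has crawled it or considers it relevant.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the absolute URL of every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.add(urljoin(self.base_url, value).rstrip("/"))


def links_to(page_html, page_url, target_url):
    """Return True if page_html contains a link to target_url."""
    collector = LinkCollector(page_url)
    collector.feed(page_html)
    return target_url.rstrip("/") in collector.links


# Hypothetical example: an older, already-indexed post and a new post.
old_post_url = "https://example.com/blog/old-post/"
old_post_html = '<p>See my <a href="/blog/new-post/">new post</a>.</p>'
new_post_url = "https://example.com/blog/new-post/"

print(links_to(old_post_html, old_post_url, new_post_url))
```

In practice you would download the HTML of the older post rather than paste it in as a string.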

Google also crawls category pages, so categorizing is another key step after writing your post.

All posts should be categorized, not just for Google's crawlers but for easy navigation.

Having a sitemap will also help Google crawl your site and find your new posts.
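Most website platforms or SEO plugins create a sitemap for you, but to show what one actually contains, here is a minimal Python sketch that builds one by hand. The example URLs and the build_sitemap helper are made up for illustration; the namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML (as a string) listing the given page URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical posts on a hypothetical site.
sitemap_xml = build_sitemap([
    "https://example.com/blog/old-post/",
    "https://example.com/blog/new-post/",
])
print(sitemap_xml)
```

You would save the output as a UTF-8 file (commonly sitemap.xml) on your site and submit its address in Search Console.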

Google Search Console Reports

This is a bitter pill to swallow, but you cannot force Google to crawl or index your posts if it thinks your content is just not good enough!

The quality of the content is a relevant factor when Google is deciding whether to download or "fetch" your posts.

If Google thinks it doesn't meet the standard required, then it won't crawl the post.

Or even if it does crawl it, it still might not be indexed.

Re-submitting your post through Google Search Console over and over again, without having added anything new, more informative, or more unique, will not change the situation!

So, finally, let's look at two of the "not indexed" statuses in Search Console.

Discovered - currently not indexed:

The page was found by Google, but not crawled yet.

Typically, Google wanted to crawl the URL but this was expected to overload the site; therefore Google rescheduled the crawl.

This is why the last crawl date is empty on the report.

Crawled - currently not indexed:

The page was crawled by Google but not indexed.

It may or may not be indexed in the future; no need to resubmit this URL for crawling.
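If you have a lot of affected pages, it can help to group them by status before deciding what to do. Here is a minimal Python sketch, assuming you have already copied the URLs and their statuses out of Search Console into a list by hand. The URLs, the NEXT_STEP notes (paraphrased from the descriptions above), and the group_by_status helper are illustrative, not anything Google provides.

```python
from collections import defaultdict

# What each status means, paraphrased from the descriptions above.
NEXT_STEP = {
    "Discovered - currently not indexed":
        "Found but not crawled yet; Google rescheduled the crawl, so wait.",
    "Crawled - currently not indexed":
        "Crawled but not indexed; improve the content rather than resubmit.",
}


def group_by_status(pages):
    """Group (url, status) pairs into a dict of status -> list of URLs."""
    grouped = defaultdict(list)
    for url, status in pages:
        grouped[status].append(url)
    return dict(grouped)


# Hypothetical pages and statuses, typed in by hand.
pages = [
    ("https://example.com/post-a/", "Discovered - currently not indexed"),
    ("https://example.com/post-b/", "Crawled - currently not indexed"),
    ("https://example.com/post-c/", "Crawled - currently not indexed"),
]

for status, urls in group_by_status(pages).items():
    note = NEXT_STEP.get(status, "No note for this status.")
    print(f"{status} ({len(urls)} page(s)): {note}")
    for url in urls:
        print(f"  {url}")
```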


I hope this helps, but don't shoot the messenger!

Yes, we all know that Google's search results are in a terrible state currently and that some very poor content is ranking highly.

https://my.wealthyaffiliate.com/parthab/blog/what-does-google-want-heres-what-i-think-youre-not-going-to-like-it

Thank you for reading.



Recent Comments

20

Thank you Diane

Understanding how Google crawls and indexes helps us. I have been lucky so far with my new website: all my pages and posts have at least been indexed by Google.

Jeff

Thanks, Diane!

I think that Google is a victim of its definition of domain authority. It needs to use its AI to sort out which high DA sites actually meet its EEAT criteria.

In the guitar niche, the majority of high-DA UGC sites have info that's based purely on opinion, and a lot of the "factual" info is incorrect.

Google knows this, which is why a great article from a trusted affiliate marketing site with a fairly low DA could outrank them prior to the latest HCU.

Massively cranking up article output using AI to create mediocre content isn't going to get us where we want to go in the long run, for those of us who are doing that.

Just my opinion. 😎

Rock On! 🤘
Frank 🎸

An excellent informative read as ever Diane, but what do you know about de-indexing?

As I mentioned to you before, I started a new experimental basic question and answer site just before Christmas and things were going well.... until a couple of weeks ago!

Pretty much every post was indexed within a couple of days as they continue to be now...

But when I recently checked some stats in GSC, I noticed that 24 of my earlier posts have been de-indexed in the last two weeks!!

6 each on Wednesday then Sunday then Wednesday and Sunday again! I take it these two days of the week are when the site gets crawled...

While I am fully aware that not every single post will get indexed, I am just curious as to why these posts were deemed good enough to be indexed before, some were even ranking and getting the odd click or two... and now they are not???

I'm up to 110 posts so far and until 2 weeks ago impressions were increasing to over 400 a day, but since this de-indexing whatever, they have dropped to around 200 daily!!

Like I said, it's just an experiment that I will give 6 months, if it works... great!! If not.... I won't lose any sleep over it!!

But... with your insider knowledge of all things Google can you shed some light on what is going on??

Should I just wait it out until Google sorts itself out or go back and update all the posts and submit them again??

If anyone else reading this has the same problem or knows what is going on then please chime in!!

Have a fantastic start to the weekend Diane!!

👍🏻🍷🍷

Hi - it's difficult to say really.

I wonder if Google was simply testing your posts, as it does with new sites, and then if you weren't getting the clicks, decided there was no point in keeping them in the index.

Were the queries you were ranking for, the ones that you were actually targeting?

Could you add any new, unique content to the posts?

Is it the likes of Quora and Reddit ranking number 1 for your keywords?

So many variables!

Yep... it is difficult to say Diane!

But, at the time the site was only 6 weeks old and from my limited experience I believed it normally takes at least six months for Madame Google to test new posts against more established ones in order to decide on the well... temporary pecking order!

Would she really de-index them after a month or so which is a tiny amount of time in my opinion??

Some queries were, but a lot were not...

I could always add more unique content to the posts, but... that could be never ending and I learnt a while back it is best not to strive for perfection as that is a battle that never stops!

Some forums were ranking higher sometimes, as were high DA sites, but on others it was absolute fluff (no offense intended)!!!

Too many variables indeed.. I'll just continue to plod along until the six month mark when I intend to have around 400 posts published and take it from there!!

Appreciate your feedback my friend!!

👍🏻🍷

Thank you Diane. As always, your posts are on hit, very informative, and quite helpful. Keep them coming! :)

Kevin

Thanks Diane,
I have been very lucky so far that virtually all of my posts have been indexed, but I have 1 post on my new WA promoting site that just won't get indexed.
I don't think I ever categorized this one, so I will add that, as I didn't know that had any impact on Google searches. I am going to rewrite it and see if this works.

Thanks as always - currently you are one of the only WA members whose posts I ALWAYS read, and that is because I know I will learn something if I do (and I know they will always be to the point :-)). Keep them coming
Cheers
Pete
