Last updated: 2024-03-16 by 888u
Robots.txt is a text file that gives instructions to search engine robots and is used for SEO. Used correctly, it helps search engine robots (also called crawlers or spiders) crawl and index your website's pages properly. Used incorrectly, it can hurt SEO rankings and website traffic. So how do you set up the robots.txt file correctly? Today I'd like to share my experience, covering the following aspects.
1. What is robots.txt?
robots.txt is a plain text file placed in the root directory of the website; it does not exist by default, so you need to create it yourself.
If your website's domain is www.abc.com, you can view the file at www.abc.com/robots.txt.
Robots.txt contains a set of search engine robot instructions.
When a search engine robot visits your website, it first checks the contents of the robots.txt file, then crawls and indexes the site's pages according to those instructions, including some pages and leaving out others.
Note that the robots.txt file is not a mandatory setting. Whether to create one, why, and what good it does, I will explain in detail below.
2. What is the use of robots.txt for SEO?
To put it simply, robots.txt does two things: it allows or prevents search engine robots from crawling your website's pages. Without it, search engine robots will crawl the entire website, including all the data in the site's root directory.
For the specific working principle, you can refer to elliance's illustration of how crawlers handle robots.txt.
In 1993, the Internet was just getting started and there were very few discoverable websites. Matthew Gray wrote a spider program, the World Wide Web Wanderer, to discover and collect new websites for a web directory.
But crawlers were soon used for more than building directories; some also crawled and downloaded large amounts of website data.
In July of the same year, the website of Aliweb founder Martijn Koster was maliciously crawled, so he proposed the robots protocol.
Its purpose was to tell spiders which web pages may be crawled and which may not, especially pages with data the site owner does not want seen. After a series of discussions, robots.txt officially entered the stage of history.
From an SEO perspective, a newly launched website has few pages, so robots.txt is optional. As the number of pages grows, however, the SEO value of robots.txt shows itself, mainly in the following ways:
- Optimize the crawling of search engine robots
- Prevent malicious crawling and optimize server resources
- Reduce duplicate content appearing in search results
- Keep hidden or private page links from appearing in search results
3. How to write robots.txt for SEO?
First of all, there is no default format for robots.txt files.
robots.txt is written with four main directives: User-agent, Disallow, Allow and Crawl-delay (a combined example follows this list).
- User-agent: the search engine robot the rules apply to; * means all search engines
- Disallow: the website content and folders you want to block from crawling, prefixed with /
- Allow: the website content, folders and links you explicitly permit crawling, prefixed with /
- Crawl-delay: followed by a number, sets a delay between crawls; generally unnecessary for small websites
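Here is a combined sketch using all four directives together (the /private/ paths and the 10-second delay are made-up values, for illustration only):
User-agent: *
Crawl-delay: 10
Disallow: /private/
Allow: /private/public-page.html
Note that Googlebot ignores Crawl-delay; it is honored mainly by other engines such as Bing.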
For example, if you want to prohibit Google's robot from crawling your site's category pages, you would write:
User-agent: Googlebot
Disallow: /category/
For example, if you want to prohibit all search engines from crawling the WordPress admin area, write:
User-agent: *
Disallow: /wp-admin/
For example, if you want only Google Images (Googlebot-Image) to crawl your WordPress site's images, you need to allow that robot and block the rest:
User-agent: Googlebot-Image
Allow: /wp-content/uploads/
User-agent: *
Disallow: /wp-content/uploads/
For more detailed syntax, you can refer to Google's official robots.txt documentation.
Although these directives may look complicated, everything becomes much simpler if you use WordPress; after all, Google treats wp like its own child. As far as SEO is concerned, the best way to write robots.txt for a WordPress website is as follows:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://www.yourdomain.com/sitemap.xml
Or like this:
User-agent: *
Allow: /
Sitemap: https://www.yourdomain.com/sitemap.xml
The difference between the two is whether /wp-admin/ is blocked from crawling (admin-ajax.php stays allowed in the first version because front-end features of many themes and plugins rely on it).
Regarding /wp-admin/, WordPress has sent the header X-Robots-Tag: noindex on admin pages since 2012, which has much the same effect as blocking /wp-admin/ in robots.txt. If you are still worried, you can add the rule anyway.
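For reference, this is what that header looks like in an HTTP response (a minimal illustration of the header itself, not WordPress's exact output):
X-Robots-Tag: noindex
You can check whether a page sends it by inspecting the response headers in your browser's developer tools.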
As for other website content and links you don't want search engines to crawl, handle them according to your own site's needs.
You can block crawling with robots.txt, or use a meta robots tag to noindex. My personal take: use meta robots for the links that the wp program generates itself, and robots.txt for website content pages that need to stay hidden.
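For reference, a meta robots noindex rule is a single tag placed in a page's <head>; a generic example (not WordPress-specific output) looks like this:
<meta name="robots" content="noindex">
Keep in mind that a robot can only see this tag if it is allowed to crawl the page, so avoid combining a noindex tag with a robots.txt Disallow rule for the same URL.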
Summary
The next thing to do is add the finished robots.txt file to your WordPress site's root directory.
In my experience, the fewer instructions in robots.txt, the better. As a novice, I followed some impressive-sounding articles and blocked many directories and content, especially /wp-includes/, which directly stopped JS and CSS from running properly.
Finally, please note that path values in robots.txt rules are case-sensitive (/wp-admin/ and /WP-Admin/ are different paths), so don't get them wrong.
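If you want to sanity-check your rules, including that case-sensitivity, before uploading, Python's standard library ships a robots.txt parser. A minimal sketch, assuming the recommended WordPress rules above and a placeholder domain:

from urllib import robotparser

# The rules under test: the recommended WordPress robots.txt from above.
rules = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Path matching is case-sensitive: /wp-admin/ is blocked, /WP-ADMIN/ is not.
print(rp.can_fetch("*", "https://www.yourdomain.com/wp-admin/"))       # False
print(rp.can_fetch("*", "https://www.yourdomain.com/WP-ADMIN/"))       # True
print(rp.can_fetch("*", "https://www.yourdomain.com/a-normal-page/"))  # True

One caveat: Python's parser applies rules in file order, while Google uses the most specific match, so results for Allow exceptions such as admin-ajax.php can differ between the two.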