Disallow: the paths to block; generally written first
Today's theme is robots.txt. robots.txt is the first file a search engine looks at when it visits a site. When a search spider visits a site, it first checks whether robots.txt exists in the site root directory. If it exists, the robot determines what it may access according to the contents of the file; if the file does not exist, all search spiders can access every page that is not password protected.
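The check described above can be sketched in Python with the standard library's urllib.robotparser. This is only a minimal illustration, not how any particular search engine works: the function name may_crawl, the spider name ExampleSpider, and the example.com URLs are assumptions made up for the example, and the file text is passed in directly rather than downloaded.

```python
from urllib.robotparser import RobotFileParser


def may_crawl(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url.

    robots_txt is the text of the site's robots.txt, or None when the
    site has no such file (hypothetical convention for this sketch).
    """
    if robots_txt is None:
        # No robots.txt: every page that is not otherwise protected is open.
        return True
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules and URLs, for illustration only.
RULES = "User-agent: *\nDisallow: /private/\n"

print(may_crawl(None, "ExampleSpider", "https://www.example.com/private/a.html"))   # True
print(may_crawl(RULES, "ExampleSpider", "https://www.example.com/private/a.html"))  # False
print(may_crawl(RULES, "ExampleSpider", "https://www.example.com/index.html"))      # True
```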
The last time the
Sitemap: site map URL
User-Agent: the robot that the following rules apply to; generally fill in "*"
Allow: the paths not to block; generally fill in "/"
Blocking spiders from collecting back-end files also involves other parts of the page code, which I will not explain here. On my own site, I think the directories that can be blocked are cache, include, JS, update, and skins. To avoid foolishly telling others which directory is the administrator directory, the administrator directory is not written here.
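As a rough sketch of what such a file could look like, the snippet below blocks the five directories named above and checks the result with Python's urllib.robotparser. The exact paths are assumptions: it supposes the directories sit directly under the site root with lowercase names, and the example.com host, the sitemap URL, and the spider name ExampleSpider are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Assumed layout: the five directories live directly under the site root.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cache/
Disallow: /include/
Disallow: /js/
Disallow: /update/
Disallow: /skins/
Allow: /
Sitemap: https://www.example.com/sitemap.xml
"""

dir_parser = RobotFileParser()
dir_parser.parse(ROBOTS_TXT.splitlines())

BASE = "https://www.example.com"  # placeholder host
for path in ("/cache/a.html", "/skins/main.css", "/index.html"):
    verdict = "allowed" if dir_parser.can_fetch("ExampleSpider", BASE + path) else "blocked"
    print(path, verdict)

# Sitemap URLs declared in the file (site_maps() needs Python 3.8 or later).
print(dir_parser.site_maps())
```

Note that this particular parser applies the first rule that matches a path, so the Disallow lines need to come before the catch-all Allow line for the blocks to take effect here; other crawlers may resolve conflicts differently.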
I shared <station in Shanghai station; how to love K under the condition of serious>, and many people picked it up; some also contacted me on QQ to learn from me. I have only just come into contact with Shanghai dragon and am not experienced, and my occupation is not in the network industry; this is only my own interest. I often read Lu Songsong, Mou Changqing, and other well-known promotion blogs and websites to keep learning, and I have enough time and patience to test things in practice and draw lessons from them.
spider with their own robots.txt Allow
robots.txt tells the spider how the website's weight is distributed
Introductions to robots.txt are already very clear, so here I will talk about why it is so important to a website. Many webmasters do not add this file to the root directory of their sites. It has its own standard format, which you can look up in a search engine, and you can also use Google webmaster tools to generate it.
As you know, the weight of a website is limited, especially for a grassroots website. Giving every part of the site equal weight is not sensible, and it is a complete waste of server resources (search spiders use more server resources than normal visitors: CPU, IIS, bandwidth). Think of it this way: if your website structure is unclear and the weight is not declared well, the spider cannot determine which content on your site is important and which is your main content.
Someone asked whether the rules can be personalized if you want to block only certain spiders. They can: write the rules for those spiders at the top of the file.
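A minimal sketch of such per-spider rules, again checked with Python's urllib.robotparser: the group for the unwanted spider comes first, followed by the general "*" group. The spider names BadSpider and OtherSpider, the /cache/ path, and the example.com URLs are invented for the example.

```python
from urllib.robotparser import RobotFileParser

# The group for the unwanted spider is written at the top,
# followed by the general group for everyone else.
PER_SPIDER_RULES = """\
User-agent: BadSpider
Disallow: /

User-agent: *
Disallow: /cache/
Allow: /
"""

spider_parser = RobotFileParser()
spider_parser.parse(PER_SPIDER_RULES.splitlines())

HOME = "https://www.example.com/index.html"      # placeholder URLs
CACHED = "https://www.example.com/cache/a.html"

print(spider_parser.can_fetch("BadSpider", HOME))      # False: blocked everywhere
print(spider_parser.can_fetch("OtherSpider", HOME))    # True: falls under "*"
print(spider_parser.can_fetch("OtherSpider", CACHED))  # False: /cache/ is blocked for all
```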