According to RFC 3986, the path portion of a URI is case-sensitive.
Google and other search engines follow this rule in their crawlers, so pages whose URLs differ only by case are treated as different pages.
This can be problematic when working in a case-insensitive environment, such as IIS on Windows, because the rule is ignored there and any variation in case still returns the same page. This leads to duplicate content issues and can also dilute the number of inbound links credited to a particular page.
As a general rule, I always keep my internal site links in lower case, but you cannot control how third parties link to you. If someone links to you using upper case and a search engine spiders that link, the page could be mistaken for a duplicate of its lower case alternative.
These all serve up the same page, but as per the RFC should be treated as 3 different pages. We want to prevent these links from being mistaken for different pages by redirecting requests for URLs containing uppercase letters to the lowercase equivalent.
In my particular case, I have two other problems to overcome because I am using a shared hosting environment. Firstly, I cannot use HttpHandlers on the server so my fix had to be in the code-behind of any pages I want to apply it to (limiting it to aspx files). Secondly, the shared host uses 'Default.aspx' and not 'default.aspx' as the default page, so this causes an uppercase character in the RawUrl when the root URL is requested.
The following code should be placed in your Page_Load event handler:
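'prevent wrong case urls
'(the host's own "Default.aspx" is normalised first so it does not trigger a redirect)
If Text.RegularExpressions.Regex.IsMatch(Request.Url.PathAndQuery.Replace("Default.aspx", "default.aspx"), "[A-Z]") Then
    Response.StatusCode = 301
    Response.Status = "301 Moved Permanently"
    'note: this lowercases the query string as well as the path
    Dim loc As String = Request.Url.PathAndQuery.ToLower()
    'strip the default page name so the folder URL is used instead
    If loc.EndsWith("default.aspx") Then loc = loc.Remove(loc.LastIndexOf("default.aspx"), "default.aspx".Length)
    'tell the client where the page now lives and stop processing
    Response.AddHeader("Location", loc)
    Response.End()
End If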
The code uses a 301 redirect to tell the client (spider or browser) to use the lower case equivalent of the URL. Another SEO measure is to not reveal 'default.aspx' as part of the URL, because that would be a duplicate of '/', so the code also makes sure not to use this page name in the redirect target.
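If you want to check the redirect logic outside of a page, one option is to pull it into a plain function. The following is only a sketch of that idea and not part of my original fix: the module name CanonicalUrl and the function GetRedirectTarget are made up for illustration. It also differs from the page code in one respect, which is an assumption on my part about what most sites want: only the path is lowercased and the query string is passed through untouched, since query values can be case-sensitive.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CanonicalUrl
    ' Hypothetical helper (illustration only): returns the lowercase
    ' redirect target for a path-and-query string, or Nothing when no
    ' redirect is needed.
    Function GetRedirectTarget(pathAndQuery As String) As String
        ' Split off the query string so that its case is left alone.
        Dim path As String = pathAndQuery
        Dim query As String = ""
        Dim q As Integer = pathAndQuery.IndexOf("?"c)
        If q >= 0 Then
            path = pathAndQuery.Substring(0, q)
            query = pathAndQuery.Substring(q)
        End If

        ' The shared host adds "Default.aspx" itself, so that exact
        ' spelling must not count as an uppercase URL.
        Dim checked As String = path.Replace("Default.aspx", "default.aspx")
        If Not Regex.IsMatch(checked, "[A-Z]") Then Return Nothing

        ' Lowercase the path and hide the default page name.
        Dim target As String = path.ToLowerInvariant()
        If target.EndsWith("default.aspx", StringComparison.Ordinal) Then
            target = target.Substring(0, target.Length - "default.aspx".Length)
        End If
        Return target & query
    End Function

    Sub Main()
        ' Prints: /products/item.aspx?Name=Foo
        Console.WriteLine(GetRedirectTarget("/Products/Item.aspx?Name=Foo"))
        ' Prints: True (the host's own default page needs no redirect)
        Console.WriteLine(GetRedirectTarget("/Default.aspx") Is Nothing)
        ' Prints: /about/
        Console.WriteLine(GetRedirectTarget("/About/Default.aspx"))
    End Sub
End Module
```

In Page_Load you would call the function with Request.Url.PathAndQuery and only send the 301 status and Location header when the result is not Nothing.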