URL Canonicalization

The idea for this post came from an IT support request I got today from a user trying to reach our company website. She was typing just the domain name, without the www prefix, into the browser and landing on our domain controller’s web server instead of the public site. She wondered why, and was even worried that our website had been hacked!

Our problem is that our website’s domain and our network domain name are the same. The website is hosted outside the network, so people have to type the www prefix before the domain to reach it; otherwise they see the domain controller’s web server instead. As of now, I have no idea how to fix this. I was going to redirect the IIS page to the external website, but then I read on a Microsoft support page that doing so on a domain controller is not recommended, so I chose not to.


This problem got me thinking about URL canonicalization, though. In the eyes of most people, these URLs are the same:

www.example.com

example.com
example.com/

http://example.com

https://example.com

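But to a web server they can be different: you can set up a different web page for each host name and each scheme. (The one exception is the trailing slash on the bare domain: example.com and example.com/ produce the same request, because an empty path is sent as /.) If you have a website that is reachable from all of those URLs, a search engine may “think” that there are several different websites on your web server with the same content.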
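To make the idea concrete, here is a minimal sketch of what canonicalizing those variants could look like in Python. The choice of https and www.example.com as the canonical form, the `ALIASES` set, and the `canonicalize` function name are all assumptions made for illustration; a real site would usually do this with a server-side redirect or a canonical link tag rather than a script.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed canonical choices for this sketch: https and the www host.
CANONICAL_SCHEME = "https"
CANONICAL_HOST = "www.example.com"
# Host names treated as aliases of the canonical host (an assumption).
ALIASES = {"example.com", "www.example.com"}


def canonicalize(url: str) -> str:
    """Map any alias of the site to one canonical URL."""
    original = url
    # A bare "example.com" has no scheme, so urlsplit would read it as a path.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in ALIASES:
        return original  # not our site; return it unchanged
    path = parts.path or "/"  # an empty path and "/" mean the same thing
    # The fragment is dropped because it is never sent to the server.
    return urlunsplit((CANONICAL_SCHEME, CANONICAL_HOST, path, parts.query, ""))


variants = [
    "www.example.com",
    "example.com",
    "example.com/",
    "http://example.com",
    "https://example.com",
]
for v in variants:
    print(v, "->", canonicalize(v))
```

All five variants from the list above come out as https://www.example.com/, which is the single address a search engine would then see.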

But what if web servers were smarter and could recognize the intention of the user, instead of just serving whatever is typed in the browser? Imagine if our domain controller somehow knew that our user was actually trying to get to the main company site, and showed her that instead of the ugly IIS page. I believe that if the server collected enough historical data about the user, it could actually do it!

If I’m not mistaken, this is what Google is doing, or trying to do, with our data: guessing our intentions instead of just returning search results based on the keywords we type in the search box. For example, there are two famous people named Roberto Carlos in Brazil: one is a soccer player and the other is a singer. If I type Roberto Carlos into Google, how will it know which person I’m searching for? I could easily add “soccer” or “music” after the name to narrow the search, but let’s assume I’m too lazy to do that. The answer is historical data. If I do a lot of soccer searches in Google and frequently visit the website of Roberto Carlos the soccer player, and I have never looked at the singer’s website or clicked on search results about him, then Google will assume I mean the soccer player the next time I search for that name. I can only imagine the complexity that goes into this, and doing it in a split second for a search query is just freaking amazing.
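Just to illustrate the historical-data idea, here is a toy sketch in Python. It is in no way how Google actually ranks results; the click counts, the topic labels, and the `guess_intent` function are all made up for this example.

```python
from collections import Counter

# Hypothetical click history: topics of results this user opened before.
history = Counter({"soccer": 14, "news": 3})

# Hypothetical candidates for the ambiguous query "Roberto Carlos".
candidates = {
    "Roberto Carlos (soccer player)": "soccer",
    "Roberto Carlos (singer)": "music",
}


def guess_intent(candidates: dict[str, str], history: Counter) -> str:
    """Pick the candidate whose topic the user has clicked most often."""
    # A Counter returns 0 for topics the user has never clicked.
    return max(candidates, key=lambda name: history[candidates[name]])


print(guess_intent(candidates, history))
```

With this made-up history the soccer player wins, because the user has never clicked on anything about music. A real search engine weighs far more signals than a single counter.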

Maybe in the future all web servers will be smarter and guess our intentions. Maybe we won’t even have to type anything to get what we want from the web; we will just need to think about it, and the server will automatically read our minds!
