Archive for the 'Tips and Tricks' Category
URL Obfuscation
The way Nathan explained it, you type in “theinnerlayer.softlayer.com” and it is translated to an IP address, which is then contacted, and the page is returned to you. However, if you know the IP address already, you can use that instead of the hostname and skip the nameserver entirely. For instance, http://66.228.119.19 will take you directly to the InnerLayer blog, bypassing the name server. But that’s not all! Not only will the dotted-decimal representation of the IP work in a URL, but the dword representation (the same four bytes written as a single 32-bit decimal number) will as well! Try http://1122268947. That will also get you to the InnerLayer.
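The dword is just the four octets read as one base-256 number: 66 × 256³ + 228 × 256² + 119 × 256 + 19 = 1122268947. As a minimal sketch of that conversion (written in Python rather than the PHP used later in this article, with function names made up for illustration), the standard ipaddress module can go in both directions:

```python
import ipaddress


def dotted_to_dword(dotted: str) -> int:
    """Return the 32-bit integer (dword) form of a dotted-decimal IPv4 address."""
    return int(ipaddress.IPv4Address(dotted))


def dword_to_dotted(dword: int) -> str:
    """Return the dotted-decimal form of a 32-bit integer IPv4 address."""
    return str(ipaddress.IPv4Address(dword))


if __name__ == "__main__":
    print(dotted_to_dword("66.228.119.19"))  # 1122268947
    print(dword_to_dotted(1122268947))       # 66.228.119.19
```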
Now that we’ve gotten the domain out of the way, what about the bits before and after? Before the domain, between the protocol (http) and the domain itself, there is an optional authentication part. You can specify a username and password to log into secured sites right in the URL. http://user:pass@site.com is the standard format for such logins. However, if the website you’re going to doesn’t require authentication, most browsers simply ignore it. Firefox 3 will prompt you when you click on these obfuscated links to ask you if “site.com” is really the site you wish to visit, whereas IE7 simply won’t work at all if there’s an unexpected authentication string. These checks are fairly new, and they’re a good way to protect users against this sort of attack. Now that you know about the methods of obfuscating domain names in URLs, you can probably see how http://www.bankofamerica.com%20login@1122268947 actually takes you to the InnerLayer: everything before the “at” sign is treated as login data, and only the number after it is the host. This is a common tactic used by spammers and phishers to obfuscate their URLs. You can put anything you want into the authentication portion of the URL to obfuscate it, as long as it doesn’t contain a reserved URL character like a colon, “at” sign, or forward slash (apart from the single colon that separates the username from the password). For our case, let’s use “4NDIw:U4ODYwMCAxMjE5” as our fake authentication data, just to be confusing.
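To see which part of such a link a client actually contacts, here is a small sketch using Python’s standard urllib.parse (Python stands in for the article’s PHP here, and the helper name real_host is made up for illustration). It splits the fake “login” from the real host:

```python
from urllib.parse import urlsplit


def real_host(url: str) -> str:
    """Return the host a client would contact, ignoring any user:pass@ prefix."""
    return urlsplit(url).hostname or ""


if __name__ == "__main__":
    url = "http://www.bankofamerica.com%20login@1122268947"
    parts = urlsplit(url)
    print(parts.username)   # the decoy: www.bankofamerica.com%20login
    print(real_host(url))   # the real target: 1122268947
```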
Now that we’ve added stuff to the beginning of the URL, what about the filename at the end? Nathan’s post could easily be accessed using http://4NDIw:U4ODYwMCAxMjE5@1122268947/2008/do-you-know-where-your-nameserver-is/. However, there’s still all that easy-to-read nonsense at the end. That will never do. Have you ever seen a URL with a space in it? The space is encoded as %20. That’s the hexadecimal representation of the ASCII code 32, a space. The percent sign indicates that the following 2 digits are to be interpreted as a hex code for a real character. This is how you keep URLs from breaking on spaces: you turn the spaces into non-breaking characters. However, did you know it works for ALL characters and not just spaces? We can change every character but the forward slashes in any URL path to its hex equivalent. Nathan’s article link then becomes: http://4NDIw:U4ODYwMCAxMjE5@1122268947/%32%30%30%38/%64%6f%2d%79%6f%75%2d%6b%6e%6f%77%2d%77%68%65%72%65%2d%79%6f%75%72%2d%6e%61%6d%65%73%65%72%76%65%72%2d%69%73.
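The path encoding is mechanical enough to sketch in a few lines. This Python version (not the article’s PHP; the function name is hypothetical) turns every byte except the forward slash into its %xx form, and urllib.parse.unquote reverses it:

```python
from urllib.parse import unquote


def hex_encode_path(path: str) -> str:
    """Percent-encode every byte of a URL path except the forward slashes."""
    encoded = []
    for byte in path.encode("utf-8"):
        if byte == ord("/"):
            encoded.append("/")                    # slashes must stay readable
        else:
            encoded.append("%{:02x}".format(byte))  # always two hex digits
    return "".join(encoded)


if __name__ == "__main__":
    path = "/2008/do-you-know-where-your-nameserver-is/"
    obfuscated = hex_encode_path(path)
    print(obfuscated)
    print(unquote(obfuscated) == path)  # decoding gives the original path back
```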
But wait, there’s more! Let’s go back to the domain name, shall we? Most browsers will handle overflow in the dword representation of the domain just fine. What that means is that we can continually add 4294967296 (2^32) to the domain portion of our obfuscated URL and still continue to get the results we want. Our URL is now: http://4NDIw:U4ODYwMCAxMjE5@5417236243/%32%30%30%38/%64%6f%2d%79%6f%75%2d%6b%6e%6f%77%2d%77%68%65%72%65%2d%79%6f%75%72%2d%6e%61%6d%65%73%65%72%76%65%72%2d%69%73.
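Undoing the overflow is a single modulo operation: 5417236243 mod 2^32 = 1122268947. A short Python sketch (the function name is illustrative, and it assumes the host is a plain decimal number):

```python
import ipaddress


def normalize_dword_host(host: str) -> str:
    """Reduce an overflowed decimal host modulo 2**32 and return dotted-decimal form."""
    dword = int(host) % 2**32   # strip any whole multiples of 4294967296
    return str(ipaddress.IPv4Address(dword))


if __name__ == "__main__":
    print(normalize_dword_host("5417236243"))  # 66.228.119.19
```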
As a final trick, you don’t have to obfuscate every letter. A fixed pattern of %xx%xx%xx over and over again will get boring. Mix it up. I only converted about 70% of my URL to hex, resulting in this gem: http://4NDIw:U4ODYwMCAxMjE5@5417236243/%3200%38/%64%6f%2d%79%6f%75%2d%6b%6e%6f%77-%77%68%65%72e%2d%79%6f%75%72-%6e%61%6d%65%73e%72v%65%72%2di%73. As you can see, this is quite a bit more confusing than the original URL, which was http://theinnerlayer.softlayer.com/2008/do-you-know-where-your-nameserver-is/.
This information can be useful to any systems administrator who is dealing with an elusive, abusive user. Being able to translate a crazy URL to the actual human-readable equivalent can greatly assist both the SoftLayer abuse department as well as any other group attempting to track down spammers, scammers, or just plain old sneaky users.
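For anyone on the receiving end of such a link, the tricks above can be unwound in one pass. The following is only a sketch in Python (not the article’s PHP), with a made-up function name; it throws away the fake login, folds a decimal host back into dotted-decimal form, and decodes the hex in the path. It ignores ports, query strings, and fragments for brevity, and it does not attempt a reverse DNS lookup.

```python
import ipaddress
from urllib.parse import urlsplit, unquote


def deobfuscate(url: str) -> str:
    """Return a human-readable version of an obfuscated http URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""          # drops any user:pass@ decoy
    if host.isascii() and host.isdigit():
        # Decimal (dword) host, possibly overflowed past 2**32
        host = str(ipaddress.IPv4Address(int(host) % 2**32))
    path = unquote(parts.path)           # %64%6f... back to plain text
    return f"{parts.scheme}://{host}{path}"


if __name__ == "__main__":
    crazy = ("http://4NDIw:U4ODYwMCAxMjE5@5417236243/%3200%38/%64%6f%2d%79%6f%75"
             "%2d%6b%6e%6f%77-%77%68%65%72e%2d%79%6f%75%72-%6e%61%6d%65%73e"
             "%72v%65%72%2di%73")
    print(deobfuscate(crazy))
```

Run against the partially encoded link from earlier, this should print the dotted-decimal host followed by the readable article path.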
As a final note: Please don’t use this knowledge for evil. As mentioned before, the new versions of both Firefox and Internet Explorer are no longer fooled by the fake authentication string trick, and the rest of the obfuscation should really only be used to fool web spiders. Personally, I used this method in combination with JavaScript to obfuscate links and email addresses so that I wouldn’t get spammed.
[Editor's note: We at SoftLayer use our powers for good and so should you. Thankfully half of these kinds of links won't open in the latest versions of Outlook and Safari. -K]
The following PHP code was used to generate the links in this article.
<?php
// The URL we're attempting to obfuscate
$url = "http://theinnerlayer.softlayer.com/2008/do-you-know-where-your-nameserver-is/";
$urlData = parse_url($url);
$path = $urlData['path'];
$startingIP = gethostbyname($urlData['host']);
// Get the long (dword) representation.
// gethostbyname() returns the hostname unchanged on failure, in which case ip2long() returns false.
$long = ip2long($startingIP);
if ($long === false) {
    die("Could not resolve " . $urlData['host'] . "\n");
}
// Add 4294967296 (2^32) to the long for further obfuscation
$long += 4294967296;
// Add random authentication characters to the beginning of the string
$auth = substr(base64_encode(microtime()), rand(5, 10), rand(5, 15)) . ":" . substr(base64_encode(microtime()), rand(5, 10), rand(5, 15));
// Obfuscate the rest of the URL
$len = strlen($path);
$obfuscatedLocation = "";
for ($p = 0; $p < $len; $p++) {
    // Leave slashes alone.
    // Also, 3 in 10 characters make it through plain for further confusion.
    if ($path[$p] == '/' || rand(1, 10) > 7) {
        $obfuscatedLocation .= $path[$p];
        continue;
    }
    // Made it here, obfuscate this character (always as two hex digits)
    $obfuscatedLocation .= sprintf('%%%02x', ord($path[$p]));
}
echo "http://$auth@$long$obfuscatedLocation";
?>
The New Face of Search Engine Optimization
Most SoftLayer customers host websites on our services, and all websites benefit from high search engine rankings. The “old method” of search engine optimization doesn’t really work anymore. Back in the days before Google, the best way to get to the top of the search engine rankings was to follow four easy steps:
1. Diversify your IP space.
2. Add keywords to the <meta> tag on your site.
3. Make sure those keywords also appear in the body of your document.
4. Take 2 & 3 and fill them with references to Pokémon, pop music, and porn.
However, only #3 is a valid tactic in this new, Google-driven world. Let’s analyze them one by one.
Diversifying your IP space. Old search engines gave more credence to sites located in “geographically diverse” areas, where “geographically diverse” was determined by class C addresses. Now, however, with the advent of huge centralized data centers, search engine algorithms recognize that a site with 15 servers in the same datacenter may be just as effective as one with servers in 15 separate cities. Of course, it’s still a good idea to buy servers in, say, Dallas, Seattle, and Washington DC.
Meta tags. Google and other major search engines don’t really look at meta tags anymore for keywords. They still will use the meta tag for language, encoding, and summary data. However, the processing power of search engines has been increasing exponentially in the last few years, which means they’re capable of analyzing the actual content of the page rather than relying on meta tags. If you still have meta tags, you can keep them, but they’re only really useful for language and summary information.
Document body keywords. This is an area where it still matters. As previously mentioned, search engines now are capable of searching the entire page. In the past, only a few search engines indexed actual page content, and even then it may have been a simple count of how often your meta keywords matched the page contents. Now, however, Google stores local copies of every page it indexes (to a certain extent) and uses the entire page contents for search and cached viewing.
Dummy data. When search engines were younger, they could be fooled very easily by simply including the top 1,000 popular search terms in your meta tags and as invisible text inside your document body. I never understood it personally, but the thinking was that if you had enough references to Britney Spears on your page, you would hijack enough people that one of them would forget what he was originally looking for and buy your product instead. Though I guess that’s how spam works now, isn’t it?
So what can you do right now to improve your search engine placement? There are a few easy things to do, broken into the following categories:
- Page Titles. Your pages should each have a unique, meaningful title. Putting the name of your site on every page doesn’t do anyone any good. Not only will a descriptive title give your search results more visibility, but it will help people find the page again if they bookmark it.
- Page Content. You want your page content to be meaningful and arranged around a central semantic theme. Don’t put up one huge page featuring thousands of unrelated pieces of information. Keep it concise, unique, and focused. You have an unlimited number of individual pages, so make use of that fact.
- Dynamic Content. The more often your site changes, the higher your Google rank tends to be. You could take the cheap way out and simply put a box on your site that has random content, but the best way is to actually do updates as often as possible. This not only helps your visibility on the search engines, but also makes your site more useful to the people that eventually make it to your pages, which is your main goal anyway.
- Accessibility. This is a key area that many sites overlook. You need to make heavy use of the title and alt attributes for things like links and images. Not only can this be required under accessibility laws such as the Americans with Disabilities Act, but it helps blind users navigate your site. You know what acts like a blind user? Search engine crawlers. When you do a Google images search, the images that pop up most likely have alt attributes specified. The same goes for link titles: if you put a brief link description in your titles, not only do you get pretty mouseovers on the links, but they add one more point to your eventual page rank.
- Linking. Google builds its page rank based on links to and from the page in question, as well as the page contents themselves. With this in mind, it’s useful to link out to sources on whatever topic you’re attempting to talk about. The higher the page rank of the target, the higher the benefit to you. It’s also a good idea to attempt to be useful enough for other people to link to you, either in message boards or as a source of their own. All links eventually increase your page rank. As a small note, make your URLs “search engine friendly” by attempting to include keywords in there as well. Many message boards will include the post title in the URL for just this purpose. Finally, in my experience Google seems reluctant to index URLs with “?id=” in them, so be careful about that.
- Site Map. Search engines love site maps. Users don’t care for them as much, but a concise HTML or XML site map with links to every page on your site, divided into sections with short descriptions, increases links, increases accessibility, and gives the search engines more metadata on the important topics in each page.
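Several of the items above (page titles, alt attributes, link titles) are easy to check automatically. As a rough sketch, assuming Python and its standard html.parser module, the hypothetical SeoAudit class below reports a page’s title and counts images without alt text and links without a title attribute. It is an illustration of the checklist, not a statement of how any search engine scores pages.

```python
from html.parser import HTMLParser


class SeoAudit(HTMLParser):
    """Collect the page title and count images/links missing descriptive attributes."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self._in_title = False
        self.images_missing_alt = 0
        self.links_missing_title = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "img" and not attrs.get("alt"):
            # Counts both a missing alt and an empty alt=""
            self.images_missing_alt += 1
        elif tag == "a" and "href" in attrs and not attrs.get("title"):
            self.links_missing_title += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def audit(html: str) -> SeoAudit:
    """Parse an HTML string and return the filled-in audit object."""
    parser = SeoAudit()
    parser.feed(html)
    parser.close()
    return parser


if __name__ == "__main__":
    page = ('<html><head><title>Blue Widgets</title></head><body>'
            '<img src="a.png"><img src="b.png" alt="A blue widget">'
            '<a href="/x">x</a><a href="/y" title="Why widgets">y</a>'
            '</body></html>')
    result = audit(page)
    print(result.title, result.images_missing_alt, result.links_missing_title)
```

Running it over every page of a site and comparing the collected titles would also flag pages that share the same title.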
So all you have to do to improve your search engine rank is to have dynamic, frequently changing content about a single, concise topic on an easily accessible page that is frequently linked to by other pages. Wikipedia is a perfect example of search engine optimization in action. Each page is titled with the topic it discusses; every image has a title attribute and links out to a full description of the image; each link has a title attribute; many outside sources are mentioned; plenty of sites link to each article as well as the root domain; and the index page changes every single day with completely new and original content.