9 September 2011
When I'm writing my blog I often add links to relevant websites. A quick Google search and a click on the result, and I can easily copy the site's address into my blog. If, however, the link goes directly to a PDF file, then it's usually just a case of right-clicking and choosing "Copy Link Location" (I'm a Firefox user; in Internet Explorer it's called "Copy Shortcut"). Normally this works OK as well, except that Google likes to track the number of people following its links. This isn't always the case, but sometimes you end up with a url that goes to Google rather than directly to the website. Stripping out the Google part manually is difficult, as the embedded url is encoded as a web-safe string.
An example may help explain this:
I recently linked to a PDF document on the Worcestershire County Council website (on this blog post regarding road safety near Redditch schools).
The url that the Google search result provided is:
which is 376 characters long! [Try fitting that into a 140 character tweet without shortening].
As you can see, the first 67 characters are added by Google and do not relate to the website I wanted to link to. The encoding also adds extra characters, so the actual url is 291 characters.
This is a pretty extreme example, but some sites do have long urls, particularly for documents attached within content management systems.
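To see why the encoding makes the url longer, here is a small sketch in Python. The address in `target` is made up for illustration, and `quote` with `safe=""` is only an approximation of how a redirect link might encode its embedded url. Each reserved character (`:`, `/`, `?`, `=`) becomes a three-character `%XX` sequence, so every one of them adds two characters.

```python
from urllib.parse import quote, unquote

# Hypothetical target address, used only for illustration.
target = "http://www.example.com/docs/file.pdf?id=42"

# Percent-encode every reserved character, as happens when a url
# is embedded inside another url's query string.
encoded = quote(target, safe="")

print(encoded)
print("plain length:  ", len(target))
print("encoded length:", len(encoded))

# Decoding with the built-in function restores the original address.
print(unquote(encoded) == target)
```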
Google is not the only site that does this. Facebook, for example, does a similar thing with some links from its pages.
The alternative would be to unescape only the characters that need it (e.g. %3A = :, %2F = /, %3F = ?, %3D = =), but I found it easier to use the built-in function than a lot of regular expression manipulation.
Try the code below:
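As a rough illustration of the two approaches, here is a minimal sketch in Python. It assumes the tracking link carries the target address in a query parameter named `url`. The function names `extract_with_builtin` and `unescape_selectively` and the sample link are made up for the example, and this is not necessarily how the original code worked.

```python
import re
from urllib.parse import urlsplit, unquote

def extract_with_builtin(link, param="url"):
    """Return the decoded address embedded in a tracking link.

    Assumes the target sits in the query parameter `param`;
    returns the link unchanged if that parameter is absent.
    """
    for pair in urlsplit(link).query.split("&"):
        name, _, value = pair.partition("=")
        if name == param:
            # The built-in function decodes every %XX sequence at once.
            return unquote(value)
    return link

# The alternative: unescape only the characters that need it.
SELECTIVE = {"%3A": ":", "%2F": "/", "%3F": "?", "%3D": "="}

def unescape_selectively(value):
    """Replace only the four escapes listed in SELECTIVE."""
    pattern = "|".join(SELECTIVE)
    return re.sub(
        pattern,
        lambda m: SELECTIVE[m.group(0).upper()],
        value,
        flags=re.IGNORECASE,
    )

# Hypothetical tracking link, shortened for illustration.
sample = (
    "http://www.google.co.uk/url?sa=t&source=web&cd=1"
    "&url=http%3A%2F%2Fwww.example.com%2Fdocs%2Freport.pdf%3Fid%3D7&rct=j"
)

print(extract_with_builtin(sample))
print(unescape_selectively("http%3A%2F%2Fwww.example.com%2Fa%20b.pdf"))
```

The selective version leaves any escape it does not know about (such as `%20`) untouched, which is why the built-in function is the simpler choice when the whole address should be decoded.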