GOOGLE: best friend, worst enemy.

Google.com is the most popular search engine on the market. That’s because it has the largest and most complete index of the internet. Here I’d like to show a few interesting tricks for the Google search engine.

  • Searching for the word: “hello”

https://www.google.com/#hl=en&output=search&sclient=psy-ab&q=hello&oq=hello&gs_l=hp.3..0l4.1930.2760.0.3170.5.5.0.0.0.0.80.388.5.5.0.les%3B..0.0…1c.1.5.psy-ab.cuLsoCtq5UE&pbx=1&bav=on.2,or.r_gc.r_pw.r_qf.&bvm=bv.43148975,d.dmg&fp=79c21eb3a028c18d&biw=1440&bih=735

Searching Google for the string “hello” through the search box already produces a long URL that looks like a GET query string with a lot of variables (12 different variables!)

Let’s see what Google knows about me (“&” is the separator between one variable and the next):

hl=en   (language: try changing the variable to “it”, “de”, “es”)
output=search  (query type)
sclient=psy-ab  
q=hello      (query: you can type your search directly here without pressing the search button)
oq=hello   (?)
gs_l=hp.3..0l4.1930.2760.0.3170.5.5.0.0.0.0.80.388.5.5.0.les%3B..0.0…1c.1.5.psy-ab.cuLsoCtq5UE  (looks important, but I have no idea what it is)
pbx=1    (?)
bav=on.2,or.r_gc.r_pw.r_qf.  (?)
bvm=bv.43148975,d.dmg   (?)
fp=79c21eb3a028c18d   (?)
biw=1440   (?, probably the width of the browser window in pixels)
bih=735   (?, probably the height of the browser window in pixels)

As you can see, Google populates even the simplest search (in this case “hello”) with about a dozen other variables that the regular user is not even aware of using. It is interesting to note that changing even one variable can produce a different list of search results.
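
To make the breakdown above easier to reproduce, here is a minimal Python sketch that splits the example URL into its variables and swaps one of them. It only uses the standard library; the helper names `fragment_params` and `with_param` are made up for this post, and the sketch only builds URLs, it does not contact Google or say anything about how Google interprets each variable.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# The example URL from this post, split over several lines for readability.
SEARCH_URL = (
    "https://www.google.com/#hl=en&output=search&sclient=psy-ab&q=hello&oq=hello"
    "&gs_l=hp.3..0l4.1930.2760.0.3170.5.5.0.0.0.0.80.388.5.5.0.les%3B..0.0…1c.1.5.psy-ab.cuLsoCtq5UE"
    "&pbx=1&bav=on.2,or.r_gc.r_pw.r_qf.&bvm=bv.43148975,d.dmg"
    "&fp=79c21eb3a028c18d&biw=1440&bih=735"
)


def fragment_params(url):
    """Return the name=value pairs found after the '#' as a dict."""
    fragment = urlsplit(url).fragment
    return dict(parse_qsl(fragment, keep_blank_values=True))


def with_param(url, name, value):
    """Return a copy of the URL with one variable set to a new value."""
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.fragment, keep_blank_values=True))
    params[name] = value
    # urlencode percent-encodes special characters, so the result may look
    # different from the original even for variables that were not changed.
    return urlunsplit(parts._replace(fragment=urlencode(params)))


if __name__ == "__main__":
    params = fragment_params(SEARCH_URL)
    print(len(params), "variables:", ", ".join(params))
    print("query:", params["q"], "| language:", params["hl"])
    print(with_param(SEARCH_URL, "hl", "it"))
```

Calling `with_param(SEARCH_URL, "hl", "it")` gives the same URL with the language variable switched to Italian, which you can paste into the browser to compare results.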

  • Ranking

When we search for something, for example the word “hello”, how does Google order the search results? Why is one result at place #1 and another at place #200? They contain the same word, so why is one considered more relevant than the other?

The algorithm used by Google to rank pages is called PageRank. With it, the ranking takes into consideration not only how many times the queried word (or string) is repeated (as traditional pre-Google search engines did), but also how many pages link to that page. Links to a page are an important factor in valuing a search result because they express the importance and relevance of that page over other results. It kind of works like a reference for a page.
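
To give a feeling for the "pages that link to it" idea, here is a minimal Python sketch of the simplified, textbook form of PageRank (repeatedly passing each page's score along its outgoing links). It is not Google's actual implementation: the four-page `toy_web`, the function name `pagerank`, the damping factor of 0.85 (the value usually quoted in descriptions of the algorithm) and the number of iterations are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank by repeated score passing.

    `links` maps every page to the list of pages it links to.
    Every linked page must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal scores

    for _ in range(iterations):
        # Every page gets a small base score, independent of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            # A page with no outgoing links shares its score with everyone.
            targets = outgoing if outgoing else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical mini web: A, B and D all link to C; C links back to A.
toy_web = {
    "A": ["C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

if __name__ == "__main__":
    ranks = pagerank(toy_web)
    for page in sorted(ranks, key=ranks.get, reverse=True):
        print(page, round(ranks[page], 3))
```

In this toy web, C ends up with the highest score because three pages link to it, and A comes second because its single incoming link is from the highly ranked C. B and D, which nobody links to, keep only the base score.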

  • Advanced/Customized searches

Google offers users advanced search features. You can add these features to your query using the Google advanced search page. You can also use them, and others not offered on that page, directly in your search query with special operators and keywords that Google understands. Here is a list (from Wikipedia) of these operators that can help you make your query more specific:

OR Search for either one, such as “price high OR low” searches for “price” with “high” or “low”.
“-” Exclude a word, such as “apple -tree” searches where the word “tree” is not used.
“+” (Removed on October 19, 2011). Force inclusion of a word, such as “Name +of +the Game” to require the words “of” & “the” to appear on a matching page.
“*” Wildcard operator to match any words between other specific words.
define: The query prefix “define:” will provide a definition of the words listed after it.
stocks: After “stocks:” the query terms are treated as stock ticker symbols for lookup.
site: Restrict the results to those websites in the given domain, such as, site:www.acmeacme.com. The option “site:com” will search all domain URLs named with “.com” (no space after “site:”).
intext: Prefix to search in a webpage text, such as “intext:google search” will list pages with word “google” in the text of the page, and word “search” anywhere (no space after “intext:”).
allintitle: Only the page titles are searched (not the remaining text on each webpage).
intitle: Prefix to search in a webpage title, such as “intitle:google search” will list pages with word “google” in title, and word “search” anywhere (no space after “intitle:”).
allinurl: Only the page URL address lines are searched (not the text inside each webpage).
inurl: Prefix for each word to be found in the URL; other words are matched anywhere, such as “inurl:acme search” matches “acme” in a URL, but matches “search” anywhere (no space after “inurl:”).
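
If you combine these operators often, a tiny helper can assemble the query for you. The Python sketch below is only an illustration: `build_query` and `search_url` are made-up names, `example.com` is a placeholder domain, and it assumes the usual `https://www.google.com/search?q=...` address form. It just builds strings and never contacts Google.

```python
from urllib.parse import urlencode


def build_query(words, site=None, intitle=None, exclude=()):
    """Join plain words and a few of the operators listed above."""
    parts = list(words)
    if site:
        parts.append(f"site:{site}")        # no space after "site:"
    if intitle:
        parts.append(f"intitle:{intitle}")  # no space after "intitle:"
    parts.extend(f"-{word}" for word in exclude)
    return " ".join(parts)


def search_url(query):
    """Put the query into the q variable of a search URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    query = build_query(["price", "high", "OR", "low"], site="example.com")
    print(query)
    print(search_url(build_query(["apple"], exclude=["tree"])))
```

The operators are just text inside the `q` variable, so anything the helper produces can also be typed by hand into the search box.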

  • Interestingly dangerous search results

Try making (simple/short) queries with “filetype:log” or “filetype:sql” (and other log/database file extensions), and then add the search string “intext:password” or “intext:hash”. You will find some pretty interesting search results, mostly due to system administrators’ mistakes in assigning the right permissions to their files. Also, some of these files (log files) are the output of various hacking tools used by other people, most of them from phishing sites, spam, etc.
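
If you run a site yourself, the same combinations can be pointed at your own domain to see whether any of your files show up. The Python sketch below just generates the query strings. The function name `audit_queries` and the domain `example.com` are placeholders, and the default file types and keywords are simply the ones mentioned above.

```python
from itertools import product


def audit_queries(domain, filetypes=("log", "sql"), keywords=("password", "hash")):
    """Build one query per (file type, keyword) pair, limited to one domain."""
    return [
        f"site:{domain} filetype:{filetype} intext:{keyword}"
        for filetype, keyword in product(filetypes, keywords)
    ]


if __name__ == "__main__":
    for query in audit_queries("example.com"):
        print(query)
```

Each printed line can be pasted into the search box as is.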
[Screenshot from 2013-03-01 05:57:39]
