Pagodo: Automate Google Hacking Database scraping

  • 0
pagodo

Pagodo: Automate Google Hacking Database scraping

Category : Blog

Pagodo (Passive Google Dork)

The goal of this project was to develop a passive Google dork script to collect potentially vulnerable web pages and applications on the Internet. There are 2 parts. The first is ghdb_scraper.py that retrieves Google Dorks and the second portion is pagodo.py that leverages the information gathered by ghdb_scraper.py.

 

Pagodo

 

What are Google Dorks?

The awesome folks at Offensive Security maintain the Google Hacking Database (GHDB) found here: https://www.exploit-db.com/google-hacking-database. It is a collection of Google searches, called dorks, that can be used to find potentially vulnerable boxes or other juicy info that is picked up by Google’s search bots.

 

Database

 

Usage:

To start off, pagodo.py needs a list of all the current Google dorks. Unfortunately, the entire database cannot be easily downloaded. A couple of older projects did this, but the code was slightly stale and it wasn’t multi-threaded…so collecting ~3800 Google Dorks would take a long time. ghdb_scraper.py is the resulting Python script.

pagoda

The flow of execution is pretty simple:

  • Fill a queue with Google dork numbers to retrieve based off a range
  • Worker threads retrieve the dork number from the queue, retrieve the page using urllib2, then process the page to extract the Google dork using the BeautifulSoup HTML parsing library
  • Print the results to the screen and optionally save them to a file (to be used by pyfor example)

dork

 

pagodo.py:

Now that a file with the most recent Google dorks exists, it can be fed into pagodo.py using the -g switch to start collecting potentially vulnerable public applications. pagodo.py leverages the google python library to search Google for sites with the Google dork.

file

 

Performing ~3800 search requests to Google as fast as possible will simply not work. Google will rightfully detect it as a bot and block your IP for a set period of time. In order to make the search queries appear more human, a couple of enhancements have been made. A pull request was made and accepted by the maintainer of the Python google module to allow for User-Agent randomization in the Google search queries. This feature is available in 1.9.3 (https://pypi.python.org/pypi/google) and allows you to randomize the different user agents used for each search. This emulates the different browsers used in a large corporate environment.

python

 

Future Work:

Future work includes grabbing the Google dork description to provide some context around the dork and why it is in the Google Hacking Database.

Hacking

 

Most Popular Training Courses at Indian Cyber Security Solutions:

 

Summer Training for CSE, IT, BCA & MCA Students 

Network Penetration Tester Training

Ethical Hacking  training

Python Programming training

 RHCE  training

CEH V9  training

Diploma in Network Security Training

Secure Coding in Java

Diploma in Web Application Security 

Certified Web Application Penetration Tester 

Certified Android Penetration Tester 

Certified Python Programming 

Advanced Python Training 

Reverse Engineering Training  

Amazon Web Services Training  

VMware Training 

Digital marketing

CCNA training

Android Training

 

Cybersecurity services that can protect your company:

 

Web Security | Web Penetration Testing

Network Penetration Testing – NPT

Android App Penetration Testing

Source Web Development

Source Code Review

Android App Development

Digital Marketing Consultancy

Data Recovery


Leave a Reply

Show Buttons
Hide Buttons