Monday, June 25, 2007

Automated Web-Site Crawling Tools

Manual application security testing gives you granular control, but it can become exhausting and monotonous. As a professional application security tester, you will often come across projects that require testing hundreds of web pages, and if you do not have sufficient time there is a high probability of missing critical vulnerabilities.

This is where automated crawling and scanning come in. Automation can save a tremendous amount of time and labor. Beyond these two obvious benefits, you get a high-level schema of the whole web site in just a few minutes, and once you can view the logical interconnectivity of web pages and scripts, it is far easier to plan the attack phase.
The two most popular free, automated crawlers are:

  • Paros
  • WebScarab

[Screenshot: web-site layout after crawling is finished]

Basically, these are proxy applications with a built-in crawling function. They operate as a man-in-the-middle between your browser and the web server: every request/response between the browser and the server is trapped, and the proxy keeps a log of every single transaction. You can select any of the sites cached by the proxy and crawl it for interconnected web pages (Paros and WebScarab both use the term “spider” for crawling).
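
To make the spidering step concrete, here is a minimal, single-threaded sketch in Python using only the standard library. It is not how Paros or WebScarab are implemented; it simply shows the idea of following same-host links breadth-first and recording which pages link to which. The starting URL is a placeholder.

# Minimal spider sketch: starts from one URL, collects same-host links from
# each page, and records parent/child relationships -- roughly the site map
# that a proxy spider builds, minus the proxying and transaction logging.
import urllib.request
import urllib.parse
from html.parser import HTMLParser
from collections import deque

class LinkExtractor(HTMLParser):
    """Collects the href attribute of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def spider(start_url, max_pages=50):
    """Breadth-first crawl limited to the starting host."""
    start_host = urllib.parse.urlparse(start_url).netloc
    queue = deque([start_url])
    visited = set()
    site_map = {}                      # page -> list of child pages

    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        except Exception:
            continue                   # unreachable page: skip it

        parser = LinkExtractor()
        parser.feed(html)
        children = []
        for link in parser.links:
            absolute = urllib.parse.urljoin(url, link)
            if urllib.parse.urlparse(absolute).netloc == start_host:
                children.append(absolute)
                queue.append(absolute)
        site_map[url] = children

    return site_map

if __name__ == "__main__":
    # Hypothetical target; only crawl sites you are authorized to test.
    for page, children in spider("http://testsite.example/").items():
        print(page, "->", len(children), "links")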

Additionally, Paros can scan a web site for vulnerabilities and generate a report that lists each finding's priority, description and recommended countermeasures.
Other crawler applications are:
  • Wget (can crawl and download the contents of a web site; available on Windows and *nix; see the example after this list)
  • Teleport Pro (a Windows-based commercial tool for crawling and local caching)
  • Lynx (an advanced, text-based browser for *nix platforms)
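
As a quick illustration of the Wget item above, a recursive download can be driven from a short Python script (the same flags work directly on the command line). The flags shown are standard Wget options; the target URL is a placeholder.

# Drives a recursive Wget download from Python; the same flags work directly
# on the command line. Only crawl sites you are authorized to test.
import subprocess

subprocess.run([
    "wget",
    "--recursive",        # follow links and download linked pages
    "--level=2",          # limit crawl depth to two levels
    "--no-parent",        # never ascend above the starting directory
    "--convert-links",    # rewrite links so the local copy browses offline
    "--wait=1",           # pause between requests to avoid hammering the site
    "http://testsite.example/",
], check=False)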

So far we have covered the positives of automated crawlers; however, they have the following limitations as well:

  • Automated crawlers do not work well with client-side code such as JavaScript, Java applets, Flash and ActiveX.
  • Since they are automated by nature, they do not interact well with web pages that require human input and may not discover all possible child routes (web pages).
  • Crawlers may not be able to retrieve the complete hierarchy of sites that use multiple levels of web authentication. For example, even after logging in, you may be required to submit a transaction password or answer a graphics-based challenge (CAPTCHA).
  • Crawlers may not be able to retrieve the complete hierarchy of sites that serve different web pages to different user types (role-based access). In such cases only the pages accessible to the current user will be retrieved.
  • Crawlers may miss URLs that are built inside client-side function calls rather than appearing in the HTML itself (see the sketch after this list).
  • Crawlers search with multiple threads, so web sites that restrict simultaneous sessions for the same user may lock the account and effectively cause a denial of service.
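
To illustrate the function-call limitation, here is a small, self-contained Python sketch with a made-up page: a parser that only reads HTML attributes (the approach most crawlers rely on) finds the static link but never sees the URL assembled inside the JavaScript function.

# Shows why crawlers miss URLs assembled inside client-side function calls.
# One link sits in a normal <a href>; the other is built inside a JavaScript
# function, so an HTML-attribute parser never sees it.
from html.parser import HTMLParser

PAGE = """
<html><body>
  <a href="/reports/summary.html">Summary</a>
  <script>
    function openDetail(id) {
      // URL exists only as string fragments inside a function call
      window.location = "/reports/detail.php?id=" + id;
    }
  </script>
  <button onclick="openDetail(42)">Detail</button>
</body></html>
"""

class HrefCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "href" and value:
                self.hrefs.append(value)

collector = HrefCollector()
collector.feed(PAGE)
print(collector.hrefs)   # ['/reports/summary.html'] -- detail.php is never found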

For these reasons, professional testers prefer a mix of automated tools and manual testing. They split the whole project on the basis of the different access (crawl) levels and test each part separately.
