Because our process of reversing is not a direct 1:1 mapping to compiled reversing, we have to clarify how we can be successful. Although some frameworks generate HTML based on the underlying code, HTML cannot always be reversed back to a state of source. People do weird stuff. So we must additionally rely on application behaviors and on concepts found in forensics and social engineering.
The primary basis of our reversing approach is Locard’s exchange principle:
Wherever he steps, whatever he touches, whatever he leaves, even unconsciously, will serve as a silent witness against him. Not only his fingerprints or his footprints, but his hair, the fibers from his clothes, the glass he breaks, the tool mark he leaves, the paint he scratches, the blood or semen he deposits or collects. All of these and more, bear mute witness against him.
Locard was a smart dude. You can’t do things in life without leaving behind some evidence of how and why something took place. Even the attempt to “clean” a crime scene leaves evidence that the crime scene itself was cleaned. This holds especially true when building applications*. Since information leakage isn’t in the OWASP Top 10 list, most applications are like billboards which scream how they were built. Furthermore, how an application responds or behaves when given data is just another way to identify what it’s composed of.
As a very easy example, let’s look at a typical ASP.NET WebForms based application.
The first bit of evidence is the file extensions: .NET applications typically use .aspx, .ashx, and .asax. This immediately narrows your focus to either an ASP.NET MVC application or a WebForms one. To identify which was used, we can look for unique features of WebForms such as ViewState or EventValidation. These don’t generally exist outside of WebForms, because ASP.NET MVC pages are not event driven and are supposedly RESTful. These framework features are obvious and easy to look for (read: grep & view-source). Because ASP.NET WebForms is event driven, it likes to mangle the names of objects to make sure that you don’t have naming collisions. As a result, if you had an ASP.NET Panel control which contained an ASP.NET TextBox control, you’d get an HTML rendering which looked very similar to:
<div id="Panel_NamedPanel">
  <input name="ctl100$Panel_NamedPanel_TextBox1" type="text" value="oh hai" />
</div>
This special naming convention suggests not only the framework, but even the version (as previous versions used a different convention). IIS also tends to tell you the framework version, and there are default ASP.NET folders you can test for to see if they exist. A “Views” folder will exist for MVC .NET apps, and is unlikely to exist for a WebForms one. Failing all that, look at the careers page and see what they want new developers to know. 🙂
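Marker-hunting like this is easy to automate. Below is a minimal sketch of the idea (the function name, marker names, and regexes are my own, and only cover the obvious cases): feed it a fetched page and it reports which WebForms “silent witnesses” it found.

```python
import re

# Patterns for the WebForms evidence discussed above: the hidden
# ViewState/EventValidation fields and the mangled ctl-prefixed names.
WEBFORMS_MARKERS = {
    "viewstate": re.compile(r'name="__VIEWSTATE"', re.I),
    "eventvalidation": re.compile(r'name="__EVENTVALIDATION"', re.I),
    "mangled_names": re.compile(r'name="ctl\d+\$'),
}

def webforms_evidence(html: str) -> list:
    """Return the names of any WebForms markers found in the page."""
    return [name for name, pat in WEBFORMS_MARKERS.items() if pat.search(html)]

# A fabricated page, shaped like a typical WebForms rendering:
sample = '''
<form action="./default.aspx" method="post">
  <input type="hidden" name="__VIEWSTATE" value="/wEPDwUK..." />
  <input type="hidden" name="__EVENTVALIDATION" value="/wEWAg..." />
  <div id="Panel_NamedPanel">
    <input name="ctl00$Panel_NamedPanel$TextBox1" type="text" />
  </div>
</form>
'''
print(webforms_evidence(sample))
```

The same shape works for any framework: swap in the fingerprints of whatever you’re hunting and grep every response you collect.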
Like I said, lots and lots and lots of evidence.
By just having the application framework identified, you have reduced your working set significantly**. If you suspect that the site you’re looking at was built on a content management system, you can use the google to search for any “uniquely” named fields or pages and see if any results come up which might help you identify the framework. I use this technique often.
Secondly, because our process is based on feedback cycles, how we interact with the site is important.
Although some people use the terms active & passive testing, I find them misleading. You are nearly always actively testing the site, though sometimes in less obvious ways. I prefer the terms elicitation and interrogation. In elicitation, you strategically ask the application a series of questions which are reasonably acceptable in normal use. This is done partly to avoid setting off triggers (IDS) and ending the conversation, but also because sometimes it’s simply the best way to get information. Interrogation, on the other hand, is often far more aggressive, and it’s very obvious when it’s being done***. To compare and contrast, I might elicit details about an encoding scheme used on a web application with creative user details such as:
Name = John "the duke" O'Reilly
Street = 123 Some Street #123 (near 4th & Thomas)
City = Phoenix/Ahwatukee
...etc...
This user could very reasonably exist, and concurrently tests different reserved characters to see how they are handled. The name is unique enough that it’s easy to later grep for in results, to see where it’s used throughout an application. It’s also unlikely to ever be in someone’s WAF, so I have an incredibly strong chance of not being bothered by one if it exists. If I were testing this in a more interrogative sort of way, I might just spam the fields with a list of XSS attacks like:
"><script>"
<script>alert("XSS")</script>
<<script>alert("XSS");//<</script>
<script>alert(document.cookie)</script>
'><script>alert(document.cookie)</script>
'><script>alert(document.cookie);</script>
\";alert('XSS');//
%3cscript%3ealert("XSS");%3c/script%3e
%3cscript%3ealert(document.cookie);%3c%2fscript%3e
%3Cscript%3Ealert(%22X%20SS%22);%3C/script%3E
<script>alert(document.cookie);</script>
<script>alert(document.cookie);<script>alert
...etc...
Conversely, these payloads MIGHT be in a WAF and could be blocked, despite the field being vulnerable. Neither approach is “better” than the other; they’re just used in different places for different reasons. The trick, of course, is to know when to use which, and what might cause deviations in your ability to understand the response. For instance, just like in interrogation sessions, applications tend to shut down if you are too aggressive. Or, if you are too obvious with your questions, a WAF might block keywords and become (in a theoretical sense) aware of your deceptions. People aren’t really named Bobby DropTables.
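The elicitation probe above can be checked mechanically: once that unique name comes back in a response, classify how its reserved characters were handled. A rough sketch, with my own labels for the outcomes and only the two most common encodings covered:

```python
import html
from typing import Optional

# The realistic-but-unique value submitted during elicitation.
PROBE = 'John "the duke" O\'Reilly'

def encoding_report(response_body: str, probe: str = PROBE) -> Optional[str]:
    """Classify how the probe came back: 'raw', 'html-encoded', or None."""
    if probe in response_body:
        return "raw"            # reflected verbatim -- worth a closer look
    if html.escape(probe, quote=True) in response_body:
        return "html-encoded"   # the app (or a filter) encoded the quotes
    return None                 # not reflected on this page

print(encoding_report('<td>John &quot;the duke&quot; O&#x27;Reilly</td>'))
```

A real harness would also check URL-encoding, backslash-escaping, truncation, and so on, but the feedback-cycle shape is the same.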
But just to be complete: it wouldn’t matter so much if they did block it. The sheer fact that it’s blocked implicates some type of countermeasure, either a WAF or an application filter. You can distinguish between the two with forensics. wafw00f (also known as waffit) is an example of a tool which attempts to figure out which WAF is in use by testing the various encodings WAFs accept in general. If it’s an application filter, these are sometimes implemented as plugins, and you can try to force-browse to see if they exist. If that fails, you can look for gaps where an application filter might not be applied. In ASP.NET WebForms, for instance, some controls don’t encode output data by default. Sometimes you can bypass an application filter with an attack against an AJAX-type service; a WAF might still filter the data, where application filters often don’t. You could take comparison measurements against pages with known and made-up parameters to see how each is handled. It goes on and on and on.
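That comparison-measurement idea is simple to sketch. Assuming you’ve already fetched both responses (the fetch itself is omitted, and the features chosen here are mine), you diff a few coarse signals between a known parameter and a made-up one:

```python
# Reduce a response to coarse features worth comparing.
def response_fingerprint(status: int, body: str) -> dict:
    return {
        "status": status,
        "length": len(body),
        "has_error_text": "error" in body.lower(),
    }

def differs(fp_known: dict, fp_bogus: dict) -> list:
    """Which features changed between the known and bogus parameter?"""
    return [k for k in fp_known if fp_known[k] != fp_bogus[k]]

known = response_fingerprint(200, "<html>Welcome back, user 42</html>")
bogus = response_fingerprint(200, "<html>An unexpected error occurred</html>")
print(differs(known, bogus))  # length and error text differ, status does not
```

Even when both requests return 200, differences in length or error strings tell you the application distinguishes the two, which is evidence in itself.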
You can’t stop the signal.
Our final basis is that an application’s behaviors can assert its relationships, entities, and types.
This concept will be discussed and demonstrated at great length as we get into decomposition. It’s worth noting, for now, that this approach is used somewhat frequently when testing malware. Allowing the malware to affect/infect controlled systems lets the reverser discern not only what it does, but what it might then be built of. In order to do X, an app might be composed of Y and Z. This basis provides useful evidence for asking intelligent questions later on.
The engineering process is one of pragmatism. Applications aren’t built in total isolation. They use frameworks to develop with, and reuse code (patterns & algorithms) to solve problems. They also aren’t generally aware of how obvious that is, which makes it VERY easy to gain visibility into what they’ve done. Despite not being a 1:1 relationship to compiled reversing, we can be very successful in figuring out how an application is built.
If a website boldly declares it’s written in ASP.NET WebForms, you should have the MSDN articles open that speak to what might be there. If a website further boasts of being built on top of DotNetNuke, you should download the source and keep a local copy you can use to help navigate the site you’re looking at. It is always in your best interest to download the framework locally and use it as a frame for your test.
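One way to put that local copy to work is to turn its file layout into force-browse candidates against the target. A small sketch (the base URL and the framework-relative paths here are illustrative, not taken from any real install):

```python
from urllib.parse import urljoin

def candidate_urls(base: str, local_paths: list) -> list:
    """Map paths found in the local framework copy onto the target site."""
    return [urljoin(base, p.lstrip("/")) for p in local_paths]

# Paths you might harvest by walking the downloaded framework tree:
paths = ["admin/Sales/salesmain.ascx", "Install/InstallWizard.aspx"]
print(candidate_urls("https://target.example/", paths))
```

Requesting each candidate and noting which ones exist (and which are access-controlled) maps the deployed framework surface against the stock one.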
Every bit of evidence can and should be used against them.
* Some apps would be best served if developers tried to cover up that they wrote it, I’ve seen many a travesty in my time.
** Reducing your working set is a way to digest information without overwhelming yourself. It’s usually a good idea, so long as you don’t mistakenly remove things from the working set that are needed.
*** Interrogation techniques are wide ranging, so perhaps my term isn’t as accurate as I’d like either. But, because interrogation is fairly obvious when it’s happening I think it works for now.