Don’t worry, give it 10 years and you will be an overnight success. – K. Slatoff
Since our process of reverse engineering relies heavily on pattern matching, being able to identify and decompose architecture is a critical skill. Unfortunately, there aren’t many shortcuts here. I personally feel as though this skill is one of my greatest strengths, but it took 9 years or so of developing software to get here.
In spite of that, you still need to be familiar with common patterns to do the real work of web application reversing and penetration testing.
Compiled binaries have a bit of a leg up on us here. When you download an application, the file format is generally fairly easy to determine. This gives you key insights into how the application works, where its data is stored, and how it is structured. This is not true of the web.
Luckily for us, developers have a penchant for reusability. This means that their applications are built on top of frameworks, leverage shared components, and are most often structured in known, public ways. Patterns and algorithms are the cornerstone of proper engineering, which is also great because even if an ‘engineer’ isn’t proper, they still rely on things which are. If you’re using ASP.NET MVC to build your application, regardless of your skill level, you have to go out of your way to not use MVC. This is true for all other frameworks as well.
One of the best resources I’ve found for this is Martin Fowler’s Patterns of Enterprise Application Architecture book. Subsequently, he has published a briefing on many patterns here: http://martinfowler.com/eaaCatalog/
Since we are discussing web apps, the web application presentation patterns are of most interest. Read up on all of them, but in particular I find that 3 patterns are most popular.
Page Controller
In this pattern, the page itself is the controller, which is just a fancy way of saying it’s responsible for binding the model (core application data) to the view (user interface presentation). This is pretty easy to spot, as the page name is the action it performs (such as ProductEdit.do, ProductDelete.do, etc.).
In this pattern, I treat each page as its own API, since the ProductEdit page is likely to expect a whole different set of parameters than ProductDelete. For all intents and purposes, each page is a silo, loosely communicating with the others through querystring, cookie, or POST parameters.
Front Controller
A front controller is a somewhat similar pattern. The page itself is a type of controller, except that it mostly operates as a router of commands. Drupal and WordPress work this way, despite their ability to appear as MVC.
In this pattern, you see a single page handling many different commands. This is applied either broadly (such as an index.php page) or more specifically to a functional area (such as product.php?action=edit).
In either case, it’s also fairly straightforward to decompose.
The scope of the API used to communicate with these types of applications is based on the scope of the controller. In a global scope, an index controller has to support all of the parameters that could come through it. Though some of these commands may be ignored, the general size of the API is often fairly large. In the more focused scope, the API is usually more focused. It is not uncommon to be able to call admin commands from a less-than-admin controller if the ACL on the commands is not set up correctly. It is also easy to guess that there might be an action=edit if you see lots of action=view type commands.
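As a rough illustration of the spotting heuristics above, here is a minimal sketch that guesses the presentation pattern from a URL and lists sibling commands worth checking. The verb list, the command parameter names, and the example.test host are assumptions made up for this example. Real applications will need their own rules, and the function only builds candidate URLs without sending any requests.

```python
from urllib.parse import urlparse, parse_qs, urlencode

# Hypothetical verb list and command parameter names, chosen for illustration.
VERBS = ("view", "edit", "delete", "create", "list")
COMMAND_KEYS = ("action", "cmd", "q")


def classify(url):
    """Return a rough guess at the presentation pattern behind a URL."""
    parts = urlparse(url)
    query = parse_qs(parts.query)
    page = parts.path.rsplit("/", 1)[-1]
    stem, dot, _ext = page.partition(".")
    # Front controller: one page that routes on a command-style parameter.
    if dot and any(key in query for key in COMMAND_KEYS):
        return "front controller"
    # Page controller: the page name itself carries the verb (ProductEdit.do).
    if dot and any(stem.lower().endswith(verb) for verb in VERBS):
        return "page controller"
    # MVC-style: extensionless path segments such as /Products/Edit/5.
    if not dot and any(segment for segment in parts.path.split("/")):
        return "mvc-style route"
    return "unknown"


def sibling_actions(url):
    """Build candidate URLs for other verbs when an action parameter is seen."""
    parts = urlparse(url)
    query = {key: values[0] for key, values in parse_qs(parts.query).items()}
    seen = query.get("action")
    if seen is None:
        return []
    candidates = []
    for verb in VERBS:
        if verb != seen:
            guess = dict(query, action=verb)
            candidates.append(parts._replace(query=urlencode(guess)).geturl())
    return candidates
```

A bare index.php with no parameters comes back as "unknown" here, which is honest: a single URL is rarely enough, and the guess firms up as you collect more of them.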
Model View Controller (MVC)
In this pattern the URL structure is more than a resource locator: it’s a syntax for communication (often loosely referred to as RESTful). In this pattern you have a clear abstraction of the view, the model, and the controller. This usually looks something like /Controller/Action/id, for example /Products/Edit/id.
There are of course variants of this syntax. For instance, a default action and a default controller could be used to allow for calls like:
/Products/id == returns the view action for the id.
/id == returns the view action of products by id
This pattern also creates some interesting dynamics as far as composition is concerned. Consider that while the latter call will work, you might ALSO be able to call this page by requesting /Views/Products/Edit.aspx and POSTing an ID to the page. This can create interesting side effects if permissions are not set correctly (especially for partial views and JSON results).
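The default controller and default action behavior described above can be sketched as a tiny route resolver. This is only an illustration: the controller names, the defaults, and the /Controller/Action/id layout are assumptions for the example, not the routing table of any particular framework.

```python
# Assumed defaults and controller names, for illustration only.
DEFAULT_CONTROLLER = "Products"
DEFAULT_ACTION = "View"
KNOWN_CONTROLLERS = {"Products", "Orders"}


def resolve(path):
    """Map a URL path to a (controller, action, id) tuple, filling in defaults."""
    segments = [segment for segment in path.split("/") if segment]
    controller, action, item_id = DEFAULT_CONTROLLER, DEFAULT_ACTION, None
    # A leading segment that names a controller overrides the default one.
    if segments and segments[0] in KNOWN_CONTROLLERS:
        controller = segments.pop(0)
    if len(segments) == 2:
        # /Action/id: both the action and the id are explicit.
        action, item_id = segments
    elif len(segments) == 1:
        # /id: the action falls back to the default.
        item_id = segments[0]
    return controller, action, item_id
```

Under these assumptions, /Products/5 and /5 both resolve to the same (Products, View, 5) handler as the fully spelled out /Products/View/5. When several URLs reach one piece of code like this, it is worth checking that the same permissions apply on every path.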
This pattern has become super popular among many frameworks. Ruby on Rails, Django (Python), ASP.NET MVC, Spring, Struts, etc. all use this pattern primarily for their web applications.
The aforementioned patterns are considered “enterprise patterns” specifically related to architecture. Component patterns (or design patterns) are also important to understand since they are how individual components are built. Since this post is already somewhat long, we will talk more about component based composition discovery next time.
The rest of the ‘stuff’ below represents patterns which fall into categories less easily spotted on a webpage, but which are useful in figuring out how something works. Unless a developer mistake shortcuts this process (such as an exception with a full stack trace), you can only reliably get an understanding of these components through interaction.
Data Access Patterns
There are three means of data access which are important to have some exposure to. This is more useful to note if you have SQL injection, but can be helpful in identifying points in the application which MIGHT be vulnerable.
These three access patterns are: string concatenation (aka: evil), parameterized queries (most common), and stored procedures.
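To make the difference between the first two access patterns concrete, here is a minimal sketch using Python’s built-in sqlite3 module against an in-memory database. The table, the data, and the function names are made up for this example. Stored procedures are left out because SQLite does not support them.

```python
import sqlite3

# In-memory database with a hypothetical products table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO products (name) VALUES (?)", [("widget",), ("gadget",)]
)


def find_concatenated(name):
    # String concatenation: the input becomes part of the SQL text itself.
    sql = "SELECT name FROM products WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(sql)]


def find_parameterized(name):
    # Parameterized query: the input is bound as data, never parsed as SQL.
    sql = "SELECT name FROM products WHERE name = ?"
    return [row[0] for row in conn.execute(sql, (name,))]


# A classic probe: harmless as a bound value, but it rewrites the WHERE
# clause when concatenated.
probe = "x' OR '1'='1"
```

With the probe as input, the concatenated version returns every row in the table, while the parameterized version returns nothing because no product has that literal name. A difference in behavior like that, seen from outside, is the sort of hint that tells you which pattern a given input feeds.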
There are very subtle and unique ways to figure out how the data access pattern is composed, but the SQL Injection Attacks and Defense book does a better job of outlining them than I will attempt.
You are also unlikely to be able to reverse an algorithm used on the web, with perhaps the exception of various cryptographic ciphers or hashes. But you ought to be familiar with the important algorithms and data structures, as lists, data retrieval, and binding come in handy in more advanced attacks. You ought to know the differences between linked lists and sets, for instance. Most web applications just use generic or typed lists; however, I’ve run into situations where understanding how the data was being cached (as a set) made it possible to shortcut the caching mechanism (which was important so I could generate the pages uniquely each time).
There are super formal algorithm books, but also good introduction ones.
AJAX patterns are also very useful to be able to identify when testing a site. OFTEN these represent great chances to bypass WAF or application-level input filtering mechanisms. There are basically only three approaches to this. The first puts the processing of the display entirely in the hands of the client (the server just sends raw JSON back to the AJAX call). In the second, the entire component is processed by the server and returned. This approach was favored for a while in ASP.NET AJAX. The final one is a hybrid where some parts are processed on the server and some on the client.
How much you will be able to manipulate these features later will depend largely on how they are composed.
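One rough way to tell the three approaches apart is to look at what an AJAX call actually returns. The sketch below is a heuristic only: the labels and the rule that markup inside JSON suggests a hybrid are assumptions for illustration, and real responses will be messier.

```python
import json


def _contains_markup(value):
    """Return True if any string inside a decoded JSON value looks like HTML."""
    if isinstance(value, str):
        return "<" in value and ">" in value
    if isinstance(value, dict):
        return any(_contains_markup(item) for item in value.values())
    if isinstance(value, list):
        return any(_contains_markup(item) for item in value)
    return False


def classify_ajax_body(content_type, body):
    """Guess which AJAX approach produced a response body."""
    text = body.strip()
    if "json" in content_type.lower():
        try:
            payload = json.loads(text)
        except ValueError:
            return "unknown"
        # JSON carrying markup in its strings suggests server and client
        # both take part in rendering.
        if _contains_markup(payload):
            return "hybrid"
        return "client-rendered (raw JSON)"
    if text.startswith("<"):
        return "server-rendered (HTML fragment)"
    return "unknown"
```

For example, a body of `{"name": "widget"}` served as application/json is classified as client-rendered, while `<div>widget</div>` served as text/html is classified as server-rendered.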
Patterns are ubiquitous and unavoidable. They range from the super formal to something more commonly known as spaghetti. This mess (their mess) is one of the first things you are going to be unpacking as you work through a site. Applications might be a mix of these patterns, as each component might leverage a different pattern for its development.
Understanding architectural composition is my ground zero of a test, a scoping step if you will. It is a lot of information to grok, but once you can, it only takes a few minutes to figure out. The best way to get experience with this is to build sites with these various approaches.
But I reiterate, this skill dictates where the entire rest of the test goes. I believe that composition is destiny– at the very least it’s a predisposition. Each pattern has strengths and weaknesses, which you can only take advantage of if you have the chops to first recognize them. MVC for instance suffers from model binding (aka mass assignment) attacks, whereas front controllers might have command injection / authorization issues. I stack the deck as much as I can here and try to know more about architecture than the developers themselves.
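To show why model binding is a weakness worth recognizing, here is a minimal sketch of mass assignment in plain Python. The User model, its fields, and both binder functions are hypothetical; real frameworks bind differently, but the failure mode has the same shape.

```python
from dataclasses import dataclass


@dataclass
class User:
    # Hypothetical model; is_admin is not meant to be user-editable.
    name: str = ""
    email: str = ""
    is_admin: bool = False


def bind_naive(user, form):
    # Naive binder: copies every submitted field that matches an attribute.
    for key, value in form.items():
        if hasattr(user, key):
            setattr(user, key, value)
    return user


# Fields the form is actually supposed to expose.
ALLOWED_FIELDS = {"name", "email"}


def bind_allowlisted(user, form):
    # Allowlisted binder: only copies fields the form is meant to expose.
    for key in ALLOWED_FIELDS.intersection(form):
        setattr(user, key, form[key])
    return user


# A submitted form with one extra field the page never displayed.
form = {"name": "alice", "is_admin": True}
```

With this form, the naive binder produces a user whose is_admin is True, while the allowlisted binder sets the name and leaves is_admin at False. Knowing that a site is built on a binding framework tells you that unlisted fields are worth trying.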
If I were going to train a person on web application testing in general, enterprise and design patterns would be where I spent nearly all my time for a while. More on design patterns next time.