Reversing Web Apps: The Caveats

Because our process of reversing is not a direct 1:1 mapping to compiled reversing, we have to clarify how we can be successful.  Although some frameworks generate HTML based on the underlying code, HTML cannot always be reversed to a state of source.  People do weird stuff.  So we must additionally rely on application behaviors and on concepts borrowed from forensics and social engineering.

The primary basis of our reversing approach is Locard’s exchange principle:

Wherever he steps, whatever he touches, whatever he leaves, even unconsciously, will serve as a silent witness against him. Not only his fingerprints or his footprints, but his hair, the fibers from his clothes, the glass he breaks, the tool mark he leaves, the paint he scratches, the blood or semen he deposits or collects. All of these and more, bear mute witness against him.

Locard was a smart dude.  You can’t do things in life without leaving behind some evidence of how and why they took place.  Even the attempt to “clean” a crime scene leaves evidence that the crime scene itself was cleaned.  This holds especially true when building applications*.  Since information leakage isn’t in the OWASP Top 10, most applications are like billboards which scream how they were built.  Furthermore, how an application responds or behaves when given data is just another way to identify what it’s composed of.

As a very easy example, let’s look at a typical ASP.NET WebForms-based application.

The first bit of evidence is the file extensions: .NET applications typically use .aspx, .ashx, and .asax.  This immediately focuses you on either an ASP.NET MVC application or a WebForms one.  To identify which was used, we can look for unique features of WebForms such as ViewState or EventValidation.  These don’t generally exist outside of WebForms, because ASP.NET MVC pages are not event driven and are supposedly RESTful.  These framework features are obvious and easy to look for (read: grep & view-source).  Because ASP.NET WebForms is event driven, it likes to mangle the names of objects to make sure you don’t have naming collisions.  As a result, if you had an ASP.NET Panel control which contained an ASP.NET TextBox control, you’d have an HTML rendering which looked very similar to:

<div id="Panel_NamedPanel">
<input name="ctl100$Panel_NamedPanel_TextBox1" type="text" value="oh hai" /> 
</div>

This special naming convention suggests not only the framework, but even the version (previous versions use a different convention).  IIS also tends to tell you the framework version via response headers, and there are default ASP.NET folders you can test for to see if they exist.  A “Views” folder will exist for MVC .NET apps, and is unlikely to exist for a WebForms one.  Failing all that, look at the careers page and see what they want new developers to know. 🙂
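
To make that concrete, here is a minimal fingerprinting sketch in Python.  The markers are the WebForms tells described above, plus the X-AspNet-Version header IIS often volunteers; the URL is hypothetical, and a real test would check far more than this:

import urllib.request

# WebForms "tells" described above; all of these show up in rendered HTML.
MARKERS = {
    "__VIEWSTATE": "WebForms ViewState field",
    "__EVENTVALIDATION": "WebForms EventValidation field",
    "ctl00$": "WebForms auto-generated control naming",
}

def fingerprint(url):
    # Pull the page once, grep the body for markers, and check headers.
    with urllib.request.urlopen(url) as resp:
        body = resp.read().decode("utf-8", errors="replace")
        version = resp.headers.get("X-AspNet-Version")
    clues = [desc for marker, desc in MARKERS.items() if marker in body]
    if version:
        clues.append("X-AspNet-Version header: " + version)
    return clues

for clue in fingerprint("http://example.com/Default.aspx"):
    print(clue)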

Like I said, lots and lots and lots of evidence.

By just having the application framework identified, you have reduced your working set significantly**.  If you suspect that the site you’re looking at was built on a content management system, you could use the Google to search for any “uniquely” named fields or pages and see if any results come up which might help you identify the framework.  I use this technique often.

Secondly, because our process is based on feedback cycles, how we interact with the site is important.

Although some people use the terms active & passive testing, I find them misleading.  You are nearly always actively testing the site, though sometimes in less obvious ways.  I prefer the terms elicitation and interrogation.  In elicitation, you strategically ask the application a series of questions which are reasonably acceptable in normal use.  This is done partly to avoid setting off triggers (IDS) and ending the conversation, but also because sometimes it’s simply the best way to get information.  Interrogation, on the other hand, is often far more aggressive, and it’s very obvious it’s being done***.  To compare and contrast, I might elicit details about an encoding scheme used on a web application with a creative user such as:

Name = John "the duke" O'Reilly
Street = 123 Some Street #123 (near 4th & Thomas)
City = Phoenix/Ahwatukee
...etc...
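
A quick aside: data like this is distinctive enough to hunt for after the fact.  A minimal sketch, assuming you’ve saved your crawled responses to disk– the saved_responses directory and the encoded variants are just illustrative:

import pathlib

# A few forms the quote in O'Reilly might come back as; extend to taste
# for the other reserved characters seeded in the test data.
VARIANTS = {
    "raw": "O'Reilly",
    "html entity": "O&#39;Reilly",
    "backslash escaped": "O\\'Reilly",
}

for page in pathlib.Path("saved_responses").glob("**/*.html"):
    text = page.read_text(errors="replace")
    for label, needle in VARIANTS.items():
        if needle in text:
            print(page, "->", label)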

This user could very reasonably exist, and concurrently tests different reserved characters to see how they are handled.  The name is unique enough that it’s easy to grep for later, to see where it’s used throughout the application.  It is also unlikely to ever be in someone’s WAF, so I have an incredibly strong chance of not being bothered by one if it exists.  If I were testing this in a more interrogative sort of way, I might just spam the fields with a list of XSS attacks like:

"><script>"
<script>alert("XSS")</script>
<<script>alert("XSS");//<</script>
<script>alert(document.cookie)</script>
'><script>alert(document.cookie)</script>
'><script>alert(document.cookie);</script>
\";alert('XSS');//
%3cscript%3ealert("XSS");%3c/script%3e
%3cscript%3ealert(document.cookie);%3c%2fscript%3e
%3Cscript%3Ealert(%22XSS%22);%3C/script%3E
&ltscript&gtalert(document.cookie);</script>
&ltscript&gtalert(document.cookie);&ltscript&gtalert
...etc...

Conversely, these payloads MIGHT be in a WAF and could be blocked, despite the field being vulnerable.  Neither approach is “better” than the other; they are just used in different places for different reasons.  The trick is, of course, to know when to use which, and what might cause deviations in your ability to understand the response.  For instance, just like in interrogation sessions, applications tend to shut down if you are too aggressive.  Or if you are too obvious with your questions, a WAF might block keywords and become (in a theoretical sense) aware of your deceptions.  People aren’t really named Bobby DropTables.

But just to be complete– it wouldn’t matter so much if they did block it.  The sheer fact that something is blocked implies some type of countermeasure, either a WAF or an application filter.  You can distinguish between the two with forensics.  wafw00f (aka Waffit) is an example of a tool which attempts to identify the WAF in use by sending probes and analyzing how the responses differ.  If it’s an application filter, filters are sometimes implemented as plugins, and you can try forced browsing to see if they exist.  If that fails, you can look for gaps where an application filter might not be applied.  In ASP.NET WebForms, for instance, some controls don’t encode output data by default.  Sometimes you can bypass an application filter with an attack against an AJAX-type service– a WAF might still filter the data, where application filters often don’t.  You could try comparison measurements against pages with known and made-up parameters to see how each is handled.  It goes on and on and on.
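
For that comparison-measurement idea, here is a minimal sketch in Python.  The page and parameter names are hypothetical; the point is that differences in status code or response length between a known parameter, a nonsense parameter, and a hostile value hint at where the filtering lives:

import urllib.error
import urllib.request

def probe(url):
    # Reduce a response to (status, body length) so probes compare easily.
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status, len(resp.read())
    except urllib.error.HTTPError as e:
        return e.code, len(e.read())

print("known param:  ", probe("http://example.com/page.aspx?id=1"))
print("made-up param:", probe("http://example.com/page.aspx?zz_nonsense=1"))
print("hostile value:", probe("http://example.com/page.aspx?id=%3Cscript%3E"))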

You can’t stop the signal.

Our final basis is that an application’s behaviors can assert its relationships, entities, and types.

This concept will be discussed and demonstrated at great length as we get into decomposition.  It’s worth noting, for now, that this approach is used somewhat frequently when analyzing malware.  Allowing the malware to affect/infect controlled systems lets the reverser discern not only what it does, but what it might be built of.  In order to do X, an app might be composed of Y and Z.  This basis provides useful evidence for asking intelligent questions later on.

Le Finale

The engineering process is one of pragmatism.  Applications aren’t built in total isolation.  Developers use frameworks, and they reuse code (patterns & algorithms) to solve problems.  They also aren’t generally aware of how obvious that is, which makes it VERY easy to gain visibility into what they’ve done.  Despite not being a 1:1 relationship to compiled reversing, we can be very successful in figuring out how an application is built.

If a website boldly declares it’s written in ASP.NET WebForms, you should have the MSDN articles open, covering what might be there.  If a website further boasts of being built on top of DotNetNuke, you should download the source and keep a local copy you can use to help navigate the site you’re looking at.  It is always in your best interest to download the framework locally and use it as a frame of reference for your test.

Every bit of evidence can and should be used against them.

-A

* Some apps would be best served if developers tried to cover up that they wrote them; I’ve seen many a travesty in my time.
** Reducing your working set is a way to digest information without overwhelming yourself.  It’s usually a good idea– so long as you don’t mistakenly remove things from the working set that are needed.
*** Interrogation techniques are wide ranging, so perhaps my term isn’t as accurate as I’d like either.  But, because interrogation is fairly obvious when it’s happening, I think it works for now.


Reverse Engineering Web Applications: The Series

There is only so much you can share in a talk, and so I’ve decided to turn a short 50 minutes into a rightfully lengthy series.  I know this post is long, but I kindly ask you to bear with me.  We will revisit the topics discussed in this post repeatedly throughout our series– so it’s best to establish some basis and familiarity with them now.

Indeed, it makes little sense to jump into the technical meat and potatoes without first defining the words, processes, and concepts needed to evaluate the work ahead of us.  This post, after all, serves as our guide, establishing goals and valid measurements of whether or not we are successful.

Reverse Engineering

In the compiled binary world, reverse engineering is the taking of an application (executable or DLL) and leveraging a combination of compiler theory and assembly to arrive at a reasonable representation of its original source code.  And while this definition is accurate, it’s somewhat mechanical and not exactly very revealing.  We are going to define reverse engineering as:

The art of deducing an application’s elements, composition, behaviors, and relationships.

This definition is more functional, as it establishes the goals of our process.  Why someone might reverse engineer an application varies by intent, and is to a degree irrelevant to the conceptual goals.  That is, people reverse engineer for a variety of reasons, including interoperability, general education, and security testing.  Although each of these reasons dictates unique attentions and focuses, our conceptual goals still stand.  In our case, “why” we reverse engineer applications is predicated on the belief that security is a visibility problem.

In the ideal world, every engagement would grant me source code access and a copy of the application/environment*.  Having 100% visibility into the static and dynamic environment of an application is incredibly powerful.  By its nature, it eliminates the need for guessing and makes attacks significantly more informed and reliable.  Simply put, a better job can be done because this is a position of advantage.  It stands to reason, then, that in all situations less than the ideal, we must reverse engineer our way into that position.

The Process of Information Gathering

Now, if you’ve been around the block, you might note that few (if any) in the appsec industry use this lingo.  In its stead, you will hear about information gathering and in some cases even analysis.  The Web Application Hacker’s Handbook (WAHH) uses this combined definition as the entry point to any web security test.  While I believe the track they are on is correct in a sense, I’d dare suggest the picture painted is inaccurate.

Traditional information gathering, as defined by OWASP, WAHH, and many others, is ubiquitously listed as the first step in the hierarchy of checklist-style web testing.  The laundry list of tasks it outlines includes:

  1. mapping visible content
  2. identifying non-visible content (forced browsing, search engine discovery)
  3. testing for debug parameters
  4. identifying data entry points
  5. identifying the technologies used in the application
  6. mapping the actual attack surface
  7. analyzing exceptions

These tasks are further broken down into numerous sub-tasks and subtle implications, such as testing with various browsers, extracting a list of every parameter used in the site, gathering comments, and so on and so on.  Though perhaps not apparent up front, these tasks create a huge amount of work if you follow them to the letter.

Which is why basically no one tests like that.

Let’s presume for a moment that you had the time to do all this.  This is not a simple presumption, mind you, as the time this takes grows rapidly with the size of the application.  But let’s say you did.  Relative to the tasks and goals we defined for reverse engineering, what have you learned?  You’ve collected a series of facts about the application, but you are realistically in no better a situation than when you started.  Gathering a list of every parameter in a site doesn’t make you any better situated to test them in a relevant way.  You still lack context and understanding (more so if you used automation to achieve this).

And did I mention it’s slow?  As the surface area of an application grows, especially its dynamic surface area, so does the amount of information you can possibly collect.  Take just one aspect, the discovery of non-visible content, for instance.  While there are only a few approaches, most commonly this is performed with tools such as DirBuster and Burp’s discovery tools**.  These tools throw mutations and variants at the site based on content previously learned about.  This approach sounds good, but the candidate space grows very quickly, and in most cases the process never actually finishes on anything except the most trivial of sites.
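
To see why, here’s a back-of-envelope sketch.  The seed words and affixes are made up; the point is that the candidate list multiplies, and every newly discovered path feeds back in as another seed:

from itertools import product

# Hypothetical words learned from the site, plus affixes and extensions
# a discovery tool might bolt on.
words = ["admin", "backup", "test", "old", "dev"]
affixes = ["", "_1", "2", "-old", ".bak"]
extensions = ["", ".aspx", ".ashx", ".txt", ".zip"]

candidates = [w + a + e for w, a, e in product(words, affixes, extensions)]
print(len(candidates), "requests from only", len(words), "seed words")
# 125 already; real wordlists hold thousands of seeds, and every hit
# spawns a new round of mutations.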

So.  It is fair to say that some types of information collected (and some collection methods) are far more valuable than others.  In most actual instances of reverse engineering (outside the web), it’s rare one would try to collect “everything” about an application.  More common would be to understand and evaluate a specific item of concern, or to unpack a behavior.  The task of information gathering and analysis (in our case, reversing) is only valuable in its ability to drive us forward towards a goal.  We do not collect information for information’s sake.

Instead, when asked to describe their methodology, the most common answer I hear*** is somewhat nebulous and uncomfortable.  It’s often described like this:

I use the application/system like a normal user would, and follow leads and play with interesting things as they come up.  I keep going until I feel as though I’ve hit all the important stuff.

Yikes.  Admittedly, that’s not a very well-defined process– and surely not something that can be taught to others in its current state.  Luckily, however, there are indeed ways to unpack the gems hidden in that statement.

The Art of Reverse Engineering Web Applications

The first thing to note in the aforementioned description is that the process is best understood as iterative, not hierarchical.  The application is revisited, over and over, and as new information is discovered, it is absorbed into our understanding and acted on when it’s determined to be valuable.  Deciding what to test is both natural and dynamic.  In contrast, hierarchical testing implies you gather a bunch of details once and move on.  Waterfall, as a development methodology, has fallen out of favor for building software– so why would we test that way?

For me, understanding this iterative process hit home when I studied John Boyd a bit.  Boyd was a modern military strategist in the Air Force who is perhaps best known for his work on maneuver warfare and the OODA (Observe, Orient, Decide, Act) loop.  The OODA loop provides interesting insight into our natural method of processing information and deciding/acting on it.  It proposes that we take in information and evaluate it for its worth based on our history, emotion, bias, and cultural experiences.  Once it’s evaluated, we decide what to do with it and act on it.  This loop is constant, generally subconscious, and usually very quick (some loops are longer than others).  You may not have made a decision to operate this way, but I believe you do regardless.

The implication in this type of testing (and further hinted at in our description) is that the tester must rely heavily on their ability to see patterns and deviations from those patterns.  This places a premium on having exposure to a wide variety of patterns and practices, such that they can be observed and oriented to appropriately.  While it’s theoretically possible that the first time you test an MVC-pattern site you’d discern its inner workings and details, it is unlikely.  It is about as likely as writing a masterfully composed song the first time you pick up a guitar.  Possible, but not likely.  As such, we will spend considerable time discussing patterns for web applications in later posts.

Finally, the aforementioned process definition forces us to face the most common rebuke I get when sharing this approach– “how do you know when you are done?”– which is usually coupled with an expressed desire to be thorough and to ensure the client gets the best test.  This question is a good one, and not one asked enough.

To answer that question we have to strip away the illusion that any model can perfectly satiate the fear of being incomplete.  They are all incomplete.  We have no evidence to suggest that we are capable of finding and squashing all bugs (let alone security bugs) even when an application is put under numerous spotlights.  The hierarchical model, in my opinion, exists exactly because of this fear– people like bookends.  It fulfills a desire for a concrete beginning and ending to the test, but in exchange it steals away creativity and the relationship between the tester and what an application has to say.  It’s like two people dancing next to each other, not with each other.

Instead– testing in a fashion reliant on past experiences, asking questions, and listening to the application is perhaps the best way to provide a thorough test.  It allows the tester to deal with what IS going on with a site, rather than trying to fit the site into a specific mold.

To be clear, though, the nebulous definition of the methodology still sucks.  I am not suggesting testing an application in a totally undirected fashion.  I am merely pointing out that an actual conversation with the application has a much greater potential to drive us deeper into the heart of what is going on.

Since the aforementioned definition is indeed nebulous, the approach we will review and work with is somewhere in between.  It is clearly less formal than a hierarchical approach, yet more formal than “I just test the app.”  It is a focused and iterative process in which each piece of the test drives us forward and continues to reveal even more of the puzzle.  It is both active and passive****, as in many cases we can shortcut the guesswork through functional exploits to gain deep visibility into the application’s composition.

Oh.  So, how do I know when a test is over?  When I say it’s over.  Being a professional, reliant on my past experiences and education, puts me in a position to say that.  Relying on someone else’s checklist does not.  The rest of the series will revolve around unpacking what this all means.  This process is neither comfortable, traditional, nor yet complete.  But for me, it’s made all the difference so far.

On a personal note, I’ve tried the hierarchical approach to testing applications by studying and following a wide variety of methodologies.  In each case, I can say that inevitably I am left with a feeling of, quite frankly, boredom.  Every test becomes the same, and the job becomes monotonous.  I have embraced the approach I am outlining in this series because I’ve found that testing is a relationship.  Applications are very honest; if you can learn to ask intelligent questions, and how to listen to what they say, they will tell you a great deal about themselves.

-A

* I also want access to the server if it’s hosted– a recreation of the environment with total visibility.
** This is not a knock at these tools or their creators– only pointing out the shotgun approach bears limited fruit, especially compared to other, more informed approaches.
*** My intent here was not exactly scientific, as I did not send out a formal survey or anything.  I did, however, talk to a fair number of experienced and notable testers about this issue.
**** Active and passive testing are also very weak terms.  Lately I’ve been using the terms elicitation and interrogation.  I will get into more detail on that later.

Giving in.

I am somewhat bummed out to announce that I am now an owner of an iPhone 4S.  I recognize that’s not something normally worthy of grief– so I should explain.  If you’ve ever met me, one of the things I don’t pull any punches with is my disdain for Apple.  Without going into a long history of grievances, let’s just say I don’t see eye to eye with much of how they operate.  I made the mistake of reading “iCon”, a biography of Steve Jobs*.  Since then, I had effectively vowed not to do business with the man or the company.

But here I am, buying an iPhone.  wth.  I expect that this decision will get me countless amounts of teasing from my friends– but that’s okay.  Perhaps I deserve it.  That I got the phone because it’s the best on the market is kind of a depressing statement about the market.  I’ve had a Windows Phone 7 developer phone on loan for some time now.  It’s a great phone, but it’s a developer/beta phone and has lots of ‘quirks’ that don’t make it viable for the long term (I am also switching carriers).  That said, I love the phone.  It’s decently powered, with an okay camera, and leaps and bounds better than the Windows 6.5 phone I owned before it.  Also, being a long-term .NET developer, the idea of writing my own software was kinda neat.

So my hope was to find a nice new Windows Phone 7 device.  With Verizon, however, there is currently only the Trophy.  While this is an okay phone, if I am going to have a phone for at least two years, I want it to be top grade.  My phone is a valued tool for my work.  I started to look into what was coming out, and was very excited about the HTC Titan– at first.  The reviews I read said the same thing over and over: good phone, but old news.  The memory specs were not good, the processor was single core, and it comes with a small amount of storage.  The biggest drawbacks were its bad screen resolution and that it only boasts 720p video recording… something I do a lot of with my kids.

As much as I love the Windows Phone 7 OS– I can’t justify waiting 2 months to purchase yesterday’s technology.  It doesn’t make sense.

Which left me to decide between the Droid Bionic and the 4S.  The Droid and 4S had very similar specs overall, except the Droid is like a brick in your pocket.  I’ve had brick-in-pocket phones before, and I just can’t go back to them.  I mean, the Droid could swallow my current phone, it was that big.  The final clincher was OmniFocus.  Work provided me a MacBook Pro**, and I’ve fallen in love with OmniFocus.  It helps keep my otherwise cluttered brain reasonably straight.  Having it on my phone and syncing to my laptop is a total win.

So that is the story.  I could try to justify it any other way, but men hold themselves accountable for the decisions they make.  I bought the phone because it’s the best on the market and it makes the most sense.

-A

* This does not mean that I am an advocate for Gates either– but the differences are indeed fairly large between the two.
** The MacBook Pro I’ve had for the last year has literally been the worst computer I’ve ever owned.  It was unstable, crashed frequently (including during my Defcon talk), and was just bad news.  I recently brought it in, as it’s likely due to a bad video card.  The new laptop has not crashed on me yet, though ironically the MS Office suite seems to be unstable of late.


Breaking Non-Existent Code

I recently ran into a fun problem that stumped me for about half an hour.  I had found a value I could control in the query string, which would put data inside an “onmouseover” attribute on an anchor tag.  So something like:

url:   ?myvalue=xxxx
html:  <a href="#" onmouseover="par=window.parent;par.my_function('xxxx');">test</a>

Normally, one would escape this by setting the “myvalue” parameter to:

x');alert(document.cookie);('x

which, when injected, would terminate the call to my_function and insert my alert box*.

<a href="#" onmouseover="my_function('x');alert(document.cookie);('x');">test</a>

This is pretty straightforward, tried and true.  Except, in the case of this test, the my_function function doesn’t actually exist.  This particular page was expected to be called from an iframe, and it’d walk back up to the parent to call a library the parent had loaded**.  In short, this means that due to a bug in the actual page, my attack wouldn’t work, because JS would stop processing after the failed call to my_function.  Suckage.

But not all is lost.  To get around this normally, we just need to inject ourselves earlier into the page’s process before the page can call the missing function.  One could try:

x');"/><script>window.onload = function(){ alert(document.cookie); }</script><a href="#" onmouseover="('x

which results in:

<a href="#" onmouseover="my_function('x');"/><script>window.onload = function(){ alert(document.cookie); }</script><a href="#" onmouseover="('x');">test</a>

In this payload, we end the onmouseover and include our own script tag.  In this tag, we override the page’s onload behavior***, which allows our alert box to execute before the JS has a chance to fail later.  The JavaScript after that is just a nicety to keep the HTML from being malformed.

Except, yet again we were foiled.

In this case, the developers were actually encoding the " character, so I couldn’t break out of the function call to do this.  Normally this might be end game; but never fear, order of operations prevails in the end.

q: In general programming theory– before a function can be called, it first has to what?
a: Process its arguments.

To get my payload to execute, despite the fact that my_function() doesn’t exist, I merely have to make my attack an argument to that function.  In other words, unless you pass a reference to a function (or a proc itself), the interpreter has to evaluate the argument expressions before it can invoke the function they are being passed to.  The end payload is:

x',alert(document.cookie),'x

which results in:

<a href="#" onmouseover="my_function('x',alert(document.cookie),'x');">test</a>

In this example, the alert box’s result is passed as an argument to the non-existent function, so my payload executes even though the call itself still fails.  If this were anything more than a POC, you’d do something far nastier, such that the end user would never see the broken code.  Because, for all intents and purposes, you fixed it for them.  How nice of you.
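
The same order-of-operations idea, sketched in Python so it’s easy to run standalone.  Parent, my_function, and payload are stand-ins; the getattr lookup mirrors the page reaching through the parent window for a method that isn’t there:

# The callable is resolved first; a missing method yields something
# non-callable instead of an immediate error, so the argument is still
# evaluated before the call itself blows up.
class Parent:              # stand-in for window.parent
    pass

def payload():
    print("payload ran")   # the side effect we care about
    return "x"

my_function = getattr(Parent(), "my_function", None)   # None: no such method
try:
    my_function(payload()) # payload() runs first, THEN the call fails
except TypeError:
    print("my_function never existed, but the payload already executed")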

This topic of fitting is something I will be talking more about in my BSidesDFW talk coming up in November.

-A
* For anything other than this post, you’d generally want to grab the session details and send them to a 3rd-party host, and maybe even redirect the user to the login page so you could try to log in as them shortly thereafter.  But that just makes the example less clear, so we went back to a trusty old alert box.

** The code itself calls into the parent window to make the function call.  But, because that’s more detail than needed, I omitted it for clarity’s sake.

*** You can override the body, document, or window, depending on what you are after.

Of all the things I’ve lost…

A funny thing dawned on me recently.  During the course of an average day, I read code at least 2 to 3 times.  Sometimes it’s to quickly evaluate a plugin I’ve downloaded, sometimes it’s to do a thorough review, and sometimes it’s just because I want to know how something works.  I’ve even been spending more time with languages that I hadn’t looked at in years.  But.  I haven’t written a real application* in over a year.

Don’t get me wrong, I script a fair bit… I did so a little this weekend, in fact.  But it’s hit me how dissimilar that is from actual programming.  Most of my scripts are quick tools to reach places that are too hard to set up macros for, or to parse out some text I want to use later.  I haven’t sat down and designed, built tests, implemented, fixed an API, and released code in well over a year.

And that feeling burns.

I spent well over 12 years of my life writing code.  Not always to the best of quality, mind you– but with lots of focus, energy, and general zeal.  Toward the last few years of my development time, I was actually getting fairly sharp.  I had studied how language affected API development, and had some really nerdy insight into how the CLR and other .NET goodness worked.  I could imagine something, draw it out, write it, and get it working pretty quickly.  I was a good engineer.

Now, I am fumbling over a little project I started over 3 months ago– for no real good reason.  A part of me thinks it’s because I am attempting to write it in a language I’ve not built a medium-sized app in before.  Another part of me thinks it’s because I have so many half-baked patterns in my head that I can’t seem to find one that fits the way I want.  Another part of me, perhaps just my demons speaking, feels like I am getting old and dull.

Either way I slice it– I must finish this code, I think.  It’s a “moral imperative,” as some geniuses I know might say.

* I did write a version of pywebfuzz in Ruby so I could quickly grab payloads– but I can’t and won’t release it in its hacky shape.