Impedance mismatch: a hacker’s best friend.

A security application, such as a Web Application Firewall or an Anti-Virus, can be vulnerable to impedance mismatch attacks if it interprets traffic and input differently than the back-end does.

In this article, we’ll look at three simple but effective evasion attacks.

PHP base64_decode()

The PHP base64_decode function is a great example to illustrate impedance mismatch issues. Let’s assume that we want to detect the base64-encoded eval(base64_decode expression in a variable. We could just search for its ZXZhbChiYXNlNjRfZGVjb2Rl encoded string:

<?php
if ( strpos( $buffer, 'ZXZhbChiYXNlNjRfZGVjb2Rl' ) !== false ) {
    echo 'Base64-encoded "eval(base64_decode" detected!';
}

But let’s have a look at how the PHP base64_decode function really works. The code is found inside the php_base64_decode_ex function from the ext/standard/base64.c file:

PHPAPI zend_string *php_base64_decode_ex(const unsigned char *str, size_t length, zend_bool strict) /* {{{ */
{
    const unsigned char *current = str;
    int ch, i = 0, j = 0, padding = 0;
    zend_string *result;
    result = zend_string_alloc(length, 0);

    /* run through the whole string, converting as we go */
    while (length-- > 0) {
        ch = *current++;
        if (ch == base64_pad) {
            padding++;
            continue;
        }
        ch = base64_reverse_table[ch];
        if (!strict) {
            /* skip unknown characters and whitespace */
            if (ch < 0) {
                continue;
            }
        } else {
            /* skip whitespace */
            if (ch == -1) {
                continue;
            }
            /* fail on bad characters or if any data follows padding */
            if (ch == -2 || padding) {
                goto fail;
            }
        }

In its default, non-strict mode, the function skips any unknown character (i.e., any character that isn’t part of the base64 alphabet: a-z, A-Z, 0-9, + and /, plus the = padding character) and keeps decoding the rest of the string. That means we can inject as many bogus characters as we like into a base64-encoded string to bypass our script without corrupting the PHP decoding process. For instance, we can take the original ZXZhbChiYXNlNjRfZGVjb2Rl string and transform it into Z &X#,]Z)!%h b~C -h}i{ Y--X.# @N`~l~N]]]] ]]j.. R}#fZGVj^b2Rl instead.
It works perfectly:

$ php -r 'echo base64_decode("Z &X#,]Z)!%h b~C -h}i{ Y--X.# @N`~l~N]]]] ]]j.. R}#fZGVj^b2Rl");'
eval(base64_decode

The PHP decoding behaviour isn’t odd at all: it follows RFC 2045, which states that “any characters outside of the base64 alphabet are to be ignored in base64-encoded data”.
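One way to narrow this particular gap is to have the detector normalize its input the same way the lenient decoder does before searching. The following is only a minimal sketch: the contains_encoded_eval helper name is hypothetical, and it assumes that dropping the = padding character is acceptable for matching purposes.

```php
<?php
// Hypothetical helper: strip every byte the non-strict decoder would skip,
// then search the normalized buffer for the encoded signature.
function contains_encoded_eval(string $buffer): bool
{
    // Keep only the base64 alphabet (A-Z, a-z, 0-9, + and /).
    // The "=" padding is dropped too, since the non-strict decoder skips it.
    $normalized = (string) preg_replace('#[^A-Za-z0-9+/]#', '', $buffer);
    return strpos($normalized, 'ZXZhbChiYXNlNjRfZGVjb2Rl') !== false;
}

$payload = 'Z &X#,]Z)!%h b~C -h}i{ Y--X.# @N`~l~N]]]] ]]j.. R}#fZGVj^b2Rl';

// Naive search on the raw buffer: the signature is not found.
var_dump(strpos($payload, 'ZXZhbChiYXNlNjRfZGVjb2Rl') !== false);
// Search on the normalized buffer: the signature is found.
var_dump(contains_encoded_eval($payload));
```

The trade-off is that stripping characters also glues unrelated text together, so a check like this is likely to raise more false positives than the naive one. On the application side, passing true as the second argument to base64_decode() enables the strict mode seen in the source above, which fails on characters outside the alphabet instead of skipping them.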

PHP unserialize() and the mysterious “+” sign

Lately, I decided to add to NinjaFirewall, our PHP web application firewall, a new option that will attempt to block serialized PHP objects found inside a GET or POST request, cookies, user agent and referrer variables.
Basically, a serialized PHP object looks like this one:

O:8:"stdClass":1:{s:3:"foo";s:3:"bar";}

I thought I could block it by searching for an O letter (O for Object), followed by a colon :, a number (one or more digits), a colon :, a double quotation mark ", a valid class name, a double quotation mark ", a colon :, a number, a colon : and a left curly bracket {. That would match the O:8:"stdClass":1:{ part of any serialized object.

The following regular expression looks perfect:
[OC]:\d+:"[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*":\d+:{

But here too, let’s have a look at the PHP source code: the php_var_unserialize_internal function is located in ext/standard/var_unserializer.c:

    switch (yych) {
    case 'C':
    case 'O': goto yy4;
    case 'N':   goto yy5;
    case 'R':   goto yy6;
    case 'S':   goto yy7;
    case 'a':   goto yy8;
    case 'b':   goto yy9;
    case 'd':   goto yy10;
    case 'i':   goto yy11;
    case 'o':   goto yy12;
    case 'r':   goto yy13;
    case 's':   goto yy14;
    case '}':   goto yy15;
    default:    goto yy2;
    }
yy2:
    ++YYCURSOR;
yy3:
    { return 0; }
yy4:
    yych = *(YYMARKER = ++YYCURSOR);
    if (yych == ':') goto yy17;
    goto yy3;
yy5:
    yych = *++YYCURSOR;
    if (yych == ';') goto yy19;
    goto yy3;
yy6:
    yych = *(YYMARKER = ++YYCURSOR);
    if (yych == ':') goto yy21;
    goto yy3;
yy7:
    yych = *(YYMARKER = ++YYCURSOR);
    if (yych == ':') goto yy22;
    goto yy3;
yy8:
    yych = *(YYMARKER = ++YYCURSOR);
    if (yych == ':') goto yy23;
    goto yy3;
yy9:
    yych = *(YYMARKER = ++YYCURSOR);
    if (yych == ':') goto yy24;
    goto yy3;
yy10:
    yych = *(YYMARKER = ++YYCURSOR);
    if (yych == ':') goto yy25;
    goto yy3;
yy11:
    yych = *(YYMARKER = ++YYCURSOR);
    if (yych == ':') goto yy26;
    goto yy3;
yy12:
    yych = *(YYMARKER = ++YYCURSOR);
    if (yych == ':') goto yy27;
    goto yy3;
yy13:
    yych = *(YYMARKER = ++YYCURSOR);
    if (yych == ':') goto yy28;
    goto yy3;
yy14:
    yych = *(YYMARKER = ++YYCURSOR);
    if (yych == ':') goto yy29;
    goto yy3;
yy15:
    ++YYCURSOR;
    {
    /* this is the case where we have less data than planned */
    php_error_docref(NULL, E_NOTICE, "Unexpected end of serialized data");
    return 0; /* not sure if it should be 0 or 1 here? */
}
yy17:
    yych = *++YYCURSOR;
    if (yybm[0+yych] & 128) {
        goto yy31;
    }
    if (yych == '+') goto yy30;
yy18:
    YYCURSOR = YYMARKER;
    goto yy3;
yy19:
    ++YYCURSOR;
    {
    *p = YYCURSOR;
    ZVAL_NULL(rval);
    return 1;
}

The parser first looks for the O character, then jumps to label yy4, where it makes sure that the next character is a : colon; otherwise, it returns an error. Then it jumps to label yy17, where it accepts an optional + sign that may precede the integer! You can read the PHP unserialize() documentation as many times as you want; you won’t find any reference to that mysterious + sign. But it’s there, in the source code.
Therefore, if a hacker wanted to bypass my regex, he could inject the following serialized object:

O:+8:"stdClass":+1:{s:+3:"foo";s:+3:"bar";}

PHP will happily decode it without throwing any error:

$ php -r "print_r(unserialize('O:+8:\"stdClass\":+1:{s:+3:\"foo\";s:+3:\"bar\";}'));"

stdClass Object
(
    [foo] => bar
)
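To see the mismatch from the filter’s side, the sketch below runs the regex from above against both payloads, next to a variant that tolerates an optional + before each integer. The $hardened pattern is an assumption made for illustration; it only covers the + sign discussed here, not every quirk the parser may accept.

```php
<?php
// The original pattern from above, wrapped in PCRE delimiters.
$original = '/[OC]:\d+:"[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*":\d+:\{/';
// Assumed fix: allow an optional "+" before each integer, mirroring the parser.
$hardened = '/[OC]:\+?\d+:"[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*":\+?\d+:\{/';

$plain  = 'O:8:"stdClass":1:{s:3:"foo";s:3:"bar";}';
$signed = 'O:+8:"stdClass":+1:{s:+3:"foo";s:+3:"bar";}';

// Print, for each payload, whether each pattern matches (1) or not (0).
foreach ([$plain, $signed] as $input) {
    printf(
        "%s original=%d hardened=%d\n",
        $input,
        preg_match($original, $input),
        preg_match($hardened, $input)
    );
}
```

The original pattern matches only the first payload, while the variant matches both. Since the parser’s behaviour can change between PHP versions, a pattern like this needs to be rechecked against the source of each release it is meant to protect.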

WordPress User Enumeration

Impedance mismatch issues can also occur with a CMS such as WordPress. Take, for instance, the author query behind “User Enumeration”: it shows the posts associated with a given author. There are several ways to use it, among them:

  • Show posts from multiple authors with IDs 1, 3, 6 and 9: ?author=1,3,6,9
  • Show posts from all authors, except author with ID 123: ?author=-123
  • Show only posts from author with ID 123: ?author=123

The last one is the one hackers use to retrieve the username associated with an ID in order to launch more accurate brute-force attacks. Many articles recommend blocking user enumeration at the .htaccess level with a regex similar to this one:

^author=[0-9]+

But let’s check the WordPress source code to find out if it can really protect us: the WP_Query class is defined in the wp-includes/class-wp-query.php script:

$qv['author'] = preg_replace( '|[^0-9,-]|', '', $qv['author'] ); // comma separated list of positive or negative integers

It shows that WordPress removes any unwanted characters from the author variable (i.e., characters that aren’t a digit, a comma , or a hyphen -) and carries on, rather than rejecting the request because it contains them. It does exactly what PHP does in the base64_decode() function. Therefore, if a hacker wanted to check user ID 123 and bypass our regex, he could send the following request:

?author=you1cant2catch3me
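A filter can avoid this mismatch by applying the same sanitization WordPress applies and then deciding on the value WordPress would actually use. Here is a minimal sketch: the is_author_enumeration function is hypothetical, and treating a lone positive ID as the enumeration case is an assumption based on the list above.

```php
<?php
// Hypothetical filter: sanitize the parameter like WP_Query does,
// then inspect the result instead of the raw input.
function is_author_enumeration(string $author): bool
{
    // Same pattern as the WP_Query line quoted above.
    $sanitized = (string) preg_replace('|[^0-9,-]|', '', $author);
    // Assumption for this sketch: a single positive ID is the enumeration case.
    return preg_match('/^[0-9]+$/', $sanitized) === 1;
}

// The .htaccess-style regex does not match the disguised query string.
var_dump(preg_match('/^author=[0-9]+/', 'author=you1cant2catch3me'));
// After sanitization the value is "123", so the request is flagged.
var_dump(is_author_enumeration('you1cant2catch3me'));
// "-123" (exclude author 123) is not flagged.
var_dump(is_author_enumeration('-123'));
```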

A hacker’s best friend

Impedance mismatch issues can occur at any level: the HTTP server (nginx, Apache, etc.), the PHP interpreter (PHP5, PHP7, HHVM), the CMS (WordPress, Joomla, etc.), the database server (MySQL, MariaDB, PostgreSQL, etc.) and even on the client side (i.e., the browser, which could lead to cross-site scripting attacks) or the OS.
They are hard to spot and to prevent, because doing so requires reading the source code of each of those applications (if available; otherwise you may need to disassemble them) and keeping an eye on how they evolve, since new updates can bring new issues. But from a hacker’s point of view, impedance mismatch attacks are a very effective way to bypass WAFs and other security applications.