Anatomy of the EICAR Antivirus Test File.

Revision: August 26, 2021

A customer using NinjaFirewall (WP+), our Web Application Firewall for WordPress, asked us to explain the meaning of this line, found in NinjaFirewall’s log:

178.137.xx.xx POST /index.php - EICAR Standard Anti-Virus Test File blocked - [favico.gif, 68 bytes]

It means that someone attempted to upload a 68-byte EICAR anti-virus test file (disguised as a favico.gif image), probably to test if the WordPress blog was protected by a security application. And indeed, it was: NinjaFirewall blocked it.
I referred our customer to the official EICAR AV Test File page, but he came back with another question: how can this bunch of characters be a program and print a message to the screen?

The EICAR Anti-Virus Test File is made up of 68 printable ASCII characters that you can view with your favorite text editor:

X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*

But it is a legitimate 16-bit DOS program. Let’s take for instance the first character of the string:

  • In the ASCII table, it is simply the X character.
  • In Decimal format, it is 88.
  • In Hexadecimal format, it is 58h.
  • In x86 assembly language, i.e., the language that the CPU in a computer can understand and follow, it is a specific instruction: pop ax.
    This instruction pops the 2 bytes at the top of the stack, ss:[sp], into the 16-bit ax register.

If you take into consideration that the EICAR AV Test File contains 67 other characters, you can probably understand now why it is more than just a “bunch of characters”.
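
The different views of that first byte can be reproduced with a few lines of Python. This is only an illustrative sketch: the ONE_BYTE_OPCODES table is a hypothetical helper that covers just the single-byte instructions appearing in this program, not a real disassembler.

```python
# The 68-byte test file, as printed earlier in the article.
EICAR = rb"X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*"

# Hypothetical lookup table: only the one-byte x86 opcodes used by this program.
ONE_BYTE_OPCODES = {
    0x50: "push ax",
    0x58: "pop ax",
    0x5A: "pop dx",
    0x5B: "pop bx",
    0x5E: "pop si",
    0x43: "inc bx",
    0x48: "dec ax",
}

first = EICAR[0]                 # indexing a bytes object yields an integer
print(chr(first))                # ASCII character: X
print(first)                     # decimal value: 88
print(f"{first:02X}h")           # hexadecimal value: 58h
print(ONE_BYTE_OPCODES[first])   # x86 instruction: pop ax
```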

Disassembly Listing

Loading the EICAR AV Test File into a disassembler will produce the following listing. The first column shows the current segment:offset, the second one the program opcodes and the last one shows the corresponding x86 instructions.

; Beginning of the executable code (28 bytes):
0001:0100   58       pop ax
0001:0101   354F21   xor ax, 214Fh
0001:0104   50       push ax
0001:0105   254041   and ax, 4140h
0001:0108   50       push ax
0001:0109   5B       pop bx
0001:010A   345C     xor al, 5Ch
0001:010C   50       push ax
0001:010D   5A       pop dx
0001:010E   58       pop ax
0001:010F   353428   xor ax, 2834h
0001:0112   50       push ax
0001:0113   5E       pop si
0001:0114   2937     sub [bx], si
0001:0116   43       inc bx
0001:0117   43       inc bx
0001:0118   2937     sub [bx], si
0001:011A   7D24     jge 0140

; The '$'-terminated string (36 bytes):
0001:011C   db       'EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$'

; End of the executable code (4 bytes):
0001:0140   48         dec ax
0001:0141   2B482A     sub cx, [bx+si+2Ah]

Out of the 68 bytes, only 32 are used for the executable code: the first 28 bytes and the last 4. The remaining 36 bytes are the EICAR-STANDARD-ANTIVIRUS-TEST-FILE! string plus its trailing $ character.
At first sight, the last four bytes don’t make sense: the program decrements the ax register at offset 0x0140, then subtracts a value from the cx register at offset 0x0141, but it makes no use of either result before exiting. A deeper code analysis will help us understand that part.
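
The split of the file into leading code, string data and trailing code can be checked mechanically. The sketch below assumes the usual COM load offset of 0100h, as shown in the listing; the variable names are illustrative only.

```python
EICAR = rb"X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*"
LOAD_OFFSET = 0x100  # offset at which the listing shows the first byte

# Locate the message and its '$' terminator. The search for '$' starts at the
# message itself, because the byte 24h ('$') also appears inside the jge opcode.
msg_start = EICAR.index(b"EICAR")
msg_end = EICAR.index(b"$", msg_start) + 1

code_head = EICAR[:msg_start]
message = EICAR[msg_start:msg_end]
code_tail = EICAR[msg_end:]

position = 0
for name, chunk in (("code", code_head), ("string", message), ("code", code_tail)):
    print(f"{LOAD_OFFSET + position:04X}h  {name:<6}  {len(chunk):2d} bytes")
    position += len(chunk)
```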

Code Analysis

The first instruction pops two bytes from the top of the stack, ss:[sp], into the ax register:

0001:0100 58       pop ax

Because DOS pushes a zero word onto the stack before starting a COM program, the popped value is 0000h, so this simply clears ax. This is equivalent to instructions such as mov ax, 0 or the faster (and more elegant) xor ax, ax.

It makes an XOR mask with ax and 214Fh:

0001:0101 354F21   xor ax, 214Fh

Because ax was zero, it is now equal to 214Fh.
It is saved on the stack:

0001:0104 50       push ax

It makes an AND mask with ax (214Fh) and 4140h:

0001:0105 254041   and ax, 4140h

To make a mask, we convert both values to their binary notation:

214Fh: 0010000101001111
4140h: 0100000101000000
------------------------
AND    0000000101000000 => 140h

The new value of ax is 0140h. Note that this is the offset of the first byte following the EICAR string data.

The value of ax is pushed on the stack, and popped back into bx:

0001:0108 50       push ax
0001:0109 5B       pop bx

It makes an XOR mask with al and 5Ch:

0001:010A 345C     xor al, 5Ch

The value of ax is currently 0140h. al is the lower byte of ax (40h), while ah is the higher one (01h):

40h: 01000000
5Ch: 01011100
--------------
XOR  00011100 => 1Ch

The new value of al is 1Ch and hence ax is now equal to 011Ch. This is the offset of the EICAR string.
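
The two masks above can be reproduced as follows. This is a minimal sketch: show_mask is a hypothetical helper written for this article, and it only prints the operands and the result in binary, in the same layout as the tables above.

```python
def show_mask(op, a, b, width):
    """Print a, b and (a op b) in binary, then return the result.

    op is "AND" or "XOR"; width is the operand size in bits (8 or 16).
    """
    result = a & b if op == "AND" else a ^ b
    digits = width // 4          # number of hexadecimal digits to print
    pad = digits + 2             # aligns the last line with the "XXXXh:" labels
    print(f"{a:0{digits}X}h: {a:0{width}b}")
    print(f"{b:0{digits}X}h: {b:0{width}b}")
    print(f"{op:<{pad}} {result:0{width}b} => {result:0{digits}X}h")
    return result


ax = 0x214F
ax = show_mask("AND", ax, 0x4140, 16)   # and ax, 4140h
bx = ax                                 # push ax / pop bx

# xor al, 5Ch only touches the low byte; ah (the high byte) is preserved.
al = show_mask("XOR", ax & 0xFF, 0x5C, 8)
ax = (ax & 0xFF00) | al

print(f"bx = {bx:04X}h, ax = {ax:04X}h")
```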
It saves the 011Ch value on the stack and pops it back into the dx register for later use (remember this one, it is important):

0001:010C 50       push ax
0001:010D 5A       pop dx

It pops the word at the top of the stack into ax. It is 214Fh (pushed by the instruction at offset 0x0104):

0001:010E 58       pop ax

It makes an XOR mask with ax and 2834h:

0001:010F 353428   xor ax, 2834h

214Fh: 0010000101001111
2834h: 0010100000110100
------------------------
XOR    0000100101111011 => 097Bh

ax is now equal to 097Bh. The important value to remember here is the higher byte, 09h, which is stored in ah. The value of ax is saved on the stack and popped back into the si register:

0001:0112 50       push ax
0001:0113 5E       pop si
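
The register and stack juggling up to this point can be traced with a Python list standing in for the stack. This is a sketch under one assumption: the word on top of the stack when the program starts is 0000h, which is what makes the first pop clear ax.

```python
# Assumption: the word on top of the stack at start-up is 0000h.
stack = [0x0000]

ax = stack.pop()                            # 0100: pop ax         -> 0000h
ax ^= 0x214F                                # 0101: xor ax, 214Fh  -> 214Fh
stack.append(ax)                            # 0104: push ax
ax &= 0x4140                                # 0105: and ax, 4140h  -> 0140h
stack.append(ax)                            # 0108: push ax
bx = stack.pop()                            # 0109: pop bx         -> 0140h
ax = (ax & 0xFF00) | ((ax & 0xFF) ^ 0x5C)   # 010A: xor al, 5Ch    -> 011Ch
stack.append(ax)                            # 010C: push ax
dx = stack.pop()                            # 010D: pop dx         -> 011Ch
ax = stack.pop()                            # 010E: pop ax         -> 214Fh
ax ^= 0x2834                                # 010F: xor ax, 2834h  -> 097Bh
stack.append(ax)                            # 0112: push ax
si = stack.pop()                            # 0113: pop si         -> 097Bh

ah = ax >> 8                                # high byte of ax
print(f"ax={ax:04X}h bx={bx:04X}h dx={dx:04X}h si={si:04X}h ah={ah:02X}h")
```

Every push is matched by a pop, so the stack ends up empty again once the initial word has been consumed.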

And now, here comes the good part: the self-modifying code.

0001:0114 2937     sub [bx], si

bx is equal to 0140h, so the word at [bx] is made of the 2 bytes at offset 0x0140, 48h and 2Bh, which read as 2B48h (little-endian data format). Minus the value of si, 097Bh, the new word at [bx] becomes 21CDh.
If you are familiar with assembly language and the DOS interrupts, you probably noticed that 21CDh, stored in memory as the bytes CDh 21h, is the encoding of the DOS interrupt 21h call: int 21h.

Then, it increments bx (0140h) twice:

0001:0116 43       inc bx
0001:0117 43       inc bx

bx is now equal to 0142h, which means that it points to the last two bytes of the program, respectively 48h and 2Ah.

It subtracts si from the word at [bx]:

0001:0118 2937     sub [bx], si

2A48h minus si (097Bh) gives 20CDh. Here too, we can easily recognize another DOS interrupt call, int 20h, used to terminate a COM program.
The last two bytes of the program are patched on the fly with that word. This is the end of the self-modifying code.
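
The two patches can be replayed on a copy of the file held in a bytearray. This is only a sketch of the arithmetic, not an emulator: sub_word is a hypothetical helper, and the 0100h load offset and the value of si are taken from the analysis above.

```python
import struct

EICAR = rb"X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*"
LOAD_OFFSET = 0x100          # offset of the first byte, as in the listing

image = bytearray(EICAR)     # mutable copy standing in for the program in memory
si = 0x097B


def sub_word(memory, offset, value):
    """Emulate 'sub [offset], value' on a 16-bit little-endian word."""
    index = offset - LOAD_OFFSET
    (word,) = struct.unpack_from("<H", memory, index)
    word = (word - value) & 0xFFFF           # wrap around like a 16-bit register
    struct.pack_into("<H", memory, index, word)
    return word


bx = 0x0140
first_patch = sub_word(image, bx, si)    # 2B48h - 097Bh
bx += 2                                  # inc bx / inc bx
second_patch = sub_word(image, bx, si)   # 2A48h - 097Bh

print(f"{first_patch:04X}h {second_patch:04X}h")
print(image[-4:].hex(" "))
```

The last four bytes of the image now read cd 21 cd 20, that is, int 21h followed by int 20h.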

The next instruction is a conditional jump: it jumps to offset 0x0140 if the Sign flag (SF) equals the Overflow flag (OF). Both flags were set by the preceding sub, whose result (20CDh) is positive and did not overflow, so SF = OF = 0 and the condition is always met. The program therefore jumps over the EICAR string (note that there is no alternative, otherwise the COM program would crash trying to execute the string!):

0001:011A 7D24     jge 0140
0001:011C          db 'EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$'
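
Why the jump is taken can be checked by computing the two flags that jge tests. The sketch below models only SF and OF for a 16-bit subtraction, with a hypothetical helper named sub16_flags; it also recomputes the jump target from the 24h displacement byte in the opcode.

```python
def sub16_flags(a, b):
    """Return (result, SF, OF) for the 16-bit subtraction a - b."""
    result = (a - b) & 0xFFFF
    sf = result >> 15                              # sign flag: bit 15 of the result
    # Signed overflow: the operands have different signs and the sign of the
    # result differs from the sign of the first operand.
    of = (((a ^ b) & (a ^ result)) >> 15) & 1
    return result, sf, of


# Last flag-setting instruction before the jump: sub [bx], si at offset 0x0118.
result, sf, of = sub16_flags(0x2A48, 0x097B)
jump_taken = sf == of                              # jge condition: SF == OF

# The displacement (24h) is relative to the instruction following the 2-byte jge.
target = 0x011A + 2 + 0x24

print(f"result={result:04X}h SF={sf} OF={of} taken={jump_taken} target={target:04X}h")
```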

And now, we can see the result of the self-modifying code. In the original disassembly listing, we had these two nonsense instructions:

0001:0140 48         dec ax
0001:0141 2B482A     sub cx, [bx+si+2Ah]

We just patched them on the fly, and now we have:

0001:0140 CD21     int 21h
0001:0142 CD20     int 20h

It will use the values of ah and ds:dx that were manipulated during the execution of the program to call the interrupt 21h with the following parameters:

  • ah = 09h: DOS int 21h, display string service 09h.
  • ds:dx = 011Ch: the offset of the ‘$’ terminated string to display to the screen, which here points to the 36-byte EICAR string (35 characters plus the ‘$’ terminator).

Lastly, it calls the DOS interrupt 20h.

That’s it. This COM program simply prints the “EICAR-STANDARD-ANTIVIRUS-TEST-FILE!” message and quits.

Post Analysis

The whole program could have been written in a much simpler way:

jmp     @start
msg     db "EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$"
@start:
mov     dx, offset msg
mov     ah,09h
int     21h
int     20h

But it wouldn’t match the requirements:

  • The EICAR AV Test file must only use the following printable ASCII characters: UPPER CASE letters, digits and punctuation marks. No other characters are allowed.
  • Its code can be copied and pasted with any text editor; there is no need to base64-encode it. It can also be printed.
  • No compiler or linker is needed to build it.

Regarding the self-modifying code, it was used for two reasons:

  • The int 21h and int 20h x86 instructions cannot be written with printable ASCII characters, because their opcode byte, CDh, lies outside the ASCII range. Creating them on the fly was the best solution.
  • For decades, viruses have used self-modifying code (SMC) as an evasion technique to bypass anti-virus programs. Because the EICAR AV Test file is intended to be used to test anti-virus programs and should be treated as a virus, self-modifying code simply adds some fun to this small but clever 68-byte COM program.
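
Both points can be verified with a short check. In this sketch, "punctuation marks" is assumed to mean Python's string.punctuation set, which is an interpretation made for this example rather than an official definition.

```python
import string

EICAR = rb"X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*"

# Assumption: the allowed set is upper-case letters, digits and string.punctuation.
ALLOWED = set((string.ascii_uppercase + string.digits + string.punctuation).encode("ascii"))

# Every byte of the file, as stored on disk, is in the allowed set.
file_is_printable = all(byte in ALLOWED for byte in EICAR)

# The bytes the program needs at run time (int 21h, int 20h) are not all allowed.
patched_tail = bytes([0xCD, 0x21, 0xCD, 0x20])
disallowed = sorted({byte for byte in patched_tail if byte not in ALLOWED})

print(file_is_printable)
print([f"{byte:02X}h" for byte in disallowed])
```

The file on disk passes the check, while CDh and 20h (the space character) are rejected, which is why those bytes have to be produced at run time.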

Detection

If you are a developer and want your security application to block the EICAR AV Test file, here is how to detect it accurately:

  • An EICAR AV Test file must start with these 68 bytes:
    X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*

    This is absolutely mandatory. Because it is a COM program, if any character preceded it, it would crash instead of running.

  • Optionally, it may be followed by whitespace characters, but the total length of the file must not exceed 128 bytes. Only five types of whitespace characters are allowed: Tab (0x09), Line Feed (0x0A), Carriage Return (0x0D), Space (0x20), and Ctrl-Z (0x1A).

If the file meets all the above conditions, your application must treat it as a virus and block it. Otherwise, it must ignore it.
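
The rules above translate into a few lines of code. The following is a minimal sketch rather than a reference implementation: the function name is_eicar_test_file is hypothetical, and it assumes the whole file content is available in memory as bytes.

```python
EICAR = rb"X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*"

# Tab (0x09), Line Feed (0x0A), Carriage Return (0x0D), Space (0x20), Ctrl-Z (0x1A).
ALLOWED_TRAILING = frozenset(b"\t\n\r \x1a")
MAX_LENGTH = 128


def is_eicar_test_file(data: bytes) -> bool:
    """Return True if data is a valid EICAR test file according to the rules above."""
    if len(data) > MAX_LENGTH:
        return False
    if not data.startswith(EICAR):
        return False
    # Anything after the 68-byte signature must be allowed whitespace.
    return all(byte in ALLOWED_TRAILING for byte in data[len(EICAR):])


print(is_eicar_test_file(EICAR))              # exact 68-byte file
print(is_eicar_test_file(EICAR + b"\r\n"))    # trailing CR/LF added by an editor
print(is_eicar_test_file(b" " + EICAR))       # leading byte: not a test file
print(is_eicar_test_file(EICAR + b"A"))       # trailing non-whitespace byte
```

The first two calls return True and the last two return False. Checking the length first keeps the function cheap on large uploads, since anything over 128 bytes is rejected without further inspection.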