Revision: August 26, 2021
A customer using NinjaFirewall (WP+), our Web Application Firewall for WordPress, asked us to explain the meaning of this line, found in NinjaFirewall’s log:
178.137.xx.xx POST /index.php - EICAR Standard Anti-Virus Test File blocked - [favico.gif, 68 bytes]
It means that someone attempted to upload a 68-byte EICAR anti-virus test file (disguised as a favico.gif image), probably to test if the WordPress blog was protected by a security application. And indeed, it was: NinjaFirewall blocked it.
I pointed our customer to the official EICAR AV Test File page, but he came back with another question: how can this bunch of characters be a program and print a message to the screen?
The EICAR Anti-Virus Test File is made up of 68 printable ASCII characters that you can view with your favorite text editor:
X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*
But it is a legitimate 16-bit DOS program. Let’s take for instance the first character of the string:
- In the ASCII table, it is simply the X character.
- In Decimal format, it is 88.
- In Hexadecimal format, it is 58h.
- In x86 assembly language, i.e., the language that the CPU in a computer can understand and follow, it is a specific instruction:
pop ax
This instruction moves (pops) the 2 bytes at the top of the stack, ss:[sp], into the 16-bit ax register.
If you take into consideration that the EICAR AV Test File contains 67 other characters, you can probably understand now why it is more than just a “bunch of characters”.
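As a quick check, the same byte can be viewed in all three notations with a few lines of Python (a minimal sketch; the variable names are only illustrative):

```python
# The first character of the EICAR test file, viewed three ways.
first_char = "X"
decimal_value = ord(first_char)        # ASCII code as a decimal integer
hex_value = f"{decimal_value:02X}h"    # same value in hexadecimal notation

print(first_char, decimal_value, hex_value)   # X 88 58h
# 58h is also the one-byte x86 opcode for "pop ax".
```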
Loading the EICAR AV Test File into a disassembler will produce the following listing. The first column shows the current segment:offset, the second one the program opcodes and the last one shows the corresponding x86 instructions.
; Beginning of the executable code (28 bytes):
0001:0100 58 pop ax
0001:0101 354F21 xor ax, 214Fh
0001:0104 50 push ax
0001:0105 254041 and ax, 4140h
0001:0108 50 push ax
0001:0109 5B pop bx
0001:010A 345C xor al, 5Ch
0001:010C 50 push ax
0001:010D 5A pop dx
0001:010E 58 pop ax
0001:010F 353428 xor ax, 2834h
0001:0112 50 push ax
0001:0113 5E pop si
0001:0114 2937 sub [bx], si
0001:0116 43 inc bx
0001:0117 43 inc bx
0001:0118 2937 sub [bx], si
0001:011A 7D24 jge 0140
; The '$'-terminated string (36 bytes):
0001:011C db 'EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$'
; End of the executable code (4 bytes):
0001:0140 48 dec ax
0001:0141 2B482A sub cx, [bx+si+2Ah]
Out of the 68 bytes, only 32 are used for the executable code: the first 28 bytes and the last 4. The remaining 36 bytes are the 35-character EICAR-STANDARD-ANTIVIRUS-TEST-FILE! string plus its trailing $ terminator.
Regarding the last four bytes, they just don’t seem to make sense: the program decrements the ax register at offset 0x0140, and then subtracts a value from the cx register at offset 0x0141, but it does not make any use of them before exiting. A deeper code analysis will help us understand that part.
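The layout of the file can be double-checked by rebuilding it from the opcode column of the listing. The Python sketch below is only a verification aid: the names HEAD, MESSAGE, TAIL and LOAD_OFFSET are made up for this example, and it assumes the usual COM load offset of 0100h.

```python
# Opcode bytes copied from the disassembly listing (code before the string).
HEAD = bytes.fromhex(
    "58 354F21 50 254041 50 5B 345C 50 5A 58 353428 50 5E 2937 43 43 2937 7D24"
)
MESSAGE = b"EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$"   # text plus '$' terminator
TAIL = bytes.fromhex("48 2B 48 2A")                 # the four trailing bytes

eicar = HEAD + MESSAGE + TAIL
LOAD_OFFSET = 0x100   # COM programs are loaded at offset 0100h

print(len(HEAD), len(MESSAGE), len(TAIL), len(eicar))    # 28 36 4 68
print(f"{LOAD_OFFSET + len(HEAD):04X}h")                  # 011Ch, start of the string
print(f"{LOAD_OFFSET + len(HEAD) + len(MESSAGE):04X}h")   # 0140h, start of the tail
print(all(0x21 <= b <= 0x7E for b in eicar))              # True: printable ASCII only
```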
The first instruction pops two bytes from the top of the stack, ss:[sp], into the ax register:
0001:0100 58 pop ax
DOS pushes a zero word onto the stack before starting a COM program, so this instruction simply clears ax. It is equivalent to instructions such as mov ax, 0 or the faster (and more elegant) xor ax, ax.
The next instruction XORs ax with 214Fh:
0001:0101 354F21 xor ax, 214Fh
Since ax was zero, it is now equal to 214Fh.
It is saved on the stack:
0001:0104 50 push ax
It ANDs ax (214Fh) with 4140h:
0001:0105 254041 and ax, 4140h
To apply the mask, we convert both values to binary notation:
214Fh: 0010000101001111
4140h: 0100000101000000
------------------------
AND    0000000101000000 => 0140h
The new value of ax is 0140h. Note that this is the offset of the first byte following the EICAR string data.
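The register arithmetic so far can be reproduced with Python's bitwise operators. This is just a sketch of the first few instructions, with ax modeled as a plain integer and the pushed value kept in a variable named saved (a name chosen for this example):

```python
ax = 0x0000            # value popped from the stack by "pop ax"
ax ^= 0x214F           # xor ax, 214Fh
saved = ax             # push ax (kept for a later pop)
ax &= 0x4140           # and ax, 4140h

print(f"{saved:016b}")   # 0010000101001111
print(f"{ax:016b}")      # 0000000101000000
print(f"{ax:04X}h")      # 0140h
```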
The value of ax is pushed on the stack, and popped back into bx:
0001:0108 50 push ax
0001:0109 5B pop bx
It XORs al with 5Ch:
0001:010A 345C xor al, 5Ch
ax is currently 0140h. al is the lower byte of ax (40h), while ah is the higher one (01h):
40h: 01000000
5Ch: 01011100
--------------
XOR  00011100 => 1Ch
The new value of al is 1Ch and hence ax is now equal to 011Ch. This is the offset of the EICAR string.
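Here is the same byte-level operation sketched in Python: split ax into ah and al, XOR only the low byte, then put the register back together.

```python
ax = 0x0140
ah, al = ax >> 8, ax & 0xFF    # high byte 01h, low byte 40h
al ^= 0x5C                     # xor al, 5Ch
ax = (ah << 8) | al            # ah is untouched by the 8-bit operation

print(f"al={al:02X}h ax={ax:04X}h")   # al=1Ch ax=011Ch
```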
It saves the 011Ch value on the stack and pops it back into the dx register for later use (remember this one, it is important):
0001:010C 50 push ax
0001:010D 5A pop dx
It pops the word at the top of the stack into ax. It is 214Fh (see the instruction at offset 0x0104):
0001:010E 58 pop ax
It XORs ax with 2834h:
0001:010F 353428 xor ax, 2834h
214Fh: 0010000101001111
2834h: 0010100000110100
------------------------
XOR    0000100101111011 => 097Bh
ax is now equal to 097Bh. The important value to remember here is the higher byte, 09h, which is stored in ah. The value of ax is saved on the stack and popped back into the si register:
0001:0112 50 push ax
0001:0113 5E pop si
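A short Python check of this step shows why ah ends up holding 09h (again a sketch, with the registers modeled as plain integers):

```python
ax = 0x214F            # value restored by "pop ax"
ax ^= 0x2834           # xor ax, 2834h
ah = ax >> 8           # high byte, later used as the int 21h service number
si = ax                # push ax / pop si

print(f"ax={ax:04X}h ah={ah:02X}h si={si:04X}h")   # ax=097Bh ah=09h si=097Bh
```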
And now, here comes the good part: the self-modifying code.
0001:0114 2937 sub [bx], si
bx is equal to 0140h and hence the word at [bx] consists of the 2 bytes at offset 0x0140, 48h and 2Bh, which gives us 2B48h (little-endian format). After subtracting the value of si, 097Bh, the new word at [bx] becomes 21CDh.
If you are familiar with assembly language and DOS interrupts, you probably noticed that 21CDh, stored in memory as the bytes CDh 21h, is the opcode of the DOS interrupt 21h call:
int 21h
Then, it increments bx (0140h) twice:
0001:0116 43 inc bx
0001:0117 43 inc bx
bx is now equal to 0142h, which means that it points to the last two bytes of the program, 48h and 2Ah.
It subtracts si from the word at [bx]:
0001:0118 2937 sub [bx], si
The word at [bx], 2A48h, minus si (097Bh) gives 20CDh. Here too, we can easily recognize another DOS interrupt call, int 20h, used to terminate a COM program.
The last two bytes of the program are patched on the fly with that word. This is the end of the self-modifying code.
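The two patches can be replayed in Python on a copy of the last four bytes. This sketch relies on int.from_bytes and int.to_bytes to handle the little-endian word order; the helper name sub_word is invented for the example.

```python
def sub_word(memory: bytearray, offset: int, value: int) -> None:
    """Emulate a 16-bit "sub [offset], value" on little-endian memory."""
    word = int.from_bytes(memory[offset:offset + 2], "little")
    result = (word - value) & 0xFFFF          # wrap to 16 bits like the CPU
    memory[offset:offset + 2] = result.to_bytes(2, "little")

tail = bytearray.fromhex("48 2B 48 2A")   # bytes at offsets 0140h..0143h
si = 0x097B

sub_word(tail, 0, si)    # first  sub [bx], si  (bx = 0140h)
sub_word(tail, 2, si)    # second sub [bx], si  (bx = 0142h)

print(tail.hex(" ").upper())   # CD 21 CD 20
```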
The next instruction is a conditional one and means: jump to offset 0x0140 if the Sign flag (SF) equals the Overflow flag (OF). In this program, the condition is always met and hence it will jump over the EICAR string (note that there is no alternative, otherwise the COM program would crash trying to execute the string!):
0001:011A 7D24 jge 0140
0001:011C db 'EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$'
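To see why the condition holds, the Sign and Overflow flags left by the last subtraction can be computed by hand. The following Python sketch models a 16-bit sub; the function name sub16_flags is an assumption made for this example.

```python
def sub16_flags(a: int, b: int):
    """Return (result, SF, OF) for the 16-bit subtraction a - b."""
    result = (a - b) & 0xFFFF
    sf = result >> 15                                # sign bit of the result
    # Signed overflow: the operands have different signs and the sign of
    # the result differs from the sign of the first operand.
    of = (((a ^ b) & (a ^ result)) >> 15) & 1
    return result, sf, of

result, sf, of = sub16_flags(0x2A48, 0x097B)   # the second sub [bx], si
print(f"{result:04X}h SF={sf} OF={of}")         # 20CDh SF=0 OF=0

# jge jumps when SF == OF; the target is the next offset plus the rel8 byte.
target = 0x011C + 0x24 if sf == of else 0x011C
print(f"{target:04X}h")                         # 0140h
```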
And now, we can see the result of the self-modifying code. In the original disassembly listing, we had these two nonsense instructions:
0001:0140 48 dec ax
0001:0141 2B482A sub cx, [bx+si+2Ah]
We just patched them on the fly, and now we have:
0001:0140 CD21 int 21h
0001:0142 CD20 int 20h
It will use the values of ah and ds:dx that were set during the execution of the program to call interrupt 21h with the following parameters:
- ah = 09h: DOS int 21h service 09h, display string.
- ds:dx = 011Ch: the offset of the ‘$’-terminated string to display on the screen, which here points to the 35-character EICAR string.
Lastly, it calls the DOS interrupt 20h.
That’s it. This COM program simply prints the “EICAR-STANDARD-ANTIVIRUS-TEST-FILE!” message and quits.
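Putting all the steps together, the whole program can be replayed with a very small emulator. The Python sketch below supports only the opcodes that appear in the listing and models int 21h service 09h as appending to a string. The function name run_com and its internal structure are illustrative assumptions; this is not a general x86 emulator.

```python
def run_com(program: bytes) -> str:
    """Emulate only the opcodes that appear in the EICAR test file."""
    mem = bytearray(0x10000)
    mem[0x100:0x100 + len(program)] = program   # COM images load at 0100h
    reg = {"ax": 0, "bx": 0, "dx": 0, "si": 0}
    stack = [0x0000]          # DOS pushes a zero word before starting
    pops = {0x58: "ax", 0x5B: "bx", 0x5A: "dx", 0x5E: "si"}
    # Simplification: only "sub" updates SF/OF here. That is enough because
    # the second sub is the last flag-setting instruction before jge.
    sf = of = 0
    ip, output = 0x100, ""
    while True:
        op, arg = mem[ip], mem[ip + 1]
        imm16 = arg | (mem[ip + 2] << 8)
        if op == 0x50:                          # push ax
            stack.append(reg["ax"])
            ip += 1
        elif op in pops:                        # pop ax / bx / dx / si
            reg[pops[op]] = stack.pop()
            ip += 1
        elif op == 0x35:                        # xor ax, imm16
            reg["ax"] ^= imm16
            ip += 3
        elif op == 0x25:                        # and ax, imm16
            reg["ax"] &= imm16
            ip += 3
        elif op == 0x34:                        # xor al, imm8
            reg["ax"] ^= arg
            ip += 2
        elif op == 0x43:                        # inc bx
            reg["bx"] = (reg["bx"] + 1) & 0xFFFF
            ip += 1
        elif op == 0x29 and arg == 0x37:        # sub [bx], si
            bx, si = reg["bx"], reg["si"]
            word = mem[bx] | (mem[bx + 1] << 8)
            res = (word - si) & 0xFFFF
            mem[bx], mem[bx + 1] = res & 0xFF, res >> 8
            sf = res >> 15
            of = (((word ^ si) & (word ^ res)) >> 15) & 1
            ip += 2
        elif op == 0x7D:                        # jge rel8
            rel = arg - 0x100 if arg >= 0x80 else arg
            ip += 2 + (rel if sf == of else 0)
        elif op == 0xCD and arg == 0x21:        # int 21h
            if reg["ax"] >> 8 == 0x09:          # service 09h: print until '$'
                end = mem.index(b"$", reg["dx"])
                output += mem[reg["dx"]:end].decode("ascii")
            ip += 2
        elif op == 0xCD and arg == 0x20:        # int 20h: terminate
            return output
        else:
            raise ValueError(f"unsupported opcode {op:02X}h at {ip:04X}h")


program = (
    bytes.fromhex(
        "58 354F21 50 254041 50 5B 345C 50 5A 58 353428 50 5E 2937 43 43 2937 7D24"
    )
    + b"EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$"
    + bytes.fromhex("48 2B 48 2A")
)
print(run_com(program))   # EICAR-STANDARD-ANTIVIRUS-TEST-FILE!
```

Note that the emulator never contains an explicit int 21h or int 20h in its input: both instructions only exist in memory after the two sub [bx], si steps have run.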
The whole program could have been written in a much simpler way:
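jmp @start                ; skip over the data
msg db "EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$"
@start:
mov dx, offset msg        ; ds:dx = address of the '$'-terminated string
mov ah, 09h               ; DOS service 09h: display string
int 21h                   ; call DOS
int 20h                   ; terminate the program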
But it wouldn’t match the requirements:
- The EICAR AV Test file must only use the following ASCII printable characters: UPPER CASE letters, digits and punctuation marks. No other characters are allowed.
- Its code can be copied and pasted with any text editor; there is no need to base64-encode it. It can also be printed.
- No compiler or linker is needed to build it.
Regarding the self-modifying code, it was used for two reasons:
- The int 21h and int 20h x86 instructions cannot be written with printable ASCII characters, because their opcode byte, CDh, is outside the printable range. Creating them on the fly was the best solution.
- For decades, viruses have used self-modifying code (SMC) as an evasion technique to bypass anti-virus programs. Because the EICAR AV Test file is intended to be used to test anti-virus programs and should be treated as a virus, self-modifying code simply adds some fun to this small but clever 68-byte COM program.
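The first reason is easy to verify. The Python sketch below builds the set of characters the test file is allowed to use and checks a few opcode sequences against it; the names ALLOWED and is_typable are made up for this example.

```python
import string

# Characters the test file is allowed to use: upper case letters,
# digits and punctuation marks.
ALLOWED = set(
    (string.ascii_uppercase + string.digits + string.punctuation).encode("ascii")
)

def is_typable(opcodes: str) -> bool:
    """True if every byte of the hex-encoded opcodes is an allowed character."""
    return all(byte in ALLOWED for byte in bytes.fromhex(opcodes))

print(is_typable("CD21"))       # False: int 21h
print(is_typable("CD20"))       # False: int 20h
print(is_typable("482B482A"))   # True: the placeholder bytes "H+H*"
```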
If you are a developer and want your security application to block the EICAR AV Test file, here is how to detect it accurately:
- An EICAR AV Test file must start with these 68 bytes:
X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*
This is absolutely mandatory. Because it is a COM program, if there were any character preceding it, it would not run but crash.
- Optionally, it may be followed by whitespace characters, but the total length of the file must not exceed 128 bytes. Only five types of whitespace characters are allowed: Tab (0x09), Line Feed (0x0A), Carriage Return (0x0D), Space (0x20), and Ctrl-Z (0x1A).
If the file meets all the above conditions, your application must treat it as a virus and block it. Otherwise, it must ignore it.
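These rules could be implemented along the following lines. This is a minimal Python sketch, not NinjaFirewall's actual detection code; the function name is_eicar_test_file and the constant names are assumptions made for the example. The signature is rebuilt from the opcode bytes of the disassembly listing rather than typed as a literal.

```python
# 68-byte signature: opcode bytes from the listing, the '$'-terminated
# message, and the four trailing bytes.
EICAR_SIGNATURE = (
    bytes.fromhex(
        "58 354F21 50 254041 50 5B 345C 50 5A 58 353428 50 5E 2937 43 43 2937 7D24"
    )
    + b"EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$"
    + bytes.fromhex("48 2B 48 2A")
)
MAX_LENGTH = 128
# Tab, Line Feed, Carriage Return, Space and Ctrl-Z.
TRAILING_WHITESPACE = b"\t\n\r \x1a"


def is_eicar_test_file(data: bytes) -> bool:
    """Return True only if data is a valid EICAR AV Test file."""
    if len(data) > MAX_LENGTH:
        return False
    if not data.startswith(EICAR_SIGNATURE):
        return False
    trailing = data[len(EICAR_SIGNATURE):]
    return all(byte in TRAILING_WHITESPACE for byte in trailing)


print(is_eicar_test_file(EICAR_SIGNATURE))             # True
print(is_eicar_test_file(EICAR_SIGNATURE + b"\r\n"))   # True
print(is_eicar_test_file(b" " + EICAR_SIGNATURE))      # False: leading byte
print(is_eicar_test_file(EICAR_SIGNATURE + b"x"))      # False: not whitespace
```

Matching on the start of the file, instead of searching for the string anywhere in it, follows the requirement that nothing may precede the 68 bytes.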