Arcanum Analysis - The DOS Header [Part 2]

Old and abandoned stuff goes here...
Ailing Wolf
Ailing Wolf
Posts: 19
Joined: Fri Feb 11, 2011 6:55 pm
Location: Void

Arcanum Analysis - The DOS Header [Part 2]

Post by Bia » Mon Nov 21, 2011 12:16 am

Wellcome to part 2 of arcanum analysis, today we will be learning somethink valuable about dos header, i understand that it might seem a bit "useless,non-worthy" or arcanum non related, but you have to understand these basics deeply before you can actually go and analyze/edit something. Just a few more parts and well be injecting our own code into arcanum. :)

As mentioned in part 1, the first component in the PE file format is the MS-DOS header. The MS-DOS header is not new for the PE file format. It is the same MS-DOS header that has been around since version 2 of the MS-DOS operating system. The main reason for keeping the same structure intact at the beginning of the PE file format is so that, when you attempt to load a file created under Windows version 3.1 or earlier, or MS DOS version 2.0 or later, the operating system can read the file and understand that it is not compatible. In other words, when you attempt to run a Windows NT executable on MS-DOS version 6.0, you get this message: "This program cannot be run in DOS mode." If the MS-DOS header was not included as the first part of the PE file format, the operating system would simply fail the attempt to load the file and offer something completely useless, such as: "The name specified is not recognized as an internal or external command, operable program or batch file."

All PE files start with the DOS header which occupies the first 64 bytes of the file. It's there in case the program is run from DOS, so DOS can recognize it as a valid executable(else we get the useless error mentioned above) and run the DOS stub which is stored immediately after the header. The DOS stub usually just prints a string something like "This program must be run under Microsoft Windows" but it can be a whole DOS program. When building an application for Windows, the linker links a default stub program called WINSTUB.EXE into your executable. You can override the default linker behavior by substituting your own valid MS-DOS-based program in place of WINSTUB and using the -STUB: linker option when linking the executable file.

The DOS header is a structure defined in the or winnt.h files. (If you have an assembler or compiler installed you will find them in the \include\ directory). It has 19 members of which magic and lfanew are of interest:


Code: Select all

typedef struct _IMAGE_DOS_HEADER {  // DOS .EXE header
    USHORT e_magic;         // Magic number
    USHORT e_cblp;          // Bytes on last page of file
    USHORT e_cp;            // Pages in file
    USHORT e_crlc;          // Relocations
    USHORT e_cparhdr;       // Size of header in paragraphs
    USHORT e_minalloc;      // Minimum extra paragraphs needed
    USHORT e_maxalloc;      // Maximum extra paragraphs needed
    USHORT e_ss;            // Initial (relative) SS value
    USHORT e_sp;            // Initial SP value
    USHORT e_csum;          // Checksum
    USHORT e_ip;            // Initial IP value
    USHORT e_cs;            // Initial (relative) CS value
    USHORT e_lfarlc;        // File address of relocation table
    USHORT e_ovno;          // Overlay number
    USHORT e_res[4];        // Reserved words
    USHORT e_oemid;         // OEM identifier (for e_oeminfo)
    USHORT e_oeminfo;       // OEM information; e_oemid specific
    USHORT e_res2[10];      // Reserved words
    LONG   e_lfanew;        // File address of new exe header
Note: .H file extension is associeted with Header file type wich is usually used by a C Compiler but Java also has been known to use .H as a header include file. Therefore USHORT(unsigned short int) and LONG(long int) are C compiler specific integral data types. However in more general terms we can exchange USHORT for WORD and LONG for DWORD. DWORD ("double word") = 4 bytes or 32bit value, WORD = 2 bytes or 16bit value, sometimes you will also see dd for DWORD, dw for WORD and db for byte.(The definitions are helpful as they tell us the size of each member. This allows us to locate information of interest by counting the number of bytes from the start of the section or any other identifiable point.)

The first field, e_magic, is the so-called magic number. This field is used to identify an MS-DOS-compatible file type. All MS-DOS-compatible executable files set this value to 4Dh, 5Ah (The letters "MZ" for Mark Zbikowsky one of the original architects of MS-DOS, as mentioned in part 1) which signifies a valid DOS header. MZ are the first 2 bytes you will see in any PE file opened in a hex editor. Many other fields are important to MS-DOS operating systems, but for Windows NT, there is really one more important field in this structure. The final field, e_lfanew is a DWORD which sits at the end of the DOS header directly before the DOS stub begins. It contains the offset of the PE header, relative to the file beginning. The windows loader looks for this offset so it can skip the DOS stub and go directly to the PE header.

As we said above, the DOS header occupies the first 64 bytes of the file - ie the first 4 rows seen in the hexeditor in the picture below. The last DWORD before the DOS stub begins contains 00h 01h 00h 00h. Allowing for reverse byte order this gives us 00 00 01 00h which is the offset where the PE header begins. The PE header begins with its signature 50h, 45h, 00h, 00h (the letters "PE" followed by two terminating zeroes).

If in the Signature field of the PE header, you find an NE signature here rather than a PE, you're working with a 16-bit Windows NE file. Likewise, an LE in the signature field would indicate a Windows 3.x virtual device driver (VxD). An LX here would be the mark of a file for OS/2 2.0.(i have mentioned these in part 1)

An example of small .EXE file as shown in basic hex editor:

Again its alot of terminology, but virtual world is world with strict defined terms/logic so all you can do is learn it, the logic can be sometimes bend, but never broken. :snooty:

Stay tuned for part 3 :) Btw... At end ill give you a basic question : Why is the last member of DOS header structure DWORD and not WORD?