
How do debuggers work? - P.3

Ok, as mentioned in the previous entry, let me continue discussing a problem a debugger faces: threads, or more exactly, multithreading.

When you want to stop a debuggee that's churning like mad, you need to get a breakpoint jammed into the CPU instruction stream so that you can stop in the debugger. The question is, what do you need to do to get the instruction in there? If a thread is running, the only thing you can do to get it to a known point is to suspend it by using the SuspendThread API function. Once the thread is suspended, you can look at it with the GetThreadContext API function and determine the current instruction pointer. Once you have the instruction pointer, you're back to setting simple breakpoints. After you set the breakpoint, you need to call the ResumeThread API function so that you can let the thread continue execution and have it hit your breakpoint.
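To make that concrete, here is a minimal sketch of the suspend/inspect/resume sequence (my own illustration, not any debugger's actual code; SetBreakpointAt is a hypothetical helper, and the handles are assumed to have THREAD_SUSPEND_RESUME and THREAD_GET_CONTEXT access):

#include <windows.h>

BOOL SetBreakpointAt(HANDLE hProcess, DWORD dwAddr);   // hypothetical helper

BOOL BreakIntoThread(HANDLE hProcess, HANDLE hThread)
{
    CONTEXT ctx;
    ctx.ContextFlags = CONTEXT_CONTROL;        // the instruction pointer lives here

    if ((DWORD)-1 == SuspendThread(hThread))   // get the thread to a known point
        return FALSE;

    BOOL bRet = GetThreadContext(hThread, &ctx);
    if (FALSE != bRet)
    {
        // ctx.Eip (x86) is the current instruction pointer; from here we're
        // back to setting a simple breakpoint.
        bRet = SetBreakpointAt(hProcess, ctx.Eip);
    }
    ResumeThread(hThread);                     // let the thread run into it
    return bRet;
}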

Although breaking into the debugger is fairly simple, you still need to think about a couple of issues. The first issue is that your breakpoint might not trigger. If the debuggee is processing a message or doing some other work, it will break. If the debuggee is sitting there waiting for a message to arrive, however, the breakpoint won't trigger until the debuggee receives a message. Although you could require the user to move the mouse over the debuggee to generate a WM_MOUSEMOVE message, the user might not be too happy about this requirement.

Exploring Windows Memory Management - Virtual Memory


In this entry, I'll try to explain the other strategy for managing memory: virtual memory.

Virtual Memory

The basic idea behind virtual memory is that the combined size of the program, data, and stack may exceed the amount of physical memory available for it. The operating system keeps those parts of the program currently in use in main memory, and the rest on the disk. For example, a 512-MB program can run on a 256-MB machine by carefully choosing which 256 MB to keep in memory at each instant, with pieces of the program being swapped between disk and memory as needed.

Virtual memory can also work in a multiprogramming system, with bits and pieces of many programs in memory at once. While a program is waiting for part of itself to be brought in, it is waiting for I/O and cannot run, so the CPU can be given to another process, the same way as in any other multiprogramming system.

  • Paging 
Most virtual memory systems use a technique called paging, which we will now describe. On any computer, there exists a set of memory addresses that programs can produce. When a program uses an instruction like

MOV REG, 1000 

it does so to copy the contents of memory address 1000 to REG (or vice versa, depending on the computer). Addresses can be generated using indexing, base registers, segment registers, and other ways.

These program-generated addresses are called virtual addresses and form the virtual address space. On computers without virtual memory, the virtual address is put directly onto the memory bus and causes the physical memory word with the same address to be read or written. When virtual memory is used, the virtual addresses do not go directly to the memory bus. Instead, they go to an MMU (Memory Management Unit) that maps the virtual addresses onto the physical memory addresses.

A very simple example of how this mapping works is shown in the following figure. In this example, we have a computer that can generate 16-bit addresses, from 0 up to 64K. These are the virtual addresses. This computer, however, has only 32 KB of physical memory, so although 64-KB programs can be written, they cannot be loaded into memory in their entirety and run. A complete copy of a program's memory image, up to 64 KB, must be present on the disk, however, so that pieces can be brought in as needed.

The virtual address space is divided up into units called pages. The corresponding units in the physical memory are called page frames. The pages and page frames are always the same size. In this example they are 4 KB, but page sizes from 512 bytes to 1 MB have been used in real systems. With 64 KB of virtual address space and 32 KB of physical memory, we get 16 virtual pages and 8 page frames. Transfers between RAM and disk are always in units of a page. When the program tries to access address 0, for example, using the instruction

MOV REG, 0

virtual address 0 is sent to the MMU. The MMU sees that this virtual address falls in page 0 (0 to 4095), which according to its mapping is page frame 2 (8192 to 12287). It thus transforms the address to 8192 and outputs address 8192 onto the bus. The memory knows nothing at all about the MMU and just sees a request for reading or writing address 8192, which it honors. Thus, the MMU has effectively mapped all virtual addresses between 0 and 4095 onto physical addresses 8192 to 12287. Similarly, an instruction

MOV REG, 8192

is effectively transformed into

MOV REG, 24576

because virtual address 8192 is in virtual page 2 and this page is mapped onto physical page frame 6 (physical addresses 24576 to 28671). As a third example, virtual address 20500 is 20 bytes from the start of virtual page 5 (virtual addresses 20480 to 24575) and maps onto physical address 12288 + 20 = 12308.

By itself, this ability to map the 16 virtual pages onto any of the eight page frames by setting the MMU's map appropriately does not solve the problem that the virtual address space is larger than the physical memory. Since we have only eight physical page frames, only eight of the virtual pages are mapped onto physical memory. The others, shown as crosses in the figure, are not mapped. In the actual hardware, a present/absent bit keeps track of which pages are physically present in memory.

What happens if the program tries to use an unmapped page, for example, by using the instruction

MOV REG, 32780

which is byte 12 within virtual page 8 (starting at 32768)? The MMU notices that the page is unmapped (indicated by a cross in the figure) and causes the CPU to trap to the operating system. This trap is called a page fault. The operating system picks a little-used page frame and writes its contents back to the disk. It then fetches the page just referenced into the page frame just freed, changes the map, and restarts the trapped instruction.

For example, if the operating system decided to evict page frame 1, it would load virtual page 8 at physical address 4K and make two changes to the MMU map. First, it would mark virtual page 1's entry as unmapped, to trap any future accesses to virtual addresses between 4K and 8K. Then it would replace the cross in virtual page 8's entry with a 1, so that when the trapped instruction is re-executed, it will map virtual address 32780 onto physical address 4108.

Now let us look inside the MMU to see how it works and why we have chosen to use a page size that is a power of 2. In the next figure we see an example of a virtual address, 8196 (0010000000000100 in binary), being mapped using the MMU map of the previous one. The incoming 16-bit virtual address is split into a 4-bit page number and a 12-bit offset. With 4 bits for the page number, we can have 16 pages, and with 12 bits for the offset, we can address all 4096 bytes within a page.



The page number is used as an index into the page table, yielding the number of the page frame corresponding to that virtual page. If the present/absent bit is 0, a trap to the operating system is caused. If the bit is 1, the page frame number found in the page table is copied to the high-order 3 bits of the output register, along with the 12-bit offset, which is copied unmodified from the incoming virtual address. Together they form a 15-bit physical address. The output register is then put onto the memory bus as the physical memory address.
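The whole translation is easy to express in code. Here is a small simulation of the MMU in this example (16 pages of 4 KB each, eight frames); the only mapping filled in is page 0 -> frame 2, taken from the example above:

#include <cstdint>
#include <cstdio>

// One page-table entry: a frame number plus a present/absent bit.
struct PTE { uint8_t frame; bool present; };

static PTE pageTable[16];   // 4-bit page number -> 16 entries

// Translate a 16-bit virtual address the way the MMU of the example does.
bool Translate(uint16_t virt, uint16_t* phys)
{
    unsigned page   = virt >> 12;       // high-order 4 bits: page number
    unsigned offset = virt & 0x0FFF;    // low-order 12 bits: offset within the page
    if (!pageTable[page].present)
        return false;                   // unmapped page -> page fault trap
    // The 3-bit frame number becomes the high-order bits of the 15-bit address.
    *phys = (uint16_t)((pageTable[page].frame << 12) | offset);
    return true;
}

int main()
{
    pageTable[0] = { 2, true };         // page 0 (0-4095) -> frame 2 (8192-12287)
    uint16_t phys;
    if (Translate(0, &phys))
        printf("virtual 0 -> physical %u\n", phys);   // prints 8192
    return 0;
}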

Segmentation

This is another scheme to manage memory, based on the virtual memory concept. In this scheme, the virtual address space is a collection of segments. Each segment has a name and a length. Thus addresses specify both the segment name and the offset within that segment. The user therefore specifies each address by two quantities: a segment name and an offset. Contrast this scheme with paging, in which the user specifies only a single address, which is partitioned by the hardware into a page number and an offset, all invisible to the programmer.

For simplicity of implementation, segments are numbered and referred to by a segment number rather than by a segment name. That's the reason why a logical address in the segmentation approach consists of a two-tuple:

<segment-number, offset>

Although the user can refer to objects in the program by a two-dimensional address, the actual physical address is still, of course, a one-dimensional sequence of bytes. Thus, a segment table is defined as a mapping from the two-dimensional user-defined addresses to the one-dimensional physical addresses. Each entry in the segment table has a segment base and a segment limit. The segment base contains the starting physical address where the segment resides in memory, whereas the segment limit specifies the length of the segment.
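In code, the check a segmentation unit performs is just a comparison against the limit followed by an addition of the base. A minimal sketch, with made-up table contents:

#include <cstdint>
#include <cstdio>

// One segment-table entry: starting physical address and segment length.
struct SegmentEntry { uint32_t base; uint32_t limit; };

static SegmentEntry segTable[4];

// Map a <segment-number, offset> pair onto a one-dimensional physical address.
bool Translate(unsigned seg, uint32_t offset, uint32_t* phys)
{
    if (offset >= segTable[seg].limit)
        return false;                   // offset beyond the segment: trap
    *phys = segTable[seg].base + offset;
    return true;
}

int main()
{
    segTable[1] = { 0x4000, 0x1000 };   // segment 1: 4 KB starting at 0x4000
    uint32_t phys;
    if (Translate(1, 0x20, &phys))
        printf("<1, 0x20> -> %#x\n", phys);   // prints 0x4020
    return 0;
}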

Intel Architecture's strategy
Both paging and segmentation have advantages and disadvantages. In fact, some architectures provide both, and the Intel one is one of them. It supports both pure segmentation and segmentation with paging.

In a Pentium system, the CPU generates logical addresses, which are given to the segmentation unit. The segmentation unit produces a linear address for each logical address. The linear address is then given to the paging unit, which in turn generates the physical address in main memory. Thus, the segmentation unit and paging unit together form the equivalent of the memory-management unit.
The Pentium architecture allows a segment to be as large as 4 GB, and the maximum number of segments per process is 16 K. The logical address space is divided into two partitions. The first consists of up to 8 K segments that are private to the process. The second consists of up to 8 K segments that are shared among all processes. As for paging, a page size is either 4 KB or 4 MB.
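A quick sketch of how a logical address gets pulled apart on this architecture (the selector layout follows the Intel manuals; the selector value here is just illustrative):

#include <cstdint>
#include <cstdio>

int main()
{
    uint16_t selector = 0x001F;              // an illustrative segment selector
    unsigned index = selector >> 3;          // 13-bit index: up to 8 K entries per table
    unsigned ti    = (selector >> 2) & 1;    // table indicator: 0 = GDT (shared), 1 = LDT (private)
    unsigned rpl   = selector & 3;           // requested privilege level
    printf("index=%u ti=%u rpl=%u\n", index, ti, rpl);
    // The segmentation unit adds the offset to the base found in the selected
    // descriptor to produce the linear address; the paging unit then maps
    // that linear address onto a physical one.
    return 0;
}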

References:
  • Andrew S. Tanenbaum, Operating Systems Design and Implementation, Third Edition.
  • Abraham Silberschatz, Peter Baer Galvin, Greg Gagne, Operating System Concepts, 7th Edition

Exploring Windows Memory Management - Swapping

Before continuing with the second part about debuggers, there are some things associated with the way in which Windows manages memory that I need to understand first. Therefore, I decided to bookmark that second part and have begun exploring the memory architecture of Windows.

This entry will take a closer look at two memory management approaches: swapping and virtual memory. The difference between them is the way in which a process is loaded into main memory. Know that fully understanding the memory architecture can't be achieved in just several entries. Anyway, I hope that I can get at least a partially better knowledge of this concept than before :-)

Swapping

The operation of a swapping system is illustrated in the following figure. Initially, only process A is in memory. Then processes B and C are created or swapped in from disk. In (d) A is swapped out to disk. Then D comes in and B goes out. Finally A comes in again. Since A is now at a different location, addresses contained in it must be relocated, either by software when it is swapped in or (more likely) by hardware during program execution.



When swapping creates multiple holes in memory, it is possible to combine them all into one big one by moving all the processes downward as far as possible. This technique is known as memory compaction. It is usually not done because it requires a lot of CPU time. For example, on a 1-GB machine that can copy at a rate of 2 GB/sec (0.5 nsec/byte) it takes about 0.5 sec to compact all of memory. That may not seem like much time, but it would be noticeably disruptive to a user watching a video stream.

A point that is worth making concerns how much memory should be allocated for a process when it is created or swapped in. If processes are created with a fixed size that never changes, then the allocation is simple: the operating system allocates exactly what is needed, no more and no less.

If, however, processes' data segments can grow, for example, by dynamically allocating memory from a heap, as in many programming languages, a problem occurs whenever a process tries to grow. If a hole is adjacent to the process, it can be allocated and the process can be allowed to grow into the hole. On the other hand, if the process is adjacent to another process, the growing process will either have to be moved to a hole in memory large enough for it, or one or more processes will have to be swapped out to create a large enough hole. If a process cannot grow in memory and the swap area on the disk is full, the process will have to wait or be killed.

If it is expected that most processes will grow as they run, it is probably a good idea to allocate a little extra memory whenever a process is swapped in or moved, to reduce the overhead associated with moving or swapping processes that no longer fit in their allocated memory. However, when swapping processes to disk, only the memory actually in use should be swapped; it is wasteful to swap the extra memory as well. In the following figure we see a memory configuration in which space for growth has been allocated to two processes.


If processes can have two growing segments, for example, the data segment being used as a heap for variables that are dynamically allocated and released and a stack segment for the normal local variables and return addresses, an alternative arrangement suggests itself, namely that of the figure (b). In this figure we see that each process illustrated has a stack at the top of its allocated memory that is growing downward, and a data segment just beyond the program text that is growing upward. The memory between them can be used for either segment. If it runs out, either the process will have to be moved to a hole with sufficient space, swapped out of memory until a large enough hole can be created, or killed.

Ok, that's enough for one concept :-) The remaining strategy of memory management, virtual memory, will be mentioned in the next entry.

References

  • Andrew S. Tanenbaum, Operating Systems Design and Implementation, Third Edition.

    How do debuggers work? - P.2

    I'll continue to talk about the way a debugger works after investigating memory management.


    Reading and Writing Memory

    Reading from a debuggee's memory is simple. ReadProcessMemory takes care of it for you. A debugger has full access to the debuggee if the debugger started it, because the handle to the process returned by the CREATE_PROCESS_DEBUG_EVENT debug event has PROCESS_VM_READ and PROCESS_VM_WRITE access. If the debugger attaches to the process with DebugActiveProcess, OpenProcess must be used to get a handle to the debuggee, and you need to specify both read and write access.
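    Here is a minimal sketch of that attach-and-read case (my own illustration, not WDBG's code; dwPid and pRemoteAddr are assumed inputs):

    #include <windows.h>

    BOOL AttachAndPeek(DWORD dwPid, LPCVOID pRemoteAddr)
    {
        // Ask for both read and write access up front, as described above.
        HANDLE hProc = OpenProcess(PROCESS_VM_READ | PROCESS_VM_WRITE |
                                   PROCESS_VM_OPERATION,
                                   FALSE, dwPid);
        if ((NULL == hProc) || (FALSE == DebugActiveProcess(dwPid)))
            return FALSE;

        BYTE   bBuff[256];
        SIZE_T dwRead = 0;
        // With read access on the handle, this is all it takes.
        return ReadProcessMemory(hProc, pRemoteAddr,
                                 bBuff, sizeof(bBuff), &dwRead);
    }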

    Before I can talk about writing to the debuggee's memory, I need to briefly explain an important concept: copy-on-write. When Windows loads an executable file, Windows shares as many mapped memory pages of that binary as possible with the different processes using it. If one of those processes is running under a debugger and one of those pages has a breakpoint written to it, the breakpoint obviously can't be present in all the processes sharing that page. As soon as any process running outside the debugger executed that code, it would crash with a breakpoint exception. To avoid that situation, the operating system sees that the page changed for a particular process and makes a copy of that page that is private to the process that had the breakpoint written to it. Thus, as soon as a process writes to a page, the operating system copies the page.

    An interesting detail about the Win32 Debugging API is that the debugger is responsible for getting the string to output when an OUTPUT_DEBUG_STRING_EVENT comes through. The information passed to the debugger includes the location and the length of the string. When it receives this message, the debugger goes and reads the memory out of the debuggee. This is why multiple trace statements can easily change an application's behavior when running under a debugger: because all threads in the application stop while the debug loop is processing an event, calling OutputDebugString in the debuggee means that all your threads stop. Take a look at the following code to see how a debugger (WDBG) handles the OUTPUT_DEBUG_STRING_EVENT. Notice that the DBG_ReadProcessMemory function is the wrapper function around ReadProcessMemory from LOCALASSIST.DLL.

    static DWORD OutputDebugStringEvent ( CDebugBaseUser *           pUserClass  ,
                                          LPDEBUGGEEINFO             pData       ,
                                          DWORD                      dwProcessId ,
                                          DWORD                      dwThreadId  ,
                                          OUTPUT_DEBUG_STRING_INFO & stODSI       )
    {
        TCHAR szBuff [ 512 ] ;
        HANDLE hProc = pData->GetProcessHandle ( ) ;
        DWORD dwRead ;
        // Read the memory.
        BOOL bRet = DBG_ReadProcessMemory ( hProc                    ,
                                            stODSI.lpDebugStringData ,
                                            szBuff                   ,
                                            min ( sizeof ( szBuff ) - 1     ,
                                                  stODSI.nDebugStringLength ) ,
                                            &dwRead                   ) ;
        ASSERT ( TRUE == bRet ) ;
        if ( TRUE == bRet )
        {
            // Always NULL terminate the string.
            szBuff [ dwRead ] = _T ( '\0' ) ;
            // Convert CR/LFs if I'm supposed to.
            pUserClass->ConvertCRLF ( szBuff , sizeof ( szBuff ) ) ;
            // Send the converted string on to the user class.
            pUserClass->OutputDebugStringEvent ( dwProcessId , dwThreadId , szBuff ) ;
        }
        return ( DBG_CONTINUE ) ;
    }

    Breakpoints and Single Stepping


    Most engineers don't realize that debuggers use breakpoints extensively behind the scenes to allow the debugger to control the debuggee. Although you might not directly set any breakpoints, the debugger will set many to allow you to handle tasks such as stepping over a function call. The debugger also uses breakpoints when you choose to run to a specific source file line and stop. Finally, the debugger uses breakpoints to break into the debuggee on command (via the Debug Break menu option in WDBG, for example).

    The concept of setting a breakpoint is simple. All you need to do is take the memory address where you want to set a breakpoint, save the opcode (the value) at that location, and write the breakpoint instruction into the address. On the Intel Pentium family, the breakpoint instruction mnemonic is "INT 3", an opcode of 0xCC, so you need to save only a single byte at the address where you're setting the breakpoint. Other CPUs, such as the Intel Merced, have different opcode sizes, so you would need to save more data at the address.

    Take a look at the SetBreakpoint function in the following code:

    int CPUHELP_DLLINTERFACE __stdcall SetBreakpoint ( PDEBUGPACKET dp     ,
                                                       ULONG        ulAddr ,
                                                       OPCODE *     pOpCode )
    {
        DWORD dwReadWrite = 0 ;
        BYTE bTempOp = BREAK_OPCODE ;
        BOOL bReadMem ;
        BOOL bWriteMem ;
        BOOL bFlush ;
        MEMORY_BASIC_INFORMATION mbi ;
        DWORD dwOldProtect ;

        ASSERT ( FALSE == IsBadReadPtr ( dp , sizeof ( DEBUGPACKET ) ) ) ;
        ASSERT ( FALSE == IsBadWritePtr ( pOpCode , sizeof ( OPCODE ) ) ) ;
        if ( ( TRUE == IsBadReadPtr ( dp , sizeof ( DEBUGPACKET ) ) ) ||
             ( TRUE == IsBadWritePtr ( pOpCode , sizeof ( OPCODE ) ) )  )
        {
            TRACE0 ( "SetBreakpoint : invalid parameters\n!" ) ;
            return ( FALSE ) ;
        }
        // If the operating system is Windows 98 and the address is above
        // 2 GB, just leave quietly.
        if ( ( FALSE == IsNT ( ) ) && ( ulAddr >= 0x80000000 ) )
        {
            return ( FALSE ) ;
        }
        // Read the opcode at the location.
        bReadMem = DBG_ReadProcessMemory ( dp->hProcess    ,
                                           (LPCVOID)ulAddr ,
                                           &bTempOp        ,
                                           sizeof ( BYTE ) ,
                                           &dwReadWrite     ) ;
        ASSERT ( FALSE != bReadMem ) ;
        ASSERT ( sizeof ( BYTE ) == dwReadWrite ) ;
        if ( ( FALSE == bReadMem ) || ( sizeof ( BYTE ) != dwReadWrite ) )
        {
            return ( FALSE ) ;
        }
        // Is this new breakpoint about to overwrite an existing breakpoint
        // opcode?
        if ( BREAK_OPCODE == bTempOp )
        {
            return ( -1 ) ;
        }
        // Get the page attributes for the debuggee.
        DBG_VirtualQueryEx ( dp->hProcess                        ,
                             (LPCVOID)ulAddr                     ,
                             &mbi                                ,
                             sizeof ( MEMORY_BASIC_INFORMATION )  ) ;
        // Force the page to copy-on-write in the debuggee.
        if ( FALSE == DBG_VirtualProtectEx ( dp->hProcess           ,
                                             mbi.BaseAddress        ,
                                             mbi.RegionSize         ,
                                             PAGE_EXECUTE_READWRITE ,
                                             &mbi.Protect            ) )
        {
            ASSERT ( !"VirtualProtectEx failed!!" ) ;
            return ( FALSE ) ;
        }
        // Save the opcode I'm about to whack.
        *pOpCode = (OPCODE)bTempOp ;
        bTempOp = BREAK_OPCODE ;
        dwReadWrite = 0 ;
        // The opcode was saved, so now set the breakpoint.
        bWriteMem = DBG_WriteProcessMemory ( dp->hProcess     ,
                                             (LPVOID)ulAddr   ,
                                             (LPVOID)&bTempOp ,
                                             sizeof ( BYTE )  ,
                                             &dwReadWrite      ) ;
        ASSERT ( FALSE != bWriteMem ) ;
        ASSERT ( sizeof ( BYTE ) == dwReadWrite ) ;
        if ( ( FALSE == bWriteMem ) || ( sizeof ( BYTE ) != dwReadWrite ) )
        {
            return ( FALSE ) ;
        }
        // Change the protection back to what it was before I blasted the
        // breakpoint in.
        VERIFY ( DBG_VirtualProtectEx ( dp->hProcess    ,
                                        mbi.BaseAddress ,
                                        mbi.RegionSize  ,
                                        mbi.Protect     ,
                                        &dwOldProtect    ) ) ;
        // Flush the instruction cache in case this memory was in the CPU
        // cache.
        bFlush = DBG_FlushInstructionCache ( dp->hProcess    ,
                                             (LPCVOID)ulAddr ,
                                             sizeof ( BYTE )  ) ;
        ASSERT ( TRUE == bFlush ) ;
        return ( TRUE ) ;
    }

    After you set the breakpoint, the CPU will execute it and will tell the debugger that an EXCEPTION_BREAKPOINT (0x80000003) occurred—that's where the fun begins. If it's a regular breakpoint, the debugger will locate and display the breakpoint location to the user. After the user decides to continue execution, the debugger has to do some work to restore the state of the program. Because the breakpoint overwrote a portion of memory, if you, as the debugger writer, were to just let the process continue, you would be executing code out of sequence and the debuggee would probably crash. What you need to do is to move the current instruction pointer back to the breakpoint address and replace the breakpoint with the opcode you saved when you set the breakpoint. After restoring the opcode, you can continue executing.
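    In code, that restore step comes down to something like the following sketch, written in the style of the WDBG listings above (the DBG_* wrappers and DEBUGPACKET are as in the SetBreakpoint listing; this is my illustration, not the book's code):

    static void RestoreAfterBreakpoint ( PDEBUGPACKET dp      ,
                                         ULONG        ulAddr  ,
                                         OPCODE *     pOpCode  )
    {
        DWORD dwWrote = 0 ;
        BYTE  bSaved  = ( BYTE ) *pOpCode ;   // the opcode SetBreakpoint saved

        // INT 3 is a single byte on x86, so back EIP up by one.
        dp->context.Eip -= 1 ;
        DBG_SetThreadContext ( dp->hThread , &dp->context ) ;

        // Put the original opcode back so the real instruction executes.
        DBG_WriteProcessMemory ( dp->hProcess , (LPVOID)ulAddr ,
                                 &bSaved , sizeof ( BYTE ) , &dwWrote ) ;
        DBG_FlushInstructionCache ( dp->hProcess    ,
                                    (LPCVOID)ulAddr ,
                                    sizeof ( BYTE )  ) ;
    }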

    There's only one small problem: How do you reset the breakpoint so that you can stop at that location again? If the CPU you're working on supports single-step execution, resetting the breakpoint is trivial. In single-step execution, the CPU executes a single instruction and generates another type of exception, EXCEPTION_SINGLE_STEP (0x80000004). Fortunately, all CPUs that Win32 runs on support single-step execution. For the Intel Pentium family, setting single-step execution requires that you set bit 8 of the flags register. The Intel reference manual calls this bit the TF, or Trap Flag. The following code shows the SetSingleStep function and the work needed to set the TF. After replacing the breakpoint with the original opcode, the debugger marks its internal state to reflect that it's expecting a single-step exception, sets the CPU into single-step execution, and then continues the process.

    BOOL CPUHELP_DLLINTERFACE __stdcall SetSingleStep ( PDEBUGPACKET dp )
    {
        BOOL bSetContext ;
        ASSERT ( FALSE == IsBadReadPtr ( dp , sizeof ( DEBUGPACKET ) ) ) ;
        if ( TRUE == IsBadReadPtr ( dp , sizeof ( DEBUGPACKET ) ) )
        {
            TRACE0 ( "SetSingleStep : invalid parameters\n!" ) ;
            return ( FALSE ) ;
        }
        // For the i386, just set the TF bit.
        dp->context.EFlags |= TF_BIT ;
        bSetContext = DBG_SetThreadContext ( dp->hThread , &dp->context ) ;
        ASSERT ( FALSE != bSetContext ) ;
        return ( bSetContext ) ;
    }
    After the debugger releases the process by calling ContinueDebugEvent, the process immediately generates a single-step exception after the single instruction executes. The debugger checks its internal state to verify that it was expecting a single-step exception; because it was, it knows that a breakpoint needs to be reset. Since the single step moved the instruction pointer past the original breakpoint location, the debugger can set the breakpoint opcode back at that location without any impact on execution. The operating system automatically clears the TF each time the EXCEPTION_SINGLE_STEP exception occurs, so there's no need for the debugger to clear it. After setting the breakpoint, the debugger releases the debuggee to continue running.
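    Putting it together, the single-step handler reduces to something like this sketch (g_bExpectingSingleStep, g_ulSavedAddr, and g_opSaved are assumed names for the debugger's internal state, not WDBG's actual ones):

    static void OnSingleStep ( PDEBUGPACKET dp , DWORD dwProcessId , DWORD dwThreadId )
    {
        if ( TRUE == g_bExpectingSingleStep )
        {
            g_bExpectingSingleStep = FALSE ;
            // Re-arm the breakpoint; the OS already cleared the TF bit.
            SetBreakpoint ( dp , g_ulSavedAddr , &g_opSaved ) ;
        }
        // Release the debuggee to continue running.
        ContinueDebugEvent ( dwProcessId , dwThreadId , DBG_CONTINUE ) ;
    }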

    One more thing to consider when writing a debugger concerns threads, especially in a multithreaded environment. But it seems that this entry is long enough, so I've decided to leave that for the third part.

    Greatz thanks to:
    • John Robbins, Debugging Applications, 2000.

    How do debuggers work? - P.1

    Although I have more than 5 years working as a programmer, until now I actually have not understood what the hell the concept "debug" is and how it works. I hope it's not too late for an investigation of the concept. Greatz thanks to Alexander Sandler for your article here.
     
    First of all, actual debugging requires operating system kernel support, and here's why. Think about it. We're living in a world where one process reading memory belonging to another process is a serious security vulnerability. Yet, when debugging a program, we would like to access memory that is part of the debugged process's (the debuggee's) memory space from the debugger process. That's a bit of a problem, isn't it? We could, of course, try to somehow use the same memory space for both the debugger and the debuggee, but then what if the debuggee itself creates processes? This really complicates things.

    Debugger support has to be part of the operating system kernel. The kernel is able to read and write memory that belongs to each and every process in the system. Furthermore, as long as a process is not running, the kernel can see the values of its registers, and the debugger has to be able to know the values of the debuggee's registers. Otherwise it won't be able to tell you where the debuggee has stopped (when we press CTRL-C in gdb, for instance).
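    On Windows, that kernel support surfaces through the Win32 Debugging API, and the heart of any debugger is a loop like this (a bare skeleton, nothing more):

    #include <windows.h>

    void DebugLoop(void)
    {
        DEBUG_EVENT de;
        BOOL bDone = FALSE;
        while ((FALSE == bDone) && WaitForDebugEvent(&de, INFINITE))
        {
            switch (de.dwDebugEventCode)
            {
                case EXCEPTION_DEBUG_EVENT:      // breakpoints, single steps, faults
                    // This is where registers and memory get inspected.
                    break;
                case EXIT_PROCESS_DEBUG_EVENT:   // the debuggee went away
                    bDone = TRUE;
                    break;
            }
            ContinueDebugEvent(de.dwProcessId, de.dwThreadId, DBG_CONTINUE);
        }
    }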

    Some "malicious" code is run before main is executed!


    Yes, continuing with the article I mentioned in the previous post, I've come upon a new thing: "mainCRTStartup", or another function depending on the switch we choose when starting a new project (/SUBSYSTEM:WINDOWS or /SUBSYSTEM:CONSOLE). Let's go >:)

    The first guy I met is crtexe.c, and the code of that function is here:
    #ifdef _WINMAIN_
    #ifdef WPRFLAG
    int wWinMainCRTStartup(
    #else  /* WPRFLAG */
    int WinMainCRTStartup(
    #endif  /* WPRFLAG */
    #else  /* _WINMAIN_ */
    #ifdef WPRFLAG
    int wmainCRTStartup(
    #else  /* WPRFLAG */
    int mainCRTStartup(
    #endif  /* WPRFLAG */
    #endif  /* _WINMAIN_ */
            void
            )
    {
            /*
             * The /GS security cookie must be initialized before any exception
             * handling targetting the current image is registered.  No function
             * using exception handling can be called in the current image until
             * after __security_init_cookie has been called.
             */
            __security_init_cookie();
            return __tmainCRTStartup();
    }

    Let's begin with the __security_init_cookie() function. "Cookie", "security", huhm, something related to web applications? Of course not. I investigated and found some helpful information here. Thanks Brandon Bray (MSFT) for your useful article.

    Now I'll summarize everything in my own view for easier understanding later.
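    Roughly, my mental model of the mechanism looks like the sketch below. This is a hand-written illustration only, not the real CRT code; __my_cookie stands in for the CRT's __security_cookie, which __security_init_cookie seeds with a hard-to-guess value at startup:

    #include <cstdint>
    #include <cstdlib>
    #include <cstring>

    static uintptr_t __my_cookie = 0xBB40E64E;    // randomized at startup in the real CRT

    void Parse(const char* pInput)
    {
        uintptr_t cookie = __my_cookie;           // prologue: park a copy on the stack
        char buff[16];
        strncpy(buff, pInput, sizeof(buff));      // imagine an overrun smashing the stack here
        if (cookie != __my_cookie)                // epilogue: check before returning
            abort();                              // cookie changed: fail fast
    }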

    x86 Disassembly/Functions and Stack Frames

    • Functions and Stack Frames

    To allow for many unknowns in the execution environment, functions are frequently set up with a "stack frame" to allow access to both function parameters, and automatic function variables. The idea behind a stack frame is that each subroutine can act independently of its location on the stack, and each subroutine can act as if it is the top of the stack.
    When a function is called, a new stack frame is created at the current esp location. A stack frame acts like a partition on the stack. All items from previous functions are higher up on the stack, and should not be modified. Each current function has access to the remainder of the stack, from the stack frame until the end of the stack page. The current function always has access to the "top" of the stack, and so functions do not need to take account of the memory usage of other functions or programs.
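    A tiny example makes the layout concrete. For a classic ebp-based frame on 32-bit x86, a compiler emits roughly the sequence sketched in the comments below (illustrative only; real output varies with compiler and options):

    // Parameters sit above the saved ebp/return address; locals sit below it.
    int add(int a, int b)      // after the prologue: [ebp+8] = a, [ebp+12] = b
    {
        int sum;               // [ebp-4], an automatic variable inside the frame
        sum = a + b;
        return sum;            // the result is left in EAX
    }
    // Typical prologue:  push ebp / mov ebp, esp / sub esp, 4
    // Typical epilogue:  mov esp, ebp / pop ebp / ret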

    What's in the vtable besides the function pointers?

    That's the RTTICompleteObjectLocator at the tail!



    Looking deeper and gathering some information from the Internet, everything looks like this:


    Greatz thanks to openrce and especially igorsk for your detailed article. Everyone can read the full one here: http://www.openrce.org/articles/full_view/23
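    For anyone who wants to poke at this themselves, here is the usual trick (entirely MSVC- and layout-dependent, so treat it as exploration code only; on MSVC builds the pointer to the RTTICompleteObjectLocator is commonly found in the slot just before the vtable's first function pointer):

    #include <iostream>
    using namespace std;

    struct Base { virtual ~Base() { } };

    int main()
    {
        Base obj;
        // The first pointer-sized field of a polymorphic object is the vfptr.
        void** vptr = *(void***)&obj;
        // On MSVC, vptr[-1] holds the pointer to RTTICompleteObjectLocator.
        void* pLocator = vptr[-1];
        cout << "vftable at " << vptr << ", locator at " << pLocator << endl;
        return 0;
    }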

    C++ to ASM: Behind the scenes!

    Yeah, a weekend of fully reading about the way C++ is compiled to assembly. It's greatz and of course, my girlfriend is not really happy without any weekend dating :D

    • ECX register
    I just learned about this two days ago :| in one of my previous posts (referenced from another). This register is usually used as the this pointer, and it's often assigned a value just before a function is about to be called. I wrote a small application to test it. Its function is just to print the address of my object and call the method.
    #include <iostream>
    using namespace std;

    class A { public: void func() { cout << "class A"; } };

    int main()
    {
        A obj;
        cout << &obj << endl;
        obj.func();
        return 0;
    }
    And here the output:
    0012FF63
    class A
    I load it into OllyDbg and, after carrying out several steps-over, I reach the target:


    The value in ECX, which is assigned right before the call, is the same as the address of obj.
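    Another quick way to confirm the same thing from inside the program: print this within the method and compare it with &obj (a small variation of the test above):

    #include <iostream>
    using namespace std;

    class A { public: void func() { cout << this << endl; } };   // `this` arrives in ECX (MSVC thiscall)

    int main()
    {
        A obj;
        cout << &obj << endl;   // prints the same address that func() prints
        obj.func();
        return 0;
    }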

    How an object in C/C++ is stored (cont.)

    Virtual Inheritance

    Let's see the following code:
    #include <iostream>
    using namespace std;

    class A { public: int a; virtual void fA() { } };

    class B : public virtual A { public: int b; virtual void fA() { } };

    int main()
    {
        B obj;
        obj.a = 50; obj.b = 100;
        cout << sizeof(obj) << endl << &obj << endl << &obj.a << endl << &obj.b << endl;
        return 0;
    }
    Output:
    20
    0012FF50
    0012FF60
    0012FF58
    That's strange, right? I guessed obj's size would be 16, including the vfptr of B, the vfptr of A, and the two members "a" and "b". So why do we have 4 extra bytes? What are they for?

    How an object in C/C++ is stored

    Multiple Inheritance
    #include <iostream>
    using namespace std;

    class A
    {
    public:
        int a;
    };

    class B
    {
    public:
        int b;
    };

    class C : public A, public B
    {
    public:
        int c;
    };

    int main()
    {
        C obj;
        cout << &obj << endl << &obj.a << endl << &obj.b << endl << &obj.c << endl;
        return 0;
    }
    Output:
    0012FF58
    0012FF58
    0012FF5C
    0012FF60
    If we change the order of inheritance to:


    class C : public B, public A

    the output is:

    Details about how the VPTR and virtual table work

    Assumption: 32-bit Machine.
    Here I am going to explain how the virtual table and the virtual pointer for virtual functions work internally.

    First we have to understand the memory layout.

    Example 1: The class's memory layout
    Code (C++):

    #include <iostream>
    using namespace std;

    class Test
    {
    public:
        int data1;
        int data2;
        int fun1();
    };

    int main()
    {
        Test obj;
        cout << "obj's Size = " << sizeof(obj) << endl;
        cout << "obj's Address = " << &obj << endl;
        return 0;
    }
    OUTPUT:
    obj's Size = 8
    obj's Address = 0012FF7C

    Note: A plain (non-virtual) member function does not take any memory in the object.
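    That's easy to verify: add a virtual function and the size grows by one pointer (the sizes in the comments assume the 32-bit machine from the assumption above):

    #include <iostream>
    using namespace std;

    class Plain  { public: int data1; int data2; int fun1() { return 0; } };
    class WithVf { public: int data1; int data2; virtual int fun1() { return 0; } };

    int main()
    {
        cout << sizeof(Plain)  << endl;   // 8: two ints; fun1 costs nothing
        cout << sizeof(WithVf) << endl;   // 12 on 32-bit: two ints + the vfptr
        return 0;
    }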


    C++ Object Model


    A Simple Object Model
    Our first object model is admittedly very simple. It might be used for a C++ implementation designed to minimize the complexity of the compiler at the expense of space and runtime efficiency. In this simple model, an object is a sequence of slots, where each slot points to a member. The members are assigned a slot in the order of their declarations. There is a slot for each data or function member. This is illustrated in Figure 1.1.

    Figure 1.1. Simple Object Model





    In this simple model, the members themselves are not placed within the object. Only pointers addressing the members are placed within the object. Doing this avoids problems from members' being quite different types and requiring different amounts (and sometimes different types of) storage. Members within an object are addressed by their slot's index. For example, _x's index is 6 and _point_count's index is 7. The general size of a class object is the size of a pointer multiplied by the number of members declared by the class.
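    As a toy rendition of this model (no real compiler lays objects out this way; it just makes the size arithmetic visible):

    #include <iostream>
    using namespace std;

    int main()
    {
        float x = 0.0f;
        static int point_count = 0;
        // In the simple model the "object" is nothing but one slot per member.
        void* slots[2] = { &x, &point_count };
        // Size of the object = pointer size * number of members:
        cout << sizeof(slots) << endl;   // 8 on a 32-bit machine, 16 on 64-bit
        return 0;
    }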

    C++: Accessing the virtual table directly

    This post is not intended for beginners. To understand the content of this topic, you need to have basic understanding of what virtual functions are.

    We know that run-time binding, or the virtual function mechanism, is implemented by a virtual table. If a class has at least one virtual function, a virtual table will be created for that class. To be specific, 'only one' virtual table will be created for all of the instances/objects of that class. Each of the instances and objects will have a pointer to the virtual table.

    The same thing is true for a class hierarchy. Meaning, if class Z derives from class Y and class Y derives from class X, only one virtual table will be created for all instances/objects of classes X, Y and Z. Each of the instances and objects of X, Y and Z will have a pointer to the virtual table.

    The virtual tables for each of class X, Y and Z share common information but they are not necessarily the same table for each of these classes. The scenario is complex for multiple and virtual inheritance. I would like to discuss them in future posts.
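    As a teaser for those future posts, here is the usual (completely implementation-dependent) way to reach the table by hand. It relies on the vfptr being the first field of the object, so treat it as exploration code only:

    #include <iostream>
    using namespace std;

    class X { public: virtual void f() { cout << "X::f" << endl; } };

    typedef void (*Fn)();

    int main()
    {
        X obj;
        // Treat the first pointer-sized field of obj as the vtable pointer,
        // then call slot 0 directly (note: no proper `this` is passed).
        Fn* vtable = *(Fn**)&obj;
        vtable[0]();   // prints "X::f" on typical MSVC builds
        return 0;
    }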