Upcoming Posts
"I have no special talents. I am only passionately curious." - Albert Einstein
Thursday, January 12, 2012
Why does C++ not allow overloading the ‘.’, ‘.*’, ‘::’ and ‘?:’ operators?
Thursday, January 5, 2012
How is default argument to a method implemented in C++?
Wednesday, January 4, 2012
'this' pointer implementation
In C++, ‘this’ is a constant pointer to the object on which a non-static member function is invoked. The language does not say how it must be implemented; in practice the compiler treats it as a hidden argument that behaves like a local variable of each non-static member function, including constructors and destructors, and it is initialized with the object’s address passed by the caller.
Whenever a member function is called on an object, the object’s address is passed to the member function, which copies it into the ‘this’ variable. In an unoptimized build such as the one shown below, ‘this’ is stored on the stack, so each member function call has its own ‘this’ variable. Every data member is then accessed in the function through the ‘this’ pointer.
There are two common ways to pass the object's address to the member function: 1) by pushing the address on the stack, or 2) by copying the address into a register. The compiler is free to use either of these methods, or any other.
Let’s take an example:
#include <stdio.h>

class test {
private:
    int data;
public:
    int public_data;
    test() { data = public_data = 0; }
    void display()
    { printf("\ndata = %d, public_data = %d", data, public_data); }
};
Let’s see the disassembly generated for a public method call on an object:
test obj;
// The object’s address is copied into the ECX register to pass it
// to the constructor.
lea ecx,[obj]
call test::test (411195h) // the constructor is called
obj.display();
// The object’s address is copied into the ECX register to pass it
// to the display method.
lea ecx,[obj]
call test::display (411235h)
Now let's see the disassembly of the display method:
void display()
{
........
// Here the ECX register contains the object's address. Its value is copied to
// the 'this' variable.
mov dword ptr [ebp-8],ecx
printf("\ndata = %d, public_data = %d", data, public_data);
mov esi,esp
mov eax,dword ptr [this] // getting object's address
mov ecx,dword ptr [eax+8] //accessing 'public_data' value using 'this' pointer
push ecx
mov edx,dword ptr [this]
mov eax,dword ptr [edx+4] //accessing 'data' value using 'this' pointer
push eax
push offset string "\ndata = %d, public_data = %d" (415B10h)
call dword ptr [__imp__printf (4192D4h)]
......
}
In the disassembly above, we can see that the 'this' pointer is initialized with the object's address supplied by the caller, and that the data members are accessed through the 'this' pointer.
Tuesday, January 3, 2012
How does the compiler achieve runtime binding/polymorphism?
Here is the answer:
Whenever a method is called, the compiler emits machine-level code (a ‘call’ instruction in assembly language) and supplies the method’s address to call that method. Let’s take a simple example to understand this:
class test {
public:
void compile_time_binding_method();
virtual void run_time_binding_method();
};
Here, the ‘test’ class contains a virtual method ‘run_time_binding_method()’ and a non-virtual method ‘compile_time_binding_method()’.
Let’s create an object, a pointer to it and a reference to it:
test a; // an object of the test class
test *ptrA = &a; // a pointer to the object
test &refA = a; // a reference to the object
Let’s call the methods using the object:
Do you think calling a virtual method directly on an object results in run-time binding? If your answer is no, you are correct. When a method is called on the object itself, the compiler knows exactly which method to call, so no dynamic binding is required, even if the method is virtual. You can verify this by reviewing the generated disassembly.
a.compile_time_binding_method();
a.run_time_binding_method();
In the disassembly generated for these calls, the addresses of both methods are hard-coded (determined by the compiler while compiling the source code) to resolve the call. This hard-coded address never changes in the binary unless you modify and recompile the source code. Such binding/linking is known as static binding/linking (or compile-time binding).
These calls are static calls because the address is hard-coded in the machine code generated by the compiler (same as ‘call
Let’s call the methods using a pointer:
For non-virtual methods:
ptrA->compile_time_binding_method();
For virtual methods:
ptrA->run_time_binding_method();
Let’s call the methods using a reference:
As a reference is nothing but an implicit pointer to the object, method calls through a reference are the same as method calls through a pointer:
refA.compile_time_binding_method();
refA.run_time_binding_method();
As I am not an author by profession, I might not have explained this in the best way :). Please help me improve it by raising your questions or doubts.
Monday, January 2, 2012
Reference vs. Pointer
A reference is an alternate name for an object or variable. It is typically implemented as an implicit constant pointer to that variable. It must be initialized from an existing object, so it can’t simply be made to point to an arbitrary memory location such as 0x1000. Pointers, on the other hand, can be made to point to any location in memory. To access an arbitrary address, we need to use pointers instead of references.
Here is a simple example of reference/pointer.
int i = 10;
mov dword ptr [i],0Ah // disassembly code
Reference:
int &ref = i;
lea eax,[i] // disassembly code
mov dword ptr [ref],eax // disassembly code
int j = ref;
mov eax,dword ptr [ref] // disassembly code
mov ecx,dword ptr [eax] // disassembly code
mov dword ptr [j],ecx // disassembly code
Pointer:
int *ptr = &i;
lea eax,[i] // disassembly code
mov dword ptr [ptr],eax // disassembly code
j = *ptr;
mov eax,dword ptr [ptr] // disassembly code
mov ecx,dword ptr [eax] // disassembly code
mov dword ptr [j],ecx // disassembly code
If you compare the disassembly, you can see that the code generated for the reference and for the pointer is the same. This means that, implementation-wise, they are the same.
Use references wherever you can, as they spare you the confusing & and * operators :).
Sunday, January 1, 2012
What is ORG (origin) directive in assembly level language?
The origin directive tells the assembler the address at which the instructions and data that follow are assumed to be loaded in memory. It sets the assembler's location counter (sometimes called the program counter) to the value of the expression in the operand field. Subsequent statements are assembled into memory locations starting at the new location counter value. If no ORG directive is encountered in a source program, the location counter is initialized to zero.
The assembler uses an internal variable called LC (Location Counter) to store the current offset of the statement being processed. When it encounters a variable declaration, it records the current value of LC in its symbol table as the variable’s address.
For example:
; Initial value of LC is 0
MOV AX, BX ; Here LC = 0
MOV CX, DX ; Now LC = LC + size of the above statement, i.e. LC = 0 + 2 = 2
A db 0 ; LC = LC + size of the above statement, i.e. LC = 2 + 2 = 4.
; So the address of “A” will be 4, as LC = 4 when the variable definition appears.
MOV DX, A ; LC = LC + size of the above statement, i.e. LC = 4 + 1 = 5
; In the above statement, “A” will be replaced with the address of “A”, which is 4.
; At the end, LC = 5 + 4 = 9
This program will work when it is loaded at offset 0 of the segment pointed to by the DS register, i.e. loading it at address 200:00h (Segment:Offset) or 700:00h will work because the offset is 00h.
What if we need to load this program at address 200:300h? Here DS = 200h and the offset is 300h (offset != 0), so the variable “A” is physically located at 200:304h, but the program will try to read its value from 200:04h. Obviously we will not get the expected result, because the program is not reading the variable from its actual address (200:304h).
This program would have worked if the initial value of LC had been 300h, wouldn’t it?
So we need a directive that instructs the assembler to initialize LC with a specific value such as 300h. The “ORG” directive does this. In such scenarios, we need to put an “ORG XXh” statement at the beginning of the program to initialize LC with the value XXh.
The bottom line is that we should use the “ORG” directive when the DS (Data Segment) register does not point to the first variable in the data segment (when the program has separate code and data segments) or to the first instruction (when the program has a single segment for both code and data).
This directive is very useful when writing boot loaders, device drivers, viruses, antivirus software and OS components, because these programs need to be loaded at a particular offset address.