Monday, August 29, 2005

Const objects and a piece of buggy code!

Question: Here is a malicious piece of code that I came across a few days ago:
[CODE]

#include <stdio.h>
int main(){
const int x=5;
*(int *)&x=10;
printf("Value of x = %d at address= 0x%x\n",*(int*)&x,&(*(int*)&x));
printf("x=%d at address =0x%x\n",x,&x);
const_cast<int&>(x) = 20;
printf("New x=%d at address =0x%x\n",x,&x);
printf("Value of x = %d at address= 0x%x\n",*(int*)&x,&(*(int*)&x));
return 0;
}

It ran on VC++ 6.0 and gave weird output:
[Output]
Value of x = 10 at address= 0x12ff7c
x=5 at address =0x12ff7c
New x=5 at address =0x12ff7c
Value of x = 20 at address= 0x12ff7c

Isn't that weird and wrong? Explain this behaviour.

Answer: The code results in undefined behaviour.
Explanation: It does not work. It just seems to work. Here is the why and how:

You have declared the variable as a const int. So when you apply the address-of operator to it, you get a const int *, whereas for a normal int variable you would get a plain int *. You then use the C-style cast (int *) to convert that const int * to an int *, and write through the result. This is the problem. The cast on its own is allowed; the undefined behaviour comes from using the converted pointer to modify an object that was defined const. The compiler is free to store constants wherever it wants, which might even be a read-only memory location, and if you try to change one, the result is undefined according to the standard. It might appear to work, do nothing, or crash.
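To make the distinction concrete, here is a minimal sketch (assuming a C++11 compiler) of the pointer type you get from &x, and of the one case where casting away const and writing is well-defined: when the object itself was never declared const. The names set_through_const and modify_non_const_object are made up for this illustration.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper: writes through a pointer-to-const by casting the
// const away. Well-defined only if the pointed-to object is not itself const.
inline void set_through_const(const int* p, int value) {
    *const_cast<int*>(p) = value;
}

// Hypothetical demo: modifies a non-const int through a const view of it
// and returns the new value.
inline int modify_non_const_object() {
    const int x = 5;
    // Taking the address of a const int yields const int*, not int*.
    static_assert(std::is_same<decltype(&x), const int*>::value,
                  "&x has type const int*");
    (void)x;  // x is only used for the type check above

    int y = 5;            // y itself is NOT const
    const int* py = &y;   // only the view through py is const
    set_through_const(py, 10);  // fine: the underlying object is modifiable
    assert(y == 10);

    // set_through_const(&x, 10);  // undefined behaviour: x is a const object
    return y;
}
```

Calling modify_non_const_object() returns 10. Uncommenting the last call would compile just as happily, which is exactly why this kind of bug is easy to write.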

Instead of printing what is actually stored in x, the compiler optimizes the code. It knows that you defined x as a const int and initialized it with 5, so it is free to print that constant directly rather than reading the value back from x's memory location. Here's a quote I like:

"The const keyword can't keep you from purposely shooting yourself in the foot. Using explicit type-casting, you can freely blow off your entire leg, because while the compiler helps prevent accidental errors, it lets you make errors on purpose. Casting allows you to "pretend" that a variable is a different type."

The villains are actually these two statements:
1. *(int *)&x=10; and
2. const_cast<int&>(x) = 20;
The first statement alone is enough to cause the undefined behaviour, but suppose we removed it: the second statement would then be the culprit. Herb Sutter, in Item 6 of Exceptional C++, says that removing the constness of a const object this way results in undefined behaviour if the actual object is defined as const. That's it. If you are interested in digging further, look at the assembly code for this program (VC++ 6.0 Disassembly):
[CODE]
1: #include <stdio.h>
2: int main()
3: {
00401010 push ebp
00401011 mov ebp,esp
00401013 sub esp,44h
00401016 push ebx
00401017 push esi
00401018 push edi
00401019 lea edi,[ebp-44h]
0040101C mov ecx,11h
00401021 mov eax,0CCCCCCCCh
00401026 rep stos dword ptr [edi]
4: const int x=5;
00401028 mov dword ptr [ebp-4],5
5: *(int *)&x=10;
0040102F mov dword ptr [ebp-4],0Ah
6: printf("Value of x = %d at address= 0x%x\n",*(int*)&x,&(*(int*)&x));
00401036 lea eax,[ebp-4]
00401039 push eax
0040103A mov ecx,dword ptr [ebp-4]
0040103D push ecx
0040103E push offset string "Value of x = %d at address= 0x%x"... (00420058)
00401043 call printf (004010d0)
00401048 add esp,0Ch
7: printf("x=%d at address =0x%x\n",x,&x);
0040104B lea edx,[ebp-4]
0040104E push edx
0040104F push 5
00401051 push offset string "x=%d at address =0x%x\n" (0042003c)
00401056 call printf (004010d0)
0040105B add esp,0Ch
8: const_cast<int&>(x) = 20;
0040105E mov dword ptr [ebp-4],14h
9: printf("New x=%d at address =0x%x\n",x,&x);
00401065 lea eax,[ebp-4]
00401068 push eax
00401069 push 5
0040106B push offset string "New x=%d at address =0x%x\n" (0042001c)
00401070 call printf (004010d0)
00401075 add esp,0Ch
10: printf("Value of x = %d at address= 0x%x\n",*(int*)&x,&(*(int*)&x));
00401078 lea ecx,[ebp-4]
0040107B push ecx
0040107C mov edx,dword ptr [ebp-4]
0040107F push edx
00401080 push offset string "Value of x = %d at address= 0x%x"... (00420058)
00401085 call printf (004010d0)
0040108A add esp,0Ch
11: return 0;
0040108D xor eax,eax
12: }
0040108F pop edi
00401090 pop esi
00401091 pop ebx
00401092 add esp,44h
00401095 cmp ebp,esp
00401097 call __chkesp (00401150)
0040109C mov esp,ebp
0040109E pop ebp
0040109F ret

I don't know much assembly language myself (yet), but I think this snippet is simple enough to give a feel for what is happening. The instructions that matter to us are the two push 5 lines (at 0040104F and 00401069). Instead of reading the value of x from its address, the compiler optimizes by simply pushing the constant 5, even though the storage location of x (ebp-4) actually holds a different value at that point.
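Why is the compiler entitled to treat x as the literal 5? A rough sketch of the reason, again assuming a C++11 compiler: a const int initialized with a constant is itself usable as a compile-time constant, so the language already lets you use x where only a constant is allowed. The function name array_length_from_const is made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical demo: uses a const int as a compile-time constant and
// returns the length of the array sized by it.
inline std::size_t array_length_from_const() {
    const int x = 5;

    // Both uses below require a compile-time constant, and x qualifies.
    static_assert(x == 5, "x is known at compile time");
    int values[x] = {0};

    std::size_t length = sizeof(values) / sizeof(values[0]);
    assert(length == 5);  // runtime sanity check of the same fact
    return length;
}
```

Since the value of x is fixed at compile time like this, a compiler has no obligation to reload it from memory at each use, which fits the push 5 instructions in the listing above.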

So, people, beware of writing code that invokes undefined behaviour. Such code may look convincing and correct, and that leads to bugs that are very hard to fix. Hope you all liked and enjoyed this post.

P.S. - Please report comments, corrections, and suggestions at this blog. I am learning.
