
Tuesday, March 27, 2012

C/C++ Data Model (32/64 Bit)

A machine-dependent high-level language like C provides several data types such as char, int, and long. However, compilers/languages are not entirely free to decide the size of these data types. The reason is that compiled code has to call the OS's system APIs, which expect inputs of specific sizes. Thus the size of every data type provided by the compiler must match the size expected by the OS and its system APIs. Otherwise, the system call will fail with some error, or will not give the expected result.

Let's consider a system call which expects an 'int' (4 bytes) as its first argument and a 'char' (1 byte) as its second. In order to call this API, the compiler has to have an 'int' data type of 4 bytes and a 'char' data type of 1 byte. Isn't it? Now suppose the compiler's 'int' is 1 byte and its 'char' is 4 bytes. When the call to the OS API is made, the arguments will have the wrong sizes, and it is obvious that the API will fail with an error. As a workaround we could pass an 'int' (since its size is 1 byte in our compiler) where the API expects a 'char', and vice versa, but doing this would be a big programming headache and would cause endless confusion and bugs. That's the reason the compiler's data types must match the data types chosen by the operating system.

There will be many more issues if the size of any data type provided by the compiler is not the same as the corresponding OS data type. To avoid such issues, data models which specify the size of each data type were introduced and standardized. The compiler is expected to follow the data model chosen by the operating system.

Every application and every operating system has an abstract data model. Many applications do not explicitly expose this data model, but the model guides the way in which the application's code is written.

The table below details the data type sizes (in bits) of several data models for comparison purposes.

   Data model      | char | short | int | long | long long | pointer/size_t | Sample operating systems
  -----------------|------|-------|-----|------|-----------|----------------|--------------------------
   LP32            |  8   |  16   | 16  |  32  |           |       32       | Win-16, Apple Macintosh
   ILP32           |  8   |  16   | 32  |  32  |           |       32       | 32-bit UNIX
   LLP64 / IL32P64 |  8   |  16   | 32  |  32  |    64     |       64       | Microsoft Windows (x64/IA-64)
   LP64 / I32LP64  |  8   |  16   | 32  |  64  |    64     |       64       | Most 64-bit Unix and Unix-like systems, e.g. Solaris, Linux, and Mac OS X; z/OS
   ILP64           |  8   |  16   | 64  |  64  |    64     |       64       | HAL Computer Systems port of Solaris to SPARC64
   SILP64          |  8   |  64   | 64  |  64  |    64     |       64       |
Many 64-bit compilers today use the LP64 model (including the native compilers of Solaris, AIX, HP-UX, Linux, Mac OS X, FreeBSD, and IBM z/OS). Microsoft's Visual C++ compiler uses the LLP64 model.

Please note that the size of 'long long' is 64 bits on both 32-bit and 64-bit machines/OSes. In the C99 version of the C programming language and the C++11 version of C++, a 'long long' type is supported that doubles the minimum capacity of the standard long to 64 bits. Compilers that require code to be compliant with the previous C++ standard, C++03, do not support this type, because 'long long' did not exist in C++03. Microsoft's VC++ supports it; however, some compilers may not.


References:
http://en.wikipedia.org/wiki/64-bit
http://www.unix.org/whitepapers/64bit.html

Wednesday, March 21, 2012

Relationship between high-level language data types and the processor's data types

The size of any data type in a high-level language like C/C++ has no direct relationship with the processor's data types. The size of each data type is defined by the language/compiler and is implementation specific. Each data type is implemented in terms of the data types supported by the processor.

Let's take the Intel 8086 processor (16-bit) as an example; its instruction set supports the following data types:

   Length | Name
  --------|------
   8-bit  | byte
   16-bit | word

It means you can perform operations on 1- or 2-byte (8/16-bit) data. The Intel 8086 processor has no support for 4-byte (32-bit) data: you won't find any 8086 instruction which can perform an operation on 4-byte (32-bit) data.

Now let's see the C data types and their sizes under a 16-bit compiler targeting the 8086:

   Type          | Length  | Range
  ---------------|---------|----------------------------------
   unsigned char |  8 bits |              0 to 255
   char          |  8 bits |           -128 to 127
   enum          | 16 bits |        -32,768 to 32,767
   unsigned int  | 16 bits |              0 to 65,535
   short int     | 16 bits |        -32,768 to 32,767
   int           | 16 bits |        -32,768 to 32,767
   unsigned long | 32 bits |              0 to 4,294,967,295
   long          | 32 bits | -2,147,483,648 to 2,147,483,647
   
The signed/unsigned 'char' data types of C/C++ are implemented as the processor's 'byte' data type, and the 16-bit data types like 'int', 'enum' etc. are implemented as the processor's 'word' data type. Now the question arises: how can the compiler support a 32-bit type like 'long' on a processor (the Intel 8086) which doesn't support 32-bit arithmetic/operations?

The answer is very simple :). A 32-bit data type like 'long' is implemented in terms of two of the processor's 'word' data types. Compilers implement 32-bit arithmetic operations by using the 16-bit arithmetic operations provided by the processor.

Mapping each C/C++ data type onto a processor data type doesn't mean that the size of any C/C++ data type is determined by the processor. The data types of a machine-dependent high-level language are implemented in terms of the processor's data types, but it is the language and compiler which provide these data types. Thus the size of these data types is obviously specified by the language/compiler.

Read Data models for more information on this.