
Tuesday, March 27, 2012

C/C++ Data Model (32/64 Bit)

A machine-dependent high-level language like C provides several data types such as char, int, and long. However, compilers/languages are not entirely free to decide the size of these data types. The reason is that compiled code has to call the OS's system APIs, which expect inputs of specific sizes. Thus the size of every data type provided by the compiler must match the size expected by the OS and its system APIs. Otherwise, the system call will fail with some error, or will not give the expected result.

Let's consider a system call which expects an 'int' (4 bytes) as its first argument and a 'char' (1 byte) as its second. In order to call this API, the compiler has to have an 'int' data type of 4 bytes and a 'char' data type of 1 byte. Isn't it? Now suppose the compiler's 'int' is 1 byte and its 'char' is 4 bytes. When the call to the OS API is made, the arguments will have the wrong sizes, and it is obvious that the API will fail with an error. As a workaround we could pass an 'int' (since its size is 1 byte in our compiler) where the API expects a 'char', and vice versa, but doing this would be a big programming headache and would cause endless confusion and bugs. That's the reason the compiler's data types must match the data types chosen by the operating system.

There will be many more issues if the size of any data type provided by the compiler is not the same as the corresponding OS data type. To avoid such issues, data models which specify the size of each data type were introduced and standardized. The compiler is expected to follow the data model chosen by the operating system.

Every application and every operating system has an abstract data model. Many applications do not explicitly expose this data model, but the model guides the way in which the application's code is written.

The table below details the data type sizes (in bits) of several data models for comparison purposes.

   Data model      | char | short | int | long | long long | pointer/size_t | Sample operating systems
  -----------------|------|-------|-----|------|-----------|----------------|--------------------------
   LP32            |  8   |  16   | 16  |  32  |           |       32       | Win-16, Apple Macintosh
   ILP32           |  8   |  16   | 32  |  32  |           |       32       | 32-bit UNIX
   LLP64 / IL32P64 |  8   |  16   | 32  |  32  |    64     |       64       | Microsoft Windows (x64/IA-64)
   LP64 / I32LP64  |  8   |  16   | 32  |  64  |    64     |       64       | Most 64-bit Unix and Unix-like systems, e.g. Solaris, Linux, and Mac OS X; z/OS
   ILP64           |  8   |  16   | 64  |  64  |    64     |       64       | HAL Computer Systems port of Solaris to SPARC64
   SILP64          |  8   |  64   | 64  |  64  |    64     |       64       |
Many 64-bit compilers today use the LP64 model (including the native compilers of Solaris, AIX, HP-UX, Linux, Mac OS X, FreeBSD, and IBM z/OS). Microsoft's Visual C++ compiler uses the LLP64 model.

Please note that the size of 'long long' is 64 bits on both 32-bit and 64-bit machines/OSes. In the C99 version of the C programming language and the C++11 version of C++, a 'long long' type is supported that doubles the minimum capacity of the standard long to 64 bits. Compilers that require code to be compliant with the previous C++ standard, C++03, do not support this type, because 'long long' did not exist in C++03. Microsoft's VC++ supports it; however, some compilers may not.


References:
http://en.wikipedia.org/wiki/64-bit
http://www.unix.org/whitepapers/64bit.html

Wednesday, March 21, 2012

Relationship between high-level language data types and the processor's data types

The size of any data type in a high-level language like C/C++ has no direct relationship with the processor's data types. The size of each data type is defined by the language/compiler and is implementation specific. Each data type is implemented in terms of the data types supported by the processor.

Let's take the Intel 8086 processor (16-bit) as an example; its instruction set supports the following data types:

   Length | Name
  --------|------
   8-bit  | byte
   16-bit | word

It means you can perform operations on 1- or 2-byte (8/16-bit) data. The Intel 8086 processor has no support for 4-byte (32-bit) data: you won't find any 8086 instruction which can perform an operation on 4-byte (32-bit) data.

Now let's see the C data types and their sizes under a 16-bit compiler targeting the 8086:

   Type          | Length  | Range
  ---------------|---------|----------------------------------
   unsigned char |  8 bits |              0 to 255
   char          |  8 bits |           -128 to 127
   enum          | 16 bits |        -32,768 to 32,767
   unsigned int  | 16 bits |              0 to 65,535
   short int     | 16 bits |        -32,768 to 32,767
   int           | 16 bits |        -32,768 to 32,767
   unsigned long | 32 bits |              0 to 4,294,967,295
   long          | 32 bits | -2,147,483,648 to 2,147,483,647
   
The signed/unsigned 'char' data types of C/C++ are implemented as the processor's 'byte' data type, and the 16-bit data types like 'int', 'enum' etc. are implemented as the processor's 'word' data type. Now the question arises: how can the compiler support a 32-bit type like 'long' on a processor (the Intel 8086) which doesn't support 32-bit arithmetic/operations?

The answer is very simple :). A 32-bit data type like 'long' is implemented in terms of two of the processor's 'word' data types. Compilers implement 32-bit arithmetic operations by using the 16-bit arithmetic operations provided by the processor.

Mapping each C/C++ data type onto a processor data type doesn't mean that the size of any C/C++ data type is determined by the processor. The data types of a machine-dependent high-level language are implemented in terms of the processor's data types, but it is the language and compiler which provide these data types. Thus the size of these data types is obviously specified by the language/compiler.

Read Data models for more information on this.