Journey of a C Program to Linux Executable in 4 Stages

by Himanshu Arora on October 5, 2011

You write a C program, use gcc to compile it, and you get an executable. It is pretty simple. Right?

Have you ever wondered what happens during the compilation process and how the C program gets converted to an executable?

A C source file passes through four main stages on its way to becoming an executable:

  1. Pre-processing
  2. Compilation
  3. Assembly
  4. Linking

In Part I of this article series, we discuss the steps gcc goes through when it compiles C source code into an executable.

Before going any further, let’s take a quick look at how to compile and run C code using gcc, with a simple hello world example.

$ vi print.c
#include <stdio.h>
#define STRING "Hello World"
int main(void)
{
/* Using a macro to print 'Hello World'*/
printf(STRING);
return 0;
}

Now, let’s run the gcc compiler over this source code to create the executable.

$ gcc -Wall print.c -o print

In the above command:

  • gcc – Invokes the GNU C compiler
  • -Wall – gcc flag that enables a broad set of commonly useful warnings. -W stands for warning, and we are passing “all” to -W. Despite the name, it does not enable every warning gcc supports.
  • print.c – Input C program
  • -o print – Instructs the compiler to name the executable print. If you don’t specify -o, the compiler names the executable a.out by default.

Finally, run print, which executes the C program and displays Hello World.

$ ./print
Hello World

Note: When you are working on a big project that contains several C programs, use the make utility to manage compilation, as we discussed earlier.

Now that we have a basic idea of how gcc is used to convert source code into a binary, we’ll review the four stages a C program has to go through to become an executable.

1. PRE-PROCESSING

This is the very first stage through which the source code passes. The following tasks are done in this stage:

  1. Macro substitution
  2. Stripping of comments
  3. Expansion of the included files

To understand preprocessing better, you can compile the above ‘print.c’ program using the -E flag, which prints the preprocessed output to stdout.

$ gcc -Wall -E print.c

Even better, you can use the ‘-save-temps’ flag as shown below. It instructs gcc to keep the temporary intermediate files it generates and store them in the current directory.

$ gcc -Wall -save-temps print.c -o print

So when we compile print.c with the -save-temps flag, we get the following intermediate files in the current directory (along with the print executable):

$ ls
print.i
print.s
print.o

The preprocessed output is stored in the temporary file with the extension .i (i.e., ‘print.i’ in this example).

Now, let’s open the print.i file and view its content.

$ vi print.i
......
......
......
......
# 846 "/usr/include/stdio.h" 3 4
extern FILE *popen (__const char *__command, __const char *__modes) ;
extern int pclose (FILE *__stream);
extern char *ctermid (char *__s) __attribute__ ((__nothrow__));

# 886 "/usr/include/stdio.h" 3 4
extern void flockfile (FILE *__stream) __attribute__ ((__nothrow__));
extern int ftrylockfile (FILE *__stream) __attribute__ ((__nothrow__)) ;
extern void funlockfile (FILE *__stream) __attribute__ ((__nothrow__));

# 916 "/usr/include/stdio.h" 3 4
# 2 "print.c" 2

int main(void)
{
printf("Hello World");
return 0;
}

In the above output, you can see that the source file is now filled with a great deal of extra information, but the lines of code we wrote are still there at the end. Let’s analyze these lines first.

  1. The first observation is that the argument to printf() is now the string “Hello World” itself rather than the macro. In fact, the macro definition and its usage have completely disappeared. This confirms the first task: all macros are expanded in the preprocessing stage.
  2. The second observation is that the comment we wrote in our original code is gone. This confirms that all comments are stripped off.
  3. The third observation is that the ‘#include’ line is missing, and in its place we see a whole lot of code. So it’s safe to conclude that stdio.h has been expanded and literally included in our source file. This is how the compiler is able to see the declaration of the printf() function.

When I searched the print.i file, I found that the function printf is declared as:

extern int printf (__const char *__restrict __format, ...);

This declaration tells the compiler only the name and type of printf(). The keyword ‘extern’ indicates that the function is not defined here; it is external to this file. We will later see how gcc gets to the definition of printf().

You can use gdb to debug your C programs. Now that we have a decent understanding of what happens during the preprocessing stage, let us move on to the next stage.

2. COMPILING

Once the preprocessing stage is done, the next step is to take print.i as input, compile it, and produce an intermediate compiled output. The output file for this stage is ‘print.s’, which contains assembly-level instructions.

Open the print.s file in an editor and view the content. (You can also stop gcc right after this stage with gcc -S print.c, which writes print.s without assembling or linking.)

$ vi print.s
.file "print.c"
.section .rodata
.LC0:
.string "Hello World"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
movq %rsp, %rbp
.cfi_offset 6, -16
.cfi_def_cfa_register 6
movl $.LC0, %eax
movq %rax, %rdi
movl $0, %eax
call printf
movl $0, %eax
leave
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 4.4.3-4ubuntu5) 4.4.3"
.section .note.GNU-stack,"",@progbits

I am not much into assembly-level programming, but a quick look shows that this output consists of instructions that the assembler can understand and convert into machine-level language.

3. ASSEMBLY

At this stage the print.s file is taken as input and an intermediate file, print.o, is produced. This file is also known as the object file.

It is produced by the assembler, which converts a ‘.s’ file containing assembly instructions into a ‘.o’ object file containing machine-level instructions. At this stage only the existing code is converted into machine language; function calls like printf() are not yet resolved.

Since the output of this stage is a machine-level file (print.o), we cannot read its content directly. If you still try to open print.o in an editor, you’ll see something that is totally unreadable.

$ vi print.o
^?ELF^B^A^A^@^@^@^@^@^@^@^@^@^A^@>^@^A^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@0^
^@UH<89>å¸^@^@^@^@H<89>ǸHello World^@^@GCC: (Ubuntu 4.4.3-4ubuntu5) 4.4.3^@^
T^@^@^@^@^@^@^@^AzR^@^Ax^P^A^[^L^G^H<90>^A^@^@^\^@^@]^@^@^@^@A^N^PC<86>^B^M^F
^@^@^@^@^@^@^@^@.symtab^@.strtab^@.shstrtab^@.rela.text^@.data^@.bss^@.rodata
^@.comment^@.note.GNU-stack^@.rela.eh_frame^@^@^@^@^@^@^@^@^@^@^@^
...
...
…

The only thing we can explain by looking at the print.o file is the string ELF near the very beginning.

ELF stands for Executable and Linkable Format.

This is the format gcc uses on Linux for machine-level object files and executables. Before ELF, Linux used a format known as a.out. ELF is a more sophisticated format than a.out (we might dig deeper into the ELF format in a future article).

Note: If you compile your code without specifying the name of the output file, the output file is still named ‘a.out’, but its format is now ELF. Only the default executable file name remains the same.

4. LINKING

This is the final stage, at which all function calls are linked with their definitions. As discussed earlier, until this stage gcc doesn’t know about the definition of functions like printf(). Until it is known exactly where these functions are implemented, a placeholder is used for each function call. It is at this stage that the definition of printf() is resolved and the actual address of the function printf() is plugged in.

The linker comes into action at this stage and does this task.

The linker also does some extra work: it combines our program with some extra code that is required when the program starts and when it ends. For example, there is standard code for setting up the running environment, such as passing command-line arguments and environment variables to the program. Similarly, there is standard code for returning the program’s return value to the system.

The above tasks of the linker can be verified by a small experiment. We already know that the linker converts the .o file (print.o) into an executable file (print).

So if we compare the sizes of print.o and print using the size command, we’ll see the difference.

$ size print.o
   text	   data	    bss	    dec	    hex	filename
     97	      0	      0	     97	     61	print.o 

$ size print
   text	   data	    bss	    dec	    hex	filename
   1181	    520	     16	   1717	    6b5	print

The size command gives us a rough idea of how much the text, data, and bss sizes grow from the object file to the executable. This is all because of the extra standard code that the linker combines with our program.

Now you know what happens to a C program before it becomes an executable. You know about the Preprocessing, Compiling, Assembly, and Linking stages. There is a lot more to the linking stage, which we will cover in the next article in this series.



{ 58 comments }

1 AMIT October 5, 2011 at 8:27 am

Very nice explanation !!
Waiting for more.

2 Jason October 5, 2011 at 9:03 am

Clear and concise! Thank you very much.

3 TommyZee October 5, 2011 at 9:47 am

Very well done maestro! :P )

4 Eric October 5, 2011 at 11:04 am

Noob alert: I couldn’t get it to work unless I included .

5 Eric October 5, 2011 at 11:07 am

I think I know what happened. The comment processor stripped off the “stdio.h” from my last comment. It might have done that for the Hello World code too.

6 Ramesh Natarajan October 5, 2011 at 12:56 pm

@Eric,

Thanks for pointing it out. It is fixed now in the print.c program mentioned above.

7 Júlio Hoffimann Mendes October 5, 2011 at 5:09 pm

Hi Ramesh,

Nice post, would be great to get deeper in the ELF format in future articles as you said. :-)

Regards,
Júlio.

8 Himanshu October 5, 2011 at 7:34 pm

Thanks you all for your comments.
As of now, you’ll soon get to see Part-II of this series. After that I will write on ELF too :-)

9 Reynold October 5, 2011 at 8:03 pm

Great article :)

10 Dennis October 6, 2011 at 12:20 am

excellent!!

11 Gregory October 6, 2011 at 7:09 am

Very nice article, many times used gcc but never thought about details

12 Chuck October 6, 2011 at 5:19 pm

Very Nice. You hit on a helpful topic to expand our horizons.

13 peter October 8, 2011 at 6:34 am

Great article. More please.

14 Himanshu October 9, 2011 at 12:19 am

Thank you all again for appreciation. Article on ‘Linking process (advanced)’ can arrive anytime.

15 behzad October 12, 2011 at 3:35 pm

this was a good read, thanks

16 Welington October 14, 2011 at 4:34 pm

Pretty nice article.
Congratulations!

I hope that it’s going to be a first of a series.

17 Albert Joseph October 15, 2011 at 3:28 am

Thanks for the article. I’ll be visiting here frequently for more knowledge :)

18 Bill November 17, 2011 at 10:21 am

I believe that coff (common object file format) is prior to ELF. a.out is just a file name.

19 rahul kumar dubey January 10, 2012 at 7:46 am

really nice explanation.. please keep it up…
it will be nice if you can post some article realted to how / what exactly happens at machine level.. when computer starts and how “hello world” for e.g. is manipulate d in 101010101 form at hardware level

20 Himanshu January 10, 2012 at 9:26 am

@rahul kumar dubey

Thanks Rahul.
Sure, whenever I get a chance I’ll write an article over it.

21 Deepak March 15, 2012 at 6:30 am

Very good explanation. Hope to see some more articles from you.

22 Dibyendu Chakraborty April 2, 2012 at 3:26 am

Excellent lecture … ready for more.

23 kalaiselvan April 24, 2012 at 6:33 am

thank you

24 Tarun Thakur May 2, 2012 at 11:20 pm

Awesome .. Great job done. Crystal clear steps provided. In short time, understanding of internal steps from program to executable is made.
Thank U .. !!

25 Shashank July 13, 2012 at 9:17 am

Please keep writing more.
Very nicely done

26 bril July 24, 2012 at 12:26 am

Excellent…. well written and well explained, You just saved my day. Thanks for such an awesome article.

27 Tayyab August 4, 2012 at 1:40 am

Brilliant job ……
Thank u so much !!

28 Ashish August 29, 2012 at 10:38 am

Really Nice Stuff…Thanks a lot….waiting for more like this on unix internals

29 Anonymous September 5, 2012 at 6:03 am

extra ordinary !!!

30 sujith September 7, 2012 at 10:33 am

very clear and good for freshers…

31 Prasad October 23, 2012 at 9:50 am

A very good explanation… :)

32 Rajeev October 26, 2012 at 5:25 am

Nice Article very con-vincible. I have one doubt about the LOADER. How LOADER program come into play in the above example and when.

Thanks in advance.

33 soumyajit November 28, 2012 at 3:31 am

great article sir

34 Pradeep January 17, 2013 at 6:57 am

Very well explained

35 Mayank February 9, 2013 at 10:25 am

Very well explained for beginners.

36 slekcher February 16, 2013 at 5:41 pm

At the linking stage, you said the printf() definition will be included in our executable file, but is the definition in machine code?

37 shiv February 26, 2013 at 10:29 pm

Awesome work!! Have been looking for such an article for some time. Very well explained.

38 Anonymous May 24, 2013 at 6:56 am

superb explanation with the demonstrated commands … thanx

39 Tarunjit Singh May 30, 2013 at 1:09 am

You can mention that the option to see the output after compiling and before assembling is

$$ gcc -S print.c

40 Abhisek Pattnaik June 5, 2013 at 11:01 am

Please provide the links to the part 2 & ELF in the article content itself.

41 prasant June 10, 2013 at 3:42 am

thanx for these usefull informations…….

42 manoj June 16, 2013 at 3:28 am

very concise & helpful !!!…

43 Hanu June 21, 2013 at 2:15 am

Very nice article, many times used gcc but never thought about details

44 Nithin July 3, 2013 at 3:48 am

Thanks for the detailed explanation…

45 mahe July 16, 2013 at 6:38 am

nice explanation.. :D

46 prabhu August 4, 2013 at 4:17 am

clear explation…

47 Harish August 5, 2013 at 2:25 pm

very nice explanation.. Is there some website to learn c from scratch in deep..?

48 chandu August 7, 2013 at 12:43 am

superb article… thanks a lot…!!

49 Akshara August 18, 2013 at 7:32 am

Very Nicely explained

50 Akshara August 18, 2013 at 7:41 am

i have one questions… on which of these 4 stages inline functions are handled???

51 Anonymous September 11, 2013 at 1:13 am

Awesome!!! Keep up the good work

52 Himanshu AKA Anky September 23, 2013 at 6:23 am

AWESOME work bro Helpfull in mah assignment

53 Duryodhan September 28, 2013 at 10:53 pm

superb….man very helpfull for me!!!!!!!

54 anand September 30, 2013 at 2:36 am

Cool Stuff. Definitely helps in understanding the Internal of a c language.

55 satya October 1, 2013 at 1:09 am

hi sir,
i like ur blog, very much.
my question is, when i executed .cpp file with g++ command and then i searched for .s file which is not there
print.c print print.o print-size

where is print.o, i want to see my assembling code….
how can i see it.

please help, thanks in advance…

56 Dharmik February 25, 2014 at 12:43 pm

Very nicely explained. .. Always liking this site

57 Ajay March 27, 2014 at 11:53 pm

Nicely Explained.. good Work

58 Anonymous August 22, 2014 at 5:04 am

why do we need different compilation stages???
