We run programs all the time: double-clicking an icon, typing a command in the terminal, or tapping an app on a phone. But under the hood, starting a program is a surprisingly complex chain of events involving the operating system, the CPU, memory, and various runtime components.
In this article, we’ll walk through what exactly happens when you run a program — from the moment you start it, to the moment it finishes and returns control back to the operating system.
1. A Program Is Just a File (Until You Run It)
Before execution, a program is simply a file on disk. Depending on the language and platform, this can be:
- a native binary (e.g., a compiled C/C++ program on Linux or Windows),
- bytecode (e.g., Java .class files or .jar archives, .NET assemblies),
- an interpreted script (e.g., Python, shell, JavaScript outside the browser).
An executable file usually has:
- header (format information, entry point address, metadata),
- code segment (machine instructions),
- data sections (initialized and uninitialized global variables),
- symbol and relocation tables (for linking and debugging).
On Linux, native executables are often in the ELF format; on Windows, in PE (Portable Executable) format. The OS loader knows how to interpret these formats.
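You can check for the loader's telltale signature yourself. Below is a minimal C sketch (the path ./my_program is just a placeholder) that reads the first four bytes of a file and compares them against the ELF magic number, 0x7f followed by the letters "ELF":

```c
/* A minimal sketch: check a file for the ELF magic number that the
   loader looks for. The path "./my_program" is a placeholder. */
#include <stdio.h>
#include <string.h>

int main(void) {
    unsigned char magic[4];
    FILE *f = fopen("./my_program", "rb");
    if (!f) {
        perror("fopen");
        return 1;
    }
    if (fread(magic, 1, 4, f) == 4 &&
        memcmp(magic, "\x7f" "ELF", 4) == 0) {
        printf("Looks like an ELF executable\n");
    } else {
        printf("Not an ELF file\n");
    }
    fclose(f);
    return 0;
}
```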
2. Double-Click or ./run: Triggering Execution
Execution starts when you ask the OS to run a program:
- by double-clicking an icon in a file manager,
- by calling it from a shell: ./my_program,
- by another process using calls like exec() (POSIX) or CreateProcess() (Windows), as in the sketch below.
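To make the third option concrete, here is a C sketch of the classic POSIX pattern: fork() duplicates the current process, and execv() replaces the child's image with a new program. The path ./my_program is again a placeholder.

```c
/* Sketch of the POSIX launch path: fork() a child, then replace the
   child's image with execv(). "./my_program" is a placeholder. */
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();              /* duplicate the current process */
    if (pid < 0) {
        perror("fork");
        return 1;
    }
    if (pid == 0) {                  /* child: load a new program image */
        char *argv[] = { "./my_program", NULL };
        execv(argv[0], argv);
        perror("execv");             /* only reached if execv fails */
        _exit(127);
    }
    int status;                      /* parent: wait for the child */
    waitpid(pid, &status, 0);
    printf("child exited with status %d\n", WEXITSTATUS(status));
    return 0;
}
```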
At this point, the OS checks:
- Does the file exist?
- Is it executable?
- What is its type? Native binary, script, or something else?
For scripts, the OS may use a shebang line (e.g., #!/usr/bin/env python3) to determine which interpreter should run the file.
3. The OS Loader Takes Over
The next step is the loader, a component of the operating system responsible for preparing a program for execution. Its tasks include:
- parsing the executable file format (ELF, PE, etc.),
- verifying permissions and security constraints,
- creating a new process,
- allocating an address space for the process,
- mapping the program’s code and data segments into memory,
- loading and linking dynamic libraries (shared objects, DLLs),
- setting up the initial stack with arguments and environment variables.
At this stage, the OS has created the “shell” in which your program will live, but the code has not yet started executing in user mode.
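Dynamic linking can even be driven from your own code. The sketch below, assuming a Linux system with glibc, asks the dynamic loader to map libm.so.6 at runtime and resolve the cos symbol by name:

```c
/* A minimal sketch of runtime dynamic linking on Linux: map libm and
   resolve the "cos" symbol by name. On older glibc, link with -ldl. */
#include <dlfcn.h>
#include <stdio.h>

int main(void) {
    void *handle = dlopen("libm.so.6", RTLD_NOW);  /* map the library */
    if (!handle) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return 1;
    }
    double (*cosine)(double) = (double (*)(double))dlsym(handle, "cos");
    if (cosine)
        printf("cos(0.0) = %f\n", cosine(0.0));
    dlclose(handle);
    return 0;
}
```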
4. From Program to Process
A key concept here: a program is a passive file on disk; a process is a running instance of that program.
When the OS starts a process, it creates:
- a Process Control Block (PCB) or equivalent structure with:
  - a process ID (PID),
  - the current state (running, waiting, etc.),
  - CPU register state,
  - open file descriptors,
  - scheduling information (priority, time slice),
- a private virtual address space,
- one or more threads of execution.
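Real kernels use far richer structures (Linux's task_struct runs to hundreds of fields), but a toy PCB in C might look like the sketch below. Every field name here is illustrative, not taken from any real kernel:

```c
/* An illustrative, simplified Process Control Block. Field names and
   types are hypothetical; real kernels (e.g., Linux's task_struct)
   are far more elaborate. */
#include <stdint.h>

enum proc_state { PROC_RUNNING, PROC_READY, PROC_WAITING, PROC_ZOMBIE };

struct pcb {
    int32_t pid;              /* process ID */
    enum proc_state state;    /* current scheduling state */
    uint64_t registers[32];   /* saved CPU register state */
    int open_fds[64];         /* open file descriptor table */
    int priority;             /* scheduling priority */
    uint64_t time_slice_ns;   /* remaining quantum */
    void *page_table;         /* root of the private address space */
};
```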
The process now exists conceptually, but we still need to define how its memory is organized and where execution will begin.
5. Memory Layout of a Running Program
Most traditional processes share a similar logical memory layout:
| Region | Purpose |
|---|---|
| Text segment | Executable machine code instructions |
| Data segment | Initialized global and static variables |
| BSS segment | Zero-initialized globals and statics |
| Heap | Dynamically allocated memory (e.g., malloc, new) |
| Stack | Function calls, local variables, return addresses |
| Mapped regions | Shared libraries, memory-mapped files, JIT code |
The OS allocates and protects these regions using the memory management unit (MMU) and page tables, providing isolation between processes.
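You can observe this layout from inside a process by printing addresses from each region. In the C sketch below, exact addresses vary between runs because of address space layout randomization (ASLR), but the relative placement usually matches the table above:

```c
/* Print an address from each memory region of the running process. */
#include <stdio.h>
#include <stdlib.h>

int initialized_global = 42;           /* data segment */
int uninitialized_global;              /* BSS segment */

int main(void) {
    int local = 0;                     /* stack */
    int *heap = malloc(sizeof *heap);  /* heap */

    printf("text  (code):  %p\n", (void *)main);
    printf("data segment:  %p\n", (void *)&initialized_global);
    printf("BSS  segment:  %p\n", (void *)&uninitialized_global);
    printf("heap:          %p\n", (void *)heap);
    printf("stack:         %p\n", (void *)&local);

    free(heap);
    return 0;
}
```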
6. Preparing the Stack and the Entry Point
Before your own code runs, the runtime environment must be set up:
- the OS places command-line arguments and environment variables on the stack,
- the C runtime (or other runtime) is initialized,
- global constructors and static initializers may be executed,
- security features (like stack canaries) are configured.
Every executable file declares an entry point — the address at which execution should begin. On many systems, a low-level function like _start is called first, which then sets up the environment and eventually calls main() (in C/C++-style programs).
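In C, the arguments and environment that the loader placed on the initial stack eventually arrive in main(). A minimal sketch; note that the three-argument form of main() is a common POSIX extension, not something the C standard guarantees:

```c
/* The arguments and environment the OS placed on the initial stack
   arrive in main(). The three-argument form is a common POSIX
   extension, not required by the C standard. */
#include <stdio.h>

int main(int argc, char *argv[], char *envp[]) {
    for (int i = 0; i < argc; i++)
        printf("argv[%d] = %s\n", i, argv[i]);
    if (envp[0])
        printf("first environment entry: %s\n", envp[0]);
    return 0;
}
```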
7. The CPU’s Fetch–Decode–Execute Cycle
Now the CPU starts executing instructions in the context of your process. At a very high level, each CPU core repeats the same basic loop:
- Fetch: read the next instruction from memory using the instruction pointer (program counter).
- Decode: determine what the instruction means and which operands it needs.
- Execute: perform the requested operation (addition, comparison, branch, memory access, etc.).
- Write-back: store the result in registers or memory.
Modern CPUs add layers of sophistication:
- caches (L1, L2, L3) to reduce memory access latency,
- pipelining to execute multiple instructions in overlapping stages,
- branch prediction to guess future instruction paths (illustrated in the sketch below),
- out-of-order and superscalar execution to exploit parallelism inside a single thread.
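Branch prediction in particular has visible performance effects. The C sketch below, a variation on a well-known benchmark, times the same loop over random and then sorted data; the sorted pass is typically much faster because the branch becomes predictable. Timings are machine-dependent, and aggressive optimization can vectorize the branch away, so compile with -O0 or -O1:

```c
/* A rough illustration of branch prediction: the same loop runs faster
   over sorted data because the branch becomes predictable. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 10000000

static long sum_big(const unsigned char *data) {
    long sum = 0;
    for (int i = 0; i < N; i++)
        if (data[i] >= 128)           /* the branch in question */
            sum += data[i];
    return sum;
}

static int cmp(const void *a, const void *b) {
    return *(const unsigned char *)a - *(const unsigned char *)b;
}

int main(void) {
    unsigned char *data = malloc(N);
    if (!data) return 1;
    for (int i = 0; i < N; i++)
        data[i] = rand() % 256;

    clock_t t0 = clock();
    long s1 = sum_big(data);          /* unpredictable branch */
    clock_t t1 = clock();

    qsort(data, N, 1, cmp);           /* make the branch predictable */
    clock_t t2 = clock();
    long s2 = sum_big(data);
    clock_t t3 = clock();

    printf("random: %ld (%.2fs)  sorted: %ld (%.2fs)\n",
           s1, (double)(t1 - t0) / CLOCKS_PER_SEC,
           s2, (double)(t3 - t2) / CLOCKS_PER_SEC);
    free(data);
    return 0;
}
```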
8. System Calls: Crossing the User–Kernel Boundary
Programs often need services that only the kernel can provide: reading and writing files, opening network connections, allocating low-level resources, managing processes, and more.
To do this, the program issues a system call (syscall): a controlled transition from user mode to kernel mode. Examples include:
read() and write() (file I/O), open() and close() (file descriptors), socket() and connect() (networking), fork(), exec(), and wait() (process management).
The kernel validates parameters, accesses hardware or system resources, and eventually returns control and results back to user mode.
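Here is about the smallest C program that visibly crosses that boundary: it bypasses stdio and calls write() directly on file descriptor 1 (standard output):

```c
/* Even printf() ultimately crosses the user–kernel boundary. Calling
   write() directly makes the transition explicit: file descriptor 1
   is standard output, and the kernel performs the actual I/O. */
#include <unistd.h>

int main(void) {
    const char msg[] = "hello from a system call\n";
    ssize_t n = write(1, msg, sizeof msg - 1);  /* traps into the kernel */
    return n < 0 ? 1 : 0;
}
```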
9. Multithreading and Scheduling
Most modern programs either use multiple threads or run on systems where dozens of processes share the same CPU cores.
The operating system’s scheduler decides:
- which process or thread runs on which CPU core,
- for how long (time slice / quantum),
- how to balance priorities and fairness.
Each thread has its own stack and register state. When the scheduler switches from one thread or process to another, it performs a context switch: saving the old context and restoring the new one.
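A minimal POSIX-threads sketch (compile with -pthread) that starts two threads, each with its own stack, and waits for both to finish:

```c
/* Two threads with POSIX threads: each gets its own stack, and the
   scheduler interleaves them on the available cores. */
#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg) {
    long id = (long)arg;
    printf("thread %ld running (own stack near %p)\n", id, (void *)&id);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, (void *)1L);
    pthread_create(&t2, NULL, worker, (void *)2L);
    pthread_join(t1, NULL);   /* wait for both threads to finish */
    pthread_join(t2, NULL);
    return 0;
}
```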
10. How Runtimes Change the Picture: JVM, Python, JavaScript
10.1 Java Virtual Machine (JVM)
For Java, Kotlin, Scala, and similar languages, you don’t run native machine code directly. Instead:
- your source code is compiled to bytecode,
- the JVM is started as a native process,
- the JVM loads and verifies bytecode classes,
- a JIT (Just-In-Time) compiler may compile hot methods to native code,
- a garbage collector automatically reclaims unused objects.
10.2 Python
For Python:
- the Python interpreter is started as a native process,
- your code is compiled to Python bytecode (cached in .pyc files),
- a VM loop executes this bytecode instruction-by-instruction,
- garbage collection and reference counting manage memory.
10.3 JavaScript Engines (e.g., V8)
In environments like Node.js or modern browsers:
- the JS engine parses your code into an AST,
- creates bytecode or baseline compiled code,
- profiles execution and uses JIT tiers to optimize hot paths,
- manages memory with garbage collection,
- integrates with an event loop that handles timers, I/O, and callbacks.
11. What Happens When the Program Ends?
Eventually, your program returns from main() (or equivalent) or calls exit(). When that happens:
- an exit code is returned to the operating system (0 for success, non-zero for errors),
- open file descriptors and sockets are closed,
- the OS releases the process’s virtual memory and other resources,
- parent processes (like the shell) may read the exit status and act accordingly.
In garbage-collected environments, some cleanup can happen at shutdown, but ultimately it’s the OS that reclaims the entire process space.
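A small C sketch of that shutdown sequence: atexit() registers a handler that runs when the program exits normally, and the return value of main() becomes the exit code a shell can read via $?:

```c
/* Orderly shutdown: atexit() handlers run when the program returns
   from main() or calls exit(), and the exit code goes to the OS. */
#include <stdio.h>
#include <stdlib.h>

static void cleanup(void) {
    puts("atexit handler: flushing and cleaning up");
}

int main(void) {
    atexit(cleanup);          /* registered handlers run at exit */
    puts("doing work...");
    return 0;                 /* exit code 0 = success */
}
```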
12. Common Issues During Program Execution
When you understand the lifecycle of a program, many common errors make more sense:
- Segmentation faults: illegal memory access in your process’s address space.
- Stack overflows: stack exhausted due to deep recursion or huge local variables.
- Memory leaks: memory that is allocated but never freed, often because the last reference to it is lost.
- Race conditions: multiple threads accessing shared data without proper synchronization (see the sketch after this list).
- Deadlocks: threads waiting on each other’s locks in a cycle.
- High CPU usage: tight loops, busy waiting, or inefficient algorithms.
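Race conditions are easy to demonstrate. The sketch below (compile with -pthread) is deliberately broken: two threads increment a shared counter without a mutex, so read-modify-write updates are lost and the final count usually falls short of 2,000,000:

```c
/* Deliberately broken: counter++ is a non-atomic read-modify-write,
   so concurrent increments are lost. Guarding it with a pthread_mutex_t
   would fix the race. */
#include <pthread.h>
#include <stdio.h>

static long counter = 0;      /* shared, unsynchronized state */

static void *increment(void *arg) {
    (void)arg;
    for (int i = 0; i < 1000000; i++)
        counter++;            /* the race: not atomic */
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("expected 2000000, got %ld\n", counter);
    return 0;
}
```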
13. From File to Execution: A Quick Summary
Let’s recap the journey:
- You trigger execution (double-click, command, system call).
- The OS loader parses the executable, checks permissions, and creates a process.
- The OS sets up virtual memory, loads code and data segments, and prepares the stack.
- The runtime environment initializes and calls your program’s entry point.
- The CPU runs instructions in a fetch–decode–execute cycle, using system calls to interact with the kernel.
- The scheduler shares CPU time among processes and threads.
- When the program finishes, resources are cleaned up and control returns to the OS.
All of this happens in milliseconds, but understanding it gives you a powerful mental model for debugging, optimizing performance, and writing more robust software.