Alexander Jung

CS PhD Student @LancasterUni

Posted on 10 mins read Tag: OS

The approach to designing an OS is delivered by composition of its directory hierarchy. The kernel, for instance, with its subset of features providing the core mechanisms for the OS, can be further subdivided and categorised. Abstractions of the features are provided as libraries higher up the hierarchy, made available to programs which do not need to re-implement or otherwise be aware of any underlying mechanisms. For instance, using a function which dynamically allocates a variable sized segment of memory should not include architecture-specific methods, even if architecture-specific methods are used in the process of the allocation.

In the following sections, the structure of the kernel is illustrated; the tools and programming languages used to develop the OS are discussed; and, the methods by which one can determine the provisioning of a higher-level are illustrated.

Figures representing data structures are illustrated as byte fields which show how the bits are laid out in memory. Whilst the structure may not display the address in memory, it will show both the starting bit and maximum field lengths along the top, and the hexadecimal offset along the edge. All diagrams are little-endian.

Kernel Structure

The Kernel structure illustrated in the figure below is loosely based on the Linux directory source tree.1 The main differences here are the movement of directories containing mechanisms such as the bootloader and memory management inside the kernel directory. The reason for this is that all the contained methods should be inaccessible to the OS, all necessary functions are provided in the OS-wide directory, include/.

Interrupt Descriptor Table (IDT) entry descriptor. Interrupt Descriptor Table (IDT) entry descriptor.

The following is a general description of all abstractions made by the directories:

build/ The working build directory (compiled sources)
disk/ The contents of the initial RAM disk
include/ OS-wide libraries and Standard C Libraries
kernel/ Core subsystems
kernel/arch/ Architecture-specific source
kernel/boot/ Kernel boot and initialisation
kernel/devices/ Device drivers
kernel/include/ Kernel-only libraries
kernel/lib/ Helper routines
kernel/mem/ Memory management subsystem and the VM
lib/ Third-party helper routines and sources
util/ Tools helpful for developing the OS

Bootloader

All the files pertaining to the boot process is contained within the kernel/boot/ directory. This provides an ad hoc containment of the initial files used to provide the first sector (see bootloading) of the final OS image. These files are later linked together during compilation and so the configuration files for this process are also included here.

Architecture Support

There are number of architecture-specific functions and data structures which should be provided together when defining the platform for which the OS should be compiled. Whilst the implementation will not discuss architectures other than x86, the ability to later provide and maintain support, however, is present by aggregating these functions together inside the arch/ folder and then inside a further subdirectory defined by architecture name. The following designs define structures and processes used in an x86 platform, located in arch/x86/.

Global Descriptor Table

The GDT is a native method offered by the x86 architecture for providing address management and protection.2 Addresses can be grouped into segments, with every segment containing a physical starting address, or base; the size of the segment itself, known as the limit; the access rights to the segment (whether its available to the kernel or to the user); and several configuration bits. The access and config bits have been simplified in the data structure illustrated by figure below and will be explained in further detail in section the implementation.

Global Descriptor Table (GDT) segment descriptor. Global Descriptor Table (GDT) segment descriptor.

Initialising the GDT requires a reference point to all the segments which will be used by the OS. This is set in the GDT register which contains the base address which points to the starting location of all the segments, contained in array, and the total number of segments, indicated as the limit in figure below.

Contents of the Global Descriptor Table (GDT) and Interrupt Descriptor Table (IDT) register. Contents of the Global Descriptor Table (GDT) and Interrupt Descriptor Table (IDT) register.

The GDT will provide a method of protecting tasks executed by the OS, preventing them from accessing system critical memory regions dealing directly with the OS itself.

Interrupt Descriptor Table

Akin to the GDT, the IDT is a list of addresses for all interrupt routines the CPU can recognise. This ranges from peripheral interrupts such as those offered by the keyboard and on-board clock; user interrupts, known as tasks; and system critical interrupts such as page faults. The figure below illustrates the structure of an IDT entry, where each entry contains a starting address, or base; the entry flags such as the privilege level; the selector which points to a segment in the GDT; and one byte of zeros.

Interrupt Descriptor Table (IDT) entry descriptor. Interrupt Descriptor Table (IDT) entry descriptor.

The IDT is initialised in the same manner as as the GDT register, with a base address pointing to the list of interrupts in conjunction with the limit, or total, number of routines made available.

Interrupt Service Routines (ISR)

Continuing architecture initialisations will require the provisioning and handling of both system critical interrupts and those on behalf of the user. All IDT entries contain a pointer (the base address) to a routine which is called once the interrupt is detected by the CPU. Providing an interface for interrupts will help simplify their later handling in addition to reducing the complexity of each routine itself. One method of doing this is to initialise a unique routine for each IDT entry whose only function is to pass unique identifier to a common routine. Thus, interrupts thrown as a result of passing instructions to the CPU which attempt a division by zero, create a memory stack overflow, or a page fault (deemed serious enough to affect the running of the OS) can be handled separately to those issued by the keyboard or by the completion of a user task all while using a common interface.

A representation of the information passed to an Interrupt Service Routine (ISR). A representation of the information passed to an Interrupt Service Routine (ISR).

In addition to the interrupt identifier, additional information such as the contents of the CPU’s working registers can be used as the sole parameter to all interrupt handlers, enabling the routine to determine the error itself, perform an analysis of the previous instruction, all in attempt to reduce disruption the user. This parameter structure is illustrated in figure above.

Memory Management

One of the features of an OS is to support dynamic memory allocation. This processes delivers a reference to a logical memory location which is translated to a physical location. Whilst providing memory management could be implemented by hand, the x86 architecture offers some additional help, with greater efficiency, in facilitating this process. This is taken in the form of paging. The figure below illustrates steps taken by the CPU which will look up the contents of a memory location given a logical address.

The CPU's general process of acquiring a physical address from a logical address. The CPU's general process of acquiring a physical address from a logical address.
Linear-address translation to a 4KB page using 32-Bit paging. Linear-address translation to a 4KB page using 32-Bit paging.

Segmentation provides a large allocation of linear addresses whilst paging is a method of using this space efficiently by dividing it up and monitoring whether the page of memory is in use. The figure above illustrates the lookup mechanism by which a linear address translates to a physical address in memory. It uses three underlying structures to provide this mechanism: a page directory, a page table, and a page.

The structure of a page entry. The structure of a page entry.

Each page contains a number of flags which help determine the memory state and provide protection. The structure for the page is illustrated in figure above, and the description of all the bits are described as follows:

  • p is “present” and is equal to 1;
  • r is a boolean state for permissioning to write to this frame;
  • u is the access level where 0 is user-mode and 1 is supervisor-mode;
  • a is ignored;
  • d is used to determine whether the size of the frame;
  • unused is ignored; and,
  • frame is the physical address of 4KB aligned page table referenced by this entry.

Weaponry Choice of Tools

The C Programming Language (simply, C) and the GNU Compiler Collection (GCC) will be used as primary means of developing this minimal OS. For the purpose of delivering accurate numeric and binary representations for managing memory, C offers this precise control and data typing system. This is necessary when dealing with lower level functionality provided by the CPU. In addition, it serves as a primary language which many OSs today are built in,3 and thus its utilisation in this scenario offers parallels with other systems.

Consequently, GNU Assembly (GAS) will be the assembler language providing low-level instructions to the CPU as it is the default assembler provided by GCC. Operation Codes (OP-codes) are provided by Elsner et al.4

GNU Make (simply, Make)5 will be used as the build mechanism for generating a bootable images of the OS for redistribution and testing purposes. This tool offers flexibility and control when compiling the source files of the project. Instead of typing out commands by hand, its automation techniques are used to speed up and simplify the build process

Finally, Linux is the choice of development environment as it provides the mentioned tools as part of its distribution. In addition, it offers the ability to install additional tools with ease via package management.

JavaScript Engine

Determining the minimal functional requirements for a high-level language to be interpreted on the OS depends on the choice of JavaScript engine and choosing an engine depends on the degree to which the JavaScript engines of can be described as standalone.

When software is described as ‘standalone’, it refers to the software’s ability to run with minimal or no requirements for third-party or system libraries. This usually entails being able to run the software without having to install anything else, require any additional source code packages downloaded during installation, or the degree to which it depends on system-provided libraries.

How reliant the software is on requiring additional libraries sources can be tested simply determining a). the overall size of the engine itself once fully downloaded, and b). the required libraries and methods it uses for its inner working. The latter can be determined by compiling the source of the engine with the same GCC flags the rest of the minimal OS is compiled with, however this is providing the source is written using C.

By compiling the Duktape and NodeJS source (both of which are written using C) with GCC and using flags such as -nostd, it returns a list of compilation errors indicating the missing libraries and functions it needs in order to function itself. Comparing these requirements between the two engines is shown in more detail in the table below. As a result of these findings, Duktape was selected as the engine to implement.

JavaScript Engine Duktape NodeJS
Required Libraries 10 13
Required Methods 47 61
Required Constants 52 56
Source Code 4.3 MB 269.33 MB

Explicit Requisites

System design constants are are fairly straight forward and simply describe the maximum and minimum bit length of different C types, including int, char, and long. The following list of constants provided as part of the Standard C Library which are directly required by Duktape in order to function:

size_t INT_MIN INT_MAX UINT_MAX
UINTMAX_MAX INTMAX_MIN INTMAX_MAX SIZE_MAX
CHAR_BIT UCHAR_MAX USHRT_MAX UINT_MAX
ULONG_MAX UINTPTR_MAX INTPTR_MIN INTPTR_MAX
UINT8_MAX INT8_MIN INT8_MAX UINT16_MAX
INT16_MIN INT16_MAX UINT32_MAX INT32_MIN
INT32_MAX UINT64_MAX INT64_MIN INT64_MAX
UINT_LEAST8_MAX INT_LEAST8_MIN INT_LEAST8_MAX UINT_LEAST16_MAX
INT_LEAST16_MIN INT_LEAST16_MAX UINT_LEAST32_MAX INT_LEAST32_MIN
INT_LEAST32_MAX UINT_LEAST64_MAX INT_LEAST64_MIN INT_LEAST64_MAX
UINT_FAST8_MAX INT_FAST8_MIN INT_FAST8_MAX UINT_FAST16_MAX
INT_FAST16_MIN INT_FAST16_MAX UINT_FAST32_MAX INT_FAST32_MIN
INT_FAST32_MAX UINT_FAST64_MAX INT_FAST64_MIN INT_FAST64_MAX

The following is a list of methods provided as part of the Standard C Library which are also directly required by Duktape in order to function:

cbrt log2 log10 trunc fabs floor
ceil fmod pow acos asin atan
atan2 sin cos tan exp log
sqrt memmove memcpy memmove memcmp memset
strlen strcmp strncmp sprintf snprintf vsprintf
vsnprintf sscanf vsscanf malloc realloc calloc
free abort jmp_buf setjmp longjmp

  1. Love, R. Linux Kernel Development: A Thorough Guide to the Design and Implementation of the Linux Kernel (Developer’s Library). 3rd ed. Crawfordsville, Indiana.: Pearson Education, Inc., 2010. [return]
  2. Corporation, Intel. Intel 64 and IA-32 Architectures, Software Developer’s Manual. Vol. 3A: System Programming Guide, Part 1, 2016. [return]
  3. Plauger, P.J. The Standard C Library. 1st ed. Prentice Hall, 1992. [return]
  4. Elsner, & friends, D., Fenlason, F. Using AS, 2009. [return]
  5. Stallman, P. D, R.M, McGrath, R. & Smith. GNU Make: A Program for Directing Recompilation, 2016. [return]