Some fundamental OS/2 concepts
Part 2



Disk caches

A disk file is made up of blocks, each of which has a fixed size (typically 512 bytes, on a PC). Application software is usually working with records whose size is smaller than the block size, i.e. most commonly each block holds several records or part-records. Thus, an operation on data in a disk file is very likely to be followed by another operation on the same disk block.

The same holds for the "housekeeping information" on the disk. If, for example, the software has recently read from a particular directory, then it's quite likely that there will be a later request for the same directory.

A disk cache is a region of main memory that holds copies of some of the disk blocks. When a block is requested, the file system software first checks to see whether it's already in the cache, and if so a slow disk operation can be avoided. If the block is not in the cache, it is placed in the cache as part of the read operation. As the cache fills up, older blocks have to be discarded. Usually the software uses some form of "least recently used" algorithm, which discards those blocks that haven't been used for a long time.
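
To make the idea concrete, here is a toy sketch in C of the lookup logic, using a cache of just eight block-sized slots and the simplest possible "least recently used" bookkeeping. Real file systems use hash tables and linked lists rather than a linear search, and the names here are invented for the illustration; but the principle is the same.

    #include <stdio.h>
    #include <string.h>

    #define NSLOTS    8                  /* how many blocks the cache can hold */
    #define BLOCKSIZE 512                /* bytes per disk block */

    static struct {
        long blockno;                    /* which disk block is held here; -1 = empty */
        unsigned long lastused;          /* "clock" value of the most recent access */
        char data[BLOCKSIZE];
    } cache[NSLOTS];

    static unsigned long now = 0;

    static void disk_read(long blockno, char *buf)    /* the slow operation */
    {
        memset(buf, 0, BLOCKSIZE);
        printf("  slow disk read of block %ld\n", blockno);
    }

    /* Return the cached copy of a block, going to the disk only on a miss. */
    static char *cache_read(long blockno)
    {
        int i, victim = 0;
        now++;
        for (i = 0; i < NSLOTS; i++) {
            if (cache[i].blockno == blockno) {         /* hit: no disk access needed */
                cache[i].lastused = now;
                return cache[i].data;
            }
            if (cache[i].lastused < cache[victim].lastused)
                victim = i;                            /* remember the least recently used slot */
        }
        cache[victim].blockno = blockno;               /* miss: evict the LRU slot and reuse it */
        cache[victim].lastused = now;
        disk_read(blockno, cache[victim].data);
        return cache[victim].data;
    }

    int main(void)
    {
        int i;
        for (i = 0; i < NSLOTS; i++) cache[i].blockno = -1;
        cache_read(7);                                 /* miss */
        cache_read(7);                                 /* hit */
        cache_read(12);                                /* miss */
        return 0;
    }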

Smart cacheing algorithms often also include some form of "read ahead". If you read block N of a file, then it's likely that you'll soon want block N+1, so the file system uses any spare processor time to read that block as well. Sometimes you won't want block N+1, in which case the read-ahead has been wasted; but on average, this approach saves enough time to compensate for the occasional wasted operation.

As long as you're only reading from the disk, and not writing, the cache contents are identical with what is on the disk. Once you start writing, this might no longer be true. There are two popular ways to manage write operations with a cache:

  1. Write-through: every write operation goes straight to the disk, as well as updating the copy in the cache. The cache speeds up reads, but the disk is always up to date.
  2. Lazy write (also called write-back): a write operation updates only the copy in the cache, and the modified block is written out to the disk some time later - typically when the disk is otherwise idle, or when the block is about to be discarded from the cache.

Lazy Write is a good way to gain speed, but it does have one risk: if there's a power supply failure, or a system crash, some of the modifications will never be written to the disk. That's a relatively minor loss if it's just a matter of updating a user file. It can be a much more serious loss if the lost information includes things like INI files, directory updates, and file allocation tables. In the worst case, you can get a system that won't re-boot because of disk corruption.

[When I used Windows, I found it simply wasn't safe to enable lazy writing, because the disk corruption got a little worse each time an application crashed the system. Now that I use OS/2 I usually have Lazy Write enabled, because faulty applications usually don't stop the entire system.]

Surprisingly, quite a few people seem to shut down OS/2 by turning off the power, rather than going through a proper shutdown operation. Such people should never have Lazy Write enabled. (Actually, such people shouldn't be allowed in the same room as a computer - but that's another story.) Lazy Write should also be disabled on systems where security is more important than speed.

If you ever do get into the situation where your system is locked up so hard that you can't shut it down, you should use the Ctrl/Alt/Del method of stopping the system. (If nothing happens, wait a few seconds and try Ctrl/Alt/Del again. The second attempt usually succeeds.) Although this doesn't do a proper shutdown, it does at least attempt to ensure that the disk caches are flushed.

So much for the theory. Now let's look at the cache parameters you can control.

DISKCACHE

DISKCACHE controls the cache size for FAT partitions. If you have any FAT partitions, your CONFIG.SYS should have a line that says something like
  DISKCACHE=3328,LW,128,AC:CD
The parameters have the following meanings: the first number is the size of the cache in kilobytes; LW enables lazy writing; the third number is a threshold, in 512-byte sectors, that plays the same role as the HPFS CRECL parameter described below (operations larger than this bypass the cache); and the AC: parameter lists the FAT partitions that should be checked automatically by CHKDSK after an improper shutdown. I've been told, although I don't know for sure, that DISKCACHE has no effect on floppy disk operations.

Remark: many OS/2 users keep one or more FAT partitions for DOS compatibility, but have their most important files on HPFS partitions. In practice this means that they just don't use FAT files very often. In such cases, it probably makes sense to keep the DISKCACHE small, or eliminate it altogether. This would slow down FAT operations, but would save enough main memory to make the rest of the system faster.

Parameters on the IFS=HPFS line

This is for HPFS volumes. If you're using HPFS, your CONFIG.SYS will include a line like the following.
   IFS=F:\OS2\HPFS.IFS /CACHE:2048 /CRECL:64 /AUTOCHECK:EF
The CACHE parameter gives the size of the cache, in 1K units. The CRECL parameter specifies the maximum record size for cacheing, again in 1K units. (If you read or write something that's larger than CRECL in a single operation, the cache is bypassed.) AUTOCHECK says which partitions will be checked by CHKDSK after an improper shutdown - you should list all your HPFS partitions here, because it's dangerous to try running OS/2 with a corrupted file system.

You might also see a parameter like /F:2 on this line. This specifies what "level" of disk checking is to be done by CHKDSK. The default value is 2, and that's the best value for almost all situations.

CACHE.EXE

The program CACHE.EXE allows a bit more fine tuning of the parameters used by the HPFS cache. You can optionally run it from your CONFIG.SYS or from your STARTUP.CMD; or, for that matter, you can run it from a command line after the system has started. If you never run it, the cache still works (because of what was specified in the IFS=HPFS line), and the timing parameters are left at some default values.
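
As an illustration, one typical way to run it from CONFIG.SYS is a line like the one below. I'm quoting the switch names from memory - /LAZY turns lazy writing on or off, and /MAXAGE, /DISKIDLE and /BUFFERIDLE are the timing parameters in milliseconds - so check the on-line documentation before copying this.
   RUN=F:\OS2\CACHE.EXE /LAZY:ON /MAXAGE:7500 /DISKIDLE:1000 /BUFFERIDLE:500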

The command "CACHE", without any parameters, returns a list of the parameters currently in force. Here's a typical output:

            DiskIdle:     1000 milliseconds
              MaxAge:    30000 milliseconds
          BufferIdle:    10000 milliseconds
          Cache size:     2048 kbytes
  3 Lazy write worker(s) are enabled.
  1 Read ahead worker(s) are enabled.

The information about cache size, and about the number of "workers", is fairly obvious, but the three timing parameters need more explanation. Roughly speaking, DiskIdle is how long the disk must have been idle before the lazy writer will write modified cache data to it; MaxAge is the longest that a modified block may sit in the cache before it has to be written out; and BufferIdle is how long a cache buffer must have been left untouched before its contents become eligible for writing. The on-line documentation says that DiskIdle should be greater than BufferIdle. This appears to be a documentation error.

Parameters on the IFS=CDFS line

This is for CD-ROM volumes. If you're interested, check the on-line documentation for CDFS. For most people the efficiency of CD-ROM operations doesn't have a major effect on overall system performance, so you don't need a large CD-ROM cache. You might as well stick with the parameter values that were chosen for you during installation.

The BUFFERS parameter in CONFIG.SYS

This specifies how many 512-byte buffers to reserve in main memory for I/O operations that are still in progress. These are separate from the disk cache(s), and are present even if you don't have cacheing enabled. You can think of the data as going through a pipeline
   Disk <--> cache <--> buffers <--> application software
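
The CONFIG.SYS entry itself is just a single line giving the number of buffers, for example
   BUFFERS=90
(the value 90 here is only an illustration, not a recommendation).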

Processor caches

A cache makes sense wherever there's a speed discrepancy. The reason for using a disk cache is that main memory is so much faster than the disk hardware. With modern processors, we also find that the main memory is itself a bottleneck, in that it can't deal with data as fast as the processor can produce or consume the data. In these circumstances, it makes sense to put a cache in between the processor and the main memory. Physically, this cache is just a block of very fast memory, together with the hardware that makes it act as a cache. It costs more per byte than ordinary main memory, but then you don't need as much of it.

The original 8086 was not significantly faster than conventional memory, so it did not include a cache. (It had an internal instruction pipeline that could be thought of as a cache, but it was only a few bytes long.) Later members of the 80x86 family are faster, and include an internal cache - that is, a cache memory that's implemented as part of the processor chip. The newer the processor, the bigger the on-chip cache.

With the very fastest processors, you need a bigger cache than will fit on the chip. (OK, strictly speaking you don't need it; but it does make a big performance difference.) Some motherboard manufacturers now include a "level 2 cache", which effectively increases the size of the on-chip cache. The main difference between an on-chip cache and a level 2 cache is that the latter is physically placed on the motherboard, rather than on the processor chip.

[Remark: the hardware manufacturers often neglect to test their hardware with true multitasking software, which means that the hardware can't always go as fast as the manufacturer thinks it can. As a result, OS/2 sometimes can't be installed on hardware with a level 2 cache. The solution is to disable the level 2 cache (via the BIOS setup options), install OS/2, and then re-enable the level 2 cache.]

Unlike disk caches, the operation of a processor cache is not controlled by software; it all has to be done in hardware. Of course the necessary hardware is always included, so this is a non-issue for most users.

Some people discover that their system slows down if they add more main memory. (Normally, adding memory should make your system faster.) When this happens, it means that the new memory is not being cached. To fix the problem, it's necessary to configure the hardware in such a way that cacheing is enabled for all of main memory. Typically this means that the size of the level 2 cache must be increased.

Segmentation and paging

Segmentation is a hardware memory management scheme that protects programs from one another, and (to a certain extent) from themselves. The implementation is such that no program can read from or write to the memory segments belonging to another program, except in the case of segments that have been specifically set up to be shared. In addition, the hardware prohibits operations like writing to a read-only segment, modifying a code segment, executing code in a data segment, and so on. In the 80x86 implementation, there are also privilege controls that place restrictions on who can call whom, and on which code segments are allowed to execute "dangerous" operations, e.g. direct manipulation of the I/O ports.

Paging is a different hardware memory management scheme, whose primary purpose is to support disk swapping (see later). Paging hardware typically also contains some protection mechanisms - e.g. the designation of some pages as read-only - but the protection is not as complete as with segmentation.

Segmentation hardware in the 80x86 family first appeared in the 80286. (It's sometimes said that the 8086 had segmentation, but that's an abuse of terminology. It had an addressing mechanism that looked a little like segmented addressing, but this wasn't true segmentation, because the protection hardware was missing.) Starting with the 80386, the processors in this family have both segmentation and paging. OS/2 uses both of these hardware features, which is why the current versions of OS/2 require an 80386 or better.

The ideas behind segmentation are actually much older than the 8086; but for many years there were very few computers that put the ideas into practice, because of the high cost of the hardware. (Paging, being rather cruder and simpler, got more support from the hardware designers.) One of the most important innovations in the 80286 was that the designers managed to fit the segmentation hardware onto the processor chip itself, which made mass production possible at an affordable cost.

The 80386 address translation hardware is a little unusual in that it uses a two-stage translation. Within a program - that is, in the executable machine code - addresses are expressed as a pair (segment number, offset within segment). (In the majority of machine language instructions the segment number is not explicitly included, because the hardware provides for some "current segment" defaults. Nevertheless, the programmers have to remain aware of which segment they're talking about.) The segmentation hardware translates each such address into what's called a "linear address". Then the paging hardware takes the linear address, splits it up into the pair (page number, offset within page), looks up its own tables that give the physical address of each page, and finally produces a physical address. It is this physical address that is sent to the main memory as part of an instruction fetch, a memory read, etc.
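
As a rough sketch of what the two stages amount to - the tables here are toy arrays invented for the illustration, whereas the real hardware keeps its tables in a special descriptor format and caches the entries internally - the translation works out like this, assuming the 4096-byte page size of the 80386:

    #include <stdio.h>
    #include <stdint.h>

    #define PAGESIZE 4096u

    /* Toy tables: in reality the segment table is set up by the memory
       allocation software and the page table by the swapping software. */
    static uint32_t segment_base[4] = { 0, 0x10000u, 0x38000u, 0x90000u };
    static uint32_t page_frame[256];            /* physical address of each page */

    static uint32_t translate(unsigned segment, uint32_t offset)
    {
        /* Stage 1: segmentation (the limit and privilege checks are omitted). */
        uint32_t linear = segment_base[segment] + offset;

        /* Stage 2: paging - split the linear address into a page number and an
           offset within the page, then look up the physical page frame. */
        uint32_t pageno = linear / PAGESIZE;
        uint32_t inpage = linear % PAGESIZE;
        return page_frame[pageno] + inpage;
    }

    int main(void)
    {
        uint32_t i;
        for (i = 0; i < 256; i++) page_frame[i] = i * PAGESIZE;   /* identity map */
        printf("physical address = 0x%lx\n", (unsigned long)translate(2, 0x1234u));
        return 0;
    }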

At first sight it might appear that the segmentation hardware and the paging hardware are doing the same thing. If this were true, it would of course be a waste to have both sets of hardware, since it wouldn't be doing any more than could be achieved with a one-stage translation. There are, however, several important differences:

  1. Segment sizes are variable, page sizes are fixed by the hardware. Since segmentation is all about protection, the segment boundaries are at the "natural protection boundaries" of the software, for example the boundaries between two modules. The segment size is set to whatever the programmer needs it to be. Page boundaries, on the other hand, are unrelated to the structure of the software. An essential attribute of paging hardware is that all pages have the same size.
  2. The segmentation hardware is not merely performing an address translation; it is also doing legality and privilege checks. The paging hardware does some elementary checks on access rights, but basically it's just doing an address translation via table lookup.
The point here is that the paging hardware and the segmentation hardware are serving two different purposes. In a typical operating system, the segment tables are maintained by that part of the software that looks after memory allocation, and the page tables are the concern of the software that does disk swapping. The two hardware mechanisms are independent of each other, and this separation can be reflected in the software design: the disk swapping software doesn't need to know anything about segmentation, and the memory allocation software doesn't need to know anything about paging. This "separation of concerns" is very useful in terms of being able to write bug-free software.

Both the segmentation hardware and the paging hardware have to look up tables in main memory in order to do their job. This sounds like a major overhead, and indeed it would be if every address translation triggered several extra memory references. What makes the whole system work is that both sets of hardware have their own private caches (in high-speed memory) for the translation tables.

The swap file

The advantage of a multitasking operating system is that you can run several programs at the same time. One of the disadvantages is that this encourages you to overload your system. In particular, you're very likely to need more main memory than is physically present on your machine.

Swapping is a technique that lets you use a disk file as an extension of main memory. It works as follows. A program's memory consists of a number of what are called virtual pages. The paging hardware maps virtual pages into physical pages. Each entry in the page table (the address translation table for paging) contains a physical page number, but it also contains several flags, and one of these flags is used to signal a "physical page not present" condition.
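
For what it's worth, the flags can be pictured as bit masks. The values below follow the 80386 page table entry layout (bit 0 is the "present" flag, bit 6 the "dirty" flag, and bits 12-31 hold the physical page address); only the bits relevant to swapping are shown, and the function names are my own.

    #include <stdint.h>

    #define PTE_PRESENT 0x00000001u     /* bit 0: the physical page is in main memory */
    #define PTE_DIRTY   0x00000040u     /* bit 6: the page has been written to */
    #define FRAME_MASK  0xFFFFF000u     /* bits 12-31: physical address of the page */

    int page_is_present(uint32_t pte)
    {
        return (pte & PTE_PRESENT) != 0;
    }

    uint32_t frame_address(uint32_t pte)     /* meaningful only if the page is present */
    {
        return pte & FRAME_MASK;
    }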

As long as your software is using memory pages that are physically present in main memory, nothing unusual happens. If, however, the paging hardware detects a "page not present" condition, it issues an interrupt called a "page fault" interrupt. The interrupt routine then has to deal with this condition.

There are at least two possible causes of a page fault. The obvious cause is a programming error where the software is trying to address something outside its legal range. Of course there's nothing you can do about that but abort the program. The less obvious cause, but in fact the most common one, is that the address is legal but the swapping software has not yet loaded that page into main memory. When that happens, your program is temporarily suspended while that page is fetched from disk; and then the program can proceed again.
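
In outline, then, the interrupt routine has a simple decision to make. The sketch below is not OS/2's actual code - every helper in it is a stand-in invented for the illustration - but it shows the shape of the logic.

    #include <stdio.h>
    #include <stdint.h>

    /* Stand-in test, so that the sketch is self-contained. */
    static int address_is_legal(uint32_t addr) { return addr < 0x00400000u; }

    static void handle_page_fault(uint32_t addr)
    {
        if (!address_is_legal(addr)) {
            /* A programming error: nothing to be done but abort the program. */
            printf("0x%lx is illegal - abort the program\n", (unsigned long)addr);
            return;
        }
        /* A legal address whose page is still out on disk: suspend the program,
           fetch the page from the swap file, mark it present in the page table,
           then let the program retry the instruction that faulted. */
        printf("0x%lx is legal - fetch its page from the swap file and resume\n",
               (unsigned long)addr);
    }

    int main(void)
    {
        handle_page_fault(0x00123456u);      /* legal, but swapped out */
        handle_page_fault(0x7FFFFFFFu);      /* outside the legal range */
        return 0;
    }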

The overall effect is that the effective size of main memory is increased by the size of a special disk file called the swap file. The system software that looks after paging and swapping manages the movement of pages between main memory and the swap file as needed.

Some systems require the swap file to have a fixed size. In OS/2, a line in CONFIG.SYS specifies the initial size of the swap file, but the swap file can subsequently grow and shrink as needed. This flexibility comes at a cost: while the swap file is growing, the extra overhead causes your system to slow down substantially. To avoid that overhead, it's best to make the initial swap file size so large that it will rarely need to grow.
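
The line in question is SWAPPATH, which gives the directory where the swap file lives, followed by two numbers. As far as I recall, the first number is the amount of free disk space, in kilobytes, below which you start getting warnings, and the second is the initial size of the swap file, also in kilobytes - but check the on-line documentation before copying the example below.
   SWAPPATH=C:\OS2\SYSTEM 2048 20480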

Quite a lot of what's in main memory is executable code that won't be altered during execution. When this code has to be bumped out of main memory to make room for something else, the memory image doesn't need to be saved in the swap file; it can be re-read, the next time it's swapped in, from the original source file. This helps to keep the swap file small. Executable code pages are normally marked "discardable", to tell the swapper that they need not be saved in the swap file.

There is, however, a slight time penalty in making code pages discardable. Code saved in the swap file is saved in the form of a memory image, i.e. it's an exact copy of what was in main memory. Code in an EXE or DLL file has a slightly more complicated format, and requires some processing by the system loader as it's being loaded into memory. To reduce the time overhead, some frequently used code is marked "swappable" rather than "discardable", to force it to be written to the swap file when it's swapped out. (If you've used several versions of OS/2, you might have noticed that the swap file gets bigger than it used to be in earlier versions.) Although this increases the disk overhead, it makes the overall system a little faster.

Thrashing

There is, of course, an overhead involved in swapping. Code and data blocks have to be moved backwards and forwards between main memory and the swap file, and this takes time.

As long as the demand on main memory is small, the overhead isn't particularly noticeable. While something is being swapped into or out of main memory, the processor can occupy itself executing something else, some other piece of code that's not related to what's being swapped. Some processor time gets lost in looking after the disk operations, of course, but not a huge amount.

You can, however, reach a situation where the system is spending nearly all its time on swapping, and very little time doing useful work. This condition is known as thrashing. While the system is thrashing, you get the impression that the entire system has slowed to a crawl. You'll also notice that the disk drive is very busy.

When that happens, the obvious remedy is to close down one or more applications, in order to make more main memory available. (The problem is not so much a processor overload; the real problem is a shortage of main memory.) But typically the "close" operation itself requires some main memory, so the thrashing problem gets worse before it gets better. You have no choice but to wait around until the system recovers.

If thrashing happens only rarely, you just learn to live with it. If it's severe and/or frequent, more drastic remedies are needed. To solve a problem of chronic memory shortage, you need to do one or more of the following.

  1. Install more main memory.
  2. Run fewer things at the same time, and close down applications that you aren't actually using.
  3. Trim the memory given to things you can configure, such as overly large disk caches.

Real mode and protected mode

The first processors in the 80x86 family - the 8086, 8088, 80186, and 80188 - had no proper segmentation hardware, and no way to protect programs from one another. They did use addresses of the form (segment number, offset within segment), but the "segment number" part was simply a base address, not a reference to the (nonexistent) segment tables. In effect, the software worked directly with physical addresses.

Segmentation and protection were introduced with the 80286, but the designers faced a compatibility problem: most of the existing software was written for the 8086, therefore the 80286 had to be capable of executing 8086 software. The solution they adopted was to define two operating modes for the processor. In "real mode", the processor acted just like an 8086, and the segmentation hardware was disabled. In "protected mode", the new protection features were enabled.

As it turned out, not much software was written for the 80286. The dominance of the DOS/Windows market meant that most people used the 80286 as an 8086 emulator. The advanced features were largely wasted.

The 80386 introduced a new twist. It still had real and protected modes, but in protected mode it was possible to define a special segment type that acted as an 8086 emulator. This allowed you to run a protected-mode operating system, and still have a mechanism to run all those legacy applications without having to re-boot back to real mode. It's this "virtual 8086" mode that OS/2 uses to run DOS/Windows applications.

Given this feature, there's no longer much need for real mode. The processor still boots up in real mode, but the OS/2 initialisation routines switch the processor into protected mode almost immediately.

Now and then you get a DOS application - usually a game - that won't run even under the OS/2 DOS emulation. In that case your only option is to switch back into real mode and run "pure" DOS. OS/2 provides a "hibernate" feature that does this for you - in effect, it re-boots the machine so that OS/2 is no longer in charge.

What's a 32-bit program?

There's been a lot of rubbish written about 32-bit software. Most people don't know exactly what it is, but they have the impression - aided by plenty of advertising - that 32-bit software is somehow superior to 16-bit software.

It's all based on a misconception. I'll explain why later in this section. In fact, 32-bit software is usually slower and more memory-hungry than equivalent 16-bit versions.

In the original 8086, most internal registers were 16 bits wide. The processor used 32-bit addresses, but these were broken down into a 16-bit segment base and a 16-bit offset. Since the segment base was implicit in most instructions, it was common to refer to these 32-bit addresses as 16-bit addresses. (To complicate matters, main memory addresses were only 20 bits wide.)
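
For the record, the 8086's address calculation was trivial. In C notation it amounts to this:

    /* Physical address formation on the 8086: shift the segment value left by
       four bits, add the offset, and keep only the low 20 bits of the result. */
    unsigned long real_mode_address(unsigned int segment, unsigned int offset)
    {
        return (((unsigned long)segment << 4) + offset) & 0xFFFFFUL;
    }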

Later models of the processor, starting with the 80386, expanded many of the internal registers to a 32-bit width. This turned the addresses into 48-bit addresses: 16 bits for the segment number, and 32 bits for the offset within the segment. (Again, most people call these addresses 32-bit addresses rather than 48-bit addresses.) In addition, these later processor models introduced some new instructions and new addressing modes.

This brings us right back to the question of upwards compatibility. You can't just change the register sizes and still expect the old 8086 software to execute correctly. To solve this, each code segment has two special flags, which are stored in the segment descriptor. (The segment descriptor is the table entry that the segmentation hardware uses to do its address translation.) One of these flags says whether the code uses 32-bit data registers or just the lower 16 bits. The other flag specifies whether the code is using addresses with 16-bit offsets or addresses with 32-bit offsets. This is done on a per-segment basis, thereby allowing a mixture of software using the new and the old conventions. You can even call a 16-bit procedure from a 32-bit code segment, or vice versa.

(In fact, there are even special "escape" codes that allow isolated 32-bit instructions in a 16-bit segment, or vice versa.)

Do you really need 32-bit data registers? My own experience (and I've written a lot of software over the years) is that 16 bits is adequate nearly all of the time. There are a few special situations where 32-bit variables must be used, but those situations don't arise all that often.

On the other hand, the use of 16-bit data does mean that the programmer has to be conscious of the possibility of overflow, and to design the software accordingly. The world has a lot more bad programmers than good programmers, and there's a lot of software out there that doesn't take this complication into account. For the weaker programmers, a move to 32-bit data registers reduces the probability of error, and that's probably a good thing.

The situation with respect to addresses is a bit more complicated. With a 16-bit offset, the maximum segment size is 64 kilobytes, and that's not a lot of memory. There are several situations where you need bigger segments.

  1. Some applications necessarily work with very large arrays - or other large data structures - and it would be an unreasonable imposition on the programmer to have to break up such an array over several segments. This is one case where there's a very clear need for addresses with 32-bit offsets.
  2. Some programmers still insist on writing huge monolithic pieces of software, rather than designing their software in a modular way. As a result, they need large segments. I don't consider this to be a good excuse, but it can't be denied that the demand is there.
  3. Some compilers aren't smart enough to take advantage of segmentation - they try to pack all of your modules into the same physical segment. Until we can talk the compiler-writers out of this, we're again stuck with very large segments.
  4. There's a marked preference, among many programmers, for something called the "flat memory model". I'll explain what that is in a later section. Such a model inevitably leads to very large segments.
In an ideal world, we'd use 16-bit code segments (for efficiency) wherever possible, and 32-bit code segments only where they're needed or where the improved addressing modes would give more efficient code. Unfortunately, it's rare to find compilers that will let you mix segment types in this way. Most insist that you choose one or the other.

OK, that goes some way towards explaining what 32-bit software is all about. It doesn't yet explain why so many people are in a hurry to get rid of their 16-bit applications and "move up" to 32-bit versions. What's the attraction?

The answer lies in a historical accident. The 32-bit support in the 80x86 family appeared at roughly the same time as operating systems were starting to take advantage of the processor's protected mode. The move to protected mode was definitely a step forward. Most PC users were getting heartily sick of the "General Protection Fault" syndrome, and it was a real relief to move away from the situation where one crashed application could bring down the entire system. The software vendors weren't particularly clear on the distinction between "protected mode" and "32-bit software". The sloppy advertising meant that most users were also confused about, or even unaware of, the distinction.

The flat memory model

When segmentation was first introduced, it was seen as a significant step forward. It gave the prospect of proper hardware support for "clean" programming practices, such as making a clear distinction between the executable parts of a program and the data sections. If the idea had taken hold at that time, we would probably have had more reliable software by now.

Unfortunately, segmentation was expensive. It required hardware that did sophisticated address translation at high speed (so as not to create unacceptable time overheads). This was possible with the technology of that time, but it would have added significantly to the overall cost of a computer. Presumably the hardware designers decided that the extra cost was not justified; in any event, there were very few commercial implementations of the idea.

By the time the 80286 appeared, the technology had finally caught up; at last, it was possible to put into practice, at an acceptable cost, an idea that had been around for many years.

It's a pity, then, that hardly anyone used the segmentation hardware. There were several reasons for this.

There was also a language-related problem, which will be discussed below. First, though, let us look at what is meant by "linear addressing".

An address space is linear if addresses can be combined according to the laws of linear arithmetic. For example, if p1 and p2 are two pointers, then the operations p1+p2 and p1-p2 should produce two other valid pointers.

A segmented address space is definitely not linear. With a segmented memory, p1+p2 never makes sense, and p1-p2 has a meaning only if p1 and p2 are pointers into the same segment. The nonlinearity of a segmented address space puts some severe limitations on what sort of address arithmetic is possible.

In a strongly typed programming language, these limitations make sense. In fact, a segmented address space is ideally suited to the implementation of the more modern programming languages (e.g. Ada, Modula-2, Oberon) which stress the concept of modularity. There is a very natural mapping between modules and segments.

As it happened, however, many of today's operating systems were designed at a time when the older language C was near a peak of its popularity, and C does not match particularly well with a segmented memory model. C permits some operations (mixing pointers to code with pointers to data, unrestricted linear address arithmetic, etc.) which the segmentation hardware would trap as illegal. One might well argue that the only things that segmentation would prevent are those things that a sensible programmer wouldn't do, but that's irrelevant. A compiler has to permit anything that the language standard permits - even the stupid things - or it's not a standard-conforming compiler.
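
To give a concrete flavour of it, here is the sort of thing a C compiler is expected to swallow. The example is my own, not taken from any particular program, and it leans on behaviour that is traditional rather than strictly guaranteed by the standard - which is exactly the kind of address arithmetic that only makes sense in a linear address space.

    #include <stdio.h>

    static int    table_a[10];
    static double table_b[5];

    int main(void)
    {
        /* Address arithmetic across two unrelated objects: meaningful only if
           every address lives in one linear space. */
        long gap = (long)((char *)table_b - (char *)table_a);
        printf("the two tables are %ld bytes apart\n", gap);

        /* Mixing a pointer to code with a pointer to data. */
        void *code = (void *)main;
        void *data = (void *)table_a;
        printf("code at %p, data at %p\n", code, data);

        /* Manufacturing a pointer from a bare number (not dereferenced here;
           a protected-mode system would object if we tried). */
        char *video = (char *)0xB8000L;
        printf("pretend video memory at %p\n", (void *)video);
        return 0;
    }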

As a result, C programmers generally insist on having a linear address space - and this tradition seems to be continued by the C++ programmers. This has had a major influence on OS/2, because so much OS/2 software is written in C or C++.

Can you get a linear address space with 80x86 hardware? Well, you can't actually disable the segmentation, but there's a way of pretending that it isn't there. The trick is to combine your entire program (both code and data) into one huge segment, and to set up the segment registers so that they all select the same segment. Another way of looking at this is to say that you have several segments, but they overlap precisely so that they're all in the same physical memory. And that's what the "flat memory model" of the 80x86 (more precisely, the 80386 or higher) is all about.
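
In descriptor terms - and this is the standard recipe for the flat model, not something peculiar to OS/2 - it just means that every descriptor the program uses has a base of zero and a limit covering the whole four-gigabyte linear space. A C-style caricature of the idea:

    /* The flat memory model: all the segment registers select descriptors that
       describe the same memory, namely the entire linear address space.
       (The struct here is my own caricature, not the real descriptor format.) */
    struct descriptor { unsigned long base; unsigned long limit; };

    static const struct descriptor flat_code = { 0ul, 0xFFFFFFFFul };   /* code segment */
    static const struct descriptor flat_data = { 0ul, 0xFFFFFFFFul };   /* data and stack */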

As you might guess from the above discussion, I'm not particularly a fan of the flat memory model. In fact, I think it's crazy to throw away the advantages of segmentation. I can afford to say that because I'm not selling any OS/2 software. The software vendors don't have the same luxury; you can go out of business by calling your customers crazy. The flat memory model is what most OS/2 programmers want, and that's what they get.


[ Part 1 | Part 2 | Part 3 ]
This information was compiled by Peter Moylan. Please send complaints, criticisms, praise, corrections, suggestions, etc. to peter at ee.newcastle.edu.au
Last modified: 23 July 2004