MadSci Network: Computer Science

Re: FAT and clusters

Area: Computer Science
Posted By: Keith Little, Computer Science
Date: Wed Oct 2 15:24:12 1996
Message ID: 843419507.Cs


Lance,

The disk system is one of the murkier (and more complex) parts of the IBM PC (or clone), so
please pardon me if I give a somewhat simplified answer.  Disk access on the IBM PC (or
clone) is handled on 4 levels:

1) Application program to MS-DOS file system (industry standard).
2) MS-DOS file system to BIOS (industry standard).
3) BIOS to disk controller (manufacturer specific).
4) Disk controller to drive (manufacturer specific).

We'll analyze this issue from the bottom up.

At the lowest level, the disk drive hardware has one or more magnetically coated platters, which are
accessed by one or more read/write heads (like those in your home tape recorder).  Data is
recorded on the platters in concentric rings, or "tracks", sort of like tree rings.  When platters
are stacked vertically, the tracks at the same position on each platter surface line up to form what
are commonly called "cylinders".  Furthermore, each track is subdivided into arc segments called
"sectors".  All this forms the basis for the BIOS access method known as "CHS", or Cylinder/Head/Sector.
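
To make the CHS idea concrete, here is a rough sketch of how a cylinder/head/sector triple can be
turned into a single running sector number, and how the geometry gives the drive's capacity.  The
function names and the 615/4/17 geometry are just my illustration, and I'm assuming the usual PC
convention that sector numbers start at 1 while cylinders and heads start at 0.

```python
def chs_to_linear(cylinder, head, sector, heads, sectors_per_track):
    """Turn a CHS address into a running sector number (0-based).

    Assumes cylinders and heads count from 0 and sectors count from 1.
    """
    if not 0 <= head < heads:
        raise ValueError("head out of range")
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector must be in 1..sectors_per_track")
    return (cylinder * heads + head) * sectors_per_track + (sector - 1)


def capacity_bytes(cylinders, heads, sectors_per_track, bytes_per_sector=512):
    """Total capacity implied by a geometry."""
    return cylinders * heads * sectors_per_track * bytes_per_sector


# Illustrative geometry: 615 cylinders, 4 heads, 17 sectors per track.
print(chs_to_linear(0, 0, 1, heads=4, sectors_per_track=17))  # 0
print(chs_to_linear(1, 0, 1, heads=4, sectors_per_track=17))  # 68
print(capacity_bytes(615, 4, 17))                             # 21411840
```

Moving to the next cylinder skips a whole cylinder's worth of sectors (4 heads x 17 sectors = 68),
which is why the second call prints 68.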

Before the sectors on a hard drive can be accessed, the drive must be "Formatted" (at what's
called the "Low Level").  This is usually done by using the BIOS ROM on the controller,
or a disk maintenance program, such as the "Ontrack" utility.  Before this, the drive is
uninitialized, with random magnetic patterns.  The low-level format writes each sector on a
particular track with a unique number in its "Header", by which it can later be identified
(conventionally 1 to n).

After this, the "FDISK" program must be run to create the "Master Boot Record" and the
"Partition Table" (this may be done manually, or by some system setup program).  Partitioning
also makes it possible to have more than one operating system on a drive, each in a separate
partition - such as UNIX, MS-DOS, OS/2 or Windows NT.
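
For the curious, here is a small sketch of reading the partition table out of a Master Boot Record.
It assumes the conventional PC layout: a 512-byte sector with four 16-byte entries starting at byte
446 and the signature bytes 55 AA at the end.  The function name, the dictionary keys and the sample
values are mine, and the sector is built in memory rather than read from a real drive.

```python
import struct

ENTRY_FORMAT = "<B3sB3sII"  # flag, CHS start, type, CHS end, first sector, sector count


def parse_partition_table(mbr):
    """Return the non-empty entries of the partition table in a 512-byte MBR."""
    if len(mbr) != 512 or mbr[510:512] != b"\x55\xaa":
        raise ValueError("not a 512-byte MBR with the 55 AA signature")
    partitions = []
    for i in range(4):
        entry = mbr[446 + 16 * i: 446 + 16 * (i + 1)]
        flag, chs_start, ptype, chs_end, first, count = struct.unpack(ENTRY_FORMAT, entry)
        if ptype == 0:  # type 0 marks an unused slot
            continue
        partitions.append({
            "bootable": flag == 0x80,
            "type": ptype,
            "first_sector": first,
            "sector_count": count,
        })
    return partitions


# Synthetic MBR with one bootable partition of type 0x06 (commonly FAT16).
# The CHS bytes here are placeholders, not a carefully encoded address.
sample = bytearray(512)
sample[446:462] = struct.pack(ENTRY_FORMAT, 0x80, b"\x01\x01\x00", 0x06,
                              b"\xfe\x3f\x00", 63, 1000)
sample[510:512] = b"\x55\xaa"
print(parse_partition_table(bytes(sample)))
```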

The drive must now be formatted again!  This time, it's done with what's called a "High Level"
format.  This is where MS-DOS (or some other operating system) writes its necessary control
structures.  In the case of MS-DOS, these are the "Volume Boot Sector", "File Allocation Table",
"Root Directory" and "Data Area", the last being where it keeps the actual file data.
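
As a sketch of how those MS-DOS structures sit one after another on a volume, the following computes
where each region starts.  It assumes the classic FAT12/FAT16 arrangement (boot sector first, then
the FAT copies, then a fixed-size root directory of 32-byte entries, then the data area).  The
function and key names are mine; the sample numbers are the ones normally quoted for a 1.44 MB floppy.

```python
def fat_volume_layout(reserved_sectors, num_fats, sectors_per_fat,
                      root_entries, bytes_per_sector=512):
    """Return the starting sector of each region of a FAT12/FAT16 volume."""
    # Each root directory entry is 32 bytes; round up to whole sectors.
    root_dir_sectors = (root_entries * 32 + bytes_per_sector - 1) // bytes_per_sector
    fat_start = reserved_sectors
    root_dir_start = fat_start + num_fats * sectors_per_fat
    data_start = root_dir_start + root_dir_sectors
    return {
        "boot_sector": 0,
        "fat_start": fat_start,
        "root_dir_start": root_dir_start,
        "root_dir_sectors": root_dir_sectors,
        "data_start": data_start,
    }


# 1.44 MB floppy: 1 reserved sector, 2 FATs of 9 sectors, 224 root entries.
layout = fat_volume_layout(reserved_sectors=1, num_fats=2,
                           sectors_per_fat=9, root_entries=224)
print(layout["fat_start"], layout["root_dir_start"], layout["data_start"])  # 1 19 33
```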

MS-DOS keeps information in "files", which are made up of "clusters" (the actual term is
"allocation unit").  Each cluster contains some number of sectors, based upon the kind of
storage unit (floppy drives have either 1 or 2 sectors per cluster, hard drives have 4, 8, 16,
32 or 64, depending on the total amount of storage).  Files, then, are made up of a chain of
clusters, which can be scattered all over the drive (on different platters, tracks and
sectors).  They may also occupy one contiguous (physically linear) area (such as an entire
track - 64 sectors, for instance).  That, however, is not usually the case.  When a file's
clusters are scattered, the file is said to be "Fragmented", and the drive can be "Defragmented"
or "Defragged" to improve access performance by rewriting the data for each file contiguously.
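
A quick sketch of the cluster arithmetic, assuming 512-byte sectors: how many clusters a file of a
given size occupies, how much space is wasted at the end of its last cluster, and a simple test for
whether its chain of cluster numbers is contiguous.  The helper names are made up for illustration.

```python
def clusters_needed(file_size, sectors_per_cluster, bytes_per_sector=512):
    """Number of whole clusters needed to hold file_size bytes."""
    cluster_bytes = sectors_per_cluster * bytes_per_sector
    return (file_size + cluster_bytes - 1) // cluster_bytes


def slack_bytes(file_size, sectors_per_cluster, bytes_per_sector=512):
    """Unused bytes at the end of the file's last cluster."""
    cluster_bytes = sectors_per_cluster * bytes_per_sector
    used = clusters_needed(file_size, sectors_per_cluster, bytes_per_sector)
    return used * cluster_bytes - file_size


def is_contiguous(chain):
    """True if every cluster in the chain directly follows the previous one."""
    return all(b == a + 1 for a, b in zip(chain, chain[1:]))


# A 10,000-byte file on a drive with 8 sectors (4096 bytes) per cluster.
print(clusters_needed(10000, 8))  # 3
print(slack_bytes(10000, 8))      # 2288
print(is_contiguous([5, 6, 7]))   # True
print(is_contiguous([5, 9, 6]))   # False, i.e. fragmented
```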

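The MS-DOS "FAT" or "File Allocation Table" is a table of numeric entries, one per cluster.
Each entry records whether its cluster is free and, if the cluster is in use, the number of the
next cluster in the same file (or a special end-of-file mark).  The physical location of a cluster
(e.g. cylinder 50, head 5, sectors 1-8) is not stored in the FAT; it is computed from the cluster
number.  When a file is created, an entry is made in a directory (such as the root directory)
which points to the file's first cluster, and that cluster is marked as allocated in the FAT.
When more of the file is written, more free clusters are allocated to the file and linked onto
its chain in the FAT.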
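
Here is a toy model of a FAT, to show how a file's clusters can be followed and extended.  As the
FAT format is commonly documented, each entry holds the number of the next cluster in the file, and
a cluster's sector is computed from its number rather than looked up.  I'm assuming FAT16-style
values (0 for free, FFF8 to FFFF hex for end of chain, clusters numbered from 2); the list and the
function names are mine, not MS-DOS internals.

```python
FREE = 0x0000          # entry value for an unallocated cluster
END_OF_CHAIN = 0xFFFF  # FAT16: any value 0xFFF8..0xFFFF ends a chain


def follow_chain(fat, first_cluster):
    """Return the cluster numbers of a file, in order, starting at first_cluster."""
    chain = []
    cluster = first_cluster
    while cluster < 0xFFF8:
        if cluster < 2 or cluster >= len(fat) or cluster in chain:
            raise ValueError("corrupt cluster chain")
        chain.append(cluster)
        cluster = fat[cluster]
    return chain


def append_cluster(fat, last_cluster):
    """Allocate the first free cluster and link it after last_cluster."""
    for new in range(2, len(fat)):
        if fat[new] == FREE:
            fat[new] = END_OF_CHAIN
            fat[last_cluster] = new
            return new
    raise OSError("no free clusters")


def cluster_to_sector(cluster, data_start, sectors_per_cluster):
    """First sector of a cluster; cluster 2 is the first one in the data area."""
    return data_start + (cluster - 2) * sectors_per_cluster


# Entries 0 and 1 are reserved.  One file occupies clusters 2 -> 3 -> 5.
fat = [0xFFF8, 0xFFFF, 3, 5, FREE, END_OF_CHAIN, FREE, FREE]
print(follow_chain(fat, 2))         # [2, 3, 5]
append_cluster(fat, 5)              # grows the file into free cluster 4
print(follow_chain(fat, 2))         # [2, 3, 5, 4]
print(cluster_to_sector(2, 33, 1))  # 33
```

Notice that after the file grows, its chain is 2, 3, 5, 4, which is exactly the kind of
out-of-order layout that a defragmenter tidies up.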

Now here's the curve ball:  Originally, IBM had one kind of drive made for them, with a particular
cylinder/head/sector geometry which fit into the (software interrupt 13H) BIOS disk access
scheme.  Since then, the PC clone came along, as did lots of drives from other disk vendors, each
with its own unique geometry.  Some drives even have more cylinders or sectors than the C/H/S
system allows for!  This is because they were developed independently of the IBM standards, for
use on other systems.
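
To put numbers on "more than the C/H/S system allows for", here is a sketch that checks a geometry
against the limits usually quoted for the INT 13H interface (1024 cylinders, 256 heads, 63 sectors
per track) and for the IDE drive interface (65536 cylinders, 16 heads, 255 sectors).  Treat those
figures and the helper names as my assumptions, not something taken from a particular BIOS.

```python
SECTOR_BYTES = 512

# Commonly quoted maximum counts for each interface (assumed, see above).
INT13_LIMITS = {"cylinders": 1024, "heads": 256, "sectors": 63}
IDE_LIMITS = {"cylinders": 65536, "heads": 16, "sectors": 255}


def max_bytes(limits):
    """Largest capacity addressable within the given limits."""
    return (limits["cylinders"] * limits["heads"]
            * limits["sectors"] * SECTOR_BYTES)


def fits(limits, cylinders, heads, sectors):
    """True if a drive geometry can be addressed within the limits."""
    return (cylinders <= limits["cylinders"]
            and heads <= limits["heads"]
            and sectors <= limits["sectors"])


# Without translation, a geometry must satisfy both sets of limits at once.
combined = {key: min(INT13_LIMITS[key], IDE_LIMITS[key]) for key in INT13_LIMITS}

print(max_bytes(INT13_LIMITS))            # 8455716864
print(max_bytes(combined))                # 528482304
print(fits(INT13_LIMITS, 1024, 16, 63))   # True
print(fits(INT13_LIMITS, 2048, 16, 63))   # False: too many cylinders
```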

To solve this problem, the drive and controller vendors have come up with various 
schemes that "Translate" a "Logical" cylinder/head/sector (the INT 13H CHS system) into 
a "Physical" cylinder/head/sector known only by the drive and controller.  Therefore, the 
operating system doesn't always deal with the real sector ID when accessing data!
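
One simple way to picture such a translation: convert the logical CHS address to a running sector
number, then convert that number back to CHS using the drive's own geometry.  This is only a sketch
of the idea.  Real controllers may use quite different internal mappings, and both geometries below
(64 heads logical, 16 heads physical, 63 sectors per track each) are made up for the example.

```python
def chs_to_linear(cylinder, head, sector, heads, sectors_per_track):
    """CHS (sectors numbered from 1) to a 0-based running sector number."""
    return (cylinder * heads + head) * sectors_per_track + (sector - 1)


def linear_to_chs(number, heads, sectors_per_track):
    """0-based running sector number back to CHS for the given geometry."""
    cylinder, rest = divmod(number, heads * sectors_per_track)
    head, sector0 = divmod(rest, sectors_per_track)
    return cylinder, head, sector0 + 1


def translate(logical_chs, logical_geometry, physical_geometry):
    """Map a logical CHS address onto a (hypothetical) physical geometry.

    Each geometry is a (heads, sectors_per_track) pair.
    """
    number = chs_to_linear(*logical_chs, *logical_geometry)
    return linear_to_chs(number, *physical_geometry)


# The BIOS asks for cylinder 10, head 20, sector 5 of a 64-head logical drive;
# the drive really has 16 heads, so the same sector lives at a different CHS.
print(translate((10, 20, 5), (64, 63), (16, 63)))  # (41, 4, 5)
```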

Hope this helps!

Keith Little


© Copyright 1996, Washington University. All rights reserved.
webadmin@www.madsci.org