Objects First


Binary Input and Output

Unix (like most modern operating systems) views a file simply as a linearly ordered collection of bytes. Structure (i.e. division of a file into lines, records, etc.) is imposed on the file by application programs. At the lowest level, all files can be read and written with the functions in this group:
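#include <fcntl.h>    /* open and the O_ constants */
#include <unistd.h>   /* read, write, close, lseek */

int open( const char *name, int mode );   /* a third argument is needed only when creating files */
int close( int fd );

ssize_t read( int fd, void *buf, size_t n );
ssize_t write( int fd, const void *buf, size_t n );

off_t lseek( int fd, off_t offset, int whence );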
For this group of functions, the file's handle is a small non-negative integer, usually called the file descriptor. On Unix and many other systems, the first three file descriptors are already open when a program starts:
0   standard input
1   standard output
2   standard error
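Because descriptors 0 and 1 are already open, read and write can be used on them straight away. The sketch below is one way to copy everything from one descriptor to another; the name copy_fd and the 4096-byte buffer are illustrative choices, not part of any library. It loops on write because write, like read, may transfer fewer bytes than requested.

```c
#include <unistd.h>

#define COPY_BUF_SIZE 4096   /* illustrative buffer size */

/* Sketch: copy_fd is a hypothetical helper name.
   Copies everything that can be read from descriptor in to
   descriptor out. Returns the number of bytes copied, or -1 on error. */
long copy_fd( int in, int out )
{
    char buf[COPY_BUF_SIZE];
    long total = 0;
    ssize_t n;

    while ( (n = read( in, buf, sizeof buf )) > 0 ) {
        ssize_t done = 0;
        /* write may accept fewer bytes than asked for, so keep going */
        while ( done < n ) {
            ssize_t w = write( out, buf + done, (size_t)( n - done ) );
            if ( w < 0 )
                return -1;
            done += w;
        }
        total += n;
    }
    return ( n < 0 ) ? -1 : total;   /* n == 0 means end of file */
}
```

Calling copy_fd( 0, 1 ) copies standard input to standard output, much like a minimal cat.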
For other files, open is called with the file name and the mode as arguments and, if successful, returns a file descriptor. Values for mode are:
Value   Symbol     Operation
0       O_RDONLY   read only
1       O_WRONLY   write only
2       O_RDWR     read and write
The values in the Value column are included so that you can read the many old programs which used these numbers before the symbols in the second column became widely used and accepted. For portability, all new programs should use the symbolic values O_RDONLY, O_WRONLY, etc., which are defined in <fcntl.h>.
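The system remembers the mode for as long as the descriptor stays open. As a sketch, assuming a POSIX system (fcntl with F_GETFL and the O_ACCMODE mask are not otherwise covered in this section), a program can ask which of the three modes a descriptor was opened with. The names access_mode and mode_name are hypothetical helpers.

```c
#include <fcntl.h>

/* Sketch: hypothetical helper, assumes POSIX fcntl().
   Returns O_RDONLY, O_WRONLY or O_RDWR, or -1 if fd is not open. */
int access_mode( int fd )
{
    int flags = fcntl( fd, F_GETFL );
    if ( flags < 0 )
        return -1;
    return flags & O_ACCMODE;   /* mask off other flags such as O_APPEND */
}

/* Hypothetical helper: describe a mode in the words of the table */
const char *mode_name( int mode )
{
    switch ( mode ) {
    case O_RDONLY: return "read only";
    case O_WRONLY: return "write only";
    case O_RDWR:   return "read and write";
    default:       return "unknown";
    }
}
```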

Example:

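#include <stdio.h>    /* perror */
#include <fcntl.h>    /* open, O_RDONLY */
#include <unistd.h>   /* read, close */
#define BUF_SIZE  1000
  ....
  int f;
  ssize_t n;
  char buffer[BUF_SIZE];
  f = open( "data_file", O_RDONLY );
  if ( f >= 0 ) {   /* 0 is a valid descriptor; only -1 means failure */
    while( (n=read(f, buffer, BUF_SIZE)) > 0 ) {
      /* Process n bytes of data in buffer */
      ...
      }
    close( f );
    }
  else
    perror("Opening data file");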
Notes
  1. open returns -1 if there is any error (e.g. attempting to open a non-existent file for reading). Any other value, including 0, is a valid file descriptor.
  2. The function perror prints the string you give it, followed by a colon and a short message describing the most recent error (the one recorded in errno). Sometimes this message will identify the problem. Unfortunately, sometimes it is so cryptic that you will need to consult a reference or an expert on your operating system for a translation into ordinary English.
  3. read returns the number of bytes actually read. It will never be more than n, but may be less, particularly when reading the end of a file. Once the end of the file has been reached, read returns 0; a return value of -1 indicates an error.
  4. close returns 0 on success and -1 on error. Errors can be caused by closing the same descriptor twice and by a few system-dependent conditions. Most programmers use it as I have done in the example, i.e. they don't bother to check the return value!
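Putting the four notes together, a version of the reading loop that checks every return value might look like the sketch below. The name count_bytes is hypothetical, and it simply totals the bytes rather than processing them.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define BUF_SIZE 1000

/* Sketch: count_bytes is a hypothetical helper name.
   Returns the number of bytes in the named file, or -1 after
   printing an error message with perror. */
long count_bytes( const char *name )
{
    char buffer[BUF_SIZE];
    long total = 0;
    ssize_t n;
    int f = open( name, O_RDONLY );

    if ( f < 0 ) {                 /* note 1: -1 means open failed */
        perror( name );
        return -1;
    }
    while ( (n = read( f, buffer, BUF_SIZE )) > 0 )
        total += n;                /* note 3: n may be less than BUF_SIZE */
    if ( n < 0 ) {                 /* 0 is end of file, -1 is an error */
        perror( "read" );
        close( f );
        return -1;
    }
    if ( close( f ) < 0 ) {        /* note 4: checking close, for once */
        perror( "close" );
        return -1;
    }
    return total;
}
```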

Random Access Files

A random access file is one in which you can access records in any order, not just one after the other. This means that you can seek to the last record in the file, move back to the first one, then read one in the middle, etc. Associated with each open file is a read/write pointer: it marks the current position in the file, i.e. the next byte which will be read or the position where the next byte will be written.
A stream is not a random access file: you normally read it from front to back and write at its end, so the read/write pointer only moves forward. (However, the stream I/O routines do allow you to move the read/write pointer with fseek, if the device allows it! For instance, you can't move backwards through keyboard input, but if you have opened a disc file as a stream, you can move forwards and backwards through it.)
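The same "if the device allows it" rule applies to file descriptors. One way to find out is to ask lseek for the current position (an offset of 0 from SEEK_CUR, which moves nothing): POSIX specifies that it fails with errno set to ESPIPE on pipes, FIFOs and sockets. The helpers is_seekable and file_is_seekable are hypothetical names in this sketch.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Sketch: hypothetical helper. Returns 1 if fd's read/write pointer
   can be moved, 0 if not. An offset of 0 from SEEK_CUR leaves the
   pointer where it is; on pipes, FIFOs and sockets lseek fails. */
int is_seekable( int fd )
{
    return lseek( fd, 0, SEEK_CUR ) != (off_t) -1;
}

/* Hypothetical wrapper: open name read-only and report whether it
   is seekable; returns -1 if it cannot be opened. */
int file_is_seekable( const char *name )
{
    int result;
    int fd = open( name, O_RDONLY );

    if ( fd < 0 )
        return -1;
    result = is_seekable( fd );
    close( fd );
    return result;
}
```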
#include <sys/types.h>   /* off_t */
#include <unistd.h>      /* lseek */

off_t lseek( int fildes, off_t offset, int whence );
lseek takes the file descriptor as its first argument and two other arguments which control its operation. It returns the new position of the read/write pointer, in bytes from the beginning of the file, or -1 on error. off_t is an integer type, usually the same as a long int, and whence can take the values:
SEEK_SET   From the beginning of the file
SEEK_CUR   From the current position in the file
SEEK_END   From the end of the file
which determine how offset is interpreted. offset can be positive or negative - allowing lseek to position forwards and backwards from the beginning, current position or end.
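Since lseek returns the new position, an offset of 0 turns it into a query, and a negative offset from SEEK_END reaches back from the end of the file. The sketch below assumes a POSIX system; file_length and read_tail are hypothetical names.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Sketch: hypothetical helper. Length of the file in bytes, leaving
   the read/write pointer where it was. Returns -1 if fd cannot seek. */
off_t file_length( int fd )
{
    off_t here = lseek( fd, 0, SEEK_CUR );   /* offset 0: just ask */
    off_t end;

    if ( here == (off_t) -1 )
        return -1;
    end = lseek( fd, 0, SEEK_END );          /* position of end = length */
    if ( end == (off_t) -1 || lseek( fd, here, SEEK_SET ) == (off_t) -1 )
        return -1;
    return end;
}

/* Hypothetical helper: read the last n bytes of the named file into
   buf using a negative offset from SEEK_END.
   Returns the number of bytes read, or -1 on error. */
long read_tail( const char *name, char *buf, long n )
{
    long got;
    off_t len;
    int f = open( name, O_RDONLY );

    if ( f < 0 )
        return -1;
    len = file_length( f );
    if ( len < 0 ) {
        close( f );
        return -1;
    }
    if ( n > len )
        n = (long) len;                      /* cannot seek before byte 0 */
    if ( lseek( f, -(off_t) n, SEEK_END ) == (off_t) -1 )
        got = -1;
    else
        got = (long) read( f, buf, (size_t) n );
    close( f );
    return got;
}
```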

Thus it can be used both to move the read/write pointer to a new position in the file and to return information about the file, such as the current position of the read/write pointer and the length of the file. Some examples of its use are:

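#include <stdio.h>    /* printf, perror */
#include <fcntl.h>    /* open, O_RDWR */
#include <unistd.h>   /* read, lseek, close */

#define REC_LEN  32   /* record length in bytes: an example value */

int main( void ) {
	int f;
	long int length, cur_pos;
	char buf[REC_LEN];

	f = open( "test.dat", O_RDWR );
	if ( f < 0 ) {
		perror( "Opening test.dat" );
		return 1;
		}
	printf("This file has %ld bytes\n", length = lseek( f, 0L, SEEK_END ) );
	/* Seek back to the beginning */
	cur_pos = lseek( f, 0L, SEEK_SET );
	/* The file contains records which are REC_LEN bytes long,
	   read every 10th one */
	while ( cur_pos < length ) {
		read( f, buf, REC_LEN );
		/* Move on 9 records */
		cur_pos = lseek( f, 9*REC_LEN, SEEK_CUR );
		}
	close( f );
	return 0;
	}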
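Reading and writing individual records is the core of a random access file: with fixed-length records, record number r starts at byte r * REC_LEN, so one lseek with SEEK_SET reaches it. This is a minimal sketch assuming fixed-length records and an arbitrary REC_LEN of 32 bytes; read_record and write_record are hypothetical names.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define REC_LEN 32   /* hypothetical fixed record length, in bytes */

/* Sketch: hypothetical helper. Read record number recno (0 = first).
   Returns 1 on success, 0 if the record is missing or incomplete,
   -1 on error. */
int read_record( int fd, long recno, char buf[REC_LEN] )
{
    ssize_t n;

    if ( lseek( fd, (off_t) recno * REC_LEN, SEEK_SET ) == (off_t) -1 )
        return -1;
    n = read( fd, buf, REC_LEN );
    if ( n < 0 )
        return -1;
    return n == REC_LEN;   /* a short read means the record is not all there */
}

/* Hypothetical helper: overwrite (or append) record number recno.
   Returns 1 on success, -1 on error. */
int write_record( int fd, long recno, const char buf[REC_LEN] )
{
    if ( lseek( fd, (off_t) recno * REC_LEN, SEEK_SET ) == (off_t) -1 )
        return -1;
    return write( fd, buf, REC_LEN ) == REC_LEN ? 1 : -1;
}
```

Writing past the current end of the file is allowed; POSIX specifies that the skipped bytes read back as zeros until something is written there.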

Communications Sockets and Other Devices

I/O to and from communications sockets may use the read and write functions, but it is more common to use functions from a special library, the sockets library. This library is not part of the ANSI C standard (it is specified by POSIX) and contains routines which are specific to communication channels, such as socket, connect, send and recv.
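As a self-contained sketch of the BSD sockets calls send and recv, the code below uses socketpair (also POSIX) to create two already-connected sockets inside one process, so no network or server is needed; a real client would instead create a socket with socket and connect it to a remote address. The function name echo_through_socketpair is hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Sketch: hypothetical helper. Sends msg on one end of a connected
   pair of stream sockets and receives it on the other.
   Returns the number of bytes received, or -1 on error. */
long echo_through_socketpair( const char *msg, size_t len,
                              char *reply, size_t reply_size )
{
    int sv[2];
    ssize_t got;

    if ( socketpair( AF_UNIX, SOCK_STREAM, 0, sv ) < 0 )
        return -1;
    if ( send( sv[0], msg, len, 0 ) != (ssize_t) len ) {
        close( sv[0] );
        close( sv[1] );
        return -1;
    }
    got = recv( sv[1], reply, reply_size, 0 );   /* read() would also work */
    close( sv[0] );
    close( sv[1] );
    return (long) got;
}
```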

Similarly, suppliers of other special-purpose devices will usually provide a library of functions which give programs access to the device's special features. Graphics devices (both input and output), control devices (such as our robot controller), data acquisition devices (one of which you will program for your assignment next semester), etc., are examples of such systems.

Key terms

Records
A record is a collection of data items which logically belong together. For example, a record in a bank's account database might contain an account number, a name, an account classification code, a date the account was opened, etc.
Random Access Files
Files in which records can be read or written in any order: distinct from streams, in which records are read or written one after the other.
file descriptor
A small integer used as the handle to a file. Often it is an index into an array of structures which keep the attributes (access mode, current position, input or output buffer, etc.) of each open file.
communications sockets
A socket is an endpoint of a communications channel, usually between programs on different computers. Programs which communicate with other computers do so through sockets (logical entities established by the operating system on each computer). Each system can have a large number of sockets open, all sharing a single physical communications link. Each service provided by a machine, e.g. remote login, FTP, mail and Web serving, is identified by a separate port number, and clients reach it by connecting a socket to that port. Sometimes called Berkeley sockets.
sockets library
A collection of library functions which are used when communicating with another machine via a socket.

© John Morris, 1998