Binary Input and Output
Unix (and most modern operating systems) views files simply
as a linearly ordered collection of bytes.
Structure (ie division of a file into lines,
records,
etc) is imposed on the file by application programs.
At the lowest level, all files can be read with the
functions in this group:
#include <fcntl.h>
#include <unistd.h>
int open( const char *name, int mode );
int close( int fd );
ssize_t read( int fd, void *buf, size_t n );
ssize_t write( int fd, const void *buf, size_t n );
off_t lseek( int fd, off_t offset, int whence );
For this group of functions, the file's handle is a
small non-negative integer, usually called the file descriptor.
On Unix and many other systems,
the first three file descriptors are already open when programs
start:
| 0 | standard input |
| 1 | standard output |
| 2 | standard error |
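Because these descriptors are already open, a program can write to them without calling open first. The following is only a minimal sketch: put_string is a hypothetical helper introduced here for illustration, not a library function.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write a C string to an already-open descriptor.
   Returns the number of bytes written, or -1 on error. */
ssize_t put_string(int fd, const char *msg)
{
    return write(fd, msg, strlen(msg));
}
```

Calling put_string(1, "a result\n") sends the text to standard output, while put_string(2, "a warning\n") sends it to standard error.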
For other files,
a call to open is made with the file name and
the mode as arguments and, if successful, the
file descriptor is returned.
Values for mode are:
| Value | Symbol | Operation |
| 0 | O_RDONLY | read only |
| 1 | O_WRONLY | write only |
| 2 | O_RDWR | read and write |
The values in the Value column are included to enable
you to read the many old programs which used these values
before the symbols in the second column became widely used
and accepted.
For portability, all new programs should obviously use the
symbolic values, O_RDONLY,
O_WRONLY, etc.
Example:
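#include <stdio.h>    /* perror */
#include <fcntl.h>    /* open, O_RDONLY */
#include <unistd.h>   /* read, close */
#define BUF_SIZE 1000
....
int f, n;
char buffer[BUF_SIZE];
f = open( "data_file", O_RDONLY );
if ( f >= 0 ) {    /* 0 is a valid descriptor; only -1 signals an error */
    while( (n=read(f, buffer, BUF_SIZE)) > 0 ) {
        /* Process n bytes of data in buffer */
        ...
    }
    close( f );
}
else
    perror("Opening data file");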
Note
- open will return -1 if there is any error
(eg attempting to open a non-existent file for reading)
- The function perror prints the string you pass to it, followed by
a short system message describing the most recent error.
Sometimes this message will identify the problem.
Unfortunately, sometimes it is so cryptic that you will need to
consult a reference or an expert on your operating system
for a translation into ordinary English.
- read returns the number of bytes actually read.
It will never be more than n, but may be less -
particularly when reading the end of a file.
Attempting to read past the end of the file will produce 0;
a return value of -1 indicates an error.
- close returns a value of 0 on success and -1 on error.
Errors can be caused by closing the file twice and a few
system dependent scenarios.
Most programmers use it as I have done in the example,
ie they don't bother to check the return!
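The points above can be pulled together in a sketch that also uses write, which the example does not show. The helper name copy_file is hypothetical, and the flags O_CREAT and O_TRUNC and the permission value 0644 are not covered in these notes; they are assumed here so that the output file can be created. Like read, write may transfer fewer bytes than requested, so the sketch repeats it until the whole buffer has gone out.

```c
#include <fcntl.h>
#include <unistd.h>

#define BUF_SIZE 1000

/* Hypothetical helper: copy the file src to dst.
   Returns 0 on success, -1 on any error. */
int copy_file(const char *src, const char *dst)
{
    char buffer[BUF_SIZE];
    ssize_t n = 0;
    int status = 0;
    int in, out;

    in = open(src, O_RDONLY);
    if (in < 0)
        return -1;

    /* Assumption: O_CREAT creates the file if needed, O_TRUNC empties
       it, and 0644 is the permission mask for a newly created file. */
    out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }

    while (status == 0 && (n = read(in, buffer, BUF_SIZE)) > 0) {
        ssize_t done = 0;
        /* write may accept fewer bytes than requested, so repeat */
        while (done < n) {
            ssize_t w = write(out, buffer + done, (size_t)(n - done));
            if (w <= 0) {
                status = -1;
                break;
            }
            done += w;
        }
    }
    if (n < 0)
        status = -1;    /* read reported an error */

    /* Unlike the example above, check what close returns */
    if (close(in) < 0)
        status = -1;
    if (close(out) < 0)
        status = -1;
    return status;
}
```

A call such as copy_file("data_file", "data_file.bak") returns 0 only if every open, read, write and close succeeded.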
Random Access Files
A random access file is one in which you
can access records in the file in any
order - not just one after the other. This means that you can
seek to the last record in the file, move back to the first one,
then read one in the middle, etc.
Associated with each file will be a read/write pointer: it points to the
current position in the file - the next byte which will be read or the
position where the next byte will be written.
In streams, the read/write pointer is normally at the end of the file -
a stream is not a random access file, so you can only read forward and you
always write at the end.
(Unix, however, does allow you to move the read/write pointer when
you are using the stream I/O routines, if the device will allow it!
For instance, you can't move backwards through the keyboard input.
But if you have opened a disc file as a stream, you can move forwards and
backwards through it.)
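For a disc file opened as a stream, the standard fseek function plays the role of moving the read/write pointer. The sketch below assumes a hypothetical helper, byte_at, which is not part of any library; it simply seeks to a given offset and reads one byte.

```c
#include <stdio.h>

/* Hypothetical helper: return the byte at offset pos in an open
   stream, or EOF if the seek fails or pos is past the end. */
int byte_at(FILE *fp, long pos)
{
    if (fseek(fp, pos, SEEK_SET) != 0)
        return EOF;
    return getc(fp);
}
```

Calling byte_at(fp, 4) and then byte_at(fp, 0) on the same stream moves forwards and then backwards through the file. On a device which cannot seek, such as the keyboard, fseek would be expected to fail and the helper returns EOF.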
#include <sys/types.h>
#include <unistd.h>
off_t lseek( int fildes, off_t offset, int whence );
lseek takes the file descriptor as its first
argument and two other arguments which control its operation.
It returns the new position of the read/write pointer in bytes from the beginning
of the file, or -1 if an error occurs.
off_t is usually the same as a
long int and
whence can take values:
| SEEK_SET | From the beginning of the file |
| SEEK_CUR | From the current position in the file |
| SEEK_END | From the end of the file |
which determine how offset is
interpreted.
offset can be positive or negative -
allowing
lseek to position forwards and
backwards from the beginning, current position or end.
Thus it can be used to both move the read/write pointer to a new position in the
file and to return information about the file such as the current position of the
read/write pointer and the length of the file.
Some examples of its use are:
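#include <stdio.h>     /* printf, perror */
#include <fcntl.h>     /* open, O_RDWR */
#include <unistd.h>    /* read, lseek, close */
#define REC_LEN 100    /* record length in bytes: an assumed value */
int main( void )
{
    int f;
    long int length, cur_pos;
    char buf[REC_LEN];
    f = open( "test.dat", O_RDWR );
    if ( f < 0 ) {
        perror("Opening test.dat");
        return 1;
    }
    length = (long int) lseek( f, 0L, SEEK_END );
    printf("This file has %ld bytes\n", length );
    /* Seek back to the beginning */
    cur_pos = (long int) lseek( f, 0L, SEEK_SET );
    /* The file contains records which are REC_LEN bytes long,
       read every 10th one */
    while ( cur_pos < length ) {
        read( f, buf, REC_LEN );
        /* Move on 9 records */
        cur_pos = (long int) lseek( f, 9*REC_LEN, SEEK_CUR );
    }
    close( f );
    return 0;
}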
Communications Sockets and Other Devices
I/O to and from
communications sockets may use the
read and write functions, but it's more common to
use functions from a special library, the
sockets library.
This library is not part of the ANSI standard
and contains routines which are specific to communication
channels, such as
recv
and send.
Similarly, suppliers of other special purpose devices
will usually provide a library of functions which operate
on the device to enable it to provide its special functions.
Graphics devices (both input and output),
control devices (such as our robot controller),
data acquisition devices (one of which you will program
for your assignment next semester), etc, are examples
of such systems.
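Returning to sockets: as a sketch of the point that read and write also work on them, the fragment below creates a connected pair of local sockets with socketpair, writes a message into one end and reads it from the other. The helper name echo_through_sockets is hypothetical, and the sketch assumes a short message which arrives in a single read.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: send len bytes of msg through a connected pair
   of local sockets and read them back into reply.
   Returns the number of bytes received, or -1 on error. */
ssize_t echo_through_sockets(const char *msg, size_t len, char *reply)
{
    int sv[2];
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    /* Socket descriptors are used just like file descriptors here */
    if (write(sv[0], msg, len) != (ssize_t)len)
        n = -1;
    else
        n = read(sv[1], reply, len);

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

A real program talking to another machine would normally set up its socket with the sockets library instead, but the read, write and close calls are the same ones used for files.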
- Records
- A record is a collection of data items which logically belong together.
For example, a record in a bank's account database might contain
an account number, a name, an account classification code, a date the account
was opened, etc.
- Random Access Files
- Files in which records can be read or written in any order:
distinct from streams, in which records are read or written one
after the other.
- file descriptor
- A small integer used as the handle to a file. Often it is an
index into an array of structures which keep the attributes (access
mode, current position, input or output buffer, etc) of each
open file.
- communications sockets
- A socket is the name used for a communications port on
a computer. Programs which communicate with other computers do
so through sockets (a logical entity established by the operating
system on each computer). Each system provides a large number of
sockets which can share a single physical communications link.
Each service provided by a machine, eg remote login,
ftp, mail and Web server, is assigned a separate socket.
Sometimes called Berkeley sockets.
- sockets library
- A collection of library functions which are used when communicating
with another machine via a socket.
© John Morris, 1998