NAME
scan.h - a simple reentrant text scanner
SYNOPSIS
These routines distinguish only among single delimiter characters, comment characters, and white space characters. This makes them powerful enough to scan languages such as the Bourne shell or Lisp. They cannot handle C comments, which are bracketed by multi-character sequences rather than running from a single character to the end of the line. Character arrays and files are scanned identically.
#include <scan.h>
SCAN *scan_create(int isfile, void *src);
SCAN *scan_recreate(SCAN *s, int isfile, void *src);
char *scan_get(SCAN *s);
char *scan_peek(SCAN *s);
void scan_flush(SCAN *s);
void scan_destroy(SCAN *s);
DESCRIPTION
The scan package provides a simple method for scanning text from character strings and file pointers. The package distinguishes between four types of characters: white space, delimiters, comment characters, and everything else (the "other" type).
In a nutshell, scan_get() skips all white space and returns the first sequence of consecutive characters that contains only "other" characters. If a comment character is encountered, all characters are skipped until a newline is found. If a delimiter is found, only the delimiter is returned. Thus, scan is a decent scanner for either a Lisp-like language or a Bourne-shell-like language.
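The scanning rule above can be sketched in a few lines of C. This is a minimal, string-only illustration of the described behavior, not the package's implementation: the MiniScan structure, the fixed 256-byte token buffer, and the helpers in_set() and mini_get() are all hypothetical. Only the three character-class fields mirror the user-settable fields of SCAN.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, string-only stand-in for SCAN: the three user-settable
   character classes plus a cursor and an internal token buffer. */
typedef struct {
    const char *delims, *whites, *comments;
    const char *ptr;        /* current position in the source string */
    char token[256];        /* internal buffer, overwritten on every call */
} MiniScan;

/* Non-zero if c is a non-NUL member of set (strchr would otherwise
   match the string's terminating NUL). */
static int in_set(int c, const char *set)
{
    return c != '\0' && set != NULL && strchr(set, c) != NULL;
}

/* Sketch of the scan_get() rule: skip white space and comments, then
   return a lone delimiter or a run of "other" characters.
   Returns NULL at the end of the string. Over-long words are split. */
static const char *mini_get(MiniScan *s)
{
    size_t n = 0;

    assert(s != NULL && s->ptr != NULL);
    for (;;) {
        if (in_set(*s->ptr, s->whites)) {
            s->ptr++;                         /* skip white space */
        } else if (in_set(*s->ptr, s->comments)) {
            while (*s->ptr != '\0' && *s->ptr != '\n')
                s->ptr++;                     /* skip to end of line */
        } else {
            break;
        }
    }
    if (*s->ptr == '\0')
        return NULL;                          /* end of string */
    if (in_set(*s->ptr, s->delims)) {
        s->token[n++] = *s->ptr++;            /* a delimiter stands alone */
    } else {
        while (*s->ptr != '\0' && n < sizeof s->token - 1 &&
               !in_set(*s->ptr, s->whites) &&
               !in_set(*s->ptr, s->delims) &&
               !in_set(*s->ptr, s->comments))
            s->token[n++] = *s->ptr++;        /* collect "other" chars */
    }
    s->token[n] = '\0';
    return s->token;
}
```

With delims "()", whites " \t\n", and comments ";", the Lisp-style input "(define x) ; note\n(x)" yields the tokens "(", "define", "x", ")", "(", "x", ")" and then NULL; the comment text never appears.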
Type Declarations
The following types are defined in the header file scan.h.
SCAN
typedef struct SCAN {
char *delims, *whites, *comments;
char *buffer, *ptr;
ARRAY *token;
FILE *fp;
} SCAN;
The user can set the first three fields (delims, whites, and comments) to strings listing the characters in each class. All other fields are internal and should not be touched.
Function Definitions
The following function prototypes are given in the header file scan.h.
SCAN *scan_create(int isfile, void *src);
Creates and returns a pointer to a SCAN structure. If isfile is non-zero, src should be a valid FILE pointer to read text from. If isfile is zero, src should be the NUL-terminated character string that you wish to scan.
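One way to scan strings and files identically is to hide the source behind a single "next character" function. The sketch below assumes that approach; the Source type and the source_init() and source_getc() functions are hypothetical and only mirror the isfile/src convention of scan_create().

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical character source following the isfile/src convention:
   a FILE pointer when isfile is non-zero, else a NUL-terminated string. */
typedef struct {
    int isfile;
    FILE *fp;
    const char *str;
} Source;

static Source source_init(int isfile, void *src)
{
    Source in = { isfile, NULL, NULL };

    assert(src != NULL);
    if (isfile)
        in.fp = (FILE *)src;
    else
        in.str = (const char *)src;
    return in;
}

/* Return the next character as an unsigned char value, or EOF at the
   end of the file or string, so callers never need to know which. */
static int source_getc(Source *in)
{
    if (in->isfile)
        return getc(in->fp);
    if (*in->str == '\0')
        return EOF;
    return (unsigned char)*in->str++;
}
```

Because both branches report the end of input as EOF, the tokenizing code above this layer can be written once for both kinds of source.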
SCAN *scan_recreate(SCAN *s, int isfile, void *src);
This function takes a pointer to an already created SCAN structure and reinitializes it to read from a new source. This is useful if you are scanning multiple strings. The delims, whites, and comments fields remain unchanged.
char *scan_get(SCAN *s);
Returns a pointer to a string containing the next "word" or delimiter. If the end of the file or string is reached, a NULL pointer is returned.
char *scan_peek(SCAN *s);
This function is the same as scan_get() except that the internal pointers of s are left so that a subsequent call to scan_get() or scan_peek() returns the same "word".
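A peek can be built on top of a get by caching one token of lookahead, which also works for file sources where the input cannot simply be rewound. This is a minimal sketch of that idea under stated assumptions: the Lookahead type, read_word(), la_peek(), and la_get() are hypothetical, and words are separated by spaces only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical scanner with one token of lookahead. */
typedef struct {
    const char *ptr;      /* cursor into the source string */
    char token[64];       /* last token produced */
    int pending;          /* non-zero if token was peeked but not consumed */
} Lookahead;

/* Read the next space-separated word into s->token; NULL at end. */
static const char *read_word(Lookahead *s)
{
    size_t n = 0;

    while (*s->ptr == ' ')
        s->ptr++;
    if (*s->ptr == '\0')
        return NULL;
    while (*s->ptr != '\0' && *s->ptr != ' ' && n < sizeof s->token - 1)
        s->token[n++] = *s->ptr++;
    s->token[n] = '\0';
    return s->token;
}

/* Like la_get(), but leaves the word pending so that the next
   la_get() or la_peek() returns it again. */
static const char *la_peek(Lookahead *s)
{
    assert(s != NULL);
    if (!s->pending) {
        if (read_word(s) == NULL)
            return NULL;
        s->pending = 1;
    }
    return s->token;
}

/* Consume and return the next word, honoring a previous peek. */
static const char *la_get(Lookahead *s)
{
    assert(s != NULL);
    if (s->pending) {
        s->pending = 0;
        return s->token;
    }
    return read_word(s);
}
```

On the input "let x", two calls to la_peek() both return "let", the following la_get() returns "let" as well, and the next la_get() returns "x".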
void scan_flush(SCAN *s);
This function flushes the internal buffers up to the next newline, which is useful for recovering when your parser detects a parse error.
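The recovery idea behind scan_flush() can be illustrated on a plain FILE pointer. This simplified sketch assumes the newline itself is consumed so that parsing resumes at the start of the next line; the flush_line() helper is hypothetical, and the real function also has to discard its own buffered text.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical error-recovery helper: discard input up to and including
   the next newline (or EOF) so parsing can resume on the next line. */
static void flush_line(FILE *fp)
{
    int c;

    assert(fp != NULL);
    while ((c = getc(fp)) != EOF && c != '\n')
        ;  /* drop the rest of the offending line */
}
```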
void scan_destroy(SCAN *s);
This last function frees all memory associated with s and returns it to the system.
BUGS
Note that the caller does not "own" the memory returned by scan_get() and scan_peek(), since the return values point to internal buffers. Subsequent calls to these functions overwrite the old values. If you need a result to live longer, copy it to memory that you allocated yourself.
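The pitfall and its fix can be demonstrated without the package. In this sketch, next_word() is a hypothetical stand-in for scan_get() that returns a pointer into a static buffer, and copy_token() is a hypothetical helper that copies a token into caller-owned memory (standard C has no strdup() before C23).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for scan_get(): returns a pointer into a static
   buffer that the next call overwrites. */
static const char *next_word(void)
{
    static const char *words[] = { "alpha", "beta" };
    static char buffer[32];
    static size_t i = 0;

    strcpy(buffer, words[i++ % 2]);
    return buffer;
}

/* Copy a token into caller-owned memory; the caller must free() it.
   Returns NULL if the allocation fails. */
static char *copy_token(const char *tok)
{
    size_t len;
    char *copy;

    assert(tok != NULL);
    len = strlen(tok) + 1;
    copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, tok, len);
    return copy;
}
```

After a second call to next_word(), the pointer saved from the first call reads "beta", while the copy made with copy_token() still holds "alpha".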
AUTHOR
Gary William Flake (gary.flake@usa.net).
SEE ALSO
regex(3), fgets(3), getc(3).