Text File Processing

Five functions are provided for basic text file reading and writing: readLine, readLines, readLines!, writeLine, and writeLines. When the system reads a line from a text file, a carriage return, a newline, or a carriage return followed by a newline is treated as the line delimiter. When the system writes a line to a text file, a line delimiter is appended to the line. The delimiter written depends on the operating system: on Windows it is a carriage return followed by a newline; on other systems it is the newline character.

Read and Write a Single Line

The writeLine function writes a single line to the given file. Because the function automatically appends a line delimiter, the string should not end with one. If the operation succeeds, the function returns 1; otherwise, an IOException is raised. The readLine function reads a line from the given file. The returned line does not include the line delimiter. At the end of the file, the function returns a NULL object, which can be tested with the isVoid function. If the operation fails for any other reason, an IOException is raised.

$ x=`IBM`MSFT`GOOG`YHOO`ORCL
$ eachRight(writeLine, file("test.txt","w"), x)
$ fin = file("test.txt")
$ do{
$    x=fin.readLine()
$    if(x.isVoid()) break
$    print x
$ }while(true);

IBM
MSFT
GOOG
YHOO
ORCL
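
The return values described above can be checked with a short sketch like the one below. It assumes the same test.txt file and uses only the functions shown in this section; the variable names fout, ok, first, and second are illustrative, and the handle is closed explicitly before the file is read back.

$ fout = file("test.txt","w")
$ ok = fout.writeLine("IBM")      // per the description above, 1 on success; the string carries no trailing delimiter
$ fout.close()
$ fin = file("test.txt")
$ first = fin.readLine()          // the line just written, without the line delimiter
$ second = fin.readLine()         // nothing left to read, so a NULL object is expected
$ print second.isVoid()
$ fin.close()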

Read and Write Multiple Lines

The writeLines function writes multiple lines to the given file and automatically appends a line delimiter to each line. If the operation succeeds, the function returns the number of lines written; otherwise, an IOException is raised. The readLines function reads a specified number of lines from the file; the default is 1024. The function returns when either the end of the file is reached or the requested number of lines has been read, so a result with fewer lines than requested indicates that the end of the file has been reached. If the operation fails for any other reason, an IOException is raised.
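
As a minimal sketch of the end-of-file rule, the lines below write 5 lines and then read them back 3 at a time. The names fout, n, y, and z are illustrative, and the values noted in the comments follow from the description above rather than from a recorded run.

$ x=`IBM`MSFT`GOOG`YHOO`ORCL
$ fout = file("test.txt","w")
$ n = fout.writeLines(x)          // number of lines written; 5 is expected here
$ fout.close()
$ fin = file("test.txt")
$ y = fin.readLines(3)            // the first 3 lines
$ z = fin.readLines(3)            // only 2 lines remain, so z.size() < 3 signals the end of the file
$ fin.close()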

$ timer(10){
$    x=rand(`IBM`MSFT`GOOG`YHOO`ORCL,10240)
$    eachRight(writeLine, file("test.txt","w"),x)
$    fin = file("test.txt")
$    do{ y=fin.readLine() }while(!y.isNull())
$    fin.close()
$ };

Time elapsed: 271.035 ms

$ timer(10){
$    x=rand(`IBM`MSFT`GOOG`YHOO`ORCL,10240)
$    file("test.txt","w").writeLines(x)
$    fin = file("test.txt")
$    do{ y=fin.readLines(1024)}while(y.size()==1024)
$    fin.close()
$ };

Time elapsed: 33.503 ms

The example above compares the efficiency of single-line processing with multiple-line processing. The latter is about 8 times faster than the former (33.503 ms versus 271.035 ms). The readLines function creates a new string vector to return on every call. Creating a string vector takes time, so repeated calls can run faster if they reuse the same vector as a buffer. The readLines! function does this: it accepts an existing vector as the data holder and returns the number of lines read. The 2 examples below each read the same data 100 times. The readLines! version takes about 30% less time than the readLines version.

$ timer(100){
$    fin = file("test.txt")
$    do{ y=fin.readLines(1024) } while(y.size()==1024)
$    fin.close()
$ };

Time elapsed: 79.511 ms

$ timer(100){
$    fin = file("test.txt")
$    y=array(STRING,1024)
$    do{ lines = fin.readLines!(y,0,1024) } while(lines==1024)
$    fin.close()
$ };

Time elapsed: 56.034 ms
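
When the buffer is reused, only the lines read by the latest call are current; any later entries may still hold values from an earlier call. The sketch below counts the lines in test.txt under that assumption. The names buf, n, and total are illustrative, and the argument order of readLines! follows the example above (buffer, offset, number of lines).

$ fin = file("test.txt")
$ buf = array(STRING,1024)
$ total = 0
$ do{
$    n = fin.readLines!(buf,0,1024)   // number of lines placed in buf by this call
$    total = total + n                // only the first n entries of buf are current
$ } while(n==1024)
$ fin.close()
$ print total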