Super-basic file parsing in Go

I want to parse all the files in a folder starting with a certain prefix searching for the lines containing a certain string.

package main tells go that this source will be a main file, not a library/module. Then, we import the packages we need. If a package is not still needed, go will throw an error.

// or /* */ are, respectively, the one-line and multi-line comments.

package main

import (
	"bufio" // buffered input/output
	"compress/gzip" // compressed files handling
	"log" // logging
	"os" // OS related utilities
	"time" // timing
	"strings" // strings handling
	"path/filepath" // path handling
	"fmt" // basic printing
)

The main body of the program has to be in the main function. We would also like to know the time needed to execute the program so we use tGlob0 := time.Now() to keep track of the time we launch the program.
We also print a help message in case the number of arguments provided do not match the needed ones…. id they are wrong, googd luck!:P

func main() {
	
	helpMessage :=`Hi! To use this program you must provide
	1 - the path
	2 - the file prefix
	3 - the output file
	4 - the string to be searched.`
	
	if len(os.Args) < 5{
		fmt.Println(helpMessage)
		os.Exit(0)
	}

	tGlob0 := time.Now()
 

Go can create variables in two ways, by declaring them before or by assignement directly. To have a clearer view of what is going on I will declare the variables before except for tGlob0 and tGlob1. If you want to know the type of a variable, you can create it by assignement and then inspect its type with the reflect package, using reflect.TypeOf(variable).

    var inPath string
    var prefix string
    var searchString string
    var outFile string
    var inFiles []string
    var inFile string
    var extString []string
    var ext string
    var PID int 
    var (fileStruct *os.File
    fOut *os.File)
    var err error
    var fZip *gzip.Reader
    var nReader *bufio.Reader
    var read_line string
```        

The `os` package provides the tools to interact with the OS, so we can retrieve the process
PID and the CLI arguments.        
    
````go
	PID = os.Getpid()
	log.Println("Process PID is ", PID)	
	
	inPath = os.Args[1]
	prefix = os.Args[2]+"*"
	outFile = os.Args[3]
	searchString = os.Args[4]
	
	log.Println("Parsing files in folder ", inPath, " selecting ", os.Args[2])

The Glob function allows to search for the filenames matching a certain wildcard pattern.
defer is used to mark functions to be executed on function exit.

	inFiles, err = filepath.Glob(filepath.Join(inPath, prefix))
	if err != nil {
		panic(err)
	}
	
	log.Println("Searching for ", searchString, " in ", filepath.Join(inPath, prefix))
	log.Println("Creating output file ", outFile)
	
	fOut, err = os.Create(outFile)
	if err != nil {
		panic(err)
	}
	defer fOut.Close()

In the following piece of code is possible to see how to

  • work on strings
  • write a counter that updates
  • use the switch construct
  • make an assignement in the if construct
	log.Println("Starting main loop on file list of lenght ", len(inFiles))
	for fileIdx := range inFiles {
		
		inFile = inFiles[fileIdx]
		extString = strings.Split(inFile, ".")
		ext = extString[len(extString)-1]

		// Write an updating counter
		fmt.Print("Completed: ", 100. * fileIdx / len(inFiles), "% \r")
		
		// Creating file object
		if fileStruct, err = os.Open(inFile); err != nil {
			log.Fatal(os.Stderr, "%v, Can't open %s: error: %s\n", os.Args[0], inFile, err)
			os.Exit(1)
		}
		defer fileStruct.Close()
		
		switch ext {
			case "dat": {
				nReader = bufio.NewReader(fileStruct)
			}
			case "txt":{
				nReader = bufio.NewReader(fileStruct)
			}
			case "gz": {
				fZip, err = gzip.NewReader(fileStruct)
				if err != nil {
				log.Fatal(os.Stderr, "%v, Can't open %s: error: %s\n", os.Args[0], inFile, err)
				os.Exit(1)
				}
				nReader = bufio.NewReader(fZip)
			}
			default: {
				log.Fatal("Unrecognized file ", inFile)
			}
		}

And yes, no while but infinite loops with for.
Then we read the file line by line and write the line if we find a certain string inside it.

		for {
			if read_line, err = nReader.ReadString('\n'); err != nil {
			log.Println("Done reading file with err", err)
			break
			}
// 			if (strings.Contains(read_line, "name =") || strings.Contains(read_line, "i =")) {//&& strings.Contains(read_line, "<"){
				if strings.Contains(read_line, searchString){
				_, err = fOut.WriteString(read_line)
				}
		}
	
		// flush 
		fOut.Sync()
		fOut.Close()
	}
	
	
	log.Println()
	tGlob1 := time.Now()
	log.Println("Wall time for all ", tGlob1.Sub(tGlob0))
}
Tags// , , ,
More Reading
comments powered by Disqus