Syntax Highlighting

Syntax highlighting is essential in software development because it visually distinguishes program elements such as keywords, identifiers, literals, strings, comments, and matching brackets. Most IDEs support it: in VSCode, the Go language server gopls can identify the category of each token, and the color theme then maps each category to a color. Editors like Vim and Sublime also support syntax highlighting for many programming languages through plugins.

When using a debugger, we often need to examine source code, for example: 1) using the list command to view source, such as list main.main or list main.go:20, to read the code or decide where to add the next breakpoint; 2) when stepping through statements with next or instructions with step, showing the current source and instruction position; 3) when tracing the call stack with bt, showing the current function call stack. In all of these cases, highlighting the displayed source (function calls, statements, expressions, operands, operators, variable names, and type names) will undoubtedly improve readability.

In this section, we'll introduce how to achieve this.

How to Implement

What work needs to be done to implement syntax highlighting? If you've studied compiler principles, it should be easy to realize that we just need to implement a lexical analyzer to extract token sequences from the program, and use a syntax analyzer to analyze and identify what these tokens are and how they relate to each other - whether they form a function, an expression, or something as simple as defining a variable or a control flow statement. Once we can identify these elements, highlighting these different program constructs becomes straightforward.
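
As a small illustration of the lexical analysis step, here is a minimal sketch that uses the standard library's go/scanner to split a snippet into tokens and sort them into rough categories. The classify function and its "category:text" output format are hypothetical and exist only for this example; operators and delimiters are simply dropped.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// classify scans src and returns one "category:text" entry per keyword,
// comment, identifier, or basic literal. Other tokens are ignored.
func classify(src string) []string {
	fset := token.NewFileSet()
	file := fset.AddFile("demo.go", fset.Base(), len(src))

	var s scanner.Scanner
	s.Init(file, []byte(src), nil, scanner.ScanComments)

	var out []string
	for {
		_, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		switch {
		case tok.IsKeyword():
			out = append(out, "keyword:"+tok.String())
		case tok == token.COMMENT:
			out = append(out, "comment:"+lit)
		case tok.IsLiteral(): // identifiers and basic literals
			out = append(out, tok.String()+":"+lit)
		default:
			// operators, delimiters, and automatically inserted semicolons
		}
	}
	return out
}

func main() {
	for _, entry := range classify("var x = 42 // answer") {
		fmt.Println(entry)
	}
}
```

A lexer alone is enough to color keywords, literals, and comments; the parser becomes necessary once we want to know what role a token plays, for example whether an identifier names a function or a type.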

Hands-on Practice

Let's take Go as an example to discuss how to implement source code highlighting. Naturally, we don't want to reimplement tedious components like lexical analyzers and parsers, and we don't have the energy to redo this kind of work. Although flex and yacc can simplify these tasks, Go's standard library already provides the packages go/ast, go/parser, and go/token to handle syntax analysis for us. In this article, we'll demonstrate how to syntax highlight Go source code using these packages.
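
Before looking at the full implementation, here is a minimal sketch of the idea: parse a file with go/parser, walk the tree with ast.Inspect, and record the byte range of each element we care about. The span type, the collect function, and the demo source are hypothetical names for this example; only comments, basic literals, and the func keyword of function declarations are collected.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

const demoSrc = "package main\n\n// greet prints a greeting\nfunc greet() { println(\"hi\") }\n"

// span marks the byte range [start, end) of one element in the source.
type span struct {
	kind       string
	start, end int
}

// collect parses src and returns spans for comments, basic literals,
// and the "func" keyword of every function declaration.
func collect(src string) ([]span, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "demo.go", src, parser.ParseComments)
	if err != nil {
		return nil, err
	}
	// offset converts a token.Pos into a byte offset within src.
	offset := func(p token.Pos) int { return fset.Position(p).Offset }

	var spans []span
	for _, group := range f.Comments {
		for _, c := range group.List {
			spans = append(spans, span{"comment", offset(c.Pos()), offset(c.End())})
		}
	}
	ast.Inspect(f, func(n ast.Node) bool {
		switch n := n.(type) {
		case *ast.BasicLit:
			spans = append(spans, span{n.Kind.String(), offset(n.Pos()), offset(n.End())})
		case *ast.FuncDecl:
			// n.Type.Func holds the position of the "func" keyword.
			start := offset(n.Type.Func)
			spans = append(spans, span{"keyword", start, start + len("func")})
		}
		return true
	})
	return spans, nil
}

func main() {
	spans, err := collect(demoSrc)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, s := range spans {
		fmt.Printf("%-8s %q\n", s.kind, demoSrc[s.start:s.end])
	}
}
```

The AST stores only the start position of most keywords, which is why the sketch computes the end of func from the keyword's length.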

We'll design a package colorize that provides a colorize.Print(...) function to highlight a specified source file. The caller can specify the line number range, the line to mark with an arrow, the io.Writer to print to, and the highlighting color styles. We only need to write the following source files:

line_writer.go: Responsible for line-by-line output. Each chunk of text is written with a style, so combined with the token positions and the color escapes it can highlight different program constructs such as keywords, literals, and comments.
colorize.go: Responsible for reading the source file and performing AST analysis to extract the program constructs we want to highlight, such as the keywords package, var, and func, and constructing a colorTok for each (containing the token's position and category, where the category determines the final color style).
style.go: Defines the highlighting styles and maps token categories to them. The caller supplies the terminal color escape for each style.
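
To make this division of labor concrete, here is a minimal sketch of the final rendering step: given a list of positioned tokens and a map from style to escape sequence, wrap each token in its color and copy everything else unchanged. The span type, the string style names, and the render function are hypothetical simplifications, not the types used in the colorize package.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// span is a hypothetical stand-in for colorTok: a byte range plus a style name.
type span struct {
	start, end int // byte offsets into the source; end is exclusive
	style      string
}

const reset = "\x1b[0m" // ANSI escape that resets all attributes

// render wraps each span in the escape for its style and copies the text
// between spans unchanged. Invalid or overlapping spans are skipped.
func render(src string, spans []span, escapes map[string]string) string {
	sorted := append([]span(nil), spans...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].start < sorted[j].start })

	var b strings.Builder
	cur := 0
	for _, s := range sorted {
		if s.start < cur || s.start >= s.end || s.end > len(src) {
			continue
		}
		b.WriteString(src[cur:s.start]) // plain text before the token
		b.WriteString(escapes[s.style])
		b.WriteString(src[s.start:s.end])
		b.WriteString(reset)
		cur = s.end
	}
	b.WriteString(src[cur:]) // plain text after the last token
	return b.String()
}

func main() {
	escapes := map[string]string{
		"keyword": "\x1b[33m", // yellow
		"number":  "\x1b[96m", // bright cyan
	}
	spans := []span{{8, 9, "number"}, {0, 3, "keyword"}}
	fmt.Println(render("var x = 1", spans, escapes))
}
```

Keeping the style-to-escape map outside the renderer is what lets callers swap color themes, or pass no colors at all, without touching the analysis code.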

Below is the source code of the implementation. This code actually comes from go-delve/delve. While writing demos for debugger101, I found and fixed a bug in go-delve/delve, so I'm sharing the code here as a simple record. Not many people get the chance to do this.

file: colorize.go

// Package colorize uses AST analysis to find the different kinds of source
// elements, like keywords, literals, comments, etc., and colorize them.
//
// To highlight another kind of source element, for example identifiers:
// - first, a colorTok must be generated by `emit(token.IDENT, n.Pos(), n.End())` in colorize.go
// - second, token.IDENT must be mapped to a style in style.go
// - third, the color escape for that style must be defined in terminal.go
package colorize

import (
    "go/ast"
    "go/parser"
    "go/token"
    "io"
    "io/ioutil"
    "path/filepath"
    "reflect"
    "sort"
)

// Print prints to out a syntax highlighted version of the text read from
// path, between lines startLine and endLine.
func Print(out io.Writer, path string, startLine, endLine, arrowLine int, colorEscapes map[Style]string) error {
    buf, err := ioutil.ReadFile(path)
    if err != nil {
        return err
    }

    w := &lineWriter{w: out, lineRange: [2]int{startLine, endLine}, arrowLine: arrowLine, colorEscapes: colorEscapes}

    var fset token.FileSet
    f, err := parser.ParseFile(&fset, path, buf, parser.ParseComments)
    if err != nil {
        w.Write(NormalStyle, buf, true)
        return nil
    }

    var base int

    fset.Iterate(func(file *token.File) bool {
        base = file.Base()
        return false
    })

    type colorTok struct {
        tok        token.Token // the token type or ILLEGAL for keywords
        start, end int         // start and end positions of the token
    }

    toks := []colorTok{}

    emit := func(tok token.Token, start, end token.Pos) {
        if _, ok := tokenToStyle[tok]; !ok {
            return
        }
        start -= token.Pos(base)
        if end == token.NoPos {
            // end == token.NoPos means this is a keyword; find where it ends by scanning the file
            for end = start; end < token.Pos(len(buf)); end++ {
                if buf[end] < 'a' || buf[end] > 'z' {
                    break
                }
            }
        } else {
            end -= token.Pos(base)
        }
        if start < 0 || start >= end || end > token.Pos(len(buf)) {
            // invalid token?
            return
        }
        toks = append(toks, colorTok{tok, int(start), int(end)})
    }

    for _, cgrp := range f.Comments {
        for _, cmnt := range cgrp.List {
            emit(token.COMMENT, cmnt.Pos(), cmnt.End())
        }
    }

    ast.Inspect(f, func(n ast.Node) bool {
        if n == nil {
            return true
        }

        switch n := n.(type) {
        case *ast.File:
            emit(token.PACKAGE, f.Package, token.NoPos)
            return true
        case *ast.BasicLit:
            emit(n.Kind, n.Pos(), n.End())
            return true
        case *ast.Ident:
            // TODO(aarzilli): builtin functions? basic types?
            return true
        case *ast.IfStmt:
            emit(token.IF, n.If, token.NoPos)
            if n.Else != nil {
                for elsepos := int(n.Body.End()) - base; elsepos < len(buf)-4; elsepos++ {
                    if string(buf[elsepos:][:4]) == "else" {
                        emit(token.ELSE, token.Pos(elsepos+base), token.Pos(elsepos+base+4))
                        break
                    }
                }
            }
            return true
        }

        nval := reflect.ValueOf(n)
        if nval.Kind() != reflect.Ptr {
            return true
        }
        nval = nval.Elem()
        if nval.Kind() != reflect.Struct {
            return true
        }

        tokposval := nval.FieldByName("TokPos")
        tokval := nval.FieldByName("Tok")
        if tokposval.IsValid() && tokval.IsValid() {
            emit(tokval.Interface().(token.Token), tokposval.Interface().(token.Pos), token.NoPos)
        }

        for _, kwname := range []string{"Case", "Begin", "Defer", "Package", "For", "Func", "Go", "Interface", "Map", "Return", "Select", "Struct", "Switch"} {
            kwposval := nval.FieldByName(kwname)
            if kwposval.IsValid() {
                kwpos, ok := kwposval.Interface().(token.Pos)
                if ok && kwpos != token.NoPos {
                    emit(token.ILLEGAL, kwpos, token.NoPos)
                }
            }
        }

        return true
    })

    sort.Slice(toks, func(i, j int) bool { return toks[i].start < toks[j].start })

    flush := func(start, end int, style Style) {
        if start < end {
            w.Write(style, buf[start:end], end == len(buf))
        }
    }

    cur := 0
    for _, tok := range toks {
        flush(cur, tok.start, NormalStyle)
        flush(tok.start, tok.end, tokenToStyle[tok.tok])
        cur = tok.end
    }
    if cur != len(buf) {
        flush(cur, len(buf), NormalStyle)
    }

    return nil
}
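
The reflection step in Print is the least obvious part of the listing above: many AST node types store a keyword position in a token.Pos field named after the keyword (Return, For, Switch, and so on), so the code looks those fields up by name instead of writing one case per node type. Here is a minimal standalone sketch of that lookup. The posField and demo functions are hypothetical helpers for this example, and the nodes are built by hand rather than parsed.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/token"
	"reflect"
)

// posField looks up an exported token.Pos field called name on the struct
// that n points to. It reports false when n is not a pointer to a struct,
// or when the field is absent, has another type, or is unset (token.NoPos).
func posField(n ast.Node, name string) (token.Pos, bool) {
	v := reflect.ValueOf(n)
	if v.Kind() != reflect.Ptr || v.IsNil() {
		return token.NoPos, false
	}
	v = v.Elem()
	if v.Kind() != reflect.Struct {
		return token.NoPos, false
	}
	f := v.FieldByName(name)
	if !f.IsValid() {
		return token.NoPos, false
	}
	pos, ok := f.Interface().(token.Pos)
	if !ok || pos == token.NoPos {
		return token.NoPos, false
	}
	return pos, true
}

// demo runs posField over a few hand-built nodes and describes every hit.
func demo() []string {
	nodes := []ast.Node{
		&ast.ReturnStmt{Return: 10},
		&ast.GenDecl{TokPos: 5, Tok: token.VAR},
		&ast.Ident{NamePos: 1, Name: "x"},
	}
	var out []string
	for _, n := range nodes {
		for _, name := range []string{"Return", "TokPos", "Func"} {
			if pos, ok := posField(n, name); ok {
				out = append(out, fmt.Sprintf("%T.%s=%d", n, name, pos))
			}
		}
	}
	return out
}

func main() {
	for _, line := range demo() {
		fmt.Println(line)
	}
}
```

The trade-off is that the lookup depends on field naming conventions in go/ast rather than on a documented interface, so a node type that names its keyword field differently is silently skipped.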

file: style.go

package colorize

import "go/token"

// Style describes the style of a chunk of text.
type Style uint8

const (
    NormalStyle Style = iota
    KeywordStyle
    StringStyle
    NumberStyle
    CommentStyle
    LineNoStyle
    ArrowStyle
)

var tokenToStyle = map[token.Token]Style{
    token.ILLEGAL:     KeywordStyle,
    token.COMMENT:     CommentStyle,
    token.INT:         NumberStyle,
    token.FLOAT:       NumberStyle,
    token.IMAG:        NumberStyle,
    token.CHAR:        StringStyle,
    token.STRING:      StringStyle,
    token.BREAK:       KeywordStyle,
    token.CASE:        KeywordStyle,
    token.CHAN:        KeywordStyle,
    token.CONST:       KeywordStyle,
    token.CONTINUE:    KeywordStyle,
    token.DEFAULT:     KeywordStyle,
    token.DEFER:       KeywordStyle,
    token.ELSE:        KeywordStyle,
    token.FALLTHROUGH: KeywordStyle,
    token.FOR:         KeywordStyle,
    token.FUNC:        KeywordStyle,
    token.GO:          KeywordStyle,
    token.GOTO:        KeywordStyle,
    token.IF:          KeywordStyle,
    token.IMPORT:      KeywordStyle,
    token.INTERFACE:   KeywordStyle,
    token.MAP:         KeywordStyle,
    token.PACKAGE:     KeywordStyle,
    token.RANGE:       KeywordStyle,
    token.RETURN:      KeywordStyle,
    token.SELECT:      KeywordStyle,
    token.STRUCT:      KeywordStyle,
    token.SWITCH:      KeywordStyle,
    token.TYPE:        KeywordStyle,
    token.VAR:         KeywordStyle,
}

file: line_writer.go

package colorize

import (
    "fmt"
    "io"
)

type lineWriter struct {
    w         io.Writer
    lineRange [2]int
    arrowLine int

    curStyle Style
    started  bool
    lineno   int

    colorEscapes map[Style]string
}

func (w *lineWriter) style(style Style) {
    if w.colorEscapes == nil {
        return
    }
    esc := w.colorEscapes[style]
    if esc == "" {
        esc = w.colorEscapes[NormalStyle]
    }
    fmt.Fprintf(w.w, "%s", esc)
}

func (w *lineWriter) inrange() bool {
    lno := w.lineno
    if !w.started {
        lno = w.lineno + 1
    }
    return lno >= w.lineRange[0] && lno < w.lineRange[1]
}

func (w *lineWriter) nl() {
    w.lineno++
    if !w.inrange() || !w.started {
        return
    }
    w.style(ArrowStyle)
    if w.lineno == w.arrowLine {
        fmt.Fprintf(w.w, "=>")
    } else {
        fmt.Fprintf(w.w, "  ")
    }
    w.style(LineNoStyle)
    fmt.Fprintf(w.w, "%4d:\t", w.lineno)
    w.style(w.curStyle)
}

func (w *lineWriter) writeInternal(style Style, data []byte) {
    if !w.inrange() {
        return
    }

    if !w.started {
        w.started = true
        w.curStyle = style
        w.nl()
    } else if w.curStyle != style {
        w.curStyle = style
        w.style(w.curStyle)
    }

    w.w.Write(data)
}

func (w *lineWriter) Write(style Style, data []byte, last bool) {
    cur := 0
    for i := range data {
        if data[i] == '\n' {
            if last && i == len(data)-1 {
                w.writeInternal(style, data[cur:i])
                if w.curStyle != NormalStyle {
                    w.style(NormalStyle)
                }
                if w.inrange() {
                    w.w.Write([]byte{'\n'})
                }
                last = false
            } else {
                w.writeInternal(style, data[cur:i+1])
                w.nl()
            }
            cur = i + 1
        }
    }
    if cur < len(data) {
        w.writeInternal(style, data[cur:])
    }
    if last {
        if w.curStyle != NormalStyle {
            w.style(NormalStyle)
        }
        if w.inrange() {
            w.w.Write([]byte{'\n'})
        }
    }
}
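
The lineWriter above interleaves three concerns: the line range filter, the arrow and line number gutter, and style switching. As a simplified sketch of the first two, the hypothetical gutter function below works on whole lines instead of streamed chunks and ignores colors entirely. Like inrange, it treats endLine as exclusive.

```go
package main

import (
	"fmt"
	"strings"
)

// gutter returns the lines of src in [startLine, endLine), each prefixed with
// "=>" on arrowLine (two spaces otherwise) and a right-aligned line number.
func gutter(src string, startLine, endLine, arrowLine int) string {
	var b strings.Builder
	lines := strings.Split(strings.TrimSuffix(src, "\n"), "\n")
	for i, line := range lines {
		lineno := i + 1 // line numbers are 1-based
		if lineno < startLine || lineno >= endLine {
			continue
		}
		marker := "  "
		if lineno == arrowLine {
			marker = "=>"
		}
		fmt.Fprintf(&b, "%s%4d:\t%s\n", marker, lineno, line)
	}
	return b.String()
}

func main() {
	src := "package main\n\nfunc main() {\n\tprintln(\"hi\")\n}\n"
	fmt.Print(gutter(src, 3, 6, 4))
}
```

The real lineWriter cannot work this way because a single token, such as a multi-line comment, may span several lines, which is why it tracks the current style and re-emits it after every gutter.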

Running Tests

Below is the test file. We define a string representing the source code content, and use gomonkey to mock the ioutil.ReadFile(...) operation to return our defined source code string. Then we execute colorize.Print(...) to highlight it.

file: colorize_test.go

package colorize_test

import (
    "bytes"
    "fmt"
    "io/ioutil"
    "os"
    "testing"

    "github.com/agiledragon/gomonkey/v2"

    "github.com/hitzhangjie/dlv/pkg/terminal/colorize"
)

var src = `package main

// Vehicle defines the vehicle behavior
type Vehicle interface{
    // Run vehicle can run in a speed
    Run()
}

// BMWS1000RR defines the motocycle bmw s1000rr
type BMWS1000RR struct {
}

// Run bwm s1000rr run
func (a *BMWS1000RR) Run() {
    println("I can run at 300km/h")
}

func main() {
    var vehicle = &BMWS1000RR{}
    vehicle.Run()
}
`

const terminalHighlightEscapeCode string = "\033[%2dm"

const (
    ansiBlack     = 30
    ansiRed       = 31
    ansiGreen     = 32
    ansiYellow    = 33
    ansiBlue      = 34
    ansiMagenta   = 35
    ansiCyan      = 36
    ansiWhite     = 37
    ansiBrBlack   = 90
    ansiBrRed     = 91
    ansiBrGreen   = 92
    ansiBrYellow  = 93
    ansiBrBlue    = 94
    ansiBrMagenta = 95
    ansiBrCyan    = 96
    ansiBrWhite   = 97
)

func colorizeCode(code int) string {
    return fmt.Sprintf(terminalHighlightEscapeCode, code)
}

var colors = map[colorize.Style]string{
    colorize.KeywordStyle: colorizeCode(ansiYellow),
    colorize.ArrowStyle:   colorizeCode(ansiBlue),
    colorize.CommentStyle: colorizeCode(ansiGreen),
    colorize.LineNoStyle:  colorizeCode(ansiBrWhite),
    colorize.NormalStyle:  colorizeCode(ansiBrWhite),
    colorize.NumberStyle:  colorizeCode(ansiBrCyan),
    colorize.StringStyle:  colorizeCode(ansiBrBlue),
}

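func TestPrint(t *testing.T) {
    // Patch ioutil.ReadFile so colorize.Print reads src instead of a file on disk.
    p := gomonkey.ApplyFunc(ioutil.ReadFile, func(name string) ([]byte, error) {
        return []byte(src), nil
    })
    defer p.Reset()

    // Render into a buffer first to check that Print succeeds and emits output.
    buf := &bytes.Buffer{}
    if err := colorize.Print(buf, "main.go", 1, 30, 10, colors); err != nil {
        t.Fatalf("Print to buffer: %v", err)
    }
    if buf.Len() == 0 {
        t.Fatal("Print produced no output")
    }

    // Render to the terminal so the colors can be inspected visually.
    if err := colorize.Print(os.Stdout, "main.go", 1, 30, 10, colors); err != nil {
        t.Fatalf("Print to stdout: %v", err)
    }
}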

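Now run this test case with go test -run TestPrint. The program prints the source to the terminal with line numbers, an arrow marking line 10, and the configured colors applied. If the patched ioutil.ReadFile does not seem to take effect, try disabling inlining with -gcflags=all=-l, which gomonkey generally needs in order to patch functions.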

We can see that some program elements have been highlighted. Of course, we only identified a small subset of elements: keywords, strings, numbers, and comments. In practice, IDEs analyze code much more thoroughly.

Article Summary

This article briefly summarized how to perform syntax analysis and highlighting of source code based on Go's go/ast package. We hope readers can understand the key points covered here and recognize that compiler theory can be used to create valuable and interesting tools. For example, we can implement linters to check source code (like golangci-lint). The author previously wrote another article about visualizing Go programs; some IDEs also support automatically generating class diagrams, call graphs, and so on, which are other applications of Go AST analysis.

In the new year, let's strive together to be engineers who pursue deeper understanding, knowing both what and why :)
