# Quickstart
This parser is highly compatible with MySQL syntax. You can use it as a library to parse SQL text into an abstract syntax tree (AST) and traverse the AST nodes.
In this example, you will build a project that extracts all the column names from a SQL statement.
## Prerequisites
- [Golang](https://golang.org/dl/) version 1.13 or above. Follow the instructions on the official [installation page](https://golang.org/doc/install), then check the installed version with `go version`.
## Create a Project
```bash
mkdir colx && cd colx
go mod init colx && touch main.go
```
## Import Dependencies
First, use `go get` to fetch the dependencies by git hash. The git hashes are available on the [release page](https://github.com/pingcap/tidb/releases). Take `v5.3.0` as an example:
```bash
go get -v github.com/pingcap/tidb/parser@4a1b2e9
```
> **NOTE**
>
> The parser was merged into the TiDB repo in v5.3.0, so you can only choose v5.3.0 or higher from this repo.
>
> You may want to use the advanced API on expressions (a kind of AST node), such as numbers, string literals, booleans, and nulls. In that case, it is strongly recommended to use the `types` package in the TiDB repo, fetched with the following command:
>
> ```bash
> go get -v github.com/pingcap/tidb/types/parser_driver@4a1b2e9
> ```
> Then import it in your Go source code:
> ```go
> import _ "github.com/pingcap/tidb/types/parser_driver"
> ```
Your directory should contain the following three files:
```
.
├── go.mod
├── go.sum
└── main.go
```
Now, open `main.go` with your favorite editor, and start coding!
## Parse SQL Text
To convert SQL text to an AST, you need to:
1. Use the [`parser.New()`](https://pkg.go.dev/github.com/pingcap/tidb/parser?tab=doc#New) function to instantiate a parser, and
2. Invoke the method [`Parse(sql, charset, collation)`](https://pkg.go.dev/github.com/pingcap/tidb/parser?tab=doc#Parser.Parse) on the parser.
```go
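package main

import (
	"fmt"

	"github.com/pingcap/tidb/parser"
	"github.com/pingcap/tidb/parser/ast"
	_ "github.com/pingcap/tidb/parser/test_driver"
)

// parse returns the AST root of the first statement in sql.
func parse(sql string) (*ast.StmtNode, error) {
	p := parser.New()

	// Empty charset and collation select the default values.
	stmtNodes, _, err := p.Parse(sql, "", "")
	if err != nil {
		return nil, err
	}
	// Guard against input that contains no statement at all.
	if len(stmtNodes) == 0 {
		return nil, fmt.Errorf("no statement found")
	}

	return &stmtNodes[0], nil
}

func main() {
	astNode, err := parse("SELECT a, b FROM t")
	if err != nil {
		fmt.Printf("parse error: %v\n", err.Error())
		return
	}
	fmt.Printf("%v\n", *astNode)
}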
```
Test the parser by running the following command:
```bash
go run main.go
```
If the parser runs properly, you should get a result like this:
```
&{{{{SELECT a, b FROM t}}} {[]} 0xc0000a1980 false 0xc00000e7a0 <nil> 0xc0000a19b0 <nil> <nil> [] <nil> <nil> none [] false false 0 <nil>}
```
> **NOTE**
>
> Here are a few things you might want to know:
> - To use a parser, a `parser_driver` is required. It decides how to parse the basic data types in SQL.
>
>   You can use [`github.com/pingcap/tidb/parser/test_driver`](https://pkg.go.dev/github.com/pingcap/tidb/parser/test_driver) as the `parser_driver` for testing. Again, if you need advanced features, please use the `parser_driver` in TiDB (run `go get -v github.com/pingcap/tidb/types/parser_driver@4a1b2e9` and import it).
> - The instantiated parser object is not goroutine-safe. It is better to keep it in a single goroutine.
> - The instantiated parser object is not lightweight. It is better to reuse it if possible.
> - The 2nd and 3rd arguments of [`Parse()`](https://pkg.go.dev/github.com/pingcap/tidb/parser?tab=doc#Parser.Parse) are the charset and collation respectively. If you pass an empty string for either of them, a default value is chosen.
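One way to follow both pieces of advice (reuse the parser, but never share it between goroutines at the same time) is to keep instances in a `sync.Pool`. The sketch below only illustrates the pattern: `fakeParser`, `parserPool`, and `parseWithPool` are hypothetical names, and `fakeParser` is a stand-in so that the example runs without the TiDB dependency. In a real program the pool's `New` function would presumably return `parser.New()` instead.

```go
package main

import (
	"fmt"
	"sync"
)

// fakeParser is a hypothetical stand-in for the object returned by
// parser.New(). Like the real parser, it must not be used by two
// goroutines at once.
type fakeParser struct{ uses int }

// Parse pretends to parse sql and counts how often this instance was used.
func (p *fakeParser) Parse(sql string) string {
	p.uses++
	return "parsed: " + sql
}

// parserPool hands out idle parsers and creates a new one when none is free.
var parserPool = sync.Pool{
	New: func() interface{} { return &fakeParser{} },
}

// parseWithPool borrows a parser for one call and returns it afterwards,
// so each instance is only ever used by one goroutine at a time.
func parseWithPool(sql string) string {
	p := parserPool.Get().(*fakeParser)
	defer parserPool.Put(p)
	return p.Parse(sql)
}

func main() {
	var wg sync.WaitGroup
	results := make([]string, 4)
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = parseWithPool(fmt.Sprintf("SELECT %d", i))
		}(i)
	}
	wg.Wait()
	fmt.Println(results)
}
```

A pool avoids a global lock around a single parser while still amortizing the construction cost. Whether that trade-off pays off depends on how many goroutines parse concurrently.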
## Traverse AST Nodes
Now you have the root of the AST for a SQL statement. It is time to extract the column names by traversing the tree.
The parser implements the interface [`ast.Node`](https://pkg.go.dev/github.com/pingcap/tidb/parser/ast?tab=doc#Node) for each kind of AST node, such as `SelectStmt`, `TableName`, and `ColumnName`. [`ast.Node`](https://pkg.go.dev/github.com/pingcap/tidb/parser/ast?tab=doc#Node) provides a method `Accept(v Visitor) (node Node, ok bool)` that lets any struct implementing [`ast.Visitor`](https://pkg.go.dev/github.com/pingcap/tidb/parser/ast?tab=doc#Visitor) traverse the node and its children.
[`ast.Visitor`](https://pkg.go.dev/github.com/pingcap/tidb/parser/ast?tab=doc#Visitor) is defined as follows:
```go
type Visitor interface {
	Enter(n Node) (node Node, skipChildren bool)
	Leave(n Node) (node Node, ok bool)
}
```
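To make the calling order of `Enter` and `Leave` concrete, here is a toy model of the pattern that runs without the parser library. It is a sketch, not the library's code: `toyNode`, `tracer`, and `trace` are hypothetical names, and the `Accept` method shown here is an assumption about how a node typically drives a visitor (call `Enter`, visit the children unless `skipChildren` is true, then call `Leave`).

```go
package main

import "fmt"

// Node and Visitor mirror the shape of ast.Node and ast.Visitor in a
// simplified form.
type Node interface {
	Accept(v Visitor) (Node, bool)
}

type Visitor interface {
	Enter(n Node) (node Node, skipChildren bool)
	Leave(n Node) (node Node, ok bool)
}

// toyNode is a hypothetical node with a label and child nodes.
type toyNode struct {
	label    string
	children []*toyNode
}

// Accept calls Enter, then visits the children unless the visitor asked
// to skip them, and finally calls Leave.
func (n *toyNode) Accept(v Visitor) (Node, bool) {
	newNode, skipChildren := v.Enter(n)
	if skipChildren {
		return v.Leave(newNode)
	}
	for _, c := range n.children {
		if _, ok := c.Accept(v); !ok {
			return n, false
		}
	}
	return v.Leave(n)
}

// tracer records every Enter and Leave call and skips the children of
// the node whose label equals skip.
type tracer struct {
	events []string
	skip   string
}

func (t *tracer) Enter(n Node) (Node, bool) {
	label := n.(*toyNode).label
	t.events = append(t.events, "enter "+label)
	return n, label == t.skip
}

func (t *tracer) Leave(n Node) (Node, bool) {
	t.events = append(t.events, "leave "+n.(*toyNode).label)
	return n, true
}

// trace walks the tree and returns the recorded events in order.
func trace(root *toyNode, skip string) []string {
	t := &tracer{skip: skip}
	root.Accept(t)
	return t.events
}

func main() {
	tree := &toyNode{label: "select", children: []*toyNode{
		{label: "fields", children: []*toyNode{{label: "a"}, {label: "b"}}},
		{label: "from"},
	}}
	fmt.Println(trace(tree, ""))       // full depth-first walk
	fmt.Println(trace(tree, "fields")) // children of "fields" are skipped
}
```

In the first walk, every node is entered before its children and left after them. In the second walk, `a` and `b` never appear because `Enter` returned `skipChildren == true` for their parent.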
Now you can define your own visitor, `colX` (column extractor):
```go
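// colX collects the name of every column it visits.
type colX struct {
	colNames []string
}

// Enter records column names and always descends into child nodes.
func (v *colX) Enter(in ast.Node) (ast.Node, bool) {
	if name, ok := in.(*ast.ColumnName); ok {
		v.colNames = append(v.colNames, name.Name.O)
	}
	return in, false
}

// Leave returns true so that the traversal continues.
func (v *colX) Leave(in ast.Node) (ast.Node, bool) {
	return in, true
}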
```
Finally, wrap `colX` in a simple function:
```go
func extract(rootNode *ast.StmtNode) []string {
	v := &colX{}
	(*rootNode).Accept(v)
	return v.colNames
}
```
And slightly modify the main function (remember to add `"os"` to the import list):
```go
func main() {
	if len(os.Args) != 2 {
		fmt.Println("usage: colx 'SQL statement'")
		return
	}
	sql := os.Args[1]
	astNode, err := parse(sql)
	if err != nil {
		fmt.Printf("parse error: %v\n", err.Error())
		return
	}
	fmt.Printf("%v\n", extract(astNode))
}
```
Test your program:
```bash
go build && ./colx 'select a, b from t'
```
```
[a b]
```
You can also try a different SQL statement as an input. For example:
```console
$ ./colx 'SELECT a, b FROM t GROUP BY (a, b) HAVING a > c ORDER BY b'
[a b a b a c b]
```
If necessary, you can deduplicate the result yourself.
```console
$ ./colx 'SELECT a, b FROM t/invalid_str'
parse error: line 1 column 19 near "/invalid_str"
```
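The repeated names in the `GROUP BY` example above can be collapsed with a small helper. This is a minimal sketch that uses only the standard library. `dedupe` is a hypothetical name, and the comparison is case-sensitive, which may or may not be what you want for column names.

```go
package main

import "fmt"

// dedupe returns names with duplicates removed, keeping the order in
// which each name was first seen.
func dedupe(names []string) []string {
	seen := make(map[string]struct{}, len(names))
	out := make([]string, 0, len(names))
	for _, n := range names {
		if _, ok := seen[n]; ok {
			continue // already emitted
		}
		seen[n] = struct{}{}
		out = append(out, n)
	}
	return out
}

func main() {
	// The column list printed by the GROUP BY example.
	fmt.Println(dedupe([]string{"a", "b", "a", "b", "a", "c", "b"})) // prints [a b c]
}
```

To use it in `colx`, you could wrap the call in `main` as `dedupe(extract(astNode))`.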
Enjoy!