A Go library that converts the sql.Rows result of a query into a CSV.GZIP file and/or uploads it to AWS S3.
- Multi-threaded Gzip compression
- Concurrent multipart S3 uploads
- Upload retries for resiliency
- Uploading to S3 does not require local storage.
- Consistent memory, CPU, and network usage irrespective of the number of sql.Rows.
go get github.com/thatInfrastructureGuy/sqltocsvgzip@v0.0.13
Note: Please do not use the master branch; it may contain breaking changes. Use tags instead.
- Uploading to S3
  - One-Liner: UploadToS3(rows)
  - Set up config: UploadConfig(rows) + Upload()
- Writing to File
  - One-Liner: WriteFile(filename, rows)
  - Set up config: WriteConfig(rows) + WriteFile(filename)
Note: Check out the examples directory for more examples.
Importing the package
import (
"database/sql"
_ "github.com/go-sql-driver/mysql" // or the driver of your choice
"github.com/thatinfrastructureguy/sqltocsvgzip"
)
- Dumping a query to a file
rows, _ := db.Query("SELECT * FROM users WHERE something=72")
err := sqltocsvgzip.WriteFile("~/important_user_report.csv.gzip", rows)
if err != nil {
panic(err)
}
- Upload to AWS S3 with env vars
rows, _ := db.Query("SELECT * FROM users WHERE something=72")
// UploadToS3 looks for the following environment variables.
// Required: S3_BUCKET, S3_PATH, S3_REGION
// Optional: S3_ACL (default => bucket-owner-full-control)
err := sqltocsvgzip.UploadToS3(rows)
if err != nil {
panic(err)
}
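- Setting the S3 environment variables programmatically (a minimal sketch for local testing; the variable names come from the list above, and in production they would normally be set in the deployment environment rather than in code)

// Requires the standard library "os" package.
os.Setenv("S3_BUCKET", "mybucket")
os.Setenv("S3_PATH", "/myfolder/file.csv.gzip")
os.Setenv("S3_REGION", "us-west-1")
// S3_ACL is optional and defaults to bucket-owner-full-control.

rows, _ := db.Query("SELECT * FROM users WHERE something=72")
err := sqltocsvgzip.UploadToS3(rows)
if err != nil {
    panic(err)
}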
- Upload to AWS S3 without environment variables
rows, _ := db.Query("SELECT * FROM users WHERE something=72")
config := sqltocsvgzip.UploadConfig(rows)
config.S3Bucket = "mybucket"
config.S3Path = "/myfolder/file.csv.gzip"
config.S3Region = "us-west-1"
err := config.Upload()
if err != nil {
panic(err)
}
- Return a query as a GZIP download on the world wide web
http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
rows, err := db.Query("SELECT * FROM users WHERE something=72")
if err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
return
}
defer rows.Close()
w.Header().Set("Content-Type", "application/javascript")
w.Header().Set("Content-Encoding", "gzip")
w.Header().Set("Content-Disposition", "attachment; filename=\"report.csv.gzip\"")
config := sqltocsvgzip.WriteConfig(rows)
err = config.Write(w)
if err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
return
}
})
http.ListenAndServe(":8080", nil)
If you need more flexibility, you can get a config instance and fiddle with a few settings.
rows, _ := db.Query("SELECT * FROM users WHERE something=72")
config := sqltocsvgzip.WriteConfig(rows)
config.TimeFormat = time.RFC822
cols, _ := rows.Columns()
config.Headers = append(cols, "extra_column_one", "extra_column_two")
config.SetRowPreProcessor(func(columns []string) (bool, []string) {
// exclude admins from report
// NOTE: this is a dumb example because "where role != 'admin'" is better
// but every now and then you need to exclude stuff because of calculations
// that are a pain in sql and this is a contrived README
if columns[3] == "admin" {
return false, []string{}
}
extra_column_one := generateSomethingHypotheticalFromColumn(columns[2])
extra_column_two := lookupSomeApiThingForColumn(columns[4])
return true, append(columns, extra_column_one, extra_column_two)
})
config.WriteFile("~/important_user_report.csv.gzip")
- 10MB default CSV buffer size.
- 50MB default gzip buffer size.
- Gzipping: runtime.GOMAXPROCS(0) goroutines by default, with 512KB of data per goroutine.
- 4 concurrent upload goroutines.
- PartUploadSize must be greater than 5MB (the minimum part size allowed by AWS S3 multipart uploads).
- AWS allows a maximum of 10,000 parts per multipart upload. Hence, the default settings support up to 50MB x 10,000 = 500GB of gzipped data.
  - Increase the buffer size if you want fewer parts or need to upload more than 500GB of gzipped data; see the tuning sketch below this list.
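- Tuning the defaults (a minimal sketch; the field names UploadPartSize, UploadThreads, and GzipGoroutines are assumptions based on the defaults listed above and are not confirmed by this README, so verify them against the package's Converter type before relying on them)

rows, _ := db.Query("SELECT * FROM users WHERE something=72")
config := sqltocsvgzip.UploadConfig(rows)
config.S3Bucket = "mybucket"
config.S3Path = "/myfolder/file.csv.gzip"
config.S3Region = "us-west-1"

// Assumed field names -- check the Converter struct for the exact API and units.
config.UploadPartSize = 100 * 1024 * 1024 // larger parts (assumed bytes): fewer parts, more total data per upload
config.UploadThreads = 8                  // more concurrent part uploads than the default of 4
config.GzipGoroutines = 2                 // fewer compression goroutines on small machines

err := config.Upload()
if err != nil {
    panic(err)
}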
- Currently supports uploading only to storage compatible with the AWS S3 API.
- Minimum system requirements:
  - CPU: 2 vCPUs
  - Memory: 500MB (depends on the size of the SQL data)
  - Disk: only needed if you're writing to a file locally (must be larger than the resulting gzip file)
Credits: