Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: NewWriter taking up too much memory #3903

Closed
jherman opened this issue Apr 8, 2021 · 10 comments
Closed

storage: NewWriter taking up too much memory #3903

jherman opened this issue Apr 8, 2021 · 10 comments
Assignees
Labels
api: storage Issues related to the Cloud Storage API. type: question Request for information or clarification. Not an issue.

Comments

@jherman
Copy link

jherman commented Apr 8, 2021

Client

Cloud Function

Environment

Cloud Function on GCP

Go Environment

$ go version

go version go1.15.10 windows/amd64

$ go env

set GO111MODULE=
set GOARCH=amd64
set GOBIN=
set GOCACHE=C:\Users\aaa\AppData\Local\go-build
set GOENV=C:\Users\aaa\AppData\Roaming\go\env  
set GOEXE=.exe
set GOFLAGS=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOINSECURE=
set GOMODCACHE=C:\Users\aaa\go\pkg\mod
set GONOPROXY=bitbucket.org/aaa
set GONOSUMDB=bitbucket.org/aaa
set GOOS=windows
set GOPATH=C:\Users\aaa\go
set GOPRIVATE=bitbucket.org/aaa
set GOPROXY=https://proxy.golang.org,direct
set GOROOT=c:\go
set GOSUMDB=sum.golang.org
set GOTMPDIR=
set GOTOOLDIR=c:\go\pkg\tool\windows_amd64
set GCCGO=gccgo
set AR=ar
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set GOMOD=C:\repos\aaa
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -mthreads -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=C:\Users\aaa\AppData\Local\Temp\go-build986707378=/tmp/go-build -gno-record-gcc-switches

Code

e.g.

package storage_test

import (
	"context"
	"fmt"
	"io/ioutil"
	"os"
	"runtime"
	"runtime/debug"
	"testing"

	"cloud.google.com/go/storage"
	"google.golang.org/api/option"
)

func TestUpload(t *testing.T) {
	setEnv()

	var (
		m1  runtime.MemStats
		m2  runtime.MemStats
		m3  runtime.MemStats
		m4  runtime.MemStats
		ctx = context.Background()
	)

	runtime.ReadMemStats(&m1)
	fmt.Printf("M1: %d MB\n", m1.TotalAlloc/1024/1024)

	op := option.WithScopes(storage.ScopeFullControl)

	client, _ := storage.NewClient(ctx, op)

	runtime.ReadMemStats(&m2)
	fmt.Printf("M2: %d MB\n", m2.TotalAlloc/1024/1024)

	for i := 0; i < 10; i++ {
		wc := client.Bucket("my-bucket").Object("my-path").NewWriter(ctx)
		wc.Close()
	}

	runtime.ReadMemStats(&m3)
	fmt.Printf("M3: %d MB\n", m3.TotalAlloc/1024/1024)

	runtime.GC()
	debug.FreeOSMemory()

	runtime.ReadMemStats(&m4)
	fmt.Printf("M4: %d MB\n", m4.TotalAlloc/1024/1024)
}

OUTPUT:

API server listening at: 127.0.0.1:28446
M1: 1 MB
M2: 1 MB
M3: 162 MB
M4: 162 MB
PASS

Note that this code example is the sake of brevity.

Expected behavior

Free up memory after closing NewWriter instance

Actual behavior

Each NewWriter instance takes about 16-18MB

This problem is exacerbated when used in a Cloud Function. In my specific case, I have to write many files to storage, with each TextWriter taking up this much memory, the Cloud Function get's interrupted because memory is quickly exceeded. How does one reclaim the memory after closing a NewWriter?

@jherman jherman added the triage me I really want to be triaged. label Apr 8, 2021
@codyoss codyoss changed the title cloud.google.com/go/storage: NewWriter taking up too much memory storage: NewWriter taking up too much memory Apr 8, 2021
@codyoss codyoss added the api: storage Issues related to the Cloud Storage API. label Apr 8, 2021
@codyoss
Copy link
Member

codyoss commented Apr 8, 2021

Likely related to googleapis/google-api-go-client#638

@jherman
Copy link
Author

jherman commented Apr 8, 2021

Likely related to googleapis/google-api-go-client#638

Ah, definitely @codyoss. Looks like it's been open for quite awhile. Any updates on the situation?

@tritone
Copy link
Contributor

tritone commented Apr 8, 2021

Hi @jherman , yes I need to circle back to that PR to allow buffer management by end users. However, you can already control how much memory is used for each Writer using Writer.ChunkSize. How large are the files you're uploading? Especially if you're uploading small files, you definitely want to set this to a smaller value.

@jherman
Copy link
Author

jherman commented Apr 8, 2021

Hi @jherman , yes I need to circle back to that PR to allow buffer management by end users. However, you can already control how much memory is used for each Writer using Writer.ChunkSize. How large are the files you're uploading? Especially if you're uploading small files, you definitely want to set this to a smaller value.

Hi @tritone, thanks for the tip. My files are only ~17KB. I'll read up on the ChunkSize. Can you recommend where to start?

EDIT:

OK, just realized this probably doesn't help me. Running the following line:

wc := client.Bucket("my-bucket").Object("my-path").NewWriter(ctx)

will immediately cause the memory allocation to go up, so specifying wc.ChunkSize would already be too late, no?

@tritone
Copy link
Contributor

tritone commented Apr 8, 2021

For the Go library specifically, we just have the docs for Writer that I linked to, plus cross-library documentation here which can give more intuition for how this is interpreted overall.

For your use case, I would set ChunkSize to the minimum upload ChunkSize of 256k; this will allow the library to send the metadata and media in a single multipart upload.

Did you see experimentally that NewWriter causes the allocation? If so that would be a bug; it's not expected that the buffer would be allocated until the Writer has been opened.

I'd try the following:

wc := client.Bucket("my-bucket").Object("my-path").NewWriter(ctx)
wc.ChunkSize = 256 * 1024
// use wc to upload object
wc.Close()

What does the allocation look like when you do this?

@jherman
Copy link
Author

jherman commented Apr 8, 2021

@tritone, yes, it's NewWriter that is causing the allocation. Check out my sample above. I print out memory allocations before and after, only instantiating the NewWriter instance and immediately closing it (thinking closing would help but doesn't). I think you can directly copy & paste that code above, and run it and see this for yourself as well. Is this a bug?

@tritone
Copy link
Contributor

tritone commented Apr 8, 2021

It would be-- let me verify that I can reproduce locally and report back.

@tritone
Copy link
Contributor

tritone commented Apr 8, 2021

Yeah, so it's being allocated upon wc.Close(), not on NewWriter. I made a tweak to your example to use ChunkSize and it behaved as expected.

package main

import (
	"context"
	"fmt"
	"runtime"
	"runtime/debug"

	"cloud.google.com/go/storage"
	"google.golang.org/api/option"
)



func main() {

	var (
		m1  runtime.MemStats
		m2  runtime.MemStats
		m3  runtime.MemStats
		m4  runtime.MemStats
		ctx = context.Background()
	)

	runtime.ReadMemStats(&m1)
	fmt.Printf("M1: %d MB\n", m1.TotalAlloc/1024/1024)

	op := option.WithScopes(storage.ScopeFullControl)

	client, _ := storage.NewClient(ctx, op)

	runtime.ReadMemStats(&m2)
	fmt.Printf("M2: %d MB\n", m2.TotalAlloc/1024/1024)

	for i := 0; i < 10; i++ {
		wc := client.Bucket("my-bucket").Object("my-path").NewWriter(ctx)
		wc.ChunkSize = 256 * 1024
		wc.Close()
	}

	runtime.ReadMemStats(&m3)
	fmt.Printf("M3: %d MB\n", m3.TotalAlloc/1024/1024)

	runtime.GC()
	debug.FreeOSMemory()

	runtime.ReadMemStats(&m4)
	fmt.Printf("M4: %d MB\n", m4.TotalAlloc/1024/1024)
}

Memory allocated:

M1: 1 MB
M2: 1 MB
M3: 9 MB
M4: 9 MB

@jherman
Copy link
Author

jherman commented Apr 8, 2021

Ah, that's excellent news @tritone. That helps tremendously, thank you!

Also sorry about wasting your time on the reproduction - I could have sworn it was doing it after the instantiating.

@tritone
Copy link
Contributor

tritone commented Apr 8, 2021

No worries, glad it helps!

I'm going to close out this issue for now-- hopefully adjusting ChunkSize will be sufficient for your use case but we do still want to add the ability to implement buffer pooling as well (this becomes more important when managing workloads with larger objects).

@tritone tritone added type: question Request for information or clarification. Not an issue. and removed triage me I really want to be triaged. labels Apr 8, 2021
@tritone tritone closed this as completed Apr 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: storage Issues related to the Cloud Storage API. type: question Request for information or clarification. Not an issue.
Projects
None yet
Development

No branches or pull requests

3 participants