Skip to content
This repository was archived by the owner on Oct 13, 2023. It is now read-only.

[17.06] Graceful upgrade of containerd and runc state files upon live-restore#117

Merged
vieux merged 1 commit into
docker-archive:17.06from
tiborvass:fix-live-restore
Jul 24, 2017
Merged

[17.06] Graceful upgrade of containerd and runc state files upon live-restore#117
vieux merged 1 commit into
docker-archive:17.06from
tiborvass:fix-live-restore

Conversation

@tiborvass

@tiborvass tiborvass commented Jul 13, 2017

Copy link
Copy Markdown
Contributor

To address issue:

@andrewhsu andrewhsu added this to the 17.06.1 milestone Jul 13, 2017
return fmt.Errorf("container %s is already active", containerID)
}

if errUpgrade := v17_06_1.Upgrade(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by this time, it's too late if I'm not mistaken.

containerd on restart tries to load all of those files and would have failed to unmarshal them

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, this is not supposed to be here... only the other Upgrade() call. Will remove.

ids[i] = fi.Name()
}
f.Close()
for _, id := range ids {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should upgrade all the entries, we should only upgrade the one which ids are found in /run/docker/libcontainerd/containerd

Nothing prevent the user to having different version of containerd/runc on the machine and we don't want to break the containers started with those.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this. Will update.

@andrewhsu

Copy link
Copy Markdown
Contributor

In latest build result:

00:27:05.307 FAIL: docker_cli_daemon_test.go:2813: DockerDaemonSuite.TestExecWithUserAfterLiveRestore
00:27:05.307 
00:27:05.307 [dd68c05e3b6f9] waiting for daemon to start
00:27:05.307 [dd68c05e3b6f9] daemon started
00:27:05.307 
00:27:05.307 [dd68c05e3b6f9] exiting daemon
00:27:05.307 [dd68c05e3b6f9] waiting for daemon to start
00:27:05.307 [dd68c05e3b6f9] daemon started
00:27:05.307 
00:27:05.307 docker_cli_daemon_test.go:2834:
00:27:05.307     c.Assert(err, check.IsNil, check.Commentf("Output: %s", out2))
00:27:05.307 ... value *exec.ExitError = &exec.ExitError{ProcessState:(*os.ProcessState)(0xc42054f1e0), Stderr:[]uint8(nil)} ("exit status 126")
00:27:05.307 ... Output: invalid character 's' after top-level value
00:27:05.307 
00:27:05.307 
00:27:05.307 [dd68c05e3b6f9] exiting daemon

@tiborvass

Copy link
Copy Markdown
Contributor Author

Ping @mlaventure

MountLabel string `json:"mount_label"`
Hostname string `json:"hostname"`
Namespaces configs.Namespaces `json:"namespaces"`
Capabilities linuxCapabilities `json:"capabilities"`

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mlaventure just realized that linuxCapabilities uses specs.LinuxCapabilities (https://github.com/docker/docker-ce/pull/117/files#diff-40363fc1cc8aab85c96e0f6d4e7d4315R97) while this is supposed to be configs.Capabilities, which thankfully happens to be the same.

Maybe it's fine for this version and we can think of something more accurate after?

if err != nil {
return err
}
// TODO: copy caps or not copy caps?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mlaventure any thoughts on this?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wdym by copy?

@tiborvass tiborvass Jul 17, 2017

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mlaventure I mean:

copy(boundingCaps, caps)
copy(effectiveCaps, caps)
...

Otherwise, l.V.Bounding[i] == l.V.Effective[i], and if someone writes code to change an element of Bounding it will change all the other ones too, unwanted side-effect. Although it's only in very special cases, where the same underlying array is used.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the case currently in the daemon code too: https://github.com/moby/moby/blob/master/daemon/oci_linux.go#L258-L266

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mlaventure okay thanks, so let's keep it orthogonal.

@andrewhsu

Copy link
Copy Markdown
Contributor

@tiborvass

Copy link
Copy Markdown
Contributor Author

@mlaventure I pushed a better version, that does all the decoding first, before overwriting the files. This is so that if one of the later files fails to be decoded, the previous ones keep their initial state and we don't end up in a mixed state.

@tiborvass

tiborvass commented Jul 18, 2017

Copy link
Copy Markdown
Contributor Author

And now it also encodes into a bytes.Buffer before writing to file, just in case the reencoding phase fails (not just the decoding).

I hope the extra buffering doesn't become an issue when there's a lot of containers.

@tiborvass

Copy link
Copy Markdown
Contributor Author

Ping @mlaventure

@stevvooe

Copy link
Copy Markdown
Contributor

LGTM after moving to atomic file write.

@mlaventure

Copy link
Copy Markdown
Contributor

I agree with @stevvooe it's better to write to a tmp file, then rename it.

…e-restore

Vendors new dependency github.com/crosbymichael/upgrade

Signed-off-by: Tibor Vass <tibor@docker.com>
@andrewhsu

Copy link
Copy Markdown
Contributor

LGTM

I've tested this PR by building a deb package and upgrading from a docker-ce 17.03 daemon that had a container started with live-restore. After the upgrade, container still seen as active from docker ps.

@crosbymichael

Copy link
Copy Markdown
Contributor

LGTM

defer f.w.Close()
}
var errs []string
for _, f := range files {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so this sucks everything up into memory, creates an atomic file for each, then writes the buffer.

@stevvooe

Copy link
Copy Markdown
Contributor

LGTM

@vieux vieux merged commit 8bdf679 into docker-archive:17.06 Jul 24, 2017
@tiborvass

Copy link
Copy Markdown
Contributor Author

@vieux Line for changelog: Fix issue upon upgrade, preventing docker from showing running containers when --live-restore is enabled

@andrewhsu

Copy link
Copy Markdown
Contributor

@tiborvass @vieux I've also verified this PR works on upgrade from docker-ce 17.05.0.

@andrewhsu

Copy link
Copy Markdown
Contributor

Note to selves: after conversation with @tiborvass may need to look into having all 3 json files updated atomically. As this PR stands, looks good to have containers still running with live restore.

@vieux vieux changed the title [engine] Graceful upgrade of containerd and runc state files upon live-restore [17.06] Graceful upgrade of containerd and runc state files upon live-restore Jul 24, 2017
docker-jenkins pushed a commit that referenced this pull request Jul 3, 2018
[18.06] update package descriptions and info
Upstream-commit: 2baa88e2bcf7c451ea8405fa54948c07f722cf9e
Component: packaging
docker-jenkins pushed a commit that referenced this pull request Mar 19, 2019
fixes #117

Print healthcheck information in pretty mode.

Signed-off-by: Stephane Jeandeaux <stephane.jeandeaux@gmail.com>
Upstream-commit: 05674a50961d99eaf5320f32beb91646fd1355b4
Component: cli
docker-jenkins pushed a commit that referenced this pull request Mar 19, 2019
Signed-off-by: Stephane Jeandeaux <stephane.jeandeaux@gmail.com>
Upstream-commit: d4ad7a94d217a8925923dd8e232fd949b3ec1bd4
Component: cli
silvin-lubecki pushed a commit to silvin-lubecki/docker-ce that referenced this pull request Feb 3, 2020
[engine] Graceful upgrade of containerd and runc state files upon live-restore
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants