Permissive forwarding rule leads to unintentional exposure of containers (2021) (gist.github.com)
234 points by password4321 on June 22, 2022 | 175 comments



In context, https://github.com/moby/moby/issues/22054#issuecomment-96220... and following:

> Docker containers in the 172.16.0.0/12 ip range are definitely reachable from external hosts. All it requires is a custom route on the attacker's machine that directs traffic bound for 172.16.0.0/12 through the victim's machine.

> Notice that the common advice of binding the published port to 127.0.0.1 does not protect the container from external access.

...

> You don't even have to --publish.
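
For concreteness, a minimal sketch of the attacker side (addresses are purely illustrative: 203.0.113.50 stands in for the victim host, 172.17.0.2 for a container behind it):

    # route the container subnet through the victim host, then talk to a container directly
    ip route add 172.16.0.0/12 via 203.0.113.50
    curl http://172.17.0.2:80/
This only works if the reply path reaches the attacker, which in practice means being on the same network segment as the victim.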


Would this attack require having control of a machine on the same local network? Otherwise how would you route that packet to the target machine?


Right, you aren't going to send that packet over the Internet, and you'll only get it there through local routers that have the routes you need, which should not be common.


Might be common with IoT devices, cloud network equipment etc. :|


Yes, I believe so, although you need “privileged network access” on the box - meaning you can craft random IP packets, not just “control” of the box. So on the LAN or otherwise in-path for the route for that subnet’s responses.


Aren't you just crafting a packet to a 172.17.0.0/16 address on port 80? That requires no special permission whatsoever.


If your compromised box has a route already for that subnet and machine yes. That would be odd though I think.


yes


Eh, isn't the real story then that Linux will route random incoming traffic on any interface? Even if you haven't configured it as a router?

I know whoever was in charge of configuring Docker's iptables routes should have known this and messed up, but that is fucked up.


I'm a bit confused; as far as I know Linux only forwards packets on interfaces where you've enabled routing, doesn't it?

If you set up:

    net.ipv4.ip_forward=0
    net.ipv4.conf.eth0.forwarding=0
    net.ipv4.conf.docker0.forwarding=1
I'm pretty sure this isn't an issue, right?

My guess is that the real real story is probably that every guide on the internet says to just set net.ipv4.ip_forward=1 and that nobody bothers to stop and read up on the sysctl parameters they're copy/pasting from the internet.

For this attack to succeed, the attacker also needs to be on the same network or have their upstream ISPs accept plain external traffic towards internal networks. If you execute the PoC on Linux without being in the same subnet, the packets won't even be accepted (though raw sockets may still send traffic towards the host, which will probably get filtered somewhere along the way).


This doesn't work. Breaks traffic out (below) and in.

    # sysctl net.ipv4.ip_forward net.ipv4.conf.enp8s0.forwarding net.ipv4.conf.docker0.forwarding
    net.ipv4.ip_forward = 1
    net.ipv4.conf.enp8s0.forwarding = 1
    net.ipv4.conf.docker0.forwarding = 1
    # docker run --rm alpine/curl -sI --max-time 5 http://1.1.1.1 | head -n2
    HTTP/1.1 301 Moved Permanently
    Server: cloudflare
    # sysctl net.ipv4.conf.enp8s0.forwarding=0
    net.ipv4.conf.enp8s0.forwarding = 0
    # docker run --rm alpine/curl -I --max-time 5 http://1.1.1.1 | head -n2
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
      0     0    0     0    0     0      0      0 --:--:--  0:00:05 --:--:--     0
    curl: (28) Connection timed out after 5000 milliseconds


At some level docker has to do this because it creates an abstraction that your containers are their own little devices with their own IP address. For your host machine to talk to them and vice versa, it has to be able to route traffic to them. I don't think docker flips on routing globally for all interfaces though.


> Linux will route random incoming traffic on any interface? Even if you haven't configured it as a router?

"configuring Linux as a router" is exactly what "adding an iptables rule that routes" is. That's what it's for, that's how you do it.


You have configured it as a router, by installing Docker.


I believe this is only true when the ipv4 forwarding kernel tunable is enabled, but am not sure. Does anyone know?


If you disable it at least, your containers will no longer have internet access.


Had to look it up, you’re right:

https://github.com/moby/moby/issues/490

It’s required by docker, ugh.



This is even worse than falsely binding to 127.0.0.1. If I don't publish a port in a container, I expect to *not publish any ports*.


Well yeah, you turned on ip forwarding and created a network on your host that can be reached. Imagine this was a VM host and those networks were meant to be routable, how else could it work?

Are people just now discovering the FORWARD chain?


Like the OP said,

> most Linux users do not know how to configure their firewalls and have not added any rules to DOCKER-USER. The few users that do know how to configure their firewalls are likely to be unpleasantly surprised that their existing FORWARD rules have been preceded by Docker's own forwarding setup


I'm confused. Does docker change the routing table? Otherwise there will be no route back to the attacker's machine.

Also, if 172.16.0.0/12 is bound to the docker0 interface then the kernel will see packets from IPs in that range that come in on other interfaces as martians and drop them.
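
As an aside, whether such packets are actually dropped as martians depends on reverse-path filtering being enabled on the receiving interfaces; a quick way to check, and to log what gets dropped (values shown are only an example):

    sysctl net.ipv4.conf.all.rp_filter net.ipv4.conf.default.rp_filter   # 1 = strict
    sysctl -w net.ipv4.conf.all.log_martians=1                           # log dropped martians to the kernel log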


Ouch. This is the security posture that half a billion dollars, hundreds of engineers, multiple layers of abstraction to a firewall ruleset and each Stream-aligned team with an embedded Product Manager and UX Designer[0] buys you. I too contributed security fixes to Docker in the early stages but soon gave up.

Docker is a fascinating study in hype. Containers already existed, docker wrapped them very badly, raised half a billion dollars, popularized their wrapper (I understand kubernetes is the fad-wrapper now) and then failed to obtain a revenue stream. I didn't see them return much to open source, though perhaps they contributed to projects I don't see.

Honestly, the whole thing has a plotline like a Shakespearean tragedy. Someone could do a film, starting with interviews with the guys who wrote the original Linux containers and namespacing kernel code at IBM, the maintainers of jails in FreeBSD, and an early hacker at VMWare.

[0] https://www.docker.com/blog/building-stronger-happier-engine...


The Docker UX is what made the difference, not the underlying tech. Building containers in an easy incremental and reproducible way that could then be stored in a repo and pulled over HTTP was the evolution in packaging that was needed.

UX alone can be a multi-billion dollar business, for example Dropbox was just a folder that syncs. I'll leave you to find that classic HN comment about it. Docker just executed poorly and didn't have much worth paying for.


How does one build containers reproducibly? I can make parts of my stack reproducible if I use existing, pinned containers, but due to all the `apt-get`ing in my dockerfiles, I'd call them anything but reproducible.


It's all relative, and as long as you don't use a rolling-release distro, docker is a good compromise between having security fixes and partial reproducibility where it matters.

Remember, there were lots of containers before docker: jails, schroot, lxc, etc. But the setup model there was "do the base os install, then enter the container and run setup commands" -- which is as non-reproducible as it gets; those commands are often not even saved anywhere. (Yes, you could save them if you are diligent about it. I don't think many people did that, and certainly no tutorials mentioned it.)

Compared to that, Dockerfile was revolutionary: a _requirement_ that all software is installed via automated means, and no simple way to do "just one more adjustment" in a manual, undocumented way.


And then there was rkt around the same time as Docker, whose creators thought they would be able to take the mantle when Docker-the-company struggled to make money. The biggest problem with that theory is that it had none of this secret sauce you mention.

When I would point this out I usually got brigaded. Now RedHat owns the company that made rkt and I don't know if it's even being developed anymore. But nobody brings it up so I don't even have to think about it.


rkt (and many other container solutions) was introduced after docker was released and became popular... they even mentioned docker's shortcomings as a motivation for the project's creation [0]. It had all the same problems as other replacement software: there were plenty of bugs and missing features (the announcement mentions "prototype quality release"), documentation was limited, and there was no community to help you. None of those would be fatal if it was significantly better than docker, but it was not -- it had less functionality and needed more scaffolding. So almost no one made the switch. It is closed now [1]

And why "rkt"? There were much better alternative container runtimes. For example Sylabs Singularity [2] -- container-as-a-file, instant mounting, etc... I wish more people knew about it.

[0] https://web.archive.org/web/20141201181834/https://coreos.co...

[1] https://github.com/rkt/rkt#warning-end-of-project-warning

[2] https://github.com/sylabs/singularity#singularityce


> but due to all the `apt-get`ing in my dockerfiles, I'd call them anything but reproducible.

Giving this some thought, I agree if you use package managers and base images I don't think it's possible to build images in a perfectly reproducible fashion.

But I would ask: don't image tagging and registries, which preserve a reproducible output (binary, packages, etc.), make it unnecessary to have reproducible builds? Why does it matter if the build is perfectly reproducible if the actual output of the image can be reproduced? I.e., if I can run the software from an image built two years ago, does it matter if I can't build the exact Dockerfile from two years ago?

EDIT: fixed some grammar issues


I don't know, in my opinion then you are just running a container, like running an exe. You really aren't reproducing anything.


A lot of the value of containers seems to be in providing the ease of deployment benefits of a static executable to languages that don’t support compiling to a static executable.


Isn't the whole point of a container to not retain state of anything else within the filesystem other than the desired entrypoint/cmd(s)?


But that's the good thing about them. It isn't just a VM snapshot; it is a recipe to build an image.


In pre-container times there were recipes to build VMs, including plain shell (which can also be used to build containers).

Docker, the FOSS tools, and the hub are very nice packaging.


Sure, but those were, and still are, a nightmare to use/learn for a lot of use cases.


That's where docker failed, in only copying part of what I built. They simplified it, and to an extent that was good for making a product, but I had proposed the need for (and prototyped) a container/layer-aware Linux distribution.

see https://www.usenix.org/conference/lisa11/improving-virtual-a....

the "problem" is building that layer aware linux distribution, so Docker just skipped it and relied on existing things and building each image up linearly.

This was actually an advantage of CoreOS's rkt (and the ACI format), where it enabled images to be constructed from "random" layers instead of linear layers (though the primary tooling didn't really provide it and were just docker analogues, there were other tools that enabled you to manually select which layer IDs should be part of an image - but docker "won" and we were left with its linearized layer model).


I use Nix, which has a `dockerTools.streamLayeredImage` function: https://nixos.org/manual/nixpkgs/stable/#sec-pkgs-dockerTool...


With apt2ostree[1] we use lockfiles to allow us to version control the exact versions that were used to build a container. This makes updating the versions explicit and controlled, and building the containers functionally reproducible - albeit not byte-for-byte reproducible.

[1]: https://github.com/stb-tester/apt2ostree#lockfiles


You use Docker as a distribution mechanism for the output of a reproducibility-oriented build system like Guix or Nix. Pin your dependencies inside the build system, then the build system can generate a container.
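
A rough sketch of that workflow, assuming a flake that exposes an image built with pkgs.dockerTools.buildImage under a hypothetical attribute called dockerImage:

    nix build .#dockerImage     # produces ./result, a tarball of the pinned image
    docker load < ./result      # hand the reproducibly built image to Docker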


I use NixOS, but I see no reason to dabble with containers except when I can't coax some software to work reliably on NixOS, or when I want to produce binaries that work on distros other than NixOS - in which case I can't use NixOS-based containers anyway.


I had in mind that your goal with reproducible containers was to distribute some app your business makes on platforms that don't know anything about NixOS, in which case building the app with Nix and then using your favorite container runtime as the deployment target makes sense. But if you're using containers as an escape hatch for NixOS and Nixpkgs, that definitely doesn't do much for you!

In enterprise environments, some old school distros have an archiving layer that sits between hosts and their normal repos that you can use to hold back updates. Maybe you could use something like that. I forget what Red Hat's offering is called but I think it's part of Satellite. Idk if there are any free tools for that, but maybe there are.

The other alternative escape hatches that NixOS has, like FHSUserEnvs or just steam-run, you likely already know about.


> I can't use NixOS based containers anyway

I'm no expert, but I believe the purpose of containers is to include all user space dependencies... so this doesn't make sense to me.

I personally am surprised NixOS hasn't leaned into marketing itself as ideal for creating reproducible containers.


If you're using NixOS on the desktop, sometimes you want to run an obscure piece of software quickly, without packaging it for NixOS, which can be a PITA with proprietary software or software with irregular build processes.

Or maybe you're working in an organization where some instructions are written for another distro, and you just want to be able to follow them word for word the first time you attempt a task, to make sure you understand the process on a 'normal' distro, or because you're troubleshooting with someone who is running another distro. Then NixOS' support for running containers is handy, but afterward you're left running some containers whose reproducibility doesn't match what you've come to expect from the rest of your system.

> I personally am surprised NixOS hasn't leaned into marketing itself as ideal for creating reproducible containers.

Agreed, I think this is a really good use case for a lot of companies.


It's great when it works, but I use containers to emulate normal Linux when it doesn't.


How reproducible is up to you.

You can build everything elsewhere and only copy in the artifacts. Or only install pinned versions of things. Or only use certain base tags. Or do the entire CI thing inside a container. Or use multi-stage containers for different steps.

My previous startups all used pinned base images with multi-stage containers to build each language/framework part in its own step, then assembled all the artifacts into a single deployable unit.


The Docker images are not even versioned.


I've seen success building them on centralized server(s) and 'apt-get'ing specific dependency versions instead of the latest ones.
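
A hedged sketch of what that looks like during an image build; the package names and version strings here are purely illustrative and would come from your own lockfile or snapshot mirror:

    apt-get update
    apt-get install -y --no-install-recommends \
        nginx=1.22.1-9 \
        libpq5=15.3-0+deb12u1
Pinning like this only stays reproducible for as long as the archive still carries those exact versions, which is why people pair it with snapshot mirrors or lockfile tooling like the apt2ostree approach mentioned elsewhere in this thread.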


I don't think it was even that. It was massive marketing and hype which drove docker adoption. The engineering concerns and the user experience were both negatives because they actually added a considerable cost and friction to the experience.

I will obviously need to back that comment up but fundamentally the security posture of docker (and most package repositories) is scary and the ability to debug containers and manage them adds considerable complexity. Not to mention the half baked me-too offerings that circle around the product (windows containers anyone?)

The repeatability argument is dead when you have no idea what or who the hell your base layers came from.

What it did was build a new market for ancillary products which is why the marketing was accelerated. It made money. Quickly.


I mean, these are the same people who released an entire new daemon in 2013 that ... can only run as root. Party like it's 1983. I believe they added some experimental caveated "non-root" mode last year, but how this was not even supported, much less the default, from day one is just ... ugh.

That being said, Docker is more than just "Linux containers" or FreeBSD jails; the key thing it brought was the push/pull commands, which made setting up and distributing containers a lot easier, as well as some other UI/UX things. I think these ideas are implemented badly, but they do add value.

Compare this with FreeBSD jails, where the instructions were "download the base set from the FreeBSD ftp and extract it to the destination dir, manually set up /dev, twiddle with /etc/rc.conf, muck about with ifconfig for networking, etc."

Looking at the current handbook[1], things are better now: there's bsdinstall and /etc/jail.conf (though jail.conf was added only a few months before Docker was released), but the key value proposition of Docker (push/pull) is still missing. Right now, setting up nginx or PostgreSQL in a FreeBSD jail is more work than setting it up system-wide. People don't use Docker because it runs in containers per se, they use it so they can run something in one easy step.

IMHO Docker is more or less the Sendmail of our age[2]; we just have to wait for "Docker Postfix" or "Docker qmail" :-) I'd make a punt, but I don't really have that much time.

[1]: https://docs.freebsd.org/en/books/handbook/jails/#jails-buil...

[2]: People have forgotten now, but Sendmail solved a lot of issues in the early 80s and dealt with the wildgrowth of different email systems better than anything else; while crusty and ugly, it added a lot of value when it was first released.


Docker non-root is some kind of sad twisted joke. The person that jumps through pages of hoops to get that mess working and the person that reaches for Docker in the first place are not the same person. When they say they recommend Ubuntu only, they aren't fucking around. I tried last year on Debian. Never again. I nuked it all and started over with regular root Docker and restored whatever was left of my sanity.

Docker compose is also a bewildering mess of conflicting versions and baffling decisions. With incredibly poor documentation. Try to figure out network_mode without spending hours doing trial-and-error github issues stackoverflow head pounding.

I think the UX is fine if you ignore Dockerfile and docker-compose.yml. But those files are rather atrocious. The Faustian bargain of Docker is you fetch images from rather dubious sources and of dubious origin, run them with a root daemon, and let Docker molest your iptables. In return, you get the illusion of having accomplished some form of meaningful isolation. No security whatsoever. But hey, I get to run two versions of node on one Linux and pretend I didn't just sweep all these serious issues under my, now, rather large bed.


> Try to figure out network_mode

Try to figure out why bridge isn't a bridge (honestly it still makes my blood a couple degrees warmer every time I remember that)


I mean, these are the same people who released an entire new daemon in 2013 that ... can only run as root. Party like it's 1983.

Small correction: that would be 1982. Bill Joy released chroot in 1982.


I'd rather run Docker as root than enable unprivileged user namespaces, security-wise.


Why, exactly?

Personally I would trust the Linux kernel developers to find and fix security issues around unprivileged user namespaces much more than I trust the Docker team to build a secure product.


> Personally I would trust the Linux kernel developers to find and fix security issues

Yeah I highly recommend not having that view. Kernel upstream is the entire reason this problem exists - they spent decades downplaying and deriding security researchers who found root -> kernel privesc, and, in general, have had an incredibly hostile relationship with security professionals.

I don't know the case with Docker as much but my impression is a lot more positive based on what I've seen - integration with Apparmor/SELinux, seccomp, memory safety, etc.


Wouldn’t "Docker Postfix" or "Docker qmail" be podman?


Maybe; I haven't used it. Whenever I needed containers I used runc or just set up some cgroups manually, but it's been a while since I had to do so.


Docker accidentally(?) created my very favorite cross-distro Linux server package & service manager.

That it uses containers is just an implementation detail, to me.

All the docker replacements will be useless to me if they don't come with a huge repository of pre-built relatively-trustworthy up-to-date containers for tons of services.

I love no longer needing to care much about the underlying distro. LTS? Fine. I still get up-to-date services. Red hat? Debian-based? Something else? Barely matters, once Docker is installed.

On my home server it even saves me from ever having to think about the distro's init system.

Bonus: dockerized services are often much easier to configure than their distro-repo counterparts, and it's usually easier to track down everything they write that you need to back up, and that's all the same no matter what distro they're running on.


> huge repository of pre-built relatively-trustworthy
> relatively-trustworthy

yeah, that's not really the case with docker either I'm afraid


> yeah, that's not really the case with docker either I'm afraid

I feel like there are two places where I disagree with this statement:

- Docker's official image library

- When a project you are looking to run has an official image they maintain

When those constraints are not met, I try to build my own image (based on the official images).

I try not to trust any other external images (with some exceptions from reputable sources such as linuxserver.io or Bitnami).


Shrug just about every service I want to run has an official image. I use a couple unofficial ones (e.g. Minecraft, and I think my Samba container—oh man, talk about being easier to configure than typical distro packaging—is unofficial) for my personal use and it's pretty easy to read the Dockerfile and build them myself, so that's not a big deal. At work, most any database or self-hosted service or anything like that has an official image.

So yes, it is.


> Bonus: dockerized services are often much easier to configure than their distro-repo counterparts

A hard no there.

Sure, some projects, after they started to publish their own Docker images, concluded that it would be beneficial to expose more configuration options not buried in tons of .conf files - i.e. they started to support environment variables or have some script in the init that would dump the variables to the proper places.

Sadly, this works only for some projects, because if it is an afterthought then you will have a pretty bad time figuring out why something doesn't work or isn't configured as you wish despite having all the knobs specified. The worst offender for me was probably Gitea - it just ignored half of the settings until I found a combination that would trigger supporting some settings (!). Another offender is probably Nextcloud: it is easier to docker build a full image than to add SMB support to their official one with the published instructions on how to do so. And half the time it ignores 'trusted_hostnames', so it fires up and doesn't allow you to log in because it doesn't know that 'nextcloud.yourdomain.tld' is its own address.

Oh, don't get me started on Swarm and compose file shenanigans.


Just forcing them to record exactly where important files are saved and config files live has been hugely helpful. I like that it makes it easy to keep my config from being shit all over /etc with a thousand other files I will never want or need to touch, so my modifications to /etc on a typical server are small and tightly confined to a small number of files. What do I back up? It's right there in the "mounts" list. Have I backed up everything important? Add some dummy data, remove the image and do an "up" and if it still looks right, then yes, I have.

... and I don't have to re-learn any distro-specific config or packaging, for those dockerized services. Get it right once, and you're done.

> The worst offender for me was probably Gitea - it just ignored half of the settings until I found a combination that would trigger supporting some settings (!)

Haha, that's actually one I run, and also one of the worst I've ever encountered. I found the Rootless version of their image much easier to deal with (I never got the other one to work at all) personally, for whatever reason. Part of the trouble is that Gitea's config file, and documentation thereof, is kinda a damn mess.


> my config from being shit all over /etc with a thousand other files

Yeah, I was so involved in the previous comment that I forgot to say that this is what is actually helpful.

> What do I back up? It's right there in the "mounts" list

Yep, I completely ignore the docker persistent volumes. My infra is running in VMs which are backed up by Veeam, so there are a bazillion ways to recover data and services - and no need for full restores and volume juggling if I only need one file.


I get that this comment is currently being pummeled from all sides, but just to be clear on what Docker offered, and the value I think they created, it wasn't containers, but:

* The public registry. Obviously, there are a lot of downsides to this with respect to security, but the convenience of "just install docker and then you can run basically anything that has already been uploaded to Docker Hub" can't really be overstated.

* OCI. They were kind of forced into this much later on, but it's great to have a standard for how to package a container image and a bundle that can be loaded into a runtime. Containers may have already existed, but there was no standard way to package, deliver, and run them.

* The Dockerfile. For better or worse, it isn't perfect and is full of footguns, but as the "hello world" equivalent of build systems go, it's pretty easy to get started and understand what it's doing.

* I think, but am not certain, that Docker was the first to combine containers with union filesystems, saving quite a bit of disk space when you're running many containers that share layers. This also helped usher in the era of read-only root filesystems and immutable infrastructure.

It's also important to remember that developers are not the only people involved in delivering and running software. I see a lot of complaining here that containers add friction to their workflow, but look at it from the perspective of systems administrators, IT operators, and SREs that have to install, configure, run, monitor, and troubleshoot potentially hundreds of applications, some internally developed, some third-party, all using whatever language, whatever runtime, and whatever dependency chain they feel like without regard to what the others are doing. Having a standardized way to package, deliver, and run applications that is totally agnostic to anything other than "you need Linux" was pretty powerful.


Docker's hype train was absolutely absurd, but it was instrumental in making containers ubiquitous, and the wrapper is the reason why. Containers were painful as hell to set up and use; it was like using Git, but worse. Docker made it convenient, simple, and useful. By now everybody understands that immutable artifacts and simple configuration formats are extremely effective. Docker is one of the only tools I can think of that was designed with a human being in mind rather than a machine.


Docker is like brew in that they are both huge productivity levers, except Docker found a way to pay a bunch of technologists with VC dollars to evangelize the model.


Security is an afterthought in docker: deploy first, questions later, the Dirty Harry way. I learned a lot about docker's lack of defense while trying to harden my instances, and questioned why some of the options weren't enabled by default (--cap-drop=all, --security-opt=no-new-privileges, and userns remap), other than that the defaults would expose bad Dockerfile practices.
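
For reference, a minimal sketch of running a container with those per-run options turned on (alpine:3 is just a stand-in; real workloads may need selected capabilities added back, and userns remap is a daemon-level setting rather than a flag here):

    docker run --rm \
      --cap-drop=ALL \
      --security-opt=no-new-privileges \
      --read-only \
      alpine:3 sleep 60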


I’m not sure that is entirely fair.

Sure, containers like LXC and BSD jails existed, but I think the feature that won the hearts of devs was the incremental builds that let people fail quickly in a forgiving way.

With jails, the best I know of achieving this is creating a base jail filesystem (thin jail), and creating a new one. (Apologies in advance if another tool addresses this).


HP-UX vaults date back to the late '90s, as one of the first container models for UNIX systems.


1. Docker has contributed a lot and provided if not technology at least ideas others have iterated on.

2. k8s is not a fad. Nor is it a wrapper, or really even equivalent to docker's container bit.


Docker is an imperfect solution to problems we shouldn't have after decades of Linux and Unix and its value is exactly in its DX.

But considering the pride in finding the right incantation to build C/C++ code, to set up a system service, to set up routing rules or any other advanced feature... I guess anything that tries to hide it will count as hype


> Containers already existed

Fascinating. I'm admittedly very new to container-land, but my understanding was precisely the opposite - that Docker invented this concept and practice of building a runtime environment that can then be replicated and deployed to many hosts. Do you have any search terms or articles I should pursue to learn more about the predecessors?

I've read through this[0], but it mostly suggests that approaches prior to Docker provided isolation/jailing, which is an important part of containerization but neglects a lot of tooling around building, deploying, etc.

[0] https://blog.aquasec.com/a-brief-history-of-containers-from-...

EDIT: various other comments ("The Docker UX is what made the difference, not the underlying tech. Building containers in an easy incremental and reproducible way that could then be stored in a repo and pulled over HTTP was the evolution in packaging that was needed.", "I think the feature that won the hearts of devs was the incremental builds that let people fail quickly in a forgiving way", "Containers were painful as hell to set up and use; it was like using Git, but worse. Docker made it convenient, simple, and useful. By now everybody understands that immutable artifacts and simple configuration formats are extremely effective") seem to support this perception.

EDIT2: Thanks to all those that replied!


OpenVZ has been around since 2005, see https://en.m.wikipedia.org/wiki/OpenVZ and it had checkpointing, live migration between hosts, dump container to tar, etc. before Docker even existed.


And linux vserver* has been around since 2001.

If you were willing to use out of tree patches, linux containers pre-date Solaris containers by years (Debian distributed pre-patched kernels pretty early on, so you didn't need to deal with compiling your own kernels, and Debian also had an option for vserver + grsec patched kernels). But, freebsd jails beats linux vserver by a year.

* not to be confused with the similarly named linux virtual server (LVS) load balancer.


Containers absolutely existed before. There was a long-running project, I think it was LXC but there may have been one before that, which was doing great work fixing all the underlying problems before the technology went exponential.


> I think it was LXC

Still is :)


Containers at their very heart in Linux are three ingredients: cgroups, namespaces, and filesystem-level separation.

Here's an interesting way to learn more about this - Bocker, an implementation of docker in 119 lines of bash:

https://github.com/p8952/bocker/blob/master/bocker


I believe that Solaris (OpenSolaris) Zones predates LXC by around 3 years. Even when working with k8s and docker every day, I still find what OpenSolaris had in 2009 superior. Crossbow and zfs tied it all together so neatly. What OpenSolaris could have been in another world. :D


HP-UX Vaults were introduced in the late '90s, then came Solaris Zones, BSD jails, ...

That's on the UNIX side; mainframes had them a couple of decades earlier.



> failed to obtain a revenue stream

Docker is throwing off incredible amounts of cash with Docker Desktop licensing alone.


Sadly. This is why I recommend running docker inside of a container (e.g. LXC) or a VM, as I just don't trust the security profiles for docker images.


so, uhm, before Docker, what was the one-liner that allowed me to get a ubuntu, fedora, nixos, ... shell automatically?


You can do something with lxc - I had to use it instead of Docker on a Chromebook a while back. The wrapper isn't as nice, but it's not actually that hard to get going.


lxc-create -t ubuntu


tar -zf ubuntu.tar.gz && chroot ubuntu

ish


    $ tar -xzf ubuntu.tar.gz && chroot ubuntu
    tar (child): ubuntu.tar.gz: Cannot open: No such file or directory
    tar (child): Error is not recoverable: exiting now
    tar: Child returned status 2
if I have to do wget of something (and find the url to wget) it's not worth it



If it requires reading a whole blog article like this to set up, someone can make and sell a product with a one-liner command that does it. They could call it, idk, decker or something


I take it you don't remember learning Docker? I had to read way more than one article to fully understand it. I also had to learn what cgroups were and all kinds of things.


    tar -zf ubuntu.tar.gz
    tar: Must specify one of -c, -r, -t, -u, -x
Doesn't work on my shiny macbook. I'll stick with Docker, I guess.


How is K8S a fad?


It's not a fad, but a fashion. People wear it because it looks cool on your CV or consultancy offering, not because it keeps you warm and dry. They don't even understand it properly. Even AWS, who I work with regularly on EKS issues, seem not to understand their own offering properly.

Ultimately in a lot of companies it ends up an expensive piss soaked nightmare wearing a jacket with "success" printed across it.

I like the idea but the implementation sucks and echoing the root poster, containers have some unique problems which can indeed cause you some real trouble and friction. It makes sense for really horizontal simple things, which isn't a lot of stuff on the market.


I used borg for over a decade, I know what k8s is for.


On a long enough timeline, everything can be considered a fad!


The shortening of the name to what you have above is your first clue.


Uh-oh, guess we have to tell everyone internationalization is just a fad because everyone writes it i18n.


It absolutely is. Approximately no-one gets real business value out of internationalization, they just use it as a stick to beat technologies they dislike.


This is hilarious.


Ask me in 10 years, I'll tell you whether it was a fad or not.


a 20 year run in tech is not a "fad".


"tech" was a fad to your future AI overlords


The other fun thing here, at least with docker-compose, is that it behaves really badly with IPv6: essentially, if you don't manually set an IPv6 address, then all IPv6 traffic to the host address is masqueraded to IPv4, with the visible address being the router address for the network.

So any external IPv6 connections come in looking like they're from the local network.


With vanilla Docker ipv6 is just disabled by default I believe.


So every time you sit down in an internet cafe and connect to the wifi, you are fucked because all the docker containers that run on your machine are open to everybody? Or - depending on the lan settings - at least to the wifi provider?

Also - shouldn't the web be full of vulnerable database servers then?


> So every time you sit down in an internet cafe and connect to the wifi, you are fucked because all the docker containers that run on your machine are open to everybody?

Yes, if you're running docker on Linux. On Mac and Windows docker has to create a little linux VM so it's more isolated and not directly accessible without explicitly publishing ports.


Publishing ports to '*' is commonly done to allow Mac and Windows users to access containers through their browsers.

The macos firewall is able to block connections to these exposed sockets but:

1. The user has to explicitly turn on the firewall since it is off by default

2. The option "Automatically allow downloaded signed software to receive incoming connections" must be unchecked because Docker Desktop is signed by Apple.

I don't use a Mac, but all of the developers that use Macs at my company either did not have their firewall enabled or did not realize that connections to Docker Desktop were whitelisted.


Yep it's a good call out. Another good habit that you allude to is to publish ports to both an interface and port on the host system, not just a port (which assumes all interfaces, including external ones like wifi). The syntax slightly changes so you do command line option '-p "127.0.0.1:5000:5000"' which means on my host machine's localhost only (127.0.0.1) listen on port 5000 and forward it to port 5000 in the container. That way only a process running on my local machine can connect to my container and not someone else on the network if I forgot to turn on a firewall.
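
Concretely (my-app is a placeholder image name):

    docker run -d -p 127.0.0.1:5000:5000 my-app   # bound to loopback only
    docker run -d -p 5000:5000 my-app             # bound to 0.0.0.0, i.e. every interface
With the caveat from the OP that even the loopback form does not stop the forwarded-traffic attack described in the gist.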


> Also - shouldn't the web be full of vulnerable database servers then?

No, the docker bridge network is not on a routable subnet.


Does it have to? The attack looks like it would also work over the internet:

    2. [ATTACKER] Route all packets destined for 172.16.0.0/12 through the victim's machine.

    ip route add 172.16.0.0/12 via 192.168.0.100
Here, "192.168.0.100" could be exchange for any ip address I guess?


That will only work if you are on the same subnet.

When you craft a packet for that address, the stack will see that route and send an ARP "who has" request out whatever interface you assigned when you did that ip route rule (probably your default ethernet). If nobody responds then the packet dies in the stack.


172.16.0.0/12 is a private subnet. This means that its addresses are relevant only within a local network, and never over the internet. If you try to send a packet to an address within that subnet, layer 3 devices (i.e. routers) on the internet will drop it.


If that's true, you can then send packets to it, but not receive replies. That's still a problem.


Except you would have to be on the same layer 2 network as the "victim" for this to work.


> Also - shouldn't the web be full of vulnerable database servers then?

It already was, but yes.


> "shouldn't the web be full off vurnurable database servers then?"

It is. There are millions of servers out there with major security issues. Every few months there's another big story of a data breach from nothing more than a database left open.


Have you ever heard of https://www.shodan.io/?


It seems to be some kind of search engine. How is it related to the topic at hand?


Sorry, should have been more clear. It was a reply to this part of your comment:

> Also - shouldn't the web be full of vulnerable database servers then?

You can search for open database servers there, which should answer your question.


It exposes/lets you search for unsecured IoT, IP cams, etc. I assume OP implied searching for unsecured containers would be kind of similar.


It lets users search for misconfigured database instances that it has scraped - very relevant to the topic.


For home-style Wi-Fi, yes. Hopefully you can't reach Wi-Fi peers on Starbucks and its ilk.


Even if not - the wifi operator could still access your docker containers.


Not if you have a firewall enabled on your machine, which everybody should.

edit: based on comments, added rules to the FORWARD chain, which supersede Docker's rules and should block new connections forwarded from external interfaces.

edit 2: I'm wrong, this doesn't fix the bug. If you run `docker network create foobar`, it will move its rules up above the custom FORWARD rules. Ugh.

edit 3: Modified to add the rules to the DOCKER-USER chain, which according to Docker[1] will always be evaluated first. So now it should actually fix the problem. They explain in the docs how to fix this situation, actually, and even how to prevent Docker from modifying iptables. Another example of why we should RTFM...

  for tool in iptables ip6tables ; do
    for intf in wlan+ eth+ ; do
      for table in INPUT DOCKER-USER ; do
        $tool -I $table 1 -i $intf -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
        $tool -I $table 2 -i $intf -m conntrack --ctstate INVALID -j DROP
        $tool -I $table 3 -i $intf -m conntrack --ctstate NEW -j DROP
      done
    done
  done
  iptables-save > /etc/iptables.rules
  ip6tables-save > /etc/ip6tables.rules
https://help.ubuntu.com/community/IptablesHowTo

https://wiki.debian.org/iptables

https://wiki.alpinelinux.org/wiki/Configure_Networking#Firew...

https://wiki.archlinux.org/title/Iptables#Configuration_and_...

[1] https://docs.docker.com/network/iptables/


As discussed extensively in https://github.com/moby/moby/issues/22054, which is linked from the OP: this doesn't actually help, because Docker (by default) bypasses your existing firewall rules.


Ah, forgot about the FORWARD chain. These should fix that:

  for tool in iptables ip6tables ; do
    $tool -I DOCKER-USER 1 -i eth+ -m conntrack --ctstate NEW -j DROP
    $tool -I DOCKER-USER 1 -i wlan+ -m conntrack --ctstate NEW -j DROP
  done
If these are saved and loaded with the rest of networking, they will be evaluated at the start of the FORWARD chain, before the jump to the DOCKER chain. Docker won't remove these rules, and they come first, so they supersede Docker's rules. Any new connections forwarded from an external interface should drop.

edit: changed from the FORWARD to the DOCKER-USER chain
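
One way to make the "saved and loaded with the rest of networking" part concrete on Debian/Ubuntu-style systems (other distros have their own equivalents):

    apt-get install iptables-persistent
    netfilter-persistent save    # writes /etc/iptables/rules.v4 and rules.v6, restored at boot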


If you are going that far why not just:

    iptables -t filter -I FORWARD -j DROP
and while you are at it:

    sysctl -w net.ipv4.conf.all.forwarding=0
    sysctl -w net.ipv6.conf.all.forwarding=0


....because then containers can't network at all to non-local networks, e.g. no internet access. for bridge networks at least (which is the default).

by specifying the drop for new connections incoming from the external interface, you stop connections to listening services from external networks, but established and related connections can continue implicitly, so forwarding still works for outbound connections.

if you really want to block all internet access for containers, and stop anything else on your system that might need to use forwarding, then your suggestion is correct.


Docker creates its own iptables rules when its daemon runs, so unfortunately whatever firewall you setup will be ignored by exceptions it creates.


But isn't the point of TFA that regular docker (which runs with privileges) negates these rules effectively punching holes?


Ah yes, this footgun. Sometime last year I discovered our QA databases had all been locked with a ransom demand, how were they exposed? We had `ufw` blocking the db port?! Ah yes, Docker -p. smh.


Give us money or we're keeping your test data


Docker was definitely a fork in the road that led to unhappiness. I was tracking linux containers for several years (specifically, with the goal of doing cross-machine container migration) and they were doing great. Then docker came along and... pretty much every experience I've had with it involves realizing core decisions were made by people who had no experience, and we're all paying for it, long-term.


When Docker was still new, a Go IRC channel saw one employee after another coming to ask about combining ungzip and untar in a streaming fashion inside a Go process. We tried to help, but that was roughly when I realized Docker will never gain technical excellence. It never was in their culture.


This is a valid concern when running Dockerized containers and it caught me years ago when I was deploying my first containerized servers to the wild. It was a "holy shit" moment, but anyone who is trusted (getting paid) to secure servers should stumble upon this pretty quickly.

1) Add firewall rules to UFW

2) Test firewall rules

3) Notice you can bypass those rules (they didn't take)

4) iptables -vL

5) Why the fuck is there a DOCKER-USER chain at the top of FORWARD, and it's above all of my UFW entries?

6) Adjust iptables without UFW

7) Realize adjustment didn't work after reboot because Docker dynamically adds the iptables entries on start (above my entries)

8) Add my chain to the bottom of the DOCKER-USER chain, before the RETURN all

9) Verify firewall rules

10) Reboot server

11) Verify firewall rules

This is probably a huge issue today because most people can't do their job correctly. They are pulling that sweet salary, bullshitting their stand-ups, zoning out on video calls, and squirming until they can turn netflix back on while working remotely. You just need to have enough attention span to verify your work and dig in when things are broken.


You should try podman instead. The rootless version obviously can’t bypass UFW. When I tested podman as root it also didn’t punch holes into UFW. Not sure how it is today however.


Related thread on the GitHub repo of Moby: https://github.com/moby/moby/issues/22054#issuecomment-10997...


Securing docker, with its weird iptables shenanigans, is a nightmare. I prefer to use rootless podman, and also avoid exposing ports (by using internal routing and, if needed, a reverse proxy with only one published port).


Yeah this got me the first time I was using docker as well. I wanted my app server to only listen locally and configured it like that, then Docker helpfully punched a hole in my firewall so anything could talk to it.

Agreed that podman has been a great experience in comparison.
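
A small sketch of the rootless podman equivalent; because the rootless network stack (slirp4netns/pasta) doesn't touch the host's iptables, the published port is just an ordinary userspace listener (the password is a placeholder):

    podman run -d --name db \
      -e POSTGRES_PASSWORD=example \
      -p 127.0.0.1:5432:5432 \
      docker.io/library/postgres:15
    ss -ltn | grep 5432    # a normal socket bound to 127.0.0.1, no FORWARD rules involved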


Same here. Are there even legit reasons to not run all containers rootless? I run rootless everything, as well as things that need privileged ports. The design choices by the Docker team are just baffling.


I believe, at the time, setting up a container required root. As in the kernel didn't expose the right pieces for a non-root user to ever dream of starting a container.


Podman and Docker require the exact same shenanigans, except in the one specific case of using slirp4netns (rootless) in Podman or RootlessKit in Docker, which requires using a tap device and an entire usermode networking stack.

It's a neat trick for development environments but for real traffic you'll still have to actually do the iptables BS.


Podman 4 rootless uses a different network stack: https://www.redhat.com/sysadmin/podman-new-network-stack

It is performant enough for my usecase: services used by me and a few friends.

I don't use the root mode, but I was under the impression it doesn't have the same well-known docker issue where it exposes everything on the public interface (and using a firewall on top of it is complicated).


Yeah it's nuts. The best solution I've found is some kind of cloud firewall, whether that be an off-the-shelf service that you use with whatever cloud provider you are using, or rolling your own by routing all traffic through another host that doesn't have Docker nuking all your firewall rules every time it restarts.


This.

You need a last resort security control against stuff like this anyway. Even an automation failure or misunderstanding of a ruleset can leave you exposed.

Security must be layered.


To be fair, either in the cloud or on premise, a firewall is a must. It's just one of the layers of security.


Setting aside the (rather scary) issue of crafted external packets being bridged into the internal docker subnet - the "0.0.0.0" fw-bypassing default for published ports in a developer-oriented tool is a complete nightmare for infosec teams at software engineering companies. I've seen thousands of employee hours wasted chasing that down.


Is this still the realm of "mistakes can happen sometimes" or is it negligence? Where is the limit?


Is it negligence if it is a documented feature? Bad UX does not imply negligence.


It's documented, but still extremely surprising even for experienced docker users. The only place you will read about this is if you actually go to the docs page for the `-p` flag. But as I've said before, why would I do that? I already know what `-p` does (spoiler: I didn't know what it did).

It was multiple years before I realized I was exposing my services like this, after it came up on HN a while back.
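
If you've never looked, it's easy to see for yourself what the short form does (the port and image here are arbitrary):

    docker run -d -p 8080:80 nginx
    ss -ltn | grep 8080    # listener on 0.0.0.0:8080 (and [::]:8080), i.e. every interface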


I am not saying that this does not suck. I was a victim of this as well. I am just saying that it isn't really negligence, just bad architecture.


Right, but there's bad architecture and then there's "this is a security risk and every tutorial in the wild + every app in production uses this in an insecure way and we haven't done anything about it".

I just realized I posted my thoughts on this github issue [1] which is now _six_ years old. There have been no updates / changes made as far as I can tell.

[1] - https://github.com/moby/moby/issues/22054


Mhh good point. I guess if you keep bad architecture long enough it becomes negligence.


There was some site that got pwned because of this last year or so; I forgot the name, but the owner did a nice write-up of it and there's a long HN thread on it. There were many experienced Docker users – often using it daily in a business setting – that were not aware of this "feature".

Yes, you can document it somewhere, but 1) not everyone reads everything from cover-to-cover, and 2) even if you do, the real-world implications may not be immediately obvious (the way it was phrased, at the time, didn't make it obvious).


Probably thinking of NewsBlur. I know because the fallout from that is what brought the issue to my attention in the first place. The public issue had been open for 5 years at that point; it has been a year since that story, and still nothing has changed.


Ah yes, that's the one; thank you.


That is why you usually secure your stuff with a hardware (or separate) firewall.

But I too found this feature weird when I found out about it.


Can someone explain how that proof of concept in the gist applies to a production environment?

For example, let's say I take this exact command from the gist:

    docker run -e POSTGRES_PASSWORD=password -p 127.0.0.1:5432:5432 postgres
And run that on a VPS somewhere out in the world, let's say it's on a public IP address of 106.12.52.111 (I completely made up this IP address btw).

How can you, sitting on your dev box on a different network, open a psql connection to 106.12.52.111 on port 5432 without being blocked by iptables?

The part I don't get is the gist mentions:

> An attacker that sends traffic to 172.17.0.2:80 through the docker host will match the rule above and successfully connect to the container, obviating any security benefit of binding the published port on the host to 127.0.0.1. What's worse, users who bind their published ports to 127.0.0.1 operate under a false sense of security and may not bother taking further precautions against unintentional exposure.

In the above example how is someone going to send traffic to 172.17.0.2:80 through the Docker host from a box on a different network than the Docker host?

Also is this still exploitable if you drop everything at the iptables level before you start using Docker?

For example all of my iptables configs start with:

    :INPUT DROP [0:0]
    :FORWARD DROP [0:0]
    :OUTPUT ACCEPT [0:0]


> In the above example how is someone going to send traffic to 172.17.0.2:80 through the Docker host from a box on a different network than the Docker host?

The attacker and host will generally need to be on the same network so that the attacker's packets are not dropped because they are addressed to a non-routable private IP address.

You could access the containers at 106.12.52.111 if you were in the same network (e.g. 106.12.52.0/24) and the packets did not have to traverse a router.

> Also is this still exploitable if you reject everything at the iptables level before you start using Docker?

Yes. Docker appends the FORWARD chain with custom rules that explicitly forward traffic to published ports.
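
Roughly what that looks like for the 127.0.0.1:5432 example from the gist (abridged; the container address is illustrative):

    # iptables -S DOCKER
    -A DOCKER -d 172.17.0.2/32 ! -i docker0 -o docker0 -p tcp -m tcp --dport 5432 -j ACCEPT
Note that the rule matches only on the destination address and port -- nothing about the source or the 127.0.0.1 bind -- which is exactly the gap described in the gist.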


> You could access the containers at 106.12.52.111 if you were in the same network (e.g. 106.12.52.0/24) and the packets did not have to traverse a router.

Ok thanks, that's sort of what I thought (you had to be on the same network) but I wasn't 100% on that because networking has a lot of rabbit holes.

Your gist is very well written and a great find but based on the scope of the vulnerability this wouldn't be classified as a catastrophic event right?

If it's only limited to the attacker and the Docker host being on the same network while packets never go through a router, then it's not an issue for the common case of someone hosting their web app or service on a VPS somewhere on the internet who used 127.0.0.1:XXXX:XXXX to publish a port (perhaps their web app is published to localhost so nginx running directly on the Docker host can reverse proxy it -- this is what I've done for years now).


Rootless Docker is currently the only version that does not touch iptables.

Modifying iptables was a massive oversight by Docker - they really should deprecate that.


Or you just pass --iptables=false to dockerd. Then it will also not do anything.
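
The persistent equivalent is the "iptables" key in the daemon config; a sketch (merge into any existing daemon.json rather than overwriting, and note that containers then need NAT/forwarding set up by hand to reach the internet):

    echo '{ "iptables": false }' | sudo tee /etc/docker/daemon.json
    sudo systemctl restart docker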


Docker users on the Mac are not affected by this issue, but they should be aware that the "Automatically allow downloaded signed software to receive incoming connections" option in the firewall settings must be unchecked in order to block incoming connections to container ports published to 0.0.0.0.

This is necessary because Docker Desktop for Mac is presumably signed by Apple.


I am (was) using portainer on my host to manage my docker containers.

Turns out everything was, indeed, publicly accessible. I could fix it for my custom containers, by binding on 127.0.0.1:<port>, but I can't edit portainer's config.

Does anyone have a solution for this?


You can't fix this issue by binding to 127.0.0.1

My solution has always been to just use some external firewall (outside of the docker host machine.) Often cloud providers have this as a feature, to configure a firewall for your vps/network.


Does that really protect me against the "routing through" my machine they explain?


   It should, however, restrict the source ip address range to
   127.0.0.1/8 and the in-interface to the loopback interface:

   Chain DOCKER (2 references)
    pkts bytes target prot opt in out     source      destination
       0 0     ACCEPT tcp  --  lo docker0 127.0.0.1/8 172.17.0.2 tcp dpt:5432
I think this would break intra-container network communication, since containers on docker1 (172.17.1.2) would not be coming from source 127.0.0.0/8 or device lo. Docker would need to create an explicit rule matrix from/to each container in each network (does it already do this?)


Nice to see that "Docker and Kubernetes are in first and second place as the most loved and wanted tools" according to the Stack Overflow developer survey.


Docker is definitely a 'good enough' frontend for containers, in that it did a lot to make them more accessible, hence why it tops that survey (presumably).

This can be quite separate from Docker being a security nightmare at the same time.


Takeaway: NAT around netnses is about as bad a substitute for explicit filtering + proper bind discipline as it always was for local networks. So we should treat containers the same way we've long known to treat hosts behind a NAT: sure, the NAT itself provides a little bit of defense in depth, but that's basically accidental, and services should still be independently secured.


This conversation makes me thankful that I spent time on the extra hoops to get Docker working in rootless mode. (That mode wasn't officially released yet; I imagine it's easier by now.)

It also reminds me of Podman as a promising alternative. Now that it's in Debian Stable, I think it's about time I give it a try.


Docker's behavior is unintuitive but makes sense given how container networking works. If you use UFW, read https://github.com/chaifeng/ufw-docker and follow the guide.

Then configuring firewall rules to containers is as easy as

    - name: Open HTTPS
      ufw:
        rule: allow
        proto: tcp
        route: true
        port: 443
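
For reference, the one-off command behind that task (ufw-docker relies on ufw's "route" rules, which apply to forwarded traffic) looks roughly like:

    ufw route allow proto tcp from any to any port 443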


I would love to see a version of portscanner which scans for Docker containers on my network, just for fun.


Nice, that's interesting indeed!



