March 8, 2022
This article is the third part of the "A Vercel-like PaaS beyond Jamstack with Kubernetes and GitOps" series.
In part I, I set up a Kubernetes cluster with k0s. Then, in part II, I configured a GitLab pipeline to build Docker images and deploy applications on this cluster.
Now I'm going to write the Dockerfile required to build those Docker images. Since the first stage of my GitLab pipeline is the package stage, I'll start this third part by creating the Dockerfile that completes this stage, so that the pipeline can move on to the deploy stage.
Before that, I must take a step aside and discuss a few specifics of the example applications. This explains how every piece of the setup connects to the others and clarifies the functional scope of each part.
As a reminder, for the purpose of this experiment I've created Node.js, PHP, Python and Ruby web applications. These are the applications I'll talk about in the next section.
At the end of part I, I briefly described how traffic flows from the client to the application:
1.client (DNS ok and 443/TCP port open) → 2.host (k0s installed) → 3.ingress (ingress-nginx installed) → 4.service → 5.pod → 6.container → 7.application
The last component of this diagram, 7.application, represents not only the code inside the container but also the process that runs this code and listens for incoming connections.
To do so, every application must implement these two requirements:
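- listen on port 3000/TCP;
- listen on 0.0.0.0 (all network interfaces) instead of localhost, so that connections coming from outside the container are accepted.

The Node.js implementation is done like this in the app.js file: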
```javascript
const host = "0.0.0.0";
const port = 3000;

require("http").createServer((req, res) => {
  ...
}).listen(port, host, () => {
  ...
});
```
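The handler body is elided in the app.js excerpt. As a hedged illustration, here is a minimal, self-contained sketch of what a complete app.js could look like; the response text and the use of the COMMIT variable (introduced later in this article) are assumptions, not the author's code.

```javascript
// Hypothetical app.js sketch: response text and COMMIT usage are assumptions.
const http = require("http");

const host = "0.0.0.0"; // all interfaces, reachable from outside the container
const port = 3000;      // the port every example application listens on

// Answers every request with a plain-text message that includes the commit.
function handler(req, res) {
  const commit = process.env.COMMIT || "unknown";
  res.writeHead(200, { "Content-Type": "text/plain" });
  res.end(`Hello from Node.js (commit ${commit})\n`);
}

// Creates the server and starts listening on host:port.
function start() {
  return http.createServer(handler).listen(port, host, () => {
    console.log(`Listening on http://${host}:${port}`);
  });
}
```

In an actual app.js, start() would be called at the top level so that node app.js starts listening. Keeping the handler as a separate function makes it testable without opening a port.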
Likewise, the Python implementation in the app.py file:
```python
hostName = "0.0.0.0"
serverPort = 3000

...

if __name__ == "__main__":
    webServer = HTTPServer((hostName, serverPort), Server)
    ...
    try:
        webServer.serve_forever()
    ...
```
The Ruby implementation in the app.rb file:
```ruby
server = TCPServer.new 3000
```
For the PHP application, incoming connections are handled by the PHP command line and its built-in web server, as set in the Dockerfile:
```dockerfile
CMD ["-S", "0.0.0.0:3000", "app.php"]
```
In the real world, traffic is not necessarily handled this way: a dedicated, event-driven web server such as nginx might stand in front and act as a reverse proxy, forwarding requests to an application server that runs the application code.
For instance, requests are usually forwarded by an nginx or Apache server and handled by a php-fpm server for PHP, a unicorn server for Ruby, and gunicorn for Python.
Apart from the hardcoded port value, the Node.js implementation can stay as it is, though: Node.js has a built-in event-driven web server, and Kubernetes acts as the process manager, a task usually delegated to PM2.
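A minimal sketch of how the hardcoded values could be made configurable, assuming hypothetical HOST and PORT environment variables that are not part of the original setup. The defaults keep the two requirements listed earlier (0.0.0.0 and 3000/TCP).

```javascript
// Sketch: HOST and PORT are hypothetical variable names, not part of the series.
function listenOptions(env = process.env) {
  const host = env.HOST || "0.0.0.0";                  // default: all interfaces
  const port = Number.parseInt(env.PORT ?? "3000", 10); // default: 3000/TCP
  // Reject values that are not a valid TCP port number.
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`invalid PORT value: ${env.PORT}`);
  }
  return { host, port };
}
```

The server would then be started with http.createServer(handler).listen(port, host) after const { host, port } = listenOptions();. In Kubernetes, such variables could be set in the env section of the container spec.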
Docker images are built in the package stage of the pipeline with the following command:
```shell
$ docker build -t ${CI_REGISTRY_IMAGE}:${CI_COMMIT_SHORT_SHA} \
    --build-arg COMMIT_SHORT_HASH=${CI_COMMIT_SHORT_SHA} .
```
Dockerfiles are almost identical across repositories. In the Node.js example, the Dockerfile contains the following instructions:
```
 1  # define the base system
 2  FROM node:16-slim
 3
 4  # read value of COMMIT_SHORT_HASH passed with --build-arg
 5  ARG COMMIT_SHORT_HASH
 6
 7  # copy COMMIT_SHORT_HASH value to COMMIT variable
 8  ENV COMMIT=$COMMIT_SHORT_HASH
 9
10  # copy the GitLab repository into the image
11  COPY . /src
12
13  # move the current working directory to repository root
14  WORKDIR /src
15
16  # define the default program executed when running the image
17  ENTRYPOINT [ "node" ]
18
19  # define arguments passed to the default program
20  CMD [ "app.js" ]
```
There are a lot of things to explain here, but first:
- The term build-time refers to the moment the docker build command is executed.
- The term run-time refers to the moment the docker run command is executed, or when a container is deployed to Kubernetes.
Variables are passed at build-time with the --build-arg flag. They can be read with the ARG instruction, as in the Dockerfile at line 5, but they are not persisted at run-time.
To persist a build-time variable at run-time, its value must be copied to an environment variable with the ENV instruction, as in the Dockerfile at line 8.
The following command illustrates this behaviour by dumping all environment variables with the printenv command: COMMIT_SHORT_HASH doesn't exist, but COMMIT does and contains the copied value:
```shell
$ docker run --entrypoint printenv \
    ${CI_REGISTRY_IMAGE}:${CI_COMMIT_SHORT_SHA}
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=e63a3f3ff400
COMMIT=7c77eb36
NODE_VERSION=16.13.0
YARN_VERSION=1.22.15
HOME=/root
```
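To make the difference concrete from the application's side, here is a sketch of a hypothetical helper a Node.js app could use to report which commit it runs. The versionInfo name and its fields are illustrative; the point is that the code can only rely on COMMIT, because COMMIT_SHORT_HASH never reaches the container's environment.

```javascript
// Hypothetical helper: only variables set with ENV (or passed with -e)
// are visible at run-time; ARG values such as COMMIT_SHORT_HASH are not.
function versionInfo(env = process.env) {
  return {
    commit: env.COMMIT ?? "unknown", // value copied by ENV at line 8
    node: process.versions.node,     // version of the running Node.js binary
  };
}
```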
Variables are passed at run-time with the -e flag. If the value was already set at build-time with an ENV instruction in the Dockerfile, it is overwritten.
In the following example, I use the -e flag to overwrite at run-time the COMMIT variable that was set at build-time by the ENV instruction at line 8 of the Dockerfile:
```shell
$ docker run --entrypoint printenv \
    -e COMMIT="A different value" \
    ${CI_REGISTRY_IMAGE}:${CI_COMMIT_SHORT_SHA}
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=e63a3f3ff400
COMMIT=A different value
NODE_VERSION=16.13.0
YARN_VERSION=1.22.15
HOME=/root
```
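The precedence rule can be modelled in a few lines. This is a simplified model written for illustration, not Docker's implementation: the image's ENV values are copied first, then each -e NAME=value flag replaces or adds a variable.

```javascript
// Simplified model (not Docker's code) of a container's environment.
function containerEnv(imageEnv, eFlags = []) {
  const env = { ...imageEnv }; // start from the values baked in with ENV
  for (const flag of eFlags) {
    const i = flag.indexOf("=");
    // "-e NAME" without a value (copied from the host) is not modelled here.
    if (i === -1) continue;
    env[flag.slice(0, i)] = flag.slice(i + 1); // the run-time value wins
  }
  return env;
}
```

For example, containerEnv({ COMMIT: "7c77eb36" }, ["COMMIT=A different value"]) yields { COMMIT: "A different value" }, matching the printenv output above. Splitting on the first "=" keeps values that themselves contain "=".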
FROM, COPY and RUN instructions
FROM, COPY and RUN instructions cannot be overwritten at run-time, since they exist only at build-time to construct the image's file system, which is then mounted in the container at run-time.
An image's content can still be changed by modifying a running container and saving it as a new image with the docker commit command. I definitely don't recommend using this command, nor building a workflow around this practice. Images should be reproducible, meaning they should be built from a Dockerfile only.
WORKDIR, ENTRYPOINT and CMD instructions
WORKDIR and ENTRYPOINT can also be overwritten at run-time with the --workdir (or -w) and --entrypoint flags respectively. However, this is unlikely to happen, since an image is usually built to run the command set in its entrypoint. A valid case would be switching the entrypoint from node to npm, for instance.
CMD can also be overwritten at run-time, and is more likely to be, in order to pass arguments to the application when environment variables cannot be used.
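As a sketch, assume a hypothetical --port argument (the app.js of this series takes none). Because overriding CMD replaces the whole instruction, the script name has to be repeated: docker run <image> app.js --port 8080 starts node app.js --port 8080, and the script then sees ["--port", "8080"] in process.argv.slice(2).

```javascript
// Sketch: reads a hypothetical --port argument passed through CMD,
// falling back to 3000 when it is missing or not a number.
function portFromArgs(argv = process.argv.slice(2), fallback = 3000) {
  const i = argv.indexOf("--port");
  if (i === -1 || i + 1 >= argv.length) return fallback; // flag absent or without value
  const port = Number.parseInt(argv[i + 1], 10);
  return Number.isInteger(port) ? port : fallback;
}
```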
Another valid case for overwriting the CMD instruction is when the ENTRYPOINT instruction is overwritten too, since passing --entrypoint clears the image's default CMD.
For instance, the following command overwrites most instructions of the Dockerfile:
```shell
# -e overwrites line 8, --workdir line 14, --entrypoint line 17,
# and the arguments after the image name replace line 20
$ docker run --rm \
    -e COMMIT="a different value" \
    --workdir /home \
    --entrypoint sh \
    ${CI_REGISTRY_IMAGE}:${CI_COMMIT_SHORT_SHA} \
    -c "echo \$COMMIT"
```
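The way these overrides combine can be summarised with a small model. It is a simplified sketch for exec-form instructions only, not Docker's code; it encodes the documented rules that arguments after the image name replace CMD and that --entrypoint clears the image's default CMD.

```javascript
// Simplified model (exec form only) of the command a container runs.
function effectiveCommand(image, run = {}) {
  const overridesEntrypoint = run.entrypoint !== undefined;
  const entrypoint = overridesEntrypoint ? [run.entrypoint] : image.entrypoint;
  // --entrypoint clears the image's default CMD.
  const defaultArgs = overridesEntrypoint ? [] : image.cmd;
  // Arguments given after the image name replace CMD entirely.
  const args = run.args && run.args.length > 0 ? run.args : defaultArgs;
  return {
    workdir: run.workdir ?? image.workdir, // --workdir / -w replaces WORKDIR
    argv: [...entrypoint, ...args],
  };
}

// The image built from the Node.js Dockerfile (lines 14, 17 and 20).
const image = { workdir: "/src", entrypoint: ["node"], cmd: ["app.js"] };
```

With no overrides, effectiveCommand(image) gives node app.js in /src; with the overrides of the command above, it gives sh -c "echo $COMMIT" in /home.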
I use the shortened Git commit hash as the image tag to identify what code an image contains:
```shell
$ docker build -t ${CI_REGISTRY_IMAGE}:${CI_COMMIT_SHORT_SHA} .
```
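GitLab already exposes this value as CI_COMMIT_SHORT_SHA, documented as the first eight characters of CI_COMMIT_SHA. The sketch below only shows how the image reference relates to those predefined variables; the imageReference name and the registry path used in the example are made up.

```javascript
// Sketch: derives "<registry image>:<short hash>" from GitLab CI variables.
function imageReference(env) {
  const { CI_REGISTRY_IMAGE: repo, CI_COMMIT_SHA: sha } = env;
  if (!repo || !sha) {
    throw new Error("CI_REGISTRY_IMAGE and CI_COMMIT_SHA are required");
  }
  return `${repo}:${sha.slice(0, 8)}`; // same value as CI_COMMIT_SHORT_SHA
}
```

For a hypothetical registry path registry.example.com/group/app and a commit starting with 7c77eb36, this yields registry.example.com/group/app:7c77eb36, so the tag alone tells which commit an image was built from.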
I also pass this value to the docker build command with the --build-arg flag, so that I can copy it to an environment variable as explained in the previous section:
```shell
$ docker build -t ${CI_REGISTRY_IMAGE}:${CI_COMMIT_SHORT_SHA} \
    --build-arg COMMIT_SHORT_HASH=${CI_COMMIT_SHORT_SHA} .
$ docker push ${CI_REGISTRY_IMAGE}:${CI_COMMIT_SHORT_SHA}
```
The video below shows the whole package stage running:
The images have been built and pushed to the Container Registry. The last missing pieces of configuration are the Kubernetes manifests that allow the deploy stage to deploy the applications to the cluster with kubectl.
A Vercel-like PaaS beyond Jamstack with Kubernetes and GitOps, part IV: Kubernetes manifests