How to optimize Node.js Docker image (Part 1)

October 1, 2020 | Last updated: February 16, 2024

Table of Contents

Introduction
Contents
Prerequisities
Preparations
Reducing the Number of Processes
Signal Handling
Caching Layers
Using ENTRYPOINT and CMD together

Build Docker images on a push to Git. Cache layers at no extra charge. 💸

Introduction

Docker is a containerization method that lets you pack applications and dependencies in so-called images and then run them basically on any operating system, in a similar manner to a virtual machine. You can run your app in a single container, run multiple containers with Docker Compose, heck, even go big time and scale your app on Docker Swarm or a Kubernetes cluster!

Don't worry, though – in this guide, we'll stick to the basics.

Let's say you've just built an amazing Node.js application and want to distribute it as a Docker Node image. You write a Dockerfile, build the image, and distribute the generated image on a registry such as Docker Hub.

"Not too shabby!", you utter to yourself. Then, you run docker pull, docker run and... start wondering why it's taking so long. Being the perfectionist you are, you ask yourself a question: is there anything you can do to speed things up? Is your image actually optimized to do the job?

Well, that's exactly what we'll cover in this article! However, since there's a lot of best practices for optimizing Docker images, we've split this article into several parts.

Part 1

This part of the guide covers the following issues:

Reducing the number of running processes
Handling signals in a proper manner
Making use of the build cache
Using ENTRYPOINT
Using EXPOSE to document exposed ports

Part 2

In the second part, we cover:

Reducing the Docker image filesize by:
- Removing obsolete files
- Using a lighter base image
Using labels (LABEL)
Adding semantics to labels
Linting your Dockerfile

Part 3

The last article published covers securing our Docker image, in which we'll cover:

The Principle of Least Privilege
Signing and verifying Docker Images
Using .dockerignore to ignore sensitive files (similar to .gitignore file)
Vulnerability Scanning

Although this article deals with a Node.js application, the best practices outlined here apply to apps written in other languages and frameworks as well.

Prerequisities

The first step is installing some basic stuff (which you probably already have):

Git and Docker Desktop installed for your operating system
(Optional) A Docker Hub account for pulling and pushing images

For the purpose of this guide Docker Desktop will suffice (no need to install Docker Compose and other advanced tools).

Preparations

In this guide we'll try to optimize a Dockerfile for a basic front-end application. The source code for the app is available at github.com/buddy-works/tutorials-docker-node-frontend.

For this guide, however, you only need the docker/basic branch:

git clone -b docker/basic https://github.com/d4nyll/docker-demo-frontend.git

With the repos cloned, open the Dockerfile in a text edit to see what lines (instructions) are already there:

FROM node
WORKDIR /root/
COPY . .
RUN npm install
RUN npm run build
CMD npm run serve

You can find all valid instructions at the Dockerfile reference page.

These instructions are pretty much self-explanatory for developers who already worked with Docker. Here's a brief overview if you need are new to the party:

FROM node – use the official Node Docker image as the base image
WORKDIR /root/ – run all subsequent Dockerfile's instructions in the specified directory (similar to running cd /root/ in your terminal)
COPY . . – copy everything from the build context to the current WORKDIR
RUN npm install – install dependencies specified JSON file (package.json and package-lock.json)
RUN npm build – build application with WebPack (as specified in package.json)
CMD npm run serve – specifies the first process to run inside the container (executed upon running the image)

Let's try building our image for the first time:

docker build -t demo-frontend:basic .
...
Removing intermediate container a3d5032b851b
 ---> 703e723acecf
Successfully built 703e723acecf
Successfully tagged demo-frontend:basic

You should now be able to see the demo-frontend:basic with the ID and image size when you run docker images:

docker images
REPOSITORY     TAG     IMAGE ID      SIZE
demo-frontend  basic   703e723acecf  939MB
node           latest  b18afbdfc458  908MB

With everything verified, you can launch the Docker container:

docker run --name demo-frontend demo-frontend:basic

> frontend@1.0.0 serve /root
> http-server ./dist/

Starting up http-server, serving ./dist/
Available on:
  http://127.0.0.1:8080
Hit CTRL-C to stop the server

Run npm run serve and you should see the application's (rather rudimentary) interface on the URL 127.0.0.1:8080:

Running application from Node image Running application from Node image

Reducing the Number of Processes

Examining internal processes

With the Docker container running, you can run docker exec on another terminal to see what processes are running inside our container.

docker exec demo-frontend ps -eo pid,ppid,user,args --sort pid
  PID  PPID USER  COMMAND
    1     0 root  /bin/sh -c npm run serve
    6     1 root  npm
   17     6 root  sh -c http-server ./dist/
   18    17 root  node /root/node_modules/.bin/http-server ./dist/
   25     0 root  ps -eo pid,ppid,user,args --sort pid

That's a lot of interconnected processes that are not needed to run our application – and each one takes up a huge chunk of the container's memory.

It would be ideal if we could just run node /root/node_modules/.bin/http-server ./dist/ command and nothing else.

The ps command is the same that we are running with docker exec and can be ignored.

Check out our Docker Commands Cheat Sheet for the list of the most useful Docker commands.

Avoid using npm script

It's best not to use npm as a Docker command because, as you saw above, npm invokes a sub-shell and executes the script inside that sub-shell, yielding a redundant process.

Instead, you should specify the command directly as the value of the CMD instruction:

FROM node
WORKDIR /root/
COPY . .
RUN npm install
RUN npm run build
CMD node /root/node_modules/.bin/http-server ./dist/

Next, try to stop your existing http-server instance by pressing CTRL+C. Hmmm, seems like it's not working, but why? Don't worry, we'll explain that shortly.

For now, launch and serapate terminal run the following commands to stop and remove the container:

docker stop demo-frontend
docker rm demo-frontend

With a clean slate, let's build a new image and run it:

docker build -t demo-frontend:no-npm .
docker run --name demo-frontend demo-frontend:no-npm

Once again, run docker exec in a separate terminal. You'll notice that the number of processes have been reduced from 4 to 2:

docker exec demo-frontend ps -eo pid,ppid,user,args --sort pid
  PID  PPID USER  COMMAND
    1     0 root  /bin/sh -c node /root/node_modules/.bin/http-server ./dist/
    6     1 root  node /root/node_modules/.bin/http-server ./dist/
   13     0 root  ps -eo pid,ppid,user,args --sort pid

If you calculate the real memory used by the Docker container before and after the change, you'll find that you saved ~16MB just by removing the superfluous npm and sh functions.

However, our node command is still being run inside the /bin/sh shell.

The quesion is: how to get rid of that shell and invoke node as the first and only process inside our container? To answer that, we must understand and use the exec form syntax in our Dockerfile.

Using the Exec Form

Docker supports two different syntaxes to specify Dockerfile: the shell form, which is what we've been using, and the exec form.

The exec form specifies the command, its options, and arguments in the form of a JSON array, rather than a simple string.

Translated into exec, our Dockerfile would look like this:

FROM node
WORKDIR /root/
COPY . .
RUN ["npm", "install"]
RUN ["npm", "run", "build"]
CMD ["node", "/root/node_modules/.bin/http-server" , "./dist/"]

Shell vs. Exec Form

The practical difference is that with the shell form, Docker will implicitly invoke a shell and run the CMD command inside that shell (this is what we saw earlier).

With the exec form, the command we specified is run directly, without first invoking the shell.

Again, stop and remove the existing demo-frontend Docker container:

docker stop demo-frontend && docker rm demo-frontend

Let's rebuild the Dockerfile into the exec form:

docker build -t demo-frontend:exec .

Next, run the image:

docker run --name demo-frontend demo-frontend:exec

Finally, run docker exec to query the container's processes:

docker exec demo-frontend ps -eo pid,ppid,user,args --sort pid
  PID  PPID USER     COMMAND
    1     0 root     node /root/node_modules/.bin/http-server ./dist/
   12     0 root     ps -eo pid,ppid,user,args --sort pid

Now, the only process running in the container is the node process.

Congratulations! We have successfully reduced the number of running processes to just one.

Signal Handling

However, saving a single process is not the reason why we prefer the exec form over the shell form. The real reason is because of signal handling.

On Linux, different processes can communicate with each other through inter-process communication (IPC). One method of IPC is signalling.

If you use the command line, you've probably used signals without realizing it. For example, when you press CTRL+C in the terminal, you're actually instructing the kernel to send a SIGINT signal to the process, requesting it to stop.

Remember when we tried to stop our container with CTRL+C and it didn't work? Let's try that again with the image running as demo-frontend:exec. This time, the http-server stops successfully:

docker run --name demo-frontend demo-frontend:exec
Starting up http-server, serving ./dist/
Available on:
  http://127.0.0.1:8080
  http://172.17.0.2:8080
Hit CTRL-C to stop the server
^Chttp-server stopped.

Why did it work now, but not before? This is because when we send the SIGINT signal from our terminal, we are actually sending it to the first process run inside the container. This process is known as the init process, and has the PID of 1.

Therefore, the init process must have the ability to listen for the SIGINT signal. When it receives the signal, it must try to shut down gracefully. For example, a web server must stop accepting any new requests, wait for any remaining requests to finish, and only then exit. Makes sense now.

With the shell form, the init process is /bin/sh. When /bin/sh receives the SIGINT signal, it'll simply ignore it. Therefore, our container and the http-server process won't be stopped.

When we run docker stop demo-frontend, the Docker daemon similarly sends a SIGTERM signal to the container's init process, yet again, /bin/sh ignores it. After around 10 seconds, the Docker daemon realizes the container is not responding to the SIGTERM signal, and issues a SIGKILL sugnal instead, which forcefully kills the process.

time docker stop demo-frontend

real  0m10.443s
user  0m0.072s
sys      0m0.022s

The SIGKILL signal cannot be handled; this means processes within the container do not get a chance to shut down gracefully. For a web server, it might mean that existing requests won't have a chance to run to completion, and your client might have to retry that request again.

In comparison, when we use the exec form, the init process is node and it will handle the SIGINT and SIGTERM signals. You can either include a process.on('SIGINT') handler yourself, or the default one will be used. The point is, with node as the first command, you have the ability to catch signals and handle them.

To demonstrate, with the new image built using the exec form Dockerfile, the container can be stopped in under half a second.

time docker stop demo-frontend

real  0m0.420s
user  0m0.053s
sys      0m0.026s

If the application you are running cannot handle signals, you should run the container with the --init flag. The flag executes Tini as its first process, a minimalistic init system which, unlike sh, can handle and propagate signals.

Caching Layers

So far, we've looked at techniques that improve the function of our Docker image whilst it's running. In this section, we'll look at how we can use Docker's build cache to make the build process faster.

Here's what happens when we're building the Docker image:

Docker runs the base image as a container.
Executes each instruction sequentially on top of it.
Saves the resulting state of the container in a layer.
Uses that layer as the base image for the next instruction.
Repeats the process to build the image layer by layer.

You can conceptualize a layer as a diff from the previous layer.

Docker layer caching (c) Docker Inc. Docker layer caching (c) Docker Inc.

However, pulling or building an image from scratch on every run can be very time-consuming. This is why Docker attempts to use an existing cached layer whenever possible. In other words, if Docker determines that the next instruction will yield the same result as an existing layer, it will use the cached layer.

For example, let's say we've updated something inside the src directory; when we run the build command again, Docker will use the cached layer associated with the FROM node and WORKDIR /root/ instructions.

When it gets to the COPY instruction, however, it will notice that the source code has changed, invalidate the cached layer, and build it from scratch. This will also invalidate every layer that comes after it. Therefore, every instruction after the COPY instruction must be built again.

In this instance, this build process takes about 10 seconds:

time docker build -t demo-frontend:exec .
Sending build context to Docker daemon    511kB
Step 1/6 : FROM node
 ---> a9c1445cbd52
Step 2/6 : WORKDIR /root/
 ---> Using cache
 ---> 7ac595062ce2
Step 3/6 : COPY . .
 ---> 3c2f3cfb6f92
Step 4/6 : RUN ["npm", "install"]

...
Successfully built 326bf48a8488
Successfully tagged demo-frontend:exec

real  0m10.387s
user  0m0.187s
sys   0m0.089s

Still, making a small change in the source code (e.g. fixing a typo) shouldn't affect the dependencies of our application, and so there's really no need to run npm install again. However, because the cache is invalidated in an earlier step, every subsequent step must be re-ran from scratch.

To optimize this, we should copy only what is needed for the immediate next step. This means if the next step is installs dependencies with npm, we should COPY only the package.json and package-lock.json, and nothing else.

Update our Dockerfile to copy only what is needed for the next immediate step:

FROM node
WORKDIR /root/
COPY ["package.json", "package-lock.json", "./"]
RUN ["npm", "install"]
COPY ["webpack.config.js", "./"]
COPY ["src/", "./src/"]
RUN ["npm", "run", "build"]
CMD ["node", "/root/node_modules/.bin/http-server" , "./dist/"]

By copying only what is needed immediately, we allow more layers of the image to be cached. Now, if we update the /src directory again, every instruction and layer up until COPY ["src/", "./src/"] is going to be cached:

time docker build -t demo-frontend:cache .
Step 1/8 : FROM node
Step 2/8 : WORKDIR /root/
 ---> Using cache
Step 3/8 : COPY ["package.json", "package-lock.json", "./"]
 ---> Using cache
Step 4/8 : RUN ["npm", "install"]
 ---> Using cache
Step 5/8 : COPY ["webpack.config.js", "./"]
 ---> Using cache
Step 6/8 : COPY ["src/", "./src/"]
Step 7/8 : RUN ["npm", "run", "build"]
...
Successfully tagged demo-frontend:cache

real    0m3.175s
user    0m0.193s
sys    0m0.132s

And again: instead of taking ~10 seconds to build, it takes only ~3 seconds. Of course, the mileage may vary for every user, but using the cache will always be faster.

You can find more details on caching, including how Docker determines when a cache is invalidated, on the official Dockerfile Best Practices page.

Using ENTRYPOINT and CMD together

Right now, the command executed on run is specified by the CMD instruction. This command can be overridden by the user of the image (command invoker). For example, if I want to use a different port, e.g. 4567 instead of the default 8080, then I can run:

docker run --name demo-frontend demo-frontend:cache node /root/node_modules/.bin/http-server ./dist/ -p 4567

Starting up http-server, serving ./dist/
Available on:
  http://127.0.0.1:4567
  http://172.17.0.2:4567
Hit CTRL-C to stop the server

However, we have to specify the whole command in its entirety. This requires the image user to know where the executable is located within the container (i.e. /root/node_modules/.bin/http-server).

All in all, we should make it as easy as possible for the user to run our application. How about running a containerized application in the same way as the non-containerized application?

Let's do it!

Instead of using only CMD, we can use the ENTRYPOINT instruction to specify the default command and options to run, and use the CMD instruction to specify any additional options that are commonly overridden.

Update the Dockerfile so that it makes use of the ENTRYPOINT instruction.

FROM node
WORKDIR /root/
COPY ["package.json", "package-lock.json", "./"]
RUN ["npm", "install"]
COPY ["webpack.config.js", "./"]
COPY ["src/", "./src/"]
RUN ["npm", "run", "build"]
ENTRYPOINT ["node", "/root/node_modules/.bin/http-server" , "./dist/"]

Now, build the image with the entrypoint instruction:

docker build -t demo-frontend:entrypoint .

Using this method, the user can run the image as if it was the http-server command, and does not need to know the underlying file structure of the container.

docker run --name demo-frontend demo-frontend:entrypoint -p 4567

The command specified by the ENTRYPOINT instruction can also be overridden using the --entrypoint flag. For example, if we want to run the /bin/sh shell inside the container, you can run:

docker run --name demo-frontend -it --entrypoint /bin/sh demo-frontend:entrypoint

# hostname
1b64852541eb

Using EXPOSE to document exposed ports

Lastly, let's finish up the first part of this article with some documentation. By default, our http-server listens on port 8080; however, a user of our image won't know this without looking up the http-server documentation. Likewise, if we are running our own application, the user would have to look inside our implementation code to know which port the application listens on.

We can make it easier for the user by using an EXPOSE instruction to document which ports and protocol (TCP or UDP) the application expects to listen on. This way, the user can easily figure out which ports needs to be published.

FROM node
WORKDIR /root/
COPY ["package.json", "package-lock.json", "./"]
RUN ["npm", "install"]
COPY ["webpack.config.js", "./"]
COPY ["src/", "./src/"]
RUN ["npm", "run", "build"]
ENTRYPOINT ["node", "/root/node_modules/.bin/http-server" , "./dist/"]
EXPOSE 8080/tcp

Once again, build the image, this time with the expose instruction:

docker build -t demo-frontend:expose .

Now a user can see which ports are exposed either by looking at the Dockerfile, or by inspecting the image:

docker inspect --format '{{range $key, $value := .ContainerConfig.ExposedPorts}}{{ $key }}{{end}}' demo-frontend:expose
8080/tcp

Note that the EXPOSE instruction does not publish the port. If the user wishes to publish the port, they would have to either:

use the -p flag on docker run to individually specify each host-to-container port mapping
use the -P flag to automatically map all exposed container port(s) to an ephemeral high-ordered host port(s)

Summary

By following the 5 best practices outlined above, we have improved our Dockerfile and Docker image. However, this is only the beginning! Keep an eye out for the next part of this article, where we will reduce the Docker image size, learn to use labels, and lint our Dockerfile. 💪

Optimize Docker deployment with Buddy. Save time and get the job done. ⏰

Try Buddy for Free

Additional resources

Daniel Li

Staff Software Engineer @ Zinc Work

Daniel Li is a DevOps Engineer and Fullstack Node.js Developer, working with AWS, Ansible, Terraform, Docker, Kubernetes, and Node.js. He is the author of the book Building Enterprise JavaScript Applications, published by Packt.

Read similar articles

Docker Commands Cheat Sheet

Check out our tutorial

A beginner's guide to configuring a Discord Bot in Node.js

Check out our tutorial

Integration testing for AWS Lambda in Go with Docker-compose

Check out our tutorial