iJS Blog

International Javascript Conference
28
Sep

Manage infrastructure with Node.js – Part 4

Anyone developing an application in Node.js not only has to deal with the code itself. It also involves connecting infrastructure, including databases, and the execution layer with Docker. How does that work?

In the past three parts of this series, we created an application written in Node.js that provides an API for managing a simple task list. The API deliberately does not use REST, but only separates writing from reading. Writing operations are based on POST, and reading on GET. The actual semantics were shifted to the path of the URL, so that the domain intention is preserved and the API is far more comprehensible than if it were to rely solely on the four technical verbs typically used in REST. The current version of the application contains three routes: two for noting and checking off tasks, and one for listing all unfinished tasks:

  • POST /note-todo
  • POST /tick-off-todo
  • GET /pending-todos

The separation of write and read operations is also continued in the code, where the same domain terms appear. A distinction is made between functions that change the state of the application and functions that read and return the current state. The application thus implements a simple form of the CQRS design pattern [1], which recommends this separation.
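The idea can be illustrated with a minimal sketch. It is not the code from the earlier parts of this series: the commands and queries dictionaries, the dispatch helper, and the counter-based IDs are assumptions made for illustration, and a plain array stands in for the real store.

```javascript
'use strict';

// Hypothetical in-memory state, standing in for the real store.
const items = [];
let nextId = 1;

// Commands change the state and return nothing.
const commands = {
  'POST /note-todo' ({ title }) {
    items.push({ id: String(nextId), title });
    nextId += 1;
  },

  'POST /tick-off-todo' ({ id }) {
    const index = items.findIndex(item => item.id === id);

    if (index === -1) {
      throw new Error('Todo not found.');
    }

    items.splice(index, 1);
  }
};

// Queries return the state and change nothing.
const queries = {
  'GET /pending-todos' () {
    return [ ...items ];
  }
};

// Hypothetical helper that looks up the handler for a method and a path.
const dispatch = function (method, path, payload = {}) {
  const route = `${method} ${path}`;
  const handler = commands[route] || queries[route];

  if (!handler) {
    throw new Error(`Unknown route '${route}'.`);
  }

  return handler(payload);
};
```

Calling dispatch('POST', '/note-todo', { title: 'Buy milk' }) changes the state without returning anything, while dispatch('GET', '/pending-todos') only returns a copy of it. Each route name states what happens in domain terms.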

From in-memory to database

Data storage is currently RAM-based, using an array. The core is the file ./lib/Todos.js, in which the class Todos is implemented. Its constructor initializes an array, which is then modified by the noteTodo and tickOffTodo functions and read by getPendingTodos. Since the functions are already marked as async, it is easy to replace this class with an equivalent variant that accesses a database.

For this purpose, the application will be extended in a plug-in-based manner so that you can decide at startup which data storage to use. It’s helpful that there is only one instance of the Todos class in the entire application, which is created globally in the ./lib/getApp.js file and then only passed on to other functions. The first step is to rename the Todos class to InMemoryTodos, rename the file, and move it to the new ./lib/stores directory.

Next, the ./lib/getApp.js file needs to be adjusted to access the new file and instantiate the renamed class. Apart from adjusting the call to the require function, this only affects a single line:

const todos = new InMemoryTodos();

Next, it’s necessary to ensure that the desired store can be selected and configured from the outside. To ensure that all stores can be created in the same way, a new object is introduced as a parameter in the InMemoryTodos constructor, which can be used to pass any data. Since the InMemoryTodos class does not expect any parameters, the object remains empty at this point, as Listing 1 shows.

class InMemoryTodos {
  constructor ({}) {
    this.items = [];
  }
 
  // ...
}

At first glance, this change seems pointless, but it ensures that a store must always be called with an object and no special handling is required for stores that do not expect parameters. The call in the ./lib/getApp.js file changes as follows:

const todos = new InMemoryTodos({});
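For orientation, the complete store might look roughly like the following sketch. The method bodies are assumptions based on the description above, not the code from the earlier parts. In particular, the sketch generates IDs with crypto.randomUUID, which is only available in newer Node.js versions, so that it runs without additional modules.

```javascript
'use strict';

const { randomUUID } = require('crypto');

class InMemoryTodos {
  // The options object is accepted for symmetry with other stores, but unused.
  constructor ({}) {
    this.items = [];
  }

  // Nothing to set up for RAM-based storage. The function exists so that
  // all stores can be initialized in the same way.
  async initialize () {}

  async noteTodo ({ title }) {
    this.items.push({ id: randomUUID(), timestamp: Date.now(), title });
  }

  async tickOffTodo ({ id }) {
    const index = this.items.findIndex(item => item.id === id);

    if (index === -1) {
      throw new Error('Todo not found.');
    }

    this.items.splice(index, 1);
  }

  async getPendingTodos () {
    return this.items;
  }
}
```

Because the constructor destructures its parameter, calling new InMemoryTodos() without an argument throws a TypeError. This enforces the rule that a store is always created with an options object, even an empty one.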

However, the actual parameters should not be hardcoded in the code. With a real database, this would not make sense, since you would want to be able to change the connection string from the outside and it is not advisable to store sensitive data such as access data in your code. Instead, the parameters should be set via the environment variable STORE_OPTIONS. Additionally, the type of store should be configurable by the environment variable STORE_TYPE. The in-memory store and an empty object serve as the default values.

To do this, the ./app.js file must be adapted, in which the port is already read from an environment variable. It is convenient that the processenv module converts the contents of the environment variable into the appropriate type, in the case of STORE_OPTIONS into the type object. Then the two parameters must be passed to the getApp function (Listing 2).

'use strict';
 
const getApp = require('./lib/getApp');
const http = require('http');
const { processenv } = require('processenv');
 
(async () => {
  const port = processenv('PORT', 3_000);
 
  const storeType = processenv('STORE_TYPE', 'InMemory');
  const storeOptions = processenv('STORE_OPTIONS', {});
 
  const server = http.createServer(await getApp({ storeType, storeOptions }));
 
  server.listen(port);
})();

In getApp, the two parameters must be received and evaluated. To do this, all available stores are registered in a dictionary called Todos; the appropriate entry is then selected based on the value of storeType and instantiated with storeOptions. It pays off here that the initialization is performed in the asynchronous function initialize, since some databases require asynchronous initialization, as can be seen in Listing 3.

// ...
 
const getApp = async function ({ storeType, storeOptions }) {
  const Todos = {
    InMemory: InMemoryTodos
  };
 
  const todos = new Todos[storeType](storeOptions);
 
  await todos.initialize();
 
  // ...
 
  return app;
};
 
module.exports = getApp;
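One detail is worth considering: if STORE_TYPE contains a typo, Todos[storeType] is undefined, and the new call fails with a rather cryptic TypeError. A small guard could report the problem more clearly. The following is a sketch only: the createStore helper is hypothetical, and the stripped-down InMemoryTodos class merely stands in for the real store.

```javascript
'use strict';

// Placeholder store for illustration only; the real stores live in ./lib/stores/.
class InMemoryTodos {
  constructor ({}) {
    this.items = [];
  }

  async initialize () {}
}

const Todos = {
  InMemory: InMemoryTodos
};

// Hypothetical helper: creates and initializes the requested store, and fails
// early with a readable message if the store type has not been registered.
const createStore = async function ({ storeType, storeOptions }) {
  if (!Object.prototype.hasOwnProperty.call(Todos, storeType)) {
    const knownTypes = Object.keys(Todos).join(', ');

    throw new Error(`Unknown store type '${storeType}', expected one of: ${knownTypes}.`);
  }

  const todos = new Todos[storeType](storeOptions);

  await todos.initialize();

  return todos;
};
```

The check uses hasOwnProperty instead of a simple truthiness test so that inherited property names such as constructor are not mistaken for registered stores.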

With that, everything is ready to go. Adding support for a concrete database is limited to writing the appropriate store, storing it in ./lib/stores/, importing it into ./lib/getApp.js, and registering it in Todos. Everything else can be controlled from the outside. For example, if you want to support a MongoDB database, you first need to install the mongodb module:

$ npm install mongodb

Then, the ./lib/stores/MongoDbTodos.js file must be created, which contains the class visible in Listing 4. Before MongoDB can be used, the new type must be imported and registered in the ./lib/getApp.js file. While only a common require is needed for the import, the registration is done by adding the new class to the Todos listing:

const Todos = {
  InMemory: InMemoryTodos,
  MongoDb: MongoDbTodos
};

'use strict';
 
const { MongoClient } = require('mongodb');
const { v4 } = require('uuid');
 
class MongoDbTodos {
  constructor ({ url, databaseName }) {
    this.url = url;
    this.databaseName = databaseName;
  }
 
  async initialize () {
    const client = await MongoClient.connect(this.url, {
      useUnifiedTopology: true
    });
    const database = client.db(this.databaseName);
 
    this.collection = database.collection('todos');
  }
 
  async noteTodo ({ title }) {
    const id = v4();
    const timestamp = Date.now();
 
    const todo = {
      id,
      timestamp,
      title
    };
 
    await this.collection.insertOne(todo);
  }
 
  async tickOffTodo ({ id }) {
    const { deletedCount } = await this.collection.deleteOne({ id });
 
    if (deletedCount !== 1) {
      throw new Error('Todo not found.');
    }
  }
 
  async getPendingTodos () {
    return await this.collection.find({}).toArray();
  }
}
 
module.exports = MongoDbTodos;
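Since every store is expected to offer the same four functions, a simple check can catch an incomplete store before the application starts serving requests. The following sketch assumes a hypothetical getMissingFunctions helper; it is not part of the application described here.

```javascript
'use strict';

// The functions every store provides in this article.
const requiredFunctions = [ 'initialize', 'noteTodo', 'tickOffTodo', 'getPendingTodos' ];

// Hypothetical helper: returns the names of the functions a store is missing.
const getMissingFunctions = function (store) {
  return requiredFunctions.filter(name => typeof store[name] !== 'function');
};
```

Such a check could run in getApp right after the store has been instantiated, throwing an error if the returned array is not empty.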

The application can be launched as usual. Calling $ node app.js will cause the application to continue running with the in-memory store. However, if you set the two environment variables to the desired values, you can also connect to a MongoDB database. The easiest way to do this is to specify the environment variables when calling the process:

$ STORE_TYPE=MongoDb \
  STORE_OPTIONS='{"url":"mongodb://localhost:27017/","databaseName":"test"}' \
  node app.js

Note that the string passed using STORE_OPTIONS must represent a valid serialized JSON object, otherwise the parsing will fail. Unless a MongoDB database happens to be accessible at the specified address, the connection attempt typically fails with an ECONNREFUSED error and the application terminates.
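What a valid serialized JSON object means in practice can be shown with a small sketch. It does not reproduce how the processenv module works internally; the parseStoreOptions helper and its error messages are assumptions, and the parsing relies on plain JSON.parse.

```javascript
'use strict';

// Hypothetical helper: parses the raw value of STORE_OPTIONS and falls back
// to an empty object if the variable is not set.
const parseStoreOptions = function (rawValue) {
  if (rawValue === undefined) {
    return {};
  }

  let options;

  try {
    options = JSON.parse(rawValue);
  } catch {
    throw new Error('STORE_OPTIONS must contain valid JSON.');
  }

  // JSON.parse also accepts numbers, strings, arrays and null, so the type
  // has to be checked separately.
  if (typeof options !== 'object' || options === null || Array.isArray(options)) {
    throw new Error('STORE_OPTIONS must contain a JSON object.');
  }

  return options;
};
```

With this helper, parseStoreOptions(process.env.STORE_OPTIONS) accepts the value from the command above, but rejects a value such as {url:'x'}, because JSON requires double quotes around keys and strings. This is also why the value is wrapped in single quotes on the command line.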

For easy testing, a MongoDB instance can be started with the help of Docker. Of course, Docker must be installed and configured on the respective system. In this case, the following command is sufficient to start MongoDB:

$ docker run -d -p 27017:27017 --name todos-mongodb mongo

If you now run the aforementioned command again, the API runs with MongoDB as the store for the backend. To stop the Docker container again and clean it up, use the following commands:

$ docker kill todos-mongodb
$ docker rm -v todos-mongodb

Running the application in Docker

The next step is to package the application into a Docker image so that it can be run not only on the local system but also in Kubernetes at a cloud provider, for example. To ensure that this is done in the same way for each build, it is recommended to write a Dockerfile that contains the build configuration. The first hurdle is choosing a suitable base image. An official base image for Node.js is available on Docker Hub [2], but there are a number of variants.

If you retrieve the node:14.14.0 image, you will have a Node.js installation based on Debian Stretch. This is not wrong in principle, but the image is anything but small at 943 MBytes. Downloading close to 1 GByte for the deployment of a simple application is not efficient. Therefore, it is recommended to use the node:14.14.0-alpine image instead, which requires about one-tenth of the disk space with 117 MBytes.

In general, Alpine Linux has proven to be a good starting point for Docker images. Nevertheless, it should be noted that it can lead to problems under certain circumstances, since Alpine is not based on the GNU C library or a compatible variant [3] like many other Linux distributions, but on the musl C library [4]. For Node.js this doesn’t matter, but for other software, it does to some extent.

Another important point is to always specify the full version of the Docker image. For example, the image node:14.14.0-alpine can also be addressed as node:14.14-alpine, node:14-alpine or node:alpine. However, as soon as there are newer versions, the latter tags refer to a new image at once. For the sake of build reproducibility, you should avoid this approach and always specify a tag that is as specific as possible.

After these considerations, the first line of the ./Dockerfile file looks like this:

FROM node:14.14.0-alpine

The next step is to copy the code into the image. When doing this, it is important to make sure that you do not copy the node_modules directory as well, because not everyone develops based on a Linux system, and not all dependencies are written in JavaScript. Since dependencies compiled on macOS will not run on Linux, for example, you should avoid adding the node_modules directory and run an npm install inside the image.

The easiest way to do this is to create a ./.dockerignore file with the following content to exclude the node_modules directory as well as the .git directory (the latter is not needed to run the application, but may be very large and inflate the image unnecessarily):

.git
node_modules

To copy the code to the image and run npm install, just make the following adjustment in the Dockerfile:

FROM node:14.14.0-alpine
 
ADD . /app/
RUN cd /app
RUN npm install

This code, although very simple, will not work due to a bug and violates several recommendations at once.

The first thing to do is fix the error. The problem is the line where cd /app supposedly changes to the directory into which the code was copied. The directory is in fact changed, but only for that one statement: each RUN statement starts again in the working directory of the image, which is the root directory by default, so the change of directory has no effect on the subsequent call of npm install. Instead, the two calls must be combined into one:

FROM node:14.14.0-alpine
 
ADD . /app/
 
RUN cd /app && \
  npm install

If you call the command $ docker build ., the application image is built. It is not very efficient yet, nor optimized in any way, but it works in principle (it cannot be started yet, but more about that later).

The first optimization that can be done is to install only those dependencies listed in the dependencies section of package.json, and not the devDependencies. To install only the dependencies that are needed in production, you need to modify the Dockerfile as follows:

FROM node:14.14.0-alpine
 
ADD . /app/
RUN cd /app && \
  npm install --production

If you build Docker images frequently, you will notice that Docker tries to cache as many build steps as possible and reuse them in the next build. To do this, Docker must be able to assume that a build step will return the same result when it is re-executed. This means that the ADD statement invalidates the cache, along with all subsequent build steps, as soon as a single file has changed. As a result, even if you only correct a typo in an application message, npm install will run again, which costs time and is annoying.

It would be much more practical if npm install actually only ran when there was a change to the ./package.json file. Since such changes occur much less frequently than changes to the code, this would dramatically speed up the build in most cases. This can be achieved by first copying only the ./package.json file into the image, and copying the rest of the application later (Listing 5).

FROM node:14.14.0-alpine
 
ADD ./package.json /app/package.json
 
RUN cd /app && \
  npm install --production
 
ADD . /app/

As long as the ./package.json file remains untouched, the first ADD statement does not invalidate the cache, so Docker assumes that the result of the subsequent RUN statement is unchanged as well and falls back on the cached state. What is still missing is a statement that tells Docker how to start the application, as shown in Listing 6.

FROM node:14.14.0-alpine
 
ADD ./package.json /app/package.json
 
RUN cd /app && \
  npm install --production
 
ADD . /app/
 
CMD [ "node", "/app/app.js" ]

By default, Docker uses the root user with administrative privileges to run the code in a Docker image. This is rarely necessary in daily use and can even be downright dangerous. Therefore, it is advisable to use a user with limited privileges. Fortunately, such a user is already included in the node base image; you just have to activate it. This is done with the USER statement; the user is called node. However, you have to be careful where you copy the code to, so that the user has access rights to it. It is a good idea to use a directory within the home directory of the node user. If you also set the application directory as the working directory with WORKDIR (Listing 7), you no longer have to spell out the full paths within the image.

FROM node:14.14.0-alpine
 
ADD ./package.json /home/node/app/package.json
WORKDIR /home/node/app/
 
RUN npm install --production
ADD . /home/node/app/
 
USER node
 
CMD [ "node", "app.js" ]

Now you have to create a container from the image and start it. For this purpose, it is advisable to assign a name to the image during the build so that you don’t have to laboriously identify it via the hash:

$ docker build -t thenativeweb/todolist .

Now you can create and start a container. It is important to publish the required port; otherwise, access to the container will not work:

$ docker run \
  -d \
  -p 3000:3000 \
  thenativeweb/todolist

If you want to change the port within the container, for example to 4000, but on the host system it should remain mapped to 3000, two things are necessary: First, the port forwarding must be adjusted, and second, the environment variable PORT must be set to the value 4000 within the container:

$ docker run \
  -d \
  -p 3000:4000 \
  -e PORT=4000 \
  thenativeweb/todolist

Node.js itself does not distinguish between development and production mode, but many modules evaluate the NODE_ENV environment variable and assume development mode if it is not set. To switch them to production mode, set NODE_ENV to production (Listing 8).

$ docker run \
  -d \
  -p 3000:4000 \
  -e PORT=4000 \
  -e NODE_ENV=production \
  thenativeweb/todolist
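Your own code can evaluate NODE_ENV in the same way. The following sketch is an illustration only: the isProduction and getLogLevel helpers as well as the log level names are hypothetical and not part of the application.

```javascript
'use strict';

// Hypothetical helper: reports whether the process was started in production
// mode. The environment is passed in so that the function can be tested
// without modifying process.env.
const isProduction = function (env = process.env) {
  return env.NODE_ENV === 'production';
};

// Example: use verbose logging everywhere except in production.
const getLogLevel = function (env = process.env) {
  return isProduction(env) ? 'info' : 'debug';
};
```

If NODE_ENV is not set at all, isProduction returns false, which matches the convention that development is the default.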

What’s still missing is a reasonable handling of signals that the operating system sends to the container if necessary. It’s the task of the process with ID 1 to take care of this. On Unix-like operating systems, this process is called the init process. In a Docker container, the process that is started first gets ID 1, and in this case, that is Node.js. However, Node.js is not designed to run as an init process, so it is advisable to use an additional small init process. For this purpose, tini [5] is available, which has been directly integrated into Docker since Docker 1.13. To use it, it is sufficient to call Docker with the --init parameter, as can be seen in Listing 9.

$ docker run \
  -d \
  -p 3000:4000 \
  -e PORT=4000 \
  -e NODE_ENV=production \
  --init \
  thenativeweb/todolist
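Since tini forwards signals to the Node.js process, the application itself can also react to them, for example to stop accepting new connections before it exits. The following is a minimal sketch under the assumption of a hypothetical registerShutdown helper; it is not part of the application as described in this series.

```javascript
'use strict';

// Hypothetical helper: when the container is asked to stop, the server stops
// accepting new connections and the process exits once the server has closed.
// The process object is passed in so that the function can be tested.
const registerShutdown = function (server, proc = process) {
  const shutdown = function () {
    server.close(() => {
      proc.exit(0);
    });
  };

  // docker stop sends SIGTERM; Ctrl+C in a terminal sends SIGINT.
  proc.once('SIGTERM', shutdown);
  proc.once('SIGINT', shutdown);
};
```

In ./app.js, the helper would be called as registerShutdown(server) right after server.listen(port). Note that docker kill sends SIGKILL by default, which a process cannot handle, so this only takes effect with docker stop or an explicitly chosen signal.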

Additionally, STORE_TYPE and STORE_OPTIONS can also be set as environment variables in the container via the -e parameter to connect to MongoDB.

Outlook

This concludes the fourth part of this series on Node.js. The application is developed and executable in a first version, verified by code analysis and tests, supports different databases for storing tasks via a plug-in system, and can be packaged into a Docker image and executed as a Docker container.

The next step is to develop a client for the application so that access does not always have to be via curl or similar tools. This will be the topic of the fifth and last part of this series.

The author’s company, the native web GmbH, offers a free video course on Node.js [6] with close to 30 hours of playtime. Episode 16 of this video course deals with the topics covered in this article, such as packaging a Node.js application in a Docker image. This course is recommended for anyone interested in more details.

Links & Literature

[1] https://www.youtube.com/watch?v=k0f3eeiNwRA&t=1s

[2] https://hub.docker.com/_/node/

[3] http://www.etalabs.net/compare_libcs.html

[4] http://www.musl-libc.org

[5] https://github.com/krallin/tini

[6] https://www.thenativeweb.io/learning/techlounge-nodejs
