Jaume Sabater, CTO and systems engineer

Dockerfiles
A Dockerfile is a text document that contains a set of instructions used to build a Docker image. Docker images are the blueprints for containers. The Dockerfile acts as a recipe, specifying everything needed to assemble the image: the base operating system, software dependencies, environment variables, network configurations, and the application code itself.

By automating the image creation process, a Dockerfile ensures consistency across different environments, from development to production.

Here is a Dockerfile that creates an image based on postgres:17 and includes an initialization script:

FROM postgres:17

# Copy the initialization script into the container
COPY student.sql /docker-entrypoint-initdb.d/student.sql

The initialization script student.sql sits right next to the Dockerfile, and its contents could look like this:

CREATE TABLE IF NOT EXISTS student (
  id SERIAL PRIMARY KEY,
  name VARCHAR(50)
);
INSERT INTO student (id, name)
VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Charlie'),
       (4, 'David'), (5, 'Elisabeth'), (6, 'Ferdinand')
ON CONFLICT (id) DO NOTHING;

We would then build an image from this Dockerfile with the following command, where the trailing dot tells Docker to use the current directory as the build context:

docker build --tag mypostgres:latest .

Once the image has been built, you can list it using the following command:

docker images

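The automatic execution of SQL files placed inside /docker-entrypoint-initdb.d/ is a feature of the official PostgreSQL image, implemented by its maintainers in the entrypoint script. It is not a feature of Docker itself, and it only happens on the first startup, when the data directory is still empty.

Make sure you start from a clean slate before creating new containers that share a volume. A volume can be reused as long as the major version of PostgreSQL stays the same, but the initialization scripts will not run again on a volume that already holds data.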

Command                            Purpose
docker rm --force demo-postgres    Stop and remove a container
docker volume rm postgres-data     Remove an unused volume

Now that the image is built, we can run the container. If the volume does not exist yet, start by creating it:

docker volume create postgres-data
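
Because the initialization scripts only run against an empty data directory, it can help to check whether the volume already exists before starting the container. The following is a minimal sketch; volume_exists is a hypothetical helper name introduced here, not part of Docker:

```shell
# Hypothetical helper: succeeds if a Docker volume with the given name exists.
volume_exists() {
  docker volume inspect "$1" > /dev/null 2>&1
}
```

For example, volume_exists postgres-data || docker volume create postgres-data would only create the volume when it is missing.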

Then, run the container as we did previously:

docker run --name demo-postgres \
  --env POSTGRES_PASSWORD=mypassword \
  --env POSTGRES_USER=demouser \
  --env POSTGRES_DB=demodb \
  --publish 5432:5432 \
  --volume "postgres-data:/var/lib/postgresql/data" \
  --detach \
  mypostgres:latest
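
Since the container starts in the background, one way to confirm that student.sql was applied is to query the table from the host. This is a sketch that assumes the container, user and database names from the command above; run_query is a hypothetical helper name:

```shell
# Hypothetical helper: runs one SQL statement with psql inside the container.
# Assumes the container, user and database names used in the docker run command.
run_query() {
  docker exec demo-postgres \
    psql --username demouser --dbname demodb --command "$1"
}
```

Calling run_query 'SELECT id, name FROM student ORDER BY id;' is expected to list the rows inserted by the script, provided the volume was empty on first start. If the server is still initializing, the command may fail and can simply be retried.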

Here is a list of the most relevant instructions in a Dockerfile:

  • FROM: Specifies the base image for the Docker image.
  • RUN: Executes commands in a new layer. Used to install software or make system changes.
  • COPY: Copies files from the host machine into the Docker image.
  • ADD: Similar to COPY, but can also fetch files from remote URLs and automatically extracts local tar archives (including compressed ones).
  • WORKDIR: Sets the working directory for subsequent commands.
  • CMD: Specifies the default command to run when the container starts.
  • EXPOSE: Documents which ports the container will use. This is not a security feature.
  • ENV: Sets environment variables inside the container.
  • LABEL: Adds metadata to the image. Optional, but useful when documenting.
  • USER: Sets the user that subsequent instructions, and the container's main process, will run as.
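
To see a few of the instructions not used so far in context, here is a small sketch that writes an illustrative Dockerfile from the shell. The file name Dockerfile.sketch, the description label and the max_connections value are assumptions chosen for illustration only:

```shell
# Write an illustrative Dockerfile; the file name and all values are assumptions.
cat > Dockerfile.sketch <<'EOF'
FROM postgres:17

# Metadata describing the image
LABEL description="PostgreSQL with a preloaded student table"

# Default database name, read by the entrypoint on first startup
ENV POSTGRES_DB=demodb

# Relative paths in the following instructions resolve against this directory
WORKDIR /docker-entrypoint-initdb.d
COPY student.sql ./

# Document the port PostgreSQL listens on
EXPOSE 5432

# Default command, overriding one server setting at startup
CMD ["postgres", "-c", "max_connections=50"]
EOF
```

It could then be built with docker build --file Dockerfile.sketch --tag mypostgres:sketch . from a directory that contains student.sql.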

A more elaborate Dockerfile to build our own custom image could look like this:

FROM postgres:17

# Extend PostgreSQL to support storing, indexing and querying geographic data
RUN apt-get update && \
    apt-get install --yes --no-install-recommends postgresql-17-postgis-3 && \
    rm --recursive --force /var/lib/apt/lists/*

# Copy and extract the initialization scripts into the container
ADD init.tar.gz /docker-entrypoint-initdb.d/

# Document the port PostgreSQL is listening to inside the container
EXPOSE 5432

# Document the maintainer via metadata
LABEL maintainer="[email protected]"

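Each RUN, COPY and ADD instruction creates a new layer (i.e., a snapshot of the filesystem changes made by that instruction). Files removed in a later layer still take up space in the earlier one, so combining related operations, such as installing packages and cleaning up afterwards, into a single RUN instruction is considered a best practice: it keeps the image smaller and the number of layers low. Layers are also the unit of Docker’s build cache, so an unchanged instruction can be reused on the next build instead of being executed again.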
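
To see how instructions map to layers, the history of a built image can be inspected. A minimal sketch, assuming the image has been tagged mypostgres:latest; show_layers is a hypothetical helper name:

```shell
# Hypothetical helper: prints the size and the creating instruction of each layer.
show_layers() {
  docker history --format '{{.Size}}: {{.CreatedBy}}' "$1"
}
```

Running show_layers mypostgres:latest prints one line per layer, newest first, so the instructions from our own Dockerfile should appear at the top, above those inherited from the base image.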

Our init.tar.gz archive will consist of two files:

  1. A city.sql script that will install the postgis extension, create the city table and load some data into it.
  2. A student.sql script that will create the student table and load some data into it.

Our student.sql file could look the same as before:

-- student.sql
CREATE TABLE IF NOT EXISTS student (
  id SERIAL PRIMARY KEY,
  name VARCHAR(50)
);
INSERT INTO student (id, name)
VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Charlie'),
       (4, 'David'), (5, 'Elisabeth'), (6, 'Ferdinand')
ON CONFLICT (id) DO NOTHING;

Our new city.sql will install and make use of the PostGIS extension we added support for:

-- city.sql
CREATE EXTENSION IF NOT EXISTS postgis;
CREATE TABLE IF NOT EXISTS city (
  id SERIAL PRIMARY KEY,
  name TEXT NOT NULL,
  location GEOMETRY(POINT, 4326) NOT NULL
);
INSERT INTO city (name, location)
VALUES
  ('Barcelona', ST_GeomFromText('POINT(2.17340 41.38879)', 4326)),
  ('València', ST_GeomFromText('POINT(-0.37739 39.46975)', 4326)),
  ('Sevilla', ST_GeomFromText('POINT(-5.98376 37.38296)', 4326)),
  ('Palma', ST_GeomFromText('POINT(2.65024 39.56975)', 4326)),
  ('Maó', ST_GeomFromText('POINT(1.49675 39.65029)', 4326)),
  ('Ciutadella', ST_GeomFromText('POINT(4.00649 39.99439)', 4326)),
  ('Eivissa', ST_GeomFromText('POINT(1.44216 38.90629)', 4326))
ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name;

Because we are demonstrating the ADD instruction, which extracts local tar archives, we need to bundle the student.sql and city.sql files into an archive:

tar --create --gzip --file init.tar.gz city.sql student.sql
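
Before building, it may be worth checking that both scripts sit at the top level of the archive, since the entrypoint only picks up files placed directly in /docker-entrypoint-initdb.d/. A small sketch; list_archive is a hypothetical helper name:

```shell
# Hypothetical helper: lists the entries of a gzip-compressed tar archive.
list_archive() {
  tar --list --gzip --file "$1"
}
```

Running list_archive init.tar.gz should print city.sql and student.sql without any leading directory.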

We are now ready to use this Dockerfile and its associated archive to build our custom image:

docker build --tag mypostgres:latest .

Make sure you are not reusing an existing volume for this new PostgreSQL container. Otherwise, your SQL files will not be executed, because the initialization scripts only run when the data directory is empty.

Then, run the container:

docker run --name demo-postgres \
  --env POSTGRES_PASSWORD=mypassword \
  --env POSTGRES_USER=demouser \
  --env POSTGRES_DB=demodb \
  --publish 5432:5432 \
  --volume "postgres-data:/var/lib/postgresql/data" \
  --detach \
  mypostgres:latest

Once the container is running, we would interactively execute psql inside the container:

docker exec -it demo-postgres psql -U demouser -d demodb

And run the following query to find the closest available city to Granada (latitude: 37.17702, longitude: -3.59825):

SELECT name, ST_Distance(location, ST_SetSRID(ST_MakePoint(-3.59825, 37.17702), 4326)) AS distance
FROM city
ORDER BY location <-> ST_SetSRID(ST_MakePoint(-3.59825, 37.17702), 4326) ASC
LIMIT 1;

We would quit the interactive shell by pressing Ctrl + D or typing \q.
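
With SRID 4326, ST_Distance on geometry values returns a result in degrees. A hedged variant that reports kilometres instead casts both sides to geography. The sketch below only prints the SQL; nearest_city_query is a hypothetical helper name, its arguments are longitude and latitude, and the ordering is kept as in the query above:

```shell
# Hypothetical helper: prints a query for the city nearest to a given point.
# $1 = longitude, $2 = latitude (same argument order as ST_MakePoint).
nearest_city_query() {
  cat <<SQL
SELECT name,
       round(ST_Distance(
         location::geography,
         ST_SetSRID(ST_MakePoint($1, $2), 4326)::geography
       ) / 1000) AS distance_km
FROM city
ORDER BY location <-> ST_SetSRID(ST_MakePoint($1, $2), 4326)
LIMIT 1;
SQL
}
```

The output can be piped into psql without opening an interactive session, for example nearest_city_query -3.59825 37.17702 | docker exec -i demo-postgres psql -U demouser -d demodb.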
