Streamlining Database Management: Deploying PostgreSQL with Docker and Python

Introduction

Docker is an open-source platform that enables developers to automate the deployment, scaling, and management of applications using containerisation. It provides the capability to create, deploy, and manage containers, ensuring that applications run consistently across different environments. The key advantages of using Docker are:

  1. Consistency Across Environments: Docker ensures your application runs the same way in development, testing, and production environments. Containers encapsulate all dependencies, configurations, and code, making it easier to move applications across environments without compatibility issues.

  2. Horizontal Scaling: Docker makes it straightforward to scale applications horizontally by adding more containers to handle increased load. This is especially useful for microservices architectures where different components can be scaled independently.

  3. Resource Utilization: Containers share the host OS kernel and resources, making them more lightweight and efficient than traditional virtual machines. They start up quickly and use less memory and CPU.

  4. Process Isolation: Docker provides a high level of isolation between containers, enhancing security by containing vulnerabilities within a single container.

In this article, you’ll learn how to deploy a PostgreSQL database with Docker and access it with psycopg2 in Python.

Postgres Initialization

I initialized the database using an init.sql file.

-- Create schema
CREATE SCHEMA IF NOT EXISTS MOVIE_DATA;

-- create and populate tables
create table if not exists MOVIE_DATA.NETFLIX_DATA
(
    user_id serial primary key,
    subscription varchar not null,
    join_date varchar,
    last_payment_date varchar,
    country varchar,
    age int,
    gender varchar,
    device varchar,
    plan_duration varchar
);

COPY MOVIE_DATA.NETFLIX_DATA (user_id, subscription, join_date, last_payment_date, country, age, gender, device, plan_duration)
FROM '/data/data.csv' DELIMITER ',' CSV HEADER;

The provided code snippet defines a schema and populates a table in PostgreSQL. Let’s break it down.

CREATE SCHEMA IF NOT EXISTS MOVIE_DATA;

This line creates a schema named MOVIE_DATA. A schema is a namespace within a database that groups related tables and other objects. The IF NOT EXISTS clause ensures the schema is created only if it does not already exist.

create table if not exists MOVIE_DATA.NETFLIX_DATA(....);

This line creates a table named NETFLIX_DATA within the MOVIE_DATA schema. The table definition follows in parentheses, specifying the columns and their properties.

  • user_id serial primary key: This defines the user_id column as an auto-incrementing integer (serial) and sets it as the primary key for the table. The primary key uniquely identifies each row (see the sketch after this list for what serial expands to).

  • subscription varchar not null: This defines the subscription column as a variable-length string (varchar) that cannot be null (has to have a value).

  • Other columns (join_date, last_payment_date, etc.) are defined similarly with their respective data types and constraints.
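
As an aside, serial is PostgreSQL shorthand rather than a true type: it creates an integer column backed by an auto-incrementing sequence. A minimal sketch of what it roughly expands to, using a hypothetical demo table:

-- Roughly what "user_id serial primary key" expands to
CREATE SEQUENCE demo_user_id_seq;
CREATE TABLE demo
(
    user_id integer PRIMARY KEY DEFAULT nextval('demo_user_id_seq')
);
ALTER SEQUENCE demo_user_id_seq OWNED BY demo.user_id;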

COPY MOVIE_DATA.NETFLIX_DATA (...) FROM '/data/data.csv' DELIMITER ',' CSV HEADER;

  • COPY: This is a Postgres command specifically used for loading data from external files.

  • MOVIE_DATA.NETFLIX_DATA: This specifies the target table where the data will be inserted.

  • FROM '/data/data.csv': This defines the location of the CSV file containing the data. Note that COPY reads from the server’s filesystem, so this path must exist inside the container, which is why the CSV is mounted at /data in the Compose file below.

  • DELIMITER ',': This indicates that the data in the CSV file is separated by commas (",").

  • CSV HEADER: This specifies that the first line of the CSV file contains column names, which will be used to map the data to the corresponding table columns.
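
Once the container has run this script, a quick sanity check from any SQL client confirms the load worked, for example:

-- Confirm the CSV was loaded
SELECT COUNT(*) AS total_records FROM MOVIE_DATA.NETFLIX_DATA;

-- Inspect a few rows
SELECT * FROM MOVIE_DATA.NETFLIX_DATA LIMIT 5;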

Initialization with Docker

The Postgres database will be defined in a docker-compose.yml file. Docker Compose is a tool specifically designed to simplify working with applications that require multiple Docker containers.

version: '3'


services:
  postgres:
    image: postgres:latest
    environment:
      POSTGRES_USER: alt_nkem_user
      POSTGRES_PASSWORD: secretPassw0rd
      POSTGRES_DB: alt_netflix_db
    ports:
      - "5434:5432"
    volumes:
      - ./pg_data:/var/lib/postgresql/data
      - ./data:/data
      - ./infra_scripts/init.sql:/docker-entrypoint-initdb.d/init.sql

version: '3'

This line specifies the Docker Compose file format version.

services: This section defines the services that make up your application. Here, you only have one service: postgres.

postgres: This section defines the configuration for the postgres service.

image: postgres:latest

This line specifies the Docker image to use for the service. Here, it’s using the official postgres:latest image, which pulls the latest version of the PostgreSQL database server.

environment: This section defines environment variables for the container. These variables will be accessible in the container during runtime.

POSTGRES_USER: Sets the username for the PostgreSQL database. Here it is alt_nkem_user.

POSTGRES_PASSWORD: Sets the password for the PostgreSQL database. Here it is secretPassw0rd.

POSTGRES_DB: Sets the name of the database created on first startup. Here it is alt_netflix_db.

ports: This section maps ports on the host machine to ports inside the container. Here, it maps port 5434 on the host to port 5432 (PostgreSQL’s default) inside the container. This allows you to connect to the PostgreSQL database from the host machine using port 5434.

volumes: This section defines persistent storage for the container. Volumes allow data to persist even if the container is recreated.

  • ./pg_data:/var/lib/postgresql/data: This mounts the local directory ./pg_data on the host machine to the /var/lib/postgresql/data directory inside the container. This directory stores the actual database data, ensuring it persists across container restarts.

  • ./data:/data: This mounts the local directory ./data on the host machine to the /data directory inside the container.

  • ./infra_scripts/init.sql:/docker-entrypoint-initdb.d/init.sql: This mounts the local file ./infra_scripts/init.sql on the host machine to /docker-entrypoint-initdb.d/init.sql inside the container. The Postgres image runs any scripts in that directory on first startup, when the data directory is still empty, which is how the schema is created and populated.
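
Because of that, edits to init.sql will not take effect on an existing database. To re-run the script during local development, reset the bind mount before restarting:

docker-compose down
rm -rf ./pg_data   # wipes the local database files; development only
docker-compose up -d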

Test Service

In the terminal, type docker-compose up -d and press Enter. You can then use pgAdmin or DBeaver to connect with the credentials above, for example:
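
# Start the service in the background
docker-compose up -d

# Confirm the container is running and watch the init script execute
docker-compose ps
docker-compose logs -f postgres

# Optionally connect from the host with psql, if installed
psql -h localhost -p 5434 -U alt_nkem_user -d alt_netflix_db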

Python Script

I used the psycopg2 library to access the Postgres database with Python.

db_manager.py

import os

import psycopg2
from dotenv import load_dotenv

# Load credentials from a .env file into the environment
load_dotenv()


# Read connection settings from the environment, with local defaults
def _get_pg_cred():
    return {
        "user": os.environ.get("POSTGRES_USER"),
        "password": os.environ.get("POSTGRES_PASSWORD"),
        "port": os.environ.get("POSTGRES_PORT", 5434),
        "host": os.environ.get("POSTGRES_HOST", "localhost"),
        "db_name": os.environ.get("POSTGRES_DB"),
    }


# Open a connection to the Postgres container
def start_postgres_connection():
    creds = _get_pg_cred()
    connection = psycopg2.connect(
        dbname=creds["db_name"],
        user=creds["user"],
        password=creds["password"],
        host=creds["host"],
        port=creds["port"],
    )
    return connection


# Run a single query, return all rows, then close the connection
def query_database(connection, query_str):
    cursor = connection.cursor()
    cursor.execute(query_str)
    rows = cursor.fetchall()
    cursor.close()
    connection.close()
    return rows
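
For load_dotenv() to find these values, place a .env file next to the script. A minimal example matching the Compose file from earlier (adjust if your values differ):

.env

POSTGRES_USER=alt_nkem_user
POSTGRES_PASSWORD=secretPassw0rd
POSTGRES_HOST=localhost
POSTGRES_PORT=5434
POSTGRES_DB=alt_netflix_db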

main.py

from db_manager import start_postgres_connection, query_database


if __name__ == "__main__":
    conn = start_postgres_connection()
    query = """
            SELECT COUNT(*) AS TOTAL_RECORDS
            FROM MOVIE_DATA.NETFLIX_DATA;
            """
    result = query_database(connection=conn, query_str=query)
    print(result)
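
One design note: query_database closes the connection after a single query, so each call needs a fresh connection. Also, if a query ever includes user input, pass values as parameters instead of formatting them into the query string; psycopg2 escapes them safely. A minimal sketch with a hypothetical country filter:

from db_manager import start_postgres_connection

conn = start_postgres_connection()
cursor = conn.cursor()
# %s placeholders are filled in by psycopg2, never by Python string formatting
cursor.execute(
    "SELECT COUNT(*) FROM MOVIE_DATA.NETFLIX_DATA WHERE country = %s;",
    ("Nigeria",),
)
print(cursor.fetchone())
cursor.close()
conn.close()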

Conclusion

In this article, you learnt how to use Docker Compose to run a PostgreSQL container, initialize it with an SQL script, and query it from Python using psycopg2.