Streamlining Database Management: Deploying PostgreSQL with Docker and Python
Introduction
Docker is an open-source platform that enables developers to automate the deployment, scaling, and management of applications using containerisation. It provides the capability to create, deploy, and manage containers, ensuring that applications run consistently across different environments. The key advantages of using Docker are:
Consistency Across Environments: Docker ensures your application runs the same way in development, testing, and production environments. Containers encapsulate all dependencies, configurations, and code, making moving applications across different environments easier without compatibility issues.
Horizontal Scaling: Docker makes it straightforward to scale applications horizontally by adding more containers to handle increased load. This is especially useful for microservices architectures where different components can be scaled independently.
Resource Utilization: Containers share the host OS kernel and resources, making them more lightweight and efficient than traditional virtual machines. They start up quickly and use less memory and CPU.
Process Isolation: Docker provides a high level of isolation between containers, enhancing security by containing vulnerabilities within a single container.
In this article, you’ll learn to deploy a PostgreSQL database with Docker and access it with psycopg2 in Python.
Postgres Initialization
I initialized the database using an init.sql file.
-- Create schema
CREATE SCHEMA IF NOT EXISTS MOVIE_DATA;
-- create and populate tables
create table if not exists MOVIE_DATA.NETFLIX_DATA
(
user_id serial primary key,
subscription varchar not null,
join_date varchar,
last_payment_date varchar,
country varchar,
age int,
gender varchar,
device varchar,
plan_duration varchar
);
COPY MOVIE_DATA.NETFLIX_DATA (user_id, subscription, join_date, last_payment_date, country, age, gender, device,plan_duration)
FROM '/data/data.csv' DELIMITER ',' CSV HEADER;
The provided code snippet defines a schema and populates a table in PostgreSQL. Let’s break it down.
CREATE SCHEMA IF NOT EXISTS MOVIE_DATA;
This line creates a schema named MOVIE_DATA — a namespace within the database that groups related tables. The IF NOT EXISTS clause ensures the schema is created only if it does not already exist.
create table if not exists MOVIE_DATA.NETFLIX_DATA (...);
This line creates a table named NETFLIX_DATA within the MOVIE_DATA schema. The table definition follows in parenthesis, specifying the columns and their properties.
user_id serial primary key: This defines the user_id column as an auto-incrementing integer (serial) and sets it as the primary key for the table. The primary key uniquely identifies each row.
subscription varchar not null: This defines the subscription column as a variable-length string (varchar) that cannot be null (it must have a value). The other columns (join_date, last_payment_date, etc.) are defined similarly with their respective data types and constraints.
COPY MOVIE_DATA.NETFLIX_DATA (...) FROM '/data/data.csv' DELIMITER ',' CSV HEADER;
COPY: This is a Postgres command specifically used for loading data from external files.
MOVIE_DATA.NETFLIX_DATA (...): This specifies the target table (and column list) where the data will be inserted.
FROM '/data/data.csv': This defines the location of the CSV file containing the data.
DELIMITER ',': This indicates that the data in the CSV file is separated by commas (",").
CSV HEADER: This tells Postgres the file is in CSV format and that its first line is a header row, which is skipped during the load.
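One detail worth noting: a server-side COPY like the one above reads the file from the container's filesystem, which is why the CSV is mounted into the container at /data. If you instead need to load a CSV that only exists on the client machine, psycopg2 can stream it over the connection with copy_expert. A minimal sketch — the build_copy_sql helper is illustrative, not part of the article's code:

```python
def build_copy_sql(table):
    # Assemble a client-side COPY ... FROM STDIN statement; unlike the
    # server-side COPY in init.sql, the file stays on the client and is
    # streamed over the connection.
    return f"COPY {table} FROM STDIN WITH (FORMAT csv, HEADER true, DELIMITER ',')"

# With a live psycopg2 connection, the CSV could be streamed like this:
# with open("data.csv") as f, conn.cursor() as cur:
#     cur.copy_expert(build_copy_sql("MOVIE_DATA.NETFLIX_DATA"), f)
#     conn.commit()
```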
Initialization with Docker
The Postgres database will be declared in a docker-compose.yml file. Docker Compose is a tool specifically designed to simplify working with applications that require multiple Docker containers.
version: '3'
services:
  postgres:
    image: postgres:latest
    environment:
      POSTGRES_USER: alt_nkem_user
      POSTGRES_PASSWORD: secretPassw0rd
      POSTGRES_DB: alt_netflix_db
    ports:
      - "5434:5432"
    volumes:
      - ./pg_data:/var/lib/postgresql/data
      - ./data:/data
      - ./infra_scripts/init.sql:/docker-entrypoint-initdb.d/init.sql
volumes:
  postgres_data:
version: '3'
This line specifies the Docker Compose file format version.
services: This section defines the services that make up your application. Here, you only have one service: postgres.
postgres: This section defines the configuration for the postgres service.
image: postgres:latest
This line specifies the Docker image to use for the service. Here, it’s using the official postgres:latest image, which pulls the latest version of the PostgreSQL database server.
environment: This section defines environment variables for the container. These variables will be accessible in the container during runtime.
POSTGRES_USER: Sets the username for the PostgreSQL database. Here it is alt_nkem_user.
POSTGRES_PASSWORD: Sets the password for the PostgreSQL database. Here it is secretPassw0rd.
POSTGRES_DB: Sets the name of the default database to create. Here it is alt_netflix_db.
ports: This section maps ports on the host machine to ports inside the container. Here, it maps host port 5434 to port 5432 inside the container. This allows you to connect to the PostgreSQL database from the host machine using port 5434.
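The effect of the port mapping shows up in the settings a client uses. As a rough illustration — the pg_dsn helper is hypothetical, with values taken from the compose file above — a client running on the host connects to port 5434, not 5432:

```python
def pg_dsn(user="alt_nkem_user", host="localhost", port=5434, db="alt_netflix_db"):
    # The host-side port (the left half of "5434:5432") is what clients on
    # the host use; Docker forwards traffic to 5432 inside the container.
    return f"postgresql://{user}@{host}:{port}/{db}"

print(pg_dsn())  # → postgresql://alt_nkem_user@localhost:5434/alt_netflix_db
```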
volumes: This section defines persistent storage for the container. Volumes allow data to persist even if the container is recreated.
./pg_data:/var/lib/postgresql/data: This mounts the local directory ./pg_data on the host machine to the /var/lib/postgresql/data directory inside the container. This directory stores the actual database data, ensuring it persists across container restarts.
./data:/data: This mounts the local directory ./data on the host machine to the /data directory inside the container, making the CSV file available to the COPY command in the initialization script.
./infra_scripts/init.sql:/docker-entrypoint-initdb.d/init.sql: This mounts the local file ./infra_scripts/init.sql on the host machine to /docker-entrypoint-initdb.d/init.sql inside the container. Any SQL script placed in that directory runs automatically the first time the container starts, which is how the schema is created and the table populated here.
Test Service
In the terminal, type docker-compose up -d and press enter. You can use pgAdmin or DBeaver to connect with the credentials.
Python Script
I used the psycopg2 library to access the Postgres database with Python.
import psycopg2
import os
from dotenv import load_dotenv

load_dotenv()


def _get_pg_cred():
    # Read connection credentials from environment variables
    return {
        "user": os.environ.get("POSTGRES_USER"),
        "password": os.environ.get("POSTGRES_PASSWORD"),
        "port": os.environ.get("POSTGRES_PORT", 5434),
        "host": os.environ.get("POSTGRES_HOST", "0.0.0.0"),
        "db_name": os.environ.get("POSTGRES_DB"),
    }


def start_postgres_connection():
    # Create a connection to the Postgres container
    creds = _get_pg_cred()
    connection = psycopg2.connect(
        dbname=creds["db_name"],
        user=creds["user"],
        password=creds["password"],
        host=creds["host"],
        port=creds["port"],
    )
    return connection


def query_database(connection, query_str):
    # Run a query, fetch all rows, then close the cursor and connection
    cursor = connection.cursor()
    cursor.execute(query_str)
    rows = cursor.fetchall()
    cursor.close()
    connection.close()
    return rows
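Because the script calls load_dotenv, it expects a .env file alongside it. A matching file for the compose configuration above might look like this (POSTGRES_PORT and POSTGRES_HOST are optional, since the script provides defaults):

```
POSTGRES_USER=alt_nkem_user
POSTGRES_PASSWORD=secretPassw0rd
POSTGRES_DB=alt_netflix_db
POSTGRES_PORT=5434
POSTGRES_HOST=localhost
```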
The functions above live in db_manager.py and are used from a separate script:

from db_manager import start_postgres_connection, query_database

if __name__ == "__main__":
    conn = start_postgres_connection()
    query = """
        SELECT COUNT(*) AS TOTAL_RECORDS
        FROM MOVIE_DATA.NETFLIX_DATA;
    """
    result = query_database(connection=conn, query_str=query)
    print(result)
Conclusion
In this article, you learnt how to use Docker Compose to run a PostgreSQL container and how to use a Python script to execute SQL queries against the PostgreSQL database.