Thursday 25 May 2017

High speed data loading utility for PostgreSQL "pg_bulkload"

In this blog post I will explain how to load bulk data into PostgreSQL faster than the COPY command.

Here I am using the "pg_bulkload" utility to load data into PostgreSQL. I will cover the installation and configuration of pg_bulkload as well as loading the data.


Let's start from scratch.


Introduction : -



pg_bulkload is designed to load a huge amount of data into a database. You can choose whether database constraints are checked and how many errors are ignored during the loading. For example, you can skip integrity checks for performance when you copy data from another database into PostgreSQL. On the other hand, you can enable constraint checks when loading unclean data. The original goal of pg_bulkload was a faster alternative to the COPY command in PostgreSQL, but version 3.0 and later has some ETL features such as input data validation and data transformation with filter functions.

I assume the following:
  • PostgreSQL has been installed.
  • The database has been initialized using initdb.
Installation : -  

Download the source code from the link below:



wget https://github.com/ossc-db/pg_bulkload/archive/master.zip
unzip master


If the PATH environment variable is not set, manually export the path to pg_config:

export PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/usr/local/postgresql9.6/bin/

cd pg_bulkload-master
make
make install

After installation, you can run the regression tests:
  
make installcheck


Configuration : -

First, connect to the database and create the pg_bulkload extension:

CREATE EXTENSION pg_bulkload;


Test case : -

I have created a table into which the data will be loaded.
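The exact table definition is not shown here; as a minimal sketch, assume a demo table like this (the column names are placeholders):

-- hypothetical demo table for the load test
CREATE TABLE public.bulkload_demo (
    id         int,
    name       text,
    created_at timestamp
);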



  • Copy data into the table via the COPY command.



Time taken : - 14 seconds
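For reference, the COPY-based load looked roughly like this (a sketch; the file path matches the control file used below):

-- run from psql; loads the CSV through the regular COPY path
\copy public.bulkload_demo FROM '/tmp/bulkload_demo.csv' WITH (FORMAT csv)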


  • Load data via the pg_bulkload utility.


Create a control file that contains the settings for data loading.

The control file accepts the following input file types:



TYPE = CSV | BINARY/FIXED | FUNCTION
The type of input data. The default is CSV.
  • CSV : load from a text file in CSV format
  • BINARY | FIXED : load from a fixed binary file
  • FUNCTION : load from a result set from a function.
    If you use it, INPUT must be an expression to call a function.
Basic configuration to load data from a CSV file:

#File Name = /tmp/one_csv.ctl

OUTPUT = public.bulkload_demo                   # [<schema_name>.]table_name
INPUT = /tmp/bulkload_demo.csv  # Input data location (absolute path)
TYPE = CSV                            # Input file type
QUOTE = "\""                          # Quoting character
ESCAPE = \                            # Escape character for Quoting
DELIMITER = ","                       # Delimiter
MULTI_PROCESS = yes

Start loading the data with the command below:

pg_bulkload /tmp/one_csv.ctl -d bulkload_test 


Time taken : - 4 seconds

Reference/Useful Links : -

Detailed information on the other pg_bulkload options.


Pre-defined test cases.


Monday 15 May 2017

Job scheduler for PostgreSQL "pg_cron"

In this post I am trying to highlight the cron job facility available inside the database itself, to the best of my knowledge.
Today I am going to explore pg_cron, so let's start.
What is pg_cron  : - 
pg_cron is a simple cron-based job scheduler for PostgreSQL (9.5 or higher) that runs inside the database as an extension. It uses the same syntax as regular cron, but it allows you to schedule PostgreSQL commands directly from the database.
Let's see how it works.

Step 1 :- 


To implement/install pg_cron, you need to download the source code from GitHub:


export PATH=/usr/local/pgsql/bin:$PATH
wget https://github.com/citusdata/pg_cron/archive/master.zip
unzip master
cd pg_cron-master/
make
make install
  

Step 2 : -

To start the pg_cron background worker when PostgreSQL starts, you need to add pg_cron to shared_preload_libraries in postgresql.conf and restart PostgreSQL.

Note that pg_cron does not run any jobs while the server is in hot standby mode, but it automatically starts once the server is promoted.

Add the line below to postgresql.conf:
shared_preload_libraries = 'pg_cron'
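Then restart the server; for example (the data directory path here is an assumption, adjust it to your installation):

pg_ctl -D /usr/local/pgsql/data restart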


Step 3 : -

Create the pg_cron extension. Creating the extension adds a schema named cron that contains a table called "job".


CREATE EXTENSION pg_cron;
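You can verify that the schema is in place by querying the (initially empty) job table:

SELECT * FROM cron.job;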



Step 4 : - 

Now try to set up your jobs inside the database.

A) To run a maintenance command on a schedule.

-- Vacuum Analyze every day at 11:00am
SELECT cron.schedule('0 11 * * *', 'VACUUM ANALYZE');


B) To run a SQL statement on a schedule.


-- Delete old data on Saturday at 3:30
SELECT cron.schedule('30 3 * * 6', $$DELETE FROM events WHERE event_time < now() - interval '1 week'$$);



C) To stop/unschedule a job.
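cron.schedule returns a job id, which you can also look up in the cron.job table; pass that id to cron.unschedule. A minimal sketch, assuming the job id is 1:

-- look up the id of the job to remove
SELECT jobid, schedule, command FROM cron.job;

-- remove it
SELECT cron.unschedule(1);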




The schedule uses the standard cron syntax, in which * means "run every time period", and a specific number means "but only at this time":
 ┌───────────── min (0 - 59)
 │ ┌────────────── hour (0 - 23)
 │ │ ┌─────────────── day of month (1 - 31)
 │ │ │ ┌──────────────── month (1 - 12)
 │ │ │ │ ┌───────────────── day of week (0 - 6) (0 to 6 are Sunday to
 │ │ │ │ │                  Saturday, or use names; 7 is also Sunday)
 │ │ │ │ │
 │ │ │ │ │
 * * * * * 
How to run a SQL job against a remote server : -



If you are a superuser, you can manually modify the cron.job table and use custom values for nodename and nodeport to connect to a different machine:

INSERT INTO cron.job (schedule, command, nodename, nodeport, database, username)
VALUES ('0 11 * * *', 'VACUUM ANALYZE', 'postgresql-pgcron', 5432, 'postgres', 'tushar');

You can use .pgpass to allow pg_cron to authenticate with the remote server.
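The .pgpass file lives in the home directory of the operating system user that runs PostgreSQL, with one hostname:port:database:username:password entry per line; a sketch matching the example above (the password is an assumed placeholder):

postgresql-pgcron:5432:postgres:tushar:secret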


Background process : - 

Process "bgworker: pg_cron_scheduler" get executed in back-end to run scheduled  jobs .

You can check for it in the operating system's process list.
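For example (the exact process title can vary between pg_cron versions):

ps -ef | grep pg_cron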




Friday 12 May 2017

PostgreSQL foreign data wrapper for MySQL "mysql_fdw"


This is my first blog post. In it I will show you how to set up a heterogeneous connection from PostgreSQL to MySQL with the help of "mysql_fdw".


So, let's get started.

The following steps show how to implement the FDW in PostgreSQL and perform operations on MySQL.

Step 1 : -

       Download the 'mysql_fdw' package.

       Download link: "https://github.com/EnterpriseDB/mysql_fdw"
     
              
wget https://github.com/EnterpriseDB/mysql_fdw/archive/master.zip
unzip master
cd mysql_fdw-master/

Step 2 : -
      
      A)  To build on POSIX-compliant systems you need to ensure the pg_config executable is in your path when you run make. This executable is typically in your PostgreSQL installation's bin directory. 


export PATH=/usr/local/pgsql/bin/:$PATH     # PostgreSQL binary path

     B) The mysql_config executable must also be in the path; it resides in the MySQL bin directory.


yum install mysql-devel mysql-common
export PATH=/usr/bin/:$PATH    



Step 3 : -

        Compile and install the code.

cd mysql_fdw-master/
make USE_PGXS=1
make USE_PGXS=1 install

Step 4 : -

        Create the mysql_fdw extension on the PostgreSQL server.


CREATE EXTENSION mysql_fdw;


This completes the installation part.

The part below creates the connection from PostgreSQL to MySQL.
Step 5 : - 

       Now connect to your MySQL database to create a dummy user and table.


CREATE USER 'foo'@'%' IDENTIFIED BY 'bar';                    -- Create user
GRANT ALL PRIVILEGES ON *.* TO 'foo'@'%' WITH GRANT OPTION;   -- Grant permissions

      Create a table in the MySQL database, as sketched below.
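A sketch of the table, matching the foreign table definition used later (the exact DDL and sample rows are assumptions):

CREATE DATABASE IF NOT EXISTS test;
USE test;

CREATE TABLE warehouse (
    warehouse_id      INT PRIMARY KEY,
    warehouse_name    TEXT,
    warehouse_created DATETIME
);

-- a couple of sample rows so the queries below return something
INSERT INTO warehouse VALUES (1, 'UPS', NOW()), (2, 'TV', NOW());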



Step 6 : -

       Now get back to the PostgreSQL database and configure the MySQL connectivity as per the steps below.


A) Create server object
                       
                       CREATE SERVER mysql_server
                        FOREIGN DATA WRAPPER mysql_fdw
                        OPTIONS (host '192.168.213.1', port '3306');

 B) Create user mapping

                        CREATE USER MAPPING FOR postgres
                        SERVER mysql_server
                        OPTIONS (username 'foo', password 'bar');
           
C) Create foreign table

                        CREATE FOREIGN TABLE warehouse(
                        warehouse_id int,
                        warehouse_name text,
                        warehouse_created timestamp)
                        SERVER mysql_server          
                        OPTIONS (dbname 'test', table_name 'warehouse');



Run a SELECT query from the PostgreSQL database against MySQL.
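A minimal sketch; the rows come back from the MySQL table through the foreign table:

-- run in the PostgreSQL database
SELECT * FROM warehouse;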



Insert a row into MySQL's test database and check it from the PostgreSQL side.
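A sketch of that round trip (the values are assumptions):

-- on the MySQL side
INSERT INTO warehouse VALUES (3, 'FedEx', NOW());

-- back on the PostgreSQL side
SELECT * FROM warehouse WHERE warehouse_id = 3;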





ERROR : -



If you get an error like the one below while accessing the MySQL database from PostgreSQL on a remote machine:

ERROR:  failed to connect to MySQL: Can't connect to MySQL server on (113)

open the MySQL port in the firewall on the MySQL server:

iptables -I INPUT 1 -p tcp --dport 3306 -j ACCEPT

 Also add the entry below to /etc/my.cnf so that MySQL listens on all interfaces:

bind-address = 0.0.0.0


Then restart MySQL:

/etc/init.d/mysqld restart