Thursday 25 May 2017

High speed data loading utility for PostgreSQL "pg_bulkload"

In this blog post I will explain how to load bulk data into PostgreSQL faster than the COPY command.

Here I am using the "pg_bulkload" utility to load data into PostgreSQL. I will cover the installation and configuration of pg_bulkload as well as loading the data.


Let's start from scratch.


Introduction : -



pg_bulkload is designed to load a huge amount of data into a database. You can choose whether database constraints are checked and how many errors are ignored during the loading. For example, you can skip integrity checks for performance when you copy data from another database into PostgreSQL. On the other hand, you can enable constraint checks when loading unclean data. The original goal of pg_bulkload was a faster alternative to the COPY command in PostgreSQL, but version 3.0 and later has some ETL features such as input data validation and data transformation with filter functions.

I assume the following:
  • PostgreSQL has been installed.
  • The database has been initialized using initdb.
Installation : -  

Download the source code from the link below:



wget https://github.com/ossc-db/pg_bulkload/archive/master.zip
unzip master


If the PATH environment variable is not set, manually export the path to pg_config:

export PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/usr/local/postgresql9.6/bin/

cd pg_bulkload-master
make
make install

After installation, you can run the regression tests:
  
make installcheck


Configuration : -

First, connect to the database and create the pg_bulkload extension:

CREATE EXTENSION pg_bulkload;


Test case : -

I have created a table into which the data will be loaded.
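The exact table definition is not shown here; as a minimal sketch, assume a demo table like this (the column names are placeholders):

-- hypothetical demo table for the load test
CREATE TABLE public.bulkload_demo (
    id         int,
    name       text,
    created_at timestamp
);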



  • Copy data into the table via the COPY command.



Time taken : - 14 seconds
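For reference, the COPY-based load looked roughly like this (a sketch; the file path matches the control file used below):

-- run from psql; loads the CSV through the regular COPY path
\copy public.bulkload_demo FROM '/tmp/bulkload_demo.csv' WITH (FORMAT csv)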


  • Load data via the pg_bulkload utility.


Create a control file that contains the settings for data loading.

The control file accepts the following input file types:



TYPE = CSV | BINARY/FIXED | FUNCTION
The type of input data. The default is CSV.
  • CSV : load from a text file in CSV format
  • BINARY | FIXED : load from a fixed binary file
  • FUNCTION : load from a result set from a function.
    If you use it, INPUT must be an expression to call a function.
Basic configuration to load data from a CSV file:

#File Name = /tmp/one_csv.ctl

OUTPUT = public.bulkload_demo                   # [<schema_name>.]table_name
INPUT = /tmp/bulkload_demo.csv  # Input data location (absolute path)
TYPE = CSV                            # Input file type
QUOTE = "\""                          # Quoting character
ESCAPE = \                            # Escape character for Quoting
DELIMITER = ","                       # Delimiter
MULTI_PROCESS = yes

Start loading the data with the command below:

pg_bulkload /tmp/one_csv.ctl -d bulkload_test 


Time taken : - 4 seconds

Reference/Useful Links : -

Detailed information on the other pg_bulkload options.


Pre-defined test cases.


Monday 15 May 2017

Job scheduler for PostgreSQL "pg_cron"

In this post I am trying to highlight the cron job facility available inside the database itself, to the best of my knowledge.
Today I am going to explore pg_cron, so let's start.
What is pg_cron  : - 
pg_cron is a simple cron-based job scheduler for PostgreSQL (9.5 or higher) that runs inside the database as an extension. It uses the same syntax as regular cron, but it allows you to schedule PostgreSQL commands directly from the database.
Let's see how it works.

Step 1 :- 


To implement/install pg_cron, you need to download the source code from GitHub:


export PATH=/usr/local/pgsql/bin:$PATH
wget https://github.com/citusdata/pg_cron/archive/master.zip
unzip master
cd pg_cron-master/
make
make install
  

Step 2 : -

To start the pg_cron background worker when PostgreSQL starts, you need to add pg_cron to shared_preload_libraries in postgresql.conf and restart PostgreSQL.

Note that pg_cron does not run any jobs while the server is in hot standby mode, but it automatically starts once the server is promoted.

Add the line below to postgresql.conf:
shared_preload_libraries = 'pg_cron'
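Then restart the server; for example (the data directory path here is an assumption, adjust it to your installation):

pg_ctl -D /usr/local/pgsql/data restart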


Step 3 : -

Create the pg_cron extension. Creating the extension adds a schema named cron that contains a table called "job".


CREATE EXTENSION pg_cron;
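You can verify that the schema is in place by querying the (initially empty) job table:

SELECT * FROM cron.job;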



Step 4 : - 

Now try to set up your jobs inside the database.

A) To run a maintenance command on a schedule.

-- Vacuum Analyze every day at 11:00am
SELECT cron.schedule('0 11 * * *', 'VACUUM ANALYZE');


B) To run a SQL statement on a schedule.


-- Delete old data on Saturday at 3:30
SELECT cron.schedule('30 3 * * 6', $$DELETE FROM events WHERE event_time < now() - interval '1 week'$$);



C) To stop/unschedule a job.
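cron.schedule returns a job id, which you can also look up in the cron.job table; pass that id to cron.unschedule. A minimal sketch, assuming the job id is 1:

-- look up the id of the job to remove
SELECT jobid, schedule, command FROM cron.job;

-- remove it
SELECT cron.unschedule(1);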




The schedule uses the standard cron syntax, in which * means "run every time period", and a specific number means "but only at this time":
 ┌───────────── min (0 - 59)
 │ ┌────────────── hour (0 - 23)
 │ │ ┌─────────────── day of month (1 - 31)
 │ │ │ ┌──────────────── month (1 - 12)
 │ │ │ │ ┌───────────────── day of week (0 - 6) (0 to 6 are Sunday to
 │ │ │ │ │                  Saturday, or use names; 7 is also Sunday)
 │ │ │ │ │
 │ │ │ │ │
 * * * * * 
How to run a SQL job against a remote server : -



If you are a superuser, you can manually modify the cron.job table and use custom values for nodename and nodeport to connect to a different machine:

INSERT INTO cron.job (schedule, command, nodename, nodeport, database, username)
VALUES ('0 11 * * *', 'VACUUM ANALYZE', 'postgresql-pgcron', 5432, 'postgres', 'tushar');

You can use .pgpass to allow pg_cron to authenticate with the remote server.
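The .pgpass file lives in the home directory of the operating system user that runs PostgreSQL, with one hostname:port:database:username:password entry per line; a sketch matching the example above (the password is an assumed placeholder):

postgresql-pgcron:5432:postgres:tushar:secret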


Background process : - 

Process "bgworker: pg_cron_scheduler" get executed in back-end to run scheduled  jobs .

You can check for it in the operating system's process list.
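For example (the exact process title can vary between pg_cron versions):

ps -ef | grep pg_cron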




Friday 12 May 2017

PostgreSQL foreign data wrapper for MySQL "mysql_fdw"


This is my first blog post. In it I will show you how to set up a heterogeneous connection from PostgreSQL to MySQL with the help of "mysql_fdw".


So, let's get started.

The following steps show how to implement the FDW in PostgreSQL and perform operations on MySQL.

Step 1 : -

       Download the 'mysql_fdw' package.

       Download link: "https://github.com/EnterpriseDB/mysql_fdw"
     
              
wget https://github.com/EnterpriseDB/mysql_fdw/archive/master.zip
unzip master
cd mysql_fdw-master/

Step 2 : -
      
      A)  To build on POSIX-compliant systems you need to ensure the pg_config executable is in your path when you run make. This executable is typically in your PostgreSQL installation's bin directory. 


export PATH=/usr/local/pgsql/bin/:$PATH     # PostgreSQL binary path

     B) The mysql_config executable must also be in the path; it resides in the MySQL bin directory.


yum install mysql-devel mysql-common
export PATH=/usr/bin/:$PATH    



Step 3 : -

        Compile and install the code.

cd mysql_fdw-master/
make USE_PGXS=1
make USE_PGXS=1 install

Step 4 : -

        Create the mysql_fdw extension on the PostgreSQL server.


CREATE EXTENSION mysql_fdw;


This completes the installation part.

The part below creates the connection from PostgreSQL to MySQL.
Step 5 : - 

       Now connect to your MySQL database to create a dummy user and table.


CREATE USER 'foo'@'%' IDENTIFIED BY 'bar';                    -- Create user
GRANT ALL PRIVILEGES ON *.* TO 'foo'@'%' WITH GRANT OPTION;   -- Grant permissions

      Create a table in the MySQL database, as sketched below.
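A sketch of the table, matching the foreign table definition used later (the exact DDL and sample rows are assumptions):

CREATE DATABASE IF NOT EXISTS test;
USE test;

CREATE TABLE warehouse (
    warehouse_id      INT PRIMARY KEY,
    warehouse_name    TEXT,
    warehouse_created DATETIME
);

-- a couple of sample rows so the queries below return something
INSERT INTO warehouse VALUES (1, 'UPS', NOW()), (2, 'TV', NOW());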



Step 6 : -

       Now get back to the PostgreSQL database and configure the MySQL connectivity as per the steps below.


A) Create server object
                       
                       CREATE SERVER mysql_server
                        FOREIGN DATA WRAPPER mysql_fdw
                        OPTIONS (host '192.168.213.1', port '3306');

 B) Create user mapping

                        CREATE USER MAPPING FOR postgres
                        SERVER mysql_server
                        OPTIONS (username 'foo', password 'bar');
           
C) Create foreign table

                        CREATE FOREIGN TABLE warehouse(
                        warehouse_id int,
                        warehouse_name text,
                        warehouse_created timestamp)
                        SERVER mysql_server          
                        OPTIONS (dbname 'test', table_name 'warehouse');



Run a SELECT query from the PostgreSQL database against MySQL.
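A minimal sketch; the rows come back from the MySQL table through the foreign table:

-- run in the PostgreSQL database
SELECT * FROM warehouse;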



Insert a row into MySQL's test database and check it from the PostgreSQL side.
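A sketch of that round trip (the values are assumptions):

-- on the MySQL side
INSERT INTO warehouse VALUES (3, 'FedEx', NOW());

-- back on the PostgreSQL side
SELECT * FROM warehouse WHERE warehouse_id = 3;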





ERROR : -



If you get an error like the one below while accessing the MySQL database from PostgreSQL on a remote machine:

ERROR:  failed to connect to MySQL: Can't connect to MySQL server on (113)

open the MySQL port in the firewall on the MySQL server:

iptables -I INPUT 1 -p tcp --dport 3306 -j ACCEPT

 Also add the entry below to /etc/my.cnf so that MySQL listens on all interfaces:

bind-address = 0.0.0.0


Then restart MySQL:

/etc/init.d/mysqld restart