PostgreSQL 9 introduces a new feature which allows for asynchronous replication of a master server to slave/standby server(s). Replication has been available in Postgres in previous versions, in different capacities (warm standby/hot standby) and through various 3rd party solutions (Slony, etc); however, 9.0 introduces simple fast syncing between master and slaves. The slave servers will first connect w/ the master and when the master had new data it will push to the open slave connections over TCP.
I will not go into the details of installing postgres 9 on any environment. Some Notes however:
- All master and slaves mush (of course) be running PG 9
- All master and slaves must be on the same architecture - i.e. all on 64 or 32 bit systems
- The whole master server has be replicated - you can't just say replicate this table.
The master will send binary data to the slave when data is available. Slave and master communicate through TCP. You must setup this permission in the master servers pg_hba.conf file with this entry (replace eggie5 and ip w/ your respective values) - the word ";replication"; must be there. the /24 on the ip address means only match the first 24 bits of the 32 bit IP- so anyone in the while range 127.0.0.1-255 can access. If you want to match a single IP use /32:
host replication eggie5 127.0.0.1/24 trust
Next add these settings to postgresql.conf:
wal_level = hot_standby max_wal_senders = 5 wal_keep_segments = 32 listen_addresses='*'
- wal_level: hot_standby or archive are valid options. set hot_standby in order to run queires on your slave server
- wal_keep_segments: setting ensures that the master keeps a long enough queue of data for slaves that if one gets behind in polling it won't miss some data and get out of sync. See the Archiving section below for more details on this issue.
- max_wal_senders: setting specifies how many slaves the master can connect to.
- listen_addresses: opens port for outside connections.
That is all the configutation for the master server!
Configure the slave server(s)
Add this setting to postgresql.conf on the slave to allows queries to be run on it while in slave/standby mode thus rendering it a ";hot standby";.
hot_standby = on
Create a recovery.conf file in PG's data directory w/ these contents and replace my settings w/ yours respectively:
standby_mode = 'on' primary_conninfo = 'host=localhost port=5432 user=eggie5 password=asdf' trigger_file = '/tmp/pgsql.trigger'
- primary_connifo: this is the information used to make TCP connection w/ the master
- trigger_file: this tells the standby to become read/write - if the master goes down
Create baseline of master database
Now, in order for the streaming replication system to work the slaves need to be in-synch w/ the master. There are many ways to do this. You could scp your masters data directory to the slave(s) or rsync it, etc. You just need to get a copy of your master to the slave(s). It depends if you can stop your master server or not on how you do this.
If you can be sure there will be no writes to you DB you can run this backup script w/o stopping the server; otherwise stop the server. The example below uses rsync, however, any method can be used to get the data dir to the standby computer - I often tar the data dir then use scp.
su - postgres psql -c ";SELECT pg_start_backup('backup', true)"; rsync -a -v -e ssh /usr/local/var/postgres/data/ slave:/var/lib/postgres/9.0/data/ --exclude postmaster.pid psql -c ";SELECT pg_stop_backup()";
You should now be able to start the standby and see it connect to the master:
LOG: database system was interrupted while in recovery at log time 2011-02-25 15:43:47 PST HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target. LOG: entering standby mode LOG: consistent recovery state reached at 0/410030A0 LOG: redo starts at 0/41003010 LOG: record with zero length at 0/410030A0 LOG: database system is ready to accept read only connections LOG: streaming replication successfully connected to primary
If you are having trouble w/ the TCP connection a handy troubleshooting tool is nmap. Usually my problem was the port not being open from the outside. Run this command from the standby server:
nmap -sS ip_of_master
and you should get something like this:
Starting Nmap 5.21 ( http://nmap.org ) at 2011-03-01 10:42 PST Nmap scan report for ec2-174-129-157-204.compute-1.amazonaws.com (22.214.171.124) Host is up (0.043s latency). Not shown: 995 filtered ports PORT STATE SERVICE 22/tcp open ssh 5432/tcp open postgresql
This postgres port is open. Usually if your listen_addreses in posgtresql.conf are messed up this port will be closed.
If your master fails and falls back to the slave you'll want to function just like the master - so copy all the settings from the master posgresql.conf to the slave postgresql.conf file.
When postgres finds that there is a recovery.conf file in the data directory - it considers the server a slave or standby for replication purposes. If the file specified in trigger_file is created postgres will halt recovery mode and turn into a full read/write assuming the master DB is down. When this happens recovery.conf will be renamed to recovoery.done thus making this instance not a standby.
If the trigger file is found this is the postgres log output
LOG: trigger file found: /tmp/pgsql.trigger FATAL: terminating walreceiver process due to administrator command LOG: redo done at 0/410293C0 LOG: selected new timeline ID: 2 LOG: archive recovery complete LOG: database system is ready to accept connections LOG: autovacuum launcher started
This standby server has full read/write capabilities - however, don't think about it as a master server - it isn't replicating to your other standby nodes. To restore streaming replicaiton you must repeat the baseline step above.
Note: The creation of the tigger file should be automated w/ some type of monitoring tool. e.g. If it notices your master isn't responding, etc, it should create the touch file.
Setting up Archiving (optional)
If your database has a lot of activity it is possible for replication process to get bottlenecked and possible lose records. For this reason Postgresql 9 streaming replication can be set up with log shipping/WAL files. WAL (write ahead log) is a feature of postgres in which data is written before it committed to the database. Using these files it is possible to ensure that streaming replication doesn't lose anything. These files are created on the master in 16 MB files. Streaming Replication can utilize log shipping in order to catch up slaves which fall behind.
This is especially useful for if example a slave goes offline while updates are happening on the master. If you have log shipping on - when the slave comes back it can get caught up to head (current) using the wal_logs.
These websites helped me w/ the setup:
Tags: pg9, postgres, postgresql, postgresql 9, replication, streaming replication, database, db, tutorial, eggmos