amazon web services - Using AWS Data Pipeline for incremental copy of on-premises Postgres to Redshift
We're exploring using Redshift as our warehouse and need to move new data from our existing on-premises Postgres DB into Redshift nightly. It looks like this can be done with Data Pipeline and a template if the primary DB is in RDS, but can Data Pipeline be used if the database is on-premises?
Since you're synching in bulk on a periodic basis, you might want to consider pushing the data to S3, which is easy from on-premises, and then loading it into Redshift via the COPY command. It's fast and reliable, and you get a free (cheap) backup in S3 as a side effect.
You can use aws s3 cp from the awscli Python tool to push CSV Postgres dumps (after putting your access key in .aws/config), like:
aws s3 cp current_dump.csv.gz s3://yourbucket/20170108/dump.csv.gz
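If you need a way to produce that gzipped CSV dump on the on-premises side, a one-liner along these lines is enough; yourtable and the updated_at filter column are placeholders for your own schema:
psql -h localhost -U youruser -d yourdb -c "\copy (select * from yourtable where updated_at >= current_date - 1) to stdout with csv" | gzip > current_dump.csv.gz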
and the regular Postgres psql utility to execute the COPY on Redshift, like so:
PGPASSWORD='yourpass' psql -h your.redshift.end.point.com -U youruser -d yourdb -p 5439 -c "copy yourtable from 's3://yourbucket/20170108/dump.csv.gz' credentials 'aws_access_key_id=[yourkey];aws_secret_access_key=[yoursecret]' delimiter ',' null 'null' ignoreblanklines emptyasnull blanksasnull timeformat 'auto' fillrecord maxerror 1 csv gzip;"
You can put the Postgres dump and these commands in a shell script, with a bit of scripting of the time values, and run it as a cron job.
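A minimal sketch of such a script, assuming the same placeholder names as above and a dated S3 prefix per night:
#!/bin/bash
# nightly_sync.sh - sketch of a nightly dump/upload/COPY job; adjust names and credentials
set -euo pipefail

DATE=$(date +%Y%m%d)
DUMP=current_dump.csv.gz

# 1. dump yesterday's rows from the on-premises postgres as gzipped csv
psql -h localhost -U youruser -d yourdb \
  -c "\copy (select * from yourtable where updated_at >= current_date - 1) to stdout with csv" \
  | gzip > "$DUMP"

# 2. push the dump to s3 under a dated prefix
aws s3 cp "$DUMP" "s3://yourbucket/$DATE/dump.csv.gz"

# 3. load it into redshift with copy
PGPASSWORD='yourpass' psql -h your.redshift.end.point.com -U youruser -d yourdb -p 5439 \
  -c "copy yourtable from 's3://yourbucket/$DATE/dump.csv.gz' credentials 'aws_access_key_id=[yourkey];aws_secret_access_key=[yoursecret]' delimiter ',' null 'null' ignoreblanklines emptyasnull blanksasnull timeformat 'auto' fillrecord maxerror 1 csv gzip;"
A crontab entry then runs it nightly, for example at 2am:
0 2 * * * /path/to/nightly_sync.sh >> /var/log/nightly_sync.log 2>&1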
As a best practice, you'll want to load into a staging table and then merge into the main table, to support updates if needed and to prevent duplicates.
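A minimal sketch of that merge, run against Redshift after the COPY; it assumes the COPY above targets a staging table such as yourtable_staging and that id stands in for whatever key identifies a row (both names are placeholders):
begin;
-- replace rows that already exist in the main table
delete from yourtable using yourtable_staging where yourtable.id = yourtable_staging.id;
-- bring in the freshly loaded rows
insert into yourtable select * from yourtable_staging;
end;
-- clear staging for the next run (kept outside the transaction, since truncate commits in Redshift)
truncate yourtable_staging;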