Converting from MySQL to Postgres

In playing with GitStreams, I've made the switch from MySQL to Postgres. The reason for the switch was mostly shaming from Jeremy Dunck, Selena Deckelmann and a few others. The results, however were quite surprising.

Storage Efficiency

Operating on a budget $20/month Linode server (affiliate link), disk space is precious to me. In using MySQL, my data files were hovering just under 8GB for a couple of million rows of mostly-JSON data. After the migration, my size-on-disk dropped to 3GB. This is quite exciting for me as I've 2x'd the lifetime of this server. You can see the MySQL schema and the Postgres schema for comparison purposes. The main table is streamer_activity.

Query Performance

One of the hopes I had for this change was an increase in query peformance. I've been told that Postgres has a much better query planner than MySQL, so I was hoping that this would mean that syncs or email generation would take dramatically less time. Unfortunately, I'm IO bound due to disk on my linode box (it's sooo tiny), and no amount of query planning is going to make that better.

The Migration

One of the benefits of having a "slow web" service is that you can suffer downtime with little hassle. The migration itself was mostly uneventful thanks to a fantastic porting script I came across: py-mysql2pgsql.

Using this tool, I imported my 2.5 million rows in ~10 minutes (it hovered 4k rows /sec) with a minimal amount of configuration. The first time you run the tool, it outputs a simple configuration file. You edit it with the information used to connect to each database, and it does all the dirty work.

The one gotcha I did find using this script is that it doesn't create datetime fields with timezones, so my Django install complained a great deal. A simple migration to the tune of the code sample below fixed that right up.

alter table streamer_repo alter column last_synced set data type timestamp with time zone;
alter table streamer_githubuser alter column last_synced set data type timestamp with time zone;
alter table streamer_activity alter column created_at set data type timestamp with time zone;
alter table streamer_userprofile alter column last_email_received set data type timestamp with time zone;
alter table streamer_userprofile_repos alter column last_sent set data type timestamp with time zone;
alter table streamer_userprofile_followed alter column last_sent set data type timestamp with time zone;

alter table auth_user alter column  last_login set data type timestamp with time zone;
alter table auth_user alter column date_joined set data type timestamp with time zone;
alter table django_admin_log alter column action_time set data type timestamp with time zone;
alter table django_session alter column expire_date set data type timestamp with time zone;
alter table nashvegas_migration alter column date_created set data type timestamp with time zone;

Conclusion

All in all, the port to Postgres has been worth it. I've doubled the lifetime of my server, decreased my storage footprint and set my small service up for another few months of happy email delivery.