MySQL and Postgres as an External Data Source
- Setting up your database
- Updating your firewall rules
- Creating the connection
- Creating queries
You can connect Kissmetrics to your MySQL or Postgres database. This will import all existing data and then continuously monitor your database for any new data and pass that into Kissmetrics. This is a powerful and easy way to integrate Kissmetrics with your app.
If you have a different SQL database (Oracle, SQLite, Mongo), we have not yet built support to directly read from these database types. Let us know that you are interested in seeing such an integration! In the meantime, please consider converting your data into
.csv files and using our recurring CSV import method.
It’s recommended you read People, Events, and Properties before getting started here.
Setting up your database
The Kissmetrics SQL integration will only work with a read-only database. We enforce this for all users for our users peace of mind. Additionally we recommend that you create a separate MySQL user for Kissmetrics to use. The following SQL will accomplish this:
CREATE USER 'kissmetrics'@'%' IDENTIFIED BY 'some_password'; GRANT select ON your_database.* TO 'kissmetrics'@'%'; FLUSH PRIVILEGES;
Particularly for Postgres, we check for read-only status by ensuring the user you created does not have table privileges to
Just make sure you note the username (
kissmetrics in this example) and password (
some_password in this example) because you’ll need those when you create the connection.
This SQL provides access to all tables in a database. You can also grant access to the specific tables you will be querying with Kissmetrics.
Updating your firewall rules
Kissmetrics will connect to your MySQL server using one of the following IP addresses:
You’ll want to make sure that your firewall allows connections to the port you run MySQL on (which by default is
3306) from these IP addresses. If you have other security needs please contact support so that we can help you figure out the best solution for your situation.
If your database is on AWS
If your database is hosted on Amazon Web Services in the US East (Northern Virginia) Region (us-east-1), then we would be locating your databases using the internal Amazon network. In that case, Kissmetrics may connect to your MySQL server using one of the following internal Amazon IP addresses:
Creating the connection
To create the connection:
- Go to Settings (the small gear tab)
- Click Data Integrations
- Click Add a MySQL Data Source
- Fill out the connection details you used in the previous step. You’ll need to specify the remote address of the MySQL server. If your server is not accessible over the internet please contact us so we can help you determine your best option.
You will need to create one query for each event.
- Name: you can describe what the query is importing.
- Record Kissmetrics Event as: name the Kissmetrics event that is recorded by this query.
When creating your SQL query you’ll need to make sure that the results contain the following fields:
KM_PERSON- this field will be used as the identity of the person. Typically you’ll want to select the email or username field from your database.
KM_TIME- this field will be used as the timestamp of the event. Typically you’ll want to select a date field from your database.
KM_LAST_RAN- this field is used so that Kissmetrics knows which rows have already been synced. Typically you’ll want to select the id or timestamp field from your database.
See below for more details about each.
The value you select for
KM_PERSON will be used as the identity for the person performing the event. As mentioned the best choice, when available, is the e-mail field form your table. To rename
KM_PERSON you can use the
SELECT...AS... syntax to name the field in the results (e.g.
SELECT email AS KM_PERSON).
It’s possible that the identity field you want to use is in a different table. In this case you might need to use a SQL JOIN to join the two tables. For example if I have a
payments table that I’m creating a query for I might do:
SELECT users.email AS KM_TIME... FROM users, payments... WHERE users.id = payments.user_id;
The value you select for
KM_TIME will be used as the date/time for the event you syncing. Kissmetrics will not sync without this field. The column type can be any valid date field, including just a plain
INT field with a UNIX epoch-based timestamp. You can use the
SELECT...AS... syntax to name the field in the results (e.g.
SELECT created_at AS KM_TIME).
Please note that Kissmetrics will set the timezone to UTC/GMT (
SET time_zone = "+0:00") for the connection before making any queries.
When Kissmetrics syncs it will sync at most 50,000 rows every hour per query. When it runs the next hour it will need to make sure that it selects the next 50,000 rows - making sure not to duplicate or miss any rows.
This is where the
KM_LAST_RAN field comes in. There are a few key things to keep in mind:
- You need to select a column that only increments when new data is inserted. An ID column is a perfect candidate, because it only goes up (after Kissmetrics has processed the row with ID 500 it knows that no new row with a lower ID than 500 is going to be inserted). A timestamp or date/time column can work also as long as new rows are going to have later timestamps than the existing rows in the database. If users can insert rows with timestamps in the past than a timestamp is not a good choice.
- You need to tell Kissmetrics the last value that was used. You can do this by returning a field with the name
KM_LAST_RAN. If you were using the
idcolumn for this purpose you can do
SELECT id AS KM_LAST_RANto accomplish this. Kissmetrics will store the last value used here.
- You need to add a condition to your SQL query that will ensure that when Kissmetrics runs your query it only selects rows that have not been synced. The value that you last returned to Kissmetrics will be available in the variable
$KM_LAST_RAN_FOR_THIS_QUERY. If you are using the ID column you can use
WHERE id > $KM_LAST_RAN_FOR_THIS_QUERYto accomplish this. You must be very careful when using timestamps for the this purpose. To ensure that Kissmetrics does not miss any data it is recommended that you use a
WHERE created_at >= $KM_LAST_RAN_FOR_THIS_QUERY). This does mean that Kissmetrics will get some duplicate rows, but Kissmetrics will ignore the duplicates. Not doing this means that Kissmetrics might miss some data depending on the granularity of your timestamps and the frequency that you get new data.
- Your query will automatically be ordered by
KM_LAST_RAN. You should not add ORDER BY or LIMIT clauses to your query.
- NOTE: You should use the same field for KM_LAST_RAN and $KM_LAST_RAN_FOR_THIS_QUERY.
A full query might look like:
SELECT id AS KM_LAST_RAN ... FROM users WHERE id > $KM_LAST_RAN_FOR_THIS_QUERY
Any additional fields you return in your SQL will be recorded as Kissmetrics properties for the person/event. So let’s say you have
country columns in your table that you want to pass along when a user signs up. You could do:
SELECT id AS KM_LAST_RAN, email AS KM_PERSON, created_at AS KM_TIME, gender, country FROM users WHERE id > $KM_LAST_RAN_FOR_THIS_QUERY
Whatever the name of the field is is what the property will be recorded as. If you want to use a different name you can use
SELECT ... AS .... So if you had a field
cpn that you wanted to record as
Coupon Code you could use:
SELECT cpn AS "Coupon Code".
Tracking when a user signs up
It’s quite easy to track all your signups:
SELECT users.login AS KM_PERSON, users.created_at AS KM_TIME, users.id AS KM_LAST_RAN FROM users WHERE users.id > $KM_LAST_RAN_FOR_THIS_QUERY
Imagine you also have a
payment_plans table and you want to include the plan name when a user signs up, you can do:
SELECT users.login AS KM_PERSON, users.created_at AS KM_TIME, payment_plans.name AS 'Plan Name', users.id AS KM_LAST_RAN FROM users, payment_plans WHERE payment_plans.id = users.payment_plan_id AND users.id > $KM_LAST_RAN_FOR_THIS_QUERY
Tracking when a user pays you
SELECT users.login AS KM_PERSON, payment_transactions.created_at AS KM_TIME, payment_transactions.id AS KM_LAST_RAN, payment_transactions.amount FROM payment_plans, users WHERE users.id = payment_plans.user_id AND payment_plans.id > $KM_LAST_RAN_FOR_THIS_QUERY
Tracking when a user cancels their account
SELECT users.login AS KM_PERSON, users.canceled_at AS KM_TIME, users.canceled_at AS KM_LAST_RAN FROM users WHERE users.canceled_at IS NOT NULL AND users.canceled_at > $KM_LAST_RAN_FOR_THIS_QUERY