We will use the SQL COPY command to load data from CSV files into the first_name and last_name tables.
This command must be run from a PostgreSQL account that was created with the superuser option.
You can check whether your current account is a superuser by executing this query:
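    -- returns true when the current account is a superuser
    SELECT rolsuper FROM pg_roles WHERE rolname = current_user;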
On Ubuntu you can create a superuser account using this command:
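    sudo -u postgres createuser --superuser your_user_name

Once a superuser account is available we can load the files. The CSV paths and the single name column in each table are assumptions here, but the commands look something like:

    -- server-side COPY reads files on the database server, hence the superuser requirement
    COPY first_name (name) FROM '/path/to/first_names.csv' WITH (FORMAT csv);
    COPY last_name (name) FROM '/path/to/last_names.csv' WITH (FORMAT csv);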
After loading data from the CSV files you should have a bunch of first and last names in
the first_name and last_name tables:
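    -- quick sanity check of the loaded data
    SELECT * FROM first_name LIMIT 5;
    SELECT * FROM last_name LIMIT 5;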
Now we can use the data from first_name and last_name to populate the account table.
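Here is a sketch that cross joins the two name tables; it assumes each of them has a single name column and that account has first_name, last_name and email columns:

    INSERT INTO account (first_name, last_name, email)
    SELECT f.name,
           l.name,
           f.name || '.' || l.name || '@example.com'
    FROM first_name f
    CROSS JOIN last_name l;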
After executing this query the account table will contain test data:
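    -- a quick look at the generated test data
    SELECT first_name, last_name, email FROM account LIMIT 5;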
Often we want to enforce uniqueness of values in a given table column.
For example, each user should have a unique login, there should be
only one person with a given national ID, etc.
Databases allow us to implement such a constraint via a unique index.
We will add a unique index to the account table to
ensure that each account email is unique:
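    -- the index name is our choice
    CREATE UNIQUE INDEX idx_account_email ON account (email);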
If in the future we decide that emails may repeat after all, we can
drop (remove) the index using:
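    DROP INDEX idx_account_email;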
Let’s check how it works:
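    -- try to insert two accounts with exactly the same email (sample values)
    INSERT INTO account (first_name, last_name, email)
    VALUES ('John', 'Doe', 'John.Doe@test.com');
    -- INSERT 0 1

    INSERT INTO account (first_name, last_name, email)
    VALUES ('Jane', 'Doe', 'John.Doe@test.com');
    -- ERROR:  duplicate key value violates unique constraint "idx_account_email"

    INSERT INTO account (first_name, last_name, email)
    VALUES ('Jane', 'Doe', 'JOHN.DOE@test.com');
    -- INSERT 0 1  (accepted, because only the letter case differs)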
It looks like our index works, but it uses a case-sensitive check to compare two emails.
It would be better to ignore case when performing the check. We can achieve this by using
the lower function (which converts strings to lower case) in our index definition:
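    -- rebuild the index (any rows that already differ only in case
    -- would have to be removed first)
    DROP INDEX idx_account_email;
    CREATE UNIQUE INDEX idx_account_email ON account (lower(email));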
Now the index will check that the lower(email) value is unique across the table:
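    INSERT INTO account (first_name, last_name, email)
    VALUES ('Jane', 'Doe', 'JOHN.DOE@TEST.COM');
    -- ERROR:  duplicate key value violates unique constraint "idx_account_email"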
In databases indexes are mainly used to speed up queries.
By keeping additional information the database can retrieve values much more quickly.
The following example will illustrate that:
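    EXPLAIN SELECT * FROM account WHERE first_name = 'Michael';
    -- abridged plan output; the exact estimates will vary:
    -- Seq Scan on account  (cost=0.00..1907.98 ...)
    --   Filter: (first_name = 'Michael'::text)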
We can use EXPLAIN to ask the database how it will execute a query.
In our example, since we don't have any index on the first_name column,
the database will read every row in the table (that's what Seq Scan means)
and then check if first_name is equal to Michael. If our table
contained millions of rows, this would be really slow.
To speed things up we can add an index on the first_name column
that will keep track of where rows with a given first_name are stored:
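    CREATE INDEX idx_account_first_name ON account (first_name);

Running the same query again shows the difference:

    EXPLAIN SELECT * FROM account WHERE first_name = 'Michael';
    -- Index Scan using idx_account_first_name on account  (cost=0.29..8.63 ...)
    --   Index Cond: (first_name = 'Michael'::text)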
Now we can see that the database will use the index to perform the search (Index Scan).
When we look at the cost returned by EXPLAIN we can see that
it is now 0.29..8.63, while without the index it was 0.00..1907.98.
This means we can expect roughly a 200x performance improvement, especially
as the account table grows.
Indexes are a great way to speed up queries, but they come with a cost: the database must
update the index every time you insert a new row into the table or update an existing one.
Adding too many indexes to a table may actually slow things down, so be warned.
PostgreSQL allows us to define an index on multiple columns.
For example, let's say that we want to speed up a query that filters
accounts by first name and last name.
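For example (the concrete names do not matter, and the plan shown in comments is abridged):

    EXPLAIN ANALYZE SELECT * FROM account
    WHERE first_name = 'Michael' AND last_name = 'Smith';
    -- Index Scan using idx_account_first_name on account  (...)
    --   Index Cond: (first_name = 'Michael'::text)
    --   Filter: (last_name = 'Smith'::text)
    -- Execution time: 0.017 ms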
As we can see, this query uses the idx_account_first_name index that we created in
the previous section and already has quite good performance.
But we can do even better by creating an index on the first_name and last_name columns:
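    -- again, the index name is our choice
    CREATE INDEX idx_account_first_name_last_name
    ON account (first_name, last_name);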
Now we get:
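    EXPLAIN ANALYZE SELECT * FROM account
    WHERE first_name = 'Michael' AND last_name = 'Smith';
    -- abridged plan output; the exact numbers will vary:
    -- Index Scan using idx_account_first_name_last_name on account  (...)
    --   Index Cond: ((first_name = 'Michael'::text) AND (last_name = 'Smith'::text))
    -- Execution time: 0.013 ms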
We didn't gain much in this example (running time 0.017 ms vs 0.013 ms),
but it's worth knowing that
multicolumn indexes are available in PostgreSQL.
If we define an index on columns col1, ..., colN, PostgreSQL can use it
to optimize queries that test for equality on columns col1, ..., colK and
that use a comparison (equality, greater than, less than, etc.) on col(K+1).
In other words, our multicolumn index will be used to optimize queries like:
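    SELECT * FROM account WHERE first_name = 'Michael';
    SELECT * FROM account WHERE first_name = 'Michael' AND last_name = 'Smith';
    SELECT * FROM account WHERE first_name = 'Michael' AND last_name < 'Smith';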
But our index cannot be used to optimize this query:
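    -- the leading column (first_name) is not constrained
    SELECT * FROM account WHERE last_name = 'Smith';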
The last thing to remember is that multicolumn indexes can take up a lot of space
on disk. We can get the size of all indexes for a given table via this query:
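    -- total size of all indexes on the account table
    SELECT pg_size_pretty(pg_indexes_size('account'));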
Sometimes we want to index only a subset of the table rows. In our example we may
want to index the emails of only active users.
To support such scenarios PostgreSQL provides partial indexes.
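Assuming the account table has a boolean active column (both the column and the index name here are illustrative), the partial index could look like this:

    CREATE INDEX idx_account_active_email
    ON account (email)
    WHERE active;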
After the index is created, the database will use it to speed up queries like:
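    SELECT * FROM account WHERE email = 'John.Doe@test.com' AND active;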
But it will use a Seq Scan to perform queries like:
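    -- neither query guarantees active = true, so the partial index cannot be used
    SELECT * FROM account WHERE email = 'John.Doe@test.com' AND NOT active;
    SELECT * FROM account WHERE email = 'John.Doe@test.com';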
That's all for today, thanks for reading.
If you have any suggestions on how I can improve this blog, please
leave a comment!