Database Indexes Simplified
What is a database index? Understanding database indexes and DB optimization can be hard and confusing. I wanted to see if I can provide a simple explanation of what a DB index is, how it works, and how database indexes can be used. A good analogy for a database index could be an index in a book. If you are looking for a specific section in a book, you can either flip through every page (full table scan in database world) or find the section in the index and get a page number (pointer). Let’s consider a simple example: a table “people” with 2 columns (name and age) and we are using this table to store millions of user names with their appropriate age.
If you run a select query on that table ( SELECT * FROM people WHERE age = 30 ) you are making a full table scan (every row is searched for a match). Indexes are all about optimizing that table scan or speeding up search queries by eliminating the number of rows or records that need to be checked.
In more detail, an index is a data structure that stores column values from a table and pointers to the corresponding rows in the table. A pointer is a reference to a place in memory where the row data is stored on disk. So, in our example above, one of the values in the index for age could be (“age”,0×12345) where 0×12345 is the pointer (address on disk) where row data for “30″ is stored.
B- trees are a common data structure used for indexes as they are sorted. Hash tables is another data structure for indexes but they are not sorted. They are great for looking up key value pairs (queries that check equality) like our example above.
If we created a hash index on the age column above, the age value would be a key into the hash table (basically an associative array) and the actual value mapped to that key would just be a pointer to the row data in the table. So, the query above would be looking up a value 30 in a hash table and getting back a reference to the row instead of scanning the entire table.
Since hash tables are not sorted, they would not be good for queries like finding out all people that are less than 30 years old. B- trees would be good for that and they would also be good for our example above as the age values are now sorted and all numbers “30″ are next to each other so you don’t have to scan the entire table.
It’s important to note that the index for age does not store the values of other columns (name) in the table. If it did, you would just be creating another copy of the table which makes no sense. It stores pointers to the table row so that other column values can be retrieved (like the name or other fields you might have).
When you run the query above, the database checks to see if there are indexes on the columns being queried and makes a decision if it makes sense to use the index (with some queries it makes sense to scan the entire table).
Here’s how to create an index: CREATE INDEX index_name ON table_name (column_name)
In our example, it would be: CREATE INDEX age_index ON people (people_age)
Remember that indexes take up some space and they need to be updated when you add, delete, or update rows in your table. Wikipedia describes a database index as a data structure that improves the speed of data retrieval operations on a database table at the cost of slower writes and increased storage space. Hope this is a good introduction to understanding indexes in a database.
Guide to Indexes
How DB indexing works
btree vs hashtable