|
Need help optimizing a query
I'm using PostgreSQL 8.0, with PostGIS (the geo-spatial extensions) added in.
I have two tables, "trip" and "data". The trip table has about a thousand entries, the data table about 52 million.
When a trip occurs, a 1000x1000 grid is formed. For each cell in the grid, if any data is collected, it is entered into the "data" table. The result is that for each trip, a partial 1000x1000 grid is inserted into the table (ie, only the cells with data in them are entered, not blank ones).
The important columns in the "data" table are the grid location, the trip the particular row-entry is associated with, and the amount of data collected.
The goal is to find all the trips the collected data within a certain area (defined in the "data" table) between given dates (defined in the "trip" table).
Now, I can perform a query on the "trip" table and the "data" table individually pretty quickly - usually within ten seconds or so for the "data" table. However, if I try to cross-index them - trip.id = data.id, it takes upwards of half an hour. I have indexes on the id field in both tables, as well as on the grid location (in the form of a geospatial index) on the "data" table. I do not have a primary key in the "data" table.
I have tried using inner joins in an attempt to force the two aspects of the queries to run separately then link the two results, but the speed improvement was negligible. I'm not terribly experienced with joins, so I might be doing something wrong.
Any ideas on how to optimize this query?
|