Code Newbie
News     Forums     Search     Members     Sign Up    

My Code Newbie
Username

Password

Articles/Snippets
ASP Classic
ASP.NET
C
C#
C++
HTML / CSS
Java
Javascript
Linux / BSD
Perl
PHP
Python
Ruby
SQL
VB 6
VB.NET

C.N. Friends
  Planet Rome

Link to Us!
Code Newbie
  Code Newbie
    forums
Old 12-21-2005, 02:51 PM   #1 (permalink)
Belisarius
Java fanboy
 
Belisarius's Avatar
 
Join Date: Aug 2003
Posts: 1,140
Belisarius is on a distinguished road
Need help optimizing a query

I'm using PostgreSQL 8.0, with PostGIS (the geo-spatial extensions) added in.

I have two tables, "trip" and "data". The trip table has about a thousand entries, the data table about 52 million.

When a trip occurs, a 1000x1000 grid is formed. For each cell in the grid, if any data is collected, it is entered into the "data" table. The result is that for each trip, a partial 1000x1000 grid is inserted into the table (ie, only the cells with data in them are entered, not blank ones).

The important columns in the "data" table are the grid location, the trip the particular row-entry is associated with, and the amount of data collected.

The goal is to find all the trips the collected data within a certain area (defined in the "data" table) between given dates (defined in the "trip" table).

Now, I can perform a query on the "trip" table and the "data" table individually pretty quickly - usually within ten seconds or so for the "data" table. However, if I try to cross-index them - trip.id = data.id, it takes upwards of half an hour. I have indexes on the id field in both tables, as well as on the grid location (in the form of a geospatial index) on the "data" table. I do not have a primary key in the "data" table.

I have tried using inner joins in an attempt to force the two aspects of the queries to run separately then link the two results, but the speed improvement was negligible. I'm not terribly experienced with joins, so I might be doing something wrong.

Any ideas on how to optimize this query?
__________________
GitS
Belisarius is offline   Reply With Quote
Old 12-21-2005, 09:13 PM   #2 (permalink)
teknomage1
Jack of all trades
 
teknomage1's Avatar
 
Join Date: Feb 2005
Location: Los Angeles
Posts: 596
teknomage1 is on a distinguished road
Send a message via AIM to teknomage1
I'm not always the best at optimizing, but I think the reason your query slows down is that when you do any sort of join, it creates the cartesian product of both tables and then searchs the resulting millions of rows for valid ones.

You might be able to avoid this by doing using perl (EDIT: oops forgot to look at the poster ... )[Or Java] to do two queries (don't know if Postgres supports nested queries) to return only the trip_id number from trips table to get all the trips from between the two dates and then construct an ugly looking SQL query on the data table in the form of 'SELECT location_data FROM data WHERE Trip_id = ($result1 | $result 2 | ...)' so basically make your program logic layer perform a join without generating a huge temp table (which may have forced your machine to start swapping pages).

I'm sure other folks here know how to actually use the advanced features of Postgres to do all this inside the database.
__________________
Stop intellectual property from infringing on me
teknomage1 is offline   Reply With Quote
Old 12-22-2005, 12:49 AM   #3 (permalink)
Belisarius
Java fanboy
 
Belisarius's Avatar
 
Join Date: Aug 2003
Posts: 1,140
Belisarius is on a distinguished road
I had considred that (I'm actually doing this all in PHP), but I figured there had to be a way to do it in SQL.
__________________
GitS
Belisarius is offline   Reply With Quote
Old 12-23-2005, 06:33 AM   #4 (permalink)
DJMaze
Senior Contributor
 
DJMaze's Avatar
 
Join Date: Mar 2005
Posts: 651
DJMaze is on a distinguished road
You could create an index that uses both tables.
Something like
Code:
CREATE INDEX trip_data ON (trip, data) USING btree (trip.id, data.id);
Code:
SELECT * FROM (trip, data) WHERE trip.id = data.id;
I'm not 100% if this code is correct cos i don't use table binding often.
There's a better one which also supports a great DELETE feature so if you delete a trip.id it also deletes all data entries that have the same id
DJMaze is offline   Reply With Quote
Reply

Bookmarks

Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
ASP ADO SQL Query TheShadow MS Technologies ( ASP, VB, C#, .NET ) 2 06-09-2005 08:01 AM
Query returning hyperlink GameOn PHP 3 06-06-2005 07:21 PM
Inserting the results of a select query into seperate tables Ste_Boro PHP 1 02-17-2005 05:05 AM
a query type of questiton sde Program Design and Methods 6 08-21-2003 04:55 PM
foreach() while() for() or new mysql query? nemesis PHP 6 06-10-2003 06:57 PM


All times are GMT -8. The time now is 02:07 AM.


Powered by vBulletin® Version 3.7.0
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.0.0 RC8





Copyright © 2000-2008, Milano Interactive
Web Hosting provided by Portal 360 Web Hosting