| BIK - Constructing SQL that selects the latest versions of records |
|---|
|
Problem at hand:
We have a table where several versions of the same record may exist
(for instance, person_name along with person_position.
We may only be interested in the last position the person occupied).
Usually transaction date is the determining factor.
|
| Example |
|---|
Let's consider a table like this
(We are using Oracle 9i syntax just because we don't want to actually create a table,
but if you still use 8i you can create this table yourself):
with mytable as (
select 1 key1, 2 key2, 'abc' col1, to_date('20050701','YYYYMMDD') txn_dt from dual
union
select 1 key1, 2 key2, 'bcd' col1, to_date('20050702','YYYYMMDD') txn_dt from dual
union
select 1 key1, 3 key2, 'ert' col1, to_date('20050703','YYYYMMDD') txn_dt from dual
union
select 2 key1, 1 key2, 'qwe' col1, to_date('20050701','YYYYMMDD') txn_dt from dual
union
select 2 key1, 1 key2, 'plo' col1, to_date('20050702','YYYYMMDD') txn_dt from dual
union
select 3 key1, 4 key2, 'dfg' col1, to_date('20050701','YYYYMMDD') txn_dt from dual
union
select 4 key1, 4 key2, 'sdf' col1, to_date('20050702','YYYYMMDD') txn_dt from dual
)
SELECT * FROM mytable;
We also assume that the table has two keys that will make our records different once we get rid of older versions).
It would display results like this:
KEY1 KEY2 COL1 TXN_DT
---------- ---------- ---- ---------
1 2 abc 01-JUL-05
1 2 bcd 02-JUL-05
1 3 ert 03-JUL-05
2 1 plo 02-JUL-05
2 1 qwe 01-JUL-05
3 4 dfg 01-JUL-05
4 4 sdf 02-JUL-05
notice to records with key1, key2 pair 1,2 (7/1/2005 and 7/2/2005),
two records with key1, key2 pair 2,1 (7/2/2005 and 7/1/2005).
What we want is just the latest record for each of them. |
| Solution1. (Standard). |
|---|
Let's rewrite this statement as follows:
with mytable as (
select 1 key1, 2 key2, 'abc' col1, to_date('20050701','YYYYMMDD') txn_dt from dual
union
select 1 key1, 2 key2, 'bcd' col1, to_date('20050702','YYYYMMDD') txn_dt from dual
union
select 1 key1, 3 key2, 'ert' col1, to_date('20050703','YYYYMMDD') txn_dt from dual
union
select 2 key1, 1 key2, 'qwe' col1, to_date('20050701','YYYYMMDD') txn_dt from dual
union
select 2 key1, 1 key2, 'plo' col1, to_date('20050702','YYYYMMDD') txn_dt from dual
union
select 3 key1, 4 key2, 'dfg' col1, to_date('20050701','YYYYMMDD') txn_dt from dual
union
select 4 key1, 4 key2, 'sdf' col1, to_date('20050702','YYYYMMDD') txn_dt from dual
)
SELECT * FROM mytable WHERE (key1, key2, txn_dt) IN
(SELECT key1, key2, max(txn_dt) FROM mytable GROUP BY key1, key2)
;
It produces results like this:
KEY1 KEY2 COL1 TXN_DT
---------- ---------- ---- ---------
1 2 bcd 02-JUL-05
1 3 ert 03-JUL-05
2 1 plo 02-JUL-05
3 4 dfg 01-JUL-05
4 4 sdf 02-JUL-05
|
| Solution. (Unusual). |
|---|
with mytable as (
select 1 key1, 2 key2, 'abc' col1, to_date('20050701','YYYYMMDD') txn_dt from dual
union
select 1 key1, 2 key2, 'bcd' col1, to_date('20050702','YYYYMMDD') txn_dt from dual
union
select 1 key1, 3 key2, 'ert' col1, to_date('20050703','YYYYMMDD') txn_dt from dual
union
select 2 key1, 1 key2, 'qwe' col1, to_date('20050701','YYYYMMDD') txn_dt from dual
union
select 2 key1, 1 key2, 'plo' col1, to_date('20050702','YYYYMMDD') txn_dt from dual
union
select 3 key1, 4 key2, 'dfg' col1, to_date('20050701','YYYYMMDD') txn_dt from dual
union
select 4 key1, 4 key2, 'sdf' col1, to_date('20050702','YYYYMMDD') txn_dt from dual
)
SELECT * FROM (
SELECT mytable.*
,row_number() OVER (PARTITION BY key1, key2 ORDER BY txn_dt DESC) rn
FROM mytable
) WHERE rn<2
;
This solution produces the following results (note the last column):
KEY1 KEY2 COL1 TXN_DT RN
---------- ---------- ---- --------- ----------
1 2 bcd 02-JUL-05 1
1 3 ert 03-JUL-05 1
2 1 plo 02-JUL-05 1
3 4 dfg 01-JUL-05 1
4 4 sdf 02-JUL-05 1
This solution is a bit unisual but may have some other uses.
For instance, if you try the inner part of this sql:
with mytable as (
select 1 key1, 2 key2, 'abc' col1, to_date('20050701','YYYYMMDD') txn_dt from dual
union
select 1 key1, 2 key2, 'bcd' col1, to_date('20050702','YYYYMMDD') txn_dt from dual
union
select 1 key1, 3 key2, 'ert' col1, to_date('20050703','YYYYMMDD') txn_dt from dual
union
select 2 key1, 1 key2, 'qwe' col1, to_date('20050701','YYYYMMDD') txn_dt from dual
union
select 2 key1, 1 key2, 'plo' col1, to_date('20050702','YYYYMMDD') txn_dt from dual
union
select 3 key1, 4 key2, 'dfg' col1, to_date('20050701','YYYYMMDD') txn_dt from dual
union
select 4 key1, 4 key2, 'sdf' col1, to_date('20050702','YYYYMMDD') txn_dt from dual
)
--SELECT * FROM (
SELECT mytable.*
,row_number() OVER (PARTITION BY key1, key2 ORDER BY txn_dt DESC) rn
FROM mytable
-- ) WHERE rn<2
;
you will get results like this:
KEY1 KEY2 COL1 TXN_DT RN
---------- ---------- ---- --------- ----------
1 2 bcd 02-JUL-05 1
1 2 abc 01-JUL-05 2
1 3 ert 03-JUL-05 1
2 1 plo 02-JUL-05 1
2 1 qwe 01-JUL-05 2
3 4 dfg 01-JUL-05 1
4 4 sdf 02-JUL-05 1
Notice what is happening here is that column RN is giving you a sequence number
within its group. This may be useful by itself.
The outer SQL just selects the first record from each group.
|
Back to BIK Information Services home page