sql - Is integer column index faster than string column index in case of index range scan? -
i have database implementation task on sql server
, there table a
containing column - yearmonth
. don't need date operations such calculating how many days or month between 2 dates etc. currently. yearmonth can defined date
, int
, or varchar(6)
. perspective of saving data space, 4-bytes int
best choice 6-digit int needed e.g. 201701
. whereas varchar(6)
occupies 6 bytes, date
occupies 2x4 bytes. (applied databases)
but perspective of indexing, esp. in case of index range scan?
- if column
yearmonth
definedvarchar(6)
, index range scan can happen when using queryselect .. yearmonth in (...)
- if column
yearmonth
definedint
ordate
, index range scan can happen operator<=
,<=
etc.
in cases above, type of column definition more efficient when index range scan happens?
most (if not all) dbms store date integer anyway, , datetime 2 integers, 1 date , 1 time, there little or no difference between two. think biggest consideration how intend use column, if want kind of date operations on column, store date (defaulting 1st day of month). instance, if want know how many months between 201604
, 201701
easier using date, if want format value april 2017
lot easier if stored date.
another consideration validation, if have varchar(6) or int, need additional check constraints ensure value entered in fact valid date, enter 999999
, , while year valid, month not, , varchar possibilities endless nonsense can entered.
now have tagged sql server can answer more definitively - both date
, int
take 4 bytes of storage there no space saved, , testing both perform pretty same (date performs marginally, not better , fewer reads), there no benefit using int (unless don't want restricted valid dates)
i have done quick tests using following schema:
create table dbo.tdate (id int identity(1, 1) primary key, dt date not null); insert dbo.tdate (dt) select top 100000 dateadd(month, rand(checksum(newid())) * 300, '20000101') sys.all_objects a, sys.all_objects b; create nonclustered index ix_tdate_dt on dbo.tdate (dt); create table dbo.tint(id int identity(1, 1) primary key, dt int not null); insert dbo.tint (dt) select (datepart(year, dt) * 100) + datepart(month, dt) dbo.tdate; create nonclustered index ix_tint_dt on dbo.tint (dt);
then running compare performance
declare @d1 date = (select top 1 dt dbo.tdate order newid()); declare @d2 date = (select top 1 dt dbo.tdate dt > @d1 order newid()); declare @i1 int = (datepart(year, @d1) * 100) + datepart(month, @d1), @i2 int = (datepart(year, @d2) * 100) + datepart(month, @d2); set statistics io on; set statistics time on; select count(*) dbo.tdate dt >= @d1 , dt < @d2; select count(*) dbo.tint dt >= @i1 , dt < @i2; set statistics io off; set statistics time off;
Comments
Post a Comment