sql - Is integer column index faster than string column index in case of index range scan? -

August 15, 2011

i have database implementation task on sql server, there table a containing column - yearmonth. don't need date operations such calculating how many days or month between 2 dates etc. currently. yearmonth can defined date, int, or varchar(6). perspective of saving data space, 4-bytes int best choice 6-digit int needed e.g. 201701. whereas varchar(6) occupies 6 bytes, date occupies 2x4 bytes. (applied databases)

but perspective of indexing, esp. in case of index range scan?

if column yearmonth defined varchar(6), index range scan can happen when using query select .. yearmonth in (...)
if column yearmonth defined int or date, index range scan can happen operator <=, <= etc.

in cases above, type of column definition more efficient when index range scan happens?

most (if not all) dbms store date integer anyway, , datetime 2 integers, 1 date , 1 time, there little or no difference between two. think biggest consideration how intend use column, if want kind of date operations on column, store date (defaulting 1st day of month). instance, if want know how many months between 201604 , 201701 easier using date, if want format value april 2017 lot easier if stored date.

another consideration validation, if have varchar(6) or int, need additional check constraints ensure value entered in fact valid date, enter 999999, , while year valid, month not, , varchar possibilities endless nonsense can entered.

now have tagged sql server can answer more definitively - both date , int take 4 bytes of storage there no space saved, , testing both perform pretty same (date performs marginally, not better , fewer reads), there no benefit using int (unless don't want restricted valid dates)

i have done quick tests using following schema:

create table dbo.tdate (id int identity(1, 1) primary key, dt date not null); insert dbo.tdate (dt) select top 100000 dateadd(month, rand(checksum(newid())) * 300, '20000101') sys.all_objects a, sys.all_objects b;  create nonclustered index ix_tdate_dt on dbo.tdate (dt);  create table dbo.tint(id int identity(1, 1) primary key, dt int not null); insert dbo.tint (dt) select (datepart(year, dt) * 100) + datepart(month, dt) dbo.tdate;  create nonclustered index ix_tint_dt on dbo.tint (dt);

then running compare performance

declare @d1 date = (select top 1 dt dbo.tdate order newid()); declare @d2 date = (select top 1 dt dbo.tdate dt > @d1 order newid()); declare @i1 int = (datepart(year, @d1) * 100) + datepart(month, @d1),         @i2 int = (datepart(year, @d2) * 100) + datepart(month, @d2);   set statistics io on; set statistics time on;  select  count(*)    dbo.tdate   dt >= @d1 ,     dt < @d2;  select  count(*)    dbo.tint   dt >= @i1 ,     dt < @i2;  set statistics io off; set statistics time off;

Search This Blog

MOno

sql - Is integer column index faster than string column index in case of index range scan? -

Comments

Post a Comment

Popular posts from this blog

'hasOwnProperty' in javascript -

python - ValueError: No axis named 1 for object type <class 'pandas.core.series.Series'> -

java - How to provide dependency injections in Eclipse RCP 3.x? -