Saturday, February 7, 2009

C#: DataTable Performance


I thought it was time for me to share a simple performance tip when working with DataTables in .Net. We all know indexes in SQL Server, or any database, improve select access times when used correctly. Did you know you can do the same with DataTables?

Here's what I did. I created a DataTable with 2 columns, an ID of type int, and a description of type string. The initial sample created 100,000 rows with the ID in incrementing order starting at 1.
DataSet ds = new DataSet();
DataTable dt = ds.Tables.Add("TestTable");
dt.Columns.Add("id", typeof(int));
dt.Columns.Add("description", typeof(string));

for (int i = 0; i < 100000; i++)
{
DataRow dr = dt.NewRow();
dr["id"] = i;
dr["description"] = "asdf" + i;
dt.Rows.Add(dr);
}

Simple enough. Now, we do the standard DataTable.Select() to get rows with ID between 40,000 and 60,000:
DataRow[] drs = dt.Select("id > 40000 and id < 60000");
Using the StopWatch class, we get a timing (on my 2.8GHz laptop) averaging 267ms. Not bad, it's a lot of data we're selecting. But, let's see what happens when we do this:
dt.PrimaryKey = new DataColumn[] {dt.Columns["id"]};
DataRow[] drs = dt.Select("id > 40000 and id < 60000");
The times average 2ms on my laptop! That's 2 orders of magnitude faster! Well, you're thinking, "I don't use that much data in my datatable! I only have 1000 rows, but I select from it often." Alright, let's try that out:
for (int i = 0; i < 1000; i++)
{
int lower = r.Next(0, 500);
int upper = r.Next(500, 1000);
DataRow[] drs = dt.Select("id > " + lower + " and id < " + upper);
}

Doing that with 1000 rows in the DataTable yields a total time averaging 1,916ms. Let's put in the primary key. We get a timing of 183ms!!! So, on a 1000 row DataTable, selecting from it 1,000 times using a primary key results in a gain of an order of magnitude in performance time!

So, use those primary keys! Of course, a StringBuilder in that last example would help things out, too...

4 comments:

  1. Thanks!

    BTW StringBuilder in that last example would NOT help

    ReplyDelete
  2. I have 12000 rows in database table. i need some kind of calculation these before final use of this fields. when i try to get it by "datarow", one by one it takes too much time to generate complete datatable. I don't know what should i do..

    ReplyDelete
  3. Does it work if you add these to multiple columns?

    ReplyDelete
  4. Yes. I frequently add multiple columns as the primary key to datatables. This gives huge performance gains.

    ReplyDelete