Changing the datatype of an large table

28 Sep 2019 0 minutes to read Contributors

When you change the datatype of (large) tables (in rows or in size) you have to consider a few things:

  •  the time it takes to perform the action (=downtime of the table)
  •  the amount of data that needs to be restructured (I/O operations)
  •  the temporary LOG and TEMPDB workspace needed for the transactions

ALTER TABLE ALTER COLUMN VARCHAR(MAX)

  1. Altering an existing table will be performed in a single transaction.
  2. The time needed for altering a table will be depending on the speed of the I/O operations and the amount of available memory. 
  3. Because it is a single transaction, the size of the table x 1.5+ will need to fit in the LOG space.

The alter statement will go through every record and try to restructure it to the new structure. It will need also to rebuild the indexes

In case the transaction is being rolled back when you run out of transaction log space it will take 5x the time to roll back.

Prerequisites

  • Make sure you have more than enough LOG space available. Do not calculate it on 1.5x the size of the table but preserve more than 2x the size of the table at least.
  • Create additional emergency logfiles on drives that have enough diskspace (if they are not used, you can drop them after the operation). 
  • You will have an enormous amount of data operations. Every record will be updated and in case of an failure rolled back. Your disk subsystem will be fully utilized for the length of the operation.
  • Your operation will block all access to the affected table.

An important rule to remember is that you cannot influence the operation. Once it is started 2 things can happen, 1 the operation will complete succesfully, 2 the operation will be rolled back. During the operation you can only wait and monitor your system. You will also have limited possibilities to intervene. 
Because of this rule you have to prepare carefully. If you do not prepare and the operation will fail after 1 hour, you will have to wait 4-5 hours before it is rolled back. 

Alternatives

As a database administrator we always want to be able to control the actions we perform. Therefore we take the road of least resistance and maximum control.

Instead of altering a large table do the following:

  1. rename the table you want to alter
  2. create a new table with the original name with the new datastructure
  3. in batches - copy the data into the new table

In case your table is in replication, remove it from replication, add the new table to replication without initializing (no snapshot will be created, the original subscriber table will stay in place and new records will be replicated)

This approach makes your operation a very fast one. Your new structure will be directly available. The copying of data you can control. You can do batch copies or segmented BCP out/in's

Think of it this way, the result of both operations is equal. the amount of resources, time and control significantly works in your advantage.

Big Databases

Big databases require a different way of thinking and planning. 

In this article