Tuesday, May 26, 2009

Best Practices , Design and Development guidelines for Microsoft SQL Server

Madhu K Nair, Jul 24, 2008

There are many resources available in the net but here I have a list of Best Practices,Design Guidelines and General Guidelines in Database Development and Designing specifically for SQL Server.

Best Practices

1. Use Stored Procedure: Benefits are as follows :-
(a) Code reusability
(b) Access Control: You can control permission on sp
(c) Execution plan reusability : Though adhoc query also create and reuse plan, the plan is reused only when the query is textual match and the datatypes are matching with the previous call. Any datatype or you have an extra space in the query then new plan is created

Select  Sal from Employee where sal=$10  --Money
Select  Sal from Employee where sal=10  -- Int 

Above statements will create different execution plan because the datatype of value is different.
(d) Prevent SQL Injection
(e) Procedure gives more Readability and Manageability.

2. Use Fully Qualified Name for objects : This is very significant. You must use fully qualified name when you refer any object in SQL Server. Ie. SchemaName.ObjectName. Because, when the execution plan is prepared by the query engine , in the binding process, Query engine has to resolve the Object existence. If you specify the fully qualified name the object resolution become easy for the engine and also it will be more readable.

3. Avoid using Scalar Function in SELECT statement: Recently I faced this issue and I emphasis this point. Never use Scalar function inside a query which returns a large number of rows. Scalar function behave like a cursor when you use Scalar function inside a query which returns large number of rows . Change the scalar function to Inline or Multiline table function or a view.

4. Avoid Mixing-up DML and DDL statement on a temp table inside sp : This is very important. When you Create a temptable (#table) and ALTER the same temptable in the same storedprocedure later, this DDL and DML mix-up causes the stored procedure to get recompiled. So, if a stored procedure is getting recompiled in each call check this point.

5. Select only the required columns: Select only the required column in select statement. Usage of SELECT * can be the cause of NOT using the indexes available on the table . Also if you are selecting more data then you are doing more IO. In short we should limit the IO.

6. Avoid Usage of HINTS : HINTS prevent Query engine automated optimization capability. You may find a hint gives you better performance on a particular scenario. But it may behave differently as the data grows or when scenario changes. Check this KB on HINTS http://msdn.microsoft.com/en-us/library/ms187713.aspx

7. Use Table variable and Temp table as far as possible: You must use Table variable (@TableName) or Temp table (#TableName) for intermediate storage of records in an procedure. Avoid using Table variable for large record set. There are pros and cons between Table variable and Temp table, but in general, if the record set is small you should go for Table variable.

8. Use SET NOCOUNT ON : Basically, you must reduce the data transferred on the network. Database Engine, return the number of rows effected by the statements to the client which is unnecessary and you can avoid that using this statement. It is a must in all Stored procedure.

9. Do not change SET OPTION In connection: Changing SET Option during connection or anywhere else will cause the stored procedure to recompile. Refer this KB for more info http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx

10. EXISTS vs IN : IN Operator can easily be replaced by EXISTS which is more optimized for correlated queries. But you may find IN better for small tables.

11. Keep the transaction as short as possible: Deadlock is a outcome of ill-formed query. You must keep the transaction as short as possible to avoid dead lock. And also refer the object in the same order inside transaction.

12. Avoid user input inside a transaction: Do not accept a user input inside a transaction.

13. Avoid doing Front-End work in Databases : String processing , Ordering , Conversion etc can easily be done at the client side and you should avoid doing these kind of work in Database. The point here is, you should have a clear segregation of tasks between Data Layer, Data Access Layer (DAL) and Business Layer. For example, you can easily number your rows in client side where as if you do that in Database you have to use a Function or extra join which can be avoided.

14. Avoid Function in select statement: Any functions like CONVERT(),CAST,ISNULL usage in query may ignore the indexes available on the table.

15. Do not use EXEC (‘String’) , use sp_executeSql : As far as possible you try to avoid Dynamic SQL. If there is no other option use sp_ExecuteSQL DO NOT USE EXEC(‘string’). Because EXEC statement is prone to SQL Injection and it is not parametrized query which can re-use execution plan.

16. Use proper size for the input parameter: This is one of the step to avoid SQL Injection and also the reduce the memory usage.

17. Do not keep the name of sp with sp_ prefix: Naming convention is very important. Do not name the storedprocedures with SP_ as prefix (eg sp_ somespname ) because this naming convention is used for system stored procedure in MS SQL Server.

18. USE WHERE Condition as far as possible: Basically, you should limit the rows fetched by the query.

19. Avoid Negative operator : Avoid using <> , NOT IN, NOT EXISTS kind of operator because it causes a table scan. Query engine has to ensure there is not data till the last row is read.

20. Avoid Cursor /loops: In SET Based operation, in general looping can be avoided.

21. Avoid using Like ‘% %’ : If you use % in both side of the searching value, the query will go for table scan which should be avoided. If the application is more text searching kind go for Full Text Index.

22. Do not use WITH Recompile : Usage of WITH Recompile causes the procedure to recompile each time it call. You must avoid this command.

23. JOIN Consideration : When you JOIN two table consider these points
(a) Avoid using negative operator (<> ,NOT IN) in JOIN
(b) Avoid Like operator in Join

Design Guidelines

1. Create Covering indexes: Create covering indexes. Covering index will have all the data required by the query at the leaf level itself. Covering contains all the columns used in SELECT, WHERE, ORDERBY, JOIN etc.

Select Col1,Col2 From YourTableName Where Col3=1 Order by Col4.

The coveing index for the above mentioned query will be
Col1+ col2+ col3+ col4. (Note : Most selective column should come first in the index creation statement)

2. Remove Unwanted indexes : In SQL Server 2005 it is very easy to find unused indexes. Too many or too less indexes on a table are equally bad. If you have unwanted/unused indexes on a table Insert/Update statement will have performance hit and also we all know indexes consume space.

3. Create the indexes most selective column as first column in Index : Index creation has to be done after proper analysis. You must create the index with Most Selective column at first and so on.

4. Formatting the stored procedure and queries : You must have a format / template for each object (sp/function/views) and everyone (the dev team) should stick to the format defined. And also the query has to be formatted well so that it is more readable.

5. Use Identity column if the table is INSERT oriented table as Clustered Index to avoid page split : This is a design and data modeling issue. If you have more insert kind of table (some kind of logging table) then you must go for Identity Column (ever increasing) as Clustered Index. This helps to resolve page split. There may be Hotspot issue (all transaction contending for same page of a table), but I have never faced.

6. Use proper fillfactor for Indexes: Very important to avoid Page Split. In general transactional table can be kept at 80-90 fillfactor.

7. Balanced Normalization / De-normalization: You must have a trade off between Normalization and de-normalization. At time De-normalization can give you better performance at the cost of Data redundancy.

8. Primary Key size and Composite primary key : You must limit the size of the PK because, in a relational database, you may be creating foreign key which refers this primary key. If you have multiple Column in PK (composite PK) or big size , you are knowingly or unknowingly increasing the space usage. If the composite PK contains more than 3 columns then you may go for surrogate key like Identity column as PK.

9. Do not alter system Objects: If your application requires some tweaking of system objects then you are in trouble. The structure of system object can be changed by Microsoft in any release or patches. So avoid such modeling.

Guidelines for Datatype Selection

As a Database architect I believe in the significance of proper datatype selection while designing the tables. if you do a proper analysis of the data and then select the datatype, then you can control the row, page, table size and hence increase the overall performance of the database. Following points you may consider when you design a table :-

1. If your database is to support web-based application better to go for UNICODE for the scalability of the application. (Unicode (nchar, nvarchar) takes 2 bytes per char where as ASCII (char,varchar) datatypes takes 1 bytes per char)

2. If your application is multi-lingual go for UNICODE.

3. If you are planning to include CLRDatatype (SQL Server 2005) in the database go for UNICODE Datatypes , because, if CLRDatatype is going to consume the data then it must be in UNICODE.

4. For numeric column, find the range that column is going to have and then choose the datatype. For eg. You have a table called Department and DepartmentID is the Primarykey Column. You know that the maximum rows in this table is 20-30. In such cases it is recommended to choose TinyINT datatype for this column. Generally keeping all numeric columns type of INT without analyzing the range that column going to support is not at all recommended from storage perspective.

5. Description /Comments /Remarks sort of columns may or may not have data for all the rows. So it is better to go for Variable datatypes like Varchar ,Nvarchar.

6. If you know the column is not nullable and it may contain more or less the same size of the data then for sure go for Fixed datatype like CHAR or NCHAR. Having said that it is important to know that, if you select fixed datatypes and if the column is nullable then, if you do not have any data (null) then also the column will consume the space.

7. If the size of the column is less than 20 char , use fixed width datatypes like NCHAR or CHAR.

8. I have seen in many applications use Decimal to store currency kind of data though the application needs the precision which can be supported by money. So, my point here is, use Money datatype if you need only 4 precision.

9. Use UniqueIdentitifier column as PK and ClusteredIndex or so only when it is unavoidable because UniqueIdentitifier takes 16 Bytes of the space.

General Guidelines

1. Write ANSI standard Code : You must write standard code which will scale your application. ie migration to next version will not be an issue. Do not use Deprecated features. For eg. There are DBCC Command to rebuild index but in SQL Server 2005 in order to standardize things, you have ALTER INDEX command which does the same thing.

2. Do not give Error message which expose your architecture to the frontend: I have seen folks giving very detailed error message which tells you “ blah blah table do not have this rows in blah blah database” kind which can be a invitation to the hacker

3. Use proper Isolation level required for the application: This is very significant. Before going for any isolation level, you must know the implication. All application cannot afford READUNCOMMITTED Isolation level since it can cause data inconsistency issues like Dirty Read, Phantom read, Lost Update etc. WITH NOLOCK Hint is nothing but READ UNCOMMITTED isolation level.

4. Keep the Database Object Scripts in Source Control: We all are aware of this point but generally ignore. This is important for fault tolerance and management when multiple developers are working on same project.

Last edited Aug 4 2008 at 9:27 PM by madhuottapalam, version 10


This email has been scanned by the MessageLabs Email Security System.
For more information please visit http://www.messagelabs.com/email

Sunday, May 24, 2009

SQL Server Support Policy for Failover Clustering and Virtualization gets an update...

According to the following article, SQL 2005/2008 clustering under Windows 2008 using VMWare/Hyper-V is now officially supported by Microsoft.





Thursday, May 14, 2009

Calling external web service from with SQL Server 2005

I am trying to establish the way to call external web service from within sql server 2005, previously we used a javascript and use SQL Agent Job to execute the script. However, it seems that SQL 2005 has problem with it.

The external consultant proposed another way, which is to use OLE Automation, while we managed to make it work occasionally, it seems to be unreliable at all. The cause of this is still unclear, I will look back once I get more timee.

After all, we know that the native way to call external web service is to use CLR and create assembly, since I haven’t done this before, it will be quite interesting to figure out how it works. So far, I have managed to create the CLR assembly and made it working. No sure why the other two methods didn’t work, but this final approach works pretty well.


Tuesday, May 12, 2009

Does upgrading to SQL Server 2008 fit your business?

Does upgrading to SQL Server 2008 fit your business?

Don Jones, Contributor

SQL Server 2008 has been around long enough to be considered both a known, stable quantity and a big question mark for many businesses. That question is often, "Should we migrate?"

First off, if you are implementing a new SQL Server installation and do not need a specific older version for compatibility reasons, you should buy SQL Server 2008. If you still use SQL Server 2000 or an older version and don't face compatibility hurdles, you should migrate. But the question becomes more complicated for businesses running SQL Server 2005 that have no particular technical barriers that absolutely prevent a migration. What should they do?

It depends on the applications you run
Businesses use SQL Server in two distinct scenarios. The first is to support third-party applications that require SQL Server, and the second is to support internally developed applications. In the first instance, moving to SQL Server 2008 is advisable only if a few conditions hold true. First and foremost, confirm that your application vendor supports SQL Server 2008. Second, migrating is useful only when you have enough control over the application to take advantage of SQL Server 2008's important new features.

SQL Server 2008 isn't so much of an improvement over SQL Server 2005 that you will experience a huge benefit just from migrating. To derive a business benefit, you need to leverage its new features. With many third-party applications, you have zero control over things like the database schema, file groups, index deployment and so forth. Unless the vendor releases a version of its application that leverages SQL Server 2008, there isn't much point in putting yourself through the migration process.

An exception, perhaps, is when you deploy a new SQL Server computer while also moving from an existing 32-bit SQL Server installation to a new x64-based machine. Moving from 32- to 64-bit offers major performance advantages, and unless you specifically need SQL Server 2005, you might as well upgrade to the newest version.

If you are in control of applications
If you use SQL Server to support a custom application, or if you exercise significant control over a third-party application (such as making schema changes and so forth), migrating from SQL Server 2005 to SQL Server 2008 can offer performance advantages, enable new capabilities, and make life easier on your developers. Some of the key new features include the following (though the Microsoft site offers a complete list of new features in SQL Server 2008):

  • Using data compression to save disk space. Granted, these days this capability has become less urgent but in some instances -- such as servers in a branch office -- disk space can still be precious.
  • Support for the ADO.NET Entity Framework and Entity Data Model can facilitate work with data in an object-oriented fashion. This, however, is more important to new applications than to existing ones, which aren't likely to be re-engineered to the degree necessary to take advantage of these features.
  • Some important limitations in the embedded .NET Common Language Runtime (CLR) have been overcome, such as limits on user-defined aggregates and user-defined types.
  • New data types support spatial computing, such as the geography data type, geometry data type, and Virtual Earth integration.
  • FILESTREAM data types allow large blocks of data -- such as photos or files -- to be moved outside the main database pages and into file system files that are faster to access and help trim the main database files .

Reasons to migrate, no matter what
There are a few additional reasons to upgrade to SQL Server 2008, all of which involve server-level improvements that aren't necessarily performance-related. SQL Server 2008 offers certain reliability and availability improvements, such as the following:

  • SQL Server 2008's database mirroring is more robust and, in my view, easier to configure and manage.
  • Policy-based management can be a tremendous feature for improving the consistency of SQL Server's configuration and ongoing management, especially with compliance or other restrictive requirements.
  • A new data auditing feature that provides more granular auditing of SQL Server activity.
  • Transparent Data Encryption (TDE) lets you turn on encryption without requiring massive application-level changes and can help meet specific security and compliance requirements.
  • The data collector helps make performance management and tuning easier and more centralized.
  • The Resource Governor gives administrators central control over CPU and memory allocation, which is useful for servers that host multiple databases and experience significant resource contention between those applications.
  • If you manage multiple servers, Server Group management enables you to run T-SQL queries against multiple servers simultaneously from a central management server. This makes server farm management easier and more consistent.

The bottom line: Is an upgrade worth it?
If some of SQL Server 2008's new features appear compelling or solve your IT shop's specific problems and if migrating your applications doesn't pose compatibility issues, the move to SQL Server 2008 is worth it. If you're already on SQL Server 2005 and do not consider the new features necessities, there may not be a justification to move at this time.

By 2010 or 2011, Microsoft will likely release another version of SQL Server (this predication is based on Microsoft's history, not any inside knowledge), so an upgrade to SQL Server 2008 may be optional if you're not going to use its new features to solve business problems.



This email has been scanned by the MessageLabs Email Security System.
For more information please visit http://www.messagelabs.com/email