
Tuesday 6 September 2011

Encrypting passwords in SQL Server

Despite the fact that everyone knows storing passwords in plain text is evil, I am convinced that many large organisations still store customer passwords in plain text.
Recently a large company was attacked by a group of hackers who used SQL injection to get customer passwords, which they then used to log in to a portal and do bad things.

In this post I will show you a simple way to store passwords securely in your database.

There is an important difference between protecting passwords and protecting other information such as customer salaries or credit card numbers. In most cases neither the DBA nor any other user ever needs to know a customer's password, so all you need is a one-way hash of the password. Credit card numbers, on the other hand, need to be seen in plain form by the Finance team, and employee salaries by the Accounts team, so that kind of data must be encrypted and decrypted.

If you encrypt the password data using EncryptByPassPhrase or EncryptByKey and a hacker or a non-authorised internal user gets hold of your key or passphrase (which will usually be found somewhere on the server), then your passwords can be decrypted using DecryptByPassPhrase or DecryptByKey.
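To illustrate why that is reversible, here is a minimal sketch (the passphrase and the card number are made up) using the built-in EncryptByPassPhrase/DecryptByPassPhrase functions:

-- A minimal sketch: anyone who knows (or finds) the passphrase can reverse the encryption
DECLARE @cipher VARBINARY(256)

-- encrypt a made-up credit card number with a passphrase
SET @cipher = EncryptByPassPhrase('MySecretPassPhrase', '4111-1111-1111-1111')

-- decrypting only requires the same passphrase
SELECT CONVERT(VARCHAR(128), DecryptByPassPhrase('MySecretPassPhrase', @cipher)) AS DecryptedValue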

To do that, I have written a function called password() which takes a string as a parameter and returns a string containing its hash value. The function is inspired by the php function password() .. php rocks ;-)

Create a function

CREATE FUNCTION [dbo].[password]
(
          @stringToHash VARCHAR(128) 
)
RETURNS VARCHAR(128)
AS
/*********************************************************************************
    Name:       dbo.password
    Author:     Y Bamarni
    Purpose:    A function to hash a string
    Date        Initials    Description
    ----------------------------------------------------------------------------
    2011-09-06  YB          Initial Release
*********************************************************************************
Usage:           
    SELECT dbo.password ('mypassword') as encryptedString

*********************************************************************************/
BEGIN

      DECLARE @hash VARCHAR(10),
                  @EncryptedValue   VARCHAR(128)
                 
      SET @hash='SHA1'
      -- style 2 of CONVERT returns the hex string without the leading "0x" characters
      SELECT @EncryptedValue=CONVERT(VARCHAR(128), HASHBYTES(@hash, @stringToHash),2)
      RETURN @EncryptedValue;
END

GO

Now in order to test this, we will create a test table called Test
create table Test (id int identity, username varchar(50), pass varchar(128))

To insert a new record, use the new password function to convert the password string to an encrypted string

insert into Test (username, Pass) values ( 'Yasir1',  dbo.password('mypassword'))
insert into Test (username, Pass) values ( 'Yasir2',  dbo.password('mypassword2'))
insert into Test (username, Pass) values ( 'Yasir3',  dbo.password('mypassword3'))
insert into Test (username, Pass) values ( 'Yasir4',  dbo.password('mypassword4'))

If you run a SELECT, you will notice that the pass column contains the hash rather than the plain-text password

select top 100 * from Test;

If you want to update the password (say you need to reset a password)

update Test set pass =dbo.password('myNewPassword') where id=1

If you want to check whether a password supplied by the user matches the hash stored in the database:

SELECT * from Test WHERE username ='Yasir3' and pass=dbo.password ('WrongPassword')             -- no match, deny access
SELECT * from Test WHERE username ='Yasir3' and pass=dbo.password ('mypassword3')               -- match, grant access

Friday 1 July 2011

Using a check constraint to enforce values in a column based on values in another table

You can enforce business rules on a column based on values in another table by creating a check constraint that calls a UDF.

One scenario where you may need this: imagine you have a global table that holds states. This table holds stateIds for all entities in your database (for instance customer stateIds, order stateIds, sale stateIds, etc.).

In this case you may not be able to use a foreign key constraint, because you need to restrict each table to a subset of the stateIds.

Here is an example of how you can do that.


Assume your global table that holds stateIds is called State

CREATE TABLE t1 (stateId INT NOT NULL)

go

CREATE FUNCTION dbo.fnc_IsValidState(@StateId INT, @TypeId INT) RETURNS BIT
AS
BEGIN
    DECLARE @flag BIT = 1

    -- the stateId is only valid if it exists in the State table for the given class/type
    IF NOT EXISTS (SELECT 1 FROM State WHERE stateId = @StateId AND ClassId = @TypeId)
    BEGIN
        SET @flag = 0
    END

    RETURN @flag
END



go


ALTER TABLE dbo.[t1] WITH NOCHECK ADD CONSTRAINT [CK_StateCheck] CHECK ((dbo.fnc_IsValidState(stateId, 1) = 1))
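To test it (a minimal sketch; the columns of the State table are an assumption based on what fnc_IsValidState reads), insert a couple of states and then try to insert rows into t1:

-- assumes the State table already has (at least) stateId and ClassId columns
INSERT INTO State (stateId, ClassId) VALUES (1, 1)   -- a state belonging to class/type 1
INSERT INTO State (stateId, ClassId) VALUES (2, 2)   -- a state belonging to class/type 2

INSERT INTO t1 (stateId) VALUES (1)   -- allowed: stateId 1 exists with ClassId = 1
INSERT INTO t1 (stateId) VALUES (2)   -- rejected by CK_StateCheck: stateId 2 belongs to ClassId 2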

Monday 4 April 2011

Check if a database is a snapshot

You can quickly check whether a database is a snapshot by examining the source_database_id column of the sys.databases catalog view.

If the database is a snapshot, source_database_id will contain the database_id of the source database; if it is a normal database, this column will be NULL.

-- Check if a database is a snapshot
DECLARE @DBname sysname
SET @DBname='MyDatabaseName'

-- check if a database is a snapshot or not
SELECT CASE WHEN source_database_id IS NULL THEN 'Not a snapshot' ELSE 'Snapshot' END AS Database_type , source_database_id, database_id
FROM sys.databases
WHERE Name=@DBname

You can use the same field to find all the snapshots for a specific database.

SELECT * FROM sys.databases
WHERE source_database_id =
(SELECT database_Id FROM sys.databases WHERE Name=@DBname)

This is useful if you have a script that regularly restores a database (say on a dev server) and needs to drop any snapshots before performing the restore.
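As a rough sketch (not part of an actual restore script), you could generate and execute the DROP DATABASE statements for all snapshots of the database before restoring it:

-- A minimal sketch: drop every snapshot of @DBname before a restore
DECLARE @DBname sysname, @sql NVARCHAR(MAX)
SET @DBname = 'MyDatabaseName'
SET @sql = N''

SELECT @sql = @sql + N'DROP DATABASE ' + QUOTENAME(name) + N'; '
FROM sys.databases
WHERE source_database_id = (SELECT database_id FROM sys.databases WHERE Name = @DBname)

EXEC sys.sp_executesql @sql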

Wednesday 23 March 2011

SET VS SELECT

There are a couple of differences between using SET and SELECT when assigning values to local variables. Some are trivial, and others may cause bugs that are hard to find in your code.

The first issue to be aware of: when you use SELECT to assign a value to a variable and the query returns more than one row, SQL Server silently assigns one of the returned values to your variable, which may not be the one you want.

SET, on the other hand, raises an error if more than one row is returned, which in my opinion is the better behaviour. Here is a demo:


DECLARE @Id INT
DECLARE @Temp TABLE (ID INT , NAME VARCHAR(25))

INSERT INTO @temp (id, name) VALUES (1,'one')
INSERT INTO @temp (id, name) VALUES (2,'one')
INSERT INTO @temp (id, name) VALUES (3,'Two')
INSERT INTO @temp (id, name) VALUES (4,'Three')

-- With SELECT you don't get an error

SELECT @Id =id FROM @Temp WHERE Name='one'
SELECT @Id AS RETURNS_one_of_the_records_Without_any_errors

-- You get NO errors, and ONE of the records is returned, which may not be what you want.

-- With SET you will get an error if more than one record is returned
SET @Id =(SELECT id FROM @Temp WHERE Name='one')

-- You get an error, which is better
SELECT @Id AS RETURNS_errors_as_There_Are_More_than_one_records

In the SET case you can check @@ERROR (or use TRY/CATCH) and handle the failure accordingly. If you have to use SELECT, you can often pick the record you need deterministically by adding TOP 1 and an ORDER BY to the SELECT statement, as in the sketch below.
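For example, a minimal sketch of the TOP 1 approach using the @Temp table from the demo above:

-- pick a deterministic single row when more than one record matches
SELECT TOP 1 @Id = id
FROM @Temp
WHERE Name = 'one'
ORDER BY id        -- e.g. take the lowest id; without an ORDER BY the row chosen is arbitrary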

The second issue is that when the SELECT matches no rows, it does not assign anything to the variable. So if your variable held the value X before the SELECT statement was executed and the SELECT matched no rows, the variable will still hold X, as shown below.


DECLARE @Id INT
DECLARE @Temp TABLE (ID INT , NAME VARCHAR(25))

-- Assume @Id had a value of 100
SET @Id =100

-- Get the new value of @Id from the table using SELECT
SELECT @Id =id FROM @Temp WHERE Name='Does not exist'

-- You get the wrong value: @Id should be NULL but it is still 100
SELECT @Id AS Id_Value_Should_be_Null

-- With SET, the subquery matches no rows, evaluates to NULL, and NULL is assigned
SET @Id =(SELECT id FROM @Temp WHERE Name='Does not exist')

-- This will show the correct value, i.e. NULL
SELECT @Id AS Id_Value_Should_be_Null

-- A SELECT assignment from a subquery behaves the same way and also assigns NULL
SELECT @Id =(SELECT id FROM @Temp WHERE Name='Does not exist')

-- This will also show the correct value, i.e. NULL
SELECT @Id AS Id_Value_Should_be_Null


Hopefully knowing these subtle differences will save you time debugging!

Monday 14 March 2011

Why “Extent Scan Fragmentation” remains high after rebuilding an index

According to BOL, extent scan fragmentation is; “Percentage of out-of-order extents in scanning the leaf pages of an index. This number is not relevant to heaps. An out-of-order extent is one for which the extent that contains the current page for an index is not physically the next extent after the extent that contains the previous page for an index.”

It is also referred to as “physical fragmentation”, which basically means that the extents are not contiguous on disk.
If rebuilding an index does not bring this value down, it means the physical volume itself is fragmented.

In my case, I have a SQL Server instance running on a NetApp SAN (FAS 3140).

I have a table called Objects containing 12+ million records, with a clustered index and a couple of non-clustered indexes.

When running the following command, I get extent scan fragmentation of 60.59%
dbcc showcontig (Objects,IX_ObjectsExpires)

DBCC SHOWCONTIG scanning 'Objects' table...
Table: 'Objects' (309576141); index ID: 12, database ID: 17
LEAF level scan performed.
- Pages Scanned................................: 30593
- Extents Scanned..............................: 3852
- Extent Switches..............................: 3917
- Avg. Pages per Extent........................: 7.9
- Scan Density [Best Count:Actual Count].......: 97.63% [3825:3918]
- Logical Scan Fragmentation ..................: 0.88%
- Extent Scan Fragmentation ...................: 60.59%
- Avg. Bytes Free per Page.....................: 683.2
- Avg. Page Density (full).....................: 91.56%
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

After rebuilding the index

Alter index IX_ObjectsExpires ON [Objects] Rebuild;

the value of extent scan fragmentation has actually gone up!

DBCC SHOWCONTIG scanning 'Objects' table...
Table: 'Objects' (309576141); index ID: 12, database ID: 17
LEAF level scan performed.
- Pages Scanned................................: 31109
- Extents Scanned..............................: 3889
- Extent Switches..............................: 3888
- Avg. Pages per Extent........................: 8.0
- Scan Density [Best Count:Actual Count].......: 100.00% [3889:3889]
- Logical Scan Fragmentation ..................: 0.01%
- Extent Scan Fragmentation ...................: 69.17%
- Avg. Bytes Free per Page.....................: 806.1
- Avg. Page Density (full).....................: 90.04%
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

As can be seen, the logical scan fragmentation has dropped to almost 0, but the extent scan fragmentation has gone up!

I then ran the “reallocate” command on the volume containing the database. Please note that this is a NetApp-specific command that reorders the blocks on a volume; your SAN vendor will provide a different command.

reallocate start -f /vol/ascio_otherdb_data_vol

During the reallocation, CPU usage on the filer went up from 1% to around 27%, but no performance hit was noticed on the database. After the reallocation completed, I rebuilt the index again:

alter index IX_ObjectsExpires ON [Objects] Rebuild;

and hooray .. the extent scan fragmentation is back down to 0.15%

DBCC SHOWCONTIG scanning 'Objects' table...
Table: 'Objects' (309576141); index ID: 12, database ID: 17
LEAF level scan performed.
- Pages Scanned................................: 31109
- Extents Scanned..............................: 3889
- Extent Switches..............................: 3888
- Avg. Pages per Extent........................: 8.0
- Scan Density [Best Count:Actual Count].......: 100.00% [3889:3889]
- Logical Scan Fragmentation ..................: 0.00%
- Extent Scan Fragmentation ...................: 0.15%
- Avg. Bytes Free per Page.....................: 806.0
- Avg. Page Density (full).....................: 90.04%
DBCC execution completed. If DBCC printed error messages, contact your system administrator.


For a volume size of 400 GB, it took around 1 hour to complete the reallocation command on the filer.

Mon Mar 7 17:28:52 GMT [LDN-HA-SAN01: wafl.scan.start:info]: Starting file reallocating on volume ascio_otherdb_data_vol.
Mon Mar 7 18:26:41 GMT [LDN-HA-SAN01: wafl.reallocate.full.done:info]: Full reallocation on '/vol/ascio_otherdb_data_vol' is complete.

CXPACKET wait times and Degree of Parallelism

I have heard many times that the quickest way to improve the performance of a SQL Server instance is to throw more hardware at it. You would think this makes sense, but I have seen doubling the number of CPUs (from 8 to 16) decrease a server's performance!

The reason is that, under the default server settings, SQL Server will try to execute queries in parallel across all the CPUs. Depending on the type of workload (OLTP or OLAP), queries may take much longer to execute, because splitting the query plan and combining the results again can take a lot longer than executing the query serially or with fewer CPUs.

Generally you will notice the parallelism issue with servers having more than 8 CPUs.
A quick way to check if your server is suffering from excessive parallelism is to look at the CXPACKET wait times.

This query will give you the top 10 wait types.

select top 10 * from sys.dm_os_wait_stats order by wait_time_ms desc

If you find the CXPACKET wait type at the top of the list and you have more than 8 CPUs, you will probably benefit from lowering the degree of parallelism.
Also note that completely disabling parallelism (setting the value to 1) may adversely affect large (OLAP) queries, so you need to find the best value for your environment.
You can also set MAXDOP for individual queries, which will override the server setting.
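For example (a minimal sketch; the table name is made up, only the hint matters):

-- force this particular query to run serially regardless of the server-wide setting
SELECT o.ID, o.Owner
FROM dbo.SomeLargeTable o
WHERE o.SomeColumn = 42
OPTION (MAXDOP 1)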

In order to change the degree of parallelism from the default of 0 (which means use all available CPUs) to, say, 4:

-- set to number of physical processors
EXEC sys.sp_configure N'max degree of parallelism', 4
GO
RECONFIGURE WITH OVERRIDE
GO

This will take effect immediately and no service restart is required.
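One thing to note: both 'max degree of parallelism' and 'cost threshold for parallelism' are advanced options, so if sp_configure complains that the option does not exist or may be an advanced option, enable advanced options first:

-- expose the advanced options so they can be changed with sp_configure
EXEC sys.sp_configure N'show advanced options', 1
GO
RECONFIGURE WITH OVERRIDE
GO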

Another option related to the degree of parallelism is the cost threshold that SQL Server checks before deciding to use a parallel plan.
The default value is 5; the cost is an estimate that roughly corresponds to the elapsed seconds of running the query serially, so by default SQL Server will only consider a parallel plan when the estimated serial cost of the query is more than 5.

The following statement will modify this setting so that SQL Server will only consider parallelism for queries whose estimated serial cost is greater than 10.

-- estimated cost threshold (roughly, seconds of serial elapsed time)
EXEC sys.sp_configure N'cost threshold for parallelism', N'10'
GO
RECONFIGURE WITH OVERRIDE
GO

The pictures below show the effect that lowering MAXDOP had on the processor queue length on one of my servers.

There is no rule for what value you should use; it all depends on your specific environment. But as a guideline, create a baseline of your server so that you can compare its performance before and after the change.

Friday 11 March 2011

Upgrading SQL Server 2008 to SP2 breaks reports

After upgrading an instance of SQL Server 2008 to SP2, I noticed that none of the reports in Management Studio were accessible, including the reports I frequently look at under the Management Data Warehouse.

The error message I was getting was;

The file 'Microsoft.ReportViewer.WinForms, Version=9.0.0.0, Culture=neutral, PublicKeyToken=b03xxxxxxx' cannot
be opened. Do you want to remove the reference to it from the Recent list?




To solve this, you need to download and install the Reporting Services Report Viewer 9.0, available from the following link:

http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=6ae0aa19-3e6c-474c-9d57-05b2347456b1


Installation was very quick and did not require any restart.



Wednesday 23 February 2011

Display different colour for different query windows in SSMS

When working with multiple query windows in SQL Server Management Studio, it is useful to be able to tell at a glance which server/environment you are connected to.
Here is how you can configure SSMS to show a different colour for each environment.

1. In the View menu, click Registered Servers (Ctrl+Alt+G).
2. Right click on Local Server Groups under Database Engine and create some server groups, e.g. DEV, SIT, Demo, Prod.
3. Right click on one of the new groups and select "New Server Registration".
4. In the "General" tab of New Server Registration, configure the connection details.
5. In the "Connection Properties" tab there is an option to "Use custom color". Tick the check box and pick your favourite colour.

Once that is set up, every time you start a new query (right click on the server and select New Query), the query window will show the configured colour in its status bar.




Wednesday 16 February 2011

Stored procedure slower than ad-hoc query

This morning I got a call from a developer saying that he was getting a timeout on a call to a procedure: "Can you investigate please? We have orders that are piling up and not getting processed! Top priority."

The developer sent me a Windows form that I could use to reproduce the issue. He was claiming that when calling the stored procedure it takes 30 seconds and then he gets a timeout, but when using inline SQL the query runs instantly.

I grabbed a coffee, fired up the Windows form using the stored procedure and, yes, I got a timeout after around 30 seconds; then I ran the same thing using the inline query and it took a couple of milliseconds.. hmm.

I executed the query that the procedure uses from Management Studio and it took a couple of seconds! I looked at the query plan: it was using an index seek, no issues there, and no blocking either.

SQL Server was using one plan (in this case, a bad plan) for the stored procedure and a different plan when executing the ad-hoc query.

The issue turned out to be that SQL Server creates an execution plan for a stored procedure when it is first called, and it tries to be clever by optimising the plan for the parameter values used in that first call. However, this presents a problem if the number of records returned varies a lot between different parameter values.


Let me explain:
If I execute a procedure for the first time and pass a parameter that returns 1 record from a table of 100 million records, SQL Server creates a plan optimised for that parameter. If I then call the same procedure and pass a different parameter that returns 5,000 records, and the first plan was using an index seek and a key lookup, it will do the same for the 5,000 records, which will be very inefficient.

There are a couple of ways around this:
1. Create local variables inside the stored procedure and assign the procedure parameters to the local variables. This way SQL Server will not be able to sniff the values (see the sketch after option 3 below).

2. Use the OPTIMIZE FOR () hint, as in the procedure below:


CREATE PROCEDURE [dbo].[sp_GetObjectsByAttribute]
(
    @AttrId INT,
    @Value  VARCHAR(128),
    @Owner  VARCHAR(128)
)
AS
SET NOCOUNT ON

SELECT TOP 500 o.ID, o.Handle, o.ClassID, v.AttributeID, o.Flags, o.Owner
FROM dbo.ObjectValues v
INNER JOIN dbo.[Objects] o ON o.ID = v.ObjectID
WHERE
    v.AttributeID = @AttrId
    AND v.Value = @Value
    AND o.Owner = @Owner
    AND o.Flags & 1 = 0
OPTION (OPTIMIZE FOR UNKNOWN)
GO

3. Force recompilation of the procedure every time it is executed (not recommended).
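For completeness, here is a minimal sketch of the local-variable workaround from option 1 (the procedure name and WHERE clause are made up for illustration, this is not the original procedure):

CREATE PROCEDURE dbo.GetObjectsByOwner
(
    @Owner VARCHAR(128)
)
AS
SET NOCOUNT ON

DECLARE @localOwner VARCHAR(128)
SET @localOwner = @Owner            -- copy the parameter into a local variable

-- the optimiser cannot sniff @localOwner, so it builds a plan for the "average" case
SELECT TOP 500 o.ID, o.Handle, o.ClassID, o.Flags, o.Owner
FROM dbo.[Objects] o
WHERE o.Owner = @localOwner
GO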


This is a good post on the subject:
http://www.simple-talk.com/sql/t-sql-programming/parameter-sniffing/