by Stuart Blackler
After a rather long period of absence thanks to Bournemouth University, I am back again. There are a few subtle changes happening on the website as and when I can get time to make the changes. If you notice a bug or would like to suggest an improvement please do via the comments section on every post or via the social media links in the about page.
In this article, I am going to show you how to use the IDisposable interface correctly in your code. When I read others' code, it is easy to pick up on subtle bugs. We need to train ourselves to see these bugs, and we do this by understanding what we are using. Before we begin, we need to make sure that we understand a core piece of computer science theory: destructors.
Destructors
Generally speaking, a destructor is the code that runs when an object is destroyed, giving the object a chance to release the resources it holds. In environments that contain a virtual machine with a garbage collection facility, the destructor is called automatically. In these environments, however, the destructor is usually called a Finalizer. Although these environments are excellent at managing memory for us, we cannot guarantee when the Finalizer is going to be called.
Enter Dispose
The purpose of the Dispose method is to let us decide when resources are released. This might be at the end of a foreach loop or when we have finished with a database connection. Either way, we control when the resources are released. There are two types of resources that can be released: Managed and Unmanaged.
Managed resources are typically objects that are run and controlled by the Common Language Runtime (CLR). Managed code supplies the metadata necessary for the CLR to provide services such as memory management and cross-language integration (Source). Unmanaged resources are those outside the CLR, such as anything obtained through the Win32 APIs. These can be used from within managed code, which allows some serious memory leaks if we are not careful.
The .Net libraries have some useful interfaces in them, one of them being the IDisposable interface. This interface has just one method called Dispose (the name seems standard from what I have seen). Here is the definition of the interface:
public interface IDisposable
{
void Dispose();
}
When we first implement the interface on our class, we are given the following code:
public sealed class MyClass : IDisposable
{
public void Dispose()
{
/* Release resources here */
}
}
This implementation is fine as long as every caller remembers to call Dispose. If a caller forgets, nothing is released until the garbage collector gets around to the object, and unmanaged resources may not be released at all. What if your class has a large object inside (say ~250 MB)? Do you really want to wait for the garbage collector? Probably not.
In order to fix our implementation, we need to do two things. Firstly, we need to implement a Finalizer and then implement an overload of the original Dispose method. We implement a Finalizer because we want to safeguard ourselves if we forget to call the Dispose method. For those that do not know what a Finalizer looks like, here it is (note that a Finalizer cannot have an access modifier):
public sealed class MyClass : IDisposable
{
public MyClass()
{
/* Constructor */
}
~MyClass()
{
/* Destructor */
}
public void Dispose()
{
/* Release resources here */
}
}
To safeguard ourselves as just mentioned, our Finalizer needs to call our Dispose method like so:
~MyClass()
{
/* Destructor */
Dispose();
}
You may have realised by now that Dispose could, potentially, be called twice. The user will call it once, followed by the CLR calling it for us in case we forget (through the Finalizer). This gives us the requirement for the overload of the Dispose method I mentioned earlier. If we call the Dispose method then it is safe for us to release managed resources. However, if the CLR calls the Dispose method through the Finalizer then we cannot safely release managed resources because we do not know their current state; they may already have been finalized themselves.
Note: Finalizers run on a background thread owned by the CLR, which we have no control over. Therefore, we cannot know the state of any other object when our Finalizer runs.
Now that we have identified that the Dispose method can be called from two places, we can implement this into our code:
public sealed class MyClass : IDisposable
{
public MyClass()
{
/* Constructor */
}
~MyClass()
{
/* Finalizer */
Dispose(false); // the CLR is calling Dispose, so it's an unsafe call
}
public void Dispose()
{
/* The interface implementation */
Dispose(true); // WE are calling Dispose, so it's a safe call
}
private void Dispose(bool safeToFreeManagedResources) // private because the class is sealed
{
/* Free unmanaged resources */
if (safeToFreeManagedResources)
{
/* Free managed resources */
}
}
}
Even though we have told the CLR that we are not to release managed resources twice, we will still release unmanaged resources twice. This is not only wasteful, but you could end up with an exception here which is something that SHOULD NEVER HAPPEN. Luckily for us, the CLR has a neat way for us to tell it not to call the Finalizer because we have already released all the resources necessary. Here is the one line magic fix:
public void Dispose()
{
/* The interface implementation */
Dispose(true); // WE are calling Dispose, so it's a safe call
GC.SuppressFinalize(this); // WE have called dispose, there is no need to call it again Mr. GC.
}
Best Practice
Now that we have our code fixed, it's time to learn a best practice. When an object implements the IDisposable interface, we have the opportunity to use the using statement. The idea of the using statement is that once you have finished with the object, the Dispose method is called for you, even if an exception is thrown. Note I said Dispose, not the Finalizer. The using statement is really easy to use:
static void Main(string[] args)
{
using (var myClass = new MyClass())
{
/* Do stuff here */
}
}
When the compiler sees this code, it expands it to something equivalent to this:
static void Main(string[] args)
{
var myClass = new MyClass();
try
{
/* Do stuff here */
}
finally
{
myClass.Dispose();
}
}
So there it is. Hopefully now you can implement IDisposable correctly according to your needs.
by Stuart Blackler
In this post, I am going to show a small micro-benchmark to demonstrate the performance difference between the Semaphore and SemaphoreSlim classes in C#. A Semaphore is often used to restrict the number of threads that can access some (physical or logical) resource. In this case, we want the overhead of that restriction to be as small as possible.
Semaphores are of two types: local semaphores and named system semaphores. If you create a Semaphore object using a constructor that accepts a name, it is associated with an operating-system semaphore of that name. Named system semaphores are visible throughout the operating system, and can be used to synchronize the activities of processes. You can create multiple Semaphore objects that represent the same named system semaphore, and you can use the OpenExisting method to open an existing named system semaphore. Source
A local semaphore exists only within your process. It can be used by any thread in your process that has a reference to the local Semaphore object. Each Semaphore object is a separate local semaphore. Source
The machine that I am using for this benchmark is an Intel Core i3 clocked at 4 GHz, with 4 GB of DDR3 RAM, running Windows 7 x64 SP1 and .Net Framework 4.5.
In order to begin the test, I created a new console application and imported the BMark package from the NuGet repository. Next, I added the following code to the application as shown below:
const Int32 count = 106;
Semaphore regularSemaphore = new Semaphore(count, count);
SemaphoreSlim slimSemaphore = new SemaphoreSlim(count, count);
UInt64 amountToRun = (UInt64)(count - PerformanceTester.PreRunAmount - 2);
PerformanceTester.Run("Semaphore.WaitOne", amountToRun, () => { regularSemaphore.WaitOne(); });
PerformanceTester.Run("Semaphore.Release", amountToRun, () => { regularSemaphore.Release(); });
PerformanceTester.Run("SemaphoreSlim.WaitOne", amountToRun, () => { slimSemaphore.Wait(); });
PerformanceTester.Run("SemaphoreSlim.Release", amountToRun, () => { slimSemaphore.Release(); });
Console.WriteLine(PerformanceTester.GetResults());
By default, the PerformanceTester will run each test 4 times before starting the actual timed test. Since we are dealing with a blocking resource, I added some extra capacity so that the test would not block at any point. When the code is run in release mode without the debugger, the output of the program is:
Semaphore.WaitOne: 0.09ms NumberOfSamples: 100
Semaphore.Release: 0.05ms NumberOfSamples: 100
SemaphoreSlim.WaitOne: 0.01ms NumberOfSamples: 100
SemaphoreSlim.Release: 0.01ms NumberOfSamples: 100
As the results show, the SemaphoreSlim class is quicker in both tests: 0.01ms against 0.09ms for waiting and 0.01ms against 0.05ms for releasing. After testing this myself earlier, I thought that others could run this themselves and hopefully receive a small increase in performance in their applications. The reason for the performance increase is that the SemaphoreSlim class provides a lightweight alternative to the Semaphore class that doesn't use Windows kernel semaphores.
In essence, if you do not need a named Semaphore, use the SemaphoreSlim class.
by Stuart Blackler
This post is dedicated to a small utility class that I have just released on NuGet. The aim of BMark is to provide a simple way of running multiple microbenchmarks. This is my first ever NuGet package so there is a chance that I have done something wrong. Please notify me ASAP if you do notice anything and I will sort it out when I can.
Using BMark
In order to use BMark, please make sure that you have NuGet package manager installed, or download the code from the GitHub Project Page.
If you are using the NuGet package manager to install packages, simply search for BMark in the online sources and click install.
Or, if you prefer to use the NuGet package console, simply type in the following command:
Install-Package BMark
Now you are ready to run some benchmarks. The core method Run has three parameters: the test name, the number of times to run the test and the actual code to test. Here is a quick demonstration from a future blog post:
PerformanceTester.Run("SemaphoreSlim.Release", amountToRun, () => { slimSemaphore.Release(); });
Console.WriteLine(PerformanceTester.GetResults());
It's as easy as that. Feel free to contribute any modifications to the GitHub Project Page.
Happy benchmarking
by Stuart Blackler
The observer pattern is a way of defining a one-to-many relationship between two classes. Typically the pattern is used to provide push based notifications of changes from the observed subject to the subscribed listeners, much like you receive notifications on a mobile phone.
For this pattern we need two interfaces which classes will implement. First of all, we have IObservable which, when implemented, is going to push any changes to the observers. Secondly, we have IObserver which, when implemented, is going to listen to the IObservable instance for changes.
Note: I have made everything in this post as generic as possible with the aim of ease of use for you, dear reader.
IObservable<T>
This interface needs to define a subscribe method and optionally an unsubscribe method, such as this Java implementation:
public interface IObservable<T> {
void subscribe(IObserver<T> observer);
void unsubscribe(IObserver<T> observer);
}
IObserver<T>
This interface needs to define a method that the IObservable instance can call when it detects a change. A typical example, taken from the .Net implementation, is to define an onNext method such as this implementation:
public interface IObserver<T> {
void onNext(T value);
void onCompleted();
void onError(Exception e);
}
The onNext method takes the value added/changed in the IObservable instance. I have also defined two additional methods for when an error is raised and when the observer should complete its work. This interface is already defined for .Net programmers.
The Headmaster Example
I am going to briefly show a full implementation using a school classroom. In this example, we are simply going to watch a Lesson for any pupils enrolling onto the lesson and write their names to the console window. To make this clear, the example is in Java.
To start off with, I am going to define an immutable (an object whose state cannot be modified after it is created) Pupil class:
public class Pupil {
private final String _name; // final so the field cannot be reassigned after construction
private final int _age;
public Pupil(String name, int age)
{
_name = name;
_age = age;
}
public String getName()
{
return _name;
}
public int getAge()
{
return _age;
}
}
Next, I am going to implement a Lesson class that implements IObservable of the type Pupil. If the concept of generics confuses you, don't worry, I will explain this in a future post.
import java.util.ArrayList;
import java.util.List;
public class Lesson implements IObservable<Pupil> {
private final List<IObserver<Pupil>> _subscribers = new ArrayList<IObserver<Pupil>>();
@Override
public void subscribe(IObserver<Pupil> observer) {
// check the argument
if(observer != null)
{
_subscribers.add(observer);
}
// if the argument is null throw the exception or something
}
@Override
public void unsubscribe(IObserver<Pupil> observer) {
// check the argument
if(observer != null)
{
_subscribers.remove(observer);
}
// if the argument is null throw the exception or something
}
public void enrollPupil(Pupil p)
{
// do something with p here...
for(IObserver<Pupil> observer : _subscribers)
{
// Notify the subscribers of the new pupil
observer.onNext(p);
}
}
}
As you can see from the implementation above, in order to implement IObservable properly, we need to have a way of maintaining a collection of objects that are subscribed to changes. The easiest way of doing this is through a list as shown above. The subscribe method adds an observer to the list while the unsubscribe method removes it from the list. When the enrollPupil method is called, we perform an action with the Pupil before iterating through the list and letting each observer know of the change.
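The comments in the Lesson class leave open what to do with a null argument. One possible approach, sketched below, is to reject it with Objects.requireNonNull from the standard library. The names NameObserver, ValidatingLesson and ValidationDemo are hypothetical and exist only so that the sketch compiles on its own; they are not part of the example in this post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical observer interface, trimmed to the one method this sketch needs.
interface NameObserver {
    void onNext(String name);
}

// Hypothetical observable that rejects null observers instead of silently ignoring them.
class ValidatingLesson {
    private final List<NameObserver> subscribers = new ArrayList<>();

    void subscribe(NameObserver observer) {
        // requireNonNull throws NullPointerException with this message when observer is null.
        subscribers.add(Objects.requireNonNull(observer, "observer must not be null"));
    }

    boolean unsubscribe(NameObserver observer) {
        Objects.requireNonNull(observer, "observer must not be null");
        // List.remove(Object) returns false when the observer was never subscribed.
        return subscribers.remove(observer);
    }

    int subscriberCount() {
        return subscribers.size();
    }
}

class ValidationDemo {
    public static void main(String[] args) {
        ValidatingLesson lesson = new ValidatingLesson();
        try {
            lesson.subscribe(null);
        } catch (NullPointerException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Failing fast like this is only one option. Silently ignoring null, as the Lesson class does, is also a valid choice as long as it is a deliberate one.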
Penultimately, we need to implement the IObserver interface in the form of a LessonObserver. This is implemented as follows:
public class LessonObserver implements IObserver<Pupil> {
@Override
public void onNext(Pupil value) {
System.out.println(value.getName() + " has enrolled on the course.");
}
@Override
public void onCompleted() {
// This is fired when we stop listening
}
@Override
public void onError(Exception e) {
// This is fired when an error occurs (if the Observable throws it to us)
}
}
Lastly, we just need a simple way of testing the application which I have done for you:
class Driver {
public static void main(String[] args)
{
// create the objects
Lesson l = new Lesson();
LessonObserver lo = new LessonObserver();
// subscribe to notifications
l.subscribe(lo);
// add some dummy data
l.enrollPupil(new Pupil("Stueh", 18));
l.enrollPupil(new Pupil("Jon", 18));
l.enrollPupil(new Pupil("Kirsty", 18));
// unsubscribe to finish off
l.unsubscribe(lo);
}
}
All being well you should end up with the following output in your console window:
Stueh has enrolled on the course.
Jon has enrolled on the course.
Kirsty has enrolled on the course.
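The Lesson example only ever calls onNext, so the onCompleted and onError methods are never exercised. Here is a minimal sketch of how an observable might call them. Everything in it is an assumption made for illustration: the capacity limit, the closeEnrolment method and the names SketchObserver, CappedLesson, RecordingObserver and CappedLessonDemo are all hypothetical, and pupils are reduced to plain strings to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.List;

// Same shape as the IObserver<T> interface in this post, renamed so the sketch compiles on its own.
interface SketchObserver<T> {
    void onNext(T value);
    void onCompleted();
    void onError(Exception e);
}

// Hypothetical observable; the capacity limit and closeEnrolment are invented for illustration.
class CappedLesson {
    private final List<SketchObserver<String>> subscribers = new ArrayList<>();
    private final int capacity;
    private int enrolled = 0;

    CappedLesson(int capacity) {
        this.capacity = capacity;
    }

    void subscribe(SketchObserver<String> observer) {
        subscribers.add(observer);
    }

    void enrollPupil(String name) {
        if (enrolled >= capacity) {
            // Report the failure to the observers rather than throwing at the caller.
            Exception error = new IllegalStateException("Lesson is full: " + name);
            for (SketchObserver<String> observer : subscribers) {
                observer.onError(error);
            }
            return;
        }
        enrolled++;
        for (SketchObserver<String> observer : subscribers) {
            observer.onNext(name);
        }
    }

    void closeEnrolment() {
        // Tell every observer that no further notifications will arrive, then drop them.
        for (SketchObserver<String> observer : subscribers) {
            observer.onCompleted();
        }
        subscribers.clear();
    }
}

// Observer that records what it receives so the order of calls is easy to inspect.
class RecordingObserver implements SketchObserver<String> {
    final List<String> events = new ArrayList<>();

    @Override
    public void onNext(String value) {
        events.add("next:" + value);
    }

    @Override
    public void onCompleted() {
        events.add("completed");
    }

    @Override
    public void onError(Exception e) {
        events.add("error:" + e.getMessage());
    }
}

class CappedLessonDemo {
    public static void main(String[] args) {
        CappedLesson lesson = new CappedLesson(1);
        RecordingObserver observer = new RecordingObserver();
        lesson.subscribe(observer);
        lesson.enrollPupil("Stueh");
        lesson.enrollPupil("Jon"); // over capacity, so onError is called instead of onNext
        lesson.closeEnrolment();
        System.out.println(observer.events);
    }
}
```

Note that this sketch keeps the lesson open after an error, which is a simplification. Stricter implementations of the pattern treat onError as the final notification an observer receives.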
I hope that you now understand how simple the observer pattern is and how it can be useful.
by Stuart Blackler
In this post I am going to talk a little bit about the system databases, what they are used for and provide a few additional resources should you be interested in reading more about each database.
Master
The name of this database gives away its true purpose. It is the master database for SQL Server. It is responsible for the following:
- Instance wide metadata;
- Server configuration;
- Initialisation information; and
- Information about all databases in the instance.
For the reasons specified above, the master database is very important to SQL Server. Therefore, it is extremely important that you back up the master database regularly in case it becomes corrupted.
MSDB
The msdb database stores configuration data for a number of resources including but not limited to:
- SQL Server Agent
- Database Mail
- Service Broker
- Backups
As this database contains a lot of configuration data, it is vital that it is backed up alongside the master database. Simply back up this database whenever a configuration change is made to one of the above services.
Model
As the name suggests, the model database is used as a template for every database that you create. For example, if you want a series of stored procedures in every database you create on the system from now on, you create the scripts in model. It is not just limited to stored procedures though, other SQL objects such as custom data types can also be created inside of model.
Depending on your business's needs, this is also a vital database to back up as it may contain some code that should not be lost. As the database is small, Microsoft recommend creating a full database backup, with backing up the log file deemed unnecessary.
Resource
The resource database is a special database. For starters, it is read-only and hidden, which would explain why we never see it in the list of databases in SSMS. The purpose of this database is to hold the definitions of all system objects.
When querying against a user database, it appears that the sys schema belongs to the user database. In fact, the definitions of all sys objects are held inside of the resource database.
TempDB
The tempdb system database is a global resource that has been used since
the inception of SQL Server. It is available to all SQL Server users
connected to the SQL Server instance. It holds many types of data and
can often be the source of a performance bottleneck.
For example, tempdb can hold the following:
- Temporary user objects that are explicitly created, such as: global
or local temporary tables, temporary stored procedures, table
variables, or cursors.
- Internal objects that are created by the SQL Server Database Engine,
for example, work tables to store intermediate results for spools or
sorting
- Row versions that are generated by data modification transactions in
a database that uses read-committed using row versioning isolation
or snapshot isolation transactions
- Row versions that are generated by data modification transactions
for features, such as: online index operations, Multiple Active
Result Sets (MARS), and AFTER triggers
For more information about how you know whether tempdb is a problem for your system, read a few of the following articles:
Distribution
The distribution database is not always created on a SQL Server instance. This is because it has the specific role of storing metadata and history data for all types of replication and transactions for transactional replication.
Only when the server is configured as a distributor, link at the end of this section, is the database created. Because of the data stored in the distribution database, Microsoft recommend that this database is backed up in the event of a disaster to the distribution instance.
Backups
As I have mentioned in the article so far, it is important to back up the system databases. Luckily, there is an excellent guide on MSDN for backing up our system databases. This guide covers what recovery options can be used and the implications as well. It also explains how to view or change the recovery model for a specific database, should you need the reminder.
Hopefully that gives you a quick overview of what each database is used for, and resources to continue exploring if you would like to.
by Stuart Blackler
Sargability is a term used to describe whether or not a query can leverage existing indexes in order to speed up query execution. SARGABLE means Search ARGument ABLE.
New or inexperienced SQL developers will often write queries such as:
SELECT col1, col2 FROM myTable WHERE YEAR(col3) >= 2008
While this is perfectly valid SQL, wrapping the column in the function YEAR() on the left hand side of the condition will cause SQL Server to evaluate every single row of the table. This is fine for a small table, but for a large table and multiple users this approach will not scale well, as it will consume too many resources.
In essence, if you ask the database server to manipulate the data in order to get the results, you will suffer a performance impact and reduce your ability to scale. Now we know this, we can re-write the query to perform a lot better:
SELECT col1, col2 FROM myTable WHERE col3 >= '2008-01-01 00:00:00'
With our new query above, we can leverage our fictitious index on col3 to quickly search and return all the results. This approach is much like the way you would go through a phonebook. You would not search through the phonebook, looking at every name to see whether it started with the letter S. Instead, you know that the phonebook is split into sections (much like our index in our database) so you can jump straight to the section that begins with S and begin searching from there. This is what we have just told SQL Server to do with a minor modification of our query.
Matching your data types
By matching the data types in your WHERE clauses you avoid implicit conversions (automatic conversions). There are two main reasons that we want to avoid implicit conversions:
- It is an extra operation that we potentially don't need
- If the conversion is on the side of the data table, it will evaluate the conversion for every row in the table which is SLOW
Which side of the evaluation expression the conversion happens on depends on the order of precedence. It is important to note the following from the article linked:
When an operator combines two expressions of different data types, the rules for data type precedence specify that the data type with the lower precedence is converted to the data type with the higher precedence. If the conversion is not a supported implicit conversion, an error is returned.
Tibor Karaszi has a detailed post on the subject here: Match Those Types
Finding implicit conversions in the QPC
Jonathan Kehayias (blog) has a great query to find queries in the query plan cache that contain implicit conversions. I have included the query below as a point-in-time snapshot, but please check the original link for updated versions of the query and additional comments.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
DECLARE @dbname SYSNAME
SET @dbname = QUOTENAME(DB_NAME());
WITH XMLNAMESPACES
(DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT
stmt.value('(@StatementText)[1]', 'varchar(max)'),
t.value('(ScalarOperator/Identifier/ColumnReference/@Schema)[1]', 'varchar(128)'),
t.value('(ScalarOperator/Identifier/ColumnReference/@Table)[1]', 'varchar(128)'),
t.value('(ScalarOperator/Identifier/ColumnReference/@Column)[1]', 'varchar(128)'),
ic.DATA_TYPE AS ConvertFrom,
ic.CHARACTER_MAXIMUM_LENGTH AS ConvertFromLength,
t.value('(@DataType)[1]', 'varchar(128)') AS ConvertTo,
t.value('(@Length)[1]', 'int') AS ConvertToLength,
query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS qp
CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS batch(stmt)
CROSS APPLY stmt.nodes('.//Convert[@Implicit="1"]') AS n(t)
JOIN INFORMATION_SCHEMA.COLUMNS AS ic
ON QUOTENAME(ic.TABLE_SCHEMA) = t.value('(ScalarOperator/Identifier/ColumnReference/@Schema)[1]', 'varchar(128)')
AND QUOTENAME(ic.TABLE_NAME) = t.value('(ScalarOperator/Identifier/ColumnReference/@Table)[1]', 'varchar(128)')
AND ic.COLUMN_NAME = t.value('(ScalarOperator/Identifier/ColumnReference/@Column)[1]', 'varchar(128)')
WHERE t.exist('ScalarOperator/Identifier/ColumnReference[@Database=sql:variable("@dbname")][@Schema!="[sys]"]') = 1
Invertibility
Rob Farley has an excellent article on inverting the search predicates. I think this is a must-read and a technique that we could apply to our own development.
For the unavoidable situations
Sometimes you need to break the guidelines of SARGABILITY for various reasons. But there are ways in which you can still improve the performance of your queries. For instance, you can use an indexed view or a computed column.
Remember these are only guidelines. Test, test and test some more for your specific scenario.
The Big 3
- Try to leave the manipulation of the data in the table alone where you can
- If you cannot effectively do the first point, explore other avenues such as computed columns or indexed views
- Make sure you know the data types in your system in order to avoid implicit conversions