Sunday, January 27, 2008 11:47 AM
Shaun Collett
Full and Partial Caching Pattern in WCF and SQL Server
When developing a full Service-Oriented Framework, it's inevitable for you to reach out to several other services from your service to collect data. If these services are hosted within your network, the release of IIS 7.0 will improve performance by allowing service responses to be transferred in TCP/IP and named pipes. However, the fact is that the service calls slow things down quite a bit. Typically each service call reaches out to a data source, which can add to the latency.
My team ran into this issue and took the following caching approach to resolve it. It uses full caching and partial caching mechanisms and would be useful any time you're required to collect a lot of information that, for the most part, remains static.
Overview
Since serializing and de-serializing has become so easy in .Net, we simply serialize the results we want to cache after the data has been collected and write it to a data source, such as SQL Server. The class structure can be serialized into a binary or XML stream. After it's written to the database, upon the next service call we check the database to see if valid cache exists. If so, we retrieve it, de-serialize and send it back to the client. Once we implemented this, we experienced 4000% performance improvements!!!
WCF Service
We could certainly create an abstraction layer or embed the logic directly in the service method. I've even considered creating a custom method attribute to enable the method to be cached. However you do it, when the call enters the method that has caching-enabled it makes a high-performance call to the database to see if valid cache exists. If so, return… done. If not, we'll pass through the logic, let the service call run as usual, then push the service call serialized results to the database prior to returning to the client.
SQL Server Database
To store the serialized cache, we needed a central table that is flexible. The final table structure we ended up with looked like this…
CacheHash
- VARBINARY
- Represents 1:N values that make up the result's primary key. This can be the object name, method name, parameters, etc all hashed together using SHA1.
CacheResult
- VARBINARY
- Serialized cache result.
CacheXml
- XML
- Metadata of the cache result. For example, useful to store the XML representation of the hash if you want to rebuild cache each night through a database job.
ExpirationDate
- DATETIME
- Triggers when the cache has expired.
CreatedDate
- DATETIME
- Timestamp when the cache was created.
To create the hash value, we create a stored procedure for each "type" of cache. The SQL Server stored procedure only function is to feed the primary cache keys into the HASHBYTES function, which produces the CacheHash value. After the hash has been created, common stored procedures are called to add or retrieve the results.
The stored procedures created against the table are very fast and lightweight. The clustered index on the table is against the CacheHash column as all SELECT statements should be against this column. There are no nonclustered indexes.
Partial Caching
When exploring this pattern, we immediately asked ourselves about partial caching. What if you have a structure that's largely static, but 1 value needs to be real time? So far this negates our pattern. But… the beauty of serialization/deserialization is after you de-serialize you can update the data structure. For example, in our implementation after we retrieved the cache result we calls another (incredibly performant) stored procedure to update the values that needed to be real-time. The stored procedure ran in 0.1 seconds, compared to the stored procedure that returned ALL of the data, which ran in 1.5 seconds.
I've found this pattern to be incredibly powerful in a Service-Oriented environment. If you have any large service calls that fit this bill, please implement it. It will make you look like a rock-star!