SQL Server 2008 to the rescue: user defined aggregate solution recovered

March 6th, 2009 by Adi S.
In my introductory post on user defined aggregates I mentioned that Merge() method will be an interesting point of discussion. The Merge() method is called any time SQL decides to split up the workload, process it on separate threads, and eventually merge all of the results. It seems perfectly desirable to be able to maximize the use of computer resources.
 
We have also decided to create a user defined aggregate that calculates the median which means that we will need some kind of data structure to hold all of the values in a group. For the sake of simplicity we could use an array of type double. The implementation using an array of type double will compile fine but upon the deployment to the instance of SQL Server you might receive one of the following two error messages depending on the data structure used.
 
Type {…} is marked for native serialization, but field {…} of type {…} is of type {…} which is a non-value type. Native serialization types can only have fields of blittable types. If you wish to have a field of any other type, consider using different kind of serialization format, such as User Defined Serialization.
 
Type {…} is marked for native serialization, but field {…} of type {…} is not valid for native serialization.
 
SQL Server has to be able to serialize UDAs in order to communicate/pass them between the threads. In fact, native serialization only supports following data types for native serialization: bool, byte, sbyte, short, ushort, int, uint, long, ulong, float, double, SqlByte, SqlInt16, SqlInt32, SqlInt64, SqlDateTime, SqlSingle, SqlDouble, SqlMoney, SqlBoolean. Since our array type is not acceptable for native serialization we will need to mark the Format attribute to use UserDefined serialization and provide our own implementation of Read() and Write() methods as defined in the IBinarySerialize interface.
 
The next issue that comes into play is that the maximum size of a serialized object cannot exceed 8000 bytes in SQL 2005. That does not allow too many double type values to be aggregated and the UDA execution will throw the following error if the size of serialized UDA exceeds 8000 bytes.

A .NET Framework error occurred during execution of user-defined routine or aggregate {…}: System.Data.SqlTypes.SqlTypeException: The buffer is insufficient. Read or write operation failed.
 
One suggestion I found was to force SQL to use on processor/core at the time via OPTION (MAXDOP 1) directive. That did not seem to resolve the issue as SQL Server does not seem to always accept aforementioned query hint. Needless to say this limitation greatly reduced the effectiveness of my solution.
 
Luckily the good folks working on SQL Server 2008 have decided to solve my problem and increase the size of the serialization buffer. SQL Server 2008 now allows CLR components to be serialized up to whooping 2GB as outlined in the TechNet BOL article titled “Requirements for CLR User-Defined Aggregates”. Way to go SQL Server team and thanks for saving my project.
 
 

Tags: , ,

Leave a Reply