https://www.henrik.org/

Blog

Tuesday, December 23, 2014

Fun with collations and persistence layers

So today I spent several hours trying to figure out how to get the correct collation on a SQL Server table created using the Microsoft Entity framework.

So you would think that given that I use a C# string that is UTF8 that when you create a table using this data type with the entity framework it would by default have a collation that supports Unicode more or less properly out of the box... No such luck.

And to make matters worth the Entity framework provides no provision to specify the collection used. In the end what I had to do is to run the following custom SQL on every column that I need proper unicode handling of (For me that means properly distinguishing different high code point Unicode characters as being different letters and also being case sensitive).

  ALTER TABLE {tablename}
    ALTER COLUMN {columnname}
    NVARCHAR({length})
    COLLATE Latin1_General_100_BIN2

The problem is that to do this you also need to drop any indexes that use this column before you make the change and then recreate the column (In my case since I have no deployed database yet I simply create the indexes after I have properly changed the collation). Then once I solved everything for SQL Server I had to do it all over again for MySQL (Different parts of the service runs on SQL Server and MySQL), but that was a lot easier than for SQL Server. Finally I spent some more time trying to figure out some unit tests to validate that an error in this area will not sneak in at a later date since this would be easy to forget.

I have always had a very skeptical approach to persistence layers like the Entity framework (Or indeed any other similar project like Hibernate, XPO etc...) in that I think it produces horribly inefficient database access patterns and should really only be used if you don't care at all about performance or if you are lazy.

I do like certain aspects of the Entity framework though. Specifically I do like being able to specify the database structure using LINQ instead of SQL (You generally get better compile time error checks with this) and also the migration layer for the Entity framework is pretty nice.

That said having worked on developing SQL development tools for over a decade and knowing SQL at the back of my spine I would not trust it for a second to actually generate the SQL for my database access layer. That I hand code for exactly the purposes that I need.

No comments: