Friday, December 26, 2014

Public facing hard to guess identifiers

This might take some explaining of the actual problem. When applications are reporting information I would like there to at least not be possible to guess an identifier starting at 1 and have the data end up on some other users account. My goal isn't so much to guard against somebody who is intentionally trying to do misreporting, but to make it hard enough to do for all but the determined attackers.

So how do you do this, one way is to just use GUID's for every identifier but I have always hated that and it leads to bad database design at least in my opinion. So my suggestion is to just use a simple integer identifier counting upwards internally. However whenever the identifier required to report for this is displayed to an end user I take this ID and encrypt it using a secret key with AES-256. This results in a pretty much random 16 byte array that you then encode using base 64 and present to the user. Once any reporting is done you simply do the reverse so that the ID needed to be used is the base 64 encoded encrypted value. This means that it will be almost impossible to guess a valid identifier for anything coming from the outside but internally you can still just deal with regular integers of varying size for everything.

The performance hit should be negligible since AES is implemented in hardware in recent CPU's and even without it AES is really fast.

Thursday, December 25, 2014

Dealing with timestamps

I thought I would do a detour though and share some thoughts on dealing with timestamps.

Something to take into account dealing with timestamps reported into the system is that I can not really trust that users have their clocks running correctly. And since data can be collected offline and then submitted after the fact I need to compensate for the devices that have really weird time settings (A surprising amount of people run their machines with the clock set to 1970). I would deal with this by simply have the clock as the reporting device thinks it is at the time of submission be included as part of the submission. This will give me a delta for how much all the other included timestamps in that particular submission need to be adjusted. It will not handle the case where the user has changed their clock between the start of the data being collected and the time it was submitted, but hopefully that will be a pretty rare occurrence.

Tuesday, December 23, 2014

Fun with collations and persistence layers

So today I spent several hours trying to figure out how to get the correct collation on a SQL Server table created using the Microsoft Entity framework.

So you would think that given that I use a C# string that is UTF8 that when you create a table using this data type with the entity framework it would by default have a collation that supports Unicode more or less properly out of the box... No such luck.

And to make matters worth the Entity framework provides no provision to specify the collection used. In the end what I had to do is to run the following custom SQL on every column that I need proper unicode handling of (For me that means properly distinguishing different high code point Unicode characters as being different letters and also being case sensitive).

  ALTER TABLE {tablename}
    ALTER COLUMN {columnname}
    COLLATE Latin1_General_100_BIN2

The problem is that to do this you also need to drop any indexes that use this column before you make the change and then recreate the column (In my case since I have no deployed database yet I simply create the indexes after I have properly changed the collation). Then once I solved everything for SQL Server I had to do it all over again for MySQL (Different parts of the service runs on SQL Server and MySQL), but that was a lot easier than for SQL Server. Finally I spent some more time trying to figure out some unit tests to validate that an error in this area will not sneak in at a later date since this would be easy to forget.

I have always had a very skeptical approach to persistence layers like the Entity framework (Or indeed any other similar project like Hibernate, XPO etc...) in that I think it produces horribly inefficient database access patterns and should really only be used if you don't care at all about performance or if you are lazy.

I do like certain aspects of the Entity framework though. Specifically I do like being able to specify the database structure using LINQ instead of SQL (You generally get better compile time error checks with this) and also the migration layer for the Entity framework is pretty nice.

That said having worked on developing SQL development tools for over a decade and knowing SQL at the back of my spine I would not trust it for a second to actually generate the SQL for my database access layer. That I hand code for exactly the purposes that I need.