My company is transitioning some of it's internal core services to publicly available web services.
The reasons for doing so are two fold:
1. Make it easier to develop custom applications utilizing the same core platform services in order to reduce development and maintenance time (vs packaged modules).
2. Allow external customers the ability to develop their own applications using the Vocabra platform.
The services will be designed to be multi-tenant with the intention that most customers will be hosted in a shared database instance. The same architecture could be used for any customer that needs more data isolation by simply creating a single tenant in a separate database instance (which would likely be accompanied by separate application instances as well; at significant additional cost due to deployment and maintenance issues).
A new surrogate key design is necessary to provide security and data migration ability with minimal effort since it is entirely likely that over a short period of time, customers may be merged/split into other database instances and the core services may be versioned quickly (until a more complete version is reached). From a security perspective, it is useful to provide an external customer with a non-sequential nor easily guessable record identifier to prevent any mild form of hacking the URL's. (internally, the queries will also provide mechanisms for preventing unauthorized viewing/editing of records)
The proposed solution
Each table (for publicly available records) will have the following base columns:
id - auto increment default integer field used for internal joining of records (considered to be private id)
uid - unique integer field that will be displayed in the url and provided as a reference id externally (considered to be public id)
Generating the 'uid' field
The 'uid' field will be generated through the following algorithm:
1. Generate a random UUID using java.util.UUID.randomUUID().
2. Hash the random UUID using SHA-256
java.security.MessageDigest md = java.security.MessageDigest.getInstance("SHA-256");
md.update(uuidStr.getBytes("UTF-8"));
3. Take the first 6 bytes of the resulting hash and create a hex value (will be used for display of key; 12 characters).
4. Store the hex value as a long value for database querying
What this solution gets us:
1. Non-sequential URL identifiers (useful to avoid simple 'plus one' URL hacking) that aren't terribly ugly or long.
2. Auto increment keys still exist for joining records in the backend, which should be easier on the database
3. 'Public key' stored as integer value for faster searching rather than trying to compare against a character value (hex form) but is easily switchable from format to format
4. Minimal migration headaches since each record already has a unique identifier and can simply be inserted into the new database without having to worry about the auto increment field. (associations are still 'work' though)
5. Values up to 281,474,976,710,655 (big enough to not worry about, will require sharding or something at that point)
Drawbacks:
1. Insert times will increase, as will CPU IO due to a more complicated key generation routine. (currently not concerned with that since we'll plan on load balancing if it becomes an issue)
2. Requires modification of URL creation to use alternate id field for URL generation (modify Grails templates)
3. Overhead in converting URL keys from hex to integer in order to find records in database (again adding to additional IO)
4. Database querying more difficult since the hex value doesn't exist to search by, requires conversion to integer value or additional scripting
5. Values up to 281,474,976,710,655 (not a large concern, reasoning above)
Outstanding questions:
1. Is taking the first 6 bytes of the SHA hash going to have any problems with collision, it's understood the randomness should be OK.
2. Overkill anywhere?
3. Is it worth the bother to keep the auto increment field around? If we're storing the external key as an integer value what are the performance hits for not having them be sequential (ignoring storage requirements, we'll get a bigger storage device)
4. Is there any benefit to storing the original UUID? Perhaps in case we'd like to regenerate keys at some point in the future (ignoring the user frustration level)
5. Is 'uid' an appropriate name for the alternate key? What are alternative names?
Whatcha guys think?