We're in the beginning stages at work of writing a proposal to redo a local government-sector website that has a LOT of custom data sets. Things like city crime statistics broken down by district; school system data (test scores, dropout rates, special ed) sortable by grade, district, year, etc.; Child Services data; housing data; and so on.
Their current process for adding new data is:
1) their community data coordinator massages the information that comes from various sources
2) their on-retainer "webmaster" inserts the data into MS-SQL and writes a custom (ASP-based) UI page for each table of data that lets web users select various fields to sort and filter their results by.
My proposed goal is to eliminate step two (since we will be taking over all maintenance of the website's systems) and have the data coordinator upload the data herself through a web app, so that we never have to touch it.
The way I kind of think it would work is like this:
1) after massaging the data offline using Excel or whatever program she wants, she uploads the file to the site, creating either a new table or replacing/appending an existing one.
2) she would then fill out a web form that updates (presumably several) tables of metadata ABOUT the data set she just uploaded. In addition to obvious stuff like a user-friendly table name and a description of the table's contents, this metadata would also track which columns are searchable and how the data can be linked to other tables of information already on the site.
3) a PHP script uses the metadata from step two to dynamically generate the end user's interface for viewing and sorting the data (a rough sketch of what I mean follows this list).
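To make step three concrete, here's a minimal sketch of the metadata-driven form rendering, assuming PDO with the sqlsrv driver and the data_tables/data_columns schema I describe further down. The connection details, results.php, and the f[] parameter naming are all just placeholders:

```php
<?php
// Minimal sketch: render the search form for one uploaded data set, driven
// entirely by the metadata tables described below. Connection details and
// the results.php target are placeholders.
$pdo = new PDO('sqlsrv:Server=localhost;Database=citydata', 'user', 'pass');

$tableId = (int) $_GET['table_id'];

// Pull the user-friendly label for this data set...
$meta = $pdo->prepare('SELECT table_label FROM data_tables WHERE id = ?');
$meta->execute([$tableId]);
$label = $meta->fetchColumn();

// ...and the columns the coordinator flagged as searchable.
$cols = $pdo->prepare(
    'SELECT col_name, col_label FROM data_columns
     WHERE table_id = ? AND is_searchable = 1'
);
$cols->execute([$tableId]);

// One filter input per searchable column; results.php would run the
// query builder sketched later in this post.
echo '<h1>' . htmlspecialchars($label) . '</h1>';
echo '<form method="get" action="results.php">';
echo '<input type="hidden" name="table_id" value="' . $tableId . '">';
foreach ($cols as $col) {
    printf(
        '<label>%s <input name="f[%s]"></label><br>',
        htmlspecialchars($col['col_label']),
        htmlspecialchars($col['col_name'])
    );
}
echo '<button type="submit">Search</button></form>';
```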
Obviously this is going to be a massive undertaking.
Does it sound feasible, though?

So for step 2, I would have tables like:
data_tables
    id
    table_name
    table_label
    table_description

data_columns
    id
    table_id
    col_name
    col_label         # user-friendly column name
    col_description
    is_searchable
    has_many          # boolean - if yes, use the data_lookups table to find related data

data_lookups          # links one row of data to a row in another table
    primary_column_id
    foreign_column_id
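For what it's worth, here's that outline as actual DDL, T-SQL flavored since the data currently lives in MS-SQL. The types, constraints, and IDENTITY keys are my assumptions; only the names come from the outline:

```sql
-- Sketch of the metadata layer; types and constraints are assumptions.
CREATE TABLE data_tables (
    id                INT IDENTITY PRIMARY KEY,
    table_name        SYSNAME NOT NULL,        -- physical table holding the uploaded data
    table_label       NVARCHAR(255) NOT NULL,  -- user-friendly name shown on the site
    table_description NVARCHAR(MAX) NULL
);

CREATE TABLE data_columns (
    id              INT IDENTITY PRIMARY KEY,
    table_id        INT NOT NULL REFERENCES data_tables(id),
    col_name        SYSNAME NOT NULL,          -- physical column name
    col_label       NVARCHAR(255) NOT NULL,    -- user-friendly column name
    col_description NVARCHAR(MAX) NULL,
    is_searchable   BIT NOT NULL DEFAULT 0,
    has_many        BIT NOT NULL DEFAULT 0     -- if 1, check data_lookups for related data
);

-- Pairs a column in one data set with the column it joins to in another:
-- effectively a registry of foreign-key-style relationships between uploads.
CREATE TABLE data_lookups (
    primary_column_id INT NOT NULL REFERENCES data_columns(id),
    foreign_column_id INT NOT NULL REFERENCES data_columns(id),
    PRIMARY KEY (primary_column_id, foreign_column_id)
);
```

Reading data_lookups as "this column in data set A joins to that column in data set B" is what would let a query builder discover joins on the fly.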
Does this seem like I'm on the right track? Any better ideas?

I'm still not sure exactly how easy it will be to join table data together on the fly, but in general I think I can make a "query builder" application from this, along the lines of the sketch below.
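Here's roughly what I picture the query builder doing, as a minimal sketch: look the physical names up in the metadata, whitelist the filterable columns (which is what makes splicing identifiers into the SQL safe), and bind the values. It assumes the schema above plus the $pdo connection and f[] filters from the earlier sketch; joins via data_lookups would extend the same approach:

```php
<?php
// Sketch of the metadata-driven query builder. Assumes the schema above;
// $filters is the f[] array submitted by the generated search form.
function buildQuery(PDO $pdo, int $tableId, array $filters): array
{
    // The physical table name comes from the metadata, never from the request.
    $stmt = $pdo->prepare('SELECT table_name FROM data_tables WHERE id = ?');
    $stmt->execute([$tableId]);
    $tableName = $stmt->fetchColumn();

    // Whitelist: only columns registered as searchable may be filtered on.
    // This is what makes splicing col_name into the SQL text safe.
    $stmt = $pdo->prepare(
        'SELECT col_name FROM data_columns WHERE table_id = ? AND is_searchable = 1'
    );
    $stmt->execute([$tableId]);
    $searchable = $stmt->fetchAll(PDO::FETCH_COLUMN);

    $where  = [];
    $params = [];
    foreach ($filters as $col => $value) {
        if ($value !== '' && in_array($col, $searchable, true)) {
            $where[]  = "[$col] = ?";   // bracket-quoted identifier for MS-SQL
            $params[] = $value;
        }
    }

    $sql = "SELECT * FROM [$tableName]";
    if ($where) {
        $sql .= ' WHERE ' . implode(' AND ', $where);
    }
    return [$sql, $params];
}

// Usage, with $pdo from the earlier sketch:
[$sql, $params] = buildQuery($pdo, (int) $_GET['table_id'], $_GET['f'] ?? []);
$stmt = $pdo->prepare($sql);
$stmt->execute($params);
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
```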
My real concern is that so far I've been looking at this from the perspective of single-table, un-normalized data sets. Most of this data is going to have SOME shared elements, such as zip codes, districts, and zones (yes, those are all different), which in a perfect world would each be stored in their own table and only referenced by whatever new data set the data coordinator uploads. But she's preparing the data offline in a flat table, and I have no idea how to make it easy for her to normalize that data, separating the common information out of the main table. One idea, sketched below, is to match her uploaded values against the existing reference tables at import time and swap in the foreign keys automatically.
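A minimal sketch of that matching step, assuming a hypothetical zipcodes reference table (id, zipcode) and that the uploaded file has already been parsed into associative rows. Anything that doesn't match gets queued for her to resolve by hand instead of being guessed at:

```php
<?php
// Sketch: during import, swap raw ZIP code strings for foreign keys into a
// shared reference table. "zipcodes" (id, zipcode) is a hypothetical example
// of common data pulled out into its own table.
function resolveZips(PDO $pdo, array $rows, string $zipField): array
{
    // Load the reference table once into a value => id map.
    $map = $pdo->query('SELECT zipcode, id FROM zipcodes')
               ->fetchAll(PDO::FETCH_KEY_PAIR);

    $unmatched = [];
    foreach ($rows as $i => $row) {
        $zip = trim((string) $row[$zipField]);
        if (isset($map[$zip])) {
            $rows[$i]['zipcode_id'] = $map[$zip]; // store the FK, not the raw value
            unset($rows[$i][$zipField]);
        } else {
            // Don't guess: queue it for the coordinator to resolve by hand.
            $unmatched[$i] = $zip;
        }
    }
    return [$rows, $unmatched];
}
```

The same pattern would repeat for districts and zones; the open question is whether a review screen for the unmatched leftovers is something she'd put up with.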
Your thoughts?