Author Topic: brainstorming  (Read 1613 times)

micah

  • A real person, on the Internet.
  • Ass Wipe
  • Posts: 6915
  • Karma: +58/-55
  • Truth cannot contradict truth.
    • micahj.com
brainstorming
« on: December 17, 2010, 12:22:01 PM »
We're in the beginning stages at work of writing a proposal to redo a local government-sector website that has a LOT of custom data sets.  Things like city crime statistics, broken down by district; school system data (test scores, dropout rates, special ed) sortable by grade, district, year, etc.; Child Services data; housing data; etc.

Their current process for adding new data is:
1) their community data coordinator massages the information that comes from various sources
2) their on-retainer "webmaster" inserts the data into MS-SQL and writes a custom (ASP-based :rolleyes: ) UI page for each table of data that lets web users select various fields to sort and filter their results by.

My proposed goal is to eliminate step two (as we will be taking over all maintenance of the website's systems) and have the data coordinator just upload the data herself through a web app; so that we never have to deal with it.

The way I kind of think it would work is like this:
1) after massaging the data offline using Excel or whatever program she wants, she uploads the file to the site, creating either a new table or replacing/appending to an existing one.

2) she would then fill out a web form that would update (presumably several) tables of metadata ABOUT the data set she just uploaded.  In addition to obvious stuff like user-friendly table names and a description of the table's contents, this information would also track which columns are searchable and how the data can be linked to other tables of information already on the site.

3) a PHP script uses the metadata from step two to dynamically generate the end user's interface for viewing and sorting the data.
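
The kind of thing I'm imagining for step 3, sketched in Python instead of PHP (table/column names are made up, and the real thing would talk to MS-SQL): the metadata tables give you a whitelist of legal column names, so you only ever parameterize the filter VALUES and the query stays safe to build dynamically.

```python
# Sketch of step 3: build a SELECT from metadata-approved column names.
# Identifiers come from our own data_columns rows (a whitelist), so only
# the filter values need "?" placeholders.

def build_query(table_name, columns, filters=None, sort_by=None):
    """columns: list of col_name strings pulled from data_columns.
    filters: dict of {col_name: value}, limited to searchable columns."""
    filters = filters or {}
    sql = "SELECT {} FROM {}".format(", ".join(columns), table_name)
    params = []
    if filters:
        clauses = []
        for col, val in filters.items():
            if col not in columns:           # reject anything not in the metadata
                raise ValueError("unknown column: " + col)
            clauses.append("{} = ?".format(col))
            params.append(val)
        sql += " WHERE " + " AND ".join(clauses)
    if sort_by:
        if sort_by not in columns:
            raise ValueError("unknown column: " + sort_by)
        sql += " ORDER BY " + sort_by
    return sql, params
```

The same idea works in PHP with PDO prepared statements; the point is just that the metadata, not the end user, decides which identifiers are legal.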

Obviously this is going to be a massive undertaking.  Does it sound feasible though?

So for step 2, I would have tables like:

Code:
data_tables
 id
 table_name
 table_label
 table_description

data_columns
 id
 table_id
 col_name
 col_label #user friendly column name
 col_description
 is_searchable
 has_many #boolean - if yes, use data_lookups table to find related data

data_lookups         #table that links one row of data to that of a row in another table
 primary_column_id
 foreign_column_id
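
Spelled out as actual DDL, those three meta tables might look like this (SQLite syntax here just for sketching; the types and keys are my guesses, and MS-SQL would differ slightly):

```python
import sqlite3

# The three meta tables from above, as SQLite DDL.  Types, keys, and the
# boolean-as-integer columns are assumptions for illustration.
DDL = """
CREATE TABLE data_tables (
    id                INTEGER PRIMARY KEY,
    table_name        TEXT NOT NULL UNIQUE,   -- actual SQL table name
    table_label       TEXT,                   -- user-friendly name
    table_description TEXT
);
CREATE TABLE data_columns (
    id              INTEGER PRIMARY KEY,
    table_id        INTEGER NOT NULL REFERENCES data_tables(id),
    col_name        TEXT NOT NULL,
    col_label       TEXT,                     -- user-friendly column name
    col_description TEXT,
    is_searchable   INTEGER DEFAULT 0,        -- boolean
    has_many        INTEGER DEFAULT 0         -- if yes, use data_lookups
);
CREATE TABLE data_lookups (
    primary_column_id INTEGER NOT NULL REFERENCES data_columns(id),
    foreign_column_id INTEGER NOT NULL REFERENCES data_columns(id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
```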

Does this seem like I'm on the right track?  Any better ideas?


I'm still not sure exactly how easy it will be to join table data together on the fly but in general, I think I can make a "query builder" application from this. 

My real concern is that so far I've been looking at this from the perspective of single-table, un-normalized data sets. Most of this data is going to have SOME shared elements, such as zip codes, districts, and zones (yes, those are all different), which, in a perfect world, should all be stored in their own tables and only referenced by whatever new data set is being uploaded by the data coordinator.   But she's getting the data offline and putting it into an offline table.  I have no idea how I can make the process easy for her to normalize the data, separating common information from the main table.  :dunno:   Your thoughts?
"I possess a device, in my pocket, that is capable of accessing the entirety of information known to man.  I use it to look at pictures of cats and get in arguments with strangers."

ober

  • Ashton Shagger
  • Ass Wipe
  • Posts: 14310
  • Karma: +73/-790
  • mini-ober is taking over
    • Windy Hill Web Solutions
Re: brainstorming
« Reply #1 on: December 17, 2010, 12:43:57 PM »
A few things come to mind from working with importing data for much of my career:

1) Allowing the end user to import data will be problematic and your error handling better be ROCK SOLID.  There will be errors and problems with the data and you need to be able to tell the user EXACTLY what is wrong with the data so they can either start over or upload again after fixing it.  Appending the data to existing data will be messy and problematic but you likely won't have a cumulative file for each set of data so you'll just have to find a way to deal with it.

2) End users are stupid.  Your UI had better be dead simple.  I'm guessing they are probably using option #2 because of that problem alone.

3) Getting them to use shared elements across tables will be another nightmare.  How do they match them up?  What if there are new ones that don't exist in the centralized tables?  The only way around this would be to make them use a standard column name in the import file so you could recognize it and manage it through the import ... but then what if they don't use the right name or if they don't spell it correctly?
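
E.g., something like this to sanity-check the import file's headers before anything touches the database (Python sketch; the shared column names are made up, and in reality they'd come from whatever central tables you set up). It catches the "didn't spell it correctly" case by flagging near-misses for the user:

```python
import csv
import difflib
import io

# Known shared-element column names (assumed; would really be loaded
# from the centralized shared-data tables).
SHARED_COLUMNS = ["zipcode", "district", "zone"]

def check_headers(csv_text):
    """Classify each header: exact shared match, probable misspelling of
    a shared column, or an ordinary data column.  Misspellings can then
    be reported back to the user before anything is imported."""
    headers = [h.strip().lower() for h in next(csv.reader(io.StringIO(csv_text)))]
    exact, suspect, plain = [], {}, []
    for h in headers:
        if h in SHARED_COLUMNS:
            exact.append(h)
        else:
            close = difflib.get_close_matches(h, SHARED_COLUMNS, n=1, cutoff=0.8)
            if close:
                suspect[h] = close[0]   # e.g. "distrct" -> "district"
            else:
                plain.append(h)
    return exact, suspect, plain
```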

Honestly, you have a great idea but I don't think it's going to go very smoothly at all and your predecessor probably realized this and figured it would be easier to do it on their own.

My suggestion would be to cut out half of #2: You insert the data but make a generalized UI that can handle different datasets.  In the end I think you'll be farther ahead.

micah

Re: brainstorming
« Reply #2 on: December 17, 2010, 01:52:20 PM »
Thanks, ober, for your insight. In the end, what you suggest may be easier: having us insert their data in a way that can easily be handled by the script that generates the searchable view for the end user.

I did have another crazy idea which might not be feasible but I'll type it out here so I can remember it later...

let the client upload their spreadsheet of data.
then, when viewing it, they can click on a column name and "normalize" the data, triggering the following:

1) the user is prompted to select an existing "shared data" table or create a new one.
2) the script grabs the values from the given data table column: SELECT col_name FROM uploaded_table GROUP BY col_name
3) for each grouped value:
    a. check if the value is similar to an existing value in the "shared data" table (if col LIKE '%val' OR col LIKE 'val%')
        if it IS similar, store it in an array for use in step 4,
        then stop and continue to the next value

    b. check if the value is an exact match in the shared_data table
        i. if not, add it.
       ii. get ID of this value
      iii. update uploaded_table with the new value: UPDATE uploaded_table SET col_name = shared_data.id WHERE col_name = 'original_value'

4) prompt the user with a list of all the values similar to existing shared values.  The user manually selects whether each value is really a match or not, triggering step 3-b (either i. or ii.)
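
The exact-match path (steps 2 and 3-b) sketched in Python/SQLite instead of PHP/MS-SQL, using the table names from below. The step 3-a/4 "similar values" prompt would sit in the UI, so it's left out here; also note the table/column identifiers have to come from our own metadata, never from user input, since they're interpolated into the SQL:

```python
import sqlite3

def normalize_column(conn, table, col, shared_table):
    """Replace each distinct value in table.col with the id of that value
    in shared_table, adding missing values to shared_table first.
    `table`, `col`, and `shared_table` must be trusted (metadata-sourced)
    identifiers, not user input."""
    # step 2: SELECT col FROM table GROUP BY col
    values = [r[0] for r in conn.execute(
        "SELECT {c} FROM {t} GROUP BY {c}".format(c=col, t=table))]
    for val in values:
        row = conn.execute(
            "SELECT id FROM {s} WHERE value = ?".format(s=shared_table),
            (val,)).fetchone()
        if row is None:                      # 3-b-i: not there yet, add it
            cur = conn.execute(
                "INSERT INTO {s} (value) VALUES (?)".format(s=shared_table),
                (val,))
            shared_id = cur.lastrowid        # 3-b-ii: get its id
        else:
            shared_id = row[0]
        conn.execute(                        # 3-b-iii: swap value for id
            "UPDATE {t} SET {c} = ? WHERE {c} = ?".format(t=table, c=col),
            (shared_id, val))
```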

ta-da!

of course, 3a isn't perfect and doesn't account for alternate spellings, spacing or punctuation.  This could be a major fail point, I suppose.
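
One cheap way to paper over the spacing/punctuation part of that (just a sketch): compare canonicalized values instead of raw LIKE patterns. It still won't catch genuine alternate spellings, so the step-4 manual confirmation stays necessary:

```python
import re

def canonical(value):
    """Lowercase and strip everything but letters/digits, so
    "St. Mary's", "st marys" and "ST MARYS" all compare equal."""
    return re.sub(r"[^a-z0-9]", "", value.lower())

def find_similar(value, shared_values):
    """Return shared values whose canonical form equals, contains, or is
    contained by the canonical form of the uploaded value (the analogue
    of the step-3a LIKE '%val' OR LIKE 'val%' check)."""
    c = canonical(value)
    if not c:
        return []
    return [s for s in shared_values
            if canonical(s) == c or c in canonical(s) or canonical(s) in c]
```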

But presuming it's all OK, here's what the tables could look like:
Code:
shared_data_tables #to track all the tables of shared data
 id
 table_name
 table_label
 table_description

shared_data_DistrictNames  #an example of a "shared data" table
 id
 value

uploaded_table #an example of an "uploaded table"
 id
 column1
 column2
 DistrictName

ober

Re: brainstorming
« Reply #3 on: December 17, 2010, 04:57:10 PM »
If you can explain that easily to your people and still get past all of the importing BS, then have at it.  Sounds cumbersome to do it the way you describe and you still don't get away from the importing nightmare.

Plus, people will ask 'why' they have to do the above.  Trust me.

micah

Re: brainstorming
« Reply #4 on: December 18, 2010, 10:00:22 AM »
Yeah, I totally agree with your point.   I'm going to keep brainstorming on this. I need to figure something out by Tuesday, when I meet with the people who'll be writing up the actual proposal for the client.  We can't pass on the job because it's an offshoot of a larger, important client... but we can't waste man-hours programming tedious and cumbersome processes that, like you said, probably won't be used correctly anyway in the end.

It's kind of like a rerun episode of Pawn Stars I was watching last night. This guy brought in some famous sunken treasure and said he hadn't been able to sell it at auction but was hoping to pawn it for $700k.  The pawn guy offered around $250k and the guy declined and left.  Afterwards, the pawn shop owner expressed his relief that the guy passed on the low-ball offer because, even though the profit could have been high when he eventually resold it, it could take years and he would have tied up a quarter-million dollars in the process.

So for this project, I kind of think we might just quote them on a massive, bells-and-whistles job with a 6-figure price tag.  It'll either scare them away or at least make it worth our while.

ober

Re: brainstorming
« Reply #5 on: December 22, 2010, 01:40:33 AM »
So?  What was the outcome?

micah

Re: brainstorming
« Reply #6 on: December 22, 2010, 07:57:36 AM »
Yesterday I met with our owner and the copywriter who writes up the proposals and makes them look pretty.   I gave them a breakdown of steps totaling about 300 hours for strategy and programming.  Of course, that number doesn't include the stuff that happens before it hits my desk — like graphic design and whatever other BS they pad projects with — I'll let them add that into the final proposal.   Then we'll see what happens.  I did attempt to make it clear that, no matter how robust the system was or how much time went into it, it would be a very "delicate" program, probably requiring a retainer or separate agreement for ongoing support and/or maintenance.