Tuesday, August 7, 2012

Litter is also where cat(egories) put all their s**t...

In the quest to be able to store information about any collection (regardless of what it is a collection of), we began by trying to determine the necessary generic data structures.

When we left off - we had figured out tables to store the:

  • attributes that are necessary to flesh out a category
  • attribute value types that those attributes can contain
  • attribute values (the choices that attributes can actually hold)

But, those attributes are not describing anything yet.  Lets fix that and finally define the category table.

category (
id bigint auto_increment primary key,
categoryDescription varchar(255),
subCategoryOf bigint

Those first two columns are pretty straight forward.  There is an id and a description of the category.  But that third is not so obvious - especially this early on in figuring everything out.  The reason is that I am cheating a little bit and looking ahead.  And what I see when I do that is - in order to describe a category, there might be 'things' that are part of it.

For example (going back to crayons...and this is pushing a little bit) - crayons not only have a color and a length.  They also have a manufacturer.  And, a manufacturer would need its own set of attributes to describe it (name, address, phone number, etc).  I do not plan on actually going to that point.  But if I did, then it could be taken even further (addresses have the street number, street name, city, state, etc...phone numbers have area codes, prefixes, etc).  Being able to specify that a category can be a sub-category will allow us to support this type of nesting regardless of whether or not it would be silly (or necessary) to do so.

Having the category table allows us to define what a category is - but not to hold any actual things that belong to the category.  We also do not have anywhere to store the actual attribute values for a particular item (just the list of choices that it might have).

instance (
id bigint auto_increment primary key,
categoryID bigint,
instanceDescription varchar(255),
created date

instanceAttributeValue (
id bigint auto_increment primary key,
instanceID bigint,
attributeID bigint,
attributeValueID bigint

You may or may not be able to tell, but the way we have this set up, a user would have to have an entry in the attributeValues table to pick...No free text entry.  We will handle that by allowing the user to pick from the list or enter new text that would automatically get added to the attributeValues table.  But that is an implementation detail and I am going to try to keep the data structure clean.  To do that, let's add another column to the attributeValues table - (instanceSpecific boolean default false).  Here is the new structure of the table.

attributeValues (
id bigint auto_increment primary key,
attributeID bigint,
valueData varchar(255),
instanceSpecific boolean default false

Now we can tell that an entry in the table is a one-off and can be removed if there is no reference to it.

One more thing is missing - there is no way to tell which attributes belong to which categories.  This could be handled in one of two ways: add the categoryID as a column in the attribute (rigid) or make a join table (flexible).  To see how these choices would affect the system, let's take 'color' as an example.  The rigid route would force us to have a special color attribute for every category that we wanted to have it apply to - each with its own set of possible values.  The flexible way will allow us to use a single color attribute and apply it to as many categories as we wanted - much less redundancy.  If we come across an attribute the at first looks like it is a repeat but turns out to need a special set of values then we can always split it out later.

xCategoryAttribute (
id bigint auto_increment primary key,
categoryID bigint,
attributeID bigint

And that completes the first part of the trek.  All of the data structures have been figured out.

Here are the additional table entries to let us actually describe a crayon (some of the table definitions are from the previous post):

  • (1, 'Crayon', null)
  • (1, 'Enter single value', true, false)
  • (2, 'Enter multiple values', true, true)
  • (3, 'Select single value', false, false)
  • (4, 'Select multiple values', false, true)
  • (1, 'Color', 3)
  • (2, 'Length', 1)
  • (1, 1, 'Red', false)
  • (2, 1, 'Orange', false)
  • (3, 1, 'Yellow', false)
  • (4, 1, 'Green', false)
  • (5, 1, 'Blue', false)
  • (6, 1, 'Indigo', false)
  • (7, 1, 'Violet', false)
  • (8, 2, '3 inches', true)
  • (1, 1, 1)
  • (2, 1, 2)
  • (1, 1, 'Red crayon', '2012-08-07')
  • (1, 1, 1, 1)
  • (2, 1, 2, 8)
If that is anything less than dizzying to look at - you are following along far too well.  Here is an attempt to string things together.

Crayons (categoryID: 1) have two attributes: color (attributeID: 1) and length (attributeID: 2).  The value for a color must be a single value selected from a list (attributeValueTypeID: 3).  The value for the length of a crayon will be a single entered value (attributeValueTypeID: 1). The red crayon gets id 1, is a crayon (categoryID: 1), and was created on '2012-08-07'.

And, we have one crayon described - a new red one.  The values for its two attributes are (id: 1 / categoryID: 1) color (attributeID: 1) red (attributeValueID: 1) and (id: 2 / categoryID: 1) length (attributeID: 2) 3 inches (attributeValueID: 8).

That probably does not help much right now.  If you can make it through to the end and see this actually working it should all become a bit clearer.  And even if it doesn't get clearer, that is why we have computers manage all of this for us.

The next two parts are going to be tied pretty tightly together: data entry/display and processing.  But, I will try to pull them apart to make them a little bit easier to keep track of.

To do the pulling apart, we'll continue moving from back (server side) to front (user side).

And cover...



Levi Domingos said...

nice material. i liked it.

Jay McHugh said...

I'm glad you enjoyed it! This 'toy exercise' is turning out to be a lot more involved than I expected it to be.