Wednesday, August 22, 2012

Building the litterBox...

In the past two posts, I went through figuring out what type of data tables could be used for storing a catalog of any type of item including a full description - in user data so that the same program could be used to keep track of a collection of anything.

Now that that is done, the next step is to build the processing side of the system.

Since I do most of my work building Java webapps that run on Apache Geronimo...I will lay out how I would build the back end running on Geronimo.

First of all, the information would have to be stored in the database tables.  As I said, MySQL is my (current) favorite database so, that is where the data would live.  What I have not recently mentioned is that the Java Persistence API (JPA) is Amazing!

Before I discovered JPA, all of the data access I did was done by manually coding the commands in SQL using JDBC to perform the connection to the database.  This was slow, error prone, painful to debug problems, and tedious.

Using JPA, all you have to do is to create Java classes that correspond to your database tables, add some fairly straightforward annotations to them, and list them in a configuration file.  And voila, reading and writing to/from the database is nearly as effortless as creating POJOs.

Geronimo comes with Apache OpenJPA baked in - so there is nothing extra to do except use it.

Here is a link to the persistence.xml that I made for my litterBox sitting in github:

On the flip side, we will need to get that data sent over to the web client (and have new data sent from the web client).  For flexibility, I like to use XML.  Most programming languages have standardized methods for processing it - and that includes JavaScript.  I am going to choose it over JSON (JavaScript Object Notation) even though many languages also support JSON because it makes me nervous.  It is possible to put executable code into a JSON object, but it is not possible to do the same in XML (at least not easily).

And, to add to the attractiveness of XML, there is a built in standard to convert Java objects into XML - JAXB (Java Architecture for XML Binding).  In the same way that JPA makes it possible to simplify database access using annotations - JAXB makes it possible to marshal Java objects into XML using a essentially a block of boilerplate code (after those same POJO entity classes get a couple of JAXB annotations added).

Here is a link to the first entity class fully annotated (both for JPA and JAXB):

And, that takes care of making Java aware of the data.  Now we need to establish the API between the web client and the server.  The interface we will be using should support all regular adding and updating functions.  Sometimes this is referred to as CRUD (create, read, update, delete) - but I hate actually deleting.  So instead, we will just mark things as being deleted with a 'deleted' attribute (so no table change needed).  We will also combine the 'create' and 'update' functions be having update automatically do the create if the record is not found.  That gets us down to only needing two functions: read and update.  We'll add one extra 'selectList' that will send back a simplified list of all of the records.

And our final interface becomes:

  • sendXML() - send the full XML of a table row to the client (identified by its ID).
  • selectList() - send a simplified XML of each table row as a part of a big list for use in populating drop down boxes.
  • update() - update existing table row or create new (and then update)
And a link to the interface for the attribute session bean:

Both the sendXML and selectList methods will return a string of XML data (to be interpreted by the client).  The update method will return the ID of the row that was just updated (or created).  That way, the program will be able to request the XML data if needed.

The update method is going to use an object that contains XML data parsed and processed from the message sent from the browser client (along with an XPath object for pulling the data out of it).  This will move the need for the servlet layer to understand the information that it is handing back and forth all of the way back to the session beans that will actually be doing the work.

So that is the rough (very rough) run down of how processing will be managed.

The github repo will be updated (as I have time) to fill in all of the functionality that is described here.  I really do plan on having a complete (if rudimentary) working app when all is said and done.

Next up...


Tuesday, August 7, 2012

Litter is also where cat(egories) put all their s**t...

In the quest to be able to store information about any collection (regardless of what it is a collection of), we began by trying to determine the necessary generic data structures.

When we left off - we had figured out tables to store the:

  • attributes that are necessary to flesh out a category
  • attribute value types that those attributes can contain
  • attribute values (the choices that attributes can actually hold)

But, those attributes are not describing anything yet.  Lets fix that and finally define the category table.

category (
id bigint auto_increment primary key,
categoryDescription varchar(255),
subCategoryOf bigint

Those first two columns are pretty straight forward.  There is an id and a description of the category.  But that third is not so obvious - especially this early on in figuring everything out.  The reason is that I am cheating a little bit and looking ahead.  And what I see when I do that is - in order to describe a category, there might be 'things' that are part of it.

For example (going back to crayons...and this is pushing a little bit) - crayons not only have a color and a length.  They also have a manufacturer.  And, a manufacturer would need its own set of attributes to describe it (name, address, phone number, etc).  I do not plan on actually going to that point.  But if I did, then it could be taken even further (addresses have the street number, street name, city, state, numbers have area codes, prefixes, etc).  Being able to specify that a category can be a sub-category will allow us to support this type of nesting regardless of whether or not it would be silly (or necessary) to do so.

Having the category table allows us to define what a category is - but not to hold any actual things that belong to the category.  We also do not have anywhere to store the actual attribute values for a particular item (just the list of choices that it might have).

instance (
id bigint auto_increment primary key,
categoryID bigint,
instanceDescription varchar(255),
created date

instanceAttributeValue (
id bigint auto_increment primary key,
instanceID bigint,
attributeID bigint,
attributeValueID bigint

You may or may not be able to tell, but the way we have this set up, a user would have to have an entry in the attributeValues table to pick...No free text entry.  We will handle that by allowing the user to pick from the list or enter new text that would automatically get added to the attributeValues table.  But that is an implementation detail and I am going to try to keep the data structure clean.  To do that, let's add another column to the attributeValues table - (instanceSpecific boolean default false).  Here is the new structure of the table.

attributeValues (
id bigint auto_increment primary key,
attributeID bigint,
valueData varchar(255),
instanceSpecific boolean default false

Now we can tell that an entry in the table is a one-off and can be removed if there is no reference to it.

One more thing is missing - there is no way to tell which attributes belong to which categories.  This could be handled in one of two ways: add the categoryID as a column in the attribute (rigid) or make a join table (flexible).  To see how these choices would affect the system, let's take 'color' as an example.  The rigid route would force us to have a special color attribute for every category that we wanted to have it apply to - each with its own set of possible values.  The flexible way will allow us to use a single color attribute and apply it to as many categories as we wanted - much less redundancy.  If we come across an attribute the at first looks like it is a repeat but turns out to need a special set of values then we can always split it out later.

xCategoryAttribute (
id bigint auto_increment primary key,
categoryID bigint,
attributeID bigint

And that completes the first part of the trek.  All of the data structures have been figured out.

Here are the additional table entries to let us actually describe a crayon (some of the table definitions are from the previous post):

  • (1, 'Crayon', null)
  • (1, 'Enter single value', true, false)
  • (2, 'Enter multiple values', true, true)
  • (3, 'Select single value', false, false)
  • (4, 'Select multiple values', false, true)
  • (1, 'Color', 3)
  • (2, 'Length', 1)
  • (1, 1, 'Red', false)
  • (2, 1, 'Orange', false)
  • (3, 1, 'Yellow', false)
  • (4, 1, 'Green', false)
  • (5, 1, 'Blue', false)
  • (6, 1, 'Indigo', false)
  • (7, 1, 'Violet', false)
  • (8, 2, '3 inches', true)
  • (1, 1, 1)
  • (2, 1, 2)
  • (1, 1, 'Red crayon', '2012-08-07')
  • (1, 1, 1, 1)
  • (2, 1, 2, 8)
If that is anything less than dizzying to look at - you are following along far too well.  Here is an attempt to string things together.

Crayons (categoryID: 1) have two attributes: color (attributeID: 1) and length (attributeID: 2).  The value for a color must be a single value selected from a list (attributeValueTypeID: 3).  The value for the length of a crayon will be a single entered value (attributeValueTypeID: 1). The red crayon gets id 1, is a crayon (categoryID: 1), and was created on '2012-08-07'.

And, we have one crayon described - a new red one.  The values for its two attributes are (id: 1 / categoryID: 1) color (attributeID: 1) red (attributeValueID: 1) and (id: 2 / categoryID: 1) length (attributeID: 2) 3 inches (attributeValueID: 8).

That probably does not help much right now.  If you can make it through to the end and see this actually working it should all become a bit clearer.  And even if it doesn't get clearer, that is why we have computers manage all of this for us.

The next two parts are going to be tied pretty tightly together: data entry/display and processing.  But, I will try to pull them apart to make them a little bit easier to keep track of.

To do the pulling apart, we'll continue moving from back (server side) to front (user side).

And cover...