Wednesday, August 22, 2012

Building the litterBox...

In the past two posts, I worked out what kind of data tables could store a catalog of any type of item, including a full description, entirely as user data - so that the same program could be used to keep track of a collection of anything.

http://bhindsight.blogspot.com/2012/07/a-litter-is-bunch-of-categories.html
http://bhindsight.blogspot.com/2012/08/litter-is-also-where-categories-put-all.html

Now that that is done, the next step is to build the processing side of the system.

Since I do most of my work building Java webapps that run on Apache Geronimo...I will lay out how I would build the back end running on Geronimo.

First of all, the information would have to be stored in the database tables.  As I said, MySQL is my (current) favorite database, so that is where the data would live.  What I have not recently mentioned is that the Java Persistence API (JPA) is Amazing!

Before I discovered JPA, all of the data access I did was done by manually coding the commands in SQL and using JDBC to connect to the database.  This was slow, error prone, painful to debug, and tedious.

Using JPA, all you have to do is to create Java classes that correspond to your database tables, add some fairly straightforward annotations to them, and list them in a configuration file.  And voila, reading and writing to/from the database is nearly as effortless as creating POJOs.

Geronimo comes with Apache OpenJPA baked in - so there is nothing extra to do except use it.

Here is a link to the persistence.xml that I made for my litterBox sitting in github:
https://github.com/jaydm/litter-ejb/blob/master/ejbModule/META-INF/persistence.xml

On the flip side, we will need to get that data sent over to the web client (and have new data sent from the web client).  For flexibility, I like to use XML.  Most programming languages have standardized methods for processing it - and that includes JavaScript.  I am going to choose it over JSON (JavaScript Object Notation), even though many languages also support JSON, because JSON makes me nervous.  A client that evaluates JSON text as JavaScript (with eval(), for example) will run any executable code that has been slipped into it, but it is not possible to do the same with XML (at least not easily).

And, to add to the attractiveness of XML, there is a built-in standard to convert Java objects into XML - JAXB (Java Architecture for XML Binding).  In the same way that JPA makes it possible to simplify database access using annotations - JAXB makes it possible to marshal Java objects into XML using essentially a block of boilerplate code (after those same POJO entity classes get a couple of JAXB annotations added).

Here is a link to the first entity class fully annotated (both for JPA and JAXB):
https://github.com/jaydm/litter-ejb/blob/master/ejbModule/net/jnwd/litter/entity/Attribute.java

And, that takes care of making Java aware of the data.  Now we need to establish the API between the web client and the server.  The interface we will be using should support all of the regular adding and updating functions.  Sometimes this is referred to as CRUD (create, read, update, delete) - but I hate actually deleting.  So instead, we will just mark things as being deleted with a 'deleted' attribute (so no table change needed).  We will also combine the 'create' and 'update' functions by having update automatically do the create if the record is not found.  That gets us down to only needing two functions: read and update.  We'll add one extra 'selectList' that will send back a simplified list of all of the records.

And our final interface becomes:

  • sendXML() - send the full XML of a table row to the client (identified by its ID).
  • selectList() - send a simplified XML of each table row as a part of a big list for use in populating drop down boxes.
  • update() - update existing table row or create new (and then update)

And a link to the interface for the attribute session bean:

Both the sendXML and selectList methods will return a string of XML data (to be interpreted by the client).  The update method will return the ID of the row that was just updated (or created).  That way, the program will be able to request the XML data if needed.
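
To make the shape of those three methods a little more concrete, here is a minimal sketch.  It is not the session bean in the repo: the class name AttributeStore, the in-memory map standing in for JPA, the XML element names, and the choice to leave deleted rows out of selectList() are all my assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the attribute session bean.
// A real bean would go through JPA; a map keeps the sketch self-contained.
class AttributeStore {
    // One "row": a description plus the soft-delete flag.
    static final class Row {
        String description;
        boolean deleted;
    }

    private final Map<Long, Row> rows = new LinkedHashMap<Long, Row>();
    private long nextId = 1;

    // update(): change the row if it exists, otherwise create it first.
    // Returns the id of the row that was updated (or created).
    long update(Long id, String description, boolean deleted) {
        Row row = (id == null) ? null : rows.get(id);
        if (row == null) {
            id = nextId++;
            row = new Row();
            rows.put(id, row);
        }
        row.description = description;
        row.deleted = deleted;   // "deleting" only sets the flag
        return id;
    }

    // sendXML(): the full XML of a single row, identified by its id.
    String sendXML(long id) {
        Row row = rows.get(id);
        if (row == null) {
            return "<attribute/>";
        }
        return "<attribute id=\"" + id + "\" deleted=\"" + row.deleted + "\">"
                + "<description>" + escape(row.description) + "</description>"
                + "</attribute>";
    }

    // selectList(): simplified XML of every row not marked deleted (assumption).
    String selectList() {
        StringBuilder xml = new StringBuilder("<attributes>");
        for (Map.Entry<Long, Row> entry : rows.entrySet()) {
            if (entry.getValue().deleted) {
                continue;
            }
            xml.append("<option id=\"").append(entry.getKey()).append("\">")
               .append(escape(entry.getValue().description))
               .append("</option>");
        }
        return xml.append("</attributes>").toString();
    }

    // Escape the characters that would break element content.
    private static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Calling update(null, "Color", false) on an empty store creates row 1 and hands back its id, which can then be passed to sendXML().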

The update method is going to use an object that contains the XML data parsed from the message sent by the browser client (along with an XPath object for pulling the data out of it).  This moves the job of understanding the information being handed back and forth out of the servlet layer and all of the way back to the session beans that will actually be doing the work.
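
Here is a rough sketch of what such an object could look like, using only the XML and XPath classes that ship with the JDK.  The class name UpdateMessage and the layout of the XML message are assumptions made up for this example, not the format the app actually uses.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical wrapper: the parsed message plus an XPath object to query it.
class UpdateMessage {
    private final Document document;
    private final XPath xpath = XPathFactory.newInstance().newXPath();

    // Parse the XML text sent by the client.
    UpdateMessage(String xml) {
        try {
            document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Message is not well-formed XML", e);
        }
    }

    // Evaluate an XPath expression and return the matching text.
    // XPath.evaluate returns an empty string when nothing matches.
    String value(String expression) {
        try {
            return xpath.evaluate(expression, document);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }
}
```

The servlet only has to build the UpdateMessage and pass it along.  The session bean asks for what it needs, for example value("/attribute/description").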

So that is the rough (very rough) rundown of how processing will be managed.

The github repo will be updated (as I have time) to fill in all of the functionality that is described here.  I really do plan on having a complete (if rudimentary) working app when all is said and done.

Next up...

Presentation.

Tuesday, August 7, 2012

Litter is also where cat(egories) put all their s**t...

In the quest to be able to store information about any collection (regardless of what it is a collection of), we began by trying to determine the necessary generic data structures.

When we left off - we had figured out tables to store the:

  • attributes that are necessary to flesh out a category
  • attribute value types that those attributes can contain
  • attribute values (the choices that attributes can actually hold)

But, those attributes are not describing anything yet.  Let's fix that and finally define the category table.

category (
id bigint auto_increment primary key,
categoryDescription varchar(255),
subCategoryOf bigint
)

Those first two columns are pretty straightforward.  There is an id and a description of the category.  But the third is not so obvious - especially this early on in figuring everything out.  The reason is that I am cheating a little bit and looking ahead.  And what I see when I do that is - in order to describe a category, there might be 'things' that are part of it.

For example (going back to crayons...and this is pushing a little bit) - crayons not only have a color and a length.  They also have a manufacturer.  And, a manufacturer would need its own set of attributes to describe it (name, address, phone number, etc).  I do not plan on actually going to that point.  But if I did, then it could be taken even further (addresses have the street number, street name, city, state, etc...phone numbers have area codes, prefixes, etc).  Being able to specify that a category can be a sub-category will allow us to support this type of nesting regardless of whether or not it would be silly (or necessary) to do so.
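
As a quick sketch of how that nesting could be followed, the code below walks the subCategoryOf column upward from a category to the top.  The 'Manufacturer' and 'Address' rows (ids 2 and 3) are hypothetical and only exist to show the walk; the class and method names are likewise made up for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory copy of the category table.
class CategoryTree {
    static final class Category {
        final String description;
        final Long subCategoryOf;   // null for a top-level category

        Category(String description, Long subCategoryOf) {
            this.description = description;
            this.subCategoryOf = subCategoryOf;
        }
    }

    private final Map<Long, Category> categories = new HashMap<Long, Category>();

    void add(long id, String description, Long subCategoryOf) {
        categories.put(id, new Category(description, subCategoryOf));
    }

    // Follow subCategoryOf upward and return the path from the top down.
    String path(long id) {
        Deque<String> names = new ArrayDeque<String>();
        Long current = id;
        // The loop bound guards against a cycle in bad data.
        for (int i = 0; current != null && i < categories.size(); i++) {
            Category category = categories.get(current);
            if (category == null) {
                break;
            }
            names.addFirst(category.description);
            current = category.subCategoryOf;
        }
        return String.join(" > ", names);
    }
}
```

With rows (1, 'Crayon', null), (2, 'Manufacturer', 1), and (3, 'Address', 2) loaded, path(3) gives "Crayon > Manufacturer > Address".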

Having the category table allows us to define what a category is - but not to hold any actual things that belong to the category.  We also do not have anywhere to store the actual attribute values for a particular item (just the list of choices that it might have).

instance (
id bigint auto_increment primary key,
categoryID bigint,
instanceDescription varchar(255),
created date
)

instanceAttributeValue (
id bigint auto_increment primary key,
instanceID bigint,
attributeID bigint,
attributeValueID bigint
)

You may or may not be able to tell, but the way we have this set up, a user would have to have an entry in the attributeValues table to pick...No free text entry.  We will handle that by allowing the user to pick from the list or enter new text that would automatically get added to the attributeValues table.  But that is an implementation detail and I am going to try to keep the data structure clean.  To do that, let's add another column to the attributeValues table - (instanceSpecific boolean default false).  Here is the new structure of the table.

attributeValues (
id bigint auto_increment primary key,
attributeID bigint,
valueData varchar(255),
instanceSpecific boolean default false
)

Now we can tell that an entry in the table is a one-off and can be removed if there is no reference to it.
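
A minimal sketch of that cleanup check follows.  The class and method names are hypothetical, and the set of referenced ids is assumed to have been collected from the attributeValueID column of instanceAttributeValue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper for finding one-off values that nothing points at.
class OneOffCleanup {
    // One row of the attributeValues table.
    static final class AttributeValue {
        final long id;
        final long attributeID;
        final String valueData;
        final boolean instanceSpecific;

        AttributeValue(long id, long attributeID, String valueData, boolean instanceSpecific) {
            this.id = id;
            this.attributeID = attributeID;
            this.valueData = valueData;
            this.instanceSpecific = instanceSpecific;
        }
    }

    // Ids of instance-specific values that no instanceAttributeValue row references.
    static List<Long> removable(List<AttributeValue> values, Set<Long> referencedValueIds) {
        List<Long> result = new ArrayList<Long>();
        for (AttributeValue value : values) {
            if (value.instanceSpecific && !referencedValueIds.contains(value.id)) {
                result.add(value.id);
            }
        }
        return result;
    }
}
```

Values from the shared pick list (instanceSpecific false) are never reported, even when nothing currently uses them.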

One more thing is missing - there is no way to tell which attributes belong to which categories.  This could be handled in one of two ways: add the categoryID as a column in the attribute table (rigid) or make a join table (flexible).  To see how these choices would affect the system, let's take 'color' as an example.  The rigid route would force us to have a special color attribute for every category that we wanted to have it apply to - each with its own set of possible values.  The flexible way will allow us to use a single color attribute and apply it to as many categories as we want - much less redundancy.  If we come across an attribute that at first looks like it is a repeat but turns out to need a special set of values, then we can always split it out later.

xCategoryAttribute (
id bigint auto_increment primary key,
categoryID bigint,
attributeID bigint
)

And that completes the first part of the trek.  All of the data structures have been figured out.

Here are the additional table entries to let us actually describe a crayon (some of the table definitions are from the previous post):

category
  • (1, 'Crayon', null)
attributeValueType
  • (1, 'Enter single value', true, false)
  • (2, 'Enter multiple values', true, true)
  • (3, 'Select single value', false, false)
  • (4, 'Select multiple values', false, true)
attribute
  • (1, 'Color', 3)
  • (2, 'Length', 1)
attributeValues
  • (1, 1, 'Red', false)
  • (2, 1, 'Orange', false)
  • (3, 1, 'Yellow', false)
  • (4, 1, 'Green', false)
  • (5, 1, 'Blue', false)
  • (6, 1, 'Indigo', false)
  • (7, 1, 'Violet', false)
  • (8, 2, '3 inches', true)
xCategoryAttribute
  • (1, 1, 1)
  • (2, 1, 2)
instance
  • (1, 1, 'Red crayon', '2012-08-07')
instanceAttributeValue
  • (1, 1, 1, 1)
  • (2, 1, 2, 8)

If that is anything less than dizzying to look at - you are following along far too well.  Here is an attempt to string things together.

Crayons (categoryID: 1) have two attributes: color (attributeID: 1) and length (attributeID: 2).  The value for a color must be a single value selected from a list (attributeValueTypeID: 3).  The value for the length of a crayon will be a single entered value (attributeValueTypeID: 1). The red crayon gets id 1, is a crayon (categoryID: 1), and was created on '2012-08-07'.

And, we have one crayon described - a new red one.  The values for its two attributes are (id: 1 / instanceID: 1) color (attributeID: 1) red (attributeValueID: 1) and (id: 2 / instanceID: 1) length (attributeID: 2) 3 inches (attributeValueID: 8).
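
If stringing the ids together by hand is hard to follow, here is a small sketch of the same lookup in code.  It copies the sample rows above into in-memory maps (the real thing would be database queries); the class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory copy of the sample rows for the red crayon.
class CrayonLookup {
    // attribute: id -> attributeDescription
    static final Map<Long, String> ATTRIBUTE = new HashMap<Long, String>();
    // attributeValues: id -> valueData
    static final Map<Long, String> ATTRIBUTE_VALUE = new HashMap<Long, String>();
    // instance: id -> instanceDescription
    static final Map<Long, String> INSTANCE = new HashMap<Long, String>();
    // instanceAttributeValue rows: {id, instanceID, attributeID, attributeValueID}
    static final long[][] INSTANCE_ATTRIBUTE_VALUE = {
        {1, 1, 1, 1},
        {2, 1, 2, 8},
    };

    static {
        ATTRIBUTE.put(1L, "Color");
        ATTRIBUTE.put(2L, "Length");
        ATTRIBUTE_VALUE.put(1L, "Red");
        ATTRIBUTE_VALUE.put(8L, "3 inches");
        INSTANCE.put(1L, "Red crayon");
    }

    // Resolve every attribute/value pair recorded for one instance.
    static String describe(long instanceID) {
        List<String> parts = new ArrayList<String>();
        for (long[] row : INSTANCE_ATTRIBUTE_VALUE) {
            if (row[1] != instanceID) {
                continue;
            }
            parts.add(ATTRIBUTE.get(row[2]) + " = " + ATTRIBUTE_VALUE.get(row[3]));
        }
        return INSTANCE.get(instanceID) + ": " + String.join(", ", parts);
    }

    public static void main(String[] args) {
        // prints "Red crayon: Color = Red, Length = 3 inches"
        System.out.println(describe(1));
    }
}
```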

That probably does not help much right now.  If you can make it through to the end and see this actually working it should all become a bit clearer.  And even if it doesn't get clearer, that is why we have computers manage all of this for us.

The next two parts are going to be tied pretty tightly together: data entry/display and processing.  But, I will try to pull them apart to make them a little bit easier to keep track of.

To do the pulling apart, we'll continue moving from back (server side) to front (user side).

And cover...

Processing.

Wednesday, July 11, 2012

A litter is a bunch of cat(egorie)s...

For some reason, I started thinking about organizing a bunch of 'anything'.  That probably grew out of my previous post about what programming is - but I can't be sure.

There are thousands of apps that slice into that pie of organizing something (your CD library, sock drawer, shopping list, etc).  But, as far as I know, there isn't one that will organize anything.

And, that led me to consider what it would take to make such a thing.  I did not go the next step to wonder how many people would want an app that would be generic enough to allow them to catalog their collection of stamps in the same place as they cataloged their collection of CDs, shirts, socks, etc.  Neither will I start trying to figure that out now - instead, I will go through the exercise of building it.

If you are interested in watching the app evolve, stay tuned.

Phase one: Figure out what your data is...

As I go through this, I will be designing the data structures to hold the information.  Since I am most familiar with SQL, I will be using tables (in my head, they are MySQL tables - but that is just in my head).  Also in my head, every table should have an id column that has no intrinsic meaning.  If your table has a name or description (or any other meaningful piece of information) as its primary key, things will get messy if you ever want to change that column.

Since this is intended to be a program to allow you to catalog anything - there is no way for me to determine exactly what the information being stored will be.  So, instead, I need to figure out a way to describe an 'anything' and group together all of the attributes that are necessary to describe one.

Aha!  An 'anything' has attributes that describe it!  I'll need an 'attribute' table.  There will not need to be much to it though.  Just the name of the attribute and the type of information that it will hold.

Crap - I need an attribute value type table too.

attributeValueType (
id bigint auto_increment primary key,
valueTypeDescription varchar(255),
enterValue boolean default true,
multiValue boolean default false
)

This table has two boolean (true or false) columns.  This took a little bit of thinking ahead.  And the reason for them is this.  An attribute will either allow the user to 'select a value from a list' or 'enter a value'.  Which of these two applies will be indicated by the value of the enterValue column.  If the value is 'true' then the user will be able to enter any value that they want.  If it is false, then they can only select from the list of options that the attribute permits.  The second boolean indicates whether or not it is possible to have more than one value for the attribute.  An example of an attribute that allows only one value might be the 'color' of a crayon.  But, a sweater might have several values for the 'color'.

You might be looking at this table and thinking to yourself that even though we have this nifty table, there are only four possible combinations of enterValue and multiValue.  And, you would be right.  Since you would be right, it would be an option to simply hard code these four values and forget about having the table altogether.  I hate hard coding if I can avoid it though.  If you decide to implement this system - feel free to get rid of this table.

But, there is a method to my madness... Keeping this in a table will allow us to extend the system later when we realize that there are more kinds of information we might want to keep track of (like dates or currency amounts for example).  For now, I am keeping things simple.

Here are generic versions of the four records that would go into the attributeValueType table:
  • ('Enter Single Value', true, false) - gets an id of 1
  • ('Enter Multiple Values', true, true) - gets an id of 2
  • ('Select Single Value', false, false) - gets an id of 3
  • ('Select Multiple Values', false, true) - gets an id of 4
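
For anyone who does take the hard-coding route mentioned above, the four rows map naturally onto an enum.  This is only a sketch of that alternative (the enum and its names are hypothetical); I am still keeping the table.

```java
// Hypothetical hard-coded version of the attributeValueType table.
enum AttributeValueType {
    ENTER_SINGLE(1, "Enter Single Value", true, false),
    ENTER_MULTIPLE(2, "Enter Multiple Values", true, true),
    SELECT_SINGLE(3, "Select Single Value", false, false),
    SELECT_MULTIPLE(4, "Select Multiple Values", false, true);

    final long id;
    final String description;
    final boolean enterValue;   // true: free entry, false: pick from the list
    final boolean multiValue;   // true: more than one value allowed

    AttributeValueType(long id, String description, boolean enterValue, boolean multiValue) {
        this.id = id;
        this.description = description;
        this.enterValue = enterValue;
        this.multiValue = multiValue;
    }

    // Find the type that matches a combination of the two flags.
    static AttributeValueType of(boolean enterValue, boolean multiValue) {
        for (AttributeValueType type : values()) {
            if (type.enterValue == enterValue && type.multiValue == multiValue) {
                return type;
            }
        }
        throw new IllegalStateException("All four combinations are covered above");
    }
}
```

The trade-off is the one already described: adding a new kind of value (a date, a currency amount) means changing code instead of adding a row.
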
Now, we can define our attribute table.

attribute (
id bigint auto_increment primary key,
attributeDescription varchar(255),
attributeValueTypeID bigint
)

Now we are starting to get somewhere interesting.  We can begin to think about one particular thing and how to describe it.  i.e. What are its attributes?  To help think about this, we should probably pick something to be our 'anything'.  As long as we keep the goal of a system that can keep track of 'anything' it won't hurt to narrow things down while we are designing/building it.

I will pick crayons - because there are not very many attributes to keep track of.  Actually, there are lots of attributes of crayons.  Most are things that would usually be ignored (diameter, wax type, melting point, flash point, weight, wrapper material, etc) with only a few that would 'usually' be kept track of (color, length).

So, let's make entries for the attributes to describe a crayon:
  • ('Color', 3) - gets an id of 1
  • ('Length', 1) - gets an id of 2

Here is the rationale behind these choices for the attributeValueType.  The color will be a single value (most crayons have only one color) that will be picked from a list.  The length will simply be entered.  It would be possible to change the length to a 'select from list', but in this case - it would probably be silly to try to enumerate all of the possible lengths that a crayon could have (3", 2.99", 2.98", endless silliness..., 0").

Wait a second.  We don't have a place to hold those color choices.  Time for another table.  Now, these values will need two things.  They will need to know what attribute they belong to (attributeID) and they will need the actual value that they are (valueData).

attributeValues (
id bigint auto_increment primary key,
attributeID bigint,
valueData varchar(255)
)

Since we are here, we will go ahead and enter the color choices that we will need for our crayons (I'll pretend there are only the seven colors of the rainbow for our crayons):

  • (1, 'Red') - gets an id of 1
  • (1, 'Orange') - gets an id of 2
  • (1, 'Yellow') - gets an id of 3
  • (1, 'Green') - gets an id of 4
  • (1, 'Blue') - gets an id of 5
  • (1, 'Indigo') - gets an id of 6
  • (1, 'Violet') - gets an id of 7

Each of those got an attributeID of 1 because they all belong to the attribute 'Color' that we created earlier.

Ack.  That is an awful lot of work considering that we didn't even get to making a 'thing' yet.

We'll take a break for now.  But come back!

Next time, we will finally make a cat(egory).

Monday, July 2, 2012

What is programming...

The non-programmers that I know usually do not have a particularly good idea of what programming is.

I know that none of the people in my circle really know what I do for a living.  Most seem to think that I fix hardware.

To be clear, I am mostly going to talk about programming from a utility or business standpoint.  But even games follow the same principles (although it might not be as obvious).

Programming is a natural thing that people do all of the time - but usually, it is 'throw-away' coding.  They figure out how to perform a task, perform the task, and forget about the whole thing until (unless) they need to do it again.

Computer programming requires something more regimented.  We (you) are figuring out how to explain to a computer exactly how to perform a task.  Because computers are very good at doing exactly the same thing over and over again (but not good at filling in the blanks if you skip a step) it is important to include every necessary step.

Good computer programming requires making those steps simple.  Computers can follow complicated steps, but when you are trying to read your work in the future (or someone else is trying to read it) simple will be easier for you.  And a good practice would be to make sure that not only are your steps as simple as possible, but that you make them self describing.  Name variables so that they tell you what they hold.  Put in comments.  Include logging that makes your program tell you what it is doing as it is doing it.

Great programming adds in speed and elegance.

There is no one way to add speed and elegance.  I think that mostly, they will come from practice at trying to be a good programmer.

I think that every program is made out of a small number of 'major' parts.

  • Data
  • Presentation/Input
  • Processing

The data is the information that your program has to work with.  Whether it is temperature and wind measurements for a weather modelling program - or the amount of money in your bank account along with your deposits and expenditures...data is the whole reason for writing a program.  It is what you want to keep track of and the building blocks that you need to solve a problem.

Presentation is how you show your data (and results) to your user.  Input is how you get the source data from them.  Ugly screens or complicated interfaces will make it more difficult for someone to use your program.  It may be the best at what it does, but if it is painful to use...no one will ever know that.

And processing is all of the work mixing the data together and applying various rules against it.  If you have the best data and the most pleasant, understandable presentation...It will not matter if your program does not do something useful (and correct) with it.

From the order that I listed them, you would probably infer that I think the data is the most important part of the puzzle.  And, you would be right.  If your data is wrong or you are keeping track of the wrong information - your system is never going to be useful.  So, the first step in any programming exercise should be to figure out what your data is.

But, whatever you come up with will probably be wrong.  With practice, you will get closer to being initially right (so keep practicing) but you will need to keep an open mind as to what needs to be kept track of.  And also, what your user will need back from you.  As you move through the process both sides of that equation will (hopefully) become clearer and clearer.  As changes make themselves obvious - adjust.

The next part of programming is figuring out how you will get that starting information from the user, how you will show them what they told you, and how you will give the results back to them after they have been processed.  You will also need to figure out how the processing actions are going to be started.  Will the user press a button on screen, set a timer, etc...?  A well designed user interface is one of the things that can make or break a program.

And finally, what processing needs to be done?  There needs to be a payoff for your user; otherwise, they will not go through the bother of starting your program (never mind taking the time to enter all of their data into it!).  This is where the meat of programming is.  Someone has a problem that they want solved.  It is a problem that requires keeping track of a lot of information or performing difficult manipulation of information (or both).  Sometimes your user would be capable of doing that work themselves and sometimes they will not be.  You will need to make it 'cheaper' (in time, stress, accuracy) for them to use your program than to do the work themselves.

This is a little more than strictly 'programming'.  Working through all of these would encompass system analysis, system design, interface design, data processing, as well as the programming part.  If you have a team to work with and each person takes a piece - that might make it easier for each to become expert at the part they do.  But I would say that being able to see the big picture and all of the parts is what truly makes creating programs fulfilling.

Thoughts?