Querying through Collections in db4o

Background

I’ve been working for a while now with db4o, an object-oriented database. It’s native Java (i.e. you use it from Java and feed it Java objects, as opposed to translating to some third-party pseudo or meta object representation as most of the original OODBMSes required you to do). It does a really straightforward job of just persisting Plain Old Java Objects. No mucking around with bytecode enhancers (like JDO) and certainly none of the self-mutilation that goes with EJB. db4o has proved fairly easy to use once you wrap your head around the fact that instead of foreign keys you just use a plain old Java variable … because of course all variables (ok, instance fields) of object type are references in Java. Neat.

To say that I am nervous about contemplating db4o would be an understatement: I have lots of experience with data evolution in enterprise settings, and using an OODBMS rather than a more traditional RDBMS scares the heck out of me. On the other hand, db4o gives me almost totally transparent persistence and has certainly massively accelerated development by allowing me to have a functional persistence layer in one of my applications from the get-go rather than wasting eons fighting through the Object-Relational impedance mismatch. So all in all it’s been a very good experience. Thus far, anyway, db4o has proved to be reliable and robust. Of course, you have to use it properly…

[I’m not worried about it losing data so much as being worried about schema migration and being able to ensure I can always get my data out. db4o is actually pretty good about data upgrades across schema migration, but there are lots of good reasons why the SQL crowd have long asserted that loose coupling to data is a better idea and that abstracting data design and storage from domain layers is a good thing. {shrug} I won’t be surprised if I end up switching to Hibernate and PostgreSQL, but in the meantime, I’ve been able to completely avoid the ORM nightmare, and that makes me rather happy. Will I trust db4o to actually store accounting data when I go live? Eeek. We’ll see.]

Aside: As it stands right now, I’ve deliberately architected things so that you have to talk to the application [think app server] to talk to the data. All the data integrity is in the domain objects layer. So the usual reasons to translate it into SQL (a. “so that other applications can work with it”, b. “because the data is already there and your app is secondary”, and c. “because that’s what an RDBMS is for: ensuring data integrity”) don’t apply. It’s like your local bank: the ATM talks to the enterprise application middleware and THAT and only that talks to the database. Customers don’t write SQL to get their account balance. Yes, that’s contrived, but more to the point application programmers at the bank don’t talk to the database either. They ask the app server for a finder to get them account objects, and get one handed to them which they can then operate on. The actual database is quite shielded from them, thank you very much. From that I infer that since the nature of the datastore is transparent to the application writer anyway, it doesn’t much matter what model that database follows. Yes, I understand that RDBMSes do a perfectly good job of ensuring data integrity (in fact, some argue to me that it’s their only real job) but I’m a Java guy and that’s where I’ve got my application and data integrity logic at the moment. Which is why object persistence is of interest to me. The enduring reason to go to an RDBMS (to protect me from my own screwups) remains.

Querying

I’ve recently started writing some finders: things that would help me quickly look up [persisted] objects by some commonly used set of constraints. This week it was getting a specific Ledger given a fragment of a [parent] Account title and a fragment of the Ledger name. Right now Accounts contain a Set of Ledger objects in a field called ledgers.

db4o does query through Collections — transparently, in fact, but I had trouble descending through the Collection to constrain the result set to just members of the class which was contained in that Collection. To search for the Ledger whose name is “At Cost” within an Account titled “Furniture” using some prototypes to constrain the searches:

    String title = "Furniture";
    String name = "At Cost";
    ...
    Query query = container.query();

    Account a = new Account();
    a.setTitle(title);
    query.constrain(a);

    Query subquery = query.descend("ledgers");

    Ledger r = new Ledger();
    r.setName(name);
    subquery.constrain(r);

    ObjectSet results = subquery.execute();

Returned me a single LinkedHashSet! That sorta makes sense given that the query node was at ledgers but if the constraints are applied transparently through Collection types, how come I can’t descend below the Collection to get a subsubquery object constrained to Ledger.class that would be the one I could execute?

I went through countless renditions of the above and had the same problem. One variation allowed me to do substring searching (i.e. LIKE):

    String titleFrag = "Furn";
    String nameFrag = "Cost";
    ...
    Query query = container.query();

    query.constrain(Account.class);
    query.descend("title").constrain(titleFrag).contains();

    Query subquery = query.descend("ledgers");

    subquery.constrain(Ledger.class);
    subquery.descend("name").constrain(nameFrag).contains();

    ObjectSet results = subquery.execute();

Variations on this theme would either return that single LinkedHashSet or worse would return all the Ledgers in the datastore. Ick.

Workaround

What ended up fixing the problem was to add a navigation reference from Ledger to the parent Account. I already had that pattern in various other places (for instance Entries, which are the bridge between transactions and accounts/ledgers, have references to both their parent Transaction and their parent Ledger) so adding a parentAccount reference to Ledger was just a few lines of code.

[This of course adds a whole field and object-to-object relationship, so in persistence terms this means that all the objects of type Ledger already in the data store would need to be updated. This is what makes me nervous about db4o. At the moment it’s all just mockup data that I’m recreating each run, but in production use? They provide some really neat hooks to deal with this sort of thing but it strikes me that this would be really easy to screw up.]
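Keeping such a back-reference consistent is mostly a matter of setting both sides in one place. Here is a minimal sketch in plain Java (no db4o involved), assuming simplified stand-ins for Account and Ledger; the addLedger() helper and the class shapes are hypothetical, not the real domain classes:

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Sketch only: simplified stand-ins for the real domain classes.
class ParentReferenceSketch {

    static class Account {
        private final String title;
        private final Set<Ledger> ledgers = new LinkedHashSet<>();

        Account(String title) {
            this.title = title;
        }

        String getTitle() {
            return title;
        }

        Set<Ledger> getLedgers() {
            return Collections.unmodifiableSet(ledgers);
        }

        // Hypothetical helper: the one place where both directions
        // of the Account <-> Ledger relationship get set.
        void addLedger(Ledger ledger) {
            ledgers.add(ledger);
            ledger.parentAccount = this;
        }
    }

    static class Ledger {
        private final String name;
        private Account parentAccount; // the navigation reference

        Ledger(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }

        Account getParentAccount() {
            return parentAccount;
        }
    }

    public static void main(String[] args) {
        Account furniture = new Account("Furniture");
        Ledger atCost = new Ledger("At Cost");
        furniture.addLedger(atCost);

        // Navigate "up" from the Ledger to its Account.
        System.out.println(atCost.getParentAccount().getTitle()); // prints "Furniture"
    }
}
```

Exposing the Set only through an unmodifiable view means nobody can add a Ledger without the parent reference being set as well.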

Now to get the Ledgers I want I just do Ledger.class first and descend “up” the object graph:

    Query query = container.query();

    // start from the Ledgers and match on a fragment of the name
    query.constrain(Ledger.class);
    query.descend("name").constrain(nameFrag).contains();

    // then go "up" to the parent Account and match on a fragment of the title
    Query subquery = query.descend("parentAccount");

    subquery.constrain(Account.class);
    subquery.descend("title").constrain(titleFrag).contains();

    // execute the root query, so the results are Ledgers
    ObjectSet results = query.execute();

So the thing that’s bugging me is adding and maintaining this kind of navigation reference just to get around what I view as a weakness in the query mechanism provided by the persistence store.

Actually, the thing that bugs me even more is the fact that to navigate through the graph you have to name the private fields of the classes you want to go to in String form. So much for getting the compiler to type check for you. Of course, with something like Hibernate you have to name, in string form in an XML file, bloody everything by exact name that is in both your classes and in your database. But back to this problem: if I just assumed everything [that needs to be] was in memory, and used appropriate classes (i.e. TreeSets for large sets that need to be sorted, or ArrayLists for things that need to be iterated through) then I could probably avoid needing db4o’s query mechanism entirely.

Which would actually be just fine. It’s not like there are hundreds of thousands of these things — there are just hundreds, max. Sigh. It really is easy to over-engineer things. :)
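For the sake of illustration, here is roughly what such an in-memory finder might look like. This is only a sketch under the assumption that all Accounts are already loaded; the findLedgers() method and the bare-bones Account and Ledger classes are hypothetical stand-ins, and nothing here touches db4o:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Sketch only: plain Java traversal instead of a datastore query.
class InMemoryFinderSketch {

    static class Ledger {
        final String name;

        Ledger(String name) {
            this.name = name;
        }
    }

    static class Account {
        final String title;
        final Set<Ledger> ledgers = new LinkedHashSet<>();

        Account(String title) {
            this.title = title;
        }
    }

    // Hypothetical finder: Ledgers whose name contains nameFrag, inside
    // Accounts whose title contains titleFrag (case-sensitive substrings).
    static List<Ledger> findLedgers(Iterable<Account> accounts,
                                    String titleFrag, String nameFrag) {
        List<Ledger> matches = new ArrayList<>();
        for (Account account : accounts) {
            if (!account.title.contains(titleFrag)) {
                continue;
            }
            for (Ledger ledger : account.ledgers) {
                if (ledger.name.contains(nameFrag)) {
                    matches.add(ledger);
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Account furniture = new Account("Furniture");
        furniture.ledgers.add(new Ledger("At Cost"));
        furniture.ledgers.add(new Ledger("Accumulated Depreciation"));

        Account vehicles = new Account("Vehicles");
        vehicles.ledgers.add(new Ledger("At Cost"));

        List<Account> all = new ArrayList<>();
        all.add(furniture);
        all.add(vehicles);

        for (Ledger ledger : findLedgers(all, "Furn", "Cost")) {
            System.out.println(ledger.name); // prints "At Cost" once
        }
    }
}
```

The field names are real identifiers here rather than Strings, so the compiler does check them. With only hundreds of objects, the nested loop is unlikely to matter.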

It’s been a good exercise of their API, however, and out-of-band querying is still pretty cool. I’ll certainly need it when it comes time to allow users to search for specific Transactions they might be interested in.

Strongly Typed arrays versus Collections

I did see some indication in the db4o forums to the effect that a strongly typed array would be easier to use than a Java Collection, i.e. a Hobbit[] array versus (say) a Set of Hobbits. Certainly, this whole descend() thing would descend you into an array of the Type you want, not into a Collection where you get stuck.

I did some looking around on the net to see what people thought of using arrays versus Collections — not so much from a performance standpoint but rather from an API usability standpoint.

Broadly, the consensus out there seems to be that while strongly typed arrays are speedy things indeed [especially care of System.arraycopy()] and “nicer” to deal with because of the compile time Type safety, almost any use case imaginable would end up needing all the various methods that the Collections in Java >= 1.2 provide so you’ve got a lot of reimplementation to do.

One suggestion I saw in a Java forum somewhere was to store in arrays for steady state, but to switch to (say) a HashSet while you’re mucking with it and then ultimately export back to an array with Collection.toArray(). That seems interesting, but since a HashSet is internally implemented on top of a HashMap, which in turn is backed by an array, you really aren’t buying much by writing all that machinery yourself.
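A minimal sketch of that round trip, assuming String names standing in for a Hobbit[] array. The withAdded() helper is hypothetical, and it uses a LinkedHashSet rather than a plain HashSet purely so that the order of the result is predictable:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Sketch only: array for steady state, Set while modifying, array again after.
class ArrayRoundTripSketch {

    // Hypothetical helper: returns a new array with 'extra' added,
    // leaving the input array untouched and ignoring duplicates.
    static String[] withAdded(String[] steadyState, String extra) {
        Set<String> working = new LinkedHashSet<>(Arrays.asList(steadyState));
        working.add(extra);
        // toArray(T[]) gives back a strongly typed String[]
        return working.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] hobbits = {"Frodo", "Sam"};
        String[] updated = withAdded(hobbits, "Merry");
        System.out.println(Arrays.toString(updated)); // prints [Frodo, Sam, Merry]
    }
}
```

Every modification copies the whole array twice, which is part of why this buys so little over just keeping the Set around.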

Anyway, with respect to my adding a parent reference, now all I have to decide is whether this is an elegant solution and something I’ll need anyway, or a dirty hack which needs to be expunged before it sees the light of day. :)

And, of course, I need to decide whether to keep using db4o or to switch to an ORM tool on top of an SQL database. The debate continues.

/me runs off to write some more unit tests.

AfC