Recently, my former colleague Robin Moffatt authored a blog post about approaches to Concurrent RPD Development, where he discussed some of the technical challenges around trying to use MDS XML as a core OBIEE repository development strategy. As usual, Robin’s attention to detail was spot-on, and he does a really nice job detailing some of the gotchas and surprises for those considering this approach.
At Red Pill Analytics, we offer a product called Checkmate that is, above all, a continuous integration and delivery platform that handles the building of OBIEE artifacts (repository and web catalog), the regression testing of those artifacts, and the deployment of those tested artifacts to downstream environments, all 100% automated and completely hands-off. The platform is agnostic in the choice of multi-user development approaches: as long as source control is used, Checkmate can continuously and automatically build, test and deploy BI artifacts. However, since Red Pill Analytics actually has multiple customers who use Checkmate with MDS XML (no fairy dust here, Robin) and are quite happy with it, and since I disagree with some of the conclusions drawn in the post, I thought I would spend a little time and add to the debate.
MDS XML Repository Opening Times
Robin pointed out that with large repositories rendered in MDS XML, the time it takes to open can be excruciating, mentioning times in excess of 20 minutes in one example. My question is this: in what world are serious repository developers in “offline” mode anyway… binary or MDS XML? Spending hours, or even mere minutes, developing in offline mode is a fool’s errand, akin to writing complex Java code or shell scripts without bothering to periodically execute them to see if they work. Even your basic “Hello World” program is unit tested occasionally to work out the kinks. We have no idea if our metadata “code” functions correctly until a running BI Server parses that metadata, performs the intelligent request generation, executes physical queries to a data source (or pulls from cache), and returns those results back to us. We have to see the results, and we have to see the code generated by the BI Server, or it’s a waste of time. None of this is possible in offline mode.
Our Checkmate methodology stresses the importance of “online” development and we recommend isolated, full-stack OBIEE workstations/sandboxes for each developer to enable this (for core catalog developers as well, depending on the requirements). Online development is the only way we can adequately unit test our code: we need immediate results while crafting these complex models. How can we even begin to discuss continuous integration or rapid, Agile delivery when even unit test results are hours or days away?
Our customers are never actually opening an MDS XML repository, but are instead saving back to one whenever they commit their code. It takes only seconds.
The Inevitability of Conflict
(Or The Conflict of Inevitability… Your Choice)
Yes… they will happen. Regardless of what method we use… there will be conflicts. But this is true of any development methodology: if we allow concurrent development on the repository (as we should), it’s inevitable that developers will overlap, especially with a metadata layer that values consistency above all else. However, conflicts can be quarantined to the penumbra of the process, and handled just as easily and effectively with MDS XML as with a binary repository.
Let’s take that last point first, and I’ll quote Robin here as he takes us down into Dante’s Inferno of source control when describing a Git conflict with MDS XML: “Now the developer has to roll up his or her sleeves and try to reconcile two XML files, with no GUI to support or validate the change made except loading it back into the Administration Tool each time.” I’m not sure exactly what he means when he says “loading it back into the Administration Tool”, or why it’s so awful. It is in fact identical to the process of “loading it back” using a binary repository: 100% identical.
Since version 11.1.1.6.0 the Admin Tool has been agnostic in its support for binary or MDS XML repositories when it comes time to compare or merge, or anything else for that matter: you can even mix and match if it suits your fancy. Here’s the really cool part: Checkmate automatically builds and tests feature branches as well as develop and release branches (an important part of the Gitflow methodology), so every feature branch automatically generates an individual RPD feature patch for just the changes in that branch, an individual RPD feature rollback patch (just in case you need it), as well as all the binary repositories along the way (equalized, non-equalized, etc.). Checkmate can even automatically merge the feature branch into the develop branch upon successful regression testing, though none of our customers have taken us up on that yet. With patches generated as part of the feature branch build, if we have to use the Admin Tool, we can use the incremental patch file and merge straight into the MDS XML repository.
Dorothy should be here soon… ‘Cuz I just saw a Straw Man
Next I’ll discuss Robin’s two unmergeable situations, which he lays down as a sort of litmus test for whether MDS XML is a workable solution. The first is the situation where two developers, working in separate branches, create different columns on the same Logical Fact table: in this case, the Margin and Tax measures on the Logical Fact Sales.
When attempting to merge in the second branch of changes, we get a conflict warning from Git. Robin explains it thus: “But taking a ‘simple’ merge conflict where two independent developers add or modify different columns within the same Logical Table [his bold, not mine] we see what a problem there is when we try to merge it back together relying on source control alone.”
He’s right about this… multiple, isolated additions to the same logical table will likely lead to a merge conflict that source control tools alone cannot handle. Git can usually handle simultaneous modifications to the same logical table, or a simultaneous addition with an update. But this isn’t special to MDS XML or its hierarchical structure… instead it follows from the way source control tools work in general. In an informative post on the CSS-Tricks website, guest blogger Tobias Günther explains the basis for this conflict:
“There’s a handful of situations where you might have to step in and tell Git what to do. Most commonly, this is when there are changes to the same file on both branches. Even in this case, Git will most likely be able to figure it out on its own. But if two people changed the same lines in that same file, or if one person decided to delete it while the other person decided to modify it, Git simply cannot know what is correct.”
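To make that concrete, here is a minimal sketch of a line-based three-way merge in Python. It is a deliberate simplification and not Git's actual algorithm: it treats any two edits that overlap or touch the same base lines as a conflict. The XML fragment is a made-up stand-in, not the real MDS XML schema, and the names `edits`, `three_way_merge` and `MergeConflict` are my own for illustration.

```python
from difflib import SequenceMatcher


class MergeConflict(Exception):
    """Raised when both branches touch the same or adjacent base lines."""


def edits(base, branch):
    """Return (start, end, new_lines) for each base region the branch changed."""
    matcher = SequenceMatcher(a=base, b=branch, autojunk=False)
    return [
        (i1, i2, branch[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def three_way_merge(base, ours, theirs):
    """Merge two branches line by line, refusing overlapping or touching edits."""
    our_edits, their_edits = edits(base, ours), edits(base, theirs)
    for a1, a2, _ in our_edits:
        for b1, b2, _ in their_edits:
            if a1 <= b2 and b1 <= a2:
                raise MergeConflict(f"both branches changed base lines near {max(a1, b1)}")
    merged = list(base)
    # Apply edits from the bottom up so earlier indexes stay valid.
    for start, end, new_lines in sorted(our_edits + their_edits, key=lambda e: e[0], reverse=True):
        merged[start:end] = new_lines
    return merged


# Hypothetical, simplified stand-in for a logical table definition.
base = [
    "<LogicalTable name='Sales'>",
    "  <Column name='Revenue'/>",
    "  <Column name='Cost'/>",
    "</LogicalTable>",
]
add_margin = base[:3] + ["  <Column name='Margin'/>"] + base[3:]
add_tax = base[:3] + ["  <Column name='Tax'/>"] + base[3:]
rename_revenue = [line.replace("Revenue", "Gross Revenue") for line in base]

# An update in one branch plus an addition in the other merges cleanly.
print(three_way_merge(base, rename_revenue, add_tax))

# Two additions at the same spot cannot be ordered automatically.
try:
    three_way_merge(base, add_margin, add_tax)
except MergeConflict as exc:
    print("conflict:", exc)
```

Both new columns land at the same insertion point in the file, so a text-based tool has no way to know which goes first, or whether both belong at all. Nothing in the sketch knows or cares that the lines happen to be XML.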
So there is nothing inherently troublesome about MDS XML, except that it’s text and not pure metadata. Let me quickly run over Robin’s second example before I discuss the ramifications in more detail. He describes the situation where two developers add the exact same physical table to the repository in two different branches, producing duplication of the physical table and an error in the repository: in this case, both developers add the table GCBC_SALES.SALES.
No development paradigm could expect an automated merge in this situation. As Marco (he doesn’t provide his full name) describes on his Earthli.com site, we would get similar results if we allowed Java developers to create whatever they wanted without process… duplication, and a non-functioning result:
“An automatic merge can, however, introduce semantic issues. For example if both sides declared a method with the same name, but in different places in the same file, an automatic merge will include both copies but the resulting file won’t compile (because the same method was declared twice).”
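The same thing can be sketched for the duplicate physical table. In this small Python example I assume, purely for illustration, that each repository object lives in its own file keyed by a generated ID; the paths, IDs and the `duplicate_names` helper are hypothetical and are not the actual MDS XML layout. The point is only that a merge can be textually clean and still semantically wrong.

```python
from collections import Counter

# Hypothetical layout: one file per repository object, keyed by a generated ID.
base = {"physical/table_1001.xml": "GCBC_SALES.PRODUCTS"}
ours_added = {"physical/table_2001.xml": "GCBC_SALES.SALES"}
theirs_added = {"physical/table_3001.xml": "GCBC_SALES.SALES"}

# No path appears on both sides, so a file-level merge has nothing to flag.
assert not set(ours_added) & set(theirs_added)
merged = {**base, **ours_added, **theirs_added}


def duplicate_names(files):
    """Return qualified names that more than one file claims to define."""
    counts = Counter(files.values())
    return sorted(name for name, count in counts.items() if count > 1)


print(duplicate_names(merged))  # ['GCBC_SALES.SALES']
```

A check along these lines belongs in the build or in the review process, not in the merge tool, which is exactly where a consistency check or a gatekeeper earns its keep.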
Robin argues that the structure of MDS XML makes the processing of the OBIEE repository writ flat somehow more treacherous. But that doesn’t seem to be the case at all. In their article “An Agile Perspective on Branching and Merging” on the popular CMCrossroads site, Steve Berczuk, Brad Appleton and Robert Cowham describe the basic nature of software merging in general, and the kinds of things we can expect:
“There is usually no difference in the algorithm used for merging two text files containing poetry and two text files containing C++/Java or any other language you care to mention. Indeed, once it is possible to perform 100% correct merges at the semantic and syntactic level, the merge system may well be able to write whatever program we need 5 minutes before we know we needed it!”
Process, Process, Process…
At Red Pill Analytics, when we discuss a Checkmate purchase with prospective customers, we make it clear that, for best results, they are enlisting in a methodology as much as a product. Yes… Checkmate automates all aspects of OBIEE builds, testing and deployment, including the repository and the web catalog (and ODI is coming soon to a city near you). But real success only comes when we adopt the hacker’s equivalent of the Serenity Prayer. Ironically, the two scenarios that Robin lays out in his litmus test about merging are perhaps the easiest two scenarios to remedy with simple process. He states:
“So if we want to use MDS XML as the basis for merging, we need to restrict our concurrent developments to completely independent objects. But, that kind of hampers the ideal of more rapid delivery through an Agile method if we’re imposing rules and restrictions like this.”
First off… as I described in the first section above… this is simply not the case. The Admin Tool is agnostic with regard to a binary repository and MDS XML… we have just as much right to use it as anyone else. But the second point is the one I take most issue with… that developers need to be able to treat development like the wild, wild west to participate in Agile delivery.
On the contrary, all Agile management tools have the concept of “blocker” built in, and the Scrum Master (or whoever) pays very special attention to this attribute as the number one inhibitor to a healthy burn rate. It simply makes no sense to have two developers ripping into and rewriting the same API simultaneously: it’s not efficient, and Agile is all about efficiency. BI development is just that… development, and BI projects need to be run like real IT projects. Agile has very strong processes in place (some would argue too strong); the only difference is that Agile project managers are constantly reevaluating the application of those processes each and every sprint to make sure that efficiency is the number one goal. At the end of the day… dual additions to a single logical table are a bad idea… and that requirement has no business being in a litmus test for anything.
Secondly… Agile development embraces the concept of refactoring, something the BI world is sorely missing. Why not quickly correct the dual Physical Table issue with an alias, open a ticket for it to be refactored, and go on about the business of delivering value? If two developers have indeed made simultaneous additions to the same logical fact… we have numerous solutions to handle this. In an environment where BI development is treated like real development, the merge request would either be rejected as illegitimate, or the Source Master (we call it the Gatekeeper) would simply open the Admin Tool, close the merge request, and then probably go to lunch. If any of these processes seem in any way hazardous, just remember: we’re using source control. There’s no action that we can take that can’t be reversed and redone differently at a later time.
Conclusion
Robin is very smart and incredibly practical… I respect his opinion as much as anyone I have ever worked with. But I think he’s misjudged this one. Why take these two examples (both of which are bad practices in any rational view of delivery, yet at the same time easily rectified with process or the occasional use of the Admin Tool) as an indictment of a process that works flawlessly the other 97% of the time? I think I’m parsing the logic correctly: “Because you have to use the Admin Tool for merging 3% of the time to support some outliers that are generally bad practice anyway… we think you should go ahead and use the Admin Tool 100% of the time.”
It simply doesn’t make sense. Red Pill Analytics has multiple customers actively using MDS XML for multi-developer teams and most never see a single conflict. When they do… Checkmate leaves all the bread-crumbs for an easy and successful Admin Tool merge. But most of our customers prefer to simply plan around these restrictions in the name of efficiency. In ETL tools like ODI and Informatica, it’s not possible for two developers to work on the same mapping at the same time because it’s generally a bad idea. Concurrent development is not about unfettered access to commit whatever we want whenever we want it with the absolute expectation that it should merge automatically. Instead, it’s about designing processes that make whatever method we use as fruitful as possible.
When we discuss continuous integration, Agile methodologies, and rapid delivery, I sometimes hear other technologists debate about “current state” versus “target state”, and the vast chasm between them. In other words… this is the real world, and in the real world, organizations simply aren’t going to treat BI development like real development. I’m sorry… I just can’t accept that. At Red Pill Analytics, we want to… get ready for this… change the world. We discuss the pros and cons of all the multi-user options when we engage with customers to purchase Checkmate, because at the end of the day, we support all of them. At the same time… we clearly and emphatically believe that MDS XML is the best choice. It makes cherry-picking features for a release just plain easy… another important tenet of the Gitflow methodology. Bottom line… all of us want to deliver results for our end users, whoever they are. But I don’t think we need to throw the baby out with the bathwater just because change is difficult.