Sunday, February 07, 2010
Blog Comments or “Saint jdn – fighting Those Who Cared”

I have to explain the new tag line.

Updated: the new tag line was fine for a day.  Back to what it has always been.

In the meantime, don’t forget the first rule of the Blogosphere: opinions are like assholes.  Everyone who has one, is one.

Because of the weird phenomenon of people in Eastern Europe posting spam links that advertise porn or poker or whatever, I have moderation turned on for the site.  It’s annoying because almost no one reads this thing and almost no one comments on it (except maybe to say “DOH!  I did that too!”), but if I don’t have moderation turned on, once one test spam comment gets through, hundreds a day come in, and it’s just a pain in the ass to deal with.  Right now, the big thing appears to be selling term papers to college students who don’t want to actually do the work themselves.  One gets submitted every couple of days, which I then kill.

Anyway, other than spam, I’ve never deleted a comment for any other reason, and except for something that was just blatantly racist or something, I don’t know what I would delete.  Maybe completely off-topic comments about politics or something.

I have this general policy for a reason, and one that ties back to the ‘birth’ of this piece of crap blog.

Back when Scott Bellware was blogging on CodeBetter, he was going on in his general humble (</sarcasm>) way about something he said or did, and I submitted a generic snarky response commenting on how nice it must be to be superior to everyone else.  Which, of course, he deleted.  And then he wrote a post about it (I think it was titled ‘BlogCoward’).

Anyone who’s read this blog or, well, ever met me, knows that I can have somewhat of, uh…an aggressive personality.  (Translation: “I’m kind of a dick”….what do you mean, kind of?  “Fine.”).

Naturally then, I commented about it.  I don’t remember if he deleted the first one, but eventually, one stayed and general frivolity ensued.  “Who are you jdn?  Are you just a troll?”  Which eventually led to my “My name is John Nuechterlein, I got a Ph.D. at the age of 25, I’m a good cook, I play a mean guitar, and I’m a snazzy dresser” comment, and there we go. (I’m nowhere near being a snazzy dresser, the rest is fairly accurate….but I digress).

From all of that, I have what I might call the ‘pot meet kettle’ moderation policy.  It would be rather cheesy for me to just delete comments if I didn’t agree with what was being said, or if the comments were somewhat…’aggressive’.   This was what was so funny about Bellware’s old blog: he’d post really rude and obnoxious comments about everything and everyone, but if anyone posted the slightest thing critical of him, he’d get all offended and wussy.  I’d link to examples (especially ones that didn’t involve me) but he deleted all of his old posts when he split from CodeBetter.

If you are going to have a blog and allow comments and say provocative things, don’t be a loser and delete stuff without a reason.  That’s true blog cowardliness…or something like that (I guess my attitude also stems from being on USENET back in the day, as the things that people call trolling today ain’t nothing like then.  There’s nary an H. West amongst them.  Not even a Plain and Simple Cronan.  Moment of silence for Cronan, rest his soul with God………….thank you.  The world misses that guy, and doesn’t even know it.  But I digress).

So, the new tagline came from a comment that Rob Conery made on his blog to a comment I made before he deleted the whole exchange and banned me from the commenting system altogether.  Burning bridge?  What’s that?

True story digression: after I graduated from the University of Miami with a Ph.D. in Philosophy at the age of 25 (Hi Jeremy, Hi Rob!), I stayed around for a year or two before finally leaving the hellhole that is South Beach (City motto: It’s a great place to visit, but you wouldn’t want to live here), and so during that time I was still around the Department for meetings, colloquiums, parties, etc.  Anyway, for a few months I dated one of the graduate students in the program, and some of my former classmates and colleagues asked her what it was like to date this jdn guy.  When she told them that I was very sweet and kind, the unanimous reaction was pretty much that she couldn’t have possibly understood the question properly, or she was actually dating someone else.  I consider the time when she told me about this one of the high points of my life, if only because it was so funny…lol, sorry, I digress.

Rob’s been on a kick about getting rid of relational databases and using NoSQL type databases (which are entirely different topics, which he doesn’t get), and supplying a lot of useful code, all of which is good.  He seems to think that if he does something good in one area, it means he’s excused for harmful actions in other places.

The ultimate problem is that advocating getting rid of relational databases would make this industry worse by orders of magnitude.  He doesn’t like to hear that, so he deletes comments.

This has always been the problem with Alt.NET.  Really smart people advocating really stupid business practices, e.g. all of the good that could come from examining NoSQL possibilities drowned out by dumb ideas that you have to get rid of relational databases. 

Anyway, Rob was making some generally ignorant comments and so I posted the following:

"In the 24 years that I’ve been doing this, I’ve changed a column name on a DB precisely twice"

So, in other words, you don't have a lot of experience in this area.

Rob really didn’t like this, I guess.  What I said was accurate (if you’ve ever worked on a DB system with hundreds of tables worked on by dozens of people over 5+ years, changing column names is pretty common.  Not every day common.  But common.).

Oh, he didn’t like this at all.  Because he’s too lazy, err, because he uses Disqus to handle comments to his blog, I got the full response in the email that gets sent out.  It was a brilliant rant.  I wish he had had the guts to keep it online, but it included the following:

“You, my friend, are the smartest of them all. You see me for what I am - a sham. And when I spend 5 hours of my Saturday trying to concoct yet another Lame Blog Post to try and answer the Good People of the world (whom I've completely fooled) - you're there to call me out. You should be commended. No you should be Sainted. Saint JDN - savior of the Geeks. The guy who understood what no one else did and saved the masses from the tyranny of Those Who Cared.”

I love this.  I really do.  The sarcasm is awesome.  Rob has never been able to handle challenges to his positions, so he resorts to this sort of thing.   Brilliant.

He didn’t actually get around to banning me from his site until my follow up comment:

“Why do you insist on things like:

- copying the points made by Udi and Greg and others, but without attributing them at any point, as if you were an original source on any of this (which you aren't, you are simply repeating what they have said, for the most part).

- thinking that your criticisms about relational databases actually relate to them, since your critiques of them seem irrelevant to how they are actually used in the real world”

That did it for him.  As the ultimate Gloryhound, he likes to post stuff where he rips off material from other people and pretends that he was the source.  When I post about cqrs, I make it clear that I am building on the work of others.  Rob thinks it is okay to plagiarize.   Good for him.  Derik has been doing with Dimecasts what Rob is doing with Tekpub, except Derik doesn’t charge for it (Dimecast official motto: “learn something new in 10 minutes or less, average running time of episodes is 12 minutes”).  Good for him.  As I said to him in an email, Tekpub is as popular as it is because people like Ayende are part of it.  It’s not like anyone thinks his blog series is real world code.

As I mentioned to him in my response to his email, I have an open invitation to have a Skype session to go over all of this.  I’d be fine with recording it so that everyone could hear it and come to their own conclusions.  He’s too scared to do that.

From his last email to me, he seems to think that I hate him or that I have a lot of anger towards him.  He’s a blogger guy.  So was Bellware.  If I don’t hate Scott (who has actually blogged a bunch of good stuff in the last few days since abandoning Twitter), why would I hate anyone?  This is all just talking about code.  I think Rob is killing the message of the advantages of using NoSQL stuff with these silly and ignorant comments about relational databases.  Not surprisingly, he has a different opinion about it.  Okay, so what?  We differ about that.  I’m willing to talk about it in any open forum he wants.  Like SB, he runs away from that.  No problem.

Summation

If you have a blog, and you allow comments, be a man about it and allow comments that disagree with you.  You aren’t as smart as the people who disagree with you, and the people who disagree with you aren’t as smart as you either.  I think that makes sense.  Besides deleting whatever comments I had on his blog, he’s deleted a bunch of other things as well (there’s some guy named Eric that really riles him up…anybody know who this guy is?), and left in all the stuff where people thank him for what a great guy he is.  Which is okay.  It’s his blog, he can do what he wants with it.

And remember the first rule about the Blogosphere: opinions are like assholes.  Everyone who has one, is one. 

Deal with it.

posted @ Sunday, February 07, 2010 9:28 PM | Feedback (5)
It’s OK to do Reporting off of a RDBMS

Well, that’s a little misleading.  It’s OK to do Reporting off of a RDBMS as long as you do it right, and you should consider other options before committing to it. 

note: I’m using “Reporting” here in the traditional sense, not in the cqrs sense where pretty much anything that doesn’t involve a command is called “Reporting.”  Also, since I mostly know SQL Server, that’s what I’m going to be discussing here.  Also, yes, I know I’m glossing over a hell of a lot of stuff here.

The ‘Problem’

Suppose you have your traditional transactional system (it could be an eCommerce store, trading system, whatever), designed and optimized to handle inserts into it.  Indexes are aimed at preventing locking, data files (especially the transaction log) are located in different places to minimize hot spots and maximize I/O (SAN technology is pretty amazing these days), code is written correctly so that minimal numbers of query plans are generated which are then maximally used, yada yada yada.

Then along comes Sally Business User who wants to write a report that gets back whatever data she’s all hot to get information on, and happens to construct the query in such a way that it joins ten tables, all of which get locked, and which unfortunately returns a Cartesian product involving whatever table you have with the most rows.  Needless to say, the DB locks up and becomes unavailable, requiring a reboot, much gnashing of teeth, yada yada yada.
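To put some numbers on Sally’s query, here is a minimal sketch of what a forgotten join condition does to row counts.  It uses Python’s built-in sqlite3 module as a stand-in for SQL Server (an assumption made purely so the example runs anywhere), and the customers/orders tables and the row_counts helper are made up for illustration.

```python
import sqlite3


def row_counts(n_customers=50, n_orders=200):
    """Compare a proper join with an accidental Cartesian product.

    Hypothetical schema; SQLite is only a stand-in for SQL Server here.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    """)
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(i, f"customer {i}") for i in range(n_customers)],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(i, i % n_customers, 10.0) for i in range(n_orders)],
    )

    # Join with a condition: one output row per order.
    joined = conn.execute(
        "SELECT COUNT(*) FROM orders o JOIN customers c ON c.id = o.customer_id"
    ).fetchone()[0]

    # Same two tables, join condition forgotten: every order paired
    # with every customer.
    cartesian = conn.execute(
        "SELECT COUNT(*) FROM orders o, customers c"
    ).fetchone()[0]

    conn.close()
    return joined, cartesian
```

With 50 customers and 200 orders, the proper join returns 200 rows and the forgotten condition returns 10,000.  Scale that up to real table sizes and ten tables, and you can see where the gnashing of teeth comes from.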

Of course, if your users are idiots, bad things can happen

But, of course, this is a straw man presentation.  Anybody can come up with stupid scenarios that don’t really address the pros and cons of using an RDBMS for reporting (or using one at all).  Instead, let’s take a closer look at why reporting off of an RDBMS can be problematic and how these problems can be addressed.

Joins can be costly

As a general rule, relational theory says that normalization is good (though this can be taken to an extreme…I once worked on a system where it took something like 8 joins to get a person’s cell phone number, but I digress).  This tends to lead to a proliferation of tables.  This means that when you want to read back related data, you have to join between a larger number of tables than in a denormalized system, and this can be a bad thing for a number of reasons.

A surprisingly large number of developers don’t really understand as much about transaction isolation levels as they really should, and so oftentimes don’t even know how to use the “(nolock)” table hint properly in their queries, which can lead to quite a lot of table locking.

While the overhead of join conditions themselves (which, BTW, can also use “(nolock)”) isn’t that much (assuming you have proper indexes), additional conditions in the where clause increase it, and can lead to very inefficient and costly execution plans if there aren’t proper indexes, if the index columns are in the wrong order, or if the conditions aren’t sargable in the first place.
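Here is a rough sketch of the ‘proper indexes, in the right order’ point.  It asks SQLite (via Python’s built-in sqlite3 module, used as a stand-in because it runs anywhere) how it plans to run a query before and after an index exists, and then what happens when the where clause only touches the second column of that index.  The orders table, the index name, and the helper functions are all hypothetical, and SQL Server’s optimizer and plan output look different, but the basic seek-versus-scan distinction is the same idea.

```python
import sqlite3


def plan(conn, sql):
    """Return the detail text of each step in SQLite's query plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


def index_order_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, "
        "order_date TEXT, total REAL)"
    )
    by_customer = "SELECT total FROM orders WHERE customer_id = 42"
    by_date = "SELECT total FROM orders WHERE order_date = '2010-02-07'"

    # No index yet: the only option is to read the whole table.
    before = plan(conn, by_customer)

    # Composite index with customer_id as the leading column.
    conn.execute(
        "CREATE INDEX ix_orders_cust_date ON orders (customer_id, order_date)"
    )

    # Filtering on the leading column can seek into the index.
    after = plan(conn, by_customer)

    # Filtering only on the second column cannot seek, so it scans.
    wrong_column = plan(conn, by_date)

    conn.close()
    return before, after, wrong_column
```

In SQLite’s plan text, a step starting with SCAN reads everything and a step starting with SEARCH seeks via an index.  The first and third plans scan, and only the second one searches, even though the third query’s column is technically ‘in an index’.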

Aggregation can be costly

One of the greatest managers I ever worked with didn’t really like relational theory or SQL (which was somewhat ironic since data services was his department), and would at times dismissively wave his hand and say “blah blah blah, group by, order by, whatever.”  He knew it was important but didn’t really care much about the details (that’s what I was being paid for).

Well, all that ‘group by, order by, whatever’ can also greatly impact your execution plans, since the engine now has to group and sort rows on top of just reading them.  Simply selecting a group of rows is much different from selecting a group of rows while also finding the sums, maxes, and mins that typically show up in a report.
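As a small illustration of the extra work, the sketch below compares the plan for a plain select with the plan for a typical report query.  Again this is SQLite through Python’s sqlite3 module standing in for SQL Server, with a made-up orders table, so treat the plan text as SQLite’s way of saying it rather than anything you would see in Management Studio.

```python
import sqlite3


def plan(conn, sql):
    """Return the detail text of each step in SQLite's query plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


def aggregation_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
        [(1, 10.0), (1, 30.0), (2, 5.0), (2, 20.0), (2, 25.0)],
    )

    plain = "SELECT customer_id, total FROM orders"
    report = (
        "SELECT customer_id, SUM(total), MAX(total), MIN(total) "
        "FROM orders GROUP BY customer_id ORDER BY SUM(total) DESC"
    )

    plain_plan = plan(conn, plain)
    report_plan = plan(conn, report)
    report_rows = conn.execute(report).fetchall()
    conn.close()
    return plain_plan, report_plan, report_rows
```

The plain select is a single scan.  The report query adds temporary B-tree steps to the plan for the grouping and the ordering, which is the ‘blah blah blah’ my old manager didn’t want to hear about.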

Functions in where clauses are bad

Taking a piece of code and putting it in a UDF seems like a good idea.  The problem is that putting functions in where clauses makes the where clause non-sargable in most cases.  Even worse, if the function is doing any sort of complicated logic itself, it gets called for *every* row the query has to examine, not just once.
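A minimal sketch of both halves of that claim, using SQLite through Python’s sqlite3 module as a stand-in for SQL Server.  The order_year function, the orders table, and the index name are all hypothetical; the Python function plays the role of the UDF, and a counter records how often it actually gets called.

```python
import sqlite3


def plan(conn, sql):
    """Return the detail text of each step in SQLite's query plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


def udf_demo(n_rows=1000):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT)")
    conn.execute("CREATE INDEX ix_orders_date ON orders (order_date)")
    # One row in four is dated 2009; the rest are dated 2008.
    conn.executemany(
        "INSERT INTO orders (order_date) VALUES (?)",
        [
            (f"{2009 if i % 4 == 0 else 2008}-01-{i % 28 + 1:02d}",)
            for i in range(n_rows)
        ],
    )

    calls = 0

    def order_year(date_text):
        # Stand-in for a UDF; counts how many times the engine calls it.
        nonlocal calls
        calls += 1
        return int(date_text[:4])

    conn.create_function("order_year", 1, order_year)

    with_udf = "SELECT COUNT(*) FROM orders WHERE order_year(order_date) = 2009"
    sargable = (
        "SELECT COUNT(*) FROM orders "
        "WHERE order_date >= '2009-01-01' AND order_date < '2010-01-01'"
    )

    matches_udf = conn.execute(with_udf).fetchone()[0]
    calls_during_query = calls
    matches_range = conn.execute(sargable).fetchone()[0]
    plans = (plan(conn, with_udf), plan(conn, sargable))
    conn.close()
    return calls_during_query, matches_udf, matches_range, plans
```

Both queries find the same 250 rows out of 1,000, but the function version calls order_year 1,000 times and scans, while the range version on the bare column can seek into the index.  Rewriting the predicate so the column stands alone is usually the fix.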

All this is getting in the way of the business anyway

Back in the dot.com heyday when working on eCommerce systems, the general principle was that our DB should ideally only be used when a customer was trying to give us their credit card number.  Obviously, this was an impossible ideal, but it was still a guiding principle (just as “eliminating crime” is an impossible ideal, but still a guiding principle). 

Well, obviously, reporting goes completely against that principle, which means that your scalability is limited by the amount of resources that are used for it, and as we’ve seen, that amount can be disproportionate to what you really want to be doing.

So, what to do?

Stop using an RDBMS for anything

One tactic to take is to “stop the madness” and not use an RDBMS at all.  Learning set theory, query optimization, etc. seems to require a lot of work, and can be never-ending.  An index that is good today might be bad tomorrow.  Statistics get outdated.  And Joe the Developer is going to forget some table hint and lock the order table anyway.

And the experience of the Internet has shown that there seem to be hard limits in just how much data can be stored/processed/managed in an RDBMS.  That’s why Google and Amazon (the obvious examples) don’t center their businesses around them (the white papers on their internal architectures are fascinating to read). 

However, throwing the baby out with the bathwater isn’t generally a good option.  Most of us aren’t going to be building systems that scale to the size of Google or Amazon.  It’s good to dream, but deciding on how to architect a system should be based on realistic expectations as to the needs of the business it is supporting, and for almost all businesses (except maybe at the highest of high-ends and lowest of low-ends), an RDBMS hits the sweet spot.

Moreover, while you certainly don’t have to be a Google or an Amazon to use something other than an RDBMS, making that choice requires learning different ways of doing things, with less supporting literature.  SQL has been around for a long time and in many different environments.  While ‘NoSQL’ style databases have been around for ages, there simply isn’t the high level of ‘Google it’ knowledge that you can rely on to solve any particular practical issue you might be facing.  If you don’t know much about SQL, but know that your system is experiencing blocking, you can pretty quickly learn how to identify the causes of it and devise a solution.

Don’t use an RDBMS for reporting, that’s why God invented OLAP

If you want to limit the resources that are hitting your database for non-‘credit card submission’ purposes, then obviously, you should move those ‘non-critical’ resources somewhere else, and an obvious solution is another database.  In fact, if you really want to do it ‘correctly’, put in place an OLAP system.  Take your highly normalized transactions and ship them off to another system that denormalizes and aggregates them, precisely for data mining and reporting purposes.  The output of such a system will be familiar to anyone who, for instance, uses the Pivot function in Excel.  That’s right, your business users, the ones who want the reports in the first place.
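To make ‘denormalize and aggregate’ a bit more concrete, here is a deliberately tiny sketch.  It is not Analysis Services and there is no MDX or cube anywhere in it; it is just SQLite (via Python’s sqlite3 module) flattening two normalized tables into one pre-aggregated reporting table, then reading that table back in a pivot-like shape.  All table, column, and function names are made up for the example.

```python
import sqlite3


def build_reporting_table(conn):
    """Flatten and aggregate the normalized tables into one reporting table."""
    conn.executescript("""
        DROP TABLE IF EXISTS sales_by_region_month;
        CREATE TABLE sales_by_region_month AS
        SELECT c.region                   AS region,
               substr(o.order_date, 1, 7) AS month,
               SUM(o.total)               AS total_sales
        FROM orders o
        JOIN customers c ON c.id = o.customer_id
        GROUP BY c.region, substr(o.order_date, 1, 7);
    """)


def pivot(conn):
    """Read the reporting table as {region: {month: total}}, no joins needed."""
    table = {}
    for region, month, total in conn.execute(
        "SELECT region, month, total_sales FROM sales_by_region_month"
    ):
        table.setdefault(region, {})[month] = total
    return table


def reporting_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                             order_date TEXT, total REAL);
        INSERT INTO customers VALUES (1, 'East'), (2, 'West');
        INSERT INTO orders VALUES
            (1, 1, '2010-01-15', 100.0),
            (2, 1, '2010-02-03', 50.0),
            (3, 2, '2010-01-20', 75.0),
            (4, 2, '2010-01-25', 25.0);
    """)
    build_reporting_table(conn)
    result = pivot(conn)
    conn.close()
    return result
```

The join and the group by happen once, when the reporting table is built, instead of every time somebody runs the report.  A real OLAP setup does far more than this, but the division of labor is the same.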

With SQL Server, it comes with the product (well, not the Express version, I guess) as Analysis Services, so if you can already afford the cost of the license, you get it with no additional cost.  So, since it was designed precisely to do the reporting that needs to be done, and it comes with the product, why wouldn’t you use it?  You do have to afford additional hardware (you could use it on the same instance, but that kind of defeats the purpose), but that’s going to be true just about no matter what.  Seems like the obvious answer.

The problem is cost.  Not cost of the product, but opportunity cost.  Learning OLAP concepts and technologies appears to be something like an order of magnitude more difficult than learning OLTP concepts.  Normalization is easier to understand than star schema.  T-SQL is an easier language to work with than MDX.  Because of this, it is harder to find people to staff a business if it uses it.  Significantly harder.

Is this a definitive argument against OLAP?  Of course not.  In fact, if you are the sort of person who likes data, and who would like a fairly secure, fairly well-paid profession so that you can support and raise a family (in other words, the truly important stuff in life other than this geek crap), I would encourage you to learn this stuff.  Action pack subscriptions or MSDN subscriptions aren’t free, but they can give you licenses of SQL Server versions that include SSAS.

But, the main reason why OLAP hasn’t taken off as much as one might think (though I guess that is changing over time) is that there is another, cheaper option.

God also invented Replication

Part of every version of SQL Server (though with some limitations), replication essentially allows you to take your database and copy its data to another database, in near real-time and in a form as close to the original as you want, more or less automatically.

You still need the additional hardware (since replicating to the same server kind of defeats the purpose), but you can keep your better understood schema, write your reports in T-SQL, and offload almost all of those resources from your main database.  There is a slight overhead in setting up and running replication but it isn’t much, and improves with each version (so, with SQL 7, setting up replication caused table locks, so you had to do it at 3AM, whereas in 2000, it only caused row locks, etc. etc. etc.).  If a bad query hits your replica and locks it up, then it won’t affect the main DB (as long as it was set up that way).
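The ‘report against the copy, not the main DB’ idea can be sketched with two SQLite databases and the backup method on Python’s sqlite3 connections.  To be clear about the assumptions: this is a full copy taken on demand, which is closer in spirit to a periodic snapshot than to the near real-time replication SQL Server gives you, and the orders table and helper names are invented for the example.

```python
import sqlite3


def count_orders(conn):
    """Stand-in for a report query; fetchall() fully consumes the result."""
    return conn.execute("SELECT COUNT(*) FROM orders").fetchall()[0][0]


def refresh_replica(primary, replica):
    """Copy the primary's current contents over the replica (a full snapshot)."""
    primary.backup(replica)


def replica_demo():
    primary = sqlite3.connect(":memory:")
    replica = sqlite3.connect(":memory:")

    primary.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    primary.executemany(
        "INSERT INTO orders (total) VALUES (?)", [(10.0,), (20.0,), (30.0,)]
    )
    primary.commit()

    refresh_replica(primary, replica)
    after_first_copy = count_orders(replica)

    # New business arrives on the primary; the replica has not seen it yet.
    primary.execute("INSERT INTO orders (total) VALUES (40.0)")
    primary.commit()
    stale = count_orders(replica)

    refresh_replica(primary, replica)
    fresh = count_orders(replica)

    primary.close()
    replica.close()
    return after_first_copy, stale, fresh
```

The report only ever touches the replica connection, so however ugly the query gets, the primary keeps taking orders.  The price is lag: the replica shows three orders until the next refresh picks up the fourth.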

Is it perfect?  Nope.  You still have to do things right, you still are querying against a highly normalized database (leading to the common “why is my report timing out” problem), there is overhead involved both in terms of actual CPU as well as human overhead in terms of additional monitoring and support, etc. etc. etc.

But many businesses find that it is a perfectly acceptable solution in many situations.

Relational Databases aren’t going anywhere, neither is reporting off of them

It really is okay to do reporting off an RDBMS.  Despite what people think, they are going to be around for a long, long time, and that’s okay too.  Will non-relational databases grow in the market?  That’s hard to predict, but because of the experience of running systems on the Web, I think they will.  And that’s okay too.  The idea that there’s only one way to architect a system is generally nutty anyway.

posted @ Sunday, February 07, 2010 4:20 PM | Feedback (0)