Tuesday, May 18, 2010

Visual Explain Reloaded

One of the new features in PostgreSQL 9.0 is EXPLAIN output in different markup formats (XML, JSON, YAML). The idea is that this makes it easier for programs (as opposed to humans) to parse the output.

Using one of the standard teaching examples, compare the outputs:
regression=> EXPLAIN SELECT * FROM tenk1 t1, tenk2 t2
WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2;
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Hash Join  (cost=227.95..700.46 rows=101 width=488)
   Hash Cond: (t2.unique2 = t1.unique2)
   ->  Seq Scan on tenk2 t2  (cost=0.00..434.00 rows=10000 width=244)
   ->  Hash  (cost=226.68..226.68 rows=101 width=244)
         ->  Bitmap Heap Scan on tenk1 t1  (cost=5.03..226.68 rows=101 width=244)
               Recheck Cond: (unique1 < 100)
               ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..5.01 rows=101 width=0)
                     Index Cond: (unique1 < 100)
(8 rows)
regression=> EXPLAIN (FORMAT JSON) SELECT * FROM tenk1 t1, tenk2 t2
WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2;

                     QUERY PLAN
-----------------------------------------------------
 [                                                  +
   {                                                +
     "Plan": {                                      +
       "Node Type": "Hash Join",                    +
       "Join Type": "Inner",                        +
       "Startup Cost": 227.95,                      +
       "Total Cost": 700.46,                        +
       "Plan Rows": 101,                            +
       "Plan Width": 488,                           +
       "Hash Cond": "(t2.unique2 = t1.unique2)",    +
       "Plans": [                                   +
         {                                          +
           "Node Type": "Seq Scan",                 +
           "Parent Relationship": "Outer",          +
           "Relation Name": "tenk2",                +
           "Alias": "t2",                           +
           "Startup Cost": 0.00,                    +
           "Total Cost": 434.00,                    +
           "Plan Rows": 10000,                      +
           "Plan Width": 244                        +
         },                                         +
         {                                          +
           "Node Type": "Hash",                     +
           "Parent Relationship": "Inner",          +
           "Startup Cost": 226.68,                  +
           "Total Cost": 226.68,                    +
           "Plan Rows": 101,                        +
           "Plan Width": 244,                       +
           "Plans": [                               +
             {                                      +
               "Node Type": "Bitmap Heap Scan",     +
               "Parent Relationship": "Outer",      +
               "Relation Name": "tenk1",            +
               "Alias": "t1",                       +
               "Startup Cost": 5.03,                +
               "Total Cost": 226.68,                +
               "Plan Rows": 101,                    +
               "Plan Width": 244,                   +
               "Recheck Cond": "(unique1 < 100)",   +
               "Plans": [                           +
                 {                                  +
                   "Node Type": "Bitmap Index Scan",+
                   "Parent Relationship": "Outer",  +
                   "Index Name": "tenk1_unique1",   +
                   "Startup Cost": 0.00,            +
                   "Total Cost": 5.01,              +
                   "Plan Rows": 101,                +
                   "Plan Width": 0,                 +
                   "Index Cond": "(unique1 < 100)"  +
                 }                                  +
               ]                                    +
             }                                      +
           ]                                        +
         }                                          +
       ]                                            +
     }                                              +
   }                                                +
 ]
(1 row)
For just reading the plan, the new formats are probably not better, although some might prefer them. The real win comes when you can feed this to a program to create a visualization. With the old format, parsing the output was complicated and error prone. With the new formats, it is easy. In fact, it was so easy that I couldn't resist writing a small visual explain program that renders plans through the graphviz library. Here is how you can use it:
regression=> \a\t
regression=> EXPLAIN (FORMAT JSON) SELECT * FROM tenk1 t1, tenk2 t2
WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2 \g |veung
Result:

Simple and useful. Get it from GitHub.

3 comments:

  1. I'd love to try it out, but I wasn't able to find the libraries it needs on OSX 10.6/Fink or Fedora 12. Where might I find them?

    ReplyDelete
  2. pygraphviz is at http://networkx.lanl.gov/pygraphviz/, simplejson is at http://undefined.org/python/#simplejson; the other modules are in the standard library.

    ReplyDelete