Because everything’s better with bacon

comprar sildenafil viagra prijs pilule levitra tadalafil moins cher vendo viagra vendo cialis cialis te koop acheter cialis sur internet acquisto viagra senza ricetta medicament cialis levitra italia vardenafil generico prix de cialis compro levitra viagra sans prescription medicament levitra acquisto levitra vardenafil bestellen trouver du levitra cialis pharmacie levitra sur internet generique du viagra sildenafil bestellen compro viagra viagra kosten comprar cialis cialis venta libre commander kamagra compra viagra acheter cialis en belgique ordina levitra kamagra te koop viagra pharmacie pharmacie en ligne acquisto viagra on line levitra france impuissance erection cialis kauf achat cialis 20mg levitra sur le net viagra donne generische levitra comprar cialis generico viagra ricetta acheter tadalafil commander du cialis acheter cialis internet viagra farmacia costo levitra cialis ohne rezept cialis vente libre viagra quanto costa levitra en pharmacie posologia viagra acquisto viagra zithromax generique ordina viagra acheter isotretinoine viagra rezeptfrei tadalafil generique comprar vardenafil generique du cialis commander du viagra vente levitra acheter kamagra 100mg generische viagra achat cialis propecia prix viagra dosaggio tadalafil 10 mg levitra generico compro levitra achat viagra en ligne acheter kamagra kamagra pharmacie aquisto viagra acheter cialis en espagne trouble erection viagra ordonnance cialis donne vente viagra cialis receta cialis vente en ligne vendita levitra viagra recensioni acheter zyban kamagra oral jelly acheter cialis en pharmacie acheter finasteride viagra te koop levitra venta cialis ricetta medica vardenafil generique sildenafil receta acheter cialis pas cher pastilla viagra viagra ricetta medica medicament impuissance comprar levitra generica impotenza sessuale tadalafil precio achat cialis generique viagra svizzera cialis belgique acheter clomid en france viagra cialis differenze cialis livraison rapide levitra rezeptfrei dysfonction erectile acheter du cialis cialis generico vente de cialis acheter cialis sur la net cialis effet secondaire kamagra rezeptfrei levitra precio acquisto viagra svizzera impuissance homme compro sildenafil prozac sans ordonnance pastilla sildenafil comprar viagra em portugal compro cialis levitra pharmacie prezzi levitra kamagra generique acquista levitra vendo viagra milano sildenafil rezeptfrei viagra fur frauen viagra effet secondaire cialis prescrizione

Saturday, June 20, 2009

OSBridge Recap

This week I attended Open Source Bridge here in Portland.

Typically, I managed to miss the keynotes both days. There is something about conferences which makes me sleep through my alarm.

It was really hard choosing which talks to attend. The results of the coin toss:

Wednesday:
Tcl/Tk: Grandpa might be old, but he can still kick your ass! I went to this primarily because I use Expect so much. (Well, I use Expect.pm, but I remember my roots.) Webb gave a good intro to Tcl/Tk (”Tickle-Tea-Kay!”) despite some initial technical difficulties. I finally figured out the brackets vs braces variable expansion.

Then I gave my talk. Thankfully, Impress did not surprise me. I now have my unicorn badge.

Spindle, Mutilate, & Metaprogram: This was really cool, although it seemed similar to things that came out of the Perl community a few years back. I’d like to see a throwdown between Markus Roberts & Damian Conway.

Assholes are killing your project: I only managed about 20 minutes of this talk before I got too depressed & had to leave. Sorry, Donnie! We’ll talk about this later.

I spent some time in the hall track & then hit the yoga session. This was an excellent pick-me-up after a day of talking and brain-filling, and set me up for my BoF and then some time at the pub.

Thursday:
Arrived too late for chromatic’s Intro to Parrot so hung out in the speaker lounge and watched Andy and Irving’s run-through of Virtualize vs Containerize: Fight! I love the mashups.

Next up was Emma McGratten’s Ask Forgiveness not Permission, which had a lot of excellent reasons (financial & otherwise) for using open source, but not many tips on how to subversively bring it into your organization. I’m sure I know someone who could give a talk about that. :cough:

Lunch today was the excellent KOiFusionPDX Food Cart! They came to the conference site & provided excellent korean tacos. (Yeah, I know, sounds weird – but TRUST ME.)

Speaking of trust…Trust the Vote sounds like an excellent project. Unfortunately the question period started devolving into political discussion, and I didn’t want to just dive right in there and ask them why the hell they’re using MySQL instead of PostgreSQL.

Maria Webster got her unicorn badge for Faking it Till I Make It. Check out her blog to see what geeky women are up to.

bzr vs git smackdown with Selena & Emma. I’ve already made up my mind (git all the way!), but it’s good to listen to alternatives.

The Meditiation for Geeks session didn’t go too well for me, because I was so tired that any time I got close to The Zone, I almost fell over onto a fellow PostgreSQL Smurf. Still, the yoga & meditation sessions are a great way to unwind prior to the post-con socializing & I’d like to see more of this.

Pg took over the room & had our PostgreSQL BoF, which replaced the regular PDXPUG meeting.

Josh Berkus was riding a bicycle around town, which made me inordinately happy. I want to see if we can provide more bikes for attendees next year.

Friday: The Unconference rocked my socks:
1. Emma Jane’s “Playing with yourself” about Open Source documentation teams. I am even sadder that I missed WOSCON. This got me totally excited to contribute to docs. (Especially for certain Perl modules – but that’s a discussion for another post.) Highlights: the conference team is working on a style guide, and a library of personas (which isn’t public yet)

2. I signed up with DayOn, a local volunteer effort. This will be fantastic once we can get people trained in what’s actually reasonable to ask for.

3. I did a Network Management Basics talk (”FCAPS: What the hell?!?”) with Ua and Adam. We talked about the FCAPS model & where various tools we use fit. A very high percentage of them are rrdtool-based, so we talked about that a bit as well. Adam showed us his munin install. I keep trying to find other people in town who are as into Net Management as I am…I sort of feel like I need a 12-step program sometimes. On the way out, Ua proposed a Super-Sekrit project which we’ll start working on in September. (Excitement!)


I saw up-close what it took to put on this conference and I’d like to congratulate the organizers on their success! Great job, and can’t wait until next year!

posted by gabrielle at 5:52 pm  

Saturday, June 6, 2009

Book Review (part I): Refactoring SQL Applications, with bonus queries

It’s taking me quite a while to wade through Stephan Faroult’s Refactoring SQL Applications. I just finished Chapter 2 & figured I’d better just go ahead with the review.

It’s quite humorous – I mean, there’s a section called “Queries of Death” – but this is some dense material, make no mistake. I tried to keep my copy nice so I could loan it to others, but I had to give up and get out The Pen, and it’s been highlighted and scribbled on.

Small gripe: the layout of the example queries makes them hard to read (capitalizing the conditionals would help). I’d also like to see more examples of result sets.

The section about statistics sparked a lively discussion on #pdxpug about cardinality vs selectivity*. What I thought I knew about indexes has been thrown on its head – don’t base your decisions just on whether or not the column in question is searched on.

One of the recommendations for “Sanity Checks” is to take a good look at your indexes. For starters, check for tables with no indexes, or a lot of indexes. There’s a sample query to pull the number of rows, indexes, and some info about those indexes for each table. Faroult only shows sample queries for Oracle, SQL Server, and MySQL, so Selena & I put our heads together & came up with an equivalent for PostgreSQL:

(Only works on 8.3; ditch the pg_size_pretty if you’re on an earlier version)

SELECT
    t.tablename,
    pg_size_pretty(c.reltuples::bigint) AS num_rows,
    c.reltuples AS num_rows,
    count(indexname) AS number_of_indexes,
    CASE WHEN x.is_unique = 1 THEN 'Y'
       ELSE 'N'
    END AS unique,
    SUM(case WHEN number_of_columns = 1 THEN 1
              ELSE 0
            END) AS single_column,
    SUM(case WHEN number_of_columns IS NULL THEN 0
             WHEN number_of_columns = 1 THEN 0
             ELSE 1
           END) AS multi_column
FROM pg_tables t
LEFT OUTER JOIN pg_class c ON t.tablename=c.relname
LEFT OUTER JOIN
       (SELECT indrelid,
           max(CAST(indisunique AS integer)) AS is_unique
       FROM pg_index
       GROUP BY indrelid) x
       ON c.oid = x.indrelid
LEFT OUTER JOIN
    ( SELECT c.relname as ctablename, ipg.relname as indexname, x.indnatts as number_of_columns FROM pg_index x
           JOIN pg_class c ON c.oid = x.indrelid
           JOIN pg_class ipg on ipg.oid = x.indexrelid  )
    as foo
    ON t.tablename = foo.ctablename
WHERE t.schemaname='public'
GROUP BY t.tablename, c.reltuples, x.is_unique
order by 2;

It took quite a bit of chocolate to wrap that up…afterwards, Selena decided that it would be neat to look at table & index sizes and see which indexes were being scanned and how many tuples fetched:

SELECT
    t.tablename,
    indexname,
    c.reltuples AS num_rows,
    pg_size_pretty(pg_relation_size(t.tablename)) AS table_size,
    pg_size_pretty(pg_relation_size(indexrelname)) AS index_size,
    CASE WHEN x.is_unique = 1  THEN 'Y'
       ELSE 'N'
    END AS unique,
    idx_scan AS number_of_scans,
    idx_tup_read AS tuples_read,
    idx_tup_fetch AS tuples_fetched
FROM pg_tables t
LEFT OUTER JOIN pg_class c ON t.tablename=c.relname
LEFT OUTER JOIN
       (SELECT indrelid,
           max(CAST(indisunique AS integer)) AS is_unique
       FROM pg_index
       GROUP BY indrelid) x
       ON c.oid = x.indrelid
LEFT OUTER JOIN
    ( SELECT c.relname as ctablename, ipg.relname as indexname, x.indnatts as number_of_columns, idx_scan, idx_tup_read, idx_tup_fetch,indexrelname FROM pg_index x
           JOIN pg_class c ON c.oid = x.indrelid
           JOIN pg_class ipg ON ipg.oid = x.indexrelid
           JOIN pg_stat_all_indexes psai ON x.indexrelid = psai.indexrelid )
    as foo
    ON t.tablename = foo.ctablename
WHERE t.schemaname='public'
order by 1,2;



cardinality: size of the relation (”number of rows in [something]“)
selectivity: percent of the relation that’s selected
cardinality * selectivity = number of tuples in your results set.

posted by gabrielle at 5:35 pm  

Saturday, June 6, 2009

When I get a TUIT…

I use Perl’s split function a lot more than I use join. Every time I use join I go through an iteration like this:

my @things = split(/-/, $value);
[do cool stuff to @things]
my $new_value = join(/:/, @things); #D'oh! Should be join(':', @things);

I understand why (split can take a regexp, join must be on a specific value) but that doesn’t mean it still doesn’t trip me up. When I get time, I’ll write something that will let me put my arrays back together the same way I took them apart.

posted by gabrielle at 4:52 pm  

Thursday, June 4, 2009

Optimizing aggregate use using subselects

Mark Wong is learning about the database called Portal that Portland State University researchers maintain. It contains a bunch of interesting data about our roads and traffic, gathered from lots of sensors installed along our highways in Oregon.

Here was the original query he posted for optimization:

-- Show each detector, its state, and what percentage of time it is ok or not.
SELECT t1.detectorid, description, COUNT(*) AS count, total,
       CAST(COUNT(*) AS NUMERIC) / CAST(total AS NUMERIC) * 100.0 AS percentage
FROM loopdata_2007_01_01 t1, loop_status,
     (SELECT detectorid, COUNT(*) AS total
      FROM loopdata_2007_01_01 t2
      GROUP BY detectorid
      ORDER BY detectorid) t3
WHERE t1.detectorid = t3.detectorid
  AND t1.status = loop_status.status
GROUP BY t1.detectorid, description, t3.total
ORDER BY t1.detectorid, description, t3.total

So, my first approach to optimizing this was to just materialize the data from the aggregates. COUNT(*) across these tables was killing the performance.

Here was my quick and dirty mockup, plus an example of a trigger to add to the schema in case we wanted to add it to the load scripts:

BEGIN;
-- DROP TABLE selena.detectorid_count;
CREATE TABLE selena.detectorid_count (
        detectorid integer not null PRIMARY KEY,
        count integer not null
);

-- DROP FUNCTION update_count;
CREATE OR REPLACE FUNCTION update_count() RETURNS trigger as
$update_count$
DECLARE
        _count integer;
BEGIN
        if (TG_OP = 'UPDATE') THEN
                -- Don't care about updates
                RETURN NULL;
        ELSIF (TG_OP = 'INSERT') THEN
                SELECT count INTO _count FROM selena.detectorid_count WHERE detectorid = NEW.detectorid;
                IF _count IS NULL THEN
                        INSERT INTO selena.detectorid_count VALUES(NEW.detectorid, 1);
                ELSE
                        UPDATE selena.detectorid_count d SET d.count = _count + 1 where d.detectorid = NEW.detectorid;
                END IF;
        END IF;
        RETURN NULL;
END;
$update_count$ LANGUAGE plpgsql;

-- DROP TRIGGER update_count on loopdata_2007_01_01;
CREATE TRIGGER update_count AFTER INSERT ON loopdata_2007_01_01 FOR EACH ROW EXECUTE PROCEDURE update_count();

-- DROP TABLE selena.test.agg;
CREATE TABLE selena.test_agg (
        detectorid integer not null,
        description text not null,
        total integer not null,        PRIMARY KEY (detectorid, description, total)
);

INSERT INTO selena.test_agg SELECT t1.detectorid, description, count(*) as count from loopdata_2007_01_01 t1, loop_status WHERE t1.status = loop_status.status GROUP B
Y t1.detectorid, description;

explain analyze SELECT t1.detectorid, description, t1.total as count, t3.count,
       CAST(t1.total AS NUMERIC) / CAST(t3.count AS NUMERIC) * 100.0 AS percentage
FROM selena.test_agg t1,
      selena.detectorid_count t3
WHERE t1.detectorid = t3.detectorid
ORDER BY t1.detectorid, description, t3.count;

ROLLBACK;
-- COMMIT;

The resulting query only took 7 ms!  So, Mark kind of thought that was cheating – since obviously, if I pre-calculate the values, the queries will be faster.

Looking at the EXPLAIN ANALYZE:

explain analyze SELECT t4.detectorid, t4.description, t4.count, total,
       CAST(t4.count AS NUMERIC) / CAST(total AS NUMERIC) * 100.0 AS percentage
FROM  (SELECT t1.detectorid, description, count(*) as count
        from loopdata_2007_01_01 t1, loop_status
        where t1.status = loop_status.status
        GROUP BY t1.detectorid, description) t4,
     (SELECT detectorid, COUNT(*) AS total
      FROM loopdata_2007_01_01 t2
      GROUP BY detectorid
      ORDER BY detectorid) t3
WHERE t4.detectorid = t3.detectorid
ORDER BY t4.detectorid, t4.description, t3.total
;
                                                                              QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=710083.07..711078.47 rows=398161 width=38) (actual time=22667.435..22668.159 rows=1386 loops=1)
   Sort Key: t1.detectorid, loop_status.description, t3.total
   Sort Method:  quicksort  Memory: 198kB
   ->  Merge Join  (cost=602939.70..651271.14 rows=398161 width=38) (actual time=17907.248..22665.778 rows=1386 loops=1)
         Merge Cond: (t1.detectorid = t3.detectorid)
         ->  GroupAggregate  (cost=544650.66..573487.36 rows=126200 width=22) (actual time=14246.234..18997.411 rows=1386 loops=1)
               ->  Sort  (cost=544650.66..551465.46 rows=2725920 width=22) (actual time=14242.307..17137.123 rows=2725920 loops=1)
                     Sort Key: t1.detectorid, loop_status.description
                     Sort Method:  external merge  Disk: 50000kB
                     ->  Hash Join  (cost=45.33..85556.32 rows=2725920 width=22) (actual time=0.076..5438.041 rows=2725920 loops=1)
                           Hash Cond: (t1.status = loop_status.status)
                           ->  Seq Scan on loopdata_2007_01_01 t1  (cost=0.00..44622.20 rows=2725920 width=4) (actual time=0.032..1685.744 rows=2725920 loops=1)
                           ->  Hash  (cost=25.70..25.70 rows=1570 width=22) (actual time=0.030..0.030 rows=6 loops=1)
                                 ->  Seq Scan on loop_status  (cost=0.00..25.70 rows=1570 width=22) (actual time=0.016..0.021 rows=6 loops=1)
         ->  Materialize  (cost=58289.03..58303.23 rows=631 width=10) (actual time=3660.990..3663.363 rows=1386 loops=1)
               ->  Subquery Scan t3  (cost=58289.03..58296.92 rows=631 width=10) (actual time=3660.983..3662.162 rows=631 loops=1)
                     ->  Sort  (cost=58289.03..58290.61 rows=631 width=2) (actual time=3660.980..3661.429 rows=631 loops=1)
                           Sort Key: t2.detectorid
                           Sort Method:  quicksort  Memory: 54kB
                           ->  HashAggregate  (cost=58251.80..58259.69 rows=631 width=2) (actual time=3659.995..3660.390 rows=631 loops=1)
                                 ->  Seq Scan on loopdata_2007_01_01 t2  (cost=0.00..44622.20 rows=2725920 width=2) (actual time=0.044..1633.167 rows=2725920 loops=1)
 Total runtime: 22686.007 ms
(22 rows)

We shaved 4000 ms off the original query!  Signficant savings, but not terrific performance. Ultimately, the solution will be to materialize the aggregate calculations ahead of time.

posted by selenamarie at 10:13 am  

Monday, June 1, 2009

Adventures in QA & Testing, Part I.

A couple of weeks ago at PDX.pm we had a Quality Assurance Tools panel discussion. I was invited to be on the panel for the “beginning tester” perspective.

QA concepts are already familiar to me from my time as a microbiologist/immunologist with the FDA. We had a QA division that evaluated us quarterly in the following areas*:
Safety:
- appropriate gear (lab goats, goggles, gloves appropriate for what you’re handling)
- OSHA regs (aisle spacing, fire extinguishers, etc)
- drills (fire, acid spill, etc)

Keeping things neat:
- do we have expired chemicals hanging around
- are we keeping our documentation up-to-date and readable
- record-keeping (temperature records for fridges & incubators)

Calibration:
- solution & culture standardization
- instrument calibration (making sure all the lasers point the right way!)

Here’s how I relate this to software testing:

Safety equipment:
Backups & version control. If you have these, you can get yourself out of anything. Remember to practice restoring your backups.

Keeping things neat:
perltidy & perlcritic are your friends. (I still say perlcritic needs to have a drinking game that goes along with it.) Keep your code & documentation fresh.

Calibration:
Testing. Making sure that, given a certain input/environment, your code will produce certain output. For a while I confused testing with error handling – but error handling only deals with a certain set of inputs/$ENV. You want to include error handling in your testing – make sure that something that should throw an error actually does.

Once I got a grip on what I wanted to do, I had to figure out how to accomplish it. Learning how to use the tools was the hard part. Hard enough, in fact, that it took me a year of sporadic false starts before I actually did anything productive. I’m not blessed with a separate QA team for my programming tasks; I have to do it myself, but that is no excuse for having crappy code.

My largest project is my own fork of NMIS, which has no existing tests. (It may now, I forked it a while ago.) I went for the low-hanging fruit & started by testing a simple subroutine that altered text input:

my $ifName = "Serial1/0/0.0";
is (convertIfName($ifName),
'serial1-0-0-0',
'convertIfName should replace non-alphanumeric chars with hyphens and lowercase the whole schmear'
);

Over the course of 3 days, I wrote something like 200 tests.(eta actually I think I mean assertions. I’m still learning the lingo.) These were all simple unit tests (basically, does this one little block of code do what it’s supposed to do). I haven’t started yet with integration testing (does it play well with others).

The Payoff:
- I gained a much better understanding of how my code works. And found some interesting glitches – edge cases that (in theory, anyway) would never be executed in the existing production environment, but should probably be tested for anyway in case I decide I want to use them somewhere else.
- I found a lot of unused code & duplicated code, and places I could use now-standard Perl modules (like I said, my fork is old).
- Best of all, I can change my code (at least the parts I have tests for) at will and not worry that I’m going to screw something else up.

Glitches I hit:
- I already have been bitten in the ass by an edge case.
- Haven’t experienced any time savings yet, due to the learning curve.
- I’m about even with aggravation savings at this point – I am taking the next steps (mocking objects, getting ready to test an actual script^Wprogram instead of a module) and it’s like starting all over.

Places I’ve found answers to my burning testing questions:
- perlmonks archives
- stackoverflow
- Perl-QA wiki
- my local .pm IRC

* artificial categories which made it easier to draw parallels with software testing; thanks to Peter Eschright for the great idea from his talk at the recent BarCamp Portland.

posted by gabrielle at 10:01 pm