My (MySQL) cursor gets the right help!

In my blog yesterday I wrote a simple cursor example for a client who is learning MySQL the results of which should have been a distinct list of customerIDs. Unfortunately the final value in the result set was repeated! After first assuming that the cursor SELECT DISTINCT query had failed, I later verified that the query was working correctly. In fact the final repeated value in the result set only had one entry in the source table to begin with! What is going on? Here is the original cursor:


CREATE PROCEDURE cid_orders()
BEGIN
-- Declare local variables
DECLARE done BOOLEAN DEFAULT 0;
DECLARE id INT;
-- Declare the cursor
DECLARE customer_ids CURSOR
FOR
SELECT DISTINCT customerID FROM orders ORDER BY customerID ASC;
-- so we are asking for DISTINCT customerID's (so if we have
-- a duplicate customerID we shouldn't get it) and
-- are ordering the result set in ASCENDING order by
-- the customerID column

-- Declare continue handler
-- when no more data remains to be fetched the '02000' state
-- is fired and done is set, causing the loop to end
DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET done=1;
-- Create a table to store the results
CREATE TABLE IF NOT EXISTS order1 (id INT);
-- Open the cursor
OPEN customer_ids;
-- Loop through all rows
REPEAT
-- Get the next id number
FETCH NEXT FROM customer_ids INTO id;
INSERT INTO order1(id) VALUES(id);
-- End of loop
UNTIL done = 1 END REPEAT;
-- Close the cursor
CLOSE customer_ids;
END;

So what is happening is that when no more data is seen in the cursor result set during the FETCH statement, SQLSTATE is set to ‘02000’ and the continue handler sets done to 1. However the loop doesn’t immediately exit. As a result, the INSERT statement following the FETCH is executed with the old customerID value and thus the final piece of data is written to the output table twice!

Yesterday I solved the problem before checking if the SELECT DISTINCT query was actually working correctly. By the time I thought to check, I had already written a nice solution that fixed the problem – I kept track of the previous customerID and wrapped the INSERT statement in an decision statement to keep the code from inserting a customerID twice. The only problem is that I solved the wrong problem (and constantly checking for a previous customerID is wasteful).

The actual problem is that the INSERT statement is fired following a FETCH in the original code regardless of the SQLSTATE. Unfortunately it doesn’t appear possible to check the SQLSTATE in the loop itself. However we can check the done variable and that is the first correct solution:


CREATE PROCEDURE cid_orders()
BEGIN
-- Declare local variables
DECLARE done BOOLEAN DEFAULT 0;
DECLARE id INT;
-- Declare the cursor
DECLARE customer_ids CURSOR
FOR
SELECT DISTINCT customerID FROM orders ORDER BY customerID ASC;
-- so we are asking for DISTINCT customerID's (so if we have
-- a duplicate customerID we shouldn't get it) and
-- are ordering the result set in ASCENDING order by
-- the customerID column

-- Declare continue handler
-- when no more data remains to be fetched the '02000' state
-- is fired and done is set, causing the loop to end
DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET done=1;
-- Create a table to store the results
CREATE TABLE IF NOT EXISTS order1 (id INT);
-- Open the cursor
OPEN customer_ids;
-- Loop through all rows
REPEAT
-- Get the next id number
FETCH NEXT FROM customer_ids INTO id;
IF done <> 1 THEN
INSERT INTO order1(id) VALUES(id);
END IF;
-- End of loop
UNTIL done END REPEAT;
-- Close the cursor
CLOSE customer_ids;
END;

The second (better) correct solution is to unconditionally exit the loop after the FETCH statement has raised the ‘02000’ SQLSTATE. This requires changing the REPEAT … UNTIL loop to a labeled LOOP:


CREATE PROCEDURE cid_orders3()
BEGIN
-- Declare local variables
DECLARE done BOOLEAN DEFAULT 0;
DECLARE id INT;
-- Declare the cursor
DECLARE customer_ids CURSOR
FOR
SELECT DISTINCT customerID FROM orders ORDER BY customerID ASC;
-- so we are asking for DISTINCT customerID's (so if we have
-- a duplicate customerID we shouldn't get it) and
-- are ordering the result set in ASCENDING order by
-- the customerID column

-- Declare continue handler
-- when no more data remains to be fetched the '02000' state
-- is fired and done is set, causing the loop to end
DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET done=1;
-- Create a table to store the results
CREATE TABLE IF NOT EXISTS order1 (id INT);
-- Open the cursor
OPEN customer_ids;
-- Loop through all rows
loop1: LOOP
-- Get the next id number
FETCH NEXT FROM customer_ids INTO id;
IF done = 1 THEN
LEAVE loop1;
END IF;
INSERT INTO order1(id) VALUES(id);
-- End of loop
END LOOP loop1;
-- Close the cursor
CLOSE customer_ids;
END;

Posted in MySQL, SQL | Tagged | Leave a comment

My (MySQL) cursor needs help!

I adapted a MySQL cursor example from Kevin Bedell’s MySQL cursor example. He has a cursor running in a stored procedure with the cursor results inserted into an output table.

I adapted this example for a client who is learning MySQL. Unfortunately the results were incorrect! In this example, I wrote a simple cursor that queries for distinct customerID’s and puts the result set in ascending order. I place the result set in ascending order because typically you know nothing about the data coming back and need to add constraints on the result set if you need to guarantee the order of the data returned. The customerID’s have to be ordered to fix the problem that came up (which I will reveal in just a second – it’s a shocker!).

So the orders table that is used looks like this:

Field Type
service VARCHAR(15)
subscribed DATE
canceled DATE
customerID INT(11)

Importantly, there is no primary key on the table. This is an example table taken from a local university and the idea is to accumulate customer orders for services in the orders table with the date they subscribed to something and the date they canceled the subscription. So there can be more than one customerID in the table because a customer can order more than one service.

And the orders table data might look like this:

service subscribed cancelled customerID
Phone 2013-01-01 NULL 0
Phone 2013-01-01 NULL 1
Android 2013-01-01 NULL 1
Phone 2013-01-01 NULL 2
Wireless 2013-01-01 NULL 3

So here is the adapted cursor in the cid_orders stored procedure:

CREATE PROCEDURE cid_orders()
BEGIN
-- Declare local variables
DECLARE done BOOLEAN DEFAULT 0;
DECLARE id INT;
-- Declare the cursor
DECLARE customer_ids CURSOR
FOR
SELECT DISTINCT customerID FROM orders ORDER BY customerID ASC;
-- so we are asking for DISTINCT customerID's (so if we have
-- a duplicate customerID we shouldn't get it) and
-- are ordering the result set in ASCENDING order by
-- the customerID column

-- Declare continue handler
-- when no more data remains to be fetched the '02000' state
-- is fired and done is set, causing the loop to end
DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET done=1;
-- Create a table to store the results
CREATE TABLE IF NOT EXISTS order1 (id INT);
-- Open the cursor
OPEN customer_ids;
-- Loop through all rows
REPEAT
-- Get the next id number
FETCH NEXT FROM customer_ids INTO id;
INSERT INTO order1(id) VALUES(id);
-- End of loop
UNTIL done = 1 END REPEAT;
-- Close the cursor
CLOSE customer_ids;
END;

Pretty straightforward. So what was the problem? When I queried the order1 table I got duplicate customerIDs!


mysql> SELECT * FROM order1;
+---+
|id |
+---+
| 0 |
| 1 |
| 2 |
| 4 |
| 7 |
| 7 |
+---+
5 rows in set (0.06 sec)

So the SELECT DISTINCT query didn’t seem to have worked as expected.
I then queried the orders table and it turns out that customerID 7 only has one service to begin with! customerID’s 1 and 2 have multiple services and in the result set they are returned as distinct entities. So something is happening in the cursor itself that has it repeating the final value in the result set. That something has to be the behavior when state ‘02000’, no more data, is encountered. This is thrown on the FETCH (presumably) but the following INSERT is still executed with the last value.

When I first saw this I assumed that the SELECT DISTINCT query hadn’t worked correctly (however it had!). So the following code does in fact fix the problem but it actually fixes the wrong problem: the assumption was that the SELECT DISTINCT query had failed (but it hadn’t) so the INSERT is wrapped in an IF statement that keeps track of the previous id value (thus the necessity of ordering customerID in ASC or DESC order). This still works of course but a better solution, one dealing with the INSERT repeating the last operation after the FETCH raised the ‘02000’ no more data state is forthcoming (it’s always better to solve the actual problem).


CREATE PROCEDURE cid_orders()
BEGIN
-- Declare local variables
DECLARE done BOOLEAN DEFAULT 0;
DECLARE id INT;
DECLARE last_id INT DEFAULT -1;
-- Declare the cursor
DECLARE customer_ids CURSOR
FOR
SELECT DISTINCT customerID FROM orders ORDER BY customerID ASC;
-- so we are asking for DISTINICT customerID's (so if we have
-- a duplicate customerID we shouldn't get it) and
-- are ordering the result set in ASCENDING order by
-- the customerID column

-- Declare continue handler
-- when no more data remains to be fetched the '02000' state
-- is raised and done is set, causing the loop to end
DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET done=1;
-- Create a table to store the results
CREATE TABLE IF NOT EXISTS order1 (id INT);
-- Open the cursor
OPEN customer_ids;
-- Loop through all rows
REPEAT
-- Get the next id number
FETCH NEXT FROM customer_ids INTO id;
-- there shouldn't be any duplicate id's in the fetched set
-- because the query asked for DISTINCT customerID's
-- nonetheless I saw them on my system so we check and see
-- if the previous id number has already been entered
IF last_id <> id THEN
INSERT INTO order1(id) VALUES(id);
END IF;
SET last_id = id;
-- End of loop
UNTIL done = 1 END REPEAT;
-- Close the cursor
CLOSE customer_ids;
END;

The resulting data in order1 is what was needed: a distinct set of customerID’s from the orders table. However a better solution addressing the real problem is needed because it is expensive (relatively) for a stored procedure to have to store the last customerID seen every time into a local variable.

So I got my cursor help and it worked but it wasn’t the exact right help that it needed!

Posted in MySQL, SQL | Tagged | Leave a comment

Hello World! in MySQL – there’s more than one way ….

How would we write the venerable “Hello World” program in MySQL? As with all SQL implementations, there’s more than one way.

The most obvious way is just to use a SELECT statement:
SELECT "Hello World!";

This results in:

+--------------+
| Hello World! |
+--------------+
| Hello World! |
+--------------+
Query OK, 1 row in set (0.00 sec)

in the MySQL command line client. However this is not the only way to write a “Hello World!” program in MySQL. We could create a session variable and use it to print “Hello World!”:


In the MySQL command line client:
mysql> SET @helloWorldStr = "Hello World!";
Query OK, 0 rows affected (0.05 sec)
mysql> SELECT @helloWorldStr AS "My Hello World";
+----------------+
| My Hello World |
+----------------+
| Hello World! |
+----------------+
Query OK, 1 row in set (0.00 sec)

We could also write a stored procedure:

mysql> delimiter // -- we have to reset the delimiter otherwise MySQL will interpret the semicolon in the procedure code as a delimiter as you are writing the procedure and will terminate.
mysql> CREATE PROCEDURE HelloWorld()
-> BEGIN
-> SELECT "Hello World!";
-> END; //
Query OK, 0 rows affected (1.97 sec)
mysql> delimiter ;
mysql> CALL HelloWorld();
+--------------+
| Hello World! |
+--------------+
| Hello World! |
+--------------+
Query OK, 1 row in set (0.00 sec)

Hey! That’s cheating! All you did was wrap the original SELECT statement in a procedure! That’s true, but now you never have to write the silly Hello World program again (at least not in the MySQL dialect).

Note that in the HelloWorld stored procedure we did not have to use a BEGIN … END block because there was just one SQL statement in the procedure. We could have just written:


mysql> CREATE PROCEDURE HelloWorld()
-> SELECT "Hello World!";

I added the BEGIN…END block for clarity.

We can modify the stored procedure to say Hello < Name > by adding an input parameter that takes a string (in this case a VARCHAR). However we have to also add an output parameter in this case and use a session variable in order to get the desired output:


mysql> delimiter // -- we have to reset the delimiter
mysql> CREATE PROCEDURE HelloName(IN name VARCHAR(100)
-> OUT res VARCHAR(120))
-> BEGIN
-> SELECT CONCAT("Hello ", name) INTO res;
-> END; //
Query OK, 0 rows affected (1.97 sec)
mysql> delimiter ;
mysql> SET @hw = ""; -- create a session variable called hw
mysql> CALL HelloName("Cindy", @hw); -- @hw has the result
Query OK, 1 row affected (0.13 sec)
mysql> SELECT @hw;
+-------------+
| @hw |
+-------------+
| Hello Cindy |
+-------------+
Query OK, 1 row in set (0.04 sec)

Again in this example we didn’t actually have to use a BEGIN…END block in the procedure HelloName since the procedure is actually just a single SQL statement (BEGIN…END blocks are used to denote blocks for multiple SQL statements). So we could have written the procedure HelloName as:


mysql> CREATE PROCEDURE HelloName(IN name VARCHAR(100)
-> OUT res VARCHAR(120))
-> SELECT CONCAT("Hello ", name) INTO res;

Finally we can round out our Hello World programs by creating a function to return “Hello ” < Name >, storing the result into a session variable and then SELECTing the session variable:


mysql> CREATE FUNCTION HelloName(name VARCHAR(100))
-> RETURNS VARCHAR(120) DETERMINISTIC
-> RETURN CONCAT("Hello ", name);
mysql> SET @fres = HelloName("Walter");
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT @fres;
+--------------+
| @fres |
+--------------+
| Hello Walter |
+--------------+
Query OK, 1 row in set (0.04 sec)

The DETERMINISTIC clause in the function definition tells MySQL that name input will have the same output. So if name is “David”, that input will always have the same output – “Hello David”.

Posted in MySQL, SQL | Tagged , , , | Leave a comment

A QuantLib Date demo

Here’s a short Date class program from the QuantLib library exercising the class functionality:


#include <ql/QuantLib.hpp>
#include <boost/timer.hpp>
#include <iostream>
#include <iomanip>

using namespace QuantLib;

int main(int, char* []) {
Date myDate(19, October, 2012);
std::cout << myDate << std::endl;

myDate++;
std::cout << myDate << std::endl;

myDate += 12*Days;
std::cout << myDate << std::endl;

myDate -= 2*Months;
std::cout << myDate << std::endl;

myDate--;
std::cout << myDate << std::endl;

return 0;
}

Compile it: g++ -I/usr/lib/include -L/usr/lib/lib -lQuantLib dateTest.cpp -o dateTest

Run it: > ./dateTest

October 19th, 2012
October 20th, 2012
November 1st, 2012
September 1st, 2012
August 31st, 2012

Posted in QuantLib | Tagged | Leave a comment

Problem 23 – ProjectEuler.net

I solved problem 23 in the Project Euler computational math problems yesterday. This makes 52 problems that I’ve solved there and only 346 more to go! Basically aside from the first two problems, many of the problems at Project Euler appear to be impossible.

Problem 23 asks for the sum of all numbers that cannot be written as the sum of two abundant numbers. An abundant number is a number where the sum of proper divisors of the number are greater than the number: 12 is an abundant number because it’s proper divisors 1, 2, 3, 4, 6 sum to 16. So the first number that can be written as the sum of two abundant numbers is 24 as it is the sum of 12 + 12 (12 being an abundant number).

An interesting problem. Check out ProjectEuler.net for more interesting computational math problems!

Posted in computational math | Leave a comment

Running QuantLib test suite

A client has asked me to help him learn the open source financial analysis package QuantLib. I finally got around to installing QuantLib and am currently running the test suite. The only problem is that the test suite doesn’t seem to do anything. However I found a post by Luigi, one of the main developers, that suggested adding the BOOST_TEST_LOG_LEVEL=message environment variable to my environment. Worked perfectly and now instead of seeing a message saying that the software is running 497 (or so) cases and a blank console, I’m getting reassuring progress messages. Now this test might go on for a few hours or so …. here’s some of the output:

> ./quantlib-test-suite
===================================
Testing QuantLib 1.2.1
QL_NEGATIVE_RATES defined
QL_EXTRA_SAFETY_CHECKS undefined
QL_DISABLE_DEPRECATED undefined
QL_USE_INDEXED_COUPON undefined
===================================
Running 497 test cases…
Testing Barone-Adesi and Whaley approximation for American options…
Testing Bjerksund and Stensland approximation for American options…
Testing Ju approximation for American options…
Testing finite-difference engine for American options…
Testing finite-differences American option greeks…
Testing finite-differences shout option greeks…
Testing array construction…
Testing analytic continuous geometric average-price Asians…
Testing analytic continuous geometric average-price Asian greeks…
Testing analytic discrete geometric average-price Asians…
Testing analytic discrete geometric average-strike Asians…
Testing Monte Carlo discrete geometric average-price Asians…
Testing Monte Carlo discrete arithmetic average-price Asians…
Testing Monte Carlo discrete arithmetic average-strike Asians…
Testing discrete-averaging geometric Asian greeks…
Testing use of past fixings in Asian options…
Testing Levy engine for Asians options…
Testing consistency between fair price and fair spread…
Testing implied bond value against asset-swap fair price with null spread…
Testing relationship between market asset swap and par asset swap…
Testing clean and dirty price with null Z-spread against theoretical prices…
Testing implied generic-bond value against asset-swap fair price with null spread…
Testing market asset swap against par asset swap with generic bond…
Testing clean and dirty price with null Z-spread against theoretical prices…
Testing clean and dirty prices for specialized bond against equivalent generic bond…
Testing asset-swap prices and spreads for specialized bond against equivalent generic bond…

Posted in QuantLib | Leave a comment

Algorithm to find the source of a rumor or epidemic in a network

From the write-up at Phys.org:

A team of EPFL scientists has developed an algorithm that can identify the source of an epidemic or information circulating within a network, a method that could also be used to help with criminal investigations.

and perhaps it can be used to find the source of a bug or a set of bugs or possibly extended to find the source of an observable characteristic in software.

The preprint is available here.

Posted in algorithms | Leave a comment