<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Jeremy Cook &#187; SQL</title>
	<atom:link href="http://jeremycook.ca/tag/sql/feed/" rel="self" type="application/rss+xml" />
	<link>http://jeremycook.ca</link>
	<description>Random musings on web development and PHP</description>
	<lastBuildDate>Mon, 30 Jan 2012 02:31:54 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Creating 50,000 unique values</title>
		<link>http://jeremycook.ca/2010/07/11/creating-50000-unique-values/</link>
		<comments>http://jeremycook.ca/2010/07/11/creating-50000-unique-values/#comments</comments>
		<pubDate>Mon, 12 Jul 2010 01:57:25 +0000</pubDate>
		<dc:creator>Jeremy Cook</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[Web Development]]></category>

		<guid isPermaLink="false">http://jeremycook.ca/?p=135</guid>
		<description><![CDATA[A client at work wants to run a promotion where a customer will receive a card with a unique 8 digit code on it when they buy something. They will then be able to visit the website to find out if they are a winner or to get a chance to enter a prize draw. [...]]]></description>
			<content:encoded><![CDATA[<p>A client at work wants to run a promotion where a customer will receive a card with a unique 8 digit code on it when they buy something. They will then be able to visit the website to find out if they are a winner or to get a chance to enter a prize draw. I have to put all of the code together to manage this and thought I would run a few experiments to try out a few ideas, the first of which was to generate the 50,000 unique codes needed for the competition. I didn&#8217;t think this would be a particularly difficult task (and in reality it wasn&#8217;t) but I hit a number of problems while doing this that revealed some interesting things about the variations of running PHP on different operating systems and issues with handling large datasets.</p>
<h2>Development environment</h2>
<p>My development environment is Windows 7 with Apache (compiled using VC9), PHP 5.3.2 and MySQL 5.1.47. For coding I use NuSphere&#8217;s PHPED. All of my timings were made using the built in profiler that comes with PHPED and the DBG PHP debugging extension. Why is this information important? Hopefully that will become clear later.</p>
<h2>My first attempt</h2>
<p>My first attempt used a while loop to iterate 50,000 times to create the values. On each iteration a code was generated and a prepared statement executed to insert the code into the database. The column holding the code has a unique index on it, causing an exception to be thrown if PHP generates a duplicate 8 digit code. This is then caught in the inner catch block, which causes the loop to go through another iteration. If the code is inserted successfully the counter is incremented and the loop carries on. Code for this is below:</p>
<pre class="brush: php; title: ; notranslate">

&lt;?php
 //Up the execution time to 15 minutes as this takes ages to run
 ini_set('max_execution_time', 900);
 try {
 //Create a PDO connection to the database
 $db = new PDO('mysql:dbname=test;host=localhost', 'root', 'PASSWORD');
 //Set the PDO error mode to exceptions
 $db-&gt;setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
 //Delete all entries from the test table
 $db-&gt;exec('TRUNCATE test');
 //Prepare the statement
 $st = $db-&gt;prepare('INSERT INTO test(code) VALUES(?)');
 //Bind the parameter to the $code variable.
 //When the statement is executed whatever value is in $code will be used as the value for the parameter
 $st-&gt;bindParam(1, $code);
 //Set the counter for the main loop
 $i = 0;
 //Possible characters for the code string
 $possible = &quot;0123456789bcdfghjkmnpqrstvwxyz&quot;;
 //Length of the possible characters (0 based)
 $len = strlen($possible) - 1;
 //Loop 50,000 times!
 while ($i &lt; 50000) {
 try {
 //Reset the $code variable
 $code = &quot;&quot;;
 //Create a counter for the inner loop
 $j = 0;
 //add random characters to $code until it is 8 characters long
 do {
 //Get a random character
 $char = $possible[mt_rand(0, $len)];
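 //we don't want this character if it's already in the code
 //(strpos() returns 0 for a match at the start of the string, so compare against false explicitly)
 if (strpos($code, $char) === false) {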
 $code .= $char;
 $j++;
 }
 } while ($j &lt; 8);
 //Execute the statement. The value of $code will be used as the parameter
 $st-&gt;execute();
 $i++;
 }
 catch (PDOException $e) {
 //If an exception is caught here it's probably a duplicate value in the code column. Continue to get a new value.
 continue;
 }
 }
 }
 catch (PDOException $e) {
 //Any other exceptions caught here.
 echo $e-&gt;getMessage();
 }
?&gt;
</pre>
<p>The major problem with this was the time taken to execute it: after 15 minutes I still didn&#8217;t have 50,000 rows in the database but only something like 38,000! The code worked but was amazingly slow. After running the code through a profiler I found that 14 mins 54 seconds were spent on the line &#8216;$st-&gt;execute()&#8217; to insert the row. I was quite amazed by this as I didn&#8217;t expect the database to add quite such an overhead. The other important point to mention here is that I used the mt_rand() function. Why that&#8217;s important will become clear in a moment.</p>
<p>At this point I had two main questions:</p>
<ol>
<li>What&#8217;s the quickest way to create 50,000 unique 8 digit codes?</li>
<li>What&#8217;s the quickest way to insert these rows into the database?</li>
</ol>
<p>I should add that the need to find answers to both of these questions was somewhat academic and more to satisfy my own curiosity. This code would never be run on a live server but would be used on my development box to generate a database table. This could then be dumped into a SQL file and used to create the table on a live server. As a result performance was not the number one consideration in this case. I still wanted to run this in less than 15 minutes though!</p>
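<p>For the first question, one option is to take the database out of the loop altogether and weed out duplicates in PHP itself. The following is only a minimal sketch of that idea under a few assumptions: make_unique_codes() is a made-up helper name, it relies on PHP array keys being unique, and unlike the first attempt above it lets a character repeat within a code.</p>
<pre class="brush: php; title: ; notranslate">

&lt;?php
//Hypothetical helper: build $number unique codes of $length characters in memory
function make_unique_codes($number, $length = 8) {
    //Same character set as the first attempt
    $possible = &quot;0123456789bcdfghjkmnpqrstvwxyz&quot;;
    $max = strlen($possible) - 1;
    $codes = array();
    while (count($codes) &lt; $number) {
        $code = &quot;&quot;;
        for ($j = 0; $j &lt; $length; $j++) {
            $code .= $possible[mt_rand(0, $max)];
        }
        //Array keys are unique, so a duplicate code simply overwrites itself
        $codes[$code] = true;
    }
    //PHP turns numeric-looking keys into integers, so cast them back to strings
    return array_map('strval', array_keys($codes));
}

$codes = make_unique_codes(1000);
echo count($codes);
?&gt;
</pre>
<p>Because duplicates never reach the database, the unique index is only there as a safety net and the insert can be done in one go afterwards.</p>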
<h2>The &#8216;Hamlet&#8217; solution</h2>
<p>At this point I turned to the <a href="http://www.open.ac.uk/">Open University</a> web development forums for help. I learnt <a href="http://www3.open.ac.uk/study/undergraduate/qualification/c39.htm">web development at the OU</a> and their forums are open to students, tutors and ex-students like myself. Some very clever people hang out there and I knew I would get some good suggestions. Michelle Hoyle and Keith Evetts, two of the tutors on the open source web development tools course (which focuses on PHP as a server side scripting language), engaged in a discussion with me about how to generate codes quickly. From the beginning I noticed that the ideas they were posting were not working for me. They were using PHP functions like str_shuffle(), array_rand() and rand() and producing 50,000 unique values quickly. I had independently tried str_shuffle() and array_rand() too but had given up on them as they generated huge numbers of duplicate codes after a while. At this point I couldn&#8217;t understand why Keith and Michelle could produce solutions using these functions that I couldn&#8217;t get to work.</p>
<p>Keith Evetts came up with an idea which seemed crazy at the time but which generates random values very well and quickly. His idea was to take a large piece of text, encrypt it with mcrypt_encrypt(), discard all non alpha-numeric characters from the encrypted text and use what remains to generate the 8 digit codes. He used the third act of Hamlet as the text (hence my calling this the Hamlet solution). This worked extremely well except for the fact that even the third act of Hamlet couldn&#8217;t be used to produce 50,000 codes in one pass. The solution was to loop over the code generation, encrypting the text using different initialisation vectors and keystrings, until 50,000 unique values had been produced. I also worked out at this time that the fastest way to insert this many values into a MySQL table was to use the &#8216;LOAD DATA INFILE&#8217; command. I used Keith&#8217;s function to generate the codes, writing these to a text file before using &#8216;LOAD DATA INFILE&#8217; to insert the records into the database. The PHP code ran in 1.9 seconds while the database insert took around 6 seconds. Here&#8217;s the PHP code:</p>
<pre class="brush: php; title: ; notranslate">

&lt;?php
 /* ---------  array function generate_codes ( int $number, string $plaintext , string $keystring, $length ) -----------
Keith Evetts 3 July 2010 license: LGPL 2.  This notice and author name must remain intact.
Args: number of codes to be generated,
 plaintext string from which to generate them, minimum 100 chars (optimal length is 4000 chars)
 keystring in plain text; minimum 12 chars
 length of code strings to be generated
Returns: enumerated array of  unique alphanumeric codes of length $length
Requires: PHP mcrypt lib with Rijndael 256 (v. 2.4 + of mcrypt)
-------------------------------------------------------------------------------------------------------------------------------------- */
 function generate_codes ( $number, $plaintext, $keystring, $length = 8 ) {
 switch (true) {
 case ( ! is_int($number) )                                                :
 case ( ! is_int($length) )                                                  :
 case ( ! is_string($plaintext)  || strlen($plaintext ) &lt; 100)   :
 case ( ! is_string($keystring) || strlen($keystring) &lt; 12 )    :
 throw new Exception (' function generate_codes called with incorrect params ');
 break;
 // default is proceed
 }
 // use the same text and randomly different keys to reach desired number of codes
 // for e.g. 50000 codes this will take several passes
 $unique_array = array();
 do {
 // get a new key
 $key =  substr ( sha1 ( str_shuffle ( $keystring )  ) , 0 , 32 );
 // get a new initialisation vector
 $iv = substr ( sha1 ( str_shuffle ( &quot;the slings and arrows of outrageous fortune&quot; )  ) , 0, 32 );
 $ciphertext = mcrypt_encrypt (   MCRYPT_RIJNDAEL_256,
 $key,
 $plaintext,
 MCRYPT_MODE_CBC,
 $iv
 );
 /* clean out non-alphanumeric chars at some cost to code redundancy */
 $ciphertext = preg_replace( '/[^2346789abdefhjkmnprtwxyz]/' ,  &quot;&quot; , strtolower($ciphertext) );
 $codearray = str_split ( $ciphertext, $length );
 // dump leftover element at end
 array_pop( $codearray );
 $size = sizeof($codearray);
 for ($i = 0; $i &lt; $size; $i++) $unique_array[] = $codearray[$i];
 /*somewhat amazingly, it is far quicker to enlarge the array by adding elements one at a time in a for loop, than to use array_merge() ! */
 } while ( sizeof ( $unique_array ) &lt;= ( $number  + 1 ) ) ;
 // now remove any duplicates at end of whole process
 $unique_array = array_unique ( $unique_array );
 return array_slice($unique_array, 0, $number);
 }
 try {
 $codes = generate_codes(50000, file_get_contents('plaintext_Hamlet_Act3.txt'), 'This is the keystring');
 //Only write the file if the codes were generated successfully
 //One code per line, ready for LOAD DATA INFILE
 $file = fopen('codes.txt', 'w');
 foreach($codes as $code) {
 fwrite($file, &quot;$code\r\n&quot;);
 }
 fclose($file);
 }
 catch (Exception $e) {
 echo $e-&gt;getMessage();
 }
?&gt;
</pre>
<p>This was clearly the winner for me on Windows but it still didn&#8217;t explain why PHP&#8217;s random functions wouldn&#8217;t work for me on Windows.</p>
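<p>The &#8216;LOAD DATA INFILE&#8217; step itself isn&#8217;t shown in the script above. A minimal sketch of how it could be run through PDO follows. Several things here are assumptions: build_load_sql() is a made-up helper, the table and column names are borrowed from my first attempt, and the LOCAL variant is used so that the file is read from the machine running PHP rather than from the database server.</p>
<pre class="brush: php; title: ; notranslate">

&lt;?php
//Hypothetical helper: build the LOAD DATA statement for a file holding one code per line
function build_load_sql($path, $table, $column) {
    //The line terminator matches the \r\n written by the generating script
    return &quot;LOAD DATA LOCAL INFILE '&quot; . addslashes($path) . &quot;' INTO TABLE `$table` &quot;
         . &quot;LINES TERMINATED BY '\\r\\n' (`$column`)&quot;;
}

echo build_load_sql('codes.txt', 'test', 'code');

//Usage (assumes a MySQL server and PDO driver that permit LOCAL INFILE):
//$db = new PDO('mysql:dbname=test;host=localhost', 'root', 'PASSWORD', array(PDO::MYSQL_ATTR_LOCAL_INFILE =&gt; true));
//$db-&gt;exec(build_load_sql('codes.txt', 'test', 'code'));
?&gt;
</pre>
<p>The whole file is loaded by a single statement, which avoids the per-row cost of 50,000 separate calls to execute().</p>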
<h2>&#8216;Random&#8217; functions on Windows-a theory</h2>
<p>At this point I was left wondering why the various &#8216;random&#8217; functions on Windows had performed so badly for me. I knew that Michelle Hoyle uses a Mac and that Keith Evetts&#8217;s scripts executed without a problem on a web server. I began to think that the problem was PHP on Windows. To test this out I uploaded one of the scripts that generated huge numbers of duplicates for me to a live server running CentOS Linux and everything worked without any problems. So what&#8217;s going on? I have a theory and once again I need to thank Keith Evetts for pointing me on the way to this. It seems that the PHP function rand() is simply a wrapper for the operating system&#8217;s native random function (see <a href="http://cod.ifies.com/2008/05/php-rand01-on-windows-openssl-rand-on.html">here</a> for more details). The PHP manual states:</p>
<blockquote>
<blockquote><p><strong>Note</strong>: On some platforms (such as Windows), <a href="http://ca.php.net/manual/en/function.getrandmax.php">getrandmax()</a> is only 32767. If you require a range larger than 32767, specifying <em><tt>min</tt></em> and <em><tt>max</tt></em> will allow you to create a range larger than this, or consider using <a href="http://ca.php.net/manual/en/function.mt-rand.php">mt_rand()</a> instead.</p></blockquote>
</blockquote>
<p>My theory is that functions such as str_shuffle() and array_rand() are also using the native operating system random function. It just happens that the function on Linux/Unix platforms is far better than the one available under Windows, which would explain why the scripts run so differently on different platforms. Normally this wouldn&#8217;t create any problems but when you&#8217;re dealing with very large datasets where randomness is a necessity it becomes one. This would also explain why my initial attempt had no problems generating unique values, as I was using the mt_rand() function. This is based on an algorithm known as the Mersenne Twister, which generates better random numbers and does not rely on the operating system&#8217;s random number generator. I don&#8217;t know C or the PHP source code well enough to confirm this but this is my strong hunch. Is someone able to confirm this?</p>
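<p>One way to check a theory like this on a given machine is to count the duplicates each approach produces. The script below is a minimal sketch of such a test: count_duplicates() is a made-up helper and the two closures are only illustrations of the str_shuffle() and mt_rand() approaches. Bear in mind that from PHP 7.1 onwards rand() is an alias of mt_rand() and str_shuffle() and array_rand() use the Mersenne Twister too, so the difference should only show up on older versions.</p>
<pre class="brush: php; title: ; notranslate">

&lt;?php
//Hypothetical helper: call $generator $count times and count how many results were repeats
function count_duplicates($generator, $count) {
    $seen = array();
    $duplicates = 0;
    for ($i = 0; $i &lt; $count; $i++) {
        $value = call_user_func($generator);
        if (isset($seen[$value])) {
            $duplicates++;
        } else {
            $seen[$value] = true;
        }
    }
    return $duplicates;
}

$possible = &quot;0123456789bcdfghjkmnpqrstvwxyz&quot;;

//Largest values the two generators can return on this platform
echo &quot;getrandmax(): &quot; . getrandmax() . &quot;\n&quot;;
echo &quot;mt_getrandmax(): &quot; . mt_getrandmax() . &quot;\n&quot;;

//8 character code taken from a shuffled copy of the character set
$shuffled = function () use ($possible) {
    return substr(str_shuffle($possible), 0, 8);
};
//8 character code built one character at a time with mt_rand()
$twister = function () use ($possible) {
    $code = &quot;&quot;;
    for ($j = 0; $j &lt; 8; $j++) {
        $code .= $possible[mt_rand(0, strlen($possible) - 1)];
    }
    return $code;
};

echo &quot;str_shuffle() duplicates: &quot; . count_duplicates($shuffled, 50000) . &quot;\n&quot;;
echo &quot;mt_rand() duplicates: &quot; . count_duplicates($twister, 50000) . &quot;\n&quot;;
?&gt;
</pre>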
<h2>Conclusion</h2>
<p>As I said at the beginning what should have been a fairly simple exercise turned into something a little more involved and ultimately informative. I would suggest that if you need to make a large number of random values that these are the guidelines you might want to follow:</p>
<ul>
<li>If you&#8217;re going to be running the script on a Linux/Unix box or you&#8217;re developing on such a system go ahead and use whichever PHP functions you like. As the underlying OS random number generation is better than on Windows you won&#8217;t run into any problems and will almost certainly get better performance this way.</li>
<li>If you&#8217;re running on Windows your options are more limited. Here I would suggest that you use either mt_rand() or the &#8216;Hamlet&#8217; approach if you require more than 32,768 random values.</li>
</ul>
<p>I know that all programs rely on the OS they&#8217;re executing on but this discrepancy seems quite big to me. Is there any way that PHP&#8217;s &#8216;random&#8217; functions could be re-written to take advantage of the Mersenne Twister? With my limited knowledge of the PHP core code that would seem to me to offer the combination of good random generation and consistency across platforms. Of course, if it were that simple I&#8217;m sure someone else would already have done it.</p>
]]></content:encoded>
			<wfw:commentRss>http://jeremycook.ca/2010/07/11/creating-50000-unique-values/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>A Quick Tip</title>
		<link>http://jeremycook.ca/2010/05/15/a-quick-tip/</link>
		<comments>http://jeremycook.ca/2010/05/15/a-quick-tip/#comments</comments>
		<pubDate>Sat, 15 May 2010 14:00:10 +0000</pubDate>
		<dc:creator>Jeremy Cook</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[PDO]]></category>

		<guid isPermaLink="false">http://jeremycook.ca/?p=84</guid>
		<description><![CDATA[I haven&#8217;t written anything here for ages due to illness, work and life getting in the way. I&#8217;ve got a longer post brewing that I&#8217;ll hopefully add in the next couple of days but for now here&#8217;s a quick tip that I hope someone will find useful. I recently had a situation where I wanted [...]]]></description>
			<content:encoded><![CDATA[<p>I haven&#8217;t written anything here for ages due to illness, work and life getting in the way. I&#8217;ve got a longer post brewing that I&#8217;ll hopefully add in the next couple of days but for now here&#8217;s a quick tip that I hope someone will find useful.</p>
<p>I recently had a situation where I wanted to use an array of values as bound parameters in a SQL IN clause. Easy enough to do except the array was of variable length and I didn&#8217;t know how long it would be each time the script was run. Here&#8217;s the solution I came up with.</p>
<pre class="brush: php; title: ; notranslate">

&lt;?php

try {

$array = array('some value', 'another', 'another');//Variable length array, unknown length before runtime

$db = new PDO(CONNSTR, USERNAME, PASS);

$sql = &quot;SELECT SomeColumn FROM table WHERE SomeOtherColumn IN (&quot; . implode(',', array_fill(0, count($array), '?')) . ') ORDER BY SomeColumn';

$st = $db-&gt;prepare($sql);

$st-&gt;execute($array);

}

catch (PDOException $e) {}//Handle or log the error here

?&gt;
</pre>
<p>Quick and easy!</p>
]]></content:encoded>
			<wfw:commentRss>http://jeremycook.ca/2010/05/15/a-quick-tip/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Handling Binary Data with PDO</title>
		<link>http://jeremycook.ca/2010/02/21/handling-binary-data-with-pdo/</link>
		<comments>http://jeremycook.ca/2010/02/21/handling-binary-data-with-pdo/#comments</comments>
		<pubDate>Sun, 21 Feb 2010 17:27:11 +0000</pubDate>
		<dc:creator>Jeremy Cook</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[Web Development]]></category>
		<category><![CDATA[PDO]]></category>

		<guid isPermaLink="false">http://jeremycook.ca/?p=64</guid>
		<description><![CDATA[This post looks at an issue with handling binary data from a database using PDO and a partial workaround for the problem.]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m a great fan of the <a href="http://www.php.net/pdo">PDO</a> database access library in PHP 5 and use it for all of my database work in PHP. I love its clean, object oriented syntax and great support for prepared statements. I also like the fact that it supports most of the common database engines. Although all of my dev work in PHP so far has been with MySQL I like the fact that if I needed to use MS SQL Server, Oracle or any of the other big RDBMSs I could use the same PDO syntax to access them rather than learning a new database access library. However, there do seem to be some bugs in PDO according to what I&#8217;ve read on the web. While I haven&#8217;t encountered most of them and can&#8217;t comment on them I&#8217;d like to write about one that I ran into the other day and how I worked around it.</p>
<p>I have a project that I&#8217;m working on where I&#8217;m storing some images in a database as binary data. PDO allows you to bind a file handle to a parameter in a prepared statement and when the statement is executed the contents of the file are slurped into the database. This works perfectly but the problem comes when getting the image out of the database again to display it. According to the <a href="http://www.php.net/manual/en/pdo.lobs.php">PHP manual</a> the following code should work:</p>
<pre class="brush: php; title: ; notranslate">

&lt;?php
$db = new PDO('odbc:SAMPLE', 'db2inst1', 'ibmdb2');
$stmt = $db-&gt;prepare(&quot;select contenttype, imagedata from images where id=?&quot;);
$stmt-&gt;execute(array($_GET['id']));
$stmt-&gt;bindColumn(1, $type, PDO::PARAM_STR, 256);
$stmt-&gt;bindColumn(2, $lob, PDO::PARAM_LOB);
$stmt-&gt;fetch(PDO::FETCH_BOUND);

header(&quot;Content-Type: $type&quot;);
fpassthru($lob);
?&gt;
</pre>
<p>Binding a column from a result set to a variable using PDO::PARAM_LOB is supposed to return a stream resource into the variable when PDO::fetch() is called. This stream can then be operated on using any PHP function that handles files. Unfortunately there&#8217;s a <a href="http://bugs.php.net/bug.php?id=40913">bug</a> which means that instead of returning a stream into $lob PDO returns a string containing the binary data. When this is then passed to fpassthru() an error is triggered. Fortunately there&#8217;s a simple fix for displaying the image: replace the call to fpassthru() with echo or print. Since the browser is expecting an image after the call to header() writing the binary data using echo or print has the same effect as calling fpassthru(). In my code I&#8217;ve added the following just in case this bug is fixed in a future release:</p>
<pre class="brush: php; title: ; notranslate">

if (is_string($lob)) {

echo $lob;

} else {

fpassthru($lob);

}
</pre>
<p>This neatly gets around the problem if you just want to send the binary data back to the browser to be displayed. Anything more requiring the use of any file functions or image editing functions would need quite a few contortions in the code. The information from the database would probably need to be written to a temporary file to allow it to be operated on. This bug was first reported almost three years ago in PHP 5.2.6 and it&#8217;s still not fixed today in the most recent version, 5.3.1. It would be great if this bug was finally taken care of.</p>
<p><strong>Edit:</strong> Joshua Johnston has posted a comment below that explains how to convert a string of data into a stream using the data stream wrapper. I&#8217;ve tried it out and it works very well. I think it gives a cleaner solution to the problem and allows the data returned from the database to be manipulated with file functions.</p>
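<p>For reference, the general shape of the data stream wrapper technique is sketched below. This is not Joshua&#8217;s code: string_to_stream() is a made-up helper and the $lob value is a stand-in for whatever string PDO handed back.</p>
<pre class="brush: php; title: ; notranslate">

&lt;?php
//Hypothetical helper: wrap a string of binary data in a read-only stream
function string_to_stream($data) {
    //base64 keeps arbitrary bytes intact inside the data: URI
    return fopen('data://text/plain;base64,' . base64_encode($data), 'r');
}

$lob = &quot;\x89PNG stand-in for image bytes&quot;; //the string PDO returned instead of a stream
$stream = string_to_stream($lob);
//$stream can now be passed to fpassthru(), fread() or any other file function
fpassthru($stream);
fclose($stream);
?&gt;
</pre>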
]]></content:encoded>
			<wfw:commentRss>http://jeremycook.ca/2010/02/21/handling-binary-data-with-pdo/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Location Aware Webpages</title>
		<link>http://jeremycook.ca/2010/02/13/location-aware-webpages/</link>
		<comments>http://jeremycook.ca/2010/02/13/location-aware-webpages/#comments</comments>
		<pubDate>Sat, 13 Feb 2010 20:55:55 +0000</pubDate>
		<dc:creator>Jeremy Cook</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[Web Development]]></category>
		<category><![CDATA[Web Services]]></category>

		<guid isPermaLink="false">http://jeremycook.ca/?p=54</guid>
		<description><![CDATA[This post discusses how I implemented a solution for a client which shows which of the client's stores a user visiting a website is geographically close to.]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m working on a site for a client at the moment who has around 35 store locations across Canada. The client was keen to have a &#8216;your local store&#8217; feature on the home page where the store closest to the visitor&#8217;s location was featured. I thought I&#8217;d write briefly about this, the solution I came up with to address this problem, and some of the limitations of it.</p>
<p>To display the local store information I needed to go through a number of distinct steps in the code.</p>
<ol>
<li>Determine the visitor&#8217;s location from their IP address.</li>
<li>If the visitor is in Canada calculate the distance to their closest store.</li>
<li>If they&#8217;re within 100km of a store display information on that store on the homepage.</li>
<li>If the user is not in Canada, is more than 100km away from a store or if there is any kind of error display a generic message about store locations.</li>
</ol>
<h2>Getting the User&#8217;s Location</h2>
<p>Determining a visitor&#8217;s location from their IP address can be done through a number of GeoIP services, some available for free and some paid for. Ideally I wanted to use a PECL <a href="http://www.php.net/manual/en/book.geoip.php">PHP extension</a> to do the location lookup but this was not possible as I could not persuade my web host to install it. I fell back to using a free web service provided by <a href="http://ipinfodb.com/">ipinfodb</a> for the lookup. I found a class on PHPClasses that used this web service, which I substantially rewrote to serve my purposes. The code that performs the lookup in the class is below:</p>
<pre class="brush: php; title: ; notranslate">

$strAPIURL = $this-&gt;apiUrl . &quot;?ip=&quot; . urlencode ($this-&gt;remoteAddress);
 //Make a call to the api and fetch the result.
 $ch        = curl_init ($strAPIURL);
 curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
 $xmlResult = curl_exec ($ch);
 curl_close ($ch);

 if ($xmlResult &amp;&amp; strlen ($xmlResult) &gt; 2) {
 //Process the result
 $result = simplexml_load_string ($xmlResult);

 if ((string)$result-&gt;Status == 'OK') {
 $this-&gt;countryName   = (string)$result-&gt;CountryName;
 $this-&gt;countryCode   = (string)$result-&gt;CountryCode;
 $this-&gt;cityName      = (string)$result-&gt;City;
 $this-&gt;zipPostalCode = (string)$result-&gt;ZipPostalCode;
 $this-&gt;regionName    = (string)$result-&gt;RegionName;
 $this-&gt;regionCode    = (string)$result-&gt;RegionCode;
 $this-&gt;timezone      = (int)$result-&gt;Timezone;
 $this-&gt;gmtOffset     = (int)$result-&gt;Gmtoffset;
 $this-&gt;dstOffset     = (int)$result-&gt;Dstoffset;
 $this-&gt;lat           = (float)$result-&gt;Latitude;
 $this-&gt;long          = (float)$result-&gt;Longitude;
 //Set the ipSniffed flag to true
 $this-&gt;ipSniffed     = true;
 }
 }
</pre>
<p>Once the call to the web service has been completed the information returned is stored in public properties and a flag is set to show that a successful lookup was performed. My class uses filter_var() to make sure a valid IP address that is not in a private range is passed to it, throwing an InvalidArgumentException if not. Of course, if an exception is thrown I fall back to my default position of displaying a generic message about store locations.</p>
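<p>The validation step isn&#8217;t shown in the excerpt above. A minimal sketch of that kind of check is below, with validate_public_ip() as a made-up function name rather than the method from my class.</p>
<pre class="brush: php; title: ; notranslate">

&lt;?php
//Hypothetical helper: return the address unchanged if it is a valid, non-private IP
function validate_public_ip($ip) {
    $valid = filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_NO_PRIV_RANGE);
    if ($valid === false) {
        throw new InvalidArgumentException(&quot;Not a usable IP address: $ip&quot;);
    }
    return $valid;
}

try {
    echo validate_public_ip('192.168.0.10');
}
catch (InvalidArgumentException $e) {
    //Private or malformed address: fall back to the generic store message
    echo $e-&gt;getMessage();
}
?&gt;
</pre>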
<h2>Finding the Closest Store</h2>
<p>Once I have the user&#8217;s location from their IP address and I&#8217;ve determined that they&#8217;re visiting from Canada I proceed to calculate the distance to their closest store. This is done with a database query. Information about all of the stores is stored in a database table along with the latitude and longitude of each store. Using the latitude and longitude returned from the web service I can then calculate the user&#8217;s distance to the nearest store. This is done in the following code:</p>
<pre class="brush: php; title: ; notranslate">

protected function getClosestStore ($lat, $long) {
 $sql = &lt;&lt;&lt; _SQL_

SELECT StoreId, CONCAT_WS(', ', Address, City, CONCAT(Province, ' ', PostCode)) AS Address, Lat, Lon, ROUND(6371 * 2 * ASIN(SQRT(POWER(SIN((:lat - abs(lat)) * pi()/180 / 2), 2) +
COS(:lat * pi()/180 ) * COS(abs(lat) * pi()/180) *  POWER(SIN((:lon - lon) * pi()/180 / 2), 2) )),2)
AS Distance
FROM Stores
HAVING Distance &lt;= 100
ORDER BY Distance Limit 1
_SQL_;
 try {
 $db = db::getConn();
 $st = $db-&gt;prepare($sql);
 $st-&gt;execute(array(':lat' =&gt; $lat, ':lon' =&gt; $long));
 $this-&gt;storeInfo = $st-&gt;fetch(PDO::FETCH_ASSOC);
 }
 catch (PDOException $e) {
 error_log($e-&gt;getMessage());
 }
 }
</pre>
<p>The PHP in this code simply connects to the database (db::getConn() is a static method that returns a singleton instance of a PDO object), performs the query and stores the result in the storeInfo property. If no result is returned from the database the fetch() method will return false, which then tells me that a user is not within 100 km of a store. The real meat of this code is in the SQL query. It selects some information about the store and then uses some math to calculate the distance to the nearest store. This is done using the Haversine formula, which calculates the distance between two sets of latitude and longitude, taking into account the curvature of the earth. I&#8217;m not going to try to explain the math (mostly because I don&#8217;t fully understand it myself!) but there is a <a href="http://en.wikipedia.org/wiki/Haversine_formula">Wikipedia article</a> on the formula for anyone who is interested. The query then limits the results to stores that are within 100km, orders the results by distance to make sure the closest one is listed first and then returns the first result. If a result is returned I then record the StoreId of the closest store in a cookie, which is then used when the visitor visits the site again. This is to cut down on processing time for subsequent requests for the home page and means that I won&#8217;t need to hit the ipinfodb web service on every request for the index page of the site.</p>
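<p>The same calculation is perhaps easier to follow written out in PHP. This is only a sketch that mirrors the expression in the query: haversine_km() is a made-up function and the coordinates are approximate values for Guelph and Kitchener.</p>
<pre class="brush: php; title: ; notranslate">

&lt;?php
//Hypothetical helper: great-circle distance in km between two points given in degrees
function haversine_km($lat1, $lon1, $lat2, $lon2) {
    $radius = 6371; //mean radius of the earth in km, the same constant as in the query
    $dLat = deg2rad($lat2 - $lat1);
    $dLon = deg2rad($lon2 - $lon1);
    //Square of half the chord length between the two points
    $a = pow(sin($dLat / 2), 2)
       + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * pow(sin($dLon / 2), 2);
    //Convert the chord to an angle and scale by the radius
    return round($radius * 2 * asin(sqrt($a)), 2);
}

//Approximate coordinates for Guelph and Kitchener, Ontario
echo haversine_km(43.5448, -80.2482, 43.4516, -80.4925);
?&gt;
</pre>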
<h2>Limitations of this Solution</h2>
<p>There are three major limitations to this solution that I can see: the accuracy of GeoIP services, the database query and saving the local store information in a cookie.</p>
<p>GeoIP services claim about an 80% accuracy when tracking the geographic location of an IP address down to the city level (accuracy is about 95% for finding the country a user is visiting from). For example I am writing this in Guelph, Ontario but the IP address I am connected to the internet with resolves geographically to the nearby city of Kitchener. This is not a major problem for my application as the client only has 35 stores across Canada and I am just trying to find the closest one to a user. Given the small number of stores chances are that I will hit on the closest one, even allowing for only 80% accuracy in GeoIP tracking. For other applications this could be a problem. The W3C has a draft <a href="http://dev.w3.org/geo/api/spec-source.html">Geolocation API</a> which some browsers (such as Firefox) are starting to implement. Using this (perhaps with some AJAX) could help to get a more accurate fix on some users&#8217; locations but would open up some privacy concerns. For this application I display links that allow a user to manually select their closest store if the GeoIP tracking is wrong, hopefully alleviating any problems caused by the 80% accuracy.</p>
<p>The SQL query as I&#8217;ve written it would have performance issues if a large number of locations were being stored. To find the closest store the query has to calculate the distance to every store held in the database, only then narrowing it down to the closest. This is not a problem currently as the client only has 35 locations, but if hundreds or thousands of locations were being stored the query would be extremely inefficient and take a long time to execute. In the course of my research I did read something about limiting the search to a radius (100 km in this case). This would involve several queries and probably a stored procedure to carry them out in. As I was dealing with a small number of locations I decided against this for simplicity. For another solution involving more locations I would probably go with this approach.</p>
<p>Saving the information on the user&#8217;s local store in a cookie makes the application more efficient but it potentially slightly degrades the user experience. In a scenario where a user travels and expects to see information on the closest store to where they currently are this would not work. I made a judgement call here and decided that it was better to go with the more efficient approach for this client, but in other cases this may not be so. Of course, if I could install the PECL GeoIP extension I wouldn&#8217;t need to store the cookie. Looking up a user&#8217;s location using this extension would be no more complicated than calling some PHP functions. The GeoIP information would be stored locally as part of the extension. This would be quicker than creating a call to and processing a response from a web service. Unfortunately this was not an option for me on this occasion. This problem can be overcome by the user manually selecting their closest store, overriding the mechanism I programmed.</p>
<p>All in all I am fairly happy with the solution I arrived at for this problem. It enables me to tailor content on the page based on where a user is. My example is a fairly simple one but it would be possible to do far more sophisticated things using GeoIP such as automatically displaying content in different languages depending on which country a visitor is coming from or displaying prices in different currencies. Because GeoIP is less than 100% accurate, mechanisms would always need to be provided to allow a user to override the conclusions reached by the code though.</p>
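<p>A simpler way of cutting down the work, and a different technique from the stored procedure approach just mentioned, is to work out a rough bounding box in PHP and let the database discard stores outside it before the Haversine expression is evaluated. The sketch below is an illustration only: bounding_box() and the placeholder names are made up, and the box is an approximation that gets less reliable close to the poles.</p>
<pre class="brush: php; title: ; notranslate">

&lt;?php
//Hypothetical helper: latitude/longitude limits of a box reaching $km from the point in each direction
function bounding_box($lat, $lon, $km) {
    $radius = 6371; //mean radius of the earth in km
    //One radian of latitude always covers the same distance
    $latDelta = rad2deg($km / $radius);
    //Lines of longitude converge towards the poles, so scale by cos(latitude)
    $lonDelta = rad2deg($km / ($radius * cos(deg2rad($lat))));
    return array(
        ':minLat' =&gt; $lat - $latDelta,
        ':maxLat' =&gt; $lat + $latDelta,
        ':minLon' =&gt; $lon - $lonDelta,
        ':maxLon' =&gt; $lon + $lonDelta,
    );
}

//The limits could then be added to the query ahead of the HAVING clause, for example:
//WHERE Lat BETWEEN :minLat AND :maxLat AND Lon BETWEEN :minLon AND :maxLon
print_r(bounding_box(43.5448, -80.2482, 100));
?&gt;
</pre>
<p>With indexes on the Lat and Lon columns the distance would then only be calculated for the handful of stores that fall inside the box.</p>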
]]></content:encoded>
			<wfw:commentRss>http://jeremycook.ca/2010/02/13/location-aware-webpages/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

