Tracking Social Media Marketing

Many businesses are now using social media services such as Twitter for marketing. It can be a great tool for reaching a wide user base, but I wonder how many businesses really know what is being said about them on social networks. It seems to me that anyone who wants to use a tool like Twitter for marketing needs a way to monitor the response their efforts are generating. Business owners wouldn’t dream of spending time, effort and money on any other form of marketing without knowing what sort of response it produces.

Thankfully, all of the main social networks provide web service APIs, and with these it’s possible to track just about any metric you would like. As an example I’ve put together a short script (about 100 lines, including some nice formatting and comments) that runs a search against the Twitter search API and then extracts keywords from each returned tweet using the Yahoo term extraction API. The results are sent to the user as a CSV file with columns for the date the tweet was posted, the author, the tweet text and the keywords extracted for that tweet. Popular keywords are then presented at the bottom of the file. This is by no means production-ready code; it’s a quick and dirty test to show the kind of thing that is possible. Here’s a brief look at how it works.


require 'termExtractor.php';

try {
    // Perform the initial search on Twitter. Also set the number of tweets
    // to return to the maximum allowed per page (100).
    $search = '"Demo Camp Guelph" OR #dcg';
    $query  = http_build_query(array(
        'q'     => $search,
        'lang'  => 'en',
        'rpp'   => 100,
        'since' => date('Y-m-d', strtotime('5 days ago'))
    ));
    $url    = 'http://search.twitter.com/search.json?';
    $ch     = curl_init($url . $query);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $tweets = json_decode(curl_exec($ch));

The beginning of the script includes a class I wrote to handle the Yahoo term extraction API. It then sets up a few variables and carries out an initial search using the Twitter API. As my search query I’ve chosen text relating to a meeting of developers that took place in Guelph last week. The documentation for the Twitter search API is pretty good and covers the format of the search terms, including the modifiers that are allowed. My search looks for any tweets posted in the last five days containing the text ‘Demo Camp Guelph’ or the hashtag ‘#dcg’. It also sets the number of results to return to 100. This is the maximum that can be returned from a single request, but Twitter provides a paging mechanism for searches with more than 100 results.


    // Open the PHP output stream as a file handle.
    $output = fopen('php://output', 'w');
    if (!$output) {
        throw new Exception('Unable to bind PHP output as a file handle');
    }
    // Tell the browser a CSV attachment is coming. Also set Cache-Control
    // to no-cache so the browser does not cache the results file.
    header('Content-Type: text/csv');
    header('Content-Disposition: attachment; filename="Twitter_Search.csv"');
    header('Cache-Control: no-cache, must-revalidate');
    // Add headings for the columns.
    fputcsv($output, array(
        'Posted On',
        'Author',
        'Tweet',
        'Keywords'
    ));
    if (!isset($tweets->results) || !count($tweets->results)) {
        // No search results. Output an error message and exit.
        fputcsv($output, array('Sorry, there are no results that match that search'));
        curl_close($ch);
        fclose($output);
        exit;
    }

This code binds the output of the script to a file handle, allowing me to present the script’s output to the browser as a file. HTTP headers tell the browser to expect a CSV file and to treat it as an attachment. Column headings are then written, and if there are no search results the script outputs a message and exits.


    // Set up the termExtractor object, an array to hold the keyword counts and
    // a flag to show that we're in the first iteration of the do-while loop.
    $keys           = new termExtractor(YAHOO_API_KEY);
    $keywords_array = array();
    $first          = true;
    // Loop over the results, one page at a time.
    do {
        if (!$first) {
            // Break the query string for the next page into its parts.
            // Use substr() to remove the initial '?'.
            parse_str(substr($tweets->next_page, 1));
            $query = http_build_query(array(
                'q'      => $q,
                'page'   => $page,
                'max_id' => $max_id,
                'lang'   => 'en',
                'rpp'    => 100
            ));
            curl_setopt($ch, CURLOPT_URL, $url . $query);
            $tweets = json_decode(curl_exec($ch));
        }
        foreach ($tweets->results as $result) {
            // Result text from Twitter is UTF-8 encoded and HTML-entity encoded.
            $result->text = html_entity_decode(utf8_decode($result->text));
            $keywords     = $keys->doTextQuery($result->text);
            // If a single keyword was returned, turn it into an array to save
            // repetition of code.
            if (is_string($keywords) && strlen($keywords)) {
                $keywords = array($keywords);
            }
            if (is_array($keywords)) {
                foreach ($keywords as $word) {
                    // Skip keywords that appear in the initial search term.
                    if (stristr($search, $word)) {
                        continue;
                    }
                    // If the keyword has been seen before, increment its count.
                    if (array_key_exists($word, $keywords_array)) {
                        $keywords_array[$word]++;
                    } else {
                        $keywords_array[$word] = 1;
                    }
                }
                $keywords = implode(', ', $keywords);
            }
            fputcsv($output, array(
                $result->created_at,
                $result->from_user,
                $result->text,
                $keywords
            ));
        }
        $first = false;
    } while (isset($tweets->next_page));

This code sets up the term extraction object and then loops over the results for as long as a next_page is available. Keywords are written to the CSV file alongside each tweet and also stored in an array for later use. Keywords that appear in the original search query are filtered out of that array, and using a do-while loop ensures that the first page of search results is output even if there are no pages after it.


    // Destroying the $keys object triggers its destructor, which disposes of
    // the cURL resource held by the object.
    $keys = NULL;
    // Free up the cURL resource.
    curl_close($ch);
    fputcsv($output, array('Keywords for this search:'));
    fputcsv($output, array(
        'Phrase',
        'Number of Occurrences'
    ));
    foreach ($keywords_array as $word => $count) {
        if ($count >= 3) {
            fputcsv($output, array(
                $word,
                $count
            ));
        }
    }
    fclose($output);
} catch (Exception $e) {
    echo $e->getTraceAsString();
}

The final part of the code processes the stored keywords and outputs any keyword that appears three times or more in the tweets that have been found.

The code here is obviously not optimal; it is merely meant as an example of what is possible. It would be very easy to abstract the logic out of this script into a Twitter search class, making the code portable and reusable. The script also takes quite a long time to execute, depending on the search term used, the number of tweets returned and the time frame searched. The bottleneck is the code that calls the Yahoo web service: every ‘page’ of Twitter search results returns up to 100 tweets, and the text of each must then be passed to Yahoo for analysis. This means that for each call to the Twitter search API there are up to 100 calls to the Yahoo term extraction service, which can easily cause the script to time out. In its present form it would probably be best run once a day to return search results for the last 24 hours, with the results emailed to someone to analyse. It may be possible to amalgamate the calls to the Yahoo service into one. The keyword extraction itself is somewhat rudimentary and could probably be improved, depending on what a client was looking to do.
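One way to amalgamate the Yahoo calls might be to join the text of many tweets into a single context string per request, then make one extraction call per batch instead of one per tweet. The helper below is a rough sketch of that idea; the batchTweetTexts name and the 10,000-byte limit are my own assumptions, not part of the script above or of any documented Yahoo limit.

```php
<?php
// Sketch: pack tweet texts into as few strings as possible, each under a
// byte budget, so one term-extraction request can cover many tweets.
// The default budget of 10000 bytes is an illustrative assumption.
function batchTweetTexts(array $texts, $maxBytes = 10000) {
    $batches = array();
    $current = '';
    foreach ($texts as $text) {
        // Start a new batch if adding this tweet (plus a separator)
        // would push the current batch over the budget.
        if ($current !== '' && strlen($current) + strlen($text) + 1 > $maxBytes) {
            $batches[] = $current;
            $current   = '';
        }
        $current .= ($current === '' ? '' : "\n") . $text;
    }
    if ($current !== '') {
        $batches[] = $current;
    }
    return $batches;
}
```

Each batch could then be sent through a single doTextQuery() call, turning up to 100 Yahoo requests per page of tweets into a handful. The trade-off is that keywords are no longer tied to individual tweets, so the per-tweet CSV column would be lost.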

What is interesting to me is how powerful this could be. Automated scripts could run on a schedule, tracking campaigns on Twitter. If the text analysis were more sophisticated, it would be possible to alert someone only when certain trends appeared, allowing them to make an appropriate response. That would make this a very useful Twitter tracking tool, and similar tools could be built for other social networking sites.
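As a rough sketch of that alerting idea, a scheduled job could compare each run’s keyword counts against the previous run and only flag words that spiked. Everything here, the findKeywordSpikes name and the doubling threshold, is illustrative and not part of the original script.

```php
<?php
// Sketch: flag keywords whose count has at least doubled since the last
// run, or which are brand new. $today and $baseline are keyword => count
// arrays like the $keywords_array built by the main script.
function findKeywordSpikes(array $today, array $baseline, $factor = 2.0) {
    $alerts = array();
    foreach ($today as $word => $count) {
        $previous = isset($baseline[$word]) ? $baseline[$word] : 0;
        // New keywords, or keywords whose count grew by the given factor,
        // are worth alerting on.
        if ($previous === 0 || $count >= $previous * $factor) {
            $alerts[$word] = $count;
        }
    }
    return $alerts;
}
```

A cron job could store each run’s counts, call this function against the stored baseline, and only send the results email when the returned array is non-empty.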

As an example of what is possible, this script serves its purpose. For a production environment there are lots of ways in which it could be improved, but the possibilities are there. The source code and an example of the script output can be downloaded here, but due to the time-out issues I can’t make the script itself available to run from a browser. I hope someone will be inspired to come up with their own, better Twitter analysis tools and will consider sharing them with the rest of us.
