Sunday, December 28, 2014

Generate a Wordcloud from a URL using R

How to generate a wordcloud image from a website URL in just three lines of R code.

I recently found a very groovy wordcloud package for R, written by Ian Fellows, which I then extended to accept a URL. Once you've got everything set up, as I'll outline here, you can generate a wordcloud on the fly by simply pasting in a web address. You should have a basic understanding of loading R packages in order to set this up.

Required Libraries


You'll need to install the following libraries:

library(data.table)   # Used for filtering and subsetting the words.
library(httr)         # Used to access the webpage URL.
library(XML)          # Used to extract the body tag content from the URL.
library(tm)           # Text mining features for stopword and punctuation removal.
library(wordcloud)    # Used to generate the wordcloud.
library(RColorBrewer) # Used to color the wordcloud.

Create or download pageCloudify.R file


pageCloudify.R is my solution script file for loading the URL, scraping its contents, and generating the wordcloud image. You can paste the script into a new file called pageCloudify.R or simply download it from here.

## pageCloudify.R
## Creates a wordcloud for one or more webpage URLs.
## Most stopwords (i.e. "the", "and") are removed by default.
##---------------------------------
## Function: pageCloudify()
## Parameters:
#  theURLs: a string or vector of strings containing valid URL(s), e.g. http://starwars.wikia.com/wiki/Star_Wars
#  freqReq: an integer value representing the minimum number of word repetitions required for inclusion in the cloud.
#  toPNG:   output wordclouds to PNG files. The default is FALSE, which sends the wordcloud to the screen device.
## Invalid URL: the webpage HTML must contain a <body> tag. This is the portion of content used
## to build the word cloud.
## Warnings: warnings will be generated for clouds containing more words than can fit in the cloud.
## Try increasing the freqReq parameter to decrease the number of words in the cloud.
## Many English stopwords are removed by default. To view the stopwords removed by the tm package:
## stopwords(kind = "en")
## Libraries required: the script attempts to load them for you when you source it.
## However, they do need to be installed already.
#  library(data.table)   # Used for filtering and subsetting the words.
#  library(httr)         # Used to access the webpage URL.
#  library(XML)          # Used to extract the body tag content from the URL.
#  library(tm)           # Text mining features for stopword and punctuation removal.
#  library(wordcloud)    # Used to generate the wordcloud.
#  library(RColorBrewer) # Used to color the wordcloud.
##### Examples #####
## Copy the pageCloudify.R file to your working directory.
## Set your working directory.
#  setwd("c:/personal/r")
## Load the pageCloudify.R source file.
#  source("pageCloudify.R")
## Example usage with a single URL
#  pageCloudify(theURLs = "http://en.wikipedia.org/wiki/Star_Wars", freqReq = 15, toPNG = FALSE)
## Example usage with multiple URLs, output to PNG files.
## Each PNG file will go to the working directory.
## Each PNG file will be named mywordcloud_X.png where X is an ordinal integer from 1 to the number of URLs.
#  theURLs <- c("http://incrediblemouse.blogspot.com/2014/11/sql-server-r-and-facebook-sentiment.html",
#               "http://incrediblemouse.blogspot.com/2014/11/sql-server-predictive-analytics.html",
#               "http://incrediblemouse.blogspot.com/2014/11/sql-server-r-and-logistic-regression.html",
#               "http://incrediblemouse.blogspot.com/2014/11/sql-server-and-univariate-linear.html")
#  pageCloudify(theURLs, freqReq = 5, toPNG = TRUE)

library(data.table)
library(httr)
library(XML)
library(wordcloud)
library(tm)
library(RColorBrewer)

pageCloudify <- function(theURLs, freqReq = 10, toPNG = FALSE){
  y <- 1
  for(x in theURLs){
    ## Use XML to pull and parse the html.
    parsedHTML2 <- htmlTreeParse(x, useInternalNodes = TRUE)
    PostEntry <- xpathSApply(parsedHTML2, "//body", simplify = TRUE)
    ## Munging for the wordcloud: flatten to lowercase words, drop punctuation and stopwords.
    z <- data.table(xmlSApply(PostEntry[[1]], xmlValue))
    z <- data.table(words = tolower(gsub("\n", " ", z$V1, fixed = TRUE)))
    z <- data.table(words = tolower(gsub("[^[:alnum:] ]", "", z$words)))
    z <- data.table(words = unlist(strsplit(z$words, " ")))
    z$words <- removeWords(z$words, stopwords(kind = "en"))
    z <- subset(z, words != "" & words != " ")
    z <- data.table(words = removePunctuation(z$words), int = 1)
    ## Count how many times each word appears.
    z <- z[, .(wordcount = sum(int)), by = words]
    z <- subset(z, nzchar(z$words) & nchar(z$words) > 2 & nchar(z$words) < 26 & z$wordcount > freqReq)
    ## Word cloud color palette
    pal <- brewer.pal(6, "Dark2")
    ## pal <- pal[-(1)]
    if (toPNG) {
      pngFileName <- paste("mywordcloud_", as.character(y), ".png", sep = "")
      png(file = pngFileName, bg = "transparent")
      wordcloud(z$words, z$wordcount, scale = c(4, .8), min.freq = 2, random.color = TRUE, colors = pal)
      dev.off()
    } else {
      wordcloud(z$words, z$wordcount, scale = c(4, .8), min.freq = 2, random.color = TRUE, colors = pal)
    }
    y <- y + 1
  }
}


The script provides a new function called pageCloudify().
Parameters:
theURLs: a string or vector of strings containing valid URL(s).
freqReq: an integer value representing the minimum number of word repetitions required for a word to be included in the cloud.
toPNG: output wordclouds to PNG files. The default is FALSE, which sends the wordcloud to the screen device.

Plot a WordCloud


Now that you've got your packages installed, and the above script saved to a memorable location, you can generate a wordcloud in three lines of code. Be sure to change your working directory to the location where you stored the pageCloudify.R file.

setwd("c:/personal/r")  # Change this to the stored location of pageCloudify.R
source("pageCloudify.R")
pageCloudify(theURLs = "http://en.wikipedia.org/wiki/Star_Wars", freqReq = 25, toPNG = FALSE)

Optionally, run the code again to get a different variation on the placement and coloring of the words.
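If you'd rather get an identical cloud every run, you can set the random seed first, since the wordcloud package places and colors words randomly. A minimal sketch (the seed value is arbitrary):

 set.seed(42)  # any fixed seed makes the layout and coloring repeatable
 pageCloudify(theURLs = "http://en.wikipedia.org/wiki/Star_Wars", freqReq = 25, toPNG = FALSE)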

You can even pass in multiple URLs all at once, and output them to PNG files.

## Example usage with multiple URLs, output to PNG files.
## Each PNG file will go to the working directory.
## Each PNG file will be named mywordcloud_X.png where X is an ordinal integer from 1 to the number of URLs.
theURLs <- c("http://incrediblemouse.blogspot.com/2014/11/sql-server-r-and-facebook-sentiment.html",
             "http://incrediblemouse.blogspot.com/2014/11/sql-server-predictive-analytics.html",
             "http://incrediblemouse.blogspot.com/2014/11/sql-server-r-and-logistic-regression.html",
             "http://incrediblemouse.blogspot.com/2014/11/sql-server-and-univariate-linear.html")
pageCloudify(theURLs, freqReq = 5, toPNG = TRUE)

Modify the function for even more cool features, like different color palettes: simply swap in a different brewer.pal color palette name.
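For example, a minimal sketch of that swap (the palette name "Set2" is just an illustration; display.brewer.all() shows every name that brewer.pal() accepts):

 library(RColorBrewer)
 display.brewer.all()           # preview every available palette
 ## Inside pageCloudify(), replace the hardcoded palette line with another name:
 pal <- brewer.pal(8, "Set2")   # eight colors from the "Set2" palette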

The first cloud is from Isaac Newton's Wikipedia page. I mean, who's a better candidate for a colorful wordcloud than the man who figured out that light contains all the visible colors?


While this is awesome from a programmer's standpoint, there are great sites where you can simply paste in some text and generate highly artistic wordclouds - you might be interested in wordle.net.

Enjoy!

-P


Sunday, November 23, 2014

SQL Server, R, and Facebook Sentiment Analysis

Apparently, if you're talking about congress, you're probably not using flowery adjectives.


I love the integration of SQL Server and R for data analysis and storage. I often jump back and forth between platforms depending on what I want to accomplish and the efficiency at which I know I can get the job done. Granting that you can perform all of this analysis, and to a much larger statistical extent, inside of R, I still lean on SQL Server for my data storage engine.

I demonstrate here the use of R for accessing Facebook public posts, and SQL Server for storage, manipulation, and sentiment analysis of that data. Again, you could achieve it all in R, but my background is in data architecture, so I have a penchant for storing data responsibly for future needs (I wish I did the same with my money).

The above chart, aggregated in SQL Server and prettied up with Excel, scored word sentiment found in 200 recent public facebook posts, for each search term category.

Software Needed

R can be downloaded from CRAN and R Studio can be downloaded from RStudio.
SQL Server Express 2014, capable of everything here, can be downloaded from Microsoft. I highly recommend the version: Express with Advanced Services (SQLEXPRADV), since it comes with full text search as well (not needed here, but is a very efficient text search engine you might one day want).

The process of the whole project is as follows;

1.) Obtain your facebook developer API access token.
2.) Configure your R environment for access to the API
3.) Prepare a SQL Server table for facebook data.
4.) Retrieve data from Facebook in R and save it to SQL Server.
5.) Prepare a SQL Server function for word parsing.
6.) Create a WordScorer lookup table.
7.) Score the facebook data, by search term, providing the average sentiment score for each.

Obtain your facebook developer API access token


To begin, you'll need to create an app at https://developers.facebook.com
I show here the basic configuration needed. The Site URL may need to be slightly different, but we'll get to that shortly.


Make note of your App ID and App Secret. You'll need them both for use in R.

Configure your R environment for access to the API


You'll need to install and load a few R libraries ahead of time. These cover the bases for Rfacebook, OAuth, SQL Server access, and the plotting commands we will be using here. Some are just useful for extended features we'll not be using here, but that you should have on hand. All of the Facebook API interaction in R is done via the awesome Rfacebook library, hosted on CRAN and written by Pablo Barbera.

 library(Rfacebook);  
 library(data.table);  
 library(RCurl);  
 library(rjson);  
 library(RODBC);  
 library(ROAuth);  
 library(ggplot2);  
 
## Also set your working directory.
## Your Facebook OAuth credentials will be saved 
## here for future use.
setwd("C:/Personal/R")


Now you'll authenticate your OAuth connection using your App ID and App Secret from the facebook application basic settings page, as noted previously.

 ## This is only performed once.  
 fb_oauth <- fbOAuth("YourAppIDHere", "YourAppSecretHere", extended_permissions = TRUE)  
 save(fb_oauth, file="fb_oauth")
  
 ## Now we only need to do this for all future OAuth loading.
 load("fb_oauth")  

When you run the fbOAuth, your R console will provide further instructions. These instructions contain the URL you should apply to your facebook application settings for the Site URL.

Copy and paste into Site URL on Facebook App Settings: http://localhost:1410/ 
When done, press any key to continue...

After applying the Site URL, press any key in the console, as the instructions state. Your web browser will open, access facebook, and tell you when it is completed. Afterwards, close your browser.

That's the one-time setup (until the token expires). From this point forward, you'll only need to load("fb_oauth")

Prepare a SQL Server table for your facebook data.


In SQL Server we'll create a table to hold all of the variables the Rfacebook function searchFacebook returns. We could simply pass the sqlSave command and let RODBC create our table for us; however, it's not all that savvy when it comes to data types. It tends to take the path of least resistance, making most numbers a float and all character vectors varchar(255). Since one of the vectors in the results, [message], contains the contents of the Facebook post and can hold a great number of characters, RODBC will truncate this data before inserting it. If we specify the table data types explicitly in advance, we get greater control. I've chosen a varchar(4000) to hold the message body. Some posts can be even longer, but for this simplistic sentiment analysis it's plenty large enough, even if we get some truncation. I would have preferred to use SQL Server's [text] data type, but was unable to get RODBC to stop complaining about it. I've tabled that for another rainy day.

It's also important to note that your table should mirror, column for column, what you plan on having in your R data table. If you want to insert records into an existing table with additional columns, you should really insert into a temporary staging table and migrate the data via trigger or otherwise. Yet another topic for a different rainy day.

 create table dbo.facebooksearchresults (  
      from_id bigint null  
      , from_name varchar(255) null  
      , [message] varchar(4000) null
      , created_time varchar(255) null  
      , [type] varchar(255) null  
      , link varchar(255) null  
      , id varchar(255) null  
      , likes_count integer null  
      , comments_count integer null  
      , shares_count integer null  
      , searchstring varchar(255) null  
      , dateadded date  
 );  

Retrieve data from Facebook in R and save it to SQL Server.


Now, in R, we can utilize the searchFacebook function for specific terms and save the results back to SQL Server. I've repeated my search (and data save), pulling down 200 posts for each search term I was interested in. The example below searches for posts containing the term ebola. You might consider a vector of terms and looping over them; I've kept the main walkthrough to a single term for simplicity, but a sketch of that loop follows the save step below.

## Search Facebook public posts, and save data to SQL Server.
searchString = "ebola"
data <- searchFacebook(string=searchString, token=fb_oauth, n = 200, since = NULL, until = NULL)

## Convert to a data table.
datatbl <- data.table(data)

## Add a new column to our table with the search string.
## This field will allow us to append this data to SQL Server
## and can act as the key to filtering for this specific search.
## Now we can search multiple terms, and keep loading data.
datatbl$searchstring <- searchString

## Add the dateadded, in case you want to compare a search on one day to another.
datatbl$dateadded <- as.character(Sys.Date())


The last two fields in our SQL Server table, searchString and dateadded, are not returned by the searchFacebook function, but we'll add those to our R data table so they can be stored with our results.

Your console should indicate that x number of posts were retrieved. We can now save the data to our SQL Server table.

## Connect to your SQL Server.
url1 <- "Driver={SQL Server};Server=YourServerAndInstanceName;Database=YourDatabaseName;Trusted_Connection=Yes;"
localdb <- odbcDriverConnect(url1)

## For very specific colnames and data type mapping, roll your own.
varTypes <- c("bigint", "varchar", "varchar", "varchar", "varchar",
              "varchar", "varchar", "integer", "integer", "integer", "varchar", "date")
names(varTypes) <- c("from_id", "from_name", "message", "created_time",
                     "type", "link", "id", "likes_count", "comments_count",
                     "shares_count", "searchstring", "dateadded")

## Save the data back to SQL Server.
sqlSave(localdb, datatbl, varTypes=varTypes, tablename ="facebookSearchResults", append = T, rownames = F)
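Here is that multi-term loop - a minimal sketch, assuming fb_oauth is loaded and the localdb connection and varTypes mapping above already exist (the search terms shown are just placeholders):

 searchTerms <- c("ebola", "congress", "coffee")   # hypothetical search terms
 for (searchString in searchTerms) {
   datatbl <- data.table(searchFacebook(string = searchString, token = fb_oauth, n = 200))
   datatbl$searchstring <- searchString
   datatbl$dateadded    <- as.character(Sys.Date())
   sqlSave(localdb, datatbl, varTypes = varTypes,
           tablename = "facebookSearchResults", append = TRUE, rownames = FALSE)
 }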


Prepare a SQL Server function for word parsing.


In order to parse the large [message] post contents into individual words, we need to split the text on spaces. There are a great number of ways to splice strings into words, but I have a favorite function for SQL Server that I found a long time ago and still use today. It's efficient mostly because of its use of Itzik Ben-Gan style cascading CTEs, which grow a tally table exponentially without recursion.

One time process: Create the following function, enabling us to split strings.

 USE [YourDatabaseName]  
 GO  
   
create function [dbo].[delimitedsplit8k]  
 --===== define i/o parameters  
     (@pstring varchar(8000), @pdelimiter char(1))  
 --warning!!! do not use max data-types here! it will kill performance!  
 returns table with schemabinding as  
  return  
 --===== "inline" cte driven "tally table" produces values from 1 up to 10,000...  
    -- enough to cover varchar(8000)  
  with e1(n) as (  
          select 1 union all select 1 union all select 1 union all  
          select 1 union all select 1 union all select 1 union all  
          select 1 union all select 1 union all select 1 union all select 1  
         ),             --10e+1 or 10 rows  
     e2(n) as (select 1 from e1 a, e1 b), --10e+2 or 100 rows  
     e4(n) as (select 1 from e2 a, e2 b), --10e+4 or 10,000 rows max  
  ctetally(n) as (--==== this provides the "base" cte and limits the number of rows right up front  
            -- for both a performance gain and prevention of accidental "overruns"  
          select top (isnull(datalength(@pstring),0)) row_number() over (order by (select null)) from e4  
         ),  
 ctestart(n1) as (--==== this returns n+1 (starting position of each "element" just once for each delimiter)  
          select 1 union all  
          select t.n+1 from ctetally t where substring(@pstring,t.n,1) = @pdelimiter  
         ),  
 ctelen(n1,l1) as(--==== return start and length (for use in substring)  
          select s.n1,  
             isnull(nullif(charindex(@pdelimiter,@pstring,s.n1),0)-s.n1,8000)  
           from ctestart s  
         )  
 --===== do the actual split. the isnull/nullif combo handles the length for the final element when no delimiter is found.  
  select itemnumber = row_number() over(order by l.n1),  
     item    = substring(@pstring, l.n1, l.l1)  
   from ctelen l  
 ;  

Create a WordScorer lookup table


We'll need one final ingredient before we can start scoring the words by search term: a lookup table that assigns a sentiment value to words. Naturally we're not going to use the whole dictionary, but we can get started with a pretty straightforward approach.

Not every possible word needs a score. We need to ignore "a", "and", "the", and all those other words out there that don't generally indicate a user's sentiment. The word "chair", for example, doesn't give us any idea what kind of mood the user was in when they wrote it.

My WordScorer table has 2477 words commonly used to score sentiment. They include adjectives and verbs that can generally be considered to be positive or negative influencers on sentiment. For instance, "abusive", "lol", and "jackass".

You'll need to create the following table and import this data (currently in google doc spreadsheet format) into the table.

 create table dbo.wordscorer(  
      wordid int identity(1,1) not null  
      , word varchar(50) not null  
      , score int not null  
      , scoreplusone int not null  
  constraint pk_wordscorer primary key clustered (wordid asc)  
 );  

Score the facebook data, by search term, providing the average sentiment score for each.


Finally, the peas of resistance! We can now query our posts, parse the words, score them, and report the average sentiment for each search string.

with TheWords as (
     select r.SearchString, z.item
     from dbo.facebooksearchresults r
     cross apply (
          select *
          from dbo.DelimitedSplit8K (r.[message], ' ')
     ) z
     where r.[message] is not null
)
, TheWordsScored as (
     select SearchString, item, w.word, w.score
     from TheWords i
     join [dbo].[WordScorer] w
          on i.item = w.word
)
select SearchString, AverageSentimentScore = avg(cast(score as float))
from TheWordsScored
group by searchString
order by avg(cast(score as float)) desc


Additional Data Considerations


There is more analysis and cleaning that should be considered when performing sentiment analysis on these posts. First of all, the search terms are searched individually and never combined, so a two-word search term will yield inconsistent results. In my search terms, "Taylor Swift" will return posts that merely mention the word swift, which makes the sentiment result for the two-word term erroneous unless you filter, after retrieval, to the posts where both words are present (which may eliminate most, if not all, of the results you pulled).

Secondly, the search is performed by Facebook over both the message content and the comment content. So if you're only interested in message contents containing the terms, you'll again have to filter for this on the back end after retrieval.
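A minimal sketch of that post-retrieval filtering in R, assuming datatbl holds the results for the hypothetical two-word term "Taylor Swift":

 ## Keep only posts whose message mentions both words.
 keep <- !is.na(datatbl$message) &
         grepl("taylor", tolower(datatbl$message)) &
         grepl("swift",  tolower(datatbl$message))
 datatbl <- datatbl[keep, ]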

Where to from here


At this point, I'd be very interested in automating a regular pull of the data to see how that sentiment moves and changes over time. Facebook doesn't let you go very far back in time, even with the date parameters. It must be nice to be Facebook and have all this data in their hands already. That's my problem - I've got more side projects of personal interest than I know what to do with.


Enjoy,
-P

Have a great and fantastically wonderfully cheerful day!
(+9 positive sentiment bagged right there)

I am turned into a sort of machine for observing facts and grinding out conclusions.
Charles Darwin

Saturday, November 22, 2014

SQL Server, Predictive Analytics, Reducing Bias In Train and Test Data


When training a statistical model to predict something, we never do so without our holdout data - our test data. This allows us to review the performance of our model against data that was not used during training, but for which we still know the outcome. However, the method through which you choose to divide your data into training and testing sets can sometimes introduce bias and skew your predictive prowess. Were you to simply divide the data in half, you might get an uneven distribution of the population. For instance, you may have the data ordered by total sales; splitting it in half would mean training your model on a disproportionately high (or low) range of sales totals.

To that end, we avoid ordering our data on the SQL Server side before moving it to our statistical software of choice (it had better be R). However, this still isn't enough. Depending on how your table was constructed or populated, it may be naturally ordered in some way that creates a population bias. Tables with clustered indexes are great, but it also means your data is pre-ordered by that clustering key, and that key might be negatively influencing the distribution. Never select the top 50% either - the data is likely ordered within the table itself, so you may still be getting biased sets.

To avoid this biased data selection, we introduce random number generation to our prepared data, and then split it. I show here the method for random number generation in SQL Server, and the method for doing so in R. While an analyst would quickly hop to the R version, which is incredibly easy, the SQL Server developer who wants to automate processes and data preparations wants to be sure this kind of information is already stored in the database before the analyst or automated procedures need it, reducing the future workload for all involved.


Generating Random Numbers in SQL Server


Create the following view
 create view [dbo].[v_getRandomNum]   
 as   
 select   
      Random_ZeroOrOne = abs(cast(newid() as binary(6)) %2) -- between 0 and 1   
      , Random_Num = abs(cast(newid() as binary(6)) %1000) + 1 -- between 1 and 1000  
      , Random_Big = abs(cast(newid() as binary(6)) %999999)  
      , aGUID = newid()  
 ;  

Test it with some sample data. 
 -- Create Temp Table to hold our Sample Data  
 if object_id('tempdb..##SampleRandomized','u') is not null begin drop table ##SampleRandomized end;  
 create table ##SampleRandomized (  
      RowID               int identity(1,1)  
      , rnd_ZeroOrOne     tinyint  
      , rnd_Int           int  
 );  


 -- Create Sample Dataset with Randomized Numbers from SQL Server  
 -- Insert 100 rows of sample numbers (using sys.objects as a row source)
 insert into ##SampleRandomized (rnd_ZeroOrOne, rnd_Int)   
 select top 100   
      rnd_OneOrZero = random.Random_ZeroOrOne  
      , rnd_Int     = random.Random_Num  
 from sys.objects, dbo.v_getRandomNum as random  
 ;  
 select * from ##SampleRandomized  


Now you can use the random number generating view by joining it to your data query. In the example above I've joined it to sys.objects simply as a means of row generation. The random numbers will tend to be evenly distributed, given enough rows.

In the sample I have only pulled together 100 rows, but if we compare the number of cases that are 1 to the number of cases that are 0, we will see it's pretty evenly split near 50%. The same holds for our random integer between 1 and 1000: about 50% are below 500, and 50% are at or above 500. Each time you generate the random numbers you should see slightly different distributions.


 select MinInt = min(rnd_Int)  
      , MaxInt = max(rnd_int)  
      , CountCases = count(1)  
      , CountOne = sum(case when rnd_ZeroOrOne =1 then 1 else 0 end)   
      , CountZero = sum(case when rnd_ZeroOrOne =0 then 1 else 0 end)   
      , CountBelow = sum(case when rnd_int < 500 then 1 else 0 end)   
      , CountAbove = sum(case when rnd_int >= 500 then 1 else 0 end)   
 from ##SampleRandomized  



Using this to break our data into a training and testing set is now a trivial matter.


 -- Now we can easily break the set in half, based on random values, decreasing bias.
 if object_id('tempdb..##FullDataset','u') is not null begin drop table ##FullDataset end;  
 select *   
      , dataset_byInt = case when rnd_int < 500 then 'train' else 'test' end  
      , dataset_byBit = case when rnd_ZeroOrOne = 0 then 'train' else 'test' end  
 into ##FullDataset  
 from ##SampleRandomized  


Random Number Generation in R


If you plan on using R for your predictive analytics, you'll likely be using the runif command. runif stands for Random Uniform and generates a uniform distribution of random numbers between 0 and 1.


 ## Load the needed libraries  
 library(data.table)  
 library(RODBC)  
 library(ggplot2)  

 ## Get your sample randomized data  
 url1 <- "Driver={SQL Server};Server=YourServerName;Database=YourDatabaseName;Trusted_Connection=Yes;"  
 dsr1 <- odbcDriverConnect(url1) ## Using RODBC  
 data1 <- data.table(sqlQuery(dsr1, paste("select * from ##FullDataset")))  


 ## Transform rnd_ZeroOrOne into a factor  
 data1$rnd_ZeroOrOne <- factor(data1$rnd_ZeroOrOne)  


 ## Random Number Generation via R  
 RowsInData <- nrow(data1)  
 data1$rnd_FromR <- runif(RowsInData)  

After generating the runif vector with length nrow(data1) (the number of rows), we can view the results of that vector.




Now we can break up our data into different sets, and mark them as train or test.
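A minimal sketch of actually labeling each row in R, mirroring the dataset_byInt column created on the SQL Server side (the dataset_fromR column name here is just an illustration; rnd_FromR comes from the runif step above):

 ## Mark each row as train or test based on the R-generated random number.
 data1$dataset_fromR <- ifelse(data1$rnd_FromR < 0.5, "train", "test")
 traindataR <- subset(data1, dataset_fromR == "train")
 testdataR  <- subset(data1, dataset_fromR == "test")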



 ## Create rowcounts, by random methods, for both our test and train sets.  
 ## This could be done more compactly with aggregate or ddply grouping;
 ## however, this version is more readily accessible to beginners.
 rCountBelow <- data.table(provider = 'R', dataset = 'train', cases = nrow(subset(data1, rnd_FromR < .5)))  
 rCountAbove <- data.table(provider = 'R', dataset = 'test', cases = nrow(subset(data1, rnd_FromR >= .5)))  
 sCountBelow <- data.table(provider = 'SQL', dataset = 'train', cases = nrow(subset(data1, rnd_Int < 500)))  
 sCountAbove <- data.table(provider = 'SQL', dataset = 'test', cases = nrow(subset(data1, rnd_Int >= 500)))  
 

## union all the row counts  
 caseCounts <- rbind(rCountBelow, rCountAbove, sCountBelow, sCountAbove)  



To show the distribution, we can run View(caseCounts).



A very uniform distribution indeed, bias reduced through random numbers.

To demonstrate visually, here's a ggplot of the distribution differences (there are none) between our SQL Server random numbers and the random numbers generated in R.


 ## Compare the train/test split counts from SQL Server and R
 ggplot(data = caseCounts,   
     aes(x = factor(provider), y=cases, fill = factor(dataset))) +  
  geom_bar(stat="identity") +  
  scale_y_continuous(limits=c(0,100))  




Enjoy,
-P

Windows Tip: God Mode for Windows

Oh my gawd. You have the power. Well, it's not nearly as awesome as god mode used to be when playing the original Quake - but it's interesting anyway.

Check out this Windows tip for accessing all those Windows god mode features through one Control Panel-like interface.

What's the secret? Simply right click anywhere on your desktop and create a new shortcut, but name it specifically this:

GodMode.{ED7BA470-8E54-465E-825C-99712043E01C}

Now open up your god panel!!!
(it's really just an expanded control panel with administrative shortcuts)




Enjoy,
-P

A Peek Inside an XLSX Document

Want to see what's under the hood of an XLSX file? Probably not, but just for giggles, this is a little bit interesting. Did you know that Excel xlsx files are actually zip files? Change the file extension to .zip and [poof], you can view the underlying supporting documents. You'll notice the format of the xlsx file is supported by pretty straightforward XML files.

Make File Extensions Visible

If you don't have file extensions visible, set this folder option first.
Open your Control Panel and select "Folder Options".
Then remove the checkmark for "Hide extensions for known file types".


Rename the xlsx file

Using Windows File Explorer, choose any xlsx file and rename the extension from xlsx to zip.



Naturally, say yes to the prompt.


Now open the .zip file, and view the contents.


Oh my.. Look at that xml file. 

Digging inside there you'll notice it's straightforward XML with values, except that Excel uses an index lookup into the sharedStrings.xml file (located a level higher inside the zip file) for cell values. So in Sheet1.xml, a <v>0</v> would correspond to the first value in the sharedStrings file. Yes, you can edit the files, save them back, rename the zip back to an xlsx file, and it still works.

Programmatically, this makes for pretty easy interaction with Excel documents.
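For example, here's a minimal R sketch of peeking inside the package without renaming anything, assuming a file named example.xlsx exists in the working directory (the internal paths can vary slightly between files):

 library(XML)
 unzip("example.xlsx", list = TRUE)   # list the parts inside the package
 unzip("example.xlsx",
       files = c("xl/worksheets/sheet1.xml", "xl/sharedStrings.xml"),
       exdir = "xlsx_parts")
 sheet1  <- xmlParse("xlsx_parts/xl/worksheets/sheet1.xml")
 strings <- xmlParse("xlsx_parts/xl/sharedStrings.xml")
 ## Each <v> in a cell marked t="s" is a zero-based index into sharedStrings.xml.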

Anyways, enjoy.
-P

Sunday, November 9, 2014

SQL Server, R, and Logistic Regression


This post demonstrates retrieving data from SQL Server, performing logistic regression to predict the probability of an event, and saving the results back to SQL Server for storage, reporting, or additional analysis. The motivation here is the desire to extend the capabilities of your average SQL Server developer who may be looking to expand their skills with predictive analytics and R.

Logistic regression returns the probability of an event; in my example, the event is represented by 1 and a non-event by 0. In order to start predicting probabilities we should follow the basic protocol of breaking our dataset into two groups, test and train. For example, we might have 100 cases (rows) of events with explanatory variables (those that are used to predict the resulting event). We would break up this data, for which we already know the outcome, into 50 cases of training data and 50 cases of testing data. This allows us to not only build our predictive model, but then put the model up against another set of known outcomes to compare how our model is performing. In the example data I will use, the test and train data are identical, for the sake of simplicity. (keep it stupid, simple.)

SQL Server Sample Data
1.) In SQL Server, create a global temporary table with sample data;

 -- test data for logistic regression  
 -- Find the probability of an event  
 if object_id('tempdb..##testlogit') is not null begin drop table ##testlogit end;  
 create table ##testlogit (  
      rowid int identity(1,1)  
      , [a] int not null   
      , [b] int not null   
      , [event] int null   
      , dataset varchar(5) not null  
 )  
 -- Insert the model training data.   
 -- We will use this to train our predictive prowess.  
 insert into ##testlogit select 1, 0, 1, 'train'  
 insert into ##testlogit select 1, 0, 1, 'train'  
 insert into ##testlogit select 1, 0, 0, 'train'  
 insert into ##testlogit select 1, 1, 1, 'train'  
 insert into ##testlogit select 1, 1, 1, 'train'  
 insert into ##testlogit select 0, 0, 0, 'train'  
 insert into ##testlogit select 0, 0, 0, 'train'  
 insert into ##testlogit select 0, 1, 0, 'train'  
 insert into ##testlogit select 0, 1, 1, 'train'  
 -- Insert the model testing data. We will predict the event in this set.  
 insert into ##testlogit select 1, 0, 1, 'test'  
 insert into ##testlogit select 1, 0, 1, 'test'  
 insert into ##testlogit select 1, 0, 0, 'test'  
 insert into ##testlogit select 1, 1, 1, 'test'  
 insert into ##testlogit select 1, 1, 1, 'test'  
 insert into ##testlogit select 0, 0, 0, 'test'  
 insert into ##testlogit select 0, 0, 0, 'test'  
 insert into ##testlogit select 0, 1, 0, 'test'  
 insert into ##testlogit select 0, 1, 1, 'test'  

2.) The resulting table should contain the following;


Copy the data to R for Logistic Regression

1.) Connect to the data source and pull the data into a variable called data. I've used the RODBC library, though I've seen others use rsqlserver. I hear that rsqlserver is faster for data movement; however, when you're working with statistical models in R, it's actually pretty rare that you're moving massive amounts of data around. This is even a limitation of R, in that it stores your data in objects in memory - meaning that large datasets need to take the available hardware limitations into account as well. Frankly, "large amount of data" is a subjective phrase, in my view. I've seen people who consider 1GB large, and some won't use the word until they see a terabyte. Nonetheless, for the incredibly tiny sample data we're using here, and everything I have ever done with R, I've found RODBC to be stunningly fast enough.

 ## Load the needed libraries  
 library(RODBC)  
 library(data.table)  
 library(ggplot2)  
 ## For the ConfusionMatrix you'll need the following  
 ##install.packages("caret")  
 ##install.packages("e1071")  
 library(caret)  
 ## Set the working directory  
 setwd("c:/personal/r")  
 ## Connect to and pull your sample data from SQL Server.  
 driver <- "Driver={SQL Server};Server=YourServerName;Database=YourDatabaseName;Trusted_Connection=Yes;"  
 localdb <- odbcDriverConnect(driver)  
 data <- data.table(sqlQuery(localdb, paste("select * from ##testlogit")))  

Be sure to replace YourServerName with your actual server name, and YourDatabaseName with your actual database name. The database I'm using is lovingly called CommunalToilet - it's where I put all my garbage data. :)

2.) Subset your data into training and testing data sets.

 ## Subset the data into Training and Testing datasets.  
 traindata <- subset(data, dataset == 'train')  
 testdata <- subset(data, dataset == 'test')  

3.) Create the glm (generalized linear model) using family binomial for logistic regression. At this point we're working with data for which we already know the outcome. Because we will have data in the future for which we do not know the outcome, but would like to predict it, we can save our logistic regression model to an rda file. The next time I use this script with data that has no known events, I do not need to re-train my model. Finally, I view the summary of the model.

 ## Create the logistic regression model using family binomial link=logit  
 ## Using the training dataset.  
 ## If the model was already built, then simply reload it.  
 if(file.exists("my_testglm.rda")) {  
   ## load model  
   load("my_testglm.rda")  
 } else {  
   ## (re)fit the model  
   testglm <- glm(data = traindata, event ~ a + b + a * b, family="binomial")  
   save(testglm, file = "my_testglm.rda")  
 }  
 ## View summary of the model  
 summary(testglm)  

The glm function (like lm for linear regression) is looking in particular for the formula. The formula (event ~ a + b + a * b) places your dependent variable (the event you're predicting) to the left of the ~ and the explanatory variables (the variables that explain how the prediction is made) to the right. In this case, I'm using a and b as my explanatory variables, which might represent, for example, age and income. I also included an interaction of a*b. While this doesn't help my model in this case, it is a powerful way of adding predictive ability when you determine that one variable interacts directly with another - their combined weight may be significant. For example, you might find it more significant when someone has previously had both a heart attack -and- has high blood pressure, compared to simply one or the other.
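As a small side note (my illustration, not from the original post), R's formula syntax expands a * b to a + b + a:b, so the interaction enters the model as a single extra a:b term:

 ## Both calls fit the same model; a:b is the interaction column.
 m1 <- glm(event ~ a * b,       data = traindata, family = "binomial")
 m2 <- glm(event ~ a + b + a:b, data = traindata, family = "binomial")
 all.equal(coef(m1), coef(m2))  # TRUE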


4.) Using your model, predict the probability, the rounded probability, and put those into our data sets.

 ## Create prediction based on the model parameters.  
 traindata$prob_of_event <- predict(testglm, newdata=traindata, type="response")  
 testdata$prob_of_event <- predict(testglm, newdata=testdata, type="response")  
 ## round the predicted probabilities to 1 and 0  
 traindata$prob_of_event_round <-round(traindata$prob_of_event, digits = 0)  
 testdata$prob_of_event_round <- round(testdata$prob_of_event, digits = 0)  



5.) Create and view the confusionMatrix results. The confusion matrix shows your predictive power by creating a matrix of the actual events versus your predicted events. This results in counts of true positives, false positives, true negatives, and false negatives. A true positive means you predicted the event and it actually occurred; a false positive means you predicted the event even though it did not occur; and a false negative means you predicted no event even though the event actually occurred.

 ## View the accuracy of the model through a confusion matrix  
 confusionMatrix(traindata$prob_of_event_round, traindata$event, positive="1")  


In the confusionMatrix above, you can see I've got 2 misclassified cases - for example, cases where I falsely predict an event (1) even though they are not actual events (0) in the data. Not bad; considering the data we've put into the model, there are two explanatory variable scenarios where the predictive value isn't perfect. The data leans towards being an event, but I have actual cases that turned out to be (0) non-events. Nonetheless, the model says that, given these variables, there's a 66% probability of it being an event. It just so happens that 66% is not 100%. You can see these scenarios in the traindata output above.
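If you'd like a cross-check of the caret output without any extra packages, a simple contingency table gives the same counts:

 ## Rows are predictions, columns are actual outcomes.
 table(predicted = traindata$prob_of_event_round, actual = traindata$event)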

Merge and copy the data back to SQL Server
1.) Merge the training and testing data sets into one data table, and then plot the resulting predictions.

 ## Merge your training and testing sets back together.  
 ## this is the data.table that we will copy back to SQL Server.  
 mergeddata <- data.table(rbind(traindata, testdata))  
 ## Plot the results of your predicted probabilities.  
 ggplot(data=mergeddata, aes(x=rowid, y=prob_of_event, color=dataset)) +  
   geom_point(size = 4)  



2.) Now copy the data back to SQL Server. The first step checks whether the table I'm copying the data into already exists; if it does, we drop it first.

 ## First test to see if the table already exists. If so, drop it.  
 sql <- paste("if object_id('CommunalToilet.dbo.testlogit_results') is ",   
       "not null begin drop table CommunalToilet.dbo.testlogit_results end")  
 destroyObject <- sqlQuery(localdb, sql)  
 ## Save the data set to SQL Server.  
 sqlSave(localdb, mergeddata, tablename ="testlogit_results", append = FALSE, rownames = F)  

3.) Finally query your resulting data in SQL Server.

 select * from dbo.testlogit_results



Where to from here

Obviously you'll want to use some real data with some actual explanatory variables, and start predicting the future. The most common scenario I've seen around the web uses the AdventureWorks database to predict whether someone is a likely bike buyer. They usually pull in multiple customer variables like income, commute distance, number of vehicles, etc., and then predict whether the person is a likely bike buyer - only so that they can then spam that person with emails or postal mail. I prefer to think of better things to spend my predictive prowess on than marketing. :\

Once you've gathered the data, open and run your R script. Even though the new data has no known events, the saved model will load and predict the probability of an event for each case.
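A minimal sketch of that scoring step, assuming a hypothetical data table called newcases with the same a and b columns but no known event column:

 load("my_testglm.rda")   # restores the fitted model object, testglm
 newcases$prob_of_event <- predict(testglm, newdata = newcases, type = "response")
 newcases$prob_of_event_round <- round(newcases$prob_of_event, digits = 0)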

Good luck, best wishes, etc., etc.,
-Parker

SQL Server and Univariate Linear Regression


This is a method for evaluating the correlation coefficient of a univariate linear regression model by simply calling a function from within SQL Server. The motivation here was that I wanted to be able to determine a possible linear correlation without having to leave my SQL Server Management Studio environment.

To start you'll need to create the following;

1.) Create a new table data type.

 create type dbo.XandY as table(  
      [x] float null  
      , [y] float null  
 );  

2.) Create the linear regression function;

 create function dbo.UnivariateLinearRegression (@XandY dbo.XandY READONLY)   
 returns @ReturnThisTable table (  
      correlation_coefficient float  
      , slope float  
      , intercept float   
      , r_squared float   
      , standard_estimate_error float  
 )  
 as  
 begin  
      declare @total_points int   
      , @intercept float   
      , @slope float   
      , @r_squared float   
      , @standard_estimate_error float   
      , @correlation_coefficient float   
      , @average_x float   
      , @average_y float   
      , @sumX float   
      , @sumY float   
      , @sumXX float   
      , @sumYY float   
      , @sumXY float   
      , @Sxx float   
      , @Syy float   
      , @Sxy float   
      , @stdevX float  
      , @stdevY float;  
      Select   
           @total_points = count(*)  
           , @average_x = avg(x)  
           , @average_y = avg(y)  
           , @sumX = sum(x)  
           , @sumY = sum(y)  
           , @sumXX = sum(x*x)  
           , @sumYY = sum(y*y)  
           , @sumXY = sum(x*y)  
           , @stdevX = stdev(x)  
           , @stdevY = stdev(y)  
      from @XandY a  
      ;  
      set @Sxx = @sumXX - (@sumX * @sumX) / @total_points;  
      set @Syy = @sumYY - (@sumY * @sumY) / @total_points;  
      set @Sxy = @sumXY - (@sumX * @sumY) / @total_points;  
      set @correlation_coefficient = @Sxy / sqrt(@Sxx * @Syy);  
      set @slope = (@stdevY / @stdevX) * @correlation_coefficient;  
      set @intercept = @average_y - @slope * @average_x;  
      set @r_squared = (@intercept * @sumY + @slope * @sumXY - power(@sumY,2) / @total_points) / (@sumYY - power(@sumY,2) / @total_points);  
      select @standard_estimate_error = sqrt(sum(power(y - (@slope * x + @intercept),2)) / @total_points) from @XandY;  
      insert into @ReturnThisTable (  
           correlation_coefficient  
           , slope  
           , intercept  
           , r_squared  
           , standard_estimate_error  
      )  
      select correlation_coefficient = @correlation_coefficient  
           , slope = @slope  
           , intercept = @intercept  
           , r_squared = @r_squared  
           , standard_estimate_error = @standard_estimate_error  
      ;  
      return;  
 end  

Try it out

1.) Generate some sample data

 if object_id('tempdb..##example','u') is not null   
      begin drop table ##example end;  
 -- Sample Dataset  
 create table ##example (  
      TheDay int  
      , TheValue float  
      , TheUnits float  
 );  
 insert into ##example (TheDay, TheValue, TheUnits)   
 select 1, 10.2, 4 union all select 2, 11.8, 5 union all   
 select 3, 19.2, 8 union all select 4, 10.2, 4 union all   
 select 5, 12.4, 5 union all select 6, 13.2, 6 union all   
 select 7, 15.2, 7 union all select 8, 17.2, 8 union all   
 select 9, 16.2, 7 union all select 10, 25.2, 16 union all   
 select 11, 12.7, 10 union all select 12, 14.2, 11 union all   
 select 13, 15.9, 13 union all select 14, 13.6, 6 union all   
 select 15, 19.2, 9;  

2.) View your sample data, with bonus code to view the 5 day moving average.

 select a.TheDay, a.TheValue, a.TheUnits  
      , c.FiveDayMovingAverageValue  
 from ##example a   
 outer apply (  
      select FiveDayMovingAverageValue = avg(TheValue)   
      from ##example b   
      where b.TheDay between a.TheDay - 4 and a.TheDay   
      having count(TheValue) > 4  
 ) c;  

Which should return this sample data

3.) Now use your new data type and function to generate the correlation coefficient;

 declare @XandY as dbo.XandY;  
 insert into @XandY (x, y) select TheUnits, TheValue from ##example;  
 select * from dbo.UnivariateLinearRegression(@XandY);  

The results of the function returned are;

As you can see, from our manufactured values, we have a strong correlation coefficient of .76
The additional results can help with plotting, or determining the quality of fit with the r_squared, also known as the coefficient of determination.
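As a quick sanity check (my own note), in the univariate case the coefficient of determination is simply the square of the correlation coefficient:

 0.7606 ^ 2   # ~0.579, matching the r_squared column and R's multiple R-squared below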

Validation through R

The values were compared to the results of performing the linear regression analysis on the same data in R.

 library(RODBC)  
 library(data.table)  
 library(ggplot2)  

 localdb <- odbcDriverConnect("Driver={SQL Server};Server=YourServerName;Database=YourDatabaseName;Trusted_Connection=Yes;")  
 data <- data.table(sqlQuery(localdb, paste("select * from ##example")))
  
 testlm <- lm(data = data, TheValue ~ TheUnits)  

 summary(testlm)  
 cor(data$TheValue, data$TheUnits)  
 coef(testlm)  

 ggplot(data = data, aes(y=TheValue, x=TheUnits)) +  
   geom_point() +  
   geom_smooth(method=lm)  

Using the standard lm function, I generated the univariate linear regression model. Next I ran three diagnostic calls: summary, cor, and coef.


As you can see from R, the summary shows a multiple R-Squared of .5786
The cor function shows the correlation coefficient at .7606
The coef function shows the slope and intercept, also mirroring the results from SQL Server.

Finally the ggplot output is shown below;


In a subsequent post I'll demonstrate the same capability but with the added ability of performing the correlation over a grouping value, which returns the coefficient for each grouped subset.

Till then...

-Parker

The Wretched

I glance down at the steaming cup of coffee, as I sit back into my recliner.
It was set there, as it always is, right on time. The aroma tells me it's
exactly the way I like it and it's making me sick already.

I know she's there, without even looking. She's always there, somewhere,
right where she should be, at just the right time, doing just the right things.
The thought of her makes me even more sickened. I know, without looking,
the arc of her wretched hunched back will be there, just a couple feet
away in her rocking chair. I try pretending she isn't there, but the clickity-
clack of her knitting has started. It's deafening. This day, like any other,
is making me insane. I can smell her and my coffee. I can smell the detergent
she uses. I can hear her breathing. My head swims.

I can't resist but to glance, one eye squinting in disgust. The beast, and her
wrinkled skin, doesn’t notice me watching. It's 6am. I'm ready to leap out of
my chair and twist her fucking head right off. I imagine her perfectly crafted
hair, white and gray, now soaking in a pool of blood. How’s that for perfect?
If I didn't need this walker to get from my chair to hers, she'd be dead
already.

I'll have to catch her in passing. Maybe lean my walker out just enough to trip
her ass. Her face would flatten to the floor, knocking her perfectly brushed
teeth right in. She’s smacking her lips now. She does that when she's really
into her damned knitting. Every time I hear the sound, another nail drives into
my forehead. 63 years of this shit.

I would have to make it look like an accident though, she knows every damned
neighbor for 10 miles. Hugging and kissing and blessing their damned hearts at
every eager conversation. It sickens me. Must she really stick her nose into
everything? She starts, as usual, rattling off gossip. It's something about Edna.
She was already at her house this morning, dropping off some crap for some
damned recipe. I have no interest whatsoever. As she's waving her hands
around I notice her sagging arms. I'm cringing.

Her breasts are in her lap. She's dressed to the nines, in some flowery printed
crap. She's already been somewhere, done something, talked to someone,
planned some event, learned some ridiculous gossip, is spreading the news
around, and telling me that Susan is coming over in 15 minutes. This’ll have to
happen fast. She's going to want to get out of that squeaking chair soon. She'll
want to start making something in the kitchen, no doubt. Whatever it is, it'll
be wrapped up in a bow, have some unpronounceable foreign ingredients, and
have some fancy name. Who the hell eats that shit.

She's bitching at me now because I didn't take a shower and get dressed yet.
For Christ sake it's 6 god-damned-AM wench! I don't hear the words anymore,
just the tone of her irritatingly subtle, non-stop, nagging voice. I use my foot
and situate the walker a little closer in front of me. I'll just casually start to get
up when she passes, and my foot will accidentally kick the walker on its side.
Yes. I've had enough. I better put my glasses on, I don't want to miss.

I start pretending to watch TV, my eyes straight ahead. My mind though,
working in perfect harmony with my ears. Waiting. Waiting to hear the sound
of her chair as she rises. It's still slowly rocking, creaking, and cracking. I'm
sweating. My foot jerks uncontrollably. I almost blew the whole damned
thing. Get a hold of yourself. Stay calm. Focus. It must be done.

3 excruciating silent minutes later, my leg is so tense, I'm about to burst!
Suddenly, my leg spasms and I kick the walker onto its side! Son of a bitch I
ruined it! I glance over and see Martha is leaning forward in her chair, one arm
stretched outward, holding her cane out in front of herself at a perfect
angle to the floor.

The Contract

My name is Vincent Carrea. I'm 37 years old, single, and not even remotely looking for a relationship. There are far more important things to accomplish. Besides, my life is not ordinary. I don't think anyone else would be quite as accepting of who I am. Who else can you trust in this world, but yourself? Character? I've got character. Many may not agree with the defining attributes of my character, yet there's no denying it, I've got character. I revere a person's core character as undeniable and inevitably inherent. While you can mask some of these traits, you will never escape them. They are definitive for life. Accept, embrace, and fulfill your destiny. Am I just establishing a method for a guiltless evasion of responsibilities for my actions? I've noticed now, dwelling on and debating these thoughts, the body lying on the floor has started turning new shades of grey. I take a deep breath as I close my eyes and attempt to store an image of her, a silent frame of beauty, into my mind. The slow, short drip of blood at the corner of her slightly open mouth is the only piece of the image that has any color. I will remember her. Thirty-eight minutes have passed. I've taken it all in, appreciated and verified every detail. I am done here. With a slight tug on her scarf, I lift it to my nose before folding it and placing it in my pocket. I retrieve the contract out from under my coat, and slide the signed brown parchment under her head. Tilting my hat to her, I walk to the door. As I step out, a cold breeze from the Grady River brushes over my face, and I barely smile. This is me, there's no denying it - and I love it. I arrive at my home, and heat up something to eat. While I'm waiting, I start sketching the burned image from my mind. I sense it before he arrives. The shadow, darkening the hallway, closing nearer my desk. I raise my head from the stock of newspapers around me. He is ready for another soul. I accept the parchment he carries. His gruff tone is calm and commanding, as it fades out, and the shadow recesses to nowhere. I put my hat and coat back on - no rest for the weary or the sinners of the world. He’s back. I sense it. I halt - as I notice the name on the document: Vincent Carrea.

The Dentist - Part II

The symphony had come to a pleasing cadence as The Dentist carefully polished each extraction and placed them into a glass jar. He sat down in front of his rusty Underwood. The embossed keys were thick with grime. After adjusting his glasses slightly, he punched in a date and a name. The label got smeared with dirt and blood as he slid his hand over it, affixing it to the jar. A drop of blood landing caught his attention. It was pooling on the floor just beneath the wooden operating table. He needed to get the body moved so that it could be ready for disposal, but first, he admired his faceless patient. The odd smile of the exposed jaw, no longer distorted by flesh, satisfied him. He checked his antique pocket watch. Disposal was only a matter of timing - and the repeating static thumping on the old turntable meant he was right on schedule. He slid his bare hand up and under the bottom jaw, grabbing her face, and pulled the table into a dark earthen hallway. The hall, with just a few scattered boards for stability, seemed to weep with seeping trickles of discolored water. At a slight downward slope, he only needed to hold on, until reaching the beast. For two years he had been buying used pieces and parts online, slowly constructing an industrial tree chipper in a sublevel below the basement. The behemoth machine was built so that it would drain directly into an adjoining sewage tunnel that then drained into a river eight miles away. The beastly machine generated a lot of noise. No one in this farming community, not in the four years since, had ever suspected his unassuming practice was also mulching estranged hitchhikers. The decrepit house, just off the highway, was on the market for years before The Dentist found it, rotting from the ground up, as if slowly being eaten by the damp and muddy earth it sat on. He resurrected the place, and the townsfolk adored him.

As if on cue, a rumbling, some dirt and dust unsettling, and the whole tunnel shook. The Dentist slid the body into the chipper, closed the lid, and turned it on. Its jumping, bumping, and bone-shattering sounds were muffled by the passing train above. The Dentist turned and pushed the table back up the hall.

The Dentist

Strapping down his unwilling patient - the stained leather constricting the blood flow to other less interesting organs - The Dentist takes a sip of red wine. With passion and lust, he envisions the symphony of crimson blood spatter that would begin shortly. Slipping on his coat, the thick scent tantalizes his olfactory, which now reminds him of yesterday's prize. He caresses the jaw, carefully appreciating the bone structure. Oblivious to the frantic eyes tracking his every move, he uses homemade instruments constructed of discarded metals from the local rust yard and forces the mouth open. He looks up - a brief pause. Turning slightly, he squints as he prepares Strauss on the wooden turntable. A half smile breaks the hardened old blood streaking his face. The violins score delicately in the background. The molars on this one would be quite fascinating, but the slow extraction even more satisfying. The unprotected light bulb, dangling from the ceiling, flickers and ticks for a moment as it casts swaying shadows on the stone walls. It will need to be replaced soon, but will serve its purpose this evening. Craftily tracing his scalpel over the outline of her mouth, he slices away fatty flesh. As it slides away, her true frame is revealed and more accessible now. The Dentist listens to the oboe, as he lifts salts to her nose, bringing her back around.

The Dark Masque
A short story by Incredible Mouse


Lyla often found herself there, in the wee hours of the night, quietly observing the old French architecture that served as a backdrop to the collage of people coming and going. It had called to her again, as if from a barely audible voice echoing inside her head. It wasn't the first time she had listened to this voice pulling her to the dimly lit streets of the French Quarter. The laughter and gawking of the day tourists were gone now; they were already snug in their hotels, sleeping. This hour was reserved for the hard-core stumbling drunks, the entranced raving hipsters, the sexual deviants, and the Goths who rejected society and hid beneath black dress. On rare nights she would spot a late-running haunted tour, serving as the final emptying of brave vacationing wallets. It was actually quite amusing to watch from afar. The tours often included a third-party actor, who would stealthily follow the group under the guise of shadow and macabre outfits. Their job was simply to make a point of being occasionally noticed by the group - as if they, too, were being haunted this very night. The guide would play along for everything it was worth. It must be a fun job, telling stories, watching goose bumps rise on the arms of your willing tourists. These ancient streets have an immense history of haunted stories, but $29.95 probably won't raise the dead from the cobblestones.

The fog was rolling in from the Mississippi, providing just enough chill in the air to keep most patrons inside. As Lyla walked down one of the narrow entrance streets to the Quarter, she passed a jazz club and could faintly hear the bustling sounds of music and laughter coming from within. She tucked her scarf closer to her face to ward off the bite of the air. She couldn't see more than twenty feet ahead through this fog; fortunately, its cool, damp smell drowned out the customary stench of thick rotting beer. Each new building and passing figure was a materializing mirage, as each step gave way to clearer vision. There wouldn't be any tours tonight. Not in this weather.

She came upon an antique shop and stopped at the window to admire the headboard on display. Made of darkly stained solid wood, it was hand carved into flowing water and waves crashing into rough rocks. The clouds at the top were ominous, with deeply carved recessions. She couldn't imagine it in her home, and her girlfriend would probably burn it on the spot, but it was stunning nonetheless. She found the art in New Orleans just as fascinating as its music and its people. Just then, as if pulling her from behind, the voice in her head awoke. She turned around and looked to the other side of the street. There, through the fog, was a small shop. Its grey facade and yellowing window were lit by two muted, flickering lamps hanging from either side of the building. They must have been electric lamps built to give the illusion of gaslight, though they convincingly appeared to be flames, throwing unsteady shadows over the items on display. The window was filled with Mardi Gras masks, twenty or thirty of them, looking across the street, right at her. She stepped off the sidewalk slowly toward the store. It was a store she had never noticed before, even though she had been down this particular street twice. The building was out of place in what was normally a continuing theme of ornate French finishing and balconies laden with overhanging ferns. It was a slender single story that seemed far older than any of the other buildings in the area, as if the termite-infested wood were being held up by the will of time itself. She was not even sure it was entirely vertical; it appeared to slope inward at the top and was shorter than the shops on either side. She wondered if the building had simply been growing there for a while, squeezing its way up, slowly pushing the others out of the way. It was the masks, though, that seemed to be yearning for someone to admire them. The movements of light on their faces made them seem less porcelain, their expressions ever changing.

As she neared the window and the masks became less hazy, she could see some that were happy and smiling, some that were gemmed like royalty, and others that were well-crafted animals. They were stock items in this town of tourists. But it was the mask near the top that captured her eyes. It was the porcelain face of a woman, half black, half white, and much less ornate than the others. Long swaying feathers seemed to grow from just above its brow, draping white on one side, black on the other. The shadows moving over the face depicted conflicting emotions, as if the mask were constantly at odds with itself. Its deep red lips looked delicate but powerfully sensual. Lyla looked up at this mask, unable to figure it out. What was it trying to say? She began to notice, even though they seemed to have been there the whole time, two faintly glowing white dots just below the mask. Squinting and focusing her vision there, she realized the dots were a reflection on the glass. Were those eyes staring at her? She heard someone whistle. Lyla spun around and was startled to see, in the middle of the street, a man hunched over his bottle of booze, making his way down the street. She exhaled, relaxed, and frowned at herself. The man called to her and asked for a smoke. She told him she did not smoke. He mumbled something, then continued on his way. Lyla started toward Bourbon Street.

Walking again down the sidewalk, she ducked and just missed a branch lurking from above. The rest of the tree came into sight, from a yard enclosed by a wrought-iron fence whose gate creaked slowly as it moved with the wind. Just ahead, to the side of the house, she glanced down a dark and narrow alley. Two silhouetted figures were facing each other. They seemed to be kissing. As Lyla started to look away, it seemed that one of them had looked at her. Two eyes, apparently catching the moonlight, had shone for a moment. She looked back again, but the figures were now walking away. The goose bumps were now crawling up her arms. She laughed at herself and shook her head, thinking about the haunted-tour actors.

Lyla arrived at her favorite hole-in-the-wall. It was commonly known as The Silversmith, even though the sign out front was so old and worn that it was no longer readable. It was a relatively quiet lesbian bar that was always dark inside, with one TV seemingly stuck on a sports channel. Music from the jukebox made actually listening to the TV utterly impossible. Not being the hip place to be, The Silversmith garnered just enough patrons to form a group of regulars. At the same time, being on Bourbon St. meant it was common for intoxicated visitors to curiously stumble in. This let Lyla sit by herself at the far end of the bar and usually go unnoticed. She could admire the local ambiance and enjoy a cocktail and some music. As she made her way through the smoke-filled room, without really looking, she noticed the usual ten or so regulars grouped around the front of the television. A cute couple was laughing and dancing off to the side. At the back of the bar, she pulled out a bar stool and retrieved a pack of cigarettes from the pocket of her wool coat. After making sure there was an ashtray nearby, and lighting her cigarette, she smirked at the thought of the drunk who had asked her for a smoke. Lyla sat down, removed her scarf, and laid it in her lap. The stone fireplace on the wall behind her instantly began warming her up. The old woman behind the bar made her way over. The poor woman had grisly coal-black hair and a face that looked ten times older than her age. She was rough around the edges, but always attentive to her patrons’ drink levels. Lyla ordered a margarita on the rocks, no salt - her favorite drink. Even though this wasn't the first time she had been in this bar and ordered this same drink from the old woman, there was never any sign of recognition, never a “welcome back” greeting. No doubt this attitude wasn't going to win her bartender of the year, but Lyla loved it. Small talk bothered her.

The cute couple was now leaning against the wall, flirting with each other. Kissing and groping would surely begin soon. Through the window at the front of the bar, Lyla could see the usual foot traffic of Bourbon St. partiers passing by. After she had been listening and observing for a while, her drink started taking effect. She could feel the slight buzz in her body. If she continued on to another drink, her nose would begin to go numb. This always signified it was time to stop. But she ordered another drink anyway. Other than a few curious street wanderers, the crowd hadn't changed much from the original regulars, all huddled around the sports channel. Lyla noticed that one of the passing silhouettes in the window had stopped. Its dark shape was strange indeed as it stood still, then slowly turned. Lyla became frightened when the shape reminded her of the feathers sprouting above the head of the mask in the shop window. Two eyes began to glow on its face. Suddenly, a woman who had been watching the game on TV jumped up and yelled “Son of a bitch.” Two of the other women also jumped up and laughed as they clinked their drinks and toasted an apparent goal. They blocked the view of the window. Lyla stood up, trying to get a better view around them. As they all settled back down in their seats, Lyla no longer saw the dark figure. She looked at the floor and wondered, ‘What kind of craziness is this? I must be going insane!’ She took her seat again, keeping her eyes on the window. She noticed the lusting couple had retreated up the stairs to the second-floor bar. Maybe the voice was talking to her again, maybe it was an onset of worry, or maybe it was just the alcohol talking, but whatever the reason, she felt drawn to go upstairs as well. She gathered her things, while continuing to check the window.

It was so much darker upstairs; she could hardly tell where she was walking. Lyla noticed the cute couple ordering drinks from the bartender in the center of the room. This bartender was considerably younger than the old woman downstairs, and she was wearing an elaborate 17th-century black dress layered in gray doilies. The irony of the young woman wearing the old dress was not lost on her. A few others were hunched over the bar, dressed in black, sporting tattoos and neckbands of metal spikes. She didn't want to appear social with them, so she continued around a corner to an adjoining room. It had a pool table, several booths, and doors that opened onto the balcony. She considered going out there to see if she could spot the figure from the window, but the idea of actually seeing what she pictured in her mind was quickly defeated by a feeling of fear. She took a seat at one of the booths and finished off her second drink. Another couple was making out at a nearby table. Only the candles on the tables lit the room.

Under the archway of the entrance, Lyla noticed the tall silhouette standing there. She panicked. Her heart raced, as if it were going to jump out of her chest. The figure stepped toward the candlelight. Lyla could now make out its black tux and cape. The feathered mask from the shop was staring directly at her with beaming white eyes. The beautiful, delicate lips were now separated by two protruding, elongated teeth. Blood dripped from its mouth. Lyla squirmed further into her booth as she noticed that the two women making out at the nearby table were now staring back at her, also wearing feathered masks! Lyla closed her eyes and screamed as the masked figure walked toward her.

When she opened her eyes, the mask was staring back at her from behind the old shop window, and she knew now what it was trying to say to her. Another set of automobile headlights passed behind her and cast two eye-like dots of light in the reflection on the glass. She smiled contentedly, turned, and headed home.