Tools and Concepts
A quick reference guide to marketing analytics tools and concepts.
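Tools

ANOVA: Analysis of variance (ANOVA) tests for significant differences between group means. It assumes independent observations, approximately normally distributed residuals, and roughly equal variances across groups. Balanced group sizes are not strictly required, but they make the test more robust when the group variances differ.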
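A minimal sketch of the one-way ANOVA F statistic in plain Python, assuming three hypothetical ad variants with made-up response rates (illustrative numbers, not real data). The statistic compares variance between group means to variance within groups, and a large F suggests the means differ. Turning F into a p-value needs the F distribution (SciPy's `scipy.stats.f_oneway` does both steps), which is left out here to keep the example dependency-free.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)       # number of groups
    n = len(all_values)   # total number of observations
    # Between-group sum of squares: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical response rates (%) for three ad variants
variant_a = [4.1, 3.9, 4.3, 4.0]
variant_b = [5.2, 5.0, 5.4, 5.1]
variant_c = [4.0, 4.2, 3.8, 4.1]
f, dfb, dfw = one_way_anova_f([variant_a, variant_b, variant_c])
print(f"F({dfb}, {dfw}) = {f:.2f}")
```

Here variant B's mean stands well apart from the other two while each group is tightly clustered, so F comes out large.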
Bayesian analysis: Combines "prior" information with data to produce predictions expressed as posterior probabilities. For example, sales force models may indicate a prior probability of response of 5%, but the posterior probability for a given individual, based on that individual's age, income, and so on, can be higher than 5%. If the company ignores individuals scoring under 5% and solicits only those scoring above 5%, it should see increased profits [See the SAS blog].

Clustering Analysis: An exploratory data analysis tool that sorts different objects into groups. Cluster analysis can discover structures in data without explaining why those structures exist. Common methods include joining (tree clustering), two-way joining (block clustering), and k-means clustering.

Conjoint analysis: Respondents are shown several descriptions at one time, with the variables set at different levels, and asked to select the one description they prefer. This process is repeated a number of times for different combinations of levels and features. The output indicates which combinations of features and levels are of greatest interest to consumers. If you don't have many levels of variables to test, MaxDiff may be a better choice of analysis.

Decision Tree: The goal of decision trees (also called classification trees) is to predict or explain responses on a categorical dependent variable. More traditional methods with similar goals, such as discriminant analysis, cluster analysis, nonparametric statistics, and nonlinear estimation, carry more stringent theoretical and distributional assumptions. The flexibility of classification trees makes them a very attractive analysis option.

JMP: SAS created JMP in 1989 to empower scientists and engineers to explore data visually. It performs powerful statistical analysis linked with interactive graphics, in memory and on the desktop.
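The 5% example in the Bayesian analysis entry can be made concrete with Bayes' rule. This sketch assumes hypothetical figures: a 5% prior response rate, with 30% of past responders but only 10% of non-responders falling into a given age/income segment. Under those assumptions, the posterior response probability for someone in that segment rises well above the prior, which is the reasoning behind soliciting only individuals who score above 5%.

```python
def posterior_response(prior, p_segment_given_response, p_segment_given_no_response):
    """Bayes' rule: P(response | segment)."""
    # Law of total probability: overall chance of landing in the segment
    p_segment = (p_segment_given_response * prior
                 + p_segment_given_no_response * (1 - prior))
    return p_segment_given_response * prior / p_segment

prior = 0.05  # hypothetical baseline response rate from the model
posterior = posterior_response(prior, 0.30, 0.10)
print(f"prior = {prior:.1%}, posterior = {posterior:.1%}")
```

With these assumed inputs the posterior is 0.015 / 0.11, about 13.6%. When the segment is equally common among responders and non-responders, the posterior simply equals the prior, because the segment carries no information about response.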
Latent Class Analysis (LCA): A statistical method for finding subtypes of related cases (latent classes) from multivariate categorical data. For example, it can be used to find distinct types of attitude structures from survey responses, consumer segments from demographic and preference variables, or examinee subpopulations from their answers to test items. LCA is used in a way analogous to cluster analysis (grouping cases of interval data) and factor analysis (grouping attributes of interval data).

MaxDiff: An approach for obtaining preference or importance scores for multiple items (brand preferences, brand images, product features, advertising claims, etc.). MaxDiff is superior to rating scales in that it provides greater discrimination among items, the MaxDiff question is easier to understand, and there is no opportunity for scale-use bias, since choices are based on pairwise comparisons rather than ratings.

R: A free programming language and software environment for statistical computing and graphics. The R language is widely used among statisticians and data miners for developing statistical software and for data analysis. Polls and surveys of data miners show that R's popularity has increased substantially in recent years.

R-squared: A statistical measure of how close the data are to the fitted regression line. It is a classic diagnostic tool for evaluating linear regression, and it is also classically misused: R-squared can be artificially improved simply by adding more variables ("overfitting"). Regression in general is also prone to measurement bias, distributional assumptions, and the other limitations of parametric statistics.

SQL: Structured Query Language, the lingua franca of database programming, typically used to manage databases and compute basic summary statistics. Its many dialects (Oracle SQL, SAS SQL, Microsoft SQL) differ enough to prevent cross-compatibility of code, and SQL is typically limited to basic tabulations, as opposed to the statistical analysis, inference, and predictive modeling done in tools such as SAS, R, SPSS, and JMP.

SAS (Statistical Analysis System): A software suite developed by SAS Institute for advanced analytics, business intelligence, data management, and predictive analytics. It holds the largest market share in advanced analytics.

SPSS (Statistical Package for the Social Sciences): A computer program typically used for survey authoring and operation, statistical analysis, and text analytics. It was originally devised for managing data generated in social science studies.

TURF Analysis: Total Unduplicated Reach and Frequency analysis. It answers questions like "Where should we place ads to reach the widest possible audience?" or "How much market share will we gain if we add a new line to our product lineup?"
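The core TURF calculation is reach: the share of respondents touched by at least one item in a portfolio, counting each person only once (unduplicated). A minimal sketch, assuming a hypothetical survey in which each respondent lists the ad channels they use, is an exhaustive search over every portfolio of a fixed size. The channel names and responses are invented for illustration.

```python
from itertools import combinations

def reach(portfolio, respondents):
    """Fraction of respondents reached by at least one item in the portfolio."""
    # A non-empty set intersection means this respondent is reached; count them once
    hit = sum(1 for uses in respondents if uses & portfolio)
    return hit / len(respondents)

def best_turf(items, respondents, size):
    """Exhaustively search all portfolios of `size` items for maximum reach."""
    best = max(combinations(items, size),
               key=lambda combo: reach(set(combo), respondents))
    return set(best), reach(set(best), respondents)

# Hypothetical survey: the set of channels each respondent uses
respondents = [
    {"tv"},
    {"tv", "social"},
    {"social"},
    {"radio"},
    {"search", "social"},
    {"tv"},
]
channels = ["tv", "radio", "social", "search"]
portfolio, r = best_turf(channels, respondents, 2)
print(sorted(portfolio), f"reach = {r:.0%}")
```

In this toy data, TV and social each reach three respondents but share one, so together they reach five of six rather than six. Exhaustive search is fine for a handful of items; with many items the number of combinations explodes, and greedy or other heuristic searches are common, at the cost of no optimality guarantee.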
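The R-squared entry's warning about adding variables can be checked directly. This sketch, in plain Python to avoid dependencies, fits ordinary least squares by solving the normal equations, then refits after adding a column of pure random noise (a hypothetical, meaningless predictor). The data are simulated, and the names `ad_spend` and `sales` are illustrative only. For least squares with an intercept, R-squared never decreases when a predictor is added, whereas adjusted R-squared, 1 - (1 - R²)(n - 1)/(n - k - 1), penalizes each extra predictor and so does not automatically reward the noise column.

```python
import random

def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols_r_squared(X, y):
    """Fit y = b0 + b1*x1 + ... by least squares; return (R^2, adjusted R^2)."""
    n = len(y)
    rows = [[1.0] + list(r) for r in X]  # prepend an intercept column of ones
    p = len(rows[0])                     # coefficients, including the intercept
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    b = solve(xtx, xty)
    fitted = [sum(bi * ri for bi, ri in zip(b, r)) for r in rows]
    y_bar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    k = p - 1  # number of predictors, excluding the intercept
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return r2, adj_r2

# Simulated data: sales depend on ad spend plus random error
random.seed(0)
n = 30
ad_spend = [random.uniform(0, 10) for _ in range(n)]
sales = [2.0 + 1.5 * x + random.gauss(0, 2) for x in ad_spend]
noise = [random.gauss(0, 1) for _ in range(n)]  # hypothetical irrelevant predictor

r2_one, adj_one = ols_r_squared([[x] for x in ad_spend], sales)
r2_two, adj_two = ols_r_squared([[x, z] for x, z in zip(ad_spend, noise)], sales)
print(f"ad spend only:     R^2={r2_one:.4f}  adj={adj_one:.4f}")
print(f"plus noise column: R^2={r2_two:.4f}  adj={adj_two:.4f}")
```

The second R-squared is always at least as large as the first, even though the added column is noise by construction; comparing the adjusted values is one simple guard against reading that gain as a better model.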
Concepts

Big Data: A general term for the voluminous amount of unstructured and semistructured data a company creates, data that would take too much time and cost too much money to load into a relational database for analysis. A primary goal of examining big data is to discover repeatable business patterns. A commonly cited estimate is that unstructured data, most of it in text files, accounts for at least 80% of an organization's data.
