Workshop Overview


Ceres Analytics In-House Workshop
Please Choose 4 of 8 Topics Below
For examples of applicability, see Statistical Answers to Common Business Questions
For pricing and Excel-based tools, see Workshop Deliverables

1. Comparison Tests

What

Test whether two groups are the same, or whether one group is the same in two different time periods.

When

When there is a reason to expect a difference, such as:
- implementation of a policy or program (one group before a marketing campaign vs. after)
- an experiment (two groups at the same time: one received a drug and the other didn't)

Where/Who

- If "When" is before vs. after, then "Who" would be the affected group (the same subjects at both times)
- If "When" relates to an experiment, then "Who" would be two different groups (different subjects at the same time)

Why

To learn whether an action (business policy, administration of drug, marketing campaign) makes a difference

How

- The kind of test depends on the underlying distribution of the trait being measured. Different tests have different definitions of the "center of the distribution". Some tests do not require a center.
- Many health or social phenomena are normally distributed (the familiar bell curve, which is symmetric about its center).
- Many business phenomena are skewed to one end (e.g., retail transactions: there are often more at the low end than the high).
- In this seminar, we review how to choose the tests, and then we conduct the tests.
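As a rough illustration of the two "Who" cases above, the sketch below computes a paired t statistic (the same subjects before vs. after) and a Welch t statistic (two different groups at the same time). Both assume the trait is roughly bell-shaped. The function names and the sample numbers are hypothetical, and the seminar's own tools may work differently.

```python
import math
import statistics

def paired_t(before, after):
    """t statistic for the same subjects measured at two times."""
    diffs = [a - b for b, a in zip(before, after)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error

def welch_t(group_a, group_b):
    """t statistic for two different groups measured at the same time."""
    var_a = statistics.variance(group_a) / len(group_a)
    var_b = statistics.variance(group_b) / len(group_b)
    mean_gap = statistics.mean(group_a) - statistics.mean(group_b)
    return mean_gap / math.sqrt(var_a + var_b)

# Hypothetical weekly purchases for four customers, before and after a campaign
before = [10, 12, 9, 11]
after = [12, 14, 10, 13]
print("paired t:", paired_t(before, after))

# Hypothetical outcomes for a treated group and an untreated group
treated = [4, 5, 6]
untreated = [1, 2, 3]
print("Welch t:", welch_t(treated, untreated))
```

A t statistic far from zero suggests a real difference. For skewed data such as retail transactions, a rank-based test would usually be a better fit than either of these.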

2. Cluster Analysis

What

Find groups that contain subjects (customers, students, research participants, products) that are as similar to other subjects in the group as possible, while the groups are as different from each other as possible

When

You want to learn what key groups constitute the whole. You have a few (or several) traits that are strategically important to grouping.

Where/Who

Customers, research participants, products, etc.

Why

- Knowledge of key groups--and their sizes--can help a business to tailor strategy to its market. By using cluster analysis to segment the market, we're assured that the market segments will be defined in groups that are as unique as possible in terms of the traits used to define them.
- Profiles of the clusters are telling. Often segments are named based on the relative value of the traits (e.g., " 'Thrifty frequent buyers' are the source of 30% of our business despite their small transactions, simply because they shop 2-3 times a week.")

How

Ceres' research and experience have led to a method with consistently superior results. That method is programmed in the Ceres platforms and covered in this seminar. Graphic profiling tools are also implemented.
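The Ceres method itself is not described on this page. Purely as a generic illustration of the idea under "What" (similar subjects together, groups far apart), here is a minimal k-means sketch. It is a standard textbook technique, not the Ceres method, and the function name, starting rule, and customer data are all assumptions.

```python
import math

def kmeans(points, k, iterations=20):
    """Group points into k clusters by repeatedly moving cluster centers."""
    centers = [list(p) for p in points[:k]]  # naive start: the first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest center
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Move every center to the average of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical customers: (visits per week, average transaction in dollars)
customers = [(1, 1), (1, 2), (10, 10), (11, 10)]
labels, centers = kmeans(customers, k=2)
print(labels)
print(centers)
```

The cluster centers double as simple profiles: each one is the average trait values of its segment.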

3. Linear Regression Analysis

What

Predict the level of a continuous measure from other measures, including continuous, qualitative, and yes/no characteristics

When

You want to forecast something or understand the relative impacts of its drivers (magnitude, certainty)

Where/Who

- Aggregates (e.g., total sales) or individuals (amounts of customer purchases)
- Time Series (quarterly home values, student GPAs, GDP), Cross-sections (processing time for different manufacturing pieces), or Pooled Cross-section and Time Series (panel data for a group of people observed every year)

Why

- Forecasting business measures (unit sales, revenues, number of customers) can lead to reliable planning for both supply and financial purposes
- Understanding the relative impacts of a measure's drivers enables "what if" scenario analysis. Hypothetical values for the drivers can translate into expected forecasts for unforeseen circumstances (e.g., a substantial drop in GDP for a sales forecast driven by GDP)

How

- In the case of only one predictor, linear regression places the line closest to the data (in light of a few mathematical considerations)
- The line has a formula that is a lot like the old "y=mx+b" of middle-school algebra. In that formula, "m" represents "rise over run". In business, it could instead represent something like sales per dollar spent on advertising.
- In linear regression, each unit step in a predictor (X) begets a constant step in what you're predicting (Y, also called the "dependent variable"); the size of that step is the slope, "m"
- Multiple regression works the same way, except that the "line" becomes abstract in numerous dimensions
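For the one-predictor case, the "closest line" can be sketched with ordinary least squares, which is the usual meaning of "closest" here. The function name and the advertising and sales figures are hypothetical.

```python
def fit_line(x, y):
    """Return slope m and intercept b of the least-squares line y = m*x + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b

# Hypothetical data: advertising spend and sales, both in thousands of dollars
advertising = [1, 2, 3, 4]
sales = [5, 7, 9, 11]
m, b = fit_line(advertising, sales)
print("sales per advertising dollar:", m)

# "What if" scenario: expected sales at a spend level not yet observed
print("forecast at spend of 6:", m * 6 + b)
```

Here "m" plays the role described above: the expected change in sales for each additional unit of advertising.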

4. Logistic Regression Analysis

What

Predict a yes/no answer from other measures, including continuous, qualitative, and yes/no characteristics

When

You want to predict a yes/no outcome or understand what drives it (magnitude, certainty)

Where/Who

- Many yes/no outcomes merit prediction: customers' decisions whether to buy a product; members' decisions whether to increase or decrease subscribed services; whether a patient lives or dies

Why

- Understanding the key drivers of a yes/no outcome enables targeting of actions that evoke a yes/no response

How

- In the case of only one predictor, logistic regression places the "S-curve" closest to the data (in light of a few mathematical considerations)
- The S-curve's formula is a lot like linear regression's "y=mx+b". But instead of "unit steps in Y", you're predicting "proportionate steps in the odds of Y" (Y=1 is usually yes; Y=0 is usually no).
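The "proportionate steps in the odds" idea can be checked numerically. In the sketch below, the slope and intercept are made-up values rather than fitted ones, and the function names are hypothetical. Each unit step in the predictor multiplies the odds of "yes" by the same factor, exp(m).

```python
import math

def probability(x, m, b):
    """S-curve: probability that Y = 1 at predictor value x."""
    return 1 / (1 + math.exp(-(m * x + b)))

def odds(p):
    """Odds of yes: probability of yes divided by probability of no."""
    return p / (1 - p)

m, b = 0.5, -1.0  # hypothetical slope and intercept

for x in (2, 3, 4):
    p = probability(x, m, b)
    print(x, round(p, 3), round(odds(p), 3))

# The ratio of odds between neighboring x values is constant and equals exp(m)
ratio = odds(probability(3, m, b)) / odds(probability(2, m, b))
print("odds multiplier per unit step:", ratio, "exp(m):", math.exp(m))
```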

5. Decision Tree (CHAID)

What

CHAID (Chi-squared Automatic Interaction Detection) identifies segments (of customers, patients, accounts) based on key interactions of predictive variables

When

- Predictors are expected to compound one another and/or to be mutually exclusive.
- One measure can be selected as the basis on which to derive splits (i.e., as the "dependent variable", just like "Y" in regression)

Where/Who

- CHAID is used in identifying key interactions that explain yes/no questions
- A similar technique, C&RT (Classification and Regression Trees), can answer questions posed by both yes/no and continuous variables (e.g., for the latter, "What combinations of customer characteristics segment the level at which they spend?")

Why

- A decision tree gives a "50,000 foot view" of the terrain that comprises the question a business is examining
- When a question begins with "What combinations of...", the answer often begins with a decision tree, and may end with the decision tree, as well.

How

- Automated search techniques find the breakpoints in predictors, and select the predictors, which correspond most strongly to the divisions of the dependent variable
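One step of that search can be sketched as follows: compute a chi-square statistic between each candidate predictor and the yes/no outcome, then split on the predictor with the strongest association. This is a simplified illustration. Full CHAID also merges similar categories and adjusts for multiple comparisons, which is omitted here, and the function name and data are hypothetical.

```python
from collections import Counter

def chi_square(predictor, outcome):
    """Chi-square statistic for the table of predictor category vs. outcome."""
    n = len(outcome)
    row_totals = Counter(predictor)
    col_totals = Counter(outcome)
    observed = Counter(zip(predictor, outcome))
    stat = 0.0
    for row, row_total in row_totals.items():
        for col, col_total in col_totals.items():
            expected = row_total * col_total / n
            stat += (observed[(row, col)] - expected) ** 2 / expected
    return stat

# Hypothetical accounts: did the customer buy (1) or not (0)?
bought = [1, 1, 0, 0]
predictors = {
    "region": ["N", "N", "S", "S"],
    "gender": ["F", "M", "F", "M"],
}

# Pick the predictor most strongly associated with the outcome
best = max(predictors, key=lambda name: chi_square(predictors[name], bought))
print("first split on:", best)
```

A tree is grown by repeating the same search within each resulting segment.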

6. Factor Analysis
(Principal Components)

What

Factor Analysis is a data reduction technique. It reduces a large number of variables to a few key, underlying composites

When

- The objective is to represent key concepts in a few lean, efficient predictors
- The objective is to determine how many truly independent concepts are embodied in a big "mish-mash" of data

Where/Who

Factor Analysis is well-known for its use in stock market analysis, where multitudes of financial accounting measures are collected into risk factors like "the Growth Factor", "the Volatility Factor", etc. (cf. MSCI Barra, http://www.msci.com/products/indices/strategy/risk_premia/factor/)

Why

- Comparisons based on a composite are easier to interpret than numerous comparisons
- In regression analysis, if your data matrix has more columns than rows, then the regression has no unique solution unless you compact many of its columns into fewer composites.

How

- Principal Components shares the mathematics of physics' quantum mechanics:

- eigenvalues report the amount of information captured by the first--and each subsequent--composite variable
- eigenvectors designate the weights applied to source variables to generate the composites
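To make the two roles concrete: for the first composite, the eigenvalue measures how much variance it captures and the eigenvector holds the weights on the source variables. The sketch below finds both by power iteration, one simple way to obtain the largest eigenvalue. The covariance matrix and function name are hypothetical.

```python
import math

def first_component(cov, iterations=200):
    """Largest eigenvalue and its eigenvector for a covariance matrix."""
    n = len(cov)
    v = [1.0] * n  # starting guess; assumes it is not orthogonal to the answer
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient: the variance captured along direction v
    eigenvalue = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(n)) for i in range(n)
    )
    return eigenvalue, v

# Hypothetical covariance matrix of two financial measures
cov = [[4.0, 2.0], [2.0, 3.0]]
eigenvalue, weights = first_component(cov)
total_variance = cov[0][0] + cov[1][1]
print("share of information captured:", eigenvalue / total_variance)
print("weights on the source variables:", weights)
```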

7. Linear Discriminant Analysis

What

Linear Discriminant Analysis articulates "bright lines" that distinguish groups of individuals (customers, patients, accounts, etc.)

When

It's crucial to understand the relative importance of predictors that separate numerous groups from one another (logistic regression can have multiple outcomes, as well, but LDA is more straightforward)

Where/Who

Market segments (customers, patients, accounts) that have contrasting profiles on key measures.

Why

- Understanding the relative importance of distinguishing factors enables a business to improve service to important segments
- When the measures that distinguish segments can be influenced by management, then a business has an opportunity to improve its results by helping customers migrate "across the line" from a bad to a good segment

How

Like Principal Components, LDA is also rooted in eigenvalues. But its objective is different. Whereas PC reduces the number of columns (variables) of a data matrix, LDA articulates the separation of data rows through a function that classifies them.
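With only two groups and two measures, the eigenvalue problem collapses to a closed form: the "bright line" direction is the inverse of the pooled within-group scatter matrix times the difference in group means. The sketch below covers only that special case. All names and data are hypothetical.

```python
def mean_vector(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def scatter(rows, mean):
    """2x2 within-group scatter matrix (sums of squares and cross-products)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_line(group_a, group_b):
    """Weights and cutoff of the line that separates two groups."""
    ma, mb = mean_vector(group_a), mean_vector(group_b)
    sa, sb = scatter(group_a, ma), scatter(group_b, mb)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Cutoff: the score of the point midway between the two group means
    cutoff = sum(wi * (a + b) / 2 for wi, a, b in zip(w, ma, mb))
    return w, cutoff

def classify(row, w, cutoff):
    score = w[0] * row[0] + w[1] * row[1]
    return "A" if score > cutoff else "B"

# Hypothetical segments: (visits per month, support calls per month)
good = [(5, 1), (6, 2), (7, 1), (6, 0)]
bad = [(1, 1), (2, 2), (3, 1), (2, 0)]
w, cutoff = fisher_line(good, bad)
print("weights:", w, "cutoff:", cutoff)
print(classify((5.5, 1), w, cutoff))
```

The relative sizes of the weights indicate which measures do the most to separate the segments, provided the measures are on comparable scales.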

8. Neural Network
Backward propagation (backpropagation) as an introduction to machine learning

What

- An early technique of artificial intelligence
- Neural networks focus on pattern recognition
- Neural nets preceded most current analytic methods of machine learning

- Many statistical methods have machine-learning counterparts, some of which are referred to as "kernel methods".
- By gaining familiarity with neural nets, an analyst can quickly gain the background to branch out into kernel methods

When

- Only the resulting prediction is required; the relative importance of the predictors can remain unknown within a "black box"
- Historical patterns are non-linear and virtually intractable

Where/Who

Same as Linear Regression Analysis

Why

- When direct association between the dependent variable and the predictors cannot be observed, or complex interactions are suspected, a neural net can find a way to associate the outcome with the inputs
- Often, a neural network model can serve as a "test" to determine whether reliable predictions can be obtained when regression has failed, giving the analyst guidance on how to find the interactions

How

- Within the constraints provided to it, the neural net algorithm takes a blind guess at how to combine the inputs to fit the outputs
- The algorithm compares the predicted value to the actual, and adjusts the parameters that generated its guess
- If the fit is better, the algorithm continues in a similar direction. If the fit is worse, the algorithm switches direction and changes one or more parameters in a different way
- The comparison/continuation-vs-switch cycle continues for a pre-determined number of iterations. Each time, the network "learns" a bit more about which direction it should be changing.
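The guess, compare, and adjust cycle can be sketched with the smallest possible network: a single neuron. This is an illustration under simplifying assumptions, not a full multi-layer network. The starting weights are zero rather than random, and the adjustment uses the prediction error to set both the direction and the size of each change, which is what backward propagation does at each neuron. All names and data are hypothetical.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def predict(x, weights, bias):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train_neuron(inputs, targets, iterations=5000, rate=0.5):
    """Fit one neuron by repeated compare-and-adjust passes over the data."""
    weights = [0.0] * len(inputs[0])  # uninformed starting guess
    bias = 0.0
    for _ in range(iterations):  # pre-determined number of iterations
        for x, target in zip(inputs, targets):
            error = predict(x, weights, bias) - target  # compare to actual
            # Adjust each parameter against the direction of its error
            weights = [w - rate * error * xi for w, xi in zip(weights, x)]
            bias -= rate * error
    return weights, bias

# Hypothetical pattern: the outcome is "yes" if either input is present
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 1, 1, 1]
weights, bias = train_neuron(inputs, targets)
for x in inputs:
    print(x, round(predict(x, weights, bias), 3))
```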