
Database Modeling, Summer 2019          


ACME Direct is a direct marketer of books, music, videos and magazines.  The Marketing Director of ACME Direct tested a new book title slightly over one year ago and has decided, based on the results of the test, to promote this title to selected names from the database.  Last month ACME Direct purchased, for the first time, new list enhancement data (age, income, marital status, home value, etc.) not previously on the customer database.

 

Using the saved sample from the original test promotion one year ago, the analyst is preparing to develop a regression model which will aid in predicting the type of customer most likely to order this particular book title.  The Marketing Director has asked the analyst to append the new enhancement data to the sample in order to see if any of this “new enhancement data” will come into the regression equation.

 

Do you have any concerns regarding the marketing director’s request?  If so, explain your concerns to the Marketing Director in 75 words or less.

 

Yes. I am concerned that appending the new data to the sample for the regression analysis could bias the dataset and increase the error of the final model. Appended data poses several challenges. For instance, the new data could have been collected from a different population, with different samples or over a different time frame, among other differences. As a result, it may not share the validity of the preexisting data, and a regression model built on the combined data could be prone to larger estimation error. The assumptions of homogeneity and normality may not be met, or could be altered by the new data.

 

IB> wrong reason. Use simple logic. No statistics. Statement A shows something cannot achieve the desired result.

 

 

 

 

Question #2 (5 points each part, 10 points total)

 

 

ACME Direct is a direct marketer of books, music, videos and magazines.  Below are two customers selected at random from the ACME database.

 

  1. a) Based on this information alone, which customer do you believe is most likely to order an upcoming book promotion and why?

 

 

Customer | Total Promotions | Total Book Promotions | Total Book Orders
Smith | 79 | 47 | 4
Johnson | 61 | 33 | 3

 

 

 

 

 

Treating each book promotion as a binomial trial, the probability of a purchase can be estimated for both customers as the ratio of total book orders to total book promotions. For customer Smith, the ratio is 4/47 ≈ 0.085. For customer Johnson, it is 3/33 ≈ 0.091. Assuming a constant probability of success for each customer, it takes more promotions to convince Smith to purchase a book than Johnson. Therefore, there is reason to believe Johnson has a slightly higher probability of ordering from the upcoming book promotion than Smith. However, there are probabilities of about 0.915 and 0.909 that a single promotion will not yield a purchase from Smith and Johnson respectively.
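The arithmetic behind this comparison can be checked with a short script. This is only a sketch: the figures come from the customer table in the question, while the names `customers` and `order_rate` are chosen here for illustration.

```python
# Book promotion history from the question's table: (book promotions, book orders).
customers = {
    "Smith": (47, 4),
    "Johnson": (33, 3),
}


def order_rate(promotions, orders):
    """Share of book promotions that produced a book order."""
    return orders / promotions


rates = {name: order_rate(p, o) for name, (p, o) in customers.items()}
best = max(rates, key=rates.get)

for name, rate in rates.items():
    print(f"{name}: order rate {rate:.3f}, no-order rate {1 - rate:.3f}")
print("Higher historical order rate:", best)
```

The gap between the two rates is small, and each rests on only a handful of orders, so the ranking should be read with caution.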

 

 

 

 

 

 

 

 

  1. b) If Smith’s last known order date is 4/15/00 and Johnson’s last known order date is 2/6/99, would you change your mind regarding who you selected to promote in part (a)? Fully explain your answer.

 

No, I would not. The decision is based on the evidence in the preexisting data. That data has been collected over a long period, forms a valid sample, and yields facts that can be tested statistically. Although consumer behavior changes, one would need a valid dataset to test the change and conclude that it is significant. The dates of the most recent purchases are not enough to conclude that the purchase patterns or behavior of the two customers have changed. That information alone cannot be tested for statistical meaning.

 

 

IB> OK

 

 

 

 

 

 

 

 

 

 

 

 

 

Question #3 (5 points each part, 20 points total)

 

 

 

ACME Direct is a direct marketer of books, music, videos and magazines.  They have a database of size 10,000,000.  For each of the pairs of data elements residing on the ACME Direct database, describe how you think they would be correlated (positive, negative or zero), the degree of the correlation (strong, moderate or slight), and defend your answer.

 

IB> Do a) – e) over. Use book.

  1. a) “Total number of promotions since last order” associated with each customer and “total number of orders ever.”

 

 

Ideally, there should be no correlation between the two variables. Statistically, the result would be a weak positive correlation. The reason is that the number of promotions since the last order has no impact on orders that were made before the last order. Therefore, the two variables would show only a slight correlation that arises from the numbers alone. The correlation is highly unlikely to be statistically significant. The variables have no direct relationship.

 

 

 

 

 

  1. b) “Customer age” and “total number of cookbooks purchased ever.”

 

 

This would be a strong positive correlation, whereby cookbook purchases increase with age. Previous research suggests that reading increases as age increases. Although some readers develop the habit at a young age, they may not be potential buyers and would have to depend on their elders for book purchases. Also, society has a culture in which people tend to read more books as they grow older. Cookbooks are more likely to be purchased as people age and try new recipes. An increase in age is therefore highly likely to be accompanied by an increase in cookbook purchases.

 

 

 

 

  1. c) “Total number of cookbooks purchased ever” and “total number of videos of any kind purchased.”

 

 

 

This would be a strong positive correlation. An increase in cookbook purchases will be accompanied by an increase in video purchases. People who purchase cookbooks are normally enthusiastic people who are ready to learn new things and have time to read and watch. Therefore, the behavior of cookbook buyers and video buyers is similar, and there would be a strong relationship between the two variables.

 

 

 

  1. d) “Total products returned for full refund” and “total products ordered.”

 

 

 

This would be a moderate positive correlation. As the number of products ordered increases, the number of refunds is also likely to increase. This is because the bookseller is handling more orders and is therefore exposed to more refund claims. When total products ordered declines, full-refund claims decline as well.
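To make "positive" and "moderate" concrete, here is a minimal sketch of the Pearson correlation coefficient for a pair of variables such as those in this question. The five customers below are made up purely for illustration and are not drawn from the ACME database.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# HYPOTHETICAL values for five customers (illustration only).
total_ordered = [2, 5, 8, 12, 20]
total_refunded = [1, 0, 2, 0, 2]

r = pearson_r(total_ordered, total_refunded)
print(f"r = {r:.2f}")
```

A value of r near +1 or -1 indicates a strong relationship, a value near 0 indicates little or none, and the sign gives the direction.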

 

 

 

 

 

Question #4 (5 points each calculation, 15 points total)

Last year a sample of names from the books product line primary customer segment was test promoted for a new cookbook offer.  The ACME analyst has built a regression model for this sample of names.  The resulting cumulative and incremental gains charts developed on the validation sample are shown below.  Fill in the 3 missing numbers on this chart.

 

  1. a) Fill in the 3 missing numbers, showing exactly how they are calculated.

Number 1 = 5.60

Calculations:

Response Rate = Number of Responses / Number of Names Promoted

IB> a) and b) and c) are missing. Show exact calculations and all details.

Your #7 is done in the style required.

  1. b) How do you calculate the gain of 30 for bucket 7?

  1. c) How do you calculate the cumulative RR of 6.5 for bucket 3?

Question #5 (5 points each part, 10 points total)

Using the gains charts shown in Question #4, the Senior Product Manager at ACME Direct will determine who to promote for his upcoming cookbook promotion.  To ensure 5% profit-after-overhead for this campaign, the Senior Product Manager has determined he should not promote any group of customers with a response rate below 4.00%.

 

If the primary customer segment for the books product line (the universe the regression model was built on) represents 3,450,000 names,

 

  1. a) How many names should the senior product manager promote?

 

IB> a) and b) are missing. Show exact calculations and all details.

 

 

 

 

 

 

 

 

 

 

 

  1. b) What will be his expected number of orders in roll-out?

 

IB> a) and b) are missing. Show exact calculations and all details.
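The calculation the two parts call for can be sketched as follows. The gains chart from Question #4 is not reproduced in this copy, so the bucket response rates below are HYPOTHETICAL placeholders, and the sketch assumes equal-sized buckets and that the 4.00% cut-off applies to each bucket's incremental (not cumulative) response rate. Only the universe size and the cut-off come from the question.

```python
UNIVERSE = 3_450_000      # names in the primary book segment (from the question)
MIN_RESPONSE_RATE = 4.00  # percent; the manager's cut-off (from the question)

# HYPOTHETICAL incremental response rates (percent) per bucket, best bucket first.
# Placeholders only: substitute the validation-sample figures from the gains chart.
incremental_rr = [8.0, 6.5, 5.0, 4.5, 4.0, 3.5, 3.0, 2.5, 2.0, 1.0]


def rollout_plan(universe, rates, cutoff):
    """Return (names to promote, expected orders) for buckets at or above the cut-off."""
    bucket_size = universe // len(rates)  # assumes equal-sized buckets
    selected = [r for r in rates if r >= cutoff]
    names = bucket_size * len(selected)
    orders = sum(bucket_size * r / 100 for r in selected)
    return names, orders


names, orders = rollout_plan(UNIVERSE, incremental_rr, MIN_RESPONSE_RATE)
print(f"Promote {names:,} names; expect about {orders:,.0f} orders")
```

With the real chart, the same two steps apply: count the names in every bucket whose response rate is at least 4.00%, then multiply each of those buckets' names by its response rate and add up the results.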

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Question #6 (5 points each part, 15 points total)

 

 

Answer the following questions regarding regression analysis:

 

  1. a) In 75 words or less please explain what causes multicollinearity?

 

 

IB> a) is OK. But b) and c) below are not mentioned in our text nor in the classwork or exercise. Please do over

 

Multicollinearity is a condition of high inter-correlation among predictor variables. It occurs when the same kind of variable is repeated, for example, weight in kilos and weight in pounds. It also occurs when dummy variables are used incorrectly, for example, when a dummy variable is included for every category instead of leaving one category out. Moreover, including a variable that is computed from other variables in the dataset, or variables that are highly correlated with each other, also results in multicollinearity.

 

 

 

  1. b) In 75 words or less please describe two ways one can identify that multicollinearity is present in a model.

 

One:

The variance inflation factor (VIF) can be used to detect multicollinearity in a model. Statistical software reports it as a multicollinearity diagnostic. Each predictor in turn is regressed, as the dependent variable, on the other predictors. The R-squared of that regression shows how much of the variation in one predictor is explained by the others, and VIF = 1 / (1 - R-squared), so a large VIF signals multicollinearity.

 

Two:

The sample collected for the analysis can itself be used to check for multicollinearity. The sample is divided into two parts, the model is fitted to each part, and the coefficients are compared. If the coefficients from the two parts differ drastically, the dataset is said to have multicollinearity. In other words, unstable coefficients are a sign of multicollinearity.
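As a minimal sketch of the VIF idea, consider the special case of only two predictors, where the R-squared of one predictor regressed on the other equals their squared Pearson correlation. The weights below are made-up values echoing the kilos and pounds example, and the function names are chosen here for illustration. The threshold of 10 is a common rule of thumb, not a figure from the course text.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R^2); with two predictors, R^2 is their squared correlation."""
    r_squared = pearson_r(x1, x2) ** 2
    if r_squared >= 1.0:
        return float("inf")  # perfectly collinear predictors
    return 1.0 / (1.0 - r_squared)


# HYPOTHETICAL weights: the same quantity recorded in two units, with rounding.
kilos = [55, 62, 70, 81, 90, 68, 74, 59]
pounds = [round(k * 2.20462) for k in kilos]

vif = vif_two_predictors(kilos, pounds)
print(f"VIF = {vif:.0f}", "(multicollinearity likely)" if vif > 10 else "")
```

With more than two predictors, each predictor is regressed on all of the others, which is what statistical packages do when they report VIF.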

 

 

 

 

 

 

  1. c) In 75 words or less please describe two ways that one can rid a model of multicollinearity.

 

One:

 

One way to handle multicollinearity is principal component analysis (PCA), which is available in statistical software. PCA computes the eigenvalues and eigenvectors of the predictors' correlation matrix. An eigenvalue near 0 indicates a near-linear dependency among the predictors. The predictors can then be transformed into principal components using the eigenvectors, and the regression run on the components instead of the original variables.

 

 

 

 

Two:

 

The second method of getting rid of multicollinearity is to use ridge regression estimates. These estimates tend to be stable: small changes in the data from which the regression model is fitted cause only small changes in the estimates.
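For reference, the standard ridge estimator can be written as below. The notation is an assumption here, since the course text may use different symbols: X is the matrix of predictors, y the response, I the identity matrix, and λ ≥ 0 a penalty chosen by the analyst.

```latex
\hat{\beta}_{\text{ridge}} = (X^{\top} X + \lambda I)^{-1} X^{\top} y
```

With λ = 0 this reduces to ordinary least squares. A larger λ shrinks the coefficients, which is what keeps them stable when predictors are highly correlated.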

 

 

 

 

 

 

 

 

Question #7 (10 points)

 

 

ACME Direct, a direct marketer of books, music, videos and magazines, has built a multiple regression model predicting who is likely to order a World War II video set based on a saved sample of 9,994 names promoted for this product last fall.  Below is a copy of the EXCEL output from the regression run.  The dependent variable was the typical binary indicator (1=order, 0=silent).

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Score customers Jones and Matthews on the above equation and indicate which one is most likely to order this video product.  Assume today’s date is 5/1/00.

 

 

Customer | Last Order Date | Total Promotions Sent (All Product Lines) | Total Paid Orders (All Product Lines) | Total Video Orders | Gender
Jones | 1/1/00 | 35 | 3 | 1 | male
Matthews | 10/1/99 | 66 | 7 | 3 | unknown

 

 

 

 

Jones Scores:

 

 

Matthews Scores:

 

 

 

 

 

Based on the results of the regression model, Jones is more likely to order than Matthews.
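Scoring a customer on a regression equation means plugging the customer's values into intercept + Σ(coefficient × value). The Excel output is not reproduced in this copy, so no coefficients appear in the sketch below and the function names are illustrative. It also assumes the model uses months since last order as one of its inputs, which the instruction to use 5/1/00 as today's date suggests but does not confirm.

```python
from datetime import date


def months_since(last_order, today):
    """Whole calendar months between two dates (day of month ignored)."""
    return (today.year - last_order.year) * 12 + (today.month - last_order.month)


def linear_score(intercept, coefficients, features):
    """Intercept plus the sum of coefficient * value for each model variable."""
    return intercept + sum(coefficients[name] * features[name] for name in coefficients)


TODAY = date(2000, 5, 1)  # "today's date" given in the question
recency = {
    "Jones": months_since(date(2000, 1, 1), TODAY),
    "Matthews": months_since(date(1999, 10, 1), TODAY),
}
print(recency)
```

Once the coefficients from the regression output are available, each customer's score comes from passing them to `linear_score` together with that customer's values, and the customer with the higher score is the more likely buyer.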

 

 

 

Question #8 (5 points each part, 10 points total)

 

 

 

Below is a cross-tabulation of two variables produced on a sample of 10,000 customers test promoted for the book “The Secret Lives of Our U.S. Presidents” denoted as SECRETS.  The two variables used in this cross tabulation are “Time in Months Since Last Book Order” and “Number of Books Ordered Ever.”  Answer the following questions about this cross tabulation.

 

 

 

 

 

 

Columns: Time in Months Since Last Book Order

Number of Books Ordered Ever | 1-10 | 11-20 | 21-40 | 41-60 | 61+ | Total
1 | 46/525 = 8.76% | 30/445 = 6.74% | 16/321 = 4.98% | 3/135 = 2.22% | 0/29 = 0.00% | 95/1,455 = 6.53%
2 | 42/387 = 10.85% | 77/978 = 7.87% | 44/645 = 6.82% | 10/289 = 3.46% | 1/55 = 1.82% | 174/2,354 = 7.39%
3-5 | 44/349 = 12.61% | 95/789 = 12.04% | 149/1,256 = 11.86% | 39/443 = 8.80% | 12/250 = 4.80% | 339/3,087 = 10.98%
6-10 | 41/298 = 13.76% | 40/306 = 13.07% | 69/534 = 12.92% | 67/567 = 11.82% | 24/282 = 8.51% | 241/1,987 = 12.13%
11+ | 0/0 = N/A | 5/27 = 18.52% | 34/232 = 14.66% | 60/444 = 13.51% | 52/414 = 12.56% | 151/1,117 = 13.52%
Total | 173/1,559 = 11.10% | 247/2,545 = 9.71% | 312/2,988 = 10.44% | 179/1,878 = 9.53% | 89/1,030 = 8.64% | 1,000/10,000 = 10.00%

173 represents the number of customers that fell into this cell who ordered SECRETS.

1,559 represents all customers (orders & non-orders of SECRETS) falling into this cell.

11.10% represents the percent of those that fell into this cell that ordered SECRETS.

 

 

 

 

  1. a) What is the SECRETS order rate of the names in this sample that have placed only 2 book orders and whose last book order was 11-20 months ago?

 

 

 

IB> Missing: Needs a calculation

 

 

 

 

 

 

  1. b) What is the SECRETS order rate of the names in this sample that have placed fewer than 6 book orders ever?

 

 

 

 

 

IB> Missing: Needs a calculation
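Both parts reduce to reading orders and names off the cross-tabulation and dividing. The sketch below shows one way to do that. The cell values are copied from the table, and the names `CROSS_TAB`, `cell_rate` and `rows_rate` are chosen here for illustration.

```python
# (orders of SECRETS, names promoted) per cell, copied from the cross-tabulation.
# Columns: months since last book order.
MONTH_BANDS = ["1-10", "11-20", "21-40", "41-60", "61+"]
CROSS_TAB = {
    "1":    [(46, 525), (30, 445), (16, 321), (3, 135), (0, 29)],
    "2":    [(42, 387), (77, 978), (44, 645), (10, 289), (1, 55)],
    "3-5":  [(44, 349), (95, 789), (149, 1256), (39, 443), (12, 250)],
    # The source prints 10/306 for the 11-20 cell of this row, but the row and
    # column totals (241 and 247) and the stated 13.07% only balance with 40.
    "6-10": [(41, 298), (40, 306), (69, 534), (67, 567), (24, 282)],
    "11+":  [(0, 0), (5, 27), (34, 232), (60, 444), (52, 414)],
}


def cell_rate(books, months):
    """Order rate (percent) for a single cell of the cross-tabulation."""
    orders, names = CROSS_TAB[books][MONTH_BANDS.index(months)]
    return 100 * orders / names


def rows_rate(book_rows):
    """Order rate (percent) pooled over every cell in the given rows."""
    cells = [cell for row in book_rows for cell in CROSS_TAB[row]]
    orders = sum(o for o, _ in cells)
    names = sum(n for _, n in cells)
    return 100 * orders / names


part_a = cell_rate("2", "11-20")        # 77 / 978
part_b = rows_rate(["1", "2", "3-5"])   # (95 + 174 + 339) / (1,455 + 2,354 + 3,087)
print(f"a) {part_a:.2f}%   b) {part_b:.2f}%")
```

Part b) pools orders and names across the three rows before dividing. Averaging the three row percentages would be wrong because the rows contain different numbers of names.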
