Statistical Research Projects. I need the Final Report Format and Contents, including all 5 phases. The requirements for each phase and for the final report are attached.
statistical_research_projects_spring_2018.docx
Statistical Research Projects
Project Objective: Application of statistical concepts and techniques taught in QM 292.
Project Outcomes: Formulation of research hypothesis and relevant model specification, data set
construction, and empirical analysis of model results.
Students can work as individuals or in teams of two. The project consists of 5 phases with each phase
due to the instructor by midnight of the prescribed due date. Each completed Phase must be submitted
to me electronically by the due date. The project is worth 100 points and constitutes 10% of your total
grade for this course.
Failure to submit a completed phase by the due date will result in a penalty of one letter grade or 10
points from the total project grade.
A note of caution: if you choose to work in a team and your team member fails to submit
an individual phase by the due date, or to submit the project in its entirety by the due date,
both team members will be assessed penalty points. If your team member withdraws
from the course, you alone will be responsible for the project which includes meeting all
required deadline submissions.
Graduating seniors who fail to submit the final project as an individual or as part of a
team will receive an ‘I’ (incomplete for the course).
The project is due April 29, 2018 in electronic format.
PHASE 1 – January 26, 2018
a. Identify and state a research question of interest. The research question must be stated
such that multiple linear regression can be applied to the analysis of the question. In
Phase 1, address (a) why your research question is of interest to the reader and (b) why
your research question is relevant.
Examples of questions where multiple linear regression is not applicable:
a. Is there any difference in BMI between males and females? This is a test of means.
b. Does Google stock have greater performance variability as compared to Yahoo? This is
a variance test.
Examples of research questions might be:
c. Females are more likely to attrite from college than males. Alternatively, this can be
phrased as ‘Is there any difference in the college attrition rate between the genders?’
d. Is the consumer price index a good predictor of Christmas retail spending?
e. Was President Obama’s American Recovery and Reinvestment Act successful in
stimulating the US economy after the May 2007 housing crash?
f. What factors are most likely to influence stock performance?
g. Does parental income and education influence BMI?
h. To what extent is defense spending influenced by world oil prices and US oil demand?
i. What factors (demographic, macroeconomic, industrial spending, environmental) are
most likely to predict the outbreak of Ebola or other diseases (polio, malaria, bird flu)?
PHASE 2 – February 9, 2018
2. In a general format, state the model. In other words, identify your dependent and independent
variables. Your model must include at least 3 independent variables, but not to exceed 7
independent variables.
a. EXAMPLE: If your research question is “Does parental income and education influence
BMI?”, then the dependent variable would be BMI and the independent variables
would be parental education and parental income.
i. The general model statement: BMI = f(parental education, parental income).
2b. Identify the source or sources of your data. When using cross-sectional data ALL data must
be pulled from the same time period but can be pulled from different sources. Identify the
source and time period for each data element.
2c. Data sets must have a minimum of 50 observations, but not to exceed 100 observations.
2d. Include a definition of each data element, data source, and period for which data element is
captured.
For example: BMI = Body Mass Index, measured as body mass divided by the square of
the individual’s height. Data source: Health and Human Services, www.hhs.gov, fiscal year 2010,
state level data.
Example #2: Parental income = combined household annual income. Data source:
www.bls.gov, state level data for 2012.
IMPORTANT: Your dependent variable cannot be binary, categorical/ranking, or strictly a
discrete variable. Your dependent variable must be a continuous variable. Your independent
variables can be all continuous or a mix of discrete and continuous. Your independent variables
CANNOT be strictly discrete variables.
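The Phase 2 rules above can be checked mechanically before a model statement is submitted. The following is a minimal sketch in Python, not part of the assignment; the function name `check_model_spec`, the `kind` labels, and the third variable `region_south` are hypothetical and introduced only for illustration.

```python
def check_model_spec(dependent, independents, n_observations):
    """Return a list of problems with a Phase 2 model specification.

    Each variable is a dict with a "name" and a "kind" that is either
    "continuous" or "discrete" (labels assumed for this sketch).
    """
    problems = []
    if not 3 <= len(independents) <= 7:
        problems.append("model needs 3 to 7 independent variables")
    if not 50 <= n_observations <= 100:
        problems.append("data set needs 50 to 100 observations")
    if dependent["kind"] != "continuous":
        problems.append("dependent variable must be continuous")
    if all(v["kind"] == "discrete" for v in independents):
        problems.append("independent variables cannot all be discrete")
    return problems


# BMI = f(parental education, parental income, region_south)
# region_south is a hypothetical 0,1 dummy added to reach 3 variables.
bmi = {"name": "BMI", "kind": "continuous"}
independents = [
    {"name": "parental_education", "kind": "continuous"},
    {"name": "parental_income", "kind": "continuous"},
    {"name": "region_south", "kind": "discrete"},
]
issues = check_model_spec(bmi, independents, n_observations=50)
```

An empty list means the specification meets the stated counts and variable types; it says nothing about whether the research question itself is suitable for multiple linear regression.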
Submission of Phases I – II: Must be submitted in WORD format. The word document should be
attached to your email. Do not send Phase I and II as part of the body of an email. As part of the Phase
II submission include Phase I. Definition of variables (see Phase II ‘2b’) should include the time period for which the data is
captured, an explicit definition of how the variable is measured, or how the variable will be transformed for
inclusion in Phase IV. For example:
INCORRECT: Weight – ‘how much a person weighs.’
CORRECT: Weight – ‘weight as measured in pounds’, data source Center for Disease Control,
www.cdc.gov, individual level data, 2010.
INCORRECT: Unemployment rate – ‘the unemployment rate’
CORRECT: Unemployment rate – ‘the number of people unemployed per 1000’, data source Bureau of
Labor Statistics, www.bls.gov, state level data, 2011.
Transformed Variables (see Phase 3 for example): Categorical variables such as gender, race, color,
manufacturing sector, team, or geographic region need to be transformed into quantitative
variables. For example, if gender is captured as M or F, it will need to be transformed into a 0,1 variable.
Example: M = 0 and F = 1, or M = 1 and F = 0. The definition should read: if M then M = 0, and if F then
F = 1.
Example: Let’s assume your data contains 4 geographic regions: North, South, East, and West. For
Phase II you will need to define the states that comprise the North geographic region, and similarly for East,
West, and South. In Phase III, you will need to transform these variables into columns of 0,1 dummy
variables. See examples below.
PHASE 3 – February 23, 2018
3. Analysis of your research question requires data. There are three general sources of data: (1)
survey, (2) historical data, and (3) experimental data.
a. Survey data is response data from questionnaires. Students can construct a survey and
collect their own data for purposes of this project by PERMISSION OF THE INSTRUCTOR
ONLY. Students wishing to use survey data must provide me a copy of the survey prior
to data collection for review. There are issues of privacy and human subjects protection
that must be considered with every survey.
b. Historical data is largely collected by public and private institutions. There are
numerous publicly available data sources on the web:
i. http://www.rfe.org/showCat.php?cat_id=2
ii. http://www.bls.gov/data/
iii. http://www.freefinancialdata.com/stocks/index-data/
iv. http://www.census.gov/compendia/statab/
v. http://www.healthdata.gov/
vi. http://nces.ed.gov/
vii. www.city-data.com
c. Experimental data – data obtained from controlled experiments. Using experimental
data is not recommended for the purposes of this project.
Identify your data source(s). Once the data source(s) has/have been identified, input the data into an
Excel spreadsheet. Your data set must contain a minimum of 50 observations. All data must be
cross-sectional. Cross-sectional data is defined as different observations within the same time period.
Examples of Excel data sets are provided below:
Example 1
Company                   Earn/Share %Growth  Net Income  Revenue %Growth  Revenue  Annual %Return  StkPrice %Change
4Kids Entertainment                      312        32.2              100     77.9             201                -8
RF Micro Devices                         179        55.9              129    325.1             164                78
Siebel Systems                           190       166.2              117   1171.2             173               406
Network Appliance                        115        73.8               80    579.3             224               432
Source Information Mgmt.                 112        11.2              125     90.9              74               -26
Salton                                   161        83.4               71    795.9              90                38
Zomax                                     88        30.3              115    238.4              92                35
12 Technologies                           96        58.4               71    750.5             138               711
Diamond Tech Partners                    300        16.2               50    136.2             133               472
Stericycle                               760        15.3               72    186.7              44                73
Example 2
Home  Price $K  Size sq feet  Number of Bathrooms  Niceness  Pool? yes=1; no=0
   1     260.9          2666                  2.5         7                  0
   2     337.3          3418                  3.5         6                  1
   3     268.4          2945                  2.0         5                  1
   4     242.2          2942                  2.5         3                  1
   5     255.2          2798                  3.0         3                  1
   6     205.7          2210                  2.5         2                  0
   7     249.5          2209                  2.0         7                  0
Submission of Phase III: Phase III must be submitted in EXCEL format. Attach Phase I-II in your
submission. This allows me to verify the accuracy of your data and data format in terms of your Phase III statements and definitions. See examples provided earlier in the document as to how your data
should be captured.
Excel CANNOT read alpha characters. All alpha characters must be transformed into 0,1 variables. See
examples below.
Transformation of variables – geographic example provided above:
Geographic Region  North  East  West  South
N                      1     0     0      0
N                      1     0     0      0
W                      0     0     1      0
E                      0     1     0      0
S                      0     0     0      1
S                      0     0     0      1
E                      0     1     0      0
Transformation Example: For Individual level Race Variable
Race              White  African American  Hispanic  Asian
White                 1                 0         0      0
White                 1                 0         0      0
Asian                 0                 0         0      1
African American      0                 1         0      0
Hispanic              0                 0         1      0
African American      0                 1         0      0
White                 1                 0         0      0
Transformation Example: Individual Level Gender Variable
Gender  Gender Binary (M = 1, F = 0)
M       1
F       0
F       0
M       1
Transformation Example: National Football Conference – American and National League
League    American = 1, National = 0
American  1
National  0
American  1
American  1
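The transformations above (region, race, gender, league) all follow the same rule: one 0,1 column per category, with a 1 wherever the observation belongs to that category. A minimal sketch of that rule in Python follows; the assignment itself requires the transformed columns in Excel, and the helper name `to_dummies` and its `drop` parameter are hypothetical names used only for this illustration.

```python
def to_dummies(values, drop=None):
    """Turn a categorical column into 0,1 columns, one per category.

    `drop` names a baseline category to leave out of the result.
    """
    categories = sorted(set(values))
    return {
        category: [1 if value == category else 0 for value in values]
        for category in categories
        if category != drop
    }


# Geographic Region column from the example above
regions = ["N", "N", "W", "E", "S", "S", "E"]
region_dummies = to_dummies(regions)

# Gender column from the example above, coded M = 1 and F = 0
gender_dummies = to_dummies(["M", "F", "F", "M"], drop="F")
```

Passing `drop` leaves one category out, which is one way to produce the reduced set of dummy columns that a regression needs.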
PHASE 4 – March 23, 2018
4. Using the Excel ‘Data Analysis ToolPak’, run your regression. Save your output to a separate
spreadsheet. A tutorial for this add-in can be found at
http://cameron.econ.ucdavis.edu/excel/ex61multipleregression.html
Submission of Phase IV – Phase IV must be submitted in EXCEL format. Include in your submission
Phase I-III in the format discussed above. There is a high probability that your initial regression outputs
will be incorrect or ‘explode’. Indications that your regression output is incorrect include:
a. Blank cells in the output
b. Observe ‘NUM’ in a cell
c. Very high R-squared
d. Negative R-squared
e. R-squared that exceeds ‘1’
f. Excel error message that the data is not compatible
If you should encounter these problems, check the following:
1. Cells do NOT contain formulas.
2. Cells do not contain alpha characters, spaces, blank cells, or other non-numeric values.
3. Dummy variables are correctly captured.
4. That the sum or difference of any two columns does not equal a third column
5. In the case of dummy variables, you must drop (do not include) at least 1 of the dummies.
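Several of these checks can be automated once the data is exported from the spreadsheet. The sketch below, in Python, assumes the data has been loaded as a dictionary mapping each column name to its list of cell values; the function names and the small `sample` data set are hypothetical and exist only to illustrate the checks.

```python
from itertools import combinations


def non_numeric_cells(columns):
    """List (column, row, value) for every cell that is not a number."""
    bad = []
    for name, cells in columns.items():
        for row, cell in enumerate(cells, start=1):
            if isinstance(cell, bool) or not isinstance(cell, (int, float)):
                bad.append((name, row, cell))
    return bad


def summed_columns(columns, tol=1e-9):
    """List (a, b, c) where column a + column b equals column c.

    A difference a - b = c is the same relation as b + c = a, so
    checking sums over every pair also catches differences.
    """
    found = []
    names = list(columns)
    for a, b in combinations(names, 2):
        for c in names:
            if c in (a, b):
                continue
            rows = zip(columns[a], columns[b], columns[c])
            if all(abs(x + y - z) <= tol for x, y, z in rows):
                found.append((a, b, c))
    return found


def is_full_dummy_set(dummy_columns):
    """True when the dummies sum to 1 in every row (none was dropped)."""
    return all(sum(row) == 1 for row in zip(*dummy_columns))


# Hypothetical data: revenue is exactly cost + profit in every row.
sample = {"revenue": [10, 20, 30], "cost": [4, 5, 6], "profit": [6, 15, 24]}
dependent = summed_columns(sample)
```

Run `non_numeric_cells` first, since `summed_columns` assumes every cell is already numeric.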
I will review each regression model for accuracy. I strongly encourage you to run your regressions as
soon as possible. In other words, I would not recommend waiting until the due date to run your
regressions.
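The assignment requires the regression to be run in Excel. As an optional cross-check of the Excel coefficients, the same least-squares estimates can be computed from the normal equations. The following Python sketch is an illustration under that assumption, not the instructor's method; the function name `ols` and the tiny data set are hypothetical.

```python
def ols(X, y):
    """Least-squares coefficients [intercept, b1, ..., bk].

    X is a list of rows of independent-variable values and y is the
    list of dependent-variable values. Solves (X'X) b = X'y by
    Gauss-Jordan elimination with partial pivoting.
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]  # add intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("columns are linearly dependent; drop a variable")
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]


# Hypothetical data generated from y = 1 + 2*x1 + 3*x2
X = [[1, 2], [2, 1], [3, 5], [4, 3]]
y = [9, 8, 22, 18]
coefficients = ols(X, y)
```

Passing every dummy column of a category together with the intercept makes the columns linearly dependent, so the function raises `ValueError`; this is the same condition that causes the regression output to fail when no dummy is dropped.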
PHASE 5 – Due April 29, 2018. PHASE 5 will not be accepted late – see syllabus.
5. Report your findings/results from the regression analysis. Project report should not exceed 8
pages EXCLUDING appendices, can be single or double spaced, and must be in Times New
Roman, 12 point font, with 1 inch margins. You must be sure to follow the outline below and
address all questions, discuss all metrics, output, findings, and conclusion as outlined below.
Final reports will include Phase I-IV information and for Phase V need to discuss/include the
following:
Final Report Format and Contents
READ THE INSTRUCTIONS CAREFULLY
Follow the outline as exactly detailed below. Address each section and address the content under
each section. Failure to follow the outline or address the content of each section will result in points
deducted.
Papers must be submitted DOUBLE SPACED, 11 or 12 point font, 8 Page limit NOT including
appendices or references.
Papers MUST be submitted in format specified, labeled, and addressing points discussed below.
Phase 5 Research Paper Outline:
1. Cover page – including title of project, your name, date, and an abstract. Abstract should
not exceed 3 paragraphs.
b. NOT INCLUDED IN 8 page limit
2. Introduction – in well written English, discuss Phase I and Phase II part 2b.
a. Why is your research question of interest? Why is it relevant?
3. Data Section – discuss your data, data definitions, any transformations, data source, data time
period – this section references Phase II part 2b and Phase III.
a. DO NOT include a copy of your data tables in the body of the paper. You may include a
copy of the data table in the Appendix –but this is optional.
4. Regression Output (Phase IV) – In this section, discuss any modifications to the original model
statement specified in Phase II. For example, deletion of variables, alternative specification of
the original model statement, etc.
a. Your regression output is part of the appendix and NOT part of the main paper.
5. Results
a. Discuss how your regression results do or do not support Research
questions/hypothesis.
b. Using the relevant test statistics (listed below), discuss your findings. Does the data support
your research question/hypothesis?
i. F-Test – discuss the test metric including purpose (what is it testing), results,
decision/conclusion as it relates to your research question.
ii. R-squared – discuss the test metric including purpose (what is it testing), results,
decision/conclusion as it relates to your research question.
iii. Statistical significance based on t-test – discuss the test metric including purpose
(what is it testing), results, decision/conclusion as it relates to your research
question.
iv. Marginal effects – Using the data and coefficients, provide examples of and
discuss marginal effects. In cases where the coefficients are (a) discrete or (b)
statistically insignificant discuss the limitations of computing marginal effects.
v. Prediction – using your regression output, provide an example of model
prediction. In other words, choose values within the range of your independent
variables, compute the predicted value, and discuss this prediction in light of
your research hypothesis.
c. Conclusion – summarize the empirical results, explain how the regression findings do or do not
support your hypothesis, discuss shortcomings of the model, how you would improve the
model, and ideas for future research.
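The metrics in the Results section can be reproduced by hand from the regression output. The Python sketch below uses the standard definitions of R-squared and the overall F statistic and shows a prediction and a marginal effect. The coefficients, variable names, and input values are hypothetical and were NOT estimated from any data set in this document.

```python
def r_squared(actual, fitted):
    """Share of the variation in `actual` explained by `fitted`: 1 - SSE/SST."""
    mean = sum(actual) / len(actual)
    sse = sum((a - f) ** 2 for a, f in zip(actual, fitted))
    sst = sum((a - mean) ** 2 for a in actual)
    return 1 - sse / sst


def f_statistic(r2, n, k):
    """Overall F statistic for n observations and k independent variables."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


def predict(intercept, coefficients, values):
    """Predicted value of the dependent variable for one observation."""
    return intercept + sum(coefficients[name] * values[name] for name in coefficients)


# Hypothetical coefficients for a house-price model (price in $K).
intercept = 10.0
coefficients = {"size_sqft": 0.08, "pool": 5.0}

# Prediction for a 2,500 square foot home with a pool.
home = {"size_sqft": 2500, "pool": 1}
predicted_price = predict(intercept, coefficients, home)

# Marginal effect of one extra square foot: the change in the prediction
# when size rises by 1 and everything else is held fixed.
bigger_home = dict(home, size_sqft=home["size_sqft"] + 1)
marginal_effect = predict(intercept, coefficients, bigger_home) - predicted_price
```

For a continuous variable the marginal effect equals its coefficient. For a 0,1 variable such as `pool`, a one-unit change is the only meaningful change, so the coefficient is read as the difference between the two groups.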
Appendices (NOT INCLUDED IN 8 PAGE LIMIT):
a. Appendix A: For each variable in your data set provide descriptive statistics. Discuss
2-3 of the variable summary/descriptive statistics. For example, discuss cases where the average value
of a variable in your data set differs from national or state level means, or where a variable shows a large
variance or a non-normal distribution.
b. Appendix B: Include table of regression results.
c. Appendix C: Copy of original data – optional
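The descriptive statistics for Appendix A can be produced in Excel; as an illustration of what such a summary contains, here is a minimal Python sketch using the standard `statistics` module. The helper name `describe` is hypothetical, and the input is the Price $K column from Example 2 above.

```python
import statistics


def describe(values):
    """Basic descriptive statistics for one variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std_dev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


# Price ($K) column from Example 2
price = [260.9, 337.3, 268.4, 242.2, 255.2, 205.7, 249.5]
summary = describe(price)
```

Comparing `mean` with `median`, and `std_dev` with the range between `min` and `max`, gives a first indication of skewed distributions or unusually large variance worth discussing.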