To request a specific blog topic or if you have any questions email James@StatisticsSolutions.com

Tuesday, March 12, 2013

Coefficient of Determination



  • The coefficient of determination (R2) is used to determine how much variance two variables share, or how much variance in an outcome variable is explained, or accounted for, by a set of variables (predictors).
  • Values can range from 0.00 to 1.00, or 0 to 100%.

  • In terms of regression analysis, the coefficient of determination is an overall measure of the accuracy of the regression model.

  • In simple linear regression analysis, this coefficient is calculated by squaring the r value between the two variables, where r is the correlation coefficient.


  • It helps to describe how well a regression line fits the data (a.k.a., goodness of fit). An R2 value of 0 indicates that the regression line explains none of the variance in the set of data points, and a value of 1 indicates that the regression line perfectly fits the set of data points.

  • By definition, R2 is calculated by one minus the Sum of Squares of Residuals (SSerror) divided by the Total Sum of Squares (SStotal):  R2 = 1 – (SSerror / SStotal).

  • In the case of a multiple linear regression, R2 never decreases when a predictor is added, so including predictor variables that are too correlated with one another (referred to as multicollinearity) can make the coefficient of determination look higher than the extra predictors warrant.

  • If, for whatever reason, there is multicollinearity in the regression model, the Adjusted R Squared (Adjusted Coefficient of Determination) should be interpreted.  The Adjusted R2 can take on negative values, and is always less than or equal to the Coefficient of Determination.  Note: The Adjusted R2 will only increase if an added predictor variable improves the regression model more than would be expected by chance; otherwise it decreases.

  • Inversely, the Coefficient of Non-Determination is the amount of variance in an outcome variable that is unexplained, or unaccounted for, by another variable or by a set of variables (predictors). The Coefficient of Non-Determination is simply 1 – R2.
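
As a rough illustration of the points above, the sketch below computes R2 from the sums of squares, checks that it matches the squared correlation in a simple linear regression, and derives the Coefficient of Non-Determination and the Adjusted R2. The data values and function names are made up for this example, and the Adjusted R2 formula used, 1 – (1 – R2)(n – 1)/(n – k – 1), is the commonly cited one rather than something stated in this post.

```python
from math import sqrt
from statistics import mean

def fit_simple_regression(x, y):
    """Least-squares slope and intercept for one predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

def r_squared(y, y_hat):
    """R2 = 1 - (SSerror / SStotal)."""
    y_bar = mean(y)
    ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_error / ss_total

def adjusted_r_squared(r2, n, k):
    """Adjusted R2 for n observations and k predictors (assumed formula)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope, intercept = fit_simple_regression(x, y)
y_hat = [intercept + slope * xi for xi in x]

r2 = r_squared(y, y_hat)
r = pearson_r(x, y)
non_determination = 1 - r2
adj_r2 = adjusted_r_squared(r2, n=len(y), k=1)

print(round(r2, 4), round(r ** 2, 4))   # the two values agree
print(round(non_determination, 4))      # unexplained share of variance
print(round(adj_r2, 4))                 # never larger than R2
```

With only five data points and one predictor, the Adjusted R2 comes out noticeably lower than R2, which shows how strongly the adjustment penalizes small samples.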

Monday, March 4, 2013

When to use descriptive Statistics to answer RQs



·         Descriptive statistics are the appropriate analyses when the goal of the research is to present the participants’ responses (as frequencies and percentages and/or as means and standard deviations) to survey items in order to address the research questions.  There are no hypotheses in descriptive statistics.

·         Descriptive statistics include frequencies and percentages for categorical (ordinal and nominal) data, and averages (means and/or medians) with measures of spread (ranges and standard deviations) for continuous data.  Frequency is the number of participants that fit into a certain category or group; it is also beneficial to know the percent of the sample that falls into that category/group.  Percentages express each frequency as a share of the sample and are typically presented without decimal places (according to APA 6th ed. standards).  Typically, the average that is calculated/presented is the mean.  Means describe the average value of a continuous item, and standard deviations describe the spread of those values around the mean.  

·         You cannot (statistically) infer results with descriptive statistics. Inferential (parametric and non-parametric) statistics are conducted when the goal of the research is to draw conclusions about the statistical significance of the relationships and/or differences among variables of interest.

·         Power analyses (sample size and effect size) can be conducted when the analyses used to address the research questions are inferential, but not for descriptive statistics.  There is no minimum sample size required to conduct descriptive statistics.

·         Descriptive statistics are appropriate when the research questions are similar to the following:

      • What is the percentage of X, Y, and Z participants?
      • How long have X, Y, and Z participants been in a certain group/category?
      • What are, or describe, the factors of X?
      • What is the average of variable Y?
      • How much do X participants agree about a certain topic?
      • What are, or describe, the similarities and/or differences on a certain topic by group/category?
·         Example: a study was conducted on a group of college students about specific courses offered, where the questions had “check all that apply” responses.  The study’s research question asked “What courses offered to college students are most prevalent?”  Descriptive statistics would be the appropriate analysis to address the research question.  Frequencies and percentages could be calculated for the survey’s listed courses that students took or registered for.  See the table below for details.

Frequencies and Percentages on the Survey’s Listed Courses

Course                      n     %
English composition 101    35    25
Chemistry 101              53    66
Algebra 101                16     4
Pottery                     2     1
Intro to Psychology        70    85
Art 101                    72    86
Note.  Percentages may not total 100 due to rounding error and participant allowance to select multiple responses.
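
A minimal sketch of how frequencies and percentages like these could be tallied from “check all that apply” responses, along with a mean and standard deviation for a continuous item. The responses, ages, and function name below are hypothetical and are not the data behind the table above.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical "check all that apply" responses, one list per participant.
responses = [
    ["Art 101", "Intro to Psychology"],
    ["Art 101", "Chemistry 101"],
    ["Intro to Psychology"],
    ["Art 101", "Intro to Psychology", "Chemistry 101"],
]

def frequency_table(responses):
    """Return {course: (n, percent of participants)}, most frequent first."""
    n_participants = len(responses)
    # set() guards against a participant listing the same course twice.
    counts = Counter(course for picks in responses for course in set(picks))
    return {
        course: (n, round(100 * n / n_participants))
        for course, n in counts.most_common()
    }

table = frequency_table(responses)
for course, (n, pct) in table.items():
    print(f"{course:<22}{n:>3}{pct:>5}")

# Hypothetical continuous item (participant ages).
ages = [19, 20, 22, 23]
print(round(mean(ages), 2), round(stdev(ages), 2))
```

Because each participant can select several courses, the percentages are computed against the number of participants rather than the number of selections, which is why they can sum to more than 100.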


Tuesday, February 19, 2013

Writing a quantitative research question



Formulating a quantitative research question can often be a difficult task.  When composing a research question, a researcher needs to determine if they want to describe data, compare differences among groups, assess a relationship, or determine if a set of variables predicts another variable.  The type of question the researcher asks will help to determine the type of statistical analysis that needs to be conducted.  It is also important to consider what specific variables need to be assessed when writing a research question.  The researcher must be certain all variables are quantifiable, or measurable. Measuring variables can be as simple as having participants report their age or as involved as having participants answer survey questions that make up a reliable instrument.  Some examples of different types of research questions are presented below:

Descriptive:
Describe the teachers’ perceptions of the newly implemented reading assessment program.
The goal of a descriptive research question is to describe the data.  The researcher cannot infer any conclusions from this type of analysis; it simply presents data.  Descriptive questions do not have corresponding null and alternative hypotheses because the researcher is not making inferences.  Descriptive studies can be conducted on categorical or continuous data.

Comparative:
Are there differences in students’ grades by gender (male vs. female)?
Are there differences in job level (entry vs. mid vs. executive) by gender (male vs. female)?
Comparative questions can be assessed using a continuous variable and a categorical grouping variable, as well as with two categorical grouping variables.  The type of analysis will vary depending on the types of data.

Relationship:
Is there a relationship between age and fitness level?
Is there a relationship between ice cream sales and temperature at noon?
Questions that assess relationships do not require a definitive independent and dependent variable, but two variables are required; they can be considered variables of interest as opposed to independent and dependent variables.  Data used for this type of analysis can be dichotomous, ordinal, or continuous.  The type of analysis will vary depending on the types of data. 

Predictive:
Do age, gender, and education predict income?
Does a pitcher’s ERA predict the number of wins the team has?
Predictive questions have a definitive independent and dependent variable.  Typically, the independent variable should be continuous or dichotomous, but nominal and ordinal variables can be used.  When nominal and ordinal variables are used as predictors, they must be dummy coded.  Like the independent variable, the dependent variable is typically continuous or dichotomous, but can also be ordinal or nominal.  The type of analysis that is appropriate will vary based upon the type of data.
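
To make the dummy coding step concrete, here is a small sketch that turns one nominal predictor into 0/1 indicator columns, leaving one level out as the reference category. The function name, column naming scheme, and education levels are assumptions made for this example.

```python
def dummy_code(values, reference):
    """Dummy code a nominal variable, omitting the reference level.

    Each non-reference level gets its own 0/1 indicator; a row with all
    zeros belongs to the reference level.
    """
    if reference not in values:
        raise ValueError(f"reference level {reference!r} not found in data")
    levels = sorted(set(values) - {reference})
    return [
        {f"is_{level}": int(value == level) for level in levels}
        for value in values
    ]

# Hypothetical education data for four participants.
education = ["high school", "bachelor", "graduate", "bachelor"]
coded = dummy_code(education, reference="high school")

for original, row in zip(education, coded):
    print(original, row)
```

A variable with three levels yields two indicator columns; including a column for every level alongside an intercept would make the predictors perfectly collinear.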

Monday, November 19, 2012

Practice MRF



You may be saying that MRF stands for “Man, Research Frustrating.” For those struggling with Capella’s BMGT8032: Survey of Research Methods, or for other dissertation students working on their proposal, here are a few thoughts. 

Research questions 1.6.  Your research questions need three things: they must have clear measures, you must be able to get in touch with the participants, and the questions must be stated in statistical language.   If you don’t have these three things, you don’t have answerable research questions. 

Sample size 2.2.  Sample size is a tricky thing, and maybe the order of writing has something to do with this.  Capella has this section as 2.2, which talks about the sample, and that is fair enough.  However, since the preponderance of dissertations use a power analysis, and the power analysis differs based on the statistics used, the sample size justification (section 2.2) should go after, not before, the data analysis plan 2.5.  The best thing to do is to make sure you have the correct analysis, then use G*Power (which is free) or go to our membership website page basic-membership for a write-up ($29). 
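
To show why the required sample size depends on the planned analysis and effect size, here is a rough sketch for one specific case: comparing two independent group means with a two-sided test. It uses the textbook normal approximation, n per group = 2 × ((z for alpha + z for power) / d)², which is an assumption of this example and not the exact calculation a dedicated power analysis tool performs; exact t-test results are typically slightly larger. The function name and default values are also illustrative.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-group mean comparison.

    Normal approximation only; effect_size_d is Cohen's d.
    """
    if effect_size_d <= 0:
        raise ValueError("effect size must be positive")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = standard_normal.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)

# Conventional small, medium, and large effect sizes for d.
for d in (0.2, 0.5, 0.8):
    print(d, n_per_group(d))
```

Smaller expected effects require far larger samples, which is one reason the analysis and expected effect size need to be settled before the sample size is justified.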

Measurement 2.3.  First of all, this will become part of your dissertation, so make sure that you have constructs that are measurable.  If you are the first person to measure a particular construct, expect a few extra months to pilot test the instrument, and you will still have to assess the reliability and validity of the new instrument.  Don’t reinvent the wheel; find a reliable and valid instrument that exists. Worst case, adapt a reliable and valid instrument, and use a change cross-walk in the appendix to show how your adaptation is different.

Data analysis plan 2.5.  The data plan consists of three components: which analyses are appropriate to assess your research questions, what the assumptions of the selected analyses are, and a justification of why the analyses were selected.  The appropriate analysis is selected based on the way the research question is phrased (e.g., “difference” questions presume ANOVA type analyses) and the level of measurement of the variables (e.g., ANOVAs presume an interval or ratio level dependent variable and a nominal level independent variable).  The assumptions of an analysis can be found on our website (www.StatisticsSolutions.com) or elsewhere on the web.  And finally, the justification of the analysis combines the above two points by simply stating that given the research question phrasing and the level of measurement, this particular statistical test is appropriate. 

As always, if you have any questions, feel free to call us at (877) 437-8622 or email us at info@StatisticsSolutions.com