Research Methods

To understand the use of statistics, one needs to know a little bit about experimental design or how a researcher conducts investigations. A little knowledge about methodology will provide us with a place to hang our statistics. In other words, statistics are not numbers that just appear out of nowhere. Rather, the numbers (data) are generated out of research. Statistics are merely a tool to help us answer research questions. As such, an understanding of methodology will facilitate our understanding of basic statistics.


A key concept relevant to a discussion of research methodology is that of validity. When an individual asks, "Is this study valid?", they are questioning the validity of at least one aspect of the study. There are four types of validity that can be discussed in relation to research and statistics. Thus, when discussing the validity of a study, one must be specific as to which type of validity is under discussion. Therefore, the answer to the question asked above might be that the study is valid in relation to one type of validity but invalid in relation to another type of validity.

Each of the four types of validity will be briefly defined and described below. Be aware that this represents a cursory discussion of the concept of validity. Each type of validity has many threats which can pose a problem in a research study. Examples, but not an exhaustive discussion, of threats to each validity will be provided. For a comprehensive discussion of the four types of validity, the threats associated with each type of validity, and additional validity issues see Cook and Campbell (1979).

Statistical Conclusion Validity: Unfortunately, without a background in basic statistics, this type of validity is difficult to understand. According to Cook and Campbell (1979), "statistical conclusion validity refers to inferences about whether it is reasonable to presume covariation given a specified alpha level and the obtained variances" (p. 41). Essentially, the question that is being asked is "Are the variables under study related?" or "Is Variable A correlated (does it covary) with Variable B?". If a study has good statistical conclusion validity, we should be relatively certain that the answer to these questions is "yes". Examples of issues or problems that would threaten statistical conclusion validity would be random heterogeneity of the research subjects (the subjects represent a diverse group, which increases statistical error) and small sample size (it is more difficult to find meaningful relationships with a small number of subjects).
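The sample-size threat mentioned above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the true correlation (0.3), the two sample sizes, the number of simulated studies, and the function names are all invented for illustration, and significance is judged with the Fisher z approximation rather than an exact test.

```python
import math
import random

def sample_r(n, rho, rng):
    """Draw n (x, y) pairs whose true correlation is rho; return the sample r."""
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for x in xs]
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def detection_rate(n, rho=0.3, trials=500, seed=0):
    """Share of simulated studies in which r is significant at alpha = .05.

    Assumption: uses the Fisher z approximation, comparing
    |atanh(r)| * sqrt(n - 3) with the two-tailed critical value 1.96.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        r = sample_r(n, rho, rng)
        if abs(math.atanh(r)) * math.sqrt(n - 3) > 1.96:
            hits += 1
    return hits / trials

small = detection_rate(n=10)
large = detection_rate(n=100)
print(f"n = 10:  relationship detected in {small:.0%} of simulated studies")
print(f"n = 100: relationship detected in {large:.0%} of simulated studies")
```

Although the same relationship exists in every simulated study, the small studies detect it far less often than the large ones.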

Internal Validity: Once it has been determined that the two variables (A & B) are related, the next issue to be determined is one of causality. Does A cause B? If a study is lacking internal validity, one can not make cause and effect statements based on the research; the study would be descriptive but not causal. There are many potential threats to internal validity. For example, if a study has a pretest, an experimental treatment, and a follow-up posttest, history is a threat to internal validity. If a difference is found between the pretest and posttest, it might be due to the experimental treatment but it might also be due to any other event that subjects experienced between the two times of testing (for example, a historical event, a change in weather, etc.).

Construct Validity: One is examining the issue of construct validity when one is asking the questions "Am I really measuring the construct that I want to study?" or "Is my study confounded (Am I confusing constructs)?". For example, if I want to know whether a particular drug (Variable A) will be effective for treating depression (Variable B), I will need at least one measure of depression. If that measure does not truly reflect depression levels but rather anxiety levels (Confounding Variable X), then my study will be lacking construct validity. Thus, good construct validity means that we will be relatively sure that Construct A is related to Construct B and that this is possibly a causal relationship. Examples of other threats to construct validity include subjects' apprehension about being evaluated, hypothesis guessing on the part of subjects, and bias introduced in a study by expectancies on the part of the experimenter.

External Validity: External validity addresses the issue of being able to generalize the results of your study to other times, places, and persons. For example, if you conduct a study looking at heart disease in men, can these results be generalized to women? Therefore, one needs to ask the following questions to determine if a threat to the external validity exists: "Would I find these same results with a different sample?", "Would I get these same results if I conducted my study in a different setting?", and "Would I get these same results if I had conducted this study in the past or if I redo this study in the future?" If I can not answer "yes" to each of these questions, then the external validity of my study is threatened.

Types of Research Studies

There are four major classifications of research designs. These include observational research, correlational research, true experiments, and quasi-experiments. Each of these will be discussed further below.

Observational research: There are many types of studies which could be defined as observational research including case studies, ethnographic studies, ethological studies, etc. The primary characteristic of each of these types of studies is that phenomena are being observed and recorded. Oftentimes, the studies are qualitative in nature. For example, a psychological case study would entail extensive notes based on observations of and interviews with the client. A detailed report with analysis would be written and reported constituting the study of this individual case. These studies may also be quantitative in nature or include quantitative components in the research. For example, an ethological study of primate behavior in the wild may include measures of behavior durations, i.e., the amount of time an animal engaged in a specified behavior. This measure of time would be quantitative.

Surveys are often classified as a type of observational research.

Correlational research: In general, correlational research examines the covariation of two or more variables. For example, the early research on cigarette smoking examined the covariation of cigarette smoking and a variety of lung diseases. These two variables, smoking and lung disease, were found to covary.
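One common way to put a number on covariation is the Pearson correlation coefficient. The sketch below computes it from scratch; the function name and the data are hypothetical and are not taken from the smoking studies mentioned above.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: cigarettes smoked per day and a made-up impairment score.
cigarettes_per_day = [0, 5, 10, 20, 30, 40]
impairment_score = [1, 2, 4, 5, 7, 9]
r = pearson_r(cigarettes_per_day, impairment_score)
print(f"r = {r:.2f}")
```

A value of r near +1 or -1 indicates strong covariation, and a value near 0 indicates little or none. By itself, r says nothing about which variable, if either, is the cause.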

Correlational research can be accomplished by a variety of techniques which include the collection of empirical data. Oftentimes, correlational research is considered a type of observational research as nothing is manipulated by the experimenter or individual conducting the research. For example, the early studies on cigarette smoking did not manipulate how many cigarettes were smoked. The researcher only collected the data on the two variables. Nothing was controlled by the researchers.

It is important to note that correlational research is not causal research. In other words, we can not make statements concerning cause and effect on the basis of this type of research. There are two major reasons why we can not make cause and effect statements. First, we don't know the direction of the cause. Second, a third variable may be involved of which we are not aware. An example may help clarify these points.

In major clinical depressions, the neurotransmitters serotonin and/or norepinephrine have been found to be depleted (Coppen, 1967; Schildkraut & Kety, 1967). In other words, low levels of these two neurotransmitters have been found to be associated with increased levels of clinical depression. However, while we know that the two variables covary - a relationship exists - we do not know if a causal relationship exists. Thus, it is unclear whether a depletion in serotonin/norepinephrine causes depression or whether depression causes a depletion in neurotransmitter levels. This demonstrates the first problem with correlational research; we don't know the direction of the cause. Second, a third variable has been uncovered which may be affecting both of the variables under study. The number of receptors on the postsynaptic neuron has been found to be increased in depression (Segal, Kuczenski, & Mandell, 1974; Ventulani, Staqarz, Dingell, & Sulser, 1976). Thus, it is possible that the increased number of receptors on the postsynaptic neuron is actually responsible for the relationship between neurotransmitter levels and depression. As you can see from the discussion above, one can not make a simple cause and effect statement concerning neurotransmitter levels and depression based on correlational research. To reiterate, it is inappropriate in correlational research to make statements concerning cause and effect.
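The third-variable problem can be demonstrated with simulated data. In the sketch below, a variable Z drives both X and Y, and X and Y have no direct influence on each other. The variables, sample size, and function names are invented for illustration; this is not a model of neurotransmitters or depression.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def partial_r(r_xy, r_xz, r_yz):
    """Correlation of X and Y after removing the linear influence of Z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = random.Random(0)
n = 5000
# Z is the hidden third variable; X and Y each equal Z plus independent noise.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

r_xy = pearson_r(x, y)
r_xy_given_z = partial_r(r_xy, pearson_r(x, z), pearson_r(y, z))
print(f"correlation of X and Y:               {r_xy:.2f}")
print(f"correlation of X and Y controlling Z: {r_xy_given_z:.2f}")
```

X and Y are clearly correlated, yet the relationship all but disappears once Z is taken into account. In real research the third variable is often unmeasured, so this check is not always available.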

Correlational research is often conducted as exploratory or beginning research. Once variables have been identified and defined, experiments can be conducted.

True Experiments: The true experiment is often thought of as a laboratory study. However, this is not always the case. A true experiment is defined as an experiment conducted where an effort is made to impose control over all other variables except the one under study. It is often easier to impose this sort of control in a laboratory setting. Thus, true experiments have often been erroneously identified as laboratory studies.

To understand the nature of the experiment, we must first define a few terms:

  1. Experimental or treatment group - this is the group that receives the experimental treatment or manipulation, or that differs from the control group on the variable under study.

  2. Control group - this group is used to produce comparisons. The treatment of interest is deliberately withheld or manipulated to provide a baseline performance with which to compare the experimental or treatment group's performance.

  3. Independent variable - this is the variable that the experimenter manipulates in a study. It can be any aspect of the environment that is empirically investigated for the purpose of examining its influence on the dependent variable.

  4. Dependent variable - the variable that is measured in a study. The experimenter does not control this variable.

  5. Random assignment - in a study, each subject has an equal probability of being selected for either the treatment or control group.

  6. Double blind - neither the subject nor the experimenter knows whether the subject is in the treatment or the control condition.
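Random assignment, as defined in the list above, can be sketched in a few lines. The subject labels, the group size, and the function name are hypothetical. Shuffling the subjects and splitting the list in half is just one simple way to give every subject the same chance of landing in either group.

```python
import random

def randomly_assign(subjects, seed=None):
    """Shuffle the subjects and split them into two groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(subjects)   # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Hypothetical subject labels.
subjects = [f"subject_{i:02d}" for i in range(1, 21)]
groups = randomly_assign(subjects, seed=42)
print("treatment:", groups["treatment"])
print("control:  ", groups["control"])
```

The seed is there only to make the example repeatable. Because chance alone decides group membership, characteristics of the subjects should, on average, be balanced across the two groups.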

Now that we have these terms defined, we can examine further the structure of the true experiment. First, every experiment must have at least two groups: an experimental and a control group. Each group will receive a level of the independent variable. The dependent variable will be measured to determine if the independent variable has an effect. As stated previously, the control group will provide us with a baseline for comparison. All subjects should be randomly assigned to groups, be tested as simultaneously as possible, and the experiment should be conducted double blind. Perhaps an example will help clarify these points.

Wolfer and Visintainer (1975) examined the effects of systematic preparation and support on children who were scheduled for inpatient minor surgery. The hypothesis was that such preparation would reduce the amount of psychological upset and increase the amount of cooperation among these young patients. Eighty children were selected to participate in the study. Children were randomly assigned to either the treatment or the control condition. During their hospitalization the treatment group received the special program and the control group did not. Care was taken such that kids in the treatment and the control groups were not roomed together. Measures that were taken included heart rates before and after blood tests, ease of fluid intake, and self-report anxiety measures. The study demonstrated that the systematic preparation and support reduced the difficulties of being in the hospital for these kids.

Let us examine now the features of the experiment described above. First, there was a treatment and control group. If we had had only the treatment group, we would have no way of knowing whether the reduced anxiety was due to the treatment or the weather, new hospital food, etc. The control group provides us with the basis to make comparisons. The independent variable in this study was the presence or absence of the systematic preparation program. The dependent variable consisted of the heart rates, fluid intake, and anxiety measures. The scores on these measures were influenced by and depended on whether the child was in the treatment or control group. The children were randomly assigned to either group. If the "friendly" children had been placed in the treatment group we would have no way of knowing whether they were less anxious and more cooperative because of the treatment or because they were "friendly". In theory, the random assignment should balance the number of "friendly" children between the two groups. The two groups were also tested at about the same time. In other words, one group was not measured during the summer and the other during the winter. By testing the two groups as simultaneously as possible, we can rule out any bias due to time. Finally, the children were unaware that they were participants in an experiment (the parents had agreed to their children's participation in research and the program), thus making the study single blind. If the individuals who were responsible for the dependent measures were also unaware of whether the child was in the treatment or control group, then the experiment would have been double blind.

A special case of the true experiment is the clinical trial. A clinical trial is defined as a carefully designed experiment that seeks to determine the clinical efficacy of a new treatment or drug. The design of a clinical trial is very similar to that of a true experiment. Once again, there are two groups: a treatment group (the group that receives the therapeutic agent) and a control group (the group that receives the placebo). The control group is often called the placebo group. The independent variable in the clinical trial is the level of the therapeutic agent. Once again, subjects are randomly assigned to groups, they are tested simultaneously, and the experiment should be conducted double blind. In other words, neither the patient nor the person administering the drug should know whether the patient is receiving the drug or the placebo.

Quasi-Experiments: Quasi-experiments are very similar to true experiments but use naturally formed or pre-existing groups. For example, if we wanted to compare young and old subjects on lung capacity, it is impossible to randomly assign subjects to either the young or old group (naturally formed groups). Therefore, this can not be a true experiment. When one has naturally formed groups, the variable under study is a subject variable (in this case - age) as opposed to an independent variable. As such, it also limits the conclusions we can draw from such a research study. If we were to conduct the quasi-experiment, we would find that the older group had less lung capacity as compared to the younger group. We might conclude that old age thus results in less lung capacity. But other variables might also account for this result. It might be that repeated exposure to pollutants as opposed to age has caused the difference in lung capacity. It could also be a generational factor. Perhaps more of the older group smoked in their early years as compared to the younger group due to increased awareness of the hazards of cigarettes. The point is that there are many differences between the groups that we can not control that could account for differences in our dependent measures. Thus, we must be careful concerning making statements of causality with quasi-experimental designs.

Quasi-experiments may result from studying the differences between naturally formed groups (i.e., young & old; men & women). However, there are also instances when a researcher designs a study as a traditional experiment only to discover that random assignment to groups is restricted by outside factors. The researcher is forced to divide groups according to some pre-existing criteria. For example, if a corporation wanted to test the effectiveness of a new wellness program, they might decide to implement their program at one site and use a comparable site (no wellness program) as a control. As the employees are not shuffled and randomly assigned to work at each site, the study has pre-existing groups. After a few months of study, the researchers could then see if the wellness site had less absenteeism and lower health costs than the non-wellness site. The results are again restricted due to the quasi-correlational nature of the study. As the study has pre-existing groups, there may be other differences between those groups than just the presence or absence of a wellness program. For example, the wellness program may be in a significantly newer, more attractive building, or the manager from hell may work at the nonwellness program site. Either way, if a difference is found between the two sites it may or may not be due to the presence/absence of the wellness program.

To summarize, quasi-experiments may result from either studying naturally formed groups or use of pre-existing groups. When the study includes naturally formed groups, the variable under study is a subject variable. When a study uses pre-existing groups that are not naturally formed, the variable that is manipulated between the two groups is an independent variable (with the exception of no random assignment, the study looks similar in form to a true experiment). As no random assignment exists in a quasi-experiment, no causal statements can be made based on the results of the study.

Populations and Samples

When conducting research, one must often use a sample of the population as opposed to using the entire population. Before we go further into the reasons why, let us first discuss what differentiates a population from a sample.

A population can be defined as any set of persons/subjects having a common observable characteristic. For example, all individuals who reside in the United States make up a population. Also, all pregnant women make up a population. The characteristics of a population are called parameters. A sample can be defined as any subset of the population. The characteristics of a sample are called statistics.
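The distinction between a parameter and a statistic can be made concrete with a simulated population. Everything in this sketch is an assumption made for illustration: the population of 10,000 scores, its mean and spread, and the sample size of 100.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population: 10,000 test scores.
population = [rng.gauss(100, 15) for _ in range(10_000)]

# A parameter describes the whole population.
population_mean = statistics.mean(population)

# A statistic describes a sample drawn from that population.
sample = rng.sample(population, 100)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.1f}")
print(f"statistic (sample mean):     {sample_mean:.1f}")
```

The sample mean will usually be close to, but not exactly equal to, the population mean. In real research the parameter is typically unknown, and the statistic is used to estimate it.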

Why Sample?

This brings us to the question of why sample. Why should we not use the population as the focus of study? There are at least four major reasons to sample.

First, it is usually too costly to test the entire population. The United States government spends millions of dollars to conduct the U.S. Census every ten years. While the U.S. government may have that kind of money, most researchers do not.

The second reason to sample is that it may be impossible to test the entire population. For example, let us say that we wanted to test the 5-HIAA (a serotonergic metabolite) levels in the cerebrospinal fluid (CSF) of depressed individuals. There are far too many individuals who do not make it into the mental health system to even be identified as depressed, let alone to test their CSF.

The third reason to sample is that testing the entire population often produces error. Thus, sampling may be more accurate. Perhaps an example will help clarify this point. Say researchers wanted to examine the effectiveness of a new drug on Alzheimer's disease. One dependent variable that could be used is an Activities of Daily Living Checklist. In other words, it is a measure of functioning on a day-to-day basis. In this experiment, it would make sense to have as few people rating the patients as possible. If one individual rates the entire sample, there will be some measure of consistency from one patient to the next. If many raters are used, this introduces a source of error. These raters may all use slightly different criteria for judging Activities of Daily Living. Thus, as in this example, it would be problematic to study an entire population.

The final reason to sample is that testing may be destructive. It makes no sense to lesion the lateral hypothalamus of all rats to determine if it has an effect on food intake. We can get that information from operating on a small sample of rats. Also, you probably would not want to buy a car that had the door slammed five hundred thousand times or had been crash tested. Rather, you probably would want to purchase the car that did not make it into either of those samples.

Types of Sampling Procedures

As stated above, a sample consists of a subset of the population. Any member of the defined population can be included in a sample. A theoretical list (an actual list may not exist) of individuals or elements who make up a population is called a sampling frame. There are five major sampling procedures.

The first sampling procedure is convenience. Volunteers, members of a class, and individuals in the hospital with the specific diagnosis being studied are examples of often used convenience samples. This is by far the most often used sampling procedure. It is also by far the most biased sampling procedure as it is not random (not everyone in the population has an equal chance of being selected to participate in the study). Thus, individuals who volunteer to participate in an exercise study may be different from individuals who do not volunteer.

Another form of sampling is the simple random sample. In this method, all subjects or elements have an equal probability of being selected. There are two major ways of conducting a random sample. The first is to consult a random number table, and the second is to have the computer select a random sample.
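Having the computer select a simple random sample might look like the sketch below, which uses Python's standard `random.sample`. The sampling frame of 500 made-up ID numbers and the function name are assumptions for illustration.

```python
import random

def simple_random_sample(frame, k, seed=None):
    """Draw k elements without replacement; every element is equally likely."""
    rng = random.Random(seed)
    return rng.sample(list(frame), k)

# Hypothetical sampling frame: ID numbers 1 through 500.
frame = range(1, 501)
chosen = simple_random_sample(frame, k=25, seed=11)
print(sorted(chosen))
```

Fixing the seed makes the draw repeatable, which is convenient for documenting how a sample was selected.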

A systematic sample is conducted by randomly selecting a first case on a list of the population and then selecting every Nth case after it until your sample is complete. This is particularly useful if your list of the population is long. For example, if your list was the phone book, it would be easiest to start at perhaps the 17th person, and then select every 50th person from that point on.
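A systematic sample is easy to express with list slicing. In this sketch the list of 1,000 entries, the interval of 50, and the function name are hypothetical. The starting position is drawn at random from the first interval.

```python
import random

def systematic_sample(frame, step, seed=None):
    """Pick a random start within the first `step` entries, then every `step`-th entry."""
    if step < 1:
        raise ValueError("step must be at least 1")
    rng = random.Random(seed)
    start = rng.randrange(step)        # random position in 0 .. step - 1
    return list(frame)[start::step]

# Hypothetical list of 1,000 directory entries.
directory = [f"entry_{i:04d}" for i in range(1000)]
every_50th = systematic_sample(directory, step=50, seed=5)
print(len(every_50th), "entries selected, beginning with", every_50th[0])
```

If the list itself has a repeating pattern that happens to line up with the interval, a systematic sample can be biased, so the ordering of the list deserves a look before this method is used.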

Stratified sampling makes up the fourth sampling strategy. In a stratified sample, we sample either proportionately or equally to represent various strata or subpopulations. For example, if our strata were states, we would make sure to sample from each of the fifty states. If our strata were religious affiliation, stratified sampling would ensure sampling from every religious block or grouping. If our strata were gender, we would sample both men and women.
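A proportionate stratified sample can be sketched as follows. The frame of 60 women and 40 men, the 10% sampling fraction, and the function and parameter names are all invented for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(frame, stratum_of, fraction, seed=None):
    """Proportionate stratified sample: draw the same fraction from every stratum."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    # Group the elements of the frame by stratum.
    strata = defaultdict(list)
    for element in frame:
        strata[stratum_of(element)].append(element)
    # Sample within each stratum, taking at least one element from each.
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical frame of (id, gender) pairs: 60 women and 40 men.
frame = [(f"w{i}", "woman") for i in range(60)] + [(f"m{i}", "man") for i in range(40)]
sample = stratified_sample(frame, stratum_of=lambda person: person[1], fraction=0.1, seed=3)
n_women = sum(1 for _, gender in sample if gender == "woman")
n_men = sum(1 for _, gender in sample if gender == "man")
print(f"{n_women} women and {n_men} men selected")
```

To sample equally rather than proportionately, one would draw the same fixed number from each stratum instead of the same fraction.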

Cluster sampling makes up the final sampling procedure. In cluster sampling we take a random sample of clusters (naturally occurring groups) and then survey every member of each selected cluster. For example, if our clusters were individual schools in the St. Louis Public School System, we would randomly select perhaps 20 schools and then test all of the students within those schools.
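Cluster sampling differs from the other procedures in that whole groups are selected at random and then everyone inside the chosen groups is studied. In the sketch below, the 50 schools of 30 students each and all of the names are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters, then keep every member of each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical school system: 50 schools with 30 students each.
schools = {
    f"school_{i:02d}": [f"student_{i:02d}_{j:02d}" for j in range(30)]
    for i in range(50)
}
selected = cluster_sample(schools, n_clusters=20, seed=7)
n_students = sum(len(students) for students in selected.values())
print(f"{len(selected)} schools selected, {n_students} students to be tested")
```

Only the choice of schools is random here. Once a school is chosen, no further sampling takes place within it.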

Sampling Problems

There are several potential sampling problems. When designing a study, a sampling procedure is also developed including the potential sampling frame. Several problems may exist within the sampling frame. First, there may be missing elements - individuals who should be on your list but for some reason are not on the list. For example, if my population consists of all individuals living in a particular city and I use the phone directory as my sampling frame or list, I will miss individuals with unlisted numbers or who can not afford a phone.

Foreign elements make up my second sampling problem. Elements which should not be included in my population and sample appear on my sampling list. Thus, if I were to use property records to create my list of individuals living within a particular city, landlords who live elsewhere would be foreign elements. In this case, renters would be missing elements.

Duplicates represent the third sampling problem. These are elements who appear more than once on the sampling frame. For example, if I am a researcher studying patient satisfaction with emergency room care, I may potentially include the same patient more than once in my study. If the patients are completing a patient satisfaction questionnaire, I need to make sure that patients are aware that if they have completed the questionnaire previously, they should not complete it again. If they complete it more than once, their second set of data represents a duplicate.
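The three sampling-frame problems described above (missing elements, foreign elements, and duplicates) can be checked mechanically when a trusted list of the population is available. That is a strong assumption, since in practice the true population is rarely fully known. The names and the function below are hypothetical.

```python
from collections import Counter

def audit_sampling_frame(frame, population):
    """Compare a sampling frame with the intended population."""
    counts = Counter(frame)
    listed = set(counts)
    target = set(population)
    return {
        # In the population but absent from the frame.
        "missing": sorted(target - listed),
        # On the frame but not part of the population.
        "foreign": sorted(listed - target),
        # Listed on the frame more than once.
        "duplicates": sorted(e for e, c in counts.items() if c > 1),
    }

# Hypothetical population and a flawed frame drawn up for it.
population = ["Ana", "Ben", "Cho", "Dev"]
frame = ["Ana", "Ben", "Ben", "Eve"]
report = audit_sampling_frame(frame, population)
print(report)
```

Each key in the report corresponds to one of the three problems, so a clean frame would produce three empty lists.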
