Developmental Research Methods

It is imperative that individuals studying any branch of psychology become familiar with the methodology applied to this scientific discipline. This becomes even more important when the methodology places limitations on the conclusions one can draw from the empirical research. All too often, readers reach false conclusions about research because they do not understand these limitations. This is particularly true for the study of human development, as much of the research is quasi-experimental in nature. Below are several topics related to research methodology and the evaluation of developmental research.


A key concept relevant to a discussion of research methodology is that of validity. When an individual asks, "Is this study valid?", they are questioning the validity of at least one aspect of the study. There are four types of validity that can be discussed in relation to research and statistics. Thus, when discussing the validity of a study, one must be specific as to which type of validity is under discussion. Therefore, the answer to the question asked above might be that the study is valid in relation to one type of validity but invalid in relation to another type of validity.

Each of the four types of validity will be briefly defined and described below. Be aware that this represents a cursory discussion of the concept of validity. Each type of validity has many threats which can pose a problem in a research study. For a comprehensive discussion of the four types of validity, the threats associated with each type of validity, and additional validity issues see Cook and Campbell (1979).

Statistical Conclusion Validity: Unfortunately, without a background in basic statistics, this type of validity is difficult to understand. According to Cook and Campbell (1979), "statistical conclusion validity refers to inferences about whether it is reasonable to presume covariation given a specified alpha level and the obtained variances" (p. 41). Essentially, the question that is being asked is - "Are the variables under study related?" or "Is Variable A correlated (does it covary) with Variable B?". If a study has good statistical conclusion validity, we should be relatively certain that the answer to these questions is "yes". Examples of issues or problems that would threaten statistical conclusion validity would be random heterogeneity of the research subjects (the subjects represent a diverse group - this increases statistical error) and small sample size (it is more difficult to find meaningful relationships with a small number of subjects).
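The sample-size threat can be illustrated with a small simulation. The sketch below is an illustration only and is not taken from Cook and Campbell (1979). It assumes a true correlation of 0.3 between two variables, a rough cutoff of |t| > 2.0 in place of an exact .05 critical value, and 500 simulated studies per sample size. The function names are hypothetical.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def detection_rate(n, true_r=0.3, studies=500, seed=1):
    """Share of simulated studies of size n that detect the relationship.

    Assumptions (not from the source): the population correlation is
    true_r, and a study "detects" covariation when |t| > 2.0, a rough
    stand-in for the exact two-tailed .05 critical value.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(studies):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [true_r * x + math.sqrt(1 - true_r ** 2) * rng.gauss(0, 1)
              for x in xs]
        r = pearson_r(xs, ys)
        t = r * math.sqrt((n - 2) / (1 - r * r))
        if abs(t) > 2.0:
            hits += 1
    return hits / studies

small = detection_rate(10)    # ten subjects per study
large = detection_rate(200)   # two hundred subjects per study
print("detection rate, n=10: ", small)
print("detection rate, n=200:", large)
```

Under these assumptions the same real relationship is missed in most of the small studies and found in nearly all of the large ones, which is why small samples threaten statistical conclusion validity.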

Internal Validity: Once it has been determined that the two variables (A & B) are related, the next issue to be determined is one of causality. Does A cause B? If a study is lacking internal validity, one can not make cause and effect statements based on the research; the study would be descriptive but not causal. There are many potential threats to internal validity. For example, if a study has a pretest, an experimental treatment, and a follow-up posttest, history is a threat to internal validity. If a difference is found between the pretest and posttest, it might be due to the experimental treatment but it might also be due to any other event that subjects experienced between the two times of testing (for example, a historical event, a change in weather, etc.). Nonrandom assignment of subjects to groups (selection) also represents a threat to internal validity. This is often a problem in developmental research as one can not randomly assign individuals to particular age groups. Age is a subject variable (a characteristic inherent in the subject) and thus can not be assigned by a researcher. If a difference is found between two groups of different ages, the difference may be due to age or it may be due to some other non-age-related factor such as history-related cohort or generational differences.

Construct Validity: One is examining the issue of construct validity when one is asking the questions "Am I really measuring the construct that I want to study?" or "Is my study confounded (Am I confusing constructs)?". For example, if I want to know whether a particular drug (Variable A) will be effective for treating depression (Variable B), I will need at least one measure of depression. If that measure does not truly reflect depression levels but rather anxiety levels (Confounding Variable X), then my study will be lacking construct validity. Thus, good construct validity means that we will be relatively sure that Construct A is related to Construct B and that this is possibly a causal relationship. Examples of other threats to construct validity include subjects' apprehension about being evaluated (often a problem when testing older subjects), hypothesis guessing on the part of subjects, and bias introduced in a study by expectancies on the part of the experimenter.

External Validity: External validity addresses the issue of being able to generalize the results of your study to other times, places, and persons. For example, if you conduct a study looking at heart disease in men, can these results be generalized to women? If I study upper middle-class white children attending a private grade school, will these results apply to other children? Sampling is often a problem in developmental research. For years, old age was viewed to be a period of major health problems, depression, isolation, and severe cognitive decline as the research subjects consisted of elderly people living in nursing homes! Therefore, one needs to ask the following questions to determine if a threat to the external validity exists: "Would I find these same results with a different sample?", "Would I get these same results if I conducted my study in a different setting?", and "Would I get these same results if I had conducted this study in the past or if I redo this study in the future?" If I can not answer "yes" to each of these questions, then the external validity of my study is threatened.

Types of Research Studies

There are four major classifications of research designs. These include observational research, correlational research, true experiments, and quasi-experiments. Each of these will be discussed further below.

Observational research: There are many types of studies which could be defined as observational research including case studies, ethnographic studies, ethological studies, etc. The primary characteristic of each of these types of studies is that phenomena are being observed and recorded. Often, the studies are qualitative in nature. For example, a psychological case study would entail extensive notes based on observations of and interviews with the client. A detailed report with analysis would be written and reported, constituting the study of this individual case. These studies may also be quantitative in nature or include quantitative components in the research. For example, an ethological study of primate behavior in the wild may include measures of behavior durations, i.e., the amount of time an animal engaged in a specified behavior. This measure of time would be quantitative.

Surveys are often classified as a type of observational research.

Observational research can be problematic if not conducted well. Clearly, there are many problems with internal validity. One can describe the individual(s) being observed but one can not draw any sort of causal conclusions based on the observations. Additionally, construct validity can be impacted by lack of background work before the observations or study, observer and experimenter biases or expectancies, etc. In developmental psychology, this form of research is often early work in the exploration of a developmental topic.

Correlational research: In general, correlational research examines the covariation of two or more variables. For example, the early research on cigarette smoking examined the covariation of cigarette smoking and a variety of lung diseases. These two variables, smoking and lung disease, were found to covary.

Correlational research can be accomplished by a variety of techniques which include the collection of empirical data. Often, correlational research is considered a type of observational research as nothing is manipulated by the experimenter or individual conducting the research. For example, the early studies on cigarette smoking did not manipulate how many cigarettes were smoked. The researcher only collected the data on the two variables. Nothing was controlled by the researchers and therefore, no cause and effect statements could be made. Of course, further experimental research clearly demonstrated the negative effects of cigarette smoking.

It is important to note that correlational research is not causal research. In other words, we can not make statements concerning cause and effect on the basis of this type of research. There are two major reasons why we can not make cause and effect statements. First, we don't know the direction of the cause. Second, a third variable may be involved of which we are not aware. An example may help clarify these points.

In major clinical depressions, the neurotransmitters serotonin and/or norepinephrine have been found to be depleted (Coppen, 1967; Schildkraut & Kety, 1967). In other words, low levels of these two neurotransmitters have been found to be associated with increased levels of clinical depression. However, while we know that the two variables covary - a relationship exists - we do not know if a causal relationship exists. Thus, it is unclear whether a depletion in serotonin/norepinephrine causes depression or whether depression causes a depletion in neurotransmitter levels. This demonstrates the first problem with correlational research; we don't know the direction of the cause. Second, a third variable has been uncovered which may be affecting both of the variables under study. The number of receptors on the postsynaptic neuron has been found to be increased in depression (Segal, Kuczenski, & Mandell, 1974; Vetulani, Stawarz, Dingell, & Sulser, 1976). Thus, it is possible that the increased number of receptors on the postsynaptic neuron is actually responsible for the relationship between neurotransmitter levels and depression. As you can see from the discussion above, one can not make a simple cause and effect statement concerning neurotransmitter levels and depression based on correlational research. To reiterate, it is inappropriate in correlational research to make statements concerning cause and effect.
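The third-variable problem can be made concrete with simulated data. The sketch below is an illustration only and does not model the neurotransmitter findings cited above. It assumes three hypothetical variables: a third variable C that influences both A and B, while A and B have no direct influence on each other. The function names and the sample size of 2000 are also assumptions.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def partial_r(xs, ys, zs):
    """Correlation between xs and ys with zs statistically held constant."""
    rxy = pearson_r(xs, ys)
    rxz = pearson_r(xs, zs)
    ryz = pearson_r(ys, zs)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

rng = random.Random(42)
# Hypothetical data: C drives both A and B; A never affects B or vice versa.
c = [rng.gauss(0, 1) for _ in range(2000)]
a = [ci + rng.gauss(0, 1) for ci in c]
b = [ci + rng.gauss(0, 1) for ci in c]

raw = pearson_r(a, b)            # A and B covary noticeably
controlled = partial_r(a, b, c)  # the relationship largely vanishes
print("correlation of A and B:      ", round(raw, 2))
print("same, holding C constant:    ", round(controlled, 2))
```

In this simulated case A and B are clearly correlated even though neither causes the other. A researcher who measured only A and B would have no way of seeing this from the correlation alone.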

Much has been in the news in the past several years concerning violence in schools. Most of the research related to this phenomenon is correlational in nature. Thus, for example, a relationship has been reported in the media concerning violence on television and in video games and violent children and adolescents. While a correlation exists between the two it is not necessarily a cause and effect relationship. We do know that children who are aggressive tend to watch a higher proportion of violent television than children who are not highly violent or aggressive. However, it is unclear based on correlational research whether violent TV programs increase violence in children or whether violent kids prefer to watch violent programming. Only experimental studies have provided a clearer answer regarding the negative impact of violent programming and video games on children and adolescents.

Correlational research is often conducted as exploratory or beginning research. Once variables have been identified and defined, experiments can be conducted.

True Experiments: The true experiment is often thought of as a laboratory study. However, this is not always the case. A true experiment is defined as an experiment conducted where an effort is made to impose control over all other variables except the one under study. It is often easier to impose this sort of control in a laboratory setting. Thus, true experiments have often been erroneously identified as laboratory studies.

To understand the nature of the experiment, we must first define a few terms:

  1. Experimental or treatment group - this is the group that receives the experimental treatment or manipulation, or that differs from the control group on the variable under study.

  2. Control group - this group is used to produce comparisons. The treatment of interest is deliberately withheld or manipulated to provide a baseline performance with which to compare the experimental or treatment group's performance.

  3. Independent variable - this is the variable that the experimenter manipulates in a study. It can be any aspect of the environment that is empirically investigated for the purpose of examining its influence on the dependent variable.

  4. Dependent variable - the variable that is measured in a study. The experimenter does not control this variable.

  5. Random assignment - in a study, each subject has an equal probability of being assigned to either the treatment or control group.

  6. Double blind - neither the subject nor the experimenter knows whether the subject is in the treatment or the control condition.

Now that we have these terms defined, we can examine further the structure of the true experiment. First, every experiment must have at least two groups: an experimental and a control group. Each group will receive a level of the independent variable. The dependent variable will be measured to determine if the independent variable has an effect. As stated previously, the control group will provide us with a baseline for comparison. All subjects should be randomly assigned to groups and be tested as simultaneously as possible, and the experiment should be conducted double blind. Perhaps an example will help clarify these points.

Wolfer and Visintainer (1975) examined the effects of systematic preparation and support on children who were scheduled for inpatient minor surgery. The hypothesis was that such preparation would reduce the amount of psychological upset and increase the amount of cooperation among these young patients. Eighty children were selected to participate in the study. Children were randomly assigned to either the treatment or the control condition. During their hospitalization the treatment group received the special program and the control group did not. Care was taken such that kids in the treatment and the control groups were not roomed together. Measures that were taken included heart rates before and after blood tests, ease of fluid intake, and self-report anxiety measures. The study demonstrated that the systematic preparation and support reduced the difficulties of being in the hospital for these kids.

Let us examine now the features of the experiment described above. First, there was a treatment and control group. If we had had only the treatment group, we would have no way of knowing whether the reduced anxiety was due to the treatment or the weather, new hospital food, etc. The control group provides us with the basis to make comparisons. The independent variable in this study was the presence or absence of the systematic preparation program. The dependent variable consisted of the heart rates, fluid intake, and anxiety measures. The scores on these measures were influenced by and depended on whether the child was in the treatment or control group. The children were randomly assigned to either group. If the "friendly" children had been placed in the treatment group we would have no way of knowing whether they were less anxious and more cooperative because of the treatment or because they were "friendly". In theory, the random assignment should balance the number of "friendly" children between the two groups. The two groups were also tested at about the same time. In other words, one group was not measured during the summer and the other during the winter. By testing the two groups as simultaneously as possible, we can rule out any bias due to time. Finally, the children were unaware that they were participants in an experiment (the parents had agreed to their children's participation in research and the program), thus making the study single blind. If the individuals who were responsible for the dependent measures were also unaware of whether the child was in the treatment or control group, then the experiment would have been double blind.
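The way random assignment balances a subject characteristic across groups can be sketched in a few lines. This is an illustration only and is not the procedure used by Wolfer and Visintainer (1975). The count of 80 children comes from the study described above, while the assumption that 30 of them are "friendly", the equal group sizes of 40, and the function name are hypothetical.

```python
import random

def randomly_assign(subject_ids, seed=None):
    """Shuffle the subjects and split them into two equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

children = list(range(80))   # 80 children, identified by number
friendly = set(range(30))    # hypothetical: 30 of them are "friendly"

treatment, control = randomly_assign(children, seed=7)
friendly_in_treatment = sum(1 for child in treatment if child in friendly)
friendly_in_control = sum(1 for child in control if child in friendly)
print("friendly children in treatment group:", friendly_in_treatment)
print("friendly children in control group:  ", friendly_in_control)

# Repeating the assignment many times shows the long-run balance:
# on average about half of the friendly children land in each group.
counts = []
for s in range(200):
    t, _ = randomly_assign(children, seed=s)
    counts.append(sum(1 for child in t if child in friendly))
average_friendly_in_treatment = sum(counts) / len(counts)
print("average over 200 assignments:", average_friendly_in_treatment)
```

Any single assignment may be somewhat unbalanced by chance, but the researcher's choices play no part in who ends up in which group, so "friendliness" can not be systematically tied to the treatment.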

Quasi-Experiments: Quasi-experiments are very similar to true experiments but use naturally formed or pre-existing groups. For example, if we wanted to compare young and old subjects on lung capacity, it is impossible to randomly assign subjects to either the young or old group (naturally formed groups). Therefore, this can not be a true experiment. When one has naturally formed groups, the variable under study is a subject variable (in this case - age) as opposed to an independent variable. As such, it also limits the conclusions we can draw from such a research study. If we were to conduct the quasi-experiment, we would find that the older group had less lung capacity as compared to the younger group. We might conclude that old age thus results in less lung capacity. But other variables might also account for this result. It might be that repeated exposure to pollutants as opposed to age has caused the difference in lung capacity. It could also be a generational factor. Perhaps more of the older group smoked in their early years as compared to the younger group due to increased awareness of the hazards of cigarettes. The point is that there are many differences between the groups that we can not control that could account for differences in our dependent measures. Thus, we must be careful about making statements of causality with quasi-experimental designs.

Quasi-experiments may result from studying the differences between naturally formed groups (e.g., young & old; men & women). However, there are also instances when a researcher designs a study as a traditional experiment only to discover that random assignment to groups is restricted by outside factors. The researcher is forced to divide groups according to some pre-existing criteria. For example, if a corporation wanted to test the effectiveness of a new wellness program, they might decide to implement their program at one site and use a comparable site (no wellness program) as a control. As the employees are not shuffled and randomly assigned to work at each site, the study has pre-existing groups. After a few months of study, the researchers could then see if the wellness site had less absenteeism and lower health costs than the non-wellness site. The results are again restricted due to the quasi-correlational nature of the study. As the study has pre-existing groups, there may be other differences between those groups than just the presence or absence of a wellness program. For example, the wellness program may be in a significantly newer, more attractive building, or the manager from hell may work at the non-wellness site. Either way, if a difference is found between the two sites it may or may not be due to the presence/absence of the wellness program.

Much of the research conducted by developmental psychologists is quasi-experimental. Two commonly used designs include the cross-sectional design and the longitudinal design. The cross-sequential design represents an alternative design which aims to correct for some of the problems inherent in the cross-sectional and longitudinal designs. These designs are described as unifactorial designs, with age as the single factor (Campbell & Stanley, 1963). The cross-sectional and longitudinal designs are noted for low internal validity. For example, the cross-sectional design is confounded by cohort effects. Each of the designs will be discussed below. Included in this discussion will be an analysis of the internal validity problems as they relate to each design.

The cross-sectional method has been defined by Baltes (1968) as follows: "Samples (S1 - Sn) of different ages (A1 - An) are observed on the same dependent variable once (O1) at the same time of measurement (T1)" (p. 146). In other words, two or more age cohorts (individuals born at roughly the same time) are tested at one time to see if differences exist across ages. Thus, at one point in time, individuals of different ages are tested and compared. Cook and Campbell (1979) argue that this is not a true design but rather separate samples. As such, there are many threats to internal validity. The major threat to internal validity in a separate sample quasi-experimental study is selection. The samples may be different on any number of variables other than the one under investigation. In the cross-sectional study, age differences may be confounded with differences in generations or cohorts. All members of a cohort share similar experiences in relation to normative history-graded influences. Thus, the researcher is not able to differentiate between maturational differences and cohort differences. An example may be useful in clarifying this point.

A researcher might choose to conduct a study examining differences in spending habits across the life-span. The hypothesis might be as follows - as individuals age they become more conservative in their spending habits. The researcher would then randomly select samples from various age cohorts; for example: individuals born in 1910, 1920, 1930, 1940, 1950, 1960, 1970. These groups would then be tested for differences in spending habits. Subsequently, the researcher finds differences in spending habits across age with increasing conservatism correlated with increasing age. The researcher concludes that an age difference has been demonstrated. However, age is confounded with a cohort effect. In particular, the older groups experienced the Great Depression (in different ways) whereas the younger groups did not. This, not age, may account for the differences in spending habits.

As demonstrated above, the cross-sectional design confounds maturation with cohort. Therefore, it can only be used descriptively. Differences in age groups or cohort can be described but the differences can not be definitively explained.
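The confound can be seen directly from the arithmetic of the design: with a single time of measurement, age is simply the test year minus the birth year, so every age group corresponds to exactly one cohort. The sketch below uses the birth years from the spending-habits example; the test year of 1990 is a hypothetical value chosen for illustration, as no test year is specified for that example.

```python
TEST_YEAR = 1990  # hypothetical single time of measurement (assumption)
birth_years = [1910, 1920, 1930, 1940, 1950, 1960, 1970]

# In a cross-sectional study, age is fully determined by cohort.
ages = [TEST_YEAR - birth_year for birth_year in birth_years]

# Map each age group to the set of cohorts observed at that age.
cohorts_seen_at_age = {}
for birth_year, age in zip(birth_years, ages):
    cohorts_seen_at_age.setdefault(age, set()).add(birth_year)

for age in sorted(cohorts_seen_at_age):
    print("age", age, "-> cohort(s)", sorted(cohorts_seen_at_age[age]))

# True when no age is observed in more than one cohort, i.e. when
# age and cohort can not be separated by this design.
fully_confounded = all(len(c) == 1 for c in cohorts_seen_at_age.values())
print("age and cohort fully confounded:", fully_confounded)
```

Because no age is ever observed in more than one cohort, any difference between the 80-year-olds and the 20-year-olds in this sketch could equally be described as a difference between the 1910 cohort and the 1970 cohort.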

It should be noted that selective sampling with the cross-sectional method can also be problematic. For example, selective sampling is a problem when examining intellectual performance with age. The studies reporting a drop in intelligence with increasing age may simply be reporting a selection bias - the subjects sampled were in long-term care and thus, non-well. When evaluating the results of cross-sectional studies, care should be taken to examine the size and representativeness of the selected samples.

The longitudinal method is defined by Baltes (1968) as follows: "One sample (S1) is observed several times (O1 - On) on the same dependent variable at different age levels (A1 - An), and therefore by definition at different times of measurement (T1 - Tn)" (p. 146). In other words, one group of individuals within one cohort is tested at least twice over time. Cook and Campbell (1979) would define this method as a time-series design. As such, it suffers from many threats to internal validity with history being the most serious threat. History is defined as those events that occur between times of testing. In the longitudinal method, age differences or differences in maturation are confounded with history effects. What occurs in the environment represents an experimental treatment. In other words, normative history-graded influences are confounded with age differences.

Let us presume that a researcher had decided to study spending habits across the life-span and this research was begun shortly after the turn of the twentieth century. A group of individuals was initially studied at 20 years of age in 1910. A follow-up test was then conducted every ten years for the next 50 years. Once again, increased conservatism concerning spending was found to be correlated with increased age. However, age is confounded with a normative history-graded event. In this example, the event was the Great Depression of the early 1930s. Therefore, the Great Depression acted as a treatment effect.

As demonstrated above, the longitudinal method confounds history and maturation. It is important to note that even small changes in a context can have an impact on longitudinal research. Thus, a change in school administration could possibly impact any longitudinal research being conducted within a particular school system. Therefore, as a methodology it can also only be used descriptively.

There are also several selection-related threats with the longitudinal method. First, the longitudinal method rarely meets the criteria of selective sampling (Baltes, 1968). For example, individuals who volunteer to participate in a longitudinal study are usually of higher intelligence and socioeconomic status (Baltes, 1968). Second, longitudinal studies suffer from selective survival. Individuals who survive (or at least don't drop out of the study) may be qualitatively different than those who do not (Jarvik & Falek, 1963). Similarly, longitudinal studies also suffer from selective drop-out/experimental mortality (Campbell & Stanley, 1963). It is theorized, in the longitudinal method, that the same group of individuals will be repeatedly tested, thus leading to a homogeneity of groups across testing times. Subject attrition due to drop-out creates a disparity between those who remain and those who leave the study. Thus, the longitudinal method suffers from many selection biases.
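The effect of selective drop-out can be sketched with simulated scores. This is an illustration only and does not reproduce any of the studies cited above. It assumes a hypothetical test with a mean of 100 and a standard deviation of 15, 1000 subjects at the first testing, and a made-up rule in which higher scorers are more likely to remain in the study. Scores are never changed, so any shift in the average comes from attrition alone.

```python
import random

rng = random.Random(3)

# Hypothetical first-wave scores (mean 100, SD 15); assumption for illustration.
first_wave = [rng.gauss(100, 15) for _ in range(1000)]

def remains_in_study(score):
    """Hypothetical drop-out rule: the chance of staying rises with the score."""
    chance = 0.5 + (score - 100) / 60
    chance = min(1.0, max(0.0, chance))
    return rng.random() < chance

# The second wave contains only the subjects who did not drop out.
# Each person's score is unchanged; no real development is simulated.
second_wave = [score for score in first_wave if remains_in_study(score)]

mean_first = sum(first_wave) / len(first_wave)
mean_second = sum(second_wave) / len(second_wave)
print("subjects at first testing: ", len(first_wave))
print("subjects at second testing:", len(second_wave))
print("mean at first testing: ", round(mean_first, 1))
print("mean at second testing:", round(mean_second, 1))
```

Under these assumptions the group average rises between testings even though no individual changed, which shows how attrition alone can mimic a developmental trend.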

Testing effects are also a problem with the longitudinal method. This is particularly evident in studies where subjects have been retested many times. For example, the Berkeley Growth Study tested the majority of subjects approximately 38 times over a period of 18 years (Bayley, 1948, cited in Baltes, 1968).

It should be clear from the description above that the longitudinal method suffers from many threats to internal validity. It should also be noted that the longitudinal method is very time-consuming and expensive to conduct. Thus, while at first glance it appears that the longitudinal method provides us a window into the life-span, it can only be used to describe changes in a particular group of individuals over a particular period of time.

When both cohort effects and normative history-graded effects are thought to play a role, it is suggested that a cross-sequential design be implemented (Schaie, 1965). The cross-sequential method is designed to measure all cohorts at all times of measurement. In other words, two or more age cohorts are tested at two or more times. Thus, one can examine not only age changes but also cohort effects and normative history-graded influences. It is assumed that these effects are additive and thus the amount of variance due to each can be divided out. However, if interactions occur these effects can not be partitioned out. Thus, the assumptions underlying the cross-sequential method may be false.

If a researcher had decided to conduct a cross-sequential study of spending habits in 1910 that extended for sixty years, they would have discovered that a normative history-graded influence was impacting their study. Possibly, they would have found little difference between the groups measured in 1910 and 1920 regardless of the age group to which they belonged. However, when they tested these same individuals after 1930, they would have found that all age groups became more conservative in their spending, thus indicating a history effect. If a pattern was found such that a particular cohort was always different from another cohort when tested regardless of the year they were tested, then the difference is most likely a cohort effect. However, if a difference is found between two ages regardless of cohort or year studied, then the effect is most likely maturational or age related.
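The layout of such a study can be sketched as a grid of cohorts by times of measurement. The sketch below is an illustration only: the three birth cohorts (1870, 1880, 1890) are hypothetical, and the test years follow the example of a study begun in 1910 and extended for sixty years with testing every ten years (the ten-year spacing is also an assumption). It shows that, unlike the cross-sectional design, the same age is observed in several cohorts and in several years.

```python
cohorts = [1870, 1880, 1890]               # hypothetical birth years
test_years = list(range(1910, 1971, 10))   # 1910, 1920, ..., 1970

# Every cohort is measured at every time of measurement.
# Each observation is (birth year, test year, age at testing).
observations = [(cohort, year, year - cohort)
                for cohort in cohorts
                for year in test_years]

def group_by(position):
    """Group observations by birth year (0), test year (1), or age (2)."""
    groups = {}
    for obs in observations:
        groups.setdefault(obs[position], []).append(obs)
    return groups

by_cohort = group_by(0)  # compare these groups to look for cohort effects
by_year = group_by(1)    # compare these groups to look for history effects
by_age = group_by(2)     # compare these groups to look for age effects

# Age 40 is reached by three different cohorts in three different years.
for cohort, year, age in by_age[40]:
    print("age", age, "observed for cohort", cohort, "in", year)
```

Because 40-year-olds are observed in 1910, 1920, and 1930 in this sketch, a difference that appears at age 40 in every cohort points toward maturation, while a shift that appears in every group after a particular year points toward history.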

In addition to the above design, it has been suggested that multivariate procedures be applied to the study of human development (Bock, 1979; Nesselroade, 1970). The traditional designs, such as those described above, examine only one dependent variable. As such, their external validity is low. In other words, a set of variables serves better to define a psychological construct than a single variable. In addition, a multivariate procedure enables interrelationships between variables and constructs to be examined. However, as a wide range of multivariate techniques can be used in the study of development, they will not all be discussed here. Briefly stated, those with the greatest applicability include factor analysis, principal components analysis, multivariate analysis of variance (MANOVA), and multivariate analysis of covariance (MANCOVA). The latter two methodologies can be used on all the designs described above as long as more than one dependent variable is being studied and the basic assumptions underlying each test are met. Therefore, this methodology is extremely functional in relation to the study of human development.

Discussion Questions:

1. Why do you think Wechsler found that IQ drops with age?

2. Why do you think that Terman found that IQ increases with age?

3. Why do you think that more recent researchers such as Schaie have found that intelligence remains stable with age?

4. Why do you think that Horn found that intelligence both increases and decreases with age?

5. Why do you think that Riegel and Riegel found that intelligence remains stable until five years prior to death at which point it declines in old age?
