Sample Size In Case Study Research

In recent years, there has been an increase in submissions to the Journal that draw on qualitative research methods. This increase is welcome and indicates not only the interdisciplinarity embraced by the Journal (Zucker, 2002) but also its commitment to a wide array of methodologies.

Among authors who select qualitative methods, and who use grounded theory and in-depth interviews in particular, many questions have arisen recently about how to write a rigorous Method section. That topic will be addressed in a subsequent Editorial. At this time, however, the most common question we receive is: “How large does my sample size have to be?” I would therefore like to take this opportunity to answer this question by discussing the relevant debates and then the policy of the Archives of Sexual Behavior.1

The sample size used in qualitative research methods is often smaller than that used in quantitative research methods. This is because qualitative research methods are often concerned with garnering an in-depth understanding of a phenomenon or are focused on meaning (and heterogeneities in meaning)—which are often centered on the how and why of a particular issue, process, situation, subculture, scene or set of social interactions. In-depth interview work is not as concerned with making generalizations to a larger population of interest and does not tend to rely on hypothesis testing but rather is more inductive and emergent in its process. As such, the aim of grounded theory and in-depth interviews is to create “categories from the data and then to analyze relationships between categories” while attending to how the “lived experience” of research participants can be understood (Charmaz, 1990, p. 1162).

There are several debates concerning what sample size is the right size for such endeavors. Most scholars argue that the concept of saturation is the most important factor to think about when mulling over sample size decisions in qualitative research (Mason, 2010). Saturation is defined by many as the point at which the data collection process no longer offers any new or relevant data. Another way to state this is that conceptual categories in a research project can be considered saturated “when gathering fresh data no longer sparks new theoretical insights, nor reveals new properties of your core theoretical categories” (Charmaz, 2006, p. 113). Saturation depends on many factors, and not all of them are under the researcher’s control. Some of these include: How homogeneous or heterogeneous is the population being studied? What are the selection criteria? How much money is in the budget to carry out the study? Are there key stratifiers (e.g., conceptual, demographic) that are critical for an in-depth understanding of the topic being examined? What is the timeline that the researcher faces? How experienced is the researcher in being able to determine when she or he has actually reached saturation (Charmaz, 2006)? Is the author carrying out theoretical sampling and is, therefore, concerned with ensuring depth on relevant concepts and examining a range of concepts and characteristics that are deemed critical for emergent findings (Glaser & Strauss, 1967; Strauss & Corbin, 1994, 2007)?

While some experts in qualitative research avoid the topic of “how many” interviews “are enough,” there is indeed variability in what is suggested as a minimum. A large number of articles, book chapters, and books offer guidance, suggesting anywhere from 5 to 50 participants as adequate. These works engage in nuanced debates when responding to the question of “how many” and frequently respond with a vague (and, actually, reasonable) “it depends.” Numerous factors are said to be important, including “the quality of data, the scope of the study, the nature of the topic, the amount of useful information obtained from each participant, the use of shadowed data, and the qualitative method and study design used” (Morse, 2000, p. 1). Others argue that the “how many” question can be the wrong question and that the rigor of the method “depends upon developing the range of relevant conceptual categories, saturating (filling, supporting, and providing repeated evidence for) those categories,” and fully explaining the data (Charmaz, 1990). Indeed, there have been countless conferences and conference sessions on these debates, reports written, and myriad publications (for a compilation of debates, see Baker & Edwards, 2012).

Taking all of these perspectives into account, the Archives of Sexual Behavior is putting forward a policy for authors in order to provide more clarity on what is expected in terms of sample size for studies drawing on grounded theory and in-depth interviews. The policy of the Archives of Sexual Behavior will be to adhere to the recommendation that 25–30 participants is the minimum sample size required to reach saturation and redundancy in grounded theory studies that use in-depth interviews. This number is considered adequate for publication because it (1) may allow for thorough examination of the characteristics that address the research questions and for distinguishing conceptual categories of interest, (2) maximizes the possibility that enough data have been collected to clarify relationships between conceptual categories and identify variation in processes, and (3) maximizes the chances that negative cases and hypothetical negative cases have been explored in the data (Charmaz, 2006; Morse, 1994, 1995).

The Journal does not want to paradoxically and rigidly quantify sample size when the endeavor at hand is qualitative in nature and the debates on this matter are complex. However, we are providing this practical guidance. We want to ensure that more of our submissions have an adequate sample size so as to get closer to reaching the goal of saturation and redundancy across relevant characteristics and concepts. The current recommendation that is being put forward does not include any comment on other qualitative methodologies, such as content and textual analysis, participant observation, focus groups, case studies, clinical cases or mixed quantitative–qualitative methods. The current recommendation also does not apply to phenomenological studies or life history approaches. The current guidance is intended to offer one clear and consistent standard for research projects that use grounded theory and draw on in-depth interviews.

References

  1. Baker, S. E., & Edwards, R. (2012). How many qualitative interviews is enough? National Centre for Research Methods. Available at: http://eprints.ncrm.ac.uk/2273/.

  2. Charmaz, K. (1990). ‘Discovering’ chronic illness: Using grounded theory. Social Science and Medicine, 30, 1161–1172.

  3. Charmaz, K. (2006). Constructing grounded theory: A practical guide through qualitative analysis. London: Sage Publications.

  4. Glaser, B. G., & Strauss, A. L. (1967). The discovery of grounded theory: Strategies for qualitative research. Chicago: Aldine Publishing Co.

  5. Mason, M. (2010). Sample size and saturation in PhD studies using qualitative interviews. Forum: Qualitative Social Research, 11(3) [Article No. 8].

  6. Morse, J. M. (1994). Designing funded qualitative research. In N. Denzin & Y. Lincoln (Eds.), Handbook of qualitative research (pp. 220–235). Thousand Oaks, CA: Sage Publications.

  7. Morse, J. M. (1995). The significance of saturation. Qualitative Health Research, 5, 147–149.

  8. Morse, J. M. (2000). Determining sample size. Qualitative Health Research, 10, 3–5.

  9. Strauss, A. L., & Corbin, J. M. (1994). Grounded theory methodology. In N. K. Denzin & Y. S. Lincoln (Eds.), Handbook of qualitative research (pp. 273–285). Thousand Oaks, CA: Sage Publications.

  10. Strauss, A. L., & Corbin, J. M. (2007). Basics of qualitative research: Techniques and procedures for developing grounded theory. Thousand Oaks, CA: Sage Publications.

  11. Zucker, K. J. (2002). From the Editor’s desk: Receiving the torch in the era of sexology’s renaissance. Archives of Sexual Behavior, 31, 1–6.

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. Department of Social and Behavioral Sciences, University of California at San Francisco, San Francisco, USA

Volume 11, No. 3, Art. 8 – September 2010

Sample Size and Saturation in PhD Studies Using Qualitative Interviews

Mark Mason

Abstract: A number of issues can affect sample size in qualitative research; however, the guiding principle should be the concept of saturation. This has been explored in detail by a number of authors but is still hotly debated and, some say, little understood. A sample of PhD studies using qualitative approaches, and qualitative interviews as the method of data collection, was taken from theses.com and content analysed for their sample sizes. Five hundred and sixty studies were identified that fitted the inclusion criteria. Results showed that the mean sample size was 31; however, the distribution was non-random, with a statistically significant proportion of studies presenting sample sizes that were multiples of ten. These results are discussed in relation to saturation. They suggest a pre-meditated approach that is not wholly congruent with the principles of qualitative research.

Key words: saturation; sample size; interviews

Table of Contents

1. Introduction

1.1 Factors determining saturation

1.2 Guidelines for sample sizes in qualitative research

1.3 Operationalising the concept of saturation

1.4 The issue of saturation in PhDs

2. Method

3. Results

4. Discussion

5. Conclusion

Notes

References

Author

Citation

 

1. Introduction

Samples for qualitative studies are generally much smaller than those used in quantitative studies. RITCHIE, LEWIS and ELAM (2003) provide reasons for this. There is a point of diminishing returns in a qualitative sample: as the study goes on, more data do not necessarily lead to more information. This is because one occurrence of a piece of data, or a code, is all that is necessary to ensure that it becomes part of the analysis framework. Frequencies are rarely important in qualitative research, as one occurrence of a piece of data is potentially as useful as many in understanding the process behind a topic. This is because qualitative research is concerned with meaning, not with making generalised hypothesis statements (see also CROUCH & McKENZIE, 2006). Finally, because qualitative research is very labour intensive, analysing a large sample can be time consuming and often simply impractical. [1]

Within any research area, different participants can have diverse opinions. Qualitative samples must be large enough to assure that most or all of the perceptions that might be important are uncovered, but at the same time if the sample is too large data becomes repetitive and, eventually, superfluous. If a researcher remains faithful to the principles of qualitative research, sample size in the majority of qualitative studies should generally follow the concept of saturation (e.g. GLASER & STRAUSS, 1967)—when the collection of new data does not shed any further light on the issue under investigation. [2]

While there are other factors that affect sample size in qualitative studies, researchers generally use saturation as a guiding principle during their data collection. This paper examines the size of the samples from PhD studies that have used interviews as their source of data collection. It does not look at the data found in those studies, just the numbers of the respondents in each case. [3]

1.1 Factors determining saturation

While saturation determines the majority of qualitative sample sizes, other factors can dictate how quickly or slowly it is achieved in a qualitative study. CHARMAZ (2006) suggests that the aims of the study are the ultimate driver of the project design, and therefore of the sample size. She suggests that a small study with "modest claims" (p.114) might achieve saturation more quickly than a study aiming to describe a process that spans disciplines (for example, describing drug addiction in a specific group rather than addiction in general). [4]

Other researchers have identified supplementary factors that can influence qualitative sample size, and therefore saturation, in qualitative studies. RITCHIE et al. (2003, p.84) outline seven factors that might affect the potential size of a sample:

"the heterogeneity of the population; the number of selection criteria; the extent to which 'nesting' of criteria is needed; groups of special interest that require intensive study; multiple samples within one study; types of data collection methods use; and the budget and resources available". [5]

To this, MORSE (2000, p.4) adds "the scope of the study, the nature of the topic, the quality of the data, the study design", and the use of what she calls "shadowed data". [6]

JETTE, GROVER and KECK (2003) suggested that expertise in the chosen topic can reduce the number of participants needed in a study—while LEE, WOO and MACKENZIE (2002) suggest that studies that use more than one method require fewer participants, as do studies that use multiple (very in-depth) interviews with the same participant (e.g. longitudinal or panel studies). [7]

Some researchers have taken this a step further and tried to develop a debate on the concept of saturation. MORSE (1995) feels that researchers often claim to have achieved saturation but are not necessarily able to prove it. This is also suggested by BOWEN (2008) who feels that saturation is claimed in any number of qualitative research reports without any overt description of what it means or how it was achieved. To this end, CHARMAZ (2006) gives the example of a researcher studying stigma in obese women. It is entirely possible that a researcher will claim that the category "experiencing stigma" is saturated very quickly. However, while an inexperienced researcher might claim saturation, a more experienced researcher would explore the context of stigma in more detail and what it means to each of these women (p.114). [8]

According to DEY (1999), the concept of saturation is inappropriate. He suggests that researchers often close categories early, when the data are only partially coded, and cite others to support this practice, such as STRAUSS and CORBIN (1998 [1990]), who suggest that saturation is a "matter of degree" (p.136). They suggest that however long researchers examine, familiarise themselves with, and analyse their data, there will always be the potential for "the new to emerge". Instead, they conclude that saturation should be more concerned with reaching the point at which continuing becomes "counter-productive", where "the new" that is discovered does not necessarily add anything to the overall story, model, theory or framework (p.136). They admit that sometimes the problem in developing a conclusion is not a lack of data but an excess of it: as the analysis begins to take shape, it is important for the researcher to become more disciplined and cut data where necessary. [9]

1.2 Guidelines for sample sizes in qualitative research

As a result of the numerous factors that can determine sample sizes in qualitative studies, many researchers shy away from suggesting what constitutes a sufficient sample size (in contrast to quantitative studies for example). However, some clearly find this frustrating. GUEST, BUNCE and JOHNSON (2006, p.59) suggest, "although the idea of saturation is helpful at the conceptual level, it provides little practical guidance for estimating sample sizes for robust research prior to data collection". During the literature search for the background to their study they found "only seven sources that provided guidelines for actual sample sizes" (p.61):

  • Ethnography and ethnoscience: MORSE (1994, p.225) suggests 30-50 interviews for both; BERNARD (2000, p.178) states that most ethnoscience studies are based on samples of between 30 and 60 interviews;

  • grounded theory methodology: CRESWELL (1998, p.64) suggests 20-30; MORSE (1994, p.225) 30-50 interviews;

  • phenomenology: CRESWELL (1998, p.64) suggests five to 25; MORSE (1994, p.225) at least six;

  • all qualitative research: BERTAUX (1981, p.35) suggests fifteen is the smallest acceptable sample (adapted from GUEST et al., 2006). [10]

While these numbers are offered as guidance, the authors do not tend to present empirical arguments for why these numbers rather than others. Nor is the question of why certain methodological approaches are felt to call for more participants than others explored in any detail. [11]

Further to this, other researchers have tried to suggest some kind of guidelines for qualitative sample sizes. CHARMAZ (2006, p.114) for example suggests that "25 (participants are) adequate for smaller projects"; according to RITCHIE et al. (2003, p.84) qualitative samples often "lie under 50"; while GREEN and THOROGOOD (2009 [2004], p.120) state that "the experience of most qualitative researchers (emphasis added) is that in interview studies little that is 'new' comes out of transcripts after you have interviewed 20 or so people". [12]

While some researchers offer guidelines for qualitative samples, there is evidence that others do not strictly adhere to them. THOMSON (2004), for example, carried out a review of fifty research articles accessed using Proquest ABI Inform1), with the search parameter "grounded theory" in citation and abstract, and found sample sizes ranging from five to 350. Just over a third (34%) used samples within CRESWELL's suggested range of 20 to 30 (1998, p.128), while only 11 studies (or 22%) used samples in MORSE's range of over 30 (1994, p.225). [13]

1.3 Operationalising the concept of saturation

There is an obvious tension between those who adhere to qualitative research principles by not quantifying their samples, and those who feel that providing guidance on sample sizes is useful. Some researchers have gone further than providing guidelines and have tried to operationalise the concept of saturation, based on their own empirical analysis. [14]

Possibly the first to attempt this were ROMNEY, BATCHELDER and WELLER (1986), who developed an analysis tool called the "Cultural Consensus Model" (CCM) for their ethnographic work. This sought to identify common characteristics between communities and cultural groups. The model suggests that each culture has a shared view of the world, which results in a "cultural consensus"; the level of consensus on different topics varies, but there is considered to be a finite set of characteristics or views. ROMNEY et al. suggest these views can then be factor analysed to produce a rigorous model of a culture's views on that topic. The tool has also been used by some to estimate a minimum sample size, recently for example by ATRAN, MEDIN and ROSS (2005, p.753), who suggested that in some of their studies "as few as 10 informants were needed to reliably establish a consensus". [15]

GRIFFIN and HAUSER (1993) reanalysed data from their own study of customers of portable food containers. Using a model developed by VORBERG and ULRICH (1987), they examined the number of customer needs uncovered by various numbers of in-depth interviews and focus groups. Their work was undertaken from a market research perspective, to assist in the development of robust bids and campaigns. On the basis of their analysis, they hypothesised that twenty to thirty in-depth interviews would be needed to uncover ninety to ninety-five per cent of all customer needs. [16]

Most recently, GUEST et al. (2006) carried out a systematic analysis of their own data from a study of sixty women concerning reproductive health care in Africa. They examined the codes developed from their sixty interviews, in an attempt to assess the point at which the data were returning no new codes and were therefore saturated. Their findings suggested that data saturation had occurred at a very early stage. Of the thirty-six codes developed for their study, thirty-four were developed from the first six interviews, and thirty-five after twelve. Their conclusion was that for studies with a high level of homogeneity among the population "a sample of six interviews may [be] sufficient to enable development of meaningful themes and useful interpretations" (p.78). [17]
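The logic behind GUEST et al.'s analysis, counting how many genuinely new codes each successive interview contributes until the count falls to zero, can be sketched in a few lines. This is a minimal illustration of the idea only, not their actual analysis; the function name and the toy interview data below are hypothetical.

```python
def new_codes_per_interview(interviews):
    """For each interview (a set of codes), count codes not seen in any earlier interview."""
    seen = set()
    counts = []
    for codes in interviews:
        fresh = set(codes) - seen  # codes appearing for the first time
        counts.append(len(fresh))
        seen |= fresh
    return counts

# Hypothetical coded interviews: most codes surface in the first few,
# after which no new codes appear (i.e. the categories look saturated).
interviews = [
    {"stigma", "diet", "family"},
    {"stigma", "work", "clothing"},
    {"diet", "family", "exercise"},
    {"work", "stigma"},
    {"clothing", "diet"},
]
print(new_codes_per_interview(interviews))  # [3, 2, 1, 0, 0]
```

A run of trailing zeros is what a researcher would point to when claiming saturation, which is exactly the evidence MORSE (1995) and BOWEN (2008) complain is rarely reported.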

1.4 The issue of saturation in PhDs

GREEN and THOROGOOD (2009 [2004]) agree with GUEST et al., and feel that while saturation is a convincing concept, it has a number of practical weaknesses. This is particularly apparent in what they call "funded work" (or work limited by time): researchers do not have the luxury of continuing the sort of open-ended research that saturation requires. It is especially true given that they consider the point of saturation (particularly in relation to an approach like grounded theory methodology, which requires that all of the properties and dimensions are saturated) to be "potentially limitless" (p.120). They go on to add that sponsors of research often require a thorough proposal that includes a description of who, and how many people, will be interviewed at the outset of the research (see also SIBLEY, 2003). They further suggest that this also applies to ethics committees, who will want to know who will be interviewed, where, and when, with a clearly detailed rationale and strategy. This is no less relevant to PhD researchers. [18]

A reading of the application requirements of some of the world's top 50 universities2) suggests it is not uncommon for universities to require applicants to explicitly document their intended sample size prior to registration. The University of Toronto, for example (ranked 29th in 2009), requires prospective students of PhD research programmes to "[j]ustify the anticipated sample size and its representativeness. For example, how many documents, interviews, focus groups will be consulted/undertaken and why?"3) Further, University College Dublin (ranked 43rd in 2009) requires prospective students to give "an indication of the feasibility of the proposed project including some indication of sample size and selection"4). [19]

This also appears to trouble current postgraduate students. When STRAUSS and CORBIN redrafted their "Basics of qualitative research" (1998 [1990]), they included what they considered the twenty most frequently asked questions in their classes and seminars; question sixteen is, "How many interviews or observations are enough? When do I stop gathering data?" The answer they give outlines the concept of saturation once more, but finishes by acknowledging that there are constraints (including time, energy, availability of participants, etc.): "Sometimes the researcher has no choice and must settle for a theoretical scheme that is less developed than desired" (p.292). [20]

As an example of how this issue affects current postgraduate students, a brief search of the prominent Internet forum PostgraduateForum.com5) found at least three live discussion threads, specifically set up to debate and discuss the number of research participants required for their studies: "How many qual research interviews?"6); "No. of participants"7); and "How many qualitiative (sic.) interviews"8). [21]

The PhD is probably the one time that a researcher (often mature and in the middle of his/her career) should get to examine a subject in a great deal of detail, over the course of a number of years. Throughout the supervisory process the study is scrutinised by national, and often international, experts, and once completed, the methodology and findings scrutinised further. If a high level of rigour is to be found in the types of methods used in research studies then it should be in PhDs. [22]

With this in mind it was decided to examine the issue of sample size in the context of PhDs studies. The following research questions were developed to explore this issue:

  • How many participants are used in PhD studies utilising qualitative interviews? And do these numbers vary depending on the methodological approach? [23]

2. Method

A content analysis of a PhD database was undertaken on the website: "Index To Theses: A comprehensive listing of theses with abstracts accepted for higher degrees by universities in Great Britain and Ireland since 1716"9) ("the only comprehensive published listing of British theses accepted annually for higher degrees by some of the most prestigious educational institutions in the world; the Universities of Great Britain and Ireland"10)). [24]

Searching was undertaken between 3rd August 2009 and the 24th August 2009 (on 532,646 abstracts in the collection; last updated 2 July 2009, volume 58, 3rd update of 8) to identify PhD studies which stated they had used qualitative (i.e. structured; semi-structured or unstructured) interviews as a method of data collection. [25]

To explore any differences between research approaches, a categorisation of 26 different qualitative research approaches from diverse disciplines was used (TESCH, 1990). While studying qualitative research software, TESCH found 26 different types of qualitative methodological tradition and categorised them into four groups: the characteristics of language, the discovery of regularities, the comprehension of the meaning of text or action, and reflection. [26]

A "standard search" was used, with the following parameters applied: "Any field" contains "INSERT METHODOLOGY (e.g. "Grounded Theory"11)); "Any field" contains "interviews"; and "Degree" contains "PhD". The following criteria were used to exclude cases:

  • Abstracts that did not state the exact number of interviews were excluded (e.g. studies where the author stated that "over fifty interviews were undertaken").

  • Abstracts that stated that the author had been part of a fieldwork team were excluded.

  • Abstracts that specified more than one interview for one participant were excluded (i.e. repeat interviews, longitudinal studies or panel studies).

  • Abstracts from other professional qualifications such as PhDs in clinical psychology (DClinPsy12)), for example, where single client case studies are prevalent, were excluded. [27]

These criteria were intended to ensure that only studies explicitly detailing the actual number of people interviewed once as part of the work were included. This study also looks only at one-to-one personal interviewing; focus groups are therefore not included in the analysis. [28]
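The screening step above can be sketched as a simple filter over abstract text: keep an abstract only when it states an exact interview count, and discard vague counts. This is an illustrative reconstruction under stated assumptions, not the actual procedure; the function name, regular expressions, and example abstracts are hypothetical and far cruder than the manual reading of abstracts the study describes.

```python
import re

def exact_interview_count(abstract):
    """Return the stated interview count, or None if it is vague or absent."""
    # Vague counts such as "over fifty interviews" are excluded outright.
    if re.search(r"\bover\s+\w+\s+interviews\b", abstract, re.I):
        return None
    # Otherwise look for an explicit numeral immediately qualifying "interviews".
    m = re.search(r"\b(\d+)\s+(?:in-depth\s+|semi-structured\s+)?interviews\b",
                  abstract, re.I)
    return int(m.group(1)) if m else None

print(exact_interview_count("Data were drawn from 25 in-depth interviews."))  # 25
print(exact_interview_count("Over fifty interviews were undertaken."))        # None
```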

The remaining studies were collected into the sample. The abstracts were searched and the following details recorded from each:

  • number of participants interviewed;

  • methodological approach used; and

  • category of qualitative research. [29]

3. Results

Table 1 shows the results of the analysis. It provides the number of studies identified for each research approach (as identified by TESCH, 1990) and the number of studies that made up the study sample once the inclusion criteria were applied. It further provides the highest and lowest numbers of participants/interviews used for each approach, along with measures of central tendency and dispersion. Finally, it identifies how many of these studies used interviews as the only method, and provides this figure as a percentage of the overall sample.

 

| Approach | No. of studies found | No. after inclusion criteria | High | Low | Mode | Mean | Median | St. Dev. |
|---|---|---|---|---|---|---|---|---|
| Action research | 140 | 28 | 67 | 3 | 6 | 23 | 17 | 18.4 |
| Case study | 1401 | 179 | 95 | 1 | 40 | 36 | 33 | 21.1 |
| Collaborative research | 8 | 2 | 25 | 5 | - | 15 | 15 | 14.1 |
| Content analysis | 213 | 42 | 70 | 2 | 30 | 28 | 25 | 14.7 |
| Critical / emancipatory research | 6 | 3 | 42 | 21 | - | 35 | 41 | 11.8 |
| Discourse analysis | 157 | 44 | 65 | 5 | 20 | 25 | 22 | 15.3 |
| Ecological psychology | 0 | 0 | - | - | - | - | - | - |
| Educational ethnography | 0 | 0 | - | - | - | - | - | - |
| Education connoisseurship | 0 | 0 | - | - | - | - | - | - |
| Ethnographic content analysis | 2 | 2 | 52 | 22 | - | 37 | 37 | 21.2 |
| Ethnography of communication | 1 | 1 | 34 | 34 | - | 34 | 34 | - |
| Ethnomethodology | 7 | 2 | 55 | 11 | - | 31 | 31 | 27.6 |
| Ethnoscience | 0 | 0 | - | - | - | - | - | - |
| Event structure | 0 | 0 | - | - | - | - | - | - |
| Grounded theory | 429 | 174 | 87 | 4 | 25 | 32 | 30 | 16.6 |
| Holistic ethnography | 1 | 0 | - | - | - | - | - | - |
| Hermeneutics | 19 | 9 | 42 | 7 | - | 24 | 26 | 10.2 |
| Heuristic research | 0 | 0 | - | - | - | - | - | - |
| Life history | 61 | 35 | 62 | 1 | 21 | 23 | 20 | 16.1 |
| Naturalistic enquiry | 2 | 1 | 26 | 26 | - | 26 | 26 | - |
| Phenomenology | 57 | 25 | 89 | 7 | 20 | 25 | 20 | 19.9 |
| Qualitative evaluation | 7 | 1 | 42 | 42 | - | 42 | 42 | - |
| Reflective phenomenology | 0 | 0 | - | - | - | - | - | - |
| Structural ethnography | 0 | 0 | - | - | - | - | - | - |
| Symbolic interactionism | 22 | 12 | 87 | 4 | - | 33 | 28 | 26.5 |
| Transcendental realism | 0 | 0 | - | - | - | - | - | - |
| TOTAL | 2533 | 560 | 95 | 1 | 30 | 31 | 28 | 18.7 |

Table 1: Descriptive statistics for each methodological group [30]

Table 1 shows that the overall range of the numbers of participants used was from 1 (using a case study approach, and also a life history approach) to 95 (using a case study approach). Of the 560 studies analysed, the median and mean were 28 and 31 respectively, suggesting perhaps that the measures of central tendency were generally consistent. However, the distribution is bi-modal (20 and 30) and the standard deviation was 18.7, which suggests that the distribution of studies was somewhat positively skewed and comparatively widely dispersed around the mean. Figure 1 below illustrates the distribution of the sample, i.e. how many studies used 1 participant in their study, how many used 2, how many used 3, etc.
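The per-approach summary statistics reported in Table 1 can be reproduced with Python's standard `statistics` module. The list of sample sizes below is hypothetical (the 560 actual sample sizes are not reproduced in this article), so the output is illustrative of the computation only, not of the study's figures.

```python
import statistics

def describe(sizes):
    """Descriptive statistics of the kind reported per approach in Table 1."""
    return {
        "high": max(sizes),
        "low": min(sizes),
        "mode": statistics.mode(sizes),
        "mean": round(statistics.mean(sizes)),   # Table 1 reports whole numbers
        "median": statistics.median(sizes),
        "st_dev": round(statistics.stdev(sizes), 1),
    }

# Hypothetical sample sizes for one methodological approach.
sizes = [10, 20, 20, 25, 30, 30, 30, 40, 50, 67]
print(describe(sizes))
```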



Figure 1: Number of studies by each individual sample size [31]

Figure 1 shows a bi-modal distribution with a skewness13) of 0.936 and a kurtosis14) of 0.705, suggesting a positively skewed distribution. The most important result from the chart, however, is the distribution of the sample. [32]

Figure 1 shows the prevalence of studies that used 10, 20, 30 and 40 participants as their sample size. These were the four most common sample sizes, and provided 17% of the total number of studies in this analysis. The pattern continues with the prevalence of studies using 50 and 60 as their sample size compared with the numbers around them (i.e. 50 is the most prevalent sample size among samples of 50-59, and the same is true for 60). In total, sample sizes ending in a zero (i.e. studies with 10, 20, 30 participants, etc.) account for 114 of the studies in the sample; these nine sample sizes accounted for 20% of the total number of studies used in this analysis. [33]

A test for the randomness of fluctuations15) indicated very strong evidence against the randomness of fluctuations: test statistic 5.17; p=0.00025. The pattern of non-random fluctuation is illustrated more clearly in Figure 2 below.



Figure 2: Number of studies with a sample ending in each integer [34]

A Chi-squared "goodness-of-fit" test16) was then used to test the null hypothesis that samples used in qualitative studies are equally likely to end on any integer. The test gave Chi-square = 108.475, p < 0.001; as a result, the null hypothesis is rejected17). [35]
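The goodness-of-fit test can be sketched as follows: under the null hypothesis each final digit 0-9 is equally likely, so with n studies the expected count per digit is n/10. Only the total (560) and the zero-digit count (114) come from the text above; the remaining digit counts below are hypothetical, so the resulting statistic is illustrative rather than the reported 108.475.

```python
def chi_square_uniform(observed):
    """Chi-square statistic against a uniform expected distribution."""
    n = sum(observed)
    expected = n / len(observed)  # equal probability for each category
    return sum((o - expected) ** 2 / expected for o in observed)

# Final-digit counts for digits 0..9: the spike at 0 (114) and the total
# (560) are from the study; the other counts are hypothetical.
digit_counts = [114, 40, 45, 42, 48, 60, 44, 41, 66, 60]
stat = chi_square_uniform(digit_counts)
print(round(stat, 2))  # 80.39
```

With 10 categories there are 9 degrees of freedom; a statistic this far above the 1% critical value (roughly 21.7 for 9 df) would, like the study's reported 108.475, lead to rejecting the hypothesis of uniformly distributed final digits.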

Table 1 also shows the descriptive results of the analysis of the 26 approaches identified by TESCH, in an attempt to discover whether the methodological approach affects the number of interviews undertaken. The analysis returned an uneven distribution of approaches among the studies in the sample. Of the 26 approaches identified by TESCH, seven did not return any studies that fitted the search criteria, and a further one returned no studies once the inclusion criteria were applied. As a result, detailed statistical analysis was not possible. [36]

However, it was clear that there were approaches that utilised interviews in their method more frequently than others did. Of the 26 qualitative approaches, nine returned more than 10 studies (eight after the inclusion criteria were applied). The most popular approaches used in PhD studies for this analysis were: case study, grounded theory methodology, content analysis, discourse analysis, action research, life history, phenomenology, symbolic interactionism, and hermeneutics. [37]

The approach utilising interviews most frequently was the case study (1,401 studies), although only 13% of these fitted the inclusion criteria. This was followed by grounded theory studies (429), a greater proportion of which (41%) fitted the inclusion criteria. Case study and grounded theory designs together accounted for nearly two thirds (63%) of the entire sample. [38]

Qualitative evaluation had the highest mean number of participants (42), followed by ethnographic content analysis (37), critical/emancipatory research (35) and ethnography of communication (34). However, these means are derived from comparatively few studies; the more studies an approach returned into the sample, the lower its mean tended to be. [39]

Perhaps more worthy of note is that, of the major approaches (i.e. those that returned the largest numbers of studies into the sample), case study approaches had the highest mean number of participants (36), while action research and life history approaches each had a mean of 23 participants. [40]

Finally, the data were compared to the guidelines given by various authors for achieving saturation in qualitative interviews (see page 3). The number of studies used in this analysis is shown below as a proportion of the whole for that approach:

  • Sixty per cent of the ethnographic studies found fell within the range of 30-50 suggested by MORSE (1994) and BERNARD (2000). No ethnoscience studies were found that fitted the inclusion criteria.

  • Just under half (49%) of the studies in this analysis fell within CRESWELL's (1998) suggested range of 20-30 for grounded theory studies, while just over a third (37%) fell within the range of 30-50 suggested by MORSE.

  • All of the phenomenological studies identified had at least six participants, as suggested by MORSE, while just over two thirds (68%) fell within CRESWELL's suggested range of five to 25.

  • Eighty per cent of all the qualitative studies met BERTAUX's (1981) guideline, while just under half (45%) met CHARMAZ's (2006) guideline for qualitative samples, with up to 25 participants being "adequate" (p.114). A third of the studies (33%, or 186) used sample sizes of 20 or under (GREEN & THOROGOOD, 2009 [2004]). Finally, 85% met RITCHIE et al.'s (2003) assertion that qualitative samples "often lie under 50" (p.84). [41]

4. Discussion

A wide range of sample sizes was observed in the PhD studies used for this analysis. The smallest sample was a single participant in a life history study, which might be expected given the in-depth, detailed nature of that approach, while the largest was 95 participants in a study utilising a case study approach. The median and mean were 28 and 31 respectively, which suggests a generally clustered distribution. However, the standard deviation (18.7) is comparatively high, and the distribution is bi-modal and positively skewed. [42]

The most common sample sizes were 20 and 30 (followed by 40, 10 and 25). The significantly high proportion of studies utilising multiples of ten as their sample size is the most important finding of this analysis. There is no logical (or theory-driven) reason why samples ending in any one integer should be more prevalent than any other in qualitative PhD studies using interviews. If saturation is the guiding principle of qualitative studies, it could be reached at any point and is certainly no more likely to be reached with a sample ending in a zero than with any other number. The analysis carried out here, however, suggests that samples ending in zero are disproportionately common. [43]

Among the samples examined in this study there does not seem to be any clear pattern as to how far PhD researchers adhere to the guidelines for saturation established by previous researchers. A large proportion of the samples (80%) adhered to BERTAUX's guideline of 15 as the smallest acceptable number of participants for a qualitative study, irrespective of methodology. At the upper end of the spectrum, a higher proportion of researchers seem ready to adhere to RITCHIE et al.'s guideline that samples should "lie under 50". However, a proportion of studies used more than 50 participants; these larger qualitative studies are perhaps the hardest to explain. [44]

While none of the guidelines presented here is intended as a faultless reference tool for selecting qualitative sample sizes, all authors agree that saturation is achieved at a comparatively low level (e.g. GUEST et al., 2006; GRIFFIN & HAUSER, 1993; ROMNEY et al., 1986) and that samples generally do not need to be larger than 60 participants (CHARMAZ, 2006; MORSE, 1994; CRESWELL, 1998). [45]

Without more detail of the studies it is not possible to conclude whether these larger samples were truly inappropriate. GREEN and THOROGOOD (2009 [2004]) give an example of a study in which they explored how bilingual children work as interpreters for their parents. They constructed a sample of 60 participants: "but within that were various sub-samples, such as 30 young women, 15 Vietnamese speakers, 40 young people born outside the UK, and 20 people who were the only people to speak their 'mother tongue' in their school class" (p.120). [46]

MOHRMAN, TENKASI and MOHRMAN (2003) also used a comparatively large sample size for a qualitative study. Their study was longitudinal and utilised over 350 participants in eight different organisations. The study required MOHRMAN et al. to assess differences between multiple groups, so each unit of analysis required its own sub-set (more in the nature of a quantitative quota sample). There is no way of knowing whether the samples analysed in this study were similarly arranged. [47]

LEECH (2005) suggests that it is a mistake to presume that all qualitative research must inevitably use small samples. She argues that this ignores a growing body of research studies that utilise text-mining (e.g. POWIS & CAIRNS, 2003; DEL-RIO, KOSTOFF, GARCIA, RAMIREZ & HUMENIK, 2002; LIDDY, 2000) as their method. Text-mining was not identified by TESCH (1990) as a separate methodological approach and as a result was not used in this analysis; further analysis might examine samples from these studies in more detail. This highlights a potential weakness of this study: the interpretation of methodological approach. While PhD researchers' own descriptions of their work are likely to be accurate, relying on them may place studies into certain categories when they might be better suited to others. [48]

Further research might also seek to quantify the other issues that affect sample size and undertake regression analysis to see what percentage of variance in sample size can be explained by these factors. This would require a larger sample than that achieved in this paper, as the unit of analysis would be, for example, the methodological approach or the existence of supplementary methods. Finally, this paper has sought to examine the use of personal interviewing in PhD studies for the reasons already given. Further research could feasibly examine whether these patterns exist in published research. [49]
