Analysing qualitative data by computer

Nigel Fielding

Qualitative Data Analysis with a computer: recent developments

Nigel Fielding, University of Surrey

[Nigel Fielding, Reader in Sociology, is responsible for teaching qualitative methods and has research interests in criminal justice and qualitative methodology. Among his books are The National Front (1981), Linking Data (1986; a text on triangulation), Actions and Structure (1988; on micro-sociology and macro-theory) and Using computers in qualitative research (1991). He is also editor of the Howard Journal of Criminal Justice.]

It has been claimed that 'computer use in qualitative sociology is advancing faster than in quantitative research' (Hinze 1987). Around 15 dedicated qualitative analysis programs of various kinds are currently available. A number of researchers have also adapted word processing and text retrieval programs to help in qualitative analysis (for an overview of the programs, see Tesch 1991).

Computer-Assisted Qualitative Data AnalysiS (CAQDAS) is a recent development. The arrival of word processors with text retrieval and text-handling capabilities was the background for the development of specialist qualitative packages. Some early experimenters began developing their own software for particular qualitative applications almost as soon as they had assembled their first DIY personal computer. The main impetus, however, came from academic seminars that brought together social researchers and computing enthusiasts in the early 1980s.

The software did not, however, remain an interest only for those in 'pure' academic disciplines. From the outset it was widely used in applied research. It promised to meet the needs of researchers who worked under the pressure of short-term 'soft money' contracts but who nonetheless retained an enthusiasm for the intrinsic interest of qualitative data. Another major use was in market research, where the focus group approach continues to represent a distinct branch of the field.

This does not mean that CAQDAS is the answer to every qualitative research problem. Indeed, Seidel, the creator of one of the most popular programs, has written of his fear that some researchers may be led into slavish adherence to conventions built into the program's assumptions. He singles out those with little qualitative experience and those working under the pressures of applied research. Many qualitative researchers believe that the use of software poses a threat to the craft skills of a long-established research tradition. There is a perceived danger of superficial analysis produced by mechanically following a fixed set of procedures. There is also a more profound concern: that existing software contains an implicit theory of qualitative analysis, one which is not conducive to the full range of analytic postures customarily found in this eclectic field. Insofar as existing software presumes a generic theory of qualitative analysis, it largely relates to the conventional, but by no means universal, grounded theory approach. Those preferring hermeneutic approaches, ethnomethodology, conversation analysis or holistic analysis are less well served.

It is also apparent that some software imposes a very light touch on the analyst, being confined to simple, albeit rapid, text retrieval (database software such as SONAR). But other packages, notably NUDIST and Atlas-ti, promise much more. The authors of NUDIST explicitly claim that their software 'transforms' qualitative analysis. The co-developer of Atlas-ti has complained that, while developers have already advanced 'over the horizon', users are too conservative and reluctant to use features which are technically feasible. By contrast, developers of HyperCard-based applications report that they spend much of their time removing features to make their programs accessible to qualitative researchers.

Thus, a researcher considering whether to use a package on a particular project needs to take into account the kinds of analytic work the software facilitates and the kinds of work for which it is unsuitable; the relevance of the features included in the software to the analytic procedures employed by different research traditions; the degree to which holistic as opposed to segmental analysis is facilitated; and the degree to which micro-analysis (e.g., conversation analysis) is facilitated.

Fortunately, the very limits of the existing software seem to hold at bay problems of the sort Seidel describes. Those who have used CAQDAS generally find the 'threat' of the software implausible. Over the last year Ray Lee and I have been researching user experiences with CAQDAS by convening focus groups (we plan to carry out more soon; if you'd like to participate, let me know!). For the most part researchers regard CAQDAS as just another tool. They use it when appropriate, but not when analytic closure would be premature, and not when the sample size or features of the data do not justify the time needed to set it up (Lee and Fielding 1991).

What's around

This section concerns dedicated software for qualitative analysis. But many general-purpose programs are also useful, such as timeliners (MacTimeline, Tom Snyder Productions), outliners (MORE, Symantec), graphics packages (SuperPaint, Aldus) and word processing, database and spreadsheet programs (Works, Word and Excel, Microsoft). These are all for the Macintosh, but similar software exists for all platforms.

In making choices it is important to know what sort of manipulation or presentation is required and on what scale. For example, a straightforward database like Microsoft Works is certainly adequate for inductive coding of transcripts at the sentence level, where there is little need to interrelate categories across transcripts. You simply create a record template with a field for the sentence TEXT, one for sentence NUMBERS, and another (or several, perhaps one for each family of coding categories) to hold a set of CODEWORDS. This effectively duplicates the process of writing marginal notes when working on paper. Sentences are then coded by typing the names of coding categories into the CODEWORDS field. When this is complete, categories can be extracted by searches and sorts, and printed. You can always return the transcript to its original sequence by sorting on NUMBERS.
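
The record-per-sentence layout described above can be sketched in a few lines of Python. This is an illustration of the idea, not of Works itself. The field names TEXT, CODEWORDS and NUMBERS follow the text. The sample sentences, the category names and the helper functions `code_sentence`, `extract` and `restore_order` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One database record per transcript sentence (fields as in the text)."""
    number: int                                        # NUMBERS: position in transcript
    text: str                                          # TEXT: the sentence itself
    codewords: set[str] = field(default_factory=set)   # CODEWORDS: category names


def code_sentence(record: Record, *codes: str) -> None:
    """Equivalent of typing category names into the CODEWORDS field."""
    record.codewords.update(codes)


def extract(records: list[Record], code: str) -> list[Record]:
    """Equivalent of a search: every sentence coded with the given category."""
    return [r for r in records if code in r.codewords]


def restore_order(records: list[Record]) -> list[Record]:
    """Equivalent of sorting on NUMBERS: back to the original transcript order."""
    return sorted(records, key=lambda r: r.number)


# Invented sample transcript, for illustration only.
transcript = [
    Record(1, "I joined straight from school."),
    Record(2, "Night shifts were the hardest part."),
    Record(3, "My sergeant taught me more than training did."),
]
code_sentence(transcript[0], "RECRUITMENT")
code_sentence(transcript[1], "SHIFTWORK", "STRESS")
code_sentence(transcript[2], "RECRUITMENT", "MENTORING")

for r in extract(transcript, "RECRUITMENT"):
    print(r.number, r.text)
```

Keeping NUMBERS as a separate field is what makes the sort back to the original sequence possible after any amount of extracting and re-sorting.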

Thus, depending on the amount of data you have and the depth of analysis you want, it might make sense to use The Ethnograph, or a word processor and outlining software, or revert to highlighting pens and Post-It notes. The overhead in setting up and using packages is not always worth it.

Turning to dedicated packages, a few remarks about two well-known packages help to sketch in some key considerations that users face. The Ethnograph, first developed in the early 1980s, has been upgraded periodically and is now a rather sophisticated program for IBM PCs and compatibles. Users have adapted it to many individual analytic approaches. However, it is best suited to analysis of the 'cut and paste' kind rather than to approaches based on, say, sociolinguistics. It is not well suited to use on networks and, like other similar programs, does not allow multiple users simultaneous access to data files. The current version allows on-screen coding and the attachment of 'memos' to coded data. In contrast, NUDIST is specifically designed for multi-user access. It indexes on-line and off-line data, provides 'audit trails' of retrievals, and now has an interface to the quantitative data management program SPSS. Relationships in the data are displayed in 'tree structures', which users may initially find daunting. Compared with The Ethnograph, the emphasis is on conceptual relations between codes. In The Ethnograph, the emphasis is on the construction of typologies, where the relationship of data to code is the pre-eminent concern. The interface is common across platforms, and advisory support is available to those who can call Australia.

With such points in mind we can go on to some descriptive profiles. Some details, especially prices, are subject to change. This section is based on a selection from the Resources Appendix in Fielding and Lee (1991); an updated Resources Appendix appears in the new edition available from March 1993.

The Ethnograph

Allows you to identify and retrieve text from documents. Basic unit is the segment. Each can be identified by up to 12 codewords. Segments can be nested and overlapped 7 levels deep. Search results are sensitive to nests and overlaps. Searches can be done on single or multiple codeword(s). Each data file can be identified by facesheet variables. Existing coding schemes can be selectively or globally modified. Includes memo feature and codebook feature. Runs on all PCs and compatibles. Hard disk essential for version 4, recommended for earlier versions. Single copy $150 plus $20 shipping. Site licenses available.

Distributor: Qualitative Research Management, 73425 Hilltop Rd., Desert Hot Springs, CA 92240 USA. Tel. (619) 329 7026
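
To make the segment, codeword and facesheet ideas concrete, here is a minimal Python sketch of a multiple-codeword search restricted by a facesheet variable. It assumes a simple model: a segment is a line range in a data file carrying a set of codewords, and a facesheet is a dictionary of attributes per file. The 12-codeword limit comes from the entry above. Everything else, including the names `Segment`, `search`, the variables `sex` and `age_group`, and the sample data, is hypothetical and not The Ethnograph's actual file format or commands.

```python
from dataclasses import dataclass

MAX_CODEWORDS = 12  # per-segment limit given for The Ethnograph in the text


@dataclass(frozen=True)
class Segment:
    file: str                 # data file the segment belongs to
    start: int                # first line of the segment
    end: int                  # last line of the segment (inclusive)
    codewords: frozenset      # codewords attached to this segment

    def __post_init__(self):
        if len(self.codewords) > MAX_CODEWORDS:
            raise ValueError(f"at most {MAX_CODEWORDS} codewords per segment")


def search(segments, facesheets, codewords, **facesheet_filter):
    """Segments carrying ALL the requested codewords, optionally limited to
    data files whose facesheet variables match every key=value given."""
    wanted = frozenset(codewords)
    hits = []
    for seg in segments:
        sheet = facesheets.get(seg.file, {})
        if wanted <= seg.codewords and all(
            sheet.get(key) == value for key, value in facesheet_filter.items()
        ):
            hits.append(seg)
    return hits


# Hypothetical facesheets and coded segments.
facesheets = {
    "int01.txt": {"sex": "F", "age_group": "30-39"},
    "int02.txt": {"sex": "M", "age_group": "40-49"},
}
segments = [
    Segment("int01.txt", 10, 24, frozenset({"FAMILY", "WORK"})),
    Segment("int01.txt", 30, 41, frozenset({"WORK"})),
    Segment("int02.txt", 5, 18, frozenset({"FAMILY", "WORK", "HEALTH"})),
]

print(search(segments, facesheets, ["FAMILY", "WORK"]))           # one segment per file
print(search(segments, facesheets, ["FAMILY", "WORK"], sex="F"))  # women only
```

Treating a multiple-codeword search as a subset test is one reading of "searches on multiple codewords"; an 'any of these codewords' search would use a set intersection instead.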


Provides an integrated environment for data entry, memos and illustrations. Designed to assist in the analysis of text data from interviews, observations and documents. A HyperCard application (stack). A special package for focus groups, Hyperfocus, is also available. Any Macintosh; word processor, hard disk; HyperCard ver. 1.2 or higher. $125 plus $10 shipping.

Distributor: as per The Ethnograph. Or Dr R.V. Padilla, 3327 North Dakota, Chandler, AZ 85224 USA.

HyperRESEARCH

A HyperCard-based application that allows for qualitative and quantitative analysis of textual, audio and video materials. An expert system provides a semi-formal mechanism for theory-building. Statistical option allows for the simple analysis of coded data. Reporting allows for the displaying or printing of text and the replay of coded segments of audio or video. Macintosh with System 6.0 or later and HyperCard ver. 1.2 or higher. $175

Distributor: Researchware Inc., 20 Soren St., Randolph MA 02368-1945, USA. Tel (617) 961 3909.
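
The entry says only that an expert system offers "a semi-formal mechanism for theory-building". One common way to build such a mechanism is forward chaining over coded cases. If a case carries every code in a rule's antecedent, the rule's consequent is added as an inferred code. Chains of rules then express a hypothesis. The Python sketch below assumes that reading. The rules, codes and the function `forward_chain` are invented for illustration, and this is not HyperRESEARCH's own rule syntax.

```python
# A rule: (codes that must all be present in a case, code to infer).
Rule = tuple[frozenset, str]


def forward_chain(codes: set, rules: list[Rule]) -> set:
    """Fire rules repeatedly until no new code can be inferred for a case.
    Returns a new set; the case's own codes are not modified."""
    inferred = set(codes)
    changed = True
    while changed:
        changed = False
        for antecedent, consequent in rules:
            if antecedent <= inferred and consequent not in inferred:
                inferred.add(consequent)
                changed = True
    return inferred


# Invented rules and cases; the second rule depends on the first firing.
rules = [
    (frozenset({"LOW_INCOME", "NO_CAR"}), "TRANSPORT_BARRIER"),
    (frozenset({"TRANSPORT_BARRIER", "RURAL"}), "SERVICE_EXCLUSION"),
]
cases = {
    "case01": {"LOW_INCOME", "NO_CAR", "RURAL"},
    "case02": {"LOW_INCOME", "RURAL"},
}

for name, codes in cases.items():
    supported = "SERVICE_EXCLUSION" in forward_chain(codes, rules)
    print(name, "supports" if supported else "does not support", "the hypothesis")
```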


Offers facilities for filing, copying, indexing, searching and extracting textual data. Includes procedures for summarizing, annotating, categorizing, mapping, coding and quantifying data. Expresses relationships in the data graphically on screen by width of linking line. Macintosh with System 6.0 or later and HyperCard ver. 1.2 or higher. £50

Distributor: Ian Dey, 45 Colinton Rd., Edinburgh EH10 5EN
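
The entry notes that this program shows relationships by the width of the linking line. The Python sketch below shows one way such widths could be derived. It counts how often two categories are assigned to the same bit of data and scales that count linearly to a pixel width. The choice of co-occurrence as the measure, the linear scaling, the 1 to 8 pixel range and the name `link_widths` are all assumptions for illustration, not the program's documented method.

```python
from collections import Counter
from itertools import combinations


def link_widths(coded_bits, min_px=1, max_px=8):
    """Line width (in pixels) for each pair of categories, scaled linearly by
    how often the pair is assigned to the same bit of data (assumed weighting)."""
    counts = Counter()
    for categories in coded_bits:
        # Sort so that ("family", "work") and ("work", "family") count as one link.
        for pair in combinations(sorted(set(categories)), 2):
            counts[pair] += 1
    if not counts:
        return {}
    top = max(counts.values())
    return {pair: round(min_px + (max_px - min_px) * n / top) for pair, n in counts.items()}


# Invented example: the categories assigned to four bits of data.
bits = [
    {"family", "work"},
    {"family", "work", "conflict"},
    {"conflict", "family"},
    {"work", "family"},
]
for (a, b), width in sorted(link_widths(bits).items(), key=lambda kv: -kv[1]):
    print(f"{a} <-> {b}: {width}px")
```

Here the family/work pair, which co-occurs three times, gets the thickest line. The conflict/work pair, which co-occurs once, gets a thin one.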

NUDIST

Uses flagging and text search to construct a possibly large and highly structured hierarchical database indexing into the documents to be analysed. Retrievals use a complete set of Boolean operators on indexing categories, as well as a set of non-Boolean operators which encourage the generation of new ideas. All retrievals are added back to the indexing system as additional indexing categories. These are then available as the basis of further and more abstract retrievals. To support emerging theory, indexing categories are independent objects: they may be modified, titled, given text comments and shifted to other locations in the indexing structure. A text unit may be of any length, including a single word. Text search can be for a word, a phrase or a pattern of words, or by facesheet variables, such as pulling out all interviews with women. Several windows can be open at once, so that data and codes are on view simultaneously. The representation of node relationships ('tree structures') is not graphical.

The mainframe version supports multi-user, multi-database projects and runs on minis or mainframes with any version of Common LISP. The Mac version needs any Macintosh with a minimum of 2 MB free main memory; the network version runs on 2 MB+ Macs on an AppleTalk network. Hard disk required. PC version for 386s. Mainframe AUS $1500. Macintosh single user AUS $250, network AUS $1000, site AUS $3000.

Distributor: NUDIST Project, ACRI, La Trobe University, Bundoora, Victoria 3083 Australia. Tel (613) 479 2857. During 1993 a network of franchised dealers is being set up.
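
The two ideas at the heart of this entry are a hierarchical index of categories and Boolean retrievals whose results are filed back into that index. The Python sketch below illustrates both. It assumes that each category is a node addressed by a path of numbers, holding a set of (document, text-unit number) pairs. It also assumes that AND, OR and NOT map to set intersection, union and difference. The class and method names and the sample index are hypothetical, and this is not NUDIST's own command language.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    title: str
    units: set = field(default_factory=set)       # (document, text-unit number) pairs
    children: dict = field(default_factory=dict)  # child number -> Node
    memo: str = ""                                # free-text comment on the category


class IndexTree:
    """Hierarchical index of categories, each addressed by a path of numbers."""

    def __init__(self):
        self.root = Node("root")

    def node(self, address):
        current = self.root
        for step in address:
            current = current.children[step]
        return current

    def add(self, address, title, units=()):
        parent = self.node(address[:-1])
        parent.children[address[-1]] = Node(title, set(units))
        return parent.children[address[-1]]

    def retrieve(self, op, a, b, store_at, title):
        """Boolean retrieval on the text units of nodes a and b. The result is
        filed back into the tree as a new node, so later retrievals can use it."""
        x, y = self.node(a).units, self.node(b).units
        operations = {"AND": x & y, "OR": x | y, "NOT": x - y}
        return self.add(store_at, title, operations[op])


# Invented index: a base-data branch, a themes branch and a branch for retrievals.
tree = IndexTree()
tree.add((1,), "Base data")
tree.add((1, 1), "Women", {("int01", 3), ("int01", 4), ("int03", 7)})
tree.add((2,), "Themes")
tree.add((2, 1), "Discrimination", {("int01", 4), ("int02", 2), ("int03", 7)})
tree.add((9,), "Retrievals")

hit = tree.retrieve("AND", (1, 1), (2, 1), store_at=(9, 1), title="Women AND Discrimination")
print(sorted(hit.units))
# The stored result can itself be an operand in a further retrieval.
rest = tree.retrieve("NOT", (2, 1), (9, 1), store_at=(9, 2), title="Discrimination NOT women")
print(sorted(rest.units))
```

Filing each result as an ordinary node is what lets retrievals build on one another, as the entry describes.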


Provides the basic functions needed by qualitative researchers, similar in conception to The Ethnograph. Searches for co-occurring codes on the basis of overlapping and nested segments. Just 4 menus, pure cut and paste. Text segments can be identified flexibly, codes attached and segments retrieved. A good introductory program, especially for postgraduates. IBM PC/XT/AT or 100% compatible with DOS 2.0 or higher; at least 128k RAM. $160

Distributor: Qualitative Research Management (see address above)
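
Searching for co-occurring codes "on the basis of overlapping and nested segments" comes down to comparing line ranges. The Python sketch below illustrates that comparison. It assumes that a segment is a code plus an inclusive range of line numbers. It classifies two segments as nested (one lies wholly inside the other), overlapping (they share some lines) or disjoint. The names `Segment`, `relation` and `co_occurring` and the sample coding are hypothetical.

```python
from typing import NamedTuple, Optional


class Segment(NamedTuple):
    code: str
    start: int  # first line number
    end: int    # last line number (inclusive)


def relation(a: Segment, b: Segment) -> Optional[str]:
    """'nested' if one segment lies inside the other, 'overlap' if they share
    some lines without nesting, None if they are disjoint."""
    if a.end < b.start or b.end < a.start:
        return None
    if (a.start <= b.start and b.end <= a.end) or (b.start <= a.start and a.end <= b.end):
        return "nested"
    return "overlap"


def co_occurring(segments, code_a, code_b):
    """All (segment_a, segment_b, relation) triples where a segment coded
    code_a shares lines with a different segment coded code_b."""
    hits = []
    for a in segments:
        if a.code != code_a:
            continue
        for b in segments:
            if b is not a and b.code == code_b:
                kind = relation(a, b)
                if kind is not None:
                    hits.append((a, b, kind))
    return hits


# Invented coding of one transcript.
segments = [
    Segment("HOUSING", 10, 30),
    Segment("DEBT", 15, 20),   # inside the HOUSING segment
    Segment("DEBT", 28, 35),   # straddles its end
    Segment("DEBT", 50, 55),   # unrelated
]
for a, b, kind in co_occurring(segments, "HOUSING", "DEBT"):
    print(f"{a.code} {a.start}-{a.end} / {b.code} {b.start}-{b.end}: {kind}")
```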

QUALOG

Emphasises Dewey's 'induction process'. Many relationships are built in. User formulates queries about codes, such as an if-then query. Replies are confirming and disconfirming instances. Unappealing interface. Uses LogLisp.

Mainframe only. DEC VAX/VMS or IBM CMS/VM.

$700 site licence

Distributor: Ernest Sibert and Anne Shelly, Syracuse University, 4-116 CST, School of Computer and Information Science, Syracuse, NY 13244-4100 USA.
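
The if-then query described in this entry can be illustrated simply. Take every case that has the 'if' code. Cases that also have the 'then' code confirm the rule, and cases that lack it disconfirm it. The Python sketch below assumes cases are represented as sets of codes. It is written in Python rather than in the LogLisp logic-programming environment QUALOG uses. The function `if_then` and the sample cases are hypothetical.

```python
def if_then(cases, antecedent, consequent):
    """Test 'IF antecedent THEN consequent' across coded cases. Cases lacking
    the antecedent say nothing about the rule and are left out."""
    confirming, disconfirming = [], []
    for name, codes in cases.items():
        if antecedent in codes:
            if consequent in codes:
                confirming.append(name)
            else:
                disconfirming.append(name)
    return confirming, disconfirming


# Invented cases, each with the set of codes applied to it.
cases = {
    "interview1": {"SHIFTWORK", "STRESS"},
    "interview2": {"SHIFTWORK"},
    "interview3": {"STRESS"},
    "interview4": {"SHIFTWORK", "STRESS", "MENTORING"},
}
confirming, disconfirming = if_then(cases, "SHIFTWORK", "STRESS")
print("confirming:", confirming)        # ['interview1', 'interview4']
print("disconfirming:", disconfirming)  # ['interview2']
```

Reporting the disconfirming cases by name, rather than just a count, matches the inductive emphasis of the entry: the analyst goes back to those cases to revise the rule.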

Textbase Alpha

Permits coding of data which have an internal structure, as well as narrative texts of any kind. Searching and assembling of coded segments are supported, along with frequency counts and data matrix output. A completely new version ('Textbase Beta') is now at an advanced stage; its new features include coding portions of lines and mouse support. Code memoing similar to The Ethnograph version 4. IBM XT or fully compatible; 640k RAM and DOS 2.00 or higher. $160

Distributor: Although this program was developed in Denmark it is most straightforward to order from Qualitative Research Management (address above).
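
Frequency counts and data matrix output amount to building a case-by-code table. The Python sketch below builds such a table from (case, code) pairs, one pair per coded segment, and prints it tab-delimited. Tab-delimited text is an assumed export format, chosen because a statistics package could read it; Textbase Alpha's actual output format is not described in the entry. The function `code_matrix` and the sample data are hypothetical.

```python
from collections import Counter


def code_matrix(coded_segments):
    """Case-by-code matrix of segment frequencies from (case, code) pairs."""
    counts = Counter(coded_segments)
    cases = sorted({case for case, _ in coded_segments})
    codes = sorted({code for _, code in coded_segments})
    rows = [[counts[(case, code)] for code in codes] for case in cases]
    return cases, codes, rows


# Invented coding: one (case, code) pair per coded segment.
coded = [
    ("int01", "WORK"), ("int01", "WORK"), ("int01", "FAMILY"),
    ("int02", "FAMILY"), ("int02", "HEALTH"),
]
cases, codes, rows = code_matrix(coded)
print("case", *codes, sep="\t")
for case, row in zip(cases, rows):
    print(case, *row, sep="\t")
```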


Enhanced text retrieval software which allows coding. Minimal file preparation, instant retrieval of target phrases (especially useful for structured interviews). Data can be on screen together with your comments and an index of comments. Code categories have to be present in the data, though you can edit in codewords. Conditional searches on a Boolean basis. Instant index (occurrences in the corpus, with filename and page number). Link files (conceptual categories). Comment window attaches to datum. A good introductory program: simple, basic and fast. Price unknown.

Distributor: Qualitative Research Management (address above).
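
An "instant index" that reports occurrences with filename and page number is an inverted index. A conditional Boolean AND search is then an intersection of the page lists. The Python sketch below illustrates both. It assumes each document is supplied as a list of page texts. The names `build_index` and `boolean_and` and the sample documents are hypothetical and not this program's interface.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each word to every (filename, page) where it occurs.
    documents: {filename: [page1_text, page2_text, ...]}"""
    index = defaultdict(list)
    for filename, pages in documents.items():
        for page_no, text in enumerate(pages, start=1):
            for word in re.findall(r"[a-z']+", text.lower()):
                index[word].append((filename, page_no))
    return index


def boolean_and(index, *words):
    """Pages containing every one of the given words."""
    sets = [set(index.get(w.lower(), [])) for w in words]
    return sorted(set.intersection(*sets)) if sets else []


# Invented two-file corpus.
docs = {
    "int01.txt": ["The police came twice.", "Nobody from the council came."],
    "int02.txt": ["The council sent a letter."],
}
idx = build_index(docs)
print(idx["council"])                        # every occurrence, with file and page
print(boolean_and(idx, "council", "came"))   # pages mentioning both words
```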

Atlas-ti

Sophisticated package with an impressive interface. As in NUDIST, the emphasis is on inter-code relationships and theory-building rather than straight code-and-retrieve. SPSS interface and graphical user interface. Two levels: a text level (segmenting, coding) and a theory-building level (manipulating and specifying code relationships). Networking tool, query browser. You get a series of statements expressing textual relationships, like 'this statement justified that', and inter-code relations, like AI/formal logic relations, followed by your commentary on the text. Unlimited codes for text segments. Requires an IBM-compatible PC-AT based on an 80286 or 80386SX (or better) running faster than 20MHz, with 4MB RAM and a VGA graphics adapter and monitor. Demo only at present.

Distributor: No commercial distribution as yet. Current details from the developer, Dr Thomas Muhr, Technische Universität Berlin, Projekt ATLAS, Hardenbergstr. 28, D-1000 Berlin 12.
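
The theory-building level described above links codes through named relations. The Python sketch below illustrates a network of codes joined by labelled links, with a query that follows one relation type through the network. It assumes that a relation such as 'is cause of' may be treated as transitive, which is an analytic choice rather than something the entry states. The class `CodeNetwork`, its methods, the relation names other than the text's 'justified' example, and the sample codes are all hypothetical.

```python
from collections import defaultdict


class CodeNetwork:
    """Codes linked by named relations, e.g. 'is cause of' or 'justifies'."""

    def __init__(self):
        self.links = defaultdict(set)  # (source code, relation) -> target codes

    def relate(self, source, relation, target):
        self.links[(source, relation)].add(target)

    def reachable(self, start, relation):
        """Codes reachable from start by following one relation repeatedly,
        i.e. treating that relation as transitive (an assumption)."""
        seen, stack = set(), [start]
        while stack:
            code = stack.pop()
            for target in self.links.get((code, relation), ()):
                if target not in seen:
                    seen.add(target)
                    stack.append(target)
        return seen


# Invented network.
net = CodeNetwork()
net.relate("job loss", "is cause of", "debt")
net.relate("debt", "is cause of", "stress")
net.relate("family support", "contradicts", "stress")
print(sorted(net.reachable("job loss", "is cause of")))  # ['debt', 'stress']
```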

The users' voice

As with other software, there is a need for systematic evaluation of program capabilities, and some progress is being made. At University College Cardiff, a project headed by Paul Atkinson is evaluating selected software, including The Ethnograph. In the US, the group at Boston College associated with the Mac program HyperRESEARCH is beginning to carry out detailed and explicit assessments of software. The ins and outs of CAQDAS, and of particular packages, are a recurring topic on the 'QUALRS-L' e-mail discussion group (details below). Finally, Ray Lee and I are running the focus groups mentioned above, which are looking at the experiences of users of several selected packages. Preliminary findings were presented at the Bremen conference (details below) in October 1992, and the fieldwork will continue over the next 12 months.

Keeping in touch

If you are looking for an introduction to CAQDAS, Ray Lee has run short courses in the autumn for several years. Ray can be contacted at the Department of Social Policy and Social Science, Royal Holloway, Egham Hill, Egham TW20 0EX (tel. 0784 443152).

If you want to participate in the 'QUALRS-L' electronic discussion group, you can join by sending an e-mail to the list server containing the line: Subscribe QUALRS-L <your name, institution>. Note that QUALRS-L covers all aspects of qualitative research, not just CAQDAS.

Surrey's involvement in all this largely began when Nigel Gilbert, Nigel Fielding and Ray Lee convened a conference on the subject in 1989. The Surrey conference inaugurated a series, with the second at the University of Colorado in Breckenridge and the third at the University of Bremen, Germany. The fourth conference will take place in 1994, probably in June or July, at Syracuse University, New York. Details will appear on QUALRS-L closer to the time. Alternatively, contact the organiser, Prof. Anne Shelly (address given in the QUALOG entry above), or Nigel Fielding at Surrey.

Further reading

N. Fielding and R. Lee (eds) (1991) Using computers in qualitative research, Sage.

K. Hinze (1987) 'Computing in sociology', Social Science Computer Review, 5: 439-51.

R. Lee and N. Fielding (1991) 'Options, problems and potential', in Fielding and Lee (1991).

R. Tesch (1991) 'Software for qualitative researchers: analysis needs and program capabilities', in Fielding and Lee (1991).

