Issue 7 | Winter 1995
Social Research Update is published quarterly by the Department of Sociology, University of Surrey, Guildford GU2 7XH, England. Subscriptions for the hardcopy version are free to researchers with addresses in the UK. Apply by email to sru@soc.surrey.ac.uk.
Correspondence analysis
Dianne Phillips is a lecturer in Sociology at the Manchester Metropolitan University. She is responsible for the work of the Social Information Technology Unit which provides research support and training in the use of computer applications for social research. Her sociological interests and publications have ranged from cognitivism to youth unemployment, but her present commitment is to encourage and support the use of exploratory data analysis by professional groups involved in monitoring and evaluation. She is also the chair of ASSESS, the SPSS user group.
Correspondence analysis is primarily a technique for representing the rows and columns of a two-way contingency table in a joint plot.
It is by no means a 'new' technique for data analysis (Hill 1974). Proponents trace its development from the mid 1930s, for example in the work of Hirschfield (1935). One source of confusion is that correspondence analysis is equivalent to a number of techniques which have appeared in different contexts under different labels. Correspondence factor analysis, principal components analysis of qualitative data and dual scaling are but three of a long list of alternative names presented by Nishisato (1980). Fielding (1992) provides a useful review of the relationships between the different variants.
Nevertheless, it remains the case that correspondence analysis has been relatively little used in social science research in the United Kingdom and the USA. This is surprising, considering how popular the technique is elsewhere. In France, for example, it was originally developed in the 1960s to provide a mathematical analysis of contingency data sets in linguistics. The label 'correspondence analysis' is a translation of the French 'analyse des correspondances', a term associated with the work of Benzecri (1992). It is nowadays commonplace for the French press to include correspondence maps in their articles on topics such as voting behaviour. But interest in the use of the technique among English and American sociologists seems to have remained low until the publication of Greenacre's text (1984) and the easier availability of appropriate computer software.
There are several reasons why a sociologist could be attracted to correspondence analysis.
Despite these reservations, the approach does merit more attention in sociological research. Only a brief sketch of the mechanics and main concepts of correspondence analysis is offered here; at the end there is a selection of references for those who wish to take it further.
The basic model of correspondence analysis
Correspondence analysis seeks to represent the interrelationships of
categories of row and column variables on a two dimensional map. It can be
thought of as trying to plot a cloud of data points (the cloud having
height, width, thickness) on a single plane to give a reasonable summary
of the relationships and variation within them.
To illustrate this, consider Table 1, a typical two-dimensional contingency table. The data are from research into lifestyle and cultural consumption in a UK city (Featherstone et al., 1994). One aspect of the study looks at the leisure activities of people living in the new inner city developments. Included in the questionnaire were questions about the use of local facilities, from pubs to art galleries, knowledge of and preferences in music, and involvement in political issues. In Table 1, scores on knowledge of music (m1 = low, m4 = high) have been crosstabulated with scores on theatre visits (t1 to t4).
The first step in a correspondence analysis is to examine the profiles (the set of relative frequencies). This concept is basic to correspondence analysis. Table 2 gives the row profiles for the data in Table 1.
Note that in Table 2 the final row of the row profiles and the final column of the column profiles are labelled 'Average'. These are the proportions of all respondents falling in each row (or column). For example, the figure for row t1 is 85 of 210 (i.e. .405). These values are used as weights ('masses') in the calculation of weighted distances. Mass affects the centroid, the position of the 'middle' of the cloud of points. Point t1, for example, has a large mass (.405) relative to the others and will pull the centroid towards its location.
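The profile, mass and centroid calculations described above can be sketched in a few lines of Python with NumPy. This is an illustrative sketch only. The counts are hypothetical: they were chosen so that row t1 totals 85 and the grand total is 210, as quoted in the text, but they are not the data of Table 1. The variable names are also assumptions made for the example.

```python
import numpy as np

# HYPOTHETICAL counts (rows t1..t4, columns m1..m4); NOT the data of Table 1.
# Only the t1 row total (85) and the grand total (210) come from the text.
counts = np.array([
    [40, 25, 15,  5],
    [15, 20, 15, 10],
    [ 5, 10, 15, 10],
    [ 2,  3,  8, 12],
], dtype=float)

n = counts.sum()                               # 210 respondents
row_totals = counts.sum(axis=1)
row_profiles = counts / row_totals[:, None]    # each row profile sums to 1
row_masses = row_totals / n                    # mass of t1 is 85/210
average_profile = counts.sum(axis=0) / n       # the 'Average' row profile

# The centroid of the row profiles is their mass-weighted mean,
# which coincides with the average row profile.
centroid = row_masses @ row_profiles

# Squared chi-square (weighted) distance of each row profile from the centroid.
chi2_dist_sq = ((row_profiles - centroid) ** 2 / centroid).sum(axis=1)

# Total inertia: mass-weighted sum of the squared distances,
# which equals the chi-square statistic divided by n.
total_inertia = (row_masses * chi2_dist_sq).sum()

print(np.round(row_masses, 3))
print(round(total_inertia, 4))
```

A row with a large mass, such as t1 here, carries more weight both in locating the centroid and in the total inertia.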
Figure 1 shows the correspondence map produced using the ANACOR procedure in the SPSS for Windows categories module.
Figure 1. A correspondence map
The chart suggests that low scores on knowledge of music and theatre visits (t1, m1) are closely associated. There is a clear left-to-right dimension running from low to high on both variables.
How satisfactory is this as a representation? One answer is to examine the inertia of each dimension. This is given in Table 4.
The column headed 'proportion explained' shows that the first dimension explains 93 per cent of the total inertia, a measure of the spread of the points. The first two dimensions together explain 98 per cent and therefore a two dimensional solution appears satisfactory.
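One common way to obtain the inertia of each dimension is a singular value decomposition of the standardised residuals of the table. The sketch below assumes that formulation. It uses the same hypothetical counts as before (not the published Table 1), so its output will not reproduce the 93 and 98 per cent figures. The principal-coordinate scaling shown is only one of several normalisations and is not necessarily the one ANACOR applies.

```python
import numpy as np

# HYPOTHETICAL 4 x 4 table of counts; NOT the published Table 1.
counts = np.array([
    [40, 25, 15,  5],
    [15, 20, 15, 10],
    [ 5, 10, 15, 10],
    [ 2,  3,  8, 12],
], dtype=float)

P = counts / counts.sum()        # correspondence matrix (relative frequencies)
r = P.sum(axis=1)                # row masses
c = P.sum(axis=0)                # column masses

# Standardised residuals from the independence model r * c.
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

# Squared singular values are the principal inertias of the dimensions.
# A 4 x 4 table has at most 3 non-trivial dimensions; the last is ~0.
U, sv, Vt = np.linalg.svd(S, full_matrices=False)
inertia = sv ** 2
proportion = inertia / inertia.sum()
cumulative = np.cumsum(proportion)

# Principal coordinates for a two-dimensional map (an assumed normalisation).
row_coords = (U[:, :2] * sv[:2]) / np.sqrt(r)[:, None]
col_coords = (Vt.T[:, :2] * sv[:2]) / np.sqrt(c)[:, None]

for k, (lam, p, cum) in enumerate(zip(inertia, proportion, cumulative), start=1):
    print(f"dimension {k}: inertia {lam:.4f}, "
          f"proportion {p:.3f}, cumulative {cum:.3f}")
```

The 'proportion' and 'cumulative' values play the role of the 'proportion explained' column: if the first two cumulative proportions are close to 1, a two-dimensional map loses little.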
An examination of the contribution to the inertia of each row and column point helps in the interpretation of the dimensions:
Table 5 shows that t4 (.526) contributes substantially to the first dimension. m4 (.476) contributes substantially to the second.
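The contribution of a point to a dimension can be computed as its mass times its squared principal coordinate, divided by the inertia of that dimension, so that the contributions on each dimension sum to 1. The following sketch assumes that definition and again uses hypothetical counts and labels, so the values will differ from those in Table 5.

```python
import numpy as np

# HYPOTHETICAL counts (rows t1..t4, columns m1..m4); NOT the published data.
counts = np.array([
    [40, 25, 15,  5],
    [15, 20, 15, 10],
    [ 5, 10, 15, 10],
    [ 2,  3,  8, 12],
], dtype=float)
row_labels = ["t1", "t2", "t3", "t4"]
col_labels = ["m1", "m2", "m3", "m4"]

P = counts / counts.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sv, Vt = np.linalg.svd(S, full_matrices=False)

inertia = sv[:2] ** 2                                   # first two dimensions
row_coords = (U[:, :2] * sv[:2]) / np.sqrt(r)[:, None]
col_coords = (Vt.T[:, :2] * sv[:2]) / np.sqrt(c)[:, None]

# Contribution of each point to each dimension:
# mass * (principal coordinate)^2 / inertia of the dimension.
row_contrib = r[:, None] * row_coords ** 2 / inertia
col_contrib = c[:, None] * col_coords ** 2 / inertia

for label, contrib in zip(row_labels + col_labels,
                          np.vstack([row_contrib, col_contrib])):
    print(label, np.round(contrib, 3))
```

Points with the largest contributions on a dimension are the ones to look at first when naming that dimension.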
Multiple correspondence analysis
In the 'Lifestyle and Cultural Consumption' study, we used correspondence
maps to gain initial answers to questions about the general pattern of
cultural consumption of the new residents of the Manchester City Centre.
The research intends to contribute to discussion about 'post-modern'
collapse of cultural barriers. We carried out a multivariate homogeneity
analysis (using the HOMALS procedure in SPSS) to map the cultural
consumption of the city centre residents. Relationships between seven
variables were explored: knowledge of music, films, theatre and art
gallery visits, progressive political activity, fashion and awareness of
discos and the pop scene.
Figure 2. A correspondence map of category quantifications
This map (Figure 2) places categories that are associated close together. The first dimension differentiates most clearly along the categories of film knowledge, the second on the music scores. Taking the left hand side of the plot, a low knowledge of film directors (fl1) is closely associated with low scores on visits to galleries (g1), theatre (t1) and music (m1). On the right hand side, a high score on film knowledge (fl3) is associated with high scores on gallery and theatre visits (g4, t5). High scores on politics are also associated with high scores on film knowledge.
Although the second dimension discriminates most obviously on knowledge of music, it also discriminates between high scores on all dimensions and middle and low scores. (A score of 1 on politics indicates a level of active opposition to 'progression' that does not characterise a score of 2 or 3).
Tentatively, given a three dimensional solution, we proposed that the dimensions are
Figure 3. Discrimination measures
The chart of discrimination measures, Figure 3, suggests that the first dimension in the multivariate case is related most strongly to the knowledge of film directors, the second to music. Fashion is discriminating poorly in the first two dimensions. Theatre attendance, gallery and disco visits and politics are in between.
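To make the idea of a discrimination measure concrete, the sketch below treats homogeneity analysis as a simple correspondence analysis of the indicator (dummy) matrix, and computes each variable's discrimination measure as the between-category variance of the object scores on a dimension. This is an assumption made for illustration: it is not the algorithm HOMALS itself runs, and the respondents, variable names and noise levels are all invented.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# HYPOTHETICAL respondents: an invented latent 'engagement' score drives three
# made-up variables with different amounts of noise. Not the study data.
latent = rng.normal(size=n)

def categorise(noise_sd, n_cats):
    """Cut a noisy copy of the latent score into n_cats roughly equal groups."""
    noisy = latent + rng.normal(scale=noise_sd, size=n)
    cuts = np.quantile(noisy, np.linspace(0, 1, n_cats + 1)[1:-1])
    return np.digitize(noisy, cuts)            # integer codes 0 .. n_cats-1

data = {
    "film": categorise(0.5, 3),
    "music": categorise(0.7, 4),
    "fashion": categorise(5.0, 3),
}

# Indicator matrix: one column per category of every variable.
G = np.hstack([np.eye(codes.max() + 1)[codes] for codes in data.values()])

# Simple correspondence analysis of the indicator matrix.
P = G / G.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sv, Vt = np.linalg.svd(S, full_matrices=False)
eigenvalues = sv[:2] ** 2

# Object scores on the first two dimensions (mean 0, variance 1).
scores = U[:, :2] / np.sqrt(r)[:, None]

def discrimination(codes, x):
    """Between-category variance of unit-variance scores x for one variable."""
    return sum((codes == k).mean() * x[codes == k].mean() ** 2
               for k in np.unique(codes))

disc = {name: [discrimination(codes, scores[:, s]) for s in range(2)]
        for name, codes in data.items()}

for name, values in disc.items():
    print(name, np.round(values, 3))
```

Under this formulation the eigenvalue of a dimension is the average of the discrimination measures across variables. A variable that is only weakly related to the others, as the noisiest invented variable is here, would be expected to show low values on the leading dimensions.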
In terms of our original questions, the maps, taken together, suggest that the majority of the study sample are culturally active, with progressive politics working like other cultural dimensions. They are a large "centre" with substantial usage, but not particularly heavy on what Bourdieu would call cultural capital, or exploratory cultural practice. There is then in each cultural dimension a much smaller group of serious enthusiasts. This group includes some who are scoring high on all scales except fashion and disco usage.
There is, in addition, a cluster of scores low on all dimensions. We would interpret this in relation to the 25 per cent of the sample population who make little or no use of the cultural facilities and score low on knowledge and "taste" tests. The only obvious anomaly is that a high score on Disco rates low on commitment and activism. However, this question reflects awareness and recognition as much as actual attendance. The positioning of high fashion scores close to low scores on the other variables appears to suggest a separateness of style activism from other activisms.
Very tentatively, the maps fit a picture of a substantial middle of fairly high usage and cross-over, without enthusiast commitment, with two separate groupings of low awareness and use, and one of high awareness, use and active commitment. This could suggest a continuity of old boundaries rather than a post-modern collapse, but the relative "size" of the middle grouping suggests that this would be a very partial explanation. In general, the cultural picture is strikingly unlike Bourdieu's frame of twenty years ago.
The lifestyle study has been used to demonstrate the use of correspondence mapping as an exploratory device in a sociological context. As all 'good' exploratory devices should, it has promoted new suggestions and ideas for the researchers to follow. It certainly provided a picture of relationships which might well have been missed in a conventional laborious study of numerous crosstabulations.
References
Benzecri, J.P. (1992) Correspondence Analysis Handbook. New York: Marcel Dekker.
Bourdieu, P. (1979) Distinction: A Social Critique of the Judgement of Taste. Routledge.
Featherstone, M., O'Conner, J., Phillips, D. and Wynne, D. (1994) Lifestyle and Cultural Consumption in the City. ESRC End of Project Report Ref: R000-23-3075.
Fielding, A. (1992) Axiomatic Approaches to Scoring Ordered Classifications. University of Birmingham, Department of Economics, Discussion Paper 92-06.
Greenacre, Michael J. (1993) Correspondence Analysis in Practice. London: Academic Press.
Hill, M.O. (1974) Correspondence analysis: a neglected multivariate method. Applied Statistics, 23, 340-54.
Hirschfield, H.O. (1935) A connection between correlation and contingency. Proc. Camb. Phil. Soc., 31, 520-4.
Nishisato, S. (1980) Analysis of Categorical Data: Dual Scaling and its Applications. Toronto: University of Toronto Press.
Greenacre, Michael J. (1984) Theory and Applications of Correspondence Analysis. London: Academic Press.
The first text in English to describe 'analyse des correspondances'. Greenacre is also the author of the SIMCA software for simple correspondence analysis.
Jambu, Michel (1991) Exploratory and Multivariate Data Analysis. Boston: Academic Press.
Like Greenacre, Jambu was mentored by Benzecri. This text has two chapters specifically on correspondence analysis and covers many other tools and techniques for exploring multivariate data.
Van de Geer, John P. (1993) Multivariate Analysis of Categorical Data: Theory. Newbury Park: Sage Publications. Advanced Quantitative Techniques in the Social Sciences Series Vol 2.
Van de Geer, John P. (1993) Multivariate Analysis of Categorical Data: Applications. Newbury Park: Sage Publications. Advanced Quantitative Techniques in the Social Sciences Series Vol 3.
These two books focus on the GIFI approach to categorical analysis developed by the group of statisticians in the Department of Data Theory at the University of Leiden. The group was also responsible for the development of the ANACOR and HOMALS procedures in SPSS categories. The applications volume has many useful detailed examples.
Weller, Susan C. and A. Kimball Romney (1990) Metric Scaling. Newbury Park: Sage Publications. Series: Quantitative Applications in the Social Sciences no 75.
This text explores, clearly and concisely, three approaches to metric scaling (principal components, multidimensional preference scaling and correspondence analysis) and shows their close relationship.
Social Research Update is published by:
Department of Sociology
Telephone: +44 (0) 1 483 300800
Fax: +44 (0) 1 483 689551
Edited by Nigel Gilbert.
Winter 1995 © University of Surrey
Permission is granted to reproduce this issue of Social Research Update provided that no charge is made other than for the cost of reproduction and this panel acknowledging copyright is included with all copies.