Corpus Analysis as a Methodology for Studying the Conceptual Foundations of Sustainability and Healthcare

The SHE Corpus is a new resource developed by the Centre for Sustainable Healthcare Education to enable students and researchers to investigate the key concepts that underpin the practice and ethos of modern medicine as well as the discourses on sustainability in the field of healthcare. 

Mosaic abortion, SHE Corpus.

Professor Mona Baker is a world-leading scholar in translation studies and co-coordinator of the Genealogies of Knowledge Research Network. She is currently leading the development of the SHE Corpus at the Centre for Sustainable Healthcare Education (SHE) to support a corpus-based research agenda for the fields of health and sustainability studies, in collaboration with Professor Eivind Engebretsen and other colleagues.

We spoke to her about her experience in corpus building and the rationale for building the SHE Corpus.

What is a corpus?

– The word ‘corpus’ can be a bit confusing for those not familiar with its more recent, technical use. The original Latin means ‘body’ in the concrete, material sense. However, the word later acquired the meaning of a (physical) collection of texts that constitute the data for a particular piece of research. Today, ‘corpus’ is increasingly used to refer to a collection of texts (written, spoken or multimodal) that are held in an electronic format and are hence amenable to automatic or semi-automatic analysis using a wide range of software applications, says Professor Baker.

– Corpora in this more modern sense are not random collections of text: They are carefully designed according to clear selection criteria in order to support specific types of research and educational programmes. 

The human intuition on language

The linguist Professor John McHardy Sinclair observed that when we look at language in large quantities, our assumptions about it can turn out to be wrong. Professor Baker elaborates on Sinclair’s statement:

– What he means is that introspection is not a reliable method of identifying patterns in language. We use language intuitively but if asked to reflect on it we often fail to identify the most interesting patterns and frequently come up with unsubstantiated assumptions. As he puts it in Corpus, Concordance, Collocation (Sinclair 1991:4), “the contrast exposed [by the availability of large corpora] between the impressions of language detail noted by people, and the evidence compiled objectively from texts is huge and systematic. It leads one to suppose that human intuition about language is … not at all a good guide to what actually happens when the same people actually use the language”.

One of the many examples Sinclair goes on to discuss is the starkly different patterning of ‘eye’ and ‘eyes’. Most people would assume that the singular and plural forms of the same word pattern in much the same way. But drawing on the COBUILD corpus, Sinclair demonstrated that only the plural form attracts collocates (words that habitually occur nearby) relating to colour: brown eyes, blue eyes, etc. The singular form features predominantly in metaphorical expressions relating to visualizing something (with the naked eye; keep an eye out for; in the public eye; turn a blind eye to, etc.).
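
The kind of comparison Sinclair describes can be sketched in a few lines of Python. This is only a minimal illustration, not the method used for COBUILD or the SHE Corpus: the sentences below are invented for the example, and the function name `collocates` and the two-word window are assumptions chosen for simplicity.

```python
import re
from collections import Counter

# Invented toy sentences for illustration only;
# they are NOT drawn from COBUILD or the SHE Corpus.
TOY_CORPUS = [
    "She has brown eyes and dark hair.",
    "The baby looked up with big blue eyes.",
    "Keep an eye out for the postman.",
    "The minister is rarely out of the public eye.",
    "It is too small to see with the naked eye.",
    "They turned a blind eye to the problem.",
]


def collocates(sentences, node, window=2):
    """Count words found within `window` tokens of the exact word form `node`.

    Hypothetical helper: singular and plural are treated as different
    forms on purpose, so 'eye' and 'eyes' are counted separately.
    """
    counts = Counter()
    for sentence in sentences:
        # Lowercase and keep alphabetic tokens only (a crude tokenizer).
        tokens = re.findall(r"[a-z]+", sentence.lower())
        for i, token in enumerate(tokens):
            if token == node:
                left = tokens[max(0, i - window):i]
                right = tokens[i + 1:i + 1 + window]
                counts.update(left + right)
    return counts


eye_collocates = collocates(TOY_CORPUS, "eye")
eyes_collocates = collocates(TOY_CORPUS, "eyes")

print("eye :", eye_collocates.most_common(5))
print("eyes:", eyes_collocates.most_common(5))
```

In this toy sample the colour words appear only next to the plural form, while the singular form sits beside words such as "naked", "public" and "blind". A real study would run the same kind of count over millions of words and apply statistical measures of association rather than raw frequencies.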

An innovative approach in education

The Centre for Sustainable Healthcare Education (SHE) has pioneered new approaches to teaching sustainable health. By combining corpus analysis with critical discourse analysis, it is possible to engage students in critical analysis of basic concepts in medicine. The purpose is to outline a new methodology that allows students to investigate the basic concepts that underpin the practice and ethos of modern medicine.

– There is a great deal of literature on the use of corpora in language teaching. But because corpus research has so far been largely confined to the field of linguistics, there is as yet little or no literature on using corpora in other educational contexts. Our ongoing work on developing the SHE Corpus is laying the foundation for a new, innovative approach to deploying corpora in the context of sustainable healthcare which we believe will also be applicable in other scientific and disciplinary domains. At the heart of this approach is an emphasis on critical engagement with key concepts in the field of sustainable healthcare – rather than the traditional emphasis on language patterning per se. 

The method encourages students to reflect critically

– Unlike other existing corpora, the texts we include in the SHE Corpus are selected because they are rich in particular concepts that in a sense define the field of sustainable healthcare. They are drawn from a diverse range of genres, both mainstream and non-mainstream, including medical journal articles, policy documents, online magazines, websites of grassroots organizations such as the Abortion Rights Campaign, and internet blogs, explains Professor Baker.

As such, the corpus enables us to design socially conscious educational programmes that encourage students to scrutinize the global aspirations that underlie the sustainable development model in global health. Specifically, it allows them to revisit the foundational concepts that support the agenda (such as sustainability, well-being, equity, partnership, resilience, and empowerment), and to reflect critically on the instability of their meanings and the diverse and occasionally contradictory ideological messages they communicate.

The SHE Corpus web interface can be accessed here.
The User Manual that accompanies it is available here.

Studies at the University of Oslo using the SHE Corpus method

FHE4350 – Politics of Sustainability in Public Health – Data-driven Critical Conceptual Analysis
The course draws attention to the power of key concepts in the sustainability agenda and, more specifically, to how such concepts have come to accommodate various and sometimes conflicting ideological messages. This course utilizes a unique datathon model that encourages students to engage in critical reflection on sustainability concepts and ideologies. The course is part of the Master in Public Health Science and Epidemiology.

Education for Sustainable Health (honours certificate)
This Honours certificate offers a unique opportunity to acquire knowledge about some of our time's greatest challenges to health and how to address them. Students will also develop the skills needed to transfer this knowledge to health professionals in different educational settings.

Debating Democracy – courses coming in 2024
The courses aim to equip students with digital literacy skills needed for meaningful participation in democratic society and hone their skills in critical reflection and complexity thinking, focusing on key concepts underpinning shared civic values in the often competing discourses on democracy and sustainability. Findings drawn from interdisciplinary, empirical analysis of very large corpora will be incorporated into a critical, culturally sensitive, student-driven educational model. 

Further reading on the Oslo Medical Corpus

Website of SHE Corpus


Buts, J., Baker, M., Luz, S., Engebretsen, E. Epistemologies of evidence-based medicine: a plea for corpus-based conceptual research in the medical humanities. Medicine, Health Care and Philosophy. 2021.

Tags: SHE Corpus, John Sinclair, Mona Baker, critical thinking, Medical Humanities, Education for Sustainable Health, Honours Certificate, debating democracy, MEDRA
By Trine Kleven
Published Oct. 23, 2023 2:43 PM - Last modified Apr. 19, 2024 10:36 AM