How to use the Bambara Reference Corpus

You want to use the Bambara Reference Corpus but don’t know where to start because you are lost amidst its ugly and confusing interface that only linguists and computer programs can stomach? You’re in luck. Here’s a quick overview followed by recommended settings for getting started...

Which corpus to search?

The Bambara Reference Corpus actually includes a number of different corpora, some of which are more useful for a lay user. One must select which one whenever one performs a search. Here are the most important ones with a general comment about their use:

The three main corpus types

  • Corbama-brut (a partially non-disambiguated corpus)

    A so-called raw corpus which includes ALL of the texts in the overall collection regardless of whether they have been disambiguated by part of speech etc.

    Searching in this corpus is akin to using the basic search function within a text editor such as Word. A search for “bon” will reveal any word-form that includes those letters: bón ‘house’, bòn ‘fat’, but also bònya ‘respect’, etc.

  • Corbama-net-non-tonal (a tonally non-disambiguated corpus)

    A corpus of disambiguated texts without tones marked.

    Searching in this corpus allows one to more quickly know whether a search for a word without tone marked is revealing the lexeme of interest or not. For example, one can search for “bon” and then scroll through to see whether the hits are for bòn ‘fat’ or bón ‘house’.

  • Corbama-net-tonal (a disambiguated corpus)

    A corpus of disambiguated texts with tones marked.

    This allows savvy users to search for specific words in the most refined way by using tone. One can from the beginning choose to search for “bón” (‘house’) instead of “bòn” (‘fat’) without needing to read through or filter the hits etc.
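The difference between the three search behaviours can be sketched in a few lines of Python. This is only an analogy built on the “bon” example above, not how the corpus engine works internally. The function names, the set of tone marks, and the whole-word matching used for the two disambiguated corpora are all assumptions made for illustration.

```python
import unicodedata

# Combining marks treated as tone marks in this sketch (an assumption):
# grave, acute, and caron.
TONE_MARKS = {"\u0300", "\u0301", "\u030c"}

# Word forms taken from the examples above.
WORDS = ["bón", "bòn", "bònya"]


def strip_tones(text: str) -> str:
    """Return text with tone diacritics removed and base letters kept."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


def raw_search(query, words):
    """Corbama-brut analogy: any form containing the letters, tones ignored."""
    q = strip_tones(query)
    return [w for w in words if q in strip_tones(w)]


def non_tonal_search(query, words):
    """Corbama-net-non-tonal analogy: whole-word match, tones ignored."""
    q = strip_tones(query)
    return [w for w in words if strip_tones(w) == q]


def tonal_search(query, words):
    """Corbama-net-tonal analogy: whole-word match, tones must agree."""
    q = unicodedata.normalize("NFC", query)
    return [w for w in words if unicodedata.normalize("NFC", w) == q]


if __name__ == "__main__":
    print(raw_search("bon", WORDS))        # all three forms
    print(non_tonal_search("bon", WORDS))  # the two "bon" words, either tone
    print(tonal_search("bón", WORDS))      # only the high-tone form
```

In this toy version, the raw search returns all three forms, the non-tonal search returns both “bon” words and leaves you to pick the right one, and the tonal search returns only the form whose tone you typed.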

Recommended settings

OK, you know which Bambara corpus is best for you, but how can you quickly get back example sentences in a format that looks halfway decent to a human? Here are some good options for new users to start with, before you get the hang of the interface and the weird language of corpus linguistics (query, KWIC, tokens, blah-blah, God help us, etc.):

General Viewing by Sentence

  1. Perform a search.

  2. Click on the indented “Sentence” under “View Options”

Settings and things to check as outlined in steps 3-7

3. Click on “View Options” and select the following:

Attributes: word, tag, gloss

Structures: <doc>

References: doc.text_title

4. Check the box for “References up”

5. Select the button for “KWIC tokens only”

6. Make “Page size” equal to 10

7. Check the following:

“Allow multiple lines selection”

“Checkbox for selecting lines”

“Shorten Long Reference”

8. Click “Save & change options” and now perform searches and look happily at your results.

You should get a nice, readable list of sentences in which the word you are interested in is highlighted and shown with its part of speech and its gloss in French, along with the title of the document each sentence came from.

Steps to Export Examples for Formatted Interlinear Gloss Tables

  1. Perform a search.

  2. Make sure that you are in “Sentence” and not “KWIC” view mode (underneath “View Options”)

  3. Select “View Options”

  4. Under “Attributes” select the following: Word, Gloss

  5. Under “Display attributes” select “For each token”

  6. Click “Save & change options”

  7. Click “Save” on the left. This produces a basic text file with the words on one line and their respective glosses on the next, aligned with tab characters.

  8. Uncheck “Include heading” and “Number lines”

  9. Copy the sentence of interest. Paste it into Word. Use “Convert Text to Table”. Edit as needed.
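If you would rather not go through Word, the same word-over-gloss pairing can be sketched in Python. The sketch assumes the layout described in step 7 (one tab-separated line of words followed by one tab-separated line of glosses); the actual export may differ, so check your file first. The sample lines are made up from the word forms used earlier in this guide and are not a real corpus sentence, and the function names are hypothetical.

```python
from itertools import zip_longest

# Made-up sample in the assumed export layout: words on one line,
# glosses on the next, each separated by tabs.
SAMPLE_WORDS = "bón\tbònya"
SAMPLE_GLOSSES = "house\trespect"


def pair_tokens(word_line: str, gloss_line: str) -> list[tuple[str, str]]:
    """Pair each word with the gloss in the same tab-separated position."""
    words = word_line.rstrip("\n").split("\t")
    glosses = gloss_line.rstrip("\n").split("\t")
    # If one line is shorter, pad with empty strings instead of dropping tokens.
    return list(zip_longest(words, glosses, fillvalue=""))


def format_interlinear(pairs: list[tuple[str, str]]) -> str:
    """Render the pairs as two space-padded lines with aligned columns."""
    widths = [max(len(word), len(gloss)) for word, gloss in pairs]
    top = "  ".join(word.ljust(n) for (word, _), n in zip(pairs, widths))
    bottom = "  ".join(gloss.ljust(n) for (_, gloss), n in zip(pairs, widths))
    return top.rstrip() + "\n" + bottom.rstrip()


if __name__ == "__main__":
    print(format_interlinear(pair_tokens(SAMPLE_WORDS, SAMPLE_GLOSSES)))
```

The padded output is only meant for a quick look in a monospaced font; for a publication-ready table, the Word route in step 9 is still the simpler option.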