Contributions: Joshua Wilson Black: primary analysis, write up. Jen Hay: research direction, write up. Lynn Clark: research direction. James Brand: pilot work on interval representation of data.
This research was carried out by Joshua Wilson Black, Jen Hay, Lynn Clark, and James Brand.
This work is part of the project Towards an Improved Theory of Language Change, based at the New Zealand Institute of Language, Brain and Behaviour | Te Kāhui Roro Reo at the University of Canterbury | Te Whare Wānanga o Waitaha.
We gratefully acknowledge the support of the Marsden Fund | Te Pūtea Rangahau a Marsden.
Application number: (17-UOC-049)
Our problem:
Research on style predicts systematic covariation of linguistic variables as a result of shifts in speaker style (e.g. Podesva 2008), and invites the development of quantitative methods to explore any such covariation (e.g. Tamminga 2021).
Our method:
To extend Brand et al.'s (2021) use of PCA from across-speaker variation to within-speaker variation.
Our results:
The literature on style predicts covariation of linguistic variables and invites methodological development: evaluating the prediction depends on having a quantitative handle on such covariation.
Previous work on the project (by all authors except Wilson Black) showed that Principal Component Analysis is an appropriate method for finding vowel covariation across speakers. We attempt to extend a similar method to explore within-speaker covariation.
We did not find stylistic covariation. However, we did find a surprising and, we argue, overlooked source of covariation: first formant values pattern together with amplitude.
While this relationship is known from certain laboratory environments, where it is particularly associated with loud speech and/or distant interlocutors, it has not been shown in the kind of naturalistic recordings sociolinguists are often interested in.
Second, we found that speakers use variation in amplitude, possibly on purpose, to mark their position within sub-topic segments of our recordings. This is a potential link with the literature on, say, turn-taking, where amplitude can signal whether or not one has finished a turn in a dialogue.
We didn't solve our problem, but our results are of interest nonetheless!
We will discuss the first of these results here, but are happy to talk about the second if that is of interest.
Our data comes from the QuakeBox corpus (Clark et al., 2016). The corpus has 431 speakers who freely respond to the prompt 'tell us your earthquake story'. The recordings are made with high quality audio and video equipment.
The corpus contains topic tagging for a subset of speakers. These tags indicate whether the speaker is talking about, say, the September 2010 Earthquake, or the larger February 2011 Earthquake, or their life before the earthquakes etc.
The corpus is stored in LaBB-CAT and run through HTK forced alignment (Fromont and Hay, 2008). LaBB-CAT also interfaces with Praat.
If it comes up, filtering details are given in an additional slide at the end of the deck.
The example is a simplified version of two PCs in Brand et al. (2021). These come from the ONZE corpus and represent structure in large-scale across-speaker change in NZE.
Can the same methods work within speakers?
The observations need to be complete. That is, in each interval we need an entry for each of our variables (formants for each vowel, amplitude, and articulation rate).
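The interval representation described above can be sketched as follows. This is a minimal illustration on synthetic data (the variable names, interval length, and token counts are made up for the example); the real pipeline works on forced-aligned corpus data.

```python
import numpy as np
import pandas as pd

# Hypothetical token-level data: one row per vowel token in a monologue.
rng = np.random.default_rng(0)
tokens = pd.DataFrame({
    "time": rng.uniform(0, 960, 200),           # seconds into the monologue
    "vowel": rng.choice(["DRESS", "KIT"], 200),  # illustrative vowel labels
    "F1": rng.normal(500, 50, 200),              # first formant (Hz)
})

# Bin tokens into fixed-length intervals (here 240 s) and take the mean
# F1 per vowel per interval. Empty cells (intervals with no tokens of a
# vowel) come out as NaN and would need imputation.
interval_len = 240
tokens["interval"] = (tokens["time"] // interval_len).astype(int)
intervals = tokens.pivot_table(index="interval", columns="vowel",
                               values="F1", aggfunc="mean")
print(intervals)
```

The result is one row per interval and one column per variable, which is the complete-observation matrix the PCA step requires.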
The interval solution is not the only possible one. We tried more sophisticated methods (representing each vowel with a GAMM, for instance), but these proved very difficult to get a handle on (for a sense of why, see Tamminga's 2021 paper in Social Meaning and Linguistic Variation).
The figure on the left is a zoomed-in section of a larger plot of all of the data for a single speaker. The two rows show first formant values for two different vowels. The points are the frequencies of individual tokens of each vowel. The filled squares are the intervals, with mean values represented by shade: redder intervals have higher means and bluer intervals lower means.
Note that different vowels have quite different frequencies. These are 240 second intervals, and one of the vowels only has one token!
More detail on the imputation process is given in an additional slide at the end of the deck.
Worth noting here that 240 second intervals require almost no imputation, whereas the 60 second intervals require quite a bit.
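As a minimal illustration of why sparse intervals need filling in, here is the simplest possible imputation: replace an empty cell with the speaker's mean for that variable. The data and the imputation rule here are illustrative assumptions; the procedure actually used is described on the additional slide.

```python
import numpy as np
import pandas as pd

# Hypothetical interval-by-variable matrix with one missing cell
# (a 60 s interval containing no DRESS tokens).
intervals = pd.DataFrame({
    "DRESS_F1": [510.0, np.nan, 495.0],
    "KIT_F1":   [480.0, 470.0, 475.0],
})

# Fill each empty interval with the column (speaker) mean.
imputed = intervals.fillna(intervals.mean())
print(imputed["DRESS_F1"].tolist())  # → [510.0, 502.5, 495.0]
```

Shorter intervals mean fewer tokens per cell, hence more NaNs to fill, which is why the 60 second intervals require far more imputation than the 240 second ones.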
This depicts all of the data (with the exception of articulation rate) for a single speaker: both interval lengths, both F1 and F2, all vowels, and amplitude.
Note that amplitude seems to be gradually reducing.
An intuitive way to think of our PCA analysis is that it seeks to find associations between the patterns of colour across these variables.
We jump straight to the PCA. This analysis is the result of putting all of the 60 second intervals into PCA. It is thus an attempt to find within-speaker patterns which can be characterised across speakers: any patterns we find are patterns in the corpus at large, but they apply within monologues.
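The PCA step itself can be sketched as below. This is a hedged illustration on synthetic data, not the paper's analysis: the matrix shape, the injected shared component, and the use of scikit-learn are all our own assumptions for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical matrix: rows are 60 s intervals (pooled across speakers),
# columns are variables (vowel F1/F2 means, amplitude, ...).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 12))
# Inject a shared component into the first 5 columns, mimicking a set of
# variables (e.g. the F1s and amplitude) that covary within monologues.
X[:, :5] += rng.normal(size=(300, 1))

pca = PCA(n_components=2)
scores = pca.fit_transform(StandardScaler().fit_transform(X))
print(pca.explained_variance_ratio_)
print(pca.components_[0])  # PC1 loadings: which variables pattern together
```

Variables that load together on PC1 (same sign, large magnitude) are the ones whose interval values rise and fall together, which is exactly how the F1/amplitude pattern shows up.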
We will here present only the results for the 60 second intervals. The 240 second intervals give the same kind of PC1.
PC2 is the easiest to use to show how to read the variable plots. PC2 indicates one pattern in the data: when GOOSE F2 goes up, GOOSE F1 goes down, and vice versa. We are not otherwise interested in PC2, though: it is not a relationship between vowels.
PC1 is our main phenomenon. Note that all the F1 loadings sit on the right, along with amplitude. PC1 says that one ingredient in the data is F1 values moving together with amplitude. Incidentally, our initial PCA did not include amplitude; that analysis was a first attempt to explain F1 variation.
The 240s intervals and the results of the permutation test are given in additional slides at the end.
To further confirm our claim that amplitude drives systematic covariation of F1, we fit a GAMM model with:
All variables come up as significant by model-comparison significance tests, but amplitude is far and away the strongest effect. Significance test scores are given in an additional slide at the end.
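A GAMM along these lines might be sketched as follows. This is an illustrative stand-in, not the paper's model: it uses Python's statsmodels (`GLMGam` with B-spline smooths) rather than the R tooling such work typically uses, the data are synthetic, and the predictor set (time, articulation rate, amplitude, pitch) is taken from the significance table later in the deck.

```python
import numpy as np
import pandas as pd
from statsmodels.gam.api import GLMGam, BSplines

# Synthetic stand-in data with amplitude as the dominant driver of F1,
# mimicking the reported result.
rng = np.random.default_rng(2)
n = 500
df = pd.DataFrame({
    "time": rng.uniform(0, 1, n),
    "art_rate": rng.normal(0, 1, n),
    "amplitude": rng.normal(0, 1, n),
    "pitch": rng.normal(0, 1, n),
})
df["F1"] = 0.8 * df["amplitude"] + 0.1 * df["art_rate"] + rng.normal(0, 0.3, n)

# Smooth terms for each predictor; intercept-only parametric part.
smoother = BSplines(df[["time", "art_rate", "amplitude", "pitch"]],
                    df=[6, 6, 6, 6], degree=[3, 3, 3, 3])
gam = GLMGam.from_formula("F1 ~ 1", data=df, smoother=smoother).fit()
print(gam.fittedvalues[:3])
```

In a fit like this, the amplitude smooth would absorb most of the explained variance, matching the claim that amplitude is the strongest effect.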
We already see, with the last point, some consequences for the nature of the vowel space. We see this more in the next slide.
These plots assume that the non-varied variables maintain average values and that we are at the mid point of a monologue.
An embedded vowel space Shiny app for exploring different intervals is given at the end.
1. Natural amplitude variation in quiet environments correlates with F1.
2. Challenges for the investigation of within-speaker stylistic covariation.
3. Challenges for the investigation of across-speaker covariation.
4. Amplitude variation related to topic structure.
Regarding 1: 24 dB variation in our data vs. 10 dB variation in the quiet vs. loud environment study (Liénard & Di Benedetto 1999).
Regarding 2:
Priming work by Villarreal & Clark is mentioned as a study which might have benefited from including amplitude.
Regarding 3:
Differences in recording equipment make this very difficult. No objective measure is forthcoming for most of the data we work with.
The proposal to include relative amplitude is also insufficient if speakers have been recorded entirely within a high or low amplitude span of speech. This creates serious problems for comparing speakers with one another.
Regarding 4:
Exclude FOOT, then
NB: No imputation required for amplitude or articulation rate.
| Variable | p-value |
|---|---|
| Time | 6.382e-08 |
| Articulation Rate | 1.051e-13 |
| Amplitude | < 2e-16 |
| Pitch | < 2e-16 |
We do not find any strong evidence that our F1 covariation is agentive (as, say, stylistic covariation would be). But is amplitude variation being used agentively?
We don't have much in our corpus to get a handle on this question, but we do have the topic tags. These give us structure within our monologues which speakers ought to be somewhat aware of, and which might have some connection with amplitude variation.
The plot here shows amplitude data for six distinct topics within the same speaker. Note that we ignore the specific content of these topics.
We do a bit of additional filtering not discussed here, but note that topic 5 does not have enough data in the middle to be included. (We require five points per section).
| Variable | Estimate | Std. Error | t-value |
|---|---|---|---|
| Beginning | 0.05 | 0.01 | 3.46 |
| Middle | -0.024 | 0.02 | -1.10 |
| End | -0.10 | 0.02 | -5.28 |
| Time | -0.24 | 0.06 | -3.99 |
We don't fit random intercepts because each speaker's data is z-scored. Our random effect structure treats each topic as a random effect, allowing each part (beginning, middle, end) to vary independently. It also allows each speaker a more or less extreme change in amplitude over the course of the monologue.
I've not discussed here the t-value test we carry out in the paper; it is given in an additional slide.
This plot displays predictions from the model for each topic in the data set.
The effect does look quite subtle, but it is there.
We generate fake topics by, for each speaker, collecting the lengths of their real topics and then selecting random chunks of their monologue of the same lengths.
This is done, and the modelling step repeated, 1000 times. The distributions depicted here are the distribution of t-values from models fit on the fake topics.
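The fake-topic generation for a single speaker might be sketched as below. The function and variable names are our own illustration; the paper's implementation may differ (e.g. in whether chunks are allowed to overlap).

```python
import numpy as np

rng = np.random.default_rng(3)

def fake_topics(monologue_len, topic_lens, rng):
    """For one speaker: pick random chunks of the monologue matching
    the lengths of their real topics. Chunks may overlap here."""
    starts = rng.uniform(0, monologue_len - np.array(topic_lens))
    return [(s, s + l) for s, l in zip(starts, topic_lens)]

# e.g. a 900 s monologue whose real topics last 120, 300, and 180 s
chunks = fake_topics(900, [120, 300, 180], rng)
print(chunks)
```

Repeating this draw 1000 times, refitting the amplitude model on each set of fake topics, and collecting the resulting t-values yields the null distributions against which the real-topic t-values are compared.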