library(tidynorm)
library(dplyr)
library(tibble)
library(ggplot2)

options(
  ggplot2.discrete.colour = c(
    lapply(
      1:6,
      \(x) c(
        "#4477AA", "#EE6677", "#228833",
        "#CCBB44", "#66CCEE", "#AA3377"
      )[1:x]
    )
  ),
  ggplot2.discrete.fill = c(
    lapply(
      1:6,
      \(x) c(
        "#4477AA", "#EE6677", "#228833",
        "#CCBB44", "#66CCEE", "#AA3377"
      )[1:x]
    )
  )
)
theme_set(
  theme_minimal(
    base_size = 16
  )
)

The Discrete Cosine Transform (DCT) re-describes an input signal as a set of coefficients. These coefficients can be converted back into the original signal, or truncated to the first few coefficients to recover a smoothed version of it.
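
To make the round trip and the smoothing concrete, here is a minimal base R sketch of a DCT-II and its inverse. The functions `dct2()` and `idct2()` are hypothetical helpers written for this illustration, not tidynorm's implementation, and the scaling used here (coefficient 0 equals the signal mean) is an assumption that may differ from what `dct()` returns. The signal is simulated, not taken from speaker_tracks.

```r
# Illustrative DCT-II: y_k = (1/N) * sum_j x_j * cos(pi * k * (2j + 1) / (2N))
# (hypothetical helper; scaling is an assumption, not tidynorm's convention)
dct2 <- function(x) {
  n <- length(x)
  j <- seq_len(n) - 1
  sapply(j, \(k) sum(x * cos(pi * k * (2 * j + 1) / (2 * n))) / n)
}

# Matching inverse: x_j = y_0 + 2 * sum_{k >= 1} y_k * cos(pi * k * (2j + 1) / (2N))
# `n` is the length of the signal to reconstruct, so `y` may hold fewer
# than `n` coefficients.
idct2 <- function(y, n = length(y)) {
  j <- seq_len(n) - 1
  k <- seq_along(y) - 1
  w <- ifelse(k == 0, 1, 2) # weight 1 for k = 0, 2 for all other terms
  sapply(j, \(jj) sum(w * y * cos(pi * k * (2 * jj + 1) / (2 * n))))
}

# A simulated 20-point track: a smooth arc plus noise
set.seed(1)
time_pts <- seq(0, 1, length.out = 20)
signal <- 500 + 100 * sin(pi * time_pts) + rnorm(20, sd = 10)

coefs <- dct2(signal)

# All 20 coefficients reproduce the signal
round_trip <- idct2(coefs)
all.equal(round_trip, signal)

# The first 5 coefficients give a smoothed track of the same length
smoothed <- idct2(coefs[1:5], n = length(signal))
```

Dropping the higher-order coefficients removes the fast-varying cosine components, which is why the reconstruction from only a few coefficients is smooth.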
For example, here is an F1 track with 20 measurement points from the speaker_tracks data set.
one_track <- speaker_tracks |>
  filter(
    speaker == "s01",
    id == 9
  )

one_track |>
  ggplot(aes(t, F1)) +
  geom_point() +
  geom_line()

If we apply dct() to the F1 track, we’ll get back 20 DCT coefficients.
dct(one_track$F1)
#> [1] 482.3728655 16.5472580 -25.0305876 -3.4475760 -8.8201713 -2.4903558
#> [7] -3.1619876 -2.9428915 -5.2993291 -0.9811638 0.5681181 0.7707920
#> [13] -0.4318330 0.2322257 -0.3945702 -0.5995980 -0.4285492 0.8180725
#> [19] 0.7793962 -0.1793681

And, if we apply idct() to these coefficients, we’ll get back the original track.
one_track |>
  mutate(
    F1_dct = dct(F1),
    F1_idct = idct(F1_dct)
  ) |>
  ggplot(
    aes(t, F1_idct)
  ) +
  geom_point() +
  geom_line()

However, if we apply idct() to just the first few DCT coefficients, we’ll get back a smoothed version of the formant track.
one_track |>
  mutate(
    F1_dct = dct(F1),
    F1_idct = idct(F1_dct[1:5], n = n())
  ) |>
  ggplot(
    aes(t, F1_idct)
  ) +
  geom_point() +
  geom_line()

There are three reframe_with_* functions in tidynorm.
reframe_with_dct()
This takes a data frame of formant tracks and returns a data frame of DCT coefficients.
You need to be able to identify which rows belong to individual tokens, and you can also identify a column for the time domain.
reframe_with_idct()
This takes a data frame of DCT coefficients and returns a data frame of formant tracks.
You need to be able to identify which rows belong to individual tokens, and you can also identify a column for the parameter number.
reframe_with_dct_smooth()
This combines reframe_with_dct() and reframe_with_idct() into one step, taking a data frame of formant tracks and returning a data frame of smoothed formant tracks.
You need to be able to identify which rows belong to individual tokens, and you can also identify a column for the time domain.
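To get average formant tracks for each vowel, you’ll need to reframe each token as DCT coefficients, average those coefficients within each parameter number and vowel class, and then apply the inverse DCT to the averages.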
# focusing on one speaker
one_speaker <- speaker_tracks |>
  filter(speaker == "s01")

dct_smooths <- one_speaker |>
  # step 1, reframing as dct coefficients
  reframe_with_dct(
    F1:F3,
    .token_id_col = id,
    .time_col = t
  ) |>
  # step 2, averaging over parameter number and vowel
  summarise(
    across(F1:F3, mean),
    .by = c(.param, plt_vclass)
  ) |>
  # step 3, reframing with inverse DCT
  reframe_with_idct(
    F1:F3,
    # this time, the id column is the vowel class
    .token_id_col = plt_vclass,
    .param_col = .param
  )

dct_smooths |>
  filter(
    plt_vclass %in% c("iy", "ey", "ay", "ay0", "oy")
  ) |>
  ggplot(
    aes(F2, F1)
  ) +
  geom_path(
    aes(
      group = plt_vclass,
      color = plt_vclass
    ),
    arrow = arrow()
  ) +
  scale_y_reverse() +
  scale_x_reverse()

The DCT decomposes an input signal into a weighted combination of cosine functions, and returns those weights. You can access the cosine functions it uses with dct_basis().
basis <- dct_basis(100, 5)
matplot(basis, type = "l", lty = 1, lwd = 2)

One way to think about the DCT is as a regression of the signal on these cosine functions, where the values that get returned are the regression coefficients.
dct(one_track$F1)[1:5]
#> [1] 482.372866 16.547258 -25.030588 -3.447576 -8.820171

lm(
  one_track$F1 ~ dct_basis(20, 5) - 1
) |>
  coef()
#> dct_basis(20, 5)1 dct_basis(20, 5)2 dct_basis(20, 5)3 dct_basis(20, 5)4
#>        482.372866         16.547258        -25.030588         -3.447576
#> dct_basis(20, 5)5
#>         -8.820171

For more details on the mathematical formulation of the DCT, see the dct() help page.
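
As a rough illustration of why the regression and the transform agree, the sketch below builds a cosine basis by hand in base R and compares least-squares weights with direct projections onto each basis column. The basis here is an assumption made for illustration (unscaled DCT-II cosines); its scaling may differ from what dct_basis() returns, and the signal is simulated.

```r
n <- 20
j <- seq_len(n) - 1

# Hand-built basis (assumed scaling): column k + 1 is cos(pi * k * (2j + 1) / (2n)),
# so the first column is constant and later columns oscillate faster.
basis <- sapply(0:4, \(k) cos(pi * k * (2 * j + 1) / (2 * n)))

# Simulated 20-point track: a smooth arc plus noise
set.seed(1)
x <- 500 + 100 * sin(pi * j / (n - 1)) + rnorm(n, sd = 10)

# Weights from a regression with no intercept
ls_coefs <- unname(coef(lm(x ~ basis - 1)))

# Weights from projecting x onto each column separately:
# <x, b_k> / <b_k, b_k>
proj_coefs <- as.vector(crossprod(basis, x)) / colSums(basis^2)

# The columns are mutually orthogonal, so the two sets of weights agree
all.equal(ls_coefs, proj_coefs)
```

Because the cosine columns are orthogonal, each weight can be computed independently of the others, so adding or dropping higher-order terms does not change the lower-order coefficients.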