What Are Acoustic Landmarks, and What Do They Describe?

Updated: 2024-10-05 17:18:03


Source document: .pdf
In speech acoustics, landmarks are patterns that mark certain speech-production events. Speech acoustic landmarks come in two classes: peak and abrupt.

Peak: At present, the peak landmarks detected in SpeechMark® are vowel landmarks (VLMs) and frication landmarks. They are the instants in an utterance at which harmonic power (for VLMs) or fractal dimension (for frication landmarks) reaches a maximum, or peak, and they may be considered the centers of the vowels or fricated intervals, respectively. When plotted with SpeechMark functions, they are drawn below the waveform and labeled with uppercase letters: V or F. Frication landmarks are described more fully elsewhere (e.g., “Frication Peak Landmarks” on the SpeechMark website) and are not discussed further here.¹
Abrupt: Abrupt or abrupt-consonantal landmarks (AC LMs, or simply LMs) have a more complex specification.
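The peak-landmark definition above amounts to picking maxima of a per-frame contour. The following is a minimal sketch under stated assumptions, not SpeechMark's detector. The helper name `peak_landmarks` and the `floor` threshold are hypothetical, and the sketch assumes the contour has already been computed: harmonic power for VLMs, fractal dimension for frication landmarks.

```python
from typing import List, Sequence


def peak_landmarks(times_ms: Sequence[float], contour: Sequence[float],
                   floor: float) -> List[float]:
    """Return the times at which `contour` has a strict local maximum above `floor`.

    Hypothetical helper: `contour` stands in for a per-frame harmonic-power
    (vowel LM) or fractal-dimension (frication LM) track, and `floor` is an
    assumed minimum value, not a documented SpeechMark parameter.
    """
    peaks = []
    for i in range(1, len(contour) - 1):
        # Strictly higher than both neighbors; a flat top spanning several
        # frames is not reported (a simplification).
        if contour[i] > floor and contour[i - 1] < contour[i] > contour[i + 1]:
            peaks.append(times_ms[i])
    return peaks


# Illustrative values only: two maxima lie above the floor of 4.
print(peak_landmarks([0, 10, 20, 30, 40, 50, 60, 70],
                     [0, 2, 5, 3, 1, 4, 6, 2], floor=4))  # [20, 60]
```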

It is helpful first to distinguish laryngeal-source events from vocal-tract events. We denote laryngeal events by “+g” (overall onset of voicing) or “-g” (overall offset), by “+p” (onset of periodicity) or “-p” (offset of periodicity), or by “+j” (upward jump of the fundamental frequency, F0) or “-j” (downward jump). The detailed rule for the critical +g is particularly complex. However, the central observation is easily stated: vocal-tract excitation by the laryngeal source is characterized by well-developed voicing.

Voicing is considered well developed when there is evidence of sustained periodic excitation of at least minimal amplitude, measured over intervals of several milliseconds.² In spectrogram terms, a narrow-band spectrogram shows clearly defined, smooth, approximately horizontal stripes, reflecting the harmonics of the excitation signal. The spacing between stripes gives the fundamental frequency. Apart from occasional jumps (+j/-j), this frequency must lie within a range specified by the user, by the client software, or by default. (The current defaults for human speech are maximum F0 = 350 Hz and minimum F0 = 1/5 of the maximum, i.e., 70 Hz; these are typical of adults, especially females.) The limits of such an interval are denoted by +p and -p events.
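The F0-range rule above can be illustrated with a short sketch that turns a per-frame F0 track into +p/-p intervals. It assumes a pre-computed F0 track, with `None` for aperiodic frames. The helper name and the `min_frames` stand-in for "sustained" are hypothetical. The amplitude condition and +j/-j handling are left out.

```python
from typing import List, Optional, Sequence, Tuple

MAX_F0_HZ = 350.0            # default maximum F0 for human speech (from the text)
MIN_F0_HZ = MAX_F0_HZ / 5    # default minimum: 1/5 of the maximum, i.e. 70 Hz


def periodic_intervals(times_ms: Sequence[float],
                       f0_track: Sequence[Optional[float]],
                       min_frames: int = 3) -> List[Tuple[float, float]]:
    """Return (+p, -p) time pairs for runs of frames with in-range F0.

    Hypothetical helper: `f0_track` holds one F0 estimate per frame (None where
    no periodicity was found), and `min_frames` stands in for "sustained over
    several milliseconds". Amplitude checks and F0 jumps are not handled.
    """
    intervals: List[Tuple[float, float]] = []
    start: Optional[int] = None
    # A trailing None sentinel closes a run that reaches the last frame.
    for i, f0 in enumerate(list(f0_track) + [None]):
        in_range = f0 is not None and MIN_F0_HZ <= f0 <= MAX_F0_HZ
        if in_range and start is None:
            start = i                                  # candidate +p
        elif not in_range and start is not None:
            if i - start >= min_frames:                # long enough to count as sustained
                intervals.append((times_ms[start], times_ms[i - 1]))  # -p at last periodic frame
            start = None
    return intervals


# Frames 6-7 (400 Hz) exceed the 350 Hz ceiling, so they split the track.
print(periodic_intervals([5 * i for i in range(11)],
                         [None, 200, 210, 205, 215, None, 400, 400, 180, 182, 185]))
# [(5, 20), (40, 50)]
```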

Additionally, a segment of the signal is considered voiced if it occurs shortly before a segment with well-developed voicing and has (a) similar power and (b) similar spectral slope to that well-voiced segment. Currently, “shortly before” means up to 50 ms. Such a segment reflects glottalization or other irregular laryngeal motion.
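A sketch of the glottalization rule follows, assuming each segment has already been summarized by its mean power and spectral slope. Several elements are assumptions, not SpeechMark's documented criteria: the `Segment` type, the tolerances that stand in for "similar", and the reading of "shortly before" as "entirely within the preceding 50 ms".

```python
from dataclasses import dataclass

LOOKBACK_MS = 50.0  # "shortly before" currently means up to 50 ms (from the text)


@dataclass
class Segment:
    start_ms: float
    end_ms: float
    power_db: float   # mean power of the segment
    slope_db: float   # spectral slope, in any unit consistent across segments


def extends_voicing(candidate: Segment, voiced: Segment,
                    power_tol_db: float = 6.0, slope_tol_db: float = 3.0) -> bool:
    """True if `candidate` should count as voiced because it precedes `voiced`.

    Assumptions: "shortly before" is read as lying entirely within the 50 ms
    before the well-voiced onset, and both tolerances are hypothetical
    stand-ins for "similar power" and "similar spectral slope".
    """
    in_window = (voiced.start_ms - LOOKBACK_MS <= candidate.start_ms
                 and candidate.end_ms <= voiced.start_ms)
    similar_power = abs(candidate.power_db - voiced.power_db) <= power_tol_db
    similar_slope = abs(candidate.slope_db - voiced.slope_db) <= slope_tol_db
    return in_window and similar_power and similar_slope
```

When the check succeeds, the voiced interval would start at the candidate's onset, so the corresponding +g would move earlier.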

Both “g” and “p” LMs occur only in pairs (jumps do not). We may therefore speak of voicing, or of periodic voicing, as an attribute of an entire segment of the signal: the interval between +g and -g, or between +p and -p, respectively.
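Because “g” and “p” LMs come in pairs, a time-ordered event list can be folded into intervals. This minimal sketch assumes events are `(time, label)` tuples with labels such as `"+g"` or `"-p"`. That representation and the helper name are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def pair_events(events: Sequence[Tuple[float, str]], kind: str) -> List[Tuple[float, float]]:
    """Pair each "+kind" event with the next "-kind" event (kind is "g" or "p").

    `events` is a time-ordered list of (time, label) tuples. Other labels,
    such as jumps ("+j"/"-j"), are ignored. Raises ValueError if an event is
    left unpaired.
    """
    intervals: List[Tuple[float, float]] = []
    onset: Optional[float] = None
    for t, label in events:
        if label == "+" + kind:
            if onset is not None:
                raise ValueError(f"+{kind} at {t} before the previous +{kind} was closed")
            onset = t
        elif label == "-" + kind:
            if onset is None:
                raise ValueError(f"-{kind} at {t} without a preceding +{kind}")
            intervals.append((onset, t))
            onset = None
    if onset is not None:
        raise ValueError(f"+{kind} at {onset} is never closed")
    return intervals
```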

Thus, a +g/-g interval must include at least one +p/-p subinterval. It may, however, contain more than one, and it may contain F0 jumps as well as intervals of irregular motion between +p/-p subintervals. Sometimes it contains +p/-p intervals separated only by jumps, either upward or downward. Many voiced intervals start with periodicity, so for these intervals +g and +p coincide; similarly, -g and -p coincide for intervals that end with periodicity.
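The containment rules above can be checked mechanically. The helper names are hypothetical, and intervals are represented as `(onset, offset)` pairs. The sketch also requires each +p/-p interval to lie within some +g/-g interval, which is our reading of "subinterval". Coincident +g/+p or -g/-p endpoints are allowed.

```python
from typing import Sequence, Tuple

Interval = Tuple[float, float]


def _inside(inner: Interval, outer: Interval) -> bool:
    """True if `inner` lies within `outer`, shared endpoints included."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def voicing_is_consistent(g_intervals: Sequence[Interval],
                          p_intervals: Sequence[Interval]) -> bool:
    """Every +g/-g interval contains at least one +p/-p subinterval, and every
    +p/-p interval lies inside some +g/-g interval."""
    every_g_has_p = all(any(_inside(p, g) for p in p_intervals) for g in g_intervals)
    every_p_in_g = all(any(_inside(p, g) for g in g_intervals) for p in p_intervals)
    return every_g_has_p and every_p_in_g
```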

Informally, but very usefully, the remaining LMs are identified as instants at which the signal shows evidence of rapid change across multiple frequency ranges, on multiple time scales.

In each case, AC LMs are classified as onset (+) or offset (-) type. They are also classified as voiced or unvoiced, according to whether they lie in a voiced segment (between +g and -g) or in an unvoiced one.
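The two classifications above depend only on the direction of change and on whether the LM time falls within a +g/-g interval. Here is a small sketch with hypothetical names. It counts interval endpoints as voiced, which is an assumption, since the text does not say how boundary cases are handled.

```python
from typing import Sequence, Tuple


def classify_lm(time_ms: float, rising: bool,
                g_intervals: Sequence[Tuple[float, float]]) -> Tuple[str, str]:
    """Return (onset/offset, voiced/unvoiced) for an abrupt LM.

    `g_intervals` holds (+g, -g) time pairs. An LM exactly at +g or -g is
    treated as voiced (assumption).
    """
    kind = "onset" if rising else "offset"
    voiced = any(on <= time_ms <= off for on, off in g_intervals)
    return kind, "voiced" if voiced else "unvoiced"
```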

Processing begins by computing the power in each of several frequency bands. At present, the SpeechMark system normally uses five bands, spanning 800 to 8000 Hz for adults or 1200 to 8000 Hz for infants. The instantaneous power is smoothed on two time scales, approximately 25 ms (“fine”) and 50 ms (“coarse”): coarse smoothing suppresses events that are too brief, while fine smoothing allows higher-precision placement.
A landmark is detected if power rises or falls by 6 dB simultaneously on both the fine and coarse time scales, and in at least 3 of the 5 bands.
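The per-band part of this procedure can be sketched as follows. The sketch assumes each band's power contour is already available in dB at a fixed frame step. Several details are assumptions rather than SpeechMark's documented behavior:

- the moving-average smoother, applied directly to dB values for simplicity;
- the 1-ms default frame step;
- measuring each 6-dB change against the value one smoothing window earlier.

```python
from typing import List, Sequence, Tuple

FINE_MS, COARSE_MS = 25.0, 50.0   # smoothing scales quoted in the text
THRESHOLD_DB = 6.0                # required rise or fall


def smooth(contour_db: Sequence[float], window_ms: float, frame_ms: float) -> List[float]:
    """Centered moving average; a stand-in for SpeechMark's unspecified smoother."""
    half = max(1, int(window_ms / frame_ms) // 2)
    out = []
    for i in range(len(contour_db)):
        lo, hi = max(0, i - half), min(len(contour_db), i + half + 1)
        out.append(sum(contour_db[lo:hi]) / (hi - lo))
    return out


def change_events(contour_db: Sequence[float], lag: int,
                  threshold_db: float = THRESHOLD_DB) -> List[Tuple[int, int]]:
    """Return (frame, +1 or -1) where the contour first rises or falls by at
    least `threshold_db` relative to `lag` frames earlier; one event per excursion."""
    events: List[Tuple[int, int]] = []
    prev = 0
    for i in range(lag, len(contour_db)):
        diff = contour_db[i] - contour_db[i - lag]
        sign = 1 if diff >= threshold_db else -1 if diff <= -threshold_db else 0
        if sign != 0 and sign != prev:
            events.append((i, sign))
        prev = sign
    return events


def band_events(contour_db: Sequence[float], frame_ms: float = 1.0):
    """Fine- and coarse-scale change events for one band's power contour (dB).

    Assumption: each change is measured over a lag equal to the smoothing window.
    """
    fine = change_events(smooth(contour_db, FINE_MS, frame_ms), int(FINE_MS / frame_ms))
    coarse = change_events(smooth(contour_db, COARSE_MS, frame_ms), int(COARSE_MS / frame_ms))
    return fine, coarse
```

Running `band_events` on every band yields the per-band, per-scale events that the simultaneity test then combines.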

In practice, simultaneity is measured to a precision of 20 ms. That is, three bands must show 6-dB increases or decreases within 20 ms of each other in the coarsely smoothed power contours, and three must show the same in the finely smoothed contours, and the coarse and fine increases or decreases must lie within 20 ms of each other.
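The 20-ms simultaneity test can be sketched on top of per-band event times. The helper names are hypothetical, and the sketch only reports the earliest qualifying group. It also rests on these assumptions:

- events are separated by direction before the call;
- a band group is accepted if its events fall within one 20-ms window;
- the fine and coarse groups are compared via their mean times;
- the fine-scale mean is returned as the landmark position, following the remark that fine smoothing allows higher-precision placement.

```python
from typing import Dict, Optional, Sequence, Tuple

TOL_MS = 20.0   # simultaneity precision quoted in the text
MIN_BANDS = 3   # number of bands that must agree


def band_cluster(events: Sequence[Tuple[float, int]]) -> Optional[float]:
    """Mean time of the earliest group of events from at least MIN_BANDS
    distinct bands that all lie within TOL_MS of each other, else None.

    `events` holds (time_ms, band_index) pairs for ONE direction (rises or
    falls) on ONE smoothing scale.
    """
    ordered = sorted(events)
    for t0, _ in ordered:
        first_per_band: Dict[int, float] = {}
        for t, band in ordered:
            if t0 <= t <= t0 + TOL_MS:
                first_per_band.setdefault(band, t)   # earliest event per band in the window
        if len(first_per_band) >= MIN_BANDS:
            return sum(first_per_band.values()) / len(first_per_band)
    return None


def simultaneous_landmark(fine: Sequence[Tuple[float, int]],
                          coarse: Sequence[Tuple[float, int]]) -> Optional[float]:
    """Landmark time if both scales have a band cluster and the two cluster
    times lie within TOL_MS of each other; the fine-scale time is returned."""
    fine_t, coarse_t = band_cluster(fine), band_cluster(coarse)
    if fine_t is None or coarse_t is None or abs(fine_t - coarse_t) > TOL_MS:
        return None
    return fine_t
```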

In the simplest case, power rises in all the bands on both time scales, defining a “+b” (unvoiced) or “+s” (voiced) LM. Likewise, power may fall in all the bands, defining a “-b” or “-s” LM, respectively. In practice, power often rises in three or four frequency bands but stays nearly constant (within 6 dB) in the remaining ones.

A more complicated case arises for fricative-like “f” (unvoiced) or “v” (voiced) onset and offset LMs. Here, the power rises at high frequencies and simultaneously falls at lower frequencies, defining a “+f” or “+v”. Or it may do the opposite, i.e., falling at high frequencies and rising at low ones: “-f” or “-v”, respectively.

Note that “b”/“s” LMs always take precedence over “f”/“v” LMs. That is, if power rises in at least three bands, SpeechMark detects a “+b”/“+s”; an “f”/“v” LM is not detected even if power falls in the other bands. Likewise, if power falls in at least three bands, SpeechMark detects a “-b”/“-s”.
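The labeling rules for “b”/“s” and “f”/“v” LMs, including the precedence of “b”/“s”, can be summarized as one decision function. This is a sketch. The text does not say which of the five bands count as "high" or "low" frequency, so `high_bands` is a caller-supplied assumption. The per-band directions are assumed to come from an upstream simultaneity step.

```python
from typing import AbstractSet, Optional, Sequence

MIN_BANDS = 3  # a landmark needs a 6-dB change in at least 3 of the 5 bands


def classify_abrupt(changes: Sequence[int], voiced: bool,
                    high_bands: AbstractSet[int]) -> Optional[str]:
    """Label one simultaneous change event, or return None.

    `changes` gives, per band, +1 for a 6-dB rise, -1 for a 6-dB fall, and 0
    for no 6-dB change. `high_bands` holds the indices treated as "high
    frequency" (an assumption; the grouping is not specified in the text).
    """
    rises = sum(1 for c in changes if c > 0)
    falls = sum(1 for c in changes if c < 0)
    if rises + falls < MIN_BANDS:
        return None
    # "b"/"s" take precedence: three or more bands moving the same way.
    if rises >= MIN_BANDS:
        return "+s" if voiced else "+b"
    if falls >= MIN_BANDS:
        return "-s" if voiced else "-b"
    # Otherwise look for opposite movement in high versus low bands.
    high = {c for i, c in enumerate(changes) if i in high_bands and c != 0}
    low = {c for i, c in enumerate(changes) if i not in high_bands and c != 0}
    if high == {1} and low == {-1}:
        return "+v" if voiced else "+f"
    if high == {-1} and low == {1}:
        return "-v" if voiced else "-f"
    return None
```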

Figure 1 shows an example of the abrupt LMs for one syllable of an infant babble. In contrast to peak LMs, abrupt LMs are drawn by SpeechMark functions above the waveform and labeled with lowercase letters. SpeechMark groups these LMs into one syllabic cluster that covers exactly the segment from the initial +g to the final -g, and (in this example) no further. The LMs are also grouped into an utterance, which does extend beyond this point.

Also notice that the narrow-band spectrogram shows the characteristic horizontal stripes of well-developed voicing. However, it also shows two abrupt changes of period, at 0.04 s and 0.26 s, as well as a loss of periodicity (-p) at 0.33 s. Finally, notice that an acoustic event at 0.17 s is (correctly) not detected as an LM, because it does not appear in enough spectral bands.

Figure 1. Example landmarks for one syllable of an infant babble. The LMs are placed at instants of abrupt change of energy occurring simultaneously across multiple frequency ranges and on multiple time scales. (top) Waveform with smoothed amplitude envelope, landmarks (+g through -g, green vertical lines), and landmark grouping. Graphics show the interval of voicing (solid red line), the grouping as a syllabic cluster (dashed light blue), and the grouping as part of an utterance that continues beyond the window (dashed magenta). (bottom) Narrow-band spectrogram of the segment, with a dotted line through F0 and a dashed line through 10×F0. The spectrogram shows the harmonics (horizontal stripes). Periodicity is strong even at the start of voicing (0.01 s), so the +g LM coincides with the corresponding +p (not shown). Note that the event at 0.17 s affects too few spectral bands and therefore does not generate an LM. Also note the abrupt jumps in F0. Jumps and periodicity events do not contribute to defining the syllabic cluster, so this example is considered a +g+s-s-s-g cluster.
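The caption's rule for naming a syllabic cluster can be expressed directly: order the landmarks in time and drop periodicity and jump events. The `(time_s, label)` representation and the helper name are assumptions for illustration.

```python
from typing import Sequence, Tuple

# Event kinds that, per the Figure 1 caption, do not shape the syllabic cluster.
EXCLUDED_KINDS = {"p", "j"}


def cluster_label(landmarks: Sequence[Tuple[float, str]]) -> str:
    """Concatenate a cluster's LM labels in time order, skipping periodicity
    (+p/-p) and F0-jump (+j/-j) events.

    `landmarks` holds (time_s, label) pairs such as (0.01, "+g"). Sorting uses
    the time only, so coincident events keep their input order.
    """
    ordered = sorted(landmarks, key=lambda lm: lm[0])
    return "".join(label for _, label in ordered if label[1:] not in EXCLUDED_KINDS)
```

Consider a hypothetical sequence +g, +p, a jump, +s, -s, another jump, -s, -p, -g, in that order. For that input the function returns `"+g+s-s-s-g"`, the form of label used in the caption.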

The following table summarizes the rules for the abrupt LMs.

Table. Rules to identify each type of AC LM. The symbols and mnemonics are not intended to identify underlying articulatory or phonetic events, only to suggest examples: syllabic, voiced frication, etc.


Published: 2024-02-19 13:27:55