Domain-Sensitive Temporal Tagging. Jannik Strötgen.

About the work:

Author:	Jannik Strötgen
Publisher:	Ingram
Series:	Synthesis Lectures on Human Language Technologies
Genre:	Software
ISBN:	9781681731858


“March 2013” can be directly normalized to 2013-03-11 and 2013-03, respectively.

• Implicit expressions: Implicit expressions can be normalized once their implicit temporal semantics is known. Thus, this category is designed specifically for named dates. Examples are holidays that can be directly mapped to a point in time. A simple implicit expression is “Christmas 2013” since Christmas refers to December 25. Thus, the expression can be normalized to 2013-12-25. A more complex example is “Columbus Day 2013” since Columbus Day is scheduled as the second Monday in October. Some calendar calculations have to be performed to normalize the expression to 2013-10-14.

Table 2.1: The four ways in which temporal expressions can be realized, with examples and an overview of the information required for their normalization

• Relative expressions: In contrast to explicit and implicit expressions, relative expressions cannot be normalized without context information. More precisely, a reference time has to be detected to normalize expressions such as “today” and “the following year”. For some relative expressions, the reference time is the point in time when the expression was formulated (e.g., for “today”) while the reference time of other expressions is a point in time mentioned in the context of the expression (e.g., in the statement “in 2000 … in the following year”, 2001 is the normalized value of “the following year” since “2000” is the reference time). In both cases, the reference time is the only required information, because the relation to the reference time is carried by the expressions.

• Underspecified expressions: For the normalization of underspecified expressions, the relation to the reference time is required in addition to the reference time itself. For instance, expressions such as “December” or “December 25” can locally be normalized to XXXX-12 and XXXX-12-25, respectively, that is, without specifying the year. Assuming that the reference time is “November 2013” (2013-11) and the relation to the reference time is “after”, then the two examples can be normalized to 2013-12 and 2013-12-25, respectively.
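The normalization steps described for implicit, relative, and underspecified expressions can be sketched in a few lines of Python. This is only an illustrative sketch, not the method of any particular temporal tagger: the function names, the two-entry holiday lookup, and the restriction of the relation to "after" and "before" are assumptions made for this example, and the year of an underspecified expression is resolved at month granularity only.

```python
from datetime import date, timedelta


def nth_weekday(year: int, month: int, weekday: int, n: int) -> date:
    """Return the n-th occurrence of a weekday (Monday = 0) in a month."""
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7  # days until the first such weekday
    return first + timedelta(days=offset + 7 * (n - 1))


def normalize_implicit(name: str, year: int) -> str:
    """Map a named date plus a year to a value (hypothetical two-entry lookup)."""
    if name == "Christmas":
        return date(year, 12, 25).isoformat()
    if name == "Columbus Day":
        # Second Monday in October, so a calendar calculation is needed.
        return nth_weekday(year, 10, 0, 2).isoformat()
    raise ValueError(f"unknown named date: {name}")


def normalize_relative_year(offset: int, reference_year: int) -> str:
    """Resolve a year-level relative expression such as 'the following year'."""
    return f"{reference_year + offset:04d}"


def resolve_underspecified(local_value: str, reference: str, relation: str) -> str:
    """Fill the year of 'XXXX-MM' or 'XXXX-MM-DD' from a 'YYYY-MM' reference time.

    Simplification: months are compared, days are ignored, and an expression
    in the same month as the reference is not treated as lying after
    (or before) it.
    """
    ref_year, ref_month = int(reference[:4]), int(reference[5:7])
    month = int(local_value[5:7])
    if relation == "after":
        year = ref_year if month > ref_month else ref_year + 1
    elif relation == "before":
        year = ref_year if month < ref_month else ref_year - 1
    else:
        raise ValueError(f"unsupported relation: {relation}")
    return f"{year:04d}{local_value[4:]}"


print(normalize_implicit("Columbus Day", 2013))
print(normalize_relative_year(1, 2000))
print(resolve_underspecified("XXXX-12-25", "2013-11", "after"))
```

The sketch mirrors the distinction drawn in the list: the implicit case needs only calendar knowledge, the relative case needs a reference time, and the underspecified case needs both a reference time and the relation to it.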

ALTERNATIVE NAMINGS

As mentioned above, the categorization of temporal expressions referring to points in time has quite a long tradition in the literature. While the set of expressions which we call explicit expressions is usually a fixed set and only the names to refer to such expressions differ—e.g., explicit [e.g., Alonso et al., 2007, Schilder and Habel, 2001], fully specified [e.g., Pustejovsky et al., 2003a], absolute [e.g., Derczynski, 2013, Jurafsky and Martin, 2008], complete [e.g., Hinrichs, 1986], and independent [e.g., Hinrichs, 1986]—expressions we call implicit are less frequently discussed. Grouping the other expressions (i.e., the ones we refer to as relative and underspecified) results in different, partially overlapping sets with multiple names in the literature.

In the following, we present Mazur’s [2012] overview of the terminology used in the literature. For this, the following three example expressions are used:

(i) “tomorrow”,

(ii) “2 days later”, and

(iii) “May 21st”.

While some authors subsume all three types of expressions under a single term, e.g., indexical expressions [e.g., Schilder and Habel, 2001] or relative expressions [e.g., Alonso et al., 2007], they were already separated into three groups by Smith [1978] and Hinrichs [1986]. Expressions such as (i) are frequently referred to as deictic expressions [e.g., Ahn et al., 2005, Busemann et al., 1997, Hinrichs, 1986, Smith, 1978]. Expressions such as (ii) are referred to as anaphoric expressions by some authors [Busemann et al., 1997], while others use the same term to refer to expressions such as (ii) and (iii) [e.g., Ahn et al., 2005]. In our categorization, we follow Busemann et al. [1997] in referring to expressions such as (iii) as underspecified expressions.

Some authors include so-called “vague expressions” as a separate group of point expressions. For instance, Mani and Wilson [2000b] use the term for expressions such as “Monday morning” or season names (e.g., “fall”, “winter”) since their boundaries are fuzzy, that is, there are no exact start and end times. However, we agree with Mazur [2012] that the vagueness of such expressions should not result in a separate type of expression since it “is not the expression that is vague […] [but] the entity referred to that has vague boundaries” [Mazur, 2012].

UNCERTAINTY OF TEMPORAL EXPRESSIONS

Standard date and time expressions are also often used without referring to the full duration of the expression. That is, their actual meaning is uncertain, or more specifically, it is not clear which exact time interval they refer to [Berberich et al., 2010]. For instance, in “he visited Germany in 2010”, it is rather unlikely that the visit lasted the whole year. The exact point or period in 2010 is not known. Thus, all expressions of a larger granularity than a timestamp could be regarded as fuzzy. As will be described in Chapter 3, according to annotation standards, date and time expressions are typically assigned a single normalized value, so we also refer to them as points in time (with specific granularities). However, as pointed out by Berberich et al. [2010], and as we will also discuss in Section 3.1 when describing annotation standards, for some applications it may be useful to treat every time and date expression as an interval and to assign lower and upper bounds for the start and end times instead of a single value, that is, to account for the fuzziness.
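The interval view can be made concrete with a small sketch. It assumes day granularity and normalized values of the form YYYY, YYYY-MM, or YYYY-MM-DD. The class name, the field names, and the choice to let both the start and the end range over the whole span of the expression are assumptions made for this illustration, not a definition taken from Berberich et al. [2010] or from an annotation standard.

```python
import calendar
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class UncertainInterval:
    """Lower and upper bounds for the start and the end of a time interval."""
    start_lower: date
    start_upper: date
    end_lower: date
    end_upper: date


def to_interval(value: str) -> UncertainInterval:
    """Turn a normalized date value into an interval with uncertain boundaries.

    Assumption for this sketch: the event may start and end on any day
    covered by the expression, so all four bounds come from its first
    and last day.
    """
    parts = [int(p) for p in value.split("-")]
    if len(parts) == 1:  # year granularity
        first, last = date(parts[0], 1, 1), date(parts[0], 12, 31)
    elif len(parts) == 2:  # month granularity
        year, month = parts
        last_day = calendar.monthrange(year, month)[1]
        first, last = date(year, month, 1), date(year, month, last_day)
    elif len(parts) == 3:  # day granularity
        first = last = date(*parts)
    else:
        raise ValueError(f"unsupported value: {value}")
    return UncertainInterval(first, last, first, last)


# "he visited Germany in 2010": the visit lies somewhere within 2010.
print(to_interval("2010"))
```

Under these assumptions, the single value 2010 becomes a set of bounds stating only that the visit started and ended at some point between 2010-01-01 and 2010-12-31.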

Figure 2.4: Different realization types of date expressions in documents.

EXAMPLES OF DATE EXPRESSIONS IN A NEWS ARTICLE

To illustrate the naming of the realization types of date and time expressions, we give some examples in Figure 2.4. In excerpts of the news article already shown in Figure 1.1, temporal expressions are marked as either explicit, underspecified, or relative. Since the original article contains no implicit temporal expression, we added the last sentence to the example so that all four realization types are covered.

As already pointed out above, there are differences in how temporal expressions of the four realization types are to be normalized. Since these differences are one of the key challenges of temporal tagging, we will cover them in detail in Chapter 4. Before that, we will first lay some further foundations (annotation standards and evaluation methods) and present an overview of relevant research competitions as well as existing annotated data sets in the next chapter.

2.4 SUMMARY OF THE CHAPTER

The most important characteristic of temporal information in the context of temporal tagging is that it can be normalized. For applications exploiting normalized temporal information, it is furthermore important that temporal information is well defined and that it can be organized hierarchically. While there are four types of temporal expressions (date, time, duration, and set expressions), several naming schemes for the realizations of date and time expressions have been suggested in the literature. In the context of temporal tagging, however, we suggest distinguishing between explicit, implicit, relative, and underspecified date and time expressions.

CHAPTER 3

Foundations of Temporal Tagging
