Preferential text classification: learning algorithms and evaluation measures (Journal article)

Label
  • Preferential text classification: learning algorithms and evaluation measures (Journal article) (literal)
Year
  • 2009-01-01T00:00:00+01:00 (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#doi
  • 10.1007/S10791-008-9071-Y (literal)
Alternative label
  • Aiolli F.; Cardin R.; Sebastiani F.; Sperduti A. (2009)
    Preferential text classification: learning algorithms and evaluation measures
    in Information retrieval (Boston); Springer, Heidelberg (Germany)
    (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#autori
  • Aiolli F.; Cardin R.; Sebastiani F.; Sperduti A. (literal)
Start page
  • 559 (literal)
End page
  • 580 (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#url
  • http://www.springerlink.com/content/v86339676k0x3464/ (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#numeroVolume
  • 12 (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#note
  • In: Information Retrieval, vol. 12 (5) pp. 559 - 580. Springer Verlag, 2009. (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#pagineTotali
  • 22 (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#numeroFascicolo
  • 5 (literal)
Note
  • Scopus (literal)
  • ISI Web of Science (WOS) (literal)
  • PuMa (literal)
  • Google Scholar (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#affiliazioni
  • Aiolli, Cardin, Sperduti: Dipartimento di Matematica Pura e Applicata, Università di Padova; Sebastiani: CNR-ISTI, Pisa (literal)
Title
  • Preferential text classification: learning algorithms and evaluation measures (literal)
Abstract
  • In many applicative contexts in which textual documents are labelled with thematic categories, a distinction is made between the primary and the secondary categories that are attached to a given document. The primary categories represent the topics that are central to the document, while the secondary categories represent topics that the document somehow touches upon, albeit peripherally. This distinction has always been neglected in text categorization (TC) research. We contend that the distinction is important, and deserves to be explicitly tackled. The contribution of this paper is three-fold. First, we propose an evaluation measure for this preferential text categorization task, whereby different kinds of misclassifications involving either primary or secondary categories have a different impact on effectiveness. Second, we establish baseline results for this task on a well-known benchmark for patent classification in which the distinction between primary and secondary categories is present; these results are obtained by using state-of-the-art learning technology such as multiclass SVMs (for detecting the unique primary category) and binary SVMs (for detecting the secondary categories). Third, we improve on these results by using a recently proposed class of algorithms explicitly devised for learning from training data expressed in preferential form, i.e. in the form 'for document d_i, category c' is preferred to category c'''; this allows us to distinguish between primary and secondary categories not only in the testing phase but also in the learning phase, thus differentiating their impact on the classifiers to be generated. (literal)