Language Detection
In this section you can query a TexeK Klassifier designed to work as a
language detector.
The Klassifier has been built by simply declaring four classes, one for
each target language:
- 'ang' for English
- 'frn' for French
- 'cas' for Castilian
- 'cat' for Catalan
Then, as examples of each language, three texts extracted from the
Project Gutenberg catalog were used (with the English-language PG
preface templates stripped), specifically:
- class 'ang' refs: 11297, 11298, 11303
- class 'frn' refs: 10061, 10346, 11300
- class 'cas' refs: 10293, 11070, 11302
- class 'cat' refs: 11306, 14768, 14816
The whole process took less than five minutes to deploy. No special
criteria were used to select those references other than ensuring that
no text mixed several languages.
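The preface-stripping step mentioned above can be sketched as follows. This is only an illustration, not the procedure actually used for the demo: the function name strip_pg_template is hypothetical, and the marker wording is an assumption based on the "*** START OF ..." and "*** END OF ..." lines found in many Project Gutenberg plain-text files. Files that use other wording are returned unchanged.

```python
import re

# Assumed marker lines; older Project Gutenberg files may word them
# differently, in which case no stripping happens.
_START = re.compile(
    r"^\*\*\*\s*START OF (THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.IGNORECASE | re.MULTILINE,
)
_END = re.compile(
    r"^\*\*\*\s*END OF (THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.IGNORECASE | re.MULTILINE,
)


def strip_pg_template(text: str) -> str:
    """Return the body of a text without the PG header and footer."""
    start = _START.search(text)
    begin = start.end() if start else 0          # keep everything if no header
    end = _END.search(text, begin)
    finish = end.start() if end else len(text)   # keep everything if no footer
    return text[begin:finish].strip()


sample = (
    "Preface and licence notes\n"
    "*** START OF THIS PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
    "\n"
    "Body line.\n"
    "\n"
    "*** END OF THIS PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
    "More licence text\n"
)
body = strip_pg_template(sample)
```

Stripping matters here because the preface is in English in every file, so leaving it in would put English text into the French, Castilian and Catalan examples.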
To test the TexeK Klassifier, just enter a short phrase in any of the
four languages and press 'submit'. The Klassifier will try to detect the
language used in the entered phrase.
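To give a feel for how a four-class language detector of this kind can work, here is a minimal sketch. It is not the TexeK algorithm, which this page does not describe. It swaps in a simple naive Bayes model over character trigrams with add-one smoothing. The class TrigramClassifier and the one-line training sentences are hypothetical stand-ins for the real examples.

```python
import math
from collections import Counter


def trigrams(text: str) -> list[str]:
    """Lowercase the text and return its overlapping character trigrams."""
    padded = f"  {text.lower()}  "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


class TrigramClassifier:
    """Naive Bayes over character trigrams (illustrative, not TexeK)."""

    def __init__(self) -> None:
        self.counts: dict[str, Counter] = {}

    def train(self, label: str, text: str) -> None:
        # Accumulate trigram counts for the given class.
        self.counts.setdefault(label, Counter()).update(trigrams(text))

    def classify(self, phrase: str) -> str:
        vocab = set()
        for counter in self.counts.values():
            vocab.update(counter)
        vocab_size = len(vocab)

        def score(label: str) -> float:
            counter = self.counts[label]
            total = sum(counter.values())
            # Add-one smoothing keeps unseen trigrams from scoring -inf.
            return sum(
                math.log((counter[g] + 1) / (total + vocab_size))
                for g in trigrams(phrase)
            )

        return max(self.counts, key=score)


# Tiny placeholder examples; the demo used three full texts per class.
clf = TrigramClassifier()
clf.train("ang", "the quick brown fox jumps over the lazy dog and the cat")
clf.train("frn", "le renard brun saute par-dessus le chien paresseux et le chat")
clf.train("cas", "el zorro marrón salta sobre el perro perezoso y el gato")
clf.train("cat", "la guineu marró salta per sobre del gos mandrós i el gat")

guess = clf.classify("the dog and the fox")
```

With examples this small the model is fragile. Longer texts, such as the book-length ones listed above, give much more reliable trigram statistics.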
News Detection
In this section you can query a TexeK Klassifier designed to work as a
news type detector.
This Klassifier has been built by declaring five classes, one for each
usual news type:
- 'bussines'
- 'entertainment'
- 'health'
- 'scitech'
- 'sports'
The training was done by presenting 100 English-language news web pages
for each section, extracted from Google News between 6 and 15 June 2005.
The only consideration was to avoid web redirects and subscription-based
news pages.
To test the TexeK Klassifier, just enter the URL of an English news page
and press 'submit'. The Klassifier will download the page and try to
detect what kind of news the page is about.
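The download step can be pictured with the sketch below. How TexeK actually fetches pages and extracts their text is not described here, so everything in it is an assumption. The names VisibleText, html_to_text and fetch_text are hypothetical, and only the Python standard library is used. The resulting plain text is what would then be handed to a classifier.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class VisibleText(HTMLParser):
    """Collect text nodes, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Reduce an HTML document to its visible text, space-separated."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its visible text (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return html_to_text(response.read().decode(charset, errors="replace"))


page = (
    "<html><head><title>Cup final</title>"
    "<style>p{color:red}</style><script>var x = 1;</script></head>"
    "<body><h1>Team wins</h1><p>Late goal decides the match.</p></body></html>"
)
text = html_to_text(page)
```

Note that urlopen follows HTTP redirects by default, so a real pipeline that wants to avoid redirected pages, as the training set did, would need an extra check.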