
The Part-of-Speech Tagged Corpus: Orchid is an effort to build a Thai text corpus annotated with syntactic word classes. The part-of-speech tagged corpus is not our final goal in constructing Thai text resources for NLP research; rather, it is our first step toward making such resources available. Although there is no consensus on many issues in Thai syntax (such as word and sentence construction or word and sentence classification), we propose an initial standard for use in constructing Orchid. The word classification and the word and sentence segmentation used in Orchid have been verified to some extent in a machine translation system. They are still far from a complete account of Thai syntax, but they are expected to be verified together with the corpus and improved through extensive use on general text.
