The collection and development of naturally occurring language texts has grown rapidly, since large corpora are now regarded as important research resources for language investigation in both natural language processing (NLP) and theoretical linguistics. From such corpora, researchers can automatically extract linguistic information such as word collocations, words with a specific part of speech, and so on. Corpora can also be used to verify linguistic hypotheses or to formulate an account of language phenomena as a "GRAMMAR".
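As a rough illustration of the kind of extraction described above, the following Python sketch counts words carrying a given part-of-speech tag and counts adjacent word pairs as a crude stand-in for collocation. The toy sentences, the tag names (`NOUN`, `ADJ`, and so on), and the helper names `words_with_tag` and `adjacent_pairs` are assumptions made for illustration only; they are not Orchid data, the Orchid tagset, or any published tool.

```python
from collections import Counter

# Toy tagged corpus: each sentence is a list of (word, tag) pairs.
# Words and tags are illustrative placeholders, not Orchid data or its tagset.
tagged_sentences = [
    [("strong", "ADJ"), ("tea", "NOUN"), ("tastes", "VERB"), ("good", "ADJ")],
    [("she", "PRON"), ("drinks", "VERB"), ("strong", "ADJ"), ("tea", "NOUN")],
    [("strong", "ADJ"), ("wind", "NOUN"), ("blows", "VERB")],
]


def words_with_tag(sentences, tag):
    """Count every word annotated with the given part-of-speech tag."""
    return Counter(word for sent in sentences for word, t in sent if t == tag)


def adjacent_pairs(sentences):
    """Count adjacent word pairs within each sentence (a crude collocation proxy)."""
    pairs = Counter()
    for sent in sentences:
        words = [word for word, _ in sent]
        pairs.update(zip(words, words[1:]))
    return pairs


noun_counts = words_with_tag(tagged_sentences, "NOUN")
pair_counts = adjacent_pairs(tagged_sentences)
```

Raw pair counts are only a starting point; a real collocation study would normally apply an association measure and a much larger corpus.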

The Part-of-Speech Tagged Corpus: Orchid aims to build a Thai text corpus annotated with syntactic word classes. The part-of-speech tagged corpus is not our final goal in constructing Thai text resources for NLP research; rather, it is our first step toward making Thai text resources available. Although there is no consensus on many issues in Thai syntax (such as word or sentence construction and word or sentence classification), we propose an initial standard for use in constructing Orchid. The word classification and the word and sentence segmentation used in Orchid have been verified to some extent in a machine translation system. They are not yet close to a full account of Thai syntactic competence, but they are expected to be verified together with the corpus and improved through thorough use on general text.