Sindhi Text Corpus using XML and Custom Tags


Article PDF :

View Full Text PDF

Article type :

Original article

Author :

Zeeshan Bhatti

Volume :

2

Issue :

2

Abstract :

Sindhi, one of the oldest languages in the world, still has very limited use in the digital age because of a lack of digital content. A corpus is extremely important for facilitating natural language processing of a language and its script. This research work addresses the issue of building a corpus for the Sindhi language using XML-based tagging. A tree-based XML tag structure is designed for the Sindhi corpus, with two main nodes: metadata and sindhi Document, the latter containing the main text.
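
The two-node tree structure described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual tag names or schema, so everything here is an assumption made for illustration: the root element `corpusEntry`, the spelling `sindhiDocument`, the child tags `title` and `author`, and the helper `build_corpus_entry` are all hypothetical. Only the split into a metadata node and a document node holding the main text comes from the abstract.

```python
import xml.etree.ElementTree as ET


def build_corpus_entry(title: str, author: str, text: str) -> ET.Element:
    """Build one corpus entry with a metadata node and a text node."""
    # Root element name is hypothetical; the abstract does not name it.
    root = ET.Element("corpusEntry")

    # First main node: descriptive information about the document.
    metadata = ET.SubElement(root, "metadata")
    ET.SubElement(metadata, "title").text = title    # hypothetical child tag
    ET.SubElement(metadata, "author").text = author  # hypothetical child tag

    # Second main node: the Sindhi text itself (tag spelling is assumed).
    document = ET.SubElement(root, "sindhiDocument")
    document.text = text
    return root


# Placeholder values, not taken from the article.
entry = build_corpus_entry("سنڌي", "Example Author", "سنڌي ٻولي")

# encoding="unicode" returns a str and keeps the Sindhi characters readable.
xml_string = ET.tostring(entry, encoding="unicode")
print(xml_string)
```

Keeping the metadata separate from the text node means a tool can read descriptive fields without scanning the document body, which is one plausible reason for a two-branch layout of this kind.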

Keyword :

Corpus, Sindhi, Sindhi Corpus, Natural Language Processing, XML