applications requires the availability of sizeable, reliable, and representative corpora. This paper
describes how we constructed a well-structured, 345 MB tagged corpus of news, and
presents some useful statistics of this corpus based on the characteristics of the Farsi
language. It also examines in detail how well the rank-frequency distribution of Farsi
words fits the Zipf-Mandelbrot law. We then present our measurement of the entropy of Farsi …