@leosmith
Good thoughts here...
So...my program detects that about 10% of all words are verbs...this is a fuzzy number, because I have a lot of different source materials...I'm sure the mix differs between my fiction content, newspaper content, and internet comments sections...
Also, I try to adjust for some skewing where a verb is also another part of speech and the program can't differentiate between the two (most commonly, when there is an adjective that begins with "ma" and there is also a "ma-" verb).
So, that is to say... there are a lot of caveats there.
But, a typical novel will have 300 words per page.
At 10% of words being verbs, that would mean 30 verbs per page.
Among the top 800, assuming they do cover 75% of verb instances, that means an average of 22-23 of the verb instances per page would be in the list of 800, and 7-8 per page would be outside of that list.
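For anyone who wants to plug in their own numbers, here is a rough sketch of that arithmetic in Python. The function name and parameters are just made up for illustration, and the inputs (300 words per page, 10% verbs, 75% coverage) are the same fuzzy estimates as above, not measured values.

```python
def verbs_per_page(words_per_page, verb_pct, coverage_pct):
    """Estimate verb instances per page, split by whether they fall
    inside or outside a frequency list.

    verb_pct and coverage_pct are whole-number percentages (assumed
    estimates, e.g. 10 and 75), not exact measurements.
    """
    verbs = words_per_page * verb_pct / 100   # verb instances per page
    in_list = verbs * coverage_pct / 100      # instances covered by the list
    outside = verbs - in_list                 # instances not in the list
    return verbs, in_list, outside


verbs, in_list, outside = verbs_per_page(300, 10, 75)
print(f"{verbs} verbs per page: {in_list} in the list, {outside} outside")
```

With those inputs it works out to 22.5 in the list and 7.5 outside, which is where the 22-23 and 7-8 come from. Changing the verb percentage or the coverage figure shows how sensitive the per-page number is to the type of text.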
I don't have any of my Tagalog novels at the office with me right now, but I'll do a real-world check of this when I get home sometime.
Again...this will vary wildly with the subject matter. Internet comments, for example, tend to use a much smaller vocabulary.
And again, this 800 verbs list would count "kumain" and "kainin" as separate verbs, for example.