List of entries with sentences? - Page 2

TLDCAdmin (Badges: Admin, Supporter, Serious Supporter, VIP Supporter)
Jan 09 2020, 11:50am CST
@leosmith
Good thoughts here...
 
So... my program detects that, on average, about 10% of all words are verbs. This is a fuzzy number, because I have a lot of different source materials, and I'm sure the mix is different between my fiction content, newspaper content, and internet comment sections.
 
Also, I try to adjust for some skewing where a verb has the same form as another part of speech and the program can't tell the two apart. Most commonly, this is an adjective that begins with "ma" and a "ma-" verb with the same spelling.
 
So, that is to say... there are a lot of caveats there.
 
But a typical novel has roughly 300 words per page.
At 10% of words being verbs, that would mean 30 verbs per page.
Assuming the top 800 verbs do cover 75% of verb instances, that means on average 22-23 of the verb instances per page would be in the list of 800, and 7-8 per page would be outside of that list.
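The per-page estimate above is plain arithmetic. A minimal Python sketch makes each assumption an explicit parameter (300 words per page, a 10% verb share, and 75% coverage by the top-800 list, all taken from the post), so the numbers can be swapped for real counts from an actual novel. The function name is just illustrative.

```python
def verbs_per_page(words_per_page: int = 300,
                   verb_share: float = 0.10,
                   top_list_coverage: float = 0.75) -> tuple[float, float, float]:
    """Return (total verbs, verbs covered by the top list, verbs outside it) per page.

    Defaults are the rough assumptions from the post, not measured values.
    """
    total = words_per_page * verb_share
    covered = total * top_list_coverage
    return total, covered, total - covered


total, covered, outside = verbs_per_page()
print(f"{total:.1f} verbs/page, {covered:.1f} in the top-800 list, {outside:.1f} outside")
# 30.0 verbs/page, 22.5 in the top-800 list, 7.5 outside
```

The 22.5 / 7.5 split is where the "22-23 inside, 7-8 outside" range comes from; a different genre would mostly change `verb_share` and `top_list_coverage`.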
 
I don't have any of my Tagalog novels at the office with me right now, but I'll do a real-world check of this when I get home sometime.
 
Again... this will vary wildly with the subject matter. Internet comments, for example, tend to use a much smaller vocabulary.
 
And again, this 800-verb list would count "kumain" and "kainin" as separate verbs, for example.
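Counting distinct verb forms and counting distinct roots give different totals, since "kumain" and "kainin" are both built on the root kain (eat). A minimal Python sketch, assuming a hypothetical hand-made form-to-root table (a real one would have to come from a dictionary or a morphological analyzer), shows the difference:

```python
# Hypothetical form-to-root lookup, for illustration only.
FORM_TO_ROOT = {
    "kumain": "kain",   # -um- form of kain (eat)
    "kainin": "kain",   # -in form of kain
    "kinain": "kain",   # completed aspect of kainin
    "uminom": "inom",   # -um- form of inom (drink)
}


def count_forms_and_roots(tokens):
    """Return (distinct verb forms, distinct roots) among tokens found in the table."""
    forms = {t for t in tokens if t in FORM_TO_ROOT}
    roots = {FORM_TO_ROOT[t] for t in forms}
    return len(forms), len(roots)


forms, roots = count_forms_and_roots(["kumain", "kainin", "uminom", "kumain"])
print(forms, roots)  # 3 2
```

A form-level list like the 800-verb one corresponds to the first number; a root-level index corresponds to the second.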
 
leosmith
Jan 10 2020, 2:36am CST
Thanks for the detailed answer. Interesting stats for sure.
 
BoraMac (Badge: Supporter)
Jan 10 2020, 4:02am CST
In my solitary pursuit of a verb index against roots...my initial look:
 
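820 Total Roots
770 Roots with at least one conjugation for um, mag, ma, in, an, i

Roots taking each affix (% of the 770 conjugated roots):
361 (47%) Um Verbs
431 (56%) Mag Verbs
494 (64%) Ma Verbs
426 (55%) In Verbs
508 (66%) An Verbs
457 (59%) I Verbs

2677 (348%) Total verbs conjugated (about 3.5 affixed forms per conjugated root)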
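The percentages in this list are relative to the 770 conjugated roots rather than all 820, and they add up to well over 100% because a single root usually takes several affixes. A short Python check reproduces them from the posted counts; the constant and function names are only illustrative.

```python
# Counts as posted: how many roots take each affix.
AFFIX_COUNTS = {"um": 361, "mag": 431, "ma": 494, "in": 426, "an": 508, "i": 457}
CONJUGATED_ROOTS = 770  # roots with at least one of the six affixes (out of 820)


def affix_coverage(counts, n_roots):
    """Share of conjugated roots that take each affix, as whole percentages."""
    return {affix: round(100 * n / n_roots) for affix, n in counts.items()}


coverage = affix_coverage(AFFIX_COUNTS, CONJUGATED_ROOTS)
total_forms = sum(AFFIX_COUNTS.values())
print(coverage)
print(total_forms, round(100 * total_forms / CONJUGATED_ROOTS))  # 2677 348
print(f"{total_forms / CONJUGATED_ROOTS:.2f} affixed forms per conjugated root")
```

Dividing by 820 instead would give lower figures (for example, 361 / 820 is about 44%), so it is worth stating which denominator is meant.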
 
A two-level review by native speakers is underway around the table. Spirited Tagalog all around. Updates to come.
 
TLDCAdmin (Badges: Admin, Supporter, Serious Supporter, VIP Supporter)
Jan 10 2020, 10:34am CST
This is what I got with my data... but again, my data may be skewed in various ways that are difficult to tease out:
 
MA- = 24%
-IN = 19%
-UM- = 15.5%
MAG- = 12.5%
-AN = 8%
I- = 6%
Everything Else = 15%
 
This data is a little tricky, though... for example, my data doesn't count "ma- -an" or "pa- -an" verbs in the "-an" category; it puts them in the "Other" category ("Everything Else" above). So I guess this is up for interpretation, and valid stats may vary.
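How circumfixes are bucketed changes the shares directly. A minimal Python sketch, assuming each verb token has already been tagged with an affix label (the tagging step itself is not shown, and the labels below are toy values, not real data), tallies the categories listed above and sends anything else, including "ma- -an" and "pa- -an", to "Other". The `merge_circumfixes` switch is a hypothetical alternative that counts those two under "-an" instead.

```python
from collections import Counter

# Categories from the post; any other label is counted as "Other".
MAIN_CATEGORIES = ("ma-", "-in", "-um-", "mag-", "-an", "i-")
CIRCUMFIXES = ("ma- -an", "pa- -an")


def affix_distribution(affix_labels, merge_circumfixes=False):
    """Percentage share of each affix category among tagged verb tokens.

    affix_labels holds one label per verb token, e.g. "-um-" or "ma- -an".
    With merge_circumfixes=True (hypothetical), "ma- -an" and "pa- -an"
    are counted under "-an" instead of "Other".
    """
    counts = Counter()
    for label in affix_labels:
        if merge_circumfixes and label in CIRCUMFIXES:
            label = "-an"
        counts[label if label in MAIN_CATEGORIES else "Other"] += 1
    total = sum(counts.values())
    categories = (*MAIN_CATEGORIES, "Other")
    if total == 0:
        return {cat: 0.0 for cat in categories}
    return {cat: 100 * counts[cat] / total for cat in categories}


sample = ["ma-", "-in", "-um-", "-an", "ma- -an", "pa- -an"]  # toy labels
print(affix_distribution(sample)["Other"])                        # about 33.3
print(affix_distribution(sample, merge_circumfixes=True)["-an"])  # 50.0
```

With the same tokens, "-an" goes from about 17% to 50% just by moving the circumfixes, which is why the "-an" and "Other" figures are the most sensitive to this choice.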
 