On a lighter note about this wee project of mine: my blog has had some search-engine traffic from people looking for obscure words that happen to have turned up in the gobbledegoop.
Project details:
Name: Readable gobbledegoop
Aim: Initially, to have a computer generate text that would be physically possible to read. This has since gone off on something of a tangent, towards getting closer and closer to real words.
I scanned a G.A. Henty book to get letter frequencies.
Approximate ratios of the letters of the alphabet (a to z, in order) as they occur:
233,43,60,120,364,69,60,191,177,3,23,108,64,183,220,42,2,172,181,273,82,31,68,3,54,1
As my program has ‘qu’ rather than ‘q’, I subtract all occurrences of ‘q’ from ‘u’ to get a more accurate quantity of ‘u’s. Sadly, doing it like this, and being too tired to rewrite the code to scan for ‘th’ and ‘ch’, I opted to take those two out of my list of characters. They will probably find their way back in some time soon.
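The counting step can be sketched in a few lines of Python. This is only an illustration, not the program used for the figures above: the function name letter_ratios is made up, and it assumes every ‘q’ in the text is followed by a ‘u’.

```python
from collections import Counter
import string

def letter_ratios(text):
    """Count a-z in text, treat 'qu' as one unit, and scale so the rarest letter is 1."""
    counts = Counter(ch for ch in text.lower() if ch in string.ascii_lowercase)
    # Assumption: each 'q' is followed by 'u', so that many 'u's belong to 'qu'.
    counts["u"] = max(counts["u"] - counts["q"], 0)
    # Drop letters that ended up with a zero count.
    present = {letter: n for letter, n in counts.items() if n > 0}
    if not present:
        return {}
    smallest = min(present.values())
    # Express every count relative to the rarest letter, rounded to whole numbers.
    return {letter: round(n / smallest) for letter, n in sorted(present.items())}
```

Feeding it the text of a whole book should give a table of the same shape as the list above, with the rarest letter set to 1.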
I fiddled around, and it was quicker to code a big list of all possible letters, with each one occurring as many times as its ratio says (compared to ‘z’ occurring once). This is more processor intensive than assigning proper weightings, but at 1am I’d much rather the processor did the hard work than me. Besides, my computer is somewhat above average in speed and has more than the average amount of memory.
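For comparison, here are both approaches as a rough Python sketch. It uses the raw ratios from the list above without the ‘qu’, ‘th’ and ‘ch’ adjustments, and the names POOL, pick_from_pool and pick_weighted are invented for the example.

```python
import random
import string

# Ratios for a..z copied from the list above ('z' = 1), with no adjustments.
RATIOS = [233, 43, 60, 120, 364, 69, 60, 191, 177, 3, 23, 108, 64,
          183, 220, 42, 2, 172, 181, 273, 82, 31, 68, 3, 54, 1]
LETTERS = list(string.ascii_lowercase)

# Approach 1: the big list. Each letter is repeated as many times as its ratio.
POOL = [letter for letter, n in zip(LETTERS, RATIOS) for _ in range(n)]

def pick_from_pool(rng):
    """Pick one letter uniformly from the expanded list."""
    return rng.choice(POOL)

def pick_weighted(rng):
    """Approach 2: pick one letter using the ratios directly as weights."""
    return rng.choices(LETTERS, weights=RATIOS, k=1)[0]

if __name__ == "__main__":
    rng = random.Random()
    print("".join(pick_from_pool(rng) for _ in range(20)))
    print("".join(pick_weighted(rng) for _ in range(20)))
```

Both functions draw letters with the same probabilities. The big list costs a few thousand entries of memory, while the weighted version only needs the 26 ratios.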
For any possible critics who believe this is not worth the effort: I immediately tried generating something and obtained the following:
sanurum noda yetadeyofe kerafer rihe sarew hirawahete secono mibubo sine sotocifet huyare desa sim cafihohit ronote sahay sitegu lanud norome lito yoneh vocil rihime hof sane legorodob wuheyo ninode ropifo nehofe hileto dehop ferera miriy hunode cohokeni cidayen lorol.
Obviously I’m not at the capitalisation/punctuation stage, but that is a definite improvement on my last attempts with their gross proliferations of zeds et alia similis.
To do:
Find proper weighting for word length. Same for sentence length. Sentence structure would be good if I could get hold of separate dictionaries of nouns, verbs etc.
Get th’s and ch’s and endings such as s, es, ed and ing in somehow.
I may write a program to search all possible two letter combinations and find which letter combinations are most common.
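That two-letter search might look something like the Python sketch below. It is a guess at the idea rather than the eventual program: instead of testing every possible combination in turn, it counts the pairs that actually occur inside words, and the name common_pairs is made up.

```python
from collections import Counter

def common_pairs(text, top=10):
    """Count adjacent letter pairs inside words and return the most frequent ones."""
    pairs = Counter()
    for word in text.lower().split():
        # Keep only a-z, so punctuation stuck to a word is ignored.
        # Note this joins letters across apostrophes ("don't" counts "nt").
        letters = [ch for ch in word if "a" <= ch <= "z"]
        # Pair each letter with the one after it; pairs never span two words.
        pairs.update(a + b for a, b in zip(letters, letters[1:]))
    return pairs.most_common(top)
```

Run over a whole book, the top of that list is where pairs like ‘th’ and ‘ch’ would be expected to show up, along with any others worth treating as single characters.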
Another attempt:
ror sen pude sonehat raridatoso wipec meyuni vat mam vetel tira webe datag soneke widoh sesecat tetuceter detirewad dal fedirefo dalel soder hel cadifas mar tuwerom raveye seseto wesen tew
With this run, I drastically altered the word length weightings in favour of shorter words.


