ctrl dictionary_english.txt NONE 1.1
Update of /cvsroot/sbbs/ctrl
In directory cvs:/tmp/cvs-serv8999
Added Files:
dictionary_english.txt
Log Message:
This is an English dictionary for spell checking purposes - basically just a list of words, in text format. Note that this must be sorted in order for spell checking to work. SlyEdit's new spell checker feature, for instance, uses a binary search with an array of words loaded from a dictionary to check the validity of words.
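For anyone wondering why the sort order matters: a binary search only works if the list it probes is ordered the same way the comparison is done. Below is a minimal Python sketch of that idea, not SlyEdit's actual code. The names `load_words` and `is_word` are made up for illustration, and lowercasing everything is an assumption about how case might be handled.

```python
from bisect import bisect_left

def load_words(lines):
    """Strip, lowercase, de-duplicate and sort the words so that
    binary search is valid (hypothetical helper)."""
    return sorted({line.strip().lower() for line in lines if line.strip()})

def is_word(words, candidate):
    """Binary-search the sorted list for the lowercased candidate."""
    key = candidate.lower()
    i = bisect_left(words, key)
    return i < len(words) and words[i] == key

# Tiny stand-in for the lines of a dictionary file.
words = load_words(["apple\n", "zebra\n", "Color\n", "banana\n"])
found = is_word(words, "color")
missing = is_word(words, "colour")
```

If the file were not sorted (or were sorted with a different comparison than the one the search uses), lookups like this could miss words that are actually in the list.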
SlyEdit's new spell check feature assumes dictionary filenames are dictionary_<language>.txt, where <language> is the language name.
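The naming convention could be handled with something like the following Python sketch. The function names and the set of characters allowed in `<language>` are assumptions for illustration; the only part taken from the message above is the `dictionary_<language>.txt` pattern.

```python
import re

# Assumed: <language> consists of letters, digits and hyphens.
DICT_NAME = re.compile(r"^dictionary_(?P<language>[A-Za-z0-9-]+)\.txt$")

def dictionary_filename(language):
    """Build the dictionary filename for a language name."""
    return f"dictionary_{language}.txt"

def language_from_filename(filename):
    """Return the <language> part, or None if the name does not match."""
    match = DICT_NAME.match(filename)
    return match.group("language") if match else None
```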
So... would that be US-English or UK-English or Australian-English? I would recommend using standard language tags: https://en.wikipedia.org/wiki/Language_localisation
Re: ctrl/dictionary_english.txt
By: Digital Man to nightfox on Fri May 24 2019 06:04 pm
So... would that be US-English or UK-English or Australian-English? I would recommend using standard language tags: https://en.wikipedia.org/wiki/Language_localisation
Thanks, that's probably a good idea. I've replaced that dictionary with several localized English dictionaries.
My original idea was that BBSes these days usually get callers from all over the world, so perhaps just one English dictionary that covered all regions might be simpler. But it probably makes more sense to have the localized dictionaries.
Most English dictionaries are going to be 99% the same, but of course we 'mericans decided to spell things differ'nt: color, honor, favorite, etc. Maybe one common/main English dictionary and then "diff" files for each sub-region (which would add or remove words). That'd reduce a lot of redundancy and *possibly* (?) make the dictionary maintenance easier.
So right now you have:
1,321,370 dictionary_en-AU.txt
1,318,337 dictionary_en-CA.txt
1,313,337 dictionary_en-GB.txt
1,318,297 dictionary_en-US.txt
Instead, you could have:
~1M dictionary_en.txt
~1-5k dictionary_en-AU.txt
~1-5k dictionary_en-CA.txt
~1-5k dictionary_en-GB.txt
~1-5k dictionary_en-US.txt
Disk space isn't really an issue, so if it's easier/better/faster to have multiple 99%-dupe dictionary files (as you have), then that's fine with me.
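To make the "diff" idea concrete, here is a rough Python sketch. The diff format is purely an assumption for illustration (one word per line, prefixed with `+` to add or `-` to remove); nothing in this thread defines an actual format, and `apply_diff` is a hypothetical name.

```python
def apply_diff(base_words, diff_lines):
    """Apply a regional diff to the common word list and return a
    sorted list, ready for binary search.

    Assumed format: '+word' adds a word, '-word' removes one."""
    words = set(base_words)
    for raw in diff_lines:
        line = raw.strip()
        if not line:
            continue
        op, word = line[0], line[1:]
        if op == "+":
            words.add(word)
        elif op == "-":
            words.discard(word)
        else:
            raise ValueError(f"unrecognized diff line: {line!r}")
    return sorted(words)

# Stand-ins for dictionary_en.txt and a dictionary_en-GB.txt diff.
common = ["apple", "color", "honor"]
en_gb_diff = ["-color", "+colour", "-honor", "+honour"]
en_gb = apply_diff(common, en_gb_diff)
print(en_gb)  # ['apple', 'colour', 'honour']
```

Re-sorting after the merge keeps the result usable by a binary search, at the cost of a one-time sort when the dictionaries are loaded.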
The diff dictionaries would probably work fine. I agree it would probably be easier to maintain. And it should also cut down on the time taken to search multiple dictionaries in case a word isn't found in one. The binary search is fairly quick already, so speed doesn't seem to be much of an issue right now. Even when I'm using all 4 dictionaries, matching a non-word seems pretty much immediate.
As long as you're not trying to spell-check as the user types. To do that, you probably would need to use an in-memory trie: https://en.wikipedia.org/wiki/Trie
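For reference, a bare-bones in-memory trie might look like the Python sketch below. This is only an illustration of the data structure from the linked article, not something SlyEdit has; the class and method names are made up. The useful property for as-you-type checking is the prefix test: you can tell whether the characters typed so far could still turn into a word.

```python
class Trie:
    """Minimal trie: each node maps a character to a child node."""

    def __init__(self):
        self.children = {}
        self.is_word = False

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_word = True

    def _find(self, prefix):
        """Return the node reached by following prefix, or None."""
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        node = self._find(word)
        return node is not None and node.is_word

    def has_prefix(self, prefix):
        return self._find(prefix) is not None

trie = Trie()
for w in ["color", "colour", "honor"]:
    trie.insert(w)
```

Here `trie.has_prefix("col")` is true while `trie.contains("col")` is false, so a partially typed word would not be flagged until it can no longer lead to any dictionary entry.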