Forum : FC BBS

ctrl/dictionary_english.txt

From nightfox@1:103/705 to CVS commit on Fri May 24 17:29:02 2019

ctrl dictionary_english.txt NONE 1.1
Update of /cvsroot/sbbs/ctrl
In directory cvs:/tmp/cvs-serv8999

Added Files:
dictionary_english.txt
Log Message:
This is an English dictionary for spell checking purposes - basically just a list of words, in text format. Note that this must be sorted in order for spell checking to work. SlyEdit's new spell checker feature, for instance, uses a binary search with an array of words loaded from a dictionary to check the validity of words.
SlyEdit's new spell check feature assumes dictionary filenames are dictionary_<language>.txt, where <language> is the language name.
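
For readers who want to see the lookup described in the log message, here is a minimal sketch in Python. It is not SlyEdit's code. The helper names (dictionary_filename, load_dictionary, is_word) are hypothetical, and lower-casing every word is an assumption of the sketch.

```python
from bisect import bisect_left


def dictionary_filename(language):
    """Build a file name following the dictionary_<language>.txt convention."""
    return "dictionary_" + language + ".txt"


def load_dictionary(lines):
    """Return a sorted, de-duplicated list of lower-cased words.

    Sorting again here is only a safety net (an assumption of this sketch);
    the dictionary file is expected to be sorted already.
    """
    return sorted({line.strip().lower() for line in lines if line.strip()})


def is_word(words, candidate):
    """Binary-search the sorted list for the candidate word."""
    target = candidate.lower()
    i = bisect_left(words, target)
    return i < len(words) and words[i] == target


words = load_dictionary(["apple\n", "banana\n", "cherry\n"])
print(dictionary_filename("english"))  # dictionary_english.txt
print(is_word(words, "Banana"))        # True
print(is_word(words, "bananna"))       # False
```

The binary search only gives correct answers if the list is ordered the same way the comparison orders strings, which is why the file has to be sorted.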

--- SBBSecho 3.07-Linux
* Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)

From Digital Man@1:103/705 to nightfox on Fri May 24 18:04:05 2019

Re: ctrl/dictionary_english.txt
By: nightfox to CVS commit on Fri May 24 2019 05:29 pm

So... would that be US-English or UK-English or Australian-English? I would recommend using standard language tags: https://en.wikipedia.org/wiki/Language_localisation

digital man

This Is Spinal Tap quote #42:
What day the Lord created Spinal Tap and couldn't he have rested on that day?
Norco, CA WX: 70.1°F, 51.0% humidity, 11 mph E wind, 0.00 inches rain/24hrs
--- SBBSecho 3.07-Linux
* Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)

From Nightfox@1:103/705 to Digital Man on Sat May 25 13:36:10 2019

Re: ctrl/dictionary_english.txt
By: Digital Man to nightfox on Fri May 24 2019 06:04 pm

Thanks, that's probably a good idea. I've replaced that dictionary with several localized English dictionaries.

My original idea was that BBSes these days usually get callers from all over the world, so perhaps just one English dictionary that covered all regions might be simpler. But it probably makes more sense to have the localized dictionaries.

Nightfox

---
■ Synchronet ■ Digital Distortion: digitaldistortionbbs.com
* Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)

From Digital Man@1:103/705 to Nightfox on Sat May 25 14:12:18 2019

Re: ctrl/dictionary_english.txt
By: Nightfox to Digital Man on Sat May 25 2019 01:36 pm

Most English dictionaries are going to be 99% the same, but of course we 'mericans decided to spell things differ'nt: color, honor, favorite, etc. Maybe one common/main English dictionary and then "diff" files for each sub-region (which would add or remove words). That'd reduce a lot of redundancy and *possibly* (?) make the dictionary maintenance easier.

So right now you have:
1,321,370 dictionary_en-AU.txt
1,318,337 dictionary_en-CA.txt
1,313,337 dictionary_en-GB.txt
1,318,297 dictionary_en-US.txt

Instead, you could have:

~1M dictionary_en.txt
~1-5k dictionary_en-AU.txt
~1-5k dictionary_en-CA.txt
~1-5k dictionary_en-GB.txt
~1-5k dictionary_en-US.txt

Disk space isn't really an issue, so if it's easier/better/faster to have multiple 99%-dupe dictionary files (as you have), then that's fine with me.
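
A rough sketch of the base-plus-diff idea, in Python. The diff format here is purely an assumption for illustration ("+word" adds a word, "-word" removes one); no such format is defined anywhere in this thread. The function name apply_diff and the sample entries are hypothetical too.

```python
def apply_diff(base_words, diff_lines):
    """Build a sorted regional word list from a base list and a diff.

    Assumed (hypothetical) diff format: "+word" adds, "-word" removes.
    """
    words = set(base_words)
    for line in diff_lines:
        entry = line.strip()
        if not entry:
            continue  # skip blank lines
        op, word = entry[0], entry[1:].strip()
        if op == "+":
            words.add(word)
        elif op == "-":
            words.discard(word)
        else:
            raise ValueError("unrecognized diff line: " + entry)
    return sorted(words)


# Made-up sample data, not taken from the real dictionary files.
base = ["color", "tap", "word"]
en_gb_diff = ["-color", "+colour"]
print(apply_diff(base, en_gb_diff))  # ['colour', 'tap', 'word']
```

The result is sorted again after the diff is applied, so it can still be searched with a binary search.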

digital man

Synchronet "Real Fact" #14:
SBBSecho was originally written by Allen Christiansen (King Drafus) in 1994.
Norco, CA WX: 69.0°F, 59.0% humidity, 7 mph NE wind, 0.00 inches rain/24hrs
--- SBBSecho 3.07-Linux
* Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)

From Nightfox@1:103/705 to Digital Man on Sat May 25 22:23:38 2019

The diff dictionaries would probably work fine. I agree it would probably be easier to maintain. And it should also cut down on the time taken to search multiple dictionaries in case a word isn't found in one. The binary search is fairly quick already, so speed doesn't seem to be much of an issue right now. Even when I'm using all 4 dictionaries, matching a non-word seems pretty much immediate.
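
To illustrate why a miss costs more with several dictionaries, here is a small Python sketch, not SlyEdit's code. The names in_sorted and is_known are hypothetical. A word that is in none of the lists needs one full binary search per list before it can be rejected.

```python
from bisect import bisect_left


def in_sorted(words, target):
    """Binary search for target in one sorted word list."""
    i = bisect_left(words, target)
    return i < len(words) and words[i] == target


def is_known(word, dictionaries):
    """True if any of the sorted lists contains the word.

    any() stops at the first hit, so a valid word is usually found early;
    a non-word has to be searched for in every list.
    """
    target = word.lower()
    return any(in_sorted(words, target) for words in dictionaries)


# Made-up sample lists standing in for two regional dictionaries.
dictionaries = [sorted(["colour", "honour"]), sorted(["color", "honor"])]
print(is_known("Color", dictionaries))  # True
print(is_known("colr", dictionaries))   # False
```

Each binary search takes on the order of log2(n) comparisons for a list of n words, which fits the observation that even four searches feel immediate.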

Nightfox

---
■ Synchronet ■ Digital Distortion: digitaldistortionbbs.com
* Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)

From Digital Man@1:103/705 to Nightfox on Sun May 26 01:22:48 2019

Re: ctrl/dictionary_english.txt
By: Nightfox to Digital Man on Sat May 25 2019 10:23 pm

As long as you're not trying to spell-check as the user types. To do that, you probably would need to use an in-memory trie: https://en.wikipedia.org/wiki/Trie
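
For reference, a minimal in-memory trie sketch in Python. The class and method names (TrieNode, Trie, contains, has_prefix) are hypothetical and this is only one simple way to build one. The prefix check is what makes it a fit for checking as the user types: a partly typed word can be tested after each keystroke.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # maps one character to the next node
        self.is_word = False  # True if a word ends at this node


class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def _walk(self, text):
        """Follow text from the root; return the final node or None."""
        node = self.root
        for ch in text:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        node = self._walk(word)
        return node is not None and node.is_word

    def has_prefix(self, prefix):
        return self._walk(prefix) is not None


trie = Trie(["spell", "spelling", "tap"])
print(trie.has_prefix("spe"))  # True: could still become a word
print(trie.has_prefix("spx"))  # False: no word starts this way
print(trie.contains("spell"))  # True
print(trie.contains("spel"))   # False: a prefix only, not a word
```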

digital man

This Is Spinal Tap quote #5:
Nigel Tufnel: Authorities said... best leave it... unsolved.
Norco, CA WX: 54.6°F, 89.0% humidity, 2 mph ESE wind, 0.00 inches rain/24hrs
--- SBBSecho 3.07-Linux
* Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)

From Nightfox@1:103/705 to Digital Man on Mon May 27 15:06:18 2019

It doesn't do spell checking as the user types, only when the user starts the spell check process. It then looks up each word and prompts the user for a correction if it finds an invalid word.
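
The on-demand pass described above could look roughly like this in Python. This is a sketch, not SlyEdit's code. The names is_word and spell_check are hypothetical, the word-matching pattern is an assumption, and the ask callback stands in for whatever prompt the editor shows the user.

```python
import re
from bisect import bisect_left

# Assumed definition of a "word": letters, with optional inner apostrophes.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def is_word(words, candidate):
    """Binary-search the sorted word list for the candidate."""
    target = candidate.lower()
    i = bisect_left(words, target)
    return i < len(words) and words[i] == target


def spell_check(text, words, ask):
    """Look up each word; replace unknown ones with whatever ask(word) returns."""
    def fix(match):
        token = match.group()
        return token if is_word(words, token) else ask(token)
    return WORD_RE.sub(fix, text)


words = sorted(["is", "spelled", "this", "wrong"])
# A fixed answer stands in for the user's correction here.
print(spell_check("This is speled wrong", words, lambda w: "spelled"))
# This is spelled wrong
```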

Nightfox

---
■ Synchronet ■ Digital Distortion: digitaldistortionbbs.com
* Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
