
-----------------------------------
deltatux
Mon Nov 17, 2008 10:55 pm

Word Count in C
-----------------------------------
Hi,

I was wondering if anyone has any helpful resources on how to do word counts in C. I have been trying to find simple, easy-to-understand C examples of how to do this, but to no avail. I tried Googling too, but can't seem to find any good examples.

If anyone can help, please help and point me to great examples!

Many thanks,
deltatux

P.S: To XKCD readers, no pointers please, you know what I mean =P.

-----------------------------------
A.J
Mon Nov 17, 2008 11:27 pm

Re: Word Count in C
-----------------------------------

Don't worry... no pointers :D

By word count, do you mean counting the number of words in a sentence?

If yes, all you have to do is keep a variable that stores the number of words you have so far (it should start at 0), and increase it by one every time you read a string...
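
A minimal sketch of that idea in C, assuming "a string" means a run of non-whitespace characters. The function name `count_strings` and the 255-character limit per string are illustrative choices, not something from the thread; a longer run would be split and counted more than once.

```c
#include <stdio.h>

/* Hypothetical helper: counts whitespace-separated strings in text. */
int count_strings(const char text[])
{
    char word[256];   /* scratch buffer for one string at a time */
    int count = 0;    /* number of strings seen so far, starts at 0 */
    int offset = 0;   /* how far into text we have read */
    int used = 0;     /* characters consumed by the last sscanf call */

    /* %255s skips leading whitespace and reads one string;
       %n records how many characters that took, so the next
       call can resume right after it. */
    while (sscanf(&text[offset], "%255s%n", word, &used) == 1) {
        count++;
        offset += used;
    }
    return count;
}
```

Calling `count_strings` on a line of text returns the number of strings that `sscanf` could read from it, so extra spaces between or around the words do not change the result.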


I have a feeling that this isn't what you want... sorry if this didn't help.

Please clarify what you mean by 'word count'.

-----------------------------------
deltatux
Tue Nov 18, 2008 12:55 am

Re: Word Count in C
-----------------------------------
for example:

Hello I am deltatux


The software should say that there are 4 words.

Thanks,
deltatux

-----------------------------------
Euphoracle
Tue Nov 18, 2008 7:06 am

RE:Word Count in C
-----------------------------------
Well, you can count the number of spaces in the sequence, and add 1 for each complete sentence, assuming you're not counting hyphenated words as two individual words, or putting two spaces after punctuation.  If the latter is true, feel free to keep track of whether the last character you checked was a space or punctuation, and if it was, ignore further spaces until it isn't.  You can reset your "add 1 'counter'" by determining if you've passed over punctuation.  The general structure of a simple starting sentence is:

and for a simple sentence further in:



I have a blue dog.  His name is Cody.


4 spaces in first sentence, add 1 = 5 words.
5 spaces in second sentence, remove 2 from start, add 1 = 4 words.

Total = 5 words + 4 words = 9 words.
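
A rough sketch of the space-counting idea in C, under the assumption that only the space character separates words. Instead of subtracting the extra spaces afterwards, it counts a space only when the previous character was not a space, then adds 1 for the final word. The name `words_by_spaces` is made up for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: counts words by counting the first space after
   each word, then adding 1 for a last word that has no space after it. */
size_t words_by_spaces(const char text[])
{
    size_t count = 0;
    int prev_space = 1;   /* pretend the text starts after a space,
                             so leading spaces are ignored */

    for (size_t i = 0; text[i] != '\0'; i++) {
        int is_space = (text[i] == ' ');
        if (is_space && !prev_space)
            count++;      /* first space after a word closes that word */
        prev_space = is_space;
    }
    if (!prev_space)
        count++;          /* the "add 1" for the final word */
    return count;
}
```

With the two-sentence example above, the double space after "dog." is counted once, so the result is 9 without a separate per-sentence adjustment. Hyphenated words still count as one word here.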

-----------------------------------
A.J
Tue Nov 18, 2008 11:51 am

Re: Word Count in C
-----------------------------------



That doesn't necessarily work. What if the sentence is:

 Hi, I am A.J.


This has a space before the sentence...

I would say FIRST remove the spaces before and after the sentence, remove all punctuation, then count the spaces and add 1.

-----------------------------------
deltatux
Tue Nov 18, 2008 1:19 pm

RE:Word Count in C
-----------------------------------
This is very confusing ... can someone explain it a bit simpler?

Many thanks,
deltatux

-----------------------------------
md
Tue Nov 18, 2008 2:09 pm

RE:Word Count in C
-----------------------------------
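read string
index = 0
word count = 0
while index < length of string
    while index < length of string and character in string at index is a space
        index++
    if index < length of string
        word count += 1
    while index < length of string and character in string at index is not a space
        index++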
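
For anyone who wants to try this, here is one possible C version of the index-based scan. It is only a sketch: the function name `count_words` is an assumption, only the space character is treated as a separator, and every inner loop checks the index against the length so it cannot run past the end of the string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts space-separated words in s
   without modifying the string. */
size_t count_words(const char s[])
{
    size_t len = strlen(s);
    size_t index = 0;
    size_t words = 0;

    while (index < len) {
        /* skip the run of spaces before the next word, if any */
        while (index < len && s[index] == ' ')
            index++;
        if (index == len)
            break;        /* only trailing spaces were left */
        words++;
        /* walk to the end of the current word */
        while (index < len && s[index] != ' ')
            index++;
    }
    return words;
}
```

Because spaces are skipped before a word is counted, leading, trailing and repeated spaces do not produce extra words, and the input string is left untouched.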


-----------------------------------
A.J
Tue Nov 18, 2008 3:53 pm

Re: Word Count in C
-----------------------------------
Then again, that counts the spaces before and after the string.

Do exactly what md did, but first replace all spaces.



-----------------------------------
Euphoracle
Tue Nov 18, 2008 4:19 pm

Re: Word Count in C
-----------------------------------

A.J wrote:

    That doesn't necessarily work. What if the sentence is:

         Hi, I am A.J.

    This has a space before the sentence...

    I would say FIRST remove the spaces before and after the sentence, remove all punctuation, then count the spaces and add 1.

That was assuming that you're not giving it silly data.  Also...

0 + 1 = 1; (Hi,)
2 + 1 = 3; (, I am A.)
0 + 1 = 1; (.J.)

= 5.

A.J. is a matter of preference, imo.  It can be one or two words, I guess; I don't know how you want it.  I'd count it as two, just because it stands for two words, and it's not a recognized acronym like RADAR or NASA or COBOL and the like.
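
If punctuation is treated as a separator in the same way as spaces, the count of 5 falls out of a small state machine. The sketch below assumes a word is any run of letters or digits; the name `count_words_punct` is illustrative only. Note that under this rule a hyphenated word counts as two words.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts runs of letters/digits, so spaces and
   punctuation both act as word separators. */
size_t count_words_punct(const char text[])
{
    size_t words = 0;
    int in_word = 0;   /* 1 while we are inside a run of letters/digits */

    for (size_t i = 0; text[i] != '\0'; i++) {
        if (isalnum((unsigned char)text[i])) {
            if (!in_word) {
                words++;       /* first character of a new word */
                in_word = 1;
            }
        } else {
            in_word = 0;       /* space or punctuation ends the word */
        }
    }
    return words;
}
```

Here "A.J." is split into "A" and "J", which matches counting it as two words. Keeping it as one word would need a different rule, for example separating on spaces only.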

-----------------------------------
A.J
Tue Nov 18, 2008 6:16 pm

Re: Word Count in C
-----------------------------------




Well, technically I put a space before the 'Hi, I am A.J' (look at it again)...

But if you don't get messed-up test cases, then Euphoracle is right...
Sorry for the confusion, deltatux.


-----------------------------------
md
Tue Nov 18, 2008 6:54 pm

Re: Word Count in C
-----------------------------------
A.J wrote:

    Then again, that counts the spaces before and after the string.

    Do exactly what md did, but first replace all spaces.



In fact, that algorithm does not count spaces twice. And replacing spaces first (with what?) will break it entirely, since it requires spaces as word separators.

-----------------------------------
A.J
Tue Nov 18, 2008 7:35 pm

Re: Word Count in C
-----------------------------------
What I meant was to remove the spaces before the words... but nevertheless, it would still work, md :D

-----------------------------------
md
Tue Nov 18, 2008 10:15 pm

RE:Word Count in C
-----------------------------------
The space after one word is the space before another ;-)

Incidentally, why mangle a string which you probably need somewhere else just to count words? Algorithms that destroy data do have their place; but it is not here.

-----------------------------------
Vermette
Wed Nov 19, 2008 10:21 am

RE:Word Count in C
-----------------------------------
Publisher style word count:

Book.length/5 ;)

-----------------------------------
Euphoracle
Wed Nov 19, 2008 3:14 pm

Re: Word Count in C
-----------------------------------

A.J wrote:

    Well, technically I put a space before the 'Hi, I am A.J' (look at it again)...

    But if you don't get messed-up test cases, then Euphoracle is right...



Which is why I said "That was assuming that you're not giving it silly data."  I removed the space.

-----------------------------------
Okapi
Fri Nov 21, 2008 10:42 am

RE:Word Count in C
-----------------------------------
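Here's another solution to your problem (I remember reading this algorithm in a C book by Dennis Ritchie):

#include <stdio.h>
enum { NO, YES };   /* NO: between words, YES: inside a word */
int main(void)
{
    int c, state = NO, word = 0;
    while ((c = getchar()) != EOF) {   /* parentheses needed: != binds tighter than = */
        if (c == ' ' || c == '\n' || c == '\t')
            state = NO;
        else if (state == NO) {        /* first character of a new word */
            state = YES;
            word++;
        }
    }
    printf("%d\n", word);
    return 0;
}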

-----------------------------------
deltatux
Sat Nov 29, 2008 3:40 pm

RE:Word Count in C
-----------------------------------
Hey guys,

Sorry for taking so long to update; I was caught up with tons of other stuff and this fell off my priority list.

Thanks for the tips, I'll rework it.

deltatux
