regex - Remove stopwords from a string in Java
I have a lot of text whose words I need to count.
But I want to skip a few words that carry no meaning on their own (stop words).
So I have a file with all the words that I want to ignore. I open this file and load it into a list declared as
ArrayList<String> stopWordsList; Now I have a string that needs to be cleaned by removing every stop word in that list.
I tried something like this:
    String example = "Job in software factory. Work with playful, spring, hibernation, GWT, etc.";
    for (String stopWord : stopWordsList) {
        example = example.replaceAll(" " + stopWord + " ", " ");
    }
After that, the example string should be: "Job software factory. Work playful, spring, hibernation, GWT."
The problem is words like "etc." that are followed by punctuation instead of a space. For those I tried:
    for (String stopWord : stopWordsList) {
        example = example.replaceAll(" " + stopWord + " ", " ");
        example = example.replaceAll(" " + stopWord + ",", ",");
        example = example.replaceAll(" " + stopWord + ".", ".");
    }
But this does not seem right, and I would rather not do it this way.
Can anyone help me find a way to remove the stop words from this string whether they are followed by punctuation or by spaces?
PS: I cannot simply do
    example = example.replaceAll(stopWord, "");
because it would break other words. For a word like "initial" it would remove "in" and leave me with "itial".
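To make the problem concrete, here is a minimal sketch of the naive removal. The class and method names are hypothetical and only serve the illustration.

```java
public class SubstringRemovalDemo {

    // Hypothetical helper: removes every occurrence of stopWord,
    // even when it appears inside a longer word.
    static String removeNaively(String text, String stopWord) {
        return text.replaceAll(stopWord, "");
    }

    public static void main(String[] args) {
        // "in" is matched at the start of "initial", so the word is damaged.
        System.out.println(removeNaively("initial", "in")); // prints "itial"
    }
}
```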
The easiest way is to split the string at word boundaries and append everything except the stop words.
    StringBuilder result = new StringBuilder(example.length());
    for (String s : example.split("\\b")) {
        if (!stopWordsSet.contains(s)) result.append(s);
    }
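The word-boundary approach from the answer can be fleshed out into a runnable sketch. The class name, the method name, the lower-casing of tokens and the final whitespace cleanup are assumptions added for illustration, not part of the original answer. The stop words are held in a HashSet so that each lookup is cheap.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class StopWordFilter {

    // Splits the text at word boundaries and keeps every token
    // that is not in the stop word set.
    static String removeStopWords(String text, Set<String> stopWords) {
        StringBuilder result = new StringBuilder(text.length());
        for (String token : text.split("\\b")) {
            // Tokens alternate between words and the spaces or punctuation
            // between them. Assumption: the set holds lower-case words.
            if (!stopWords.contains(token.toLowerCase())) {
                result.append(token);
            }
        }
        // Assumption: the gaps left by removed words should be collapsed
        // into single spaces.
        return result.toString().replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("in", "with", "etc"));
        // "initial" survives because only whole tokens are compared.
        System.out.println(removeStopWords("initial work in progress", stopWords));
        // prints "initial work progress"
    }
}
```

Note that punctuation around a removed word stays in place, so a trailing ", etc." would still leave its comma and period behind. Handling that would need an extra cleanup step.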