regex - Remove stopwords from a string in Java


I have some words that I need to count.

But I want to ignore a few words that are not relevant.

So I have a file with all the words that I want to ignore. I open this file and build a list, which I have called

  ArrayList<String> stopWordsList;
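Building that list might look roughly like the sketch below. The question does not show the file, so the one-word-per-line layout, the class name `StopWordLoader` and both method names are assumptions made for illustration only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original question.
class StopWordLoader {

    // Keeps one stop word per non-blank line, trimmed of surrounding whitespace.
    static ArrayList<String> fromLines(List<String> lines) {
        ArrayList<String> stopWordsList = new ArrayList<>();
        for (String line : lines) {
            String word = line.trim();
            if (!word.isEmpty()) {
                stopWordsList.add(word);
            }
        }
        return stopWordsList;
    }

    // Reads the file as UTF-8; the one-word-per-line layout is an assumption.
    static ArrayList<String> fromFile(Path file) {
        try {
            return fromLines(Files.readAllLines(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```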

Now I have a string that needs to be cleaned by removing the stop words in the list.

I tried something like this:

  String example = "Job in software factory. Work with play, spring, hibernate, GWT, etc.";
  for (String stopWord : stopWordsList) {
      example = example.replaceAll(" " + stopWord + " ", " ");
  }

After that, the example string should be:

"Job software factory. Work play, spring, hibernate, GWT."

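The problem is the "etc." at the end. To handle it, I would have to do:

  for (String stopWord : stopWordsList) {
      example = example.replaceAll(" " + stopWord + " ", " ");
      example = example.replaceAll(" " + stopWord + ",", ",");
      // The dot is escaped so it matches a literal period, not any character.
      example = example.replaceAll(" " + stopWord + "\\.", ".");
  }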

But this is not right, and I do not want to do it this way.

Can anyone help me find a way to clean this string when the stop words come right before punctuation or spaces?

PS: I cannot simply do

  example = example.replaceAll(stopWord, "");

because it can break words like "initial": it would remove "in" and leave me with "itial".
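The pitfall can be reproduced with a small sketch. The class and method names are hypothetical, and the `\b`-anchored variant is included only for contrast. It is one possible regex-based approach, not something the question itself proposes.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question.
class NaiveReplaceDemo {

    // Removes every occurrence of the stop word, even inside longer words.
    static String naive(String text, String stopWord) {
        return text.replaceAll(stopWord, "");
    }

    // Removes the stop word only when it stands alone between word boundaries.
    // Pattern.quote keeps regex metacharacters in the stop word literal.
    static String wholeWord(String text, String stopWord) {
        return text.replaceAll("\\b" + Pattern.quote(stopWord) + "\\b", "");
    }

    public static void main(String[] args) {
        System.out.println(naive("initial", "in"));      // prints "itial"
        System.out.println(wholeWord("initial", "in"));  // prints "initial"
    }
}
```

Note that the whole-word variant removes only the word itself, so the spaces on both sides of a removed stop word stay in the string.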

The easiest way is to split the string on word boundaries and append everything except the stop words.

  // stopWordsSet is assumed to be a Set<String> holding the stop words.
  StringBuilder result = new StringBuilder(example.length());
  for (String s : example.split("\\b")) {
      if (!stopWordsSet.contains(s)) result.append(s);
  }
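For reference, a self-contained sketch of the word-boundary split is shown here. The class name `StopWordFilter`, the method name, the three stop words and the shortened sample sentence are all assumptions for illustration, since the contents of the stop word file are not shown.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical wrapper around the word-boundary split approach.
class StopWordFilter {

    // Splits on word boundaries and re-appends every token that is not a stop word.
    // Separators (spaces, punctuation) are tokens too, so they are kept as-is.
    static String removeStopWords(String example, Set<String> stopWordsSet) {
        StringBuilder result = new StringBuilder(example.length());
        for (String s : example.split("\\b")) {
            if (!stopWordsSet.contains(s)) {
                result.append(s);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // Assumed stop words; a HashSet gives constant-time lookups.
        Set<String> stopWordsSet = new HashSet<>(Arrays.asList("in", "with", "etc"));
        String example = "Job in software factory. Work with spring, GWT, etc.";
        // prints "Job  software factory. Work  spring, GWT, ."
        System.out.println(removeStopWords(example, stopWordsSet));
    }
}
```

The lookup is case-sensitive, and because only the words are dropped, doubled spaces and the dangling ", ." remain. If that matters, a separate whitespace and punctuation cleanup pass would probably be needed afterwards.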
