Parsing HTML tags using Java


I am trying to create an HTML parser that checks the HTML tags and confirms that there is a closing tag for every opening tag.

I have it partially working now and I believe the logic is correct, but I'm having trouble getting the tokens right. When I run the code, there are many empty tokens, which get compared against the non-empty ones, so obviously an error occurs.

I'm wondering how I can read in my HTML file but store only the things between < and >. I don't even want to keep the text between the tags as extra data, for example the text inside an h1 tag.

This is for schoolwork, and I believe the professor wants us to do it without using a third-party program, like JTD?

Any help is greatly appreciated.

    import java.util.Scanner;
    import java.util.StringTokenizer;
    import java.io.*;

    public class HTMLDriver {
        public static void main(String[] args) throws IOException {
            // declare variables
            QueueReferenceBased queue = new QueueReferenceBased();

            // create a Scanner object
            Scanner in = new Scanner(System.in);
            System.out.println("What is your html file name?");
            String filename = in.next();
            File userFile = new File(filename);
            if (!userFile.exists()) {
                System.out.println("The file does not exist. This program will now exit.");
                System.exit(0);
            }

            Scanner inputFile = new Scanner(userFile);
            while (inputFile.hasNext()) {
                String str = inputFile.nextLine();
                StringTokenizer st = new StringTokenizer(str, "");
                // add the tokens to the queue
                while (st.hasMoreTokens()) {
                    String token = st.nextToken();
                    Tag t = new Tag(token);
                    queue.enqueue(t);
                }
            }

            // build the stack
            StackReferenceBased stack = new StackReferenceBased();
            // loop while the queue is not empty
            while (!queue.isEmpty()) {
                Object obj = queue.dequeue();
                Tag t2 = (Tag) obj;
                if (t2.getIsOpen() == true) {
                    stack.push(t2);
                }
                if (t2.getIsOpen() == false) {
                    if (stack.isEmpty()) {
                        System.out.println("No match for " + t2 + " tag");
                    } else {
                        Object obj2 = stack.pop();
                        Tag t3 = (Tag) obj2;
                        // compare the tag names for equality
                        if (t2.getTag().equals(t3.getTag())) {
                            System.out.println(t2 + " matches " + t3);
                        } else {
                            System.out.println("Found " + t2 + " to match " + t3 + "; terminating program");
                            System.exit(0);
                        }
                    }
                }
            }
        }
    }

Don't do that. HTML is notorious in this regard: some tags have no matching opening/closing <> pair, and then there are all the perverse HTML renderings and browser quirks.
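To make the point about unclosed tags concrete, here is a minimal sketch of a lookup for HTML void elements, i.e. tags such as <br> and <img> that never take a closing tag. The class and method names are hypothetical (they are not from the question), and the list reflects the void elements of the HTML standard as I understand it. A checker that pushes these onto its stack will report mismatches that are not real errors.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; not part of the code in the question.
final class VoidElements {
    // Elements that have a start tag only and never a closing tag.
    private static final Set<String> VOID = Set.of(
        "area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr");

    // Tag names in HTML are case-insensitive, so normalise before the lookup.
    static boolean isVoid(String tagName) {
        return VOID.contains(tagName.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(isVoid("BR")); // prints true
        System.out.println(isVoid("h1")); // prints false
    }
}
```

A tag matcher could call isVoid before pushing a tag, and simply skip the ones that return true. `Set.of` assumes Java 9 or later.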

Unless your professor has explicitly barred you from using a third-party lib, it is insane to try to build this from scratch. For XML, it is manageable.

If you really want to do it yourself, regular expressions can give decent results:

    Pattern p = Pattern.compile("<(.*)>"); // your start

You can then:

    Matcher m = p.matcher(str); // str: a line read from the file, as in the question
    m.group(...) // gets you whatever the regex matched between the parentheses
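As a rough sketch of how the regex idea could be combined with the stack from the question: the version below swaps the greedy `<(.*)>` for a narrower pattern, because `.*` would swallow everything from the first < to the last > on a line. The class name, method name, and messages are hypothetical, and it uses `java.util.ArrayDeque` instead of the course's stack class. It deliberately ignores void elements, self-closing tags, scripts, and a > inside an attribute value, so treat it as a starting point only.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the names here are not from the original post.
final class TagBalanceChecker {
    // group(1) is "/" for a closing tag and empty otherwise; group(2) is the tag name.
    // [^>]* skips any attributes, so <a href="x"> still yields "a".
    private static final Pattern TAG =
        Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>");

    // Returns a list of problems; an empty list means every tag found was matched.
    static List<String> check(String html) {
        Deque<String> open = new ArrayDeque<>();
        List<String> problems = new ArrayList<>();
        Matcher m = TAG.matcher(html);
        // find() jumps from tag to tag, so the text between tags is never stored.
        while (m.find()) {
            String name = m.group(2).toLowerCase(Locale.ROOT);
            boolean closing = !m.group(1).isEmpty();
            if (!closing) {
                open.push(name);
            } else if (open.isEmpty()) {
                problems.add("no match for </" + name + ">");
            } else {
                String top = open.pop();
                if (!top.equals(name)) {
                    problems.add("found </" + name + "> but expected </" + top + ">");
                }
            }
        }
        // Anything still on the stack was opened but never closed.
        for (String name : open) {
            problems.add("unclosed <" + name + ">");
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(check("<html><h1>Title</h1><p>text</p></html>")); // prints []
        System.out.println(check("<div><b>bold</div>"));
    }
}
```

Because only the matches of the pattern are ever looked at, there are no empty tokens to filter out, and things like <!DOCTYPE html> or comments are skipped since they do not start with a letter after the <.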
