xx <tag a ="b" c= 'd' e=f> yy </tag> zz
(Did you notice the single-quoted, double-quoted and unquoted attribute
values, and the spaces around the = sign? It is important to consider all these variations.)
Here, we will use the captured text of a group within a pattern (a backreference) to dynamically match the tag name from the start tag and reuse it in the end tag, e.g. matching <tag ...> ... </tag>.
First we extract the tag name and the attribute set. For this
we use the regex: <(\S+?)(.*?)>(.*?)</\1>
Here,
- \1 in </\1> is a backreference to the first captured group (\S+?), i.e. the tag name.
- The first (.*?) captures the attributes.
- The second (.*?) captures the content between the start tag and the end tag.
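As a minimal sketch of how the backreference behaves, the snippet below applies the same tag pattern to one input with a matching end tag and one with a mismatched end tag. The class name BackrefDemo and the helper tagName are hypothetical names introduced only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class BackrefDemo {
    // Same tag pattern as in the text: \1 must repeat whatever group 1 captured.
    static final Pattern TAG = Pattern.compile("<(\\S+?)(.*?)>(.*?)</\\1>");

    // Returns the captured tag name, or null when no start/end pair matches.
    static String tagName(String html) {
        Matcher m = TAG.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // End tag repeats the start tag name, so the pattern matches.
        System.out.println(tagName("xx <tag a=\"b\"> yy </tag> zz"));   // tag
        // End tag differs, so the backreference cannot be satisfied.
        System.out.println(tagName("xx <tag a=\"b\"> yy </other> zz")); // null
    }
}
```

Because the end tag is tied to group 1, the pattern does not need to know the tag name in advance.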
Once we have the attributes, we need to extract the (name, value) pair of each
attribute. For simplicity we could use the regex (\w+)="(.*?)",
but this only matches the form attribute="value",
that is, no spaces around = and double quotes only. To match attribute/value
representations such as a ="b" c= 'd' e=f,
we can use the regex ([\w:\-]+)(\s*=\s*("(.*?)"|'(.*?)'|([^ ]*))|(\s+|\z)).
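The following sketch shows one way the capture groups of the longer pattern could be used to build a (name, value) map, and how the simple pattern fares on the same input. It assumes the unquoted-value class is [^ ] (anything up to the next space); the class name AttributeMapDemo and the helpers parse and countSimple are hypothetical names for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class AttributeMapDemo {
    // Simple form: name="value", no spaces, double quotes only.
    static final Pattern SIMPLE = Pattern.compile("(\\w+)=\"(.*?)\"");
    // Full form: group 1 = name, group 4 = double-quoted value,
    // group 5 = single-quoted value, group 6 = unquoted value.
    static final Pattern ATTR = Pattern.compile(
        "([\\w:\\-]+)(\\s*=\\s*(\"(.*?)\"|'(.*?)'|([^ ]*))|(\\s+|\\z))");

    // Collects attributes in document order; value-less attributes map to "".
    static Map<String, String> parse(String attributes) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = ATTR.matcher(attributes);
        while (m.find()) {
            String value = m.group(4);
            if (value == null) value = m.group(5);
            if (value == null) value = m.group(6);
            if (value == null) value = "";
            result.put(m.group(1), value);
        }
        return result;
    }

    // Counts how many attributes the simple pattern finds.
    static int countSimple(String attributes) {
        Matcher m = SIMPLE.matcher(attributes);
        int count = 0;
        while (m.find()) count++;
        return count;
    }

    public static void main(String[] args) {
        String attributes = " a =\"b\" c= 'd' e=f";
        System.out.println(parse(attributes));       // {a=b, c=d, e=f}
        System.out.println(countSimple(attributes)); // 0
    }
}
```

On this input the simple pattern finds nothing at all, because every attribute has either a space next to = or a value that is not double-quoted.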
Here is the complete code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagAttributeParser {
    public static void main(String[] args) {
        String testHtml = "xx <tag a =\"b\" c= 'd' e=f> contentssss </tag> zz";

        // Start tag, attributes, content, and an end tag that repeats group 1.
        Pattern tagPattern = Pattern.compile("<(\\S+?)(.*?)>(.*?)</\\1>");
        // Simple form: name="value" only.
        Pattern attValueDoubleQuoteOnly = Pattern.compile("(\\w+)=\"(.*?)\"");
        // Double-quoted, single-quoted and unquoted values, with optional spaces around '='.
        Pattern attValueAll = Pattern.compile(
            "([\\w:\\-]+)(\\s*=\\s*(\"(.*?)\"|'(.*?)'|([^ ]*))|(\\s+|\\z))");

        Matcher m = tagPattern.matcher(testHtml);
        boolean tagFound = m.find();      // true
        String tagOnly = m.group(0);      // <tag a ="b" c= 'd' e=f> contentssss </tag>
        String tagname = m.group(1);      // tag
        String attributes = m.group(2);   // " a ="b" c= 'd' e=f" (with the leading space)
        String content = m.group(3);      // " contentssss " (with surrounding spaces)

        System.out.println("Tag Only : " + tagOnly);
        System.out.println("Tag Name : " + tagname);
        System.out.println("Attributes : " + attributes);
        System.out.println("Content : " + content);

        // m = attValueDoubleQuoteOnly.matcher(attributes);
        m = attValueAll.matcher(attributes);
        while (m.find()) {
            System.out.println(" >> " + m.group(0));
        }
    }
}
Result:
Tag Only : <tag a ="b" c= 'd' e=f> contentssss </tag>
Tag Name : tag
Attributes :  a ="b" c= 'd' e=f
Content :  contentssss
 >> a ="b"
 >> c= 'd'
 >> e=f
See also : Java : Html form parser return map of (name,value) pair of input attribute
What about self-closing tags?