Saturday, July 3, 2010

Regular Expressions in Java\WCS Examples

REGEX 101:
Regular expressions are a powerful (and fairly standardized) way of searching, replacing, and parsing text with complex patterns of characters.

. is a wildcard that matches any single character.
e.g. b.t matches bat, bit, bet, bot, b*t and b t

[] limits the match to one character out of a set or range of characters.

e.g. b[aeio]t : bat bet bit bot

() groups alternatives separated by |, and each alternative can be longer than 1 character. ([] cannot do this, because it always matches exactly 1 character.)
e.g. b(a|e|i|o|oo)t : bat bet bit bot boot
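
To try the three notations above, here is a minimal sketch. The class name RegexBasics and the helper matchesWhole are made up for illustration; Pattern.matches is the standard JDK call and succeeds only when the whole text matches.

```java
import java.util.regex.Pattern;

public class RegexBasics {

    // Hypothetical helper: true only when the WHOLE text matches the regex.
    static boolean matchesWhole(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    public static void main(String[] args) {
        // "." stands for any single character.
        System.out.println(matchesWhole("b.t", "bat"));   // true
        System.out.println(matchesWhole("b.t", "b*t"));   // true
        System.out.println(matchesWhole("b.t", "boot"));  // false: two characters between b and t

        // "[]" picks exactly one character from the set.
        System.out.println(matchesWhole("b[aeio]t", "bot"));   // true
        System.out.println(matchesWhole("b[aeio]t", "boot"));  // false

        // "()" with "|" allows alternatives longer than one character.
        System.out.println(matchesWhole("b(a|e|i|o|oo)t", "boot")); // true
    }
}
```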

* 0 or more times
+ 1 or more times
? 0 or 1 time
{n} Exactly n number of times
{n,m} n to m number of times
- Inside [], indicates a range of characters, e.g. [0-9] or [a-z].
\ Escapes regexp special characters.
^ Also called the NOT notation. Used as the first character inside brackets, "^" lists the characters you don't want to match, e.g. [^0-9]. Outside brackets, "^" anchors the match at the start of the input.
Simpler shortcuts:
\d [0-9]
\D [^0-9]
\w [a-zA-Z_0-9]
\W [^a-zA-Z_0-9]
\s [ \t\n\x0B\f\r] (whitespace: spaces, tabs and line breaks)
\S [^ \t\n\x0B\f\r]
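
A small sketch of the quantifiers and shortcuts in Java follows. The class name ShorthandDemo and its helper are illustrative only. Note that inside a Java string literal every backslash has to be doubled, so \d is written "\\d".

```java
import java.util.regex.Pattern;

public class ShorthandDemo {

    // Hypothetical helper: true only when the whole text matches the regex.
    static boolean matchesWhole(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    public static void main(String[] args) {
        // {n}: exactly three digits.
        System.out.println(matchesWhole("\\d{3}", "858"));            // true
        // \w covers letters of either case, digits and underscore.
        System.out.println(matchesWhole("\\w+", "order_id42"));       // true
        // A space is not a word character.
        System.out.println(matchesWhole("\\w+", "order id"));         // false
        // ?: the "u" may appear 0 or 1 time.
        System.out.println(matchesWhole("colou?r", "color"));         // true
        // \s+ spans any mix of spaces and tabs between two non-space runs.
        System.out.println(matchesWhole("\\S+\\s+\\S+", "order \t id")); // true
    }
}
```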

e.g. US telephone number 858-343-1111
Regular expression to use: [0-9]{3}\-[0-9]{3}\-[0-9]{4}

If the - is optional, e.g. 8583431111
Regular expression to use: [0-9]{3}\-?[0-9]{3}\-?[0-9]{4}
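
The telephone pattern can be checked with a few lines of Java. This is only a sketch: PhoneNumberCheck and isUsPhone are names invented here. The hyphen does not need escaping outside brackets, so the sketch writes -? instead of \-?; both forms behave the same.

```java
import java.util.regex.Pattern;

public class PhoneNumberCheck {

    // Three digits, optional hyphen, three digits, optional hyphen, four digits.
    private static final Pattern US_PHONE =
            Pattern.compile("[0-9]{3}-?[0-9]{3}-?[0-9]{4}");

    // Hypothetical helper: true when the whole text is a phone number in that shape.
    static boolean isUsPhone(String text) {
        return US_PHONE.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(isUsPhone("858-343-1111")); // true
        System.out.println(isUsPhone("8583431111"));   // true
        System.out.println(isUsPhone("858-3431111"));  // true: each hyphen is optional on its own
        System.out.println(isUsPhone("858-343-111"));  // false: last block has only 3 digits
    }
}
```

Because each hyphen is optional independently, mixed forms such as 858-3431111 are accepted too. If that is not wanted, the two layouts would have to be listed as separate alternatives.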

e.g. 8836KV : [0-9]{4}[A-Z]{2}
[0-9]{4} (first 4 digits) [A-Z]{2} (last 2 uppercase letters)

e.g. May 11, 2010 : [A-Za-z]+\s+[0-9]{1,2},\s*[0-9]{4}
[A-Za-z]+ (month name) \s+ (mandatory whitespace) [0-9]{1,2} (day of the month) , \s* (0 or more spaces) [0-9]{4} (year)
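
As a sketch of how such a date could be picked apart in Java, the version below wraps each part in () so it can be read back with group(). It assumes the month name is matched with [A-Za-z]+ so that mixed-case names like "May" are accepted. DateParts and parse are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateParts {

    // Group 1: month name, group 2: day (1 or 2 digits), group 3: 4-digit year.
    private static final Pattern DATE =
            Pattern.compile("([A-Za-z]+)\\s+([0-9]{1,2}),\\s*([0-9]{4})");

    // Hypothetical helper: returns {month, day, year}, or null when the text does not match.
    static String[] parse(String text) {
        Matcher m = DATE.matcher(text);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] parts = parse("May 11, 2010");
        System.out.println(parts[0] + " / " + parts[1] + " / " + parts[2]); // May / 11 / 2010
        System.out.println(parse("May 2010") == null);                      // true: day and comma missing
    }
}
```

The pattern only checks the shape of the text. It does not verify that the month name or the day is a real calendar value.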

The examples below use the JDK 1.4.2 API.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern class:

An instance of the Pattern class represents a regular expression that is specified in string form in a syntax similar to that used by Perl.

A regular expression, specified as a string, must first be compiled into an instance of the Pattern class. The resulting pattern is used to create a Matcher object that matches arbitrary character sequences against the regular expression. Many matchers can share the same pattern, because all the state of a match lives in the matcher and not in the pattern.

The compile method compiles the given regular expression into a pattern, then the matcher method creates a matcher that will match the given input against this pattern. The pattern method returns the regular expression from which this pattern was compiled.
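
A minimal sketch of that flow: compile once, create one matcher per input, and read the source expression back with pattern(). The class name PatternReuse, the CODE constant and the helper are illustrative names, not part of any API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternReuse {

    // Compiled once and shared; the Pattern itself holds no match state.
    static final Pattern CODE = Pattern.compile("[0-9]{4}[A-Z]{2}");

    // Hypothetical helper: each call creates its own Matcher from the shared Pattern.
    static boolean isCode(String text) {
        Matcher m = CODE.matcher(text);
        return m.matches();
    }

    public static void main(String[] args) {
        System.out.println(isCode("8836KV"));  // true
        System.out.println(isCode("88KV"));    // false: only 2 digits
        System.out.println(CODE.pattern());    // [0-9]{4}[A-Z]{2}
    }
}
```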

Matcher Class

Instances of the Matcher class are used to match character sequences against a given pattern. Input is provided to matchers using the CharSequence interface to support matching against characters from a wide variety of input sources.

A matcher is created from a pattern by invoking the pattern's matcher method. Once created, a matcher can be used to perform three different kinds of match operations:

  • The matches method attempts to match the entire input sequence against the pattern.
  • The lookingAt method attempts to match the input sequence, starting at the beginning, against the pattern.
  • The find method scans the input sequence looking for the next sequence that matches the pattern.

Each of these methods returns a boolean indicating success or failure. More information about a successful match can be obtained by querying the state of the matcher.
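
The difference between the three operations is easiest to see side by side. The sketch below runs them with the same pattern; MatchKinds and its three helpers are names chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchKinds {

    private static final Pattern DIGITS = Pattern.compile("[0-9]+");

    // matches(): the entire input must be digits.
    static boolean wholeMatch(String s) {
        return DIGITS.matcher(s).matches();
    }

    // lookingAt(): the input must START with digits; the rest is ignored.
    static boolean prefixMatch(String s) {
        return DIGITS.matcher(s).lookingAt();
    }

    // find(): scan forward and count every run of digits in the input.
    static int countMatches(String s) {
        Matcher m = DIGITS.matcher(s);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "858 343 1111";
        System.out.println(wholeMatch(input));    // false: the spaces are not digits
        System.out.println(prefixMatch(input));   // true: starts with 858
        System.out.println(countMatches(input));  // 3
    }
}
```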

This class also defines methods for replacing matched sequences by new strings whose contents can, if desired, be computed from the match result.

The appendReplacement method appends the input up to the next match, followed by the replacement for that match. The appendTail method appends the rest of the input that follows the last match.
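
Here is a small sketch of a replacement that is computed from each match: every number in the text is doubled. NumberDoubler and doubleNumbers are invented names, and the sketch assumes the numbers are small enough to fit in an int.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberDoubler {

    private static final Pattern NUMBER = Pattern.compile("[0-9]+");

    // Hypothetical helper: replaces each run of digits with twice its value.
    static String doubleNumbers(String input) {
        Matcher m = NUMBER.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Assumption: the matched number fits in an int.
            int value = Integer.parseInt(m.group());
            // Copies the text before this match, then the computed replacement.
            m.appendReplacement(sb, String.valueOf(value * 2));
        }
        // Copies the text after the last match.
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(doubleNumbers("2 apples and 10 pears")); // 4 apples and 20 pears
    }
}
```

The replacement here contains only digits. If it could contain "$" or "\", those characters would need escaping, because appendReplacement treats them specially.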


Sample methods:

// Collapses every run of whitespace (spaces, tabs, line breaks) into a single space.
public static String removeDuplicateWhitespace(String inputStr) {
    String patternStr = "\\s+";
    String replaceStr = " ";
    Pattern pattern = Pattern.compile(patternStr);
    Matcher matcher = pattern.matcher(inputStr);
    return matcher.replaceAll(replaceStr);
}


// Requires: import java.util.Map;
// Replaces every match of patternStr in input with the value stored for it in splCharsMap.
public static String replaceSpecialCharsWithHTMLTags(Map splCharsMap, String patternStr, String input)
{
    final String METHODNAME = "replaceSpecialCharsWithHTMLTags";

    Pattern r = Pattern.compile(patternStr, Pattern.CASE_INSENSITIVE);
    Matcher m = r.matcher(input);

    StringBuffer sb = new StringBuffer();

    while (m.find())
    {
        // Text of the current match.
        String supTagElement = input.substring(m.start(), m.end());

        // Look up the replacement on every iteration, so a value found
        // for an earlier match is never reused for this one.
        String replaceStr = (String) splCharsMap.get(supTagElement);

        // Note: "$" and "\" have a special meaning in the replacement string.
        if (replaceStr != null && replaceStr.length() > 0)
            m.appendReplacement(sb, replaceStr);
        else
            m.appendReplacement(sb, supTagElement); // no mapping: keep the matched text
    }

    // Copy whatever follows the last match.
    m.appendTail(sb);
    return sb.toString();
}

WCS:

Regular expressions are useful in WCS both on the front end (JSP validation and validation of feeds) and in the back end for string search\replace.



Ref:
http://java.sun.com/developer/technicalArticles/releases/1.4regex/
