java - Why do ";" and "\\;" match the same thing?

I just found Java code like this:

"bla;bla;bla".split("\\;");

It returns:

["bla","bla","bla"]   // String array of course

String.split does use regex, but from my research I found that ; is not a special character in regex and doesn't have to be escaped. So I tried replacing it with:

"bla;bla;bla".split(";");

and it still does the same! So what is happening here? Is Java trying to be nice and ignoring a useless backslash in the regex? I also tried it in Notepad++, and there both patterns find a single semicolon as well.

3 Answers

  1. Eric- Reply

    2019-11-16

    From the Javadoc of Pattern (emphasis mine):

    The backslash character ('\') serves to introduce escaped constructs

    ...

    It is an error to use a backslash prior to any alphabetic character that does not denote an escaped construct; these are reserved for future extensions to the regular-expression language. A backslash may be used prior to a non-alphabetic character regardless of whether that character is part of an unescaped construct.
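
    The rule quoted above can be checked directly. The following is a minimal sketch, not part of the Javadoc; the class name EscapeRuleDemo and the helper compiles are made up for illustration. It assumes that \i has no defined meaning in Java's regex syntax, which holds for the JDK versions I am aware of.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class EscapeRuleDemo {
    // Hypothetical helper: true if Pattern accepts the regex, false if it rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Regex \; : backslash before a non-alphabetic character, always allowed.
        System.out.println(compiles("\\;")); // true
        // Regex \d : backslash before a letter that denotes a defined construct.
        System.out.println(compiles("\\d")); // true
        // Regex \i : backslash before a letter with no defined meaning (assumption), rejected.
        System.out.println(compiles("\\i")); // false
    }
}
```

    So the backslash in \; is not silently dropped as a favor; it is explicitly permitted, while an undefined alphabetic escape is an error.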

  2. Evan- Reply

    2019-11-16

    In the following code:

    "bla;bla;bla".split("\\;");
    

    String#split() treats its argument as a regex. In the Java string literal, the two backslashes \\ produce a single literal backslash, so the regex you end up splitting on is \;. That is functionally the same as splitting on ;, because a semicolon does not need to be escaped.

    If you tried the following split, you would not get the result you expect:

    "bla;bla;bla".split("\\\\;");
    

    This would correspond, in regex terms, to splitting on a literal backslash followed by a semicolon. Since that separator never appears in your string, you would just get an array whose only element is the input string.
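
    To see the two layers of escaping side by side, here is a small sketch that runs all three variants against the string from the question. The class name SplitEscapeLevels and the helper splitOn are made up for illustration.

```java
import java.util.Arrays;

class SplitEscapeLevels {
    // Hypothetical helper: splits the question's sample string on the given regex.
    static String[] splitOn(String regex) {
        return "bla;bla;bla".split(regex);
    }

    public static void main(String[] args) {
        // Java literal ";" is the regex ; which matches a semicolon.
        System.out.println(Arrays.toString(splitOn(";")));      // [bla, bla, bla]
        // Java literal "\\;" is the regex \; which also matches a semicolon.
        System.out.println(Arrays.toString(splitOn("\\;")));    // [bla, bla, bla]
        // Java literal "\\\\;" is the regex \\; which matches a backslash followed by a semicolon.
        System.out.println(Arrays.toString(splitOn("\\\\;")));  // [bla;bla;bla]
    }
}
```

    Each pair of backslashes in the source code is halved once by the Java compiler; whatever remains is then interpreted by the regex engine.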

    See the answer by @AndyTurner for an explanation on why splitting on \; is allowed in the first place.

  3. Francis- Reply

    2019-11-16

    The answers are fine. However, nobody mentioned Pattern.quote()

    Java does not have a raw or literal string (e.g. like a @"..." verbatim string in C# or an r"..." raw string in Python). Nonetheless, for regular expressions we have the quote method that returns a literal pattern String for the specified String:

    This method produces a String that can be used to create a Pattern that would match the string s as if it were a literal pattern.

    So, if you had used quote to specify your pattern, no split would have happened, as illustrated in the following code sample:

    import java.util.regex.Pattern;

    class Example
    {
      public static void main(String[] args)
      {
        String sourcestring = "bla;bla;bla";
        // Pattern.quote makes the pattern match a literal backslash followed by ';'
        Pattern re = Pattern.compile(Pattern.quote("\\;"));
        String[] parts = re.split(sourcestring);
        for (int partsIdx = 0; partsIdx < parts.length; partsIdx++) {
          System.out.println("[" + partsIdx + "] = " + parts[partsIdx]);
        }
      }
    }
    

    Output:

    [0] = bla;bla;bla
    

    Otherwise, it's just an escaped semi-colon in the regex context of the split method as explained by Tim and Andy.
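
    For completeness, a short sketch of what quote actually returns and how it behaves with different separators. The class name QuoteDemo and the helper splitLiteral are made up for illustration; the second input string, which contains a real backslash, is also just an example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class QuoteDemo {
    // Hypothetical helper: splits input on separator, treating separator as literal text.
    static String[] splitLiteral(String input, String separator) {
        return input.split(Pattern.quote(separator));
    }

    public static void main(String[] args) {
        // quote wraps its argument in \Q...\E, so everything inside is taken literally.
        System.out.println(Pattern.quote("\\;")); // \Q\;\E
        // Quoting only the semicolon still splits the original string.
        System.out.println(Arrays.toString(splitLiteral("bla;bla;bla", ";")));  // [bla, bla, bla]
        // The quoted backslash-semicolon only matches where the input really contains a backslash.
        System.out.println(Arrays.toString(splitLiteral("bla\\;bla", "\\;")));  // [bla, bla]
    }
}
```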
