|
In the Python docs:
I understand what the difference is, but I don't get why the () version doesn't work for the regexps used in the class. We never used \ followed by a number in those regexps, so how does it interfere? EDIT: I probably didn't ask the question as clearly as I could have, but luckily it got answered anyway. I was confused because I hadn't read the findall docs, so I didn't know that findall behaves differently when the regexp contains capturing groups. As dreyescat put it:
|
|
Take a look at this example:
If you run this you will get
as a result. So "(...)" and "(?:...)" obviously yield different results: regex2 returns only the contents of the parentheses rather than the whole match. So it is clear why the examples from the course did not work this way. "(...)" groups are used to create backreferences and are very useful. Consider this example:
I don't think it is really due to a difference between capturing and non-capturing grouping, but because of the behavior of findall. All these regular expressions (regex1, regex2, and regex3) match the same text, but when you call findall with the second one, findall returns the list of groups instead of the list of matches. So it is because of the findall behavior that they appear to behave differently. From the findall documentation:
|
|
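The findall behavior discussed above can be seen in a minimal sketch. The sample string and patterns here are hypothetical, chosen only for illustration; they are not the regexps from the class.

```python
import re

text = "ab12 cd34"  # hypothetical sample input

# No groups at all: findall returns the full matches.
no_group = re.findall(r"[a-z]+\d+", text)

# One capturing group: findall returns only what the group matched.
capturing = re.findall(r"[a-z]+(\d+)", text)

# Non-capturing group: findall returns the full matches again.
non_capturing = re.findall(r"[a-z]+(?:\d+)", text)

print(no_group)       # ['ab12', 'cd34']
print(capturing)      # ['12', '34']
print(non_capturing)  # ['ab12', 'cd34']
```

All three patterns match exactly the same substrings; only what findall hands back changes.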
The only difference between the capturing and non-capturing versions of the grouping parentheses is that the capturing version keeps the matched groups for later use, while the non-capturing version doesn't. So you could use either of them for the regular expressions in this class and they should work; it doesn't matter which. I think that the non-capturing version is basically a convenient form of grouping that can be helpful in complex regular expressions with lots of groups, when some are required for later use and others aren't. It can be difficult to track which group number is which. Using capturing groups for the ones you need and the non-capturing version for the rest can simplify your group numbering and help a lot with tracking only the interesting groups. Apart from this convenience, I have not found anything saying that one is better than the other, for example, regarding performance. An interesting excerpt from the regular expressions documentation:
|
|
Look here: http://docs.python.org/library/re.html#regular-expression-syntax (?:...) => A non-capturing version of regular parentheses. Matches whatever regular expression is inside the parentheses, but the substring matched by the group cannot be retrieved after performing a match or referenced later in the pattern. But so far I do not understand what that really means. |
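As a rough illustration of what "cannot be retrieved ... or referenced later in the pattern" means, here is a small sketch. The strings and patterns are made up for this example.

```python
import re

# Two capturing groups: both substrings can be retrieved after the match.
m1 = re.match(r"(\w+)@(\w+)", "user@example")
print(m1.groups())  # ('user', 'example')

# The first group is non-capturing: it still has to match, but it gets
# no group number, so only the second group can be retrieved.
m2 = re.match(r"(?:\w+)@(\w+)", "user@example")
print(m2.groups())  # ('example',)

# A backreference (\1) repeats whatever group 1 captured.
m3 = re.search(r"\b(\w+) \1\b", "it was was fine")
print(m3.group(0))  # was was

# With a non-capturing group there is no group 1 to refer back to,
# so the pattern does not even compile.
try:
    re.compile(r"\b(?:\w+) \1\b")
    backref_ok = True
except re.error:
    backref_ok = False
print(backref_ok)  # False
```

So `(?:...)` groups purely for structure (alternation, repetition), while `(...)` additionally stores the matched text under a group number.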
|
It makes the intent of the regular expression clearer, and I'd wager it comes with a (likely very small) performance increase; there's no point in keeping a reference to a subexpression if we're not going to need it, and it's much clearer from simply looking at the regex that it's being used only for grouping, and not for capturing. Also (?:) totally looks cooler. :D |