Curiosity: Python string.strip() removes more than just white spaces
Another Python curiosity.
If you look at the official Python 3 documentation for strip(), it says that strip() without parameters will return the string without the leading and trailing white spaces.
Optionally, you can pass a string with the characters you want to eliminate.
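To make the optional parameter concrete, here is a minimal sketch. The sample strings are made up for illustration; the point is that the argument is treated as a set of characters to remove from both ends, not as a prefix or suffix to match.

```python
# Hypothetical sample values, chosen only for illustration.
padded = "xxyhelloyxx"

# The argument is a set of characters: any mix of "x" and "y" is removed
# from both ends until a character outside the set is found.
stripped_chars = padded.strip("xy")
print(stripped_chars)

# Characters in the middle of the string are never touched.
inner = "--a-b--".strip("-")
print(inner)
```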
The official documentation for Python 2 says:
string.strip(s[, chars])
Return a copy of the string with leading and trailing characters removed. If chars is omitted or None, whitespace characters are removed. If given and not None, chars must be a string; the characters in the string will be stripped from the both ends of the string this method is called on.
Changed in version 2.2.3: The chars parameter was added. The chars parameter cannot be passed in earlier 2.2 versions.
https://docs.python.org/2/library/string.html
A white space is a white space. It is not an Enter.
But strip() without parameters will remove white spaces (space), and also Enter (\n) and Tab (\t) characters.
Probably you will not realize that unless you read from a file that has empty lines at the end for a reason, and you use strip().
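A small sketch of that situation, using io.StringIO as a stand-in for a real file (the contents are invented for the example). If you only want to drop spaces and keep the trailing empty lines, one option is to pass " " explicitly.

```python
import io

# Stand-in for a real file: two lines followed by empty lines
# (hypothetical contents, only for illustration).
fake_file = io.StringIO("  first line  \nsecond line\n\n\n")
content = fake_file.read()

# No argument: leading spaces AND the trailing newlines are removed.
everything_stripped = content.strip()

# Explicit " ": only leading/trailing spaces are removed, newlines stay.
only_spaces_stripped = content.strip(" ")

print(repr(everything_stripped))
print(repr(only_spaces_stripped))
```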
You can see a demonstration with this small program (the code is pasted further down), which runs the same on Python 2 and Python 3.
And the corresponding output for Python 2 and Python 3:
The [ ] characters were added to show that there are no hidden tabs or similar after the text.
Here I paste the code so you can try yourself:
import sys


def print_bar():
    # Visual separator between the sections of the output
    print("-----------------------------------------------------")


def print_between_brackets(s_test):
    # The brackets make leading/trailing invisible characters evident
    print("[" + s_test + "]")


s_string_with_enters = " Testing strip not only removing white spaces, but Enter and Tabs as well\n\t\n\n"

print("Testing .strip()")
print("You are running Python " + sys.version)
print("This is the original string")
print_bar()
print_between_brackets(s_string_with_enters)
print_bar()
print("Now after strip()...")
print_bar()
print_between_brackets(s_string_with_enters.strip())
print_bar()
print("As you can see the Enters and the Tabs have been removed, not just the spaces")
I think this should be disambiguated, so I decided to take action. It is very easy to blame and never contribute. Not me. I went to the Python bug tracker to fix that and I found a bug reporting this issue:
https://bugs.python.org/issue25433
The issue was registered by Dimitri Papadopoulos Orfanos, who also made especially interesting contributions to it.
The thread is really interesting to read. I recommend it.
At a glance:
“Python heavily relies on isspace() to detect “whitespace” characters.”
* Lib/string.py near line 23: whitespace = ' \t\n\r\v\f'
So all those characters will be stripped in Python 2.7 if you just use string.strip().
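You can check this yourself with the string module, which exposes that constant as string.whitespace. This is just a quick sketch; "payload" is an arbitrary sample word.

```python
import string

# The constant defined in Lib/string.py
print(repr(string.whitespace))

# Surround a sample word with every character in string.whitespace
sample = string.whitespace + "payload" + string.whitespace

# strip() without parameters removes all of them from both ends
result = sample.strip()
print(repr(result))
```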
The ticket was opened on 2015-10-18 at 12:15, so it's a shame the documentation has not been updated yet, more than 3 years later. This is the kind of thing, a lack of care, that I can't understand: not striving for excellence.
Please do note that Python 3 supports Unicode natively, and things are always a bit different than with Python 2 and ASCII.
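As an illustration of that difference, here is a sketch meant for Python 3 only. It assumes a sample string padded with a no-break space (U+00A0) and an em space (U+2003), two Unicode whitespace characters that are not part of string.whitespace.

```python
import string

# Hypothetical sample: "hello" padded with Unicode whitespace
# (U+00A0 no-break space, U+2003 em space). Python 3 only.
text = "\u00a0\u2003hello\u00a0"

# Without parameters, str.strip() removes whatever str.isspace()
# considers whitespace, which includes these two characters.
unicode_result = text.strip()

# Restricting the set to the ASCII constant leaves them in place.
ascii_only_result = text.strip(string.whitespace)

print(repr(unicode_result))
print(repr(ascii_only_result))
```

If you need the old ASCII-only behaviour in Python 3, passing string.whitespace explicitly, as in the second call, is one way to get it.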
Thanks for pointing me at your blog, Carles! It may indeed be a Python documentation bug insofar as it’s not clearly defined and the example uses only spaces, but while “space” normally refers only to the space character ‘ ‘ ASCII 32, “whitespace” is indeed generally understood to include any non-printing character, including newlines, tabs, etc. https://en.wikipedia.org/wiki/Whitespace_character
Thanks for your comment Michael. :)
My pleasure and I feel flattered that you came to visit my blog. :)