This lesson covers some topics related to strings in Python. In particular: encoding, string management, built-in methods, iterating over strings, substring search, and immutability (concept: immutable type).
# Do this in the console so you can see the difference!
mystring = "hello world"
# What methods are defined on strings?
dir(mystring)
# Let's try using some methods...
mystring.upper()
# What happened? Did that permanently change the original string?
mystring.title()
# How about that? Why did we get the result we did?
# Now let's coerce something else into a string
mynumber = 1001001
my_coerced_string = str(mynumber)
# Python has built-in rules for turning a number into a string, so str() just works
# If you wanted different rules for your own class, you could write them yourself
# by overriding its __str__ method
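The console experiment above can also be run as a script. This is a minimal sketch, assuming Python 3, that shows methods like upper() and title() returning new strings, assignment to a character raising TypeError (strings are immutable), and str() calling __str__. The Point class is a hypothetical example, not part of the lesson.

```python
mystring = "hello world"

# upper() and title() build and return NEW strings...
shouted = mystring.upper()
titled = mystring.title()
print(shouted)   # HELLO WORLD
print(titled)    # Hello World
# ...so the original is untouched: strings are immutable.
print(mystring)  # hello world

# Trying to change one character in place fails.
try:
    mystring[0] = "H"
except TypeError as err:
    print("TypeError:", err)

# To "change" a string, rebind the name to a new string.
mystring = "H" + mystring[1:]
print(mystring)  # Hello world

# str() coerces a number using the int type's built-in rules.
my_coerced_string = str(1001001)
print(repr(my_coerced_string))  # '1001001'


class Point:
    """Hypothetical class with its own string-conversion rules."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        # str() and print() call this method
        return f"({self.x}, {self.y})"


print(str(Point(1, 2)))  # (1, 2)
```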
- Encodings: first off, what are they?
- ascii - a 7-bit encoding that maps 128 byte values 1:1 to characters; it can only represent unaccented English letters, digits, punctuation, and some control characters.
- unicode - a huge set of characters covering many languages. Not all fonts support all sections of unicode. Unicode itself is a list of characters; an encoding such as utf-8 turns them into bytes, using 1-4 bytes per character.
- So what does Python use in the Python interpreter?
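- Python 2's str type holds bytes and defaults to ascii, but a separate unicode type is available if you choose to use it.
- Python 3's str type is unicode by default, so you rarely have to think about it; encodings mostly come up when converting between str and bytes (files, network data).
- In Python 2 you can identify a unicode string because it will look like this: u'hello world'.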
- How do you know if you are using Python 2 or 3? Check with
python --version
- On many systems you have to type
python3
to use Python 3.
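A small sketch of the encoding ideas above, assuming Python 3. It checks the running version with sys.version_info, shows ascii failing on a non-English character, and counts how many bytes utf-8 spends per character. The sample characters are arbitrary choices.

```python
import sys

# Which major version of Python is running this?
print(sys.version_info.major)

word = "caf\u00e9"  # "café"

# ascii only covers 128 characters, so 'é' cannot be encoded.
try:
    word.encode("ascii")
except UnicodeEncodeError as err:
    print("ascii failed:", err.reason)

# utf-8 uses 1 byte for ascii characters and more for others.
utf8_bytes = word.encode("utf-8")
print(utf8_bytes)                   # b'caf\xc3\xa9'
print(len(word), len(utf8_bytes))   # 4 5

# Decoding turns the bytes back into the same str.
print(utf8_bytes.decode("utf-8") == word)  # True

# Per-character cost in utf-8 ranges from 1 to 4 bytes.
samples = ["a", "\u00e9", "\u20ac", "\U0001F600"]  # a, é, euro sign, emoji
byte_counts = [len(ch.encode("utf-8")) for ch in samples]
print(byte_counts)  # [1, 2, 3, 4]
```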
- What about your source code file, does that have to be ascii or unicode? Let's check PEP 263.
- The Python 2 interpreter defaults to decoding a source code file (a script) as ascii.
- In order to use a different encoding, you need to declare it in a comment on the first or second line of the file:
- emacs-friendly:
# -*- coding: utf-8 -*-
- vim-friendly:
# vim: set fileencoding=utf-8 :
- Precise definition from PEP 263: the comment must match the regular expression
"coding[:=]\s*([-\w.]+)"
- Since only the pattern matters, you could also use a human-friendly form:
# this file uses the encoding: utf-8
- What about Python 3?
- Python 3 uses utf-8 as the default source file encoding (PEP 3120).
- PEP 263 still applies to Python 3.
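To see why all three comment styles work, this sketch tests the PEP 263 regular expression quoted above against sample comment lines with Python 3's re module. It only checks strings; it does not read a file, and the real interpreter also requires the comment to sit on the first or second line.

```python
import re

# The regular expression quoted from PEP 263 above.
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")

candidates = [
    "# -*- coding: utf-8 -*-",
    "# vim: set fileencoding=utf-8 :",
    "# this file uses the encoding: utf-8",
    "# no declaration here",
]

declared = []
for line in candidates:
    match = CODING_RE.search(line)
    # group(1) is the declared encoding name, or None if nothing matched
    declared.append(match.group(1) if match else None)

print(declared)  # ['utf-8', 'utf-8', 'utf-8', None]
```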
- Let's use Python 2. There are three types of string literals:
ascii_str = 'this is a string'            # plain (byte) string
unicode_str = u'this is a unicode string'  # unicode string
raw_str = r'this needs no escapes'         # backslashes are kept literally
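For comparison, here is a sketch of the same literal prefixes in Python 3 (not Python 2 as above). There, plain and u'' literals are both str, r'' keeps backslashes, and byte strings need a separate b'' prefix. The Windows-style path is just a convenient string full of backslashes.

```python
# In Python 3 these three literal forms all produce str objects.
plain = 'this is a string'
unicode_prefixed = u'this is a unicode string'  # u'' is accepted (3.3+) but redundant
raw = r'C:\new\table'  # raw: backslashes are kept literally

print(type(plain), type(unicode_prefixed), type(raw))

# Without r, '\n' and '\t' become a newline and a tab.
cooked = 'C:\new\table'
print(len(raw), len(cooked))  # 12 10

# Byte strings use a b'' prefix and a separate type.
data = b'this is bytes'
print(type(data))  # <class 'bytes'>
```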
- String escapes - a backslash starts an escape sequence, so quotes and backslashes inside a string often need escaping.
- Let's play with escape sequences!
>>> 'hello world'
>>> 'hello humans\' world'
>>> '\\'
>>> r'\\'
>>> '\\\\'
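The console session above echoes reprs, which can hide how many characters a string really holds. This Python 3 sketch checks lengths and equalities directly; the extra newline and tab examples go slightly beyond the session above.

```python
single_backslash = '\\'        # escape sequence for ONE backslash
print(len(single_backslash))   # 1

raw_two = r'\\'                # raw string: both backslashes kept
print(len(raw_two))            # 2
print(raw_two == '\\\\')       # True

quoted = 'hello humans\' world'  # \' puts a quote inside '...'
print(quoted)                    # hello humans' world
print(quoted == "hello humans' world")  # True

# Other common escapes: newline and tab are one character each
print(len('a\nb'), len('a\tb'))  # 3 3
```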
String methods
- Let's look at the string methods we listed in the deep-dive:
dir('hello')
- The names without leading underscores are the everyday string methods. Let's look at some, grouped by purpose:
- Change capitalization: lower(), upper(), capitalize(), swapcase(), title()
- Manipulate whitespace: strip(), lstrip(), rstrip(), ljust(), rjust(), center(), expandtabs()
- Test properties: isalnum(), isalpha(), isdigit(), islower(), isspace(), istitle(), isupper()
- Replacement: replace(), translate()
- Manipulate encodings: decode(), encode() (Python 2 str has both; in Python 3, str has encode() and bytes has decode())
- Divide, combine: join(), split(), splitlines()
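A quick tour of one or two methods from each group above, as a Python 3 sketch. str.maketrans is the Python 3 way to build a translate() table; the sample strings are arbitrary. Every call returns a new value and leaves the original string alone.

```python
s = "  Hello World  "

# Change capitalization
print("hello world".capitalize())  # Hello world
print("hello world".title())       # Hello World
print("Hello".swapcase())          # hELLO

# Manipulate whitespace
print(repr(s.strip()))              # 'Hello World'
print(repr(s.lstrip()))             # 'Hello World  '
print(repr("hi".center(6, "*")))    # '**hi**'
print(repr("hi".ljust(4, "-")))     # 'hi--'
print(repr("hi".rjust(4, "-")))     # '--hi'
print(repr("a\tb".expandtabs(4)))   # 'a   b'

# Test properties (each returns True or False)
print("abc123".isalnum(), "abc".isalpha(), "123".isdigit())  # True True True
print("   ".isspace(), "Hello World".istitle())              # True True

# Replacement
print("this is my string".replace("is", "IS"))  # thIS IS my string
table = str.maketrans("aeiou", "AEIOU")
print("translate me".translate(table))          # trAnslAtE mE

# Encodings: str -> bytes -> str
print("caf\u00e9".encode("utf-8").decode("utf-8"))  # café

# Divide, combine
words = "one two three".split()
print(words)                         # ['one', 'two', 'three']
print("-".join(words))               # one-two-three
print("line1\nline2".splitlines())   # ['line1', 'line2']
```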
Now let's slice a string.
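A sketch of Python's slice syntax s[start:stop:step] on a sample string. The start index is included, the stop index is excluded, and negative numbers count from the end. Like the string methods, slicing returns a new string.

```python
s = "hello world"

print(s[0])      # h            (indexing starts at 0)
print(s[-1])     # d            (negative indices count from the end)
print(s[0:5])    # hello        (start included, stop excluded)
print(s[6:])     # world        (omit stop to go to the end)
print(s[:5])     # hello        (omit start to begin at 0)
print(s[::2])    # hlowrd       (step of 2)
print(s[::-1])   # dlrow olleh  (negative step reverses)

# Slicing never modifies s.
print(s)         # hello world
```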
Substring Search - find a string that fits inside another string.
- How many instances of 'is' are in 'this is my string'?
- Let's work with some more string methods: find(), rfind(), count(), partition(), rpartition()
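This sketch answers the question above with count() and shows what the other search methods return for the same sentence. The `in` operator at the end is a standard alternative when you only need a yes/no answer.

```python
text = "this is my string"

print(text.count("is"))     # 2   ('is' inside 'this', plus the word 'is')
print(text.find("is"))      # 2   (index of the first match)
print(text.rfind("is"))     # 5   (index of the last match)
print(text.find("xyz"))     # -1  (not found)
print(text.partition(" "))  # ('this', ' ', 'is my string')
print(text.rpartition(" ")) # ('this is my', ' ', 'string')
print("is" in text)         # True
```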
String iteration
- Iterate over characters in a string:
for c in some_string: ...
- Iterate over a list of strings:
for s in some_strings: ...
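The two loop shapes above, filled in with sample data. The names some_string and some_strings come from the outline; their values here are made up. enumerate() and the list comprehension are common companions to these loops, not something the outline requires.

```python
some_string = "abc"
some_strings = ["apple", "banana", "cherry"]

# Iterate over characters in a string
for c in some_string:
    print(c)

# enumerate() also yields each character's index
for i, c in enumerate(some_string):
    print(i, c)

# Iterate over a list of strings
for s in some_strings:
    print(s.upper())

# Collect per-string results in one expression
lengths = [len(s) for s in some_strings]
print(lengths)  # [5, 6, 6]
```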
Regular Expressions - Know they exist; try them out sometime.
- A regular expression is a pattern string that can match a whole SET of ordinary strings.
- There's a lot to say about these!
- Check out Python's re module; it is part of the standard library, so it is already on your computer.
- Regexr is a nice resource for learning regular expressions interactively.
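A first taste of the re module, reusing the substring-search sentence. The patterns are illustrative choices: \b marks a word boundary, [st] is a character class, and \w* matches any run of word characters, so one pattern can match a whole set of words.

```python
import re

text = "this is my string"

# \b marks a word boundary, so the 'is' inside 'this' is skipped
print(re.findall(r"\bis\b", text))      # ['is']
# Without boundaries, both occurrences match
print(re.findall(r"is", text))          # ['is', 'is']

# One pattern, a SET of matches: words starting with s or t
print(re.findall(r"\b[st]\w*", text))   # ['this', 'string']

# Capture the word right before 'string'
match = re.search(r"(\w+) string", text)
if match:
    print(match.group(1))               # my
```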