Python has a built-in string class named "str" with many handy features. String literals can be enclosed by either double or single quotes, although single quotes are more commonly used.
'Hello'
"Hello"
Backslash escapes work the usual way within both single and double quoted
literals -- e.g. \n
\'
\"
.
print('first\nsecond')
print('I didn\'t do it')
A double quoted string literal can contain single quotes without any fuss (e.g.
"I didn't do it"
) and likewise single quoted string can contain double quotes.
A string literal can span multiple lines, but there must be a backslash (\
) at
the end of each line to escape the newline. String literals inside triple
quotes, """
or '''
, can span multiple lines of text.
Python strings are "immutable" which means they cannot be changed after they
are created. Since strings can't be changed, we construct new strings as we
go to represent computed values. So for example the expression 'hello' + 'there'
takes in the 2 strings 'hello' and 'there' and builds a new string
'hellothere'
.
Characters in a string can be accessed using the standard [ ]
syntax,
and like Java and C++, Python uses zero-based indexing, so if strng
is
'hello'
strng[1]
is 'e'
.
If the index is out of bounds for the string, Python raises an error. The Python style (unlike Perl) is to halt if it can't tell what to do, rather than just make up a default value.
The handy "slice" syntax (below) also works to extract any substring from a
string. The len(string)
function returns the length of a string. The [ ]
syntax and the len()
function actually work on any sequence type -- strings,
lists, etc.. Python tries to make its operations work consistently across
different types.
Python newbie gotcha: don't use len
as a variable name to avoid blocking
out the len()
function.
The +
operator can concatenate two strings. Notice in the code below that
variables are not pre-declared -- just assign to them and go.
s = 'hi'
print(s[1]) ## i
print(len(s)) ## 2
print(s + ' there') ## hi there
Unlike Java, the +
does not automatically convert numbers or other
types to string form. The str()
function converts values to a string
form so they can be combined with other strings (and this is why str
is not a good
variable name).
pi = 3.14
text = 'The value of pi is ' + pi ## No, does not work
text = 'The value of pi is ' + str(pi) ## Yes
Numeric variables come in two flavours Integer and Floating Point (or Real). Integers are simple numbers
without any fractional part and in Python (unlike some other languages) can be as big as you like. Floating
Point numbers are any number with a fractional or decimal part (even if that part is 0
) so 1
is an integer
and 1.0
is a floating point number. Mostly python doesn't really care about what type of number you are
dealing with and will convert any integers to floats if it needs (by multiplying by 1.0
).
For numbers, the standard operators, +
, -
, /
, *
work in the usual way.
There is no ++
operator (don't worry if you don't know what that is), but +=
, -=
, etc. do work, they add
or subtract the right hand side to the left hand.
If you want integer division, it is necessary to
use 2 slashes -- e.g. 6 // 5
is 1
while 6/5
is 1.2
. You use %
to get the remainder (modulo) of an integer division, so 6%5
is 1
for the remainder of 6//5
. This is a quick way of doing something every nth time in a loop, by checking if
i%n == 0
(e.g i%2==0
finds all the even i
s), or more commonly Not i%n
, as 0
is False
and anything
else it True
.
The "print" operator prints out one or more python items followed by a newline (leave a trailing comma at the end of the items to inhibit the newline).
A "raw" string literal is prefixed by an 'r' and passes all
the chars through without special treatment of backslashes, so r'x\\nx'
evaluates to the length-4 string 'x\\nx'
. Except for \"
(or \'
) which are
printed as "
and '
, so be careful with windows paths as r"c:\temp\"
is not valid as the \
hides the
final "
, see this stackoverflow
question
for more explanation.
raw = r'this\t\n and that'
print(raw) ## this\t\n and that
text = 'this\t\n and that'
print(text)
multi = """It was the best of times.
It was the worst of times."""
Here are some of the most common string methods. A method is like a
function, but it runs "on" an object. If the variable s
is a string,
then the code s.lower()
runs the lower()
method on that string object
and returns the result (this idea of a method running on an object is
one of the basic ideas that make up Object Oriented Programming, OOP).
Here are some of the most common string methods:
s.lower()
,s.upper()
-- returns the lowercase or uppercase version of the strings.strip()
-- returns a string with whitespace removed from the start and ends.isalpha()
/s.isdigit()
/s.isspace()
... -- tests if all the string chars are in the various character classess.startswith('other')
,s.endswith('other')
-- tests if the string starts or ends with the given other strings.find('other')
-- searches for the given other string (not a regular expression) withins
, and returns the first index where it begins or-1
if not founds.replace('old', 'new')
-- returns a string where all occurrences of'old'
have been replaced by'new'
s.split('delim')
-- returns a list of substrings separated by the given delimiter. The delimiter is not a regular expression, it's just text.'aaa,bbb,ccc'.split(',')
->['aaa', 'bbb', 'ccc']
. As a convenient special cases.split()
(with no arguments) splits on all whitespace chars.s.join(list)
-- opposite ofsplit()
, joins the elements in the given list together using the string as the delimiter. e.g.'---'.join(['aaa', 'bbb', 'ccc'])
->'aaa---bbb---ccc'
A google search for "python str" should lead you to the official python.org string methods which lists all the str methods.
Python does not have a separate character type. Instead an expression
like s[8]
returns a string-length-1 containing the character. With
that string-length-1, the operators ==
, <=
, ... all work as you would
expect, so mostly you don't need to know that Python does not have a
separate char
type.
The "slice" syntax is a handy way to refer to sub-parts of sequences --
typically strings and lists. The slice s[start:end]
is the elements
beginning at start and extending up to but not including end. Suppose we
have s = "Hello"
s[1:4]
is'ell'
-- chars starting at index 1 and extending up to but not including index 4s[1:]
is'ello'
-- omitting either index defaults to the start or end of the strings[:]
is'Hello'
-- omitting both always gives us a copy of the whole thing (this is the pythonic way to copy a sequence like a string or list)s[1:100]
is'ello'
-- an index that is too big is truncated down to the string length
The standard zero-based index numbers give easy access to chars near the
start of the string. As an alternative, Python uses negative numbers to
give easy access to the chars at the end of the string: s[-1] is the
last char 'o'
, s[-2]
is 'l'
the next-to-last char, and so on. Negative
index numbers count back from the end of the string:
s[-1]
is'o'
-- last char (1st from the end)s[-4]
is'e'
-- 4th from the ends[:-3]
is'He'
-- going up to but not including the last 3 chars.s[-3:]
is'llo'
-- starting with the 3rd char from the end and extending to the end of the string.
It is a neat truism of slices that for any index n,
s[:n] + s[n:] == s
. This works even for n negative or out of bounds.
Or put another way s[:n]
and s[n:]
always partition the string into
two string parts, conserving all the characters. As we'll see in the
list section later, slices work with lists too.
Python allows you to use a template style string rather than having to add a
number of strings together. This is more efficient if your program is going to
do a lot of string manipulation (as otherwise the strings keep having to be
copied). You tell python that it should format the string by prepending f
to it.
> pigs=3
> huff="huff"
> puff="puff"
> rest="blow your house down"
> text = f"{pigs} little pigs come out or I'll {huff} and {puff} and {rest}"
> text
"3 little pigs come out or I'll huff and puff and blow your house down"
> pigs=2
> text
"3 little pigs come out or I'll huff and puff and blow your house down"
> text = f"{pigs} little pigs come out or I'll {huff} and {puff} and {rest}"
> text
"2 little pigs come out or I'll huff and puff and blow your house down"
>
As you can see the variables' values are substituted into the string when it is
used, so if you change a value you need to recreate the string.
The variable placeholders in a string template can also be formatted for display; for example you might wish
to format a float
value to show 2 decimal places:
> import math
> f"pi to 3 decimal places is {math.pi:.3f}"
'pi to 3 decimal places is 3.142'
And if you are debugging a quick print statement for a variable can be done like this:
>>> var = '\N{snake} \N{snowman}'
>>> print(f"{var=}")
var='🐍 ☃'
>>>
See the Python string documentation for formatting examples.
Python also provides a full string formatter if you need to carry out type conversions of your variables but you are unlikely to need this in your day to day work.
In python3 unicode is supported by default. As English speakers we don't worry too
much about accents and other special characters, but they do crop up in place
and people's names. You can insert unicode characters using your keyboard (if
you know how to type it (alt-gr-^a on my machine)) or using the full unicode (\u00e2
) or hex (\xe2
) code
which are easy to look up on the internet. Llanarmon-yn-Iâl is a village in
Denbighshire, Wales if you were wondering.
See the python documentation for a longer discussion of unicode's development and how python uses it.
> place="Llanarmon-yn-Iâl"
> print(place)
Llanarmon-yn-Iâl
> place="Llanarmon-yn-I\u00E2l"
> print(place)
Llanarmon-yn-Iâl
>
> string = 'A unicode \u018e string \xf1'
> print(string)
A unicode Ǝ string ñ
> print('\N{snake} \N{snowman}')
🐍 ☃
One unusual Python feature is that the whitespace indentation of a piece of code affects its meaning. A logical block of statements such as the ones that make up a function should all have the same indentation, set in from the indentation of their parent function or "if" or whatever. If one of the lines in a group has a different indentation, it is flagged as a syntax error.
Python's use of whitespace feels a little strange at first, but it's logical and I found I got used to it very quickly. Avoid using TABs as they greatly complicate the indentation scheme (not to mention TABs may mean different things on different platforms). Set your editor to insert spaces instead of TABs for Python code.
A common question beginners ask is, "How many spaces should I indent?" According to the official Python style guide (PEP 8), you should indent with 4 spaces.
Python does not use { } to enclose blocks of code for if/loops/function
etc.. Instead, Python uses the colon (:) and indentation/whitespace
to group statements. The boolean test for an if does not need to be
in parenthesis, and it can have elif
and else
clauses (mnemonic:
the word elif
is the same length as the word else
).
Any value can be used as an if-test. The "zero" values all count as
false
: None
, 0
, ''
, empty list []
, empty dictionary {}
,
anything else is true. There is also a Boolean type with two values:
True
and False
(converted to an integer, these are 1
and 0
).
Python has the usual comparison operations: ==
,
!=
, <
, <=
, >
, >=
. That is equals, not equals, less than, less than or
equal to, greater than, and greater than or equal to.
The boolean operators are the spelled out
words and
, or
, not
(Python does not use the C-style &&
||
!
). Here's what the algorithm might look like for a policeman pulling over a
speeder -- notice how each block of then/else statements starts with a :
and the statements are grouped by their indentation:
if speed >= 80:
print('License and registration please')
if mood == 'terrible' or speed >= 100:
print('You have the right to remain silent.')
elif mood == 'bad' or speed >= 90:
print("I'm going to have to write you a ticket.")
else:
print("Let's try to keep it under 80 ok?")
I find that omitting the ":" is my most common syntax mistake when typing in the above sort of code, probably since that's an additional thing to type vs. my C++/Java habits. Also, don't put the boolean test in brackets -- that's a C/Java habit too. If the code is short, you can put the code on the same line after ":", like this (this applies to functions, loops, etc. also), although most people feel it's more readable to space things out on separate lines.
if speed >= 80: print('You are so busted')
else: print('Have a nice day')
Functions in Python are defined like this:
# Defines a "repeat" function that takes 2 arguments.
def repeat(s, exclaim):
result = s + s + s
if exclaim:
result = result + '!!!'
return result
Again, notice how the lines that make up the function and the if-statement are grouped by all having the same level of indentation.
The def
keyword defines the function with its parameters within
parentheses and its code indented. The first line of a function can be a
documentation string ("docstring") that describes what the function
does. The docstring can be a single line, or a multi-line description as
in the example above. (Yes, those are "triple quotes," a feature unique
to Python!) Variables defined in the function are local to that
function, so the "result" in the above function is separate from a
"result" variable in another function. The return
statement can take
an argument, in which case that is the value returned to the caller.
Here is code that calls the above repeat() function, printing what it returns:
print(repeat('Yay', False))
print(repeat('Woo Hoo', True))
Once you have defined a function (line repeat
) it can be used in exactly the same way as a builtin python
function. You follow it's name with a pair of brackets ('()') which should contain any required parameters. At
run time, functions must be defined by the execution of a "def
"
before they are called.
Scope is a fancy computer sciency way of saying who can see a variable. Essentially, a variable that is created with in a block of code can't be seen outside that block. Here's an example:
>>> def variable_test(v1):
... v2 = 100
... result = v1 + v2
... return result
...
>>> ian_var = 20
>>> print(variable_test(ian_var))
120
>>> print(ian_var)
20
>>> print(v1,v2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'v1' is not defined
>>>
So we define a function variable_test
that takes a single argument (v1
, in real code you should use a more
descriptive name). The function creates a second variable (v2
) and saves the value 100
in it. It then adds
that value to the input parameter (v1
) and stores that result in a variable called result
. It then
return
s that variable's value as the result of the function. So when we print
the result of the function
(by calling the function in a print
statement) we see the answer (120
in this case). Note how the value of
ian_var
is not changed and that we can't refer to either v1
or v2
outside of the block that defines the
function. We say that variables v1
, v2
and result
have function scope as they can only be seen
inside the function.
Python does very little checking at compile time, deferring almost all type, name, etc. checks on each line until that line runs.
if name == 'Guido':
print(repeeeet(name) + '!!!')
else:
print(repeat(name, False))
The if-statement contains an obvious error, where the repeat()
function
is accidentally typed in as repeeeet()
. The funny thing in Python ...
this code compiles and runs fine so long as the name at runtime is not
'Guido'. Only when a run actually tries to execute the repeeeet()
will
it notice that there is no such function and raise an error. This just
means that when you first run a Python program, some of the first errors
you see will be simple typos like this. This is one area where languages
with a more verbose type system, like Java, have an advantage ... they
can catch such errors at compile time (but of course you have to
maintain all that type information... it's a trade-off).
Modules are prebuilt collections of useful methods and variables that can be used by another program. They
either come with python, or can be downloaded and added to your installation using tools like pip
. Since
python would be much too big if it included all possible functions that people might need on a few functions
are defined by default (remember that python can run on very small machines). This allows a programmer to
import
just the modules that their program needs.
For example with the statement import sys
you can import and access the definitions
in the sys
module and make them available by their fully-qualified
name, e.g. sys.exit()
. sys
is also known as a "namespace" which is
how Python implements scope in modules. Namespaces
prevent conflicts between classes, methods and objects with the same
name that might have been written by different people.
import sys
# Now can refer to sys.xxx facilities
sys.version
There is another import form that looks like this: from sys import version
. That makes version available as version
without the module prefix;
however, we recommend the original form with the fully-qualified names
because it's a lot easier to determine where a function or attribute
came from.
There are many modules and packages which are bundled with a standard installation of the Python interpreter, so you don't have to do anything extra to use them. These are collectively known as the "Python Standard Library." Commonly used modules/packages include:
sys
— access toversion
,exit()
,argv
,stdin
,stdout
, ...re
— regular expressionsos
— operating system interface, file system
You can find the documentation of all the Standard Library modules and packages at https://docs.python.org/3.6/library/.
There are a variety of ways to get help for Python.
- Do a Google search, starting with the word "python", like "python list" or "python string lowercase". The first hit is often the answer. This technique seems to work better for Python than it does for other languages for some reason.
- The official Python docs site — docs.python.org — has high quality docs. Nonetheless, I often find a Google search of a couple words to be quicker.
- There is also an official Tutor mailing list specifically designed for those who are new to Python and/or programming!
- Many questions (and answers) can be found on StackOverflow, GIS Stackexchange and Quora.
- Use the help() and dir() functions (see below).
Inside the Python interpreter, the help() function pulls up documentation strings for various modules, functions, and methods. These doc strings are similar to Java's javadoc. The dir() function tells you what the attributes of an object are. Below are some ways to call help() and dir() from the interpreter:
help(len)
— help string for the built-inlen()
function; note that it's "len" not "len()", which is a call to the function, which we don't wanthelp(sys)
— help string for thesys
module (must do animport sys
first)dir(sys)
—dir()
is likehelp()
but just gives a quick list of its defined symbols, or "attributes"help(sys.exit)
— help string for theexit()
function in thesys
modulehelp('xyz'.split)
— help string for thesplit()
method for string objects. You can callhelp()
with that object itself or an example of that object, plus its attribute. For example, callinghelp('xyz'.split)
is the same as callinghelp(str.split)
.help(list)
— help string forlist
objectsdir(list)
— displayslist
object attributes, including its methodshelp(list.append)
— help string for theappend()
method forlist
objects
To practice the material in this section, try the string1.py exercise in the Basic Exercises.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 3.0 License, and code samples are licensed under the Apache 2.0 License.