In this chapter we will study one of the most important aspects of programming: strings. Almost any container, class, or data structure you encounter is ultimately composed of numeric, string, or boolean values. It is therefore important to learn strings well in order to make full use of them.
Python offers several kinds of string literals: single-quoted (with '), double-quoted (with "), triple-quoted (with ''' or """, which can span multiple lines), raw strings, and formatted strings, which start with f' or f" and end with the corresponding quote. Formatted strings can interpolate variables. We have already seen the rules for forming these, and we also know that Python has no separate character type. We also know that Python strings are Unicode strings, that is, sequences of Unicode code points; UTF-8 is only one of the encodings used to convert them to bytes. Note that calling the len function on a string gives the number of Unicode code points, not the number of bytes.
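The difference between character count and byte count, and the interpolation done by formatted strings, can be seen in a short sketch. The sample values and variable names here are illustrative assumptions, not taken from the chapter.

```python
# Illustrative values only; none of these names come from the chapter.
word = 'na\u00efve'                 # 'naïve' written with an escape: 5 code points
encoded = word.encode('utf-8')      # the character U+00EF needs two bytes in UTF-8

print(len(word))      # prints 5 (code points)
print(len(encoded))   # prints 6 (bytes)

# A formatted string evaluates the expressions inside the braces.
name = 'Robert'
greeting = f'Hello, {name}! Your name has {len(name)} letters.'
print(greeting)       # prints Hello, Robert! Your name has 6 letters.
```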
Single-quoted strings allow embedding double quotes, for example 'To quote Einstein, "Insanity is doing the same thing and expecting different results"', and double-quoted strings return the favour by allowing embedded single quotes, for example "Hello, are you Robert's son?". Triple-quoted strings, of course, allow embedding both single and double quotes.
Triple-quoted strings also preserve line breaks and other whitespace. Notice that the output is an exact replica of the input:
>>> x = """Hello there,
... This is an example of white space preservation in triple quoted strings.
... Thanks,
... Shiv
... """
>>> print(x)
Hello there,
This is an example of white space preservation in triple quoted strings.
Thanks,
Shiv

>>>
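The quoting rules described above can be checked directly. The following is a minimal sketch with made-up sample strings; it also shows the backslash-escaped alternative for comparison.

```python
# Sample strings are illustrative only.
single = 'He said, "Insanity is doing the same thing"'   # double quotes inside single quotes
double = "Hello, are you Robert's son?"                   # single quote inside double quotes
triple = '''Both "double" and 'single' quotes fit here'''

# The same text as `double`, written with an escaped quote instead.
escaped = 'Hello, are you Robert\'s son?'

print(double == escaped)   # prints True
print(triple)
```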
Raw strings start with r', r", r''' or r""" (the prefix may also be an uppercase R) and end with the corresponding quote. In a raw string a backslash is kept as a literal character, so escape sequences such as \n are not interpreted. This makes raw strings very useful for writing regular expressions, because you do not need two backslashes where you mean one. For example,
>>> R'he\nllo'
'he\\nllo'
>>> print(R'he\nllo')
he\nllo
>>> print('he\nllo')
he
llo
>>>
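To make the regular expression remark concrete, here is a small sketch using the standard re module. The pattern and the sample text are assumptions chosen for illustration.

```python
import re

# In a raw string the backslash reaches the regex engine unchanged.
pattern = r'\d+'
numbers = re.findall(pattern, 'order 12 shipped 345 items')
print(numbers)                   # prints ['12', '345']

# Without the r prefix the same pattern needs a doubled backslash.
same_pattern = '\\d+'
print(pattern == same_pattern)   # prints True

# A raw '\n' is two characters (backslash and n); a normal '\n' is one newline.
print(len(r'\n'), len('\n'))     # prints 2 1
```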
String literals that are part of a single expression and have only whitespace between them are implicitly concatenated into a single string literal. That is, ("Hello " "world") == "Hello world". You can use this to split a long string across several source lines as well.
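As a short sketch of implicit concatenation across lines (the message text is made up for illustration), note that the pieces are joined without any newline being inserted:

```python
# Adjacent literals inside the parentheses are joined into one string.
message = (
    'Adjacent string literals are joined '
    'into one string, '
    'even across lines inside parentheses.'
)
print(message)
print('\n' in message)   # prints False: no line breaks are added
```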
Strings can be indexed and they can be iterated like a sequence. Consider the following iteration for example:
>>> s = 'hello'
>>> for c in s:
...     print(c)
...
h
e
l
l
o
>>>
The other, more unwieldy, way to iterate is by index:
>>> s = 'hello'
>>> for i in range(len(s)):
...     print(s[i])
...
h
e
l
l
o
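When both the index and the character are needed, the built-in enumerate function is one common alternative to indexing through range(len(s)). This is a sketch of that option, not something the chapter prescribes; the list pairs exists only to collect the results.

```python
s = 'hello'
pairs = []                      # collected only so the result can be inspected
for i, c in enumerate(s):       # yields (index, character) tuples
    pairs.append((i, c))
    print(i, c)
```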
Like numeric values, strings are immutable. Any operation that appears to change a string actually creates a new object. Because of this, a string is hashable and can be used as a key in a dictionary. It also means that you cannot modify a string by index. For example,
>>> s = 'hellp'
>>> s[4] = 'o'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment
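Since item assignment is not allowed, the usual workaround is to build a new string. The sketch below shows two possible ways, using slicing with concatenation and the str.replace method; the variable names are illustrative.

```python
s = 'hellp'

# Take everything before index 4 and append the corrected character.
fixed = s[:4] + 'o'
print(fixed)      # prints hello
print(s)          # prints hellp (the original object is unchanged)

# str.replace also returns a new string rather than modifying s.
replaced = s.replace('p', 'o')
print(replaced)   # prints hello
```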
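To determine whether a value is hashable, you can try applying the hash function to it, as below. The exact number you see will vary between interpreter runs, because string hashing is randomized by default.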
>>> hash('hello')
-7630727383350627038
You cannot hash a mutable built-in data structure such as a list, dictionary, or set. Trying hash is therefore an easy way to determine what can be used as a dictionary key. Keys are typically strings or integers, but sometimes you may need a list as a key; in that case you can convert it to a tuple, which is immutable and is hashable as long as all of its elements are hashable.
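The hashability check and the list-to-tuple conversion can be sketched as follows. The helper is_hashable and the sample data are hypothetical names introduced only for this example.

```python
def is_hashable(value):
    """Return True if hash(value) succeeds (hypothetical helper for this example)."""
    try:
        hash(value)
    except TypeError:
        return False
    return True

print(is_hashable('hello'))     # prints True
print(is_hashable([1, 2]))      # prints False: lists are mutable
print(is_hashable((1, 2)))      # prints True
print(is_hashable((1, [2])))    # prints False: the tuple contains a list

# Converting a list to a tuple makes it usable as a dictionary key.
point = [3, 4]
distances = {tuple(point): 5.0}
print(distances[(3, 4)])        # prints 5.0
```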
You can create a string with the str constructor. It takes any object, or a bytes object together with an encoding. The constructor has two forms: class str(object='') and class str(object=b'', encoding='utf-8', errors='strict'). An example is given below:
>>> s = str('hello')
>>> print(s)
hello
>>> s = str(b'hello', encoding='utf-8')
>>> print(s)
hello
It is, however, much simpler to create strings by assigning a literal. When you invoke str(o) on any object o, its __str__() method is called; if the class does not define that method, __repr__() is used instead, which is like calling repr(o). When an encoding is given, the object should be a bytes or bytearray object, and it is decoded using that encoding. If no encoding is given, a bytes object is converted to its printable representation rather than decoded. For example,
>>> str(b'hello')
"b'hello'"
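The fallback from __str__() to __repr__(), and the effect of passing an encoding, can be demonstrated with a minimal sketch. The classes OnlyRepr and Both are hypothetical and exist only for this illustration.

```python
class OnlyRepr:
    """Defines __repr__ only, so str() falls back to it."""
    def __repr__(self):
        return 'OnlyRepr()'


class Both:
    """Defines both methods, so str() and repr() differ."""
    def __repr__(self):
        return 'Both()'

    def __str__(self):
        return 'a friendly Both'


print(str(OnlyRepr()))    # prints OnlyRepr()
print(str(Both()))        # prints a friendly Both
print(repr(Both()))       # prints Both()

decoded = str(b'hello', encoding='utf-8')   # bytes are decoded
undecoded = str(b'hello')                   # bytes are shown as their representation
print(decoded)            # prints hello
print(undecoded)          # prints b'hello'
```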
This is important to know because several library functions work with bytes. For example, cryptographic functions such as password hashing functions typically operate on bytes, while you may want to store their results in a database as strings. A small example of converting between strings and bytes is given below:
>>> 'hello'.encode('utf-8')
b'hello'
>>> b'hello'.decode('utf-8')
'hello'
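As a sketch of the password hashing scenario, the standard hashlib module can derive a digest from encoded text. A digest is arbitrary binary data and is generally not valid UTF-8, so this example stores it as hexadecimal text instead of calling decode. The password, salt, and iteration count are illustrative assumptions, not recommended production settings.

```python
import hashlib

password = 'correct horse'     # example value only
salt = b'example-salt'         # fixed here for illustration; real salts should be random

# The hashing function works on bytes, so the string is encoded first.
digest = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'), salt, 100_000)

stored = digest.hex()              # str form, suitable for a text column
restored = bytes.fromhex(stored)   # back to the original bytes

print(type(digest).__name__, len(digest))   # prints bytes 32
print(type(stored).__name__, len(stored))   # prints str 64
print(restored == digest)                   # prints True
```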
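Each character of a string can be reached by a non-negative index counted from the start or by a negative index counted from the end, as shown here for 'hello':
-5 | -4 | -3 | -2 | -1 |
 h |  e |  l |  l |  o |
 0 |  1 |  2 |  3 |  4 |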
Accessing a non-existent index raises an IndexError exception, as below:
>>> s = 'hello'
>>> print(s[5])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range
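The following sketch ties the positive and negative indices together and shows one possible way to guard against IndexError. The helper char_at and its default argument are hypothetical names introduced for this example.

```python
s = 'hello'

print(s[0], s[-5])    # prints h h (both refer to the first character)
print(s[4], s[-1])    # prints o o (both refer to the last character)


def char_at(text, index, default=''):
    """Return text[index], or default if the index is out of range (hypothetical helper)."""
    try:
        return text[index]
    except IndexError:
        return default


print(repr(char_at(s, 1)))        # prints 'e'
print(repr(char_at(s, 5)))        # prints ''
print(repr(char_at(s, -6, '?')))  # prints '?'
```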
© 2022 Shiv S. Dayal. www.ashtavakra.org. GNU FDL license v1.3 or later is applicable where not stated.