Chapter 5. Strings and Keyboard IO

5.1. What is a string?
5.2. Operations on Strings

In this chapter we will study one of the most important aspects of programming: strings. Most containers, classes, and data structures you will encounter are ultimately built from numbers, strings, and booleans. Thus, it is imperative that we learn strings well in order to make full use of them.

5.1. What is a string?

In Python there are several forms of string literal, and all of them produce objects of the same type, str: single-quoted (with '), double-quoted (with "), triple-quoted (with ''' or """; these can span multiple lines without any continuation characters in between), raw strings, and formatted strings (f-strings), which start with f' or f" and end with the corresponding quote. Formatted strings can interpolate variables. We have already seen the rules for forming these, and we also know that Python has no separate character type: a single character is simply a string of length 1. Python strings are Unicode strings, i.e. sequences of Unicode code points; UTF-8 is one common way of encoding them as bytes. Note that when you call the len() function on a string, it returns the number of code points, not the number of bytes.
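A minimal sketch of the points above: every literal form produces an ordinary str, an f-string interpolates a variable, and len() counts code points rather than UTF-8 bytes. The variable names and sample words are arbitrary choices for illustration.

```python
# Each literal form below creates an object of the same type, str.
single = 'hello'
double = "hello"
triple = """hello"""
raw = r'hello'
name = 'Shiv'
formatted = f'hello {name}'   # {name} is replaced by the variable's value

for value in (single, double, triple, raw, formatted):
    print(type(value).__name__, repr(value))

# len() counts Unicode code points; encoding to UTF-8 reveals the byte count.
word = 'na\u00efve'                 # 'naïve', with ï as the single code point U+00EF
print(len(word))                   # 5 code points
print(len(word.encode('utf-8')))   # 6 bytes: ï takes two bytes in UTF-8
```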

Single-quoted strings allow embedding of double quotes, for example 'To quote Einstein, "Insanity is doing the same thing and expecting different results"', and double-quoted strings return the favour by allowing embedding of single quotes, for example "Hello, are you Robert's son?". Triple-quoted strings, of course, allow embedding both single and double quotes.
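The quoting rules can be checked with a short sketch. It also shows a backslash escape, which is the standard alternative when a string must contain the same quote character that delimits it; the variable names are illustrative only.

```python
# Each quote style can hold the other kind of quote without escaping.
quote = 'To quote Einstein, "Insanity is doing the same thing and expecting different results"'
question = "Hello, are you Robert's son?"
both = '''He said, "It's fine."'''   # triple quotes hold both kinds

# A backslash escape lets a single-quoted string contain a single quote.
escaped = 'Hello, are you Robert\'s son?'

print(quote)
print(question)
print(both)
print(escaped == question)   # True
```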

Important

Triple quotes preserve any whitespace inside the string, for example:
>>> x = """Hello there,
...     This is an example of white space preservation in triple quoted strings.
... Thanks,
... Shiv
... """
>>> print(x)
Hello there,
    This is an example of white space preservation in triple quoted strings.
Thanks,
Shiv

>>>
      
Notice that the output is an exact replica of the input.
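To confirm that the indentation and the final newline really are part of the string value, and not just an effect of print(), you can inspect the individual lines. A small sketch using the same text as the session above:

```python
x = """Hello there,
    This is an example of white space preservation in triple quoted strings.
Thanks,
Shiv
"""

lines = x.splitlines()
print(len(lines))            # 4
print(repr(lines[1][:8]))    # '    This': the four leading spaces are kept
print(x.endswith('\n'))      # True: the newline before the closing quotes is kept too
```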

Raw strings start with r', r", r''', r""", R', R", R''' or R""" and end with the corresponding quote. In a raw string a backslash is treated as an ordinary character, so escape sequences such as \n are not interpreted. This makes raw strings very useful for writing regular expressions, because you do not need two backslashes where you mean one. For example,

>>> R'he\nllo'
'he\\nllo'
>>> print(R'he\nllo')
he\nllo
>>> print('he\nllo')
he
llo
>>>
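A short sketch of the regular-expression use case, using the standard re module. The sample text is made up for illustration.

```python
import re

# A raw string keeps the backslash, so r'\d+' and '\\d+' are the same three characters.
print(r'\d+' == '\\d+')          # True
print(len(r'\n'), len('\n'))     # 2 1: backslash plus n, versus a single newline

# Find every run of digits in a sample sentence.
text = 'Room 101, floor 7'
print(re.findall(r'\d+', text))  # ['101', '7']
```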
    

String literals that are part of a single expression and have only whitespace between them are implicitly joined into a single string literal. That is, ("Hello " "world") == "Hello world". You can use this to split a long string across several source lines by wrapping the pieces in parentheses; note that no newline is inserted between the pieces.
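A minimal sketch of implicit concatenation: the parentheses allow the literal to continue over several source lines, and a real line break appears in the result only where you write '\n' yourself. The message text is arbitrary.

```python
# Adjacent literals inside parentheses are joined into one string.
message = ("This long message is written "
           "across two source lines, "
           "but it is a single string.")
print(message)
print('\n' in message)   # False: no newline is added between the pieces

# Include '\n' explicitly if the result should really span several lines.
lines = ("first line\n"
         "second line\n")
print(lines, end='')
```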

5.2. Operations on Strings

Strings are sequences: they can be indexed and iterated over. Consider the following iteration, for example:

>>> s = 'hello'
>>> for c in s:
...     print(c)
...
h
e
l
l
o
>>>
    

Another, more unwieldy, way to iterate is by index:

>>> s = 'hello'
>>> for i in range(len(s)):
...     print(s[i])
...
h
e
l
l
o
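When you need both the position and the character, the built-in enumerate() is the usual alternative to range(len(s)). A small sketch; the pairs list exists only so the result can be inspected afterwards.

```python
s = 'hello'
pairs = []

# enumerate() yields (index, character) pairs, so no manual indexing is needed.
for i, c in enumerate(s):
    print(i, c)
    pairs.append((i, c))
```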
    

Like numbers, strings are immutable. Any operation that seems to change a string actually creates a new string object. Because strings are immutable, they are hashable and can be used as dictionary keys. It also means that you cannot modify a string by assigning to an index. For example,

>>> s = 'hellp'
>>> s[4] = 'o'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment
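Since item assignment is not allowed, the usual way to "fix" the string is to build a new one, for example with slicing and concatenation or with the replace() method. A minimal sketch:

```python
s = 'hellp'
original = s

# Build a new string from a slice plus the replacement character.
s = s[:4] + 'o'
print(s)               # hello
print(original)        # hellp: the old object is unchanged
print(s is original)   # False: s now names a different object

# str methods such as replace() also return new strings.
t = 'hellp'.replace('p', 'o')
print(t)               # hello
```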
    

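To check whether a value is hashable, you can try applying the built-in hash() function to it, like below (the exact number differs between interpreter sessions, because string hashing is randomized by default):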

>>> hash('hello')
-7630727383350627038
    

You will not be able to hash mutable built-in data structures such as lists, dictionaries, or sets; hash() raises TypeError for them. This is a very easy way to determine what can be a key of a dictionary. Typically, dictionary keys are strings or integers, but if you need a list as a key, you can convert it to a tuple, which is an immutable type (a tuple is hashable only if all of its elements are hashable).
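A minimal sketch of the hashability test and the list-to-tuple conversion; the coordinates and the 'home' label are illustrative values only.

```python
# Lists are mutable, so hash() rejects them with TypeError.
try:
    hash([1, 2])
    list_is_hashable = True
except TypeError:
    list_is_hashable = False
print(list_is_hashable)      # False

# A tuple built from the list is immutable and hashable, so it can be a key.
coordinates = [10, 20]
places = {tuple(coordinates): 'home'}
print(places[(10, 20)])      # home

# A tuple that contains a list is still unhashable.
try:
    hash((1, [2, 3]))
    nested_is_hashable = True
except TypeError:
    nested_is_hashable = False
print(nested_is_hashable)    # False
```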

You can also create a string with the str constructor. It accepts any object, or a bytes-like object together with an encoding. There are two forms of this constructor: class str(object='') and class str(object=b'', encoding='utf-8', errors='strict'). An example is given below:

>>> s = str('hello')
>>> print(s)
hello
>>> s = str(b'hello', encoding='utf-8')
>>> print(s)
hello
    

It is, though, much simpler to create strings with literals and assignment. When you invoke str(o) on an object o, its __str__() method is called; if the class does not define __str__(), the default implementation falls back to __repr__(), so the result is the same as calling repr(o). When an encoding is given, the object should be a bytes or bytearray object. If no encoding is given, a bytes object is converted to its string representation rather than decoded. For example,

>>> str(b'hello')
"b'hello'"
    
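Both behaviours can be seen in one small sketch. The Point class is a hypothetical example that defines only __repr__, so str() has to fall back to it.

```python
class Point:
    """Hypothetical example class that defines __repr__ but not __str__."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        return f'Point({self.x}, {self.y})'


p = Point(1, 2)
# No __str__ is defined, so str() falls back to __repr__.
print(str(p))                  # Point(1, 2)
print(str(p) == repr(p))       # True

data = b'hello'
print(str(data))               # b'hello': no encoding, so this is the representation
print(str(data, 'utf-8'))      # hello: with an encoding, the bytes are decoded
```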

This is important to know because several library functions work with bytes; cryptographic functions such as password-hashing functions, for example, typically operate on bytes, while you may want to store their results in a database as strings. A small example of converting between strings and bytes with encode() and decode() is given below:

>>> 'hello'.encode('utf-8')
b'hello'
>>> b'hello'.decode('utf-8')
'hello'
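With plain ASCII text the bytes look just like the string, which hides what encoding does. A sketch with a non-ASCII character makes the difference visible and shows what happens when the bytes are decoded with a codec that cannot handle them; the word 'café' is an arbitrary example.

```python
text = 'caf\u00e9'                        # 'café'
encoded = text.encode('utf-8')
print(encoded)                           # b'caf\xc3\xa9': é becomes two bytes
print(encoded.decode('utf-8') == text)   # True: decoding restores the string

# Bytes above 127 are not valid ASCII, so this decode raises UnicodeDecodeError.
try:
    encoded.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
print(ascii_ok)                          # False
```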
    

5.2.1. String Indexes

The following table shows, visually, how positive and negative indexes map onto the characters of a string (lists are indexed the same way):
Negative index:   -5  -4  -3  -2  -1
Character:         h   e   l   l   o
Positive index:    0   1   2   3   4
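The table can be checked directly in code: a negative index i refers to the same character as the positive index i + len(s). A minimal sketch:

```python
s = 'hello'

# Negative indexes count from the end: -1 is the last character.
print(s[0], s[-5])    # h h
print(s[4], s[-1])    # o o

# For every valid positive index i, s[i] and s[i - len(s)] are the same character.
print(all(s[i] == s[i - len(s)] for i in range(len(s))))   # True
```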

Accessing an index that does not exist raises an IndexError exception, like below:

>>> s = 'hello'
>>> print(s[5])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range
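If an out-of-range index is possible, you can catch IndexError instead of letting it stop the program. A sketch under that assumption; char_at is a hypothetical helper name, not a built-in.

```python
def char_at(s, i, default=''):
    """Hypothetical helper: return s[i], or default if i is out of range."""
    try:
        return s[i]
    except IndexError:
        return default


print(repr(char_at('hello', 1)))    # 'e'
print(repr(char_at('hello', 5)))    # '' because index 5 does not exist
print(repr(char_at('hello', -6)))   # '' because -6 is out of range as well
```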
    

© 2022 Shiv S. Dayal. www.ashtavakra.org. GNU FDL license v1.3 or later is applicable where not stated.