Session 2: Strings and things

Introduction

In the previous workshop you learned how to write simple Python programs that can do things with data - e.g. print it to the screen, or read it in from a file. The data you worked with was bits of text, e.g. ‘Hello World!’.

More exactly, the data objects (remember, everything in Python is an object) were sequences of characters (letters, digits, spaces, punctuation and so on) whose start and end were marked with quote symbols ('). In Python, these types of data objects are called strings.

The focus of this workshop is to look at the things you can do with strings in Python, in the course of which you will learn some basic concepts that will be very important when you look at other types of data as well.

Task 1

In a terminal window, navigate to your PHAR2062/Session 2 - Strings and things/ folder.

There is a series of directed tasks to perform later on in this workshop, but also feel free to experiment with writing and running your own small Python programs in this folder, prompted by the suggestions you will come across in the sections below.

Making strings

Remember in the last workshop you saw an assignment statement:

message = 'Hello World!'

You learned that in this type of statement, the thing on the right is an object, and the thing on the left is a target. Now we can be a bit more specific: the type of object on the right is a string, and the type of target on the left is a string too. But we can get even more specific: 'Hello World!' is a string literal and message is a string variable. So what’s the difference?

Let’s start with the string variable, message. Remember the code:

message = 'Hello World!'
print(message)

When you ran this code, it printed Hello World! on the screen - it didn’t print message. That’s because the print() function prints the value of the variable, not the name of the variable. We call message a string variable because we can change its value without changing its name, e.g.:

message = 'Hello World!'
print(message)
message = 'Python is fun'
print(message)

will print the two different messages one after the other, even though the argument to the print() function was the same. (If you want to check this for yourself, you know how - look back to the last session if you need a reminder of how to write the code into a file and run it!)

So what about 'Hello World!'? This is a string literal because, in effect, its name is its value. There is no way that:

print('Hello World!')

is going to print anything other than Hello World! on the screen.

Now here’s another example:

greeting = 'Hello World!'
message = greeting
print(message)

You can probably guess what running this code will print, but test it for yourself if you want to. Look at the second line: it’s an assignment statement as before, but now the object on the right-hand side is a string variable, not a string literal. So you can see that in situations like this, the assignment causes a new string variable to be created, whose name is whatever you, the programmer, have chosen, and whose value is equal to the value of the string variable on the right-hand side.

This may all seem a bit pedantic, but it’s important that you have a clear understanding of the difference between what the word variable means in Python and what the word value means, and it’s also very important that you can see that in a single assignment statement two things happen:

  1. A new variable is created, with a name chosen by the programmer.

  2. A value is assigned to this new variable.

If you have studied other programming languages before, you may realise that this is different from what happens in some of them, where the processes of creating variables and giving them a value are separated out.

Something to be careful about: in an assignment statement, Python doesn’t check to see if the variable name you are using is already being used somewhere else - if it is, it just gets overwritten and whatever it was before is gone!
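To see overwriting in action, here is a small sketch (the variable names and the string 'Goodbye!' are just illustrative). The second assignment to greeting replaces its old value, but message keeps 'Hello World!', because message was given greeting's value at the moment of the assignment; it is not tied to the name greeting.

```python
greeting = 'Hello World!'
message = greeting        # message is given the current value of greeting
greeting = 'Goodbye!'     # greeting is overwritten: its old value is gone from greeting
print(greeting)           # Goodbye!
print(message)            # Hello World!
```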

Before we move on, we must talk about those quotes ('). Python is just as happy if you use double quotes (") to mark the start and end of string literals:

greeting = "Hello World!"

But it’s an error to mix them - so the following will not work and you will get an error message if you try it:

greeting = 'Hello World!"

The fact that you can use either type of quote can be useful if you want a quote character to actually appear in your string - use the other sort to wrap it, e.g.:

message = "It's a lovely day"
print(message)

will print It’s a lovely day. If you try:

message = 'It's a lovely day'
print(message)

You will see it doesn’t work - Python sees a complete string literal, 'It', followed by s a lovely day' (with a quote at the end but none at the start), which it has no idea what to do with. (There will be a session on understanding Python error messages later.)
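The same trick works the other way round: if you want double quotes to appear inside a string, wrap the whole thing in single quotes. A minimal sketch (the strings themselves are arbitrary examples):

```python
message = "It's a lovely day"        # a single quote inside double quotes
remark = 'She said "hello" to me'    # double quotes inside single quotes
print(message)                       # It's a lovely day
print(remark)                        # She said "hello" to me
```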


Task 2: Manipulating strings in Python

We use programs to analyse and manipulate data, so let’s start by looking at some of the ways you can analyse and manipulate strings. Again, the concepts introduced here will have much wider applicability later.

Task 2a

Create a file called stringlength.py with the following content:

greeting = "Hello"
print(greeting)
print(len(greeting))

Then run it (python stringlength.py). You should get:

Hello
5

So what’s happened here?

Analysis

I won’t insult your intelligence by discussing the first two lines, so straight on to the third. There is a print() function of the sort you have seen many times before, and the argument to the function is - as you can tell from the brackets - a call to another function, len(), whose own argument is greeting, which we know is a string variable.

As we discussed in the last session, most functions don’t just take in arguments, they return something. Whatever the len() function returns is passed to print(), so we can see what it is: 5. If you are thinking that the len() function returns the number of characters in the string, then you are exactly right.

Task 2b

Edit stringlength.py so the first line is greeting = "Hello World!", and run again.

Analysis

The len() function tells you the total length of the string, including spaces, exclamation marks, etc. As we said right at the start, a string is a sequence of characters, and you can think of a character as being anything that you could generate by pressing one key on your keyboard - so let’s explore that a bit further.
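Before moving on, a quick way to convince yourself that every character counts, spaces and punctuation included, is to compare a few lengths side by side. This is just an illustrative sketch; note that '' (two quotes with nothing between them) is a perfectly good string, called the empty string, with length 0.

```python
print(len("Hello"))          # 5
print(len("Hello World!"))   # 12: the space and the ! both count
print(len(" "))              # 1: a single space is one character
print(len(""))               # 0: the empty string contains no characters
```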

Task 2c

Edit stringlength.py so the first line is greeting = "Hello\nWorld!". Type carefully - notice we have replaced the space character by \n. Now run again.

Analysis

The output you should have got is:

Hello
World!
12

So: greeting has printed out over two lines, and the total number of characters in the string is apparently still 12, even though we replaced a single space by \n. What’s going on?

If you wanted to type ‘Hello’ and ‘World!’ on two lines directly - e.g. in a text file - you would press the keys for ‘Hello’, then the return key (↲), then the keys for ‘World!’ - i.e. a total of 12 keys. What is happening is that in the string literal "Hello\nWorld!", the two-character sequence \n is code for a single character - the one generated by pressing the return key. The sequence \n is called an escape sequence. There are quite a few other escape sequences, all of which help you create string literals containing characters that Python would otherwise not understand, but we will save discussing those for later.

We introduce \n here because it will be very important when we get further into loading data into our Python programs from files.
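If you want to check directly that \n really is one character, a sketch like this compares some lengths (the variable name newline is just for illustration):

```python
newline = "\n"
print(len(newline))             # 1: typed as two characters, stored as one
print(len("Hello\nWorld!"))     # 12
print(len("Hello World!"))      # 12: the same length as the version with a space
```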

Task 3: Slicing

What is “slicing”? You will find out here.

Task 3a

Create a file stringslice.py with the following content:

greeting = "Hello"
print(greeting[0])
print(greeting[1])
print(greeting[2])
print(greeting[3])
print(greeting[4])

And run it. You should get:

H
e
l
l
o

Analysis

We know the string greeting is a sequence of five characters, and we see each of them printed out in turn. So in Python, greeting[0] means “the first character in the string greeting”, greeting[1] means “the second character”, and so on. So immediately we have uncovered a very important aspect of Python: IN PYTHON YOU COUNT FROM ZERO.

So square brackets ([]) allow you to get at the individual characters in a string. The number inside the square brackets is called the index.
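Because counting starts at zero, the last character of a string always has an index one less than the string's length. A minimal sketch (last_index is just an illustrative name):

```python
greeting = "Hello"
last_index = len(greeting) - 1   # 5 characters, so the indexes run from 0 to 4
print(last_index)                # 4
print(greeting[last_index])      # o
# greeting[5] would stop the program with an IndexError: there is no sixth character
```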

Task 3b

You can do more with square brackets too. Edit stringslice.py to read:

greeting = "Hello"
print(greeting[0])
print(greeting[0:1])
print(greeting[0:2])
print(greeting[0:3])
print(greeting[3:5])

and run it.

Analysis

You should have got:

H
H
He
Hel
lo

So what has happened?

The first line of output is the same as you got before: greeting[0] means “the first character in greeting”. In the remaining lines, instead of having a single index number inside the square brackets, there are two, separated by a colon (:). This is called a slice. It may seem a little strange at first, but you should be able to work out that in each line, the syntax a:b in the slice means “the subsequence from index a, up to but excluding index b”, or alternatively “every index in the range from a to b-1”.

The way Python ‘stops’ one before b can be a bit hard to come to terms with at first and is a common cause of errors for beginners, but ultimately you will find it makes sense (honest!).
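One reason the ‘up to but excluding’ rule turns out to be convenient: the slice a:b always contains exactly b - a characters, and slicing at the same index k from both sides splits a string cleanly in two, with nothing lost or repeated. In this sketch, k, first and second are illustrative names, and + joins two strings end to end (it isn't covered in this session).

```python
greeting = "Hello"
print(greeting[1:4])                  # ell
print(len(greeting[1:4]))             # 3, i.e. 4 - 1
k = 2
first = greeting[0:k]                 # He  (indexes 0 and 1)
second = greeting[k:len(greeting)]    # llo (indexes 2, 3 and 4)
print(first)
print(second)
print(first + second)                 # Hello: nothing lost, nothing repeated
```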

Task 3c

Edit stringslice.py to read:

greeting = "Hello"
print(greeting[1:])
print(greeting[:4])
print(greeting[-1])
print(greeting[-3:])

And run it.

Analysis

You should have got:

ello
Hell
o
llo

So what has happened? From line 2 of the program (greeting[1:]) you can see that if no index is given after the colon, it’s assumed to be the length of the sequence (which is one more than the index of the last character - think about it!). From line 3 (greeting[:4]) you can see that if no index is given before the colon, it’s assumed to be zero (the start of the string sequence). From lines 4 and 5 you can see that if the index is negative then you are counting from the right-hand end of the string instead (so -1 is the right-most character, -2 the one before that, etc.). You will see later that negative indexing is useful if you want to get at parts of a string but you don’t know how long it is.
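As an illustration of negative indexing when you don't know the length in advance, here is a sketch that splits a file name into its name and extension. The file name is just an example, and the code assumes the extension is always a dot followed by three letters.

```python
filename = "words.txt"
extension = filename[-3:]   # the last three characters: txt
stem = filename[:-4]        # everything except the last four characters (".txt"): words
print(extension)
print(stem)
```

The same two slices would work unchanged on "notes.txt" or "my_long_filename.txt", which is the point: you never had to know how long the name was.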

Task 3d

Edit stringslice.py to read:

greeting = "Hello World!"
print(greeting[0:-1:2])
print(greeting[4::-1])

And run it.

Analysis

You should have got:

HloWrd
olleH

You can see that slices can be more complicated - they can have the syntax start:end:increment. In line 2 of the program, [0:-1:2] means “from index 0 up to, but excluding, the last character, in increments of 2”, while in line 3, [4::-1] means “from index 4 back to the first character, in increments of -1”.
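Two handy idioms follow when start and end are both left out: [::2] takes every second character of the whole string, and [::-1] walks backwards through the whole string, giving a reversed copy. A short sketch:

```python
greeting = "Hello World!"
print(greeting[::2])    # HloWrd: indexes 0, 2, 4, 6, 8, 10
print(greeting[::-1])   # !dlroW olleH: the whole string reversed
```

Here [::2] happens to give the same result as [0:-1:2], because stepping by 2 from index 0 never lands on index 11 (the !) anyway.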

Don’t worry too much about all the fine details here - some of this fancy indexing and slicing will turn out to be quite useful, but other parts you will rarely need.

Programming Challenge

Create a text file called words.txt with whatever content you like, and write a Python program to print out how many characters it contains.

Hint: Look at the program hello.py you wrote in the previous session.

Stretch challenge

Adapt your Python program so:

  1. It starts by prompting you for the name of the file to analyse.

  2. It prints the message “File X contains N characters”, where X is the name of the file and N is the number of characters.

Summary

In this workshop you have:

  1. Met your first Python data type - the string.

  2. Seen how to create string variables from string literals (or other string variables) using assignment statements.

  3. Learned about the difference between a variable and a value.

  4. Learned the rules about using quotes (' or ") to define string literals.

  5. Been introduced to the \n escape sequence.

  6. Learned about the len() function.

  7. Learned how indexes and slices allow you to extract subsequences from strings.

Important concepts you have been introduced to include:

  • How in Python we count from zero

  • How in Python when we define the start and end values in a range (e.g. a slice) we include the start value but exclude the end value.