Session 4: Lists#
Introduction#
In this session you will be learning about lists - one of the fundamental tools Python provides for you to organise, manipulate, and transform data, potentially in very large quantities.
Almost immediately you will see a relationship between lists and what you have learned so far about the way Python handles and allows you to manipulate strings.
Task 1#
In a terminal window, navigate to your PHAR2062/Session 4 - Lists folder.
Start a Jupyter Notebook session, and create a fresh, empty notebook (look over the material from the last session if you need a reminder of how to do this).
Copy the following code into the first cell in your new notebook, and then run it (press the ‘run’ button or hit <Shift>+<Enter>).
elements = ['Carbon', 'Hydrogen', 'Nitrogen', 'Oxygen']
print(elements)
print(len(elements))
print(elements[1])
print(elements[:-1])
Analysis#
You will recognise the first line of code as an assignment statement. It creates a target called elements from what’s on the right of the equals sign. That is a series of strings, separated by commas, within a set of square brackets ([]). This is a list - specifically, a list literal. So elements is a list object.
In the second line you see that if you print() a list, it is printed in a form that resembles the list literal we used to create it.
Can you work out what has happened in the third line? You have come across the len() function before, when studying strings - you know what it did then, can you see what it is doing here? The length of a list is the number of items it contains, in the same way as the length of a string is the number of characters it contains.
Lines four and five introduce you to the indexing and slicing of lists - you should quickly be able to work out what is happening here, from what you learned about indexing and slicing strings in the last session.
One thing that can slightly confuse beginners is the different meaning of square brackets ([]) in line 1, compared to lines 4 and 5.
In line 1, square brackets are part of the syntax for writing a list literal in Python - they delimit the start and end of the items in the list in the same way that quotes (' or ") delimit the start and end of the sequence of characters in a string.
In lines 4 and 5, the square brackets appended to the end of the object indicate that indexing or slicing is happening.
Because these two situations in which square brackets occur are so different, any confusion you have about this should quickly be resolved.
Task 2#
Copy the following code into the empty cell at the bottom of your notebook and then run it:
elements.append('Phosphorus')
print(elements)
elements.remove('Hydrogen')
print(elements)
print(elements.index("Nitrogen"))
Analysis#
If you think (or look) back to session 2, you should be able to work out that line 1 features a function (because of the () brackets) that is taking an argument 'Phosphorus', and that this function, append(), is actually a method of the list object elements (because of the period .).
In line 2 you see what the append() method of a list does: it adds the argument to the end of the list as an extra item.
You will be able to work out what’s happening in lines 3 and 4 for yourself, I’m sure.
Important
One important thing to notice about the append() and remove() methods of a list object: like the print() function, but unlike many other functions or methods (e.g. the open() function that you saw in session 2, which returns a file object), while they take arguments, they don’t return anything useful (strictly speaking, they return the special value None). Instead they modify the list object in place - that is, the list is different after the method is called from how it was before.
Line 5 introduces a further method of a list object: index(). Now we are back to a method that returns something - I’m sure you can work out what!
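If you would like to check your answers, the following sketch captures what each method hands back. The names result and position are invented for this illustration; the list starts from the same four elements as in Task 1.

```python
elements = ['Carbon', 'Hydrogen', 'Nitrogen', 'Oxygen']

# append() changes the list itself and hands back None
result = elements.append('Phosphorus')
print(result)      # None
print(elements)    # ['Carbon', 'Hydrogen', 'Nitrogen', 'Oxygen', 'Phosphorus']

# index() leaves the list alone and returns a position
position = elements.index('Nitrogen')
print(position)    # 2
```

A common slip is to write elements = elements.append('Phosphorus'), which would replace your list with None.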
Task 3#
Copy (exactly!) the following code into the empty cell at the bottom of your notebook and then run it:
for element in elements:
    print(element)
Analysis#
Congratulations, you have just been introduced to one of the most important data processing concepts in Python (and indeed most other computing languages): iteration.
Iteration is about taking, one by one, each of the items in a collection of items and doing something with it - an extremely common situation in any coding or data analysis workflow.
In this case, each of the items in the list elements is taken in turn and printed to the screen.
Line 1 shows the syntax of the Python code required to do an iteration - it takes the form of something like for A in B: where B is some collection of data items (here a list) and A is a variable which - in a loop - is sequentially assigned the value of each item. The colon (:) character at the end of the line signals that the lines that follow are part of the loop - but only if they are indented.
The second line is indeed indented, so it gets run four times - once for each of the values that A (in this case, element) can take.
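One way to picture what the loop is doing is to write the same steps out by hand. This is only a sketch of the idea, assuming elements holds the four items left after Task 2; Python does not literally rewrite your code like this.

```python
elements = ['Carbon', 'Nitrogen', 'Oxygen', 'Phosphorus']

# The loop version
for element in elements:
    print(element)

# Roughly the same steps, written out one by one
element = elements[0]
print(element)
element = elements[1]
print(element)
element = elements[2]
print(element)
element = elements[3]
print(element)
```

Both halves print the same four names. The loop version keeps working unchanged if the list grows to forty or four thousand items.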
Remember that as the programmer, you can give a variable any name you like (almost - see below!). Here we wrote for element in elements:, but it would work just as well if you wrote for item in elements: or for e in elements: - but the first choice is fairly obviously the one that makes the code easiest to understand.
Note
The word for in line 1 is the first Python reserved word you have met. Reserved words have a special meaning in Python and can’t be used for anything else. In particular, although you have just been told that you can use almost any name you like for a variable, for is not one of them: if you tried the code for = "Hydrogen" it would fail to run, and you would get an error message.
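If you are ever unsure whether a name is reserved, Python can tell you. The sketch below uses the keyword module from Python's standard library, which this session has not otherwise introduced, so treat it as an optional extra.

```python
import keyword

print(keyword.iskeyword('for'))       # True
print(keyword.iskeyword('in'))        # True
print(keyword.iskeyword('element'))   # False
```

Notice that in, the other special word in the loop's first line, is reserved too.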
Task 4#
Edit the code in the current cell so it reads:
for element in elements:
    print(element)
    print(len(element))
    print(element[0])
print('There are', len(elements), 'elements in the list')
Now re-run the cell.
Analysis#
Here you see how Python uses indentation to mark which parts of the code are included in the iteration loop. Lines 2-4 get run for each item in the list because they are all equally indented, but the last (unindented) one only once, after the iteration is complete.
Python requires all the lines in the loop to be equally indented, but doesn’t mind what that level of indentation is - it could be 1, 2, or 10 spaces, and all would work. Four spaces (as used here) is, however, a sort of “standard”, and we suggest you stick to this in your own code.
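The point about indentation width can be tried out with a short sketch like the one below. The list names two_space and four_space are invented for this illustration, and elements is assumed to hold the four items left after Task 2.

```python
elements = ['Carbon', 'Nitrogen', 'Oxygen', 'Phosphorus']

# Loop body indented by two spaces
two_space = []
for element in elements:
  two_space.append(element[0])

# Loop body indented by four spaces (the usual convention)
four_space = []
for element in elements:
    four_space.append(element[0])

print(two_space)                 # ['C', 'N', 'O', 'P']
print(four_space)                # ['C', 'N', 'O', 'P']
print(two_space == four_space)   # True
```

What Python will not accept is a mixture of widths inside a single loop body.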
An Aside: More on Indenting and Formatting Python Code#
This is a good moment to talk a little bit more about how to lay out Python code when you write it.
1. Python statements that feature brackets can be split over multiple lines#
Here’s a valid piece of Python code:
elements = ['Carbon', 'Hydrogen', 'Nitrogen', 'Oxygen']
And here’s some more code, which is equally valid and does exactly the same thing:
elements = [
    'Carbon',
    'Hydrogen',
    'Nitrogen',
    'Oxygen'
]
This works because the Python interpreter, having reached the [ character on the first line, knows that the (invisible) \n character at the end of the line cannot be marking the end of the statement - that must come after it finds the matching ] character, however many lines further down the code that comes.
Notice that it’s important that the opening bracket is on that first line. For example, attempting to format the code like this:
elements =
[
    'Carbon',
    'Hydrogen',
    'Nitrogen',
    'Oxygen'
]
will not work.
Note the indentation used here - unlike the indentation in loops, the indentation in these multi-line statements is just to make them as readable as possible. So:
elements = [
'Carbon',
        'Hydrogen', 'Nitrogen',
  'Oxygen'
    ]
would be perfectly valid, but doesn’t look very nice.
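One way to convince yourself that the layout inside brackets makes no difference is to compare the resulting lists directly. In this sketch the names compact, tidy and untidy are made up for the illustration.

```python
compact = ['Carbon', 'Hydrogen', 'Nitrogen', 'Oxygen']

tidy = [
    'Carbon',
    'Hydrogen',
    'Nitrogen',
    'Oxygen'
]

untidy = [
'Carbon',
        'Hydrogen', 'Nitrogen',
  'Oxygen'
    ]

# All three layouts produce exactly the same list
print(compact == tidy)     # True
print(compact == untidy)   # True
```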
This useful feature of brackets also applies in functions:
print('Carbon', 'Hydrogen',
      'Nitrogen', 'Oxygen')
Again, exactly how you lay these multi-line statements out is up to you. However, there are tried-and-tested guidelines out there you can follow, and when later in this course you are introduced to writing Python programs with the aid of an Integrated Development Environment (IDE), you will see this being done semi-automatically for you.
2. Adding comments to your code#
Here is a snippet of some Python code:
# Create a list with the names of common elements
elements = [
    'Carbon',  # put first as likely to be most common
    'Hydrogen',
    'Nitrogen',
    'Oxygen'
]
The purpose of comments is to help other people (and you in the future!) understand what the code is doing. They are ignored by the Python interpreter, so have no effect on the code. Comments always start with a hash (#) symbol, and finish at the end of the line. Notice you can have a comment all on a line by itself, or to the right of a statement - do whichever is more helpful. The liberal use of comments in code is regarded as good practice - get into the habit of including them!
Task 5#
Back on your desktop, navigate again to your PHAR2062/Session 4 - Lists folder and open a second terminal session there (because the first one you started at the beginning of this session will still be “busy” running your Jupyter notebook). Open your favourite text editor and create a file with the following content (be sure the last line (“Oxygen”) has an extra space character at the end!):
Carbon
Hydrogen
Nitrogen
Oxygen
Save this to your session folder with the name 'elements.txt'.
Back in your Notebook, copy the following code into a fresh cell, and run it:
elements_file = open('elements.txt')
elements = []
for line in elements_file:
    elements.append(line)
elements_file.close()
print(elements)
Analysis#
Line 1: If you remember (or look back at) session 2 you will recognise that what we are doing here is opening a file of data, creating a “file wrangling” object called elements_file that has methods we can use to get at the data contained in the file.
Line 2: Here we are creating a list, called elements - but notice how it is created from a list literal that is “empty” - it starts off with no items in it.
Line 3: We saw above that the “for A in B:” type syntax was used to iterate over a list, but elements_file isn’t a list, is it? No, it isn’t, but it is still a type of Python object that you can iterate over.
Line 4: Each time round the loop, the current value of line is appended to the list elements.
Line 5: This line is not indented any more, so it marks the end of the loop. The data file is closed.
Line 6: elements is printed, and we see what it is: a list of strings, each of which is one line from the data file - including the \n newline character that’s present (but invisible) in each line except the last.
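To make those invisible characters visible, you can print each line with repr(), a built-in function that shows a string the way you would type it as a literal. The sketch below is self-contained: it writes its own hypothetical demo file (elements_demo.txt, in a temporary folder, so your elements.txt is left untouched) using the os and tempfile modules, neither of which this session has introduced.

```python
import os
import tempfile

# Hypothetical demo file with the same content as elements.txt,
# including the extra space after "Oxygen"
demo_path = os.path.join(tempfile.mkdtemp(), 'elements_demo.txt')
demo_file = open(demo_path, 'w')
demo_file.write('Carbon\nHydrogen\nNitrogen\nOxygen ')
demo_file.close()

# Same loading loop as in the task
elements_file = open(demo_path)
lines = []
for line in elements_file:
    lines.append(line)
elements_file.close()

# repr() shows the hidden characters; len() counts them too
for line in lines:
    print(repr(line), len(line))
```

Each of the first three lines ends in \n, which counts as one character towards its length, while the last line ends in the space you added.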
Programming Challenge 1
Add a selection of comments to the lines of code in this cell, to explain what the code is doing. Check after you have added them that the code still runs OK!
Programming Challenge 2
It’s great that we now have a way of taking information from a file and storing it in a data structure - but those \n characters are annoying.
Can you adapt the code so that they get removed when the data is loaded?
Hint
Remember about string slicing? Can you see why it was helpful you added that extra space character to the last line of the file?
Stretch Challenge
Create a second data file called “more_elements.txt” and add a few lines with extra element names to it (maybe Sulfur, Phosphorus - whatever you like). Now can you adapt the code you have written so that it reads the contents of both elements.txt and more_elements.txt into a single list object, and then prints it out?
Hint
Remember the append() method of lists?
Summary#
In this session you have been introduced to Python lists - one of the most fundamental data structure types. You have learned how to create lists, and some of the ways you can examine and modify them. In this context, you have seen the similarities between strings and lists.
Next you have learned about iteration - how you can write code that will manipulate each item in a list one by one. In the course of this you have encountered the first “rule” of how you write Python code - that indentation matters.
Expanding on this, you have seen how when brackets feature (either () or []), multi-line statements can make your code more readable, and in all situations how comments can make it more understandable.
Finally you have seen how, with your knowledge of strings, files, lists and iteration, you can write code to load data from an external file into a Python program in a convenient form for further processing and analysis.