Review + Extra Examples Week 1-3
Hi folks, this is a notebook covering the most common errors from hw1-3, plus some extra examples to review before the midterm.
It likely contains a lot of typos in the text (but hopefully not in the code)!
Best of luck with the midterm
Ilya
A Reminder about types
Remember, we wrap values in [] and {} only when we are creating lists and dictionaries, for example when assigning them to variables
my_list = [1, 2, 3] # square brackets tell Python that what we are assigning is a list
print(type(my_list)) # now Python knows that
<class 'list'>
… but we don’t need to add [] or {} when we are using a variable to which we have already assigned a list or a dictionary
print(my_list) #the same result as print([1, 2, 3]). Check it!
my_list[1] #will work and return the second element (remember, we count from 0)
[1, 2, 3]
2
but not
print([my_list]) # will still print something, but it is probably not what you want: it is the same as print([[1, 2, 3]])
[[1, 2, 3]]
[my_list][1] #won't work
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-32-c71d39f3cd5f> in <cell line: 1>()
----> 1 [my_list][1] #wont work
IndexError: list index out of range
[my_list][0] # works, but returns the whole inner list [1, 2, 3], not its first element
[1, 2, 3]
The same is true for dictionaries!
This sometimes leads to errors, so be careful!
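To make the dictionary case concrete, here is a small sketch of the same mistake with {}. The variable my_dict and its contents are made up for this illustration, and try/except is used only so that the cell keeps running after the error.

```python
# Hypothetical dictionary, made up for this illustration
my_dict = {'a': 1, 'b': 2}

print(type(my_dict))  # <class 'dict'>
print(my_dict['a'])   # 1: we look up a key directly on the variable

# Wrapping the variable in {} does not give us the dictionary back.
# Python tries to build a set that contains the dictionary, and that
# fails because a dictionary cannot be an element of a set.
try:
    wrapped = {my_dict}
except TypeError as error:
    wrapped = None
    print('TypeError:', error)
```

So with dictionaries the extra braces fail loudly, while with lists they quietly give you a list inside a list.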
Example 1. Strings, split and lists
my_string_with_sneaky_spaces = 'This is my  string, some spaces are s n e a k  y' # note the double spaces in the string
split1 = my_string_with_sneaky_spaces.split()
split2 = my_string_with_sneaky_spaces.split(' ')
print(split1)
print(split2)
print(len(split1))
print(len(split2))
['This', 'is', 'my', 'string,', 'some', 'spaces', 'are', 's', 'n', 'e', 'a', 'k', 'y']
['This', 'is', 'my', '', 'string,', 'some', 'spaces', 'are', 's', 'n', 'e', 'a', 'k', '', 'y']
13
15
What happens here? str.split() without arguments treats any run of consecutive whitespace as a single separator. Usually that is what we want when we split a string into words.
str.split(' ') treats each space as a separate separator, so two spaces in a row produce an empty string in the result.
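Here is one more small sketch of the difference. The string messy is made up for this illustration; it has leading, trailing and doubled spaces, plus a tab.

```python
# Hypothetical string: two leading spaces, a double space, a tab, a trailing space
messy = '  two  spaces\tand a tab '

words_default = messy.split()   # any run of whitespace (spaces, tabs) is one separator
words_space = messy.split(' ')  # every single space is its own separator

print(words_default)  # ['two', 'spaces', 'and', 'a', 'tab']
print(words_space)    # ['', '', 'two', '', 'spaces\tand', 'a', 'tab', '']
```

Note that split(' ') also leaves the tab untouched, because a tab is not a space.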
Example 2. Strings, split and lists
my_string = "This is my string. It contains multiple sentences. Hell, yeah."
print(my_string.split(".")) # note the extra empty element at the end of the list, and the extra space at the beginning of elements 2 and 3
print(my_string.split(". ")) # we can split by more than one character: no extra spaces at the beginning of the elements, but an extra `.` at the end of the last element
['This is my string', ' It contains multiple sentences', ' Hell, yeah', '']
['This is my string', 'It contains multiple sentences', 'Hell, yeah.']
What do we need here? It depends on the task. If you need to count the number of sentences, option 1 gives the wrong answer because of the empty last element (of course, you can always filter out empty sentences, but that looks clumsy).
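For completeness, the filtering mentioned above could look roughly like the sketch below. The names pieces-style variables (sentences, piece, cleaned) are made up for this illustration, and str.strip() is used to remove the spaces at both ends of each piece.

```python
my_string = "This is my string. It contains multiple sentences. Hell, yeah."

sentences = []
for piece in my_string.split("."):
    cleaned = piece.strip()  # remove spaces at both ends of the piece
    if cleaned != '':        # skip the empty piece after the final '.'
        sentences.append(cleaned)

print(sentences)       # ['This is my string', 'It contains multiple sentences', 'Hell, yeah']
print(len(sentences))  # 3
```

This gives the right count and clean sentences, at the price of a few extra lines.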
Example 3. Nested loops and filtering
Remember that you can put one loop inside another and combine them with if statements.
Let’s say I want to split my string into sentences and count the number of words containing the letter ‘e’ in each sentence.
Let me show how I create and test the solution step by step.
I’ll start with a loop through the list of sentences I got from the string.
I can do either
my_string = "This is my string. It contains multiple sentences. Hell, yeah."
my_sentence_list = my_string.split(". ")
for sentence in my_sentence_list:
    ...
    ...
or just
for sentence in my_string.split(". "):
    ...
    ...
Remember that in this loop Python goes through the list elements one by one. It assigns the current element’s value to the sentence variable for one iteration (the part where we have … in the example), then goes back to the beginning of the loop and reassigns sentence to the next element, until it has gone through all the list elements.
my_string = "This is my string. It contains multiple sentences. Hell, yeah."
for sentence in my_string.split(". "):
    print(sentence)
    print(type(sentence))
This is my string
<class 'str'>
It contains multiple sentences
<class 'str'>
Hell, yeah.
<class 'str'>
OK, the next thing I need is to split each sentence (a string) into words
my_string = "This is my string. It contains multiple sentences. Hell, yeah."
for sentence in my_string.split(". "):
    word_list = sentence.split()
    print(word_list)
['This', 'is', 'my', 'string']
['It', 'contains', 'multiple', 'sentences']
['Hell,', 'yeah.']
Let me first write code to test if each word contains ‘e’
my_string = "This is my string. It contains multiple sentences. Hell, yeah."
for sentence in my_string.split(". "):
    word_list = sentence.split()
    for word in word_list:
        print(type(word)) #check if it is str
        if 'e' in word:
            print(word)
        else:
            print("No 'e' :(")
<class 'str'>
No 'e' :(
<class 'str'>
No 'e' :(
<class 'str'>
No 'e' :(
<class 'str'>
No 'e' :(
<class 'str'>
No 'e' :(
<class 'str'>
No 'e' :(
<class 'str'>
multiple
<class 'str'>
sentences
<class 'str'>
Hell,
<class 'str'>
yeah.
Cool! Now I want to calculate the number of words containing the letter ‘e’ in each sentence. For that I need a list for my result (my_lengths_with_e).
What is this list going to look like? Well, I know that its length will be equal to the number of sentences, so it should probably be updated somewhere in the part of the loop responsible for sentences.
It will contain numbers, so I need to calculate them somewhere.
I will name this variable num_words_with_e.
my_lengths_with_e = []
for sentence in my_string.split(". "):
    word_list = sentence.split()
    num_words_with_e = 0 # for every new sentence we start counting from 0
    for word in word_list:
        if 'e' in word:
            num_words_with_e = num_words_with_e + 1
    my_lengths_with_e.append(num_words_with_e) # note where I am doing this by looking at the indentation: outside of `if 'e' in word`, outside of the inner loop (`for word in word_list`), but inside the outer loop (`for sentence in my_string.split(". ")`)
print(my_lengths_with_e)
[0, 2, 2]
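If you already know functions, the same nested loop can be wrapped into a reusable helper. This is only a sketch: the function name count_words_with_letter and its parameters are made up here, and the case-insensitive comparison is an extra assumption that the loop above does not make.

```python
def count_words_with_letter(text, letter):
    """Return, for each sentence in text, how many words contain letter.

    Hypothetical helper: sentences are split on ". " and words on
    whitespace, as in the loop above. The comparison ignores case,
    which is an assumption added for this sketch.
    """
    counts = []
    for sentence in text.split(". "):
        num_words = 0  # start from 0 for every new sentence
        for word in sentence.split():
            if letter.lower() in word.lower():
                num_words = num_words + 1
        counts.append(num_words)  # once per sentence, after the inner loop
    return counts


my_string = "This is my string. It contains multiple sentences. Hell, yeah."
print(count_words_with_letter(my_string, 'e'))  # [0, 2, 2]
print(count_words_with_letter(my_string, 'h'))  # [1, 0, 2]
```

Without the .lower() calls the check would be case-sensitive, so 'h' would not be found in 'Hell,'.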
Example 4. Complex logical (bool) expressions and if statements
Say you need to classify reviews as good, bad, or ambiguous
reviews = ['best movie in history', 'best movie, worst actors', 'worst movie']
good_reviews = []
bad_reviews = []
ambiguous_reviews = []
for review in reviews:
    if 'best' in review:
        good_reviews.append(review)
    if 'worst' in review:
        bad_reviews.append(review)
    if 'best' and 'worst' in review:
        ambiguous_reviews.append(review)
print(good_reviews)
print(bad_reviews)
print(ambiguous_reviews)
['best movie in history', 'best movie, worst actors']
['best movie, worst actors', 'worst movie']
['best movie, worst actors', 'worst movie']
This is wrong! There are two problems here. Let’s start with if 'best' and 'worst' in review not doing what we want.
To make it a working example to tinker with, let me define the review variable outside of the conditional statement
review = reviews[2] #third review
print(review)
print('best' and 'worst' in review)
worst movie
True
This is upsetting! Why does Python think it is True?
and and or combine two logical expressions (expressions resulting in bool values: True or False).
That means there should be a logical expression both to the left and to the right of and. Here, in is evaluated before and, so the left side is just the string 'best'. Let me group the expression the way Python interprets it, using parentheses for clarity:
print(('best') and ('worst' in review))
print(bool('best'))
True
True
So if instead of a logical expression we give Python, say, a string (like ‘best’) or a number, it will try to be smart and transform it into a bool value. Zero and empty values (such as an empty string) are transformed to False; everything else is transformed to True
print(bool(56454))
print(bool('0'))
print(bool("That's annoying!"))
print(bool(0))
True
True
True
False
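A few more checks of the same kind, as a sketch. The variable name result is made up for this illustration; the last lines show that and actually gives back one of its operands rather than a plain True or False.

```python
# Zero and "empty" values turn into False ...
print(bool(0))    # False
print(bool(''))   # False: empty string
print(bool([]))   # False: empty list
print(bool({}))   # False: empty dictionary

# ... and everything else turns into True, even if it looks almost empty
print(bool(' '))  # True: a string with one space is not empty
print(bool([0]))  # True: a list with one element is not empty

# `and` gives back one of its operands, not necessarily True or False
result = 'best' and 'worst'
print(result)     # worst
```

This is why 'best' and 'worst' in review never checks for 'best' at all: the non-empty string 'best' already counts as True.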
So we need to be very precise and tell Python exactly what we want
print(('best' in review) and ('worst' in review))
False
Phew! Now it seems to work, let us try again
reviews = ['best movie in history', 'best movie, worst actors', 'worst movie']
good_reviews = []
bad_reviews = []
ambiguous_reviews = []
for review in reviews:
    if 'best' in review:
        good_reviews.append(review)
    if 'worst' in review:
        bad_reviews.append(review)
    if ('best' in review) and ('worst' in review):
        ambiguous_reviews.append(review)
print(good_reviews)
print(bad_reviews)
print(ambiguous_reviews)
['best movie in history', 'best movie, worst actors']
['best movie, worst actors', 'worst movie']
['best movie, worst actors']
So that error is fixed, but depending on the task, we might want ambiguous reviews to appear only in their own list, not in the good or bad lists.
We forgot that Python checks separate if clauses one by one and executes every one whose condition is satisfied.
Let’s try to use elif to be more precise!
reviews = ['best movie in history', 'best movie, worst actors', 'worst movie']
good_reviews = []
bad_reviews = []
ambiguous_reviews = []
for review in reviews:
    if 'best' in review:
        good_reviews.append(review)
    elif 'worst' in review:
        bad_reviews.append(review)
    elif ('best' in review) and ('worst' in review): #or else
        ambiguous_reviews.append(review)
print(good_reviews)
print(bad_reviews)
print(ambiguous_reviews)
['best movie in history', 'best movie, worst actors']
['worst movie']
[]
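Again not what we want! In an if/elif chain Python runs only the first branch whose condition is True, so an ambiguous review is caught by 'best' in review and never reaches the last check. We need to change the order in which the conditions are checked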
reviews = ['best movie in history', 'best movie, worst actors', 'worst movie']
good_reviews = []
bad_reviews = []
ambiguous_reviews = []
for review in reviews:
    if ('best' in review) and ('worst' in review): # ambiguous reviews go here and we won't check the other conditions for them
        ambiguous_reviews.append(review)
    elif 'best' in review:
        good_reviews.append(review)
    elif 'worst' in review: # or else, if every remaining review should count as bad
        bad_reviews.append(review)
print(good_reviews)
print(bad_reviews)
print(ambiguous_reviews)
['best movie in history']
['worst movie']
['best movie, worst actors']
Now it looks like what we need!
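One case the data above does not cover is a review with neither keyword. A possible sketch is below; the extra review 'okay movie', the other_reviews list and the has_best / has_worst variables are all made up for this illustration and are not part of the original task.

```python
# Same reviews as above plus one made-up review with neither keyword
reviews = ['best movie in history', 'best movie, worst actors',
           'worst movie', 'okay movie']
good_reviews = []
bad_reviews = []
ambiguous_reviews = []
other_reviews = []  # hypothetical extra list for reviews with no keyword

for review in reviews:
    has_best = 'best' in review    # store each check in a bool variable
    has_worst = 'worst' in review
    if has_best and has_worst:     # most specific condition first
        ambiguous_reviews.append(review)
    elif has_best:
        good_reviews.append(review)
    elif has_worst:
        bad_reviews.append(review)
    else:                          # neither 'best' nor 'worst'
        other_reviews.append(review)

print(good_reviews)       # ['best movie in history']
print(bad_reviews)        # ['worst movie']
print(ambiguous_reviews)  # ['best movie, worst actors']
print(other_reviews)      # ['okay movie']
```

Storing the checks in bool variables first also makes it harder to write 'best' and 'worst' in review by accident.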
Remember that you can always test nasty logical expressions outside of your if:
print('best' in 'best')
print('best' in 'worst')
print('best' in 'best and worst')
print('best' in 'best and worst' and 'worst' in 'best and worst')
True
False
True
True
That’s it, folks
Best of luck with the midterm!