Review + Extra Examples Week 1-3

Hi folks, this notebook covers the most common errors from hw1-3 and some extra examples to review before the midterm.

It likely contains a lot of typos in text (but hopefully not in the code)!

Best of luck with the midterm

Ilya

A Reminder about types

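Remember, we use [] and {} to write out (create) a new list or dictionary, for example when we assign it to a variable. (Square brackets placed right after a variable name, as in my_list[1], do something different: they pick out an element.)

my_list = [1, 2, 3] # square brackets tell Python that what we are assigning is a list

print(type(my_list)) # now Python knows that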
<class 'list'>

… but we don’t need to add [] or {} around a variable to which we have already assigned a list or a dictionary:

print(my_list) #the same result as print([1, 2, 3]). Check it!

my_list[1] #will work and return the second element (remember, we count from 0)
[1, 2, 3]
2

but not

print([my_list]) # still prints something, but likely not what you want. It is the same as print([[1, 2, 3]])
[[1, 2, 3]]
[my_list][1] #won't work
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-32-c71d39f3cd5f> in <cell line: 1>()
----> 1 [my_list][1] #wont work

IndexError: list index out of range
[my_list][0] # works, but returns the whole list [1, 2, 3], not its first element
[1, 2, 3]

The same is true for dictionaries!

Sometimes this leads to errors, so be careful!

Example 1. Strings, split and lists

my_string_with_sneaky_spaces = 'This is my  string, some spaces are s n e a k  y' # note the double spaces in the string

split1 = my_string_with_sneaky_spaces.split()
split2 = my_string_with_sneaky_spaces.split(' ')

print(split1)
print(split2)

print(len(split1))
print(len(split2))
['This', 'is', 'my', 'string,', 'some', 'spaces', 'are', 's', 'n', 'e', 'a', 'k', 'y']
['This', 'is', 'my', '', 'string,', 'some', 'spaces', 'are', 's', 'n', 'e', 'a', 'k', '', 'y']
13
15

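What happens here? Called without arguments, str.split() treats any run of consecutive whitespace (spaces, tabs, newlines) as a single separator and ignores whitespace at the start and end of the string. Usually that is what we want when we split a string into words.

str.split(' '), on the other hand, treats every single space as a separator, so two spaces in a row produce an empty string '' between them!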
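One more case worth checking, sketched here with a hypothetical string text: leading and trailing spaces, plus a tab. split() drops the spaces at both ends and also splits on the tab, while split(' ') keeps an empty string for every extra space and leaves the tab inside a word.

```python
text = '  two  spaces\tand a tab '  # hypothetical string: spaces at both ends, a double space, a tab

print(text.split())     # ['two', 'spaces', 'and', 'a', 'tab']
print(text.split(' '))  # ['', '', 'two', '', 'spaces\tand', 'a', 'tab', '']
```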

Example 2. Strings, split and lists

my_string = "This is my string. It contains multiple sentences. Hell, yeah."

print(my_string.split(".")) # note the extra empty element at the end of the list, and the extra space at the beginning of elements 2 and 3
print(my_string.split(". ")) # I can split by more than one character. Note: no extra spaces at the beginning of the elements, but an extra `.` at the end of the last element.
['This is my string', ' It contains multiple sentences', ' Hell, yeah', '']
['This is my string', 'It contains multiple sentences', 'Hell, yeah.']

Which one do we need? It depends on the task. If you need to count the sentences, option 1 gives the wrong answer (4 instead of 3). Of course, you can also filter out the empty strings afterwards, but it looks clumsy.
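A small sketch of counting sentences both ways. The loop that drops empty pieces is the "clumsy" filter mentioned above; the second option assumes every sentence is followed by exactly '. ' (a period and one space), which happens to hold for this string.

```python
my_string = "This is my string. It contains multiple sentences. Hell, yeah."

option1 = my_string.split(".")
option2 = my_string.split(". ")

print(len(option1))  # 4: the empty string after the last '.' is counted too
print(len(option2))  # 3

# Filter option 1: strip surrounding spaces and keep only non-empty pieces
non_empty = []
for piece in option1:
    if piece.strip() != '':
        non_empty.append(piece.strip())

print(non_empty)       # ['This is my string', 'It contains multiple sentences', 'Hell, yeah']
print(len(non_empty))  # 3
```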

Example 3. Nested loops and filtering

Remember that you can put one loop inside another and combine loops with if statements.

Let’s say I want to split my string into sentences and count the number of words containing the letter ‘e’ in each sentence.

Let me show how I build and test the solution step by step.

I’ll start with a loop through a list of sentences I got from the string.

I can do either

my_string = "This is my string. It contains multiple sentences. Hell, yeah."

my_sentence_list = my_string.split(". ")

for sentence in my_sentence_list:
  ...
  ...

or just

for sentence in my_string.split(". "):
  ...
  ...

Remember that in this loop Python goes through the list element by element: it assigns the element’s value to the variable sentence, runs the loop body (the part marked … in the example), then goes back to the top of the loop and reassigns sentence to the next element, until every element of the list has been processed.
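A quick way to convince yourself of this reassignment (a small check, not from the original notebook): after the loop finishes, sentence still holds the value it was given on the last iteration.

```python
my_string = "This is my string. It contains multiple sentences. Hell, yeah."

for sentence in my_string.split(". "):
    pass  # empty loop body: we only look at sentence after the loop

print(sentence)  # Hell, yeah.  (the last element of the list)
```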

my_string = "This is my string. It contains multiple sentences. Hell, yeah."

for sentence in my_string.split(". "):
    print(sentence)
    print(type(sentence))
This is my string
<class 'str'>
It contains multiple sentences
<class 'str'>
Hell, yeah.
<class 'str'>

OK, the next thing I need is to split each string (sentence) into words:

my_string = "This is my string. It contains multiple sentences. Hell, yeah."

for sentence in my_string.split(". "):
    word_list = sentence.split()
    print(word_list)
['This', 'is', 'my', 'string']
['It', 'contains', 'multiple', 'sentences']
['Hell,', 'yeah.']

Let me first write code to test whether each word contains ‘e’:

my_string = "This is my string. It contains multiple sentences. Hell, yeah."

for sentence in my_string.split(". "):
    word_list = sentence.split()
    for word in word_list:
        print(type(word)) # check if it is str
        if 'e' in word:
            print(word)
        else:
            print("No 'e' :(")
<class 'str'>
No 'e' :(
<class 'str'>
No 'e' :(
<class 'str'>
No 'e' :(
<class 'str'>
No 'e' :(
<class 'str'>
No 'e' :(
<class 'str'>
No 'e' :(
<class 'str'>
multiple
<class 'str'>
sentences
<class 'str'>
Hell,
<class 'str'>
yeah.
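Note that 'e' in word is case-sensitive: a capital 'E' does not match. If capital letters should count as well, one option is to lowercase each word first with str.lower(). The word list below is a hypothetical example.

```python
words = ['Eat', 'eagle', 'SKY']  # hypothetical words, not from the notebook's string

for word in words:
    # first check is case-sensitive, the second one ignores case
    print(word, 'e' in word, 'e' in word.lower())
# Eat False True
# eagle True True
# SKY False False
```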

Cool! Now I want to count the words containing the letter ‘e’ in each sentence. For that I need a list to store my result (my_lengths_with_e).

What is this list going to look like? Well, I know that its length should equal the number of sentences, so it should probably be updated in the part of the loop responsible for sentences (the outer loop). It will contain numbers, one per sentence, so I need a variable to count them. I will name this variable num_words_with_e.

my_lengths_with_e = []
for sentence in my_string.split(". "):
    word_list = sentence.split()
    num_words_with_e = 0 # for every new sentence we start counting from 0
    for word in word_list:
        if 'e' in word:
            num_words_with_e = num_words_with_e + 1
    my_lengths_with_e.append(num_words_with_e) # note where this line sits, judging by its indentation: outside `if 'e' in word`, outside the inner loop (`for word in word_list`), but inside the outer loop (`for sentence in my_string.split(". ")`)

print(my_lengths_with_e)
[0, 2, 2]
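Counting words that contain 'e' is not the same as counting the letter 'e' itself. A sketch comparing the two: str.count returns the number of non-overlapping occurrences of a substring, so a word like 'sentences' adds 3 to the letter count but only 1 to the word count. The list names words_with_e and letters_e are hypothetical.

```python
my_string = "This is my string. It contains multiple sentences. Hell, yeah."

words_with_e = []  # hypothetical name: words containing at least one 'e' (as computed above)
letters_e = []     # hypothetical name: total number of 'e' letters per sentence

for sentence in my_string.split(". "):
    num_words_with_e = 0
    for word in sentence.split():
        if 'e' in word:
            num_words_with_e = num_words_with_e + 1
    words_with_e.append(num_words_with_e)
    letters_e.append(sentence.count('e'))  # counts every 'e' in the sentence

print(words_with_e)  # [0, 2, 2]
print(letters_e)     # [0, 4, 2]
```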

Example 4. Complex logical (bool) expressions and if statements

Say you need to classify reviews as good, bad or ambiguous:

reviews = ['best movie in history', 'best movie, worst actors', 'worst movie']

good_reviews = []
bad_reviews = []
ambiguous_reviews = []

for review in reviews:
  if 'best' in review:
    good_reviews.append(review)
  if 'worst' in review:
    bad_reviews.append(review)
  if 'best' and 'worst' in review:
    ambiguous_reviews.append(review)

print(good_reviews)
print(bad_reviews)
print(ambiguous_reviews)
['best movie in history', 'best movie, worst actors']
['best movie, worst actors', 'worst movie']
['best movie, worst actors', 'worst movie']

This is wrong! There are two problems here. Let’s start with if 'best' and 'worst' in review, which does not do what we want.

To get a working example to tinker with, let me define the review variable outside the conditional statement:

review = reviews[2] #third review

print(review)

print('best' and 'worst' in review)
worst movie
True

This is upsetting! Why does Python think it is True?

The operators and and or are meant to combine two logical expressions (expressions that evaluate to a bool value: True or False).

That means there should be a logical expression on both the left and the right of and. Let me add parentheses to show how Python actually groups this expression (in binds more tightly than and):

print(('best') and ('worst' in review))

print(bool('best'))
True
True

So if, instead of a logical expression, we give Python, say, a string (like ‘best’) or a number, it will try to be smart and convert it into a bool value. Non-zero numbers and non-empty strings become True; only "empty" values like 0 or '' become False. Note that even the string '0' is non-empty, so it becomes True:

print(bool(56454))
print(bool('0'))
print(bool("That's annoying!"))
print(bool(0))
True
True
True
False
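A few more conversions worth knowing (a quick sketch you can check yourself): it is not only zero that becomes False. Empty containers, the empty string and None do too, while anything non-empty, even a string made of a single space, becomes True.

```python
print(bool(''))    # False: empty string
print(bool(' '))   # True: a single space is not empty
print(bool([]))    # False: empty list
print(bool([0]))   # True: a list with one element, even if that element is 0
print(bool(0.0))   # False: zero as a float
print(bool(None))  # False
```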

So we need to be precise and tell Python exactly what we want:

print(('best' in review) and ('worst' in review))
False

Phew! Now it seems to work. Let’s try again:

reviews = ['best movie in history', 'best movie, worst actors', 'worst movie']

good_reviews = []
bad_reviews = []
ambiguous_reviews = []

for review in reviews:
  if 'best' in review:
    good_reviews.append(review)
  if 'worst' in review:
    bad_reviews.append(review)
  if ('best' in review) and ('worst' in review):
    ambiguous_reviews.append(review)

print(good_reviews)
print(bad_reviews)
print(ambiguous_reviews)
['best movie in history', 'best movie, worst actors']
['best movie, worst actors', 'worst movie']
['best movie, worst actors']

So that error is fixed, but depending on the task, we might want ambiguous reviews to go only into their own list, not into the good or bad lists.

We forgot that Python checks all of the separate if statements one by one and runs the body of every one whose condition is satisfied.
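A minimal sketch of this behaviour, using a hypothetical number x: with separate if statements every matching body runs, while in an if/elif chain Python stops at the first condition that is True.

```python
x = 5  # hypothetical value that satisfies both conditions

# Separate if statements: Python checks every one of them
separate_ifs = []
if x > 0:
    separate_ifs.append('positive')
if x > 3:
    separate_ifs.append('greater than 3')
print(separate_ifs)  # ['positive', 'greater than 3']

# if/elif chain: Python stops at the first condition that is True
if_elif = []
if x > 0:
    if_elif.append('positive')
elif x > 3:
    if_elif.append('greater than 3')
print(if_elif)  # ['positive']
```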

Let’s try to use elif to be more precise!

reviews = ['best movie in history', 'best movie, worst actors', 'worst movie']

good_reviews = []
bad_reviews = []
ambiguous_reviews = []

for review in reviews:
  if 'best' in review:
    good_reviews.append(review)
  elif 'worst' in review:
    bad_reviews.append(review)
  elif ('best' in review) and ('worst' in review): #or else
    ambiguous_reviews.append(review)

print(good_reviews)
print(bad_reviews)
print(ambiguous_reviews)
['best movie in history', 'best movie, worst actors']
['worst movie']
[]

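Again, not what we want! With elif, Python stops at the first condition that is True, so 'best movie, worst actors' goes into good_reviews and never reaches the last check. We need to change the order in which the conditions are checked: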

reviews = ['best movie in history', 'best movie, worst actors', 'worst movie']

good_reviews = []
bad_reviews = []
ambiguous_reviews = []

for review in reviews:
  if ('best' in review) and ('worst' in review): #ambiguous reviews go here and we won't check other conditions for them
    ambiguous_reviews.append(review)
  elif 'best' in review:
    good_reviews.append(review)
  elif 'worst' in review: #or else
    bad_reviews.append(review)


print(good_reviews)
print(bad_reviews)
print(ambiguous_reviews)
['best movie in history']
['worst movie']
['best movie, worst actors']

Now it looks like what we need!
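The reviews list only contains reviews with 'best' or 'worst', so the chain above never meets a review with neither word; such a review would simply not land in any list. A sketch that catches it with a final else, using a hypothetical fourth review and a hypothetical other_reviews list:

```python
# 'it was fine' is a hypothetical review with neither 'best' nor 'worst'
reviews = ['best movie in history', 'best movie, worst actors', 'worst movie', 'it was fine']

good_reviews = []
bad_reviews = []
ambiguous_reviews = []
other_reviews = []  # hypothetical extra category for everything else

for review in reviews:
    if ('best' in review) and ('worst' in review):
        ambiguous_reviews.append(review)
    elif 'best' in review:
        good_reviews.append(review)
    elif 'worst' in review:
        bad_reviews.append(review)
    else:  # runs only if none of the conditions above was True
        other_reviews.append(review)

print(good_reviews)       # ['best movie in history']
print(bad_reviews)        # ['worst movie']
print(ambiguous_reviews)  # ['best movie, worst actors']
print(other_reviews)      # ['it was fine']
```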

Remember that you can always test tricky logical expressions outside of your if statement:

print('best' in 'best')
print('best' in 'worst')
print('best' in 'best and worst')
print('best' in 'best and worst' and 'worst' in 'best and worst')
True
False
True
True
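The same trap exists with or, and it is even sneakier: 'best' or 'worst' in review is grouped as 'best' or ('worst' in review), and since 'best' is a non-empty string, Python returns it right away without checking anything else. A quick sketch:

```python
review = 'worst movie'

print('best' or 'worst' in review)                # best  (a string, not True!)
print(('best' in review) or ('worst' in review))  # True

# In an if statement the sneaky version is always truthy, even for a review with neither word
if 'best' or 'worst' in 'it was fine':
    print('This always runs!')
```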

That’s it, folks

Best of luck with the midterm!