.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "auto_gallery/python_lang.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code. .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_gallery_python_lang.py: Source `Kevin Markham `_ .. GENERATED FROM PYTHON SOURCE LINES 8-11 Import libraries ---------------- .. GENERATED FROM PYTHON SOURCE LINES 13-14 'generic import' of math module .. GENERATED FROM PYTHON SOURCE LINES 14-18 .. code-block:: Python import math math.sqrt(25) .. rst-class:: sphx-glr-script-out .. code-block:: none 5.0 .. GENERATED FROM PYTHON SOURCE LINES 19-20 import a function .. GENERATED FROM PYTHON SOURCE LINES 20-24 .. code-block:: Python from math import sqrt sqrt(25) # no longer have to reference the module .. rst-class:: sphx-glr-script-out .. code-block:: none 5.0 .. GENERATED FROM PYTHON SOURCE LINES 25-26 import multiple functions at once .. GENERATED FROM PYTHON SOURCE LINES 26-29 .. code-block:: Python from math import sqrt, exp .. GENERATED FROM PYTHON SOURCE LINES 30-35 import all functions in a module (strongly discouraged) :: from os import * .. GENERATED FROM PYTHON SOURCE LINES 38-39 define an alias .. GENERATED FROM PYTHON SOURCE LINES 39-44 .. code-block:: Python import nltk import numpy as np np.sqrt(9) .. rst-class:: sphx-glr-script-out .. code-block:: none np.float64(3.0) .. GENERATED FROM PYTHON SOURCE LINES 45-46 show all functions in math module .. GENERATED FROM PYTHON SOURCE LINES 46-49 .. code-block:: Python content = dir(math) .. GENERATED FROM PYTHON SOURCE LINES 50-53 Basic operations ---------------- .. GENERATED FROM PYTHON SOURCE LINES 55-56 Numbers .. GENERATED FROM PYTHON SOURCE LINES 56-68 .. 
code-block:: Python 10 + 4 # add (returns 14) 10 - 4 # subtract (returns 6) 10 * 4 # multiply (returns 40) 10 ** 4 # exponent (returns 10000) 10 / 4 # divide (returns 2.5 in Python 3; Python 2 returned 2 because both operands are 'int') 10 / float(4) # divide (returns 2.5) 5 % 4 # modulo (returns 1) - also known as the remainder 10 / 4 # true division (returns 2.5) 10 // 4 # floor division (returns 2) .. rst-class:: sphx-glr-script-out .. code-block:: none 2 .. GENERATED FROM PYTHON SOURCE LINES 69-72 Boolean operations comparisons (these return True) .. GENERATED FROM PYTHON SOURCE LINES 72-78 .. code-block:: Python 5 > 3 5 >= 3 5 != 3 5 == 5 .. rst-class:: sphx-glr-script-out .. code-block:: none True .. GENERATED FROM PYTHON SOURCE LINES 79-80 Boolean operations (these return True) .. GENERATED FROM PYTHON SOURCE LINES 80-87 .. code-block:: Python 5 > 3 and 6 > 3 5 > 3 or 5 < 3 not False False or not False and True # evaluation order: not, and, or .. rst-class:: sphx-glr-script-out .. code-block:: none True .. GENERATED FROM PYTHON SOURCE LINES 88-92 Data types ---------- Determine the type of an object .. GENERATED FROM PYTHON SOURCE LINES 92-99 .. code-block:: Python type(2) # returns 'int' type(2.0) # returns 'float' type('two') # returns 'str' type(True) # returns 'bool' type(None) # returns 'NoneType' .. GENERATED FROM PYTHON SOURCE LINES 100-101 Check if an object is of a given type .. GENERATED FROM PYTHON SOURCE LINES 101-105 .. code-block:: Python isinstance(2.0, int) # returns False isinstance(2.0, (int, float)) # returns True .. rst-class:: sphx-glr-script-out .. code-block:: none True .. GENERATED FROM PYTHON SOURCE LINES 106-107 Convert an object to a given type .. GENERATED FROM PYTHON SOURCE LINES 107-112 .. code-block:: Python float(2) int(2.9) str(2.9) .. rst-class:: sphx-glr-script-out .. code-block:: none '2.9' .. GENERATED FROM PYTHON SOURCE LINES 113-114 zero, None, and empty containers are converted to False .. GENERATED FROM PYTHON SOURCE LINES 114-121 ..
code-block:: Python bool(0) bool(None) bool('') # empty string bool([]) # empty list bool({}) # empty dictionary .. rst-class:: sphx-glr-script-out .. code-block:: none False .. GENERATED FROM PYTHON SOURCE LINES 122-123 Non-empty containers and non-zeros are converted to True .. GENERATED FROM PYTHON SOURCE LINES 123-129 .. code-block:: Python bool(2) bool('two') bool([2]) .. rst-class:: sphx-glr-script-out .. code-block:: none True .. GENERATED FROM PYTHON SOURCE LINES 130-140 Lists ~~~~~ Lists store objects in an ordered sequence. They are ordered, iterable, mutable (adding or removing objects changes the list size), and can contain multiple data types. Creation Empty list (two ways) .. GENERATED FROM PYTHON SOURCE LINES 140-144 .. code-block:: Python empty_list = [] empty_list = list() .. GENERATED FROM PYTHON SOURCE LINES 145-146 List with values .. GENERATED FROM PYTHON SOURCE LINES 146-149 .. code-block:: Python simpsons = ['homer', 'marge', 'bart'] .. GENERATED FROM PYTHON SOURCE LINES 150-151 Examine a list .. GENERATED FROM PYTHON SOURCE LINES 151-155 .. code-block:: Python simpsons[0] # returns element 0 ('homer') len(simpsons) # returns the length (3) .. rst-class:: sphx-glr-script-out .. code-block:: none 3 .. GENERATED FROM PYTHON SOURCE LINES 156-159 Modify a list (does not return the list) Append .. GENERATED FROM PYTHON SOURCE LINES 159-164 .. code-block:: Python simpsons.append('lisa') # append element to end simpsons.extend(['itchy', 'scratchy']) # append multiple elements to end # insert element at index 0 (shifts everything right) .. GENERATED FROM PYTHON SOURCE LINES 165-166 Insert .. GENERATED FROM PYTHON SOURCE LINES 166-170 .. code-block:: Python simpsons.insert(0, 'maggie') # searches for first instance and removes it .. GENERATED FROM PYTHON SOURCE LINES 171-172 Remove .. GENERATED FROM PYTHON SOURCE LINES 172-179 ..
code-block:: Python simpsons.remove('bart') simpsons.pop(0) # removes element 0 and returns it # removes element 0 (does not return it) del simpsons[0] simpsons[0] = 'krusty' # replace element 0 .. GENERATED FROM PYTHON SOURCE LINES 180-181 Concatenate lists (slower than 'extend' method) .. GENERATED FROM PYTHON SOURCE LINES 181-184 .. code-block:: Python neighbors = simpsons + ['ned', 'rod', 'todd'] .. GENERATED FROM PYTHON SOURCE LINES 185-186 Replicate .. GENERATED FROM PYTHON SOURCE LINES 186-189 .. code-block:: Python rep = ["a"] * 2 + ["b"] * 3 .. GENERATED FROM PYTHON SOURCE LINES 190-191 Find elements in a list .. GENERATED FROM PYTHON SOURCE LINES 191-196 .. code-block:: Python 'lisa' in simpsons simpsons.count('lisa') # counts the number of instances simpsons.index('itchy') # returns index of first instance .. rst-class:: sphx-glr-script-out .. code-block:: none 2 .. GENERATED FROM PYTHON SOURCE LINES 197-198 List slicing (selection) ``[start:end:stride]`` .. GENERATED FROM PYTHON SOURCE LINES 198-207 .. code-block:: Python weekdays = ['mon', 'tues', 'wed', 'thurs', 'fri'] weekdays[0] # element 0 weekdays[0:3] # elements 0, 1, 2 weekdays[:3] # elements 0, 1, 2 weekdays[3:] # elements 3, 4 weekdays[-1] # last element (element 4) weekdays[::2] # every 2nd element (0, 2, 4) .. rst-class:: sphx-glr-script-out .. code-block:: none ['mon', 'wed', 'fri'] .. GENERATED FROM PYTHON SOURCE LINES 208-209 Reverse list .. GENERATED FROM PYTHON SOURCE LINES 209-214 .. code-block:: Python weekdays[::-1] # backwards (4, 3, 2, 1, 0) # alternative method for returning the list backwards list(reversed(weekdays)) .. rst-class:: sphx-glr-script-out .. code-block:: none ['fri', 'thurs', 'wed', 'tues', 'mon'] .. GENERATED FROM PYTHON SOURCE LINES 215-218 Sort list Sort a list in place (modifies but does not return the list) .. GENERATED FROM PYTHON SOURCE LINES 218-223 .. 
code-block:: Python simpsons.sort() simpsons.sort(reverse=True) # sort in reverse simpsons.sort(key=len) # sort by a key .. GENERATED FROM PYTHON SOURCE LINES 224-225 Return a sorted list (but does not modify the original list) .. GENERATED FROM PYTHON SOURCE LINES 225-231 .. code-block:: Python sorted(simpsons) sorted(simpsons, reverse=True) sorted(simpsons, key=len) .. rst-class:: sphx-glr-script-out .. code-block:: none ['lisa', 'itchy', 'krusty', 'scratchy'] .. GENERATED FROM PYTHON SOURCE LINES 232-238 Tuples ~~~~~~ Like lists, but their size cannot change: ordered, iterable, immutable, can contain multiple data types .. GENERATED FROM PYTHON SOURCE LINES 238-264 .. code-block:: Python # create a tuple digits = (0, 1, 'two') # create a tuple directly digits = tuple([0, 1, 'two']) # create a tuple from a list # trailing comma is required to indicate it's a tuple zero = (0,) # examine a tuple digits[2] # returns 'two' len(digits) # returns 3 digits.count(0) # counts the number of instances of that value (1) digits.index(1) # returns the index of the first instance of that value (1) # elements of a tuple cannot be modified # digits[2] = 2 # throws an error # concatenate tuples digits = digits + (3, 4) # create a single tuple with elements repeated (also works with lists) (3, 4) * 2 # returns (3, 4, 3, 4) # tuple unpacking bart = ('male', 10, 'simpson') # create a tuple .. GENERATED FROM PYTHON SOURCE LINES 265-270 Strings ~~~~~~~ A sequence of characters, they are iterable, immutable .. GENERATED FROM PYTHON SOURCE LINES 270-314 .. 
code-block:: Python # create a string s = str(42) # convert another data type into a string s = 'I like you' # examine a string s[0] # returns 'I' len(s) # returns 10 # string slicing like lists s[:6] # returns 'I like' s[7:] # returns 'you' s[-1] # returns 'u' # basic string methods (does not modify the original string) s.lower() # returns 'i like you' s.upper() # returns 'I LIKE YOU' s.startswith('I') # returns True s.endswith('you') # returns True s.isdigit() # returns False (True if every character is a digit) s.find('like') # returns index of first occurrence s.find('hate') # returns -1 since not found s.replace('like', 'love') # replaces all instances of 'like' with 'love' # split a string into a list of substrings separated by a delimiter s.split(' ') # returns ['I','like','you'] s.split() # same thing s2 = 'a, an, the' s2.split(',') # returns ['a',' an',' the'] # join a list of strings into one string using a delimiter stooges = ['larry', 'curly', 'moe'] ' '.join(stooges) # returns 'larry curly moe' # concatenate strings s3 = 'The meaning of life is' s4 = '42' s3 + ' ' + s4 # returns 'The meaning of life is 42' s3 + ' ' + str(42) # same thing # remove whitespace from start and end of a string s5 = ' ham and cheese ' s5.strip() # returns 'ham and cheese' .. rst-class:: sphx-glr-script-out .. code-block:: none 'ham and cheese' .. GENERATED FROM PYTHON SOURCE LINES 315-316 Strings formatting .. GENERATED FROM PYTHON SOURCE LINES 316-331 .. 
code-block:: Python # string substitutions: all of these return 'raining cats and dogs' 'raining %s and %s' % ('cats', 'dogs') # old way 'raining {} and {}'.format('cats', 'dogs') # new way 'raining {arg1} and {arg2}'.format(arg1='cats', arg2='dogs') # named arguments # String formatting # See: https://realpython.com/python-formatted-output/ # Old method print('6 %s' % 'bananas') print('%d %s cost $%.1f' % (6, 'bananas', 3.14159)) # Format method positional arguments print('{0} {1} cost ${2:.1f}'.format(6, 'bananas', 3.14159)) .. rst-class:: sphx-glr-script-out .. code-block:: none 6 bananas 6 bananas cost $3.1 6 bananas cost $3.1 .. GENERATED FROM PYTHON SOURCE LINES 332-333 `Strings encoding `_ .. GENERATED FROM PYTHON SOURCE LINES 335-337 Normal strings allow for escaped characters. In Python 3, strings are Unicode by default, so the ``u`` prefix is optional. .. GENERATED FROM PYTHON SOURCE LINES 337-342 .. code-block:: Python print('first line\nsecond line') # or print(u'first line\nsecond line') print('first line\nsecond line' == u'first line\nsecond line') .. rst-class:: sphx-glr-script-out .. code-block:: none first line second line first line second line True .. GENERATED FROM PYTHON SOURCE LINES 343-345 Raw strings treat backslashes as literal characters .. GENERATED FROM PYTHON SOURCE LINES 345-349 .. code-block:: Python print(r'first line\nfirst line') print('first line\nsecond line' == r'first line\nsecond line') .. rst-class:: sphx-glr-script-out .. code-block:: none first line\nfirst line False .. GENERATED FROM PYTHON SOURCE LINES 350-352 Sequences of bytes are not strings; they should be decoded before some operations .. GENERATED FROM PYTHON SOURCE LINES 352-358 .. code-block:: Python s = b'first line\nsecond line' print(s) print(s.decode('utf-8').split()) .. rst-class:: sphx-glr-script-out .. code-block:: none b'first line\nsecond line' ['first', 'line', 'second', 'line'] ..
GENERATED FROM PYTHON SOURCE LINES 359-369 Dictionaries ~~~~~~~~~~~~ **Dictionaries are a must-know data structure**. A dictionary stores key-value pairs and can contain multiple data types: for each (unique) key, the dictionary returns one value. Keys can be strings, numbers, or tuples, while the corresponding values can be any Python object. Dictionaries are iterable and mutable; since Python 3.7 they preserve insertion order. .. GENERATED FROM PYTHON SOURCE LINES 371-372 Creation .. GENERATED FROM PYTHON SOURCE LINES 372-389 .. code-block:: Python # Empty dictionary (two ways) empty_dict = {} empty_dict = dict() simpsons_roles_dict = {'Homer': 'father', 'Marge': 'mother', 'Bart': 'son', 'Lisa': 'daughter', 'Maggie': 'daughter'} simpsons_roles_dict = dict(Homer='father', Marge='mother', Bart='son', Lisa='daughter', Maggie='daughter') simpsons_roles_dict = dict([('Homer', 'father'), ('Marge', 'mother'), ('Bart', 'son'), ('Lisa', 'daughter'), ('Maggie', 'daughter')]) print(simpsons_roles_dict) .. rst-class:: sphx-glr-script-out .. code-block:: none {'Homer': 'father', 'Marge': 'mother', 'Bart': 'son', 'Lisa': 'daughter', 'Maggie': 'daughter'} .. GENERATED FROM PYTHON SOURCE LINES 390-391 Access .. GENERATED FROM PYTHON SOURCE LINES 391-414 .. code-block:: Python # examine a dictionary simpsons_roles_dict['Homer'] # 'father' len(simpsons_roles_dict) # 5 simpsons_roles_dict.keys() # view of keys: dict_keys(['Homer', 'Marge', ...]) simpsons_roles_dict.values() # view of values: dict_values(['father', 'mother', ...]) simpsons_roles_dict.items() # view of (key, value) tuples: dict_items([('Homer', 'father'), ...])
'Homer' in simpsons_roles_dict # returns True 'John' in simpsons_roles_dict # returns False (only checks keys) # accessing values more safely with 'get' simpsons_roles_dict['Homer'] # returns 'father' simpsons_roles_dict.get('Homer') # same thing try: simpsons_roles_dict['John'] # throws an error except KeyError as e: print("Error", e) simpsons_roles_dict.get('John') # None # returns 'not found' (the default) simpsons_roles_dict.get('John', 'not found') .. rst-class:: sphx-glr-script-out .. code-block:: none Error 'John' 'not found' .. GENERATED FROM PYTHON SOURCE LINES 415-416 Modify a dictionary (does not return the dictionary) .. GENERATED FROM PYTHON SOURCE LINES 416-427 .. code-block:: Python simpsons_roles_dict['Snowball'] = 'dog' # add a new entry simpsons_roles_dict['Snowball'] = 'cat' # edit an existing entry simpsons_roles_dict['Snoop'] = 'dog' # add a new entry del simpsons_roles_dict['Snowball'] # delete an entry simpsons_roles_dict.pop('Snoop') # removes and returns ('dog') simpsons_roles_dict.update( {'Mona': 'grandma', 'Abraham': 'grandpa'}) # add multiple entries print(simpsons_roles_dict) .. rst-class:: sphx-glr-script-out .. code-block:: none {'Homer': 'father', 'Marge': 'mother', 'Bart': 'son', 'Lisa': 'daughter', 'Maggie': 'daughter', 'Mona': 'grandma', 'Abraham': 'grandpa'} .. GENERATED FROM PYTHON SOURCE LINES 428-429 Intersecting two dictionaries .. GENERATED FROM PYTHON SOURCE LINES 429-444 .. code-block:: Python simpsons_ages_dict = {'Homer': 45, 'Marge': 43, 'Bart': 11, 'Lisa': 10, 'Maggie': 1} print(simpsons_roles_dict.keys() & simpsons_ages_dict.keys()) inter = simpsons_roles_dict.keys() & simpsons_ages_dict.keys() l = list() for n in inter: l.append([n, simpsons_ages_dict[n], simpsons_roles_dict[n]]) [[n, simpsons_ages_dict[n], simpsons_roles_dict[n]] for n in inter] .. rst-class:: sphx-glr-script-out ..
code-block:: none {'Homer', 'Marge', 'Lisa', 'Maggie', 'Bart'} [['Homer', 45, 'father'], ['Marge', 43, 'mother'], ['Lisa', 10, 'daughter'], ['Maggie', 1, 'daughter'], ['Bart', 11, 'son']] .. GENERATED FROM PYTHON SOURCE LINES 445-446 Iterating both key and values .. GENERATED FROM PYTHON SOURCE LINES 446-449 .. code-block:: Python [[key, val] for key, val in simpsons_ages_dict.items()] .. rst-class:: sphx-glr-script-out .. code-block:: none [['Homer', 45], ['Marge', 43], ['Bart', 11], ['Lisa', 10], ['Maggie', 1]] .. GENERATED FROM PYTHON SOURCE LINES 450-452 String substitution using a dictionary: syntax ``%(key)format``, where ``format`` is the formatting character e.g. ``s`` for string. .. GENERATED FROM PYTHON SOURCE LINES 452-456 .. code-block:: Python print('Homer is the %(Homer)s of the family' % simpsons_roles_dict) .. rst-class:: sphx-glr-script-out .. code-block:: none Homer is the father of the family .. GENERATED FROM PYTHON SOURCE LINES 457-464 Sets ~~~~ Like dictionaries, but with unique keys only (no corresponding values). They are: unordered, iterable, mutable, can contain multiple data types made up of unique elements (strings, numbers, or tuples) .. GENERATED FROM PYTHON SOURCE LINES 466-467 Creation .. GENERATED FROM PYTHON SOURCE LINES 467-475 .. code-block:: Python # create an empty set empty_set = set() # create a set languages = {'python', 'r', 'java'} # create a set directly snakes = set(['cobra', 'viper', 'python']) # create a set from a list .. GENERATED FROM PYTHON SOURCE LINES 476-477 Examine a set .. GENERATED FROM PYTHON SOURCE LINES 477-480 .. code-block:: Python len(languages) # 3 'python' in languages # True .. rst-class:: sphx-glr-script-out .. code-block:: none True .. GENERATED FROM PYTHON SOURCE LINES 481-482 Set operations .. GENERATED FROM PYTHON SOURCE LINES 482-509 .. 
code-block:: Python languages & snakes # intersection: {'python'} languages | snakes # union: {'cobra', 'r', 'java', 'viper', 'python'} languages - snakes # set difference: {'r', 'java'} snakes - languages # set difference: {'cobra', 'viper'} # modify a set (does not return the set) languages.add('sql') # add a new element # try to add an existing element (ignored, no error) languages.add('r') languages.remove('java') # remove an element try: languages.remove('c') # remove a non-existing element: throws an error except KeyError as e: print("Error", e) # removes an element if present, but ignored otherwise languages.discard('c') languages.pop() # removes and returns an arbitrary element languages.clear() # removes all elements # add multiple elements from an iterable (list or set); passing bare strings would add their individual characters languages.update(['go', 'spark']) # get a sorted list of unique elements from a list sorted(set([9, 0, 2, 1, 0])) # returns [0, 1, 2, 9] .. rst-class:: sphx-glr-script-out .. code-block:: none Error 'c' [0, 1, 2, 9] .. GENERATED FROM PYTHON SOURCE LINES 510-513 Execution control statements ---------------------------- .. GENERATED FROM PYTHON SOURCE LINES 515-517 Conditional statements ~~~~~~~~~~~~~~~~~~~~~~ .. GENERATED FROM PYTHON SOURCE LINES 519-520 if statement .. GENERATED FROM PYTHON SOURCE LINES 520-525 .. code-block:: Python x = 3 if x > 0: print('positive') .. rst-class:: sphx-glr-script-out .. code-block:: none positive .. GENERATED FROM PYTHON SOURCE LINES 526-527 if/else statement .. GENERATED FROM PYTHON SOURCE LINES 527-533 .. code-block:: Python if x > 0: print('positive') else: print('zero or negative') .. rst-class:: sphx-glr-script-out .. code-block:: none positive .. GENERATED FROM PYTHON SOURCE LINES 534-535 Single-line if/else statement, known as a 'ternary operator' .. GENERATED FROM PYTHON SOURCE LINES 535-539 .. code-block:: Python sign = 'positive' if x > 0 else 'zero or negative' print(sign) .. rst-class:: sphx-glr-script-out .. code-block:: none positive ..
GENERATED FROM PYTHON SOURCE LINES 540-541 if/elif/else statement .. GENERATED FROM PYTHON SOURCE LINES 541-550 .. code-block:: Python if x > 0: print('positive') elif x == 0: print('zero') else: print('negative') .. rst-class:: sphx-glr-script-out .. code-block:: none positive .. GENERATED FROM PYTHON SOURCE LINES 551-558 Loops ~~~~~ Loops are a set of instructions which repeat until termination conditions are met. This can include iterating through all values in an object, going through a range of values, etc. .. GENERATED FROM PYTHON SOURCE LINES 558-565 .. code-block:: Python # range returns a lazy sequence of integers (wrap it in list() to get a list) # yields 0, 1, 2: includes first value but excludes second value range(0, 3) range(3) # same thing: starting at zero is the default range(0, 5, 2) # yields 0, 2, 4: third argument specifies the 'stride' .. rst-class:: sphx-glr-script-out .. code-block:: none range(0, 5, 2) .. GENERATED FROM PYTHON SOURCE LINES 566-567 Iterate on list values .. GENERATED FROM PYTHON SOURCE LINES 567-572 .. code-block:: Python fruits = ['Apple', 'Banana', 'cherry'] for fruit in fruits: print(fruit.upper()) .. rst-class:: sphx-glr-script-out .. code-block:: none APPLE BANANA CHERRY .. GENERATED FROM PYTHON SOURCE LINES 573-574 Iterate with index .. GENERATED FROM PYTHON SOURCE LINES 574-578 .. code-block:: Python for i in range(len(fruits)): print(fruits[i].lower()) .. rst-class:: sphx-glr-script-out .. code-block:: none apple banana cherry .. GENERATED FROM PYTHON SOURCE LINES 579-580 Iterate with index and values: ``enumerate`` .. GENERATED FROM PYTHON SOURCE LINES 580-590 .. code-block:: Python for i, val in enumerate(fruits): print(i, val.upper()) # Use range when iterating over a large sequence to avoid actually # creating the integer list in memory v = 0 for i in range(10 ** 6): v += 1 .. rst-class:: sphx-glr-script-out .. code-block:: none 0 APPLE 1 BANANA 2 CHERRY .. GENERATED FROM PYTHON SOURCE LINES 591-603 List comprehensions, iterators, etc.
------------------------------------ List comprehensions ~~~~~~~~~~~~~~~~~~~ `List comprehensions `_ provide an elegant syntax for the most common processing pattern: 1. iterate over a list, 2. apply some operation, 3. store the result in a new list. .. GENERATED FROM PYTHON SOURCE LINES 605-606 Classical iteration over a list .. GENERATED FROM PYTHON SOURCE LINES 606-612 .. code-block:: Python nums = [1, 2, 3, 4, 5] cubes = [] for num in nums: cubes.append(num ** 3) .. GENERATED FROM PYTHON SOURCE LINES 613-614 Equivalent list comprehension .. GENERATED FROM PYTHON SOURCE LINES 614-617 .. code-block:: Python cubes = [num**3 for num in nums] # [1, 8, 27, 64, 125] .. GENERATED FROM PYTHON SOURCE LINES 618-620 Classical iteration over a list with **if condition**: create a list of cubes of even numbers .. GENERATED FROM PYTHON SOURCE LINES 620-626 .. code-block:: Python cubes_of_even = [] for num in nums: if num % 2 == 0: cubes_of_even.append(num**3) .. GENERATED FROM PYTHON SOURCE LINES 627-629 Equivalent list comprehension with **if condition** syntax: ``[expression for variable in iterable if condition]`` .. GENERATED FROM PYTHON SOURCE LINES 629-632 .. code-block:: Python cubes_of_even = [num**3 for num in nums if num % 2 == 0] # [8, 64] .. GENERATED FROM PYTHON SOURCE LINES 633-635 Classical iteration over a list with **if else condition**: for loop to cube even numbers and square odd numbers .. GENERATED FROM PYTHON SOURCE LINES 635-643 .. code-block:: Python cubes_and_squares = [] for num in nums: if num % 2 == 0: cubes_and_squares.append(num**3) else: cubes_and_squares.append(num**2) .. GENERATED FROM PYTHON SOURCE LINES 644-647 Equivalent list comprehension (using a ternary expression) to cube even numbers and square odd numbers. Syntax: ``[value_if_true if condition else value_if_false for variable in iterable]`` .. GENERATED FROM PYTHON SOURCE LINES 647-651 ..
code-block:: Python cubes_and_squares = [num**3 if num % 2 == 0 else num**2 for num in nums] print(cubes_and_squares) .. rst-class:: sphx-glr-script-out .. code-block:: none [1, 8, 9, 64, 25] .. GENERATED FROM PYTHON SOURCE LINES 652-653 Nested loops: flatten a 2d-matrix .. GENERATED FROM PYTHON SOURCE LINES 653-660 .. code-block:: Python matrix = [[1, 2], [3, 4]] items = [] for row in matrix: for item in row: items.append(item) .. GENERATED FROM PYTHON SOURCE LINES 661-662 Equivalent list comprehension with Nested loops .. GENERATED FROM PYTHON SOURCE LINES 662-668 .. code-block:: Python items = [item for row in matrix for item in row] print(items) .. rst-class:: sphx-glr-script-out .. code-block:: none [1, 2, 3, 4] .. GENERATED FROM PYTHON SOURCE LINES 669-671 Set comprehension ~~~~~~~~~~~~~~~~~ .. GENERATED FROM PYTHON SOURCE LINES 671-676 .. code-block:: Python fruits = ['apple', 'banana', 'cherry'] unique_lengths = {len(fruit) for fruit in fruits} print(unique_lengths) .. rst-class:: sphx-glr-script-out .. code-block:: none {5, 6} .. GENERATED FROM PYTHON SOURCE LINES 677-679 Dictionary comprehension ~~~~~~~~~~~~~~~~~~~~~~~~~ .. GENERATED FROM PYTHON SOURCE LINES 681-682 Create a dictionary from a list .. GENERATED FROM PYTHON SOURCE LINES 682-686 .. code-block:: Python fruit_lengths = {fruit: len(fruit) for fruit in fruits} print(fruit_lengths) .. rst-class:: sphx-glr-script-out .. code-block:: none {'apple': 5, 'banana': 6, 'cherry': 6} .. GENERATED FROM PYTHON SOURCE LINES 687-688 Iterate over keys and values. Increase age of each subject: .. GENERATED FROM PYTHON SOURCE LINES 688-692 .. code-block:: Python simpsons_ages_ = {key: val + 1 for key, val in simpsons_ages_dict.items()} print(simpsons_ages_) .. rst-class:: sphx-glr-script-out .. code-block:: none {'Homer': 46, 'Marge': 44, 'Bart': 12, 'Lisa': 11, 'Maggie': 2} .. GENERATED FROM PYTHON SOURCE LINES 693-695 Combine two dictionaries sharing key. 
Example, a function that joins two dictionaries (intersecting keys) into a dictionary of lists .. GENERATED FROM PYTHON SOURCE LINES 695-701 .. code-block:: Python simpsons_info_dict = {name: [simpsons_roles_dict[name], simpsons_ages_dict[name]] for name in simpsons_roles_dict.keys() & simpsons_ages_dict.keys()} print(simpsons_info_dict) .. rst-class:: sphx-glr-script-out .. code-block:: none {'Homer': ['father', 45], 'Marge': ['mother', 43], 'Lisa': ['daughter', 10], 'Maggie': ['daughter', 1], 'Bart': ['son', 11]} .. GENERATED FROM PYTHON SOURCE LINES 702-704 Iterators ``itertools`` package ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. GENERATED FROM PYTHON SOURCE LINES 704-707 .. code-block:: Python import itertools .. GENERATED FROM PYTHON SOURCE LINES 708-709 Example: Cartesian product .. GENERATED FROM PYTHON SOURCE LINES 709-713 .. code-block:: Python print([[x, y] for x, y in itertools.product(['a', 'b', 'c'], [1, 2])]) .. rst-class:: sphx-glr-script-out .. code-block:: none [['a', 1], ['a', 2], ['b', 1], ['b', 2], ['c', 1], ['c', 2]] .. GENERATED FROM PYTHON SOURCE LINES 714-717 Example, use loop, dictionary and set to count words in a sentence ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. GENERATED FROM PYTHON SOURCE LINES 717-737 .. code-block:: Python quote = """Tick-tow our incomes are like our shoes; if too small they gall and pinch us but if too large they cause us to stumble and to trip """ words = quote.split() len(words) count = {word: 0 for word in set(words)} for word in words: count[word] += 1 # count[word] = count[word] + 1 print(count) import numpy as np freq_veq = np.array(list(count.values())) / len(words) .. rst-class:: sphx-glr-script-out .. code-block:: none {'small': 1, 'like': 1, 'stumble': 1, 'trip': 1, 'shoes;': 1, 'cause': 1, 'and': 2, 'pinch': 1, 'to': 2, 'large': 1, 'us': 2, 'but': 1, 'our': 2, 'if': 2, 'too': 2, 'they': 2, 'Tick-tow': 1, 'gall': 1, 'incomes': 1, 'are': 1} .. 
GENERATED FROM PYTHON SOURCE LINES 738-741 Exceptions handling ~~~~~~~~~~~~~~~~~~~ .. GENERATED FROM PYTHON SOURCE LINES 741-753 .. code-block:: Python dct = dict(a=[1, 2], b=[4, 5]) key = 'c' try: dct[key] except KeyError: # catch only the missing-key error, not every exception print("Key %s is missing. Add it with empty value" % key) dct[key] = [] print(dct) .. rst-class:: sphx-glr-script-out .. code-block:: none Key c is missing. Add it with empty value {'a': [1, 2], 'b': [4, 5], 'c': []} .. GENERATED FROM PYTHON SOURCE LINES 754-760 Functions --------- Functions are sets of instructions executed when called; they can take multiple input values and return a value .. GENERATED FROM PYTHON SOURCE LINES 762-763 Function with no arguments and no return values .. GENERATED FROM PYTHON SOURCE LINES 763-772 .. code-block:: Python def print_text(): print('this is text') # call the function print_text() .. rst-class:: sphx-glr-script-out .. code-block:: none this is text .. GENERATED FROM PYTHON SOURCE LINES 773-774 Function with one argument and no return values .. GENERATED FROM PYTHON SOURCE LINES 774-785 .. code-block:: Python def print_this(x): print(x) # call the function print_this(3) # prints 3 n = print_this(3) # prints 3, but doesn't assign 3 to n # because the function has no return statement print(n) .. rst-class:: sphx-glr-script-out .. code-block:: none 3 3 None .. GENERATED FROM PYTHON SOURCE LINES 786-793 **Dynamic typing** Important remark: **Python is a dynamically typed language**, meaning that the Python interpreter does type checking at runtime (as opposed to compiled languages that are statically typed). As a consequence, the function behavior, decided at execution time, will be different and specific to the parameter types. Python functions are polymorphic. .. GENERATED FROM PYTHON SOURCE LINES 793-801 .. code-block:: Python def add(a, b): return a + b print(add(2, 3), add("deux", "trois"), add(["deux", "trois"], [2, 3])) .. rst-class:: sphx-glr-script-out ..
code-block:: none 5 deuxtrois ['deux', 'trois', 2, 3] .. GENERATED FROM PYTHON SOURCE LINES 802-803 **Default arguments** .. GENERATED FROM PYTHON SOURCE LINES 803-811 .. code-block:: Python def power_this(x, power=2): return x ** power print(power_this(2), power_this(2, 3)) .. rst-class:: sphx-glr-script-out .. code-block:: none 4 8 .. GENERATED FROM PYTHON SOURCE LINES 812-814 **Docstring** to describe the effect of a function. IDEs and IPython (type: ?power_this) use it to provide function documentation. .. GENERATED FROM PYTHON SOURCE LINES 814-826 .. code-block:: Python def power_this(x, power=2): """Return the power of a number. Args: x (float): the number power (int, optional): the power. Defaults to 2. """ return x ** power .. GENERATED FROM PYTHON SOURCE LINES 827-828 **Return several values** as tuple .. GENERATED FROM PYTHON SOURCE LINES 828-840 .. code-block:: Python def min_max(nums): return min(nums), max(nums) # return values can be assigned to a single variable as a tuple min_max_num = min_max([1, 2, 3]) # min_max_num = (1, 3) # return values can be assigned into multiple variables using tuple unpacking min_num, max_num = min_max([1, 2, 3]) # min_num = 1, max_num = 3 .. GENERATED FROM PYTHON SOURCE LINES 841-847 **Arbitrary number of Arguments** `Packing and Unpacking Arguments in Python `_ `*args` packs many positional arguments, e.g., `add(1, 2, 3)`, into a tuple; the arguments can then be manipulated as a tuple, i.e., `args[0]`, etc. .. GENERATED FROM PYTHON SOURCE LINES 847-857 .. code-block:: Python def add(*args): print(args) s = 0 for x in args: s += x return s print(add(2, 3) + add(1, 2, 3)) .. rst-class:: sphx-glr-script-out .. code-block:: none (2, 3) (1, 2, 3) 11 .. GENERATED FROM PYTHON SOURCE LINES 858-860 Pass an arbitrary number of arguments to another function. Unpack the arguments while passing them using `*args` .. GENERATED FROM PYTHON SOURCE LINES 860-868 ..
code-block:: Python def dummy(*args): # do something return add(*args) print(dummy(2, 3) + dummy(1, 2, 3)) .. rst-class:: sphx-glr-script-out .. code-block:: none (2, 3) (1, 2, 3) 11 .. GENERATED FROM PYTHON SOURCE LINES 869-870 `**kwargs` packs many keyword arguments, e.g., `add(x=1, y=2, z=3)`, into a dictionary: .. GENERATED FROM PYTHON SOURCE LINES 870-885 .. code-block:: Python def add(**kwargs): s = 0 for key, val in kwargs.items(): s += val return s add(x=2, y=3) + add(x=1, y=2, z=3) .. rst-class:: sphx-glr-script-out .. code-block:: none 11 .. GENERATED FROM PYTHON SOURCE LINES 886-894 Reference and copy ~~~~~~~~~~~~~~~~~~ `References `_ are used to access objects in memory, here lists. A single object may have multiple references. Modifying the object through one reference changes what every other reference to it sees. Modify a list through a reference .. GENERATED FROM PYTHON SOURCE LINES 894-900 .. code-block:: Python num = [1, 2, 3] same_num = num # create a second reference to the same list same_num[0] = 0 # modifies both 'num' and 'same_num' print(num, same_num) .. rst-class:: sphx-glr-script-out .. code-block:: none [0, 2, 3] [0, 2, 3] .. GENERATED FROM PYTHON SOURCE LINES 901-905 Copies are references to different objects. Modifying the content of one copy will not affect the others. Modify a copy of a list .. GENERATED FROM PYTHON SOURCE LINES 905-912 .. code-block:: Python new_num = num.copy() new_num = num[:] new_num = list(num) new_num[0] = -1 # modifies 'new_num' but not 'num' print(num, new_num) .. rst-class:: sphx-glr-script-out .. code-block:: none [0, 2, 3] [-1, 2, 3] .. GENERATED FROM PYTHON SOURCE LINES 913-914 Examine objects .. GENERATED FROM PYTHON SOURCE LINES 914-922 ..
code-block:: Python id(num) == id(same_num) # returns True id(num) == id(new_num) # returns False num is same_num # returns True num is new_num # returns False num == same_num # returns True num == new_num # returns False (new_num[0] was modified above) .. rst-class:: sphx-glr-script-out .. code-block:: none False .. GENERATED FROM PYTHON SOURCE LINES 923-925 Function arguments are references to objects. Thus a function can modify its mutable arguments, with possible side effects for the caller. .. GENERATED FROM PYTHON SOURCE LINES 925-934 .. code-block:: Python def change(x, index, newval): x[index] = newval l = [0, 1, 2] change(x=l, index=1, newval=33) print(l) .. rst-class:: sphx-glr-script-out .. code-block:: none [0, 33, 2] .. GENERATED FROM PYTHON SOURCE LINES 935-941 Example: function, and dictionary comprehension ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Example of a function ``join_dict_to_table(dict1, dict2)`` joining two dictionaries (intersecting keys) into a table, i.e., a list of rows (lists), where the first column is the key, the second and third columns are the values of the dictionaries. .. GENERATED FROM PYTHON SOURCE LINES 941-957 .. code-block:: Python simpsons_ages_dict = {'Homer': 45, 'Marge': 43, 'Bart': 11, 'Lisa': 10} simpsons_roles_dict = {'Homer': 'father', 'Marge': 'mother', 'Bart': 'son', 'Maggie': 'daughter'} def join_dict_to_table(dict1, dict2): table = [[key] + [dict1[key], dict2[key]] for key in dict1.keys() & dict2.keys()] return table print("Roles:", simpsons_roles_dict) print("Ages:", simpsons_ages_dict) print("Join:", join_dict_to_table(simpsons_roles_dict, simpsons_ages_dict)) .. rst-class:: sphx-glr-script-out .. code-block:: none Roles: {'Homer': 'father', 'Marge': 'mother', 'Bart': 'son', 'Maggie': 'daughter'} Ages: {'Homer': 45, 'Marge': 43, 'Bart': 11, 'Lisa': 10} Join: [['Bart', 'son', 11], ['Homer', 'father', 45], ['Marge', 'mother', 43]] .. 
GENERATED FROM PYTHON SOURCE LINES 958-963 Regular Expression ------------------ Regular expressions (RE, or RegEx) allow searching for and matching patterns in strings. See `this page `_ for the syntax of the RE patterns. .. GENERATED FROM PYTHON SOURCE LINES 963-966 .. code-block:: Python import re .. GENERATED FROM PYTHON SOURCE LINES 967-979 **Usual patterns** - ``.`` period symbol matches any single character (except newline '\n'). - pattern``+`` plus symbol matches one or more occurrences of the pattern. - ``[]`` square brackets specify a set of characters you wish to match - ``[abc]`` matches a, b or c - ``[a-c]`` matches a to c - ``[0-9]`` matches 0 to 9 - ``[a-zA-Z0-9]+`` matches words, at least one alphanumeric character (digits and letters) - ``[\w]+`` matches words, at least one alphanumeric character including underscore. - ``\s`` matches any whitespace character, equivalent to [ \t\n\r\f\v]. - ``[^\s]`` Caret ``^`` symbol (at the start of a square bracket) inverts the pattern selection. .. GENERATED FROM PYTHON SOURCE LINES 979-983 .. code-block:: Python # regex = re.compile("^.+(firstname:.+)_(lastname:.+)_(mod-.+)") # regex = re.compile("(firstname:.+)_(lastname:.+)_(mod-.+)") .. GENERATED FROM PYTHON SOURCE LINES 984-988 **Compile** (``re.compile(string)``) a regular expression with a pattern that captures the pattern ``firstname:_lastname:``. Note that we use a raw string `r'string'` so `\` is not interpreted as the start of an escape sequence. .. GENERATED FROM PYTHON SOURCE LINES 988-991 .. code-block:: Python pattern = re.compile(r'firstname:[\w]+_lastname:[\w]+') .. GENERATED FROM PYTHON SOURCE LINES 992-994 **Match** (``pattern.match(string)``) to be used in tests, loops, etc. Determine if the RE matches **at the beginning** of the string. .. GENERATED FROM PYTHON SOURCE LINES 994-1001 .. 
code-block:: Python yes_ = True if pattern.match("firstname:John_lastname:Doe") else False no_ = True if pattern.match("blahbla_firstname:John_lastname:Doe") else False no2_ = True if pattern.match("OUPS-John_lastname:Doe") else False print(yes_, no_, no2_) .. rst-class:: sphx-glr-script-out .. code-block:: none True False False .. GENERATED FROM PYTHON SOURCE LINES 1002-1004 **Search** (``pattern.search(string)``) to be used in tests, loops, etc. Determine if the RE matches **at any location** in the string. .. GENERATED FROM PYTHON SOURCE LINES 1004-1011 .. code-block:: Python yes_ = True if pattern.search("firstname:John_lastname:Doe") else False yes2_ = True if pattern.search( "blahbla_firstname:John_lastname:Doe") else False no_ = True if pattern.search("OUPS-John_lastname:Doe") else False print(yes_, yes2_, no_) .. rst-class:: sphx-glr-script-out .. code-block:: none True True False .. GENERATED FROM PYTHON SOURCE LINES 1012-1014 **Find** (``pattern.findall(string)``) all substrings where the RE matches, and return them as a list. .. GENERATED FROM PYTHON SOURCE LINES 1014-1026 .. code-block:: Python # Find the whole pattern within the string pattern = re.compile(r'firstname:[\w]+_lastname:[\w]+') print(pattern.findall("firstname:John_lastname:Doe blah blah")) # Find words print(re.compile("[a-zA-Z0-9]+").findall("firstname:John_lastname:Doe")) # Find words, including underscores print(re.compile(r'[\w]+').findall("firstname:John_lastname:Doe")) .. rst-class:: sphx-glr-script-out .. code-block:: none ['firstname:John_lastname:Doe'] ['firstname', 'John', 'lastname', 'Doe'] ['firstname', 'John_lastname', 'Doe'] .. GENERATED FROM PYTHON SOURCE LINES 1027-1030 Extract specific parts of the RE: use parentheses ``(part of pattern to be matched)``. Extract John and Doe, where John is prefixed with firstname: and Doe is prefixed with lastname: .. GENERATED FROM PYTHON SOURCE LINES 1030-1035 .. 
code-block:: Python pattern = re.compile(r"firstname:([\w]+)_lastname:([\w]+)") print(pattern.findall("firstname:John_lastname:Doe \ firstname:Bart_lastname:Simpson")) .. rst-class:: sphx-glr-script-out .. code-block:: none [('John', 'Doe'), ('Bart', 'Simpson')] .. GENERATED FROM PYTHON SOURCE LINES 1036-1040 **Split** (``pattern.split(string)``) splits the string where there is a match and returns a list of strings where the splits have occurred. Example: match any non-alphanumeric character (neither digit nor letter) ``[^a-zA-Z0-9]`` to split the string. .. GENERATED FROM PYTHON SOURCE LINES 1040-1044 .. code-block:: Python print(re.compile("[^a-zA-Z0-9]").split("firstname:John_lastname:Doe")) .. rst-class:: sphx-glr-script-out .. code-block:: none ['firstname', 'John', 'lastname', 'Doe'] .. GENERATED FROM PYTHON SOURCE LINES 1045-1047 **Substitute** (``re.sub(pattern, replace, string)``) returns a string where matched occurrences are replaced with the content of the replace variable. .. GENERATED FROM PYTHON SOURCE LINES 1047-1051 .. code-block:: Python print(re.sub(r'\s', "_", "Sentence with white      space")) print(re.sub(r'\s+', "_", "Sentence with white      space")) .. rst-class:: sphx-glr-script-out .. code-block:: none Sentence_with_white______space Sentence_with_white_space .. GENERATED FROM PYTHON SOURCE LINES 1052-1053 Remove all characters that are neither alphanumeric nor whitespace from a string .. GENERATED FROM PYTHON SOURCE LINES 1053-1056 .. 
code-block:: Python re.sub(r'[^0-9a-zA-Z\s]+', '', 'H^&ell`.,|o W]{+orld') .. rst-class:: sphx-glr-script-out .. code-block:: none 'Hello World' .. GENERATED FROM PYTHON SOURCE LINES 1057-1060 System programming ------------------ .. GENERATED FROM PYTHON SOURCE LINES 1062-1064 Operating system interfaces (os) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. GENERATED FROM PYTHON SOURCE LINES 1064-1067 .. code-block:: Python import os .. GENERATED FROM PYTHON SOURCE LINES 1068-1069 Get/set current working directory .. GENERATED FROM PYTHON SOURCE LINES 1069-1078 .. code-block:: Python # Get the current working directory cwd = os.getcwd() print(cwd) # Set the current working directory os.chdir(cwd) .. rst-class:: sphx-glr-script-out .. code-block:: none /home/ed203246/git/pystatsml/python_lang .. GENERATED FROM PYTHON SOURCE LINES 1079-1080 Temporary directory .. GENERATED FROM PYTHON SOURCE LINES 1080-1085 .. code-block:: Python import tempfile tmpdir = tempfile.gettempdir() print(tmpdir) .. rst-class:: sphx-glr-script-out .. code-block:: none /tmp .. GENERATED FROM PYTHON SOURCE LINES 1086-1087 Join paths .. GENERATED FROM PYTHON SOURCE LINES 1087-1090 .. code-block:: Python mytmpdir = os.path.join(tmpdir, "foobar") .. GENERATED FROM PYTHON SOURCE LINES 1091-1092 Create a directory .. GENERATED FROM PYTHON SOURCE LINES 1092-1098 .. code-block:: Python os.makedirs(os.path.join(tmpdir, "foobar", "plop", "toto"), exist_ok=True) # list containing the names of the entries in the directory given by path. os.listdir(mytmpdir) .. rst-class:: sphx-glr-script-out .. code-block:: none ['myfile.txt', 'plop'] .. GENERATED FROM PYTHON SOURCE LINES 1099-1101 File input/output ~~~~~~~~~~~~~~~~~ .. GENERATED FROM PYTHON SOURCE LINES 1101-1106 .. 
code-block:: Python filename = os.path.join(mytmpdir, "myfile.txt") print(filename) lines = ["Dans python tout est bon", "Enfin, presque"] .. rst-class:: sphx-glr-script-out .. code-block:: none /tmp/foobar/myfile.txt .. GENERATED FROM PYTHON SOURCE LINES 1107-1108 Write line by line .. GENERATED FROM PYTHON SOURCE LINES 1108-1114 .. code-block:: Python fd = open(filename, "w") fd.write(lines[0] + "\n") fd.write(lines[1] + "\n") fd.close() .. GENERATED FROM PYTHON SOURCE LINES 1115-1116 Use a context manager to automatically close your file .. GENERATED FROM PYTHON SOURCE LINES 1116-1121 .. code-block:: Python with open(filename, 'w') as f: for line in lines: f.write(line + '\n') .. GENERATED FROM PYTHON SOURCE LINES 1122-1124 Read one line at a time (the entire file does not have to fit into memory) .. GENERATED FROM PYTHON SOURCE LINES 1124-1144 .. code-block:: Python f = open(filename, "r") f.readline() # one string per line (including newlines) f.readline() # next line f.close() # read the whole file at once, return a list of lines f = open(filename, 'r') f.readlines() # one list, each line is one string f.close() # use list comprehension to duplicate readlines without reading entire file at once f = open(filename, 'r') [line for line in f] f.close() # use a context manager to automatically close your file with open(filename, 'r') as f: lines = [line for line in f] .. GENERATED FROM PYTHON SOURCE LINES 1145-1148 Explore, list directories ~~~~~~~~~~~~~~~~~~~~~~~~~ .. GENERATED FROM PYTHON SOURCE LINES 1150-1151 Walk through directories and subdirectories ``os.walk(dir)`` .. GENERATED FROM PYTHON SOURCE LINES 1151-1158 .. code-block:: Python WD = os.path.join(tmpdir, "foobar") for dirpath, dirnames, filenames in os.walk(WD): print(dirpath, dirnames, filenames) .. rst-class:: sphx-glr-script-out .. code-block:: none /tmp/foobar ['plop'] ['myfile.txt'] /tmp/foobar/plop ['toto'] [] /tmp/foobar/plop/toto [] [] .. 
GENERATED FROM PYTHON SOURCE LINES 1159-1160 Search for files using a wildcard pattern ``glob.glob(pattern)`` .. GENERATED FROM PYTHON SOURCE LINES 1160-1165 .. code-block:: Python import glob filenames = glob.glob(os.path.join(tmpdir, "*", "*.txt")) print(filenames) .. rst-class:: sphx-glr-script-out .. code-block:: none ['/tmp/foobar/myfile.txt', '/tmp/plop2/myfile.txt'] .. GENERATED FROM PYTHON SOURCE LINES 1166-1167 Manipulating file names, basename and extension .. GENERATED FROM PYTHON SOURCE LINES 1167-1178 .. code-block:: Python def split_filename_inparts(filename): dirname_ = os.path.dirname(filename) filename_noext_, ext_ = os.path.splitext(filename) basename_ = os.path.basename(filename_noext_) return dirname_, basename_, ext_ print(filenames[0], "=>", split_filename_inparts(filenames[0])) .. rst-class:: sphx-glr-script-out .. code-block:: none /tmp/foobar/myfile.txt => ('/tmp/foobar', 'myfile', '.txt') .. GENERATED FROM PYTHON SOURCE LINES 1179-1180 File operations: (recursive) copy, move, test if exists: ``shutil`` package .. GENERATED FROM PYTHON SOURCE LINES 1180-1183 .. code-block:: Python import shutil .. GENERATED FROM PYTHON SOURCE LINES 1184-1185 Copy .. GENERATED FROM PYTHON SOURCE LINES 1185-1191 .. code-block:: Python src = os.path.join(tmpdir, "foobar", "myfile.txt") dst = os.path.join(tmpdir, "foobar", "plop", "myfile.txt") shutil.copy(src, dst) print("copy %s to %s" % (src, dst)) .. rst-class:: sphx-glr-script-out .. code-block:: none copy /tmp/foobar/myfile.txt to /tmp/foobar/plop/myfile.txt .. GENERATED FROM PYTHON SOURCE LINES 1192-1193 Test if a file exists .. GENERATED FROM PYTHON SOURCE LINES 1193-1196 .. code-block:: Python print("File %s exists ?" % dst, os.path.exists(dst)) .. rst-class:: sphx-glr-script-out .. code-block:: none File /tmp/foobar/plop/myfile.txt exists ? True .. GENERATED FROM PYTHON SOURCE LINES 1197-1198 Recursive copy, deletion and move .. GENERATED FROM PYTHON SOURCE LINES 1198-1216 .. 
code-block:: Python src = os.path.join(tmpdir, "foobar", "plop") dst = os.path.join(tmpdir, "plop2") try: print("Copy tree %s under %s" % (src, dst)) # Note that by default dirs_exist_ok=False, meaning that the copy fails # if the destination exists; dirs_exist_ok=True allows an existing destination. shutil.copytree(src, dst, dirs_exist_ok=True) print("Delete tree %s" % dst) shutil.rmtree(dst) print("Move tree %s under %s" % (src, dst)) shutil.move(src, dst) except (FileExistsError, FileNotFoundError) as e: pass .. rst-class:: sphx-glr-script-out .. code-block:: none Copy tree /tmp/foobar/plop under /tmp/plop2 Delete tree /tmp/plop2 Move tree /tmp/foobar/plop under /tmp/plop2 .. GENERATED FROM PYTHON SOURCE LINES 1217-1221 Command execution with subprocess ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ For more advanced use cases, the underlying Popen interface can be used directly. .. GENERATED FROM PYTHON SOURCE LINES 1221-1224 .. code-block:: Python import subprocess .. GENERATED FROM PYTHON SOURCE LINES 1225-1231 ``subprocess.run([command, args*])`` - Runs the command described by args. - Waits for the command to complete. - Returns a CompletedProcess instance. - Does not capture stdout or stderr by default. To do so, pass PIPE for the stdout and/or stderr arguments. .. GENERATED FROM PYTHON SOURCE LINES 1231-1235 .. code-block:: Python p = subprocess.run(["ls", "-l"]) print(p.returncode) .. rst-class:: sphx-glr-script-out .. code-block:: none 0 .. GENERATED FROM PYTHON SOURCE LINES 1236-1237 Run through the shell .. GENERATED FROM PYTHON SOURCE LINES 1237-1240 .. code-block:: Python subprocess.run("ls -l", shell=True) .. rst-class:: sphx-glr-script-out .. code-block:: none CompletedProcess(args='ls -l', returncode=0) .. GENERATED FROM PYTHON SOURCE LINES 1241-1242 Capture output .. GENERATED FROM PYTHON SOURCE LINES 1242-1249 .. 
code-block:: Python out = subprocess.run( ["ls", "-a", "/"], stdout=subprocess.PIPE, stderr=subprocess.STDOUT) # out.stdout is a sequence of bytes that should be decoded into a utf-8 string print(out.stdout.decode('utf-8').split("\n")[:5]) .. rst-class:: sphx-glr-script-out .. code-block:: none ['.', '..', 'bin', 'bin.usr-is-merged', 'boot'] .. GENERATED FROM PYTHON SOURCE LINES 1250-1297 Multiprocessing and multithreading ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Understanding the `Difference between multiprocessing and multithreading `_ is essential to perform efficient parallel processing on multi-core computers. **Multiprocessing** A process is a program instance that has been loaded into memory and is managed by the operating system. Process = address space + execution context (thread of control) - The process address space is made of (memory) segments for (i) code, (ii) data (static/global), (iii) heap (dynamic memory allocation), and (iv) the execution stack (functions' execution context). - The execution context consists of (i) data registers, (ii) Stack Pointer (SP), (iii) Program Counter (PC), and (iv) working registers. OS scheduling of processes: context switching (i.e., save/load the execution context) Pros/cons - Context switching is expensive. - (Potentially) complex data sharing (not necessarily true). - Cooperating processes: no need for memory protection (separate address spaces). - Relevant for parallel computation with memory allocation. **Multithreading** - Threads share the same address space: access to code, heap and (global) data. - Separate execution stack, PC and working registers. Pros/cons - **Faster context switching**: only SP, PC and working registers are saved/loaded. - Can exploit fine-grain concurrency. - Simple data sharing through the shared address space. - **But most concurrent operations are serialized (blocked) by the global interpreter lock (GIL)**. 
In CPython, the GIL lets only one thread execute Python bytecode at a time, which in particular prevents two threads from writing to the same memory at the same time. 
- Relevant for GUI, I/O (network, disk) concurrent operations. **In Python** - **As long as the GIL exists, favor multiprocessing over multithreading for CPU-bound tasks** - Multithreading relies on the ``threading`` module. - Multiprocessing relies on the ``multiprocessing`` module. .. GENERATED FROM PYTHON SOURCE LINES 1300-1306 **Example: Random forest** A random forest is obtained by majority vote of decision trees estimated on bootstrapped samples. Toy dataset .. GENERATED FROM PYTHON SOURCE LINES 1306-1321 .. code-block:: Python import time import numpy as np from sklearn.datasets import make_classification from sklearn.model_selection import train_test_split from sklearn.tree import DecisionTreeClassifier from sklearn.metrics import balanced_accuracy_score # Toy dataset X, y = make_classification(n_features=1000, n_samples=5000, n_informative=20, random_state=1, n_clusters_per_class=3) X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.8, random_state=42) .. GENERATED FROM PYTHON SOURCE LINES 1322-1325 Random forest algorithm: (i) In parallel, fit decision trees on bootstrapped data samples. Make predictions. (ii) Majority vote on predictions .. GENERATED FROM PYTHON SOURCE LINES 1327-1328 1. In parallel, fit decision trees on bootstrapped data samples. Make predictions. .. GENERATED FROM PYTHON SOURCE LINES 1328-1339 .. code-block:: Python def boot_decision_tree(X_train, X_test, y_train, predictions_list=None): N = X_train.shape[0] boot_idx = np.random.choice(np.arange(N), size=N, replace=True) clf = DecisionTreeClassifier(random_state=0) clf.fit(X_train[boot_idx], y_train[boot_idx]) y_pred = clf.predict(X_test) if predictions_list is not None: predictions_list.append(y_pred) return y_pred .. GENERATED FROM PYTHON SOURCE LINES 1340-1341 Independent runs of the decision tree, to see the variability of predictions .. GENERATED FROM PYTHON SOURCE LINES 1341-1346 .. 
code-block:: Python for i in range(5): y_test_boot = boot_decision_tree(X_train, X_test, y_train) print("%.2f" % balanced_accuracy_score(y_test, y_test_boot)) .. rst-class:: sphx-glr-script-out .. code-block:: none 0.64 0.63 0.66 0.62 0.65 .. GENERATED FROM PYTHON SOURCE LINES 1347-1348 2. Majority vote on predictions .. GENERATED FROM PYTHON SOURCE LINES 1348-1358 .. code-block:: Python def vote(predictions): maj = np.apply_along_axis( lambda x: np.argmax(np.bincount(x)), axis=1, arr=predictions ) return maj .. GENERATED FROM PYTHON SOURCE LINES 1359-1362 **Sequential execution** Sequentially fit decision tree on bootstrapped samples, then apply majority vote .. GENERATED FROM PYTHON SOURCE LINES 1362-1372 .. code-block:: Python nboot = 2 start = time.time() y_test_boot = np.dstack([boot_decision_tree(X_train, X_test, y_train) for i in range(nboot)]).squeeze() y_test_vote = vote(y_test_boot) print("Balanced Accuracy: %.2f" % balanced_accuracy_score(y_test, y_test_vote)) print("Sequential execution, elapsed time:", time.time() - start) .. rst-class:: sphx-glr-script-out .. code-block:: none Balanced Accuracy: 0.63 Sequential execution, elapsed time: 1.3636796474456787 .. GENERATED FROM PYTHON SOURCE LINES 1373-1376 **Multithreading** Concurrent (parallel) execution of the function with two threads. .. GENERATED FROM PYTHON SOURCE LINES 1376-1401 .. 
code-block:: Python from threading import Thread predictions_list = list() thread1 = Thread(target=boot_decision_tree, args=(X_train, X_test, y_train, predictions_list)) thread2 = Thread(target=boot_decision_tree, args=(X_train, X_test, y_train, predictions_list)) # Will execute both in parallel start = time.time() thread1.start() thread2.start() # Wait for both threads to finish thread1.join() thread2.join() # Vote on concatenated predictions y_test_boot = np.dstack(predictions_list).squeeze() y_test_vote = vote(y_test_boot) print("Balanced Accuracy: %.2f" % balanced_accuracy_score(y_test, y_test_vote)) print("Concurrent execution with threads, elapsed time:", time.time() - start) .. rst-class:: sphx-glr-script-out .. code-block:: none Balanced Accuracy: 0.64 Concurrent execution with threads, elapsed time: 0.6563949584960938 .. GENERATED FROM PYTHON SOURCE LINES 1402-1417 **Multiprocessing** Concurrent (parallel) execution of the function with processes (jobs) executed in different address (memory) spaces. `Process-based parallelism `_ ``Process()`` for parallel execution and ``Manager()`` for data sharing **Sharing data between processes with Managers** Processes do not share their address space; therefore, sharing data requires a specific mechanism. Managers provide a way to create data which can be shared between different processes, including sharing over a network between processes running on different machines. A manager object controls a server process which manages shared objects. .. GENERATED FROM PYTHON SOURCE LINES 1417-1441 .. 
code-block:: Python from multiprocessing import Process, Manager predictions_list = Manager().list() p1 = Process(target=boot_decision_tree, args=(X_train, X_test, y_train, predictions_list)) p2 = Process(target=boot_decision_tree, args=(X_train, X_test, y_train, predictions_list)) # Will execute both in parallel start = time.time() p1.start() p2.start() # Wait for both processes to finish p1.join() p2.join() # Vote on concatenated predictions y_test_boot = np.dstack(predictions_list).squeeze() y_test_vote = vote(y_test_boot) print("Balanced Accuracy: %.2f" % balanced_accuracy_score(y_test, y_test_vote)) print("Concurrent execution with processes, elapsed time:", time.time() - start) .. rst-class:: sphx-glr-script-out .. code-block:: none Balanced Accuracy: 0.64 Concurrent execution with processes, elapsed time: 0.6514365673065186 .. GENERATED FROM PYTHON SOURCE LINES 1442-1460 ``Pool()`` of **workers (processes or jobs)** for concurrent (parallel) execution of multiple tasks. A pool can be used when *N* independent tasks need to be executed in parallel, in particular when there are more tasks than cores on the computer. 1. Initialize a `Pool(), map(), apply_async(), `_ of *P* workers (processes, or jobs), where *P* < number of cores in the computer. Use `cpu_count` to get the number of logical cores in the current system. See: `Number of CPUs and Cores in Python `_. 2. Map *N* tasks to the *P* workers; here we use the function `Pool.apply_async() `_ that runs the jobs asynchronously. Asynchronous means that calling `pool.apply_async` does not block the execution of the caller, i.e., it returns immediately with an `AsyncResult` object for the task. Submitting a task to the process pool does not block, allowing the caller that issued the task to carry on. 3. Wait for all jobs to complete `pool.join()` 4. Collect the results .. GENERATED FROM PYTHON SOURCE LINES 1460-1488 .. 
code-block:: Python from multiprocessing import Pool, cpu_count # Numbers of logical cores in the current system. # Rule of thumb: Divide by 2 to get nb of physical cores (at least one job) njobs = max(1, cpu_count() // 2) start = time.time() ntasks = 12 pool = Pool(njobs) # Run multiple tasks each with multiple arguments async_results = [pool.apply_async(boot_decision_tree, args=(X_train, X_test, y_train)) for i in range(ntasks)] # Close the process pool & wait for all jobs to complete pool.close() pool.join() # Collect the results y_test_boot = np.dstack([ar.get() for ar in async_results]).squeeze() # Vote on concatenated predictions y_test_vote = vote(y_test_boot) print("Balanced Accuracy: %.2f" % balanced_accuracy_score(y_test, y_test_vote)) print("Concurrent execution with processes, elapsed time:", time.time() - start) .. rst-class:: sphx-glr-script-out .. code-block:: none Balanced Accuracy: 0.64 Concurrent execution with processes, elapsed time: 1.8547492027282715 .. GENERATED FROM PYTHON SOURCE LINES 1489-1536 Scripts and argument parsing ----------------------------- Example, the word count script :: import os import os.path import argparse import re import pandas as pd if __name__ == "__main__": # parse command line options output = "word_count.csv" parser = argparse.ArgumentParser() parser.add_argument('-i', '--input', help='list of input files.', nargs='+', type=str) parser.add_argument('-o', '--output', help='output csv file (default %s)' % output, type=str, default=output) options = parser.parse_args() if options.input is None: parser.print_help() raise SystemExit("Error: input files are missing") else: filenames = [f for f in options.input if os.path.isfile(f)] # Match words regex = re.compile("[a-zA-Z]+") count = dict() for filename in filenames: with open(filename, "r") as fd: for line in fd: for word in regex.findall(line.lower()): if word not in count: count[word] = 1 else: count[word] += 1 # Pandas df = pd.DataFrame([[k, count[k]] for k in 
count], columns=["word", "count"]) df.to_csv(options.output, index=False) .. GENERATED FROM PYTHON SOURCE LINES 1538-1541 Networking ---------- .. GENERATED FROM PYTHON SOURCE LINES 1541-1544 .. code-block:: Python # TODO .. GENERATED FROM PYTHON SOURCE LINES 1545-1547 FTP ~~~ .. GENERATED FROM PYTHON SOURCE LINES 1550-1551 FTP with ``ftplib`` .. GENERATED FROM PYTHON SOURCE LINES 1551-1564 .. code-block:: Python import ftplib ftp = ftplib.FTP("ftp.cea.fr") ftp.login() ftp.cwd('/pub/unati/people/educhesnay/pystatml') ftp.retrlines('LIST') fd = open(os.path.join(tmpdir, "README.md"), "wb") ftp.retrbinary('RETR README.md', fd.write) fd.close() ftp.quit() .. rst-class:: sphx-glr-script-out .. code-block:: none -rwxrwxr-x 1 ftp ftp 3019 Oct 16 2019 README.md -rwxrwxr-x 1 ftp ftp 10672252 Dec 18 2020 StatisticsMachineLearningPython.pdf -rwxrwxr-x 1 ftp ftp 9676120 Nov 12 2020 StatisticsMachineLearningPythonDraft.pdf -rwxrwxr-x 1 ftp ftp 9798485 Jul 08 2020 StatisticsMachineLearningPythonDraft_202007.pdf '221 Goodbye.' .. GENERATED FROM PYTHON SOURCE LINES 1565-1566 FTP file download with ``urllib`` .. GENERATED FROM PYTHON SOURCE LINES 1566-1572 .. code-block:: Python import urllib.request ftp_url = 'ftp://ftp.cea.fr/pub/unati/people/educhesnay/pystatml/README.md' urllib.request.urlretrieve(ftp_url, os.path.join(tmpdir, "README2.md")) .. rst-class:: sphx-glr-script-out .. code-block:: none ('/tmp/README2.md', ) .. GENERATED FROM PYTHON SOURCE LINES 1573-1576 HTTP ~~~~ .. GENERATED FROM PYTHON SOURCE LINES 1576-1579 .. code-block:: Python # TODO .. GENERATED FROM PYTHON SOURCE LINES 1580-1583 Sockets ~~~~~~~ .. GENERATED FROM PYTHON SOURCE LINES 1583-1586 .. code-block:: Python # TODO .. GENERATED FROM PYTHON SOURCE LINES 1587-1590 xmlrpc ~~~~~~ .. GENERATED FROM PYTHON SOURCE LINES 1590-1594 .. code-block:: Python # TODO .. 
GENERATED FROM PYTHON SOURCE LINES 1595-1618 Object Oriented Programming (OOP) --------------------------------- **Sources** - http://python-textbok.readthedocs.org/en/latest/Object\_Oriented\_Programming.html **Principles** - **Encapsulate** data (attributes) and code (methods) into objects. - **Class** = template or blueprint that can be used to create objects. - An **object** is a specific instance of a class. - **Inheritance**: OOP allows classes to inherit commonly used state and behavior from other classes. This reduces code duplication. - **Polymorphism**: (usually obtained through inheritance) calling code is agnostic as to whether an object belongs to a parent class or one of its descendants (abstraction, modularity). The same method called on 2 objects of 2 different classes will behave differently. .. GENERATED FROM PYTHON SOURCE LINES 1618-1643 .. code-block:: Python class Shape2D: def area(self): raise NotImplementedError() # __init__ is a special method called the constructor # Inheritance + Encapsulation class Square(Shape2D): def __init__(self, width): self.width = width def area(self): return self.width ** 2 class Disk(Shape2D): def __init__(self, radius): self.radius = radius def area(self): return math.pi * self.radius ** 2 .. GENERATED FROM PYTHON SOURCE LINES 1644-1645 Object creation .. GENERATED FROM PYTHON SOURCE LINES 1645-1648 .. code-block:: Python square = Square(2) .. GENERATED FROM PYTHON SOURCE LINES 1649-1650 Call a method of the object .. GENERATED FROM PYTHON SOURCE LINES 1650-1653 .. code-block:: Python square.area() .. rst-class:: sphx-glr-script-out .. code-block:: none 4 .. GENERATED FROM PYTHON SOURCE LINES 1654-1655 More sophisticated use .. GENERATED FROM PYTHON SOURCE LINES 1655-1668 .. code-block:: Python shapes = [Square(2), Disk(3)] # Polymorphism print([s.area() for s in shapes]) s = Shape2D() try: s.area() except NotImplementedError as e: print("NotImplementedError", e) .. rst-class:: sphx-glr-script-out .. 
code-block:: none [4, 28.274333882308138] NotImplementedError .. GENERATED FROM PYTHON SOURCE LINES 1669-1681 Style guide for Python programming ---------------------------------- See `PEP 8 `_ - Spaces (four) are the preferred indentation method. - Two blank lines around top-level function or class definitions. - One blank line to indicate logical sections. - Never use: ``from lib import *`` - Bad: ``Capitalized_Words_With_Underscores`` - Function and Variable Names: ``lower_case_with_underscores`` - Class Names: ``CapitalizedWords`` (aka: ``CamelCase``) .. GENERATED FROM PYTHON SOURCE LINES 1684-1696 Documenting ----------- See `Documenting Python `_ Documenting = comments + docstrings (Python documentation string) - `Docstrings `_ are used as documentation for functions, classes, modules, and packages. See it as "living documentation". - Comments are used to explain non-obvious portions of the code. "Dead documentation". Docstrings for functions (same for classes and methods): .. GENERATED FROM PYTHON SOURCE LINES 1696-1723 .. code-block:: Python def my_function(a, b=2): """ This function ... Parameters ---------- a : float First operand. b : float, optional Second operand. The default is 2. Returns ------- Sum of operands. Example ------- >>> my_function(3) 5 """ # Add a with b (this is a comment) return a + b print(help(my_function)) .. rst-class:: sphx-glr-script-out .. code-block:: none Help on function my_function in module __main__: my_function(a, b=2) This function ... Parameters ---------- a : float First operand. b : float, optional Second operand. The default is 2. Returns ------- Sum of operands. Example ------- >>> my_function(3) 5 None .. GENERATED FROM PYTHON SOURCE LINES 1724-1735 Docstrings for scripts: At the beginning of a script add a preamble:: """ Created on Thu Nov 14 12:08:41 CET 2019 @author: firstname.lastname@email.com Some description """ .. 
GENERATED FROM PYTHON SOURCE LINES 1738-1743 Modules and packages -------------------- Python `packages and modules `_ structure Python code into modular "libraries" to be shared. .. GENERATED FROM PYTHON SOURCE LINES 1745-1750 Package ~~~~~~~ Packages are a way of structuring Python’s module namespace by using “dotted module names”. A package is a directory (here, ``stat_pkg``) containing a ``__init__.py`` file. .. GENERATED FROM PYTHON SOURCE LINES 1752-1762 Example, ``package`` :: stat_pkg/ ├── __init__.py └── datasets_mod.py The ``__init__.py`` can be empty. Or it can be used to define the package API, i.e., the modules (``*.py`` files) that are exported and those that remain internal. .. GENERATED FROM PYTHON SOURCE LINES 1764-1772 Example, file ``stat_pkg/__init__.py`` :: # 1) Import functions from the modules of the package from .datasets_mod import make_regression # 2) Make them visible in the package __all__ = ["make_regression"] .. GENERATED FROM PYTHON SOURCE LINES 1775-1786 Module ~~~~~~ A module is a Python file. Example, ``stat_pkg/datasets_mod.py`` :: import numpy as np def make_regression(n_samples=10, n_features=2, add_intercept=False): ... return X, y, coef .. GENERATED FROM PYTHON SOURCE LINES 1789-1791 Usage .. GENERATED FROM PYTHON SOURCE LINES 1791-1797 .. code-block:: Python import stat_pkg as pkg X, y, coef = pkg.make_regression() print(X.shape) .. rst-class:: sphx-glr-script-out .. code-block:: none (10, 2) .. GENERATED FROM PYTHON SOURCE LINES 1798-1818 The search path ~~~~~~~~~~~~~~~ With a directive like ``import stat_pkg``, Python searches for - a module, file named ``stat_pkg.py`` or, - a package, directory named ``stat_pkg`` containing a ``stat_pkg/__init__.py`` file. Python will search in a list of directories given by the variable ``sys.path``. This variable is initialized from these locations: - The directory containing the input script (or the current directory when no file is specified). 
- ``PYTHONPATH`` (a list of directory names, with the same syntax as the shell variable ``PATH``).

In our case, to be able to import ``stat_pkg``, the parent directory of ``stat_pkg`` must be in ``sys.path``.
You can set ``PYTHONPATH`` by any method, or extend ``sys.path`` directly through the ``sys`` module, for example:

::

    import sys
    sys.path.append("/home/ed203246/git/pystatsml/python_lang")

.. GENERATED FROM PYTHON SOURCE LINES 1820-1831

Unit testing
------------

When developing a library (e.g., a Python package) that is bound to evolve and be corrected, we want to ensure that:
(i) the code correctly implements the expected functionalities;
(ii) modifications and additions don't break those functionalities.

Unit testing is a framework to assess those two points. See sources:

- `Unit testing reference doc `_
- `Getting Started With Testing in Python `_

.. GENERATED FROM PYTHON SOURCE LINES 1833-1868

unittest: test your code
~~~~~~~~~~~~~~~~~~~~~~~~

1) Write unit tests (test cases)

In a directory usually called ``tests`` create a `test case `_, i.e., a Python file ``test_datasets_mod.py`` (general syntax is ``test_.py``) that will execute some functionalities of the module and test whether the outputs are as expected.
The ``test_datasets_mod.py`` file contains specific directives:

- ``import unittest``,
- ``class TestDatasets(unittest.TestCase)``, the test case class. The general syntax is ``class Test(unittest.TestCase)``
- ``def test_make_regression(self)``, a method that tests one function or element of the module. The general syntax is ``test_(self)``
- ``self.assertTrue(np.allclose(X.shape, (10, 4)))``, a check of one specific functionality. The general syntax is ``self.assert()``
- ``unittest.main()``, the entry point that executes the tests.
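
The directives listed above can be tried out without ``stat_pkg``. The sketch below is a minimal, self-contained illustration, not the package's own test: ``make_design_matrix`` is a hypothetical pure-Python stand-in for ``make_regression``, and the tests are loaded and run programmatically with ``unittest.TextTestRunner`` rather than through ``unittest.main()``, so that the result object can be inspected.

.. code-block:: Python

    import io
    import unittest


    def make_design_matrix(n_samples=10, n_features=2, add_intercept=False):
        """Hypothetical stand-in for make_regression: rows of zeros, with a
        leading column of ones when add_intercept is True."""
        n_cols = n_features + 1 if add_intercept else n_features
        rows = [[0.0] * n_cols for _ in range(n_samples)]
        if add_intercept:
            for row in rows:
                row[0] = 1.0
        return rows


    class TestDesignMatrix(unittest.TestCase):
        """Test case class: each test_* method is one individual test."""

        def test_shape_with_intercept(self):
            X = make_design_matrix(n_samples=10, n_features=3,
                                   add_intercept=True)
            self.assertEqual(len(X), 10)     # number of rows
            self.assertEqual(len(X[0]), 4)   # 3 features + intercept

        def test_intercept_column(self):
            X = make_design_matrix(n_samples=5, add_intercept=True)
            self.assertTrue(all(row[0] == 1.0 for row in X))


    # Load the test case and run it with a text runner writing to a buffer
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestDesignMatrix)
    stream = io.StringIO()
    result = unittest.TextTestRunner(stream=stream, verbosity=2).run(suite)

    print(stream.getvalue())
    print("tests run:", result.testsRun)
    print("all tests passed:", result.wasSuccessful())

Using ``assertEqual`` on plain integers gives a more informative failure message than ``assertTrue``; this is a design choice of the sketch, not a requirement of ``unittest``.
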
Example:

::

    import unittest
    import numpy as np
    from stat_pkg import make_regression

    class TestDatasets(unittest.TestCase):

        def test_make_regression(self):
            X, y, coefs = make_regression(n_samples=10, n_features=3,
                                          add_intercept=True)
            self.assertTrue(np.allclose(X.shape, (10, 4)))
            self.assertTrue(np.allclose(y.shape, (10, )))
            self.assertTrue(np.allclose(coefs.shape, (4, )))

    if __name__ == '__main__':
        unittest.main()

.. GENERATED FROM PYTHON SOURCE LINES 1870-1896

2) Run the tests (test runner)

The `test runner `_ orchestrates the execution of tests and provides the outcome to the user.
Many `test runners `_ are available.

`unittest `_ is the first unit test framework; it comes with the Python standard library.
It employs an object-oriented approach, grouping tests into classes known as test cases, each containing distinct methods representing individual tests.

unittest generally requires that tests are organized as importable modules, `see details `_. Here, we do not introduce this complexity: we directly execute a test file that isn’t importable as a module.

::

    python tests/test_datasets_mod.py

`Unittest test discovery `_ (``-m unittest discover``) finds and runs the tests within (``-s``) the ``tests`` directory, with verbose (``-v``) output.

::

    python -m unittest discover -v -s tests

.. GENERATED FROM PYTHON SOURCE LINES 1898-1961

Doctest: add unit tests in docstring
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`Doctest `_ is a built-in test framework that comes bundled with Python by default. The doctest module searches for code fragments that resemble interactive Python sessions and runs those sessions to confirm they operate as shown. It promotes `Test-driven (TDD) methodology `_.

1) Add doc tests in the docstrings, see `python stat_pkg/supervised_models.py `_:

::

    class LinearRegression:
        """Ordinary least squares Linear Regression.

        ...
        Examples
        --------
        >>> import numpy as np
        >>> from stat_pkg import LinearRegression
        >>> X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
        >>> # y = 1 * x_0 + 2 * x_1 + 3
        >>> y = np.dot(X, np.array([1, 2])) + 3
        >>> reg = LinearRegression().fit(X, y)
        >>> reg.coef_
        array([3., 1., 2.0])
        >>> reg.predict(np.array([[3, 5]]))
        array([16.])
        """
        def __init__(self, fit_intercept=True):
            self.fit_intercept = fit_intercept
        ...

2) Add the call to the doctest module at the end of the Python file:

::

    if __name__ == "__main__":
        import doctest
        doctest.testmod()

3) Run doc tests:

::

    python stat_pkg/supervised_models.py

The test fails because doctest compares the expected and actual output as text, and ``2.0`` does not match the ``2.`` printed by NumPy:

::

    **********************************************************************
    File ".../supervised_models.py", line 36, in __main__.LinearRegression
    Failed example:
        reg.coef_
    Expected:
        array([3., 1., 2.0])
    Got:
        array([3., 1., 2.])
    **********************************************************************
    1 items had failures:
       1 of   7 in __main__.LinearRegression
    ***Test Failed*** 1 failures.

.. GENERATED FROM PYTHON SOURCE LINES 1963-1965

~~~~~~~~~~~~~~~~~~~~~~~~

.. GENERATED FROM PYTHON SOURCE LINES 1969-1972

Exercises
---------

.. GENERATED FROM PYTHON SOURCE LINES 1975-1984

Exercise 1: functions
~~~~~~~~~~~~~~~~~~~~~

Create a function that acts as a simple calculator taking three parameters:
the two operands and the operation in "+", "-", and "*". As default use "+".
If the operation is misspecified, return an error message.

Ex: ``calc(4,5,"*")`` returns 20

Ex: ``calc(3,5)`` returns 8

Ex: ``calc(1, 2, "something")`` returns error message

.. GENERATED FROM PYTHON SOURCE LINES 1987-1998

Exercise 2: functions + list + loop
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Given a list of numbers, return a list where all adjacent duplicate elements
have been reduced to a single element. Ex: ``[1, 2, 2, 3, 2]`` returns
``[1, 2, 3, 2]``. You may create a new list or modify the passed in list.
Remove all duplicate values (adjacent or not). Ex: ``[1, 2, 2, 3, 2]`` returns ``[1, 2, 3]``

.. GENERATED FROM PYTHON SOURCE LINES 2001-2016

Exercise 3: File I/O
~~~~~~~~~~~~~~~~~~~~

1. Copy/paste the BSD 4-clause license (https://en.wikipedia.org/wiki/BSD_licenses)
   into a text file. Read the file and count the occurrences of each word
   within the file. Store the number of occurrences of each word in a dictionary.

2. Write an executable Python command ``count_words.py`` that parses a list
   of input files provided after the ``--input`` parameter. The dictionary of
   occurrences is saved in a CSV file given by ``--output``, with default
   value ``word_count.csv``.

Use:

- open
- regular expression
- argparse (https://docs.python.org/3/howto/argparse.html)

.. GENERATED FROM PYTHON SOURCE LINES 2019-2038

Exercise 4: OOP
~~~~~~~~~~~~~~~

1. Create a class ``Employee`` with 2 attributes provided in the constructor:
   ``name``, ``years_of_service``, and one method ``salary``, which is
   computed as ``1500 + 100 * years_of_service``.

2. Create a subclass ``Manager`` which redefines the ``salary`` method as
   ``2500 + 120 * years_of_service``.

3. Create a small dictionary-based database where the key is the
   employee's name. Populate the database with:
   ``samples = Employee('lucy', 3), Employee('john', 1), Manager('julie', 10), Manager('paul', 3)``

4. Return a table made of (name, salary) rows, i.e., a list of lists ``[[name, salary]]``

5. Compute the average salary

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 8.467 seconds)

.. _sphx_glr_download_auto_gallery_python_lang.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: python_lang.ipynb `

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: python_lang.py `

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: python_lang.zip `

.. only:: html

 .. 
rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_