Machine Learning - Deep Learning

Data Wrangling - Data Cleanup

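from csv import DictReader

# Open both files in a with-block so they are closed automatically;
# the csv module documentation recommends newline='' for files it reads.
with open('mn.csv', 'rt', newline='') as data_file, \
     open('mn_headers.csv', 'rt', newline='') as header_file:
  data_rows = list(DictReader(data_file))      # one dict per survey response
  header_rows = list(DictReader(header_file))  # one dict per column description
print(data_rows[:5])
print(header_rows[:5])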

[{'': '1', 'HH1': '1', 'HH2': '17', 'LN': '1', 'MWM1': '1', 'MWM2': '17', 'MWM4': '1', 'MWM5': '14', 'MWM6D': '7', 'MWM6M': '4', 'MWM6Y': '2014', 'MWM7': 'Completed', 'MWM8': '2', 'MWM9': '20', 'MWM10H': '17', 'MWM10M': '59', 'MWM11H': '18', 'MWM11M': '7', 'MWB1M': '5', 'MWB1Y': '1984', 'MWB2': '29', 'MWB3': 'Yes', 'MWB4': 'Higher', 'MWB5': '31', 'MWB7': 'NA', 'MMT2': 'Almost every day', 'MMT3': 'At least once a week', 'MMT4': 'Less than once a week', 'MMT6': 'Yes', 'MMT7': 'Yes', 'MMT8': 'Almost every day', 'MMT9': 'Yes', 'MMT10': 'Yes', 'MMT11': 'Almost every day', 'MMT12': 'Yes', 'MMT13': 'Yes', 'MMT14': 'Almost every day', 'MCM1': 'No', 'MCM3': 'NA', 'MCM4': 'NA', 'MCM5A': 'NA', 'MCM5B': 'NA', 'MCM6': 'NA', 'MCM7A': 'NA', 'MCM7B': 'NA', 'MCM8': 'No', 'MCM9A': 'NA', 'MCM9B': 'NA', 'MCM10': '0', 'MCM11A': 'NA', 'MCM11B': 'NA', 'MCM12M': 'NA', 'MCM12Y': 'NA', 'MDV1A': 'No', 'MDV1B': 'No', 'MDV1C': 'No', 'MDV1D': 'No', 'MDV1E': 'No', 'MDV1F': 'No', 'MMA1': 'Yes, currently married', 'MMA3': 'No (Only one)', 'MMA4': 'NA', 'MMA5': 'NA', 'MMA6': 'NA', 'MMA7': 'Only once', 'MMA8M': '9', 'MMA8Y': '2013', 'MMA9': 'NA', 'MSB1': '20', 'MSB2': 'Yes', 'MSB3U': 'Days ago', 'MSB3N': '0', 'MSB4': 'No', 'MSB5': 'Wife', 'MSB8': 'No', 'MSB9': 'NA', 'MSB10': 'NA', 'MSB13': 'NA', 'MSB14': 'NA', 'MSB15': '5', 'MHA1': 'Yes', 'MHA2': 'Yes', 'MHA3': 'No', 'MHA4': 'Yes', 'MHA5': 'No', 'MHA6': 'No', 'MHA7': 'Yes', 'MHA8A': 'DK', 'MHA8B': 'Yes', 'MHA8C': 'DK', 'MHA9': 'Yes', 'MHA10': 'Yes', 'MHA11': 'No', 'MHA12': 'Yes', 'MHA24': 'Yes', 'MHA25': 'Less than 12 months ago', 'MHA26': 'Yes', 'MHA27': 'NA', 'MMC1': 'No', 'MMC2': 'NA', 'MMC3': 'NA', 'MMC4': 'NA', 'MTA1': 'No', 'MTA2': 'NA', 'MTA3': 'NA', 'MTA4': 'NA', 'MTA5': 'NA', 'MTA6': 'No', 'MTA7': 'NA', 'MTA8A': 'NA', 'MTA8B': 'NA', 'MTA8C': 'NA', 'MTA8D': 'NA', 'MTA8E': 'NA', 'MTA8X': 'NA', 'MTA9': 'NA', 'MTA10': 'No', 'MTA11': 'NA', 'MTA12A': 'NA', 'MTA12B': 'NA', 'MTA12C': 'NA', 'MTA12X': 'NA', 'MTA13': 'NA', 'MTA14': 'Yes', 'MTA15': 
'16', 'MTA16': '0', 'MTA17': 'NA', 'TNLN': 'NA', 'TN4': 'NA', 'TN5': 'NA', 'TN6': 'NA', 'TN8': 'NA', 'TN9': 'NA', 'TN10': 'NA', 'TN11': 'NA', 'TN12_1': 'NA', 'TN12_2': 'NA', 'TN12_3': 'NA', 'TN12_4': 'NA', 'HH6': 'Urban', 'HH7': 'Bulawayo', 'MWDOI': '1372', 'MWDOB': '1013', 'MWAGE': '25-29', 'MWDOM': '1365', 'MWAGEM': '29', 'MWDOBLC': 'NA', 'MMSTATUS': 'Currently married/in union', 'MCEB': '0', 'MCSURV': '0', 'MCDEAD': '0', 'mwelevel': 'Higher', 'mnweight': '0.403797141860459', 'wscore': '1.60367010204171', 'windex5': '5', 'wscoreu': '1.27255184167736', 'windex5u': '5', 'wscorer': 'NA', 'windex5r': 'NA'}, {'': '2', 'HH1': '1', 'HH2': '20', 'LN': '1', 'MWM1': '1', 'MWM2': '20', 'MWM4': '1', 'MWM5': '14', 'MWM6D': '7', 'MWM6M': '4', 'MWM6Y': '2014', 'MWM7': 'Completed', 'MWM8': '2', 'MWM9': '20', 'MWM10H': '17', 'MWM10M': '32', 'MWM11H': '17', 'MWM11M': '42', 'MWB1M': '5', 'MWB1Y': '1976', 'MWB2': '37', 'MWB3': 'Yes', 'MWB4': 'Higher', 'MWB5': '31', 'MWB7': 'NA', 'MMT2': 'At least once a week', 'MMT3': 'Not at all', 'MMT4': 'Almost every day', 'MMT6': 'Yes', 'MMT7': 'Yes', 'MMT8': 'Almost every day', 'MMT9': 'Yes', 'MMT10': 'Yes', 'MMT11': 'Almost every day', 'MMT12': 'Yes', 'MMT13': 'Yes', 'MMT14': 'Almost every day', 'MCM1': 'No', 'MCM3': 'NA', 'MCM4': 'NA', 'MCM5A': 'NA', 'MCM5B': 'NA', 'MCM6': 'NA', 'MCM7A': 'NA', 'MCM7B': 'NA', 'MCM8': 'No', 'MCM9A': 'NA', 'MCM9B': 'NA', 'MCM10': '0', 'MCM11A': 'NA', 'MCM11B': 'NA', 'MCM12M': 'NA', 'MCM12Y': 'NA', 'MDV1A': 'No', 'MDV1B': 'No', 'MDV1C': 'No', 'MDV1D': 'No', 'MDV1E': 'No', 'MDV1F': 'No', 'MMA1': 'Yes, currently married', 'MMA3': 'No (Only one)', 'MMA4': 'NA', 'MMA5': 'NA', 'MMA6': 'NA', 'MMA7': 'Only once', 'MMA8M': '2', 'MMA8Y': '2014', 'MMA9': 'NA', 'MSB1': '37', 'MSB2': 'No', 'MSB3U': 'Days ago', 'MSB3N': '0', 'MSB4': 'No', 'MSB5': 'Wife', 'MSB8': 'No', 'MSB9': 'NA', 'MSB10': 'NA', 'MSB13': 'NA', 'MSB14': 'NA', 'MSB15': '1', 'MHA1': 'Yes', 'MHA2': 'Yes', 'MHA3': 'No', 'MHA4': 'Yes', 'MHA5': 'No', 'MHA6': 'No', 
'MHA7': 'Yes', 'MHA8A': 'No', 'MHA8B': 'Yes', 'MHA8C': 'Yes', 'MHA9': 'Yes', 'MHA10': 'Yes', 'MHA11': 'Yes', 'MHA12': 'No', 'MHA24': 'Yes', 'MHA25': 'Less than 12 months ago', 'MHA26': 'Yes', 'MHA27': 'NA', 'MMC1': 'No', 'MMC2': 'NA', 'MMC3': 'NA', 'MMC4': 'NA', 'MTA1': 'No', 'MTA2': 'NA', 'MTA3': 'NA', 'MTA4': 'NA', 'MTA5': 'NA', 'MTA6': 'No', 'MTA7': 'NA', 'MTA8A': 'NA', 'MTA8B': 'NA', 'MTA8C': 'NA', 'MTA8D': 'NA', 'MTA8E': 'NA', 'MTA8X': 'NA', 'MTA9': 'NA', 'MTA10': 'No', 'MTA11': 'NA', 'MTA12A': 'NA', 'MTA12B': 'NA', 'MTA12C': 'NA', 'MTA12X': 'NA', 'MTA13': 'NA', 'MTA14': 'No', 'MTA15': 'NA', 'MTA16': 'NA', 'MTA17': 'NA', 'TNLN': 'NA', 'TN4': 'NA', 'TN5': 'NA', 'TN6': 'NA', 'TN8': 'NA', 'TN9': 'NA', 'TN10': 'NA', 'TN11': 'NA', 'TN12_1': 'NA', 'TN12_2': 'NA', 'TN12_3': 'NA', 'TN12_4': 'NA', 'HH6': 'Urban', 'HH7': 'Bulawayo', 'MWDOI': '1372', 'MWDOB': '917', 'MWAGE': '35-39', 'MWDOM': '1370', 'MWAGEM': '37', 'MWDOBLC': 'NA', 'MMSTATUS': 'Currently married/in union', 'MCEB': '0', 'MCSURV': '0', 'MCDEAD': '0', 'mwelevel': 'Higher', 'mnweight': '0.403797141860459', 'wscore': '1.54327702631422', 'windex5': '5', 'wscoreu': '1.08902631982422', 'windex5u': '5', 'wscorer': 'NA', 'windex5r': 'NA'}, {'': '3', 'HH1': '2', 'HH2': '1', 'LN': '1', 'MWM1': '2', 'MWM2': '1', 'MWM4': '1', 'MWM5': '9', 'MWM6D': '8', 'MWM6M': '4', 'MWM6Y': '2014', 'MWM7': 'Completed', 'MWM8': '1', 'MWM9': '40', 'MWM10H': '10', 'MWM10M': '37', 'MWM11H': '10', 'MWM11M': '52', 'MWB1M': '2', 'MWB1Y': '1973', 'MWB2': '41', 'MWB3': 'Yes', 'MWB4': 'Primary', 'MWB5': '17', 'MWB7': 'Able to read whole sentence', 'MMT2': 'Not at all', 'MMT3': 'Almost every day', 'MMT4': 'Less than once a week', 'MMT6': 'No', 'MMT7': 'NA', 'MMT8': 'NA', 'MMT9': 'No', 'MMT10': 'NA', 'MMT11': 'NA', 'MMT12': 'Yes', 'MMT13': 'Yes', 'MMT14': 'Almost every day', 'MCM1': 'Yes', 'MCM3': '21', 'MCM4': 'Yes', 'MCM5A': '1', 'MCM5B': '1', 'MCM6': 'Yes', 'MCM7A': '1', 'MCM7B': '0', 'MCM8': 'No', 'MCM9A': 'NA', 'MCM9B': 'NA', 'MCM10': '3', 
'MCM11A': 'No', 'MCM11B': '2', 'MCM12M': '5', 'MCM12Y': '2012', 'MDV1A': 'No', 'MDV1B': 'No', 'MDV1C': 'No', 'MDV1D': 'No', 'MDV1E': 'No', 'MDV1F': 'No', 'MMA1': 'Yes, currently married', 'MMA3': 'No (Only one)', 'MMA4': 'NA', 'MMA5': 'NA', 'MMA6': 'NA', 'MMA7': 'More than once', 'MMA8M': '8', 'MMA8Y': '1991', 'MMA9': 'NA', 'MSB1': '20', 'MSB2': 'No', 'MSB3U': 'Days ago', 'MSB3N': '0', 'MSB4': 'Yes', 'MSB5': 'Wife', 'MSB8': 'No', 'MSB9': 'NA', 'MSB10': 'NA', 'MSB13': 'NA', 'MSB14': 'NA', 'MSB15': '5', 'MHA1': 'Yes', 'MHA2': 'Yes', 'MHA3': 'No', 'MHA4': 'Yes', 'MHA5': 'No', 'MHA6': 'No', 'MHA7': 'Yes', 'MHA8A': 'No', 'MHA8B': 'Yes', 'MHA8C': 'No', 'MHA9': 'Yes', 'MHA10': 'Yes', 'MHA11': 'No', 'MHA12': 'Yes', 'MHA24': 'Yes', 'MHA25': '2 or more years ago', 'MHA26': 'Yes', 'MHA27': 'NA', 'MMC1': 'No', 'MMC2': 'NA', 'MMC3': 'NA', 'MMC4': 'NA', 'MTA1': 'Yes', 'MTA2': '18', 'MTA3': 'Yes', 'MTA4': '10', 'MTA5': '30', 'MTA6': 'Yes', 'MTA7': 'Yes', 'MTA8A': 'NA', 'MTA8B': 'NA', 'MTA8C': 'NA', 'MTA8D': 'NA', 'MTA8E': 'Rolled tobacco', 'MTA8X': 'NA', 'MTA9': '7', 'MTA10': 'No', 'MTA11': 'NA', 'MTA12A': 'NA', 'MTA12B': 'NA', 'MTA12C': 'NA', 'MTA12X': 'NA', 'MTA13': 'NA', 'MTA14': 'Yes', 'MTA15': '19', 'MTA16': '8', 'MTA17': '2', 'TNLN': 'NA', 'TN4': 'NA', 'TN5': 'NA', 'TN6': 'NA', 'TN8': 'NA', 'TN9': 'NA', 'TN10': 'NA', 'TN11': 'NA', 'TN12_1': 'NA', 'TN12_2': 'NA', 'TN12_3': 'NA', 'TN12_4': 'NA', 'HH6': 'Urban', 'HH7': 'Bulawayo', 'MWDOI': '1372', 'MWDOB': '878', 'MWAGE': '40-44', 'MWDOM': '1100', 'MWAGEM': '18', 'MWDOBLC': 'NA', 'MMSTATUS': 'Currently married/in union', 'MCEB': '3', 'MCSURV': '3', 'MCDEAD': '0', 'mwelevel': 'Primary', 'mnweight': '1.03192602919895', 'wscore': '0.878635263695964', 'windex5': '4', 'wscoreu': '-0.930720561098312', 'windex5u': '1', 'wscorer': 'NA', 'windex5r': 'NA'}, {'': '4', 'HH1': '2', 'HH2': '1', 'LN': '5', 'MWM1': '2', 'MWM2': '1', 'MWM4': '5', 'MWM5': '9', 'MWM6D': '12', 'MWM6M': '4', 'MWM6Y': '2014', 'MWM7': 'Not at home', 'MWM8': '1', 
'MWM9': '40', 'MWM10H': 'NA', 'MWM10M': 'NA', 'MWM11H': 'NA', 'MWM11M': 'NA', 'MWB1M': 'NA', 'MWB1Y': 'NA', 'MWB2': 'NA', 'MWB3': 'NA', 'MWB4': 'NA', 'MWB5': 'NA', 'MWB7': 'NA', 'MMT2': 'NA', 'MMT3': 'NA', 'MMT4': 'NA', 'MMT6': 'NA', 'MMT7': 'NA', 'MMT8': 'NA', 'MMT9': 'NA', 'MMT10': 'NA', 'MMT11': 'NA', 'MMT12': 'NA', 'MMT13': 'NA', 'MMT14': 'NA', 'MCM1': 'NA', 'MCM3': 'NA', 'MCM4': 'NA', 'MCM5A': 'NA', 'MCM5B': 'NA', 'MCM6': 'NA', 'MCM7A': 'NA', 'MCM7B': 'NA', 'MCM8': 'NA', 'MCM9A': 'NA', 'MCM9B': 'NA', 'MCM10': 'NA', 'MCM11A': 'NA', 'MCM11B': 'NA', 'MCM12M': 'NA', 'MCM12Y': 'NA', 'MDV1A': 'NA', 'MDV1B': 'NA', 'MDV1C': 'NA', 'MDV1D': 'NA', 'MDV1E': 'NA', 'MDV1F': 'NA', 'MMA1': 'NA', 'MMA3': 'NA', 'MMA4': 'NA', 'MMA5': 'NA', 'MMA6': 'NA', 'MMA7': 'NA', 'MMA8M': 'NA', 'MMA8Y': 'NA', 'MMA9': 'NA', 'MSB1': 'NA', 'MSB2': 'NA', 'MSB3U': 'NA', 'MSB3N': 'NA', 'MSB4': 'NA', 'MSB5': 'NA', 'MSB8': 'NA', 'MSB9': 'NA', 'MSB10': 'NA', 'MSB13': 'NA', 'MSB14': 'NA', 'MSB15': 'NA', 'MHA1': 'NA', 'MHA2': 'NA', 'MHA3': 'NA', 'MHA4': 'NA', 'MHA5': 'NA', 'MHA6': 'NA', 'MHA7': 'NA', 'MHA8A': 'NA', 'MHA8B': 'NA', 'MHA8C': 'NA', 'MHA9': 'NA', 'MHA10': 'NA', 'MHA11': 'NA', 'MHA12': 'NA', 'MHA24': 'NA', 'MHA25': 'NA', 'MHA26': 'NA', 'MHA27': 'NA', 'MMC1': 'NA', 'MMC2': 'NA', 'MMC3': 'NA', 'MMC4': 'NA', 'MTA1': 'NA', 'MTA2': 'NA', 'MTA3': 'NA', 'MTA4': 'NA', 'MTA5': 'NA', 'MTA6': 'NA', 'MTA7': 'NA', 'MTA8A': 'NA', 'MTA8B': 'NA', 'MTA8C': 'NA', 'MTA8D': 'NA', 'MTA8E': 'NA', 'MTA8X': 'NA', 'MTA9': 'NA', 'MTA10': 'NA', 'MTA11': 'NA', 'MTA12A': 'NA', 'MTA12B': 'NA', 'MTA12C': 'NA', 'MTA12X': 'NA', 'MTA13': 'NA', 'MTA14': 'NA', 'MTA15': 'NA', 'MTA16': 'NA', 'MTA17': 'NA', 'TNLN': 'NA', 'TN4': 'NA', 'TN5': 'NA', 'TN6': 'NA', 'TN8': 'NA', 'TN9': 'NA', 'TN10': 'NA', 'TN11': 'NA', 'TN12_1': 'NA', 'TN12_2': 'NA', 'TN12_3': 'NA', 'TN12_4': 'NA', 'HH6': 'Urban', 'HH7': 'Bulawayo', 'MWDOI': '1372', 'MWDOB': 'NA', 'MWAGE': 'NA', 'MWDOM': 'NA', 'MWAGEM': 'NA', 'MWDOBLC': 'NA', 'MMSTATUS': 'NA', 'MCEB': 
'NA', 'MCSURV': 'NA', 'MCDEAD': 'NA', 'mwelevel': 'NA', 'mnweight': '0', 'wscore': '0', 'windex5': '0', 'wscoreu': '0', 'windex5u': '0', 'wscorer': '0', 'windex5r': '0'}, {'': '5', 'HH1': '2', 'HH2': '1', 'LN': '8', 'MWM1': '2', 'MWM2': '1', 'MWM4': '8', 'MWM5': '9', 'MWM6D': '8', 'MWM6M': '4', 'MWM6Y': '2014', 'MWM7': 'Completed', 'MWM8': '1', 'MWM9': '40', 'MWM10H': '10', 'MWM10M': '53', 'MWM11H': '11', 'MWM11M': '10', 'MWB1M': '2', 'MWB1Y': '1993', 'MWB2': '21', 'MWB3': 'Yes', 'MWB4': 'Secondary', 'MWB5': '24', 'MWB7': 'NA', 'MMT2': 'Less than once a week', 'MMT3': 'At least once a week', 'MMT4': 'Less than once a week', 'MMT6': 'No', 'MMT7': 'NA', 'MMT8': 'NA', 'MMT9': 'No', 'MMT10': 'NA', 'MMT11': 'NA', 'MMT12': 'Yes', 'MMT13': 'Yes', 'MMT14': 'Almost every day', 'MCM1': 'No', 'MCM3': 'NA', 'MCM4': 'NA', 'MCM5A': 'NA', 'MCM5B': 'NA', 'MCM6': 'NA', 'MCM7A': 'NA', 'MCM7B': 'NA', 'MCM8': 'No', 'MCM9A': 'NA', 'MCM9B': 'NA', 'MCM10': '0', 'MCM11A': 'NA', 'MCM11B': 'NA', 'MCM12M': 'NA', 'MCM12Y': 'NA', 'MDV1A': 'No', 'MDV1B': 'No', 'MDV1C': 'No', 'MDV1D': 'No', 'MDV1E': 'No', 'MDV1F': 'No', 'MMA1': 'No, not in union', 'MMA3': 'NA', 'MMA4': 'NA', 'MMA5': 'No', 'MMA6': 'NA', 'MMA7': 'NA', 'MMA8M': 'NA', 'MMA8Y': 'NA', 'MMA9': 'NA', 'MSB1': '19', 'MSB2': 'Yes', 'MSB3U': 'Months ago', 'MSB3N': '7', 'MSB4': 'Yes', 'MSB5': 'Girlfriend', 'MSB8': 'No', 'MSB9': 'NA', 'MSB10': 'NA', 'MSB13': 'NA', 'MSB14': 'NA', 'MSB15': '3', 'MHA1': 'Yes', 'MHA2': 'Yes', 'MHA3': 'No', 'MHA4': 'Yes', 'MHA5': 'No', 'MHA6': 'No', 'MHA7': 'Yes', 'MHA8A': 'Yes', 'MHA8B': 'Yes', 'MHA8C': 'Yes', 'MHA9': 'Yes', 'MHA10': 'Yes', 'MHA11': 'No', 'MHA12': 'Yes', 'MHA24': 'Yes', 'MHA25': '12-23 months ago', 'MHA26': 'Yes', 'MHA27': 'NA', 'MMC1': 'No', 'MMC2': 'NA', 'MMC3': 'NA', 'MMC4': 'NA', 'MTA1': 'No', 'MTA2': 'NA', 'MTA3': 'NA', 'MTA4': 'NA', 'MTA5': 'NA', 'MTA6': 'No', 'MTA7': 'NA', 'MTA8A': 'NA', 'MTA8B': 'NA', 'MTA8C': 'NA', 'MTA8D': 'NA', 'MTA8E': 'NA', 'MTA8X': 'NA', 'MTA9': 'NA', 'MTA10': 'No', 
'MTA11': 'NA', 'MTA12A': 'NA', 'MTA12B': 'NA', 'MTA12C': 'NA', 'MTA12X': 'NA', 'MTA13': 'NA', 'MTA14': 'Yes', 'MTA15': '20', 'MTA16': '10', 'MTA17': '2', 'TNLN': 'NA', 'TN4': 'NA', 'TN5': 'NA', 'TN6': 'NA', 'TN8': 'NA', 'TN9': 'NA', 'TN10': 'NA', 'TN11': 'NA', 'TN12_1': 'NA', 'TN12_2': 'NA', 'TN12_3': 'NA', 'TN12_4': 'NA', 'HH6': 'Urban', 'HH7': 'Bulawayo', 'MWDOI': '1372', 'MWDOB': '1118', 'MWAGE': '20-24', 'MWDOM': 'NA', 'MWAGEM': 'NA', 'MWDOBLC': 'NA', 'MMSTATUS': 'Never married/in union', 'MCEB': '0', 'MCSURV': '0', 'MCDEAD': '0', 'mwelevel': 'Secondary', 'mnweight': '1.03192602919895', 'wscore': '0.878635263695964', 'windex5': '4', 'wscoreu': '-0.930720561098312', 'windex5u': '1', 'wscorer': 'NA', 'windex5r': 'NA'}]
[{'Name': 'HH1', 'Label': 'Cluster number', 'Question': ''}, {'Name': 'HH2', 'Label': 'Household number', 'Question': ''}, {'Name': 'LN', 'Label': 'Line number', 'Question': ''}, {'Name': 'MWM1', 'Label': 'Cluster number', 'Question': ''}, {'Name': 'MWM2', 'Label': 'Household number', 'Question': ''}]
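The first key in each record above is an empty string (''). That is how csv.DictReader names a column whose header cell is blank; here it appears to be a row index written by whatever tool exported mn.csv (an assumption, since the export step is not shown). A minimal sketch with a hypothetical three-column sample held in memory:

```python
import io
from csv import DictReader

# Hypothetical stand-in for mn.csv: the first header cell is blank,
# so DictReader uses '' as that column's key.
sample = io.StringIO(',HH1,HH2\n1,1,17\n2,1,20\n')

rows = list(DictReader(sample))
print(rows[0])
print(list(rows[0]))  # ['', 'HH1', 'HH2']
```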

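# Compare every data column name with every value in every header row.
# Note: hval also covers the 'Label' and 'Question' fields, so the unnamed
# first column ('') "matches" any header whose Question is empty.
for data_dict in data_rows:
  for dkey, dval in data_dict.items():
    for header_dict in header_rows:
      for hkey, hval in header_dict.items():
        if dkey == hval:
          print('match!')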


match!
match!
match!
match!
match!
match!
match!
match!
match!
match!
match!
match!
match!
match!
match!
.....
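The nested loop above rescans every header row for every key of every data row, and it compares against all header values rather than just the short name. A minimal sketch of the same check, assuming only the 'Name' field should be compared, builds a set of known short names once. The data below is a hypothetical miniature of data_rows and header_rows:

```python
# Hypothetical miniature versions of data_rows and header_rows
data_rows = [
    {'': '1', 'HH1': '1', 'HH2': '17', 'mnweight': '0.40'},
    {'': '2', 'HH1': '1', 'HH2': '20', 'mnweight': '0.40'},
]
header_rows = [
    {'Name': 'HH1', 'Label': 'Cluster number', 'Question': ''},
    {'Name': 'HH2', 'Label': 'Household number', 'Question': ''},
]

# Build the set of known short names once instead of rescanning
# header_rows for every key of every data row.
known_names = {h['Name'] for h in header_rows}

matches = sum(1 for row in data_rows for key in row if key in known_names)
unmatched = sorted({key for row in data_rows for key in row} - known_names)
print(matches)    # 4
print(unmatched)  # ['', 'mnweight']
```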

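# Rebuild each row with the long 'Label' as key instead of the short code.
new_rows = []
for data_dict in data_rows:
  new_row = {}
  for dkey, dval in data_dict.items():
    for header_dict in header_rows:
      # dkey is checked against all values (Name, Label, Question), not only Name
      if dkey in header_dict.values():
        new_row[header_dict.get('Label')] = dval
  new_rows.append(new_row)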

new_rows[0]

'Relationship to last sexual partner': 'Wife',
 'Sex with any other person in the last 12 month': 'No',
 'Condom used with prior sexual partner': 'NA',
 'Relationship to prior sexual partner': 'NA',
 'Sex with any other man in the last 12 months': 'NA',
 'Number of sex partners in last 12 months': 'NA',
 'Number of sex partners in lifetime': '5',
 'Ever heard of AIDS': 'Yes',
 'Can avoid AIDS virus by having one uninfected partner': 'Yes',
 'Can get AIDS virus through supernatural means': 'No',
 'Can avoid AIDS virus by using a condom correctly every time': 'Yes',
 'Can get AIDS virus from mosquito bites': 'No',
 'Can get AIDS virus by sharing food with a person who has AIDS': 'No',
 'Healthy-looking person may have AIDS virus': 'Yes',
 'AIDS virus from mother to child during pregnancy': 'DK',
 'AIDS virus from mother to child during delivery': 'Yes',
 'AIDS virus from mother to child through breastfeeding': 'DK',
 'Should female teacher with AIDS virus be allowed to teach in school': 'Yes',
 'Would buy fresh vegetables from shopkeeper with AIDS virus': 'Yes',
 'If HH member became infected with AIDS virus, would want it to remain a secret': 'No',
 'Willing to care for person with AIDS in household': 'Yes',
 'Ever been tested for AIDS virus': 'Yes',
 'Most recent time of testing for AIDS virus': 'Less than 12 months ago',
 'Received results of AIDS virus test': 'Yes',
 'Know a place to get AIDS virus test': 'NA',
 'Ever tried cigarette smoking': 'No',
 'Age when cigarette was smoked for the first time': 'NA',
 'Currently smoking cigarettes': 'NA',
 'Number of cigarettes smoked in the last 24 hours': 'NA',
 'Number of days when cigarettes were smoked in past month': 'NA',
 'Ever tried any smoked tobacco products other than cigarettes': 'No',
 'Used any smoked tobacco products during the last month': 'NA',
 'Type of smoked tobacco product: Cigars': 'NA',
 'Type of smoked tobacco product: Water pipe': 'NA',
 'Type of smoked tobacco product: Cigarillos': 'NA',
 'Type of smoked tobacco product: Pipe': 'NA',
 'Type of smoked tobacco product: Other': 'NA',
 'Number of days when tobacco products where smoked in past month': 'NA',
 'Ever tried any form of smokeless tobacco products': 'No',
 'Used any smokeless tobacco products during the last month': 'NA',
 'Type of smokeless tobacco product used: Chewing tobacco': 'NA',
 'Type of smokeless tobacco product used: Snuff': 'NA',
 'Type of smokeless tobacco product used: Dip': 'NA',
 'Type of smokeless tobacco product used: Other': 'NA',
 'Number of days when smokeless tobacco products where used in past month': 'NA',
 'Ever drunk alcohol': 'Yes',
 'Age when alcohol was used for the first time': '16',
 'Number of days when at least one drink of alcohol was used in past month': '0',
 'Number of drinks usually consumed': 'NA',
 'Months ago net obtained': 'NA',
 'Net treated with an insecticide when obtained': 'NA',
 'Net soaked or dipped since obtained': 'NA',
 'Months ago net soaked or dipped': 'NA',
 'Persons slept under mosquito net last night': 'NA',
 'Person 1 who slept under net': 'NA',
 'Person 2 who slept under net': 'NA',
 'Person 3 who slept under net': 'NA',
 'Person 4 who slept under net': 'NA'}
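The label-keyed rows above can also be built from a lookup table that maps each short name to its label, so every key is resolved with one dictionary access. This sketch assumes the 'Name' field is the only one that should match a data key, which also avoids accidental matches such as the empty '' key equaling an empty 'Question' value. label_for and the sample rows are hypothetical:

```python
# Hypothetical miniature inputs shaped like the DictReader output above
data_rows = [{'': '1', 'HH1': '1', 'HH2': '17', 'MWB3': 'Yes'}]
header_rows = [
    {'Name': 'HH1', 'Label': 'Cluster number', 'Question': ''},
    {'Name': 'HH2', 'Label': 'Household number', 'Question': ''},
]

# Map each short column name to its human-readable label once.
label_for = {h['Name']: h['Label'] for h in header_rows}

# Keep only columns that have a label, keyed by that label.
new_rows = [
    {label_for[key]: value for key, value in row.items() if key in label_for}
    for row in data_rows
]
print(new_rows[0])  # {'Cluster number': '1', 'Household number': '17'}
```

Because several short names can share a label (HH1 and MWM1 are both 'Cluster number' in the header preview), a label-keyed dict keeps only the last value written for that label; keying by the short name avoids that loss.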


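from csv import reader

# csv.reader yields plain lists, so data_rows[0] is the data file's header line.
with open('mn.csv', 'rt', newline='') as data_file, \
     open('mn_headers.csv', 'rt', newline='') as header_file:
  data_rows = list(reader(data_file))
  header_rows = list(reader(header_file))
print(len(data_rows[0]))   # number of columns in the data file
print(len(header_rows))    # number of rows in the header file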

159
210

data_rows[0]
['',
 'HH1',
 'HH2',
 'LN',
 'MWM1',
 'MWM2',
 'MWM4',
 'MWM5',
 'MWM6D',
 'MWM6M',
 'MWM6Y',
 'MWM7',
 'MWM8',
 'MWM9',
 'MWM10H',
 'MWM10M',
 'MWM11H',
 'MWM11M',
 'MWB1M',
 'MWB1Y',
 'MWB2',
 'MWB3',
 'MWB4',
 'MWB5',
 'MWB7',
 'MMT2',
 'MMT3',
 'MMT4',
 'MMT6',
 'MMT7',
 'MMT8',
 'MMT9',
 'MMT10',
 'MMT11',
 'MMT12',
 'MMT13',
 'MMT14',
 'MCM1',
 'MCM3',
 'MCM4',
 'MCM5A',
 'MCM5B',
 'MCM6',
 'MCM7A',
 'MCM7B',
 'MCM8',
 'MCM9A',
 'MCM9B',
 'MCM10',
 'MCM11A',
 'MCM11B',
 'MCM12M',
 'MCM12Y',
 'MDV1A',
 'MDV1B',
 'MDV1C',
 'MDV1D',
 'MDV1E',
 'MDV1F',
 'MMA1',
 'MMA3',
 'MMA4',
 'MMA5',
 'MMA6',
 'MMA7',
 'MMA8M',
 'MMA8Y',
 'MMA9',
 'MSB1',
 'MSB2',
 'MSB3U',
 'MSB3N',
 'MSB4',
 'MSB5',
 'MSB8',
 'MSB9',
 'MSB10',
 'MSB13',
 'MSB14',
 'MSB15',
 'MHA1',
 'MHA2',
 'MHA3',
 'MHA4',
 'MHA5',
 'MHA6',
 'MHA7',
 'MHA8A',
 'MHA8B',
 'MHA8C',
 'MHA9',
 'MHA10',
 'MHA11',
 'MHA12',
 'MHA24',
 'MHA25',
 'MHA26',
 'MHA27',
 'MMC1',
 'MMC2',
 'MMC3',
 'MMC4',
 'MTA1',
 'MTA2',
 'MTA3',
 'MTA4',
 'MTA5',
 'MTA6',
 'MTA7',
 'MTA8A',
 'MTA8B',
 'MTA8C',
 'MTA8D',
 'MTA8E',
 'MTA8X',
 'MTA9',
 'MTA10',
 'MTA11',
 'MTA12A',
 'MTA12B',
 'MTA12C',
 'MTA12X',
 'MTA13',
 'MTA14',
 'MTA15',
 'MTA16',
 'MTA17',
 'TNLN',
 'TN4',
 'TN5',
 'TN6',
 'TN8',
 'TN9',
 'TN10',
 'TN11',
 'TN12_1',
 'TN12_2',
 'TN12_3',
 'TN12_4',
 'HH6',
 'HH7',
 'MWDOI',
 'MWDOB',
 'MWAGE',
 'MWDOM',
 'MWAGEM',
 'MWDOBLC',
 'MMSTATUS',
 'MCEB',
 'MCSURV',
 'MCDEAD',
 'mwelevel',
 'mnweight',
 'wscore',
 'windex5',
 'wscoreu',
 'windex5u',
 'wscorer',
 'windex5r']

header_rows[:2]
[['HH1', 'Cluster number', ''], ['HH2', 'Household number', '']]
# Collect header descriptions whose short name is not a data column, then drop them.
bad_rows = []
for h in header_rows:
  if h[0] not in data_rows[0]:
    bad_rows.append(h)
for h in bad_rows:
  header_rows.remove(h)
print(len(header_rows))

150
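Removing items from header_rows in a second pass works, but a list comprehension can do the filtering in one pass, and a set makes each membership test constant-time. A minimal sketch on hypothetical miniature inputs (the 'XX1' row stands in for a header with no matching data column):

```python
# Hypothetical miniature inputs shaped like the csv.reader output above
data_header = ['', 'HH1', 'HH2', 'LN']
header_rows = [
    ['HH1', 'Cluster number', ''],
    ['HH2', 'Household number', ''],
    ['XX1', 'Hypothetical unused column', ''],
]

# A set gives constant-time membership tests.
data_columns = set(data_header)

# Filter in one pass instead of collecting bad rows and calling list.remove().
header_rows = [h for h in header_rows if h[0] in data_columns]
print(len(header_rows))  # 2
```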
all_short_headers = [h[0] for h in header_rows]
# Report data columns that still have no header description.
for header in data_rows[0]:
  if header not in all_short_headers:
    print('mismatch!', header)

mismatch! 
mismatch! MDV1F
mismatch! MTA8E
mismatch! mwelevel
mismatch! mnweight
mismatch! wscoreu
mismatch! windex5u
mismatch! wscorer
mismatch! windex5r
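The mismatches above are data columns with no description in the header file; the blank first entry is the unnamed first column. The same list can be computed with a set lookup while keeping the data file's column order, which a raw set difference would lose. A sketch with hypothetical sample values:

```python
# Hypothetical sample values
data_header = ['', 'HH1', 'HH2', 'MDV1F', 'mnweight']
all_short_headers = ['HH1', 'HH2']

known = set(all_short_headers)
# A list comprehension keeps the data-file column order.
missing = [col for col in data_header if col not in known]
print(missing)  # ['', 'MDV1F', 'mnweight']
```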
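from csv import reader

with open('mn.csv', 'rt', newline='') as data_file, \
     open('mn_headers_updated.csv', 'rt', newline='') as header_file:
  data_rows = list(reader(data_file))
  # keep only header descriptions whose short name is a column in the data
  header_rows = [h for h in reader(header_file) if h[0] in data_rows[0]]
print(len(header_rows))

all_short_headers = [h[0] for h in header_rows]

# Record the positions of data columns that have no header description.
skip_index = []
for header in data_rows[0]:
  if header not in all_short_headers:
    index = data_rows[0].index(header)
    skip_index.append(index)

# Copy every data row (header line excluded) without the skipped columns.
new_data = []
for row in data_rows[1:]:
  new_row = []
  for i, d in enumerate(row):
    if i not in skip_index:
      new_row.append(d)
  new_data.append(new_row)

# Pair header descriptions with values. header_rows follows the order of the
# header file, which is not guaranteed to match the data column order.
# In Python 3, zip() returns a lazy iterator, which is why zipped_data[0]
# displays as a zip object rather than a list.
zipped_data = []
for drow in new_data:
  zipped_data.append(zip(header_rows, drow))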

152

zipped_data[0]
<zip at 0x7fa99615dc40>
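zipped_data[0] shows a zip object rather than header/value pairs because, in Python 3, zip() returns a lazy iterator that can be consumed only once. Wrapping it in list(), for example zipped_data.append(list(zip(header_rows, drow))), keeps the pairs around. A minimal illustration with hypothetical values:

```python
headers = [['HH1', 'Cluster number', ''], ['HH2', 'Household number', '']]
row = ['1', '17']

pairs = zip(headers, row)
print(pairs)          # a zip object, as in the output above
pairs = list(pairs)   # materialize it so it can be indexed and reused
print(pairs[0])       # (['HH1', 'Cluster number', ''], '1')

# A zip iterator is exhausted after one pass:
it = zip(headers, row)
first = list(it)
second = list(it)
print(len(first), len(second))  # 2 0
```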

data_headers = []
for i, header in enumerate(data_rows[0]):
  if i not in skip_index:
    data_headers.append(header)
header_match = zip(data_headers, all_short_headers)
print(header_match)

<zip object at 0x7fa9969f37c0>
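Printing header_match likewise shows only the iterator. To check whether the filtered data headers line up with all_short_headers position by position, the pairs have to be consumed and compared. A sketch with hypothetical, deliberately misordered inputs; zip() stops at the shorter input, so the lengths are compared separately:

```python
# Hypothetical inputs: same names, different order
data_headers = ['HH1', 'HH2', 'LN']
all_short_headers = ['HH1', 'LN', 'HH2']

# (position, data header, short header) for every position that disagrees
mismatched = [
    (i, d, s)
    for i, (d, s) in enumerate(zip(data_headers, all_short_headers))
    if d != s
]
print(mismatched)  # [(1, 'HH2', 'LN'), (2, 'LN', 'HH2')]
print(len(data_headers) == len(all_short_headers))  # True
```

Any mismatch here means a header description would be paired with the wrong column, which is the problem the next version addresses by building final_header_rows in data-column order.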

from csv import reader

with open('mn.csv', 'rt', encoding='utf-8', newline='') as data_file, \
     open('mn_headers_updated.csv', 'rt', encoding='utf-8', newline='') as header_file:
  data_rows = list(reader(data_file))
  header_rows = [h for h in reader(header_file) if h[0] in data_rows[0]]
all_short_headers = [h[0] for h in header_rows]

skip_index = []
final_header_rows = []   # header descriptions in the same order as the data columns
for header in data_rows[0]:
  if header not in all_short_headers:
    index = data_rows[0].index(header)
    skip_index.append(index)
  else:
    for head in header_rows:
      if head[0] == header:
        final_header_rows.append(head)
        break

new_data = []
for row in data_rows[1:]:
  new_row = []
  for i, d in enumerate(row):
    if i not in skip_index:
      new_row.append(d)
  new_data.append(new_row)

# Each element is still a lazy zip iterator; wrap it in list() to inspect it.
zipped_data = []
for drow in new_data:
  zipped_data.append(zip(final_header_rows, drow))

zipped_data[0]

<zip at 0x7fa9b5fbdc00>
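To work with the final result, each zipped row can be materialized with list() and, if convenient, turned into a dictionary keyed by the short column name. A sketch on hypothetical miniature inputs shaped like final_header_rows and new_data:

```python
# Hypothetical miniature inputs
final_header_rows = [
    ['HH1', 'Cluster number', ''],
    ['HH2', 'Household number', ''],
]
new_data = [['1', '17'], ['1', '20']]

# list() materializes each lazy zip so the pairs can be inspected.
zipped_data = [list(zip(final_header_rows, drow)) for drow in new_data]

# Optional: one {short name: value} dict per row.
records = [{head[0]: value for head, value in row} for row in zipped_data]
print(zipped_data[0][0])  # (['HH1', 'Cluster number', ''], '1')
print(records[1])         # {'HH1': '1', 'HH2': '20'}
```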

Data Wrangling -1 MCQs

1. JSON stands for _______

A. JavaScript Object Notation
B. Java Object Notation
C. JavaScript Object Normalization
D. JavaScript Object-Oriented Notation

ANS: A

2. JSON is a _____ for storing and transporting data.

A. XML format
B. text format
C. JavaScript
D. PHP format

ANS: B

3. The JSON syntax is a subset of the _____ syntax.

A. Ajax
B. PHP
C. HTML
D. JavaScript

ANS: D
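Python's standard json module illustrates JSON's role as a text format: json.dumps() turns Python objects into a JSON string and json.loads() parses it back. The record below is a hypothetical example:

```python
import json

# Hypothetical record
record = {'HH1': '1', 'HH2': '17', 'answers': ['Yes', 'No'], 'weight': 0.4}

text = json.dumps(record)     # serialize to a JSON string
print(type(text).__name__)    # str
restored = json.loads(text)   # parse back into Python objects
print(restored == record)     # True
```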

Excel File Reader

ARTIFICIAL INTELLIGENCE - MCQs - 2

1. Which search is equivalent to minimax search but eliminates the branches that can’t influence the final decision?

A. Depth-first search
B. Breadth-first search
C. Alpha-beta pruning
D. None of the mentioned

Ans: C
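A compact sketch of alpha-beta pruning, assuming a game tree given as nested Python lists where numbers are leaf utilities for MAX. alpha is the best value MAX can guarantee so far and beta the best MIN can guarantee; once alpha >= beta the remaining children cannot affect the decision and are skipped. The tree is a small illustrative example, not taken from this page:

```python
import math

def alphabeta(node, alpha, beta, maximizing):
    """Minimax value of `node` with alpha-beta pruning.

    A node is either a number (leaf utility) or a list of child nodes.
    """
    if not isinstance(node, list):
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # MIN above already has a better option: prune
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True))
        beta = min(beta, value)
        if alpha >= beta:
            break  # MAX above already has a better option: prune
    return value

# MAX to move; each inner list is a MIN node
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree, -math.inf, math.inf, True))  # 3
```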

2. Which values are independent in the minimax search algorithm?

A. Pruned leaves x and y
B. Every state is dependent
C. Root is independent
D. None of the mentioned

Ans: A

3. To what depth can alpha-beta pruning be applied?

A. 10 states
B. 8 states
C. 6 states
D. Any depth

Ans: D

4. Which search is similar to minimax search?

A. Hill-climbing search
B. Depth-first search
C. Breadth-first search
D. All of the mentioned

Ans: B

5. Which values are assigned to alpha and beta in alpha-beta pruning?

A. Alpha = max
B. Beta = min
C. Beta = max
D. Both Alpha = max & Beta = min

Ans: D

6. Where do the values of alpha-beta search get updated?

A. Along the path of search
B. Initial state itself
C. At the end
D. None of the mentioned

Ans: A

7. How is the effectiveness of alpha-beta pruning increased?

A. Depends on the nodes
B. Depends on the order in which they are executed
C. All of the mentioned
D. None of the mentioned

Ans: B

8. What is a transposition table?

A. Hash table of next seen positions
B. Hash table of previously seen positions
C. Next value in the search
D. None of the mentioned

Ans: B

9. Which is identical to the closed list in graph search?

A. Hill climbing search algorithm
B. Depth-first search
C. Transposition table
D. None of the mentioned

Ans: C

10. Which function is used to calculate the feasibility of the whole game tree?

A. Evaluation function
B. Transposition
C. Alpha-beta pruning
D. All of the mentioned

Ans: A

11. General games involve ____________

A. Single-agent
B. Multi-agent
C. Neither Single-agent nor Multi-agent
D. Both Single-agent and Multi-agent

Ans: D

12. Adversarial search problems use ____________

A. Competitive Environment
B. Cooperative Environment
C. Neither Competitive nor Cooperative Environment
D. Both Competitive and Cooperative Environment

Ans: A

13. Zero-sum games are those in which there are two agents whose actions must alternate and in which the utility values at the end of the game are always the same.

A. True
B. False

Ans: B

14. A zero-sum game has to be a ______ game.

A. Single player
B. Two player
C. Multiplayer
D. Three player

Ans: C

15. A game can be formally defined as a kind of search problem with which of the following components?

A. Initial State
B. Successor Function
C. Terminal Test
D. All of the mentioned

Ans: D

16. The initial state and the legal moves for each side define the __________ for the game.

A. Search Tree
B. Game Tree
C. State Space Search
D. Forest

Ans: B

17. The general algorithm applied to a game tree for making win/lose decisions is ____________

A. DFS/BFS Search Algorithms
B. Heuristic Search Algorithms
C. Greedy Search Algorithms
D. MIN/MAX Algorithms

Ans: D

18. The minimax algorithm computes the minimax decision from the current state. It uses a simple recursive computation of the minimax values of each successor state, directly implementing the defining equations. The recursion proceeds all the way down to the leaves of the tree, and then the minimax values are backed up through the tree as the recursion unwinds.

A. True
B. False

Ans: A
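The recursion described in question 18 can be sketched in a few lines, assuming a tree of nested lists whose numeric leaves are utilities for MAX. The tree is a small illustrative example:

```python
def minimax(node, maximizing):
    """Return the minimax value of `node`.

    Leaves are numbers (utilities for MAX); internal nodes are lists of children.
    """
    if not isinstance(node, list):  # terminal test: a leaf
        return node
    # Recurse down to the leaves, then back the values up.
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

def minimax_decision(children):
    """Index of the move MAX should make from a state with these successors."""
    values = [minimax(child, False) for child in children]
    return values.index(max(values))

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree, True))     # 3
print(minimax_decision(tree))  # 0
```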

19. Which is the most straightforward approach for a planning algorithm?

A. Best-first search
B. State-space search
C. Depth-first search
D. Hill-climbing search

Ans: B

20. What is taken into account in state-space search?

A. Postconditions
B. Preconditions
C. Effects
D. Both Preconditions & Effects

Ans: D

21. Which approach pretends that a pure divide-and-conquer algorithm will work?

A. Goal independence
B. Subgoal independence
C. Both Goal & Subgoal independence
D. None of the mentioned

Ans: B

22. Which is the best approach for a game-playing problem?

A. Linear approach
B. Heuristic approach (Some knowledge is stored)
C. Random approach
D. An Optimal approach

Ans: B

23. A production rule consists of ____________

A. A set of rules
B. A sequence of steps
C. A set of rules & a sequence of steps
D. Arbitrary representation to problem

Ans: C

24. Which search method takes less memory?

A. Depth-First Search
B. Breadth-First search
C. Linear Search
D. Optimal search

Ans: A

25. What are the major components for measuring the performance of problem solving?

A. Completeness
B. Optimality
C. Time and Space complexity
D. All of the mentioned

Ans: D

26. The set of actions for a problem in a state space is formulated by a ___________

A. Intermediate states
B. Initial state
C. Successor function, which takes current action and returns next immediate state
D. None of the mentioned

Ans: C

27. What is state space?

A. The whole problem
B. Your Definition to a problem
C. Problem you design
D. Representing your problem with variables and parameters

Ans: D

28. What is the objective of the Tower of Hanoi puzzle?

A. To move all disks to some other rod by following rules
B. To divide the disks equally among the three rods by following rules
C. To move all disks to some other rod in random order
D. To divide the disks equally among three rods in random order

Ans: A

29. Which of the following is NOT a rule of the Tower of Hanoi puzzle?

A. No disk should be placed over a smaller disk
B. Disk can only be moved if it is the uppermost disk of the stack
C. No disk should be placed over a larger disk
D. Only one disk can be moved at a time

Ans: C

30. A recursive solution to the Tower of Hanoi problem is an example of which of the following kinds of algorithm?

A. Dynamic programming
B. Backtracking
C. Greedy algorithm
D. Divide and conquer

Ans: D
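A sketch of the recursive divide-and-conquer solution: to move n disks, move the top n - 1 disks to the spare rod, move the largest disk, then move the n - 1 disks back on top of it. The rod names 'A', 'B', 'C' are arbitrary:

```python
def hanoi(n, source, target, spare, moves):
    """Append the moves that carry n disks from source to target."""
    if n == 0:
        return
    hanoi(n - 1, source, spare, target, moves)  # clear the smaller disks out of the way
    moves.append((source, target))              # move the largest disk
    hanoi(n - 1, spare, target, source, moves)  # put the smaller disks back on top

moves = []
hanoi(3, 'A', 'C', 'B', moves)
print(len(moves))  # 7
print(moves[:3])   # [('A', 'C'), ('A', 'B'), ('C', 'B')]
```

The recursion produces 2^n - 1 moves, which is the minimum for n disks.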

Artificial Intelligence - MCQs - 1

1. What is Artificial Intelligence?

A. Artificial Intelligence is a field that aims to make humans more intelligent
B. Artificial Intelligence is a field that aims to improve the security
C. Artificial Intelligence is a field that aims to develop intelligent machines
D. Artificial Intelligence is a field that aims to mine the data

Ans: C

2. Who is widely regarded as the father of Artificial Intelligence?

A. Geoffrey Hinton
B. Andrew Ng
C. John McCarthy
D. Jürgen Schmidhuber

Ans: C

3. Which of the following is a branch of Artificial Intelligence?

A. Machine Learning
B. Cyber forensics
C. Full-Stack Developer
D. Network Design

Ans: A

4. What is the goal of Artificial Intelligence?

A. To solve artificial problems
B. To extract scientific causes
C. To explain various sorts of intelligence
D. To solve real-world problems

Ans: C

5. Which of the following is an application of Artificial Intelligence?

A. It helps to exploit vulnerabilities to secure the firm
B. Language understanding and problem-solving (Text analytics and NLP)
C. Easy to create a website
D. It helps to deploy applications on the cloud

Ans: B

6. Which of the following is not a commonly used programming language for Artificial Intelligence?

A. Perl
B. Java
C. PROLOG
D. LISP

Ans: A

7. Which of the following is not an application of artificial intelligence?

A. Face recognition system
B. Chatbots
C. LIDAR
D. DBMS

Ans: D

8. Which of the following is an advantage of artificial intelligence?

A. Reduces the time taken to solve the problem
B. Helps in providing security
C. Has the ability to think, which makes the work easier
D. All of the above

Ans: D

9. Which search method takes less memory?

A. Depth-First Search
B. Breadth-First search
C. Optimal search
D. Linear Search

Ans: A

10. A heuristic is a way of trying __________

A. To discover something or an idea embedded in a program
B. To search and measure how far a node in a search tree seems to be from a goal
C. To compare two nodes in a search tree to see if one is better than the other
D. All of the mentioned

Ans: D

11. A.M. Turing developed a technique for determining whether a computer could or could not demonstrate artificial intelligence. Today, this technique is called __________

A. Turing Test
B. Algorithm
C. Boolean Algebra
D. Logarithm

Ans: A

12. DARPA, the agency that has funded a great deal of American Artificial Intelligence research, is part of the Department of __________

A. Defense
B. Energy
C. Education
D. Justice

Ans: A

13. Which of these schools was not among the early leaders in Artificial Intelligence research?

A. Dartmouth College
B. Harvard University
C. Massachusetts Institute of Technology
D. Stanford University

Ans: B

14. A professor who later taught at Stanford University coined the term ‘artificial intelligence’ for a 1956 conference held at Dartmouth College. Can you name the professor?

A. David Levy
B. John McCarthy
C. Joseph Weizenbaum
D. Hans Berliner

Ans: B

15. The conference that launched the AI revolution in 1956 was held at ______.

A. Dartmouth
B. Harvard
C. New York
D. Stanford

Ans: A

16. What is a heuristic function?

A. A function to solve mathematical problems
B. A function which takes parameters of type string and returns an integer value
C. A function whose return type is nothing
D. A function that maps from problem state descriptions to measures of desirability

Ans: D

17. A search algorithm takes _________ as an input and returns ________ as an output.

A. Input, output
B. Problem, solution
C. Solution, problem
D. Parameters, sequence of actions

Ans: B

18. The _______ is a touring problem in which each city must be visited exactly once. The aim is to find the shortest tour.

A. Finding shortest path between a source and a destination
B. Travelling Salesman problem
C. Map coloring problem
D. Depth first search traversal on a given map represented as a graph

Ans: B

19. Which search method takes less memory?

A. Depth-First Search
B. Breadth-First search
C. Linear Search
D. Optimal search

Ans: A

20. Which search strategy is also called blind search?

A. Uninformed search
B. Informed search
C. Simple reflex search
D. All of the mentioned

Ans: A

21. Which search is implemented with an empty first-in-first-out queue?

A. Depth-first search
B. Breadth-first search
C. Bidirectional search
D. None of the mentioned

Ans: B
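Breadth-first search expands the oldest frontier node first, which is exactly what a FIFO queue provides. A minimal sketch using collections.deque on a small hypothetical graph given as an adjacency dict:

```python
from collections import deque

def bfs(graph, start, goal):
    """Return a path with the fewest edges from start to goal, or None."""
    frontier = deque([start])  # FIFO queue
    parent = {start: None}     # also serves as the set of reached nodes
    while frontier:
        node = frontier.popleft()  # oldest node first
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                frontier.append(nxt)
    return None

graph = {'S': ['A', 'B'], 'A': ['C'], 'B': ['C', 'G'], 'C': ['G']}
print(bfs(graph, 'S', 'G'))  # ['S', 'B', 'G']
```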

22. Which search uses a stack to keep track of the states it searches?

A. Depth-limited search
B. Depth-first search
C. Breadth-first search
D. None of the mentioned

Ans: B
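Depth-first search instead takes the newest frontier entry first, so a plain Python list used as a LIFO stack is enough. A minimal iterative sketch on a small hypothetical graph:

```python
def dfs(graph, start, goal):
    """Iterative depth-first search with an explicit LIFO stack."""
    stack = [(start, [start])]  # each entry: (node, path to node)
    visited = set()
    while stack:
        node, path = stack.pop()  # newest entry first (LIFO)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        # Push in reverse so the first-listed neighbor is explored first.
        for nxt in reversed(graph.get(node, [])):
            if nxt not in visited:
                stack.append((nxt, path + [nxt]))
    return None

graph = {'S': ['A', 'B'], 'A': ['C'], 'B': ['C', 'G'], 'C': ['G']}
print(dfs(graph, 'S', 'G'))  # ['S', 'A', 'C', 'G']
```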

23. What is another name for the informed search strategy?

A. Simple search
B. Heuristic search
C. Online search
D. None of the mentioned

Ans: B

24. Which function will select the lowest expansion node at first for evaluation?

A. Greedy best-first search
B. Best-first search
C. Depth-first search
D. None of the mentioned

Ans: B

25. _________________ are mathematical problems defined as a set of objects whose state must satisfy a number of constraints or limitations.

A. Constraints Satisfaction Problems
B. Uninformed Search Problems
C. Local Search Problems
D. All of the mentioned

Ans: A

26. A heuristic is a way of trying ___________

A. To discover something or an idea embedded in a program
B. To search and measure how far a node in a search tree seems to be from a goal
C. To compare two nodes in a search tree to see if one is better than another
D. All of the mentioned

Ans: D

27. A* algorithm is based on ___________

A. Breadth-First-Search
B. Depth-First-Search
C. Best-First-Search
D. Hill climbing

Ans: C

28. Uninformed search strategies are better than informed search strategies.

A. True
B. False

Ans: B

29. Best-First search can be implemented using which of the following data structures?

A. Queue
B. Stack
C. Priority Queue
D. Circular Queue

Ans: C

30. What is the evaluation function in the A* approach?

A. Heuristic function
B. Path cost from start node to current node
C. Path cost from start node to current node + Heuristic cost
D. Average of Path cost from start node to current node and Heuristic cost

Ans: C
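A sketch of A* using heapq as the priority queue (question 29), ordered by f(n) = g(n) + h(n), where g(n) is the path cost from the start node and h(n) the heuristic estimate to the goal (question 30). The graph and heuristic values are hypothetical, with h chosen so it never overestimates the remaining cost:

```python
import heapq

def a_star(graph, h, start, goal):
    """A* search: expand the frontier node with the lowest f(n) = g(n) + h(n).

    graph maps node -> list of (neighbor, step_cost); h maps node -> estimate.
    Returns (path, cost) or None if the goal is unreachable.
    """
    frontier = [(h[start], 0, start, [start])]  # (f, g, node, path)
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)  # lowest f first
        if node == goal:
            return path, g
        if g > best_g.get(node, float('inf')):
            continue  # stale entry: a cheaper path to node was found later
        for nxt, cost in graph.get(node, []):
            new_g = g + cost
            if new_g < best_g.get(nxt, float('inf')):
                best_g[nxt] = new_g
                heapq.heappush(frontier, (new_g + h[nxt], new_g, nxt, path + [nxt]))
    return None

graph = {'S': [('A', 1), ('B', 4)], 'A': [('B', 2), ('G', 6)], 'B': [('G', 1)]}
h = {'S': 4, 'A': 3, 'B': 1, 'G': 0}  # hypothetical admissible estimates
print(a_star(graph, h, 'S', 'G'))  # (['S', 'A', 'B', 'G'], 4)
```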
