Machine Learning - Deep Learning

Data Wrangling - Data Cleanup - MCQs

Data Wrangling - Data Cleanup - MCQs

1. Cleaning your data makes for easier

storage
search
reuse
All of the above

Ans: 4

2. It’s much easier to store your data in proper models if it’s cleaned first.

True
False

Ans: 1

3. The IPython magic commands

%logstart
%save
BOTH
NONE

Ans: 3

4. MICS means

Multiple Indicator Cluster System
Multiple Indicator Customer Surveys
Multiple Indicator Cluster Surveys
Multiple Indicator Customer System

Ans: 3

5. MICS raw data is in which format?

SPSS format
.sav files
Both
None

Ans: 3

6. How can you identify values for data cleaning?

Replacing headers
Zipping questions and answers
Both
None

Ans: 3

Data Wrangling - Data Cleanup

from csv import DictReader

data_rdr = DictReader(open('mn.csv', 'rt'))
header_rdr = DictReader(open('mn_headers.csv', 'rt'))
data_rows = [d for d in data_rdr]
header_rows = [h for h in header_rdr]
print (data_rows[:5])
print (header_rows[:5])

[{'': '1', 'HH1': '1', 'HH2': '17', 'LN': '1', 'MWM1': '1', 'MWM2': '17', 'MWM4': '1', 'MWM5': '14', 'MWM6D': '7', 'MWM6M': '4', 'MWM6Y': '2014', 'MWM7': 'Completed', 'MWM8': '2', 'MWM9': '20', 'MWM10H': '17', 'MWM10M': '59', 'MWM11H': '18', 'MWM11M': '7', 'MWB1M': '5', 'MWB1Y': '1984', 'MWB2': '29', 'MWB3': 'Yes', 'MWB4': 'Higher', 'MWB5': '31', 'MWB7': 'NA', 'MMT2': 'Almost every day', 'MMT3': 'At least once a week', 'MMT4': 'Less than once a week', 'MMT6': 'Yes', 'MMT7': 'Yes', 'MMT8': 'Almost every day', 'MMT9': 'Yes', 'MMT10': 'Yes', 'MMT11': 'Almost every day', 'MMT12': 'Yes', 'MMT13': 'Yes', 'MMT14': 'Almost every day', 'MCM1': 'No', 'MCM3': 'NA', 'MCM4': 'NA', 'MCM5A': 'NA', 'MCM5B': 'NA', 'MCM6': 'NA', 'MCM7A': 'NA', 'MCM7B': 'NA', 'MCM8': 'No', 'MCM9A': 'NA', 'MCM9B': 'NA', 'MCM10': '0', 'MCM11A': 'NA', 'MCM11B': 'NA', 'MCM12M': 'NA', 'MCM12Y': 'NA', 'MDV1A': 'No', 'MDV1B': 'No', 'MDV1C': 'No', 'MDV1D': 'No', 'MDV1E': 'No', 'MDV1F': 'No', 'MMA1': 'Yes, currently married', 'MMA3': 'No (Only one)', 'MMA4': 'NA', 'MMA5': 'NA', 'MMA6': 'NA', 'MMA7': 'Only once', 'MMA8M': '9', 'MMA8Y': '2013', 'MMA9': 'NA', 'MSB1': '20', 'MSB2': 'Yes', 'MSB3U': 'Days ago', 'MSB3N': '0', 'MSB4': 'No', 'MSB5': 'Wife', 'MSB8': 'No', 'MSB9': 'NA', 'MSB10': 'NA', 'MSB13': 'NA', 'MSB14': 'NA', 'MSB15': '5', 'MHA1': 'Yes', 'MHA2': 'Yes', 'MHA3': 'No', 'MHA4': 'Yes', 'MHA5': 'No', 'MHA6': 'No', 'MHA7': 'Yes', 'MHA8A': 'DK', 'MHA8B': 'Yes', 'MHA8C': 'DK', 'MHA9': 'Yes', 'MHA10': 'Yes', 'MHA11': 'No', 'MHA12': 'Yes', 'MHA24': 'Yes', 'MHA25': 'Less than 12 months ago', 'MHA26': 'Yes', 'MHA27': 'NA', 'MMC1': 'No', 'MMC2': 'NA', 'MMC3': 'NA', 'MMC4': 'NA', 'MTA1': 'No', 'MTA2': 'NA', 'MTA3': 'NA', 'MTA4': 'NA', 'MTA5': 'NA', 'MTA6': 'No', 'MTA7': 'NA', 'MTA8A': 'NA', 'MTA8B': 'NA', 'MTA8C': 'NA', 'MTA8D': 'NA', 'MTA8E': 'NA', 'MTA8X': 'NA', 'MTA9': 'NA', 'MTA10': 'No', 'MTA11': 'NA', 'MTA12A': 'NA', 'MTA12B': 'NA', 'MTA12C': 'NA', 'MTA12X': 'NA', 'MTA13': 'NA', 'MTA14': 'Yes', 'MTA15': '16', 'MTA16': '0', 'MTA17': 'NA', 'TNLN': 'NA', 'TN4': 'NA', 'TN5': 'NA', 'TN6': 'NA', 'TN8': 'NA', 'TN9': 'NA', 'TN10': 'NA', 'TN11': 'NA', 'TN12_1': 'NA', 'TN12_2': 'NA', 'TN12_3': 'NA', 'TN12_4': 'NA', 'HH6': 'Urban', 'HH7': 'Bulawayo', 'MWDOI': '1372', 'MWDOB': '1013', 'MWAGE': '25-29', 'MWDOM': '1365', 'MWAGEM': '29', 'MWDOBLC': 'NA', 'MMSTATUS': 'Currently married/in union', 'MCEB': '0', 'MCSURV': '0', 'MCDEAD': '0', 'mwelevel': 'Higher', 'mnweight': '0.403797141860459', 'wscore': '1.60367010204171', 'windex5': '5', 'wscoreu': '1.27255184167736', 'windex5u': '5', 'wscorer': 'NA', 'windex5r': 'NA'}, {'': '2', 'HH1': '1', 'HH2': '20', 'LN': '1', 'MWM1': '1', 'MWM2': '20', 'MWM4': '1', 'MWM5': '14', 'MWM6D': '7', 'MWM6M': '4', 'MWM6Y': '2014', 'MWM7': 'Completed', 'MWM8': '2', 'MWM9': '20', 'MWM10H': '17', 'MWM10M': '32', 'MWM11H': '17', 'MWM11M': '42', 'MWB1M': '5', 'MWB1Y': '1976', 'MWB2': '37', 'MWB3': 'Yes', 'MWB4': 'Higher', 'MWB5': '31', 'MWB7': 'NA', 'MMT2': 'At least once a week', 'MMT3': 'Not at all', 'MMT4': 'Almost every day', 'MMT6': 'Yes', 'MMT7': 'Yes', 'MMT8': 'Almost every day', 'MMT9': 'Yes', 'MMT10': 'Yes', 'MMT11': 'Almost every day', 'MMT12': 'Yes', 'MMT13': 'Yes', 'MMT14': 'Almost every day', 'MCM1': 'No', 'MCM3': 'NA', 'MCM4': 'NA', 'MCM5A': 'NA', 'MCM5B': 'NA', 'MCM6': 'NA', 'MCM7A': 'NA', 'MCM7B': 'NA', 'MCM8': 'No', 'MCM9A': 'NA', 'MCM9B': 'NA', 'MCM10': '0', 'MCM11A': 'NA', 'MCM11B': 'NA', 'MCM12M': 'NA', 'MCM12Y': 'NA', 'MDV1A': 'No', 'MDV1B': 'No', 'MDV1C': 'No', 'MDV1D': 'No', 'MDV1E': 'No', 'MDV1F': 'No', 'MMA1': 'Yes, currently married', 'MMA3': 'No (Only one)', 'MMA4': 'NA', 'MMA5': 'NA', 'MMA6': 'NA', 'MMA7': 'Only once', 'MMA8M': '2', 'MMA8Y': '2014', 'MMA9': 'NA', 'MSB1': '37', 'MSB2': 'No', 'MSB3U': 'Days ago', 'MSB3N': '0', 'MSB4': 'No', 'MSB5': 'Wife', 'MSB8': 'No', 'MSB9': 'NA', 'MSB10': 'NA', 'MSB13': 'NA', 'MSB14': 'NA', 'MSB15': '1', 'MHA1': 'Yes', 'MHA2': 'Yes', 'MHA3': 'No', 'MHA4': 'Yes', 'MHA5': 'No', 'MHA6': 'No', 'MHA7': 'Yes', 'MHA8A': 'No', 'MHA8B': 'Yes', 'MHA8C': 'Yes', 'MHA9': 'Yes', 'MHA10': 'Yes', 'MHA11': 'Yes', 'MHA12': 'No', 'MHA24': 'Yes', 'MHA25': 'Less than 12 months ago', 'MHA26': 'Yes', 'MHA27': 'NA', 'MMC1': 'No', 'MMC2': 'NA', 'MMC3': 'NA', 'MMC4': 'NA', 'MTA1': 'No', 'MTA2': 'NA', 'MTA3': 'NA', 'MTA4': 'NA', 'MTA5': 'NA', 'MTA6': 'No', 'MTA7': 'NA', 'MTA8A': 'NA', 'MTA8B': 'NA', 'MTA8C': 'NA', 'MTA8D': 'NA', 'MTA8E': 'NA', 'MTA8X': 'NA', 'MTA9': 'NA', 'MTA10': 'No', 'MTA11': 'NA', 'MTA12A': 'NA', 'MTA12B': 'NA', 'MTA12C': 'NA', 'MTA12X': 'NA', 'MTA13': 'NA', 'MTA14': 'No', 'MTA15': 'NA', 'MTA16': 'NA', 'MTA17': 'NA', 'TNLN': 'NA', 'TN4': 'NA', 'TN5': 'NA', 'TN6': 'NA', 'TN8': 'NA', 'TN9': 'NA', 'TN10': 'NA', 'TN11': 'NA', 'TN12_1': 'NA', 'TN12_2': 'NA', 'TN12_3': 'NA', 'TN12_4': 'NA', 'HH6': 'Urban', 'HH7': 'Bulawayo', 'MWDOI': '1372', 'MWDOB': '917', 'MWAGE': '35-39', 'MWDOM': '1370', 'MWAGEM': '37', 'MWDOBLC': 'NA', 'MMSTATUS': 'Currently married/in union', 'MCEB': '0', 'MCSURV': '0', 'MCDEAD': '0', 'mwelevel': 'Higher', 'mnweight': '0.403797141860459', 'wscore': '1.54327702631422', 'windex5': '5', 'wscoreu': '1.08902631982422', 'windex5u': '5', 'wscorer': 'NA', 'windex5r': 'NA'}, {'': '3', 'HH1': '2', 'HH2': '1', 'LN': '1', 'MWM1': '2', 'MWM2': '1', 'MWM4': '1', 'MWM5': '9', 'MWM6D': '8', 'MWM6M': '4', 'MWM6Y': '2014', 'MWM7': 'Completed', 'MWM8': '1', 'MWM9': '40', 'MWM10H': '10', 'MWM10M': '37', 'MWM11H': '10', 'MWM11M': '52', 'MWB1M': '2', 'MWB1Y': '1973', 'MWB2': '41', 'MWB3': 'Yes', 'MWB4': 'Primary', 'MWB5': '17', 'MWB7': 'Able to read whole sentence', 'MMT2': 'Not at all', 'MMT3': 'Almost every day', 'MMT4': 'Less than once a week', 'MMT6': 'No', 'MMT7': 'NA', 'MMT8': 'NA', 'MMT9': 'No', 'MMT10': 'NA', 'MMT11': 'NA', 'MMT12': 'Yes', 'MMT13': 'Yes', 'MMT14': 'Almost every day', 'MCM1': 'Yes', 'MCM3': '21', 'MCM4': 'Yes', 'MCM5A': '1', 'MCM5B': '1', 'MCM6': 'Yes', 'MCM7A': '1', 'MCM7B': '0', 'MCM8': 'No', 'MCM9A': 'NA', 'MCM9B': 'NA', 'MCM10': '3', 'MCM11A': 'No', 'MCM11B': '2', 'MCM12M': '5', 'MCM12Y': '2012', 'MDV1A': 'No', 'MDV1B': 'No', 'MDV1C': 'No', 'MDV1D': 'No', 'MDV1E': 'No', 'MDV1F': 'No', 'MMA1': 'Yes, currently married', 'MMA3': 'No (Only one)', 'MMA4': 'NA', 'MMA5': 'NA', 'MMA6': 'NA', 'MMA7': 'More than once', 'MMA8M': '8', 'MMA8Y': '1991', 'MMA9': 'NA', 'MSB1': '20', 'MSB2': 'No', 'MSB3U': 'Days ago', 'MSB3N': '0', 'MSB4': 'Yes', 'MSB5': 'Wife', 'MSB8': 'No', 'MSB9': 'NA', 'MSB10': 'NA', 'MSB13': 'NA', 'MSB14': 'NA', 'MSB15': '5', 'MHA1': 'Yes', 'MHA2': 'Yes', 'MHA3': 'No', 'MHA4': 'Yes', 'MHA5': 'No', 'MHA6': 'No', 'MHA7': 'Yes', 'MHA8A': 'No', 'MHA8B': 'Yes', 'MHA8C': 'No', 'MHA9': 'Yes', 'MHA10': 'Yes', 'MHA11': 'No', 'MHA12': 'Yes', 'MHA24': 'Yes', 'MHA25': '2 or more years ago', 'MHA26': 'Yes', 'MHA27': 'NA', 'MMC1': 'No', 'MMC2': 'NA', 'MMC3': 'NA', 'MMC4': 'NA', 'MTA1': 'Yes', 'MTA2': '18', 'MTA3': 'Yes', 'MTA4': '10', 'MTA5': '30', 'MTA6': 'Yes', 'MTA7': 'Yes', 'MTA8A': 'NA', 'MTA8B': 'NA', 'MTA8C': 'NA', 'MTA8D': 'NA', 'MTA8E': 'Rolled tobacco', 'MTA8X': 'NA', 'MTA9': '7', 'MTA10': 'No', 'MTA11': 'NA', 'MTA12A': 'NA', 'MTA12B': 'NA', 'MTA12C': 'NA', 'MTA12X': 'NA', 'MTA13': 'NA', 'MTA14': 'Yes', 'MTA15': '19', 'MTA16': '8', 'MTA17': '2', 'TNLN': 'NA', 'TN4': 'NA', 'TN5': 'NA', 'TN6': 'NA', 'TN8': 'NA', 'TN9': 'NA', 'TN10': 'NA', 'TN11': 'NA', 'TN12_1': 'NA', 'TN12_2': 'NA', 'TN12_3': 'NA', 'TN12_4': 'NA', 'HH6': 'Urban', 'HH7': 'Bulawayo', 'MWDOI': '1372', 'MWDOB': '878', 'MWAGE': '40-44', 'MWDOM': '1100', 'MWAGEM': '18', 'MWDOBLC': 'NA', 'MMSTATUS': 'Currently married/in union', 'MCEB': '3', 'MCSURV': '3', 'MCDEAD': '0', 'mwelevel': 'Primary', 'mnweight': '1.03192602919895', 'wscore': '0.878635263695964', 'windex5': '4', 'wscoreu': '-0.930720561098312', 'windex5u': '1', 'wscorer': 'NA', 'windex5r': 'NA'}, {'': '4', 'HH1': '2', 'HH2': '1', 'LN': '5', 'MWM1': '2', 'MWM2': '1', 'MWM4': '5', 'MWM5': '9', 'MWM6D': '12', 'MWM6M': '4', 'MWM6Y': '2014', 'MWM7': 'Not at home', 'MWM8': '1', 'MWM9': '40', 'MWM10H': 'NA', 'MWM10M': 'NA', 'MWM11H': 'NA', 'MWM11M': 'NA', 'MWB1M': 'NA', 'MWB1Y': 'NA', 'MWB2': 'NA', 'MWB3': 'NA', 'MWB4': 'NA', 'MWB5': 'NA', 'MWB7': 'NA', 'MMT2': 'NA', 'MMT3': 'NA', 'MMT4': 'NA', 'MMT6': 'NA', 'MMT7': 'NA', 'MMT8': 'NA', 'MMT9': 'NA', 'MMT10': 'NA', 'MMT11': 'NA', 'MMT12': 'NA', 'MMT13': 'NA', 'MMT14': 'NA', 'MCM1': 'NA', 'MCM3': 'NA', 'MCM4': 'NA', 'MCM5A': 'NA', 'MCM5B': 'NA', 'MCM6': 'NA', 'MCM7A': 'NA', 'MCM7B': 'NA', 'MCM8': 'NA', 'MCM9A': 'NA', 'MCM9B': 'NA', 'MCM10': 'NA', 'MCM11A': 'NA', 'MCM11B': 'NA', 'MCM12M': 'NA', 'MCM12Y': 'NA', 'MDV1A': 'NA', 'MDV1B': 'NA', 'MDV1C': 'NA', 'MDV1D': 'NA', 'MDV1E': 'NA', 'MDV1F': 'NA', 'MMA1': 'NA', 'MMA3': 'NA', 'MMA4': 'NA', 'MMA5': 'NA', 'MMA6': 'NA', 'MMA7': 'NA', 'MMA8M': 'NA', 'MMA8Y': 'NA', 'MMA9': 'NA', 'MSB1': 'NA', 'MSB2': 'NA', 'MSB3U': 'NA', 'MSB3N': 'NA', 'MSB4': 'NA', 'MSB5': 'NA', 'MSB8': 'NA', 'MSB9': 'NA', 'MSB10': 'NA', 'MSB13': 'NA', 'MSB14': 'NA', 'MSB15': 'NA', 'MHA1': 'NA', 'MHA2': 'NA', 'MHA3': 'NA', 'MHA4': 'NA', 'MHA5': 'NA', 'MHA6': 'NA', 'MHA7': 'NA', 'MHA8A': 'NA', 'MHA8B': 'NA', 'MHA8C': 'NA', 'MHA9': 'NA', 'MHA10': 'NA', 'MHA11': 'NA', 'MHA12': 'NA', 'MHA24': 'NA', 'MHA25': 'NA', 'MHA26': 'NA', 'MHA27': 'NA', 'MMC1': 'NA', 'MMC2': 'NA', 'MMC3': 'NA', 'MMC4': 'NA', 'MTA1': 'NA', 'MTA2': 'NA', 'MTA3': 'NA', 'MTA4': 'NA', 'MTA5': 'NA', 'MTA6': 'NA', 'MTA7': 'NA', 'MTA8A': 'NA', 'MTA8B': 'NA', 'MTA8C': 'NA', 'MTA8D': 'NA', 'MTA8E': 'NA', 'MTA8X': 'NA', 'MTA9': 'NA', 'MTA10': 'NA', 'MTA11': 'NA', 'MTA12A': 'NA', 'MTA12B': 'NA', 'MTA12C': 'NA', 'MTA12X': 'NA', 'MTA13': 'NA', 'MTA14': 'NA', 'MTA15': 'NA', 'MTA16': 'NA', 'MTA17': 'NA', 'TNLN': 'NA', 'TN4': 'NA', 'TN5': 'NA', 'TN6': 'NA', 'TN8': 'NA', 'TN9': 'NA', 'TN10': 'NA', 'TN11': 'NA', 'TN12_1': 'NA', 'TN12_2': 'NA', 'TN12_3': 'NA', 'TN12_4': 'NA', 'HH6': 'Urban', 'HH7': 'Bulawayo', 'MWDOI': '1372', 'MWDOB': 'NA', 'MWAGE': 'NA', 'MWDOM': 'NA', 'MWAGEM': 'NA', 'MWDOBLC': 'NA', 'MMSTATUS': 'NA', 'MCEB': 'NA', 'MCSURV': 'NA', 'MCDEAD': 'NA', 'mwelevel': 'NA', 'mnweight': '0', 'wscore': '0', 'windex5': '0', 'wscoreu': '0', 'windex5u': '0', 'wscorer': '0', 'windex5r': '0'}, {'': '5', 'HH1': '2', 'HH2': '1', 'LN': '8', 'MWM1': '2', 'MWM2': '1', 'MWM4': '8', 'MWM5': '9', 'MWM6D': '8', 'MWM6M': '4', 'MWM6Y': '2014', 'MWM7': 'Completed', 'MWM8': '1', 'MWM9': '40', 'MWM10H': '10', 'MWM10M': '53', 'MWM11H': '11', 'MWM11M': '10', 'MWB1M': '2', 'MWB1Y': '1993', 'MWB2': '21', 'MWB3': 'Yes', 'MWB4': 'Secondary', 'MWB5': '24', 'MWB7': 'NA', 'MMT2': 'Less than once a week', 'MMT3': 'At least once a week', 'MMT4': 'Less than once a week', 'MMT6': 'No', 'MMT7': 'NA', 'MMT8': 'NA', 'MMT9': 'No', 'MMT10': 'NA', 'MMT11': 'NA', 'MMT12': 'Yes', 'MMT13': 'Yes', 'MMT14': 'Almost every day', 'MCM1': 'No', 'MCM3': 'NA', 'MCM4': 'NA', 'MCM5A': 'NA', 'MCM5B': 'NA', 'MCM6': 'NA', 'MCM7A': 'NA', 'MCM7B': 'NA', 'MCM8': 'No', 'MCM9A': 'NA', 'MCM9B': 'NA', 'MCM10': '0', 'MCM11A': 'NA', 'MCM11B': 'NA', 'MCM12M': 'NA', 'MCM12Y': 'NA', 'MDV1A': 'No', 'MDV1B': 'No', 'MDV1C': 'No', 'MDV1D': 'No', 'MDV1E': 'No', 'MDV1F': 'No', 'MMA1': 'No, not in union', 'MMA3': 'NA', 'MMA4': 'NA', 'MMA5': 'No', 'MMA6': 'NA', 'MMA7': 'NA', 'MMA8M': 'NA', 'MMA8Y': 'NA', 'MMA9': 'NA', 'MSB1': '19', 'MSB2': 'Yes', 'MSB3U': 'Months ago', 'MSB3N': '7', 'MSB4': 'Yes', 'MSB5': 'Girlfriend', 'MSB8': 'No', 'MSB9': 'NA', 'MSB10': 'NA', 'MSB13': 'NA', 'MSB14': 'NA', 'MSB15': '3', 'MHA1': 'Yes', 'MHA2': 'Yes', 'MHA3': 'No', 'MHA4': 'Yes', 'MHA5': 'No', 'MHA6': 'No', 'MHA7': 'Yes', 'MHA8A': 'Yes', 'MHA8B': 'Yes', 'MHA8C': 'Yes', 'MHA9': 'Yes', 'MHA10': 'Yes', 'MHA11': 'No', 'MHA12': 'Yes', 'MHA24': 'Yes', 'MHA25': '12-23 months ago', 'MHA26': 'Yes', 'MHA27': 'NA', 'MMC1': 'No', 'MMC2': 'NA', 'MMC3': 'NA', 'MMC4': 'NA', 'MTA1': 'No', 'MTA2': 'NA', 'MTA3': 'NA', 'MTA4': 'NA', 'MTA5': 'NA', 'MTA6': 'No', 'MTA7': 'NA', 'MTA8A': 'NA', 'MTA8B': 'NA', 'MTA8C': 'NA', 'MTA8D': 'NA', 'MTA8E': 'NA', 'MTA8X': 'NA', 'MTA9': 'NA', 'MTA10': 'No', 'MTA11': 'NA', 'MTA12A': 'NA', 'MTA12B': 'NA', 'MTA12C': 'NA', 'MTA12X': 'NA', 'MTA13': 'NA', 'MTA14': 'Yes', 'MTA15': '20', 'MTA16': '10', 'MTA17': '2', 'TNLN': 'NA', 'TN4': 'NA', 'TN5': 'NA', 'TN6': 'NA', 'TN8': 'NA', 'TN9': 'NA', 'TN10': 'NA', 'TN11': 'NA', 'TN12_1': 'NA', 'TN12_2': 'NA', 'TN12_3': 'NA', 'TN12_4': 'NA', 'HH6': 'Urban', 'HH7': 'Bulawayo', 'MWDOI': '1372', 'MWDOB': '1118', 'MWAGE': '20-24', 'MWDOM': 'NA', 'MWAGEM': 'NA', 'MWDOBLC': 'NA', 'MMSTATUS': 'Never married/in union', 'MCEB': '0', 'MCSURV': '0', 'MCDEAD': '0', 'mwelevel': 'Secondary', 'mnweight': '1.03192602919895', 'wscore': '0.878635263695964', 'windex5': '4', 'wscoreu': '-0.930720561098312', 'windex5u': '1', 'wscorer': 'NA', 'windex5r': 'NA'}]
[{'Name': 'HH1', 'Label': 'Cluster number', 'Question': ''}, {'Name': 'HH2', 'Label': 'Household number', 'Question': ''}, {'Name': 'LN', 'Label': 'Line number', 'Question': ''}, {'Name': 'MWM1', 'Label': 'Cluster number', 'Question': ''}, {'Name': 'MWM2', 'Label': 'Household number', 'Question': ''}]

for data_dict in data_rows:
  for dkey, dval in data_dict.items():
    for header_dict in header_rows:
      for hkey, hval in header_dict.items():
        if dkey == hval:
            print('match!')


match!
match!
match!
match!
match!
match!
match!
match!
match!
match!
match!
match!
match!
match!
match!
.....

new_rows = []
for data_dict in data_rows:
  new_row = {}
  for dkey, dval in data_dict.items():
    for header_dict in header_rows:
      if dkey in header_dict.values():
        new_row[header_dict.get('Label')] = dval
  new_rows.append(new_row)

new_rows[0]

'Relationship to last sexual partner': 'Wife',
 'Sex with any other person in the last 12 month': 'No',
 'Condom used with prior sexual partner': 'NA',
 'Relationship to prior sexual partner': 'NA',
 'Sex with any other man in the last 12 months': 'NA',
 'Number of sex partners in last 12 months': 'NA',
 'Number of sex partners in lifetime': '5',
 'Ever heard of AIDS': 'Yes',
 'Can avoid AIDS virus by having one uninfected partner': 'Yes',
 'Can get AIDS virus through supernatural means': 'No',
 'Can avoid AIDS virus by using a condom correctly every time': 'Yes',
 'Can get AIDS virus from mosquito bites': 'No',
 'Can get AIDS virus by sharing food with a person who has AIDS': 'No',
 'Healthy-looking person may have AIDS virus': 'Yes',
 'AIDS virus from mother to child during pregnancy': 'DK',
 'AIDS virus from mother to child during delivery': 'Yes',
 'AIDS virus from mother to child through breastfeeding': 'DK',
 'Should female teacher with AIDS virus be allowed to teach in school': 'Yes',
 'Would buy fresh vegetables from shopkeeper with AIDS virus': 'Yes',
 'If HH member became infected with AIDS virus, would want it to remain a secret': 'No',
 'Willing to care for person with AIDS in household': 'Yes',
 'Ever been tested for AIDS virus': 'Yes',
 'Most recent time of testing for AIDS virus': 'Less than 12 months ago',
 'Received results of AIDS virus test': 'Yes',
 'Know a place to get AIDS virus test': 'NA',
 'Ever tried cigarette smoking': 'No',
 'Age when cigarette was smoked for the first time': 'NA',
 'Currently smoking cigarettes': 'NA',
 'Number of cigarettes smoked in the last 24 hours': 'NA',
 'Number of days when cigarettes were smoked in past month': 'NA',
 'Ever tried any smoked tobacco products other than cigarettes': 'No',
 'Used any smoked tobacco products during the last month': 'NA',
 'Type of smoked tobacco product: Cigars': 'NA',
 'Type of smoked tobacco product: Water pipe': 'NA',
 'Type of smoked tobacco product: Cigarillos': 'NA',
 'Type of smoked tobacco product: Pipe': 'NA',
 'Type of smoked tobacco product: Other': 'NA',
 'Number of days when tobacco products where smoked in past month': 'NA',
 'Ever tried any form of smokeless tobacco products': 'No',
 'Used any smokeless tobacco products during the last month': 'NA',
 'Type of smokeless tobacco product used: Chewing tobacco': 'NA',
 'Type of smokeless tobacco product used: Snuff': 'NA',
 'Type of smokeless tobacco product used: Dip': 'NA',
 'Type of smokeless tobacco product used: Other': 'NA',
 'Number of days when smokeless tobacco products where used in past month': 'NA',
 'Ever drunk alcohol': 'Yes',
 'Age when alcohol was used for the first time': '16',
 'Number of days when at least one drink of alcohol was used in past month': '0',
 'Number of drinks usually consumed': 'NA',
 'Months ago net obtained': 'NA',
 'Net treated with an insecticide when obtained': 'NA',
 'Net soaked or dipped since obtained': 'NA',
 'Months ago net soaked or dipped': 'NA',
 'Persons slept under mosquito net last night': 'NA',
 'Person 1 who slept under net': 'NA',
 'Person 2 who slept under net': 'NA',
 'Person 3 who slept under net': 'NA',
 'Person 4 who slept under net': 'NA'}


from csv import reader
data_rdr = reader(open('mn.csv', 'rt'))
header_rdr = reader(open('mn_headers.csv', 'rt'))
data_rows = [d for d in data_rdr]
header_rows = [h for h in header_rdr]
print (len(data_rows[0]))
print (len(header_rows))

159
210

data_rows[0]
['',
 'HH1',
 'HH2',
 'LN',
 'MWM1',
 'MWM2',
 'MWM4',
 'MWM5',
 'MWM6D',
 'MWM6M',
 'MWM6Y',
 'MWM7',
 'MWM8',
 'MWM9',
 'MWM10H',
 'MWM10M',
 'MWM11H',
 'MWM11M',
 'MWB1M',
 'MWB1Y',
 'MWB2',
 'MWB3',
 'MWB4',
 'MWB5',
 'MWB7',
 'MMT2',
 'MMT3',
 'MMT4',
 'MMT6',
 'MMT7',
 'MMT8',
 'MMT9',
 'MMT10',
 'MMT11',
 'MMT12',
 'MMT13',
 'MMT14',
 'MCM1',
 'MCM3',
 'MCM4',
 'MCM5A',
 'MCM5B',
 'MCM6',
 'MCM7A',
 'MCM7B',
 'MCM8',
 'MCM9A',
 'MCM9B',
 'MCM10',
 'MCM11A',
 'MCM11B',
 'MCM12M',
 'MCM12Y',
 'MDV1A',
 'MDV1B',
 'MDV1C',
 'MDV1D',
 'MDV1E',
 'MDV1F',
 'MMA1',
 'MMA3',
 'MMA4',
 'MMA5',
 'MMA6',
 'MMA7',
 'MMA8M',
 'MMA8Y',
 'MMA9',
 'MSB1',
 'MSB2',
 'MSB3U',
 'MSB3N',
 'MSB4',
 'MSB5',
 'MSB8',
 'MSB9',
 'MSB10',
 'MSB13',
 'MSB14',
 'MSB15',
 'MHA1',
 'MHA2',
 'MHA3',
 'MHA4',
 'MHA5',
 'MHA6',
 'MHA7',
 'MHA8A',
 'MHA8B',
 'MHA8C',
 'MHA9',
 'MHA10',
 'MHA11',
 'MHA12',
 'MHA24',
 'MHA25',
 'MHA26',
 'MHA27',
 'MMC1',
 'MMC2',
 'MMC3',
 'MMC4',
 'MTA1',
 'MTA2',
 'MTA3',
 'MTA4',
 'MTA5',
 'MTA6',
 'MTA7',
 'MTA8A',
 'MTA8B',
 'MTA8C',
 'MTA8D',
 'MTA8E',
 'MTA8X',
 'MTA9',
 'MTA10',
 'MTA11',
 'MTA12A',
 'MTA12B',
 'MTA12C',
 'MTA12X',
 'MTA13',
 'MTA14',
 'MTA15',
 'MTA16',
 'MTA17',
 'TNLN',
 'TN4',
 'TN5',
 'TN6',
 'TN8',
 'TN9',
 'TN10',
 'TN11',
 'TN12_1',
 'TN12_2',
 'TN12_3',
 'TN12_4',
 'HH6',
 'HH7',
 'MWDOI',
 'MWDOB',
 'MWAGE',
 'MWDOM',
 'MWAGEM',
 'MWDOBLC',
 'MMSTATUS',
 'MCEB',
 'MCSURV',
 'MCDEAD',
 'mwelevel',
 'mnweight',
 'wscore',
 'windex5',
 'wscoreu',
 'windex5u',
 'wscorer',
 'windex5r']

header_rows[:2]
[['HH1', 'Cluster number', ''], ['HH2', 'Household number', '']]
bad_rows = []
for h in header_rows:
  if h[0] not in data_rows[0]:
    bad_rows.append(h)
for h in bad_rows:
  header_rows.remove(h)
print(len(header_rows))

150
all_short_headers = [h[0] for h in header_rows]
for header in data_rows[0]:
  if header not in all_short_headers:
    print ('mismatch!', header)

mismatch! 
mismatch! MDV1F
mismatch! MTA8E
mismatch! mwelevel
mismatch! mnweight
mismatch! wscoreu
mismatch! windex5u
mismatch! wscorer
mismatch! windex5r
from csv import reader
data_rdr = reader(open('mn.csv', 'rt'))
header_rdr = reader(open('mn_headers_updated.csv', 'rt'))
data_rows = [d for d in data_rdr]
header_rows = [h for h in header_rdr if h[0] in data_rows[0]]
print(len(header_rows))
all_short_headers = [h[0] for h in header_rows]
skip_index = []
for header in data_rows[0]:
  if header not in all_short_headers:
    index = data_rows[0].index(header)
    skip_index.append(index)
new_data = []
for row in data_rows[1:]:
  new_row = []
  for i, d in enumerate(row):
    if i not in skip_index:
      new_row.append(d)
  new_data.append(new_row)
zipped_data = []
for drow in new_data:
  zipped_data.append(zip(header_rows, drow))

152

zipped_data[0]
<zip at 0x7fa99615dc40>

data_headers = []
for i, header in enumerate(data_rows[0]):
  if i not in skip_index:
    data_headers.append(header)
header_match = zip(data_headers, all_short_headers)
print(header_match)

<zip object at 0x7fa9969f37c0>

from csv import reader
data_rdr = reader(open('mn.csv', 'rt',encoding='utf-8'))
header_rdr = reader(open('mn_headers_updated.csv', 'rt',encoding='utf-8'))
data_rows = [d for d in data_rdr]
header_rows = [h for h in header_rdr if h[0] in data_rows[0]]
all_short_headers = [h[0] for h in header_rows]
skip_index = []
final_header_rows = []
for header in data_rows[0]:
  if header not in all_short_headers:
    index = data_rows[0].index(header)
    skip_index.append(index)
  else:
    for head in header_rows:
      if head[0] == header:
        final_header_rows.append(head)
        break
new_data = []
for row in data_rows[1:]:
  new_row = []
  for i, d in enumerate(row):
    if i not in skip_index:
      new_row.append(d)
  new_data.append(new_row)
zipped_data = []
for drow in new_data:
  zipped_data.append(zip(final_header_rows, drow))

zipped_data[0]

<zip at 0x7fa9b5fbdc00>

Data Wrangling -1 MCQs _______________________________________________

JSON stands for _______

A. JavaScript Object Notation
B. Java Object Notation
C. JavaScript Object Normalization
D. JavaScript Object-Oriented Notation

ANS: A

2. JSON is a _____ for storing and transporting data.

A. xml format
B. text format
C. JavaScript
D. php format

ANS: B

3. The JSON syntax is a subset of the _____ syntax.

A. Ajax
B. Php
C. HTML
D. javaScript

ANS: D

Excel File Reader

ARTIFICIAL INTELLIGENCE - MCQs - 2

1. Which search is equal to minimax search but eliminates the branches that can’t influence the final decision?

A. Depth-first search
B. Breadth-first search
C. Alpha-beta pruning
D. None of the mentioned

Ans: C

2. Which values are independent in minimax search algorithm?

A. Pruned leaves x and y
B. Every states are dependent
C. Root is independent
D. None of the mentioned

Ans: A

3. To which depth does the alpha-beta pruning can be applied?

A. 10 states
B. 8 States
C. 6 States
D. Any depth

Ans: D

4. Which search is similar to minimax search?

A. Hill-climbing search
B. Depth-first search
C. Breadth-first search
D. All of the mentioned

Ans: B

5. Which value is assigned to alpha and beta in the alpha-beta pruning?

A. Alpha = max
B. Beta = min
C. Beta = max
D. Both Alpha = max & Beta = min

Ans: D

6. Where does the values of alpha-beta search get updated?

A. Along the path of search
B. Initial state itself
C. At the end
D. None of the mentioned

Ans: A

7. How the effectiveness of the alpha-beta pruning gets increased?

A. Depends on the nodes
B. Depends on the order in which they are executed
C. All of the mentioned
D. None of the mentioned

Ans: A

8. What is called as transposition table?

A. Hash table of next seen positions
B. Hash table of previously seen positions
C. Next value in the search
D. None of the mentioned

Ans: B

9. Which is identical to the closed list in Graph search?

A. Hill climbing search algorithm
B. Depth-first search
C. Transposition table
D. None of the mentioned

Ans: C

10. Which function is used to calculate the feasibility of whole game tree?

A. Evaluation function
B. Transposition
C. Alpha-beta pruning
D. All of the mentioned

Ans: A

11. General games involves ____________

A. Single-agent
B. Multi-agent
C. Neither Single-agent nor Multi-agent
D. Only Single-agent and Multi-agent

Ans: D

12. Adversarial search problems uses ____________

A. Competitive Environment
B. Cooperative Environment
C. Neither Competitive nor Cooperative Environment
D. Only Competitive and Cooperative Environment

Ans: A

13. Zero sum games are the one in which there are two agents whose actions must alternate and in which the utility values at the end of the game are always the same.

A. True
B. False

Ans: B

14. Zero sum game has to be a ______ game.

A. Single player

B. Two player

C. Multiplayer

D. Three player

Ans: C

15. A game can be formally defined as a kind of search problem with the following components.

A. Initial State
B. Successor Function
C. Terminal Test
D. All of the mentioned

Ans: D

16. The initial state and the legal moves for each side define the __________ for the game.

A. Search Tree
B. Game Tree
C. State Space Search
D. Forest

Ans: B

17. General algorithm applied on game tree for making decision of win/lose is ____________

A. DFS/BFS Search Algorithms
B. Heuristic Search Algorithms
C. Greedy Search Algorithms
D. MIN/MAX Algorithms

Ans: D

18. The minimax algorithm computes the minimax decision from the current state. It uses a simple recursive computation of the minimax values of each successor state, directly implementing the defining equations. The recursion proceeds all the way down to the leaves of the tree, and then the minimax values are backed up through the tree as the recursion unwinds.

A. True
B. False

Ans: A

19. Which is the most straightforward approach for planning algorithm?

A. Best-first search
B. State-space search
C. Depth-first search
D. Hill-climbing search

Ans: B

20. What are taken into account of state-space search?

A. Postconditions
B. Preconditions
C. Effects
D. Both Preconditions & Effects

Ans: D

21. Which approach is to pretend that a pure divide and conquer algorithm will work?

A. Goal independence
B. Subgoal independence
C. Both Goal & Subgoal independence
D. None of the mentioned

Ans: B

22. Which is the best way to go for Game playing problem?

A. Linear approach
B. Heuristic approach (Some knowledge is stored)
C. Random approach
D. An Optimal approach

Ans: B

23. A production rule consists of ____________

A. A set of Rule
B. A sequence of steps
C. Set of Rule & sequence of steps
D. Arbitrary representation to problem

Ans: C

24. Which search method takes less memory?

A. Depth-First Search
B. Breadth-First search
C. Linear Search
D. Optimal search

Ans: A

25. What is the major component/components for measuring the performance of problem solving?

A. Completeness
B. Optimality
C. Time and Space complexity
D. All of the mentioned

Ans: D

26. The Set of actions for a problem in a state space is formulated by a ___________

A. Intermediate states
B. Initial state
C. Successor function, which takes current action and returns next immediate state
D. None of the mentioned

Ans: C

27. What is state space?

A. The whole problem
B. Your Definition to a problem
C. Problem you design
D. Representing your problem with variable and parameter

Ans: D

28. What is the objective of tower of hanoi puzzle?

A. To move all disks to some other rod by following rules
B. To divide the disks equally among the three rods by following rules
C. To move all disks to some other rod in random order
D. To divide the disks equally among three rods in random order

Ans: A

29. Which of the following is NOT a rule of tower of hanoi puzzle?

A. No disk should be placed over a smaller disk
B. Disk can only be moved if it is the uppermost disk of the stack
C. No disk should be placed over a larger disk
D. Only one disk can be moved at a time

Ans: C

30. Recursive solution of tower of hanoi problem is an example of which of the following algorithm?

A. Dynamic programming

B. Backtracking

C. Greedy algorithm

D. Divide and conquer

Ans: D

Machine Learning - Deep Learning

Data Wrangling - Data Cleanup - MCQs

Data Wrangling - Data Cleanup

Data Wrangling -1 MCQs

Data Wrangling -1 MCQs _______________________________________________

4. Excel File Reader

Excel File Reader

ARTIFICIAL INTELLIGENCE - MCQs - 2

ARTIFICIAL INTELLIGENCE - MCQs - 2

About Machine Learning

SOFTWARE ENGINEERING