How to split on two items in a string? [duplicate] - python

This question already has answers here:
Split Strings into words with multiple word boundary delimiters
(31 answers)
Closed 9 years ago.
Using .read() to read a file, how would I split on two delimiters at once? I'm trying to split on commas and "\n" simultaneously, but when I split on commas first, it turns my string into a list, which I cannot split again.
Here is the string I'm trying to split:
'States, Total Score, Critical Reading, Mathematics, Writing, Participation (%)\nWashington,1564,524,532,508,41.2000\nNewHampshire,1554,520,524,510,64.0000\nMassachusetts,1547,512,526,509,72.1000\nOregon,1546,523,524,499,37.1000\nVermont,1546,519,512,506,64.0000\nArizona,1544,519,525,500,22.4000\nConnecticut,1536,509,514,513,71.2000\nAlaska,1524,518,515,491,32.7000\nVirginia,1521,512,512,497,56.0000\nCalifornia,1517,501,516,500,37.5000\nNewJersey,1506,495,514,497,69.0000\nMaryland,1502,501,506,495,56.7000\nNorthCarolina,1485,497,511,477,45.5000\nRhodeIsland,1477,494,495,488,60.8000\nIndiana,1476,494,505,477,52.0000\nFlorida,1473,496,498,479,44.7000\nPennsylvania,1473,492,501,480,62.3000\nNevada,1470,496,501,473,25.9000\nDelaware,1469,493,495,481,59.2000\nTexas,1462,484,505,473,41.5000\nNewYork,1461,484,499,478,59.6000\nHawaii,1458,483,505,470,47.1000\nGeorgia,1453,488,490,475,46.5000\nSouthCarolina,1447,484,495,468,40.7000\nMaine,1389,468,467,454,87.1000\nIowa,1798,603,613,582,2.7000\nMinnesota,1781,594,607,580,6.0000\nWisconsin,1778,595,604,579,3.8000\nMissouri,1768,593,595,580,3.6000\nMichigan,1766,585,605,576,3.8000\nSouthDakota,1766,592,603,571,2.0000\nIllinois,1762,585,600,577,4.6700\nKansas,1752,590,595,567,4.7000\nNebraska,1746,585,593,568,3.9000\nNorthDakota,1733,580,594,559,3.4000\nKentucky,1713,575,575,563,5.0000\nTennessee,1712,576,571,565,6.4000\nColorado,1695,568,572,555,14.1000\nArkansas,1684,566,566,552,3.5000\nOklahoma,1684,569,568,547,3.8000\nWyoming,1683,570,567,546,3.6000\nUtah,1674,568,559,547,4.5000\nMississippi,1666,566,548,552,2.2000\nLouisiana,1652,555,550,547,4.0000\nAlabama,1650,556,550,544,5.4000\nNewMexico,1636,553,549,534,7.1000\nOhio,1609,538,548,522,17.2000\nIdaho,1601,543,541,517,14.6000\nMontana,1593,538,538,517,20.0000\nWest Virginia,1522,515,507,500,13.2000\n'

You can use a list comprehension:
>>> strs = 'States, Total Score, Critical Reading, Mathematics, Writing, Participation (%)\nWashington,1564,524,532,508,41.2000\nNewHampshire,1554,520,524,510,64.0000\nMassachusetts,1547,512,526,509,72.1000\nOregon,1546,523,524,499,37.1000\nVermont,1546,519,512,506,64.0000\nArizona,1544,519,525,500,22.4000\nConnecticut,1536,509,514,513,71.2000\nAlaska,1524,518,515,491,32.7000\nVirginia,1521,512,512,497,56.0000\nCalifornia,1517,501,516,500,37.5000\nNewJersey,1506,495,514,497,69.0000\nMaryland,1502,501,506,495,56.7000\nNorthCarolina,1485,497,511,477,45.5000\nRhodeIsland,1477,494,495,488,60.8000\nIndiana,1476,494,505,477,52.0000\nFlorida,1473,496,498,479,44.7000\nPennsylvania,1473,492,501,480,62.3000\nNevada,1470,496,501,473,25.9000\nDelaware,1469,493,495,481,59.2000\nTexas,1462,484,505,473,41.5000\nNewYork,1461,484,499,478,59.6000\nHawaii,1458,483,505,470,47.1000\nGeorgia,1453,488,490,475,46.5000\nSouthCarolina,1447,484,495,468,40.7000\nMaine,1389,468,467,454,87.1000\nIowa,1798,603,613,582,2.7000\nMinnesota,1781,594,607,580,6.0000\nWisconsin,1778,595,604,579,3.8000\nMissouri,1768,593,595,580,3.6000\nMichigan,1766,585,605,576,3.8000\nSouthDakota,1766,592,603,571,2.0000\nIllinois,1762,585,600,577,4.6700\nKansas,1752,590,595,567,4.7000\nNebraska,1746,585,593,568,3.9000\nNorthDakota,1733,580,594,559,3.4000\nKentucky,1713,575,575,563,5.0000\nTennessee,1712,576,571,565,6.4000\nColorado,1695,568,572,555,14.1000\nArkansas,1684,566,566,552,3.5000\nOklahoma,1684,569,568,547,3.8000\nWyoming,1683,570,567,546,3.6000\nUtah,1674,568,559,547,4.5000\nMississippi,1666,566,548,552,2.2000\nLouisiana,1652,555,550,547,4.0000\nAlabama,1650,556,550,544,5.4000\nNewMexico,1636,553,549,534,7.1000\nOhio,1609,538,548,522,17.2000\nIdaho,1601,543,541,517,14.6000\nMontana,1593,538,538,517,20.0000\nWest Virginia,1522,515,507,500,13.2000\n'
>>> [ y for x in strs.splitlines() for y in x.split(",")]
['States', ' Total Score', ' Critical Reading', ' Mathematics', ' Writing', ' Participation (%)', 'Washington', '1564', '524', '532', '508', '41.2000', 'NewHampshire', '1554', '520', '524', '510', '64.0000', 'Massachusetts', '1547', '512', '526', '509', '72.1000', 'Oregon', '1546', '523', '524', '499', '37.1000', 'Vermont', '1546', '519', '512', '506', '64.0000', 'Arizona', '1544', '519', '525', '500', '22.4000', 'Connecticut', '1536', '509', '514', '513', '71.2000', 'Alaska', '1524', '518', '515', '491', '32.7000', 'Virginia', '1521', '512', '512', '497', '56.0000', 'California', '1517', '501', '516', '500', '37.5000', 'NewJersey', '1506', '495', '514', '497', '69.0000', 'Maryland', '1502', '501', '506', '495', '56.7000', 'NorthCarolina', '1485', '497', '511', '477', '45.5000', 'RhodeIsland', '1477', '494', '495', '488', '60.8000', 'Indiana', '1476', '494', '505', '477', '52.0000', 'Florida', '1473', '496', '498', '479', '44.7000', 'Pennsylvania', '1473', '492', '501', '480', '62.3000', 'Nevada', '1470', '496', '501', '473', '25.9000', 'Delaware', '1469', '493', '495', '481', '59.2000', 'Texas', '1462', '484', '505', '473', '41.5000', 'NewYork', '1461', '484', '499', '478', '59.6000', 'Hawaii', '1458', '483', '505', '470', '47.1000', 'Georgia', '1453', '488', '490', '475', '46.5000', 'SouthCarolina', '1447', '484', '495', '468', '40.7000', 'Maine', '1389', '468', '467', '454', '87.1000', 'Iowa', '1798', '603', '613', '582', '2.7000', 'Minnesota', '1781', '594', '607', '580', '6.0000', 'Wisconsin', '1778', '595', '604', '579', '3.8000', 'Missouri', '1768', '593', '595', '580', '3.6000', 'Michigan', '1766', '585', '605', '576', '3.8000', 'SouthDakota', '1766', '592', '603', '571', '2.0000', 'Illinois', '1762', '585', '600', '577', '4.6700', 'Kansas', '1752', '590', '595', '567', '4.7000', 'Nebraska', '1746', '585', '593', '568', '3.9000', 'NorthDakota', '1733', '580', '594', '559', '3.4000', 'Kentucky', '1713', '575', '575', '563', '5.0000', 'Tennessee', '1712', 
'576', '571', '565', '6.4000', 'Colorado', '1695', '568', '572', '555', '14.1000', 'Arkansas', '1684', '566', '566', '552', '3.5000', 'Oklahoma', '1684', '569', '568', '547', '3.8000', 'Wyoming', '1683', '570', '567', '546', '3.6000', 'Utah', '1674', '568', '559', '547', '4.5000', 'Mississippi', '1666', '566', '548', '552', '2.2000', 'Louisiana', '1652', '555', '550', '547', '4.0000', 'Alabama', '1650', '556', '550', '544', '5.4000', 'NewMexico', '1636', '553', '549', '534', '7.1000', 'Ohio', '1609', '538', '548', '522', '17.2000', 'Idaho', '1601', '543', '541', '517', '14.6000', 'Montana', '1593', '538', '538', '517', '20.0000', 'West Virginia', '1522', '515', '507', '500', '13.2000']
If you want a list of lists containing each line split at ,:
>>> [x.split(",") for x in strs.splitlines()]
[['States', ' Total Score', ' Critical Reading', ' Mathematics', ' Writing', ' Participation (%)'], ['Washington', '1564', '524', '532', '508', '41.2000'], ['NewHampshire', '1554', '520', '524', '510', '64.0000'], ['Massachusetts', '1547', '512', '526', '509', '72.1000'], ['Oregon', '1546', '523', '524', '499', '37.1000'], ['Vermont', '1546', '519', '512', '506', '64.0000'], ['Arizona', '1544', '519', '525', '500', '22.4000'], ['Connecticut', '1536', '509', '514', '513', '71.2000'], ['Alaska', '1524', '518', '515', '491', '32.7000'], ['Virginia', '1521', '512', '512', '497', '56.0000'], ['California', '1517', '501', '516', '500', '37.5000'], ['NewJersey', '1506', '495', '514', '497', '69.0000'], ['Maryland', '1502', '501', '506', '495', '56.7000'], ['NorthCarolina', '1485', '497', '511', '477', '45.5000'], ['RhodeIsland', '1477', '494', '495', '488', '60.8000'], ['Indiana', '1476', '494', '505', '477', '52.0000'], ['Florida', '1473', '496', '498', '479', '44.7000'], ['Pennsylvania', '1473', '492', '501', '480', '62.3000'], ['Nevada', '1470', '496', '501', '473', '25.9000'], ['Delaware', '1469', '493', '495', '481', '59.2000'], ['Texas', '1462', '484', '505', '473', '41.5000'], ['NewYork', '1461', '484', '499', '478', '59.6000'], ['Hawaii', '1458', '483', '505', '470', '47.1000'], ['Georgia', '1453', '488', '490', '475', '46.5000'], ['SouthCarolina', '1447', '484', '495', '468', '40.7000'], ['Maine', '1389', '468', '467', '454', '87.1000'], ['Iowa', '1798', '603', '613', '582', '2.7000'], ['Minnesota', '1781', '594', '607', '580', '6.0000'], ['Wisconsin', '1778', '595', '604', '579', '3.8000'], ['Missouri', '1768', '593', '595', '580', '3.6000'], ['Michigan', '1766', '585', '605', '576', '3.8000'], ['SouthDakota', '1766', '592', '603', '571', '2.0000'], ['Illinois', '1762', '585', '600', '577', '4.6700'], ['Kansas', '1752', '590', '595', '567', '4.7000'], ['Nebraska', '1746', '585', '593', '568', '3.9000'], ['NorthDakota', '1733', '580', '594', '559', '3.4000'], 
['Kentucky', '1713', '575', '575', '563', '5.0000'], ['Tennessee', '1712', '576', '571', '565', '6.4000'], ['Colorado', '1695', '568', '572', '555', '14.1000'], ['Arkansas', '1684', '566', '566', '552', '3.5000'], ['Oklahoma', '1684', '569', '568', '547', '3.8000'], ['Wyoming', '1683', '570', '567', '546', '3.6000'], ['Utah', '1674', '568', '559', '547', '4.5000'], ['Mississippi', '1666', '566', '548', '552', '2.2000'], ['Louisiana', '1652', '555', '550', '547', '4.0000'], ['Alabama', '1650', '556', '550', '544', '5.4000'], ['NewMexico', '1636', '553', '549', '534', '7.1000'], ['Ohio', '1609', '538', '548', '522', '17.2000'], ['Idaho', '1601', '543', '541', '517', '14.6000'], ['Montana', '1593', '538', '538', '517', '20.0000'], ['West Virginia', '1522', '515', '507', '500', '13.2000']]
Instead of generating the whole list at once, you can use itertools.chain.from_iterable to get the elements lazily (or, even better, iterate over the file one line at a time; prefer Martijn Pieters's solution in that case):
>>> from itertools import chain
>>> for elem in chain.from_iterable(x.split(",") for x in strs.splitlines()):
...     print(elem)
...
States
 Total Score
 Critical Reading
 Mathematics
 Writing
 Participation (%)
Washington
...

Don't read the whole file in one go; read it line by line, then split:
with open(filepath) as f:
    for line in f:
        print(line.strip().split(','))
You could also first split on newlines, then loop and split on commas:
lines = [line.split(',') for line in somestring.splitlines()]
But for comma-separated files, your best bet is to use the csv module:
import csv

# Python 3: open in text mode with newline='' as the csv docs recommend
with open(filepath, newline='') as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        print(row)
This gives you the rows as:
['States', ' Total Score', ' Critical Reading', ' Mathematics', ' Writing', ' Participation (%)']
['Washington', '1564', '524', '532', '508', '41.2000']
['NewHampshire', '1554', '520', '524', '510', '64.0000']
Since you have a first row with headers, you could use a DictReader as well and get dictionaries mapping headers to values:
with open(filepath, newline='') as f:
    reader = csv.DictReader(f, delimiter=',')
    for row in reader:
        print(row)
        # address columns as: row['States'], row[' Total Score'] (note the leading space)
which outputs rows as:
{' Writing': '508', ' Total Score': '1564', ' Critical Reading': '524', 'States': 'Washington', ' Mathematics': '532', ' Participation (%)': '41.2000'}
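The header keys above keep the space that follows each comma (' Writing', ' Total Score'). A minimal sketch, assuming Python 3: the csv module's skipinitialspace=True option drops that space, so the keys match the visible header names. io.StringIO and the three-line excerpt of the question's data stand in for the real file opened with open(filepath, newline='').

```python
import csv
import io

# Excerpt of the question's data; StringIO stands in for the opened file.
data = ("States, Total Score, Critical Reading\n"
        "Washington,1564,524\n"
        "NewHampshire,1554,520\n")

# skipinitialspace=True ignores whitespace right after each delimiter,
# so the header keys become 'Total Score' instead of ' Total Score'.
reader = csv.DictReader(io.StringIO(data), skipinitialspace=True)
rows = list(reader)

for row in rows:
    # Values are still strings; convert explicitly where numbers are needed.
    print(row["States"], int(row["Total Score"]))
```

DictReader forwards extra keyword arguments such as skipinitialspace to the underlying csv.reader, which is why the option can be passed here directly.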

There's also re.split for splitting on multiple characters:
>>> import re
>>> re.split("\n| ", "this is\na short\ntest...")
['this', 'is', 'a', 'short', 'test...']
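Applied to the question's data, the same idea splits on commas and newlines in one call. A minimal sketch using the first three lines of the question's string; the character class [,\n] matches either delimiter.

```python
import re

# First three lines of the question's string, trailing newline included.
data = ("States, Total Score, Critical Reading, Mathematics, Writing, Participation (%)\n"
        "Washington,1564,524,532,508,41.2000\n"
        "NewHampshire,1554,520,524,510,64.0000\n")

# [,\n] matches either a comma or a newline, so one split handles both.
tokens = re.split(r"[,\n]", data)

# The trailing "\n" leaves an empty string at the end; drop empties and
# strip the spaces that follow the commas in the header line.
tokens = [t.strip() for t in tokens if t.strip()]
print(tokens[:8])
```

This yields one flat list, like the first list comprehension above; if you need one list per row, split on lines first instead.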

You could use split() from the re module, which lets you define a regex to split on.
See: python split string based on regular expression

Related

only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices when using EvolutionaryFS

I'm using a genetic algorithm to select features, so I used the EvolutionaryFS library:
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow import keras  # build_model() below uses keras.models / keras.layers
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from EvolutionaryFS import GeneticAlgorithmFS
seed = 0
np.random.seed(seed)
df = pd.read_csv("/content/drive/MyDrive/RT_predict/Urine_DnS/Dataset/0607/0607Dragon_0607edit.csv")
dataset = df.values
X = dataset[:,0:-1]
Y = dataset[:,-1]
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=seed)
input_dim = X.shape[1]
def build_model(n1_neurons=1000, n2_neurons=500):
    model = keras.models.Sequential()
    model.add(keras.layers.InputLayer(input_shape=(input_dim,)))
    model.add(keras.layers.Dense(n1_neurons, activation="relu"))
    model.add(keras.layers.Dense(n2_neurons, activation="relu"))
    model.add(keras.layers.Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae', 'mse'])
    return model
data_dict={0:{'x_train':X_train,'y_train':Y_train,'x_test':X_test,'y_test':Y_test}}
columns_list=list(df.columns)
model_object=build_model
evoObj=GeneticAlgorithmFS(model=model_object,data_dict=data_dict,cost_function='mean_squared_error',average='',cost_function_improvement='decrease',columns_list=columns_list,generations=100,population=50,prob_crossover=0.9,prob_mutation=0.1,run_time=60000)
best_columns=evoObj.GetBestFeatures()
print(best_columns)
and I got this error:
IndexError                                Traceback (most recent call last)
<ipython-input-20-33e6ab735f97> in <module>()
     47 model_object=build_model
     48 evoObj=GeneticAlgorithmFS(model=model_object,data_dict=data_dict,cost_function='mean_squared_error',average='',cost_function_improvement='decrease',columns_list=columns_list,generations=100,population=50,prob_crossover=0.9,prob_mutation=0.1,run_time=60000)
---> 49 best_columns=evoObj.GetBestFeatures()
     50 print(best_columns)
2 frames
/usr/local/lib/python3.7/dist-packages/EvolutionaryFS.py in _getCost(self, population_array)
     95 for i in self.data_dict.keys():
     96
---> 97 x_train=self.data_dict[i]['x_train'][columns_list]
     98 y_train=self.data_dict[i]['y_train']
     99
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
I think there is a problem with the dataset, but I can't solve it.
Edited at July 6th.
I followed StatguyUser's advice, and got this error message when I commented out:
best_columns=evoObj.GetBestFeatures()
print(best_columns)
['Unnamed: 0', 'MW', 'Sv', 'Se', 'Sp', ..., 'ALOGP', 'Normalized RT (min)']
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-12-a63bc4c481bb> in <module>()
46 print(columns_list)
47
---> 48 print(data_dict[0]['x_train'][columns_list].shape)
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
Edited at July 26th.
I followed StatguyUser's advice, but it still doesn't work.
My error message is:
['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39', '40', '41', '42', '43', '44', '45', '46', '47', '48', '49', '50', '51', '52', '53', '54', '55', '56', '57', '58', '59', '60', '61', '62', '63', '64', '65', '66', '67', '68', '69', '70', '71', '72', '73', '74', '75', '76', '77', '78', '79', '80', '81', '82', '83', '84', '85', '86', '87', '88', '89', '90', '91', '92', '93', '94', '95', '96', '97', '98', '99', '100', '101', '102', '103', '104', '105', '106', '107', '108', '109', '110', '111', '112', '113', '114', '115', '116', '117', '118', '119', '120', '121', '122', '123', '124', '125', '126', '127', '128', '129', '130', '131', '132', '133', '134', '135', '136', '137', '138', '139', '140', '141', '142', '143', '144', '145', '146', '147', '148', '149', '150', '151', '152', '153', '154', '155', '156', '157', '158', '159', '160', '161', '162', '163', '164', '165', '166', '167', '168', '169', '170', '171', '172', '173', '174', '175', '176', '177', '178', '179', '180', '181', '182', '183', '184', '185', '186', '187', '188', '189', '190', '191', '192', '193', '194', '195', '196', '197', '198', '199', '200', '201', '202', '203', '204', '205', '206', '207', '208', '209', '210', '211', '212', '213', '214', '215', '216', '217', '218', '219', '220', '221', '222', '223', '224', '225', '226', '227', '228', '229', '230', '231', '232', '233', '234', '235', '236', '237', '238', '239', '240', '241', '242', '243', '244', '245', '246', '247', '248', '249', '250', '251', '252', '253', '254', '255', '256', '257', '258', '259', '260', '261', '262', '263', '264', '265', '266', '267', '268', '269', '270', '271', '272', '273', '274', '275', '276', '277', '278', '279', '280', '281', '282', '283', '284', '285', '286', '287', '288', '289', '290', '291', '292', '293', '294', '295', '296', '297', '298', '299', '300', '301', 
'302', '303', '304', '305', '306', '307', '308', '309', '310', '311', '312', '313', '314', '315', '316', '317', '318', '319', '320', '321', '322', '323', '324', '325', '326', '327', '328', '329', '330', '331', '332', '333', '334', '335', '336', '337', '338', '339', '340', '341', '342', '343', '344', '345', '346', '347', '348', '349', '350', '351', '352', '353', '354', '355', '356', '357', '358', '359', '360', '361', '362', '363', '364', '365', '366', '367', '368', '369', '370', '371', '372', '373', '374', '375', '376', '377', '378', '379', '380', '381', '382', '383', '384', '385', '386', '387', '388', '389', '390', '391', '392', '393', '394', '395', '396', '397', '398', '399', '400', '401', '402', '403', '404', '405', '406', '407', '408', '409', '410', '411', '412', '413', '414', '415', '416', '417', '418', '419', '420', '421', '422', '423', '424', '425', '426', '427', '428', '429', '430', '431', '432', '433', '434', '435', '436', '437', '438', '439', '440', '441', '442', '443', '444', '445', '446', '447', '448', '449', '450', '451', '452', '453', '454', '455', '456', '457', '458', '459', '460', '461', '462', '463', '464', '465', '466', '467', '468', '469', '470', '471', '472', '473', '474', '475', '476', '477', '478', '479', '480', '481', '482', '483', '484', '485', '486', '487', '488', '489', '490', '491', '492', '493', '494', '495', '496', '497', '498', '499', '500', '501', '502', '503', '504', '505', '506', '507', '508', '509', '510', '511', '512', '513', '514', '515', '516', '517', '518', '519', '520', '521', '522', '523', '524', '525', '526', '527', '528', '529', '530', '531', '532', '533', '534', '535', '536', '537', '538', '539', '540', '541', '542', '543', '544', '545', '546', '547', '548', '549', '550', '551', '552', '553', '554', '555', '556', '557', '558', '559', '560', '561', '562', '563', '564', '565', '566', '567', '568', '569', '570', '571', '572', '573', '574', '575', '576', '577', '578', '579', '580', '581', '582', '583', '584', '585', '586', 
'587', '588', '589', '590', '591', '592', '593', '594', '595', '596', '597', '598', '599', '600', '601', '602', '603', '604', '605', '606', '607', '608', '609', '610', '611', '612', '613', '614', '615', '616', '617', '618', '619', '620', '621', '622', '623', '624', '625', '626', '627', '628', '629', '630', '631', '632', '633', '634', '635', '636', '637', '638', '639', '640', '641', '642', '643', '644', '645', '646', '647', '648', '649', '650', '651', '652', '653', '654', '655', '656', '657', '658', '659', '660', '661', '662', '663', '664', '665', '666', '667', '668', '669', '670', '671', '672', '673', '674', '675', '676', '677', '678', '679', '680', '681', '682', '683', '684', '685', '686', '687', '688', '689', '690', '691', '692', '693', '694', '695', '696', '697', '698', '699', '700', '701', '702', '703', '704', '705', '706', '707', '708', '709', '710', '711', '712', '713', '714', '715', '716', '717', '718', '719', '720', '721', '722', '723', '724', '725', '726', '727', '728', '729', '730', '731', '732', '733', '734', '735', '736', '737', '738', '739', '740', '741', '742', '743', '744', '745', '746', '747', '748', '749', '750', '751', '752', '753', '754', '755', '756', '757', '758', '759', '760', '761', '762', '763', '764', '765', '766', '767', '768', '769', '770', '771', '772', '773', '774', '775', '776', '777', '778', '779', '780', '781', '782', '783', '784', '785', '786', '787', '788', '789', '790', '791', '792', '793', '794', '795', '796', '797', '798', '799', '800', '801', '802', '803', '804', '805', '806', '807', '808', '809', '810', '811', '812', '813', '814', '815', '816', '817', '818', '819', '820', '821', '822', '823', '824', '825', '826', '827', '828', '829', '830', '831', '832', '833', '834', '835', '836', '837', '838', '839', '840', '841', '842', '843', '844', '845', '846', '847', '848', '849', '850', '851', '852', '853', '854', '855', '856', '857', '858', '859', '860', '861', '862', '863', '864', '865', '866', '867', '868', '869', '870', '871', 
'872', '873', '874', '875', '876', '877', '878', '879', '880', '881', '882', '883', '884', '885', '886', '887', '888', '889', '890', '891', '892', '893', '894', '895', '896', '897', '898', '899', '900', '901', '902', '903', '904', '905', '906', '907', '908', '909', '910', '911', '912', '913', '914', '915', '916', '917', '918', '919', '920', '921', '922', '923', '924', '925', '926', '927', '928', '929', '930', '931', '932', '933', '934', '935', '936', '937', '938', '939', '940', '941', '942', '943', '944', '945', '946', '947', '948', '949', '950', '951', '952', '953', '954', '955', '956', '957', '958', '959', '960', '961', '962', '963', '964', '965', '966', '967', '968', '969', '970', '971', '972', '973', '974', '975', '976', '977', '978', '979', '980', '981', '982', '983', '984', '985', '986', '987', '988', '989', '990', '991', '992', '993', '994', '995', '996', '997', '998', '999', '1000', '1001', '1002', '1003', '1004', '1005', '1006', '1007', '1008', '1009', '1010', '1011', '1012', '1013', '1014', '1015', '1016', '1017', '1018', '1019', '1020', '1021', '1022', '1023', '1024', '1025', '1026', '1027', '1028', '1029', '1030', '1031', '1032', '1033', '1034', '1035', '1036', '1037', '1038', '1039', '1040', '1041', '1042', '1043', '1044', '1045', '1046', '1047', '1048', '1049', '1050', '1051', '1052', '1053', '1054', '1055', '1056', '1057', '1058', '1059', '1060', '1061', '1062', '1063', '1064', '1065', '1066', '1067', '1068', '1069', '1070', '1071', '1072', '1073', '1074', '1075', '1076', '1077', '1078', '1079', '1080', '1081', '1082', '1083', '1084', '1085', '1086', '1087', '1088', '1089', '1090', '1091', '1092', '1093', '1094', '1095', '1096', '1097', '1098', '1099', '1100', '1101', '1102', '1103', '1104', '1105', '1106', '1107', '1108', '1109', '1110', '1111', '1112', '1113', '1114', '1115', '1116', '1117', '1118', '1119', '1120', '1121', '1122', '1123', '1124', '1125', '1126', '1127', '1128', '1129', '1130', '1131', '1132', '1133', '1134', '1135', '1136', '1137', 
'1138', '1139', '1140', '1141', '1142', '1143', '1144', '1145', '1146', '1147', '1148', '1149', '1150', '1151', '1152', '1153', '1154', '1155', '1156', '1157', '1158', '1159', '1160', '1161', '1162', '1163', '1164', '1165', '1166', '1167', '1168', '1169', '1170', '1171', '1172', '1173', '1174', '1175', '1176', '1177', '1178', '1179', '1180', '1181', '1182', '1183', '1184', '1185', '1186', '1187', '1188', '1189', '1190', '1191', '1192', '1193', '1194', '1195', '1196', '1197', '1198', '1199', '1200', '1201', '1202', '1203', '1204', '1205', '1206', '1207', '1208', '1209', '1210', '1211', '1212', '1213', '1214', '1215', '1216', '1217', '1218', '1219', '1220', '1221', '1222', '1223', '1224', '1225', '1226', '1227', '1228', '1229', '1230', '1231', '1232', '1233', '1234', '1235', '1236', '1237', '1238', '1239', '1240', '1241', '1242', '1243', '1244', '1245', '1246', '1247', '1248', '1249', '1250', '1251', '1252', '1253', '1254', '1255', '1256', '1257', '1258', '1259', '1260', '1261', '1262', '1263', '1264', '1265', '1266', '1267', '1268', '1269', '1270', '1271', '1272', '1273', '1274', '1275', '1276', '1277', '1278', '1279', '1280', '1281', '1282', '1283', '1284', '1285', '1286', '1287', '1288', '1289', '1290', '1291', '1292', '1293', '1294', '1295', '1296', '1297', '1298', '1299', '1300', '1301', '1302', '1303', '1304', '1305', '1306', '1307', '1308', '1309', '1310', '1311', '1312', '1313', '1314', '1315', '1316', '1317', '1318', '1319', '1320', '1321', '1322', '1323', '1324', '1325', '1326', '1327', '1328', '1329', '1330', '1331', '1332', '1333', '1334', '1335', '1336', '1337', '1338', '1339', '1340', '1341', '1342', '1343', '1344', '1345', '1346', '1347', '1348', '1349', '1350', '1351', '1352', '1353', '1354', '1355', '1356', '1357', '1358', '1359', '1360', '1361', '1362', '1363', '1364', '1365', '1366', '1367', '1368', '1369', '1370', '1371', '1372', '1373', '1374', '1375', '1376', '1377', '1378', '1379', '1380', '1381', '1382', '1383', '1384', '1385', '1386', '1387', 
'1388', '1389', '1390', '1391', '1392', '1393', '1394', '1395', '1396', '1397', '1398', '1399', '1400', '1401', '1402', '1403', '1404', '1405', '1406', '1407', '1408', '1409', '1410', '1411', '1412', '1413', '1414', '1415', '1416', '1417', '1418', '1419', '1420', '1421', '1422', '1423', '1424', '1425', '1426', '1427', '1428', '1429', '1430', '1431', '1432', '1433', '1434', '1435', '1436', '1437', '1438', '1439', '1440', '1441', '1442', '1443', '1444', '1445', '1446', '1447', '1448', '1449', '1450', '1451', '1452', '1453', '1454', '1455', '1456', '1457', '1458', '1459', '1460', '1461', '1462', '1463', '1464', '1465', '1466', '1467', '1468', '1469', '1470', '1471', '1472', '1473', '1474', '1475', '1476', '1477', '1478', '1479', '1480', '1481', '1482', '1483', '1484', '1485', '1486', '1487', '1488', '1489', '1490', '1491', '1492', '1493', '1494', '1495', '1496', '1497', '1498', '1499', '1500', '1501', '1502', '1503', '1504', '1505', '1506', '1507', '1508', '1509', '1510', '1511', '1512', '1513', '1514', '1515', '1516', '1517', '1518', '1519', '1520', '1521', '1522', '1523', '1524', '1525', '1526', '1527', '1528', '1529', '1530', '1531', '1532', '1533', '1534', '1535', '1536', '1537', '1538', '1539', '1540', '1541', '1542', '1543', '1544', '1545', '1546', '1547', '1548', '1549', '1550', '1551', '1552', '1553', '1554', '1555', '1556', '1557', '1558', '1559', '1560', '1561', '1562', '1563', '1564', '1565', '1566', '1567', '1568', '1569', '1570', '1571', '1572', '1573', '1574', '1575', '1576', '1577', '1578', '1579', '1580', '1581', '1582', '1583', '1584', '1585', '1586', '1587', '1588', '1589', '1590', '1591', '1592', '1593', '1594', '1595', '1596', '1597', '1598', '1599', '1600', '1601', '1602', '1603', '1604', '1605', '1606', '1607', '1608', '1609', '1610', '1611', '1612', '1613', '1614', '1615', '1616', '1617', '1618', '1619', '1620', '1621', '1622', '1623', '1624', '1625', '1626', '1627', '1628', '1629', '1630', '1631', '1632', '1633', '1634', '1635', '1636', '1637', 
'1638', '1639', '1640', '1641', '1642', '1643', '1644', '1645', '1646', '1647', '1648', '1649', '1650', '1651', '1652', '1653', '1654', '1655', '1656', '1657', '1658', '1659', '1660', '1661', '1662', '1663', '1664', '1665', '1666', '1667', '1668', '1669', '1670', '1671', '1672', '1673', '1674', '1675', '1676', '1677', '1678', '1679', '1680', '1681', '1682', '1683', '1684', '1685', '1686', '1687', '1688', '1689', '1690', '1691', '1692', '1693', '1694', '1695', '1696', '1697', '1698', '1699', '1700', '1701', '1702', '1703', '1704', '1705', '1706', '1707', '1708', '1709', '1710', '1711', '1712', '1713', '1714', '1715', '1716', '1717', '1718', '1719', '1720', '1721', '1722', '1723', '1724', '1725', '1726', '1727', '1728', '1729', '1730', '1731', '1732', '1733', '1734', '1735', '1736', '1737', '1738', '1739', '1740', '1741', '1742', '1743', '1744', '1745', '1746', '1747', '1748', '1749', '1750', '1751', '1752', '1753', '1754', '1755', '1756', '1757', '1758', '1759', '1760', '1761', '1762', '1763', '1764', '1765', '1766', '1767', '1768', '1769', '1770', '1771', '1772', '1773', '1774', '1775', '1776', '1777', '1778', '1779', '1780', '1781', '1782', '1783', '1784', '1785', '1786', '1787', '1788', '1789', '1790', '1791', '1792', '1793', '1794', '1795', '1796', '1797', '1798', '1799', '1800', '1801', '1802', '1803', '1804', '1805', '1806', '1807', '1808', '1809', '1810', '1811', '1812', '1813', '1814', '1815', '1816', '1817', '1818', '1819', '1820', '1821', '1822', '1823', '1824', '1825', '1826', '1827', '1828', '1829', '1830', '1831', '1832', '1833', '1834', '1835', '1836', '1837', '1838', '1839', '1840', '1841', '1842', '1843', '1844', '1845', '1846', '1847', '1848', '1849', '1850', '1851', '1852', '1853', '1854', '1855', '1856', '1857', '1858', '1859', '1860', '1861', '1862', '1863', '1864', '1865', '1866', '1867', '1868', '1869', '1870', '1871', '1872']
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-16-03f83ec536c1> in <module>()
46 print(columns_list)
47
---> 48 print(data_dict[0]['x_train'][columns_list].shape)
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
The issue seems to be caused by column names with special characters that pandas is unable to recognize.
Rename the 'Unnamed: 0' column, both in the pandas DataFrame and in the Python list. This might not be causing the issue, but it's best practice.
The real issue appears to be column names with special characters. To solve this, remove all special characters such as round brackets ( ), the percent sign %, backslashes and forward slashes \ /, and square brackets [ ] from the column names of your pandas DataFrame. If you need a separator, use only a space or an underscore _, in both your pandas column names and the Python list that holds them. This should solve your issue.
Let me know if you are still facing this issue.
Edit: It appears that the issue is that x_train is a NumPy array, not a DataFrame. df.values returns a NumPy array, and a NumPy array can't be indexed with a list of column-name strings, which is exactly what raises this IndexError.
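A minimal sketch of the edit's point, assuming pandas is available; the toy frame and its column names are hypothetical stand-ins for the real CSV. It shows that indexing the df.values array by names fails with the same IndexError, while keeping the features as a DataFrame makes name-based selection work, parentheses in the names included. Note also that in the question, list(df.columns) contains the target column while X drops it; this sketch builds the name list from the feature columns only, and whether EvolutionaryFS expects exactly that is an assumption to check.

```python
import pandas as pd

# Hypothetical toy data: two features (one with parentheses in its name)
# and the target in the last column, mirroring the question's layout.
df = pd.DataFrame({
    "MW": [1.0, 2.0, 3.0, 4.0],
    "ALOGP (x)": [0.1, 0.2, 0.3, 0.4],
    "Normalized RT (min)": [5.0, 6.0, 7.0, 8.0],
})
feature_cols = list(df.columns[:-1])  # feature names only, target excluded

# What the question does: df.values gives a NumPy array, which cannot be
# indexed with a list of column-name strings.
X_array = df.values[:, :-1]
numpy_error = None
try:
    X_array[feature_cols]
except IndexError as exc:
    numpy_error = exc
    print("ndarray:", exc)

# Keep features and target as pandas objects instead of calling .values.
X = df.iloc[:, :-1]
Y = df.iloc[:, -1]
X_train, X_test = X.iloc[:3], X.iloc[3:]  # simple positional split for illustration
Y_train, Y_test = Y.iloc[:3], Y.iloc[3:]

data_dict = {0: {"x_train": X_train, "y_train": Y_train,
                 "x_test": X_test, "y_test": Y_test}}
print(data_dict[0]["x_train"][feature_cols].shape)  # (3, 2)
```

In the real code, passing X = df.iloc[:, :-1] and Y = df.iloc[:, -1] to train_test_split keeps the splits as pandas objects, so the positional split here is only to keep the sketch dependency-free.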

Speed up Scraping with Multithreading/Multiprocessing for my code

How can I speed up my Scrapy code with multithreading/multiprocessing?
I have attached my code below. I am not familiar with threading in Python and don't know where to begin; any help with this code would be appreciated.
import scrapy
import logging
domain = 'https://www.spdigital.cl/categories/view/'
categories = [
'334' , '335', '553', '607', '336', '340', '339', '540', '486', '489', '485', '598', '347', '562','348', '349', '353', '351', '352', '532', '350',
'477', '475', '476', '474', '559','355', '356', '580', '337', '357', '358', '360', '374', '363', '362', '361', '338', '344', '593', '359', '604',
'478', '507', '509', '508', '510', '512', '600', '590', '511', '459','564', '376', '375', '558', '341', '377', '378', '484', '554', '567', '563', '379', '342', '343',
'370', '481', '365', '556', '364', '541', '555', '492', '570','579', '576', '574', '575', '572', '578', '577', '588', '573',
'596', '597', '601', '595','387', '468', '536', '391', '390', '589', '389','399', '394', '396', '397', '398', '392', '592', '401', '402', '530', '560',
'407', '406', '408', '404', '403', '405','413', '411', '414', '410', '409', '412','418', '599', '603', '465', '415', '487', '416', '382', '419', '417', '479',
'515', '582', '518', '514', '581', '583', '517', '519', '520','420', '421', '422', '423', '424', '425', '521', '557', '538', '428', '430', '432', '434', '436', '433', '435', '427', '437', '429', '482',
'544', '552', '545', '546', '550', '547', '551', '549', '548','491', '535', '494', '493', '472', '471', '470', '534', '537',
'587', '586', '585','602', '569', '561','438', '446', '488', '439', '496', '440', '566', '445', '447', '565','547', '448', '449', '450', '451', '452', '531', '453', '454', '456', '455',
'501', '505', '506', '504', '502', '498', '500', '503', '369','527', '460', '529', '606', '528', '591', '462', '526', '525', '605', '463', '464',
]
class ProductosSpider(scrapy.Spider):
    name = 'productos'
    allowed_domains = ['www.spdigital.cl']

    def start_requests(self):
        for i in categories:
            yield scrapy.Request(url=domain + i, callback=self.parse, headers={
                'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/78.0.3904.108 Chrome/78.0.3904.108 Safari/537.36'
            })

    def parse(self, response):
        for product in response.xpath('//div[@class="span8 grid-style-mosaic"]/div/div[@class="span2 product-item-mosaic"]'):
            yield {
                # both halves of the union are relative (.//) so they stay inside this product
                'product_name': product.xpath('.//div[@class="name"]/a/text() | .//div[@class="name"]/a/span/@data-original-title').get(),
                'product_brand': product.xpath('.//div[@class="brand"]/text()').get(),
                'product_url': response.urljoin(product.xpath('.//div[@class="name"]/a/@href').get()),
                'product_original': product.xpath('.//div[@class="cash-price"]/text()').get(),
                'product_discount': product.xpath('.//span[@class="cash-previous-price-value"]/text()').get()
            }
        # Test the raw href before joining: response.urljoin(None) returns the page's
        # base URL, so testing the joined value would always be truthy.
        next_page = response.xpath('//a[@class="next"]/@href').get()
        if next_page:
            yield scrapy.Request(url=response.urljoin(next_page), callback=self.parse, headers={
                'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/78.0.3904.108 Chrome/78.0.3904.108 Safari/537.36'
            })
Scrapy is single-threaded and does not use multithreading; instead, it performs its requests asynchronously, since it is built on Twisted. To speed up your crawl, increase the number of concurrent requests in settings.py by modifying CONCURRENT_REQUESTS and CONCURRENT_REQUESTS_PER_DOMAIN, whose defaults are 16 and 8 respectively. The Scrapy documentation on concurrent requests is worth reading.
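A minimal settings.py sketch, assuming a standard Scrapy project layout. The value 32 is an illustrative assumption, not a recommendation. Because every category URL in the question is on www.spdigital.cl, CONCURRENT_REQUESTS_PER_DOMAIN (default 8) is the limit that actually caps this crawl, so raising only CONCURRENT_REQUESTS would probably change little.

```python
# settings.py -- illustrative values only; tune them and respect the target site's limits.

# Maximum number of requests the downloader performs concurrently across all
# domains (Scrapy default: 16).
CONCURRENT_REQUESTS = 32

# Maximum number of concurrent requests to any single domain (Scrapy default: 8).
# Every URL in this spider shares one domain, so this is the effective cap here.
CONCURRENT_REQUESTS_PER_DOMAIN = 32
```

The same keys can also be set for a single spider through its custom_settings class attribute instead of project-wide.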

Error using scipy.signal.lfilter() in Python3.7.0 : <built-in function _linear_filter> returned NULL without setting an error

I have a list of 256 data elements that I want to filter with an elliptic filter.
import matplotlib.pyplot as plt
from scipy.signal import *
import numpy as np

def elliptical_bandpass():
    Fs = 256
    lowcut = 5
    highcut = 30
    order = 5
    Rp = 0.5  # Passband ripple (dB)
    Rs = 30   # Minimum stopband attenuation (dB)
    nyq = Fs / 2  # Nyquist frequency
    wp = lowcut / nyq
    ws = highcut / nyq
    c3 = ['221', '262', '333', '429', '522', '592', '630', '656', '668', '645', '581', '486', '395', '324', '265', '214', '172', '171', '214', '282', '353', '420', '498', '584', '650', '679', '661', '622', '571', '503', '415', '316', '240', '200', '185', '188', '204', '256', '344', '443', '527', '582', '627', '665', '676', '644', '567', '481', '404', '337', '271', '204', '168', '175', '218', '277', '340', '419', '513', '599', '653', '662', '649', '622', '578', '506', '407', '317', '252', '213', '188', '173', '194', '258', '352', '445', '517', '578', '632', '671', '672', '626', '561', '491', '422', '341', '254', '188', '165', '184', '224', '271', '337', '424', '522', '598', '638', '652', '653', '637', '585', '497', '397', '314', '258', '215', '180', '172', '202', '272', '352', '427', '502', '579', '649', '680', '664', '615', '555', '498', '424', '335', '251', '195', '180', '187', '212', '258', '338', '442', '533', '594', '628', '649', '661', '640', '579', '490', '402', '332', '266', '206', '164', '166', '216', '285', '357', '425', '501', '584', '644', '669', '655', '624', '580', '509', '414', '311', '236', '202', '190', '191', '207', '258', '345', '441', '521', '577', '626', '667', '676', '643', '567', '483', '407', '334', '261', '194', '162', '176', '222', '280', '342', '422', '517', '603', '654', '662', '650', '626', '579', '505', '404', '315', '252', '213', '187', '173', '196', '262', '352', '442', '513', '580', '642', '679', '674', '622', '553', '483', '413', '336', '254', '196', '177', '191', '221', '260', '328', '422', '524', '603', '640', '655', '656', '637', '583', '492', '397', '319', '263', '217', '176', '168', '204', '278', '361', '436', '509', '583', '645', '672', '656', '616', '565', '507', '425', '325', '238', '188', '179', '190', '213', '260', '338', '440']
    n, Wn = ellipord(wp, ws, Rp, Rs)
    print('Wn IS ----', Wn)
    b, a = ellip(order, Rp, Rs, [wp, ws], btype='band')  # get filter coefficients
    print('b coeff from filter code -- ', b)
    print('a coeff from filter code -- ', a)
    c3_filtered = lfilter(b, a, c3)
    print('filtered data-', c3_filtered)
    print('len of filtered data', len(c3_filtered))
    w, h = freqz(b, a, worN=2000)  # used to plot the frequency response
    plt.figure()
    plt.plot((Fs * 0.5 / np.pi) * w, abs(h), label="order = %d" % order)
    plt.xlabel('Frequency (Hz)')
    plt.ylabel('Gain')
    plt.grid(True)
    plt.legend(loc='best')
    plt.show()

elliptical_bandpass()
When I run this, the filter design and coefficients look correct, but lfilter raises an error:
  File "C:\Users\gtec\AppData\Local\Programs\Python\Python37-32\lib\site-packages\scipy\signal\signaltools.py", line 1354, in lfilter
    return sigtools._linear_filter(b, a, x, axis)
SystemError: <built-in function _linear_filter> returned NULL without setting an error
I was previously using Python 2.7, where this ran without any errors; I am now using Python 3.7.0.
The problem is that c3 is a list of strings. lfilter expects a sequence of numerical values. It will not automatically convert strings to numbers, so you'll have to convert those strings to numbers in your code before calling lfilter.
Do something like
c3 = [float(t) for t in c3]
before passing c3 to lfilter.
Even better would be to look back at how you actually create c3 in your "real" code (assuming the code in the question is a simplified example). It would make sense to convert the strings to numbers at the point where c3 is created.
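A minimal sketch of converting at creation time with NumPy, assuming the samples arrive as numeric strings like those in c3. np.asarray(..., dtype=float) parses each string and raises ValueError on a non-numeric entry. The coefficients b and a below are a hypothetical two-tap moving average, not the elliptic filter from the question, chosen only so the output is easy to check by hand.

```python
import numpy as np
from scipy.signal import lfilter


def to_float_array(samples):
    """Convert a sequence of numeric strings (e.g. '221') to a float64 array."""
    # dtype=float makes NumPy parse each string; bad entries raise ValueError
    return np.asarray(samples, dtype=float)


raw = ['221', '262', '333', '429']  # first few values of c3 from the question
x = to_float_array(raw)

b, a = [0.5, 0.5], [1.0]  # hypothetical 2-tap moving average, just to show the call
y = lfilter(b, a, x)      # y[n] = 0.5*x[n] + 0.5*x[n-1]
print(y)                  # [110.5 241.5 297.5 381. ]
```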
(The cryptic error message is a bug in lfilter; you should have gotten a nicer error message. :)

non-recursive unlist lists of tuples [duplicate]

This question already has answers here:
flatten list in python
(5 answers)
Closed 6 years ago.
nlist = [[('4698874', '0', '58'), ('838286', '58', '310', '1.01')],('2588097', '368', '179', '1.01'), ('2746740', '547', '342', '1.44'),('3873988', '889', '259', '1.01'), ('808046', '1148', '236', '1.01'), ('2588498', '1384', '158', '1.01'), ('2492893', '1542', '196', '1.02'), ('2168413', '1738', '165', '1.02'), ('1778345', '1903', '448', '1.07'), ('2989691', '2351', '194', '0.99'), [('4698875', '2545', '256'), ('2985955', '2801', '257', '1.54')], [('4698876', '3058', '177'), ('1728736', '3235', '270', '0.96')], ('2615446', '3505', '172', '0.93'),[('4698877', '3677', '177'), ('4698878', '3854', '144'), ('515524', '3998', '134', '1.10')], [('4698879', '4132', '172'), ('4698880', '4304', '98'), ('2444241', '4402', '146', '1.04')], ('4698881', '4548', '-1', '1.00'), ()]
I was wondering whether there is a one-liner to flatten this one level (non-recursively) so that the remaining elements are all tuples, like this:
nlist = [('4698874', '0', '58'), ('838286', '58', '310', '1.01'),('2588097', '368', '179', '1.01'), ('2746740', '547', '342', '1.44'),('3873988', '889', '259', '1.01'), ('808046', '1148', '236', '1.01'), ('2588498', '1384', '158', '1.01'), ('2492893', '1542', '196', '1.02'), ('2168413', '1738', '165', '1.02'), ('1778345', '1903', '448', '1.07'), ('2989691', '2351', '194', '0.99'), ('4698875', '2545', '256'), ('2985955', '2801', '257', '1.54'), ('4698876', '3058', '177'), ('1728736', '3235', '270', '0.96'), ('2615446', '3505', '172', '0.93'),('4698877', '3677', '177'), ('4698878', '3854', '144'), ('515524', '3998', '134', '1.10'), ('4698879', '4132', '172'), ('4698880', '4304', '98'), ('2444241', '4402', '146', '1.04'), ('4698881', '4548', '-1', '1.00')]
Thank you in advance for any guidance.
Use itertools.chain.
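from itertools import chain

def unlist(nlist):
    # Wrap bare tuples in a one-element list so every item is an iterable of tuples,
    # then flatten exactly one level with chain.from_iterable.
    return list(chain.from_iterable([n] if isinstance(n, tuple) else n for n in nlist))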
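If you would rather avoid itertools, the same one-level flattening can be sketched as a nested list comprehension. This variant also drops empty tuples, since the desired output in the question omits the trailing (); unlist_nonempty is a hypothetical name, not part of any library.

```python
def unlist_nonempty(nlist):
    # Treat a bare tuple as a one-element group and a list as a group of tuples,
    # emit every tuple from every group, and skip empty tuples (they are falsy).
    return [t for item in nlist for t in ([item] if isinstance(item, tuple) else item) if t]


sample = [[('4698874', '0', '58'), ('838286', '58', '310', '1.01')],
          ('2588097', '368', '179', '1.01'),
          ()]
print(unlist_nonempty(sample))
# [('4698874', '0', '58'), ('838286', '58', '310', '1.01'), ('2588097', '368', '179', '1.01')]
```

Only one level is flattened: a list nested inside a list would come through unchanged.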

Creating list of defined number pattern

I want to create a list whose output looks like:
[001,002,003,004,005]
and continues up to 300. The leading zeros are essential. I tried a method such as:
a = []
for i in range(0, 3):
    for j in range(0, 10):
        for k in range(0, 10):
            a.append(i j k)
However, for obvious reasons, the append function does not behave in the manner I would like.
Do people have any suggestions on how else I could do this?
You cannot produce a list of integers that are displayed with padding; Python integers have no leading zeros. You can, however, produce strings with leading zeros:
a = [format(i + 1, '03d') for i in range(300)]
The format() function formats each integer to a field width of 3 characters, padding it with leading zeros; that format specification is written as 03d.
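For comparison, a minimal sketch of two other standard-library spellings of the same zero padding: an f-string using the same 03d spec, and str.zfill. Iterating over range(1, 301) is equivalent to the answer's i + 1 over range(300); all three produce identical lists of strings.

```python
# Three equivalent ways to build '001' ... '300' as strings (standard library only).
via_format = [format(i, '03d') for i in range(1, 301)]  # built-in format() with spec '03d'
via_fstring = [f'{i:03d}' for i in range(1, 301)]       # same spec inside an f-string (Python 3.6+)
via_zfill = [str(i).zfill(3) for i in range(1, 301)]    # pad the decimal string with zeros on the left

# The padding lives only in the strings; int() recovers the plain number.
numbers = [int(s) for s in via_format]
print(via_format[:5], numbers[:5])  # ['001', '002', '003', '004', '005'] [1, 2, 3, 4, 5]
```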
Demo:
>>> [format(i + 1, '03d') for i in range(300)]
['001', '002', '003', '004', '005', '006', '007', '008', '009', '010', '011', '012', '013', '014', '015', '016', '017', '018', '019', '020', '021', '022', '023', '024', '025', '026', '027', '028', '029', '030', '031', '032', '033', '034', '035', '036', '037', '038', '039', '040', '041', '042', '043', '044', '045', '046', '047', '048', '049', '050', '051', '052', '053', '054', '055', '056', '057', '058', '059', '060', '061', '062', '063', '064', '065', '066', '067', '068', '069', '070', '071', '072', '073', '074', '075', '076', '077', '078', '079', '080', '081', '082', '083', '084', '085', '086', '087', '088', '089', '090', '091', '092', '093', '094', '095', '096', '097', '098', '099', '100', '101', '102', '103', '104', '105', '106', '107', '108', '109', '110', '111', '112', '113', '114', '115', '116', '117', '118', '119', '120', '121', '122', '123', '124', '125', '126', '127', '128', '129', '130', '131', '132', '133', '134', '135', '136', '137', '138', '139', '140', '141', '142', '143', '144', '145', '146', '147', '148', '149', '150', '151', '152', '153', '154', '155', '156', '157', '158', '159', '160', '161', '162', '163', '164', '165', '166', '167', '168', '169', '170', '171', '172', '173', '174', '175', '176', '177', '178', '179', '180', '181', '182', '183', '184', '185', '186', '187', '188', '189', '190', '191', '192', '193', '194', '195', '196', '197', '198', '199', '200', '201', '202', '203', '204', '205', '206', '207', '208', '209', '210', '211', '212', '213', '214', '215', '216', '217', '218', '219', '220', '221', '222', '223', '224', '225', '226', '227', '228', '229', '230', '231', '232', '233', '234', '235', '236', '237', '238', '239', '240', '241', '242', '243', '244', '245', '246', '247', '248', '249', '250', '251', '252', '253', '254', '255', '256', '257', '258', '259', '260', '261', '262', '263', '264', '265', '266', '267', '268', '269', '270', '271', '272', '273', '274', '275', '276', '277', '278', '279', '280', '281', '282', '283', '284', '285', '286', '287', '288', '289', '290', '291', '292', '293', '294', '295', '296', '297', '298', '299', '300']
You could subclass list and override its __repr__ method to call str.zfill on each number:
class NumList(list):
    def __repr__(self):
        return '[' + ', '.join([str(x).zfill(3) for x in self]) + ']'
Demo:
>>> class NumList(list):
...     def __repr__(self):
...         return '[' + ', '.join([str(x).zfill(3) for x in self]) + ']'
...
>>> NumList([1, 2, 3, 4, 5])
[001, 002, 003, 004, 005]
>>>
To make the exact list you want, use NumList(range(1, 301)); NumList(range(300)) would run from 000 to 299.
Note, however, that this does not create integers with leading zeros (as @MartijnPieters said, that is impossible). The list still holds ordinary integers; all this does is tell Python how to display them, through the string returned by __repr__, when the list is printed or echoed in the console.
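A quick sketch of what the subclass does and does not change, using a generator expression inside join (a minor variation on the answer's list comprehension). One caveat: operations inherited from list, such as slicing and +, return a plain list, so the padded display is lost on the result unless you wrap it in NumList again.

```python
class NumList(list):
    """A list whose repr shows every element zero-padded to width 3 (display only)."""

    def __repr__(self):
        return '[' + ', '.join(str(x).zfill(3) for x in self) + ']'


nums = NumList(range(1, 301))

print(nums[0] + 1)        # elements are still ints: 2
print(nums[:5])           # slicing returns a plain list: [1, 2, 3, 4, 5]
print(NumList(nums[:5]))  # wrap again to restore the padded repr: [001, 002, 003, 004, 005]
```

print() shows the padded form because list does not define its own __str__, so str() falls back to the overridden __repr__.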
