I'm using Python and it seems that I have an issue with my for loop.
So here are the data sets that I am working with:
A) I have this CSV file that contains all the car data
B) I also have the Brand-to-Car (type) CSV that looks like the one below
What I need is to build another CSV that breaks down each line into its correct brand and car type, while making sure it is accurate (for example, there are both Camry L and Camry LE, and sometimes it pulls both).
This is the output that I am looking for:
So far here is my script:
temp_car_list = []
reader_type = csv.DictReader(type_library)
with open(file_name) as Output_file:
    reader = csv.DictReader(Output_file)
    for row in reader:
        Title = row["Item Name"]
        print(Title)
        for brand in brand_library_iterate:
            if brand in Title:
                Brand_Name = brand
                print(Brand_Name)
        # Get Type Library List
        for Car_type in reader_type:
            Brand_String = str(Brand_Name)
            print(Brand_String)
            type_list = temp_car_list.append(Car_type[Brand_String])
            print(temp_car_list)
        for car_type in temp_car_list:
            if car_type in Title:
                car_type_output = car_type
                print(car_type_output)
        temp_car_list.clear()
Logic of the script:
1) Pull the Title (e.g. Black Toyota Camry L)
2) Pull the brand of the car from a list
3) From output #2, map into #B (the CSV file in the picture)
Here is what I get (unfortunately):
What I have noticed to be the main issues:
1) For some reason, Brand_Name does not change for the second or subsequent rows, so it stays stuck at Toyota.
2) car_type_output pulls out both Camry L and Camry LE.
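A note on the two symptoms, before the answer below: a likely cause, assuming brand_library_iterate and reader_type are csv readers (or other one-shot iterators), is that they are exhausted after the first title, so Brand_Name and temp_car_list are never refreshed for later rows; and the substring test if car_type in Title matches both Camry L and Camry LE because one is a prefix of the other. A minimal sketch of both fixes (type_file_name is a placeholder for your file B):
import csv

# Read file B once into plain Python lists; csv readers are one-shot
# iterators, so they cannot be re-used inside the row loop.
with open(type_file_name) as fh:          # hypothetical name for your file B
    type_rows = list(csv.DictReader(fh))
brands = list(type_rows[0].keys())        # header row: Toyota, Nissan, Honda

with open(file_name) as fh:               # file A, as in your script
    for row in csv.DictReader(fh):
        title = row["Item Name"]
        brand = next((b for b in brands if b in title), None)
        if brand is None:
            continue
        # test the longest type names first, so 'Camry LE' wins over 'Camry L'
        types = sorted((r[brand] for r in type_rows if r[brand]),
                       key=len, reverse=True)
        car_type = next((t for t in types if t in title), None)
        print(title, brand, car_type)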
Question: Break down each line into its correct brand, color and car type.
A = """Item Name
Black Toyota Camry L
Honda Accord Navy
Grey Toyota Corolla
Black Nissan Murano
Silver Toyota Camry LE
"""
B = """Toyota,Nissan,Honda
Camry L,Murano,Accord
Corolla,Rogue,Civic
Avalon,Pathfinder,CR-V
Highlander,Maxima,HR-V
Prius,Altima,
Camry LE,,
"""
The simplest approach: splitting the lines from file A.
From file B, only the first line with the brands is used, to detect the out-of-order line:
Honda Accord Navy.
import io


def simple_split_title():
    # with open(<name of your file B>) as fh:
    with io.StringIO(B) as fh:
        brands = fh.readline().rstrip().split(',')

    # with open(<name of your file A>) as fh:
    with io.StringIO(A) as fh:
        _ = next(fh)
        for line in fh:
            title = line.rstrip()
            item = title.split(' ')
            if item[0] in brands:
                # brand first, color last (e.g. 'Honda Accord Navy')
                _color, _brand, _type, _last = len(item) - 1, 0, 1, len(item) - 1
            else:
                # color first, brand second (e.g. 'Black Toyota Camry L')
                _color, _brand, _type, _last = 0, 1, 2, len(item)
            result = {'title': title,
                      'brand': item[_brand],
                      'color': item[_color],
                      'type': ' '.join(item[_type:_last])}
            print(result)
The most expensive approach. This requires looping over the dicts from file B twice per line from file A.
To detect the out-of-order line Honda Accord Navy, two string comparisons are required.
import io, csv


def looping_dict():
    # with open(<name of your file B>) as fh:
    with io.StringIO(B) as fh:
        car_type = [_dict for _dict in csv.DictReader(fh)]

    # with open(<name of your file A>) as fh:
    with io.StringIO(A) as fh:
        _ = next(fh)
        for line in fh:
            title = line.rstrip()
            result = {'title': title, 'brand': '', 'color': '', 'type': ''}
            # Get brand
            for brand in car_type[0].keys():
                if brand in title:
                    result['brand'] = brand
                    title = title.replace(brand + ' ', '')
                    break
            # Get type
            for _type in car_type:
                if title.endswith(_type[brand]) or title.startswith(_type[brand]):
                    result['type'] = _type[brand]
                    title = title.replace(_type[brand], '')
                    break
            # Get color
            result['color'] = title.strip()
            print(result)
The mathematical approach, using set theory.
The car_type list is only looped over once per line of file A.
An extra condition to detect the out-of-order line Honda Accord Navy is not required.
You get a match if the set of title items is a superset of car_type[x].set.
import io, csv
from collections import namedtuple


def theory_of_sets():
    CarType = namedtuple('CarType', 'set brand type')
    car_type = []
    # with open(<name of your file B>) as fh:
    with io.StringIO(B) as fh:
        for _dict in csv.DictReader(fh):
            for brand, _type in _dict.items():
                _set = {brand} | set(_type.split(' '))
                car_type.append(CarType._make((_set, brand, _type)))

    # with open(<name of your file A>) as fh:
    with io.StringIO(A) as fh:
        _ = next(fh)
        for line in fh:
            title = line.rstrip()
            _title = title.split(' ')
            _items = set(_title)
            result = None
            for ct in car_type:
                if _items.issuperset(ct.set):
                    result = {'title': title,
                              'brand': ct.brand,
                              'color': (_items - ct.set).pop(),
                              'type': ct.type}
                    break
            print(result)
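To see why no extra condition is needed to tell Camry L and Camry LE apart, here is the superset test in isolation on the ambiguous title:
title_items = set('Silver Toyota Camry LE'.split())
print(title_items.issuperset({'Toyota', 'Camry', 'L'}))   # False: 'L' is not one of the title's words
print(title_items.issuperset({'Toyota', 'Camry', 'LE'}))  # True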
Output: all three functions print the same result.
{'title': 'Black Toyota Camry L', 'brand': 'Toyota', 'color': 'Black', 'type': 'Camry L'}
{'title': 'Honda Accord Navy', 'brand': 'Honda', 'color': 'Navy', 'type': 'Accord'}
{'title': 'Grey Toyota Corolla', 'brand': 'Toyota', 'color': 'Grey', 'type': 'Corolla'}
{'title': 'Black Nissan Murano', 'brand': 'Nissan', 'color': 'Black', 'type': 'Murano'}
{'title': 'Silver Toyota Camry LE', 'brand': 'Toyota', 'color': 'Silver', 'type': 'Camry LE'}
Here is what I'm trying:
def dict1(filename):
    try:
        file_handle = open(filename, "r")
    except FileNotFoundError:
        return None
    newdict = {}
    for line in file_handle:
        list1 = line.split(",")
        (k, v) = list1
        newdict[k] = v
    return newdict
The text file has lines like these:
cow 1,dog 1
john 1,rob 1,ford 1
robert 1,kyle 1,jake 1
I'm trying to create a dictionary like so:
{'cow 1': ['dog 1'],
'john 1': ['rob 1','ford 1'],
'robert 1': ['kyle 1', 'jake 1']}
The problem is in the (k, v) unpacking: how can I set k to the first item in the list, and v to all the items following it?
Using the csv module.
Ex:
import csv

result = {}
with open(filename) as infile:
    reader = csv.reader(infile)
    for row in reader:                # Iterate each row
        result[row[0]] = row[1:]      # column 0 as key and the others as value
print(result)
Output:
{'cow 1': ['dog 1'],
'john 1': ['rob 1', 'ford 1'],
'robert 1': ['kyle 1', 'jake 1']}
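As an aside, if you would rather keep your own split-based loop, Python 3's extended iterable unpacking does exactly what the question asks (k gets the first item, v the rest); a minimal sketch:
newdict = {}
with open(filename) as file_handle:
    for line in file_handle:
        # first field becomes the key, the remaining fields become the value list
        k, *v = line.rstrip("\n").split(",")
        newdict[k] = v
print(newdict)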
I have files, some of which contain CommonChar lines, and my Python code works on them to build a dictionary. While building it, there are some required keys which users might forget to put in. The code should be able to flag the file and the key that is missing.
The syntax for python code to work on is like this:
CommonChar pins Category General
CommonChar pins Contact Mark
CommonChar pins Description 1st line
CommonChar pins Description 2nd line
CommonChar nails Category specific
CommonChar nails Description 1st line
So for the above example, "Contact" is missing for nails; there is no line like:
CommonChar nails Contact Robert
I have a list of required keys, for example: mustNeededKeys = ["Category", "Description", "Contact"]
mainDict = {}
for dirName, subdirList, fileList in os.walk(sys.argv[1]):
    for eachFile in fileList:
        # excluding file names ending in .swp, .swo which are created temporarily when editing in vim
        if not eachFile.endswith(('.swp', '.swo', '~')):
            #print eachFile
            filePath = os.path.join(dirName, eachFile)
            #print filePath
            with open(filePath, "r") as fh:
                contents = fh.read()
                items = re.findall("CommonChar.*$", contents, re.MULTILINE)
                for x in items:
                    cc, group, topic, data = x.split(None, 3)
                    data = data.split()
                    group_dict = mainDict.setdefault(group, {'fileLocation': [filePath]})
                    if topic in group_dict:
                        group_dict[topic].extend(['</br>'] + data)
                    else:
                        group_dict[topic] = data
The above code does its job of building a dict like this:
{'pins': {'Category': ['General'], 'Contact': ['Mark'], 'Description': ['1st', 'line', '2nd', 'line']}, 'nails': {'Category': ['specific'], 'Description': ['1st', 'line']}}
So, when reading each file with CommonChar lines and building a group_dict, I need a way to check all its keys against mustNeededKeys, flag the group if any are missing, and proceed if they are all present.
Something like this should work:
# Setup mainDict (equivalent to code given above)
mainDict = {
    'nails': {
        'Category': ['specific'],
        'Description': ['1st', 'line'],
        'fileLocation': ['/some/path/nails.txt']
    },
    'pins': {
        'Category': ['General'],
        'Contact': ['Mark'],
        'Description': ['1st', 'line', '</br>', '2nd', 'line'],
        'fileLocation': ['/some/path/pins.txt']
    }
}

# check for missing keys
mustNeededKeys = {"Category", "Description", "Contact"}
for group, group_dict in mainDict.items():
    missing_keys = mustNeededKeys - set(group_dict.keys())
    if missing_keys:
        missing_key_list = ','.join(missing_keys)
        print(
            'group "{}" ({}) is missing key(s): {}'
            .format(group, group_dict['fileLocation'][0], missing_key_list)
        )
# group "nails" (/some/path/nails.txt) is missing key(s): Contact
If you must check for missing keys immediately after processing each group, you could use the code below. This assumes that each group is stored as a contiguous collection of rows in a single file (i.e., not mixed with other groups in the same file or spread across different files).
from itertools import groupby

mainDict = {}
mustNeededKeys = {"Category", "Description", "Contact"}

for dirName, subdirList, fileList in os.walk(sys.argv[1]):
    for eachFile in fileList:
        # excluding file names ending in .swp, .swo which are created
        # temporarily when editing in vim
        if not eachFile.endswith(('.swp', '.swo', '~')):
            #print eachFile
            filePath = os.path.join(dirName, eachFile)
            #print filePath
            with open(filePath, "r") as fh:
                contents = fh.read()
                items = re.findall("CommonChar.*$", contents, re.MULTILINE)
                split_items = [line.split(None, 3) for line in items]
                # group the items by group name (element 1 in each row)
                for g, group_items in groupby(split_items, lambda row: row[1]):
                    group_dict = {'fileLocation': [filePath]}
                    # store all items in the current group
                    for cc, group, topic, data in group_items:
                        data = data.split()
                        if topic in group_dict:
                            group_dict[topic].extend(['</br>'] + data)
                        else:
                            group_dict[topic] = data
                    # check for missing keys
                    missing_keys = mustNeededKeys - set(group_dict.keys())
                    if missing_keys:
                        missing_key_list = ','.join(missing_keys)
                        print(
                            'group "{}" ({}) is missing key(s): {}'
                            .format(group, filePath, missing_key_list)
                        )
                    # add group to mainDict
                    mainDict[group] = group_dict
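The contiguity assumption matters because itertools.groupby only merges adjacent items with the same key; a quick illustration:
from itertools import groupby

rows = [('pins', 1), ('pins', 2), ('nails', 3), ('pins', 4)]
# 'pins' appears twice because its rows are not contiguous
print([key for key, _ in groupby(rows, lambda row: row[0])])
# ['pins', 'nails', 'pins']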
data = '''CommonChar pins Category General
CommonChar pins Contact Mark
CommonChar pins Description 1st line
CommonChar pins Description 2nd line
CommonChar nails Category specific
CommonChar nails Description 1st line'''
from collections import defaultdict
from pprint import pprint

required_keys = ["Category", "Description", "Contact"]

d = defaultdict(dict)
for line in data.splitlines():
    line = line.split()
    if line[2] == 'Description':
        if line[2] not in d[line[1]]:
            d[line[1]][line[2]] = []
        d[line[1]][line[2]].extend(line[3:])
    else:
        d[line[1]][line[2]] = [line[3]]

pprint(dict(d))
print('*' * 80)

# find missing keys
for k in d.keys():
    for missing_key in set(d[k].keys()) ^ set(required_keys):
        print('Key "{}" is missing "{}"!'.format(k, missing_key))
Prints:
{'nails': {'Category': ['specific'], 'Description': ['1st', 'line']},
'pins': {'Category': ['General'],
'Contact': ['Mark'],
'Description': ['1st', 'line', '2nd', 'line']}}
********************************************************************************
Key "nails" is missing "Contact"!
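One caveat: set(d[k].keys()) ^ set(required_keys) is a symmetric difference, so it would also report keys that are present in the data but not listed in required_keys. If you only want the missing required keys, a plain set difference does that:
# report only required keys that are absent from each group
for k in d.keys():
    for missing_key in set(required_keys) - set(d[k].keys()):
        print('Key "{}" is missing "{}"!'.format(k, missing_key))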
I have the following type of document, where each person might have a couple of names and an associated description of features:
New person
name: ana
name: anna
name: ann
feature: A 65-year old woman that has no known health issues but has a medical history of Schizophrenia.
New person
name: tom
name: thomas
name: thimoty
name: tommy
feature: A 32-year old male that is known to be deaf.
New person
.....
What I would like is to read this file into a Python dictionary, where each new person is assigned an ID.
I.e. the person with ID 1 will have the names ['ann', 'anna', 'ana']
and will have the feature ['A 65-year old woman that has no known health issues but has a medical history of Schizophrenia.']
Any suggestions?
Assuming that your input file is lo.txt, it can be turned into a dictionary this way:
file = open('lo.txt')
final_data = []
feature = []
names = []
for line in file.readlines():
    if "feature" in line:
        data = line.replace("\n", "").split(":")
        feature = data[1]
        # a person is appended once their 'feature:' line is reached
        final_data.append({
            'names': names,
            'feature': feature
        })
        names = []
        feature = []
    if "name" in line:
        data = line.replace("\n", "").split(":")
        names.append(data[1])
print(final_data)
Something like this might work:
result = {}
f = open("document.txt")
contents = f.read()
# one chunk per person; the marker in the question's document is the line 'New person'
info = [chunk for chunk in contents.split('New person') if chunk.strip()]
for i in range(len(info)):
    lines = info[i].split('\n')
    names = []
    features = []
    for j in range(len(lines)):
        parts = lines[j].split(':', 1)
        if len(parts) < 2:  # skip blank lines
            continue
        if parts[0].strip() == 'name':
            names.append(parts[1].strip())
        else:
            features.append(parts[1].strip())
    result[i] = {'names': names, 'features': features}
print(result)
This should give you something like:
{0: {'names': ['ana', 'anna', 'ann'], 'features': ['...']}}
and so on for the other people.
Here is code that may work for you:
f = open("documents.txt").readlines()
f = [i.strip('\n') for i in f]
final_condition = f[len(f) - 1]
f.remove(final_condition)
names = [i.split(":")[1] for i in f if i.startswith("name")]  # skip 'New person' marker lines
the_dict = {}
the_dict["names"] = names
the_dict["features"] = final_condition
print(the_dict)
All it does is split each name line at ":", take the last element of the resulting list (the name itself), and keep it in the names list; the final line is kept as the feature.
Above is the input table I have in CSV.
I am trying to use arrays and while loops in Python; I am new to this language. The loop should run twice to give the Category\sub-category\sub-category_1 order. I am trying to use split(). The output should be like below:
import csv

with open('D:\\test.csv', 'rb') as f:
    reader = csv.reader(f, delimiter='', quotechar='|')
    data = []
    for name in reader:
        data[name] = []
If you read the lines of your CSV and access the data, you can then manipulate it the way you want later.
cats = {}
with open('my.csv', "r") as ins:
    # check each line of the file
    for line in ins:
        # remove double quotes: replace('"', '')
        # remove line break: rstrip()
        a = str(line).replace('"', '').rstrip().split('|')
        if a[0] != 'CatNo':
            cats[int(a[0])] = a[1:]

for p in cats:
    print('cat_id: %d, value: %s' % (p, cats[p]))

# you can access the value by the int ID
print(cats[1001])
the output:
cat_id: 100, value: ['Best Sellers', 'Best Sellers']
cat_id: 1001, value: ['New this Month', 'New Products\\New this Month']
cat_id: 10, value: ['New Products', 'New Products']
cat_id: 1003, value: ['Previous Months', 'New Products\\Previous Months']
cat_id: 110, value: ['Promotional Material', 'Promotional Material']
cat_id: 120, value: ['Discounted Products & Special Offers', 'Discounted Products & Special Offers']
cat_id: 1002, value: ['Last Month', 'New Products\\Last Month']
['New this Month', 'New Products\\New this Month']
Updated script for your question:
categories = {}


def get_parent_category(cat_id):
    if len(cat_id) <= 2:
        return ''
    else:
        return cat_id[:-1]


with open('my.csv', "r") as ins:
    for line in ins:
        # remove double quotes: replace('"', '')
        # remove line break: rstrip()
        a = str(line).replace('"', '').rstrip().split('|')
        cat_id = a[0]
        if cat_id != 'CatNo':
            categories[cat_id] = {
                'parent': get_parent_category(cat_id),
                'desc': a[1],
                'long_desc': a[2]
            }

print('Categories relations:')
for p in categories:
    parent = categories[p]['parent']
    output = categories[p]['desc']
    while parent != '':
        output = categories[parent]['desc'] + ' \\ ' + output
        parent = categories[parent]['parent']
    print('\t', output)
output:
Categories relations:
New Products
New Products \ Best Sellers
New Products \ Discounted Products & Special Offers
New Products \ Best Sellers \ Previous Months
New Products \ Best Sellers \ Last Month
New Products \ Best Sellers \ New this Month
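For clarity, get_parent_category simply drops the last character of the ID and treats one- and two-character IDs as top-level categories:
print(get_parent_category('1001'))  # '100'
print(get_parent_category('100'))   # '10'
print(get_parent_category('10'))    # '' (top level, so the while loop stops)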
I am experiencing strange, faulty behaviour where a dictionary is only written to once and I cannot add more key-value pairs to it.
My code reads in a multi-line string and extracts substrings via split() to be added to a dictionary. I make use of conditional statements. Strangely, only the key:value pairs under the first conditional statement are added.
Therefore I cannot complete the dictionary.
How can I solve this issue?
Minimal code:
#I hope the '\n' is sufficient or use '\r\n'
example = "Name: Bugs Bunny\nDOB: 01/04/1900\nAddress: 111 Jokes Drive, Hollywood Hills, CA 11111, United States"
def format(data):
    dic = {}
    for line in data.splitlines():
        #print('Line:', line)
        if ':' in line:
            info = line.split(': ', 1)[1].rstrip() #does not work with files
            #print('Info: ', info)
            if ' Name:' in info: #middle name problems! /maiden name
                dic['F_NAME'] = info.split(' ', 1)[0].rstrip()
                dic['L_NAME'] = info.split(' ', 1)[1].rstrip()
            elif 'DOB' in info: #overhang
                dic['DD'] = info.split('/', 2)[0].rstrip()
                dic['MM'] = info.split('/', 2)[1].rstrip()
                dic['YY'] = info.split('/', 2)[2].rstrip()
            elif 'Address' in info:
                dic['STREET'] = info.split(', ', 2)[0].rstrip()
                dic['CITY'] = info.split(', ', 2)[1].rstrip()
                dic['ZIP'] = info.split(', ', 2)[2].rstrip()
    return dic


if __name__ == '__main__':
    x = format(example)
    for v, k in x.iteritems():
        print v, k
Your code doesn't work, at all. You split off the name before the colon and discard it, looking only at the value after the colon, stored in info. That value never contains the names you are looking for; Name, DOB and Address all are part of the line before the :.
Python lets you assign to multiple names at once; make use of this when splitting:
def format(data):
    dic = {}
    for line in data.splitlines():
        if ':' not in line:
            continue
        name, _, value = line.partition(':')
        name = name.strip()
        if name == 'Name':
            dic['F_NAME'], dic['L_NAME'] = value.split(None, 1) # strips whitespace for us
        elif name == 'DOB':
            dic['DD'], dic['MM'], dic['YY'] = (v.strip() for v in value.split('/', 2))
        elif name == 'Address':
            dic['STREET'], dic['CITY'], dic['ZIP'] = (v.strip() for v in value.split(', ', 2))
    return dic
I used str.partition() here rather than limit str.split() to just one split; it is slightly faster that way.
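For reference, the only difference is that partition() keeps the separator as the middle element of a 3-tuple:
>>> 'Name: Bugs Bunny'.partition(':')
('Name', ':', ' Bugs Bunny')
>>> 'Name: Bugs Bunny'.split(':', 1)
['Name', ' Bugs Bunny']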
For your sample input this produces:
>>> format(example)
{'CITY': 'Hollywood Hills', 'ZIP': 'CA 11111, United States', 'L_NAME': 'Bunny', 'F_NAME': 'Bugs', 'YY': '1900', 'MM': '04', 'STREET': '111 Jokes Drive', 'DD': '01'}
>>> from pprint import pprint
>>> pprint(format(example))
{'CITY': 'Hollywood Hills',
'DD': '01',
'F_NAME': 'Bugs',
'L_NAME': 'Bunny',
'MM': '04',
'STREET': '111 Jokes Drive',
'YY': '1900',
'ZIP': 'CA 11111, United States'}