I have this dictionary and I need to write it out to a text file in a table-like format.
dictionary_lines = {'1-2': (69.18217255912117, 182.95794152905918), '2-3': (35.825144800822954, 175.40503498180715), '3-4': (37.34332738254673, 97.30771061242511), '4-5': (57.026590289091914, 97.33437880141743), '5-6': (57.23912298419586, 14.32271997820363), '6-7': (55.61382561917492, 351.4794228420951), '7-8': (41.21551406933976, 275.1365340619268), '8-1': (57.83213034291623, 272.6560961904868)}
Right now I'm working on printing the entries before moving on to writing the file. I'm stuck on how to format them properly. Here's what I have so far:
for item in dictionary_lines:
    print("{l}\t{d:<10.3f}\t{a:<10.3f}".format(l= , d= , a= ))
I want it to be printed like this:
Lines [tab] Distances [tab] Azimuth from the South
Key0 [tab] Value1 in tuple0 [tab] Value2 in tuple0
You aren't passing any data from the dictionary into format().
Also, consider using items() to iterate over the key-value pairs of the dictionary:
for key, value in dictionary_lines.items():
    print("l={}\td={:<10.3f}\ta={:<10.3f}".format(key, value[0], value[1]))
l=1-2 d=69.182 a=182.958
l=2-3 d=35.825 a=175.405
l=3-4 d=37.343 a=97.308
l=4-5 d=57.027 a=97.334
l=5-6 d=57.239 a=14.323
l=6-7 d=55.614 a=351.479
l=7-8 d=41.216 a=275.137
l=8-1 d=57.832 a=272.656
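Since the end goal is a text file, the same format string works with write(). A minimal sketch; the filename lines.txt and the header text are placeholders, adjust as needed:
with open('lines.txt', 'w') as f:
    f.write("Lines\tDistances\tAzimuth from the South\n")
    for key, value in dictionary_lines.items():
        f.write("{}\t{:<10.3f}\t{:<10.3f}\n".format(key, value[0], value[1]))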
I'm a new Python developer, just started my internship.
I have a CSV file with data laid out like this:
Event Category,Event Label,Total Events,Unique Events,Event Value,Avg. Value
From each row of the file I want to extract the port labels (below) into a dictionary, along with the total and unique events. I have to sum the total and unique events, but only for ports with the same label (no duplicates).
My data looks like this:
'Search,Santorin (JTR) - Paros (PAS) - Santorin (JTR),"2,199","1,584",0,0.00'
I want my dictionary to look like this:
data_file = 'Analytics.csv'
ports_dict = {
    # "ATH-HER": [10000, 5000],
    # "ATH-JTR": [20000, 3500],
    # "HER-JTR": [100, 500]
}
data = 'Analytics.csv'
#row= 'Search,Santorin (JTR) - Paros (PAS) - Santorin (JTR),"2,199","1,584",0,0.00'
def extract_counts(data):
    ports = []
    for i in data.split('"')[1:]:
        ports.append(i.split('"')[0])
    return ports
Here's an example from my code: when I run it with 'row' it works OK, but when I use 'data' it gives me back an empty list. Can anyone help me with this?
extract_counts(data)
Out[13]: []
What do I have to do to run this over the whole CSV?
Thank you for your help!
First of all, 'data' is just a string variable holding the filename; your function never opens or reads the file. Splitting 'Analytics.csv' on '"' produces a one-element list, so the [1:] slice is empty, the loop body never runs, and you get back an empty list.
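You can see this in the REPL:
>>> data = 'Analytics.csv'
>>> data.split('"')
['Analytics.csv']
>>> data.split('"')[1:]
[]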
To begin your journey with reading CSV files in Python, I recommend:
https://realpython.com/python-csv/
https://docs.python.org/3/library/csv.html
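To give you an idea of what the whole thing could look like with the csv module, here is a sketch, not a drop-in solution. The helper name build_ports_dict and the key-building rule (join the first two airport codes found in parentheses in the "Event Label" column) are my assumptions; adjust them to your actual rules.
import csv
import re

def build_ports_dict(path):
    ports_dict = {}
    with open(path, newline='') as f:
        for row in csv.DictReader(f):
            # pull the codes out of e.g. "Santorin (JTR) - Paros (PAS) - Santorin (JTR)"
            codes = re.findall(r'\((\w+)\)', row['Event Label'])
            if len(codes) < 2:
                continue  # skip rows without a recognizable port pair
            key = '-'.join(codes[:2])
            # "2,199" -> 2199: strip the thousands separators before int()
            total = int(row['Total Events'].replace(',', ''))
            unique = int(row['Unique Events'].replace(',', ''))
            counts = ports_dict.setdefault(key, [0, 0])
            counts[0] += total
            counts[1] += unique
    return ports_dict

ports_dict = build_ports_dict('Analytics.csv')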
Thanks in advance for any advice. First-time poster here, so I'll do my best to include all the required info. I'm also quite a beginner with Python: I've been doing some online tutorials and some copy/paste coding from StackOverflow, so it's FrankenCoding... and I'm probably approaching this wrong.
I need to compare two CSV files that will have a changing number of columns; there will only ever be 2 columns that match (for example, email_address in one file and EMAIL in the other). Both files will have headers, but the names of these headers may change. The files may be anywhere from a few thousand lines up to 2,000,000+, with potentially 100+ columns (but more likely a handful).
Output is to a third file, 'results.csv', containing all the info. It may be a merge (all unique entries), a subtract (remove entries present in one or the other) or an intersect (all entries present in both).
I have searched here and found a lot of good information, but everything I saw assumed a fixed number of columns in the files. I've tried dict and DictReader, and I know the answer is in there somewhere, but right now I'm a bit confused. Since I haven't made any progress in several days and can only devote so much time to this, I'm hoping I can get a nudge in the right direction.
Ideally, I want to learn how to do it myself, which means understanding how the data is 'moving around'.
Extract of the CSV files below. I didn't add more columns than (I think) necessary; the dataset I have now will match on originalid/UID or emailaddress/email, but this may not always be the case.
Original.csv
"originalid","emailaddress",""
"12345678","Bob#mail.com",""
"23456789","NORMA#EMAIL.COM",""
"34567890","HENRY#some-mail.com",""
"45678901","Analisa#sports.com",""
"56789012","greta#mail.org",""
"67890123","STEVEN#EMAIL.ORG",""
Compare.CSV
"email","","DATEOFINVALIDATION_WITH_TIME","OPTOUTDATE_WITH_TIME","EMAIL_USERS"
"Bob#mail.com",,,"true"
"NORMA#EMAIL.COM",,,"true"
"HENRY#some-mail.com",,,"true"
"Henrietta#AWESOME.CA",,,"true"
"NORMAN#sports.CA",,,"true"
"albertina#justemail.CA",,,"true"
Data in results.csv should be all columns from Original.csv + all columns from Compare.csv, except the matching one (email):
"originalid","emailaddress","","DATEOFINVALIDATION_WITH_TIME","OPTOUTDATE_WITH_TIME","EMAIL_USERS"
"12345678","Bob#mail.com","",,,"true"
"23456789","NORMA#EMAIL.COM","",,,"true"
"34567890","HENRY#some-mail.com","",,,"true"
Here are my results as they are now:
email,,DATEOFINVALIDATION_WITH_TIME,OPTOUTDATE_WITH_TIME,EMAIL_USERS
Bob#mail.com,,,true,"['12345678', 'Bob#mail.com', '']"
NORMA#EMAIL.COM,,,true,"['23456789', 'NORMA#EMAIL.COM', '']"
HENRY#some-mail.com,,,true,"['34567890', 'HENRY#some-mail.com', '']"
And here's where I'm at with the code. The print statement returns matching data from the files to the screen, but not to the file, so I'm missing something in there.
Also, I'm not getting the headers from the Original.csv file, though the data is coming in.
import csv

def get_column_from_file(filename, column_name):
    f = open(filename, 'r')
    reader = csv.reader(f)
    headers = next(reader, None)
    i = 0
    max = len(headers)
    while i < max:
        if headers[i] == column_name:
            column_header = i
            # print(headers[i])
        i = i + 1
    return column_header
file_to_check = "Original.csv"
file_console = "Compare.csv"
column_to_read = get_column_from_file(file_console, 'email')
column_to_compare = get_column_from_file(file_to_check, 'emailaddress')
with open(file_console, 'r') as master:
    master_indices = dict((r[1], r) for i, r in enumerate(csv.reader(master)))

with open('Compare.csv', 'r') as hosts:
    with open('results.csv', 'w', newline='') as results:
        reader = csv.reader(hosts)
        writer = csv.writer(results)
        writer.writerow(next(reader, []))
        for row in reader:
            index = master_indices.get(row[0])
            if index is not None:
                print(row + [master_indices.get(row[0])])
                writer.writerow(row + [master_indices.get(row[0])])
Thanks for your time!
Pat
I like that you want to do this yourself, and recognize a need to "understand how the data is moving around." This is exactly how you should be thinking of the problem: focusing on the movement of data rather than the result. Some people may disagree with me, but I think this is a good philosophy to follow as it will make future reuse easier.
You're not trying to build a tool that combines two CSVs; you're trying to organize data (that happens to come from a CSV) according to a common reference (email address) and output the result as a CSV. Because you are talking about potentially large data sets (2,000,000+ rows with potentially 100+ columns), recognize that it is important to pay attention to the asymptotic runtime. If you do not know what this is, I recommend you read up on Big-O notation and asymptotic algorithm analysis. You might be okay without this.
First you decide what, from each CSV, is your key. You've already done this: 'email' for 'Compare.csv' and 'emailaddress' for 'Original.csv'.
Now, build yourself a function to produce dictionaries from the CSV based off the key.
def get_dict_from_csv(path_to_csv, key):
    with open(path_to_csv, 'r') as f:
        reader = csv.reader(f)
        headers, *rest = reader  # requires python3
        key_index = headers.index(key)  # find index of key
        # dictionary comprehensions are your friend; just think about what you want the dict to look like
        d = {row[key_index]: row[:key_index] + row[key_index+1:]  # +1 to skip the email entry
             for row in rest}
        headers.remove(key)
        d['HEADERS'] = headers  # add headers so you know what the information in the dict is
        return d
Now you can call this function on both of your CSVs.
file_console_dict = get_dict_from_csv('Compare.csv', 'email')
file_to_check_dict = get_dict_from_csv('Original.csv', 'emailaddress')
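With the sample Compare.csv above, file_console_dict would look roughly like this (each value keeps its row's remaining columns in order, and the 'HEADERS' entry carries the remaining column names):
{'Bob#mail.com': ['', '', 'true'],
 'NORMA#EMAIL.COM': ['', '', 'true'],
 'HENRY#some-mail.com': ['', '', 'true'],
 'Henrietta#AWESOME.CA': ['', '', 'true'],
 'NORMAN#sports.CA': ['', '', 'true'],
 'albertina#justemail.CA': ['', '', 'true'],
 'HEADERS': ['', 'DATEOFINVALIDATION_WITH_TIME', 'OPTOUTDATE_WITH_TIME', 'EMAIL_USERS']}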
Now you have two dicts which are keyed off the same information. Now we need a function to combine these into one dict.
def combine_dicts(*dicts):
    d, *rest = dicts  # requires python3
    # iteratively pull the other dicts into the first one, d
    for r in rest:
        original_headers = d['HEADERS'][:]
        new_headers = r['HEADERS'][:]
        # copy headers
        d['HEADERS'].extend(new_headers)
        # find missing keys
        s = set(d.keys()) - set(r.keys())  # keys present in d but not in r
        for k in s:
            d[k].extend([''] * len(new_headers))
        del r['HEADERS']  # we don't want to copy this a second time in the loop below
        for k, v in r.items():
            # use setdefault in case the key didn't exist in the first dict
            d.setdefault(k, [''] * len(original_headers)).extend(v)
    return d
Now you have one dict which has all the information you want, all you need to do is write it back as a CSV.
def write_dict_to_csv(output_file, d, include_key=False):
    with open(output_file, 'w', newline='') as results:
        writer = csv.writer(results)
        # email isn't in your HEADERS, so you'll need to add it
        if include_key:
            headers = ['email'] + d['HEADERS']
        else:
            headers = d['HEADERS']
        writer.writerow(headers)
        # now remove it from the dict so we can iterate over it without including it twice
        del d['HEADERS']
        for k, v in d.items():
            if include_key:
                row = [k] + v
            else:
                row = v
            writer.writerow(row)
And that should be it. Calling all of this is just:
file_console_dict = get_dict_from_csv('Compare.csv', 'email')
file_to_check_dict = get_dict_from_csv('Original.csv', 'emailaddress')
results_dict = combine_dicts(file_to_check_dict, file_console_dict)
write_dict_to_csv('results.csv', results_dict)
And you can easily see how this can be extended to arbitrarily many dictionaries.
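For example, a hypothetical third export keyed on the same addresses would just be one more call and one more argument:
third_dict = get_dict_from_csv('third.csv', 'EMAIL')  # hypothetical file and column name
results_dict = combine_dicts(file_to_check_dict, file_console_dict, third_dict)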
You said you didn't want the email to be in the final CSV. This is counter-intuitive to me, so I made it an option in write_dict_to_csv() in case you change your mind.
When I run all the above I get
email,originalid,,,DATEOFINVALIDATION_WITH_TIME,OPTOUTDATE_WITH_TIME,EMAIL_USERS
Bob#mail.com,12345678,,,,true
NORMA#EMAIL.COM,23456789,,,,true
HENRY#some-mail.com,34567890,,,,true
Analisa#sports.com,45678901,,,,,
greta#mail.org,56789012,,,,,
STEVEN#EMAIL.ORG,67890123,,,,,
Henrietta#AWESOME.CA,,,,,true
NORMAN#sports.CA,,,,,true
albertina#justemail.CA,,,,,true
Right now it looks like you only use writerow once for the header:
writer.writerow(next(reader, []))
As francisco pointed out, if that writerow line inside the loop is commented out in your actual code, uncommenting it may fix your problem. You can do this by removing the "#" at the beginning of the line.
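In other words, the write has to happen inside the loop, next to the print. A sketch using your variable names (note that row + index concatenates the two lists, so the matched columns come out as separate cells instead of one stringified list):
for row in reader:
    index = master_indices.get(row[0])
    if index is not None:
        print(row + index)
        writer.writerow(row + index)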
I have a list adImageList of dictionary items in the following form:
[{'Image_thumb_100x75': 'https://cache.domain.com/mmo/7/295/170/227_174707044_thumb.jpg',
'Image_hoved_400x300': 'https://cache.domain.com/mmo/7/295/170/227_174707044_hoved.jpg',
'Image_full_800x600': 'https://cache.domain.com/mmo/7/295/170/227_174707044.jpg'},
{'Image_thumb_100x75': 'https://cache.domain.com/mmo/7/295/170/227_1136648194_thumb.jpg',
'Image_hoved_400x300': 'https://cache.domain.com/mmo/7/295/170/227_1136648194_hoved.jpg',
'Image_full_800x600': 'https://cache.domain.com/mmo/7/295/170/227_1136648194.jpg'},
{'Image_thumb_100x75': 'https://cache.domain.com/mmo/7/295/170/227_400613427_thumb.jpg',
'Image_hoved_400x300': 'https://cache.domain.com/mmo/7/295/170/227_400613427_hoved.jpg',
'Image_full_800x600': 'https://cache.domain.com/mmo/7/295/170/227_400613427.jpg'}]
I have an iterator which is supposed to add a local URL to each image record after fetching the image from the web (the fetching part works OK). So I'm using the following code to append local URLs to the existing dictionary items:
for i, d in enumerate(adImageList):
    file_name_thumb = '0{}_{}_{}'.format(i, page_title, '_thumb_100x75.jpg')
    urllib.request.urlretrieve(d['Image_thumb_100x75'], file_name_thumb)
    local_path_thumb = dir_path + file_name_thumb
    adImageList.insert[i](1, {'Image_thumb_100x75_local_path_thumb': local_path_thumb})  # not working
    file_name_hoved = '0{}_{}_{}'.format(i, page_title, '_hoved_400x300.jpg')
    urllib.request.urlretrieve(d['Image_hoved_400x300'], file_name_hoved)
    local_path_hoved = dir_path + file_name_hoved
    adImageList.insert[i](3, {'Image_hoved_400x300_local_path_hoved': local_path_hoved})  # not working
    file_name_full = '0{}_{}_{}'.format(i, page_title, '_full_800x600.jpg')
    urllib.request.urlretrieve(d['Image_full_800x600'], file_name_full)
    local_path_full = dir_path + file_name_full
    adImageList.insert[i](5, {'Image_full_800x600_local_path_full': local_path_full})  # not working
The idea is to extend the dict items in the following manner, which also explains the numbers 1, 3 and 5 in my code:
{'Image_thumb_100x75': 'https://cache.domain.com/mmo/7/295/170/227_174707044_thumb.jpg',
 'Image_thumb_100x75_local_path_thumb': local_path_thumb,  # 1
 'Image_hoved_400x300': 'https://cache.domain.com/mmo/7/295/170/227_174707044_hoved.jpg',
 'Image_hoved_400x300_local_path_hoved': local_path_hoved,  # 3
 'Image_full_800x600': 'https://cache.domain.com/mmo/7/295/170/227_174707044.jpg',
 'Image_full_800x600_local_path_full': local_path_full}  # 5
But it's giving me an error:
TypeError: 'builtin_function_or_method' object is not subscriptable
Most likely here's what you had in mind:
adImageList[i]['Image_thumb_100x75_local_path_thumb']=local_path_thumb
This adds the key 'Image_thumb_100x75_local_path_thumb' to the i-th dictionary in the list and sets its value to local_path_thumb. The purpose of 1, 3 and 5 is still unclear.
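Applied to your loop, the pattern for all three sizes looks like this (a sketch; page_title, dir_path and the urlretrieve calls come from your own code, and d is the same object as adImageList[i], so assigning to d updates the list in place):
for i, d in enumerate(adImageList):
    file_name_thumb = '0{}_{}_{}'.format(i, page_title, '_thumb_100x75.jpg')
    urllib.request.urlretrieve(d['Image_thumb_100x75'], file_name_thumb)
    d['Image_thumb_100x75_local_path_thumb'] = dir_path + file_name_thumb
    # ...repeat the same three lines for the hoved and full sizes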
Python stack traces give line numbers for a reason, but my guess is it's this line:
adImageList.insert[i](1, {...})
insert is a method, so it must be called with parentheses; subscripting it with [i] is what raises the TypeError.
I am stuck with a problem. I have a JSON file in which each object is on its own line, so if there are 100 objects, there will be 100 lines.
[{ "attribute1" : "no1", "attribute1": "no2"}
{ "attribute1" : "no12", "attribute1": "no22"}]
I open this JSON file and delete some attributes of every element.
Then I want to write the objects back to the file in the same way (1 object = 1 line).
I have tried to do this with "indent" and "separators", but it does not work.
I would like to have:
[{ "attribute1": "no2"}
{"attribute1": "no22"}]
Thanks for reading.
import json

with open('verbes_lowercase.json', 'r+', encoding='utf-8-sig') as json_data:
    data = json.load(json_data)
    for k in range(len(data)):
        del data[k]["attribute1"]
    json.dump(data, json_data, ensure_ascii=False, indent='1', separators=(',', ':'))
    json_data.seek(0)
    json_data.truncate()
I used a trick to do what I want and rewrite each object onto its own line: I write what I want to keep into a new file.
with open('verbes_lowercase.json', 'r', encoding='utf-8-sig') as json_data:
    data = json.load(json_data)

with open("verbes.json", 'w', encoding="utf-8-sig") as file:
    file.write("[")
    length = len(data)
    for k in range(0, length):
        del data[k]["attribute1"]
        if k != length - 1:
            file.write(json.dumps(data[k], ensure_ascii=False) + ",\n")
        else:
            file.write(json.dumps(data[length-1], ensure_ascii=False) + "]")
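A variant of the same trick that avoids writing the brackets and commas by hand is to dump each object separately and join the lines; a sketch over the same files:
import json

with open('verbes_lowercase.json', 'r', encoding='utf-8-sig') as json_data:
    data = json.load(json_data)

for obj in data:
    del obj["attribute1"]

# one JSON object per line, comma-separated, wrapped in brackets
lines = ',\n'.join(json.dumps(obj, ensure_ascii=False) for obj in data)
with open('verbes.json', 'w', encoding='utf-8-sig') as f:
    f.write('[' + lines + ']')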