Creating a runtime variable in Python to fetch data from a dictionary object

I have created a dictionary object by parsing a JSON file in Python. Let's assume the data is as follows:
plants = {}
# Add three key-value pairs to the dictionary.
plants["radish"] = {"color":"red", "length":4}
plants["apple"] = {"smell":"sweet", "season":"winter"}
plants["carrot"] = {"use":"medicine", "juice":"sour"}
This could be a very long dictionary object
But at runtime, I need only a few values to be stored in a comma-delimited CSV file. The list of desired properties is in a file, e.g.
radish.color
carrot.juice
So how would I write a Python app that builds dynamic lookups such as the ones below to get data out of the JSON object and create a CSV file? At runtime I need:
plants["radish"]["color"]
plants["carrot"]["juice"]
Thank you to all who help
Sanjay

Consider parsing the text file line by line to retrieve its contents. As you read, split each line on the period, which separates the dictionary keys. From there, use the resulting list of keys to retrieve the dictionary values, then iteratively write the values to the CSV, branching on the number of keys:
Txt file
radish.color
carrot.juice
Python code
import csv

plants = {}
plants["radish"] = {"color": "red", "length": 4}
plants["apple"] = {"smell": "sweet", "season": "winter"}
plants["carrot"] = {"use": "medicine", "juice": "sour"}

data = []
with open("Input.txt", "r") as f:
    for line in f:
        data.append(line.strip().split("."))

with open("Output.csv", "w") as w:
    writer = csv.writer(w, lineterminator='\n')
    for item in data:
        if len(item) == 2:  # one nest deep
            writer.writerow([item[0], item[1], plants[item[0]][item[1]]])
        if len(item) == 3:  # two nests deep
            writer.writerow([item[0], item[1], item[2], plants[item[0]][item[1]][item[2]]])
Output csv
radish,color,red
carrot,juice,sour
(Note: the deeper the nesting, the more columns are written, so key/value pairs no longer line up across rows. You may want to output separately structured CSV files, e.g. one file per nesting level.)
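If the nesting depth is not fixed, the key path can also be walked generically instead of branching on len(item). A minimal sketch, assuming the same dot-separated property format (here paths is hard-coded in place of reading Input.txt):

```python
import csv
from functools import reduce

plants = {
    "radish": {"color": "red", "length": 4},
    "apple": {"smell": "sweet", "season": "winter"},
    "carrot": {"use": "medicine", "juice": "sour"},
}

def lookup(d, keys):
    # Follow the key path one level at a time, e.g. ["radish", "color"] -> "red"
    return reduce(lambda sub, k: sub[k], keys, d)

paths = ["radish.color", "carrot.juice"]  # normally read from the properties file
with open("Output.csv", "w", newline="") as w:
    writer = csv.writer(w)
    for path in paths:
        keys = path.split(".")
        writer.writerow(keys + [lookup(plants, keys)])
```

This writes one row per property path regardless of depth, at the cost of the column-alignment caveat above.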

Related

Deleting specific JSON lines while iterating through key in Python

I have a large JSON file that contains image annotation data. I am iterating through one of its keys below:
import json

# Opening the JSON file; json.load returns the JSON object as a dictionary
f = open('annotations.json')
data = json.load(f)

# Iterating through the annotations list
for i in data['annotations']:
    if i['segmentation'] == [[]]:
        print(i['segmentation'])
        del i
        #print(i['segmentation'])

# Closing file
f.close()
Printing the returned dictionaries, they look like this:
{"iscrowd":0,"image_id":32,"bbox":[],"segmentation":[[]],"category_id":2,"id":339,"area":0}
I am trying to remove the entries shown above from the annotations key that contain no data for segmentation. I am able to find these entries; I am just not sure how to remove them without breaking the format of the file.
{"iscrowd":0,"image_id":32,"bbox":[],"segmentation":[[]],"category_id":2,"id":339,"area":0}
,{"iscrowd":0,"image_id":32,"bbox":[],"segmentation":[[]],"category_id":2,"id":340,"area":0}
,{"iscrowd":0,"image_id":32,"bbox":[],"segmentation":[[]],"category_id":2,"id":341,"area":0}
,{"iscrowd":0,"image_id":32,"bbox":[],"segmentation":[[]],"category_id":2,"id":342,"area":0},
...
Here is what finally got it working for me:
import json

# Opening the JSON file; json.load returns the JSON object as a dictionary
with open('annotations.json') as f:
    data = json.load(f)

# Iterate over the indices in reverse so popping an item
# does not shift the positions of the items still to be checked
for idx in range(len(data['annotations']) - 1, -1, -1):
    ann = data['annotations'][idx]
    if ann['segmentation'] == [[]] or ann['bbox'] == []:
        print(ann['segmentation'])
        data['annotations'].pop(idx)

with open("newannotations.json", "w") as json_file:
    json.dump(data, json_file)
The function json.load() returns a Python dictionary, which you can then modify as you like. Similarly, json.dump() can be used to write a Python dictionary back out as a JSON file.
Note that data["annotations"] is a list, so entries are removed by position (list.pop(index)) or by value (list.remove(item)), not by dictionary key. Assuming you want to delete each entry i (as per the del i) where data["annotations"][i]["segmentation"] == [[]], one could do it approximately as follows:
import json

with open('annotations.json') as f:
    data = json.load(f)

# Iterate over a copy of the list so removing entries
# from the original does not disturb the iteration
for ann in list(data['annotations']):
    if ann['segmentation'] == [[]]:
        print(ann['segmentation'])
        data['annotations'].remove(ann)

with open("newannotations.json", "w") as json_file:
    json.dump(data, json_file)
Is this what you wanted to do?
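Alternatively, rather than deleting items while looping, you can rebuild the list with a comprehension that keeps only the entries you want. A small self-contained sketch with inline sample data standing in for json.load of the annotations file:

```python
import json

# Inline sample standing in for json.load(open('annotations.json'))
data = {
    "annotations": [
        {"id": 339, "segmentation": [[]], "bbox": []},
        {"id": 1, "segmentation": [[1.0, 2.0, 3.0]], "bbox": [0, 0, 5, 5]},
    ]
}

# Keep only annotations whose segmentation actually contains data
data["annotations"] = [
    ann for ann in data["annotations"] if ann["segmentation"] != [[]]
]

print(json.dumps([ann["id"] for ann in data["annotations"]]))  # prints [1]
```

This avoids the index-shifting pitfalls of in-place removal entirely.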

Write a list of dictionaries with different sizes and different keys into a CSV file and read it back

I have a list of dictionaries like this:
[{"a0":0,"a1":1,"a2":2,"a3":3},{"a4":4,"a5":5,"a6":6},{"a7":7,"a8":8}]
I want to save it to a CSV file and read it back.
import csv

A = [{"a0":0,"a1":1,"a2":2,"a3":3},{"a4":4,"a5":5,"a6":6},{"a7":7,"a8":8}]
with open("file_temp.csv", "w+", newline="") as file_temp:
    file_temp_writer = csv.writer(file_temp)
    for a in A:
        temp_list = []
        for key, value in a.items():
            temp_list.append([[key], [value]])
        file_temp_writer.writerow(temp_list)
now the csv file is:
"[['a0'], [0]]","[['a1'], [1]]","[['a2'], [2]]","[['a3'], [3]]"
"[['a4'], [4]]","[['a5'], [5]]","[['a6'], [6]]"
"[['a7'], [7]]","[['a8'], [8]]"
And then to read it back:
import csv

B = []
with open("file_temp.csv", "r+", newline="") as file_temp:
    file_temp_reader = csv.reader(file_temp)
    for row in file_temp_reader:
        row_dict = {}
        for i in range(len(row)):
            row[i] = row[i].strip('"')
            row_dict[row[i][0]] = row[i][1]
        B.append(row_dict)
Now if I print(B), the result is:
[{'[': '['}, {'[': '['}, {'[': '['}]
I know the problem is that when I write to a CSV file, each element is saved as a string, for example "[['a0'], [0]]" instead of [['a0'], [0]]. I used strip('"') to try to solve this, but it doesn't fix the problem.
If you really need this as a CSV file, I think your issue is where you create temp_list: you're creating a nested list when you append to it.
Try this instead:
import csv

# use meaningful names
dictionary_list = [{"a0":0,"a1":1,"a2":2,"a3":3},{"a4":4,"a5":5,"a6":6},{"a7":7,"a8":8}]
with open("file_temp.csv", "w+", newline="") as file_temp:
    file_temp_writer = csv.writer(file_temp)
    for d in dictionary_list:
        temp_list = []
        for key, value in d.items():
            # notice the difference here: instead of appending a nested list
            # we just append the key and the value
            # this makes temp_list something like: [a0, 0, a1, 1, ...]
            temp_list.append(key)
            temp_list.append(value)
        file_temp_writer.writerow(temp_list)
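To complete the round trip for this flat format, the alternating key/value cells can be paired back up on read. A minimal sketch (note that csv stores everything as text, so the values are cast back to int here, which assumes all values are integers):

```python
import csv

A = [{"a0": 0, "a1": 1, "a2": 2, "a3": 3}, {"a4": 4, "a5": 5, "a6": 6}, {"a7": 7, "a8": 8}]

# Write: one row per dict, cells alternating key, value -> a0,0,a1,1,...
with open("file_temp.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for d in A:
        row = []
        for key, value in d.items():
            row.extend([key, value])
        writer.writerow(row)

# Read back: pair cells two at a time, casting values back to int
B = []
with open("file_temp.csv", newline="") as f:
    for row in csv.reader(f):
        B.append({row[i]: int(row[i + 1]) for i in range(0, len(row), 2)})
```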
Saving a list of dictionaries is easy using json:
import json

A = [{"a0":0,"a1":1,"a2":2,"a3":3},{"a4":4,"a5":5,"a6":6},{"a7":7,"a8":8}]
with open("file_temp.json", "w") as f:
    json.dump(A, f)
To retrieve data again:
with open("file_temp.json", "r") as f:
    B = json.load(f)

Extracting value data from multiple JSON strings in a single file

I know I am missing the obvious here, but I have the following Python code in which I am trying to:
Take a specified JSON file containing multiple strings as input.
Start at line 1 and look for the key "content_text".
Add that key's value to a new dictionary and write the dictionary to a new file.
Repeat 1-3 on additional JSON files.
import json

def OpenJsonFileAndPullData(JsonFileName, JsonOutputFileName):
    output_file = open(JsonOutputFileName, 'w')
    result = []
    with open(JsonFileName, 'r') as InputFile:
        for line in InputFile:
            Item = json.loads(line)
            my_dict = {}
            print item
            my_dict['Post Content'] = item.get('content_text')
            my_dict['Type of Post'] = item.get('content_type')
            print my_dict
            result.append(my_dict)
    json.dumps(result, output_file)

OpenJsonFileAndPullData('MyInput.json', 'MyOutput.txt')
However, when run I receive this error:
AttributeError: 'str' object has no attribute 'get'
Python is case-sensitive:
Item = json.loads(line)                             # variable "Item"
my_dict['Post Content'] = item.get('content_text')  # another variable, "item"
By the way, why don't you load the whole file as JSON at once?
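Beyond the casing fix, here is a corrected sketch of the function (Python 3 style; note also that json.dump, not json.dumps, is what actually writes to the file object):

```python
import json

def open_json_file_and_pull_data(json_file_name, json_output_file_name):
    # One JSON object per input line, as in the question
    result = []
    with open(json_file_name, 'r') as input_file:
        for line in input_file:
            item = json.loads(line)  # one consistent name throughout
            result.append({
                'Post Content': item.get('content_text'),
                'Type of Post': item.get('content_type'),
            })
    with open(json_output_file_name, 'w') as output_file:
        json.dump(result, output_file)  # dump writes to the file; dumps only returns a string
    return result
```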

Python merge csv files with matching Index

I want to merge two CSV files based on a field.
The first one looks like this:
ID, field1, field2
1,a,green
2,b,white
2,b,red
2,b,blue
3,c,black
The second one looks like:
ID, field3
1,value1
2,value2
What I want to have is:
ID, field1, field2,field3
1,a,green,value1
2,b,white,value2
2,b,red,value2
2,b,blue,value2
3,c,black,''
I'm using PyDev on Eclipse.
import csv

endings0 = []
endings1 = []
with open("salaries.csv") as book0:
    for line in book0:
        endings0.append(line.split(',')[-1])
        endings1.append(line.split(',')[0])

linecounter = 0
res = open("result.csv", "w")
with open('total.csv') as book2:
    for line in book2:
        # if not header line:
        l = line.split(',')[0]
        for linecounter in range(0, endings1.__len__()):
            if l == endings1[linecounter]:
                res.writelines(line.replace("\n", "") + ',' + str(endings0[linecounter]))
print("done")
There are a bunch of things wrong with what you're doing:
You should really be using the classes in the csv module to read and write CSV files. Importing the module isn't enough; you actually need to call its functions.
You should never find yourself typing endings1.__len__(). Use len(endings1) instead.
You should never find yourself typing for linecounter in range(0, len(endings1)).
Use either for linecounter, _ in enumerate(endings1),
or better yet for end0, end1 in zip(endings0, endings1).
A dictionary is a much better data structure for lookup than a pair of parallel lists. To quote Rob Pike:
If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident.
Here's how I'd do it:
import csv

with open('second.csv') as f:
    # look, a builtin to read csv file lines as dictionaries!
    reader = csv.DictReader(f)
    # build a mapping of id to field3
    id_to_field3 = {row['ID']: row['field3'] for row in reader}

# you can put more than one open inside a with statement
with open('first.csv') as f, open('result.csv', 'w') as fo:
    # csv even has a class to write files!
    reader = csv.DictReader(f)
    res = csv.DictWriter(fo, fieldnames=reader.fieldnames + ['field3'])
    res.writeheader()
    for row in reader:
        # .get returns its second argument if there was no match
        row['field3'] = id_to_field3.get(row['ID'], '')
        res.writerow(row)
I have a high-level solution for you.
Deserialize your first CSV into dict1, mapping each ID to a list of [field1, field2] rows (IDs can repeat in the first file).
Deserialize your second CSV into dict2, mapping each ID to field3.
For each (id, rows) in dict1, append dict2.setdefault(id, '') to each row. Then serialize it back into CSV using whatever serializer you were using before.
I used the dictionary's setdefault because I noticed that ID 3 is in the first CSV file but not the second.
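A sketch of that approach, with the two files inlined as strings to keep it self-contained (dict1 and dict2 named as above):

```python
import csv
import io
from collections import defaultdict

first_csv = """ID,field1,field2
1,a,green
2,b,white
2,b,red
2,b,blue
3,c,black
"""
second_csv = """ID,field3
1,value1
2,value2
"""

# dict2: ID -> field3
dict2 = {row["ID"]: row["field3"] for row in csv.DictReader(io.StringIO(second_csv))}

# dict1: ID -> list of [field1, field2] rows (IDs can repeat)
dict1 = defaultdict(list)
for row in csv.DictReader(io.StringIO(first_csv)):
    dict1[row["ID"]].append([row["field1"], row["field2"]])

# Merge: append field3 to every row, '' when the ID is missing in the second file
merged = [[id_] + fields + [dict2.setdefault(id_, "")]
          for id_, rows in dict1.items() for fields in rows]
```

Each entry of merged is one output row, ready to be handed to csv.writer.writerows.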

Write key to separate csv based on value in dictionary

[Using Python 3] I have a CSV file with two columns (an email address and a country code; the script actually makes it two columns if that's not the case in the original file, more or less) that I want to split by the value in the second column, writing the rows to separate CSV files.
eppetj@desrfpkwpwmhdc.com      us ==> output-us.csv
uheuyvhy@zyetccm.com           de ==> output-de.csv
avpxhbdt@reywimmujbwm.com      es ==> output-es.csv
gqcottyqmy@romeajpui.com       it ==> output-it.csv
qscar@tpcptkfuaiod.com         fr ==> output-fr.csv
qshxvlngi@oxnzjbdpvlwaem.com   gb ==> output-gb.csv
vztybzbxqq@gahvg.com           us ==> output-us.csv
... ... ...
Currently my code kind of does this, but instead of appending each email address to its CSV it overwrites the one written before it. Can someone help me out with this?
I am very new to programming and Python and I might not have written the code in the most pythonic way, so I would really appreciate any feedback on the code in general!
Thanks in advance!
Code:
import csv

def tsv_to_dict(filename):
    """Creates a reader of a specified .tsv file."""
    with open(filename, 'r') as f:
        reader = csv.reader(f, delimiter='\t')  # '\t' implies tab
        email_list = []
        # Checks each list in the reader and removes empty elements
        for lst in reader:
            email_list.append([elem for elem in lst if elem != ''])  # list comprehension
    # Stores the list of lists as a dict
    email_dict = dict(email_list)
    return email_dict

def count_keys(dictionary):
    """Counts the number of entries in a dictionary."""
    return len(dictionary.keys())

def clean_dict(dictionary):
    """Removes all whitespace in keys from specified dictionary."""
    return {k.strip(): v for k, v in dictionary.items()}  # dictionary comprehension

def split_emails(dictionary):
    """Splits out all email addresses from dictionary into output csv files by country code."""
    # Creating a list of unique country codes
    cc_list = []
    for v in dictionary.values():
        if v not in cc_list:
            cc_list.append(v)
    # Writing the email addresses to a csv based on the cc (value) in dictionary
    for key, value in dictionary.items():
        for c in cc_list:
            if c == value:
                with open('output-' + str(c) + '.csv', 'w') as f_out:
                    writer = csv.writer(f_out, lineterminator='\r\n')
                    writer.writerow([key])
You can simplify this a lot by using a defaultdict:
import csv
from collections import defaultdict

emails = defaultdict(list)

with open('email.tsv', 'r') as f:
    reader = csv.reader(f, delimiter='\t')
    for row in reader:
        if row:
            if '@' in row[0]:
                emails[row[1].strip()].append(row[0].strip() + '\n')

for key, values in emails.items():
    with open('output-{}.csv'.format(key), 'w') as f:
        f.writelines(values)
As your separated files are not comma separated but single columns, you don't need the csv module; you can simply write the rows.
The emails dictionary contains a key for each country code, and a list of all the matching email addresses. To make sure the email addresses are printed correctly, we strip any whitespace and add a line break (so we can use writelines later).
Once the dictionary is populated, it's simply a matter of stepping through the keys to create the files and then writing out each resulting list.
The problem with your code is that it reopens the same country output file each time it writes an entry into it, thereby overwriting whatever might have already been there.
A simple way to avoid that is to open all the output files at once for writing and store them in a dictionary keyed by country code. Likewise, you can have another dictionary that associates each country code with a csv.writer object for that country's output file.
Update: While I agree that Burhan's approach is probably superior, I feel that you got the idea that my earlier answer was excessively long due to all the comments it had, so here's another version of essentially the same logic but with minimal comments, to let you better discern its reasonably short true length (even with the context manager).
import csv
from contextlib import contextmanager

@contextmanager  # to manage simultaneous opening and closing of output files
def open_country_csv_files(countries):
    csv_files = {country: open('output-' + country + '.csv', 'w')
                 for country in countries}
    yield csv_files
    for f in csv_files.values():
        f.close()

with open('email.tsv', 'r') as f:
    email_dict = {row[0]: row[1] for row in csv.reader(f, delimiter='\t') if row}

countries = set(email_dict.values())
with open_country_csv_files(countries) as csv_files:
    csv_writers = {country: csv.writer(csv_files[country], lineterminator='\r\n')
                   for country in countries}
    for email_addr, country in email_dict.items():
        csv_writers[country].writerow([email_addr])
Not a Python answer, but maybe you can use this Bash solution.
$ while read email country
  do
      echo $email >> output-$country.csv
  done < in.csv
This reads the lines from in.csv, splits them into two parts email and country, and appends (>>) the email to the file called output-$country.csv.
