My spreadsheet reading function constantly overwrites everything - python

Okay, so I have a spreadsheet and I want to put all the entries into my nested dictionary lists.
I decided to use two for loops to iterate through the spreadsheet and save the value of each cell to the corresponding nested dictionary.
Here is my code (I know it's bad, I'm pretty inexperienced):
def SpGetLink():
    global SpDic
    for row in SpDicGen():
        Data = dict.fromkeys(SpDic.values(), {})
        Data[SpDic[row]]["Link"] = []
        Data[SpDic[row]]["Title"] = []
        for col in range(1000):
            if ws.cell(row=row, column=col + 6).hyperlink is not None:
                data = str(ws.cell(row=row, column=col + 6).hyperlink.target)
                if data.startswith("http"):
                    if data not in Data[SpDic[row]]["Link"]:
                        Data[SpDic[row]]["Link"].append(data)
                        json.dump(Data, open("Data.json", "w+"), indent=4)  # , sort_keys=True)
                else:
                    Data[SpDic[row]]["Title"].append(data)
SpDic is a separate dictionary that maps each row to its corresponding name.
Now my problem is the following.
When I open Data.json, every list that should contain all the links in the corresponding row contains the same 5 links, which are the last 5 links in the spreadsheet. It looks something like this:
"smile": {
"Link": [
"https://media.giphy.com/media/k7J8aS3xpmhpK/giphy.gif",
"https://media.giphy.com/media/aY1HMl4E1Ju1y/giphy.gif",
"https://media.giphy.com/media/RLJxQtX8Hs7XytaoyX/giphy.gif",
"https://media.giphy.com/media/1448TKNMMg4BFu/giphy.gif",
"https://media.giphy.com/media/b7l5cvG94cqo8/giphy.gif"
],
"Title": []
},
"grin": {
"Link": [
"https://media.giphy.com/media/k7J8aS3xpmhpK/giphy.gif",
"https://media.giphy.com/media/aY1HMl4E1Ju1y/giphy.gif",
"https://media.giphy.com/media/RLJxQtX8Hs7XytaoyX/giphy.gif",
"https://media.giphy.com/media/1448TKNMMg4BFu/giphy.gif",
"https://media.giphy.com/media/b7l5cvG94cqo8/giphy.gif"
],
"Title": []
},
"laugh": {
"Link": [
"https://media.giphy.com/media/k7J8aS3xpmhpK/giphy.gif",
"https://media.giphy.com/media/aY1HMl4E1Ju1y/giphy.gif",
"https://media.giphy.com/media/RLJxQtX8Hs7XytaoyX/giphy.gif",
"https://media.giphy.com/media/1448TKNMMg4BFu/giphy.gif",
"https://media.giphy.com/media/b7l5cvG94cqo8/giphy.gif"
],
"Title": []
},
Does anyone have an idea why this is happening and how to fix it?

I think part of the reason your data is overwritten each time is that you are dumping your JSON inside the for loop, so I moved it outside. The other part is that dict.fromkeys(SpDic.values(), {}) stores one and the same dictionary under every key, and Data is rebuilt on every row, so I now create Data once with a separate dictionary per name. I think this should do the trick.
def SpGetLink():
    global SpDic
    # Build the result once, giving every name its own dictionary and lists.
    Data = {name: {"Link": [], "Title": []} for name in SpDic.values()}
    for row in SpDicGen():
        entry = Data[SpDic[row]]
        for col in range(1000):
            cell = ws.cell(row=row, column=col + 6)
            if cell.hyperlink is not None:
                data = str(cell.hyperlink.target)
                if data.startswith("http"):
                    if data not in entry["Link"]:
                        entry["Link"].append(data)
                else:
                    entry["Title"].append(data)
    # Write the file once, after all rows have been processed.
    with open("Data.json", "w") as f:
        json.dump(Data, f, indent=4)
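The symptom described in the question, where every key ends up holding the same links, is consistent with how dict.fromkeys treats a mutable default: all keys receive a reference to one shared object. Below is a minimal sketch of that pitfall and one way around it. The names `names`, `shared`, and `separate` and the example URL are illustrative and not taken from the original code.

```python
names = ["smile", "grin", "laugh"]

# dict.fromkeys stores the *same* dict object under every key.
shared = dict.fromkeys(names, {})
shared["smile"]["Link"] = ["https://example.com/a.gif"]

# A dict comprehension evaluates the value expression once per key,
# so every key gets its own independent dict and lists.
separate = {name: {"Link": [], "Title": []} for name in names}
separate["smile"]["Link"].append("https://example.com/a.gif")

print(shared["grin"])    # also shows the link that was added under "smile"
print(separate["grin"])  # still has empty lists
```

With the comprehension, appending to one name's list leaves the other names untouched, which is the behaviour the question is after.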

I fixed it by just making one of the for loops into a function and calling it in another for loop.

Related

How to transform json structure in python?

I have Python code that takes some data from an Excel file and exports it to a JSON file.
import json
from collections import OrderedDict
from itertools import islice
from openpyxl import load_workbook
# Raw string, so that "\t" in the Windows path is not read as a tab character.
wb = load_workbook(r'E:\test.xlsx')
sheet = wb['Sheet1']
deviceData_list = []
# Skip the header row, then build one ordered dict per spreadsheet row.
for row in islice(sheet.values, 1, sheet.max_row):
    deviceData = OrderedDict()
    deviceData['hostname'] = row[2]
    deviceData['ip'] = row[7]
    deviceData['username'] = row[13]
    deviceData['password'] = row[15]
    deviceData['secret'] = row[9]
    deviceData_list.append(deviceData)
j = json.dumps(deviceData_list)
print(j)
with open('data.json', 'w') as f:
    f.write(j)
It outputs a JSON file like this:
[{"hostname": "sw1", "ip": "8.8.8.8", "username": "contoso", "password": "contoso", "secret": "contoso"}, {"hostname": "sw2", "ip": "8.8.8.9", "username": "contoso", "password": "contoso2", "secret": "contoso2"}]
and what I would like is to make it look like this:
{"switch_list": [{"hostname": "sw1","ip": "8.8.8.8","username": "contoso","password": "contoso","secret": "contoso"},{"hostname": "sw2","ip": "8.8.8.9","username": "contoso","password": "contoso2","secret": "contoso2"}]}
So basically I need to put "{ "switch_list":" in front of the current output and "]}" at the end, but whatever silly idea I tried gave a different result. I figured there are two ways to do it: first, before json.dump, and second, by editing the JSON file after it is created, but I do not know what to target since "switch_list" is kind of outside :) This also means that I'm a dummy regarding Python or programming in general :) Any help is appreciated. I will not post what I tried since it is useless. This is also my first post here, so please forgive any stupidity. Cheers
Instead of:
    j = json.dumps(deviceData_list)
use:
    output = {"switch_list": deviceData_list}
    j = json.dumps(output)
This creates a new dictionary where the only key is switch_list and its contents are your existing list. Then you dump that data.
Change
j = json.dumps(deviceData_list)
to something like:
j = json.dumps({"switch_list": deviceData_list})
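Both answers come down to wrapping the list in a dictionary before serializing it. Here is a minimal, self-contained sketch of that idea. The sample rows stand in for what the question's Excel loop would produce, and the variable names `device_data_list`, `wrapped`, and `text` are illustrative.

```python
import json

# Stand-in for the list built from the spreadsheet rows in the question.
device_data_list = [
    {"hostname": "sw1", "ip": "8.8.8.8", "username": "contoso",
     "password": "contoso", "secret": "contoso"},
    {"hostname": "sw2", "ip": "8.8.8.9", "username": "contoso",
     "password": "contoso2", "secret": "contoso2"},
]

# Wrap the list under a single top-level key, then serialize the whole dict.
wrapped = {"switch_list": device_data_list}
text = json.dumps(wrapped)
print(text)
```

Building the structure in Python and serializing it once tends to be more robust than editing the JSON text after it has been written.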

CSV to json convert

I have this data in .csv format:
I want to convert it into .json format like this :
{
"title": "view3",
"sharedWithOrganization": false,
"sharedWithUsers": [
"81241",
"81242",
"81245",
"81265"
],
"filters": [{"field":"Account ID","comparator":"==","value":"prod"}]
},
{
"title": "view3",
"sharedWithOrganization": true,
"sharedWithUsers": [],
"filters": [{"field":"Environment_AG","comparator":"=#","value":"Development"}]
}
Below are the conversions for the comparator:
'equals' means '=='
'not equal' means '!='
'contains' means '=#'
'does not contain' means '!=#'
Can you please help me convert the .csv to .json? I have been unable to do the conversion using Python.
Here is what I would do, without giving you the full answer (doing it yourself is better for learning).
First: create an object containing your information.
class View():
    def __init__(self, title, field, comparator, value, sharedWithOrganization, user1, user2, user3, user4, user5, user6):
        self.title = title
        self.field = field
        self.comparator = comparator
        self.value = value
        self.sharedWithOrganization = sharedWithOrganization
        self.user1 = user1
        ...
        self.user6 = user6
Then I would load the CSV, create an object for each line, and store them in a dict with the following structure:
loadedCsv = { "Your line title (e.g. view3)" : [list of all the objects with the title view3] }
Yes, from this point of view the title parameter is stored redundantly; you can choose to remove it from the object.
When this is done, I would, for each title in my dictionary, get all the elements I need and format them as JSON using "import json" (cf. the Python documentation: https://docs.python.org/3/library/json.html)
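The grouping step described above can be sketched roughly as follows, using plain dicts instead of a class. Since the CSV in the question is not shown, the column names and the two sample rows below are assumptions, and `load_views`, `SAMPLE_CSV`, and `COMPARATORS` are hypothetical names introduced only for this illustration.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical sample data; the real column layout is not shown in the question.
SAMPLE_CSV = """title,field,comparator,value,sharedWithOrganization,user1,user2
view3,Account ID,equals,prod,false,81241,81242
view3,Environment_AG,contains,Development,true,,
"""

# Comparator names and symbols as listed in the question.
COMPARATORS = {
    "equals": "==",
    "not equal": "!=",
    "contains": "=#",
    "does not contain": "!=#",
}

def load_views(csv_file):
    """Convert each CSV row to a view dict and group the views by title."""
    views_by_title = defaultdict(list)
    for row in csv.DictReader(csv_file):
        # Collect the non-empty user columns (user1, user2, ...).
        users = [value for key, value in row.items()
                 if key.startswith("user") and value]
        views_by_title[row["title"]].append({
            "title": row["title"],
            # Compare the text, because bool("false") would be True.
            "sharedWithOrganization": row["sharedWithOrganization"].strip().lower() == "true",
            "sharedWithUsers": users,
            "filters": [{
                "field": row["field"],
                "comparator": COMPARATORS[row["comparator"]],
                "value": row["value"],
            }],
        })
    return dict(views_by_title)

views = load_views(io.StringIO(SAMPLE_CSV))
print(json.dumps(views, indent=4))
```

To read a real file, pass an open file object instead of the io.StringIO wrapper.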
Here I'm posting my attempt at your question. I hope you and others will find it helpful.
But I want you to try it yourself....
import csv
import json
def csv_to_json(csvFilePath, jsonFilePath):
    jsonArray = []
    with open(csvFilePath, encoding='utf-8') as csvf:
        csvReader = csv.DictReader(csvf)
        for row in csvReader:
            # Translate the comparator name into its symbol.
            if row["comparator"] == "equals":
                row["comparator"] = "=="
            elif row["comparator"] == "not equal":
                row["comparator"] = "!="
            elif row["comparator"] == "contains":
                row["comparator"] = "=#"
            elif row["comparator"] == "does not contain":
                row["comparator"] = "!=#"
            # Keep only the user columns that actually contain a value.
            users = [row["user%d" % n] for n in range(1, 7) if row["user%d" % n]]
            final_data = {
                "title": row["title"],
                # bool("false") would be True, so compare the text instead.
                "sharedWithOrganization": row["sharedWithOrganization"].strip().lower() == "true",
                "sharedWithUsers": users,
                "filters": [{"field": row["field"], "comparator": row["comparator"], "value": row["value"]}]
            }
            jsonArray.append(final_data)
    with open(jsonFilePath, 'w', encoding='utf-8') as jsonf:
        jsonString = json.dumps(jsonArray, indent=4)
        jsonf.write(jsonString)
csvFilePath = r'test.csv'
jsonFilePath = r'test11.json'
csv_to_json(csvFilePath, jsonFilePath)

Read a file and match lines above or below from the matching pattern

I'm reading an input json file, and capturing the array values into a dictionary, by matching tar.gz and printing a line above that (essentially the yaml file).
{"Windows": [
"/home/windows/work/input.yaml",
"/home/windows/work/windows.tar.gz"
],
"Mac": [
"/home/macos/required/utilities/input.yaml",
"/home/macos/required/utilities.tar.gz"
],
"Unix": [
"/home/unix/functional/plugins/input.yaml",
"/home/unix/functional/plugins/Plugin.tar.gz"
]
goes on..
}
Output of the dictionary:
{'/home/windows/work/windows.tar.gz': '/home/windows/work/input.yaml',
'/home/macos/required/utilities/utilities.tar.gz' : '/home/macos/required/input.yaml'
......
}
The problem is that the entries in the JSON can change: (A) the tar.gz entry can come first in the list of values, or (B) the order can be mixed from one key to the next.
Irrespective of the order of the entries, how can I get the output dictionary in the format shown above?
{ "Windows": [
"/home/windows/work/windows.tar.gz",
"/home/windows/work/input.yaml"
],
"Mac": [
"/home/macos/required/utilities/utilities.tar.gz",
"/home/macos/required/input.yaml"
],
"Unix": [
"/home/unix/functional/plugins/Plugin.tar.gz",
"/home/unix/functional/plugins/input.yaml"
]
goes on.. }
mix and match scenario.
{ "Windows": [
"/home/windows/work/windows.tar.gz",
"/home/windows/work/input.yaml"
],
"Mac": [
"/home/macos/required/utilities/input.yaml",
"/home/macos/required/utilities.tar.gz"
],
"Unix": [
"/home/unix/functional/plugins/Plugin.tar.gz",
"/home/unix/functional/plugins/input.yaml"
] }
My code snippet.
import re
def read_input():
    files_to_be_processed = {}
    with open('input.json', 'r') as f:
        lines = f.read().splitlines()
    lines = [line.replace('"', '').replace(" ", '').replace(',', '') for line in lines]
    for i, line in enumerate(lines):
        # Escape the dots so they only match a literal ".tar.gz".
        match = re.match(r".*\.tar\.gz", line)
        if match:
            # Take the line directly above the tar.gz entry.
            j = i-1 if i > 1 else 0
            for k in range(j, i):
                files_to_be_processed[match.string] = lines[k]
    print(files_to_be_processed)
One approach is the following:
1- Use Python's json module, which makes the whole process much easier.
2- After loading the data with json, check each object (i.e. Windows/Mac/Unix) for both the tar.gz and the yaml entry.
3- Assign them to a new dictionary.
Here is some quick code:
import json
def read_input():
    files_to_be_processed = {}
    with open('input.json','r') as f:
        jsonObject = json.load(f)
    for value in jsonObject.items():
        tarGz = ""
        Yaml = ""
        for line in value[1]: #value[0] contains the key (e.g. Windows)
            if line.endswith('.tar.gz'):
                tarGz = line
            elif line.endswith('.yaml'):
                Yaml = line
        files_to_be_processed[tarGz] = Yaml
    print(files_to_be_processed)
read_input()
This code can be shortened and optimised using things like list comprehension and other methods, but it should be a good place to get started
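One possible shortened form is sketched below. It takes the already parsed JSON as a dict rather than reading a file, and unlike the code above it skips any key that lacks either a .tar.gz or a .yaml entry; both of those choices are assumptions. The name `pair_archives` and the `sample` data (taken from the mix-and-match scenario) are illustrative.

```python
def pair_archives(groups):
    """Map each .tar.gz path to the .yaml path listed under the same key."""
    pairs = {}
    for paths in groups.values():
        # next() with a default picks the first match regardless of list order.
        archive = next((p for p in paths if p.endswith(".tar.gz")), None)
        config = next((p for p in paths if p.endswith(".yaml")), None)
        if archive is not None and config is not None:
            pairs[archive] = config
    return pairs

# Entries in mixed order, as in the mix-and-match scenario.
sample = {
    "Windows": ["/home/windows/work/windows.tar.gz",
                "/home/windows/work/input.yaml"],
    "Mac": ["/home/macos/required/utilities/input.yaml",
            "/home/macos/required/utilities.tar.gz"],
}
pairs = pair_archives(sample)
print(pairs)
```

Because the lookup is by file extension, the position of each entry within its list no longer matters.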
One way could be to transform each list in your input json_dict into a dict that has a key for "yaml" and one for "gz". Note that dict.fromkeys(json_dict, dict()) would share a single dict between all keys, so a dict comprehension is used here to give every key its own dict.
json_dict_1 = {key: {} for key in json_dict}
for key in json_dict:
    list_val = json_dict[key]
    for entry in list_val:
        entry_key = 'yaml' if entry.endswith('.yaml') else 'gz'
        json_dict_1[key][entry_key] = entry
print(json_dict_1)
# With the first input from the question (line breaks added for readability):
#{'Windows': {'yaml': '/home/windows/work/input.yaml',
# 'gz': '/home/windows/work/windows.tar.gz'},
# 'Mac': {'yaml': '/home/macos/required/utilities/input.yaml',
# 'gz': '/home/macos/required/utilities.tar.gz'},
# 'Unix': {'yaml': '/home/unix/functional/plugins/input.yaml',
# 'gz': '/home/unix/functional/plugins/Plugin.tar.gz'}}

Appending "size" element to last json child element for a sunburst diagram

I have a working python script that takes my csv columns data and converts it to a json file to be read by my d3 sunburst visualization. Problem is that there is no "size" element at the final child element which is needed to populate the sunburst diagram correctly.
Below is the script I have that reads the csv into a json the way I need it. I've tried modifying the script with an if/else check that finds the elements without children (the last elements) and adds "size": 1 to them, but nothing happens.
This is example csv data. The code should work for anything though.
Energy, Grooming, Shedding, Trainability, Group, Breed
Regular Exercise, 2-3 Times a Week Brushing, Seasonal, Easy Training, Toy Group, Affenpinscher
import csv
from collections import defaultdict
def ctree():
    return defaultdict(ctree)
def build_leaf(name, leaf):
    res = {"name": name}
    # add children node if the leaf actually has any children
    if len(leaf.keys()) > 0:
        res["children"] = [build_leaf(k, v) for k, v in leaf.items()]
    return res
def main():
    tree = ctree()
    # NOTE: you need to have test.csv file as neighbor to this file
    with open('test.csv') as csvfile:
        reader = csv.reader(csvfile)
        for rid, row in enumerate(reader):
            if rid == 0:
                continue
            # usage of python magic to construct dynamic tree structure and
            # basically grouping csv values under their parents
            leaf = tree[row[0]]
            for cid in range(1, len(row)):
                leaf = leaf[row[cid]]
    # building a custom tree structure
    res = []
    for name, leaf in tree.items():
        res.append(build_leaf(name, leaf))
    # this is what I tried to append the size element
    def parseTree(leaf):
        if len(leaf["children"]) == 0:
            return obj["size"] == 1
        else:
            for child in leaf["children"]:
                leaf['children'].append(parseTree(child))
    # printing results into the terminal
    import json
    import uuid
    from IPython.display import display_javascript, display_html, display
    print(json.dumps(res, indent=2))
main()
The final child element needs to read something like this:
[
{
"name": "Regular Exercise",
"children": [
{
"name": "2-3 Times a Week Brushing",
"children": [
{
"name": "Seasonal",
"children": [
{
"name": "Easy Training",
"children": [
{
"name": "Toy Group",
"children": [
{
"name": "Affenpinscher",
"size": 1
}
]
}]}]}]}]}]}
To add size to the last entry:
import csv
from collections import defaultdict
import json
#import uuid
#from IPython.display import display_javascript, display_html, display
def ctree():
    return defaultdict(ctree)
def build_leaf(name, leaf):
    res = {"name": name}
    # add children node if the leaf actually has any children
    if leaf.keys():
        res["children"] = [build_leaf(k, v) for k, v in leaf.items()]
    else:
        res['size'] = 1
    return res
def main():
    tree = ctree()
    # NOTE: you need to have test.csv file as neighbor to this file
    with open('test.csv') as csvfile:
        reader = csv.reader(csvfile)
        header = next(reader) # read the header row
        for row in reader:
            # usage of python magic to construct dynamic tree structure and
            # basically grouping csv values under their parents
            leaf = tree[row[0]]
            for value in row[1:]:
                leaf = leaf[value]
    # building a custom tree structure
    res = []
    for name, leaf in tree.items():
        res.append(build_leaf(name, leaf))
    # printing results into the terminal
    print(json.dumps(res, indent=2))
main()

Extract from dynamic JSON response with Scrapy

I want to extract the 'avail' value from JSON output that looks like this.
{
"result": {
"code": 100,
"message": "Command Successful"
},
"domains": {
"yolotaxpayers.com": {
"avail": false,
"tld": "com",
"price": "49.95",
"premium": false,
"backorder": true
}
}
}
The problem is that the ['avail'] value is under ["domains"]["domain_name"] and I can't figure out how to get the domain name.
You have my spider below. The first part works fine, but not the second one.
import scrapy
import json
from whois.items import WhoisItem
class whoislistSpider(scrapy.Spider):
    name = "whois_list"
    start_urls = []
    f = open('test.txt', 'r')
    global lines
    lines = f.read().splitlines()
    f.close()
    def __init__(self):
        for line in lines:
            self.start_urls.append('http://www.example.com/api/domain/check/%s/com' % line)
    def parse(self, response):
        for line in lines:
            jsonresponse = json.loads(response.body_as_unicode())
            item = WhoisItem()
            domain_name = list(jsonresponse['domains'].keys())[0]
            item["avail"] = jsonresponse["domains"][domain_name]["avail"]
            item["domain"] = domain_name
            yield item
Thank you in advance for your replies.
Currently, it tries to get the value by the "('%s.com' % line)" key.
You need to do the string formatting correctly:
domain_name = "%s.com" % line.strip()
item["avail"] = jsonresponse["domains"][domain_name]["avail"]
Assuming you are only expecting one result per response:
domain_name = list(jsonresponse['domains'].keys())[0]
item["avail"] = jsonresponse["domains"][domain_name]["avail"]
This will work even if there is a mismatch between the domain in the file "test.txt" and the domain in the result.
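The lookup can be tried outside the spider with only the standard json module. This is a minimal sketch, not Scrapy code: `raw` is the sample response copied from the question, and the names `data`, `domain_name`, and `details` are illustrative. It assumes a single entry under "domains".

```python
import json

# Sample response body copied from the question.
raw = """
{
  "result": {"code": 100, "message": "Command Successful"},
  "domains": {
    "yolotaxpayers.com": {
      "avail": false,
      "tld": "com",
      "price": "49.95",
      "premium": false,
      "backorder": true
    }
  }
}
"""

data = json.loads(raw)

# Take the first (and here only) key/value pair under "domains",
# without needing to know the domain name in advance.
domain_name, details = next(iter(data["domains"].items()))
print(domain_name, details["avail"])
```

If a response could contain several domains, looping over data["domains"].items() would yield one name and details pair per domain.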
To get the domain name from the above JSON response you can use a list comprehension, e.g.:
domain_name = [x for x in jsonresponse["domains"].keys()]
To get the "avail" value use the same method, e.g.:
avail = [x["avail"] for x in jsonresponse["domains"].values() if "avail" in x]
To get the individual values, take index 0, e.g.
domain_name[0] and avail[0], because the result of a list comprehension is stored in a list.
More info on list comprehension
