I have a CSV file whose header and value lines are:
site,access_key,secret_access_key
sa1,something,something
na1,something,something
and so on. I would like the dictionary to look like
site_dict = {"sa1" :{"access_key" : "something", "secret_access_key" : "something"}, "na1" :{"access_key" : "something", "secret_access_key" : "something"}}
I tried what was suggested here: How to create a nested dictionary from a csv file with N rows in Python. But it deals with numeric values, and I could not get my head around changing it to string values. Any help would be appreciated. If you make a suggestion or provide an answer, please post it as an answer so I can mark it appropriately. EDIT: I changed sa1 and na1 to keys by adding the quotes.
You can use the csv module for reading, and pre-read the first line to get the key names:

# create data
with open("f.txt", "w") as f:
    f.write("""site,access_key,secret_access_key
sa1,something111,something111
na1,something222,something222""")

import csv

result = {}
with open("f.txt") as f:
    # get the key names from the first line
    fields = next(f).strip().split(",")
    reader = csv.reader(f)
    # process all remaining lines
    for line in reader:
        # the outer key is the first value;
        # inner keys/values come from the header fields and the rest of the line
        result[line[0]] = dict(zip(fields[1:], line[1:]))
print(result)
Output:
{'sa1': {'access_key': 'something111', 'secret_access_key': 'something111'},
'na1': {'access_key': 'something222', 'secret_access_key': 'something222'}}
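If you prefer, the same nested dict can also be sketched with csv.DictReader, which consumes the header line for you (this is just an alternative, reusing the same sample data):

```python
import csv

# recreate the same sample file as above
with open("f.txt", "w") as f:
    f.write("site,access_key,secret_access_key\n"
            "sa1,something111,something111\n"
            "na1,something222,something222\n")

result = {}
with open("f.txt") as f:
    for row in csv.DictReader(f):
        site = row.pop("site")   # the outer key
        result[site] = dict(row) # remaining columns form the inner dict
print(result)
```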
See also: the csv module documentation.
I've been having a hard time trying to convert this dictionary with nested lists into a CSV file. I have a CSV file I am filtering - https://easyupload.io/8zobej. I turned it into a dictionary and then cleaned it up. I am now stuck trying to output it to a CSV and don't know what to do. I've tried many different combinations of DictWriter and writerows but keep coming up short. I am now trying to come up with a for loop that goes through the dictionary and writes each value it finds to the CSV.
Here is my code - please excuse the comments - I was trying many things.
import csv

def dataSorter(filename: str):
    """
    Scans the input CSV file with 2 columns (category, value) and sorts the
    values into categories, giving us a list of values for each category.
    """
    # Open the input csv file and parse it comma-delimited
    with open(filename) as inputcsv:
        readcsv = csv.reader(inputcsv, delimiter=',')
        sortedData = {}
        # skip the header row
        next(readcsv)
        # loop through the file and collect values under their category key
        for i in readcsv:
            category = i[0]
            if category not in sortedData:
                sortedData[category] = [i[1]]
            elif i[1] not in sortedData[category]:
                sortedData[category].append(i[1])
        # sort each category's list of values
        for category in sortedData:
            sortedData[category].sort()
        return sortedData
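For the output step the question is stuck on, here is one possible sketch using csv.writer, with one row per category (the sortedData contents and the filename here are purely illustrative):

```python
import csv

# hypothetical result in the shape dataSorter builds:
# each category maps to a sorted list of values
sortedData = {"fruit": ["apple", "banana"], "veg": ["carrot"]}

with open("sorted_output.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for category, values in sortedData.items():
        # one row per category: the category name followed by its values
        writer.writerow([category] + values)
```

If you want one (category, value) pair per row instead, call writer.writerow([category, v]) inside a nested loop over the values.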
I am really new to Python and I need to change new article IDs to the old ones. The IDs are mapped inside a dict. The file I need to edit is a plain txt file where every column is separated by tabs. The problem is not replacing the values, but rather replacing only the occurrences in the desired column, which is set by pos.
I would really appreciate some help.
def replaceArtCol(filename, pos):
    with open(filename) as input_file, open('test.txt', 'w') as output_file:
        for each_line in input_file:
            val = each_line.split("\t")[pos]
            for row in artikel_ID:
                if each_line[pos] == pos:
                    line = each_line.replace(val, artikel_ID[val])
                    output_file.write(line)

This code just replaces any occurrence of the string anywhere in the text file.
Supposing your ID-mapping dict looks like ID_mapping = {'old_id': 'new_id'}, I think your code is not far from working correctly. A modified version could look like:
with open(filename) as input_file, open('test.txt', 'w') as output_file:
    for each_line in input_file:
        line = each_line.split("\t")
        if line[pos] in ID_mapping:
            line[pos] = ID_mapping[line[pos]]
        line = '\t'.join(line)
        output_file.write(line)

If you're not working in pandas anyway, this saves a lot of overhead.
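As a quick self-contained sanity check of that loop, with a hypothetical ID_mapping and made-up sample data (all names, values, and filenames here are illustrative):

```python
# hypothetical mapping of old article ids to new ones
ID_mapping = {"1001": "A-1001", "1002": "A-1002"}
pos = 0  # the column whose values should be replaced

# made-up tab-separated sample standing in for the real file
with open("artikel.txt", "w") as f:
    f.write("1001\twidget\t9.99\n1002\tgadget\t4.50\n9999\tdoodad\t1.25\n")

with open("artikel.txt") as input_file, open("test.txt", "w") as output_file:
    for each_line in input_file:
        line = each_line.split("\t")
        if line[pos] in ID_mapping:
            line[pos] = ID_mapping[line[pos]]
        output_file.write("\t".join(line))
```

Only the column at index pos is touched; the 9999 row, which is not in the mapping, passes through unchanged.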
If your data is tab-separated, you should load it into a DataFrame; that gives you a columns-and-rows structure. What you are doing right now won't get you there without some complex and buggy logic. You may try these steps:
import pandas as pd
df = pd.read_csv("dummy.txt", sep="\t", encoding="latin-1")
df['desired_column_name'] = df['desired_column_name'].replace({"value_to_be_changed": "newvalue"})
print(df.head())
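To write the edited table back out tab-separated, df.to_csv does the reverse of read_csv. A minimal sketch with made-up data (the column names and filenames are illustrative, not from the original file):

```python
import pandas as pd

# made-up frame standing in for the real dummy.txt contents
df = pd.DataFrame({"artikel_id": ["1001", "9999"], "name": ["widget", "doodad"]})

# replace only full matches in the desired column
df["artikel_id"] = df["artikel_id"].replace({"1001": "A-1001"})

# write it back tab-separated, without the index column
df.to_csv("dummy_out.txt", sep="\t", index=False)
```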
I have a csv file that contains data separated by ','. I am trying to convert it into JSON format. For this I am trying to extract the headers first, but I am not able to differentiate between the headers and the next row.
Here is the data in csv file:
Start Date ,Start Time,End Date,End Time,Event Title
9/5/2011,3:00:00 PM,9/5/2011,,Social Studies Dept. Meeting
9/5/2011,6:00:00 PM,9/5/2011,8:00:00 PM,Curriculum Meeting
I have tried csv.reader as well but I got stuck on the same issue.
Basically, Event Title and the date on the next line are not being distinguished.
import re
import json

with open(file_path, 'r') as f:
    first_line = re.sub(r'\s+', '', f.read())
    arr = []
    headers = []
    for header in f.readline().split(','):
        headers.append(header)
    for line in f.readlines():
        lineItems = {}
        for i, item in enumerate(line.split(',')):
            lineItems[headers[i]] = item
        arr.append(lineItems)

print(arr)
print(headers)

jsonText = json.dumps(arr)
print(jsonText)
All three print statements give the empty results below.
[]
['']
[]
I expect jsonText to be a json of key value pairs.
Use csv.DictReader to get a list of dicts (each row is a dict), then serialize it.

import csv
import json

with open(csvfilepath) as f, open(jsonfilepath, "w") as out:
    json.dump(list(csv.DictReader(f)), out)
In Python, each file has a marker that keeps track of where you are in the file. Once you call read(), you have read through the entire file, and all future read or readline calls will return nothing.
So, just delete the line involving first_line.
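A quick illustration of that file-marker behaviour (the filename is just for this demo):

```python
with open("demo.txt", "w") as f:
    f.write("header\nrow1\n")

with open("demo.txt") as f:
    everything = f.read()      # consumes the whole file
    print(repr(f.readline()))  # prints '' because the marker is at the end
    f.seek(0)                  # rewind the marker to the start
    print(repr(f.readline()))  # prints 'header\n' again
```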
I get a CSV file updated on a daily basis and need to process it, creating a new file based on this criterion: if a row is new data, it should be tagged new; if it is an update to existing data, it should be tagged update. How do I write Python code that processes the files and produces the CSV output below, based on the date?
Day1 input data
empid,enmname,sal,datekey
1,cholan,100,8/14/2018
2,ram,200,8/14/2018
Day2 input Data
empid,enmname,sal,datekey
1,cholan,100,8/14/2018
2,ram,200,8/14/2018
3,sundar,300,8/15/2018
2,raman,200,8/15/2018
Output Data
status,empid,enmname,sal,datekey
new,3,sundar,300,8/15/2018
update,2,raman,200,8/15/2018
I'm feeling nice, so I'll give you some code. Try to learn from it.
To work with CSV files, we'll need the csv module:
import csv
First off, let's teach the computer how to open and parse a CSV file:
def parse(path):
    with open(path) as f:
        return list(csv.DictReader(f))
csv.DictReader reads the first line of the csv file and uses it as the "names" of the columns. It then creates a dictionary for each subsequent row, where the keys are the column names.
That's all well and good, but we just want the last version with each key:
def parse(path):
    data = {}
    with open(path) as f:
        for row in csv.DictReader(f):
            data[row["empid"]] = row
    return data
Instead of just creating a list containing everything, this creates a dictionary where the keys are the row's id. This way, rows found later in the file will overwrite rows found earlier in the file.
Now that we've taught the computer how to extract the data from the files, let's get it:
old_data = parse("file1.csv")
new_data = parse("file2.csv")
Iterating through a dictionary gives you its keys, which here are the ids defined in the data set. Conveniently, key in dictionary says whether key is one of the keys in the dictionary. So we can do this:
new = {
    id_: row
    for id_, row in new_data.items()
    if id_ not in old_data
}

updated = {
    id_: row
    for id_, row in new_data.items()
    if id_ in old_data and old_data[id_] != row
}
I'll put csv.DictWriter here and let you sort out the rest on your own.
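To sketch that last step (the rows below are hypothetical, in the shape csv.DictReader produces, and the output filename is illustrative):

```python
import csv

# hypothetical results of the comparison above
new = {"3": {"empid": "3", "enmname": "sundar", "sal": "300", "datekey": "8/15/2018"}}
updated = {"2": {"empid": "2", "enmname": "raman", "sal": "200", "datekey": "8/15/2018"}}

with open("output.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["status", "empid", "enmname", "sal", "datekey"])
    writer.writeheader()
    # tag each row with its status before writing it out
    for row in new.values():
        writer.writerow({"status": "new", **row})
    for row in updated.values():
        writer.writerow({"status": "update", **row})
```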
I have a .json file where each line is an object. For example, first two lines are:
{"review_id":"x7mDIiDB3jEiPGPHOmDzyw","user_id":"msQe1u7Z_XuqjGoqhB0J5g","business_id": ...}
{"review_id":"dDl8zu1vWPdKGihJrwQbpw","user_id":"msQe1u7Z_XuqjGoqhB0J5g","business_id": ...}
I have tried processing it using the ijson lib as follows:

import ijson

with open(filename, 'r') as f:
    objects = ijson.items(f, 'columns.items')
    columns = list(objects)
However, I get the error:

JSONError: Additional data

It seems I'm getting this error because of the multiple objects. What's the recommended way to analyze such a JSON file in Jupyter?
Thank you in advance.
The file format is not correct if this is the complete file. Between the objects there must be a comma, and the file should start and end with a square bracket, like so: [{...},{...}]. For your data it would look like:
[{"review_id":"x7mDIiDB3jEiPGPHOmDzyw","user_id":"msQe1u7Z_XuqjGoqhB0J5g","business_id": ...},
{"review_id":"dDl8zu1vWPdKGihJrwQbpw","user_id":"msQe1u7Z_XuqjGoqhB0J5g","business_id": ...}]
Here is some code how to clean your file:
lastline = None
with open("yourfile.json", "r") as f:
    lineList = f.readlines()
    lastline = lineList[-1]

with open("yourfile.json", "r") as f, open("cleanfile.json", "w") as g:
    for i, line in enumerate(f):
        if i == 0:
            line = "[" + str(line) + ","
            g.write(line)
        elif line == lastline:
            g.write(line)
            g.write("]")
        else:
            line = str(line) + ","
            g.write(line)
To read a json file properly you could also consider using the pandas library (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_json.html).
import pandas as pd

# get a pandas DataFrame object from the json file
# (note: for the original newline-delimited file, pd.read_json(..., lines=True)
#  reads it directly, without cleaning it first)
df = pd.read_json("path/to/your/filename.json")
If you are not familiar with pandas, here's a quick head start on how to work with a dataframe object:
df.head() #gives you the first rows of the dataframe
df["review_id"] # gives you the column review_id as a vector
df.iloc[1,:] # gives you the complete row with index 1
df.iloc[1,2] # gives you the item in row with index 1 and column with index 2
While each line on its own is valid JSON, your file as a whole is not. As such, you can't parse it in one go; you will have to iterate over each line and parse it into an object.
You can aggregate these objects in one list, and from there do whatever you like with your data:
import json

with open(filename, 'r') as f:
    object_list = []
    for line in f.readlines():
        object_list.append(json.loads(line))
    # object_list will contain all of your file's data
You could do it as a list comprehension to make it a little more Pythonic:

with open(filename, 'r') as f:
    object_list = [json.loads(line) for line in f.readlines()]
    # object_list will contain all of your file's data
You have multiple JSON objects on separate lines in your file; that's why it's throwing errors.

import json

with open(filename, 'r') as f:
    lines = f.readlines()
first = json.loads(lines[0])
second = json.loads(lines[1])

That should catch both lines and load them in properly.