Output file to CSV - Python

I'm trying to parse data from a JSON file and create a CSV file from that output. I've written a Python script that produces the output I need; now I need to sort the resulting CSV file by date and time.
My code:
## Shift Start | End time | Primary | Secondary
import csv

def write_CSV():
    # field names
    fields = ['ShiftStart', 'EndTime', 'Primary', 'Secondary']
    # name of the csv file
    filename = "CallingLog.csv"
    # writing to the csv file
    with open(filename, 'w') as csvfile:
        # create a csv dict writer object
        writer = csv.DictWriter(csvfile, delimiter=',', lineterminator='\n',
                                fieldnames=fields)
        # write the headers (field names)
        writer.writeheader()
        # write the data rows (totalData is the list of row dicts
        # parsed from the JSON file elsewhere in the script)
        writer.writerows(totalData)
I want my CSV file to be sorted by date and time, like below. At least sorting by date would be fine.
ShiftStart
2020-11-30T17:00:00-08:00
2020-12-01T01:00:00-08:00
2020-12-02T05:00:00-08:00
2020-12-03T05:00:00-08:00
2020-12-04T09:00:00-08:00
2020-12-05T13:00:00-08:00
2020-12-06T13:00:00-08:00
2020-12-07T09:00:00-08:00
2020-12-08T17:00:00-08:00
2020-12-09T09:00:00-08:00
2020-12-10T09:00:00-08:00
2020-12-11T17:00:00-08:00

YourDataframe.sort_values(['Col1','Col2']).to_csv('Path')
Try this: it not only sorts and writes to the CSV, it also leaves the original dataframe unsorted in the program for further operations, if needed!
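For example, a minimal self-contained sketch of that approach on the data from the question (the Primary/Secondary values here are made up; note that ISO 8601 timestamps with a uniform UTC offset also sort correctly as plain strings):
import pandas as pd

# totalData as described in the question: a list of dicts parsed from JSON
totalData = [
    {'ShiftStart': '2020-12-02T05:00:00-08:00', 'EndTime': '2020-12-02T13:00:00-08:00',
     'Primary': 'alice', 'Secondary': 'bob'},
    {'ShiftStart': '2020-11-30T17:00:00-08:00', 'EndTime': '2020-12-01T01:00:00-08:00',
     'Primary': 'carol', 'Secondary': 'dave'},
]

df = pd.DataFrame(totalData)
# sort_values returns a new, sorted DataFrame; df itself stays unsorted
df.sort_values(['ShiftStart', 'EndTime']).to_csv('CallingLog.csv', index=False)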

You can adapt this example to your data (which I don't have :-)
from csv import DictReader, DictWriter
from sys import stdout
# simple, self-contained data
data = '''\
a,b,c
3,2,1
2,2,3
1,3,2
'''.splitlines()
# read the data
dr = DictReader(data)
rows = [row for row in dr]
# print the data
print('# unsorted')
dw = DictWriter(stdout, dr.fieldnames)
dw.writeheader()
dw.writerows(rows)
print('# sorted')
dw = DictWriter(stdout, dr.fieldnames)
dw.writeheader()
dw.writerows(sorted(rows, key=lambda d: d['a']))
Output:
# unsorted
a,b,c
3,2,1
2,2,3
1,3,2
# sorted
a,b,c
1,3,2
2,2,3
3,2,1
When you read the data using a DictReader, each element of the list rows is a dictionary, keyed on the field names from the first line of the CSV data file.
When you want to sort this list according to the values corresponding to a key, you have to provide sorted with a key argument: a function that returns the value on which you want to sort.
This function is called with the whole element to be sorted, in our case a dictionary. We want to sort on the first field of the CSV, the one indexed by 'a', so our function, using the lambda syntax to inline the definition in the function call, is just lambda d: d['a'], which returns the value on which we want to sort.
NOTE: the sort in this case is lexicographic; it works because I'm dealing with single digits. In general you may need to convert the value (by default a string) to something else that makes sense in your context, e.g. lambda d: int(d['a']).
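Tying this back to the question's data: rather than comparing the ShiftStart strings, the key can convert them to datetimes. A sketch, assuming Python 3.7+ (datetime.fromisoformat understands the -08:00 offset):
from csv import DictReader
from datetime import datetime

with open('CallingLog.csv') as f:
    rows = list(DictReader(f))

# sorts chronologically even if the rows used different UTC offsets
rows.sort(key=lambda d: datetime.fromisoformat(d['ShiftStart']))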


How to sort csv data by changing data value in rows to columns?

There are a total of 25 values of LOC_CODE, which identify locations.
There are a total of 6 values of ITEM_CODE, which correspond to the CO2 level, CO level, etc.
The item codes are: 1, 3, 5, 6, 8, and 9.
The problem:
I want to sort this dataset and overwrite the same CSV so that there are only 25 rows, one per unique LOC_CODE. I want to display the values of all six item codes for each location on a single row, rather than one ITEM_CODE per row as in the screenshot. Everything else stays the same.
This solution assumes that the response from the API is already saved into a CSV file in the format given in the first screenshot. I'm using csv.DictReader and csv.DictWriter from the csv module.
Before beginning, let's just import csv using:
import csv
Let's first create a function that'll process the DATA_DT into a desirable format
def get_datetime(value: str):
    # returns year, month, day, time (hh:mm:ss), in that order
    # assumes the string has length 14 and format 'YYYYMMDDhhmmss'
    y, m, d = value[0:4], value[4:6], value[6:8]
    t = ':'.join([value[8:10], value[10:12], value[12:14]])
    return y, m, d, t
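For instance, with a made-up timestamp:
>>> get_datetime('20181120143000')
('2018', '11', '20', '14:30:00')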
Next, we need a dictionary for ITEM_CODE:
item_dict = {'1': 'SO2', '3': ...} # please fill this yourself
and the headers list needed for the CSV DictWriter:
headers = ['Location', 'Year', 'Month', 'Day', 'Time (24h)', 'Station No.',
           'SO2', 'NO2', 'CO', 'O3', 'PM10', 'PM2.5', 'Meter Status']
We open the CSV file and read from it into a list raw_data (fill the filename, please). Each element of raw_data is a dict:
with open(r'filepath\filename.csv') as file:
    raw_data = list(csv.DictReader(file))
We now create an empty dict data, and then iterate over raw_data, processing its data and writing it to the dict (comments added at necessary places):
data = {}
for rec in raw_data:
    loc = rec['LOC_CODE']
    if loc not in data:
        data[loc] = dict.fromkeys(headers, '')
    # rec is from the old data, record is for the new data
    record = data[loc]
    if not record['Year']:
        # assumed that date & time for a location are the same for all ITEM_CODEs
        (record['Year'],
         record['Month'],
         record['Day'],
         record['Time (24h)']
         ) = get_datetime(rec['DATA_DT'])
        record['Station No.'] = rec['DATA_STATE']
        record['Meter Status'] = rec['DATA_NOVER']
    # for the readings we get the apt key using item_dict
    record[item_dict[rec['ITEM_CODE']]] = rec['DATA_VALUE']
Finally, we arrange all the records in data into a list of dicts the way csv.DictWriter would expect and write it into the output CSV file (please fill in the filename yourself):
records = [{**v, 'Location': k} for k, v in data.items()]
with open(r'filepath\newfilename.csv', 'w') as file:
    writer = csv.DictWriter(file, fieldnames=headers, lineterminator='\n')
    writer.writeheader()
    writer.writerows(records)
(All the ITEM_CODEs that do not have a value in your table will display an empty cell in the created CSV)
You must, of course, tune this code to your requirements. If you want it to keep the existing data in the CSV instead of overwriting it, change the mode from 'w' to 'a' or 'r+' and modify the data-writing part of the code accordingly. Similarly, if you want to sort the data by date (or anything else), descending or not, do so before writing.
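For instance, a minimal sketch of that descending date sort, placed just before the writer.writerows(records) call (the Year/Month/Day/Time strings are zero-padded, so comparing them as a tuple of text works):
# newest first; zero-padded strings compare correctly as text
records.sort(key=lambda r: (r['Year'], r['Month'], r['Day'], r['Time (24h)']),
             reverse=True)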
Should I combine all the code into one block, or leave that to the reader? Comment below... ;P

writing to a single CSV file from multiple dictionaries

Background
I have multiple dictionaries of different lengths. I need to write the values of the dictionaries to a single CSV file. I figured I could loop through each dictionary one by one and write the data to the CSV, but I ran into a small formatting issue.
Problem/Solution
I realized that after I loop through the first dictionary, the second dictionary's data gets written starting at the row where the first dictionary ended, as displayed in the first image. I would ideally want my data to print as shown in the second image.
My Code
import csv

e = {'Jay': 10, 'Ray': 40}
c = {'Google': 5000}

def writeData():
    with open('employee_file20.csv', mode='w') as csv_file:
        fieldnames = ['emp_name', 'age', 'company_name', 'size']
        writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
        writer.writeheader()
        for name in e:
            writer.writerow({'emp_name': name, 'age': e.get(name)})
        for company in c:
            writer.writerow({'company_name': company, 'size': c.get(company)})

writeData()
PS: I will have more than 2 dictionaries, so I am looking for a generic way to write each dictionary's data starting in the row right under the header. I am open to all solutions and suggestions.
If all dictionaries are of equal size, you could use zip to iterate over them in parallel. If they aren't of equal size, and you want the iteration to pad to the longest dict, you could use itertools.zip_longest
For example:
import csv
from itertools import zip_longest

e = {'Jay': 10, 'Ray': 40}
c = {'Google': 5000}

def writeData():
    with open('employee_file20.csv', mode='w') as csv_file:
        fieldnames = ['emp_name', 'age', 'company_name', 'size']
        writer = csv.writer(csv_file)
        writer.writerow(fieldnames)
        for employee, company in zip_longest(e.items(), c.items()):
            # zip_longest pads the shorter iterable with None,
            # so write empty fields for a missing entry on either side
            row = list(employee) if employee is not None else ['', '']
            row += list(company) if company is not None else ['', '']
            writer.writerow(row)

writeData()
If the dicts are of equal size, it's simpler:
import csv

e = {'Jay': 10, 'Ray': 40}
c = {'Google': 5000, 'Yahoo': 3000}

def writeData():
    with open('employee_file20.csv', mode='w') as csv_file:
        fieldnames = ['emp_name', 'age', 'company_name', 'size']
        writer = csv.writer(csv_file)
        writer.writerow(fieldnames)
        for employee, company in zip(e.items(), c.items()):
            writer.writerow(employee + company)

writeData()
A little side note: if you use Python 3, dictionaries preserve insertion order (guaranteed from 3.7 on). This isn't the case in Python 2, so if you use Python 2 you should use collections.OrderedDict instead of the standard dictionary.
There might be a more Pythonic solution, but I'd do something like this. I haven't used your csv writer thing before, so I just made my own comma-separated output:
e = {'Jay': 10, 'Ray': 40}
c = {'Google': 5000}
dict_list = [e, c]  # add more dicts here

max_dict_size = max(len(d) for d in dict_list)
output = ""
# Add header information here.
for i in range(max_dict_size):
    for j in range(len(dict_list)):
        key, value = dict_list[j].popitem() if len(dict_list[j]) else ("", "")
        output += f"{key},{value},"
    output += "\n"
# Now output contains the full text of the .csv file.
# Do file manipulation here.
# You could also do it after each row,
# where I currently have the output += "\n"
Edit: A little more thinking and I found something that might polish this a bit. You could first map each dictionary to a list of its keys using the .keys() method and append those lists to an empty list.
The advantage of that is that you'd be able to go "forward" instead of popping the dictionary items off the back. It also wouldn't destroy the dictionaries.
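A sketch of that non-destructive variant (same toy dicts as above): snapshot each dictionary's keys up front and index forward instead of popping:
e = {'Jay': 10, 'Ray': 40}
c = {'Google': 5000}
dict_list = [e, c]

# snapshot each dict's keys so nothing has to be popped off
keys_list = [list(d.keys()) for d in dict_list]
max_dict_size = max(len(keys) for keys in keys_list)

output = ""
for i in range(max_dict_size):
    for d, keys in zip(dict_list, keys_list):
        key, value = (keys[i], d[keys[i]]) if i < len(keys) else ("", "")
        output += f"{key},{value},"
    output += "\n"
# e and c are still intact here, and rows come out in insertion order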

Python code to process CSV file

I am getting a CSV file that is updated on a daily basis. I need to process it and create a new file based on this criterion: if a row contains new data, it should be tagged new; if it is an update to existing data, it should be tagged update. How do I write Python code that processes the input and produces the CSV output below, based on the date?
Day 1 input data
empid,enmname,sal,datekey
1,cholan,100,8/14/2018
2,ram,200,8/14/2018
Day 2 input data
empid,enmname,sal,datekey
1,cholan,100,8/14/2018
2,ram,200,8/14/2018
3,sundar,300,8/15/2018
2,raman,200,8/15/2018
Output Data
status,empid,enmname,sal,datekey
new,3,sundar,300,8/15/2018
update,2,raman,200,8/15/2018
I'm feeling nice, so I'll give you some code. Try to learn from it.
To work with CSV files, we'll need the csv module:
import csv
First off, let's teach the computer how to open and parse a CSV file:
def parse(path):
    with open(path) as f:
        return list(csv.DictReader(f))
csv.DictReader reads the first line of the csv file and uses it as the "names" of the columns. It then creates a dictionary for each subsequent row, where the keys are the column names.
That's all well and good, but we just want the last version with each key:
def parse(path):
    data = {}
    with open(path) as f:
        for row in csv.DictReader(f):
            data[row["empid"]] = row
    return data
Instead of just creating a list containing everything, this creates a dictionary where the keys are the row's id. This way, rows found later in the file will overwrite rows found earlier in the file.
Now that we've taught the computer how to extract the data from the files, let's get it:
old_data = parse("file1.csv")
new_data = parse("file2.csv")
Iterating through a dictionary gives you its keys, which are the ids defined in the data set. Conveniently, key in dictionary tells you whether key is one of the keys of the dictionary. So we can do this:
new = {
    id_: row
    for id_, row in new_data.items()
    if id_ not in old_data
}
updated = {
    id_: row
    for id_, row in new_data.items()
    if id_ in old_data and old_data[id_] != row
}
I'll put csv.DictWriter here and let you sort out the rest on your own.
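That said, here is a minimal sketch of that last step, using the new and updated dicts built above (the field names come from the sample data; 'output.csv' is a made-up name):
import csv

with open("output.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["status", "empid", "enmname", "sal", "datekey"])
    writer.writeheader()
    for row in new.values():
        writer.writerow({"status": "new", **row})
    for row in updated.values():
        writer.writerow({"status": "update", **row})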

Adding columns that contain equations to a csv file in python

I am using this script to take a large CSV file and separate it by unique values in the first column, then save new files. I would also like to add 3 columns at the end of each file that contain calculations based on the previous columns. The columns will have headers as well. My current code is as follows:
import csv, itertools as it, operator as op

csv_contents = []
with open('Nov15.csv', 'rb') as fin:
    file_reader = csv.DictReader(fin)  # default delimiter is comma
    print file_reader
    fieldnames = file_reader.fieldnames  # save for writing
    for line in file_reader:  # read in all of your data
        csv_contents.append(line)  # gather data into a list (of dicts)

# input to itertools.groupby must be sorted by the grouping value
sorted_csv_contents = sorted(csv_contents, key=op.itemgetter('Object'))

for groupkey, groupdata in it.groupby(sorted_csv_contents, key=op.itemgetter('Object')):
    with open('slice_{:s}.csv'.format(groupkey), 'wb') as gips:
        file_writer = csv.DictWriter(gips, fieldnames=fieldnames)
        file_writer.writeheader()
        file_writer.writerows(groupdata)
If your comments are true, you could probably do it like this (for imaginary columns col1 and col2, and the calculation col1 * col2):
for line in file_reader:  # read in all of your data
    # csv values are read as strings, so convert before multiplying
    line['calculated_col0'] = float(line['col1']) * float(line['col2'])
    csv_contents.append(line)  # gather data into a list (of dicts)
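One caveat: csv.DictWriter raises a ValueError for row keys that are not listed in fieldnames, so the calculated column names (placeholders here) also need to be appended to the saved header list before the slices are written:
# extend the saved header so DictWriter accepts the new keys
fieldnames = fieldnames + ['calculated_col0', 'calculated_col1', 'calculated_col2']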

Splitting Rows in csv on several header rows

I am very new to python, so please be gentle.
I have a .csv file, reported to me in this format, so I cannot do much about it:
ClientAccountID AccountAlias CurrencyPrimary FromDate
SomeID SomeAlias SomeCurr SomeDate
OtherID OtherAlias OtherCurr OtherDate
ClientAccountID AccountAlias CurrencyPrimary AssetClass
SomeID SomeAlias SomeCurr SomeClass
OtherID OtherAlias OtherCurr OtherDate
AnotherID AnotherAlias AnotherCurr AnotherDate
I am using the csv package in Python, so I have:
with open(theFile, 'rb') as csvfile:
    theReader = csv.DictReader(csvfile, delimiter=',')
Which, as I understand it, creates the reader object theReader. How do I subset this into several dictionaries, splitting them on the header rows in the original csv file? Is there a simple, elegant, non-loop way to create a list of dictionaries (or even a dictionary of dictionaries, with account IDs as keys)? Does that make sense?
Oh. Please note the header rows are not identical, but they will always begin with 'ClientAccountID'.
Thanks to @codie, I am now using the following to split the csv into several dicts, based on using the '\t' delimiter:
with open(theFile, 'rb') as csvfile:
    theReader = csv.DictReader(csvfile, delimiter='\t')
However, I now get the entire header row as a key, and each other row as a value. How do I split this up further?
Thanks to @Benjamin Hodgson below, I have the following:
from csv import DictReader
from io import BytesIO

stringios = []
with open('file.csv', 'r') as f:
    stringio = None
    for line in f:
        if line.startswith('ClientAccountID'):
            if stringio is not None:
                stringios.append(stringio)
            stringio = BytesIO()
        stringio.write(line)
        stringio.write("\n")
    stringios.append(stringio)

data = [list(DictReader(x.getvalue(), delimiter=',')) for x in stringios]
If I print the first item in stringios, I get what I would expect: it looks like a single csv. However, if I print the first item in data, using the code below, I get something odd:
for row in data[0]:
    print row
It returns:
{'C':'U'}
{'C':'S'}
{'C':'D'}
...
So it appears it is splitting on every character instead of using the comma delimiter. (Passing x.getvalue() hands DictReader a plain string, and iterating over a string yields one character at a time rather than one line at a time.)
If I've understood your question correctly, you have a single CSV file which contains multiple tables. Tables are delimited by header rows which always begin with the string "ClientAccountID".
So the job is to read the CSV file into a list of lists-of-dictionaries. Each entry in the list corresponds to one of the tables in your CSV file.
Here's how I'd do it:
Break up the single CSV file with multiple tables into multiple files each with a single table. (These files could be in-memory.) Do this by looking for lines which start with "ClientAccountID".
Read each of these files into a list of dictionaries using a DictReader.
Here's some code to read the file into a list of StringIOs. (A StringIO is an in-memory file. It works by wrapping a string up into a file-like interface).
from csv import DictReader
from io import StringIO

stringios = []
with open('file.csv', 'r') as f:
    stringio = None
    for line in f:
        if line.startswith('ClientAccountID'):
            if stringio is not None:
                stringio.seek(0)
                stringios.append(stringio)
            stringio = StringIO()
        stringio.write(line)
        stringio.write("\n")
    stringio.seek(0)
    stringios.append(stringio)
If we encounter a line starting with 'ClientAccountID', we put the current StringIO into the list and start writing to a new one. When you've finished, remember to add the last one to the list too.
Don't forget (as I did, in an earlier version of this answer) to rewind the StringIO after you've written to it using stringio.seek(0).
Now it's straightforward to loop over the StringIOs to get a table of dictionaries.
data = [list(DictReader(x, delimiter='\t')) for x in stringios]
For each file-like object in the list stringios, create a DictReader and read it into a list.
It's not too hard to modify this approach if your data is too big to fit into memory. Use generators instead of lists and do the processing line-by-line.
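For example, a sketch of a fully streaming variant: a generator that tracks the most recent header row and yields one dict per data line, so no table is ever buffered in memory (tab-delimited, as above):
import csv

def rows(path):
    with open(path) as f:
        fieldnames = None
        for line in csv.reader(f, delimiter='\t'):
            if not line:
                continue                       # skip blank lines
            if line[0] == 'ClientAccountID':
                fieldnames = line              # a new table starts here
            elif fieldnames:
                yield dict(zip(fieldnames, line))

for row in rows('file.csv'):
    print(row)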
If your data is not comma or tab delimited, you could use str.split; you can combine it with itertools.groupby to separate the header rows from the data rows:
from itertools import groupby, izip, imap

with open("test.txt") as f:
    grps, data = groupby(imap(str.split, f), lambda x: x[0] == "ClientAccountID"), []
    for k, v in grps:
        if k:
            names = next(v)
            vals = izip(*next(grps)[1])
            data.append(dict(izip(names, vals)))

from pprint import pprint as pp
pp(data)
Output:
[{'AccountAlias': ('SomeAlias', 'OtherAlias'),
  'ClientAccountID': ('SomeID', 'OtherID'),
  'CurrencyPrimary': ('SomeCurr', 'OtherCurr'),
  'FromDate': ('SomeDate', 'OtherDate')},
 {'AccountAlias': ('SomeAlias', 'OtherAlias', 'AnotherAlias'),
  'AssetClass': ('SomeClass', 'OtherDate', 'AnotherDate'),
  'ClientAccountID': ('SomeID', 'OtherID', 'AnotherID'),
  'CurrencyPrimary': ('SomeCurr', 'OtherCurr', 'AnotherCurr')}]
If it is tab delimited, just change one line (and import csv):
import csv

with open("test.txt") as f:
    grps, data = groupby(csv.reader(f, delimiter="\t"), lambda x: x[0] == "ClientAccountID"), []
    for k, v in grps:
        if k:
            names = next(v)
            vals = izip(*next(grps)[1])
            data.append(dict(izip(names, vals)))
