Editing String and dictionary output from JSON file - python

I have a program which takes in a JSON file, reads it line by line, aggregates the times into four bins depending on the time of day, and then writes the result to a file. However, my output file contains extra characters because a dictionary is being concatenated with a string.
For example this is how the output for one line looks:
dwQEZBFen2GdihLLfWeexA<bound method DataFrame.to_dict of Friday Monday Saturday Sunday Thursday Tuesday Wednesday
Category
Afternoon 0 0 3 2 2 0 1
Evening 20 4 16 11 4 3 5
Night 16 1 19 5 2 5 3>
The memory address is being concatenated into the output file as well.
Here is the code used for creating this specific file:
import json
import ast
import pandas as pd
from datetime import datetime
def cleanStr4SQL(s):
    return s.replace("'","`").replace("\n"," ")

def parseCheckinData():
    #write code to parse yelp_checkin.JSON
    # Add a new column "Time" to the DataFrame and set the values after left padding the values in the index
    with open('yelp_checkin.JSON') as f:
        outfile = open('checkin.txt', 'w')
        line = f.readline()
        # print(line)
        count_line = 0
        while line:
            data = json.loads(line)
            # print(data)
            # jsontxt = cleanStr4SQL(str(data['time']))
            # Parse the json and convert to a dictionary object
            jsondict = ast.literal_eval(str(data))
            outfile.write(cleanStr4SQL(str(data['business_id'])))
            # Convert the "time" element in the dictionary to a pandas DataFrame
            df = pd.DataFrame(jsondict['time'])
            # Add a new column "Time" to the DataFrame and set the values after left padding the values in the index
            df['Time'] = df.index.str.rjust(5, '0')
            # Add a new column "Category" and set the values based on the time slot
            df['Category'] = df['Time'].apply(cat)
            # Create a pivot table based on the "Category" column
            pt = df.pivot_table(index='Category', aggfunc=sum, fill_value=0)
            # Convert the pivot table to a dictionary to get the json output you want
            jsonoutput = pt.to_dict
            # print(jsonoutput)
            outfile.write(str(jsonoutput))
            line = f.readline()
            count_line += 1
        print(count_line)
        outfile.close()
        f.close()
# Define a function to convert the time slots to the categories
def cat(time_slot):
    if '06:00' <= time_slot < '12:00':
        return 'Morning'
    elif '12:00' <= time_slot < '17:00':
        return 'Afternoon'
    elif '17:00' <= time_slot < '23:00':
        return 'Evening'
    else:
        return 'Night'
I was wondering if it was possible to remove the memory location from the output file in some way?
Any advice is appreciated and please let me know if you require any more information.
Thank you for reading

The way you're working with the JSON file amounts to streaming it, which is an unpleasant problem to deal with.
If you're not working with a terribly big JSON file, you're better off with:
with open("input.json", "r") as input_json:
json_data = json.load(input_json)
Then extract the specific entries you need from json_data (just remember it is a dictionary), manipulate them, and populate an output dict intended to be saved.
Also, in Python, if you use the with open(...) syntax, you don't need to close the file afterwards.
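For example, if the file really is one single JSON document rather than one object per line (an assumption; the Yelp checkin dump is often line-delimited), a minimal sketch of that approach might look like this, where the business_id-keyed structure is hypothetical:

import json

with open("input.json", "r") as input_json:
    json_data = json.load(input_json)   # parse the whole file at once

output = {}
for business_id, info in json_data.items():   # hypothetical structure of the dump
    output[business_id] = info                # manipulate each entry as needed

with open("output.json", "w") as output_json:
    json.dump(output, output_json)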

Problem 1: the missing parentheses after to_dict are what produce this "memory address".
Problem 2: to produce valid JSON, you will also need to wrap the output in an array.
Problem 3: converting JSON to/from a string with str or eval is not safe. Use json.loads() and json.dumps().
import json
...
line_chunks = []
outfile.write("[")
while line:
    ...
    jsondict = json.loads(data) # problem 3
    ...
    jsonoutput = pt.to_dict() # problem 1
    ...
outfile.write(json.dumps(line_chunks)) # problems 2 and 3
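Putting the three fixes together, here is a minimal sketch of how the whole loop could look. It reuses the cat helper and file names from the question; folding the business_id into each record (instead of writing it separately) and the default=int guard for NumPy integers are assumptions on my part, not part of the original code:

import json
import pandas as pd

def cat(time_slot):
    if '06:00' <= time_slot < '12:00':
        return 'Morning'
    elif '12:00' <= time_slot < '17:00':
        return 'Afternoon'
    elif '17:00' <= time_slot < '23:00':
        return 'Evening'
    else:
        return 'Night'

results = []
with open('yelp_checkin.JSON') as f, open('checkin.txt', 'w') as outfile:
    for line in f:
        record = json.loads(line)                 # parse each line safely (problem 3)
        df = pd.DataFrame(record['time'])
        df['Time'] = df.index.str.rjust(5, '0')
        df['Category'] = df['Time'].apply(cat)
        pt = df.pivot_table(index='Category', aggfunc='sum', fill_value=0)
        results.append({'business_id': record['business_id'],
                        'time': pt.to_dict()})    # note the parentheses (problem 1)
    # one valid JSON array for the whole file (problem 2);
    # default=int guards against NumPy integer types that json can't serialize
    outfile.write(json.dumps(results, default=int))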

Related

Comparing and updating CSV files using lists

I'm writing something that takes two CSVs: #1 is a list of email addresses with the number of messages received for each; #2 is a catalog of every email address on record, with a count of received emails per reporting period and the date annotated at the top of each column.
import csv
from datetime import datetime

datestring = datetime.strftime(datetime.now(), '%m-%d')

storedEmails = []
newEmails = []
sortedList = []
holderList = []

with open('working.csv', 'r') as newLines, open('archive.csv', 'r') as oldLines: #readers to make lists
    f1 = csv.reader(newLines, delimiter=',')
    f2 = csv.reader(oldLines, delimiter=',')
    print('Processing new data...')
    for row in f2:
        storedEmails.append(list(row)) #add archived data to a list
    storedEmails[0].append(datestring) #append header row with new date column
    for col in f1:
        if col[1] == 'email' and col[2] == 'To Address': #new list containing new email data
            newEmails.append(list(col))
    counter = len(newEmails)
    n = len(storedEmails[0]) #using header row len to fill zeros if no email received
    print(storedEmails[0])
    print(n)

print('Updating email lists and tallies, this could take a minute...')
with open('archive.csv', 'w', newline='') as toWrite: #writer to overwrite old csv
    writer = csv.writer(toWrite, delimiter=',')
    for i in newEmails:
        del i[:3] #strip useless identifiers from data
        if int(i[1]) > 30: #only keep emails with sufficient traffic
            sortedList.append(i) #add these emails to new sorted list
    for i in storedEmails:
        for entry in sortedList: #compare stored emails with the new emails, on match append row with new # of emails
            if i[0] == entry[0]:
                i.append(entry[1])
                counter -= 1
            else:
                holderList.append(entry) #if no match, it is a new email that meets criteria to land itself on the list
            break #break inner loop after iteration of outer email, to move to next email and avoid multiple entries
    storedEmails = storedEmails + holderList #combine lists for archived csv rewrite
    for i in storedEmails:
        if len(i) < n:
            i.append('0') #if email on list but didnt have any activity this period, append with 0 to keep records intact
        writer.writerow(i)

print('SortedList', sortedList)
print(len(sortedList))
print('storedEmails', storedEmails)
print(len(storedEmails))
print('holderList', holderList)
print(len(holderList))
print('There are', counter, 'new emails being added to the list.')
print('All done!')
The CSVs will look similar to this.
working.csv:
1,asdf#email.com,'to address',31
2,fsda#email.com,'to address',19
3,zxcv#email.com,'to address',117
4,qwer#gmail.com,'to address',92
5,uiop#fmail.com,'to address',11
archive.csv:
date,01-sep
asdf#email.com,154
fsda#email.com,128
qwer#gmail.com,77
ffff#xmail.com,63
What I want after processing is:
date,01-sep,27-sep
asdf#email.com,154,31
fsda#email.com,128,19
qwer#gmail.com,77,92
ffff#xmail.com,63,0
zxcv#email.com,0,117
I'm not sure where I've gone wrong, but it keeps producing duplicate entries. Some of the functionality is there, but I've been at it for too long and I'm getting tunnel vision trying to figure out what I've done wrong with my loops.
I know my zero-filler section at the end is wrong as well, since it appends onto the end of a newly created record instead of filling zeros up to its first appearance.
I'm sure there are far more efficient ways to do this; I'm new to programming, so it's probably overly complicated and messy. Initially I tried to compare CSV to CSV and realized that wasn't possible since you can't read and write at the same time, so I switched to using lists, which I also know won't work forever due to memory limitations when the list gets big.
-EDIT-
Using Trenton's pandas solution:
I ran a script on working.csv so it instead produces the following:
asdf#email.com,1000
bsdf#gmail.com,500
xyz#fmail.com,9999
I have modified your solution to reflect this change:
import pandas as pd
from datetime import datetime
import csv

# get the date string
datestring = datetime.strftime(datetime.now(), '%d-%b')

# filter original list to grab only emails of interest
with open('working.csv', 'r') as fr, open('writer.csv', 'w', newline='') as fw:
    reader = csv.reader(fr, delimiter=',')
    writer = csv.writer(fw, delimiter=',')
    for row in reader:
        if row[1] == 'Email' and row[2] == 'To Address':
            writer.writerow([row[3], row[4]])

# read archive
arch = pd.read_csv('archive.csv')
# rename columns
arch.rename(columns={'email': 'date'}, inplace=True)
# read working, but only the two columns that are needed
working = pd.read_csv('writer.csv', header=None, usecols=[0, 1]) # I assume usecols isn't necessary anymore, but I'm not sure
# rename columns
working.rename(columns={0: 'email', 1: datestring}, inplace=True)
# only emails greater than 30 or already in arch
working = working[(working[datestring] > 30) | (working.email.isin(arch.email))]
# merge
arch_updated = pd.merge(arch, working, on='email', how='outer').fillna(0)
# save to csv
arch_updated.to_csv('archive.csv', index=False)
I apparently still have no idea how this works, because I'm now getting:
Traceback (most recent call last):
File "---/agsdga.py", line 29, in <module>
working = working[(working[datestring] > 30) | (working.email.isin(arch.email))]
File "---\Python\Python38-32\lib\site-packages\pandas\core\generic.py", line 5130, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'email'
Process finished with exit code 1
-UPDATE-
It is working now as:
import pandas as pd
from datetime import datetime
import csv

# get the date string
datestring = datetime.strftime(datetime.now(), '%d-%b')

with open('working.csv', 'r') as fr, open('writer.csv', 'w', newline='') as fw:
    reader = csv.reader(fr, delimiter=',')
    writer = csv.writer(fw, delimiter=',')
    for row in reader:
        if row[1] == 'Email' and row[2] == 'To Address':
            writer.writerow([row[3], row[4]])

# read archive
arch = pd.read_csv('archive.csv')
# rename columns
arch.rename(columns={'date': 'email'}, inplace=True)
# read working, but only the two columns that are needed
working = pd.read_csv('writer.csv', header=None, usecols=[0, 1])
# rename columns
working.rename(columns={0: 'email', 1: datestring}, inplace=True)
# only emails greater than 30 or already in arch
working = working[(working[datestring] > 30) | (working.email.isin(arch.email))]
# merge
arch_updated = pd.merge(arch, working, on='email', how='outer').fillna(0)
# save to csv
arch_updated.to_csv('archive.csv', index=False)
The errors above were caused because I had changed
arch.rename(columns={'date': 'email'}, inplace=True)
to
arch.rename(columns={'email': 'date'}, inplace=True)
I ran into further complications because I had stripped the header row from the test archive, since I didn't think the header mattered; even with header=None I still got issues. I'm still not clear why the header is so important when we are assigning our own values to the columns for the purposes of the dataframe, but it's working now. Thanks for all the help!
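For anyone else puzzled by the same thing, here is a minimal illustration of the difference, using an in-memory string instead of the real files (StringIO is only there for the demo):

import pandas as pd
from io import StringIO

csv_text = "asdf#email.com,1000\nbsdf#gmail.com,500\n"

# default behaviour: the first data row is consumed as the header
df_default = pd.read_csv(StringIO(csv_text))
print(df_default.columns.tolist())    # ['asdf#email.com', '1000'] -- one row lost

# header=None: pandas assigns integer labels, so rename(columns={0: 'email', ...}) works
df_no_header = pd.read_csv(StringIO(csv_text), header=None)
print(df_no_header.columns.tolist())  # [0, 1]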
I'd load the data with pandas.read_csv and .rename some columns.
Renaming the columns in working is dependent upon the column index, since working.csv has no column headers.
When the working dataframe is created, look at the dataframe to verify the correct columns have been loaded and that the correct column index is being used for renaming.
The date column of arch should really be email, because headers identify what's below them, not the other column headers.
Once the column name has been changed in archive.csv, the rename won't be required any longer.
pandas.merge on the email column.
Since both dataframes have a column named email after renaming, the merged result will only have one email column.
If the merge occurs on two different column names, then the result will have two columns containing email addresses.
pandas: Merge, join, concatenate and compare
As long as the columns in the files are consistent, this should work without modification:
import pandas as pd
from datetime import datetime
# get the date string
datestring = datetime.strftime(datetime.now(), '%d-%b')
# read archive
arch = pd.read_csv('archive.csv')
# rename columns
arch.rename(columns={'date': 'email'}, inplace=True)
# read working, but only the two columns that are needed
working = pd.read_csv('working.csv', header=None, usecols=[1, 3])
# rename columns
working.rename(columns={1: 'email', 3: datestring}, inplace=True)
# only emails greater than 30 or already in arch
working = working[(working[datestring] > 30) | (working.email.isin(arch.email))]
# merge
arch_updated = pd.merge(arch, working, on='email', how='outer').fillna(0)
# save to csv
arch_updated.to_csv('archive.csv', index=False)
# display(arch_updated)
email 01-sep 27-Aug
asdf#email.com 154.0 31.0
fsda#email.com 128.0 19.0
qwer#gmail.com 77.0 92.0
ffff#xmail.com 63.0 0.0
zxcv#email.com 0.0 117.0
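If the trailing .0 in the counts is unwanted (the outer merge plus fillna(0) promotes the columns to float), one way to cast them back, assuming every count is a whole number, would be:

# cast the count columns back to integers after the merge and fillna
count_cols = [c for c in arch_updated.columns if c != 'email']
arch_updated[count_cols] = arch_updated[count_cols].astype(int)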
So, the problem is you have two sets of data. Both store the data under a "key" (the emails) plus an additional piece of data that you want condensed down into one place. Identifying that there is a common "key" for both of these sets of data simplifies this greatly.
Imagine each key as being the name of a bucket. Each bucket needs two pieces of info, one piece from one csv and the other piece from the other csv.
Now, I must take a small detour to explain a dictionary in python. Here is a definition stolen from here
A dictionary is a collection which is unordered, changeable and indexed.
A collection is a container, like a list, that holds data. Unordered and indexed means that the dictionary is not accessible like a list, where the data is accessible by index. In this case, the dictionary is accessed using keys, which can be anything like a string or a number (technically the key must be hashable, but that's too in-depth). And finally, changeable means that the dictionary can actually have its stored data changed (once again, oversimplified).
Example:
dictionary = dict()
key = "Something like a string or a number!"
dictionary[key] = "any kind of value can be stored here! Even lists and other dictionaries!"
print(dictionary[key]) # Would print the above string
Here is the structure that I suggest you use instead of most of your lists:
dictionary[email] = [item1, item2]
This way, you can avoid using multiple lists and massively simplify your code. If you are still iffy on the usage of dictionaries, there are a lot of articles and videos on them. Good luck!
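To make this concrete, here is a rough sketch of the dictionary-based approach for your two files. The column positions and the 30-email threshold are taken from the sample data in the question; treat it as a starting point rather than a drop-in replacement:

import csv
from datetime import datetime

datestring = datetime.strftime(datetime.now(), '%d-%b')
tallies = {}   # email -> {date_column: count}

# old archive: first row is the header with the existing date columns
with open('archive.csv', 'r') as f:
    reader = csv.reader(f)
    header = next(reader)                      # e.g. ['date', '01-sep']
    old_dates = header[1:]
    for row in reader:
        tallies[row[0]] = dict(zip(old_dates, row[1:]))

# new period: keep only rows with enough traffic or already-known emails
with open('working.csv', 'r') as f:
    for row in csv.reader(f):
        email, count = row[1], int(row[3])
        if count > 30 or email in tallies:
            tallies.setdefault(email, {})[datestring] = count

# rewrite the archive, filling zeros for missing periods
all_dates = old_dates + [datestring]
with open('archive.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['date'] + all_dates)
    for email, counts in tallies.items():
        writer.writerow([email] + [counts.get(d, '0') for d in all_dates])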

Filling Data to pandas dataframe using loop with text file that have missing data

I am working on large log files (4 GB) with thousands of variables (ABCD, GFHTI, AAAA, BBBB, ...), but I am only interested in 50 of these variables (ABCD, GFHTI, ...). The structure of the log file is as follows:
20100101_00:01:33.436-92.451 BLACKBOX ABCD ref 2183 value 24
20100101_00:01:33.638-92.651 BLACKBOX GFHTI ref 2183 value 25
20100101_00:01:33.817-92.851 BLACKBOX AAAA ref 2183 value 26 (Not interested in this one)
20100101_00:01:34.017-93.051 BLACKBOX BBBB ref 2183 value 27 (Not interested in this one)
I am trying to make a pandas data frame out of this log file that looks like this:
Time ABCD GFHTI
20100101_00:01:33.436-92.451 24 NaN
20100101_00:01:33.638-92.651 NaN 25
I could do this by using a loop and appending to a pandas data frame, but that is not very efficient. I can find the values and dates of interest in the log files, but I don't know how to put NaN for the rest of the variables for that specific date and time and then, at the end, convert it to a data frame.
I would really appreciate it if anyone could help.
Here is part of my code
ListOfData = {}
fruit = {ABCD, GFHTI}
for file in FileList:
    i = i + 1
    thefile = open('CleanLog' + str(i) + '.txt', 'w')
    with open(file, 'rt') as in_file:
        i = 0
        for linenum, line in enumerate(in_file): # Keep track of line numbers.
            if fruit.search(line) != None: # If substring search finds a match,
                i = i + 1
                Loc = (fruit.search(line))
                d = [{'Time': line[0:17], Loc.group(0): line[Loc.span()[1]:-1]}]
                for word in Key:
                    if word == Loc.group(0):
                        ListOfData.append(d)
You can parse the log file and only return the information of interest to the DataFrame constructor.
To parse the log lines I'm using a regex here, but the actual parsing function should depend on your log format. I also assume the log file is at the path log.txt relative to where this script is run.
import pandas as pd
import re

def parse_line(line):
    code_pattern = r'(?<=BLACKBOX )\w+'
    value_pattern = r'(?<=value )\d+'
    code = re.findall(code_pattern, line)[0]
    value = re.findall(value_pattern, line)[0]
    ts = line.split()[0]
    return ts, code, value

def parse_filter_logfile(fname):
    with open(fname) as f:
        for line in f:
            data = parse_line(line)
            if data[1] in ['ABCD', 'GFHTI']:
                # only yield rows that match the filter
                yield data
Then feed that generator to construct a data frame
logparser = parse_filter_logfile('log.txt')
df = pd.DataFrame(logparser, columns = ['Time', 'Code', 'Value'])
Finally, pivot the data frame using either of the two statements below:
df.pivot(index='Time', columns='Code')
df.set_index(['Time', 'Code']).unstack(-1)
outputs the following:
                              Value
Code                           ABCD GFHTI
Time
20100101_00:01:33.436-92.451     24  None
20100101_00:01:33.638-92.651   None    25
Hopefully you have enough information to tackle your log file. The tricky part here is dealing with the log line parsing, and you'd have to adapt my example function to get it right.
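If the extra 'Value' level in the columns bothers you, a hedged tweak is to pivot on just that column and convert the parsed strings to numbers:

# keep only the Value column in the pivot and turn the strings into numbers
wide = df.pivot(index='Time', columns='Code', values='Value').apply(pd.to_numeric)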
When you work with pandas, there is no need to read the file by hand in a loop:
data = pd.read_csv('CleanLog.txt', sep='\s+', header=None)
Use time (#0) and variable name (#2) as index, keep the column with variable values (#6).
columns_of_interest = ['ABCD','GFHTI']
data.set_index([0,2])[6].unstack()[columns_of_interest].dropna(how='all')
#2 ABCD GFHTI
#0
#20100101_00:01:33.436-92.451 24.0 NaN
#20100101_00:01:33.638-92.651 NaN 25.0
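Since the question mentions 4 GB files, a variant of the same idea that reads the log in chunks may help; the chunksize below is an assumption, so tune it to your memory budget:

import pandas as pd

columns_of_interest = ['ABCD', 'GFHTI']
pieces = []
for chunk in pd.read_csv('CleanLog.txt', sep=r'\s+', header=None, chunksize=500_000):
    # keep only the variables of interest before pivoting each chunk
    chunk = chunk[chunk[2].isin(columns_of_interest)]
    pieces.append(chunk.set_index([0, 2])[6].unstack())
data = pd.concat(pieces)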

Pandas skipping malformed line in csv

I am trying to read a CSV file with pandas. The file is very long and malformed in the middle, like so:
Date,Received Date,Tkr,Theta,Wid,Per
2007-08-03,2017/02/13 05:30:G,F,B,A,1
2007-08-06,2017/02/13 05:30:G,F,A,B,1
2007-08-07,2017/02/13 05:30:G,F,A,B,1
2007-08-,nan,,,,
2000-05-30 00:00:00,2017/02/14 05:30:F,D,10,1,1
2000-05-31 00:00:00,2017/02/14 05:30:F,D,10,1,1
My line which is failing is this:
full_frame = pd.read_csv(path, parse_dates=["Date"],error_bad_lines=False).set_index("Date").sort_index()[:date]
with the error
TypeError: unorderable types: str() > datetime.datetime()
File "/A/B/C.py", line 236, in load_ex
full_frame = pd.read_csv(path, parse_dates=["Date"],error_bad_lines=False).set_index("Date").sort_index()[:date]
date is just a variable that holds a given input date.
This is happening because of the broken line in the middle. I have tried
error_bad_lines=False, but that won't prevent my script from failing.
When I take out the bad line from my CSV and run it, it works fine. This CSV is used as an input and I can't modify it at the source, so I was wondering whether there is a way in pandas to skip a line based on its length, or something else I can do to make it work without duplicating/modifying the file.
UPDATE
The bad line is stored in my data frame if I simply do a plain
read_csv
as 2007-08- NaN NaN NaN NaN NaN
UPDATE 2:
If I try to just do
full_frame = pd.read_csv(path, parse_dates=["Date"],error_bad_lines=False)
full_frame = full_frame.dropna(how="any")
# this drops the NaN row for sure
full_frame = full_frame.set_index("Date").sort_index()[:date]
it still gives the same error :(
So I gave this a quick shot. Your data has inconsistencies which may be of concern for your analysis, and you should investigate; analysis is only as good as the data quality.
Here's some code (not the best, but gets the job mostly done)
First, since your data needs some work, I read it in as raw text. Then I write a function to parse the dates. I collect the columns in one list, and the rest of the data in another.
For all the data that needs to have dates, I loop over the data 1 line at a time and pass it through parse_dates.
parse_dates works by reading in a list, grabbing the first item in the list (the date part), then trying to convert it from a simple string to a date. Since not all rows have a full datetime, I only grab the first 10 characters, for just the date.
Once I have cleaner data, I pass it through pandas and obtain a dataframe. Then I set the date as the index. This could be improved upon, but given that this is not my job, I'll let you do the rest.
import pandas as pd
import datetime as dt

rawdata = []
with open("test.dat", "r") as stuff:
    for line in stuff:
        line1 = line[:-1]
        rawdata.append(line1.split(","))

def parse_dates(line):
    datepart = line[0][:10] ## get the date-time, and for the date-time, only get the date part
                            ## since not all rows have date + time, cut it down to date
    try:
        result = dt.datetime.strptime(datepart, "%Y-%m-%d") ## try converting to date
    except ValueError:
        result = None
    line[0] = result ## update
    return line

cols = rawdata[0]
data = rawdata[1:]
print(data)
data = [parse_dates(line) for line in data]
print(data)
df = pd.DataFrame(data=data, columns=cols)
print(df)
df.index = df['Date']
Also, a simple Google search shows plenty of ways of handling dates with Python+pandas. Here is one link I found:
https://chrisalbon.com/python/strings_to_datetime.html
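Alternatively, staying inside pandas, one option is to let read_csv keep "Date" as text, coerce it afterwards, and drop whatever fails to parse; this is a sketch assuming the column names from the sample above, with path and date being the same variables as in the question:

import pandas as pd

full_frame = pd.read_csv(path)
# invalid dates such as '2007-08-' become NaT instead of raising
full_frame['Date'] = pd.to_datetime(full_frame['Date'], errors='coerce')
full_frame = full_frame.dropna(subset=['Date']).set_index('Date').sort_index()[:date]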

searching for a value in csv and returning the row multiple times

I'm a noob trying to learn Python.
I am trying to write a script for a CSV file that has 30,000 rows of data.
I would like to look through every row for a number in a column and return the row every time it finds that number.
I have searched and tried many different suggestions and they don't seem to do what I need; can anyone help me? If I'm not making sense, please let me know.
Here is what I have so far, and it is only returning the headers:
import csv

with open("test.csv", "r") as input, open("result.txt", "w") as result:
    testfilereader = csv.DictReader(input)
    Age = 23
    fieldnames = testfilereader.fieldnames
    testfilewriter = csv.DictWriter(result, fieldnames, delimiter=',',)
    testfilewriter.writeheader()
    for row in testfilereader:
        for field in row:
            if field == Age:
                testfilewriter(row)
input.close
thanks all
You can use Pandas as follows:
csv file:
Id,Name,Age
1,John,30
2,Alex,20
3,Albert,30
4,Richard,30
5,Mariah,30
python:
import pandas as pd
df = pd.read_csv("ex.csv", sep=",")
print(df[df["Age"] == 30])
Id Name Age
0 1 John 30
2 3 Albert 30
3 4 Richard 30
4 5 Mariah 30
You can use the pandas module which is made for processing tabular data.
First: read your csv into a so called DataFrame:
import pandas as pd
df = pd.read_csv("test.csv")
Now you can filter the rows that you need by logical indexing:
result = df[df['Age']==23]
To get the result back onto disk just use the to_csv method:
result.to_csv('result.csv')
Since you used DictReader, you get a dictionary for each row.
So you should search for the age in the field you want by using row['field'],
like this:
with open("test.csv", "r") as input, open ("result.txt","w") as result:
testfilereader = csv.DictReader(input)
Age = 23
fieldnames = testfilereader.fieldnames()
testfilewriter = csv.DictWriter(result, fieldnames, delimiter=',',)
testfilewriter.writeheader()
for row in testfilereader:
if row['Age'] == Age:
testfilewriter.writerow(row)
Of course, if the field name is something else you need to change row['Age'] to row['Somethingelse'].
If you just want to iterate over the values, you could use row.values(), but then there would be no point in getting the data mapped to a dictionary in the first place.
You also shouldn't try to close the input there. It will be closed when it leaves the with open... block.
Hi all, thank you for all your posts. I have had some trouble with my computer and installing pandas, so I had to try another way, and this has worked for me:
import csv
import sys

number = '5'
csv_file = csv.reader(open('Alerts.csv', "rb"), delimiter=",")
filename = open("Result.txt", 'w')
sys.stdout = filename

#loop through csv list
for row in csv_file:
    if number == row[0]:
        print row
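For reference, the same idea in Python 3 syntax (where csv files are opened in text mode and print is a function) might look roughly like this:

import csv

number = '5'
with open('Alerts.csv', newline='') as csv_file, open('Result.txt', 'w') as result:
    # loop through the csv and write matching rows to the result file
    for row in csv.reader(csv_file, delimiter=','):
        if number == row[0]:
            print(row, file=result)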

Python CSV - Check if index is equal on different rows

I'm trying to create code that checks whether the value in the index column of a CSV is the same in different rows, and if so, finds the most frequently occurring values in the other columns and uses those as the final data. That's not a very good explanation; basically, I want to take this data.csv:
customer_ID,month,time,A,B,C
1003,Jan,2:00,1,1,4
1003,Jul,2:00,1,1,3
1003,Jan,2:00,1,1,4
1004,Feb,8:00,2,5,1
1004,Jul,8:00,2,4,1
And create a new answer.csv that recognizes that there are multiple rows for the same customer, so it finds the values that occur the most in each column and outputs those into one row:
customer_ID,month,ABC
1003,Jan,114
1004,Feb,251
I'd also like to learn how, if there are values with the same number of occurrences (month and B for customer 1004), I can choose which one gets output.
I've currently written (thanks to Andy Hayden on a previous question I just asked):
import pandas as pd
df = pd.read_csv('data.csv', index_col='customer_ID')
res = df[list('ABC')].astype(str).sum(1)
print(df)
res.to_frame(name='answer').to_csv('answer.csv')
All this does, however, is create this (I was ignoring month previously, but now I'd like to incorporate it so that I can learn how to not only find the mode of a column of numbers, but also the most occurring string):
customer_ID,ABC
1003,114.0
1003,113.0
1003,114.0
1004,251.0
1004,241.0
Note: I don't know why it is outputting the .0 at the end of the ABC; it seems to be in the wrong variable format. I want each column to be output as just the 3-digit number.
Edit: I'm also having an issue where, if the value in column A is 0, the output becomes 2 digits and does not keep the leading 0.
What about something like this? It is not using pandas, though; I am not a pandas expert.
from collections import Counter

dataDict = {}
# Read the csv file, line by line
with open('data.csv', 'r') as dataFile:
    for line in dataFile:
        # split the line by ',' since it is a csv file...
        entry = line.split(',')
        # Check to make sure that there is data in the line
        if entry and len(entry[0]) > 0:
            # if the customer_id is not in dataDict, add it
            if entry[0] not in dataDict:
                dataDict[entry[0]] = {'month': [entry[1]],
                                      'time': [entry[2]],
                                      'ABC': [''.join(entry[3:])],
                                      }
            # customer_id is already in dataDict, add values
            else:
                dataDict[entry[0]]['month'].append(entry[1])
                dataDict[entry[0]]['time'].append(entry[2])
                dataDict[entry[0]]['ABC'].append(''.join(entry[3:]))

# Now write the output file
with open('out.csv', 'w') as f:
    # Loop through sorted customers
    for customer in sorted(dataDict.keys()):
        # use Counter to find the most common entries
        commonMonth = Counter(dataDict[customer]['month']).most_common()[0][0]
        commonTime = Counter(dataDict[customer]['time']).most_common()[0][0]
        commonABC = Counter(dataDict[customer]['ABC']).most_common()[0][0]
        # Write the line to the csv file
        f.write(','.join([customer, commonMonth, commonTime, commonABC, '\n']))
It generates a file called out.csv that looks like this:
1003,Jan,2:00,114,
1004,Feb,8:00,251,
customer_ID,month,time,ABC,
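Note that the header line of data.csv is treated as just another customer, which is why customer_ID,month,time,ABC, shows up as the last row of out.csv. If that's unwanted, skipping the header before building dataDict is enough; a minimal illustration:

with open('data.csv', 'r') as dataFile:
    next(dataFile)  # consume the header so it is not counted as a customer_ID
    for line in dataFile:
        entry = line.split(',')
        print(entry[0])  # only real customer IDs remain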
