I am looping through a JSON Lines file, filtering for sender ID and status, and outputting them to the terminal. Multiple sender IDs come as a list, whilst single sender IDs are just a string. I want to write the output to one CSV file where the first column is STATUS and the second one is SENDER_ID. I have attempted this at the top of my script but am not sure if this is the right way of doing so.
My script is as follows. At which point would I need to write it to CSV? I have read through the documentation but am still a little unsure.
import json_lines

text_file = open("senderv1.csv", "a")
with open('specifications.jsonl', 'rb') as f:
    for item in json_lines.reader(f):
Using pandas you can create a dataframe and then save it as CSV. Hope this solves your problem.
import json_lines
import pandas as pd

single_sender_status = []
single_sender = []
with open('specifications.jsonl', 'rb') as f:
    for item in json_lines.reader(f):
        if 'sender_id' in item:
            # Single sender: the ID is a plain string.
            single_sender_status.append(item['status'])
            single_sender.append(item['sender_id'])
        else:
            # Multiple senders: collect the IDs into a list.
            single_sender_status.append(item['status'])
            single_sender.append([sender['id'] for sender in item['senders']])

df = pd.DataFrame({'STATUS': single_sender_status, 'SENDER_ID': single_sender})
df.to_csv('senderv1.csv', index=False)
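Note that with this approach the SENDER_ID column holds a mix of plain strings and Python lists. If you'd rather have one row per sender, a minimal follow-up sketch (assuming pandas 0.25+, where explode() expands list entries and leaves scalar entries unchanged):

# One row per sender ID; list cells are expanded, string cells kept as-is.
df = df.explode('SENDER_ID')
df.to_csv('senderv1.csv', index=False)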
Here is code to write a CSV file with the csv module from the standard library, where the first column contains the status and the following columns the senders:
#!/usr/bin/env python3
import csv

import json_lines


def main():
    with json_lines.open("specifications.jsonl") as reader:
        with open("senderv1.csv", "w", encoding="utf8", newline="") as csv_file:
            writer = csv.writer(csv_file, delimiter="\t")
            for item in reader:
                row = [item["status"]]
                if "sender_id" in item:
                    row.append(item["sender_id"])
                elif "senders" in item:
                    row.extend(sender["id"] for sender in item["senders"])
                else:
                    raise ValueError("item with no sender information")
                writer.writerow(row)


if __name__ == "__main__":
    main()
Having the same information spread across different columns isn't really good, but putting more than one value into a single cell isn't good either. CSV is best suited for two-dimensional tabular data. Maybe you want JSON Lines for the result too‽
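If JSON Lines output would work for you, here is a minimal sketch (assuming the same item structure as above; the output filename is just a placeholder):

import json

import json_lines

with json_lines.open("specifications.jsonl") as reader:
    with open("senders_out.jsonl", "w", encoding="utf8") as out_file:
        for item in reader:
            if "sender_id" in item:
                sender_ids = [item["sender_id"]]
            else:
                sender_ids = [sender["id"] for sender in item["senders"]]
            # One JSON object per line; sender_ids is always a list here.
            out_file.write(json.dumps(
                {"status": item["status"], "sender_ids": sender_ids}
            ) + "\n")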
I have a folder that contains 60 folders, each of which contains about 60 CSVs (and 1 or 2 non-CSVs).
I need to compare the header rows of all of these CSVs, so I am trying to go through the directories and write to an output CSV (1) the filepath of the file in question and (2) the header row in the subsequent cells in the row in the output CSV.
Then go to the next file, and write the same information in the next row of the output CSV.
I am lost in the part where I am writing the header rows to the CSV -- and am too lost to have even generated an error message.
Can anyone advise on what to do next?
import os
import sys
import csv

csvfile = '/Users/username/Documents/output.csv'

def main(args):
    # Open a CSV for writing outputs to
    with open(csvfile, 'w') as out:
        writer = csv.writer(out, lineterminator='\n')
        # Walk through the directory specified in cmd line
        for root, dirs, files in os.walk(args):
            for item in files:
                # Check if the item is a CSV
                if item.endswith('.csv'):
                    # If yes, read the first row
                    with open(item, newline='') as f:
                        reader = csv.reader(f)
                        row1 = next(reader)
                    # Write the first cell as the file name
                    f.write(os.path.realpath(item))
                    f.write(f.readline())
                    f.write('\n')
                    # Write this row to a new line in the csvfile var
                    # Go to next file
                # If not a CSV, go to next file
                else:
                    continue
        # Write each file to the CSV
        # writer.writerow([item])

if __name__ == '__main__':
    main(sys.argv[1])
IIUC you need a new CSV file with 2 columns: file_path and headers.
If the header you need is just the list of column names from each CSV, it will be easier to store these values in a pandas dataframe first and then write the dataframe to a CSV.
import os
import pandas as pd

res = []
for root, dirs, files in os.walk(args):
    for item in files:
        # Check if the item is a CSV
        if item.endswith('.csv'):
            # If yes, read it and record its path and column names
            path = os.path.join(root, item)
            df = pd.read_csv(path)
            row = {}
            row['file_path'] = os.path.realpath(path)
            row['headers'] = list(df.columns)
            res.append(row)

res_df = pd.DataFrame(res)
res_df.to_csv(csvfile)
You seem to be getting confused about which file you're reading and which you're writing to. Confusion is normal when you try to do everything in one big function. The whole point of functions is to break things down so the code is easy to follow, understand and debug.
Here is some code (untested, so treat it as a sketch), but you can easily print out what each function is returning, and once you know that's correct, feed it to the next function. Each function is small, with very few variables, so not much can go wrong.
And most importantly, the variables in each function are local to it, meaning they cannot interfere with what's happening elsewhere, or even confuse you into thinking they might be interfering (and that makes a huge difference).
import csv
import os
import sys

def collect_csv_data(directory):
    results = []
    for root, dirs, files in os.walk(directory):
        for file in files:
            if file.endswith('.csv'):
                headers = extract_headers(os.path.join(root, file))
                # One output row: the file name, then its header cells.
                results.append([file] + headers)
    return results

def extract_headers(filepath):
    with open(filepath, newline='') as f:
        reader = csv.reader(f)
        headers = next(reader)
    return headers

def write_results(results, filepath):
    with open(filepath, 'w', newline='') as f:
        writer = csv.writer(f)
        for result in results:
            writer.writerow(result)

if __name__ == '__main__':
    directory = sys.argv[1]
    results = collect_csv_data(directory)
    write_results(results, 'results.csv')
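For example, to check the intermediate result before wiring everything together (the directory path here is just a placeholder):

# Inspect what collect_csv_data() returns before writing anything.
results = collect_csv_data('/path/to/top/folder')
for row in results[:5]:
    print(row)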
I am new to Python. I have a .csv file which has 13 columns. I want to round off the floating-point values of the 2nd column, which I was able to do successfully and store in a list. Now I cannot figure out how to overwrite the rounded-off values into the same CSV file, in the same column, i.e. column 2. I am using Python 3. Any help will be much appreciated.
My code is as follows:
Import statements for module import:
import csv
Creating an empty list:
list_string = []
Reading the csv file:
with open('/home/user/Desktop/wine.csv', 'r') as csvDataFile:
    csvReader = csv.reader(csvDataFile, delimiter = ',')
    next(csvReader, None)
    for row in csvReader:
        floatParse = float(row[1])
        closestInteger = int(round(floatParse))
        stringConvert = str(closestInteger)
        list_string.append(stringConvert)
print(list_string)
Writing into the same csv file for the second column (this overwrites the entire file):
with open('/home/user/Desktop/wine.csv', 'w') as csvDataFile:
    writer = csv.writer(csvDataFile)
    next(csvDataFile)
    row[1] = list_string
    writer.writerows(row[1])
PS: The writing into the CSV overwrites the entire CSV and removes all the other columns, which I don't want. I just want to overwrite the 2nd column with the rounded-off values and keep the rest of the data the same.
This might be what you're looking for:
import pandas as pd

# Some sample data
data = {"Document_ID": [102994, 51861, 51879, 38242, 60880, 76139, 76139],
        "SecondColumnName": [7.256, 1.222, 3.16547, 4.145658, 4.154656, 6.12, 17.1568],
        }
wine = pd.DataFrame(data)

# This is how you'd read in your data
# wine = pd.read_csv('/home/user/Desktop/wine.csv')

# Replace SecondColumnName with the real column name
wine["SecondColumnName"] = wine["SecondColumnName"].map('{:,.2f}'.format)

# This will overwrite the file, but it will have all the data as before
wine.to_csv('/home/user/Desktop/wine.csv')
Pandas is way easier than the csv module... I'd recommend checking it out.
I think this better answers the specific question. The key is to define an input_file and an output_file in the with statement.
The StringIO is just there to provide sample data for this example. newline='' is for Python 3; without it, blank lines appear between each row in the output.
import csv
from io import StringIO

s = '''A,B,C,D,E,F,G,H,I,J,K
1,4.4343,3,4,5,6,7,8,9,10,11
1,8.6775433,3,4,5,6,7,8,9,10,11
1,16.83389832,3,4,5,6,7,8,9,10,11
1,32.2711122,3,4,5,6,7,8,9,10,11
1,128.949483,3,4,5,6,7,8,9,10,11'''

with StringIO(s) as input_file, open('output_file.csv', 'w', newline='') as output_file:
    reader = csv.reader(input_file)
    writer = csv.writer(output_file)
    # Copy the header row through unchanged.
    header = next(reader, None)
    writer.writerow(header)
    for row in reader:
        floatParse = float(row[1])
        closestInteger = int(round(floatParse))
        stringConvert = str(closestInteger)
        row[1] = stringConvert
        writer.writerow(row)
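To run this against a real file instead of the in-memory sample, you'd swap the StringIO for a file handle (the filename here is just a placeholder):

with open('wine.csv', newline='') as input_file, \
        open('output_file.csv', 'w', newline='') as output_file:
    ...  # same reader/writer loop as above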
When I run the code below, I only get the first row (my name row) of my CSV file. What can I do to make sure the code below returns my entire CSV?
import csv
import pandas as pd
import numpy as np

def open_elves():
    with open('elves.csv') as csvjawn:
        readCS = csv.reader(csvjawn, delimiter = ',')
        for row in readCS:
            return row

x = pd.DataFrame(open_elves())
print(x)
Use the read_csv function provided by the pandas API:
df = pd.read_csv('elves.csv')
return always exits the function immediately, so your loop stops after the first row. Try e.g.
def f():
    for i in range(100):
        return i

f()
This returns 0, not the full range.
What your flow should look like instead is something like:
with open('elves.csv') as csvjawn:
    readCS = csv.reader(csvjawn, delimiter=',')
    data = [row for row in readCS]
This uses a list comprehension which you may want to research if you haven't seen before.
I wouldn't use return to accomplish what you want to do.
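Putting that together, a minimal sketch of the corrected function (returning the full list after the loop instead of inside it) might be:

import csv
import pandas as pd

def open_elves():
    with open('elves.csv') as csvjawn:
        readCS = csv.reader(csvjawn, delimiter=',')
        # Build the whole list first; return only once the loop is done.
        return [row for row in readCS]

x = pd.DataFrame(open_elves())
print(x)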
The response from fellow Stack Overflow user abarnert:
with open('old.csv', 'rb') as oldf, open('new.csv', 'wb') as newf:
    old_reader = csv.reader(oldf)
    writer = csv.writer(newf)
    for row in old_reader:
        writer.writerow(transform(row))

with open('new.csv', 'rb') as newf:
    new_reader = csv.reader(newf)
    for row in new_reader:
        print(row)
url: CSV reader object not reading entire file [Python]
I have an Excel file (that I am exporting as a CSV) that I want to parse, but I am having trouble finding the best way to do it. The CSV is a list of computers on my network, and which accounts are in the local administrator group on each one. I have done something similar with tuples, but the number of accounts for each computer ranges from 1 to 30. I want to build a list of lists, then go through each list to find the accounts that should be there (Administrator, etc.) and delete them, so that I can then export a list of only accounts that shouldn't be a local admin, but are. The CSV file is formatted as follows:
"computer1" Administrator localadmin useraccount
"computer2" localadmin Administrator
"computer3" localadmin Administrator user2account
Any help would be appreciated
EDIT: Here is the code I am working with
import csv
import sys  # used for passing in the argument

file_name = sys.argv[1]  # filename is argument 1
with open(file_name, 'rU') as f:  # opens PW file
    reader = csv.reader(f)
    data = list(list(rec) for rec in csv.reader(f, delimiter=','))  # reads csv into a list of lists
f.close()  # close the csv

for i in range(len(data)):
    print(data[i][0])  # this alone will print all the computer names
    for j in range(len(data[i])):  # trying to run another for loop to print the usernames
        print(data[i][j])
The issue is with the second for loop. I want to be able to read across each line and for now, just print them.
This should get you on the right track:
import csv
import sys  # used for passing in the argument

file_name = sys.argv[1]  # filename is argument 1
with open(file_name, 'rU') as f:  # opens PW file
    reader = csv.reader(f)
    data = list(list(rec) for rec in csv.reader(f, delimiter=','))  # reads csv into a list of lists

for row in data:
    print(row[0])  # this alone will print all the computer names
    for username in row:  # run another for loop to print the usernames
        print(username)
The last two lines will print all of the row (including the "computer"). Do
for x in range(1, len(row)):
    print(row[x])
... to avoid printing the computer twice.
Note that f.close() is not required when using the "with" construct because the resource will automatically be closed when the "with" block is exited.
Personally, I would just do:
import csv
import sys  # used for passing in the argument

file_name = sys.argv[1]  # filename is argument 1
with open(file_name, 'rU') as f:  # opens PW file
    reader = csv.reader(f)
    # Print every value of every row.
    for row in reader:
        for value in row:
            print(value)
That's a reasonable way to iterate through the data and should give you a firm basis to add whatever further logic is required.
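Building on that, a hedged sketch of the filtering step described in the question (the entries in allowed_accounts are placeholders for whichever accounts are permitted to be local admins):

import csv
import sys

# Accounts that are allowed to be local admins (placeholder values).
allowed_accounts = {'Administrator', 'localadmin'}

with open(sys.argv[1], 'rU') as f:
    for row in csv.reader(f):
        computer, accounts = row[0], row[1:]
        # Keep only the accounts that should NOT be local admins.
        unexpected = [account for account in accounts if account not in allowed_accounts]
        if unexpected:
            print(computer, unexpected)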
This is how I opened a .csv file and imported columns of data as numpy arrays - naturally, you don't need numpy arrays, but...
import sys

import numpy as np
# Note: this is Python 2 era code (unicode, PyQt4); adjust the Qt import
# for your binding.
from PyQt4.QtGui import QApplication, QFileDialog

data = {}
app = QApplication(sys.argv)
fname = unicode(QFileDialog.getOpenFileName())
app.quit()

filename = fname.strip('.csv') + ' for release.csv'
# Open the file and skip the first two rows of data
imported_array = np.loadtxt(fname, delimiter=',', skiprows=2)
data = {'time_s': imported_array[:, 0]}
data['Speed_RPM'] = imported_array[:, 1]
It can be done using the pandas library.
import pandas as pd
df = pd.read_csv(filename)
list_of_lists = df.values.tolist()
This approach applies to other kinds of data like .tsv, etc.
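For example, for a tab-separated file you only need to change the separator (the filename here is a placeholder):

import pandas as pd

# sep='\t' makes read_csv split on tabs instead of commas.
df = pd.read_csv('data.tsv', sep='\t')
list_of_lists = df.values.tolist()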
I have a large csv file in which some rows are entirely blank. How do I use Python to delete all blank rows from the csv?
After all your suggestions, this is what I have so far:
import csv
# open input csv for reading
inputCSV = open(r'C:\input.csv', 'rb')
# create output csv for writing
outputCSV = open(r'C:\OUTPUT.csv', 'wb')
# prepare output csv for appending
appendCSV = open(r'C:\OUTPUT.csv', 'ab')
# create reader object
cr = csv.reader(inputCSV, dialect = 'excel')
# create writer object
cw = csv.writer(outputCSV, dialect = 'excel')
# create writer object for append
ca = csv.writer(appendCSV, dialect = 'excel')
# add pre-defined fields
cw.writerow(['FIELD1_','FIELD2_','FIELD3_','FIELD4_'])
# delete existing field names in input CSV
# ???????????????????????????
# loop through input csv, check for blanks, and write all changes to append csv
for row in cr:
    if row or any(row) or any(field.strip() for field in row):
        ca.writerow(row)
# close files
inputCSV.close()
outputCSV.close()
appendCSV.close()
Is this ok or is there a better way to do this?
Use the csv module:
import csv
...
with open(in_fnam, newline='') as in_file:
    with open(out_fnam, 'w', newline='') as out_file:
        writer = csv.writer(out_file)
        for row in csv.reader(in_file):
            if row:
                writer.writerow(row)
If you also need to remove rows where all of the fields are empty, change the if row: line to:
if any(row):
And if you also want to treat fields that consist of only whitespace as empty you can replace it with:
if any(field.strip() for field in row):
Note that in Python 2.x and earlier, the csv module expected binary files, and so you'd need to open your files with the 'b' flag. In 3.x, doing this will result in an error.
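For reference, a sketch of the two open idioms:

import csv

# Python 2: the csv module expects binary mode.
# f = open('data.csv', 'rb')

# Python 3: text mode with newline='' (binary mode here would raise an error).
f = open('data.csv', newline='')
reader = csv.reader(f)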
Surprised that nobody here mentioned pandas. Here is a possible solution.
import pandas as pd
df = pd.read_csv('input.csv')
df.to_csv('output.csv', index=False)
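This works because read_csv skips completely blank lines by default; the behaviour is controlled by its skip_blank_lines parameter:

import pandas as pd

# skip_blank_lines=True is already the default; shown here explicitly.
df = pd.read_csv('input.csv', skip_blank_lines=True)
df.to_csv('output.csv', index=False)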
Delete empty rows from a .csv file using Python:
import csv
...
with open('demo004.csv') as input, open('demo005.csv', 'w', newline='') as output:
    writer = csv.writer(output)
    for row in csv.reader(input):
        if any(field.strip() for field in row):
            writer.writerow(row)
Thank you.
You have to open a second file, write all non-blank lines to it, delete the original file, and rename the second file to the original name.
EDIT: a real blank line will look like '\n':
for line in f1.readlines():
    if line.strip() == '':
        continue
    f2.write(line)
A line with all blank fields would look like ',,,,,\n'. If you consider this a blank line too:
for line in f1.readlines():
    if ''.join(line.split(',')).strip() == '':
        continue
    f2.write(line)
Opening, closing, deleting and renaming the files is left as an exercise for you. (Hint: import os, help(open), help(os.rename), help(os.unlink))
EDIT2: Laurence Gonsalves brought to my attention that a valid CSV file could have blank lines embedded in quoted CSV fields, like 1,'this\n\nis tricky',123.45. In this case the csv module will take care of that for you. I'm sorry Laurence, your answer deserved to be accepted. The csv module will also address the concerns about a line like "","",""\n.
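A small demonstration of that edge case: csv.reader keeps the quoted embedded newlines intact, while the field-based blank-row test still filters the truly empty line:

import csv
from io import StringIO

sample = '1,"this\n\nis tricky",123.45\n\n2,plain,0\n'

for row in csv.reader(StringIO(sample)):
    # The genuinely blank line parses as [], so any(...) filters it out;
    # the newlines inside the quoted field survive as part of one cell.
    if any(field.strip() for field in row):
        print(row)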
Doing it with pandas is very simple. Open your CSV file with pandas:
import pandas as pd
df = pd.read_csv("example.csv")
# Checking the number of empty values in the csv file
print(df.isnull().sum())
# Dropping the empty rows
modifiedDF = df.dropna()
# Saving it to the csv file
modifiedDF.to_csv('modifiedExample.csv', index=False)
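One caveat: dropna() defaults to how='any', so it also drops rows that merely have one empty cell. To remove only rows that are entirely blank, pass how='all':

# Drop only the rows where every column is missing.
modifiedDF = df.dropna(how='all')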
Python code to remove blank lines from a CSV file without creating another file:
import csv

def ReadWriteconfig_file(file):
    try:
        file_object = open(file, 'r')
        lines = csv.reader(file_object, delimiter=',', quotechar='"')
        flag = 0
        data = []
        for line in lines:
            if line == []:
                flag = 1
                continue
            else:
                data.append(line)
        file_object.close()
        if flag == 1:  # if a blank line is present in the file
            file_object = open(file, 'w')
            for line in data:
                str1 = ','.join(line)
                file_object.write(str1 + "\n")
            file_object.close()
    except Exception as e:
        print(e)
Here is a solution using pandas that removes blank rows.
import pandas as pd
df = pd.read_csv('input.csv')
df.dropna(axis=0, how='all', inplace=True)
df.to_csv('output.csv', index=False)
I need to do this but without a blank row written at the end of the CSV file, which this code unfortunately does (and which is also what Excel does if you Save As .csv). My (even simpler) code using the csv module does this too:
import csv

input = open("M51_csv_proc.csv", 'rb')
output = open("dumpFile.csv", 'wb')
writer = csv.writer(output)
for row in csv.reader(input):
    writer.writerow(row)
input.close()
output.close()
M51_csv_proc.csv has exactly 125 rows; the program always outputs 126 rows, the last one being blank.
I've been through all these threads and nothing seems to change this behaviour.
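For what it's worth, the csv module terminates every row, including the last, so a final line terminator is expected in the file; many tools count that as a 126th (blank) line. If you genuinely need no trailing newline, one hedged workaround (in Python 3 syntax, filenames as above) is to build the output in memory and strip it before writing:

import csv
import io

# Write rows to an in-memory buffer first.
buffer = io.StringIO()
writer = csv.writer(buffer)
with open("M51_csv_proc.csv", newline='') as inp:
    for row in csv.reader(inp):
        writer.writerow(row)

# Strip the final line terminator before writing the real file.
with open("dumpFile.csv", 'w', newline='') as out:
    out.write(buffer.getvalue().rstrip('\r\n'))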
In this script, all the CR/CRLF characters embedded in fields are removed from a CSV file that has lines like this:
"My name";mail#mail.com;"This is a comment.
Thanks!"
Execute the script https://github.com/eoconsulting/lr2excelcsv/blob/master/lr2excelcsv.py
Result (in Excel CSV format):
"My name",mail#mail.com,"This is a comment. Thanks!"
Replace PATH_TO_YOUR_CSV with the path to your CSV file:
import pandas as pd

df = pd.read_csv('PATH_TO_YOUR_CSV')
new_df = df.dropna()
new_df.to_csv('output.csv', index=False)
Or, in one line:
import pandas as pd

pd.read_csv('data.csv').dropna().to_csv('output.csv', index=False)
I had the same problem. I converted the .csv file to a dataframe and then converted the dataframe back to a .csv file.
The initial .csv file with the blank lines was 'csv_file_logger2.csv', so I did the following:
import pandas as pd

df = pd.read_csv('csv_file_logger2.csv')
df.to_csv('out2.csv', index=False)