Convert file from one CSV format to another - python

I have two CSV files. When I open them with Notepad++, I see the differences between them.
This is the first CSV file:
"Type" "Id"
"Task" "170033"
"Task" "170256"
"Task" "170263"
This is the second CSV file:
Type,Id
Task,170033
Task,170256
Task,170263
What is the difference, and how can I change the first one to the second one using Python?

You can do this with the built-in Python library csv:
import csv
with open('out.csv', 'w', newline='') as csvfile_write:
    writer = csv.writer(csvfile_write, delimiter=',', lineterminator='\n')
    with open('in.csv', newline='') as csvfile_read:
        # the input fields are separated by spaces and wrapped in double quotes
        reader = csv.reader(csvfile_read, delimiter=' ', quotechar='"')
        for row in reader:
            writer.writerow(row)
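The same conversion can be tried in memory, without any files, by letting io.StringIO stand in for them. This is only a sketch: the helper name space_quoted_to_comma is hypothetical, and it assumes the fields are separated by exactly one space (if the real file uses tabs, pass delimiter='\t' instead).

```python
import csv
import io


def space_quoted_to_comma(text):
    """Convert space-delimited, double-quoted CSV text to plain comma-separated text."""
    reader = csv.reader(io.StringIO(text), delimiter=' ', quotechar='"')
    out = io.StringIO()
    # The default QUOTE_MINIMAL only quotes fields that need it,
    # so simple values such as Task or 170033 come out unquoted.
    writer = csv.writer(out, lineterminator='\n')
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


sample = '"Type" "Id"\n"Task" "170033"\n"Task" "170256"\n'
print(space_quoted_to_comma(sample), end='')
```

Because the writer only adds quotes when a field contains the delimiter, a quote character or a line break, the output matches the second file's format.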

The first file is space-separated, and each item is quoted.
The second file is comma-separated.
To convert the first format into the second, use the following code:
out = open('02.csv', 'w')
with open('01.csv') as f:
    for line in f:
        # split on whitespace, strip the surrounding quotes, re-join with commas
        new_str = ','.join(x.strip('"') for x in line.split())
        out.write(new_str + "\n")
out.close()
Create a new file.
Open the first file and iterate through its lines.
Convert each line: split the fields, strip the quotes, and join them again with commas.
Write the result to the new file.
Don't forget to close the writable file at the end.
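One caveat with splitting on whitespace: it ignores the quotes, so a quoted field that itself contains spaces gets broken apart, whereas csv.reader keeps it intact. The line below is a made-up example, not from the question's data.

```python
import csv

line = '"Task" "Fix the login bug"\n'  # hypothetical line with a space inside a field

# Splitting on whitespace ignores the quotes, so the quoted field is broken up.
naive = [x.strip('"') for x in line.split()]

# csv.reader honours the quotes and keeps the field in one piece.
parsed = next(csv.reader([line], delimiter=' ', quotechar='"'))

print(naive)   # ['Task', 'Fix', 'the', 'login', 'bug']
print(parsed)  # ['Task', 'Fix the login bug']
```

If the real data never has spaces inside a value, both approaches give the same result.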

Related

Need help in extracting data from csv and writing to a text file

I have a CSV with two columns of data. I want to extract the data from one column and write it to a text file, with each element in single quotes and the elements separated by commas. For example, I have this:
taxable_entity_id,id
45efc167-9254-406c-b5a8-6aef91a73dd9,331999
5ae97680-f489-4182-9dcb-eb07a73fab15,103507
00018d93-ae71-4367-a0da-f252cea4dfa2,32991
I want all the taxable_entity_ids in a text file like this:
'45efc167-9254-406c-b5a8-6aef91a73dd9','5ae97680-f489-4182-9dcb-eb07a73fab15','00018d93-ae71-4367-a0da-f252cea4dfa2'
with no spaces between elements, separated by commas.
Edit:
This is what I tried:
import csv
with open("Taxable_entity_those_who_filed_G1_M_July_but_not_in_Aug.csv", 'r') as csv_File:
    reader = csv.DictReader(csv_File)
    with open("te_id.csv", 'w') as text_file:
        writer = csv.writer(text_file, quotechar='\'', quoting=csv.QUOTE_MINIMAL)
        for row in reader:
            writer.writerow(row["taxable_entity_id"])
            # print(row["taxable_entity_id"])
        text_file.close()
    csv_File.close()
And this is what I got:
4,5,e,f,c,1,6,7,-,9,2,5,4,-,4,0,6,c,-,b,5,a,8,-,6,a,e,f,9,1,a,7,3,d,d,9
5,a,e,9,7,6,8,0,-,f,4,8,9,-,4,1,8,2,-,9,d,c,b,-,e,b,0,7,a,7,3,f,a,b,1,5
0,0,0,1,8,d,9,3,-,a,e,7,1,-,4,3,6,7,-,a,0,d,a,-,f,2,5,2,c,e,a,4,d,f,a,2
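You were close. Your version calls writerow() once per input row, and because writerow() expects a sequence of fields, it treats each ID string as a sequence of characters; that is why every character ended up in its own field. Since you want one single line in the output file, write it all at once by passing a generator expression to writerow():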
import csv
with open("Taxable_entity_those_who_filed_G1_M_July_but_not_in_Aug.csv", 'r', newline='') as csv_File:
    reader = csv.DictReader(csv_File)
    with open("te_id.csv", 'w', newline='') as text_file:
        # use QUOTE_ALL to force the quoting
        writer = csv.writer(text_file, quotechar='\'', quoting=csv.QUOTE_ALL)
        writer.writerow(row["taxable_entity_id"] for row in reader)
And do not call close(), since you have (correctly) used with.
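A small in-memory sketch of why the original attempt produced one character per field: a str is itself a sequence, so writerow() splits it into characters, while a list containing the string is written as one field. The short ID here is made up, and io.StringIO stands in for the output file.

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf, lineterminator='\n')
# A str is a sequence of characters, so each character becomes a field.
writer.writerow("45ef-c1")
# Wrapping the value in a list writes it as a single field.
writer.writerow(["45ef-c1"])
print(buf.getvalue(), end='')
```

The first row comes out as 4,5,e,f,-,c,1 and the second as 45ef-c1, which is the same effect seen in the question's output.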
Try this:
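import pandas as pd
df = pd.read_csv('nameoffile.csv', delimiter=',')
# select the first column by position (df[0] would look for a column named 0)
X = df.iloc[:, 0].values
f = open('newfile.txt', 'w')
# wrap each value in single quotes and join with commas (no trailing comma)
f.write(','.join("'{}'".format(i) for i in X))
f.close()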
It seems a little odd that you basically want a one-row CSV file for the taxable_entity_ids, but it's certainly possible. You also don't need to explicitly close() the open files, because the with context manager does it for you automatically.
You also need to open the CSV files with newline='', as shown in all the examples in the csv module's documentation.
Lastly, if you want all the fields to be quoted, you need to use quoting=csv.QUOTE_ALL instead of quoting=csv.QUOTE_MINIMAL.
import csv
inp_filename = "Taxable_entity_those_who_filed_G1_M_July_but_not_in_Aug.csv"
outp_filename = "te_id.csv"
with open(outp_filename, 'w', newline='') as text_file, \
     open(inp_filename, 'r', newline='') as csv_File:
    reader = csv.DictReader(csv_File)
    writer = csv.writer(text_file, quotechar="'", quoting=csv.QUOTE_ALL)
    taxable_entity_ids = (row["taxable_entity_id"] for row in reader)
    writer.writerow(taxable_entity_ids)
print('done')
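To see the quoting difference concretely, this sketch writes the same row with QUOTE_MINIMAL and with QUOTE_ALL, using ' as the quote character. The helper write_one_row and the shortened IDs are illustrative only.

```python
import csv
import io


def write_one_row(values, quoting):
    """Write values as a single CSV row using ' as the quote character."""
    buf = io.StringIO()
    writer = csv.writer(buf, quotechar="'", quoting=quoting, lineterminator='\n')
    writer.writerow(values)
    return buf.getvalue()


ids = ["45efc167", "5ae97680"]
print(write_one_row(ids, csv.QUOTE_MINIMAL), end='')  # 45efc167,5ae97680
print(write_one_row(ids, csv.QUOTE_ALL), end='')      # '45efc167','5ae97680'
```

QUOTE_MINIMAL leaves these IDs bare because they contain no delimiter, quote character or line break, so only QUOTE_ALL gives the requested '...' format.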

Insert new line in CSV at second row via Python

Is it possible to insert a new line in a CSV file at the 2nd row? For example, I have a CSV with column names and their values:
meter_id, sdp_id, lat, lon
813711, 1331, 34.298, -83.113
The application I'm attempting to read this file into requires an extra line indicating the column types. In the example above, all would be String, so the CSV would need to be:
meter_id, sdp_id, lat, lon
String, String, String, String
813711, 1331, 34.298, -83.113
I've read several posts on how to add a new line at the end of the CSV, but couldn't find anything on how to do the above.
This is one approach using the csv module:
import csv
filename = "file.csv"  # placeholder: path to your CSV file
toAdd = ["String", "String", "String", "String"]
with open(filename, "r", newline="") as infile:
    reader = list(csv.reader(infile))
reader.insert(1, toAdd)  # index 1 = directly after the header row
with open(filename, "w", newline="") as outfile:
    writer = csv.writer(outfile)
    for line in reader:
        writer.writerow(line)
A simple solution could be:
import csv
# one entry per column; the leading spaces match the ", " separators in the file
row = ['String', ' String', ' String', ' String']
with open('file.csv', 'r', newline='') as readFile:
    reader = csv.reader(readFile)
    lines = list(reader)
lines.insert(1, row)
with open('file.csv', 'w', newline='') as writeFile:
    writer = csv.writer(writeFile)
    writer.writerows(lines)
I could not try it. Please let me know if it works.
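A variation that avoids hard-coding the number of columns is to build the type row from the header's length. This is a sketch on in-memory text; the helper name insert_type_row is hypothetical, and skipinitialspace=True is an assumption that drops the space after each comma, so the rewritten file uses plain "," separators.

```python
import csv
import io


def insert_type_row(csv_text, type_name="String"):
    """Return csv_text with a row of type_name values inserted after the header."""
    rows = list(csv.reader(io.StringIO(csv_text), skipinitialspace=True))
    # One type entry per header column, so the counts always match.
    rows.insert(1, [type_name] * len(rows[0]))
    out = io.StringIO()
    csv.writer(out, lineterminator='\n').writerows(rows)
    return out.getvalue()


data = "meter_id, sdp_id, lat, lon\n813711, 1331, 34.298, -83.113\n"
print(insert_type_row(data), end='')
```

To apply it to a real file, read the file's text, pass it through the function, and write the result back (ideally to a new file first).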

add a new line after a specific line of a csv file in python3

I am writing a Python script in which a CSV file is read and some information is written to it. At this stage, I need to find one specific row and add a new line of data after it. I have succeeded in finding the row, but I cannot write the new line of data after it. Here is my attempt:
import csv
file = open('db.csv', 'r+')
table = csv.reader(file)
for row in table:
    if(row == ['tbl']):
        file.seek(len(row)) #this part is the problem I suppose
        break
table = csv.writer(file)
table.writerow(['1', '2'])
Using file.seek/file.tell is tricky because csv.reader may read ahead, so you cannot tell the exact file position that matches the current row.
Also, inserting is not trivial: you need to keep the remaining part of the file and write it back after the new data.
I would do it the following way:
create another file for writing
write to it according to your needs
once done, replace the old file with the new file
import csv
import shutil
with open('db.csv', 'r', newline='') as f, open('db.csv.temp', 'w', newline='') as fout:
    reader = csv.reader(f)
    writer = csv.writer(fout)
    for row in reader:
        writer.writerow(row)
        if row == ['tbl']:
            writer.writerow([])  # empty line
shutil.move('db.csv.temp', 'db.csv')
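The same copy-then-replace idea can be wrapped in a reusable function that writes the question's ['1', '2'] row instead of an empty line. This is a sketch: insert_after is a hypothetical name, and it swaps in tempfile.mkstemp plus os.replace (which atomically replaces the target when both files are on the same filesystem) for the fixed .temp name and shutil.move.

```python
import csv
import os
import tempfile


def insert_after(path, marker, new_row):
    """Copy path row by row, writing new_row right after every row equal to marker."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so os.replace() can swap it in.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    with open(path, newline='') as fin, os.fdopen(fd, 'w', newline='') as fout:
        writer = csv.writer(fout)
        for row in csv.reader(fin):
            writer.writerow(row)
            if row == marker:
                writer.writerow(new_row)
    # Both files are closed at this point, so the replacement is safe.
    os.replace(tmp_path, path)
```

Usage would be insert_after('db.csv', ['tbl'], ['1', '2']). Note that mkstemp creates the file with owner-only permissions, so the replaced db.csv ends up with those permissions too.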

How to copy multiple rows and one column from one CSV file to another CSV Excel?

I am extremely new to Python (and to coding, for that matter).
Could I please get some help with how I can achieve this? I have gone through numerous threads, but nothing helped.
My input file looks like this:
I want my output file to look like this:
I just want the first column replicated twice in the second Excel sheet, with an empty line after every 5 rows.
A .csv file can be opened with a normal text editor. Do this and you'll see that the entries for each column are separated by commas (csv = comma-separated values). Most likely, though, they are separated by semicolons (;).
Since you're new to coding, I recommend trying it manually with a text editor first until you have the desired output, and then try to replicate it with python.
Also, you should post code examples here and ask specific questions about why it doesn't work like you expected it to work.
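If you are unsure whether a file uses commas or semicolons, the csv module's Sniffer can guess the delimiter from a sample of the text. A minimal sketch; detect_delimiter is a hypothetical helper, and restricting the candidates to ',;' is an assumption about which separators to expect.

```python
import csv


def detect_delimiter(sample):
    """Guess whether sample text is comma- or semicolon-separated."""
    # Restricting the candidates keeps the sniffer from picking other characters.
    dialect = csv.Sniffer().sniff(sample, delimiters=',;')
    return dialect.delimiter


print(detect_delimiter("name;value\nfoo;1\nbar;2\n"))  # ;
print(detect_delimiter("name,value\nfoo,1\nbar,2\n"))  # ,
```

Sniffing is heuristic and raises csv.Error when it cannot decide, so checking the file in a text editor first is still worthwhile.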
Below is the solution. Don't forget to configure input/output files and the delimiter:
input_file = r'c:\Temp\input.csv'
output_file = r'c:\Temp\output.csv'
delimiter = ';'
i = 0
output_data = ''
with open(input_file) as f:
    for line in f:
        i += 1
        # assumes one value per line: value, delimiter, then the same value again
        output_data += line.strip() + delimiter + line
        if i == 5:
            # add an empty line after every 5 rows
            output_data += '\n'
            i = 0
with open(output_file, 'w') as file_:
    file_.write(output_data)
Python has a csv module for doing this. It is able to automatically read each row into a list of columns. It is then possible to simply take the first element and replicate it into the second column in an output file.
import csv
with open('input.csv', newline='') as f_input:
    csv_input = csv.reader(f_input)
    input_rows = list(csv_input)
with open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    for line, row in enumerate(input_rows, start=1):
        csv_output.writerow([row[0], row[0]])
        if line % 5 == 0:
            csv_output.writerow([])  # blank line after every 5 rows
Note that it is not advisable to write the updated data directly over the input file, because if there were a problem you would lose your original file.
If your input file has multiple columns, this script will drop them and simply duplicate the first column.
By default, the csv module separates columns with a comma; this can be changed by specifying the desired delimiter as follows:
csv_output = csv.writer(f_output, delimiter=';')
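The same logic can be packaged as a function on in-memory text, which makes it easy to try before touching real files. A sketch under stated assumptions: duplicate_first_column is a hypothetical name, blank input lines are skipped (so they do not count toward the group of 5), and the same delimiter is used for reading and writing.

```python
import csv
import io


def duplicate_first_column(csv_text, group_size=5, delimiter=','):
    """Write each row's first field twice, adding an empty row after every group_size rows."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator='\n')
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    # Skip blank input lines, which csv.reader returns as empty lists.
    rows = (row for row in reader if row)
    for line_no, row in enumerate(rows, start=1):
        writer.writerow([row[0], row[0]])
        if line_no % group_size == 0:
            writer.writerow([])  # an empty row is written as a blank line
    return out.getvalue()


sample = "\n".join(str(n) for n in range(1, 7)) + "\n"
print(duplicate_first_column(sample), end='')
```

With the six sample values, a blank line appears after 5,5 and before 6,6.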

Python search csv file from input text file

I'm new to Python and I'm struggling with this code. I have 2 files: the first is a text file containing email addresses (one per line), and the second is a CSV file with 5-6 columns. The script should take each search input from file 1 and search for it in file 2, and the output should be stored in another CSV file (only the first 3 columns); see the example below. I have also copied the script I was working on. If there is a better or more efficient approach, please let me know. Thank you, I appreciate your help.
File1 (output.txt)
rrr#company.com
eee#company.com
ccc#company.com
File2 (final.csv)
Sam,Smith,sss#company.com,admin
Eric,Smith,eee#company.com,finance
Joe,Doe,jjj#company.com,telcom
Chase,Li,ccc#company.com,IT
output (out_name_email.csv)
Eric,Smith,eee#company.com
Chase,Li,ccc#company.com
Here is the script:
import csv
outputfile = 'C:\\Python27\\scripts\\out_name_email.csv'
inputfile = 'C:\\Python27\\scripts\\output.txt'
datafile = 'C:\\Python27\\scripts\\final.csv'
names=[]
with open(inputfile) as f:
    for line in f:
        names.append(line)
with open(datafile, 'rb') as fd, open(outputfile, 'wb') as fp_out1:
    writer = csv.writer(fp_out1, delimiter=",")
    reader = csv.reader(fd, delimiter=",")
    headers = next(reader)
    for row in fd:
        for name in names:
            if name in line:
                writer.writerow(row)
Load the emails into a set for O(1) lookup:
with open(inputfile) as fin:
    emails = set(line.strip() for line in fin)
Then loop over the rows once and check whether each row's email exists in emails; there's no need to loop over every possible match for each row:
# ...
for row in reader:
    # the email address is the third column (index 2)
    if row[2] in emails:
        # keep only the first three columns
        writer.writerow(row[:3])
If you're not doing anything else, you can shorten it to:
writer.writerows(row[:3] for row in reader if row[2] in emails)
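A couple of notes: in your original code you're not using the csv.reader object reader (you're looping over fd instead), and you appear to have some naming issues with names, line and row. Also, each entry in names keeps its trailing newline, so it would never match a field exactly; that is why the emails above are stripped.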
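Putting the pieces together, here is a sketch of the whole lookup run on the sample data from the question, held in memory instead of files. rows_with_emails is a hypothetical helper name; it assumes the email is the third column and, since the sample final.csv has no header row, it does not skip one.

```python
import csv
import io


def rows_with_emails(csv_text, email_lines):
    """Return the first three columns of rows whose third column is a listed email."""
    # strip() removes the trailing newline each line of the text file carries.
    emails = {line.strip() for line in email_lines if line.strip()}
    reader = csv.reader(io.StringIO(csv_text))
    return [row[:3] for row in reader if len(row) > 2 and row[2] in emails]


data = ("Sam,Smith,sss#company.com,admin\n"
        "Eric,Smith,eee#company.com,finance\n"
        "Joe,Doe,jjj#company.com,telcom\n"
        "Chase,Li,ccc#company.com,IT\n")
wanted = ["rrr#company.com\n", "eee#company.com\n", "ccc#company.com\n"]
for row in rows_with_emails(data, wanted):
    print(",".join(row))
```

This prints the two expected lines, Eric,Smith,eee#company.com and Chase,Li,ccc#company.com. With real files, pass an open text file as email_lines and the data file's contents as csv_text, then write the result with csv.writer(...).writerows(...).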
