Read a tab delimited txt file and write to separate column csv - python

I have a txt file that has several header lines, each marked with a '#'.
Then I have three tab-delimited columns, each with its own header, that I want to copy into a csv file so that each column ends up in its own column of the spreadsheet.
Currently all I am able to get is a file that has all three columns in one section of the csv.
import csv
infile = r'path\seawater_nh.txt'
outfile = r'path\emissivity_new.csv'
print "definitions successful"
in_txt = csv.reader(open(infile, 'rb'), delimiter = '\t')
out_csv = csv.writer(open(outfile, 'wb'))
out_csv.writerows(in_txt)

In the absence of your sample input and output files, I'm guessing here. But perhaps change how your files are read and written (note: depending on the OS, you may need to change how the lines are read).
import csv

infile = r'path\seawater_nh.txt'
outfile = r'path\emissivity_new.csv'

with open(infile, "r") as in_text:
    in_reader = csv.reader(in_text, delimiter='\t')
    with open(outfile, "w", newline='') as out_file:
        out_writer = csv.writer(out_file)
        for row in in_reader:
            out_writer.writerow(row)
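
Since the question mentions header lines marked with '#', you may also want to drop those comment lines so they don't end up in the csv. A minimal sketch of that, assuming every comment line starts with '#' (the paths are the same placeholders as above):

import csv

infile = r'path\seawater_nh.txt'
outfile = r'path\emissivity_new.csv'

with open(infile, "r") as in_text, open(outfile, "w", newline='') as out_file:
    # filter out the '#' comment/header lines before parsing the tab-delimited data
    data_lines = (line for line in in_text if not line.startswith('#'))
    out_writer = csv.writer(out_file)
    out_writer.writerows(csv.reader(data_lines, delimiter='\t'))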

Appending a row of data to line 2 in a large CSV File

I'm sure this is a really easy question, but I can't seem to find any information on it.
I have a very large CSV file into which I need to insert a row directly after the header; this helps with another script that reads the csv and joins it to a parcel shapefile.
I have the code to append the row of data that I want, but it will only go to the last line. I cannot figure out how to get the code to insert my row immediately after the header row. Here is my code:
import os
import csv

insert_row = '"AAAAAAAAAAAAAAAAAAA","**********","**********","**********","**********","**********","**********","**","**********","**********","****","**********",999999,9999,00'
os.chdir(r"D:\PROPERTY\PINELLAS\Data_20201001_t")

with open("owner_mail.csv", 'r') as csv_file, open("owner_mail.csv", 'a', newline="") as new_file:
    csv_reader = csv.reader(csv_file)
    csv_writer = csv.writer(new_file)
    csv_writer.writerow(insert_row)
So that's it. I just need the insert_row line of data to be in row position number 2 instead of at the end of the file. Thank you.
You can't just insert a row in the middle of a file unless replacing data of exactly the same length. You have to read the entire file, edit it, and re-write it.
Something like this should work:
import csv

# This must be an iterable (e.g. a tuple), not a string
insert_row = "AAAAAAAAAAAAAAAAAAA","**********","**********","**********","**********","**********","**********","**","**********","**********","****","**********",999999,9999,00

with open("owner_mail.csv", 'r') as csv_file, open("owner_mail_updated.csv", 'w', newline="") as new_file:
    csv_reader = csv.reader(csv_file)
    csv_writer = csv.writer(new_file)
    header = next(csv_reader)
    csv_writer.writerow(header)
    csv_writer.writerow(insert_row)
    for line in csv_reader:
        csv_writer.writerow(line)
If the CSV file is not too large to fit entirely in memory, then you can read all the lines at once, edit them, and write them back out to the same file. That is riskier if anything goes wrong mid-write; it is safer to write to a new file, then delete the original and rename the new one if there were no errors:
import csv

# This must be an iterable (e.g. a tuple), not a string
insert_row = "AAAAAAAAAAAAAAAAAAA","**********","**********","**********","**********","**********","**********","**","**********","**********","****","**********",999999,9999,00

with open("owner_mail.csv", 'r') as csv_file:
    rows = list(csv.reader(csv_file))

rows.insert(1, insert_row)  # insert after the header row

with open("owner_mail.csv", 'w', newline='') as csv_file:
    w = csv.writer(csv_file)
    w.writerows(rows)
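The safer variant described above, writing to a new file and only swapping it into place once everything succeeded, could look roughly like this. This is a sketch rather than part of the original answer: the temporary file name is arbitrary, and insert_row is assumed to be an iterable of field values as before.

import csv
import os

tmp_name = "owner_mail.csv.tmp"
with open("owner_mail.csv", 'r', newline='') as src, open(tmp_name, 'w', newline='') as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    writer.writerow(next(reader))   # copy the header unchanged
    writer.writerow(insert_row)     # the new row lands in position 2
    writer.writerows(reader)        # copy the remaining rows

# only replace the original once the new file was written without errors
os.replace(tmp_name, "owner_mail.csv")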
Please try this:
import csv

# Again, the inserted row must be a list of field values, not a single string
insert_row = ["AAAAAAAAAAAAAAAAAA", "**********", "**********", "**********", "**********", "**********", "**********", "**", "**********", "**********", "****", "**********", "999999", "9999", "00"]

with open("owner_mail.csv", 'r') as csv_file:
    reader = list(csv.reader(csv_file))

reader.insert(1, insert_row)

# open for writing only after the whole file has been read;
# otherwise mode 'w' truncates it before the reader sees it
with open("owner_mail.csv", 'w', newline='') as new_file:
    csv_writer = csv.writer(new_file)
    csv_writer.writerows(reader)

Need help in extracting data from csv and writing to a text file

I have a csv file with two columns of data. I want to extract the data from one column and write it to a text file with single quotes around each element, separated by commas. For example, I have this:
taxable_entity_id,id
45efc167-9254-406c-b5a8-6aef91a73dd9,331999
5ae97680-f489-4182-9dcb-eb07a73fab15,103507
00018d93-ae71-4367-a0da-f252cea4dfa2,32991
I want all the taxable_entity_ids in a text file like this
'45efc167-9254-406c-b5a8-6aef91a73dd9','5ae97680-f489-4182-9dcb-eb07a73fab15','00018d93-ae71-4367-a0da-f252cea4dfa2'
without any space between two elements, separated by a comma.
Edit:
This is what I tried:
import csv

with open("Taxable_entity_those_who_filed_G1_M_July_but_not_in_Aug.csv", 'r') as csv_File:
    reader = csv.DictReader(csv_File)
    with open("te_id.csv", 'w') as text_file:
        writer = csv.writer(text_file, quotechar='\'', quoting=csv.QUOTE_MINIMAL)
        for row in reader:
            writer.writerow(row["taxable_entity_id"])
            # print(row["taxable_entity_id"])

text_file.close()
csv_File.close()
and this is what I got..
4,5,e,f,c,1,6,7,-,9,2,5,4,-,4,0,6,c,-,b,5,a,8,-,6,a,e,f,9,1,a,7,3,d,d,9
5,a,e,9,7,6,8,0,-,f,4,8,9,-,4,1,8,2,-,9,d,c,b,-,e,b,0,7,a,7,3,f,a,b,1,5
0,0,0,1,8,d,9,3,-,a,e,7,1,-,4,3,6,7,-,a,0,d,a,-,f,2,5,2,c,e,a,4,d,f,a,2
You were close. The problem is that writerow() expects an iterable of fields, so passing a single string makes it treat every character as a separate field. Since you want one single line in the output file, write it all at once by passing a generator expression to writerow():
import csv

with open("Taxable_entity_those_who_filed_G1_M_July_but_not_in_Aug.csv", 'r') as csv_File:
    reader = csv.DictReader(csv_File)
    with open("te_id.csv", 'w') as text_file:
        # use QUOTE_ALL to force the quoting
        writer = csv.writer(text_file, quotechar='\'', quoting=csv.QUOTE_ALL)
        writer.writerow((row["taxable_entity_id"] for row in reader))
And do not call close(), since you have (correctly) used with.
Try this:
import pandas as pd

df = pd.read_csv('nameoffile.csv', delimiter=',')
X = df['taxable_entity_id'].values   # column name from the sample csv above

with open('newfile.txt', 'w') as f:
    # single-quote each id and join with commas (no trailing comma)
    f.write(','.join("'" + x + "'" for x in X))
It seems a little odd that you basically want a one-row csv file for the taxable_entity_ids, but it is certainly possible. You also don't need to explicitly close() the open files, because the with context manager will do that for you automatically.
You also need to open the CSV file with newline='' as shown in all the examples in the csv module's documentation.
Lastly, if you want all the fields to be quoted, you need to use quoting=csv.QUOTE_ALL instead of quoting=csv.QUOTE_MINIMAL.
import csv

inp_filename = "Taxable_entity_those_who_filed_G1_M_July_but_not_in_Aug.csv"
outp_filename = "te_id.csv"

with open(outp_filename, 'w', newline='') as text_file, \
     open(inp_filename, 'r', newline='') as csv_File:
    reader = csv.DictReader(csv_File)
    writer = csv.writer(text_file, quotechar="'", quoting=csv.QUOTE_ALL)
    taxable_entity_ids = (row["taxable_entity_id"] for row in reader)
    writer.writerow(taxable_entity_ids)

print('done')
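Since the desired result is really just one line of quoted, comma-separated values rather than a table, a plain string join works too. A minimal sketch along those lines (the output file name is illustrative, not from the question):

import csv

with open("Taxable_entity_those_who_filed_G1_M_July_but_not_in_Aug.csv", newline='') as csv_file:
    ids = [row["taxable_entity_id"] for row in csv.DictReader(csv_file)]

with open("te_id.txt", 'w') as text_file:
    # produces 'id1','id2','id3' with no spaces between elements
    text_file.write(",".join("'{}'".format(i) for i in ids))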

Formatting csv file with python

I have a csv file with the following structure:
"txNomeParlamentar";"ideCadastro";"nuCarteiraParlamentar";"nuLegislatura";"sgUF"
"AVANTE;1;1;2015;PP"
I need him to stay like this:
"txNomeParlamentar";"ideCadastro";"nuCarteiraParlamentar";"nuLegislatura";"sgUF"
"AVANTE";"1";"1";"2015";"PP"
I received this .csv file from someone else, so I do not know how the conversion was done. I am trying unsuccessfully with the code below:
input_fd = open("/home/gustavo/Downloads/Redes/Despesas/csvfile.csv", 'r')
output_fd = open('dados_2018_1.csv', 'w')

for line in input_fd.readlines():
    line.replace("\"", "")
    output_fd.write(line)

input_fd.close()
output_fd.close()
Is it possible to make this change or will I have to do the conversion from an xml file to a csv, and make this change during the conversion?
First: tell the reader to use delimiter=";" and quoting=csv.QUOTE_NONE. This properly splits your second line, which is one big quoted string containing the delimiter you actually want to split on. We then tweak that data to remove the leftover quotation marks (otherwise our output would contain doubly quoted values like '"txNomeParlamentar"', etc.).
import csv

with open('file.txt') as f:
    reader = csv.reader(f, delimiter=";", quoting=csv.QUOTE_NONE)
    data = [list(map(lambda s: s.replace('"', ''), row)) for row in reader]
Then we write the file back out with delimiter=";" and quoting=csv.QUOTE_ALL to ensure each item is wrapped in quotes:
with open('out.txt', 'w', newline='') as o:
    writer = csv.writer(o, delimiter=";", quoting=csv.QUOTE_ALL)
    writer.writerows(data)
Input:
"txNomeParlamentar";"ideCadastro";"nuCarteiraParlamentar";"nuLegislatura";"sgUF"
"AVANTE;1;1;2015;PP"
Output:
"txNomeParlamentar";"ideCadastro";"nuCarteiraParlamentar";"nuLegislatura";"sgUF"
"AVANTE";"1";"1";"2015";"PP"
A couple of things. First, strictly speaking you do NOT have a csv file, because in a csv file the delimiter is a comma; semicolon-delimited variants exist, but they trip up tools that expect commas. I'm assuming you want the values in your data file to (1) remain separated by semicolons [why not fix it and make it commas?] and (2) have each value wrapped in quotation marks.
If so, I think this will work:
# data reader
in_file = 'data.txt'
out_file = 'fixed.txt'

output = open(out_file, 'w')
with open(in_file, 'r') as source:
    for line in source:
        # split by semicolon
        data = line.strip().split(';')
        # remove all quotes found
        data = [t.replace('"', '') for t in data]
        for item in data[:-1]:
            output.write(''.join(['"', item, '"', ';']))
        # write the last item separately, without the trailing ';'
        output.write(''.join(['"', data[-1], '"']))
        output.write('\n')
output.close()
If the target consumer is Python, you should consider replacing the semicolons with commas (the conventional csv delimiter) and forgoing the quotes; everything Python reads from a csv comes in as a string anyhow. A sketch of that conversion follows.
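A minimal sketch of that comma-delimited, unquoted conversion, reusing the reading approach from the first answer (the file names are placeholders, not from the question):

import csv

with open('file.txt', newline='') as f:
    reader = csv.reader(f, delimiter=";", quoting=csv.QUOTE_NONE)
    # strip the stray quotation marks left over from the original export
    rows = [[field.replace('"', '') for field in row] for row in reader]

with open('out.csv', 'w', newline='') as o:
    # default dialect: comma-delimited, quoting only where a field requires it
    csv.writer(o).writerows(rows)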
Using the csv module.
Ex:
import csv

with open(filename) as csvfile:
    reader = csv.reader(csvfile, delimiter=";")
    headers = next(reader)                                          # read headers
    data = [row.strip().strip('"').split(";") for row in csvfile]   # format the remaining lines

with open(filename, "w", newline='') as csvfile_out:
    writer = csv.writer(csvfile_out, delimiter=";", quoting=csv.QUOTE_ALL)
    writer.writerow(headers)    # write headers
    writer.writerows(data)      # write data
You could use the csv module to do it if you massage the input data a little first.
import csv

#input_csv = '/home/gustavo/Downloads/Redes/Despesas/csvfile.csv'
input_csv = 'gustavo_input.csv'
output_csv = 'dados_2018_1.csv'

with open(input_csv, 'r', newline='') as input_fd, \
     open(output_csv, 'w', newline='') as output_fd:
    reader = csv.DictReader(input_fd, delimiter=';')
    writer = csv.DictWriter(output_fd, delimiter=';',
                            fieldnames=reader.fieldnames,
                            quoting=csv.QUOTE_ALL)
    writer.writeheader()  # write the header line back out
    first_field = reader.fieldnames[0]
    for row in reader:
        # each data line was exported as one big quoted field; split it apart
        fields = row[first_field].split(';')
        newrow = dict(zip(reader.fieldnames, fields))
        writer.writerow(newrow)

print('done')

How to merge two csv files vertically and keep the data format (number to number, string to string)

I want to merge two csv files vertically. One file contains only strings (first column, first three rows). The second file contains strings and numbers.
I can print them out, but I have trouble saving them row by row to a csv file, and also keeping the data types (numbers as numbers, strings as strings).
The following is the code I used :
Method 1:
import csv

file1 = ("/Users/yingdu/GitHub/20180807/String_.csv")
file2 = ("/Users/yingdu/GitHub/20180807/CovertFile_SampleData4.csv")
combined_file = ("/Users/yingdu/GitHub/20180807/combined_file.csv")
spreadsheet_filenames = [file1, file2]

for filename in spreadsheet_filenames:
    with open(filename, 'r') as csvfile:
        output = csv.reader(csvfile)
        for row in output:
            print row
The following is the printed output:
['SoftGenetics GeneMarker Trace Data Export']
['Raw Data']
['PAT_Ladder_1.fsa']
['Blue', 'Green', 'Yellow', 'Red', 'Orange']
['82.45', '97.65', '229.05', '85.25', '44.85']
['151.08', '167.48', '454.48', '136.68', '59.28']
['144.45', '161.25', '440.25', '133.65', '60.45']
['49.5', '65.9', '105.5', '69.1', '44.5']
['73.25', '109.45', '326.65', '70.85', '26.85']
['66.58', '97.18', '322.58', '65.38', '24.78']
['56.95', '77.35', '138.35', '91.95', '61.75']
['66.45', '79.65', '351.05', '69.25', '35.25']
The following is the code I used to write all the data to a new csv file. I found that the csv file was not created correctly.
import csv

file1 = ("/Users/yingdu/GitHub/20180807/String_.csv")
file2 = ("/Users/yingdu/GitHub/20180807/CovertFile_SampleData4.csv")
combined_file = ("/Users/yingdu/GitHub/20180807/combined_file.csv")
spreadsheet_filenames = [file1, file2]

for filename in spreadsheet_filenames:
    with open(filename, 'r') as csvfile:
        output = csv.reader(csvfile)
        with open(Combined_File, mode='w') as Combined_File:
            for row in output:
                print row
                csv_writer = csv.writer(Combined_File, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
                csv_writer.writerow(row)
Method 2:
Using pandas concat. I read file one and file two as two objects (dataframes); merged is the dataframe I expected. But the file "combined_file.csv" wasn't created/generated by the to_csv method here, and there is no error message.
import pandas as pd

f1 = pd.read_csv(file1, header=None)
f2 = pd.read_csv(file2, header=None)
merged = pd.concat([f1, f2])
merged.to_csv(combined_file, index=None, header=None)
concat takes a list of dataframes as its first argument.
Try:
merged = pd.concat([f1, f2])
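For reference, the full pandas version the question is aiming at would look roughly like this. It is a sketch, not the asker's exact script; note that if the two files have different numbers of columns, concat pads the missing cells with NaN, which come out as empty fields in the written csv.

import pandas as pd

file1 = "/Users/yingdu/GitHub/20180807/String_.csv"
file2 = "/Users/yingdu/GitHub/20180807/CovertFile_SampleData4.csv"
combined_file = "/Users/yingdu/GitHub/20180807/combined_file.csv"

# header=None keeps the first row of each file as data rather than column names
f1 = pd.read_csv(file1, header=None)
f2 = pd.read_csv(file2, header=None)

merged = pd.concat([f1, f2], ignore_index=True)
merged.to_csv(combined_file, index=False, header=False)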
You are creating the output file over and over! open(Combined_File, mode='w') overwrites the file, and since it is inside the loop you will only get data from the last input file.
Another hint: you can use writerows() to write multiple rows with a single call. It takes an iterable, so you can pass csv_input directly to write everything:
import csv

file1 = "/Users/yingdu/GitHub/20180807/String_.csv"
file2 = "/Users/yingdu/GitHub/20180807/CovertFile_SampleData4.csv"
spreadsheet_filenames = [file1, file2]
combined_file = "/Users/yingdu/GitHub/20180807/combined_file.csv"

with open(combined_file, 'w') as output_file:  # create the output file outside the for loop
    csv_output = csv.writer(output_file, delimiter=',')
    for filename in spreadsheet_filenames:
        with open(filename) as input_file:
            csv_input = csv.reader(input_file, delimiter=',')
            csv_output.writerows(csv_input)

Python search csv file from input text file

I'm new to python and I'm struggling with this code. I have 2 files: the 1st file is a text file containing email addresses (one per line), and the 2nd file is a csv file with 5-6 columns. The script should take the search input from file1, search in file2, and store the output (only the first 3 columns) in another csv file; see the example below. I have also copied the script that I was working on. If there is a better/more efficient approach then please let me know. Thank you, appreciate your help.
File1 (output.txt)
rrr#company.com
eee#company.com
ccc#company.com
File2 (final.csv)
Sam,Smith,sss#company.com,admin
Eric,Smith,eee#company.com,finance
Joe,Doe,jjj#company.com,telcom
Chase,Li,ccc#company.com,IT
output (out_name_email.csv)
Eric,Smith,eee#company.com
Chase,Li,ccc#company.com
Here is the script
import csv

outputfile = 'C:\\Python27\\scripts\\out_name_email.csv'
inputfile = 'C:\\Python27\\scripts\\output.txt'
datafile = 'C:\\Python27\\scripts\\final.csv'

names = []
with open(inputfile) as f:
    for line in f:
        names.append(line)

with open(datafile, 'rb') as fd, open(outputfile, 'wb') as fp_out1:
    writer = csv.writer(fp_out1, delimiter=",")
    reader = csv.reader(fd, delimiter=",")
    headers = next(reader)
    for row in fd:
        for name in names:
            if name in line:
                writer.writerow(row)
Load the emails into a set for O(1) lookup:
with open(inputfile) as fin:
    emails = set(line.strip() for line in fin)
Then loop over the rows once and check whether each row's email exists in emails - no need to loop over every possible match for every row. Note that in your sample final.csv the email address is the third column, so it is row[2], and the desired output keeps only the first three columns:
# ...
for row in reader:
    if row[2] in emails:          # email is the third field in final.csv
        writer.writerow(row[:3])  # keep only the first three columns
If you're not doing anything else with the rows, then you can reduce it to:
writer.writerows(row[:3] for row in reader if row[2] in emails)
A couple of notes on your original code: you're not actually using the csv.reader object reader (you're looping over fd directly), and you appear to have some naming mix-ups between names, line and row...
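Putting those pieces together, the whole script might look something like this. It is a sketch rather than a drop-in answer: it assumes Python 3, the paths from the question, and the column layout of the sample final.csv (first name, last name, email, department) with no header row - add a next(reader) call if your real file has one.

import csv

outputfile = 'C:\\Python27\\scripts\\out_name_email.csv'
inputfile = 'C:\\Python27\\scripts\\output.txt'
datafile = 'C:\\Python27\\scripts\\final.csv'

# load the search emails into a set for O(1) membership tests
with open(inputfile) as fin:
    emails = set(line.strip() for line in fin)

with open(datafile, 'r', newline='') as fd, open(outputfile, 'w', newline='') as fp_out:
    reader = csv.reader(fd)
    writer = csv.writer(fp_out)
    # keep only the first three columns of the rows whose email matches
    writer.writerows(row[:3] for row in reader if row[2] in emails)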
