csv file to list in python - python

I have a CSV file which looks like this:
File,2/13/2017,domain\test_roi,root_user,ntsrv1,/vol/vol_ntsrv1_02,983,0,Prod,983
File,2/13/2017,domain\test_roi,root_user,ntsrv1,/vol/vol_ntsrv1_03,124,0,Prod ,124
File,2/13/2017,domain\test_roi,root_user,ntsrv1,/vol/vol_ntsrv1_04,206,0,Prod,206
File,2/13/2017,domain\test_roi,root_user,ntsrv1,/vol/vol_ntsrv1_05,983,0,Prod ,983
File,2/13/2017,domain\test_roi,root_user,ntsrv1,/vol/vol_ntsrv1_06,564,0,Prod,564
File,2/13/2017,domain\test_roi,root_user,ntsrv1,/vol/vol_ntsrv1_07,189,0,Prod ,189
File,2/13/2017,domain\test_roi,root_user,ntsrv1,/vol/vol_ntsrv1_08,168,0,Prod,168
File,2/13/2017,domain\test_roi,root_user,ntsrv1,/vol/vol_ntsrv1_09,570,0,Prod ,570
File,2/13/2017,domain\test_roi,root_user,ntsrv1,/vol/vol_ntsrv1_10,189,0,Prod,189
File,2/13/2017,domain\test_roi,root_user,ntsrv1,/vol/vol_ntsrv1_11,204,0,Prod ,204
File,2/13/2017,domain\test_roi,root_user,ntsrv1,/vol/vol_ntsrv1_12,189,2,Prod,187
File,2/13/2017,domain\test_roi,root_user,ntsrv1,/vol/vol_ntsrv1_13,568,0,Prod ,568
File,2/13/2017,domain\test_roi,root_user,ntsrv1,/vol/vol_ntsrv1_14,204,0,Prod,204
File,2/13/2017,domain\test_roi,root_user,ntsrv1,/vol/vol_ntsrv1_15,142,0,Prod ,142
File,2/13/2017,domain\test_roi,root_user,ntsrv1,/vol/vol_ntsrv1_16,168,0,Prod,168
I want to add to a list the 4th column (root_user) and the 7th column (where the numbers are written). Any suggestions how?

import csv
four_col, seven_col = [], []
with open(file='test.csv', mode='r', encoding='utf-8', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',', quotechar='"')
    # next(spamreader)  # uncomment if the CSV has a header row
    for row in spamreader:
        four_col.append(row[3])
        seven_col.append(row[6])
With this CSV file you could also create spamreader like this:
spamreader = csv.reader(csvfile, dialect='excel')
but I wrote it the more generic way, in case your file doesn't use commas as the delimiter.
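If you want several columns at once, another option is to transpose all rows into columns with zip(*...). This is a sketch that uses two rows from the question, held in an io.StringIO stand-in for the real file. Note that it loads the whole file into memory, and that csv always yields strings, so the numbers are converted explicitly.

```python
import csv
import io

# Two rows from the question, held in memory; in practice pass the open file instead.
sample = io.StringIO(
    "File,2/13/2017,domain\\test_roi,root_user,ntsrv1,/vol/vol_ntsrv1_02,983,0,Prod,983\n"
    "File,2/13/2017,domain\\test_roi,root_user,ntsrv1,/vol/vol_ntsrv1_12,189,2,Prod,187\n"
)

# zip(*rows) regroups rows into columns. Caveat: zip stops at the shortest row,
# so rows with missing trailing fields would silently drop columns.
columns = list(zip(*csv.reader(sample)))

four_col = list(columns[3])               # ['root_user', 'root_user']
seven_col = [int(v) for v in columns[6]]  # [983, 189]
print(four_col, seven_col)
```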

It's pretty simple this way (plain str.split, which assumes no field contains a quoted comma):
fourth_column_list = []
seventh_column_list = []
with open(my_csv_file, 'r') as infile:
    parsed = (line.rstrip('\n').split(',') for line in infile)  # split each line into columns
    for parsed_line in parsed:  # iterate over parsed lines
        fourth_column_list.append(parsed_line[3])  # append 4th column
        seventh_column_list.append(parsed_line[6])  # append 7th column
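The str.split approach works for the data in the question, but it breaks as soon as a field contains a quoted comma, which csv.reader handles. The line below is a made-up example to show the difference:

```python
import csv

# Hypothetical line with a quoted field that contains a comma.
line = 'File,2/13/2017,"Smith, J",root_user\n'

naive = line.rstrip('\n').split(',')  # splits inside the quotes too
parsed = next(csv.reader([line]))     # csv.reader honors the quoting

print(naive)   # ['File', '2/13/2017', '"Smith', ' J"', 'root_user']
print(parsed)  # ['File', '2/13/2017', 'Smith, J', 'root_user']
```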

Related

Adding/appending additional information in a column for a csv file

EDIT:
I need to store/add/append additional information in a specific column of a CSV file without using csv.DictReader.
If a row's value in that column is empty, I want to skip it. What do I need to do for that?
For example:
Sample csv file:
$ cat file.csv
"A","B","C","D","E"
"a1","b1","c1","d1","e1"
"a2","b2","c2","d2","e2"
"a2","b2","c2",,"e2"
Code:
import csv
sample = ['dx;dy']
with open('file.csv', 'r', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',', quotechar='"')
    headers = next(reader)
    for row in reader:
        # sample.append(to the column D)
        pass
The Output should look like this:
$ cat file.csv
"A","B","C","D","E"
"a1","b1","c1","d1;dx;dy","e1"
"a2","b2","c2","d2;dx;dy","e2"
"a2","b2","c2",,"e2"
Since you know the header of the column you want to append to, you can find its index in the headers row and then modify that element of each row, leaving empty cells untouched:
import csv
append_to_column = 'D'
separator = ';'
sample = ['dx;dy']
with open('file.csv', 'r', newline='') as csvfile, open('outfile.csv', 'w', newline='') as outfile:
    reader = csv.reader(csvfile, delimiter=',', quotechar='"')
    writer = csv.writer(outfile, delimiter=',', quotechar='"')
    headers = next(reader)
    writer.writerow(headers)  # copy the header row unchanged
    col_index = headers.index(append_to_column)
    for row in reader:
        value = row[col_index]
        if value:  # skip empty cells
            row[col_index] = value + separator + sample[0]
        writer.writerow(row)
Which gives:
A,B,C,D,E
a1,b1,c1,d1;dx;dy,e1
a2,b2,c2,d2;dx;dy,e2
a2,b2,c2,,e2
Note that this file doesn't have quotes because they aren't required: none of the fields contain commas. If you want to force csv.writer to write quotes, add the quoting=csv.QUOTE_ALL argument to the csv.writer() call, like so: writer = csv.writer(outfile, delimiter=',', quotechar='"', quoting=csv.QUOTE_ALL)
Then, you'll get:
"A","B","C","D","E"
"a1","b1","c1","d1;dx;dy","e1"
"a2","b2","c2","d2;dx;dy","e2"
"a2","b2","c2","","e2"
With QUOTE_ALL the empty field is quoted too, so it comes out as "" rather than the bare ,, shown in your desired output.

Appending a row of data to line 2 in a large CSV File

I'm sure this is a really easy question but I can't seem to find any information on it.
I have a very large CSV file into which I need to insert a row directly after the header; the row helps another script that reads the CSV and joins it to a parcel shapefile.
I have the code to append the row of data that I want, but it will only go to the last line. I cannot figure out how to get the code to insert my row immediately after the header row. Here is my code:
import os
import csv
insert_row = '"AAAAAAAAAAAAAAAAAAA","**********","**********","**********","**********","**********","**********","**","**********","**********","****","**********",999999,9999,00'
os.chdir(r"D:\PROPERTY\PINELLAS\Data_20201001_t")
with open("owner_mail.csv", 'r') as csv_file, open("owner_mail.csv", 'a', newline = "") as new_file:
    csv_reader = csv.reader(csv_file)
    csv_writer = csv.writer(new_file)
    csv_writer.writerow(insert_row)
So that's it. I just need the insert_row line of data to be in row position number 2 instead of at the end of the file. Thank you.
You can't just insert a row in the middle of a file unless replacing data of exactly the same length. You have to read the entire file, edit it, and re-write it.
Something like this should work:
import csv
# This must be an iterable (here a tuple), not a single string.
# '00' is a string on purpose: the bare literal 00 would be written out as 0.
insert_row = ("AAAAAAAAAAAAAAAAAAA","**********","**********","**********","**********","**********","**********","**","**********","**********","****","**********",999999,9999,'00')
with open("owner_mail.csv", 'r', newline='') as csv_file, open("owner_mail_updated.csv", 'w', newline='') as new_file:
    csv_reader = csv.reader(csv_file)
    csv_writer = csv.writer(new_file)
    header = next(csv_reader)
    csv_writer.writerow(header)
    csv_writer.writerow(insert_row)
    for line in csv_reader:
        csv_writer.writerow(line)
If the CSV file is small enough to fit entirely in memory, then you can read all the lines at once, edit them, and write them back out to the same file, as below. That is riskier if something goes wrong partway through; it's safer to write to a new file, then delete the original and rename the new one if there were no errors.
import csv
# This must be an iterable (here a tuple), not a single string.
insert_row = ("AAAAAAAAAAAAAAAAAAA","**********","**********","**********","**********","**********","**********","**","**********","**********","****","**********",999999,9999,'00')
with open("owner_mail.csv", 'r', newline='') as csv_file:
    rows = list(csv.reader(csv_file))
rows.insert(1, insert_row)  # insert after header row
with open("owner_mail.csv", 'w', newline='') as csv_file:
    w = csv.writer(csv_file)
    w.writerows(rows)
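The safer route mentioned above (write a new file, then replace the original only if nothing failed) can be sketched like this. It streams the rows, so it also works for files too large for memory. The function name insert_after_header is just illustrative; the temp file is created in the same directory so that os.replace can swap it in (an atomic rename on POSIX systems).

```python
import csv
import os
import tempfile


def insert_after_header(path, new_row):
    """Insert new_row as line 2 of the CSV file at path, streaming through a temp file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the original so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(suffix=".csv", dir=directory)
    try:
        with os.fdopen(fd, "w", newline="") as dst, open(path, newline="") as src:
            reader = csv.reader(src)
            writer = csv.writer(dst)
            writer.writerow(next(reader))  # copy the header row
            writer.writerow(new_row)       # the inserted row
            writer.writerows(reader)       # stream the rest without loading it all
        os.replace(tmp_path, path)         # swap in the new file only after writing succeeded
    except BaseException:
        os.remove(tmp_path)                # leave the original untouched on failure
        raise
```

Usage would be insert_after_header("owner_mail.csv", insert_row) with the tuple from the answers above. Keep in mind that csv.writer rewrites every row with its own quoting rules (minimal quoting by default), so quotes that weren't needed in the original file are dropped.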
Please try this:
import csv
# A tuple of fields, not one string: a string would be written one character per field.
insert_row = ("AAAAAAAAAAAAAAAAAA","**********","**********","**********","**********","**********","**********","**","**********","**********","****","**********",999999,9999,'00')
# Read everything first: opening the same file with 'w' truncates it immediately.
with open("owner_mail.csv", 'r', newline='') as csv_file:
    reader = list(csv.reader(csv_file))
reader.insert(1, insert_row)
with open("owner_mail.csv", 'w', newline='') as new_file:
    csv_writer = csv.writer(new_file)
    csv_writer.writerows(reader)

Need help in extracting data from csv and writing to a text file

I have a CSV with two columns of data. I want to extract the data from one column and write it to a text file, with each element in single quotes and the elements separated by commas. For example, I have this:
taxable_entity_id,id
45efc167-9254-406c-b5a8-6aef91a73dd9,331999
5ae97680-f489-4182-9dcb-eb07a73fab15,103507
00018d93-ae71-4367-a0da-f252cea4dfa2,32991
I want all the taxable_entity_ids in a text file like this
'45efc167-9254-406c-b5a8-6aef91a73dd9','5ae97680-f489-4182-9dcb-eb07a73fab15','00018d93-ae71-4367-a0da-f252cea4dfa2'
without any space between two elements, separated by a comma.
Edit:
This is what I tried:
import csv
with open("Taxable_entity_those_who_filed_G1_M_July_but_not_in_Aug.csv", 'r') as csv_File:
    reader = csv.DictReader(csv_File)
    with open("te_id.csv", 'w') as text_file:
        writer = csv.writer(text_file, quotechar='\'', quoting=csv.QUOTE_MINIMAL)
        for row in reader:
            writer.writerow(row["taxable_entity_id"])
            # print(row["taxable_entity_id"])
text_file.close()
csv_File.close()
and this is what I got:
4,5,e,f,c,1,6,7,-,9,2,5,4,-,4,0,6,c,-,b,5,a,8,-,6,a,e,f,9,1,a,7,3,d,d,9
5,a,e,9,7,6,8,0,-,f,4,8,9,-,4,1,8,2,-,9,d,c,b,-,e,b,0,7,a,7,3,f,a,b,1,5
0,0,0,1,8,d,9,3,-,a,e,7,1,-,4,3,6,7,-,a,0,d,a,-,f,2,5,2,c,e,a,4,d,f,a,2
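You were close. writerow() expects a sequence of fields, so passing it a single string makes it treat every character as a separate field; that's why you got one character per column. Since you want a single line in the output file, write it all at once by passing a generator expression: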
import csv
with open("Taxable_entity_those_who_filed_G1_M_July_but_not_in_Aug.csv", 'r', newline='') as csv_File:
    reader = csv.DictReader(csv_File)
    with open("te_id.csv", 'w', newline='') as text_file:
        # use QUOTE_ALL to force the quoting
        writer = csv.writer(text_file, quotechar='\'', quoting=csv.QUOTE_ALL)
        writer.writerow(row["taxable_entity_id"] for row in reader)
And you don't need to call close(), since you (correctly) used with.
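The character-by-character output in the question comes from passing a str to writerow(): a string is itself a sequence, so each character becomes a field. A small in-memory demonstration (the value "45ef" is just a shortened ID):

```python
import csv
import io

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow("45ef")    # a str is iterated character by character
writer.writerow(["45ef"])  # a one-element list is a single field
print(out.getvalue())
```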
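Try this:
import pandas as pd
df = pd.read_csv('nameoffile.csv', delimiter=',')
ids = df.iloc[:, 0].astype(str)  # first column (taxable_entity_id)
with open('newfile.txt', 'w') as f:
    # quote each ID and join with commas, with no trailing comma
    f.write(','.join("'" + v + "'" for v in ids))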
It seems a little odd that you basically want a one-row CSV file for the taxable_entity_ids, but it's certainly possible. You also don't need to explicitly close() the open files, because the with context manager does it for you automatically.
You also need to open the CSV files with newline='' as shown in all the examples in the csv module's documentation.
Lastly, if you want all the fields to be quoted, you need to use quoting=csv.QUOTE_ALL instead of quoting=csv.QUOTE_MINIMAL.
import csv
inp_filename = "Taxable_entity_those_who_filed_G1_M_July_but_not_in_Aug.csv"
outp_filename = "te_id.csv"
with open(outp_filename, 'w', newline='') as text_file, \
     open(inp_filename, 'r', newline='') as csv_File:
    reader = csv.DictReader(csv_File)
    writer = csv.writer(text_file, quotechar="'", quoting=csv.QUOTE_ALL)
    taxable_entity_ids = (row["taxable_entity_id"] for row in reader)
    writer.writerow(taxable_entity_ids)
print('done')
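To check the exact output without touching files, here is an in-memory sketch using the first two rows from the question. It also shows that a plain str.join produces the same single line, which is fine as long as no ID contains a quote character (csv.writer would escape those, join would not):

```python
import csv
import io

# First two data rows from the question, held in memory for the demo.
src_text = (
    "taxable_entity_id,id\n"
    "45efc167-9254-406c-b5a8-6aef91a73dd9,331999\n"
    "5ae97680-f489-4182-9dcb-eb07a73fab15,103507\n"
)

ids = [row["taxable_entity_id"] for row in csv.DictReader(io.StringIO(src_text))]

# Option 1: csv.writer quotes every field with ' and separates with commas.
out = io.StringIO()
csv.writer(out, quotechar="'", quoting=csv.QUOTE_ALL, lineterminator="\n").writerow(ids)

# Option 2: plain string join, assuming the IDs contain no quote characters.
joined = ",".join(f"'{v}'" for v in ids) + "\n"

print(out.getvalue(), end="")
```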

Formatting csv file with python

I have a csv file with the following structure:
"txNomeParlamentar";"ideCadastro";"nuCarteiraParlamentar";"nuLegislatura";"sgUF"
"AVANTE;1;1;2015;PP"
I need it to look like this:
"txNomeParlamentar";"ideCadastro";"nuCarteiraParlamentar";"nuLegislatura";"sgUF"
"AVANTE";"1";"1";"2015";"PP"
I received this .csv file from someone else, so I do not know how the conversion was done. I am trying unsuccessfully with the code below:
input_fd = open("/home/gustavo/Downloads/Redes/Despesas/csvfile.csv", 'r')
output_fd = open('dados_2018_1.csv', 'w')
for line in input_fd.readlines():
    line.replace("\"","")
    output_fd.write(line)
input_fd.close()
output_fd.close()
Is it possible to make this change or will I have to do the conversion from an xml file to a csv, and make this change during the conversion?
First, tell the reader to use delimiter=";" and quoting=csv.QUOTE_NONE. This makes it split your second line, a quoted string that contains the delimiter, into separate fields, which is what you want. Because the quotation marks are then kept as ordinary characters, we strip them from each field (otherwise the fields would still contain literal quotes, like '"txNomeParlamentar"').
import csv
with open('file.txt', newline='') as f:
    reader = csv.reader(f, delimiter=";", quoting=csv.QUOTE_NONE)
    # remove the literal quote characters from every field
    data = [[s.replace('"', '') for s in row] for row in reader]
Then write the file back out with delimiter=";" and quoting=csv.QUOTE_ALL, so that every item is enclosed in quotes:
with open('out.txt', 'w', newline='') as o:
    writer = csv.writer(o, delimiter=";", quoting=csv.QUOTE_ALL)
    writer.writerows(data)
Input:
"txNomeParlamentar";"ideCadastro";"nuCarteiraParlamentar";"nuLegislatura";"sgUF"
"AVANTE;1;1;2015;PP"
Output:
"txNomeParlamentar";"ideCadastro";"nuCarteiraParlamentar";"nuLegislatura";"sgUF"
"AVANTE";"1";"1";"2015";"PP"
A couple of things. First, you do NOT have a CSV file, because in a CSV file the delimiter is a comma by definition. I'm assuming you want (1) the values in your data file to remain separated by semicolons [why not fix it and make them commas?] and (2) each value to be in quotation marks.
If so, I think this will work:
# data reader
in_file = 'data.txt'
out_file = 'fixed.txt'
with open(in_file, 'r') as source, open(out_file, 'w') as output:
    for line in source:
        # split by semicolon
        data = line.strip().split(';')
        # remove all quotes found
        data = [t.replace('"', '') for t in data]
        for item in data[:-1]:
            output.write(''.join(['"', item, '"', ';']))
        # write the last item separately, without the trailing ';'
        output.write(''.join(['"', data[-1], '"']))
        output.write('\n')
If the file is going to be read by Python, you should consider replacing the semicolons with commas (standard CSV format) and forgoing the quotes. Everything Python's csv module reads comes in as a string anyhow.
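A sketch of that comma-separated, unquoted variant, using the two lines from the question in memory. It assumes, as the other answers do, that no value legitimately contains ';' or '"'. Reading the result back shows that every value is a str, so numbers still need an explicit int() if you want them as numbers:

```python
import csv
import io

src = io.StringIO(
    '"txNomeParlamentar";"ideCadastro";"nuCarteiraParlamentar";"nuLegislatura";"sgUF"\n'
    '"AVANTE;1;1;2015;PP"\n'
)
dst = io.StringIO()
writer = csv.writer(dst, lineterminator="\n")  # default: comma delimiter, minimal quoting
for row in csv.reader(src, delimiter=";", quoting=csv.QUOTE_NONE):
    writer.writerow([field.replace('"', "") for field in row])

print(dst.getvalue(), end="")

# Reading it back: every value comes in as a str.
header, first = list(csv.reader(io.StringIO(dst.getvalue())))
print(first)  # ['AVANTE', '1', '1', '2015', 'PP']
```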
Using the csv module, for example:
import csv
with open(filename, newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=";")
    headers = next(reader)  # read headers
    # each data line is one quoted string: strip the newline and quotes, then split on ';'
    data = [line.strip().strip('"').split(";") for line in csvfile]
with open(filename, "w", newline='') as csvfile_out:
    writer = csv.writer(csvfile_out, delimiter=";", quoting=csv.QUOTE_ALL)
    writer.writerow(headers)  # write headers
    writer.writerows(data)  # write data
You could use the csv module to do it if you massage the input data a little first.
import csv
#input_csv = '/home/gustavo/Downloads/Redes/Despesas/csvfile.csv'
input_csv = 'gustavo_input.csv'
output_csv = 'dados_2018_1.csv'
with open(input_csv, 'r', newline='') as input_fd, \
     open(output_csv, 'w', newline='') as output_fd:
    reader = csv.DictReader(input_fd, delimiter=';')
    writer = csv.DictWriter(output_fd, delimiter=';',
                            fieldnames=reader.fieldnames,
                            quoting=csv.QUOTE_ALL)
    writer.writeheader()  # write the header row, quoted
    first_field = reader.fieldnames[0]
    for row in reader:
        # the whole data line arrives in the first field; split it ourselves
        fields = row[first_field].split(';')
        newrow = dict(zip(reader.fieldnames, fields))
        writer.writerow(newrow)
print('done')

How to read one single line of csv data in Python?

There are a lot of examples of reading CSV data using Python, like this one:
import csv
with open('some.csv', newline='') as f:
reader = csv.reader(f)
for row in reader:
print(row)
I only want to read one line of data and enter it into various variables. How do I do that? I've looked everywhere for a working example.
My code only retrieves the value for i, and none of the other values
reader = csv.reader(csvfile, delimiter=',', quotechar='"')
for row in reader:
    i = int(row[0])
    a1 = int(row[1])
    b1 = int(row[2])
    c1 = int(row[2])
    x1 = int(row[2])
    y1 = int(row[2])
    z1 = int(row[2])
To read only the first row of the CSV file, call next() on the reader object:
import csv
with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    row1 = next(reader)  # gets the first line
    # now do something here
    # if the first row is the header, call next() once more to get the next row:
    # row2 = next(reader)
Or:
with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        # do something here with `row`
        break
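Since the question is about putting one row into several variables: a sketch that combines next() with a default (so an empty file gives None instead of raising StopIteration) and tuple unpacking. The four-column numeric layout is hypothetical, and io.StringIO stands in for the real file:

```python
import csv
import io

# Stand-in for open('some.csv', newline=''); the four-column layout is hypothetical.
f = io.StringIO("1,2,3,4\n5,6,7,8\n")
reader = csv.reader(f)

first = next(reader, None)  # None instead of StopIteration if the file is empty
if first is not None:
    # unpack into variables; the number of names must match the number of fields
    i, a1, b1, c1 = map(int, first)
    print(i, a1, b1, c1)  # 1 2 3 4
```

If a row may have more fields than you need, i, a1, *rest = first collects the remainder in a list instead of raising a ValueError.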
If the first row is a header, you can read the header and then the first data row like this:
import csv
with open('some.csv', newline='') as f:
    csv_reader = csv.reader(f)
    csv_headings = next(csv_reader)
    first_line = next(csv_reader)
You can use the pandas library to read just the first few rows of a huge dataset:
import pandas as pd
data = pd.read_csv("names.csv", nrows=1)
Set the number of rows to read with the nrows parameter.
Just for reference, a for loop can be used after getting the first row to get the rest of the file:
import csv
with open('file.csv', newline='') as f:
    reader = csv.reader(f)
    row1 = next(reader)  # gets the first line
    for row in reader:
        print(row)  # prints rows 2 and onward
From the Python documentation:
And while the module doesn’t directly support parsing strings, it can easily be done:
import csv
for row in csv.reader(['one,two,three']):
    print(row)
Just drop your string data into a singleton list.
A simple way to get any row of a CSV file (this reads the whole file into a list first):
import csv
csvFileArray = []
with open('some.csv', newline='') as csvfile:
    for row in csv.reader(csvfile, delimiter=','):
        csvFileArray.append(row)
print(csvFileArray[0])
To print a range of rows, here the rows at indices 4, 5 and 6 (slicing is zero-based and stops before index 7):
import csv
with open('california_housing_test.csv', newline='') as csv_file:
    data = csv.reader(csv_file)
    for row in list(data)[4:7]:
        print(row)
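list(data)[4:7] loads the entire file before slicing. itertools.islice gives the same rows but stops reading once it has them, which matters for large files. A sketch with ten made-up rows in memory standing in for the real file:

```python
import csv
import io
from itertools import islice

# Stand-in for open('california_housing_test.csv', newline=''): ten made-up rows.
f = io.StringIO("".join(f"row{n},{n}\n" for n in range(10)))

# islice(reader, 4, 7) yields the same rows as list(reader)[4:7],
# but stops reading after the row at index 6 instead of loading the whole file.
rows = list(islice(csv.reader(f), 4, 7))
for row in rows:
    print(row)
```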
I think the simplest way is the best way, and in this case (and in most others) that means not using an external library (pandas) or even the csv module. So here is the simple answer; note that it gives you the raw line as a string, including the trailing newline, without splitting it into fields:
# no need to give any mode; the default is text read mode
with open('some.csv') as f:
    # store the line in a variable to be used later
    my_line = f.readline()
# do what you like with my_line now
