Getting unique values from csv file, output to new file

I am trying to get the unique values from a csv file. Here's an example of the file:
12,life,car,good,exellent
10,gift,truck,great,great
11,time,car,great,perfect
The desired output in the new file is this:
12,10,11
life,gift,time
car,truck
good,great
exellent,great,perfect
Here is my code:
def attribute_values(in_file, out_file):
    fname = open(in_file)
    fout = open(out_file, 'w')
    # get the header line
    header = fname.readline()
    # get the attribute names
    attrs = header.strip().split(',')
    # get the distinct values for each attribute
    values = []
    for i in range(len(attrs)):
        values.append(set())
    # read the data
    for line in fname:
        cols = line.strip().split(',')
        for i in range(len(attrs)):
            values[i].add(cols[i])
        # write the distinct values to the file
        for i in range(len(attrs)):
            fout.write(attrs[i] + ',' + ','.join(list(values[i])) + '\n')
    fout.close()
    fname.close()
The code currently outputs this:
12,10
life,gift
car,truck
good,great
exellent,great
12,10,11
life,gift,time
car,car,truck
good,great
exellent,great,perfect
How can I fix this?

You could try to use zip to iterate over the columns of the input file, and then eliminate the duplicates:
import csv
def attribute_values(in_file, out_file):
    with open(in_file, "r") as fin, open(out_file, "w") as fout:
        # zip(*rows) transposes the rows into columns
        for column in zip(*csv.reader(fin)):
            items, row = set(), []
            for item in column:
                # keep only the first occurrence of each value
                if item not in items:
                    items.add(item)
                    row.append(item)
            fout.write(",".join(row) + "\n")
Result for the example file:
12,10,11
life,gift,time
car,truck
good,great
exellent,great,perfect
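A shorter variant of the same idea, as a minimal sketch: dict.fromkeys keeps the first occurrence of each key in insertion order (guaranteed from Python 3.7 on), so it can stand in for the explicit set-and-list loop. The name unique_columns is hypothetical. Like the answer above, this assumes the file has no header row and that every row has the same number of fields.

```python
import csv


def unique_columns(in_file, out_file):
    """Write one line per input column, holding that column's distinct values."""
    with open(in_file, newline="") as fin, open(out_file, "w", newline="") as fout:
        writer = csv.writer(fout)
        # zip(*rows) transposes the rows into columns.
        for column in zip(*csv.reader(fin)):
            # dict.fromkeys drops duplicates but keeps first-seen order.
            writer.writerow(list(dict.fromkeys(column)))
```

Calling unique_columns("in.csv", "out.csv") on the example file should give the same five lines as the result above.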

Related

removing duplicate id entry from text file using python

I have a text file which contains this data (items corresponds to code,entry1,entry2) :
a,1,2
b,2,3
c,4,5
....
....
Here a, b, c, ... will always be unique.
Every time I read this file in Python, I want to either create a new entry (for example d,6,7) or update existing values (say a,1,2 to a,4,3).
I use the following code:
data = ['a',5,6]
datastring = ''
for d in data:
    datastring = datastring + str(d) + ','
try:
    with open("opfile.txt", "a") as f:
        f.write(datastring + '\n')
        f.close()
    return(True)
except:
    return(False)
This appends any data as a new entry.
I am trying something like this which checks the first character of each line:
f = open("opfile.txt", "r")
for x in f:
    if(x[0] == username):
        pass
I don't know how to combine these two so that the first character (call it the id) is checked: if an entry with that id is already in the file, it should be replaced with the new data while all other lines stay the same; otherwise the data should be added as a new line.

Read the file into a dictionary that uses the first field as the key. Update the appropriate entry, then write the dictionary back.
Use the csv module to parse and format the file.
import csv
data = ['a',5,6]
with open("opfile.txt", "r", newline='') as infile:
    incsv = csv.reader(infile)
    # map each id (first field) to its row, skipping blank lines
    d = {row[0]: row for row in incsv if len(row) != 0}
# replace the existing row for this id, or add a new one
d[data[0]] = data
with open("opfile.txt", "w", newline='') as outfile:
    outcsv = csv.writer(outfile)
    outcsv.writerows(d.values())
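The same read-update-rewrite approach can be wrapped in a reusable function. This is only a sketch: the name upsert_record is hypothetical, and it assumes the file already exists and is small enough to hold in memory.

```python
import csv


def upsert_record(path, record):
    """Replace the row whose first field equals record[0], or add it as a new row."""
    with open(path, newline="") as infile:
        # Map each id (first field) to its row, skipping blank lines.
        rows = {row[0]: row for row in csv.reader(infile) if row}
    # Assigning to an existing key keeps its position; a new key goes last.
    rows[str(record[0])] = [str(field) for field in record]
    with open(path, "w", newline="") as outfile:
        csv.writer(outfile).writerows(rows.values())
```

For example, upsert_record("opfile.txt", ['a', 5, 6]) would rewrite the a line in place, while upsert_record("opfile.txt", ['d', 6, 7]) would add a new last line.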

First, append any new rows to the file.
Second, to update an existing row, read all the lines and rewrite the file, replacing the line whose first field matches:
def update_record(file_name, field1, field2, field3):
    with open(file_name, 'r') as f:
        lines = f.readlines()
    with open(file_name, 'w') as f:
        for line in lines:
            # compare only the first field, so the id is not matched inside other fields
            if line.split(',')[0] == field1:
                f.write(field1 + ',' + field2 + ',' + field3 + '\n')
            else:
                f.write(line)

Python script using json.load to compare two files and replace strings

I have a JSON file like this: [{"ID": "12345", "Name":"John"}, {"ID":"45321", "Name":"Max"}...] called myclass.json. I used json.load to get the "ID" and "Name" values.
I have another .txt file with the content below. File name is list.txt:
Student,12345,Age 14
Student,45321,Age 15
.
.
.
I'm trying to create a script in Python that compares the two files line by line and replaces the student ID with the student's name in list.txt, so the new file would be:
Student,John,Age 14
Student,Max,Age 15
.
.
Any ideas?
My code so far:
import json
with open('/myclass.json') as f:
    data = json.load(f)
    for key in data:
        x = key['Name']
        z = key['ID']
with open('/myclass.json', 'r') as file1:
    with open('/list.txt', 'r+') as file2:
        for line in file2:
            x = z

Try this:
import json
import csv
with open('myclass.json') as f:
    data = json.load(f)
with open('list.txt', 'r') as f:
    reader = csv.reader(f)
    rows = list(reader)
def get_name(id_):
    # return the name for this ID, or None if no student matches
    for item in data:
        if item['ID'] == id_:
            return item["Name"]
with open('list.txt', 'w', newline='') as f:
    writer = csv.writer(f)
    for row in rows:
        name = get_name(id_ = row[1])
        if name:
            row[1] = name
    # write all rows once, after every ID has been replaced
    writer.writerows(rows)
Keep in mind that this script does not replace the items in list.txt one by one. It reads the entire file into memory and then overwrites list.txt, rebuilding it from scratch. I suggest making a backup of list.txt, or writing to a differently named file, in case the program crashes on some unexpected input.
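A variant of the same approach, as a sketch under a few assumptions: it builds an ID-to-name dictionary once instead of scanning the JSON list for every row, and it writes to a separate output file so the original list.txt is left untouched. The function name replace_ids and its three path parameters are hypothetical.

```python
import csv
import json


def replace_ids(json_path, list_path, out_path):
    """Copy list_path to out_path, replacing each known student ID with the name."""
    with open(json_path) as f:
        # One dictionary lookup per row instead of a loop over all students.
        names = {item["ID"]: item["Name"] for item in json.load(f)}
    with open(list_path, newline="") as fin, open(out_path, "w", newline="") as fout:
        writer = csv.writer(fout)
        for row in csv.reader(fin):
            if len(row) > 1:
                # Unknown IDs are left as they are.
                row[1] = names.get(row[1], row[1])
            writer.writerow(row)
```

Rows with fewer than two fields are copied through unchanged rather than raising an IndexError.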

Another option is to open each file separately in the mode it needs and collect the lines whose ID matches into a list. This compares the files position by position, so it assumes list.txt lists the students in the same order as myclass.json:
import json
with open('myclass.json','r') as f_in:
    data = json.load(f_in)
j=0
lis=[]
with open('list.txt', 'r') as f_in:
    for line in f_in:
        if data[j]['ID']==line.split(',')[1]:
            s = line.replace(line.split(',')[1],data[j]['Name'])
            lis.append(s)
        j+=1
with open('list.txt', 'w') as f_out:
    for i in lis:
        f_out.write(i)

How to read list which contains comma from CSV file as a column?

I want to read a CSV file that contains the following data:
Input.csv-
10,[40000,1][50000,5][60000,14]
20,[40000,5][50000,2][60000,1][70000,1][80000,1][90000,1]
30,[60000,4]
40,[40000,5][50000,14]
I want to parse this CSV file row by row, but the lists themselves contain commas, so I'm not getting the correct result.
Program-Code-
if __name__ == "__main__":
    with open(inputfile, "r") as f:
        reader = csv.reader(f,skipinitialspace=True)
        next(reader,None)
        for read in reader:
            no = read[0]
            splitted_record = read[1]
            print splitted_record
Output-
[40000
[40000
[60000
[40000
I understand that csv.reader splits each row at every comma. But how can I read the whole list as one column?
Expected Output-
[40000,1][50000,5][60000,14]
[40000,5][50000,2][60000,1][70000,1][80000,1][90000,1]
[60000,4]
[40000,5][50000,14]
Writing stuff to other file-
name_list = ['no','splitted_record']
file_name = 'temp/'+ no +'.csv'
if not os.path.exists(file_name):
    f = open(file_name, 'a')
    writer = csv.DictWriter(f,delimiter=',',fieldnames=name_list)
    writer.writeheader()
else:
    f = open(file_name, 'a')
    writer = csv.DictWriter(f,delimiter=',',fieldnames=name_list)
writer.writerow({'no':no,'splitted_record':splitted_record})
How can I write this splitted_record without the surrounding quotes ("")?

You can join those items back together, since you know the row was split on commas:
if __name__ == "__main__":
    with open(inputfile, "r") as f:
        reader = csv.reader(f,skipinitialspace=True)
        next(reader,None)
        for read in reader:
            no = read[0]
            # everything after the first field belongs to the list column
            splitted_record = ','.join(read[1:])
            print splitted_record
Output:
[40000,1][50000,5][60000,14]
[40000,5][50000,2][60000,1][70000,1][80000,1][90000,1]
[60000,4]
[40000,5][50000,14]
Update: here data holds the output lines shown above.
with open(filepath,'wb') as f:
    w = csv.writer(f)
    for line in data:
        w.writerow([line])
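As an alternative to rejoining the fields, each line can be split only at its first comma. The following is a minimal Python 3 sketch, not the code from the answer above: the name read_records and the skip_header parameter are hypothetical, and it assumes the first line of the file is a header, as the next(reader, None) call in the question suggests.

```python
def read_records(path, skip_header=True):
    """Yield (no, record) pairs, splitting each line only at the first comma."""
    with open(path) as f:
        if skip_header:
            next(f, None)  # assumed header line, discarded
        for line in f:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            # partition splits at the first comma only, so the bracketed
            # lists keep their inner commas.
            no, _, record = line.partition(",")
            yield no, record
```

Writing the two parts back with a plain f.write(no + ',' + record + '\n') instead of csv.writer leaves the lists unquoted, since csv.writer by default quotes any field that contains the delimiter.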

You can use your own dialect and register it to read as you need.
https://docs.python.org/2/library/csv.html

How to not just add a new first column to csv but alter the header names

I would like to do the following: read a CSV file, add a new first column, rename some of the columns, and then load the records from the CSV file.
Ultimately, I would like the first column to be populated with the file name.
I'm fairly new to Python. I've more or less worked out how to change the fieldnames, but loading the data is a problem because it looks for the original fieldnames, which no longer match.
Code snippet
import csv
import os
inputFileName = "manifest1.csv"
outputFileName = os.path.splitext(inputFileName)[0] + "_modified.csv"
with open(inputFileName, 'rb') as inFile, open(outputFileName, 'wb') as outfile:
    r = csv.DictReader(inFile)
    fieldnames = ['MapSvcName','ClientHostName', 'Databasetype', 'ID_A', 'KeepExistingData', 'KeepExistingMapCache', 'Name', 'OnPremisePath', 'Resourcestype']
    w = csv.DictWriter(outfile,fieldnames)
    w.writeheader()
    # *** Here is where I start to go wrong
    # copy the rest
    for node, row in enumerate(r,1):
        w.writerow(dict(row))
Error
File "D:\Apps\Python27\ArcGIS10.3\lib\csv.py", line 148, in _dict_to_list
+ ", ".join([repr(x) for x in wrong_fields]))
ValueError: dict contains fields not in fieldnames: 'Databases [xsi:type]', 'Resources [xsi:type]', 'ID'
I would like some assistance so that I not only learn but truly understand what I need to do.
Cheers and thanks
Peter
Update..
I think I've worked it out
import csv
import os
inputFileName = "manifest1.csv"
outputFileName = os.path.splitext(inputFileName)[0] + "_modified.csv"
with open(inputFileName, 'rb') as inFile, open(outputFileName, 'wb') as outfile:
    r = csv.reader(inFile)
    w = csv.writer(outfile)
    # read and discard the old header; calling next(r) a second time here
    # would also drop the first data row
    header = next(r)
    # write new header
    w.writerow(['MapSvcName','ClientHostName', 'Databasetype', 'ID_A', 'KeepExistingData', 'KeepExistingMapCache', 'Name', 'OnPremisePath', 'Resourcestype'])
    prevRow = next(r)
    prevRow.insert(0, '0')
    w.writerow(prevRow)
    for row in r:
        if prevRow[-1] == row[-1]:
            val = '0'
        else:
            val = prevRow[-1]
        row.insert(0,val)
        prevRow = row
        w.writerow(row)
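For the stated goal of filling the new first column with the file name, a minimal Python 3 sketch could look like the following. The RENAMES mapping is an assumption inferred from the error message and the fieldnames list in the question, and the function name add_file_name_column is hypothetical. Using csv.reader and csv.writer rather than the Dict variants sidesteps the fieldname mismatch, because rows are handled by position.

```python
import csv
import os

# Assumed mapping from old header names to new ones, inferred from the
# ValueError message and the fieldnames list in the question.
RENAMES = {
    "Databases [xsi:type]": "Databasetype",
    "ID": "ID_A",
    "Resources [xsi:type]": "Resourcestype",
}


def add_file_name_column(in_path, out_path, new_column="MapSvcName"):
    """Copy a CSV, renaming some headers and prepending a file-name column."""
    with open(in_path, newline="") as in_file, \
            open(out_path, "w", newline="") as out_file:
        reader = csv.reader(in_file)
        writer = csv.writer(out_file)
        old_header = next(reader)  # consume the header exactly once
        # Rename the columns listed in RENAMES; keep the others unchanged.
        new_header = [RENAMES.get(name, name) for name in old_header]
        writer.writerow([new_column] + new_header)
        file_name = os.path.basename(in_path)
        for row in reader:
            writer.writerow([file_name] + row)
```

Because the header is read once and every remaining row is copied, no data row is skipped.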

Python: add value and write output

I need to get information from a list and add a 'year' column taken from the file name. I'm still not sure how to add a 'year' field to each record. Can I use append?
And for the output file, do I just need to use outputcsv.writerow(records)?
This is the part of the code where I'm stuck:
filenames = ('babyQld2010.csv',
             'babyQld2011.csv',
             'babyQld2012.csv',
             'babyQld2012.csv',
             'babyQld2014.csv')
outFile = open('babyQldAll.csv','w')
csvFile_out = csv.writer(outFile, delimiter=',')
for filename in filenames:
    name, ext = filename.split('.')
    year = name[-4:] #extract year from file names
    records = extract_names(filename)
    # Get (name, count, gender) from list "records",
    # and add value of "year" and write into output file (using "for" loop )
The output file should look like:
2010,Lola,69,Girl
As input, I have 5 files (babyQld2010.csv, babyQld2011.csv, babyQld2012.csv, babyQld2012.csv, babyQld2014.csv) which contain:
Mia,425,William,493
I have to rearrange this into the format below, which I have already done, saving the result in the list 'records':
Lola,69,Girl
Now I need to add a 'year' field to each record in the list and export a CSV file.
This is my full code:
import csv
def extract_names(filename):
    ''' Extract babyname, count, gender from a csv file,
    and return the data in a list.
    '''
    inFile = open(filename, 'rU')
    csvFile = csv.reader(inFile, delimiter=',')
    # Initialization
    records = []
    rowNum = 0
    for row in csvFile:
        if rowNum != 0:
            # +++++ You code here ++++
            # Read each row of csv file and save information in list 'records'
            # as (name, count, gender)
            records.append([row[0], row[1], "Female"])
            records.append([row[2], row[3], "Male"])
            print('Process each row...')
        rowNum += 1
    inFile.close()
    return(records)
#### Start main program #####
filenames = ('babyQld2010.csv',
             'babyQld2011.csv',
             'babyQld2012.csv',
             'babyQld2012.csv',
             'babyQld2014.csv')
with open('babyQldAll.csv','w') as outFile:
    csvFile_out = csv.writer(outFile, delimiter=',')
    for filename in filenames:
        name, ext = filename.split('.')
        year = name.split('.')[0][-4:] #extract year from file names
        records = extract_names(filename)
        for record in records:
            csvFile_out.write([year] + record)
    print("Write in csv file...")
outFile.close()

To get the year from the csv file name, you can simply split the string at '.' and then take the last four characters of the first part. Example -
>>> s = 'babyQld2010.csv'
>>> s.split('.')[0][-4:]
'2010'
Then simply iterate over your list of records, which you say is correct. For each list within it, use list concatenation to create a new list with the year at the start, and write that to the csv file with writerow.
I would also suggest that you use with statement for opening the file to write to (and even in the function where you are reading from the other csv files). Example -
filenames = ('babyQld2010.csv',
             'babyQld2011.csv',
             'babyQld2012.csv',
             'babyQld2012.csv',
             'babyQld2014.csv')
with open('babyQldAll.csv','w') as outFile:
    csvFile_out = csv.writer(outFile, delimiter=',')
    for filename in filenames:
        name, ext = filename.split('.')
        year = name.split('.')[0][-4:] #extract year from file names
        records = extract_names(filename)
        for record in records:
            # prepend the year to each [name, count, gender] record
            csvFile_out.writerow([year] + record)
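The two steps described above (taking the year from the file name, then prepending it to each record) can be isolated in small helpers so they are easy to check on their own. This is only a sketch: the names year_from_filename and add_year are hypothetical, and it assumes the file name always ends with a four-digit year before the extension.

```python
import os


def year_from_filename(filename):
    """Return the last four characters of the file name without its extension."""
    stem, _ext = os.path.splitext(os.path.basename(filename))
    return stem[-4:]


def add_year(records, year):
    """Return new records with the year placed in front of each one."""
    # List concatenation builds a new list, so the input records are not modified.
    return [[year] + list(record) for record in records]
```

Each list returned by add_year can then be passed to writerow as in the loop above.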

Yes, you can just append the year column to each row as you read it in from your source files. You can read in and write out each row as a dictionary, so that you can use your existing column headers to address the data if you need to massage it on the way through.
With the csv.DictWriter class you specify your headers (fieldnames) when you set it up. You can then write them out with the writeheader() method.
import csv
file_list = ['babyQld2010.csv',
             'babyQld2011.csv',
             'babyQld2012.csv',
             'babyQld2012.csv',
             'babyQld2014.csv']
outFile = open('babyQldAll.csv', 'wb')
csv_writer = csv.DictWriter(outFile,
                            fieldnames=['name','count','gender','year'])
csv_writer.writeheader()
for a_file in file_list:
    name,ext = a_file.split('.')
    year = name[-4:]
    with open(a_file, 'rb') as inFile:
        csv_read_in = csv.DictReader(inFile)
        for row in csv_read_in:
            # add the year taken from the file name to every row
            row['year'] = year
            csv_writer.writerow(row)
outFile.close()
Hope this helps.
