Appending data to a CSV skips rows - Python

I have Python code for appending data to the same CSV, but when I append the data, it skips rows and starts from row 15 instead of from row 4.
import csv

with open('csvtask.csv', 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    ls = []
    for line in csv_reader:
        if len(line['Values']) != 0:
            ls.append(int(line['Values']))

new_ls = ['', '', '']
for i in range(len(ls)-1):
    new_ls.append(ls[i+1]-ls[i])
print(new_ls)

with open('csvtask.csv', 'a', newline='') as new_file:
    csv_writer = csv.writer(new_file)
    for i in new_ls:
        csv_writer.writerow(('', '', '', '', i))
    new_file.close()

It's not really feasible to update a file at the same time you're reading it, so a common workaround is to create a new file. The following does that while preserving the fieldnames in the original file. The new column will be named Diff.
Since there's no previous value to use when calculating a difference for the first row, the rows are processed with the built-in enumerate() function, which yields both the index of each item and the item itself as the file is iterated. The index tells you whether the current row is the first one, so it can be handled specially.
import csv

# Read csv file and calculate values of new column.
with open('csvtask.csv', 'r', newline='') as file:
    reader = csv.DictReader(file)
    fieldnames = reader.fieldnames  # Save for later.
    diffs = []
    prev_value = 0
    for i, row in enumerate(reader):
        row['Values'] = int(row['Values']) if row['Values'] else 0
        diff = row['Values'] - prev_value if i > 0 else ''
        prev_value = row['Values']
        diffs.append(diff)

# Read file again and write an updated file with the column added to it.
fieldnames.append('Diff')  # Name of new field.
with open('csvtask.csv', 'r', newline='') as inp:
    reader = csv.DictReader(inp)
    with open('csvtask_updated.csv', 'w', newline='') as outp:
        writer = csv.DictWriter(outp, fieldnames)
        writer.writeheader()
        for i, row in enumerate(reader):
            row.update({'Diff': diffs[i]})  # Add new column.
            writer.writerow(row)

print('Done')
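The rows could also be kept from the first pass so the input only needs to be read once. A minimal single-pass sketch of the same idea, assuming the same csvtask.csv layout with a Values column:
import csv

# Single pass: keep the parsed rows while computing the Diff column.
with open('csvtask.csv', 'r', newline='') as inp:
    reader = csv.DictReader(inp)
    fieldnames = reader.fieldnames + ['Diff']
    rows = list(reader)

prev_value = 0
for i, row in enumerate(rows):
    value = int(row['Values']) if row['Values'] else 0
    row['Diff'] = value - prev_value if i > 0 else ''
    prev_value = value

with open('csvtask_updated.csv', 'w', newline='') as outp:
    writer = csv.DictWriter(outp, fieldnames)
    writer.writeheader()
    writer.writerows(rows)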

You can use the DictWriter class like this (note that writerows() expects dictionaries, one per row, keyed by the fieldnames):
header = ["data", "values"]
writer = csv.DictWriter(file, fieldnames=header)
writer.writeheader()
data = [{"data": 1, "values": 2}, {"data": 4, "values": 6}]
writer.writerows(data)
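A self-contained version of that snippet, as a sketch (the file name output.csv is just a placeholder for illustration):
import csv

header = ["data", "values"]
data = [{"data": 1, "values": 2}, {"data": 4, "values": 6}]

# Hypothetical output file name for illustration.
with open("output.csv", "w", newline="") as file:
    writer = csv.DictWriter(file, fieldnames=header)
    writer.writeheader()    # writes: data,values
    writer.writerows(data)  # writes one row per dict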

Related

Create multiple files from unique values of a column using inbuilt libraries of python

I started learning Python and was wondering if there is a way to create multiple files from the unique values of a column. I know there are hundreds of ways of getting it done through pandas, but I am looking to have it done with built-in libraries. I couldn't find a single example where it's done that way.
Here is the sample csv file data:
uniquevalue|count
a|123
b|345
c|567
d|789
a|123
b|345
c|567
Sample output file:
a.csv
uniquevalue|count
a|123
a|123
b.csv
b|345
b|345
I am struggling with looping over the unique values in a column and then printing them out. Can someone explain the logic for how to do it? That would be much appreciated. Thanks.
import csv
from collections import defaultdict

header = []
data = defaultdict(list)
DELIMITER = "|"

with open("inputfile.csv", newline="") as csvfile:
    reader = csv.reader(csvfile, delimiter=DELIMITER)
    for i, row in enumerate(reader):
        if i == 0:
            header = row
        else:
            key = row[0]
            data[key].append(row)

for key, value in data.items():
    filename = f"{key}.csv"
    with open(filename, "w", newline="") as f:
        writer = csv.writer(f, delimiter=DELIMITER)
        rows = [header] + value
        writer.writerows(rows)
import csv

with open('sample.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter='|')
    next(reader)  # skip the header row
    for row in reader:
        # Append each row to the file named after its first column.
        # (csv.writer takes no fieldnames argument; that's for DictWriter.)
        with open(f"{row[0]}.csv", 'a', newline='') as inner:
            writer = csv.writer(inner, delimiter='|')
            writer.writerow(row)
The task can also be done without using the csv module. The lines of the file are read, and read_file.read().splitlines()[1:] strips the newline characters while also skipping the header line of the CSV file. A set builds a unique collection of the input lines, which is then used to count the number of duplicates and to create the output files.
with open("unique_sample.csv", "r") as read_file:
items = read_file.read().splitlines()[1:]
for line in set(items):
with open(line[:line.index('|')] + '.csv', 'w') as output:
output.write((line + '\n') * items.count(line))
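Note that this writes only the data lines. If each output file should also start with the header row, as in the a.csv sample above, a small variation of the same approach (a sketch assuming the same unique_sample.csv input) keeps the header from the first line:
with open("unique_sample.csv", "r") as read_file:
    header, *items = read_file.read().splitlines()

for line in set(items):
    with open(line[:line.index('|')] + '.csv', 'w') as output:
        # Write the header once, then one copy of the line per duplicate.
        output.write(header + '\n')
        output.write((line + '\n') * items.count(line))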

Insert next line in For loop

I'd like to start a new line for each row inside my for loop. Currently, since everything happens inside the for loop, all the data is stored in one list, and when I write it at the end of my code, it comes out as a single line.
fields = []
new_rows_list = []
file1 = open('CSV_sample_file.csv', 'rb')
reader = csv.reader(file1)
fields = reader.next()
for row in reader:
    for column in row:
        cellValue = column
        new_row = find_and_mask_cc(cellValue)
        new_rows_list.append(new_row)
file1.close()
file2 = open('CSV_sample_file2.csv', 'wb')
writer = csv.writer(file2)
writer.writerow(fields)
writer.writerow(new_rows_list)
file2.close()
What I am getting is this:
Created_User_ID__c,BRAND_Type__c,Migration_Remarks__c
EMSPDGBELNAS,test1,411111XXXXXX1111,EMSPDGCAJAPIN,test2,511111XXXXXX1111,EMSPDGNCRETES,test3,611111XXXXXX1111
My expected output is this:
Created_User_ID__c,BRAND_Type__c,Migration_Remarks__c
EMSPDGBELNAS,test1,411111XXXXXX1111
EMSPDGCAJAPIN,test2,511111XXXXXX111
EMSPDGNCRETES,test3,611111XXXXXX1111
You're appending all columns to the same list new_rows_list and writing it as one row with writer.writerow(new_rows_list).
You can make new_rows_list a list of lists and use writer.writerows for output instead:
...
for row in reader:
    new_row = []
    for column in row:
        cellValue = column
        new_row.append(find_and_mask_cc(cellValue))
    new_rows_list.append(new_row)
file1.close()

file2 = open('CSV_sample_file2.csv', 'wb')
writer = csv.writer(file2)
writer.writerow(fields)
writer.writerows(new_rows_list)
...
Alternatively, you can pass writerows a generator expression that iterates through reader, converting each column with find_and_mask_cc and writing the row to the output as it is read from the input, so the entire input never has to be held in memory:
with open('CSV_sample_file.csv') as file1, open('CSV_sample_file2.csv', 'w', newline='') as file2:
    reader = csv.reader(file1)
    writer = csv.writer(file2)
    writer.writerow(next(reader))
    writer.writerows(map(find_and_mask_cc, row) for row in reader)
Demo: https://repl.it/repls/SatisfiedSardonicExponents
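find_and_mask_cc itself isn't shown in the question; a minimal hypothetical sketch of such a helper, assuming 16-digit card numbers and the XXXXXX masking seen in the expected output, might look like this:
import re

# Hypothetical helper for illustration: keep the first 6 and last 4 digits,
# mask the 6 digits in between (e.g. 4111111111111111 -> 411111XXXXXX1111).
CC_PATTERN = re.compile(r'\b(\d{6})(\d{6})(\d{4})\b')

def find_and_mask_cc(value):
    return CC_PATTERN.sub(lambda m: m.group(1) + 'XXXXXX' + m.group(3), value)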

write specific row only once?

I want to write some data to a CSV file, and I don't have a problem doing this. The only issue is that I want to write the "title" (header) just once, but it's being written every two lines.
Here is my code:
rows = [['IVE_PATH', 'FPS moyen', 'FPS max', 'FPS min', 'MEDIAN'],
        [str(listFps[k]), statistics.mean(numberList), max(numberList), min(numberList), statistics.median(numberList)]]

with open(r"C:\ProgramData\OutilTestObjets3D\MaquetteCB-2019\DataSet\doc.csv", 'a', newline='') as csvfile:
    csv_writer = csv.writer(csvfile, delimiter=';')
    for row in rows:
        csv_writer.writerow(row)
k += 1
I want to have this:
['IVE_PATH','FPS moyen','FPS max','FPS min','MEDIAN']
written only once at the top of the file, and not every two lines.
One quick fix is to skip the header row inside the loop, so only the data rows are appended (the header then has to be written separately, as the next answer shows):
with open(r"C:\ProgramData\OutilTestObjets3D\MaquetteCB-2019\DataSet\doc.csv", 'a', newline='') as csvfile:
    csv_writer = csv.writer(csvfile, delimiter=';')
    for row in rows[1:]:  # skip the header row
        csv_writer.writerow(row)
k += 1
It's because you opened the file in append mode ('a') and you are iterating over all the rows each time you write to the file. This means every time you write, you will add both the header and the data to the existing file.
The solution is to separate the writing of the header and the data rows.
One way is to check first if you are writing to an empty file with tell(), and if you are, that's the only time to write the header. Then proceed with iterating over all the rows except for the header.
import csv

rows = [
    ['IVE_PATH', 'FPS moyen', 'FPS max', 'FPS min', 'MEDIAN'],  # header
    [1, 2, 3, 4, 5],  # sample data
    [6, 7, 8, 9, 0]   # sample data
]

with open("doc.csv", 'a', newline='') as csvfile:
    csv_writer = csv.writer(csvfile, delimiter=';')

    # Check if we are at the top of an empty file.
    # If yes, then write the header.
    # If no, then assume that the header was already written earlier.
    if csvfile.tell() == 0:
        csv_writer.writerow(rows[0])

    # Iterate over only the data, skip rows[0]
    for row in rows[1:]:
        csv_writer.writerow(row)
Another way is to check first if the output CSV file exists. If it does not exist yet, create it and write the header row. Then succeeding runs of your code should only append the data rows.
import csv
import os

rows = [
    ['IVE_PATH', 'FPS moyen', 'FPS max', 'FPS min', 'MEDIAN'],  # header
    [1, 2, 3, 4, 5],  # sample data
    [6, 7, 8, 9, 0]   # sample data
]

csvpath = "doc.csv"

# If the output file does not exist yet, create it.
# Then write the header row.
if not os.path.exists(csvpath):
    with open(csvpath, "w", newline='') as csvfile:
        csv_writer = csv.writer(csvfile, delimiter=';')
        csv_writer.writerow(rows[0])

with open(csvpath, 'a', newline='') as csvfile:
    csv_writer = csv.writer(csvfile, delimiter=';')
    # Iterate over only the data, skip rows[0]
    for row in rows[1:]:
        csv_writer.writerow(row)

Python: replacing some rows of a csv but not others

I am trying to create a new csv file from an original. The new csv file should be a copy of the old, with the exception that a range of values in one column is multiplied by a constant. The values to alter occur from rows i to j inclusive. Here is the code I am attempting:
import csv
import itertools

i, j = 2, 10785

infile = open('../../combined_kW.csv', 'r')
outfile = open('../../combined_kW_adj.csv', 'w')
reader = csv.reader(infile, delimiter=',')
datawriter = csv.writer(outfile, delimiter=',')

datawriter.writerow(['date', 'PVkW', 'TBLkW'])
next(reader)  # there is a header row
for row in reader:
    for row in itertools.islice(reader, i, j):
        row[1] = row[1].replace(row[1], str(float(row[1]) * 5))
        datawriter.writerow((row[0], row[1], row[2]))
From a csv with roughly 25,000 rows, the contents of the returned file are only:
date,PVkW,TBLkW
2016/04/04 03:00,0.0,207.23748999999998
2017/07/19 09:00,2921.5,287.15625
2018/01/12 18:00,0.0,267.9414
None of which are related to the rows i and j designated above. How can I better go about this?
import csv

i, j = 2, 10785

# assuming Python 3; otherwise omit 'newline'
with open('../../combined_kW.csv', 'r', newline='') as f:
    r = csv.reader(f, delimiter=',')
    rows = list(r)

# The slice creates a shallow copy, so each element still refers to
# the same row list inside rows.
for row in rows[i:j]:
    row[1] = row[1].replace(row[1], str(float(row[1]) * 5))

with open('../../combined_kW_adj.csv', 'w', newline='') as f:
    w = csv.writer(f, delimiter=',')
    w.writerows(rows)
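If the file is too large to hold comfortably in memory, a row-by-row variant (a sketch, assuming the same three-column layout and the same slice-style meaning of i and j as above) can apply the multiplication while streaming:
import csv

i, j = 2, 10785

with open('../../combined_kW.csv', 'r', newline='') as inp, \
     open('../../combined_kW_adj.csv', 'w', newline='') as outp:
    reader = csv.reader(inp)
    writer = csv.writer(outp)
    writer.writerow(next(reader))  # copy the header row unchanged
    for n, row in enumerate(reader):
        if i <= n < j:  # rows to adjust
            row[1] = str(float(row[1]) * 5)
        writer.writerow(row)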

How to read one single line of csv data in Python?

There are a lot of examples of reading CSV data using Python, like this one:
import csv

with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
I only want to read one line of data and enter it into various variables. How do I do that? I've looked everywhere for a working example.
My code only retrieves the value for i, and none of the other values
reader = csv.reader(csvfile, delimiter=',', quotechar='"')
for row in reader:
    i = int(row[0])
    a1 = int(row[1])
    b1 = int(row[2])
    c1 = int(row[2])
    x1 = int(row[2])
    y1 = int(row[2])
    z1 = int(row[2])
To read only the first row of the CSV file, use next() on the reader object.
with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    row1 = next(reader)  # gets the first line
    # now do something here
    # if first row is the header, then you can do one more next() to get the next row:
    # row2 = next(reader)
or:
with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        # do something here with `row`
        break
You could skip the headings and get just the first data row like this:
with open('some.csv', newline='') as f:
    csv_reader = csv.reader(f)
    csv_headings = next(csv_reader)
    first_line = next(csv_reader)
You can use the pandas library to read just the first few lines of a huge dataset.
import pandas as pd
data = pd.read_csv("names.csv", nrows=1)
You can specify the number of lines to be read with the nrows parameter.
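If the goal is then to put that single row into separate variables, as the question asks, a small sketch (assuming names.csv has at least two columns; positional access avoids having to know the column names) could look like this:
import pandas as pd

# Read only the first data row (the header is still parsed for column names).
data = pd.read_csv("names.csv", nrows=1)

first_row = data.iloc[0]          # the single row as a Series
first_value = first_row.iloc[0]   # value from the first column
second_value = first_row.iloc[1]  # value from the second column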
Just for reference, a for loop can be used after getting the first row to get the rest of the file:
with open('file.csv', newline='') as f:
    reader = csv.reader(f)
    row1 = next(reader)  # gets the first line
    for row in reader:
        print(row)  # prints rows 2 and onward
From the Python documentation:
And while the module doesn’t directly support parsing strings, it can easily be done:
import csv
for row in csv.reader(['one,two,three']):
    print(row)
Just drop your string data into a singleton list.
A simple way to get any row in a CSV file:
import csv

csvFileArray = []
with open('some.csv', newline='') as csvfile:
    for row in csv.reader(csvfile, delimiter=','):
        csvFileArray.append(row)
print(csvFileArray[0])
To print a range of lines, in this case the rows at index 4 through 6:
import csv

with open('california_housing_test.csv') as csv_file:
    data = csv.reader(csv_file)
    for row in list(data)[4:7]:
        print(row)
I think the simplest way is the best way, and in this case (and in most others) that means not using external libraries (pandas) or extra modules (csv). So, here is the simple answer.
# No need to give any mode; keep it simple.
with open('some.csv') as f:
    # Store the line in a variable to be used later.
    my_line = f.readline()
# Do what you like with 'my_line' now.
