I want to write data to a CSV file row by row.
Please find the code below:
with open('a.csv', mode='w', newline='') as csv_file:
    fieldnames = ['colA', 'colB', 'colC']
    writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
    writer.writerow({'colA': data1, 'colB': data2, 'colC': data3})
The above code is inside a loop in which the data changes on each iteration, and I need to write to the CSV file on every pass. With this code my CSV file ends up with only one line, containing the latest data. How do I modify the code to get multiple lines?
You're recreating the file and reinitializing the CSV writer on each iteration. Instead, move the initialization outside the loop so that it only happens once:
import csv

with open('a.csv', mode='w', newline='') as csv_file:
    fieldnames = ['colA', 'colB', 'colC']
    writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
    writer.writeheader()
    for line in lines:  # this is where your `for` loop should be relative to `with`
        writer.writerow({'colA': data1, 'colB': data2, 'colC': data3})
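As a runnable end-to-end sketch (the rows list here is made-up sample data standing in for data1/data2/data3):

```python
import csv

# made-up sample data standing in for the values produced by the loop
rows = [
    {'colA': 1, 'colB': 2, 'colC': 3},
    {'colA': 4, 'colB': 5, 'colC': 6},
]

with open('a.csv', mode='w', newline='') as csv_file:
    fieldnames = ['colA', 'colB', 'colC']
    writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
    writer.writeheader()      # column names are written once
    for row in rows:
        writer.writerow(row)  # one CSV line per iteration
```

Each writerow() call appends one line to the already-open file, so you end up with a header plus one line per row.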
Is there a way I can extract multiple pieces of data from a text file in Python and save them as a row in a new .csv file? I need to do this for multiple input files and save the output as a single .csv file for all of the input files.
I have never used Python before, so I am quite clueless. I have used MATLAB before, and I know how I would do it in MATLAB if the data were numbers (but unfortunately it is text, which is why I am trying Python). So, to be clear, I need a new line in the .csv output file for each "id" in the input files.
An example of the data is shown below (2 separate files).
EXAMPLE DATA - FILE 1:
id,ARI201803290
version,2
info,visteam,COL
info,hometeam,ARI
info,site,PHO01
info,date,2018/03/29
data,er,corbp001,2
id,ARI201803300
version,2
info,visteam,COL
info,hometeam,ARI
info,site,PHO01
info,date,2018/03/30
data,er,delaj001,0
EXAMPLE DATA - FILE 2:
id,NYN201803290
version,2
info,visteam,SLN
info,hometeam,NYN
info,site,NYC20
info,usedh,false
info,date,2018/03/29
data,er,famij001,0
id,NYN201803310
version,2
info,visteam,SLN
info,hometeam,NYN
info,site,NYC20
info,date,2018/03/31
data,er,gselr001,0
I'm hoping to get the data into .csv format with all the details for one "id" on one line. There are multiple "id"s per text file, and there are multiple files. I want to repeat this process for multiple text files so that all the outputs land in the same .csv output file. I want the output to look as follows in the .csv file, with each piece of info as a new cell:
ARI201803290 COL ARI PHO01 2018/03/29 2
ARI201803300 COL ARI PHO01 2018/03/30 0
NYN201803290 SLN NYN NYC20 2018/03/29 0
NYN201803310 SLN NYN NYC20 2018/03/31 0
If I was doing it in MATLAB, I'd use a for loop and an if statement and say
j=1;
k=1;
for i=1:size(myMatrix, 1)
    if file1(i,1)==id
        output(k,1)=file1(i,2);
        k=k+1;
    elseif file1(i,1)==info
        output(j,2)=file1(i,3);
        j=j+1;
    end
end
etc.....
However, I obviously can't do this in MATLAB because I have comma-separated text files, not a matrix. Does anyone have any suggestions on how I can translate my idea into Python code? Or any other suggestion? I am super new to Python, so I'm willing to try anything that might work.
Thank you very much in advance!
Python is very flexible and can do these jobs very easily.
There are a lot of CSV tools/modules in Python to handle pretty much every type of CSV and Excel file. However, I prefer to handle a CSV the same as a text file, because a CSV is simply a text file with comma-separated values, and simple is better than complicated.
Below is the code, with comments to explain most of it; you can tweak it to match your needs exactly:
import os

input_folder = 'myfolder/'  # path of the folder containing the text files on your disk

# create a list of file names with their full paths, using a list comprehension
data_files = [os.path.join(input_folder, file) for file in os.listdir(input_folder)]

# open our csv file for writing; named csv_out rather than csv to avoid
# shadowing the standard-library module name
csv_out = open('myoutput.csv', 'w')  # better to open files with a context manager as below, but I am trying to show you different methods

def write_to_csv(line):
    print(line)
    csv_out.write(line)

# loop through your text files
for file in data_files:
    with open(file, 'r') as f:  # use a context manager to open files (best practice)
        buff = []
        for line in f:
            line = line.strip()     # remove spaces and newlines
            line = line.split(',')  # split the line into a list of values
            if buff and line[0] == 'id':  # hit another 'id': flush the buffered record
                write_to_csv(','.join(buff) + '\n')
                buff = []
            buff.append(line[-1])   # add the last value in the line
        write_to_csv(','.join(buff) + '\n')  # flush the last record of the file

csv_out.close()  # must close any file handle opened manually, i.e. without a context manager ("with")
output:
ARI201803290,2,COL,ARI,PHO01,2018/03/29,2
ARI201803300,2,COL,ARI,PHO01,2018/03/30,0
NYN201803290,2,SLN,NYN,NYC20,false,2018/03/29,0
NYN201803310,2,SLN,NYN,NYC20,2018/03/31,0
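You can exercise the same buffering logic in memory, without touching the filesystem (a sketch using io.StringIO over data in the FILE 1 format, with each record's lines grouped together):

```python
import io

# in-memory stand-in for one input text file
sample = """\
id,ARI201803290
version,2
info,visteam,COL
info,hometeam,ARI
info,site,PHO01
info,date,2018/03/29
data,er,corbp001,2
id,ARI201803300
version,2
info,visteam,COL
info,hometeam,ARI
info,site,PHO01
info,date,2018/03/30
data,er,delaj001,0
"""

out = []
buff = []
for line in io.StringIO(sample):
    parts = line.strip().split(',')
    if buff and parts[0] == 'id':   # a new record starts: flush the previous one
        out.append(','.join(buff))
        buff = []
    buff.append(parts[-1])          # keep only the last value of each line
out.append(','.join(buff))          # flush the final record

print(out[0])  # ARI201803290,2,COL,ARI,PHO01,2018/03/29,2
print(out[1])  # ARI201803300,2,COL,ARI,PHO01,2018/03/30,0
```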
I have a CSV file in which I am storing 5 rows each time my Python script runs.
I want to overwrite the rows when my script runs for the 6th (or nth) time, deleting the 1st row.
My csv looks like this:
Timestamp,Offset,lag
1441365363,601,1
1441365363,602,2
1441365371,603,3
1441365378,604,4
1441375562,605,0
Sounds like you should use line_number % 5 when you write to your file. Show us some code.
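For example, a minimal sketch of that idea (the file name, helper name, and column names below are made up to match the sample): on every run, read the existing rows back, append the new row, and rewrite only the most recent five.

```python
import csv
import os

def append_keep_last(path, row, keep=5):
    """Append a row to the CSV, keeping only the most recent `keep` data rows."""
    header = ['Timestamp', 'Offset', 'lag']
    rows = []
    if os.path.exists(path):
        with open(path, newline='') as f:
            reader = csv.reader(f)
            next(reader, None)        # skip the header row
            rows = list(reader)
    rows.append([str(v) for v in row])
    rows = rows[-keep:]               # drop the oldest rows beyond the limit
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
```

Calling this once per script run keeps the file at a header plus at most five rows, with the oldest row dropped on the sixth and later runs.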
I'm very new to Python. So far I have written the code below, which allows me to search for text files in a folder, read all the lines from each, open an Excel file, and save the read lines in it. (I'm still unsure whether this does it for all the text files one by one.)
Having run this, I only see one file's text data being read and saved into the Excel file (first column). Or it could be that it is overwriting the data from multiple text files into the same column until it finishes.
Could anyone point me in the right direction on how to get it to write the stripped data to the next available column in Excel for each text file?
import os
import glob

list_of_files = glob.glob('./*.txt')
for fileName in list_of_files:
    fin = open(fileName, "r")
    data_list = fin.readlines()
    fin.close()  # closes file
    del data_list[0:17]
    del data_list[1:27]  # [*:*]
    fout = open("stripD.xls", "w")
    fout.writelines(data_list)
    fout.flush()
    fout.close()
This can be condensed into:
import glob

list_of_files = glob.glob('./*.txt')
with open("stripD.xls", "w") as fout:
    for fileName in list_of_files:
        data_list = open(fileName, "r").readlines()
        fout.write(data_list[17])
        fout.writelines(data_list[44:])
Are you aware that writelines() doesn't introduce newlines? readlines() keeps the newlines when reading, so there are newlines present in the elements of data_list written to the file by writelines(), but the latter doesn't introduce newlines itself.
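A quick throwaway demonstration (demo.txt is a made-up name):

```python
# writelines() writes the items back to back, with no separators added:
with open('demo.txt', 'w') as f:
    f.writelines(['a', 'b', 'c'])      # file now contains "abc"

# the newlines must already be in the items themselves:
with open('demo.txt', 'w') as f:
    f.writelines(['a\n', 'b\n', 'c\n'])  # file now contains "a\nb\nc\n"
```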
You may like to check this, and for simple needs also the csv module.
These lines are "interesting":
del data_list[0:17]
del data_list[1:27] # [*:*]
You are deleting as many of the first 17 lines of your input file as exist, keeping the 18th (if it exists), deleting another 26 (if they exist), and keeping any following lines. This is a very unusual procedure, and is not mentioned at all in your description of what you are trying to do.
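Concretely (a throwaway sketch with 60 numbered stand-in lines):

```python
data_list = [f'line{i}' for i in range(60)]  # stand-in for readlines() output

del data_list[0:17]   # drops line0..line16; line17 is now at index 0
del data_list[1:27]   # keeps line17, then drops line18..line43

print(data_list[:3])  # ['line17', 'line44', 'line45']
```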
Secondly, you are writing the output lines (if any) from each input file to the same output file. At the end of the script, the output file will contain data from only the last input file. Don't change your code to use append mode ... opening and closing the same file all the time just to append records is very wasteful, and is only justified if you have a real need to make sure that the data is flushed to disk in case of a power or other failure. Open your output file once, before you start reading files, and close it once when you have finished with all the input files.
Thirdly, any old arbitrary text file doesn't become an "excel file" just because you have named it "something.xls". You should write it with the csv module and name it "something.csv". If you want more control over how Excel will interpret it, write an xls file using xlwt.
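A minimal csv sketch of that (the file name and column names here are made up):

```python
import csv

# made-up rows standing in for whatever you extract from the text files
rows = [['ARI201803290', 'COL', 'ARI'],
        ['ARI201803300', 'COL', 'ARI']]

with open('something.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['id', 'visteam', 'hometeam'])  # header row
    writer.writerows(rows)                          # one file row per data row
```

Excel opens a .csv like this directly, and the csv module takes care of quoting any values that contain commas.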
Fourthly, you mention "column" several times, but as you have not given any details about how your input lines are to be split into "columns", it is rather difficult to guess what you mean by "next available column". It is even possible to suspect that you are confusing columns and rows ... assuming fewer than 43 lines in each input file, the 18th ROW of the last input file will be all you will see in the output file.