Parsing and saving the rows of an Excel file using pandas - Python

I have an Excel file with a column that contains some text in each row.
I'm using pandas pd.read_excel() to open this .xlsx file. I would then like to save every row of this column as a distinct .txt file (containing the text from that row). Is this possible with pandas?

The basic idea would be to iterate over the rows, opening a file for each one and writing the value in, something like:
import pandas as pd

df = pd.read_excel('test.xlsx')
for i, value in enumerate(df['column']):
    # one .txt file per row, named after the row's position
    with open(f'row-{i}.txt', 'w') as fd:
        fd.write(value)

Related

I would like to put a text file in existing excel file on specific cell using python

I would like to put a text file (a calculation log, not in tabular form) into an existing Excel file at a specific cell.
import pandas as pd
log = open('path')
writer = pd.ExcelWriter('existing excel file', engine='xlsxwriter')
log.to_excel(writer, sheet_name='Sheet1', startcol=57, startrow=1)
writer.save()
I insert other dataframes this way. However, this txt file cannot be made into a dataframe.
I don't want the txt file to go into one cell; I want it spread across cells, like when I select the whole thing and do Ctrl+C / Ctrl+V.
What should I do?
You can use pandas to do that too. First, get your log with a context manager
with open('example text.txt', 'r') as file:
    log = file.read()
Get your excel file as a dataframe with no headers (column names will be the column numbers)
df = pd.read_excel('example excel.xlsx', header=None)
Put your text where you need it, using the row number and column number
df.at[row_number, column_number] = log
Then, if you need to, rewrite the excel file (pass header=False too, since the dataframe was read with header=None and you don't want the numeric column labels written out as an extra row)
df.to_excel('example excel.xlsx', index=False, header=False)
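If you instead want each line of the log in its own cell (closer to the copy-and-paste behaviour the question describes), here is a minimal sketch along the same lines; the file names and the start position are placeholders:
import pandas as pd

# read the log and split it into individual lines
with open('example text.txt', 'r') as file:
    lines = file.read().splitlines()

# load the workbook without treating the first row as a header
df = pd.read_excel('example excel.xlsx', header=None)

# write one log line per row, starting from a chosen cell
start_row, start_col = 1, 5  # placeholder position
for offset, line in enumerate(lines):
    df.loc[start_row + offset, start_col] = line

df.to_excel('example excel.xlsx', index=False, header=False)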

Delete rows in CSV file after being read by pandas

So I want to have 1 script writing continually to a CSV file, and another script reading periodically from that same CSV file.
What I'm looking for is a way to delete the rows I've just read in from the CSV file (not from my pandas dataframe).
Can anybody help?
# Read data into a dataframe
deviceInfo = pd.read_csv("sampleData.csv", nrows=100)
# Somehow delete those 100 rows from the CSV file
@JoseAngelSanchez is correct that you might want to read the whole csv into a dataframe, but I think this way lets you get a dataframe with the first 100 rows and still delete them from the csv file.
import pandas as pd
df = pd.read_csv("sampleData.csv")
deviceInfo = df.iloc[:100]
df.iloc[100:].to_csv("sampleData.csv")
Note: if you're doing this repeatedly, you'll probably want to call to_csv(..., index=False), or a new index column will be added to the .csv file on each iteration.
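Putting that together, a minimal sketch of the read-and-delete step you could run on each pass (the file name and batch size are placeholders):
import pandas as pd

def pop_rows(path, n=100):
    # read the first n rows and remove them from the CSV file
    df = pd.read_csv(path)
    batch = df.iloc[:n]
    # rewrite the file with the remaining rows; index=False avoids
    # adding an extra index column on every iteration
    df.iloc[n:].to_csv(path, index=False)
    return batch

deviceInfo = pop_rows("sampleData.csv", n=100)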
You should read the whole document and then delete the rows you don't want
import pandas as pd
df = pd.read_csv("sampleData.csv")
df = df.iloc[100:]
df.to_csv("sampleData.csv")

a way to update csv file without rewriting

import pandas as pd
import csv, sched, nltk, arrow
from time import perf_counter

def ngram_search(name):
    # code here
    item['length'] = len(nltk.word_tokenize(s))  # number
    item['time edit'] = arrow.utcnow().shift(hours=-1).humanize().upper()  # text
    item['top1'] = "word"
    return item

path = r"C:\Users\Sublime Text\products.csv"
df = pd.read_csv(path)
saved_column = df['Name of Product']
for i in saved_column:
    name = i.replace(",", "+").replace(";", "+")
    item = ngram_search(name)
    # update csv file here: add the columns of item to the csv file
I am trying to add to an existing csv file (15 columns, 283,216 rows); the function returns a dictionary of 3 values.
Is there a way to directly update the csv file without rewriting it?
DataFrame.to_csv() has the option mode="a" to append to an existing file.
You just have to write the same number of columns, in the same order (and with the same separator, etc.), to keep the CSV correctly formatted.
This way you can only append to the end of the file. If you need to write between existing rows, or add a new column, then you have to read everything, or read from one file and write all the data (with your changes) to a new one.
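As a concrete illustration of the mode="a" option (the file name and column names below are placeholders):
import pandas as pd

# rows to append; the columns must match the existing file's columns
# (same names, same order) for the CSV to stay well formed
new_rows = pd.DataFrame(
    [["value1", "value2", "value3"]],
    columns=["col1", "col2", "col3"],  # placeholder column names
)

# mode="a" appends to the end of the file; header=False avoids repeating
# the header row and index=False avoids adding an index column
new_rows.to_csv("existing.csv", mode="a", header=False, index=False)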

How to convert data from txt files to Excel files using python

I have a text file that contains data like this. It is just a small example, but the real one is pretty similar.
I am wondering how to display such data in an "Excel Table" like this using Python?
The pandas library is wonderful for reading csv files (which is the file content in the image you linked). You can read in a csv or a txt file using the pandas library and output this to excel in 3 simple lines.
import pandas as pd
df = pd.read_csv('input.csv') # if your file is comma separated
or, if your file is tab-delimited ('\t'):
df = pd.read_csv('input.csv', sep='\t')
To save to excel file add the following:
df.to_excel('output.xlsx', 'Sheet1')
complete code:
import pandas as pd
df = pd.read_csv('input.csv') # can replace with df = pd.read_table('input.txt') for '\t'
df.to_excel('output.xlsx', 'Sheet1')
This will keep the index by default, so if your input file was:
A,B,C
1,2,3
4,5,6
7,8,9
Your output excel would look like this:
   A  B  C
0  1  2  3
1  4  5  6
2  7  8  9
You can see your data has been shifted one column and your index axis has been kept. If you do not want this index column (because you have not assigned your df an index so it has the arbitrary one provided by pandas):
df.to_excel('output.xlsx', 'Sheet1', index=False)
Your output will look like:
A  B  C
1  2  3
4  5  6
7  8  9
Here you can see the index has been dropped from the excel file.
You do not need python! Just rename your text file to CSV and voila, you get your desired output :)
If you want to do the rename in Python, you can use the os.rename function:
os.rename(src, dst)
where src is the source file and dst is the destination file.
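For example, a minimal sketch (the file names are placeholders):
import os

# rename the delimited text file so it carries a .csv extension
os.rename('input.txt', 'input.csv')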
XLWT
I use the XLWT library. It produces native Excel files, which is much better than simply importing text files as CSV files. It is a bit of work, but provides most key Excel features, including setting column widths, cell colors, cell formatting, etc.
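A minimal xlwt sketch along those lines (the sheet name, widths and formatting string are just examples):
import xlwt

wb = xlwt.Workbook()
ws = wb.add_sheet('Sheet1')

# a simple style: bold header text on a coloured background
header_style = xlwt.easyxf('font: bold on; pattern: pattern solid, fore_colour yellow')

for col, name in enumerate(['A', 'B', 'C']):
    ws.write(0, col, name, header_style)
    ws.col(col).width = 256 * 12  # column width is in 1/256ths of a character

# a row of sample data under the headers
for col, value in enumerate([1, 2, 3]):
    ws.write(1, col, value)

wb.save('output.xls')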
Saving a dataframe to an Excel file is simply:
df.to_excel("testfile.xlsx")

How to read data from excel from a particular column in python

I have an Excel sheet and I am reading it using pandas in Python.
Now I want to read the Excel file based on a column: if the column has some value, do not read that row; if the column is empty, read it and store the values in a list.
Here is a screenshot
Excel Example
Now, in the image above, when the uniqueidentifier is Yes it should not read that row, but if it is empty it should start reading from that row.
How can I do that using Python, and how can I get the index, so that after performing some function I can write back to that blank unique identifier column to say that the row has been read?
This is possible for csv files. There you could do
import pandas as pd

iter_csv = pd.read_csv('file.csv', iterator=True, chunksize=100000)
df = pd.concat([chunk[chunk['UniqueIdentifier'] == 'True'] for chunk in iter_csv])
But pd.read_excel does not offer to return an iterator object; maybe some other Excel readers can, but I don't know which ones. Nevertheless, you could export your Excel file as csv and use the solution for csv files.
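Alternatively, if the file is small enough to read in one go, a minimal sketch of the filter-and-mark-as-read workflow the question describes (the file name and the 'SomeColumn' data column are assumptions):
import pandas as pd

df = pd.read_excel('file.xlsx')

# rows whose UniqueIdentifier cell is empty have not been read yet
unread = df['UniqueIdentifier'].isna()
values = df.loc[unread, 'SomeColumn'].tolist()  # store the values in a list

# ... process the values ...

# mark those rows as read and write the sheet back out
df.loc[unread, 'UniqueIdentifier'] = 'Yes'
df.to_excel('file.xlsx', index=False)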
