import pandas as pd
import csv, sched, nltk, arrow
from nltk.tokenize import word_tokenize
from time import perf_counter

def ngram_search(s):
    item = {}
    # ngram code here
    item['length'] = len(word_tokenize(s))  # number
    item['time edit'] = arrow.utcnow().shift(hours=-1).humanize().upper()  # text
    item['top1'] = "word"
    return item
path=r"C:\Users\Sublime Text\products.csv"
df = pd.read_csv(path)
saved_column = df['Name of Product']
for i in saved_column:
name=i.replace(",","+").replace(";","+")
item=ngram_search(name)
#update csv file here add the columns of items to the csv file
I am trying to add columns to an existing csv file (15 columns, 283216 rows); the function returns a dictionary of 3 values.
Is there a way to directly update the csv file without rewriting it?
DataFrame.to_csv() has the option mode="a" to append to an existing file.
You only have to write the same number of columns, in the same order (and with the same separator, etc.), to keep the CSV correctly formatted.
This way you can only append to the end of the file. If you need to write between existing rows or add a new column, then you have to read everything and rewrite it, or read from one file and write all the data (with the changes) to a new one.
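For example, appending one new row to an existing file might look like this (the file name and column names here are placeholders, not the ones from the question):

import pandas as pd

# a single new row with the same columns, in the same order, as the existing file
new_row = pd.DataFrame([["value1", "value2", "value3"]],
                       columns=["col1", "col2", "col3"])

# mode="a" appends to the file; header=False avoids repeating the header row
new_row.to_csv("existing.csv", mode="a", header=False, index=False)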
I am currently doing one of my final assignments, and I have a CSV file with a few columns of different data.
I am currently interested in extracting a single column and converting each of its rows into a txt file.
Here is my code:
import pandas as pd
import csv
df = pd.read_csv("AUS_NZ.csv")
print(df.head(10))
print(df["content"])
num_of_review = len(df["content"])
print(num_of_review)
for i in range(num_of_review):
    with open("{}.txt".format(i), "a", encoding="utf-8") as f:
        f.write(df["content"][i])
There is no issue with extracting the individual rows. But when I examine the txt files that were extracted and look at their content, I notice that the text was copied out (which is what I want) but twice (which is not what I want).
Example:
"This is an example of what the dataframe have at that particular column which I want to convert to a txt file."
This is what was copied to the txt file:
"This is an example of what the dataframe have at that particular column which I want to convert to a txt file.This is an example of what the dataframe have at that particular column which I want to convert to a txt file."
Any advice on how to copy the content only once?
Thanks! While thinking about how to rectify this, I came to the same conclusion as you. I switched from "a" to "w" and that solved the issue: in append mode, a second run adds the same text to txt files that already exist, whereas write mode overwrites them.
I'm too used to append, so I tried that before I tried write.
The correct code:
import pandas as pd
import csv
df = pd.read_csv("AUS_NZ.csv")
print(df.head(10))
print(df["content"])
num_of_review = len(df["content"])
print(num_of_review)
for i in range(num_of_review):
    with open("{}.txt".format(i), "w", encoding="utf-8") as f:
        f.write(df["content"][i])
So I want to have 1 script writing continually to a CSV file, and another script reading periodically from that same CSV file.
What I'm looking for is a way to delete the rows I've just read in from the CSV file (not from my pandas dataframe).
Can anybody help?
# Read data in to dataframe
deviceInfo = pd.read_csv("sampleData.csv", nrows = 100)
# Somehow delete those 100 rows from the CSV file
@JoseAngelSanchez is correct that you might want to read the whole csv into a dataframe, but I think this way lets you get a dataframe with the first 100 rows and still delete them from the csv file.
import pandas as pd
df = pd.read_csv("sampleData.csv")
deviceInfo = df.iloc[:100]
df.iloc[100:].to_csv("sampleData.csv")
Note: if you're doing this repeatedly, then you'll probably want to write to_csv(..., index=False), or a new index column will be created in the .csv file on each iteration.
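For instance, a repeated read-and-trim pass might look like this (just a sketch, reusing the file name and batch size from the question):

import pandas as pd

BATCH = 100

while True:
    df = pd.read_csv("sampleData.csv")
    if df.empty:
        break
    deviceInfo = df.iloc[:BATCH]
    # ... process deviceInfo here ...
    # rewrite the file without the rows just read; index=False keeps an extra
    # index column from being added on every pass
    df.iloc[BATCH:].to_csv("sampleData.csv", index=False)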
You should read the whole document and then delete the rows you don't want
import pandas as pd
df = pd.read_csv("sampleData.csv")
df = df.iloc[100:]
df.to_csv("sampleData.csv")
I have an Excel file with a column that contains some text in each row.
I'm using pandas' pd.read_excel() to open this .xlsx file. Then I would like to do the following: save every row of this column as a distinct .txt file (containing the text from that row). Is it possible to do this via pandas?
The basic idea would be to use an iterator to loop over the rows, opening each file and writing the value in, something like:
import pandas as pd

df = pd.read_excel('test.xlsx')
for i, value in enumerate(df['column']):
    with open(f'row-{i}.txt', 'w') as fd:
        fd.write(value)
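If some cells in that column could be empty (an assumption, not something stated in the question), a small guard keeps the loop from failing on non-string values:

import pandas as pd

df = pd.read_excel('test.xlsx')
for i, value in enumerate(df['column']):
    if pd.isna(value):  # skip empty cells instead of writing "nan"
        continue
    with open(f'row-{i}.txt', 'w') as fd:
        fd.write(str(value))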
I am pretty new to Python in general, but am trying to make a script that takes data from certain files in a folder and puts it into an Excel spreadsheet.
The code I have will find the file type that I want in my specified folder, and then make a list with the full file paths.
import os

file_paths = []
for folder, subs, files in os.walk('C://Users/Dir'):
    for filename in files:
        if filename.endswith(".log") or filename.endswith(".txt"):
            file_paths.append(os.path.abspath(os.path.join(folder, filename)))
It will also take a specific file path, pull data from the correct column, and put it into excel in the correct cells.
import pandas as pd
import numpy

for i in range(len(file_paths)):
    fields = ['RDCR']
    data = pd.read_table(file_paths[i], sep=r"\s+", names=fields, usecols=[3])
Where I am having trouble is making read_table iterate through my list of files and put the data into an Excel sheet, moving over one column in the spreadsheet every time it reads a new file.
Ideally, the for loop would see how long the file_paths list is and use that as the range. It would then use file_paths[i] to pass the file names into read_table one by one.
What happens instead is that it finds the length of file_paths, but rather than iterating through the files one by one, it only keeps the data from the last file on the list.
Any help would be much appreciated! Thank you!
Try concatenating all of them at once and writing to Excel a single time.
from glob import glob
import pandas as pd
files = glob('C://Users/Dir/*.log') + glob('C://Users/Dir/*.txt')
def read_file(f):
    fields = ['RDCR']
    return pd.read_table(
        f, sep=r"\s+",
        names=fields, usecols=[3])

df = pd.concat([read_file(f) for f in files], axis=1)
df.to_excel('out.xlsx')
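Note that every column read this way is named RDCR, so the concatenated sheet ends up with duplicate column headers. If it helps to tell them apart (an assumption about the desired layout), the source file names can be used as the column labels instead:

import os

frames = []
for f in files:
    col = read_file(f)
    # label the single column with the name of the file it came from
    col.columns = [os.path.basename(f)]
    frames.append(col)

pd.concat(frames, axis=1).to_excel('out.xlsx')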
I am a Python beginner and trying to solve this task:
I have multiple (125) .csv files (48 rows and 5 columns each), and I am trying to make a new file that will contain the first row and the last row (written as a single row) from every .csv file I have.
To get you started, here is how you can generate the list of files and open them using Pandas. This will generate a list of csv files from a directory, iterate over the list, and open each one as a Pandas DataFrame. Then it creates a new list of the first and last rows of each csv file. I am not sure how you want to create one row out of two, though, so hopefully this is a starting point for you.
import os
import pandas as pd

# get all .csv files in the current directory, or specify the directory in listdir
csv_files = [file for file in os.listdir(".") if file.endswith(".csv")]

# load each file as a dataframe
dataframes = {}
for x in range(len(csv_files)):
    dataframes[x] = pd.read_csv(csv_files[x])

# collect the first and last row from each dataframe (loaded csv)
rows = []
for item in dataframes:
    rows.append(dataframes[item].iloc[0])
    rows.append(dataframes[item].iloc[-1])
result_df = pd.DataFrame(rows)

# write to csv file
result_df.to_csv("resulting.csv")
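If the goal really is one row per file, one possibility (assuming the last row's values can simply be laid out after the first row's values) is to concatenate the two rows before building the result:

import os
import pandas as pd

csv_files = [file for file in os.listdir(".") if file.endswith(".csv")]

combined = []
for name in csv_files:
    df = pd.read_csv(name)
    # place the last row's values after the first row's values in one flat row
    combined.append(pd.concat([df.iloc[0], df.iloc[-1]], ignore_index=True))

pd.DataFrame(combined).to_csv("resulting.csv", index=False)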