Python & Pandas: How to address NaN values in a loop?

With Python and Pandas I'm seeking to take values from CSV cells and write them as txt files via a loop. The structure of the CSV file is:
user_id, text, text_number
0, test text A, text_0
1,
2,
3,
4,
5, test text B, text_1
The script below successfully writes a txt file for the first row - it is named text_0.txt and contains test text A.
import pandas as pd
df = pd.read_csv("test.csv", sep=",")
for index in range(len(df)):
    with open(df["text_number"][index] + '.txt', 'w') as output:
        output.write(df["text"][index])
However, I receive an error when it proceeds to the next row:
TypeError: write() argument must be str, not float
I'm guessing the error is generated when it encounters values it reads as NaN. I attempted to add the dropna feature per the pandas documentation like so:
import pandas as pd
df = pd.read_csv("test.csv", sep=",")
df2 = df.dropna(axis=0, how='any')
for index in range(len(df)):
    with open(df2["text_number"][index] + '.txt', 'w') as output:
        output.write(df2["text"][index])
However, the same issue persists - a txt file is created for the first row, but a new error message is returned for the next row: KeyError: 1.
Any suggestions? All assistance greatly appreciated.

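The issue here is that range(len(df)) produces labels that are not necessarily in the data frame's index. dropna removes rows but keeps the original index labels, so df2 has no row labelled 1, which is what raises KeyError: 1. For your use case, you can just iterate through the rows of the data frame and write to the file. Note that NaN is truthy, so test for missing values with pd.notna rather than a plain if.
for t in df.itertuples():
    if pd.notna(t.text_number) and pd.notna(t.text):  # skip rows with missing values
        with open(t.text_number + '.txt', 'w') as output:
            output.write(str(t.text))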
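As a rough, self-contained sketch of the same idea, the rows with missing values can also be dropped first and the remaining rows iterated directly, so index labels never come into play. The sample frame below only mimics the CSV from the question, and the temporary output directory is an assumption made so the example can run anywhere.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Sample data shaped like the CSV in the question (rows 1 and 2 are incomplete).
df = pd.DataFrame({
    "user_id": [0, 1, 2, 5],
    "text": ["test text A", None, None, "test text B"],
    "text_number": ["text_0", None, None, "text_1"],
})

# Assumption: write into a throwaway directory instead of the working directory.
out_dir = Path(tempfile.mkdtemp())

# Keep only rows where both the file name and the text are present.
complete = df.dropna(subset=["text", "text_number"])

written = []
for row in complete.itertuples(index=False):
    path = out_dir / f"{row.text_number}.txt"
    path.write_text(str(row.text))
    written.append(path.name)

print(written)
```

Iterating over the rows themselves, rather than over range(len(df)), is what avoids the KeyError after dropna.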

Related

In Pandas, how can I extract certain value using the key off of a dataframe imported from a csv file?

Using Pandas, I'm trying to extract value using the key but I keep failing to do so. Could you help me with this?
There's a csv file like below:
value
"{""id"":""1234"",""currency"":""USD""}"
"{""id"":""5678"",""currency"":""EUR""}"
I imported this file in Pandas and made a DataFrame out of it:
(image: dataframe from a csv file)
However, when I tried to extract the value using a key (e.g. df["id"]), I'm facing an error message.
I'd like to see a value 1234 or 5678 using df["id"]. Which step should I take to get it done? This may be a very basic question but I need your help. Thanks.

The csv file isn't being read in the way you expect.
The only header in the file is value, and each line below it is a single quoted field; the doubled quotes ("") are CSV escaping for a literal quote character. The pandas dataframe therefore has a single column, value, whose cells each hold a whole string such as {"id":"1234","currency":"USD"}. So, the file doesn't have a column id, and you can't select data by id.
The data aren't laid out as a table with column titles and rows of values. One option for reading this data is to process each line manually, though there may be slicker options.
file = 'test.dat'
id_vals = []
currency = []
with open(file, 'r') as f:
    for line in f.readlines()[1:]:  # skip the header line
        ## remove obfuscating characters
        for c in '"{}\n':
            line = line.replace(c, '')
        line = line.split(',')
        ## extract values to two lists
        id_vals.append(line[0][3:])   # drop the leading 'id:'
        currency.append(line[1][9:])  # drop the leading 'currency:'

You just need to clean up the CSV file a little and you are good. Here is every step:
import pandas as pd
# open your csv and read as a text string
with open('My_CSV.csv', 'r') as f:
    my_csv_text = f.read()
# remove problematic strings (plain string replacement, no regex needed)
find_str = ['{', '}', '"', 'id:', 'currency:', 'value']
replace_str = ''
for i in find_str:
    my_csv_text = my_csv_text.replace(i, replace_str)
# Create new csv file and save cleaned text
new_csv_path = './my_new_csv.csv' # or whatever path and name you want
with open(new_csv_path, 'w') as f:
    f.write(my_csv_text)
# Create pandas dataframe
df = pd.read_csv(new_csv_path, sep=',', names=['ID', 'Currency'])
print(df)
Output df:
     ID Currency
0  1234      USD
1  5678      EUR

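You need to parse the value in each row of your dataframe using json.loads(). eval() would also work on this data, but it executes whatever the string contains, so avoid it for files you don't control.
Something like this:
import json
for row in df.itertuples():
    print(json.loads(row.value)["id"])
    # OR, for trusted input only:
    # print(eval(row.value)["id"])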
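The answers above could be combined into a short sketch that turns the single value column into real id and currency columns, so that a lookup like parsed["id"] works. It is illustrative only: the CSV text is inlined with io.StringIO so the snippet runs on its own, and the name parsed is made up for this example.

```python
import io
import json

import pandas as pd

# The file contents from the question, inlined so the example is self-contained.
csv_text = (
    'value\n'
    '"{""id"":""1234"",""currency"":""USD""}"\n'
    '"{""id"":""5678"",""currency"":""EUR""}"\n'
)

df = pd.read_csv(io.StringIO(csv_text))

# Each cell is a JSON string; parse it into a dict, then build one column per key.
parsed = pd.DataFrame(df["value"].apply(json.loads).tolist())

print(parsed["id"].tolist())
```

Parsing the cells as JSON keeps the keys and values intact, so nothing depends on character offsets or on stripping braces and quotes by hand.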

Python, Pandas. create a text file from a specific column

I am trying to create a text file from a column in a pandas dataframe. There are repeating values and I'd like each value to only be copied once. I also do not want the row value in the text file.
I have tried creating a dictionary:
stocks = dict(enumerate(df.tic.unique()))
then:
f = open("stocks","w")
f.write( str(stocks) )
f.close()
The text file output is all the names, but I'd like each to have their own line. Additionally, the row number is included which I need left out.

I don't know what your dataframe looks like, but here's an example.
import pandas as pd
d = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
df = pd.DataFrame(d)
df["col1"].to_csv(r'data.txt', header=False, index=False, sep='\t', mode='a')
Replace col1 in df["col1"] with the name of your column, and df with the name of your dataframe. If you want to remove duplicates, you can call df.drop_duplicates(...) with the settings you want before saving your dataframe to the text file. Note that mode='a' appends, so running the script twice adds the values to the file a second time.

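You're writing str(stocks), the text representation of the whole dictionary, to the file. That is why everything ends up on one line with the row numbers as keys. Write only the dictionary's values, one per line:
saveFile = open('stocks', 'w')
saveFile.write('\n'.join(stocks.values()))
saveFile.close()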

If you want to use unique(), you can try this:
with open("stocks.txt", "w") as f:
    f.write("\n".join(df.tic.unique()))
1) df.tic.unique() returns the unique values of the column tic
2) "\n".join(df.tic.unique()) concatenates the unique values into one long string, separated by newlines
3) f.write writes the string from 2) to the file
I have also changed "stocks" to "stocks.txt".
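For a version that can be run without the original data, here is a minimal sketch using drop_duplicates. The ticker values and the temporary file location are invented for illustration; only the column name tic comes from the question.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up sample data; the real dataframe would come from the asker's source.
df = pd.DataFrame({"tic": ["AAPL", "MSFT", "AAPL", "GOOG", "MSFT"]})

# Assumption: write to a throwaway location so the example has no side effects.
path = Path(tempfile.mkdtemp()) / "stocks.txt"

# drop_duplicates keeps the first occurrence of each value, in original order.
unique_tics = df["tic"].drop_duplicates()

# One value per line, without the row index.
path.write_text("\n".join(unique_tics.astype(str)) + "\n")

lines = path.read_text().splitlines()
print(lines)
```

The astype(str) call is there so the join also works if the column holds non-string values.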

How to process .dat file Dataframe with multiple columns in different rows?

I am trying to import data from .dat files.
The files have the following structure (and there are a few hundred for each measurement):
#-G8k5perc
#acf0
4e-07 1.67466
8e-07 1.57061
...
13.4217728 0.97419
&
#fit0
2.4e-06 1.5376
3.2e-06 1.5312
...
13.4 0.99578
&
...
#cnta0
#with g2
#cnta0
0 109.74
0.25 107.97
...
19.75 104.05
#rate0 107.2
I have tried:
1)
df = pd.read_csv("G8k5perc-1.dat")
which only gives one column.
Adding ,sep=' ', ,delimiter=' ' or ,delim_whitespace=True leads to
ParserError: Error tokenizing data. C error: Expected 1 fields in line 3, saw 2
2)
I have seen someone using:
from string import find, rfind, split, strip
Which raises the error: ImportError: cannot import name 'find' from 'string' for all four.
3)
Creating slices and changing them afterwards won't work either:
acf=df[1:179]
acf["#-G8k5perc"]= acf["#-G8k5perc"].str.split(" ", n = 1, expand = True)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
app.launch_new_instance()
Any Ideas on how to get two columns for each set of data (acf0, fit0, etc.) in the files?

read_csv cannot read this .dat format as it stands.
Try converting it to CSV first with the code below:
import csv
with open("./yourdata.dat") as dat:
    datContent = [line.strip().split() for line in dat]
with open("./yourdata.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(datContent)
Then try to use pandas to make new columns:
import pandas as pd
def your_func(row):
    return row['x-momentum'] / row['mass']
columns_to_keep = ['#time', 'x-momentum', 'mass']
dataframe = pd.read_csv("./yourdata.csv", usecols=columns_to_keep)
dataframe['new_column'] = dataframe.apply(your_func, axis=1)
print(dataframe)
Replace yourdata.dat with your input file name, and the column names with the ones in your own data.
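Because the file in the question holds several blocks rather than one table, another option is to split it into one dataframe per block. The following is only a sketch under assumptions read off the sample: a line starting with # names the block that follows, & ends a block, and data lines hold exactly two numbers. The function name read_blocks and the column names x and y are hypothetical.

```python
import io

import pandas as pd


def read_blocks(handle):
    """Return {block name: two-column DataFrame} for a '#name' / '&' delimited file."""
    blocks = {}
    name = None
    for raw in handle:
        line = raw.strip()
        if not line or line == "&":
            continue  # blank lines and block terminators carry no data
        if line.startswith("#"):
            # Header line: the first token after '#' names the following block.
            tokens = line[1:].split()
            name = tokens[0] if tokens else None
            continue
        fields = line.split()
        if name is None or len(fields) != 2:
            continue  # skip anything that is not a two-field data line
        try:
            row = (float(fields[0]), float(fields[1]))
        except ValueError:
            continue  # skip lines whose fields are not numeric
        blocks.setdefault(name, []).append(row)
    return {key: pd.DataFrame(rows, columns=["x", "y"]) for key, rows in blocks.items()}


# A shortened copy of the sample from the question.
sample = """#-G8k5perc
#acf0
4e-07 1.67466
8e-07 1.57061
&
#fit0
2.4e-06 1.5376
&
#cnta0
#with g2
#cnta0
0 109.74
0.25 107.97
#rate0 107.2
"""

blocks = read_blocks(io.StringIO(sample))
print(list(blocks))
```

For a real file, pass an open file object instead of the StringIO. Header lines with no data after them, such as #with g2 or #rate0 107.2, produce no dataframe in this sketch, so a value like the rate would need separate handling.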

Convert this list of lists in CSV

I am a novice in Python, and after several searches about how to convert my list of lists into a CSV file, I didn't find how to correct my issue.
Here is my code :
#!C:\Python27\read_and_convert_txt.py
import csv
if __name__ == '__main__':
    with open('c:/python27/mytxt.txt',"r") as t:
        lines = t.readlines()
        list = [ line.split() for line in lines ]
    with open('c:/python27/myfile.csv','w') as f:
        writer = csv.writer(f)
        for sublist in list:
            writer.writerow(sublist)
The first open() will create a list of lists from the txt file like
list = [["hello","world"], ["my","name","is","bob"], .... , ["good","morning"]]
then the second part will write the list of lists into a csv file but only in the first column.
What I need is from this list of lists to write it into a csv file like this :
Column 1, Column 2, Column 3, Column 4 ......
hello world
my name is bob
good morning
To summarize, this is what I want to see when I open the csv file with the txtpad:
hello;world
my;name;is;bob
good;morning

Simply use a pandas dataframe:
import pandas as pd
df = pd.DataFrame(list)
df.to_csv('filename.csv')
By default, missing values are filled in with None. To replace None, use
df.fillna('', inplace=True)
So your final code should look like
import pandas as pd
df = pd.DataFrame(list)
df.fillna('', inplace=True)
df.to_csv('filename.csv')
Cheers!!!
Note: You should not use list as a variable name, because it shadows Python's built-in list type.

I do not know if this is what you want:
import csv
list = [["hello","world"], ["my","name","is","bob"], ["good","morning"]]
with open("d:/test.csv", "w", newline="") as f:
    writer = csv.writer(f, delimiter=";")
    writer.writerows(list)
Gives as output file:
hello;world
my;name;is;bob
good;morning
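If every line should have the same number of columns, the shorter rows can be padded with empty strings before writing. This is a small sketch of that variation, writing to an in-memory buffer instead of a file so it is self-contained; the names rows, width and padded are made up here.

```python
import csv
import io

rows = [["hello", "world"], ["my", "name", "is", "bob"], ["good", "morning"]]

# Pad each row with empty strings up to the length of the longest row.
width = max(len(row) for row in rows)
padded = [row + [""] * (width - len(row)) for row in rows]

# Write to an in-memory buffer; a real script would open a file with newline="".
buffer = io.StringIO()
csv.writer(buffer, delimiter=";").writerows(padded)

lines = buffer.getvalue().splitlines()
print(lines)
```

The rows variable is named rows rather than list so that the built-in list type stays usable.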

Python Pandas performing operation on each row of CSV file

I have a CSV file with 1 million lines. I want to call a lookup function on the first column of each row and append its result as a new column in the same CSV (if possible).
What I want is something like this:
for each row in dataframe
    string=row[1]
    result=lookupFunction(string)
    row.append(result)
I know I could do it using Python's csv library by opening my CSV, reading each row, doing my operation, and writing the results to a new CSV.
This is my code using Python's csv library:
with open(rawfile, 'r') as f:
    with open(newFile, 'a') as csvfile:
        csvwritter = csv.writer(csvfile, delimiter=' ')
        for line in f:
            #do operation
However, I really want to do it with Pandas because it would be something new to me.
This is what my data looks like
77,#oshkosh # tannersville pa,,PA,US
82,#osithesakcom ca,,CA,US
88,#osp open records or,,OR,US
89,#ospbco tel ord in,,IN,US
98,#ospwmnwithn return in,,IN,US
99,#ospwmnwithn tel ord in,,IN,US
100,#osram sylvania inc ma,,MA,US
106,#osteria giotto montclair nj,,NJ,US
Any help and guidance will be appreciated. Thanks.

Here is a simple example of adding two columns from your csv file into a new column:
import pandas as pd
df = pd.read_csv("yourpath/yourfile.csv")
df['newcol'] = df['col1'] + df['col2']

Create df and csv:
import pandas as pd
df = pd.DataFrame(dict(A=[1, 2], B=[3, 4]))
df.to_csv('test_add_column.csv')
Read the csv into dfromcsv:
dfromcsv = pd.read_csv('test_add_column.csv', index_col=0)
Create the new column:
dfromcsv['C'] = dfromcsv['A'] * dfromcsv['B']
dfromcsv
Write the csv:
dfromcsv.to_csv('test_add_column.csv')
Read it again:
dfromcsv2 = pd.read_csv('test_add_column.csv', index_col=0)
dfromcsv2
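Applied to the data in the question, which has no header row, the same pattern might look like the sketch below. The lookup_function here is a hypothetical stand-in that just counts words; the real lookup would replace it. The data is inlined and the output goes to an in-memory buffer so the snippet runs on its own.

```python
import io

import pandas as pd

# A few rows from the question, inlined so the example is self-contained.
raw = io.StringIO(
    "77,#oshkosh # tannersville pa,,PA,US\n"
    "82,#osithesakcom ca,,CA,US\n"
    "88,#osp open records or,,OR,US\n"
)


def lookup_function(text):
    # Hypothetical placeholder for the real lookup: count the words in the string.
    return len(text.split())


# header=None because the file has no header row; columns are then numbered 0..4.
df = pd.read_csv(raw, header=None)

# Apply the lookup to every value of the text column and store it as a new column.
df["result"] = df[1].map(lookup_function)

# Write the rows back out without the index or header, as in the input.
out = io.StringIO()
df.to_csv(out, header=False, index=False)
first_line = out.getvalue().splitlines()[0]
print(first_line)
```

For a file with a million rows this still works in one pass. If memory becomes a concern, read_csv also accepts a chunksize argument so the file can be processed in pieces.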
