Python, Pandas: create a text file from a specific column

I am trying to create a text file from a column in a pandas dataframe. There are repeating values and I'd like each value to only be copied once. I also do not want the row value in the text file.
I have tried creating a dictionary:
stocks = dict(enumerate(df.tic.unique()))
then:
f = open("stocks","w")
f.write( str(stocks) )
f.close()
The text file output contains all the names, but I'd like each one on its own line. Additionally, the row number is included, which I need left out.

I don't know what your dataframe looks like but here's an example.
import pandas as pd

d = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
df = pd.DataFrame(d)
df["col1"].to_csv(r'data.txt', header=None, index=None, sep='\t', mode='a')
Replace col1 in df["col1"] with the name of your column, and df with the name of your dataframe, of course. If you want to remove duplicates, you can also simply use df.drop_duplicates(...) with the settings you want before saving your dataframe to the text file.
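For example, a minimal sketch of the drop_duplicates route (the column and file names are just the placeholders from above):
import pandas as pd

d = {'col1': [1, 2, 2, 3], 'col2': [4, 5, 5, 6]}
df = pd.DataFrame(d)
# drop repeated values in col1 first, then write them one per line with no index
df["col1"].drop_duplicates().to_csv(r'data.txt', header=None, index=None, sep='\t')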

You're writing the string 'stocks' to the file. You need to write the variable stocks.
saveFile = open('stocks', 'w')
saveFile.write(str(stocks))  # write() needs a string, so convert the dict first
saveFile.close()
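If you keep the dict built with enumerate() from the question, a small sketch that writes only the values, one per line, so the row numbers never reach the file:
# assumes stocks = dict(enumerate(df.tic.unique())) as in the question
with open("stocks.txt", "w") as f:
    for name in stocks.values():
        f.write(str(name) + "\n")  # values only; the enumerate keys (row numbers) are skipped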

If you want to use unique(), you can try this:
with open("stocks.txt", "w") as f:
f.write("\n".join(df.tic.unique()))
1. df.tic.unique() returns the unique values of the column tic.
2. "\n".join(df.tic.unique()) concatenates the unique values into one big string, separated by newlines.
3. f.write writes the string from step 2 to the file.
I have also changed the filename from "stocks" to "stocks.txt".
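One caveat, in case tic can hold non-string values (numbers or NaN): str.join only accepts strings, so you may need to convert first, e.g.:
with open("stocks.txt", "w") as f:
    f.write("\n".join(map(str, df.tic.unique())))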

Related

In Pandas, how can I extract a certain value using a key from a dataframe imported from a csv file?

Using Pandas, I'm trying to extract a value using its key, but I keep failing to do so. Could you help me with this?
There's a csv file like below:
value
"{""id"":""1234"",""currency"":""USD""}"
"{""id"":""5678"",""currency"":""EUR""}"
I imported this file in Pandas and made a DataFrame out of it:
[screenshot: dataframe from the csv file]
However, when I try to extract a value using a key (e.g. df["id"]), I get an error message.
I'd like to get the value 1234 or 5678 using df["id"]. Which step should I take to get this done? This may be a very basic question, but I need your help. Thanks.
The csv file isn't being read in correctly.
You haven't set a delimiter; pandas can automatically detect a delimiter but hasn't done so in your case. See the read_csv documentation for more on this. Because of this, the pandas dataframe has a single column, value, whose cells each hold an entire line from your file - the first entry is "{""id"":""1234"",""currency"":""USD""}". So the file doesn't have a column id, and you can't select data by id.
The data aren't formatted as a pandas dataframe, with header titles and columns of data. One option is to manually process each row, though there may be slicker options (one is sketched after the code below).
file = 'test.dat'
f = open(file, 'r')
id_vals = []
currency = []
for line in f.readlines()[1:]:
    ## remove obfuscating characters
    for c in '"{}\n':
        line = line.replace(c, '')
    line = line.split(',')
    ## extract values to two lists
    id_vals.append(line[0][3:])
    currency.append(line[1][9:])
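As one of the "slicker options" mentioned above, a sketch that lets pandas read the single value column and then parses each cell as JSON (test.dat is just the file name used above):
import json
import pandas as pd

df = pd.read_csv('test.dat')                # one column, 'value', holding JSON strings
parsed = df['value'].apply(json.loads)      # a dict per row
id_vals = [d['id'] for d in parsed]         # ['1234', '5678']
currency = [d['currency'] for d in parsed]  # ['USD', 'EUR']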
You just need to clean up the CSV file a little and you are good. Here are all the steps:
import re
import pandas as pd

# open your csv and read it as a text string
with open('My_CSV.csv', 'r') as f:
    my_csv_text = f.read()

# remove problematic strings
find_str = ['{', '}', '"', 'id:', 'currency:', 'value']
replace_str = ''
for i in find_str:
    my_csv_text = re.sub(i, replace_str, my_csv_text)

# create a new csv file and save the cleaned text
new_csv_path = './my_new_csv.csv'  # or whatever path and name you want
with open(new_csv_path, 'w') as f:
    f.write(my_csv_text)

# create the pandas dataframe
df = pd.read_csv('my_new_csv.csv', sep=',', names=['ID', 'Currency'])
print(df)
Output df:
ID Currency
0 1234 USD
1 5678 EUR
You need to parse the value in each row of your dataframe using json.loads() or eval(),
something like this:
import json
for row in df.itertuples():  # itertuples yields one namedtuple per row
    print(json.loads(row.value)["id"])
    # OR
    print(eval(row.value)["id"])
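A more vectorized sketch, assuming df is the dataframe read from the csv with its single value column and a recent pandas version that has pd.json_normalize:
import json
import pandas as pd

expanded = pd.json_normalize(df["value"].apply(json.loads).tolist())
print(expanded["id"])        # 1234 and 5678
print(expanded["currency"])  # USD and EUR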

Appending a df to a new row of a csv file adds an empty row between the header and the added data

I want to append this single-row df
rndList = ["albert", "magnus", "calc", 2, 5, "drop"]
rndListDf = pd.DataFrame([rndList])
to a new row of this csv file,
first,second,third,fourth,fifth,sixth
to place each value under the corresponding column header
using this approach
rndListDf.to_csv('./rnd_data.csv', mode='a', header=False)
leaves an empty row between the header and the data in the csv file
How can I append the row without the empty row?
first,second,third,fourth,fifth,sixth
0,albert,magnus,calc,2,5,drop
I think you have empty lines after your header row, but you can try:
data = pd.read_csv('./rnd_data.csv')
rndListDf.rename(columns=dict(zip(rndListDf.columns, data.columns))) \
         .to_csv('./rnd_data.csv', index=False)
Content of your file after this operation:
first,second,third,fourth,fifth,sixth
albert,magnus,calc,2,5,drop
I tested this: neither your code nor pandas.to_csv appends an extra blank line; it comes from your original csv file. If you are trying to figure out how to add a header to your dataframe:
rndList = ["albert", "magnus", "calc", 2, 5, "drop"]
rndListDf = pd.DataFrame([rndList])
rndListDf.columns = 'first,second,third,fourth,fifth,sixth'.split(',')
rndListDf.to_csv('./rnd_data.csv', index=False)
Alternatively, you can first clean your csv as suggested by Corralien and continue doing what you are doing. However, I would suggest going with Corralien's solution.
# Cleanup
pd.read_csv('./rnd_data.csv').to_csv('rnd_data.csv', index=False)
# Your Code
rndList = ["albert", "magnus", "calc", 2, 5, "drop"]
rndListDf = pd.DataFrame([rndList])
rndListDf.to_csv('./rnd_data.csv', mode='a', header=False)
# Result
first,second,third,fourth,fifth,sixth
albert,magnus,calc,2,5,drop

Python & Pandas: How to address NaN values in a loop?

With Python and Pandas I'm seeking to take values from CSV cells and write them as txt files via a loop. The structure of the CSV file is:
user_id, text, text_number
0, test text A, text_0
1,
2,
3,
4,
5, test text B, text_1
The script below successfully writes a txt file for the first row - it is named text_0.txt and contains test text A.
import pandas as pd

df = pd.read_csv("test.csv", sep=",")
for index in range(len(df)):
    with open(df["text_number"][index] + '.txt', 'w') as output:
        output.write(df["text"][index])
However, I receive an error when it proceeds to the next row:
TypeError: write() argument must be str, not float
I'm guessing the error is generated when it encounters values it reads as NaN. I attempted to add the dropna feature per the pandas documentation like so:
import pandas as pd

df = pd.read_csv("test.csv", sep=",")
df2 = df.dropna(axis=0, how='any')
for index in range(len(df)):
    with open(df2["text_number"][index] + '.txt', 'w') as output:
        output.write(df2["text"][index])
However, the same issue persists - a txt file is created for the first row, but a new error message is returned for the next row: KeyError: 1.
Any suggestions? All assistance greatly appreciated.
The issue here is that you are iterating over a range of positions that are not necessarily in the dataframe's index (after dropna, df2 no longer has a row at index 1, hence the KeyError). For your use case, you can just iterate through the rows of the dataframe and write to the file.
for t in df.itertuples():
    if pd.notna(t.text_number):  # skip rows where text_number is NaN (NaN is truthy, so a plain truth test won't filter it)
        with open(t.text_number + '.txt', 'w') as output:
            output.write(str(t.text))
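If you'd rather keep the dropna() route from the question, the key is to iterate over the filtered frame itself instead of range(len(df)); a sketch under that assumption (skipinitialspace handles the spaces after the commas in the header):
import pandas as pd

df = pd.read_csv("test.csv", sep=",", skipinitialspace=True)
df2 = df.dropna(subset=["text", "text_number"])  # drop the incomplete rows

for row in df2.itertuples():
    with open(row.text_number + ".txt", "w") as output:
        output.write(str(row.text))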

Pandas dataframe: losing part of the text data when saving with encoding='utf-8'

I'm trying to put a large list of words (in Russian, as unicode) into a dataframe column and save the resulting dataframe to a .csv file. I need to save the encoded text, but whenever I manually set encoding='utf-8', it cuts off part of my data and saves only the first 100 words or so.
I'm using Python 2.7.
(lists are quite large, so here I write only the first and the last elements)
a = [u'\u0441\u043e\u0432\u043c\u0435\u0449\u0430\u0442\u044c', ... , u'\u044d\u043d\u0435\u0440\u0433\u0438\u0438']
s = [u'\u0441\u043e\u0432\u043c\u0435\u0449\u0430\u0442\u044c', ... , u'\u043b\u0438\u0447\u043d\u043e\u0439']
d = {'col1': [0, 1], 'col2': [a, s]}
df = pd.DataFrame(data=d)
df.to_csv('test.csv', encoding='utf-8')
Appreciate any suggestions.
Use 'latin-1' instead of 'utf-8'

Convert this list of lists into CSV

I am a novice in Python, and after several searches about how to convert my list of lists into a CSV file, I haven't found how to fix my issue.
Here is my code:
#!C:\Python27\read_and_convert_txt.py
import csv

if __name__ == '__main__':
    with open('c:/python27/mytxt.txt', "r") as t:
        lines = t.readlines()
    list = [line.split() for line in lines]
    with open('c:/python27/myfile.csv', 'w') as f:
        writer = csv.writer(f)
        for sublist in list:
            writer.writerow(sublist)
The first open() will create a list of lists from the txt file like
list = [["hello","world"], ["my","name","is","bob"], .... , ["good","morning"]]
then the second part will write the list of lists into a csv file but only in the first column.
What I need is from this list of lists to write it into a csv file like this :
Column 1, Column 2, Column 3, Column 4 ......
hello world
my name is bob
good morning
To sum up, when I open the csv file in a text editor:
hello;world
my;name;is;bob
good;morning
Simply use a pandas DataFrame:
import pandas as pd
df = pd.DataFrame(list)
df.to_csv('filename.csv')
By default, missing values will be filled in with None; to replace None, use
df.fillna('', inplace=True)
So your final code should be like
import pandas as pd
df = pd.DataFrame(list)
df.fillna('', inplace=True)
df.to_csv('filename.csv')
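If you want the file to match the desired semicolon-separated output exactly, with no index column and no header row, you could also pass those options explicitly (this is just an assumption about the layout you want):
df.to_csv('filename.csv', sep=';', index=False, header=False)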
Cheers!!!
Note: you should not use list as a variable name, as it shadows the built-in list type in Python.
I do not know if this is what you want:
list = [["hello","world"], ["my","name","is","bob"] , ["good","morning"]]
with open("d:/test.csv","w") as f:
writer = csv.writer(f, delimiter=";")
writer.writerows(list)
Gives as output file:
hello;world
my;name;is;bob
good;morning
