I am a novice in Python, and after several searches about how to convert my list of lists into a CSV file, I didn't find how to correct my issue.
Here is my code :
#!C:\Python27\read_and_convert_txt.py
import csv
if __name__ == '__main__':
with open('c:/python27/mytxt.txt',"r") as t:
lines = t.readlines()
list = [ line.split() for line in lines ]
with open('c:/python27/myfile.csv','w') as f:
writer = csv.writer(f)
for sublist in list:
writer.writerow(sublist)
The first open() will create a list of lists from the txt file like
list = [["hello","world"], ["my","name","is","bob"], .... , ["good","morning"]]
then the second part will write the list of lists into a csv file but only in the first column.
What I need is from this list of lists to write it into a csv file like this :
Column 1, Column 2, Column 3, Column 4 ......
hello world
my name is bob
good morning
To resume when I open the csv file with the txtpad:
hello;world
my;name;is;bob
good;morning
Simply use pandas dataframe
import pandas as pd
df = pd.DataFrame(list)
df.to_csv('filename.csv')
By default missing values will be filled in with None to replace None use
df.fillna('', inplace=True)
So your final code should be like
import pandas as pd
df = pd.DataFrame(list)
df.fillna('', inplace=True)
df.to_csv('filename.csv')
Cheers!!!
Note: You should not use list as a variable name as it is a keyword in python.
I do not know if this is what you want:
list = [["hello","world"], ["my","name","is","bob"] , ["good","morning"]]
with open("d:/test.csv","w") as f:
writer = csv.writer(f, delimiter=";")
writer.writerows(list)
Gives as output file:
hello;world
my;name;is;bob
good;morning
Related
I am trying to create a text file from a column in a pandas dataframe. There are repeating values and I'd like each value to only be copied once. I also do not want the row value in the text file.
I have tried creating a dictionary:
stocks = dict(enumerate(df.tic.unique()))
then:
f = open("stocks","w")
f.write( str(stocks) )
f.close()
The text file output is all the names, but I'd like each to have their own line. Additionally, the row number is included which I need left out.
I don't know what your dataframe looks like but here's an example.
d = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
df = pd.DataFrame(d)
df["col1"].to_csv(r'data.txt', header=None, index=None, sep='\t', mode='a')
Replace the col1 in df["col1"] with the name of your column and df by the name of your dataframe of course. IF you want to remove duplicates you could also simply use df.drop_duplicate(...) with the settings you want before saving your dataframe in the text file.
You're writing the string 'stocks' to the file. You need to write the variable stocks.
saveFile = open('stocks', 'w')
saveFile.write(stocks)
saveFile.close()
If you want to use unique(), you can try this:
with open("stocks.txt", "w") as f:
f.write("\n".join(df.tic.unique()))
df.tic.unique() returns the unique values of the column tic
"\n".join(df.tic.unique()) concatenates the unique values into a giant string, separated by new-line
f.write writes the giant string from 2 to the file
I have also edited "stocks" to "stocks.txt".
In the below program i have created a workbook which contains a worksheet named sort
where i have placed words in one column and Numbers in another column
Now i have successfully outputed the .xlsxv file
But i need the numbers should be sorted from DESCENDING TO ASCENDING ORDER.
I don't know how to place the code for that.
Code
=====
import csv
import xlsxwriter
import re
workbook = xlsxwriter.Workbook('wordsandnumbers.xlsx')
worksheet = workbook.add_worksheet('sort')
with open('sort.csv') as f:
reader = csv.reader(f)
alist = list(reader)
worksheet.write(2,0,'words')
worksheet.write(2,1,'Numbers')
newlist = []
for values in alist:
convstr = str(values)
convstr = convstr.split(",")
newlist.extend(convstr)
a=3
for i in range(3,10):
newlist[a] = re.sub('[^a-zA-Z]','',newlist[a])
worksheet.write(i,0,newlist[a].strip('['))
a=a+1
newlist[a] = re.sub('[^0-9]','',newlist[a])
int(newlist[a])
worksheet.write(i,1,newlist[a])
a=a+1
workbook.close()
The Output i'm getting in .xlsx sheet is :
Needed output:
(The corresponding words which is in the same row of number should also be sorted)
I would recommend loading your original csv as a dataframe and then sorting it by a particular column. I've provided a fully reproducible example below that illustrates this.
I make my own version of sort.csv for demonstration purposes, then read it in as a dataframe using pandas.read_csv, and then sort using pandas.DataFrame.sort_values.
import pandas as pd
sort = open('sort.csv', 'w+')
sort.write('May, 5227\n')
sort.write('June, 417\n')
sort.write('Jan, 4\n')
sort.write('Feb, 424\n')
sort.write('Dec, 36\n')
sort.write('Mar, 4981\n')
sort.write('Apr, 3460\n')
sort.close()
df = pd.read_csv('sort.csv', names = ['words', 'Numbers'])
df = df.sort_values(['Numbers'], ascending=[False])
writer = pd.ExcelWriter('wordsandnumbers.xlsx', engine='xlsxwriter')
df.to_excel(writer, index=False, startrow=2)
writer.save()
Outputted sort.csv:
Outputted wordsandnumbers.xlsx:
Once you get the data into the array its straightforward to sort it and maintain order. You can just use the built in sort but give it a key which is the value you want the list sorted based on. See this.
import csv
import xlsxwriter
import re
workbook = xlsxwriter.Workbook('wordsandnumbers.xlsx')
worksheet = workbook.add_worksheet('sort')
with open('./sort.csv') as f:
reader = csv.reader(f)
alist = list(reader)
worksheet.write(2,0,'words')
worksheet.write(2,1,'Numbers')
#Here convert the number to an integer
newerlist = [[x[0], int(x[1])] for x in alist[1:]]
print(newerlist)
#key is the function applied to the arguments to get the answer and lambda
#is just a 1 line way to write a function f(x) which returns x[1] (the number in the rows)
newerlist.sort(key = lambda x : x[1], reverse = True)
a=3
for i in range(3,9):
for j in range(0,2):
worksheet.write(i,j,str(newerlist[i-a][j]))
workbook.close()
# Program to combine data from 2 csv file
The cdc_list gets updated after second call of read_csv
overall_list = []
def read_csv(filename):
file_read = open(filename,"r").read()
file_split = file_read.split("\n")
string_list = file_split[1:len(file_split)]
#final_list = []
for item in string_list:
int_fields = []
string_fields = item.split(",")
string_fields = [int(x) for x in string_fields]
int_fields.append(string_fields)
#final_list.append()
overall_list.append(int_fields)
return(overall_list)
cdc_list = read_csv("US_births_1994-2003_CDC_NCHS.csv")
print(len(cdc_list)) #3652
total_list = read_csv("US_births_2000-2014_SSA.csv")
print(len(total_list)) #9131
print(len(cdc_list)) #9131
I don't think the code you pasted explains the issue you've had, at least it's not anywhere I can determine. Seems like there's a lot of code you did not include in what you pasted above, that might be responsible.
However, if all you want to do is merge two csvs (assuming they both have the same columns), you can use Pandas' read_csv and Pandas DataFrame methods append and to_csv, to achieve this with 3 lines of code (not including imports):
import pandas as pd
# Read CSV file into a Pandas DataFrame object
df = pd.read_csv("first.csv")
# Read and append the 2nd CSV file to the same DataFrame object
df = df.append( pd.read_csv("second.csv") )
# Write merged DataFrame object (with both CSV's data) to file
df.to_csv("merged.csv")
I have a 1million line CSV file. I want to do call a lookup function on each row's 1'st column, and append its result as a new column in the same CSV (if possible).
What I want is this is something like this:
for each row in dataframe
string=row[1]
result=lookupFunction(string)
row.append[string]
I Know i could do it using python's CSV library by opening my CSV, read each row, do my operation, write results to a new CSV.
This is my code using Python's CSV library
with open(rawfile, 'r') as f:
with open(newFile, 'a') as csvfile:
csvwritter = csv.writer(csvfile, delimiter=' ')
for line in f:
#do operation
However I really want to do it with Pandas because it would be something new to me.
This is what my data looks like
77,#oshkosh # tannersville pa,,PA,US
82,#osithesakcom ca,,CA,US
88,#osp open records or,,OR,US
89,#ospbco tel ord in,,IN,US
98,#ospwmnwithn return in,,IN,US
99,#ospwmnwithn tel ord in,,IN,US
100,#osram sylvania inc ma,,MA,US
106,#osteria giotto montclair nj,,NJ,US
Any help and guidance will be appreciated it. THanks
here is a simple example of adding 2 columns to a new column from you csv file
import pandas as pd
df = pd.read_csv("yourpath/yourfile.csv")
df['newcol'] = df['col1'] + df['col2']
create df and csv
import pandas as pd
df = pd.DataFrame(dict(A=[1, 2], B=[3, 4]))
df.to_csv('test_add_column.csv')
read csv into dfromcsv
dfromcsv = pd.read_csv('test_add_column.csv', index_col=0)
create new column
dfromcsv['C'] = df['A'] * df['B']
dfromcsv
write csv
dfromcsv.to_csv('test_add_column.csv')
read it again
dfromcsv2 = pd.read_csv('test_add_column.csv', index_col=0)
dfromcsv2
I am dealing with a csv file that contains three columns and three rows containing numeric data. The csv data file simply looks like the following:
Colum1,Colum2,Colum3
1,2,3
1,2,3
1,2,3
My question is how to write a python code that take a single value of one of the column and perform a specific operation. For example, let say I want to take the first value in 'Colum1' and subtract it from the sum of all the values in the column.
Here is my attempt:
import csv
f = open('columns.csv')
rows = csv.DictReader(f)
value_of_single_row = 0.0
for i in rows:
value_of_single_Row += float(i) # trying to isolate a single value here!
print value_of_single_row - sum(float(r['Colum1']) for r in rows)
f.close()
Based on the code you provided, I suggest you take a look at the doc to see the preferred approach on how to read through a csv file. Take a look here:
How to use CsvReader
with that being said, you can modify the beginning of your code slightly to this:
import csv
with open('data.csv', 'rb') as f:
rows = csv.DictReader(f)
for row in rows:
# perform operation per row
From there you now have access to each row.
This should give you what you need to do proper row-by-row operations.
What I suggest you do is play around with printing out your rows to see what your data looks like. You will see that each row being outputted is a dictionary.
So if you were going through each row, you can just simply do something like this:
for row in rows:
row['Colum1'] # or row.get('Colum1')
# to do some math to add everything in Column1
s += float(row['Column1'])
So all of that will look like this:
import csv
s = 0
with open('data.csv', 'rb') as f:
rows = csv.DictReader(f)
for row in rows:
s += float(row['Colum1'])
You can do pretty much all of this with pandas
from pandas import DataFrame, read_csv
import matplotlib.pyplot as plt
import pandas as pd
import sys
import os
Location = r'path/test.csv'
df = pd.read_csv(Location, names=['Colum1','Colum2','Colum3'])
df = df[1:] #Remove the headers since they're unnecessary
print df
df.xs(1)['Colum1']=int(df.loc[1,'Colum1'])+5
print df
You can write back to your csv using df.to_csv('File path', index=False,header=True) Having headers=True will add the headers back in.
To do this more along the lines of what you have you can do it like this
import csv
Location = r'C:/Users/tnabrelsfo/Documents/Programs/Stack/test.csv'
data = []
with open(Location, 'r') as f:
for line in f:
data.append(line.replace('\n','').replace(' ','').split(','))
data = data[1:]
print data
data[1][1] = 5
print data
it will read in each row, cut out the column names, and then you can modify the values by index
So here is my simple solution using pandas library. Suppose we have sample.csv file
import pandas as pd
df = pd.read_csv('sample.csv') # df is now a DataFrame
df['Colum1'] = df['Colum1'] - df['Colum1'].sum() # here we replace the column by subtracting sum of value in the column
print df
df.to_csv('sample.csv', index=False) # save dataframe back to csv file
You can also use map function to do operation to one column, for example,
import pandas as pd
df = pd.read_csv('sample.csv')
col_sum = df['Colum1'].sum() # sum of the first column
df['Colum1'] = df['Colum1'].map(lambda x: x - col_sum)