Write list to specific column in csv - python

I'm trying to write the data from my list to just column 4
namelist = ['PEAR']
for name in namelist:
for man_year in yearlist:
for man_month in monthlist:
with open('{2}\{0}\{1}.csv'.format(man_year,man_month,name),'w') as filename:
writer = csv.writer(filename)
writer.writerow(name)
time.sleep(0.01)
it outputs to a csv like this
P E A R
4015854 234342 2442343 234242
How can I get it to go on just the 4th column?
PEAR
4015854 234342 2442343 234242

Replace the line writer.writerow(name) with,
writer.writerow(['', '', '', name])

When you pass the name to csvwriter it assumes the name as an iterable and write each character in a column.
So, for getting ride of this problem change the following line:
writer.writerow(name)
With:
writer.writerow([''] * (len(other_row)-1) + [name])
Here other_row can be one of the rest rows, but if you are sure about the length you can do something like:
writer.writerow([''] * (length-1) + [name])

Instead of writing '' to cells you don't want to touch, you could use df.at instead. For example, you could write df.at[index, ColumnName] = 10 which would change only the value of that specific cell.
You can read more about it here: Set value for particular cell in pandas DataFrame using index

Related

In Pandas, how can I extract certain value using the key off of a dataframe imported from a csv file?

Using Pandas, I'm trying to extract value using the key but I keep failing to do so. Could you help me with this?
There's a csv file like below:
value
"{""id"":""1234"",""currency"":""USD""}"
"{""id"":""5678"",""currency"":""EUR""}"
I imported this file in Pandas and made a DataFrame out of it:
dataframe from a csv file
However, when I tried to extract the value using a key (e.g. df["id"]), I'm facing an error message.
I'd like to see a value 1234 or 5678 using df["id"]. Which step should I take to get it done? This may be a very basic question but I need your help. Thanks.
The csv file isn't being read in correctly.
You haven't set a delimiter; pandas can automatically detect a delimiter but hasn't done so in your case. See the read_csv documentation for more on this. Because the , the pandas dataframe has a single column, value, which has entire lines from your file as individual cells - the first entry is "{""id"":""1234"",""currency"":""USD""}". So, the file doesn't have a column id, and you can't select data by id.
The data aren't formatted as a pandas df, with row titles and columns of data. One option is to read in this data is to manually process each row, though there may be slicker options.
file = 'test.dat'
f = open(file,'r')
id_vals = []
currency = []
for line in f.readlines()[1:]:
## remove obfuscating characters
for c in '"{}\n':
line = line.replace(c,'')
line = line.split(',')
## extract values to two lists
id_vals.append(line[0][3:])
currency.append(line[1][9:])
You just need to clean up the CSV file a little and you are good. Here is every step:
# open your csv and read as a text string
with open('My_CSV.csv', 'r') as f:
my_csv_text = f.read()
# remove problematic strings
find_str = ['{', '}', '"', 'id:', 'currency:','value']
replace_str = ''
for i in find_str:
my_csv_text = re.sub(i, replace_str, my_csv_text)
# Create new csv file and save cleaned text
new_csv_path = './my_new_csv.csv' # or whatever path and name you want
with open(new_csv_path, 'w') as f:
f.write(my_csv_text)
# Create pandas dataframe
df = pd.read_csv('my_new_csv.csv', sep=',', names=['ID', 'Currency'])
print(df)
Output df:
ID Currency
0 1234 USD
1 5678 EUR
You need to extract each row of your dataframe using json.loads() or eval()
something like this:
import json
for row in df.iteritems():
print(json.loads(row.value)["id"])
# OR
print(eval(row.value)["id"])

how to extract first part of name(first name) in a list that contains full names and discard names with one part

I have a CSV file that contains one column of names. what I want is a python code to check every name in the column and see if the name has more than one part, it takes just the first part and appends it in a new CSV file list while it skips any name that has just one part in the old CSV file.
For Example
input CSV file
Column1
Metarhizium robertsii ARSEF 23
Danio rerio
Parascaris equorum
Hevea
Gossypium
Vitis vinifera
The output CSV file should be
Column1
Metarhizium
Danio
Parascaris
Vitis
You can first create a flag for those values that have more than one word, then use the apply() method and write a lambda function to retrieve the first word in all names.
flag = df.loc[:,'Column1'].str.split(' ').apply(len) > 1
split_names = lambda name: name.split()[0] if (len(name.split())) else None
new_df = df.loc[flag,'Column1'].apply(split_names)
new_df.to_csv('output.csv', index=False)
You can split then apply the function len to mask the result, then get the first element of the filtered in rows.
import pandas as pd
df = pd.read_csv("input.csv")
splitted = df.Column1.apply(lambda x: x.split())
output = splitted[splitted.apply(len) > 1].apply(lambda x: x[0])
output.to_csv("output.csv")
# > ,Column1
# 0,Metarhizium
# 1,Danio
# 2,Parascaris
# 5,Vitis
Are the names always separated with a space?
You could use the re module in python and use regex expressions or if you looking for something simple you can also use the str.split() method in python:
for name in column:
split_name = name.split(' ', 1) #Splits the name once after the first space and returns a list of strings
if len(split_name) > 1: new_csv.write(split_name[0]) #write the first part of the split up name into the new csv

Loop Pandas DF to append new lines to txt

I have flat file (txt), in which I have Date column and Value column.
I am trying to append new lines to the txt in case my dataframe receives new lines, using a loop logic. I have this following code:
LastDate I am saying here its equal to 0 for simplicity reasons.
LastDate = 0
saveFileLine = name+'.txt.'
saveFile = open(saveFileLine,'a')
for index, row in namedf.iterrows():
if int(''.join(row['Date'].split('-')[:3])) > LastDate:
lineToWrite = row+'\n'
saveFile.write(lineToWrite)
saveFile.close()
and it gives me the error:
write() argument must be str, not Series
I dont know how to make it write the line of the loop currently active.
Hope you can help me out!
Thanks
Each row is a Pandas Series. You're treating the whole row as a string. Do you want to write the whole row?
for index, row in namedf.iterrows():
lineToWrite = row.to_string()
saveFile.write(lineToWrite)

Python pandas trouble with storing result in variable

I'm using pandas to handle some csv file, but i'm having trouble storing the results in a variable and printing it out as it is.
This is the code that I have.
df = pd.read_csv(MY_FILE.csv, index_col=False, header=0)
df2 = df[(df['Name'])]
# Trying to get the result of Name to the variable
n = df2['Name']
print(n)
And the result that i get:
1 jake
Name: Name, dtype: object
My Question:
Is it possible to just have "Jake" stored in a variable "n" so that i can call it out whenever i need it?
EG: Print (n)
Result: Jake
This is the code that I have constructed
def name_search():
list_to_open = input("Which list to open: ") + ".csv"
directory = "C:\Users\Jake Wong\PycharmProjects\box" "\\" + list_to_open
if os.path.isfile(directory):
# Search for NAME
Name_id = input("Name to search for: ")
df = pd.read_csv(directory, index_col=False, header=0)
df2 = df[(df['Name'] == Name_id)]
# Defining the name to save the file as
n = df2['Name'].ix[1]
print(n)
This is what is in the csv file
S/N,Name,Points,test1,test2,test3
s49,sing chun,5000,sc,90 sunrsie,4984365132
s49,Alice Suh,5000,jake,88 sunrsie,15641816
s1231,Alice Suhfds,5000,sw,54290 sunrsie,1561986153
s49,Jake Wong,5000,jake,88 sunrsie,15641816
The problem is that n = df2['Name'] is actually a Pandas Series:
type(df.loc[df.Name == 'Jake Wong'].Name)
pandas.core.series.Series
If you just want the value, you can use values[0] -- values is the underlying array behind the Pandas object, and in this case it's length 1, and you're just taking the first element.
n = df2['Name'].values[0]
Also your CSV is not formatted properly: It's not enough to have things lined up in columns like that, you need to have a consistent delimiter (a comma or a tab usually) between columns, so the parser can know when one column ends and another one starts. Can you fix your csv to look like this?:
S/n,Name,points
s56,Alice Suh,5000
s49,Jake Wong,5000
Otherwise we can work on another solution for you but we will probably use regex rather than pandas.

Checking for Regular Expressions within a CSV

I'm currently trying to run through my csv file and identify the rows in a column.
The output should be something like "This column contains alpha characters only".
My code currently:
Within a method I have:
print('\nREGULAR EXPRESSIONS\n' +
'----------------------------------')
for x in range(0, self.tot_col):
print('\n' + self.file_list[0][x] +
'\n--------------') # Prints the column name
for y in range(0, self.tot_rows + 1):
if regex.re_alpha(self.file_list[y][x]) is True:
true_count += 1
else:
false_count += 1
if true_count > false_count:
percentage = (true_count / self.tot_rows) * 100
print(str(percentage) + '% chance that this column is alpha only')
true_count = 0
false_count = 0
self.file_list is the csv file in list format.
self.tot_rows & self.tot_col are the total rows and total columns respectively which has been calculated earlier within the program.
regex.re_alpha has been imported from a file and the method looks like:
def re_alpha(column):
# Checks alpha characters
alpha_valid = alpha.match(column)
if alpha_valid:
return True
else:
return False
This currently works, however I am unable to add my other regex checks such as alpha, numeric etc
I have tried to duplicate the if statement with a different regex check but it doesn't work.
I've also tried to do the counts in the regex.py file however the count stops at '1' and returns the wrong information..
I thought creating a class in the regex.py file would help however no avail.
Summary:
I would like to run multiple regex checks against my csv file and have them ordered via columns.
Thanks in advance.
From the code above, the first line of the CSV contains the column names. This means you could make a dictionary to contain your result where the keys are the column names.
from csv import DictReader
reader = DictReader(open(filename)) # filename is the name of the CSV file
results = {}
for row in reader:
for col_name, value in row.items():
results.setdefault(col_name, []).append(regex.re_alpha(value))
Now you have a dictionary called 'results' which has the output from the regex checks stored by column name. You can then output statistics. Or you could save the rows as you read them in a list and once you decide on an order you can go back and output rows to a new CSV file by outputting the items in each dictionary using the keys in the new order.
csv_writer = csv.writer(open(output_filename, 'w'))
new_order = [list of key names in the right order]
for row in saved_data:
new_row = map(row.get, new_order)
csv_writer.writerow(new_row)
Admittedly this is a bit of a sketch but it should get you going.

Categories