I have the following code to generate a .csv file:
sfdc_dataframe.to_csv('sfdc_data_demo.csv',index=False,header=True)
It is just one column. How could I get the last value of the column and delete the trailing comma in that value?
Example image of input data:
https://i.stack.imgur.com/M5nVO.png
And the result I'm trying to produce:
https://i.stack.imgur.com/fEOXM.png
Anyone have an idea or tip?
Thanks!
After reading the csv file into a dataframe (the logic you shared), you can use the logic below, which replaces the value in the last row of your specific column. Note that .iat[-1] returns a plain string, so slice it directly rather than using the .str accessor (which only exists on a Series):
sfdc_dataframe['your_column_name'].iat[-1] = sfdc_dataframe['your_column_name'].iat[-1][:-1]
Updated answer below, since only the value of the last row needs to change.
val = sfdc_dataframe.iloc[-1, sfdc_dataframe.columns.get_loc('col')]
sfdc_dataframe.iloc[-1, sfdc_dataframe.columns.get_loc('col')] = val[:-1]
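Putting the two lines together, a minimal sketch with made-up data (the column name `col` and the sample values are placeholders for your actual data):

```python
import pandas as pd

# Hypothetical single-column data where every value ends with a comma
sfdc_dataframe = pd.DataFrame({"col": ["a@x.com,", "b@x.com,", "c@x.com,"]})

# Strip the trailing comma from the last row only
val = sfdc_dataframe.iloc[-1, sfdc_dataframe.columns.get_loc("col")]
sfdc_dataframe.iloc[-1, sfdc_dataframe.columns.get_loc("col")] = val[:-1]

print(sfdc_dataframe["col"].tolist())  # ['a@x.com,', 'b@x.com,', 'c@x.com']
sfdc_dataframe.to_csv("sfdc_data_demo.csv", index=False, header=True)
```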
Easy way (note: this trims the last character from every row in the column, not just the last row):
df = pd.read_csv("df_name.csv", dtype=object)
df['column_name'] = df['column_name'].str[:-1]
I have a large dataframe and am trying to add a leading (far left, 0th position) column for descriptive purposes. The dataframe and column which I'm trying to insert both have the same number of lines.
The column I'm inserting looks like this:
Description 1
Description 2
Description 3
.
.
.
Description n
The code I'm using to attach the column is:
df.insert(loc=0, column='description', value=columnToInsert)
The code I'm using to write to file is:
df.to_csv('output', sep='\t', header=None, index=None)
(Note: I've written to file with and without the "header=None" option, doesn't change my problem)
Now after writing to file, what I end up getting is:
Description 2 E11 ... E1n
Description 3 E21 ... E2n
.
.
.
Description n E(n-1)1... E(n-1)n
NaN En1 ... Enn
So the first element of my descriptive, leading column is deleted, all the descriptions are off by one, and the last row has "not a number" as its description.
I have no idea what I'm doing which might cause this, and I'm not really sure where to start in correcting it.
Figured it out. The issue stemmed from the fact that I had deleted a row from my large dataframe prior to inserting my descriptive column; this caused the indices to line up improperly.
So now I included the line:
df.reset_index(drop=True, inplace=True)
Everything lines up properly now and no elements are deleted!
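A minimal reproduction of the fix, assuming the earlier row was removed with `drop` (the column names and values here are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"E1": [10, 20, 30, 40]})
df = df.drop(1)  # drop a row; the index is now 0, 2, 3

# Without this reset, insert() aligns on the old index labels,
# shifting the descriptions and leaving a NaN in the last row
df.reset_index(drop=True, inplace=True)

columnToInsert = pd.Series(["Description 1", "Description 2", "Description 3"])
df.insert(loc=0, column="description", value=columnToInsert)

print(df["description"].tolist())  # no NaN, everything lines up
```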
I'm fairly new to Python and still learning the ropes, so I need help with a step by step program without using any functions. I understand how to count through an unknown column range and output the quantity. However, for this program, I'm trying to loop through a column, picking out unique numbers and counting its frequency.
So I have an excel file with random numbers down column A. I only put in 20 numbers but let's pretend the range is unknown. How would I go about extracting the unique numbers and inputting them into a separate column along with how many times they appeared in the list?
I'm not really sure how to go about this. :/
unique = 1
while xw.Range((unique, 1)).value != None:
    frequency = 0
    if unique != unique: break
    quantity += 1
"end"
I presume as you can't use functions this may be homework...so, high level:
You could first go through the column and then put all the values in a list?
Secondly take the first value from the list and go through the rest of the list - is it in there? If so then it is not unique. Now remove the value where you have found the duplicate from the list. Keep going if you find another remove that too.
Take the second value and so on?
You would just need list comprehension, some loops and perhaps .pop()
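The steps above can be sketched with plain loops and .pop(), no functions defined (the values here are hypothetical; in practice you would first read them out of column A into a list):

```python
# Values as they might appear down column A
values = [3, 1, 3, 2, 1, 3]

counts = []  # list of (value, frequency) pairs
remaining = list(values)
while remaining:
    first = remaining.pop(0)      # take the first value
    frequency = 1
    # remove every duplicate of it from the rest of the list
    while first in remaining:
        remaining.remove(first)
        frequency += 1
    counts.append((first, frequency))

print(counts)  # [(3, 3), (1, 2), (2, 1)]
```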
Using the pandas library would be the easiest way to do this. I created a sample excel sheet having only one column called "Random_num"
import pandas
data = pandas.read_excel("sample.xlsx", sheet_name = "Sheet1")
print(data.head()) # This would give you a sneak peek of your data
print(data['Random_num'].value_counts()) # This would solve the problem you asked for
# Make sure to pass your column name within the quotation marks
#eg: data['your_column'].value_counts()
Thanks
Suppose I have the following table:
(image of a table with Person and Age columns)
I am currently using this line to create a new column:
MyDataFrame['PersonAge'] = MyDataFrame.apply(lambda row: "({},{})".format(row['Person'],['Age']), axis=1)
My goal is to have a column consisting of something like: (John, 24.0)
After that line, when I call MyDataFrame.head(), this is the last column I see: (John, ['Age'])
This is true for all rows. For example, in the next row I have: (Myla, ['Age'])
Any idea what could be the issue? I copied the column name from my table hoping it was a typo of some sort, but I got the same result.
I would appreciate any help (or a new way to make a "pair" of the previous data)! :)
It seems like I forgot to put row before ['Age'].
The answer should be:
MyDataFrame['PersonAge'] = MyDataFrame.apply(lambda row: "({},{})".format(row['Person'],row['Age']), axis=1)
I checked the code and this one works great! :)
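Since you also asked for a new way to make a "pair": zipping the two columns builds real tuples instead of formatted strings, with no apply/lambda needed. A sketch with made-up rows:

```python
import pandas as pd

MyDataFrame = pd.DataFrame({"Person": ["John", "Myla"], "Age": [24.0, 31.0]})

# Pair up the two columns as (Person, Age) tuples
MyDataFrame["PersonAge"] = list(zip(MyDataFrame["Person"], MyDataFrame["Age"]))

print(MyDataFrame["PersonAge"].tolist())  # [('John', 24.0), ('Myla', 31.0)]
```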
I am unable to skip the second row of a data file while reading a csv file in python.
I am using the following code :
imdb_data = pd.read_csv('IMDB_data.csv', encoding = "ISO-8859-1",skiprows = 2)
Your code will omit the first two lines of your csv.
If you want the second line to be omitted (but the first one included), just make this minor change:
imdb_data = pd.read_csv('IMDB_data.csv', encoding = "ISO-8859-1", skiprows = [1])
Looking at the documentation we can learn that if you supply an integer n for skiprows, the first n rows are skipped. If you want to skip single lines explicitly by line number (0 indexed), you must supply a list-like argument.
In your specific case, that would be skiprows=[1].
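A quick illustration of the difference, using an in-memory CSV (the file contents here are invented for the demo):

```python
import pandas as pd
from io import StringIO

csv_text = "name,score\nAlice,1\nBob,2\nCarol,3\n"

# skiprows=2 would drop the first two physical lines, header included.
# skiprows=[1] keeps the header (line 0) and drops only the second line.
kept = pd.read_csv(StringIO(csv_text), skiprows=[1])
print(kept["name"].tolist())  # ['Bob', 'Carol']
```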
The question has already been answered. If one wants to skip a number of rows at once, one can do the following (using a range avoids needing to import numpy for np.arange):
df = pd.read_csv("transaction_activity.csv", skiprows=range(1, 13))
It will skip rows 1 through 12, i.e. the second through the thirteenth lines (the header is counted as row 0), while keeping your original columns in the dataframe.
Hope it helps for a similar problem.
I am going to start off by stating that I am very new to working in Python. I have a rudimentary knowledge of SQL, but this is my first go 'round with Python. I have a csv file of customer-related data and I need to output the records of customers who have spent more than $1000. I was also given this starter code:
import csv
import re
data = []
with open('customerData.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        data.append(row)

print(data[0])
print(data[1]["Name"])
print(data[2]["Spent Past 30 Days"])
I am not looking for anyone to give me the answer, but maybe nudge me in the right direction. I know that it has opened the file to read and created a list (data) and is ready to output the values of the first and second row. I am stuck trying to figure out how to call out the column value without limiting it to a specific row number. Do I need to make another list for columns? Do I need to create a loop to output each record that meets the > 1000 criteria? Any advice would be much appreciated.
To get a particular column you could use a for loop. I'm not sure exactly what you're wanting to do with it, but this might be a good place to start.
for i in range(0, len(data)):
    print(data[i]['Name'])
len(data) equals the number of rows, thus iterating through the entire column.
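Building on that loop, the filtering step could look like the sketch below. The column names follow the starter code, and the in-memory CSV stands in for your customerData.csv:

```python
import csv
from io import StringIO

# Stand-in for open('customerData.csv'); use the real file in practice
sample = "Name,Spent Past 30 Days\nErnie,890\nBert,1200\n"

big_spenders = []
reader = csv.DictReader(StringIO(sample))
for row in reader:
    # csv values are strings, so convert before comparing
    if float(row["Spent Past 30 Days"]) > 1000:
        big_spenders.append(row["Name"])

print(big_spenders)  # ['Bert']
```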
The sample code does not give away the secret of data structure. It looks like maybe a list of dicts. Which does not make much sense, so I'll guess how data is organized. Assuming data is a list of lists you can get at a column with a list comprehension:
data = [['Name','Spent Past 30 Days'],['Ernie',890],['Bert',1200]]
spent_column = [row[1] for row in data]
print(spent_column) # prints: ['Spent Past 30 Days', 890, 1200]
But you will probably want to know who is a big spender so maybe you should return the names:
data = [['Name','Spent Past 30 Days'],['Ernie',890],['Bert',1200]]
spent_names = [row[0] for row in data[1:] if int(row[1])>1000]
print(spent_names) # prints: ['Bert']
If the examples are unclear I suggest you read up on list comprehensions; they are awesome :)
You can do all of the above with regular for-loops as well.