I am new to pandas. When I print rows in a for loop, the output is rotated: the columns are displayed where the rows should be. How can I keep the output laid out the same way as in the file?
for i in range(1, 100):
    print(df.loc[i * 2])
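The rotation most likely comes from calling df.loc with a single label, which returns a Series (the same Series-versus-DataFrame distinction explained in a later answer here). A minimal sketch, assuming df has the default 0..n-1 integer index; the sample data is made up:

```python
import pandas as pd

# Hypothetical stand-in for the question's df (default RangeIndex assumed).
df = pd.DataFrame({"a": range(200), "b": range(200, 400)})

# A scalar label returns a Series: one "column  value" pair per line (the rotated look).
single_row = df.loc[2]

# A list of labels returns a one-row DataFrame: headers across the top, like the file.
for i in range(1, 100):
    print(df.loc[[i * 2]])

# Alternatively, select every second row in one step and print a single table
# (labels 2, 4, ..., 198, the same rows the loop visits).
every_other = df.loc[list(range(2, 200, 2))]
print(every_other)
```

Printing one combined table is usually easier to read than 99 separate one-row tables.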
I want to clear the Location cells of the first two duplicate rows for each last name.
For example, I want to clear the first two Location entries for Balchuinas and keep only the third one. The same goes for London and Fleck. I only want to clear the Location cells, not entire rows.
Any help?
I tried drop_duplicates(keep='last'), but that removes the whole row. I only want to clear the contents of the cells (or set them to NaN, if that's possible).
P.S. This is my first time asking a question, so I'm not sure how to paste the image without a link. Please help!
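Rather than removing the duplicate rows, I would suggest finding the duplicated values and replacing them with NaN while keeping the last occurrence. Note that df.duplicated() compares whole rows by default and df[mask] = ... overwrites every column, so restrict both to the relevant columns (the names 'Last Name' and 'Location' are assumed here; use the ones in your sheet).
Something like this:
import numpy as np
df.loc[df.duplicated(subset='Last Name', keep='last'), 'Location'] = np.nan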
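A self-contained sketch with made-up rows (only the surnames come from the question, and the column names "Last Name" and "Location" remain assumptions) showing that only the Location cells change and no rows are dropped:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names are assumed, not taken from the real sheet.
df = pd.DataFrame({
    "Last Name": ["Balchuinas", "Balchuinas", "Balchuinas", "London", "London", "Fleck"],
    "Location": ["A", "B", "C", "D", "E", "F"],
})

# True for every occurrence of a surname except its last one
mask = df.duplicated(subset="Last Name", keep="last")

# Blank out only the Location column on those rows; other columns are untouched
df.loc[mask, "Location"] = np.nan
print(df)
```

Here Balchuinas keeps only its third Location, London keeps its second, and Fleck (which appears once) is unchanged.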
I have multiple Excel files and need a specific value from each, but the cell holding the value shifts position slightly from file to file. However, the value is always preceded by a generic description that stays the same in every file.
I was wondering whether there is a way to tell Python to grab the value to the right of the cell containing the string "xxx".
Try iterating over the Excel files (I guess you loaded each one as a separate pandas DataFrame?),
something like for df in [dataframe1, dataframe2, ..., dataframeN].
Then you could pick the column you need (if the column stays constant), e.g. df['columnX'], and find the index of the row that holds the label:
df.index[df['columnX'] == "xxx"]. It may make sense to add .tolist() at the end, so that if "xxx" occurs more than once, you get all occurrences in a list.
The last step is to move one cell over from that position: for a value to the right, read the next column in the same row (df.columns.get_loc('columnX') + 1 gives its position); for a value directly below, take index + 1 in the same column, which assumes the default integer index.
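A minimal sketch of the same idea that searches the whole sheet instead of a single column, assuming each workbook is read with pd.read_excel(path, header=None) so that cells keep their grid positions. value_right_of is a hypothetical helper name:

```python
import pandas as pd


def value_right_of(df: pd.DataFrame, label: str):
    """Return the cell immediately to the right of the first cell equal to `label`
    (scanning row by row), or None if the label is absent or only in the last column."""
    # Boolean grid marking every cell that matches the label
    hits = df.eq(label).to_numpy()
    rows, cols = hits.nonzero()  # positions of matches, in row-major order
    for r, c in zip(rows, cols):
        if c + 1 < df.shape[1]:
            return df.iat[r, c + 1]
    return None


# Made-up sheet standing in for pd.read_excel(path, header=None)
sheet = pd.DataFrame([["Report", None, None],
                      [None, "xxx", 42],
                      ["notes", "", ""]])
print(value_right_of(sheet, "xxx"))
```

For several files, call it in a loop, e.g. [value_right_of(pd.read_excel(p, header=None), "xxx") for p in paths], where paths is your own list of file names.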
Hope it was helpful.
In general, I would strongly suggest being more specific in your questions and providing code or examples.
I have been using openpyxl for some data automation and am trying to add data to cells without overwriting their current contents. I need two ways of doing this: 1) take an input variable and append it to the end of the text already in the cell (e.g. 'Hello world. [appended data]'), and 2) take an input variable that is a number and add it to the number already in the cell (250 + [variable]).
You can read the cell's current value first and then add to it, subtract from it, or whatever you need. For example, if the cell holds 3 and you want to add 5 to whatever is in it, you read the 3, add 5, and write the result (8) back to the cell.
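A minimal read-modify-write sketch of both cases, assuming the numeric cell holds a plain number rather than a formula (openpyxl returns the formula text, not its result, unless the workbook is loaded with data_only=True). append_text and add_number are hypothetical helper names:

```python
from openpyxl import Workbook


def append_text(ws, coord, extra, sep=" "):
    """Append `extra` to the text already in cell `coord` instead of overwriting it."""
    current = ws[coord].value
    ws[coord] = extra if current is None else f"{current}{sep}{extra}"


def add_number(ws, coord, amount):
    """Add `amount` to the number already in cell `coord` (an empty cell counts as 0)."""
    current = ws[coord].value
    ws[coord] = (current or 0) + amount


# Demo on an in-memory workbook; with a real file you would use
# openpyxl.load_workbook(path), edit the sheet, then call wb.save(path).
wb = Workbook()
ws = wb.active
ws["A1"] = "Hello world."
ws["B1"] = 250

append_text(ws, "A1", "[appended data]")
add_number(ws, "B1", 5)
print(ws["A1"].value)  # Hello world. [appended data]
print(ws["B1"].value)  # 255
```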
Hopefully a fairly simple answer to my issue.
When I run the following code:
print(data_1.iloc[1])
I get a nice vertical presentation of the data, with each column header and its value on a separate row. This is very useful when comparing two sets of data to find discrepancies.
However, when I write the code as:
print(data_1.loc[data_1["Name"].isin(["John"])])
I get all the information spread across the screen, with the column headers in one row and the values in another.
My question is:
Is there a way to use the second line of code and still get the same vertical presentation?
The difference is that data_1.iloc[1] returns a pandas Series whereas data_1.loc[data_1["Name"].isin(["John"])] returns a DataFrame. Pandas has different representations for these two data types (i.e. they print differently).
The reason iloc[1] gives you a Series is that you indexed it with a scalar. If you use data_1.iloc[[1]], you get a DataFrame instead. By contrast, data_1["Name"].isin(["John"]) returns a boolean Series, and indexing with a boolean mask always returns a DataFrame, even when only one row matches. If you want a Series instead, you can take the first matching row:
print(data_1.loc[data_1["Name"].isin(["John"])].iloc[0])
but only if you're sure exactly one row matches: with several matches this silently shows only the first, and with none it raises an IndexError.
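A small sketch with made-up data (only the Name column comes from the question) contrasting the two return types. It also shows transposing with .T, one way to get the vertical layout when several rows match:

```python
import pandas as pd

# Hypothetical data; "Age" and "City" are invented columns for illustration.
data_1 = pd.DataFrame({
    "Name": ["Anna", "John", "Mary"],
    "Age": [31, 45, 28],
    "City": ["Oslo", "Leeds", "Rome"],
})

row_series = data_1.iloc[1]    # scalar position -> Series (prints vertically)
row_frame = data_1.iloc[[1]]   # list of positions -> one-row DataFrame (prints horizontally)

matches = data_1.loc[data_1["Name"].isin(["John"])]  # boolean mask -> DataFrame
first_match = matches.iloc[0]  # Series, same vertical layout as iloc[1]
print(first_match)

# Transposing puts the column headers down the left, one column per matching row
print(matches.T)
```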
I have a text file containing an array of numbers from which I want to plot certain columns vs other columns. I defined a column function so I can assign a name to each column and then plot them, as in this sample code:
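def column(matrix, i):
    # Parse field i (0-based) of every whitespace-separated line as a float
    return [float(row.split()[i]) for row in matrix]
Db = open('ResolutionEffects', 'r')  # file() exists only in Python 2
HIcontour = column(Db, 1)
Db.seek(0)  # rewind to the start; seek(1) jumps to byte 1 and drops the first character of line 1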
However when I display a column in my terminal to check that Python is indeed reading the right one, it appears that the first value of the column (as returned in my terminal) is actually the first value of the NEXT column in the text file. All the other numbers are from the correct column. There are no blank spaces or lines in the text file. As far as I can tell this offset happens to every column after the first one.
If anyone can tell why this is happening, or find a more robust way to read columns in text files I would greatly appreciate it.
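A minimal pure-Python alternative, assuming whitespace-separated numeric columns: read the file once into a list and pick columns from that list, so no seek() is needed between columns. read_columns is a hypothetical helper name:

```python
import io


def read_columns(lines, *indices):
    """Parse whitespace-separated numeric lines once and return the requested
    columns (0-based) as lists of floats."""
    rows = [line.split() for line in lines if line.strip()]  # skip blank lines
    return [[float(row[i]) for row in rows] for i in indices]


# Usage with a real file (name taken from the question):
# with open('ResolutionEffects') as f:
#     HIcontour, other = read_columns(f, 1, 2)

sample = io.StringIO("1 2.5 3.0\n4 5.5 6.0\n")
col1, col2 = read_columns(sample, 1, 2)
print(col1, col2)  # [2.5, 5.5] [3.0, 6.0]
```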
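Indeed, I found np.loadtxt to be a lot more robust. After converting my text file to a data file (.dat), I simply use this:
import numpy as np
import matplotlib.pyplot as plt
fig, ax1 = plt.subplots()
a = np.loadtxt('ResolutionEffects.dat', usecols=(0, 1, 11, 12))  # read only columns 0, 1, 11, 12
ax1.plot(a[:, 0], a[:, 1], 'dk', label='HI')  # black thin-diamond markers
ax1.plot(a[:, 2], a[:, 3], 'dr', label='CO')  # red thin-diamond markers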
No weird offsets or bugs anymore :) Thanks Ajean and jedwards!