I have a dataframe object from pandas and I wanted to know if there is any way that I can access a specific value from a specific column and change it.
from pandas import DataFrame as df
gameboard = df([['#','#',"#"],['#','#',"#"],['#','#',"#"]], columns = [1, 2, 3], index = [1,2,3])
print(gameboard)
For example, I want to change the second '#' in the second list.
In other words, if gameboard were a 2D list, how would I access the element at gameboard[1][1]?
I think you're looking for the .iloc indexer
(https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html).
To set that value you would write something like:
gameboard.iloc[1, 1] = 6
iloc selects by zero-based integer position, regardless of the row and column labels: the first 1 picks the second row, and the second 1 picks the second column within that row. Finally, you assign whatever new value you want that cell to hold.
Your output would be:
   1  2  3
1  #  #  #
2  #  6  #
3  #  #  #
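As a small sketch of the difference between position-based and label-based access: the .loc call and the "O" and "X" markers below are additions for illustration, not part of the original answer. Because this gameboard is labelled 1 to 3, position 1 and label 2 refer to the same row.

```python
import pandas as pd

gameboard = pd.DataFrame(
    [["#", "#", "#"] for _ in range(3)],
    columns=[1, 2, 3],
    index=[1, 2, 3],
)

# .iloc uses zero-based positions: second row, second column.
gameboard.iloc[1, 1] = "O"

# .loc uses the labels instead: row labelled 3, column labelled 1.
gameboard.loc[3, 1] = "X"

print(gameboard)
```

If the labels were the default 0, 1, 2, the two indexers would accept the same numbers, which is an easy way to get caught out once custom labels are introduced.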
Edit: updated using alollz's recommendation.
I have a column in a dataframe that contains a list in each row. My dataframe column is:
[],
['NORM'],
['NORM'],
['NORM'],
['NORM'],
['MI', 'STTC'],
As you can see, I have an empty list and also a list with two elements. How can I change the list with two elements so that it keeps just one of them (I don't care which one)?
I tried df.column.explode(), but this just adds more rows and I don't want more rows; I just need to keep one element.
Thank you so much
You can use Series.map with a custom function that keeps at most the first element of each list:
df['col'] = df['col'].map(lambda l: l[:1])
Result:
# print(df['col'])
0 []
1 [NORM]
2 [NORM]
3 [NORM]
4 [NORM]
5 [MI]
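A self-contained sketch of the same idea, assuming a column named 'col' (a placeholder name) that holds the lists from the question in shortened form. Slicing with l[:1] is safe on the empty list, and the row count stays the same:

```python
import pandas as pd

# 'col' is a placeholder column name; the values mirror the question's data.
df = pd.DataFrame({"col": [[], ["NORM"], ["NORM"], ["MI", "STTC"]]})
rows_before = len(df)

# l[:1] returns [] for an empty list and a one-element list otherwise.
df["col"] = df["col"].map(lambda l: l[:1])

print(df["col"].tolist())
print(len(df) == rows_before)
```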
If i, j is the location (row label, column label) of the cell you need to access, this prints the first element of the list stored there:
list_ = df.loc[i, j]
if len(list_) > 0:
    print(list_[0])
Since you store lists in a pandas column, I assume you are not worried about vectorization. So you could just use a list comprehension:
df[col] = [i[:1] for i in df[col]]
I have a dataframe with headers 'Category', 'Factor1', 'Factor2', 'Factor3', 'Factor4', 'UseFactorA', 'UseFactorB'.
The value of 'UseFactorA' and 'UseFactorB' are one of the strings ['Factor1', 'Factor2', 'Factor3', 'Factor4'], keyed based on the value in 'Category'.
I want to generate a column, 'Result', which equals dataframe[UseFactorA]/dataframe[UseFactorB]
Take the below dataframe as an example:
Category  Factor1  Factor2  Factor3  Factor4  UseFactorA  UseFactorB
A         1        2        5        8        'Factor1'   'Factor3'
B         2        7        4        2        'Factor3'   'Factor1'
The 'Result' series should be [0.2, 2] (1/5 for row A and 4/2 for row B).
However, I cannot figure out how to feed the values of UseFactorA and UseFactorB into an index to make this happen. If the columns to use were fixed, I would just write
df['Result'] = df['Factor1']/df['Factor2']
However, when I try
df['Result'] = df[df['UseFactorA']]/df[df['UseFactorB']]
I get the error
ValueError: Wrong number of items passed 3842, placement implies 1
Is there a method for doing what I am trying here?
Probably not the prettiest solution (because of the iterrows), but what comes to mind is to iterate over the rows and set the 'Result' value at each index, looking up both factors in that same row:
for i, factors in df[['UseFactorA', 'UseFactorB']].iterrows():
    df.loc[i, 'Result'] = df.loc[i, factors['UseFactorA']] / df.loc[i, factors['UseFactorB']]
Edit:
Another option:
def factor_calc_for_row(row):
    factorA = row['UseFactorA']
    factorB = row['UseFactorB']
    return row[factorA] / row[factorB]

df['Result'] = df.apply(factor_calc_for_row, axis=1)
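To check the row-wise approach against the example from the question, here is a minimal runnable sketch. The frame is rebuilt by hand from the two sample rows, using the column names UseFactorA and UseFactorB from the question's text:

```python
import pandas as pd

# The two sample rows from the question, rebuilt by hand.
df = pd.DataFrame({
    "Category": ["A", "B"],
    "Factor1": [1, 2],
    "Factor2": [2, 7],
    "Factor3": [5, 4],
    "Factor4": [8, 2],
    "UseFactorA": ["Factor1", "Factor3"],
    "UseFactorB": ["Factor3", "Factor1"],
})

def factor_calc_for_row(row):
    # Each row names the two columns whose values should be divided.
    return row[row["UseFactorA"]] / row[row["UseFactorB"]]

df["Result"] = df.apply(factor_calc_for_row, axis=1)
print(df["Result"].tolist())
```

Row A divides Factor1 by Factor3 (1/5) and row B divides Factor3 by Factor1 (4/2).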
Here's the one-liner (it assumes the default 0..n-1 integer index, since x is used as a row label):
df['Result'] = [df[df['UseFactorA'][x]][x]/df[df['UseFactorB'][x]][x] for x in range(len(df))]
How it works:
df[df['UseFactorA']]
returns a DataFrame,
df[df['UseFactorA'][x]]
returns a Series, and
df[df['UseFactorA'][x]][x]
pulls a single value from that Series.
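For larger frames, a vectorized alternative is possible. This is a different technique from the answers above: it uses NumPy integer-array indexing, and the names factor_cols, values, rows, col_a and col_b are made up for the illustration. A sketch, assuming every entry in UseFactorA and UseFactorB names one of the four factor columns:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Category": ["A", "B"],
    "Factor1": [1, 2],
    "Factor2": [2, 7],
    "Factor3": [5, 4],
    "Factor4": [8, 2],
    "UseFactorA": ["Factor1", "Factor3"],
    "UseFactorB": ["Factor3", "Factor1"],
})

factor_cols = ["Factor1", "Factor2", "Factor3", "Factor4"]
values = df[factor_cols].to_numpy()   # 2D array of factor values
rows = np.arange(len(df))             # row positions 0..n-1

# Translate the column names in each row into column positions.
col_a = pd.Index(factor_cols).get_indexer(df["UseFactorA"])
col_b = pd.Index(factor_cols).get_indexer(df["UseFactorB"])

# Pick one value per row for numerator and denominator, then divide.
df["Result"] = values[rows, col_a] / values[rows, col_b]
print(df["Result"].tolist())
```

get_indexer returns -1 for a name it cannot find, which would silently select the last column, so checking for -1 is worthwhile on real data.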
I'm using a pandas DataFrame populated from a CSV file, and then I use Bokeh to convert that DataFrame into a ColumnDataSource.
The code looks like this:
dataFrame = pandas.read_csv('somefile.CSV')
source = ColumnDataSource(dataFrame)
Now that I have all my columns, I want to do row-based calculations.
For example: I have three columns:
x, y, colour
it might be populated with:
1, 2, blue
2, 5, red
1, 8, yellow
Now I want to search through the source and, for each matching row, change another value in that same row. How can I do this?
# how do I step through the source dictionary?
if source['colour'] == 'blue':
    # how do I get the current index, which is the row number
    # how do I change the x column value at the index(row) we retrieved
    source['x' index] = 2
Thank you
If you are iterating through the data you can do it this way:
import pandas
from bokeh.models import ColumnDataSource

dataFrame = pandas.read_csv('somefile.csv')
source = ColumnDataSource(dataFrame)
# Walk the 'colour' column and update 'x' on every matching row.
for index, colour in enumerate(source.data['colour']):
    if colour == 'blue':
        source.data['x'][index] = 2
Alternatively, to avoid iterating through the entire ColumnDataSource you can get the index of the first value of 'blue' in the 'colour' column using this:
list(source.data['colour']).index('blue')
You can use this as the index for editing column x like this:
source.data['x'][list(source.data['colour']).index('blue')] = 2
Indexing the list this way only gives you the first index of the value 'blue'. If 'blue' occurs more than once in your ColumnDataSource and every associated 'x' value should be edited, you can loop, restarting the search just after the last index at which 'blue' was found (passing the start position to index keeps the result relative to the whole list rather than to a slice):
list(source.data['colour']).index('blue', last_index + 1)
The loop that this is in should be wrapped in a try-statement as index('blue') throws a ValueError when the list it is searching does not contain the value 'blue'.
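The repeated search can be sketched as follows. To keep the example runnable without Bokeh, a plain dict named data stands in for source.data (an assumption made for the illustration), and the fourth row is invented so that 'blue' occurs twice:

```python
# A plain dict standing in for source.data; the fourth row is made up.
data = {
    "x": [1, 2, 1, 4],
    "y": [2, 5, 8, 3],
    "colour": ["blue", "red", "yellow", "blue"],
}

colours = list(data["colour"])
start = 0
while True:
    try:
        # Search from 'start' onwards; the result is an index into the full list.
        index = colours.index("blue", start)
    except ValueError:
        # No further 'blue' entries, so the search is done.
        break
    data["x"][index] = 2
    start = index + 1

print(data["x"])
```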
Use a boolean mask on the pandas DataFrame, before converting it to a ColumnDataSource:
dataFrame.loc[dataFrame['colour'] == 'blue', 'x'] = 2
dataFrame['x'] is the Series you want to change; the condition selects only the rows for which it is true.
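A runnable sketch of the boolean-mask idea, applied to the DataFrame itself before it would be handed to ColumnDataSource. The frame is built by hand from the three sample rows in the question instead of being read from a CSV file:

```python
import pandas as pd

# The three sample rows from the question.
data_frame = pd.DataFrame({
    "x": [1, 2, 1],
    "y": [2, 5, 8],
    "colour": ["blue", "red", "yellow"],
})

# Select rows where colour is 'blue' and set their x value in one step.
data_frame.loc[data_frame["colour"] == "blue", "x"] = 2

print(data_frame)
```

Using .loc with the mask and the column name in a single call avoids chained assignment, which can end up modifying a temporary copy.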
I have a pandas DataFrame with a MultiIndex. The second level contains a year ([2014, 2015]) and the third contains the month number ([1, 2, ..., 12]). I would like to merge these two into a single level such as [1/2014, 2/2014, ..., 6/2015]. How could this be done?
I'm new to Pandas. Searched a lot but could not find any similar question/solution.
Edit: I found a way to avoid having to do this altogether with the answer to this question. I should have been creating my data frame that way. This seems to be the way to go for indexing by DateTime.
Consider the following pd.MultiIndex mux and pd.DataFrame df:
mux = pd.MultiIndex.from_product([list('ab'), [2014, 2015], range(1, 3)])
df = pd.DataFrame(dict(A=1), mux)
print(df)
          A
a 2014 1  1
       2  1
  2015 1  1
       2  1
b 2014 1  1
       2  1
  2015 1  1
       2  1
We want to assign to the index a list of lists that represents the index we want.
The first level stays the same:
df.index.get_level_values(0)
The new second level should be a string concatenation of the current second and third levels, in reverse order (month/year):
df.index.map('{0[2]}/{0[1]}'.format)
df.index = [df.index.get_level_values(0), df.index.map('{0[2]}/{0[1]}'.format)]
print(df)
          A
a 1/2014  1
  2/2014  1
  1/2015  1
  2/2015  1
b 1/2014  1
  2/2014  1
  1/2015  1
  2/2015  1
You can use a list comprehension to restructure your index. For example, if you have a three-level index and you want to combine the second and third levels:
lst = [(i, f'{k}/{j}') for i, j, k in df.index]
df.index = pd.MultiIndex.from_tuples(lst)
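Put together with a small three-level index (two letters, two years, months 1 and 2, as a stand-in for the real data), the list-comprehension approach can be sketched as:

```python
import pandas as pd

# Small stand-in index: letter, year, month.
mux = pd.MultiIndex.from_product([list("ab"), [2014, 2015], range(1, 3)])
df = pd.DataFrame({"A": 1}, index=mux)

# Unpack each (letter, year, month) tuple and rebuild it as (letter, "month/year").
lst = [(i, f"{k}/{j}") for i, j, k in df.index]
df.index = pd.MultiIndex.from_tuples(lst)

print(df)
```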
This is just an explanation of piRSquared's answer.
df.index.map('{0[2]}/{0[1]}'.format)
The map() method takes one argument, a callback that is executed on each element of the index. In this example, the callback is the format method of the string '{0[2]}/{0[1]}', i.e. Python's built-in str.format.
The format method is pretty mighty and has a lot of functionality (see the docs). One of its features is referring to positional arguments by their position. This means that
"Hello {1}, I am {0}, how are you?".format("Bob", "Alice")
--> Hello Alice, I am Bob, how are you?
That's where the zero in piRSquared's answer comes from.
Normally, the position is not required if only one argument is substituted into the string:
"Hello {}".format("Bob")
--> Hello Bob
However, in this case, two additional features are required:
using an element multiple times in the same string, and
selecting a sub-element from the argument.
Since the map method will pass a single index entry as argument to the format function, "{0[2]}" refers to the third element of that index.
Now the index in the original questions has three levels, so each argument passed to the format function is a tuple containing the three elements corresponding to the row's index.
A more verbose, but equivalent solution would be:
df.index.map(lambda idx: str(idx[2]) + '/' + str(idx[1]))
or
df.index.map(lambda idx: f'{idx[2]}/{idx[1]}')
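The behaviour of the format string can be checked without pandas at all. In this sketch the tuples are hand-written stand-ins for the index entries that map() would pass in:

```python
template = "{0[2]}/{0[1]}"

# One index entry is a tuple: (first level, year, month).
entry = ("a", 2014, 1)

# {0} is the whole tuple; [2] and [1] pick the month and the year from it.
print(template.format(entry))

# template.format is itself a callable, which is what map() receives.
formatter = template.format
labels = [formatter(e) for e in [("a", 2014, 1), ("b", 2015, 2)]]
print(labels)
```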
I have a dataframe that looks like this:
Sweep      Index
Sweep0001  0       -70.434570
           1       -67.626953
           2       -68.725586
           3       -70.556641
           4       -71.899414
           5       -69.946289
           6       -63.964844
           7       -73.974609
...
Sweep0039  79985   -63.964844
           79986   -66.406250
           79987   -67.993164
           79988   -68.237305
           79989   -66.894531
           79990   -71.411133
I want to slice out different ranges of Sweeps.
So for example, I want Sweep0001 : Sweep0003, Sweep0009 : Sweep0015, etc.
I know I can do this in separate lines with ix, i.e.:
df.ix['Sweep0001':'Sweep0003']
df.ix['Sweep0009':'Sweep0015']
And then put those back together into one dataframe (I'm doing this so I can average sweeps together, but I need to select some of them and remove others).
Is there a way to do that selection in one line though? I.e. without having to slice each piece separately, followed by recombining all of it into one dataframe.
Use pandas' IndexSlice:
import pandas as pd
idx = pd.IndexSlice
df.loc[idx[["Sweep0001", "Sweep0002", ..., "Sweep0003", "Sweep0009", ..., "Sweep0015"]]]
You can retrieve the labels you want this way:
list1 = df.index.get_level_values(0).unique()
list2 = list(list1)
# The positions below assume the labels start at Sweep0000; shift them if yours start at Sweep0001.
list3 = list2[1:4]          # for your Sweep0001:Sweep0003
list3.extend(list2[9:16])   # for your Sweep0009:Sweep0015
df.loc[idx[list3]]          # list3 is already a list, so it needs no extra "[]" around it
In case you want to also slice by columns you can use:
df.loc[idx[list3],:] #Same as above to include all columns.
df.loc[idx[list3],:"column label"] #Returns data up to that "column label".
More information on slicing is on the Pandas website (http://pandas.pydata.org/pandas-docs/stable/advanced.html#using-slicers) or in this similar Stackoverflow Q/A: Python Pandas slice multiindex by second level index (or any other level)
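A runnable sketch of selecting several sweep ranges in one step and then averaging them. The frame here is synthetic: 20 sweeps with 3 samples each, and a column named 'value' (all assumptions for the illustration). The labels are generated from their numbers, so no positional offsets are involved:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data: 20 sweeps, 3 samples per sweep.
index = pd.MultiIndex.from_product(
    [[f"Sweep{n:04d}" for n in range(1, 21)], range(3)],
    names=["Sweep", "Index"],
)
df = pd.DataFrame({"value": np.arange(len(index), dtype=float)}, index=index)

# Sweep0001..Sweep0003 and Sweep0009..Sweep0015 as one list of labels.
labels = [f"Sweep{n:04d}" for n in [*range(1, 4), *range(9, 16)]]

# A list of first-level labels selects every row belonging to those sweeps.
selected = df.loc[labels]

# Average the selected sweeps together, sample by sample.
averaged = selected.groupby(level="Index").mean()
print(averaged)
```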