I have a column like this:
[image of the column]
How do I get the individual values from the column?
The desired output is a list, e.g. [42008598,26472654,42054590,42774221,42444463], so the value(s) can be counted.
Let me give you a piece of advice: when you have some example code to show us, it would be great if you pasted it into code quotes like this; it is easier to read. Now to your question. You can select a row in a pandas dataframe like this:
import pandas as pd
print(df.iloc[i])
where i is the row number: 0, 1, 2,... and df is your dataframe. Here is the Documentation
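For a quick, self-contained sketch (the small dataframe below is made up purely for illustration):
import pandas as pd

# Toy dataframe, assumed just for this example
df = pd.DataFrame({"id": [42008598, 26472654, 42054590]})

# i = 1 selects the second row by position
print(df.iloc[1])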
I am also new to Stack Overflow. I hope this helps you.
What you need is to convert each row in the dataframe to an array and then do the operation that you want with that array. The way to do it with pandas is to declare a function that deals with each row, and then use apply to run the function on each row.
An example that counts how many elements are inside each row:
def treat_array(row):
    # Strip the curly braces from the string
    row = row.replace("{", "")
    row = row.replace("}", "")
    # Split on commas to get the individual values
    row = row.split(",")
    return len(row)

df["Elements Count"] = df["Name of Column with the Arrays"].apply(treat_array)
I have an embedded set of data given to me which needs to be converted to a pandas DataFrame:
"{'rows':{'data':[[{'column_name':'column','row_value':value}]]}"
It's just a snippet of what it looks like at the start. Everything inside data repeats over and over. i.e.
{'column_name':'name', 'row_value':value}
I want the values of column_name to be the column headings. And the values of row_value to be the values in each row.
I've tried a few different ways. I thought it would be something along the lines of
df = pd.DataFrame(data=[data_rows['row_value'] for data_rows in raw_data['rows']['data']], columns=['column_name'])
But I might be way off. I'm probably not stepping into the data correctly with raw_data['rows']['data'].
Any suggestions would be great.
You can try to add another loop in your list comprehension to get elements out:
df = pd.DataFrame(data=[data_row for data_rows in raw_data['rows']['data'] for data_row in data_rows])
print(df)
name value type
0 dynamic_tag_tracker null null
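As a self-contained sketch, assuming the nested structure implied by the snippet in the question (the values below are made up):
import pandas as pd

# Structure assumed from the snippet in the question
raw_data = {'rows': {'data': [[{'column_name': 'name', 'row_value': 'dynamic_tag_tracker'},
                               {'column_name': 'value', 'row_value': None}]]}}

# Flatten the doubly nested list of dicts into one list of records
flat = [data_row for data_rows in raw_data['rows']['data'] for data_row in data_rows]
print(pd.DataFrame(flat))

# If you then want the column_name values as headers and row_value as the row data:
wide = pd.DataFrame([{d['column_name']: d['row_value'] for d in data_rows}
                     for data_rows in raw_data['rows']['data']])
print(wide)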
I see a lot of questions related to dropping rows that have a certain value in a column, or dropping the entirety of columns, but pretend we have a Pandas Dataframe like the one below.
In this case, how could one write a line to go through the CSV, and drop all rows like 2 and 4? Thank you.
You could try
~((~df).all(axis=1))
to get a boolean mask of the rows you want to keep (assuming the dataframe holds booleans and you want to drop the all-False rows). To get the dataframe with just those rows, you would use
df = df[~((~df).all(axis=1))]
A more detailed explanation is here:
Delete rows from a pandas DataFrame based on a conditional expression involving len(string) giving KeyError
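A minimal sketch, assuming a boolean dataframe where rows 2 and 4 are entirely False:
import pandas as pd

# Assumed boolean dataframe; rows 2 and 4 are all False
df = pd.DataFrame({"a": [True, False, False, True, False],
                   "b": [True, True, False, False, False]})

# Keep only the rows with at least one True value
df = df[~((~df).all(axis=1))]
print(df.index.tolist())  # [0, 1, 3]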
This should help:
# Drop a row when every one of its columns is False
for i in df.index:
    value = df.shape[1]  # number of columns
    count = 0
    for column_name in df.columns:
        if df.loc[i, column_name] == False:
            count = count + 1
    if count == value:
        df.drop(index=i, inplace=True)
I have a pandas dataframe df.
There are 27 columns in df.
I want to read the 1st, 2nd, and 10th to the last columns of df. I can do this with df.iloc[:, [0, 1, 9, 10, 11, ..., 26]], but this is too tedious to type if the dataframe has many columns. What is a more elegant way to read the columns?
I am using python v3.7
If you like to select columns by their numerical index, iloc is the right thing to use. You can use np.arange to add a range of columns (such as from the 10th to the last one).
import pandas as pd
import numpy as np
cols = [0, 1]
cols.extend(np.arange(10, df.shape[1]))
df.iloc[:,cols]
Alternatively, you can use numpy's r_ slicing trick:
df.iloc[:,np.r_[0:2, 10:df.shape[1]]]
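For instance, on a hypothetical dataframe with 27 columns, both variants pick out the same set of columns:
import pandas as pd
import numpy as np

# Hypothetical 3 x 27 dataframe for illustration
df = pd.DataFrame(np.arange(3 * 27).reshape(3, 27))

cols = [0, 1]
cols.extend(np.arange(10, df.shape[1]))
a = df.iloc[:, cols]
b = df.iloc[:, np.r_[0:2, 10:df.shape[1]]]
print(a.equals(b))  # True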
You can use "list" and "range":
df.iloc[:,[0,1]+list(range(9,27))]
Or numpy way:
df.iloc[:,np.append([0,1],np.arange(9,27))]
If you know the column names, you can try:
df = df[['col1', 'col2', 'coln']]
If you don't know the exact column names, you can try this:
list_of_columns_index = [1,2,3, n]
df = df[[df.columns[i] for i in list_of_columns_index]]
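A small sketch showing both name-based and index-based selection (the dataframe and column names below are made up):
import pandas as pd

# Made-up dataframe for illustration
df = pd.DataFrame([[1, 2, 3, 4, 5]],
                  columns=['col1', 'col2', 'col3', 'col4', 'col5'])

# By name
print(df[['col1', 'col2', 'col5']])

# By positional index
list_of_columns_index = [0, 1, 4]
print(df[[df.columns[i] for i in list_of_columns_index]])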
Suppose you know the name of the starting column (the 10th column in your context); assume the name is starting_column_name. Using the column name makes the code more readable and saves you the trouble of counting columns to get to the right one.
import numpy as np

num_columns = df.shape[1]  # number of columns in the dataframe
starting_column = df.columns.get_loc(starting_column_name)
features = df.iloc[:, np.r_[0:2, starting_column:num_columns]]
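A hedged usage sketch (the dataframe and column names are invented; 'j' plays the role of starting_column_name):
import pandas as pd
import numpy as np

# Invented 1 x 12 dataframe for illustration
df = pd.DataFrame([list(range(12))], columns=list("abcdefghijkl"))

starting_column = df.columns.get_loc("j")
features = df.iloc[:, np.r_[0:2, starting_column:df.shape[1]]]
print(features.columns.tolist())  # ['a', 'b', 'j', 'k', 'l']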
I have a dataset that consists of tokenized, POS-tagged phrases as one column of a dataframe:
Current Dataframe
I want to create a new column in the dataframe, consisting only of the proper nouns in the previous column:
Desired Solution
Right now, I'm trying something like this for a single row:
if 'NNP' in df['Description_POS'][96][0:-1]:
    df['Proper Noun'] = df['Description_POS'][96]
But then I don't know how to loop this for each row, or how to obtain the tuple which contains the proper noun.
I'm very new right now and at a loss for what to use, so any help would be really appreciated!
Edit: I tried the solution recommended, and it seems to work, but there is an issue.
this was my dataframe:
Original dataframe
After implementing the recommended code
df['Proper Nouns'] = df['POS_Description'].apply(
lambda row: [i[0] for i in row if i[1] == 'NNP'])
it looks like this:
Dataframe after creating a proper nouns column
You can use the apply method, which, as the name suggests, will apply the given function to every row of the dataframe or series. This returns a series, which you can add as a new column to your dataframe.
df['Proper Nouns'] = df['POS_Description'].apply(
lambda row: [i[0] for i in row if i[1] == 'NNP'])
I am assuming the POS_Description column holds lists of tuples.
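A tiny self-contained sketch, with made-up rows of (token, tag) tuples:
import pandas as pd

# Made-up POS-tagged data: each cell is a list of (token, tag) tuples
df = pd.DataFrame({'POS_Description': [[('Alice', 'NNP'), ('runs', 'VBZ'), ('London', 'NNP')],
                                       [('the', 'DT'), ('dog', 'NN')]]})

df['Proper Nouns'] = df['POS_Description'].apply(
    lambda row: [i[0] for i in row if i[1] == 'NNP'])
print(df['Proper Nouns'].tolist())  # [['Alice', 'London'], []]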
My question is very similar to this one: Find unique values in a Pandas dataframe, irrespective of row or column location
I am very new to coding, so I apologize in advance for any cringing.
I have a .csv file which I open as a pandas dataframe, and would like to be able to return unique values across the entire dataframe, as well as all unique strings.
I have tried:
for row in df:
    pd.unique(df.values.ravel())
This fails to iterate through rows.
The following code prints what I want:
for index, row in df.iterrows():
    if isinstance(row, object):
        print('%s\n%s' % (index, row))
However, trying to place these values into a previously defined set (myset = set()) fails when I hit a blank column (NoneType error):
for index, row in df.iterrows():
    if isinstance(row, object):
        myset.update(print('%s\n%s' % (index, row)))
I get closest to what I want when I try the following:
for index, row in df.iterrows():
    if isinstance(row, object):
        myset.update('%s\n%s' % (index, row))
However, my set ends up containing individual characters rather than the strings/floats/values that appear on my screen when I print above.
Someone please help point out where I fail miserably at this task. Thanks!
I think the following should work for almost any dataframe. It will extract each value that is unique in the entire dataframe.
Post a comment if you encounter a problem; I'll try to solve it.
# Replace all Nones / NaNs with empty strings - so they won't bother us later
df = df.fillna('')

# Preparing a list
list_sets = []

# Iterate over all columns (much faster than iterating over rows)
for col in df.columns:
    # List containing all the unique values of this column
    this_set = list(set(df[col].values))
    # Creating a combined list
    list_sets = list_sets + this_set

# Making a set of the combined list
final_set = list(set(list_sets))

# For completion's sake, you can remove the empty string introduced by the fillna step
final_set.remove('')
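A quick usage sketch on a made-up dataframe with one missing value:
import pandas as pd

# Made-up dataframe with mixed values and one None
df = pd.DataFrame({'a': ['x', 'y', None], 'b': ['x', 'z', 'z']})

df = df.fillna('')
list_sets = []
for col in df.columns:
    list_sets = list_sets + list(set(df[col].values))
final_set = list(set(list_sets))
final_set.remove('')
print(sorted(final_set))  # ['x', 'y', 'z']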
Edit:
I think I know what happens. You must have some float columns, and fillna is failing on those, as the code I gave you was replacing missing values with an empty string. Try one of these:
df = df.fillna(np.nan)
or
df = df.fillna(0)
For the first option, you'll need to import numpy first (import numpy as np). It must already be installed, since you have pandas.