Creating a dataframe from a dictionary without the key being the index - python

I've got a basic dictionary that gives me a count of how many times data shows up. e.g. Adam: 10, Beth: 3, ... , Zack: 1
If I do df = pd.DataFrame([dataDict]).T then the keys from the dictionary become the index of the dataframe and I only have one true column of data. I've looked around but haven't found a way to avoid this, so any help would be appreciated.
Edit: More detail
The dictionary was formed from a count of another dataframe, e.g. dataDict = df1.Name.value_counts().to_dict()
This is my expected output.
|   | Name | Count |
|---|------|-------|
| 0 | Adam | 10    |
| 1 | Beth | 3     |
What I'm getting at the moment is this:
|      | Count |
|------|-------|
| Adam | 10    |
| Beth | 3     |

Try reset_index:
dataDict = dict(Adam=10, Beth=3, Zack=1)
df = pd.Series(dataDict).rename_axis('Name').reset_index(name='Count')
df
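
For what it's worth, here is a sketch of two other ways to end up with the same Name/Count layout; it assumes the dataDict and df1 names from the question and the edit above:

import pandas as pd

dataDict = dict(Adam=10, Beth=3, Zack=1)

# Build the two-column frame straight from the dict items
df = pd.DataFrame(list(dataDict.items()), columns=['Name', 'Count'])

# Or skip the intermediate dict entirely and reset the index of value_counts
# (assumes df1 has a Name column, as in the edit)
# df = df1.Name.value_counts().rename_axis('Name').reset_index(name='Count')
print(df)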

Related

Replace Multiple Values On Python or EXCEL

Seeking help. Hi guys, I haven't written any code yet because I think I need some ideas first on how to access the CSV and the rows. Technically, I want to replace the text with the id in the CSV file.
import pandas as pd
df = pd.read_csv('replace.csv')
print(df)
Please see the photo. There are three columns; I want to replace the values in column D: wherever a value in column D equals a name in column A, replace it with the id (column B). I'm looking for ideas on what the first step should be. Thanks.
In the photo:
name | id | Replace
james | 5 | James,James,Tom
tom | 2 | Tom,James,James
jerry | 10 | Tom,Tom,Tom
My expected result:
name | id | Replace
james | 5 | 5,5,2
tom | 2 | 2,5,5
jerry | 10 | 2,2,2
Excel 365:
As per my comment, if it's OK to get the data in a new column and you're on MS365, try:
Formula in E2:
=MAP(C2:C4,LAMBDA(x,TEXTJOIN(",",,XLOOKUP(TEXTSPLIT(x,","),A2:A4,B2:B4,"",0))))
Or, if all values will be present anyway:
=MAP(C2:C4,LAMBDA(x,TEXTJOIN(",",,VLOOKUP(TEXTSPLIT(x,","),A2:B4,2,0))))
Google-Sheets:
The Google-Sheets equivalent, as per your request, could be:
=MAP(C2:C4,LAMBDA(x,INDEX(TEXTJOIN(",",,VLOOKUP(SPLIT(x,","),A2:B4,2,0)))))
Python/Pandas:
After some trial and error I came up with:
import pandas as pd
df = pd.read_csv('replace.csv', sep=';')
df['Replace'] = df['Replace'].replace(pd.Series(dict(zip(df.name, df.id))).astype(str), regex=True)
print(df)
Prints:
name id Replace
0 James 5 5,5,2
1 Tom 2 2,5,5
2 Jerry 10 2,2,2
Note: I used the semicolon as the separator in the call that opens the CSV.
Nested =substitute functions would make this easy.
=substitute(substitute(substitute(d2, a2, b2),a3,b3),a4,b4)
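
If the regex-based replace ever becomes too greedy (for example, one name being a prefix of another), a token-by-token pandas sketch is another option; it assumes the same semicolon-separated replace.csv, that the Replace cells are comma-separated names, and that lower-casing both sides is acceptable given the mixed casing in the screenshot:

import pandas as pd

df = pd.read_csv('replace.csv', sep=';')

# Build a lowercase name -> id (as string) lookup so 'James' also matches 'james'
lookup = dict(zip(df['name'].str.lower(), df['id'].astype(str)))

# Split each Replace cell on commas, map every token through the lookup, re-join
df['Replace'] = df['Replace'].apply(
    lambda cell: ','.join(lookup.get(token.strip().lower(), token)
                          for token in cell.split(',')))
print(df)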

Need to assign the results of df.apply to two new columns [duplicate]

This question already has answers here:
Pandas Apply Function That returns two new columns
(6 answers)
Closed 1 year ago.
Hi all.
I have a function that returns two values. One is a list, the other is a double.
I want to use something like this to create two new columns in my df and use .apply to populate those columns on a row-by-row basis.
def f(a_list):
    # do some stuff to the list
    if stuff:
        make_new_stuff_happen
    # return results of stuff
    return new_list, a_double

def main():
    df['new_col1'], df['new_col2'] = df.apply(lambda x: f(x['some_col']))
Thanks for any help you can provide.
A few notes:
I think by double you mean a float in Python?
Even for examples, I'd name your function & vars something more meaningful, so it's easier to diagnose
Maybe this answer will help:
If this is the original dataframe you're working with:
col_1 | col_2 | col_3
-------------------------
1 | 3 | 3
2 | 3 | 4
3 | 1 | 1
You can just have a function like this:
def transform_into_two_columns(original_val_from_row):
    # do some stuff to the value:
    # example 1: multiply the value by 2 (this plays the role of "new_list" in your question)
    original_val_times_2 = original_val_from_row * 2
    # example 2: add 2.1 to the value (this plays the role of "a_double" in your question)
    original_val_plus_2 = original_val_from_row + 2.1
    return original_val_times_2, original_val_plus_2
Then, you can save that function's output to a list:
list_of_tuples = df['col_2'].apply(lambda x: transform_into_two_columns(x)).to_list()
Then, with that list_of_tuples, you can create 2 new columns:
df[['NEW_col_4', 'NEW_col_5']] = pd.DataFrame(list_of_tuples, index=df.index)
Your new dataframe will look like this:
col_1 | col_2 | col_3 | NEW_col_4 | NEW_col_5
---------------------------------------------------
1 | 3 | 3 | 6 | 5.1
2 | 3 | 4 | 6 | 5.1
3 | 1 | 1 | 2 | 3.1
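
A smaller variant is to let DataFrame.apply spread the returned tuple into the two columns itself via result_type='expand'; this sketch assumes the same df and transform_into_two_columns as above:

df[['NEW_col_4', 'NEW_col_5']] = df.apply(
    lambda row: transform_into_two_columns(row['col_2']),
    axis=1, result_type='expand')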

Apply function with string and integer from multiple columns not working

I want to create a combined string based on two columns: one is an integer and the other is a string.
I've already tried the solution from this answer (Apply function to create string with multiple columns as argument), but it doesn't give the required output.
I have two columns: prod_no which is an integer and PROD which is a string. So something like
| prod_no | PROD  | out           |
|---------|-------|---------------|
| 1       | PRODA | #Item=1=PRODA |
| 2       | PRODB | #Item=2=PRODB |
| 3       | PRODC | #Item=3=PRODC |
To get the last column, I used the following code:
prod_list['out'] = prod_list.apply(lambda x: "#ITEM={}=={}"
.format(prod_list.prod_no.astype(str), prod_list.PROD), axis=1)
I'm trying to produce the "out" column, but the result of that code is weird: the output is #Item=0 1 22 3..., which is very odd. I'm specifically trying to implement this with apply and lambda, though I'm also interested in efficient implementations since I'm trying to learn to write optimized code. Please help :)
This works. Inside the lambda you have to use the row x rather than the whole prod_list dataframe; formatting prod_list.prod_no and prod_list.PROD stringifies the entire columns, which is why you were seeing the concatenated #Item=0 1 2... output.
import pandas as pd
df= pd.DataFrame({"prod_no": [1,2,3], "PROD": [ "PRODA", "PRODB", "PRODC" ]})
df["out"] = df.apply(lambda x: "#ITEM={}=={}".format(x["prod_no"], x["PROD"]), axis=1)
print(df)
Output:
PROD prod_no out
0 PRODA 1 #ITEM=1==PRODA
1 PRODB 2 #ITEM=2==PRODB
2 PRODC 3 #ITEM=3==PRODC
You can also try it with zip:
df=df.assign(out=['#ITEM={}=={}'.format(a,b) for a,b in zip(df.prod_no,df.PROD)])
#or directly : df.assign(out='#Item='+df.prod_no.astype(str)+'=='+df.PROD)
prod_no PROD out
0 1 PRODA #ITEM=1==PRODA
1 2 PRODB #ITEM=2==PRODB
2 3 PRODC #ITEM=3==PRODC

Replacing string value in a pandas dataframe column inside a list in Python

I have a column in my dataframe like this:
| columnn               |
|-----------------------|
| [happiness#sad]       |
| [happy ness#moderate] |
| [happie ness#sad]     |
and I want to replace "happy ness", "happiness", and "happie ness" with 'happyness'. I am currently using the method below, but nothing changes.
Exact string matches:
happy ness  ===> happyness
happiness   ===> happyness
happie ness ===> happyness
I tried the two approaches below.
1st Approach
df['column']
df.column=df.column.replace({"happiness":"happyness" ,"happy ness":"happyness" ,"happie ness":"happynesss" })
2nd Approach
df['column']=df['column'].str.replace("happiness","happyness").replace("happy ness","happyness").replace("happie ness","happynesss")
Desired Output:
| columnn               |
|-----------------------|
| [happyness,sad]       |
| [happyness,moderate]  |
| [happyness,sad]       |
This is one approach using replace with regex=True.
Ex:
import pandas as pd
df = pd.DataFrame({"columnn": [["happiness#sad"], ["happy ness#moderate"], ["happie ness$sad"]]})
data = {"happiness":"happyness" ,"happy ness":"happyness" ,"happie ness":"happynesss" }
df["columnn"] = df["columnn"].apply(lambda x: pd.Series(x).replace(data, regex=True).tolist())
print(df)
Output:
columnn
0 [happyness#sad]
1 [happyness#moderate]
2 [happyness#sad]
Try this approach; I think it will work for you.
df['new_col'] = df['column'].replace(
    to_replace=['happy ness', 'happiness', 'happie ness'],
    value=['happyness', 'happyness', 'happyness'])
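
If the cells really hold lists of strings (rather than plain strings), plain str.replace applied to each list element also works; a minimal sketch, assuming the same three spellings as above:

import pandas as pd

df = pd.DataFrame({"columnn": [["happiness#sad"], ["happy ness#moderate"], ["happie ness#sad"]]})
mapping = {"happy ness": "happyness", "happie ness": "happyness", "happiness": "happyness"}

def normalise(items):
    # run every replacement over every element of the list
    out = []
    for item in items:
        for old, new in mapping.items():
            item = item.replace(old, new)
        out.append(item)
    return out

df["columnn"] = df["columnn"].apply(normalise)
print(df)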

Appending a csv with dictionary values using pandas python

My python script produces a dictionary as follows:
================================================================
TL&DR
I overcomplicated the problem by using from_dict method, while creating a dataframe from dictionary. Thanks to #Sword.
In other words, pd.DataFrame.from_dict is only needed if you want to create a dataframe with all keys in one column, all values in another column. In all other cases, it is as simple as the approach mentioned in the accepted answer.
==============================================================
{u'19:00': 2, u'12:00': 1, u'06:00': 2, u'00:00': 0, u'23:00': 2, u'05:00': 2, u'11:00': 4, u'14:00': 2, u'04:00': 0, u'09:00': 7, u'03:00': 1, u'18:00': 6, u'01:00': 0, u'21:00': 5, u'15:00': 8, u'22:00': 1, u'08:00': 5, u'16:00': 8, u'02:00': 0, u'13:00': 8, u'20:00': 5, u'07:00': 11, u'17:00': 12, u'10:00': 8}
and it also produces a variable, let's say full_name (taken as an argument to the script) which has the value "John".
Every time I run the script, it gives me a dictionary and a name in the aforementioned format.
I want to write this into a csv file for later analysis in the following format:
FULLNAME | 00:00 | 01:00 | 02:00 | .....| 22:00 | 23:00 |
John | 0 | 0 | 0 | .....| 1 | 2 |
My code to produce that is as follows:
import collections
import pandas as pd
# ........................
# Other part of code, which produces the dictionary by name "data_dict"
# ........................
# Sorting the dictionary (and storing it in an OrderedDict) so I can skip matching dictionary keys with column headers
data_dict_sorted = collections.OrderedDict(sorted(data_dict.items()))
# The first time, to produce the column headers, I used .items(); the following lines build on that.
# df = pd.DataFrame.from_dict(data_dict_sorted.items())
# From the second run onwards I just need to append the values, so I am using .values()
df = pd.DataFrame.from_dict(data_dict_sorted.values())
df2 = df.T # transposing because from_dict creates all keys in one column, and corresponding values in the next column.
df2.columns = df2.iloc[0]
df3 = df2[1:]
df3["FULLNAME"] = args.name #This is how we add a value, isn't it?
df3.to_csv('test.csv', mode = 'a', sep=str('\t'), encoding='utf-8', index=False)
My code is producing the following CSV:
00:00 | 01:00 | 02:00 | …….. | 22:00 | 23:00 | FULLNAME
0 | 0 | 0 | …….. | 1 | 2 | John
0 | 0 | 0 | …….. | 1 | 2 | FULLNAME
0 | 0 | 0 | …….. | 1 | 2 | FULLNAME
My question is twofold:
Why is it printing "FULLNAME" instead of "John" in the second iteration (i.e. the second time the script is run)? What am I missing?
Is there a better way to do this?
How about this?
df = pd.DataFrame(data_dict, index=[0])
df['FullName'] = 'John'
EDIT:
It is a bit difficult to understand the way you are conducting the operations, but it looks like the issue is with the line df2.columns = df2.iloc[0]. The code I've mentioned above does not need the column-name assignment or the transpose operation. If you are adding a dictionary at each iteration, try:
data_dict['FullName'] = 'John'
df = df.append(pd.DataFrame(data_dict, index=[0]), ignore_index=True).reset_index()
If each row might have a different name, then df['FullName'] = 'John' will cause the entire column to equal "John". Hence, as a better step, create a key called 'FullName' in your dict with the appropriate name as its value, to avoid assigning a uniform value to the entire column, i.e.:
data_dict['FullName'] = 'John'
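
For the appending part of the question, a minimal sketch that builds on the one-row-per-run idea; it assumes data_dict and full_name come from the rest of the script, keeps the tab separator, and writes the header only when the file does not exist yet:

import os
import pandas as pd

# One row per run, with the hour columns sorted and the name at the end
row = pd.DataFrame(data_dict, index=[0]).sort_index(axis=1)
row['FULLNAME'] = full_name  # e.g. "John"

out_path = 'test.csv'
# mode='a' appends; the header is written only on the very first run
row.to_csv(out_path, mode='a', sep='\t', encoding='utf-8',
           index=False, header=not os.path.isfile(out_path))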
