I have a pandas dataframe that starts out empty, and I would like to add some rows to it after doing some calculations.
I have tried to do the following:
test = pd.DataFrame(columns=['Name', 'Age', 'Gender'])
if #some statement:
    test.append(['James', '95', 'M'])
If I print test, append to it, and print it again, it shows:
print(test)
test.append(['a', 'a', 'a', 'a', 'a', 'a'])
print(test)
>>>
Empty DataFrame
Columns: [Name, Age, Gender]
Index: []
Empty DataFrame
Columns: [Name, Age, Gender]
Index: []
So clearly the line is not being added to the dataframe.
I want the output to be
Name | Age | Gender
James | 95 | M
Use append with a dictionary:
test = test.append(dict(zip(test.columns,['James', '95', 'M'])), ignore_index=True)
print(test)
Name Age Gender
0 James 95 M
You can pass it as a Series:
test.append(pd.Series(['James', 95, 'M'], index=test.columns), ignore_index=True)
[out]
Name Age Gender
0 James 95 M
Try appending it as a dictionary:
>>> test = test.append({'Name': "James", "Age": 95, "Gender": "M"}, ignore_index=True)
>>> print(test)
Outputs:
Name Age Gender
0 James 95 M
The append function converts a list into a DataFrame with automatically generated columns ([0]), but test has columns=['Name', 'Age', 'Gender']. Also, append does not change test in place. What I said may be confusing; running the following code might make it clearer.
import pandas as pd
# append converts the list into a DataFrame; the two DataFrame objects should have the same columns
data = pd.DataFrame(['James', '95', 'M'])
print(data)
#right code
test = pd.DataFrame(columns=['Name', 'Age', 'Gender'])
test = test.append(pd.DataFrame([['James', '95', 'M']],columns=['Name', 'Age', 'Gender']))
print(test)
Try this,
>>> test.append(dict(zip(test.columns,['James', '95', 'M'])), ignore_index=True)
Name Age Gender
0 James 95 M
First, the append method does not modify the DataFrame in place but returns the modified (appended) version.
Second, the new row should be a DataFrame, a dict/Series, or a list of these.
#using dict
row = {'Name': "James", "Age": 95, "Gender": "M"}
# using Series
row = pd.Series(['James', 95, 'M'], index=test.columns)
Try print( test.append(row) ) and see the result.
What you need is to save the return value of test.append as the appended version. You can save it under the same name if you don't care about the previous version, which gives us:
test = test.append( row )
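As a side note, DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on newer versions the same pattern is usually written with pd.concat. A minimal sketch, reusing the test frame from the question:
import pandas as pd

test = pd.DataFrame(columns=['Name', 'Age', 'Gender'])
new_row = pd.DataFrame([['James', '95', 'M']], columns=test.columns)
test = pd.concat([test, new_row], ignore_index=True)  # concat also returns a new DataFrame
print(test)
    Name Age Gender
0  James  95      M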
I am attempting to re-sort a list based on the order in which its items appear in a dataframe, even though the dataframe column is longer than the list.
enrolNo Surname
0 1 Jones
1 2 Smith
2 3 Henderson
3 4 Kilm
4 5 Henry
5 6 Joseph
late = ['Kilm', 'Henry', 'Smith']
Desired output:
sorted_late = ['Smith', 'Kilm', 'Henry']
My initial attempt was to add a new column to the existing dataframe and then extract it as a list, but this seems like a very long way around. Furthermore, I discovered that my attempt was not going to work anyway because of the different lengths, as the error message showed after I tried this to start with:
df_register['late_arrivals'] = np.where((df_register['Surname'] == late),
late , '')
Should I be using a 'for' loop instead?
Pluck out the matching values from dataframe itself. No need to sort the list itself:
sorted_late = df[df.Surname.isin(late)].Surname.to_list()
If the master order were a plain list, you could be clever with that too:
sorted_late = [master_late for master_late in master_list if master_late in late]
Why not use the .isin() function?
df['Surname'].isin(late)
Then you should get the desired output.
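.isin() returns a boolean mask rather than the list itself, so a small follow-up sketch (assuming the question's df_register) would still index with the mask to pull out the ordered surnames:
mask = df_register['Surname'].isin(late)
sorted_late = df_register.loc[mask, 'Surname'].tolist()
# ['Smith', 'Kilm', 'Henry']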
You can specify a custom key for the sorted function:
import pandas
df = pandas.DataFrame([
{"enrolNo": 1, "Surname": "Jones"},
{"enrolNo": 2, "Surname": "Smith"},
{"enrolNo": 3, "Surname": "Henderson"},
{"enrolNo": 4, "Surname": "Kilm"},
{"enrolNo": 5, "Surname": "Henry"},
{"enrolNo": 6, "Surname": "Joseph"},
])
# set Surname as index so we can access enrolNo by it
df = df.set_index('Surname')
# now you can access enrolNo by Surname
assert df.loc['Kilm']['enrolNo'] == 4
# define the list to be sorted
late = ['Kilm', 'Henry', 'Smith']
# Sort late by enrolNo as listed in the dataframe
late_sorted = sorted(late, key=lambda n: df.loc[n]['enrolNo'])
# ['Smith', 'Kilm', 'Henry']
I have a pandas dataframe where I listed items and categorised them:
col_name |col_group
-------------------------
id | Metadata
listing_url | Metadata
scrape_id | Metadata
name | Text
summary | Text
space | Text
To reproduce:
import pandas
df = pandas.DataFrame([
['id','metadata'],
['listing_url','metadata'],
['scrape_id','metadata'],
['name','Text'],
['summary','Text'],
['space','Text']],
columns=['col_name', 'col_group'])
Can you suggest how I can convert this dataframe to multiple lists based on "col_group":
Metadata = ['id', 'listing_url', 'scrape_id']
Text = ['name','summary','space']
This is to allow me to pass these lists of columns to pandas and drop columns.
I googled a lot and got stuck: all the answers are about converting lists to a dataframe, not the other way around. Should I aim to convert it into a dictionary, or a list of lists?
I have over 100 rows, belonging to 10 categories, so would like to avoid manual hard-coding.
I've tried this code:
import pandas
df = pandas.DataFrame([
[1, 'url_a', 'scrap_a', 'name_a', 'summary_a', 'space_a'],
[2, 'url_b', 'scrap_b', 'name_b', 'summary_b', 'space_b'],
[3, 'url_c', 'scrap_c', 'name_c', 'summary_c', 'space_ac']],
columns=['id', 'listing_url', 'scrape_id', 'name', 'summary', 'space'])
print(df)
for row in df.iterrows():
    print(row[1].to_list())
which gives this output:
[1, 'url_a', 'scrap_a', 'name_a', 'summary_a', 'space_a']
[2, 'url_b', 'scrap_b', 'name_b', 'summary_b', 'space_b']
[3, 'url_c', 'scrap_c', 'name_c', 'summary_c', 'space_ac']
You can use
for row in df[['name', 'summary', 'space']].iterrows():
to iterate over specific columns only.
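For example, a short sketch against the second dataframe from the question:
for _, row in df[['name', 'summary', 'space']].iterrows():
    print(row.to_list())
# ['name_a', 'summary_a', 'space_a']
# ['name_b', 'summary_b', 'space_b']
# ['name_c', 'summary_c', 'space_ac']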
Like this:
In [245]: res = df.groupby('col_group', as_index=False)['col_name'].apply(list)
In [248]: res.tolist()
Out[248]: [['id', 'listing_url', 'scrape_id'], ['name', 'summary', 'space']]
my_vars = df.groupby('col_group').agg(list)['col_name'].to_dict()
Output:
>>> my_vars
{'Text': ['name', 'summary', 'space'], 'metadata': ['id', 'listing_url', 'scrape_id']}
The recommended usage would be just my_vars['Text'] to access the Text group, and so on. If you must have these as distinct names, you can force them into your target scope, e.g. globals:
globals().update(df.groupby('col_group').agg(list)['col_name'].to_dict())
Result:
>>> Text
['name', 'summary', 'space']
>>> metadata
['id', 'listing_url', 'scrape_id']
However, I would advise against that, as you might unwittingly overwrite some of your other objects, or they might not end up in the scope you need (e.g. locals).
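As a usage sketch tied to the stated goal of dropping columns (the listings_df name here is hypothetical, standing in for the dataframe whose columns you want to drop):
groups = df.groupby('col_group')['col_name'].agg(list).to_dict()
listings_df = listings_df.drop(columns=groups['Text'])  # drops name, summary, space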
Given an output of df=pandas.read_csv(somePath,header=None):
0 1
0 Name Bambang
1 Gender Male
2 Age 25
How to convert it into:
dict_data={
'Name':Bambang,
'Gender':Male,
'Age':25
}
I can do it, but in a roundabout way:
df=pandas.read_csv(somePath,header=None)
df=df.set_index([0])
theDict=df.to_dict()
theDict=theDict[1]
Is there a native and simple way to do it using pandas.read_csv() or a native Python command? Thank you.
The assumption is that you've read the data and want it as a dict.
Something like this could work:
df.set_index(0).T.to_dict('records')[0]
{'Name': 'Bambang', 'Gender': 'Male', 'Age ': '25'}
Also, if you really want to do this, it would be better to just use Python's csv reader to get your dict, instead of the roundabout way of going through pandas first and then to a dict:
This is how the data looks in data.txt; I'm not sure if this replicates exactly what you have:
data = '''
Name Bambang
Gender Male
Age 25'''
data
import csv
A = []
with open('data.txt', newline='') as csvfile:
    content = csv.reader(csvfile, delimiter=' ')
    for row in content:
        A.append([entry for entry in row if entry != ''])
dict(A)
{'Name': 'Bambang', 'Gender': 'Male', 'Age': '25'}
UPDATE: thanks to @AMC, it is much simpler from the pandas end: get the numpy values and apply dict:
dict(df.to_numpy())
{'Name': 'Bambang', 'Gender': 'Male', 'Age': '25'}
I have an Excel file with a structure like this:
name age status
anna 35 single
petr 27 married
I have converted such a file into a nested dictionary with a structure like this:
{'anna': {'age': 35, 'status': 'single'},
 'petr': {'age': 27, 'status': 'married'}}
using pandas:
import pandas as pd
df = pd.read_excel('path/to/file')
df.set_index('name', inplace=True)
print(df.to_dict(orient='index'))
But now when I run list(df.keys()) it returns a list of all the keys ('age', 'status', etc.) but not 'name'.
My eventual goal is that it returns me all the keys and values by typing a name.
Is it possible somehow? Or maybe I should import the data some other way to achieve this goal? Either way I will eventually need a dictionary, because I will merge it with other dictionaries by key.
I think you need the parameter drop=False in set_index, so that the column name is not dropped:
import pandas as pd
df = pd.read_excel('path/to/file')
df.set_index('name', inplace=True, drop=False)
print (df)
name age status
name
anna anna 35 single
petr petr 27 married
d = df.to_dict(orient='index')
print (d)
{'anna': {'age': 35, 'status': 'single', 'name': 'anna'},
'petr': {'age': 27, 'status': 'married', 'name': 'petr'}}
print (list(df.keys()))
['name', 'age', 'status']
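With that dict, looking up a single name gives you all of its keys and values, which is the stated goal:
print(d['anna'])
# {'age': 35, 'status': 'single', 'name': 'anna'}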
Given a dataframe from Excel, you can do the following to obtain what you want:
resulting_dict = {}
for name, info in df.groupby('name').apply(lambda x: x.to_dict()).iteritems():
    stats = {}
    for key, values in info.items():
        if key != 'name':
            value = list(values.values())[0]
            stats[key] = value
    resulting_dict[name] = stats
Try this:
import pandas as pd
df = pd.read_excel('path/to/file')
df[df['name']=='anna'] #Get all details of anna
df[df['name']=='petr'] #Get all details of petr
I have what should be a simple problem, but three hours into trying different things I can't solve it.
I have pymysql returning results from a query. I can't share the exact example, but this straw man should do.
cur.execute("select name, address, phonenum from contacts")
This returns results perfectly, which I grab with
results = cur.fetchall()
and then convert to a list object, exactly as I want it:
data = list(results)
Unfortunately this doesn't include the header, but you can get it from cur.description (which contains metadata, including but not limited to the header). I push this into a list:
header = []
for n in cur.description:
    header.append(str(n[0]))
so my header looks like:
['name','address','phonenum']
and my results look like:
[['Tom','dublin','12345'],['Bob','Kerry','56789']]
I want to create a dataframe in pandas and then pivot it, but it needs column headers to work properly. I had previously been importing a completed csv into a pandas DataFrame, which included the header, so this all worked smoothly. Now I need to get the data directly from the DB, so I thought: that's easy, I just join the two lists and hey presto, I have what I'm looking for. But when I try to append, I actually wind up with this:
['name','address','phonenum',['Tom','dublin','12345'],['Bob','Kerry','56789']]
when I need this:
[['name','address','phonenum'],['Tom','dublin','12345'],['Bob','Kerry','56789']]
Anyone any ideas?
Much appreciated!
Addition of lists concatenates contents:
In [17]: [1] + [2,3]
Out[17]: [1, 2, 3]
This is true even if the contents are themselves lists:
In [18]: [[1]] + [[2],[3]]
Out[18]: [[1], [2], [3]]
So:
In [13]: header = ['name','address','phonenum']
In [14]: data = [['Tom','dublin','12345'],['Bob','Kerry','56789']]
In [15]: [header] + data
Out[15]:
[['name', 'address', 'phonenum'],
['Tom', 'dublin', '12345'],
['Bob', 'Kerry', '56789']]
In [16]: pd.DataFrame(data, columns=header)
Out[16]:
name address phonenum
0 Tom dublin 12345
1 Bob Kerry 56789
Note that loading a DataFrame with data from a database can also be done with pandas.read_sql.
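A minimal sketch of that route, assuming conn is the open pymysql connection the cursor came from:
import pandas as pd

df = pd.read_sql("select name, address, phonenum from contacts", conn)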
Is that what you are looking for?
first = ['name','address','phonenum']
second = [['Tom','dublin','12345'],['Bob','Kerry','56789']]
second = [first] + second
print(second)
[['name', 'address', 'phonenum'], ['Tom', 'dublin', '12345'], ['Bob', 'Kerry', '56789']]
Other possibilities:
You could insert it into data at index 0 as a list:
header = ['name','address','phonenum']
data = [['Tom','dublin','12345'],['Bob','Kerry','56789']]
data.insert(0,header)
print(data)
[['name', 'address', 'phonenum'], ['Tom', 'dublin', '12345'], ['Bob', 'Kerry', '56789']]
But if you are going to manipulate the header variable, you can shallow-copy it first:
header = ['name','address','phonenum']
data = [['Tom','dublin','12345'],['Bob','Kerry','56789']]
data.insert(0,header[:])
print(data)
[['name', 'address', 'phonenum'], ['Tom', 'dublin', '12345'], ['Bob', 'Kerry', '56789']]
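A small sketch of why the copy matters: without it, data[0] and header are the same list object, so later changes to header also show up inside data.
header = ['name', 'address', 'phonenum']
data = [['Tom', 'dublin', '12345'], ['Bob', 'Kerry', '56789']]
data.insert(0, header)           # no copy: data[0] is header itself
header.append('email')
print(data[0])                   # ['name', 'address', 'phonenum', 'email']
# with data.insert(0, header[:]) instead, data[0] keeps its own independent copy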