I have a table which I export to CSV using this code:
df['time'] = pd.to_datetime(df['time']).dt.date
df = df.set_index("time")
df = df.groupby(df.index).agg(['min', 'max', 'mean'])
df = df.reset_index()
df.to_csv(r'C:\****\Exports\exportMMA.csv', index=False)
When I export this, the result is:
| column1 | column2 | column3 |
|:---- |:------: | -----: |
| FT1 | FT2 | FT3 |
| 12 | 8 | 3 |
I want to get rid of column1, column2, and column3 and replace the header with FT2 and FT3.
I tried this:
new_header = df.iloc[0] #grab the first row for the header
df = df[1:] #take the data less the header row
df.columns = new_header #set the header row as the df header
And this:
df.columns = df.iloc[0]
df = df[1:]
Somehow it won't work. I don't really need to replace the headers in the DataFrame; having the right headers in the CSV is what matters most.
Thanks!
You can change the column names in your DataFrame before writing it to a CSV file. Here is the updated code:
df['time'] = pd.to_datetime(df['time']).dt.date
df = df.set_index("time")
df = df.groupby(df.index).agg(['min', 'max', 'mean'])
df = df.reset_index()
# Changing the column names (the list must have exactly one entry per column)
df.columns = ['FT2', 'FT3']
# Writing the DataFrame to a CSV file
df.to_csv(r'C:\****\Exports\exportMMA.csv', index=False, header=True)
The header parameter in the to_csv method determines whether or not to write the column names to the CSV file. In this case, it's set to True so that the column names will be written.
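For reference, `groupby().agg()` with several functions produces MultiIndex columns, which is what creates the extra header row in the exported CSV. A minimal sketch of flattening those columns before export, using hypothetical FT2/FT3 data since the original columns aren't shown:

```python
import pandas as pd

# hypothetical data standing in for the original table
df = pd.DataFrame({
    "time": pd.to_datetime(["2023-01-01", "2023-01-01", "2023-01-02"]),
    "FT2": [12, 10, 8],
    "FT3": [3, 5, 4],
})
df["time"] = df["time"].dt.date

# agg with several functions yields MultiIndex columns like ('FT2', 'min')
agg = df.groupby("time").agg(["min", "max", "mean"])

# join both levels into one flat name per column before writing the CSV
agg.columns = ["_".join(col) for col in agg.columns]
agg = agg.reset_index()
print(agg.columns.tolist())
```

This keeps a single header row such as `FT2_min,FT2_max,...` instead of two stacked ones, and works whatever the real column names are.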
Related
I need to arrange a Pandas DataFrame with values that aren't in the right columns. I would like to rearrange the values in the cells according to a prefix that I have, and push the 'unknown' columns with their values to the end of the dataframe.
I have the following dataframe:
The output I am looking for is:
The 'known' values get a header, while the unknown columns (5, 6) go to the end.
The rule: if no cell in a column contains '|', the column name is not changed.
Any suggestions I could try would be really helpful.
Try this:
import pandas as pd

df = pd.DataFrame({'1': ['name | Steve', 'name | John'],
                   '2': [None, None],
                   '3': [None, 'age | 50']})

rename_dict = {}                              # reset rename dictionary
for col in df.columns:
    vals = df[col].values                     # look at values in each column
    vals = [x for x in vals if x]             # remove Nulls
    vals = [x for x in vals if '|' in x]      # keep only values containing |
    if len(vals) > 0:
        new_col_name = vals[0].split('|')[0].strip()  # the new column name
        rename_dict[col] = new_col_name       # add it to the rename dictionary
df.rename(columns=rename_dict, inplace=True)  # rename the columns
df
name 2 age
0 name | Steve None None
1 name | John None age | 50
It looks a bit tricky and is not exactly what you expected, but it might give you an idea of how to solve your task:
df = pd.DataFrame([['email | 1#mail.com','name | name1','surname | surname1','','',''],
['email | 2#mail.com','','name | name2','occupation | student','surname | surname2','abc | 123']])
df.apply(lambda x: pd.Series(dict([tuple(i.split(' | ')) for i in x.tolist() if i])),axis=1)
>>> out
abc email name occupation surname
0 NaN 1#mail.com name1 NaN surname1
1 123 2#mail.com name2 student surname2
You can try this solution:
my_dict = {}

def createDict(ss):
    for i in range(1, 7, 1):
        sss = ss[i].split('|')
        if len(sss) > 1:
            if sss[0].strip() in my_dict:
                my_dict[sss[0].strip()].append(ss[i])
            else:
                my_dict[sss[0].strip()] = [ss[i]]

df.apply(createDict, axis=1)  # fills my_dict as a side effect; don't reassign df
dff = pd.DataFrame.from_dict(my_dict, orient='index')
dff = dff.transpose()
print(dff)
Hope this answers your question.
I am taking the confirmed cases data from here:
https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv
I load the data in Python using a Pandas DataFrame.
My problem is: I am trying to turn the date columns into rows, and the 'Country/Region' column into columns.
url_confirmed = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv'
df = pd.read_csv(url_confirmed)
df = df.drop(columns=['Province/State','Lat','Long'],axis=1)
df_piv = pd.melt(df,id_vars=['Country/Region'],var_name='Date',value_name="Value")
I got this far and really don't know how to proceed.
My final dataframe is supposed to look like this:
Date Afghanistan Albania and so on
0 1/22/20 0 val
1 1/23/20 300 val
3 1/24/20 4023 val
6 1/25/20 300 val
7 1/26/20 2000 val
8 .. ..
Thank you very much!
I think a simple transpose with renaming a column should do it:
url_confirmed = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv'
df = pd.read_csv(url_confirmed)
df = df.drop(columns=['Province/State','Lat','Long'],axis=1)
df = df.T.reset_index() # Transpose and reset index
df.columns = df.iloc[0] # Set first row as header
df = df[1:]
df.rename(columns = {'Country/Region' : 'Date'}, inplace=True)
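Alternatively, since the question already has the data melted to long form, a pivot can reshape it directly. A sketch on small stand-in data (the real frame comes from the URL, where countries repeat across provinces, hence `pivot_table` with a sum rather than `pivot`):

```python
import pandas as pd

# small stand-in for the melted frame produced by pd.melt above
df_piv = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Afghanistan", "Albania", "Albania"],
    "Date": ["1/22/20", "1/23/20", "1/22/20", "1/23/20"],
    "Value": [0, 300, 0, 1],
})

# one row per date, one column per country; sum handles duplicate country rows
wide = (df_piv.pivot_table(index="Date", columns="Country/Region",
                           values="Value", aggfunc="sum")
              .reset_index())
wide.columns.name = None  # drop the leftover 'Country/Region' axis label
```

This avoids the transpose-then-fix-header dance, at the cost of needing to know which column holds the values.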
import pandas as pd
data = {'col': ['a11 aaa','a121 aaaa','a3333 adfdf']}
df = pd.DataFrame(data)
I want to set the index to something like ['a11', 'a121', 'a3333'].
print(df)
Option 1
import pandas as pd
data = {'col': ['a11 aaa','a121 aaaa','a3333 adfdf']}
df = pd.DataFrame(data)
df.index = df['col'].apply(lambda line: line.split(' ')[0]).tolist()
df
Option 2
import pandas as pd
data = {'col': ['a11 aaa','a121 aaaa','a3333 adfdf']}
indexes = list(map(lambda line: line.split(' ')[0], data['col']))  # materialize the map; an iterator can't be used as an index
df = pd.DataFrame(data,index=indexes)
df
Output
| col
a11 | a11 aaa
a121 | a121 aaaa
a3333 | a3333 adfdf
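A third option, for what it's worth: pandas string methods can build the same index without a Python-level loop. A sketch on the same sample data:

```python
import pandas as pd

data = {'col': ['a11 aaa', 'a121 aaaa', 'a3333 adfdf']}
df = pd.DataFrame(data)

# vectorized: split each value on spaces and keep the first token
df.index = df['col'].str.split(' ').str[0]
df.index.name = None  # the split Series carries the name 'col'; drop it
```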
I have a dataframe column containing a JSON array that I want to split into columns for every row.
Dataframe
FIRST_NAME CUSTOMFIELDS
0 Maria [{'FIELD_NAME': 'CONTACT_FIELD_1', 'FIELD_VALU...
1 John [{'FIELD_NAME': 'CONTACT_FIELD_1', 'FIELD_VALU...
...
Goal
I need to convert the JSON content in that column into a dataframe:
+------------+-----------------+-------------+-----------------+
| FIRST NAME | FIELD_NAME | FIELD_VALUE | CUSTOM_FIELD_ID |
+------------+-----------------+-------------+-----------------+
| Maria | CONTACT_FIELD_1 | EN | CONTACT_FIELD_1 |
| John | CONTACT_FIELD_1 | false | CONTACT_FIELD_1 |
+------------+-----------------+-------------+-----------------+
The code snippet below should work for you.
import pandas as pd
df = pd.DataFrame()
df['FIELD'] = [[{'FIELD_NAME': 'CONTACT_FIELD_1', 'FIELD_VALUE': 'EN', 'CUSTOM_FIELD_ID': 'CONTACT_FIELD_1'}, {'FIELD_NAME': 'CONTACT_FIELD_10', 'FIELD_VALUE': 'false', 'CUSTOM_FIELD_ID': 'CONTACT_FIELD_10'}]]
temp_dict = {}
counter = 0
for entry in df['FIELD'][0]:
    temp_dict[counter] = entry
    counter += 1
new_dataframe = pd.DataFrame.from_dict(temp_dict, orient='index')
new_dataframe  # outputs the dataframe
Edited answer to reflect edited question:
Under the assumption that each entry in CUSTOMFIELDS is a list with 1 element (which is different from original question; the entry had 2 elements), the following will work for you and create a dataframe in the requested format.
import pandas as pd
# Need to recreate example problem
df = pd.DataFrame()
df['CUSTOMFIELDS'] = [[{'FIELD_NAME': 'CONTACT_FIELD_1', 'FIELD_VALUE': 'EN', 'CUSTOM_FIELD_ID': 'CONTACT_FIELD_1'}],
[{'FIELD_NAME': 'CONTACT_FIELD_1', 'FIELD_VALUE': 'FR', 'CUSTOM_FIELD_ID': 'CONTACT_FIELD_1'}]]
df['FIRST_NAME'] = ['Maria', 'John']
#begin solution
counter = 0
dataframe_solution = pd.DataFrame()
for index, row in df.iterrows():
    dataframe_solution = pd.concat(
        [dataframe_solution,
         pd.DataFrame.from_dict(row['CUSTOMFIELDS'][0], orient='index').transpose()],
        sort=False, ignore_index=True)
    dataframe_solution.loc[counter, 'FIRST_NAME'] = row['FIRST_NAME']
    counter += 1
Your dataframe is now in dataframe_solution.
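A shorter alternative, if your pandas version has them: `explode` plus `json_normalize` does the row-per-dict expansion without an explicit loop. A sketch on the same assumed two-row data:

```python
import pandas as pd

# stand-in frame: each CUSTOMFIELDS cell holds a list of dicts
df = pd.DataFrame({
    "FIRST_NAME": ["Maria", "John"],
    "CUSTOMFIELDS": [
        [{"FIELD_NAME": "CONTACT_FIELD_1", "FIELD_VALUE": "EN",
          "CUSTOM_FIELD_ID": "CONTACT_FIELD_1"}],
        [{"FIELD_NAME": "CONTACT_FIELD_1", "FIELD_VALUE": "false",
          "CUSTOM_FIELD_ID": "CONTACT_FIELD_1"}],
    ],
})

# one row per dict, then expand each dict into its own columns
exploded = df.explode("CUSTOMFIELDS").reset_index(drop=True)
fields = pd.json_normalize(exploded["CUSTOMFIELDS"].tolist())
result = pd.concat([exploded[["FIRST_NAME"]], fields], axis=1)
```

This also handles cells whose list holds more than one dict, since `explode` emits one row per list element.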
I have 3 different CSV files and I want to concatenate their values. The only condition is that the first CSV's data must end up in column A of the new CSV, the second's in column B, and the third's in column C. All three CSV files have the same number of rows.
I also need to change the three headers to ['año_pasado','mes_pasado','este_mes'].
import pandas as pd
df1 = pd.read_csv('año_pasado_subastas2.csv', sep=',')
df2 = pd.read_csv('mes_pasado_subastas2.csv', sep=',')
df3 = pd.read_csv('este_mes_subastas2.csv', sep=',')
df1
>>>
Subastas
166665859
237944547
260106086
276599496
251813654
223790056
179340698
177500866
239884764
234813107
df2
>>>
Subastas
212003586
161813617
172179313
209185016
203804433
198207783
179410798
156375658
130228140
124964988
df3
>>>
Subastas
142552750
227514418
222635042
216263925
196209965
140984000
139712089
215588302
229478041
222211457
The output that I need is:
año_pasado,mes_pasado,este_mes
166665859,124964988,142552750
237944547,161813617,227514418
260106086,172179313,222635042
276599496,209185016,216263925
251813654,203804433,196209965
223790056,198207783,140984000
179340698,179410798,139712089
177500866,156375658,215588302
239884764,130228140,229478041
234813107,124964988,222211457
I think you need concat of Series, created either with squeeze=True (if the data has only one column) or by selecting the column, and for the new column names use the keys parameter:
# note: the squeeze parameter was removed in pandas 2.0;
# there, use pd.read_csv(...).squeeze('columns') instead
df = pd.read_csv('año_pasado_subastas2.csv', squeeze=True)
df1 = pd.read_csv('mes_pasado_subastas2.csv', squeeze=True)
df2 = pd.read_csv('este_mes_subastas2.csv', squeeze=True)
cols = ['año_pasado','mes_pasado','este_mes']
df = pd.concat([df, df1, df2], keys = cols, axis=1)
Or:
df = pd.read_csv('año_pasado_subastas2.csv')
df1 = pd.read_csv('mes_pasado_subastas2.csv')
df2 = pd.read_csv('este_mes_subastas2.csv')
cols = ['año_pasado','mes_pasado','este_mes']
df = pd.concat([df['Subastas'], df1['Subastas'], df2['Subastas']], keys = cols, axis=1)
print (df)
año_pasado mes_pasado este_mes
0 166665859 212003586 142552750
1 237944547 161813617 227514418
2 260106086 172179313 222635042
3 276599496 209185016 216263925
4 251813654 203804433 196209965
5 223790056 198207783 140984000
6 179340698 179410798 139712089
7 177500866 156375658 215588302
8 239884764 130228140 229478041
9 234813107 124964988 222211457