I have the dataframe below.
After applying the manipulations below, the values in the Rule column end with a trailing comma. That is expected, but I want to remove it. How can I do that?
df['Rule'] = df.State.apply(lambda x: str("'"+str(x)+"',"))
df['Rule'] = df.groupby(['Description'])['Rule'].transform(lambda x: ' '.join(x))
df1 = df.drop_duplicates('Description',keep = 'first')
df1['Rule'] = df1['Rule'].apply(lambda x: str("("+str(x)+")"))
I have tried using iloc[-1].replace(",", ""), but it is not working.
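Try this. Leave the comma out of the per-row string and use ', ' as the join separator, so no trailing comma is produced: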
df['Rule'] = df.State.apply(lambda x: str("'"+str(x)+"'"))
df['Rule'] = df.groupby(['Description'])['Rule'].transform(lambda x: ', '.join(x))
df1 = df.drop_duplicates('Description', keep = 'first')
df1['Rule'] = df1['Rule'].apply(lambda x: str("("+str(x)+")"))
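The same idea can be written as a single groupby aggregation that returns one row per Description. This is a minimal sketch, assuming the State and Description column names from the question; the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Description": ["East", "East", "West"],
    "State": ["NY", "NJ", "CA"],
})

# Quote each state, join the states of each group with ", ",
# and wrap the joined string in parentheses.
rules = (
    df.groupby("Description")["State"]
      .agg(lambda s: "(" + ", ".join(f"'{v}'" for v in s) + ")")
      .rename("Rule")
      .reset_index()
)
print(rules)
```

Because the separator is only placed between items, there is nothing to strip afterwards.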
I need help on how to properly transform my df from this:
df = pd.DataFrame({'ID': ['ID no1', "ID no2", "ID no3"],
'ValueM2': ["11998","11076", "12025"],
'ValueSqFt': [129145.39718,119221.07178, 129.43600276]})
to this (I also need the output to use double quotes (") instead of single quotes (')):
dfnew = pd.DataFrame({'ID': ["ID no1", "ID no2", "ID no3"],
'DataMetric': [{"ValueM2": "11998"}, {"ValueM2": "11076"}, {"ValueM2": "12025"}],
'DataImperial': [{"ValueSqFt": "129145.39718"}, {"ValueSqFt": "119221.07178"}, {"ValueSqFt": "129.43600276"}]})
If there are only 2 columns to be manipulated, it is best to adopt a manual approach as follows:
df['ValueM2'] = [{'ValueM2': x} for x in df['ValueM2'].values]
df['ValueSqFt'] = [{"ValueSqFt": x} for x in df['ValueSqFt'].values]
df = df.rename(columns={'ValueM2': 'DataMetric', 'ValueSqFt': 'DataImperial'})
If you want to have the output with double quotes, you can use json.dumps:
import json
df['DataMetric'] = df['DataMetric'].apply(lambda x: json.dumps(x))
df['DataImperial'] = df['DataImperial'].apply(lambda x: json.dumps(x))
or
df['DataMetric'] = df['DataMetric'].astype(str).apply(lambda x: x.replace("'", '"'))
df['DataImperial'] = df['DataImperial'].astype(str).apply(lambda x: x.replace("'", '"'))
Note that either option converts the values in these columns from dicts to strings.
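The difference between the dict columns and their JSON-string form can be checked on the sample data from the question. This is a sketch, not the only way to do it. The desired output shows the square-foot values as strings, so the code calls str() on each value first; that step is an assumption about what is wanted. The names out and out_json are illustrative.

```python
import json
import pandas as pd

df = pd.DataFrame({
    "ID": ["ID no1", "ID no2", "ID no3"],
    "ValueM2": ["11998", "11076", "12025"],
    "ValueSqFt": [129145.39718, 119221.07178, 129.43600276],
})

# Wrap each value in a one-key dict. str(x) is an assumption so that the
# floats match the string values shown in the desired output.
out = pd.DataFrame({
    "ID": df["ID"],
    "DataMetric": [{"ValueM2": str(x)} for x in df["ValueM2"]],
    "DataImperial": [{"ValueSqFt": str(x)} for x in df["ValueSqFt"]],
})

# json.dumps turns each dict into a JSON string, which uses double quotes.
out_json = out.copy()
for col in ["DataMetric", "DataImperial"]:
    out_json[col] = out_json[col].apply(json.dumps)

print(out_json.loc[0, "DataMetric"])
```

The cells in out are still dicts, while the cells in out_json are plain strings; json.loads turns them back into dicts if needed.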
These are the values in my DataFrame
I am trying to change the datatype of a column from string to float, but I can't, because some of the values in df contain spaces.
What I've already tried:
df['Value'] = df['Value'].str.strip()
df['Value'] = df['Value'].str.replace(' ','')
Nothing helps... Anyone got any ideas?
This code should work:
df['Value'] = df['Value'].astype(str).str.replace(' ','').astype(float)
If it does not, try to troubleshoot with the following:
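def check_cell(x):
    try:
        return float(str(x).replace(' ', ''))
    except ValueError:
        # Print the raw value that fails to convert; repr() exposes hidden
        # characters. The cell itself becomes None.
        print(repr(x))

df['Value'] = df['Value'].apply(check_cell)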
You could try this variant:
df['Value'] = df['Value'].str.replace(' ', '', regex=True)
As the OP says, they want the result to be a Series of float, not str. A safe way to do this is, after removing any potential spaces, to use pd.to_numeric():
df['Value'] = pd.to_numeric(df['Value'].str.replace(' ', '', regex=False))
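If replacing ' ' appears to do nothing, one possible cause is that the separators are not ordinary spaces but non-breaking spaces (\xa0). That is an assumption, since the actual data is not shown. A minimal sketch with made-up values that removes any whitespace before converting:

```python
import pandas as pd

# Hypothetical values: a regular space, a non-breaking space, and a clean number.
df = pd.DataFrame({"Value": ["1 234.5", "2\xa0000", "42"]})

# \s matches any Unicode whitespace, including the non-breaking space \xa0.
cleaned = df["Value"].astype(str).str.replace(r"\s+", "", regex=True)

# errors="coerce" turns anything still unparseable into NaN instead of raising.
df["Value"] = pd.to_numeric(cleaned, errors="coerce")
print(df["Value"].tolist())
```

Rows that end up as NaN point to the cells that still contain unexpected characters.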
My data consists of 1 million rows. A sample looks like this:
_id:object("603678958a6eade21c0790b8")
id1:3758
date2:2010-01-01
time3:00:05:00
date4 :2009-12-31
time5:19:05:00
id6 :2
id7:-79.09
id8:35.97
id9:5.5
id10:0
id11:-99999
id12 :0
id13 :-9999
c14:"U"
id15:0
id16:99
id17:0
id18:-99
id19:-9999
id20:33
id21:0
id22:-99
id23:0
The thing is that date2 and date4 are in the form that I want, but they are strings and I want to convert them to dates.
The code I have used looks like this:
df['date4'] = df['date4'].astype('datetime64[ns]')
df['date2'] = df['date2'].astype('datetime64[ns]')
df['time3'] = df['time3'].apply(lambda x:datetime.datetime.strptime(x[0]+x[1]+":"+x[2]+x[3], '%H:%M'))
df['time5'] = df['time5'].apply( lambda x: datetime.datetime.strptime(x[0] + x[1] + ":" + x[2] + x[3], '%H:%M'))
df['date2'] = df['date2'].apply(lambda x: arrow.get(x).format("YYYY-MM-DD"))
df['date4'] = df['date4'].apply(lambda x: arrow.get(x).format("YYYY-MM-DD"))
df['time3'] = df['time3'].apply(lambda x: arrow.get(x).format("HH:mm:ss"))
df['time5'] = df['time5'].apply(lambda x: arrow.get(x).format("HH:mm:ss"))
Do I need to convert them before inserting or after?
Does anyone know how I can do that?
If it were me, I'd combine date2/time3 into one datetime column, and likewise date4/time5, as in:
df['date2'] = (df['date2']+'T'+df['time3']).astype('datetime64[ns]')
df['date4'] = (df['date4']+'T'+df['time5']).astype('datetime64[ns]')
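An alternative to astype is pd.to_datetime with an explicit format, which fails loudly on malformed rows. This is a sketch using the values from the sample record in the question; the datetime2 and datetime4 column names are illustrative, not part of the original data.

```python
import pandas as pd

# Sample values taken from the record shown in the question.
df = pd.DataFrame({
    "date2": ["2010-01-01"], "time3": ["00:05:00"],
    "date4": ["2009-12-31"], "time5": ["19:05:00"],
})

# Concatenate the date and time strings, then parse with an explicit format.
fmt = "%Y-%m-%d %H:%M:%S"
df["datetime2"] = pd.to_datetime(df["date2"] + " " + df["time3"], format=fmt)
df["datetime4"] = pd.to_datetime(df["date4"] + " " + df["time5"], format=fmt)

print(df.dtypes)
```

If the rows are headed for a database, converting before the insert means the values can be stored as real datetimes rather than strings, assuming the driver maps datetime objects to a native date type.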
As the title suggests, I am trying to turn this
into this
The end-goal is ideally so that I'd be able to group-by them up and get a word count.
This is the code that I've tried so far. I am unsure.
df_unpivoted = final_df.melt(df.reset_index(), id_vars= 'index', var_name = 'Count', value_name = 'Value')
df = final_df.rename(columns=lambda x: x + x[-1] if x.startswith('index') else x)
df = pd.wide_to_long(df, ['0'], i='id', j='i')
df = df.reset_index(level=1, drop=True).reset_index()
Here's my code for reading in this dataframe:
html = 'https://www.agroindustria.gob.ar/sitio/areas/ss_mercados_agropecuarios/logistica/_archivos/000023_Posici%C3%B3n%20de%20Camiones%20y%20Vagones/000010_Entrada%20de%20camiones%20y%20vagones%20a%20puertos%20semanal%20y%20mensual.php'
url = urlopen(html)
df = pd.read_html(html, encoding = 'utf-8')
remove = []
for x in range(len(df)):
    if len(df[x]) < 10:
        remove.append(x)
for x in remove[::-1]:
    df.pop(x)
df = df[0]
The dataframe uses both ',' and '.' as thousands separators, and I want neither. So 5.103 should be 5103.
Using this code:
df = df.apply(lambda x: x.str.replace('.', ''))
df = df.apply(lambda x: x.str.replace(',', ''))
All of the data will get changed, but the values in the last four columns will all turn to NaN. I'm assuming this has something to do with trying to use str.replace on a float?
Trying any sort of df[column] = df[column].astype(str) also gives back errors, as does something convoluted like the following:
for x in df.columns.tolist():
    for k, v in df[x].iteritems():
        if pd.isnull(v) == False and type(v) = float:
            df.loc(k, df[x]) == str(v)
What is the right way to approach this problem?
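You can try this regex approach. I haven't tested it against the page, but it removes a '.' or ',' that sits between two digits. It has to run on each cell, because calling str() on a whole column converts the column's printed representation rather than its values.
import re
df = df.apply(lambda col: col.map(lambda v: v if pd.isnull(v) else re.sub(r'(?<=\d)[.,](?=\d)', '', str(v))))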
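The NaN values in the question are consistent with .str.replace being applied to cells that are not strings, since the .str accessor returns NaN for those. A minimal sketch on a made-up frame that cleans only the string cells and then converts every column to numbers; the strip_separators helper and the sample values are hypothetical. Cells that pandas already parsed as numbers are left alone here, which is an assumption.

```python
import re
import pandas as pd

# Hypothetical frame: one column of strings, one column already parsed as numbers.
df = pd.DataFrame({"a": ["5.103", "1,234,567", None], "b": [12.0, 7.0, 3.0]})

def strip_separators(value):
    """Remove '.' and ',' that sit between digits; leave non-strings untouched."""
    if isinstance(value, str):
        return re.sub(r"(?<=\d)[.,](?=\d)", "", value)
    return value

# Clean cell by cell, then convert each column to a numeric dtype.
cleaned = df.apply(lambda col: col.map(strip_separators))
cleaned = cleaned.apply(pd.to_numeric, errors="coerce")
print(cleaned)
```

pd.read_html also accepts thousands and decimal arguments, so passing thousands='.' when reading the page may avoid the cleanup altogether; that has not been checked against this particular page.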