I have a pandas DataFrame that I would like to save in a tab-separated file format, with a pound (#) symbol at the beginning of the header.
Here is my demo code:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=['a', 'b', 'c'])
file_name = 'test.tsv'
df.to_csv(file_name, sep='\t', index=False)
The above code creates a DataFrame and saves it in tab-separated-value format, which looks like:
a b c
1 2 3
4 5 6
7 8 9
But how can I add the pound symbol to the header while saving the DataFrame? I want the output to be like the below:
#a b c
1 2 3
4 5 6
7 8 9
I hope the question is clear; thanks in advance for the help.
Note: I would like to keep the DataFrame header definition the same.
Using your code, just modify the a column to be #a, like below:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=['#a', 'b', 'c'])
file_name = 'test.tsv'
df.to_csv(file_name, sep='\t', index=False)
Edit
If you don't want to adjust the starting DataFrame, use .rename before writing to CSV:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=['a', 'b', 'c'])
file_name = 'test.tsv'
df.rename(columns={
'a' : '#a'
}).to_csv(file_name, sep='\t', index=False)
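As a side note, if you later read such a file back (a sketch, assuming the same file_name as above), the leading # stays attached to the first column name and can be stripped:
x = pd.read_csv(file_name, sep='\t')
x.columns = x.columns.str.lstrip('#')  # columns are 'a', 'b', 'c' again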
Use the header argument to create aliases for the columns.
df.to_csv(file_name, sep='\t', index=False,
header=[f'#{x}' if x == df.columns[0] else x for x in df.columns])
#a b c
1 2 3
4 5 6
7 8 9
Here's another way to get your column aliases:
from itertools import zip_longest
header = [''.join(x) for x in zip_longest('#', df.columns, fillvalue='')]
#['#a', 'b', 'c']
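For completeness, a short sketch passing that header list to to_csv (reusing file_name and df from above):
df.to_csv(file_name, sep='\t', index=False, header=header)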
I have the following Pandas dataframe in Python:
import pandas as pd
d = {'col1': [1, 2, 3, 4, 5], 'col2': [6, 7, 8, 9, 10]}
df = pd.DataFrame(data=d)
df.index=['A', 'B', 'C', 'D', 'E']
df
which gives the following output:
col1 col2
A 1 6
B 2 7
C 3 8
D 4 9
E 5 10
I need to write a function (say getNrRows(fromIndex)) that takes an index value as input and returns the number of rows between that given index and the last index of the DataFrame.
For instance:
nrRows = getNrRows("C")
print(nrRows)
> 2
Because it takes 2 steps (rows) from the index C to the index E.
How can I write such a function in the most elegant way?
The simplest way might be:
len(df[row_index:]) - 1
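Wrapped into the requested function, a minimal sketch using the question's DataFrame and .loc for explicit label slicing (label slicing includes the endpoint, hence the - 1):
import pandas as pd

d = {'col1': [1, 2, 3, 4, 5], 'col2': [6, 7, 8, 9, 10]}
df = pd.DataFrame(data=d, index=['A', 'B', 'C', 'D', 'E'])

def getNrRows(fromIndex):
    # rows from fromIndex (exclusive) down to the last label
    return len(df.loc[fromIndex:]) - 1

print(getNrRows('C'))  # 2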
For your information, pandas has the built-in index method get_indexer_for:
len(df)-df.index.get_indexer_for(['C'])-1
Out[179]: array([2], dtype=int64)
Python Script
#!/bin/python3
import pandas as pd
import numpy as np

class test(object):
    def checker(self):
        df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                           columns=['a', 'b', 'c'])
        return df2

if __name__ == "__main__":
    q = test()
    q.checker()
I want that df2 object, the DataFrame.
R code
x <- py_run_file("new1.py")
The output ends up being a dictionary with 28 items.
What is the correct way to grab that object in R using Reticulate?
py_run_file returns the Python main module as an environment (hence the dictionary of 28 items); assign the DataFrame to a top-level variable so you can pull that one object out:
import pandas as pd
import numpy as np
class test(object):
    def checker(self):
        df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                           columns=['a', 'b', 'c'])
        return df2

if __name__ == "__main__":
    q = test()
    x = q.checker()
In R:
library(reticulate)
x <- py_run_file("test.py")$x
x
a b c
1 1 2 3
2 4 5 6
3 7 8 9
I have a column within a dataset, containing categorical company sizes, where '-' hyphens currently represent missing data.
I want to replace the '-' missing values with nulls so I can analyse the missing data. However, when I use pandas' replace (see the following code) with a None value, it also mangles the genuine entries, because they contain hyphens too (e.g. 51-200).
df['Company Size'].replace({'-': None},inplace =True, regex= True)
How can I replace only lone standing hyphens and leave the other entries untouched?
You don't need regex=True. Without it, replace matches whole cell values exactly, so only lone hyphens are replaced:
df['Company Size'].replace({'-': None},inplace =True)
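A quick check that exact matching leaves ranges such as 51-200 intact (a sketch with made-up sizes):
import pandas as pd

df = pd.DataFrame({'Company Size': ['51-200', '-', '11-50']})
df['Company Size'].replace({'-': None}, inplace=True)
print(df)
#   Company Size
# 0       51-200
# 1         None
# 2        11-50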
You could also just do the following, using np.nan rather than the string 'None' so the result is a real null:
import numpy as np
df['column_name'] = df['column_name'].replace('-', np.nan)
import numpy as np
df.replace('-', np.nan, inplace=True)
This code worked for me.
You can do it like this:
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': [0, 1, 2, 3, 4],
                   'B': [5, 6, 7, 8, 9],
                   'C': ['a', '-', 'c--', 'd', 'e']})
df['C'] = df['C'].replace('-', np.nan)
df = df.where(pd.notnull(df), None)
# per column: df['C'] = df['C'].where(pd.notnull(df['C']), None)
print(df)
output:
A B C
0 0 5 a
1 1 6 None
2 2 7 c--
3 3 8 d
4 4 9 e
Another example:
df = pd.DataFrame({'A': [0, 1, 2, 3, 4],
                   'B': ['5-5', '-', 7, 8, 9],
                   'C': ['a', 'b', 'c--', 'd', 'e']})
df['B'] = df['B'].replace('-', np.nan)
df = df.where(pd.notnull(df), None)
print(df)
output:
A B C
0 0 5-5 a
1 1 None b
2 2 7 c--
3 3 8 d
4 4 9 e
I've got the following problem: if I select some indices of my pandas DataFrame:
df = pd.DataFrame(data=CoordArray[0:,1:],index=CoordArray[:,0],columns=["x","y","z"])
like this:
print(df.loc[['1234567','7654321'],:])
it works pretty well. But if I have those labels in a numpy array, convert the array to a list, and do this:
mynewlist = list(SomeNumpyArray)
print(df.loc[mynewlist])
I get the following error:
"None of [[1234567, 7654321]] are in the [index]"
I really don't know what's going wrong.
I haven't been able to replicate your issue. As @Wen commented, your list and numpy array may not have the same type as your index; note that the unquoted numbers in your error message are integers, while your working example used strings.
Here is an example demonstrating that lists or numpy arrays are acceptable as indexers:
import pandas as pd, numpy as np
df = pd.DataFrame(data=[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]],
                  index=['1000', '2000', '3000', '4000'],
                  columns=['x', 'y', 'z'])
idx = np.array(['2000', '3000'])
df.loc[idx]
# x y z
# 2000 4 5 6
# 3000 7 8 9
idx_lst = list(idx)
df.loc[idx_lst]
# x y z
# 2000 4 5 6
# 3000 7 8 9
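If the error does appear, the usual culprit is exactly that label-type mismatch: integer values in the list against a string index. A minimal sketch reproducing and fixing it (names are illustrative):
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]],
                  index=['1234567', '7654321'],
                  columns=['x', 'y', 'z'])

arr = np.array([1234567, 7654321])  # integer labels, as in the error message
# df.loc[list(arr)] raises KeyError: no integers exist in the string index
fixed = [str(v) for v in arr]       # cast the labels to match the index dtype
print(df.loc[fixed])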
I would like to repeat these examples (example_1, example_2) with my dataset.
import pandas_ml as pdml
df = pdml.ModelFrame({'A': [1, 2, 3], 'B': [2, 3, 4],
                      'C': [3, 4, 5]}, index=['a', 'b', 'c'])
df
A B C
a 1 2 3
b 2 3 4
c 3 4 5
But the issue is that my dataset is in a CSV file.
x_test = pd.read_csv("x_test.csv",sep=';',header=None)
I've tried converting the pandas DataFrame to a dict, but it didn't work.
So, the question is: is there a way to convert a pandas DataFrame into a pandas-ml ModelFrame?
I think you need DataFrame.to_dict with the orient parameter:
x_test = pd.read_csv("x_test.csv",sep=';',header=None)
df = pdml.ModelFrame(x_test.to_dict(orient='list'))
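A possibly simpler route, under the assumption that ModelFrame accepts a DataFrame directly (it subclasses pandas.DataFrame, but treat this as unverified):
import pandas as pd
import pandas_ml as pdml

x_test = pd.read_csv("x_test.csv", sep=';', header=None)
df = pdml.ModelFrame(x_test)  # assumption: the constructor takes a DataFrame as-is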