Applying a conditional to data in a pandas DataFrame while ignoring headers - Python

I have data in this format in a CSV file. I want to produce an Excel file where all values greater than 0 are replaced with 1. I have tried the code below, but either I lose the headers (years, e.g. 1960/1961) or I get an error when I ignore them.
Here is my code so far:
import pandas as pd
data = pd.read_csv("first.csv")
data1 = data.apply(pd.to_numeric, errors='coerce')
data1 = (data1 > 0).astype(int)
data2 = data1.combine_first(data)
print(data2)
I want the output to be like
Here is the URL to csv file, you can download to run the given code.
https://gofile.io/?c=eWd049

numpy has a .ceil function that rounds up and a .floor function that rounds down:
numpy.ceil()
numpy.floor()
So it should be something like this (once you fix the year/year column titles):
import numpy as np
for column in data.columns:
    data[column] = data[column].apply(lambda x: np.ceil(x) if x < 1 else np.floor(x))
For the column title issues: specify the dtype and check the separator.
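As an alternative that keeps the headers intact, here is a minimal sketch of the coerce-and-mask idea from the question, using made-up data in place of first.csv (a label column plus year columns like "1960/1961"):

```python
import pandas as pd

# Made-up stand-in for first.csv: one text column, two year columns.
data = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "1960/1961": [0.0, 12.5, 0.0],
    "1961/1962": [3.0, 0.0, 7.1],
})

# Convert what can be converted; non-numeric cells become NaN.
numeric = data.apply(pd.to_numeric, errors="coerce")

# Values greater than 0 become 1, everything else 0 ...
flags = (numeric > 0).astype(int)

# ... then put the original text back into cells that were never numeric,
# so the label column and the year headers survive untouched.
result = flags.where(numeric.notna(), data)

print(result)
```

From there, `result.to_excel("first.xlsx", index=False)` would write the Excel file (it needs an engine such as openpyxl installed).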

Related

Mix of columns in Excel to one column with Pandas

I have to import this Excel file in code, and I would like to merge the multi-level header into a single row of column names. I would also like to delete the unnamed columns and unify everything into one. I don't know if this is possible.
I have tried the following, and it imports, but the output is not as expected. I add the code here too:
import pandas as pd
import numpy as np
macro = pd.read_excel(nameExcel, sheet_name=nameSheet, skiprows=3, header=[1,3,4])
macro = macro[macro.columns[1:]]
macro
The way to solve it is to assign a new header of the same length as the previous one:
cols = [...]
if len(df1.columns) == len(cols):
    df1.columns = cols
else:
    print("error")
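Instead of typing the replacement header out by hand, the multi-level header can also be flattened programmatically. A sketch, using a made-up two-column MultiIndex of the shape that header=[1, 3, 4] produces (the "Unnamed: ..." labels are pandas' placeholders for blank header cells):

```python
import pandas as pd

# Made-up frame with a three-level header, including blank ("Unnamed") cells.
df1 = pd.DataFrame(
    [[1, 2], [3, 4]],
    columns=pd.MultiIndex.from_tuples([
        ("GDP", "Unnamed: 0_level_1", "2020"),
        ("GDP", "Unnamed: 1_level_1", "2021"),
    ]),
)

# Join the levels of each column tuple into one string, skipping placeholders.
df1.columns = [
    " ".join(level for level in col if not str(level).startswith("Unnamed"))
    for col in df1.columns
]
print(df1.columns.tolist())
```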

pandas imports float values and returns object values - how to prevent this?

I would like to keep the data types from Excel when I import into pandas, but in the case shown in the screenshot, the column "area venda m2" in my CSV is numeric, yet when I import it into pandas it becomes an object.
Code for the import:
df = pd.read_csv('C:\\Users\\LucasSilvaDantasAbra\\Desktop\\teste\\base_nova_040420224.csv', delimiter = ';')
df.columns
Code to show all the columns:
pd.set_option('display.max_rows', None)
Data frame
dfbase = df
Showing types
dfbase.dtypes
The pandas DataFrame is confused by the commas "," used as decimal marks. You need to replace them with dots "." or tell read_csv about them.
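read_csv has a decimal parameter for exactly this. A minimal sketch with an in-memory CSV standing in for the file above (semicolon-separated, comma as decimal mark):

```python
import io
import pandas as pd

# Stand-in for the real CSV: ";" as field separator, "," as decimal mark.
csv = "area venda m2;loja\n12,5;A\n7,25;B\n"

# Telling read_csv about the decimal separator keeps the column numeric
# instead of falling back to object dtype.
df = pd.read_csv(io.StringIO(csv), delimiter=";", decimal=",")
print(df.dtypes)
```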

How to implement a condition in a pandas DataFrame to save only part of it

I would like to create a counter for a pandas DataFrame so that I save rows only up to a certain value in a specific column,
e.g. save only until df['cycle'] == 2.
From the answers below I gathered that df[df['cycle'] <= 2] solves my problem.
Edit: If I am correct, pandas always reads the whole file; with nrows you can say, e.g., read up to index x. But what if I don't want to use an index but a specific value from a column? How can I do that?
See my code below:
import pandas as pd
import numpy as np
l = list(np.linspace(0, 10, 12))
data = [
    ('time', l),
    ('A', [0, 5, 0.6, -4.8, -0.3, 4.9, 0.2, -4.7, 0.5, 5, 0.1, -4.6]),
    ('B', [0, 300, 20, -280, -25, 290, 30, -270, 40, 300, -10, -260]),
]
df = pd.DataFrame.from_dict(dict(data))
df['cycle'] = [df.index.get_loc(i) // 4 + 1 for i in df.index]
df[df['cycle'] <= 2]
df.to_csv(path_or_buf='test.out', index=True, sep='\t', columns=['time', 'A', 'B', 'cycle'], decimal='.')
So I modified the code according to the suggestions from the users.
I am glad for any help that I can get.
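One detail worth noting about the code above: df[df['cycle'] <= 2] returns a new frame and does not modify df, so the to_csv call on the last line still writes all rows. A sketch of saving only the filtered subset, with the same data:

```python
import numpy as np
import pandas as pd

l = list(np.linspace(0, 10, 12))
df = pd.DataFrame({
    "time": l,
    "A": [0, 5, 0.6, -4.8, -0.3, 4.9, 0.2, -4.7, 0.5, 5, 0.1, -4.6],
    "B": [0, 300, 20, -280, -25, 290, 30, -270, 40, 300, -10, -260],
})
df["cycle"] = df.index // 4 + 1

# Keep the filtered frame in its own variable and save that one,
# not the original df.
subset = df[df["cycle"] <= 2]
subset.to_csv("test.out", index=True, sep="\t", decimal=".")
print(len(subset))
```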

How to pass pandas dataframes between functions

I want to read in data from a csv file into a pandas dataframe. Then I want to do several operations on this dataframe. I want to do this in different functions (ideally in a separate file).
import pandas as pd
def read_text(file):
    df = pd.read_csv(file, skipinitialspace=True, sep=";", encoding="ISO-8859-1")
    return [df]
file = "/path/file.txt"
content = pd.DataFrame()
content = read_text(file)
Now, reading the file works fine. But "content" does not seem to be a DataFrame anymore; at least, something like print(content.value) does not seem to be an option.
What I later want to do is:
send dataframe to a function to remove duplicates and return dataframe
Take this new dataframe and remove certain entries and again return this dataframe
Do some more things with the dataframe
Ideally, these functions will be in a separate file, but I will take care of that later. For now, it would be of great help if I could pass these dataframes back and forth.
You're returning [df], which is a list containing a single DataFrame. You should modify your code as follows:
import pandas as pd
def read_text(file):
    df = pd.read_csv(file, skipinitialspace=True, sep=";", encoding="ISO-8859-1")
    return df
file = "/path/file.txt"
content = read_text(file)
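Once read_text returns the DataFrame itself, the pipeline described in the question chains naturally: each function takes a DataFrame and returns one. A sketch with an in-memory file and hypothetical cleanup steps (drop_small and its "value" column are made up for illustration):

```python
import io
import pandas as pd

def read_text(buffer):
    # Same idea as the question's reader; encoding omitted for the sketch.
    return pd.read_csv(buffer, skipinitialspace=True, sep=";")

def drop_duplicates(df):
    return df.drop_duplicates()

def drop_small(df, threshold):
    # Hypothetical second step: keep rows whose "value" exceeds threshold.
    return df[df["value"] > threshold]

data = "name;value\na;1\na;1\nb;5\nc;3\n"
content = read_text(io.StringIO(data))
content = drop_duplicates(content)
content = drop_small(content, 2)
print(content)
```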

converting column names to integer with read_csv

I have constructed a matrix with integer values for the columns and index. The matrix is actually hierarchical, with one level per month. My problem is that indexing and selecting data no longer works as before once I write the data to CSV and load it back as a pandas DataFrame.
Selecting data before writing and reading data to file:
matrix.ix[1][4][3] would, for example, give 123.
In words: select month January and get me the (travel) flow from origin 4 to destination 3.
After writing the data to CSV and reading it back into pandas, the original referencing fails, but it works if I convert the column index to a string:
matrix.ix[1]['4'][3]
...the column names have automatically been transformed from integers into strings. But I would prefer the original indexing.
Any suggestions?
My current quick fix for handling the data after loading from csv is:
#Writing df to file
mulitindex_df_Travel_monthly.to_csv(r'result/Final_monthly_FlightData_countrylevel_v4.csv')
#Loading df from csv
test_matrix = pd.read_csv(filepath_inputdata + '/Final_monthly_FlightData_countrylevel_v4.csv',
                          index_col=[0, 1])
test_matrix.rename(columns=int, inplace=True)  # Thx, @ayhan
CSV FILE:
https://www.dropbox.com/s/4u2opzh65zwcn81/travel_matrix_SO.csv?dl=0
I used something like this:
df = df.rename(columns={str(c): c for c in columns})
where df is the pandas DataFrame and columns is the list of column labels to restore.
You could also do
df.columns = df.columns.astype(int)
or
df.columns = df.columns.map(int)
Related: what is the difference between .map(str) and .astype(str) in a DataFrame?
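The round trip is easy to see with a small in-memory example: CSV headers always come back as strings, and astype(int) restores the integer labels:

```python
import io
import pandas as pd

# A frame with integer column labels, written to CSV and read back.
df = pd.DataFrame([[10, 20], [30, 40]], columns=[3, 4])

buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)

loaded = pd.read_csv(buffer)
print(loaded.columns.tolist())   # labels are now the strings '3' and '4'

# Convert the labels back to integers so the original indexing works again.
loaded.columns = loaded.columns.astype(int)
print(loaded.columns.tolist())
```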
