Subtract a constant from a column in a pandas dataframe - python

I have a dataframe as follows:
year,value
1970,2.0729729191557147
1971,1.0184197388632872
1972,2.574009084167593
1973,1.4986879160266255
1974,3.0246498975934464
1975,1.7876222478238608
1976,2.5631745148930913
1977,2.444014336917563
1978,2.619502688172043
1979,2.268273809523809
1980,2.6086169818316645
1981,0.8452720174091145
1982,1.3158922171018947
1983,-0.12695212493599603
1984,1.4374230626622169
1985,2.389290834613415
1986,2.3489311315924217
1987,2.6002265745007676
1988,1.2623717711036955
1989,1.1793426779313878
I would like to subtract a constant from each of the values in the second column. This is the code I have tried:
df = pd.read_csv(f1, sep=",", header=0)
df2 = df["value"].subtract(1)
However when I do this, df2 becomes this:
70 1.072973
71 0.018420
72 1.574009
73 0.498688
74 2.024650
75 0.787622
76 1.563175
77 1.444014
78 1.619503
79 1.268274
80 1.608617
81 -0.154728
82 0.315892
83 -1.126952
84 0.437423
85 1.389291
86 1.348931
87 1.600227
88 0.262372
89 0.179343
The year becomes only the last two digits. How can I retain all of the digits of the year?

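The year column is not modified by this operation; df["value"].subtract(1) returns a new Series holding only the subtracted values. Assign the result back to the column instead:
df["value"] = df["value"].subtract(1)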
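A minimal self-contained sketch, assuming the CSV layout shown in the question (a year column and a value column). io.StringIO stands in for the real file f1, and only the first three rows are used. Assigning the result back updates value and leaves year untouched:

```python
import io

import pandas as pd

# Stand-in for the real CSV file (first three rows from the question)
csv_text = """year,value
1970,2.0729729191557147
1971,1.0184197388632872
1972,2.574009084167593
"""

df = pd.read_csv(io.StringIO(csv_text), sep=",", header=0)

# subtract() returns a new Series; assign it back to the same column
df["value"] = df["value"].subtract(1)
# Equivalent shorthand: df["value"] -= 1

print(df)
```

Printing df (rather than the Series returned by subtract) shows both columns, with the years as full four-digit integers.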

Related

How to add list data as the first column of a CSV file with 256 columns via Python?

I have a CSV file with 255 columns and 16,000 rows of data, and I want to add a list of 16,000 items as the first column of the CSV file.
The code I tried to use is
import os
import pandas as pd

# Append the name of the file to List
path = 'C:/Users/User/Desktop/Guanlin_CNN1D/CNN1D/0.3 15 and 105 circle cropped'
list = os.listdir(path)
List = []
for a in list:
    List.append(str(a))
## Load the to-be-added CSV file
data = pd.read_csv('C:/Users/User/Desktop/Guanlin_CNN1D/CNN1D/0.3 15 and 105 for toolpath recreatation.csv',sep=',', engine='python' ,header=None)
tempdata = pd.DataFrame(data)
features = tempdata.values[:, 1:]
file_num = tempdata.values[:, 0]
# add the List to first columns of CSV file
Temp = {List,file_num,features}
temp = pd.DataFrame(Temp)
temp
The result shows
TypeError: unhashable type: 'list'
How to rewrite the code?
Thanks in advance!
I think you simply need to use the DataFrame insert method. The TypeError comes from {List,file_num,features}: curly braces around bare items create a set, and a list cannot be put in a set because lists are unhashable. It looks like you were trying to build a new dataframe from a dict, but creating a new dataframe is not necessary here; the example below inserts a new column at position 0 of an existing dataframe. This link has some easy examples of ways to populate a dataframe from lists and dicts. The number of rows and columns should not be a concern in this case.
import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame(np.random.randint(0, 100, size=(5, 5)), columns=list('ABCDE'))
print(df)
df.insert(0,column='newcol', value=np.random.randint(0, 100, size=(5)))
print()
print(df)
df.to_csv( r'data.csv', index=False, header=True)
will produce this output
A B C D E
0 44 47 64 67 67
1 9 83 21 36 87
2 70 88 88 12 58
3 65 39 87 46 88
4 81 37 25 77 72
newcol A B C D E
0 9 44 47 64 67 67
1 20 9 83 21 36 87
2 80 70 88 88 12 58
3 69 65 39 87 46 88
4 79 81 37 25 77 72
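Applied to the situation in the question, a sketch might look like the following. It assumes the number of file names matches the number of rows in the CSV and that the file has no header row, as header=None suggests; the small inline CSV, the file names and the column label file_name are hypothetical stand-ins.

```python
import io

import pandas as pd

# Hypothetical stand-in for the headerless CSV (3 rows instead of 16,000)
csv_text = "1,0.5,0.7\n2,0.1,0.9\n3,0.4,0.2\n"
data = pd.read_csv(io.StringIO(csv_text), sep=",", header=None)

# Hypothetical file names; in the question they come from os.listdir(path)
file_names = ["a.png", "b.png", "c.png"]

# insert() needs exactly one value per row, so check the lengths first
if len(file_names) != len(data):
    raise ValueError("number of file names must match number of rows")

# Put the file names in front of all existing columns
data.insert(0, column="file_name", value=file_names)

# Write the result without the index; header=False keeps the file headerless
out = io.StringIO()
data.to_csv(out, index=False, header=False)
print(out.getvalue())
```

Note that os.listdir returns names in arbitrary order, so sorting them (for example with sorted()) may be needed for the names to line up with the intended rows.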

Extract information from an Excel file (by updating arrays) with Excel / Python

I have an Excel file with thousands of columns in the following format:
Member No.     X     Y     Z
1000          25    60   -30
             -69    38    68
              45     2    43
1001          24    55    79
               4    -7    89
              78    51    -2
1002          45   -55   149
              94    77  -985
              -2   559    56
I need a way to get a new table with the absolute maximum value of each column for each member. In this example, something like:
Member No.     X     Y     Z
1000          69    60    68
1001          78    55    89
1002          94   559   985
I have tried it in Excel (using VLOOKUP to find the "Member No." in the first row and then HLOOKUP to find the values in the rows after it), but the HLOOKUP range is not updated automatically to the next block (the one containing member 1001). So my solution works for member 1000 but not for 1001 and 1002, because it always searches ONLY in the first block (i.e. the rows for member number 1000).
I also tried reading the file with Python, but I am not well-versed enough to make much headway: once the dataset has been read, how do I get it to read the next 3 rows and find the (absolute) maximum in each column?
Can someone please help? Solution required in Python 3 or Excel (ideally, Excel 2014).
The below solution will get you your desired output using Python.
I first use ffill to fill in the blanks in your Member No. column with the member number above them. Then I convert the remaining columns to positive values using abs. Lastly, I group by member number and take the max of each column.
Assuming your dataframe is called data:
import pandas as pd
# Fill each blank Member No. with the member number above it
data['Member No.'] = data['Member No.'].ffill().astype(int)
# Make the measurement columns positive
value_cols = data.columns.drop('Member No.')
data[value_cols] = data[value_cols].abs()
# Largest absolute value in each column, per member
res = data.groupby('Member No.').max().reset_index()
Printing res gives:
Member No. X Y Z A B C
0 1000 69 60 68 60 74 69
1 1001 78 55 89 78 92 87
2 1002 94 559 985 985 971 976
Note that I added extra columns in your sample data to make sure that all the columns will return their max() value.
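A self-contained version using the sample data from the question might look like this. It is a sketch: the frame is built inline rather than read from Excel (for a real workbook, pd.read_excel would typically be used, which needs an engine such as openpyxl installed), and blank Member No. cells are assumed to come through as missing values.

```python
import pandas as pd

# Sample data from the question; None marks the blank "Member No." cells
data = pd.DataFrame({
    "Member No.": [1000, None, None, 1001, None, None, 1002, None, None],
    "X": [25, -69, 45, 24, 4, 78, 45, 94, -2],
    "Y": [60, 38, 2, 55, -7, 51, -55, 77, 559],
    "Z": [-30, 68, 43, 79, 89, -2, 149, -985, 56],
})

# Fill each blank member number with the last one above it
data["Member No."] = data["Member No."].ffill().astype(int)

# Absolute values of the measurement columns only
value_cols = ["X", "Y", "Z"]
data[value_cols] = data[value_cols].abs()

# One row per member with the largest absolute value in each column
res = data.groupby("Member No.")[value_cols].max().reset_index()
print(res)
```

The result matches the expected table in the question: 69, 60, 68 for member 1000; 78, 55, 89 for 1001; and 94, 559, 985 for 1002.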

How to merge data frames in pandas that have some columns in common and some not without losing any data

For example, if I had
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0,100,size=(10, 4)), columns=list('ABCD'))
df1 = pd.DataFrame(np.random.randint(0,100,size=(8, 3)), columns=list('BCD'))
display(df,df1)  # display() is available in Jupyter/IPython
A B C D
0 63 16 89 55
1 17 29 81 17
2 88 82 9 64
B C D
0 21 38 36
1 54 88 80
2 44 53 53
3 24 58 29
I would like to get:
     A   B   C   D
0   63  16  89  55
1   17  29  81  17
2   88  82   9  64
3  NaN  21  38  36
4  NaN  54  88  80
5  NaN  44  53  53
6  NaN  24  58  29
Is this possible? I have about 25 data frames, each organized by ascending dates (the columns) and containing data for different airports (the index): how many times a plane has ascended at each airport on each day. To reiterate, airport names are the rows and dates are the columns. The problem is that every data frame, covering 7 days, has a different number of airports, because some airports are inactive in some weeks and active in others. That makes it hard to merge them all into one dataframe: they have many airports in common, but not necessarily in the same position (row number). Is there any way to merge them so that NaN appears on the dates when an airport is inactive, and so that airport names are never duplicated across rows? Sorry it is so hard to explain, thank you!!
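One way to approach this, offered as a sketch, is pd.concat, which aligns frames on their labels and fills missing cells with NaN. The first part reproduces the small A-D example with the values typed in rather than random. The second part uses hypothetical airport codes and dates, and assumes each weekly frame has unique airport names as its index and that no date column appears in more than one frame.

```python
import numpy as np
import pandas as pd

# Toy version of the example: df has columns A-D, df1 only B-D
df = pd.DataFrame({"A": [63, 17, 88], "B": [16, 29, 82],
                   "C": [89, 81, 9], "D": [55, 17, 64]})
df1 = pd.DataFrame({"B": [21, 54, 44, 24], "C": [38, 88, 53, 58],
                    "D": [36, 80, 53, 29]})

# Stack rows; columns are unioned and missing cells become NaN
stacked = pd.concat([df, df1], ignore_index=True)
print(stacked)

# Airport case (hypothetical data): airports as index, dates as columns
week1 = pd.DataFrame({"2020-01-01": [3, 5], "2020-01-02": [4, 1]},
                     index=["JFK", "LAX"])
week2 = pd.DataFrame({"2020-01-08": [2, 7], "2020-01-09": [6, 0]},
                     index=["LAX", "ORD"])

# axis=1 places the weeks side by side and aligns rows by airport name
# (outer join on the index by default), so each airport appears once
# and the weeks in which it was absent are NaN
combined = pd.concat([week1, week2], axis=1)
print(combined)
```

With 25 weekly frames, the same call takes the whole list, e.g. pd.concat(list_of_frames, axis=1), where list_of_frames is a hypothetical name for that list. Columns that receive NaN are converted to float, since NaN has no integer representation.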

How to add another name to an existing index with scores?

I created a dataframe as shown below. I have to add another name to the index and update the scores; how do I append it to the existing data? I have to add 'Pandey' to the index with Test1 = 56 and Test2 = 76.
import pandas as pd

test_score = pd.DataFrame(
    {'Test1': [82, 75, 83, 92, 85],
     'Test2': [85, 81, 75, 85, 91]},
    index=['Sachin', 'Dravid', 'virat', 'Rohith', 'Dhawan'])
My result should be
Test1 Test2
Sachin 82 85
Dravid 75 81
virat 83 75
Rohith 92 85
Dhawan 85 91
Pandey 56 76
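# DataFrame.append was removed in pandas 2.0, so build a one-row frame and concat it
row = pd.Series({'Test1': 56, 'Test2': 76}, name='Pandey')
test_score = pd.concat([test_score, row.to_frame().T])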
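A shorter alternative, assuming the new row's values are given in column order, is setting with enlargement through .loc, which adds the row in place without building a separate Series:

```python
import pandas as pd

test_score = pd.DataFrame(
    {"Test1": [82, 75, 83, 92, 85],
     "Test2": [85, 81, 75, 85, 91]},
    index=["Sachin", "Dravid", "virat", "Rohith", "Dhawan"])

# Assigning to an index label that does not exist yet appends a new row
test_score.loc["Pandey"] = [56, 76]
print(test_score)
```

This is convenient for adding a single row; for many rows, collecting them first and calling pd.concat once is usually faster than enlarging one row at a time.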

How to read one column of a CSV file row by row using Python

I have a dataset with three inputs: x1, x2 and x3. I want to read just the x2 column, stepping through its data one row at a time.
I wrote the code below, but it only shows letters.
Here is my code:
data = pd.read_csv('data6.csv')
row_num =0
x=[]
for col in data:
if (row_num==1):
x.append(col[0])
row_num =+ 1
print(x)
Result: x1,x2,x3
Expected output (the x2 values, read one row at a time):
65
32
14
25
85
47
63
21
98
65
21
47
48
49
46
43
48
25
28
29
37
A subset of my CSV file:
x1 x2 x3
6 65 78
5 32 59
5 14 547
6 25 69
7 85 57
8 47 51
9 63 26
3 21 38
2 98 24
7 65 96
1 21 85
5 47 94
9 48 15
4 49 27
3 46 96
6 43 32
5 48 10
8 25 75
5 28 20
2 29 30
7 37 96
Can anyone help me fix this?
If you want a list of the values in x2, use:
x = data['x2'].tolist()
I am not sure I even get what you're trying to do from your code.
What you're doing (after fixing the indentation to make it somewhat correct):
Iterate through all columns of your dataframe
Take the first character of the column name if row_num is equal to 1.
Here is your code with my guess at the intended indentation:
import pandas as pd
data = pd.read_csv("data6.csv")
row_num = 0
x = []
for col in data:
    if row_num == 1:
        x.append(col[0])
    row_num = +1
print(x)
What you probably want to do:
import pandas as pd
data = pd.read_csv("data6.csv")
# Make a list containing the values in column 'x2'
x = list(data['x2'])
# Print all values at once:
print(x)
# Print one value per line:
for val in x:
    print(val)
Since you are using pandas, you can get the values of any specific column by converting it directly to a list with list(); no for loop is needed.
import pandas as pd
data = pd.read_csv('data6.csv')
print(list(data['x2']))
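If the goal is literally to process the x2 values one row at a time, one option is to read only that column with the usecols parameter and iterate over it. Another, shown for comparison, is the standard-library csv module, which reads the file one line at a time instead of loading it all at once. Both are sketches: the inline text stands in for data6.csv, and the real file is assumed to be comma-separated (if it is whitespace-separated, as the subset above is displayed, read_csv would need sep=r"\s+").

```python
import csv
import io

import pandas as pd

# Stand-in for data6.csv (first rows of the subset in the question)
csv_text = "x1,x2,x3\n6,65,78\n5,32,59\n5,14,547\n"

# Option 1: pandas, loading only the x2 column and iterating over it
data = pd.read_csv(io.StringIO(csv_text), usecols=["x2"])
pandas_values = []
for val in data["x2"]:
    pandas_values.append(int(val))
    print(val)

# Option 2: standard-library csv module, one row at a time
csv_values = []
reader = csv.DictReader(io.StringIO(csv_text))
for row in reader:
    # Each row is a dict keyed by the header names; values are strings
    csv_values.append(int(row["x2"]))
print(csv_values)
```

The csv version never holds the whole file in memory, which can matter for very large files; for a file of this size the pandas version is simpler.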
