I am filtering the records down to last month's data. However, when I do
emp_df = emp_df[emp_df['Date'].dt.month == (currentMonth-1)]
it drops some records (for some records the month is treated as the day).
from datetime import datetime, date
import pandas as pd
import numpy as np
cholareport = pd.read_excel("D:/Automations/HealthCheck and Audit Trail/report.xlsx")
uniqueemp = set(cholareport['Email'])
cholareport['Date'] = pd.to_datetime(cholareport['Date'])
uniqueemp = set(cholareport['Email'])
daystoignore = ['Holiday_COE', 'Leave_COE']
# datedfforemp = pd.DataFrame(columns=uniqueemp)
cholareport['Date'] = cholareport['Date'].apply(lambda x:
    pd.to_datetime(x).strftime('%d/%m/%Y'))
cholareport["Date"] = pd.to_datetime(cholareport["Date"], utc=True)
for emp in uniqueemp:
    emp_df = cholareport[cholareport['Email'].isin([emp])]
    emp_df = emp_df[~emp_df['Task: Task Name'].isin(daystoignore)]
    # s1 = pd.to_datetime(emp_df['Date']).dt.strftime('%Y-%m')
    # s2 = (pd.to_datetime('today').strftime('%Y-%m') -pd.DateOffset(months=1)).strftime('%Y-%m')
    # emp_df = emp_df[s1 == s2]
    currentMonth = datetime.now().month
    # print(currentMonth)
    # print(emp_df['Date'])
    emp_df['Date'] = pd.to_datetime(emp_df['Date']).dt.strftime("%dd-%mm-%YYYY")
    format_data = "%dd-%mm-%YYYY"
    empdfdate = []
    for i in emp_df['Date']:
        empdfdate.append(datetime.strptime(i,format_data))
    print(empdfdate)
    emp_df['Date'] = empdfdate
    for i in emp_df['Date']:
        print(i.month, i.day)
    # emp_df['Date'] = pd.to_datetime(emp_df['Date']).dt.strftime('%Y-%m')
    emp_df = emp_df[emp_df['Date'].dt.month == (currentMonth-1)]
    for i in emp_df['Date']:
        print(i.month, i.day)
Results :
6 10
7 10
10 10
11 10
12 10
10 13
10 14
Expected:
6 10
7 10
10 10
11 10
12 10
13 10
14 10
I am not entirely sure what you want to accomplish. If I understand correctly, you simply want to count the number of entries per day for the past month. In that case, you can do the following.
from datetime import datetime
import pandas as pd
report = pd.read_excel('report.xlsx')
print('day: counts', report.Date[report.Date.dt.month == datetime.now().month - 1].dt.day.value_counts(), sep='\n')
I do not get your expected results. It might be that you also want to filter by email somehow; however, I cannot understand from your code what it is that you want to do.
Output:
day: counts
3 101
5 101
6 101
7 101
4 101
24 84
28 84
27 84
26 84
25 84
10 82
11 82
12 82
13 82
14 82
17 67
21 67
20 67
19 67
18 67
31 2
Name: Date, dtype: int64
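The symptom in the question (months treated as days) is what typically happens when a day-first string such as 13/10/2022 is parsed again without stating which field comes first. Below is a minimal sketch of one way to avoid that, assuming the dates really are day/month/year. The sample values and the fixed `today` are made up for illustration and are not taken from report.xlsx. The sketch also compares whole year-month periods instead of `month - 1`, which would evaluate to 0 in January.

```python
import pandas as pd

# Hypothetical day-first date strings (not taken from report.xlsx).
raw = pd.Series(["06/10/2022", "12/10/2022", "13/10/2022", "05/11/2022"])

# An explicit format removes the day/month ambiguity.
dates = pd.to_datetime(raw, format="%d/%m/%Y")

# Fixed "today" so the example is reproducible; pd.Timestamp.now() in real use.
today = pd.Timestamp("2022-11-15")
previous_month = today.to_period("M") - 1

# Compare year and month together, so January maps to the previous December.
last_month = dates[dates.dt.to_period("M") == previous_month]
print(last_month.dt.strftime("%d/%m/%Y").tolist())
```

Only the three October dates survive the filter here; the November date is dropped.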
I have a dataframe called df_location:
locations = {'location_id': [1,2,3,4,5,6,7,8,9,10],
             'temperature_value': [20,21,22,23,24,25,26,27,28,29],
             'humidity_value':[60,61,62,63,64,65,66,67,68,69]}
df_location = pd.DataFrame(locations)
I have another dataframe called df_islands:
islands = {'island_id':[10,20,30,40,50,60],
'list_of_locations':[[1],[2,3],[4,5],[6,7,8],[9],[10]]}
df_islands = pd.DataFrame(islands)
Each island_id corresponds to one or more locations. As you can see, the locations are stored in a list.
What I'm trying to do is look up each location in list_of_locations and merge the result into df_location, so that every location_id row gets its corresponding island_id.
Final dataframe should be the following:
merged = {'location_id': [1,2,3,4,5,6,7,8,9,10],
'temperature_value': [20,21,22,23,24,25,26,27,28,29],
'humidity_value':[60,61,62,63,64,65,66,67,68,69],
'island_id':[10,20,20,30,30,40,40,40,50,60]}
df_merged = pd.DataFrame(merged)
I don't know whether there is a method or function in Python to do this. I would really appreciate it if someone could give me a solution to this problem.
The pandas method you're looking for to expand your df_islands dataframe is .explode(column_name). From there, rename your column to location_id and then join the dataframes using pd.merge(). It'll perform a SQL-like join method using the location_id as the key.
import pandas as pd
locations = {'location_id': [1,2,3,4,5,6,7,8,9,10],
'temperature_value': [20,21,22,23,24,25,26,27,28,29],
'humidity_value':[60,61,62,63,64,65,66,67,68,69]}
df_locations = pd.DataFrame(locations)
islands = {'island_id':[10,20,30,40,50,60],
'list_of_locations':[[1],[2,3],[4,5],[6,7,8],[9],[10]]}
df_islands = pd.DataFrame(islands)
df_islands = df_islands.explode(column='list_of_locations')
df_islands.columns = ['island_id', 'location_id']
pd.merge(df_locations, df_islands)
Out[]:
   location_id  temperature_value  humidity_value  island_id
0            1                 20              60         10
1            2                 21              61         20
2            3                 22              62         20
3            4                 23              63         30
4            5                 24              64         30
5            6                 25              65         40
6            7                 26              66         40
7            8                 27              67         40
8            9                 28              68         50
9           10                 29              69         60
The df.apply() method works here. It's a bit long-winded but it works:
df_location['island_id'] = df_location['location_id'].apply(
    lambda x: [
        df_islands['island_id'][i]
        for i in df_islands.index
        if x in df_islands['list_of_locations'][i]
        # comment above line and use this instead if list is stored in a string
        # if x in eval(df_islands['list_of_locations'][i])
    ][0]
)
First we select the value we want when the if condition is True: df_islands['island_id'][i]
Then we loop over each row of df_islands by using df_islands.index
Then comes the if condition, which checks whether the current value from df_location['location_id'] is in that row's df_islands['list_of_locations'] list.
Finally, since the whole expression is wrapped in square brackets, it is a list. However, we know that there is only one value in the list, so we can take it by indexing with [0] at the end.
I hope this helps and happy for other editors to make the answer more legible!
print(df_location)
   location_id  temperature_value  humidity_value  island_id
0            1                 20              60         10
1            2                 21              61         20
2            3                 22              62         20
3            4                 23              63         30
4            5                 24              64         30
5            6                 25              65         40
6            7                 26              66         40
7            8                 27              67         40
8            9                 28              68         50
9           10                 29              69         60
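The per-row search above rescans df_islands for every location. A possible alternative, sketched here in plain Python with the question's data, is to invert list_of_locations once into a dictionary from location to island. The name `location_to_island` is introduced only for this illustration. With pandas, the same dictionary could be passed to df_location['location_id'].map(...).

```python
islands = {'island_id': [10, 20, 30, 40, 50, 60],
           'list_of_locations': [[1], [2, 3], [4, 5], [6, 7, 8], [9], [10]]}
location_ids = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Invert the island -> locations lists into a location -> island lookup.
location_to_island = {
    location: island
    for island, locations in zip(islands['island_id'], islands['list_of_locations'])
    for location in locations
}

# One dictionary lookup per location instead of a scan over all islands.
island_ids = [location_to_island[location] for location in location_ids]
print(island_ids)
```

This assumes every location belongs to exactly one island; a location missing from every list would raise a KeyError.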
I have a dataset with three inputs: x1, x2 and x3. I want to read just the x2 column and go through its data row by row.
I wrote the code below, but it only shows letters.
Here is my code:
data = pd.read_csv('data6.csv')
row_num =0
x=[]
for col in data:
    if (row_num==1):
        x.append(col[0])
    row_num =+ 1
print(x)
result : x1,x2,x3
The output I expected is the x2 column, read row by row:
65
32
14
25
85
47
63
21
98
65
21
47
48
49
46
43
48
25
28
29
37
Subset of my csv file :
x1 x2 x3
6 65 78
5 32 59
5 14 547
6 25 69
7 85 57
8 47 51
9 63 26
3 21 38
2 98 24
7 65 96
1 21 85
5 47 94
9 48 15
4 49 27
3 46 96
6 43 32
5 48 10
8 25 75
5 28 20
2 29 30
7 37 96
Can anyone help me to solve this error?
If you want a list of the values in x2, use:
x = data['x2'].tolist()
I am not sure I even get what you're trying to do from your code.
What you're doing (after fixing the indentation to make it somewhat correct):
Iterate through all columns of your dataframe
Take the first character of the column name if row_num is equal to 1.
Based on this guess:
import pandas as pd
data = pd.read_csv("data6.csv")
row_num = 0
x = []
for col in data:
    if row_num == 1:
        x.append(col[0])
    row_num = +1
print(x)
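To see why that loop produces letters: iterating over a DataFrame yields its column labels, not its rows, so col[0] is the first character of a label. Separately, `row_num =+ 1` assigns positive one rather than incrementing. A small sketch with three rows from the question's table, loaded from an in-memory string (an assumption, since data6.csv itself is not available and is assumed here to be comma-separated):

```python
import io
import pandas as pd

# Three rows from the question's table, assumed to be comma-separated.
csv_text = "x1,x2,x3\n6,65,78\n5,32,59\n5,14,547\n"
data = pd.read_csv(io.StringIO(csv_text))

# Iterating over a DataFrame yields the column labels.
column_labels = [col for col in data]
first_chars = [col[0] for col in data]
print(column_labels, first_chars)

# "=+" is an assignment of +1, not an increment.
row_num = 5
row_num =+ 1
print(row_num)

# Selecting the column by name gives the actual values.
x2_values = data["x2"].tolist()
print(x2_values)
```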
What you probably want to do:
import pandas as pd
data = pd.read_csv("data6.csv")
# Make a list containing the values in column 'x2'
x = list(data['x2'])
# Print all values at once:
print(x)
# Print one value per line:
for val in x:
    print(val)
With pandas you can get the values of any specific column and convert them directly with list. No for loop is needed.
import pandas as pd
data = pd.read_csv('data6.csv')
print(list(data['x2']))
I have a socket that receives 60 numbers from another computer, arranged in 6 columns and 10 rows. I separated them with split and the output is completely right. For the first column, I want to take each number separately so that I can apply a moving average filter to them.
Code:
import socket
import numpy as np
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(('192.168.0.1', 2015))
column1 = []
column2 = []
column3 = []
column4 = []
column5 = []
column6 = []
for message in range(10):
    message = sock.recv(1024)
    a1 = column1.append(message.split()[0])
    a2 = column2.append(message.split()[1])
    a3 = column3.append(message.split()[2])
    a4 = column4.append(message.split()[3])
    a5 = column5.append(message.split()[4])
    a6 = column6.append(message.split()[5])
    b1 = message.split()[0]
    b2 = message.split()[1]
    b3 = message.split()[2]
    b4 = message.split()[3]
    b5 = message.split()[4]
    b6 = message.split()[5]
    print b1
    print b2
    print b3
    print b4
    print b5
    print b6
If I only print b1, the output is the 10 numbers that I want to have separately for the next function (the moving average filter). I need help separating them.
I tried a for loop over b1[i], but it gives me only the first digit of b1.
First, you want to use a list of columns:
columns = [[] for _ in range(6)]
Then you can split the message into a single list:
for message in range(10):
    message = sock.recv(1024)
    splits = message.split(None, 5)  # split into six pieces at most
which you can then append to the list of lists you created before:
    for index, item in enumerate(splits):
        columns[index].append(item)
Now if you only wish to print the first of those appended numbers, do
print columns[0][0] # first item of first list
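Put together, the approach above might look like the following Python 3 sketch. The socket is replaced by a hard-coded list of byte strings with made-up values so that the example runs on its own, and the fields are converted to float so they can be passed to a filter later. Both choices are assumptions about the data, not something stated in the question.

```python
# Stand-in for ten sock.recv(1024) results; the values are made up.
messages = [
    ("%d %d %d %d %d %d" % (i, i + 10, i + 20, i + 30, i + 40, i + 50)).encode()
    for i in range(10)
]

columns = [[] for _ in range(6)]
for message in messages:
    # recv() returns bytes in Python 3, so decode before splitting.
    fields = message.decode().split()
    for index, field in enumerate(fields[:6]):
        columns[index].append(float(field))

# columns[0] is a list of numbers, so indexing it gives whole values,
# unlike indexing a single string such as b1, which gives characters.
first_column = columns[0]
for value in first_column:
    print(value)
```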
The following should get you started. I have created some random data as a stand-in for the 6 columns by 10 rows (the sample below is laid out as 6 rows of 10 values). The code splits the raw data into rows, splits each row into columns, and then transposes the result to get the data per column.
Each entry in the first column is then displayed with a moving average of the last 3 entries. deque is used to implement an efficient mini queue of the last entries to calculate the moving average with.
import collections
message = """89 39 59 88 46 1 87 21 2 34
59 40 68 74 29 29 26 30 93 38
84 60 44 98 41 29 8 60 61 83
36 44 56 8 50 94 99 1 30 52
5 27 53 85 67 69 38 67 69 26
92 17 4 13 74 89 30 49 44 20"""
rows = message.splitlines()
data = []
for row in rows:
    data.append(row.split())
columns = zip(*data)
total = 0
moving = collections.deque()
# Display the moving average for the first column
for entry in columns[0]:
    value = int(entry)
    moving.append(value)
    total += value
    if len(moving) > 3:  # Length of moving average
        total -= moving.popleft()
    print "%3d %.1f" % (value, total/float(len(moving)))
For this data, it will display the following output:
89 89.0
59 74.0
84 77.3
36 59.7
5 41.7
92 44.3
Tested using Python 2.7
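For readers on Python 3, the same idea could be sketched as below. `moving_average` is a helper name introduced here, not part of the answer above, and it relies on deque(maxlen=...) discarding the oldest entry automatically instead of keeping a running total.

```python
from collections import deque


def moving_average(values, window=3):
    """Return the running mean over the last `window` values."""
    recent = deque(maxlen=window)  # the oldest entry drops off automatically
    averages = []
    for value in values:
        recent.append(value)
        averages.append(sum(recent) / len(recent))
    return averages


# First column of the sample data used above.
first_column = [89, 59, 84, 36, 5, 92]
for value, average in zip(first_column, moving_average(first_column)):
    print("%3d %.1f" % (value, average))
```

Summing the window on every step costs a little more than a running total, which hardly matters for a window of 3 but might for a very large one.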
I have a list consisting of 148 entries. Each entry is a four digit number. I would like to print out the result as this:
1 14 27 40
2 15 28 41
3 16 29 42
4 17 30 43
5 18 31 44
6 19 32 45
7 20 33 46
8 21 34 47
9 22 35 48
10 23 36 49
11 24 37 50
12 25 38 51
13 26 39 52
53
54
55... and so on
I have some code that works for the first 13 rows and 4 columns:
kort_identifier = [my_list_with_the_entries]
print_val = 0
print_num_1 = 0
print_num_2 = 13
print_num_3 = 26
print_num_4 = 39
while (print_val <= 36):
    print kort_identifier[print_num_1], '%10s' % kort_identifier[print_num_2], '%10s' % kort_identifier[print_num_3], '%10s' % kort_identifier[print_num_4]
    print_val += 1
    print_num_1 += 1
    print_num_2 += 1
    print_num_3 += 1
    print_num_4 += 1
I feel this is an awful solution and there has to be a better and simpler way of doing this. I have searched through here (for printing tables and matrices) and tried those solutions, but none seems to work with the odd table/matrix layout that I need.
Please point me in the right direction.
A bit tricky, but here you go. I opted to manipulate the list until it had the right shape, instead of messing around with indexes.
lst = range(1, 149)
lst = [lst[i:i+13] for i in xrange(0, len(lst), 13)]
lst = zip(*[lst[i] + lst[i+4] + lst[i+8] for i in xrange(4)])
for row in lst:
    for col in row:
        print col,
    print
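With 148 entries the last block is not full (44 entries instead of 52), and zip stops at the shortest column, so some values can be dropped. A Python 3 sketch that keeps the ragged tail is shown below, assuming the layout from the question (blocks of 4 columns by 13 rows, filled column by column). `column_major_rows` is a name made up for this example, and the numbers 1 to 148 stand in for the real four-digit entries.

```python
from itertools import zip_longest


def column_major_rows(items, rows_per_block=13, columns=4):
    """Yield table rows for blocks that are filled column by column."""
    block_size = rows_per_block * columns
    for start in range(0, len(items), block_size):
        block = items[start:start + block_size]
        # Cut the block into columns of rows_per_block entries each.
        cols = [block[i:i + rows_per_block]
                for i in range(0, len(block), rows_per_block)]
        # zip_longest pads a short last column instead of truncating rows.
        for row in zip_longest(*cols, fillvalue=""):
            yield row


rows = list(column_major_rows(list(range(1, 149))))
for row in rows[:2]:
    print("".join("%6s" % value for value in row))
```

The first rows come out as 1 14 27 40 and 2 15 28 41, and the second block starts with 53 66 79 92.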
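It might be overkill, but you could just make a numpy array. Each block of 52 entries is reshaped into 4 columns of 13 and then transposed, so that the numbers run down the columns. This covers the first two full blocks (104 entries); the remaining 44 entries need separate handling.
import numpy as np
x = np.array(kort_identifier[:104]).reshape(2, 4, 13)
for subarray in x:
    for row in subarray.T:
        print row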