How to use videos in validation_data? - python

This is the code I am using for training my model.
model_training = convlstm_model.fit(x=features_train,y=labels_train,epochs=50,batch_size=4,shuffle=True,validation_split=0.2,callbacks=[early_stopping_callback])
I have used validation_split above, but I want to use validation_data instead to validate my model. Here's my validation data:
0 0 0.mp4
1 0 1.mp4
2 0 2.mp4
3 0 3.mp4
4 0 4.mp4
5 0 5.mp4
6 0 6.mp4
7 0 7.mp4
8 0 8.mp4
9 0 9.mp4
10 0 10.mp4
11 0 11.mp4
12 0 12.mp4
13 0 13.mp4
14 0 14.mp4
15 0 15.mp4
16 0 16.mp4
17 0 17.mp4
18 0 18.mp4
19 0 19.mp4
20 0 20.mp4
21 0 21.mp4
22 0 22.mp4
23 0 23.mp4
24 1 24.mp4
25 1 25.mp4
26 1 26.mp4
27 1 27.mp4
28 1 28.mp4
29 1 29.mp4
30 1 30.mp4
31 1 31.mp4
32 1 32.mp4
33 1 33.mp4
34 1 34.mp4
35 1 35.mp4
36 1 36.mp4
37 1 37.mp4
38 1 38.mp4
39 1 39.mp4
40 1 40.mp4
41 1 41.mp4
42 1 42.mp4
43 1 43.mp4
44 1 44.mp4
45 1 45.mp4
46 1 46.mp4
47 1 47.mp4
48 1 48.mp4
49 1 49.mp4
50 1 50.mp4
So it's basically a text file where the first column is the serial number, the second column is the class number, and the third column is the video file name. The video files for validation are provided separately.
So my question is: how do I give this as input to validation_data? I usually only see validation_data = (x_test, y_test).
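One way to do this (a minimal sketch, not tested against your data): read the annotation file, decode each listed video into a fixed-length frame array using the same preprocessing that produced features_train, and pass the resulting arrays to validation_data. The frames_from_video helper, the SEQUENCE_LENGTH / IMAGE_HEIGHT / IMAGE_WIDTH / NUM_CLASSES constants, the val_videos folder, and the validate.txt file name below are assumptions; labels are one-hot encoded on the assumption that labels_train was prepared the same way.
import os
import cv2                      # assumes OpenCV was already used to build features_train
import numpy as np
from tensorflow.keras.utils import to_categorical

# These must match the preprocessing used for features_train (assumed values).
SEQUENCE_LENGTH, IMAGE_HEIGHT, IMAGE_WIDTH = 20, 64, 64
NUM_CLASSES = 2
VAL_DIR = 'val_videos'          # hypothetical folder holding the listed .mp4 files

def frames_from_video(path):
    """Sample SEQUENCE_LENGTH evenly spaced, resized, normalized frames from one video."""
    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    step = max(total // SEQUENCE_LENGTH, 1)
    frames = []
    for i in range(SEQUENCE_LENGTH):
        cap.set(cv2.CAP_PROP_POS_FRAMES, i * step)
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(cv2.resize(frame, (IMAGE_WIDTH, IMAGE_HEIGHT)) / 255.0)
    cap.release()
    return np.asarray(frames)

features_val, labels_val = [], []
with open('validate.txt') as f:             # the annotation file shown above (assumed name)
    for line in f:
        parts = line.split()
        if len(parts) != 3:                 # skip blank or malformed lines
            continue
        _, class_number, filename = parts
        frames = frames_from_video(os.path.join(VAL_DIR, filename))
        if len(frames) == SEQUENCE_LENGTH:  # drop clips that are too short
            features_val.append(frames)
            labels_val.append(int(class_number))

features_val = np.asarray(features_val)
labels_val = to_categorical(labels_val, num_classes=NUM_CLASSES)

model_training = convlstm_model.fit(
    x=features_train, y=labels_train,
    epochs=50, batch_size=4, shuffle=True,
    validation_data=(features_val, labels_val),   # replaces validation_split
    callbacks=[early_stopping_callback])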

Related

How to look for same columns from one dataframe in other dataframe pandas python?

I have one dataframe like this,
tabla_aciertos= {'Numeros_acertados' : [5,5,5,4,4,3,4,2,3,3,1,2,2],'Estrellas_acertadas': [2,1,0,2,1,2,0,2,1,0,2,1,0]}
categorias = [1,2,3,4,5,6,7,8,9,10,11,12,13]
categoria_de_premios = pd.DataFrame (tabla_aciertos,index = [categorias] )
categoria_de_premios
Numeros_acertados Estrellas_acertadas
1 5 2
2 5 1
3 5 0
4 4 2
5 4 1
6 3 2
7 4 0
8 2 2
9 3 1
10 3 0
11 1 2
12 2 1
13 2 0
and another df :
sorteos_anteriores.iloc[:,:]
uno dos tres cuatro cinco Estrella1 Estrella2 bolas_Acertadas estrellas_Acertadas
Fecha
2020-10-13 5 14 38 41 46 1 10 0 1
2020-09-10 11 15 35 41 50 5 8 1 0
2020-06-10 4 21 36 41 47 9 11 0 0
2020-02-10 6 12 15 40 45 3 9 0 0
2020-09-29 4 14 16 41 44 11 12 0 1
... ... ... ... ... ... ... ... ... ...
2004-12-03 15 24 28 44 47 4 5 0 0
2004-05-03 4 7 33 37 39 1 5 0 1
2004-02-27 14 18 19 31 37 4 5 0 0
2004-02-20 7 13 39 47 50 2 5 1 0
2004-02-13 16 29 32 36 41 7 9 0 0
1363 rows × 9 columns
Now, for each row of the df "sorteos_anteriores", I need to check whether it matches one of the rows of the first df, "tabla_aciertos".
Let me give you an example:
Imagine that in "sorteos_anteriores", on 2019-11-2, you have "bolas_Acertadas" = 5 and "estrellas_Acertadas" = 1. You then go to the first table, "tabla_aciertos", and find that index 2 has "Numeros_acertados" = 5 and "Estrellas_acertadas" = 1: you have won a second-class prize (index = 2). You should create a new column "Prize" in "sorteos_anteriores" and, in each row, write a number from 1 to 13 if you have won some kind of prize, or 0 or NaN if you have not.
I have tried:
sorteos_anteriores ['categorias'] = sorteos_anteriores(sorteos_anteriores.loc[:,'bolas_Acertadas':'estrellas_Acertadas'] == tabla_premios.iloc[ : ,0:2])
I also tried with where and merge, but nothing works.
Thanks for your help.
Thanks to Cuina Max I was able to do it. The answer is here:
# supposing that the indexes, starting from one, correspond to the prize categories
categoria_de_premios['Categoria'] = categorias
# merge using pd.merge and the appropriate arguments
sorteos_anteriores = (sorteos_anteriores.merge(
    categoria_de_premios,
    how='outer',
    left_on=['bolas_Acertadas', 'estrellas_Acertadas'],
    right_on=['Numeros_acertados', 'Estrellas_acertadas']
)).drop(columns=['Numeros_acertados', 'Estrellas_acertadas'])
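A small follow-up (an addition beyond the accepted answer, assuming a plain 0 is wanted for draws with no prize): with how='outer', rows of sorteos_anteriores that match no prize category keep NaN in Categoria, which can be filled in afterwards.
# Draws that won nothing keep NaN in 'Categoria' after the outer merge;
# replace that with 0 if a numeric "no prize" marker is preferred.
sorteos_anteriores['Categoria'] = sorteos_anteriores['Categoria'].fillna(0).astype(int)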

ValueError: Length mismatch: Expected axis has 2 elements, new values have 3 elements

def bar1():
    df=pd.read_csv('#CSVFILELOCATION#',encoding= 'unicode_escape')
    x=np.arange(11)
    df=df.set_index(['Country'])
    dfl=df.iloc[:,[4,9]]
    w=dfl.groupby('Country')['SummerTotal' , 'WinterTotal'].sum()
    final_df=w.sort_values(by='Country').tail(11)
    final_df.reset_index(inplace=True)
    final_df.columns=('Country','SummerTotal','WinterTotal')
    final_df=final_df.drop(11,axis='index')
    Countries=df['Country']
    STotalMed=df['SummerTotal']
    WTotalMed=df['WinterTotal']
    plt.bar(x-0.25,STotalMed,label='Total Medals by Countries in Summer',color='g')
    plt.bar(x+0.25,WTotalMed,label='Total Medals by Countries in Winter',color='r')
    plt.xticks(r,Countries,rotation=30)
    plt.title('Olympics Data Analysis of Top 10 Countries',color='red',fontsize=10)
    plt.xlabel('Countries')
    plt.ylabel('Total Medals')
    plt.grid()
    plt.legend()
    plt.show()
This is the code for a bar graph I am using in a project.
It raises an error:
ValueError: Length mismatch: Expected axis has 2 elements, new values have 3 elements
Please help, anyone. I want to submit this project quickly.
CSV:
Country SummerTimesPart Sumgoldmedal Sumsilvermedal Sumbronzemedal SummerTotal WinterTimesPart Wingoldmedal Winsilvermedal Winbronzemedal WinterTotal TotalTimesPart Tgoldmedal Tsilvermedal Tbronzemedal TotalMedal
 Afghanistan  14 0 0 2 2 0 0 0 0 0 14 0 0 2 2
 Algeria  13 5 4 8 17 3 0 0 0 0 16 5 4 8 17
 Argentina  24 21 25 28 74 19 0 0 0 0 43 21 25 28 74
 Armenia  6 2 6 6 14 7 0 0 0 0 13 2 6 6 14
 Australasia 2 3 4 5 12 0 0 0 0 0 2 3 4 5 12
 Australia  26 147 163 187 497 19 5 5 5 15 45 152 168 192 512
 Austria  27 18 33 36 87 23 64 81 87 232 50 82 114 123 319
 Azerbaijan  6 7 11 24 42 6 0 0 0 0 12 7 11 24 42
 Bahamas  16 6 2 6 14 0 0 0 0 0 16 6 2 6 14
 Bahrain  9 2 1 0 3 0 0 0 0 0 9 2 1 0 3
 Barbados 12 0 0 1 1 0 0 0 0 0 12 0 0 1 1
 Belarus  6 12 27 39 78 7 8 5 5 18 13 20 32 44 96
 Belgium  26 40 53 55 148 21 1 2 3 6 47 41 55 58 154
 Bermuda  18 0 0 1 1 8 0 0 0 0 26 0 0 1 1
 Bohemia  3 0 1 3 4 0 0 0 0 0 3 0 1 3 4
 Botswana  10 0 1 0 1 0 0 0 0 0 10 0 1 0 1
 Brazil  22 30 36 63 129 8 0 0 0 0 30 30 36 63 129
 British West Indies  1 0 0 2 2 0 0 0 0 0 1 0 0 2 2
 Bulgaria  20 51 87 80 218 20 1 2 3 6 40 52 89 83 224
 Burundi  6 1 1 0 2 0 0 0 0 0 6 1 1 0 2
 Cameroon 14 3 1 2 6 1 0 0 0 0 15 3 1 2 6
Info: SummerTimesPart: no. of times each country participated in summer
WinterTimesPart: no. of times each country participated in winter
A few changes were needed to get the chart working:
A tick array is required to plot the country names
Use final_df for the chart data, not df
Set the bar width so the bars don't overlap
Here is the updated code:
data = '''
Country SummerTimesPart Sumgoldmedal Sumsilvermedal Sumbronzemedal SummerTotal WinterTimesPart Wingoldmedal Winsilvermedal Winbronzemedal WinterTotal TotalTimesPart Tgoldmedal Tsilvermedal Tbronzemedal TotalMedal
Afghanistan 14 0 0 2 2 0 0 0 0 0 14 0 0 2 2
Algeria 13 5 4 8 17 3 0 0 0 0 16 5 4 8 17
Argentina 24 21 25 28 74 19 0 0 0 0 43 21 25 28 74
Armenia 6 2 6 6 14 7 0 0 0 0 13 2 6 6 14
Australasia 2 3 4 5 12 0 0 0 0 0 2 3 4 5 12
Australia 26 147 163 187 497 19 5 5 5 15 45 152 168 192 512
Austria 27 18 33 36 87 23 64 81 87 232 50 82 114 123 319
Azerbaijan 6 7 11 24 42 6 0 0 0 0 12 7 11 24 42
Bahamas 16 6 2 6 14 0 0 0 0 0 16 6 2 6 14
Bahrain 9 2 1 0 3 0 0 0 0 0 9 2 1 0 3
Barbados 12 0 0 1 1 0 0 0 0 0 12 0 0 1 1
Belarus 6 12 27 39 78 7 8 5 5 18 13 20 32 44 96
Belgium 26 40 53 55 148 21 1 2 3 6 47 41 55 58 154
Bermuda 18 0 0 1 1 8 0 0 0 0 26 0 0 1 1
Bohemia 3 0 1 3 4 0 0 0 0 0 3 0 1 3 4
Botswana 10 0 1 0 1 0 0 0 0 0 10 0 1 0 1
Brazil 22 30 36 63 129 8 0 0 0 0 30 30 36 63 129
BritishWestIndies 1 0 0 2 2 0 0 0 0 0 1 0 0 2 2
Bulgaria 20 51 87 80 218 20 1 2 3 6 40 52 89 83 224
Burundi 6 1 1 0 2 0 0 0 0 0 6 1 1 0 2
Cameroon 14 3 1 2 6 1 0 0 0 0 15 3 1 2 6
'''.strip()
with open('data.csv', 'w') as f: f.write(data)  # write test file
############################
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
def bar1():
    df = pd.read_csv('data.csv', encoding='unicode_escape', sep=' ', index_col=False)
    x = np.arange(11)
    df = df.set_index(['Country'])
    dfl = df.iloc[:, [4, 9]]
    w = dfl.groupby('Country')[['SummerTotal', 'WinterTotal']].sum()
    final_df = w.sort_values(by='Country').tail(11)
    final_df.reset_index(inplace=True)
    final_df.columns = ('Country', 'SummerTotal', 'WinterTotal')
    print(final_df)
    # final_df=final_df.drop(11,axis='index')
    Countries = final_df['Country']
    STotalMed = final_df['SummerTotal']
    WTotalMed = final_df['WinterTotal']
    plt.bar(x-0.25, STotalMed, width=.2, label='Total Medals by Countries in Summer', color='g')
    plt.bar(x+0.25, WTotalMed, width=.2, label='Total Medals by Countries in Winter', color='r')
    plt.xticks(np.arange(11), Countries, rotation=30)
    plt.title('Olympics Data Analysis of Top 10 Countries', color='red', fontsize=10)
    plt.xlabel('Countries')
    plt.ylabel('Total Medals')
    plt.grid()
    plt.legend()
    plt.show()

bar1()
Output

How to fill in values of a dataframe column if the difference between values in another column is sufficiently small?

I have a dataframe df1:
Time Delta_time
0 0 NaN
1 15 15
2 18 3
3 30 12
4 45 15
5 64 19
6 80 16
7 82 2
8 100 18
9 120 20
where Delta_time is the difference between adjacent values in the Time column. I have another dataframe df2 that has time values numbering from 0 to 120 (121 rows) and another column called 'Short_gap'.
How do I set the value of Short_gap to 1 for all Time values that lie within a gap whose Delta_time is smaller than 5? For example, the Short_gap column should have a value of 1 for Time = 15, 16, 17, 18, since that gap's Delta_time = 3 < 5.
Edit: Currently, df2 looks like this.
Time Short_gap
0 0 0
1 1 0
2 2 0
3 3 0
... ... ...
118 118 0
119 119 0
120 120 0
The expected output for df2 is
Time Short_gap
0 0 0
1 1 0
2 2 0
... ... ...
13 13 0
14 14 0
15 15 1
16 16 1
17 17 1
18 18 1
19 19 0
20 20 0
... ... ...
78 78 0
79 79 0
80 80 1
81 81 1
82 82 1
83 83 0
84 84 0
... ... ...
119 119 0
120 120 0
Try:
t = df1['Delta_time'].shift(-1)  # for each Time value in df1, the gap to the next Time
df2 = ((t < 5).repeat(t.fillna(1)).astype(int).reset_index(drop=True)
.to_frame(name='Short_gap').rename_axis('Time').reset_index())
print(df2.head(20))
print('...')
print(df2.loc[78:84])
Output:
Time Short_gap
0 0 0
1 1 0
2 2 0
3 3 0
4 4 0
5 5 0
6 6 0
7 7 0
8 8 0
9 9 0
10 10 0
11 11 0
12 12 0
13 13 0
14 14 0
15 15 1
16 16 1
17 17 1
18 18 0
19 19 0
...
Time Short_gap
78 78 0
79 79 0
80 80 1
81 81 1
82 82 0
83 83 0
84 84 0
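An alternative, more explicit sketch of the same idea (an illustration, assuming the df1/df2 names from the question and marking both endpoints of each short gap, as in the expected output above):
# Mark every integer Time that falls inside a gap shorter than 5,
# including both endpoints of the gap.
df2['Short_gap'] = 0
short_gaps = df1[df1['Delta_time'] < 5]
for end, delta in zip(short_gaps['Time'], short_gaps['Delta_time']):
    df2.loc[df2['Time'].between(end - delta, end), 'Short_gap'] = 1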

get data from xml using pandas

I'm trying to get some data from XML using pandas. Currently I have "working" code, and by working I mean it almost works.
import pandas as pd
import requests
from bs4 import BeautifulSoup
url = "http://degra.wi.pb.edu.pl/rozklady/webservices.php?"
response = requests.get(url).content
soup = BeautifulSoup(response)
tables = soup.find_all('tabela_rozklad')
tags = ['dzien', 'godz', 'ilosc', 'tyg', 'id_naucz', 'id_sala',
'id_prz', 'rodz', 'grupa', 'id_st', 'sem', 'id_spec']
df = pd.DataFrame()
for table in tables:
    all = map(lambda x: table.find(x).text, tags)
    df = df.append([all])
df.columns = tags
a = df[(df.sem == "1")]
a = a[(a.id_spec == "0")]
a = a[(a.dzien == "1")]
print(a)
So I'm getting an error on "a = df[(df.sem == "1")]", which is:
File "pandas\index.pyx", line 139, in pandas.index.IndexEngine.get_loc (pandas\index.c:4443)
File "pandas\index.pyx", line 161, in pandas.index.IndexEngine.get_loc (pandas\index.c:4289)
File "pandas\src\hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:13733)
File "pandas\src\hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:13687)
As I read other Stack Overflow questions I saw people suggest using df.loc, so I modified this line to
a = df.loc[(df.sem == "1")]
Now the code runs, but the result looks as if this line doesn't exist. I need to mention that the problem is with the "sem" tag only; the rest works perfectly, but unfortunately I need to use exactly this tag. If anyone could explain what is causing this error and how to fix it, I would be grateful.
You can add ignore_index=True to append to avoid a duplicated index, and you need to select the sem column with [] rather than attribute access, because sem is also a DataFrame method (standard error of the mean), so df.sem returns the method instead of the column:
df = pd.DataFrame()
for table in tables:
    all = map(lambda x: table.find(x).text, tags)
    df = df.append([all], ignore_index=True)
df.columns = tags
#print (df)
a = df[(df['sem'] == '1') & (df.id_spec == "0") & (df.dzien == "1")]
print(a)
dzien godz ilosc tyg id_naucz id_sala id_prz rodz grupa id_st sem id_spec
0 1 1 2 0 52 79 13 W 1 13 1 0
1 1 3 2 0 12 79 32 W 1 13 1 0
2 1 5 2 0 52 65 13 Ćw 1 13 1 0
3 1 11 2 0 201 3 70 Ćw 10 13 1 0
4 1 5 2 0 36 78 13 Ps 5 13 1 0
5 1 5 2 1 18 32 450 Ps 3 13 1 0
6 1 5 2 2 18 32 450 Ps 4 13 1 0
7 1 7 2 1 18 32 450 Ps 7 13 1 0
8 1 7 2 2 18 32 450 Ps 8 13 1 0
9 1 7 2 0 66 65 104 Ćw 1 13 1 0
10 1 7 2 0 283 3 104 Ćw 5 13 1 0
11 1 7 2 0 346 5 104 Ćw 8 13 1 0
12 1 7 2 0 184 29 13 Ćw 7 13 1 0
13 1 9 2 0 66 65 104 Ćw 2 13 1 0
14 1 9 2 0 346 5 70 Ćw 8 13 1 0
15 1 9 1 0 73 3 203 Ćw 9 13 1 0
16 1 10 1 0 73 3 203 Ćw 10 13 1 0
17 1 9 2 0 184 19 13 Ps 13 13 1 0
18 1 11 2 0 184 19 13 Ps 14 13 1 0
19 1 11 2 0 44 65 13 Ćw 9 13 1 0
87 1 9 2 0 201 54 463 W 1 17 1 0
88 1 3 2 0 36 29 13 Ćw 2 17 1 0
89 1 3 2 0 211 5 70 Ćw 1 17 1 0
90 1 5 2 0 211 5 70 Ćw 2 17 1 0
91 1 7 2 0 36 78 13 Ps 4 17 1 0
105 1 1 2 1 11 16 32 Ps 2 18 1 0
106 1 1 2 2 11 16 32 Ps 3 18 1 0
107 1 3 2 0 51 3 457 W 1 18 1 0
110 1 5 2 2 11 16 32 Ps 1 18 1 0
111 1 7 2 0 91 64 97 Ćw 2 18 1 0
112 1 5 2 0 283 3 457 Ćw 2 18 1 0
254 1 5 1 0 12 29 32 Ćw 6 13 1 0
255 1 6 1 0 12 29 32 Ćw 5 13 1 0
462 1 7 2 0 98 1 486 W 1 19 1 0
463 1 9 1 0 91 1 484 W 1 19 1 0
487 1 5 2 0 116 19 13 Ps 1 17 1 0
488 1 7 2 0 116 19 13 Ps 2 17 1 0
498 1 5 2 0 0 0 431 Ps 2 17 1 0
502 1 5 2 0 0 0 431 Ps 15 13 1 0
503 1 5 2 0 0 0 431 Ps 16 13 1 0
504 1 5 2 0 0 0 431 Ps 19 13 1 0
505 1 5 2 0 0 0 431 Ps 20 13 1 0
531 1 13 2 0 350 79 493 W 1 13 1 0
532 1 13 2 0 350 79 493 W 2 17 1 0
533 1 13 2 0 350 79 493 W 1 18 1 0
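A side note for newer pandas versions (an addition beyond the original answer): DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so on current versions the same table can be built by collecting the rows first. A sketch under that assumption (the original URL is kept as-is and may no longer respond):
import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "http://degra.wi.pb.edu.pl/rozklady/webservices.php?"
# 'html.parser' lowercases tag names, matching how the original default parser behaved
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

tags = ['dzien', 'godz', 'ilosc', 'tyg', 'id_naucz', 'id_sala',
        'id_prz', 'rodz', 'grupa', 'id_st', 'sem', 'id_spec']

# Build a list of row dicts and create the DataFrame in one call instead of appending row by row.
rows = [{tag: table.find(tag).text for tag in tags}
        for table in soup.find_all('tabela_rozklad')]
df = pd.DataFrame(rows, columns=tags)

a = df[(df['sem'] == '1') & (df['id_spec'] == '0') & (df['dzien'] == '1')]
print(a)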

Delete lines that contain decimal numbers

I am trying to delete lines that contain decimal numbers. For instance:
82.45 76.16 21.49 -2.775
5 24 13 6 9 0 3 2 4 9 7 11 54 11 1 1 18 5 0 0
1 1 0 2 2 0 0 0 0 0 0 0 14 90 21 5 24 26 73 13
20 33 23 59 158 85 17 6 158 66 15 13 13 10 2 37 81 0 0 0
1 3 0 19 8 158 75 7 10 8 5 1 23 58 148 77 120 78 6 7
158 80 15 10 16 21 6 37 100 25 0 0 0 0 0 3 1 10 9 1
0 0 0 0 11 16 57 15 0 0 0 0 158 76 9 1 0 0 0 0
22 17 0 0 0 0 0 0
50.04 143.84 18.52 -1.792
3 0 0 0 0 0 0 0 36 0 0 0 2 4 0 1 23 2 0 0
8 24 4 12 21 9 5 2 0 0 0 4 40 0 0 0 0 0 0 12
150 11 2 7 12 16 4 59 72 8 30 88 68 83 15 27 21 11 49 94
6 1 1 8 17 8 0 0 0 0 0 5 150 150 33 46 9 0 0 20
28 49 81 150 76 5 8 17 36 23 41 48 7 1 16 88 0 3 0 0
0 0 0 0 36 108 13 9 2 0 3 61 19 26 14 34 27 8 98 150
14 2 0 1 1 0 115 150
114.27 171.37 10.74 -2.245
.................. and this pattern continues for thousands of lines; likewise, I have about 3000 files with a similar pattern of data.
So, I want to delete the lines that contain these decimal numbers. In most cases every 8th line has decimal numbers, and hence I tried using awk 'NR % 8! == 0' < file_name. But the problem is that not all files in the database have decimal numbers on every 8th line. So, is there a way I can delete the lines that contain decimal numbers? I am coding in Python 2.7 on Ubuntu.
You can just look for lines containing a decimal point:
with open('filename_without_decimals.txt','wb') as of:
    with open('filename.txt') as fp:
        for line in fp:
            if line.find(".") == -1: of.write(line)
If you prefer to use sed, it would be cleaner:
sed -i '/\./d' file.txt
The solution would be something like:
file = open('textfile.txt')
text = ""
for line in file.readlines():
    if '.' not in line:
        text += line
print text
Have you tried this, using awk:
awk '!/\./{print}' your_file
deci = open('with.txt')
no_deci = open('without.txt', 'w')
for line in deci.readlines():
    if '.' not in line:
        no_deci.write(line)
deci.close()
no_deci.close()
readlines returns a list of all the lines in the file.
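Since the question mentions about 3000 files with the same layout, a batch version of the same check (a sketch, assuming the files sit in a hypothetical data_dir folder, match *.txt, and may be rewritten in place) could look like this:
import glob
import os

data_dir = 'data_dir'  # hypothetical folder holding the ~3000 files

for path in glob.glob(os.path.join(data_dir, '*.txt')):
    with open(path) as fp:
        kept = [line for line in fp if '.' not in line]
    with open(path, 'w') as fp:  # rewrite the file without the decimal lines
        fp.writelines(kept)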
