How can I load 28 files, each with the same number of rows and columns, so that the index does not run 0-2911 across all of the data, but instead restarts at 0-103 for each file, with a second index 1-28 identifying the file?
Here is the code I wrote; it currently indexes straight through all of the data:
import pandas as pd
import glob

path = r"C:/Users/Measurment_Data/Test_1"
all_files = glob.glob(path + "/*.dat")

li = []
for filename in all_files:
    df = pd.read_csv(filename, sep="\t", names=["Voltage", "Current"], header=None)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)
frame
Output:
ID Voltage Current
0 NaN 1.000000e+00
1 0.00 -3.047149e-06
2 0.04 -4.941096e-06
3 0.08 -4.472754e-06
4 0.12 -1.053477e-05
... ... ...
2907 -0.16 1.194359e-06
2908 -0.12 5.489425e-06
2909 -0.08 -9.656614e-09
2910 -0.04 -3.427169e-06
2911 -0.00 -2.173696e-06
I would like the index to restart for every newly loaded file, something like this:
File ID Curr Volt
1 0 0.00 1.00E+00
1 1 0.00 -3.05E-06
1 2 0.04 -4.94E-06
...
1 102 0.08 -4.47E-06
1 103 0.12 -1.05E-05
...
2 0 0.00 2.00E+00
2 1 4.00 -3.05E-06
2 2 0.44 -3.94E-06
...
2 102 5.08 -6.47E-06
2 103 0.22 -6.05E-05
...
...
27 0 0.00 2.00E+00
27 1 4.00 -3.05E-06
27 2 0.44 -3.94E-06
...
27 102 5.08 -6.47E-06
27 103 0.22 -6.05E-05
...
28 0 0.00 2.00E+00
28 1 4.00 -3.05E-06
28 2 0.44 -3.94E-06
...
28 102 5.08 -6.47E-06
28 103 0.22 -6.05E-05
I would like to easily access the values of every file by index, so that I can, for example, pull rows 0-5 from each of the 28 files.
Just add a new column after you read each file, then concatenate using the default value of ignore_index (False), so each file keeps its own 0-103 index:
import pandas as pd
import glob

path = r"C:/Users/Measurment_Data/Test_1"
all_files = glob.glob(path + "/*.dat")

li = []
for j, filename in enumerate(all_files, start=1):
    df = pd.read_csv(filename, sep="\t", names=["Voltage", "Current"], header=None)
    df.insert(0, 'File', j)  # tag every row of this file with its file number
    li.append(df)

# no ignore_index here, so each file keeps its own 0-103 row index
frame = pd.concat(li, axis=0)
frame
Give it a try!
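To then pull, say, rows 0-5 of every file, one option (a minimal sketch, assuming the frame built above; the 'ID' level name is just illustrative) is to promote File plus the per-file index to a MultiIndex:

idx = pd.IndexSlice
indexed = frame.set_index(['File', frame.index]).sort_index()
indexed.index.names = ['File', 'ID']

# rows 0-5 of every one of the 28 files
subset = indexed.loc[idx[:, 0:5], :]
print(subset)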
While trying to read a .txt file in pandas, I'm getting a result where the imported data is only one row but has far too many columns.
This is one row from the data
1 182154.6-025557 18:21:54.63 -02:55:57.2 0.0 8.25e-03 1.5e-02 0.20 1.02e-01 -1.95e-01 1.5e-02 55 37 189 0.0 1.53e-01 3.3e-02 0.16 6.32e-01 7.24e-01 6.5e-02 46 29 59 6.2 2.91e-01 5.8e-02 0.17 4.62e-01 6.83e-01 7.0e-02 37 20 54 6.3 3.27e-01 6.2e-02 0.19 3.92e-01 5.51e-01 6.6e-02 37 26 47 0.0 2.28e-01 9.8e-02 0.12 2.50e-01 9.8e-02 46 36 43 7.6 1.1 0.24 0.5 4.6 40 22 36 2 0 starless
I'm using the following code to import the data:
data = pd.read_csv("data.txt", header=None, sep='\t', lineterminator='\r')
And this outputs:
0 1 ... 26254 26255
0 1 182154.6-025557 18:21:54.63 ... NaN CO high-V_LSR\n
[1 rows x 26256 columns]
Any advice on how to import this data correctly would be very helpful
Perhaps your .txt file isn't quite tab separated.
This code should work for reading in multiple lines from your file. It just splits items if there is whitespace between them.
with open('data.txt', 'r') as f:
    raw_data = f.readlines()

data = []
for line in raw_data:
    data.append(line.split())  # split on any run of whitespace, dropping empty strings

pd.DataFrame(data)
I get the following output (dataframe with 63 columns)
0 1 2 3 4 5 6 \
0 182154.6-025557 18:21:54.63 -02:55:57.2 0.0 8.25e-03 1.5e-02 0.20
7 8 9 ... 53 54 55 56 57 58 59 60 61 \
0 1.02e-01 -1.95e-01 1.5e-02 ... 1.1 0.24 0.5 4.6 40 22 36 2 0
62
0 starless
[1 rows x 63 columns]
Either that or you want to try…
data = pd.read_csv("data.txt", header=None, sep='\t', lineterminator='\n')
I'm working with the following data. Ultimately, instead of having old/new in the variable names, I would like to compare old x with new x, old y with new y, etc., with the oldness and newness captured in an "age" variable.
import numpy as np
import pandas as pd

np.random.seed(5)
dat = []
for r in range(100):
    v = np.random.rand(6)
    rec = {
        "i": r,
        "old_x": v[0],
        "old_y": v[1],
        "old_z": v[2],
        "new_x": v[3],
        "new_y": v[4],
        "new_z": v[5],
    }
    dat.append(rec)
df = pd.DataFrame(dat)
>>> df
i old_x old_y old_z new_x new_y new_z
0 0 0.110519 0.096792 0.980107 0.156369 0.540795 0.358307
1 1 0.292648 0.623699 0.376485 0.271227 0.931222 0.391800
2 2 0.872280 0.412259 0.831854 0.417520 0.874671 0.267805
3 3 0.497580 0.342821 0.338618 0.447617 0.618905 0.630221
4 4 0.611636 0.413489 0.302103 0.855590 0.061317 0.155975
.. .. ... ... ... ... ... ...
95 95 0.798706 0.085928 0.215995 0.819614 0.074777 0.876801
96 96 0.997671 0.344107 0.335971 0.199516 0.238919 0.852654
97 97 0.437936 0.924561 0.668733 0.148862 0.166350 0.861785
98 98 0.822570 0.426939 0.935153 0.771598 0.555669 0.639590
99 99 0.849823 0.960070 0.437960 0.675045 0.745331 0.428660
[100 rows x 7 columns]
I'd like to reshape this into a dataframe given by columns = ["age", "x", "y", "z"] where age takes values ["old", "new"].
Here's what I tried:
>>> pd.wide_to_long(df, stubnames=['old',"new"], i='i', j='age', sep='_', suffix=r'\w+')
old new
i age
0 x 0.110519 0.156369
1 x 0.292648 0.271227
2 x 0.872280 0.417520
3 x 0.497580 0.447617
4 x 0.611636 0.855590
... ... ...
95 z 0.215995 0.876801
96 z 0.335971 0.852654
97 z 0.668733 0.861785
98 z 0.935153 0.639590
99 z 0.437960 0.428660
[300 rows x 2 columns]
You can see this is kind of the reverse of what I'm looking for. This also didn't work:
df.pivot_table(values=["x", "y", "z"], index=["i"], columns='age')
KeyError: 'x'
What I'm looking for is more like:
>>> df
i x y z age
0 0 0.110519 0.096792 0.980107 old
0 0.156369 0.540795 0.358307 new
1 1 0.292648 0.623699 0.376485 old
1 0.271227 0.931222 0.391800 new
2 2 0.872280 0.412259 0.831854 old
2 0.417520 0.874671 0.267805 new
I'm fine if "old" and "new" are bools rather than strings.
Another option:
df.set_index(['i'], inplace=True)
# split each column name on '_' into an (age, variable) tuple -> MultiIndex
df.columns = df.columns.str.split('_').map(tuple)
df.stack(level=0).rename_axis(('i', 'age')).reset_index()
# i age x y z
#0 0 new 0.918611 0.488411 0.611744
#1 0 old 0.221993 0.870732 0.206719
#2 1 new 0.187721 0.080741 0.738440
#3 1 old 0.765908 0.518418 0.296801
#4 2 new 0.274086 0.414235 0.296080
#.. .. ... ... ... ...
#195 97 old 0.960385 0.784069 0.922694
#196 98 new 0.056743 0.165556 0.430358
#197 98 old 0.460486 0.734635 0.953751
#198 99 new 0.174529 0.041988 0.635096
#199 99 old 0.027449 0.359603 0.423178
#[200 rows x 5 columns]
Method 1: using columns.str.split and stack:
df = df.set_index("i")  # keep i out of the reshape
df.columns = df.columns.str.split("_", expand=True)
df = df.stack(level=0).reset_index(level=1).rename(columns={"level_1": "age"})
age x y z
0 new 0.92 0.49 0.61
0 old 0.22 0.87 0.21
1 new 0.19 0.08 0.74
1 old 0.77 0.52 0.30
2 new 0.27 0.41 0.30
.. ... ... ... ...
97 old 0.96 0.78 0.92
98 new 0.06 0.17 0.43
98 old 0.46 0.73 0.95
99 new 0.17 0.04 0.64
99 old 0.03 0.36 0.42
[200 rows x 4 columns]
Method 2: Melt and Pivot
We can use melt, then split the column names and pivot back again:
d = df.melt(id_vars="i", var_name="age")
d[["age", "columns"]] = d["age"].str.split("_", expand=True)
d = d.pivot_table(index=["i", "age"], columns="columns", values="value")
d = d.reset_index(level="age").rename_axis(columns=None)
age x y z
i
0 new 0.92 0.49 0.61
0 old 0.22 0.87 0.21
1 new 0.19 0.08 0.74
1 old 0.77 0.52 0.30
2 new 0.27 0.41 0.30
.. ... ... ... ...
97 old 0.96 0.78 0.92
98 new 0.06 0.17 0.43
98 old 0.46 0.73 0.95
99 new 0.17 0.04 0.64
99 old 0.03 0.36 0.42
[200 rows x 4 columns]
You could use pivot_longer from pyjanitor to reshape the data (importing janitor registers the method on DataFrames):
import janitor

df.pivot_longer(index='i', names_to=("age", ".value"), names_sep="_")
i age x y z
0 0 old 0.221993 0.870732 0.206719
1 1 old 0.765908 0.518418 0.296801
2 2 old 0.441309 0.158310 0.879937
3 3 old 0.628788 0.579838 0.599929
4 4 old 0.327564 0.144164 0.165613
.. .. ... ... ... ...
195 95 new 0.779014 0.014644 0.692856
196 96 new 0.083641 0.930439 0.185207
197 97 new 0.626007 0.351780 0.699121
198 98 new 0.056743 0.165556 0.430358
199 99 new 0.174529 0.041988 0.635096
[200 rows x 5 columns]
In the code above, names_to determines the new column names: you have old_x, old_y, ...; age is paired with old/new, while x, y, z are paired with .value. The .value entry tells the function that x, y and z should remain as column names, while the other halves of the names are collected into the age column.
You could also stick to pandas only and use wide_to_long; first rename the columns so the stub (x, y, z) comes first:
new_df = df.rename(columns=lambda col: "_".join(col.split("_")[::-1])
                   if "_" in col else col)
new_df
i x_old y_old z_old x_new y_new z_new
0 0 0.221993 0.870732 0.206719 0.918611 0.488411 0.611744
1 1 0.765908 0.518418 0.296801 0.187721 0.080741 0.738440
2 2 0.441309 0.158310 0.879937 0.274086 0.414235 0.296080
3 3 0.628788 0.579838 0.599929 0.265819 0.284686 0.253588
4 4 0.327564 0.144164 0.165613 0.963931 0.960227 0.188415
.. .. ... ... ... ... ... ...
95 95 0.883177 0.936967 0.771458 0.779014 0.014644 0.692856
96 96 0.034320 0.754875 0.424930 0.083641 0.930439 0.185207
97 97 0.960385 0.784069 0.922694 0.626007 0.351780 0.699121
98 98 0.460486 0.734635 0.953751 0.056743 0.165556 0.430358
99 99 0.027449 0.359603 0.423178 0.174529 0.041988 0.635096
Let's reshape:
(pd.wide_to_long(new_df,
                 stubnames=['x', 'y', 'z'],
                 i='i',
                 j='age',
                 sep='_',
                 suffix='.+')
 .reset_index()
)
i age x y z
0 0 old 0.221993 0.870732 0.206719
1 1 old 0.765908 0.518418 0.296801
2 2 old 0.441309 0.158310 0.879937
3 3 old 0.628788 0.579838 0.599929
4 4 old 0.327564 0.144164 0.165613
.. .. ... ... ... ...
195 95 new 0.779014 0.014644 0.692856
196 96 new 0.083641 0.930439 0.185207
197 97 new 0.626007 0.351780 0.699121
198 98 new 0.056743 0.165556 0.430358
199 99 new 0.174529 0.041988 0.635096
[200 rows x 5 columns]
pivot_longer is a wrapper around native pandas functions; it abstracts the reshaping process while handling details like duplicated indices, and it is also efficient.
x = pd.concat(
    [
        df.filter(like="old")  # old_x, old_y, old_z
        .rename(columns=lambda x: x.split("_")[1])
        .assign(age="old"),
        df.filter(like="new")  # new_x, new_y, new_z
        .rename(columns=lambda x: x.split("_")[1])
        .assign(age="new"),
    ],
).sort_index(kind="mergesort")  # stable sort keeps "old" before "new" within each index
print(x)
Prints:
x y z age
0 0.221993 0.870732 0.206719 old
0 0.918611 0.488411 0.611744 new
1 0.765908 0.518418 0.296801 old
1 0.187721 0.080741 0.738440 new
2 0.441309 0.158310 0.879937 old
2 0.274086 0.414235 0.296080 new
3 0.628788 0.579838 0.599929 old
3 0.265819 0.284686 0.253588 new
4 0.327564 0.144164 0.165613 old
4 0.963931 0.960227 0.188415 new
5 0.024307 0.204556 0.699844 old
5 0.779515 0.022933 0.577663 new
6 0.001642 0.515473 0.639795 old
6 0.985624 0.259098 0.802497 new
7 0.870483 0.922750 0.002214 old
7 0.469488 0.981469 0.398945 new
8 0.813732 0.546456 0.770854 old
...
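If you want i back as a regular column rather than as the index, a small follow-up on the frame x above could be:

x = x.rename_axis('i').reset_index()  # name the index 'i', then move it into a column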
I am trying to do something that I think should be rather simple but I am stuck.
I would like to be able to get the standard deviation of each column in my dataframe and remove that column if the standard deviation is below a set number. This is as far as I have gotten.
import numpy as np
import pandas as pd

stdev_min = 0.6
df = pd.DataFrame(np.random.randn(20, 5), columns=list('ABCDE'))
namelist = df.columns.tolist()
stdev = pd.DataFrame(df.std())
I've tried a few things, but nothing worth mentioning; any help would be greatly appreciated.
You don't need any loops.
You rarely do with pandas.
In this case, you need boolean indexing:
import pandas
import numpy
numpy.random.seed(37)
stdev_min = 0.95
df = pandas.DataFrame(numpy.random.randn(20, 5), columns=list('ABCDE'))
So now df.std() gives me:
A 0.928547
B 0.859394
C 0.998692
D 1.187380
E 1.092970
dtype: float64
so I can do
df.loc[:, df.std() > stdev_min]
And get:
C D E
0 0.35 -1.30 1.52
1 -0.45 0.96 -0.83
2 0.52 -0.06 -0.03
3 1.89 0.40 0.19
4 -0.27 -2.07 -0.71
5 -1.72 -0.40 1.27
6 0.44 -2.05 -0.23
7 1.76 0.06 0.36
8 -0.30 -2.05 1.68
9 0.34 1.26 -1.08
10 0.10 -0.48 -1.74
11 1.95 -0.08 1.51
12 0.43 -0.06 -0.63
13 -0.30 -1.06 0.57
14 -0.95 -1.45 0.93
15 -1.13 2.23 -0.88
16 -0.77 0.86 0.58
17 0.93 -0.11 -1.29
18 -0.82 0.03 -0.44
19 0.40 1.13 -1.89
Here's a way to do this: iterate through each column, get the standard deviation of the column, check whether it is below the minimum, and if so drop the column with inplace=True.
stdev_min = 0.6
df = pd.DataFrame(np.random.randn(10, 5), columns=list('ABCDE'))

for col in df.columns:
    print(col, df[col].std())
    if df[col].std() < stdev_min:
        df.drop(col, axis='columns', inplace=True)

print(df)
Output:
A 0.5046725928657507
B 1.1382221163449697
C 1.0318169576864502
D 0.7129102193331575
E 1.3805207184389312
The standard deviation of A is less than 0.6, so that column was dropped.
B C D E
0 -0.923822 1.155547 -0.601033 -0.066207
1 0.068844 0.426304 -0.376052 0.368574
2 0.585187 -0.367270 0.530934 0.086811
3 0.021466 1.381579 0.483134 -0.300033
4 0.351492 -0.648734 -0.736213 0.827953
5 0.155731 -0.004504 0.315432 0.310515
6 -1.092933 1.341933 -0.672240 -3.482960
7 -0.587766 0.227846 0.246781 1.978528
8 1.565055 0.527668 -0.371854 -0.030196
9 -2.634862 -1.973874 1.508080 -0.362073
Did a few more runs. Here's an example with before and after.
DF before
A B C D E
0 0.496740 0.799021 1.655287 0.091138 0.309186
1 -0.580667 -0.749337 -0.521909 -0.529410 1.010981
2 0.212731 0.126389 -2.244500 0.400540 -0.148761
3 -0.424375 -0.832478 -0.030865 -0.561107 0.196268
4 0.229766 0.688040 0.580294 0.941885 1.554929
5 0.676926 -0.062092 -1.452619 0.952388 -0.963857
6 0.683216 0.747429 -1.834337 -0.402467 -0.383881
7 0.834815 -0.770804 1.299346 1.694612 1.171190
8 0.500445 -1.517488 0.610287 -0.601442 0.343389
9 -0.182286 -0.713332 0.526507 1.042717 1.229628
Standard Deviations for each column of DF:
A 0.49088743174291477
B 0.8047513692231202
C 1.333382184686379
D 0.8248456756163864
E 0.8033725216710547
The standard deviation of df['A'] is less than 0.6, so it was dropped.
DF after dropping the column.
B C D E
0 0.799021 1.655287 0.091138 0.309186
1 -0.749337 -0.521909 -0.529410 1.010981
2 0.126389 -2.244500 0.400540 -0.148761
3 -0.832478 -0.030865 -0.561107 0.196268
4 0.688040 0.580294 0.941885 1.554929
5 -0.062092 -1.452619 0.952388 -0.963857
6 0.747429 -1.834337 -0.402467 -0.383881
7 -0.770804 1.299346 1.694612 1.171190
8 -1.517488 0.610287 -0.601442 0.343389
9 -0.713332 0.526507 1.042717 1.229628
I have a little script for scraping info from fbref (link to the data: https://fbref.com/en/comps/9/stats/Premier-League-Stats). It worked well, but now some features fail (I've checked that the fields that no longer work are "player", "nationality", "position", "squad", "age", and "birth_year"). I have also verified that these fields still have the same names on the web page. Any ideas or help to solve the problem?
Many thanks!
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import re
import sys, getopt
import csv

def get_tables(url):
    res = requests.get(url)
    ## The next two lines get around the issue with comments breaking the parsing.
    comm = re.compile("<!--|-->")
    soup = BeautifulSoup(comm.sub("", res.text), 'lxml')
    all_tables = soup.findAll("tbody")
    team_table = all_tables[0]
    player_table = all_tables[1]
    return player_table, team_table

def get_frame(features, player_table):
    pre_df_player = dict()
    features_wanted_player = features
    rows_player = player_table.find_all('tr')
    for row in rows_player:
        if row.find('th', {"scope": "row"}) is not None:
            for f in features_wanted_player:
                cell = row.find("td", {"data-stat": f})
                text = cell.text.strip()
                if text == '':
                    text = '0'
                if f not in ('player', 'nationality', 'position', 'squad', 'age', 'birth_year'):
                    text = float(text.replace(',', ''))
                if f in pre_df_player:
                    pre_df_player[f].append(text)
                else:
                    pre_df_player[f] = [text]
    df_player = pd.DataFrame.from_dict(pre_df_player)
    return df_player

stats = ["player","nationality","position","squad","age","birth_year","games","games_starts","minutes","goals","assists","pens_made","pens_att","cards_yellow","cards_red","goals_per90","assists_per90","goals_assists_per90","goals_pens_per90","goals_assists_pens_per90","xg","npxg","xa","xg_per90","xa_per90","xg_xa_per90","npxg_per90","npxg_xa_per90"]

def frame_for_category(category, top, end, features):
    url = top + category + end
    player_table, team_table = get_tables(url)
    df_player = get_frame(features, player_table)
    return df_player

top = 'https://fbref.com/en/comps/9/'
end = '/Premier-League-Stats'
df1 = frame_for_category('stats', top, end, stats)
df1
I suggest loading the table with pandas' read_html. There is a direct link to this table under Share & Export --> Embed this Table.
import pandas as pd
df = pd.read_html("https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F9%2Fstats%2FPremier-League-Stats&div=div_stats_standard", header=1)
This outputs a list of dataframes; the table can be accessed as df[0]. Output of df[0].head():
|   | Rk | Player | Nation | Pos | Squad | Age | Born | MP | Starts | Min | 90s | Gls | Ast | G-PK | PK | PKatt | CrdY | CrdR | Gls.1 | Ast.1 | G+A | G-PK.1 | G+A-PK | xG | npxG | xA | npxG+xA | xG.1 | xA.1 | xG+xA | npxG.1 | npxG+xA.1 | Matches |
|---|----|--------|--------|-----|-------|-----|------|----|--------|-----|-----|-----|-----|------|----|-------|------|------|-------|-------|-----|--------|--------|----|------|----|---------|------|------|-------|--------|-----------|---------|
| 0 | 1 | Patrick van Aanholt | nl NED | DF | Crystal Palace | 30-190 | 1990 | 16 | 15 | 1324 | 14.7 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0.07 | 0.07 | 0 | 0.07 | 1.2 | 1.2 | 0.8 | 2 | 0.08 | 0.05 | 0.13 | 0.08 | 0.13 | Matches |
| 1 | 2 | Tammy Abraham | eng ENG | FW | Chelsea | 23-156 | 1997 | 20 | 12 | 1021 | 11.3 | 6 | 1 | 6 | 0 | 0 | 0 | 0 | 0.53 | 0.09 | 0.62 | 0.53 | 0.62 | 5.6 | 5.6 | 0.9 | 6.5 | 0.49 | 0.08 | 0.57 | 0.49 | 0.57 | Matches |
| 2 | 3 | Che Adams | eng ENG | FW | Southampton | 24-237 | 1996 | 26 | 22 | 1985 | 22.1 | 5 | 4 | 5 | 0 | 0 | 1 | 0 | 0.23 | 0.18 | 0.41 | 0.23 | 0.41 | 5.5 | 5.5 | 4.3 | 9.9 | 0.25 | 0.2 | 0.45 | 0.25 | 0.45 | Matches |
| 3 | 4 | Tosin Adarabioyo | eng ENG | DF | Fulham | 23-164 | 1997 | 23 | 23 | 2070 | 23 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0.1 | 1.1 | 0.04 | 0.01 | 0.05 | 0.04 | 0.05 | Matches |
| 4 | 5 | Adrián | es ESP | GK | Liverpool | 34-063 | 1987 | 3 | 3 | 270 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Matches |
If you're only after the player stats, change player_table = all_tables[1] to player_table = all_tables[2], because all_tables[1] is now a team table, so you are currently feeding the team table into the get_frame function.
I tried it, and it worked fine after that.
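Concretely, that one-line change inside get_tables looks like this (a sketch of just the affected function; the rest of the script stays as above):

def get_tables(url):
    res = requests.get(url)
    comm = re.compile("<!--|-->")
    soup = BeautifulSoup(comm.sub("", res.text), 'lxml')
    all_tables = soup.findAll("tbody")
    team_table = all_tables[0]
    player_table = all_tables[2]  # index 2: the page now has an extra team table before the player table
    return player_table, team_table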
I have CSV files that I need to join together based on date, but the dates in each file are not the same (i.e., some files start on 1/1/1991 and others in 1998). I have a basic start to the code (see below), but I am not sure where to go from here. Any tips are appreciated. Below is a sample of the different CSVs I am trying to join.
import os, pandas as pd, glob

directory = r'C:\data\Monthly_Data'
files = os.listdir(directory)
print(files)

all_data = pd.DataFrame()
for f in glob.glob(directory):
    df = pd.read_csv(f)
    all_data = all_data.append(df, ignore_index=True)
all_data.describe()
File 1
DateTime F1_cfs F2_cfs F3_cfs F4_cfs F5_cfs F6_cfs F7_cfs
3/31/1991 0.860702028 1.167239264 0 0 0 0 0
4/30/1991 2.116930556 2.463493056 3.316688418
5/31/1991 4.056572581 4.544307796 5.562668011
6/30/1991 1.587513889 2.348215278 2.611659722
7/31/1991 0.55328629 1.089637097 1.132043011
8/31/1991 0.29702957 0.54186828 0.585073925 2.624375
9/30/1991 0.237083333 0.323902778 0.362583333 0.925563094 1.157786606 2.68722973 2.104090278
File 2
DateTime F1_mg-P_L F2_mg-P_L F3_mg-P_L F4_mg-P_L F5_mg-P_L F6_mg-P_L F7_mg-P_L
6/1/1992 0.05 0.05 0.06 0.04 0.03 0.18 0.08
7/1/1992 0.03 0.05 0.04 0.03 0.04 0.05 0.09
8/1/1992 0.02 0.03 0.02 0.02 0.02 0.02 0.02
File 3
DateTime F1_TSS_mgL F1_TVS_mgL F2_TSS_mgL F2_TVS_mgL F3_TSS_mgL F3_TVS_mgL F4_TSS_mgL F4_TVS_mgL F5_TSS_mgL F5_TVS_mgL F6_TSS_mgL F6_TVS_mgL F7_TSS_mgL F7_TVS_mgL
4/30/1991 10 7.285714286 8.5 6.083333333 3.7 3.1
5/31/1991 5.042553191 3.723404255 6.8 6.3 3.769230769 2.980769231
6/30/1991 5 5 1 1
7/31/1991
8/31/1991
9/30/1991 5.75 3.75 6.75 4.75 9.666666667 6.333333333 8.666666667 5 12 7.666666667 8 5.5 9 6.75
10/31/1991 14.33333333 9 14 10.66666667 16.25 11 12.75 9.25 10.25 7.25 29.33333333 18.33333333 13.66666667 9
11/30/1991 2.2 1.933333333 2 1.88 0 0 4.208333333 3.708333333 10.15151515 7.909090909 9.5 6.785714286 4.612903226 3.580645161
You didn't read the csv files correctly.
1) You can comment out the following lines, because you never use them later in your code:
files = os.listdir(directory)
print(files)
2) glob.glob(directory) didn't return any matching files. glob.glob() takes a pattern as its argument, for example 'C:\data\Monthly_Data\File*.csv'; unfortunately you passed a directory as the pattern, so no files were found:
for f in glob.glob(directory):
I modified the above two parts, printed all_data, and the file contents displayed on my console.
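To actually join the files on their dates even though the date ranges differ, one approach (a minimal sketch, assuming the Monthly_Data directory and the DateTime column shown above) is an outer merge, which keeps every date that appears in any file and fills the gaps with NaN:

import glob
from functools import reduce
import pandas as pd

files = glob.glob(r'C:\data\Monthly_Data\*.csv')
dfs = [pd.read_csv(f, parse_dates=['DateTime']) for f in files]

# outer-merge every frame on DateTime; dates missing from a file become NaN
all_data = reduce(lambda left, right: pd.merge(left, right, on='DateTime', how='outer'), dfs)
all_data = all_data.sort_values('DateTime').reset_index(drop=True)
print(all_data.describe())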