I would like to scrape data from the tweet-volume chart on https://bitinfocharts.com into some kind of data file using Python or R. I'm very new to Python and don't know how to do this. I've looked at other questions on the forum, but I wasn't able to adapt them.
The chart I'm interested in is the following: https://bitinfocharts.com/comparison/decred-tweets.html#1y
I'm looking for a data table with each date and the respective number of tweets for that day as the columns.
I would really appreciate your help.
There may be a more elegant solution, but the data is embedded within the page's script tags. It's just a matter of pulling that out and parsing it into a table:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import re

def parse_strlist(sl):
    # Strip brackets, commas and whitespace, then split on quotes,
    # leaving only the bare date/value strings.
    clean = re.sub(r"[\[\],\s]", "", sl)
    splitted = re.split(r"[\'\"]", clean)
    values_only = [s for s in splitted if s != '']
    return values_only

url = 'https://bitinfocharts.com/comparison/decred-tweets.html#1y'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

scripts = soup.find_all('script')
for script in scripts:
    if 'd = new Dygraph(document.getElementById("container")' in script.text:
        StrList = script.text
        # Keep only the [[...]] data literal that is passed to Dygraph
        StrList = '[[' + StrList.split('[[')[-1]
        StrList = StrList.split(']]')[0] + ']]'
        StrList = StrList.replace("new Date(", '').replace(')', '')
        dataList = parse_strlist(StrList)

# The flat list alternates date, value, date, value, ...
# (enumerate avoids the pitfalls of list.index() with repeated values)
date = []
tweet = []
for idx, each in enumerate(dataList):
    if idx % 2 == 0:
        date.append(each)
    else:
        tweet.append(each)

df = pd.DataFrame(list(zip(date, tweet)), columns=["Date", "Decred - Tweets"])
Output:
print(df)
Date Decred - Tweets
0 2018/01/08 69
1 2018/01/09 200
2 2018/01/10 163
3 2018/01/11 210
4 2018/01/12 256
5 2018/01/13 185
6 2018/01/14 147
7 2018/01/15 119
8 2018/01/16 169
9 2018/01/17 176
10 2018/01/18 209
11 2018/01/19 179
12 2018/01/20 274
13 2018/01/21 124
14 2018/01/22 185
15 2018/01/23 110
16 2018/01/24 109
17 2018/01/25 86
18 2018/01/26 49
19 2018/01/27 null
20 2018/01/28 null
21 2018/01/29 null
22 2018/01/30 null
23 2018/01/31 194
24 2018/02/01 197
25 2018/02/02 163
26 2018/02/03 73
27 2018/02/04 98
28 2018/02/05 210
29 2018/02/06 215
.. ... ...
680 2019/11/19 58
681 2019/11/20 67
682 2019/11/21 72
683 2019/11/22 79
684 2019/11/23 46
685 2019/11/24 38
686 2019/11/25 81
687 2019/11/26 57
688 2019/11/27 54
689 2019/11/28 60
690 2019/11/29 55
691 2019/11/30 40
692 2019/12/01 39
693 2019/12/02 71
694 2019/12/03 93
695 2019/12/04 44
696 2019/12/05 41
697 2019/12/06 34
698 2019/12/07 40
699 2019/12/08 44
700 2019/12/09 47
701 2019/12/10 47
702 2019/12/11 64
703 2019/12/12 61
704 2019/12/13 67
705 2019/12/14 93
706 2019/12/15 59
707 2019/12/16 86
708 2019/12/17 82
709 2019/12/18 51
[710 rows x 2 columns]
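Since the goal was "some kind of data file": as a quick follow-up, pandas can write the frame straight to CSV (the filename here is just an example):

df.to_csv('decred_tweets.csv', index=False)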
Related
I am trying to linearly interpolate a series of CDS rates. Below is the data I have available; the maturities are expressed in years.
Maturity Company 1 Company 2
0 0.5 186.73 186.73
1 1.0 210.65 210.65
2 2.0 249.09 249.09
3 3.0 285.4 285.4
4 4.0 317.59 317.59
5 5.0 344.06 344.06
6 6.0 363.01 363.01
7 7.0 375.69 375.69
8 8.0 384.31 384.31
9 9.0 391.0 391.0
10 10.0 396.12 396.12
I am now trying to use this set of maturities and their CDS rates to interpolate rates for new maturities given in a similar format; below is an example of the maturities I will need to interpolate.
Maturity Years
28 0.10410958904109589
29 0.1863013698630137
30 0.27671232876712326
31 0.3561643835616438
32 0.4328767123287671
33 0.5260273972602739
34 0.5945205479452055
35 0.684931506849315
36 0.7753424657534247
37 0.852054794520548
38 0.9397260273972603
39 1.0164383561643835
40 1.104109589041096
41 1.1863013698630136
42 1.2728102189781023
43 1.35492700729927
44 1.4343065693430657
45 1.5136861313868615
46 1.6012773722627738
47 1.686131386861314
48 1.770985401459854
49 1.853102189781022
50 1.9434306569343067
51 2.02007299270073
52 2.10492700729927
53 2.184306569343066
54 2.2751540041067764
55 2.3572895277207393
56 2.433949349760438
57 2.518822724161533
58 2.6009582477754964
59 2.6830937713894594
60 2.7707049965776864
61 2.8528405201916494
62 2.940451745379877
63 3.0198494182067077
64 3.1047227926078027
65 3.1813826146475015
66 3.2749178532311065
67 3.3543263964950714
68 3.4309967141292446
69 3.521358159912377
70 3.6007667031763417
71 3.6801752464403066
72 3.7677984665936473
73 3.8526834611171963
74 3.940306681270537
75 4.019715224534502
76 4.101861993428258
77 4.186746987951808
78 4.274760383386581
79 4.351437699680511
80 4.4281150159744405
81 4.518484710178001
82 4.600638977635782
83 4.68553172067549
84 4.767685988133272
85 4.8498402555910545
86 4.940209949794614
87 5.0196257416704695
88 5.099041533546326
89 5.18667275216796
90 5.269847477512711
91 5.354712553773954
92 5.434102463824795
93 5.518967540086038
94 5.595619867031678
95 5.685960109503324
96 5.776300351974971
97 5.8529526789206106
98 5.940555338287055
99 6.017207665232695
100 6.10481032459914
101 6.186937817755182
102 6.275154004106776
103 6.357289527720739
104 6.433949349760438
105 6.5160848733744015
106 6.6009582477754964
107 6.685831622176591
108 6.7734428473648185
109 6.852840520191649
110 6.945927446954141
111 7.014373716632443
112 7.104722792607803
113 7.186858316221766
114 7.275022817158503
115 7.3571645877700025
116 7.433830240340736
117 7.513233951931853
118 7.600851840584119
119 7.685731670216002
120 7.7706114998478855
121 7.852753270459385
122 7.943109218132035
123 8.019774870702769
124 8.10465470033465
125 8.184058411925768
126 8.274917853231106
127 8.357064622124863
128 8.433734939759036
129 8.518619934282585
130 8.600766703176342
131 8.6829134720701
132 8.77053669222344
133 8.852683461117197
134 8.940306681270537
135 9.019715224534503
136 9.104600219058051
137 9.181270536692224
138 9.272523643603783
139 9.35191637630662
In the past, I have created log-linear interpolation functions that will allow me to interpolate discount rates based on maturities represented as dates, not as yearly numerical values; below is an example of that function.
import numpy as np
import pandas as pd

def loglinearinterpolation(df, list_of_dates, dt_rng_name, rate_rng_name):
    asofDate = pd.to_datetime(list_of_dates)
    low_lim = df[df[dt_rng_name] <= asofDate].tail(1)
    upper_lim = df[df[dt_rng_name] >= asofDate].head(1)
    if low_lim.index == upper_lim.index:
        return low_lim[rate_rng_name].iloc[0]
    mat_dt_min = low_lim[dt_rng_name].iloc[0]
    mat_dt_max = upper_lim[dt_rng_name].iloc[0]
    y_min = low_lim[rate_rng_name].iloc[0]
    y_max = upper_lim[rate_rng_name].iloc[0]
    # Log-linear: interpolate log(rate) linearly in time, then exponentiate
    return np.exp(((np.log(y_max) - np.log(y_min)) / (mat_dt_max - mat_dt_min).days)
                  * (asofDate - mat_dt_min).days + np.log(y_min))
df_Libor_interpolated = [(pd.to_datetime(x),
                          loglinearinterpolation(df_libor_curve, pd.to_datetime(x), 'Dates', 'value'))
                         for x in df_client_curve['Date'].unique()]
I now need to do a similar task using the same formula in the return statement, except linearly rather than log-linearly; however, my code breaks because the values I am feeding in as dates get converted to DateTime, which raises numpy/Timestamp comparison errors.
I have tried using the code below as a workaround; however, it is not providing me with the values my team expects.
[np.interp(x,df_cds['Maturity'],df_cds['Company 1']) for x in df_cds_interpolated['Maturity Years']]
Any guidance or insight on how I can modify the function and formula above to work with the input data I provided would be greatly appreciated!
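For reference, here is a minimal sketch of a purely linear analogue of the function above, working directly on numeric maturities so no date conversion is involved; df_cds and df_cds_interpolated are the frames from the question, and the clamping behaviour outside the quoted maturities is an assumption:

def linearinterpolation(df, maturity, mat_col, rate_col):
    # Mirror of loglinearinterpolation, but without the log/exp transform
    low_lim = df[df[mat_col] <= maturity].tail(1)
    upper_lim = df[df[mat_col] >= maturity].head(1)
    if low_lim.empty:                             # below the first knot: clamp
        return upper_lim[rate_col].iloc[0]
    if upper_lim.empty:                           # above the last knot: clamp
        return low_lim[rate_col].iloc[0]
    if low_lim.index[0] == upper_lim.index[0]:    # exact hit on a knot
        return low_lim[rate_col].iloc[0]
    x_min, x_max = low_lim[mat_col].iloc[0], upper_lim[mat_col].iloc[0]
    y_min, y_max = low_lim[rate_col].iloc[0], upper_lim[rate_col].iloc[0]
    return y_min + (y_max - y_min) * (maturity - x_min) / (x_max - x_min)

rates = [linearinterpolation(df_cds, m, 'Maturity', 'Company 1')
         for m in df_cds_interpolated['Maturity Years']]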
This is a continuation of the following question:
Plot a line on a curve that is undersampled
I tried the solution provided, but with real data I am getting a straight line. The full data is pasted below:
mdcol, tvdcol = 'md_m', 'tvd_m'
df = df[[mdcol, tvdcol]].copy().set_index(mdcol)
df = df[~df.index.duplicated()]
data_intp = (df.reindex(index=range(int(df.index.min()), int(df.index.max())))
               .reset_index()  # optional, you could write 'index' in the second line plot, too
               .interpolate())
data_intp
Dataframe is shown below:
md_m tvd_m
0 0.00 0.00
1 281.00 281.00
2 300.00 300.00
3 330.00 330.00
4 360.00 360.00
5 390.00 390.00
6 420.00 420.00
7 450.00 450.00
8 480.00 480.00
9 510.00 510.00
10 540.00 539.99
11 570.00 569.98
12 600.00 599.97
13 630.00 629.94
14 660.00 659.91
15 690.00 689.88
16 720.00 719.84
17 750.00 749.80
18 780.00 779.75
19 810.00 809.69
20 840.00 839.58
21 870.00 869.34
22 900.00 898.90
23 930.00 928.19
24 950.00 947.55
25 960.00 957.18
26 970.00 966.76
27 980.00 976.32
28 990.00 985.83
29 1000.00 995.32
30 1010.00 1004.77
31 1020.00 1014.20
32 1030.00 1023.60
33 1040.00 1032.96
34 1050.00 1042.29
35 1060.00 1051.56
36 1070.00 1060.78
37 1080.00 1069.94
38 1090.00 1079.05
39 1100.00 1088.11
40 1110.00 1097.12
41 1120.00 1106.10
42 1130.00 1115.03
43 1140.00 1123.91
44 1150.00 1132.73
45 1160.00 1141.48
46 1170.00 1150.17
47 1180.00 1158.80
48 1190.00 1167.37
49 1200.00 1175.86
50 1210.00 1184.28
51 1220.00 1192.61
52 1230.00 1200.88
53 1240.00 1209.09
54 1250.00 1217.24
55 1260.00 1225.36
56 1270.00 1233.50
57 1280.00 1241.70
58 1290.00 1249.95
59 1300.00 1258.22
60 1310.00 1266.50
61 1320.00 1274.79
62 1330.00 1283.11
63 1340.00 1291.46
64 1350.00 1299.84
65 1360.00 1308.23
66 1370.00 1316.64
67 1380.00 1325.08
68 1390.00 1333.55
69 1400.00 1342.05
70 1410.00 1350.59
71 1420.00 1359.16
72 1430.00 1367.75
73 1440.00 1376.37
74 1450.00 1385.00
75 1460.00 1393.65
76 1470.00 1402.31
77 1480.00 1411.01
78 1490.00 1419.75
79 1500.00 1428.51
80 1510.00 1437.30
81 1520.00 1446.11
82 1530.00 1454.92
83 1540.00 1463.71
84 1550.00 1472.46
85 1560.00 1481.20
86 1570.00 1489.93
87 1580.00 1498.65
88 1590.00 1507.37
89 1600.00 1516.09
90 1610.00 1524.84
91 1620.00 1533.62
92 1630.00 1542.40
93 1640.00 1551.18
94 1650.00 1559.96
95 1660.00 1568.74
96 1670.00 1577.53
97 1680.00 1586.29
98 1690.00 1595.01
99 1700.00 1603.69
100 1710.00 1612.36
101 1720.00 1621.02
102 1730.00 1629.66
103 1740.00 1638.27
104 1750.00 1646.84
105 1760.00 1655.35
106 1770.00 1663.83
107 1780.00 1672.27
108 1790.00 1680.65
109 1800.00 1688.97
110 1810.00 1697.23
111 1820.00 1705.42
112 1830.00 1713.54
113 1840.00 1721.60
114 1850.00 1729.61
115 1860.00 1737.63
116 1870.00 1745.66
117 1880.00 1753.69
118 1890.00 1761.72
119 1900.00 1769.70
120 1910.00 1777.61
121 1920.00 1785.44
122 1930.00 1793.20
123 1940.00 1800.86
124 1950.00 1808.43
125 1960.00 1815.92
126 1970.00 1823.31
127 1980.00 1830.62
128 1990.00 1837.83
129 2000.00 1844.95
130 2010.00 1851.96
131 2020.00 1858.89
132 2030.00 1865.76
133 2040.00 1872.58
134 2050.00 1879.35
135 2060.00 1886.05
136 2070.00 1892.70
137 2080.00 1899.28
138 2090.00 1905.78
139 2100.00 1912.20
140 2110.00 1918.50
141 2120.00 1924.66
142 2130.00 1930.68
143 2140.00 1936.57
144 2150.00 1942.34
145 2160.00 1947.97
146 2170.00 1953.47
147 2180.00 1958.83
148 2190.00 1964.06
149 2200.00 1969.16
150 2210.00 1974.12
151 2220.00 1978.93
152 2230.00 1983.63
153 2240.00 1988.25
154 2250.00 1992.78
155 2260.00 1997.23
156 2270.00 2001.60
157 2280.00 2005.87
158 2290.00 2010.06
159 2300.00 2014.15
160 2310.00 2018.12
161 2320.00 2021.97
162 2330.00 2025.68
163 2340.00 2029.25
164 2373.20 2039.67
165 2401.60 2047.31
166 2430.80 2054.90
167 2459.70 2062.45
168 2488.30 2069.84
169 2488.30 2069.88
170 2489.97 2070.30
171 2493.30 2071.11
172 2503.50 2073.51
173 2519.97 2077.32
174 2549.97 2083.99
175 2563.51 2086.88
176 2579.97 2090.18
177 2609.97 2095.34
178 2639.97 2099.36
179 2662.86 2101.68
180 2752.86 2109.47
181 2759.97 2110.08
182 2789.97 2112.33
183 2819.97 2114.10
184 2849.97 2115.39
185 2879.97 2116.19
186 2902.87 2116.48
187 2909.96 2116.53
188 2939.96 2116.72
189 2969.96 2116.92
190 2999.96 2117.11
191 3029.96 2117.31
192 3059.96 2117.51
193 3089.96 2117.70
194 3119.96 2117.90
195 3149.96 2118.09
196 3179.96 2118.29
197 3209.96 2118.49
198 3239.96 2118.68
199 3252.87 2118.76
200 3352.87 2119.41
201 3359.96 2119.45
202 3389.96 2119.65
203 3419.96 2119.84
204 3449.96 2120.04
205 3479.96 2120.23
206 3509.96 2120.43
207 3539.96 2120.62
208 3569.96 2120.82
209 3599.96 2121.01
210 3629.96 2121.21
211 3652.87 2121.35
212 3779.95 2122.17
213 3852.87 2122.64
Plotting shows the horizontal line:
from bokeh.plotting import figure, show

TOOLS = ["box_zoom", "reset", "save", "crosshair", "pan", "wheel_zoom", "lasso_select"]
p = figure(plot_width=1600, plot_height=800, tools=TOOLS,
           title='Well Survey', toolbar_location='above')
p.line(data_intp[mdcol], data_intp[tvdcol], line_width=2, color='red')
show(p)
Not sure where the interpolation is going wrong. I was hoping it would interpolate between the points. Anyone have an idea what I am doing wrong here?
The issue with your data (and this proposed solution) is that you have a single duplicate value in md_m:
chk = df.groupby('md_m').agg({'tvd_m': ['count', lambda x: list(x)]})
print(chk[chk['tvd_m']['count'] > 1])

This returns the one duplicated measured depth: md_m = 2488.30 appears twice, with two distinct tvd_m values (2069.84 and 2069.88).
Pandas can't "reindex from a duplicate axis", which is what this approach relies on, and linear interpolation won't work properly anyway when you have two identical x values with two distinct y values.
An extra layer of QA could be done on the input data: inspect it beforehand (like my snippet above) and, if appropriate, resolve duplicates with a groupby average or something similar.
The only other thing I'd point out is that using a range (i.e. integers) for the reindexing is unnecessary; you should be able to reindex with floats at any step size you want, as sketched below.
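Here is a minimal sketch of both ideas, assuming the raw survey frame is called data, as in the snippet further down (the 0.5 m step is arbitrary):

import numpy as np

df = data[['md_m', 'tvd_m']].groupby('md_m').mean()            # average out duplicate depths
grid = np.arange(df.index.min(), df.index.max() + 0.5, 0.5)    # float step, no int cast needed
data_intp = (df.reindex(df.index.union(grid))
               .interpolate(method='index')   # interpolate on the actual index values
               .reindex(grid)
               .reset_index())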
Thanks for the answer gmerrit123, but I believe I removed that error by using this line:
df = df[~df.index.duplicated()]
What solved it for me was converting the md column to int, since the reindexing runs with a step of 1 metre but the raw data has two decimal places:
mdcol, tvdcol = 'md_m', 'tvd_m'
df = data[[mdcol, tvdcol]].copy()
df[mdcol] = df[mdcol].astype('int')
df = df.set_index(mdcol)
df = df[~df.index.duplicated()]
data_intp = (df.reindex(index=range(int(df.index.min()), int(df.index.max()) + 1))
               .reset_index()
               .interpolate())
I need to search through each value of a column, compare it with all entries of another column and, if certain conditions are met, print the match. I'm using the Python code below and it works, but both columns have tens of thousands of entries, so it's very slow. Is there a more efficient way to do this?
for i in df1.index:
    for j in df2.index:
        if df1['pdb'][i] == df2['pdb'][j]:
            if df1['res1'][i] >= df2['start'][j] and df1['res2'][i] <= df2['end'][j]:
                print(df1['pdb'][i], df2['PFAM_ACC'][j])
Example:
df1 =
pdb res1 res2
4xhfA 76 83
4xhfA 126 133
2mx1A 179 186
3s8lA 111 118
4ucmA 115 122
1pigA 119 126
4mavA 263 270
4mavA 289 296
3sbrA 101 108
3sbrA 148 155
3sbrA 158 165
3sbrA 222 229
3sbrA 394 401
5zeaA 83 90
5zeaC 562 569
5zeaD 32 39
5zeaD 89 96
5zeaG 277 284
df2 =
pdb start end PFAM_ACC
4xhfA 140 236 PF04205
1pigA 61 332 PF00128
1pigA 409 493 PF02806
3sbrA 171 241 PF18793
3sbrA 424 494 PF18764
3sbrA 558 635 PF00116
5zeaA 13 75 PF02874
5zeaC 13 75 PF02874
5zeaD 15 81 PF02874
5zeaG 13 75 PF02874
and I want to get as output:
1pigA PF00128
3sbrA PF18793
5zeaD PF02874
I hope it's clearer now. Please let me know if you have any suggestions.
Try:
x = df1.merge(df2, on="pdb")
out = x.loc[
(x["res1"] >= x["start"]) & (x["res2"] <= x["end"]), ["pdb", "PFAM_ACC"]
]
print(out)
Prints:
pdb PFAM_ACC
2 1pigA PF00128
13 3sbrA PF18793
21 5zeaD PF02874
I have some simulation results that I wish to pair with some static information I hold for their particular coordinates.
I am using pandas and the key dataframe looks like this:
Orig_lat Orig_lng Dest_lat Dest_lng Site Lane_1
51.4410925 -0.0913334 51.4431736 -0.0681643 6 E
51.4431736 -0.0681643 51.4410925 -0.0913334 6 W
51.6300955 -0.0781079 51.6489284 -0.0602954 7 N
51.648917 -0.0600521 51.6299841 -0.0779832 7 S
51.4648078 -0.301316 51.4573656 -0.3219232 9 S
51.4573656 -0.3219232 51.4649063 -0.3013827 9 N
51.412392 0.0743042 51.4088694 0.0800096 11 S
51.4088694 0.0800096 51.412392 0.0743042 11 N
51.4728599 -0.0235216 51.4804927 -0.0231821 14 N
The results dataframe looks like this:
distance duration duration_in_traffic Orig_lat Orig_lng Dest_lat Dest_lng
1456736402 1670 186 337 51.4431736 -0.0681643 51.4410925 -0.0913334
1456736416 508 73 73 51.4380877 -0.2131928 51.4417083 -0.2168077
1456736416 508 71 71 51.4417083 -0.2168077 51.4380877 -0.2131928
1456736417 578 83 82 51.5229177 -0.4402988 51.5180086 -0.4391647
1456736417 578 79 79 51.5180086 -0.4391647 51.5229177 -0.4402988
1456736417 894 148 155 51.489123 -0.3015009 51.4886771 -0.2894982
1456736418 894 170 163 51.4886771 -0.2894982 51.489123 -0.3015009
1456736418 410 88 88 51.5294107 0.107865 51.5296292 0.1019929
1456736418 410 91 90 51.5296292 0.1019929 51.5294107 0.107865
1456736419 821 90 102 51.6043935 -0.340337 51.6038698 -0.3521945
1456736419 821 96 121 51.6038698 -0.3521945 51.6043935 -0.340337
1456736419 263 48 47 51.3718957 -0.0471616 51.3741868 -0.0480754
1456736420 263 48 48 51.3741868 -0.0480754 51.3718957 -0.0471616
1456736421 426 59 58 51.5122705 -0.2177689 51.5086821 -0.2156843
1456736421 426 55 70 51.5086821 -0.2156843 51.5122705 -0.2177689
1456736421 471 57 57 51.3782746 -0.1864154 51.3800551 -0.1916053
I wish to harvest the Site and Lane_1 columns from the key and join these to the results dataframe using Orig_lat, Orig_lng, Dest_lat, Dest_lng, giving:
distance duration duration_in_traffic Orig_lat Orig_lng Dest_lat Dest_lng Site Lane_1
1456736402 1670 186 337 51.4431736 -0.0681643 51.4410925 -0.0913334
1456736416 508 73 73 51.4380877 -0.2131928 51.4417083 -0.2168077 41 N
1456736416 508 71 71 51.4417083 -0.2168077 51.4380877 -0.2131928 41 S
1456736417 578 83 82 51.5229177 -0.4402988 51.5180086 -0.4391647 42 S
1456736417 578 79 79 51.5180086 -0.4391647 51.5229177 -0.4402988 42 N
1456736417 894 148 155 51.489123 -0.3015009 51.4886771 -0.2894982 43 E
1456736418 894 170 163 51.4886771 -0.2894982 51.489123 -0.3015009 43 W
1456736418 410 88 88 51.5294107 0.107865 51.5296292 0.1019929 45 W
1456736418 410 91 90 51.5296292 0.1019929 51.5294107 0.107865 45 E
1456736419 821 90 102 51.6043935 -0.340337 51.6038698 -0.3521945 46 W
1456736419 821 96 121 51.6038698 -0.3521945 51.6043935 -0.340337 46 E
1456736419 263 48 47 51.3718957 -0.0471616 51.3741868 -0.0480754 48 N
1456736420 263 48 48 51.3741868 -0.0480754 51.3718957 -0.0471616 48 S
1456736421 426 59 58 51.5122705 -0.2177689 51.5086821 -0.2156843 54 S
1456736421 426 55 70 51.5086821 -0.2156843 51.5122705 -0.2177689 54 N
1456736421 471 57 57 51.3782746 -0.1864154 51.3800551 -0.1916053 58 W
How would I use merge to achieve this?
IIUC, use merge on the columns Orig_lat, Orig_lng, Dest_lat, Dest_lng:
print(pd.merge(dataframe, key, on=['Orig_lat', 'Orig_lng', 'Dest_lat', 'Dest_lng']))
distance duration duration_in_traffic Orig_lat Orig_lng Dest_lat \
0 1670 186 337 51.443174 -0.068164 51.441092
Dest_lng Site Lane_1
0 -0.091333 6 W
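One caveat: merge defaults to an inner join, which drops rows that have no match in the key (such as the first row of your desired output). If you want to keep them, with Site and Lane_1 left as NaN, a left join should do it:

merged = pd.merge(dataframe, key, how='left',
                  on=['Orig_lat', 'Orig_lng', 'Dest_lat', 'Dest_lng'])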
My first function creates a list from my input file. I'm trying to use the list I created as a parameter for my second function. How would I do this? I understand that each function has its own namespace, so the way I'm doing it is wrong. I'm assuming I need to assign this variable in the global namespace.
def get_data(file_object):
    while True:
        try:
            file_object = input("Enter the name of the input file: ")
            input_file = open(file_object, "r")
            break
        except FileNotFoundError:
            print("Error: file not found\n:")
    student_db = []
    for line in input_file:
        fields = line.split()
        name = int(fields[0])
        exam1 = int(fields[1])
        exam2 = int(fields[2])
        exam3 = int(fields[3])
        in_class = int(fields[4])
        projects = int(fields[5])
        exercises = int(fields[6])
        record = [name, exam1, exam2, exam3, in_class, projects, exercises]
        student_db.append(record)
    student_db.sort()
    return student_db

#def calculate_grade(a_list):
#    print(a_list)
#how do I use student_db as a parameter??

def main():
    # a_list=student_db
    # b=calculate_grade(a_list)
    # print(b)
    a = get_data("data.tiny.txt")
    print(a)
Here is the input file I am using
031 97 108 113 48 217 14
032 97 124 147 45 355 15
033 140 145 175 50 446 14
034 133 123 115 46 430 15
035 107 92 136 45 278 13
036 98 115 130 37 387 15
037 117 69 131 34 238 12
038 134 125 132 50 434 15
039 125 116 178 50 433 15
040 125 142 156 50 363 15
041 77 51 68 45 219 15
042 122 142 182 50 447 15
043 103 123 102 46 320 15
044 106 100 127 50 362 15
045 125 110 140 50 396 15
046 120 98 129 48 325 13
047 89 70 80 46 302 14
048 99 130 103 50 436 15
049 100 87 148 17 408 13
050 104 47 91 37 50 9
Your main (and commented out code) should look like:
def calculate_grade(a_list):
    print(a_list)

def main():
    a_list = get_data("data.tiny.txt")
    calculate_grade(a_list)

main()
Remember this: if your function returns a value, then you assign that value to a variable in the global namespace and use it at different points. If it has a print statement instead, then you do not need to print again when you are calling it.
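A tiny sketch of that distinction (the function names here are made up):

def returns_value():
    return 42            # the caller decides what to do with the result

def prints_value():
    print(42)            # already prints; implicitly returns None

x = returns_value()      # capture the returned value...
print(x)                 # ...and print (or reuse) it later
prints_value()           # no outer print needed here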
In your example, the student_db is (as I understand it) stored in the variable a. You can just pass that variable to the second function, so simply add calculate_grade(a) to your main function (after defining calculate_grade, obviously).
The function get_data() returns a list, which can be assigned to a local variable and passed to other functions, just as you are doing now:
a=get_data("data.tiny.txt")
calculate_grade(a)
We can't directly use student_db because it is local to get_data(). It could be declared and used as a global, but globals ultimately make your program less flexible and more likely to contain errors that will be difficult to spot.
Another approach would be to use methods (the object-oriented mechanism):
class FileList:
    def get_data(self, file_object):
        while True:
            try:
                file_object = input("Enter the name of the input file: ")
                input_file = open(file_object, "r")
                break
            except FileNotFoundError:
                print("Error: file not found\n:")
        self.student_db = []
        for line in input_file:
            fields = line.split()
            name = int(fields[0])
            exam1 = int(fields[1])
            exam2 = int(fields[2])
            exam3 = int(fields[3])
            in_class = int(fields[4])
            projects = int(fields[5])
            exercises = int(fields[6])
            record = [name, exam1, exam2, exam3, in_class, projects, exercises]
            self.student_db.append(record)
        self.student_db.sort()

    def print_object(self):
        print(self.student_db)
def main():
    myobj = FileList()
    myobj.get_data("data.tiny.txt")
    myobj.print_object()

main()