Importing CSV with time losing format


I am trying to import a CSV file that contains times, but when I import it, the times are not shown in the correct format. How can I fix the format of the imported times?
import pandas as pd
# load the data from the CSV file
data = pd.read_csv('H_data - Copy.csv')
print (data.head(5))
Result:
0 0 24:39.5
1 1 25:20.4
2 2 25:56.1
3 3 26:36.1
4 4 27:21.0
The CSV looks like this when I copy and paste it (I'm not sure how to upload it here):
time
0 24:39.5
1 25:20.4
2 25:56.1
3 26:36.1
4 27:21.0
5 27:57.1
6 28:34.2
7 29:11.0
8 29:47.6
9 30:27.4
10 31:06.6
11 31:46.9
12 32:22.9
13 32:58.4
14 33:30.3
15 34:13.2
16 34:51.8
17 35:32.8
18 36:04.5
19 36:46.4
20 37:27.0
21 37:58.2
22 38:43.1
23 39:23.5
24 39:54.6
25 40:39.5
26 41:15.1
27 41:55.6
28 42:27.8
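One possible way to handle this, sketched under assumptions: the values look like minutes:seconds.tenths (for example 24:39.5), and that reading is an assumption. If the file came from a spreadsheet that displays times as mm:ss.0, the exported text may already lack any hour part, so it is worth opening the CSV in a text editor to see what was actually saved. The extra 0, 1, 2, ... column in the printed result suggests the row numbers were saved into the file, which `index_col=0` takes care of. The sketch below uses an inline stand-in for 'H_data - Copy.csv' and parses the strings into pandas Timedelta values with `pd.to_timedelta`, prefixing a zero hour.

```python
import io
import pandas as pd

# Stand-in for 'H_data - Copy.csv': an unnamed index column plus "time".
# ASSUMPTION: values are minutes:seconds.tenths with no hour component.
csv_text = """,time
0,24:39.5
1,25:20.4
2,25:56.1
3,26:36.1
4,27:21.0
"""

# index_col=0 uses the first (unnamed) column as the index instead of data;
# dtype keeps "time" as plain text so pandas does not reinterpret it.
data = pd.read_csv(io.StringIO(csv_text), index_col=0, dtype={"time": str})

# to_timedelta understands "hh:mm:ss.f"; prepend "00:" for the missing hour.
data["elapsed"] = pd.to_timedelta("00:" + data["time"])
data["seconds"] = data["elapsed"].dt.total_seconds()
print(data)
```

For the real file, replace `io.StringIO(csv_text)` with the file name. If the times do carry hours in the original spreadsheet, changing the cell format there to include hours before exporting would avoid the guesswork.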

Related

Calculate a prediction interval for a dataset Python

I have the following table:
perc
0 59.98797
1 61.89383
2 61.08403
3 61.00661
4 62.64753
5 62.18118
6 60.74520
7 57.83964
8 62.09705
9 57.07985
10 58.62777
11 60.02589
12 58.74948
13 59.14136
14 58.37719
15 58.27401
16 59.67806
17 58.62855
18 58.45272
19 57.62186
20 58.64749
21 58.88152
22 54.80138
23 59.57697
24 60.26713
25 60.96022
26 55.59813
27 60.32104
28 57.95403
29 58.90658
30 53.72838
31 57.03986
32 58.14056
33 53.62257
34 57.08174
35 57.26881
36 48.80800
37 56.90632
38 59.08444
39 57.36432
consisting of various percentages.
I'm interested in building a probability distribution from these percentages so that I can compute a prediction interval (say 95%) within which we would expect a new observation of this percentage to fall.
I was initially doing the following, but when testing with my sample data I remembered that a confidence interval (CI) captures the mean, not a new observation.
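import scipy.stats as st
import numpy as np
# Get data in a list (percDone is the DataFrame shown above)
lst = list(percDone['perc'])
# create 95% confidence interval for the MEAN; the level is passed
# positionally because its keyword is alpha in older SciPy, confidence in newer
st.t.interval(0.95, df=len(lst)-1,
              loc=np.mean(lst),
              scale=st.sem(lst))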
Thanks!
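A common approach, assuming the percentages are independent draws from a roughly normal distribution (an assumption worth checking, given values such as 48.808), is the t-based prediction interval mean ± t(0.975, n-1) · s · sqrt(1 + 1/n). The "1 +" under the square root accounts for the spread of a single new observation; the CI in the question keeps only the 1/n part, which is why it is much narrower. A minimal sketch using the 40 values from the table:

```python
import numpy as np
import scipy.stats as st

# The 40 percentages from the table above
perc = np.array([
    59.98797, 61.89383, 61.08403, 61.00661, 62.64753, 62.18118, 60.74520, 57.83964,
    62.09705, 57.07985, 58.62777, 60.02589, 58.74948, 59.14136, 58.37719, 58.27401,
    59.67806, 58.62855, 58.45272, 57.62186, 58.64749, 58.88152, 54.80138, 59.57697,
    60.26713, 60.96022, 55.59813, 60.32104, 57.95403, 58.90658, 53.72838, 57.03986,
    58.14056, 53.62257, 57.08174, 57.26881, 48.80800, 56.90632, 59.08444, 57.36432,
])

def prediction_interval(x, level=0.95):
    """Two-sided interval for ONE new observation, assuming i.i.d. normal data."""
    x = np.asarray(x, dtype=float)
    n = x.size
    s = x.std(ddof=1)  # sample standard deviation
    # New value varies around the true mean (sigma^2) and the sample mean
    # itself is uncertain (sigma^2 / n): total scale s * sqrt(1 + 1/n).
    scale = s * np.sqrt(1 + 1 / n)
    return st.t.interval(level, n - 1, loc=x.mean(), scale=scale)

def confidence_interval(x, level=0.95):
    """Interval for the MEAN, equivalent to the question's code."""
    x = np.asarray(x, dtype=float)
    return st.t.interval(level, x.size - 1, loc=x.mean(), scale=st.sem(x))

pi_low, pi_high = prediction_interval(perc)
ci_low, ci_high = confidence_interval(perc)
print(f"95% prediction interval: ({pi_low:.2f}, {pi_high:.2f})")
print(f"95% confidence interval: ({ci_low:.2f}, {ci_high:.2f})")
```

If normality is doubtful, a distribution-free alternative is to read off empirical quantiles, e.g. `np.percentile(perc, [2.5, 97.5])`, although with only 40 points those tail estimates are rough.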

How to change a list of synsets to list elements?

I have tried out the following snippet of code for my project:
import pandas as pd
import nltk
from nltk.corpus import wordnet as wn
nltk.download('wordnet')
df=[]
hypo = wn.synset('science.n.01').hyponyms()
hyper = wn.synset('science.n.01').hypernyms()
mero = wn.synset('science.n.01').part_meronyms()
holo = wn.synset('science.n.01').part_holonyms()
ent = wn.synset('science.n.01').entailments()
df = df+hypo+hyper+mero+holo+ent
df_agri_clean = pd.DataFrame(df)
df_agri_clean.columns=["Items"]
print(df_agri_clean)
pd.set_option('display.expand_frame_repr', False)
It gives me this DataFrame as output:
Items
0 Synset('agrobiology.n.01')
1 Synset('agrology.n.01')
2 Synset('agronomy.n.01')
3 Synset('architectonics.n.01')
4 Synset('cognitive_science.n.01')
5 Synset('cryptanalysis.n.01')
6 Synset('information_science.n.01')
7 Synset('linguistics.n.01')
8 Synset('mathematics.n.01')
9 Synset('metallurgy.n.01')
10 Synset('metrology.n.01')
11 Synset('natural_history.n.01')
12 Synset('natural_science.n.01')
13 Synset('nutrition.n.03')
14 Synset('psychology.n.01')
15 Synset('social_science.n.01')
16 Synset('strategics.n.01')
17 Synset('systematics.n.01')
18 Synset('thanatology.n.01')
19 Synset('discipline.n.01')
20 Synset('scientific_theory.n.01')
21 Synset('scientific_knowledge.n.01')
Since df is already a list, printing it directly gives:
[Synset('agrobiology.n.01'), Synset('agrology.n.01'), Synset('agronomy.n.01'), Synset('architectonics.n.01'), Synset('cognitive_science.n.01'), Synset('cryptanalysis.n.01'), Synset('information_science.n.01'), Synset('linguistics.n.01'), Synset('mathematics.n.01'), Synset('metallurgy.n.01'), Synset('metrology.n.01'), Synset('natural_history.n.01'), Synset('natural_science.n.01'), Synset('nutrition.n.03'), Synset('psychology.n.01'), Synset('social_science.n.01'), Synset('strategics.n.01'), Synset('systematics.n.01'), Synset('thanatology.n.01'), Synset('discipline.n.01'), Synset('scientific_theory.n.01'), Synset('scientific_knowledge.n.01')]
I want to change every entry under "Items" like so:
Synset('agrobiology.n.01') => agrobiology.n.01
or
Synset('agrobiology.n.01') => 'agrobiology'
Any help would be appreciated. Thanks!

To get the name of each item, call the synset's name() method. You can use a list comprehension to update the items as follows:
df_agri_clean['Items'] = [synset.name() for synset in df_agri_clean['Items']]
df_agri_clean
The output is what you expected:
Items
0 agrobiology.n.01
1 agrology.n.01
2 agronomy.n.01
3 architectonics.n.01
4 cognitive_science.n.01
5 cryptanalysis.n.01
6 information_science.n.01
7 linguistics.n.01
8 mathematics.n.01
9 metallurgy.n.01
10 metrology.n.01
11 natural_history.n.01
12 natural_science.n.01
13 nutrition.n.03
14 psychology.n.01
15 social_science.n.01
16 strategics.n.01
17 systematics.n.01
18 thanatology.n.01
19 discipline.n.01
20 scientific_theory.n.01
21 scientific_knowledge.n.01
To also strip the part-of-speech and sense number (such as ".n.01"), start again from the Synset objects and split each name on its last two dots. Replacing the literal string '.n.01' would miss names with a different sense number, such as nutrition.n.03:
df_agri_clean['Items'] = [synset.name().rsplit('.', 2)[0] for synset in df_agri_clean['Items']]
df_agri_clean
Output (matching your second expected output):
Items
0 agrobiology
1 agrology
2 agronomy
3 architectonics
4 cognitive_science
5 cryptanalysis
6 information_science
7 linguistics
8 mathematics
9 metallurgy
10 metrology
11 natural_history
12 natural_science
13 nutrition
14 psychology
15 social_science
16 strategics
17 systematics
18 thanatology
19 discipline
20 scientific_theory
21 scientific_knowledge
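NLTK synset names follow the pattern lemma.pos.sense_number, and the sense number varies (01, 03, ...), which is why a fixed '.n.01' replacement is fragile. The helper below is hypothetical (not part of NLTK) and works on plain name strings, so the splitting logic can be tried without downloading WordNet:

```python
from typing import Tuple

def split_synset_name(name: str) -> Tuple[str, str, int]:
    """Split a synset name such as 'nutrition.n.03' into
    (lemma, part_of_speech, sense_number).

    rsplit with maxsplit=2 splits only on the LAST two dots, so a lemma
    that itself contains a dot is kept intact."""
    lemma, pos, sense = name.rsplit(".", 2)
    return lemma, pos, int(sense)

names = ["agrobiology.n.01", "nutrition.n.03", "cognitive_science.n.01"]
lemmas = [split_synset_name(n)[0] for n in names]
print(lemmas)  # ['agrobiology', 'nutrition', 'cognitive_science']
```

With the DataFrame from the question, the same helper could be applied as `[split_synset_name(s.name())[0] for s in df_agri_clean['Items']]`, starting from the Synset objects.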

What does the last line do?

apb = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
for i in range(26):
    s = apb[i:26] + apb[0:i]
    print("{:2d} {} ".format(i, s))
This is supposed to print each of the 26 rotations of the alphabet, numbered 0 to 25.
Sorry, I just started learning Python and this may seem like a dumb question. I tried googling, but the results keep telling me it has something to do with 2D arrays, and I definitely know that's not the answer I am looking for.
I understand everything until the last line.
What does print("{:2d} {} ".format(i, s)) do?

The format method replaces each {} placeholder with the next argument, in order. {:2d} is similar to the %2d printf format specifier in C: the d formats the value as a decimal integer, and the 2 is a minimum field width, so a number shorter than two characters is padded with a leading space (numbers are right-aligned by default). For example, '{:2d}'.format(2) produces ' 2'. You could use a plain {} instead, which gives the same text except that single-digit numbers are not padded, so the letters no longer line up. With {:2d}:
 0 ABCDEFGHIJKLMNOPQRSTUVWXYZ
 1 BCDEFGHIJKLMNOPQRSTUVWXYZA
 2 CDEFGHIJKLMNOPQRSTUVWXYZAB
 3 DEFGHIJKLMNOPQRSTUVWXYZABC
 4 EFGHIJKLMNOPQRSTUVWXYZABCD
 5 FGHIJKLMNOPQRSTUVWXYZABCDE
 6 GHIJKLMNOPQRSTUVWXYZABCDEF
 7 HIJKLMNOPQRSTUVWXYZABCDEFG
 8 IJKLMNOPQRSTUVWXYZABCDEFGH
 9 JKLMNOPQRSTUVWXYZABCDEFGHI
10 KLMNOPQRSTUVWXYZABCDEFGHIJ
11 LMNOPQRSTUVWXYZABCDEFGHIJK
12 MNOPQRSTUVWXYZABCDEFGHIJKL
13 NOPQRSTUVWXYZABCDEFGHIJKLM
14 OPQRSTUVWXYZABCDEFGHIJKLMN
15 PQRSTUVWXYZABCDEFGHIJKLMNO
16 QRSTUVWXYZABCDEFGHIJKLMNOP
17 RSTUVWXYZABCDEFGHIJKLMNOPQ
18 STUVWXYZABCDEFGHIJKLMNOPQR
19 TUVWXYZABCDEFGHIJKLMNOPQRS
20 UVWXYZABCDEFGHIJKLMNOPQRST
21 VWXYZABCDEFGHIJKLMNOPQRSTU
22 WXYZABCDEFGHIJKLMNOPQRSTUV
23 XYZABCDEFGHIJKLMNOPQRSTUVW
24 YZABCDEFGHIJKLMNOPQRSTUVWX
25 ZABCDEFGHIJKLMNOPQRSTUVWXY
With {}:
0 ABCDEFGHIJKLMNOPQRSTUVWXYZ
1 BCDEFGHIJKLMNOPQRSTUVWXYZA
2 CDEFGHIJKLMNOPQRSTUVWXYZAB
3 DEFGHIJKLMNOPQRSTUVWXYZABC
4 EFGHIJKLMNOPQRSTUVWXYZABCD
5 FGHIJKLMNOPQRSTUVWXYZABCDE
6 GHIJKLMNOPQRSTUVWXYZABCDEF
7 HIJKLMNOPQRSTUVWXYZABCDEFG
8 IJKLMNOPQRSTUVWXYZABCDEFGH
9 JKLMNOPQRSTUVWXYZABCDEFGHI
10 KLMNOPQRSTUVWXYZABCDEFGHIJ
11 LMNOPQRSTUVWXYZABCDEFGHIJK
12 MNOPQRSTUVWXYZABCDEFGHIJKL
13 NOPQRSTUVWXYZABCDEFGHIJKLM
14 OPQRSTUVWXYZABCDEFGHIJKLMN
15 PQRSTUVWXYZABCDEFGHIJKLMNO
16 QRSTUVWXYZABCDEFGHIJKLMNOP
17 RSTUVWXYZABCDEFGHIJKLMNOPQ
18 STUVWXYZABCDEFGHIJKLMNOPQR
19 TUVWXYZABCDEFGHIJKLMNOPQRS
20 UVWXYZABCDEFGHIJKLMNOPQRST
21 VWXYZABCDEFGHIJKLMNOPQRSTU
22 WXYZABCDEFGHIJKLMNOPQRSTUV
23 XYZABCDEFGHIJKLMNOPQRSTUVW
24 YZABCDEFGHIJKLMNOPQRSTUVWX
25 ZABCDEFGHIJKLMNOPQRSTUVWXY
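A small sketch of the pieces involved: plain {} substitution, the {:2d} width-and-type spec, the equivalent f-string syntax, and the question's loop collected into a list so the padding is visible with repr():

```python
apb = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# {} is replaced by the next argument, converted to text.
plain = "{} {}".format(7, "ABC")      # '7 ABC'

# {:2d}: format an int as a decimal number with a MINIMUM width of 2.
# Numbers are right-aligned, so a one-digit number gets a leading space.
padded = "{:2d}".format(2)            # ' 2'
wide = "{:2d}".format(123)            # '123' (the width never truncates)

# f-strings (Python 3.6+) accept the same spec after a colon.
i, s = 3, apb[3:] + apb[:3]
fstr = f"{i:2d} {s} "                 # ' 3 DEFGHIJKLMNOPQRSTUVWXYZABC '

# The question's loop, collected into a list (apb[i:] equals apb[i:26]).
rows = ["{:2d} {} ".format(i, apb[i:] + apb[:i]) for i in range(26)]
for row in rows[8:12]:
    print(repr(row))
```

Printing rows 8 to 11 with repr() shows ' 8 IJK...' and ' 9 JKL...' padded to the same width as '10 KLM...' and '11 LMN...', which is what keeps the letter columns aligned.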

extracting upper and lower row if a condition is met

Hello.
I have the following coordinate DataFrame, divided into blocks. Each block starts with a header row (seq0_leftend seq0_rightend, seq1_leftend seq1_rightend, seq2_leftend seq2_rightend, seq3_leftend seq3_rightend, and so on). For each block, whenever a row's coordinates are negative, I would like to extract that row together with the row above and the row below it. Here is an example of my DataFrame file:
seq0_leftend seq0_rightend
0 7 107088
1 107089 108940
2 108941 362759
3 362760 500485
4 500486 509260
5 509261 702736
seq1_leftend seq1_rightend
0 1 106766
1 106767 108619
2 108620 355933
3 355934 488418
4 488419 497151
5 497152 690112
6 690113 700692
7 700693 721993
8 721994 722347
9 722348 946296
10 946297 977714
11 977715 985708
12 -985709 -990725
13 991992 1042023
14 1042024 1259523
15 1259524 1261239
seq2_leftend seq2_rightend
0 1 109407
1 362514 364315
2 109408 362513
3 364450 504968
4 -504969 -515995
5 515996 671291
6 -671295 -682263
7 682264 707010
8 -707011 -709780
9 709781 934501
10 973791 1015417
11 -961703 -973790
12 948955 961702
13 1015418 1069976
14 1069977 1300633
15 -1300634 -1301616
16 1301617 1344821
17 -1515463 -1596433
18 1514459 1515462
19 -1508094 -1514458
20 1346999 1361467
21 -1361468 -1367472
22 1369840 1508093
seq3_leftend seq3_rightend
0 1 112030
1 112031 113882
2 113883 381662
3 381663 519575
4 519576 528317
5 528318 724500
6 724501 735077
7 735078 759456
8 759457 763157
9 763158 996929
10 996931 1034492
11 1034493 1040984
12 -1040985 -1061402
13 1071212 1125426
14 1125427 1353901
15 1353902 1356209
16 1356210 1392818
seq4_leftend seq4_rightend
0 1 105722
1 105723 107575
2 107576 355193
3 355194 487487
4 487488 496220
5 496221 689560
6 689561 700139
7 700140 721438
8 721458 721497
9 721498 947183
10 947184 978601
11 978602 986595
12 -986596 -991612
13 994605 1046245
14 1046247 1264692
15 1264693 1266814
Finally, I want to write the rows of interest to a new CSV. An example of the final result I would like is:
seq1_leftend seq1_rightend
11 977715 985708
12 -985709 -990725
13 991992 1042023
seq2_leftend seq2_rightend
3 364450 504968
4 -504969 -515995
5 515996 671291
6 -671295 -682263
7 682264 707010
8 -707011 -709780
9 709781 934501
10 973791 1015417
11 -961703 -973790
12 948955 961702
14 1069977 1300633
15 -1300634 -1301616
16 1301617 1344821
17 -1515463 -1596433
18 1514459 1515462
19 -1508094 -1514458
20 1346999 1361467
21 -1361468 -1367472
22 1369840 1508093
seq3_leftend seq3_rightend
11 1034493 1040984
12 -1040985 -1061402
13 1071212 1125426
seq4_leftend seq4_rightend
11 978602 986595
12 -986596 -991612
13 994605 1046245

I assume that you have a list of DataFrames, let's call it src.
To convert a single DataFrame, define the following function:
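def findRows(df):
    # First column of the block, whatever its name
    col = df.iloc[:, 0]
    if col.lt(0).any():
        # Keep a row if it, the previous row, or the next row is negative
        return df[col.lt(0) | col.shift(1).lt(0) | col.shift(-1).lt(0)]
    else:
        return None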
Note that this function starts by reading column 0 of the source
DataFrame, so it is independent of the name of this column.
Then it checks whether any element in this column is < 0.
If so, it returns a DataFrame with the rows for which a value < 0
appears either in that row, in the previous row, or in the next row.
If not, the function returns None (from your expected result
I see that in such a case you don't want even an empty DataFrame).
The first step is to call this function on each DataFrame in src
and collect the results:
result = [findRows(df) for df in src]
The last step is to filter out the elements that are None:
result = [df for df in result if df is not None]
To see the result, run:
for df in result:
    print(df)
For src containing the first 3 of your DataFrames, I got:
seq1_leftend seq1_rightend
11 977715 985708
12 -985709 -990725
13 991992 1042023
seq2_leftend seq2_rightend
3 364450 504968
4 -504969 -515995
5 515996 671291
6 -671295 -682263
7 682264 707010
8 -707011 -709780
9 709781 934501
10 973791 1015417
11 -961703 -973790
12 948955 961702
14 1069977 1300633
15 -1300634 -1301616
16 1301617 1344821
17 -1515463 -1596433
18 1514459 1515462
19 -1508094 -1514458
20 1346999 1361467
21 -1361468 -1367472
22 1369840 1508093
As you can see, the resulting list contains only results
originating from the second and third source DataFrames.
The first was filtered out because findRows returned
None for it.
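The question also asks for a new CSV, which the answer above does not cover. A minimal sketch, assuming the blocks are already separate DataFrames in src (how they are split out of the original file is not shown in the thread): it reuses the answer's findRows logic on two small blocks copied from the question and writes every selected block, each with its own header line, by calling to_csv repeatedly on one open handle. An in-memory StringIO stands in for the output file.

```python
import io
import pandas as pd

def findRows(df):
    """Rows whose first-column value is negative, plus the rows directly
    above and below them; None when the block has no negative value."""
    col = df.iloc[:, 0]
    if not col.lt(0).any():
        return None
    # shift(1): row i sees row i-1's value; shift(-1): row i sees row i+1's.
    # The NaN shifted in at the edges compares as False with lt(0).
    return df[col.lt(0) | col.shift(1).lt(0) | col.shift(-1).lt(0)]

# Two small blocks taken from the question (seq0 has no negative values).
seq0 = pd.DataFrame({"seq0_leftend": [7, 107089, 108941],
                     "seq0_rightend": [107088, 108940, 362759]})
seq1 = pd.DataFrame({"seq1_leftend": [946297, 977715, -985709, 991992, 1042024],
                     "seq1_rightend": [977714, 985708, -990725, 1042023, 1259523]},
                    index=[10, 11, 12, 13, 14])
src = [seq0, seq1]

result = [r for r in (findRows(df) for df in src) if r is not None]

# Each to_csv call appends one block (header line + rows) to the same handle.
out = io.StringIO()
for df in result:
    df.to_csv(out)
print(out.getvalue())
```

For a real file, open it once with something like `open("selected.csv", "w", newline="")` (the file name is hypothetical) and pass that handle to to_csv in the loop instead of `out`.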

Python Bokeh Getting Empty Heatmap

I'm attempting to make a heatmap just like this one using Bokeh.
Here is my DataFrame, Data, from which I'm trying to make the heatmap:
Day Code Total
0 1 6001 44
1 1 6002 40
2 1 6006 8
3 1 6008 2
4 1 6010 38
5 1 6011 1
6 1 6014 19
7 1 6018 1
8 1 6019 1
9 1 6023 10
10 1 6028 4
11 2 6001 17
12 2 6010 2
13 2 6014 4
14 2 6020 1
15 2 6028 2
16 3 6001 48
17 3 6002 24
18 3 6003 1
19 3 6005 1
20 3 6006 2
21 3 6008 18
22 3 6010 75
23 3 6011 1
24 3 6014 72
25 3 6023 34
26 3 6028 1
27 3 6038 3
28 4 6001 19
29 4 6002 105
30 5 6001 52
...
And here is my code:
from bokeh.io import output_file
from bokeh.io import show
from bokeh.models import (
    ColumnDataSource,
    HoverTool,
    LinearColorMapper
)
from bokeh.plotting import figure
output_file('SHM_Test.html', title='SHM', mode='inline')
source = ColumnDataSource(Data)
TOOLS = "hover,save"
# Creating the Figure
# (width/height instead of plot_width/plot_height, which Bokeh 3 removed)
SHM = figure(title="HeatMap",
             x_range=[str(i) for i in range(1, 32)],
             y_range=[str(i) for i in range(6043, 6000, -1)],
             x_axis_location="above", width=400, height=970,
             tools=TOOLS, toolbar_location='right')
# Figure Styling
SHM.grid.grid_line_color = None
SHM.axis.axis_line_color = None
SHM.axis.major_tick_line_color = None
SHM.axis.major_label_text_font_size = "5pt"
SHM.axis.major_label_standoff = 0
SHM.toolbar.logo = None
SHM.title.text_alpha = 0.3
# Color Mapping
CM = LinearColorMapper(palette='RdPu9', low=Data.Total.min(), high=Data.Total.max())
SHM.rect(x='Day', y="Code", width=1, height=1, source=source,
         fill_color={'field': 'Total', 'transform': CM})
show(SHM)
When I execute my code I don't get any errors, but I just get an empty frame, as shown in the image below.
I've been struggling to find my mistake. Why am I getting this? Where is my error?

The problem with your code is that the data type you use for the x and y axis ranges differs from the data type of the corresponding columns in your ColumnDataSource. You set x_range and y_range to lists of strings, which makes both axes categorical, but judging from your data, the Day and Code columns are integers, so their values don't match any of the categories and nothing is drawn.
In your case, make sure that your Day and Code columns are in string format before you create the ColumnDataSource.
This is easy to do with pandas:
Data['Day'] = Data['Day'].astype('str')
Data['Code'] = Data['Code'].astype('str')
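A quick way to see the mismatch without rendering anything, sketched with pandas only: compare the column values against the factor lists passed to figure(). The helper unmatched is hypothetical (not a Bokeh function), and the four rows are a small sample copied from the question's table.

```python
import pandas as pd

# A few rows of the question's Data frame; Day and Code are read as ints.
Data = pd.DataFrame({"Day": [1, 1, 2, 3],
                     "Code": [6001, 6002, 6010, 6038],
                     "Total": [44, 40, 2, 3]})

# The categorical ranges from the question's figure() call.
x_range = [str(i) for i in range(1, 32)]
y_range = [str(i) for i in range(6043, 6000, -1)]

def unmatched(values, factors):
    """Hypothetical helper: values that are not among an axis's factors."""
    allowed = set(factors)
    return [v for v in values if v not in allowed]

before = unmatched(Data["Day"], x_range)   # every Day value: 1 != "1"

# The fix from the answer: convert both coordinate columns to strings
# (do this before building ColumnDataSource(Data)).
Data["Day"] = Data["Day"].astype(str)
Data["Code"] = Data["Code"].astype(str)

after_day = unmatched(Data["Day"], x_range)
after_code = unmatched(Data["Code"], y_range)
print(len(before), after_day, after_code)  # 4 [] []
```

Once both lists come back empty, every rectangle has a matching category on each axis.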
