Debugging a print DataFrame issue in Pandas - python

How do I debug a problem with printing a Pandas DataFrame? I call this function and then print the output (which is a Pandas DataFrame).
n=ion_tab(y_ion,cycles,t,pH)
print(n)
The last part of the output looks like this:
58 O2 1.784306e-35 4 86 7.3
60 HCO3- 5.751170e+02 5 86 7.3
61 Ca+2 1.825748e+02 5 86 7.3
62 CO2 3.928413e+01 5 86 7.3
63 CaHCO3+ 3.755015e+01 5 86 7.3
64 CaCO3 4.616840e+00 5 86 7.3
65 SO4-2 1.393365e+00 5 86 7.3
66 CO3-2 8.243118e-01 5 86 7.3
67 CaSO4 7.363058e-01 5 86 7.3
... ... ... ... ...
[65 rows x 5 columns]
But if I do an n.tail() command, I see the missing data that ... seems to suggest.
print(n.tail())
Species ppm as ion Cycles Temp F pH
68 OH- 5.516061e-03 5 86 7.3
69 CaOH+ 6.097815e-04 5 86 7.3
70 HSO4- 5.395493e-06 5 86 7.3
71 CaHSO4+ 2.632098e-07 5 86 7.3
73 O2 1.783007e-35 5 86 7.3
[5 rows x 5 columns]
If I count the number of rows showing up on the screen, I get 60. If I add the 5 extra that show up with n.tail(), I get the expected 65 rows. Is there some limit in print that only allows 60 rows? What's causing the ... at the end of my DataFrame?
Initially I thought there was something in the ion_tab function that was limiting the printing. But once I saw the missing data in the n.tail() output, I got confused.
Any help in debugging this would be appreciated.

Pandas limits the number of rows printed by default. You can change that with pd.set_option:
In [4]: pd.get_option('display.max_rows')
Out[4]: 60
In [5]: pd.set_option('display.max_rows', 100)
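For illustration, a small self-contained check of the truncation behaviour (the DataFrame here is made up; pd.option_context is handy when you only want the larger limit temporarily):

```python
import pandas as pd

# A DataFrame longer than the default display.max_rows of 60
df = pd.DataFrame({"value": range(100)})

# With the default limit, the printed repr is truncated with "..."
print("..." in repr(df))  # True

# Raising the limit temporarily shows every row (None means "no limit")
with pd.option_context("display.max_rows", 200):
    print("..." in repr(df))  # False
```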

Related

How do I sort columns of numerical file data in python

I'm trying to write a piece of code in python to graph some data from a tab separated file with numerical data.
I'm very new to Python so I would appreciate it if any help could be dumbed down a little bit.
Basically, I have this file and I would like to take two columns from it, sort them each in ascending order, and then graph those sorted columns against each other.
First of all, you should not post code as images, since the editor here lets you insert and format code directly.
It's as simple as calling x.sort() and y.sort(), since both of them are slices from data, so that should work fine (assuming they are 1-dimensional arrays).
Here is an example:
import numpy as np
array = np.random.randint(0, 100, size=20)
print(array)
Output:
[89 47 4 10 29 21 91 95 32 12 97 66 59 70 20 20 36 79 23 4]
Note that array.sort() sorts in place and returns None, so print(array.sort()) would only print None. Sort first, then print:
array.sort()
print(array)
Output:
[ 4 4 10 12 20 20 21 23 29 32 36 47 59 66 70 79 89 91 95 97]
Easy as that :)
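Connecting this back to the original question (two columns from a tab-separated file, each sorted ascending, then graphed), a minimal sketch; the array contents here are made up and the plotting lines are commented out:

```python
import numpy as np

# Stand-in for the file contents; in practice something like
# data = np.loadtxt("myfile.txt", delimiter="\t")
data = np.array([[3.0, 9.0],
                 [1.0, 4.0],
                 [2.0, 7.0]])

# Slice out the two columns and sort each (np.sort returns a sorted copy)
x = np.sort(data[:, 0])
y = np.sort(data[:, 1])
print(x)  # [1. 2. 3.]
print(y)  # [4. 7. 9.]

# Graphing them against each other would then be:
# import matplotlib.pyplot as plt
# plt.plot(x, y)
# plt.show()
```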

How to read one column data as one by one row in csv file using python

Here I have a dataset with three inputs: x1, x2, x3. I want to read just the x2 column, stepping through its data row by row.
I wrote the code below, but it is only showing letters.
Here is my code
data = pd.read_csv('data6.csv')
row_num = 0
x = []
for col in data:
    if row_num == 1:
        x.append(col[0])
    row_num =+ 1
print(x)
result : x1,x2,x3
What I expected output is:
expected output x2 (read one by one row)
65
32
14
25
85
47
63
21
98
65
21
47
48
49
46
43
48
25
28
29
37
Subset of my csv file :
x1 x2 x3
6 65 78
5 32 59
5 14 547
6 25 69
7 85 57
8 47 51
9 63 26
3 21 38
2 98 24
7 65 96
1 21 85
5 47 94
9 48 15
4 49 27
3 46 96
6 43 32
5 48 10
8 25 75
5 28 20
2 29 30
7 37 96
Can anyone help me to solve this error?
If you want a list from x2, use:
x = data['x2'].tolist()
I am not even sure what you're trying to do from your code.
What you're doing (after fixing the indentation to make it somewhat correct):
Iterate through all columns of your dataframe
Take the first character of the column name if row_num is equal to 1.
Based on this guess:
import pandas as pd

data = pd.read_csv("data6.csv")
row_num = 0
x = []
for col in data:
    if row_num == 1:
        x.append(col[0])
    row_num = +1
print(x)
What you probably want to do:
import pandas as pd

data = pd.read_csv("data6.csv")

# Make a list containing the values in column 'x2'
x = list(data['x2'])

# Print all values at once:
print(x)

# Print one value per line:
for val in x:
    print(val)
When you are using pandas, no loop is needed: you can get any specific column's values and convert them directly with list().
import pandas as pd
data = pd.read_csv('data6.csv')
print(list(data['x2']))
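Putting the answers together on a fragment of the posted data (rebuilt in memory here as comma-separated text; the real code would read data6.csv):

```python
import io
import pandas as pd

# A small stand-in for the first rows of data6.csv
csv_text = "x1,x2,x3\n6,65,78\n5,32,59\n5,14,547\n6,25,69\n"
data = pd.read_csv(io.StringIO(csv_text))

# The x2 column as a plain Python list
x = data['x2'].tolist()
print(x)  # [65, 32, 14, 25]

# Or one value per row
for val in data['x2']:
    print(val)
```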

DataFrame max() not return max

Real beginner question here, but it is so simple, I'm genuinely stumped. Python/DataFrame newbie.
I've loaded a DataFrame from a Google Sheet, however any graphing or attempts at calculations are generating bogus results. Loading code:
# Setup
!pip install --upgrade -q gspread
from google.colab import auth
auth.authenticate_user()
import gspread
from oauth2client.client import GoogleCredentials
gc = gspread.authorize(GoogleCredentials.get_application_default())
worksheet = gc.open('Linear Regression - Brain vs. Body Predictor').worksheet("Raw Data")
rows = worksheet.get_all_values()
# Convert to a DataFrame and render.
import pandas as pd
df = pd.DataFrame.from_records(rows)
This seems to work fine and the data looks to be correctly loaded when I print out the DataFrame but running max() returns obviously false results. For example:
print(df[0])
print(df[0].max())
Will output:
0 3.385
1 0.48
2 1.35
3 465
4 36.33
5 27.66
6 14.83
7 1.04
8 4.19
9 0.425
10 0.101
11 0.92
12 1
13 0.005
14 0.06
15 3.5
16 2
17 1.7
18 2547
19 0.023
20 187.1
21 521
22 0.785
23 10
24 3.3
25 0.2
26 1.41
27 529
28 207
29 85
...
32 6654
33 3.5
34 6.8
35 35
36 4.05
37 0.12
38 0.023
39 0.01
40 1.4
41 250
42 2.5
43 55.5
44 100
45 52.16
46 10.55
47 0.55
48 60
49 3.6
50 4.288
51 0.28
52 0.075
53 0.122
54 0.048
55 192
56 3
57 160
58 0.9
59 1.62
60 0.104
61 4.235
Name: 0, Length: 62, dtype: object
Max: 85
Obviously, the maximum value is way out -- it should be 6654, not 85.
What on earth am I doing wrong?
First StackOverflow post, so thanks in advance.
If you check the end of your print() output, you'll see dtype: object. You'll also notice your pandas Series mixes "int"-looking values with "float"-looking values (e.g. you have 6654 and 3.5 in the same Series).
These are good hints that you have a Series of strings, and max() here is comparing them as strings. You want a Series of numbers (specifically floats) compared numerically instead.
Check the following reproducible example:
>>> df = pd.DataFrame({'col': ['0.02', '9', '85']}, dtype=object)
>>> df.col.max()
'9'
You can check that because
>>> '9' > '85'
True
You want these values to be treated as floats instead. Use pd.to_numeric:
>>> df['col'] = pd.to_numeric(df.col)
>>> df.col.max()
85
For more on str and int comparison, check this question
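Applied to the Google Sheets case above, the whole symptom can be reproduced and fixed in a few lines (the values here are made up, but it is the same shape of problem):

```python
import pandas as pd

# Numbers loaded as strings compare lexicographically, not numerically
s = pd.Series(['3.385', '6654', '85', '0.12'])
print(s.max())  # '85', because '8' > '6' as characters

# Convert to numbers first, then max() behaves as expected
s = pd.to_numeric(s)
print(s.max())  # 6654.0
```

For a whole DataFrame loaded this way, df.apply(pd.to_numeric) converts every column at once, assuming they are all numeric.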

Write a pandas DataFrame mixing integers and floats in a csv file

I'm working with pandas DataFrames full of float numbers, but with integers on one line in every three (the whole line is made of integers). When I print the df, all the values are shown as floats (the integer values have `.000000` appended), for example:
aromatics charged polar unpolar
Ac_obs_counts 712.000000 1486.000000 2688.000000 2792.000000
Ac_obs_freqs 0.092732 0.193540 0.350091 0.363636
Ac_pvalues 0.524752 0.099010 0.356436 0.495050
Am_obs_counts 10.000000 59.000000 62.000000 50.000000
Am_obs_freqs 0.055249 0.325967 0.342541 0.276243
Am_pvalues 0.495050 0.980198 0.356436 0.009901
Ap_obs_counts 18.000000 34.000000 83.000000 78.000000
Ap_obs_freqs 0.084507 0.159624 0.389671 0.366197
Ap_pvalues 0.524752 0.039604 0.980198 0.663366
When I use df.iloc[range(0, len(df.index), 3)], I see integers displayed :
aromatics charged polar unpolar
Ac_obs_counts 712 1486 2688 2792
Am_obs_counts 10 59 62 50
Ap_obs_counts 18 34 83 78
Pa_obs_counts 47 81 125 144
Pf_obs_counts 31 58 99 109
Pg_obs_counts 27 106 102 108
Ph_obs_counts 7 49 42 36
Pp_obs_counts 15 83 45 65
Ps_obs_counts 57 125 170 216
Pu_obs_counts 14 62 102 84
When I use df.to_csv("mydf.csv", sep=",", encoding="utf-8"), the integers are written as floats; how can I force these lines to be written as integers? Would it be better to split the data into two DataFrames?
Thanks in advance.
Simply cast to object:
df.astype('object')
Out[1517]:
aromatics charged polar unpolar
Ac_obs_counts 712 1486 2688 2792
Ac_obs_freqs 0.092732 0.19354 0.350091 0.363636
Ac_pvalues 0.524752 0.09901 0.356436 0.49505
Am_obs_counts 10 59 62 50
Am_obs_freqs 0.055249 0.325967 0.342541 0.276243
Am_pvalues 0.49505 0.980198 0.356436 0.009901
Ap_obs_counts 18 34 83 78
Ap_obs_freqs 0.084507 0.159624 0.389671 0.366197
Ap_pvalues 0.524752 0.039604 0.980198 0.663366
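An alternative that leaves the dtypes alone (a sketch, one of several possible formats): to_csv accepts a float_format, and the %g format drops the trailing .000000 from whole numbers while keeping the decimals elsewhere:

```python
import pandas as pd

# A small stand-in for the mixed counts/frequencies frame
df = pd.DataFrame(
    {"aromatics": [712.0, 0.092732], "charged": [1486.0, 0.19354]},
    index=["Ac_obs_counts", "Ac_obs_freqs"],
)

# %g writes 712.0 as "712" but keeps 0.092732 as-is
csv_text = df.to_csv(float_format="%g")
print(csv_text)
# ,aromatics,charged
# Ac_obs_counts,712,1486
# Ac_obs_freqs,0.092732,0.19354
```

Note that %g also switches to scientific notation for very large or very small values, so check it against your full range of data.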

Analysing Json file in Python using pandas

I have to analyse a lot of data doing my Bachelors project.
The data will be handed to me in .json files. My supervisor has told me that it should be fairly easy if I just use Pandas.
Since I am all new to Python (I have decent experience with MatLab and C though) I am having a rough start.
If someone would be so kind to explain me how to do this I would really appreciate it.
The files look like this:
{"columns":["id","timestamp","offset_freq","reprate_freq"],
"index":[0,1,2,3,4,5,6,7 ...
"data":[[526144,1451900097533,20000000.495000001,250000093.9642499983],[...
I need to import the data and analyse it (make some plots), but I'm not sure how to import data like this.
Ps. I have Python and the required packages installed.
You did not give the full format of JSON file, but if it looks like
{"columns":["id","timestamp","offset_freq","reprate_freq"],
"index":[0,1,2,3,4,5,6,7,8,9],
"data":[[39,69,50,51],[62,14,12,49],[17,99,65,79],[93,5,29,0],[89,37,42,47],[83,79,26,29],[88,17,2,7],[95,87,34,34],[40,54,18,68],[84,56,94,40]]}
then you can do (I made up random numbers)
df = pd.read_json(file_name_or_Python_string, orient='split')
print(df)
id timestamp offset_freq reprate_freq
0 39 69 50 51
1 62 14 12 49
2 17 99 65 79
3 93 5 29 0
4 89 37 42 47
5 83 79 26 29
6 88 17 2 7
7 95 87 34 34
8 40 54 18 68
9 84 56 94 40
