Cannot open eps file after saving figure - python

Normally, opening an eps file is no problem but with this current code in Python that I am working on, the exported eps file is loading when opened but never appearing. I have tried exporting the same figure as a png and that works fine. Also I have tried exporting a really simple figure as eps and that opens without any flaws. I have included some of the relevant code concerning the plot/figure. Any help would be much appreciated.
#%% plot section
plt.close('all')
plt.figure()
plt.errorbar(r,omega,yerr=omega_err,fmt='mo')
plt.xlabel('xlabel')
plt.ylabel('ylabel')
plt.title('profile averaged from {} ms to {} ms \n shot {}'.format(tidsinterval[0],tidsinterval[1],skud_numre[0]),y=1.05)
plt.grid()
plt.axhline(y=2.45,color='Red')
plt.text(39,2.43,'txt block for horizontal line',backgroundcolor='white')
plt.axvline(x=37.5,color='Purple')
plt.text(37.5,1.2,'txt block for vertical line',ha='center',va="center",rotation='vertical',backgroundcolor='white')
plt.savefig('directory/plot.eps', format='eps')
plt.show()
The variables r, omega, omega_err are vectors of float of small sizes (6 perhaps).
Update: The program I use for opening eps-files is Evince, furthermore, one can download the eps file here https://filedropper.com/d/s/z7lxUCtANeox7tDMQ6dI6HZUpcTfHn. As far as I can see, it is fine sharing files over filedropper via community guidelines, but if I'm wrong please say so.
Found out that it is possible to open the file as long as there is no text contained in the plot (for example x-label,y-label, title and so on), so the problem has to be related to the text.

The short answer is it's your font. The /e glyph is throwing an error on setcachedevice (your PostScript interpreter should have told you this).
The actual problem is that the font program is careless (at least) about it's use of function name. The program contains this:
/mpldict 11 dict def
mpldict begin
/d { bind def } bind def
That creates a new dictionary called mpldict, begins that dictionary (makes it the topmost entry in the dictionary stack) and defines a function called 'd' in that dictionary
We then move on to the font definition, there's a lot of boiler plate in here, but each character shape is defined by an entry in the font's CharStrings dictionary, we'll pick that up with the definition of the function called 'd' in the font's CharStrings dictionary.
/d{1300 0 113 -29 1114 1556 sc
930 950 m
930 1556 l
ce} d
(2.60) == flush
/e{1260 0 113 -29 1151 1147 sc
1151 606 m
1151 516 l
305 516 l
313 389 351 293 419 226 c
488 160 583 127 705 127 c
776 127 844 136 910 153 c
977 170 1043 196 1108 231 c
1108 57 l
1042 29 974 8 905 -7 c
836 -22 765 -29 694 -29 c
515 -29 374 23 269 127 c
165 231 113 372 113 549 c
113 732 162 878 261 985 c
360 1093 494 1147 662 1147 c
813 1147 932 1098 1019 1001 c
1107 904 1151 773 1151 606 c
967 660 m
966 761 937 841 882 901 c
827 961 755 991 664 991 c
561 991 479 962 417 904 c
356 846 320 764 311 659 c
967 660 l
ce} d
Notice that what this does is create a new definition of a function named 'd' in the current dictionary. That's not a problem in itself. We now have two functions named 'd'; one in the current dictionary (the font's CharStrings dictionary) and one in 'mpldict'.
Then we define the next character:
/e{1260 0 113 -29 1151 1147 sc
1151 606 m
1151 516 l
305 516 l
313 389 351 293 419 226 c
488 160 583 127 705 127 c
776 127 844 136 910 153 c
977 170 1043 196 1108 231 c
1108 57 l
1042 29 974 8 905 -7 c
836 -22 765 -29 694 -29 c
515 -29 374 23 269 127 c
165 231 113 372 113 549 c
113 732 162 878 261 985 c
360 1093 494 1147 662 1147 c
813 1147 932 1098 1019 1001 c
1107 904 1151 773 1151 606 c
967 660 m
966 761 937 841 882 901 c
827 961 755 991 664 991 c
561 991 479 962 417 904 c
356 846 320 764 311 659 c
967 660 l
ce} d
Now, the last thing we do at the end of defining that character shape (for the character named 'e') is call a function named 'd'. But there are two, which one do we call ? The answer is that we work backwards down the dictionary stack looking in each dictionary to see if it has a function called 'd' and we use the first one we find. The current dictionary is the font's CharStrings dictionary, and it has a function called 'd' (which defines the 'd' character) so we call that.
And that function then tries to use setcachedevice. That operator is not legal except when executing a character description, which we are not doing, so it throws an undefined error.
Now your PostScript interpreter should tell you there is an error (Ghostscript, for example, does so). Because there is an error the interpreter stops and doesn't draw anything further, which is why you get a blank page.
What can you do about this ? Well you could raise a bug report with the creating application (apparently Matplotlib created the font too). This is not a good way to define a font!
Other than that, well frankly the only thing you can do is search and replace through the file. If you look for occurrences of ce} d and replace them with ce}bind def it'll probably work. This time.

Related

sklearn StandardScaler doesn't seem to be working properly

I am trying to normalise my data so that it will be normally distributed needed for a later hypothesis test. The data I am trying to normalise, points, is as such:
P100m Plj Psp Phj P400m P110h Ppv Pdt Pjt P1500
0 938 1061 773 859 896 911 880 732 757 752
1 839 975 870 749 887 878 880 823 863 741
2 814 866 841 887 921 939 819 778 884 691
3 872 898 789 878 848 879 790 790 861 804
4 892 913 742 803 816 869 1004 789 854 699
... ... ... ... ... ... ... ... ... ...
7963 755 760 604 714 812 794 482 571 539 780
7964 830 845 524 767 786 783 601 573 562 535
7965 819 804 653 840 791 699 659 461 448 632
7966 804 720 539 758 830 782 731 487 425 729
7967 687 809 692 714 565 741 804 527 738 523
I am using sklearn.preprocessing.StandardScaler() and my code is as follows:
scaler = preprocessing.StandardScaler()
scaler.fit(points)
points_norm = scaler.transform(points)
points_norm_df = pd.DataFrame(points_norm, columns = ['P100m', 'Plj', 'Psp', 'Phj', 'P400m',
'P110h', 'Ppv', 'Pdt', 'Pjt','P1500'])
The strange part is that I am running an Anderson-Darling normality test from scipy.stats.anderson and the result is that it is very far from a normal distribution.
I am not the most proficient statistician. Am I misunderstanding what I am doing here or is it a problem with my code/data?
Any help would be greatly appreciated
The StandardScaler does not claim to make the data have a normal distribution rather than to Standardize so that your data will have zero mean and unit variance.
From the documentation:
Standardize features by removing the mean and scaling to unit variance
The standard score of a sample x is calculated as z = (x - u) / s
where u is the mean of the training samples or zero if
with_mean=False, and s is the standard deviation of the training
samples or one if with_std=False.
As gilad already pointed out the StandardScaler is standardizing your data.
You can find a list of methods here for preprocessing: https://scikit-learn.org/stable/modules/preprocessing.html
Are you searching for:
6.3.2.1. Mapping to a Uniform distribution
QuantileTransformer and quantile_transform provide a non-parametric
transformation to map the data to a uniform distribution with values
between 0 and 1
this would work somehow like this:
quantile_transformer = preprocessing.QuantileTransformer(random_state=0)
points_norm = quantile_transformer.fit_transform(points)

How can I read a file having different column for each rows?

my data looks like this.
0 199 1028 251 1449 847 1483 1314 23 1066 604 398 225 552 1512 1598
1 1214 910 631 422 503 183 887 342 794 590 392 874 1223 314 276 1411
2 1199 700 1717 450 1043 540 552 101 359 219 64 781 953
10 1707 1019 463 827 675 874 470 943 667 237 1440 892 677 631 425
How can I read this file structure in python? I want to extract a specific column from rows. For example, If I want to extract value in the second row, second column, how can I do that? I've tried 'loadtxt' using data type string. But it requires string index slicing, so that I could not proceed because each column has different digits. Moreover, each row has a different number of columns. Can you guys help me?
Thanks in advance.
Use something like this to split it
split2=[]
split1=txt.split("\n")
for item in split1:
split2.append(item.split(" "))
I have stored given data in "data.txt". Try below code once.
res=[]
arr=[]
lines = open('data.txt').read().split("\n")
for i in lines:
arr=i.split(" ")
res.append(arr)
for i in range(len(res)):
for j in range(len(res[i])):
print(res[i][j],sep=' ', end=' ', flush=True)
print()

Line plot using matplotlib for a dataframe of 200 columns

I am struggling to plot line graph for a data Frame I have . The data frame consist of 200 column and approximately 4900 rows.
It head of my dataframe looks as follows,
Geneid pool16.1 pool18.13 pool14.11 pool15.6 pool15.2 pool17.1 pool14.16 pool14.9 pool15.10 ... pool3.13 pool2.3 pool4.7 pool1.16 pool3.14 pool1.14 pool2.14 pool8.7 pool9.15 pool10.11
0 ABL1.exon1 1073 594 901 1164 1117 1681 1516 914 1002
... 1471 1086 1032 1600 1023 1203 1465 546 650 947
1 ABL1.exon2 974 549 738 1006 1057 1463 1463 783 1334 ... 1288 1095 967 1474 881 1134 1326 595 505 912
2 ABL1.exon3 701 619 471 748 732 1043 1145 531 935 ... 1031 871 702 1206 771 985 1236 301 301 710
3 ABL1.exon4 555 225 371 586 559 842 830 402 636 ... 831 621 555 887 575 726 936 359 238 556
4 ABL1.exon5 1063 524 817 1085 1086 1624 1448 843 1368 ... 1523 1234 1185 1883 1025 1387 1655 732 581 882
5 rows × 199 columns
So I wanted to plot a line graph from the above dataFrame, here is what I tried,When a small part of the data frame is used to plot,
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')
CDKN2A= All_pools[All_pools.Geneid.str.contains("CDKN2A") == True]
CDKN2A.plot.line(figsize=(15,10),x='Geneid',y='value')
Which gives plot which looks like this,
Where I have no information about the first column on x axis and and the plot no informative. I aiming to plot something which looks like this,
Still the plot look so screwed no much informative...Any suggestions to make it look better would be great..
If you want to see dates in the x axis you need a column with dates in your dataframe.
If you want to see several lines you need to plot several columns.
If you want to see an actual, distinct line, you need to plot less rows or more autocorrelated data or both.
You already have information about the first column on x axis: the column name and some values labelled, as usual.

Output formatting based on user parameters

250 255 260 265 270 275 280 285 290 295 300 305 310 315 320
325 330 335 340 345 350 355 360 365 370 375 380 385 390 395
400 405 410 415 420 425 430 435 440 445 450 455 460 465 470
475 480 485 490 495 500 505 510 515 520 525 530 535 540 545
550 555 560 565 570 575 580 585 590 595 600 605 610 615 620
625 630 635 640 645 650 655 660 665 670 675 680 685 690 695
700 705 710 715 720 725 730 735 740 745 750
I am trying to create a program where you enter in a lower bound (250 in this case), and upper bound(750 in this case, a skip number (5 in this case) and also tell Python how many numbers I want per line (15 in this case) before breaking the line, can somehow help me go about with formatting this? I have Python 3.1.7. Thanks!
Basically how do I tell python to create 15 numbers per line before breaking, thanks.
Use range to create a list of your numbers from 250 to 750 in steps of 5. Loop over this using enumerate which gives the value and a counter. Check modulo 15 of the counter to determine if a line break needs to be included in the print. The counter starts at zero so look for cnt%15==14.
import sys
for cnt,v in enumerate(range(250,755,5)):
if cnt%15==14:
sys.stdout.write("%d\n" % v)
else:
sys.stdout.write("%d " % v)
#ljk07
xrange is better - does not create whole list at once
You do not really need sys.stdout - just add comma
New style formatting is better
how about a little more compact and more contemporary
for cnt,v in enumerate(range(250,755,5),1):
print '{}{}'.format(v,' ' if cnt % 15 else '\n'),
EDIT:
Missed python 3 part - print even better here
for cnt,v in enumerate(range(250,755,5),1):
print( '{:3}'.format(v),end=' ' if cnt % 15 else '\n')
Another option to through in the mix is to use the itertools.grouper recipe:
def grouper(n, iterable, fillvalue=None):
"Collect data into fixed-length chunks or blocks"
from itertools import izip_longest
# grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx
args = [iter(iterable)] * n
return izip_longest(fillvalue=fillvalue, *args)
Then the loop becomes:
for line in grouper(15, range(250, 755, 5), ''):
print(' '.join(map(str, line)))
You should keep a counter and check on the modulo of the counter if you have to print a new line.
Could look something like this:
if counter % 15 == 0:
sys.stdout.write(str("\n"))

Pandas FloatingPoint Error

I'm getting a floating point error on a simple time series in pandas. I'm trying to do shift operations... but this also happens with the window functions like rolling_mean.
EDIT: For some more info... I tried to actually build this from source yesterday prior to the error. I'm not sure if the error would've occurred prior the build attempt, as I'd never messed around w/ these functions.
EDIT2: I thought I'd fixed this, but when I run this inside python it works, but when it's in ipython I get the error.
EDIT3: Numpy 1.7.0, iPython 0.13, pandas 0.7.3
In [35]: ts = Series(np.arange(12), index=DateRange('1/1/2000', periods=12, freq='T'))
In [36]: ts.shift(0)
Out[36]:
2000-01-03 0
2000-01-04 1
2000-01-05 2
2000-01-06 3
2000-01-07 4
2000-01-10 5
2000-01-11 6
2000-01-12 7
2000-01-13 8
2000-01-14 9
2000-01-17 10
2000-01-18 11
In [37]: ts.shift(1)
Out[37]: ---------------------------------------------------------------------------
FloatingPointError Traceback (most recent call last)
/Users/trenthauck/Repository/work/SQS/analysis/campaign/tv2/data/<ipython-input-37-2b7cec97d440> in <module>()
----> 1 ts.shift(1)
/Library/Python/2.7/site-packages/ipython-0.13.dev-py2.7.egg/IPython/core/displayhook.pyc in __call__(self, result)
236 self.start_displayhook()
237 self.write_output_prompt()
--> 238 format_dict = self.compute_format_data(result)
239 self.write_format_data(format_dict)
240 self.update_user_ns(result)
/Library/Python/2.7/site-packages/ipython-0.13.dev-py2.7.egg/IPython/core/displayhook.pyc in compute_format_data(self, result)
148 MIME type representation of the object.
149 """
--> 150 return self.shell.display_formatter.format(result)
151
152 def write_format_data(self, format_dict):
/Library/Python/2.7/site-packages/ipython-0.13.dev-py2.7.egg/IPython/core/formatters.pyc in format(self, obj, include, exclude)
124 continue
125 try:
--> 126 data = formatter(obj)
127 except:
128 # FIXME: log the exception
/Library/Python/2.7/site-packages/ipython-0.13.dev-py2.7.egg/IPython/core/formatters.pyc in __call__(self, obj)
445 type_pprinters=self.type_printers,
446 deferred_pprinters=self.deferred_printers)
--> 447 printer.pretty(obj)
448 printer.flush()
449 return stream.getvalue()
/Library/Python/2.7/site-packages/ipython-0.13.dev-py2.7.egg/IPython/lib/pretty.pyc in pretty(self, obj)
353 if callable(obj_class._repr_pretty_):
354 return obj_class._repr_pretty_(obj, self, cycle)
--> 355 return _default_pprint(obj, self, cycle)
356 finally:
357 self.end_group()
/Library/Python/2.7/site-packages/ipython-0.13.dev-py2.7.egg/IPython/lib/pretty.pyc in _default_pprint(obj, p, cycle)
473 if getattr(klass, '__repr__', None) not in _baseclass_reprs:
474 # A user-provided repr.
--> 475 p.text(repr(obj))
476 return
477 p.begin_group(1, '<')
/Library/Python/2.7/site-packages/pandas/core/series.pyc in __repr__(self)
696 result = self._get_repr(print_header=True,
697 length=len(self) > 50,
--> 698 name=True)
699 else:
700 result = '%s' % ndarray.__repr__(self)
/Library/Python/2.7/site-packages/pandas/core/series.pyc in _get_repr(self, name, print_header, length, na_rep, float_format)
756 length=length, na_rep=na_rep,
757 float_format=float_format)
--> 758 return formatter.to_string()
759
760 def __str__(self):
/Library/Python/2.7/site-packages/pandas/core/format.pyc in to_string(self)
99
100 fmt_index, have_header = self._get_formatted_index()
--> 101 fmt_values = self._get_formatted_values()
102
103 maxlen = max(len(x) for x in fmt_index)
/Library/Python/2.7/site-packages/pandas/core/format.pyc in _get_formatted_values(self)
90 return format_array(self.series.values, None,
91 float_format=self.float_format,
---> 92 na_rep=self.na_rep)
93
94 def to_string(self):
/Library/Python/2.7/site-packages/pandas/core/format.pyc in format_array(values, formatter, float_format, na_rep, digits, space, justify)
431 justify=justify)
432
--> 433 return fmt_obj.get_result()
434
435
/Library/Python/2.7/site-packages/pandas/core/format.pyc in get_result(self)
528
529 # this is pretty arbitrary for now
--> 530 has_large_values = (np.abs(self.values) > 1e8).any()
531
532 if too_long and has_large_values:
FloatingPointError: invalid value encountered in absolute
In [38]: ts.shift(-1)
Out[38]: ---------------------------------------------------------------------------
FloatingPointError Traceback (most recent call last)
/Users/myusername/Repository/work/SQS/analysis/campaign/tv2/data/<ipython-input-38-314ec815a7c5> in <module>()
----> 1 ts.shift(-1)
/Library/Python/2.7/site-packages/ipython-0.13.dev-py2.7.egg/IPython/core/displayhook.pyc in __call__(self, result)
236 self.start_displayhook()
237 self.write_output_prompt()
--> 238 format_dict = self.compute_format_data(result)
239 self.write_format_data(format_dict)
240 self.update_user_ns(result)
/Library/Python/2.7/site-packages/ipython-0.13.dev-py2.7.egg/IPython/core/displayhook.pyc in compute_format_data(self, result)
148 MIME type representation of the object.
149 """
--> 150 return self.shell.display_formatter.format(result)
151
152 def write_format_data(self, format_dict):
/Library/Python/2.7/site-packages/ipython-0.13.dev-py2.7.egg/IPython/core/formatters.pyc in format(self, obj, include, exclude)
124 continue
125 try:
--> 126 data = formatter(obj)
127 except:
128 # FIXME: log the exception
/Library/Python/2.7/site-packages/ipython-0.13.dev-py2.7.egg/IPython/core/formatters.pyc in __call__(self, obj)
445 type_pprinters=self.type_printers,
446 deferred_pprinters=self.deferred_printers)
--> 447 printer.pretty(obj)
448 printer.flush()
449 return stream.getvalue()
/Library/Python/2.7/site-packages/ipython-0.13.dev-py2.7.egg/IPython/lib/pretty.pyc in pretty(self, obj)
353 if callable(obj_class._repr_pretty_):
354 return obj_class._repr_pretty_(obj, self, cycle)
--> 355 return _default_pprint(obj, self, cycle)
356 finally:
357 self.end_group()
/Library/Python/2.7/site-packages/ipython-0.13.dev-py2.7.egg/IPython/lib/pretty.pyc in _default_pprint(obj, p, cycle)
473 if getattr(klass, '__repr__', None) not in _baseclass_reprs:
474 # A user-provided repr.
--> 475 p.text(repr(obj))
476 return
477 p.begin_group(1, '<')
/Library/Python/2.7/site-packages/pandas/core/series.pyc in __repr__(self)
696 result = self._get_repr(print_header=True,
697 length=len(self) > 50,
--> 698 name=True)
699 else:
700 result = '%s' % ndarray.__repr__(self)
/Library/Python/2.7/site-packages/pandas/core/series.pyc in _get_repr(self, name, print_header, length, na_rep, float_format)
756 length=length, na_rep=na_rep,
757 float_format=float_format)
--> 758 return formatter.to_string()
759
760 def __str__(self):
/Library/Python/2.7/site-packages/pandas/core/format.pyc in to_string(self)
99
100 fmt_index, have_header = self._get_formatted_index()
--> 101 fmt_values = self._get_formatted_values()
102
103 maxlen = max(len(x) for x in fmt_index)
/Library/Python/2.7/site-packages/pandas/core/format.pyc in _get_formatted_values(self)
90 return format_array(self.series.values, None,
91 float_format=self.float_format,
---> 92 na_rep=self.na_rep)
93
94 def to_string(self):
/Library/Python/2.7/site-packages/pandas/core/format.pyc in format_array(values, formatter, float_format, na_rep, digits, space, justify)
431 justify=justify)
432
--> 433 return fmt_obj.get_result()
434
435
/Library/Python/2.7/site-packages/pandas/core/format.pyc in get_result(self)
528
529 # this is pretty arbitrary for now
--> 530 has_large_values = (np.abs(self.values) > 1e8).any()
531
532 if too_long and has_large_values:
FloatingPointError: invalid value encountered in absolute
I would add this as a comment, but I don't have the privilege to do that yet :)
It works for me in python and iPython 0.12; iPython 0.13 is still in development (see http://ipython.org/ ), and, since the errors you're getting seem to involve formatting in the iPython 0.13 egg, I suspect that might be the cause. Try with iPython 0.12 instead-- if it works, file a bug report with iPython and then probably stick with 0.12 until 0.13 is (more) stable.

Categories