How to parse a simple CSV file?


I have a simple CSV-file like this:
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Note ,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32
1,,,,,X,,,,,,,,X,,,,,,,,X,,,,,,,,X,,,
2,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
4,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
5,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
I need to parse it into a 2D array containing 1s where the Xs are and 0s otherwise, ignoring the headers/extra rows.
After reading the docs on the csv module I wrote a simple script like so:
import csv
csvfile = open('input.csv', 'rb')
reader = csv.reader(csvfile,dialect='excel', delimiter=' ', quotechar='|')
data = []
rowCount = 0
for row in reader:
    if(rowCount > 2): #skip first 3 rows (2 empty and 1 label)
        dataRow = []
        for i in xrange(1,len(row[0])):#skip 1st label column
            dataRow.append(1 if row[0][i] == 'X' else 0) #append 1s for X, 0s otherwise
        data.append(dataRow)
    rowCount += 1
print data
This gives me the expected output:
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
for
,,,,,X,,,,,,,,X,,,,,,,,X,,,,,,,,X,,,
The ternary condition could be written as ord(row[0][i])/88, but would it be possible to map each string row to an integer row of ones and zeros?
Is there a more 'pythonic' way of writing this?

You should use delimiter=',':
reader = csv.reader(csvfile, dialect='excel', delimiter=',', quotechar='|')
Actually:
dialect='excel' and delimiter=',' are the defaults, and quotechar='|' is not needed for your example file (keep it if your real data needs it). So this is shorter:
reader = csv.reader(csvfile)
Throw away the first three lines:
[next(reader) for _ in range(3)]
Read all lines:
data = [[1 if entry=='X' else 0 for entry in row[1:]] for row in reader]
This is equivalent to:
data = []
for row in reader:
    data.append([1 if entry=='X' else 0 for entry in row[1:]])
Of course, open your file in a with statement so that it is closed automatically when the block ends:
with open('input.csv', 'rb') as csvfile:
    # Put the rest of the algorithm here.
# The file is closed automatically as soon as the code dedents.
This is the prime example of a context manager.
Putting it all together:
import csv
with open('input.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile)
    [next(reader) for _ in range(3)]
    data = [[1 if entry=='X' else 0 for entry in row[1:]] for row in reader]
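The code above targets Python 2, where csv files are opened in binary mode. As a hedged sketch of the same approach on Python 3, the file is opened in text mode with newline='' instead. The SAMPLE string and the parse_marks helper are hypothetical names introduced here only so the example runs on its own; they are not part of the original answer.

```python
import csv
import io

# Hypothetical stand-in for input.csv: two empty rows, one label row, two data rows.
SAMPLE = (
    ",,,,\n"
    ",,,,\n"
    "Note ,1,2,3,4\n"
    "1,,X,,\n"
    "2,,,,X\n"
)


def parse_marks(csvfile, header_rows=3):
    """Return rows of 0/1, skipping the header rows and the label column."""
    reader = csv.reader(csvfile)
    for _ in range(header_rows):
        next(reader)  # discard the empty rows and the label row
    return [[1 if entry == 'X' else 0 for entry in row[1:]] for row in reader]


data = parse_marks(io.StringIO(SAMPLE))
print(data)  # [[0, 1, 0, 0], [0, 0, 0, 1]]
```

With a real file, the call would look like `with open('input.csv', newline='') as csvfile: data = parse_marks(csvfile)`.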

Firstly, skipping three rows can be done like:
for _ in range(3):
    next(reader)
then you can use a list comprehension on the rest:
data = [[int(cell == 'X') for cell in line[1:]] for line in reader]
This gives you a list of lists:
>>> data
[[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
If efficiency is important and the lines are long, using itertools.islice lets you slice each line without creating a new list.
Note that your delimiter and quotechar settings don't seem to match with the example file, so you might want to double-check that.
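As a rough illustration of the itertools.islice suggestion, the sketch below skips the three header rows and the label column lazily, without building intermediate lists. It is written for Python 3, and the SAMPLE string is a hypothetical stand-in for the real file.

```python
import csv
import io
from itertools import islice

# Hypothetical stand-in for input.csv: two empty rows, one label row, two data rows.
SAMPLE = (
    ",,,,\n"
    ",,,,\n"
    "Note ,1,2,3,4\n"
    "1,,X,,\n"
    "2,,,,X\n"
)

reader = csv.reader(io.StringIO(SAMPLE))

# islice(reader, 3, None) yields rows starting at the fourth one;
# islice(line, 1, None) yields cells starting at the second one.
data = [[int(cell == 'X') for cell in islice(line, 1, None)]
        for line in islice(reader, 3, None)]

print(data)  # [[0, 1, 0, 0], [0, 0, 0, 1]]
```

For short rows like these the difference from line[1:] is negligible; the lazy slice only starts to matter when rows are very long.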

One thing I want to add to @jonrsharpe's answer.
The Pythonic way of using a file is given below.
It takes care of closing the file for you when your calculation is complete:
with open('input.csv', 'rb') as csvfile:
    # use the csvfile here

A: Try to avoid overheads ... a smart-enough reader comes from numpy.genfromtxt()
import numpy as np
DATA = np.genfromtxt( aFH,                # aFH = open( <aFileNAME>, "r" )
                      # skiprows  = 1,    # DeprecationWarning: The use of `skiprows` is deprecated, it will be removed in numpy 2.0.
                      skip_header = 3,    # two empty ",,," rows + the "Note,1,2,3.." row
                      delimiter   = ",",
                      converters  = { 1: lambda aString: mPlotDATEs.date2num( datetime.datetime.strptime( aString[1:-1], "%m/%d/%y %H:%M" ) ),
                                      0: lambda aString: float( aString[1:-1] )
                                      }   # left as an example of the power of genfromtxt()'s inline converters
                      )
print "DATA has shape of ", DATA.shape
aFH.close()
The converters ( a dict keyed by column index ) are the key: specified per column, they may post-process X-s into 1-s and blanks into 0-s.
+ Other benefits
Most serious number crunching in Python is, visibly or not, based on numpy. So the footprint of the imported modules is smaller than if you imported pandas ( very smart and very powerful ) just to read rows of text without any further use of the advanced DataFrame capabilities.
Additionally, you may benefit from other features of the numpy ecosystem as your DATA.size grows bigger and bigger. Judging from your sample data, this is a typical case for sparse matrices ( available in scipy.sparse ), where you do not pay the overhead of storing and handling almost-empty cells ( 0-s ).
This may give you an even more memory-compact solution than a boolean mask array, which still spends storage on every 0 cell of the dense matrix.
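To make the sparse idea concrete without depending on numpy or scipy, here is a plain-Python sketch that stores only the coordinates of the X cells. It is an illustration of the principle, not the scipy.sparse API; SAMPLE and marked_cells are hypothetical names, and the code assumes Python 3.

```python
import csv
import io

# Hypothetical stand-in for input.csv: two empty rows, one label row, two data rows.
SAMPLE = (
    ",,,,\n"
    ",,,,\n"
    "Note ,1,2,3,4\n"
    "1,,X,,\n"
    "2,,,,X\n"
)


def marked_cells(csvfile, header_rows=3):
    """Return the set of (row, column) positions that hold an 'X'."""
    reader = csv.reader(csvfile)
    for _ in range(header_rows):
        next(reader)  # skip the empty rows and the label row
    return {(r, c)
            for r, row in enumerate(reader)
            for c, entry in enumerate(row[1:])  # drop the label column
            if entry == 'X'}


cells = marked_cells(io.StringIO(SAMPLE))
print(sorted(cells))    # [(0, 1), (1, 3)]
print((0, 1) in cells)  # True: a cell is 1 exactly when its position is in the set
```

Memory use then grows with the number of X marks rather than with the size of the grid.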
Q: Is there a more-python-ic approach?
Hopefully this will not start a flame war. Taking a minimalistic view: the for-loops are avoided ( a first plus, both formally & performance-wise ), and fans of one-liners may also be happy, as the whole import process takes one statement ( a rather talkative one, but still one SLOC, if anyone seriously cares :o) ). Finally, the inline lambdas are powerful and about as python-ic as it gets, though in principle these "Chuck Norrises"-of-code originated with LISP in the late 1950s.

Related

Torch tensor randomly contains huge, impossible values

I'm new to PyTorch and I'm trying to debug my code using IntelliJ PyCharm. I have a line that logs the content of a torch.IntTensor
logger.debug(f"action_tensor = {action_tensor}")
Most of the time this seems to work just fine, but occasionally the print out shows one or several huge values in the tensor, such as:
2021-08-06 09:21:17,737 DEBUG main.py state_tensor = tensor([2089484293, 0, 0, 1, 0, 1,
1, 0, 0, 0, 0, 0,
0, 1, 1, 1, 1, 1,
1, 1, 1, 2, 2, 0,
0, 0, 0, 0, 0, 6,
0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0],
dtype=torch.int32)
The tensor is created by extracting a few values from the state of an object
rolls = int(self.rolls)
allowed = [int(self.scorecard[c]["allowed"] == True) for c in self.scorecard]
scored = [int(self.scorecard[c]["score"]) if self.scorecard[c]["score"] else int(0) for c in self.scorecard]
return torch.cat([torch.IntTensor(rolls),
torch.IntTensor(allowed),
torch.IntTensor(scored)])
I've checked multiple times, and there is no way any of these values are as large as the example above (e.g. 2089484293). I've tried creating a numpy array instead of a tensor, and printing that shows no problems. I suspect there is something I don't know about how torch.IntTensor works.
What is wrong with the way I create my tensor that results in these huge values appearing sometimes?

Can one hardcode convolutional filters to detect characters in a CNN?

In Pytorch, you can hardcode your filters to be whatever you like.
At the moment, I'm doing text detection and I need to identify the location of a certain piece of information. This information always starts with the letter 'X'. Could hardcoding an 'X' filter radically improve detection performance?
Here's what I have so far:
import torch
import torch.nn as nn
import matplotlib.pyplot as plt
kernel = (torch.zeros((9, 9)) + \
torch.eye(9) + \
torch.rot90(torch.eye(9))).type(torch.bool)*1
print(kernel)
tensor([[1, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 1, 0, 0, 0, 0, 0, 1, 0],
[0, 0, 1, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 1, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 1, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 1, 0],
[1, 0, 0, 0, 0, 0, 0, 0, 1]])
We can visualize it like this:
plt.imshow(kernel)
plt.show()
Then, we can set the filter weights as such:
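conv = nn.Conv2d(in_channels=1,
                 out_channels=1,
                 kernel_size=9,   # must match the 9x9 kernel above
                 stride=3,
                 bias=False)
# Conv2d weights are float tensors shaped (out_channels, in_channels, height, width)
conv.weight.data = kernel.float().reshape(1, 1, 9, 9)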

No, I do not think this will improve detection performance.
Detection performance is usually known as "inference," that is, it is the process of running your network on new data where the training labels are unknown. Hard-coding the weights will make absolutely no difference on the test performance of the network, as you still need to compute the convolutions.
We could also ask if it will improve the training performance. Here, too, I expect that the answer is no. One of the reasons that neural networks achieve the high accuracy they do is that they pick up on subtle patterns in the training data. A real x on a real page is very unlikely to align with the pixels you set to 1 in your example. Slight rotations or sub-pixel shifts or even different aspect ratios of the letter will change what the optimal filter will look like.
Indeed, one of the major changes in Machine Learning as we move into the Deep Learning era is that neural networks do a better job of picking the low-level features than a human engineer could do.
But thank you for the question -- just the code snippet of how to hard-code the value of a layer was useful to me!

How to plot eigenvalues representing symbolic functions in Python?

I need to calculate the eigenvalues of an 8x8 matrix and plot each of the eigenvalues against a symbolic variable occurring in the matrix. For the matrix I'm using I get 8 different eigenvalues, each of which is a function of "W", my symbolic variable.
Using python I tried calculating the eigenvalues with Scipy and Sympy which worked kind of, but the results are stored in a weird way (at least for me as a newbie not understanding much of programming so far) and I didn't find a way to extract just one eigenvalue in order to plot it.
import numpy as np
import sympy as sp
W = sp.Symbol('W')
w0=1/780
wl=1/1064
# This is my 8x8-matrix
A= sp.Matrix([[w0+3*wl, 2*W, 0, 0, 0, np.sqrt(3)*W, 0, 0],
              [2*W, 4*wl, 0, 0, 0, 0, 0, 0],
              [0, 0, 2*wl+w0, np.sqrt(3)*W, 0, 0, 0, np.sqrt(2)*W],
              [0, 0, np.sqrt(3)*W, 3*wl, 0, 0, 0, 0],
              [0, 0, 0, 0, wl+w0, np.sqrt(2)*W, 0, 0],
              [np.sqrt(3)*W, 0, 0, 0, np.sqrt(2)*W, 2*wl, 0, 0],
              [0, 0, 0, 0, 0, 0, w0, W],
              [0, 0, np.sqrt(2)*W, 0, 0, 0, W, wl]])
# Calculating eigenvalues
eva = A.eigenvals()
evaRR = np.array(list(eva.keys()))
eva1p = evaRR[0] # <- this is my try to refer to the first eigenvalue
In the end I hope to get a plot over "W" where the interesting range is [-0.002 0.002]. For the ones interested it's about atomic physics and W refers to the rabi frequency and I'm looking at so called dressed states.

You're not doing anything incorrectly. I think you're just caught up because your eigenvalues look so jumbled and complicated.
import numpy as np
import sympy as sp
import matplotlib.pyplot as plt
W = sp.Symbol('W')
w0=1/780
wl=1/1064
# This is my 8x8-matrix
A= sp.Matrix([[w0+3*wl, 2*W, 0, 0, 0, np.sqrt(3)*W, 0, 0],
              [2*W, 4*wl, 0, 0, 0, 0, 0, 0],
              [0, 0, 2*wl+w0, np.sqrt(3)*W, 0, 0, 0, np.sqrt(2)*W],
              [0, 0, np.sqrt(3)*W, 3*wl, 0, 0, 0, 0],
              [0, 0, 0, 0, wl+w0, np.sqrt(2)*W, 0, 0],
              [np.sqrt(3)*W, 0, 0, 0, np.sqrt(2)*W, 2*wl, 0, 0],
              [0, 0, 0, 0, 0, 0, w0, W],
              [0, 0, np.sqrt(2)*W, 0, 0, 0, W, wl]])
# Calculating eigenvalues
eva = A.eigenvals()
evaRR = np.array(list(eva.keys()))
# The above is copied from your question
# We have to answer what exactly the eigenvalue is in this case
print(type(evaRR[0])) # >>> Piecewise
# Okay, so it's a piecewise function (link to documentation below).
# In the documentation we see that we can use the .subs method to evaluate
# the piecewise function by substituting a symbol for a value. For instance,
print(evaRR[0].subs(W, 0)) # Will substitute 0 for W
# This prints out something really nasty with tons of fractions..
# We can evaluate this mess with sympy's numerical evaluation method, N
print(sp.N(evaRR[0].subs(W, 0)))
# >>> 0.00222190090611143 - 6.49672880062804e-34*I
# That's looking more like it! Notice the e-34 exponent on the imaginary part...
# I think it's safe to assume we can just trim that off.
# This is done by setting the chop keyword to True when using N:
print(sp.N(evaRR[0].subs(W, 0), chop=True)) # >>> 0.00222190090611143
# Now let's try to plot each of the eigenvalues over your specified range
fig, ax = plt.subplots(3, 3) # 3x3 grid of plots (for our 8 e.vals)
ax = ax.flatten() # This is so we can index the axes easier
plot_range = np.linspace(-0.002, 0.002, 10) # Range from -0.002 to 0.002 with 10 steps
for n in range(8):
    current_eigenval = evaRR[n]
    # There may be a way to vectorize this computation, but I'm not familiar enough with sympy.
    evaluated_array = np.zeros(np.size(plot_range))
    # This will be our Y-axis (the eigenvalue at each W). It is set to be the same shape as
    # plot_range and is initially filled with all zeros.
    for i in range(np.size(plot_range)):
        evaluated_array[i] = sp.N(current_eigenval.subs(W, plot_range[i]),
                                  chop=True)
        # The above line is evaluating your eigenvalue at a specific point,
        # approximating it numerically, and then chopping off the imaginary part.
    ax[n].plot(plot_range, evaluated_array, "c-")
    ax[n].set_title("Eigenvalue #{}".format(n))
    ax[n].grid()
plt.tight_layout()
plt.show()
And as promised, the Piecewise documentation.

Storing planetdata for a solar system program in python

I've been creating a solar system simulation as a project for fun and practice in python. The problem I'm facing is that storing the data for the planets in the .py itself is getting rather hectic. Example:
#shaped as: name, parent, type, size, orbital radius (AU), x, y, r, t, hidden, theta, orbitalperiod (y), color
#for type 0=sun, 1=planet, 2=moon, 3=asteroid (unused)
#x, y, r and t start as 0 and get assigned values later on. hidden is 0 or 1, if obscured by body
solsystem = [('sun', 'none', 0, 20, 0, centerx, centery, 0, 0, 0, 0, 0, (255, 255, 0)),
('earth', 'sun', 1, 2, 1, 0, 0, 0, 0, 0, 0, 1, (0, 0, 255)),
('luna', 'earth', 1, 1, 0.04, 0, 0, 0, 0, 0, 0, 0.075, (169,169,169)), #actual radius is 0.00254
('venus', 'sun', 1, 2, 0.675, 0, 0, 0, 0, 0, 0, 0.616, (255,255,0)),
('mercury', 'sun', 1, 2, 0.387, 0, 0, 0, 0, 0, 0, 0.24, (169,169,169)),
('mars', 'sun', 1, 2, 1.524, 0, 0, 0, 0, 0, 0, 1.88, (255, 0, 0)),
('jupiter', 'sun', 1, 4, 5.20, 0, 0, 0, 0, 0, 0, 11.86, (255, 0, 0)),
('io', 'jupiter', 1, 1, 0.08, 0, 0, 0, 0, 0, 0, 0.00484, (169,169,169)), #different radiuses for moons to keep visibility
('europa', 'jupiter', 1, 1, 0.12, 0, 0, 0, 0, 0, 0, 0.0097, (169,169,169)),
('ganymede', 'jupiter', 1, 1, 0.16, 0, 0, 0, 0, 0, 0, 0.0195, (169,169,169)),
('callisto', 'jupiter', 1, 1, 0.2, 0, 0, 0, 0, 0, 0, 0.0456, (169,169,169))]
This is what I currently have, and I plan on adding asteroids, more planets and moons, and all that stuff... What would be a better way to do this? I'd like to store the data in a more organized way, something like a spreadsheet perhaps, so I could easily add more values if needed.
For reference, the full code: https://pastebin.com/L8n23bLt (It's working pretty decently but there's quite a few kinks and bugs I still want to work out. Any tips on stuff I'm doing wrong here are appreciated too!)

It's OK to store values like that if it works for you and your project.
I prefer to use the configparser library or JSON files.
ConfigParser vs JSON files for config
Here is a short example from my project:
import configparser
def get_current_scenario_number():
    """
    Get scenario number from temporary config.
    :return: the stored scenario number as an int
    """
    config = configparser.ConfigParser()
    config.read('scenario_data.ini')
    scenario_number = config['scenario_data']['scenario_config']
    return int(scenario_number)
def set_current_scenario_number(scenario_number):
    """
    Change scenario number in temporary config.
    :param scenario_number: the new scenario number
    :return:
    """
    config = configparser.ConfigParser()
    config.read('scenario_data.ini')
    try:
        config['scenario_data']['scenario_config'] = str(scenario_number)
    except KeyError:
        # The section does not exist yet, so create it first.
        config.add_section('scenario_data')
        config.set('scenario_data', 'scenario_config', str(scenario_number))
    with open('scenario_data.ini', 'w') as configfile:
        config.write(configfile)

Consider a simple csv or yaml file as the next step. Both will allow you to explicitly name the fields and read the elements as dictionaries. If maintaining a file by hand gets too cumbersome, consider sqlite.
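As a sketch of what "explicitly named fields read as dictionaries" could look like with the csv module, the example below uses csv.DictReader. The column names, the space-separated colour format, the SAMPLE text, and the load_bodies helper are all assumptions made for illustration; only the values are taken from the question.

```python
import csv
import io

# Hypothetical file contents: a header row names every field.
SAMPLE = (
    "name,parent,type,size,orbital_radius,orbital_period,color\n"
    "sun,none,0,20,0,0,255 255 0\n"
    "earth,sun,1,2,1,1,0 0 255\n"
    "luna,earth,1,1,0.04,0.075,169 169 169\n"
)


def load_bodies(csvfile):
    """Read one dict per body, converting the text fields to numbers."""
    bodies = []
    for row in csv.DictReader(csvfile):
        bodies.append({
            "name": row["name"],
            "parent": row["parent"],
            "type": int(row["type"]),
            "size": int(row["size"]),
            "orbital_radius": float(row["orbital_radius"]),
            "orbital_period": float(row["orbital_period"]),
            # the colour is stored as three space-separated integers
            "color": tuple(int(part) for part in row["color"].split()),
        })
    return bodies


solsystem = load_bodies(io.StringIO(SAMPLE))
print(solsystem[1]["name"], solsystem[1]["color"])  # earth (0, 0, 255)
```

Runtime-only values such as x, y, r, t and hidden are left out of the file on purpose here, since the question says they all start at 0 and are assigned later.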

There are a lot of options in this space that work well. At the simplest end of things would be a CSV file (a text-based spreadsheet) storing all of your objects, which you could then read in and parse using the csv module of the standard library.
Your CSV file could look like this:
"sun","none",0,20,0,111,111,0,0,0,0,0,"(255,255,0)"
"earth","sun",1,2,1,0,0,0,0,0,0,1,"(0,0,255)"
"luna","earth",1,1,0.04,0,0,0,0,0,0,0.075,"(169,169,169)"
...
And the reading code like this:
import csv, ast
with open('test.csv') as csvfile:
    reader = csv.reader(csvfile, quoting=csv.QUOTE_NONNUMERIC)
    solsystem = []
    for row in reader:
        solsystem.append(row)
        solsystem[-1][12] = ast.literal_eval(solsystem[-1][12])
A similar alternative is to format your files using JSON, YAML, or XML, and read and parse them with the json or xml modules of the standard library (YAML needs a third-party package such as PyYAML).
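For the JSON route, a minimal sketch might look like the following. The field names and the SAMPLE text are assumptions chosen for illustration; with a real file, json.load on the open file object would replace json.loads.

```python
import json

# Hypothetical contents of a planets.json file.
SAMPLE = """
[
  {"name": "sun", "parent": null, "size": 20,
   "orbital_radius": 0, "orbital_period": 0, "color": [255, 255, 0]},
  {"name": "earth", "parent": "sun", "size": 2,
   "orbital_radius": 1, "orbital_period": 1, "color": [0, 0, 255]},
  {"name": "luna", "parent": "earth", "size": 1,
   "orbital_radius": 0.04, "orbital_period": 0.075, "color": [169, 169, 169]}
]
"""

bodies = json.loads(SAMPLE)
for body in bodies:
    body["color"] = tuple(body["color"])  # JSON arrays load as lists

# Index the bodies by name so a parent can be looked up directly.
by_name = {body["name"]: body for body in bodies}
print(by_name["luna"]["parent"])  # earth
```

Unlike the CSV version, numbers and nested values keep their types, so no ast.literal_eval step is needed.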
At the more complex end of the space are full relational (and non-relational) databases. Your use case, plus the fact that there is a sqlite3 module in the standard library makes sqlite a good potential option in this space.
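If sqlite turns out to be the better fit, the standard-library sqlite3 module is enough to get started. The table layout below is a hypothetical sketch, and an in-memory database is used so the example leaves no file behind; a real program would pass a file name to sqlite3.connect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bodies ("
    "name TEXT PRIMARY KEY, parent TEXT, size INTEGER, "
    "orbital_radius REAL, orbital_period REAL)"
)
conn.executemany(
    "INSERT INTO bodies VALUES (?, ?, ?, ?, ?)",
    [
        ("sun", None, 20, 0, 0),
        ("jupiter", "sun", 4, 5.20, 11.86),
        ("io", "jupiter", 1, 0.08, 0.00484),
        ("europa", "jupiter", 1, 0.12, 0.0097),
    ],
)

# Queries replace hand-written loops, e.g. all moons of one planet:
moons = [row[0] for row in conn.execute(
    "SELECT name FROM bodies WHERE parent = ? ORDER BY orbital_radius",
    ("jupiter",),
)]
print(moons)  # ['io', 'europa']
conn.close()
```

Adding a new attribute later then becomes a schema change (a new column) rather than an edit to every tuple.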

Making a checkers board in python [closed]

I am trying to make a 2D array that is 8x8 for a checkers game in python. How would I go about doing this?
Here is my current code:
class Board():
    board = [[]]
    def __init__(self,width,height):
        self.width = width
        self.height = height
    def __repr__(self):
        print(self.board)
    def setup(self):
        for y in range(self.height):
            for x in range(self.width):
                self.board[y].append(0)
board = Board(8,8)
board.setup()
print(board.board)

board = [[0]*8 for i in range(8)] # This makes an 8x8 list
>>>[[0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0]]
def setup(self):
    self.board = [[0]*self.width for i in range(self.height)]
You only need to replace the 8s with your instance attributes (self.height, self.width) and assign the result to self.board.
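Put together, one possible shape for the class is sketched below. It is an illustration under a few assumptions rather than the asker's final design: the board is stored per instance instead of on the class, and __repr__ returns a string instead of printing.

```python
class Board:
    def __init__(self, width, height):
        self.width = width
        self.height = height
        self.board = []  # per-instance state, filled in by setup()

    def setup(self):
        # One independent inner list per row: height rows of width zeros.
        self.board = [[0] * self.width for _ in range(self.height)]

    def __repr__(self):
        # __repr__ must return a string rather than print one.
        return "\n".join(" ".join(str(cell) for cell in row) for row in self.board)


board = Board(8, 8)
board.setup()
board.board[0][1] = 1  # changes a single cell because the rows are separate lists
print(board)
```

The comprehension matters: each pass of the loop builds a new row, so writing to one row never shows up in another.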

At the point where your code does
self.board[y].append(0)
self.board has only one element, so for y>0 this will fail. You need to make self.board contain not one empty list but self.height empty lists for this to work.
I am not going into more detail because, as one commenter has mentioned, this sounds a lot like homework and in such cases it's best for everyone not to fill in all the details.

Nested lists can sometimes be difficult to work with. If you don't absolutely need a 2D list, I recommend using a dict. Creating a 2D array with a dict is easy. You can use a tuple of (row,column) as the index.
For example:
board = {}
for row in range(8):
    for column in range(8):
        board[(row, column)] = 0
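The same dictionary can also be built with a comprehension. The short sketch below shows how lookups and a bounds check might then work; the value 1 for an occupied square is only an assumed convention.

```python
# Build the 8x8 board in one expression; keys are (row, column) tuples.
board = {(row, column): 0 for row in range(8) for column in range(8)}

board[(2, 3)] = 1          # mark one square as occupied (assumed convention)
print(board[(2, 3)])       # 1
print((8, 0) in board)     # False: membership doubles as a bounds check

# Collect every occupied square without nested loops.
occupied = sorted(square for square, value in board.items() if value)
print(occupied)            # [(2, 3)]
```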
