I have this CSV that I have to modify using Python.
The number of files varies each time. The input CSV files I have are just lists of coordinates (x, y, z), and I have to turn each file into a 'model' that contains the same coordinates plus some information/headers.
The model looks like this:
Number | 1 | |
Head | N | E | El
Begin | list | list | list
| . | . | .
| . | . | .
| . | . | .
| . | . | .
End | . | . | .
| . | . | .
BeginR | Ok | |
EndR | | |
The dots are the coordinates from the lists.
So far I've managed to write almost everything.
What's left is writing Begin and End in the first column.
Because the size of the list varies, I'm having trouble placing them where they should be: Begin on the same line as the first coordinates, and End on the second-to-last coordinate line.
This is my updated code :
for i in ficList:
    with open(i, newline='') as f:
        reader = csv.reader(f, delimiter = ';')
        next(reader) # skip the header
        for row in reader:
            coord_x.append(row[0]) # X
            coord_y.append(row [1]) # Y
            coord_z.append(row[2]) # Z
    list_list = [coord_x, coord_y, coord_z] # list of coordinates
    len_x = len(coord_x) # length of list
    with open(i, 'w', newline='') as fp:
        writer = csv.writer(fp, delimiter = ';')
        writer.writerow(['Number', number])
        writer.writerow(['Head','N', 'E', 'El'])
        for l in range(len_x):
            if l == 0:
                writer.writerow(['Begin',list_list[0][l], list_list[1][l], list_list[2][l]])
            if l == len_x-2 :
                writer.writerow(['End',list_list[0][l], list_list[1][l], list_list[2][l]])
            writer.writerow(['',list_list[0][l], list_list[1][l], list_list[2][l]]) # write the coordinates
        writer.writerow(['BeginR', 'Ok'])
        writer.writerow(['EndR'])
    coord_x.clear() # empty list x
    coord_y.clear() # empty list y
    coord_z.clear() # empty list z
You're probably better off defining the row labels in advance in a dict (a map) and looking them up for each row. Also, list_list isn't really needed; you can just use the separate lists:
...
    with open(i, 'w', newline='') as fp:
        writer = csv.writer(fp, delimiter = ';')
        writer.writerow(['Number', number])
        writer.writerow(['Head','N', 'E', 'El'])
        row_label_map = {0:'Begin',len_x-2:'End'}
        for l in range(len_x):
            row_label = row_label_map.get(l,"")
            writer.writerow([row_label, coord_x[l], coord_y[l], coord_z[l]])
        writer.writerow(['BeginR', 'Ok'])
        writer.writerow(['EndR'])
...
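Also, rather than clearing coord_x etc. afterwards, you can create fresh empty lists at the start of each loop iteration. Note that a Python for loop does not create its own scope, so lists defined outside the loop are not deleted between files; they have to be reset one way or the other.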
With your latest code, I'm guessing the issue is that you first write the line with the Begin tag and then write the same line again without it. Move the logic into an if..elif..else block:
for l in range(len_x):
    if l == 0:
        writer.writerow(['Begin',list_list[0][l], list_list[1][l], list_list[2][l]])
    elif l == len_x-2 :
        writer.writerow(['End',list_list[0][l], list_list[1][l], list_list[2][l]])
    else:
        writer.writerow(['',list_list[0][l], list_list[1][l], list_list[2][l]]) # write the coordinates
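Putting the answers above together, a minimal sketch of the whole rewrite for one file might look like the following. It assumes, as in the question, a ';'-delimited input with one header line and three columns per row; the function name write_model and its number parameter are hypothetical stand-ins for the asker's loop body and variable. Because the label is chosen with if/elif/else, each coordinate row is written exactly once.

```python
import csv


def write_model(path, number):
    """Rewrite the ';'-separated coordinate file at *path* in the 'model' layout.

    Hypothetical helper: assumes one header line followed by x;y;z rows.
    """
    # Read every coordinate row first, skipping the header and blank lines.
    with open(path, newline='') as f:
        reader = csv.reader(f, delimiter=';')
        next(reader)
        coords = [row[:3] for row in reader if row]

    n = len(coords)
    with open(path, 'w', newline='') as fp:
        writer = csv.writer(fp, delimiter=';')
        writer.writerow(['Number', number])
        writer.writerow(['Head', 'N', 'E', 'El'])
        for idx, (x, y, z) in enumerate(coords):
            # The label depends only on the row position; one row per coordinate.
            if idx == 0:
                label = 'Begin'
            elif idx == n - 2:
                label = 'End'
            else:
                label = ''
            writer.writerow([label, x, y, z])
        writer.writerow(['BeginR', 'Ok'])
        writer.writerow(['EndR'])
```

You would call it once per file, e.g. `for i in ficList: write_model(i, number)`. With fewer than three coordinate rows, the End position collides with Begin or falls before the first row, so only Begin is written; handle that case separately if it can occur.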
To me it seems like it would be easier to first use sed to modify the input CSV so that it includes an extra column holding the Begin and End tags, like this:
sed -e 's/^/,/' -e '1s/^/Begin/' -e '$ s/^/End/' -e 's/^,/ ,/' test.csv
Then you can simply print the columns as they are, without having to add logic in Python for when to add the extra tags. This assumes that the input CSV is called test.csv.
Related
If anyone can help: I need my code's output to look like this:
Hammad | Won | 5
The code I'm using is:
f = open("Statistics.txt", "a")
f.write(str(player_name) +''+ str(Outcome)+''+str(max_guesses)+"\n"
f = open("Statistics.txt", "r")
print(f.read())
f.close()
I need the output to be:
Hammad | Won | 6
Instead I'm getting:
Hammad Won 6
Python does not add the | character automatically when you concatenate strings; you have to add it yourself:
f.write(str(player_name) +' | '+ str(Outcome)+' | '+str(max_guesses)+"\n")
PS: your f.write call is missing its closing parenthesis (every function call needs one).
Try replacing the write line with:
f.write(f'{player_name} | {Outcome} | {max_guesses}\n')
Replace the f.write line with this:
f.write(str(player_name)+'|'+str(Outcome)+'|'+str(max_guesses)+"\n")
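A sketch combining these suggestions, assuming Python 3: ' | '.join builds the delimited line (every field must be a string, hence the str() conversion), and a with block closes the file after appending. In the question's code the append handle is never explicitly closed, so flushing it depends on the interpreter; with makes that explicit. The helper names append_stat and read_stats are hypothetical.

```python
def append_stat(path, player_name, outcome, max_guesses):
    """Append one 'name | outcome | guesses' line to *path* (hypothetical helper)."""
    # str.join only accepts strings, so convert every field first.
    line = ' | '.join(str(field) for field in (player_name, outcome, max_guesses))
    # Leaving the with block closes (and therefore flushes) the file.
    with open(path, 'a') as f:
        f.write(line + '\n')


def read_stats(path):
    """Return the whole statistics file as one string (hypothetical helper)."""
    with open(path) as f:
        return f.read()
```

For example, `append_stat("Statistics.txt", "Hammad", "Won", 6)` followed by `print(read_stats("Statistics.txt"))` prints the stored lines in the `Hammad | Won | 6` form.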
I am trying to parse a text document line by line and in doing so I stumbled onto some weird behavior which I believe is caused by the presence of some kind of ankh symbol (☥). I am not able to copy the real symbol here.
In my code I try to determine whether a '+' symbol is present in the first characters of each line. To see if this worked, I added a print statement that shows the boolean and the line.
The relevant part of my code:
with open(file_path) as input_file:
    content = input_file.readlines()
    for line in content:
        plus = '+' in line[0:2]
        print('Plus: {0}, line: {1}'.format(plus,line))
A file I could try to parse:
+------------------------------
row 1 with some content
+------+------+-------+-------
☥+------+------+-------+------
| col 1 | col 2 | col 3 ...
+------+------+-------+-------
|_ valu | val | | dsf |..
|_ valu | valu | ...
What I get as output:
Plus: True, line: +------------------------------
Plus: False, line: row 1 with some content
Plus: True, line: +------+------+-------+-------
♀+------+------+-------+------
Plus: False, line: | col 1 | col 2 | col 3 ...
Plus: True, line: +------+------+-------+-------
Plus: False, line: |_ valu | val | | dsf |..
Plus: False, line: |_ valu | valu | ...
So my question is: why does it print the line containing the symbol without the 'Plus: True/False' prefix? How should I solve this?
Thanks.
What you are seeing is the female sign (♀). In the original IBM PC character set (code page 437) it is the glyph shown for byte 0x0c, which is the FormFeed control character, aka Ctrl-L.
If you are parsing text data with these present, they likely were inserted to indicate to a printer to start a new page.
From wikipedia:
Form feed is a page-breaking ASCII control character. It forces the printer to eject the current page and to continue printing at the top of another. Often, it will also cause a carriage return. The form feed character code is defined as 12 (0xC in hexadecimal), and may be represented as control+L or ^L.
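One possible fix, sketched under the assumption that the form feeds carry no meaning for your parsing: remove '\f' (0x0C) from each line before testing for '+', and print with repr() (the !r conversion) while debugging, so control characters show up as escapes such as '\x0c' instead of being interpreted by the console. parse_lines is a hypothetical helper name.

```python
FORM_FEED = '\f'  # 0x0C, displayed as ♀ by code page 437 consoles


def parse_lines(lines):
    """Yield (has_plus, cleaned_line) for each line (hypothetical helper).

    Form feeds are removed before the check, so a page-break character at the
    start of a line no longer reaches the console.
    """
    for line in lines:
        cleaned = line.replace(FORM_FEED, '').rstrip('\n')
        yield '+' in cleaned[0:2], cleaned


sample = ['+-----\n', 'row 1 with some content\n', '\f+------+------\n', '| col 1 | col 2\n']
for plus, line in parse_lines(sample):
    # !r prints the repr, making any remaining control characters visible.
    print('Plus: {0}, line: {1!r}'.format(plus, line))
```

With a real file you can pass the file object directly: `with open(file_path) as input_file: for plus, line in parse_lines(input_file): ...`.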
My list is laid out like this:
1st name, last name, wins, losses
zac,kop,5,6
jack,mop,0,11
farth,tal,11,0
darth,vader,2,9
nump,kk,1,10
My code is this:
def points():
    template = "|{0:30}|{1:30}|{2:30}|{3:30}|{4:30}"
    lol = template.format("1st name: ","2nd Name: ", "won: ", "lost: ","points: ")
    print(lol)
    with open(r'prac2.txt', 'r') as file:
        for line in file:
            data = line.split(',')
            if data[2] >= ('1'):
                poin = int(float(data[2])) * 3
                add_list = data.insert(4,poin)
                print('|{0[0]:<30}|{0[1]:<30}|{0[2]:<30}|{0[3]:<30}|{0[4]:<30}'.format(data))
points()
The code is supposed to print out only the info of the players with at least 1 win.
Each win is 3 points, so the program has to calculate each players points and display it along side their information in a table.
The problem is that when it prints the points, they appear on a new, indented line; only the last player in the list isn't affected by this. Please help.
The problem is that you don't remove the line end characters from the last element of the line! Right before printing, after you inserted the points, data looks like this:
['zac', 'kop', '5', '6\n', 15]
To fix it, add strip() to your data = ... line:
data = line.strip().split(',')
Some more points:
don't compare strings when you want to compare numbers
skip the header line, e.g. using next(file) before the loop
no point in casting wins to float and then int, just use int
instead of insert(4, ...) just use append to add to the end of the list
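Applying all of the points above, the function might look like this. It is a sketch that assumes the file layout shown in the question (a header line followed by name,name,wins,losses rows); the path parameter is an addition so the function can be pointed at another file.

```python
def points(path='prac2.txt'):
    """Print players with at least one win, plus their points (3 per win)."""
    template = "|{0:<30}|{1:<30}|{2:<30}|{3:<30}|{4:<30}"
    print(template.format("1st name: ", "2nd Name: ", "won: ", "lost: ", "points: "))
    with open(path) as file:
        next(file)  # skip the header line
        for line in file:
            # strip() removes the trailing '\n' that broke the table layout.
            data = line.strip().split(',')
            if len(data) < 4:
                continue  # ignore blank or malformed lines
            wins = int(data[2])  # compare numbers, not strings
            if wins >= 1:
                data.append(wins * 3)  # append instead of insert(4, ...)
                print(template.format(*data))
```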
if data[2] >= ('1'):
    poin = int(float(data[2])) * 3
    data[3] = data[3].strip()
    add_list = data.insert(4,poin)
The last string in your split line contains a newline. You need to strip() it.
or, when you read the line:
data = line.strip().split(',')
Which gives you what you want:
>>> points()
|1st name: |2nd Name: |won: |lost: |points:
|zac |kop |5 |6 |15
|farth |tal |11 |0 |33
|darth |vader |2 |9 |6
|nump |kk |1 |10 |3
Also relevant:
template = "|{0:<30}|{1:<30}|{2:<30}|{3:<30}|{4:<30}"
...
print(template.format(*data))
works and actually uses the template you designed.
*data unpacks the list into its individual elements, so you don't have to index them one by one; in this case it is equivalent to:
print(template.format(data[0], data[1], data[2], data[3], data[4]))
At least one of your problems is data type:
if data[2] >= ('1'):
Here you have two strings and are comparing them lexicographically. You really want two numbers and to compare them arithmetically.
Without any error handling, that would look like this:
if int(data[2]) >= 1:
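To see why the string comparison only seemed to work, compare a few values directly. The has_wins helper below is hypothetical; it shows one way to add the error handling mentioned above, treating any cell that is not an integer (such as the header's ' wins') as zero wins.

```python
# String comparison is lexicographic: characters are compared one at a time.
print('10' >= '9')            # False, because '1' sorts before '9'
print(int('10') >= int('9'))  # True, numeric comparison


def has_wins(field):
    """Return True if *field* parses as an integer >= 1 (hypothetical helper)."""
    try:
        return int(field) >= 1
    except ValueError:
        # Non-numeric cells, e.g. the header's ' wins', count as no wins.
        return False


print(has_wins('5'), has_wins('0'), has_wins(' wins'))  # True False False
```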
What's an easy way to convert the output of a Python PrettyTable into a programmatically usable format such as CSV?
The output looks like this:
C:\test> nova list
spu+--------------------------------------+--------+--------+------------+-------------+-----------------------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+--------+--------+------------+-------------+-----------------------------------+
| 6bca09f8-a320-44d4-a11f-647dcec0aaa1 | tester | ACTIVE | - | Running | OpenStack-net=10.0.0.1, 10.0.0.3 |
+--------------------------------------+--------+--------+------------+-------------+-----------------------------------+
Perhaps this will get you close:
nova list | grep -v '\-\-\-\-' | sed 's/^[^|]\+|//g' | sed 's/|\(.\)/,\1/g' | tr '|' '\n'
This will strip the --- lines
Remove the leading |
Replace all but the last | with ,
Replace the last | with \n
Here's a really ugly one-liner:
import csv
s = """\
spu+--------------------------------------+--------+--------+------------+-------------+-----------------------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+--------+--------+------------+-------------+-----------------------------------+
| 6bca09f8-a320-44d4-a11f-647dcec0aaa1 | tester | ACTIVE | - | Running | OpenStack-net=10.0.0.1, 10.0.0.3 |
+--------------------------------------+--------+--------+------------+-------------+-----------------------------------+"""
result = [tuple(filter(None, map(str.strip, splitline))) for line in s.splitlines() for splitline in [line.split("|")] if len(splitline) > 1]
# Python 3: the csv module needs a text-mode file opened with newline=''
with open('output.csv', 'w', newline='') as outcsv:
    writer = csv.writer(outcsv)
    writer.writerows(result)
I can unwrap it a bit to make it nicer:
splitlines = s.splitlines()
splitdata = (line.split("|") for line in splitlines)
# toss the lines that don't have any data in them -- pure separator lines
splitdata = filter(lambda fields: len(fields) > 1, splitdata)
header, *data = [[field.strip() for field in line if field.strip()] for line in splitdata]
result = [header] + data
# I'm really just separating these, then re-joining them, but sometimes having
# the headers separately is an important thing!
Or possibly more helpful:
result = []
for line in s.splitlines():
    splitdata = line.split("|")
    if len(splitdata) == 1:
        continue # skip lines with no separators
    linedata = []
    for field in splitdata:
        field = field.strip()
        if field:
            linedata.append(field)
    result.append(linedata)
@AdamSmith's answer has a nice method for parsing the raw table string. Here are a few additions to turn it into a generic function (I chose not to use the csv module, so there are no additional dependencies).
def ptable_to_csv(table, filename, headers=True):
    """Save PrettyTable results to a CSV file.
    Adapted from @AdamSmith https://stackoverflow.com/questions/32128226
    :param PrettyTable table: Table object to get data from.
    :param str filename: Filepath for the output CSV.
    :param bool headers: Whether to include the header row in the CSV.
    :return: None
    """
    raw = table.get_string()
    data = [tuple(filter(None, map(str.strip, splitline)))
            for line in raw.splitlines()
            for splitline in [line.split('|')] if len(splitline) > 1]
    if table.title is not None:
        data = data[1:]  # drop the title row
    if not headers:
        data = data[1:]  # drop the header row
    with open(filename, 'w') as f:
        for d in data:
            f.write('{}\n'.format(','.join(d)))
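One caveat about building CSV lines with ','.join: in the sample table the Networks cell is 'OpenStack-net=10.0.0.1, 10.0.0.3', which itself contains a comma, so a CSV reader would split it into two fields. Below is a sketch that keeps the same split-on-'|' idea but lets the standard-library csv module quote such fields. Unlike filter(None, ...), slicing off only the outer cells with parts[1:-1] keeps empty cells, so columns stay aligned when a value is blank. Both function names are hypothetical.

```python
import csv
import io


def pretty_text_to_rows(raw):
    """Split a PrettyTable-style text dump into lists of stripped cell values.

    Only lines containing '|' are kept, so '+----+' separator lines are dropped.
    """
    rows = []
    for line in raw.splitlines():
        parts = line.split('|')
        if len(parts) > 1:
            # Drop the empty strings produced by the outer '|' characters.
            rows.append([cell.strip() for cell in parts[1:-1]])
    return rows


def rows_to_csv_text(rows):
    """Serialize rows as CSV text; fields containing commas get quoted."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator='\n').writerows(rows)
    return buf.getvalue()
```

To produce a file, you could write the returned text to a file opened with open(filename, 'w', newline=''), or pass that file object to csv.writer directly instead of using StringIO.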
Here's a solution using a regular expression. It also works for an arbitrary number of columns (the number of columns is determined by counting the number of plus signs in the first input line).
input_string = """spu+--------------------------------------+--------+--------+------------+-------------+-----------------------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+--------+--------+------------+-------------+-----------------------------------+
| 6bca09f8-a320-44d4-a11f-647dcec0aaa1 | tester | ACTIVE | - | Running | OpenStack-net=10.0.0.1, 10.0.0.3 |
+--------------------------------------+--------+--------+------------+-------------+-----------------------------------+"""
import re, csv, sys

def pretty_table_to_tuples(input_str):
    lines = input_str.split("\n")
    # raw string: "\+" in a normal string is an invalid escape sequence
    num_columns = len(re.findall(r"\+", lines[0])) - 1
    line_regex = r"\|" + (r" +(.*?) +\|"*num_columns)
    for line in lines:
        m = re.match(line_regex, line.strip())
        if m:
            yield m.groups()

w = csv.writer(sys.stdout)
w.writerows(pretty_table_to_tuples(input_string))
I have two text files in the following format:
The first is this on every line:
Key1:Value1
The second is this:
Key2:Value2
Is there a way I can replace Value1 in file1 by the Value2 obtained from using it as a key in file2?
For example:
file1:
foo:hello
bar:world
file2:
hello:adam
bar:eve
I would like to get:
foo:adam
bar:eve
There isn't necessarily a match between the two files on every line. Can this be done neatly in awk or something, or should I do it naively in Python?
Create two dictionaries, one for each file. For example:
file1 = {}
for line in open('file1', 'r'):
    k, v = line.strip().split(':')
    file1[k] = v
Or if you prefer a one-liner:
file1 = dict(l.strip().split(':') for l in open('file1', 'r'))
Then you could do something like:
result = {}
for key, value in file1.items():
    if value in file2:
        result[key] = file2[value]
Another approach is to build the file1 dict with keys and values swapped and use sets. For example, if file1 contains foo:bar, your file1 dict is {'bar': 'foo'}.
for key in set(file1) & set(file2):
    result[file1[key]] = file2[key]
Basically, you can quickly find common elements using set intersection, so those elements are guaranteed to be in file2 and you don't waste time checking for their existence.
Edit: As pointed out by @pepr, you can use collections.OrderedDict for the first method if order is important to you (in Python 3.7 and later, a plain dict already preserves insertion order).
The awk solution:
awk '
BEGIN {FS = OFS = ":"}
NR==FNR {val[$1] = $2; next}
$1 in val {$2 = val[$1]; print; next}
$2 in val {$2 = val[$2]}
{print}
' file2 file1
join -t : -1 2 -2 1 -o 0 2.2 -a 2 <(sort -k 2 -t : file1) <(sort file2)
The input files must be sorted on the field they are joined on.
The options:
-t : - Use a colon as the delimiter
-1 2 - Join on field 2 of file 1
-2 1 - Join on field 1 of file 2
-o 0 2.2 - Output the join field followed by field 2 from file2 (separated by the delimiter character)
-a 2 - Output unjoined lines from file2
Once you have:
file1 = {'foo':'hello', 'bar':'world'}
file2 = {'hello':'adam', 'bar':'eve'}
You can do an ugly one liner:
print(dict([(i,file2[i]) if i in file2 else (i,file2[j]) if j in file2 else (i,j) for i,j in file1.items()]))
{'foo': 'adam', 'bar': 'eve'}
This is needed because, in your example, both the keys and the values of file1 are used as keys into file2.
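For real files, a Python 3 sketch of the same key-then-value lookup as the one-liner might look like the following. It assumes one key:value pair per line and splits on the first colon only; load_pairs and substitute are hypothetical names. Because dicts keep insertion order in Python 3.7+, the result follows file1's line order (a key repeated in file1 appears once, with its last value).

```python
def load_pairs(path):
    """Read 'key:value' lines into a dict, skipping lines without a colon."""
    pairs = {}
    with open(path) as f:
        for line in f:
            line = line.rstrip('\n')
            if ':' in line:
                key, value = line.split(':', 1)
                pairs[key] = value
    return pairs


def substitute(file1_path, file2_path):
    """Return file1's lines with values replaced via lookups in file2.

    file1's key is looked up first, then its value; lines with no match
    in file2 are kept unchanged.
    """
    lookup = load_pairs(file2_path)
    result = []
    for key, value in load_pairs(file1_path).items():
        if key in lookup:
            value = lookup[key]
        elif value in lookup:
            value = lookup[value]
        result.append(f'{key}:{value}')
    return result
```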
This might work for you (probably GNU sed):
sed 's#\([^:]*\):\(.*\)#/\\(^\1:\\|:\1$\\)/s/:.*/:\2/#' file2 | sed -f - file1
If you do not consider using basic Unix/Linux commands cheating, then here is a solution using paste and awk.
paste file1.txt file2.txt | awk -F ":" '{ print $1":"$3 }'
TXR:
@(next "file2")
@(collect)
@key:@value1
@ (cases)
@ (next "file1")
@ (skip)
@value2:@key
@ (or)
@ (bind value2 key)
@ (end)
@ (output)
@value2:@value1
@ (end)
@(end)
Run:
$ txr subst.txr
foo:adam
bar:eve