modify a string python - python

I have a csv file structured in the following way:
num mut
36 L
45 P
...
where num indicates the position of a mutation and mut indicates the mutation. I have to replace the character at position num of a string with the letter mut. I wrote the following code in Python:
import pandas as pd
import os
df = pd.read_csv(r'file.csv')
df_tmp=df.astype(str)
df_tmp["folder"]=df_tmp["num"]+df_tmp["mut"] #add a third column
f = open("sequence.txt", 'r')
content = f.read()
for i in range(len(df)):
    num=df_tmp.num.loc[[i]]-13
    num=num.astype(int)
    prev=num-1
    prev=prev.astype(int)
    mut=df_tmp.mut.loc[[i]]
    mut=mut.astype(str)
    new="".join((content[:prev],mut,content[num:])) #this should modify the file
But it raises
TypeError: slice indices must be integers or None or have an __index__ method
How can I solve this?
Edit: maybe this makes it clearer what I want to do. I have to insert only the first mutation into my sequence, save it to a file, and copy the file into a folder named after the third column (the one I added in the code). Then I have to do the same with the second mutation, then the third, and so on. But I have to insert only one mutation at a time.

Multiple mutations:
IIUC, you'd be better off stepping out of pandas: convert your DataFrame to a dictionary, iterate and join:
# input DataFrame
df = pd.DataFrame({'num': [36, 45], 'mut': ['L', 'P']})
# input string
string = '-'*50
# '--------------------------------------------------'
# get the positions to modify
pos = df.set_index('num')['mut'].to_dict()
# {36: 'L', 45: 'P'}
# iterate over the string, replace the characters if in the dictionary
# NB. define start=1 if you want the first position to be 1
new_string = ''.join([pos.get(i, c) for i,c in enumerate(string, start=0)])
# '------------------------------------L--------P----'
Single mutations:
string = '-'*50
# '--------------------------------------------------'
for idx, r in df.iterrows():
    new_string = string[:r['num']-1]+r['mut']+string[r['num']:]
    # or
    # new_string = ''.join([string[:r['num']-1], r['mut'], string[r['num']:]])
    with open(f'file_{idx}.txt', 'w') as f:
        f.write(new_string)
output:
file_0.txt
-----------------------------------L--------------
file_1.txt
--------------------------------------------P-----

I tried your code with a sample file.csv and an empty sequence.txt file.
The first line of the for loop,
num=df_tmp.num.loc[[i]]-13
# raises an error because df_tmp holds num as str; convert it back first:
num=df_tmp.num.loc[[i]].astype(int)-13
# astype(int) turns the value into an integer before the subtraction
The next error is on the last line, the slice indices TypeError.
It occurs because prev and num, which you use to slice content, are
one-element Series rather than ints (and mut is a Series rather than a str).
To get the plain values, take the first element by position with .iloc[0]
(a plain [0] looks up the label 0, which only exists when i is 0):
new="".join((content[:prev.iloc[0]],mut.iloc[0],content[num.iloc[0]:]))
There shouldn't be an error now.
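The whole workflow described in the question's edit (one mutation per file, each file in a folder named like 36L) can be sketched without pandas. This is only a minimal sketch under stated assumptions: the helper names apply_mutation and write_single_mutants, the output file name sequence.txt, and the offset parameter (standing in for the question's -13) are all hypothetical, and positions are taken to be 1-based as in the question's slicing.

```python
from pathlib import Path


def apply_mutation(sequence, position, letter, offset=0):
    """Return a copy of sequence with one character replaced.

    position is 1-based; offset is subtracted first (the question uses 13).
    """
    index = position - offset - 1  # convert to a 0-based index
    if not 0 <= index < len(sequence):
        raise IndexError(f"position {position} is outside the sequence")
    return sequence[:index] + letter + sequence[index + 1:]


def write_single_mutants(sequence, mutations, out_dir, offset=0):
    """Write one file per mutation, each in a folder named like '36L'."""
    written = []
    for position, letter in mutations:
        folder = Path(out_dir) / f"{position}{letter}"
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / "sequence.txt"  # hypothetical file name
        # Always start from the original sequence, so mutations never accumulate.
        target.write_text(apply_mutation(sequence, position, letter, offset))
        written.append(target)
    return written
```

With a DataFrame like the one in the question, the mutations argument could be built as list(zip(df["num"], df["mut"])), which yields plain Python values instead of one-element Series.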

Related

Efficiently create list of list of list with varying amount of input

I have a .txt file with floating point numbers inside. This file always contains an even number of values which need to be formatted as follows: [[[a,b],[c,d],[e,f]]]
The values always need to be in pairs of two, even when there are fewer or more values: [[[a,b], ... [y,z]]]
So it needs to go from this:
3.31497114423 50.803721015, 7.09205325687 50.803721015, 7.09205325687 53.5104033474, 3.31497114423 53.5104033474, 3.31497114423 50.803721015
To this:
[[[3.31497114423,50.803721015],[7.09205325687,50.803721015],[7.09205325687,53.5104033474],[3.31497114423,53.5104033474],[3.31497114423,50.803721015]]]
I have the feeling this can be done fairly easily and efficiently. The code I have so far works, but is far from efficient...
with open(filename) as f:
    for line in f:
        footprint = line.strip()
        splitted = footprint.split(' ')
        list_str = []
        for coordinate in splitted:
            list_str.append(coordinate.replace(',', ''))
        list_floats = [float(x) for x in list_str]
        footprint = [list_floats[x:x+2] for x in range(0, len(list_floats), 2)]
        return [footprint]
Any help is greatly appreciated!
The split function is very useful in scenarios such as these.
with open(filename) as f:
    # Split the string of numbers into a list, using ", " as the separator
    new_list = f.read().split(", ")
    # Split every element of this list on the space character
    # and convert the resulting strings into floats
    for i in range(len(new_list)):
        new_list[i] = list(map(float, new_list[i].split(" ")))
    new_list = [new_list]
The first split converts the text from this
3.31497114423 50.803721015, 7.09205325687 50.803721015, 7.09205325687 53.5104033474, 3.31497114423 53.5104033474, 3.31497114423 50.803721015
To this
['3.31497114423 50.803721015', '7.09205325687 50.803721015', '7.09205325687 53.5104033474', '3.31497114423 53.5104033474', '3.31497114423 50.803721015']
The second split converts that to this
[['3.31497114423', '50.803721015'], ['7.09205325687', '50.803721015'], ['7.09205325687', '53.5104033474'], ['3.31497114423', '53.5104033474'], ['3.31497114423', '50.803721015']]
Then the mapping of the float function converts it to this (the list converts the map object to a list object)
[[3.31497114423, 50.803721015], [7.09205325687, 50.803721015], [7.09205325687, 53.5104033474], [3.31497114423, 53.5104033474], [3.31497114423, 50.803721015]]
The last brackets place the whole thing into another list
[[[3.31497114423, 50.803721015], [7.09205325687, 50.803721015], [7.09205325687, 53.5104033474], [3.31497114423, 53.5104033474], [3.31497114423, 50.803721015]]]
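The same steps can also be written as a single list comprehension. The sketch below is one possible way to do it, not the answer's own code: the function name parse_footprint is hypothetical, and it assumes pairs are separated by commas and the two values in a pair by whitespace.

```python
def parse_footprint(text):
    """Turn 'x y, x y, ...' into [[[x, y], [x, y], ...]]."""
    pairs = [
        [float(value) for value in pair.split()]  # split() ignores extra spaces
        for pair in text.split(",")
        if pair.strip()  # skip empty pieces, e.g. after a trailing comma
    ]
    return [pairs]  # wrap once more to get the triple nesting
```

Calling pair.split() without an argument splits on any whitespace, so a trailing newline or a missing space after a comma does not break the parsing.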

How to turn a list containing strings into a list containing integers (Python)

I am optimizing PyRay (https://github.com/oscr/PyRay) to be a usable Python ray-casting engine, and I am working on a feature that takes a text file and turns it into a list (which PyRay uses as a map). But when I read the file into a list, its contents end up as strings, which PyRay cannot use. So my question is: how do I convert a list of strings into integers? Here is my code so far. (I commented out the actual code so I can test this.)
print("What map file to open?")
mapopen = input(">")
mapload = open(mapopen, "r")
worldMap = [line.split(',') for line in mapload.readlines()]
print(worldMap)
The map file:
1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,
2,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,
1,0,2,0,0,3,0,0,0,0,0,0,0,2,3,2,3,0,0,2,
2,0,3,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,1,
1,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,2,
2,3,1,0,0,2,0,0,0,2,3,2,0,0,0,0,0,0,0,1,
1,0,0,0,0,0,0,0,0,1,0,1,0,0,1,2,0,0,0,2,
2,0,0,0,0,0,0,0,0,2,0,2,0,0,2,1,0,0,0,1,
1,0,0,0,0,0,0,0,0,1,3,1,0,0,0,0,0,0,0,2,
2,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,
1,0,2,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,2,
2,0,3,0,0,2,0,0,0,0,0,0,0,2,3,2,1,2,0,1,
1,0,0,0,0,3,0,0,0,0,0,0,0,1,0,0,2,0,0,2,
2,3,1,0,0,2,0,0,2,1,3,2,0,2,0,0,3,0,3,1,
1,0,0,0,0,0,0,0,0,3,0,0,0,1,0,0,2,0,0,2,
2,0,0,0,0,0,0,0,0,2,0,0,0,2,3,0,1,2,0,1,
1,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,3,0,2,
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,1,
2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,
Please help me, I have been searching all about and I can't find anything.
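Try this. Did you want a list of lists, or just one big list? For a list of lists:
with open(filename, "r") as txtr:
    data = txtr.read()
data = data.split("\n") # split into list of strings
# skip blank lines and the empty field left by each trailing comma
data = [[int(v) for v in x.split(",") if v.strip()] for x in data if x.strip()]
The last line splits each string on commas, skips the empty fields (every line of the map file ends with a trailing comma, and int('') would raise a ValueError), and applies int() to the rest. It does this for every non-empty line in data. I hope it helps.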
Here it is for just one large list.
with open(filename, "r") as txtr:
    data = txtr.read() # the whole file as one string
data = data.replace("\n", ",") # turn the line breaks into separators
data = data.split(",") # now you have a list of strings
data = [int(v) for v in data if v.strip()] # applies int() to each non-empty element
Look into the map built-in function in Python.
L=['1', '2', '3']
numbers = map(int, L) # don't call this variable "map", or you shadow the built-in
for el in numbers:
    print(el)
# prints:
# 1
# 2
# 3
As per your question, below is a way to turn a list of strings into a list of integers (or into single integers, if you index the list to get each value). Hope this helps.
myStrList = ["1","2","\n","3"]
myNewIntList = []
for x in myStrList:
    if x != "\n":
        y = int(x)
        myNewIntList.append(y)
print(myNewIntList)
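For the map file shown in the question, where every line ends with a trailing comma, the conversion could be wrapped in a small function. This is a minimal sketch; the name parse_map is hypothetical, and it assumes every non-empty field is a valid integer.

```python
def parse_map(text):
    """Convert comma-separated lines of digits into a list of lists of ints."""
    world_map = []
    for line in text.splitlines():
        # The filter drops the empty string left after each trailing comma.
        cells = [int(cell) for cell in line.split(",") if cell.strip()]
        if cells:  # ignore blank lines
            world_map.append(cells)
    return world_map
```

It could be used as worldMap = parse_map(mapload.read()), keeping the variable names from the question.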

python add data to existing excel cell Win32com

Assume I have A1 as the only cell in a workbook, and it's blank.
I want my code to add "1" "2" and "3" to it so it says "1 2 3"
As of now I have:
NUMBERS = [1, 2, 3, 4, 5]
ThisSheet.Cells(1,1).Value = NUMBERS
This just writes the first value to the cell. I tried
ThisSheet.Cells(1,1).Value = Numbers[0-2]
but that just puts the LAST value in there. Is there a way for me to just add all of the data in there? This information will always be in String format, and I need to use Win32Com.
update:
I did
stringVar = ', '.join(str(v) for v in LIST)
UPDATE: this .join works perfectly for the NUMBERS list. Now I tried applying it to another list that looks like this
LIST=[Description Good\nBad, Description Valid\nInvalid]
If I print LIST[0] The outcome is
Description Good
Bad
Which is what I want. But if I use .join on this one, it prints
('Description Good\nBad, Description Valid\nInvalid')
so for this one I need it to print as though I did LIST[0] and LIST[1]
So if you want to put each number in a different cell, you would do something like:
it = 1
for num in NUMBERS:
    ThisSheet.Cells(1,it).Value = num
    it += 1
Or if you want the first 3 numbers in the same cell:
ThisSheet.Cells(1,1).Value = ' '.join([str(num) for num in NUMBERS[:3]])
Or all of the elements in NUMBERS:
ThisSheet.Cells(1,1).Value = ' '.join([str(num) for num in NUMBERS])
EDIT
Based on your question edit, for strings containing \n, and assuming that every newline character should move you to the next row:
# Split LIST[0] on the \n character
splitted_lst0 = LIST[0].split('\n')
# Iterate through the pieces, writing each one to its own row (Cells takes the row first, then the column)
it = 1
for line in splitted_lst0:
    ThisSheet.Cells(it,1).Value = line
    it += 1
If you want to do this for the whole LIST and not only for LIST[0], first merge it with the join method, using '\n' as the separator so that neighbouring items stay apart, and split it right afterwards:
joined_list = ('\n'.join(LIST)).split('\n')
And then iterate through it the same way as before.
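The string handling can be tried out without Excel. The sketch below only prepares the values; the helper names cell_text and lines_for_rows are hypothetical, and the actual write would still go through ThisSheet.Cells(row, column).Value as in the snippets above.

```python
def cell_text(values, sep=" "):
    """Join any values into the single string meant for one cell."""
    return sep.join(str(value) for value in values)


def lines_for_rows(items):
    """Flatten strings containing newlines into one entry per row."""
    rows = []
    for item in items:
        rows.extend(item.split("\n"))
    return rows
```

For example, cell_text(NUMBERS[:3]) gives the "1 2 3" asked for in the question, and each entry returned by lines_for_rows(LIST) would be written to its own row.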

How do I avoid errors when parsing a .csv file in python?

I'm trying to parse a .csv file that contains two columns: Ticker (the company ticker name) and Earnings (the corresponding company's earnings). When I read the file using the following code:
f = open('earnings.csv', 'r')
earnings = f.read()
The result when I run print earnings looks like this (it's a single string):
Ticker;Earnings
AAPL;52131400000
TSLA;-911214000
AMZN;583841600
I use the following code to split the string on the line break character (\n), and then split each resulting line on the semicolon character:
earnings_list = earnings.split('\n')
string_earnings = []
for string in earnings_list:
    colon_list = string.split(';')
    string_earnings.append(colon_list)
The result is a list of lists where each list contains the company's ticker at index [0] and its earnings at index [1], like this:
[['Ticker', 'Earnings\r\r'], ['AAPL', '52131400000\r\r'], ['TSLA', '-911214000\r\r'], ['AMZN', '583841600\r\r']]
Now I want to convert the earnings at index [1] of each list, which are currently strings, into integers. So I first remove the first list, which contains the column names:
headless_earnings = string_earnings[1:]
Afterwards I try to loop over the resulting list to convert the values at index[1] of each list into integers with the following:
numerical = []
for i in headless_earnings:
    num = int(i[1])
    numerical.append(num)
I get the following error:
num = int(i[1])
IndexError: list index out of range
How is that index out of range?
You are most likely mishandling the line endings.
If I try your code with this string: "Ticker;Earnings\r\r\nAAPL;52131400000\r\r\nTSLA;-911214000\r\r\nAMZN;583841600" it works.
But with this one: "Ticker;Earnings\r\r\nAAPL;52131400000\r\r\nTSLA;-911214000\r\r\nAMZN;583841600\r\r\n" it doesn't.
Explanation: when the string ends with '\n', split('\n') produces a final empty string '', and splitting that on ';' gives ['']. So at the end, Python tries to access [''][1], hence the error.
So a very simple workaround would be to remove the last '\n' (if you're sure it's a '\n', otherwise you might have surprises).
You could write this:
earnings_list = earnings[:-1].split('\n')
this will fix your error.
If you want to be sure you remove a last '\n', you can write:
earnings_list = earnings[:-1].split('\n') if earnings[-1] == '\n' else earnings.split('\n')
EDIT: test code:
#!/usr/bin/env python2
earnings = "Ticker;Earnings\r\r\nAAPL;52131400000\r\r\nTSLA;-911214000\r\r\nAMZN;583841600\r\r\n"
earnings_list = earnings[:-1].split('\n') if earnings[-1] == '\n' else earnings.split('\n')
string_earnings = []
for string in earnings_list:
    colon_list = string.split(';')
    string_earnings.append(colon_list)
headless_earnings = string_earnings[1:]
#print(headless_earnings)
numerical = []
for i in headless_earnings:
    num = int(i[1])
    numerical.append(num)
print numerical
Output:
nico#ometeotl:~/temp$ ./test_script2.py
[52131400000, -911214000, 583841600]
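A variant that does not depend on what the last character is would skip empty lines instead of trimming the string. This is a sketch in Python 3 (the thread's code is Python 2); the function name parse_earnings is hypothetical, and it assumes every non-empty line has exactly two fields separated by ';'.

```python
def parse_earnings(text):
    """Map each ticker to its earnings as an int, ignoring blank lines."""
    rows = [
        line.split(";")
        for line in text.splitlines()  # also splits on stray '\r' characters
        if line.strip()  # drop the empty lines that caused the IndexError
    ]
    # rows[0] is the header line: Ticker;Earnings
    return {ticker: int(value) for ticker, value in rows[1:]}
```

Because splitlines() treats '\r' as a line boundary too, the odd '\r\r\n' endings from the question only produce extra empty lines, which the filter removes.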

Splitting a list and then splitting the last element of that list

For my assignment, I have to split a list twice:
I need to split the address string from the input line using '+', and then split the last part of the resulting list at the ','
in_file = open('yelp-short.txt')
def parse_line(text_file):
    a = text_file.strip('\n')
    b = a.split('+')
    c = b.split(',')
    print c
I get the error: 'list' object has no attribute 'split'
What other methods could I use to do this?
The hint is that you split the last part of the resulting list.
Therefore, you want to pull out the last part and split it:
def parse_line(line):
    line = line.strip('\n')
    parts = line.split('+')
    addrs = parts[-1].split(',')
    return addrs
I would use rpartition:
>>> 'a+b+c,d,e'.rpartition('+')[-1].split(',')
['c', 'd', 'e']
The problem is that you are trying to split up a list, not a string. You need to get a particular item out of that list:
b = a.split('+')
c = b[-1].split(',')
split applies to strings and returns a list. Thus a is a string and b is a list, and you can't split a list. Let's say a is "X+Y,Z". b will be the list ["X", "Y,Z"]. What you want to split is the element at index 1 (normal people's 2nd) of the list b: b[1].split(','). This way there is no error. You can also say "last" by writing b[-1]; here it is the same element.
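If the leading fields are needed as well as the address parts, both can be returned together. This is a minimal sketch; the sample line in the usage note is made up, since the contents of yelp-short.txt are not shown in the question.

```python
def parse_line(line):
    """Split a line on '+', then split its last field on ','."""
    parts = line.strip("\n").split("+")
    fields = parts[:-1]  # everything before the last '+'
    address = parts[-1].split(",")  # the last field, split on commas
    return fields, address
```

With a made-up line such as 'Joe+Pizza+12 Main St,Troy,NY\n', fields would hold the first two pieces and address the three comma-separated parts.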
