Modify a string in Python
I have a csv file structured in the following way:
num mut
36 L
45 P
...
where num indicates the position of a mutation and mut indicates the mutation. I have to replace the character at position num of a string with the letter mut. I wrote the following code in Python:
import pandas as pd
import os
df = pd.read_csv(r'file.csv')
df_tmp=df.astype(str)
df_tmp["folder"]=df_tmp["num"]+df_tmp["mut"] #add a third column
f = open("sequence.txt", 'r')
content = f.read()
for i in range(len(df)):
    num=df_tmp.num.loc[[i]]-13
    num=num.astype(int)
    prev=num-1
    prev=prev.astype(int)
    mut=df_tmp.mut.loc[[i]]
    mut=mut.astype(str)
    new="".join((content[:prev],mut,content[num:])) #this should modify the file
But it raises:
TypeError: slice indices must be integers or None or have an __index__ method
How can I solve this?
Edit: to make clearer what I want to do: I have to insert only the first mutation into my sequence, save it to a file, and copy the file into a folder named after the third column (the one I added in the code). Then I do the same with the second mutation, then the third, and so on. I have to insert only one mutation at a time.
multiple mutations:
IIUC, you'd be better off converting your DataFrame to a dictionary, then iterating over the string and joining:
# input DataFrame
df = pd.DataFrame({'num': [36, 45], 'mut': ['L', 'P']})
# input string
string = '-'*50
# '--------------------------------------------------'
# get the positions to modify
pos = df.set_index('num')['mut'].to_dict()
# {36: 'L', 45: 'P'}
# iterate over the string, replace the characters if in the dictionary
# NB. define start=1 if you want the first position to be 1
new_string = ''.join([pos.get(i, c) for i,c in enumerate(string, start=0)])
# '------------------------------------L--------P----'
single mutations:
string = '-'*50
# '--------------------------------------------------'
for idx, r in df.iterrows():
    new_string = string[:r['num']-1]+r['mut']+string[r['num']:]
    # or
    # new_string = ''.join([string[:r['num']-1], r['mut'], string[r['num']:]])
    with open(f'file_{idx}.txt', 'w') as f:
        f.write(new_string)
output:
file_0.txt
-----------------------------------L--------------
file_1.txt
--------------------------------------------P-----
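The single-mutation slicing above can also be wrapped in a small helper. This is only a sketch: the name mutate and the bounds check are illustrative assumptions, not part of the original answer, and positions are taken to be 1-based as in the loop above.

```python
def mutate(sequence: str, position: int, letter: str) -> str:
    """Return a copy of sequence with the 1-based position replaced by letter."""
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    index = position - 1  # convert to Python's 0-based indexing
    return sequence[:index] + letter + sequence[index + 1:]


sequence = "-" * 50
mutations = [(36, "L"), (45, "P")]  # same sample data as the DataFrame above

# Each variant starts from the original sequence, so mutations never accumulate.
variants = {f"{pos}{letter}": mutate(sequence, pos, letter) for pos, letter in mutations}
```

Because strings are immutable, every call returns a new string and the original sequence stays untouched, which is what "only one mutation at a time" needs.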
I tried your code with a sample file.csv and an empty sequence.txt file.
The first line inside your for loop,
num=df_tmp.num.loc[[i]]-13
# raises an error because num at that location is a str; to fix it:
num=df_tmp.num.loc[[i]].astype(int)-13
# astype converts the value to int before subtracting
After this, the next error is on the last line: the slice indices TypeError.
It occurs because prev and num, which you use to slice content, are
one-element Series rather than ints (mut is a Series as well). Extract the
scalar values with .iloc[0], like this:
new="".join((content[:prev.iloc[0]],mut.iloc[0],content[num.iloc[0]:]))
There shouldn't be an error now.
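For the full workflow described in the question's edit (one mutation per file, each file in a folder named after num and mut), a pandas-free sketch could look like the following. The comma delimiter, the output file name sequence.txt, the OFFSET constant and the function name write_single_mutants are all assumptions made for illustration.

```python
import csv
import io
import tempfile
from pathlib import Path

OFFSET = 13  # assumption: the same numbering shift as the "-13" in the question


def write_single_mutants(csv_text: str, sequence: str, out_dir: Path) -> list:
    """Write one file per mutation, each in a folder named '<num><mut>'."""
    written = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        num = int(row["num"]) - OFFSET  # a plain int, so slicing works
        mutated = sequence[:num - 1] + row["mut"] + sequence[num:]
        folder = out_dir / f"{row['num']}{row['mut']}"
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / "sequence.txt"
        target.write_text(mutated)
        written.append(target)
    return written


csv_text = "num,mut\n36,L\n45,P\n"  # stand-in for the contents of file.csv
sequence = "-" * 50                  # stand-in for the contents of sequence.txt
with tempfile.TemporaryDirectory() as tmp:
    paths = write_single_mutants(csv_text, sequence, Path(tmp))
    results = {p.parent.name: p.read_text() for p in paths}
```

Every iteration slices the original sequence, so each output file contains exactly one mutation.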
Related
Efficiently create list of list of list with varying amount of input
I have a .txt file with floating point numbers inside. This file always contains an even number of values, which need to be formatted as follows:
[[[a,b],[c,d],[e,f]]]
The values always need to be in pairs, even when there are fewer or more values:
[[[a,b], ... [y,z]]]
So it needs to go from this:
3.31497114423 50.803721015, 7.09205325687 50.803721015, 7.09205325687 53.5104033474, 3.31497114423 53.5104033474, 3.31497114423 50.803721015
To this:
[[[3.31497114423,50.803721015],[7.09205325687,50.803721015],[7.09205325687,53.5104033474],[3.31497114423,53.5104033474],[3.31497114423,50.803721015]]]
I have the feeling this can be done fairly easily and efficiently. The code I have so far works, but is far from efficient...
with open(filename) as f:
    for line in f:
        footprint = line.strip()
        splitted = footprint.split(' ')
        list_str = []
        for coordinate in splitted:
            list_str.append(coordinate.replace(',', ''))
        list_floats = [float(x) for x in list_str]
        footprint = [list_floats[x:x+2] for x in range(0, len(list_floats), 2)]
        return [footprint]
Any help is greatly appreciated!
The split function is very useful in scenarios such as these.
with open(filename) as f:
    # Format the string of numbers into a list separated by commas
    new_list = f.read().split(", ")
# For every element in this list, make it a list separated by space
# Also convert the strings into floats
for i in range(len(new_list)):
    new_list[i] = list(map(float, new_list[i].split(" ")))
new_list = [new_list]
The first split converts the text from this
3.31497114423 50.803721015, 7.09205325687 50.803721015, 7.09205325687 53.5104033474, 3.31497114423 53.5104033474, 3.31497114423 50.803721015
To this
['3.31497114423 50.803721015', '7.09205325687 50.803721015', '7.09205325687 53.5104033474', '3.31497114423 53.5104033474', '3.31497114423 50.803721015']
The second split converts that to this
[['3.31497114423', '50.803721015'], ['7.09205325687', '50.803721015'], ['7.09205325687', '53.5104033474'], ['3.31497114423', '53.5104033474'], ['3.31497114423', '50.803721015']]
Then the mapping of the float function converts it to this (the list call converts the map object to a list object)
[[3.31497114423, 50.803721015], [7.09205325687, 50.803721015], [7.09205325687, 53.5104033474], [3.31497114423, 53.5104033474], [3.31497114423, 50.803721015]]
The last brackets place the whole thing into another list
[[[3.31497114423, 50.803721015], [7.09205325687, 50.803721015], [7.09205325687, 53.5104033474], [3.31497114423, 53.5104033474], [3.31497114423, 50.803721015]]]
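The two splits and the float conversion can also be combined into one comprehension. The sketch below is one possible variant, not the answer's own code; the function name parse_footprint is hypothetical, and it relies on split() without arguments to tolerate extra spaces and a trailing newline.

```python
def parse_footprint(text: str) -> list:
    """Turn 'x y, x y, ...' into [[[x, y], [x, y], ...]]."""
    pairs = [
        [float(value) for value in pair.split()]  # split() ignores surrounding whitespace
        for pair in text.split(",")
        if pair.strip()  # skip empty chunks, e.g. after a trailing comma
    ]
    return [pairs]


line = "3.31497114423 50.803721015, 7.09205325687 50.803721015\n"
footprint = parse_footprint(line)
```

Splitting on "," alone and letting split() handle the whitespace avoids depending on the exact ", " separator.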
How to turn a list containing strings into a list containing integers (Python)
I am optimizing PyRay (https://github.com/oscr/PyRay) to be a usable Python ray-casting engine, and I am working on a feature that takes a text file and turns it into a list (which PyRay uses as a map). But when I use the file as a list, its contents are strings and therefore not usable by PyRay. So my question is: how do I convert a list of strings into integers? Here is my code so far. (I commented out the actual code so I can test this.)
print("What map file to open?")
mapopen = input(">")
mapload = open(mapopen, "r")
worldMap = [line.split(',') for line in mapload.readlines()]
print(worldMap)
The map file:
1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,
2,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,
1,0,2,0,0,3,0,0,0,0,0,0,0,2,3,2,3,0,0,2,
2,0,3,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,1,
1,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,2,
2,3,1,0,0,2,0,0,0,2,3,2,0,0,0,0,0,0,0,1,
1,0,0,0,0,0,0,0,0,1,0,1,0,0,1,2,0,0,0,2,
2,0,0,0,0,0,0,0,0,2,0,2,0,0,2,1,0,0,0,1,
1,0,0,0,0,0,0,0,0,1,3,1,0,0,0,0,0,0,0,2,
2,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,
1,0,2,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,2,
2,0,3,0,0,2,0,0,0,0,0,0,0,2,3,2,1,2,0,1,
1,0,0,0,0,3,0,0,0,0,0,0,0,1,0,0,2,0,0,2,
2,3,1,0,0,2,0,0,2,1,3,2,0,2,0,0,3,0,3,1,
1,0,0,0,0,0,0,0,0,3,0,0,0,1,0,0,2,0,0,2,
2,0,0,0,0,0,0,0,0,2,0,0,0,2,3,0,1,2,0,1,
1,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,3,0,2,
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,1,
2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,
Please help me, I have been searching all over and I can't find anything.
Try this. Did you want a list of lists, or just one big list? For a list of lists:
with open(filename, "r") as txtr:
    data = txtr.read()
data = data.split("\n")  # split into a list of strings, one per line
data = [[int(v) for v in x.split(",") if v.strip()] for x in data if x.strip()]
The fourth line splits each string at the commas, applies int() to every non-empty piece (each row of your file ends with a trailing comma, which would otherwise leave an empty string that int() rejects), and collects the results in a list. It does this for every non-blank element in data. I hope it helps.
Here is the version for just one large list:
with open(filename, "r") as txtr:
    data = txtr.readlines()
data = ",".join(data)  # turns it into one large string
data = data.split(",")  # now you have a list of strings
data = [int(v) for v in data if v.strip()]  # applies int() to each non-empty element
Look into the map built-in function in Python. Avoid naming the result map, since that would shadow the built-in.
L = ['1', '2', '3']
numbers = map(int, L)
for el in numbers:
    print(el)
Output:
1
2
3
As per your question, below is a way to change a list of strings into a list of integers (or into individual integers, if you index into the list). Hope this helps.
myStrList = ["1","2","\n","3"]
myNewIntList = []
for x in myStrList:
    if x != "\n":
        y = int(x)
        myNewIntList.append(y)
print(myNewIntList)
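Putting the answers above together, a map file whose rows end with a trailing comma could be parsed roughly as follows. This is a sketch under the assumption that blank lines and empty cells should simply be skipped; the name parse_map and the shortened sample are illustrative.

```python
def parse_map(text: str) -> list:
    """Parse comma-separated rows into a list of lists of ints."""
    rows = []
    for line in text.splitlines():
        # A trailing comma yields an empty last cell, which is filtered out here.
        cells = [int(cell) for cell in line.split(",") if cell.strip()]
        if cells:  # ignore blank lines
            rows.append(cells)
    return rows


sample = "1,2,1,2,\n1,0,0,2,\n\n2,1,2,1,\n"  # shortened stand-in for the map file
world_map = parse_map(sample)
```

The result can be indexed as world_map[row][column] and contains only ints.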
python add data to existing excel cell Win32com
Assume I have A1 as the only cell in a workbook, and it's blank. I want my code to add "1", "2" and "3" to it so it says "1 2 3". As of now I have:
NUMBERS = [1, 2, 3, 4, 5]
ThisSheet.Cells(1,1).Value = NUMBERS
This just writes the first value to the cell. I tried
ThisSheet.Cells(1,1).Value = Numbers[0-2]
but that just puts the LAST value in there. Is there a way for me to just add all of the data in there? This information will always be in string format, and I need to use Win32Com.
Update: I did
stringVar = ', '.join(str(v) for v in LIST)
Update: this .join works perfectly for the NUMBERS list. Now I tried applying it to another list that looks like this:
LIST=[Description Good\nBad, Description Valid\nInvalid]
If I print LIST[0], the outcome is
Description Good
Bad
which is what I want. But if I use .join on this one, it prints
('Description Good\nBad, Description Valid\nInvalid')
so for this one I need it to print as though I did LIST[0] and LIST[1].
So if you want to put each number in a different cell, you would do something like:
it = 1
for num in NUMBERS:
    ThisSheet.Cells(1,it).Value = num
    it += 1
Or if you want the first 3 numbers in the same cell:
ThisSheet.Cells(1,1).Value = ' '.join([str(num) for num in NUMBERS[:3]])
Or all of the elements in NUMBERS:
ThisSheet.Cells(1,1).Value = ' '.join([str(num) for num in NUMBERS])
EDIT
Based on your question edit, for strings containing \n, and assuming that every newline character means you want to jump to the next row:
# Split LIST[0] at the \n character
splitted_lst0 = LIST[0].split('\n')
# Iterate through the lines of LIST[0], writing each one to the next row
it = 1
for line in splitted_lst0:
    ThisSheet.Cells(it,1).Value = line
    it += 1
If you want to do this for the whole LIST and not only for LIST[0], first merge it with the join method, using '\n' as the separator so that adjacent items stay on separate lines, and split it right afterwards:
joined_list = '\n'.join(LIST).split('\n')
And then iterate through it the same way as before.
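The string handling can be tried without Excel. The sketch below only prepares the values; the helper names cell_text and lines_for_rows are hypothetical, LIST is assumed to hold ordinary Python strings, and the win32com write is left as a comment because it needs a running Excel instance.

```python
def cell_text(values, separator=" "):
    """Join any values into one string suitable for a single cell."""
    return separator.join(str(v) for v in values)


def lines_for_rows(items):
    """Split every item at newlines and return all lines in order."""
    lines = []
    for item in items:
        lines.extend(item.split("\n"))
    return lines


NUMBERS = [1, 2, 3, 4, 5]
LIST = ["Description Good\nBad", "Description Valid\nInvalid"]  # assumed to be str items

single_cell = cell_text(NUMBERS[:3])
rows = lines_for_rows(LIST)

# With win32com, each line could then go to its own row, for example:
# for row_index, line in enumerate(rows, start=1):
#     ThisSheet.Cells(row_index, 1).Value = line
```

Splitting item by item keeps the last line of one entry from merging with the first line of the next.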
How do I avoid errors when parsing a .csv file in python?
I'm trying to parse a .csv file that contains two columns: Ticker (the company ticker name) and Earnings (the corresponding company's earnings). I read the file using the following code:
f = open('earnings.csv', 'r')
earnings = f.read()
The result when I run print earnings looks like this (it's a single string):
Ticker;Earnings
AAPL;52131400000
TSLA;-911214000
AMZN;583841600
I use the following code to split the string at the line break character (\n), then split each resulting line at the semicolon character:
earnings_list = earnings.split('\n')
string_earnings = []
for string in earnings_list:
    colon_list = string.split(';')
    string_earnings.append(colon_list)
The result is a list of lists where each list contains the company's ticker at index [0] and its earnings at index [1], like this:
[['Ticker', 'Earnings\r\r'], ['AAPL', '52131400000\r\r'], ['TSLA', '-911214000\r\r'], ['AMZN', '583841600\r\r']]
Now I want to convert the earnings at index [1] of each list, which are currently strings, into integers. So I first remove the first list, which contains the column names:
headless_earnings = string_earnings[1:]
Afterwards I try to loop over the resulting list to convert the values at index [1] of each list into integers with the following:
numerical = []
for i in headless_earnings:
    num = int(i[1])
    numerical.append(num)
I get the following error:
num = int(i[1])
IndexError: list index out of range
How is that index out of range?
You are almost certainly mishandling the end of lines. If I try your code with this string:
"Ticker;Earnings\r\r\nAAPL;52131400000\r\r\nTSLA;-911214000\r\r\nAMZN;583841600"
it works. But with this one:
"Ticker;Earnings\r\r\nAAPL;52131400000\r\r\nTSLA;-911214000\r\r\nAMZN;583841600\r\r\n"
it doesn't. Explanation: when the text ends with '\n', split('\n') returns an empty string as its last item, and splitting that at ';' gives ['']. So at the end, Python tries to access [''][1], hence the error. A very simple workaround is to remove the last '\n' (if you're sure it's a '\n', otherwise you might get surprises). You could write this:
earnings_list = earnings[:-1].split('\n')
This will fix your error. If you want to remove the last character only when it really is a '\n', you can write:
earnings_list = earnings[:-1].split('\n') if earnings[-1] == '\n' else earnings.split('\n')
EDIT: test code:
#!/usr/bin/env python2
earnings = "Ticker;Earnings\r\r\nAAPL;52131400000\r\r\nTSLA;-911214000\r\r\nAMZN;583841600\r\r\n"
earnings_list = earnings[:-1].split('\n') if earnings[-1] == '\n' else earnings.split('\n')
string_earnings = []
for string in earnings_list:
    colon_list = string.split(';')
    string_earnings.append(colon_list)
headless_earnings = string_earnings[1:]
#print(headless_earnings)
numerical = []
for i in headless_earnings:
    num = int(i[1])
    numerical.append(num)
print numerical
Output:
nico#ometeotl:~/temp$ ./test_script2.py
[52131400000, -911214000, 583841600]
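An alternative to trimming the final '\n' is to skip blank lines while parsing. The following Python 3 sketch does that; the function name parse_earnings and the choice of a dict as the result are assumptions made for illustration, and it expects the header to be the first line.

```python
def parse_earnings(text: str) -> dict:
    """Map each ticker to its earnings as an int, ignoring blank lines."""
    earnings = {}
    lines = [line.strip() for line in text.split("\n")]  # strip() also drops stray '\r'
    for line in lines[1:]:  # skip the header row
        if not line:  # e.g. the empty string left after a trailing newline
            continue
        ticker, value = line.split(";")
        earnings[ticker] = int(value)
    return earnings


raw = "Ticker;Earnings\r\r\nAAPL;52131400000\r\r\nTSLA;-911214000\r\r\nAMZN;583841600\r\r\n"
parsed = parse_earnings(raw)
```

Skipping empty lines handles any number of trailing newlines without checking the last character first.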
Splitting a list and then splitting the last element of that list
For my assignment, I have to split a list twice: I need to split the address string from the input line using '+', and then split the last part of the resulting list at the ','.
in_file = open('yelp-short.txt')
def parse_line(text_file):
    a = text_file.strip('\n')
    b = a.split('+')
    c = b.split(',')
    print c
I get the error: 'list' object has no attribute 'split'
What other methods could I use to do this?
The hint is that you split the last part of the resulting list. Therefore, you want to pull out the last part and split it:
def parse_line(line):
    line = line.strip('\n')
    parts = line.split('+')
    addrs = parts[-1].split(',')
I would rpartition:
>>> 'a+b+c,d,e'.rpartition('+')[-1].split(',')
['c', 'd', 'e']
The problem is that you are trying to split up a list, not a string. You need to get a particular item out of that list:
b = a.split('+')
c = b[-1].split(',')
You apply split on strings, and it results in a list. Thus, a is a string, b is a list. You can't split a list. Let's say a is "X+Y,Z". b will be the list ["X", "Y,Z"]. What you want to split is the 1st (normal people's 2nd) element of the list b - b[1].split(','). This way there is no error. You can also say "last", by saying b[-1]. It is the same element.
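As a runnable illustration of the point made in these answers, here is a sketch of parse_line that returns its results instead of printing them. The return shape (leading fields plus the split address) is an assumption, since the assignment text does not say what the function should return.

```python
def parse_line(line: str):
    """Split a line at '+', then split only the last field at ','."""
    fields = line.rstrip("\n").split("+")   # a list of strings
    address_parts = fields[-1].split(",")   # split the last string, not the list
    return fields[:-1], address_parts


head, address = parse_line("a+b+c,d,e\n")
```

Indexing with [-1] selects a single string from the list, and strings do have a split method.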