I am trying to remove the last two columns from my data frame by using Python.
The issue is there are cells with values in the last two columns that we don't need, and those columns don't have headers.
Here's the code I wrote, but I'm really new to Python, and don't know how to take my original data and remove the last two columns.
import csv
with open("Filename","rb") as source:
    rdr= csv.reader( source )
    with open("Filename","wb") as result:
        wrt= csv.writer ( result )
        for r in rdr:
            wrt.writerow( (r[0], r[1], r[2], r[3], r[4], r[5], r[6], r[7], r[8], r[9], r[10], r[11]) )
Thanks!
The proper Pythonic way to perform something like this is through slicing:
r[start:stop(:step)]
start and stop are indexes: positive indexes count from the front and negative indexes count from the end. A blank start or stop is treated as the beginning or the end of r respectively. step is an optional parameter that I'll explain later. Slicing a list returns a new list, which you can perform additional operations on or just return immediately.
In order to remove the last two values, you can use the slice
r[:-2]
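Applied to the CSV rows from the question, the slice might look like the sketch below. It assumes Python 3 and uses in-memory text in place of the real files; the sample data and the function name drop_last_two are made up for illustration.

```python
import csv
import io

def drop_last_two(source, result):
    """Copy CSV rows from source to result, dropping the last two fields of each row."""
    writer = csv.writer(result)
    for row in csv.reader(source):
        writer.writerow(row[:-2])

# Hypothetical sample: three wanted columns plus two unwanted, header-less ones.
source = io.StringIO("a,b,c,,\n1,2,3,junk,junk\n")
result = io.StringIO()
drop_last_two(source, result)
print(result.getvalue())
```

With real files in Python 3 you would open both with newline="" and write to a different file name than the one you read from. If the header row is actually shorter than the data rows, a fixed-width slice such as row[:12] is the safer choice.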
Additional fun with step
Now for the step parameter. It lets you pick every step-th value from the selected slice. With a list such as r = [0,1,2,3,4,5,6,7,8,9,10], the slice r[::2] picks every other number starting with the first (here, all of the even numbers). To get results in reverse order, make the step negative:
>>> r = [0,1,2,3,4,5,6,7,8,9,10]
>>> r[::-1]
[10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
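A few more slices on the same list, as a quick sketch to try in a script (the variable names are only for illustration):

```python
r = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

evens = r[::2]        # every other value, starting at index 0
odds = r[1::2]        # every other value, starting at index 1
backwards = r[::-1]   # the whole list in reverse order
trimmed = r[:-2]      # everything except the last two values

print(evens, odds, backwards, trimmed, sep="\n")
```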
I know there have been a few similar questions regarding loop and random numbers, but I can't seem to find a solution for my problem.
Say I have a fixed list of numbers from my dataset and a threshold that the number has to meet:
x = (7,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41)
threshold= 25
I need to randomly pick a number from this list. Unfortunately, I cannot loop directly over my original list; I have to randomly pick an index first and then look up my number. For example, if I randomly generate the index 1, I get x[1], which is 11.
The final result I need is a list of at least 3 numbers that are greater than the threshold; once I have them, my loop can stop. (The indexes cannot repeat.)
As an example, a possible final result would be (27,29,31) (the result can be in any format). I'm thinking maybe something like this to start, but I really need help to proceed:
Unless you are particularly concerned about the memory usage of creating an additional filtered list, the simplest would probably be to start by doing this:
filtered = [i for i in x if i > threshold]
You can then choose three samples from this filtered list (after import random). The following will potentially choose the same item more than once:
random.choices(filtered, k=3)
or if you want to avoid choosing the same item more than once:
random.sample(filtered, k=3)
Each of the above functions will output a list. Use tuple(....) on the output if you need to convert it to a tuple.
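Put together, the filter-then-sample approach could look like this minimal sketch (the variable name picked is mine, not from the question):

```python
import random

x = (7, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41)
threshold = 25

# Keep only the values above the threshold, then draw three distinct ones.
filtered = [i for i in x if i > threshold]
picked = tuple(random.sample(filtered, k=3))
print(picked)
```

random.sample raises ValueError if fewer than three values pass the filter, so guard for that if your real data might not have enough.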
First, a clarification: do you need to pick a random element from the list on each iteration, or do you need to pick a different random element each time? I.e., can the same index be picked twice? You're doing the latter.
Second, you want to use range(len(x)). You don't want to hardwire the length of x into your code, and you want index 0 to be a possibility. random.shuffle() may be a better choice.
Lastly, you want to do something like:
result = []
for ....
    if select >= threshold:
        result.append(select)
        if len(result) >= 3: break
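One way to fill in that outline with random.shuffle() is sketched below. Shuffling the index list up front means no index can come up twice; the names indices and select are only illustrative.

```python
import random

x = (7, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41)
threshold = 25

# Visit every index exactly once, in random order.
indices = list(range(len(x)))
random.shuffle(indices)

result = []
for index in indices:
    select = x[index]
    if select >= threshold:
        result.append(select)
        if len(result) >= 3:
            break
print(result)
```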
If we assume the following constraints:
We are not allowed to loop over the original list (including list comprehension)
We are only allowed to access one member of the original list at a time through its index
We must pick 3 distinct members of the list that are greater or equal to the threshold
The following code should satisfy all of them:
import random

x = [ 7,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41 ]
threshold = 25
result_index = []
while len(result_index) < 3:
    index = random.randrange(0, len(x))  # random index in 0..len(x)-1
    if x[index] >= threshold and index not in result_index:
        result_index.append(index)
result = [ x[a] for a in result_index ]
Here is how this works:
In the loop, we store indices, not the numbers themselves.
For each index we check 2 conditions: the number there is greater than or equal to the threshold, and we haven't seen this index before.
If both conditions are satisfied, we save the index, not the number!
Repeat until we have 3 indices.
Build the new list by getting the numbers at those indices directly.
I have a table similar to this format:
Table
My table has 50000 rows and 800 columns, and certain cells (tab-separated) contain multiple comma-separated words (e.g. L,N). I want to retain only the rows that contain one of a set of words (say A to N) in a given column (say col2) and remove the remaining rows.
Is it possible to do this using VLOOKUP, or is there another way to do it? Any suggestion is appreciated.
Create a helper column, perhaps on the extreme right of a copy of your worksheet. Enter this formula in row 2 of that column. Modify the formula to replace column C with the ID of the helper column (the one in which you write the formula) and replace B with the column in which to find the word. Copy the formula all the way down.
=ISERROR(FIND(C$1,$B2))
Now enter the word you want to keep in cell(1) of the helper column (C$1 in my example). The column will fill with TRUE and FALSE.
TRUE means that the word wasn't found and the row should be deleted
FALSE means that the word exists and should be kept
Now sort the sheet on that column and delete the block with TRUE in it. That takes about 15 seconds once you have done it a few times, which is likely quicker than writing and running a VBA, R or Python solution.
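For comparison, if the table is exported as a tab-separated file, the same row filter can be sketched in Python with only the standard library. Everything here is an assumption for illustration: the sample rows, the function name keep_rows, the lack of a header row, and treating "col2" as index 1.

```python
import csv
import io

def keep_rows(lines, column, wanted):
    """Return the rows whose cell at `column` contains at least one wanted word."""
    kept = []
    for row in csv.reader(lines, delimiter="\t"):
        # A cell may hold several comma-separated words, e.g. "L,N".
        words = {word.strip() for word in row[column].split(",")}
        if words & wanted:
            kept.append(row)
    return kept

wanted = set("ABCDEFGHIJKLMN")  # the words A to N
sample = io.StringIO("id1\tA\tfoo\nid2\tL,N\tbar\nid3\tP,Q\tbaz\nid4\tZ,B\tqux\n")
rows = keep_rows(sample, column=1, wanted=wanted)
print(rows)
```

For a real file, pass an open file object (opened with newline="") instead of the StringIO sample and write the kept rows back out with csv.writer.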
The biggest issue I had was conveying what I did, how, and why. There is no need for that, so I deleted it.
So: select any part, section or range of your table and run the code.
You can remove the section of the code below 'Clear any existing filters' if you don't need to delete the found data and want to keep it in your sheet, still tagged for other purposes (for example, you could copy those rows to another file).
The code below should do what you asked in your question and leave you with just the parts of the table that DON'T contain, in your selection, any of your terms, i.e. it will delete rows from your table according to your criteria and move the remaining rows up.
And yes, Python and R can do this for you too, with dataframes and less code. But this VBA code worked for my many examples. I didn't know how it would fare with 50000 rows and X columns, but it should be all right {Post edit: It works fine}.
Sub SearchTableorSelectionDeletetermsFound5()
    Dim corresspondingpartner() As Variant
    Dim rng As Range: Set rng = Selection
    Dim col As Range
    For Each col In rng.Columns
        Dim r As Range
        Dim rn As Variant
        Dim Rownum As Long
        Rownum = Selection.Rows.Count
        ReDim rm(0 To Rownum) As Variant 'size of this array needs to be equal to or bigger than your selection
        'With Sheet2
        terms = Sheets("Sheet2").Cells(1, 1).CurrentRegion
        k = 1
        For rw = 0 To UBound(terms)
            ReDim Preserve corresspondingpartner(rw)
            corresspondingpartner(rw) = (k / k) 'gives each corresponding partner element an id of 1.
            k = k + 1
        Next
        'End With
        For Each r In Selection
            n = 0
            m = n
            For Each c In terms
                ' Checks for each term in turn in the terms column.
                ' If it finds one, it inserts the corresponding corresspondingpartner name in the column cell/corresponding row column O (post edit: now column ADU)
                If r.Offset(0, 0).Value Like "*" & c & "*" Then
                    rm(n) = corresspondingpartner(n) 'corresspondingpartner(n) and in the end, you don't even need this, you can replace with any value which the autofilter section looks for to delete
                    'so you can remove any instances and classes of corresspondingpartner, including the making of this corresponding second array
                    'turns out it could have been just if =1
                    Cells(r.Row, 801).Value = rm(n) / rm(n) 'Sheets("HXY2").
                    '##### YOU DON'T EVEN NEED A HLOOKUP! :)
                    '#### BUT I STILL WANT TO GET RID OF THE OFFSET COLUMNS, DO IT WITHOUT THEM. DONE!! :)
                    '''###''' same here, turns out could have just been =1
                End If
                n = n + 1
            Next
        Next
    Next col
    'Clear any existing filters
    On Error Resume Next
    ActiveSheet.ShowAllData
    On Error GoTo 0
    '1. Apply Filter
    ActiveSheet.Range("A1:ADU5000").AutoFilter Field:=801, Criteria1:=corresspondingpartner(n) / corresspondingpartner(n)
    '2. Delete Rows
    Application.DisplayAlerts = False
    ActiveSheet.Range("A1:ADU5000").SpecialCells(xlCellTypeVisible).Delete
    Application.DisplayAlerts = True
    '3. Clear Filter
    On Error Resume Next
    ActiveSheet.ShowAllData
    On Error GoTo 0
End Sub
You might be able to see that in the beginning I was printing offset-column results from the table/selection, which took up unnecessary space, and was also using a VBA Application.WorksheetFunction.HLookup in the code to produce a final result column tagging the rows to delete, but in the end that was unnecessary. Those earlier versions/macros worked too, but were slower, so I did it without helper columns, using arrays.
I turned to my friend, [excel campus - delete rows based on conditions][1], to embed and tweak the autofilter code at the end, which deletes the rows you don't want so you don't have to do it yourself.
It's now a "virtual" hlookup on the array matches in your selection (or data): it deletes all rows that match your specifications/requirements, leaving you with the data you want.
I have a strong hunch it could be improved, expanded and streamlined much further (starting with the way I produce the arrays), but I'm happy with its functionality and potential scope for now.
[1]: https://www.youtube.com/watch?v=UwNcSZrzd_w&t=6s
I am trying to write my own quick sort algorithm in Python without looking up how it's done professionally (I will learn more this way). If my idea of how I intend to implement this quick sort seems silly to you, (I am aware that it probably will) please don't give me a completely different way of doing it, unless my method will never succeed or at least not without ridiculous measures, please help me reach a solution with my desired method :)
Currently I have defined a function "pivot" which takes the input list and outputs three lists: a list of numbers smaller than the pivot (chosen in this case to be the first number in the list every time), a list of numbers equal to the pivot, and a list of numbers greater than the pivot.
My next step was to define a function "q_sort". First this function creates a list called "finalList" and fills it with 0s so that it is the same length as the list being sorted. Next it pivots the list and adds the numbers equal to the pivot to finalList in what is already their correct position (there are 0s in place to represent the items smaller than the pivot, and 0s again as placeholders for the items bigger than the pivot).
This all works fine.
What doesn't work fine is the next step. I have written what I want to happen next in some poorly thought out pseudo-code below:
numList = [3, 5, 3, 1, 12, 65, 2, 11, 32]
def pivot(aList):
    biggerNum =[]
    smallerNum = []
    equalNum = [aList[0]]
    for x in range(1, len(aList)):
        if aList[0]<aList[x]:
            biggerNum.append(aList[x])
        elif aList[0]>aList[x]:
            smallerNum.append(aList[x])
        elif aList[0] == aList[x]:
            equalNum.append(aList[x])
    pivoted = [smallerNum, equalNum, biggerNum]
    return pivoted
def q_sort(aList):
    finalList = []
    for x in range(len(aList)):
        finalList.append(0)
    pivot(aList)
    for i in range(len(pivot(aList)[1])):
        finalList[len(pivot(aList)[0])+i] = pivot(aList)[1][i]
Pseudo Code:
#if len(smallerNum) != 0:
#q_sort(smallerNum) <--- I want this to add it's pivot to finalList
#if len(biggerNum) != 0:
#q_sort(biggerNum) <--- Again I want this to add it's pivot to finalList
#return finalList <--- Now after all the recursion every number has been pivoted and added
What I intend to happen is that if the list of numbers smaller than the pivot actually has any items in it, it will then q_sort this list. This means it will find a new pivot and add its value to the right position in finalList. The way I imagine it working is that the function only reaches "return finalList" once every number from "numList" has been put in its correct position. This is because the recursive nature of including q_sort within q_sort means that after pivoting "smallerNum" (and adding the pivot to finalList) it will have another list to pivot.
The problem is that you're starting over on each call: using the entire list, working from both ends. You need to recur on each partition of the list: the part below the pivot, then the part above the pivot. This is generally done by passing the endpoints of the sub-list, such as ...
def q_sort(aList, low, high):
    if low >= high:
        return
    # find pivot position, "pivot"
    ...
    # arrange list on either side of pivot
    ...
    # recur on each part of list.
    q_sort(aList, low, pivot-1)
    q_sort(aList, pivot+1, high)
Is that enough of an outline to get you moving?
If not, try a browser search on "Python quicksort", and you'll find a lot of help, more thorough than we can cover here.
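As one possible way to fill in that outline, here is a minimal in-place sketch. The partition step uses the Lomuto scheme with the last item of the range as the pivot; that is my choice for the example, not something the outline prescribes, and the helper name partition is made up.

```python
def partition(a_list, low, high):
    """Lomuto partition: place the pivot (last item of the range) at its final index."""
    pivot_value = a_list[high]
    boundary = low  # items left of boundary are smaller than the pivot
    for i in range(low, high):
        if a_list[i] < pivot_value:
            a_list[i], a_list[boundary] = a_list[boundary], a_list[i]
            boundary += 1
    a_list[boundary], a_list[high] = a_list[high], a_list[boundary]
    return boundary

def q_sort(a_list, low=0, high=None):
    """Sort a_list in place between indexes low and high (inclusive)."""
    if high is None:
        high = len(a_list) - 1
    if low >= high:
        return
    pivot = partition(a_list, low, high)
    # Recur on each part of the list, leaving the pivot where it is.
    q_sort(a_list, low, pivot - 1)
    q_sort(a_list, pivot + 1, high)

num_list = [3, 5, 3, 1, 12, 65, 2, 11, 32]
q_sort(num_list)
print(num_list)
```

If you would rather stay with the three-list pivot function from the question, the same recursion works without indexes: sort the smaller list, sort the bigger list, and return sorted smaller + equal + sorted bigger.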
I am trying to do the following:
create an array of random data
create an array of predefined codes (AW, SS)
subtract all numbers as well as any instance of predefined code.
if a string called "HL" remains after step 3, remove that as well and take the next alphabet pair. If a string called "HL" is the ONLY string in the array then take that.
I do not know how to go about completing steps 3 - 4.
1.
array_data = ['HL22','PG1234-332HL','1334-SF-21HL','HL43--222PG','HL222AW11144RH','HLSSDD','SSDD']
2.
predefined_code = ['AW','SS']
3.
ideally, results for this step will look like
result_data = [['HL'],['PG','HL'],['SF','HL'],['HL','PG'],['HL','RH'],
['HL','DD'],['DD']]
4. ideally, results for this step will look like this:
result_data = [['HL'],['PG'],['SF'],['PG'],['RH'], ['DD'],['DD']]
for step 3, I have tried the following code
not_in_predefined = [item for item in array_data if item not in predefined_code]
but this doesn't produce the result I'm looking for, because it is checking item against item, not doing a partial string match.
This is fairly simple using Regex.
re.findall(r'[A-Z].',item) should give you the text from your strings, and then you can do the required processing on that.
You may want to convert the list to a set eventually and use the difference operation, instead of looping and removing the elements defined in the predefined_code list.
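A rough sketch of steps 3 and 4 along these lines follows. It assumes every code is exactly two uppercase letters, so it uses the slightly stricter pattern [A-Z]{2} instead of [A-Z]. (my substitution), and it filters with a list comprehension rather than a set so the original order is kept. The helper names extract_codes and drop_hl are made up.

```python
import re

array_data = ['HL22', 'PG1234-332HL', '1334-SF-21HL', 'HL43--222PG',
              'HL222AW11144RH', 'HLSSDD', 'SSDD']
predefined_code = ['AW', 'SS']

def extract_codes(item, excluded):
    """Step 3: pull out the two-letter codes and drop the predefined ones."""
    pairs = re.findall(r'[A-Z]{2}', item)
    return [pair for pair in pairs if pair not in excluded]

def drop_hl(codes):
    """Step 4: prefer the first non-HL code; keep HL only if nothing else is left."""
    others = [code for code in codes if code != 'HL']
    return others[:1] if others else codes

step3 = [extract_codes(item, predefined_code) for item in array_data]
step4 = [drop_hl(codes) for codes in step3]
print(step3)
print(step4)
```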
All right, so I have code where I need to print a game board with 5x5 squares. All the squares are in squareList, which is a list (oh, you don't say). Each square is an object with a variable number, which is the value I want to print. How can I do this?
EDIT: The reason to why I want to do this is because I want to start on a new line every five squares so that I get a board, not a line, with values.
The Python slice operator supports an optional step as the third value: squareList[start:end:step].
for o in squareList[::5]:
    print(o.number)
Use 5 as the step value to get every fifth entry in the list.
Why not make a list for each row in the square, and then put those rows into a list?
squareList = [rowList1, rowList2, rowList3, rowList4, rowList5]
This way you can manipulate a column as you loop through the rows.
for row in squareList:
    doSomething(row[4])
You can also extract a column using a list comprehension.
colList1 = [row[0] for row in squareList]
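If the data starts out as one flat list, it can be regrouped into rows first. A small sketch, with the plain numbers 0 to 24 standing in for the real squares:

```python
flat = list(range(25))  # stand-in for a flat 5x5 board

# Cut the flat list into rows of five, then read columns off the rows.
rows = [flat[i:i + 5] for i in range(0, len(flat), 5)]
first_column = [row[0] for row in rows]
last_column = [row[4] for row in rows]

print(rows)
print(first_column, last_column)
```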
I would agree that you might want to consider other more convenient structures for your data, as suggested by CCKx.
Here are two approaches
Assuming:
squareList = [0,0,1,0,0,
1,2,0,0,1,
0,0,0,1,2,
2,2,0,0,0,
1,0,0,0,1]
Then you can do this:
for index, value in enumerate(squareList):
    print(value, end='')
    if (index % 5) == 4:
        print()
(I'm assuming Python 3.)
That will literally do what you asked for in the way that you asked for it. index will count up through each element of your list, and the % or "modulo" operator will get you the remainder when you divide by 5, allowing you to take some action every 5 times round the loop.
Or you can use slicing:
for row in range(5):
    for cell in squareList[row*5:row*5+5]:
        print(cell, end='')
    print()
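Since the question has objects with a number attribute rather than plain integers, the slicing version might be adapted as below. The Square class here is a hypothetical stand-in for the real one, and board_rows is a made-up helper name.

```python
class Square:
    """Hypothetical stand-in for the question's square object."""
    def __init__(self, number):
        self.number = number

def board_rows(squares, width=5):
    """Return one text line per board row, joining each square's number."""
    return [
        " ".join(str(square.number) for square in squares[start:start + width])
        for start in range(0, len(squares), width)
    ]

squareList = [Square(n) for n in range(25)]
for line in board_rows(squareList):
    print(line)
```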
See also:
What is the most "pythonic" way to iterate over a list in chunks?