How to format data into a Python list

I am trying to figure out how to take data which looks like:
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
from a file and make it look like:
[1, 2, 3, 4, 5,
6, 7, 8, 9, 10,
11, 12, 13, 14, 15,
16, 17, 18, 19, 20]
Each line represents a row from the data file, so the starting format should not be confused with:
1 2 3 4 5 6 7...
The original data is contained in a file, so it must first be read in and then rewritten with the commas and brackets into a new file. My starting point would be to first read in the data:
with open("data.txt", "r") as data:
    lines = data.readlines()
Then I know that I have to take the read lines and rewrite them in the format I need but I don't know how to do this for each element of each line.

You can tell Python to split each line by its spaces, and then join the resulting elements with ', ' like this:
', '.join(line_in.split())
This will convert a string like:
'6 7 8 9 10'
into this:
'6, 7, 8, 9, 10'
Now you need to decide whether this is the last line of the file or not: if it is the last line, append a "]"; if it is not, add a ",".
You also need to add a "[" at the beginning of the file.
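Putting those pieces together, a minimal sketch of the whole read-and-rewrite step could look like this (assuming the reformatted text should go into a new file, here called formatted.txt):
with open("data.txt", "r") as data, open("formatted.txt", "w") as out:
    lines = data.readlines()
    out.write("[")
    for i, line in enumerate(lines):
        row = ", ".join(line.split())
        if i < len(lines) - 1:
            out.write(row + ",\n")  # every row except the last ends with a comma
        else:
            out.write(row + "]\n")  # the last row closes the bracket instead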
Hope it helps

You can try something like this:
>>> data = open('data.txt').read().split()
>>> data = [int(item) for item in data if item.isdigit()]
>>> data
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
This will get all of the integers in the file. If you don't need them converted to int, just remove the second line and call data = open('data.txt').read().split().

python try to get next available number

From a range of numbers [0:2407] I need to know which ones are already being used.
array [0, 1, 2, .., 2407]
To know which ones are already used, I have a file that I load with pandas.
example:
...| Index |...
...| 100 |...
...| 1572 |...
...| 2046 |...
...| 2045 |...
I need to remove from my initial list the ones coming from the file.
I'm trying to do this in a clean and fast way since the files can be quite big.
Try this:
import pandas as pd
import random
## for demo purposes the max number is changed from 2407 to 27
max_n = 27
## list containing the full range of numbers
unused = list(range(max_n + 1))
print(f'all_n : {unused}')
## define an example DataFrame
df = pd.DataFrame(random.sample(range(max_n + 1), 10), columns=['index'])
#    index
# 0      6
# 1     14
# 2     20
# 3      4
# 4     25
## convert the used numbers to a list
used = df['index'].tolist()
print(f'used : {sorted(used)}')
## remove the used numbers from the unused list
for n in used:
    unused.remove(n)
print(f'unused : {unused}')
Result:
all_n : [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27]
used : [4, 6, 14, 20, 25]
unused : [0, 1, 2, 3, 5, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 21, 22, 23, 24, 26, 27]
Create a list of flags of size 2408, initially setting all flags to false:
is_used = [False for i in range(2408)]
Iterate through your column and change the corresponding flag to True:
for entry in column:
    is_used[entry] = True
Iterate through your list and append to a new list the elements that are not used:
new_list = []
for entry in l:
    if not is_used[entry]:
        new_list.append(entry)
Summarizing all in a single method:
def remove_used(l, column):
    is_used = [False for i in range(2408)]
    for entry in column:
        is_used[entry] = True
    new_list = []
    for entry in l:
        if not is_used[entry]:
            new_list.append(entry)
    return new_list
Also, it is worth mentioning that you can try to speed this up by dividing the loops into blocks and having threads or processes act on each block.
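As a side note, if the used numbers are unique, a set difference avoids the repeated remove calls entirely. A minimal sketch, assuming the used values come from a pandas column named 'index' as in the first answer:
used = set(df['index'])
unused = [n for n in range(2408) if n not in used]  # keeps the values 0..2407 that never appear in the file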

How do I change my code to draw a table from this list?

seq = [11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
for i in range(len(seq)):
    print(seq[i], end="\t")
How do I get my output table to look like this?
11 34 17 52 26 13
40 20 10 5 16 8
4 2 1
One of many ways is this: iterate over the seq list in steps of 6 and print the elements in each slice of that size.
seq = [11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
for i in range(0, len(seq), 6):
    print(*seq[i:i+6], sep=' ')
output
11 34 17 52 26 13
40 20 10 5 16 8
4 2 1
You probably want to make use of string formatting. Below, f"{seq[i]:<4d}" means "A string of length 4, left-aligned, containing the string representation of seq[i]". If you want to right-align, just remove <.
seq = [11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]
for i in range(len(seq)):
print(f"{seq[i]:<4d}", end = "")
if not (i+1) % 6:
print("")
print("")
Output:
11 34 17 52 26 13
40 20 10 5 16 8
4 2 1
The simplest relevant technique is padding:
for i in range(0, len(seq), 6):
    print(" ".join([str(k).ljust(2, " ") for k in seq[i: i + 6]]))
but string formatting, as in Printing Lists as Tabular Data, will make for a more sophisticated solution.

Pandas compare items in list in one column with single value in another column

Consider this two-column df. I would like to create an apply function that compares each item in the "other_yrs" column list with the single integer in the "cur" column and keeps a count of each item in the "other_yrs" list that is less than or equal to the value in the "cur" column. I cannot figure out how to do this in pandas with apply. I am using apply functions for other purposes and they are working well. Any ideas would be very appreciated.
cur other_yrs
1 11 [11, 11]
2 12 [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0]
4 16 [15, 85]
5 17 [17, 17, 16]
6 13 [8, 8]
Below is the function I used to extract the values into the "other_yrs" column. I am thinking I can just insert into this function some way of comparing each successive value in the list with the "cur" column value and keep count. I really only need to store the count of how many of the list items are <= the value in the "cur" column.
def col_check(col_string):
    cs_yr_lst = []
    count = 0
    if len(col_string) < 1:  # avoids col values of 0, meaning no other cases
        pass
    else:
        case_lst = col_string.split(", ")  # splits the string of cases into a list
        for i in case_lst:
            cs_yr = int(i[3:5])  # gets the case year from each individual case number
            cs_yr_lst.append(cs_yr)  # stores those integers in a list and then into a new column using apply
    return cs_yr_lst
The expected output would be this:
cur other_yrs count
1 11 [11, 11] 2
2 12 [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0] 11
4 16 [15, 85] 1
5 17 [17, 17, 16] 3
6 13 [8, 8] 2
Use zip inside a list comprehension to zip the columns cur and other_yrs and use np.sum on the boolean mask:
import numpy as np
df['count'] = [np.sum(np.array(b) <= a) for a, b in zip(df['cur'], df['other_yrs'])]
Another idea:
df['count'] = pd.DataFrame(df['other_yrs'].tolist(), index=df.index).le(df['cur'], axis=0).sum(1)
Result:
cur other_yrs count
1 11 [11, 11] 2
2 12 [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0] 11
4 16 [15, 85] 1
5 17 [17, 17, 16] 3
6 13 [8, 8] 2
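Since the question specifically mentions apply, a row-wise version is also possible, though it is usually slower than the vectorized ideas above. A minimal sketch:
df['count'] = df.apply(lambda row: sum(v <= row['cur'] for v in row['other_yrs']), axis=1)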
You can consider explode and compare then group on level=0 and sum:
u = df.explode('other_yrs')
df['Count'] = u['cur'].ge(u['other_yrs']).sum(level=0).astype(int)
print(df)
cur other_yrs Count
1 11 [11, 11] 2
2 12 [16, 13, 12, 9, 9, 6, 6, 3, 3, 3, 2, 1, 0] 11
4 16 [15, 85] 1
5 17 [17, 17, 16] 3
6 13 [8, 8] 2
If the columns contain millions of records in both of the dataframes and you have to compare each element in the first column with all the elements in the second column, then the following code might be helpful.
for element in Dataframe1.Column1:
    Dataframe2[Dataframe2.Column2.isin([element])]
The code snippet above returns, one by one, the rows of Dataframe2 where the element from Dataframe1 is found in Dataframe2.Column2.

How to use binary files. Overwriting specific bytes

I'm writing a program in python, and would like to be able to write to specific bytes in a binary file. I tried to do this in the shell with a small binary file containing the numbers 0 through 15, but I can't figure out how to do so. Below is the code I just entered into the shell with comments to demonstrate what I am trying to do:
>>> File=open("TEST","wb") # Opens the file for writing.
>>> File.write(bytes(range(16))) # Writes the numbers 0 through 15 to the file.
16
>>> File.close() # Closes the file.
>>> File=open("TEST","rb") # Opens the file for reading, so that we can test that its contents are correct.
>>> tuple(File.read()) # Expected output: (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
>>> File.close() # Closes the file.
>>> File=open("TEST","wb") # Opens the file for writing.
>>> File.seek(3) # Moves the pointer to position 3. (Fourth byte.)
3
>>> File.write(b"\x01") # Writes 1 to the file in its current position.
1
>>> File.close() # Closes the file.
>>> File=open("TEST","rb") # Opens the file for reading, so that we can test that its contents are correct.
>>> tuple(File.read()) # Expected output: (0, 1, 2, 1, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
(0, 0, 0, 1)
>>> File.close()
>>> File=open("TEST","wb") # I will try again using apend mode to overwrite.
>>> File.write(bytes(range(16)))
16
>>> File.close()
>>> File=open("TEST","ab") # Append mode.
>>> File.seek(3)
3
>>> File.write(b"\x01")
1
>>> File.close()
>>> File=open("TEST","rb")
>>> tuple(File.read()) # Expected output: (0, 1, 2, 1, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1)
>>> File.close()
My desired output is as shown, but "wb" seems to erase all the data in the file, while "ab" can't seek backwards.
How would I achieve my desired output without rewriting the whole file?
When you open a file for writing with w, the file is truncated, all contents removed. You need to open the file for reading and writing with r+ instead. From the open() function documentation:
'w' open for writing, truncating the file first
and
For binary read-write access, the mode 'w+b' opens and truncates the file to 0 bytes. 'r+b' opens the file without truncation.
Because the file was truncated first, seeking to position 3 then writing \x01 has the first few bytes filled in with \x00 for you.
Opening a file in append mode usually restricts access to the new portion of the file only, so anything past the first 16 bytes. Again, from the documentation:
Other common values are [...] and 'a' for appending (which on some Unix systems, means that all writes append to the end of the file regardless of the current seek position).
(bold emphasis in quoted sections mine). This is why your \x01 byte ends up right at the end in spite of the File.seek(3) call.
r does not truncate a file and gives you full range of the contents with seek(); r+ adds write access to that mode. Demo with 'r+b':
>>> with open('demo.bin', 'wb') as f:
... f.write(bytes(range(16)))
...
16
>>> with open('demo.bin', 'rb') as f:
... print(*f.read())
...
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
>>> with open('demo.bin', 'r+b') as f: # read and write mode
... f.seek(3)
... f.write(b'\x01')
...
3
1
>>> with open('demo.bin', 'rb') as f:
... print(*f.read())
...
0 1 2 1 4 5 6 7 8 9 10 11 12 13 14 15
The solution is another mode: "r+b" (as shown by the other answers).
Here is the solution in the shell from where the file left off:
>>> File=open("TEST","r+b") # Opens file for reading and writing.
>>> File.seek(3)
3
>>> File.write(b"\x01")
1
>>> File.close()
>>> File=open("TEST","rb")
>>> tuple(File.read()) # Expected output: (0, 1, 2, 1, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1)
(0, 1, 2, 1, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1)
>>> File.close()
If I remember properly, you have to open the file in append mode or it will simply erase everything and start from scratch; then, when you use seek(3), you just create those three 0's and then write the 1. I'll investigate further how to write directly to a position, but you may have to read the whole file, modify it, and write the whole file again.
You can actually read about this behaviour in the documentation:
'w' for only writing (an existing file with the same name will be erased)

convert all rows to columns and columns to rows in Arrays [duplicate]

This question already has answers here:
Matrix Transpose in Python [duplicate]
(19 answers)
Closed 6 years ago.
I'm trying to create this program (in Python) that converts all rows to columns and columns to rows. To be more specific, the first input is 2 numbers, N and M: N is the total number of rows, M the total number of columns. I've used b = map(int, raw_input().split()), and then, based on b[0], each of the next N lines will contain M space-separated integers. For example:
Input:
3 5
13 4 8 14 1
9 6 3 7 21
5 12 17 9 3
Now the program will store it in a 2D array:
arr=[[13, 4, 8, 14, 1], [9, 6, 3, 7, 21], [5, 12, 17, 9, 3]]
What's required for the output is to print M lines each containing N space separated integers. For example:
Output:
13 9 5
4 6 12
8 3 17
14 7 9
1 21 3
This is what I've tried so far:
#Getting N and M from input
NM=map(int, raw_input().split())
arr=[]
for i in xrange(NM[0]):
    c = map(int, raw_input().split())
    arr.append(c)
I've created a 2D array and got the values from input but I don't know the rest. Let me make this clear that I'm definitely NOT asking for code. Just exactly what to do to convert rows to columns and in reverse.
Thanks in advance!
You can use zip to transpose the data:
arr = [[13, 4, 8, 14, 1], [9, 6, 3, 7, 21], [5, 12, 17, 9, 3]]
new_arr = zip(*arr)
# [(13, 9, 5), (4, 6, 12), (8, 3, 17), (14, 7, 9), (1, 21, 3)]
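To print the transposed rows as M lines of N space-separated integers, something like this sketch works (note that in Python 3, zip returns an iterator, so wrap it in list() if you need to index the result):
for row in zip(*arr):
    print(" ".join(str(x) for x in row))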
