Creating a docID for each text file in a folder - Python

Hello, I have a folder named dict that contains 4 to 6 text files. I want to assign an ID (docID) to each text file in the folder, and I used the code below:
docID_list = [int(docID_string) for docID_string in os.listdir('/Users/suryavamsi/dict')]
and I got this error:
invalid literal for int() with base 10:
I have tried lots of ways but couldn't crack it. Can anyone help me out?

It looks like you're trying to convert strings to integers.
That will only work if your strings look like integers (e.g. '1').
If you just want an integer value associated with each file, use enumerate:
docID_list = [i for i, _ in enumerate(os.listdir('/Users/suryavamsi/dict'))]
Or just:
docID_list = list(range(len(os.listdir('/Users/suryavamsi/dict'))))
You might want to keep a dict that maps docID to filename, in which case you can use a dictionary comprehension:
docID_list = {i:doc for i, doc in enumerate(os.listdir('/Users/suryavamsi/dict'))}
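As a minimal sketch of the dictionary approach: os.listdir returns names in arbitrary order, so sorting them first keeps the IDs stable between runs. The helper name build_doc_ids and the demo file names are made up for illustration, and the demo runs in a temporary folder rather than the real one.

```python
import os
import tempfile

def build_doc_ids(folder):
    """Map docID (0, 1, 2, ...) to file name, in sorted name order."""
    # os.listdir returns names in arbitrary order, so sort for stable IDs
    names = sorted(
        name for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name))
    )
    return dict(enumerate(names))

# Demo on a throwaway folder (file names are made up for illustration)
with tempfile.TemporaryDirectory() as folder:
    for name in ("b.txt", "a.txt", "c.txt"):
        with open(os.path.join(folder, name), "w") as handle:
            handle.write("sample text")
    doc_ids = build_doc_ids(folder)

print(doc_ids)  # {0: 'a.txt', 1: 'b.txt', 2: 'c.txt'}
```

Passing '/Users/suryavamsi/dict' as the folder should give the same kind of mapping for the real files.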

Related

Check if multiple dictionary keys are located in a string

Imagine having a txt file with content like
5843092xxx289421xxx832175xxx...
You have a dictionary whose keys correspond to letters.
I am trying to search for each key within the string to output a message.
decoder = {5843092:'a', 289421:'b'}
with open( "code.txt","r") as fileTxt:
    fileTxt = fileTxt.readlines()
b = []
for key in decoder.keys():
    if key in fileTxt:
        b.append(decoder[key])
print(b)
This is what I have. I feel like I'm on the right track, but I am missing how to do each iteration, maybe?
The goal output in this example would be either a list or a string like ab...
There are two problems here:
You have a list of strings, and you're treating it as if it's one string.
You're building your output based on the order the keys appear in the decoder dictionary, rather than the order they appear in the input text. That means the message will be all scrambled.
If the input text actually separates each key with a fixed string like xxx, the straightforward solution is to split on that string:
for line in fileTxt:
    print(' '.join(decoder.get(int(key), '?') for key in line.split('xxx')))
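A slightly more defensive sketch of the same split-based idea is shown here. It assumes the separator really is the fixed string xxx, skips empty pieces left by a trailing separator, and maps anything it cannot decode to '?'. The helper name decode_line and the 832175 entry are made up for illustration.

```python
decoder = {5843092: 'a', 289421: 'b', 832175: 'c'}  # 832175 -> 'c' is a made-up entry

def decode_line(line, separator='xxx', unknown='?'):
    """Decode one line of separator-delimited numeric keys into a string."""
    letters = []
    for token in line.strip().split(separator):
        if not token:
            continue  # skip empty pieces, e.g. from a trailing separator
        # non-numeric tokens and unknown keys both map to the placeholder
        key = int(token) if token.isdigit() else None
        letters.append(decoder.get(key, unknown))
    return ''.join(letters)

print(decode_line('5843092xxx289421xxx832175xxx\n'))  # abc
```

Because the letters are collected in the order the keys appear in the text, the message comes out unscrambled.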

How to sort a lot of csv files to read them in a specific order?

Hello, I have a lot of csv files that share the same name (filename) but have a number at the end. For example, of 4 csv files with the same filename, the first has no extra number, the second has (0) at the end, the third has (1) at the end, and so on.
I am reading the files with pandas in a for loop because there are a lot of files in the folder, and I am using sorted to order them. The plain filename sorts first as expected, but the file ending in (0) is put last. I need to fix this because the individual files together hold the data of one big file, and I am trying to concatenate them automatically. Everything works except the sort order, so the right files are concatenated but in the wrong order.
How can I rectify this? BTW, after reading I sort the files in a list, and it sorts in the wrong order, like this: ['filename','filename1','filename2','filename0']. I want this order instead: ['Filename','Filename0','Filename1','Filename2'].
I know the filenames in the list are strings. I have tried converting them to int and float, but without success; I get this error: ValueError: invalid literal for int() with base 10:
Any help would be greatly appreciated. I cannot upload the code because it has a lot of functions, and it would take me a very long time to find the relevant bits in it. Sorry about that.
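Use sorted with a custom key function that splits each filename into its text prefix and its trailing number, so the number is compared as an integer rather than as text (Filename10 then sorts after Filename2).
You can try it like this:
import re
def function_work(x):
    # drop the extension, then split the stem into a prefix and trailing digits
    stem = x.rsplit('.', 1)[0]
    prefix, digits = re.match(r'(.*?)(\d*)$', stem).groups()
    # a file without a number sorts before 0; the others sort numerically
    return (prefix, int(digits) if digits else -1)
csvFiles = ['Filename5.csv', 'Filename0.csv', 'Filename1.csv', 'Filename.csv', 'Filename2.csv']
print(sorted(csvFiles, key=function_work))
#output : ['Filename.csv', 'Filename0.csv', 'Filename1.csv', 'Filename2.csv', 'Filename5.csv']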
The sorted function takes an additional keyword argument called key that tells it how to sort the items in the iterable. This argument is a function that takes each entry from the input iterable and returns its "rank" or "sort order".
In your case, you need a key function that puts the "no suffix" file before "0":
lst = ['abc.csv', 'abc (0).csv', 'abc (1).csv']
filenames_split_lst = [_.rsplit('.', 1) for _ in lst]
# [['abc', 'csv'], ['abc (0)', 'csv'], ['abc (1)', 'csv']]
base_filenames = [_ for _, csv in filenames_split_lst]
# ['abc', 'abc (0)', 'abc (1)']
def sorting_function(base_filename):
    if (len(base_filename.split()) == 1):
        return 0
    elif len(base_filename.split()) == 2:
        number_suffix = base_filename.split()[1][1:-1]
        return int(number_suffix) + 1
sorted(base_filenames, key=sorting_function)
# ['abc', 'abc (0)', 'abc (1)']
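The key-function idea above can also be applied directly to the full filenames, extension included, which may be more convenient when the sorted names are passed straight to a reader. This is only a sketch, assuming the number always appears as "(n)" right before the extension; the names NUMBER_SUFFIX and suffix_key are made up for illustration.

```python
import os
import re

# An optional space, then "(digits)" at the very end of the stem
NUMBER_SUFFIX = re.compile(r'^(.*?)\s*\((\d+)\)$')

def suffix_key(filename):
    """Sort key: (stem, -1) for 'abc.csv', (stem, n) for 'abc (n).csv'."""
    stem, _ext = os.path.splitext(filename)
    match = NUMBER_SUFFIX.match(stem)
    if match is None:
        return (stem, -1)  # no "(n)" suffix: sort before "(0)"
    return (match.group(1), int(match.group(2)))

files = ['abc (10).csv', 'abc (1).csv', 'abc.csv', 'abc (0).csv', 'abc (2).csv']
ordered = sorted(files, key=suffix_key)
print(ordered)
# ['abc.csv', 'abc (0).csv', 'abc (1).csv', 'abc (2).csv', 'abc (10).csv']
```

Returning a tuple keeps files with different base names grouped together, and comparing the number as an int avoids "(10)" sorting before "(2)".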

Decoding String list in python from a binary file

I need to read a list of strings from a binary file and create a python list.
I'm using the below command to extract data from binary file:
tmp = f.read(100)
abc, = struct.unpack('100s',tmp)
The data that I can see in variable 'abc' is exactly as shown below, but I need to get the below data into a python list as strings.
Data that I need as a list: 'UsrVal' 'VdetHC' 'VcupHC' ..... 'Gravity_Axis'
b'UsrVal\x00VdetHC\x00VcupHC\x00VdirHC\x00HdirHC\x00UpFlwHC\x00UxHC\x00UyHC\x00UzHC\x00VresHC\x00UxRP\x00UyRP\x00UzRP\x00VresRP\x00Gravity_Axis'
Here is how I would suggest doing it with a one-liner.
You need to decode the binary string, and then you can split on "\x00", which returns the list you are looking for.
e.g.
my_binary_out = b'UsrVal\x00VdetHC\x00VcupHC\x00VdirHC\x00HdirHC\x00UpFlwHC\x00UxHC\x00UyHC\x00UzHC\x00VresHC\x00UxRP\x00UyRP\x00UzRP\x00VresRP\x00Gravity_Axis'
decoded_list = my_binary_out.decode("latin1", 'ignore').split('\x00')
#or
decoded_list = my_binary_out.decode("cp1252", 'ignore').split('\x00')
The output will look like this:
['UsrVal', 'VdetHC', 'VcupHC', 'VdirHC', 'HdirHC', 'UpFlwHC', 'UxHC', 'UyHC', 'UzHC', 'VresHC', 'UxRP', 'UyRP', 'UzRP', 'VresRP', 'Gravity_Axis']
Hope this helps
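For completeness, here is a sketch of the whole path from the raw 100 bytes to the list. It assumes the block is read as a single 100-byte string (struct format '100s') and that any unused bytes at the end are NUL padding; the raw value below stands in for f.read(100) and is built from the data in the question.

```python
import struct

# Stand-in for f.read(100): the 99 bytes from the question, NUL-padded to 100
raw = (b'UsrVal\x00VdetHC\x00VcupHC\x00VdirHC\x00HdirHC\x00UpFlwHC\x00'
       b'UxHC\x00UyHC\x00UzHC\x00VresHC\x00UxRP\x00UyRP\x00UzRP\x00'
       b'VresRP\x00Gravity_Axis').ljust(100, b'\x00')

(block,) = struct.unpack('100s', raw)  # '100s' yields one 100-byte bytes object

# Split on the NUL separator, drop empty pieces left by padding, then decode
names = [piece.decode('ascii') for piece in block.split(b'\x00') if piece]
print(names)
```

Splitting the bytes first and decoding each piece afterwards gives the same strings as decoding first, as long as the names are plain ASCII.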
If you're going for a quick and messy way here, AND assuming your string
b'UsrVal\x00VdetHC\x00VcupHC\x00VdirHC\x00HdirHC\x00UpFlwHC\x00UxHC\x00UyHC\x00UzHC\x00VresHC\x00UxRP\x00UyRP\x00UzRP\x00VresRP\x00Gravity_Axis'
is in fact interpreted as
" b'UsrVal\x00VdetHC\x00VcupHC\x00VdirHC\x00HdirHC\x00UpFlwHC\x00UxHC\x00UyHC\x00UzHC\x00VresHC\x00UxRP\x00UyRP\x00UzRP\x00VresRP\x00Gravity_Axis' "
Then the following two lines leave the list you want in b:
a = YourStringHere  # placeholder: the text string shown above, including the leading b' and trailing '
b = a[2:-1].split("\x00")

How to remove the stuff lists add when writing to textfiles

I need to write a list to a text file named accounts.txt in the following format:
kieranc,conyers,asdsd,pop
ethand,day,sadads,dubstep
However, it ends up like the following with brackets:
['kieranc', 'conyers', 'asdsd', 'pop\n']['ethand', 'day', 'sadads', 'dubstep']
Here is my code (accreplace is a list):
accreplace = [['kieranc', 'conyers', 'asdsd', 'pop\n'],['ethand', 'day', 'sadads', 'dubstep']]
acc = open("accounts.txt", "w")
for x in accreplace:
    acc.write(str(x))
Since each element in accreplace is a list, str(x) doesn't help: it returns the list's printed form, brackets and quotes included. To write each list in the proper format, use the code below:
for x in accreplace:
    acc.write(",".join([str(l) for l in x]))
Here join concatenates the items of each list into one comma-separated string.
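A slightly fuller sketch of the same join approach follows. It strips stray newlines from the fields and then writes exactly one newline per row, so the result does not depend on which items happen to end in '\n'. It writes to a temporary folder here so it can be run safely; the path is an assumption for the demo, and "accounts.txt" would be used for real.

```python
import os
import tempfile

accreplace = [['kieranc', 'conyers', 'asdsd', 'pop\n'],
              ['ethand', 'day', 'sadads', 'dubstep']]

# Written to a temporary folder here; swap in "accounts.txt" for real use
path = os.path.join(tempfile.mkdtemp(), "accounts.txt")

with open(path, "w") as acc:  # the with block closes the file for us
    for row in accreplace:
        # strip stray newlines from the fields, then add exactly one per row
        fields = [str(field).strip() for field in row]
        acc.write(",".join(fields) + "\n")

with open(path) as acc:
    written = acc.read()
print(written)
```

For fields that may themselves contain commas, the standard csv module's writer is probably the safer choice, since it handles quoting.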

Creating a match between string and file name in lists

Let's say I have a list containing strings of the form "{N} word" (without quotation marks), where N is some integer and word is some string. In some folder D:\path\folder, I have plenty of files with names of the form "{N}name.filetype". With the aforementioned list as input (each element starting with "{N}"), how would I get an output list where every element has the following form: "{N} word D:\path\folder\{N}name.filetype"?
For example...
InputList = [{75} Hello, {823} World, ...]
OutputList = [{75} Hello D:\path\folder\{75}Stuff.docx, {823} World D:\path\folder\{823}Things.docx, ...]
if the folder at D:\path\folder contains, among other files, {75}Stuff.docx and {823}Things.docx.
To generalize, my question is fundamentally:
How do I get python to read a folder and take the absolute path of any file that contains only some part of every element in the list (in this case, we look for {N} in the file names and disregard the word) and add that path to every corresponding element in the list to make the output list?
I understand this is a bit of a long question that combines a couple concepts so I highly thank anyone willing to help in advance!
The important step is to convert your InputList to a dict of {number: word} - this makes it significantly easier to work with. After that it's just a matter of looping through the files in the folder, extracting the number from their name and looking them up in the dict:
from pathlib import Path
InputList = ['{75} Hello', '{823} World']
folder_path = r'D:\path\folder'
# define a function to extract the number between curly braces
def extract_number(text):
    return text[1:text.find('}')]
# convert the InputList to a dict for easy and efficient lookup
words = {extract_number(name): name for name in InputList}
OutputList = []
# iterate through the folder to find matching files
for path in Path(folder_path).iterdir():
    # extract the file name from the path, e.g. "{75}Stuff.docx"
    name = path.name
    # extract the number from the file name and find the matching word
    number = extract_number(name)
    try:
        word = words[number]
    except KeyError:  # if no matching word exists, skip this file
        continue
    # put the path and the word together and add them to the output list
    path = '{} {}'.format(word, path)
    OutputList.append(path)
print(OutputList)
# output: ['{75} Hello D:\\path\\folder\\{75}Stuff.docx', '{823} World D:\\path\\folder\\{823}Things.docx']
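The same lookup can be sketched with a regular expression that only accepts names starting with "{N}", so files or list items without that prefix are ignored. This is an illustrative variant, not a replacement: the names braced_number and match_files are made up, and the demo runs in a temporary folder instead of D:\path\folder.

```python
import re
import tempfile
from pathlib import Path

BRACED_NUMBER = re.compile(r'^\{(\d+)\}')

def braced_number(text):
    """Return the digits inside a leading {N}, or None if there is no such prefix."""
    match = BRACED_NUMBER.match(text)
    return match.group(1) if match else None

def match_files(input_list, folder):
    """Pair each '{N} word' item with the absolute path of the file named '{N}...'."""
    words = {}
    for item in input_list:
        number = braced_number(item)
        if number is not None:
            words[number] = item
    output = []
    for path in sorted(Path(folder).iterdir()):
        number = braced_number(path.name)
        if number in words:  # files without a known {N} prefix are skipped
            output.append('{} {}'.format(words[number], path.resolve()))
    return output

# Demo in a throwaway folder with the file names from the question
with tempfile.TemporaryDirectory() as folder:
    for name in ('{75}Stuff.docx', '{823}Things.docx', 'notes.txt'):
        (Path(folder) / name).touch()
    result = match_files(['{75} Hello', '{823} World'], folder)

print(result)
```

Each element of result has the form "{N} word <absolute path>", with the exact path depending on where the temporary folder was created.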
