Separating a string by the first occurrence of a separator - python

I just got into Python very recently and I'm now practicing by creating small tools to sort files into folders (rather simple, I imagine, but challenging enough for me).
So far it has been going pretty well, but now I've encountered a problem:
My files are in the following format:
myAsset_prefix1_prefix2_prettyName.ext
(e.g. Tiger_texture_spec_brightOrange.png)
myAsset always has a different length, since it depends on the asset's name.
I want to sort every file of the same asset (the "myAsset_" tag) into a separate folder.
Copying to a separate folder etc. is no challenge, but...
I don't want to update an array by hand every time I create/receive a new asset.
So instead of using startswith and running it through a hand-maintained list, I'd like to build that array when my script runs, by making the script look at the name of each file and store everything up to and including the first "_" in a variable/array.
Is that possible?

I think you want the glob module. This lets you list the files that match a certain pattern.
For example:
import glob

for filename in glob.glob("*.ext"):
    asset_tag = filename.split("_")[0] + "_"   # everything up to and including the first "_"
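A minimal sketch of the full sorting step, assuming the files sit in the current directory and the asset tag (without the underscore) doubles as the folder name; the "*.png" pattern is just an example, and exist_ok needs Python 3.2+:

import glob
import os
import shutil

for filename in glob.glob("*.png"):                       # adjust the pattern to your extension
    asset_tag = filename.split("_")[0]                    # e.g. "Tiger"
    os.makedirs(asset_tag, exist_ok=True)                 # create the asset folder if needed
    shutil.move(filename, os.path.join(asset_tag, filename))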


how to open a file closest to the file specified python

I am using Python and I want to run a program that will open the file specified by the user. But the problem is that if the user doesn't specify the exact file name, it will give an error. If the user wants to open "99999-file-name.mp3"
and he has typed "filename.mp3", then how can the program open the file closest to the one specified?
First get a list of files in the particular folder
Then use difflib.get_close_matches like so:
difflib.get_close_matches(user_specified_file, list_of_files)
to find "good" matches.
N.B.: Consider providing a small cutoff, e.g. 0.1, as suggested by @tobias_k, to ensure you always get a match; the default cutoff of 0.6 means that sometimes nothing will be a "good match" for what the user entered.
Similarly, if you only need a single file name, pass the optional parameter n=1 to get just the closest match; if you don't specify it, you will get up to the 3 best matches.
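A minimal sketch, assuming the files live in a hypothetical folder music_dir:

import difflib
import os

music_dir = "."                                 # hypothetical folder to search
list_of_files = os.listdir(music_dir)
matches = difflib.get_close_matches("filename.mp3", list_of_files, n=1, cutoff=0.1)
if matches:
    print("Closest match:", matches[0])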
To answer this question, you need to first define "closest", because in computing this can mean very different things. If you want to compare strings and find the most similar one, then one good way of doing that is checking the edit distance. There are Python libraries out there for that, e.g. https://pypi.python.org/pypi/editdistance.
You give it two strings and it tells you how much you have to change one string to get the other. As per the documentation:
>>> import editdistance
>>> editdistance.eval('banana', 'bahama')
2L
PS. Can't help but mention that I think this is a bad idea. If you want to do something with the opened file and the program starts opening roughly-matching files, then either you're eventually going to overwrite a file that was not meant to be overwritten, or you'll try to process a file that can't be processed in your intended way. I would recommend using a file selection dialog instead, which you can easily build with Tkinter, for example (even though I'm not a fan of Tkinter).
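For what it's worth, a minimal sketch of such a file selection dialog (Python 3 module names):

import tkinter as tk
from tkinter import filedialog

root = tk.Tk()
root.withdraw()                                 # hide the empty main window
chosen = filedialog.askopenfilename(title="Pick a file")
print(chosen)                                   # full path of the selected file, or "" if cancelled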

Can a file read be combined in a list comprehension with list slicing?

The following code fragment is the core of a twitter bot using Twython.
I would like to know if I can combine the file read into the list comprehension
as it seems rather convoluted to read a line in as a one-item list only to then
create another multi-item list from that.
I've checked around and found some examples where a whole file is read in using
readlines() for instance, but not one where slicing is involved too.
with open(tweet_datafile,'r') as smstweets:
    bigtweet = smstweets.readline().strip()
text_entire = [ bigtweet[i:i+140] for i in range(0,len(bigtweet),140) ]
for line in range(len(text_entire)):
    twitter.update_status(status=text_entire[line])
Notes:
Python 2.7, Linux. Python 3.5 is installed & available if needs be.
readline().strip() is used because I want to be able to read a file with lines
of arbitrary length and remove any EOL and whitespace (the last item of the list
could end up as spaces only; twitter will reject a status update of spaces, and I
haven't yet written any error-handling for this).
I read only the first line of the input file and then, later in the code, write the file back out
minus that line. I decided this was the simplest solution for my limited skills, as the bot won't run 24/7.
I'm not a programmer, I've hacked this together using scraps of example code I
found lying about on Stack Overflow and elsewhere. I'm trying to use quite simple
code and not rely on 3rd party libs apart from Twython. Generators and iterators
appear as sorcery to me.
Well, maybe just a minor change: it is not necessary to assign the list to a variable, nor to iterate over indices; the chunks can be iterated over directly:
with open(tweet_datafile,'r') as smstweets:
    bigtweet = smstweets.readline().strip()
for line in ( bigtweet[i:i+140] for i in range(0,len(bigtweet),140) ):
    twitter.update_status(status=line)
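And to answer the literal question: yes, the read can be folded into the comprehension itself, though the two-step version above is arguably clearer. A sketch of the combined form, using the same tweet_datafile and twitter objects as in the question:

with open(tweet_datafile,'r') as smstweets:
    text_entire = [ bigtweet[i:i+140]
                    for bigtweet in [smstweets.readline().strip()]   # bind the stripped line once
                    for i in range(0, len(bigtweet), 140) ]
for chunk in text_entire:
    twitter.update_status(status=chunk)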

python parsing file into data structure

So I started looking into it, and I haven't found a good way to parse a file following the format I will show you below. I have taken a data structures course, but it doesn't really help me with what I want to do. Any help will be greatly appreciated!
Goal: Create a tool that can read, create, and manipulate a custom file type
File Format: I'm sure there is a name for this type of format, but I couldn't find it. Anyways, the format is subject to some change since the variable names can be added, removed, or changed. Also, after each variable name the data could be one of several different types. Right now the files do not use sub groups, but I want to be prepared in case they decide to change that. The only things I can think of that will remain constant are the GROUP = groupName, END_GROUP = groupName, and the varName = data.
GROUP = myGroup
name1 = String, datenum, number, list, array
name2 = String, datenum, number, list, array
// . . .
name# = String, datenum, number, list, array
GROUP = mySubGroup
name1 = String, datenum, number, list, array
END_GROUP = mySubGroup
// More names could go here
END_GROUP = myGroup
GROUP = myGroup2
// etc.
END_GROUP = myGroup2
Strings and dates are enclosed in double quotes (e.g. "myString")
Numbers are written as raw ASCII-encoded numbers. They also use the E format if they are very large or small (e.g. 5.023E-6)
Lists are comma-separated and enclosed in parentheses (e.g. (1,2,3,4))
Additional Info:
I want to be able to easily read a file and manipulate it as needed. For example, if I read the file and I want to change an attribute of a specific variable within a group I should be able to do something along the lines of dataStructure.groupName.varName = newData.
It should be easy to create my own file (using a default template that I will make myself or a custom template that has been passed in).
I want it to treat numbers as numbers and not strings. I should be able to add, subtract, multiply, etc. values within the data structure that are numbers
The big kicker, I'd like to have this written in vanilla python since our systems have only the most basic modules. It is a huge pain for someone to download another module since they have to create their own virtual environment and import the module to it. This tool should be as system independent as possible
Initial Attempt: I was thinking of using a dictionary to organize the data in levels. I do, however, like the idea of using dot structures (like what one would see using MATLAB structures). I wrote a function that will read all the lines of the file and remove the newline characters from each line. From there I want to check for every GROUP = I can find. I would start adding data to that group until I hit an END_GROUP line. Using regular expressions I should be able to parse out the line to determine whether it is a date, number, string, etc.
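For illustration, a minimal sketch of that approach using nested dictionaries and only the standard library; the value parsing is a simplified guess at the grammar described above (quoted strings/dates, parenthesised lists, plain or E-format numbers):

def parse_value(text):
    text = text.strip()
    if text.startswith('"') and text.endswith('"'):
        return text[1:-1]                               # string or date, kept as str
    if text.startswith('(') and text.endswith(')'):
        return [parse_value(v) for v in text[1:-1].split(',')]   # list
    try:
        return float(text)                              # plain or E-format number
    except ValueError:
        return text                                     # fall back to the raw text

def parse_file(path):
    root = {}
    stack = [root]                                      # tracks the current group nesting
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            key, _, value = line.partition('=')
            key, value = key.strip(), value.strip()
            if key == 'GROUP':
                group = {}
                stack[-1][value] = group                # attach the (sub)group to its parent
                stack.append(group)
            elif key == 'END_GROUP':
                stack.pop()
            else:
                stack[-1][key] = parse_value(value)
    return root

# e.g. parse_file('data.txt')['myGroup']['mySubGroup']['name1']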
I am asking this question because I hope to have some insight on things I may be missing. I'd like for this tool to be used long after I've left the dev team which is why I'm trying to do my best to make it as intuitive and easy to use as possible. Thank you all for your help, I really appreciate it! Let me know if you need any more information to help you help me.
EDIT: To clarify what help I need, here are my two main questions I am hoping to answer:
How should I build a data structure to hold grouped data?
Is there an accepted algorithm for parsing data like this?

writing a python script that calls different directories

It's kind of hard to explain, but I'm working with a directory that has a number of different files, and essentially I want to loop over files whose names step through irregular intervals,
so in pseudocode I guess it would be written like:
A = 1E4, 1E5, 5E5, 7E5, 1E6, 1.05E6, 1.1E6, 1.2E6, 1.5E6, 2E6
For A in range(start(A),end(A)):
inputdir ="../../../COMBI_Output/Noise Studies/[A] Macro Particles/10KT_[A]MP_IP1hoN0.0025/"
Run rest of code
Because at the moment I'm doing it manually by changing the value of [A], and it's a nightmare and time consuming. I'm using Python on a MacBook, so I wonder if writing a bash script that is called within Python would be the right idea?
Or replacing A with a text file, such that it's:
import numpy as np
mpnum = np.loadtxt("mp.txt")
for A in range(0, len(A)):
    for B in range(0, len(A)):
        inputdir = "../../../COMBI_Output/Noise Studies/",[A] "Macro Particles/10KT_",[A]"MP_IP1hoN0.0025/"
But I tried this first and still had no luck.
You are almost there. You don't need a range, just iterate over the list, then substitute the value into the string using format:
A = ['1E4', '1E5', '5E5', '7E5', '1E6', '1.05E6', '1.1E6', '1.2E6', '1.5E6', '2E6']
for a in A:
    inputdir = "../../../COMBI_Output/Noise Studies/{0} Macro Particles/10KT_{0}MP_IP1hoN0.0025/".format(a)
The idea of putting the file names in a list and simply iterating over them using
for a in A:
seems to be the best idea. However, one small suggestion, if I may: if you're going to have a large number of files in this list, why not make it a dictionary instead? That way you can still iterate through your files easily and also keep extra information, such as a count, alongside each entry.
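A minimal sketch of that dictionary idea (the per-entry counts are purely hypothetical):

# Map each macro-particle value to some extra data, e.g. how many times it has been processed.
A = {'1E4': 0, '1E5': 0, '5E5': 0, '7E5': 0}
for a in A:
    inputdir = "../../../COMBI_Output/Noise Studies/{0} Macro Particles/10KT_{0}MP_IP1hoN0.0025/".format(a)
    A[a] += 1                                   # keep a count per entry
    # ... run the rest of the code for this directory ...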

Choosing random number from list, and then removing it?

Let's say that I have a separate text file that contains a series of numbers:
1
2
3
And so on. Is it possible for a Python program to randomly choose one of the numbers in that text file, and then remove that number from the text file? I know it is possible to do the first, but I am struggling with the second part.
If it helps, the list is about 180000 numbers long. I am very new at this. The idea is to randomly assign a player a number, and then remove that number from the list so another player can't get it.
Do you actually have 180,000 players? If not, what about solving the problem the other way round:
Create a file listing the IDs already used
For each new user:
Create a fairly large random ID (like the ones in your current file)
Run through the 'used' IDs in your file and check your new ID doesn't collide with an existing one - if it does, generate new ones until there is no collision
Append the new ID to your file
This will be much faster than reading, checking and writing a large file each time. If your IDs are large, you won't get many collisions.
You could also optimise the process, for example using a two-part ID consisting of today's date and a random number. You would then keep a file for each day, and only need to check for collisions with the IDs issued today.
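A minimal sketch of the basic collision-check approach (the file name used_ids.txt and the 9-digit ID size are assumptions):

import random

def new_player_id(path="used_ids.txt", digits=9):
    # Read the IDs already issued (the file may not exist yet).
    try:
        with open(path) as f:
            used = set(line.strip() for line in f)
    except IOError:
        used = set()
    # Keep generating random IDs until one doesn't collide.
    candidate = str(random.randrange(10 ** digits))
    while candidate in used:
        candidate = str(random.randrange(10 ** digits))
    # Record the new ID and hand it out.
    with open(path, "a") as f:
        f.write(candidate + "\n")
    return candidate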
My suggestion would be to read the entire text file, make whatever changes you want to it, and then rewrite the original contents of the file; as far as I know that is the simplest way.
If the file is small, read the whole thing into a list, delete a value from the list, then write the new list to a temp file. Finally, rename the temp file to the original filename.
If the file is large, read the file one line at a time, writing the values (except one) to a temp file. Then rename the temp file to the original filename.
Like dstromberg said, if the file is small, check out the documentation on file IO and this answer's strategy for writing lists to a file. Note that writelines() "does not add line separators."
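A minimal sketch of the small-file version (numbers.txt is a hypothetical file name; the temp file is created in the same directory so the rename stays on one filesystem):

import os
import random
import tempfile

path = "numbers.txt"                            # hypothetical file of available numbers
with open(path) as f:
    numbers = [line.strip() for line in f if line.strip()]

chosen = random.choice(numbers)                 # the number assigned to this player
numbers.remove(chosen)

# Write the remaining numbers to a temp file, then rename it over the original.
fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
with os.fdopen(fd, "w") as tmp:
    tmp.write("\n".join(numbers) + "\n")
os.rename(tmp_path, path)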
