Trying to create a Python Script to extract data from .log files - python

I'm trying to create a Python Script but I'm a bit stuck and can't find what I'm looking for on a Google search as it's quite specific.
I need to run a script on two .log files (auth.log and access.log) to view the following information:
Find how many attempts were made with the bin account
So how many attempts the bin account made to try and get into the server.
The logs are based off being hacked and needing to identify how and who is responsible.
Would anyone be able to give me some help in how I go about doing this? I can provide more information if needed.
Thanks in advance.
Edit:
I've managed to print all the times 'bin' appears in the log which is one way of doing it. Does anyone know if I can count how many times 'bin' appears as well?
with open("auth.log") as f:
for line in f:
if "bin" in line:
print line

Given that you work with system logs and their format is known and stable, my approach would be something like:
identify a set of keywords (either common, or one per log)
for each log, iterate line by line
once keywords match, add the relevant information from each line in e.g. a dictionary
You could use shell tools (like grep, cut and/or awk) to pre-process the log and extract relevant lines from the log (I assume you only need e.g. error entries).
You can use something like this as a starting point.

If you want ot use tool then you can use ELK(Elastic,Logstash and kibana).
if no then you have to read first log file then apply regex according to your requirment.

In case you might be interested in extracting some data and save it to a .txt file, the following sample code might be helpful:
import re
import sys
import os.path
expDate = '2018-11-27'
expTime = '11-21-09'
infile = r"/home/xenial/Datasets/CIVIT/Nov_27/rover/NMND17420010S_"+expDate+"_"+expTime+".LOG"
keep_phrases = ["FINESTEERING"]
with open(infile) as f:
f = f.readlines()
with open('/home/xenial/Datasets/CIVIT/Nov_27/rover/GPS_'+expDate+'_'+expTime+'.txt', 'w') as file:
file.write("gpsWeek,gpsSOW\n")
for line in f:
for phrase in keep_phrases:
if phrase in line:
resFind = re.findall('\.*?FINESTEERING,(\d+).*?,(\d+\.\d*)',line)[0]
gpsWeek = re.findall('\.*?FINESTEERING,(\d+)',line)[0]
gpsWeekStr = str(gpsWeek)
gpsSOW = re.findall('\.*?FINESTEERING,'+ gpsWeekStr + ',(\d+\.\d*)',line)[0]
gpsSOWStr = str(gpsSOW)
file.write(gpsWeekStr+','+gpsSOWStr+'\n')
break
print ("------------------------------------")
In my case, FINESTEERING was an interesting keyword in my .log file to extract numbers, including GPS_Week and GPS_Seconds_of_Weeks. You may modify this code to suit your own application.

Related

How to speed up reading/scanning mass files for an exact keyword match

So ive looked on google and the only results i get is "reading large files", not much about how to speed up reading multiple files.
I have a sound-alias-keyword. This keyword will need to be scanned in up to 128 files
and i could have up to 1,600 keywords to scan for in said files.
So as you can see thats a lot of opening/reading. And its loading time is very slow. I cant have it be this slow for my program. I need to reduce the load time by 10 fold.
So i have this code snippet which reads files line by line and if a mention of the keyword is in said line then it will do an exact-match check.
for file in weapon_files:
with open(file, 'r', encoding="ISO-8859-1") as weapon_file:
for m in weapon_file:
if sAlias in m:
t = re.compile(fr'\b{sAlias}\b')
result = t.search(m)
if result:
called_file_.append(''.join(f"root{file[len(WAW_ROOT_DIR):]}"))
I then thought id try and see if i could speed things up by turning the file into a string. Do a basic scan to see if theres any mention and if so, then do an exact-match search. But this approach didnt speed it up by anything worth caring about.
for file in weapon_files:
with open(file, 'r', encoding="ISO-8859-1") as weapon_file:
w = weapon_file.read()
if sAlias in w:
t = re.compile(fr'\b{sAlias}\b')
result = t.search(w)
if result:
called_file_.append(''.join(f"root{file[len(WAW_ROOT_DIR):]}"))
I then thought id just open each file, turn it into a string and then append all the file-strings together, check for any mention, then do an exact-match search. Which did actually reduce the loading time but then i realised i cant use that approach as the whole point of scanning these files for an exact-keyword-match is to then store the matched-file-directory into a list. This approach removes any chance of that.
weaponString = ""
for file in weapon_files:
with open(file, 'r', encoding="ISO-8859-1") as weapon_file:
e = weapon_file.read()
weaponString += e
if sAlias in weaponString:
t = re.compile(fr'\b{sAlias}\b')
result = t.search(weaponString)
if result:
called_file_.append(''.join(f"root{file[len(WAW_ROOT_DIR):]}"))
This is what the files look like.
It may also be worth mentioning these files have no .extension, but i dont think thats an issue as python can still read these files just fine.
WEAPONFILE\displayName\WEAPON_30CAL\modeName\\playerAnimType\smg\altWeapon\\AIOverlayDescription\WEAPON_SUBMACHINEGUNNER\weaponType\bullet\weaponClass\mg\penetrateType\large\impactType\bullet_large\inventoryType\primary\fireType\Full Auto\clipType\lmg\twoHanded\1\rifleBullet\0\armorPiercing\0\boltAction\0\aimDownSight\1\rechamberWhileAds\1\noADSAutoReload\0\noPartialReload\0\segmentedReload\0\adsFire\0\noAdsWhenMagEmpty\0\avoidDropCleanup\0\enhanced\0\bayonet\0\cancelAutoHolsterWhenEmpty\0\suppressAmmoReserveDisplay\0\laserSightDuringNightvision\0\blocksProne\0\silenced\0\mountableWeapon\0\autoAimRange\1200\aimAssistRange\3200\aimAssistRangeAds\3200\enemyCrosshairRange\720\crosshairColorChange\1\moveSpeedScale\0.75\adsMoveSpeedScale\0.75\sprintDurationScale\0.75\gunMaxPitch\6\gunMaxYaw\6\lowAmmoWarningThreshold\0.33\ammoName\30cal\maxAmmo\500\startAmmo\500\clipName\30cal\clipSize\125\shotCount\1\dropAmmoMin\200\dropAmmoMax\250\reloadAmmoAdd\0\reloadStartAdd\0\damage\130\minDamage\90\meleeDamage\150\maxDamageRange\1024\minDamageRange\2400\playerDamage\70\locNone\1\locHelmet\3\locHead\3\locNeck\1\locTorsoUpper\1\locTorsoLower\1\locRightArmUpper\1\locRightArmLower\1\locRightHand\1\locLeftArmUpper\1\locLeftArmLower\1\locLeftHand\1\locRightLegUpper\1\locRightLegLower\1\locRightFoot\1\locLeftLegUpper\1\locLeftLegLower\1\locLeftFoot\1\locGun\0\fireTime\0.096\fireDelay\0\meleeTime\0.5\meleeChargeTime\1\meleeDelay\0.05\meleeChargeDelay\0.15\reloadTime\7\reloadEmptyTime\6\reloadEmptyAddTime\0\reloadStartTime\0\reloadEndTime\0\reloadAddTime\4.75\reloadStartAddTime\0\rechamberTime\0.1\rechamberBoltTime\0\dropTime\0.83\raiseTime\0.9\altDropTime\0.7\altRaiseTime\0\quickDropTime\0.25\quickRaiseTime\0.25\firstRaiseTime\1.5\emptyDropTime\0.5\emptyRaiseTime\0.5\sprintInTime\0.5\sprintLoopTime\0.8\sprintOutTime\0.2\deployTime\0.5\breakdownTime\0.5\nightVisionWearTime\0.5\nightVisionWearTimeFadeOutEnd\0\nightVisionWearTimePowerUp\0\nightVisionRemoveTime\0.5\nightVisionRemoveTimePowerDown\0\nightVisionRemoveTimeFadeInStart\0\standMoveF\0\standMoveR\0\standMoveU\-2\standRotP\0\standRotY\0\standRotR\0\standMoveMinSpeed\0\standRotMinSpeed\0\posMoveRate\8\posRotRate\8\sprintOfsF\1\sprintOfsR\-2\sprintOfsU\-1\sprintRotP\10\sprintRotY\45\sprintRotR\-20\sprintBobH\8\sprintBobV\6\sprintScale\0.9\duckedSprintOfsF\2\duckedSprintOfsR\-1\duckedSprintOfsU\0\duckedSprintRotP\10\duckedSprintRotY\25\duckedSprintRotR\-20\duckedSprintBobH\2\duckedSprintBobV\3\duckedSprintScale\0.8\duckedMoveF\0\duckedMoveR\0\duckedMoveU\-1.5\duckedRotP\0\duckedRotY\0\duckedRotR\0\duckedOfsF\-0.5\duckedOfsR\0.25\duckedOfsU\-0.6\duckedMoveMinSpeed\0\duckedRotMinSpeed\0\proneMoveF\-160\proneMoveR\-75\proneMoveU\-120\proneRotP\0\proneRotY\300\proneRotR\-300\proneOfsF\0\proneOfsR\0.5\proneOfsU\-1\posProneMoveRate\10\posProneRotRate\10\proneMoveMinSpeed\0\proneRotMinSpeed\0\hipIdleAmount\30\adsIdleAmount\28\hipIdleSpeed\1\adsIdleSpeed\0.9\idleCrouchFactor\0.75\idleProneFactor\0.4\adsSpread\0\adsAimPitch\0\adsTransInTime\0.22\adsTransOutTime\0.4\adsTransBlendTime\0.1\adsReloadTransTime\0.3\adsCrosshairInFrac\1\adsCrosshairOutFrac\0.2\adsZoomFov\50\adsZoomInFrac\0.7\adsZoomOutFrac\0.4\adsBobFactor\0\adsViewBobMult\0.25\adsViewErrorMin\0\adsViewErrorMax\0\hipSpreadStandMin\4\hipSpreadDuckedMin\3.5\hipSpreadProneMin\3\hipSpreadMax\10\hipSpreadDuckedMax\8\hipSpreadProneMax\6\hipSpreadFireAdd\0.6\hipSpreadTurnAdd\0\hipSpreadMoveAdd\5\hipSpreadDecayRate\4\hipSpreadDuckedDecay\1.05\hipSpreadProneDecay\1.1\hipGunKickReducedKickBullets\0\hipGunKickReducedKickPercent\0\hipGunKickPitchMin\5\hipGunKickPitchMax\-15\hipGunKickYawMin\5\hipGunKickYawMax\-5\hipGunKickAccel\800\hipGunKickSpeedMax\2000\hipGunKickSpeedDecay\16\hipGunKickStaticDecay\20\adsGunKickReducedKickBullets\0\adsGunKickReducedKickPercent\75\adsGunKickPitchMin\5\adsGunKickPitchMax\15\adsGunKickYawMin\-5\adsGunKickYawMax\10\adsGunKickAccel\800\adsGunKickSpeedMax\2000\adsGunKickSpeedDecay\32\adsGunKickStaticDecay\40\hipViewKickPitchMin\70\hipViewKickPitchMax\80\hipViewKickYawMin\-30\hipViewKickYawMax\-60\hipViewKickCenterSpeed\1500\adsViewKickPitchMin\45\adsViewKickPitchMax\55\adsViewKickYawMin\-70\adsViewKickYawMax\70\adsViewKickCenterSpeed\1800\swayMaxAngle\4\swayLerpSpeed\6\swayPitchScale\0.1\swayYawScale\0.1\swayHorizScale\0.2\swayVertScale\0.2\swayShellShockScale\5\adsSwayMaxAngle\4\adsSwayLerpSpeed\6\adsSwayPitchScale\0.1\adsSwayYawScale\0\adsSwayHorizScale\0.08\adsSwayVertScale\0.1\fightDist\720\maxDist\340\aiVsAiAccuracyGraph\thompson.accu\aiVsPlayerAccuracyGraph\light_machine_gun.accu\reticleCenter\\reticleSide\reticle_side_small\reticleCenterSize\4\reticleSideSize\8\reticleMinOfs\0\hipReticleSidePos\0\adsOverlayShader\\adsOverlayShaderLowRes\\adsOverlayReticle\none\adsOverlayWidth\220\adsOverlayHeight\220\gunModel\viewmodel_usa_30cal_lmg\gunModel2\\gunModel3\\gunModel4\\gunModel5\\gunModel6\\gunModel7\\gunModel8\\gunModel9\\gunModel10\\gunModel11\\gunModel12\\gunModel13\\gunModel14\\gunModel15\\gunModel16\\handModel\viewmodel_hands_no_model\worldModel\weapon_usa_30cal_lmg\worldModel2\\worldModel3\\worldModel4\\worldModel5\\worldModel6\\worldModel7\\worldModel8\\worldModel9\\worldModel10\\worldModel11\\worldModel12\\worldModel13\\worldModel14\\worldModel15\\worldModel16\\worldClipModel\\knifeModel\viewmodel_usa_kbar_knife\worldKnifeModel\weapon_usa_kbar_knife\idleAnim\viewmodel_30cal_idle\emptyIdleAnim\viewmodel_30cal_empty_idle\fireAnim\viewmodel_30cal_fire\lastShotAnim\viewmodel_30cal_lastshot\rechamberAnim\\meleeAnim\viewmodel_knife_slash\meleeChargeAnim\viewmodel_knife_stick\reloadAnim\viewmodel_30cal_partial_reload\reloadEmptyAnim\viewmodel_30cal_reload\reloadStartAnim\\reloadEndAnim\\raiseAnim\viewmodel_30cal_pullout\dropAnim\viewmodel_30cal_putaway\firstRaiseAnim\viewmodel_30cal_first_raise\altRaiseAnim\\altDropAnim\\quickRaiseAnim\viewmodel_30cal_pullout_fast\quickDropAnim\viewmodel_30cal_putaway_fast\emptyRaiseAnim\viewmodel_30cal_pullout_empty\emptyDropAnim\viewmodel_30cal_putaway_empty\sprintInAnim\\sprintLoopAnim\\sprintOutAnim\\nightVisionWearAnim\\nightVisionRemoveAnim\\adsFireAnim\viewmodel_30cal_ADS_fire\adsLastShotAnim\viewmodel_30cal_ADS_lastshot\adsRechamberAnim\\adsUpAnim\viewmodel_30cal_ADS_up\adsDownAnim\viewmodel_30cal_ADS_down\deployAnim\\breakdownAnim\\viewFlashEffect\weapon/muzzleflashes/fx_30cal_bulletweap_view\worldFlashEffect\weapon/muzzleflashes/fx_30cal_bulletweap\viewShellEjectEffect\weapon/shellejects/fx_heavy_link_view\worldShellEjectEffect\weapon/shellejects/fx_heavy\viewLastShotEjectEffect\\worldLastShotEjectEffect\\worldClipDropEffect\\pickupSound\weap_pickup\pickupSoundPlayer\weap_pickup_plr\ammoPickupSound\ammo_pickup\ammoPickupSoundPlayer\ammo_pickup_plr\breakdownSound\\breakdownSoundPlayer\\deploySound\\deploySoundPlayer\\finishDeploySound\\finishDeploySoundPlayer\\fireSound\weap_30cal_fire\fireSoundPlayer\weap_30cal_fire_plr\lastShotSound\weap_30cal_fire\lastShotSoundPlayer\weap_30cal_fire_plr\emptyFireSound\dryfire_rifle\emptyFireSoundPlayer\dryfire_rifle_plr\crackSound\\whizbySound\\meleeSwipeSound\melee_swing\meleeSwipeSoundPlayer\melee_swing_plr\meleeHitSound\melee_hit\meleeMissSound\\rechamberSound\\rechamberSoundPlayer\\reloadSound\gr_30cal_3p_full\reloadSoundPlayer\\reloadEmptySound\gr_30cal_3p_full\reloadEmptySoundPlayer\\reloadStartSound\\reloadStartSoundPlayer\\reloadEndSound\\reloadEndSoundPlayer\\altSwitchSound\\altSwitchSoundPlayer\\raiseSound\weap_raise\raiseSoundPlayer\weap_raise_plr\firstRaiseSound\weap_raise\firstRaiseSoundPlayer\weap_raise_plr\putawaySound\weap_putaway\putawaySoundPlayer\weap_putaway_plr\nightVisionWearSound\\nightVisionWearSoundPlayer\\nightVisionRemoveSound\\nightVisionRemoveSoundPlayer\\standMountedWeapdef\\crouchMountedWeapdef\\proneMountedWeapdef\\mountedModel\\hudIcon\hud_icon_30cal\killIcon\hud_icon_30cal\dpadIcon\\ammoCounterIcon\\hudIconRatio\4:1\killIconRatio\4:1\dpadIconRatio\4:1\ammoCounterIconRatio\4:1\ammoCounterClip\Beltfed\flipKillIcon\1\fireRumble\defaultweapon_fire\meleeImpactRumble\defaultweapon_melee\adsDofStart\0\adsDofEnd\7.5\hideTags\\notetrackSoundMap\gr_30cal_start_plr gr_30cal_start_plr
gr_30cal_open_plr gr_30cal_open_plr
gr_30cal_grab_belt_plr gr_30cal_grab_belt_plr
gr_30cal_belt_remove_plr gr_30cal_belt_remove_plr
gr_30cal_belt_raise_plr gr_30cal_belt_raise_plr
gr_30cal_belt_contact_plr gr_30cal_belt_contact_plr
gr_30cal_belt_press_plr gr_30cal_belt_press_plr
gr_30cal_close_plr gr_30cal_close_plr
gr_30cal_charge_plr gr_30cal_charge_plr
gr_30cal_ammo_toss_plr gr_30cal_ammo_toss_plr
gr_30cal_charge_release_plr gr_30cal_charge_release_plr
gr_30cal_lid_bonk_plr gr_30cal_lid_bonk_plr
knife_stab_plr knife_stab_plr
knife_pull_plr knife_pull_plr
Knife_slash_plr Knife_slash_plr
gr_mg_deploy_start gr_mg_deploy_start
gr_mg_deploy_end gr_mg_deploy_end
gr_mg_break_down gr_mg_break_down
gr_30cal_tap_plr gr_30cal_tap_plr
Any help is appreciated.
Instead of searching line by line, you can search the entire file at once. I have included a code example below, which searches one file for multiple keywords and prints the keywords found.
keywords = ["gr_30cal_open_plr", "gr_mg_deploy_end", "wontfindthis"]
with open("test.txt") as f:
contents = f.read()
# Search the file for each keyword.
keywords_found = {keyword for keyword in keywords if keyword in contents}
if keywords_found:
print("This file contains the following keywords:")
print(keywords_found)
else:
print("This file did not contain any keywords.")
I'll explain the code. f.read() will read the file contents. Then I use a set comprehension to get all of the keywords found in the file. I use a set because that will keep only the unique keywords -- I assume you don't need to know how many times a keyword appears in the file. (A set comprehension is similar to a list comprehension, but it creates a set.) Testing whether the keyword is in the file is as easy as keyword in contents.
I used your sample file contents and duplicated it multiple times so the file contained 45,252,362 lines (1.8 GB). And my code above took less than 1 second.
well you can use multiprocessing to speed up your work, but I don't think it is the best way, but well I am sharing the code so you can try it for yourself and see if that work for you or not.
import multiprocessing
def process(file):
with open(file, 'r', encoding="ISO-8859-1") as weapon_file:
for m in weapon_file:
if sAlias in m:
t = re.compile(fr'\b{sAlias}\b')
result = t.search(m)
if result:
called_file_.append
(''.join(f"root{file[len(WAW_ROOT_DIR):]}"))
p = multiprocessing.Pool()
for file in weapon_files:
# now launch a process for each file.
# The result will be approximately one process per CPU core available.
p.apply_async(process, [file])
p.close()
p.join() # Wait for all child processes to be closed.
hope it helps.

How can I read and search multiple textfiles so that I can store a list of files that match my search?

I hope you can help out a new learner of Python. I could not find my problem in other questions, but if so: apologies. What I basically want to do is this:
Read a large number of text files and search each for a number of string terms.
If the search terms are matched, store the corresponding file name to a new file called "filelist", so that I can tell the good files from the bad files.
Export "filelist" to Excel or CSV.
Here is the code that I have so far:
#textfiles all contain only simple text e.g. "6 Apples"
filelist=[]
for file in os.listdir('C:/mydirectory/'):
with open('C:/mydirectory/' + file, encoding="Latin1") as f:
fine=f.read()
if re.search('APPLES',fine) or re.search('ORANGE',fine) or re.search('BANANA',fine):
filelist.append(file)
listoffiles = pd.DataFrame(filelist)
writer = pd.ExcelWriter('ListofFiles.xlsx', engine='xlsxwriter')
listoffiles.to_excel(writer,sheet_name='welcome',index=False)
writer.save()
print(filelist)
Questions:
Surely, there is a more elegant or time-efficient way? I need to do this for a large amount of files :D
Related to the former, is there a way to solve the reading-in of files using pandas? Or is it less time efficient? For me as a STATA user, having a dataframe feels a bit more like home....
I added the "Latin1" option, as some characters in the raw data create conflict in encoding. Is there a way to understand which characters are causing the problem? Can I get rid of this easily, e.g. by cutting of the first line beforehand (skiprow maybe)?
Just couple of things to speed up the script:
1.) compile your regex beforehand, not every time in the loop (also use | to combine multiple strings to one regex!
2.) read files line by line, not all at once!
3.) Use any() to terminate search when you get first positive
For example:
import re
import os
filelist=[]
r = re.compile(r'APPLES|ORANGE|BANANA') # you can add flags=re.I for case insensitive search
for file in os.listdir('C:/mydirectory/'):
with open('C:/mydirectory/' + file, 'r', encoding='latin1') as f:
if any(r.search(line) for line in f): # read files line by line, not all content at once
filelist.append(file) # add to list
# convert list to pandas, etc...

Regex search term used in Notepad++ does not work with python

I'm working with a large .json filled with twitter bios and would like to extract screen_names. To prevent that the search also returns potential users mentioned in the bio section it is important only to extract the first match ofeach line.
When I open the file in Notepad++ I can use the following regex to do exactly that:
(^.*?)\K"screen_name": "(\w+)"
Using the same as part of an re.findall or re.search in python does not result in any matches.
I'm totally new to both Python and regex so I'm fairly certain I'm not fully aware of all the necessary coding.
Many thanks in advance!
As noted by other users Python and Notepad use different search codes, and so to achieve my wanted result I deployed the following code:
import re
regex=re.compile(r'"screen_name":\s*"(\w+)"')
with open("followers.json", "r") as f:
for line in f:
output=regex.search(line)
with open("followers.txt", "a") as outp:
outp.write(output.group(1)+"\n")
This will analyse your specified .json file, read it line by line, and save every first match of each line in the file "followers.txt".

Webbrowser() reading through a text file for URLS

I am trying to write a script to automate browsing to my most commonly visited websites. I have put the websites into a list and am trying to open it using the webbrowser() module in Python. My code looks like the following at the moment:
import webbrowser
f = open("URLs", "r")
list = f.readline()
for line in list:
webbrowser.open_new_tab(list)
This only reads the first line from my file "URLs" and opens it in the browser. Could any one please help me understand how I can achieve reading through the entire file and also opening the URLs in different tabs?
Also other options that can help me achieve the same.
You have two main problems.
The first problem you have is that you are using readline and not readlines. readline will give you the first line in the file, while readlines gives you a list of your file contents.
Take this file as an example:
# urls.txt
http://www.google.com
http://www.imdb.com
Also, get in to the habit of using a context manager, as this will close the file for you once you have finished reading from it. Right now, even though for what you are doing, there is no real danger, you are leaving your file open.
Here is the information from the documentation on files. There is a mention about best practices with handling files and using with.
The second problem in your code is that, when you are iterating over list (which you should not use as a variable name, since it shadows the builtin list), you are passing list in to your webrowser call. This is definitely not what you are trying to do. You want to pass your iterator.
So, taking all this in to mind, your final solution will be:
import webbrowser
with open("urls.txt") as f:
for url in f:
webbrowser.open_new_tab(url.strip())
Note the strip that is called in order to ensure that newline characters are removed.
You're not reading the file properly. You're only reading the first line. Also, assuming you were reading the file properly, you're still trying to open list, which is incorrect. You should be trying to open line.
This should work for you:
import webbrowser
with open('file name goes here') as f:
all_urls = f.read().split('\n')
for each_url in all_urls:
webbrowser.open_new_tab(each_url)
My answer is assuming that you have the URLs 1 per line in the text file. If they are separated by spaces, simply change the line to all_urls = f.read().split(' '). If they're separated in another way just change the line to split accordingly.

What is the best way to do a find and replace of multiple queries on multiple files?

I have a file that has over 200 lines in this format:
name old_id new_id
The name is useless for what I'm trying to do currently, but I still want it there because it may become useful for debugging later.
Now I need to go through every file in a folder and find all the instances of old_id and replace them with new_id. The files I'm scanning are code files that could be thousands of lines long. I need to scan every file with each of the 200+ ids that I have, because some may be used in more than one file, and multiple times per file.
What is the best way to go about doing this? So far I've been creating python scripts to figure out the list of old ids and new ids and which ones match up with each other, but I've been doing it very inefficient because I basically scanned the first file line by line and got the current id of the current line, then I would scan the second file line by line until I found a match. Then I did this over again for each line in the first file, which ended up with my reading the second file a lot. I didn't mind doing this inefficiently because they were small files.
Now that I'm searching probably somewhere around 30-50 files that can have thousands of line of code in it, I want it to be a little more efficient. This is just a hobbyist project, so it doesn't need to be super good, I just don't want it to take more than 5 minutes to find and replace everything, then look at the result and see that I made a little mistake and need to do it all over again. Taking a few minutes is fine(although I'm sure with computers nowadays they can do this almost instantly still) but I just don't want it to be ridiculous.
So what's the best way to go about doing this? So far I've been using python but it doesn't need to be a python script. I don't care about elegance in the code or way I do it or anything, I just want an easy way to replace all of my old ids with my new ids using whatever tool is easiest to use or implement.
Examples:
Here is a line from the list of ids. The first part is the name and can be ignored, the second part is the old id, and the third part is the new id that needs to replace the old id.
unlock_music_play_grid_thumb_01 0x108043c 0x10804f0
Here is an example line in one of the files to be replaced:
const v1, 0x108043c
I need to be able to replace that id with the new id so it looks like this:
const v1, 0x10804f0
Use something like multiwordReplace (I've edited it for your situation) with mmap.
import os
import os.path
import re
from mmap import mmap
from contextlib import closing
id_filename = 'path/to/id/file'
directory_name = 'directory/to/replace/in'
# read the ids into a dictionary mapping old to new
with open(id_filename) as id_file:
ids = dict(line.split()[1:] for line in id_file)
# compile a regex to do the replacement
id_regex = re.compile('|'.join(map(re.escape, ids)))
def translate(match):
return ids[match.group(0)]
def multiwordReplace(text):
return id_regex.sub(translate, text)
for code_filename in os.listdir(directory_name):
with open(os.path.join(directory, code_filename), 'r+') as code_file:
with closing(mmap(code_file.fileno(), 0)) as code_map:
new_file = multiword_replace(code_map)
with open(os.path.join(directory, code_filename), 'w') as code_file:
code_file.write(new_file)

Categories