I am building a small tool in Python. The function of the tool is the following:
Open a master data file (CSV format)
Open a log file (CSV format)
Ask the user to select the index of the field that will be compared in both files
Start comparing record by record
When the field is not found in the current record, look forward in the log file until the field can be found, keeping the current position in memory so the comparison can resume from there
Once the field is found, cut the whole record out of that line and place it in the right position
Here is an example:
Data file
"1","1234","abc"
"2","5678","def"
"3","9012","ghi"
log file
"1","1234","abc"
"3","9012","ghi"
"2","5678","def"
final log file:
"1","1234","abc"
"2","5678","def"
"3","9012","ghi"
I was looking at the csv lib in Python and the sqlite3 lib, but nothing there seems to actually swap records inside a file, so I was thinking that maybe I should just create a new file with all the records in order.
What could be done in this regard? Is there a library or a command that can move records in an existing file?
I would prefer to modify the existing file instead of creating a new one, but if that's not possible I would just move to creating a new one.
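For what it's worth, there is no standard-library call that swaps CSV records in place; reading everything, then rewriting the file, is the usual approach. Here is a minimal sketch of that idea, assuming the file names from your example, that the user-selected key field is column index 1, and that every key in the data file eventually appears in the log file (all of these are assumptions on my part):

import csv

KEY_COL = 1  # index of the user-selected comparison field (assumption)

# read both files fully into memory first
with open('data.csv', newline='') as f:
    data_rows = list(csv.reader(f))

with open('log.csv', newline='') as f:
    log_rows = {row[KEY_COL]: row for row in csv.reader(f)}

# rewrite the log file with its records in the data file's order
with open('log.csv', 'w', newline='') as f:
    writer = csv.writer(f, quoting=csv.QUOTE_ALL)
    for row in data_rows:
        writer.writerow(log_rows[row[KEY_COL]])

Because the whole log is held in memory before the file is reopened for writing, this effectively edits the existing file; for very large logs you would write to a temporary file and os.replace() it over the original instead.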
In addition to that, the code I was planning to use to verify the files was this:
import csv

reader1 = csv.reader(open('data.csv', newline=''), delimiter=',', quotechar='"')
row1 = next(reader1)
reader2 = csv.reader(open('log.csv', newline=''), delimiter=',', quotechar='"')
row2 = next(reader2)
if (row1[0] == row2[0]) and (row1[2:] == row2[2:]):
    pass  # here it moves to the next record
else:
    pass  # here it would run a function that replaces the field
Please note that this piece of code was found on this page:
Python: Comparing specific columns in two csv files
(I don't want to take away the glory from another coder.)
I just like it for its simplicity.
Thanks to all for your attention.
Regards
Danilo
Update: Sorry, it seems my question wasn't asked properly. I am analyzing a transportation network consisting of more than 5000 links, with all the data in one big CSV file. I have several JSON files, each consisting of a subset of this network. I am trying to loop through all the JSON files individually (i.e. not concatenating them or anything), read each JSON file, extract the corresponding information from the CSV file, perform a calculation, and save the result along with the name of the file in a new dataframe, one row per JSON file.
This is the code I wrote, but I am not sure it's efficient enough.
import glob
import json
import os

import pandas as pd

name = []
percent_of_truck = []
path_to_json = r'\\directory'  # directory holding the JSON files

z = glob.glob(os.path.join(path_to_json, '*.json'))
for i in z:
    with open(i, 'r') as myfile:
        l = json.load(myfile)
    name.append(i)
    d_2019 = final.loc[final['LINK_ID'].isin(l)]  # retrieve data from the main CSV file ('final' is loaded elsewhere)
    avg_m = (d_2019['AADTT16']/d_2019['AADT16']*d_2019['Length']).sum()/d_2019['Length'].sum()  # calculation
    percent_of_truck.append(avg_m)

f = pd.DataFrame()
f['Name'] = name
f['% of truck'] = percent_of_truck
I'm assuming here you just want a dictionary of all the JSON. If so, use the json library (import json); this code may be of use:
import json

def importSomeJSONFile(f):
    # parse the whole file straight into Python types (dict/list)
    with open(f) as fh:
        return json.load(fh)

# make sure the file exists in the same directory
example = importSomeJSONFile("example.json")
print(example)

# access a value within it, replacing "key" with what you want, like "name"
print(example["key"])
Since you haven't added any schema or other specific requirements, you can follow this approach to solve your problem in any language you prefer (a Python sketch follows the list):
Get the directory of the JSON files that need to be read
Get a list of all files present in that directory
For each file name returned in step 2:
Read the file
Parse the JSON from the string
Perform the required calculation
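A compact sketch of those steps in Python, reusing the column names from the question; network.csv and the directory path are placeholders of mine:

import glob
import json
import os

import pandas as pd

path_to_json = '/path/to/json/files'  # placeholder directory
final = pd.read_csv('network.csv')    # placeholder name for the big CSV

records = []
for json_path in glob.glob(os.path.join(path_to_json, '*.json')):
    with open(json_path) as f:
        link_ids = json.load(f)  # steps 4-5: read the file, parse the JSON
    subset = final.loc[final['LINK_ID'].isin(link_ids)]
    # step 6: length-weighted truck share, as in the question
    avg = (subset['AADTT16'] / subset['AADT16'] * subset['Length']).sum() / subset['Length'].sum()
    records.append({'Name': os.path.basename(json_path), '% of truck': avg})

result = pd.DataFrame(records)

Collecting rows in a plain list and building the DataFrame once at the end keeps the loop cheap; the question's approach of appending to two lists is essentially the same pattern.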
I have written a Python script that continuously reads a CSV file outputted and updated by a sensor and live-plots some data with matplotlib.
Every time I start recording data from the sensor, it creates a new file like:
data-2020_02_27_14_42_29.csv
So every time I have to update my script to point at the correct CSV file.
How can I automate this?
Is there a way to recognize the creation of a new file in a specific directory and take its name?
import csv

with open('/home/matteo/Documents/PlatformIO/Projects/200213-123258-INS/data/data-2020_02_27_14_42_29.csv', 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file, delimiter=',')
Thanks
You can get a list of the files that were there at the start and at the end of your script, then find the new item by comparing the two lists.
import csv
import os

folder = '/home/matteo/Documents/PlatformIO/Projects/200213-123258-INS/data'

def read(filename):
    start_list = os.listdir(folder)
    with open(os.path.join(folder, filename), 'r') as csv_file:
        csv_reader = csv.DictReader(csv_file, delimiter=',')
        # any other code that happens after reading the csv....
    end_list = os.listdir(folder)
    # a new file appears in end_list but not in start_list
    new_files = list(set(end_list) - set(start_list))
    # check that a new file was found (the list is empty if nothing is new);
    # if it is empty, re-run the function on the existing file
    if new_files:
        new_file = new_files[0]
    else:
        new_file = filename
    # recursion to call the function on the new file we found
    read(new_file)
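An alternative that avoids the recursion entirely (my suggestion, not part of the answer above): whenever you need a file, simply take the most recently modified CSV in the folder:

import csv
import glob
import os

folder = '/home/matteo/Documents/PlatformIO/Projects/200213-123258-INS/data'

# newest CSV in the folder by modification time
latest = max(glob.glob(os.path.join(folder, '*.csv')), key=os.path.getmtime)

with open(latest, 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file, delimiter=',')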
I've just managed to run my Python code on Ubuntu and all seems to be going well, but my script writes out a .csv file every hour and I can't seem to find the .csv file.
Having the .csv file is important as I need to do research on the data. I am also using FileZilla; I would have thought the .csv would have shown up in there.
import csv
from datetime import datetime

collectionTime = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
mylist = [d['Spaces'] for d in data]  # 'data' is collected earlier in the script
mylist.append(collectionTime)
print(mylist)
with open("CarparkData.csv", "a", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(mylist)
In short, your code is outputting to wherever the file you're opening in this line resolves to:
with open("CarparkData.csv","a",newline="") as f:
A bare filename like this is resolved against the process's current working directory, not necessarily the folder the script lives in. You can change the filename to wherever you'd like the file to be read/written from/to, for example data/CarparkData.csv if you had a folder named data/ within your application dedicated to holding data files.
As written in your code, writer.writerow writes each row through the file object created by open() directly into CarparkData.csv itself.
Finally, the way your code is structured, it won't be creating a new .csv every hour, because it uses a static filename: if a file with this name did not exist at the time of opening, one is created, and if it did, new lines are appended to the existing file.
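A quick way to see where the file actually lands, plus a timestamped variant in case a separate file per hour is what you are after (that last part is my assumption):

import os
from datetime import datetime

# a bare filename resolves against the current working directory
print(os.path.abspath("CarparkData.csv"))

# one file per hour instead of a single growing file
filename = datetime.now().strftime("CarparkData-%Y-%m-%d_%H.csv")
with open(filename, "a", newline="") as f:
    pass  # csv.writer(f) and writer.writerow(...) as in the question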
I am using the application Zim Wiki (cross-platform, FOSS) to keep a personal wiki with lots of data coming from tables, copying and pasting, my own writing, and downloading and attaching .png and .html files for viewing and offline access.
The data that is not written or pasted can be stored in tables in the form of names, URL addresses, and the names and locations of images and other attachments.
To insert into Zim, I can use the front end with its WYSIWYG editor, or, to make the skeleton of each entry, I could modify a template text entry. If I do this, nothing matters except the location and identity of each character in each line.
By supplying the source text shown in the first screenshot (DandelionDemo source text), I can make this entry for Dandelion (DandelionDemo Wiki).
So, I can generate and name the wiki entry in Zim, which creates the .txt file for me and inserts the time stamp and title. The template for this type of entry, without the pasted fields, would be:
**Full Scientific Name: **[[|]]**[syn]**
**Common Name(s): **
===== =====
**USDA PLANTS entry for Code:** [[https://plants.usda.gov/core/profile?symbol=|]] **- CalPhotos available images for:** [[https://calphotos.berkeley.edu/cgi/img_query?query_src=photos_index&where-taxon=|]]
**---**
**From - Wikipedia **[[wp?]] **- **[[/Docs/Plants/]]
{{/Docs/Plants/?height=364}}{{/Docs/Plants/?height=364}}
**()** //,// [[|(source)]]
**()** //// [[|(source)]]
**Wikipedia Intro: **////
---
So in the first line with content, after the 31st character (which is a tab), you paste "http...{etc}". Then the procedure would insert "Taraxacum officinale...{etc}" after the "|" (what was the 32nd character), and so on. This data could come from "table1" and "table2", or from combining the tables into an un-normalized "table1table2", where each row could be converted to text or a .csv or... I don't know, what do you think?
Is there a way in LibreOffice to do this? I have used LibreOffice Base to generate a "book" form that populated fields, but that was much less complex data, without wiki linking and drag-and-drop pasting of images and attachments. So maybe the answer is to go simpler? The tables are not currently part of a registered database, but I could do that once I have decided on the method of doing this.
I am ultimately looking for a "way", hopefully an "easy" way, though that may not be in LibreOffice. If not, I know that I could do this in Python, but I haven't learned much about Python yet. If it involves a language, that is the first and only one I don't already know that I will invest in learning for this project. If you know a "way" to do this in Python, let me know, and my first project and the framing of my study process will be learning the methods you share.
If you know of some other Linux GUI, I am definitely interested, but only in actively developed free and open source projects that involve minimal or no compiling. I know the basics of SQL and DBMSs. In the past I have gotten Microsoft SQL Server lite to work, but not DBeaver, yet. If you know of a CLI way, also let me know, but I am a self-taught, outdoors-loving Linux newb who mostly knows how to tweak little settings in programs, use moderately easy programs like ImageMagick, and build the occasional LAMP stack for Drupal and WordPress (no Bash etc...).
Thank you very much!
OK, since you want to learn some Python, let me propose a way to do this. First you need a template engine, like jinja2 (there are many others), a data source, in our example a .csv file (it could be something else, like a DB), and finally some code that reads the CSV line by line and mixes the content with the template.
Sample CSV file:
1;sample
2;dandelion
3;just for fun
Sample template:
**Full Scientific Name: **[[|]]**[syn]**
**Common Name(s): *{{name}}*
===== =====
USDA PLANTS entry for Code: *{{symbol}}*
---
Sample code:
#!/usr/bin/env python
#
# Using the file system loader.
#
# We assume we have a file in the same dir as this one called
# test_template.zim
#
from jinja2 import Environment, FileSystemLoader
import os
import csv

# Capture our current directory
THIS_DIR = os.path.dirname(os.path.abspath(__file__))

def print_zim_doc():
    # Create the jinja2 environment.
    # Notice the use of trim_blocks, which greatly helps control whitespace.
    j2_env = Environment(loader=FileSystemLoader(THIS_DIR),
                         trim_blocks=True)
    template = j2_env.get_template('test_template.zim')
    with open('example.csv') as csv_file:
        reader = csv.reader(csv_file, delimiter=';')
        for row in reader:
            result = template.render(
                symbol=row[0],
                name=row[1]
            )
            # save the rendered result, named after the first column
            with open(row[0] + ".txt", "wt") as fh:
                fh.write(result)

if __name__ == '__main__':
    print_zim_doc()
The code is pretty simple: it reads the template located in the same folder as the Python code, opens the csv file (also located in the same place), iterates over each line of the csv, renders the template using the values of the csv columns to fill the {{var_name}} placeholders, and finally saves the rendered result in a new file named after one of the csv column values. This sample will generate 3 files (1.txt, 2.txt, 3.txt). From here you can extend and improve the code to get your desired results.
I am currently programming a game that requires reading and writing lines in a text file. I was wondering if there is a way to read a specific line in the text file (i.e. the first line in the text file). Also, is there a way to write a line in a specific location (i.e. change the first line in the file, write a couple of other lines and then change the first line again)? I know that we can read lines sequentially by calling:
f.readline()
Edit: Based on responses, apparently there is no way to read specific lines if they are different lengths. I am only working on a small part of a large group project and to change the way I'm storing data would mean a lot of work.
But is there a method to change specifically the first line of the file? I know calling:
f.write('text')
Writes something into the file, but it writes the line at the end of the file instead of the beginning. Is there a way for me to specifically rewrite the text at the beginning?
If all your lines are guaranteed to be the same length, then you can use f.seek(N) to position the file pointer at the N'th byte (where N is LINESIZE*line_number) and then f.read(LINESIZE). Otherwise, I'm not aware of any way to do it in an ordinary ASCII file (which I think is what you're asking about).
Of course, you could store some sort of record information in the header of the file and read that first to let you know where to seek to in your file -- but at that point you're better off using some external library that has already done all that work for you.
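A tiny sketch of the fixed-length idea (the LINESIZE value is an assumption; every line, newline included, must really be that many bytes):

LINESIZE = 32  # bytes per line, newline included -- an assumption

def read_record(path, line_number):
    # jump straight to the wanted record instead of scanning the file
    with open(path, 'rb') as f:
        f.seek(LINESIZE * line_number)
        return f.read(LINESIZE).decode('ascii')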
Unless your text file is really big, you can always store each line in a list:
with open('textfile', 'r') as f:
    lines = [L[:-1] for L in f.readlines()]
(note I've stripped off the newline so you don't have to remember to keep it around)
Then you can manipulate the list by adding entries, removing entries, changing entries, etc.
At the end of the day, you can write the list back to your text file:
with open('textfile', 'w') as f:
    f.write('\n'.join(lines))
Here's a little test which works for me on OS-X to replace only the first line.
test.dat
this line has n characters
this line also has n characters
test.py
# First, I get the length of the first line -- if you already know it, skip this block
f = open('test.dat', 'r')
l = f.readline()
linelen = len(l) - 1
f.close()

# apparently mode='a+' doesn't work on all systems :( so I use 'r+' instead
f = open('test.dat', 'r+')
f.seek(0)
f.write('a' * linelen + '\n')  # 'a'*linelen = 'aaaaaaaaa...'
f.close()
These days, jumping within files in an optimized fashion is a task for high performance applications that manage huge files.
Are you sure that your software project requires reading/writing random places in a file during runtime? I think you should consider changing the whole approach:
If the data is small, you can keep / modify / generate the data at runtime in memory within appropriate container formats (list or dict, for instance) and then write it entirely at once (on change, or only when your program exits). You could consider looking at simple databases. Also, there are nice data exchange formats like JSON, which would be the ideal format in case your data is stored in a dictionary at runtime.
An example, to make the concept more clear. Consider you already have data written to gamedata.dat:
[{"playtime": 25, "score": 13, "name": "rudolf"}, {"playtime": 300, "score": 1, "name": "peter"}]
This is utf-8-encoded, JSON-formatted data. Read the file during the runtime of your Python game (opening it with an explicit encoding removes the need to decode by hand):
with open("gamedata.dat", encoding="utf-8") as f:
    s = f.read()
Convert the data to Python types (this needs an import json at the top of your module):
gamedata = json.loads(s)
Modify the data (add a new user):
user = {"name": "john", "score": 1337, "playtime": 1}
gamedata.append(user)
John really is a 1337 gamer. However, at this point, you also could have deleted a user, changed the score of Rudolf or changed the name of Peter, ... In any case, after the modification, you can simply write the new data back to disk:
with open("gamedata.dat", "w") as f:
f.write(json.dumps(gamedata).encode("utf-8"))
The point is that you manage (create/modify/remove) data during runtime within appropriate container types. When writing data to disk, you write the entire data set in order to save the current state of the game.
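For reference, the same round trip as one self-contained snippet (filename and sample user taken from above):

import json

# load the existing state
with open("gamedata.dat", encoding="utf-8") as f:
    gamedata = json.load(f)

# modify it in memory
gamedata.append({"name": "john", "score": 1337, "playtime": 1})

# save the current state of the game by writing the whole set at once
with open("gamedata.dat", "w", encoding="utf-8") as f:
    json.dump(gamedata, f)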