I have data inside a directory as follows:
IU.WRT.00.MTR.1999.081.081015.txt
IU.WRT.00.MTS.2007.229.022240.txt
IU.WRT.00.MTR.2007.229.022240.txt
IU.WRT.00.MTT.1999.081.081015.txt
IU.WRT.00.MTS.1999.081.081015.txt
IU.WRT.00.MTT.2007.229.022240.txt
and I want to read the data group-wise.
First, I want to read the 3 files with a similar pattern (differing only by R, S, T):
IU.WRT.00.MTR.1999.081.081015.txt
IU.WRT.00.MTS.1999.081.081015.txt
IU.WRT.00.MTT.1999.081.081015.txt
and apply some operations on them. Then I want to read
IU.WRT.00.MTT.2007.229.022240.txt
IU.WRT.00.MTS.2007.229.022240.txt
IU.WRT.00.MTR.2007.229.022240.txt
and apply the same operations on them. In the same way, I want to continue the process for millions of data sets.
I tried the example script:
import glob
from collections import defaultdict

def groupfiles(pattern):
    files = glob.glob(pattern)
    filedict = defaultdict(list)
    for file in files:
        parts = file.split(".")
        # group on the year, day and time fields of the filename
        filedict[".".join([parts[4], parts[5], parts[6]])].append(file)
    for filegroup in filedict.values():
        yield filegroup

for relatedfiles in groupfiles('*.txt'):
    print(relatedfiles)
    for filename in relatedfiles:
        print(filename)
However, it reads the files one by one, but I need to read 3 files at a time. I hope experts can help me. Thanks in advance.
Use proper glob patterns to get the files:
files_1999 = glob.glob('IU.WRT.00.MT[RST].1999.081.081015.txt')
To generalize:
import glob

years = set(file.split('.')[4] for file in glob.glob('*.txt'))
file_group = {}
for year in years:
    pattern = f'IU.WRT.00.MT[RST].{year}*.txt'
    file_group[year] = glob.glob(pattern)
Output:
{
"2007":[
"IU.WRT.00.MTS.2007.229.022240.txt",
"IU.WRT.00.MTR.2007.229.022240.txt",
"IU.WRT.00.MTT.2007.229.022240.txt"
],
"1999":[
"IU.WRT.00.MTS.1999.081.081015.txt",
"IU.WRT.00.MTR.1999.081.081015.txt",
"IU.WRT.00.MTT.1999.081.081015.txt"
]
}
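Note that this groups all files of a year together. If one year can contain several events, a minimal sketch (assuming the year.day.time fields identify an event, and with process_group as a placeholder for whatever operation you apply) that yields one R/S/T trio per event:

import glob
from collections import defaultdict

groups = defaultdict(list)
for name in glob.glob('IU.WRT.00.MT[RST].*.txt'):
    parts = name.split('.')
    groups['.'.join(parts[4:7])].append(name)  # key: year.day.time

for event, trio in sorted(groups.items()):
    print(event, sorted(trio))  # the three R/S/T files of one event
    # process_group(trio)      # placeholder: apply your operation here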
I am currently trying to sum two .txt files, each containing over 35 million values, and put the result in a third file.
File 1:
2694.28
2694.62
2694.84
2695.17
File 2:
1.483429484776452
2.2403221757269196
1.101004844694236
1.6119626937837102
File 3:
2695.76343
2696.86032
2695.941
2696.78196
Any idea how to do that with Python?
You can use NumPy for speed. It will be much faster than pure Python, since NumPy uses C/C++ for a lot of its operations.
import numpy
import os

# directory containing this script
path = os.path.dirname(os.path.realpath(__file__))
file_name_1 = path + '/values_1.txt'
file_name_2 = path + '/values_2.txt'

a = numpy.loadtxt(file_name_1, dtype=float)  # loads all values into memory
b = numpy.loadtxt(file_name_2, dtype=float)
c = a + b  # element-wise sum

precision = 10
numpy.savetxt(path + '/sum.txt', c, fmt=f'%-.{precision}f')
This assumes your .txt files are located in the same directory as your Python script.
You can use pandas.read_csv to read, sum, and then write your files in chunks.
Presumably all 35 million records do not fit in memory, so you need to read the files chunk by chunk. This way you load only one chunk at a time into memory (two, actually: one for file1 and one for file2), do the sum, and write the result to file3 one chunk at a time in append mode.
In this dummy example I set chunksize=2, because the test inputs above are only 4 values long. The right size depends on the machine you are working on; do some tests and see what the best chunk size is for your problem (50k, 100k, 500k, 1M, etc.).
import pandas as pd

chunksize = 2
with pd.read_csv("file1.txt", chunksize=chunksize, header=None) as reader1, \
     pd.read_csv("file2.txt", chunksize=chunksize, header=None) as reader2:
    for chunk1, chunk2 in zip(reader1, reader2):
        # append each summed chunk to the output file
        (chunk1 + chunk2).to_csv("file3.txt", index=False, header=False, mode='a')
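One caveat: because the chunks are appended, a leftover file3.txt from a previous run would keep growing, so you may want to delete it before the loop, for example:

import os

# remove a leftover output file so repeated runs don't append to stale data
if os.path.exists("file3.txt"):
    os.remove("file3.txt")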
I have a list of phone numbers in column A of an Excel sheet. I need Python code to extract all the numbers to a .txt file on a single line, separated by commas.
list of phone numbers
example: 560000000,560000001,560000003,560000004,560000005,560000006,560000007,560000008,560000009,560000010,560000011,560000012
from openpyxl import load_workbook

book = load_workbook('1.xlsx')
sheet = book.active
first_column = sheet['A']

with open('out.txt', 'w') as outfile:
    # join the cell values with commas (avoids a trailing comma)
    outfile.write(','.join(str(cell.value) for cell in first_column))
You can use openpyxl, which is pretty intuitive:
load_workbook opens the Excel file specified in the path;
then we select the sheet;
then select the column (in my case it is A);
finally, open the text file and write the column's values into it, joined by commas.
Try this:
import pandas as pd

df = pd.read_excel('test.xlsx', header=None)
# print the first column as one comma-separated line
print(','.join(map(str, df[0].to_list())))
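If you need the result in a .txt file rather than printed, a minimal variant of the same idea (assuming the output file is called out.txt, as in the other answer):

import pandas as pd

df = pd.read_excel('test.xlsx', header=None)
# write the first column as one comma-separated line
with open('out.txt', 'w') as outfile:
    outfile.write(','.join(map(str, df[0].to_list())))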
I have data for 50 people in Excel files placed in the same folder. For each person the data is present in five different files, as shown below:
Example:
Person1_a.xls, Person1_b.xls, Person1_c.xls, Person1_d.xls, Person1_e.xls.
Each Excel file has multiple sheets, each with two columns. I need to create a file Person1.xls which combines the second column of all these files. The same process should be applied for all 50 people.
Any suggestions would be appreciated.
Thank you!
I have created a trial folder that I believe is similar to yours. I added data only for Person1 and Person3.
In the attached picture, the files called Person1 and Person3 are the exported files that include only the 2nd column for each person, so each person now has their own file.
I added a small description of what each line does. Please let me know if something is not clear.
import pandas as pd
import glob

path = r'C:\..\trial'  # use your path where the files are
all_files = glob.glob(path + "/*.xlsx")  # all files with an .xlsx extension in the folder

li = []
for i in range(1, 51):  # numbers from 1 to 50 (for the 50 different people)
    for f in all_files:
        if f'Person{i}_' in f:  # checks that the file belongs to person i
            df = pd.read_excel(f,
                               sheet_name=0,  # import 1st sheet
                               usecols=[1])   # only import column 2
            df['person'] = f.rsplit('\\', 1)[1].split('_')[0]  # name of the person as a column
            li.append(df)  # add it to the list of dataframes

all_person = pd.concat(li, axis=0, ignore_index=True)  # concat all imported dataframes
Then you can export a different Excel file for each person, to the same path:

for i, j in all_person.groupby('person'):
    j.to_excel(f'{path}\\{i}.xlsx', index=False)
I am aware that this is probably not the most efficient way, but it should get you what you need.
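As an alternative sketch (not part of the approach above), you could group the files by the person prefix instead of testing every number against every filename:

import glob
from collections import defaultdict

import pandas as pd

path = r'C:\..\trial'  # use your path where the files are

# group the five files of each person by the prefix before '_'
by_person = defaultdict(list)
for f in glob.glob(path + "/*.xlsx"):
    person = f.rsplit('\\', 1)[1].split('_')[0]  # e.g. 'Person1'
    by_person[person].append(f)

for person, files in by_person.items():
    # second column of every file of this person, concatenated
    frames = [pd.read_excel(f, sheet_name=0, usecols=[1]) for f in sorted(files)]
    pd.concat(frames, ignore_index=True).to_excel(f'{path}\\{person}.xlsx', index=False)

One caveat: the output files (Person1.xlsx etc.) land in the same folder, so a second run would pick them up as input; a name check or a separate output folder avoids that.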
The picture below shows an Excel sheet taken as input, in which all the strings are separated by "," and sit in the first column.
Output: an Excel sheet in which all the strings are split on the comma delimiter and appear in successive columns.
Please help me out!
Thanks in advance.
You could try:
wb = Workbook(your_filename)
ws = wb.add_worksheet()

openfile = open(yourfile, 'rt')
filereader = csv.reader(openfile)
# write each comma-separated value into its own cell
for posr, row in enumerate(filereader):
    for posc, col_info in enumerate(row):
        ws.write(posr, posc, col_info)

openfile.close()
wb.close()
You might need to import the following libs:
from xlsxwriter.workbook import Workbook
import csv
I hope this helps you!
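Since the input in the question is an Excel sheet rather than a plain .csv, here is a variant (a sketch; the filenames are placeholders) that reads the .xlsx directly with openpyxl and splits column A on commas:

from openpyxl import load_workbook
from xlsxwriter.workbook import Workbook

wb_in = load_workbook('input.xlsx')  # placeholder input filename
ws_in = wb_in.active

wb_out = Workbook('output.xlsx')  # placeholder output filename
ws_out = wb_out.add_worksheet()

# split each cell of column A on commas and spread the parts across columns
for posr, cell in enumerate(ws_in['A']):
    if cell.value is None:
        continue
    for posc, part in enumerate(str(cell.value).split(',')):
        ws_out.write(posr, posc, part)

wb_out.close()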
I want to import a csv file into Python and then create a table to display the contents of the imported file.
I also need to do manipulations on the data present in the table.
More table-related functions should be performed as well, like:
1) highlighting a specified column
2) modifying a particular column, such as sorting the data by date or quantity
This is an example if you want to import a csv or txt file with Python. I use NumPy to do that:
#!/usr/bin/env python
import numpy as np

# set the delimiter to match your csv file: comma, blank, ...
data = np.loadtxt('filename', delimiter=',')
print(data)  # print the numpy array
If you want to sort your array, you can use the numpy function np.sort(). See the NumPy documentation for its options.
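For example, a minimal sketch:

import numpy as np

data = np.loadtxt('filename', delimiter=',')
print(np.sort(data))  # np.sort returns a sorted copy; data is unchanged
# data.sort()  # sorts the array in place instead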
Now, try to make something yourself, because it's important to have attempted a script before posting your question ;)