I have a data file (it has no extension / file type) that I think is a combination of two kinds of strings: one kind is plain text and the other has been compressed with the LZH (LHA) algorithm. When I open the file with Notepad++ it looks like this:
As the picture shows, some of the data is readable but the rest is compressed. Is there any software, or source code in Python, C++, PHP, or any other language, that can read this file in chunks and decompress them?
I googled, but everything I found decompresses files that are compressed entirely with LZH: they first check the file header and raise an error if it is not an LZH file.
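One place to start, short of a full decompressor, is scanning the raw bytes for LHA entry headers yourself. A minimal sketch, assuming the compressed chunks are LHA entries carrying the usual five-byte method ID ('-lh0-' through '-lh7-', or '-lhd-') two bytes into each header; 'datafile' is a placeholder name, and any hit still needs validation before being handed to an LZH decompressor:

import re

# Method IDs used by LHA level 0/1/2 headers, e.g. b'-lh5-'.
METHOD_ID = re.compile(rb'-lh[0-7d]-')

with open('datafile', 'rb') as f:   # 'datafile' is a placeholder name
    data = f.read()

for match in METHOD_ID.finditer(data):
    # The method ID sits 2 bytes into the header (after size and checksum).
    header_start = match.start() - 2
    print('possible LZH header at offset', header_start)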
I have a list of tuples [(x,y,z),...,] and I want to store this list of tuples in a file. For this I chose a .txt file. I write to the file in the mode "wb" and then I close it. Later, I want to open the file in mode "rb" and convert this byte object back to a list of tuples. How would I go about this without regular expression nonsense? Is there a file type that would allow me to store this data and read it easily that I've overlooked?
The .txt extension is typically not used for binary data, as you seem to intend.
Since your data structure is not known on a byte level, it's not that simple.
If you do know your data (types and length), you could "encode" it as a binary structure with https://docs.python.org/3.4/library/struct.html and write that to a (binary) file.
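For example, if each tuple held three floats, a minimal sketch with struct might look like this (the format string and file name are assumptions, not a fixed recipe):

import struct

triples = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]

with open('triples.bin', 'wb') as f:
    f.write(struct.pack('<I', len(triples)))      # record count header
    for x, y, z in triples:
        f.write(struct.pack('<3d', x, y, z))      # three little-endian doubles

with open('triples.bin', 'rb') as f:
    (count,) = struct.unpack('<I', f.read(4))
    restored = [struct.unpack('<3d', f.read(24)) for _ in range(count)]

assert restored == triples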
Otherwise, there are many solutions to the problem of writing (structured) data to and reading data from files (that's why there are so many file formats):
Standard library:
https://docs.python.org/3/library/fileformats.html
https://docs.python.org/3/library/persistence.html
https://docs.python.org/3/library/xml.html
https://docs.python.org/3/library/json.html
3rd party:
https://pypi.python.org/pypi/PyYAML
and other modules on https://pypi.python.org/
Related Q&A on Stackoverflow:
How to save data with Python?
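For a plain list of tuples, the pickle module (covered by the persistence docs linked above) is probably the shortest route. A minimal sketch, with an illustrative file name:

import pickle

data = [(1, 2, 3), (4, 5, 6)]

with open('tuples.pickle', 'wb') as f:
    pickle.dump(data, f)

with open('tuples.pickle', 'rb') as f:
    restored = pickle.load(f)

assert restored == data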
I am dealing with a somewhat large binary file (717M). This binary file contains a set (unknown number!) of complete zip files.
I would like to extract all of those zip files (no need to explicitly decompress them). I am able to find the offset (start point) of each chunk thanks to the magic number ('PK'), but I fail to find a way to compute the length of each chunk (i.e. to carve those zip files out of the large binary file).
Reading some documentation (http://forensicswiki.org/wiki/ZIP) gives me the impression it is easy to parse a zip file, since it contains the compressed size of each compressed file.
Is there a way for me to do that in C or Python without reinventing the wheel?
A zip entry is permitted to not contain the compressed size in the local header. There is a flag bit that instead places a descriptor with the compressed size, uncompressed size, and CRC after the compressed data.
It would be more reliable to search for end-of-central-directory headers, use that to find the central directories, and use that to find the local headers and entries. This will require attention to detail, very carefully reading the PKWare appnote that describes the zip format. You will need to handle the Zip64 format as well, which has additional headers and fields.
It is possible for a zip entry to be stored, i.e. copied verbatim into that location in the zip file, and it is possible for that entry to itself be a zip file. So make sure that you handle the case of embedded zip files, extracting only the outermost ones.
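A minimal sketch of that approach, assuming plain (non-Zip64) archives and a file small enough to hold in memory; the field offsets come from the end-of-central-directory record layout in the PKWare appnote:

import struct

EOCD_SIG = b'PK\x05\x06'

def find_zip_spans(data):
    """Yield (start, end) byte offsets of zip files embedded in data."""
    pos = 0
    while True:
        eocd = data.find(EOCD_SIG, pos)
        if eocd == -1:
            return
        # The EOCD fixed part is 22 bytes; the fields at offsets 12, 16 and
        # 20 are the central directory size, its offset from the start of
        # the archive, and the comment length.
        cd_size, cd_offset, comment_len = struct.unpack(
            '<IIH', data[eocd + 12:eocd + 22])
        start = eocd - cd_size - cd_offset   # start of this zip file
        end = eocd + 22 + comment_len        # end, including the comment
        yield start, end
        pos = eocd + 4

Each (start, end) slice can then be written out as its own .zip file; as noted above, Zip64 archives and stored zip-inside-zip entries need extra handling.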
There are some standard ways to handle zip files in Python, for example, but as far as I know (not that I'm an expert) you first need to supply the actual file somehow. I suggest looking at the zip file format specification.
You should be able to find the other fields you need from their positions relative to the magic number. The magic number is the local file header signature (0x04034b50); the CRC-32 starts 14 bytes after it, the compressed size 4 bytes after that, and the file name 12 bytes further on, after the two length fields:
local file header signature 4 bytes (0x04034b50)
version needed to extract 2 bytes
general purpose bit flag 2 bytes
compression method 2 bytes
last mod file time 2 bytes
last mod file date 2 bytes
crc-32 4 bytes
compressed size 4 bytes
uncompressed size 4 bytes
file name length 2 bytes
extra field length 2 bytes
file name (variable size)
extra field (variable size)
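Assuming the sizes really are present in the local header (see the caveat in the other answer about the data descriptor flag), a sketch of unpacking that layout with struct:

import struct

LOCAL_SIG = b'PK\x03\x04'

def parse_local_header(data, offset):
    """Return (file_name, compressed_size, data_start) for the entry at offset."""
    assert data[offset:offset + 4] == LOCAL_SIG
    crc, csize, usize, name_len, extra_len = struct.unpack(
        '<IIIHH', data[offset + 14:offset + 30])
    # Zip file names default to CP437 per the appnote.
    name = data[offset + 30:offset + 30 + name_len].decode('cp437', 'replace')
    data_start = offset + 30 + name_len + extra_len   # compressed data begins here
    return name, csize, data_start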
Hope that helps a little bit at least :)
I am trying to read the text data out of an mp3 file, and then save it to a different mp3 file in Python. I DON'T simply want to move the file, as I will be trying to modify its contents in the future.
Here is my code:
encoding_1 = "latin-1"
with open(path.get(), "r", encoding=encoding_1) as f:
    file = f.read()
...
...
with open("D:\\test\\music_2.mp3", "w+", encoding=encoding_1) as f:
    f.write(file)
I already tried different combinations of .encode() and .decode() with latin-1 and utf-8, but that didn't work either.
Here are some notes on my problem:
The file I save has about 32,000 more symbols than the original one for some reason, even though it should have the exact same length
I don't get an error message, but the mp3 file is just noise, not music
If I don't use encoding="latin-1", there is an error message, usually already while reading the file
In one of these error messages, there was a problem with the letter "ï"
mp3 files are not text files. You need to open them as binary files, so that no characters get translated. You also will not need to worry about encodings with binary files, as you are dealing with binary data, not text. To open a file as binary, add a "b" to the file mode in open(file, mode):
with open(path.get(),"rb") as f:
You can then parse the file and get to the text data in the binary mp3 file.
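A minimal sketch of the binary round trip, using the paths from the question:

with open(path.get(), 'rb') as f:       # read raw bytes, no decoding
    data = f.read()

# ... inspect or modify `data` (a bytes object) here ...

with open('D:\\test\\music_2.mp3', 'wb') as f:
    f.write(data)                       # write the same bytes back out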
I'm relatively new to programming and using Python, and I couldn't find anything on here that quite answered my question. Basically what I'm looking to do is extract a certain section of about 150 different .txt files and collect each of these pieces into a single .txt file.
Each of the .txt files contains DNA sequence alignment data, and each file basically reads out several dozen different possible sequences. I'm only interested in one of the sequences in each file, and I want to be able to use a script to excise that sequence from all of the files and combine them into a single file that I can then feed into a program that translates the sequences into protein code. Really what I'm trying to avoid is having to go one by one through each of the 150 files and copy/paste the desired sequence into the software.
Does anyone have any idea how I might do this? Thanks!
Edit: I tried to post an image of one of the text files, but apparently I don't have enough "reputation."
Edit2: Hi y'all, I'm sorry I didn't get back to this sooner. I've uploaded the image, here's a link to the upload: http://imgur.com/k3zBTu8
I'm assuming you have 150 FASTA files, and in each file there is a sequence ID whose sequence you want. You can use the Biopython module to do this. Put all 150 files in a folder such as "C:\seq_folder" (the folder should not contain any other files, and the txt files should not be open):
import os
from Bio import SeqIO

os.chdir('C:\\seq_folder')  # change the working directory so Python finds the txt files
seq_id = 'x'                # replace 'x' with the ID of the sequence you want
txt_list = os.listdir('C:\\seq_folder')
result = open('result.fa', 'w')
for item in txt_list:
    with open(item, 'r') as file:
        for record in SeqIO.parse(file, 'fasta'):
            if record.id == seq_id:
                result.write('>' + record.id + '\n')
                result.write(str(record.seq) + '\n')
result.close()
This code will collect the sequence with your desired ID from each of the files and write them all to 'result.fa'. You can also translate them into protein using the Biopython module.
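For instance, a short sketch of the translation step, assuming the collected sequences are coding DNA in frame:

from Bio import SeqIO

for record in SeqIO.parse('result.fa', 'fasta'):
    print('>' + record.id)
    print(record.seq.translate())   # DNA -> protein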
I have a csv file and a text file. Is it possible to compare the values in both files? Or should I have the values of both in a csv file to make it easier?
Is it possible to compare the values in both files?

Yes. You can open them both in binary mode and compare the bytes, or in text mode and compare the characters. Neither will be particularly useful, though.
Or should I have the values of both in a csv file to make it easier?
Convert them both to list-of-lists format. For the CSV file, use a csv.reader. For the text file, use [line.split('\t') for line in open('filename.txt')] or whatever the equivalent is for your file format.
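A minimal sketch of that comparison, assuming the text file is tab-separated; the file names are placeholders:

import csv

with open('data.csv', newline='') as f:
    csv_rows = list(csv.reader(f))

with open('data.txt') as f:
    txt_rows = [line.rstrip('\n').split('\t') for line in f]

print(csv_rows == txt_rows)   # True when both hold the same list-of-lists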
Yes, you can compare values from any N sources. You have to extract the values from each and then compare them. If you make your question more specific (the format of the text file for instance), we might be able to help you more.
csv itself is of course text as well. And that's basically the problem when "comparing": there's no "text file standard". Even csv isn't that strictly defined, and there's no normal form. For example, should a header be included? Is column ordering relevant?
How are fields separated in the textfile? Fixed width records? Newlines? Special markers (like csv)?
If you know the format of the text file, you can read/parse it and compare the result with the csv file (which you will also need to read/parse, of course), or generate csv from the text file and compare that using diff.