08-06-2006, 01:29 PM
|
#1 (permalink)
|
|
Senior Contributor
Join Date: Mar 2005
Posts: 672
|
[TIP] Easy filesize to human readable
Ever wanted nice, clean file size output?
I've seen many solutions, but they are very bloated for such a simple task. Heck, I've even seen 72 lines of code in an article on CodeProject that I still don't understand: http://www.codeproject.com/cpp/formatsize.asp
The trick is to know your <math.h>.
Wikipedia properly describes which powers are used.
So how do we know which range we are in?
This is simple to understand if you've read the Wikipedia article.
KB = 1024
MB = 1024*1024
GB = 1024*1024*1024
TB.....
PHP Code:
#include <math.h>
#include <stdio.h>

// Bytes, Kilo, Mega, Giga, Tera, Peta, Exa, Zetta and Yotta
static const char* humanreadablebytes[] = {"B","KB","MB","GB","TB","PB","EB","ZB","YB"};

int main(void)
{
    // the filesize to calculate
    unsigned long filesize = 22825;
    // calculate the humanreadable bytes index (0 for an empty file, since log(0) is undefined)
    int i = (filesize > 0) ? (int) floor( log(filesize) / log(1024) ) : 0;
    // output the data
    printf("filesize: %.02f %s\n", filesize/pow(1024, i), humanreadablebytes[i]);
    return 0;
}
output:
filesize: 22.29 KB
log calculates the natural logarithm.
pow calculates x to the power of y.
Basically speaking, "log(x) / log(1024)" is the inverse of pow(1024, y): it tells you which power of 1024 the file size corresponds to.
More information can be read here: http://en.wikipedia.org/wiki/Pow
You see? Only 4 lines of code to calculate a clean human readable output.
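To spell out why the index comes out right, here is a short derivation. The symbols n (file size in bytes) and i (index into the unit array) are just names chosen for this sketch, not something from the original post:

```latex
1024^{i} \le n < 1024^{i+1}
\;\iff\; i \le \log_{1024} n < i + 1
\;\iff\; i = \left\lfloor \frac{\ln n}{\ln 1024} \right\rfloor , \qquad n \ge 1
```

For n = 22825 the quotient is roughly 1.45, so i = 1 and the size is shown as 22825 / 1024, which is about 22.29 KB.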
Want a rougher estimate that moves up to the next unit if the amount is >= 1000?
PHP Code:
// the filesize to calculate
unsigned long filesize = 1062793248;
// calculate the humanreadable bytes index
int i = (int) floor( log(filesize) / log(1024) );
// Get a rougher size to prevent large outputs
if (filesize/pow(1024, i) >= 1000) { ++i; }
// Prevent unknown format
if (i > 8) { i=8; }
// output the data
printf("%.02f %s\n", filesize/pow(1024, i), humanreadablebytes[i]);
output:
0.99 GB
Just my $0.02 to make your life easier.
__________________

UT: Ultra-kill... God like!
|
|
|
08-06-2006, 07:44 PM
|
#2 (permalink)
|
|
Anti-Zealot
Join Date: Feb 2006
Location: Atlanta, GA
Posts: 72
|
Buffer Overflow
Cool idea, but that's far too complicated to be of good use like that.
Take this as constructive criticism.
Your implementation makes error checking more complicated than this function really needs. Basically, you would have to run a proper algorithm in reverse afterwards to ensure that your code still works if your array of formats (KB, MB, etc.) stops early.
In other words, this is a better way to do it in terms of stability (the code is crude; I threw this together and tested it in about 10 minutes):
Code:
const char *mem_types[] = { "B", "KB", "MB", "GB" };
int numTypes = sizeof(mem_types)/sizeof(char*);

// Divides fileSize_Bytes by base until it drops below base or the last
// entry of mem_types is reached; the unit index is stored in *type.
double getHRSize(double fileSize_Bytes, int base, int *type) {
    double HRSize = fileSize_Bytes;
    int baseLevel = 0;
    for (;baseLevel < numTypes-1 && fileSize_Bytes >= base;baseLevel++) {
        fileSize_Bytes /= base;
        HRSize = fileSize_Bytes;
    }
    *type = baseLevel;
    return HRSize;
}
With example usage:
Code:
#include <iostream>

const int kibiByteBase = 1024;  // binary units (1 KB = 1024 B)
const int byteBase = 1000;      // decimal units (1 KB = 1000 B)

void outputHR(double fileSize) {
    double HRSize;
    int type;
    HRSize = getHRSize(fileSize, kibiByteBase, &type);
    std::cout << "FileSize: " << HRSize << " " << mem_types[type] << std::endl;
}
So, with that code, it will not crash regardless of how big the input filesize is. If you used the above memory types (mem_types[]) in your code, you would read outside of your array for anything larger than 1023 GB.
Your algorithm is actually very elegant, you can quickly find out just how many divisions by the base are necessary which will directly correlate to the human readable format. The problem is, you would then have to do the algorithm I just posted in reverse to get it into a format that your types array supports. In other words, the above algorithm supports human readable types indefinitely (or as long as the data types will hold out).
There is also the problem of efficiency. Your algorithm requires two logs and raising to a power. One of the two logs (log(1024)) can theoretically be optimized out by the compiler, but chances are it won't be.
The direct approach (what I posted) will, in general, require only a few iterations (less than 4). It will scale linearly, so at a higher number of iterations, yours would be superior assuming an indefinitely long memory types list. However, it could be modified to use your algorithm to "jump" to the solution, then reverse solve into a solution that fits into the memory types array.
This is overkill, but somewhat important considering program security nowadays. Plus, on many systems, you might not have access to math.h, be disallowed/discouraged from using math.h, lack a fast implementation of math.h, etc.
__________________
If you always think like an expert, you'll always be a beginner. | "A handful of knowledgeable people is more effective than an army of fools" -Writing Secure Code, 2nd Ed.
|
|
|
08-07-2006, 01:14 AM
|
#3 (permalink)
|
|
Senior Contributor
Join Date: Mar 2005
Posts: 672
|
Quote:
|
Originally Posted by AssKoala
So, with that code, it will not crash regardless of how big the input filesize is. If you used the above memory types (mem_types[]) in your code, you would read outside of your array for anything larger than 1023 GB.
|
Yes, I should have posted:
PHP Code:
// the filesize to calculate
double filesize = 22825;
I didn't because Windows mostly uses DWORD for file sizes and DWORD is unsigned long, so Windows will crash on terabytes anyway.
Yours is slick as well; I've used a variant of that many times.
PHP Code:
// stop at index 8 (YB) so the value and the unit stay in sync
while (fileSize_Bytes >= base && baseLevel < 8) {
    fileSize_Bytes /= base;
    ++baseLevel;
}
It does the same thing, and it also copes with arbitrarily large sizes.
I just wanted one using math.h that tries to reduce the amount of code used, and I must say it can't be less than 4 lines
__________________

UT: Ultra-kill... God like!
|
|
|
08-07-2006, 05:08 AM
|
#4 (permalink)
|
|
Anti-Zealot
Join Date: Feb 2006
Location: Atlanta, GA
Posts: 72
|
Quote:
|
Originally Posted by DJMaze
I didn't because Windows mostly uses DWORD for file sizes and DWORD is unsigned long, so Windows will crash on terabytes anyway.
|
Well, Windows uses two DWORDs to represent the file size. Check out the documentation for "GetFileSize()". The maximum file size (assuming a crash at a file that's too large) is therefore 2^64 - 1 bytes, even on 32-bit systems.
Not that that matters, since the file system will determine the maximum size.
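For reference, GetFileSize() returns the low-order DWORD and stores the high-order DWORD through its second parameter. Joining the two halves is a one-liner. The sketch below uses uint32_t as a stand-in for DWORD so that it compiles anywhere, and combine_file_size is a made-up helper name:

```c
#include <stdint.h>

/* Joins the high and low 32-bit halves of a file size into one 64-bit value. */
uint64_t combine_file_size(uint32_t high, uint32_t low)
{
    return ((uint64_t) high << 32) | low;
}
```

A high part of 1 with a low part of 0 gives 4294967296 bytes (4 GB), which is exactly the size a single DWORD can no longer hold.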
Quote:
I just wanted one using math.h that tries to reduce the amount of code used, and I must say it can't be less than 4 lines
|
Oh, don't get me wrong, yours is elegant and a fancy way to do it. I like it 
__________________
If you always think like an expert, you'll always be a beginner. | "A handful of knowledgeable people is more effective than an army of fools" -Writing Secure Code, 2nd Ed.
|
|
|
08-08-2006, 10:20 AM
|
#5 (permalink)
|
|
Code Monkey
Join Date: Aug 2002
Location: Boston, MA
Posts: 79
|
ls -lh 
__________________
|
|
|
08-08-2006, 11:10 AM
|
#6 (permalink)
|
|
Newbie
Join Date: Jun 2002
Location: Denmark
Posts: 1,705
|
Or stat() and then use the provided st_size.
But since this can't be dealt with in an ISO/ANSI way, I guess that was the reason I didn't comment on it in the first place...
|
|
|
08-08-2006, 05:37 PM
|
#7 (permalink)
|
|
Anti-Zealot
Join Date: Feb 2006
Location: Atlanta, GA
Posts: 72
|
Quote:
|
Originally Posted by toe_cutter
ls -lh 
|
Seeing as this is the C++ forum, that's just a stupid comment.
Only a fool would execute a separate application like ls and use a parser to extract the file size data for what can be done in 10 lines of code.
Never mind that it's OS specific, susceptible to problems if ls's output changes, etc.
Quote:
|
Originally Posted by redhead
Or stat() and then use the provided st_size.
But since this can't be dealt with in an ISO/ANSI way, I guess that was the reason I didn't comment on it in the first place...
|
That has no relevance to what DJMaze posted or what I responded about.
The purpose of the post was to convert the file size to a human readable number. The structure provided by stat only provides the file size in bytes in the form of the member st_size. It has no bearing whatsoever on the post.
Now, had he said here's an ANSI way to obtain the file size:
Code:
int fileSize(FILE *fp) {
    // Counts the bytes from the current stream position up to EOF
    int size=0,c=getc(fp);
    for (;c!=EOF;c=getc(fp),size++);
    return size;
}
Then responding with functions like stat(), GetFileSize(), GetFileSizeEx(), read_vnode() or whatever other OS specific functions would apply would make sense.
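For comparison, a common alternative that uses only standard library calls is to seek to the end of the stream and ask for the position. This is a sketch with a hypothetical name (file_size_seek), and it rests on assumptions the C standard does not fully guarantee: the stream should be opened in binary mode, the implementation has to support SEEK_END on it, and the result is limited to what a long can hold.

```c
#include <stdio.h>

/* Returns the size of the stream in bytes, or -1 on failure.
   The original read/write position is restored before returning. */
long file_size_seek(FILE *fp)
{
    long start = ftell(fp);
    long size;

    if (start < 0 || fseek(fp, 0L, SEEK_END) != 0) {
        return -1L;
    }
    size = ftell(fp);
    if (fseek(fp, start, SEEK_SET) != 0) {
        return -1L;
    }
    return size;
}
```

Unlike the getc() loop, this does not read the whole file, which is why the OS-specific calls mentioned above (or this trick) are usually preferred for large files.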
__________________
If you always think like an expert, you'll always be a beginner. | "A handful of knowledgeable people is more effective than an army of fools" -Writing Secure Code, 2nd Ed.
|
|
|
08-08-2006, 07:42 PM
|
#8 (permalink)
|
|
Newbie
Join Date: Jun 2002
Location: Denmark
Posts: 1,705
|
Quote:
|
That has no relevance to what DJMaze posted or what I responded about.
|
My my, are we touchy about this...
True, DJM's original post has nothing OS specific in it, but discussing DWORD sizes and calls to ls is quite the opposite, and since my response was regarding toe_cutter, I see no need to question my response.
|
|
|
08-09-2006, 12:28 PM
|
#9 (permalink)
|
|
Anti-Zealot
Join Date: Feb 2006
Location: Atlanta, GA
Posts: 72
|
Quote:
|
Originally Posted by redhead
... but discussing DWORD sizes ...
|
Not quite, seeing as my response to DJM's post was regarding potential overflows due to file sizes.
Quote:
|
Originally Posted by redhead
since my response was regarding toe_cutter, I see no need to question my response..
|
I'm not sure how capable you are in a forum, but general forum etiquette is to quote the person you are responding to. This forum doesn't have a hierarchy of responses; if it did, then quoting wouldn't be strictly necessary.
__________________
If you always think like an expert, you'll always be a beginner. | "A handful of knowledgeable people is more effective than an army of fools" -Writing Secure Code, 2nd Ed.
|
|
|
08-10-2006, 06:27 AM
|
#10 (permalink)
|
|
Newbie
Join Date: Jun 2002
Location: Denmark
Posts: 1,705
|
Quote:
|
This forum doesn't have a hierarchy of responses;
|
It is called viewing the thread in linear mode...
|
|
|
08-22-2006, 06:34 AM
|
#11 (permalink)
|
|
Code Monkey
Join Date: Aug 2002
Location: Boston, MA
Posts: 79
|
Quote:
|
Originally Posted by AssKoala
Seeing as this is the C++ forum, that's just a stupid comment.
Only a fool would execute a separate application like ls and use a parser to extract the file size data for what can be done in 10 lines of code.
Never mind that it's OS specific, susceptible to problems if ls's output changes, etc.
|
I never suggested writing a parser, because the output is already in human readable form. I believe in not reinventing the wheel.
But you are right, this is a C++ forum, and it is OS specific, but then again you could run Cygwin and it would still work.
TC
__________________
Copyright © 2000-2008, Milano Interactive